mg-crawl

Purpose

Walk a target’s HTML and JavaScript and produce an inventory of pages, URLs, endpoints, parameters, GraphQL signals, internal-host references, vulnerable library hints, and redacted secret candidates. Authenticated where session headers exist.

Output (under `crawl/<host>/`)

pages/<sha256>.html, js/<sha256>.js — content-addressed bodies.
index.json — URL → content-hash map.
endpoints.json — enriched API rows: { url, method, source, body_format, params, graphql, ... }.
secrets.json — regex-matched secret candidates with SHA-256 fingerprints (no plaintext).
internal-refs.json — *.internal, *.corp, RFC1918 references for SSRF follow-up.
vulnerable-libraries.json — embedded library versions with CVE hints.
graphql-candidates.json — GraphQL signals. graphql-schema.json is written when a bounded in-scope introspection POST succeeds.

CLI

mg-crawl acme-bounty https://www.acme.example.com
mg-crawl acme-bounty https://www.acme.example.com https://api.acme.example.com --depth 4

Notes

Authenticated mode uses session.json env-var-backed headers when present. Secrets never appear in audit.log or crawl output.
Cross-host absolute URLs found in JavaScript are stored in internal-refs.json, never injected into the active-endpoint rows.
GraphQL introspection is attempted only against in-scope hosts and only when JS signals warrant it.

Purpose

Output (under crawl/<host>/)

CLI

Notes

Output (under `crawl/<host>/`)