Purpose
Walk a target’s HTML and JavaScript and produce an inventory of pages, URLs, endpoints, parameters, GraphQL signals, internal-host references, vulnerable-library hints, and redacted secret candidates. Requests are authenticated when session headers are available.
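A minimal sketch of the JavaScript-walking step, assuming candidates are pulled from quoted string literals with regular expressions. The function name extract_url_candidates and both patterns are hypothetical illustrations, not the tool's actual extractor, which may use a parser instead.

```python
import re

# Hypothetical patterns: quoted absolute http(s) URLs, and quoted root-relative
# paths whose first character after "/" is a word character (this skips "//host").
URL_LITERAL = re.compile(r"""["'`](https?://[^"'`\s]+)["'`]""")
PATH_LITERAL = re.compile(r"""["'`](/[A-Za-z0-9_][^"'`\s]*)["'`]""")


def extract_url_candidates(js_source: str) -> set[str]:
    """Return absolute URLs and root-relative paths that appear as JS string literals."""
    found = set(URL_LITERAL.findall(js_source))
    found.update(PATH_LITERAL.findall(js_source))
    return found


if __name__ == "__main__":
    sample = 'fetch("/api/v1/users?id=" + id); const gql = "https://api.acme.example.com/graphql";'
    print(sorted(extract_url_candidates(sample)))
    # ['/api/v1/users?id=', 'https://api.acme.example.com/graphql']
```

A literal-only pass like this misses URLs assembled at runtime (concatenation, template variables), so it is best read as a first approximation of the endpoint inventory.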
Output (under crawl/<host>/)
- pages/<sha256>.html, js/<sha256>.js: content-addressed bodies.
- index.json: URL → content-hash map.
- endpoints.json: enriched API rows: { url, method, source, body_format, params, graphql, ... }.
- secrets.json: regex-matched secret candidates with SHA-256 fingerprints (no plaintext).
- internal-refs.json: *.internal, *.corp, and RFC1918 references for SSRF follow-up.
- vulnerable-libraries.json: embedded library versions with CVE hints.
- graphql-candidates.json: GraphQL signals.
- graphql-schema.json: written only when a bounded in-scope introspection POST succeeds.
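A minimal sketch of the content-addressed layout and the plaintext-free secret rows. The helpers store_body and secret_rows, the row fields (rule, source, sha256), and the single AWS-access-key rule are all hypothetical; the real secrets.json schema and rule set may differ.

```python
import hashlib
import json
import re
import tempfile
from pathlib import Path

# Hypothetical single rule for illustration; a real rule set would be larger.
SECRET_RULES = {"aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b")}


def store_body(root: Path, subdir: str, ext: str, url: str, body: bytes,
               index: dict[str, str]) -> Path:
    """Write body to root/subdir/<sha256>.<ext> and record url -> hash in index."""
    digest = hashlib.sha256(body).hexdigest()
    path = root / subdir / f"{digest}.{ext}"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical bodies from different URLs share one file
        path.write_bytes(body)
    index[url] = digest  # the index.json mapping: URL -> content hash
    return path


def secret_rows(source_url: str, text: str) -> list[dict[str, str]]:
    """Return one row per match, keeping only a SHA-256 fingerprint of the value."""
    rows = []
    for rule, pattern in SECRET_RULES.items():
        for match in pattern.finditer(text):
            rows.append({
                "rule": rule,
                "source": source_url,
                "sha256": hashlib.sha256(match.group().encode("utf-8")).hexdigest(),
            })
    return rows


if __name__ == "__main__":
    index: dict[str, str] = {}
    body = b'const k = "AKIAABCDEFGHIJKLMNOP";'
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "crawl" / "www.acme.example.com"
        a = store_body(root, "js", "js", "https://www.acme.example.com/a.js", body, index)
        b = store_body(root, "js", "js", "https://www.acme.example.com/b.js", body, index)
        (root / "index.json").write_text(json.dumps(index, indent=2))
        rows = secret_rows("https://www.acme.example.com/a.js", body.decode("utf-8"))
        (root / "secrets.json").write_text(json.dumps(rows, indent=2))
        print(a == b, len(rows))
```

Hashing the body rather than the URL means two URLs serving the same script are stored once, while index.json still records both.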
CLI
mg-crawl acme-bounty https://www.acme.example.com
mg-crawl acme-bounty https://www.acme.example.com https://api.acme.example.com --depth 4
Notes
- Authenticated mode uses env-var-backed headers from session.json when present. Secrets never appear in audit.log or crawl output.
- Cross-host absolute URLs found in JavaScript are stored in internal-refs.json and are never injected into the active-endpoint rows.
- GraphQL introspection is attempted only against in-scope hosts, and only when JS signals warrant it.
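A sketch of how discovered URLs might be routed under these rules, assuming in_scope is the set of target hostnames passed on the command line. The names route_js_url, is_internal_host, and may_introspect, and the internal_pattern field, are hypothetical and not the tool's API.

```python
import ipaddress
from urllib.parse import urljoin, urlsplit

# Assumed classification rules, mirroring the suffixes and ranges named above.
INTERNAL_SUFFIXES = (".internal", ".corp")
RFC1918_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_internal_host(host: str) -> bool:
    """True for *.internal / *.corp names or RFC1918 IPv4 literals."""
    if host.endswith(INTERNAL_SUFFIXES):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:  # not an IP literal
        return False
    # An IPv6 address is simply "not in" an IPv4 network, so this returns False.
    return any(addr in net for net in RFC1918_NETS)


def route_js_url(page_url: str, candidate: str, in_scope: set[str],
                 endpoints: list[dict], internal_refs: list[dict]) -> None:
    """Send in-scope URLs to endpoint rows and cross-host URLs to internal refs."""
    absolute = urljoin(page_url, candidate)  # resolves relative paths against the page
    host = urlsplit(absolute).hostname or ""
    if host in in_scope:
        endpoints.append({"url": absolute, "source": page_url})
    else:
        internal_refs.append({"url": absolute, "host": host, "source": page_url,
                              "internal_pattern": is_internal_host(host)})


def may_introspect(endpoint_url: str, in_scope: set[str], graphql_signal: bool) -> bool:
    """Gate a GraphQL introspection POST on scope and on a JS-derived signal."""
    host = urlsplit(endpoint_url).hostname or ""
    return graphql_signal and host in in_scope
```

Routing every cross-host URL to the reference list, rather than to the endpoint rows, keeps active requests confined to the declared targets while still recording SSRF leads.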