What Recon Actually Is
Before you can find vulnerabilities in a web application, you need to know what you’re looking at. “Target: target.example.com” is a domain name. What it resolves to, what services are running on each host, what software stack those services use — that’s what recon tells you.
In a professional engagement, recon is where you figure out the actual attack surface.
The main domain might be a static marketing page with nothing interesting. The subdomain
api.target.example.com running an outdated version of Spring Boot with Actuator
endpoints exposed — that’s where the findings are.
GeistScope’s recon pipeline has three stages plus an orchestrator. Each is a standalone Rust binary that can also run independently:
- subdomain-enum: find all the subdomains
- mg-fingerprint: identify what technology each host runs
- mg-scan: find open ports on each discovered host
- mg-recon: orchestrate all three stages, produce summary.json
Stage 1: Subdomain Enumeration
Subdomains are discovered two ways, and the tool does both:
Passive — Certificate Transparency logs:
Virtually every TLS certificate issued by a publicly trusted certificate authority is recorded in public CT logs.
crt.sh aggregates these logs and exposes an API. If api.target.example.com ever
had a cert issued, it appears in CT logs regardless of whether it’s still resolvable.
subdomain-enum target.example.com --mode passive
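As a rough illustration of the passive step, here is a minimal sketch of turning one CT record into candidate hostnames. It assumes the names arrive as a newline-separated string, which is how crt.sh's JSON output presents them in its `name_value` field. The function name `names_from_ct_entry` is hypothetical and not taken from subdomain-enum.

```rust
use std::collections::BTreeSet;

/// Extract in-domain hostnames from one CT record's name list.
/// Assumption: names are newline-separated, as in crt.sh's `name_value` field.
fn names_from_ct_entry(name_value: &str, domain: &str) -> BTreeSet<String> {
    let domain = domain.trim_end_matches('.').to_ascii_lowercase();
    let suffix = format!(".{domain}");
    name_value
        .lines()
        .map(|line| line.trim().trim_end_matches('.'))
        // Wildcard certificates are logged as "*.example.com"; keep the base name.
        .map(|name| name.strip_prefix("*.").unwrap_or(name))
        .map(|name| name.to_ascii_lowercase())
        // Drop unrelated SANs and anything that is not a plain hostname.
        .filter(|name| {
            !name.contains('@') && (*name == domain || name.ends_with(suffix.as_str()))
        })
        .collect()
}
```

Collecting into a `BTreeSet` deduplicates names that appear on several certificates and keeps the output order stable.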
Active — DNS brute force:
A wordlist of common subdomain prefixes (admin, api, dev, staging, mail,
auth, internal, …) is tried as DNS queries. Anything that resolves gets added
to the list. This catches hosts that never had a certificate of their own, including those covered only by a wildcard cert.
subdomain-enum target.example.com --mode active --wordlist common.txt
Both at once, with results merged and deduplicated:
subdomain-enum target.example.com --mode all
Entries found by both passive and active are tagged ct_log+brute in the output —
a good signal that the host is well-established and worth attention.
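The merge and tagging step can be sketched as follows. Only the combined tag `ct_log+brute` appears in the tool's output as described here. The single-source tags `ct_log` and `brute`, and the function name `merge_sources`, are assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Merge passive and active results, tagging each host with where it was seen.
/// Assumption: single-source tags are "ct_log" and "brute".
fn merge_sources(
    passive: &BTreeSet<String>,
    active: &BTreeSet<String>,
) -> BTreeMap<String, &'static str> {
    let mut merged = BTreeMap::new();
    // `union` yields each hostname once, so the map is deduplicated by construction.
    for host in passive.union(active) {
        let tag = match (passive.contains(host), active.contains(host)) {
            (true, true) => "ct_log+brute",
            (true, false) => "ct_log",
            _ => "brute",
        };
        merged.insert(host.clone(), tag);
    }
    merged
}
```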
The async implementation fires up to --concurrency (default 100) DNS resolution
tasks simultaneously using Tokio’s JoinSet. Each subdomain name gets resolved to
IPs concurrently, and the results are merged into a deduplicated hashmap.
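To show the shape of a bounded fan-out without pulling in an async runtime, the sketch below swaps Tokio's JoinSet for scoped threads from the standard library. It is not the tool's implementation. The `resolve_all` name and the injected `resolve` closure are hypothetical, chosen so the example runs without touching the network.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Resolve every name with at most `concurrency` lookups in flight.
/// Stand-in for the Tokio JoinSet approach: std scoped threads, shared work index.
fn resolve_all<F>(names: &[String], concurrency: usize, resolve: F) -> HashMap<String, Vec<IpAddr>>
where
    F: Fn(&str) -> Vec<IpAddr> + Sync,
{
    let results = Mutex::new(HashMap::new());
    let next = AtomicUsize::new(0);
    let workers = concurrency.max(1).min(names.len().max(1));

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Each worker claims the next unprocessed index until none remain.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(name) = names.get(i) else { break };
                let ips = resolve(name);
                // Names that do not resolve are dropped; duplicates collapse on the key.
                if !ips.is_empty() {
                    results.lock().unwrap().insert(name.clone(), ips);
                }
            });
        }
    });

    results.into_inner().unwrap()
}
```

In real use the closure could wrap the standard library's `ToSocketAddrs`, for example `(name, 0).to_socket_addrs()`, though a dedicated async resolver scales better at a concurrency of 100.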
When used with an engagement, out-of-scope hosts are dropped before output and
before the JSON is written to recon/subdomain-enum.json.
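A scope check of this kind might look like the sketch below. It assumes scope entries are either exact hostnames or `*.`-prefixed wildcards such as `*.target.example.com`, and that a wildcard does not cover the bare apex domain. Both are assumptions about scope.json, and `in_scope` is a hypothetical name.

```rust
/// Return true if `host` matches any scope pattern.
/// Assumption: patterns are exact hostnames or "*.suffix" wildcards.
fn in_scope(host: &str, patterns: &[&str]) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    patterns.iter().any(|pattern| {
        let pattern = pattern.to_ascii_lowercase();
        match pattern.strip_prefix("*.") {
            // Require the leading dot so "eviltarget.example.com" cannot match
            // "*.target.example.com".
            Some(base) => host.ends_with(format!(".{base}").as_str()),
            None => host == pattern,
        }
    })
}
```

Matching on the dotted suffix rather than a plain substring is the important part: it keeps look-alike domains out of the results.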
Stage 2: Tech Stack Fingerprinting
Knowing a host exists isn’t the same as knowing what it is. mg-fingerprint sends
an HTTP request to a URL and classifies the response:
mg-fingerprint https://api.target.example.com --engagement target-bounty
It identifies things like:
- Server software from the Server header (nginx, Apache, IIS, Caddy)
- Application frameworks from headers like X-Powered-By, cookie names (PHPSESSID → PHP, JSESSIONID → Java), response patterns
- CDNs from headers like CF-Ray (Cloudflare), X-Cache (various), Via
- Security header presence, or absence, at a glance
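The header-based rules in the list above can be sketched as a small classifier. This is an illustration, not mg-fingerprint's rule set: the `Fingerprint` struct, the `classify` function, and the three security headers checked are all assumptions.

```rust
#[derive(Debug, Default, PartialEq)]
struct Fingerprint {
    server: Option<String>,
    frameworks: Vec<String>,
    cdn: Option<String>,
    missing_security_headers: Vec<&'static str>,
}

/// Classify a response from its headers, given as (name, value) pairs.
fn classify(headers: &[(&str, &str)]) -> Fingerprint {
    let mut fp = Fingerprint::default();
    let mut seen: Vec<String> = Vec::new();

    for (name, value) in headers {
        // Header names are case-insensitive, so normalise before matching.
        let name = name.to_ascii_lowercase();
        match name.as_str() {
            "server" => fp.server = Some(value.to_string()),
            "x-powered-by" => fp.frameworks.push(value.to_string()),
            "cf-ray" => fp.cdn = Some("cloudflare".to_string()),
            "set-cookie" if value.starts_with("PHPSESSID=") => fp.frameworks.push("php".to_string()),
            "set-cookie" if value.starts_with("JSESSIONID=") => fp.frameworks.push("java".to_string()),
            _ => {}
        }
        seen.push(name);
    }

    // Assumed checklist of security headers; a real tool would likely check more.
    for wanted in ["strict-transport-security", "content-security-policy", "x-content-type-options"] {
        if !seen.iter().any(|h| h.as_str() == wanted) {
            fp.missing_security_headers.push(wanted);
        }
    }
    fp
}
```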
The fingerprint is written into recon/fingerprint.json as a map keyed by hostname,
so running fingerprint on multiple hosts accumulates in one file without overwriting
earlier entries.
Why does this matter? Because the attack surface depends entirely on what’s running.
A Java application with Spring Boot might have Actuator endpoints at /actuator/health,
/actuator/env, /actuator/mappings — all of which leak internal information and
have been the source of real CVEs. An nginx proxy in front of a Node.js app might
have path normalization quirks. A React frontend calling a Laravel API has a
completely different attack surface than a Rails monolith.
You can’t know where to look without knowing what’s there.
Stage 3: Port Scanning
I wrote about mg-scan in an earlier post on the port scanner itself. In the context
of the recon pipeline, it scans each discovered host for open ports and maps them to
services.
The scan output records which ports are open, what service is likely running on each
(by well-known port number), and any banner text grabbed from the initial connection.
A banner is the first bytes a service sends when you connect: SSH sends a version
string and SMTP says 220 mail.target.example.com ESMTP. HTTP servers stay silent until
they receive a request, so their response headers play the same role.
Banners matter because they sometimes include version numbers. A version number for a known service is the first thing you cross-reference against published CVEs.
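As an example of pulling a version out of a banner, here is a minimal sketch for SSH. The identification string has the form `SSH-protoversion-softwareversion`, optionally followed by a space and comments. Splitting the software field on `_` is an OpenSSH naming convention rather than part of the protocol, so other servers may return `None`. The function name `ssh_software` is hypothetical.

```rust
/// Split an SSH identification banner into (product, version).
/// Assumption: the software field looks like "OpenSSH_8.9p1" (product, '_', version).
fn ssh_software(banner: &str) -> Option<(String, String)> {
    let rest = banner.trim().strip_prefix("SSH-")?;
    // `rest` is now "2.0-OpenSSH_8.9p1 comments"; skip the protocol version.
    let (_proto, software) = rest.split_once('-')?;
    // Anything after the first space is a free-form comment.
    let software = software.split_whitespace().next()?;
    let (product, version) = software.split_once('_')?;
    Some((product.to_string(), version.to_string()))
}
```

The resulting pair is what you would look up against published advisories for that product.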
The scanner supports deliberate evasion: randomised port order breaks sequential IDS signatures, configurable delay and jitter control probe rate, and an optional source port binding flag lets you send from port 53 or 80 — which occasionally bypasses naive firewall rules that permit traffic from those ports without inspecting where it’s going.
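Randomised port order and jittered delays can be sketched with nothing but the standard library. The example uses a small xorshift generator because std ships no random number API. It is not cryptographically secure and is not necessarily what mg-scan uses. All names here (`XorShift64`, `shuffled_ports`, `probe_delay`) are hypothetical.

```rust
use std::time::Duration;

/// Tiny xorshift64 generator: fine for shuffling, not for anything security-critical.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        XorShift64(seed.max(1)) // the state must never be zero
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..bound (slight modulo bias, acceptable for this purpose).
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Return the ports in first..=last in random order (Fisher-Yates shuffle).
fn shuffled_ports(first: u16, last: u16, rng: &mut XorShift64) -> Vec<u16> {
    let mut ports: Vec<u16> = (first..=last).collect();
    for i in (1..ports.len()).rev() {
        let j = rng.below(i as u64 + 1) as usize;
        ports.swap(i, j);
    }
    ports
}

/// Base delay plus a random extra of up to `jitter`, at millisecond granularity.
fn probe_delay(base: Duration, jitter: Duration, rng: &mut XorShift64) -> Duration {
    let jitter_ms = jitter.as_millis() as u64;
    base + Duration::from_millis(rng.below(jitter_ms + 1))
}
```

A scanner loop would walk `shuffled_ports(1, 1024, &mut rng)` and sleep for `probe_delay(...)` between probes, so neither the port sequence nor the timing forms a regular pattern.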
The Orchestrator: mg-recon
Running three tools manually against a target with 47 subdomains is tedious.
mg-recon runs all three stages in sequence, in-process, with one command:
mg-recon target-bounty
What that does, in order:
- Load the engagement, verify it has a scope.json
- Run subdomain enumeration; skip if recon/subdomain-enum.json already exists
- Run fingerprinting on every HTTP-accessible host; skip already-fingerprinted hosts
- Run port scanning on each host; skip if recon/mg-scan.json is current
- Merge all results into recon/summary.json
The skip logic is the key design decision here. Recon pipelines take time.
Subdomain enumeration against a large program might take several minutes.
Port scanning 47 hosts across a 1024-port range is not instant.
If you run mg-recon and the connection drops partway through, you can restart it
and it picks up where it left off. Only stages with missing output files re-run.
Pass --force to override.
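The resume behaviour boils down to a per-stage check before running. The sketch below is a simplification under stated assumptions: it treats every stage as "skip if the output file exists", whereas the post describes fingerprinting as skipping per host and port scanning as checking whether its file is current. The stage labels and the `stages_to_run` function are hypothetical. The file names come from the text.

```rust
use std::path::Path;

/// Stage label and the output file whose presence marks it as done.
/// Assumption: one file per stage is enough to decide; the real checks are finer-grained.
const STAGES: [(&str, &str); 3] = [
    ("subdomain-enum", "subdomain-enum.json"),
    ("fingerprint", "fingerprint.json"),
    ("scan", "mg-scan.json"),
];

/// List the stages that still need to run for this engagement's recon directory.
fn stages_to_run(recon_dir: &Path, force: bool) -> Vec<&'static str> {
    STAGES
        .iter()
        // --force re-runs everything; otherwise only stages without output run.
        .filter(|(_, file)| force || !recon_dir.join(file).exists())
        .map(|(stage, _)| *stage)
        .collect()
}
```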
The summary.json output is the canonical input to every downstream tool:
mg-probe, ai-prioritize, mg-crawl. It contains a record for each discovered host:
{
  "hostname": "api.target.example.com",
  "ips": ["203.0.113.42"],
  "source": "ct_log+brute",
  "http_accessible": true,
  "fingerprint": { "server": "nginx", "frameworks": ["node.js"] },
  "open_ports": [80, 443, 8080],
  "services": ["http", "https", "http-alt"]
}
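To make the record's shape concrete, here is a sketch of a matching Rust struct and of deriving `services` from `open_ports` by well-known port number. The struct and function names are hypothetical, the `fingerprint` field is left out for brevity, and the rule used for `http_accessible` is an assumption rather than the tool's actual logic.

```rust
#[derive(Debug, PartialEq)]
struct HostRecord {
    hostname: String,
    ips: Vec<String>,
    source: String,
    http_accessible: bool,
    open_ports: Vec<u16>,
    services: Vec<&'static str>,
}

/// Map a port to its conventional service name (small illustrative subset).
fn service_for_port(port: u16) -> &'static str {
    match port {
        22 => "ssh",
        25 => "smtp",
        80 => "http",
        443 => "https",
        8080 => "http-alt",
        _ => "unknown",
    }
}

/// Build one summary record from the enumeration and scan results for a host.
fn build_record(hostname: &str, ips: &[&str], source: &str, mut open_ports: Vec<u16>) -> HostRecord {
    open_ports.sort_unstable();
    open_ports.dedup();
    let services: Vec<&'static str> = open_ports.iter().map(|&p| service_for_port(p)).collect();
    // Assumption: a host counts as HTTP-accessible if any open port maps to an HTTP service.
    let http_accessible = services.iter().any(|s| s.starts_with("http"));
    HostRecord {
        hostname: hostname.to_string(),
        ips: ips.iter().map(|ip| ip.to_string()).collect(),
        source: source.to_string(),
        http_accessible,
        open_ports,
        services,
    }
}
```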
In-Process vs Subprocess
An earlier design had mg-recon spawning subprocesses — calling the individual tool
binaries as shell commands. That works but has friction: you need the binaries
installed, you parse stdout, error handling is awkward.
The refactor moved to calling the library functions directly in-process.
subdomain-enum, mg-scan, and fingerprint are each a library crate
(subdomain_enum, mg_scan, fingerprint) with a binary entrypoint that just calls
the library. mg-recon depends on the library crates and calls the same functions.
This means:
- No subprocess overhead
- Shared types: the same SubdomainEntry struct that the library produces is what mg-recon reads, with the compiler verifying the shape matches
- Richer error handling: anyhow::Result propagates properly rather than getting serialized to stderr and parsed out of text
The individual binaries still exist and work standalone. The library crate structure
just gives mg-recon direct access to the same code.
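The library-plus-thin-binary layout can be illustrated in a single file, with a module standing in for the library crate. Everything below is a stub under stated assumptions: the `enumerate` function does no DNS or CT lookups, the fields of `SubdomainEntry` are guessed, and a plain error struct replaces anyhow so the example needs no dependencies.

```rust
/// Stands in for the `subdomain_enum` library crate.
mod subdomain_enum {
    #[derive(Debug, Clone, PartialEq)]
    pub struct SubdomainEntry {
        pub hostname: String,
        pub source: String,
    }

    #[derive(Debug, PartialEq)]
    pub struct EnumError(pub String);

    /// Stub: builds candidate names only, with no network access.
    pub fn enumerate(domain: &str, prefixes: &[&str]) -> Result<Vec<SubdomainEntry>, EnumError> {
        if domain.is_empty() {
            return Err(EnumError("empty domain".to_string()));
        }
        Ok(prefixes
            .iter()
            .map(|p| SubdomainEntry {
                hostname: format!("{p}.{domain}"),
                source: "brute".to_string(),
            })
            .collect())
    }
}

use subdomain_enum::{enumerate, EnumError, SubdomainEntry};

/// What a standalone binary's main would do: call the library, format text.
fn cli_output(domain: &str) -> Result<String, EnumError> {
    let entries = enumerate(domain, &["api", "dev"])?;
    Ok(entries.iter().map(|e| e.hostname.as_str()).collect::<Vec<_>>().join("\n"))
}

/// What an orchestrator would do: the same call, but it keeps the typed values.
fn hosts_for_recon(domain: &str) -> Result<Vec<SubdomainEntry>, EnumError> {
    // `?` hands the library's error to the caller as a value, with no text parsing.
    let entries = enumerate(domain, &["api", "dev"])?;
    Ok(entries)
}
```

Both callers share one function and one struct definition, so a change to `SubdomainEntry` that breaks either of them fails at compile time instead of at run time.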
Running It
# Initialize an engagement
mg-engagement init target-bounty \
--target target.example.com \
--platform hackerone
# Set scope
mg-engagement scope-add target-bounty "*.target.example.com"
# Run full recon pipeline
mg-recon target-bounty --ports 1-1024 --concurrency 100
# See what was found
cat engagements/target-bounty/recon/summary.json | jq '.hosts[] | .hostname'
After mg-recon finishes, you have a complete picture of the attack surface:
every subdomain, every open port, every tech stack. That’s the input to the next
phase — crawling, probing, and finding vulnerabilities.
Part 3 covers mg-crawl and mg-probe: crawling the application surface and
checking security posture without sending a single attack payload.