Purpose
Build a long-lived, reusable hunting corpus. CT-log queries and Wayback archive crawls are slow and rate-limited, so we cache them in SQLite and let the other tools query the cache locally.
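The cache-first pattern described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the fetch_cache table, the cached_fetch helper, and the fetch_fn callback are all hypothetical names, and the real network call to a CT log or the Wayback archive is left to the caller.

```python
import sqlite3
import time


def open_cache(db_path=":memory:"):
    """Open (or create) a cache database. Table name is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fetch_cache ("
        " source TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " fetched_at REAL NOT NULL,"
        " body TEXT NOT NULL,"
        " PRIMARY KEY (source, target))"
    )
    return conn


def cached_fetch(conn, source, target, fetch_fn, max_age=None):
    """Return the cached body for (source, target), fetching only on a miss.

    fetch_fn is a caller-supplied function that performs the slow,
    rate-limited request. max_age is in seconds; None means never expire.
    """
    row = conn.execute(
        "SELECT fetched_at, body FROM fetch_cache WHERE source = ? AND target = ?",
        (source, target),
    ).fetchone()
    if row is not None and (max_age is None or time.time() - row[0] <= max_age):
        return row[1]
    body = fetch_fn(target)
    with conn:  # commit the new cache entry
        conn.execute(
            "INSERT OR REPLACE INTO fetch_cache VALUES (?, ?, ?, ?)",
            (source, target, time.time(), body),
        )
    return body
```

With this shape, a second request for the same source and target is served from SQLite and never reaches the remote service.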
Output
- A SQLite database (default ~/.geistscope/corpus.sqlite) with tables for domains, subdomains, and observed paths.
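One way such a database could be laid out is sketched below. The three table names follow the description above, but every column name, the source column, and the open_corpus helper are assumptions made for illustration; the real schema may differ.

```python
import sqlite3

# Hypothetical schema: column names and constraints are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS subdomains (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    name      TEXT NOT NULL,
    source    TEXT NOT NULL,
    UNIQUE (domain_id, name, source)
);
CREATE TABLE IF NOT EXISTS paths (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    path      TEXT NOT NULL,
    source    TEXT NOT NULL,
    UNIQUE (domain_id, path, source)
);
"""


def open_corpus(db_path=":memory:"):
    """Open the corpus database and create the tables if they are missing."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """List the tables present in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

The UNIQUE constraints in this sketch let the same subdomain or path be recorded once per source, so repeated ingests do not duplicate rows.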
CLI
corpus-builder ingest --target acme.example.com
corpus-builder query --domain acme.example.com --kind paths
Notes
- Corpus is engagement-agnostic. Use it to bootstrap a new engagement’s wordlists from real prior data instead of generic lists.
- Source attributions are stored so the operator can re-check origins if a hit looks suspicious.
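As a rough sketch of both notes, the snippet below derives a wordlist from observed paths and looks up where a given word came from. It assumes a simplified, hypothetical paths table with domain, path, and source columns; build_wordlist and origins_of are illustrative helpers, not part of the corpus-builder CLI.

```python
import sqlite3
from collections import Counter


def demo_corpus():
    """Build a tiny in-memory corpus with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE paths (domain TEXT, path TEXT, source TEXT)")
    conn.executemany(
        "INSERT INTO paths VALUES (?, ?, ?)",
        [
            ("a.example.com", "/api/v1/users", "wayback"),
            ("b.example.com", "/api/v2/users", "wayback"),
            ("b.example.com", "/admin/login", "wayback"),
        ],
    )
    return conn


def build_wordlist(conn, limit=1000):
    """Rank path segments by how many distinct (domain, path) pairs use them."""
    counts = Counter()
    # DISTINCT: a path recorded by several sources on one domain counts once.
    for _domain, path in conn.execute("SELECT DISTINCT domain, path FROM paths"):
        for segment in path.split("/"):
            if segment:
                counts[segment] += 1
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _count in ranked[:limit]]


def origins_of(conn, segment):
    """Return (domain, path, source) rows whose path contains the segment."""
    rows = conn.execute(
        "SELECT domain, path, source FROM paths ORDER BY domain, path, source"
    )
    return [row for row in rows if segment in row[1].split("/")]
```

Ranking by frequency across prior targets puts commonly seen segments first, and origins_of gives the operator the rows to re-check when a hit looks suspicious.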