Purpose

Build a long-lived, reusable hunting corpus. Certificate Transparency (CT) log queries and Wayback archive crawls are slow and rate-limited, so we cache their results in SQLite and let the other tools query that cache locally instead of hitting the remote services again.
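A minimal sketch of the cache-first lookup described above, using Python's standard sqlite3 module. The fetch_cache table, the cached_fetch function, and the fetch callable (a stand-in for a real CT-log or Wayback client) are hypothetical names, not part of corpus-builder. Results are stored as JSON keyed by (source, lookup). By default entries never expire, in line with the long-lived intent; the optional max_age_s is an assumed knob for forcing a refresh.

```python
import json
import sqlite3
import time
from typing import Callable, List, Optional

# Hypothetical cache table: raw results per (source, lookup), stored as JSON.
CACHE_SCHEMA = """
CREATE TABLE IF NOT EXISTS fetch_cache (
    source     TEXT NOT NULL,   -- e.g. "ct" or "wayback"
    lookup     TEXT NOT NULL,   -- e.g. the target domain
    fetched_at REAL NOT NULL,   -- Unix timestamp of the remote call
    payload    TEXT NOT NULL,   -- JSON-encoded list of results
    PRIMARY KEY (source, lookup)
)
"""


def cached_fetch(
    conn: sqlite3.Connection,
    source: str,
    lookup: str,
    fetch: Callable[[str], List[str]],
    max_age_s: Optional[float] = None,
) -> List[str]:
    """Serve results from SQLite when present; otherwise call fetch() once and store them.

    max_age_s=None (the default) means cached entries never expire.
    """
    conn.execute(CACHE_SCHEMA)
    row = conn.execute(
        "SELECT fetched_at, payload FROM fetch_cache WHERE source = ? AND lookup = ?",
        (source, lookup),
    ).fetchone()
    if row is not None and (max_age_s is None or time.time() - row[0] < max_age_s):
        return json.loads(row[1])  # cache hit: no network traffic
    results = fetch(lookup)  # slow, rate-limited remote call
    conn.execute(
        "INSERT OR REPLACE INTO fetch_cache (source, lookup, fetched_at, payload) "
        "VALUES (?, ?, ?, ?)",
        (source, lookup, time.time(), json.dumps(results)),
    )
    conn.commit()
    return results
```

Because the key is (source, lookup), re-running an ingest for the same target reuses the earlier response and sends no extra requests to the rate-limited service.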

Output

  • A SQLite database (default ~/.geistscope/corpus.sqlite) with tables for domains, subdomains, and observed paths.
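One possible layout for that database, sketched with Python's sqlite3 module. Only the default path and the three kinds of data come from this document; the column names, the per-row source column, and the open_corpus, domain_id, and record_path helpers are assumptions for illustration.

```python
import os
import sqlite3

# Default location from this document; the tables below are an assumed layout.
DEFAULT_DB = os.path.expanduser("~/.geistscope/corpus.sqlite")

SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE              -- e.g. acme.example.com
);
CREATE TABLE IF NOT EXISTS subdomains (
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    name      TEXT NOT NULL,
    source    TEXT NOT NULL,               -- where this record came from
    UNIQUE (domain_id, name, source)
);
CREATE TABLE IF NOT EXISTS paths (
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    path      TEXT NOT NULL,
    source    TEXT NOT NULL,
    UNIQUE (domain_id, path, source)
);
"""


def open_corpus(db_path: str = DEFAULT_DB) -> sqlite3.Connection:
    """Open (creating if needed) the corpus database and ensure the tables exist."""
    parent = os.path.dirname(db_path)
    if parent:  # ":memory:" and bare filenames have no parent directory
        os.makedirs(parent, exist_ok=True)
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def domain_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the id for a domain, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO domains (name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM domains WHERE name = ?", (name,)).fetchone()[0]


def record_path(conn: sqlite3.Connection, domain: str, path: str, source: str) -> None:
    """Store one observed path with its source; exact duplicates are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO paths (domain_id, path, source) VALUES (?, ?, ?)",
        (domain_id(conn, domain), path, source),
    )
    conn.commit()
```

The UNIQUE constraints include source, so the same path seen by two different sources is kept twice, once per origin.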

CLI

corpus-builder ingest --target acme.example.com
corpus-builder query  --domain acme.example.com --kind paths
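A hedged sketch of how the two commands might be wired with argparse. Only ingest, query, --target, --domain, and --kind paths come from the examples above. The subdomains kind, the run_query helper, the read-only connection, and the table layout it queries (domains joined to subdomains or paths on domain_id) are assumptions. ingest is left as a stub because the CT-log and Wayback clients are not specified here.

```python
import argparse
import os
import sqlite3
import sys
from typing import List, Optional

# Default location from this document; everything else below is a sketch.
DEFAULT_DB = "~/.geistscope/corpus.sqlite"

# Fixed SQL per --kind, so user input only reaches the query as a bound parameter.
KIND_SQL = {
    "subdomains": (
        "SELECT DISTINCT s.name FROM subdomains s "
        "JOIN domains d ON d.id = s.domain_id WHERE d.name = ? ORDER BY s.name"
    ),
    "paths": (
        "SELECT DISTINCT p.path FROM paths p "
        "JOIN domains d ON d.id = p.domain_id WHERE d.name = ? ORDER BY p.path"
    ),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="corpus-builder")
    sub = parser.add_subparsers(dest="command", required=True)

    ingest = sub.add_parser("ingest", help="fetch CT-log and Wayback data into the corpus")
    ingest.add_argument("--target", required=True)

    query = sub.add_parser("query", help="read from the local corpus only")
    query.add_argument("--domain", required=True)
    # Assumption: "subdomains" is offered alongside the documented "paths".
    query.add_argument("--kind", choices=sorted(KIND_SQL), required=True)
    return parser


def run_query(conn: sqlite3.Connection, domain: str, kind: str) -> List[str]:
    return [row[0] for row in conn.execute(KIND_SQL[kind], (domain,))]


def main(argv: Optional[List[str]] = None, conn: Optional[sqlite3.Connection] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "ingest":
        # Stub: the CT-log / Wayback fetchers are not specified in this document.
        print(f"ingest {args.target}: not implemented in this sketch", file=sys.stderr)
        return 1
    if conn is None:
        # Read-only URI: a query never creates or modifies the corpus file.
        path = os.path.expanduser(DEFAULT_DB)
        conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    for value in run_query(conn, args.domain, args.kind):
        print(value)
    return 0
```

Opening the corpus with mode=ro makes a missing database fail loudly instead of silently creating an empty file, and keeps query strictly local and side-effect free.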

Notes

  • The corpus is engagement-agnostic. Use it to bootstrap a new engagement’s wordlists from real prior data instead of generic lists.
  • Source attributions are stored so the operator can re-check where a record came from if a hit looks suspicious.
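A sketch of the wordlist bootstrap and the provenance check, assuming a hypothetical layout with a domains(id, name) table and a paths(domain_id, path, source) table. build_wordlist ranks path segments by how many distinct stored paths contain them across every engagement in the corpus; sources_for lists the recorded sources for one path. Both functions and the segment-frequency ranking are illustrative choices, not corpus-builder's documented behavior.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple

# Assumed layout: domains(id, name) and paths(domain_id, path, source).


def path_segments(path: str) -> List[str]:
    """Split '/api/v1/users?id=3' into ['api', 'v1', 'users']; query string and fragment are dropped."""
    path = path.split("#", 1)[0].split("?", 1)[0]
    return [seg for seg in path.split("/") if seg]


def build_wordlist(conn: sqlite3.Connection, min_count: int = 1) -> List[Tuple[str, int]]:
    """Rank segments by how many distinct stored paths contain them, across all engagements."""
    counts = Counter()
    for _domain_id, path in conn.execute("SELECT DISTINCT domain_id, path FROM paths"):
        # set(): a segment repeated inside one path still counts once for that path.
        counts.update(set(path_segments(path)))
    ranked = [(seg, n) for seg, n in counts.items() if n >= min_count]
    ranked.sort(key=lambda item: (-item[1], item[0]))  # most common first, then alphabetical
    return ranked


def sources_for(conn: sqlite3.Connection, domain: str, path: str) -> List[str]:
    """List every recorded source for one path, so a suspicious hit can be traced back."""
    rows = conn.execute(
        "SELECT DISTINCT p.source FROM paths p JOIN domains d ON d.id = p.domain_id "
        "WHERE d.name = ? AND p.path = ? ORDER BY p.source",
        (domain, path),
    )
    return [row[0] for row in rows]
```

Deduplicating on (domain_id, path) before counting keeps a path that several sources reported from being weighted more than once, and min_count trims one-off segments from the starting wordlist.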