corpus-builder

Purpose

Build a long-lived, reusable hunting corpus. CT-log queries and Wayback archive crawls are slow and rate-limited, so we cache them in SQLite and let the other tools query the cache locally.
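
A minimal sketch of the cache-first idea, not the tool's actual implementation. The fetch_cache table, the cached_fetch helper, and the source labels are hypothetical names chosen for illustration; the slow upstream call is passed in as a plain function so the sketch makes no assumption about any real CT-log or Wayback client.

```python
import sqlite3


def cached_fetch(conn, source, key, fetch):
    """Return values for (source, key) from SQLite; call fetch only on a miss.

    fetch is any callable taking the key and returning an iterable of strings.
    It stands in for a slow, rate-limited upstream query (assumption).
    """
    # Hypothetical cache table; the real corpus layout may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fetch_cache ("
        "source TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
        "UNIQUE (source, key, value))"
    )
    rows = conn.execute(
        "SELECT value FROM fetch_cache WHERE source = ? AND key = ? ORDER BY value",
        (source, key),
    ).fetchall()
    if rows:
        # Cache hit: answer locally without touching the upstream service.
        return [row[0] for row in rows]

    # Cache miss: pay the slow call once, then persist the result.
    # Note: an empty upstream result is not cached, so it is retried next time.
    values = sorted(set(fetch(key)))
    conn.executemany(
        "INSERT OR IGNORE INTO fetch_cache (source, key, value) VALUES (?, ?, ?)",
        [(source, key, value) for value in values],
    )
    conn.commit()
    return values
```

Passing the upstream call in as a parameter keeps the caching logic testable without network access.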

Output

A SQLite database (default ~/.geistscope/corpus.sqlite) with tables for domains, subdomains, and observed paths.
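
One possible layout for the three tables, sketched with Python's sqlite3 module. Only the table subjects (domains, subdomains, observed paths) come from the description above; every column name, the source column, and the open_corpus helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: column names and constraints are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    name TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS subdomains (
    domain TEXT NOT NULL REFERENCES domains(name),
    name   TEXT NOT NULL,
    source TEXT NOT NULL,          -- where the observation came from
    UNIQUE (domain, name, source)
);
CREATE TABLE IF NOT EXISTS paths (
    domain TEXT NOT NULL REFERENCES domains(name),
    path   TEXT NOT NULL,
    source TEXT NOT NULL,          -- where the observation came from
    UNIQUE (domain, path, source)
);
"""


def open_corpus(db_path=":memory:"):
    """Open (or create) a corpus database and ensure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn
```

The UNIQUE constraints let repeated ingests use INSERT OR IGNORE so the same observation is not stored twice.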

CLI

corpus-builder ingest --target acme.example.com
corpus-builder query  --domain acme.example.com --kind paths

Notes

The corpus is engagement-agnostic. Use it to bootstrap a new engagement’s wordlists from real prior data instead of generic lists.
Each record stores its source attribution so the operator can re-check the origin if a hit looks suspicious.
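
The two notes above could be exercised roughly as follows. This is a sketch under stated assumptions: it presumes a paths table with domain, path, and source columns, and the build_wordlist and origins helpers are hypothetical, not part of the corpus-builder CLI.

```python
import sqlite3
from collections import Counter


def build_wordlist(conn, limit=1000):
    """Rank path segments by the number of distinct domains they were seen on."""
    counts = Counter()
    # DISTINCT collapses the same path reported by several sources.
    for domain, path in conn.execute("SELECT DISTINCT domain, path FROM paths"):
        # Count each segment once per (domain, path) row.
        for segment in set(filter(None, path.split("/"))):
            counts[segment] += 1
    # Most widely seen first; ties broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda segment: (-counts[segment], segment))
    return ranked[:limit]


def origins(conn, segment):
    """List (domain, path, source) rows whose path contains the segment."""
    rows = conn.execute(
        "SELECT domain, path, source FROM paths ORDER BY domain, path, source"
    )
    return [row for row in rows if segment in row[1].split("/")]
```

Calling build_wordlist(conn, limit=500) would yield candidate words for a new engagement, and origins(conn, "admin") would show which prior observations contributed a given word.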