What a Flat List Can’t Answer
After a full recon and fuzz run, you have a findings directory full of individual markdown files. Each finding knows its target host and its severity. But the pipeline also accumulates information that doesn’t map to a single finding:
- A JWT found in a crawled response that’s also referenced by a parameter in a different endpoint
- An identity (authenticated user) that shows up in both an IDOR finding and a mass-assignment finding on different hosts
- A replay chain that connects a login request to a token extraction to an admin API call
None of that structure lives in the flat findings/ directory. security-graph is where it goes.
Node Kinds
The graph recognizes eleven node kinds, each representing a class of security-relevant entity:
| Kind | What it represents |
|---|---|
| host | A resolved hostname or IP in scope |
| url | A specific URL discovered during crawl or fuzz |
| parameter | A named input: query param, body field, header, cookie |
| identity | An authenticated principal used during testing |
| jwt | A captured JSON Web Token |
| session | A session identifier or session cookie |
| cookie | A named HTTP cookie |
| api | An API endpoint or resource |
| finding | A link back to a finding in the engagement directory |
| technology | A tech stack element: framework, server, library |
| replay_chain | A sequence of requests that together demonstrate a vulnerability |
Nodes have a stable content-addressed ID: SHA-256 of kind:label, hex-encoded. That means the same node (same host, same parameter name) gets the same ID across runs, and inserting it twice is idempotent.
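A dependency-free sketch of that ID scheme follows. It assumes the hashed string is literally the kind, a colon, and the label (for example host:app.example.com), and it hand-rolls SHA-256 only so the example needs no crates; a real build would more likely use a vetted crate such as sha2. The node_id and to_hex names are illustrative, not the store's actual API.

```rust
// Sketch only: a dependency-free SHA-256 (FIPS 180-4) used to derive node IDs.
// A production build would more likely use a vetted crate such as `sha2`.

const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    // Padding: a 0x80 byte, zeros until length is 56 mod 64, then the bit length (big-endian u64).
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    for block in msg.chunks(64) {
        // Expand the 16 input words into the 64-word message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds over the working variables.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let sig1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(sig1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let sig0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = sig0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        // Add the compressed block into the running hash state.
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(y);
        }
    }

    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical helper: ID = lowercase hex SHA-256 of "kind:label" (separator assumed).
fn node_id(kind: &str, label: &str) -> String {
    to_hex(&sha256(format!("{kind}:{label}").as_bytes()))
}
```

Calling node_id("host", "app.example.com") twice yields the same 64-character hex string, while node_id("url", "app.example.com") yields a different one, because the kind is part of the hashed input.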
Edges and Relationships
Edges connect nodes with a typed relationship. Some examples:
- host → url (contains)
- url → parameter (has_parameter)
- parameter → finding (triggers)
- identity → jwt (holds)
- jwt → api (authenticates)
- replay_chain → finding (demonstrates)
The file-backed adapter stores nodes and edges as JSONL — one JSON object per line, appended on insert. The store keeps a hash set of seen node IDs in memory to skip duplicate writes without reading the file back.
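A minimal sketch of that append-and-dedupe behavior, assuming nodes and edges share one graph file and are told apart by a "type" field. JsonlGraph, insert_node, and insert_edge are hypothetical names, and the JSON is built by hand only to avoid a serde dependency. Because the seen-set starts empty, dedup in this sketch only covers writes made by the same process.

```rust
use std::collections::HashSet;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical append-only graph file: one JSON object (node or edge) per line.
struct JsonlGraph {
    path: PathBuf,
    seen_nodes: HashSet<String>, // in-memory only; the file is never read back
}

impl JsonlGraph {
    fn open(path: &Path) -> io::Result<Self> {
        if let Some(dir) = path.parent() {
            fs::create_dir_all(dir)?;
        }
        Ok(Self { path: path.to_path_buf(), seen_nodes: HashSet::new() })
    }

    fn append_line(&self, line: &str) -> io::Result<()> {
        let mut f = OpenOptions::new().create(true).append(true).open(&self.path)?;
        writeln!(f, "{line}")
    }

    /// Returns Ok(false) without writing when this ID was already inserted.
    fn insert_node(&mut self, id: &str, kind: &str, label: &str) -> io::Result<bool> {
        if self.seen_nodes.contains(id) {
            return Ok(false);
        }
        self.append_line(&format!(
            r#"{{"type":"node","id":"{}","kind":"{}","label":"{}"}}"#,
            json_escape(id),
            json_escape(kind),
            json_escape(label)
        ))?;
        // Record the ID only after the write succeeded.
        self.seen_nodes.insert(id.to_string());
        Ok(true)
    }

    fn insert_edge(&self, from: &str, to: &str, rel: &str) -> io::Result<()> {
        self.append_line(&format!(
            r#"{{"type":"edge","from":"{}","to":"{}","rel":"{}"}}"#,
            json_escape(from),
            json_escape(to),
            json_escape(rel)
        ))
    }
}
```

Recording the ID only after a successful append means a failed write can be retried rather than silently skipped on the next attempt.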
Querying the Graph
The current store supports three read operations:
Neighbors. Given a node ID and an optional relationship filter, return the nodes connected to it (up to a configurable limit, capped at 100):
```rust
let neighbors = store.neighbors(node_id, Some("triggers"), 50)?;
```
Node lookup. Given a node ID, return the full node record including all metadata attached at insertion time.
Node list by kind. Return all nodes of a given kind — all findings, all parameters, all identities. Useful for building a summary view.
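The three read operations can be sketched over an in-memory view of the parsed graph file. This assumes neighbors matches edges in either direction and that metadata is a flat string map; GraphView and its method names are illustrative. The neighbors method walks every edge, which is where the O(n) query cost comes from.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hard cap on neighbor results, matching the limit described in the text.
const MAX_NEIGHBORS: usize = 100;

#[derive(Clone, Debug)]
struct Node {
    id: String,
    kind: String,
    label: String,
    metadata: BTreeMap<String, String>, // attached at insertion time
}

#[derive(Clone, Debug)]
struct Edge {
    from: String,
    to: String,
    rel: String,
}

/// Hypothetical in-memory view built by parsing each line of the graph file.
#[derive(Default)]
struct GraphView {
    nodes: HashMap<String, Node>,
    edges: Vec<Edge>,
}

impl GraphView {
    fn add_node(&mut self, id: &str, kind: &str, label: &str) {
        self.nodes.entry(id.to_string()).or_insert_with(|| Node {
            id: id.to_string(),
            kind: kind.to_string(),
            label: label.to_string(),
            metadata: BTreeMap::new(),
        });
    }

    fn add_edge(&mut self, from: &str, to: &str, rel: &str) {
        self.edges.push(Edge { from: from.into(), to: to.into(), rel: rel.into() });
    }

    /// Neighbors: one linear pass over every edge, so cost is O(number of edges).
    /// Matching either endpoint (ignoring direction) is an assumption of this sketch.
    fn neighbors(&self, node_id: &str, rel: Option<&str>, limit: usize) -> Result<Vec<Node>, String> {
        if !self.nodes.contains_key(node_id) {
            return Err(format!("unknown node id: {node_id}"));
        }
        let limit = limit.min(MAX_NEIGHBORS);
        let mut seen = HashSet::new();
        let mut out = Vec::new();
        for e in &self.edges {
            if out.len() >= limit {
                break;
            }
            if let Some(want) = rel {
                if e.rel != want {
                    continue;
                }
            }
            let other = if e.from == node_id {
                &e.to
            } else if e.to == node_id {
                &e.from
            } else {
                continue;
            };
            // Return each neighbor once, even if several edges lead to it.
            if seen.insert(other.clone()) {
                if let Some(n) = self.nodes.get(other) {
                    out.push(n.clone());
                }
            }
        }
        Ok(out)
    }

    /// Node lookup: the full record, metadata included.
    fn get_node(&self, id: &str) -> Option<&Node> {
        self.nodes.get(id)
    }

    /// Node list by kind, sorted by ID so summaries are deterministic.
    fn nodes_by_kind(&self, kind: &str) -> Vec<&Node> {
        let mut v: Vec<&Node> = self.nodes.values().filter(|n| n.kind == kind).collect();
        v.sort_by(|a, b| a.id.cmp(&b.id));
        v
    }
}
```

Clamping the caller's limit with min rather than rejecting large values mirrors the "configurable limit, capped at 100" behavior: a request for 500 neighbors still succeeds but returns at most 100.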
The harness (Part 10) exposes these as endpoints so an AI operator can query the graph via JSON without reading JSONL directly.
Why JSONL Instead of a Graph Database
The engagement directory is already the source of truth. Findings, recon output, replay evidence: it’s all flat files. Introducing a graph database (Neo4j, Dgraph) would mean another process to manage, another dependency to install, and a store that lives outside the engagement directory.
JSONL keeps everything in the same place. The graph file is human-readable, can be committed alongside the engagement, and doesn’t require any external process. The tradeoff is that queries are O(n) in the number of edges, which is acceptable at single-engagement scale, where the graph has hundreds of nodes rather than millions.
The FileGraphStore interface mirrors a future GraphStore trait, so swapping in a Postgres-backed implementation later is a boundary change rather than a rewrite.
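What that boundary might look like, as a sketch: the text only says a GraphStore trait is planned, so the two methods below are hypothetical, and MemStore stands in for FileGraphStore. The point is that triggered_findings is written against the trait, so a Postgres-backed type implementing the same methods could be passed in without changing the caller.

```rust
use std::collections::HashMap;

/// Hypothetical storage boundary; method names and signatures are illustrative only.
trait GraphStore {
    fn neighbors(&self, node_id: &str, rel: Option<&str>, limit: usize) -> Result<Vec<String>, String>;
    fn node_kind(&self, node_id: &str) -> Option<String>;
}

/// Caller code depends only on the trait, never on a concrete backend.
fn triggered_findings<S: GraphStore>(store: &S, parameter_id: &str) -> Result<Vec<String>, String> {
    let ids = store.neighbors(parameter_id, Some("triggers"), 100)?;
    Ok(ids
        .into_iter()
        .filter(|id| store.node_kind(id).as_deref() == Some("finding"))
        .collect())
}

/// Stand-in for the file-backed store. A Postgres-backed type would implement
/// the same two methods with SQL queries instead of in-memory scans.
struct MemStore {
    kinds: HashMap<String, String>,       // node ID -> kind
    edges: Vec<(String, String, String)>, // (from, to, rel)
}

impl GraphStore for MemStore {
    fn neighbors(&self, node_id: &str, rel: Option<&str>, limit: usize) -> Result<Vec<String>, String> {
        Ok(self
            .edges
            .iter()
            .filter(|e| e.0 == node_id && rel.map_or(true, |want| e.2 == want))
            .map(|e| e.1.clone())
            .take(limit.min(100))
            .collect())
    }

    fn node_kind(&self, node_id: &str) -> Option<String> {
        self.kinds.get(node_id).cloned()
    }
}
```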
How the Graph Gets Populated
Nodes and edges are inserted by the harness endpoints during tool dispatch. When mg-crawl discovers a URL and the harness writes the result, it inserts a url node connected to the host node. When mg-fuzz confirms a finding, the harness inserts a finding node and edges from the relevant parameter and URL. When mg-recopilot identifies an exploit primitive in decompiled code, that goes in as a finding node connected to a technology node representing the binary.
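A sketch of the harness-side hooks, assuming each tool result triggers an insert of the relevant nodes and edges. on_crawl_result and on_confirmed_finding are hypothetical names, the raw kind:label string stands in for the SHA-256 ID to keep the example short, treating edges as a set is an assumption, and the url → finding relationship name has_finding is invented for illustration (the other relationships come from the list above).

```rust
use std::collections::{BTreeSet, HashSet};

/// Stand-in ID: the raw "kind:label" key. The store described above hashes this with SHA-256.
fn id(kind: &str, label: &str) -> String {
    format!("{kind}:{label}")
}

#[derive(Default)]
struct Graph {
    nodes: HashSet<String>,
    edges: BTreeSet<(String, String, String)>, // (from, rel, to); set semantics assumed
}

impl Graph {
    /// Idempotent node insert: re-inserting the same kind:label is a no-op.
    fn node(&mut self, kind: &str, label: &str) -> String {
        let node_id = id(kind, label);
        self.nodes.insert(node_id.clone());
        node_id
    }

    fn edge(&mut self, from: &str, rel: &str, to: &str) {
        self.edges.insert((from.to_string(), rel.to_string(), to.to_string()));
    }
}

/// Hypothetical hook: runs when the harness writes an mg-crawl result.
fn on_crawl_result(g: &mut Graph, host: &str, url: &str) {
    let h = g.node("host", host);
    let u = g.node("url", url);
    g.edge(&h, "contains", &u);
}

/// Hypothetical hook: runs when mg-fuzz confirms a finding on a parameter of a URL.
fn on_confirmed_finding(g: &mut Graph, url: &str, param: &str, finding: &str) {
    let u = g.node("url", url);
    let p = g.node("parameter", param);
    let f = g.node("finding", finding);
    g.edge(&u, "has_parameter", &p);
    g.edge(&p, "triggers", &f);
    g.edge(&u, "has_finding", &f); // hypothetical relationship name
}
```

Because inserts are idempotent, replaying the same crawl result twice leaves the graph unchanged, which is what lets the graph accumulate safely across repeated tool runs.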
The graph accumulates during an engagement rather than being computed at the end. By the time you’re ready to write a report, the structure is already there.
Part 10 covers mg-harness: the JSON dispatcher that gives AI operators typed, risk-controlled access to every tool endpoint in the GeistScope pipeline.