Purpose

mg-harness is the safe adapter between an AI agent (or a human operator working interactively) and the GeistScope CLIs. Callers send a JSON invocation that names an endpoint and arguments; the harness validates types, applies risk policy, checks scope, dispatches to a Rust library or subprocess, and returns a bounded JSON result.

It exists so an AI assistant can drive an engagement without raw shell access, and so a human can do the same through one consistent interface.

Two modes

# Single-shot — read one Invocation from stdin (or --input file), print one result
echo '{"endpoint":"engagement.status","engagement":"acme-bounty"}' | mg-harness dispatch --pretty
mg-harness dispatch --input invocation.json --pretty

# Chat REPL — interactive coding-agent loop pinned to one engagement
mg-harness chat acme-bounty --backend ollama --model qwen2.5-coder
mg-harness chat acme-bounty --backend ollama --model qwen2.5-coder --tool-profile advanced
mg-harness chat acme-bounty --backend ollama --model qwen2.5-coder --tool-profile lab
mg-harness chat acme-bounty --backend openai --base-url http://localhost:1234/v1 --model llama-3-70b
mg-harness chat acme-bounty --backend anthropic --model claude-opus-4-7

Endpoint contract

Every endpoint declares a risk class:

  • read_only — no network, no state change.
  • passive_remote — outbound HTTP to third-party intel services, no target contact.
  • low_active — sends a small number of bounded requests to in-scope targets.
  • high_active — bulk fuzzing, auth probes, exploitation payloads; requires confirmed: true.
  • state_change — writes engagement state (e.g. records consent, imports traffic).
  • destructive — post-exploitation; blocked unless the chat REPL is in --unsafe-mode or the JSON caller is explicitly authorized.

90+ endpoints are registered today. The core control surface:

EndpointRiskWhat it does
endpoint.registryreadList every registered endpoint with its risk class and status.
engagement.open / engagement.statusreadWorkspace metadata and output-file summary.
scope.checkreadTest a host or URL against scope.json.
recon.runhigh_activeRun the mg-recon pipeline after confirmation.
graph.summary / graph.neighborsreadRead the local security graph with bounded sampling.
request.import / request.search / request.replaymixedBring HAR / Burp / Caido traffic in; search the corpus; replay one.
finding.create / finding.readread / stateScoped finding creation and bounded reads.
chain.readreadBounded read of recon/chain-analysis.{md,json}.
report.generate / report.disclosereadRun mg-report generate / disclose.
re.analyze / re.readreadDrive mg-recopilot.
artifact.auditpassive_remoteRun the merged mg-artifact-audit analyzers.
aifuzz.consent / aifuzz.runstate / high_activeRecord consent then run mg-aifuzz.
exploit.scaffoldreadScaffold an mg-exploitgen tree.
session.set / session.get_headersstate / readManage env-var-backed session profiles; return only redacted header metadata.

The 67 subprocess-dispatched tool endpoints (xss.scan, sqli.scan, jwt.analyze, tls.scan, aws.chain, graphql.scan, shodan.lookup, takeover.scan, …) all live in one SUBPROCESS_TOOLS const slice that pairs each endpoint with its binary, risk class, and short description. Dispatch, registry, and binary lookup all read from that single row, so adding a new tool is a one-line change. Invoke endpoint.registry to enumerate the full list at runtime. The six retired artifact binaries are the exception to the one-endpoint/one-binary mental model: their legacy endpoints now share mg-artifact-audit with subcommand routing, and artifact.audit is the new pack-level entry point.

endpoint.registry also now includes pack/profile metadata: every endpoint has a pack such as reconx, vuln_scan, identity_audit, protocol_audit, cloud_audit, artifact_audit, eventing, or redteam_lab; an exposure tier such as default_profile, advanced_profile, lab_only, service_internal, or retired_pending_pack; and, for tools being collapsed, a repurpose note. That metadata now drives the chat catalog via --tool-profile: default exposes only default-profile endpoints; advanced adds explicit OOB/high-active helper surfaces; lab includes lab-only scaffolding while still requiring --unsafe-mode for destructive endpoints. That lets the chat UI and future TUI expose fewer high-level packs by default while retaining specialist binaries behind explicit operator intent. Sharp tools are repurposed before pruning: for example brute.run moves toward identity-audit rate-limit/lockout analysis, snmp.brute toward protocol audit, notify.start toward eventing infrastructure, and loot.run toward artifact inventory/redaction helpers rather than default post-exploitation collection.

Standardized findings

Tools that detect a vulnerability emit a ToolFinding JSON record to engagements/<name>/findings/<tool>-<id>.json. The id is a deterministic SHA-256 over (tool, url, parameter, title, discriminator) so re-running the same tool against the same target is idempotent. Evidence is capped at 2 KiB on disk with a ... [truncated] marker.

ai-prioritize reads these findings sorted by severity and surfaces the top 50 as structured context to the LLM. security-graph ingests them as Finding nodes with DiscoveredBy edges to the tool that produced them.

Chat REPL

Tool calls flow through the same risk policy as the JSON path:

  • read_only and passive_remote endpoints run immediately.
  • low_active, high_active, and state_change endpoints prompt the user in the REPL with the proposed args before running. Pass --yes to skip the prompt for unattended runs.
  • The chat tool catalog is profile-filtered: default hides retired, lab-only, service-internal, and destructive endpoints; advanced adds explicit OOB/high-active helper surfaces; lab exposes lab-only scaffolding while destructive tools still require --unsafe-mode.

Slash commands inside the REPL:

/help    show this help
/tools   list available tools and their risk classes
/reset   clear conversation history (keeps system prompt)
/quit    exit

The agent loop is capped at 20 tool turns per user message — long-running investigations should be split across messages so the operator stays in the loop.

Docker

The shipped container image has mg-harness as its entrypoint:

docker run --rm -it --network host \
  -v "$PWD/engagements:/workspace/engagements" \
  ghcr.io/machinageist/geistscope:latest chat acme-bounty \
  --backend ollama --base-url http://localhost:11434 --model qwen2.5-coder

Notes

  • Every dispatch — allow, block, and error — is appended to audit.log.
  • High-active endpoints refuse to run unless the caller sets confirmed: true in the JSON path. The chat REPL sets it automatically after the operator’s in-REPL confirmation.
  • Subprocess tool calls run with a 5-minute wall-clock timeout and capture at most 64 KiB of stdout / stderr each — runaway tools cannot starve the harness or OOM the agent.
  • Model-visible response bodies are truncated to 256 KiB on UTF-8 char boundaries with a recorded byte-count.
  • Credentials never appear in dispatch output. session.get_headers returns <redacted> for every value and a header count.