Features · AI transparency
Some kit features run deterministic code. Some call an LLM. Both are useful; neither is hidden. This page lists every surface where an LLM is in the loop, the full autonomy spectrum, and the controls on each level.
The L0–L4 autonomy spectrum
L0 (fully manual): The human runs every command. Claude takes no autonomous action.
L1: Claude assists and proposes actions. The human approves every action before it executes.
L2: Claude executes within a fixed scope. The human reviews after the batch, not before each action.
L3: Claude executes with bounded mutation: kill switch, rate limit, rollback record, audit log, and a manifest status field (status: draft|active|deprecated). Requires the full 15-step control plane.
L4: Never allowed. The kit refuses to ship an unbounded autonomous loop. No manifest can reach status: active without all 15 control-plane steps, so L4 has no path to existence in the kit.
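The promotion rule in the spectrum can be sketched as a small check. This is a minimal illustration under stated assumptions, not the kit's implementation: the `Manifest` shape, the `completed_steps` field, and the `can_promote_to_active` function are hypothetical names. Only the five levels, the three status values, and the 15-step requirement come from the text.

```python
from dataclasses import dataclass, field
from enum import IntEnum

CONTROL_PLANE_STEPS = 15  # from the text: all 15 steps are required


class Autonomy(IntEnum):
    L0 = 0  # human runs every command
    L1 = 1  # Claude proposes, human approves each action
    L2 = 2  # Claude executes in a fixed scope, human reviews after the batch
    L3 = 3  # bounded mutation, full control plane required
    L4 = 4  # unbounded autonomous loop: never allowed


@dataclass
class Manifest:
    # Hypothetical shape; the real manifest schema is not shown on this page.
    level: Autonomy
    status: str = "draft"  # draft | active | deprecated
    completed_steps: set[int] = field(default_factory=set)


def can_promote_to_active(m: Manifest) -> bool:
    """Return True only if the manifest may move to status: active."""
    if m.level == Autonomy.L4:
        return False  # L4 has no path to existence
    if m.status != "draft":
        return False  # only drafts are candidates for promotion
    required = set(range(1, CONTROL_PLANE_STEPS + 1))
    return required <= m.completed_steps  # every step must be complete
```

Modelling the steps as a set of completed step numbers keeps the check a single subset comparison; a real control plane would presumably verify each step rather than trust a recorded number.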
Every surface
Deterministic
Deterministic
Deterministic
LLM-driven
LLM-driven
Hybrid
LLM-driven (no memory)
Hybrid
Deterministic lookup
LLM-driven
Deterministic estimate
LLM-driven
Adversarial review gate
Any L3 capability ships in status: draft. Promotion to active requires a fresh-Claude review in a zero-context session. The reviewer answers the eight structured questions below, and the resulting hit / partial-hit / miss classification is recorded in the manifest's review_notes field. See Autonomous operation for the full control-plane detail.
Worst-case attack vector — what could an adversary do with this capability?
Broken trust model — what assumption does this rely on that could be violated?
Theater test — does this actually do anything, or does it create the appearance of safety?
Memory pathologies — could the append-only memory file be poisoned over time?
Outcome window timing — could a well-timed input manipulate the outcome check?
Simulation-vs-reality — is the capability being tested against real data or controlled inputs?
Capability-count ceiling — how many active capabilities can run without degrading each other?
The one cheap rule — what single, low-cost check would catch the most likely failure?
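One way to picture what review_notes might hold is a record per question. The sketch below is illustrative only: the `ReviewNote` and `Finding` types and the `validate_review` helper are hypothetical, and it assumes one hit / partial-hit / miss classification per question, which the page does not state explicitly. The question names are taken from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Finding(Enum):
    HIT = "hit"
    PARTIAL_HIT = "partial-hit"
    MISS = "miss"


# The eight question names, taken from the list above.
QUESTIONS = (
    "worst-case attack vector",
    "broken trust model",
    "theater test",
    "memory pathologies",
    "outcome window timing",
    "simulation-vs-reality",
    "capability-count ceiling",
    "the one cheap rule",
)


@dataclass(frozen=True)
class ReviewNote:
    # Hypothetical record; assumes one classification per question.
    question: str
    finding: Finding
    comment: str = ""


def validate_review(notes: list[ReviewNote]) -> None:
    """Raise ValueError unless every question is answered exactly once."""
    answered = [n.question for n in notes]
    if sorted(answered) != sorted(QUESTIONS):
        missing = sorted(set(QUESTIONS) - set(answered))
        raise ValueError(f"incomplete or duplicated review; missing: {missing}")
```

A check like this would reject a review that skips a question or answers one twice, before anything is written to the manifest.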
What we never do
We never make an LLM call without a user-invoked trigger. No background telemetry-driven prompts.
We never send the contents of files outside .claude/ to Groq.
We never log full credentials. License hashes are truncated to their last 6 characters in every log line.
We never train on your code. The kit has no feedback loop into model training.
We never promote a draft capability to active in the same session it was authored.
We never write outside .claude/ without explicit user consent declared in the manifest preview.
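Two of the rules above lend themselves to a short sketch: truncating license hashes for logging and refusing writes outside .claude/ without consent. The function names and the `consented` flag are hypothetical, chosen for illustration; the kit's actual checks are not shown on this page.

```python
from pathlib import Path


def redact_license_hash(license_hash: str) -> str:
    """Keep only the last 6 characters, as the logging rule requires."""
    return license_hash[-6:]


def is_inside_claude_dir(target: str, project_root: str) -> bool:
    """True if target resolves to a path under <project_root>/.claude/."""
    claude_dir = (Path(project_root) / ".claude").resolve()
    # Resolving first collapses ".." segments so they cannot escape the directory.
    resolved = (Path(project_root) / target).resolve()
    return resolved == claude_dir or claude_dir in resolved.parents


def may_write(target: str, project_root: str, consented: bool) -> bool:
    """Allow writes inside .claude/; anything else needs explicit consent.

    `consented` stands in for consent declared in the manifest preview.
    """
    return is_inside_claude_dir(target, project_root) or consented
```

Resolving the path before comparing is the important design choice here: a plain string-prefix test would accept a target such as `.claude/../src/a.py`.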
A note on Groq
If we adopt Groq for fast hook-time decisions (sub-300ms safety checks, session briefs), it will appear in the table above with full disclosure: which model, what data leaves the machine, and the opt-out flag. Until then, no Groq row appears in the table. No surprises.