Claude Code Kit

AI transparency

Some kit features run deterministic code. Some call an LLM. Both are useful, neither is hidden. This page lists every surface where an LLM is in the loop, the full autonomy spectrum, and the controls on each level.

The L0–L4 autonomy spectrum

Five levels. Only L0–L3 are allowed.

L0

Fully manual

The human runs every command. Claude takes no autonomous action.

L1

Assisted

Claude assists and proposes actions. The human approves every action before it executes.

L2

Supervised execution

Claude executes within a fixed scope. The human reviews after the batch, not before each action.

L3

Bounded autonomous

Claude executes autonomously, but every mutation is bounded by a kill switch, a rate limit, a rollback record, an audit log, and a manifest status field (status: draft|active|deprecated). Requires the full 15-step control plane.

L4

Unbounded

Never allowed. The kit refuses to ship an unbounded autonomous loop. No manifest can reach status: active without all 15 control-plane steps, so the kit offers no path by which L4 can exist.
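
The level rules above can be sketched as a small gate. This is an illustration only: the `Manifest` shape, the `completed_steps` counter, and the function names are hypothetical, and the 15 control-plane steps are counted rather than named because this page does not list them.

```python
from dataclasses import dataclass
from enum import IntEnum

# From the page: activation needs all 15 control-plane steps.
# The individual steps are not listed here, so this sketch only counts them.
REQUIRED_CONTROL_PLANE_STEPS = 15


class AutonomyLevel(IntEnum):
    L0_FULLY_MANUAL = 0
    L1_ASSISTED = 1
    L2_SUPERVISED_EXECUTION = 2
    L3_BOUNDED_AUTONOMOUS = 3
    L4_UNBOUNDED = 4


@dataclass
class Manifest:
    # Hypothetical shape; the kit's real manifest schema may differ.
    level: AutonomyLevel
    status: str = "draft"  # draft | active | deprecated
    completed_steps: int = 0


def is_allowed(level: AutonomyLevel) -> bool:
    """Only L0 through L3 are allowed; L4 is always refused."""
    return level <= AutonomyLevel.L3_BOUNDED_AUTONOMOUS


def can_activate(manifest: Manifest) -> bool:
    """A manifest may become active only at an allowed level with every step done."""
    if not is_allowed(manifest.level):
        return False
    return manifest.completed_steps >= REQUIRED_CONTROL_PLANE_STEPS
```

Under these assumptions, an L4 manifest fails `can_activate` no matter how many steps it claims, which mirrors the rule that L4 has no route to active.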

Every surface

What runs where, and how to disable it.

| Surface | Type | Trigger | What it sees | Where it runs | How to disable |
|---|---|---|---|---|---|
| Rules (.claude/rules/*.md) | Deterministic | File pattern match in any tool call | The file Claude is touching + the rule text | Locally in Claude | Delete the rule file |
| PreToolUse Bash hook | Deterministic | Any Bash tool call | The bash command string | Locally on your machine | Remove from .claude/settings.json |
| SessionStart hook | Deterministic | Session start | git status + open-issues file | Locally on your machine | Remove from .claude/settings.json |
| code-reviewer agent | LLM-driven | You invoke it | The diff + reviewer system prompt | Claude (your subscription) | Delete .claude/agents/code-reviewer.md |
| planner agent | LLM-driven | You invoke it | Task description + plan template | Claude (your subscription) | Delete .claude/agents/planner.md |
| test-runner agent | Hybrid | You invoke it | Test output, then LLM summarises | Claude + your shell | Delete .claude/agents/test-runner.md |
| security-reviewer agent | LLM-driven (no memory) | You invoke it before merge | Auth/license-touching files | Claude (your subscription) | Delete .claude/agents/security-reviewer.md |
| /audit | Hybrid | You type /audit | Project file tree + audit checklist | claudekit MCP server | Don't run the command |
| /recover | Deterministic lookup | You type /recover <name> | Disaster name → playbook section | claudekit MCP server | Don't run the command |
| /explain | LLM-driven | You type /explain <file> | The file contents + project brain context | Claude (your subscription) | Don't run the command |
| /cost-check | Deterministic estimate | You type /cost-check | Token counts + planned tool list | claudekit MCP server | Don't run the command |
| /teach-me-this | LLM-driven | You type /teach-me-this | The code + grilling prompt | Claude (your subscription) | Don't run the command |
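
To make "deterministic" concrete for the PreToolUse Bash hook, here is a minimal sketch of a check that sees only the bash command string and makes no LLM call. The deny patterns are hypothetical, since this page does not list the kit's real ones. The payload shape (`tool_input.command` on stdin) and the blocking exit code are assumptions about how the hook is wired and should be checked against the Claude Code hooks reference.

```python
import json
import re
import sys
from typing import Tuple

# Hypothetical deny-list for illustration; not the kit's actual patterns.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),          # recursive delete of the filesystem root
    re.compile(r"\bgit\s+push\s+.*--force\b"),    # force push
]


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a bash command string using regex matching only."""
    for pattern in DENY_PATTERNS:
        if pattern.search(command):
            return False, f"blocked by pattern: {pattern.pattern}"
    return True, "ok"


def main() -> int:
    # Assumption: the hook payload arrives as JSON on stdin with the command
    # under tool_input.command, and exit code 2 blocks the tool call.
    payload = json.load(sys.stdin)
    command = payload.get("tool_input", {}).get("command", "")
    allowed, reason = check_command(command)
    if not allowed:
        print(reason, file=sys.stderr)
        return 2
    return 0
```

A script entry point would call `sys.exit(main())`. The same input always produces the same verdict, which is the property the "Deterministic" label in the table describes.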

Adversarial review gate

Eight questions. Fresh context. Author cannot review their own work.

Any L3 capability ships in status: draft. Promotion to active requires a review by a fresh Claude instance in a zero-context session. The reviewer answers eight structured questions, and the hit / partial-hit / miss classification is recorded in the manifest's review_notes field. See Autonomous operation for the full control-plane detail.

  • 01  Worst-case attack vector: what could an adversary do with this capability?

  • 02  Broken trust model: what assumption does this rely on that could be violated?

  • 03  Theater test: does this actually do anything, or does it create the appearance of safety?

  • 04  Memory pathologies: could the append-only memory file be poisoned over time?

  • 05  Outcome window timing: could a well-timed input manipulate the outcome check?

  • 06  Simulation-vs-reality: is the capability being tested against real data or controlled inputs?

  • 07  Capability-count ceiling: how many active capabilities can run without degrading each other?

  • 08  The one cheap rule: what single, low-cost check would catch the most likely failure?
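
The gate described above can be sketched as a promotion function. The names here (`Review`, `Manifest`, `promote`, the session identifiers) are hypothetical and stand in for whatever the kit actually uses. The page does not say whether a particular classification blocks promotion, so this sketch only records it. It also leaves out the 15-step control-plane check, which is covered under Autonomous operation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

QUESTION_IDS = ["01", "02", "03", "04", "05", "06", "07", "08"]
CLASSIFICATIONS = {"hit", "partial-hit", "miss"}


@dataclass
class Review:
    reviewer_session: str        # hypothetical identifier for the zero-context session
    answers: Dict[str, str]      # question id -> written answer
    classification: str          # hit | partial-hit | miss


@dataclass
class Manifest:
    author_session: str          # hypothetical identifier for the authoring session
    status: str = "draft"
    review_notes: List[dict] = field(default_factory=list)


def promote(manifest: Manifest, review: Review) -> Manifest:
    """Promote draft -> active only after a complete review from a different session."""
    if manifest.status != "draft":
        raise ValueError("only draft manifests can be promoted")
    if review.reviewer_session == manifest.author_session:
        raise PermissionError("author cannot review their own work")
    missing = [q for q in QUESTION_IDS if not review.answers.get(q, "").strip()]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    if review.classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {review.classification}")
    # Record the outcome in the manifest before changing status.
    manifest.review_notes.append(
        {"classification": review.classification, "answers": dict(review.answers)}
    )
    manifest.status = "active"
    return manifest
```

Comparing session identifiers is one cheap way to enforce that a capability is never promoted in the session that authored it. A real implementation would need a session identity that the author cannot forge.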

What we never do

Hard limits. Not aspirational.

  • We never make an LLM call without a user-invoked trigger. No background telemetry-driven prompts.

  • We never send the contents of files outside .claude/ to Groq.

  • We never log full credentials. License hashes are truncated to their last 6 characters in any log line.

  • We never train on your code. The kit has no feedback loop into model training.

  • We never promote a draft capability to active in the same session it was authored.

  • We never write outside .claude/ without explicit user consent declared in the manifest preview.
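
Two of the limits above are mechanical enough to sketch: truncating license hashes before logging, and checking that a write target stays inside .claude/. Both helpers are hypothetical illustrations, not the kit's code. The handling of hashes shorter than six characters is an assumption this page does not cover.

```python
from pathlib import Path


def redact_license_hash(license_hash: str, keep: int = 6) -> str:
    """Return a log-safe form that exposes only the last `keep` characters."""
    if len(license_hash) <= keep:
        # Assumption: a value this short is masked entirely rather than shown.
        return "*" * len(license_hash)
    return "..." + license_hash[-keep:]


def is_inside_claude_dir(project_root: Path, target: Path) -> bool:
    """True if target resolves to a location at or under <project_root>/.claude/."""
    claude_dir = (project_root / ".claude").resolve()
    # Resolving first collapses ".." segments, so traversal tricks are caught.
    resolved = (project_root / target).resolve()
    return resolved == claude_dir or claude_dir in resolved.parents
```

For example, `redact_license_hash("abcdef0123456789")` returns `"...456789"`, and a target such as `.claude/../src/main.py` is reported as outside .claude/ because the path is resolved before the comparison.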

A note on Groq

Currently exploratory.

If we adopt Groq for fast hook-time decisions (sub-300 ms safety checks, session briefs), it'll appear in the table above with full disclosure: which model, what data leaves the machine, and the opt-out flag. Until then, this row stays out of the table. No surprises.