Claude Code Kit

Autonomous operation

For capabilities that need to mutate state on their own, without a human in the loop for every action, the kit ships the L3 control plane: a kill switch, rate limits, rollback records, a hash-chain audit log, and adversarial review before any capability goes live.

The 5-check rule

Before a capability can be called autonomous, it must pass five checks.

Activated

The component ran on real data this week. Not in a test harness — on live inputs.

Visible

Its output is on a dashboard a human can read without SSH'ing into the box.

Logged

Every decision is in a structured log with reasoning and confidence fields. Opaque audit logs are not audit logs.

Self-healing

When it fails, it retries with backoff and recovers without manual intervention. If a human has to restart it, it is not self-healing.

Self-expanding

When the component operates cleanly within its current bounds, it proposes a wider window for human review. It does not expand itself; it asks.
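A minimal sketch of the Logged and Self-healing checks, assuming a JSONL decision log. The function names `log_decision` and `run_with_retry`, the record fields, and the backoff constants are hypothetical illustrations, not the kit's actual interface.

```python
import json
import random
import time
from datetime import datetime, timezone


def log_decision(path, action, reasoning, confidence):
    """Append one structured decision record (hypothetical schema) as a JSON line."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "reasoning": reasoning,
        "confidence": confidence,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def run_with_retry(fn, attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on an exception, wait with capped exponential backoff and try again.

    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so several failing components do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Re-raising after the last attempt is deliberate in this sketch: retries absorb transient faults, but a persistent failure should still reach the log and the dashboard rather than being swallowed.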

Dashboard-first

If you have to SSH in to check on it, the architecture is wrong.

Status, logs, and audit trail are all visible on a single dashboard accessible to non-engineers. The rule is not aspirational — if the output of a capability requires SSH to observe, it fails the Visible check and cannot be promoted to active.

The systemd timer schedule runs on the server. Reports land on the dashboard. A human reading the daily report should be able to answer "did it run, did it do the right thing, did it find any anomalies" without opening a terminal.

Timer schedule

What runs, when.

Every 15 min    Health check + heartbeat
Daily           Report: delta vs yesterday, anomaly count, decisions made
Daily           Data download: refresh inputs from upstream
Every 6 h       Advisory pass: read-only insight generation
Weekly          Retrain / refit (if applicable)

The L3 control plane

15 steps. Every one required. No shortcuts.

A capability in status: draft cannot be promoted to active unless all 15 steps are in place. See also: AI transparency for the full L0–L4 autonomy spectrum.

01

Kill-switch file

The file logs/CLAUDE_HALT is checked before every cycle. If it exists, the capability halts.
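One way the kill-switch check might look, as a sketch. The path comes from the text above; resolving it relative to the working directory, and the helper names `should_halt` and `run_cycle`, are assumptions.

```python
from pathlib import Path

# Path from the docs; treated as relative to the working directory (assumption).
HALT_FILE = Path("logs/CLAUDE_HALT")


def should_halt(halt_file=HALT_FILE):
    """True if the kill-switch file exists. Only existence matters, not contents."""
    return halt_file.exists()


def run_cycle(do_work, halt_file=HALT_FILE):
    """Check the kill switch before anything else; return None without working if halted."""
    if should_halt(halt_file):
        return None
    return do_work()
```

Under this sketch an operator halts the capability with `touch logs/CLAUDE_HALT` and resumes it by deleting the file. Only existence is checked, so an empty file is enough.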

02

Capability manifest

YAML file per capability. status: draft|active|deprecated. No capability runs without a manifest.

03

Schema validation

Inputs and outputs validated against declared schema. Malformed input is rejected, not silently corrupted.

04

Bounds check

Every parameter is checked against declared range before the action runs.
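A sketch of the bounds check, assuming the manifest's `bounds` have been loaded as a mapping from parameter name to an inclusive `(low, high)` pair. `check_bounds` is a hypothetical name, and rejecting a parameter with no declared range is a design choice of this sketch, read from "every parameter is checked".

```python
def check_bounds(params, bounds):
    """Raise ValueError unless every parameter has a declared range and lies inside it.

    `bounds` maps name -> (low, high), both inclusive (hypothetical format).
    """
    for name, value in params.items():
        if name not in bounds:
            raise ValueError(f"{name}: no declared bounds, refusing to run")
        low, high = bounds[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
```

Called with the proposed parameters right before the action, a `ValueError` stops the cycle before anything is mutated.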

05

Rate limit

Hourly and daily counters, updated under an exclusive fcntl.flock lock. Two concurrent runs cannot both read the same count and both increment it.
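A sketch of a file-backed rate limiter, assuming the counters live in a small JSON file. `try_consume` and the file layout are hypothetical; `fcntl.flock` is the real standard-library call (Unix only). The point is that the exclusive lock covers the whole read-check-write sequence, not just the write.

```python
import fcntl
import json
import os
import time


def try_consume(counter_path, hourly_cap, daily_cap, now=None):
    """Atomically check and bump the hourly and daily counters.

    Returns True if this run may proceed, False if either cap is reached.
    """
    now = time.time() if now is None else now
    hour_key, day_key = int(now // 3600), int(now // 86400)  # UTC hour and day buckets
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+", encoding="utf-8") as f:
        # Exclusive lock held across read, check, and write; released when f closes.
        fcntl.flock(f, fcntl.LOCK_EX)
        raw = f.read()
        state = json.loads(raw) if raw else {}
        # Start a fresh count when the hour or day bucket has rolled over.
        if state.get("hour") != hour_key:
            state["hour"], state["hourly"] = hour_key, 0
        if state.get("day") != day_key:
            state["day"], state["daily"] = day_key, 0
        if state["hourly"] >= hourly_cap or state["daily"] >= daily_cap:
            return False
        state["hourly"] += 1
        state["daily"] += 1
        f.seek(0)
        f.truncate()
        f.write(json.dumps(state))
        f.flush()
        os.fsync(f.fileno())
    return True
```

Locking only the write would not be enough: two runs could both read a count of 4 under a cap of 5, and both proceed.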

06

Cooldown timer

Minimum interval between cycles. Prevents burst behavior after a brief stall.
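A sketch of the cooldown timer, assuming the last cycle's start time is stored as a Unix timestamp in a file. `cooldown_elapsed` and `record_cycle` are hypothetical names; the `cooldown` value would come from the manifest.

```python
import os
import time


def cooldown_elapsed(stamp_path, cooldown_seconds, now=None):
    """True if at least `cooldown_seconds` have passed since the last recorded cycle."""
    now = time.time() if now is None else now
    try:
        with open(stamp_path, encoding="utf-8") as f:
            last = float(f.read().strip())
    except FileNotFoundError:
        return True  # no previous cycle recorded
    return now - last >= cooldown_seconds


def record_cycle(stamp_path, now=None):
    """Record the start time of the cycle that is about to run."""
    now = time.time() if now is None else now
    tmp = stamp_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(repr(now))
    os.replace(tmp, stamp_path)  # atomic rename, so readers never see a half-written stamp
```

This is what damps the burst: if several runs become due at once after a stall, every run after the first finds the cooldown unexpired and exits.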

07

Preconditions check

Required upstream components must be active. If a dependency is down, halt — do not proceed on stale data.

08

Pre-state snapshot

Snapshot of state before mutation. Rollback can be exact because the pre-state is recorded.

09

Rollback record

Written before the mutation. If the mutation fails mid-flight, recovery is possible.

10

The mutation

The actual change, guarded by every step before it and checked by every step after it.

11

Outcome check

outcome_action_on_worse: rollback|accept|alert. Not every worse outcome is a rollback — but every worse outcome is a decision.
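Steps 08 to 11 can be sketched together on an in-memory dict. Everything here is hypothetical: the function name, the rollback-record format, and the `score` callable (higher is better) standing in for whatever outcome metric a capability declares. Treating `alert` as "keep the new state but flag it for a human" is also an assumption; the text only says that every worse outcome triggers a decision.

```python
import copy
import json
import os

ON_WORSE = ("rollback", "accept", "alert")


def mutate_with_rollback(state, mutate, score, on_worse, record_path):
    """Run steps 08-11 around one mutation of a dict; return (resulting_state, status)."""
    if on_worse not in ON_WORSE:
        raise ValueError(f"unknown outcome_action_on_worse: {on_worse!r}")
    pre = copy.deepcopy(state)  # 08: pre-state snapshot
    # 09: rollback record is durable on disk before the mutation starts.
    with open(record_path, "w", encoding="utf-8") as f:
        json.dump({"pre_state": pre}, f)
        f.flush()
        os.fsync(f.fileno())
    try:
        post = mutate(copy.deepcopy(pre))  # 10: the mutation, applied to a copy
    except Exception:
        return pre, "rolled_back_on_error"
    # 11: outcome check; `score` is higher-is-better (assumption).
    if score(post) >= score(pre):
        return post, "ok"
    if on_worse == "rollback":
        return pre, "rolled_back"
    if on_worse == "alert":
        return post, "alert"  # kept, but flagged for a human (assumption)
    return post, "accepted"
```

In a real deployment the rollback record on disk is what a recovery process would read after a crash mid-mutation; the sketch only returns the in-memory snapshot.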

12

Outcome recorded

The outcome is written to the log with reasoning and confidence.

13

Hash-chain audit log entry

SHA256 hash-chain entry linking this action to the previous entry. Any gap or tampering is detectable.
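A minimal SHA-256 hash chain using only the standard library. The entry layout, the all-zeros genesis value, and the function names are assumptions. Each entry's hash covers the previous hash plus the canonical JSON of its payload, so editing or removing any entry breaks the link that follows it.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def _entry_hash(prev_hash, payload):
    # Canonical JSON (sorted keys, no whitespace) so equal payloads always hash equally.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append an entry whose hash covers the previous entry's hash and this payload."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": _entry_hash(prev, payload)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return None
```

One limit worth knowing: dropping entries from the end leaves a shorter chain that still verifies. Recording the latest head hash somewhere outside the log closes that gap.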

14

Pattern-poisoning check

If memory keeps reinforcing the same answer, the kit alerts the human. Could be poisoning.

15

Health check + heartbeat

The component signals it is alive and functioning. Silence is treated as failure.
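A sketch of the heartbeat side of step 15, as a dashboard check might evaluate it. `is_alive` and the `grace` factor (tolerance for timer jitter) are hypothetical; with the 15-minute health-check timer, `expected_interval` would be 900 seconds.

```python
import time


def is_alive(last_beat, expected_interval, now=None, grace=2.0):
    """Silence is failure: False if there is no heartbeat or it is too old.

    `last_beat` is a Unix timestamp, or None if the component never signalled.
    A beat up to grace * expected_interval old still counts as alive.
    """
    if last_beat is None:
        return False
    now = time.time() if now is None else now
    return now - last_beat <= grace * expected_interval
```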

Capability manifest

One YAML file per capability.

The manifest is not documentation — it is machine-read on every run. Every check in the control plane reads from this file.

id

Unique identifier. Stable — never renamed after first ship.

status

draft | active | deprecated. draft is the only allowed status on first ship.

inputs.requires

List of upstream components that must be active.

outputs.schema

Declared output shape. Validated on every run.

bounds

Parameter ranges. Checked before every mutation.

rate_limit

Hourly and daily caps.

cooldown

Minimum seconds between cycles.

outcome_action_on_worse

rollback | accept | alert

review_notes

Adversarial review output. Required for promotion to active.
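A sketch of manifest validation, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, omitted here to stay within the standard library). The nested keys `rate_limit.hourly` and `rate_limit.daily` and the name `validate_manifest` are hypothetical; the other field names follow the list above.

```python
REQUIRED = ("id", "status", "inputs", "outputs", "bounds",
            "rate_limit", "cooldown", "outcome_action_on_worse")
STATUSES = {"draft", "active", "deprecated"}
ON_WORSE = {"rollback", "accept", "alert"}


def validate_manifest(m, first_ship=False):
    """Return a list of problems with a parsed manifest dict; an empty list means valid."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in m]
    status = m.get("status")
    if status not in STATUSES:
        problems.append(f"bad status: {status!r}")
    if first_ship and status != "draft":
        problems.append("first ship must be status: draft")
    inputs = m.get("inputs")
    if not isinstance(inputs, dict) or not isinstance(inputs.get("requires"), list):
        problems.append("inputs.requires must be a list")
    outputs = m.get("outputs")
    if not isinstance(outputs, dict) or "schema" not in outputs:
        problems.append("outputs.schema is required")
    rate_limit = m.get("rate_limit")
    if not isinstance(rate_limit, dict) or not {"hourly", "daily"}.issubset(rate_limit):
        problems.append("rate_limit needs hourly and daily caps")
    if m.get("outcome_action_on_worse") not in ON_WORSE:
        problems.append("outcome_action_on_worse must be rollback, accept, or alert")
    if status == "active" and not m.get("review_notes"):
        problems.append("status: active requires review_notes")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a single dashboard entry show everything blocking a capability.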

Adversarial review gate

A fresh Claude, zero context, reviews before promotion.

A capability ships in status: draft. Promotion to active requires a fresh-Claude review in a zero-context session — no conversation history, no project context loaded. The reviewer starts clean.

Eight structured questions are answered: worst-case attack vector, broken trust model, theater test (does this actually do anything), memory pathologies, outcome window timing, simulation-vs-reality scenarios, capability-count ceiling, and the one cheap rule that would catch this. Each question receives a hit, partial-hit, or miss classification, recorded in review_notes in the manifest.

Hard rule

The author of a capability is never the reviewer. The kit refuses same-session promotion — a capability written and reviewed in the same session cannot be promoted to active.
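A sketch of the promotion gate. The eight question keys, the shape of `review_notes` (question key mapped to `hit`, `partial-hit`, or `miss`), and the session identifiers are hypothetical. Code can compare session IDs, but it cannot prove the reviewer had zero context; that still depends on how the review session is launched.

```python
QUESTIONS = (
    "worst_case_attack_vector", "broken_trust_model", "theater_test",
    "memory_pathologies", "outcome_window_timing", "simulation_vs_reality",
    "capability_count_ceiling", "one_cheap_rule",
)
VERDICTS = {"hit", "partial-hit", "miss"}


def can_promote(manifest, author_session, reviewer_session):
    """Return (allowed, reason) for a draft -> active promotion request."""
    if manifest.get("status") != "draft":
        return False, "only a draft capability can be promoted"
    if author_session == reviewer_session:
        return False, "same-session promotion refused: the author cannot be the reviewer"
    notes = manifest.get("review_notes") or {}
    missing = [q for q in QUESTIONS if q not in notes]
    if missing:
        return False, "review_notes missing: " + ", ".join(missing)
    unclassified = [q for q in QUESTIONS if notes[q] not in VERDICTS]
    if unclassified:
        return False, "no hit/partial-hit/miss verdict for: " + ", ".join(unclassified)
    return True, "ok"
```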

Agent memory

Append-only. AST-enforced. Recency-weighted.

  • Append-only JSONL

    One file per agent, scoped to the project. An AST-level test enforces that no delete, update, clear, or truncate operations appear anywhere in the memory code path.

  • baseline_anchor field

    Anchors recurring decisions. When the same decision comes up repeatedly, the anchor suppresses noise and surfaces only genuine drift.

  • Recency-weighted load

    On read, the most recent N entries are weighted higher. Old decisions decay without being deleted.

  • Pattern-reinforcement defense

    If memory keeps suggesting the same answer, the kit alerts the human. Consistent reinforcement of a single answer is a poisoning signal — not a sign the memory is working.
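A sketch of the memory behaviours above, with hypothetical names and constants throughout. Entries are appended in mode `"a"`; on load, the most recent `n` entries get full weight and older ones decay by half every `half_life` entries (one possible recency weighting, not necessarily the kit's); a single answer holding most of the weight is flagged for a human. `forbidden_ops` shows the kind of AST scan the text describes.

```python
import ast
import json
from collections import Counter

FORBIDDEN = {"delete", "update", "clear", "truncate"}


def append_memory(path, entry):
    """Append one decision as a JSON line. Mode "a" never rewrites earlier lines."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def load_weighted(path, n=20, half_life=10.0):
    """Return (entry, weight) pairs: the newest n get weight 1.0, older ones decay.

    Weight halves every `half_life` entries beyond the recent window; nothing is dropped.
    """
    with open(path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    weighted = []
    for i, entry in enumerate(entries):
        age = len(entries) - 1 - i          # 0 = newest
        beyond = max(0, age - n + 1)        # distance outside the recent window
        weighted.append((entry, 0.5 ** (beyond / half_life)))
    return weighted


def reinforcement_alert(weighted, key="answer", threshold=0.8, min_entries=5):
    """Return the dominant answer if it holds more than `threshold` of the weight, else None."""
    if len(weighted) < min_entries:
        return None
    totals = Counter()
    for entry, weight in weighted:
        totals[entry.get(key)] += weight
    answer, share = totals.most_common(1)[0]
    # Dominance is a possible poisoning signal for a human to check, not a success metric.
    return answer if share / sum(totals.values()) > threshold else None


def forbidden_ops(source):
    """Sorted (line, op) pairs for del statements, forbidden calls, and open(..., "w...")."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Delete):
            found.append((node.lineno, "del"))
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in FORBIDDEN:
                found.append((node.lineno, name))
            elif (name == "open" and len(node.args) > 1
                  and isinstance(node.args[1], ast.Constant)
                  and "w" in str(node.args[1].value)):
                found.append((node.lineno, "open-w"))  # mode "w" truncates the file
    return sorted(found)
```

A name-based scan is a tripwire, not a proof. It flags `dict.update` as readily as a file rewrite, which is arguably the right bias for a memory path, and it cannot see through indirection such as `getattr(f, "trunc" + "ate")`.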

The hard limit

L4 is never allowed.

L0–L3 are supported. L4 — unbounded autonomous operation — is not in the kit, not on the roadmap, and not possible to reach by combining L3 primitives. The framework refuses to ship a manifest with status: active that lacks any of the 15 control-plane steps. There is no override.