Chapter 5. Bounded autonomy as discipline

Bounded autonomy is the most under-treated load-bearing concept in the agentic literature. Most pattern catalogs mention it as a property — “patterns must respect bounded autonomy” — without naming the structural commitments that make it real. This chapter treats bounded autonomy as a discipline in its own right: a set of explicit limits, enforced by deterministic infrastructure outside the agent, that together make the agent’s behavior describable, testable, and recoverable.

The argument of the chapter is straightforward and uncompromising:

Important: An agent that cannot be bounded is not a system component. It is a research artifact that occasionally produces useful output.

Unbounded agents look fine in demos. They produce incidents in production. The difference between the two is not the model, the prompt, or the framework: it is whether the surrounding architecture enforces explicit, multi-dimensional limits on what the agent is permitted to do.

The literature treats bounding inconsistently. Andrew Ng’s four-pattern framework mentions iteration limits as a feature of ReAct. Gulli (2025) discusses resource optimization as a pattern. The CSIRO catalog (2025) calls out bounded reasoning loops as a recurring concern across patterns. None develop bounding as the substrate that the other patterns rest on. This chapter does, because in the large majority of production incidents this author and the surveyed literature have cataloged, the proximate cause traces back to a missing or weakly enforced bound rather than to a failure of the cognitive pattern or the model itself.

What bounded autonomy is and is not

Bounded autonomy is not a moral or safety stance. It is an engineering property: the agent’s reasoning loop and action surface are explicitly limited along several axes, and the limits are enforced from outside the agent.

It is not a feature toggle. “Enable safety mode” is not bounded autonomy. Bounded autonomy is a structural commitment; if it has to be enabled, it can be disabled, and a system whose safety is opt-in is not safe.

It is not a single dimension. Capping iterations alone is insufficient: a single iteration can spend thousands of dollars on a tool call. Capping tokens alone is insufficient: a token-cheap run can corrupt production state. Bounded autonomy is the intersection of limits along multiple axes.

It is not the agent’s job. An agent cannot reliably bound itself under uncertainty or adversarial conditions. The architectural commitment is that bounds are enforced by infrastructure the agent does not control.

It is not the bounds the agent declares it will respect. A skill (Chapter 10) may declare its expected iteration count or cost; the architecture treats the declaration as advisory and enforces its own limits regardless.

What it is: a set of explicit, multi-dimensional, externally enforced limits on iterations, cost, time, action surface, data access, and recoverability, together with the audit trail that proves the limits held.

The six axes of bounding

Bounded autonomy in this book consists of explicit limits along the following six axes. Architects should be able to state, for any agent in a system, the value chosen on each axis and the mechanism that enforces it.

The six axes are not arbitrary. They correspond to the six ways an agent can do damage if uncontrolled: it can iterate forever (axis 1), spend without limit (axis 2), block resources indefinitely (axis 3), use the wrong tool (axis 4), read or write the wrong data (axis 5), or take an action that cannot be undone (axis 6). Each axis has a defense; each defense is a structural commitment. The MIT 2025 AI Agent Index — an in-depth survey of thirty deployed production agents — reports that where safeguards are disclosed, high-risk actions are typically constrained through permissions limits, guardrails, or approval workflows rather than model instruction alone; the same survey documents how often those safeguards are not disclosed at all (Chapter 21).

1. Iteration limit

The maximum number of reasoning/action steps the agent may take before the loop is aborted, regardless of whether the goal is satisfied. Iteration limits prevent thrashing: the loop in which the agent makes incremental modifications without converging.

Enforcement. External counter, incremented per step, checked before each step. The check is not a request to the agent (“are you done yet?”); it is a hard precondition. When the counter hits the cap, the loop is aborted and an iteration_exhausted event is emitted to the trace.
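The enforcement rule above can be sketched in a few lines. This is a minimal illustration, not framework code: the names IterationGuard and run_loop are hypothetical, and the step callable stands in for whatever one reasoning/action step looks like in a given system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IterationGuard:
    """External step counter; the agent never reads or sets it."""
    limit: int
    steps: int = 0
    trace: List[str] = field(default_factory=list)

    def admit(self) -> bool:
        """Hard precondition, checked before each step."""
        if self.steps >= self.limit:
            self.trace.append("iteration_exhausted")
            return False
        self.steps += 1
        return True


def run_loop(step: Callable[[], bool], guard: IterationGuard) -> str:
    """Run `step` until it reports completion or the guard aborts the loop."""
    while guard.admit():
        if step():  # step returns True when the goal is satisfied
            return "completed"
    return "aborted"
```

The loop never asks the agent whether it is finished with its budget; the guard decides, and the abort is recorded in the trace whether or not the agent would have agreed.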

Typical values. 3–10 for interactive systems; 20–50 for batch research agents; 100+ only in clearly characterized long-running tasks with separate checkpointing. The values are not the point; the discipline is. The architectural commitment is that there is a value, it is enforced externally, and the abort behavior is deterministic.

A concrete failure. A customer-support agent, given an ambiguous goal, attempted to “make the customer happy” by escalating discounts iteratively without termination. The model emitted plausible language at each step; there was no external iteration cap. The agent issued seventeen sequential refunds against a single account before a monitoring alert caught the cost spike. A two-line iteration cap would have prevented the incident.

Failure mode if absent. Infinite loops (Chapter 11). Common cause of incident-grade cost overruns. Also the most common cause of stuck sessions: agents that appear to be working but are making no measurable progress.

2. Cost budget

A hard ceiling on the total monetary cost of the run, measured across all model calls, tool invocations, and downstream resource use. Cost is not the same as tokens; a single tool call can cost dollars while consuming few tokens. The reverse is equally true: a cheap-per-token operation can become expensive when invoked thousands of times across a long-running session.

Enforcement. A budget tracker that aggregates known cost dimensions (token cost per model call; published per-call cost for paid tools; estimated cost for unmetered tools). The tracker is consulted before every priced operation. Hard abort at the ceiling.

The tracker must measure aggregate cost, not per-call cost. A budget that admits any individual call below a threshold but does not aggregate across calls fails to bound a session that makes many cheap calls. Conversely, a budget that only measures aggregate without a per-call check misses the failure where a single tool call is the cost incident.
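A minimal sketch of a tracker that applies both checks, assuming costs are passed as decimal strings in USD. CostLedger, BudgetExceeded, and the method names are hypothetical; a real tracker would also need the cost estimates for unmetered tools mentioned above.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a priced operation would breach a ceiling."""


class CostLedger:
    def __init__(self, total_usd: str, per_call_usd_max: str) -> None:
        self.total = Decimal(total_usd)
        self.per_call_max = Decimal(per_call_usd_max)
        self.spent = Decimal("0")

    def authorize(self, estimated_usd: str) -> None:
        """Pre-flight check, consulted before every priced operation."""
        estimate = Decimal(estimated_usd)
        if estimate > self.per_call_max:
            raise BudgetExceeded("per-call ceiling")
        if self.spent + estimate > self.total:
            raise BudgetExceeded("aggregate ceiling")

    def record(self, actual_usd: str) -> None:
        """Book what was actually billed once the operation finishes."""
        self.spent += Decimal(actual_usd)
```

Decimal is used here so that many small charges sum exactly; with binary floats, thousands of sub-cent calls can drift away from the billed total.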

Cost also has a timing problem the other axes do not. With token-priced model calls, the exact cost of a call is not known until it finishes, and a streaming response can blow past the remaining budget mid-generation. A tracker that only checks before a call (pre-flight) cannot stop a single call that overshoots. The bounding layer therefore needs in-flight enforcement as well: set a hard max_tokens on each model request, derived from the remaining budget, so the provider truncates before the ceiling is breached; and for streaming responses, meter tokens as they arrive and sever the stream the moment the budget is exhausted. After the call, reconcile the actual billed cost against the estimate (post-flight) and carry any discrepancy into the ledger. Pre-flight checks bound the decision to start a call; in-flight and post-flight accounting bound its outcome.
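The in-flight half of that accounting can be sketched as two small functions. The sketch assumes integer micro-dollar prices to avoid rounding surprises and counts one streamed chunk as one token purely for illustration; max_tokens_for and metered are hypothetical names, and real token pricing and chunking depend on the provider.

```python
from typing import Iterable, Iterator


def max_tokens_for(remaining_micro_usd: int,
                   micro_usd_per_token: int,
                   hard_cap: int) -> int:
    """Largest output-token limit that cannot overshoot the remaining budget."""
    if remaining_micro_usd <= 0:
        return 0
    return min(remaining_micro_usd // micro_usd_per_token, hard_cap)


def metered(chunks: Iterable[str], token_allowance: int) -> Iterator[str]:
    """Pass streamed chunks through until the allowance is used, then stop reading."""
    if token_allowance <= 0:
        return
    used = 0
    for chunk in chunks:
        yield chunk
        used += 1  # illustration only: one chunk is treated as one token
        if used >= token_allowance:
            break  # sever the stream; nothing further is pulled from the source
```

The first function feeds the per-request limit; the second is the fallback for the case where the provider-side limit is missing or looser than the remaining budget.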

Truncation has a correctness consequence distinct from its cost one. When a model call ends because it hit a token ceiling, whether the provider’s own limit, the gateway’s cap (Chapter 15), or the max_tokens value the budget tracker imposed as described above, its output is incomplete, and the stop reason says so. The bounding layer must treat such a step as failed, detected deterministically from that stop reason, and must never parse a truncated reasoning step or a half-written tool argument as a valid result. A truncated step routes to a clean retry or, if it had already begun a multi-step action, to the compensating saga (Chapter 9); it is never fed forward as though complete. Bounds abort the loop between steps, not by splicing a half-finished thought into the next one.
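A minimal sketch of that routing decision, assuming the step result has already been normalized into a small record. StepResult and classify are hypothetical names, and the stop-reason strings in TRUNCATION_REASONS are assumptions that must be checked against the provider actually in use.

```python
from dataclasses import dataclass

# Assumed stop-reason values that signal a token-limit cutoff; providers name
# these differently, so verify the exact strings in the provider documentation.
TRUNCATION_REASONS = frozenset({"max_tokens", "length"})


@dataclass(frozen=True)
class StepResult:
    text: str
    stop_reason: str


def classify(step: StepResult, action_in_progress: bool) -> str:
    """Decide what the loop does with a finished model call."""
    if step.stop_reason not in TRUNCATION_REASONS:
        return "accept"
    # Truncated output is never parsed; it is retried or compensated.
    return "compensate" if action_in_progress else "retry"
```

The decision reads only the stop reason, never the text, so a truncated output that happens to look well-formed is still rejected.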

Typical values. Per-task ceiling matched to the task’s economic value, with an order-of-magnitude headroom. A coding task for a feature worth several thousand dollars to the business can tolerate a several-dollar agent run; a customer-support session should be measured in cents. Set the ceiling at a multiple of the median observed cost, not at a multiple of the maximum tolerable cost; this catches drift earlier.

The greenfield problem. A new system has no production cost history to set a median-based ceiling against. Use the bootstrap procedure in Chapter 12: initial ceiling from task economic value plus headroom, then tighten from observed distributions once seeded sessions exist.

A concrete failure. A research-agent deployment in 2025 had a per-iteration cost check but no aggregate ceiling. The agent’s individual tool calls were cheap (search API queries at fractions of a cent), but its loops ran long, and the aggregate cost per session occasionally exceeded several hundred dollars. The team discovered the issue from their cloud bill, not from monitoring. An aggregate cost ceiling per session (say, $5) would have aborted the runaway sessions automatically.

Anthropic’s Project Vend (June 2025; Chapter 21) is a named primary-source parallel: over a month, a Claude instance (“Claudius”) ran a small in-office shop with pricing and inventory tools and accumulated net operating losses through below-cost sales, employee-directed discounting, and a hallucinated Venmo payment address — long-horizon economic drift, not a token bill, but the same architectural shape (no aggregate exposure cap, no architectural in-flight governance gate beyond configuration; humans were present in the experiment, but no runtime gate interrupted the drift). A session cost ceiling would not have caught Claudius; an aggregate business-loss bound or a plan approval gate on pricing policy would have.

Failure mode if absent. Cost explosions, often invisible until the bill arrives. A single misbehaving agent can produce a five-figure spend in hours. A fleet of agents with weak bounds can produce a six-figure spend in a weekend.

3. Time budget

The wall-clock limit for the run. Independent of iteration and cost limits because tool calls can block for long periods even at low token cost. A tool that hangs on a slow downstream service can hold a session open for hours without making progress; the iteration counter is not incremented, the cost ledger barely moves, and the agent is effectively stuck.

Enforcement. A deadline timer that aborts the loop. The deadline must be checked between steps and during tool invocations (cooperative interruption is rarely sufficient under failure). Tool wrappers should themselves carry per-call deadlines that are tighter than the overall session deadline, so a single slow tool does not consume the whole budget.
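One way to express the two nested deadlines, sketched with asyncio. Deadline and call_tool are hypothetical names. The sketch relies on asyncio.wait_for cancelling the awaited coroutine, which only works for tools that yield control; a tool that blocks a thread or hangs in native code would need process-level isolation instead.

```python
import asyncio
import time
from typing import Awaitable, Callable


class Deadline:
    """Session-level wall-clock deadline on a monotonic clock."""

    def __init__(self, seconds: float) -> None:
        self.expires_at = time.monotonic() + seconds

    def remaining(self) -> float:
        return max(0.0, self.expires_at - time.monotonic())


async def call_tool(tool: Callable[[], Awaitable[str]],
                    session: Deadline,
                    per_call_seconds: float) -> str:
    """Run one tool call under the tighter of the per-call and session deadlines."""
    timeout = min(per_call_seconds, session.remaining())
    if timeout <= 0:
        return "deadline_exceeded"  # checked between steps, before the call starts
    try:
        return await asyncio.wait_for(tool(), timeout=timeout)
    except asyncio.TimeoutError:
        return "tool_timeout"  # enforced during the invocation
```

Because the per-call timeout is clamped to the session's remaining time, a single slow tool can neither outlive the session nor consume all of it.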

Typical values. Interactive systems target seconds (a chat response under 30s; an IDE assistant under 10s). Batch agents target minutes to hours. Set conservatively; running long is usually a symptom, not a feature.

A concrete failure. An operations-controller agent invoked a diagnostic tool that, under a specific failure of the target system, hung indefinitely. The agent’s iteration counter did not advance (the agent was waiting on the tool result), and no deadline was enforced at the session level. The session held resources for over 12 hours before a routine restart caught it. A 5-minute session deadline would have aborted within a tolerable window and surfaced the underlying tool problem.

Failure mode if absent. Stuck agents holding resources. Cascading SLO violations. Customer-visible hangs. Especially severe for operations and incident-response agents where a stuck session can mean a stuck remediation.

4. Action surface

The set of tools and operations the agent is permitted to invoke. The most powerful axis: a constrained action surface narrows the failure modes far more than any prompt engineering. An agent that cannot call the production database cannot corrupt it. An agent that cannot send email cannot leak data through email. An agent that cannot spawn processes cannot escape its sandbox by spawning a shell.

Enforcement. A capability layer between the agent and the world. Tool adapters expose only the operations the agent is allowed to call, with their schemas and authorization scopes. Tools the agent is not allowed to use should not be discoverable: they are not in the tool list the agent receives, and attempts to invoke them by name fail at the adapter, not at the underlying service.

The action surface should be expressed as a positive list (these tools are allowed) rather than a negative list (these are forbidden). Positive lists fail closed; negative lists fail open whenever a new tool is added without updating the list.
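A minimal sketch of a positive-list adapter. ToolAdapter and ToolNotPermitted are hypothetical names, and the registry is assumed to be a plain mapping from tool name to callable; schemas and authorization scopes are omitted.

```python
from typing import Any, Callable, Dict, List


class ToolNotPermitted(Exception):
    """Raised at the adapter, before any underlying service is touched."""


class ToolAdapter:
    def __init__(self, registry: Dict[str, Callable[..., Any]],
                 allowed: List[str]) -> None:
        # Positive list: only tools that are both registered and allowed exist
        # from the agent's point of view.
        self._tools = {name: registry[name] for name in allowed if name in registry}

    def list_tools(self) -> List[str]:
        """The tool list the agent receives; disallowed tools are not discoverable."""
        return sorted(self._tools)

    def invoke(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise ToolNotPermitted(name)
        return self._tools[name](**kwargs)
```

A tool added to the registry later stays invisible until someone adds it to the allowed list, which is the fail-closed behavior the positive list is meant to provide.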

Typical values. Specialist agents have narrow surfaces (3–10 tools). Generalist agents have broader surfaces but pay for it in failure-mode density. A coding agent might have: read file, write file in sandbox, run tests in sandbox, search repo, get diagnostics. That is the entire surface; anything else is unavailable.

A concrete failure. In December 2023, users manipulated a ChatGPT-powered sales chatbot on a Chevrolet dealership website into agreeing to sell a 2024 Tahoe for $1 — binding-sounding language the model emitted after prompt-injection instructions overrode its guardrails (VentureBeat, December 2023; widely reported in the trade press). The bot was not a ReAct agent with a tool registry, but it exposed an implicit effect surface: any customer utterance could steer output toward commercial commitments, with no positive list of permitted effects and no adapter between model text and what the interface could treat as an offer. A constrained action surface — including output effects, not only named tools — and governance at display would have kept the commitment off the record regardless of what the model said.

Failure mode if absent. Tool misuse, unauthorized actions, data exfiltration, supply-chain risks via uncontrolled tool invocation. The single largest source of severe incidents in the surveyed literature.

5. Data access scope

The set of data the agent is permitted to read or write, including memory, retrieval indexes, and tool-side state. Often confused with the action surface but logically distinct: an agent may legitimately call a tool that has access to data the agent should not see. The action-surface check determines whether the agent can call this tool; the data-access-scope check determines what the tool is allowed to return for this agent.

Enforcement. Row-, document-, or tenant-level access policies at the tool layer. Retrieval indexes scoped per session or per user. Memory namespaces isolated by identity. The scope must be carried with every call and enforced at the data layer, not at the agent layer (the agent cannot be trusted to scope itself).
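A minimal sketch of scope enforced at the data layer, using an in-memory store as a stand-in for a database or retrieval index. Scope and ScopedStore are hypothetical names. The point is that the scope arrives with every call and the filter lives in the store, not in anything the agent composes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Scope:
    tenant_id: str
    user_id: str


class ScopedStore:
    """Stand-in data layer: every read and write carries a Scope."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Scope, str]] = []

    def put(self, scope: Scope, text: str) -> None:
        self._rows.append((scope, text))

    def search(self, scope: Scope, term: str) -> List[str]:
        # Default deny: a row is visible only when its scope matches exactly.
        return [text for owner, text in self._rows
                if owner == scope and term in text]
```

The scope is supplied by the session infrastructure when it calls the store. Because the agent never constructs the scope, a prompt that asks for another tenant's rows has nothing to act on.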

Typical values. Always tenant-scoped in multi-tenant systems. Always user-scoped where personal data is involved. Default deny. Cross-scope reads are explicit, audited, and require justification.

A concrete failure. Johann Rehberger disclosed this exfiltration shape in GitHub Copilot Chat (June 2024; Embrace The Red; see Chapter 21). Simon Willison cataloged the same pattern across coding agents in 2023–2024: a coding agent that reads private repository context alongside untrusted content, with Markdown image rendering enabled, can be induced to encode session secrets in image URLs the client then fetches. The failure is private data in the agent’s context co-resident with attacker-influenced content, with no scoping or egress control that separates what may be read from what may leave the session; breaking the lethal trifecta at the architecture (Chapter 6), not the prompt, would have removed the channel.

Failure mode if absent. Cross-tenant data leakage. Prompt-injection-induced exfiltration via attacker-controlled retrieval content (the lethal-trifecta failure, see Chapter 6 and Chapter 11). Among the most severe incident classes the system can suffer: privacy violations, regulatory exposure, and customer-trust loss.

6. Reversibility envelope

The set of actions the agent may take without explicit human approval, defined by what is reversible from the system’s standpoint. Sending a customer email is irreversible (apologies notwithstanding); writing to a draft is reversible. Issuing a refund is irreversible; flagging an account for review is reversible. Pushing to a production branch is irreversible; opening a pull request is reversible.

Enforcement. Two-tier action surface: reversible actions execute directly; irreversible actions are routed to an approval path before they can take effect. That approval path is human-in-the-loop (HITL), and it belongs in the architecture, not the UX. HITL is often treated as an interface nicety (a confirmation dialog), but here it is the enforcement mechanism for the reversibility envelope: a deterministic gate that holds an irreversible action in a pending state until a human, or a stricter policy gate (Chapter 6), authorizes it. The tiering is part of the action-surface specification, not a runtime decision the agent makes.

The architectural test for whether an action belongs in the irreversible tier: if this action was taken in error, what would it cost to undo? If the answer is more than a customer-service note or a database update, the action belongs above the approval line.
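The two-tier routing can be sketched as a gate that executes only actions on a known-reversible list and holds everything else as pending. ApprovalGate and the action names in REVERSIBLE are hypothetical. Listing the reversible tier and defaulting the rest to approval is a choice made in this sketch, so that an unclassified action fails closed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical action names; anything not listed here requires approval.
REVERSIBLE = frozenset({"write_draft", "flag_for_review", "open_pull_request"})


@dataclass
class ApprovalGate:
    pending: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)
    executed: List[str] = field(default_factory=list)

    def submit(self, action: str, run: Callable[[], None]) -> str:
        """Execute reversible actions directly; hold everything else."""
        if action not in REVERSIBLE:
            self.pending.append((action, run))
            return "pending_approval"
        run()
        self.executed.append(action)
        return "executed"

    def approve(self, index: int) -> None:
        """Called by a human reviewer or a stricter policy gate, never by the agent."""
        action, run = self.pending.pop(index)
        run()
        self.executed.append(action)
```

The agent only ever calls submit. Whether its action ran or is waiting is decided by the tier the action belongs to, not by how the agent phrased the request.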

Typical values. Treat as irreversible by default any external communication, financial transaction, state mutation in production systems, or resource provisioning. Conservatism here is rewarded: the cost of a false positive (an approval gate that asks for human review when one was not strictly needed) is a small delay; the cost of a false negative is an incident.

A concrete failure. In Moffatt v. Air Canada (British Columbia Civil Resolution Tribunal, February 2024), the airline’s customer-support chatbot told a passenger he could apply for a bereavement fare retroactively after travel — contradicting the airline’s published policy. The tribunal held Air Canada liable for negligent misrepresentation and ordered compensation; the airline’s argument that the chatbot was a separate legal entity failed. The chatbot’s first customer-facing answer carried enforceable obligation with no verification gate between model output and a representation the airline would be held to. A reversibility envelope that routed fare and refund representations through policy or human review before they reached the customer would have caught the error before it became liability.

Failure mode if absent. Irreversible incidents at the speed of the model. A bug in a workflow takes hours to escalate; an agent without a reversibility envelope can do irreversible damage in seconds.

Where the reversibility envelope does not reach

The six-axis discipline this chapter develops assumes that most irreversible actions can be gated behind human approval, and for the domains the book treats (coding, support, research, and operations) that assumption holds. It does not hold everywhere. Some domains demand autonomous irreversible action at low latency: a trading agent that must execute before a price moves, an autonomous-vehicle control agent that must brake or steer in milliseconds, a real-time fraud-block agent that must hold a transaction before it clears. Such agents cannot wait on a human in the loop, and the reversibility envelope as stated here does not apply to them. The book’s thesis is not universal, and saying so plainly is more honest than letting the framing imply it is. What carries over to those domains is the rest of the discipline: the iteration, cost, time, and action-surface axes still bound the agent, and governance still gates what it may attempt. What does not carry over is the human approval tier, which is replaced by a pre-committed policy the system evaluates deterministically and faster than a human could. The architect in such a domain is not exempt from bounding; they are bounding against a tighter time budget and replacing the human gate with a policy gate whose rules were approved, in advance, by the humans the latency constraint excludes from the loop. This book does not develop that regime in depth; it names its boundary so a reader in such a domain knows where the treatment stops and where their own work begins.

A concrete bounding specification

A bounded-autonomy specification for a coding-assistant agent might look like this. It is the specification for Concord, the worked example developed end-to-end in Chapter 17; the values reappear there inside the running system. It is not framework code; it is the statement of limits that the architecture must enforce somewhere.

agent: coding-assistant
bounds:
  iteration_limit: 30
  cost_budget:
    total_usd: 2.00
    per_tool_call_usd_max: 0.25
  time_budget:
    wall_clock_seconds: 180
    per_tool_call_seconds_max: 30
  action_surface:
    allowed:
      - read_file
      - write_file (sandboxed working dir only)
      - run_tests (sandboxed)
      - search_repo (read-only)
    forbidden:
      - any network egress
      - any process spawn outside sandbox
      - any write outside working dir
  data_access_scope:
    default_scope: per-session sandbox
    no_cross_session_memory_read: true
    no_external_index_read: true
  reversibility_envelope:
    reversible_by_default:
      - sandbox file writes
      - test runs
    requires_human_approval:
      - any commit to a real branch
      - any push to remote
      - any package install

This specification never touches the LLM. It is configuration for the deterministic bounding layer: the gateway or proxy between the model and the tools, called from the agent loop (the harness introduced in Chapter 4, designed in full in Chapter 19, and enforced end-to-end in Chapter 17). The model is entirely blind to it. The agent does not read these limits, cannot negotiate them, and is not aware they exist; it simply experiences a refused tool call or an aborted loop when it crosses one. That blindness is the point: the bound holds precisely because it lives in the deterministic shell, not in anything the probabilistic component can see or alter.

This kind of specification is the load-bearing artifact of bounded autonomy. Without it, the agent is “bounded” only by hope. With it, the system has something to test, audit, and govern. The specification is version-controlled, reviewed at change time, and surfaced in the system’s documentation so that users and reviewers can see what the agent is permitted to do.
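One way to make such a specification testable is to load it into a typed, immutable object that rejects incoherent values at startup. The sketch below is an assumption about how that could look, not the book's implementation: the Bounds class and its field names are hypothetical and cover only the numeric limits and the allowed-tool list, with values copied from the coding-assistant example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Bounds:
    """Immutable, validated view of the numeric limits and the allow-list."""
    iteration_limit: int
    total_usd: float
    per_tool_call_usd_max: float
    wall_clock_seconds: int
    per_tool_call_seconds_max: int
    allowed_tools: FrozenSet[str]

    def __post_init__(self) -> None:
        if self.iteration_limit <= 0:
            raise ValueError("iteration_limit must be positive")
        if not 0 < self.per_tool_call_usd_max <= self.total_usd:
            raise ValueError("per-call cost cap must fit inside the total budget")
        if not 0 < self.per_tool_call_seconds_max <= self.wall_clock_seconds:
            raise ValueError("per-call deadline must fit inside the session deadline")
        if not self.allowed_tools:
            raise ValueError("action surface must list at least one allowed tool")


CODING_ASSISTANT = Bounds(
    iteration_limit=30,
    total_usd=2.00,
    per_tool_call_usd_max=0.25,
    wall_clock_seconds=180,
    per_tool_call_seconds_max=30,
    allowed_tools=frozenset({"read_file", "write_file", "run_tests", "search_repo"}),
)
```

Freezing the object means the limits in force for a session are exactly the ones that were loaded and reviewed; nothing downstream can relax them in place.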

Chapter 17 develops the gateway that enforces a specification of this shape end-to-end, with pseudocode for the iteration counter, the cost ledger, the deadline timer, the action-surface check, the data-scope check, and the reversibility-envelope routing.

The specification is also what the system regenerates as part of incident response. After an incident, the team reviews the bounds in force at the time and asks whether tighter bounds would have prevented or contained the failure. The specification is the artifact of that review.

Where bounds live in the architecture

Bounds are not properties of the agent. They are properties of the surrounding deterministic infrastructure. The architectural placement matters because bounds enforced by the agent itself are unreliable: the model can choose to ignore an instruction; it cannot choose to exceed a counter that aborts its process.

The standard placement is a bounding layer between the agent and the rest of the system:

Figure 2. Where bounds live in the architecture

Architecturally, the bounding layer is the first deterministic surface the agent’s outputs encounter. The agent does not call tools directly; it calls them through the bounding layer’s gateway, which checks the cost ledger, deadline, action allow-list, data access scope, and reversibility envelope before forwarding the call. If any check fails, the call is refused or escalated, and the agent observes the refusal as a normal tool result (so it can adapt).
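A reduced sketch of that gateway, covering only the allow-list and cost checks to keep it short. Gateway and the refusal reason strings are hypothetical, and the remaining checks (deadline, data scope, reversibility) would slot into the same method. The refusal comes back as an ordinary result value rather than an exception, so the agent can observe it and adapt.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass
class Gateway:
    tools: Dict[str, Callable[..., Any]]
    allowed: FrozenSet[str]
    budget_usd: float
    spent_usd: float = 0.0
    trace: List[dict] = field(default_factory=list)

    def call(self, name: str, cost_usd: float, **kwargs: Any) -> dict:
        """Check bounds, then forward; refusals are returned as tool results."""
        if name not in self.allowed or name not in self.tools:
            return self._refuse(name, "action_not_permitted")
        if self.spent_usd + cost_usd > self.budget_usd:
            return self._refuse(name, "cost_budget_exhausted")
        self.spent_usd += cost_usd
        result = self.tools[name](**kwargs)
        self.trace.append({"tool": name, "event": "forwarded"})
        return {"ok": True, "result": result}

    def _refuse(self, name: str, reason: str) -> dict:
        # Every bound-check outcome is recorded, including refusals.
        self.trace.append({"tool": name, "event": "refused", "reason": reason})
        return {"ok": False, "refused": reason}
```

Every check runs before the underlying tool is touched, so a refused call has no side effects beyond its trace entry.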

This decomposition has three architectural benefits:

The bounds become testable. The bounding layer is deterministic; its behavior under given inputs can be asserted directly. The agent’s behavior need only be tested within the bounds, not in addition to them.
The bounds become composable. A single bounding layer can enforce policy for many agents in a fleet, with per-agent overrides expressed as configuration rather than code. New agents inherit the discipline by default; tightening bounds across the fleet is a configuration change, not a code change.
The bounds become observable. Every bound-check event is part of the trace (Chapter 12). The team can see, per session and across sessions, how often each bound triggered and what the agent was doing when it did. Patterns in bound triggers are early signals of drift, attack, or design defect.

The bounding layer is small but central. Its behavior can be described in a few thousand lines of code in any production system; its absence is felt in every incident.

How bounding interacts with the cognitive layer

The cognitive patterns from Chapter 4 affect which bounds matter most:

ReAct loops rely on iteration bounds. Without iteration bounds they are the canonical source of thrashing. Reasoning models that internalize ReAct still require an outer iteration bound on agent invocations, because the loop has merely moved.
Plan–Execute with replanning needs iteration bounds on the replan count specifically; otherwise replanning produces an outer loop that escapes the inner step bounds. A plan that replans five times has effectively used the iteration budget five times over, and the budget must account for that.
Reflection with critique loops needs a critique-iteration bound; absent that, the critic and generator can engage in unproductive debate. A reflection loop should typically cap at two or three rounds; more is rarely productive.
Self-Consistency, Debate, and Tree-of-Thought are heavy on cost; they need explicit cost ceilings, not just iteration ceilings, because the work scales with branch factor and depth rather than turn count.
Tool Use is governed primarily by the action-surface and data-access axes; iteration and cost bounds are secondary but still required.
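The replanning case in the list above is the easiest to get wrong, so a small sketch may help. It assumes one step budget shared across all plans plus a separate cap on replans; StepBudget and execute are hypothetical names, and the plan itself is reduced to a step count.

```python
class StepBudget:
    """One step budget shared by every plan, plus a cap on replans."""

    def __init__(self, max_steps: int, max_replans: int) -> None:
        self.max_steps = max_steps
        self.max_replans = max_replans
        self.steps = 0
        self.replans = 0

    def take_step(self) -> bool:
        if self.steps >= self.max_steps:
            return False
        self.steps += 1
        return True

    def take_replan(self) -> bool:
        if self.replans >= self.max_replans:
            return False
        self.replans += 1
        return True


def execute(plans_needed: int, steps_per_plan: int, budget: StepBudget) -> str:
    """Simulate a Plan-Execute run that replans `plans_needed - 1` times."""
    for plan in range(plans_needed):
        if plan > 0 and not budget.take_replan():
            return "replan_limit"
        for _ in range(steps_per_plan):
            if not budget.take_step():
                return "step_limit"
    return "completed"
```

Because the step counter is not reset on replan, replanning cannot be used, deliberately or accidentally, to obtain a fresh iteration budget.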

The architectural reading is that the cognitive pattern suggests which axes are critical, but the bounding layer enforces all six regardless. Defense in depth matters in agentic systems for the same reason it matters in security: any single layer can fail, and the cost of failure is asymmetric.

Bounded autonomy and reasoning-model erosion

Reasoning models in 2026 take more autonomous actions per call than earlier models did. A single call can produce a long sequence of internal reasoning steps and external tool calls. This raises a question: does the iteration bound still mean anything when the “iteration” is internal to the model?

The answer is yes, but the bound has to be redefined. In a 2024 ReAct architecture, an iteration meant one model call. In a 2026 reasoning-model architecture, an iteration is more usefully defined as one external action (tool call, output emission, state mutation). The cost budget moves up in importance because a single reasoning-model call can consume substantial tokens internally before it acts. The architectural posture shifts toward:

Per-call cost ceilings (caps on tokens-per-call) become more important.
Per-action limits (caps on tool calls per turn) become more important.
Outer-loop iteration limits remain important when the agent is invoked repeatedly across user turns.
Time bounds at the per-call level matter more, because a single reasoning-model call can take minutes.

The discipline is the same; the unit of measurement moves. The team that operates a reasoning-model agent has to re-derive its bounds in the new unit and verify that the re-derivation holds.

Bounding and multi-agent systems

In multi-agent systems, bounds apply per agent and aggregate across agents. A worker agent has its own iteration, cost, and time bounds. The orchestrator that called the worker has its own bounds, which must account for the cost the workers consume.

The architectural pattern is to carry a remaining-budget marker across agent boundaries. When an orchestrator delegates to a worker, the worker receives a budget that is the orchestrator’s remaining budget minus what the orchestrator reserves for itself. The worker reports back the budget consumed; the orchestrator updates its ledger. The aggregate session budget is enforced even though the work is distributed across agents.
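The delegation arithmetic can be sketched as follows, assuming sequential delegation and integer cents. SessionBudget, delegate, and settle are hypothetical names; concurrent workers would need the grant to be reserved at delegation time rather than settled afterwards.

```python
class SessionBudget:
    """Aggregate budget owned by the orchestrator, in integer cents."""

    def __init__(self, total_cents: int) -> None:
        self.remaining = total_cents

    def delegate(self, reserve_cents: int) -> int:
        """Grant a worker what is left after the orchestrator's own reserve."""
        return max(0, self.remaining - reserve_cents)

    def settle(self, grant_cents: int, consumed_cents: int) -> None:
        """Update the ledger from the worker's report of what it consumed."""
        if consumed_cents > grant_cents:
            # The worker's own bounding layer should have prevented this.
            raise ValueError("worker exceeded its granted budget")
        self.remaining -= consumed_cents
```

A worker treats its grant as its own hard ceiling, so the sum of all workers' spending plus the orchestrator's reserve can never exceed the session total.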

Without this discipline, multi-agent systems exhibit a specific failure mode: each agent appears bounded individually, but the aggregate run blows through the session budget because no agent has a view of the total. The orchestrator must own the aggregate, even when the work is decentralized.

Anti-patterns in bounded autonomy

Five anti-patterns recur in production:

Single-axis bounding. “We capped iterations to 10.” Cost can still explode, time can still blow out, the action surface can still be unsafe. Bounds are not bounds if they are one-dimensional. Bound on all six axes.

Prompt-based bounding. “We added ‘do not spend more than $5’ or ‘stop after 10 steps’ to the system prompt.” The model cannot be trusted to self-police a quantitative limit. Bounds must be enforced by deterministic code (Chapter 6); Chapter 11 catalogs the same shape as prompt-based governance. This is the single most common architectural mistake at this layer — close to the default behavior of teams new to building agents.

Self-policing bounds. “We use a critic agent to monitor the worker agent’s cost.” A critic agent has the same failure modes as the worker. The bound must be enforced by deterministic code, not by another reasoning component. Reasoning components are useful within the bounds; they cannot be the bounds.

Bounds expressed as recommendations. “Aim for ~10 iterations.” A recommendation is not enforceable; what is not enforceable is not a bound. Bounds are hard limits, expressed as such, with deterministic abort behavior.

Bounds that drift silently. Bounds set at design time, never reviewed, never updated as the system evolves. New tools are added without updating the action surface; cost dimensions change without updating the budget; new attack vectors emerge without tightening data scope. The bounding specification must be reviewed on a regular cadence, alongside other architectural artifacts.

Test implications

Bounded autonomy makes a class of tests possible that are otherwise out of reach:

Bound-respect tests. Run the agent against synthetic scenarios designed to provoke runaway behavior; assert that the bound aborts the run. These are deterministic tests over a non-deterministic agent.
Escalation tests. Trigger irreversible-action attempts; assert that the reversibility gate refuses or routes to approval.
Cost-regression tests. Track per-run cost across time; alert on drift even when functional behavior appears normal.
Action-surface tests. Attempt every disallowed action; assert refusal.
Cross-scope tests. In multi-tenant systems, attempt to read or write data outside the current session’s scope; assert refusal.
Time-bound tests. Inject artificial latency at the tool layer; assert that the session deadline aborts the run cleanly.
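A bound-respect test from the list above might look like the following sketch. The run_bounded function is a hypothetical stand-in for the bounding layer's loop, and the scripted lambdas stand in for an agent; in a real suite the same assertions would run against the production bounding layer with a stubbed model.

```python
from typing import Callable


def run_bounded(step: Callable[[int], bool], max_steps: int) -> dict:
    """Stand-in bounding loop: abort deterministically at max_steps."""
    for i in range(max_steps):
        if step(i):  # step returns True when the goal is satisfied
            return {"status": "completed", "steps": i + 1}
    return {"status": "iteration_exhausted", "steps": max_steps}


def test_runaway_agent_is_aborted() -> None:
    # Scripted agent that never converges, to provoke the bound.
    outcome = run_bounded(lambda i: False, max_steps=10)
    assert outcome == {"status": "iteration_exhausted", "steps": 10}


def test_converging_agent_is_not_cut_short() -> None:
    # The bound must not interfere with a run that finishes inside it.
    outcome = run_bounded(lambda i: i == 2, max_steps=10)
    assert outcome == {"status": "completed", "steps": 3}
```

Both tests are deterministic because the agent is scripted; what they assert is the envelope's behavior, not the quality of anything the agent produced.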

These tests do not validate the agent’s quality. They validate the envelope. The agent can be wrong; it must not be uncontained. Chapter 12 develops this further.

Bounded autonomy and the user

Bounds are also a contract with the user. An agent advertised as bounded must be observably bounded. Users who can see that the system stops, asks for approval, or refuses to act on an unreasonable request build trust. Users who see runaway behavior withdraw it.

This argues for surfacing the bounds: showing the iteration counter, the running cost, the deadline, the action surface. The Skills layer (Chapter 10) makes this discipline more tractable because the agent’s loaded skills declare what they can do, providing a natural place to advertise the action surface to the user. The bounds are not only an engineering property; they are part of the system’s communicative contract.

For high-stakes systems (financial, healthcare, legal), the bounds are also a regulatory and audit artifact. A regulator asking “what was the agent permitted to do at the time of the incident” expects an answer that traces to a versioned specification, not to a prompt that may have been edited.

Summary

Bounded autonomy is the discipline of imposing explicit, multi-dimensional, externally enforced limits on what an agent may do. Six axes — iteration, cost, time, action surface, data access, reversibility — together constitute the bound. The bounding layer sits between the agent and the rest of the deterministic infrastructure; it is the first surface the agent’s outputs encounter, and the layer that makes the agent’s behavior testable, governable, and recoverable.

Bounded autonomy is the substrate on which the governance layer (Chapter 6) sits. The two layers divide cleanly: bounding is primarily quantitative and binary — counters, timeouts, static allow-lists, checks that pass or fail without judgment — while governance is qualitative and dynamic — schema and policy evaluation, redaction, risk scoring, and approval that weigh the content of an action rather than merely its count. Without bounds, there is nothing to govern; with bounds, governance becomes a question of which limits apply where, and who approves their relaxation. Chapter 6 takes up that question.