Chapter 17: A worked example, Concord

The preceding chapters develop the patterns and discipline of agentic-system architecture. This chapter applies them together to a single, coherent system, end to end, with concrete artifacts the reader can adapt. The system is a fictional but realistic coding assistant called Concord. The chapter walks through every architectural decision, shows the artifacts that result, and surfaces the failure modes the architecture defends against. The reader should finish this chapter with a working mental model, and a usable template, for building their own bounded, governed, observable agentic system.

Concord is a worked example, not a product specification. The names, parameters, and structures here are chosen to be illustrative; readers building a real coding assistant will adjust them. What is transferable is the shape of the architecture and the discipline of the artifacts.

Concord: Statement of purpose

Concord helps software engineers make changes to a codebase. Given a task (implement a feature, fix a bug, refactor a module, write a test, update documentation), Concord proposes a change, runs the codebase’s tests against it, and submits the change for human review. It operates in a sandbox; it never commits, pushes, or deploys without explicit human approval.

The architectural framing is deliberate:

The chapter walks through each of these in turn, with the concrete artifacts that realize them.

Concord at a glance

Figure 13. Concord at a glance

Every box in this diagram corresponds to a chapter of the book. The chapter takes them in order, building Concord layer by layer.

The bounding specification

Concord’s bounded-autonomy specification (Chapter 5) is the load-bearing artifact: the explicit, multi-dimensional limits the surrounding infrastructure enforces. Every value here is enforced by deterministic code that does not consult the agent.

agent: concord
version: 4.3.0
bounds:
  iteration_limit:
    outer_actions: 30
    per_subagent_actions: 15
    plan_revisions: 3
  cost_budget:
    total_usd: 2.00
    per_tool_call_usd_max: 0.25
    per_model_call_tokens_max: 200000
  time_budget:
    wall_clock_seconds: 180
    per_tool_call_seconds_max: 30
    per_model_call_seconds_max: 60
  action_surface:
    allowed:
      - read_file
      - write_file              # sandboxed working dir only
      - run_tests               # sandboxed
      - search_repo             # read-only
      - get_diagnostics         # read-only
      - run_linter              # read-only
      - propose_commit          # routes to approval gate
    forbidden:
      - any_network_egress
      - any_process_spawn_outside_sandbox
      - any_write_outside_working_dir
      - any_git_push
      - any_pkg_install_without_approval
  data_access_scope:
    default_scope: per-project, per-session
    read_indexes:
      - project_codebase_index
      - project_history_index
    no_cross_project_read: true
    no_external_index_read: true
  reversibility_envelope:
    reversible_by_default:
      - sandbox_file_writes
      - test_runs
      - sandbox_lint_runs
    requires_human_approval:
      - propose_commit
      - any_change_to_protected_paths
      - any_change_above_diff_size_threshold
      - any_pkg_install
skills_admission:
  allowed_registries:
    - internal-corp-registry
  require_signed_manifests: true
  declared_tools_must_subset_action_surface: true

A few notes on the spec:

The spec is version-controlled. Changes to it go through the same review process as code. The spec at the time of any session is recorded in the trace, so incident response can verify what bounds were in force.
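One way to make the recorded spec checkable during incident response is to stamp a content fingerprint of the spec onto the session's first trace event. Recording only the version string is weaker, because an edit that forgets to bump the version would go unnoticed. The sketch below assumes the YAML has already been parsed into a Python dict (parsing YAML needs a third-party library, omitted here); spec_fingerprint, session_start_event, and the spec_sha256 field are hypothetical names, not fields the trace excerpt in this chapter defines.

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Stable SHA-256 of a parsed spec: sorted keys and fixed separators mean
    that key order in the source file does not affect the digest."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def session_start_event(session_id: str, spec: dict) -> dict:
    """Hypothetical session.start event that pins the spec in force."""
    return {
        "session": session_id,
        "event": "session.start",
        "agent": spec["agent"],
        "spec_version": spec["version"],
        "spec_sha256": spec_fingerprint(spec),
    }


if __name__ == "__main__":
    spec = {"agent": "concord", "version": "4.3.0",
            "bounds": {"iteration_limit": {"outer_actions": 30, "plan_revisions": 3}}}
    print(json.dumps(session_start_event("s_8f23", spec), indent=2))
```

Because keys are sorted recursively before hashing, reordering keys in the spec file leaves the fingerprint unchanged, while changing any value produces a different one.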

The bounding gateway (pseudocode)

The spec is enforced by a deterministic gateway through which every agent-initiated action passes. The pseudocode below is architectural: not framework-specific code, but the shape of what must exist.

def bounding_gateway(session, proposed_action):
    # session carries: iteration counter, cost ledger, deadline,
    # action surface, data access scope, reversibility envelope, trace handle

    session.trace("agent.action_proposed", proposed_action)

    # 1. Iteration check
    if session.iter_count >= session.bounds.iteration_limit.outer_actions:
        session.trace("bounds.check_failed", reason="iteration_exhausted")
        return Refused("iteration limit exceeded")

    # 2. Time check
    if session.now() >= session.deadline:
        session.trace("bounds.check_failed", reason="deadline_passed")
        return Refused("session deadline passed")

    # 3. Action-surface check
    if proposed_action.tool not in session.bounds.action_surface.allowed:
        session.trace("bounds.check_failed",
                       reason="tool_not_in_surface",
                       tool=proposed_action.tool)
        return Refused(f"tool {proposed_action.tool} not allowed")

    # 4. Estimated cost check (cheap upper bound)
    est_cost = estimate_cost(proposed_action)
    if session.cost_spent + est_cost > session.bounds.cost_budget.total_usd:
        session.trace("bounds.check_failed", reason="cost_would_exceed")
        return Refused("cost budget would be exceeded")
    if est_cost > session.bounds.cost_budget.per_tool_call_usd_max:
        session.trace("bounds.check_failed", reason="per_call_cost_exceeded")
        return Refused("per-call cost exceeds limit")

    # 5. Data-access scope check (delegated to tool adapter)
    if not session.tool_adapters[proposed_action.tool].check_scope(
            session.identity, proposed_action.args):
        session.trace("bounds.check_failed", reason="data_access_scope_violation")
        return Refused("data access outside scope")

    # 6. Reversibility envelope check
    if proposed_action.is_irreversible():
        if proposed_action.tool not in session.bounds.reversibility_envelope.requires_human_approval:
            session.trace("bounds.check_failed", reason="irreversible_without_approval_path")
            return Refused("irreversible action with no approval route")
        # An irreversible action with an approval route is never invoked here;
        # the governance pipeline (next section) sends it to human approval

    # All bounds passed; record the pass and forward to the governance pipeline
    session.trace("bounds.check_passed", tool=proposed_action.tool)
    return route_to_governance(session, proposed_action)

The gateway is deterministic: its behavior under given inputs can be unit-tested without invoking the agent. Failure modes (what happens when each bound is hit) are explicit and recorded in the trace.
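To show what "unit-tested without invoking the agent" can look like, here is a deliberately reduced, runnable stand-in for the gateway pseudocode. It covers only the iteration, time, action-surface, and cost checks, in the same order, and it replaces the session object with hypothetical Bounds and Session dataclasses plus an injected clock so that deadlines can be tested without sleeping. All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical, reduced stand-ins for the pseudocode's session and bounds objects.


@dataclass(frozen=True)
class Bounds:
    outer_actions: int = 30
    total_usd: float = 2.00
    per_tool_call_usd_max: float = 0.25
    allowed_tools: frozenset = frozenset({
        "read_file", "write_file", "run_tests", "search_repo",
        "get_diagnostics", "run_linter", "propose_commit"})


@dataclass
class Session:
    bounds: Bounds
    deadline: float                    # absolute time, same units as clock()
    clock: Callable[[], float]         # injected so tests control time
    iter_count: int = 0
    cost_spent: float = 0.0
    trace: List[Tuple[str, str]] = field(default_factory=list)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str = ""


def check_bounds(session: Session, tool: str, est_cost: float) -> Decision:
    """Run the reduced bound checks in the same order as the pseudocode."""
    def refuse(reason: str) -> Decision:
        session.trace.append(("bounds.check_failed", reason))
        return Decision(False, reason)

    if session.iter_count >= session.bounds.outer_actions:
        return refuse("iteration_exhausted")
    if session.clock() >= session.deadline:
        return refuse("deadline_passed")
    if tool not in session.bounds.allowed_tools:
        return refuse("tool_not_in_surface")
    if session.cost_spent + est_cost > session.bounds.total_usd:
        return refuse("cost_would_exceed")
    if est_cost > session.bounds.per_tool_call_usd_max:
        return refuse("per_call_cost_exceeded")
    session.trace.append(("bounds.check_passed", tool))
    return Decision(True)


def make_session(now: float = 0.0, **overrides) -> Session:
    """Session whose clock always reads `now`, with a 180-second deadline."""
    session = Session(bounds=Bounds(), deadline=180.0, clock=lambda: now)
    for name, value in overrides.items():
        setattr(session, name, value)
    return session


def test_gateway() -> None:
    assert check_bounds(make_session(), "read_file", 0.01).allowed
    assert check_bounds(make_session(iter_count=30), "read_file", 0.01).reason == "iteration_exhausted"
    assert check_bounds(make_session(now=180.0), "read_file", 0.01).reason == "deadline_passed"
    assert check_bounds(make_session(), "git_push", 0.0).reason == "tool_not_in_surface"
    assert check_bounds(make_session(cost_spent=1.95), "run_tests", 0.10).reason == "cost_would_exceed"
    assert check_bounds(make_session(), "run_tests", 0.30).reason == "per_call_cost_exceeded"


if __name__ == "__main__":
    test_gateway()
    print("every refusal path behaves as specified")
```

No model call appears anywhere in the test: each refusal path is reached by constructing session state directly, which is what lets the gateway be reviewed and regression-tested as ordinary code.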

The governance pipeline (pseudocode)

Once the bounds pass, the proposed action enters the governance pipeline. The pipeline applies schema validation, policy gates, risk scoring, and (for high-risk or irreversible actions) approval routing.

def governance_pipeline(session, action):
    # 1. Schema validation
    schema = session.schemas[action.tool]
    schema_result = schema.validate(action.args)
    session.trace("governance.validator",
                   tool=action.tool,
                   result=schema_result.status,
                   errors=schema_result.errors)
    if not schema_result.ok:
        return Refused(f"schema validation failed: {schema_result.errors}")

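    # 2. Policy gates: a deny from any gate wins, so escalations are collected
    #    and acted on only after every gate has run (a later deny is never
    #    masked by an earlier escalate)
    escalations = []
    for gate in session.policy_gates_for(action.tool):
        decision = gate.evaluate(session.context, action)
        session.trace("governance.policy_gate",
                       gate=gate.name,
                       decision=decision.status,
                       rule=decision.rule_id)
        if decision.deny:
            return Refused(f"policy {gate.name} denied: {decision.reason}")
        if decision.escalate:
            escalations.append(decision.reason)
    if escalations:
        return route_to_approval(session, action, reason="; ".join(escalations))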

    # 3. Risk scoring
    score = session.risk_scorer.score(session.context, action)
    session.trace("governance.risk_score", action=action.tool, score=score)

    # 4. Mandatory approval for reversibility-envelope actions, then risk thresholds
    if action.tool in session.bounds.reversibility_envelope.requires_human_approval:
        return route_to_approval(session, action, reason="requires_human_approval")
    if score >= session.thresholds.approval_required:
        return route_to_approval(session, action, reason=f"risk_score={score}")
    if score >= session.thresholds.elevated_logging:
        session.elevate_trace_retention()  # full trace, longer retention

    # 5. Execute (registers rollback path for reversible actions)
    if action.is_reversible():
        session.register_rollback(action)
    result = session.tool_adapters[action.tool].invoke(action.args, idempotency_key=action.hash)

    # 6. Cost accounting, before output validation, so that a call whose
    #    output is rejected is still charged against the budget
    session.cost_spent += result.cost
    session.iter_count += 1
    session.trace("tool.invocation", tool=action.tool, cost_usd=result.cost)
    session.trace("cost.tick", spent=session.cost_spent)

    # 7. Output validation
    out_schema = session.output_schemas.get(action.tool)
    if out_schema:
        out_result = out_schema.validate(result)
        if not out_result.ok:
            session.trace("governance.output_validator_failed", errors=out_result.errors)
            return Refused("tool output failed validation")

    return Success(result)

Three architectural observations:

Concord’s policy gates

Examples of the policy gates Concord enforces. Each is expressed as a rule that can be evaluated against the action and the session context.

| Rule | Applies to | Condition | Decision |
| --- | --- | --- | --- |
| no_secrets_in_diff | write_file, propose_commit | Diff contains api[_-]?key, secret, token, or password followed by = | Deny |
| protected_paths | write_file, propose_commit | Diff touches infra/, migrations/, .github/workflows/, package.json | Escalate |
| diff_size_threshold | propose_commit | Diff exceeds 500 lines changed | Escalate |
| test_coverage | propose_commit | Tests have not been run, or run had failures | Deny (or escalate, per project policy) |
| dependency_changes | propose_commit | Diff touches dependency manifests | Escalate |
| forbidden_libraries | write_file | Diff imports from project’s deprecated-libraries list | Deny |

Policies are declarative. Adding a new policy is a configuration change, not a code change. Policies are version-controlled and reviewed; the policy in force at any session is part of the trace.
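A minimal sketch of what "declarative" can mean in practice: each rule from the table is plain data (name, tools it applies to, condition, decision) that could live in a version-controlled YAML or JSON file, and one generic evaluator interprets it. Only three of the table's rules are encoded; the condition kinds (regex, path_prefix, diff_lines_gt), the Action shape, and the deliberately simple secrets pattern (which will produce false positives) are illustrative assumptions. The evaluator lets a deny beat an escalate, so the outcome does not depend on rule order.

```python
import re
from dataclasses import dataclass

# Hypothetical rule encoding: plain data that could be loaded from config.
RULES = [
    {"name": "no_secrets_in_diff",
     "applies_to": ["write_file", "propose_commit"],
     "condition": {"kind": "regex",
                   "pattern": r"(api[_-]?key|secret|token|password)\s*="},
     "decision": "deny"},
    {"name": "protected_paths",
     "applies_to": ["write_file", "propose_commit"],
     "condition": {"kind": "path_prefix",
                   "prefixes": ["infra/", "migrations/", ".github/workflows/",
                                "package.json"]},
     "decision": "escalate"},
    {"name": "diff_size_threshold",
     "applies_to": ["propose_commit"],
     "condition": {"kind": "diff_lines_gt", "limit": 500},
     "decision": "escalate"},
]


@dataclass(frozen=True)
class Action:
    tool: str
    paths: tuple      # repository-relative paths touched by the diff
    diff_text: str    # text of the diff
    diff_lines: int   # lines changed


def condition_holds(cond: dict, action: Action) -> bool:
    """Interpret one declarative condition; a new condition kind needs a new branch."""
    kind = cond["kind"]
    if kind == "regex":
        return re.search(cond["pattern"], action.diff_text, re.IGNORECASE) is not None
    if kind == "path_prefix":
        prefixes = tuple(cond["prefixes"])
        return any(path.startswith(prefixes) for path in action.paths)
    if kind == "diff_lines_gt":
        return action.diff_lines > cond["limit"]
    raise ValueError(f"unknown condition kind: {kind}")


def evaluate(rules: list, action: Action) -> tuple:
    """Return (decision, matching rule names); deny beats escalate beats allow."""
    matched = [r for r in rules
               if action.tool in r["applies_to"]
               and condition_holds(r["condition"], action)]
    for outcome in ("deny", "escalate"):
        names = [r["name"] for r in matched if r["decision"] == outcome]
        if names:
            return outcome, names
    return "allow", []
```

With this split, a new rule that reuses an existing condition kind is a data change only; a genuinely new kind of condition still needs code in the evaluator, which is the point at which review of the evaluator itself matters.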

Concord’s risk scorer

Risk-based escalation routes the riskiest actions to human review while allowing the rest to flow autonomously. Concord’s risk score is a small composite:

def score(context, action):
    s = 0
    # Action-class contribution
    if action.tool == "propose_commit":
        s += 30
    elif action.tool == "write_file" and action.touches_test_file():
        s += 5
    elif action.tool == "write_file":
        s += 10
    # Diff-size contribution
    if action.diff_lines_changed() > 100:
        s += 20
    if action.diff_lines_changed() > 500:
        s += 30
    # Path-sensitivity contribution
    if action.touches_path_pattern(["src/auth/", "src/payments/"]):
        s += 25
    # First-of-kind contribution
    if not context.similar_action_in_history(action):
        s += 10
    # Session-context contribution
    if context.recent_bound_triggers > 0:
        s += 15
    return s

thresholds:
    elevated_logging: 30
    approval_required: 50

Scores accumulate: a commit that touches an auth path and is also large will easily exceed the approval threshold. Scores below the elevated-logging threshold (under 30) pass autonomously with normal trace retention; scores from 30 to 49 get elevated trace retention (full trace, longer retention); scores at or above the approval threshold (50) route through human review.
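To make the arithmetic concrete, the sketch below restates the scorer over a plain data record and adds the three-way tiering just described. The ScoredAction fields (is_test_file, first_of_kind, recent_bound_triggers) are hypothetical stand-ins for the action.* and context.* calls in the pseudocode; the weights and inclusive thresholds are the ones given in this section.

```python
from dataclasses import dataclass

# Thresholds and weights from this section; field names are hypothetical.
ELEVATED_LOGGING = 30
APPROVAL_REQUIRED = 50
SENSITIVE_PREFIXES = ("src/auth/", "src/payments/")


@dataclass(frozen=True)
class ScoredAction:
    tool: str
    paths: tuple                     # files the action touches
    diff_lines: int                  # lines changed
    is_test_file: bool = False
    first_of_kind: bool = False      # no similar action in the project's history
    recent_bound_triggers: int = 0   # bound refusals earlier in the session


def score(a: ScoredAction) -> int:
    s = 0
    if a.tool == "propose_commit":
        s += 30
    elif a.tool == "write_file" and a.is_test_file:
        s += 5
    elif a.tool == "write_file":
        s += 10
    if a.diff_lines > 100:
        s += 20
    if a.diff_lines > 500:
        s += 30                      # stacks on top of the >100 contribution
    if any(p.startswith(SENSITIVE_PREFIXES) for p in a.paths):
        s += 25
    if a.first_of_kind:
        s += 10
    if a.recent_bound_triggers > 0:
        s += 15
    return s


def tier(s: int) -> str:
    """Map a score to its handling (both thresholds are inclusive)."""
    if s >= APPROVAL_REQUIRED:
        return "approval"
    if s >= ELEVATED_LOGGING:
        return "elevated_logging"
    return "autonomous"
```

A 150-line, first-of-kind write under src/auth/ scores 10 + 20 + 25 + 10 = 65 and needs approval, while the 12-line write_file in the trace excerpt later in this chapter scores 10 and passes autonomously. A propose_commit scores at least 30, but it reaches human review regardless of score, because the governance pipeline routes every reversibility-envelope action to approval.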

Risk scoring is calibrated on incident data: the team reviews historical incidents and verifies that the actions involved would have scored at or above the approval threshold. Where they would not, the scorer is adjusted.
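The calibration step reduces to a check that can run alongside the substrate tests: score each action implicated in a past incident and list the ones that fall below the approval threshold. In this sketch the incident identifiers and record type are placeholders, and score_fn stands in for whichever scorer is under review; the function name calibration_misses is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")   # whatever record type the scorer accepts


def calibration_misses(incidents: Iterable[Tuple[str, A]],
                       score_fn: Callable[[A], int],
                       approval_threshold: int = 50) -> List[Tuple[str, int]]:
    """Return (incident_id, score) for every historical incident action that
    the scorer would have let through without human review.

    An empty list means the scorer covers this incident set; any entry is a
    miss the scorer must be adjusted for."""
    misses = []
    for incident_id, action in incidents:
        s = score_fn(action)
        if s < approval_threshold:
            misses.append((incident_id, s))
    return misses
```

Running this after every scorer change turns "the scorer is adjusted" into a regression test: a change that reopens a past miss shows up as a non-empty list.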

Concord’s memory architecture

Concord’s memory is tiered (Chapter 7), with all access mediated by the gateway and all writes governed.

Working memory is task-scoped: the current plan, files touched, test results, intermediate notes. It is held in a durable session store (so the session can survive process restarts) but discarded at session end.

Episodic memory is the project’s history of past Concord tasks. Each completed session is summarized by a deterministic prompt-and-schema into a structured record (task summary, files touched, outcome, approval decision). The raw trace is kept in cold storage for audit; only the summary is surfaced to retrieval. Episodic memory is scoped per-project; cross-project reads are not permitted.

Semantic memory is curated project knowledge: the codebase index, the project’s style guide, the testing approach, the architecture documentation. Curation is explicit: content enters semantic memory through a documented ingestion pipeline, not through Concord’s writes. Concord cannot write to semantic memory; it can only read it (mediated by the gateway).

Cross-project isolation is the load-bearing memory commitment. Concord serving Project A cannot see anything from Project B, regardless of how the retrieval index is implemented. The gateway enforces project scope on every read; the index can be unified (one store, scoped at query time) or partitioned (one store per project), but the access decision is the gateway’s.
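A minimal sketch of the "one store, scoped at query time" option, assuming an in-memory list of records each tagged with its owning project. MemoryGateway, Record, and ScopeViolation are hypothetical names, and the substring match is a placeholder for real retrieval. The essential property is that the scope comes from the session, not from the query: a read naming a different project is refused and traced rather than silently filtered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    project: str     # owning project, e.g. "p_acme"
    tier: str        # "episodic" or "semantic"
    text: str


class ScopeViolation(Exception):
    """Raised when a read names a project other than the session's own."""


class MemoryGateway:
    """Unified store, scoped at query time by the gateway (hypothetical sketch)."""

    def __init__(self, records):
        self._records = list(records)
        self.trace = []

    def read(self, session_project: str, tier: str, query: str,
             requested_project: Optional[str] = None) -> List[Record]:
        # The session's project is authoritative. A request naming another
        # project is refused and traced, never silently widened.
        if requested_project is not None and requested_project != session_project:
            self.trace.append(("memory.read_refused", session_project, requested_project))
            raise ScopeViolation(f"{requested_project} is outside {session_project}")
        hits = [r for r in self._records
                if r.project == session_project
                and r.tier == tier
                and query.lower() in r.text.lower()]
        self.trace.append(("memory.read", tier, f"project:{session_project}", query))
        return hits
```

Swapping the single list for one store per project would change only the constructor and lookup; the access decision, and the trace of it, stays in the gateway either way.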

Concord’s skills

Concord uses Skills (Chapter 10) to load project-specific procedural knowledge. Each project has a small set of skills that Concord loads on demand based on task description.

Example SKILL.md for project-conventions:

---
name: project-conventions
description: Code conventions, style guide, and structural rules for this codebase.
  Load this skill whenever Concord makes any code change in this project.
version: 2.1.0
requires_tools:
  - read_file
  - search_repo
---

# Project conventions

## Language and tooling

- TypeScript with strict mode. No `any` types in new code; existing `any` should be narrowed when touched.
- Code style is enforced by the project's linter (`run_linter`). If the linter fails, fix the issues; do not disable the rule.
- Tests use the project's test framework. Each new public function must have at least one corresponding test.

## Module boundaries

- `src/core/` is the domain layer. It does not import from `src/api/`, `src/db/`, or `src/ui/`.
- `src/api/` is the HTTP layer. It imports from `src/core/` only.
- `src/db/` is the persistence layer. It imports from `src/core/` only.
- Cross-layer imports flag the change for review.

## Naming

- Functions: `camelCase`. Exported types: `PascalCase`. Constants: `SCREAMING_SNAKE_CASE`.
- File names match the primary export.

## Errors

- Errors thrown across module boundaries must be subclasses of `AppError`.
- Never swallow errors without explicit acknowledgment in the code (a comment naming why it is safe).

## What counts as a test-worthy change

- Pure refactors with no behavior change can omit new tests but should not break existing ones.
- Any change that touches a function's externally observable behavior requires a test demonstrating the change.

## Forbidden patterns

- Direct database calls outside `src/db/`. Use the repository layer.
- Direct HTTP calls. Use the configured HTTP client.
- `console.log` in production code paths. Use the structured logger.

Example SKILL.md for change-class-payment-flow:

---
name: change-class-payment-flow
description: Special procedure for changes to the payment-processing flow.
  Load when the task involves files under src/payments/ or files that touch
  payment workflows (Stripe webhooks, refund logic, billing reconciliation).
version: 1.4.0
requires_tools:
  - read_file
  - run_tests
requires_approval_class: "payment-flow"
---

# Payment-flow changes

## Why this skill exists

Payment-flow changes have caused incidents in this project's history. This skill encodes the additional procedure that mitigates those incidents.

## Procedure

1. Before any change, read `docs/payments/INVARIANTS.md` and surface the invariant most relevant to the change.
2. Write or update a test that exercises the invariant.
3. Make the code change.
4. Run the payments test suite (`run_tests --suite=payments`).
5. The diff routes through the payment-flow approval class regardless of size.

## Forbidden without explicit authorization

- Changes to the Stripe webhook signature verification.
- Changes to the idempotency-key handling.
- Changes to the refund-amount validation rules.

If the task requires any of the above, abort the change and surface the requirement to the user. Do not attempt a workaround.

These skills do not change Concord’s bounds, governance, or action surface. They tell Concord what the project expects; the architecture enforces what Concord is permitted to do. A skill that attempted to relax governance (“ignore the diff-size threshold for this change”) would have no effect: the governance layer does not consult skills for policy decisions.
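The skills_admission rules in the bounding spec (allowed registries, signed manifests, declared tools a subset of the action surface) can be checked before a skill is ever loaded. The sketch below assumes the manifest front matter has already been parsed into a dict and that signature verification happened upstream and is reported as a boolean; admit_skill and its parameters are hypothetical. Admission can only reject a skill: nothing a manifest declares is ever added to the action surface.

```python
from typing import Iterable, List

# Values taken from the bounding spec's action_surface.allowed and skills_admission.
ACTION_SURFACE = frozenset({
    "read_file", "write_file", "run_tests", "search_repo",
    "get_diagnostics", "run_linter", "propose_commit",
})
ALLOWED_REGISTRIES = frozenset({"internal-corp-registry"})


def admit_skill(manifest: dict, registry: str, signature_verified: bool,
                action_surface: Iterable[str] = ACTION_SURFACE) -> List[str]:
    """Return admission failures for a parsed SKILL.md front matter.

    An empty list means admit. signature_verified is assumed to come from a
    separate signing check that this sketch does not implement."""
    failures = []
    if registry not in ALLOWED_REGISTRIES:
        failures.append(f"registry {registry!r} is not allowed")
    if not signature_verified:
        failures.append("manifest signature not verified")
    outside = set(manifest.get("requires_tools", [])) - set(action_surface)
    if outside:
        failures.append(f"requires tools outside the action surface: {sorted(outside)}")
    return failures
```

Both example manifests in this section pass the tool check: project-conventions requires read_file and search_repo, and change-class-payment-flow requires read_file and run_tests, all of which are in the allowed surface.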

Concord’s control structure

For simple tasks (small bug fixes, documentation changes, single-file refactors), Concord operates as a single agent calling tools directly. For complex tasks, Concord uses an orchestrator-worker shape (Chapter 9):

Each sub-agent has its own bounded autonomy. The orchestrator carries the aggregate session budget; sub-agent costs accumulate against the orchestrator’s ledger.

The orchestrator is not itself the agent; it is the control structure that coordinates the sub-agents. The orchestrator’s decisions (which sub-agent to invoke, when to replan, when to abort) are also subject to the bounding layer.

All sub-agents route their proposed actions through the same centralized bounding gateway and governance pipeline (Chapter 9); none has its own. The gateway enforces the write_file policy identically whether the action was proposed by the Editor worker or by the orchestrator itself. Centralizing the gateway is what makes per-agent bounds and fleet-wide policy changes configuration rather than code, and it guarantees that no sub-agent can become a weaker-governed path into the system.
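A minimal sketch of the budget arrangement described above: one ledger, owned by the orchestrator, receives every charge from every agent. It assumes (an interpretation, not something the spec states) that iteration_limit.outer_actions caps the orchestrator's own actions while per_subagent_actions caps each worker separately, and that dollars are pooled across all of them. The Ledger class is hypothetical; "editor" echoes the Editor worker mentioned above, and "tester" is an invented worker name.

```python
from collections import Counter


class BudgetExceeded(Exception):
    pass


class Ledger:
    """Single session ledger shared by the orchestrator and all sub-agents."""

    def __init__(self, total_usd=2.00, outer_actions=30, per_subagent_actions=15):
        self.total_usd = total_usd
        self.outer_actions = outer_actions
        self.per_subagent_actions = per_subagent_actions
        self.spent_usd = 0.0
        self.actions = Counter()          # actions taken, keyed by agent name

    def charge(self, agent: str, cost_usd: float) -> None:
        """Record one action by `agent`, or refuse it if a bound would be crossed."""
        limit = (self.outer_actions if agent == "orchestrator"
                 else self.per_subagent_actions)
        if self.actions[agent] >= limit:
            raise BudgetExceeded(f"{agent} reached its action limit ({limit})")
        if self.spent_usd + cost_usd > self.total_usd:
            raise BudgetExceeded("session cost budget would be exceeded")
        self.actions[agent] += 1
        self.spent_usd += cost_usd
```

Because every agent charges the same object, no worker can spend against a private budget, and the session-level cost check sees the sum of all of them.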

Concord’s trace

Concord’s trace makes every step of every session observable. A simplified excerpt of the trace for a small change might look like:

[
  {"ts": "2026-06-17T09:14:01Z", "session": "s_8f23", "event": "session.start",
   "user": "u_412", "project": "p_acme", "task": "Add input validation to POST /signup"},
  {"ts": "2026-06-17T09:14:01Z", "session": "s_8f23", "event": "skill.activated",
   "skill": "project-conventions", "version": "2.1.0"},
  {"ts": "2026-06-17T09:14:03Z", "session": "s_8f23", "event": "agent.action_proposed",
   "tool": "search_repo", "args": {"query": "POST /signup handler"}},
  {"ts": "2026-06-17T09:14:03Z", "session": "s_8f23", "event": "bounds.check_passed",
   "tool": "search_repo"},
  {"ts": "2026-06-17T09:14:03Z", "session": "s_8f23", "event": "governance.validator",
   "tool": "search_repo", "result": "pass"},
  {"ts": "2026-06-17T09:14:04Z", "session": "s_8f23", "event": "tool.invocation",
   "tool": "search_repo", "latency_ms": 230, "cost_usd": 0.001,
   "result": {"hits": 3, "files": ["src/api/signup.ts", "src/api/signup.test.ts", ...]}},
  {"ts": "2026-06-17T09:14:05Z", "session": "s_8f23", "event": "memory.read",
   "tier": "semantic", "scope": "project:p_acme", "query": "signup validation rules"},
  {"ts": "2026-06-17T09:14:08Z", "session": "s_8f23", "event": "agent.action_proposed",
   "tool": "read_file", "args": {"path": "src/api/signup.ts"}},
  // ... many more steps ...
  {"ts": "2026-06-17T09:14:50Z", "session": "s_8f23", "event": "agent.action_proposed",
   "tool": "write_file", "args": {"path": "src/api/signup.ts", "diff_size_lines": 12}},
  {"ts": "2026-06-17T09:14:50Z", "session": "s_8f23", "event": "bounds.check_passed",
   "tool": "write_file"},
  {"ts": "2026-06-17T09:14:50Z", "session": "s_8f23", "event": "governance.validator",
   "tool": "write_file", "result": "pass"},
  {"ts": "2026-06-17T09:14:50Z", "session": "s_8f23", "event": "governance.policy_gate",
   "gate": "no_secrets_in_diff", "decision": "allow"},
  {"ts": "2026-06-17T09:14:50Z", "session": "s_8f23", "event": "governance.policy_gate",
   "gate": "protected_paths", "decision": "allow", "rule": "path_not_in_protected_set"},
  {"ts": "2026-06-17T09:14:50Z", "session": "s_8f23", "event": "governance.risk_score",
   "action": "write_file", "score": 10},
  {"ts": "2026-06-17T09:14:51Z", "session": "s_8f23", "event": "tool.invocation",
   "tool": "write_file", "latency_ms": 118, "cost_usd": 0.0009, "result": "ok"},
  // ... test runs, more changes ...
  {"ts": "2026-06-17T09:15:35Z", "session": "s_8f23", "event": "agent.action_proposed",
   "tool": "propose_commit", "args": {"message": "validate email and password on signup",
                                      "diff_files": 2, "diff_lines": 18}},
  {"ts": "2026-06-17T09:15:35Z", "session": "s_8f23", "event": "governance.risk_score",
   "action": "propose_commit", "score": 30},
  {"ts": "2026-06-17T09:15:35Z", "session": "s_8f23", "event": "governance.approval.requested",
   "queue": "concord_reviews", "reviewer_role": "engineer", "context_url": "..."},
  {"ts": "2026-06-17T09:19:58Z", "session": "s_8f23", "event": "governance.approval.granted",
   "reviewer": "u_412", "rationale": "Looks good."},
  {"ts": "2026-06-17T09:19:59Z", "session": "s_8f23", "event": "tool.invocation",
   "tool": "propose_commit", "result": {"branch": "concord/s_8f23-add-signup-validation"}},
  {"ts": "2026-06-17T09:19:59Z", "session": "s_8f23", "event": "session.end",
   "outcome": "approved", "cost_usd": 0.43, "iterations": 14, "duration_s": 94}
]
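Because every event is structured and correlated by session, audit questions become queries. As one illustration, the sketch below checks that every propose_commit invocation was preceded by a governance.approval.granted event in the same session. It assumes the trace is available as a time-ordered list of dicts using the field names from the excerpt (the excerpt itself, with its elisions, is not valid JSON); the function name and the rule that one grant covers exactly one commit are assumptions.

```python
def unapproved_commits(events):
    """Return sessions that invoked propose_commit without a prior approval grant.

    Events are assumed to be in timestamp order. Each grant is consumed by the
    first commit that follows it in the same session."""
    pending_grant = set()        # sessions holding an unconsumed approval
    violations = []
    for ev in events:
        session, kind = ev["session"], ev["event"]
        if kind == "governance.approval.granted":
            pending_grant.add(session)
        elif kind == "tool.invocation" and ev.get("tool") == "propose_commit":
            if session in pending_grant:
                pending_grant.discard(session)
            else:
                violations.append(session)
    return violations
```

Run over the excerpt above, the check passes: the propose_commit invocation at 09:19:59 follows the grant at 09:19:58 in session s_8f23.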

A few architectural facts the trace makes visible:

Concord’s testing

Concord’s test suite is structured by the three layers from Chapter 12.

Substrate tests. Standard unit and integration tests on the bounding gateway, governance pipeline, memory gateway, tool adapters, and approval workflow. Every policy gate has positive and negative test cases. Every schema has at least one passing and one failing example. The approval queue is tested for routing, timeout behavior, and audit-log completeness.

Envelope tests. Synthetic scenarios designed to assert that the agent’s behavior stays within the envelope:

Replay-driven tests. A golden trace set (a curated collection of historical sessions representing the system’s behavioral envelope) is replayed against every release. Each trace’s deterministic substrate is re-executed against the new model version, and the new behavior is checked against the envelope properties: bounds held, governance decisions consistent, end-task outcome comparable.
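The "governance decisions consistent" property can be checked mechanically by projecting the golden trace and the replay onto their decision events and comparing the two sequences. A minimal sketch, assuming both traces are lists of event dicts shaped like the trace excerpt; which event kinds count as decisions, and the function names, are assumptions. The comparison is positional, so a replay that inserts one extra step will report every later position as different; a production version would align the sequences first.

```python
DECISION_EVENTS = {
    "bounds.check_passed", "bounds.check_failed",
    "governance.policy_gate", "governance.approval.requested",
}


def decision_sequence(events):
    """Project a trace onto the governance decisions it records."""
    seq = []
    for ev in events:
        if ev["event"] in DECISION_EVENTS:
            seq.append((ev["event"],
                        ev.get("tool") or ev.get("gate"),
                        ev.get("decision") or ev.get("reason")))
    return seq


def decision_diffs(golden, replay):
    """Return (index, golden_step, replay_step) wherever the sequences differ."""
    g, r = decision_sequence(golden), decision_sequence(replay)
    diffs = []
    for i in range(max(len(g), len(r))):
        a = g[i] if i < len(g) else None
        b = r[i] if i < len(r) else None
        if a != b:
            diffs.append((i, a, b))
    return diffs
```

An empty result means the new model version produced the same bound and gate decisions as the golden session; tool results and latencies are deliberately ignored, since they are expected to vary.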

Adversarial replay generates additional tests from incident traces. For any session that did produce an incident, a counterfactual replay with tighter bounds or stricter policy tests whether the change would have prevented the incident; if it would have, the modified policy becomes a candidate for adoption.
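The counterfactual check can be sketched as replaying an incident session's recorded actions, as (tool, cost) pairs, against candidate bounds and asking whether the first refusal lands at or before the step that caused the incident. This is a simplification under stated assumptions: it replays the recorded sequence, whereas a model running under tighter bounds might have taken a different path, and it covers only the iteration, surface, and cost bounds. first_refusal and would_prevent are hypothetical names.

```python
def first_refusal(actions, total_usd, outer_actions, allowed_tools):
    """Replay recorded (tool, cost_usd) actions against candidate bounds.

    Returns (index, reason) for the first action the candidate bounds would
    refuse, or None if the recorded session would still run to completion."""
    spent = 0.0
    for i, (tool, cost) in enumerate(actions):
        if i >= outer_actions:
            return i, "iteration_exhausted"
        if tool not in allowed_tools:
            return i, "tool_not_in_surface"
        if spent + cost > total_usd:
            return i, "cost_would_exceed"
        spent += cost
    return None


def would_prevent(actions, incident_step, **bounds):
    """True if the candidate bounds refuse an action at or before incident_step."""
    hit = first_refusal(actions, **bounds)
    return hit is not None and hit[0] <= incident_step
```

A candidate that only stops the session after the harmful step is reported as not preventing the incident, which keeps tighter-looking but ineffective policies out of the adoption queue.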

Concord’s failure-mode defenses

Walking through the failure modes cataloged in Chapter 11 and how Concord’s architecture defends against each:

| Failure mode | Concord’s defense |
| --- | --- |
| Infinite loop | iteration_limit.outer_actions aborts after 30 actions |
| Cost explosion | cost_budget.total_usd aborts at $2; per_tool_call_usd_max catches single expensive calls |
| Stuck session | time_budget.wall_clock_seconds aborts at 3 minutes |
| Plan corruption | iteration_limit.plan_revisions caps replanning at 3 |
| Tool hallucination | Schema validators refuse non-conforming calls; surface is positive-allowlist |
| Tool misuse | Sandbox confines file writes; semantic validators on commit content |
| State corruption | Sandbox is discarded at session end; nothing escapes without approval |
| Cross-project leakage | Memory gateway enforces project scope on every read |
| Memory poisoning | Episodic-memory writes are curated; raw traces do not surface to retrieval |
| Tool injection | Tool responses are structured; retrieved content is treated as data, not instruction |
| Skill compromise | Skills cannot add tools to the surface or relax bounds; declared requirements are validated at admission |
| Approval fatigue | Approval is reserved for the externally-visible mutation (the commit); reads, searches, and sandboxed writes pass autonomously, so the reviewer sees one diff per task, not every action |
| Audit gap | Trace is structured, correlated, retained per class, replayable |
| Cost drift | Per-session cost in trace; percentile monitoring with alerts |
| Latency drift | Per-tool latency monitored; alerts on percentile shift |
| Model drift | Replay against golden trace set on every model upgrade |

The pattern is the same in every row: the architecture catches the failure, not the agent. The agent can be wrong; it cannot be uncontained.
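The percentile monitoring behind the cost-drift row can start as a comparison of the 95th-percentile session cost in a recent window against a baseline window, both read from the cost_usd field of session.end events. The 1.5x ratio, the window contents, and the function names below are illustrative assumptions, not tuned values; statistics.quantiles is the standard-library function.

```python
import statistics
from typing import Sequence, Tuple


def p95(values: Sequence[float]) -> float:
    """95th percentile: the last of the 19 cut points statistics.quantiles
    returns for n=20 (needs at least two values)."""
    return statistics.quantiles(values, n=20)[18]


def cost_drift_alert(baseline: Sequence[float], recent: Sequence[float],
                     max_ratio: float = 1.5) -> Tuple[bool, float, float]:
    """Return (alert, baseline_p95, recent_p95); alert when the recent p95
    exceeds max_ratio times the baseline p95."""
    base, now = p95(baseline), p95(recent)
    return now > max_ratio * base, base, now
```

The same comparison applied to per-tool latency_ms values covers the latency-drift row.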

Operating Concord

Concord’s operating discipline (Chapter 18) is:

The operating cost of Concord is small relative to its value. The bounds cap every session at $2.00 (the session in the trace excerpt cost $0.43), and a session that produces a reviewed-and-approved commit on a real task is worth substantially more than that.

What Concord does not do

A worked example is only useful if its limits are also visible:

These limits are deliberate. Concord as it is (bounded, governed, single-purpose) is more useful in production than a Concord that tried to be everything.

Adapting Concord to your system

The reader adapting this worked example to their own domain should expect to change:

The architectural shape (bounding layer, governance pipeline, memory gateway, trace store, skill admission) does not change. That is the contribution of this book: the shape is portable; the specifics are configuration.

Summary

Concord is a bounded, governed, observable coding-assistant agent. Its architecture demonstrates how the patterns and disciplines developed in earlier chapters compose into a working system. The bounding spec (Chapter 5), the governance pipeline (Chapter 6), the memory architecture (Chapter 7), the orchestrator-worker control structure (Chapter 9), the Skills layer (Chapter 10), the failure-mode defenses (Chapter 11), and the trace discipline (Chapter 12) are all visible in this single example.

The artifacts in this chapter (the bounding YAML, the gateway and pipeline pseudocode, the policy table, the risk scorer, the skill manifests, the trace excerpt, and the failure-mode defense table) are templates the reader can adapt. The principle they realize is the principle the book opened with: probabilistic reasoning components are useful exactly insofar as their behavior can be bounded, governed, observed, and recovered from by deterministic infrastructure around them.

A production deployment of Concord would extend this skeleton with layers developed elsewhere in the book: an ingestion pipeline behind its semantic memory (Chapter 8), a trust-calibrating interface for its human reviewers (Chapter 13), and a model gateway at its network boundary (Chapter 15). The skeleton is the same; the production system has more of it.

Chapter 18 turns from designing the system to running it (deployment, cost, observability, and lifecycle), after which the Glossary and Annotated Bibliography close the book.