Chapter 11. Failure modes and anti-patterns

Part III turns from what to build to how production systems fail, how to test and observe them, and how they meet users, enterprises, and the model layer.

Most pattern catalogs describe what to build. Few describe what breaks. This chapter is about how agentic systems fail (the modes, the cascades, and the anti-patterns that produce them), because pattern selection and bound-setting are only defensible when the team has thought about what failure looks like.

The chapter has two sections. The first is a taxonomy of failure modes: the recurring ways agentic systems go wrong, regardless of the patterns or frameworks used. The second is a catalog of anti-patterns: structural choices that produce these failure modes reliably enough to be named.

The book treats this chapter as load-bearing. A team that can name the failure modes of its system has done meaningful architectural work; a team that cannot has not.

A note on method: the failure modes and anti-patterns below are derived from the published post-incident literature of 2024–2026, the security literature on prompt injection and the lethal trifecta, the safety literature on guardrails and red-teaming, and the operational reports of teams running agentic systems at scale. The book-level convention for unattributed incidents is stated in the Preface; this chapter’s catalog names public cases where they illustrate a mode directly. The entries are not exhaustive (new failure modes will emerge), but they cover the substantial majority of severe incidents documented through the time of writing. Treat the chapter as the current best taxonomy rather than as a closed set.

A taxonomy of failure modes

Failure modes are organized by where they originate in the architecture. Each entry names the symptom, the typical root cause, the cascade (how the failure propagates), and the structural defense (the chapter where the relevant pattern is developed).

Reasoning loop failures

Infinite or thrashing loops. The agent iterates without progress, often revisiting the same approach with cosmetic variations.

Premature termination. The agent declares success or gives up before the task is complete.

Plan corruption. The agent’s plan drifts from the goal across replanning steps, ending up pursuing something the user did not ask for.

State corruption. The agent’s working or episodic memory becomes inconsistent with the world.

Reasoning loop deadlock with critic. A generator–critic pair (reflection, evaluator-optimizer) enters a stable disagreement: the generator produces, the critic rejects, the generator produces a minor variant, the critic rejects again, indefinitely.
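The thrashing-loop and critic-deadlock modes above share a shape: the loop keeps running without progress. A minimal sketch of an externally enforced bound follows, assuming a hypothetical generate callable and accept callable (stand-ins for the generator and critic) and a deliberately crude notion of "no progress" (a rejected candidate that repeats after whitespace and case normalization). Real systems would likely need a richer progress signal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    output: Optional[str]
    iterations: int
    stopped_because: str  # "accepted", "max_iterations", or "stalled"


def run_bounded(generate: Callable[[int], str],
                accept: Callable[[str], bool],
                max_iterations: int = 5,
                stall_limit: int = 2) -> LoopResult:
    """Run a generator-critic loop under bounds enforced outside the model."""
    seen = set()
    repeats = 0
    for i in range(1, max_iterations + 1):
        candidate = generate(i)
        if accept(candidate):
            return LoopResult(candidate, i, "accepted")
        # Crude progress check: normalize, then compare with earlier rejects.
        key = " ".join(candidate.lower().split())
        if key in seen:
            repeats += 1
            if repeats >= stall_limit:
                return LoopResult(None, i, "stalled")
        seen.add(key)
    return LoopResult(None, max_iterations, "max_iterations")
```

The point of the sketch is that neither the generator nor the critic decides when to stop; the surrounding code does.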

Tool-use failures

Tool hallucination. The agent invokes a tool that does not exist: a fabricated tool name, or a call shaped to fit a tool the agent assumes is available but that was never exposed to it. (Malformed arguments to a real tool are a distinct failure, parameter hallucination, below.)

Parameter hallucination (type coercion). The agent calls a real tool but fabricates an argument that is not in the schema, or coerces a value to the wrong type: passing a JSON string where an integer is required, inventing an optional field, or supplying a plausible-but-wrong enum value.

Tool misuse. The agent invokes a real tool with arguments that are syntactically valid but semantically wrong.

Cascading tool failure. A tool call fails; the agent retries; the retry partially succeeds; the agent retries again with corrupted state.

Tool injection. A tool’s response contains content that, when read by the agent, alters the agent’s behavior against the user’s interest.

Tool quota exhaustion. The agent consumes an external tool’s quota or rate limit, affecting other workloads.
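Tool hallucination and parameter hallucination, as described above, can both be caught before dispatch by checking each call against a registry of exposed tools. The following is a minimal hand-rolled sketch, not a real schema library; the registry contents (get_invoice, invoice_id, currency) are hypothetical. It does nothing for tool misuse, where the call is well-formed but semantically wrong.

```python
from typing import Any

# Hypothetical registry: tool name -> {argument name: (expected type, required)}
TOOL_REGISTRY: dict[str, dict[str, tuple[type, bool]]] = {
    "get_invoice": {"invoice_id": (int, True), "currency": (str, False)},
}


def validate_call(tool: str, args: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    schema = TOOL_REGISTRY.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]  # tool hallucination
    errors = []
    for name in args:
        if name not in schema:
            errors.append(f"unexpected argument: {name}")  # invented field
    for name, (expected, required) in schema.items():
        if name not in args:
            if required:
                errors.append(f"missing argument: {name}")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    return errors
```

Rejecting the string "42" rather than silently coercing it is a design choice in this sketch: coercion at the boundary hides the signal that the model is guessing at the schema.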

Memory failures

Cross-identity leakage. Memory from one user, tenant, or session is surfaced to a different one.

Stale-semantic retrieval. The agent retrieves and confidently uses semantic memory entries that have been superseded.

Context exhaustion. The agent runs out of context window mid-task, often silently truncating critical material.

Memorization of sensitive content. Memory accumulates PII, secrets, or credentials.

Memory poisoning. An attacker, internal or external, writes to memory in a way that influences subsequent agent behavior.
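Cross-identity leakage, the first entry above, is easiest to prevent when the scope is part of every read and write rather than a filter applied afterwards. A toy sketch under that assumption follows; the ScopedMemory class and its substring search are illustrative stand-ins, not a real memory gateway.

```python
from collections import defaultdict


class ScopedMemory:
    """Toy in-memory store where every read and write carries a scope."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)

    def write(self, tenant_id: str, user_id: str, text: str) -> None:
        self._entries[(tenant_id, user_id)].append(text)

    def search(self, tenant_id: str, user_id: str, query: str) -> list[str]:
        # Only the caller's own partition is ever scanned.
        scope = self._entries.get((tenant_id, user_id), [])
        return [t for t in scope if query.lower() in t.lower()]
```

Because no method takes a query without a tenant and user, there is no code path through this interface that searches across identities.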

Coordination failures

Inter-agent injection. One agent’s output, treated as input by another, contains content that subverts the receiving agent.

Coordination deadlock. Two or more agents wait on each other and neither progresses.

Convergence failure (committee, debate, swarm). Multi-agent reasoning patterns do not converge in budget.

Orchestrator-worker misdelegation. The orchestrator delegates a task to a worker that lacks the necessary capabilities, but the worker proceeds as if it has them.
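Coordination deadlock, as described above, can be detected deterministically when the runtime records which agent is waiting on which. The sketch below assumes a hypothetical waits_on mapping in which each agent waits on at most one other agent; a general wait-for graph would need a full cycle search.

```python
def find_wait_cycle(waits_on: dict[str, str]) -> list[str]:
    """Return the agents forming a wait cycle, or [] if none is found.

    Assumes each agent waits on at most one other agent.
    """
    for start in waits_on:
        path = []
        current = start
        # Follow the chain until it ends or revisits an agent.
        while current in waits_on and current not in path:
            path.append(current)
            current = waits_on[current]
        if current in path:
            return path[path.index(current):]
    return []
```

A supervisor could run a check like this on a timer and break the cycle by cancelling one of the waiting agents, rather than relying on the agents to notice the deadlock themselves.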

Governance failures

Approval-fatigue collapse. Every action requires human approval; reviewers stop reading; approvals become rubber-stamping.

Ungoverned customer-facing output. The model emits pricing, policy, legal, or brand language to a customer with no policy gate, output validator, or approval path before display — chatbots treated as if text were costless when it can carry enforceable or reputational weight.

Policy drift. Policy gates written for an earlier version of the system no longer match current behavior.

Audit gap. The trace does not capture enough to reconstruct an incident.

Governance bypass via prompt injection. An attacker’s prompt content causes the agent to invoke actions that the governance layer would block, but the agent does so in a way the governance layer does not detect.

Operational failures

Cost explosion. The system runs over budget by an order of magnitude with no immediate visibility.

Latency explosion. Per-request latency drifts upward over time, often invisibly.

Drift. The system’s behavior changes over time without a clear cause: quality degrades, costs change, and failure rates rise, all without code changes.

Capacity collapse under load. The system performs well at normal load and collapses ungracefully at peak.

Cascades and compound failures

Failure modes do not occur in isolation. The patterns above interact in cascades that produce the worst incidents:

The runaway-loop cost cascade. An iteration-unbounded reasoning loop calls a per-call-expensive tool; the cost ledger is not aggregated across calls; the session spends an order of magnitude over budget before anyone notices. Defense: a bound on iterations, an aggregate cost ledger, and per-tool latency limits.
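Two of the defenses named for this cascade, the iteration bound and the aggregate cost ledger, can live in one small object that every tool call must pass through. This is a sketch under assumptions: the SessionLedger and BudgetExceeded names are hypothetical, and costs are taken as known before the call, which real tools may only be able to estimate.

```python
class BudgetExceeded(RuntimeError):
    pass


class SessionLedger:
    """Aggregates spend and iterations across every call in one session."""

    def __init__(self, max_cost_usd: float, max_iterations: int) -> None:
        self.max_cost_usd = max_cost_usd
        self.max_iterations = max_iterations
        self.spent_usd = 0.0
        self.iterations = 0

    def charge(self, tool: str, cost_usd: float) -> None:
        """Record one call; raise before the session can exceed either bound."""
        if self.iterations + 1 > self.max_iterations:
            raise BudgetExceeded(f"iteration bound reached before calling {tool}")
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost bound would be exceeded by {tool}")
        self.iterations += 1
        self.spent_usd += cost_usd
```

The check runs before the call rather than after it, so the session stops at the bound instead of discovering the overrun in a later report.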

The injection-exfiltration cascade. A retrieval index contains attacker-controlled content; the agent is allowed to act on retrieval content as if it were authoritative; the agent has sensitive data in memory; the agent is allowed to call external tools. The retrieval content instructs the agent to send memory contents to an attacker-controlled endpoint. This is the lethal trifecta — untrusted input, sensitive data access, and external action capability — as a chain of dominoes, each of which a different defense can topple. Johann Rehberger’s disclosure against GitHub Copilot Chat (June 2024; Embrace The Red; see Chapter 21) is a public instance: private repository context, attacker-influenced content in the same session, and an external fetch channel (Markdown image URLs) combined in one interface. Simon Willison catalogs the same lethal-trifecta pattern across products (Chapter 21).

Figure 8. Cascades and compound failures

Defense for the injection-exfiltration cascade is layered: governance, tool-response sanitization, action-surface restriction, and output validation, any one of which breaks the chain.
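The lethal trifecta can be checked as a property of a session's capabilities before any model call is made. The sketch below is illustrative only: the SessionCapabilities fields are hypothetical names for the three legs, and dropping external actions is just one of several possible restrictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    reads_untrusted_content: bool  # e.g. retrieval over attacker-writable sources
    holds_sensitive_data: bool     # e.g. private repository context
    can_act_externally: bool       # e.g. outbound fetches, rendered image URLs


def has_lethal_trifecta(caps: SessionCapabilities) -> bool:
    return (caps.reads_untrusted_content
            and caps.holds_sensitive_data
            and caps.can_act_externally)


def restrict(caps: SessionCapabilities) -> SessionCapabilities:
    """One possible policy: drop external actions when all three legs are present."""
    if has_lethal_trifecta(caps):
        return SessionCapabilities(caps.reads_untrusted_content,
                                   caps.holds_sensitive_data,
                                   can_act_externally=False)
    return caps
```

Removing any one leg is enough to break the chain; which leg to remove depends on what the session is for.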

The poisoned-memory cascade. An attacker submits content designed to be memorized; the memorization succeeds; subsequent agents retrieve the poisoned content and act on it; the influence is hard to attribute because it manifests sessions later. Defense: memory writes governed, attacker-controllable content not promoted to long-term memory, periodic audit of memory contents.

The over-trusted-orchestrator cascade. An orchestrator delegates to a worker; the worker is compromised (or behaves incorrectly); the orchestrator integrates the worker’s output as authoritative; the orchestrator then propagates the bad output to other workers or to the user. Defense: inter-agent governance, capability verification at delegation, output validation between agents.

The drift-incident cascade. The model is upgraded; behavior shifts; metrics drift slowly; no individual session looks bad; a long-tail incident lands months later from a behavior pattern that has been building. Defense: replay testing of new model versions, behavioral-envelope monitoring, willingness to roll back model versions.

Cascades are the reason defense in depth is the dominant architectural posture. No single bound, no single validator, no single check is reliable. The system survives because multiple independent defenses each have to fail for the cascade to reach the user.

Anti-patterns

An anti-pattern is a structural choice that reliably produces the failure modes above. Each entry below names the anti-pattern, describes its appeal (why teams choose it), names what it produces, and offers the structural alternative.

“The agent is the system”

Description. The architecture treats the agent as the whole system. Tools are exposed directly. State lives in the conversation. There is no bounding layer, no governance layer, no memory architecture beyond what the model maintains in context.

Appeal. Simple to set up. Looks like a clean architecture diagram. Performs well in demos.

Produces. Cost explosions (no bound), severe incidents (no governance), state corruption (no managed memory), cross-identity leakage (no scoping), drift (nothing to compare against).

Alternative. The agent is a component inside deterministic infrastructure (Chapter 2). Architecture is around the agent, not under it.

“Agent where a workflow would do”

Description. An open-ended reasoning loop is deployed on a task whose steps are known at design time — a scheduled report, a fixed ETL sequence, a routing decision with a closed choice set — where a workflow, a cron job, or a simple router would suffice.

Appeal. The demo is impressive. Point-and-click agent builders make the agent the path of least resistance. The team already has the model API key; wiring a loop feels faster than modeling the workflow.

Produces. Non-determinism where none was needed; a debugging and oversight surface the task never required; cost per run often an order of magnitude above the workflow equivalent. The failure is one step earlier than Multi-agent for the sake of it (below): the question is not how many agents, but whether an agent was the right category at all.

Alternative. The category test from Chapter 2: is the choice set open? The decision table in Chapter 9; Anthropic’s workflow patterns (Chapter 21) for the canonical shapes. If the steps are known, encode them.

“Prompt-based governance”

Description. Safety, policy, and bounds are expressed in the agent’s system prompt: “do not do X,” “always validate Y,” “stop after Z iterations.”

Appeal. Trivial to implement. No new infrastructure. Apparent compliance.

Produces. Real-world prompt-injection success; real-world incidents in which the agent did exactly the thing the prompt told it not to. The prompt is a recommendation; the model can ignore it.

Alternative. Governance is deterministic code (Chapter 6); bounds are externally enforced (Chapter 5). The argument for why prompt-based enforcement fails at the model layer is made once, in Chapter 6; this catalog entry exists only to name the shape so a reviewer can recognize it. The bounding-layer anti-pattern prompt-based bounding (Chapter 5) is the same mistake applied to quantitative limits. Prompts express preferences; architecture enforces constraints.
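As a contrast with prompt-based rules, a deterministic gate evaluates each proposed action in code before it executes. The sketch below is a minimal illustration; the ProposedAction type, the blocked tool, and the refund limit are hypothetical policy values, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    amount_usd: float = 0.0


# Hypothetical policy values, chosen only for illustration.
BLOCKED_TOOLS = {"delete_account"}
MAX_REFUND_USD = 100.0


def policy_gate(action: ProposedAction) -> tuple[bool, str]:
    """Decide in code, outside the model, whether an action may execute."""
    if action.tool in BLOCKED_TOOLS:
        return False, "tool is blocked by policy"
    if action.tool == "issue_refund" and action.amount_usd > MAX_REFUND_USD:
        return False, "refund exceeds limit"
    return True, "allowed"
```

Whatever the model was told or talked into, the gate returns the same answer for the same action, which is the property a system prompt cannot provide.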

“Multi-agent for the sake of it”

Description. A system decomposed into multiple agents (researcher, planner, executor, critic, archivist) where a single agent with tools would suffice.

Appeal. Elegant on the diagram. Sounds modern. Frameworks promote it.

Produces. Multiplied failure modes; harder debugging; coordination overhead; more attack surface (inter-agent injection, with the substrate treatment in Chapter 14); cost.

Alternative. Single agent with tools. Orchestrator–worker when the task genuinely decomposes. First ask whether an agent is needed at all — see Agent where a workflow would do (above). The argument that multi-agent coordination is over-prescribed is made in Chapter 3 and Chapter 9; this entry names the shape only so a reviewer can recognize it. Peer multi-agent is justified only when the problem has dialectical structure.

“Self-policing autonomy”

Description. A critic agent monitors a worker agent’s cost, iteration, or risk. Or the agent itself is asked to determine when it should stop.

Appeal. Apparently sophisticated. Allows the architecture to claim oversight without committing to deterministic bounds.

Produces. No real bound. The critic has the same failure modes as the worker. Adversarial inputs can subvert both.

Alternative. Bounds enforced by deterministic code (Chapter 5). A critic agent is fine as one of multiple layers; it is not a substitute for the deterministic bound.

“Conversation history as memory”

Description. The agent’s memory is whatever the model recalls from the running conversation. No separate memory store, no retrieval, no scoping.

Appeal. Easy. Works for a single session of a single user.

Produces. Context exhaustion; no cross-session continuity; no scoping; eventual quality erosion; identity-leakage risks when sessions are reused.

Alternative. Tiered memory architecture (Chapter 7) with explicit working / episodic / semantic stores and a gateway.

“Vector index as semantic memory”

Description. Documents are ingested into a vector store; the index is treated as the system’s source of truth.

Appeal. Easy to build. Looks like RAG.

Produces. Confidently surfaced stale or wrong material; no notion of freshness; no curation; no retirement of superseded entries.

Alternative. Curated semantic memory (Chapter 7) with explicit ingestion processes, freshness tracking, and authority signals.
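Freshness tracking and retirement of superseded entries can be expressed as a filter applied to retrieval results before they reach the agent. A minimal sketch follows, assuming hypothetical reviewed_on and superseded_by fields on each entry; real curation would also involve authority signals, which this example omits.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class MemoryEntry:
    entry_id: str
    text: str
    reviewed_on: date
    superseded_by: Optional[str] = None


def usable(entries: list[MemoryEntry], today: date,
           max_age_days: int) -> list[MemoryEntry]:
    """Keep entries that are neither superseded nor past their review window."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries
            if e.superseded_by is None and e.reviewed_on >= cutoff]
```

A bare vector index has no place to store either field, which is why it keeps surfacing stale material with full confidence.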

“Text-to-SQL over raw schemas”

Description. The model is handed a raw database schema and asked to write analytical queries directly against it.

Appeal. It appears to remove the need to model anything; the agent simply answers questions about the data.

Produces. Structural hallucination: a syntactically valid query that passes the schema validator and executes cleanly but encodes a guessed business rule (what counts as an active customer, whether revenue is gross or net of tax) that the organization defines precisely elsewhere. The number is confidently wrong, and there is no malformed output to catch it.

Alternative. Remove raw queries from the action surface and route analytics through a semantic layer that compiles governed metric definitions (Chapter 14). The agent chooses the metric; the deterministic layer owns its definition.
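The division of labor in this alternative (the agent picks a metric name, deterministic code owns the SQL) can be sketched in a few lines. The metric names, table, and column names below are hypothetical, and a production semantic layer would do far more than a dictionary lookup.

```python
# Hypothetical governed metric definitions, owned by the data team
# rather than generated by the agent.
METRICS = {
    "active_customers": (
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE order_date >= :start AND order_date < :end"
    ),
    "net_revenue": (
        "SELECT SUM(amount - tax) FROM orders "
        "WHERE order_date >= :start AND order_date < :end"
    ),
}


def compile_metric(metric: str, start: str, end: str) -> tuple[str, dict[str, str]]:
    """Return governed SQL plus bound parameters for a named metric."""
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")
    return METRICS[metric], {"start": start, "end": end}
```

The agent's only free choices are the metric name and the date range, so a guessed definition of "active customer" has nowhere to enter the query.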

“Tools-as-permission-boundary”

Description. The team trusts the schema validation at the tool boundary as the complete defense. If the call is well-formed, it proceeds.

Appeal. Validators are real and useful; treating them as the boundary is appealing.

Produces. Semantic misuse (well-formed but wrong); policy violations the schema doesn’t catch; insufficient defense against tool-response injection.

Alternative. Defense in depth (Chapter 6): schema + policy + risk-based escalation + reversibility envelope. The schema is one of several layers, not the only one.

“Skills as escape hatch”

Description. Skills are admitted without the same governance applied elsewhere; the system trusts skills because they are documented procedures.

Appeal. Skills are useful, and gatekeeping them feels redundant.

Produces. Prompt injection at the skill layer; trust transfer from the curator of the skill to the agent’s full authority; lethal-trifecta vulnerabilities introduced via untrusted skill content.

Alternative. Skills are subordinate to architecture (Chapter 10). Skill admission and activation are governed.

“Approval as compliance theater”

Description. Human-in-the-loop is added for compliance reasons, with high volume and low-context approvals; reviewers approve quickly without reading.

Appeal. Satisfies auditors. Demonstrates oversight on paper.

Produces. No real review; eventual incident; loss of meaningful human oversight as a tool.

Alternative. Risk-based escalation (Chapter 6) routing only the actions that genuinely warrant review; review with structured context (the proposed diff, the trace, the risk score); review SLA monitoring.
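Risk-based escalation can be sketched as a score computed in code and a threshold that decides which actions reach a human. The Action fields, the weights, and the threshold below are illustrative assumptions; a real scoring scheme would be calibrated against the system's own incident history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    amount_usd: float


def risk_score(action: Action) -> int:
    """Toy additive score; the weights are illustrative assumptions."""
    score = 0
    if not action.reversible:
        score += 2
    if action.amount_usd >= 1000:
        score += 2
    elif action.amount_usd >= 100:
        score += 1
    return score


def route(action: Action, review_threshold: int = 3) -> str:
    """Send only high-risk actions to a reviewer; approve the rest automatically."""
    if risk_score(action) >= review_threshold:
        return "human_review"
    return "auto_approve"
```

Keeping the review queue small is what lets each review carry real context, which is the opposite of the rubber-stamping this anti-pattern produces.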

“Pattern-as-architecture”

Description. The system is designed around one pattern (ReAct, Plan–Execute, Reflection) treated as the architectural answer.

Appeal. The pattern’s elegance carries the design.

Produces. Systems with no bounding discipline, no governance, and no memory architecture; the patterns sit on nothing. The pattern is fine; the absence of architecture under it is the problem.

Alternative. Architecture first (Chapters 5–10), patterns within it. Patterns answer how the agent reasons or coordinates; architecture answers how the system stays reliable.

“Optimism in production”

Description. The team treats incidents as bugs to be fixed rather than as expected behavior to be designed for.

Appeal. Cultural. Easy to slip into when the system is new and incidents are rare.

Produces. Each incident causes panic. Trace discipline is set up after the first incident, not before. Anti-patterns accumulate.

Alternative. This chapter. Treat failure modes as part of the design surface from the start.

“Vibe-driven evaluation”

Description. The team validates the agent by chatting with it in a UI for ten minutes. If it looks good, they ship. There is no assertion suite, no trace replay, and no property tests; evaluation is a human forming an impression (“LGTM”).

Appeal. Fast, requires no infrastructure, and feels like testing because a human looked at the output.

Produces. Regression blindness. The model is upgraded, or a prompt is edited, and a previously working tool surface silently breaks. No test fails, because there are no tests. The failure is discovered in production, by users.

Alternative. Deterministic envelope assertions and property-based tests over traces (Chapter 12). Manual inspection is a supplement to an automated suite, never the suite itself.
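A deterministic envelope assertion is a check that holds for every acceptable trace regardless of what the model said. The sketch below assumes a hypothetical trace format (a list of event dictionaries with type and tool keys); it is not the trace schema developed in Chapter 12, only an illustration of the shape such a check might take.

```python
from typing import Any


def check_envelope(trace: list[dict[str, Any]],
                   allowed_tools: set[str],
                   max_tool_calls: int) -> list[str]:
    """Return envelope violations found in a recorded trace (empty if none)."""
    violations = []
    calls = [event for event in trace if event.get("type") == "tool_call"]
    if len(calls) > max_tool_calls:
        violations.append(f"too many tool calls: {len(calls)} > {max_tool_calls}")
    for event in calls:
        if event.get("tool") not in allowed_tools:
            violations.append(f"tool outside envelope: {event.get('tool')}")
    if not trace or trace[-1].get("type") != "final_answer":
        violations.append("trace does not end with a final answer")
    return violations
```

Run over a stored set of traces after every model upgrade or prompt edit, a check like this fails loudly where a ten-minute chat session would notice nothing.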

“Stack-trace mindset on probabilistic systems”

Description. When an incident lands, the team looks for the line of code that caused it. They find none, because the cause is a combination of model behavior, prompt, retrieved content, and tool responses that no single line can capture.

Appeal. It is how engineers debug deterministic systems. Hard to unlearn.

Produces. MTTR (mean-time-to-resolution) bloat: a failure that takes ten minutes to localize in a deterministic microservice can take days in an agentic system, because the cause is a distributed collapse across prompt, retrieved content, tool responses, and model weights rather than a single line. Without trace replay, root-cause analyses come back incomplete and the same incident recurs.

Alternative. Trace replay (Chapter 12) as primary debug tool. Reconstruct the session. Modify counterfactually. Identify the cascade rather than the single point.

Closing note

The anti-patterns above are not exhaustive, but they cover the majority of severe incidents observed in production agentic systems across 2024–2026. Each names a structural choice that seemed reasonable at the time and turned out to undermine the architecture rather than support it.

Chapter 12 takes up trace discipline, testing, and evaluation as the operational counterpart to the structural commitments developed in earlier chapters.