Chapter 13. The glass layer: UI and interaction architecture
In a conventional web application the user interface is a presentation layer. If the backend enforces its invariants, the system is correct no matter how the frontend renders the data; the UI can be redesigned without touching the security model. Agentic systems break this separation. Chapters 5 and 6 established that much of an agentic system’s reliability rests on the reversibility envelope and on human-in-the-loop (HITL) approval gates. The moment a human sits inside the control loop, the surface that human reads and acts through becomes part of the enforcement mechanism. A gate that a reviewer rubber-stamps is not a gate.
This chapter treats the glass layer — a term this book coins for the user interface and client-side interaction state when that surface is load-bearing enforcement rather than presentation — as architectural structure, not a cosmetic wrapper. When the UI hides the agent’s reasoning, buries the risk, or makes approval a reflex, the governance layer has not failed in the backend; it has failed at the glass. The governing commitment of the chapter is simple to state and demanding to honor: if the human is the final policy gate, the UI is the policy engine.
The streaming-versus-validation paradox
The first architectural conflict in agentic UI sits between user experience and governance, and it has no purely cosmetic resolution.
Users expect a model to stream its output token by token; a time to first token (TTFT) much above two seconds reads as broken. Chapter 6 requires that output pass through schema validators and policy gates (data-loss prevention, toxicity checks, grounding checks) before it reaches the user. These requirements collide. A regular-expression scrubber, a schema validator, and a model-based policy judge all need a complete unit of output to evaluate. Buffer the whole response to validate it and streaming dies: a one-second first token becomes a fifteen-second wait. Stream straight to the user to preserve the experience and a policy-violating span renders on screen before any gate has seen it.
The glass layer resolves the conflict in one of two ways, and the choice is an explicit architectural decision with a regulatory dimension. The two options trade latency against exposure risk, and the right choice turns on whether a brief flash of policy-violating content is itself the harm.
| Option | Time to first token | Exposure risk | When to use |
|---|---|---|---|
| Chunked buffering | Modest increase (a gate runs per semantic unit) | None — no unvalidated text is ever rendered | Regulated settings where a momentary flash of protected data is a reportable disclosure; the correct default there |
| Optimistic streaming with rollback | Ideal (tokens stream immediately as provisional output) | Brief — a violating span is visible before retraction | Low-stakes, high-interactivity surfaces; wrong wherever a brief exposure is the harm |
In chunked buffering, the backend accumulates output into semantic units (a complete sentence, a complete structured field) and runs the policy gate on each unit; a unit that passes is flushed over server-sent events or a socket, and a unit that fails never leaves the server. In optimistic streaming, the client holds an explicit state machine over the rendered text: provisional text is rendered in a visibly distinct treatment, commits to its normal appearance when the trailing gate clears, and is replaced by a redaction block with a short explanation when the gate fails. The state machine is the price this option pays for its latency.
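The chunked-buffering path can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production gate: the sentence splitter is deliberately naive, `no_ssn` is a hypothetical stand-in for a real data-loss-prevention check, and replacing a failed unit with a fixed `WITHHELD` notice is one possible design choice that the text above does not prescribe.

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical gate signature: True means the unit may be shown to the user.
Gate = Callable[[str], bool]

# Naive sentence boundary: terminal punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")

# Assumption: a failed unit is replaced by a fixed notice, never by its text.
WITHHELD = "[withheld by policy]"

# Illustrative stand-in for a DLP check (SSN-shaped digit groups only).
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def no_ssn(unit: str) -> bool:
    """Pass a unit only if it contains nothing shaped like an SSN."""
    return _SSN.search(unit) is None


def chunked_stream(tokens: Iterable[str], gate: Gate) -> Iterator[str]:
    """Accumulate tokens into sentences and release only gated units."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush every complete sentence currently sitting in the buffer.
        while (match := _SENTENCE_END.search(buffer)) is not None:
            unit, buffer = buffer[: match.end()], buffer[match.end():]
            yield unit if gate(unit) else WITHHELD
    if buffer:
        # The stream ended mid-sentence: the remainder is the last unit.
        yield buffer if gate(buffer) else WITHHELD
```

The first token reaches the user only after the first sentence closes and clears the gate, which is the modest latency cost the table describes; no unvalidated text is ever yielded.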
Figure: Optimistic streaming state machine. Each streamed span cycles through provisional, committed, and redacted render states; only a passing trailing gate commits text to its final appearance.
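The three render states can be written down as a small state machine. The sketch below is illustrative only: the class and method names are hypothetical, the tilde markers stand in for whatever visual treatment a real client applies to provisional text, and discarding the violating text from client state on redaction is an assumption rather than something the chapter specifies.

```python
from dataclasses import dataclass
from enum import Enum


class SpanState(Enum):
    PROVISIONAL = "provisional"  # streamed, not yet cleared by the gate
    COMMITTED = "committed"      # trailing gate passed
    REDACTED = "redacted"        # trailing gate failed


@dataclass
class Span:
    text: str
    state: SpanState = SpanState.PROVISIONAL
    reason: str = ""

    def apply_verdict(self, passed: bool, reason: str = "") -> None:
        """Move a provisional span to its terminal state, exactly once."""
        if self.state is not SpanState.PROVISIONAL:
            raise ValueError(f"span already {self.state.value}")
        if passed:
            self.state = SpanState.COMMITTED
        else:
            self.state = SpanState.REDACTED
            self.text = ""  # assumption: drop the violating text entirely
            self.reason = reason

    def render(self) -> str:
        if self.state is SpanState.PROVISIONAL:
            return f"~{self.text}~"  # stand-in for a distinct visual style
        if self.state is SpanState.COMMITTED:
            return self.text
        return f"[redacted: {self.reason}]"
```

Both terminal states are final: a committed span cannot later be redacted through this interface, and a redacted span cannot be restored.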

One boundary is worth keeping distinct from the model gateway’s egress filter (Chapter 15): this gate enforces Chapter 6 on generated output at the moment of display — on its way to the user, not on prompts on their way to the provider. Chapter 15 develops the distinction in full.
Trace progressive disclosure
Chapter 12 established the structured trace as the system of record: every thought, bound check, policy decision, and tool call is logged. The glass layer must decide how much of that record to show, and the two easy answers are both wrong. Hiding the trace entirely, a chat box that shows a spinner for thirty seconds, breeds blind trust and frustration. Dumping the trace raw, a firehose of debug events, breeds the alert fatigue that makes a reviewer stop reading. The architecture’s job is progressive disclosure: mapping granular backend events to a small set of human-legible milestones, with detail available on demand but not by default.
| Backend trace event | Client rendering |
|---|---|
| Session start | Initialization state |
| Agent thought | Hidden behind an expandable reasoning toggle |
| Skill activated | “Loaded project guidelines” (shows the context is grounded) |
| Action proposed | “Preparing to search the codebase…” |
| Tool call, slow | “Searching 450 files…” (surfaces action latency) |
| Policy gate, deny | “Action blocked by policy: diff contains secrets” |
| Approval requested | Halts the stream; renders the approval component |
The client subscribes to the event stream and folds it into a single task-state object that drives the rendering. The point of surfacing tool calls and passing gates is not decoration; it changes the user’s mental model from chatting with an oracle to supervising a junior colleague, and a supervisor reads before approving where a chatter does not. Crucially, the mapping is deterministic: the UI state derives from typed trace events, never from text the model emitted about its own progress.
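One way to picture the fold is as a reducer over typed events. The sketch below assumes hypothetical event kinds (`session_start`, `agent_thought`, and so on) and a hypothetical `TaskState` shape; the actual trace schema belongs to Chapter 12 and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TraceEvent:
    kind: str         # hypothetical event type name
    detail: str = ""


@dataclass
class TaskState:
    status: str = "idle"
    milestones: List[str] = field(default_factory=list)
    hidden_reasoning: List[str] = field(default_factory=list)
    pending_approval: Optional[str] = None


def apply_event(state: TaskState, event: TraceEvent) -> TaskState:
    """Update the task state from one typed event; never parse model prose."""
    if event.kind == "session_start":
        state.status = "initializing"
    elif event.kind == "agent_thought":
        # Kept, but only shown behind an expandable toggle.
        state.hidden_reasoning.append(event.detail)
    elif event.kind == "skill_activated":
        state.milestones.append(f"Loaded {event.detail}")
    elif event.kind == "action_proposed":
        state.status = "running"
        state.milestones.append(f"Preparing to {event.detail}...")
    elif event.kind == "policy_deny":
        state.milestones.append(f"Action blocked by policy: {event.detail}")
    elif event.kind == "approval_requested":
        state.status = "awaiting_approval"
        state.pending_approval = event.detail
    # Unknown kinds are ignored rather than guessed at.
    return state


def fold(events: Iterable[TraceEvent]) -> TaskState:
    state = TaskState()
    for event in events:
        apply_event(state, event)
    return state
```

Because the rendering depends only on the event sequence, the same log always produces the same view.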
Approval queues and structured friction
When a risk score crosses a threshold or an action breaches the reversibility envelope (Chapter 5), the backend suspends the agent and requests human approval. If the UI presents this as a dialog reading “The agent wants to run a database migration. Cancel / Approve,” the architecture has already lost: a reviewer trying to make progress will approve to unblock themselves, every time. The approval surface must impose structured friction, deliberate design that forces cognitive engagement before the action can execute.
An approval component carries five commitments.
- Project the reversibility envelope. State the stakes in plain terms: whether the action mutates production data and whether it can be rolled back automatically.
- Render the payload, not the prompt. Show the exact deterministic artifact that will execute, a code diff, the literal tool arguments, never the agent’s prose summary of its intent. A reviewer approving a summary is approving a paraphrase.
- Surface the risk score and its triggers. Display the backend’s risk score and name the policies that fired (“risk 45 of 50; trigger: diff exceeds 500 lines”), so the reviewer knows why the action was held.
- Allow the payload to be edited. When the agent proposes a large change with one mistake, forcing a full reject-and-reprompt cycle is the friction that trains reviewers to approve blindly instead. Let the human edit the tool arguments in place and approve the corrected version; the reviewer becomes a participant in the control loop rather than a gate on it.
- Require explicit acknowledgment for the highest risk. For irreversible actions, keep approval disabled until the reviewer checks boxes corresponding to the specific risks (“I have verified this query contains no personal data”). The pause is the point.
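The five commitments translate into a fairly small piece of client state. The following sketch is hypothetical throughout: the field names, the reading of “45 of 50” as a score against a scale, and the rule that editing the payload clears earlier acknowledgments are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ApprovalRequest:
    payload: Dict[str, str]   # the literal tool arguments, not a summary
    reversible: bool          # projected from the reversibility envelope
    risk_score: int
    risk_out_of: int          # assumption: the scale the score is read against
    triggers: List[str]
    required_acks: List[str] = field(default_factory=list)
    checked_acks: Set[str] = field(default_factory=set)

    def risk_line(self) -> str:
        """One-line explanation of why the action was held."""
        return (f"risk {self.risk_score} of {self.risk_out_of}; "
                f"trigger: {', '.join(self.triggers)}")

    def acknowledge(self, ack: str) -> None:
        if ack not in self.required_acks:
            raise KeyError(ack)
        self.checked_acks.add(ack)

    def edit_payload(self, key: str, value: str) -> None:
        """Let the reviewer correct an argument in place."""
        self.payload[key] = value
        # Assumption: acknowledgments referred to the old payload, so reset.
        self.checked_acks.clear()

    @property
    def can_approve(self) -> bool:
        return set(self.required_acks) <= self.checked_acks

    def approve(self) -> Dict[str, str]:
        """Return the exact payload to execute, or refuse."""
        if not self.can_approve:
            raise PermissionError("required acknowledgments are incomplete")
        return dict(self.payload)
```

A real system would presumably re-run the backend gates on an edited payload before executing it; that step is outside this sketch.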
Steering and interruption
Approval is the human acting on an action the agent has proposed. The harder case is the human acting on an agent that is already running, watching the trace unfold and judging, before any gate fires, that the agent has misunderstood the goal. The glass layer is the surface for that intervention, and it supports two forms of it: stopping the agent and steering it.
Stopping is the break-glass case. A user who sees the agent confidently editing the wrong module wants it to halt now, before it spends another dollar or touches another file. Closing the browser tab does not achieve this: the backend loop keeps reasoning, spending, and acting, oblivious to the absent observer (Chapter 9). The glass layer must therefore expose an explicit stop control, present throughout a run rather than buried in a menu, wired to the deterministic interrupt signal that terminates the loop, severs the open model connection, and cancels in-flight tools. The interface commitment mirrors the one for approval: the stop is confirmed by a typed session trace event rendered as a terminal state, not by an optimistic spinner, and any partially completed action sequence shows its compensating rollback, so the user can see the system was left consistent rather than abandoned mid-mutation.
Steering is the softer and more common intervention. Often the supervisor does not want to abort but to correct: “you are editing the staging config, not production”; “skip the integration tests, they are known to be flaky today.” The naive implementation injects the correction as another conversational turn and hopes the model attends to it. The architectural implementation treats steering as a deterministic control event: the correction is captured as a typed message, injected into the agent’s working memory as a high-priority observation at the next loop boundary (Chapter 7), and recorded in the trace, so that the agent’s subsequent reasoning demonstrably incorporates it and the intervention is auditable after the fact. Steering redirects the agent within its existing limits; it cannot widen them. A mid-run correction is still subject to the bounding layer (Chapter 5), so a human cannot use the steering channel to push the agent past a cost ceiling or onto a tool outside its action surface. The same discipline that prevents the agent from expanding its own authority prevents a hurried operator from doing it on the agent’s behalf.
The unifying principle is the one that governs the rest of the glass layer: real-time human input is a stream of typed control events with defined semantics and trace records, never free text the model is trusted to interpret correctly. Intervention is architecture, not a chat message.
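As a rough sketch of what typed control events might look like, the example below models stop and steer as two event types handled at a loop boundary. Every name here is hypothetical, including the `requested_tool` field used to show the bounding check and the trace record labels; the real interrupt signal and bounding layer are described in Chapters 9 and 5.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Stop:
    reason: str


@dataclass(frozen=True)
class Steer:
    correction: str
    requested_tool: str = ""  # hypothetical: a tool the correction asks for


ControlEvent = Union[Stop, Steer]


@dataclass
class AgentLoop:
    allowed_tools: FrozenSet[str]  # the action surface set by the bounds
    observations: List[str] = field(default_factory=list)
    trace: List[Tuple[str, str]] = field(default_factory=list)
    running: bool = True

    def handle(self, event: ControlEvent) -> None:
        """Apply one control event at a loop boundary and record it."""
        if isinstance(event, Stop):
            self.running = False
            self.trace.append(("stop", event.reason))
        elif isinstance(event, Steer):
            tool = event.requested_tool
            if tool and tool not in self.allowed_tools:
                # Steering may redirect the agent but not widen its limits.
                self.trace.append(("steer_rejected", event.correction))
                return
            # High priority: the agent reads this observation first.
            self.observations.insert(0, event.correction)
            self.trace.append(("steer_accepted", event.correction))
```

Accepted and rejected corrections both leave a trace record, so the intervention stays auditable either way.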
Trust calibration and seams
A conventional interface is designed to feel fast and authoritative. An agentic interface must do the opposite where it counts: it must show its seams, because an interface that looks too polished invites the user to over-trust a probabilistic system underneath it. The glass layer continuously calibrates the user’s trust to the agent’s actual reliability.
Three mechanisms do most of this work. Grounding requires that every factual claim the UI renders carry a deterministic, clickable citation back to the specific source chunk in semantic memory (Chapter 8); a sentence the backend cannot ground is marked as unverified synthesis rather than presented as fact. Surfacing alternatives exposes the paths the agent considered and discarded; where a cognitive pattern (Chapter 4) explored several branches, naming the runner-up and why it lost calibrates trust better than presenting the chosen path as inevitable. And deterministic interaction tools replace prompt-driven formatting: rather than instructing the model to phrase a question for the user, the agent invokes a request-for-input tool whose typed event the UI renders as a structured form, so the interaction surface never depends on the model emitting the right text.
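The grounding rule in particular reduces to a simple rendering decision. A minimal sketch, assuming a hypothetical `Claim` type whose `source_chunk_id` is filled in by the backend and never by the model’s text:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    text: str
    # Hypothetical identifier of the supporting chunk in semantic memory;
    # None means the backend could not ground the sentence.
    source_chunk_id: Optional[str] = None


def render_claim(claim: Claim) -> str:
    """Attach a citation when one exists; otherwise mark the seam."""
    if claim.source_chunk_id is None:
        return f"{claim.text} [unverified synthesis]"
    return f"{claim.text} [source: {claim.source_chunk_id}]"


def render_answer(claims: List[Claim]) -> str:
    return " ".join(render_claim(claim) for claim in claims)
```

In a real client the bracketed markers would be a link and a visual treatment rather than inline text; the point is only that the grounded and ungrounded cases render differently by construction.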
Asynchronous hydration
Enterprise agentic workflows can run for hours, waiting on rate limits, slow external tools, or a human in an approval queue, so the glass layer cannot assume a browser tab stays open on a live socket for the duration. The client must treat a session as an asynchronous, durable task rather than a connection.
The architecture follows from that premise. The client subscribes to a session identifier rather than holding a connection. When the user closes the laptop and the agent is waiting on an external system, the backend hydrates the agent’s state to durable storage and sleeps it (Chapter 5). When the user returns, minutes or days later, the client fetches the trace from the trace store, replays the event log through the same client-side state machine that drives live rendering, and reconstructs the progressive-disclosure view exactly where it left off. The presentation layer is ephemeral; the reasoning loop and its trace are durable; replay is what connects them.
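The replay step can be illustrated with an in-memory stand-in for the trace store. Everything named below (`TraceStore`, `Client`, the `(kind, detail)` event shape) is hypothetical; the one property the sketch is meant to show is that live rendering and rehydration run through the same reducer and therefore agree.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # hypothetical shape: (kind, detail)


class TraceStore:
    """In-memory stand-in for the durable trace store."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[Event]] = {}

    def append(self, session_id: str, event: Event) -> None:
        self._logs.setdefault(session_id, []).append(event)

    def fetch(self, session_id: str) -> List[Event]:
        return list(self._logs.get(session_id, []))


def reduce_view(view: List[str], event: Event) -> List[str]:
    """The single reducer used for both live events and replay."""
    kind, detail = event
    if kind == "milestone":
        return view + [detail]
    if kind == "approval_requested":
        return view + [f"Waiting for approval: {detail}"]
    return view


class Client:
    def __init__(self, store: TraceStore, session_id: str) -> None:
        self.store = store
        self.session_id = session_id  # a session id, not a connection
        self.view: List[str] = []

    def on_live_event(self, event: Event) -> None:
        self.view = reduce_view(self.view, event)

    def rehydrate(self) -> None:
        """Rebuild the view from the stored log after a reconnect."""
        self.view = []
        for event in self.store.fetch(self.session_id):
            self.view = reduce_view(self.view, event)
```

A client that was never connected while the events arrived ends up with the same view as one that watched them live.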
Testing the glass layer
The discipline that makes the glass layer structural also makes it testable. Because interaction state derives from typed trace events rather than from the model’s text, the client is a deterministic function of an event log, and an event log is a fixture. A recorded trace can be replayed through the client state machine to assert that the rendered milestones match the events, that a failing trailing gate drives a provisional span to its redacted state, and that the approval action stays disabled until the required acknowledgments are checked. None of these tests invoke a model; they are ordinary assertions over a state machine (Chapter 12). A UI whose state were parsed from model output could not be tested this way, which is a second reason, beyond correctness, to drive it from the trace.
Anti-patterns at the glass
The magic chat box. A surface consisting of a text input and a streaming output, hiding every tool call, retrieval, and gate. It produces frustration when a response takes forty-five seconds with no visible progress, and blind trust in whatever finally appears.
The looks-good-to-me button. An approval flow that defaults to approve, takes one click, and never shows the execution payload. It nullifies the backend reversibility envelope by moving the decision to a reviewer who has seen nothing to decide on.
Prompt-rendered UI state. Instructing the model to emit its own interface state, for example “begin your reply with a waiting marker if you need input.” The model will eventually hallucinate the marker, or omit it. UI state must be driven by deterministic backend trace events, never parsed from the model’s text.
Summary
The glass layer is the physical manifestation of the system’s governance and bounded autonomy, not a skin over them. It resolves the streaming-versus-validation paradox through chunked buffering or optimistic rollback; it translates raw traces into progressive disclosure so the user supervises rather than chats; and it imposes structured friction on approvals so a human gate stays a real gate. By driving every interaction-state decision from deterministic trace events rather than model-generated text, the architecture keeps the human a reliable, structural component of the system’s safety posture. Chapter 14 turns outward, to the problem of embedding these systems inside the legacy, multi-tenant enterprise platforms where most of them will actually live.