2026-05-22
Lyrikai:Research
Vol. 01 · L1
Research · L1

When Code Agents Silently Corrupt Their Own State

Code agents fail in ways that look like success. A tool gets called correctly, the response parses, and the agent moves forward—but the internal state has drifted, the context window has lost critical information, or a tool chain has created a cascading inconsistency that won't surface until three steps later. Microsoft and arXiv researchers have documented these failure modes, but production teams have almost no systematic way to catch them before deployment. The problem isn't that agents are unreliable; it's that their failures are often invisible until they matter most.

The most counterintuitive failure in agentic systems isn't hallucination or refusal—it's plausible continuation. An agent receives a tool response, updates its internal state, and proceeds with an assumption that's subtly wrong. Nobody throws an error. The agent doesn't fail; it confidently walks off a cliff it can't see.

Microsoft's taxonomy of agentic failure modes and the arXiv paper "Characterizing Faults in Agentic AI: A Taxonomy of Types" both categorize this: context loss, state corruption, and cascading errors form a distinct failure family separate from model hallucination. Here's what distinguishes them: a hallucinating agent makes something up. A state-corrupted agent believes something false because an earlier step lost or misinterpreted information. By the time the failure surfaces, the agent has built decisions on that foundation.

Latitude.so's production guide identifies six specific failure modes in deployed agents: tool misuse, context loss, goal drift, retry loops, cascading errors, and multi-agent orchestration failures. Of these, four—context loss, cascading errors, goal drift, and multi-agent failures—are silent by default. An agent can make all of them and still produce output that looks structurally correct. O'Reilly's radar article on agentic composition notes the core problem explicitly: "Once agents are wired together without validation, the composition layer becomes a silent failure surface. Each agent's output is another agent's input assumption."

Tool chaining is where this breaks most visibly. Imagine an agent that needs to fetch a user's account balance, then initiate a transfer, then log the action. If the fetch call succeeds but returns null (due to a permission boundary the agent doesn't understand), the agent might proceed with a zero balance assumption or skip validation entirely. The transfer gets initiated anyway. The log entry records it. Three actions succeeded; the state is now inconsistent with reality.

Why don't existing observability tools catch this? LangSmith, Langfuse, and similar platforms log tokens, tool calls, and latencies beautifully. They tell you what happened. They don't tell you whether what happened was correct relative to the agent's internal model. Detecting a semantic failure—where the agent's belief about the world has diverged from the actual world—requires understanding the agent's assumptions, not just its outputs. That's an inference problem that most observability platforms don't attempt.

Agus Sudjianto's work on agentic failure modes emphasizes that orchestration becomes an execution failure vector. When one agent's output feeds into another's input, the failure doesn't just propagate—it compounds. The second agent inherits the first agent's wrong assumption and builds further decisions on it. By the time you notice something's wrong, you've got a chain of dependent decisions all resting on a false premise.

The research identifies this gap clearly: production teams have taxonomies of failure modes but no systematic detection layer that runs before deployment. You can describe what goes wrong. You can't reliably prevent it.


Potentials

A useful detection system would need to do two things: (1) instrument the agent to capture its internal assumptions at each step—what it believes about the state of the world after each tool call—and (2) compare those assumptions against ground truth via secondary verification queries before the agent commits to downstream actions. This isn't validation in the traditional sense; it's a belief audit. After the agent decides what to do next, ask it why, and check whether its reasoning still holds. If the agent can't defend its assumptions, or if a quick secondary check reveals the assumption is false, halt and escalate before the agent acts.

Teams building multi-agent systems would benefit most from this immediately. Single-agent systems have fewer state boundaries to corrupt. Multi-agent orchestration is where silent failures become existential. A framework that made belief auditing routine—where agents could query each other's assumptions or have their assumptions queried by a coordination layer—would catch the cascading failures that current logs completely miss.

"A tool gets called correctly, the response parses, and the agent moves forward—but the internal state has drifted and the error won't surface until three steps later."
"Existing observability tools tell you what the agent did. They don't tell you whether what it did was correct relative to the agent's own internal model of the world."
"Once agents are wired together without validation, the composition layer becomes a silent failure surface. Each agent's output is another agent's input assumption."