When Code Agents Silently Corrupt Their Own State

Code agents fail in ways that look like success. A tool gets called correctly, the response parses, and the agent moves forward—but the internal state has drifted, the context window has lost critical information, or a tool chain has created a cascading inconsistency that won't surface until three steps later. Microsoft and arXiv researchers have documented these failure modes, but production teams have almost no systematic way to catch them before deployment. The problem isn't that agents are unreliable; it's that their failures are often invisible until they matter most.

The most counterintuitive failure in agentic systems isn't hallucination or refusal—it's plausible continuation. An agent receives a tool response, updates its internal state, and proceeds with an assumption that's subtly wrong. Nobody throws an error. The agent doesn't fail; it confidently walks off a cliff it can't see.

Microsoft's taxonomy of agentic failure modes and the arXiv paper "Characterizing Faults in Agentic AI: A Taxonomy of Types" both categorize this: context loss, state corruption, and cascading errors form a distinct failure family separate from model hallucination. Here's what distinguishes them: a hallucinating agent makes something up. A state-corrupted agent believes something false because an earlier step lost or misinterpreted information. By the time the failure surfaces, the agent has built decisions on that foundation.

Latitude.so's production guide identifies six specific failure modes in deployed agents: tool misuse, context loss, goal drift, retry loops, cascading errors, and multi-agent orchestration failures. Of these, four—context loss, cascading errors, goal drift, and multi-agent failures—are silent by default. An agent can make all of them and still produce output that looks structurally correct. O'Reilly's radar article on agentic composition notes the core problem explicitly: "Once agents are wired together without validation, the composition layer becomes a silent failure surface. Each agent's output is another agent's input assumption."

Tool chaining is where this breaks most visibly. Imagine an agent that needs to fetch a user's account balance, then initiate a transfer, then log the action. If the fetch call succeeds but returns null (due to a permission boundary the agent doesn't understand), the agent might proceed with a zero balance assumption or skip validation entirely. The transfer gets initiated anyway. The log entry records it. Three actions succeeded; the state is now inconsistent with reality.

Why don't existing observability tools catch this? LangSmith, Langfuse, and similar platforms log tokens, tool calls, and latencies beautifully. They tell you what happened. They don't tell you whether what happened was correct relative to the agent's internal model. Detecting a semantic failure—where the agent's belief about the world has diverged from the actual world—requires understanding the agent's assumptions, not just its outputs. That's an inference problem that most observability platforms don't attempt.

Agus Sudjianto's work on agentic failure modes emphasizes that orchestration becomes an execution failure vector. When one agent's output feeds into another's input, the failure doesn't just propagate—it compounds. The second agent inherits the first agent's wrong assumption and builds further decisions on it. By the time you notice something's wrong, you've got a chain of dependent decisions all resting on a false premise.

The research identifies this gap clearly: production teams have taxonomies of failure modes but no systematic detection layer that runs before deployment. You can describe what goes wrong. You can't reliably prevent it.

When Code Agents Silently Corrupt Their Own State

Potentials