Agent Architecture

In short: Many AI agents look productive but are actually drifting, confidently executing the wrong moves on a wrong picture of the situation. The bottleneck for the next phase of agent systems is not larger context windows or stronger base models; it is whether the system can construct and maintain a stable belief state. This piece argues that belief state quality is the right optimization target, proposes five proxy metrics to measure it, and lays out where to invest incremental engineering effort next.

Competition in agent systems is shifting from “whose model is stronger” toward “who can keep producing a higher-quality belief state.” If you accept that framing, several seemingly unrelated problems line up: the same model behaves very differently inside different product shells; long-running agents fail not because they cannot answer but because they misjudge the situation; context windows keep growing, yet system capability does not scale linearly with them; and scattered engineering pieces (skills, memory, retrieval, tool use, traces, summaries) all start to matter at the same time. ...

Introduction: Agents Are at Their DOS Moment

In 2025, AI agents are exploding in capability. Tools like Claude Code can write code, run tests, fix bugs, and autonomously complete complex engineering tasks. For many people, this feels like the second major shift since ChatGPT first appeared. But if you look closely at how today’s agents actually operate, a quieter and more uncomfortable truth emerges: their foundations are still extremely primitive. Most agents today manipulate your file system and terminal directly. There may be confirmation prompts or guardrails, but the underlying model remains trust-based, not isolation-based. Safety depends largely on the agent behaving well.

This should feel familiar. It closely resembles DOS-era computing in the 1980s. DOS worked. You could write programs, edit files, and build real software. But it lacked nearly everything we now associate with a modern operating system: no memory protection, no true multitasking, and no standardized device abstraction. Applications talked directly to hardware, and developers were responsible for everything. AI agents are standing at the same starting line today. What took personal computing nearly three decades, the journey from DOS to Unix-like systems, Windows, and modern kernels, will likely replay in a much shorter window for agents. ...


Why AI Agents Drift: Belief State Is the Real Bottleneck, Not Context Length

The Operating System Moment of AI Agents