Introduction: Agents Are at Their DOS Moment
In 2025, AI agents are exploding in capability.
Tools like Claude Code can write code, run tests, fix bugs, and autonomously complete complex engineering tasks. For many people, this feels like the second major shift since ChatGPT first appeared.
But if you look closely at how today’s agents actually operate, a quieter and more uncomfortable truth emerges:
Their foundations are still extremely primitive.
Most agents today manipulate your file system and terminal directly. There may be confirmation prompts or guardrails, but the underlying model remains trust-based, not isolation-based. Safety depends largely on the agent behaving well.
This should feel familiar.
It closely resembles DOS-era computing in the 1980s.
DOS worked. You could write programs, edit files, and build real software. But it lacked nearly everything we now associate with a modern operating system:
- No memory protection
- No true multitasking
- No standardized device abstraction
Applications talked directly to hardware. Developers were responsible for everything.
AI agents are standing at the same starting line today.
What took personal computing decades, moving from DOS to protected-mode systems such as Windows NT and desktop Unix-like kernels, will likely replay in a much shorter window for agents.
This article is written for readers who are:
- experimenting with AI agents,
- building or evaluating agent frameworks,
- or trying to understand why agents feel powerful but fragile at the same time.
The core thesis is simple:
The evolution of operating systems is the best mental model for understanding the future of agent infrastructure.
1. A Core Framework: The Five Subsystems of an Agent OS
In traditional computing:
- CPU provides computation
- RAM provides temporary state
- Disk provides persistent storage
In the agent world, the mapping is surprisingly direct:
- LLMs are the CPU
- Context windows are memory
- Databases and file systems are disk
- Agents are applications
The key constraint is that LLM context behaves like volatile memory.
Once inference ends, internal state disappears. End the session, and the agent forgets everything. All meaningful state must be externalized.
This is why we need something equivalent to an operating system for agents.
A mature Agent OS must manage resources, provide abstractions, and coordinate components. At a minimum, it consists of five subsystems:
- Memory management
- External storage
- Process management
- I/O management
- Security and observability
These subsystems form the skeleton of an Agent OS.
2. Memory Management: The Hardest and Most Valuable Battlefield
The most important lesson from the OS analogy is this:
Memory management—context engineering—is both the hardest problem and the biggest opportunity.
A Familiar Mistake: “128K Tokens Should Be Enough”
In 1981, the original IBM PC left 640KB of its address space for programs, a ceiling its designers treated as more than sufficient. It became one of the most infamous misjudgments in computing history.
Today, when we say “128K tokens is a lot of context”, we are repeating the same mistake.
Context is the scarcest resource in LLM systems:
- System prompts: 10–20K tokens
- Tool definitions: 10–20K
- Retrieved documents: 50–80K
What remains for actual reasoning is often far smaller than expected.
This is the modern equivalent of hitting the 640KB ceiling.
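The arithmetic behind that claim is easy to check. The sketch below simply subtracts the ranges quoted above from a 128K window; treating "128K" as 128,000 tokens and the names `OVERHEAD` and `remaining_budget` are assumptions made for illustration.

```python
# Illustrative budget for a 128K-token window, using the ranges quoted above.
# Assumption: "128K" is taken as 128,000 tokens.
CONTEXT_WINDOW = 128_000

# (low, high) token estimates per category.
OVERHEAD = {
    "system_prompt": (10_000, 20_000),
    "tool_definitions": (10_000, 20_000),
    "retrieved_documents": (50_000, 80_000),
}


def remaining_budget(window: int, overhead: dict) -> tuple:
    """Return (worst_case, best_case) tokens left for reasoning and conversation."""
    low_total = sum(low for low, _ in overhead.values())
    high_total = sum(high for _, high in overhead.values())
    return window - high_total, window - low_total


worst, best = remaining_budget(CONTEXT_WINDOW, OVERHEAD)
print(f"Left for reasoning: {worst:,} to {best:,} tokens")
# Left for reasoning: 8,000 to 58,000 tokens
```

In the worst case, less than a tenth of the window is left for the conversation and the model's own working notes.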
Virtual Memory: The Missing Abstraction
Before virtual memory, programs crashed when they ran out of RAM—or implemented their own swap logic.
Virtual memory changed everything by creating an illusion of abundance. The OS transparently moved pages between RAM and disk based on access patterns.
Agent systems need the same abstraction.
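What might such an abstraction look like? Here is a minimal sketch, not a description of any existing framework: a `PagedContext` class (a hypothetical name) keeps recently used chunks resident and spills the least recently used ones to files, loading them back on access. The whitespace-based `estimate_tokens` is a crude stand-in for a real tokenizer, and chunk keys are assumed to be safe to use as file names.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class PagedContext:
    """Keeps recently used chunks resident and spills the rest to disk."""

    def __init__(self, budget_tokens: int, swap_dir: Path):
        self.budget = budget_tokens
        self.swap_dir = swap_dir
        self.resident: OrderedDict = OrderedDict()  # key -> text, oldest first

    def _used(self) -> int:
        return sum(estimate_tokens(text) for text in self.resident.values())

    def _evict_until_fits(self) -> None:
        # Page out least recently used chunks until we are under budget,
        # always keeping the most recently used chunk resident.
        while self._used() > self.budget and len(self.resident) > 1:
            key, text = self.resident.popitem(last=False)
            (self.swap_dir / f"{key}.txt").write_text(text, encoding="utf-8")

    def put(self, key: str, text: str) -> None:
        self.resident[key] = text
        self.resident.move_to_end(key)
        self._evict_until_fits()

    def get(self, key: str) -> str:
        if key not in self.resident:
            # "Page fault": load the chunk back from disk.
            path = self.swap_dir / f"{key}.txt"
            self.resident[key] = path.read_text(encoding="utf-8")
        self.resident.move_to_end(key)
        self._evict_until_fits()
        return self.resident[key]

    def render(self) -> str:
        """The text that would actually be sent to the model."""
        return "\n\n".join(self.resident.values())


with tempfile.TemporaryDirectory() as tmp:
    ctx = PagedContext(budget_tokens=6, swap_dir=Path(tmp))
    ctx.put("spec", "build a parser for config files")  # 6 tokens, fits
    ctx.put("log", "tests failed on empty input")       # over budget: "spec" is paged out
    print(list(ctx.resident))                            # ['log']
    print(ctx.get("spec"))                               # paged back in from disk
```

The calling code never decides what to drop. It asks for a chunk by key and the eviction policy does the rest, which is the same division of labor virtual memory introduced.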
Lessons from Manus: Context Failures, Not Model Failures
The team behind Manus, one of the most capable general-purpose agents of 2025, published lessons that point to a crucial insight:
Most agent failures are context failures, not model failures.
Their team rewrote their framework multiple times and found that architecture mattered more than model choice:
- KV-cache hit rate matters: Cached tokens are significantly cheaper and faster. Poor context structure directly increases cost and latency.
- File systems as external memory: Treating the file system as an infinite context extension mirrors swap space in classical systems.
- Todo lists as attention control: Restating goals at each step prevents drift and effectively prewarms high-priority information.
For builders, this means the next breakthroughs will come from architecture, not prompts.
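Two of these lessons can be illustrated in a few lines. The sketch below is not Manus's implementation; `AgentContext` and `shared_prefix_len` are hypothetical names. It keeps the system prompt and history append-only, so consecutive prompts share a long prefix that a KV cache could reuse, and restates the todo list at the end of every prompt.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    system_prompt: str                           # never mutated after creation
    history: list = field(default_factory=list)  # append-only event log
    todos: dict = field(default_factory=dict)    # task -> done?

    def record(self, event: str) -> None:
        self.history.append(event)

    def render(self) -> str:
        # Stable content first (cache-friendly), volatile plan restated last.
        todo_lines = [
            f"[{'x' if done else ' '}] {task}" for task, done in self.todos.items()
        ]
        parts = [self.system_prompt, *self.history, "Current plan:", *todo_lines]
        return "\n".join(parts)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


ctx = AgentContext(system_prompt="You are a coding agent.", todos={"fix bug": False})
ctx.record("ran tests: 1 failure")
first = ctx.render()
ctx.record("patched parser.py")
ctx.todos["fix bug"] = True
second = ctx.render()

# Everything up to the new history entry is unchanged between the two prompts.
print(shared_prefix_len(first, second))
```

The design choice is ordering. Anything that changes from step to step goes at the end, so an edit never invalidates the tokens that came before it.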
Memory Hierarchies and Engram
DeepSeek’s Engram paper reinforces this idea with the concept of memory hierarchy.
Their experiments show optimal performance when roughly 75–80% of resources go to computation and 20–25% to memory.
This mirrors classical storage hierarchies:
- Fast, small, expensive layers
- Slow, large, cheap layers
- Automatic paging, not manual control
Even with million-token contexts, intelligent memory management remains essential.
3. External Storage: The Most Deterministic Opportunity
When context is swapped out, where does it go?
Today, often into Markdown files. Long term, into databases.
If context engineering is the hardest problem, storage is the most commercially predictable one.
Databases in agent systems serve many roles:
- Long-term memory
- State persistence and recovery
- Vector retrieval
- Coordination and locking
- Audit logs and replayability
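As a minimal sketch of two of these roles, state persistence and audit logging, here is a small store built on Python's standard `sqlite3` module. The table names, column names, and helper functions are assumptions chosen for illustration, not a proposed schema.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_state (
    agent_id   TEXT PRIMARY KEY,
    state_json TEXT NOT NULL,
    updated_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS audit_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_id   TEXT NOT NULL,
    action     TEXT NOT NULL,
    state_json TEXT NOT NULL,
    created_at REAL NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_state(conn: sqlite3.Connection, agent_id: str, state: dict, action: str) -> None:
    """Persist the latest state and append an audit row in one transaction."""
    payload = json.dumps(state)
    now = time.time()
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT OR REPLACE INTO agent_state (agent_id, state_json, updated_at) "
            "VALUES (?, ?, ?)",
            (agent_id, payload, now),
        )
        conn.execute(
            "INSERT INTO audit_log (agent_id, action, state_json, created_at) "
            "VALUES (?, ?, ?, ?)",
            (agent_id, action, payload, now),
        )


def load_state(conn: sqlite3.Connection, agent_id: str):
    """Return the most recent state, or None if the agent has never saved one."""
    row = conn.execute(
        "SELECT state_json FROM agent_state WHERE agent_id = ?", (agent_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def replay(conn: sqlite3.Connection, agent_id: str) -> list:
    """Return the agent's actions in the order they were recorded."""
    rows = conn.execute(
        "SELECT action FROM audit_log WHERE agent_id = ? ORDER BY id", (agent_id,)
    ).fetchall()
    return [action for (action,) in rows]


conn = open_store()
save_state(conn, "agent-1", {"step": 1}, "plan")
save_state(conn, "agent-1", {"step": 2}, "edit_file")
print(load_state(conn, "agent-1"))  # {'step': 2}
print(replay(conn, "agent-1"))      # ['plan', 'edit_file']
```

Writing the state and the audit row in the same transaction means the log can never disagree with the state it describes.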
4. Process Management: A False Red Ocean
Most agent frameworks share the same loop:
Think → Act → Observe → Repeat
When an abstraction is this simple, it cannot be a durable moat.
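To see how little there is to the loop itself, here is a sketch with the model replaced by a scripted function. `run_agent`, `scripted_think`, and the `word_count` tool are all hypothetical; a real system would call an LLM where `think` is invoked.

```python
from typing import Callable


def run_agent(
    think: Callable,   # (goal, observations) -> (tool_name, argument)
    tools: dict,       # tool_name -> callable taking one argument
    goal: str,
    max_steps: int = 10,
) -> str:
    """Think -> Act -> Observe -> Repeat, with a hard step limit."""
    observations = []
    for _ in range(max_steps):
        tool_name, argument = think(goal, observations)  # Think
        if tool_name == "finish":
            return argument
        result = tools[tool_name](argument)              # Act
        observations.append(f"{tool_name}({argument!r}) -> {result}")  # Observe
    raise RuntimeError("step limit reached without finishing")


def scripted_think(goal: str, observations: list):
    """Stand-in for an LLM call: count the words once, then report the result."""
    if not observations:
        return ("word_count", goal)
    return ("finish", observations[-1])


tools = {"word_count": lambda text: len(text.split())}
print(run_agent(scripted_think, tools, "count these three words"))
# word_count('count these three words') -> 4
```

Everything difficult lives outside these twenty lines, which is why the loop alone does not differentiate one framework from another.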
The real challenges lie elsewhere:
- Concurrency and scheduling
- Fault recovery and checkpoints
- Inter-agent communication
- Graceful termination
These problems are easy to ignore when agents are short-lived.
They become unavoidable when agents run continuously and handle real responsibilities.
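Fault recovery and graceful termination can be sketched together. The example below assumes a task that can be split into ordered steps with JSON-serializable results; `run_steps`, `save_checkpoint`, and `load_checkpoint` are hypothetical helpers. Each completed step is checkpointed atomically, so a crashed or stopped run resumes from the last finished step.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)  # atomic replacement of the old checkpoint


def load_checkpoint(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"next_step": 0, "results": []}


def run_steps(steps: list, path: Path, should_stop=lambda: False) -> dict:
    """Run steps in order, checkpointing after each one; resume if a checkpoint exists."""
    state = load_checkpoint(path)
    while state["next_step"] < len(steps):
        if should_stop():
            break  # graceful termination: the last checkpoint is already on disk
        result = steps[state["next_step"]]()
        state["results"].append(result)
        state["next_step"] += 1
        save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "agent.json"

    def crash():
        raise RuntimeError("simulated crash")

    try:
        run_steps([lambda: "cloned repo", crash], checkpoint)
    except RuntimeError:
        pass  # the process "died" during step 2

    # A restarted run skips step 1 and retries only the failed step.
    final = run_steps([lambda: "cloned repo", lambda: "ran tests"], checkpoint)
    print(final["results"])  # ['cloned repo', 'ran tests']
```

A `should_stop` callback could be wired to a signal handler so that an operator's shutdown request takes effect between steps rather than in the middle of one.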
5. I/O Management: Beyond Protocol Wars
Tool invocation is the agent equivalent of device drivers.
Protocols like MCP improve interoperability, but adoption alone does not guarantee good architecture. MCP introduces token overhead and re-solves problems Unix tooling addressed decades ago.
CLI tools still offer underappreciated advantages:
- Models are trained on them
- They follow composable design
- They require minimal schema overhead
The likely endpoint is an agent-native CLI:
- Structured output
- Standardized error codes
- Machine-readable self-description
This is evolution, not reinvention.
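The three properties above fit in a small sketch built on Python's standard `argparse` module. The `wordcount` tool, its `--describe` flag, and the exit-code table are hypothetical conventions, not an existing standard.

```python
import argparse
import json

# Hypothetical exit-code convention for this tool.
EXIT_OK, EXIT_USAGE, EXIT_NOT_FOUND = 0, 2, 3

# Machine-readable self-description an agent can fetch instead of parsing help text.
DESCRIPTION = {
    "name": "wordcount",
    "summary": "Count whitespace-separated words in a text file.",
    "args": [{"name": "path", "type": "string", "required": True}],
    "exit_codes": {"0": "ok", "2": "usage error", "3": "file not found"},
}


def run(argv: list) -> tuple:
    """Return (exit_code, json_output) for the given command-line arguments."""
    parser = argparse.ArgumentParser(prog="wordcount")
    parser.add_argument("path", nargs="?")
    parser.add_argument("--describe", action="store_true")
    args = parser.parse_args(argv)

    if args.describe:
        return EXIT_OK, json.dumps(DESCRIPTION)
    if args.path is None:
        return EXIT_USAGE, json.dumps({"ok": False, "error": "missing_path"})
    try:
        with open(args.path, encoding="utf-8") as handle:
            words = len(handle.read().split())
    except FileNotFoundError:
        error = {"ok": False, "error": "not_found", "path": args.path}
        return EXIT_NOT_FOUND, json.dumps(error)
    return EXIT_OK, json.dumps({"ok": True, "path": args.path, "words": words})


code, output = run(["--describe"])
print(code, output)
```

A real entry point would print the output and pass the code to `sys.exit`. Success and failure use the same JSON shape, so an agent can branch on the exit code and read details from the payload without scraping free-form text.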
6. Security and Observability: Trust Comes from Visibility
Prompt injection is the AI-era equivalent of buffer overflow.
Both result from failing to separate instructions from data at the architectural level.
Sandboxing is necessary, but insufficient.
Real trust requires observability:
- What did the agent see?
- How did it reason?
- Why did it act?
Agent observability will likely become its own category, much like APM in cloud infrastructure.
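The three questions above map naturally onto a structured trace. The sketch below writes one JSON line per event and lets you ask what happened at a given step; the `Tracer` class, the `explain` helper, and the event kinds `saw`, `reasoned`, and `acted` are hypothetical names, not an established format.

```python
import io
import json
import time

# One kind per question: what it saw, how it reasoned, why (and how) it acted.
KINDS = ("saw", "reasoned", "acted")


class Tracer:
    """Appends one JSON line per event to any writable text stream."""

    def __init__(self, sink):
        self.sink = sink

    def log(self, step: int, kind: str, content: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown event kind: {kind!r}")
        record = {"ts": time.time(), "step": step, "kind": kind, "content": content}
        self.sink.write(json.dumps(record) + "\n")


def explain(trace_text: str, step: int) -> dict:
    """Group the events of one step by the question they answer."""
    answer = {kind: [] for kind in KINDS}
    for line in trace_text.splitlines():
        record = json.loads(line)
        if record["step"] == step:
            answer[record["kind"]].append(record["content"])
    return answer


sink = io.StringIO()  # a file or log shipper would be used in practice
tracer = Tracer(sink)
tracer.log(1, "saw", "test output: 1 failure in parser_test.py")
tracer.log(1, "reasoned", "failure is an unhandled empty input")
tracer.log(1, "acted", "edit parser.py: add empty-input guard")
tracer.log(2, "saw", "test output: all passing")

print(explain(sink.getvalue(), step=1))
```

Because each line is self-contained JSON, the trace can be appended to, shipped, and queried with ordinary log tooling.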
What This Means Going Forward
- Better models alone will not fix agent reliability
- Context engineering will matter more than prompt design
- Infrastructure, not UX, will define long-term winners
- The Agent OS layer will quietly shape the ecosystem
Conclusion: The Missing Kernel
In the early 1990s, the GNU project had nearly every piece of an operating system except a working kernel.
Then Linux appeared.
The agent ecosystem today is in a similar state. We have tools, frameworks, storage, and sandboxes—but no unifying kernel.
The real inflection point will not come from stronger models, but from complete system capabilities.
When that happens, agents will stop being clever demos and become processes we can trust with real work.
Somewhere, someone may already be writing the first lines of an Agent OS kernel.
History suggests they probably think it’s “just a hobby.”