The Operating System Moment of AI Agents

Introduction: Agents Are at Their DOS Moment

In 2025, AI agents are exploding in capability.

Tools like Claude Code can write code, run tests, fix bugs, and autonomously complete complex engineering tasks. For many people, this feels like the second major shift in AI, after the arrival of ChatGPT itself.

But if you look closely at how today’s agents actually operate, a quieter and more uncomfortable truth emerges:

Their foundations are still extremely primitive.

Most agents today manipulate your file system and terminal directly. There may be confirmation prompts or guardrails, but the underlying model remains trust-based, not isolation-based. Safety depends largely on the agent behaving well.

This should feel familiar.

It closely resembles DOS-era computing in the 1980s.

DOS worked. You could write programs, edit files, and build real software. But it lacked nearly everything we now associate with a modern operating system:

No memory protection
No true multitasking
No standardized device abstraction

Applications talked directly to hardware. Developers were responsible for everything.

AI agents are standing at the same starting line today.

What took personal computing decades, moving from DOS to protected-mode Windows, Unix-style systems, and modern kernels, will likely replay in a much shorter window for agents.

This article is written for readers who are:

experimenting with AI agents,
building or evaluating agent frameworks,
or trying to understand why agents feel powerful but fragile at the same time.

The core thesis is simple:

The evolution of operating systems is the best mental model for understanding the future of agent infrastructure.

1. A Core Framework: The Five Subsystems of an Agent OS

In traditional computing:

CPU provides computation
RAM provides temporary state
Disk provides persistent storage

In the agent world, the mapping is surprisingly direct:

LLMs are the CPU
Context windows are memory
Databases and file systems are disk
Agents are applications

The key constraint is that LLM context behaves much like volatile memory.

Once inference ends, internal state disappears. End the session, and the agent forgets everything. All meaningful state must be externalized.

This is why we need something equivalent to an operating system for agents.

A mature Agent OS must manage resources, provide abstractions, and coordinate components. At a minimum, it consists of five subsystems:

Memory management
External storage
Process management
I/O management
Security and observability

These subsystems form the skeleton of an Agent OS.

2. Memory Management: The Hardest and Most Valuable Battlefield

The most important lesson from the OS analogy is this:

Memory management—context engineering—is both the hardest problem and the biggest opportunity.

A Familiar Mistake: “128K Tokens Should Be Enough”

In 1981, the original IBM PC design left 640KB of its 1MB address space for programs, and that seemed generous at the time. It became one of the most infamous constraints in computing history.

Today, when we say “128K tokens is a lot of context”, we are repeating the same mistake.

Context is the scarcest resource in LLM systems:

System prompts: 10–20K tokens
Tool definitions: 10–20K
Retrieved documents: 50–80K

What remains for actual reasoning is often far smaller than expected.

This is the modern equivalent of hitting the 640KB ceiling.
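To make the arithmetic concrete, here is a small sketch that subtracts the ranges listed above from a 128K window. The names are illustrative, and the ranges are the rough estimates from this article, not measurements of any particular agent.

```python
CONTEXT_WINDOW = 128_000

# (low, high) token estimates taken from the list above.
BUDGET_RANGES = {
    "system_prompt": (10_000, 20_000),
    "tool_definitions": (10_000, 20_000),
    "retrieved_documents": (50_000, 80_000),
}


def remaining_tokens(window, ranges):
    """Return (worst_case, best_case) tokens left for reasoning."""
    low_use = sum(low for low, _ in ranges.values())
    high_use = sum(high for _, high in ranges.values())
    return window - high_use, window - low_use


worst, best = remaining_tokens(CONTEXT_WINDOW, BUDGET_RANGES)
print(worst, best)  # 8000 58000
```

In the worst case under these assumptions, only 8K of the 128K window is left for the conversation and the model's own reasoning.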

Virtual Memory: The Missing Abstraction

Before virtual memory, programs crashed when they ran out of RAM—or implemented their own swap logic.

Virtual memory changed everything by creating an illusion of abundance. The OS transparently moved pages between RAM and disk based on access patterns.

Agent systems need the same abstraction.
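As a rough illustration of what paging could look like for context, the sketch below keeps a fixed token budget resident and evicts the least recently used pages to disk. Everything here is hypothetical: the ContextPager class, the word-count stand-in for a tokenizer, and the one-file-per-page swap layout are assumptions made for the example, not the design of any existing framework.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


class ContextPager:
    """Keeps roughly `budget` tokens resident; evicts LRU pages to disk."""

    def __init__(self, budget, swap_dir):
        self.budget = budget
        self.swap_dir = Path(swap_dir)
        self.resident = OrderedDict()  # page_id -> text, least recent first

    @staticmethod
    def cost(text):
        # Crude stand-in for a tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def _used(self):
        return sum(self.cost(text) for text in self.resident.values())

    def put(self, page_id, text):
        self.resident[page_id] = text
        self.resident.move_to_end(page_id)  # mark as most recently used
        # Evict from the cold end, but never the page we just touched.
        while self._used() > self.budget and len(self.resident) > 1:
            victim, victim_text = self.resident.popitem(last=False)
            (self.swap_dir / f"{victim}.txt").write_text(victim_text, encoding="utf-8")

    def get(self, page_id):
        if page_id in self.resident:
            self.resident.move_to_end(page_id)
        else:
            # "Page fault": reload from disk (raises FileNotFoundError if unknown).
            text = (self.swap_dir / f"{page_id}.txt").read_text(encoding="utf-8")
            self.put(page_id, text)
        return self.resident[page_id]


with tempfile.TemporaryDirectory() as swap:
    pager = ContextPager(budget=6, swap_dir=swap)
    pager.put("spec", "build a cli tool")       # 4 tokens resident
    pager.put("log", "tests passed on retry")   # 8 > 6, so "spec" is paged out
    resident_after_put = list(pager.resident)
    restored = pager.get("spec")                # reloaded; "log" is paged out
    resident_after_get = list(pager.resident)
```

The caller never decides what to evict. It only asks for a page, which is the property that made virtual memory valuable in the first place.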

Lessons from Manus: Context Failures, Not Model Failures

Manus, one of the most capable general-purpose agents in 2025, published a crucial insight:

Most agent failures are context failures, not model failures.

Their team rewrote their framework multiple times and found that architecture mattered more than model choice:

KV-cache hit rate matters: cached tokens are significantly cheaper and faster, so poor context structure directly increases cost and latency.
File systems as external memory: treating the file system as an infinite context extension mirrors swap space in classical systems.
Todo lists as attention control: restating goals at each step prevents drift and effectively prewarms high-priority information.

For builders, this means the next breakthroughs will come from architecture, not prompts.
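Two of these lessons can be sketched together. The example below is a simplified illustration, not Manus's implementation: it assumes a hypothetical render_context function that keeps a fixed prefix, serializes history deterministically and append-only (so earlier tokens stay cacheable), and restates the todo list at the tail of the context on every step.

```python
import json

SYSTEM_PREFIX = "You are a coding agent."  # hypothetical, kept byte-identical across steps


def render_context(events, todo):
    """Stable prefix first, append-only history next, goals restated last."""
    parts = [SYSTEM_PREFIX]
    # sort_keys gives a deterministic serialization, so old events never change bytes.
    parts += [json.dumps(event, sort_keys=True) for event in events]
    lines = [f"- [{'x' if done else ' '}] {item}" for item, done in todo]
    parts.append("TODO:\n" + "\n".join(lines))
    return "\n".join(parts)


events = [{"tool": "read_file", "path": "notes.md"}]
step1 = render_context(events, [("read notes", True), ("fix failing test", False)])

events.append({"tool": "run_tests", "result": "1 failed"})
step2 = render_context(events, [("read notes", True), ("fix failing test", False)])

# Both steps share the same leading bytes, which is what a prefix cache can reuse.
shared = SYSTEM_PREFIX + "\n" + json.dumps(events[0], sort_keys=True)
```

Only the tail of the context changes between steps, and the last thing the model reads is always the current goal list.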

Memory Hierarchies and Engram

DeepSeek’s Engram paper reinforces this idea with the concept of memory hierarchy.

Their experiments show optimal performance when roughly 75–80% of resources go to computation and 20–25% to memory.

This mirrors classical storage hierarchies:

Fast, small, expensive layers
Slow, large, cheap layers
Automatic paging, not manual control

Even with million-token contexts, intelligent memory management remains essential.

3. External Storage: The Most Deterministic Opportunity

When context is swapped out, where does it go?

Today, often into Markdown files. Long term, into databases.

If context engineering is the hardest problem, storage is the most commercially predictable one.

Databases in agent systems serve many roles:

Long-term memory
State persistence and recovery
Vector retrieval
Coordination and locking
Audit logs and replayability
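Two of these roles, state persistence and audit logging, fit in a few lines with Python's built-in sqlite3 module. This is a minimal sketch under assumed names: the table layout and the save_checkpoint and load_checkpoint helpers are invented for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(agent_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS audit_log "
        "(seq INTEGER PRIMARY KEY AUTOINCREMENT, agent_id TEXT NOT NULL, event TEXT NOT NULL)"
    )
    return db


def save_checkpoint(db, agent_id, state, event):
    # One transaction: the new state and its audit entry commit together or not at all.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO checkpoints (agent_id, state) VALUES (?, ?)",
            (agent_id, json.dumps(state)),
        )
        db.execute(
            "INSERT INTO audit_log (agent_id, event) VALUES (?, ?)",
            (agent_id, event),
        )


def load_checkpoint(db, agent_id):
    row = db.execute(
        "SELECT state FROM checkpoints WHERE agent_id = ?", (agent_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


db = open_store()
save_checkpoint(db, "agent-1", {"step": 1}, "started")
save_checkpoint(db, "agent-1", {"step": 2}, "ran tests")
resumed = load_checkpoint(db, "agent-1")
history = [row[0] for row in db.execute("SELECT event FROM audit_log ORDER BY seq")]
```

The checkpoint table holds only the latest state for recovery, while the audit log keeps every event in order for replay.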

4. Process Management: A False Red Ocean

Most agent frameworks share the same loop:

Think → Act → Observe → Repeat

When an abstraction is this simple, it cannot be a durable moat.
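To show how little code the loop itself needs, here is a sketch with a scripted function standing in for the model. The run_agent function, the action dictionary shape, and the "finish" tool name are all assumptions made for this example.

```python
def run_agent(think, tools, goal, max_steps=10):
    """Think -> Act -> Observe, repeated until the policy says 'finish'."""
    observations = []
    for _ in range(max_steps):
        action = think(goal, observations)                     # Think
        if action["tool"] == "finish":
            break
        result = tools[action["tool"]](**action["args"])       # Act
        observations.append(f"{action['tool']} -> {result}")   # Observe
    return observations


def scripted_think(goal, observations):
    # Stand-in for an LLM call: act once, then stop.
    if not observations:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"tool": "finish", "args": {}}


trace = run_agent(scripted_think, {"add": lambda a, b: a + b}, "add two numbers")
print(trace)  # ['add -> 5']
```

The whole loop is about ten lines, which is the point: nothing here is hard to replicate.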

The real challenges lie elsewhere:

Concurrency and scheduling
Fault recovery and checkpoints
Inter-agent communication
Graceful termination

These problems are easy to ignore when agents are short-lived.

They become unavoidable when agents run continuously and handle real responsibilities.
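Checkpoints and graceful termination can be sketched together. The example below assumes a hypothetical run_steps helper that records progress after each completed step and checks a stop flag between steps. In a real system the flag would more likely be set by a signal handler or a supervisor than by a step itself.

```python
import threading


def run_steps(steps, checkpoint, stop):
    """Run steps from the last checkpoint; return True if all of them completed."""
    i = checkpoint.get("next_step", 0)
    while i < len(steps) and not stop.is_set():
        steps[i]()
        i += 1
        checkpoint["next_step"] = i  # record progress only after the step finished
    return i == len(steps)


log = []
stop = threading.Event()
checkpoint = {}

steps = [
    lambda: log.append("plan"),
    lambda: (log.append("edit"), stop.set()),  # simulates a shutdown request mid-run
    lambda: log.append("test"),
]

finished_first_run = run_steps(steps, checkpoint, stop)   # stops after "edit"
stop.clear()
finished_second_run = run_steps(steps, checkpoint, stop)  # resumes at "test"
```

The second run does not repeat "plan" or "edit", because the checkpoint already says where to resume.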

5. I/O Management: Beyond Protocol Wars

Tool invocation is the agent equivalent of device drivers.

Protocols like MCP improve interoperability, but adoption alone does not guarantee good architecture. MCP tool schemas consume context tokens on every request, and the protocol re-solves problems Unix tooling addressed decades ago.

CLI tools still offer underappreciated advantages:

Models are trained on them
They follow composable design
They require minimal schema overhead

The likely endpoint is agent-native CLI:

Structured output
Standardized error codes
Machine-readable self-description

This is evolution, not reinvention.
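As a hedged sketch of what those three properties might look like, here is a toy word-count tool built on Python's argparse. The tool name, the --describe flag, the exit code values, and the JSON shapes are all invented for illustration and do not follow any existing standard.

```python
import argparse
import json
import os
import tempfile

EXIT_OK, EXIT_USAGE, EXIT_NOT_FOUND = 0, 2, 3  # hypothetical, documented via --describe


def build_parser():
    parser = argparse.ArgumentParser(prog="wordcount", description="Count words in a file.")
    parser.add_argument("path", nargs="?", help="file to read")
    parser.add_argument("--json", action="store_true", help="emit JSON instead of plain text")
    parser.add_argument("--describe", action="store_true", help="print a machine-readable spec")
    return parser


def run(argv):
    """Return (exit_code, output). A real entry point would print and sys.exit."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.describe:  # machine-readable self-description
        spec = {
            "name": parser.prog,
            "flags": ["--json", "--describe"],
            "exit_codes": {"0": "ok", "2": "usage error", "3": "file not found"},
        }
        return EXIT_OK, json.dumps(spec)
    if args.path is None:
        return EXIT_USAGE, json.dumps({"error": "usage", "message": "path is required"})
    try:
        with open(args.path, encoding="utf-8") as handle:
            count = len(handle.read().split())
    except FileNotFoundError:  # standardized error code plus structured detail
        return EXIT_NOT_FOUND, json.dumps({"error": "not_found", "path": args.path})
    if args.json:  # structured output
        return EXIT_OK, json.dumps({"path": args.path, "words": count})
    return EXIT_OK, str(count)


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("agents need kernels")
    ok_code, ok_out = run([sample, "--json"])
    missing_code, missing_out = run([os.path.join(tmp, "missing.txt"), "--json"])

describe_code, describe_out = run(["--describe"])
```

An agent can call the tool with --describe once, learn the flags and exit codes, and then branch on the exit code instead of parsing error text.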

6. Security and Observability: Trust Comes from Visibility

Prompt injection is the AI-era equivalent of buffer overflow.

Both result from failing to separate instructions from data at the architectural level.

Sandboxing is necessary, but insufficient.

Real trust requires observability:

What did the agent see?
How did it reason?
Why did it act?

Agent observability will likely become its own category, much like APM in cloud infrastructure.
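The three questions above map naturally onto a structured trace record. The sketch below is one possible shape, with a hypothetical TraceEvent dataclass and Tracer class, and is not the schema of any existing observability product.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    step: int
    saw: str        # what the agent saw (input or observation)
    reasoning: str  # how it reasoned (model-provided rationale)
    action: str     # what it did as a result
    timestamp: float = field(default_factory=time.time)


class Tracer:
    """Collects events as JSON lines, the way a log shipper might expect them."""

    def __init__(self):
        self.lines = []

    def record(self, event):
        self.lines.append(json.dumps(asdict(event), sort_keys=True))

    def why(self, step):
        # Answer "why did it act?" for a given step, or None if the step is unknown.
        for line in self.lines:
            event = json.loads(line)
            if event["step"] == step:
                return event["reasoning"]
        return None


tracer = Tracer()
tracer.record(TraceEvent(1, "test_login failed", "the fixture is stale", "edit conftest.py"))
tracer.record(TraceEvent(2, "all tests passed", "the goal is met", "finish"))
explanation = tracer.why(1)
```

Because each record is a self-contained JSON line, the trace can be stored, searched, and replayed independently of the agent that produced it.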

What This Means Going Forward

Better models alone will not fix agent reliability
Context engineering will matter more than prompt design
Infrastructure, not UX, will define long-term winners
The Agent OS layer will quietly shape the ecosystem

Conclusion: The Missing Kernel

In the early 1990s, GNU had everything except a kernel.

Then Linux appeared.

The agent ecosystem today is in a similar state. We have tools, frameworks, storage, and sandboxes—but no unifying kernel.

The real inflection point will not come from stronger models, but from complete system capabilities.

When that happens, agents will stop being clever demos and become processes we can trust with real work.

Somewhere, someone may already be writing the first lines of an Agent OS kernel.

History suggests they probably think it’s “just a hobby.”
