Token Efficiency

In short: AI agent cost optimization starts with context growth. A long-running agent can go from $50 a month to $2,500 because each turn resends the system prompt, tool definitions, memory, files, and earlier messages. Four practices bring the bill back under control: prompt caching, lazy-loaded tools, model routing, and context cleanup.

When an agent is new, the system prompt may be only a few hundred tokens, with two or three tools. Then the prompt grows, the tool list expands, memory accumulates, and every turn pays again for all earlier turns. The Claude system prompt leaked in late 2024 was 24,000 tokens, nearly 50 times larger than that starting point. OpenClaw users have reported sending more than 150,000 input tokens to Gemini 3.1 Pro, only to get 29 output tokens back on the first turn.

An unoptimized agent handling 100 messages a day at 166K input tokens per message sends about 498 million input tokens over a 30-day month, which costs about $996 on Gemini 3.1 Pro and about $2,490 on Claude Opus 4.6. There are ways to push that cost back down to $50-$100 a month. ...
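The monthly figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 30-day month and counts input tokens alone. The per-million-token prices are not taken from any price list; they are back-derived from the totals quoted in the text ($996 and $2,490), so treat them as assumptions. The function and constant names are hypothetical.

```python
# Rough monthly input-token cost for an unoptimized agent.
# Traffic figures come from the text; everything marked "assumption" does not.

MESSAGES_PER_DAY = 100
INPUT_TOKENS_PER_MESSAGE = 166_000
DAYS_PER_MONTH = 30  # assumption: a 30-day month

# Assumption: USD per million input tokens, back-derived from the
# monthly totals quoted in the text rather than from a published price list.
ASSUMED_PRICE_PER_MILLION = {
    "Gemini 3.1 Pro": 2.00,
    "Claude Opus 4.6": 5.00,
}


def monthly_input_tokens(
    messages_per_day: int = MESSAGES_PER_DAY,
    tokens_per_message: int = INPUT_TOKENS_PER_MESSAGE,
    days: int = DAYS_PER_MONTH,
) -> int:
    """Total input tokens sent in one month."""
    return messages_per_day * tokens_per_message * days


def monthly_input_cost(price_per_million: float, **traffic: int) -> float:
    """Monthly input cost in USD at the given per-million-token price."""
    return monthly_input_tokens(**traffic) / 1_000_000 * price_per_million


for model, price in ASSUMED_PRICE_PER_MILLION.items():
    print(f"{model}: ${monthly_input_cost(price):,.0f} per month")
```

Because the cost is linear in tokens per message, shrinking the resent context is the most direct lever: passing `tokens_per_message=10_000` to `monthly_input_cost` with the same assumed prices gives $60 and $150 a month. Output tokens and any caching discounts are left out of this sketch.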