Running multi-turn or multi-agent AI sessions? There is a consistent degradation pattern across tools: context fills with repeated history, tool schemas, and subagent handoffs. A 2026 paper by Bai et al. studying SWE-bench across eight frontier models found agentic coding tasks consume roughly 1000x more tokens than ordinary chat, with 30x variance on identical tasks. Accuracy does not rise with spend.
In one tracked research synthesis run I observed context hit 450,000 tokens. The agent dropped early constraints, re-queried sources already in history, and required manual reset. After adding three controls, the same class of task peaked near 85,000 tokens:
- PLAN.md and INVARIANTS.md outside the conversation window, read fresh each major turn
- A 2,000-line read budget gate per turn (agent states intent before any retrieval)
- Out-of-band notes for subagent coordination so side traffic never enters the main transcript
Dynamic tool discovery produces similar ratios. One harness reduced input tokens 96% and total spend 90% by loading schemas only for tools the agent actually selects, rather than injecting a full catalog on every call.
What token or cost patterns have you run into in your own agent sessions?
[link] [comments]