Subagents Account for Most Token Costs in Long Agent Runs: Fixes That Cut Usage 70 to 90 Percent in Practice

Running multi-turn or multi-agent AI sessions? There is a consistent degradation pattern across tools: context fills with repeated history, tool schemas, and subagent handoffs. A 2026 paper by Bai et al. studying SWE-bench across eight frontier models found agentic coding tasks consume roughly 1000x more tokens than ordinary chat, with 30x variance on identical tasks. Accuracy does not rise with spend.

In one tracked research-synthesis run, I observed context hit 450,000 tokens. The agent dropped early constraints, re-queried sources already in its history, and required a manual reset. After adding three controls, the same class of task peaked near 85,000 tokens, a reduction of roughly 81%:

- PLAN.md and INVARIANTS.md outside the conversation window, read fresh each major turn
- A 2,000-line read budget gate per turn (agent states intent before any retrieval)
- Out-of-band notes for subagent coordination so side traffic never enters the main transcript
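
For anyone who wants something concrete, here is a rough sketch of how those three controls could look in Python. It is not the harness I ran: `TurnBudget`, `pinned_context`, and `post_note` are hypothetical names, the budget counts lines rather than tokens, and truncating at the limit is just one possible policy.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TurnBudget:
    """Per-turn read budget counted in lines (hypothetical helper)."""
    max_lines: int = 2000
    used: int = 0
    intents: list = field(default_factory=list)

    def read(self, text: str, intent: str) -> str:
        # The gate: no stated intent, no retrieval.
        if not intent.strip():
            raise ValueError("state an intent before retrieving anything")
        remaining = max(self.max_lines - self.used, 0)
        granted = text.splitlines()[:remaining]  # truncate at the budget
        self.used += len(granted)
        self.intents.append(intent)
        return "\n".join(granted)


def pinned_context(workdir: Path) -> str:
    """Re-read PLAN.md and INVARIANTS.md from disk each major turn."""
    parts = []
    for name in ("PLAN.md", "INVARIANTS.md"):
        path = workdir / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def post_note(notes_dir: Path, agent: str, text: str) -> Path:
    """Append a subagent note to a file instead of the main transcript."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{agent}.md"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(text.rstrip() + "\n")
    return path
```

In this sketch you would create a new `TurnBudget` at the start of each turn, prepend `pinned_context(...)` to the prompt, and pass only the path returned by `post_note` back to the main agent, not the note body.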

Dynamic tool discovery produces similar ratios. One harness reduced input tokens 96% and total spend 90% by loading schemas only for tools the agent actually selects, rather than injecting a full catalog on every call.
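
A minimal sketch of the idea, assuming a made-up two-tool catalog (`CATALOG`, `tool_index`, and `schemas_for` are illustrative names, not any specific harness's API): the agent sees a one-line index of tools by default, and full schemas are attached only for the tools it picks.

```python
import json

# Hypothetical catalog; real tool schemas are usually much larger.
CATALOG = {
    "search_docs": {
        "description": "Search indexed documents.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["query"],
        },
    },
    "read_file": {
        "description": "Read a file from the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def tool_index(catalog: dict) -> str:
    """Compact listing: one line per tool, no parameter schemas."""
    return "\n".join(
        f"{name}: {spec['description']}" for name, spec in sorted(catalog.items())
    )


def schemas_for(catalog: dict, selected: list) -> list:
    """Full schemas only for the tools the agent actually selected."""
    return [{"name": n, **catalog[n]} for n in selected if n in catalog]


if __name__ == "__main__":
    full = json.dumps(schemas_for(CATALOG, list(CATALOG)))
    picked = json.dumps(schemas_for(CATALOG, ["read_file"]))
    # Character counts are only a rough proxy for tokens.
    print(len(tool_index(CATALOG)), len(picked), len(full))
```

The savings depend on catalog size: with two tools the difference is small, but the per-call overhead of a full catalog grows with every tool added, while the index grows by one line.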

Full write-up with the paper analysis, tree-sitter extraction patterns, and an implementation checklist

What token or cost patterns have you run into in your own agent sessions?

submitted by /u/magicroot75