One million tokens and the legacy codebase problem

Anthropic released Opus 4.6 on February 5 with a 1M token context window in beta. It's a meaningful step. For the first time, an Opus-class model can hold a nontrivial slice of a real production codebase in-context without aggressive summarization.

"Nontrivial slice" is the phrase worth unpacking. Let me do the arithmetic on an actual Android app.

A mid-sized eBay-scale module is roughly 300,000 to 500,000 lines of Kotlin, Java, and XML. At ~4 characters per token and ~40 tokens per line for dense Kotlin, that's 12 to 20 million tokens (300,000 × 40 at the low end, 500,000 × 40 at the high end), an order of magnitude past what 1M can hold. Add the Android SDK surface the code actually touches (activity lifecycle, lifecycle-aware components, the specific Jetpack libraries at play) and the context budget is gone before the agent has read half of the feature directory.
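
For anyone who wants to plug in their own numbers, here is that back-of-envelope estimate as a few lines of Python. The 40 tokens per line and the line counts are the rough figures from above, not measurements; the function names are mine, and a real tokenizer will give different totals.

```python
# Back-of-envelope token estimate using the rough figures from the text.
TOKENS_PER_LINE = 40        # assumed average for dense Kotlin
CONTEXT_WINDOW = 1_000_000  # the 1M-token window


def estimate_tokens(lines_of_code: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Estimated tokens for a codebase of the given size."""
    return lines_of_code * tokens_per_line


def overshoot(lines_of_code: int, window: int = CONTEXT_WINDOW) -> float:
    """How many context windows the codebase would need."""
    return estimate_tokens(lines_of_code) / window


for loc in (300_000, 500_000):
    print(f"{loc:,} lines -> {estimate_tokens(loc):,} tokens, "
          f"{overshoot(loc):.0f}x the window")
```

Swap in your own line count and tokens-per-line figure; the conclusion only changes if the codebase is more than ten times smaller.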

So 1M context does not mean "the agent can just read the codebase." On a real project, it still can't.

What 1M context does change is the calculus of retrieval.

Under a 200K context window, retrieval has to be aggressive. The agent gets a handful of files plus a summary plus maybe a search result, and the rest of the codebase is dark matter — present but not visible. The retrieval system has to make precise cuts, and when it cuts wrong, the agent misses something load-bearing and produces plausible-but-wrong code.

Under 1M, retrieval can be looser. You can pull a full feature module plus the related test directory plus a chunk of the design system plus the relevant platform documentation, and still have room left. The cost of over-retrieval drops. The cost of under-retrieval is still punishing, but the margin for error is wider.
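
One way to picture the difference is a greedy packer that takes retrieval candidates in relevance order until the budget runs out. This is a minimal sketch, not how any particular retrieval system works: the candidate names, token sizes, relevance scores, and the 50K reserve are all made-up illustrative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    tokens: int       # illustrative size, not measured
    relevance: float  # hypothetical retriever score, higher is better


def pack(candidates, budget):
    """Greedily take candidates in relevance order while they fit the budget."""
    chosen, used = [], 0
    for c in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        if used + c.tokens <= budget:
            chosen.append(c.name)
            used += c.tokens
    return chosen, used


CANDIDATES = [
    Candidate("feature module", 120_000, 0.95),
    Candidate("feature tests", 60_000, 0.80),
    Candidate("design system slice", 90_000, 0.60),
    Candidate("platform docs", 150_000, 0.50),
]

RESERVE = 50_000  # assumed headroom for instructions and the model's reply

small, small_used = pack(CANDIDATES, 200_000 - RESERVE)
large, large_used = pack(CANDIDATES, 1_000_000 - RESERVE)

print("200K window:", small, small_used)
print("1M window:  ", large, large_used)
```

With these numbers the 200K budget fits only the feature module, so the tests, design system, and platform docs are the parts that get cut or summarized. The 1M budget takes all four and leaves more than half the window free.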

Where this bites in practice on mobile:

Build graph reasoning. A Gradle multi-module build with 40+ modules produces a dependency graph the agent benefits from seeing in full. At 200K, you summarize. At 1M, you can include the raw settings.gradle.kts plus every module's build.gradle.kts plus the version catalog and still have room.
API version matrices. One reason agents silently produce code that fails on older Android SDK versions is that the per-version behavior tables are too long to include. At 1M, you can fit them.
Design system scans. Compose design systems in big orgs run to dozens of components with overlapping variants. At 200K you retrieve three. At 1M you retrieve the index.
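
For the build graph case, collecting the raw files is simple enough to sketch. The snippet below gathers settings.gradle.kts, every build.gradle.kts under a project root, and the version catalog, then estimates the token cost at the ~4 characters per token ratio used earlier. It assumes the catalog lives at Gradle's default gradle/libs.versions.toml path and that all build scripts use the Kotlin DSL; the function names are mine.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough ratio from the text; a real tokenizer will differ


def collect_build_files(root: Path) -> list[Path]:
    """Return the settings file, per-module build scripts, and version catalog."""
    wanted = []
    settings = root / "settings.gradle.kts"
    if settings.is_file():
        wanted.append(settings)
    # Every module's build script, in a stable order.
    wanted.extend(sorted(root.rglob("build.gradle.kts")))
    # Assumed location: Gradle's default version catalog path.
    catalog = root / "gradle" / "libs.versions.toml"
    if catalog.is_file():
        wanted.append(catalog)
    return wanted


def estimate_build_tokens(paths: list[Path]) -> int:
    """Estimate tokens for the given files from their character count."""
    chars = sum(len(p.read_text(encoding="utf-8", errors="replace")) for p in paths)
    return chars // CHARS_PER_TOKEN
```

Calling estimate_build_tokens(collect_build_files(Path("path/to/project"))) gives a number to compare against the window before deciding whether to include the raw files or fall back to a summary.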

What 1M doesn't do: replace retrieval. The naive "just stuff the codebase into context" approach still breaks on real projects; it just breaks later. The interesting engineering is still in what gets retrieved, how it's structured, and how the agent signals what else it needs.

I've switched my main Claude Code session to Opus 4.6 with the 1M beta enabled. The immediate effect is less defensive summarization in my own prompts. The agent can ask for context directly, and the context I can hand it is bigger. The scaffolding still has to be there — but the ceiling moved up.