The 2M Token Trap

Context Engineering for AI-Assisted Development

Context windows grew 62x in two years. AI quality didn't.
A context window is not storage. It is cognitive load.

More tokens, same confusion

LLMs can now hold 2 million tokens in context. But capacity is not comprehension. Throwing more code at a model doesn't mean it understands your intent.

62x — Context Growth: From 32K to 2M tokens in two years. Yet quality plateaued.
75% — The Threshold: Output quality degrades once context fills past roughly 75% of capacity.
Lost in the Middle: Information in the middle of long contexts gets systematically ignored.

A context window is not storage. It is cognitive load.
Stuffing 195K tokens into a 200K window leaves no room for reasoning.

Three principles of context engineering

Stop treating context as infinite storage. Start engineering it like you engineer code.

01

Isolation

Provide the minimum effective context for each task. Scope by responsibility, not file size.

Too broad: OAuth2 + billing + CSS + tests
Scoped: OAuth2 models + relevant controller
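As a minimal sketch of isolation (all names here are hypothetical, not part of Plan Stack), scoping by responsibility can be as simple as gathering only the files a task actually touches, under an explicit token budget:

```python
from pathlib import Path

# Hypothetical helper: collect only files relevant to one responsibility,
# instead of dumping the whole repo into context.
def scoped_context(root, include_globs, token_budget=20_000):
    """Concatenate files matching the task's scope, stopping at a token budget."""
    chunks, used = [], 0
    for pattern in include_globs:
        for path in sorted(Path(root).glob(pattern)):
            text = path.read_text(encoding="utf-8", errors="ignore")
            tokens = len(text) // 4  # rough heuristic: ~4 chars per token
            if used + tokens > token_budget:
                return "\n\n".join(chunks)
            chunks.append(f"# {path}\n{text}")
            used += tokens
    return "\n\n".join(chunks)

# Scope by responsibility: the OAuth2 models and their controller,
# not billing, CSS, or unrelated tests.
context = scoped_context("src", ["auth/models/*.py", "auth/oauth_controller.py"])
```

The point is not the heuristic; it is that inclusion is an explicit decision per task, not a default of "everything".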
02

Chaining

Break work into stages. Pass artifacts between them, not entire conversation histories.

Plan artifact (300 lines)
not conversation history (30K tokens)
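In code, chaining looks like a pipeline where only a compact artifact crosses each stage boundary. This is an illustrative sketch; `run_model` is a placeholder for any LLM call, not a real API:

```python
# Sketch of chaining: each stage consumes and produces a small artifact,
# so later stages never see the full conversation history.
def run_model(prompt: str) -> str:
    return f"[model output for {len(prompt)} prompt chars]"  # placeholder LLM call

def research(task: str) -> str:
    # Stage 1: distill the investigation into a compact plan artifact.
    return run_model(f"Summarize the implementation approach for: {task}")

def execute(plan: str) -> str:
    # Stage 2: only the ~300-line plan crosses the stage boundary.
    return run_model(f"Implement this plan:\n{plan}")

plan = research("add OAuth2 token refresh")
code = execute(plan)  # the 30K-token research conversation never travels here
```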
03

Headroom

Never operate at 100% capacity. Reserve space for the model to actually think.

Token limit = input + output
Leave room for reasoning
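A headroom check can be a one-line budget rule. This sketch assumes the ~75% ceiling mentioned above; the function name and numbers are illustrative:

```python
# Rough headroom check: input plus expected output must stay under a
# usable ceiling (assumed ~75% of the window), leaving room to reason.
def fits_with_headroom(input_tokens, expected_output_tokens,
                       window=200_000, ceiling=0.75):
    """Return True if the request leaves the model reasoning headroom."""
    return input_tokens + expected_output_tokens <= window * ceiling

fits_with_headroom(195_000, 4_000)   # False: the window is stuffed full
fits_with_headroom(120_000, 8_000)   # True: ample headroom remains
```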

Plan Stack

Implementation plans as first-class artifacts

Instead of letting research and decisions disappear with each /clear, Plan Stack captures them in lightweight, reusable plans.

A 50-file investigation becomes a 300-line plan. Six months later, reviewing one plan beats re-reading 50 files and re-discovering architectural intent.

  • Compressed research for AI context
  • Long-term memory for humans
  • A reliable starting point after context reset

Research, Plan, Execute, Review

Each phase applies context engineering principles. The workflow creates a self-reinforcing loop where knowledge compounds.

Research — Isolation

AI checks docs/plans/ for similar implementations. Never start from zero.

Plan — Headroom

Generate a structured implementation plan. Human reviews before any code is written.

Execute — Chaining

Implement with the plan as guide. The plan carries intent across context resets.

Review — Isolation + Chaining

Compare implementation against plan. Detect drift between intent and code.

Embrace the reset

Context degradation is inevitable. Plan Stack turns /clear from a loss into a feature.

! Before /clear — 95% context used, quality degrading
+ Resume from plan — 15% context, full fidelity restored

Restart at 0% context without restarting your work.

One line to begin

Add this instruction to your CLAUDE.md:

CLAUDE.md
Search docs/plans/ for similar past implementations before planning.

This single line creates the self-reinforcing loop:

  1. AI checks docs/plans/ first
  2. Finds distilled context (hundreds of tokens)
  3. Skips reading raw code (tens of thousands of tokens)
  4. Each new plan adds to the knowledge base
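The loop above can be sketched as a lookup that prefers a distilled plan over raw source. Everything here is hypothetical scaffolding (`find_prior_plan`, `read_relevant_sources`), not Plan Stack's actual implementation:

```python
from pathlib import Path

# Hypothetical sketch of steps 1-3: check docs/plans/ for a matching plan
# before falling back to reading raw source files.
def find_prior_plan(task_keywords, plans_dir="docs/plans"):
    """Return the first plan mentioning every task keyword, else None."""
    for plan_path in sorted(Path(plans_dir).glob("*.md")):
        text = plan_path.read_text(encoding="utf-8", errors="ignore").lower()
        if all(kw.lower() in text for kw in task_keywords):
            return plan_path
    return None

def read_relevant_sources():
    ...  # placeholder: the expensive fallback that reads raw code

plan = find_prior_plan(["oauth2", "refresh"])
if plan:
    context = plan.read_text()        # hundreds of tokens
else:
    context = read_relevant_sources() # tens of thousands of tokens
```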

Stop fighting context limits

Start engineering context. Plans compound with every commit.