AI-Native Workflow

Context Engineering for AI-Assisted Development

In large codebases, AI spends more tokens searching than building.
Plan Stack turns codebase knowledge into reusable plans — so every task starts with context, not exploration.

A zero-infrastructure cache for LLM codebase exploration.
No install required — just a docs/ folder and one line in CLAUDE.md.

Proven with a 10-engineer team on a 500-table Rails app

The real bottleneck: search cost

In a 500-table application, every AI task begins with exploration — ls, grep, file reading — burning thousands of tokens before a single line of code is written. That search pollutes the context window, leaving less room for actual implementation.

62x
Context Growth
From 32K to 2M tokens in two years. Yet quality plateaued.
~70%
Search Overhead
In large apps, agentic exploration dominates token usage before any code is generated.
Lost
In the Middle
Information in the middle of long contexts gets systematically ignored.

A context window is not storage. It is cognitive load.
Stuffing 195K tokens into a 200K window leaves no room for reasoning.

Three principles of context engineering

Stop treating context as infinite storage. Start engineering it like you engineer code.

01

Isolation

Provide the minimum effective context for each task. Scope by responsibility, not file size.

OAuth2 models + relevant controller
not: OAuth2 + billing + CSS + tests
02

Chaining

Break work into stages. Pass artifacts between them, not entire conversation histories.

Plan artifact (300 lines)
not: conversation history (30K tokens)
03

Headroom

Never operate at 100% capacity. Reserve space for the model to actually think.

Token limit = input + output
Leave room for reasoning
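As a rough illustration, the headroom principle can be reduced to a budget check before each turn. This is a sketch, not part of Plan Stack: the 200K window and the 25% reserve are illustrative numbers, and `should_clear` is a hypothetical helper name.

```python
def should_clear(input_tokens: int, window: int = 200_000,
                 reserve_ratio: float = 0.25) -> bool:
    """Signal a /clear when less than `reserve_ratio` of the window
    remains for the model's output and reasoning."""
    headroom = window - input_tokens
    return headroom < window * reserve_ratio

# 195K tokens in a 200K window leaves only 5K of headroom: time to reset.
```

The exact threshold matters less than having one at all — any fixed reserve prevents the "stuffed to 100%" failure mode described above.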

Codebases follow a Pareto distribution

Commit history reveals a power law: a small fraction of files account for the majority of changes. You don't need to search everything every time — distill knowledge of frequently-touched areas into plans, and search cost drops dramatically.

! Without Plan Stack
Every task → full codebase search
500 files scanned → ~15,000 tokens
+ With Plan Stack
Every task → check plans first
1 plan read → ~300 tokens
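You can check the power-law claim on your own repository by feeding the file list from `git log --name-only` into a concentration measure. A minimal sketch (the function name and the 20% cutoff are illustrative assumptions):

```python
from collections import Counter

def change_concentration(changed_files: list[str],
                         top_fraction: float = 0.2) -> float:
    """Fraction of all changes accounted for by the most-touched
    `top_fraction` of files. `changed_files` is one entry per change,
    e.g. the paths printed by `git log --name-only`."""
    counts = Counter(changed_files)
    ranked = sorted(counts.values(), reverse=True)
    top_n = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)
```

A Pareto-shaped codebase returns something near 0.8 — meaning plans for that top slice of files cover most future tasks.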

Plan Stack

Implementation plans as first-class artifacts

Instead of letting research and decisions disappear with each /clear, Plan Stack captures them in lightweight, reusable plans.

A 50-file investigation becomes a 300-line plan. Six months later, reviewing one plan beats re-reading 50 files and re-discovering architectural intent.

  • + Compressed research for AI context
  • + Long-term memory for humans
  • + A reliable starting point after context reset
  • + No infrastructure needed — Git-versioned, readable by both AI and humans

LLM-Readable Guides

Plans become implementation guides that Claude Code reads and follows. No need to re-explain your architecture every session.

AI reads plan → understands intent
not: AI greps 500 files blindly

Knowledge Inheritance

Similar modifications? The AI finds past plans with design decisions already documented. No human explanation needed.

"Add OAuth" → finds OAuth plan
not: rediscover patterns each time

Why docs/, not RAG?

Structured markdown is inherently searchable. Git-versioned naturally. No vector database, no embedding pipeline. Both AI and humans read the same source of truth.

docs/plans/ → versioned, reviewable, zero infra
not: RAG pipeline → embeddings → retrieval → overhead

Research, Plan, Execute, Review

Each phase applies context engineering principles. The workflow creates a self-reinforcing loop where knowledge compounds.

Research

AI checks docs/plans/ for similar implementations. Plan exists? Target files identified in hundreds of tokens. No plan? Agentic search runs — results get committed as a new plan.

Isolation

Plan

The AI generates a structured plan and a human reviews it before any code is written. This catches intent misalignment before tokens are spent on the wrong implementation.

Headroom

Execute

Plan carries intent across context resets. /clear and resume — no re-explanation needed, no context re-discovery.

Chaining

Review

The plan in the PR enables both AI and human review. Reviewers can detect drift between the intended design and the actual implementation.

Isolation + Chaining
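The Research phase above — "check docs/plans/ before searching the codebase" — can be sketched as a cheap keyword match over plan files. This is a hypothetical illustration, not Plan Stack's implementation; `find_candidate_plans` and the scoring rule are assumptions.

```python
from pathlib import Path

def find_candidate_plans(task: str, plans_dir: str = "docs/plans") -> list[Path]:
    """Rank existing plans by how many task keywords appear in their
    filename or opening lines. Hundreds of tokens, not a full grep."""
    keywords = {w.lower() for w in task.split() if len(w) > 3}
    scored = []
    for plan in Path(plans_dir).glob("*.md"):
        text = (plan.stem + " " + plan.read_text(errors="ignore")[:500]).lower()
        score = sum(1 for k in keywords if k in text)
        if score:
            scored.append((score, plan))
    return [p for _, p in sorted(scored, key=lambda s: -s[0])]
```

If the list comes back empty, the workflow falls through to agentic search — whose results then become the next plan.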

Embrace the reset

Context degradation is inevitable. Plan Stack turns /clear from a loss into a feature.

! Before /clear
95% context used, quality degrading
+ Resume from plan
15% context, full fidelity restored

Restart at 0% context without restarting your work.

One line to begin

Add this instruction to your CLAUDE.md:

CLAUDE.md
Search docs/plans/ for similar past implementations before planning.

This single line creates the self-reinforcing loop:

  1. AI checks docs/plans/ first (hundreds of tokens)
  2. Plan found → target files identified immediately
  3. No plan → agentic search runs on the codebase
  4. Search results committed as a new plan → the knowledge base grows
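Step 4 — persisting search results as a plan — is just writing a small markdown file. A minimal sketch under assumed conventions (the `save_plan` helper, the slug naming, and the section headings are illustrative, not a prescribed format):

```python
from pathlib import Path

def save_plan(title: str, target_files: list[str], notes: str,
              plans_dir: str = "docs/plans") -> Path:
    """Persist agentic-search findings as a plan so the next similar
    task starts from a ~300-token read instead of a codebase grep."""
    slug = title.lower().replace(" ", "-")
    body = "\n".join([
        f"# {title}",
        "",
        "## Target files",
        *[f"- {f}" for f in target_files],
        "",
        "## Notes",
        notes,
        "",
    ])
    path = Path(plans_dir) / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body)
    return path
```

Because the output is plain markdown in Git, the plan is immediately reviewable in the same PR as the code it describes.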

Built for every role in your team

Plan Stack isn't just an AI optimization. It improves workflows for everyone involved in software development.


AI-Friendly

Reduced search tokens, prevention of context pollution, explicit intent. AI spends tokens implementing, not exploring.

300 tokens to load plan
not: 15,000 tokens to grep codebase

Reviewer-Friendly

Plan in every PR. Reviewer — human or AI — knows the intended design before reading code.

Plan + diff = clear intent
not: guess intent from code changes

Team-Friendly

Knowledge accumulates and is inherited. New members read plans to understand past decisions. Onboarding accelerates.

Read 10 plans → understand the system
not: ask 10 people → different answers

Stop searching the same codebase every time

Start accumulating knowledge. Plans compound with every commit — from your first plan to your thousandth.