If you've been using Claude Code or Cursor seriously, you've probably had this moment:

"Wait... didn't I already figure this out yesterday?"

You grep the same files. You re-discover the same constraints. You re-explain the same architectural decisions to the model.

Claude is great at finding things. But it forgets why things were done the way they were — because we let it.

That's what this post is about.


The problem: AI dev feels like starting from zero every time

Agentic coding tools feel magical at first. "Search the repo." "Find related code." "Explain how this works."

But after a while, a pattern emerges. The same investigations happen again and again. The same reasoning gets recomputed. The same context disappears between sessions.

This isn't a UX bug. It's a design choice.


From RAG to Agentic Search: a quick map

Before Claude Code, the dominant idea was RAG: pre-index documents, embed everything, retrieve via vector search.

Then came Agentic Search: no upfront indexing, the model explores at runtime, using grep, glob, and read instead of embeddings.

Claude Code fully committed to this approach.

In a Latent Space podcast interview (May 2025), Boris Cherny — the creator of Claude Code — explained the decision directly:

"Early versions of Claude Code used RAG + a local vector db, but we found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."

He also noted that even Anthropic's own codebase was too sensitive to upload to a third-party index. If Anthropic wouldn't trust the RAG security model with its own code, that says a lot.

On the other side, Milvus published a detailed counter-argument: "Why I'm Against Claude Code's Grep-Only Retrieval". Their claim: grep burns too many tokens and lacks semantic understanding, while their vector-search-based approach cut token usage by 40% or more.

They're not wrong — but they're arguing from a different optimization target.


Why Claude Code's "simple" approach won

Claude Code deliberately avoided vector DBs, caches, and persistent memory systems. Instead, it bet on raw model capability, cheap tokens (eventually), and simple, transparent tools.

And it worked. grep + glob + read beat carefully engineered RAG pipelines.

The market validated this decisively. Claude Code reached $1 billion in annualized revenue within six months of its public launch in May 2025 — a velocity that even ChatGPT didn't match. Netflix, Spotify, KPMG, L'Oreal, and Salesforce are among its enterprise users.

Claude Code didn't win because it was clever. It won because it was honest about scope.

As Boris described in his widely discussed workflow thread, the philosophy is "do the simple thing first." Whether it's the memory implementation (a markdown file that gets auto-loaded) or prompt summarization (just ask Claude to summarize), the team always picks the smallest building blocks that are useful, understandable, and extensible.

That's important for what comes next.


The intentional omission

Claude Code is excellent at one-shot exploration, ad-hoc understanding, and answering "what's going on right now?"

But it intentionally does not optimize for token efficiency across runs, reuse of reasoning, or team-level knowledge accumulation.

This is not a flaw. It's a boundary.

Claude Code optimizes for the best possible experience in this moment.

Which raises a simple question:

Why do we throw away the result of that exploration every time?


Plan Stack: don't redo what you already learned

Plan Stack doesn't reject Agentic Search. It assumes it.

The only difference is this: if the exploration was valuable, don't let it evaporate.

That's it.

No vector database. No memory service. No special infra.

Just docs/, plans/, and Git.

Same philosophy as Claude Code: don't build clever systems. Use boring tools well.

Agentic Search happens once. Its result becomes non-volatile.


What "context engineering" looks like in practice

The key question after a search is: "Is this reasoning reusable?"

If yes, extract the decision, write why (not just what), and commit it as a plan or doc.

A concrete example

Say you're working on a Rails app with a complex payment integration. The first time around, Claude Code explores 14 files across app/models/, app/services/, and lib/. It discovers that error handling uses a custom retry wrapper because the payment gateway sometimes returns transient 5xx errors. It finds that timeout settings were intentionally set to 30 seconds (not the default 15s) due to a production incident six months ago.
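The kind of code that exploration turns up might look like this. This is a minimal, hypothetical sketch: `PaymentRetryService`, `TransientGatewayError`, and the backoff constants are illustrative names, not taken from any real codebase.

```ruby
# Hypothetical sketch of the custom retry wrapper the exploration uncovered.
# The "why" lives only in comments like these -- exactly the context that
# evaporates between sessions.
class TransientGatewayError < StandardError; end

class PaymentRetryService
  MAX_ATTEMPTS = 3
  TIMEOUT_SECONDS = 30 # raised from the 15s default after a production incident

  # Runs the given block, retrying on transient gateway errors with
  # exponential backoff; re-raises once MAX_ATTEMPTS is exhausted.
  def self.with_retries
    attempts = 0
    begin
      attempts += 1
      yield
    rescue TransientGatewayError
      raise if attempts >= MAX_ATTEMPTS
      sleep(2**attempts * 0.1) # backoff: 0.2s, then 0.4s
      retry
    end
  end
end
```

Nothing in this code explains why the timeout is 30 seconds or why three attempts is the ceiling; that reasoning exists only in whoever (or whatever) explored the history.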

This exploration costs roughly 3,000–5,000 tokens. And the reasoning — why those decisions exist — lives nowhere except that single session.

Two weeks later, a different team member asks Claude Code to refactor the payment flow. Same investigation. Same token cost. Same rediscovery. Zero memory of the previous session.

With Plan Stack, after the first exploration, the decision context gets committed as a structured document:

docs/plans/payment-gateway-error-handling.md

Inside, it records: custom retry wrapper in PaymentRetryService, 30-second timeout (not default 15s), the reason (transient 5xx errors under load, cascading failures from a past incident), retry logic (exponential backoff, max 3 attempts), and the key files involved.
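Such a plan doc might look like this. This is a hypothetical sketch assembled from the findings above; the file paths and section headings are illustrative, not a prescribed Plan Stack format.

```markdown
# Payment gateway error handling

## Decision
Use the custom retry wrapper in `PaymentRetryService`; gateway timeout is
30 seconds, not the framework default of 15.

## Why
- The gateway returns transient 5xx errors under load.
- The 15s default contributed to cascading failures in a past
  production incident.

## How
- Exponential backoff, max 3 attempts.

## Key files
- app/services/payment_retry_service.rb
- config/initializers/payment_gateway.rb
```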

The second run starts here. No re-exploration. Focus only on what actually changed.


This isn't RAG vs Agentic Search

That debate misses the point.

Claude Code proved that runtime search beats upfront indexing. Plan Stack asks a different question: why should we forget what we just learned?

This isn't opposition. It's continuation.

Both share the same core philosophy. Both avoid complex infrastructure — Claude Code uses grep and glob instead of vector DBs; Plan Stack uses docs and Git instead of special storage. Both bet on model improvement — models get smarter at searching, and they also get smarter at reading plans. Both start simple — "do the simple thing first" means markdown files in a repo, not a new platform to learn.

The difference is scope. Claude Code delivers the best experience in this session. Plan Stack delivers compounding value across sessions and team members.


Closing thought

Agentic Search proved that searching at runtime beats indexing upfront. Plan Stack adds the missing half: keep what the search taught you.

If you're curious, visit plan-stack.ai or check out the GitHub repo at planstack-ai/planstack.

I'd love to hear pushback — especially from folks who run Claude Code in large repos with multiple contributors, have tried CLAUDE.md-based approaches and hit scaling limits, or think this is just "writing docs" with extra steps. (It kind of is — and that's the point.)

Sources

  1. Boris Cherny on Latent Space Podcast (May 2025) — latent.space/p/claude-code
  2. Milvus: "Why I'm Against Claude Code's Grep-Only Retrieval" — milvus.io
  3. Anthropic: "Claude Code reaches $1B milestone" — anthropic.com
  4. Boris Cherny's workflow thread (Jan 2026) — twitter-thread.com
  5. VentureBeat: "The creator of Claude Code just revealed his workflow" — venturebeat.com