Andrej Karpathy posted a thread on X on April 3 describing how he now spends most of his LLM token budget not on code generation but on building personal knowledge bases. The next day he followed up with a GitHub Gist laying out the full pattern. Within 72 hours, community implementations were everywhere: Claude Code plugins, Obsidian integrations, full repo scaffolds. The pace tells you something about how many people were already frustrated with RAG.
The pitch
Karpathy's argument is straightforward. Most people's experience with LLMs and documents looks like retrieval-augmented generation: you upload files, the model finds chunks at query time, stitches together an answer, and forgets everything by the next question. NotebookLM, ChatGPT file uploads, the whole ecosystem works this way. Nothing accumulates.
His alternative: have the LLM read a source once, extract what matters, and compile it into structured markdown wiki pages. Entity pages, concept pages, cross-references, an index, a log. Each new source updates existing pages and creates new ones. Ask a question, and the model navigates a knowledge base it built itself. Good answers get saved back as new pages.
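The compile step above can be sketched as a loop over raw sources. This is a minimal, hypothetical illustration, not Karpathy's code (the gist ships no code at all): the LLM call is stubbed out with a placeholder, and the merge policy (append to existing pages) is an assumption.

```python
import pathlib

def summarize_with_llm(text: str) -> dict:
    """Placeholder for the actual LLM call (Claude Code, Codex, etc.).

    Returns a mapping of wiki page names to markdown bodies. In the real
    pattern the extraction rules come from the schema file; this stub just
    keys the whole source under its first line so the loop is runnable."""
    title = text.splitlines()[0][:40] if text else "untitled"
    return {f"{title}.md": text}

def compile_source(source: pathlib.Path, wiki: pathlib.Path) -> None:
    """Read one raw source and merge the extracted pages into the wiki."""
    pages = summarize_with_llm(source.read_text())
    wiki.mkdir(exist_ok=True)
    for name, body in pages.items():
        page = wiki / name
        if page.exists():
            # Existing pages accumulate rather than get overwritten:
            # each new source updates what is already there.
            page.write_text(page.read_text() + "\n\n" + body)
        else:
            page.write_text(body)
```

The key property is the branch in the middle: a second source touching the same entity updates the existing page instead of creating a parallel one, which is what makes the wiki accumulate instead of fragment.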
Karpathy says his own wiki hit about 100 articles and 400,000 words. All markdown. Viewable in Obsidian.
What's actually in the gist?
Here's where it gets interesting, and also where the hype needs some deflation. The gist is not code. It is not a library. Karpathy calls it an "idea file," and he means it literally. You copy-paste the document into your LLM agent (Claude Code, OpenAI Codex, whatever) and it builds a customized implementation for your setup. His exact framing: there is less point in sharing specific code in the era of LLM agents, so you share the idea instead.
That's a genuinely novel distribution model for software patterns. But it also means there's nothing to install, nothing to benchmark, and nothing to file bugs against. The community repos popping up on GitHub are other people's interpretations, not Karpathy's code.
The gist describes three layers: a raw/ folder for immutable source material (papers, transcripts, notes), a wiki/ folder the LLM maintains, and a schema file (CLAUDE.md or AGENTS.md) that defines the rules for how knowledge gets organized. That schema, Karpathy notes, is co-evolved between human and LLM over time. This is arguably the most intellectually demanding part of the system, and people keep glossing over it.
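The three layers translate into a very small scaffold. A sketch, with the schema text as a purely hypothetical starting point; the gist's whole premise is that you co-evolve these rules with the LLM rather than fixing them up front:

```python
import pathlib

# Hypothetical starting schema, NOT the one from the gist.
SCHEMA = """\
# Wiki schema

- raw/ is immutable: read source files, never edit them.
- Entity and concept pages cross-reference each other with [[wikilinks]].
- index.md lists every page; update it whenever a page is added.
- Revisit and revise these rules as the wiki grows.
"""

def scaffold(root: pathlib.Path) -> None:
    """Create the three-layer layout: raw/, wiki/, and the schema file."""
    (root / "raw").mkdir(parents=True, exist_ok=True)   # immutable sources
    (root / "wiki").mkdir(exist_ok=True)                # LLM-maintained pages
    (root / "CLAUDE.md").write_text(SCHEMA)             # organization rules
    (root / "wiki" / "index.md").write_text("# Index\n")
```

Everything interesting lives in the schema file, which is exactly the part a scaffold can't give you.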
Does this actually replace RAG?
Depends on scale. For a personal knowledge base under a few hundred documents, reading markdown directly is probably fine. Context windows are large enough now that an LLM can navigate an index and drill into specific pages without vector search. No embeddings, no chunking, no cosine similarity thresholds to tune.
But Karpathy's wiki is 400,000 words. That's on the order of half a million tokens. Even with today's largest context windows, you can't comfortably load the whole thing at once. The approach works because the LLM reads the index first (a few thousand tokens), identifies relevant pages, then reads those. It's still retrieval, just with a human-readable index instead of a vector database.
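That index-first navigation can be approximated in a few lines. This sketch substitutes keyword overlap for the LLM's judgment (in the actual pattern, the model reads the index and decides for itself), and the character budget is an assumed stand-in for a token budget:

```python
import pathlib

def gather_context(question: str, wiki: pathlib.Path,
                   budget_chars: int = 40_000) -> str:
    """Index-first retrieval: load index.md, then pull in only the pages
    whose names share words with the question, up to a context budget."""
    index = (wiki / "index.md").read_text()
    q_words = {w.lower().strip("?.,") for w in question.split()}
    context, used = [index], len(index)
    for line in index.splitlines():
        name = line.strip("- []").strip()           # e.g. "- [attention]"
        page = wiki / f"{name}.md"
        if page.exists() and q_words & {w.lower() for w in name.split()}:
            body = page.read_text()
            if used + len(body) > budget_chars:
                break                               # respect the budget
            context.append(body)
            used += len(body)
    return "\n\n".join(context)  # this is what gets handed to the LLM
```

Note that nothing here embeds anything: the "retrieval" is just reading files an LLM chose by looking at a table of contents.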
At enterprise scale, with tens of thousands of documents? I'm skeptical. The gist doesn't address this, and the community implementations I've seen don't either. The compilation step alone would be brutal on token costs, and keeping the wiki consistent as sources contradict each other is a problem the gist acknowledges but doesn't solve.
The drift problem
One thing nobody's talking about enough: if your LLM compiles the wiki and your LLM maintains the wiki and your LLM queries the wiki, you've built a closed epistemic loop. Errors compound the same way knowledge does. A misinterpretation during ingestion gets baked into a wiki page, which then informs future queries, which might generate new pages that build on the error. Karpathy mentions a lint operation for catching broken links and stale references, but that's structural maintenance, not factual verification.
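A structural lint of the kind Karpathy mentions is genuinely easy to write, which underlines the asymmetry: checking links is mechanical, checking facts is not. A minimal sketch (the wikilink syntax is assumed from Obsidian's conventions):

```python
import pathlib
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # [[target]], [[target|alias]], [[target#section]]

def lint_wiki(wiki: pathlib.Path) -> list:
    """Report [[wikilinks]] that point at pages which don't exist.
    This catches broken structure only; a misread source that produced a
    perfectly well-linked, wrong page sails straight through."""
    pages = {p.stem for p in wiki.glob("*.md")}
    problems = []
    for page in wiki.glob("*.md"):
        for target in WIKILINK.findall(page.read_text()):
            if target.strip() not in pages:
                problems.append(f"{page.name}: broken link to [[{target}]]")
    return problems
```

Factual verification would need something this lint can't have: access to the raw sources and a second, independent reading of them.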
Why it resonated
The real story here isn't the architecture. Anyone who's built a Zettelkasten or used Obsidian has thought about LLM-maintained notes. The reason Karpathy's post went viral is that he named a frustration most power users share: the feeling that every conversation with an LLM starts from zero. You explain context, the model helps, and then it all evaporates. Your chat history is a graveyard of good answers you can never find again.
The wiki pattern promises persistence. And I think that's what people are responding to more than any specific technical claim.
Whether "idea files" become a real distribution pattern, and whether this specific architecture holds up at scale, are open questions. The gist has been up for three days. Give it six months.