AI Tools

Developers Are Restructuring Entire Codebases to Get More Out of Claude Code

Teams reshape repos around CLAUDE.md and skills folders to get senior-engineer output from AI agents.

Oliver Senti
Senior AI Editor
March 7, 2026 · 7 min read
[Image: Terminal window showing a CLAUDE.md file open alongside a project directory structure with .claude skills folders]

Something odd is happening in developer workflows. Instead of just plugging AI coding tools into existing projects, a growing number of teams are reorganizing their entire repository structures to squeeze more out of Anthropic's Claude Code. The practice has gone from fringe experiment to a recognizable pattern, complete with its own conventions and community guides.

Claude Code launched as a research preview in February 2025 and reached general availability that May. By November, Anthropic reported the tool had crossed $1 billion in annualized revenue, and the open-source CLI has tens of thousands of GitHub stars. According to one analysis, it now accounts for roughly 4% of all public GitHub commits.

The CLAUDE.md question

At the center of all this restructuring is a single markdown file. CLAUDE.md sits in your project root and acts as persistent memory for the agent, loaded automatically at the start of every session. It tells Claude about your architecture, your coding standards, your preferred commands, your project's weird quirks. Without it, every conversation starts cold. With a good one, the agent already knows that your Stripe webhook handler requires signature validation and that your team uses conventional commits.

Anthropic's official guidance suggests keeping CLAUDE.md files under 200 lines. The reasoning is practical: everything in the file competes for attention in the context window, and longer files reduce how consistently Claude follows instructions. But plenty of teams have blown past that limit and are now dealing with the consequences. The HumanLayer blog puts it more bluntly: the general consensus is that under 300 lines works, and shorter is better. HumanLayer's own root CLAUDE.md is under sixty lines.

The /init command generates a starter file automatically, but most developers treat it as a rough draft. "Deleting is easier than creating from scratch," as one popular guide put it. The generated file tends to include obvious information that wastes context tokens.
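A trimmed-down CLAUDE.md in that spirit might look something like the following. Every project detail here is hypothetical; the point is the shape: short, specific, and free of anything Claude could infer on its own.

```markdown
# acme-billing (hypothetical project)

## Architecture
- Next.js app in `app/`, shared packages in `packages/`
- Stripe webhooks live in `app/api/webhooks/` — always validate signatures

## Commands
- `pnpm test` — run the test suite
- `pnpm typecheck` — strict TypeScript check; run after every change

## Conventions
- Conventional commits (`feat:`, `fix:`, `chore:`)
- Explicit over clever; keep diffs minimal
```

Notice what's absent: no framework tutorials, no restated defaults, nothing the model already knows.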

Skills, hooks, and the new project anatomy

CLAUDE.md is just the entry point. The deeper restructuring involves the .claude/ directory, which can contain skills folders, custom slash commands, agent definitions, and hooks that enforce consistency. Skills are folders with a SKILL.md file and optional helper scripts. Claude loads them on demand when a task matches the skill's description, which means your context window isn't bloated with instructions about PDF processing when you're debugging a React component.
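The resulting layout, sketched with illustrative names (the `pdf-processing` skill and its helper script are hypothetical), tends to look like this:

```
.claude/
├── skills/
│   └── pdf-processing/
│       ├── SKILL.md       # frontmatter (name, description) + instructions
│       └── extract.py     # optional helper script the skill can call
├── commands/              # custom slash commands
├── agents/                # subagent definitions
└── settings.json          # hooks and permissions
```

The SKILL.md frontmatter carries a name and a description; Claude matches the description against the current task and only then loads the full instructions into context.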

Hooks are where things get interesting. Pre-tool hooks can block dangerous commands. Post-tool hooks can auto-format code or run type checks after every edit. One showcase project on GitHub includes hooks that auto-format code, run tests when test files change, and block edits to the main branch. It is, in effect, a set of guardrails that treats the AI agent the way you'd treat a new hire with admin access.
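In Claude Code's settings, hooks register commands against tool-use events. A minimal sketch of the guardrails described above might look like this; the matchers follow the documented pattern, but the commands (and `block-dangerous.sh`, a hypothetical script that inspects the proposed command) are illustrative, and the exact schema may differ across versions:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/block-dangerous.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```

A PreToolUse hook can veto the call by exiting with an error status, which is what makes this a hard guardrail rather than a suggestion the model can ignore.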

Garry Tan's prompt went viral. The reaction was mixed.

Y Combinator CEO Garry Tan posted his Claude Code workflow on X earlier this year and it racked up thousands of likes. His approach uses Claude Code's Plan Mode to force a four-stage review before any code gets written: architecture, code quality, tests, and performance. For every issue Claude finds, it has to present options with tradeoffs and make an opinionated recommendation, then wait for human approval before proceeding.

Tan reported shipping features exceeding 4,000 lines of code with full test coverage in about an hour. He averages 1.3 lines of test for every line of production code using this workflow, according to a breakdown on Anup.io.

Not everyone was impressed. A Medium post titled "Y Combinator's CEO Shared His Claude Code Prompt. It Solves the Wrong Problem" argued that the four-stage review pipeline created more overhead than it eliminated. The author spent more time approving Claude's suggestions than actually building. Which, if you've used Plan Mode on anything complex, sounds about right.

The prompt itself bakes in Tan's engineering preferences: DRY, explicit over clever, minimal diffs. That last part matters. The structure is the valuable contribution; the specific opinions are things you'd want to swap out for your own team's standards.

Everyone shipped multi-agent in the same two weeks

The competitive landscape shifted in early 2026 when, in a span of about two weeks, every major AI coding tool shipped some form of multi-agent support. Cursor 2.4 introduced subagents with agent-to-agent communication. OpenAI's Codex CLI got its Agents SDK. Windsurf shipped 5 parallel agents. Claude Code launched Agent Teams. Even Grok Build showed up with 8 agents.

The convergence is telling. Multi-agent is now table stakes. But the implementations differ. Cursor's subagents work in parallel but can't communicate with each other across sessions. Claude Code's Agent Teams share task lists and send bidirectional messages. OpenAI's Codex emphasizes autonomous cloud sandboxes where you fire off a task and check back later.

For developers who've invested time restructuring their repos around CLAUDE.md and skills, this is where the payoff is supposed to show up. A well-configured project gives each sub-agent the context it needs without burning the entire context window on orientation. A poorly configured one means each agent wastes tokens rediscovering your project structure.

The vibe coding backdrop

All of this is unfolding against the "vibe coding" phenomenon that Andrej Karpathy named in February 2025. Tan himself noted that 25% of YC's Winter 2025 batch had codebases that were over 95% AI-generated. Collins Dictionary named "vibe coding" its Word of the Year for 2025.

But there's a tension here that doesn't get discussed enough. Vibe coding, in Karpathy's original description, was about giving in to the vibes, accepting whatever the AI produces without fully reading it. The CLAUDE.md restructuring movement is the opposite impulse: it's about imposing structure on the AI so thoroughly that its output conforms to your standards before you even look at it. One philosophy says trust the machine. The other says configure the machine until trust isn't required.

I'm not sure which camp wins long-term. The restructuring approach clearly produces better results for complex, multi-file projects where consistency matters. But it also creates a new category of engineering work that didn't exist eighteen months ago: maintaining the configuration files that tell your AI how to write code.

What's actually at stake

Claude Code's SWE-bench Verified score of 80.8% is the highest published among coding agents. OpenAI's GPT-5.3-Codex leads Terminal-Bench 2.0 at 77.3%. These benchmarks measure different things, but they suggest both tools are genuinely capable on hard problems. The question isn't whether AI can write code. It's whether your project is set up to let it write your code, with your conventions and your edge cases handled.

That's what the CLAUDE.md restructuring is really about. Not making AI smarter, but making your codebase legible to an agent that starts every session with amnesia. The bet is that structure outlasts prompts, that a well-configured repo will keep producing good results even as the underlying models change.

Whether that bet pays off depends on how fast the tools evolve. Anthropic's auto-memory feature already lets Claude write its own notes based on corrections. Cursor's skills marketplace creates reusable workflow packages. If the tools get smart enough to understand your project without a 200-line briefing document, all this restructuring becomes overhead.

For now, though, the developers doing it report real results. And the ones who aren't are starting to notice the gap.

Tags: claude-code, ai-coding, developer-tools, anthropic, vibe-coding, agentic-coding, software-engineering, CLAUDE.md, openai-codex, cursor
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
