QUICK INFO
|   |   |
| --- | --- |
| Difficulty | Intermediate |
| Time Required | 45-90 minutes for initial setup |
| Prerequisites | Terminal/command line familiarity, Git basics, a project idea |
| Tools Needed | Claude Code CLI (latest), Git, Bash/Zsh, Docker (recommended) |
What You'll Learn:
- Set up the Ralph Wiggum autonomous development loop from scratch
- Create specification files and prompts that guide AI effectively
- Configure sandboxing to run Claude Code safely without permission interruptions
- Troubleshoot common loop failures and tune prompts based on observed patterns
The Ralph Wiggum technique is a bash loop that feeds Claude Code the same prompt repeatedly until it finishes your project. Named after the Simpsons character who persists despite setbacks, it works because each iteration gets a fresh context window while state persists in files and git. This guide covers the full setup: from installing Claude Code through writing your first autonomous loop.
What Ralph Actually Is
At its core, Ralph is a single line of bash:
while :; do cat PROMPT.md | claude ; done
That's the purest form. Claude reads a prompt file, does work, commits progress, and exits. The bash loop immediately restarts it with fresh context. The IMPLEMENTATION_PLAN.md file on disk tells each new Claude instance what to work on next.
The technique addresses a real limitation: AI models degrade as their context window fills with failed attempts and irrelevant code. By forcing fresh context on every task, Ralph avoids "context pollution" where the model keeps referencing bad information. State lives in files and git, not in Claude's memory.
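The state-on-disk model is easy to see in a toy sketch. Nothing here calls Claude; tasks.txt is an illustrative stand-in for IMPLEMENTATION_PLAN.md:

```shell
# Toy model of Ralph's state handling: each pass through the loop starts with
# no memory of the previous one and recovers all state from a file on disk.
printf 'task A\ntask B\ntask C\n' > tasks.txt

while [ -s tasks.txt ]; do                # loop while the "plan" is non-empty
  task=$(head -n 1 tasks.txt)             # fresh start: read state from disk
  echo "completing: $task"                # stand-in for Claude doing the work
  tail -n +2 tasks.txt > tasks.tmp && mv tasks.tmp tasks.txt  # persist progress
done
echo "plan empty: loop exits"
```

Each pass knows nothing except what the file says, which is exactly why keeping the plan file accurate matters so much in the real loop.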
Prerequisites Check
Before starting, verify you have:
Required:
- A Unix-like environment (macOS, Linux, or WSL on Windows)
- Git installed and configured
- An Anthropic account with Claude Pro, Claude Max, or API access
Strongly Recommended:
- Docker for sandboxing (more on this later)
- A project with existing tests or type checking (this becomes your "backpressure")
If you're on Windows without WSL, install it first. The native Windows command line won't work for any of this.
Installing Claude Code
The native installer is the recommended approach. Open your terminal:
curl -fsSL https://claude.ai/install.sh | bash
After installation, restart your terminal or run source ~/.bashrc (or ~/.zshrc on macOS). Verify it worked:
claude --version
You should see a version number. If you get "command not found," add Claude's binary location to your PATH:
echo 'export PATH="$HOME/.claude/bin:$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
Authentication: Run claude once in any directory. It will open a browser window for OAuth authentication with your Anthropic account. Complete this before proceeding.
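Before going further, a quick sanity check of the toolchain can save a failed first run. A minimal sketch (check_tools is a hypothetical helper; extend the tool list to match your stack):

```shell
# Report which required tools are on PATH. This only reports; it doesn't
# abort, since docker may legitimately be absent at this stage.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_tools git claude docker
```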
Project Structure
Ralph needs specific files to function. Create this structure in your project:
your-project/
├── loop.sh                 # The bash script that runs Ralph
├── PROMPT_build.md         # Instructions for building/implementing
├── PROMPT_plan.md          # Instructions for planning/gap analysis
├── AGENTS.md               # How to build/test this project
├── IMPLEMENTATION_PLAN.md  # Task list (generated by Ralph)
├── specs/                  # One file per feature/topic
│   └── [feature-name].md
└── src/                    # Your source code
The IMPLEMENTATION_PLAN.md file doesn't exist initially. Ralph creates it during the planning phase.
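The layout can be scaffolded in one short script (the directory name ralph-demo is just an example):

```shell
# Create Ralph's expected file layout. IMPLEMENTATION_PLAN.md is deliberately
# not created here: Ralph generates it during the planning phase.
set -eu
project="ralph-demo"    # illustrative name; point this at your project
mkdir -p "$project"/specs "$project"/src
touch "$project"/loop.sh "$project"/PROMPT_build.md \
      "$project"/PROMPT_plan.md "$project"/AGENTS.md
chmod +x "$project"/loop.sh
ls "$project"
```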
AGENTS.md
This file tells Claude how to build and test your project. Keep it short and operational (around 60 lines maximum), and don't use it as a progress diary.
## Build & Run
- Install: `npm install`
- Dev server: `npm run dev`
- Build: `npm run build`
## Validation
Run these after implementing changes:
- Tests: `npm test`
- Typecheck: `npx tsc --noEmit`
- Lint: `npm run lint`
## Codebase Patterns
- Components go in `src/components/`
- Utilities go in `src/lib/`
- Tests are co-located: `Button.tsx` → `Button.test.tsx`
The validation commands are critical. They create "backpressure" that prevents Claude from committing broken code. If tests fail, Claude has to fix them before proceeding.
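Mechanically, backpressure is nothing more than exit codes. A generic sketch (run_checks is a hypothetical helper, not part of Claude Code):

```shell
# Run each validation command in order; stop at the first nonzero exit code.
# This is the gate that blocks a commit: any failing check returns 1.
run_checks() {
  for cmd in "$@"; do
    if ! eval "$cmd"; then
      echo "backpressure: '$cmd' failed" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# For the AGENTS.md above, the call would look like:
#   run_checks "npm test" "npx tsc --noEmit" "npm run lint"
```

If a check "fails" but still exits 0 (some wrapper scripts swallow errors), Claude sees success and commits anyway, so verify your commands' exit codes by hand first.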
specs/ Directory
Each spec file describes one feature or "topic of concern." Write these collaboratively with an AI before starting the loop, or write them yourself. The key is one concept per file.
A spec might look like:
# User Authentication
## Overview
Users can sign up, log in, and log out. Sessions persist across browser restarts.
## Requirements
- Email/password signup with validation
- JWT-based sessions stored in httpOnly cookies
- Password reset via email link
- Rate limiting on login attempts (5 per minute)
## Acceptance Criteria
- User can create account with valid email
- Invalid emails show clear error message
- Session persists after browser restart
- Logout clears session completely
Don't overthink the format. These specs are for Claude to read, and it handles unstructured text well.
The Prompt Files
You need two prompt files: one for planning, one for building.
PROMPT_plan.md (creates the implementation plan):
0a. Study `specs/*` with up to 250 parallel Sonnet subagents to learn the application specifications.
0b. Study @IMPLEMENTATION_PLAN.md (if present) to understand the plan so far.
0c. Study `src/lib/*` with up to 250 parallel Sonnet subagents to understand shared utilities & components.
0d. For reference, the application source code is in `src/*`.
1. Study @IMPLEMENTATION_PLAN.md (if present; it may be incorrect) and use up to 500 Sonnet subagents to study existing source code in `src/*` and compare it against `specs/*`. Use an Opus subagent to analyze findings, prioritize tasks, and create/update @IMPLEMENTATION_PLAN.md as a bullet point list sorted in priority of items yet to be implemented. Ultrathink. Consider searching for TODO, minimal implementations, placeholders, skipped/flaky tests, and inconsistent patterns. Study @IMPLEMENTATION_PLAN.md to determine starting point for research and keep it up to date with items considered complete/incomplete using subagents.
IMPORTANT: Plan only. Do NOT implement anything. Do NOT assume functionality is missing; confirm with code search first. Treat `src/lib` as the project's standard library for shared utilities and components. Prefer consolidated, idiomatic implementations there over ad-hoc copies.
ULTIMATE GOAL: We want to achieve [YOUR PROJECT GOAL HERE]. Consider missing elements and plan accordingly. If an element is missing, search first to confirm it doesn't exist, then if needed author the specification at specs/FILENAME.md. If you create a new element then document the plan to implement it in @IMPLEMENTATION_PLAN.md using a subagent.
Replace [YOUR PROJECT GOAL HERE] with your actual goal.
PROMPT_build.md (implements from the plan):
0a. Study `specs/*` with up to 500 parallel Sonnet subagents to learn the application specifications.
0b. Study @IMPLEMENTATION_PLAN.md.
0c. For reference, the application source code is in `src/*`.
1. Your task is to implement functionality per the specifications using parallel subagents. Follow @IMPLEMENTATION_PLAN.md and choose the most important item to address. Before making changes, search the codebase (don't assume not implemented) using Sonnet subagents. You may use up to 500 parallel Sonnet subagents for searches/reads and only 1 Sonnet subagent for build/tests. Use Opus subagents when complex reasoning is needed (debugging, architectural decisions).
2. After implementing functionality or resolving problems, run the tests for that unit of code that was improved. If functionality is missing then it's your job to add it as per the application specifications. Ultrathink.
3. When you discover issues, immediately update @IMPLEMENTATION_PLAN.md with your findings using a subagent. When resolved, update and remove the item.
4. When the tests pass, update @IMPLEMENTATION_PLAN.md, then `git add -A` then `git commit` with a message describing the changes. After the commit, `git push`.
99999. Important: When authoring documentation, capture the why — tests and implementation importance.
999999. Important: Single sources of truth, no migrations/adapters. If tests unrelated to your work fail, resolve them as part of the increment.
9999999. As soon as there are no build or test errors create a git tag. If there are no git tags start at 0.0.0 and increment patch by 1 for example 0.0.1 if 0.0.0 does not exist.
99999999. You may add extra logging if required to debug issues.
999999999. Keep @IMPLEMENTATION_PLAN.md current with learnings using a subagent — future work depends on this to avoid duplicating efforts. Update especially after finishing your turn.
9999999999. When you learn something new about how to run the application, update @AGENTS.md using a subagent but keep it brief.
99999999999. For any bugs you notice, resolve them or document them in @IMPLEMENTATION_PLAN.md using a subagent even if it is unrelated to the current piece of work.
999999999999. Implement functionality completely. Placeholders and stubs waste efforts and time redoing the same work.
9999999999999. When @IMPLEMENTATION_PLAN.md becomes large periodically clean out the items that are completed from the file using a subagent.
99999999999999. If you find inconsistencies in the specs/* then use an Opus 4.5 subagent with 'ultrathink' requested to update the specs.
999999999999999. IMPORTANT: Keep @AGENTS.md operational only — status updates and progress notes belong in IMPLEMENTATION_PLAN.md. A bloated AGENTS.md pollutes every future loop's context.
The numbered guardrails (99999, 999999, etc.) aren't arbitrary. Higher numbers signal higher importance to the model. These accumulate over time as you observe failure patterns and add "signs" to prevent them.
loop.sh
This script wraps the core loop with mode selection and iteration limits:
#!/bin/bash
# Usage: ./loop.sh [plan] [max_iterations]
# Examples:
# ./loop.sh # Build mode, unlimited iterations
# ./loop.sh 20 # Build mode, max 20 iterations
# ./loop.sh plan # Plan mode, unlimited iterations
# ./loop.sh plan 5 # Plan mode, max 5 iterations
if [ "$1" = "plan" ]; then
    MODE="plan"
    PROMPT_FILE="PROMPT_plan.md"
    MAX_ITERATIONS=${2:-0}
elif [[ "$1" =~ ^[0-9]+$ ]]; then
    MODE="build"
    PROMPT_FILE="PROMPT_build.md"
    MAX_ITERATIONS=$1
else
    MODE="build"
    PROMPT_FILE="PROMPT_build.md"
    MAX_ITERATIONS=0
fi

ITERATION=0
CURRENT_BRANCH=$(git branch --show-current)

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Mode: $MODE"
echo "Prompt: $PROMPT_FILE"
echo "Branch: $CURRENT_BRANCH"
[ "$MAX_ITERATIONS" -gt 0 ] && echo "Max: $MAX_ITERATIONS iterations"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

if [ ! -f "$PROMPT_FILE" ]; then
    echo "Error: $PROMPT_FILE not found"
    exit 1
fi

while true; do
    if [ "$MAX_ITERATIONS" -gt 0 ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then
        echo "Reached max iterations: $MAX_ITERATIONS"
        break
    fi

    cat "$PROMPT_FILE" | claude -p \
        --dangerously-skip-permissions \
        --output-format=stream-json \
        --model opus \
        --verbose

    git push origin "$CURRENT_BRANCH" || {
        echo "Failed to push. Creating remote branch..."
        git push -u origin "$CURRENT_BRANCH"
    }

    ITERATION=$((ITERATION + 1))
    echo -e "\n\n======================== LOOP $ITERATION ========================\n"
done
Make it executable:
chmod +x loop.sh
The -p flag runs Claude in headless mode. --dangerously-skip-permissions is required for autonomous operation, which brings us to the next critical topic.
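Before burning tokens, the loop's argument handling can be smoke-tested with a stub claude placed first on PATH (stub-bin is an illustrative directory name):

```shell
# Fake `claude` that swallows the piped prompt and prints a marker, so
# loop.sh's mode selection and iteration counting can be exercised for free.
mkdir -p stub-bin
cat > stub-bin/claude <<'EOF'
#!/bin/bash
cat >/dev/null                    # discard the prompt arriving on stdin
echo "stub claude: one iteration"
EOF
chmod +x stub-bin/claude

# Then, from your project directory:
#   PATH="$PWD/stub-bin:$PATH" ./loop.sh 2
# should print the marker twice and stop at the iteration limit.
```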
Sandboxing (Non-Negotiable for Autonomous Runs)
The --dangerously-skip-permissions flag does exactly what it says. It bypasses all safety prompts. Without sandboxing, a prompt injection attack could access your SSH keys, browser cookies, and everything else on your machine.
Geoffrey Huntley's philosophy: "It's not if it gets popped, it's when. And what is the blast radius?"
Your options, roughly in order of difficulty:
Option 1: Docker (Local)
Create a Dockerfile:
FROM ubuntu:24.04

RUN apt-get update && apt-get install -y \
        curl git nodejs npm python3 \
    && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://claude.ai/install.sh | bash

WORKDIR /workspace
CMD ["bash"]
Run Claude inside:
docker build -t ralph-sandbox .
docker run -it --rm \
    -v "$(pwd)":/workspace \
    -v ~/.claude:/root/.claude \
    ralph-sandbox bash
Then run ./loop.sh from inside the container.
Option 2: DevContainer (VS Code/Cursor)
Anthropic provides an official devcontainer spec. Create .devcontainer/devcontainer.json:
{
  "name": "Ralph Sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/node:1": {}
  },
  "postCreateCommand": "curl -fsSL https://claude.ai/install.sh | bash",
  "remoteUser": "vscode"
}
VS Code will prompt to reopen in the container.
Option 3: Native Sandboxing (macOS)
Claude Code has built-in sandboxing on macOS. Create .claude/settings.json in your project:
{
  "sandbox": {
    "enabled": true,
    "allowedPaths": ["./"],
    "networkAccess": true
  }
}
This provides filesystem isolation without Docker. Network access remains open for package installations.
Option 4: Remote Sandboxes
For production workloads, consider Fly Sprites, E2B, Modal, or Google Cloud Run. These provide VM-level isolation with persistence between runs. I haven't tested these extensively, so I can't give specific setup instructions.
Whatever you choose, don't run --dangerously-skip-permissions on your host machine with access to sensitive data. The one time you forget is the one time something goes wrong.
Running Your First Loop
With everything in place:
Step 1: Initialize Git
git init
git add .
git commit -m "Initial setup"
Ralph needs git for state persistence between iterations.
Step 2: Run Planning Mode
./loop.sh plan 5
This runs up to 5 planning iterations. Claude will read your specs, analyze existing code, and create IMPLEMENTATION_PLAN.md. Watch the output. If it finishes early (the plan is complete), it will exit and the loop will restart, but the second iteration should find nothing to do and exit quickly.
Step 3: Review the Plan
Open IMPLEMENTATION_PLAN.md. Does it make sense? Are the priorities reasonable? If not, delete it and adjust your specs or PROMPT_plan.md, then run planning again. The plan is disposable.
Step 4: Run Build Mode
./loop.sh 20
This runs up to 20 build iterations. Each iteration picks the highest-priority task from the plan, implements it, runs tests, updates the plan, and commits.
Watch the first few iterations closely. Where does Ralph struggle? What assumptions does it make incorrectly?
Observing and Tuning
The first runs rarely go smoothly. Common failure patterns:
Ralph implements something that already exists
Add this to your build prompt: "Before making changes, search the codebase (don't assume not implemented)." This is already in the template above, but you might need to emphasize it.
Ralph gets stuck in a loop fixing the same test
Your backpressure might be too strict, or there's a genuine bug it can't figure out. Press Ctrl+C, examine the code, and either fix it manually or add a hint to AGENTS.md about how to handle that case.
Ralph ignores your instructions
The context window might be too full. This happens less with the loop approach since each iteration starts fresh, but if your specs and AGENTS.md are huge, consider trimming them.
Ralph commits broken code
Your backpressure isn't working. Make sure the validation commands in AGENTS.md actually run and fail when they should. Claude only knows tests failed if the command returns a non-zero exit code.
Each failure is information. When you see a pattern, add a guardrail. If Ralph keeps deleting files it shouldn't, add "Never delete files without explicit user confirmation" to the prompt. If it keeps creating duplicate implementations, add "Search before implementing." The prompts evolve through observed failures.
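One guardrail worth wiring into loop.sh itself is a stall watchdog: if an iteration ends without a new commit, count it, and bail after a few in a row. A sketch (STALL_LIMIT and head_before are illustrative names):

```shell
# Track the commit hash across iterations; no new commit means no progress.
head_before() { git rev-parse HEAD 2>/dev/null || echo none; }

STALL_LIMIT=3
stalls=0
# Inside loop.sh's while loop, wrap the claude call like this:
#   before=$(head_before)
#   cat "$PROMPT_FILE" | claude -p ...
#   if [ "$(head_before)" = "$before" ]; then
#       stalls=$((stalls + 1))
#       [ "$stalls" -ge "$STALL_LIMIT" ] && { echo "No commits in $STALL_LIMIT iterations; stopping."; break; }
#   else
#       stalls=0
#   fi
```

This converts "loop runs forever without progress" from something you notice in the morning into something the script catches on its own.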
Troubleshooting
"command not found: claude"
The PATH isn't set. Add Claude's bin directory to your shell config and restart your terminal.
Authentication errors
Run claude interactively once to complete OAuth. The token persists in ~/.claude/.
"Rate limit exceeded"
You've hit the API rate limit. Wait and try again. With Claude Max, limits are generous but not infinite.
Loop runs forever without progress
The completion condition isn't being met. Add a maximum iteration limit (./loop.sh 20) as a safety valve.
Permission denied running loop.sh
Run chmod +x loop.sh to make it executable.
Docker container can't push to git
Mount your SSH keys or set up credential caching inside the container. Or push manually after the loop completes.
Alternative: The Official Plugin
Anthropic released an official Ralph Wiggum plugin for Claude Code. Install it with:
/install-github-plugin anthropics/claude-code plugins/ralph-wiggum
Then use:
/ralph-loop "Your task" --max-iterations 20 --completion-promise "DONE"
The plugin handles the loop internally instead of using bash. It's easier to get started but has some quirks: it doesn't truly reset context the way the bash loop does, and it uses stop hooks that can leave state in unexpected places. Geoffrey Huntley (the technique's creator) recommends the bash approach for the full fresh-context benefit. I've had mixed results with the plugin, so I default to the bash loop.
What's Next
Once your first project is running, you'll want to explore work branches (running separate plans on feature branches), acceptance-driven backpressure (deriving test requirements from specs), and maybe LLM-as-judge for subjective criteria like UI design. The GitHub repository at github.com/ghuntley/how-to-ralph-wiggum covers these enhancements in detail.
The core insight remains simple: feed Claude the same prompt repeatedly, let state persist in files, and tune based on what you observe. Everything else is refinement.
TROUBLESHOOTING
Symptom: Claude exits immediately without doing work
Fix: Check that PROMPT_plan.md or PROMPT_build.md exists and contains valid content. Verify the file path in loop.sh matches your actual filenames.
Symptom: "Error: Cannot find module 'claude'" on some systems
Fix: If you installed via npm, ensure your npm global bin is in PATH. Run npm prefix -g to find the install prefix (the global bin is usually <prefix>/bin; note that npm bin -g was removed in npm 9), then add it to your shell config.
Symptom: Loop continues but nothing is committed
Fix: The tests are failing silently. Run the validation commands from AGENTS.md manually to see the actual errors. Fix them or adjust the tests.
Symptom: IMPLEMENTATION_PLAN.md keeps growing with completed items
Fix: The guardrail for cleanup isn't triggering. Manually clean out completed items, or make the cleanup instruction more prominent in the build prompt.
Symptom: Claude creates the same file multiple times
Fix: Add to your prompt: "Search the codebase before creating new files. If a similar file exists, modify it instead of creating a duplicate."
PROMPT TEMPLATES
For Starting a New Feature
Read IMPLEMENTATION_PLAN.md and begin work on the highest priority incomplete item. Before implementing, search the codebase to understand existing patterns. Implement the minimal solution that passes tests. Commit when tests pass.
For Debugging a Stuck Loop
Review the last 3 commits in git log. Identify what went wrong. Document the issue in IMPLEMENTATION_PLAN.md. Then fix it.
Example output: Claude examines recent commits, finds a circular dependency it introduced, adds a note about the pattern to AGENTS.md, and restructures the code.
FAQ
Q: How much does this cost to run overnight?
A: With Claude Max, you get included usage that's usually sufficient. With API billing, an 8-hour session might cost $20-100 depending on task complexity and model choice. Check your Anthropic dashboard for actual usage.
Q: Can I use this with GPT-4 or other models?
A: The technique works with any AI coding CLI that doesn't cap tool calls. The prompt syntax would need adjustment. Cursor, Codex, and others have their own Ralph implementations.
Q: What happens if Ralph deletes important files?
A: That's what sandboxing and git are for. If something goes wrong, git reset --hard reverts to the last good commit. Don't run this without version control.
Q: Do I need to watch it constantly?
A: Early on, yes. Watch the first 10-20 iterations to understand failure patterns. Once tuned, many people let it run overnight and review in the morning.
Q: How do I know when it's "done"?
A: When IMPLEMENTATION_PLAN.md is empty and all tests pass. Or when you've hit your iteration limit. There's no magic completion detection, though some implementations add a "DONE" marker that the loop watches for.
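A loop-side check for such a marker is only a few lines (RALPH_DONE and STATUS.md are illustrative names; pick your own convention and state it in the prompt):

```shell
# True once Claude has written the completion marker. Call this at the top
# of loop.sh's while loop and `break` when it returns success.
done_marker_present() {
  grep -q "RALPH_DONE" "${1:-STATUS.md}" 2>/dev/null
}

# The matching prompt instruction would read something like:
#   "When IMPLEMENTATION_PLAN.md has no incomplete items and all tests pass,
#    append the single line RALPH_DONE to STATUS.md."
```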
RESOURCES
- Original Ralph Wiggum post: Geoffrey Huntley's foundational article (requires newsletter signup)
- how-to-ralph-wiggum repository: Detailed playbook with prompt templates and enhancements
- Claude Code documentation: Official setup and sandboxing docs
- Claude Code sandboxing guide: Detailed security configuration