How to Run Claude Code in a Loop (Autonomous Coding That Ships While You Sleep)

Terminal running Claude Code in an autonomous Ralph loop with a fresh context per iteration

Feb 14, 2026 - 14 min read - 2900 words

Creator of RalphLoop.sh, founder of PageAI

To run Claude Code in a loop, wrap the headless claude -p command in a script that restarts it with a fresh context every iteration, has it read its next task from disk, and stops on an explicit completion signal instead of running forever. The crude version is a one-line Bash loop. The version you actually want to leave running overnight is ralph.sh, where claude is the default agent and the loop already handles task selection, verification, and a clean exit.

This post shows both. You get the naive loop so you understand the mechanic, then the hackable script that makes the mechanic safe to walk away from. If you want the full setup from an empty repo (install, task list, login, review), follow the guide on how to run a Ralph loop with Claude Code; this post focuses on the loop mechanics.

The fastest answer: two ways to loop Claude Code

There are two honest answers to “how do I run Claude Code in a loop.” One is a Bash one-liner you can paste right now. The other is a script that turns that one-liner into something you can trust unattended.

The naive while loop

Claude Code has a headless mode. claude -p takes a prompt on stdin, runs to completion, and exits. So the smallest possible loop is this:

while :; do cat PROMPT.md | claude -p; done

That is the whole trick the Ralph technique is built on. Geoffrey Huntley popularized it with the line “Ralph is a Bash loop,” and his original writeup is worth reading for the full story. Each pass feeds the same prompt file into a brand new Claude process. Because every process is new, every iteration starts with a clean context window.

The naive loop works, and it is also a footgun. It never stops on its own. It happily reruns finished work. It runs on your laptop with your permissions, so a wrong move touches your real files. There is no verification gate, no log of what changed, and no way to tell the difference between “done” and “stuck.” You will babysit it, which defeats the point.
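If you want to stay with plain Bash a little longer, the two cheapest fixes are an iteration cap and a stop signal. The sketch below assumes PROMPT.md tells the agent to print <promise>COMPLETE</promise> when all work is done, the same completion convention ralph.sh uses. run_loop and the AGENT_CMD override are hypothetical names for illustration. They are not part of Claude Code or ralph.sh.

```bash
# A less naive version of the one-liner: an iteration cap plus a stop signal.
# Assumes PROMPT.md tells the agent to print <promise>COMPLETE</promise>
# once every task is done. AGENT_CMD is a hypothetical override for dry runs.
run_loop() {
  local max="${1:-10}" i output
  for ((i = 1; i <= max; i++)); do
    echo "iteration $i/$max" >&2
    # A new process per pass means a fresh context per pass.
    # AGENT_CMD is unquoted on purpose so "claude -p" splits into command + flag.
    output="$(${AGENT_CMD:-claude -p} < PROMPT.md)"
    printf '%s\n' "$output"
    if [[ "$output" == *"<promise>COMPLETE</promise>"* ]]; then
      return 0
    fi
  done
  return 1  # cap reached without a completion signal
}
```

Source it and call run_loop 20. It returns 0 when the agent signals completion and 1 when the cap runs out. That fixes "never stops on its own," but there is still no task selection, no verification gate, and no sandbox, which is what ralph.sh adds.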

The hackable version: ralph.sh

ralph.sh keeps the exact same core mechanic (restart the agent with a fresh context each pass) and wraps it with the parts the one liner is missing: task selection, verification, logging, a completion check, and sandbox isolation. Claude Code is the default agent, so you do not pass an agent flag at all:

./ralph.sh -n 50

That runs up to 50 iterations with Claude Code. The default is 10 iterations if you pass no count. You can also run exactly one pass to test your setup:

./ralph.sh --once

If you want a different coding CLI instead, it is the same script with a different --agent value. The broader picture of which CLI fits which job lives in the agentic coding CLIs guide, and Codex gets its own walkthrough in the guide on how to run the Codex CLI in a loop.

Setting up Claude Code in a loop

You need three commands to go from nothing to a running loop. Install, authenticate, run.

First, drop Ralph into your project:

npx @pageai/ralph-loop

This adds the ralph.sh script and the .agent/ directory that holds the prompt, the task list, and the logs.

Claude Code runs inside an isolated sandbox, so it needs its own login there. Authenticate once and the session persists for later runs:

./ralph.sh --login

Because claude is the default agent, --login logs into Claude Code without any extra flags. If you ever loop a different agent, you can scope the login with ./ralph.sh --login --agent codex.

Then start the loop with the iteration count you want:

./ralph.sh -n 50

The script picks the highest-priority incomplete task, works it, verifies it, commits, and moves to the next one. You can close the terminal once it is rolling. If you need the agent’s dev server reachable from your host, publish the port with ./ralph.sh --ports, and print the sandbox name with ./ralph.sh --print-name so you know what to attach to later.

What actually happens on each pass:

Find the highest-priority incomplete task in .agent/tasks.json.
Work the steps in .agent/tasks/TASK-{ID}.json.
Run tests, linting, and type checking.
Complete the task, take a screenshot, update task status, and commit.
Repeat until every task passes or the iteration cap is reached.

The prompt that drives all of this lives in .agent/PROMPT.md. The default mode is implementation, and you can swap it for a refactor, review, or test pass. One rule is baked in and matters more than it looks: one task per invocation. The agent finishes exactly one task, commits, and stops. It never batches several tasks into a single pass, which keeps each fresh context focused on a small, verifiable unit of work.

Why a fresh context each iteration matters

A single long Claude Code session degrades. The context window fills with old tool output, half-finished reasoning, and dead ends, and the agent starts losing the plot. This is context rot, and it is the main reason a chat that felt sharp at message ten feels confused at message eighty.

The Ralph loop sidesteps it by throwing the context window away after every iteration. Each new claude -p process boots clean. It does not carry the cruft of the last attempt.

The obvious worry is memory. If the agent forgets everything each pass, how does it make progress? The answer is that the filesystem and git history are the memory layer, not the chat transcript. Progress is durable because it lives in files:

.agent/tasks.json holds the task lookup table, with the current status of each task.
.agent/tasks/TASK-{ID}.json holds the spec for one task.
.agent/logs/LOG.md and .agent/history/ hold what happened on previous iterations.
The git log holds the committed work itself.

A fresh agent reorients by reading those files, not by remembering a conversation. That separation of “thinking” (ephemeral, per iteration) from “state” (durable, on disk) is the whole reason a loop can run for hours without drifting. The deeper version of this idea, and the patterns that keep a run productive across days, are covered in the guide to the best agentic CLI for long-running tasks.
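Because the memory layer is just files, you can read the same state a fresh iteration reads. The helper below is a sketch for doing that. The paths come from the layout above. The function name ralph_status is hypothetical, and how much of each file it shows is an illustrative choice.

```bash
# Print the durable state a fresh iteration reorients from.
# Paths follow the .agent/ layout described above; the function name and
# how much of each file to show are illustrative choices.
ralph_status() {
  local dir="${1:-.}"
  echo "== .agent/tasks.json =="
  cat "$dir/.agent/tasks.json" 2>/dev/null || echo "(missing)"
  echo "== .agent/logs/LOG.md (last 20 lines) =="
  tail -n 20 "$dir/.agent/logs/LOG.md" 2>/dev/null || echo "(missing)"
  echo "== .agent/history/ (newest 5 entries) =="
  if [[ -d "$dir/.agent/history" ]]; then
    ls -1t "$dir/.agent/history" | head -n 5
  else
    echo "(missing)"
  fi
  echo "== git log (last 10 commits) =="
  git -C "$dir" log --oneline -n 10 2>/dev/null || echo "(no git history)"
}
```

Run it from the project root between iterations, or pass a path to another project. If the output does not tell you where the loop is, it will not tell a fresh agent either.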

How the loop knows when to stop

A loop that cannot stop is a liability. Ralph stops on an explicit completion promise, not on a vibe. The agent emits a semantic status tag, and the script reads it to decide what to do next:

<promise>COMPLETE</promise> means every task is finished.
<promise>BLOCKED:reason</promise> means the agent needs human help.
<promise>DECIDE:question</promise> means the agent needs a decision before it can continue.
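On the script side, detecting a verdict is a string match on the iteration's output. Here is a minimal Bash sketch, assuming the agent prints the tag verbatim somewhere in its output. parse_promise is a hypothetical helper, not ralph.sh's actual parser.

```bash
# Extract the first <promise>...</promise> tag from an iteration's output.
# Prints the tag body (COMPLETE, BLOCKED:reason, DECIDE:question) and
# returns 1 when no tag is present, meaning "no verdict yet, keep looping".
parse_promise() {
  local re='<promise>([^<]*)</promise>'
  [[ "$1" =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# Example, inside an iteration loop: split a verdict into kind and detail.
#   tag="$(parse_promise "$output")" || continue
#   kind="${tag%%:*}"     # COMPLETE, BLOCKED, or DECIDE
#   detail="${tag#*:}"    # the reason or question (equals $tag for COMPLETE)
```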

Those tags map to exit codes, so you can wire the loop into CI or a parent script and branch on the result:

./ralph.sh -n 50
echo "exit: $?"
# 0 COMPLETE
# 1 MAX_ITERATIONS
# 2 BLOCKED
# 3 DECIDE

0 means the work is done. 1 means the loop hit its iteration cap before finishing, which is your cue to inspect progress and run it again. 2 and 3 mean the agent stopped on purpose and wants you. This is the difference between an autonomous loop and a runaway process: it ends with a verdict you can act on.
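If a parent script or CI job needs a readable summary, a case statement over those four codes is enough. The sketch below uses a hypothetical helper name, describe_ralph_exit, and the messages are illustrative.

```bash
# Turn ralph.sh's exit code into a message a person or a CI log can act on.
# The codes are the ones listed above; the wording is illustrative.
describe_ralph_exit() {
  case "$1" in
    0) echo "COMPLETE: every task passed" ;;
    1) echo "MAX_ITERATIONS: cap reached, inspect progress and run again" ;;
    2) echo "BLOCKED: the agent stopped and needs human help" ;;
    3) echo "DECIDE: the agent needs a decision before it continues" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Example CI step: report the verdict, then propagate the exit code so the
# job fails on anything other than COMPLETE.
#   ./ralph.sh -n 50; code=$?
#   describe_ralph_exit "$code"
#   exit "$code"
```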

What happens inside one Claude Code iteration

Here is the lifecycle of a single pass, from a fresh context to a commit or a stop signal.

flowchart TD
    Start["Start iteration"] --> Fresh["Boot claude -p with a fresh context"]
    Fresh --> Read["Read PROMPT.md, tasks.json, logs"]
    Read --> Pick["Pick highest-priority incomplete task"]
    Pick --> Work["Work exactly one task"]
    Work --> Verify{"Tests, lint, typecheck pass?"}
    Verify -->|No| Fix["Fix and re-run checks"]
    Fix --> Verify
    Verify -->|Yes| Commit["Screenshot, update status, commit"]
    Commit --> Promise{"Promise emitted?"}
    Promise -->|COMPLETE| Done["Exit 0"]
    Promise -->|BLOCKED or DECIDE| Stop["Exit 2 or 3"]
    Promise -->|Not yet, cap remaining| Start
    Promise -->|Not yet, cap reached| Max["Exit 1"]

The key edge is the loop back to “Start iteration.” Control returns to a fresh Claude process, not to the same session, which is what keeps context rot from accumulating across passes.

The official Claude Code Stop Hook plugin as an alternative

Anthropic shipped an official Claude Code plugin that achieves a similar effect with a Stop Hook. The hook fires when Claude finishes a turn and re-injects the prompt, which keeps the agent working without you retyping anything. If you want to stay entirely inside Claude Code’s own tooling, that is a reasonable path, and the mechanics are documented in the Claude Code docs.

The tradeoff is control. The Stop Hook lives inside one Claude Code process and one configuration surface. ralph.sh is a plain script you own, so the task selection, the verification gate, the sandbox boundary, and the completion logic are all readable, editable shell you can fork. It also runs any of the supported agents (claude, codex, copilot, cursor, gemini, opencode) behind the same interface, so the loop you build around Claude Code is not locked to Claude Code. Pick the hook if you want minimal moving parts inside Claude. Pick the script if you want the loop to be hackable and agent-agnostic.

Running Claude Code in a loop safely

Now the part most people skip. An autonomous Claude Code loop will run commands you did not preview. If it runs on your laptop with your permissions, it can read your SSH keys, your environment variables, and your credentials, and it can delete things. The fix is not to make the agent more cautious. The fix is to change the blast radius.

Ralph runs each agent inside a Docker Sandbox, an isolated microVM managed by the sbx CLI. Inside that boundary, Claude Code runs in bypass-permissions mode, which is the only way a loop runs unattended without stopping to ask for approval on every command:

claude -p --dangerously-skip-permissions
# or, equivalently
claude -p --permission-mode bypassPermissions

That flag is terrifying on a real machine and fine inside a sandbox, because the sandbox is the boundary the agent cannot cross. The microVM gets a deterministic name in the form ralph-<agent>-<current-dir>-<hash8>, so you can find it and inspect it:

sbx ls
sbx exec -it ralph-claude-myapp-1a2b3c4d bash
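The point of a deterministic name is that the same project directory always maps to the same sandbox, so you can find it again across runs. The sketch below shows one way to get that property. It assumes <hash8> is the first 8 hex characters of a SHA-256 of the absolute project path. ralph.sh may derive it differently, so treat ./ralph.sh --print-name as the source of truth. sandbox_name is a hypothetical helper.

```bash
# Illustrate how a deterministic per-project sandbox name can be derived.
# ASSUMPTION: <hash8> = first 8 hex chars of SHA-256(absolute project path).
# ralph.sh may compute it differently; use ./ralph.sh --print-name for real.
sandbox_name() {
  local agent="${1:-claude}" dir="${2:-$PWD}" hash
  # sha256sum on Linux, shasum -a 256 on macOS.
  hash="$(printf '%s' "$dir" | { sha256sum 2>/dev/null || shasum -a 256; } | cut -c1-8)"
  printf 'ralph-%s-%s-%s\n' "$agent" "$(basename "$dir")" "$hash"
}
```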

Network access is denied by default inside the sandbox. The agent gets no outbound network until you allow specific domains, which limits what a confused or compromised run can reach or exfiltrate:

sbx policy allow network ralph-claude-myapp-1a2b3c4d registry.npmjs.org
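Keeping the allowlist in a script makes it reviewable and repeatable across sandboxes. This sketch loops the sbx command shown above over a list of domains. allow_domains and the SBX override are hypothetical; point SBX at echo for a dry run. The domains in the usage note are placeholders for whatever your project actually needs.

```bash
# Allow an explicit, short list of domains for one sandbox.
# Uses the sbx syntax shown above; set SBX=echo for a dry run.
allow_domains() {
  local sandbox="$1" domain
  shift
  for domain in "$@"; do
    # Stop at the first failure so the error is not buried under later calls.
    ${SBX:-sbx} policy allow network "$sandbox" "$domain" || return
  done
}
```

For example: allow_domains ralph-claude-myapp-1a2b3c4d registry.npmjs.org github.com. Keep the list short, because every entry widens the blast radius the sandbox exists to shrink.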

This is the rule that makes YOLO mode reasonable: contain first, then let the agent move fast inside the container. The full treatment of sandboxing autonomous agents, including how the microVM compares to a hand-rolled container, lives in the guide on how to run AI coding agents in Docker sandboxes safely.

Verifying every iteration with tests, lint, and typecheck

A loop without verification is just a faster way to produce broken code. The repo’s mantra is blunt: if you didn’t test it, it doesn’t work. So every Ralph iteration runs the checks before it commits, and a failed check sends the agent back to fix the work rather than forward to the next task.

The verification stack the loop assumes:

Vitest for unit tests.
Playwright for end-to-end tests.
TypeScript for type checking.
ESLint for linting.
Prettier for formatting.

A typical project wires these into npm scripts the agent calls during step 3 of each iteration:

{
  "scripts": {
    "test": "vitest run",
    "test:e2e": "playwright test",
    "typecheck": "tsc --noEmit",
    "lint": "eslint .",
    "format": "prettier --check ."
  }
}
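Here is a sketch of the gate itself: run each script in turn and stop at the first failure, so the agent gets one clear error to fix. The script names match the package.json above. The order (static checks first, end-to-end tests last) and the run_script seam are assumptions, not necessarily what ralph.sh or its prompt does.

```bash
# Verification gate: run each check in order, stop at the first failure.
# Script names match the package.json above; the order is an assumption.
CHECKS=(typecheck lint format test test:e2e)

# Thin seam around npm so the gate can be exercised without a real project.
run_script() {
  npm run --silent "$1"
}

verify() {
  local check
  for check in "${CHECKS[@]}"; do
    echo "==> $check" >&2
    if ! run_script "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "all checks passed" >&2
}
```

Failing fast on tsc or ESLint keeps a broken iteration from spending time in Playwright before it learns something a quick static check would have caught.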

The screenshot step matters too. For UI work, a passing test suite is necessary but not sufficient, so the agent captures a screenshot and uses it as visual evidence that the change rendered. That combination, automated checks plus a screenshot, is the feedback an autonomous agent needs to know whether the last task actually landed. Verification is also what stops a loop from thrashing: a task that cannot pass its checks gets surfaced as a BLOCKED promise instead of being silently committed.

Passing model and agent flags

Claude Code is the default, so the common case needs no agent flag. When you do want to tune the run, agent-specific flags go after a -- separator so the script knows they belong to the agent and not to Ralph:

./ralph.sh -- --model claude-opus-4-5

If you are comparing agents, swapping is a one word change. The same loop runs Codex with ./ralph.sh --agent codex or Cursor with ./ralph.sh -a cursor -n 5, and you pass each agent’s own flags after the --. That uniform interface is the reason you can treat the choice of CLI as a variable rather than a rewrite.
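The -- convention is a common shell pattern: the wrapper consumes its own options and forwards everything after the separator untouched. Below is a generic sketch of that split, not ralph.sh's actual argument parser. split_args, WRAPPER_ARGS, and AGENT_ARGS are hypothetical names.

```bash
# Split a wrapper's own arguments from the ones meant for the agent CLI.
# Everything after the first `--` is forwarded untouched.
# Fills the (hypothetical) globals WRAPPER_ARGS and AGENT_ARGS.
split_args() {
  WRAPPER_ARGS=()
  AGENT_ARGS=()
  while (($#)); do
    if [[ "$1" == "--" ]]; then
      shift
      AGENT_ARGS=("$@")
      return 0
    fi
    WRAPPER_ARGS+=("$1")
    shift
  done
}

# Example:
#   split_args -n 5 -- --model claude-opus-4-5
#   # WRAPPER_ARGS=(-n 5)  AGENT_ARGS=(--model claude-opus-4-5)
#   claude -p "${AGENT_ARGS[@]}" < PROMPT.md
```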

Steering a run mid-flight is also possible without killing momentum. Edit .agent/STEERING.md while the loop is running, and the agent reads it and handles the injected work before resuming its task list. That is how you correct course on a long run without stopping and restarting from scratch.

Putting it together

The mechanic is a one line Bash loop that restarts Claude Code with a fresh context. Everything else (task selection from .agent/tasks.json, the completion promise, the exit codes, the verification gate, the sandbox) exists to make that mechanic safe to leave alone. Install with npx @pageai/ralph-loop, log in once with ./ralph.sh --login, and run ./ralph.sh -n 50. Claude Code is the default, so you point it at a real task list, sandbox it, and check the diff and the screenshots in the morning.

Frequently asked questions

How do I run Claude Code in a loop with a single command?

Install Ralph with npx @pageai/ralph-loop, authenticate once with ./ralph.sh --login, then run ./ralph.sh -n 50. Claude Code is the default agent, so you do not pass an agent flag. The script restarts Claude with a fresh context each iteration, works one task, verifies it, commits, and stops on a completion promise.

Why does each iteration start with a fresh context?

A single long session accumulates old tool output and dead ends until the agent loses the plot, which is called context rot. Booting a new claude process each iteration throws that cruft away. Progress is preserved because the filesystem and git history act as the memory layer, not the chat transcript.

Is it safe to run Claude Code with --dangerously-skip-permissions?

It is unsafe on your laptop and reasonable inside a sandbox. Ralph runs the agent in an isolated Docker Sandbox microVM with network denied by default, so bypass-permissions mode lets the agent move fast without being able to touch your real files or exfiltrate data. The sandbox is the boundary, not the agent.

How does the loop know when to stop?

The agent emits a promise tag that the script reads. COMPLETE means all tasks are done and the loop exits with code 0, BLOCKED exits 2 when the agent needs help, and DECIDE exits 3 when it needs a decision. If the iteration cap is reached first, the script exits 1 (MAX_ITERATIONS). The loop ends with a verdict you can act on instead of running forever.

What is the difference between ralph.sh and the official Claude Code Stop Hook plugin?

Both keep the agent working without you re-prompting it. The Stop Hook lives inside one Claude Code process and re-injects the prompt when a turn ends. ralph.sh is a plain script you own that restarts the agent with a fresh context each iteration and adds task selection, a verification gate, a sandbox boundary, and explicit completion logic, and it runs any supported agent behind the same interface.

Run your own Ralph loop

Ralph is a hackable script you point at your project. Install it and let an agent work through your task list.

npx @pageai/ralph-loop
