Ralph Loop vs One-Shot Prompting: Why Iteration Beats a Single Prompt
One-shot prompting asks an agent to solve the entire task in a single response. It works for small, well-scoped edits. It stalls the moment the task needs more than one pass: a multi-file refactor, a feature with tests, anything where the first draft is wrong in ways the model cannot see from inside the same context window.
A Ralph loop runs the agent again and again. Each iteration starts with a clean context, completes one task, checks its own output against tests, and commits before moving on. The short version: one-shot is a single guess, a loop is a feedback system. This post compares the two head to head and shows when each one actually wins.
If you want the wider background first, read what the Ralph technique is and come back. The loop pattern was popularized by Geoffrey Huntley, who described Ralph as a Bash loop that builds software while you sleep.
What is the difference between a Ralph loop and one-shot prompting?
One-shot prompting is the default mode of a chat box. You write a prompt, the model produces output, and you accept or reject it. There is no second look. If the answer is wrong, you are the feedback loop: you read the diff, spot the bug, and re-prompt.
A Ralph loop puts the feedback loop inside the machine. The ralph.sh script invokes an agent, waits for it to finish one unit of work, runs the verification stack, and then invokes the agent again. The agent reads the current state from disk every iteration, so it sees what the previous run actually produced rather than what it intended to produce.
# One iteration: a single guess, then stop.
./ralph.sh --once

# Up to fifty iterations: a feedback system that keeps going.
./ralph.sh -n 50

The default is ten iterations. The flag that changes everything is the count. With one iteration you get one-shot behavior wrapped in a script. With fifty you get a system that can see its own output, notice a failing test, and try again.
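The invoke, verify, repeat cycle can be sketched in a few lines of shell. This is a minimal illustration, not the actual ralph.sh: `run_loop`, `agent_cmd`, and `verify_cmd` are hypothetical names, and the sketch simplifies by stopping at the first passing verification instead of working through a task list.

```shell
# Minimal sketch of an iterate-and-verify loop (NOT the real ralph.sh).
# Usage: run_loop MAX_ITERATIONS AGENT_CMD VERIFY_CMD
#   AGENT_CMD  - hypothetical command that performs one unit of work
#   VERIFY_CMD - hypothetical command that exits 0 when the checks pass
run_loop() {
  max=$1
  agent_cmd=$2
  verify_cmd=$3
  i=1
  while [ "$i" -le "$max" ]; do
    echo "iteration $i"
    # Each invocation is a new process, so nothing carries over from the
    # previous iteration except what was written to disk.
    sh -c "$agent_cmd" || true
    # The verification result, not the agent's confidence, decides
    # whether the loop is finished.
    if sh -c "$verify_cmd"; then
      echo "verified after $i iteration(s)"
      return 0
    fi
    i=$((i + 1))
  done
  echo "gave up after $max iteration(s)"
  return 1
}
```

Calling `run_loop 1 ...` reproduces one-shot behavior. Raising the first argument is what turns the same two commands into a feedback system.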
Where does one-shot prompting break down?
One-shot prompting fails in predictable places. None of these are model intelligence problems. They are structural problems with asking for everything at once.
Multi-step work that depends on its own output
Real tasks have ordering. You cannot write the migration until the schema is decided, and you cannot write the tests until the function signature exists. In a single prompt the model has to imagine all of those steps at once and get every one right with no chance to react to the previous step. The longer the chain, the lower the odds that the whole thing lands.
A loop turns one giant guess into a sequence of small, verified steps. Each iteration handles one task from .agent/tasks.json, runs the steps in .agent/tasks/TASK-{ID}.json, and commits. The next iteration builds on a commit that already passed tests, not on a hopeful sketch.
Large refactors across many files
Ask a model to rename a concept across forty files in one shot and you will get a confident response that touches eight of them, misses the imports in twelve more, and breaks the build. The model runs out of attention budget before it runs out of files. It also cannot run the type checker mid-response to find what it missed.
The loop handles this by breaking the refactor into atomic tasks and verifying after each one. TypeScript and ESLint run every iteration, so a missed import surfaces as a failure that the next iteration fixes instead of a silent bug you find a week later.
The agent cannot grade its own first draft
This is the core limitation. Inside a single response, the model has no ground truth. It cannot execute the code, read the test output, or open the screenshot. It is predicting what correct code looks like, not confirming that the code is correct. One-shot prompting asks for an answer and gives the model no way to check it.
flowchart TD
subgraph OneShot["One-shot prompting"]
A["Write one large prompt"] --> B["Agent generates output"]
B --> C{"Correct?"}
C -->|"No"| D["You debug it by hand"]
D --> A
C -->|"Yes"| E["Done"]
end
subgraph Loop["Ralph loop"]
F["Pick highest-priority task"] --> G["Agent works one task"]
G --> H["Run tests, lint, types"]
H --> I{"Pass?"}
I -->|"No"| J["Fresh context, next iteration retries"]
J --> F
I -->|"Yes"| K["Commit and update status"]
K --> L{"All tasks done?"}
L -->|"No"| F
L -->|"Yes"| M["Emit COMPLETE promise"]
end
The diagram makes the asymmetry obvious. In one-shot prompting the only correction path runs through you. In a loop the correction path is automated, and the agent only stops when a real signal says it is done.
How does the loop actually differ under the hood?
Three properties separate a Ralph loop from a clever one-shot prompt: fresh context, a commit per iteration, and a completion promise.
Fresh context per iteration
Each iteration starts the agent with a clean context window. This is deliberate. Long sessions suffer from context rot, where the agent drifts, repeats itself, and loses the plot as the transcript fills with old output. Because the loop resets every iteration, the agent always works from a short, relevant context rather than a bloated one.
The cost of resetting is memory, and the loop solves that by putting memory on disk. The filesystem and git history are the memory layer. Progress lives in .agent/tasks.json, .agent/logs/LOG.md, the per-task spec files, and the git log, not in a chat transcript that the next iteration would have to re-read. A fresh agent reorients in seconds by reading state, then gets to work.
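Reading state from disk at the start of an iteration might look like the sketch below. The file paths come from the list above; `print_orientation` is a hypothetical helper, and the choice of 20 log lines and 5 commits is an arbitrary assumption rather than anything the real script is known to do.

```shell
# Sketch: gather the on-disk state a fresh agent would read to reorient.
# Usage: print_orientation [PROJECT_ROOT]
print_orientation() {
  root=${1:-.}

  echo "== Task list =="
  if [ -f "$root/.agent/tasks.json" ]; then
    cat "$root/.agent/tasks.json"
  else
    echo "(no tasks file)"
  fi

  echo "== Recent log entries =="
  if [ -f "$root/.agent/logs/LOG.md" ]; then
    # Only the tail: the point is a short, relevant context.
    tail -n 20 "$root/.agent/logs/LOG.md"
  else
    echo "(no log file)"
  fi

  echo "== Recent commits =="
  # git may be missing, or ROOT may not be a repository; both count as
  # "no history" here.
  git -C "$root" log --oneline -n 5 2>/dev/null || echo "(no git history)"
}
```

Everything printed here survives a context reset because none of it lives in the transcript.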
A commit per iteration
One-shot prompting produces a single diff that you keep or throw away whole. A loop commits after every completed task. That gives you a clean, bisectable history where each commit passed verification when it landed. If iteration thirty-one introduced a regression, git log and git bisect find it. You are not staring at one enormous diff trying to guess which line broke.
The discipline behind this is one task per invocation. The agent completes exactly one task, commits, and stops. It never batches multiple tasks into a single commit, which keeps each step small enough to verify and revert.
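The one-task, one-commit discipline can be sketched as a small helper. `commit_task` and the `TASK-ID: summary` message format are hypothetical, chosen for illustration; the real loop may format its commits differently.

```shell
# Sketch: commit exactly one finished task as one commit.
# Usage: commit_task TASK_ID SUMMARY   (run inside a git repository)
commit_task() {
  task_id=$1
  summary=$2
  git add -A || return 1
  # Refuse to create an empty commit: no staged changes means the
  # iteration produced nothing to record.
  if git diff --cached --quiet; then
    echo "nothing to commit for $task_id" >&2
    return 1
  fi
  # Hypothetical message format: the task ID first, so history reads as a
  # list of completed tasks.
  git commit -q -m "$task_id: $summary"
}
```

With one verified commit per task, `git bisect start`, `git bisect bad`, `git bisect good <commit>`, and `git bisect run <test command>` can narrow a regression down to a single task.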
A completion promise instead of a vibe
A one-shot prompt ends when the model stops typing. A loop ends on an explicit signal. The agent emits a completion promise, a semantic status tag that tells the script what happened.
<promise>COMPLETE</promise> all tasks finished
<promise>BLOCKED:reason</promise> needs human help
<promise>DECIDE:question</promise> needs a decision

Those tags map to exit codes, so the loop is scriptable in CI or a cron job:

./ralph.sh -n 50
echo $? # 0 COMPLETE, 1 MAX_ITERATIONS, 2 BLOCKED, 3 DECIDE

This is the part one-shot prompting cannot replicate. A single response has no notion of “not done yet, keep going.” The loop does, and it knows the difference between finished, stuck, and waiting on you.
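One way a script could turn promise tags into those exit codes is sketched below. The tag strings and code numbers are the ones listed above. The function names and the `CONTINUE` label for "no terminal tag yet" are assumptions, and the real script's parsing may differ.

```shell
# Sketch: classify one iteration's output by its promise tag.
# Usage: promise_status "AGENT OUTPUT TEXT"
promise_status() {
  case $1 in
    *"<promise>COMPLETE</promise>"*)    echo COMPLETE ;;
    *"<promise>BLOCKED:"*"</promise>"*) echo BLOCKED ;;
    *"<promise>DECIDE:"*"</promise>"*)  echo DECIDE ;;
    # Hypothetical label: no terminal signal, so the loop keeps going.
    *)                                  echo CONTINUE ;;
  esac
}

# Sketch: map a final status to the exit codes listed above.
# Usage: status_exit_code STATUS
status_exit_code() {
  case $1 in
    COMPLETE)       echo 0 ;;
    MAX_ITERATIONS) echo 1 ;;
    BLOCKED)        echo 2 ;;
    DECIDE)         echo 3 ;;
    *) echo "unknown status: $1" >&2; return 64 ;;
  esac
}
```

Keeping classification separate from the exit code leaves room for the one state a single response never has: not finished, not stuck, just not done yet.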
What are the cost and latency tradeoffs?
Iteration is not free. A loop that runs the agent fifty times costs roughly fifty times the tokens of a single run, plus the time to run tests and linting each pass. One-shot prompting is cheaper and faster by definition because it runs once.
So the tradeoff is concrete. One-shot trades reliability for speed and cost. A loop trades speed and cost for reliability and the ability to finish hard work unattended. The right call depends on the task.
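The token ceiling is simple multiplication, sketched here. The figure of 20000 tokens per agent run is a made-up number for illustration, and `worst_case_tokens` is a hypothetical helper; real runs vary per iteration and can stop before the cap.

```shell
# Sketch: upper bound on tokens for a capped run.
# Usage: worst_case_tokens ITERATIONS TOKENS_PER_ITERATION
worst_case_tokens() {
  echo $(( $1 * $2 ))
}

# Hypothetical figure: 20000 tokens per agent run.
worst_case_tokens 1 20000    # one-shot:            20000
worst_case_tokens 10 20000   # default ten:         200000
worst_case_tokens 50 20000   # ceiling with -n 50:  1000000
```

The bound is a ceiling, not a forecast: a loop that finishes its tasks on iteration twelve pays for twelve runs, not fifty.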
One-shot prompting is the right tool when:
- The change is small and local, like a single function, a config tweak, or a one-file bug fix.
- You can eyeball the diff and verify it yourself in seconds.
- You want an answer now and you are sitting at the keyboard to catch mistakes.
- The task has no real verification stack, so a loop would have nothing to gate on.
A loop earns its cost when:
- The work spans many files or many steps that depend on each other.
- There are tests, types, and lint rules that can decide pass or fail without you.
- You want to start the run and walk away, then review a clean commit history later.
- The first draft being wrong is likely, and hand-debugging it would cost more than the extra tokens.
You also control the ceiling. ./ralph.sh --max-iterations 5 caps a run so a loop cannot thrash forever, and the completion promise stops it early when the work is genuinely done. Cost control is a dial, not a gamble. If a loop is misbehaving, that usually points at one of the known Ralph loop failure modes such as a badly scoped task or a missing test gate, and the fix is structural rather than throwing more iterations at it.
Why is verification the real differentiator?
Strip away the script and the prompt files and one thing remains as the actual reason a loop beats a single prompt: tests gate progress.
The repo runs a full verification stack every iteration: Playwright for end-to-end tests, Vitest for unit tests, TypeScript for types, ESLint for lint, and Prettier for formatting. The mantra is blunt: if you didn’t test it, it doesn’t work. An iteration that breaks a test does not advance. The agent sees the failure, and the next fresh-context iteration starts from a state that includes that failing signal.
This is what one-shot prompting structurally lacks. A single response cannot run Vitest, read the red output, and react. It produces code that looks correct and hands it to you unverified. The loop closes that gap by making the test suite the judge instead of the model’s own confidence. For the full pattern, see verification loops for AI agents, which covers how tests, type checks, and screenshots give an agent the feedback it needs to self-correct.
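A gate of this kind can be sketched as a function that runs each check in order and stops at the first failure. `run_gates` is a hypothetical name. The commands in the trailing comment are the usual CLI entry points for these tools, assumed here rather than taken from the repo's actual configuration.

```shell
# Sketch: run verification commands in order; the first failure blocks progress.
# Usage: run_gates "CMD 1" "CMD 2" ...
run_gates() {
  for gate in "$@"; do
    echo "gate: $gate"
    if ! sh -c "$gate"; then
      # A failing gate means the iteration does not advance.
      echo "FAILED: $gate"
      return 1
    fi
  done
  echo "all gates passed"
}

# Assumed invocation for a TypeScript project (not run here):
#   run_gates "npx tsc --noEmit" "npx eslint ." "npx prettier --check ." \
#             "npx vitest run" "npx playwright test"
```

Ordering cheap checks such as types and lint before slower end-to-end tests is a design choice that keeps failing iterations short.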
Two implications follow from this.
First, a loop is only as good as its verification. Point it at a project with no tests and it degrades toward one-shot quality, because there is nothing to gate on. The investment that makes a loop reliable is the test suite, not the script.
Second, verification is what lets you walk away. You can run ./ralph.sh -n 50 --agent codex overnight and trust the result in the morning because every commit on that branch passed the same gates you would have checked by hand. The tests are doing the reviewing you would otherwise be doing live.
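For an unattended run, the exit code is what you read first in the morning. The sketch below turns it into a one-line summary. `report_run` is a hypothetical helper, and the exit code meanings are the ones documented for the completion promise.

```shell
# Sketch: summarize an unattended run from the loop's exit code.
# Usage: report_run EXIT_CODE
report_run() {
  case $1 in
    0) echo "COMPLETE: all tasks finished, review the commits" ;;
    1) echo "MAX_ITERATIONS: cap reached, work remains" ;;
    2) echo "BLOCKED: the agent needs human help" ;;
    3) echo "DECIDE: the agent is waiting on a decision" ;;
    *) echo "UNKNOWN: unexpected exit code $1" ;;
  esac
}

# Assumed usage after an overnight run (commented out so the sketch runs
# without the script present):
#   ./ralph.sh -n 50 --agent codex; report_run $?
```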
A concrete comparison
Picture a feature: add pagination to a list endpoint, wire it into the UI, and cover it with tests.
One-shot prompting produces one diff. Maybe the API change is right but the UI forgets the loading state, or the tests reference a prop that does not exist. You read all of it, find the gaps, and re-prompt. You are the loop, doing it slowly and by hand.
The Ralph loop breaks the same feature into tasks: change the endpoint, update the client, add the component state, write the tests. Each iteration completes one task, runs the suite, and commits. When a test fails, the next iteration fixes it before touching anything else. You come back to a branch of small green commits instead of one large diff you have to audit line by line.
Same agent, same model. The difference is the loop, the fresh context, and the tests that decide when each step is done.
Frequently asked questions
Is a Ralph loop always better than one-shot prompting?
No. One-shot prompting is faster and cheaper, and it is the right choice for small local changes you can verify yourself in seconds. A loop earns its extra cost on multi-step work, large refactors, and any task with a real test suite that can gate progress without you watching.
Does running fifty iterations cost fifty times as much?
Roughly, yes. A loop runs the agent once per iteration, so token cost scales with the iteration count, and each pass also adds the time to run tests. You cap the ceiling with flags like --max-iterations 5 or -n 50, and the completion promise stops the loop early when all tasks pass, so you rarely pay for the full ceiling.
Why does each iteration start with a fresh context?
A clean context window avoids context rot, where a long session drifts and the agent loses the plot. Resetting every iteration keeps the working context short and relevant. Memory is not lost because it lives on disk in tasks.json, the logs, the task specs, and the git history, which a fresh agent reads to reorient.
What actually makes the loop more reliable than a single prompt?
Verification. The loop runs Playwright, Vitest, TypeScript, ESLint, and Prettier every iteration, so a failing test blocks progress and the next iteration fixes it. A single response cannot execute code or read test output, so it hands you unverified work. A loop on a project with no tests degrades toward one-shot quality.
How does the loop know when to stop instead of looping forever?
It stops on an explicit completion promise, not a vibe. The agent emits COMPLETE when all tasks pass, BLOCKED when it needs human help, or DECIDE when it needs a decision. Those map to exit codes 0, 2, and 3, while exit code 1 means it hit the max iteration cap.