RALPH LOOP

Completion Promises and Exit Codes: How a Ralph Loop Knows When to Stop


A Ralph loop stops on an explicit signal, not a guess. On each iteration the agent prints a promise tag, a short machine-readable status, and the loop reads it to decide whether to run again, stop, or hand control back to you. When the loop exits, ralph.sh returns a numeric exit code that tells you and any surrounding automation exactly why it stopped. No “looks done”, no agent declaring victory on a feeling. A signal the script can match, and a code your shell can branch on.

This matters because the alternative is an agent that loops forever, or one that quits the moment it gets tired of the task. The completion promise is the part of the Ralph technique that turns an open-ended loop into a finite, scriptable job.

ralph.sh watches the agent’s output for a <promise> tag and reacts to it. The agent can emit three control-flow signals, and the script can return four primary exit codes. That is the whole stop mechanism.

The agent never decides when to terminate the process. It only reports status. The loop owns the decision to continue or halt, which keeps control in the script you can read and edit rather than buried inside the model.

A promise tag is a semantic status the agent writes to its output. The format is fixed so the loop can pattern-match it reliably:

<promise>TYPE:content</promise>

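The loop scans both the raw agent output and the final summary for these tags after every iteration. Three of them change control flow: COMPLETE, which carries no content, and BLOCKED and DECIDE, which put a reason or a question after the colon.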

COMPLETE means every task is finished. The agent has worked through the tasks in .agent/tasks.json, verified each one, committed, and found nothing left to do. When the loop sees COMPLETE, it prints a success banner and exits cleanly with code 0.

<promise>COMPLETE</promise>

COMPLETE is the happy path. It is also the only tag that should be earned, not asserted. The agent is instructed to emit it only after the task list is empty and the verification gates have passed, which is the difference between real completion and an agent that wants to stop.

BLOCKED means the agent cannot continue without you. Typical causes are a missing credential, an ambiguous external dependency, or a failing service it does not control. The agent attaches a human-readable reason after the colon, and the loop surfaces it instead of silently stalling.

<promise>BLOCKED:Missing API credentials for the payments service</promise>

When the loop detects a BLOCKED tag, it extracts the reason, plays a notification, prints the blocked message with the iteration number, and exits with code 2. You read the reason, fix the thing, and run the loop again. The point is that a blocked agent tells you why it is blocked rather than thrashing on a task it can never finish.

DECIDE means the agent has hit a real decision point and wants your call before it commits to a direction. Not a blocker, a fork: two valid architectures, a naming convention that will ripple across the codebase, a tradeoff the spec did not pin down.

<promise>DECIDE:Should the new endpoint use REST or GraphQL?</promise>

The loop extracts the question, notifies you, and exits with code 3. You answer the question (usually by updating the task spec or .agent/STEERING.md), then restart the loop so a fresh agent picks up with the decision settled.
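The extraction step for BLOCKED and DECIDE is simple enough to sketch. This is a hypothetical illustration, not the code in ralph.sh: extract_promise is an invented name, and the sketch assumes each tag sits on a single line and that the content contains no < character.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the content of a <promise>TYPE:content</promise>
# tag of the given TYPE, or return 1 if no such tag (or no content) is found.
extract_promise() {
  local content
  # $1 = tag type (e.g. BLOCKED), $2 = agent output text
  content="$(printf '%s\n' "$2" \
    | sed -n "s#.*<promise>$1:\([^<]*\)</promise>.*#\1#p" \
    | head -n 1)"
  [ -n "$content" ] || return 1
  printf '%s\n' "$content"
}

# Sample agent output, made up for this example.
output='Tried to call the payments API.
<promise>BLOCKED:Missing API credentials for the payments service</promise>'

if reason="$(extract_promise BLOCKED "$output")"; then
  echo "Blocked: $reason"
fi
```

The same call with DECIDE as the first argument would pull out the question instead of the reason.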

There is a fourth tag worth knowing, even though it does not stop the loop. <promise>TASK-{ID}:DONE</promise> reports that a specific task finished during the current iteration. The loop collects these for progress display (the Tasks: TASK-1, TASK-2 line you see after each pass) but keeps running. It is bookkeeping, not a stop signal.
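A sketch of how that progress line could be assembled from one iteration's output. The function name collect_done_tasks and the assumption that task IDs are made of letters, digits, underscores, and hyphens are mine, not taken from the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the tasks reported done in one iteration's output,
# joined as "TASK-1, TASK-2". Prints an empty line if there are none.
collect_done_tasks() {
  # $1 = agent output text
  printf '%s\n' "$1" \
    | grep -oE '<promise>TASK-[A-Za-z0-9_-]+:DONE</promise>' \
    | sed -E 's#^<promise>(.*):DONE</promise>$#\1#' \
    | paste -sd, - \
    | sed 's/,/, /g'
}

# Sample output, made up for this example.
output='<promise>TASK-1:DONE</promise>
ran the unit tests
<promise>TASK-2:DONE</promise>'

echo "Tasks: $(collect_done_tasks "$output")"
```

Nothing here touches control flow, which matches the role of the tag: it is collected and displayed, and the loop carries on.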

Where the agent learns to emit these tags is the prompt file. The instruction to print COMPLETE, BLOCKED, or DECIDE lives in .agent/PROMPT.md, which is why writing a good PROMPT.md is what makes the promise reliable. A prompt that never tells the agent how to signal a blocker gets you an agent that fakes progress instead of asking for help.

Here is the control flow the loop runs after every iteration. The agent works one task, verifies it, commits, and prints its status. The loop reads the output and branches.

flowchart TD
    Start(["Start ralph.sh -n N"]) --> Run["Run iteration i: agent works one task"]
    Run --> Scan["Scan output and final summary for promise tags"]
    Scan --> Complete{"COMPLETE tag?"}
    Complete -->|Yes| Exit0["Exit 0 COMPLETE"]
    Complete -->|No| Blocked{"BLOCKED tag?"}
    Blocked -->|Yes| Exit2["Exit 2 BLOCKED, print reason"]
    Blocked -->|No| Decide{"DECIDE tag?"}
    Decide -->|Yes| Exit3["Exit 3 DECIDE, print question"]
    Decide -->|No| Cap{"i reached max iterations?"}
    Cap -->|Yes| Exit1["Exit 1 MAX_ITERATIONS"]
    Cap -->|No| Next["Increment i, fresh context"]
    Next --> Run

Read it as a priority order. COMPLETE wins first, then BLOCKED, then DECIDE. If none of the three fire and there is still iteration budget left, the loop spawns a fresh agent and runs again. Only when the budget runs out does it stop with the max-iterations code.

When ralph.sh returns, its exit code is the single source of truth for why it stopped. These are defined in scripts/lib/constants.sh:

| Code | Name | Meaning |
| --- | --- | --- |
| 0 | COMPLETE | All tasks finished and verified. |
| 1 | MAX_ITERATIONS | Hit the iteration cap with work pending. |
| 2 | BLOCKED | Agent needs human help. |
| 3 | DECIDE | Agent needs a human decision. |
| 4 | DOCKER_ERROR | The sandbox failed to start or run. |
| 5 | AUTH_ERROR | The agent is not authenticated. |

Codes 0 through 3 map one-to-one onto the loop’s logical outcomes. Codes 4 and 5 are environment failures: the Docker Sandbox did not come up, or the agent CLI is not logged in. Treat 4 and 5 as setup problems to fix, not as something the loop did wrong.

A key point: exit code 1 is not a failure. It means the loop spent its iteration budget and there is still work in the queue, which is exactly what a safety cap is supposed to do. You read the log, decide whether to top up the budget, and run again.

Because the codes are fixed, you can wrap the loop in a script and react to each outcome. A case statement on $? covers every branch:

#!/usr/bin/env bash
./ralph.sh -n 50
code=$?
case "$code" in
  0) echo "All tasks complete. Ship it." ;;
  1) echo "Hit the iteration cap. Topping up and rerunning." && ./ralph.sh -n 50 ;;
  2) echo "Blocked. A human needs to clear something." ;;
  3) echo "Decision needed. Check the question and update the spec." ;;
  4) echo "Sandbox failed to start. Check Docker." ;;
  5) echo "Not authenticated. Run ./ralph.sh --login first." ;;
  *) echo "Unexpected exit code: $code" ;;
esac

This is the difference between an agent loop you babysit and one you can automate. The script tells you whether to walk away, top up the budget, or step in.
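The case statement above reruns the loop once on exit 1 and does not look at the second run's exit code. If you want automatic top-ups, it may be safer to bound them. Here is a minimal sketch of that idea; run_with_topups is a hypothetical wrapper of my own, and it relies only on the exit codes listed in the table.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: rerun a command while it exits 1 (MAX_ITERATIONS),
# at most max_runs times in total. Any other exit code is passed through.
run_with_topups() {
  local max_runs="$1" run=1 code
  shift
  while :; do
    "$@"
    code=$?
    # Only the iteration cap is worth retrying; everything else is final.
    [ "$code" -eq 1 ] || return "$code"
    # Out of top-ups: report the cap to the caller.
    [ "$run" -lt "$max_runs" ] || return 1
    run=$((run + 1))
    echo "Hit the iteration cap. Starting run $run of $max_runs." >&2
  done
}

# Usage (not executed here):
#   run_with_topups 3 ./ralph.sh -n 50
```

With three runs of fifty iterations each, the worst case is still a bounded 150 iterations, and a BLOCKED or DECIDE exit stops the top-ups immediately.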

The same property makes the loop usable in continuous integration or a scheduled job. CI treats exit 0 as success and everything else as failure by default, which is almost right. You usually want to fail the pipeline on a blocker or a decision, succeed on completion, and treat the iteration cap as a soft outcome that needs a human glance.

./ralph.sh -n 100
code=$?
# Complete is a pass. Blocked and Decide are hard failures.
# Max iterations is a soft outcome we flag but do not hard-fail.
if [ "$code" -eq 0 ]; then
  exit 0
elif [ "$code" -eq 1 ]; then
  echo "::warning::Ralph hit max iterations with work pending"
  exit 0
else
  echo "Ralph stopped with code $code"
  exit "$code"
fi

Running an agent unattended in CI only works because each agent runs inside an isolated Docker Sandbox microVM, so the loop can run in bypass-permissions mode without risking the host. The sandbox is the boundary, and the exit code is how the pipeline learns what happened inside it.

Why machine-verifiable completion beats “looks done”


The reason a Ralph loop emits COMPLETE only after tests pass, and not when the agent feels finished, is that agents are unreliable narrators of their own work. An agent will cheerfully tell you a feature is done while the build is red. “Looks done” is a vibe, and a loop driven by vibes either stops too early on broken code or never stops at all.

Machine-verifiable completion replaces the vibe with a gate. Before the agent is allowed to call a task done, it runs the project’s checks: Playwright for end-to-end tests, Vitest for unit tests, TypeScript for types, ESLint for linting, Prettier for formatting. The repo mantra is blunt: if you didn’t test it, it doesn’t work. A task is not done because the agent says so. It is done because the suite is green.

This is why the promise is trustworthy. COMPLETE is downstream of a passing verification stack, not upstream of it. The agent cannot emit a real completion signal for code that fails its own checks, because the gate sits between “I wrote it” and “I am done”. The deeper version of this argument, including how tests and screenshots feed an agent the signal it needs to self-correct, is in verification loops for AI agents.
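The ordering is the important part, and it fits in a few lines of shell. This is a sketch of the principle only: in the Ralph setup the agent runs the checks as instructed by its prompt, while run_gates and emit_if_green below are hypothetical names used to show that the tag is produced after the checks, never before. It also leaves out the other condition for COMPLETE, an empty task list.

```bash
#!/usr/bin/env bash
# Hypothetical gate: run each check command in order, stop at the first failure.
run_gates() {
  local check
  for check in "$@"; do
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "Gate failed: $check" >&2
      return 1
    fi
  done
}

# The completion tag is only printed when every gate passed.
emit_if_green() {
  if run_gates "$@"; then
    echo '<promise>COMPLETE</promise>'
  else
    return 1
  fi
}

# Usage sketch with the tools named above. The exact commands are assumptions
# about a typical project setup, not taken from the repo:
#   emit_if_green "npx tsc --noEmit" "npx eslint ." "npx prettier --check ." \
#                 "npx vitest run" "npx playwright test"
```

A red check means no tag at all, so the loop sees an ordinary unfinished iteration rather than a false completion.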

Tie this back to the tags. COMPLETE is verified work. BLOCKED is honest failure with a reason. DECIDE is honest uncertainty with a question. None of the three is a guess. That is the whole design goal: every way the loop can stop is a signal you can act on, not a state you have to interpret.

How max iterations interacts with the promise


The promise and the iteration cap are two independent stop conditions, and you want both. The promise stops the loop when the agent reaches a logical endpoint. The cap stops the loop when it has run long enough regardless of what the agent thinks.

You set the cap with -n or --max-iterations. The default is ten:

./ralph.sh -n 50
./ralph.sh --max-iterations 5
./ralph.sh --once

--once is the special case of a cap of one. It runs a single iteration and stops, which is how you smoke-test a setup before turning it loose.

Internally the loop is a counted loop, roughly for i in 1..N. On each pass it runs the agent, then checks for COMPLETE, BLOCKED, and DECIDE in that order. If any tag fires, it exits with the matching code immediately, before the cap is relevant. If no tag fires, the loop checks whether i has reached N. If it has, the loop falls out the bottom and exits 1. If not, it increments and runs a fresh agent.
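A stripped-down sketch of that shape, with the agent stubbed out so it runs on its own. ralph_loop_sketch and demo_agent are hypothetical names, and the real script does more (notifications, banners, progress display), but the interaction between tags and the cap is the same as described.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the counted loop. $1 = max iterations,
# $2 = name of a command that runs one agent iteration and prints its output.
ralph_loop_sketch() {
  local max="$1" agent="$2" i=1 output
  while [ "$i" -le "$max" ]; do
    output="$("$agent" "$i")"
    # case tries patterns top to bottom, which gives the priority order:
    # COMPLETE, then BLOCKED, then DECIDE.
    case "$output" in
      *'<promise>COMPLETE</promise>'*) return 0 ;;
      *'<promise>BLOCKED:'*)           return 2 ;;
      *'<promise>DECIDE:'*)            return 3 ;;
    esac
    i=$((i + 1))
  done
  # Fell out the bottom: budget spent, no stop tag seen.
  return 1
}

# Stub agent for the demo: finishes on its third iteration.
demo_agent() {
  if [ "$1" -ge 3 ]; then
    echo '<promise>COMPLETE</promise>'
  else
    echo "iteration $1: still working"
  fi
}

code=0
ralph_loop_sketch 10 demo_agent || code=$?
echo "exit code: $code"   # prints "exit code: 0"
```

Swap the stub for one that never prints a tag and the same call returns 1 after ten iterations, which is the cap doing its job.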

So the two conditions race, and the promise almost always wins on a healthy run. The cap is the backstop for the cases where it does not: an agent stuck repeating a task without finishing it, a task list with a subtle dependency the agent keeps tripping over, a spec that is too vague to ever satisfy. Without a cap, those situations become a money fire that runs until you notice. With a cap, the worst case is a bounded, inspectable run that exits 1 and waits for you.

A practical pattern is to set the cap above your honest estimate of the task count, then read the exit code. Exit 0 means the agent finished inside the budget. Exit 1 means it did not, which is your cue to inspect the log and figure out whether the loop is making progress slowly or thrashing in place. Thrashing against the cap is one of the classic Ralph loop failure modes, and the cap is precisely the guardrail that keeps it from running up an unbounded bill.

One more interaction worth naming. BLOCKED and DECIDE short-circuit the cap entirely. If the agent hits a blocker on iteration three of a fifty-iteration budget, the loop stops at three with code 2. It does not waste the remaining forty-seven iterations rediscovering the same blocker, because a fresh-context agent would just hit the same wall. Stopping early and telling you the reason is the correct behavior.

The completion promise is a small mechanism with a large payoff. Three tags the agent emits, four primary exit codes the script returns, and a strict priority order between them. That is enough to make a loop both safe to leave running and easy to wire into automation.

The flow, end to end:

  1. The agent works one verified task and prints a status tag.
  2. The loop reads the tag and branches: COMPLETE exits 0, BLOCKED exits 2, DECIDE exits 3.
  3. If no tag fires and budget remains, a fresh agent runs again.
  4. If the budget runs out first, the loop exits 1.
  5. Your script or CI reads the exit code and decides what happens next.

The promise is what stops the loop on a signal. Verification is what makes the COMPLETE signal honest. The iteration cap is what stops the loop when no signal comes. Use all three together and you get an agent you can run overnight and trust the exit code in the morning.

Frequently asked questions

What is a completion promise in a Ralph loop?

It is a machine-readable status tag the agent prints to its output, in the format <promise>TYPE:content</promise>. The loop scans for it after every iteration and uses it to decide whether to continue, stop, or hand control back to you. The three control-flow tags are COMPLETE, BLOCKED, and DECIDE.

What are the Ralph loop exit codes?

There are six. 0 means COMPLETE, all tasks finished and verified. 1 means MAX_ITERATIONS, the loop hit its iteration cap with work pending. 2 means BLOCKED, the agent needs human help. 3 means DECIDE, the agent needs a human decision. 4 means a Docker Sandbox error, and 5 means an authentication error. They are defined in scripts/lib/constants.sh.

Is exit code 1 a failure?

No. Exit code 1 means the loop reached its iteration cap while tasks were still pending, which is the safety cap working as designed. You read the log, decide whether to add more iterations, and run the loop again. It is not an error, just an unfinished run.

How does the loop avoid stopping on broken code?

The agent is only allowed to emit COMPLETE after the verification stack passes: Playwright, Vitest, TypeScript, ESLint, and Prettier. Completion is downstream of a green test suite, not upstream of it, so the agent cannot signal done on code that fails its own checks.

What is the difference between BLOCKED and DECIDE?

BLOCKED means the agent cannot continue without external help, like a missing credential or a failing service it does not control. DECIDE means the agent reached a fork it can technically pass but wants your input on, like choosing between two valid architectures. BLOCKED exits with code 2 and DECIDE exits with code 3.