RALPH LOOP

One Task Per Iteration: The Rule That Makes Autonomous Agents Reliable

[Figure: an autonomous agent loop running one task per iteration. The agent picks a task, verifies it, commits, then starts the next iteration with fresh context.]

The most reliable rule for autonomous coding is also the most boring: one task per invocation, commit, then stop. Each iteration the agent picks the single highest-priority incomplete task, works it to done, verifies it, commits, and exits. It never batches two tasks into one run. Break that rule and a loop that should grind cleanly through fifty tasks turns into a branch full of half-finished, hard-to-review work.

This rule is the operational core of a Ralph loop, and it sits one level below the spec-driven development workflow that produces the task list in the first place. The spec decides what the agent builds. This rule decides how it builds it without losing the plot.

The repo’s own AGENTS.md states it as plainly as possible: one task per invocation. When working from .agent/tasks.json, the agent completes exactly one task, commits, and stops. It never batches multiple tasks.

That is the whole rule. Three verbs: complete, commit, stop. The reason it is worth a thousand words is that every instinct (yours and the model’s) pushes the other way. You have fifty tasks and an agent that can clearly do more than one. Letting it power through feels efficient. It is not. It is the fastest way to turn a clean autonomous run into a debugging session.

An iteration that follows the rule produces one self-contained unit of progress: a feature built, its tests passing, its types checked, and a single commit that maps to exactly one task in the list. The next iteration starts from a known-good state. That property (every iteration ends in a verified checkpoint) is what makes long runs survivable.

Batching means telling the agent (or letting it decide) to knock out three or five tasks before it commits. It looks like a throughput win. In practice it degrades the run in three compounding ways.

A coding agent has a finite context window and finite attention inside it. The longer a single session runs, the more it fills with intermediate state: files it opened, tests it ran, dead ends it explored, decisions it half-remembers. This is context rot. The agent starts contradicting earlier choices, re-editing files it already finished, and forgetting which of the five tasks it was actually on.

One task per iteration keeps the working context small and on-topic. The agent reads the summary and the one spec it needs, builds the one thing, and exits before the window gets crowded. Batching does the opposite: it lets the context grow across multiple unrelated tasks, which is precisely the condition under which agents start producing confident nonsense. Context rot is one of the headline Ralph loop failure modes, and batching is the most direct route to it.

When an agent juggles several tasks at once, a failure on task four can leave tasks one through three in a partial state. It edited shared files, started a refactor it did not finish, and now nothing in the batch is cleanly done. You cannot ship any of it, and you cannot easily tell where the good work ends and the broken work begins.

One task per iteration makes failure local. If the agent cannot finish the current task, it emits a <promise>BLOCKED:reason</promise> and stops, and everything committed before that point is intact and verified. You lost one iteration, not five tasks of tangled progress.

A commit that contains one task is reviewable. The diff maps to a single entry in tasks.json, the acceptance criteria for that task tell you what to check, and the tests in the diff prove it. A commit that contains five tasks (or worse, one giant commit for the whole run) is archaeology. You cannot bisect it down to a single task, you cannot revert one piece of it, and six months later you cannot tell why any single change exists.

Clean, atomic commits are not a nicety here. They are the audit trail that makes you trust a diff you did not write. Batching trades that trail for a vague sense of speed you do not actually get, because the review and rework cost lands on you in the morning.

Fresh context per iteration is why one task works

One task per iteration only makes sense because of the other half of the design: every iteration starts the agent with a fresh context window. The loop does not carry chat history forward. It reboots the agent’s understanding from files on disk each time.

This is the central idea of the Ralph technique, popularized by Geoffrey Huntley in his original Ralph writeup. The agent’s memory is not the conversation. It is the filesystem and the git history: .agent/tasks.json, the per-task specs, the logs, and the commits. Because state lives on disk, a fresh agent can reconstruct exactly where the project stands and pick up cleanly.

Fresh context and one-task scope are two sides of the same coin. Fresh context per iteration only helps if the unit of work fits inside one clean window, and one task is the unit sized to do that. Pair them and each iteration is crisp: clean slate in, one verified task out. Try to batch tasks on top of fresh context and you reintroduce the bloat that fresh context was meant to prevent.

```mermaid
flowchart TD
  Fresh["Fresh context window"] --> Read["Read SUMMARY.md and tasks.json"]
  Read --> Pick["Pick highest-priority incomplete task"]
  Pick --> Spec["Open one TASK-{ID}.json"]
  Spec --> Work["Work the steps for that one task"]
  Work --> Gate["Verify: tests, lint, types, screenshot"]
  Gate -->|"fail"| Work
  Gate -->|"pass"| Commit["Set passes true and commit one task"]
  Commit --> Stop["Stop. Iteration ends at a checkpoint"]
  Stop --> Next["Next iteration starts with fresh context"]
  Next --> Fresh
```

Read that diagram as a single lap. The only way out of a lap is a verified commit or an explicit stop signal. There is no path where the agent commits two tasks in one pass, because the lap ends the moment one task is done.

Clean commits are your checkpoints and rollback points

Treat each per-task commit as a save point in a game. When the agent finishes a task and commits, it records a state you can return to. If iteration twelve goes sideways, you do not lose the eleven good iterations before it. You reset to the last clean commit and keep the verified work.

This is why the commit is not optional and not deferred. The loop’s standard iteration ends by updating the task status, taking a screenshot for UI work, and committing in Conventional Commit format. One task, one commit, one checkpoint. The git log becomes a ledger of progress where each line is a unit you can read, verify, and if necessary revert in isolation.
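As a rough illustration of "one task, one commit", the sketch below builds a Conventional Commit message that references a single task. The function name, the `Refs:` footer used to link the task ID, and the sample task are assumptions made for this example; the loop's actual message template is not shown in this article.

```python
def commit_message(task_id: str, summary: str, commit_type: str = "feat") -> str:
    """Build a Conventional Commit message that references exactly one task."""
    if not task_id or not summary:
        raise ValueError("one commit needs one task id and one summary")
    header = f"{commit_type}: {summary}"  # "<type>: <description>" header line
    footer = f"Refs: {task_id}"           # assumed convention for linking the task
    return f"{header}\n\n{footer}"


# Hypothetical task, for illustration only.
print(commit_message("TASK-012", "add login form"))
```

Because the function accepts a single task ID, a message that covers two tasks cannot be produced by accident, which mirrors the one-task scope of the iteration.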

Batching destroys this. A commit that bundles several tasks is a single, coarse save point. Revert it and you lose good work alongside the bad. Bisect it and every “bad” commit contains multiple changes, so you cannot pin the regression to one task. The granularity of your commits is the granularity of your recovery, and one task per commit is the finest useful grain.
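The bisect argument can be made concrete with a small simulation. This is a minimal sketch of the binary search idea that `git bisect` automates, run over two made-up histories. The commit labels and the "good" predicates are hypothetical, and the function assumes the history is good up to some commit and bad from there on.

```python
from typing import Callable


def first_bad_commit(commits: list[str], is_good: Callable[[str], bool]) -> str:
    """Binary-search an ordered history for the first commit that fails is_good.

    Assumes the history is good up to some point and bad from there on,
    and that the newest commit is known to be bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid + 1  # culprit is after mid
        else:
            hi = mid      # culprit is mid or earlier
    return commits[lo]


# Hypothetical histories in which the regression arrives with TASK-005.
atomic = [f"TASK-{n:03d}" for n in range(1, 9)]                 # one task per commit
batched = ["TASK-001..003", "TASK-004..006", "TASK-007..008"]   # several tasks per commit

print(first_bad_commit(atomic, lambda c: c < "TASK-005"))          # TASK-005
print(first_bad_commit(batched, lambda c: c == "TASK-001..003"))   # TASK-004..006
```

With atomic commits the search lands on one task. With batched commits it lands on a bundle of three, and the remaining narrowing has to be done by hand.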

There is a second benefit that matters for long runs. Because every checkpoint is verified before it is committed, the branch is always in a shippable-ish state between iterations. You can stop the loop at any point, after iteration three or after iteration thirty, and what you have is a set of completed, tested tasks rather than a half-built mess. That is what lets you run an agent overnight and trust the morning diff.

The rule is not a suggestion the agent is free to ignore. It is wired into how the loop and the prompt work together.

The loop itself is a Bash script (ralph.sh) that runs the agent once per iteration. Each iteration follows the same shape:

  1. Find the highest-priority incomplete task in .agent/tasks.json.
  2. Work the ordered steps in .agent/tasks/TASK-{ID}.json.
  3. Run tests, linting, and type checking.
  4. Complete the task, take a screenshot, update task status, and commit.
  5. Repeat until all tasks pass or the iteration cap is reached.
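Step 1 of the list above can be sketched in a few lines. The schema here is an assumption: the article confirms a `passes` flag and a notion of priority, but the field names `id` and `priority`, and the rule that a lower number means higher priority, are made up for illustration.

```python
from typing import Optional

# Hypothetical in-memory copy of .agent/tasks.json; the real schema may differ.
# Assumed fields: "id", "priority" (lower number = more urgent), "passes".
TASKS = [
    {"id": "TASK-001", "priority": 1, "passes": True},
    {"id": "TASK-002", "priority": 2, "passes": False},
    {"id": "TASK-003", "priority": 3, "passes": False},
]


def pick_next_task(tasks: list[dict]) -> Optional[dict]:
    """Return the single highest-priority task that has not passed yet, or None."""
    incomplete = [t for t in tasks if not t["passes"]]
    if not incomplete:
        return None
    return min(incomplete, key=lambda t: t["priority"])


next_task = pick_next_task(TASKS)
print(next_task["id"] if next_task else "all tasks pass")  # TASK-002
```

The return type is a single task or nothing. There is no variant that returns a list, which is the point of the rule.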

Step one is where scope gets enforced: the agent selects the highest-priority incomplete task, singular. The prompt sent each iteration tells the agent to complete that one task, commit, and stop rather than continuing to the next. Because the next iteration starts a brand new agent with fresh context, there is no natural way to “keep going” across tasks. The process boundary between iterations is the enforcement mechanism.
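To show why a process boundary enforces the rule, here is a toy driver in which each iteration is a brand new Python process that shares nothing with the previous one except a JSON file on disk. Everything in it is a stand-in: the real loop is the Bash script ralph.sh, the real agent builds and commits code, and the file schema and stopping check below are assumptions made for the sketch.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in "agent": marks exactly one incomplete task as passing, then exits.
# A real agent would build, verify, and commit instead.
AGENT_SCRIPT = """
import json, sys
path = sys.argv[1]
with open(path) as f:
    tasks = json.load(f)
todo = [t for t in tasks if not t["passes"]]
if todo:
    min(todo, key=lambda t: t["priority"])["passes"] = True
    with open(path, "w") as f:
        json.dump(tasks, f)
"""


def run_loop(tasks_path: Path, max_iterations: int = 10) -> int:
    """Run one fresh process per iteration; return how many iterations ran."""
    iterations = 0
    while iterations < max_iterations:
        tasks = json.loads(tasks_path.read_text())
        if all(t["passes"] for t in tasks):
            break  # nothing left to do
        # New process each time: no memory survives except what is on disk.
        subprocess.run(
            [sys.executable, "-c", AGENT_SCRIPT, str(tasks_path)], check=True
        )
        iterations += 1
    return iterations


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tasks.json"
    path.write_text(json.dumps([
        {"id": "TASK-001", "priority": 1, "passes": False},
        {"id": "TASK-002", "priority": 2, "passes": False},
        {"id": "TASK-003", "priority": 3, "passes": False},
    ]))
    print(run_loop(path, max_iterations=2))   # 2: cap reached with one task left
    print(run_loop(path, max_iterations=10))  # 1: finishes the remaining task
```

The second call picks up where the first stopped, even though no variable carried over between the child processes. The file is the memory.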

You control how many iterations run. The default is ten. Use the -n flag to set your own cap, or --once to run a single iteration:

```shell
# Run up to 50 iterations (50 tasks, one per iteration)
./ralph.sh -n 50
# Run exactly one iteration to watch a single task closely
./ralph.sh --once
```

./ralph.sh --once is the cleanest demonstration of the rule. It runs a single iteration: one task, one commit, one stop. Use it the first time you point Ralph at a new task list, watch it complete exactly one task, review the commit, and only then turn it loose with a larger cap.

The loop does not stop on a feeling. It stops on an explicit signal. After the run, the agent emits a promise tag and the script maps it to an exit code:

| Signal | Meaning | Exit code |
| --- | --- | --- |
| <promise>COMPLETE</promise> | all tasks finished | 0 |
| <promise>BLOCKED:reason</promise> | needs human help | 2 |
| <promise>DECIDE:question</promise> | needs a decision | 3 |

If the loop hits its iteration cap before the list is finished, it exits with code 1 (MAX_ITERATIONS). Either way, every task that did complete completed on its own iteration, with its own verified commit. The completion signal is per-run; the one-task discipline is per-iteration.
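The signal-to-exit-code mapping is simple enough to sketch. The exit codes below come from the list above. The regular expression, the function name, and the choice to treat output without a tag as "keep looping" are assumptions, since the article does not show how ralph.sh parses agent output.

```python
import re
from typing import Optional

# Exit codes as listed in the article; the tag grammar below is an assumption.
EXIT_CODES = {"COMPLETE": 0, "BLOCKED": 2, "DECIDE": 3}
MAX_ITERATIONS_EXIT = 1  # used when the cap is hit before the list is finished

PROMISE_RE = re.compile(
    r"<promise>(COMPLETE|BLOCKED|DECIDE)(?::(.*?))?</promise>", re.DOTALL
)


def parse_promise(output: str) -> Optional[tuple[int, str]]:
    """Return (exit_code, detail) for the last promise tag in output, or None."""
    matches = PROMISE_RE.findall(output)
    if not matches:
        return None  # no signal: this sketch assumes the loop simply continues
    kind, detail = matches[-1]
    return EXIT_CODES[kind], detail.strip()


print(parse_promise("tests green\n<promise>COMPLETE</promise>"))  # (0, '')
print(parse_promise("<promise>BLOCKED:missing API key</promise>"))  # (2, 'missing API key')
print(parse_promise("committed TASK-004, more tasks remain"))  # None
```

Taking the last tag in the output is a design choice for the sketch: if an agent quotes a tag while reasoning and then emits a final one, the final one wins.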

What about tasks that are too small or too big?

The obvious objection: if the agent does only one task per iteration, the task had better be the right size. That is true, and it is why scoping happens before the loop ever runs, during breakdown.

If your tasks are too big, the agent runs out of room inside one iteration and you get half-finished work, the exact failure the rule was meant to prevent. The fix is not to relax the rule. It is to cut the task smaller. A good task is something an agent can finish in one short sitting, with its own acceptance criteria and its own verification. Getting tasks to that size is the subject of breaking a PRD into atomic agent tasks, and it is the prerequisite that makes one task per iteration practical.

If your tasks are too small, you simply burn an iteration on something trivial, which is cheap and harmless. The asymmetry matters: tasks that are slightly too small cost you a little time; tasks that are too big cost you a tangled branch. When in doubt, split.

The other piece is ordering. One task per iteration only flows smoothly if the highest-priority incomplete task is actually buildable right now, with its dependencies already satisfied. That is what the task lookup table manages. Each task declares its dependencies, and the loop respects them so it never tries to build the dashboard before the data exists. Scaling that ordering to hundreds of tasks is covered in task lookup tables for agents. Get the breakdown and the ordering right, and one task per iteration is not a constraint you fight. It is the rhythm the whole run settles into.
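The ordering rule described above (pick the highest-priority task whose dependencies already pass) might look like the following. This is a sketch under assumptions: the `depends_on` field name, the lower-number-wins priority, and the sample tasks are all hypothetical, and the real lookup table format is not shown in this article.

```python
from typing import Optional

# Hypothetical task records. "depends_on" is an assumed field name.
tasks = [
    {"id": "TASK-010", "title": "dashboard", "priority": 1,
     "passes": False, "depends_on": ["TASK-011"]},
    {"id": "TASK-011", "title": "data model", "priority": 2,
     "passes": False, "depends_on": []},
]


def pick_buildable_task(tasks: list[dict]) -> Optional[dict]:
    """Return the highest-priority incomplete task whose dependencies all pass."""
    done = {t["id"] for t in tasks if t["passes"]}
    ready = [
        t for t in tasks
        if not t["passes"] and set(t.get("depends_on", [])) <= done
    ]
    return min(ready, key=lambda t: t["priority"]) if ready else None


first = pick_buildable_task(tasks)
print(first["id"])                       # TASK-011: the data model comes first
first["passes"] = True                   # simulate a verified, committed iteration
print(pick_buildable_task(tasks)["id"])  # TASK-010: the dashboard is now buildable
```

The dashboard has the higher priority, but it is skipped until the data model passes. A return value of None means either that everything passes or that the remaining tasks are blocked on unmet dependencies, so a caller would need to tell those two cases apart.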

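One task per iteration is the discipline that turns a spec into a reliable run. To build the rest of the workflow around it, start with the pieces referenced above: the spec-driven development workflow, breaking a PRD into atomic agent tasks, and task lookup tables for agents.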

Frequently asked questions

What does one task per iteration mean?

It means the agent completes exactly one task per run of the loop, commits the result, and stops. The next iteration starts a fresh agent that picks the next highest-priority incomplete task. The agent never batches several tasks into a single invocation.

Why should an agent not do multiple tasks in one run?

Batching causes three problems. The context window fills with unrelated state and the agent starts contradicting itself, which is context rot. A failure mid-batch leaves several tasks half-finished and unshippable. And the resulting commit bundles many changes, so it is hard to review, revert, or bisect. One task per iteration keeps context small, failures local, and commits atomic.

How does one task per iteration relate to fresh context?

They are two halves of the same design. Each iteration starts the agent with a fresh context window and reconstructs state from files on disk, not from chat history. That only works if the unit of work fits inside one clean window, and one task is the unit sized to do that. Together they keep every iteration crisp: clean slate in, one verified task out.

How does Ralph enforce one task per iteration?

The loop runs the agent once per iteration. Each iteration the agent finds the single highest-priority incomplete task in .agent/tasks.json, works only that task spec, runs tests, lint, and type checks, updates the status, commits, and stops. Because the next iteration starts a brand new agent with fresh context, there is no way to carry work across tasks. The process boundary between iterations is the enforcement.

What if a task is too big to finish in one iteration?

Do not relax the rule; cut the task smaller. A good task is something an agent can finish in one short sitting with its own acceptance criteria. If the agent runs out of room inside one iteration, the breakdown was too coarse. Split the task during the breakdown phase so one task fits cleanly in one iteration.