Cost Control for Autonomous AI Coding Agents

[Image: Terminal showing a Ralph loop with iteration caps and cost guardrails]

Mar 2, 2026 - 14 min read - 2800 words

Creator of RalphLoop.sh, founder of PageAI

An autonomous coding loop burns money when it thrashes: it retries the same broken task, drags a bloated context window from one call to the next, or runs a frontier model on work a cheap model could finish. You control spend with three knobs, not a billing dashboard. Cap the iterations so the run has a hard ceiling. Pick the model per agent so mechanical tasks do not pay frontier prices. Gate every iteration on tests, lint, and type checks so the loop stops working a task it cannot finish instead of grinding on it forever.

This is the cost-control chapter of the larger guide to running an AI coding agent overnight. The mechanics below are the same Bash loop Geoffrey Huntley described, with the spend-shaped edges called out.

Why an autonomous loop costs more than one prompt

A single prompt costs one call. A loop costs one call per iteration, and a badly built loop multiplies that in two ways.

First, runaway iterations. If nothing tells the loop to stop, it keeps spawning the agent. Ten iterations is fine. Two hundred iterations on a task that was already done at iteration three is two hundred calls paid for and 197 of them wasted.

Second, context bloat. A naive long-running agent keeps appending to one conversation. Tokens accumulate, every turn re-sends the whole transcript, and the per-call cost climbs as the session drags on. This is also where quality falls apart, because the agent loses the plot in a wall of stale context. The Ralph technique fixes both at once: each iteration starts the agent with a fresh context window and reads state from disk. Cost per iteration stays flat instead of growing with session length. The deeper version of that argument is in the writeup on what the Ralph technique is.
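The difference is easy to put in numbers. The sketch below is illustrative arithmetic only: it assumes every turn adds roughly the same number of tokens, and the 1000-token figure is a placeholder, not a measurement of any agent.

```shell
# Illustrative token arithmetic. The per-turn figure is a placeholder.

# One long conversation: call k re-sends k turns of transcript,
# so the total over n calls is t * n * (n + 1) / 2.
accumulated_tokens() {
  local n="$1" t="$2"
  echo $(( $t * $n * ($n + 1) / 2 ))
}

# Fresh context per iteration: every call sends roughly the same t tokens.
fresh_tokens() {
  local n="$1" t="$2"
  echo $(( $t * $n ))
}

echo "accumulated over 40 calls: $(accumulated_tokens 40 1000)"  # 820000
echo "fresh over 40 calls:       $(fresh_tokens 40 1000)"        # 40000
```

Under those assumptions the single-conversation total grows quadratically with the number of calls, while the fresh-context total grows linearly.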

So cost control is not a separate feature you bolt on. It falls out of building the loop correctly. The rest of this post is the specific knobs.

Cap iterations with -n, --once, and --max-iterations

The iteration count is your budget dial. It is the hard ceiling on how many agent calls a single run can make. Ralph defaults to 10:

# 10 iterations, the default
./ralph.sh

Set the cap explicitly when you want a longer unattended run or a tighter leash:

# 50 iterations for an overnight run
./ralph.sh -n 50

# the long form of the same flag, here with a tighter cap of 5
./ralph.sh --max-iterations 5

For a smoke test, run exactly one iteration. This is the cheapest possible way to check that the agent authenticates, reads your prompt, picks up a task, and produces a sane diff before you commit to a long run:

# exactly one iteration
./ralph.sh --once

Treat --once as your dry run. Spend one call, read the diff and the log, and only then scale the count up. A cheap probe at the start saves you from discovering at iteration 40 that the prompt was wrong.

The cap matters because it converts an open-ended process into a bounded one. Without it, “run until done” can mean “run until your bill scares you.” With it, the worst case is exactly the number of iterations you authorized. When the loop hits the cap without finishing, it exits with code 1 (MAX_ITERATIONS). That is a signal, not a failure: read the log, decide whether to raise the cap, and resume.
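To make the bounded-loop idea concrete, here is a minimal sketch of a capped loop. It is not Ralph's source: run_capped and finish_on_third are hypothetical names, and the sketch assumes the wrapped command returns 0 once the work is done.

```shell
# Hypothetical sketch of a capped loop, not Ralph's actual implementation.
# run_capped MAX CMD [ARGS...]
# Runs CMD at most MAX times. Returns 0 as soon as CMD succeeds,
# or 1 (the MAX_ITERATIONS code) once the cap is spent.
run_capped() {
  local max="$1" i=1
  shift
  while [ "$i" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Stand-in for an agent call: reports success on the third attempt.
attempts=0
finish_on_third() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

run_capped 10 finish_on_third && echo "done after $attempts iterations"
```

The worst case is fixed by the first argument: however the stand-in behaves, it is never called more than MAX times.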

Match the model to the task

The second knob is model choice, and it is where most of the savings live. Frontier models cost more per token than smaller ones. Plenty of agent work does not need a frontier model: renaming symbols, wiring up boilerplate, fixing a failing lint rule, applying a mechanical refactor across files. Running those on a cheap model and saving the expensive model for genuinely hard reasoning is the single biggest lever on a long run.
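A rough worst-case estimate shows how the cap and the model tier multiply together. Everything numeric here is a placeholder: the token count per iteration and both prices are made up for illustration, and worst_case_cents is a hypothetical helper, not part of Ralph.

```shell
# Worst-case spend for one run, in cents. All inputs are placeholders.
# worst_case_cents CAP TOKENS_PER_ITERATION CENTS_PER_MILLION_TOKENS
worst_case_cents() {
  local cap="$1" tokens="$2" price="$3"
  echo $(( $cap * $tokens * $price / 1000000 ))
}

# Same cap and workload, two hypothetical price tiers.
echo "pricier tier: $(worst_case_cents 30 200000 500) cents"  # 3000
echo "cheaper tier: $(worst_case_cents 30 200000 50) cents"   # 300
```

Plug in your provider's real prices and your own observed token counts before relying on the result.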

Ralph forwards anything after the -- separator straight to the agent, so you choose the model with the agent’s own flag:

# Codex on a specific model
./ralph.sh --agent codex -- --model gpt-5.5

# Gemini on the pro model
./ralph.sh -a gemini -- --model pro

The rule to remember: everything left of -- configures Ralph (which agent, how many iterations, login). Everything right of -- configures the agent (model, approval mode, and so on). Keep them on the correct side and the loop behaves.
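The split at -- can be sketched in a few lines of shell. This is an illustration of the convention, not how Ralph parses its arguments; split_at_separator is a hypothetical name, and joining the arguments into strings is for display only (it would lose quoting in a real script).

```shell
# Hypothetical sketch: split an argument list at the first "--".
# Prints the wrapper's arguments on line 1 and the agent's on line 2.
split_at_separator() {
  local wrapper="" agent="" seen=0 arg
  for arg in "$@"; do
    if [ "$seen" -eq 0 ] && [ "$arg" = "--" ]; then
      seen=1                                   # everything after this goes to the agent
    elif [ "$seen" -eq 0 ]; then
      wrapper="${wrapper:+$wrapper }$arg"
    else
      agent="${agent:+$agent }$arg"
    fi
  done
  printf 'wrapper: %s\nagent: %s\n' "$wrapper" "$agent"
}

split_at_separator --agent codex -n 30 -- --model gpt-5.5
# wrapper: --agent codex -n 30
# agent: --model gpt-5.5
```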

This works across the supported agents, which are claude (default), codex, copilot, cursor, gemini, and opencode. Each has its own sandbox and its own credentials, so you can keep separate runs for separate model tiers. A practical pattern: point a cheap-model loop at a queue of mechanical tasks, and a frontier-model loop at the hard architectural ones. The per-agent CLIs are covered in detail in the Codex loop walkthrough and the Claude Code loop walkthrough.

Two things make model selection safe rather than risky. The work is decomposed into small tasks (see below), so a cheaper model is asked to do something small enough that it can actually succeed. And every iteration is verified, so if the cheaper model produces a broken change, the gate catches it instead of letting it ship.

Verification gates stop the loop from thrashing

The most expensive failure mode is not a model that costs too much per call. It is a loop that keeps calling the agent on a task it cannot complete. Verification gates are what turn that infinite drip into a clean stop.

Every iteration runs the verification stack as step three of the loop:

1. Find the highest-priority incomplete task in .agent/tasks.json.
2. Work the steps in .agent/tasks/TASK-{ID}.json.
3. Run tests, linting, and type checking.
4. Complete the task, take a screenshot, update the task status, and commit.
5. Repeat until all tasks pass or the iteration cap is reached.

The stack Ralph assumes is Playwright for end-to-end tests, Vitest for unit tests, TypeScript for type checks, ESLint for linting, and Prettier for formatting. The repo mantra is blunt: if you didn’t test it, it doesn’t work.

Here is why this is a cost control and not just a quality control. A task only flips to done when its checks pass. If the change is broken, the gate fails, the task stays open, and the agent gets honest feedback to fix it on the next pass. Without gates, the agent can mark a broken task done and move on, or worse, keep editing the same file with no signal about whether it is closer or further from working. That second case is the thrash: real calls, real tokens, zero progress. Gates give the loop a definition of progress, so a task either advances toward passing or you find out fast that it is stuck.
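A gate of this kind reduces to "run each check, stop at the first failure". The sketch below assumes each check is a shell command line whose exit status signals pass or fail; run_gate is a hypothetical helper, and the commands in the usage comment are one plausible wiring of the stack named above, not necessarily what Ralph runs.

```shell
# Hypothetical gate: run each check in order and stop at the first failure.
# run_gate CHECK...   where each CHECK is a command line
run_gate() {
  local check
  for check in "$@"; do
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "gate failed: $check"
      return 1        # the task stays open
    fi
  done
  echo "gate passed"  # only now may the task flip to done
}

# Possible usage in a TypeScript project (adjust to your own scripts):
#   run_gate "npx tsc --noEmit" "npx eslint ." "npx vitest run" "npx playwright test"
```

Because the function returns non-zero on the first failing check, a caller can refuse to mark the task done without inspecting any output.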

When a task genuinely cannot be finished, you do not want the loop to spend the rest of its cap discovering that one iteration at a time. The agent emits a promise tag instead:

<promise>COMPLETE</promise> means every task is finished.
<promise>BLOCKED:reason</promise> means it needs human help.
<promise>DECIDE:question</promise> means it needs a decision from you.

Those map to exit codes: 0 for COMPLETE, 1 for MAX_ITERATIONS, 2 for BLOCKED, and 3 for DECIDE. A BLOCKED or DECIDE exit ends the run early instead of burning the remaining iterations. You spend on the calls that made progress and stop on the call that hit a wall. The full treatment of why an autonomous agent needs this feedback is in verification loops for AI agents.
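The tag-to-code mapping can be sketched as a single case statement. This assumes the tags appear verbatim somewhere in the agent's output; promise_exit_code is a hypothetical name, and printing "continue" for the no-tag case is a choice made for this sketch.

```shell
# Hypothetical parser: map a promise tag in agent output to an exit code.
# Prints "continue" when no tag is present, meaning the loop keeps going.
promise_exit_code() {
  case "$1" in
    *'<promise>COMPLETE</promise>'*)    echo 0 ;;
    *'<promise>BLOCKED:'*'</promise>'*) echo 2 ;;
    *'<promise>DECIDE:'*'</promise>'*)  echo 3 ;;
    *)                                  echo continue ;;
  esac
}

promise_exit_code 'All tasks pass. <promise>COMPLETE</promise>'          # 0
promise_exit_code '<promise>BLOCKED:missing API credentials</promise>'   # 2
promise_exit_code 'Committed one task, more remain.'                     # continue
```

Exit code 1 does not appear here because it comes from the iteration cap, not from anything the agent says.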

Atomic tasks keep each iteration cheap

Cost per iteration tracks how much the agent has to read and reason about in that single call. A vague, sprawling task forces the agent to load a lot of context, take many steps, and produce a large risky diff that is more likely to fail verification and get retried. A small, well-scoped task is cheap to load, fast to finish, and easy to verify.

So the breakdown of work is itself a cost lever. Ralph follows one rule per invocation: the agent completes exactly one task, commits, and stops. It never batches several tasks into a single iteration. That keeps each commit small, each diff reviewable, and each context window focused on one thing.

The state lives on disk, not in chat history. A task lookup table (.agent/tasks.json) points to individual task specs (.agent/tasks/TASK-{ID}.json), and the running log lives in .agent/logs/LOG.md. Because progress is on the filesystem and in the git log, a fresh-context agent reorients at the start of every iteration without re-reading a giant transcript. That is what keeps the token cost of iteration 40 the same as iteration 2.
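Since every iteration depends on those files, a pre-flight check before a long run is cheap insurance. The sketch below only tests that the paths named in this post exist; missing_state_files is a hypothetical helper, and it is an assumption that all three must be present before the first iteration.

```shell
# Hypothetical pre-flight check: list the state paths that are missing
# under a project root. Prints nothing when everything is in place.
missing_state_files() {
  local root="$1" path
  for path in .agent/tasks.json .agent/tasks .agent/PROMPT.md; do
    [ -e "$root/$path" ] || echo "$path"
  done
}

# Usage from the project root:
#   missing="$(missing_state_files .)"
#   [ -z "$missing" ] || echo "fix before running: $missing"
```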

The completion promise is the other half of this. When the work is genuinely done, the agent emits <promise>COMPLETE</promise> and the loop exits with code 0, even if you authorized 50 iterations and it finished in 12. You only pay for the iterations the work actually needed. The cap is the ceiling; the completion promise is the early exit. Together they bound spend from both ends.

flowchart TD
    Start(["./ralph.sh -n 50"]) --> Cap{"Iteration < cap?"}
    Cap -->|"no"| MaxOut(["exit 1: MAX_ITERATIONS, stop spending"])
    Cap -->|"yes"| Pick["Pick one atomic task from .agent/tasks.json"]
    Pick --> Spawn["Spawn agent with fresh context and chosen model"]
    Spawn --> Work["Read state from disk, do one task"]
    Work --> Gate{"Tests, lint, type check pass?"}
    Gate -->|"no"| Log["Log failure, keep task open"]
    Log --> Cap
    Gate -->|"yes"| Commit["Commit, update task status, screenshot"]
    Commit --> Promise{"Promise tag?"}
    Promise -->|"COMPLETE"| Done(["exit 0: done early, no wasted calls"])
    Promise -->|"BLOCKED or DECIDE"| Stop(["exit 2 or 3: stop and ask a human"])
    Promise -->|"none"| Cap

Read the diagram as a spend story. Three paths end the run: the cap is hit, the work completes, or the agent asks for help. None of them let the loop drip calls into a task that is going nowhere.

Watch the cost per iteration

You cannot control what you cannot see, so the last piece is the per-iteration trail. Ralph records each iteration’s cleaned output to .agent/history/ and appends to the running log at .agent/logs/LOG.md. That history is your audit log for spend: one entry per agent call, in order, with what the agent did and whether the task advanced.

Reading the history tells you the things that actually drive cost. How many iterations did it take to close each task? Is one task failing verification over and over and eating calls? Did the run hit the cap with work still open, or did it exit early on a completion promise? An iteration that did real work and committed is money well spent. A run of iterations that all touch the same failing task is the thrash you want to catch and fix, usually by tightening the task spec or the prompt in .agent/PROMPT.md.
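One quick way to spot thrash is to count how often each task shows up in the log. The sketch below assumes log lines mention tasks as TASK-<id>, mirroring the task spec filenames; the actual format of LOG.md is not documented here, so treat the pattern as a guess to adapt. task_mentions is a hypothetical helper.

```shell
# Hypothetical thrash detector: count mentions of each task ID in a log,
# most-mentioned first. Assumes IDs appear as TASK-<id> in the text.
task_mentions() {
  grep -o 'TASK-[A-Za-z0-9_-]*' "$1" | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Usage:
#   task_mentions .agent/logs/LOG.md | head -n 5
```

A task that tops the list by a wide margin is the first place to look for a vague spec or a check the agent cannot satisfy.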

When you need more than the log, get inside the box. Each agent runs in its own Docker Sandbox with a deterministic name, ralph-<agent>-<project>-<hash8>, where <project> is the project directory. List them and open a shell:

# list sandboxes
sbx ls

# shell into the running sandbox to inspect logs and history
sbx exec -it ralph-<agent>-<project>-<hash8> bash

From there you can read .agent/history/, re-run a failing test by hand, and figure out why a task is stalling before it costs you more iterations. The Docker Sandboxes documentation covers the sandbox model in full, and the broader practice of making a long run auditable is the subject of observability for AI coding agents.

If the history shows the loop heading the wrong way and you do not want to kill it, edit .agent/STEERING.md. Ralph treats what you write there as critical work and folds it into the next iteration before resuming the task list. Steering mid-run is cheaper than letting a misdirected loop spend its whole cap and then starting over.

A cost-aware run, end to end

Put the knobs together and a deliberate run looks like this:

# 1. cheap probe: one iteration to confirm the setup is sane
./ralph.sh --once

# 2. mechanical backlog on a cheaper model, bounded cap
./ralph.sh --agent codex -n 30 -- --model gpt-5.5

# 3. read the per-iteration trail to see where the calls went
sbx exec -it ralph-codex-<project>-<hash8> bash   # then read .agent/logs/LOG.md

The probe spends one call to de-risk the run. The bounded loop runs an appropriately priced model with a hard ceiling, gates every iteration on tests so it cannot thrash, works one atomic task at a time so each call stays cheap, and exits early on a completion promise if the work finishes ahead of the cap. The history tells you exactly where the money went so the next run is tighter. That is cost control for an autonomous agent: not a billing alert after the fact, but a loop built so the expensive failure modes cannot happen.
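If you wrap the run in a script of your own, the exit codes described earlier tell you what to do next. A minimal sketch, assuming the four codes are the only ones the loop returns on purpose; next_action is a hypothetical helper and the messages are suggestions.

```shell
# Hypothetical helper: turn the loop's exit code into a next step.
next_action() {
  case "$1" in
    0) echo "complete: review the commits" ;;
    1) echo "cap reached: read the log, then raise the cap or tighten the task" ;;
    2) echo "blocked: the agent needs human help" ;;
    3) echo "decide: answer the agent's question, then resume" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage:
#   ./ralph.sh -n 30; next_action "$?"
```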

Frequently asked questions

How do I limit how much an autonomous coding agent can spend?

Cap the iterations. The iteration count is the hard ceiling on agent calls per run. Ralph defaults to 10; set it with ./ralph.sh -n 50 or --max-iterations 5, and use ./ralph.sh --once for a one-call dry run. When the loop hits the cap it exits with code 1 instead of running forever.

Should I run a cheaper model for an AI coding agent loop?

Yes, for mechanical work. Renames, boilerplate, lint fixes, and routine refactors do not need a frontier model. Pass the model after the -- separator, for example ./ralph.sh --agent codex -- --model gpt-5.5, and save the expensive model for genuinely hard reasoning. Small tasks plus verification gates make the cheaper model safe to use.

What stops an agent loop from thrashing on a task it cannot finish?

Verification gates and promise tags. Every iteration runs tests, lint, and type checks, so a task only flips to done when its checks pass. If the agent gets stuck it emits BLOCKED or DECIDE, which exits the run early (codes 2 and 3) instead of spending the rest of the iteration cap on a task going nowhere.

Why does a fresh context window per iteration save money?

A single long conversation re-sends a growing transcript on every turn, so the per-call cost climbs as the session drags on. The Ralph technique starts each iteration with a clean context and reads state from disk (.agent/tasks.json, the log, git history), so the token cost of iteration 40 is about the same as iteration 2.

How do I see where the cost went in an autonomous run?

Read the per-iteration trail. Ralph writes each iteration's output to .agent/history/ and appends to .agent/logs/LOG.md, one entry per agent call. Shell into the sandbox with sbx exec -it ralph-<agent>-<project>-<hash8> bash to read it. Look for tasks that fail verification repeatedly, since those are the calls that cost money without making progress.

Run your own Ralph loop

Ralph is a hackable script you point at your project. Install it and let an agent work through your task list.

npx @pageai/ralph-loop
