Running the Cursor CLI Agent in a Loop

Terminal running the Cursor CLI agent in an autonomous Ralph loop inside a Docker sandbox

Mar 10, 2026 - 14 min read - 2800 words

Creator of RalphLoop.sh, founder of PageAI

To run the Cursor CLI agent in an autonomous loop, point Ralph at it with one flag: ./ralph.sh --agent cursor. Ralph wraps the headless cursor-agent in print mode, starts it with a fresh context window every iteration, and keeps re-running it against your task list until the work is done or you hit the iteration cap. Log in once inside the sandbox, pass Cursor’s own flags after a -- separator, and review a finished diff in the morning.

This is the Cursor-specific walkthrough in the larger guide to agentic coding CLIs. The loop mechanics are identical to running Claude Code in a loop; only the agent binary and its flags change. If you want the full setup from an empty repo (install, task list, login, review), follow how to run a Ralph loop with the Cursor CLI; this post is the flag-level reference.

Run the Cursor CLI agent in a loop with one flag

Ralph is a Bash script you point at a project. The default agent is Claude, so you switch to Cursor explicitly:

./ralph.sh --agent cursor

That runs 10 iterations, the default. Tune the count when you want a longer unattended run or a single iteration as a smoke test:

# 50 iterations
./ralph.sh --agent cursor -n 50

# exactly one iteration (good for a smoke test)
./ralph.sh --agent cursor --once

# explicit cap
./ralph.sh --agent cursor --max-iterations 5

The short form -a works too: ./ralph.sh -a cursor -n 5. Supported agents are claude (default), codex, copilot, cursor, gemini, and opencode, so the same harness drives any of them.
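The Ralph-side flags are small enough to sketch. Below is a minimal, hypothetical version of how a Bash harness could parse them. It is not the real ralph.sh source. The function parse_ralph_flags, the SUPPORTED_AGENTS array, and the AGENT and MAX_ITERATIONS variables are illustrative names, and only the flags listed above are handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Ralph-style flag parsing (not the real ralph.sh source).
# Handles only -a/--agent, -n/--max-iterations, and --once.

SUPPORTED_AGENTS=(claude codex copilot cursor gemini opencode)

parse_ralph_flags() {
  AGENT="claude"      # default agent
  MAX_ITERATIONS=10   # default iteration count
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -a|--agent|-n|--max-iterations)
        # Both flags take a value; guard so a missing value cannot loop forever.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 64
        fi
        case "$1" in
          -a|--agent) AGENT="$2" ;;
          *) MAX_ITERATIONS="$2" ;;
        esac
        shift 2 ;;
      --once)
        MAX_ITERATIONS=1; shift ;;
      --)
        break ;;  # agent flags follow; they are out of scope for this sketch
      *)
        echo "unknown flag: $1" >&2
        return 64 ;;
    esac
  done

  # The iteration count must be a positive integer.
  case "$MAX_ITERATIONS" in
    ''|*[!0-9]*) echo "invalid iteration count: $MAX_ITERATIONS" >&2; return 64 ;;
  esac
  if [ "$MAX_ITERATIONS" -lt 1 ]; then
    echo "iteration count must be at least 1" >&2
    return 64
  fi

  # Reject agents the harness does not know how to drive.
  local known
  for known in "${SUPPORTED_AGENTS[@]}"; do
    [ "$known" = "$AGENT" ] && return 0
  done
  echo "unsupported agent: $AGENT" >&2
  return 64
}
```

Calling parse_ralph_flags -a cursor -n 5 leaves AGENT=cursor and MAX_ITERATIONS=5. An unknown agent or a non-numeric count returns a non-zero status before any sandbox is touched.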

Under the hood, Ralph builds a cursor-agent command and runs it inside a Docker Sandbox. The expansion for ./ralph.sh --agent cursor looks like this:

sbx run --name ralph-cursor-<project>-<hash8> cursor . -- -p "$PROMPT_CONTENT"

The -p flag (long form --print) is Cursor’s headless mode. It prints the agent’s responses to the console for scripts and non-interactive use, and the Cursor CLI parameter reference is clear that print mode still has access to all tools, including write and shell. That is exactly what a loop needs: an agent that can edit files and run commands without a person in the chair. Print mode reads the prompt, does the work, prints a final message, and exits. That exit is what lets Ralph treat each iteration as a discrete unit.

Set up and log in to Cursor inside the sandbox

Cursor runs inside an isolated Docker Sandbox, not on your host, so it needs credentials in that environment. Start by dropping Ralph into your project:

npx @pageai/ralph-loop

This adds the ralph.sh script and the .agent/ directory that holds the prompt, the task list, and the logs. Then authenticate once with the login action:

./ralph.sh --login --agent cursor

This prints the login command for every supported agent, highlights the one for Cursor, and then drops you into the sandbox shell. Inside, run cursor-agent login and follow the prompt, or provide a key through the CURSOR_API_KEY environment variable. Confirm the session with cursor-agent status. The credential persists in that named sandbox, so later runs attach to the same box and start already logged in.

Each agent gets its own deterministic sandbox name, derived from the agent slug, the project directory, and a hash of the absolute path:

ralph-<agent>-<project-dir>-<hash8>

For Cursor that is ralph-cursor-<project>-<hash8>. Print the exact name for your project without starting a run:

./ralph.sh --print-name --agent cursor
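To see why the name is deterministic, here is a hypothetical sketch that derives a name in the same ralph-<agent>-<project-dir>-<hash8> shape. The function sandbox_name is an illustrative name, and hashing the absolute path with SHA-256 is an assumption. The real script may hash differently, so treat ./ralph.sh --print-name as the source of truth.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive ralph-<agent>-<project-dir>-<hash8>.
# SHA-256 of the absolute path is an assumption, not confirmed Ralph behavior.

sandbox_name() {
  # Usage: sandbox_name AGENT [PROJECT_PATH]
  local agent="$1" project_path="${2:-$PWD}"
  local abs_path dir hash
  # Absolute path with symlinks resolved, so the same project always hashes the same.
  abs_path="$(cd "$project_path" && pwd -P)" || return 1
  dir="$(basename "$abs_path")"
  if command -v sha256sum >/dev/null 2>&1; then
    hash="$(printf '%s' "$abs_path" | sha256sum)"
  else
    hash="$(printf '%s' "$abs_path" | shasum -a 256)"   # macOS fallback
  fi
  hash="${hash:0:8}"   # keep the first 8 hex characters
  printf 'ralph-%s-%s-%s\n' "$agent" "$dir" "$hash"
}
```

The same path and agent always produce the same name, and switching the agent changes only the agent segment. That is what keeps the Cursor and Claude sandboxes for one repo apart.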

Per-agent names matter because they keep state separate. Your Cursor sandbox and your Claude sandbox never share credentials, history, or installed tools, so you can run both against the same repo without one clobbering the other. If Cursor is not authenticated when the loop starts, Ralph detects the auth failure, stops, and tells you to run ./ralph.sh --login --agent cursor. No silent thrashing on a box that can never make progress.

Pass a model and flags after the -- separator

Everything to the right of Ralph’s own -- separator is forwarded straight to the agent. For Cursor, Ralph inserts those arguments right after -p, before the prompt. So this:

./ralph.sh --agent cursor -- --model auto

expands to:

sbx run --name ralph-cursor-<project>-<hash8> cursor . -- -p --model auto "$PROMPT_CONTENT"
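The insertion point matters because cursor-agent reads the forwarded flags before it reaches the prompt. A minimal sketch of that expansion follows, assuming a hypothetical helper build_cursor_cmd that fills a Bash array named CMD. Building an array instead of a string keeps a long, multi-line prompt as a single argument.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the expansion shown above: forwarded agent arguments
# land right after -p and before the prompt. CMD is an illustrative name.

build_cursor_cmd() {
  # Usage: build_cursor_cmd SANDBOX_NAME PROMPT [AGENT_ARGS...]
  local name="$1" prompt="$2"
  shift 2
  # An array preserves word boundaries: the prompt stays one argument.
  CMD=(sbx run --name "$name" cursor . -- -p "$@" "$prompt")
}
```

Run the result with "${CMD[@]}", or print it with printf '%q ' "${CMD[@]}" to check the quoting before launching anything.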

The --model flag picks the model for the run. Do not guess at model names: run cursor-agent models (or pass --list-models) inside the sandbox to print the exact identifiers Cursor accepts, then pass one of those. The separator works for any valid cursor-agent flag, not just the model. A few you will reach for in a loop:

# pick a model (list them first with cursor-agent models)
./ralph.sh --agent cursor -- --model auto

# never pause to approve a command (alias: --yolo)
./ralph.sh --agent cursor -- --force

# emit newline-delimited JSON events instead of plain text
./ralph.sh --agent cursor -- --output-format stream-json

The rule to remember: everything left of -- configures Ralph (agent, iteration count, login). Everything right of -- configures the agent. Keep them on the correct side and the loop behaves.
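The split itself is a few lines of Bash. This hypothetical split_at_separator function (an illustrative name, not taken from ralph.sh) puts everything before the first -- into RALPH_ARGS and everything after it into AGENT_ARGS, untouched.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the -- rule: split argv at the first "--".
# Left side -> RALPH_ARGS (harness flags); right side -> AGENT_ARGS (forwarded verbatim).

split_at_separator() {
  RALPH_ARGS=()
  AGENT_ARGS=()
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--" ]; then
      shift
      AGENT_ARGS=("$@")   # everything after the first -- goes to the agent as-is
      return 0
    fi
    RALPH_ARGS+=("$1")
    shift
  done
  return 0
}
```

Only the first -- acts as the separator. A second one stays on the agent's side, so the agent receives it exactly as typed.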

Two of those flags deserve a note. -f (long form --force, alias --yolo) tells Cursor to allow commands unless a rule explicitly denies them, which is what keeps an unattended run from stalling on an approval prompt nobody is there to answer. And --output-format (which only works alongside --print) switches the stream from text to json or stream-json, useful when a downstream CI step parses events with jq. Ralph already records each iteration’s cleaned output to .agent/history/ and the running log to .agent/logs/LOG.md, so you get a per-iteration trail regardless of which format you pick.

What happens each iteration

Ralph’s loop is the Bash loop Geoffrey Huntley described in the original Ralph writeup. Each pass is mechanical and identical:

1. Find the highest-priority incomplete task in .agent/tasks.json.
2. Work the steps in .agent/tasks/TASK-{ID}.json.
3. Run tests, linting, and type checking.
4. Complete the task, take a screenshot, update the task status, and commit.
5. Repeat until all tasks pass or the iteration cap is reached.

The critical part is that each iteration spawns a fresh cursor-agent -p with a clean context window. The agent does not carry a bloated, hours-long transcript from one task to the next. It reads the current state from disk, does one task, and exits. This is why the loop deliberately does not use Cursor’s own --resume or --continue session flags: continuity is a liability here, not a feature.
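The fresh-context rule comes down to starting a new process on every pass. In this minimal sketch, run_fresh_iterations is a hypothetical name and stop conditions are deliberately left out. The point is that the loop holds no transcript, so anything the next pass needs must already be on disk.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: a loop that keeps no conversation state of its own.
# Each pass launches the agent command as a brand-new process, so the only
# memory between passes is whatever earlier passes wrote to disk or committed.

run_fresh_iterations() {
  # Usage: run_fresh_iterations MAX CMD [ARGS...]
  local max="$1" i
  shift
  for ((i = 1; i <= max; i++)); do
    printf 'iteration %d/%d\n' "$i" "$max" >&2
    # New process per pass: no --resume, no --continue, no carried transcript.
    # Exit status is ignored; stop conditions are out of scope for this sketch.
    "$@"
  done
  return 0
}
```

If the command is a stub that increments a counter file, three passes leave the file reading 3 even though every process started blank. The file plays the role that .agent/tasks.json and the git log play in a real run.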

flowchart TD
    Start(["./ralph.sh --agent cursor"]) --> Spawn["sbx run cursor . -- -p (fresh context)"]
    Spawn --> Pick["Agent picks top task from .agent/tasks.json"]
    Pick --> Work["cursor-agent reads state, edits files, runs commands"]
    Work --> Verify["Run tests, lint, type check, screenshot"]
    Verify --> Commit["Commit and update task status"]
    Commit --> Check{"Promise tag emitted?"}
    Check -->|"none"| Cap{"Iteration cap reached?"}
    Cap -->|"no"| Spawn
    Cap -->|"yes"| Max(["exit 1, MAX_ITERATIONS hit"])
    Check -->|"COMPLETE"| Done(["exit 0, all tasks done"])
    Check -->|"BLOCKED or DECIDE"| Stop(["exit 2 or 3, wants a human"])

The filesystem and git history are the memory layer. Progress lives in .agent/tasks.json, .agent/logs/LOG.md, per-task spec files, and the git log, not in a chat transcript. That separation of thinking (ephemeral, per iteration) from state (durable, on disk) is what keeps a fresh-context agent oriented across dozens of iterations. The deeper version of this idea is in the guide to running an AI coding agent overnight.

A loop also needs a stop condition that is a signal, not a vibe. Cursor emits a semantic promise tag in its final message, and Ralph reads it:

<promise>COMPLETE</promise> means every task is finished.
<promise>BLOCKED:reason</promise> means the agent needs human help.
<promise>DECIDE:question</promise> means it needs a decision you have to make.

Those map to exit codes: 0 for COMPLETE, 1 for hitting MAX_ITERATIONS, 2 for BLOCKED, and 3 for DECIDE. Wire those into a wrapper script or a CI step and you get clean branching: ship on 0, page yourself on 2 or 3, extend the cap on 1.
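Both halves of that contract fit in a short sketch. promise_status and handle_ralph_exit are hypothetical helper names, not Ralph's source. The first maps a final message to the exit codes above, using 10 (an arbitrary value) for "no tag, keep looping". The second is the kind of branch a wrapper script or CI step might use.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map an agent's final message to the documented exit codes.
# 0 = COMPLETE, 2 = BLOCKED, 3 = DECIDE. 10 means "no tag found, keep looping";
# that value is an arbitrary choice for this sketch, not part of Ralph.

promise_status() {
  local msg="$1"
  local re='<promise>(COMPLETE|BLOCKED:[^<]*|DECIDE:[^<]*)</promise>'
  if [[ "$msg" =~ $re ]]; then
    case "${BASH_REMATCH[1]}" in
      COMPLETE)  return 0 ;;
      BLOCKED:*) return 2 ;;
      DECIDE:*)  return 3 ;;
    esac
  fi
  return 10
}

# Wrapper-side branching on Ralph's exit codes (0, 1, 2, 3).
handle_ralph_exit() {
  case "$1" in
    0)   echo "ship: all tasks complete" ;;
    1)   echo "extend: iteration cap reached, raise -n and rerun" ;;
    2|3) echo "page: agent is blocked or needs a decision" ;;
    *)   echo "unexpected exit code $1" >&2; return 1 ;;
  esac
}
```

In a wrapper, run ./ralph.sh --agent cursor -n 50 and pass "$?" straight to handle_ralph_exit.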

One rule keeps the whole thing reliable: one task per invocation. Cursor completes exactly one task, commits, and stops. It never batches several tasks into a single iteration, which is what keeps each commit small, each diff reviewable, and each context window focused.
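If you want a wrapper to enforce that rule rather than trust it, comparing commit counts before and after an iteration is one option. This is a hypothetical guard, not a Ralph feature. commit_count and check_one_commit are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical guard a wrapper could add (not a Ralph feature): confirm that one
# iteration produced exactly one new commit, matching one task per invocation.

commit_count() {
  # Number of commits reachable from HEAD in the given repository.
  git -C "$1" rev-list --count HEAD
}

check_one_commit() {
  # Usage: check_one_commit REPO COUNT_BEFORE
  local repo="$1" before="$2" after
  after="$(commit_count "$repo")" || return 1
  if [ "$((after - before))" -ne 1 ]; then
    echo "expected 1 new commit, got $((after - before))" >&2
    return 1
  fi
}
```

Record commit_count . before an iteration and call check_one_commit . "$before" after it. An iteration that ends in BLOCKED or DECIDE may legitimately produce no commit, so a real wrapper would apply the check only after a normal pass.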

Isolate Cursor in a Docker Sandbox and review the diff in the morning

An autonomous agent that can write files and run shell commands is exactly as dangerous as the permissions it inherits. Run cursor-agent --force on your laptop and it can touch your SSH keys, your environment variables, and anything else your user can reach. The fix is not to make the agent more timid. The fix is to change the blast radius.

Ralph runs each agent inside a Docker Sandbox, an isolated microVM managed by the sbx CLI. Inside that boundary, Cursor runs in its --force (YOLO) mode without you previewing every command, because the sandbox is the boundary the agent cannot cross. The microVM has its own kernel, an isolated filesystem, and a network that is deny-by-default.

Cursor also ships its own internal sandbox, toggled with --sandbox enabled or --sandbox disabled. Running a second sandbox inside the microVM buys you nothing, because the microVM already contains the agent, and it can introduce nested-isolation friction. So a practical pairing for a loop is to let the microVM do the isolating and let Cursor focus on the coding:

./ralph.sh --agent cursor -n 50 -- --model auto --force --sandbox disabled

When the agent needs a package, the deny-by-default network blocks it until you allow the domain:

sbx policy allow network ralph-cursor-<project>-<hash8> registry.npmjs.org
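When a task needs several registries, a small loop over that same command saves typing. This hypothetical helper reuses only the sbx policy allow network form shown above. allow_domains and the SBX override are illustrative names, and SBX=echo turns the helper into a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: allow several domains for one sandbox, reusing the
# "sbx policy allow network <sandbox> <domain>" form shown above.
# Set SBX=echo for a dry run that only prints the commands.

allow_domains() {
  # Usage: allow_domains SANDBOX_NAME DOMAIN [DOMAIN...]
  local name="$1" domain
  shift
  for domain in "$@"; do
    # Stop at the first failure and return its status.
    "${SBX:-sbx}" policy allow network "$name" "$domain" || return
  done
}
```

With SBX=echo, calling allow_domains with your sandbox name, registry.npmjs.org, and pypi.org prints the two commands without changing any policy.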

That is a feature. The agent can install what the task needs without a path to exfiltrate your source or reach arbitrary hosts. The full argument, including how the microVM compares to a hand-rolled container, lives in how to run AI coding agents in Docker sandboxes safely, and the Docker Sandboxes documentation covers the policy model in detail.

This is the payoff of looping overnight. Because every task is committed separately, the morning review is a git review, not an archaeology dig:

git log --oneline
git diff main...HEAD

You read the commits in order, eyeball the screenshots the agent captured, and accept or revert. The work arrived as small, verified, individually committed units, so a single bad task is one revert, not a tangled mess you have to unwind by hand.

Verify every iteration with the test stack

A loop is only as good as its feedback. If Cursor cannot tell whether its change worked, it will happily mark a broken task done and move on. The repo mantra is blunt: if you didn’t test it, it doesn’t work.

Ralph assumes a verification stack, and the agent runs it in step three of every iteration:

Vitest for unit tests.
Playwright for end-to-end tests.
TypeScript for type checking.
ESLint for linting.
Prettier for formatting.

Most projects wire these into npm scripts the agent calls during each iteration:

{
  "scripts": {
    "test": "vitest run",
    "test:e2e": "playwright test",
    "typecheck": "tsc --noEmit",
    "lint": "eslint .",
    "format": "prettier --check ."
  }
}

Because print mode has full tool access, Cursor can run those commands itself, read the failures, and fix them before it commits. A failed check sends the agent back to fix the work rather than forward to the next task. Screenshots add a second channel: for UI work, a passing suite is necessary but not sufficient, so the agent captures the rendered state as visual evidence you can review later.
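The "fix before moving on" behavior amounts to a gate that stops at the first failing check. This is a minimal, hypothetical sketch over the npm scripts above. The function run_checks, the CHECKS array, and the cheap-to-expensive ordering are assumptions, not part of Ralph.

```bash
#!/usr/bin/env bash
# Hypothetical verification gate over the npm scripts in the package.json example.
# Runs checks in order, stops at the first failure, and names the failing check.
# The ordering (cheap checks first) is an assumption.

CHECKS=(typecheck lint format test test:e2e)

run_checks() {
  # Usage: run_checks CMD [ARGS...]   e.g. run_checks npm run
  local check
  for check in "${CHECKS[@]}"; do
    if ! "$@" "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "all checks passed"
}
```

Called as run_checks npm run, it runs npm run typecheck, then lint, format, test, and test:e2e, and reports the first script that fails. That name is the feedback the agent needs to go back and fix the work.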

Because every iteration starts fresh, verification is also how the next iteration learns what the last one did. The agent does not remember the previous run. It reads the test results, the updated task status, and the new commits, then decides what is next. That feedback loop is the whole point, and it generalizes across agents. The same verification discipline applies when you run the Gemini CLI in a loop or any other CLI in this family.

Inspect and debug the Cursor sandbox

When a run stalls or a task keeps failing, get inside the box. The sandbox is a microVM, but you can poke at it much like a normal container. List what exists:

sbx ls

Open a shell in the Cursor sandbox and look around:

sbx exec -it ralph-cursor-<project>-<hash8> bash

From there you can check the working tree, re-run a failing test by hand, confirm auth with cursor-agent status, or read .agent/logs/LOG.md and the per-iteration logs in .agent/history/. Reattach to a sandbox session with:

sbx run ralph-cursor-<project>-<hash8>
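Once you are in the sandbox shell, the newest file in .agent/history/ usually belongs to the iteration that stalled. A small, hypothetical helper for finding it follows. latest_history is an illustrative name, and the helper assumes plain file names without newlines, because it relies on ls -t ordering by modification time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path of the most recently modified file in a
# history directory. Relies on ls -t, so it assumes simple file names.

latest_history() {
  # Usage: latest_history [DIR]   (defaults to .agent/history)
  local dir="${1:-.agent/history}" newest
  newest="$(ls -t "$dir" 2>/dev/null | head -n 1)"   # newest first
  if [ -z "$newest" ]; then
    echo "no iteration logs in $dir" >&2
    return 1
  fi
  printf '%s/%s\n' "$dir" "$newest"
}
```

From the project root inside the sandbox, less "$(latest_history)" opens the latest iteration's output next to .agent/logs/LOG.md.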

Most stalls trace back to one of three things: Cursor was never authenticated inside the box, a network policy is blocking an install, or the prompt in .agent/PROMPT.md lacks a clear completion criterion. The sandbox shell shows you which one it is. If you need to redirect a running loop without killing it, edit .agent/STEERING.md. Ralph folds whatever critical work you describe there into the next iteration before resuming the normal task list. That is steering, not stopping, and it keeps momentum while you correct course.
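One way to picture steering is a prompt builder that places the steering note ahead of the normal prompt. This is a hypothetical sketch, not Ralph's implementation. build_prompt is an illustrative name, and the header text and ordering are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steering (not Ralph's implementation): if STEERING.md
# has content, emit it ahead of PROMPT.md so the next iteration handles it first.

build_prompt() {
  # Usage: build_prompt [AGENT_DIR]   (defaults to .agent)
  local agent_dir="${1:-.agent}"
  local steering="$agent_dir/STEERING.md" prompt="$agent_dir/PROMPT.md"
  # -s: the file exists and is non-empty.
  if [ -s "$steering" ]; then
    printf '## Steering (handle this first)\n\n'
    cat "$steering"
    printf '\n\n'
  fi
  cat "$prompt"
}
```

PROMPT_CONTENT="$(build_prompt .agent)" would then feed the next iteration. Whether the steering file is cleared afterwards is up to the real script; this sketch never modifies it.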

Putting it together

A real Cursor loop, start to finish, is three commands:

# 1. authenticate once (creates the sandbox, you log in inside it)
./ralph.sh --login --agent cursor

# 2. confirm the sandbox name for network policies and debugging
./ralph.sh --print-name --agent cursor

# 3. run the loop with a model and force mode, inside the sandbox boundary
./ralph.sh --agent cursor -n 50 -- --model auto --force --sandbox disabled

That is the Cursor CLI agent running unattended: fresh context per iteration, state on disk, write access granted because the microVM is the real boundary, and a hard stop on a completion promise. Define your tasks in .agent/tasks.json, write a clear .agent/PROMPT.md, start the loop, and read the commits in the morning.

Frequently asked questions

How do I run the Cursor CLI agent in a loop?

Use Ralph and pass the agent flag: ./ralph.sh --agent cursor. Ralph wraps the headless cursor-agent in print mode, runs it inside a Docker Sandbox, starts a fresh context window each iteration, and repeats until every task in .agent/tasks.json is done or the iteration cap is reached. The default is 10 iterations; raise it with -n 50.

How do I pass a model to the Cursor agent through Ralph?

Put it after the -- separator. Anything to the right of -- is forwarded to cursor-agent, so ./ralph.sh --agent cursor -- --model auto runs cursor-agent in print mode with that model and the Ralph prompt. Run cursor-agent models or --list-models inside the sandbox to see the exact model identifiers Cursor accepts.

How do I log in to Cursor inside the sandbox?

Run ./ralph.sh --login --agent cursor. It drops you into the sandbox shell where you run cursor-agent login, or you can set the CURSOR_API_KEY environment variable. The credential persists in the sandbox named ralph-cursor-<project>-<hash8>, so future runs attach to the same box already logged in.

Is it safe to run the Cursor agent in --force or --yolo mode?

It is unsafe on your laptop and reasonable inside a sandbox. Ralph runs cursor-agent in an isolated Docker Sandbox microVM with network denied by default, so force mode lets the agent run commands without pausing for approval while staying unable to touch your real files or exfiltrate data. The sandbox is the boundary, not the agent.

How does the loop know when to stop?

Cursor emits a promise tag in its final message. COMPLETE stops the loop with exit code 0, BLOCKED exits with 2, and DECIDE exits with 3. Hitting the iteration cap without completing exits with 1. You branch on those exit codes in a wrapper script or CI.

Run your own Ralph loop

Ralph is a hackable script you point at your project. Install it and let an agent work through your task list.

npx @pageai/ralph-loop
