How to Run the Gemini CLI in an Autonomous Coding Loop
To run Google’s Gemini CLI as an autonomous coding agent, point Ralph at it with one flag: ./ralph.sh --agent gemini. Ralph runs the Gemini CLI in its non-interactive prompt mode inside a Docker Sandbox, starts the agent with a fresh context window every iteration, and keeps re-running it against your task list until the work is done or you hit the iteration cap. Pick a model after the -- separator, authenticate once inside the sandbox, and walk away.
This is the Gemini-specific walkthrough in the larger guide to agentic coding CLIs. The loop mechanics are identical to running the Codex CLI in a loop and running the Cursor CLI agent in a loop. Only the agent binary and its flags change.
Run the Gemini CLI in a loop with one flag
Ralph is a Bash script you point at a project. Claude is the default agent, so you switch to Gemini explicitly:
./ralph.sh --agent gemini
That runs 10 iterations, the default. Change the count when you want a longer unattended session or a single smoke test:
# 50 iterations
./ralph.sh --agent gemini -n 50
# exactly one iteration (good for a dry run)
./ralph.sh --agent gemini --once
# explicit cap
./ralph.sh --agent gemini --max-iterations 5
The short form -a works too: ./ralph.sh -a gemini -n 20. Supported agents are claude (default), codex, copilot, cursor, gemini, and opencode, so the same harness drives any of them with the same flags.
Under the hood, Ralph builds a Gemini command and runs it inside a Docker Sandbox. The expansion for ./ralph.sh --agent gemini looks like this:
sbx run --name ralph-gemini-<project>-<hash8> gemini . -- -p "$PROMPT_CONTENT"
The -p flag (long form --prompt) is the Gemini CLI’s non-interactive mode. It reads a prompt, does the work, prints a final message, and exits. That clean exit is what lets Ralph treat each iteration as a discrete unit instead of one long interactive session. See the Gemini CLI documentation for the full command surface.
Select a Gemini model after the -- separator
Anything to the right of Ralph’s own -- separator is forwarded straight to the agent. For Gemini, Ralph inserts those arguments after the sandbox’s --, before the -p prompt. So this:
./ralph.sh -a gemini -- --model pro
expands to:
sbx run --name ralph-gemini-<project>-<hash8> gemini . -- --model pro -p "$PROMPT_CONTENT"
The --model flag (short form -m) picks the model for the run. Use the separator for any valid Gemini CLI flag, not just the model:
# pick a model
./ralph.sh -a gemini -- --model pro
# combine the model with a longer run
./ralph.sh -a gemini -n 50 -- --model pro
The rule to remember: everything left of -- configures Ralph (agent, iteration count, login). Everything right of -- configures Gemini. Keep your arguments on the correct side and the loop behaves.
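The left/right rule is easy to picture as a small argument splitter. What follows is a minimal sketch, not Ralph's actual parsing code: split_at_separator, RALPH_ARGS, and AGENT_ARGS are hypothetical names, and the sketch only prints the assembled command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: split a command line at the first "--".
# split_at_separator, RALPH_ARGS and AGENT_ARGS are illustrative names,
# not identifiers taken from Ralph itself.
split_at_separator() {
  RALPH_ARGS=()
  AGENT_ARGS=()
  local seen_separator=0 arg
  for arg in "$@"; do
    if (( ! seen_separator )) && [[ "$arg" == "--" ]]; then
      seen_separator=1          # the first "--" switches sides and is dropped
    elif (( seen_separator )); then
      AGENT_ARGS+=("$arg")      # right of "--": forwarded to Gemini
    else
      RALPH_ARGS+=("$arg")      # left of "--": consumed by Ralph
    fi
  done
}

split_at_separator -a gemini -n 50 -- --model pro

# Assemble the sandbox command in the order described above:
# the sandbox's own "--", then the forwarded flags, then the -p prompt.
cmd=(sbx run --name "ralph-gemini-<project>-<hash8>" gemini . --
     "${AGENT_ARGS[@]}" -p '$PROMPT_CONTENT')
printf '%s ' "${cmd[@]}"
echo
```

Keeping each side in its own array preserves quoting, so a forwarded value that contains spaces stays a single argument.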
Log in to Gemini inside the sandbox
Gemini runs inside an isolated Docker Sandbox, not on your host, so it needs credentials in that environment. Authenticate once with the login action:
./ralph.sh --login --agent gemini
This prints the login command for every supported agent, highlights the one for Gemini, and drops you into the sandbox shell. Inside, you run gemini once and complete its authentication (sign in with your Google account or set a GEMINI_API_KEY). The credential persists in that named sandbox, so later runs attach to the same box and start already authenticated.
Each agent gets its own deterministic sandbox name, derived from the agent slug, the project directory, and a hash of the absolute path:
ralph-<agent>-<project-dir>-<hash8>
For Gemini that is ralph-gemini-<project>-<hash8>. Print the exact name for your project without starting a run:
./ralph.sh --print-name --agent gemini
Per-agent names matter because they keep state separate. Your Gemini sandbox and your Claude sandbox never share credentials, history, or installed tools. If Gemini is not authenticated when the loop starts, Ralph watches for auth-failure patterns like API key not valid, stops, and tells you to run ./ralph.sh --login --agent gemini. No silent thrashing on a box that can never make progress.
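The naming scheme can be sketched in a few lines of Bash. Treat this as an illustration rather than Ralph's code: sandbox_name is a made-up helper, and the hash here is a cksum CRC rendered as eight hex digits, which is an assumption. The article does not say which hash Ralph uses, so the names this prints will not match the output of --print-name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build ralph-<agent>-<project-dir>-<hash8>.
# ASSUMPTION: the 8-character hash is a CRC from cksum; Ralph's real
# hash function is not documented here and is probably different.
sandbox_name() {
  local agent="$1" abs_path="$2"
  local project crc hash8
  project="$(basename "$abs_path")"
  # cksum prints "<crc> <byte-count>"; keep the CRC and format it as 8 hex digits.
  crc="$(printf '%s' "$abs_path" | cksum | cut -d' ' -f1)"
  hash8="$(printf '%08x' "$crc")"
  printf 'ralph-%s-%s-%s\n' "$agent" "$project" "$hash8"
}

sandbox_name gemini /home/me/code/my-app
sandbox_name claude /home/me/code/my-app
```

Hashing the absolute path is what keeps two checkouts with the same directory name from sharing a sandbox, while the agent slug keeps Gemini and Claude apart within one checkout.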
What happens each iteration (fresh context)
Ralph’s loop is the Bash loop Geoffrey Huntley described in the original Ralph writeup. Each pass is mechanical and identical:
1. Find the highest-priority incomplete task in .agent/tasks.json.
2. Work the steps in .agent/tasks/TASK-{ID}.json.
3. Run tests, linting, and type checking.
4. Complete the task, take a screenshot, update the task status, and commit.
5. Repeat until all tasks pass or the iteration cap is reached.
The critical part is that each iteration spawns a fresh Gemini process with a clean context window. The agent does not carry a bloated, hours-long transcript from one task to the next. It reads the current state from disk, does one task, and exits. That is the fix for context rot, the failure mode where an agent slowly loses the plot over a long session.
flowchart TD
Start(["./ralph.sh -a gemini -- --model pro"]) --> Pick["Pick top task from .agent/tasks.json"]
Pick --> Spawn["sbx run gemini . -- -p (fresh context)"]
Spawn --> Work["Gemini reads state from disk, edits files, runs commands"]
Work --> Verify["Run tests, lint, type check, screenshot"]
Verify --> Commit["Commit and update task status"]
Commit --> Check{"Promise tag emitted?"}
Check -->|"none"| Pick
Check -->|"COMPLETE"| Done(["exit 0, all tasks done"])
Check -->|"BLOCKED or DECIDE"| Stop(["exit 2 or 3, wants a human"])
The filesystem and git history are the memory layer. Progress lives in .agent/tasks.json, .agent/logs/LOG.md, per-task spec files, and the git log, not in a chat transcript. That is what keeps a fresh-context agent oriented across dozens of iterations.
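A toy example makes the "state on disk, not in the process" idea concrete. In this sketch, fake_agent is a stand-in for the real Gemini run and the counter file is a stand-in for .agent/tasks.json and friends; both are assumptions made for illustration. Each call starts a brand-new bash process that knows nothing except what it reads from the file.

```shell
#!/usr/bin/env bash
# Toy model of a fresh-context loop. fake_agent and the counter file are
# stand-ins (assumptions), not part of Ralph or the Gemini CLI.
state_file="$(mktemp)"
echo 0 > "$state_file"

fake_agent() {
  # A new process every time: the only thing it knows is what is on disk.
  bash -c 'n=$(cat "$1"); echo $((n + 1)) > "$1"' _ "$1"
}

for iteration in 1 2 3; do
  fake_agent "$state_file"
done

final_count="$(cat "$state_file")"
echo "tasks completed according to disk: $final_count"   # prints 3
rm -f "$state_file"
```

No variable survives from one fake_agent call to the next, yet progress accumulates, because every process reads and writes the same file.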
A loop also needs a stop condition that is a signal, not a vibe. Gemini emits a semantic promise tag in its final message, and Ralph reads it:
- <promise>COMPLETE</promise> means every task is finished.
- <promise>BLOCKED:reason</promise> means the agent needs human help.
- <promise>DECIDE:question</promise> means it needs a decision you have to make.
Those map to exit codes: 0 for COMPLETE, 1 for hitting MAX_ITERATIONS, 2 for BLOCKED, and 3 for DECIDE. Wire those into a wrapper script or a CI step and you get clean branching: ship on 0, page yourself on 2 or 3, extend the cap on 1.
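The stop logic can be sketched as a loop that scans each final message for a promise tag. This is a simplified model under stated assumptions: run_loop and fake_agent are hypothetical names, and the real Ralph script may match the tags differently. Only the tags and exit codes come from the description above.

```shell
#!/usr/bin/env bash
# Sketch of the stop condition. run_loop and fake_agent are hypothetical;
# the second argument names a command that prints the agent's final message.
run_loop() {
  local max_iterations="$1" run_iteration="$2"
  local i final_message
  for (( i = 1; i <= max_iterations; i++ )); do
    final_message="$("$run_iteration" "$i")"
    case "$final_message" in
      *"<promise>COMPLETE</promise>"*) return 0 ;;
      *"<promise>BLOCKED:"*)           return 2 ;;
      *"<promise>DECIDE:"*)            return 3 ;;
    esac
    # No promise tag: fall through and start the next iteration.
  done
  return 1   # cap reached without a COMPLETE promise
}

# Stand-in agent that finishes on its third iteration.
fake_agent() {
  if [ "$1" -ge 3 ]; then
    echo "All tasks pass. <promise>COMPLETE</promise>"
  else
    echo "Finished one task."
  fi
}

status=0
run_loop 10 fake_agent || status=$?
case "$status" in
  0)   echo "ship it" ;;
  1)   echo "cap reached: extend it" ;;
  2|3) echo "needs a human" ;;
esac
```

With a cap of 10 the stand-in agent completes on iteration three, so the wrapper takes the 0 branch. Drop the cap to 2 and the same agent exits with 1 instead.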
One rule keeps the whole thing reliable: one task per invocation. Gemini completes exactly one task, commits, and stops. It never batches several tasks into a single iteration, which is what keeps each commit small, each diff reviewable, and each context window focused on a single goal.
Verify every iteration with tests and screenshots
A loop is only as good as its feedback. If Gemini cannot tell whether its change worked, it will happily mark a broken task done and move on. The repo mantra is blunt: if you didn’t test it, it doesn’t work.
Ralph assumes a verification stack and runs it inside step three of every iteration:
- Playwright for end-to-end tests.
- Vitest for unit tests.
- TypeScript for type checking.
- ESLint for linting.
- Prettier for formatting.
Gemini runs those commands itself, reads the failures, and fixes them before committing. Screenshots add a second channel: the agent captures the UI state so you can eyeball the result in the morning instead of reading diffs blind.
Because every iteration starts fresh, verification is also how the next iteration learns what the last one did. The agent does not remember the previous run. It reads the test results, the updated task status, and the new commits, then decides what is next. That feedback loop is the whole point, and it works the same across every agent in this family.
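A verification gate of this kind boils down to "run each check, stop at the first failure". The sketch below assumes a hypothetical run_checks helper. The npx commands in the comment are the usual invocations for those tools, but whether your project wires them up that way is an assumption, so the runnable demo uses trivial commands instead.

```shell
#!/usr/bin/env bash
# Hypothetical gate: run each check in order and stop at the first failure.
run_checks() {
  local check
  for check in "$@"; do
    echo "running: $check"
    if ! bash -c "$check"; then
      echo "failed: $check" >&2
      return 1
    fi
  done
}

# In a real project the list might look like this (assumed setup):
#   run_checks "npx tsc --noEmit" "npx eslint ." "npx prettier --check ." \
#              "npx vitest run" "npx playwright test"

# Runnable demo with placeholder checks:
if run_checks "true" "test 1 -eq 1"; then
  echo "all checks passed: safe to commit"
fi
```

Failing fast matters in a loop: the agent sees the first broken check right away and can fix it before spending time on slower end-to-end tests.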
Safe autonomy: the sandbox is the boundary
For an unattended loop there is nobody to approve a file write or a shell command, so the agent has to run without pausing for permission. On your laptop that is reckless. Inside a sandbox it is fine, because the blast radius is the microVM, not your machine.
A Ralph loop runs Gemini inside a Docker Sandbox: an isolated microVM with its own kernel, an isolated filesystem, and a network that is deny-by-default. The sandbox is the boundary you enforce, so you do not need the agent policing itself. For the full argument, including why a microVM beats a hand-rolled container, read how to run AI coding agents in Docker sandboxes safely.
When the agent needs a package, the deny-by-default network blocks it until you allow the domain:
sbx policy allow network ralph-gemini-<project>-<hash8> registry.npmjs.org
That is a feature, not a hurdle. The agent can install what a task needs without a path to reach arbitrary hosts or exfiltrate your source. The Docker Sandboxes documentation covers the policy model in full, including the global -g form and the "**" wildcard for the rare case where you want to open everything.
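If a project needs more than one domain, it helps to keep the allow list in one place and review it before applying anything. This dry-run sketch only prints the commands. print_allow_rules is a hypothetical helper, and the command shape is copied from the example above rather than from the sbx reference.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) one allow rule per domain.
# print_allow_rules is a hypothetical helper, not part of sbx or Ralph.
print_allow_rules() {
  local sandbox="$1"; shift
  local domain
  for domain in "$@"; do
    printf 'sbx policy allow network %s %s\n' "$sandbox" "$domain"
  done
}

print_allow_rules "ralph-gemini-<project>-<hash8>" registry.npmjs.org
```

Once the printed list looks right, run the commands by hand or pipe them to a shell.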
Inspect and debug the Gemini sandbox
When a run stalls or a task keeps failing, get inside the box. The sandbox is a microVM rather than a plain container, but you can still open a shell in it and poke around. List what exists:
sbx ls
Open a shell in the Gemini sandbox and look around:
sbx exec -it ralph-gemini-<project>-<hash8> bash
From there you can check the working tree, re-run a failing test by hand, inspect installed tools, or read .agent/logs/LOG.md and the per-iteration logs in .agent/history/. Reattach to a sandbox session with:
sbx run ralph-gemini-<project>-<hash8>
Most stalls trace back to one of three things: Gemini was never authenticated (so every iteration fails the auth check), a network policy is blocking an install, or the prompt in .agent/PROMPT.md lacks a clear completion criterion. The sandbox shell shows you which one it is.
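The first of those causes is the easiest to check mechanically. Here is a minimal sketch of an auth-failure check, assuming a hypothetical looks_like_auth_failure helper and using only the one pattern this article mentions (API key not valid). Ralph's real pattern list is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical check: does this output look like an authentication failure?
# The single pattern comes from the text above; a real list would be longer.
looks_like_auth_failure() {
  # -i: ignore case, -F: fixed string, -q: no output, exit status only
  printf '%s\n' "$1" | grep -qiF -e "API key not valid"
}

sample="Error: API key not valid. Please pass a valid API key."
if looks_like_auth_failure "$sample"; then
  echo "auth problem: run ./ralph.sh --login --agent gemini"
fi
```

Paste a suspicious chunk of a per-iteration log into sample to see whether it would trip the check.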
If you need to redirect a running loop without killing it, edit .agent/STEERING.md. Ralph reads it and folds critical work into the next iteration before resuming the normal task list. That is steering, not stopping, and it keeps momentum while you correct course.
Putting it together
A real Gemini loop, start to finish, is three commands:
# 1. authenticate once (creates the sandbox, you sign in inside it)
./ralph.sh --login --agent gemini
# 2. confirm the sandbox name for network policies and debugging
./ralph.sh --print-name --agent gemini
# 3. run the loop with a model, inside the sandbox boundary
./ralph.sh -a gemini -n 50 -- --model pro
That is Google’s Gemini CLI running unattended: a fresh context per iteration, state on disk, the microVM as the real boundary, and a hard stop on a completion promise. Define your tasks in .agent/tasks.json, write a clear .agent/PROMPT.md, and let it work through the list while you do something else.
Frequently asked questions
How do I run the Gemini CLI in a coding loop?
Use Ralph and pass the agent flag: ./ralph.sh --agent gemini. Ralph runs the Gemini CLI in its non-interactive -p prompt mode inside a Docker Sandbox, starts a fresh context window each iteration, and repeats until every task in .agent/tasks.json is done or the iteration cap is reached. The default is 10 iterations; raise it with -n 50.
How do I choose a Gemini model through Ralph?
Put it after the -- separator. Anything to the right of -- is forwarded to the agent, so ./ralph.sh -a gemini -- --model pro expands to gemini . -- --model pro -p with the Ralph prompt. The same separator works for any valid Gemini CLI flag.
How do I log in to Gemini inside the sandbox?
Run ./ralph.sh --login --agent gemini. It drops you into the sandbox shell where you run gemini once and complete its authentication, either by signing in with your Google account or setting a GEMINI_API_KEY. The credential persists in that named sandbox, so future runs attach to the same box already authenticated. The sandbox is named ralph-gemini-<project>-<hash8>.
Why does the loop start each iteration with a clean context?
Long sessions rot. An agent that carries an hours-long transcript loses track of the goal. Ralph spawns a fresh Gemini process every iteration with a clean context window, and the agent rebuilds its understanding from disk: .agent/tasks.json, the task spec files, .agent/logs/LOG.md, and the git history. That filesystem state is the memory layer, not the chat.
How does the loop know when to stop?
Gemini emits a promise tag in its final message. <promise>COMPLETE</promise> stops the loop with exit code 0, BLOCKED exits with 2, and DECIDE exits with 3. Hitting the iteration cap without completing exits with 1. You branch on those exit codes in a wrapper script or CI.