How to Run the Codex CLI in an Autonomous Loop

Terminal running the OpenAI Codex CLI in a Ralph loop inside a Docker sandbox

Feb 26, 2026 - 13 min read - 2700 words

Creator of RalphLoop.sh, founder of PageAI

To run OpenAI’s Codex CLI in an autonomous loop, point Ralph at it with one flag: ./ralph.sh --agent codex. Ralph wraps codex exec (the non-interactive mode), starts the agent with a fresh context window each iteration, and keeps re-running it against your task list until the work is done or you hit the iteration cap. Pass Codex’s own flags after a -- separator, log in once inside the sandbox, and let it grind.

This is the Codex-specific walkthrough in the larger guide to agentic coding CLIs. The loop mechanics are identical to running Claude Code in a loop; only the agent binary and its flags change. If you want the full setup from an empty repo, follow how to run a Ralph loop with the Codex CLI; this post is the flag-level reference.

Run the Codex CLI in a loop with one flag

Ralph is a Bash script you point at a project. The default agent is Claude, so you switch to Codex explicitly:

./ralph.sh --agent codex

That runs 10 iterations, the default. Tune the count when you want a longer unattended run or a single dry run:

# 50 iterations
./ralph.sh --agent codex -n 50

# exactly one iteration (good for a smoke test)
./ralph.sh --agent codex --once

# explicit cap
./ralph.sh --agent codex --max-iterations 5

The short form -a works too: ./ralph.sh -a codex -n 20. Supported agents are claude (default), codex, copilot, cursor, gemini, and opencode, so the same harness drives any of them.

Under the hood, Ralph builds a codex exec command and runs it inside a Docker Sandbox. The expansion for ./ralph.sh --agent codex looks like this:

sbx run --name ralph-codex-<project>-<hash8> codex . -- exec "$PROMPT_CONTENT"

codex exec is the headless mode meant for automation and CI. It reads a prompt, does the work, prints a final message, and exits. That exit is what lets Ralph treat each iteration as a discrete unit. See the Codex CLI command reference for the full exec surface.

Pass a model and flags after the -- separator

Anything after Ralph’s own -- separator is forwarded straight to the agent. For Codex, Ralph inserts those arguments right after exec, before the prompt. So this:

./ralph.sh --agent codex -- --model gpt-5.5

expands to:

sbx run --name ralph-codex-<project>-<hash8> codex . -- exec --model gpt-5.5 "$PROMPT_CONTENT"

The --model flag (short form -m) picks the model for the run. Use the separator for any valid codex exec flag, not just the model. A few you will reach for in a loop:

# pick a model
./ralph.sh --agent codex -- --model gpt-5.5

# let Codex write to the workspace and never pause for approval
./ralph.sh --agent codex -- --sandbox workspace-write --ask-for-approval never

# stream newline-delimited JSON events instead of formatted text
./ralph.sh --agent codex -- --model gpt-5.5 --json

The rule to remember: everything left of -- configures Ralph (agent, iteration count, login). Everything right of -- configures the agent. Keep them on the correct side and the loop behaves.
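To make the left/right rule concrete, here is a minimal sketch of how a wrapper script could split its arguments at the first --. It is not Ralph's actual source; the function name split_at_separator is hypothetical, and joining arguments into strings drops quoting, which is fine for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not Ralph's real implementation:
# split an argument list at the first "--".

# Prints harness-side arguments on line 1 and agent-side arguments on line 2.
split_at_separator() {
  harness_args=""
  agent_args=""
  seen_separator=0
  for arg in "$@"; do
    # Only the first "--" acts as the separator.
    if [ "$seen_separator" -eq 0 ] && [ "$arg" = "--" ]; then
      seen_separator=1
      continue
    fi
    if [ "$seen_separator" -eq 0 ]; then
      harness_args="${harness_args:+$harness_args }$arg"
    else
      agent_args="${agent_args:+$agent_args }$arg"
    fi
  done
  printf 'harness: %s\n' "$harness_args"
  printf 'agent:   %s\n' "$agent_args"
}

split_at_separator --agent codex -n 50 -- --model gpt-5.5 --json
```

Everything on the harness line would configure the loop; everything on the agent line would land after exec in the expanded command.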

Capture machine-readable output for CI

When you run Codex in CI rather than at your terminal, two flags make the output easy to consume. --json switches codex exec from formatted text to newline-delimited JSON events, one per state change (text messages, approval requests, turn_complete, and error). --output-last-message <file> writes the agent’s final natural-language summary to a file you can read after the run finishes. Pair them through the separator:

./ralph.sh --agent codex -- --model gpt-5.5 --json --output-last-message .agent/last-message.txt

Ralph already records each iteration’s cleaned output to .agent/history/ and the running log to .agent/logs/LOG.md, so you get a per-iteration trail regardless of these flags. The Codex JSON stream is useful when a downstream step parses events with jq, for example to fail a pipeline on an error event or to gate a deploy on turn_complete. This pairs naturally with the exit codes Ralph returns, which I cover below.
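As a rough sketch of that kind of gate, the helper below checks a captured event stream for a given event type. It assumes each event is a single-line JSON object with a "type" field, which you should confirm against real output from your Codex version. The name has_event and the file events.jsonl are hypothetical, and jq would be a more robust parser than grep for anything beyond a quick check.

```bash
#!/usr/bin/env bash
# Assumption: one JSON object per line, each with a "type" field,
# for example {"type":"error", ...}. Verify against your Codex output.

# Returns 0 if the stream on stdin contains at least one event of the given type.
has_event() {
  event_type="$1"
  grep -Eq "\"type\"[[:space:]]*:[[:space:]]*\"${event_type}\""
}

# Hypothetical CI usage, where events.jsonl holds the captured --json output:
#   if has_event error < events.jsonl; then
#     echo "agent reported an error" >&2
#     exit 1
#   fi
```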

Log in to Codex inside the sandbox

Codex runs inside an isolated Docker Sandbox, not on your host, so it needs credentials in that environment. Authenticate once with the login action:

./ralph.sh --login --agent codex

This prints the login command for every supported agent, highlights the one for Codex, and then drops you into the sandbox shell. Inside, you authenticate Codex once (run codex login, or provide your OpenAI API key). The credential persists in that named sandbox, so later runs attach to the same box and start already logged in.

Each agent gets its own deterministic sandbox name, derived from the agent slug, the project directory, and a hash of the absolute path:

ralph-<agent>-<project-dir>-<hash8>

For Codex that is ralph-codex-<project>-<hash8>. Print the exact name for your project without starting a run:

./ralph.sh --print-name --agent codex
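If it helps to see why the name is stable across runs, here is a hypothetical reconstruction of the scheme. The hash algorithm is an assumption (SHA-256, first 8 hex characters), so the output may not match Ralph's real names; --print-name stays the source of truth.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of ralph-<agent>-<project-dir>-<hash8>.
# ASSUMPTION: hash8 = first 8 hex chars of SHA-256 over the absolute path.

sandbox_name() {
  agent="$1"
  project_path="$2"                       # absolute path to the project
  project_dir=$(basename "$project_path")
  # sha256sum on most Linux systems, shasum on macOS.
  if command -v sha256sum >/dev/null 2>&1; then
    digest=$(printf '%s' "$project_path" | sha256sum)
  else
    digest=$(printf '%s' "$project_path" | shasum -a 256)
  fi
  hash8=$(printf '%s' "$digest" | cut -c1-8)
  printf 'ralph-%s-%s-%s\n' "$agent" "$project_dir" "$hash8"
}

sandbox_name codex "$PWD"
```

Because every input is derived from the agent slug and the path, the same project always maps to the same name, and two checkouts with the same directory name in different locations do not collide.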

Per-agent names matter because they keep state separate. Your Codex sandbox and your Claude sandbox never share credentials, history, or installed tools. If Codex is not authenticated when the loop starts, Ralph detects the auth failure, stops, and tells you to run ./ralph.sh --login --agent codex. No silent thrashing on a box that can never make progress.

Full auto, approval modes, and why the sandbox makes it safe

Here is the detail that trips people up: codex exec runs in a read-only sandbox by default. A read-only agent cannot edit files, so a loop that uses the default mode will spin without ever completing a task. You have to grant write access deliberately.

Codex exposes two relevant controls, documented in the Codex agent approvals and security guide:

--sandbox (short -s): one of read-only, workspace-write, or danger-full-access. This is Codex’s own internal sandbox.
--ask-for-approval (short -a): one of untrusted, on-request, or never. This decides when Codex pauses for a human.

For an unattended loop there is nobody to answer a prompt, so any approval mode that pauses will stall the run. Two combinations work:

# write inside the workspace, never pause
./ralph.sh --agent codex -- --sandbox workspace-write --ask-for-approval never

# disable Codex's own gates entirely (alias: --yolo)
./ralph.sh --agent codex -- --dangerously-bypass-approvals-and-sandbox

--full-auto is a deprecated compatibility flag that maps to workspace-write plus on-request approval, and Codex prints a warning when you use it. Prefer the explicit flags above.
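The reasoning above fits in a small pre-flight check. This is a sketch under the rules stated in this section, with a hypothetical function name; it only inspects the two values you plan to pass and does not talk to Codex.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for an unattended run.
# Returns 0 when the --sandbox / --ask-for-approval pair can make progress
# with nobody at the keyboard, 1 otherwise.
is_unattended_safe() {
  sandbox_mode="$1"
  approval_mode="$2"
  # A read-only agent cannot edit files, so no task would ever complete.
  [ "$sandbox_mode" = "read-only" ] && return 1
  # Any approval mode other than "never" may pause for a human who is not there.
  [ "$approval_mode" = "never" ] || return 1
  return 0
}

# Example:
#   is_unattended_safe workspace-write never && echo "ok to loop"
```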

The bypass flag sounds reckless, and on your laptop it is. OpenAI is explicit that --dangerously-bypass-approvals-and-sandbox should only run inside an externally hardened environment, like a clean Docker container or a dedicated CI runner. A Ralph loop runs Codex inside exactly that: a Docker Sandbox microVM with its own kernel, an isolated filesystem, and a network that is deny-by-default. The sandbox is the boundary, so you do not need Codex policing itself.

This is the same principle behind running any agent in bypass mode safely. The blast radius is the microVM, not your machine. For the full argument and the network policy details, read how to run AI agents in Docker sandboxes safely.

A practical bonus: bypassing Codex’s own sandbox avoids nested-sandbox friction. Running a seatbelt or landlock sandbox inside a microVM can fail or behave oddly, and you gain nothing because the microVM already contains the agent. Let the sandbox do the isolating and let Codex do the coding.

When the agent needs a package, the deny-by-default network blocks it until you allow the domain:

sbx policy allow network ralph-codex-<project>-<hash8> registry.npmjs.org

That is a feature. The agent can install what the task needs without a path to exfiltrate your source or reach arbitrary hosts. The Docker Sandboxes documentation covers the policy model in full.
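When a task needs more than one domain, it can help to review the allowlist before applying it. The helper below is a hypothetical dry-run sketch: it only prints the sbx policy commands shown above, one per domain, and runs nothing.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints one allow command per domain
# so the list can be reviewed before anything is applied.
print_allow_commands() {
  sandbox="$1"
  shift
  for domain in "$@"; do
    printf 'sbx policy allow network %s %s\n' "$sandbox" "$domain"
  done
}

# Replace the placeholder with the name from ./ralph.sh --print-name --agent codex
print_allow_commands "ralph-codex-<project>-<hash8>" registry.npmjs.org
```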

What happens each iteration

Ralph’s loop is the Bash loop Geoffrey Huntley described in the original Ralph writeup. Each pass is mechanical and identical:

1. Find the highest-priority incomplete task in .agent/tasks.json.
2. Work the steps in .agent/tasks/TASK-{ID}.json.
3. Run tests, linting, and type checking.
4. Complete the task, take a screenshot, update the task status, and commit.
5. Repeat until all tasks pass or the iteration cap is reached.

The critical part is that each iteration spawns a fresh codex exec with a clean context window. The agent does not carry a bloated, hours-long transcript from one task to the next. It reads the current state from disk, does one task, and exits.

flowchart TD
    Start(["./ralph.sh --agent codex"]) --> Pick["Pick top task from .agent/tasks.json"]
    Pick --> Spawn["sbx run codex . -- exec (fresh context)"]
    Spawn --> Work["Codex reads state, edits files, runs commands"]
    Work --> Verify["Run tests, lint, type check, screenshot"]
    Verify --> Commit["Commit and update task status"]
    Commit --> Check{"Promise tag emitted?"}
    Check -->|"none"| Pick
    Check -->|"COMPLETE"| Done(["exit 0, all tasks done"])
    Check -->|"BLOCKED or DECIDE"| Stop(["exit 2 or 3, wants a human"])

The filesystem and git history are the memory layer. Progress lives in .agent/tasks.json, .agent/logs/LOG.md, per-task spec files, and the git log, not in a chat transcript. That is what keeps a fresh-context agent oriented across dozens of iterations. The deeper version of this idea is in the guide to running an AI coding agent overnight and the broader pattern of context engineering for long-running agents.
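Here is a toy version of that pattern, not Ralph's code. A plain-text task file (one pending task per line) stands in for .agent/tasks.json, and a hypothetical fake_agent function stands in for the real codex exec call. The point is only that each iteration is a fresh process and everything it knows comes from disk.

```bash
#!/usr/bin/env bash
# Toy sketch with hypothetical stand-ins for the task file and the agent.

# Completes exactly one task: logs the first line of the task file, then removes it.
fake_agent() {
  task_file="$1"
  log_file="$2"
  task=$(head -n 1 "$task_file")
  printf 'done: %s\n' "$task" >> "$log_file"
  tail -n +2 "$task_file" > "$task_file.tmp" && mv "$task_file.tmp" "$task_file"
}

# Runs one agent process per iteration until no tasks remain (return 0)
# or the iteration cap is hit with work still pending (return 1).
run_loop() {
  task_file="$1"
  log_file="$2"
  max_iterations="$3"
  i=0
  while [ "$i" -lt "$max_iterations" ]; do
    # State is re-read from disk every pass; nothing is kept in memory.
    [ -s "$task_file" ] || return 0
    ( fake_agent "$task_file" "$log_file" )   # subshell = fresh process
    i=$((i + 1))
  done
  [ -s "$task_file" ] || return 0
  return 1
}

# Example:
#   printf 'add login\nfix footer\n' > tasks.txt
#   run_loop tasks.txt log.txt 10
```

The subshell is the stand-in for a clean context window: the agent function cannot see anything from the previous pass except what was written to the task and log files.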

A loop also needs a stop condition that is a signal, not a vibe. Codex emits a semantic promise tag in its final message, and Ralph reads it:

<promise>COMPLETE</promise> means every task is finished.
<promise>BLOCKED:reason</promise> means the agent needs human help.
<promise>DECIDE:question</promise> means it needs a decision you have to make.

Those map to exit codes: 0 for COMPLETE, 1 for hitting MAX_ITERATIONS, 2 for BLOCKED, and 3 for DECIDE. Wire those into a wrapper script or a CI step and you get clean branching: ship on 0, page yourself on 2 or 3, extend the cap on 1.
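A minimal sketch of that mapping and the branching, assuming the tag formats and exit codes listed above. Both function names are hypothetical, and the return value 4 for "no tag found" is made up here to mean "keep looping"; it is not a Ralph exit code.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the tags and codes 0-3 come from the text above.

# Maps an agent's final message to a status: 0 COMPLETE, 2 BLOCKED, 3 DECIDE.
# Returns 4 (made-up value) when no promise tag is present.
promise_status() {
  case "$1" in
    *"<promise>COMPLETE</promise>"*) return 0 ;;
    *"<promise>BLOCKED:"*"</promise>"*) return 2 ;;
    *"<promise>DECIDE:"*"</promise>"*) return 3 ;;
    *) return 4 ;;
  esac
}

# Suggests a next step for the loop's exit code.
next_action() {
  case "$1" in
    0) echo "ship" ;;
    1) echo "extend the cap" ;;
    2|3) echo "page a human" ;;
    *) echo "unknown exit code" ;;
  esac
}

# Example wrapper usage:
#   ./ralph.sh --agent codex -n 50
#   next_action "$?"
```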

One rule keeps the whole thing reliable: one task per invocation. Codex completes exactly one task, commits, and stops. It never batches several tasks into a single iteration, which is what keeps each commit small, each diff reviewable, and each context window focused.

Verify every iteration so the loop trusts itself

A loop is only as good as its feedback. If Codex cannot tell whether its change worked, it will happily mark a broken task done and move on. The repo mantra is blunt: if you didn’t test it, it doesn’t work.

Ralph assumes a verification stack and runs it inside step three of every iteration:

Playwright for end-to-end tests.
Vitest for unit tests.
TypeScript for type checking.
ESLint for linting.
Prettier for formatting.

When you run Codex with --sandbox workspace-write or the bypass flag, it can run those commands itself, read the failures, and fix them before committing. Screenshots add a second channel: the agent captures the UI state so you can eyeball the result in the morning instead of reading diffs blind.
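One way to picture the gate is a fail-fast runner that stops at the first red check. This is a sketch, not what Ralph ships: run_checks is a hypothetical name, and the commented commands assume each tool is installed in the project and runnable through npx.

```bash
#!/usr/bin/env bash
# Hypothetical fail-fast gate: runs each command string in order and
# stops at the first failure, so later steps only run on a green tree.
run_checks() {
  for check in "$@"; do
    printf '==> %s\n' "$check"
    if ! sh -c "$check"; then
      printf 'FAILED: %s\n' "$check" >&2
      return 1
    fi
  done
}

# Example invocation; adjust to your project's scripts:
#   run_checks "npx tsc --noEmit" "npx eslint ." "npx prettier --check ." \
#              "npx vitest run" "npx playwright test" \
#     && echo "safe to commit"
```

Ordering the cheap checks first (types, lint, formatting) keeps a failing iteration short before the slower end-to-end suite runs.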

Because every iteration starts fresh, verification is also how the next iteration learns what the last one did. The agent does not remember the previous run. It reads the test results, the updated task status, and the new commits, then decides what is next. That feedback loop is the whole point, and it generalizes across agents. The same verification discipline applies when you run the Gemini CLI in a loop or any other CLI in this family.

Inspect and debug the Codex sandbox

When a run stalls or a task keeps failing, get inside the box. Although the sandbox is a microVM, you can poke at it much like a normal container. List what exists:

sbx ls

Open a shell in the Codex sandbox and look around:

sbx exec -it ralph-codex-<project>-<hash8> bash

From there you can check the working tree, re-run a failing test by hand, inspect installed tools, or read .agent/logs/LOG.md and the per-iteration logs in .agent/history/. Reattach to a sandbox session with:

sbx run ralph-codex-<project>-<hash8>

Most stalls trace back to one of three things: Codex was never granted write access (still in read-only mode), a network policy is blocking an install, or the prompt in .agent/PROMPT.md lacks a clear completion criterion. The sandbox shell shows you which one it is.

If you need to redirect a running loop without killing it, edit .agent/STEERING.md. Ralph reads it and folds critical work into the next iteration before resuming the normal task list. That is steering, not stopping, and it keeps momentum while you correct course.

Putting it together

A real Codex loop, start to finish, is three commands:

# 1. authenticate once (creates the sandbox, you log in inside it)
./ralph.sh --login --agent codex

# 2. confirm the sandbox name for network policies and debugging
./ralph.sh --print-name --agent codex

# 3. run the loop with a model and write access, inside the sandbox boundary
./ralph.sh --agent codex -n 50 -- --model gpt-5.5 --dangerously-bypass-approvals-and-sandbox

That is OpenAI’s Codex CLI running unattended: fresh context per iteration, state on disk, write access granted because the microVM is the real boundary, and a hard stop on a completion promise. Define your tasks in .agent/tasks.json, write a clear .agent/PROMPT.md, and let it work.

Frequently asked questions

How do I run the Codex CLI in a loop?

Use Ralph and pass the agent flag: ./ralph.sh --agent codex. Ralph wraps codex exec, runs it inside a Docker Sandbox, starts a fresh context window each iteration, and repeats until every task in .agent/tasks.json is done or the iteration cap is reached. The default is 10 iterations; raise it with -n 50.

How do I pass a model to Codex through Ralph?

Put it after the -- separator. Anything to the right of -- is forwarded to the agent, so ./ralph.sh --agent codex -- --model gpt-5.5 expands to codex exec --model gpt-5.5 with the Ralph prompt. The same separator works for any valid codex exec flag.

Why does my Codex loop never edit any files?

By default codex exec runs in a read-only sandbox, so it cannot write. Grant write access after the separator with --sandbox workspace-write --ask-for-approval never, or use --dangerously-bypass-approvals-and-sandbox. The Docker Sandbox microVM is the real boundary, so the bypass flag is safe here in a way it is not on your laptop.

How do I log in to Codex inside the sandbox?

Run ./ralph.sh --login --agent codex. It drops you into the sandbox shell where you authenticate Codex once. The credential persists in that named sandbox, so future runs attach to the same box already logged in. Each agent has its own sandbox; the Codex one is named ralph-codex-<project>-<hash8>.

How does the loop know when to stop?

Codex emits a promise tag in its final message. <promise>COMPLETE</promise> stops the loop with exit code 0, BLOCKED exits with 2, and DECIDE exits with 3. Hitting the iteration cap without completing exits with 1. You branch on those exit codes in a wrapper script or CI.

Run your own Ralph loop

Ralph is a hackable script you point at your project. Install it and let an agent work through your task list.

npx @pageai/ralph-loop
