RALPH LOOP

Why You Should Sandbox Every Autonomous Coding Agent

A comparison of an AI coding agent running directly on a host machine versus running inside a sandbox, showing the difference in blast radius.

Sandbox every autonomous coding agent because the agent runs with your permissions. When you start an agent on your laptop, it does not run as some restricted service account. It runs as your user, which means it can read your SSH keys, your cloud credentials, and your full git history, and it can execute any shell command those permissions allow. A sandbox moves all of that out of reach, so the worst case becomes a throwaway environment you reset instead of a credential leak you cannot undo. That is the entire argument for sandboxing AI agent work, and the rest of this post is the detail behind it.

An agent runs as you, with everything you can touch

Think about what your own user account can do on a normal dev machine. You can read ~/.ssh/id_ed25519, the private key that authenticates you to GitHub and every server you SSH into. You can read ~/.aws/credentials, ~/.config/gh/hosts.yml, .env files scattered across projects, and the browser session cookies sitting in your home directory. You can git push --force, delete directories, and run a curl | bash line you copied from a README.

An autonomous agent inherits all of it. The model is not malicious. It is a probabilistic system that emits shell commands, and shell commands do not ask for intent. When the agent decides to “clean up some old files” or “reset the environment to a known state,” there is no separate permission layer protecting your home directory. The agent is you, and you have access to everything.

This is fine when a human drives each command and reviews it. Autonomy removes the human from that loop on purpose. The whole point of a Ralph-style loop is that the agent works for hours without you watching, picking up the next task in .agent/tasks.json, running commands, and committing. The thing that made interactive use safe (you reading each command before approving it) is exactly the thing autonomy deletes.

So the question is not whether the agent will eventually run a command you would not have approved. Over a long enough run, it will. The question is what that command can reach when it does.

The blast radius problem with YOLO mode on your laptop

To run unattended, an agent has to stop asking permission. In Claude Code that switch is --dangerously-skip-permissions (or the equivalent --permission-mode bypassPermissions, documented in the Claude Code docs). Other CLIs ship their own version of the same idea, usually called YOLO mode. The naming is honest: you are telling the agent to execute whatever it decides, with no confirmation, for the entire run.

On your host that is a genuinely bad idea, and the danger scales with the length of the run. One iteration is a handful of commands. A fifty-iteration overnight run is hundreds of commands, each one a chance for a hallucinated path, a destructive cleanup, or a misread instruction. You will not be awake to catch the one that matters.

The damage is not limited to the project either. Because the agent runs as you, a single bad command can reach far outside the directory you pointed it at:

  • It can read secrets from other projects and from your home directory, then paste them into a file, a log, or an outbound request.
  • It can push to remotes you have credentials for, including repositories that have nothing to do with the current task.
  • It can delete or rewrite files anywhere your user can write, with no undo.
  • It can install and execute arbitrary code pulled from the network.

None of these require the model to “go rogue.” They are the normal capabilities of your shell, exposed to a system running without a gate. Running bypass-permissions mode directly on a host is the failure mode, not a feature. For the safe version of that same flag, see running agents in YOLO mode safely.

flowchart LR
  subgraph NoSandbox["Without a sandbox: blast radius is your whole machine"]
    AgentA["Agent in YOLO mode (runs as you)"]
    AgentA --> Keys["SSH keys, cloud creds, cookies"]
    AgentA --> Repos["Every other git repo"]
    AgentA --> Net["Unrestricted network"]
    AgentA --> Proj1["Project directory"]
  end
  subgraph WithSandbox["With a sandbox: blast radius is one disposable microVM"]
    AgentB["Agent in YOLO mode (contained)"]
    AgentB --> Proj2["Project directory (only this)"]
    AgentB --> Gate["Network gate: deny-by-default"]
    AgentB -. "no path to" .-> HostKeys["Host keys and creds"]
    AgentB -. "no path to" .-> HostRepos["Other repos"]
  end

The two pictures use the same agent in the same mode. The only thing that changes is what “anything it can do” resolves to. On the left it resolves to your machine. On the right it resolves to a microVM you can throw away.

A sandbox does not make the agent trustworthy. It makes trust unnecessary. You stop trying to predict every command the model might emit and instead make the set of things any command can reach small and disposable. That inversion is what lets you walk away from a running agent and sleep.

Concretely, a sandbox shrinks the worst case from “the agent leaked my SSH key and force-pushed to a client repo” to “the agent trashed a throwaway environment, so I reset it and re-ran the loop.” The first outcome is a security incident. The second is a Tuesday. Same model, same flags, completely different cost when something goes wrong.

This is also why a sandbox is the precondition for real autonomy rather than a nice-to-have. Long unattended runs are the entire premise of running an AI coding agent overnight, and you cannot responsibly leave an agent running for hours unless the place it runs is one you can afford to lose. The sandbox is what makes the overnight run a reasonable decision instead of a gamble with your credentials.

There is a structural point underneath this. A permission prompt is not a security boundary; it is a question. The only real boundary is one enforced from outside the agent, by something the agent cannot reach through or talk its way around. A sandbox is that external boundary. The full version of this argument lives in the pillar guide, how to run AI coding agents in Docker Sandboxes safely.

How Ralph uses Docker Sandboxes by default

Ralph does not bolt on isolation as an option you remember to enable. It runs every agent inside a Docker Sandbox by default, using the sbx CLI. Each agent gets a lightweight virtual machine with its own kernel, which is a stronger boundary than a namespaced process sharing your host kernel. The Docker Sandboxes documentation is the primary source for how the microVM works underneath.

You do not manage that lifecycle by hand. The script computes a deterministic sandbox name, checks that sbx is installed, decides whether to create or attach, runs the agent in bypass-permissions mode inside the microVM, and stops the sandbox when the run ends. Because the sandbox is the boundary, the agent is free to skip permission prompts and move fast inside it.

Installing and running looks like this:

npx @pageai/ralph-loop
./ralph.sh -n 50

The first command drops Ralph into your project. The second runs the loop for fifty iterations, each one starting the agent with a fresh context inside the sandbox. Pick a different agent and pass agent-specific flags after a -- separator:

./ralph.sh --agent codex -- --model gpt-5.5
./ralph.sh -a gemini -n 5 -- --model pro

Supported agents are claude (the default), codex, copilot, cursor, gemini, and opencode. The default iteration count is 10, and --once runs exactly one iteration.
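If you are curious how a wrapper can accept its own flags and still hand arbitrary flags through to the agent, here is a minimal bash sketch. It is not Ralph's actual parser: the names parse_args, AGENT, ITERATIONS, PRINT_NAME, and AGENT_ARGS are hypothetical, and only the flags and defaults come from this post.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of ralph.sh-style option parsing; not Ralph's real code.
# Only the flags and defaults (agent "claude", 10 iterations) come from the post.

SUPPORTED_AGENTS="claude codex copilot cursor gemini opencode"

parse_args() {
  AGENT="claude"
  ITERATIONS=10
  PRINT_NAME=0
  AGENT_ARGS=()  # everything after "--" is forwarded to the agent untouched

  while [ "$#" -gt 0 ]; do
    case "$1" in
      -a|--agent)
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        AGENT="$2"; shift 2 ;;
      -n)
        [ "$#" -ge 2 ] || { echo "missing value for -n" >&2; return 1; }
        ITERATIONS="$2"; shift 2 ;;
      --once)
        ITERATIONS=1; shift ;;
      --print-name)
        PRINT_NAME=1; shift ;;
      --)
        shift; AGENT_ARGS=("$@"); break ;;
      *)
        echo "unknown option: $1" >&2; return 1 ;;
    esac
  done

  # Reject agents outside the documented list.
  case " $SUPPORTED_AGENTS " in
    *" $AGENT "*) ;;
    *) echo "unsupported agent: $AGENT" >&2; return 1 ;;
  esac

  # The iteration count must be a positive integer.
  case "$ITERATIONS" in
    ''|*[!0-9]*) echo "invalid iteration count: $ITERATIONS" >&2; return 1 ;;
  esac
  [ "$ITERATIONS" -ge 1 ] || { echo "iteration count must be >= 1" >&2; return 1; }
}
```

A wrapper built this way would call parse_args "$@" and then forward "${AGENT_ARGS[@]}" to the agent. The array keeps each argument a separate word, so --model gpt-5.5 arrives exactly as you typed it.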

The sandbox name is deterministic so the same project and agent pair always reuses the same microVM:

ralph-<agent>-<current-dir>-<hash8>

<agent> is the agent slug, <current-dir> is the sanitized basename of the project directory, and <hash8> is the first eight hex characters of a sha256 of the absolute project path. The path hash keeps two same-named directories on different paths from colliding. You never have to memorize the name, because Ralph prints it on startup and on demand:

./ralph.sh --print-name
./ralph.sh --print-name --agent codex
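The naming rule is simple enough to reproduce if you want to script against your sandboxes. This bash sketch follows the documented ralph-<agent>-<current-dir>-<hash8> shape, but two details are assumptions rather than confirmed behavior: that "sanitized" means lowercased with other characters collapsed to hyphens, and that the hash input is the absolute path with no trailing newline. If its output ever disagrees with --print-name, trust --print-name.

```shell
#!/usr/bin/env bash
# Sketch of the documented scheme: ralph-<agent>-<current-dir>-<hash8>.
# Assumptions: sanitizing = lowercase + non-alphanumeric runs become "-",
# and the sha256 input is the absolute path without a trailing newline.

sha256_hex() {
  # Linux coreutils ships sha256sum; macOS ships shasum.
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

sandbox_name() {
  local agent="$1" project_dir="$2" abs base hash8
  abs="$(cd "$project_dir" && pwd)" || return 1
  # Lowercase the basename and squeeze everything else into single hyphens.
  base="$(basename "$abs" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')"
  base="${base#-}"
  base="${base%-}"
  # First eight hex characters of the path hash keep same-named dirs apart.
  hash8="$(sha256_hex "$abs" | cut -c1-8)"
  printf 'ralph-%s-%s-%s\n' "$agent" "$base" "$hash8"
}
```

For example, sandbox_name claude "$PWD" prints a name of the form ralph-claude-<dir>-<8 hex chars> for the current project.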

When the run ends, whether by normal exit, a double Ctrl+C, or any other path that fires the exit trap, Ralph stops only the sandbox it started:

sbx stop ralph-claude-my-app-a1b2c3d4

Stopping is not deleting, so you can reattach later to inspect what the agent did, then remove it with sbx rm when you are finished.
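The "stop only what this run started" rule comes down to remembering one name and stopping that name from an exit trap. Here is a minimal bash sketch of the idea, assuming a hypothetical SBX variable so the sbx command can be swapped for a stub, and hypothetical helper names begin_run and cleanup. It uses only the sbx stop subcommand shown above.

```shell
#!/usr/bin/env bash
# Sketch (not Ralph's source): stop exactly one sandbox when the script exits.
SBX="${SBX:-sbx}"   # hypothetical override, handy for testing with a stub
STARTED_SANDBOX=""  # stays empty until this run has created or attached one

cleanup() {
  # Stop, not remove, so the sandbox can be reattached and inspected later.
  # Any other sandboxes on the machine are left alone.
  if [ -n "$STARTED_SANDBOX" ]; then
    "$SBX" stop "$STARTED_SANDBOX"
  fi
}

begin_run() {
  STARTED_SANDBOX="$1"
  # cleanup runs on whatever path fires the EXIT trap.
  trap cleanup EXIT
}
```

Because the trap reads STARTED_SANDBOX rather than listing sandboxes, a run that never got as far as starting one has nothing to stop.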

Not every “sandbox” is a real boundary. A bare container that bind-mounts your whole home directory and has open network access is theater. Three properties separate isolation that holds from isolation that only looks like it does.

The agent should see the project directory and nothing above it. Ralph shares your project at the same absolute path it has on your host, so tooling, config, and lockfiles resolve identically, while the rest of your home directory stays outside. No SSH keys, no ~/.aws, no unrelated repositories, no shell history full of tokens. The blast radius becomes the project you pointed at, plus whatever network you explicitly allow.

The corollary matters: because the project is shared, the agent can absolutely wreck your working tree. The protection there is git, not the sandbox. Work on a branch and commit often. The sandbox protects everything outside the project, and version control protects the code inside it.
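In practice that means a dedicated branch per run and a commit after every iteration, so damage to the working tree is always one commit away from undone. A bash sketch; the ralph/run-<timestamp> branch name, the commit message format, and the function names are hypothetical conventions, not something Ralph prescribes.

```shell
#!/usr/bin/env bash
# Sketch: one branch per unattended run, one checkpoint commit per iteration.
# Branch and message formats are hypothetical, not Ralph's.

start_run_branch() {
  # Create a fresh branch so the agent never works directly on main.
  local branch="ralph/run-$(date +%Y%m%d-%H%M%S)"
  git checkout -b "$branch" >/dev/null 2>&1 || return 1
  printf '%s\n' "$branch"
}

checkpoint() {
  # Stage everything the agent changed; skip the commit if nothing changed.
  git add -A
  if git diff --cached --quiet; then
    return 0
  fi
  git commit -q -m "ralph: checkpoint after iteration $1"
}
```

If an iteration wrecks the tree, the previous checkpoint on the run branch is the state to return to, and main was never touched.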

The environment has to be cheap to destroy and recreate. If resetting a contaminated sandbox is a multi-hour chore, you will avoid resetting it, and a sandbox you never reset slowly turns back into a pet you are afraid to lose. Ralph leans on the deterministic name here: tear a sandbox down with sbx rm, and the next iteration simply probes, finds nothing, and creates a clean one. That re-probe each iteration is what makes a long run resilient to you poking at the sandbox by hand.
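The re-probe can be sketched as a loop that asks "does the sandbox exist?" before every iteration instead of once at startup. In this bash sketch, sandbox_exists, create_sandbox, attach_sandbox, and run_agent_once are hypothetical placeholders: the post does not document the sbx commands behind probing, creating, or attaching, so wire them up from the Docker Sandboxes docs.

```shell
#!/usr/bin/env bash
# Sketch: probe before every iteration so a sandbox removed mid-run is rebuilt.
# All four hooks are hypothetical placeholders, not real sbx subcommands.

sandbox_exists() { return 1; }          # placeholder: always "not found"
create_sandbox() { echo "create $1"; }  # placeholder
attach_sandbox() { echo "attach $1"; }  # placeholder
run_agent_once() { :; }                 # placeholder: launch the agent inside

run_iterations() {
  local name="$1" count="$2" i
  for ((i = 1; i <= count; i++)); do
    # Probe fresh each time instead of trusting state from the last iteration.
    if sandbox_exists "$name"; then
      attach_sandbox "$name"
    else
      create_sandbox "$name"
    fi
    run_agent_once "$name" "$i"
  done
}
```

The design choice is to treat the sandbox's existence as something to check, never something to remember, which is what lets you sbx rm it by hand without breaking the run.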

Filesystem isolation is half the boundary. The other half is the network, because an agent with unrestricted outbound access can fetch arbitrary code and, in the worst case, send data out. Docker Sandboxes block outbound HTTP and HTTPS by default, and you allowlist exactly what the task needs:

sbx policy allow network ralph-claude-my-app-a1b2c3d4 "*.npmjs.org,github.com"

The practical symptom of deny-by-default is that npm install fails or an API call is refused until you grant the domain. That is the gate doing its job. You open specific hosts, not everything. There is a full-open escape hatch with the "**" rule, and it is the right tool only when you genuinely cannot enumerate the domains and you accept the tradeoff for that one sandbox. Building a tight allowlist that lets installs through while keeping exfiltration out is its own topic, covered in network policies for AI agent sandboxes.
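If you script your grants, a small wrapper can enforce "specific hosts, not everything" for you. This bash sketch joins a list of domains into the comma-separated form used by the sbx policy allow network command above and refuses the "**" rule unless you opt in explicitly. The allow_domains function and the ALLOW_ALL_NETWORK and SBX variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply a reviewed allowlist, refusing "**" without an explicit opt-in.
# allow_domains, ALLOW_ALL_NETWORK and SBX are hypothetical names.

allow_domains() {
  local name="$1"; shift
  local domain joined=""
  [ "$#" -gt 0 ] || { echo "no domains given" >&2; return 1; }
  for domain in "$@"; do
    # "**" opens all outbound traffic; require a deliberate, per-run opt-in.
    if [ "$domain" = "**" ] && [ "${ALLOW_ALL_NETWORK:-0}" != "1" ]; then
      echo "refusing '**' without ALLOW_ALL_NETWORK=1" >&2
      return 1
    fi
    joined="${joined:+$joined,}$domain"
  done
  "${SBX:-sbx}" policy allow network "$name" "$joined"
}
```

Usage: allow_domains ralph-claude-my-app-a1b2c3d4 '*.npmjs.org' github.com. Quote wildcard domains so your shell does not try to glob them.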

A plain container can satisfy some of this and miss the rest, which is why the kind of sandbox matters. For where a hand-rolled container leaks and where a microVM holds, read Docker Sandboxes vs plain containers for AI agents.

A quick test before you run an agent unsandboxed

If you are tempted to skip the sandbox for a “quick” autonomous run, ask one question: if this agent ran rm -rf ~ or piped the contents of ~/.ssh to a pastebin, what would it cost you? If the honest answer is “a rebuild and some annoyance,” you are already in a disposable environment and you are fine. If the answer involves rotating credentials, notifying anyone, or the phrase “client repository,” you need the boundary before you start the loop.

The reason to make this the default rather than a judgment call is that the judgment is the part autonomy removes. You are not going to evaluate the risk of each of the next four hundred commands. You evaluate the environment once, up front, and then let the agent be fearless inside it.
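You can make that up-front evaluation mechanical. This hypothetical bash preflight lists which of the credential files mentioned earlier are readable from the current environment and refuses to continue if any are. Inside a sandbox that shares only the project, the list should come back empty; on a typical laptop it will not. The file list is illustrative, not exhaustive, and the function names are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: refuse an unattended run if credentials are reachable.
# The paths are examples from this post (plus id_rsa), not a complete inventory.

reachable_secrets() {
  local home="${1:-$HOME}" f
  for f in .ssh/id_ed25519 .ssh/id_rsa .aws/credentials .config/gh/hosts.yml; do
    [ -r "$home/$f" ] && printf '%s\n' "$home/$f"
  done
  return 0
}

preflight() {
  local found
  found="$(reachable_secrets "$@")"
  if [ -n "$found" ]; then
    printf 'refusing to run unattended; these are reachable:\n%s\n' "$found" >&2
    return 1
  fi
}
```

Run preflight || exit 1 at the top of any script that starts an agent in bypass-permissions mode, and the judgment call becomes a check that runs every time.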

A sandbox is a strong boundary and not a magic one. Two limits are worth stating so you do not over-trust the setup.

First, the shared project directory is genuinely shared. The agent can corrupt your working tree, and only git will save you. Branch and commit.

Second, whatever you allowlist is genuinely reachable. Grant a domain that accepts uploads and an agent could in principle send data there. Keep network grants minimal, specific, and reviewed, the same way you would treat firewall rules, and avoid the "**" rule except on purpose.

Inside those limits the model is simple, and it holds. Enforce the boundary from the outside, give the agent a disposable place to be fearless, and let the loop run. The sandbox is the blast radius, so running fast and autonomous becomes the same thing as running contained.

Frequently asked questions

Why do I need to sandbox an AI coding agent at all?

An autonomous agent runs with your user permissions, so it can read your SSH keys, cloud credentials, and git history, push to your remotes, and delete files anywhere you can write. In autonomous mode there is no human reviewing each command, so over a long run the agent will eventually execute something you would not have approved. A sandbox limits what any command can reach to a disposable environment.

What is the worst case if I run an agent without a sandbox?

The worst case is a security incident rather than an inconvenience. A single bad command can leak credentials from your home directory, force-push to an unrelated repository, or delete files with no undo, because the agent has your full permissions. With a sandbox the same bad command only touches a throwaway microVM that you reset and re-run.

Is bypass-permissions or YOLO mode ever safe?

It is unsafe on your host and safe inside a sandbox. On the host, --dangerously-skip-permissions gives the agent full shell access with no confirmation. Inside a Docker Sandbox microVM the same flag only grants access to the shared project directory and an allowlisted network, so the danger has no target on your machine.

Does Ralph sandbox agents automatically?

Yes. Ralph runs every agent inside a Docker Sandbox by default using the sbx CLI. It computes a deterministic sandbox name, creates or attaches the microVM each iteration, runs the agent in bypass-permissions mode inside it, and stops the sandbox when the run ends.

What makes isolation good rather than just theater?

Good isolation mounts only the project directory and nothing above it, stays ephemeral so it is cheap to destroy and recreate, and limits the network to an allowlist instead of leaving it wide open. A microVM with its own kernel is a stronger boundary than a container that shares your host kernel, especially for code you do not trust to behave.