
How to Write a PRD an AI Agent Can Actually Build From

Flow from unstructured requirements to a PRD, a short SUMMARY, and a task list an AI agent builds from.

A PRD an AI agent can build from is not a feature wishlist. It is three things in one document: goals (what to build and why), constraints (the stack, the boundaries, what is out of scope), and verifiable acceptance criteria (conditions the agent can check by running a command and reading the output). Drop any one of those and the agent fills the gap with a guess. The guess looks fine in the diff and breaks on the case nobody wrote down.

This post is the practical version: what goes in .agent/prd/PRD.md, what goes in the short SUMMARY.md the loop sends every iteration, how to draft both with the prd-creator skill, and how to write acceptance criteria an agent can actually verify. It assumes you already know the shape of spec-driven development with AI and want to write the document that sits at the top of it.

A human PRD and an agent PRD overlap, but they are not the same document. A human reads a PRD, fills the gaps with judgment, and asks a teammate when something is unclear. An agent does none of that. It reads what is on disk, and when the spec is silent, it invents an answer and proceeds with full confidence. So the agent PRD has to do more work up front.

Three properties decide whether a PRD is buildable.

Goals are concrete, not aspirational. “Make onboarding delightful” is not a goal an agent can build toward. “A new user reaches the dashboard in three steps or fewer after submitting the signup form” is. State the outcome in terms something can observe.

Constraints are explicit. Name the framework, the data store, the auth approach, and the libraries you have already committed to. Name what is out of scope just as clearly. An out-of-scope section is a fence: without it, an agent on a long run happily adds a feature you never asked for and now have to maintain.

Acceptance criteria are verifiable. Each criterion is a condition the agent can confirm by running a test, hitting an endpoint, or reading a file. “Login works” is a vibe. “POST /api/login with a wrong password returns 401 and the body { error: 'Invalid credentials' }” is a criterion. The difference is whether a machine can return a yes or a no without your opinion.

The reason this matters more for agents than for people: an autonomous loop amplifies whatever you feed it. Feed it ambiguity and it amplifies ambiguity across every iteration. The Ralph technique runs an agent against your task list until the work is done, and the whole design rests on the agent rebuilding its understanding from files on each pass. If you want the mechanics of that loop, start with what is the Ralph technique. The PRD is the document the entire loop reads from.

Where the PRD lives: PRD.md and SUMMARY.md


Ralph keeps the product specification in two files under .agent/prd/, and the split is deliberate.

PRD.md is the full document. It is for depth: a human reads it once to understand the project, and the agent reads it when it needs detail it cannot get from the summary. The prd-creator skill writes it with a consistent set of sections:

  • App overview and objectives
  • Target audience
  • Success metrics and KPIs
  • Competitive analysis
  • Core features and user flows
  • Technical stack
  • Prerequisites and access
  • Security considerations
  • Assumptions and dependencies

SUMMARY.md is the short executive overview. This is the file that gets sent to the agent every iteration so it reorients fast without rereading the entire PRD. It contains an overall description of the project, the main features, the key user flows, and a short list of key requirements. Nothing more.

The economics drive the split. Every iteration starts the agent with a fresh context window, and tokens in that window cost money and attention. You do not want to spend that budget reloading a 3,000-word PRD on every pass when a tight summary reorients the agent just as well. Long PRD for depth, short summary for the working context on every iteration.
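To put rough numbers on that, here is a back-of-the-envelope sketch. The 3,000-word PRD is the figure used above; the 300-word summary and the 50-iteration run are assumptions for illustration, and word counts only stand in for tokens, since the real ratio depends on the tokenizer.

```typescript
// Rough cost of reloading one document at the start of every iteration.
// Word counts are a proxy for tokens, not an exact measure.
function wordsLoaded(wordsPerDocument: number, iterations: number): number {
  return wordsPerDocument * iterations;
}

const iterations = 50;     // assumed run length
const fullPrdWords = 3000; // the PRD size used in the text
const summaryWords = 300;  // assumed size of a tight SUMMARY.md

const prdEveryPass = wordsLoaded(fullPrdWords, iterations);     // 150000
const summaryEveryPass = wordsLoaded(summaryWords, iterations); // 15000
const wordsSaved = prdEveryPass - summaryEveryPass;             // 135000
```

Under those assumptions the summary carries a tenth of the reload cost, and the full PRD is still on disk for the passes that need the detail.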

flowchart LR
  Reqs["Unstructured requirements"] --> Interview["prd-creator interview, plan mode"]
  Interview --> PRD["prd/PRD.md: goals, constraints, criteria"]
  PRD --> Summary["prd/SUMMARY.md: short overview, sent each iteration"]
  PRD --> Tasks["tasks.json and tasks/TASK-ID.json"]
  Summary --> Loop["ralph.sh loop, fresh context each iteration"]
  Tasks --> Loop
  Loop --> Verify["Run tests, lint, types, screenshot"]
  Verify --> Commit["Commit, set passes true"]

The PRD feeds two children. The summary is the version the loop reads constantly. The task list is the executable decomposition. Get the PRD right and both children inherit clear goals and criteria. Get it vague and both inherit the vagueness.
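As a mental model of how the task list feeds the loop, here is a minimal sketch of picking the next task. The TaskEntry shape and the nextTask function are hypothetical: they assume each entry carries an id, a passes flag, and a dependencies list, and the real tasks.json and ralph.sh may work differently.

```typescript
// Hypothetical shape of one entry in tasks.json; the real file may differ.
interface TaskEntry {
  id: string;
  passes: boolean;
  dependencies: string[];
}

// Return the first unfinished task whose dependencies have all passed,
// or undefined when nothing is ready (everything is done, or blocked).
function nextTask(tasks: TaskEntry[]): TaskEntry | undefined {
  const done = new Set(tasks.filter((t) => t.passes).map((t) => t.id));
  return tasks.find(
    (t) => !t.passes && t.dependencies.every((dep) => done.has(dep)),
  );
}

const tasks: TaskEntry[] = [
  { id: "TASK-1", passes: true, dependencies: [] },
  { id: "TASK-2", passes: false, dependencies: ["TASK-1"] },
  { id: "TASK-3", passes: false, dependencies: ["TASK-1", "TASK-2"] },
];

// TASK-2 is next: TASK-3 still waits on TASK-2.
const upNext = nextTask(tasks);
```

The point of the sketch is that everything the selection needs is already in the file, so a fresh context can pick up exactly where the last one stopped.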

Draft it with the prd-creator skill in plan mode


You do not write all of this by hand. Ralph ships a prd-creator skill that turns unstructured requirements into a PRD plus a task list. Run it in plan mode, where the agent is read-only and focused on asking questions instead of writing code. Plan mode matters here: you want the agent interrogating your idea, not racing ahead to scaffold files before the spec exists.

The flow is a conversation, not a one-shot prompt. The instinct most people have is to paste a paragraph and expect a finished plan. The skill instead pushes back. It interviews you to fill the gaps, asking clarifying questions one at a time, and it researches the competitive landscape before it commits anything to PRD.md. When a question can be answered by reading the codebase, it reads the codebase instead of asking you.

A prompt to kick it off looks like plain language:

Use the prd-creator skill in plan mode. I want to build a link shortener
with accounts, custom slugs, and click analytics. Interview me, write the
PRD to .agent/prd/PRD.md and the summary to .agent/prd/SUMMARY.md, then
generate the task list in .agent/tasks.json.

During the interview, the skill also verifies prerequisites and creates or updates .env.local with placeholder values only. It never writes a real secret to the PRD, the tasks, the logs, or .env.local. You fill in the real values by hand. This is the moment the spec records what credentials and access the project needs, so the agent is not discovering halfway through a 50-task run that it never had database access.

When it finishes, you have PRD.md, SUMMARY.md, and tasks.json with one TASK-{ID}.json spec per task. From there you run the loop:

npx @pageai/ralph-loop
./ralph.sh -n 50

You can amend later. When you want to add a feature or fix a bug mid-project, run the skill again to update the PRD and append tasks. The spec grows with the project instead of going stale the moment you start coding.

Write acceptance criteria the agent can verify


This is the part that separates a PRD an agent can build from a PRD that produces confident nonsense. An acceptance criterion is only useful if the agent can check it without you. The test is simple: can the agent confirm this by running something and reading the output? If not, rewrite it.

Three rules make a criterion verifiable.

Name the input and the expected output. Vague: “the endpoint validates email.” Verifiable: “POST /api/register with email ‘not-an-email’ returns 400 and the body { error: 'Please enter a valid email' }.” Now the agent can send the request, read the status and body, and compare.

Point at an observable artifact. “Passwords are secure” is unprovable. “The stored password starts with the bcrypt prefix $2b$ and never equals the plaintext value” can be confirmed by reading the row. Anchor the criterion to something the agent can inspect: a database row, a response header, a file on disk, a console exit code.

Make it pass or fail, never partial. A criterion that needs interpretation is a criterion the agent will interpret in its favor. “The UI looks clean” invites argument. “The submit button is disabled until both fields are non-empty” does not.
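The three rules can be sketched as data plus a check. This is a minimal illustration, not part of Ralph: the HttpCriterion and ObservedResponse types and both function names are hypothetical, and the body comparison is deliberately shallow (top-level fields compared with ===).

```typescript
// A criterion written as data: one named input, one expected output.
interface HttpCriterion {
  method: "GET" | "POST";
  path: string;
  body?: Record<string, unknown>;
  expectStatus: number;
  expectBody: Record<string, unknown>;
}

interface ObservedResponse {
  status: number;
  body: Record<string, unknown>;
}

// Pass or fail, never partial: the status must match and every expected
// body field must be present with exactly the expected value.
function meetsCriterion(c: HttpCriterion, observed: ObservedResponse): boolean {
  if (observed.status !== c.expectStatus) return false;
  return Object.entries(c.expectBody).every(
    ([key, value]) => observed.body[key] === value,
  );
}

// An observable artifact: the stored value as read from the database row.
function storedPasswordLooksHashed(stored: string, plaintext: string): boolean {
  return stored.startsWith("$2b$") && stored !== plaintext;
}

const invalidEmail: HttpCriterion = {
  method: "POST",
  path: "/api/register",
  body: { email: "not-an-email" },
  expectStatus: 400,
  expectBody: { error: "Please enter a valid email" },
};

// Matches the criterion exactly.
const passing = meetsCriterion(invalidEmail, {
  status: 400,
  body: { error: "Please enter a valid email" },
});

// Wrong status, so the criterion fails no matter what the body says.
const failing = meetsCriterion(invalidEmail, { status: 200, body: {} });
```

Nothing in either check asks for an opinion. Each returns true or false from values the agent can read for itself.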

In Ralph, the criteria do not float in the PRD. They land in the per-task spec files. Each TASK-{ID}.json carries an acceptanceCriteria array, and the tests that prove those criteria are steps inside the task, not separate tasks scheduled for later.

{
  "id": "TASK-3",
  "title": "POST /api/auth/register creates a new user account",
  "category": "api-endpoint",
  "description": "Validate input, hash the password, store the user, return a success response.",
  "acceptanceCriteria": [
    "POST with valid email and password returns 201 with the user id and email",
    "Invalid email format returns 400 with the error text Please enter a valid email",
    "Password shorter than 8 characters returns 400",
    "Duplicate email returns 409, not a generic 500",
    "The stored password starts with $2b$ and is never the plaintext value"
  ],
  "steps": [
    {
      "step": 1,
      "description": "Add the register route handler",
      "details": "Validate with a zod schema, hash with bcrypt, insert into users.",
      "pass": false
    },
    {
      "step": 2,
      "description": "Write Vitest cases for every acceptance criterion",
      "details": "Cover valid registration, invalid email, short password, duplicate email, and the stored hash prefix.",
      "pass": false
    }
  ],
  "dependencies": ["TASK-1", "TASK-2"],
  "estimatedComplexity": "medium"
}

Every line in acceptanceCriteria maps to something the verification stack can confirm. The loop assumes that stack: Playwright for end-to-end tests, Vitest for unit tests, TypeScript for types, ESLint for lint, Prettier for format. The repo mantra is blunt: if you didn’t test it, it doesn’t work. The agent runs the gate, and only after it passes does it flip passes to true, take a screenshot, and commit. Turning criteria into atomic, independently verifiable packets like this is its own discipline, covered in breaking a PRD into atomic agent tasks.
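One way to picture the gate is as a function over exit codes. The check names below mirror the stack named above, but the CheckResult shape and the gatePasses function are assumptions for illustration, not how ralph.sh records results.

```typescript
type CheckName = "playwright" | "vitest" | "tsc" | "eslint" | "prettier";

interface CheckResult {
  name: CheckName;
  exitCode: number;
}

// The gate passes only when every required check ran at least once
// and every run of it exited 0. A missing check counts as a failure.
function gatePasses(results: CheckResult[], required: CheckName[]): boolean {
  return required.every((name) => {
    const runs = results.filter((r) => r.name === name);
    return runs.length > 0 && runs.every((r) => r.exitCode === 0);
  });
}

const requiredChecks: CheckName[] = [
  "playwright",
  "vitest",
  "tsc",
  "eslint",
  "prettier",
];

// Every check ran and exited 0.
const allGreen = gatePasses(
  requiredChecks.map((name) => ({ name, exitCode: 0 })),
  requiredChecks,
);

// One failing check is enough to keep passes at false.
const lintFails = gatePasses(
  requiredChecks.map((name) => ({ name, exitCode: name === "eslint" ? 1 : 0 })),
  requiredChecks,
);
```

Treating a check that never ran as a failure is the design choice that matters here: skipping the tests must not count as passing them.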

Name env vars, libraries, and user flows so the agent does not guess


The fastest way to get an agent to invent something wrong is to leave a decision implicit. Three categories cause the most trouble, and the PRD should pin all three.

Environment variables. List every variable the project reads, with a one-line note on what each is for. The prd-creator skill writes placeholders into .env.local during prerequisite verification, so the agent knows the keys exist without ever seeing a real secret. Without this, an agent guesses variable names, scatters them across files, and you spend the morning reconciling DATABASE_URL against DB_CONNECTION_STRING.
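A check like this can itself be a verifiable criterion. The sketch below compares a list of required names against the text of a dotenv-style file. The parser is simplified (no quoting, no export prefix), and the variable names and the changeme placeholder are examples, not output the skill is known to produce.

```typescript
// Collect the keys from KEY=value lines. Blank lines and # comments are skipped.
function parseEnvKeys(contents: string): Set<string> {
  const keys = new Set<string>();
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq > 0) keys.add(line.slice(0, eq).trim());
  }
  return keys;
}

// Names the PRD lists that the env file does not define.
function missingEnvVars(required: string[], contents: string): string[] {
  const present = parseEnvKeys(contents);
  return required.filter((name) => !present.has(name));
}

// Example file contents: placeholders only, never real secrets.
const envLocal = [
  "# placeholders only, real values are filled in by hand",
  "DATABASE_URL=changeme",
  "SESSION_SECRET=changeme",
].join("\n");

// SMTP_HOST is listed as required but has no line in the file.
const missing = missingEnvVars(
  ["DATABASE_URL", "SESSION_SECRET", "SMTP_HOST"],
  envLocal,
);
```

An empty result is the pass condition: every name the PRD lists has a line in the file.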

Libraries and the stack. Say which framework, which ORM, which validation library, which test runner. If the project already uses zod, the PRD should say so, or the agent will reach for whichever library is most familiar from its training data and add a second validation library next to your first. Naming the stack is also where you encode reuse: tell the agent to extend the existing auth module rather than write a parallel one.

User flows. A flow is a sequence of steps with branches, and the branches are where agents guess. “Users can reset their password” hides a dozen decisions. Does the reset link expire? After how long? What happens on an expired link? Does requesting a reset for an unknown email reveal that the account does not exist? Write the flow as steps and edge cases, and each edge case becomes an acceptance criterion instead of a surprise in production.
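Once the expiry questions have answers, the flow reduces to a function with one outcome per branch. In this sketch the 30-minute expiry, the outcome names, and the ResetToken shape are all assumptions for illustration. The PRD is where your real answers go.

```typescript
type ResetOutcome = "ok" | "expired" | "unknown-token";

interface ResetToken {
  token: string;
  issuedAtMs: number; // when the reset link was issued, in epoch milliseconds
}

// Assumed policy for illustration: links expire 30 minutes after issue.
const RESET_TTL_MS = 30 * 60 * 1000;

// One outcome per branch of the flow, so each branch can become a criterion.
function checkResetToken(
  presented: string,
  known: ResetToken[],
  nowMs: number,
): ResetOutcome {
  const match = known.find((t) => t.token === presented);
  if (match === undefined) return "unknown-token";
  return nowMs - match.issuedAtMs > RESET_TTL_MS ? "expired" : "ok";
}

const issued: ResetToken[] = [{ token: "abc123", issuedAtMs: 0 }];

const onTime = checkResetToken("abc123", issued, RESET_TTL_MS);      // "ok"
const tooLate = checkResetToken("abc123", issued, RESET_TTL_MS + 1); // "expired"
const noMatch = checkResetToken("nope", issued, 0);                  // "unknown-token"
```

Even this small version forces a decision the prose version hides: a link used at exactly the 30-minute mark still counts as valid.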

The pattern across all three: every decision you leave out is a decision the agent makes for you, silently, at the moment it is most expensive to change. The PRD is where you make those decisions while they are still cheap.

Take the link shortener from the prompt above and watch the three properties show up.

The goal is concrete: “an authenticated user creates a short link with an optional custom slug and sees total clicks per link.” Not “build a great link tool.” The constraint section names the stack and fences the scope: custom domains and team accounts are out of scope for version one. The acceptance criteria get specific per feature. For slug generation: “a generated slug is 7 characters of base62” and “a collision retries up to 3 times before returning an error.” For the redirect: “GET /:slug on an unknown slug returns 404” and “a valid slug records exactly one click row and issues a 302 to the target URL.”
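The two slug criteria are specific enough to sketch directly. The function names are hypothetical, and reading "retries up to 3 times" as one initial attempt plus at most three retries is an assumption the PRD should settle. Math.random is enough for a sketch, but a real service would likely want a cryptographically secure source.

```typescript
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// 7 characters of base62. The random source is injectable so tests can
// make the output deterministic.
function generateSlug(random: () => number = Math.random): string {
  let slug = "";
  for (let i = 0; i < 7; i++) {
    slug += BASE62.charAt(Math.floor(random() * BASE62.length));
  }
  return slug;
}

// One initial attempt plus up to maxRetries retries.
// Throws when every attempt collides with an existing slug.
function createUniqueSlug(
  isTaken: (slug: string) => boolean,
  maxRetries = 3,
  random: () => number = Math.random,
): string {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const slug = generateSlug(random);
    if (!isTaken(slug)) return slug;
  }
  throw new Error(`No free slug after ${maxRetries} retries`);
}

// Usage: nothing is taken yet, so the first attempt succeeds.
const existing = new Set<string>();
const slug = createUniqueSlug((candidate) => existing.has(candidate));
```

Passing isTaken and random in as parameters is what makes both criteria testable: a test can force a collision on every attempt and count exactly how many were made.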

The prd-creator interview is where these get extracted. Are slugs unique globally or per account? What happens on a collision? Do analytics count unique visitors or raw hits? Each answer becomes a line in the PRD, and any question left unanswered is an edge case waiting to blow up the loop. By the time the PRD is approved, the ambiguity is gone, and the task list inherits criteria a machine can check. The loop then runs one task at a time, which is the rule that keeps a long run from drifting, explained in one task per iteration.

The framing of phases here (specify intent, plan the approach, decompose into tasks, implement and verify) comes from GitHub Spec Kit, and the loop that runs it autonomously was popularized by Geoffrey Huntley in his original Ralph writeup. The PRD is the artifact the first phase produces and every later phase consumes.

If you are writing the spec that drives a loop, read down through the spec-driven cluster:

For the mechanics of the loop that reads your PRD on every pass, the fresh-context design, and where the technique came from, read what is the Ralph technique.

Frequently asked questions

What makes a PRD buildable by an AI agent rather than a person?

A buildable PRD states concrete goals, explicit constraints, and verifiable acceptance criteria. A person can fill gaps with judgment and ask a teammate, but an agent invents an answer whenever the spec is silent. So the agent PRD must name the stack, fence what is out of scope, and define each acceptance criterion as a condition a machine can confirm by running a command and reading the output.

What is the difference between PRD.md and SUMMARY.md in Ralph?

PRD.md is the full document with app overview, target audience, success metrics, competitive analysis, core features and user flows, technical stack, prerequisites, security considerations, and assumptions. SUMMARY.md is a short executive overview with the main features, key user flows, and key requirements. The summary is what gets sent to the agent every iteration so it reorients fast without rereading the entire PRD.

How do I write acceptance criteria an agent can verify?

Name the input and the expected output, point at an observable artifact, and make every criterion pass or fail with no interpretation. For example, POST with an invalid email returns 400 with a specific error text, or the stored password starts with the bcrypt prefix and is never the plaintext value. The agent confirms each one with Playwright, Vitest, type checks, and lint before it marks the task done.

How do I create the PRD without writing it all by hand?

Use the prd-creator skill in plan mode. It interviews you one question at a time, researches the competitive landscape, verifies prerequisites, and writes placeholder values into .env.local without ever storing a real secret. It then writes PRD.md and SUMMARY.md and generates tasks.json with a prerequisite verification task first. You can run it again later to amend the PRD and append tasks.

Why does naming env vars, libraries, and user flows matter so much?

Every decision left implicit is a decision the agent makes for you, silently, at the moment it is most expensive to change. If you do not name the validation library, the agent may add a second one. If you do not spell out the password reset flow, it guesses how expiry and unknown emails behave. Listing variables, the stack, and flows with their edge cases turns those guesses into acceptance criteria.