Spec-Driven Development vs Vibe Coding: When Each One Wins
Vibe coding wins when the code is disposable. Spec-driven development wins when the code has to survive: production systems, code other people maintain, and anything an autonomous agent builds while you are asleep. The split is not about taste. It is about whether the work has verifiable acceptance criteria or just a feeling that the output looks right.
This post defines both honestly, including where vibe coding is genuinely the correct choice. Then it covers why an autonomous loop has no choice but to go spec-driven, what vibe coding actually costs once it scales past one person and one afternoon, and how to pick per task instead of picking a religion. It ends with how Ralph leans spec-driven through a PRD and a task list, without forcing you to spec a five-line script.
What vibe coding actually is
Vibe coding is writing code by feel. You describe what you want in loose terms, you read the output, you eyeball whether it looks correct, and you keep nudging until it runs. The acceptance test is your gut. The spec lives in your head and changes as you go.
For an AI workflow, vibe coding means prompting an agent with intent and judging the result by reading it. “Add a settings page.” “Make this faster.” “Fix the layout on mobile.” You accept the diff because it looks plausible and the page renders, not because a test confirmed a specific condition.
This is not an insult. Vibe coding is the fastest way to explore an idea you do not fully understand yet. When the goal is to learn the shape of a problem, writing a spec first is premature: you do not know enough to write a good one. You vibe a spike, see what the problem actually wants, then throw the spike away.
Where vibe coding is genuinely fine:
- Spikes and prototypes you intend to delete.
- One-off scripts: a data migration you run once, a quick scraper, a throwaway chart.
- Demos and hackathon code where shipping in an hour beats shipping correctly.
- Exploratory work where the spec would just be a guess anyway.
- Personal tools with exactly one user who is also the author.
The common thread is that nobody inherits the code. There is no future maintainer, no on-call engineer, no agent that has to extend it next week. When the blast radius of a mistake is your own afternoon, the overhead of a spec is not worth it. Vibe away.
What spec-driven development actually is
Spec-driven development inverts the order. You write the specification first, then the code exists to satisfy it. The spec is not a wishlist. It is goals (what to build and why), constraints (the stack, the boundaries, what is out of scope), and verifiable acceptance criteria (conditions something can check by running a command and reading the output).
The phases come from GitHub Spec Kit: specify the intent, plan the approach, break it into tasks, then implement and verify. Each phase produces an artifact the next phase consumes. The full version of this workflow, with PRDs, task lists, and breakdown, is laid out in spec-driven development with AI.
The defining property is that “done” is not a feeling. A spec-driven task is done when a specific, checkable condition is true. “Login works” is a vibe. “POST /api/login with a wrong password returns 401 and the body { error: 'Invalid credentials' }” is a criterion a machine can confirm without your opinion. That single difference is what makes the rest of this comparison fall out.
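As a rough illustration, the login criterion above can be written as a check that returns pass or fail with no human judgment involved. This is a sketch, not any project's real code: `handleLogin`, `LoginResponse`, and the stored credentials are hypothetical stand-ins for a real POST /api/login handler.

```typescript
// Hypothetical response shape for POST /api/login (an assumption for this sketch).
type LoginResponse = {
  status: number;
  body: { error?: string; token?: string };
};

// Hypothetical in-memory stand-in for the real endpoint handler.
function handleLogin(email: string, password: string): LoginResponse {
  const users: Record<string, string> = { "ada@example.com": "correct-horse" };
  if (users[email] !== password) {
    return { status: 401, body: { error: "Invalid credentials" } };
  }
  return { status: 200, body: { token: "fake-session-token" } };
}

// The criterion: a wrong password returns 401 and the exact error body.
function wrongPasswordCriterionPasses(
  login: (email: string, password: string) => LoginResponse,
): boolean {
  const res = login("ada@example.com", "wrong-password");
  return res.status === 401 && res.body.error === "Invalid credentials";
}

const passes = wrongPasswordCriterionPasses(handleLogin);
```

The check takes the handler as a parameter, so the same criterion can run against a fake in a unit test or against a thin wrapper around the real endpoint.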
Why an autonomous agent needs a spec
Here is the part that decides the whole debate for anyone running agents in a loop. A human can vibe code because a human carries the unwritten spec. You know the system. You remember the edge case from last month. You ask a teammate when something is ambiguous. An autonomous agent does none of that.
An agent reads what is on disk. When the spec is silent, it does not pause and ask. It invents an answer and proceeds with full confidence. The guess looks fine in the diff and breaks on the case nobody wrote down. Multiply that across a long run and you get a pile of confident, untested code that drifts further from your intent on every iteration.
The Ralph technique makes this concrete. Each iteration starts the agent with a fresh context window, which is what keeps a long run from rotting (the agent losing the plot over a marathon session). The mechanics are covered in what is the Ralph technique, the loop popularized by Geoffrey Huntley in his original Ralph writeup. Fresh context is the reason the loop survives. It is also the reason vibe coding cannot drive it.
Think about what fresh context implies. The agent that starts iteration 30 has no memory of the conversation from iterations 1 through 29. It rebuilds its entire understanding from files: the PRD, the task list, the logs, the git history. There is no “you know what I meant” to fall back on. If the intent is not written down, it does not exist for that iteration’s agent.
So the loop needs the spec for two reasons:
- No guessing. Written goals and constraints mean the fresh-context agent reorients to the same target every pass instead of inventing a new one.
- A stop condition. The loop ends on an explicit completion signal, not a vibe. The agent emits <promise>COMPLETE</promise> when every task passes its criteria, and ralph.sh exits with code 0. Without verifiable criteria there is nothing for the loop to check, so it can never honestly say it is done.
That is the core argument. You cannot run an agent unattended against “make it good.” You can run it against a task list where every task carries acceptance criteria the verification stack can confirm.
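A minimal sketch of that stop condition, assuming the loop scans the agent's output for the completion promise. The text only states that the agent emits <promise>COMPLETE</promise> and that ralph.sh exits with code 0. How ralph.sh actually detects the signal is not shown here, so `isComplete`, `runLoop`, `runIteration`, and the nonzero exit code for an exhausted budget are illustrative assumptions.

```typescript
const COMPLETION_SIGNAL = "<promise>COMPLETE</promise>";

// True only when the agent's output contains the explicit completion signal.
function isComplete(agentOutput: string): boolean {
  return agentOutput.includes(COMPLETION_SIGNAL);
}

// Hypothetical loop driver: runIteration stands in for one fresh-context agent run.
// Returns 0 when the signal appears, 1 when the iteration budget runs out first.
function runLoop(
  runIteration: (iteration: number) => string,
  maxIterations: number,
): number {
  for (let i = 1; i <= maxIterations; i++) {
    if (isComplete(runIteration(i))) {
      return 0;
    }
  }
  return 1; // assumption: any nonzero code means "not done"
}

// Usage: a fake agent that finishes on its third pass.
const exitCode = runLoop(
  (i) => (i < 3 ? "still working" : `all tasks pass ${COMPLETION_SIGNAL}`),
  50,
);
```

An agent that only says "looks good to me" never trips `isComplete`, so the sketch runs out its budget and returns a nonzero code instead of claiming success.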
What vibe coding costs at scale
Vibe coding feels free because the cost is deferred. It shows up later, somewhere else, and usually larger. Three costs dominate once the work outgrows a single afternoon.
Rework. Code accepted on a vibe gets rejected on a test you write three weeks later, or worse, by an incident. The fix is rarely a one-line change, because the original code encoded a misunderstanding, not a typo. You are not patching a bug. You are re-deriving the requirement that was never written down, then rewriting against it. Specifying first front-loads that thinking when it is cheap.
Drift. Every implicit decision is a decision someone, or some agent, makes for you. Vibe coding scatters those decisions across files and weeks. One place validates email with a regex, another with a library, a third not at all. There was never a source of truth, so the codebase has three answers to one question. With a spec, the answer is written once and every task inherits it.
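One way to picture "written once": the spec fixes a single validation rule and every call site reuses it instead of inventing its own. The sketch below assumes a deliberately simple rule. `isValidEmail`, `validateSignup`, `validateInvite`, and the pattern itself are illustrative, not a recommendation for production email validation.

```typescript
// Single source of truth for the email rule (hypothetical name and pattern).
// Rule for this sketch: one "@", a non-empty local part, a dot in the domain, no whitespace.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidEmail(input: string): boolean {
  return EMAIL_PATTERN.test(input.trim());
}

// Both call sites reuse the same rule, so they cannot drift apart.
function validateSignup(email: string): string[] {
  return isValidEmail(email) ? [] : ["email is invalid"];
}

// Returns the subset of addresses that fail the shared rule.
function validateInvite(emails: string[]): string[] {
  return emails.filter((e) => !isValidEmail(e));
}
```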
Unverifiable output. This is the expensive one for AI workflows. If you cannot state how to check that the work is correct, you cannot automate the check, which means a person has to read every diff and judge it by hand. That person becomes the bottleneck. The whole promise of an autonomous loop is that the machine verifies its own work. Vibe coding removes the thing the machine would verify against, so you are back to manual review at the exact moment you wanted to step away.
The repo mantra is blunt: if you didn’t test it, it doesn’t work. Vibe coding at scale is a bet that you will remember every unwritten assumption and that nobody else will touch the code. Both halves of that bet lose over time.
How to pick per task
The honest answer is that you do not pick one approach for your whole life. You pick per task, and the deciding questions are quick.
flowchart TD
Start["New piece of work"] --> Survive{"Does the code survive past today?"}
Survive -->|"No, it is a spike or demo"| Vibe["Vibe code it, delete it later"]
Survive -->|"Yes"| Criteria{"Can you write pass or fail acceptance criteria?"}
Criteria -->|"Not yet, too fuzzy"| Spike["Vibe a spike first, then spec what you learned"]
Criteria -->|"Yes"| Agent{"Will an autonomous agent build it unattended?"}
Agent -->|"No, you are at the keyboard"| Mixed["Spec the risky parts, vibe the glue"]
Agent -->|"Yes"| Spec["Spec-driven: PRD, tasks, criteria, verify"]
Walk the branches:
- Does it survive past today? If the answer is no, stop reading and vibe it. A migration script you run once does not need a PRD.
- Can you write pass or fail criteria? If the problem is still fuzzy, you do not know enough to spec well. Vibe a spike to learn the shape, then write the spec from what you learned. The spike was reconnaissance, not the deliverable.
- Will an agent build it unattended? This is the hard cutoff. The moment a fresh-context agent is doing the work without you watching, you need the spec. There is no in between, because the agent has no judgment to fall back on and no way to ask you mid-iteration.
- You at the keyboard, code that survives? This is the common case, and it is mixed. Spec the parts where a wrong guess is expensive (auth, money, data integrity, public APIs). Vibe the glue where a mistake is cheap and obvious. You do not need acceptance criteria for a button’s hover color.
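The same branches can be written as a small function, which makes the ordering of the questions explicit. This is only a restatement of the flowchart. The type and field names (`Approach`, `Work`, `pickApproach`) are made up for the sketch.

```typescript
type Approach =
  | "vibe" // vibe code it, delete it later
  | "spike-then-spec" // vibe a spike first, then spec what you learned
  | "mixed" // spec the risky parts, vibe the glue
  | "spec-driven"; // PRD, tasks, criteria, verify

interface Work {
  survivesPastToday: boolean;
  canWritePassFailCriteria: boolean;
  builtByUnattendedAgent: boolean;
}

// Ask the questions in the same order as the flowchart.
function pickApproach(work: Work): Approach {
  if (!work.survivesPastToday) return "vibe";
  if (!work.canWritePassFailCriteria) return "spike-then-spec";
  return work.builtByUnattendedAgent ? "spec-driven" : "mixed";
}

// Usage: a one-time migration script versus an overnight agent run.
const migration = pickApproach({
  survivesPastToday: false,
  canWritePassFailCriteria: true,
  builtByUnattendedAgent: false,
});
const overnightRun = pickApproach({
  survivesPastToday: true,
  canWritePassFailCriteria: true,
  builtByUnattendedAgent: true,
});
```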
The mistake is treating this as ideology. Spec-driven purists waste hours writing criteria for code they will delete. Vibe coding diehards ship confident nonsense to production. The discipline is matching the rigor to the stakes.
How Ralph leans spec-driven
Ralph is built for the unattended case, so it leans spec-driven by design. It does not force you to write a PRD for a one-line fix, but the loop architecture assumes a spec exists when you run it for real.
The spec lives in two files under .agent/prd/. PRD.md is the full document: goals, constraints, core features, technical stack, security considerations, and assumptions. SUMMARY.md is the short executive overview sent to the agent every iteration so it reorients fast without rereading the entire PRD. Long document for depth, short summary for the working context on each pass.
You do not write all of this by hand. The prd-creator skill, run in plan mode, interviews you one question at a time, researches the problem, and writes both files plus a task list. The details of writing that document, and acceptance criteria an agent can actually verify, are in how to write a PRD an AI agent can actually build from.
The PRD then decomposes into a task lookup table. tasks.json holds the list, and each tasks/TASK-{ID}.json carries its own description, dependencies, and an acceptanceCriteria array. This is the structure that lets a loop grind through hundreds of tasks without losing track, covered in task lookup tables for agents.
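To make the shape concrete, here is a hedged sketch of what a task record and a next-task lookup might look like. Only description, dependencies, and acceptanceCriteria are named in the text. The `id`, `priority`, and `passes` fields, the rule that dependencies must pass first, and the `nextTask` function are assumptions for illustration, not Ralph's actual schema.

```typescript
// Fields named in the text: description, dependencies, acceptanceCriteria.
// id, priority, and passes are assumed names for this sketch.
interface Task {
  id: string;
  description: string;
  dependencies: string[];
  acceptanceCriteria: string[];
  priority: number; // assumption: lower number = higher priority
  passes: boolean;
}

// Pick the highest-priority incomplete task whose dependencies all pass.
function nextTask(tasks: Task[]): Task | undefined {
  const passing = new Set(tasks.filter((t) => t.passes).map((t) => t.id));
  return tasks
    .filter((t) => !t.passes && t.dependencies.every((d) => passing.has(d)))
    .sort((a, b) => a.priority - b.priority)[0];
}

const tasks: Task[] = [
  { id: "TASK-001", description: "Create users table", dependencies: [],
    acceptanceCriteria: ["migration runs cleanly"], priority: 1, passes: true },
  { id: "TASK-002", description: "Login endpoint", dependencies: ["TASK-001"],
    acceptanceCriteria: ["wrong password returns 401"], priority: 2, passes: false },
  { id: "TASK-003", description: "Settings page", dependencies: ["TASK-002"],
    acceptanceCriteria: ["page renders for a logged-in user"], priority: 1, passes: false },
];

const next = nextTask(tasks);
```

In this sample, TASK-003 has the better priority number but is blocked on TASK-002, so the lookup returns TASK-002.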
    npx @pageai/ralph-loop
    ./ralph.sh -n 50

Each iteration the loop finds the highest-priority incomplete task, works its steps, runs tests and linting and type checking, takes a screenshot, flips the task to passing only when the gate is green, and commits. The verification stack is Playwright, Vitest, TypeScript, ESLint, and Prettier. The agent does not get to declare victory on a vibe. It declares victory when the criteria pass, and the loop stops on the completion promise.
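The "gate is green" rule can be sketched as a pure function over check results. The check names mirror the verification stack listed above. `CheckResult`, `gateIsGreen`, `applyGate`, and the convention that exit code 0 means a pass are hypothetical, and the real loop's bookkeeping may differ.

```typescript
type CheckName = "playwright" | "vitest" | "typescript" | "eslint" | "prettier";

interface CheckResult {
  name: CheckName;
  exitCode: number; // assumption: 0 means the check passed
}

const REQUIRED_CHECKS: CheckName[] = [
  "playwright",
  "vitest",
  "typescript",
  "eslint",
  "prettier",
];

// Green only if every required check ran and exited with 0.
// A check that is missing from the results counts as a failure.
function gateIsGreen(results: CheckResult[]): boolean {
  return REQUIRED_CHECKS.every((name) =>
    results.some((r) => r.name === name && r.exitCode === 0),
  );
}

// Flip the task to passing only when the gate is green; otherwise return it unchanged.
function applyGate<T extends { passes: boolean }>(task: T, results: CheckResult[]): T {
  return gateIsGreen(results) ? { ...task, passes: true } : task;
}
```

Treating a missing check as a failure is a deliberate choice in this sketch: a gate that passes when a check was silently skipped is another form of accepting work on a vibe.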
That is the whole point of the comparison in one workflow. The spec is what lets the machine verify its own output, which is what lets you close the laptop. Vibe coding is the right tool when nobody inherits the code. The moment an agent inherits it, the spec stops being overhead and starts being the only thing keeping the loop honest.
Where to go next
If you are deciding how much structure a piece of work deserves, read across the spec-driven cluster:
- Spec-driven development with AI for the full Specify, Plan, Tasks, Implement workflow.
- How to write a PRD an AI agent can actually build from for the document that sits at the top of it.
- Task lookup tables for agents for scaling a spec to hundreds of tasks.
For the loop that reads your spec on every pass and why fresh context changes the rules, read what is the Ralph technique.
Frequently asked questions
What is the difference between spec-driven development and vibe coding?
Vibe coding writes code by feel: you prompt with loose intent, read the output, and accept it because it looks plausible and runs. Spec-driven development writes the specification first, so the code exists to satisfy concrete goals, explicit constraints, and verifiable acceptance criteria. The defining difference is whether done is a feeling or a checkable condition a machine can confirm by running a command and reading the output.
Is vibe coding ever the right choice?
Yes, when the code is disposable and nobody inherits it. Spikes you intend to delete, one-off scripts, demos, hackathon code, and personal tools with one user are all fine to vibe. The overhead of a spec is not worth it when the blast radius of a mistake is your own afternoon. Vibe a spike to learn a fuzzy problem, then write the spec from what you learned.
Why do autonomous AI agents need a spec instead of vibe coding?
An autonomous agent reads only what is on disk, and a Ralph loop starts each iteration with a fresh context window that has no memory of previous iterations. When the spec is silent, the agent invents an answer and proceeds with full confidence, and that guess compounds across a long run. Verifiable acceptance criteria give the loop something to check and a real stop condition, so it can honestly signal completion instead of guessing it is done.
What does vibe coding cost once a project scales?
Three things: rework, because code accepted on a vibe encodes a misunderstanding you later re-derive and rewrite; drift, because every implicit decision gets answered differently across files with no source of truth; and unverifiable output, because if you cannot state how to check correctness, a person has to read every diff by hand. That manual review is the exact bottleneck an autonomous loop is supposed to remove.
How does Ralph use spec-driven development?
Ralph keeps the spec in .agent/prd/PRD.md and a short SUMMARY.md sent to the agent each iteration, generated by the prd-creator skill in plan mode. The PRD decomposes into a task lookup table where each tasks/TASK-{ID}.json carries acceptance criteria. Every iteration the loop works one task, runs Playwright, Vitest, TypeScript, ESLint, and Prettier, and only marks the task passing when the gate is green, then commits and stops on a completion promise.