Task Lookup Tables: Scaling Autonomous Agents to Hundreds of Tasks

A lean tasks.json lookup table pointing to one selected TASK-ID.json spec file that an agent loads per iteration.

Apr 30, 2026 - 14 min read - 2900 words

Creator of RalphLoop.sh, founder of PageAI

A flat task list does not scale. Once you push past a few dozen entries, a single file that holds every task plus every step, criterion, and note becomes too big to load into an agent’s context window on every iteration. The fix is a task lookup table: a lean index of tasks that points to a separate detailed spec file per task. The agent reads the small index to decide what to do next, then loads exactly one spec file to do it. That separation is what lets an autonomous loop work through hundreds of tasks without choking on its own backlog.

This post shows how Ralph models that pattern with .agent/tasks.json and .agent/tasks/TASK-{ID}.json, why the split scales, how status and priority selection work each iteration, and how you grow the table over time. It assumes you already understand spec-driven development with AI and want the data structure that makes a long run survivable.

What a task lookup table is

A lookup table is an index. Instead of one giant document that mixes the list of work with the detail of each item, you keep two layers.

The first layer is a flat array where each entry is small: an id, a title, a category, a pointer to the detail file, and a status flag. That is the lookup table. It answers one question fast: what is left to do, and which thing is next.

The second layer is one spec file per task. Each file is the full contract for a single unit of work: description, acceptance criteria, ordered steps, dependencies, complexity, and technical notes. The agent only opens this when it has already decided to work that task.

The reason to split is the same reason a database keeps an index separate from the rows. You scan the cheap thing to find the right record, then you read the expensive thing once. An agent that has to load 200 fully detailed tasks just to pick the next one burns its context window before it writes a line of code.

How Ralph models it: tasks.json plus per-task specs

Ralph keeps the index in .agent/tasks.json and the detail in .agent/tasks/TASK-{ID}.json. The root file is deliberately thin. Each entry carries only what the loop needs to choose a task.

[
  {
    "id": "TASK-1",
    "title": "Verify project prerequisites and access",
    "category": "setup",
    "specFilePath": ".agent/tasks/TASK-1.json",
    "passes": false
  },
  {
    "id": "TASK-2",
    "title": "User table with authentication fields",
    "category": "data-model",
    "specFilePath": ".agent/tasks/TASK-2.json",
    "passes": false
  },
  {
    "id": "TASK-3",
    "title": "POST /api/auth/register creates new user account",
    "category": "api-endpoint",
    "specFilePath": ".agent/tasks/TASK-3.json",
    "passes": false
  }
]

Five fields, nothing more. The specFilePath is the pointer that turns this flat list into a lookup table. The passes flag is the status. Everything heavy lives behind the pointer, in the per-task file.
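
If you read the index from your own tooling, the five fields map onto a small type. The following is a minimal TypeScript sketch, not Ralph's own code: the names TaskIndexEntry and isTaskIndexEntry are made up for illustration, and only the field names come from the example above.

```typescript
// Shape of one entry in .agent/tasks.json, taken from the example above.
// The type and function names are hypothetical, chosen for this sketch.
interface TaskIndexEntry {
  id: string;
  title: string;
  category: string;
  specFilePath: string;
  passes: boolean;
}

// Runtime check for data parsed from JSON, which arrives as `unknown`.
function isTaskIndexEntry(value: unknown): value is TaskIndexEntry {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.id === "string" &&
    typeof record.title === "string" &&
    typeof record.category === "string" &&
    typeof record.specFilePath === "string" &&
    typeof record.passes === "boolean"
  );
}

const parsedIndex: unknown = JSON.parse(`[
  {
    "id": "TASK-1",
    "title": "Verify project prerequisites and access",
    "category": "setup",
    "specFilePath": ".agent/tasks/TASK-1.json",
    "passes": false
  }
]`);

// True only when the file is an array and every entry carries all five fields.
const indexIsValid =
  Array.isArray(parsedIndex) && parsedIndex.every(isTaskIndexEntry);
```

A check like this is cheap to run at the top of an iteration, so a malformed index fails loudly before the agent picks a task.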

Here is what one of those spec files holds. This is TASK-3.json, the detail behind the one-line index entry above.

{
  "id": "TASK-3",
  "title": "POST /api/auth/register creates new user account",
  "category": "api-endpoint",
  "description": "Validate input, hash the password, store the user, and return a success response.",
  "acceptanceCriteria": [
    "POST with a valid email and password returns 201 with the user id and email",
    "Invalid email format returns 400 with the error text Please enter a valid email",
    "Password shorter than 8 characters returns 400",
    "Duplicate email returns 409, not a generic 500",
    "The stored password starts with $2b$ and is never the plaintext value"
  ],
  "steps": [
    {
      "step": 1,
      "description": "Add the register route handler",
      "details": "Validate with a zod schema, hash with bcrypt, insert into the users table.",
      "pass": false
    },
    {
      "step": 2,
      "description": "Write Vitest cases for every acceptance criterion",
      "details": "Cover valid registration, invalid email, short password, duplicate email, and the stored hash prefix.",
      "pass": false
    }
  ],
  "dependencies": ["TASK-1", "TASK-2"],
  "estimatedComplexity": "medium",
  "technicalNotes": [
    "Never log passwords, even in error branches",
    "Return 409 on duplicate email rather than a generic 500"
  ]
}

Notice the asymmetry. The index entry for TASK-3 is five fields. The spec file runs past thirty lines. Multiply that across a real project and the savings are obvious: a 200-task project has a tasks.json of roughly a thousand lines of thin entries, while the detail sits in 200 separate files the agent never loads all at once.

The diagram below is the whole idea. The loop scans the lean table, picks one id, follows its specFilePath, and loads only that file into the working context.

flowchart LR
  Index["tasks.json: lean index, one thin entry per task"]
  Index --> Scan["Scan for highest-priority task where passes is false"]
  Scan --> Pick["Selected: TASK-3"]
  Pick --> Path["Follow specFilePath"]
  Path --> Spec["tasks/TASK-3.json: full spec loaded into context"]
  Spec --> Work["Work the steps, run the gate"]
  Work --> Flip["Set passes true in tasks.json, commit"]
  Index -. "not loaded" .-> Rest["tasks/TASK-4.json ... TASK-200.json"]

The dotted edge is the point. Every other spec file stays on disk, unread, until its turn comes. The agent never pays the token cost of a task it is not working.

Why this scales to hundreds of tasks

The scaling property comes from one fact: the agent only loads the one task it needs. Every iteration of the Ralph loop starts the agent with a fresh context window. It does not carry the previous iteration in chat history. It rebuilds its understanding from files. So the cost of each iteration is whatever the agent reads off disk, not the size of the whole project.

Walk through the math. Suppose the average per-task spec is 40 lines and the project has 200 tasks. If the agent had to load every detailed task to orient itself, that is 8000 lines on every single iteration, repeated 200 times. With a lookup table, each iteration loads the thin index (around 1000 lines of five-line entries) plus exactly one spec file (40 lines). The index grows by a few lines per task, but the detail loaded per iteration stays at one file no matter how large the table gets.
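
Here is the same arithmetic as a few lines of TypeScript, using the round numbers from the paragraph above plus an assumed five lines per index entry. These are illustrative estimates, not measurements from a real run.

```typescript
// Round numbers from the worked example; the per-entry figure is an assumption.
const taskCount = 200;
const specLinesPerTask = 40;
const indexLinesPerTask = 5;

// Flat file: every iteration loads every task's full detail.
const flatLinesPerIteration = taskCount * specLinesPerTask; // 8000

// Lookup table: every iteration loads the whole index plus one spec file.
const indexLines = taskCount * indexLinesPerTask; // 1000
const lookupLinesPerIteration = indexLines + specLinesPerTask; // 1040

// Totals over a full run at one task per iteration.
const flatLinesPerRun = flatLinesPerIteration * taskCount; // 1600000
const lookupLinesPerRun = lookupLinesPerIteration * taskCount; // 208000
```

Under these assumptions the lookup table reads roughly an eighth of what the flat file reads over the whole run, and the gap widens as specs get longer.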

That is why a flat, fully detailed task file hits a wall. It works fine at ten tasks. At a hundred it crowds out the actual code in the context window, and the agent starts skimming, missing acceptance criteria, and contradicting earlier decisions. Splitting the index from the detail keeps the working context flat as the project grows.

The token economics reinforce this. Tokens in the context window cost money and attention on every pass. You do not want to spend that budget reloading 199 tasks the agent will not touch this iteration. A lean index plus one spec is the minimum the agent needs to choose work and do it correctly. This is the same instinct behind keeping a short SUMMARY.md for the PRD instead of resending the full document every iteration: load the index constantly, load the detail on demand.

Status tracking with the passes flag

Status lives in two places, and the split mirrors the index-and-detail structure.

At the index level, each entry in tasks.json has a single passes boolean. It starts false. The agent only flips it to true after the work is built and verified. The loop reads this flag to decide what is finished and what remains. Scanning for incomplete work is a pass over the lean index, which stays cheap even at hundreds of entries.

At the detail level, each step inside a spec file has its own pass boolean. These track progress within a single task: step one done, step two not yet. A task is not complete until every step passes and the acceptance criteria hold. Only then does the top-level passes in the index flip.

The rule that protects this system is strict: never flip passes to true until the work is verified. Tasks are generated with passes: false and stay false until the work passes the verification stack the loop assumes: Playwright for end-to-end tests, Vitest for unit tests, TypeScript for types, ESLint for lint, and Prettier for format. The repo mantra is blunt: if you didn’t test it, it doesn’t work. A status flag that flips on a vibe instead of a passing gate turns the lookup table into a lie, and a fresh-context agent will trust that lie on the next iteration.
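
As a sketch of that rule, the check below reports a task as ready to flip only when every step passes and the verification gate passed. The function name readyToFlip and the gatePassed parameter are hypothetical. How Ralph actually feeds the gate result into the flag is not shown in this post.

```typescript
// One step inside a spec file, matching the TASK-3.json example.
interface SpecStep {
  step: number;
  description: string;
  details: string;
  pass: boolean;
}

// Hypothetical helper: decides whether the index-level passes flag may flip.
// `gatePassed` stands in for the outcome of the tests, lint, and type checks.
function readyToFlip(steps: SpecStep[], gatePassed: boolean): boolean {
  // A spec with no steps is treated as not ready; that is a choice made
  // for this sketch rather than a documented Ralph rule.
  return steps.length > 0 && steps.every((s) => s.pass) && gatePassed;
}

const task3Steps: SpecStep[] = [
  { step: 1, description: "Add the register route handler", details: "", pass: true },
  { step: 2, description: "Write Vitest cases for every acceptance criterion", details: "", pass: false },
];

// Step 2 is still false, so the flag stays false even if the gate is green.
const canFlipTask3 = readyToFlip(task3Steps, true);
```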

Because status is just a flag in a file, the run is resumable and auditable. Any fresh agent on any machine can read tasks.json, see which entries are still false, and pick up exactly where the last one stopped. Watching those flags flip across a run, alongside the per-iteration history and screenshots, is the basis of observability for autonomous coding agents. The table is both the work queue and the progress report.
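
Because the index is plain JSON, a progress report is a single pass over the flags. Here is a minimal sketch, with summarizeProgress as a hypothetical helper name rather than anything Ralph ships:

```typescript
interface StatusEntry {
  id: string;
  passes: boolean;
}

interface ProgressSummary {
  total: number;
  done: number;
  remainingIds: string[];
}

// Counts finished entries and lists the ids a resumed run still has to work.
function summarizeProgress(entries: StatusEntry[]): ProgressSummary {
  const remainingIds = entries
    .filter((entry) => !entry.passes)
    .map((entry) => entry.id);
  return {
    total: entries.length,
    done: entries.length - remainingIds.length,
    remainingIds,
  };
}

const progressReport = summarizeProgress([
  { id: "TASK-1", passes: true },
  { id: "TASK-2", passes: true },
  { id: "TASK-3", passes: false },
]);
// progressReport is { total: 3, done: 2, remainingIds: ["TASK-3"] }
```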

Priority selection each iteration

Picking the next task is a scan over the index, and the order is not arbitrary. Each iteration, the loop finds the highest-priority incomplete task in tasks.json, then loads that one spec file and works its steps.

Two things shape the selection.

First, dependencies. Each spec file carries a dependencies array of task ids that must finish before it can start. TASK-3 (the register endpoint) lists ["TASK-1", "TASK-2"], so the loop will not select it until the prerequisite gate and the users table both show passes: true. The agent never builds on a foundation that is not there yet.

Second, the prerequisite gate. TASK-1 is always reserved for prerequisite verification: environment variable placeholders exist, database access works, required tools are authenticated, and any open gaps have an explicit proceed or block decision. Every downstream task that needs those prerequisites lists TASK-1 as a dependency. You do not want an agent discovering halfway through a 200-task run that it never had database credentials.

flowchart TD
  Start["Fresh context"] --> Read["Read tasks.json index"]
  Read --> Filter["Filter to passes false"]
  Filter --> Deps{"Dependencies satisfied?"}
  Deps -->|"no, skip"| Next["Try next candidate"]
  Next --> Deps
  Deps -->|"yes"| Select["Select highest-priority task"]
  Select --> Load["Load its TASK-ID.json spec"]
  Load --> Build["Work the steps"]
  Build --> Gate["Tests, lint, types, screenshot"]
  Gate -->|"fail"| Build
  Gate -->|"pass"| Update["Set passes true, commit"]
  Update --> Stop["Stop. Next iteration starts clean"]
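
The flowchart above can be sketched as a small selection function. Two assumptions are baked in and may not match Ralph exactly: priority is taken to be array order in tasks.json, and each task's dependencies (which this post stores in the spec files) are handed in as a prebuilt map. The names selectNextTask and dependenciesById are illustrative, and the dependency list for TASK-2 is demo data.

```typescript
interface SelectableEntry {
  id: string;
  passes: boolean;
}

function selectNextTask(
  index: SelectableEntry[],
  dependenciesById: Record<string, string[]>
): SelectableEntry | undefined {
  // Record which ids are already complete.
  const passesById: Record<string, boolean> = {};
  for (const entry of index) {
    passesById[entry.id] = entry.passes;
  }

  // Assumption: earlier entries in the array have higher priority.
  for (const entry of index) {
    if (entry.passes) continue;
    const dependencies = dependenciesById[entry.id] ?? [];
    const satisfied = dependencies.every((dep) => passesById[dep] === true);
    if (satisfied) return entry;
  }
  return undefined; // nothing left, or everything left is blocked
}

const demoIndex: SelectableEntry[] = [
  { id: "TASK-1", passes: true },
  { id: "TASK-2", passes: false },
  { id: "TASK-3", passes: false },
];
const demoDependencies: Record<string, string[]> = {
  "TASK-2": ["TASK-1"],
  "TASK-3": ["TASK-1", "TASK-2"],
};

// Selects TASK-2: TASK-3 stays blocked until TASK-2 passes.
const nextTask = selectNextTask(demoIndex, demoDependencies);
```

Returning undefined covers both "run complete" and "everything remaining is blocked". A real loop would probably want to tell those two cases apart.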

The loop completes exactly one task per invocation, commits, and stops. It never batches. That discipline is what keeps a long run reliable, and the reasoning behind it is covered in one task per iteration. The lookup table is what makes one-task-per-iteration cheap to execute: selection is a quick scan of flags and dependencies, not a re-read of the entire project.

You control how many of these iterations run. The default is 10. Run more with the -n flag.

npx @pageai/ralph-loop
./ralph.sh -n 50

If the table has 200 tasks and you cap the loop at 50 iterations, the run stops at the cap with exit code 1 (MAX_ITERATIONS) and the remaining tasks stay false in the index, ready for the next run to resume. Nothing is lost. The lookup table is the durable record.
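
The cap-and-resume behavior can be simulated in a few lines. This is a toy model, not the ralph.sh implementation: marking an entry true stands in for a full build, verify, and commit cycle, and the exit code 0 for a completed run is an assumption, since this post only names exit code 1.

```typescript
interface QueueEntry {
  id: string;
  passes: boolean;
}

const EXIT_COMPLETE = 0; // assumed value; not stated in this post
const EXIT_MAX_ITERATIONS = 1; // from the post

// Works at most `maxIterations` tasks, one per iteration, then reports why it stopped.
function runCappedLoop(queue: QueueEntry[], maxIterations: number): number {
  for (let iteration = 0; iteration < maxIterations; iteration++) {
    let pending: QueueEntry | undefined;
    for (const entry of queue) {
      if (!entry.passes) {
        pending = entry;
        break;
      }
    }
    if (pending === undefined) return EXIT_COMPLETE;
    pending.passes = true; // stand-in for build, verify, commit
  }
  return queue.every((entry) => entry.passes)
    ? EXIT_COMPLETE
    : EXIT_MAX_ITERATIONS;
}

const backlog: QueueEntry[] = [];
for (let i = 1; i <= 200; i++) {
  backlog.push({ id: `TASK-${i}`, passes: false });
}

const firstExit = runCappedLoop(backlog, 50); // 1: stopped at the cap
const remainingAfterFirst = backlog.filter((e) => !e.passes).length; // 150
const secondExit = runCappedLoop(backlog, 150); // 0: resumed and finished
```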

Adding tasks over time with the prd-creator skill

A lookup table is not a frozen document. You grow it as the project grows, and you do not hand-edit a 200-entry JSON file to do that.

Ralph ships a prd-creator skill that turns unstructured requirements into a PRD plus a task list. The first time you run it, it interviews you, writes .agent/prd/PRD.md and .agent/prd/SUMMARY.md, then generates tasks.json with one TASK-{ID}.json spec per task. TASK-1 is always the prerequisite gate, and every task is initialized with passes: false. For a typical project that is dozens to hundreds of entries, not five, because the skill keeps tasks small: anything too complex to finish in a short sitting gets split.

When you want to add a feature or fix a bug later, you run the skill again. It updates the PRD and appends new tasks to the index, each with its own spec file, each starting false. The completed entries keep their passes: true status, so the loop ignores them and works only the new false ones. The table grows without disturbing the finished work.
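
The append step boils down to adding thin entries without touching finished ones. The sketch below assumes new ids continue after the highest existing number. This post does not say how the prd-creator skill numbers tasks, so treat appendTasks, the numbering scheme, and the example titles as hypothetical.

```typescript
interface GrowableEntry {
  id: string;
  title: string;
  category: string;
  specFilePath: string;
  passes: boolean;
}

interface NewTaskInput {
  title: string;
  category: string;
}

// Assumption: ids are sequential and continue after the highest existing number.
function appendTasks(
  existing: GrowableEntry[],
  additions: NewTaskInput[]
): GrowableEntry[] {
  const highest = existing.reduce((max, entry) => {
    const n = parseInt(entry.id.replace("TASK-", ""), 10);
    return isNaN(n) ? max : Math.max(max, n);
  }, 0);

  const appended = additions.map((task, offset) => {
    const id = `TASK-${highest + offset + 1}`;
    return {
      id,
      title: task.title,
      category: task.category,
      specFilePath: `.agent/tasks/${id}.json`,
      passes: false, // new work always starts unverified
    };
  });

  // Existing entries pass through unchanged, so finished work keeps passes: true.
  return existing.concat(appended);
}

const grownIndex = appendTasks(
  [
    { id: "TASK-1", title: "Verify project prerequisites and access", category: "setup", specFilePath: ".agent/tasks/TASK-1.json", passes: true },
    { id: "TASK-2", title: "User table with authentication fields", category: "data-model", specFilePath: ".agent/tasks/TASK-2.json", passes: true },
    { id: "TASK-3", title: "POST /api/auth/register creates new user account", category: "api-endpoint", specFilePath: ".agent/tasks/TASK-3.json", passes: true },
  ],
  [
    { title: "Password reset request endpoint", category: "api-endpoint" },
    { title: "Password reset confirmation endpoint", category: "api-endpoint" },
  ]
);
// grownIndex has five entries; the two new ones are TASK-4 and TASK-5 with passes: false.
```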

Use the prd-creator skill in plan mode. Add a password reset flow to the
existing PRD and append the new tasks to .agent/tasks.json with one spec
file each under .agent/tasks/.

Run it in plan mode, where the agent is read-only and asks questions instead of writing code. The skill decomposes the new feature into atomic tasks the same way it did the first batch, which is its own discipline covered in breaking a PRD into atomic agent tasks. The result is a lookup table that accretes work over the life of the project while staying scannable, because the index entries stay thin no matter how many you add.

Where to go next

If you are building the spec that drives a long run, read across the spec-driven cluster:

Spec-driven development with AI for the full Specify, Plan, Tasks, Implement workflow the lookup table sits inside.
Breaking a PRD into atomic agent tasks for decomposing work into the packets that fill the table.
One task per iteration for the rule that makes the lookup table cheap to execute.

For watching the table fill in across a run, with logs, history, and screenshots, read observability for autonomous coding agents.

Frequently asked questions

What is a task lookup table for an AI agent?

It is a two-layer structure that separates the list of work from the detail of each item. The first layer is a lean index where each entry has an id, title, category, a pointer to a detail file, and a status flag. The second layer is one spec file per task with the full description, acceptance criteria, steps, and dependencies. The agent scans the cheap index to pick the next task, then loads only that one spec file to do the work.

Why does a flat task list stop scaling for autonomous agents?

A flat list that holds every task plus all of its detail gets too large to load into the context window on every iteration. At ten tasks it is fine. At a hundred it crowds out the actual code, so the agent skims, misses acceptance criteria, and contradicts earlier decisions. A lookup table keeps the working context flat because each iteration loads the thin index plus exactly one detailed spec, no matter how many tasks exist.

How does Ralph store tasks on disk?

The index is .agent/tasks.json, a flat array where each entry has id, title, category, specFilePath, and a passes flag. The detail for each task lives in .agent/tasks/TASK-{ID}.json with description, acceptance criteria, ordered steps, dependencies, estimated complexity, and technical notes. The agent reads these files fresh on every iteration, so the filesystem is the memory rather than the chat history.

How does the agent choose which task to work next?

Each iteration it scans tasks.json for the highest-priority entry where passes is false and whose dependencies are already satisfied. TASK-1 is always prerequisite verification, and downstream tasks list it as a dependency, so feature work never starts before access and environment are confirmed. The agent loads that one spec file, works the steps, runs the verification gate, then flips passes to true and commits.

How do I add more tasks to the lookup table over time?

Run the prd-creator skill again in plan mode. It updates the PRD and appends new tasks to tasks.json, each with its own spec file under .agent/tasks/ and each initialized with passes set to false. Completed entries keep their passes true status, so the loop ignores them and works only the new false ones. The table grows without disturbing finished work and stays scannable because the index entries stay thin.

Run your own Ralph loop

Ralph is a hackable script you point at your project. Install it and let an agent work through your task list.

npx @pageai/ralph-loop
