BabyAGI: The Three-Hour Project That Kicked Off the AI Agent Era

In April 2023, Yohei Nakajima, a venture capitalist at Untapped Capital, spent about three hours across two days building something with GPT-4. He fed it roughly 50 prompts. It generated the code, wrote its own research paper, produced flowcharts, and composed a Twitter thread announcing itself. He called it BabyAGI.

Within days, AI safety researchers were debating it. Friends compared it to Skynet. When Nakajima asked it to make “as many paperclips as possible” (the classic AI alignment thought experiment), it autonomously wrote a safety protocol. The guy who originally proposed the paperclip apocalypse scenario was impressed.

BabyAGI wasn’t AGI. Nakajima said so himself. But it was the first widely shared autonomous agent framework that could create its own tasks, execute them, and loop back to create more. That pattern, task creation followed by execution followed by reprioritization, now underpins most major agent frameworks in production.

What BabyAGI Actually Does

The original BabyAGI ran a tight loop with three components:

Execution agent: completes a task using an LLM
Task creation agent: looks at the result and generates new tasks
Prioritization agent: reorders the task queue based on the overall objective

It used Pinecone to store embedded task-result pairs for context during planning and execution. The whole thing was surprisingly compact. You could read the entire codebase in one sitting.
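
As a rough illustration of the storage idea, the sketch below keeps task-result pairs in memory and retrieves the closest matches by cosine similarity. It is not the original code: the bag-of-words embed function stands in for a real embedding model, the ContextStore class stands in for Pinecone, and both names are made up for this example.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ContextStore:
    """In-memory stand-in for a vector database of task-result pairs."""

    def __init__(self):
        self.items = []  # list of (vector, task, result)

    def add(self, task, result):
        self.items.append((embed(task + " " + result), task, result))

    def query(self, text, top_k=2):
        """Return the top_k stored (task, result) pairs most similar to text."""
        q = embed(text)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [(task, result) for _, task, result in ranked[:top_k]]


store = ContextStore()
store.add("research paperclip suppliers", "found three wire suppliers")
store.add("draft safety protocol", "wrote rules limiting resource use")
store.add("plan launch tweet", "outlined a five tweet thread")

top = store.query("which suppliers sell wire", top_k=1)
print(top)
```

A real setup would swap the toy vectors for model embeddings and the list for a vector index, but the retrieval step would look much the same: embed the query, rank stored pairs, pass the best ones to the LLM as context.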

The insight was simple but it stuck: if you give an LLM the ability to create its own to-do list and work through it, you get something that looks like autonomous behavior. It wasn’t actually autonomous in any deep sense. It was a loop with an LLM doing the thinking at each step. But that loop turned out to be the design pattern that mattered.
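
To make the loop concrete, here is a minimal sketch of the create-execute-reprioritize cycle. It is an illustration under assumptions, not the archived code: fake_llm is a placeholder for a real model call, the depth limit exists only so the example terminates, and every function name here is hypothetical.

```python
from collections import deque


def fake_llm(prompt):
    """Placeholder for a real LLM call; returns a canned string."""
    return f"result for: {prompt}"


def execute(objective, task):
    """Execution agent: complete one task."""
    return fake_llm(f"{objective} | {task}")


def create_tasks(task, result, depth, max_depth=2):
    """Task creation agent: propose follow-up tasks from a result.

    A real agent would ask the LLM; here each task spawns one follow-up
    until max_depth is reached, so the loop is guaranteed to stop.
    """
    if depth >= max_depth:
        return []
    return [(f"follow up on '{task}'", depth + 1)]


def prioritize(tasks):
    """Prioritization agent: reorder the queue (here: shallowest first)."""
    return deque(sorted(tasks, key=lambda t: t[1]))


def run(objective, first_task, max_steps=10):
    """Run the loop until the queue is empty or max_steps is hit."""
    queue = deque([(first_task, 0)])
    log = []
    while queue and len(log) < max_steps:
        task, depth = queue.popleft()
        result = execute(objective, task)
        log.append((task, result))
        queue.extend(create_tasks(task, result, depth))
        queue = prioritize(queue)
    return log


log = run("make a to-do list app", "write a plan")
for task, result in log:
    print(task, "->", result)
```

The max_steps cap is the part worth noticing. Without some stopping rule, a loop that keeps inventing its own tasks has no natural end.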

The Rewrite: functionz

The original BabyAGI has been archived. The current version is a complete rewrite centered on something called functionz, a database-backed function management framework.

Instead of the task loop, the new BabyAGI focuses on self-building agents that can register, store, and execute functions from a database. It tracks dependencies between functions using a graph structure, manages API keys and secrets, and includes a web dashboard at localhost:8080/dashboard for monitoring everything.
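
The idea of a database-backed function store with dependency tracking can be sketched in a few dozen lines. What follows is not the functionz API. It is a toy version built on sqlite3, and the register, load, and call helpers are hypothetical names chosen for this example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE functions (name TEXT PRIMARY KEY, source TEXT, deps TEXT)")


def register(name, source, deps=()):
    """Store a function's source code and the names it depends on."""
    db.execute(
        "INSERT OR REPLACE INTO functions VALUES (?, ?, ?)",
        (name, source, json.dumps(list(deps))),
    )


def load(name, namespace=None, seen=None):
    """Load a function into a namespace, loading its dependencies first."""
    namespace = {} if namespace is None else namespace
    seen = set() if seen is None else seen
    if name in seen:
        return namespace  # already visited (also guards against cycles)
    seen.add(name)
    row = db.execute(
        "SELECT source, deps FROM functions WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"function {name!r} is not registered")
    source, deps = row
    for dep in json.loads(deps):
        load(dep, namespace, seen)
    exec(source, namespace)  # defines the function inside the namespace
    return namespace


def call(name, *args):
    """Resolve dependencies from the database, then run the function."""
    return load(name)[name](*args)


register("world", "def world():\n    return 'world'")
register(
    "hello_world",
    "def hello_world():\n    return f'Hello {world()}!'",
    deps=["world"],
)

greeting = call("hello_world")
print(greeting)
```

Storing source code rather than live objects is what lets an agent add functions at runtime. It also means the store executes whatever is in the database, which is one reason generated code deserves a review before it runs.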

The two headline features are experimental self-building agents:

process_user_input: Figures out whether existing functions can handle a request or generates new ones
self_build: Creates task variations and generates handler functions for them

You can tell it something like “grab today’s scores from ESPN and email them to me,” and it will generate the necessary functions, register them, and chain them together. In theory. In practice, the generated code is minimal and often needs manual cleanup.
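
The reuse-or-generate decision can be pictured with a small sketch. This is a simplified stand-in, not the real process_user_input: keyword matching replaces the LLM's judgment, generate_function returns a stub instead of generated code, and all names below are hypothetical.

```python
def generate_function(request):
    """Placeholder for LLM code generation; returns a stub handler."""
    def handler():
        return f"TODO: handle '{request}'"
    return handler


def process_request(request, registry):
    """Reuse a registered function if its keywords match, else generate one."""
    words = set(request.lower().split())
    for name, (keywords, fn) in registry.items():
        if keywords <= words:  # every keyword appears in the request
            return name, fn
    # Nothing matched: create a new handler and register it for next time.
    name = "generated_" + str(len(registry))
    fn = generate_function(request)
    registry[name] = (words, fn)
    return name, fn


registry = {
    "send_email": ({"email"}, lambda: "email sent"),
}

reused, _ = process_request("email me the report", registry)
created, handler = process_request("grab scores from espn", registry)
print(reused, created, handler())
```

The stub handler mirrors the caveat above: a generated function is a starting point that still needs a person to fill in and check the real logic.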

Installation is one line:

pip install babyagi

Then a few lines of Python to spin up the dashboard:

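import babyagi

if __name__ == "__main__":
    # Mount the dashboard at /dashboard; 0.0.0.0 listens on all network interfaces
    app = babyagi.create_app('/dashboard')
    app.run(host='0.0.0.0', port=8080)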

Why It’s Not in Our Tools Database

BabyAGI sits at 22,100 GitHub stars and runs under an MIT license. By those metrics alone you’d expect it on our tools page. We left it off for a few reasons.

First, Nakajima himself says it’s not production-ready. The README includes the line: “This is a framework built by Yohei who has never held a job as a developer.” That kind of honesty is rare and worth respecting. He’s not selling you something. He’s sharing an idea.

Second, the framework is maintained by one person working nights and weekends. PR management has been sparse. For anyone building real products, that’s a risk you need to weigh against the 22k stars.

Third, the agent framework space has moved well past BabyAGI’s original scope. CrewAI has $18M in funding, 100,000+ certified developers, and handles 60 million agent executions monthly. LangGraph provides graph-based orchestration for stateful, long-running agents. AutoGen offers multi-agent conversation patterns backed by Microsoft. These are production-grade frameworks with teams behind them. BabyAGI is a one-person experiment that happens to have been first.

Where BabyAGI Still Matters

None of that diminishes what BabyAGI represents.

If you’re trying to understand how AI agents work at a fundamental level, BabyAGI is still one of the best starting points. The task loop pattern is clean enough to fit in your head. You can trace the flow from task creation to execution to reprioritization without getting lost in abstraction layers. That clarity is hard to find in frameworks built for production.

For students, hobbyists, and anyone building their first agent, BabyAGI is a better starting point than jumping straight into CrewAI or LangGraph. You’ll learn the core concepts without fighting an opinionated framework.

And the broader lesson from BabyAGI’s origin story is worth remembering: a VC who had never worked as a developer used GPT-4 to build, in about three hours, a framework that defined an entire category of software. That was April 2023. Vibe coding wasn’t even a phrase yet. Nakajima was doing it before anyone had a name for it.

The Lineage

It’s worth tracing what BabyAGI’s loop pattern turned into. The create-execute-reprioritize cycle shows up everywhere now:

Claude Code’s agent mode runs a similar loop: understand the task, plan steps, execute, evaluate, loop back
OpenAI’s Codex agents break down coding tasks into subtasks, execute them, and self-correct
Devin and similar coding agents use task decomposition as their core architecture
CrewAI’s crew pattern is a multi-agent version of the same idea, with each agent owning a piece of the task queue

BabyAGI didn’t invent the concept of an agent loop. Researchers had described similar patterns before. But it was the first implementation that a non-researcher could run, understand, and build on. That’s why it mattered. That’s why more than 22,000 people starred it.

Bottom Line

BabyAGI is an educational project that accidentally defined a category. It’s not the tool you use to ship products. It’s the tool that helps you understand how the tools you use to ship products actually work.

If you want to build agents, start here. Read the code. Run the loop. Then graduate to something with a team behind it.
