Karpathy's Autoresearch: AI Agents Train LLMs Overnight

Karpathy's Autoresearch: Let AI Agents Revolutionize Your Model Training

The era of manual AI research is over. Andrej Karpathy's autoresearch repository (20.6k stars) introduces a groundbreaking approach: AI agents autonomously improve LLMs overnight without human intervention.

The Revolutionary Concept

Instead of researchers manually tweaking hyperparameters, architecture, and optimizers, autoresearch hands control to AI agents. The workflow:

  1. Agent edits train.py (GPT model, Muon+AdamW optimizer, training loop)
  2. Runs 5-minute training (fixed wall-clock budget)
  3. Evaluates on val_bpb (bits per byte, lower = better)
  4. Keeps improvements, discards failures
  5. Repeats ~100x overnight

Wake up to optimized models and detailed experiment logs.
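
The keep-or-discard loop in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: `run_search`, `toy_propose`, and `toy_evaluate` are hypothetical names, the "agent edit" is replaced by a random learning-rate tweak, and the 5-minute training run is replaced by a made-up objective so the example finishes instantly.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ExperimentLog:
    """Record of every attempt as (step, val_bpb, kept)."""
    entries: List[Tuple[int, float, bool]] = field(default_factory=list)


def run_search(
    propose: Callable[[dict], dict],
    evaluate: Callable[[dict], float],
    baseline: dict,
    n_experiments: int = 100,
) -> Tuple[dict, float, ExperimentLog]:
    """Greedy loop: accept a candidate only if its val_bpb is lower than the best so far."""
    best_config = dict(baseline)
    best_bpb = evaluate(best_config)
    log = ExperimentLog()
    for step in range(n_experiments):
        candidate = propose(best_config)  # stands in for the agent editing train.py
        bpb = evaluate(candidate)         # stands in for a 5-minute run plus evaluation
        kept = bpb < best_bpb             # lower val_bpb is better
        if kept:
            best_config, best_bpb = candidate, bpb
        log.entries.append((step, bpb, kept))
    return best_config, best_bpb, log


# Toy stand-ins (assumptions for illustration only).
_rng = random.Random(0)


def toy_propose(config: dict) -> dict:
    """Randomly scale the learning rate of the current best config."""
    new = dict(config)
    new["lr"] = config["lr"] * _rng.uniform(0.5, 2.0)
    return new


def toy_evaluate(config: dict) -> float:
    """Made-up objective with its minimum near lr = 0.02; not a real training run."""
    return 1.0 + (math.log10(config["lr"]) + 1.7) ** 2


best_config, best_bpb, log = run_search(toy_propose, toy_evaluate, {"lr": 0.001})
print(f"best lr={best_config['lr']:.4g}, val_bpb={best_bpb:.4f}, "
      f"kept {sum(k for _, _, k in log.entries)} of {len(log.entries)} experiments")
```

Because a candidate is only kept when it beats the current best, the best score never gets worse from one experiment to the next, and the log still records every discarded attempt for review in the morning.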

Minimal Setup

uv sync
uv run prepare.py  # Download data + train tokenizer
uv run train.py    # Manual test (~5 min)

Core files:

  • prepare.py – Data prep + utilities (fixed)
  • train.py – Agent's playground (model + training)
  • program.md – Agent instructions (human-editable)

Production-Ready Design Choices

  ✅ Single editable file keeps diffs reviewable
  ✅ Fixed 5-min budget = fair architecture comparisons
  ✅ Self-contained – PyTorch + minimal deps
  ✅ Vocab-independent metric (val_bpb)
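
Two of these choices, the fixed wall-clock budget and the bits-per-byte metric, are easy to illustrate. The sketch below is a hedged approximation rather than the repository's evaluation code: `bits_per_byte` and `train_for_budget` are hypothetical helper names. It assumes the validation loss is available as summed cross-entropy in nats and that the byte length of the validation text is known.

```python
import math
import time
from typing import Callable


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats, over all tokens) to bits per byte of raw text."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_loss_nats / (math.log(2) * total_bytes)


def train_for_budget(
    step_fn: Callable[[], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call step_fn until the wall-clock budget is spent; return the number of steps run."""
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step_fn()
        steps += 1
    return steps


# Same 1000-byte text, two hypothetical tokenizers:
# A: 250 tokens at 4 bits/token; B: 500 tokens at 2 bits/token.
bpb_a = bits_per_byte(250 * 4 * math.log(2), 1000)
bpb_b = bits_per_byte(500 * 2 * math.log(2), 1000)
print(bpb_a, bpb_b)  # per-token losses differ, per-byte scores match
```

Dividing by bytes rather than tokens is what lets runs with different vocabularies be compared directly, and stopping on elapsed time rather than step count means a slower architecture simply gets fewer steps within the same budget.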

Quick Start for H100 Users

# 1. Install (Python 3.10+)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

# 2. Prep data (~2 min)
uv run prepare.py

# 3. Test run (~5 min)
uv run train.py

Spin up Claude/Codex:

"Hi, read program.md and kick off a new experiment!"

Smaller Hardware? Try These Forks

Pro tips for low-compute: TinyStories dataset, vocab_size=1024, DEPTH=4, MAX_SEQ_LEN=256.
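
As a rough illustration of why these settings help, the sketch below collects them in one place and computes a simple proxy for attention cost. `LowComputeConfig` and `attention_score_entries` are hypothetical names rather than anything from the repository, and the proxy assumes one attention layer per unit of depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowComputeConfig:
    """Illustrative container for the low-compute settings; not a class from the repo."""
    dataset: str = "TinyStories"
    vocab_size: int = 1024
    depth: int = 4
    max_seq_len: int = 256


def attention_score_entries(cfg: LowComputeConfig) -> int:
    """Attention score entries per head and per sequence, summed over layers.

    Assumes one attention layer per unit of depth; a rough proxy for cost only.
    """
    return cfg.depth * cfg.max_seq_len ** 2


cfg = LowComputeConfig()
print(cfg)
print(attention_score_entries(cfg))
```

The proxy grows linearly with depth and quadratically with sequence length, so halving MAX_SEQ_LEN cuts it to a quarter.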

Why This Changes Everything

  • Democratizes research: Single GPU → frontier progress
  • Platform-optimized: Finds best model for your hardware
  • Agent-programmable: Edit program.md to add multi-agent swarms
  • MIT licensed: Fork, extend, contribute

GitHub Repo (20.6k ⭐) – The future of AI research has arrived.
