Karpathy's Autoresearch: AI Agents Train LLMs Overnight
Andrej Karpathy's autoresearch repo lets autonomous AI agents run LLM training experiments overnight. No hand-editing of training code required – agents modify train.py, run 5-minute experiments, and keep only the changes that lower validation bits per byte (val_bpb). Wake up to better models and detailed logs. The single-GPU, nanochat-based setup puts automated LLM research within reach of anyone with a capable NVIDIA GPU. Perfect for AI researchers who want to automate hyperparameter tuning, architecture search, and model optimization.
Karpathy's Autoresearch: Let AI Agents Revolutionize Your Model Training
Manual hyperparameter tweaking may soon be optional. Andrej Karpathy's autoresearch repository (20.6k stars) hands the experiment loop to AI agents, which autonomously modify, retrain, and evaluate a small LLM overnight without human intervention.
The Revolutionary Concept
Instead of researchers manually tweaking hyperparameters, architecture, and optimizers, autoresearch hands control to AI agents. The workflow:
- Agent edits train.py (GPT model, Muon + AdamW optimizer, training loop)
- Runs a 5-minute training job (fixed wall-clock budget)
- Evaluates on val_bpb (bits per byte, lower = better)
- Keeps improvements, discards failures
- Repeats ~100x overnight
Wake up to optimized models and detailed experiment logs.
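The keep/discard workflow above is essentially a greedy hill-climb on val_bpb. The sketch below illustrates that idea only; it is not the repo's code. `run_experiment` and `propose_change` are hypothetical stand-ins: the first returns a made-up val_bpb from a toy formula instead of training a model, and the second tweaks one config value instead of having an agent edit train.py.

```python
import random

def run_experiment(config):
    """Hypothetical stand-in for one 5-minute train.py run.

    Returns a made-up val_bpb from a toy formula; a real run would train
    the model and evaluate it on held-out data.
    """
    return 1.0 + 100 * (config["lr"] - 0.02) ** 2 + 0.01 * abs(config["depth"] - 8)

def propose_change(config, rng):
    """Hypothetical stand-in for the agent editing train.py: tweak one knob."""
    candidate = dict(config)  # copy so the current best config is untouched
    if rng.random() < 0.5:
        candidate["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    else:
        candidate["depth"] = max(1, config["depth"] + rng.choice([-1, 1]))
    return candidate

def research_loop(config, n_experiments=100, seed=0):
    """Greedy keep/discard loop: accept a change only if val_bpb improves."""
    rng = random.Random(seed)
    best_bpb = run_experiment(config)
    log = [{"name": "baseline", "config": config, "val_bpb": best_bpb, "kept": True}]
    for i in range(n_experiments):
        candidate = propose_change(config, rng)
        bpb = run_experiment(candidate)
        kept = bpb < best_bpb  # lower bits per byte is better
        if kept:
            config, best_bpb = candidate, bpb
        log.append({"name": f"exp{i:03d}", "config": candidate, "val_bpb": bpb, "kept": kept})
    return config, best_bpb, log

if __name__ == "__main__":
    best_config, best_bpb, log = research_loop({"lr": 0.1, "depth": 4})
    n_kept = sum(entry["kept"] for entry in log) - 1  # exclude the baseline
    print(f"kept {n_kept} of {len(log) - 1} changes; best val_bpb {best_bpb:.4f}: {best_config}")
```

The log plays the role of the "detailed experiment logs": every attempt is recorded, but only improvements become the new baseline for the next experiment.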
Minimal 4-File Setup
uv sync
uv run prepare.py # Download data + train tokenizer
uv run train.py # Manual test (~5 min)
Core files:
- prepare.py – data prep and runtime utilities (fixed; the agent never edits it)
- train.py – the agent's playground (model, optimizer, training loop)
- program.md – agent instructions (human-editable)
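The "~5 min" test run comes from train.py stopping on a wall-clock budget rather than after a fixed number of steps. A minimal sketch of such a loop, assuming a simple per-step time check (the real script's details, such as how startup time is excluded, may differ); `train_with_budget` and `step_fn` are hypothetical names:

```python
import time

TIME_BUDGET_S = 300.0  # 5-minute wall-clock budget per experiment

def train_with_budget(step_fn, budget_s=TIME_BUDGET_S, clock=time.monotonic):
    """Call step_fn repeatedly until budget_s seconds of wall-clock time pass.

    Returns the number of steps completed. Setup work done before this call
    (data loading, compilation) is not counted against the budget.
    """
    start = clock()
    steps = 0
    while clock() - start < budget_s:
        step_fn()  # in a real script: forward, backward, optimizer step
        steps += 1
    return steps

if __name__ == "__main__":
    # Demo with a tiny budget: each fake "step" sleeps for 10 ms.
    n = train_with_budget(lambda: time.sleep(0.01), budget_s=0.1)
    print(f"completed {n} steps within the budget")
```

Because time, not step count, is fixed, a larger model simply gets fewer steps. Each experiment therefore asks "what trains best in 5 minutes on this GPU?"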
Key Design Choices
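✅ One editable file (train.py) keeps diffs small and reviewable
✅ Fixed 5-minute budget makes runs directly comparable, whatever the agent changes (model size, batch size, architecture)
✅ Self-contained – PyTorch plus a few small dependencies
✅ Vocabulary-independent metric (val_bpb), so tokenizer changes don't skew comparisons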
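Why is val_bpb vocabulary-independent? Per-token loss depends on how much text each token covers, whereas bits per byte normalizes by the raw byte count of the text. The sketch below assumes the common definition (summed cross-entropy in nats, converted to bits, divided by total UTF-8 bytes of the target tokens); the repo's exact evaluation code, e.g. its handling of special tokens, may differ.

```python
import math

def bits_per_byte(token_losses_nats, token_num_bytes):
    """Convert per-token cross-entropy (in nats) into bits per byte.

    token_losses_nats: -ln p(token) for each predicted target token.
    token_num_bytes:   UTF-8 byte length of each of those target tokens.
    """
    total_nats = sum(token_losses_nats)
    total_bytes = sum(token_num_bytes)
    return total_nats / (math.log(2) * total_bytes)  # nats -> bits, per byte

# The same 12 bytes of text under two hypothetical tokenizers:
coarse = bits_per_byte([4 * math.log(2)] * 3, [4] * 3)  # 3 tokens of 4 bytes
fine = bits_per_byte([math.log(2)] * 12, [1] * 12)      # 12 one-byte tokens
print(coarse, fine)
```

Both cases come out at 1.0 bit per byte (up to floating-point rounding), even though the mean per-token loss of the coarse tokenizer is four times higher. That is why val_bpb can fairly compare models whose vocabularies differ.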
Quick Start for H100 Users
# 1. Install uv and dependencies (Python 3.10+)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
# 2. Prep data (~2 min)
uv run prepare.py
# 3. Test run (~5 min)
uv run train.py
Then start Claude, Codex, or another coding agent in the repo and prompt it:
"Hi, read program.md and kick off a new experiment!"
Smaller Hardware? Try These Forks
- macOS: miolini/autoresearch-macos
- macOS MLX: trevin-creator/autoresearch-mlx
- Windows RTX: jsegov/autoresearch-win-rtx
Tips for low-compute machines: switch to the TinyStories dataset and shrink the model, e.g. vocab_size=1024, DEPTH=4, MAX_SEQ_LEN=256.
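To see why these knobs help on small GPUs, here is a back-of-the-envelope parameter estimate. It is a hypothetical helper resting on stated assumptions, not the repo's formula: model width `d = depth * aspect_ratio` (64 is an assumed default), roughly 12·d² weights per transformer block, and separate input/output embedding matrices. MAX_SEQ_LEN does not change the parameter count; shortening it mainly cuts attention compute and activation memory.

```python
def approx_params(depth, vocab_size, aspect_ratio=64):
    """Rough GPT parameter count under stated assumptions (hypothetical helper).

    Assumes width d = depth * aspect_ratio, ~12*d^2 weights per transformer
    block (attention projections + 4x MLP), and untied input/output
    embeddings of size vocab_size x d. Ignores norms and biases.
    """
    d = depth * aspect_ratio
    blocks = 12 * depth * d * d
    embeddings = 2 * vocab_size * d
    return blocks + embeddings

small = approx_params(depth=4, vocab_size=1024)
larger = approx_params(depth=8, vocab_size=8192)  # illustrative, not necessarily the repo default
print(f"{small:,} vs {larger:,} parameters")
```

Under these assumptions the low-compute settings give about 3.7M parameters versus about 33.6M for the illustrative larger config, roughly a 9x reduction. This lets a slower GPU complete a meaningful number of steps within the same 5-minute budget.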
Why This Changes Everything
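- Lowers the barrier: a single GPU is enough to run an automated research loop
- Platform-optimized: finds the best model your hardware can train within the 5-minute budget (so results are not directly comparable across machines)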
- Agent-programmable: edit program.md to change how the agents work, e.g. to add multi-agent swarms
- MIT licensed: fork, extend, contribute
GitHub Repo (20.6k ⭐) – The future of AI research has arrived.