Build Your Own ChatGPT: Nanochat - The $100 LLM
Andrej Karpathy, a prominent figure in the AI community, has unveiled Nanochat, an ambitious open-source project designed to make the complexities of Large Language Model (LLM) development accessible to everyone. Dubbed "the best ChatGPT that $100 can buy," Nanochat is a full-stack implementation of a ChatGPT-like LLM packed into a single, clean, minimal, and highly hackable codebase.
What is Nanochat?
Nanochat goes beyond pretraining alone: it offers a complete pipeline for LLM development, encompassing tokenization, pretraining, fine-tuning, evaluation, inference, and even web serving via a simple user interface. This means you can train and interact with your very own LLM, much as you would with ChatGPT. The whole pipeline is engineered to run on a single 8XH100 node via the provided speedrun.sh script.
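Concretely, getting the whole pipeline going on a rented 8XH100 machine comes down to a few commands (a minimal sketch; the repository URL is Karpathy's published repo):

    # grab the code and kick off the full $100 pipeline
    git clone https://github.com/karpathy/nanochat.git
    cd nanochat
    bash speedrun.sh    # tokenization, pretraining, fine-tuning, and evaluation, end to end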
The $100 Challenge: Training Your Own LLM
The central ethos of Nanochat is accessibility and cost-effectiveness. The speedrun.sh script demonstrates how to train a functional LLM for approximately $100: about 4 hours of training on an 8XH100 node, yielding a depth-20 (d20) model of roughly 560 million parameters trained on about 11 billion tokens. These "micro models" do not rival cutting-edge LLMs like GPT-5; Karpathy likens talking to them to conversing with a kindergartner, given their naivete and tendency to hallucinate. Still, they offer an unparalleled opportunity for hands-on learning and customization.
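The price tag follows from simple rental arithmetic: at the roughly $24/hour Karpathy cites for an 8XH100 node, 4 hours of training comes to about $96, hence the advertised $100.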
Karpathy emphasizes that Nanochat is fully yours: configurable, tweakable, and hackable from start to finish. This makes it an ideal platform for researchers, developers, and enthusiasts looking to understand the inner workings of LLMs without a multi-million-dollar budget.
Getting Started: A Quick Guide
The fastest route into Nanochat is to execute the speedrun.sh script, which handles the entire process from data preparation through model training to inference. Once training completes (about 4 hours), you can interact with your newly trained LLM through a web-based UI by running python -m scripts.chat_web.
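Since the run takes hours, the README suggests launching it inside a screen session so it survives a dropped SSH connection; the chat UI is then one command away. A minimal sketch:

    # keep the ~4-hour run alive (and logged) even if the SSH session drops
    screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh
    # once training finishes, serve the ChatGPT-style web UI
    python -m scripts.chat_web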
The project also documents how to scale up. Larger models, such as the $300-tier d26, need only minor adjustments to the speedrun.sh configuration: download more data shards and reduce the per-device batch size to avoid out-of-memory errors, as sketched below.
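A sketch of those adjustments, with flags and shard counts as described in the project's README at the time of writing (worth verifying against the current repo before running):

    # fetch more pretraining data shards than the default speedrun downloads
    python -m nanochat.dataset -n 450
    # pretrain the deeper d26 model, halving the per-device batch size to avoid OOM
    torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16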
Designed for Learning and Hacking
Nanochat deliberately avoids the sprawl of an exhaustive LLM framework. Instead, it prioritizes a single, cohesive, minimal, readable, and maximally forkable "strong baseline" codebase, keeping the cognitive overhead low for anyone who wants to dig into LLM development. Each run produces a concrete ChatGPT clone along with a 'report card' of evaluations and metrics.
For those on less powerful hardware, Nanochat also offers experimental support for CPU and MPS (Apple Silicon) devices, allowing for tinkering and training of very tiny LLMs, albeit with greater patience required.
Contributing to the Future of Micro Models
Nanochat is an ongoing project that aims to advance the state of the art in micro models: models you can work with end-to-end on a budget under $1,000. Contributions are welcome, in keeping with the community-driven goal of building a robust yet straightforward LLM training ecosystem.
By demystifying the process and lowering the entry barrier, Nanochat promises to be a pivotal tool for anyone looking to build, understand, and customize their own AI assistants, directly from their own compute environment.