llmfit: The Ultimate LLM Fit Tool for Your Hardware
Large language model (LLM) adoption has exploded, but selecting the right model for your machine still feels like a blind guess. Do you need a 30B-parameter model on a laptop with 16 GB of RAM? Should you force a MoE model onto a single‑GPU workstation? Traditionally you’d read papers, download massive binaries, run quick benchmarks, and still end up with under‑ or over‑utilized hardware.
Enter llmfit – a Rust‑built terminal utility that automatically evaluates 157 models from 30 providers across four dimensions (quality, speed, fit, context) and tells you exactly which one will run on your system. No more fiddling with GPU memory calculations or guessing at quantization choices.
What llmfit Can Do
| Feature | Description |
|---|---|
| Hardware detection | Reads RAM, CPU cores, and auto‑detects Nvidia/AMD/Intel/Apple GPUs. Returns backend (CUDA, Metal, ROCm, SYCL) and VRAM. |
| Dynamic quantization | Traverses a hierarchy from Q8_0 down to Q2_K, picking the highest‑quality quantization that fits. Falls back to half context if nothing fits fully (see the sketch after this table). |
| Mixture‑of‑Experts (MoE) | Detects MoE models (Mixtral, DeepSeek, etc.) and calculates active‑expert memory usage, enabling efficient off‑loading. |
| Scoring & ranking | Computes composite scores weighted per use‑case (Chat, Coding, Reasoning). Shows top‑rated models in a sortable table. |
| Multi‑GPU & CPU‑plus‑GPU | Supports multi‑GPU setups, CPU+GPU spill‑over, and pure CPU runs if GPUs are absent. |
| Ollama integration | Automatically lists installed Ollama models, highlights them, and lets you pull new ones with a single key press. Works out of the box if ollama serve is running. |
| Interactive TUI & CLI | Launch with llmfit for an ncurses‑style interface or use --cli for classic table, fit, search, info, etc. |
| JSON output | Add --json to any command for machine‑readable data, ideal for agents or scripts. |
| OpenClaw skill | Ships an OpenClaw skill that recommends and configures Ollama models directly inside your agent’s openclaw.json. |
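To make the fit logic from the table above more concrete, here is a rough Rust sketch of the idea behind the dynamic quantization and MoE rows. It is not llmfit's actual code: the quantization levels, bits-per-weight values, KV-cache term, and overhead factor are illustrative assumptions, but the overall shape — walk the hierarchy from highest quality down, estimate memory, and fall back to half context — mirrors the behaviour described above.

```rust
// Rough sketch (not llmfit's actual code) of picking the best quantization
// level that fits a memory budget, with a half-context fallback. All names,
// bit widths, and overhead factors are illustrative assumptions.

#[derive(Clone, Copy)]
struct Quant {
    name: &'static str,
    bits_per_weight: f64, // effective bits per parameter for this level
}

// Highest quality first, lowest last (illustrative subset of GGUF levels).
const QUANTS: &[Quant] = &[
    Quant { name: "Q8_0", bits_per_weight: 8.5 },
    Quant { name: "Q6_K", bits_per_weight: 6.6 },
    Quant { name: "Q4_K_M", bits_per_weight: 4.8 },
    Quant { name: "Q2_K", bits_per_weight: 2.6 },
];

struct Model {
    total_params_b: f64,  // total parameters, in billions
    active_params_b: f64, // parameters active per token (== total unless MoE)
    context: u32,         // maximum context window, in tokens
}

// Very rough memory estimate in GiB: quantized weights plus a crude
// KV-cache stand-in scaled by the active parameter count and context length.
fn estimate_gib(m: &Model, q: Quant, context: u32) -> f64 {
    let weights = m.total_params_b * q.bits_per_weight / 8.0;
    let kv_cache = m.active_params_b * 0.04 * (context as f64 / 8192.0);
    (weights + kv_cache) * 1.1 // ~10% runtime overhead
}

// Try every level at full context first, then retry at half context.
fn pick_quant(m: &Model, budget_gib: f64) -> Option<(Quant, u32)> {
    for &ctx in &[m.context, m.context / 2] {
        for &q in QUANTS {
            if estimate_gib(m, q, ctx) <= budget_gib {
                return Some((q, ctx));
            }
        }
    }
    None
}

fn main() {
    // A Mixtral-like MoE model: large total size, smaller active set per token.
    let model = Model { total_params_b: 46.7, active_params_b: 12.9, context: 32_768 };
    match pick_quant(&model, 24.0) {
        Some((q, ctx)) => println!("fits as {} with a {}-token context", q.name, ctx),
        None => println!("does not fit on this machine"),
    }
}
```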
Quick Start
Three equivalent ways to get started:
- Homebrew (macOS/Linux)
brew tap AlexsJones/llmfit
brew install llmfit
- Cargo (for Rust users)
cargo install llmfit
- curl script (any Unix shell)
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
If you’re on Windows, the script will still install a binary to %USERPROFILE%/.local/bin. Just adjust your PATH accordingly.
Pro tip: After installation, test the TUI with llmfit. If you see a green ✓ under Ollama, it means your local server is detected and you can start pulling models instantly.
Using the Tool
Interactive TUI
Running llmfit launches a clean interface that displays:
- System Specs: CPU cores, RAM, GPU name, VRAM, backend.
- Model Table: Columns for score, tok/s, quant, mode, memory, use‑case.
- Keyboard Shortcuts: Navigate with the arrow keys or j/k, search with /, filter by fit with f, toggle providers with p, pull a model with d, refresh the installed list with r, and quit with q.
Classic CLI
If you prefer plain text, use --cli:
# Top ranked models
llmfit --cli
# Perfect‑fit models only
llmfit fit --perfect -n 5
# JSON recommendations for the coding use case
llmfit recommend --json --limit 5 --use-case coding
The --json flag is handy when you want to pipe results to another tool or store them in a configuration file.
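For example, an agent or build script could shell out to llmfit and parse the JSON itself. The sketch below does this from Rust with serde_json; since the exact output schema is not documented in this post, the field names it reads ("name", "score") are assumptions — treat the result as an untyped value and adapt to the real keys.

```rust
// Hedged sketch: run `llmfit recommend --json` and read the output with
// serde_json. The "name" and "score" keys are assumptions about the schema.
use std::process::Command;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let out = Command::new("llmfit")
        .args(["recommend", "--json", "--limit", "5", "--use-case", "coding"])
        .output()?;
    let value: serde_json::Value = serde_json::from_slice(&out.stdout)?;

    // Assuming the top level is an array of model entries.
    if let Some(models) = value.as_array() {
        for m in models {
            println!(
                "{} -> score {}",
                m.get("name").and_then(|v| v.as_str()).unwrap_or("?"),
                m.get("score").map(|v| v.to_string()).unwrap_or_default()
            );
        }
    }
    Ok(())
}
```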
Behind the Scenes
The core of llmfit lives in a single hf_models.json file that ships with every release (< 2 MB). It contains meta‑data for each model: parameter count, context window, provider, MoE flags, etc. The Rust code uses this embedded data to:
- Detect hardware – sysinfo reads RAM and CPU, while dedicated queries (nvidia‑smi, rocm‑smi, system_profiler) pull VRAM and backend.
- Enumerate models – Iterate over the database, calculate memory usage per quantization level, and apply user constraints.
- Score – Four normalized dimensions (quality, speed, fit, context) are combined with use‑case‑specific weights (see the sketch after this list).
- Render – tui_app.rs manages the interactive view, tui_ui.rs draws the layout with ratatui, and display.rs formats classic tables.
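As a hedged illustration of that scoring step, a weighted sum over the four normalized dimensions might look like the sketch below. The weight values and use-case names are assumptions for illustration, not llmfit's real constants.

```rust
// Minimal sketch of the weighted-score idea from the list above; the weights
// and use-case names are illustrative assumptions.
struct Dimensions {
    quality: f64, // each dimension normalized to the 0.0..=1.0 range
    speed: f64,
    fit: f64,
    context: f64,
}

struct Weights {
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

// Hypothetical per-use-case weightings, e.g. coding leans on quality and context.
fn weights_for(use_case: &str) -> Weights {
    match use_case {
        "coding" => Weights { quality: 0.4, speed: 0.2, fit: 0.2, context: 0.2 },
        "reasoning" => Weights { quality: 0.5, speed: 0.1, fit: 0.2, context: 0.2 },
        _ => Weights { quality: 0.3, speed: 0.3, fit: 0.2, context: 0.2 }, // chat
    }
}

// The composite score is a plain weighted sum of the four dimensions.
fn composite_score(d: &Dimensions, w: &Weights) -> f64 {
    d.quality * w.quality + d.speed * w.speed + d.fit * w.fit + d.context * w.context
}

fn main() {
    let d = Dimensions { quality: 0.9, speed: 0.6, fit: 1.0, context: 0.7 };
    println!("coding score: {:.3}", composite_score(&d, &weights_for("coding")));
}
```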
The result is a fast (< 1 s) command‑line utility that feels instant.
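Finally, to make the embedded database a little more concrete, here is a hedged sketch of a record type that entries in hf_models.json might deserialize into with serde. The field names are assumptions based on the metadata described above (parameter count, context window, provider, MoE flags), not the project's actual schema.

```rust
// Hedged sketch of a record type hf_models.json entries might map to;
// field names are assumptions, not the real schema.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ModelEntry {
    name: String,
    provider: String,
    params_b: f64,                // total parameter count, in billions
    context_window: u32,          // maximum context length, in tokens
    moe: bool,                    // true for mixture-of-experts models
    active_params_b: Option<f64>, // active parameters per token, MoE only
}

// Parse the embedded database into a vector of entries.
fn load_models(json: &str) -> serde_json::Result<Vec<ModelEntry>> {
    serde_json::from_str(json)
}

fn main() {
    // Made-up sample record just to exercise the parser.
    let sample = r#"[{"name":"example-7b","provider":"ExampleOrg","params_b":7.0,
        "context_window":8192,"moe":false,"active_params_b":null}]"#;
    println!("{:?}", load_models(sample).expect("valid JSON"));
}
```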
Why llmfit Matters
- Zero‑setup – No need to manually calculate VRAM or run heavy benchmarks.
- Up‑to‑date – The scraper scripts/scrape_hf_models.py pulls from HuggingFace; run make update-models to refresh the database.
- Portable – Works on macOS, Linux, Windows, ARM, and Intel. Supports Metal, CUDA, ROCm, SYCL, and CPU backends.
- Integrates – Whether you’re using Ollama, vLLM or LM Studio, llmfit can map model names and tell you what will run.
- Community‑friendly – MIT‑licensed, written in Rust, actively maintained, and documented.
Use Cases
| Scenario | Recommended Feature |
|---|---|
| Personal laptop | Run llmfit --cli to find the best small model that fits in 8–16 GB of RAM. |
| Small server | Enable multi‑GPU support, pull MoE models, and view CPU+GPU spill‑over. |
| OpenClaw agent | Use the provided skill to let agents auto‑configure Ollama models based on the output of llmfit recommend. |
| Research lab | Run make update-models nightly to keep the database fresh, then script llmfit recommend into CI workflows. |
Getting Involved
The project welcomes contributions:
- Add a new model – Update TARGET_MODELS in scripts/scrape_hf_models.py.
- Improve scoring – Open an issue to tweak the weighting for a use‑case.
- Feature requests – Request support for a new provider or backend.
- Bug reports – If the tool incorrectly estimates memory, let us know!
After changes, run make update-models and commit the updated hf_models.json.
The Bottom Line
llmfit turns the daunting question of “which LLM will run on my machine?” into a single, deterministic command. Its blend of accurate hardware probing, intelligent quantization, and exhaustive model coverage makes it an indispensable utility for developers, researchers, and AI hobbyists who want the best model for their hardware without the manual trial‑and‑error.
Try llmfit today and see your favorite LLMs on screen in just a few seconds. Because choosing the right model should never be a guessing game.