llmfit: The Ultimate LLM Fit Tool for Your Hardware

Large‑language‑model (LLM) adoption has exploded, but selecting the right model for your machine still feels like a blind guess. Do you need a 30B‑parameter model on a laptop with 16 GB of RAM? Should you force a MoE model onto a single‑GPU workstation? Traditionally you’d read papers, download massive binaries, run quick benchmarks, and still end up with under‑ or over‑utilized hardware.

Enter llmfit – a Rust‑built terminal utility that automatically evaluates 157 models from 30 providers across four dimensions (quality, speed, fit, context) and tells you exactly which one will run on your system. No more fiddling with GPU memory calculations or guessing at quantization choices.


What llmfit Can Do

  • Hardware detection – Reads RAM and CPU cores, auto‑detects Nvidia/AMD/Intel/Apple GPUs, and reports the backend (CUDA, Metal, ROCm, SYCL) and available VRAM.
  • Dynamic quantization – Walks a hierarchy from Q8_0 down to Q2_K, picking the highest‑quality quantization that fits; falls back to half the context window if nothing fits fully (see the sketch after this list).
  • Mixture‑of‑Experts (MoE) – Detects MoE models (Mixtral, DeepSeek, etc.) and calculates active‑expert memory usage, enabling efficient off‑loading.
  • Scoring & ranking – Computes composite scores weighted per use case (Chat, Coding, Reasoning) and shows the top‑rated models in a sortable table.
  • Multi‑GPU & CPU+GPU – Supports multi‑GPU setups, CPU+GPU spill‑over, and pure CPU runs when no GPU is present.
  • Ollama integration – Automatically lists installed Ollama models, highlights them, and lets you pull new ones with a single key press. Works out of the box if ollama serve is running.
  • Interactive TUI & CLI – Launch llmfit for an ncurses‑style interface, or use --cli for the classic table, fit, search, info, and related commands.
  • JSON output – Add --json to any command for machine‑readable output, ideal for agents or scripts.
  • OpenClaw skill – Ships an OpenClaw skill that recommends and configures Ollama models directly inside your agent’s openclaw.json.
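To make the fallback idea concrete, here is a minimal Rust sketch of how a tool like llmfit might walk the quantization hierarchy: estimate the memory each level needs, take the first one that fits, and retry at half the context window before giving up. The bytes‑per‑parameter figures and the KV‑cache constant are rough assumptions for illustration, not llmfit’s actual numbers.

// Simplified sketch of the quantization fallback idea, not llmfit's actual code.
// Bytes-per-parameter figures and the context-memory term are rough assumptions.

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug)]
enum Quant {
    Q8_0,
    Q6_K,
    Q5_K_M,
    Q4_K_M,
    Q3_K_M,
    Q2_K,
}

impl Quant {
    /// Approximate bytes per parameter for each level (illustrative values).
    fn bytes_per_param(self) -> f64 {
        match self {
            Quant::Q8_0 => 1.07,
            Quant::Q6_K => 0.80,
            Quant::Q5_K_M => 0.69,
            Quant::Q4_K_M => 0.58,
            Quant::Q3_K_M => 0.49,
            Quant::Q2_K => 0.39,
        }
    }
}

/// Rough memory estimate: quantized weights plus a KV-cache term that grows
/// with context length (the per-token constant here is purely illustrative).
fn estimate_gb(params_b: f64, quant: Quant, context: u32) -> f64 {
    let weights_gb = params_b * quant.bytes_per_param();
    let kv_cache_gb = context as f64 * 0.000_05;
    weights_gb + kv_cache_gb
}

/// Walk the hierarchy from Q8_0 down to Q2_K, then retry at half context.
fn pick_quant(params_b: f64, context: u32, budget_gb: f64) -> Option<(Quant, u32)> {
    let levels = [
        Quant::Q8_0,
        Quant::Q6_K,
        Quant::Q5_K_M,
        Quant::Q4_K_M,
        Quant::Q3_K_M,
        Quant::Q2_K,
    ];
    for &ctx in &[context, context / 2] {
        for &q in &levels {
            if estimate_gb(params_b, q, ctx) <= budget_gb {
                return Some((q, ctx));
            }
        }
    }
    None
}

fn main() {
    // Example: a 13B-parameter model on a machine with ~10 GB of usable memory.
    match pick_quant(13.0, 8192, 10.0) {
        Some((q, ctx)) => println!("best fit: {:?} at context {}", q, ctx),
        None => println!("model does not fit"),
    }
}

For MoE models the same loop would substitute active‑expert memory for the full weight size, which is why a Mixtral‑class model can fit where a dense model of the same total parameter count cannot.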

Quick Start

Three equivalent ways to get started:

  1. Homebrew (macOS/Linux)
    brew tap AlexsJones/llmfit
    brew install llmfit
    
  2. Cargo (for Rust users)
    cargo install llmfit
    
  3. curl script (any Unix shell)
    curl -fsSL https://llmfit.axjns.dev/install.sh | sh
    

If you’re on Windows, the script will still install a binary to %USERPROFILE%/.local/bin. Just adjust your PATH accordingly.

Pro tip: After installation, test the TUI with llmfit. If you see a green ✓ under Ollama, it means your local server is detected and you can start pulling models instantly.

Using the Tool

Interactive TUI

Running llmfit launches a clean interface that displays:

  • System Specs: CPU cores, RAM, GPU name, VRAM, backend.
  • Model Table: Columns for score, tok/s, quant, mode, memory, use‑case.
  • Keyboard Shortcuts: Navigate with arrows or j/k, search with /, filter fit with f, toggle providers with p, pull a model with d, refresh installed list with r, and quit with q.

Classic CLI

If you prefer plain text, use --cli:

# Top ranked models
llmfit --cli

# Perfect‑fit models only
llmfit fit --perfect -n 5

# Machine-readable JSON recommendations for coding
llmfit recommend --json --limit 5 --use-case coding

The --json flag is handy when you want to pipe results to another tool or store them in a configuration file.
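As a hedged example of that kind of scripting, the Rust snippet below shells out to llmfit recommend --json and deserializes the result with serde_json. The Recommendation fields (name, score, quant) and the assumption that the output is a top‑level JSON array are illustrative guesses, not llmfit’s documented schema; inspect the real --json output before relying on them.

// Hypothetical consumer of `llmfit recommend --json`.
// Field names and the top-level array shape are assumptions for illustration.
use serde::Deserialize;
use std::process::Command;

#[derive(Deserialize, Debug)]
struct Recommendation {
    name: String,
    score: f64,
    quant: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Run llmfit and capture its JSON output.
    let output = Command::new("llmfit")
        .args(["recommend", "--json", "--limit", "5", "--use-case", "coding"])
        .output()?;

    // Parse the output and print a compact ranking.
    let recs: Vec<Recommendation> = serde_json::from_slice(&output.stdout)?;
    for r in &recs {
        println!("{:>6.2}  {:<10} {}", r.score, r.quant, r.name);
    }
    Ok(())
}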

Behind the Scenes

The core of llmfit lives in a single hf_models.json file that ships with every release (< 2 MB). It contains meta‑data for each model: parameter count, context window, provider, MoE flags, etc. The Rust code uses this embedded data to:

  1. Detect hardware – sysinfo reads RAM and CPU, while dedicated queries (nvidia‑smi, rocm‑smi, system_profiler) pull VRAM and backend.
  2. Enumerate models – Iterate over the database, calculate memory usage per quantization level, and apply user constraints.
  3. Score – Four normalized dimensions (quality, speed, fit, context) are combined with use‑case‑specific weights (see the sketch after this list).
  4. Render – tui_app.rs manages the interactive view, tui_ui.rs draws the layout with ratatui, and display.rs formats classic tables.
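The scoring step is easy to picture as a weighted sum. The Rust sketch below shows the general shape – four normalized dimensions combined with per‑use‑case weights – but the weight values and dimension scores are invented for illustration and are not the ones llmfit ships with.

// Illustrative sketch of use-case-weighted scoring; the numbers are made up.

struct Dimensions {
    // All dimensions normalized to the 0.0..=1.0 range.
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

struct Weights {
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

/// Example weight sets per use case (purely illustrative values).
fn weights_for(use_case: &str) -> Weights {
    match use_case {
        "coding" => Weights { quality: 0.40, speed: 0.20, fit: 0.25, context: 0.15 },
        "reasoning" => Weights { quality: 0.45, speed: 0.10, fit: 0.25, context: 0.20 },
        _ => Weights { quality: 0.30, speed: 0.30, fit: 0.30, context: 0.10 }, // chat/default
    }
}

/// Composite score: weighted sum of the four normalized dimensions.
fn composite(d: &Dimensions, w: &Weights) -> f64 {
    d.quality * w.quality + d.speed * w.speed + d.fit * w.fit + d.context * w.context
}

fn main() {
    let d = Dimensions { quality: 0.82, speed: 0.60, fit: 1.0, context: 0.5 };
    let w = weights_for("coding");
    println!("coding score: {:.3}", composite(&d, &w));
}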

The result is a fast (< 1 s) command‑line utility that feels instant.

Why llmfit Matters

  • Zero‑setup – No need to manually calculate VRAM or run heavy benchmarks.
  • Up‑to‑date – The scraper (scripts/scrape_hf_models.py) pulls model metadata from Hugging Face; run make update-models to refresh the database.
  • Portable – Works on macOS, Linux, Windows, ARM, and Intel. Supports Metal, CUDA, ROCm, SYCL, and CPU backends.
  • Integrates – Whether you’re using Ollama, vLLM or LM Studio, llmfit can map model names and tell you what will run.
  • Community‑friendly – MIT‑licensed, written in Rust, actively maintained, and documented.

Use Cases

  • Personal laptop – Run llmfit --cli to find the best small model that fits in 8–16 GB of RAM.
  • Small server – Enable multi‑GPU support, pull MoE models, and view CPU+GPU spill‑over.
  • OpenClaw agent – Use the provided skill to let agents auto‑configure Ollama models based on the output of llmfit recommend.
  • Research lab – Run make update-models nightly to keep the database fresh, then script llmfit recommend into CI workflows.

Getting Involved

The project welcomes contributions:

  1. Add a new model – Update TARGET_MODELS in scripts/scrape_hf_models.py.
  2. Improve scoring – Open an issue to tweak weighting for a use‑case.
  3. Feature requests – Request support for a new provider or backend.
  4. Bug reports – If the tool incorrectly estimates memory, let us know!

After changes, run make update-models and commit the updated hf_models.json.

The Bottom Line

llmfit turns the daunting question of “which LLM will run on my machine?” into a single, deterministic command. Its blend of accurate hardware probing, intelligent quantization, and exhaustive model coverage makes it an indispensable utility for developers, researchers, and AI hobbyists who want the best model for their hardware without the manual trial‑and‑error.

Try llmfit today and see your favorite LLMs on screen in just a few seconds. Because choosing the right model should never be a guessing game.
