ComfyUI‑GGUF: Run Low‑Bit Models on Your GPU
The recent surge of low‑bit model formats such as GGUF has made it possible to run large diffusion networks on machines with limited VRAM. ComfyUI‑GGUF is a lightweight, open‑source extension that plugs directly into the ComfyUI ecosystem, letting you load quantized GGUF files for UNet and DiT diffusion models and even the T5 text encoder. This guide walks through the concepts, installation steps, and real‑world usage so you can start generating high‑quality images without investing in a high‑end GPU.
Why GGUF Matters
- Size and Speed: GGUF stores model weights in a block‑quantized format, where each small group of weights shares a scale factor, dropping the effective bit‑width to 4 bits or fewer per weight with only a modest loss in quality.
- On‑the‑fly Dequantization: The extension automatically dequantizes weights at runtime, keeping CPU/GPU memory usage low. This is especially useful for transformer/DiT architectures like Flux (see the sketch after this list).
- Cross‑Platform: Whether you’re on Windows, macOS, or Linux, the repository includes platform‑specific installation guidelines.
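To make the dequantization point concrete, here is a minimal NumPy sketch of how one Q4_0 block is decoded. It is purely illustrative (the extension does this in PyTorch, per layer, during the forward pass), but the arithmetic matches the format: a block of 32 weights is stored as one fp16 scale plus 32 unsigned 4‑bit quants.

```python
# Illustrative sketch of GGUF Q4_0 dequantization for a single block of 32 weights.
# Not the extension's actual code; it only shows the arithmetic behind the format.
import numpy as np

def dequantize_q4_0_block(scale: np.float16, packed: np.ndarray) -> np.ndarray:
    """packed: 16 uint8 bytes holding 32 nibbles (low nibbles = weights 0-15,
    high nibbles = weights 16-31, following llama.cpp's Q4_0 layout)."""
    lo = (packed & 0x0F).astype(np.float32)   # 4-bit quants for the first 16 weights
    hi = (packed >> 4).astype(np.float32)     # 4-bit quants for the last 16 weights
    q = np.concatenate([lo, hi])
    return np.float32(scale) * (q - 8.0)      # weight ≈ scale * (quant - 8)
```

Fancier variants (Q4_1, Q5_0, the K‑quants) add offsets or finer sub‑block scales, but the decode step stays cheap enough to run per layer at inference time.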
Supported Models at a Glance
| Model | Quantization | GGUF Variant |
|---|---|---|
| Flux 1‑Dev | Q4_0 | flux1-dev.gguf |
| Flux Schnell | Q4_0 | flux1-schnell.gguf |
| Stable Diffusion 3.5‑Large | Q4_0 | stable-diffusion-3.5-large.gguf |
| Stable Diffusion 3.5‑Large‑Turbo | Q4_0 | stable-diffusion-3.5-large-turbo.gguf |
| T5‑v1.1‑XXL | Q4_0 | t5_v1.1-xxl.gguf |
All models are dropped into the ComfyUI/models/unet folder (or ComfyUI/models/clip for the T5 encoder) so they can be discovered by the new Unet Loader (GGUF) and CLIP loader nodes.
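If you want to verify that the files are in the right place, a few lines of Python will list what the GGUF loaders can see (an illustrative helper; adjust comfy_root to your install path):

```python
# List the .gguf files sitting in the folders the GGUF loader nodes scan (illustrative).
from pathlib import Path

comfy_root = Path("ComfyUI")                  # adjust to your installation path
for sub in ("models/unet", "models/clip"):
    found = sorted(p.name for p in (comfy_root / sub).glob("*.gguf"))
    print(f"{sub}: {found if found else 'no .gguf files found'}")
```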
1️⃣ Installation Prerequisites
- ComfyUI – Ensure you’re running a recent ComfyUI version (post‑October 2024) that supports custom ops.
- Python 3.9+ – The extension relies on the `gguf` package.
- Git – Needed to clone the repository.
⚠️ For macOS, use torch 2.4.1. Torch 2.6.* nightly releases trigger an “M1 buffer is not large enough” error. A quick sanity check is shown below.
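This is a rough, optional check of the prerequisites above (not part of the extension itself):

```python
# Rough environment check for the prerequisites listed above (illustrative only).
import sys
from importlib.metadata import version, PackageNotFoundError

assert sys.version_info >= (3, 9), "Python 3.9+ is required"

import torch
print("torch:", torch.__version__)            # on macOS, 2.4.1 is the known-good version

try:
    print("gguf:", version("gguf"))           # installed in the next step if missing
except PackageNotFoundError:
    print("gguf: not installed yet (see the pip command below)")
```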
2️⃣ Clone the Repository
```
# From your ComfyUI installation root
git clone https://github.com/city96/ComfyUI-GGUF custom_nodes/ComfyUI-GGUF
```
After cloning, install the sole inference dependency:
```
pip install --upgrade gguf
```
If you run a stand‑alone ComfyUI portable build, run those commands from the ComfyUI_windows_portable folder and point them at the embedded interpreter (typically `.\python_embeded\python.exe -s -m pip install --upgrade gguf`).
3️⃣ Replace the Standard Loader
Open your ComfyUI workflow editor and replace the standard Load Diffusion Model node with the new Unet Loader (GGUF) node. The node lives under the bootleg category.
💡 The node auto‑scans the unet folder for `.gguf` files; simply drop in your quantized file and you’re ready.
4️⃣ Optional: Quantize Your Own Models
If you own a non‑quantized checkpoint, you can convert it with the scripts in the tools folder:
- Start from the original `.safetensors` or `.ckpt` checkpoint.
- Run the conversion script (it uses the `gguf` Python package under the hood) to produce a full‑precision GGUF file, then quantize it to your target bit‑width; tools/README.md in the repository documents the exact commands and supported quantization types. A typical first step looks like this (script names and flags may differ between versions):
```
python tools/convert.py --src sd3-large.safetensors
```
The resulting .gguf file, once quantized, goes into your unet folder like any pre‑made model.
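For intuition about what quantization does to each block of 32 weights, here is a minimal sketch of the Q4_0 rule (the same scheme used by llama.cpp‑style tooling; illustrative, not the repository's actual code):

```python
# Sketch of Q4_0 quantization for one block of 32 float weights (illustrative only).
import numpy as np

def quantize_q4_0_block(x: np.ndarray):
    """Returns (fp16 scale, 16 packed uint8 bytes) for a block of 32 weights."""
    assert x.size == 32
    m = x[np.argmax(np.abs(x))]               # signed element with the largest magnitude
    if m == 0.0:
        return np.float16(0.0), np.full(16, 0x88, dtype=np.uint8)  # all quants = 8 (zero)
    d = m / -8.0                              # scale chosen so the extreme weight fills the range
    q = np.clip(np.round(x / d) + 8, 0, 15).astype(np.uint8)
    packed = q[:16] | (q[16:] << 4)           # low nibble = weights 0-15, high nibble = 16-31
    return np.float16(d), packed
```

Dequantizing with scale * (quant − 8), as in the earlier sketch, recovers an approximation of the original weights; the rounding error is the price of the roughly 3.5× size reduction versus fp16.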
5️⃣ Experimental LoRA Support
Currently, LoRA support is experimental but has worked well with the built‑in LoRA loader nodes. Simply load your LoRA alongside the GGUF UNet; ComfyUI applies the low‑rank patches at runtime on top of the dequantized weights, as sketched below.
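A conceptual sketch of that runtime patching (hypothetical helper, not the extension's API):

```python
# Conceptual sketch: applying a LoRA patch to an already-dequantized weight matrix.
# Hypothetical helper for illustration; ComfyUI's own patching code is more involved.
import numpy as np

def apply_lora(w: np.ndarray, lora_down: np.ndarray, lora_up: np.ndarray,
               strength: float = 1.0) -> np.ndarray:
    """w' = w + strength * (up @ down); the quantized base file on disk stays untouched."""
    return w + strength * (lora_up @ lora_down)
```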
6️⃣ Platform‑Specific Tips
- Windows: Open a CMD prompt inside ComfyUI_windows_portable and run `pip install -r requirements.txt` with the embedded interpreter.
- macOS (Sequoia): Use `torch==2.4.1` to avoid the buffer‑size error mentioned above.
- Linux: A standard `pip install` works; make sure you have a recent CUDA toolkit if you plan to use GPU acceleration.
🚀 Running Low‑Bit Inference
After setting up, launch ComfyUI and use a simple workflow:
- Add Unet Loader (GGUF).
- Add a CLIPLoader (GGUF) or DualCLIPLoader (GGUF) node if you need a quantized text encoder such as T5.
- Insert your usual prompt and sampler nodes.
- Queue the prompt.
You’ll notice GPU memory usage drop from ~10 GB (full precision) to ~4 GB or less, depending on the bit‑width.
📌 Takeaways
- ComfyUI‑GGUF brings low‑bit inference to the forefront of creative AI tools.
- It’s a clean, open‑source solution that cuts VRAM requirements with little visible loss in fidelity.
- With a quick `git clone` and a `pip install`, you can start running Flux 1‑Dev or Stable Diffusion 3.5 on an NVIDIA RTX 4060 or even an integrated GPU.
- Experiment with quantization levels – the format supports Q4_0, Q4_1, Q5_0, Q8_0, and K‑quant variants such as Q3_K and Q5_K.
Happy generating, and let the low‑bit dream become a reality on your desktop!