Stable‑Diffusion.cpp: Light‑weight C++ Diffusion Inference Engine
What is stable‑diffusion.cpp?
stable‑diffusion.cpp is an open‑source, pure‑C/C++ implementation of modern diffusion models. It brings stable‑diffusion, Flux, Wan, Qwen‑Image, Z‑Image and other emerging algorithms to any system that can compile C++ — from Linux servers to Windows laptops, and even to Android via Termux. The project is designed to be:
- Zero‑dependency – no external libraries other than the bundled ggml runtime.
- Cross‑platform – works on Linux, macOS, Windows, ARM‑based Macs, and Android.
- GPU‑friendly – supports CUDA, Vulkan, Metal, OpenCL, SYCL, and even CPU‑only execution with AVX / AVX2 / AVX512.
- Performance‑oriented – optimized memory usage, Flash‑Attention, VAE tiling, and cache‑based acceleration.
The library is inspired by llama.cpp and ggml, making it a natural fit for developers familiar with those ecosystems.
Core Features at a Glance
| Feature | Supported Models | Notes |
|---|---|---|
| Image Generation | SD1.x, SD2.x, SD3/SD3.5, SDXL, SD‑Turbo, SDXL‑Turbo | Standard text‑to‑image pipelines |
| Image Editing / Inpainting | FLUX.1‑Kontext‑dev, Qwen‑Image‑Edit series | Supports prompt-based edits |
| Video Generation | Wan2.1, Wan2.2 | Includes motion‑aware conditioning |
| Fast Upscale | ESRGAN | Custom tile sizing |
| Latency‑Optimised Decoding | TAESD | Faster latent decoding |
| LoRA & ControlNet | SD1.5 | Same interface as stable‑diffusion‑webui |
| Latent Consistency Models | LCM, LCM‑LoRA | Added in 2025 |
| Back‑end Choice | CPU, CUDA, Vulkan, Metal, OpenCL, SYCL | Plug-in architecture |
| Weight Formats | .ckpt/.pth, .safetensors, .gguf | Flexible weight loading |
| Command‑line API | sd-cli | One‑liner image generation |
| C API | sd.h / sd.cpp | Embed into other projects |
| Docker & CI | Docker images | Build for Linux & Windows |
Getting Started: Build & Run
1. Install Dependencies
# On Ubuntu
sudo apt-get update && sudo apt-get install -y build-essential git cmake
# On macOS with Homebrew
brew install cmake git
2. Clone the Repo
git clone https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
3. Build from Source
mkdir build && cd build
cmake ..   # CPU-only by default; enable a GPU back-end with e.g. -DSD_CUDA=ON, -DSD_VULKAN=ON, or -DSD_METAL=ON
make -j$(nproc)
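If you build for several machines, it helps to map each back-end name to its CMake option in one place. The flag names below (SD_CUDA, SD_VULKAN, SD_METAL, SD_SYCL) follow the upstream README at the time of writing, but treat them as assumptions and verify them against your checkout:

```shell
#!/usr/bin/env sh
# Map a back-end name to the corresponding CMake option.
# Flag names are assumptions based on the upstream README; verify in your checkout.
backend_flag() {
  case "$1" in
    cpu)    echo "" ;;                 # default build, no extra flag needed
    cuda)   echo "-DSD_CUDA=ON" ;;
    vulkan) echo "-DSD_VULKAN=ON" ;;
    metal)  echo "-DSD_METAL=ON" ;;
    sycl)   echo "-DSD_SYCL=ON" ;;
    *)      echo "unknown back-end: $1" >&2; return 1 ;;
  esac
}

# Example: print the full configure command for a CUDA build.
echo "cmake .. $(backend_flag cuda)"
```

Swapping back-ends then becomes a one-word change in your build script rather than a hand-edited cmake line.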
4. Download Model Weights
mkdir -p ../models
curl -L -o ../models/v1-5-pruned-emaonly.safetensors \
  https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
5. Generate an Image
./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors -p "a cyberpunk city at dusk"
The output PNG also embeds the generation parameters (prompt, seed, sampler, and so on) as web‑UI‑compatible metadata.
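The one‑liner scales to batches with a small wrapper. The helper below only assembles the command line; the flags used (-m, -p, -o, --steps, --cfg-scale, --seed, -W, -H) match commonly documented sd-cli options, but treat the exact names as assumptions and check `./bin/sd-cli --help` on your build:

```shell
#!/usr/bin/env sh
# Assemble one sd-cli invocation per prompt. Flag names are assumptions
# based on the upstream CLI help; verify with `sd-cli --help`.
gen_cmd() {
  prompt="$1"; out="$2"; seed="${3:-42}"
  printf '%s' "./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors \
-p \"$prompt\" -o \"$out\" --steps 20 --cfg-scale 7.0 --seed $seed -W 512 -H 512"
}

# Generate a small batch with distinct seeds so results are reproducible.
i=0
for p in "a cyberpunk city at dusk" "a misty forest at dawn"; do
  i=$((i + 1))
  cmd=$(gen_cmd "$p" "out_$i.png" "$i")
  echo "$cmd"          # replace echo with: eval "$cmd"  to actually render
done
```

Printing the command before `eval`-ing it also gives you a cheap dry-run mode for debugging flag combinations.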
Performance Tips
| Tip | Explanation |
|---|---|
| Use GGUF | .gguf files can store quantized weights, cutting file size, load time, and memory use. |
| Enable Flash‑Attention | Reduces VRAM usage by a large margin on CUDA. |
| VAE Tiling | Lowers peak memory for large images. |
| Cache Layers | Cache‑based acceleration re‑uses earlier model states to skip redundant work across steps. |
| Use Metal on macOS | Provides ~40 % speed‑up compared to CPU. |
As a rough reference, generating a 512×512 image takes ~32 s on a recent multi‑core CPU, dropping to ~2.5 s on an NVIDIA RTX 3070 with CUDA and Flash‑Attention enabled.
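Several of the tips above map directly onto CLI switches. The flag names below (--diffusion-fa for Flash‑Attention, --vae-tiling for VAE tiling) are quoted from memory of the upstream help text — confirm them with `./bin/sd-cli --help` before relying on them:

```shell
#!/usr/bin/env sh
# Compose a memory-friendly sd-cli invocation. The --diffusion-fa and
# --vae-tiling flag names are assumptions; check `sd-cli --help`.
tuned_cmd() {
  model="$1"; prompt="$2"
  printf '%s\n' "./bin/sd-cli -m \"$model\" -p \"$prompt\" --diffusion-fa --vae-tiling"
}

# Example: quantized GGUF weights plus both memory-saving switches.
tuned_cmd ../models/model.gguf "a watercolor lighthouse"
```

Combining a quantized .gguf model with both switches is the usual starting point when a generation runs out of VRAM.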
Extending the Library
The API is intentionally lightweight. To add new models:
1. Add a model definition (.h/.cpp) and reference the ggml format.
2. Update CMakeLists.txt and add an entry in docs/.
3. Submit a PR and you’ll see it in the next release!
Examples of community extensions include:
- Python bindings – stable-diffusion-cpp-python
- Go wrappers – stable-diffusion
- Rust runtime – diffusion-rs
- Flutter widget – local-diffusion
Community & Contribution
The repo has over 5,000 stars, 500 forks, and a vibrant contributor base. If you're interested in contributing:
- Fork the repo.
- Create feature branches.
- Submit PRs with clear commit messages.
- Run tests (make check).
- Engage in the issues thread for discussion.
The project also ships with a set of ready‑made Docker images for quick deployment in production or CI pipelines.
Why Choose stable‑diffusion.cpp?
- Performance Meets Simplicity – Get the most out of your GPU without learning a new framework.
- Broad Model Coverage – From classic SD to the latest Flux and Wan models.
- Customizable – Swap backends, use quantization, or embed into your own C++ service.
- Live Development – Frequent releases, with support for new models added regularly.
Ready to try? Grab the pre‑built binaries from the releases page or build your own. The documentation is continuously updated, and the community is incredibly helpful.
Next Steps:
1. Pick an engine back‑end that matches your hardware.
2. Download a model and generate a test image.
3. Dive into the examples/ folder for more advanced pipelines like image‑editing or video generation.
Happy Diffusing!