Stable‑Diffusion.cpp: Light‑weight C++ Diffusion Inference Engine

What is stable‑diffusion.cpp?

stable‑diffusion.cpp is an open‑source, pure‑C/C++ implementation of modern diffusion models. It brings stable‑diffusion, Flux, Wan, Qwen‑Image, Z‑Image and other emerging algorithms to any system that can compile C++ — from Linux servers to Windows laptops, and even to Android via Termux. The project is designed to be:

  • Zero‑dependency – no external libraries other than the bundled ggml runtime.
  • Cross‑platform – works on Linux, macOS, Windows, ARM‑based Macs, and Android.
  • GPU‑friendly – supports CUDA, Vulkan, Metal, OpenCL, SYCL, and even CPU‑only execution with AVX / AVX2 / AVX512.
  • Performance‑oriented – optimized memory usage, Flash‑Attention, VAE tiling, and cache‑based acceleration.

The library is inspired by llama.cpp and ggml, making it a natural fit for developers familiar with those ecosystems.

Core Features at a Glance

| Feature | Supported Models / Components | Notes |
|---|---|---|
| Image generation | SD1.x, SD2.x, SD3/SD3.5, SD‑Turbo, SDXL, SDXL‑Turbo | Standard text‑to‑image pipelines |
| Image editing / inpainting | FLUX.1‑Kontext‑dev, Qwen‑Image‑Edit series | Supports prompt‑based edits |
| Video generation | Wan2.1, Wan2.2 | Includes motion‑aware conditioning |
| Fast upscaling | ESRGAN | Custom tile sizing |
| Latency‑optimised decoding | TAESD | Faster latent decoding |
| LoRA & ControlNet | SD1.5 | Same interface as stable‑diffusion‑webui |
| Latent consistency models | LCM, LCM‑LoRA | Added in 2025 |
| Back‑end choice | CPU, CUDA, Vulkan, Metal, OpenCL, SYCL | Plug‑in architecture |
| Weight formats | .ckpt/.pth, .safetensors, .gguf | Flexible weight loading |
| Command‑line tool | sd-cli | One‑liner image generation |
| C API | sd.h / sd.cpp | Embed into other projects |
| Docker & CI | Docker images | Builds for Linux & Windows |

Getting Started: Build & Run

1. Install Dependencies

# On Ubuntu
sudo apt-get update && sudo apt-get install -y build-essential git cmake
# On macOS with Homebrew
brew install cmake git

2. Clone the Repo

git clone https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

3. Build from Source

mkdir build && cd build
cmake ..   # CPU build; for GPU back-ends see the README's flags, e.g. -DSD_CUDA=ON or -DSD_VULKAN=ON
make -j$(nproc)

4. Download Model Weights

mkdir -p ../models
curl -L -o ../models/v1-5-pruned-emaonly.safetensors \
  https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors

5. Generate an Image

./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors -p "a cyberpunk city at dusk"

The output PNG also embeds the generation parameters (prompt, seed, steps, sampler) as a web‑UI‑compatible metadata string.

Performance Tips

| Tip | Explanation |
|---|---|
| Use GGUF | The .gguf format is quantized and loads faster. |
| Enable Flash‑Attention | Reduces VRAM usage by a large margin on CUDA. |
| VAE tiling | Lowers peak memory for large images. |
| Cache layers | `./bin/sd-cli --use-cache` reuses earlier model states. |
| Use Metal on macOS | Provides roughly a 40 % speed‑up over the CPU back‑end. |

Reported benchmarks show generation of a 512×512 image in roughly 32 s on a 10‑core CPU, dropping to about 2.5 s on an NVIDIA RTX 3070 with CUDA and Flash‑Attention enabled.

Extending the Library

The API is intentionally lightweight. To add a new model:

1. Add a model definition (.h/.cpp) and reference the ggml format.
2. Update CMakeLists.txt and add an entry in docs/.
3. Submit a PR and you’ll see it in the next release!

Examples of community extensions include:

  • Python bindings – stable-diffusion-cpp-python
  • Go wrapper – stable-diffusion
  • Rust runtime – diffusion-rs
  • Flutter widget – local-diffusion

Community & Contribution

The repo has over 5,000 stars, 500 forks, and a vibrant contributor base. If you're interested in contributing:

  • Fork the repo.
  • Create feature branches.
  • Submit PRs with clear commit messages.
  • Run tests (make check).
  • Engage in the issues thread for discussion.

The project also ships with a set of ready‑made Docker images for quick deployment in production or CI pipelines.

Why Choose stable‑diffusion.cpp?

  • Performance Meets Simplicity – Get the most out of your GPU without learning a new framework.
  • Broad Model Coverage – From classic SD to the latest Flux and Wan models.
  • Customizable – Swap backends, use quantization, or embed into your own C++ service.
  • Live Development – Active development, with frequent releases and new model support added regularly.

Ready to try? Grab the pre‑built binaries from the releases page or build your own. The documentation is continuously updated, and the community is incredibly helpful.


Next Steps:

1. Pick an engine back‑end that matches your hardware.
2. Download a model and generate a test image.
3. Dive into the examples/ folder for more advanced pipelines like image editing or video generation.

Happy Diffusing!
