oMLX: Revolutionize Local AI on Your Mac with Menu Bar Control
Local LLMs on Apple Silicon just got a major upgrade. oMLX is an open-source inference server that combines production-grade features with dead-simple Mac integration. Forget terminal juggling: manage your LLMs directly from your menu bar.
Why oMLX Stands Out
Built on Apple's MLX framework, oMLX delivers:
- Tiered KV Caching: Hot RAM tier + cold SSD tier with prefix sharing and copy-on-write (see the sketch after this list)
- Continuous Batching: Handle concurrent requests like vLLM
- Multi-Model Serving: LLMs, VLMs, embeddings, and rerankers in one server
- Native macOS App: Menu bar stats, auto-restart, in-app updates
- Admin Dashboard: Real-time monitoring, model downloader, benchmarks, per-model settings
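To make the tiered cache concrete, here is a minimal, hypothetical sketch of the idea; all names are illustrative, not oMLX's internals. KV entries are keyed by a hash of the token prefix, so requests that share a system prompt hit the same entry, and entries evicted from the hot RAM tier are demoted to SSD files instead of being recomputed, which is also why the cache survives restarts.

import hashlib
import pickle
from collections import OrderedDict
from pathlib import Path

class TieredPrefixCache:
    # Illustrative two-tier prefix cache: hot RAM LRU backed by a cold SSD directory.
    def __init__(self, cold_dir: str, hot_capacity: int = 8):
        self.hot = OrderedDict()                      # prefix hash -> KV data, LRU order
        self.hot_capacity = hot_capacity
        self.cold_dir = Path(cold_dir).expanduser()
        self.cold_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def _key(token_ids: list) -> str:
        return hashlib.sha256(str(token_ids).encode()).hexdigest()

    def get(self, token_ids: list):
        key = self._key(token_ids)
        if key in self.hot:                           # RAM hit: refresh recency
            self.hot.move_to_end(key)
            return self.hot[key]
        cold_path = self.cold_dir / key
        if cold_path.exists():                        # SSD hit: promote back to RAM
            kv = pickle.loads(cold_path.read_bytes())
            self.put(token_ids, kv)
            return kv
        return None                                   # miss: caller must prefill

    def put(self, token_ids: list, kv) -> None:
        key = self._key(token_ids)
        self.hot[key] = kv
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:      # demote least-recent entry to SSD
            old_key, old_kv = self.hot.popitem(last=False)
            (self.cold_dir / old_key).write_bytes(pickle.dumps(old_kv))

Copy-on-write enters when two requests extend the same shared prefix: the shared entry stays read-only and each request copies only the portion it appends to.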
Killer Features for Developers
# Pin your daily models, auto-swap heavy ones
Pin: Qwen3-Coder-8bit, Step-3.5-Flash
Auto-load: gpt-oss-120b on demand
# SSD cache survives restarts
/hot-cache: 20GB RAM
/cold-cache: ~/.omlx/cache (SSD)
Claude Code Ready: Context scaling plus SSE keep-alives prevent timeouts during long compilations.
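The keep-alive side of that is simple enough to sketch: if no token arrives within a timeout, emit an SSE comment line, which clients ignore but which resets idle timers in proxies and HTTP libraries. A hypothetical async generator, not oMLX's code:

import asyncio

async def sse_stream(tokens: asyncio.Queue, keepalive_s: float = 15.0):
    # Yield SSE frames, inserting comment heartbeats while generation stalls.
    while True:
        try:
            token = await asyncio.wait_for(tokens.get(), timeout=keepalive_s)
        except asyncio.TimeoutError:
            yield ": keep-alive\n\n"      # SSE comment: ignored by clients
            continue
        if token is None:                 # sentinel: generation finished
            yield "data: [DONE]\n\n"
            return
        yield f"data: {token}\n\n"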
Vision Superpowers: Qwen3.5-VL, GLM-4V, Pixtral with multi-image tool calling and OCR auto-detection.
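Vision inputs ride the standard OpenAI multimodal message shape. A sketch of a request body with a base64 image, where the file name and model are placeholders:

import base64

with open("screenshot.png", "rb") as f:   # placeholder image
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "Qwen3.5-VL",                # any loaded vision model
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this screen show?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
}
# POST payload as JSON to http://localhost:8000/v1/chat/completions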
Dead Simple Setup
# Homebrew (recommended)
brew tap jundot/omlx
brew install omlx
# Launch and forget
brew services start omlx

# Or install from source
git clone https://github.com/jundot/omlx
cd omlx
pip install -e .
Or grab the DMG from Releases: three clicks to first tokens.
OpenAI API Drop-In
POST http://localhost:8000/v1/chat/completions
POST http://localhost:8000/v1/embeddings
Streaming responses include full usage stats; the Anthropic Messages API, tool calling, and vision inputs (base64 or URL) are also supported.
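Because the endpoints are OpenAI-shaped, the official Python client works unchanged; just point base_url at the local server. The model name below is a placeholder for whatever you have loaded:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="Qwen3-Coder-8bit",
    messages=[{"role": "user", "content": "Write a binary search in Swift."}],
    stream=True,                          # tokens arrive as they are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)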
Production Ready
- Memory Enforcement: Total RAM limit prevents OOM
- LRU + TTL + Manual: Model eviction by recency, idle timeout, or explicit unload (see the sketch after this list)
- Offline Admin: All CDN assets vendored
- Structured Logging: Service + application logs
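As a rough picture of how LRU, TTL, and pinning compose (an illustrative sketch, not oMLX's implementation): pinned models are never evicted; everything else unloads when its idle TTL expires or, least-recently-used first, whenever the total exceeds the memory budget.

import time
from collections import OrderedDict
from dataclasses import dataclass, field

@dataclass
class LoadedModel:
    size_gb: float
    pinned: bool = False
    last_used: float = field(default_factory=time.monotonic)

class ModelManager:
    def __init__(self, budget_gb: float, ttl_s: float = 3600):
        self.models = OrderedDict()       # name -> LoadedModel, LRU order
        self.budget_gb = budget_gb
        self.ttl_s = ttl_s

    def touch(self, name: str) -> None:   # call on every request to a model
        self.models[name].last_used = time.monotonic()
        self.models.move_to_end(name)

    def evict(self) -> None:
        now = time.monotonic()
        for name in list(self.models):    # TTL pass: drop idle, unpinned models
            m = self.models[name]
            if not m.pinned and now - m.last_used > self.ttl_s:
                del self.models[name]
        def used() -> float:
            return sum(m.size_gb for m in self.models.values())
        for name in list(self.models):    # LRU pass: enforce the RAM budget
            if used() <= self.budget_gb:
                break
            if not self.models[name].pinned:
                del self.models[name]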
Benchmarks Speak Louder
Run benchmarks straight from the admin panel: prefill tokens/sec, generation tokens/sec, and cache hit rates. Real-world numbers, not synthetic fluff.
Get Started Today
2.4k GitHub stars and growing. Apache 2.0 licensed.
omlx serve --model-dir ~/models --max-model-memory 32GB
Your Mac's unified memory + oMLX = local AI that rivals cloud services. Install now and experience the future of on-device inference.