oMLX: Mac Menu Bar LLM Server with SSD Cache


Local LLMs on Apple Silicon just got a major upgrade. oMLX is an open-source inference server that combines production-grade features with dead-simple Mac integration. Forget terminal juggling – manage your LLMs directly from your menu bar.

Why oMLX Stands Out

Built on Apple's MLX framework, oMLX delivers:

  • Tiered KV Caching: Hot RAM tier + Cold SSD tier with prefix sharing and Copy-on-Write
  • Continuous Batching: Handle concurrent requests like vLLM
  • Multi-Model Serving: LLMs, VLMs, embeddings, and rerankers in one server
  • Native macOS App: Menu bar stats, auto-restart, in-app updates
  • Admin Dashboard: Real-time monitoring, model downloader, benchmarks, per-model settings
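To make the tiered-cache idea concrete, here is a minimal sketch of prefix-sharing KV cache lookup. This is illustrative only; the class and method names are invented for the example, and oMLX's actual data structures (and its RAM/SSD tiering) will differ:

```python
# Illustrative sketch of a prefix-sharing KV cache (not oMLX's real code).
class PrefixKVCache:
    def __init__(self):
        # Maps a token-id prefix (as a tuple) to its cached KV state (opaque here).
        self._entries: dict[tuple, object] = {}

    def longest_prefix(self, tokens: list[int]):
        """Return (matched_length, kv_state) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._entries:
                return end, self._entries[key]
        return 0, None

    def insert(self, tokens: list[int], kv_state):
        # Copy-on-write flavor: a request extending a shared prefix stores a
        # new entry instead of mutating the shared one, so concurrent requests
        # can keep reading the original.
        self._entries[tuple(tokens)] = kv_state

cache = PrefixKVCache()
cache.insert([1, 2, 3], "kv-abc")
matched, kv = cache.longest_prefix([1, 2, 3, 4, 5])
print(matched)  # 3 -- the new request reuses the cached 3-token prefix
```

A second request sharing the same system prompt would hit the cached prefix and only compute KV states for its new suffix, which is the win prefix sharing buys.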

Killer Features for Developers

# Pin your daily models, auto-swap heavy ones
Pin: Qwen3-Coder-8bit, Step-3.5-Flash
Auto-load: gpt-oss-120b on demand

# SSD cache survives restarts
/hot-cache: 20GB RAM
/cold-cache: ~/.omlx/cache (SSD)

Claude Code Ready: Context scaling plus SSE keep-alives prevent timeouts during long compilations.
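In SSE, lines beginning with a colon are comments, and servers commonly emit them as heartbeats so proxies and clients don't drop idle connections. A minimal client-side sketch (the exact keep-alive text oMLX emits is an assumption here):

```python
# Sketch of filtering SSE keep-alive comments out of a token stream.
def sse_events(lines):
    """Yield data payloads, skipping comment lines (": ...") used as heartbeats."""
    for line in lines:
        if line.startswith(":"):
            continue  # SSE comment line -- keep-alive, carries no data
        if line.startswith("data: "):
            yield line[len("data: "):]

stream = [
    ": keep-alive",
    'data: {"delta": "Hello"}',
    ": keep-alive",
    "data: [DONE]",
]
print(list(sse_events(stream)))  # → ['{"delta": "Hello"}', '[DONE]']
```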

Vision Superpowers: Qwen3.5-VL, GLM-4V, Pixtral with multi-image tool calling and OCR auto-detection.
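Vision inputs follow the OpenAI-style multimodal message shape: a content list mixing text parts and `image_url` parts, where the URL can be a base64 data URL. A sketch (placeholder bytes here; in practice you'd read a real PNG or JPEG from disk):

```python
import base64

# Placeholder bytes stand in for a real image file's contents.
image_b64 = base64.b64encode(b"<raw image bytes>").decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What text appears in this image?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_b64}"},
        },
    ],
}
```

For multi-image requests, append additional `image_url` parts to the same content list.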

Dead Simple Setup

# Homebrew (recommended)
brew tap jundot/omlx
brew install omlx   # formula name assumed from the tap

# Launch and forget
brew services start omlx

# Or install from source
git clone https://github.com/jundot/omlx
cd omlx
pip install -e .

Or grab the DMG from Releases – three clicks to first tokens.

OpenAI API Drop-In

POST http://localhost:8000/v1/chat/completions
POST http://localhost:8000/v1/embeddings

Full streaming with usage stats, the Anthropic Messages API, tool calling, and vision inputs (base64 or URL).
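As a sketch of the drop-in compatibility, the request below uses only Python's standard library against the local endpoint; the model name is a placeholder for whatever you have loaded:

```python
import json
import urllib.request

# "qwen3-coder-8bit" is a placeholder -- substitute a model loaded in oMLX.
payload = {
    "model": "qwen3-coder-8bit",
    "messages": [{"role": "user", "content": "Write a haiku about SSDs."}],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
except OSError as exc:
    print(f"Server not reachable: {exc}")
```

Because the request and response shapes match the OpenAI API, existing SDKs should work by pointing their base URL at `http://localhost:8000/v1`.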

Production Ready

  • Memory Enforcement: Total RAM limit prevents OOM
  • LRU + Manual + TTL: Sophisticated model eviction
  • Offline Admin: All CDN assets vendored
  • Structured Logging: Service + application logs
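The LRU + TTL combination can be sketched with an ordered map: recency order drives capacity eviction, timestamps drive expiry. This is an illustration of the general technique, not oMLX's implementation, which also honors manual pins and a total-RAM budget:

```python
import time
from collections import OrderedDict

class ModelCache:
    """Toy LRU + TTL eviction policy for loaded models (illustrative only)."""

    def __init__(self, capacity: int, ttl_seconds: float):
        self.capacity = capacity
        self.ttl = ttl_seconds
        # name -> (model, last_touched_timestamp); insertion order = recency.
        self._models: OrderedDict[str, tuple] = OrderedDict()

    def get(self, name: str):
        self._expire()
        if name not in self._models:
            return None
        model, _ = self._models.pop(name)
        self._models[name] = (model, time.monotonic())  # refresh recency + TTL
        return model

    def put(self, name: str, model):
        self._expire()
        if name in self._models:
            self._models.pop(name)
        elif len(self._models) >= self.capacity:
            self._models.popitem(last=False)  # evict least recently used
        self._models[name] = (model, time.monotonic())

    def _expire(self):
        now = time.monotonic()
        stale = [n for n, (_, t) in self._models.items() if now - t > self.ttl]
        for n in stale:
            del self._models[n]

cache = ModelCache(capacity=2, ttl_seconds=60.0)
cache.put("qwen3-coder", "weights-a")
cache.put("glm-4v", "weights-b")
cache.get("qwen3-coder")           # touching qwen3-coder makes glm-4v the LRU
cache.put("pixtral", "weights-c")  # at capacity, so glm-4v is evicted
print(cache.get("glm-4v"))  # None
```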

Benchmarks Speak Louder

Run from admin panel: Prefill tokens/sec, generation tokens/sec, cache hit rates. Real-world numbers, not synthetic fluff.

Get Started Today

⭐ 2.4k GitHub stars and growing. Apache 2.0 licensed.

omlx serve --model-dir ~/models --max-model-memory 32GB

Your Mac's unified memory + oMLX = local AI that rivals cloud services. Install now and experience the future of on-device inference.
