Apple Silicon - Open Source Projects

397B MoE on MacBook: 4.4 t/s Flash-MoE Engine

April 03, 2026

Tags:

Apple Silicon LLM inference Mixture of Experts Metal Compute Model Quantization

Flash-MoE runs Qwen3.5-397B-A17B (397 billion parameters) on a MacBook Pro M3 Max with 48GB RAM at 4.4+ tokens/second. Pure C/Metal inference streams 209GB model from SSD with production-quality output including tool calling. Key innovations: FMA-optimized dequant kernels (+12% speed), OS page cache expert streaming, deferred GPU compute, and hand-tuned Metal shaders. 58 experiments documented with full technical paper.

Read more Original

Practical Open Source Projects

TurboQuant+: 6.4x KV Cache Compression for LLMs

March 29, 2026

Tags:

Apple Silicon Llama.cpp LLM inference KV cache compression TurboQuant

TurboQuant+ implements ICLR 2026's breakthrough KV cache compression, achieving 4.6-6.4x compression with near q8_0 quality and speed. Features turbo2/turbo3/turbo4 formats, attention-gated Sparse V decoding (+22.8% decode speed), and full llama.cpp Metal integration. Run Qwen 3.5 35B-A3B on M5 Max with 93.9% NIAH retrieval and 1.02x q8_0 prefill speed. Complete Python prototype with 511+ tests and community validation across Apple Silicon, NVIDIA, and AMD.

Read more Original

Practical Open Source Projects

oMLX: Mac Menu Bar LLM Server with SSD Cache

March 10, 2026

Tags:

Apple Silicon MLX oMLX LLM Server Mac AI

Discover oMLX, the ultimate local LLM server for Apple Silicon Macs. Run LLMs, VLMs, and embeddings from your menu bar with continuous batching, tiered KV caching (RAM + SSD), and multi-model serving. Features admin dashboard, OpenAI API compatibility, Claude Code optimization, and one-click model downloads from Hugging Face. Install via DMG, Homebrew, or source – perfect for developers wanting production-grade local AI without cloud costs.

Read more Original

Practical Open Source Projects

Train Transformers on Apple Neural Engine - ANE GitHub

March 03, 2026

Tags:

Apple Silicon Apple Neural Engine Transformer Training ANE ML Optimization

Discover ANE Training: a groundbreaking open-source project that reverse-engineers Apple's Neural Engine to run full transformer training (forward + backward passes) directly on M4 hardware. Achieving 9.3ms/step and 1.78 TFLOPS sustained performance with pure ANE compute - no Metal, no GPU. Includes detailed benchmarks, MIL program generation, IOSurface optimization, and channel-first layouts. Perfect for Apple Silicon ML researchers pushing hardware boundaries.

Read more Original

Practical Open Source Projects

Apple's Containerization: Linux Containers on macOS

June 11, 2025

Tags:

macOS Containerization Linux Containers Apple Silicon Swift

Discover Apple's open-source Swift package, 'Containerization,' enabling seamless execution of Linux containers on macOS. This project leverages Virtualization.framework on Apple silicon to provide efficient container management, OCI image handling, and lightweight virtual machines. Learn how developers can utilize this tool to streamline their workflows, interact with remote registries, and even run x86_64 containers using Rosetta 2. Dive into the features, requirements, and build processes of this innovative solution designed for modern development environments, offering sub-second boot times and flexible kernel configurations.

Read more Original

Categories

Posts tagged with: Apple Silicon

397B MoE on MacBook: 4.4 t/s Flash-MoE Engine

TurboQuant+: 6.4x KV Cache Compression for LLMs

oMLX: Mac Menu Bar LLM Server with SSD Cache

Train Transformers on Apple Neural Engine - ANE GitHub

Apple's Containerization: Linux Containers on macOS