Posts tagged with: Apple Silicon

Content related to Apple Silicon

397B MoE on MacBook: 4.4 t/s Flash-MoE Engine

April 03, 2026

Flash-MoE runs Qwen3.5-397B-A17B (397 billion parameters) on a MacBook Pro M3 Max with 48GB of RAM at 4.4+ tokens/second. Its pure C/Metal inference engine streams the 209GB model from SSD and produces production-quality output, including tool calling. Key innovations: FMA-optimized dequant kernels (+12% speed), expert streaming through the OS page cache, deferred GPU compute, and hand-tuned Metal shaders. The project documents 58 experiments and includes a full technical paper.

TurboQuant+: 6.4x KV Cache Compression for LLMs

March 29, 2026

TurboQuant+ implements ICLR 2026's breakthrough KV cache compression, achieving 4.6-6.4x compression with near-q8_0 quality and speed. It features turbo2/turbo3/turbo4 formats, attention-gated Sparse V decoding (+22.8% decode speed), and full llama.cpp Metal integration. Run Qwen 3.5 35B-A3B on M5 Max with 93.9% NIAH retrieval and 1.02x q8_0 prefill speed. It includes a complete Python prototype with 511+ tests and community validation across Apple Silicon, NVIDIA, and AMD.

oMLX: Mac Menu Bar LLM Server with SSD Cache

March 10, 2026

oMLX is a local LLM server for Apple Silicon Macs. Run LLMs, VLMs, and embeddings from your menu bar with continuous batching, tiered KV caching (RAM + SSD), and multi-model serving. Features include an admin dashboard, OpenAI API compatibility, Claude Code optimization, and one-click model downloads from Hugging Face. Install via DMG, Homebrew, or source. It is aimed at developers who want production-grade local AI without cloud costs.

Train Transformers on Apple Neural Engine - ANE GitHub

March 03, 2026

ANE Training is an open-source project that reverse-engineers Apple's Neural Engine to run full transformer training (forward + backward passes) directly on M4 hardware. It achieves 9.3ms/step and 1.78 TFLOPS sustained performance with pure ANE compute (no Metal, no GPU). The project includes detailed benchmarks, MIL program generation, IOSurface optimization, and channel-first layouts. It is aimed at Apple Silicon ML researchers pushing hardware boundaries.

Apple's Containerization: Linux Containers on macOS

June 11, 2025

Discover Apple's open-source Swift package, 'Containerization,' enabling seamless execution of Linux containers on macOS. This project leverages Virtualization.framework on Apple silicon to provide efficient container management, OCI image handling, and lightweight virtual machines. Learn how developers can utilize this tool to streamline their workflows, interact with remote registries, and even run x86_64 containers using Rosetta 2. Dive into the features, requirements, and build processes of this innovative solution designed for modern development environments, offering sub-second boot times and flexible kernel configurations.