Posts tagged with: Apple Silicon
Content related to Apple Silicon
397B MoE on MacBook: 4.4 t/s Flash-MoE Engine
Flash-MoE runs Qwen3.5-397B-A17B (397 billion total parameters) on a MacBook Pro M3 Max with 48GB RAM at 4.4+ tokens/second. The pure C/Metal inference engine streams the 209GB model from SSD while producing production-quality output, including tool calling. Key innovations: FMA-optimized dequant kernels (+12% speed), expert streaming through the OS page cache, deferred GPU compute, and hand-tuned Metal shaders. All 58 experiments are documented, with a full technical paper.
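The page-cache streaming idea above can be illustrated with a toy sketch: memory-map a flat file of per-expert weight blocks and slice out only the experts a token routes to, letting the OS page cache keep hot experts resident in RAM. The file layout, expert size, and function names here are illustrative assumptions, not Flash-MoE's actual format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a flat binary file of fixed-size float32 expert blocks.
FLOATS_PER_EXPERT = 8                      # toy size; real experts are far larger
EXPERT_BYTES = 4 * FLOATS_PER_EXPERT      # float32 = 4 bytes each

def write_toy_experts(path, n_experts):
    """Write n_experts blocks of float32 weights (expert e filled with e)."""
    with open(path, "wb") as f:
        for e in range(n_experts):
            f.write(struct.pack(f"{FLOATS_PER_EXPERT}f",
                                *[float(e)] * FLOATS_PER_EXPERT))

def load_expert(mm, expert_id):
    """Slice one expert's weights out of the memory map.
    Touching these bytes faults the pages in from SSD on first use;
    the OS page cache then keeps recently used experts in RAM for free."""
    off = expert_id * EXPERT_BYTES
    return struct.unpack(f"{FLOATS_PER_EXPERT}f", mm[off:off + EXPERT_BYTES])

path = os.path.join(tempfile.gettempdir(), "toy_experts.bin")
write_toy_experts(path, 4)
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    w2 = load_expert(mm, 2)   # only expert 2's pages get touched
    mm.close()
```

The key property is that no explicit cache has to be managed: eviction and residency are handled entirely by the kernel's page cache, which is what makes streaming a 209GB model through 48GB of RAM workable when only a few experts are active per token.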
TurboQuant+: 6.4x KV Cache Compression for LLMs
TurboQuant+ implements the KV cache compression method from an ICLR 2026 paper, achieving 4.6-6.4x compression at near-q8_0 quality and speed. It features turbo2/turbo3/turbo4 formats, attention-gated Sparse V decoding (+22.8% decode speed), and full llama.cpp Metal integration. Run Qwen 3.5 35B-A3B on an M5 Max with 93.9% NIAH retrieval and 1.02x q8_0 prefill speed. Includes a complete Python prototype with 511+ tests and community validation across Apple Silicon, NVIDIA, and AMD hardware.
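To make the compression numbers concrete, here is a minimal sketch of the kind of low-bit, per-group absmax quantization that llama.cpp-style KV cache formats build on. This is the baseline idea only; the turbo2/turbo3/turbo4 formats and attention-gated Sparse V decoding are more involved, and the function names below are illustrative.

```python
def quantize_q4(values):
    """Quantize a group of floats to signed 4-bit codes (-7..7)
    with one shared absmax scale per group."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 7.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, codes

def dequantize_q4(scale, codes):
    """Reconstruct approximate floats from codes and the group scale."""
    return [c * scale for c in codes]

# One toy group of K/V activations (fp16 would store 16 bits per value;
# here each value costs 4 bits plus a shared scale, roughly 4x smaller).
kv_block = [0.12, -0.5, 0.33, 0.07, -0.29, 0.5, -0.41, 0.18]
scale, codes = quantize_q4(kv_block)
restored = dequantize_q4(scale, codes)
max_err = max(abs(a - b) for a, b in zip(kv_block, restored))
```

Rounding to the nearest code bounds the per-value error by half a quantization step (scale / 2), which is why per-group scales matter: a smaller absmax within each group means a finer step and lower error on exactly the values attention actually reads back.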
oMLX: Mac Menu Bar LLM Server with SSD Cache
Discover oMLX, a full-featured local LLM server for Apple Silicon Macs. Run LLMs, VLMs, and embeddings from your menu bar with continuous batching, tiered KV caching (RAM + SSD), and multi-model serving. Features an admin dashboard, OpenAI API compatibility, Claude Code optimization, and one-click model downloads from Hugging Face. Install via DMG, Homebrew, or from source: perfect for developers who want production-grade local AI without cloud costs.
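The tiered RAM + SSD caching mentioned above can be sketched as a bounded in-memory LRU that spills evicted entries to disk and promotes them back on access. This is a generic illustration of the pattern, assuming a simple key/value interface; oMLX's actual cache design is not published in this summary.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class TieredCache:
    """Toy two-tier cache: a bounded in-RAM LRU dict that spills
    evicted entries to files on disk (standing in for the SSD tier)."""

    def __init__(self, ram_slots, spill_dir):
        self.ram = OrderedDict()   # insertion order doubles as LRU order
        self.ram_slots = ram_slots
        self.spill_dir = spill_dir

    def _path(self, key):
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)              # mark as most recently used
        while len(self.ram) > self.ram_slots:
            old_key, old_val = self.ram.popitem(last=False)  # evict LRU
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_val, f)        # spill to the "SSD" tier

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        with open(self._path(key), "rb") as f:  # RAM miss: read disk tier
            value = pickle.load(f)
        self.put(key, value)                    # promote back into RAM
        return value

cache = TieredCache(ram_slots=2, spill_dir=tempfile.mkdtemp())
cache.put("a", [1, 2])
cache.put("b", [3])
cache.put("c", [4])        # RAM is full, so "a" spills to disk
hit = cache.get("a")       # disk read, then "a" is promoted back to RAM
```

The design choice this illustrates: a KV cache miss to SSD costs one sequential read instead of recomputing the prefill, which is why spilling old conversation caches to disk is a win for a multi-model menu-bar server.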
Train Transformers on Apple Neural Engine - ANE GitHub
Discover ANE Training: a groundbreaking open-source project that reverse-engineers Apple's Neural Engine to run full transformer training (forward and backward passes) directly on M4 hardware. It achieves 9.3ms/step and 1.78 TFLOPS sustained using pure ANE compute, with no Metal and no GPU. Includes detailed benchmarks, MIL program generation, IOSurface optimization, and channel-first layouts. Perfect for Apple Silicon ML researchers pushing hardware boundaries.
Apple's Containerization: Linux Containers on macOS
Discover Apple's open-source Swift package, 'Containerization,' which enables seamless execution of Linux containers on macOS. The project leverages Virtualization.framework on Apple silicon to provide efficient container management, OCI image handling, and lightweight virtual machines with sub-second boot times and flexible kernel configurations. Learn how developers can use this tool to streamline their workflows, interact with remote registries, and even run x86_64 containers via Rosetta 2, then dive into the features, requirements, and build process of this solution for modern development environments.