Posts tagged with: KV cache compression
TurboQuant+: 6.4x KV Cache Compression for LLMs
March 29, 2026
TurboQuant+ implements the KV cache compression method from ICLR 2026, achieving 4.6-6.4x compression at near q8_0 quality and speed. It features turbo2/turbo3/turbo4 formats, attention-gated Sparse V decoding (+22.8% decode speed), and full llama.cpp Metal integration. Run Qwen 3.5 35B-A3B on an M5 Max with 93.9% NIAH retrieval and 1.02x q8_0 prefill speed. Includes a complete Python prototype with 511+ tests and community validation across Apple Silicon, NVIDIA, and AMD.
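For readers unfamiliar with how KV cache quantization trades bits for memory, here is a minimal, self-contained sketch of the generic technique underlying such schemes: per-group symmetric quantization of cache values to a small integer range, with one shared scale per group. All names, the group size, and the bit widths below are illustrative assumptions, not TurboQuant+'s actual turbo2/turbo3/turbo4 formats.

```python
# Generic per-group symmetric quantization sketch (NOT TurboQuant+'s format).
# Each group of values shares one fp16 scale; values are stored as small ints.

def quantize_group(values, bits=4):
    """Quantize a group of floats to signed ints with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid zero scale
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct approximate floats from quantized ints and their scale."""
    return [x * scale for x in q]

def compression_ratio(n_values, bits=4, group_size=32, scale_bits=16):
    """fp16 baseline vs. (bits per value + one fp16 scale per group)."""
    baseline = n_values * 16
    compressed = n_values * bits + (n_values // group_size) * scale_bits
    return baseline / compressed

if __name__ == "__main__":
    q, s = quantize_group([0.0, 7.0, -7.0, 3.0])
    print(q, s, dequantize_group(q, s))
    print(round(compression_ratio(4096), 2))
```

Plain 4-bit grouping like this yields roughly 3.6x over fp16; reaching the 4.6-6.4x range the post claims requires the additional techniques it names, such as sparse V decoding and mixed formats.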