Posts tagged with: Real-time AI
Helios: 14B Real-Time Video Gen at 19.5 FPS
Discover Helios, a 14B-parameter video generation model from PKU-YuanGroup that generates minute-scale, high-quality videos at 19.5 FPS on a single H100 GPU. It does so without anti-drifting tricks or acceleration hacks, relying on architectural innovation alone. Helios supports text-to-video (T2V), image-to-video (I2V), video-to-video (V2V), and interactive generation, with Day-0 support for Diffusers, SGLang, vLLM-Omni, and Ascend NPU. Run it locally with ~6GB VRAM using group offloading. Complete training code and three model variants (Base, Mid, Distilled) are available now.
WhisperLiveKit: Real-time Local Speech-to-Text
Discover WhisperLiveKit, an open-source project for real-time, fully local speech-to-text, translation, and speaker diarization. It builds on streaming research such as SimulStreaming and WhisperStreaming to deliver high accuracy at low latency, overcoming the limitations of traditional audio chunk processing. With a server and web UI included, WhisperLiveKit suits applications ranging from meeting transcription and accessibility tools to content creation and customer service analysis. The project offers installation via pip, configuration options for different models and backends, and deployment guides for both CPU and GPU environments using Docker.
TEN VAD: High-Performance, Lightweight Voice Activity Detector
Discover TEN VAD, a low-latency Voice Activity Detector (VAD) from the TEN framework. Designed for real-time conversational AI, TEN VAD offers higher precision and efficiency than widely used detectors such as WebRTC VAD and Silero VAD. It has a lightweight footprint, runs cross-platform (Linux, Windows, macOS, Android, iOS, and the Web via WASM), and provides bindings for programming languages including Python, JS, and C. This open-source project is aimed at developers building agent-friendly, high-performance voice applications who need accurate speech detection and reduced latency in human-agent interactions. Explore its features, installation guides, and how it fits into the broader TEN ecosystem for multimodal conversational AI.
Airi: Open-Source AI VTuber for Real-Time Interaction
Discover Airi, an ambitious open-source project that aims to create AI-driven virtual characters capable of real-time voice chat and even of playing Minecraft and Factorio. Built with web technologies such as WebGPU and WebAudio, Airi is designed for accessibility and runs both in browsers and on desktop. The project invites developers, artists, and designers to contribute to its vision of bringing AI waifus and virtual personalities into our digital worlds. Learn about its current capabilities, its development roadmap, and how you can get involved in shaping the future of AI-powered virtual companions.