VoxCPM2: 2B Multilingual TTS with Voice Cloning & Design
VoxCPM2: Revolutionizing TTS with Tokenizer-Free Architecture
The Next Generation of Speech Synthesis
VoxCPM2 is a 2B-parameter text-to-speech model built on a MiniCPM-4 backbone. Its diffusion autoregressive architecture is tokenizer-free, which removes the bottleneck that traditional tokenization imposes on TTS pipelines. Trained on more than 2 million hours of multilingual speech, it delivers studio-quality 48 kHz audio in 30 languages and does not require language tags in the input.
Key Innovations
Voice Design from Text Alone
Describe a voice in natural language and the model generates it with no reference audio. For example, the description (Young female, warm gentle tone, slight smile) yields a new voice that matches it.
Controllable Voice Cloning
Clone a voice from a short clip while steering emotion, pace, and style. For example, the control (slightly faster, cheerful) keeps the speaker's timbre and changes only the delivery.
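Both the voice description and the style controls are written as a short parenthesized phrase. The sketch below shows one way to assemble such a phrase. It assumes the phrase is placed directly in front of the text to be spoken, which the examples suggest but this article does not confirm. `build_prompt` is a hypothetical helper, not part of the `voxcpm` package.

```python
def build_prompt(text: str, *controls: str) -> str:
    """Prefix `text` with a parenthesized, comma-separated control phrase.

    Hypothetical helper: where the phrase goes relative to the text is an
    assumption made for illustration.
    """
    # Drop empty or whitespace-only controls so they leave no stray commas.
    parts = [c.strip() for c in controls if c.strip()]
    if not parts:
        return text
    return f"({', '.join(parts)}) {text}"


# A voice-design description and a style control use the same shape.
design = build_prompt("Welcome aboard.", "Young female", "warm gentle tone", "slight smile")
styled = build_prompt("Welcome aboard.", "slightly faster", "cheerful")
print(design)
print(styled)
```

Keeping the phrase construction in one place makes it easy to change the placement if the model expects a different layout.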
Ultimate Cloning Fidelity
Provide reference audio together with its transcript for the highest-fidelity reproduction, capturing the nuances of the speaker's timbre, rhythm, and emotion.
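The three modes differ mainly in what accompanies the text: a description only, a reference clip, or a reference clip plus its transcript. The sketch below makes that decision explicit. Every name in it (`VoiceRequest`, `pick_mode`, the mode strings) is hypothetical and chosen for illustration; none of it is `voxcpm` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceRequest:
    """Hypothetical container for the inputs the three modes rely on."""
    text: str
    description: Optional[str] = None      # e.g. "Young female, warm gentle tone"
    reference_audio: Optional[str] = None  # path to a short reference clip
    reference_text: Optional[str] = None   # transcript of that clip


def pick_mode(req: VoiceRequest) -> str:
    """Return which mode the supplied inputs correspond to."""
    if req.reference_audio is None:
        if req.reference_text is not None:
            raise ValueError("a transcript is only meaningful with reference audio")
        return "voice_design" if req.description else "default_voice"
    # Reference audio present: the transcript decides the cloning mode.
    if req.reference_text:
        return "ultimate_cloning"
    return "controllable_cloning"
```

For example, `pick_mode(VoiceRequest("Hi", reference_audio="voice.wav"))` returns `"controllable_cloning"`, and adding `reference_text` switches it to `"ultimate_cloning"`.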
Lightning-Fast Implementation
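```python
from voxcpm import VoxCPM
import soundfile as sf

# Load the pretrained checkpoint by its model ID.
model = VoxCPM.from_pretrained("openbmb/VoxCPM2")
# Synthesize speech with cfg_value set to 2.0.
wav = model.generate("Hello from VoxCPM2!", cfg_value=2.0)
# Save the waveform at the model's 48 kHz output rate.
sf.write("output.wav", wav, 48000)
```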
Performance: a real-time factor (RTF) of about 0.13 on an RTX 4090 with Nano-vLLM (batched serving), using about 8 GB of VRAM.
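RTF is synthesis time divided by the duration of the audio produced, so an RTF of 0.13 means one second of speech takes roughly 0.13 seconds to generate. The sketch below shows how the figure can be computed. The helper names are illustrative, and because the quoted number is for batched serving, timing a single request this way may give a different value.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("audio must have a positive duration")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[], Sequence[float]], sample_rate: int) -> float:
    """Time one call to `synthesize` (which returns samples) and report its RTF."""
    start = time.perf_counter()
    samples = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


# 10 s of 48 kHz audio (480,000 samples) produced in 1.3 s gives an RTF of about 0.13.
example = real_time_factor(1.3, 480_000, 48_000)
```

Any zero-argument callable that returns the generated samples can be passed to `measure_rtf`, along with the output sample rate.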
30-Language Coverage
Supported languages include Arabic, Chinese (with 8+ dialects), English, French, German, Hindi, Japanese, Korean, Spanish, Thai, and Vietnamese, among 30 in total.
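Benchmark Results
Lower is better for WER and CER; higher is better for SIM. On these numbers VoxCPM2 leads on speaker similarity, while FishAudio S2 has the lowest error rates.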
| Model | Params | EN WER | ZH CER | SIM Score |
|---|---|---|---|---|
| VoxCPM2 | 2B | 1.84% | 0.97% | 85.4% (EN) |
| Qwen3-TTS | 1.7B | 1.23% | 1.22% | 77.5% |
| FishAudio S2 | 4B | 0.99% | 0.54% | 79.7% |
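For readers unfamiliar with the error-rate columns, WER and CER are edit distances normalized by the length of the reference, counted over words and characters respectively. The sketch below is a minimal illustration of that definition. It is not the benchmark's evaluation code, and it omits the text normalization and speech recognition steps that such benchmarks typically apply, which this article does not describe.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)
```

As an example, a hypothesis that drops one word from a six-word reference has a WER of 1/6, or about 16.7%.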
Production Ready
- CLI: `voxcpm clone --reference-audio voice.wav`
- Web Demo: `python app.py`
- LoRA Fine-tuning: 5-10 minutes of audio is enough to adapt to a new speaker
- Nano-vLLM: High-throughput async serving
Get Started Now
pip install voxcpm
VoxCPM2 is released under the Apache 2.0 license, so commercial use is permitted. The project has 10K+ stars on GitHub.