VoxCPM2: 2B Multilingual TTS with Voice Cloning & Design

VoxCPM2: Revolutionizing TTS with Tokenizer-Free Architecture

The Next Generation of Speech Synthesis

VoxCPM2 is a 2B-parameter text-to-speech model built on the MiniCPM-4 backbone. Its tokenizer-free, diffusion-autoregressive architecture models speech without a discrete audio tokenizer, which avoids the bottleneck of traditional speech tokenization. VoxCPM2 was trained on more than 2 million hours of multilingual speech. It produces 48 kHz audio in 30 languages and does not need language tags in the input.

✨ Key Innovations

🎨 Voice Design from Text Alone

Describe a voice in natural language and VoxCPM2 creates it from scratch. A description such as (Young female, warm gentle tone, slight smile) is enough, and no reference audio is required.
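The example descriptions in this article are written in parentheses. The sketch below assumes that the description is simply placed in parentheses in front of the text to be spoken. This is an assumption, so check the exact prompt format against the VoxCPM2 documentation. The helper name `with_voice_description` is hypothetical.

```python
def with_voice_description(description: str, text: str) -> str:
    """Prefix `text` with a parenthesized voice description.

    Hypothetical helper: assumes VoxCPM2 treats a leading "(...)" span as a
    voice/style description rather than as words to speak.
    """
    description = description.strip()
    if not description:
        return text  # no description: plain synthesis
    if "(" in description or ")" in description:
        # A nested parenthesis would end the description early
        raise ValueError("description must not contain parentheses")
    return f"({description}){text}"


prompt = with_voice_description(
    "Young female, warm gentle tone, slight smile",
    "Welcome to the show!",
)
print(prompt)  # (Young female, warm gentle tone, slight smile)Welcome to the show!
```

Under that assumption, the resulting string would be passed as the text argument to `model.generate`, the same call used in the quick-start code.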

πŸŽ›οΈ Controllable Voice Cloning

Clone a voice from a short clip while controlling emotion, pace, and style. Adding a description such as (slightly faster, cheerful) keeps the speaker's timbre and changes only the delivery.

πŸŽ™οΈ Ultimate Cloning Fidelity

Supplying reference audio together with its transcript produces the most faithful clone. It reproduces the speaker's timbre, rhythm, and emotion as closely as possible.
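Below is a sketch of how a reference clip and its transcript might be passed to the model. The keyword names `prompt_wav_path` and `prompt_text` come from the VoxCPM 1.x Python API. They are assumed, not confirmed, to carry over to VoxCPM2. The helper `clone_voice` and the file names are hypothetical. The function accepts any object that has a `generate` method, so you can exercise it without downloading the model.

```python
from typing import Any, Protocol


class SpeechModel(Protocol):
    """Any object with a VoxCPM-style generate() method (assumed signature)."""

    def generate(self, text: str, **kwargs: Any) -> Any: ...


def clone_voice(model: SpeechModel, text: str, ref_wav: str, ref_text: str,
                cfg_value: float = 2.0) -> Any:
    """Speak `text` in the voice recorded in `ref_wav` (hypothetical helper).

    Assumption: VoxCPM2 keeps the VoxCPM 1.x keywords prompt_wav_path and
    prompt_text for transcript-guided cloning.
    """
    if not ref_text.strip():
        raise ValueError("the reference transcript must not be empty")
    return model.generate(
        text=text,
        prompt_wav_path=ref_wav,  # short recording of the target speaker
        prompt_text=ref_text,     # exact transcript of that recording
        cfg_value=cfg_value,
    )


# Usage (requires the real model and a local recording, not run here):
# model = VoxCPM.from_pretrained("openbmb/VoxCPM2")
# wav = clone_voice(model, "A brand-new sentence.", "speaker.wav",
#                   "Exactly what the speaker says in speaker.wav.")
```

Keeping the model behind a small protocol has two benefits. The wrapper's checks can be tested with a stub, and only one place needs to change if VoxCPM2's actual keyword names differ.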

🚀 Quick Start

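```python
from voxcpm import VoxCPM
import soundfile as sf

# Download the weights from Hugging Face (cached after the first call) and load the model
model = VoxCPM.from_pretrained("openbmb/VoxCPM2")

# cfg_value: guidance strength; higher values follow the prompt more closely
wav = model.generate("Hello from VoxCPM2!", cfg_value=2.0)

# VoxCPM2 outputs 48 kHz audio, so write the file at that sample rate
sf.write("output.wav", wav, 48000)
```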

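Performance: with Nano-vLLM batched serving on an RTX 4090, the real-time factor (RTF) is about 0.13. That means roughly 0.13 s of compute per second of generated audio, using about 8 GB of VRAM.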
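RTF is the wall-clock synthesis time divided by the duration of the audio produced, so values below 1 are faster than real time. The helper below is a minimal, hypothetical timing sketch. It works with any function that returns a 1-D sequence of samples, and its 48 kHz default matches the sample rate in the quick-start code.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesize: Callable[[str], Sequence[float]], text: str,
                     sample_rate: int = 48_000) -> float:
    """Return synthesis wall-clock time divided by generated audio duration."""
    start = time.perf_counter()
    wav = synthesize(text)
    elapsed = time.perf_counter() - start
    duration = len(wav) / sample_rate  # seconds of audio produced
    if duration == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / duration


# Usage with the quick-start model (not run here):
# rtf = real_time_factor(lambda t: model.generate(t, cfg_value=2.0), "Hello from VoxCPM2!")
```

As a worked example, an RTF of 0.13 means a 10-second clip takes about 10 × 0.13 = 1.3 s to synthesize. A single timing on one short sentence is noisy, and the batched Nano-vLLM setup behind the 0.13 figure will behave differently from one-off calls.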

🌍 30-Language Coverage

Arabic, Chinese (including 8+ dialects), English, French, German, Hindi, Japanese, Korean, Spanish, Thai, Vietnamese, and more.

📊 Benchmark Comparison

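| Model | Params | EN WER (↓) | ZH CER (↓) | Speaker SIM (↑) |
|---|---|---|---|---|
| VoxCPM2 | 2B | 1.84% | 0.97% | 85.4% (EN) |
| Qwen3-TTS | 1.7B | 1.23% | 1.22% | 77.5% |
| FishAudio S2 | 4B | 0.99% | 0.54% | 79.7% |

Lower WER/CER and higher SIM are better. In this comparison, VoxCPM2 has the highest speaker similarity, while FishAudio S2 has the lowest EN WER and ZH CER.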
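TTS error rates like these are typically computed by transcribing the synthesized speech with an ASR model and comparing the result to the input text. SIM is commonly the cosine similarity between speaker embeddings of the reference and generated audio. The article does not name the test set, the ASR system, or the speaker-embedding model, so this sketch only illustrates the metric definitions. WER and CER are the edit distance over words or characters divided by the reference length. Real pipelines also normalize case and punctuation first.

```python
import math
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # delete r
                          curr[j - 1] + 1,         # insert h
                          prev[j - 1] + (r != h))  # substitute (or match)
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref = reference.split()
    if not ref:
        raise ValueError("empty reference")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; whitespace is ignored, a common choice for Chinese."""
    ref = [c for c in reference if not c.isspace()]
    if not ref:
        raise ValueError("empty reference")
    hyp = [c for c in hypothesis if not c.isspace()]
    return edit_distance(ref, hyp) / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """SIM-style score: cosine of the angle between two speaker embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
print(cer("你好世界", "你好世间"))  # 1 edit / 4 characters = 0.25
```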

🔧 Production Ready

  • CLI: voxcpm clone --reference-audio voice.wav
  • Web Demo: python app.py
  • LoRA fine-tuning: 5 to 10 minutes of audio is enough to adapt the model to a new speaker
  • Nano-vLLM: High-throughput async serving
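The LoRA bullet can be made concrete with the standard LoRA update. The pretrained weight W0 stays frozen, and only two small low-rank matrices B and A are trained, which keeps the number of trained parameters small. The article does not say which VoxCPM2 layers are adapted or which rank is used. The sizes d = k = 2048 and r = 8 below are hypothetical and serve only to show the arithmetic.

```latex
\begin{aligned}
W' &= W_0 + \tfrac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
\text{trainable parameters} &= r(d + k) \quad \text{vs.} \quad dk \ \text{(full fine-tuning)} \\
d = k = 2048,\ r = 8: \quad \frac{r(d + k)}{dk} &= \frac{32\,768}{4\,194\,304} \approx 0.78\%
\end{aligned}
```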

📦 Get Started Now

pip install voxcpm

VoxCPM2 is released under the Apache 2.0 license, so commercial use is permitted. Its GitHub repository has passed 10K stars.

Live Playground | Hugging Face Weights
