Voice‑Pro: Open‑Source AI Dubbing Studio for Multilingual Media
Voice‑Pro: The All‑In‑One Open‑Source AI Dubbing Studio
AI‑powered media creation is expanding rapidly. If you’ve been hunting for a free, open‑source solution that unifies text‑to‑speech (TTS), voice cloning, real‑time translation, and multimedia processing, look no further than Voice‑Pro.
What is Voice‑Pro?
- Open‑source Web UI built on Gradio 5.14.0, released under the GPL‑3.0 license.
- Speech recognition powered by Whisper, Faster‑Whisper, Whisper‑Timestamped, and WhisperX.
- Zero‑shot voice cloning: E2‑TTS, F5‑TTS, CosyVoice, and Kokoro.
- Text‑to‑speech: Edge‑TTS (100+ languages, 400+ voices), Kokoro (ranked #2 on HF TTS Arena), and optional paid Azure TTS.
- Multilingual translation with Deep‑Translator (100+ languages, optional Azure Translator).
- YouTube downloader (yt‑dlp) + audio isolation (Demucs) + subtitle generation.
- Supports Windows (NVIDIA GPU), macOS, and Linux.
Who Can Benefit?
- Podcasters & YouTubers: Create dubbed episodes with AI voices without paying for subscription plans.
- Educators & e‑learning creators: Generate multilingual subtitles and translations for videos.
- Developers & researchers: Experiment with cutting‑edge TTS models in a sandbox.
- Content creators: Produce karaoke tracks or AI‑generated audiobooks.
Getting Started – Installation
Prerequisites
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10/11, macOS 10.15+, Ubuntu 20.04+ | All |
| GPU | None (CPU mode); otherwise an NVIDIA GPU with CUDA 12.4 | NVIDIA 8 GB+ VRAM |
| RAM | 4 GB | 8 GB+ |
| Disk | 20 GB free | 30 GB+ |
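Before cloning, it can help to confirm the machine meets the table above. The following is a minimal sketch and is not part of Voice‑Pro: it uses only the Python standard library, and the function name `preflight` and the constant `MIN_FREE_DISK_GB` are illustrative. It treats the presence of `nvidia-smi` on the PATH as a rough hint that an NVIDIA driver is installed; it does not verify the CUDA version.

```python
import shutil

MIN_FREE_DISK_GB = 20  # minimum from the prerequisites table


def preflight(path=".", min_free_gb=MIN_FREE_DISK_GB):
    """Return a dict describing whether this machine meets the basic prerequisites."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
        # nvidia-smi ships with the NVIDIA driver; if it is missing,
        # plan on running in CPU mode.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

Run it from the drive where you plan to clone the repository, since free space is measured for the given path.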
Clone the Repo
git clone https://github.com/abus-aikorea/voice-pro.git
cd voice-pro
Configure (Windows)
REM Installs ffmpeg, checks CUDA, and downloads models
configure.bat
Configure (macOS/Linux)
chmod +x configure.sh
./configure.sh
Tip: The first run will download large model checkpoints (~10 GB). Ensure a fast Internet connection.
Run the WebUI
start.bat # Windows
./start.sh # macOS/Linux
Once the server is running, the WebUI is available at http://127.0.0.1:7870/. Open it in your browser.
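If the page does not load, a quick way to tell "server not started yet" from "browser problem" is to test the port directly. This is a small standard‑library sketch, not a Voice‑Pro utility; the function name `webui_is_up` is illustrative, and the default port simply mirrors the URL above.

```python
import socket


def webui_is_up(host="127.0.0.1", port=7870, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("WebUI reachable:", webui_is_up())
```

A `True` result only shows that a process is listening on the port, not that it is Voice‑Pro, so treat it as a first diagnostic step.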
Using Voice‑Pro – Step by Step
- Upload Video or Audio – In the Dubbing Studio tab, paste a YouTube URL or upload an MP4/WAV file.
- Extract Audio – The tool automatically calls yt‑dlp to pull video audio and Demucs to separate vocals.
- Transcribe – WhisperX generates a high‑quality, timestamped transcript in the language spoken in the audio (select it from the language list).
- Translate – Translate the transcript into your target language using Deep‑Translator (100+ languages).
- Choose a Voice – Pick an existing voice via Edge‑TTS or clone a reference sample with F5‑TTS/CosyVoice – no fine‑tuning required.
- Synthesize – TTS with adjustable speed, volume, pitch. Export as WAV/FLAC/MP3.
- Sync & Export – Automatically creates SRT subtitles, uploads them back to YouTube, or saves locally.
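To make the last step concrete, here is a minimal sketch of how timed transcript segments can be turned into SRT text. It is not Voice‑Pro's own exporter. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` field, a shape commonly returned by Whisper‑family transcribers; the function names are illustrative.

```python
def format_timestamp(seconds):
    """Convert seconds (int or float) to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render an iterable of {'start', 'end', 'text'} dicts as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " Today we look at dubbing."},
    ]
    print(segments_to_srt(demo))
```

Because the cue text is just a string, the same function works for translated segments: replace each `text` with its translation and keep the original timings.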
Advanced Features
- Zero‑shot cloning: No model training, just supply a short audio clip.
- Custom compute type: Switch between float32, float16, or int8 (quantized) to balance quality vs. GPU usage.
- Real‑time demos: On the Live Translation tab, speak into the mic and watch subtitles appear in real time.
- API‑like interface: The Gradio server can be wrapped by other Python scripts; see app/voice_pro.py for examples.
- Community voice library: Contributors can add new celebrity voices via GitHub Issues; a curated list is hosted in the celebrities30s README.
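The compute‑type option above trades precision for memory, and a little arithmetic shows why. The sketch below estimates the memory taken by model weights alone at 4, 2, and 1 bytes per parameter, then picks the highest precision that fits a budget. It is a rough illustration under stated assumptions: the 1.5‑billion‑parameter figure is hypothetical, activations and runtime overhead are ignored, and real int8 quantization may keep some layers at higher precision.

```python
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, compute_type):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_WEIGHT[compute_type] / 1024**3


def pick_compute_type(num_params, budget_gb):
    """Return the highest-precision type whose weights fit in budget_gb, else None."""
    for compute_type in ("float32", "float16", "int8"):
        if weight_memory_gb(num_params, compute_type) <= budget_gb:
            return compute_type
    return None


if __name__ == "__main__":
    params = 1_500_000_000  # hypothetical 1.5B-parameter model
    for ct in BYTES_PER_WEIGHT:
        print(f"{ct}: {weight_memory_gb(params, ct):.2f} GiB")
    print("fits in 4 GiB:", pick_compute_type(params, 4))
```

In practice, leave headroom below the card's total VRAM, since the estimate covers weights only.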
Why Voice‑Pro Outperforms SaaS
Voice‑Pro removes subscription fatigue:
- Free for all core features – no per‑minute costs.
- Open‑source – you can modify the TTS pipeline or integrate your own models.
- GPU flexibility – run on a laptop or deploy to a cloud GPU instance.
- Broad feature coverage – offers TTS, voice cloning, and dubbing workflows similar in scope to commercial services like ElevenLabs, plus deeper controls.
Troubleshooting Quick‑Fixes
| Issue | Fix |
|---|---|
| CUDA OOM | Reduce denoise level or switch to int8 compute |
| Whisper errors | Ensure requirements-voice-gpu.txt or -cpu.txt is installed; delete installer_files then rerun configure |
| Subtitles off‑sync | Use the WhisperX tab to re‑align timestamps |
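The CUDA OOM fix in the table can also be automated when you script model loading yourself. The following is a generic sketch of a precision fallback ladder, not code from Voice‑Pro: `load_with_fallback` and `fake_loader` are hypothetical names, and `load_fn` stands in for whatever constructor your transcription or TTS library provides. It assumes out‑of‑memory failures surface as `RuntimeError` (as PyTorch does for CUDA OOM) or `MemoryError`.

```python
def load_with_fallback(load_fn, compute_types=("float16", "int8")):
    """Try load_fn(compute_type) for each type in order; return (model, type)."""
    last_error = None
    for compute_type in compute_types:
        try:
            return load_fn(compute_type), compute_type
        except (MemoryError, RuntimeError) as exc:
            # Assumption: OOM is reported as one of these exception types.
            # Adjust the tuple if your runtime raises something else.
            last_error = exc
    raise RuntimeError(
        f"no compute type in {compute_types} could be loaded"
    ) from last_error


if __name__ == "__main__":
    def fake_loader(compute_type):  # stand-in for a real model constructor
        if compute_type != "int8":
            raise RuntimeError("CUDA out of memory (simulated)")
        return f"model<{compute_type}>"

    print(load_with_fallback(fake_loader))
```

Catching `RuntimeError` broadly can hide unrelated failures, so in real use it is worth inspecting the error message before falling back.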
Community & Next Steps
- Check out the GitHub Discussions for feature requests and support.
- Contribute by adding new voice samples or optimizing existing models.
- Experiment with adding your own Hugging Face pipelines – the modular design makes it straightforward.
- Consider sponsoring the repo or buying a “premium” upgrade (Azure TTS/Translator) if you need enterprise‑grade quality.
Final Word
Voice‑Pro is a powerful, zero‑cost alternative to pricey AI dubbing services. Its modular open‑source nature means you’re not locked into a vendor; you own the code, the models, and the output. Whether you’re a YouTuber looking to dub a video in 12 languages, a research lab that needs to prototype voice clones quickly, or a student in a language class, Voice‑Pro gives you the tools to turn speech and text into high‑fidelity audio in minutes.
Get started today, and bring the future of AI audio to your projects—without paying a dime.