Voice-Pro: An Open-Source All-in-One AI Audio & Dubbing Suite
Voice-Pro is a powerful, open-source Gradio-based WebUI that integrates state-of-the-art voice cloning, transcription, and translation tools into one workflow.
For creators and developers, the current landscape of AI audio tools is fragmented. You often find yourself jumping between a YouTube downloader, a separate vocal isolation tool, a transcription service, and a voice cloning platform. Voice-Pro consolidates these tasks into a single interface.
Originally a commercial project, Voice-Pro has recently had its entire codebase open-sourced by its developers, making it a free alternative to subscription-based platforms like ElevenLabs or Descript.
What is Voice-Pro?
Voice-Pro is designed as a "Dubbing Studio" that handles the entire pipeline of multimedia content creation. Whether you are a podcaster looking to translate your content into multiple languages or a developer building an automated video processing pipeline, this tool provides a unified interface for the best open-source models available today.
Core Capabilities:
- Audio Extraction: Built-in yt-dlp support for downloading and processing YouTube content directly.
- Vocal Isolation: Uses Demucs to cleanly separate vocals from background music, essential for high-quality voice cloning.
- Speech-to-Text (STT): Supports a variety of Whisper implementations, including Faster-Whisper, Whisper-Timestamped, and WhisperX for high-accuracy, word-level transcription.
- Zero-Shot Voice Cloning: Features cutting-edge models like F5-TTS, E2-TTS, and CosyVoice, allowing you to clone voices with minimal reference audio.
- Text-to-Speech (TTS): Includes Edge-TTS for high-quality, natural-sounding speech and kokoro, a high-performance TTS model currently trending in the HuggingFace arena.
- Translation: Integrated Deep-Translator for instant, multilingual support across 100+ languages.
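To make the first two stages of that pipeline concrete, here is a minimal sketch that only builds the command lines for audio extraction and vocal isolation, using the public yt-dlp and Demucs command-line tools. It is not taken from the Voice-Pro codebase: the function names, output layout, and example URL are illustrative assumptions, and Voice-Pro may call these tools differently.

```python
import shlex
from pathlib import Path


def build_extract_cmd(url: str, out_dir: Path) -> list[str]:
    """Command for yt-dlp to download a video and keep only a WAV audio track."""
    return [
        "yt-dlp",
        "--extract-audio",          # keep the audio, drop the video stream
        "--audio-format", "wav",    # convert the audio with FFmpeg
        "--output", str(out_dir / "%(id)s.%(ext)s"),
        url,
    ]


def build_isolate_cmd(audio: Path, out_dir: Path) -> list[str]:
    """Command for Demucs to split a file into vocals and accompaniment."""
    return [
        "demucs",
        "--two-stems", "vocals",    # write a vocals stem and a no_vocals stem
        "--out", str(out_dir),
        str(audio),
    ]


if __name__ == "__main__":
    # Hypothetical URL and paths, printed rather than executed.
    print(shlex.join(build_extract_cmd("https://example.com/watch?v=ID", Path("work"))))
    print(shlex.join(build_isolate_cmd(Path("work/ID.wav"), Path("stems"))))
```

Each list can be handed to subprocess.run(cmd, check=True) once both tools and FFmpeg are installed. Building the commands as lists rather than shell strings avoids quoting problems with URLs and file names.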
Why Developers Should Care
Unlike SaaS platforms that charge per-minute fees, Voice-Pro is a self-hosted solution. If you have an NVIDIA GPU (with somewhere between 4 GB and 8 GB of VRAM as a minimum, depending on the models you load), you can run these models locally without worrying about API costs or data privacy concerns.
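If you are unsure whether your card clears that bar, a quick check along the following lines may help. This is a sketch that assumes the nvidia-smi utility shipped with the NVIDIA driver is on your PATH. The helper names and the 4 GiB default threshold are illustrative choices, not part of Voice-Pro.

```python
import subprocess

# Ask nvidia-smi for total memory per GPU, as bare numbers in MiB.
QUERY = ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]


def parse_vram_mib(output: str) -> list[int]:
    """Parse the query output: one integer (MiB) per GPU, one GPU per line."""
    return [int(line) for line in map(str.strip, output.splitlines()) if line.isdigit()]


def query_vram_mib() -> list[int]:
    """Return total VRAM per GPU in MiB, or an empty list if nvidia-smi fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_vram_mib(result.stdout)


def meets_minimum(vram_mib: list[int], min_gib: float = 4.0) -> bool:
    """True if at least one GPU has min_gib GiB of memory or more."""
    return any(mib >= min_gib * 1024 for mib in vram_mib)


if __name__ == "__main__":
    gpus = query_vram_mib()
    print("GPUs (MiB):", gpus)
    print("meets 4 GiB minimum:", meets_minimum(gpus))
```

An empty list means no NVIDIA GPU was detected or the driver tools are missing, in which case the check reports False.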
Technical Stack Highlights:
- Framework: Built on Python 3.10.15 with Gradio 5.14.0.
- Compute: Optimized for CUDA 12.4, ensuring fast inference for heavy tasks like voice cloning and transcription.
- Extensibility: Because it is open-source, you can modify the start-voice.py or one_click.py scripts to integrate your own custom models or fine-tuned weights.
Getting Started
Installation is designed to be "one-click" for Windows users, though it is also compatible with Linux and Mac environments.
- Clone the repository: git clone https://github.com/abus-aikorea/voice-pro.git
- Configure the environment: Run configure.bat (or configure.sh on Linux/Mac). This script handles the heavy lifting of setting up Git, FFmpeg, and the necessary CUDA dependencies.
- Launch the UI: Run start.bat. On the first run, the application will download the necessary model weights (such as the 9GB CosyVoice model), so ensure you have a stable internet connection.
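Because the first launch pulls several gigabytes of weights, it can be worth running a small preflight check before you start. The sketch below picks the setup script named in the steps above for the current OS and checks free disk space. The helper names and the 2 GiB safety margin are assumptions made for illustration, and the 9 GiB figure simply mirrors the CosyVoice download size quoted above.

```python
import platform
import shutil

GIB = 1024 ** 3


def configure_script(system: str) -> str:
    """Pick the setup script for an OS name as returned by platform.system()."""
    return "configure.bat" if system == "Windows" else "configure.sh"


def has_room(free_bytes: int, download_gib: float = 9.0, margin_gib: float = 2.0) -> bool:
    """True if free space covers the download plus a safety margin (assumed values)."""
    return free_bytes >= (download_gib + margin_gib) * GIB


if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # free bytes on the current drive
    print("setup script:", configure_script(platform.system()))
    print(f"free space: {free / GIB:.1f} GiB, enough for first run: {has_room(free)}")
```

Run it from the directory where you cloned the repository so the disk check applies to the right drive.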
Troubleshooting & Optimization
- CUDA Out-Of-Memory (OOM): If you hit memory limits, try setting the Denoise level to 0 or 1. Additionally, using int compute types instead of float can significantly reduce VRAM usage at the cost of slight quality degradation.
- Subtitle Quality: If your transcriptions aren't meeting your standards, remember that the model size matters. While large models provide the best accuracy, they require more compute. Experiment with medium or small models if you are processing long-form content on consumer hardware.
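A back-of-the-envelope calculation shows why both tips help. The sketch below estimates the memory taken by the model weights alone for a few Whisper sizes and numeric types. The parameter counts are the approximate figures published in the OpenAI Whisper README. The function name is hypothetical, and real VRAM usage will be higher because activations and decoding buffers are not counted.

```python
# Approximate parameter counts from the OpenAI Whisper README.
WHISPER_PARAMS = {
    "small": 244_000_000,
    "medium": 769_000_000,
    "large": 1_550_000_000,
}

# Bytes needed to store one weight in each numeric type.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(model: str, compute_type: str) -> float:
    """Rough size of the model weights alone, in GiB."""
    return WHISPER_PARAMS[model] * BYTES_PER_WEIGHT[compute_type] / 1024 ** 3


if __name__ == "__main__":
    for model in WHISPER_PARAMS:
        row = ", ".join(
            f"{ct}: {weight_memory_gib(model, ct):.2f} GiB" for ct in BYTES_PER_WEIGHT
        )
        print(f"{model:>6} -> {row}")
```

Going from float32 to int8 cuts the weight footprint to a quarter, and dropping from large to medium roughly halves it again. In Faster-Whisper, the numeric type is chosen through the compute_type argument of WhisperModel. How Voice-Pro's compute type setting maps onto that argument is an assumption here.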
Final Thoughts
Voice-Pro represents the best of the open-source AI community. By wrapping complex models like F5-TTS and WhisperX into a user-friendly WebUI, it lowers the barrier to entry for high-quality content production. Whether you are using it for personal projects or as a base for your own AI-powered application, it is a repository worth exploring.
Check out the project on GitHub to contribute or view the latest updates.