Chatterbox TTS: Open Source Speech Synthesis Powerhouse

Unleash Your Content with Chatterbox: The Advanced Open-Source TTS Model

Resemble AI is proud to present Chatterbox, a groundbreaking open-source Text-to-Speech (TTS) model designed to bring your creative projects to life. Licensed under the permissive MIT license, Chatterbox has been meticulously developed and benchmarked, consistently outperforming established closed-source systems like ElevenLabs in user evaluations. Whether you're developing engaging video content, interactive games, or sophisticated AI agents, Chatterbox offers a powerful and flexible solution for generating high-quality synthetic speech.

Key Features and Capabilities

Chatterbox stands out with its impressive array of features:

  • State-of-the-Art Zero-Shot TTS: Synthesizes speech in a new voice from a short reference clip, with no per-speaker fine-tuning, making voice cloning more accessible than ever.
  • 0.5B Llama Backbone: Built on a Llama-style backbone with roughly 0.5 billion parameters for natural-sounding speech generation.
  • Unique Exaggeration/Intensity Control: Fine-tune the expressiveness of the synthesized speech, allowing for dramatic or subtle vocal performances.
  • Ultra-Stable Alignment-Informed Inference: Ensures consistent and high-quality output, reducing artifacts and unwanted variations.
  • Extensive Training Data: Trained on a massive 0.5 million hours of cleaned data, contributing to its remarkable naturalness.
  • Watermarked Outputs: Incorporates built-in PerTh (Perceptual Threshold) Watermarking for responsible AI development, ensuring detectability even after audio manipulation.
  • Easy Voice Conversion Script: Includes a convenient script for seamless voice conversion tasks.
  • Outperforms ElevenLabs: Rated above ElevenLabs, a leading commercial alternative, in user evaluations.

Getting Started with Chatterbox

Integrating Chatterbox into your workflow is straightforward. You can install it directly using pip:

pip install chatterbox-tts

Alternatively, for more advanced usage or customization, you can install it from source:

# Create and activate a new conda environment
conda create -yn chatterbox python=3.11
conda activate chatterbox

# Clone the repository and install
git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .

The project is developed and tested primarily with Python 3.11 on Debian 11.
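
After either install route, a quick check that the package is visible to the active environment can save some debugging. The snippet below is a minimal sketch using only the Python standard library. It assumes the distribution is registered under the name `chatterbox-tts` (the name used in the pip command above); the helper `installed_version` is a hypothetical name introduced for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "chatterbox-tts") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"chatterbox-tts: {version or 'not installed'}")
```

If this reports "not installed", the most likely cause is that pip ran in a different environment from the one now active.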

Basic Usage Example

Here’s a simple example demonstrating how to generate speech using Chatterbox:

import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Initialize the model on CUDA device
model = ChatterboxTTS.from_pretrained(device="cuda")

# Text to synthesize
text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."

# Generate speech
wav = model.generate(text)

# Save the synthesized audio
ta.save("test-1.wav", wav, model.sr)
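
The example above hard-codes `device="cuda"`, which fails on machines without an NVIDIA GPU. The sketch below picks a device at runtime instead. It assumes `from_pretrained` also accepts the other standard PyTorch device strings (`"mps"`, `"cpu"`), which is worth confirming against the repository. `pick_device` and `load_model` are hypothetical helper names.

```python
def pick_device() -> str:
    """Pick the best available PyTorch device string, falling back to CPU."""
    try:
        import torch
    except ImportError:
        # Without PyTorch nothing can run on an accelerator anyway.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon backend; guarded because older PyTorch builds lack it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def load_model():
    """Load Chatterbox on the selected device (assumes non-CUDA devices are accepted)."""
    from chatterbox.tts import ChatterboxTTS  # imported lazily: heavy dependency

    return ChatterboxTTS.from_pretrained(device=pick_device())
```

Expect CPU inference to be considerably slower than GPU inference for a model of this size.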

Further details on advanced usage and voice prompting can be found in the `example_tts.py` and `example_vc.py` scripts in the repository.
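
As a rough illustration of the exaggeration control and voice prompting mentioned above, the sketch below collects the optional settings into keyword arguments for `model.generate`. The parameter names `exaggeration`, `cfg_weight`, and `audio_prompt_path`, and the 0.5 defaults, are assumptions based on the repository's example scripts and should be checked against the installed version. The helpers `build_generate_kwargs` and `synthesize` are hypothetical, as is the non-negativity check.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    exaggeration: float = 0.5,
    cfg_weight: float = 0.5,
    audio_prompt_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble optional keyword arguments for model.generate (names assumed)."""
    if exaggeration < 0 or cfg_weight < 0:
        # Illustrative sanity check, not a documented constraint of the library.
        raise ValueError("exaggeration and cfg_weight must be non-negative")
    kwargs: Dict[str, Any] = {"exaggeration": exaggeration, "cfg_weight": cfg_weight}
    if audio_prompt_path is not None:
        # Reference clip whose voice the output should imitate.
        kwargs["audio_prompt_path"] = audio_prompt_path
    return kwargs


def synthesize(model, text: str, out_path: str, **options) -> None:
    """Generate speech with the given options and save it as a WAV file."""
    import torchaudio as ta  # imported lazily: only needed when saving

    wav = model.generate(text, **build_generate_kwargs(**options))
    ta.save(out_path, wav, model.sr)
```

For example, `synthesize(model, text, "dramatic.wav", exaggeration=0.7, cfg_weight=0.3, audio_prompt_path="YOUR_FILE.wav")` would request a more expressive reading in the reference speaker's voice, assuming the parameters behave as described.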

Responsible AI and Watermarking

Chatterbox is built with responsible AI use in mind. Every generated audio file carries an imperceptible neural watermark from Resemble AI's PerTh watermarker. The watermark survives common audio manipulations, including MP3 compression and editing, while maintaining nearly 100% detection accuracy, so generated audio remains identifiable.

To extract the watermark:

import perth
import librosa

AUDIO_PATH = "YOUR_FILE.wav"

# Load the watermarked audio
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)

# Initialize watermarker
watermarker = perth.PerthImplicitWatermarker()

# Extract watermark
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output indicates 0.0 (no watermark) or 1.0 (watermarked)
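
Because the extracted value is a number rather than a boolean, it can help to wrap the check in a small helper when screening files. The sketch below reuses the same `perth` and `librosa` calls as the snippet above. The 0.5 threshold and the helper names `is_watermarked` and `check_file` are assumptions made for illustration, not part of the library.

```python
def is_watermarked(score: float, threshold: float = 0.5) -> bool:
    """Interpret the extracted value: near 1.0 means watermarked, near 0.0 means not."""
    return float(score) >= threshold


def check_file(path: str, watermarker=None) -> bool:
    """Load an audio file and report whether a PerTh watermark is detected."""
    import librosa  # imported lazily: heavy dependencies
    import perth

    if watermarker is None:
        watermarker = perth.PerthImplicitWatermarker()
    # sr=None keeps the file's native sample rate, as in the snippet above.
    audio, sr = librosa.load(path, sr=None)
    return is_watermarked(watermarker.get_watermark(audio, sample_rate=sr))
```

Passing an existing `watermarker` instance into `check_file` avoids re-initializing it for every file when checking a batch.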

Join the Community

Resemble AI invites you to join their Discord community to collaborate, share insights, and build awesome projects together. Embrace the power of open-source TTS with Chatterbox and elevate your audio content.
