FunCineForge: Zero-Shot Movie Dubbing Pipeline
FunCineForge is an open-source toolkit for building large-scale movie dubbing datasets and running zero-shot dubbing models. Its end-to-end pipeline covers video processing, speech separation, speaker diarization, and multimodal correction using MLLMs. It builds the CineDub-CN and CineDub-EN datasets from raw footage and generates dubs that target accurate lip-sync and timbre matching. Inference code and demo samples are included, with support for both Chinese and English. It is aimed at AI researchers and content creators.
FunCineForge: Revolutionizing Zero-Shot Movie Dubbing with Open-Source Power
The Future of Automated Dubbing is Here
FunCineForge from FunAudioLLM represents a breakthrough in AI-driven movie dubbing. This comprehensive open-source project delivers both a unified dataset pipeline and a multimodal LLM-based dubbing model that excels across diverse cinematic scenes, from monologues and narration to complex multi-speaker dialogues.
What Makes FunCineForge Special?
End-to-End Dataset Pipeline
The pipeline transforms raw video footage into production-ready dubbing datasets:
- Video Normalization & Trimming (normalize_trim.py)
- Speech Separation (vocals from background music)
- Video Clipping with bilingual subtitle support (Chinese/English)
- Speaker Diarization using multimodal active speaker detection
- Chain-of-thought (CoT) Correction with MLLMs (Gemini-3-Pro), reducing the character error rate (CER) to 0.94%
State-of-the-Art Dubbing Model
- Superior audio quality
- Accurate lip synchronization
- Seamless timbre transitions
- Excellent instruction following
Quick Start Guide
git clone git@github.com:FunAudioLLM/FunCineForge.git
conda create -n FunCineForge python=3.10
conda activate FunCineForge
python setup.py
Dataset Processing:
python normalize_trim.py --root datasets/raw_zh --intro 10 --outro 10
cd speech_separation && python run.py --root datasets/clean/zh
cd ../video_clip && bash run.sh --stage 1 --stop_stage 2 --lang zh
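The three dataset-processing commands above can also be chained from one driver script. The following is a minimal sketch and not part of the FunCineForge repository: it reuses only the script names, flags, and paths shown above, while the helper names `build_steps` and `run_steps` are hypothetical. It assumes it is launched from the repository root, and that paths for languages other than `zh` follow the same pattern, which the text does not confirm.

```python
import subprocess
from pathlib import Path


def build_steps(lang="zh"):
    """Return (working_dir, argv) pairs mirroring the commands above.

    Only the "zh" paths appear in the text; reusing the same path
    pattern for other languages is an assumption.
    """
    return [
        (".", ["python", "normalize_trim.py", "--root", f"datasets/raw_{lang}",
               "--intro", "10", "--outro", "10"]),
        ("speech_separation", ["python", "run.py",
                               "--root", f"datasets/clean/{lang}"]),
        ("video_clip", ["bash", "run.sh", "--stage", "1",
                        "--stop_stage", "2", "--lang", lang]),
    ]


def run_steps(repo_root, steps, dry_run=True):
    """Print every step; execute it only when dry_run is False."""
    planned = []
    for workdir, argv in steps:
        cwd = Path(repo_root) / workdir
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
        planned.append(argv)
    return planned


if __name__ == "__main__":
    # Dry run by default: prints the commands without executing them.
    run_steps(".", build_steps("zh"), dry_run=True)
```

Keeping each step as a working directory plus argument list avoids shell string quoting and makes a dry run possible before committing to a long processing job.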
Inference:
cd exps
bash infer.sh
Key Results
- CineDub-CN: First large-scale Chinese TV dubbing dataset
- CER reduced from 4.53% to 0.94%
- Speaker diarization error reduced from 8.38% to 1.20%
- Consumer-grade GPU inference
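For readers unfamiliar with the metric, CER (character error rate) is conventionally the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below illustrates that standard definition only. It is not the project's evaluation code, and it applies no text normalization (punctuation, casing, whitespace), which a real evaluation may do; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in a six-character reference.
    print(f"{cer('今天天气很好', '今天天汽很好'):.2%}")  # prints 16.67%
```

On this definition, a CER of 0.94% corresponds to roughly one character error per 106 reference characters.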
Who Should Use This?
- AI Researchers building speech/video datasets
- Content Creators needing automated dubbing
- Film Studios exploring localization solutions
- Developers working on multimodal TTS
Recent Updates (March 2026)
- Open-sourced inference code + checkpoints
- English dataset (CineDub-EN) released
- Bilingual pipeline support
- Demo samples available at funcineforge.github.io
Citation
@misc{liu2026funcineforgeunifieddatasettoolkit,
title={FunCineForge: A Unified Dataset Toolkit...},
author={Jiaxuan Liu and Yang Xiang...}
Star the repo and join the Tongyi Lab Speech Team's mission to make professional dubbing accessible to everyone.