FunCineForge: Zero-Shot Movie Dubbing Pipeline

Discover FunCineForge, an open-source toolkit for creating large-scale movie dubbing datasets and deploying zero-shot dubbing models. Its end-to-end pipeline handles video processing, speech separation, speaker diarization, and multimodal correction using MLLMs. Build CineDub-CN/EN datasets from raw footage and generate high-quality dubs with tight lip-sync and timbre matching. The release includes inference code and demo samples and supports both Chinese and English, making it a good fit for AI researchers and content creators.

FunCineForge: Revolutionizing Zero-Shot Movie Dubbing with Open-Source Power

The Future of Automated Dubbing is Here

FunCineForge from FunAudioLLM represents a breakthrough in AI-driven movie dubbing. This comprehensive open-source project delivers both a unified dataset pipeline and a multimodal LLM-based dubbing model that excels across diverse cinematic scenes, from monologues and narration to complex multi-speaker dialogues.

What Makes FunCineForge Special?

🎬 End-to-End Dataset Pipeline

The pipeline transforms raw video footage into production-ready dubbing datasets:

  1. Video Normalization & Trimming (normalize_trim.py)
  2. Speech Separation (vocals from background music)
  3. Video Clipping with bilingual subtitle support (Chinese/English)
  4. Speaker Diarization using multimodal active speaker detection
  5. Chain-of-thought (CoT) correction with MLLMs (Gemini-3-Pro), reducing CER to 0.94%
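
For context on the CER figure in step 5: character error rate is conventionally the character-level edit distance between a transcript and its reference, divided by the reference length. The snippet below is a minimal sketch of that standard definition. It is not FunCineForge's own scoring code, and the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against ref, as a fraction of len(ref)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One wrong character out of four gives a CER of 25%.
print(f"{cer('abcd', 'abxd'):.2%}")
```

Because the metric counts characters rather than words, it applies to Chinese subtitles without a word segmentation step.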

🤖 State-of-the-Art Dubbing Model

  • Superior audio quality
  • Precise lip synchronization
  • Seamless timbre transitions
  • Excellent instruction following

🚀 Quick Start Guide

git clone git@github.com:FunAudioLLM/FunCineForge.git
conda create -n FunCineForge python=3.10
conda activate FunCineForge
python setup.py

Dataset Processing:

python normalize_trim.py --root datasets/raw_zh --intro 10 --outro 10
cd speech_separation && python run.py --root datasets/clean/zh
cd ../video_clip && bash run.sh --stage 1 --stop_stage 2 --lang zh
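
The three commands above can also be chained from a small driver that stops at the first failing stage. This is a sketch only. It assumes it is run from the repository root and that each `cd` above is relative to that root. `STEPS`, `run_steps`, and the `dry_run` flag are illustrative names, not part of FunCineForge. The commands and flags are copied unchanged from the lines above.

```python
import subprocess
from pathlib import Path

# (working directory relative to the repo root, command) for each stage above.
STEPS = [
    (".", ["python", "normalize_trim.py", "--root", "datasets/raw_zh",
           "--intro", "10", "--outro", "10"]),
    ("speech_separation", ["python", "run.py", "--root", "datasets/clean/zh"]),
    ("video_clip", ["bash", "run.sh", "--stage", "1", "--stop_stage", "2",
                    "--lang", "zh"]),
]


def run_steps(steps, repo_root=".", dry_run=True):
    """Run each stage in order; return the commands as printable strings.

    With dry_run=True nothing is executed. With dry_run=False a non-zero
    exit code raises subprocess.CalledProcessError and later stages are skipped.
    """
    planned = []
    for workdir, argv in steps:
        cwd = Path(repo_root) / workdir
        planned.append(f"(cd {cwd} && {' '.join(argv)})")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
    return planned


# Dry run by default: print what would be executed, in order.
for line in run_steps(STEPS):
    print(line)
```

Passing `cwd=` to `subprocess.run` avoids relying on `cd` state carried between commands, so a failed stage cannot leave later stages running in the wrong directory.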

Inference:

cd exps
bash infer.sh

📊 Key Results

  • CineDub-CN: First large-scale Chinese TV dubbing dataset
  • CER reduced from 4.53% → 0.94%
  • Speaker diarization error: 8.38% → 1.20%
  • Consumer-grade GPU inference
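
Read as relative reductions, the two error figures above work out to roughly 79% and 86%. The arithmetic is sketched below. The helper name `relative_reduction` is illustrative, and the four percentages are the ones listed above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which an error rate shrank relative to its starting value."""
    return (before - after) / before * 100.0


cer_drop = relative_reduction(4.53, 0.94)   # character error rate
diar_drop = relative_reduction(8.38, 1.20)  # speaker diarization error

print(f"CER: {cer_drop:.1f}% relative reduction")
print(f"Speaker diarization error: {diar_drop:.1f}% relative reduction")
```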

🎯 Who Should Use This?

  • AI Researchers building speech/video datasets
  • Content Creators needing automated dubbing
  • Film Studios exploring localization solutions
  • Developers working on multimodal TTS

Recent Updates (March 2026)

  • ✅ Open-sourced inference code + checkpoints
  • ✅ English dataset (CineDub-EN) released
  • ✅ Bilingual pipeline support
  • ✅ Demo samples available at funcineforge.github.io

📚 Citation

@misc{liu2026funcineforgeunifieddatasettoolkit,
  title={FunCineForge: A Unified Dataset Toolkit...},
  author={Jiaxuan Liu and Yang Xiang...}
}

โญ Star the repo and join the Tongyi Lab Speech Team's mission to make professional dubbing accessible to everyone.
