Netflix VOID: Remove Objects & Interactions from Videos
Netflix VOID: Revolutionary Video Object Removal with Physics Awareness
Netflix has open-sourced VOID (Video Object and Interaction Deletion), a state-of-the-art model that goes beyond traditional video inpainting. VOID doesn't just erase objects; it also removes their physical interactions with the environment, so the edited scene stays physically plausible.
What Makes VOID Different?
Traditional video editing tools struggle with secondary motion effects. Remove a person holding a guitar, and the guitar is left floating in mid-air. VOID addresses this with:
- Primary object removal (person = gone)
- Interaction regions (guitar falls naturally due to physics)
- Quadmask system (0=object, 63=overlap, 127=affected, 255=keep)
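To make the four quadmask values concrete, here is a minimal sketch of how one frame could be assembled from two binary masks. The function names are hypothetical and not part of the VOID codebase, and it assumes that "overlap" means pixels lying in both the object mask and the affected region. Plain nested lists keep the example dependency-free; a real pipeline would more likely use image arrays.

```python
# Quadmask values as listed above.
OBJECT, OVERLAP, AFFECTED, KEEP = 0, 63, 127, 255


def quadmask_value(in_object: bool, in_affected: bool) -> int:
    """Map one pixel's membership in the two masks to a quadmask value."""
    if in_object and in_affected:
        return OVERLAP  # assumption: overlap = object AND affected region
    if in_object:
        return OBJECT
    if in_affected:
        return AFFECTED
    return KEEP


def build_quadmask(object_mask, affected_mask):
    """Combine two same-sized boolean masks (lists of rows) into one frame."""
    return [
        [quadmask_value(o, a) for o, a in zip(object_row, affected_row)]
        for object_row, affected_row in zip(object_mask, affected_mask)
    ]


# Toy 1x4 frame: object covers pixels 0-1, affected region covers pixels 1-2.
object_mask = [[True, True, False, False]]
affected_mask = [[False, True, True, False]]
print(build_quadmask(object_mask, affected_mask))  # [[0, 63, 127, 255]]
```

Packing all four regions into a single grayscale value means one mask video can carry both what to delete and where the scene should be allowed to change.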
https://github.com/Netflix/void-model/raw/main/assets/teaser-with-name.mp4
Quick Start (Colab Ready)
# 1. Open Colab notebook (40GB+ VRAM recommended)
# 2. Models auto-download from Hugging Face
# 3. Process sample video in minutes
Live Demo: Gradio Interface
Technical Breakdown
Two-Stage Pipeline
- Pass 1: Base inpainting with VOID transformer
- Pass 2: Warped-noise refinement for temporal consistency
Smart Mask Generation
- SAM2 for precise segmentation
- Gemini VLM reasons about interaction regions
- GUI editor for manual mask refinement
Input Format
my-video/
├── input_video.mp4
├── quadmask_0.mp4
└── prompt.json # {"bg": "A table with a cup on it."}
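If you are preparing your own clips, a small helper can write the prompt file and check that a sample folder matches the layout above. This is only a sketch: `write_prompt` and `missing_files` are hypothetical helpers, not part of the VOID repository, and the only facts taken from the layout are the three file names and the `bg` key.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# File names taken from the folder layout shown above.
REQUIRED_FILES = ("input_video.mp4", "quadmask_0.mp4", "prompt.json")


def write_prompt(sample_dir: Path, background: str) -> Path:
    """Create sample_dir if needed and write prompt.json with a 'bg' description."""
    sample_dir.mkdir(parents=True, exist_ok=True)
    prompt_path = sample_dir / "prompt.json"
    prompt_path.write_text(json.dumps({"bg": background}), encoding="utf-8")
    return prompt_path


def missing_files(sample_dir: Path) -> List[str]:
    """Return the required file names that are not present in sample_dir."""
    return [name for name in REQUIRED_FILES if not (sample_dir / name).is_file()]


# Demo in a temporary directory: only prompt.json exists after write_prompt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "my-video"
    write_prompt(sample, "A table with a cup on it.")
    print(missing_files(sample))  # ['input_video.mp4', 'quadmask_0.mp4']
```

Going by the example prompt above, the `bg` text describes the scene as it should look after removal, not the object being deleted.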
Training from Scratch
VOID ships with complete data generation pipelines:
- HUMOTO: Human-object physics using Blender
- Kubric: Object-only interactions
Generate paired counterfactual videos (with/without object) and train both passes.
Real-World Applications
- VFX cleanup: Remove unwanted elements with realistic physics
- Privacy protection: Anonymize people while preserving scene dynamics
- Creative video editing: Rearrange scenes with natural motion
Community Extensions
- Gradio Web Demo
- Star History: 488 stars within days
- Apache 2.0 license
Get Started Today
- Clone: git clone https://github.com/Netflix/void-model
- Install: pip install -r requirements.txt
- Download models from Hugging Face
- Run Colab notebook
Paper: arXiv:2604.02296
VOID represents the cutting edge of video understanding: it combines VLM reasoning, SAM2 segmentation, and diffusion models to enable physics-aware video manipulation.