Netflix VOID: Remove Objects & Interactions from Videos

Netflix has open-sourced VOID (Video Object and Interaction Deletion), a state-of-the-art model that goes beyond traditional video inpainting. VOID doesn't just erase objects – it removes their physical interactions with the environment, creating incredibly realistic results.

What Makes VOID Different?

Traditional video editing tools struggle with secondary motion effects. Remove a person holding a guitar, and the guitar is left floating in mid-air. VOID handles the whole interaction:

  • Primary object removal (the person is gone)
  • Interaction regions (the guitar falls naturally under physics)
  • A quadmask system encoding four region types (0 = object, 63 = overlap, 127 = affected, 255 = keep)
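To make the four quadmask levels concrete, here is a minimal sketch of decoding them from a grayscale mask frame. The value set (0, 63, 127, 255) comes from the article; the helper name and the nearest-level snapping (useful because video compression perturbs pixel values) are my own assumptions, not VOID's actual API.

```python
# Hypothetical helper for the four quadmask levels; not part of VOID itself.
QUADMASK_LEVELS = {
    0: "object",      # primary object to remove
    63: "overlap",    # object/interaction overlap
    127: "affected",  # interaction region (e.g. the held guitar)
    255: "keep",      # untouched background
}

def classify_pixel(value: int) -> str:
    """Snap a grayscale pixel to the nearest quadmask level and name it."""
    nearest = min(QUADMASK_LEVELS, key=lambda level: abs(level - value))
    return QUADMASK_LEVELS[nearest]
```

Snapping matters in practice: a lossy-encoded mask video rarely contains exact 0/63/127/255 values, so `classify_pixel(60)` still resolves to `"overlap"`.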

Teaser video: https://github.com/Netflix/void-model/raw/main/assets/teaser-with-name.mp4

πŸš€ Quick Start (Colab Ready)

# 1. Open Colab notebook (40GB+ VRAM recommended)
# 2. Models auto-download from Hugging Face
# 3. Process sample video in minutes

Live Demo: Gradio Interface

πŸ› οΈ Technical Breakdown

Two-Stage Pipeline

  1. Pass 1: Base inpainting with VOID transformer
  2. Pass 2: Warped-noise refinement for temporal consistency
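The two passes above can be sketched with toy stand-ins. Here, pass 1 fills masked pixels from the mean of kept pixels, and pass 2 blends each frame with its neighbors for temporal consistency. All function names and logic are illustrative assumptions; the real model uses a diffusion transformer for pass 1 and warped-noise refinement for pass 2.

```python
# Toy sketch of VOID's two-stage pipeline; frames are flat lists of floats.
def pass1_inpaint(frame, mask):
    """Stand-in for base inpainting: fill masked pixels (mask != 255)
    with the mean of the kept pixels."""
    kept = [p for p, m in zip(frame, mask) if m == 255]
    fill = sum(kept) / len(kept) if kept else 0.0
    return [p if m == 255 else fill for p, m in zip(frame, mask)]

def pass2_refine(frames):
    """Stand-in for warped-noise refinement: average each frame with
    its temporal neighbors to suppress flicker."""
    out = []
    for i, frame in enumerate(frames):
        prev = frames[max(i - 1, 0)]
        nxt = frames[min(i + 1, len(frames) - 1)]
        out.append([(a + b + c) / 3 for a, b, c in zip(prev, frame, nxt)])
    return out

def run_pipeline(frames, masks):
    """Run both passes in sequence, mirroring the two-stage design."""
    return pass2_refine([pass1_inpaint(f, m) for f, m in zip(frames, masks)])
```

The point of the structure, not the toy math: spatial fill first, then a second pass that looks across time.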

Smart Mask Generation

  • SAM2 for precise segmentation
  • Gemini VLM reasons about interaction regions
  • GUI editor for manual mask refinement
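The real pipeline uses SAM2 for the object mask and a Gemini VLM to reason about which surrounding regions are physically affected. As a crude geometric stand-in for that reasoning, the sketch below dilates the object mask to produce an "affected" ring and assembles a quadmask grid (the overlap level, 63, is omitted for brevity). Everything here is a hypothetical illustration, not VOID's actual mask generator.

```python
def dilate(mask, steps=1):
    """Grow a binary object mask by `steps` pixels (4-neighborhood).
    A geometric stand-in for the VLM's interaction-region reasoning."""
    h, w = len(mask), len(mask[0])
    for _ in range(steps):
        grown = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            grown[ny][nx] = 1
        mask = grown
    return mask

def build_quadmask(obj_mask, steps=1):
    """Map object pixels to 0, the dilated ring to 127 (affected),
    and everything else to 255 (keep)."""
    affected = dilate(obj_mask, steps)
    return [[0 if o else (127 if a else 255)
             for o, a in zip(orow, arow)]
            for orow, arow in zip(obj_mask, affected)]
```

For a single object pixel in a 3x3 grid, the four edge neighbors become "affected" and the corners stay "keep".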

Input Format

my-video/
β”œβ”€β”€ input_video.mp4
β”œβ”€β”€ quadmask_0.mp4
└── prompt.json  # {"bg": "A table with a cup on it."}
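A small helper can assemble that layout programmatically. Only the file names (`input_video.mp4`, `quadmask_0.mp4`, `prompt.json`) and the `{"bg": ...}` schema come from the article; the function name and copy-by-bytes approach are my own assumptions.

```python
import json
from pathlib import Path

def prepare_input_dir(root, video_path, quadmask_path, bg_prompt):
    """Lay out the directory structure described above:
    input_video.mp4, quadmask_0.mp4, and prompt.json with a "bg" key."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "input_video.mp4").write_bytes(Path(video_path).read_bytes())
    (root / "quadmask_0.mp4").write_bytes(Path(quadmask_path).read_bytes())
    (root / "prompt.json").write_text(
        json.dumps({"bg": bg_prompt}), encoding="utf-8")
    return root
```

Usage: `prepare_input_dir("my-video", "raw.mp4", "mask.mp4", "A table with a cup on it.")`.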

Training from Scratch

VOID ships with complete data generation pipelines:

  1. HUMOTO: Human-object physics using Blender
  2. Kubric: Object-only interactions

Generate paired counterfactual videos (with/without object) and train both passes.
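The idea of a paired counterfactual can be shown with a toy renderer: render the same scene twice, once with every object and once with one object removed, so the two clips differ only in that object and its effects. This is a conceptual sketch with invented names; the actual pipelines render physically simulated scenes in Blender (HUMOTO) and Kubric.

```python
def render_frame(background, objects):
    """Toy 1-D renderer: paint each object's (start, end, value) span
    over a copy of the background."""
    frame = background[:]
    for start, end, value in objects:
        for x in range(start, end):
            frame[x] = value
    return frame

def make_counterfactual_pair(background, objects, removed_index):
    """Render the scene with all objects, then with one object removed.
    The (with, without) frames form one supervised training pair."""
    with_obj = render_frame(background, objects)
    kept = [o for i, o in enumerate(objects) if i != removed_index]
    without_obj = render_frame(background, kept)
    return with_obj, without_obj
```

Because both renders share the same background and remaining objects, the model's target is exactly "the scene as if the object had never been there".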

Real-World Applications

  • VFX cleanup: Remove unwanted elements with realistic physics
  • Privacy protection: Anonymize people while preserving scene dynamics
  • Creative video editing: Rearrange scenes with natural motion

Get Started Today

  1. Clone: git clone https://github.com/Netflix/void-model
  2. Install: pip install -r requirements.txt
  3. Download models from Hugging Face
  4. Run Colab notebook

Paper: arXiv:2604.02296

VOID represents the cutting edge of video understanding – combining VLM reasoning, SAM2 segmentation, and diffusion models for unprecedented video manipulation capabilities.
