Netflix VOID: Remove Objects & Interactions from Videos

Netflix has open-sourced VOID (Video Object and Interaction Deletion), a state-of-the-art model that goes beyond traditional video inpainting. VOID doesn't just erase objects – it removes their physical interactions with the environment, creating incredibly realistic results.

What Makes VOID Different?

Traditional video editing tools struggle with secondary motion effects. Remove a person holding a guitar, and the guitar is left floating in mid-air. VOID handles the whole interaction:

  • Primary object removal (the person is gone)
  • Interaction regions (the guitar falls naturally under physics)
  • A quadmask system encoding four region types (0 = object, 63 = overlap, 127 = affected, 255 = keep)
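To make the four quadmask levels concrete, here is a minimal sketch of decoding them from a grayscale mask frame. The value set (0, 63, 127, 255) comes from the article; the helper name and the nearest-level snapping (useful because video compression perturbs pixel values) are my own assumptions, not VOID's actual API.

```python
# Hypothetical helper for the four quadmask levels; not part of VOID itself.
QUADMASK_LEVELS = {
    0: "object",      # primary object to remove
    63: "overlap",    # object/interaction overlap
    127: "affected",  # interaction region (e.g. the held guitar)
    255: "keep",      # untouched background
}

def classify_pixel(value: int) -> str:
    """Snap a grayscale pixel to the nearest quadmask level and name it."""
    nearest = min(QUADMASK_LEVELS, key=lambda level: abs(level - value))
    return QUADMASK_LEVELS[nearest]
```

Snapping matters in practice: a lossy-encoded mask video rarely contains exact 0/63/127/255 values, so `classify_pixel(60)` still resolves to `"overlap"`.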

Teaser video: https://github.com/Netflix/void-model/raw/main/assets/teaser-with-name.mp4

πŸš€ Quick Start (Colab Ready)

# 1. Open Colab notebook (40GB+ VRAM recommended)
# 2. Models auto-download from Hugging Face
# 3. Process sample video in minutes

Live Demo: Gradio Interface

πŸ› οΈ Technical Breakdown

Two-Stage Pipeline

  1. Pass 1: Base inpainting with VOID transformer
  2. Pass 2: Warped-noise refinement for temporal consistency
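The two passes above can be sketched with toy stand-ins. Here, pass 1 fills masked pixels from the mean of kept pixels, and pass 2 blends each frame with its neighbors for temporal consistency. All function names and logic are illustrative assumptions; the real model uses a diffusion transformer for pass 1 and warped-noise refinement for pass 2.

```python
# Toy sketch of VOID's two-stage pipeline; frames are flat lists of floats.
def pass1_inpaint(frame, mask):
    """Stand-in for base inpainting: fill masked pixels (mask != 255)
    with the mean of the kept pixels."""
    kept = [p for p, m in zip(frame, mask) if m == 255]
    fill = sum(kept) / len(kept) if kept else 0.0
    return [p if m == 255 else fill for p, m in zip(frame, mask)]

def pass2_refine(frames):
    """Stand-in for warped-noise refinement: average each frame with
    its temporal neighbors to suppress flicker."""
    out = []
    for i, frame in enumerate(frames):
        prev = frames[max(i - 1, 0)]
        nxt = frames[min(i + 1, len(frames) - 1)]
        out.append([(a + b + c) / 3 for a, b, c in zip(prev, frame, nxt)])
    return out

def run_pipeline(frames, masks):
    """Run both passes in sequence, mirroring the two-stage design."""
    return pass2_refine([pass1_inpaint(f, m) for f, m in zip(frames, masks)])
```

The point of the structure, not the toy math: spatial fill first, then a second pass that looks across time.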

Smart Mask Generation

  • SAM2 for precise segmentation
  • Gemini VLM reasons about interaction regions
  • GUI editor for manual mask refinement
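The real pipeline uses SAM2 for the object mask and a Gemini VLM to reason about which surrounding regions are physically affected. As a crude geometric stand-in for that reasoning, the sketch below dilates the object mask to produce an "affected" ring and assembles a quadmask grid (the overlap level, 63, is omitted for brevity). Everything here is a hypothetical illustration, not VOID's actual mask generator.

```python
def dilate(mask, steps=1):
    """Grow a binary object mask by `steps` pixels (4-neighborhood).
    A geometric stand-in for the VLM's interaction-region reasoning."""
    h, w = len(mask), len(mask[0])
    for _ in range(steps):
        grown = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            grown[ny][nx] = 1
        mask = grown
    return mask

def build_quadmask(obj_mask, steps=1):
    """Map object pixels to 0, the dilated ring to 127 (affected),
    and everything else to 255 (keep)."""
    affected = dilate(obj_mask, steps)
    return [[0 if o else (127 if a else 255)
             for o, a in zip(orow, arow)]
            for orow, arow in zip(obj_mask, affected)]
```

For a single object pixel in a 3x3 grid, the four edge neighbors become "affected" and the corners stay "keep".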

Input Format

my-video/
β”œβ”€β”€ input_video.mp4
β”œβ”€β”€ quadmask_0.mp4
└── prompt.json  # {"bg": "A table with a cup on it."}
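A small helper can assemble that layout programmatically. Only the file names (`input_video.mp4`, `quadmask_0.mp4`, `prompt.json`) and the `{"bg": ...}` schema come from the article; the function name and copy-by-bytes approach are my own assumptions.

```python
import json
from pathlib import Path

def prepare_input_dir(root, video_path, quadmask_path, bg_prompt):
    """Lay out the directory structure described above:
    input_video.mp4, quadmask_0.mp4, and prompt.json with a "bg" key."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "input_video.mp4").write_bytes(Path(video_path).read_bytes())
    (root / "quadmask_0.mp4").write_bytes(Path(quadmask_path).read_bytes())
    (root / "prompt.json").write_text(
        json.dumps({"bg": bg_prompt}), encoding="utf-8")
    return root
```

Usage: `prepare_input_dir("my-video", "raw.mp4", "mask.mp4", "A table with a cup on it.")`.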

Training from Scratch

VOID ships with complete data generation pipelines:

  1. HUMOTO: Human-object physics using Blender
  2. Kubric: Object-only interactions

Generate paired counterfactual videos (with/without object) and train both passes.
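The idea of a paired counterfactual can be shown with a toy renderer: render the same scene twice, once with every object and once with one object removed, so the two clips differ only in that object and its effects. This is a conceptual sketch with invented names; the actual pipelines render physically simulated scenes in Blender (HUMOTO) and Kubric.

```python
def render_frame(background, objects):
    """Toy 1-D renderer: paint each object's (start, end, value) span
    over a copy of the background."""
    frame = background[:]
    for start, end, value in objects:
        for x in range(start, end):
            frame[x] = value
    return frame

def make_counterfactual_pair(background, objects, removed_index):
    """Render the scene with all objects, then with one object removed.
    The (with, without) frames form one supervised training pair."""
    with_obj = render_frame(background, objects)
    kept = [o for i, o in enumerate(objects) if i != removed_index]
    without_obj = render_frame(background, kept)
    return with_obj, without_obj
```

Because both renders share the same background and remaining objects, the model's target is exactly "the scene as if the object had never been there".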

Real-World Applications

  • VFX cleanup: Remove unwanted elements with realistic physics
  • Privacy protection: Anonymize people while preserving scene dynamics
  • Creative video editing: Rearrange scenes with natural motion

Get Started Today

  1. Clone: git clone https://github.com/Netflix/void-model
  2. Install: pip install -r requirements.txt
  3. Download models from Hugging Face
  4. Run Colab notebook

Paper: arXiv:2604.02296

VOID represents the cutting edge of video understanding – combining VLM reasoning, SAM2 segmentation, and diffusion models for unprecedented video manipulation capabilities.
