Helios: 14B Real-Time Video Gen at 19.5 FPS

Helios: Real-Time Long Video Generation Revolution

The 14B Model That Runs Faster Than 1.3B Models

Helios from PKU-YuanGroup redefines video generation. This 14B parameter model generates minute-scale, high-quality videos at 19.5 FPS on a single H100 GPU (~10 FPS on Ascend NPU) - without anti-drifting strategies or acceleration tricks.

Key breakthroughs:

- No self-forcing, error-banks, or keyframe sampling
- No KV-cache, causal masking, or quantization
- Fits 4×14B models in 80GB GPU memory
- Image-diffusion-scale training batch sizes

Three Model Variants Available

| Model | Quality | Speed | Scheduler |
|---|---|---|---|
| Helios-Base | Best | Standard | HeliosScheduler + CFG |
| Helios-Mid | Intermediate | Faster | CFG-Zero* |
| Helios-Distilled | Good | Fastest | HeliosDMDScheduler |

Day-0 Ecosystem Support

- ✅ HuggingFace Diffusers (Standard + Modular pipelines)
- ✅ SGLang-Diffusion (Native + Diffusers backend)
- ✅ vLLM-Omni (Full disaggregated serving)
- ✅ Ascend NPU (Huawei hardware)
- ✅ Cache-DiT (Full cache acceleration)
- ✅ Gradio Demo (AOTI compilation on Spaces)

Run on Consumer Hardware (~6GB VRAM)

CUDA_VISIBLE_DEVICES=0 python infer_helios.py \
--base_model_path "BestWishYsh/Helios-Distilled" \
--sample_type "t2v" \
--prompt "A vibrant tropical fish..." \
--num_frames 240 \
--enable_low_vram_mode \
--group_offloading_type "leaf_level"
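The ~6GB figure makes sense once you consider what leaf-level group offloading does: only one small group of weights is resident on the GPU at a time, with the rest streamed from host RAM. A back-of-envelope sketch, assuming bf16 weights (2 bytes/param) and an illustrative leaf size that is not a measured Helios figure:

```python
# Back-of-envelope VRAM estimate for why group offloading matters.
# Assumption (not from the Helios docs): bf16 weights at 2 bytes/param.
params = 14e9                          # 14B parameters
full_resident_gb = params * 2 / 1e9    # all weights on-GPU: ~28 GB
print(f"fully resident: ~{full_resident_gb:.0f} GB")

# With leaf-level group offloading, only one leaf module's weights live
# on the GPU at a time; the rest stay in host RAM and are streamed in.
# If the largest leaf holds ~3% of the parameters (illustrative only):
leaf_fraction = 0.03
offloaded_gb = full_resident_gb * leaf_fraction
print(f"leaf-resident: ~{offloaded_gb:.1f} GB (plus activations and VAE)")
```

The remaining VRAM budget then goes to activations, which is how a 14B model squeezes into a consumer card at the cost of host-to-device transfer time.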

Multi-GPU Context Parallelism

Supports Ulysses Attention, Ring Attention, and Unified Attention across 4+ GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 infer_helios.py \
--enable_parallelism --cp_backend "ulysses" \
--base_model_path "BestWishYsh/Helios-Base"
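The intuition behind Ulysses-style context parallelism: each rank starts with a shard of the sequence but all attention heads, then an all-to-all exchange leaves each rank with the full sequence but a shard of the heads, so attention runs locally over the complete context. A minimal single-process sketch (toy sizes, no real collectives; nothing here is Helios API):

```python
# Single-process simulation of the Ulysses all-to-all exchange.
# Before: rank r holds a contiguous token shard for every head.
# After:  rank r holds every token, but only its own head shard.
RANKS, SEQ, HEADS = 4, 8, 4   # toy sizes (assumed, not from Helios)

# rank r holds tokens [r*SEQ//RANKS, (r+1)*SEQ//RANKS) for all heads
shards = [[(t, h) for t in range(r * SEQ // RANKS, (r + 1) * SEQ // RANKS)
           for h in range(HEADS)] for r in range(RANKS)]

def all_to_all(shards):
    """Regroup (token, head) pairs: rank r keeps only its head slice,
    but gathers that slice from every rank's token shard."""
    heads_per_rank = HEADS // RANKS
    out = []
    for r in range(RANKS):
        my_heads = range(r * heads_per_rank, (r + 1) * heads_per_rank)
        out.append([(t, h) for shard in shards for (t, h) in shard
                    if h in my_heads])
    return out

exchanged = all_to_all(shards)
# every rank now sees the full sequence for its head shard
assert all(len({t for t, _ in shard}) == SEQ for shard in exchanged)
```

Because each rank now owns full-context queries, keys, and values for its heads, no attention math changes; only the data layout does, which is why this composes cleanly with standard attention kernels.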

Complete Installation (5 minutes)

git clone --depth=1 https://github.com/PKU-YuanGroup/Helios.git
cd Helios
conda create -n helios python=3.11.2
conda activate helios
# Install PyTorch first, then run the install script
bash install.sh
huggingface-cli download BestWishYsh/Helios-Distilled

Quick Start Scripts

cd scripts/inference
bash helios-distilled_t2v.sh  # Text-to-Video
bash helios-distilled_i2v.sh  # Image-to-Video
bash helios-distilled_v2v.sh  # Video-to-Video

Training from Scratch

Three-stage progressive pipeline with DDP or DeepSpeed support:

1. Stage-1: Architectural adaptation (Unified History Injection)
2. Stage-2: Pyramid Unified Predictor Corrector
3. Stage-3: Adversarial Hierarchical Distillation

bash scripts/training/train_ddp.sh

Performance Benchmarks

| Hardware | FPS | VRAM | Video Length |
|---|---|---|---|
| H100 | 19.5 | 24GB | 60s+ |
| RTX 4090 | 15+ | 6GB* | 30s+ |
| Ascend NPU | 10 | 24GB | 60s+ |

*With group offloading
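Note that these FPS figures are generation throughput, not playback rate: wall-clock time is simply frame count divided by generation FPS. Using the 240-frame example from the low-VRAM command above (and taking 15 FPS as the 4090's lower bound):

```python
# Wall-clock generation time = frames / generation FPS.
# 240 frames is the example count from the low-VRAM command above;
# FPS figures come from the benchmark table (15.0 = 4090 lower bound).
frames = 240
for hw, fps in [("H100", 19.5), ("RTX 4090", 15.0), ("Ascend NPU", 10.0)]:
    print(f"{hw}: {frames / fps:.1f} s to generate {frames} frames")
```

So an H100 turns that 240-frame clip around in roughly 12 seconds, which is what makes the "real-time" framing plausible for streaming-style use.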

Why Helios Matters

  1. Real engineering: No research tricks, production-ready
  2. Complete stack: Training + inference + deployment
  3. Hardware agnostic: NVIDIA + Huawei + consumer GPUs
  4. Developer-friendly: Multiple inference backends
  5. Scalable: Single GPU to multi-node clusters

Get started today: GitHub | arXiv | HF Space

⭐ Star the repo and join the real-time video generation revolution!
