Helios: 14B Real-Time Video Gen at 19.5 FPS

Helios: Real-Time Long Video Generation Revolution

The 14B Model That Runs Faster Than 1.3B Models

Helios from PKU-YuanGroup redefines video generation. This 14B parameter model generates minute-scale, high-quality videos at 19.5 FPS on a single H100 GPU (~10 FPS on Ascend NPU) - without anti-drifting strategies or acceleration tricks.

Key breakthroughs:

- No self-forcing, error-banks, or keyframe sampling
- No KV-cache, causal masking, or quantization
- Fits 4×14B models in 80GB GPU memory
- Image-diffusion-scale training batch sizes

Three Model Variants Available

| Model | Quality | Speed | Scheduler |
|---|---|---|---|
| Helios-Base | Best | Standard | HeliosScheduler + CFG |
| Helios-Mid | Intermediate | Faster | CFG-Zero* |
| Helios-Distilled | Good | Fastest | HeliosDMDScheduler |

Day-0 Ecosystem Support

- ✅ HuggingFace Diffusers (Standard + Modular pipelines)
- ✅ SGLang-Diffusion (Native + Diffusers backend)
- ✅ vLLM-Omni (Full disaggregated serving)
- ✅ Ascend NPU (Huawei hardware)
- ✅ Cache-DiT (Full cache acceleration)
- ✅ Gradio Demo (AOTI compilation on Spaces)

Run on Consumer Hardware (~6GB VRAM)

CUDA_VISIBLE_DEVICES=0 python infer_helios.py \
--base_model_path "BestWishYsh/Helios-Distilled" \
--sample_type "t2v" \
--prompt "A vibrant tropical fish..." \
--num_frames 240 \
--enable_low_vram_mode \
--group_offloading_type "leaf_level"
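The ~6GB figure makes sense once you consider what leaf-level group offloading does: only one small group of weights is resident on the GPU at a time, with the rest streamed from host RAM. A back-of-envelope sketch, assuming bf16 weights (2 bytes/param) and an illustrative leaf size that is not a measured Helios figure:

```python
# Back-of-envelope VRAM estimate for why group offloading matters.
# Assumption (not from the Helios docs): bf16 weights at 2 bytes/param.
params = 14e9                          # 14B parameters
full_resident_gb = params * 2 / 1e9    # all weights on-GPU: ~28 GB
print(f"fully resident: ~{full_resident_gb:.0f} GB")

# With leaf-level group offloading, only one leaf module's weights live
# on the GPU at a time; the rest stay in host RAM and are streamed in.
# If the largest leaf holds ~3% of the parameters (illustrative only):
leaf_fraction = 0.03
offloaded_gb = full_resident_gb * leaf_fraction
print(f"leaf-resident: ~{offloaded_gb:.1f} GB (plus activations and VAE)")
```

The remaining VRAM budget then goes to activations, which is how a 14B model squeezes into a consumer card at the cost of host-to-device transfer time.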

Multi-GPU Context Parallelism

Supports Ulysses Attention, Ring Attention, and Unified Attention across 4+ GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 infer_helios.py \
--enable_parallelism --cp_backend "ulysses" \
--base_model_path "BestWishYsh/Helios-Base"
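The intuition behind Ulysses-style context parallelism: each rank starts with a shard of the sequence but all attention heads, then an all-to-all exchange leaves each rank with the full sequence but a shard of the heads, so attention runs locally over the complete context. A minimal single-process sketch (toy sizes, no real collectives; nothing here is Helios API):

```python
# Single-process simulation of the Ulysses all-to-all exchange.
# Before: rank r holds a contiguous token shard for every head.
# After:  rank r holds every token, but only its own head shard.
RANKS, SEQ, HEADS = 4, 8, 4   # toy sizes (assumed, not from Helios)

# rank r holds tokens [r*SEQ//RANKS, (r+1)*SEQ//RANKS) for all heads
shards = [[(t, h) for t in range(r * SEQ // RANKS, (r + 1) * SEQ // RANKS)
           for h in range(HEADS)] for r in range(RANKS)]

def all_to_all(shards):
    """Regroup (token, head) pairs: rank r keeps only its head slice,
    but gathers that slice from every rank's token shard."""
    heads_per_rank = HEADS // RANKS
    out = []
    for r in range(RANKS):
        my_heads = range(r * heads_per_rank, (r + 1) * heads_per_rank)
        out.append([(t, h) for shard in shards for (t, h) in shard
                    if h in my_heads])
    return out

exchanged = all_to_all(shards)
# every rank now sees the full sequence for its head shard
assert all(len({t for t, _ in shard}) == SEQ for shard in exchanged)
```

Because each rank now owns full-context queries, keys, and values for its heads, no attention math changes; only the data layout does, which is why this composes cleanly with standard attention kernels.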

Complete Installation (5 minutes)

git clone --depth=1 https://github.com/PKU-YuanGroup/Helios.git
cd Helios
conda create -n helios python=3.11.2
conda activate helios
# Install PyTorch first, then run the install script
bash install.sh
huggingface-cli download BestWishYsh/Helios-Distilled

Quick Start Scripts

cd scripts/inference
bash helios-distilled_t2v.sh  # Text-to-Video
bash helios-distilled_i2v.sh  # Image-to-Video
bash helios-distilled_v2v.sh  # Video-to-Video

Training from Scratch

Three-stage progressive pipeline with DDP or DeepSpeed support:

1. Stage-1: Architectural adaptation (Unified History Injection)
2. Stage-2: Pyramid Unified Predictor Corrector
3. Stage-3: Adversarial Hierarchical Distillation

bash scripts/training/train_ddp.sh

Performance Benchmarks

| Hardware | FPS | VRAM | Video Length |
|---|---|---|---|
| H100 | 19.5 | 24GB | 60s+ |
| RTX 4090 | 15+ | 6GB* | 30s+ |
| Ascend NPU | 10 | 24GB | 60s+ |

*With group offloading
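Note that these FPS figures are generation throughput, not playback rate: wall-clock time is simply frame count divided by generation FPS. Using the 240-frame example from the low-VRAM command above (and taking 15 FPS as the 4090's lower bound):

```python
# Wall-clock generation time = frames / generation FPS.
# 240 frames is the example count from the low-VRAM command above;
# FPS figures come from the benchmark table (15.0 = 4090 lower bound).
frames = 240
for hw, fps in [("H100", 19.5), ("RTX 4090", 15.0), ("Ascend NPU", 10.0)]:
    print(f"{hw}: {frames / fps:.1f} s to generate {frames} frames")
```

So an H100 turns that 240-frame clip around in roughly 12 seconds, which is what makes the "real-time" framing plausible for streaming-style use.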

Why Helios Matters

  1. Real engineering: No research tricks, production-ready
  2. Complete stack: Training + inference + deployment
  3. Hardware agnostic: NVIDIA + Huawei + consumer GPUs
  4. Developer-friendly: Multiple inference backends
  5. Scalable: Single GPU to multi-node clusters

Get started today: GitHub | arXiv | HF Space

⭐ Star the repo and join the real-time video generation revolution!
