Helios: Real-Time Long Video Generation Revolution
The 14B Model That Runs Faster Than 1.3B Models
Helios from PKU-YuanGroup redefines video generation. This 14B parameter model generates minute-scale, high-quality videos at 19.5 FPS on a single H100 GPU (~10 FPS on Ascend NPU) - without anti-drifting strategies or acceleration tricks.
Key breakthroughs:

- No self-forcing, error banks, or keyframe sampling
- No KV-cache, causal masking, or quantization
- Fits 4×14B models in 80GB of GPU memory
- Image-diffusion-scale training batch sizes
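To put the 19.5 FPS figure in context, here is a back-of-the-envelope helper. Note the assumed playback frame rate is illustrative; the announcement does not state the output frame rate:

```python
def realtime_factor(gen_fps: float, playback_fps: float = 16.0) -> float:
    """Ratio of generation speed to playback speed (>1.0 = faster than real time).

    playback_fps is an ASSUMED output frame rate, not a number from the post.
    """
    return gen_fps / playback_fps

# At 19.5 generated frames/s, a 60 s clip at an assumed 16 fps (960 frames)
# takes 960 / 19.5 ≈ 49 s of wall-clock time to generate.
seconds_to_generate = (60 * 16) / 19.5
```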
Three Model Variants Available
| Model | Quality | Speed | Scheduler |
|---|---|---|---|
| Helios-Base | Best | Standard | HeliosScheduler + CFG |
| Helios-Mid | Intermediate | Faster | CFG-Zero* |
| Helios-Distilled | Good | Fastest | HeliosDMDScheduler |
Day-0 Ecosystem Support
- ✅ HuggingFace Diffusers (Standard + Modular pipelines)
- ✅ SGLang-Diffusion (Native + Diffusers backend)
- ✅ vLLM-Omni (Full disaggregated serving)
- ✅ Ascend NPU (Huawei hardware)
- ✅ Cache-DiT (Full cache acceleration)
- ✅ Gradio Demo (AOTI compilation on Spaces)
Run on Consumer Hardware (~6GB VRAM)
```bash
CUDA_VISIBLE_DEVICES=0 python infer_helios.py \
    --base_model_path "BestWishYsh/Helios-Distilled" \
    --sample_type "t2v" \
    --prompt "A vibrant tropical fish..." \
    --num_frames 240 \
    --enable_low_vram_mode \
    --group_offloading_type "leaf_level"
```
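Group offloading keeps only a small group of layers resident on the GPU at a time, swapping the rest out to CPU memory. A minimal conceptual sketch of that eviction policy, with devices simulated as strings; this is an illustration of the idea, not the Diffusers implementation:

```python
class GroupOffloader:
    """Keep at most `resident` layer groups on the accelerator at once.

    Simplified illustration of group offloading; a real implementation
    moves actual parameter tensors between CPU and GPU.
    """
    def __init__(self, groups, resident=1):
        self.location = {g: "cpu" for g in groups}  # every group starts on CPU
        self.resident = resident
        self.on_gpu = []

    def run(self, group):
        if self.location[group] != "gpu":
            if len(self.on_gpu) >= self.resident:
                evicted = self.on_gpu.pop(0)   # evict the oldest resident group
                self.location[evicted] = "cpu"
            self.location[group] = "gpu"
            self.on_gpu.append(group)
        return group  # placeholder for the group's actual forward pass

# Walk the groups in forward order; only one is ever on the "GPU" at a time.
off = GroupOffloader(["embed", "block0", "block1", "head"], resident=1)
for g in ["embed", "block0", "block1", "head"]:
    off.run(g)
```

This is why peak VRAM drops to roughly one group's worth of weights (~6GB here) at the cost of CPU↔GPU transfer time per step.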
Multi-GPU Context Parallelism
Supports Ulysses Attention, Ring Attention, and Unified Attention across 4+ GPUs:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 infer_helios.py \
    --enable_parallelism --cp_backend "ulysses" \
    --base_model_path "BestWishYsh/Helios-Base"
```
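Ulysses-style context parallelism shards the token sequence across ranks, then uses an all-to-all so each rank attends over the full sequence for a subset of attention heads. A toy illustration of the two layouts (pure Python, no communication; the token and head counts are made up, not Helios's):

```python
def shard_sequence(num_tokens, world_size):
    """Contiguous sequence shards, one per rank (layout before attention)."""
    base, rem = divmod(num_tokens, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)  # spread the remainder
        shards.append(range(start, start + size))
        start += size
    return shards

def heads_after_all_to_all(num_heads, world_size, rank):
    """After the all-to-all, each rank holds num_heads/world_size full-sequence heads."""
    per = num_heads // world_size
    return range(rank * per, (rank + 1) * per)

shards = shard_sequence(240, 4)                 # 240 tokens -> 60 per rank
heads = heads_after_all_to_all(16, 4, rank=1)   # rank 1 owns heads 4..7
```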
Complete Installation (5 minutes)
```bash
git clone --depth=1 https://github.com/PKU-YuanGroup/Helios.git
cd Helios
conda create -n helios python=3.11.2
conda activate helios
# install PyTorch for your platform first, then run the repo's setup script
bash install.sh
huggingface-cli download BestWishYsh/Helios-Distilled
```
Quick Start Scripts
```bash
cd scripts/inference
bash helios-distilled_t2v.sh   # Text-to-Video
bash helios-distilled_i2v.sh   # Image-to-Video
bash helios-distilled_v2v.sh   # Video-to-Video
```
Training from Scratch
Three-stage progressive pipeline with DDP or DeepSpeed support:

1. Stage-1: Architectural adaptation (Unified History Injection)
2. Stage-2: Pyramid Unified Predictor Corrector
3. Stage-3: Adversarial Hierarchical Distillation
```bash
bash scripts/training/train_ddp.sh
```
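In a progressive pipeline like this, each stage resumes from the previous stage's checkpoint. A hypothetical stage-schedule sketch; the stage names come from the list above, but the structure and field names are illustrative, not the repo's actual config:

```python
# Hypothetical schedule mirroring the three-stage pipeline described above.
TRAINING_STAGES = [
    "stage1_unified_history_injection",
    "stage2_pyramid_unified_predictor_corrector",
    "stage3_adversarial_hierarchical_distillation",
]

def next_stage(current):
    """Return the stage to run after `current`, or None when training is done."""
    i = TRAINING_STAGES.index(current)
    return TRAINING_STAGES[i + 1] if i + 1 < len(TRAINING_STAGES) else None
```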
Performance Benchmarks
| Hardware | FPS | VRAM | Video Length |
|---|---|---|---|
| H100 | 19.5 | 24GB | 60s+ |
| RTX 4090 | 15+ | 6GB* | 30s+ |
| Ascend NPU | 10 | 24GB | 60s+ |
*With group offloading
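The FPS figures above translate directly into wall-clock generation time for a given clip. A trivial helper, assuming the reported FPS is sustained throughput rather than a peak:

```python
def generation_seconds(num_frames: int, fps: float) -> float:
    """Wall-clock seconds to generate num_frames at a sustained generation rate."""
    return num_frames / fps

# 240 frames (the --num_frames example above) on two of the tiers:
h100 = generation_seconds(240, 19.5)   # ~12.3 s
npu = generation_seconds(240, 10.0)    # 24.0 s
```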
Why Helios Matters
- Real engineering: No research tricks, production-ready
- Complete stack: Training + inference + deployment
- Hardware agnostic: NVIDIA + Huawei + consumer GPUs
- Developer-friendly: Multiple inference backends
- Scalable: Single GPU to multi-node clusters
Get started today: GitHub | arXiv | HF Space
⭐ Star the repo and join the real-time video generation revolution!