Helios: 14B Real-Time Video Generation at 19.5 FPS

March 25, 2026

Category: Practical Open-Source Projects

Tags:

Open Source Real-time AI HuggingFace Video Generation diffusion-models

Helios: A Revolution in Real-Time Long Video Generation

A 14B Model That Runs Faster Than 1.3B Models

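Helios, from PKU-YuanGroup, redefines video generation. This 14B-parameter model generates minute-long, high-quality video at 19.5 FPS on a single H100 GPU (about 10 FPS on an Ascend NPU), without anti-drifting strategies or acceleration tricks.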

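Key breakthroughs:
- No self-forcing, error banks, or keyframe sampling
- No KV cache, causal masking, or quantization
- Up to four 14B models fit within 80 GB of GPU memory
- Training batch sizes on the scale of image diffusion models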

Three Model Variants

Model	Quality	Speed	Scheduler
Helios-Base	Best	Standard	HeliosScheduler + CFG
Helios-Mid	Medium	Faster	CFG-Zero*
Helios-Distilled	Good	Fastest	HeliosDMDScheduler

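Ecosystem Support Out of the Box

✅ HuggingFace Diffusers (standard and modular pipelines)
✅ SGLang-Diffusion (native and Diffusers backends)
✅ vLLM-Omni (fully disaggregated serving)
✅ Ascend NPU (Huawei hardware)
✅ Cache-DiT (full cache acceleration)
✅ Gradio Demo (AOTI-compiled on Spaces)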

Running on Consumer Hardware (~6 GB VRAM)

CUDA_VISIBLE_DEVICES=0 python infer_helios.py \
--base_model_path "BestWishYsh/Helios-Distilled" \
--sample_type "t2v" \
--prompt "A vibrant tropical fish..." \
--num_frames 240 \
--enable_low_vram_mode \
--group_offloading_type "leaf_level"
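
If you want to launch the same low-VRAM run from a Python script (for example, to loop over several prompts), the command above can be assembled as an argument list. This is a minimal sketch: the helper name `build_low_vram_command` is hypothetical, and it only reuses the flags shown in the command above, with no other options assumed.

```python
import shlex


def build_low_vram_command(prompt: str, num_frames: int = 240) -> list[str]:
    """Assemble the argv list for the low-VRAM Helios-Distilled run shown above.

    Hypothetical helper: it only mirrors the flags from the shell command
    and does not validate them against infer_helios.py.
    """
    return [
        "python", "infer_helios.py",
        "--base_model_path", "BestWishYsh/Helios-Distilled",
        "--sample_type", "t2v",
        "--prompt", prompt,
        "--num_frames", str(num_frames),
        "--enable_low_vram_mode",
        "--group_offloading_type", "leaf_level",
    ]


if __name__ == "__main__":
    cmd = build_low_vram_command("A vibrant tropical fish...")
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` avoids shell quoting problems in prompts; `CUDA_VISIBLE_DEVICES` would go in that call's `env` argument.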

Multi-GPU Context Parallelism

Supports Ulysses Attention, Ring Attention, and Unified Attention across 4+ GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 infer_helios.py \
--enable_parallelism --cp_backend "ulysses" \
--base_model_path "BestWishYsh/Helios-Base"
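
To give a feel for what the `ulysses` backend does conceptually, here is a toy simulation of the Ulysses-style layout change in plain Python. It is not Helios code: the function names are hypothetical, and the "tensors" are just lists of (token, head) labels. The idea is that each rank starts with a slice of the sequence and all attention heads, and an all-to-all exchange gives each rank the full sequence for a slice of the heads, so attention can be computed locally per head.

```python
def shard_by_sequence(num_tokens: int, num_heads: int, world_size: int):
    """Before attention: rank r holds a contiguous token slice, with every head."""
    assert num_tokens % world_size == 0, "tokens must divide evenly across ranks"
    per_rank = num_tokens // world_size
    return [
        [(t, h)
         for t in range(r * per_rank, (r + 1) * per_rank)
         for h in range(num_heads)]
        for r in range(world_size)
    ]


def all_to_all_seq_to_head(shards, num_heads: int):
    """Simulated all-to-all: afterwards rank r holds all tokens for its head slice."""
    world_size = len(shards)
    assert num_heads % world_size == 0, "heads must divide evenly across ranks"
    heads_per_rank = num_heads // world_size
    out = [[] for _ in range(world_size)]
    for shard in shards:
        for token, head in shard:
            # Each (token, head) entry is sent to the rank that owns that head.
            out[head // heads_per_rank].append((token, head))
    return [sorted(entries) for entries in out]


if __name__ == "__main__":
    seq_shards = shard_by_sequence(num_tokens=8, num_heads=4, world_size=4)
    head_shards = all_to_all_seq_to_head(seq_shards, num_heads=4)
    for rank, entries in enumerate(head_shards):
        tokens = sorted({t for t, _ in entries})
        heads = sorted({h for _, h in entries})
        print(f"rank {rank}: tokens {tokens}, heads {heads}")
```

A second all-to-all after attention restores the sequence-sharded layout. Because heads are split across ranks, this scheme needs the head count to be divisible by the number of GPUs.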

Full Installation (5 Minutes)

git clone --depth=1 https://github.com/PKU-YuanGroup/Helios.git
cd Helios
conda create -n helios python=3.11.2
conda activate helios
# Install PyTorch, then run: bash install.sh
huggingface-cli download BestWishYsh/Helios-Distilled

Quick-Start Scripts

cd scripts/inference
bash helios-distilled_t2v.sh  # text-to-video
bash helios-distilled_i2v.sh  # image-to-video
bash helios-distilled_v2v.sh  # video-to-video

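Training from Scratch

A three-stage progressive pipeline, with DDP or DeepSpeed support:
1. Stage 1: Architecture adaptation (Unified History Injection)
2. Stage 2: Pyramid Unified Predictor Corrector
3. Stage 3: Adversarial Hierarchical Distillation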

bash scripts/training/train_ddp.sh

Performance Benchmarks

Hardware	FPS	VRAM	Video Length
H100	19.5	24GB	60s+
RTX 4090	15+	6GB*	30s+
Ascend NPU	10	24GB	60s+

*With group offloading
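
To turn the FPS column into wall-clock time, divide the number of frames by the throughput. The sketch below does this for the 240-frame setting used in the low-VRAM example. It assumes throughput is steady and ignores model loading and other startup costs; the RTX 4090 row is listed as "15+", so 15.0 is used as a lower bound and the resulting time is an upper bound.

```python
# FPS values copied from the benchmark table; 15.0 is the "15+" lower bound.
THROUGHPUT_FPS = {"H100": 19.5, "RTX 4090": 15.0, "Ascend NPU": 10.0}


def generation_seconds(num_frames: int, fps: float) -> float:
    """Estimated wall-clock seconds to generate num_frames at a steady fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    for hardware, fps in THROUGHPUT_FPS.items():
        print(f"{hardware}: {generation_seconds(240, fps):.1f} s for 240 frames")
```

On these numbers, 240 frames take roughly 12 seconds on an H100 and 24 seconds on an Ascend NPU.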

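Why Helios Matters

Real engineering: no research tricks; it works out of the box
Complete stack: training, inference, and deployment
Hardware-agnostic: NVIDIA, Huawei, and consumer GPUs
Developer-friendly: multiple inference backends
Scalable: from a single GPU to multi-node clusters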

Get started: GitHub | arXiv | HF Space

⭐ Star the repository and join the real-time video generation revolution!

Original article: view the original
