SongGeneration – LeVo Open‑Source Music Model (NeurIPS 2025)


If you’ve ever imagined an AI that can write complete songs with lyrics, melodies, and accompaniment on demand, SongGeneration is the tool for you. Developed as the official codebase for the NeurIPS 2025 paper LeVo: High‑Quality Song Generation with Multi‑Preference Alignment, this repository turns a cutting‑edge research model into a developer‑friendly, open‑source module that can generate full‑length tracks in just a few seconds.

🎶 What is SongGeneration?

SongGeneration is a language‑model‑based architecture that blends two core components:

  1. LeLM – a transformer‑style language model that jointly models mixed tokens (vocals + background music) and dual‑track tokens (separated vocal and accompaniment), allowing the model to align lyrics with harmonic and rhythmic structure.
  2. Music Codec – an efficient neural codec that decodes the dual‑track tokens back into high‑fidelity PCM audio.

The combination yields a system capable of:

  • Producing songs up to 4 minutes and 30 seconds in length.
  • Supporting multiple languages (Chinese, English, Spanish, Japanese, etc.) depending on the checkpoint.
  • Generating multi‑track outputs (vocals, accompaniment, or both combined) and even pure vocals or pure music.
  • Using prompt‑audio to steer the style, genre, or timbre.
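
Under the hood, generation is a two‑stage pipeline: LeLM first emits the token sequences, and the Music Codec then renders them to audio. The stubbed Python sketch below illustrates the data flow only; every name, shape, and rate in it is hypothetical, not the repository's actual API.

# Illustrative stubs of the two-stage LeVo-style pipeline (hypothetical names).
import numpy as np

def lelm_generate(lyrics: str) -> np.ndarray:
    """Stage 1 (stub): the LM autoregressively emits dual-track codec
    tokens (one vocal stream, one accompaniment stream)."""
    n_streams, n_steps = 2, 1500            # dummy track count / length
    return np.random.randint(0, 1024, size=(n_streams, n_steps))

def codec_decode(tokens: np.ndarray, sample_rate: int = 48_000) -> np.ndarray:
    """Stage 2 (stub): the neural codec decodes token streams to PCM."""
    duration_s = tokens.shape[1] // 25      # dummy rate: 25 tokens/second
    return np.zeros(sample_rate * duration_s, dtype=np.float32)

audio = codec_decode(lelm_generate("[verse] ... [chorus] ..."))
print(audio.shape)  # 1-D PCM buffer, ready to be written out as WAV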

🏗️ Repository Structure

SongGeneration/
├─ conf/                # Default configuration files
├─ img/                 # Images for README and docs
├─ sample/              # Sample input JSONL files & outputs
├─ tools/
│  └─ gradio/           # Gradio UI scripts
├─ generate.py          # Core inference entry point
├─ generate.sh          # Shell‑script wrapper for inference
├─ LICENSE              # License terms
├─ README.md            # Full documentation
├─ requirements.txt     # Python dependencies
└─ ...

The README is your entry point. It covers everything from installation to advanced usage (e.g., low‑memory inference, Flash Attention toggles, and Docker deployment).

📦 Installing and Running

1. Prerequisites

  • Python ≥ 3.8.12
  • CUDA ≥ 11.8 (for GPU acceleration; optional CPU fallback)
  • GPU with at least 10 GB of free memory. If your GPU has less, see the low‑memory section.

2. Install the Code

# Clone the repo
git clone https://github.com/tencent-ailab/SongGeneration.git
cd SongGeneration

# Install dependencies
pip install -r requirements.txt
pip install -r requirements_nodeps.txt --no-deps

# Optional: install Flash Attention (improves speed)
# pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
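
Before fetching checkpoints, it is worth sanity‑checking the environment. A minimal check using standard PyTorch calls (the flash_attn import succeeds only if the optional wheel above was installed):

# Quick environment check: verify CUDA and the optional Flash Attention wheel.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn missing; pass --not_use_flash_attn at inference time")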

3. Download Model Checkpoints

All checkpoints are hosted on Hugging Face. For example, to download the SongGeneration‑base‑full checkpoint:

huggingface-cli download lglg666/SongGeneration-base-full --local-dir ./songgeneration_base_full

If you prefer the larger SongGeneration‑large model, adjust the path accordingly. The model folder names must match the checkpoint name exactly.
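
If you would rather script the download, the same checkpoint can be fetched with huggingface_hub, the library that powers huggingface-cli:

# Programmatic equivalent of the huggingface-cli command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lglg666/SongGeneration-base-full",
    local_dir="./songgeneration_base_full",
)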

4. Run Inference

You can launch the Gradio interface or run inference from the command line.

a) Gradio UI

sh tools/gradio/run.sh <ckpt_path>
# example:
# sh tools/gradio/run.sh ./songgeneration_base_full

This will start a local web server (default http://localhost:7860) where you can:

  • Upload a JSONL file or type in lyrics directly.
  • Choose a model version.
  • Toggle options like Separate Tracks, Pure Vocals, or Pure Music.
  • Preview and download the generated audio clips.
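
The running UI can also be driven programmatically with the gradio_client package. The endpoint and argument names depend on how the app is defined, so inspect them first; the predict call below is a placeholder to fill in from that output:

from gradio_client import Client

client = Client("http://localhost:7860")
client.view_api()  # prints the endpoints and arguments this app exposes
# result = client.predict(..., api_name="/...")  # fill in per view_api() output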

b) CLI Inference

sh generate.sh <ckpt_path> <input_jsonl> <output_dir> [options]
# sample usage
sh generate.sh ./songgeneration_base_full sample/lyrics.jsonl sample/output

Options include:

  • --low_mem              Use low‑memory inference (reduces VRAM requirements).
  • --not_use_flash_attn   Skip Flash Attention if not available.
  • --bgm                  Generate only background music.
  • --vocal                Generate only vocals (a cappella style).
  • --separate             Generate separated vocal and accompaniment tracks.
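
For batch jobs, the same CLI is easy to wrap from Python with subprocess; a small sketch reusing the flags above (paths are the sample ones from earlier):

import subprocess

def generate_song(ckpt: str, jsonl_in: str, out_dir: str, *flags: str) -> None:
    """Thin wrapper around the repository's generate.sh script."""
    subprocess.run(["sh", "generate.sh", ckpt, jsonl_in, out_dir, *flags],
                   check=True)

generate_song("./songgeneration_base_full", "sample/lyrics.jsonl",
              "sample/output", "--low_mem")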

5. Prompt‑Audio & Descriptions

You can steer the generation with a short (10‑second) prompt audio or by specifying description tags. The repository provides a table of supported tags for gender, timbre, genre, emotion, instrument, and BPM. For instance:

{
  "idx": "song_01",
  "gt_lyric": "[intro-short] ; [verse] … ; [chorus] … ; [outro-medium]",
  "descriptions": "female, bright, pop, energetic, guitar, drums"
}

The model will produce a pop track (roughly two minutes for a structure like this) with matching vocal timbre and instrumentation.
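
Because the input is plain JSONL (one JSON object per line), batches are easy to assemble in Python. The field names mirror the example above; the section tags and description vocabulary come from the repository's supported‑tag table:

import json

songs = [
    {"idx": "song_01",
     "gt_lyric": "[intro-short] ; [verse] ... ; [chorus] ... ; [outro-medium]",
     "descriptions": "female, bright, pop, energetic, guitar, drums"},
]
with open("sample/lyrics.jsonl", "w", encoding="utf-8") as f:
    for song in songs:
        f.write(json.dumps(song, ensure_ascii=False) + "\n")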

🚀 Latest Features (as of Oct 2025)

Date         Update
2025‑10‑16   Demo web page now supports full‑length song generation (up to 4 min 30 s)
2025‑10‑15   Codebase updated for faster inference and a new model version
2025‑10‑14   Released the SongGeneration‑large checkpoint
2025‑10‑13   Released the full‑length SongGeneration‑base‑full checkpoint with evaluation data
2025‑10‑12   Released the English‑enhanced SongGeneration‑base‑new checkpoint
2025‑09‑23   Released the data‑processing pipeline – automatic beat tracking and lyric alignment
2025‑07‑25   Model can now run with as little as 10 GB of GPU memory

📚 How to Use in Your Projects

1️⃣ Build a Web App

Use the Gradio UI as a base and embed it in your own Flask or FastAPI server. The generate.py script can be invoked as a subprocess (or wrapped behind a small HTTP endpoint), so any language can drive it.
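
One simple pattern is to keep the CLI untouched and put a thin HTTP layer in front of it. A hypothetical FastAPI sketch (the endpoint name, paths, and checkpoint location are illustrative, not part of the repo):

import subprocess
import uuid

from fastapi import FastAPI

app = FastAPI()
CKPT = "./songgeneration_base_full"  # assumed checkpoint path from the install step

@app.post("/songs")
def create_song(jsonl_path: str) -> dict:
    """Hypothetical endpoint: run the repo's CLI, return the output directory."""
    out_dir = f"output/{uuid.uuid4().hex}"
    subprocess.run(["sh", "generate.sh", CKPT, jsonl_path, out_dir], check=True)
    return {"output_dir": out_dir}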

2️⃣ Create a Mobile Demo

Export the trained checkpoints to ONNX or TorchScript and deploy on iOS with Core ML or on Android with TensorFlow Lite. The low‑memory inference flag is especially helpful on mobile GPUs.

3️⃣ Generate Music for Video Games

Feed structured lyric scripts (e.g., JSONL files) into the system. The output audio (standard PCM, e.g., WAV/FLAC) can be imported directly into game audio engines like FMOD or Wwise.

4️⃣ Experiment With Retrieval‑Based Prompting

The repo includes a small toolkit for auto‑selecting prompt audio from a library: by specifying auto_prompt_audio_type you can tell the model to generate a Jazz or Rock style piece without uploading an audio file.
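
A JSONL entry using this key might look like the following (the accepted type strings are listed in the README; "Jazz" here is just one of the examples above):

{
  "idx": "song_02",
  "gt_lyric": "[intro-short] ; [verse] … ; [outro-short]",
  "auto_prompt_audio_type": "Jazz"
}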

🔗 Resources & Community

  • Hugging Face Space – Interactive demo: https://huggingface.co/spaces/lglg666/song-generation-levo
  • Paper – LeVo: High‑Quality Song Generation with Multi‑Preference Alignment (arXiv:2506.07520)
  • Docker Image – Quick start: docker pull juhayna/song-generation-levo:hf0613
  • GitHub Discussions & Issues – Ask questions, share samples, report bugs.

⚙️ Evaluation Highlights

The repository lists comprehensive evaluation metrics (e.g., PER, Audiobox Aesthetics, SongEval). Compared with existing systems such as Suno and Mureka, SongGeneration reports the lowest PER and the highest aesthetic scores, especially in the SongGeneration‑large variant.

🎓 Final Thoughts

SongGeneration bridges the gap between research and production. Its modular design, extensive documentation, and large‑scale pretrained checkpoints make it a strong candidate for anyone building AI‑powered music generation tools, whether for a commercial platform, a personal project, or academic research. With an active development community on GitHub and ongoing support on Hugging Face, you can stay up to date and pick up new model variants as soon as they're released.

Give it a try today—explore the repo, run the demo, and compose your own AI‑crafted songs with a single command!
