ACE-Step 1.5 – The Open‑Source Music Generation Model That Beats Commercial Alternatives

What is ACE‑Step 1.5?

ACE‑Step is a modular, hybrid‑architecture music foundation model released under the MIT license. It pairs a Language Model (LM), which acts as an omni‑capable planner that turns simple prompts into song blueprints, with a Diffusion Transformer (DiT) that generates the raw audio. The LM supplies lyrics, structure, style tokens, and guiding text, and can even perform chain‑of‑thought reasoning to keep the music aligned with user intent.
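
Conceptually, the hand‑off between the two stages looks like the minimal Python sketch below. The function names and blueprint fields are illustrative placeholders, not ACE‑Step's actual internals:

def lm_plan(prompt: str) -> dict:
    # Stage 1: the LM expands a short prompt into a structured song blueprint.
    # The keys shown here are assumptions for illustration only.
    return {
        "lyrics": "...",
        "structure": ["intro", "verse", "chorus", "outro"],
        "style_tokens": ["synth-pop", "female vocal", "120 bpm"],
    }

def dit_generate(blueprint: dict, duration_s: int = 300) -> bytes:
    # Stage 2: the DiT renders raw audio conditioned on the blueprint.
    return b"\x00" * duration_s  # placeholder for the rendered waveform bytes

audio = dit_generate(lm_plan("an upbeat synth-pop song about summer"))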

The result? Commercial‑grade output (often beating Suno v4.5 and approaching Suno v5) while staying lightweight: under 4 GB of VRAM is enough to generate a full 5‑minute track, in under 10 seconds on an RTX 3090 or about 2 seconds on an A100. A pure CPU build is also possible, albeit slower.

Core Feature Highlights

  • Fast Generation – ~2 s per song on an A100, ~10 s on an RTX 3090.
  • High‑Quality, Multi‑Lang Lyrics – Supports 50+ languages for lyric input.
  • Rich Style Control – 1,000+ instruments and fine‑grained timbre descriptors.
  • Zero‑Latency Editing – Cover generation, repaint, vocal‑to‑BGM, track separation, multi‑track layering, etc.
  • Lightweight Personalisation – Fine‑tune a LoRA with just a handful of songs (≈8 songs, ~1 hour on a 3090). Works on 12–16 GB VRAM.
  • Model Zoo – DiT and LM variants (0.6 B / 1.7 B / 4 B), turbo, shift, continuous, SFT, etc.
  • Easy Deployment – Gradio UI, REST API, single‑line uv commands, Windows portable bundle.

Getting Started

1. Clone the Repository

git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5

If you prefer Python directly, ensure you have Python 3.11 and uv (the modern Python package manager). The Windows bundle comes with python_embeded for quick launch.

2. Install Dependencies

uv sync

For the Windows portable package, just double‑click start_gradio_ui.bat and the dependencies install automatically.

Tip – On Linux / macOS you may need to install uv first:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then run uv sync.

3. Download the Model Checkpoints

Models download automatically the first time you run the UI or the API. If you want to pre‑download:

uv run acestep-download --all

This pulls in everything: DiT, LM (1.7 B and 0.6 B), VAE, embedding, etc. Optional variants like acestep-v15-turbo-shift3 are also available.

4. Launch the Gradio UI

uv run acestep

or, from the Windows bundle:

start start_gradio_ui.bat

Open http://localhost:7860 in your browser. The UI is multilingual; choose your language at startup.

5. Run the REST API (Optional)

uv run acestep-api

This starts a server at http://localhost:8001. Use curl or Postman to hit /v1/generate.
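
A minimal Python client sketch for that endpoint. The payload field names, the Authorization header name, and the response format are assumptions; check the repo's API documentation for the actual schema:

import requests

payload = {
    "prompt": "lo-fi hip hop, mellow piano, 90 bpm",  # assumed field name
    "duration": 120,                                  # seconds; assumed field name
}

resp = requests.post(
    "http://localhost:8001/v1/generate",
    json=payload,
    # Only needed if the server was started with --api-key; header name assumed.
    headers={"Authorization": "Bearer secret123"},
    timeout=300,
)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)  # assuming the endpoint returns audio bytes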

6. Quick‑Start Commands (All Platforms)

  • Gradio (listen on all interfaces, shareable) – uv run acestep --serve-name 0.0.0.0 --share
  • API (with key) – uv run acestep-api --api-key secret123
  • Pre‑initialize LM – uv run acestep --init_service true --lm_model_path acestep-5Hz-lm-1.7B
  • Use ModelScope download source – uv run acestep --download-source modelscope

For script‑based Windows users, edit start_gradio_ui.bat or start_api_server.bat to adjust LANGUAGE, DOWNLOAD_SOURCE, or CONFIG_PATH.

Customising ACE‑Step

1. Selecting the Right LM/DiT

  • ≤ 6 GB – no LM (DiT only); off‑loads to CPU by default
  • 6–12 GB – acestep-5Hz-lm-0.6B; lightweight, good quality
  • 12–16 GB – acestep-5Hz-lm-1.7B; better audio understanding
  • ≥ 16 GB – acestep-5Hz-lm-4B; highest fidelity

Set the LM path in the UI or via --lm_model_path.

2. LoRA Training

  1. Prepare data – 8–12 short songs in WAV/MP3 format.
  2. Launch LoRA UI – Gradio includes a “LoRA” tab.
  3. Configure – Choose dataset folder, set learning rate, epochs.
  4. Train – Click “Train Now”. Training on a 3090 takes ~1 hr.
  5. Save – The resulting .pt file can be loaded back into ACE‑Step for inference (see the inspection sketch below).
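
If you want to sanity‑check the saved adapter outside the UI, a plain PyTorch inspection works. The internal layout of the .pt file is an assumption and may differ in ACE‑Step:

import torch

# Load the trained LoRA checkpoint on CPU and list its tensors.
# Assumes the .pt file holds a flat dict of low-rank weight tensors.
state = torch.load("my_lora.pt", map_location="cpu")
for name, tensor in state.items():
    if hasattr(tensor, "shape"):
        print(name, tuple(tensor.shape))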

3. Advanced Editing

  • Repaint & Edit – Select a segment and click “Edit”; the model regenerates that slice.
  • Cover Generation – Upload an audio file, choose a target style, and generate.
  • Track Separation – Separate into stems (vocal, drums, bass, etc.) using built‑in functions.
  • Vocal‑to‑BGM – Use the vocal track as conditioning to produce accompaniment.

FAQ & Troubleshooting

  • “CUDA error: out of memory” – Reduce --max_length or switch to the 0.6 B LM variant.
  • Models fail to download – Ensure uv is on your PATH and your network isn’t blocked. Try --download-source huggingface.
  • Gradio UI not loading – Check whether port 7860 is free; try --port 7861.
  • API returns 401 – Provide the correct --api-key on the command line or set it in the .env file.
  • Windows portable bundle not working – Verify that python_embeded\requirements.txt is present and run uv install.

Why ACE‑Step Matters

  • No cloud required – You keep every part of the pipeline local, preserving privacy and eliminating bandwidth costs.
  • Open‑source transparency – Full access to the code and model weights lets developers audit, fork, and extend the work.
  • Rapid prototyping – The Gradio interface lets you iterate on prompts and tweaks without writing code.
  • Community‑driven – Contributions are welcome; the repo already boasts 12 contributors and a growing community of musicians and engineers.

Conclusion

ACE‑Step 1.5 is a game‑changer for anyone looking to generate high‑fidelity music on modest hardware. Its hybrid LM‑DiT design, lightning‑fast inference, and extensive control suite make it a top choice for artists, content creators, and research labs alike. Grab the repo, follow the simple install guide, and start crafting your own custom music right from your laptop today.


References:
  • GitHub repository: https://github.com/ace-step/ACE-Step-1.5
  • Hugging Face Space: https://huggingface.co/spaces/ace-step/ace-step-1.5
