ACE-Step 1.5 – The Open‑Source Music Generation Model That Beats Commercial Alternatives
What is ACE‑Step 1.5?
ACE‑Step 1.5 is a modular, hybrid‑architecture music foundation model released under the MIT license. It pairs a Language Model (LM), used as an omni‑capable planner that converts simple prompts into song blueprints, with a Diffusion Transformer (DiT) that generates the raw audio. The LM supplies lyrics, structure, style tokens, and guiding text, and even performs chain‑of‑thought reasoning to keep the music aligned with user intent.
The result? Commercial‑grade output (often beating Suno v4.5, approaching Suno v5) while staying lightweight: under 4 GB of VRAM is enough to generate a full 5‑minute track in under 10 seconds on an RTX 3090, or in roughly 2 seconds on an A100. A pure CPU build is also possible, albeit slower.
Core Feature Highlights
- Fast Generation – ~2 s per song on an A100, ~10 s on an RTX 3090.
- High‑Quality, Multi‑Lang Lyrics – Supports 50+ languages for lyric input.
- Rich Style Control – 1,000+ instruments and fine‑grained timbre descriptors.
- Zero‑Latency Editing – Cover generation, repaint, vocal‑to‑BGM, track separation, multi‑track layering, etc.
- Lightweight Personalisation – Fine‑tune a LoRA with just a handful of songs (≈8 songs, ~1 hour on a 3090). Works on 12–16 GB VRAM.
- Model Zoo – DiT and LM variants (0.6 B / 1.7 B / 4 B), turbo, shift, continuous, SFT, etc.
- Easy Deployment – Gradio UI, REST API, single‑line uv commands, Windows portable bundle.
Getting Started
1. Clone the Repository
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
If you prefer Python directly, ensure you have Python 3.11 and uv (the modern Python package manager). The Windows bundle comes with python_embeded for quick launch.
2. Install Dependencies
uv sync
For the Windows portable package, just double‑click start_gradio_ui.bat; dependencies install automatically.
Tip – On Linux / macOS you may need to install uv first:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then run uv sync.
3. Download the Model Checkpoints
Models download automatically the first time you run the UI or the API. If you want to pre‑download:
uv run acestep-download --all
This pulls in everything: DiT, LM (1.7 B and 0.6 B), VAE, embedding, etc. Optional variants like acestep-v15-turbo-shift3 are also available.
4. Launch the Gradio UI
uv run acestep
or, from the Windows bundle:
start start_gradio_ui.bat
Open http://localhost:7860 in your browser. The UI is multilingual; choose your language at startup.
5. Run the REST API (Optional)
uv run acestep-api
This starts a server at http://localhost:8001. Use curl or Postman to hit /v1/generate.
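As a sketch of how a client might talk to the local server, the snippet below builds a POST request for /v1/generate using only the standard library. The endpoint URL and the example API key come from this guide, but the payload field names (prompt, duration, lyrics) and the Bearer-token header are assumptions for illustration; check the repository's API documentation for the real schema.

```python
import json
import urllib.request

API_URL = "http://localhost:8001/v1/generate"   # endpoint from the section above
API_KEY = "secret123"                           # matches --api-key if you set one

def build_request(prompt, duration_s=60, lyrics=None):
    """Build a POST request for /v1/generate (field names are assumptions)."""
    payload = {"prompt": prompt, "duration": duration_s}
    if lyrics is not None:
        payload["lyrics"] = lyrics
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )

req = build_request("lo-fi hip hop with mellow piano", duration_s=120)
# urllib.request.urlopen(req)  # uncomment once the server is running
```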
6. Quick‑Start Commands (All Platforms)
| Function | Command |
|---|---|
| Gradio | uv run acestep --serve-name 0.0.0.0 --share |
| API (with key) | uv run acestep-api --api-key secret123 |
| Pre‑initialize LM | uv run acestep --init_service true --lm_model_path acestep-5Hz-lm-1.7B |
| Use ModelScope download source | uv run acestep --download-source modelscope |
For script‑based Windows users, edit start_gradio_ui.bat or start_api_server.bat to adjust LANGUAGE, DOWNLOAD_SOURCE, or CONFIG_PATH.
Customising ACE‑Step
1. Selecting the Right LM/DiT
| GPU VRAM | Recommended LM | Notes |
|---|---|---|
| ≤ 6 GB | None (DiT only) | Off‑load to CPU by default |
| 6–12 GB | acestep-5Hz-lm-0.6B | Lightweight, good quality |
| 12–16 GB | acestep-5Hz-lm-1.7B | Better audio understanding |
| ≥ 16 GB | acestep-5Hz-lm-4B | Highest fidelity |
Set the LM path in the UI or via --lm_model_path.
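If you script deployments across machines, the table above can be encoded as a small helper. This is a sketch: the model names come from the table, but the handling of the exact 6/12/16 GB boundaries is a judgment call.

```python
def recommend_lm(vram_gb):
    """Map available GPU VRAM (in GB) to the LM variant recommended in the table.
    Returns None when only the DiT should run (LM skipped / off-loaded)."""
    if vram_gb <= 6:
        return None                       # DiT only, off-load to CPU
    if vram_gb < 12:
        return "acestep-5Hz-lm-0.6B"      # lightweight, good quality
    if vram_gb < 16:
        return "acestep-5Hz-lm-1.7B"      # better audio understanding
    return "acestep-5Hz-lm-4B"            # highest fidelity

print(recommend_lm(8))   # acestep-5Hz-lm-0.6B
```

The returned name can then be passed to --lm_model_path as shown above.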
2. LoRA Training
- Prepare data – 8–12 short songs in WAV/MP3 format.
- Launch LoRA UI – Gradio includes a “LoRA” tab.
- Configure – Choose dataset folder, set learning rate, epochs.
- Train – Click “Train Now”. Training on a 3090 takes ~ 1 hr.
- Save – The resulting .pt file can be loaded back into ACE‑Step for inference.
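For intuition on why LoRA training fits in 12–16 GB: LoRA freezes the base weight matrix W and trains only a low‑rank update B·A (B is d_out×r, A is r×d_in). The back‑of‑the‑envelope sketch below compares trainable parameter counts; the layer dimensions are illustrative, not ACE‑Step's actual sizes.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters: full fine-tune vs. a rank-r LoRA update."""
    full = d_in * d_out            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # only B (d_out x r) and A (r x d_in)
    return full, lora

full, lora = lora_params(4096, 4096, rank=8)
print(full, lora, f"{100 * lora / full:.2f}% of full")
# 16777216 65536 0.39% of full
```

With only a fraction of a percent of the weights receiving gradients, optimizer state and gradient memory shrink accordingly, which is what makes the ~1 hr, single‑3090 training run above plausible.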
3. Advanced Editing
- Repaint & Edit – Select a segment and click “Edit”; the model regenerates that slice.
- Cover Generation – Upload an audio file, choose a target style, and generate.
- Track Separation – Separate into stems (vocal, drums, bass, etc.) using built‑in functions.
- Vocal‑to‑BGM – Use the vocal track as conditioning to produce accompaniment.
FAQ & Troubleshooting
| Issue | Fix |
|---|---|
| “CUDA error: out of memory” | Reduce --max_length or switch to the 0.6 B LM variant. |
| Models fail to download | Ensure uv is in PATH and your internet isn’t blocked. Try --download-source huggingface. |
| Gradio UI not loading | Check if Port 7860 is free; try --port 7861. |
| API returns 401 | Provide the correct --api-key in the command or set it in the .env file. |
| Windows “Portable” not working | Verify that `python_embeded\requirements.txt` is present and run `uv install`. |
Why ACE‑Step Matters
- No cloud required – You keep every part of the pipeline local, preserving privacy and eliminating bandwidth costs.
- Open‑source transparency – Full access to the code and model weights lets developers audit, fork, and extend the work.
- Rapid prototyping – The Gradio interface lets you iterate on prompts and tweaks without writing code.
- Community‑driven – Contributions are welcome; the repo already boasts 12 contributors and a growing community of musicians and engineers.
Conclusion
ACE‑Step 1.5 is a game‑changer for anyone looking to generate high‑fidelity music on modest hardware. Its hybrid LM‑DiT design, lightning‑fast inference, and extensive control suite make it a top choice for artists, content creators, and research labs alike. Grab the repo, follow the simple install guide, and start crafting your own custom music right from your laptop today.
References: GitHub repository https://github.com/ace-step/ACE-Step-1.5, HuggingFace Space https://huggingface.co/spaces/ace-step/ace-step-1.5