ACE-Step Music Generator
Overview
The ACE-Step Music Generator integrates the ACE-Step diffusion model into TouchDesigner for text-to-music generation, audio-to-audio transformation, and audio editing. All inference runs in the external SideCar process, keeping TouchDesigner responsive. Generated audio includes a real-time waveform visualizer and optional autoplay.
Key Features
- Text-to-Music: Generate music from descriptive tags, genres, and structured lyrics
- Audio-to-Audio: Transform existing audio using a reference file with adjustable influence strength
- Audio Editing: Edit, repaint, retake, or extend existing audio with fine-grained control
- SideCar Architecture: All model loading and inference is offloaded to an external process
- Auto Repository Setup: Prompts to download and clone the ACE-Step repository on first use
- Settings Recall: Save and reload generation parameters from previous outputs
Requirements
- SideCar Operator: Must be running and connected. All model inference happens there
- SideCar Python Environment: All ACE-Step dependencies (torch, torchaudio, librosa, diffusers, etc.) must be installed in the SideCar’s Python environment. This operator does not manage packages
- Git: Must be installed and in your system PATH for automatic repository cloning
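Since the operator does not manage packages, it can save time to verify the SideCar environment up front. A minimal sketch of such a check (hypothetical helper; run it with the SideCar's Python interpreter, not TouchDesigner's):

```python
import importlib.util

# Packages the documentation lists as examples; the full set comes from
# the ACE-Step repository's own requirements file.
REQUIRED = ("torch", "torchaudio", "librosa", "diffusers")

def missing_packages(names=REQUIRED):
    """Return the subset of `names` not importable in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    gaps = missing_packages()
    print("Missing:", gaps if gaps else "none")
```

Any package reported as missing must be installed into the SideCar's environment manually.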
Input/Output
Inputs
None required. Generation is configured entirely via parameters. Reference audio files are specified by file path for audio-to-audio and editing modes.
Outputs
- Waveform Visualization: Real-time visual waveform rendered to an internal scriptTOP
- Audio Files: Generated WAV files saved to the configured output folder
Usage Examples
Text-to-Music Generation
- Ensure the SideCar is running and connected (check the About page, ‘SideCar Operator’)
- On the ACE-Step page, enter descriptive tags in ‘Prompt / Tags’ (e.g., “upbeat pop, catchy melody, female singer”)
- Enter structured lyrics in ‘Lyrics’ using tags like [verse] and [chorus]
- Set ‘Audio Duration’ to the desired length in seconds
- Pulse ‘Generate Music’
- If this is your first time, a dialog will prompt you to download the ACE-Step repository — click Download and wait for it to finish, then pulse ‘Generate Music’ again
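Because the structure tags are plain text, lyrics can also be assembled in a script before being written into the ‘Lyrics’ parameter. A minimal sketch (hypothetical helper, not part of the operator):

```python
def build_lyrics(sections):
    """Assemble structured lyrics from (tag, lines) pairs,
    e.g. [("verse", [...]), ("chorus", [...])]."""
    out = []
    for tag, lines in sections:
        out.append(f"[{tag}]")
        out.extend(lines)
    return "\n".join(out)

lyrics = build_lyrics([
    ("verse", ["City lights are calling out my name"]),
    ("chorus", ["We rise, we fall, we sing it all again"]),
])
```

Inside TouchDesigner the result would be assigned to `op('acestep').par.Lyrics` before pulsing ‘Generate Music’.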
Audio-to-Audio Transformation
- On the ACE-Step page, toggle ‘Enable Audio2Audio’ to On
- Set ‘Reference Audio Input’ to your source audio file
- Adjust ‘Reference Audio Strength’ — higher values stay closer to the reference
- Enter a prompt and lyrics to guide the transformation
- Pulse ‘Generate Music’
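A common way img2img-style diffusion pipelines apply a reference is to start denoising partway through the noise schedule. The sketch below illustrates that idea under this operator's convention, where higher ‘Reference Audio Strength’ preserves more of the reference; it is a conceptual illustration, not ACE-Step's actual implementation:

```python
def denoise_start_step(total_steps, ref_strength):
    """Illustrative only: a higher reference strength skips more of the
    early (most destructive) denoising steps, so the output stays closer
    to the reference audio."""
    if not 0.0 <= ref_strength <= 1.0:
        raise ValueError("ref_strength must be in [0, 1]")
    return int(total_steps * ref_strength)
```

At strength 1.0 the sampler would skip every step (output is essentially the reference); at 0.0 it runs the full schedule and the reference has little influence.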
Audio Editing (Edit, Repaint, Retake, Extend)
- On the Edit page, toggle ‘Enable Audio Editing/Manipulation’ to On
- Set ‘Source Audio Path’ to the audio you want to modify
- Select an ‘Edit Mode’:
- Edit Audio Content: Changes the content of the audio using original and target prompts/lyrics. Requires filling in ‘Original Prompt’ and ‘Original Lyrics’ — pulse ‘Load Src Credentials’ to auto-fill these from a previous generation’s saved parameters
- Extend Audio Duration: Extends the audio by setting ‘Extend Start’ and ‘Extend End’ beyond the original boundaries
- Repaint Audio Segment: Regenerates a time region defined by start/end times
- Retake Full Audio: Regenerates the entire audio with variance control
- Adjust ‘Variance’ and ‘Variant Seed’ for variation control
- Pulse ‘Generate Music’ on the ACE-Step page
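Extend mode reuses the repaint start/end parameters: a negative start pads on the left, and an end beyond the original duration extends on the right. A small sketch of the resulting output length (hypothetical helper, mirroring the parameter descriptions):

```python
def extended_duration(original, start, end):
    """Output length after an Extend operation, given 'Extend Start'
    (negative values pad left) and 'Extend End' (values beyond the
    original duration extend right)."""
    left_pad = max(0.0, -start)
    right_ext = max(0.0, end - original)
    return original + left_pad + right_ext
```

For a 30-second source, start -10 and end 30 yields 40 seconds of audio; start -5 and end 40 yields 45.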
Reloading Previous Settings
- Set ‘Current Audio’ to a previously generated WAV file
- Pulse ‘Settings from Current Audio’ — this loads all generation parameters from the associated JSON file saved alongside the audio
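The saved parameters are plain JSON, so they can also be inspected outside TouchDesigner. A sketch, assuming the `_input_params.json` naming convention mentioned on the Edit page:

```python
import json
from pathlib import Path

def load_saved_params(audio_path):
    """Read the parameter JSON saved alongside a generated WAV.
    Assumes the <name>_input_params.json sidecar naming convention."""
    wav = Path(audio_path)
    sidecar = wav.with_name(wav.stem + "_input_params.json")
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text())
```

Returning None for a missing sidecar file lets a calling script distinguish "no saved settings" from a parse error.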
Best Practices
- Set ‘Manual Seed’ to a specific value for reproducible results, or leave at -1 for random
- Toggle ‘Add Unique Suffix to Filename’ to On to prevent overwriting previous outputs
- For audio editing, always use ‘Load Src Credentials’ to auto-fill the original prompt and lyrics rather than typing them manually
- Higher ‘Inference Steps’ improve quality but increase generation time — 60 is a good starting point
- On the Advanced page, ‘Use bfloat16 Precision’ speeds up inference on supported GPUs. Disable it on macOS or if you encounter errors
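The seed and filename practices above can be expressed directly; a sketch (hypothetical helpers mirroring the "-1 means random" and timestamp-suffix behaviors, with the exact suffix format being an assumption):

```python
import random
import time

def resolve_seed(manual_seed):
    """-1 requests a random seed; any other value is used as-is
    for reproducible generation."""
    if manual_seed == -1:
        return random.randint(0, 1_000_000_000)
    return manual_seed

def output_name(base, add_suffix=True):
    """Optionally append a timestamp so earlier outputs are not
    overwritten (suffix format here is illustrative)."""
    if not add_suffix:
        return f"{base}.wav"
    return f"{base}_{time.strftime('%Y%m%d_%H%M%S')}.wav"
```

Recording the resolved seed alongside the output is what makes a random run repeatable later.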
Troubleshooting
- SideCar Not Connected: Check that the SideCar server is running. Verify the ‘SideCar Operator’ reference on the About page points to the correct operator
- Repository Missing: If the clone prompt appears repeatedly, check your internet connection and Git installation. Review the TouchDesigner console for detailed errors
- Missing Dependencies: Errors about missing Python packages (e.g., torch, librosa) mean you need to install them in the SideCar’s Python environment manually
- torch.compile() Not Supported on Windows: The ACE-Step model does not support torch.compile() on Windows. Leave this toggle off unless running on Linux
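If the clone prompt keeps reappearing, the first thing to verify is that Git is actually reachable on PATH. A quick check from any Python shell (hypothetical snippet):

```python
import shutil
import subprocess

def git_version():
    """Return Git's version string, or None if Git is not on PATH."""
    exe = shutil.which("git")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None
```

If this returns None, install Git or add it to your system PATH before retrying the repository download.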
Research Citation
Research & Licensing
ACE-STEP Project
The ACE-STEP project is an open-source initiative focused on advancing AI music generation.
ACE-Step
ACE-Step is a foundation model for music generation that integrates diffusion-based generation with advanced encoding and transformation techniques.
Technical Details
- Combines diffusion with DCAE and linear transformer architecture
- Uses MERT and m-hubert for semantic representation alignment (REPA)
- Supports text-to-music, audio-to-audio, edit, repaint, retake, and extend tasks
Research Impact
- Provides a holistic open-source architecture for state-of-the-art music generation
- Enables original music generation across diverse genres for creative production and education
Citation
@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}
Key Research Contributions
- Open-source foundation model for music generation using diffusion with Deep Compression AutoEncoder (DCAE) and lightweight linear transformer
- Leverages MERT and m-hubert for semantic alignment (REPA) enabling rapid training convergence
- Faster synthesis than LLM-based models (up to 4 minutes of music in 20 seconds on A100 GPU)
- Supports voice cloning, lyric editing, remixing, and track generation through fine-grained acoustic control
License
Apache License 2.0 - This model is freely available for research and commercial use.
Parameters
ACE-Step
op('acestep').par.Status Str Current status of the operator.
- Default: "" (Empty String)
op('acestep').par.Active Toggle
- Default: False
op('acestep').par.Currentaudio File
- Default: "" (Empty String)
op('acestep').par.Playhead Float
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Autoplay Toggle Automatically play the audio after generation.
- Default: False
op('acestep').par.Generate Pulse Trigger the music generation process based on current settings.
- Default: False
op('acestep').par.Prompt Str Descriptive tags, genres, or scene descriptions. Used for text2music, audio2audio, and as a basis for edit/repaint.
- Default: "" (Empty String)
op('acestep').par.Lyrics Str Enter lyrics with structure tags like [verse], [chorus]. Use \n for newlines. Used for text2music, audio2audio, and as a basis for edit/repaint.
- Default: "" (Empty String)
op('acestep').par.Duration Float Desired duration of the generated audio in seconds.
- Default: 0.0
- Range: 1 to 240
- Slider Range: 1 to 240
op('acestep').par.Infersteps Int Number of inference steps. Higher can improve quality but takes longer.
- Default: 0
- Range: 10 to 100
- Slider Range: 10 to 100
op('acestep').par.Manualseed Int Seed for reproducibility. -1 for random. Affects initial generation.
- Default: 0
- Range: -1 to 1000000000
- Slider Range: -1 to 1000000000
op('acestep').par.Guidancescale Float Main classifier-free guidance scale. Used if CFG Type is not "Double Condition".
- Default: 0.0
- Range: 1 to 30
- Slider Range: 1 to 30
op('acestep').par.Omegascale Float Omega scale factor for APG guidance type.
- Default: 0.0
- Range: 0 to 20
- Slider Range: 0 to 20
op('acestep').par.Guidancescaletext Float Guidance scale for text prompt when CFG Type is "Double Condition".
- Default: 0.0
- Range: 0 to 30
- Slider Range: 0 to 30
op('acestep').par.Guidancescalelyric Float Guidance scale for lyrics when CFG Type is "Double Condition".
- Default: 0.0
- Range: 0 to 30
- Slider Range: 0 to 30
op('acestep').par.Audio2audioenable Toggle Enable audio-to-audio generation. Uses Prompt & Lyrics as guidance if provided.
- Default: False
op('acestep').par.Refaudioinput File Path to the reference audio file for Audio2Audio mode.
- Default: "" (Empty String)
op('acestep').par.Refaudiostrength Float Strength of the reference audio influence (0.0 to 1.0).
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Outputfolder Folder Folder to save the generated WAV file. Relative to project or absolute.
- Default: "" (Empty String)
op('acestep').par.Outputfilename Str Name of the generated WAV file.
- Default: "" (Empty String)
op('acestep').par.Uniquesuffix Toggle If True, appends a timestamp to the filename to prevent overwriting.
- Default: False
op('acestep').par.Initialize Pulse Check dependencies, SideCar connection, and initialize the model.
- Default: False
op('acestep').par.Unloadmodel Pulse Release the model from memory via SideCar.
- Default: False
op('acestep').par.Loadsettings Pulse Load generation parameters from the JSON associated with the Current Audio file.
- Default: False
op('acestep').par.Editaudio Toggle Master toggle to enable audio editing modes on this page.
- Default: False
op('acestep').par.Srcaudiopath File Path to the source audio file for all edit modes.
- Default: "" (Empty String)
op('acestep').par.Retakeseeds Int Seed for retake/repaint/extend variations. -1 for random.
- Default: 0
- Range: -1 to 1000000000
- Slider Range: -1 to 1000000000
op('acestep').par.Retakevariance Float Amount of variance for retake/repaint (0.0 to 1.0).
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Repaintstart Float Start time in seconds for repaint. For extend, negative values pad left. 0 for retake.
- Default: 0.0
- Range: -240 to 240
- Slider Range: -240 to 240
op('acestep').par.Repaintend Float End time in seconds for repaint. For extend, values beyond original duration extend right. Original duration for retake.
- Default: 0.0
- Range: -240 to 480
- Slider Range: -240 to 480
op('acestep').par.Transitiontime Float Duration of the transition/crossfade in seconds for repaint/extend modes. 0 for abrupt change.
- Default: 0.0
- Range: 0 to 30
- Slider Range: 0 to 30
op('acestep').par.Editoriginalprompt Str The original prompt used to generate the Source Audio. Required for "Edit Audio Content" mode.
- Default: "" (Empty String)
op('acestep').par.Editoriginallyrics Str The original lyrics used to generate the Source Audio. Required for "Edit Audio Content" mode.
- Default: "" (Empty String)
op('acestep').par.Edittargetprompt Str Target prompt for "Edit Audio Content" mode. If empty, uses main prompt.
- Default: "" (Empty String)
op('acestep').par.Edittargetlyrics Str Target lyrics for "Edit Audio Content" mode. If empty, uses main lyrics.
- Default: "" (Empty String)
op('acestep').par.Editnmin Float Min influence for audio editing (0.0 to 1.0).
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Editnmax Float Max influence for audio editing (0.0 to 1.0).
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Editnavg Int Averaging window size for editing.
- Default: 0
- Range: 1 to 100
- Slider Range: 1 to 100
op('acestep').par.Loadsrccredentials Pulse Loads prompt and lyrics from the _input_params.json associated with the Src Audio Path.
- Default: False
Advanced
op('acestep').par.Guidanceinterval Float Guidance interval for CFG.
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Guidanceintervaldecay Float Decay rate for guidance interval.
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Minguidancescale Float Minimum guidance scale.
- Default: 0.0
- Range: 0 to 30
- Slider Range: 0 to 30
op('acestep').par.Usergtag Toggle Enable ERG (Exponentially Smoothed Moving Average Guidance) for prompt/tags.
- Default: False
op('acestep').par.Userglyric Toggle Enable ERG for lyrics.
- Default: False
op('acestep').par.Usergdiffusion Toggle Enable ERG for diffusion process.
- Default: False
op('acestep').par.Useoss Toggle Enable Optimal Step Size scheduling. Only effective if Scheduler Type is Euler.
- Default: False
op('acestep').par.Osssteps Str Steps for OSS (Optimal Step Size) scheduling, comma-separated. Must be used with the Euler scheduler.
- Default: "" (Empty String)
op('acestep').par.Deviceid Int GPU device ID to use (e.g., 0, 1). Requires re-initialize.
- Default: 0
- Range: 0 to 1
- Slider Range: 0 to 1
op('acestep').par.Usebf16 Toggle Use bfloat16 for faster inference (if supported). Uncheck for macOS or if errors occur. Requires re-initialize.
- Default: False
op('acestep').par.Torchcompile Toggle Optimize model with torch.compile() for faster inference (not supported on Windows by ACE-Step). Requires re-initialize.
- Default: False
op('acestep').par.Modelpath Folder Path to the cloned ACE-Step GitHub repository directory.
- Default: "" (Empty String)
op('acestep').par.Checkpointdir Folder Path to the ACE-Step model checkpoint directory. Leave empty to auto-download to the default location inside the repo.
- Default: "" (Empty String)
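The ‘Osssteps’ value is a comma-separated string. A sketch of how such a value might be parsed into step indices (hypothetical helper, not the operator's actual parsing code):

```python
def parse_oss_steps(value):
    """Parse a comma-separated step string such as '0, 10, 20' into a
    list of ints; an empty string yields an empty list (no OSS steps)."""
    return [int(tok) for tok in value.split(",") if tok.strip()]
```

Remember that OSS scheduling only takes effect when the scheduler type is Euler.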
Changelog
v2.0.0 (2025-07-19)
🎨 Audio Visualization & Enhanced User Experience
- Real-Time Audio Visualization:
- Professional waveform visualization with frequency analysis
- High-quality grayscale waveform rendering at 1280x960 resolution
- Dynamic amplitude processing with transient emphasis
- Frequency-based brightness variation for rich visual feedback
- Anti-aliasing and smooth envelope generation
- Automatic Visualization Triggers:
- Auto-visualization after successful generation
- Manual visualization via Currentaudio parameter changes
- Smart path tracking to prevent redundant processing
- Visual Feedback Enhancements:
- Black screen clearing when no audio is selected
- Visual confirmation of current audio status
- Seamless integration with audio playback controls
- Async Visualization Processing:
- Non-blocking waveform generation using TDAsyncIO
- Thread-safe audio analysis with librosa integration
- Graceful fallback to synchronous processing when needed
- Robust Error Handling:
- Fixed critical len() type errors in visualization pipeline
- Comprehensive try/catch blocks around FFT processing
- Safe numpy array type checking throughout audio pipeline
- Parameter Callback System:
- New Currentaudio() callback method for parameter-driven visualization
- Intelligent request state checking to prevent conflicts during generation
- Path validation and existence checking before processing
- Critical Stability Fixes:
- Resolved TouchDesigner crashes caused by async visualization errors
- Fixed "object of type 'int' has no len()" errors in audio processing
- Improved error handling in FFT frequency analysis
- Safe handling of edge cases in audio array processing
- Visualization Pipeline Fixes:
- Proper numpy array type validation throughout processing chain
- Graceful handling of malformed or empty audio files
- Improved error logging for debugging visualization issues
- Visual Audio Management:
- Immediate visual feedback when changing current audio file
- Clear visual indication when no audio is loaded (black screen)
- Smooth integration between generation and visualization workflows
- Status & Logging Improvements:
- Enhanced logging for visualization processes
- Clear status messages for audio loading and processing
- Improved error messages for troubleshooting
- Visualization Engine:
- Uses librosa for professional audio analysis
- Implements RMS and peak envelope detection
- FFT-based frequency analysis for visual brightness variation
- Supports both sync and async processing modes
- Integration Points:
- Seamless connection with existing audio playback system
- Compatible with all generation modes (text2music, audio2audio, editing)
- Maintains full backward compatibility with v1.0.0 workflows
PERFORMANCE:
- Optimized waveform generation with configurable resolution
- Efficient memory usage in visualization processing
- Non-blocking UI during visualization generation
v1.0.0 (2025-06-20)
🎵 Initial Release - ACE-Step Music Generation Integration
NEW FEATURES:
- Text-to-Music Generation: Generate music from text prompts and descriptive tags
- Lyrics Support: Full lyrics integration with structure tags like [verse], [chorus]
- Audio2Audio Mode: Transform existing audio using prompts and lyrics as guidance
- Advanced Audio Editing: Complete suite of audio manipulation tools:
- Edit Audio Content: Modify existing audio with target prompts/lyrics
- Repaint Audio Segment: Replace specific time segments
- Retake Full Audio: Generate variations of entire audio
- Extend Audio Duration: Extend audio beyond original length
- Professional Parameter Control:
- Inference steps, guidance scales, scheduler types (Euler, Heun)
- CFG types (APG, CFG, Zero STAR, Double Condition)
- ERG (Exponentially Smoothed Moving Average Guidance) controls
- Manual seed support for reproducible generation
- SideCar Integration: Seamless integration with SideCar server for distributed processing
- Dependency Management: Automatic detection and installation of required Python packages
- Output Management:
- Configurable output folders and filenames
- Automatic unique timestamp suffixes
- JSON parameter saving for reproducibility
- Settings Management: Load/save generation parameters from JSON files
- Audio Playback: Built-in audio playback with playhead control
- Model Management: Initialize, load, and unload models on demand
TECHNICAL FEATURES:
- Three-Page Parameter Layout:
- Main: Core generation and output settings
- Edit: Audio editing and manipulation controls
- Advanced: Professional diffusion and guidance parameters
- Async Processing: Non-blocking generation via TDAsyncIO integration
- Error Handling: Comprehensive dependency checking and error recovery
- Status Monitoring: Real-time status updates and progress tracking
SUPPORTED WORKFLOWS:
- Text → Music: Generate music from descriptive prompts
- Audio → Audio: Transform existing audio with new characteristics
- Audio Editing: Professional audio manipulation and refinement
- Batch Processing: Generate multiple variations with different seeds
REQUIREMENTS:
- ACE-Step repository (user must clone and configure)
- SideCar operator for processing
- Python dependencies (auto-installed when possible)
- Optional: Custom checkpoints directory