
ACE-Step Music Generator

v2.0.0

The ACE-Step Music Generator integrates the ACE-Step diffusion model into TouchDesigner for text-to-music generation, audio-to-audio transformation, and audio editing. All inference runs on the external SideCar process, keeping TouchDesigner responsive. Generated audio includes a real-time waveform visualizer and optional autoplay.

Features

  • Text-to-Music: Generate music from descriptive tags, genres, and structured lyrics
  • Audio-to-Audio: Transform existing audio using a reference file with adjustable influence strength
  • Audio Editing: Edit, repaint, retake, or extend existing audio with fine-grained control
  • SideCar Architecture: All model loading and inference is offloaded to an external process
  • Auto Repository Setup: Prompts to download and clone the ACE-Step repository on first use
  • Settings Recall: Save and reload generation parameters from previous outputs

Requirements

  • SideCar Operator: Must be running and connected. All model inference happens there
  • SideCar Python Environment: All ACE-Step dependencies (torch, torchaudio, librosa, diffusers, etc.) must be installed in the SideCar’s Python environment. This operator does not manage packages
  • Git: Must be installed and in your system PATH for automatic repository cloning

Inputs

None required. Queries are configured via parameters. Reference audio files are specified by file path for audio-to-audio and editing modes.

Outputs

  • Waveform Visualization: Real-time visual waveform rendered to an internal scriptTOP
  • Audio Files: Generated WAV files saved to the configured output folder

Text-to-Music

  1. Ensure the SideCar is running and connected (check the About page, ‘SideCar Operator’)
  2. On the ACE-Step page, enter descriptive tags in ‘Prompt / Tags’ (e.g., “upbeat pop, catchy melody, female singer”)
  3. Enter structured lyrics in ‘Lyrics’ using tags like [verse], [chorus]
  4. Set ‘Audio Duration’ to the desired length in seconds
  5. Pulse ‘Generate Music’
  6. If this is your first time, a dialog will prompt you to download the ACE-Step repository — click Download and wait for it to finish, then pulse ‘Generate Music’ again
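The steps above can also be driven from a script (textport or callback). A minimal sketch: the helper function below is illustrative and not part of the operator; only the parameter names come from the reference on this page.

```python
# Illustrative helper: collect text-to-music settings as a dict of
# TouchDesigner parameter names -> values, then apply them to the operator.
def text2music_settings(prompt, lyrics, duration=30.0, steps=60, seed=-1):
    """Build the parameter values for a text-to-music run."""
    return {
        'Prompt': prompt,           # descriptive tags / genres
        'Lyrics': lyrics,           # structured lyrics with [verse]/[chorus]
        'Duration': float(duration),
        'Infersteps': int(steps),   # 60 is a good starting point
        'Manualseed': int(seed),    # -1 = random
    }

settings = text2music_settings(
    'upbeat pop, catchy melody, female singer',
    '[verse]\nSunrise over the city\n[chorus]\nWe keep moving on',
)

# Inside TouchDesigner you would then apply the values and pulse Generate:
# ace = op('acestep')
# for name, value in settings.items():
#     setattr(ace.par, name, value)
# ace.par.Generate.pulse()
```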

Audio-to-Audio

  1. On the ACE-Step page, toggle ‘Enable Audio2Audio’ to On
  2. Set ‘Reference Audio Input’ to your source audio file
  3. Adjust ‘Reference Audio Strength’ — higher values stay closer to the reference
  4. Enter a prompt and lyrics to guide the transformation
  5. Pulse ‘Generate Music’
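
The same workflow can be scripted. A hedged sketch: the validation helper is our own addition; the parameter names and the 0.0-1.0 strength range come from the reference below, and Audio2Audio mode requires the Euler scheduler.

```python
import os

# Illustrative helper for an audio-to-audio run; clamps strength into the
# documented 0.0-1.0 range and checks the reference file exists.
def audio2audio_settings(ref_path, strength, prompt=''):
    """Return parameter values for Audio2Audio mode."""
    if not os.path.isfile(ref_path):
        raise FileNotFoundError(ref_path)
    strength = max(0.0, min(1.0, float(strength)))
    return {
        'Audio2audioenable': True,
        'Refaudioinput': ref_path,
        'Refaudiostrength': strength,   # higher = closer to the reference
        'Prompt': prompt,
        'Schedulertype': 'euler',       # Audio2Audio is Euler-scheduler only
    }
```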

Audio Editing (Edit, Repaint, Retake, Extend)

  1. On the Edit page, toggle ‘Enable Audio Editing/Manipulation’ to On
  2. Set ‘Source Audio Path’ to the audio you want to modify
  3. Select an ‘Edit Mode’:
    • Edit Audio Content: Changes the content of the audio using original and target prompts/lyrics. Requires filling in ‘Original Prompt’ and ‘Original Lyrics’ — pulse ‘Load Src Credentials’ to auto-fill these from a previous generation’s saved parameters
    • Extend Audio Duration: Extends the audio by setting ‘Extend Start’ and ‘Extend End’ beyond the original boundaries
    • Repaint Audio Segment: Regenerates a time region defined by start/end times
    • Retake Full Audio: Regenerates the entire audio with variance control
  4. Adjust ‘Variance’ and ‘Variant Seed’ for variation control
  5. Pulse ‘Generate Music’ on the ACE-Step page
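
For the extend mode, the start/end convention described above can be expressed as a small helper (hypothetical; it only mirrors the parameter semantics documented on this page):

```python
def extend_region(original_duration, pad_left=0.0, pad_right=0.0):
    """Translate desired padding into 'Extend Start'/'Extend End' values.

    A negative start pads left of the original audio; an end beyond the
    original duration extends to the right.
    """
    start = -float(pad_left)                       # e.g. 10 s of new intro -> -10.0
    end = float(original_duration) + float(pad_right)
    return start, end

# Extend a 60 s clip by 10 s before and 20 s after:
start, end = extend_region(60.0, pad_left=10.0, pad_right=20.0)
# start == -10.0, end == 80.0
```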

Settings Recall

  1. Set ‘Current Audio’ to a previously generated WAV file
  2. Pulse ‘Settings from Current Audio’ — this loads all generation parameters from the associated JSON file saved alongside the audio
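
A sketch of what that recall does: find the JSON saved next to the WAV and read it back. The `_input_params.json` suffix follows the ‘Load Src Credentials’ description below; treat it as an assumption if your files are named differently.

```python
import json
import os

# Illustrative loader for the JSON saved alongside a generated WAV.
def load_generation_params(wav_path):
    """Return the saved generation parameters for a generated WAV file."""
    base, _ = os.path.splitext(wav_path)
    json_path = base + '_input_params.json'
    if not os.path.isfile(json_path):
        raise FileNotFoundError(json_path)
    with open(json_path, 'r', encoding='utf-8') as fh:
        return json.load(fh)
```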

Tips

  • Set ‘Manual Seed’ to a specific value for reproducible results, or leave at -1 for random
  • Toggle ‘Add Unique Suffix to Filename’ to On to prevent overwriting previous outputs
  • For audio editing, always use ‘Load Src Credentials’ to auto-fill the original prompt and lyrics rather than typing them manually
  • Higher ‘Inference Steps’ improve quality but increase generation time — 60 is a good starting point
  • On the Advanced page, ‘Use bfloat16 Precision’ speeds up inference on supported GPUs. Disable it on macOS or if you encounter errors
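
The unique-suffix behavior can be pictured as appending a timestamp before the extension so a new render never overwrites the last one. The exact format the operator uses is not documented here; this helper assumes one plausible form.

```python
import datetime
import os

# Illustrative sketch of the 'Add Unique Suffix to Filename' behavior
# (assumed timestamp format, not the operator's guaranteed one).
def unique_filename(name, now=None):
    """Insert a timestamp before the extension, e.g. take.wav -> take_20250719_120000.wav."""
    now = now or datetime.datetime.now()
    base, ext = os.path.splitext(name)
    return f"{base}_{now.strftime('%Y%m%d_%H%M%S')}{ext}"
```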

Troubleshooting

  • SideCar Not Connected: Check that the SideCar server is running. Verify the ‘SideCar Operator’ reference on the About page points to the correct operator
  • Repository Missing: If the clone prompt appears repeatedly, check your internet connection and Git installation. Review the TouchDesigner console for detailed errors
  • Missing Dependencies: Errors about missing Python packages (e.g., torch, librosa) mean you need to install them in the SideCar’s Python environment manually
  • torch.compile() Not Supported on Windows: The ACE-Step model does not support torch.compile() on Windows. Leave this toggle off unless running on Linux

Research & Licensing

ACE-Step Project

The ACE-Step project is an open-source initiative focused on advancing AI music generation.

ACE-Step

ACE-Step is a foundation model for music generation that integrates diffusion-based generation with advanced encoding and transformation techniques.

Technical Details

  • Combines diffusion with DCAE and linear transformer architecture
  • Uses MERT and m-hubert for semantic representation alignment (REPA)
  • Supports text-to-music, audio-to-audio, edit, repaint, retake, and extend tasks

Research Impact

  • Provides a holistic open-source architecture for state-of-the-art music generation
  • Enables original music generation across diverse genres for creative production and education

Citation

@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}

Key Research Contributions

  • Open-source foundation model for music generation using diffusion with Deep Compression AutoEncoder (DCAE) and lightweight linear transformer
  • Leverages MERT and m-hubert for semantic alignment (REPA) enabling rapid training convergence
  • Faster synthesis than LLM-based models (up to 4 minutes of music in 20 seconds on an A100 GPU)
  • Supports voice cloning, lyric editing, remixing, and track generation through fine-grained acoustic control

License

Apache License 2.0 - This model is freely available for research and commercial use.

Status (Status) op('acestep').par.Status Str

Current status of the operator.

Default:
"" (Empty String)
Active (Active) op('acestep').par.Active Toggle
Default:
False
Current Audio (Currentaudio) op('acestep').par.Currentaudio File
Default:
"" (Empty String)
Playhead (Playhead) op('acestep').par.Playhead Float
Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Autoplay After Generation (Autoplay) op('acestep').par.Autoplay Toggle

Automatically play the audio after generation.

Default:
False
Generate Music (Generate) op('acestep').par.Generate Pulse

Trigger the music generation process based on current settings.

Default:
False
Core Generation Header
Prompt / Tags (Prompt) op('acestep').par.Prompt Str

Descriptive tags, genres, or scene descriptions. Used for text2music, audio2audio, and as a basis for edit/repaint.

Default:
"" (Empty String)
Lyrics (Lyrics) op('acestep').par.Lyrics Str

Enter lyrics with structure tags like [verse], [chorus]. Use \n for newlines. Used for text2music, audio2audio, and as a basis for edit/repaint.

Default:
"" (Empty String)
Audio Duration (s) (Duration) op('acestep').par.Duration Float

Desired duration of the generated audio in seconds.

Default:
0.0
Range:
1 to 240
Slider Range:
1 to 240
Inference Steps (Infersteps) op('acestep').par.Infersteps Int

Number of inference steps. Higher can improve quality but takes longer.

Default:
0
Range:
10 to 100
Slider Range:
10 to 100
Manual Seed (Manualseed) op('acestep').par.Manualseed Int

Seed for reproducibility. -1 for random. Affects initial generation.

Default:
0
Range:
-1 to 1000000000
Slider Range:
-1 to 1000000000
Scheduler Type (Schedulertype) op('acestep').par.Schedulertype Menu

Scheduler type for diffusion process.

Default:
euler
Options:
euler, heun
CFG Type (Cfgtype) op('acestep').par.Cfgtype Menu

Type of Classifier-Free Guidance.

Default:
apg
Options:
apg, cfg, double_condition, zero_star
Guidance Scale (Main) (Guidancescale) op('acestep').par.Guidancescale Float

Main classifier-free guidance scale. Used if CFG Type is not "Double Condition".

Default:
0.0
Range:
1 to 30
Slider Range:
1 to 30
Omega Scale (for APG) (Omegascale) op('acestep').par.Omegascale Float

Omega scale factor for APG guidance type.

Default:
0.0
Range:
0 to 20
Slider Range:
0 to 20
Text Guidance Scale (Double Cond) (Guidancescaletext) op('acestep').par.Guidancescaletext Float

Guidance scale for text prompt when CFG Type is "Double Condition".

Default:
0.0
Range:
0 to 30
Slider Range:
0 to 30
Lyric Guidance Scale (Double Cond) (Guidancescalelyric) op('acestep').par.Guidancescalelyric Float

Guidance scale for lyrics when CFG Type is "Double Condition".

Default:
0.0
Range:
0 to 30
Slider Range:
0 to 30
Audio2Audio Mode [ Euler Scheduler Only ] Header
Enable Audio2Audio (Audio2audioenable) op('acestep').par.Audio2audioenable Toggle

Enable audio-to-audio generation. Uses Prompt & Lyrics as guidance if provided.

Default:
False
Reference Audio Input (Refaudioinput) op('acestep').par.Refaudioinput File

Path to the reference audio file for Audio2Audio mode.

Default:
"" (Empty String)
Reference Audio Strength (Refaudiostrength) op('acestep').par.Refaudiostrength Float

Strength of the reference audio influence (0.0 to 1.0).

Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Output Settings Header
Output Folder (Outputfolder) op('acestep').par.Outputfolder Folder

Folder to save the generated WAV file. Relative to project or absolute.

Default:
"" (Empty String)
Output Filename (Outputfilename) op('acestep').par.Outputfilename Str

Name of the generated WAV file.

Default:
"" (Empty String)
Add Unique Suffix to Filename (Uniquesuffix) op('acestep').par.Uniquesuffix Toggle

If True, appends a timestamp to the filename to prevent overwriting.

Default:
False
Initialize ACE-Step Model (Initialize) op('acestep').par.Initialize Pulse

Check Dependencies, SideCar Connection, and Initialize Model.

Default:
False
Unload Model (Unloadmodel) op('acestep').par.Unloadmodel Pulse

Release the model from memory via SideCar.

Default:
False
Settings from Current Audio (Loadsettings) op('acestep').par.Loadsettings Pulse

Load generation parameters from the JSON associated with the Current Audio file.

Default:
False
Enable Audio Editing/Manipulation (Editaudio) op('acestep').par.Editaudio Toggle

Master toggle to enable audio editing modes on this page.

Default:
False
Audio Editing Configuration Header
Edit Mode (Editmode) op('acestep').par.Editmode Menu

Select the audio manipulation task.

Default:
extend
Options:
edit, extend, repaint, retake
Source Audio Path (Srcaudiopath) op('acestep').par.Srcaudiopath File

Path to the source audio file for all edit modes.

Default:
"" (Empty String)
Extend / Repaint / Retake Header
Variant Seed (Retakeseeds) op('acestep').par.Retakeseeds Int

Seed for retake/repaint/extend variations. -1 for random.

Default:
0
Range:
-1 to 1000000000
Slider Range:
-1 to 1000000000
Variance (Retakevariance) op('acestep').par.Retakevariance Float

Amount of variance for retake/repaint (0.0 to 1.0).

Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Extend Start (s) (Repaintstart) op('acestep').par.Repaintstart Float

Start time in seconds for repaint. For extend, negative values pad left. 0 for retake.

Default:
0.0
Range:
-240 to 240
Slider Range:
-240 to 240
Extend End (s) (Repaintend) op('acestep').par.Repaintend Float

End time in seconds for repaint. For extend, values beyond original duration extend right. Original duration for retake.

Default:
0.0
Range:
-240 to 480
Slider Range:
-240 to 480
Transition Time (s) (Transitiontime) op('acestep').par.Transitiontime Float

Duration of the transition/crossfade in seconds for repaint/extend modes. 0 for abrupt change.

Default:
0.0
Range:
0 to 30
Slider Range:
0 to 30
Edit Audio Content [ Slower ] Header
Original Prompt (Editoriginalprompt) op('acestep').par.Editoriginalprompt Str

The original prompt used to generate the Source Audio. Required for "Edit Audio Content" mode.

Default:
"" (Empty String)
Original Lyrics (Editoriginallyrics) op('acestep').par.Editoriginallyrics Str

The original lyrics used to generate the Source Audio. Required for "Edit Audio Content" mode.

Default:
"" (Empty String)
Target Prompt (Edittargetprompt) op('acestep').par.Edittargetprompt Str

Target prompt for "Edit Audio Content" mode. If empty, uses main prompt.

Default:
"" (Empty String)
Target Lyrics (Edittargetlyrics) op('acestep').par.Edittargetlyrics Str

Target lyrics for "Edit Audio Content" mode. If empty, uses main lyrics.

Default:
"" (Empty String)
Min Influence (n_min) (Editnmin) op('acestep').par.Editnmin Float

Min influence for audio editing (0.0 to 1.0).

Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Max Influence (n_max) (Editnmax) op('acestep').par.Editnmax Float

Max influence for audio editing (0.0 to 1.0).

Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Avg Window (n_avg) (Editnavg) op('acestep').par.Editnavg Int

Averaging window size for editing.

Default:
0
Range:
1 to 100
Slider Range:
1 to 100
Load Src Credentials (Loadsrccredentials) op('acestep').par.Loadsrccredentials Pulse

Loads prompt and lyrics from the _input_params.json associated with the Src Audio Path.

Default:
False
Advanced Guidance Control Header
Guidance Interval (Guidanceinterval) op('acestep').par.Guidanceinterval Float

Guidance interval for CFG.

Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Guidance Interval Decay (Guidanceintervaldecay) op('acestep').par.Guidanceintervaldecay Float

Decay rate for guidance interval.

Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
Min Guidance Scale (Minguidancescale) op('acestep').par.Minguidancescale Float

Minimum guidance scale.

Default:
0.0
Range:
0 to 30
Slider Range:
0 to 30
ERG Control Header
Use ERG for Tags (Usergtag) op('acestep').par.Usergtag Toggle

Enable ERG (Exponentially Smoothed Moving Average Guidance) for prompt/tags.

Default:
False
Use ERG for Lyrics (Userglyric) op('acestep').par.Userglyric Toggle

Enable ERG for lyrics.

Default:
False
Use ERG for Diffusion (Usergdiffusion) op('acestep').par.Usergdiffusion Toggle

Enable ERG for diffusion process.

Default:
False
Other Advanced Parameters Header
Use Optimal Step Size (OSS) (Useoss) op('acestep').par.Useoss Toggle

Enable Optimal Step Size scheduling. Only effective if Scheduler Type is Euler.

Default:
False
OSS Steps (e.g., 20,50,100) (Osssteps) op('acestep').par.Osssteps Str

Steps for OSS (Optimal Step Size) scheduling, comma-separated. MUST be used with Euler Scheduler.

Default:
"" (Empty String)
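
Parsing the comma-separated OSS Steps string can be sketched as follows (the helper is illustrative, not part of the operator):

```python
# Illustrative parser for the comma-separated 'OSS Steps' string, e.g. "20,50,100".
def parse_oss_steps(value):
    """Return the OSS step list as ints; an empty string means 'not set'."""
    value = value.strip()
    if not value:
        return []
    return [int(part) for part in value.split(',')]
```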
Device & Precision Header
GPU Device ID (Deviceid) op('acestep').par.Deviceid Int

GPU device ID to use (e.g., 0, 1). Requires re-initialize.

Default:
0
Range:
0 to 1
Slider Range:
0 to 1
Use bfloat16 Precision (Usebf16) op('acestep').par.Usebf16 Toggle

Use bfloat16 for faster inference (if supported). Uncheck for macOS or if errors occur. Requires re-initialize.

Default:
False
Use torch.compile() (Torchcompile) op('acestep').par.Torchcompile Toggle

Optimize model with torch.compile() for faster inference (Not supported on Windows by ACE-Step). Requires re-initialize.

Default:
False
Model Configuration Header
ACE-Step Repo Path (Modelpath) op('acestep').par.Modelpath Folder

Path to the CLONED ACE-Step GitHub repository directory.

Default:
"" (Empty String)
Checkpoint Dir (Optional) (Checkpointdir) op('acestep').par.Checkpointdir Folder

Path to the ACE-Step model checkpoint DIRECTORY. Leave empty for auto-download to default location inside repo.

Default:
"" (Empty String)
v2.0.0 (2025-07-19)

🎨 Audio Visualization & Enhanced User Experience

  • Real-Time Audio Visualization:
    • Professional waveform visualization with frequency analysis
    • High-quality grayscale waveform rendering at 1280x960 resolution
    • Dynamic amplitude processing with transient emphasis
    • Frequency-based brightness variation for rich visual feedback
    • Anti-aliasing and smooth envelope generation
  • Automatic Visualization Triggers:
    • Auto-visualization after successful generation
    • Manual visualization via Currentaudio parameter changes
    • Smart path tracking to prevent redundant processing
  • Visual Feedback Enhancements:
    • Black screen clearing when no audio is selected
    • Visual confirmation of current audio status
    • Seamless integration with audio playback controls

TECHNICAL IMPROVEMENTS:

  • Async Visualization Processing:
    • Non-blocking waveform generation using TDAsyncIO
    • Thread-safe audio analysis with librosa integration
    • Graceful fallback to synchronous processing when needed
  • Robust Error Handling:
    • Fixed critical len() type errors in visualization pipeline
    • Comprehensive try/catch blocks around FFT processing
    • Safe numpy array type checking throughout audio pipeline
  • Parameter Callback System:
    • New Currentaudio() callback method for parameter-driven visualization
    • Intelligent request state checking to prevent conflicts during generation
    • Path validation and existence checking before processing

BUG FIXES:

  • Critical Stability Fixes:
    • Resolved TouchDesigner crashes caused by async visualization errors
    • Fixed "object of type 'int' has no len()" errors in audio processing
    • Improved error handling in FFT frequency analysis
    • Safe handling of edge cases in audio array processing
  • Visualization Pipeline Fixes:
    • Proper numpy array type validation throughout processing chain
    • Graceful handling of malformed or empty audio files
    • Improved error logging for debugging visualization issues

USER EXPERIENCE ENHANCEMENTS:

  • Visual Audio Management:
    • Immediate visual feedback when changing current audio file
    • Clear visual indication when no audio is loaded (black screen)
    • Smooth integration between generation and visualization workflows
  • Status & Logging Improvements:
    • Enhanced logging for visualization processes
    • Clear status messages for audio loading and processing
    • Improved error messages for troubleshooting

TECHNICAL DETAILS:

  • Visualization Engine:
    • Uses librosa for professional audio analysis
    • Implements RMS and peak envelope detection
    • FFT-based frequency analysis for visual brightness variation
    • Supports both sync and async processing modes
  • Integration Points:
    • Seamless connection with existing audio playback system
    • Compatible with all generation modes (text2music, audio2audio, editing)
    • Maintains full backward compatibility with v1.0.0 workflows

PERFORMANCE:

  • Optimized waveform generation with configurable resolution
  • Efficient memory usage in visualization processing
  • Non-blocking UI during visualization generation
v1.0.0 (2025-06-20)

🎵 Initial Release - ACE-Step Music Generation Integration

NEW FEATURES:

  • Text-to-Music Generation: Generate music from text prompts and descriptive tags
  • Lyrics Support: Full lyrics integration with structure tags like [verse], [chorus]
  • Audio2Audio Mode: Transform existing audio using prompts and lyrics as guidance
  • Advanced Audio Editing: Complete suite of audio manipulation tools:
    • Edit Audio Content: Modify existing audio with target prompts/lyrics
    • Repaint Audio Segment: Replace specific time segments
    • Retake Full Audio: Generate variations of entire audio
    • Extend Audio Duration: Extend audio beyond original length
  • Professional Parameter Control:
    • Inference steps, guidance scales, scheduler types (Euler, Heun)
    • CFG types (APG, CFG, Zero STAR, Double Condition)
    • ERG (Exponentially Smoothed Moving Average Guidance) controls
    • Manual seed support for reproducible generation
  • SideCar Integration: Seamless integration with SideCar server for distributed processing
  • Dependency Management: Automatic detection and installation of required Python packages
  • Output Management:
    • Configurable output folders and filenames
    • Automatic unique timestamp suffixes
    • JSON parameter saving for reproducibility
  • Settings Management: Load/save generation parameters from JSON files
  • Audio Playback: Built-in audio playback with playhead control
  • Model Management: Initialize, load, and unload models on demand

TECHNICAL FEATURES:

  • Three-Page Parameter Layout:
    • Main: Core generation and output settings
    • Edit: Audio editing and manipulation controls
    • Advanced: Professional diffusion and guidance parameters
  • Async Processing: Non-blocking generation via TDAsyncIO integration
  • Error Handling: Comprehensive dependency checking and error recovery
  • Status Monitoring: Real-time status updates and progress tracking

SUPPORTED WORKFLOWS:

  • Text → Music: Generate music from descriptive prompts
  • Audio → Audio: Transform existing audio with new characteristics
  • Audio Editing: Professional audio manipulation and refinement
  • Batch Processing: Generate multiple variations with different seeds

REQUIREMENTS:

  • ACE-Step repository (user must clone and configure)
  • SideCar operator for processing
  • Python dependencies (auto-installed when possible)
  • Optional: Custom checkpoints directory