Acestep

v3.0.0 (updated)

The ACE-Step Music Generator integrates the ACE-Step diffusion model into TouchDesigner for text-to-music generation, audio-to-audio transformation, and audio editing. All inference runs in the external SideCar process, which keeps TouchDesigner responsive. The operator also provides a real-time waveform visualizer for generated audio and optional autoplay.

Key Features

  • Text-to-Music: Generate music from descriptive tags, genres, and structured lyrics
  • Audio-to-Audio: Transform existing audio using a reference file with adjustable influence strength
  • Audio Editing: Edit, repaint, retake, or extend existing audio with fine-grained control
  • SideCar Architecture: All model loading and inference is offloaded to an external process
  • Auto Repository Setup: Prompts to download and clone the ACE-Step repository on first use
  • Settings Recall: Save and reload generation parameters from previous outputs

Requirements

  • SideCar Operator: Must be running and connected. All model inference happens there
  • SideCar Python Environment: All ACE-Step dependencies (torch, torchaudio, librosa, diffusers, etc.) must be installed in the SideCar’s Python environment. This operator does not manage packages
  • Git: Must be installed and in your system PATH for automatic repository cloning

Inputs

None required. Queries are configured via parameters. Reference audio files are specified by file path for audio-to-audio and editing modes.

Outputs

  • Waveform Visualization: Real-time visual waveform rendered to an internal scriptTOP
  • Audio Files: Generated WAV files saved to the configured output folder

Text-to-Music Generation

  1. Ensure the SideCar is running and connected (check the About page, ‘SideCar Operator’)
  2. On the ACE-Step page, enter descriptive tags in ‘Prompt / Tags’ (e.g., “upbeat pop, catchy melody, female singer”)
  3. Enter structured lyrics in ‘Lyrics’ using tags like [verse], [chorus]
  4. Set ‘Audio Duration’ to the desired length in seconds
  5. Pulse ‘Generate Music’
  6. If this is your first time, a dialog will prompt you to download the ACE-Step repository — click Download and wait for it to finish, then pulse ‘Generate Music’ again
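
The steps above can also be driven from a script. The following is a minimal sketch, assuming the operator is named `acestep` and exposes the `Prompt`, `Lyrics`, `Duration`, and `Generate` parameters listed in the parameter reference on this page. The helper name `start_text_to_music` is hypothetical, and the length and range checks simply mirror the documented limits.

```python
def start_text_to_music(acestep, prompt, lyrics="[Instrumental]", duration=15.0):
    """Fill in the core generation parameters, then pulse Generate.

    `acestep` is the operator object, e.g. op('acestep') inside TouchDesigner.
    """
    if len(prompt) > 512:
        raise ValueError("Prompt is limited to 512 characters")
    if not 10 <= duration <= 600:
        raise ValueError("Duration must be between 10 and 600 seconds")

    acestep.par.Prompt = prompt
    acestep.par.Lyrics = lyrics
    acestep.par.Duration = duration
    acestep.par.Generate.pulse()  # equivalent to pressing 'Generate Music'


# Inside TouchDesigner, where `op` is a built-in:
# start_text_to_music(op('acestep'),
#                     "upbeat pop, catchy melody, female singer",
#                     "[verse]\n...\n[chorus]\n...",
#                     duration=30)
```

Passing the operator in as an argument keeps the helper usable from any DAT or extension, and makes it easy to try outside TouchDesigner with a stand-in object.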

Audio-to-Audio Transformation

  1. On the ACE-Step page, toggle ‘Enable Audio2Audio’ to On
  2. Set ‘Reference Audio Input’ to your source audio file
  3. Adjust ‘Reference Audio Strength’ — higher values stay closer to the reference
  4. Enter a prompt and lyrics to guide the transformation
  5. Pulse ‘Generate Music’

Audio Editing (Edit, Repaint, Retake, Extend)

  1. On the Edit page, toggle ‘Enable Audio Editing/Manipulation’ to On
  2. Set ‘Source Audio Path’ to the audio you want to modify
  3. Select an ‘Edit Mode’:
    • Edit Audio Content: Changes the content of the audio using original and target prompts/lyrics. Requires filling in ‘Original Prompt’ and ‘Original Lyrics’ — pulse ‘Load Src Credentials’ to auto-fill these from a previous generation’s saved parameters
    • Extend Audio Duration: Extends the audio by setting ‘Extend Start’ and ‘Extend End’ beyond the original boundaries
    • Repaint Audio Segment: Regenerates a time region defined by start/end times
    • Retake Full Audio: Regenerates the entire audio with variance control
  4. Adjust ‘Variance’ and ‘Variant Seed’ for variation control
  5. Pulse ‘Generate Music’ on the ACE-Step page

Settings Recall

  1. Set ‘Current Audio’ to a previously generated WAV file
  2. Pulse ‘Settings from Current Audio’ — this loads all generation parameters from the associated JSON file saved alongside the audio
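
To inspect those saved parameters outside the operator, a small reader is enough. This sketch assumes the JSON file shares the audio file's name with a `.json` extension (for example `track_01.wav` and `track_01.json`); the actual naming scheme and the keys inside the file are not specified here, so treat both as assumptions. The function name `load_saved_settings` is hypothetical.

```python
import json
from pathlib import Path


def load_saved_settings(audio_path):
    """Return the parameters stored in the JSON file next to an audio file.

    Assumes <name>.wav is paired with <name>.json in the same folder.
    Returns None when no such JSON file exists.
    """
    json_path = Path(audio_path).with_suffix(".json")
    if not json_path.is_file():
        return None
    with json_path.open("r", encoding="utf-8") as f:
        return json.load(f)
```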

Best Practices

  • Set ‘Manual Seed’ to a specific value for reproducible results, or leave it at -1 for a random seed
  • Toggle ‘Add Unique Suffix to Filename’ to On to prevent overwriting previous outputs
  • For audio editing, always use ‘Load Src Credentials’ to auto-fill the original prompt and lyrics rather than typing them manually
  • Higher ‘Inference Steps’ improve quality but increase generation time — 60 is a good starting point
  • On the Advanced page, ‘Use bfloat16 Precision’ speeds up inference on supported GPUs. Disable it on macOS or if you encounter errors

Troubleshooting

  • SideCar Not Connected: Check that the SideCar server is running. Verify the ‘SideCar Operator’ reference on the About page points to the correct operator
  • Repository Missing: If the clone prompt appears repeatedly, check your internet connection and Git installation. Review the TouchDesigner console for detailed errors
  • Missing Dependencies: Errors about missing Python packages (e.g., torch, librosa) mean you need to install them in the SideCar’s Python environment manually
  • torch.compile() Not Supported on Windows: The ACE-Step model does not support torch.compile() on Windows. Leave this toggle off unless running on Linux
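
Two of the checks above (Git on the PATH, Python packages importable) can be scripted. The sketch below uses only the standard library and is meant to be run with the SideCar's Python interpreter, since that is the environment where the packages must be installed. The helper names are hypothetical, and the package list is the one named in the requirements; the full dependency set may be longer.

```python
import importlib.util
import shutil


def git_available():
    """Return True if a `git` executable can be found on the system PATH."""
    return shutil.which("git") is not None


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("git on PATH:", git_available())
    print("missing packages:",
          missing_packages(["torch", "torchaudio", "librosa", "diffusers"]))
```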

Research & Licensing

ACE-Step Project

The ACE-Step project is an open-source initiative focused on advancing AI music generation.

ACE-Step

ACE-Step is a foundation model for music generation that integrates diffusion-based generation with advanced encoding and transformation techniques.

Technical Details

  • Combines diffusion with DCAE and linear transformer architecture
  • Uses MERT and m-hubert for semantic representation alignment (REPA)
  • Supports text-to-music, audio-to-audio, edit, repaint, retake, and extend tasks

Research Impact

  • Provides a holistic open-source architecture for state-of-the-art music generation
  • Enables original music generation across diverse genres for creative production and education

Citation

@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}

Key Research Contributions

  • Open-source foundation model for music generation using diffusion with Deep Compression AutoEncoder (DCAE) and lightweight linear transformer
  • Leverages MERT and m-hubert for semantic alignment (REPA) enabling rapid training convergence
  • Faster synthesis than LLM-based models (up to 4 minutes of music in 20 seconds on an A100 GPU)
  • Supports voice cloning, lyric editing, remixing, and track generation through fine-grained acoustic control

License

Apache License 2.0 - This model is freely available for research and commercial use.

Parameters

Status (Status) op('acestep').par.Status Str
Default:
"" (Empty String)
Generate Music (Generate) op('acestep').par.Generate Pulse
Default:
False
Caption / Prompt (Prompt) op('acestep').par.Prompt Str

Music description (max 512 chars). Tags, genres, mood.

Default:
groovy funky syncopated
Lyrics (Lyrics) op('acestep').par.Lyrics Str

Lyrics with structure tags like [verse], [chorus]. Use [Instrumental] for no vocals.

Default:
[[instrumental]]
Duration (s) (Duration) op('acestep').par.Duration Float

Audio duration in seconds (10-600).

Default:
15.0
Range:
10 to 600
BPM (Bpm) op('acestep').par.Bpm Int

Beats per minute. 0 = auto-detect.

Default:
0
Range:
0 to 300
Key (Keyscale) op('acestep').par.Keyscale StrMenu

Musical key. Auto = let the model decide.

Default:
auto
Menu Options:
  • Auto (auto)
  • C Major (C major)
  • C Minor (C minor)
  • C# Major (C# major)
  • C# Minor (C# minor)
  • Db Major (Db major)
  • Db Minor (Db minor)
  • D Major (D major)
  • D Minor (D minor)
  • D# Major (D# major)
  • D# Minor (D# minor)
  • Eb Major (Eb major)
  • Eb Minor (Eb minor)
  • E Major (E major)
  • E Minor (E minor)
  • F Major (F major)
  • F Minor (F minor)
  • F# Major (F# major)
  • F# Minor (F# minor)
  • Gb Major (Gb major)
  • Gb Minor (Gb minor)
  • G Major (G major)
  • G Minor (G minor)
  • G# Major (G# major)
  • G# Minor (G# minor)
  • Ab Major (Ab major)
  • Ab Minor (Ab minor)
  • A Major (A major)
  • A Minor (A minor)
  • A# Major (A# major)
  • A# Minor (A# minor)
  • Bb Major (Bb major)
  • Bb Minor (Bb minor)
  • B Major (B major)
  • B Minor (B minor)
Time Signature (Timesignature) op('acestep').par.Timesignature Menu

Time signature. Auto = let the model decide.

Default:
auto
Options:
auto, 2, 3, 4, 6
Vocal Language (Vocallanguage) op('acestep').par.Vocallanguage StrMenu

Vocal language. "Instrumental / Auto" for no vocals.

Default:
en
Menu Options:
  • Instrumental / Auto (unknown)
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Japanese (ja)
  • Korean (ko)
  • Mandarin (zh)
  • Cantonese (yue)
  • Arabic (ar)
  • Azerbaijani (az)
  • Bulgarian (bg)
  • Bengali (bn)
  • Catalan (ca)
  • Czech (cs)
  • Danish (da)
  • Greek (el)
  • Persian (fa)
  • Finnish (fi)
  • Hebrew (he)
  • Hindi (hi)
  • Croatian (hr)
  • Haitian Creole (ht)
  • Hungarian (hu)
  • Indonesian (id)
  • Icelandic (is)
  • Latin (la)
  • Lithuanian (lt)
  • Malay (ms)
  • Nepali (ne)
  • Dutch (nl)
  • Norwegian (no)
  • Punjabi (pa)
  • Polish (pl)
  • Romanian (ro)
  • Russian (ru)
  • Sanskrit (sa)
  • Slovak (sk)
  • Serbian (sr)
  • Swedish (sv)
  • Swahili (sw)
  • Tamil (ta)
  • Telugu (te)
  • Thai (th)
  • Tagalog (tl)
  • Turkish (tr)
  • Ukrainian (uk)
  • Urdu (ur)
  • Vietnamese (vi)
Seed (Seed) op('acestep').par.Seed Int

-1 for random seed.

Default:
-1
Range:
-1 to 1000000000
Batch Size (Batchsize) op('acestep').par.Batchsize Int

Number of variations to generate.

Default:
1
Range:
1 to 8
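When parameter values come from a script or an external file, they can be checked against the documented ranges before being applied. This is a minimal sketch: the ranges are copied from the entries above, while the `RANGES` table and the `out_of_range` helper are hypothetical names, and how the operator itself treats out-of-range values is not covered here.

```python
# Documented ranges for a few numeric parameters (inclusive bounds).
RANGES = {
    "Duration": (10, 600),
    "Bpm": (0, 300),
    "Seed": (-1, 1_000_000_000),
    "Batchsize": (1, 8),
}


def out_of_range(settings):
    """Return {name: (value, low, high)} for each value outside its range.

    Names that are not in RANGES are ignored.
    """
    problems = {}
    for name, value in settings.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems[name] = (value, low, high)
    return problems
```

For example, `out_of_range({"Duration": 5, "Bpm": 120})` reports only `Duration`, because 5 is below the 10 second minimum.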
Output Header
Audio Format (Audioformat) op('acestep').par.Audioformat Menu
Default:
wav
Options:
wav, mp3, flac, opus, aac
Output Folder (Outputfolder) op('acestep').par.Outputfolder Folder

Folder to save downloaded audio. Relative to project or absolute.

Default:
audio_out
Current Audio File (Currentaudio) op('acestep').par.Currentaudio File
Default:
"" (Empty String)
Server Header
Check Server Health (Checkhealth) op('acestep').par.Checkhealth Pulse
Default:
False
Load Settings from Audio (Loadsettings) op('acestep').par.Loadsettings Pulse
Default:
False
Server Host (Serverhost) op('acestep').par.Serverhost Str
Default:
127.0.0.1
Server Port (Serverport) op('acestep').par.Serverport Int
Default:
0
Range:
1 to 65535
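Before generating, it can help to confirm that something is listening on the configured host and port. The sketch below is a plain TCP reachability test using the standard library. It does not call any ACE-Step API endpoint and is not a description of what 'Check Server Health' does internally; the function name `sidecar_reachable` is hypothetical.

```python
import socket


def sidecar_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Inside TouchDesigner, the values can be read from the operator:
# acestep = op('acestep')
# sidecar_reachable(acestep.par.Serverhost.eval(), acestep.par.Serverport.eval())
```

A successful connection only shows that a process accepts connections on that port, so a failed generation can still have other causes.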
Transport Header
Autoplay (Autoplay) op('acestep').par.Autoplay Toggle

Automatically play audio after generation completes.

Default:
True
Play (Play) op('acestep').par.Play Pulse
Default:
False
Pause (Pause) op('acestep').par.Pause Pulse
Default:
False
Stop (Stop) op('acestep').par.Stop Pulse
Default:
False
Replay (Replay) op('acestep').par.Replay Pulse
Default:
False
Next (Next) op('acestep').par.Next Pulse
Default:
False
Previous (Previous) op('acestep').par.Previous Pulse
Default:
False
Playhead (Playhead) op('acestep').par.Playhead Float

Scrub position (0-1).

Default:
0.0
Range:
0 to 1
Volume (Volume) op('acestep').par.Volume Float
Default:
0.7
Range:
0 to 1
Playlist Header
Track # (Playlistindex) op('acestep').par.Playlistindex Int

Current track index in the generation history.

Default:
1
Range:
0 to 999
Total Tracks (Playlistcount) op('acestep').par.Playlistcount Int
Default:
0
Range:
0 to 1
Clear History (Clearhistory) op('acestep').par.Clearhistory Pulse
Default:
False
Model (Model) op('acestep').par.Model Menu

DiT model variant. Server Default uses whatever the sidecar loaded.

Default:
default
Options:
default, acestep-v15-turbo, acestep-v15-base, acestep-v15-sft, acestep-v15-turbo-shift1, acestep-v15-turbo-shift3, acestep-v15-turbo-continuous, acestep-v15-xl-base, acestep-v15-xl-sft, acestep-v15-xl-turbo
Inference Steps (Infersteps) op('acestep').par.Infersteps Int

Turbo: 1-20, Base: 1-200.

Default:
8
Range:
1 to 200
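Because the useful step range depends on the model variant, a script can clamp the value before applying it. This sketch takes the "Turbo: 1-20, Base: 1-200" note literally: any model name containing `turbo` is capped at 20 and everything else at 200. Treating the `sft` variants and `default` like base models is an assumption, since `default` uses whatever the sidecar loaded. Both helper names are hypothetical.

```python
def max_steps_for(model):
    """Upper bound for Inference Steps: 20 for turbo variants, else 200."""
    return 20 if "turbo" in model else 200


def clamp_steps(model, steps):
    """Clamp a requested step count into the range for the given model."""
    return max(1, min(int(steps), max_steps_for(model)))
```

With these rules, a request for 60 steps becomes 20 on `acestep-v15-turbo` and stays 60 on `acestep-v15-xl-base`.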
Guidance Scale (Guidancescale) op('acestep').par.Guidancescale Float
Default:
7.0
Range:
1 to 15
Timestep Shift (Shift) op('acestep').par.Shift Float

Only effective for base models.

Default:
3.0
Range:
1 to 5
Infer Method (Infermethod) op('acestep').par.Infermethod Menu
Default:
ode
Options:
ode, sde
Use ADG (Useadg) op('acestep').par.Useadg Toggle
Default:
False
CFG Interval Start (Cfgintervalstart) op('acestep').par.Cfgintervalstart Float
Default:
0.0
Range:
0 to 1
CFG Interval End (Cfgintervalend) op('acestep').par.Cfgintervalend Float
Default:
1.0
Range:
0 to 1
Task Type (Tasktype) op('acestep').par.Tasktype Menu
Default:
text2music
Options:
text2music, cover, repaint, lego, extract, complete
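Every task type other than text2music works on existing audio, which is supplied through the Source Audio parameter (`Srcaudiopath`). The sketch below checks that combination before a generation is started. It assumes a source file is strictly required for those tasks, which this page implies but does not state outright; the helper name `check_task_inputs` is hypothetical.

```python
import os

# Task types that operate on an existing audio file.
SOURCE_AUDIO_TASKS = {"cover", "repaint", "lego", "extract", "complete"}


def check_task_inputs(task_type, src_audio_path=""):
    """Return an error message, or None if the inputs look consistent."""
    if task_type == "text2music":
        return None
    if task_type not in SOURCE_AUDIO_TASKS:
        return f"Unknown task type: {task_type!r}"
    if not src_audio_path:
        return f"Task {task_type!r} needs a Source Audio file"
    if not os.path.isfile(src_audio_path):
        return f"Source Audio file not found: {src_audio_path}"
    return None
```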
Source Audio (Srcaudiopath) op('acestep').par.Srcaudiopath File

Source audio file for cover/repaint/lego/extract/complete.

Default:
"" (Empty String)
Reference Audio (Referenceaudiopath) op('acestep').par.Referenceaudiopath File

Reference audio for style guidance.

Default:
"" (Empty String)
Cover Strength (Audiocoverstrength) op('acestep').par.Audiocoverstrength Float
Default:
1.0
Range:
0 to 1
Cover Noise Strength (Covernoisestrength) op('acestep').par.Covernoisestrength Float

0.0 = pure noise, 1.0 = closest to source.

Default:
0.0
Range:
0 to 1
Instruction (Instruction) op('acestep').par.Instruction Str

Instruction for the model. Default works for most tasks.

Default:
Fill the audio semantic mask based on the given conditions:
Repaint / Complete Header
Repaint Start (s) (Repaintingstart) op('acestep').par.Repaintingstart Float
Default:
0.0
Range:
0 to 600
Repaint End (s) (Repaintingend) op('acestep').par.Repaintingend Float

0 = auto (full duration).

Default:
0.0
Range:
0 to 600
Repaint Mode (Repaintmode) op('acestep').par.Repaintmode Menu
Default:
balanced
Options:
conservative, balanced, aggressive
Repaint Strength (Repaintstrength) op('acestep').par.Repaintstrength Float

Only used in balanced mode. 0=conservative, 1=aggressive.

Default:
0.5
Range:
0 to 1
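The repaint region is given in seconds, with an end value of 0 standing for the full duration. A small helper can turn the two parameter values into an explicit region and reject an empty or reversed one. This is a sketch: `duration` is the length of the source audio as supplied by the caller, the function name is hypothetical, and whether the end may exceed the source length (for example when completing audio) is not specified here, so it is not checked.

```python
def resolve_repaint_region(start, end, duration):
    """Return (start, end) in seconds, reading end == 0 as 'full duration'."""
    if end == 0:
        end = duration
    if start < 0 or start >= end:
        raise ValueError(f"invalid repaint region: start={start}, end={end}")
    return float(start), float(end)
```

For a 30 second clip, `resolve_repaint_region(5, 0, 30)` yields the region from 5 to 30 seconds.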
Enable LLM (Initllm) op('acestep').par.Initllm Toggle

Load a 5Hz language model for prompt expansion. Improves quality but uses 1-8GB extra VRAM. Requires sidecar restart to take effect.

Default:
False
LLM Model Size (Llmmodel) op('acestep').par.Llmmodel Menu

Which 5Hz LM to load. Requires sidecar restart to take effect.

Default:
0.6B
Options:
0.6B, 1.7B, 4B
Enable LM Thinking (Thinking) op('acestep').par.Thinking Toggle

Use 5Hz LM chain-of-thought (higher quality, slower). Requires LLM enabled.

Default:
False
Enhance Input via LM (Useformat) op('acestep').par.Useformat Toggle

Use format_sample() to enhance caption/lyrics. Requires LLM enabled.

Default:
False
LM Temperature (Lmtemperature) op('acestep').par.Lmtemperature Float
Default:
0.85
Range:
0 to 2
LM CFG Scale (Lmcfgscale) op('acestep').par.Lmcfgscale Float
Default:
2.5
Range:
0 to 10
LM Top P (Lmtopp) op('acestep').par.Lmtopp Float
Default:
0.9
Range:
0 to 1
Tool Preset (Toolpreset) op('acestep').par.Toolpreset Menu
Default:
off
Options:
custom, off, readonly, interactive, full
Generate Music (Enablegenerate) op('acestep').par.Enablegenerate Toggle
Default:
False
Playback (Enableplayback) op('acestep').par.Enableplayback Toggle

Expose playback tool — transport controls, track switching, volume.

Default:
False
Get Tracks (Enablegettracks) op('acestep').par.Enablegettracks Toggle

Expose get_tracks tool — browse generation history and track metadata.

Default:
False
Changelog

v3.0.0 (2026-05-02)
  • added Toolname/Tooldescription pars for all 3 tools (generate, playback, get_tracks)
  • added _tool_decl helper for parameter-driven tool name/description resolution
  • updated category to Pipelines
  • rewrote GetTool to use tool_definition wrapper format
  • major extension rewrite (578 insertions)
  • added docs
  • rewrote extension as thin HTTP client to ACE-Step 1.5 API sidecar
  • removed embedded pipeline, dependency management, and repo cloning logic
  • added aiohttp-based task submission, polling, and audio download
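
The submit, poll, download pattern mentioned in this release can be illustrated with a generic polling loop. This is a sketch only and not the extension's code: `fetch_status` stands in for whatever request the client makes (with aiohttp or otherwise), and the `"status"`, `"done"`, `"failed"`, and `"error"` names are hypothetical rather than fields of the ACE-Step API.

```python
import asyncio


async def wait_for_task(fetch_status, interval=1.0, timeout=600.0):
    """Poll `fetch_status()` until the task finishes, fails, or times out.

    `fetch_status` is a coroutine function returning a dict with a
    "status" key (hypothetical field names, see the note above).
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await fetch_status()
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error", "generation failed"))
        if loop.time() >= deadline:
            raise TimeoutError("task did not finish in time")
        # Yield to the event loop so other work is not blocked while waiting.
        await asyncio.sleep(interval)
```

Sleeping between polls with `asyncio.sleep` is what keeps a loop like this from blocking other coroutines while a generation is running.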
v2.0.0 (2025-07-19)

🎨 Audio Visualization & Enhanced User Experience

  • Real-Time Audio Visualization:
    • Professional waveform visualization with frequency analysis
    • High-quality grayscale waveform rendering at 1280x960 resolution
    • Dynamic amplitude processing with transient emphasis
    • Frequency-based brightness variation for rich visual feedback
    • Anti-aliasing and smooth envelope generation
  • Automatic Visualization Triggers:
    • Auto-visualization after successful generation
    • Manual visualization via Currentaudio parameter changes
    • Smart path tracking to prevent redundant processing
  • Visual Feedback Enhancements:
    • Black screen clearing when no audio is selected
    • Visual confirmation of current audio status
    • Seamless integration with audio playback controls

TECHNICAL IMPROVEMENTS:

  • Async Visualization Processing:
    • Non-blocking waveform generation using TDAsyncIO
    • Thread-safe audio analysis with librosa integration
    • Graceful fallback to synchronous processing when needed
  • Robust Error Handling:
    • Fixed critical len() type errors in visualization pipeline
    • Comprehensive try/catch blocks around FFT processing
    • Safe numpy array type checking throughout audio pipeline
  • Parameter Callback System:
    • New Currentaudio() callback method for parameter-driven visualization
    • Intelligent request state checking to prevent conflicts during generation
    • Path validation and existence checking before processing

BUG FIXES:

  • Critical Stability Fixes:
    • Resolved TouchDesigner crashes caused by async visualization errors
    • Fixed "object of type 'int' has no len()" errors in audio processing
    • Improved error handling in FFT frequency analysis
    • Safe handling of edge cases in audio array processing
  • Visualization Pipeline Fixes:
    • Proper numpy array type validation throughout processing chain
    • Graceful handling of malformed or empty audio files
    • Improved error logging for debugging visualization issues

USER EXPERIENCE ENHANCEMENTS:

  • Visual Audio Management:
    • Immediate visual feedback when changing current audio file
    • Clear visual indication when no audio is loaded (black screen)
    • Smooth integration between generation and visualization workflows
  • Status & Logging Improvements:
    • Enhanced logging for visualization processes
    • Clear status messages for audio loading and processing
    • Improved error messages for troubleshooting

TECHNICAL DETAILS:

  • Visualization Engine:
    • Uses librosa for professional audio analysis
    • Implements RMS and peak envelope detection
    • FFT-based frequency analysis for visual brightness variation
    • Supports both sync and async processing modes
  • Integration Points:
    • Seamless connection with existing audio playback system
    • Compatible with all generation modes (text2music, audio2audio, editing)
    • Maintains full backward compatibility with v1.0.0 workflows

PERFORMANCE:

  • Optimized waveform generation with configurable resolution
  • Efficient memory usage in visualization processing
  • Non-blocking UI during visualization generation
v1.0.0 (2025-06-20)

🎵 Initial Release - ACE-Step Music Generation Integration

NEW FEATURES:

  • Text-to-Music Generation: Generate music from text prompts and descriptive tags
  • Lyrics Support: Full lyrics integration with structure tags like [verse], [chorus]
  • Audio2Audio Mode: Transform existing audio using prompts and lyrics as guidance
  • Advanced Audio Editing: Complete suite of audio manipulation tools:
    • Edit Audio Content: Modify existing audio with target prompts/lyrics
    • Repaint Audio Segment: Replace specific time segments
    • Retake Full Audio: Generate variations of entire audio
    • Extend Audio Duration: Extend audio beyond original length
  • Professional Parameter Control:
    • Inference steps, guidance scales, scheduler types (Euler, Heun)
    • CFG types (APG, CFG, Zero STAR, Double Condition)
    • ERG (Entropy Rectifying Guidance) controls
    • Manual seed support for reproducible generation
  • SideCar Integration: Seamless integration with SideCar server for distributed processing
  • Dependency Management: Automatic detection and installation of required Python packages
  • Output Management:
    • Configurable output folders and filenames
    • Automatic unique timestamp suffixes
    • JSON parameter saving for reproducibility
  • Settings Management: Load/save generation parameters from JSON files
  • Audio Playback: Built-in audio playback with playhead control
  • Model Management: Initialize, load, and unload models on demand

TECHNICAL FEATURES:

  • Three-Page Parameter Layout:
    • Main: Core generation and output settings
    • Edit: Audio editing and manipulation controls
    • Advanced: Professional diffusion and guidance parameters
  • Async Processing: Non-blocking generation via TDAsyncIO integration
  • Error Handling: Comprehensive dependency checking and error recovery
  • Status Monitoring: Real-time status updates and progress tracking

SUPPORTED WORKFLOWS:

  • Text → Music: Generate music from descriptive prompts
  • Audio → Audio: Transform existing audio with new characteristics
  • Audio Editing: Professional audio manipulation and refinement
  • Batch Processing: Generate multiple variations with different seeds

REQUIREMENTS:

  • ACE-Step repository (user must clone and configure)
  • SideCar operator for processing
  • Python dependencies (auto-installed when possible)
  • Optional: Custom checkpoints directory