
Acestep

v3.0.0 (updated)

Overview

The ACE-Step Music Generator integrates the ACE-Step diffusion model into TouchDesigner for text-to-music generation, audio-to-audio transformation, and audio editing. All inference runs in the external SideCar process, which keeps TouchDesigner responsive. The operator also provides a real-time waveform visualizer and optional autoplay for generated audio.

Key Features

Text-to-Music: Generate music from descriptive tags, genres, and structured lyrics
Audio-to-Audio: Transform existing audio using a reference file with adjustable influence strength
Audio Editing: Edit, repaint, retake, or extend existing audio with fine-grained control
SideCar Architecture: All model loading and inference is offloaded to an external process
Auto Repository Setup: Prompts to download and clone the ACE-Step repository on first use
Settings Recall: Save and reload generation parameters from previous outputs

Requirements

SideCar Operator: Must be running and connected. All model inference happens there
SideCar Python Environment: All ACE-Step dependencies (torch, torchaudio, librosa, diffusers, etc.) must be installed in the SideCar’s Python environment. This operator does not manage packages
Git: Must be installed and in your system PATH for automatic repository cloning

Input/Output

Inputs

None required. Queries are configured via parameters. Reference audio files are specified by file path for audio-to-audio and editing modes.

Outputs

Waveform Visualization: Real-time waveform rendered to an internal Script TOP
Audio Files: Generated WAV files saved to the configured output folder

Usage Examples

Text-to-Music Generation

Ensure the SideCar is running and connected (check the About page, ‘SideCar Operator’)
On the ACE-Step page, enter descriptive tags in ‘Prompt / Tags’ (e.g., “upbeat pop, catchy melody, female singer”)
Enter structured lyrics in ‘Lyrics’ using tags like [verse], [chorus]
Set ‘Audio Duration’ to the desired length in seconds
Pulse ‘Generate Music’
If this is your first time, a dialog will prompt you to download the ACE-Step repository — click Download and wait for it to finish, then pulse ‘Generate Music’ again
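The steps above can also be driven from a script. The following is a minimal sketch, assuming the parameter names listed in the Parameters section (Prompt, Lyrics, Duration, Generate). The helper apply_settings and the example lyrics are hypothetical, and a SimpleNamespace stands in for op('acestep').par so the snippet runs outside TouchDesigner.

```python
from types import SimpleNamespace


def apply_settings(pars, settings):
    """Assign each value to the parameter of the same name on `pars`."""
    for name, value in settings.items():
        if not hasattr(pars, name):
            raise AttributeError(f"unknown parameter: {name}")
        setattr(pars, name, value)


# Stand-in for op('acestep').par (hypothetical, for running outside TouchDesigner).
demo_pars = SimpleNamespace(Prompt="", Lyrics="", Duration=15.0)

apply_settings(demo_pars, {
    "Prompt": "upbeat pop, catchy melody, female singer",
    "Lyrics": "[verse]\nCity lights are calling\n[chorus]\nWe run all night",
    "Duration": 30.0,
})

# Inside TouchDesigner the same helper would target the real operator:
#   apply_settings(op('acestep').par, {...})
#   op('acestep').par.Generate.pulse()
```

Rejecting unknown names up front makes a typo in a parameter name fail loudly instead of silently doing nothing.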

Audio-to-Audio Transformation

On the ACE-Step page, toggle ‘Enable Audio2Audio’ to On
Set ‘Reference Audio Input’ to your source audio file
Adjust ‘Reference Audio Strength’ — higher values stay closer to the reference
Enter a prompt and lyrics to guide the transformation
Pulse ‘Generate Music’

Audio Editing (Edit, Repaint, Retake, Extend)

On the Edit page, toggle ‘Enable Audio Editing/Manipulation’ to On
Set ‘Source Audio Path’ to the audio you want to modify
Select an ‘Edit Mode’:
- Edit Audio Content: Changes the content of the audio using original and target prompts/lyrics. Requires filling in ‘Original Prompt’ and ‘Original Lyrics’ — pulse ‘Load Src Credentials’ to auto-fill these from a previous generation’s saved parameters
- Extend Audio Duration: Extends the audio by setting ‘Extend Start’ and ‘Extend End’ beyond the original boundaries
- Repaint Audio Segment: Regenerates a time region defined by start/end times
- Retake Full Audio: Regenerates the entire audio with variance control
Adjust ‘Variance’ and ‘Variant Seed’ for variation control
Pulse ‘Generate Music’ on the ACE-Step page

Reloading Previous Settings

Set ‘Current Audio’ to a previously generated WAV file
Pulse ‘Settings from Current Audio’ — this loads all generation parameters from the associated JSON file saved alongside the audio
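To inspect saved settings outside the operator, something like the sketch below may help. It assumes the JSON file shares the WAV file's name with a .json extension. The documentation only says the JSON is saved alongside the audio, so the naming rule, the helper names, and the demo keys are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def settings_path_for(wav_path):
    """Assumed convention: same folder and stem as the WAV, with .json."""
    return Path(wav_path).with_suffix(".json")


def load_settings(wav_path):
    """Return the saved parameters as a dict, or None if no JSON exists."""
    json_path = settings_path_for(wav_path)
    if not json_path.is_file():
        return None
    with json_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


# Demo with a temporary folder; the keys below are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    wav = Path(tmp) / "track_001.wav"
    settings_path_for(wav).write_text(
        json.dumps({"Prompt": "groovy funky syncopated", "Seed": 42}),
        encoding="utf-8",
    )
    demo_settings = load_settings(wav)
    missing_settings = load_settings(Path(tmp) / "other.wav")
```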

Best Practices

Set ‘Manual Seed’ to a specific value for reproducible results, or leave at -1 for random
Toggle ‘Add Unique Suffix to Filename’ to On to prevent overwriting previous outputs
For audio editing, always use ‘Load Src Credentials’ to auto-fill the original prompt and lyrics rather than typing them manually
Higher ‘Inference Steps’ improve quality but increase generation time — 60 is a good starting point
On the Advanced page, ‘Use bfloat16 Precision’ speeds up inference on supported GPUs. Disable it on macOS or if you encounter errors
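The seed and filename advice above can be illustrated with a small sketch. It assumes that -1 means "pick a random seed" within the documented 0 to 1000000000 range and that a unique suffix is a timestamp. Both helper names and the timestamp format are hypothetical, not the operator's actual implementation.

```python
import random
import time
from pathlib import Path

MAX_SEED = 1_000_000_000  # upper bound taken from the Seed parameter range


def resolve_seed(seed, rng=random):
    """Return a concrete seed: -1 draws a random one, other values pass through."""
    if seed == -1:
        return rng.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed out of range: {seed}")
    return seed


def unique_output_path(folder, stem, ext=".wav", now=None):
    """Build a path with a timestamp suffix (assumed format) to avoid overwrites."""
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(now))
    return Path(folder) / f"{stem}_{stamp}{ext}"


fixed = resolve_seed(1234)        # reproducible: always 1234
drawn = resolve_seed(-1)          # varies between runs
target = unique_output_path("audio_out", "take")
```

Recording the resolved seed alongside the output is what makes a "random" run repeatable later.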

Troubleshooting

SideCar Not Connected: Check that the SideCar server is running. Verify the ‘SideCar Operator’ reference on the About page points to the correct operator
Repository Missing: If the clone prompt appears repeatedly, check your internet connection and Git installation. Review the TouchDesigner console for detailed errors
Missing Dependencies: Errors about missing Python packages (e.g., torch, librosa) mean you need to install them in the SideCar’s Python environment manually
torch.compile() Not Supported on Windows: The ACE-Step model does not support torch.compile() on Windows. Leave this toggle off unless running on Linux
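For the dependency and Git issues above, a quick check along these lines can be run with the SideCar's Python interpreter. It is a minimal sketch using only the standard library. The package list repeats the examples named in Requirements and is not a complete list of what ACE-Step needs.

```python
import importlib.util
import shutil

# Packages named as examples in the Requirements section (not exhaustive).
REQUIRED = ["torch", "torchaudio", "librosa", "diffusers"]


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def git_available():
    """True if a `git` executable is found on the system PATH."""
    return shutil.which("git") is not None


report = {
    "missing": missing_packages(REQUIRED),
    "git": git_available(),
}
# print(report)  # e.g. inspect this in the SideCar's environment
```

The result only reflects the interpreter that runs the script, so it needs to be the SideCar's environment rather than TouchDesigner's built-in Python.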

Research Citation

Research & Licensing

ACE-STEP Project

The ACE-STEP project is an open-source initiative focused on advancing AI music generation.

ACE-Step

ACE-Step is a foundation model for music generation that integrates diffusion-based generation with advanced encoding and transformation techniques.

Technical Details

Combines diffusion with DCAE and linear transformer architecture
Uses MERT and m-hubert for semantic representation alignment (REPA)
Supports text-to-music, audio-to-audio, edit, repaint, retake, and extend tasks

Research Impact

Provides a holistic open-source architecture for state-of-the-art music generation
Enables original music generation across diverse genres for creative production and education

Citation

@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}

Key Research Contributions

Open-source foundation model for music generation using diffusion with Deep Compression AutoEncoder (DCAE) and lightweight linear transformer
Leverages MERT and m-hubert for semantic alignment (REPA) enabling rapid training convergence
Faster synthesis than LLM-based models (up to 4 minutes of music in 20 seconds on an A100 GPU)
Supports voice cloning, lyric editing, remixing, and track generation through fine-grained acoustic control

License

Apache License 2.0 - This model is freely available for research and commercial use.

Parameters

ACE-Step

Status (Status) op('acestep').par.Status Str

Default:: "" (Empty String)

Generate Music (Generate) op('acestep').par.Generate Pulse

Default:: False

Caption / Prompt (Prompt) op('acestep').par.Prompt Str

Music description (max 512 chars). Tags, genres, mood.

Default:: groovy funky syncopated

Lyrics (Lyrics) op('acestep').par.Lyrics Str

Lyrics with structure tags like [verse], [chorus]. Use [Instrumental] for no vocals.

Default:: [[instrumental]]

Duration (s) (Duration) op('acestep').par.Duration Float

Audio duration in seconds (10-600).

Default:: 15.0
Range:: 10 to 600

BPM (Bpm) op('acestep').par.Bpm Int

Beats per minute. 0 = auto-detect.

Default:: 0
Range:: 0 to 300

Key (Keyscale) op('acestep').par.Keyscale StrMenu

Musical key. Auto = let the model decide.

Default:

auto

Menu Options:

Auto (auto)
C Major (C major)
C Minor (C minor)
C# Major (C# major)
C# Minor (C# minor)
Db Major (Db major)
Db Minor (Db minor)
D Major (D major)
D Minor (D minor)
D# Major (D# major)
D# Minor (D# minor)
Eb Major (Eb major)
Eb Minor (Eb minor)
E Major (E major)
E Minor (E minor)
F Major (F major)
F Minor (F minor)
F# Major (F# major)
F# Minor (F# minor)
Gb Major (Gb major)
Gb Minor (Gb minor)
G Major (G major)
G Minor (G minor)
G# Major (G# major)
G# Minor (G# minor)
Ab Major (Ab major)
Ab Minor (Ab minor)
A Major (A major)
A Minor (A minor)
A# Major (A# major)
A# Minor (A# minor)
Bb Major (Bb major)
Bb Minor (Bb minor)
B Major (B major)
B Minor (B minor)
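The menu values above follow a regular pattern (a note name followed by "major" or "minor", plus "auto"), so they can be generated and used to validate input before assigning Keyscale. This is a sketch; normalize_key is a hypothetical helper, not part of the operator.

```python
# Note names in the same order as the menu above.
NOTES = ["C", "C#", "Db", "D", "D#", "Eb", "E", "F", "F#",
         "Gb", "G", "G#", "Ab", "A", "A#", "Bb", "B"]

# "auto" followed by "<note> major" / "<note> minor" for every note.
KEY_VALUES = ["auto"] + [f"{note} {mode}" for note in NOTES
                         for mode in ("major", "minor")]


def normalize_key(text):
    """Map loosely formatted input such as ' c#  Minor ' to a menu value."""
    cleaned = " ".join(text.strip().split())
    if cleaned.lower() == "auto":
        return "auto"
    parts = cleaned.split(" ")
    if len(parts) == 2:
        note = parts[0][:1].upper() + parts[0][1:].lower()
        candidate = f"{note} {parts[1].lower()}"
        if candidate in KEY_VALUES:
            return candidate
    raise ValueError(f"not a valid key: {text!r}")


# In TouchDesigner (assumed usage):
#   op('acestep').par.Keyscale = normalize_key('c# minor')
```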

Vocal Language (Vocallanguage) op('acestep').par.Vocallanguage StrMenu

Vocal language. "Instrumental / Auto" for no vocals.

Default:

en

Menu Options:

Instrumental / Auto (unknown)
English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Japanese (ja)
Korean (ko)
Mandarin (zh)
Cantonese (yue)
Arabic (ar)
Azerbaijani (az)
Bulgarian (bg)
Bengali (bn)
Catalan (ca)
Czech (cs)
Danish (da)
Greek (el)
Persian (fa)
Finnish (fi)
Hebrew (he)
Hindi (hi)
Croatian (hr)
Haitian Creole (ht)
Hungarian (hu)
Indonesian (id)
Icelandic (is)
Latin (la)
Lithuanian (lt)
Malay (ms)
Nepali (ne)
Dutch (nl)
Norwegian (no)
Punjabi (pa)
Polish (pl)
Romanian (ro)
Russian (ru)
Sanskrit (sa)
Slovak (sk)
Serbian (sr)
Swedish (sv)
Swahili (sw)
Tamil (ta)
Telugu (te)
Thai (th)
Tagalog (tl)
Turkish (tr)
Ukrainian (uk)
Urdu (ur)
Vietnamese (vi)

Seed (Seed) op('acestep').par.Seed Int

-1 for random seed.

Default:: -1
Range:: -1 to 1000000000

Batch Size (Batchsize) op('acestep').par.Batchsize Int

Number of variations to generate.

Default:: 1
Range:: 1 to 8
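When settings come from a script or an external source, it may be useful to clamp them to the documented ranges before assigning them. The sketch below uses the ranges listed above for Duration, Bpm, Seed, and Batchsize. The helper clamp_settings is hypothetical, and whether the operator clamps values itself is not stated here.

```python
# Ranges copied from the parameter reference above: (minimum, maximum).
RANGES = {
    "Duration": (10, 600),
    "Bpm": (0, 300),
    "Seed": (-1, 1_000_000_000),
    "Batchsize": (1, 8),
}


def clamp_settings(settings):
    """Return a copy of `settings` with ranged values forced into bounds."""
    clamped = dict(settings)
    for name, (low, high) in RANGES.items():
        if name in clamped:
            clamped[name] = max(low, min(high, clamped[name]))
    return clamped


safe = clamp_settings({"Duration": 5, "Bpm": 120, "Batchsize": 12, "Prompt": "lofi"})
```

Keys without a documented range, such as Prompt, pass through unchanged.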

Output Header

Output Folder (Outputfolder) op('acestep').par.Outputfolder Folder

Folder where downloaded audio is saved. May be relative to the project or absolute.

Default:: audio_out

Current Audio File (Currentaudio) op('acestep').par.Currentaudio File

Default:: "" (Empty String)

Server Header

Check Server Health (Checkhealth) op('acestep').par.Checkhealth Pulse

Default:: False

Load Settings from Audio (Loadsettings) op('acestep').par.Loadsettings Pulse

Default:: False

Server Host (Serverhost) op('acestep').par.Serverhost Str

Default:: 127.0.0.1

Server Port (Serverport) op('acestep').par.Serverport Int

Default:: 0
Range:: 1 to 65535
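Before generating, a plain TCP connection test can confirm that something is listening on the configured host and port. This sketch does not use the operator's own health check (whose endpoint is not documented here). It only checks that the port accepts connections, and it treats a port of 0 as "not configured", which is an assumption.

```python
import socket


def server_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    if not 1 <= port <= 65535:
        return False  # 0 or out-of-range: treated as not configured (assumption)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# In TouchDesigner (assumed usage):
#   pars = op('acestep').par
#   ok = server_reachable(pars.Serverhost.eval(), pars.Serverport.eval())
```

A successful connection shows only that a process is listening, not that it is the ACE-Step server or that a model is loaded.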

Playback

Transport Header

Autoplay (Autoplay) op('acestep').par.Autoplay Toggle

Automatically play audio after generation completes.

Default:: True

Play (Play) op('acestep').par.Play Pulse

Default:: False

Pause (Pause) op('acestep').par.Pause Pulse

Default:: False

Stop (Stop) op('acestep').par.Stop Pulse

Default:: False

Replay (Replay) op('acestep').par.Replay Pulse

Default:: False

Next (Next) op('acestep').par.Next Pulse

Default:: False

Previous (Previous) op('acestep').par.Previous Pulse

Default:: False

Playhead (Playhead) op('acestep').par.Playhead Float

Scrub position (0-1).

Default:: 0.0
Range:: 0 to 1
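Because Playhead is a normalized 0-1 position, converting to and from seconds requires the track length. The following sketch shows the arithmetic; both helper names are hypothetical and the track duration is assumed to be known from elsewhere.

```python
def playhead_to_seconds(playhead, duration_s):
    """Convert a normalized playhead (0-1) to a time offset in seconds."""
    clamped = max(0.0, min(1.0, playhead))
    return clamped * duration_s


def seconds_to_playhead(seconds, duration_s):
    """Convert a time offset in seconds to a normalized playhead (0-1)."""
    if duration_s <= 0:
        return 0.0
    return max(0.0, min(1.0, seconds / duration_s))


halfway = playhead_to_seconds(0.5, 15.0)   # midpoint of a 15-second track
quarter = seconds_to_playhead(3.0, 12.0)   # 3 s into a 12-second track
```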

Volume (Volume) op('acestep').par.Volume Float

Default:: 0.7
Range:: 0 to 1

Playlist Header

Track # (Playlistindex) op('acestep').par.Playlistindex Int

Current track index in the generation history.

Default:: 1
Range:: 0 to 999

Total Tracks (Playlistcount) op('acestep').par.Playlistcount Int

Default:: 0
Range:: 0 to 1

Clear History (Clearhistory) op('acestep').par.Clearhistory Pulse

Default:: False

Advanced

Inference Steps (Infersteps) op('acestep').par.Infersteps Int

Turbo: 1-20, Base: 1-200.

Default:: 8
Range:: 1 to 200

Guidance Scale (Guidancescale) op('acestep').par.Guidancescale Float

Default:: 7.0
Range:: 1 to 15

Timestep Shift (Shift) op('acestep').par.Shift Float

Only effective for base models.

Default:: 3.0
Range:: 1 to 5

Use ADG (Useadg) op('acestep').par.Useadg Toggle

Default:: False

CFG Interval Start (Cfgintervalstart) op('acestep').par.Cfgintervalstart Float

Default:: 0.0
Range:: 0 to 1

CFG Interval End (Cfgintervalend) op('acestep').par.Cfgintervalend Float

Default:: 1.0
Range:: 0 to 1
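The reference does not describe how the CFG interval is applied. One common interpretation, assumed here, is that guidance is active only for the fraction of the sampling schedule between the start and end values. The sketch below illustrates that interpretation and may not match ACE-Step's actual behavior; guided_steps is a hypothetical helper.

```python
def guided_steps(num_steps, interval_start=0.0, interval_end=1.0):
    """List the step indices whose normalized position falls in [start, end)."""
    if not 0.0 <= interval_start <= interval_end <= 1.0:
        raise ValueError("expected 0 <= start <= end <= 1")
    return [
        step for step in range(num_steps)
        if interval_start <= step / num_steps < interval_end
    ]


all_steps = guided_steps(8)                 # defaults cover every step
middle_only = guided_steps(8, 0.25, 0.75)   # guidance only in the middle half
```

Under this reading, the defaults of 0.0 and 1.0 leave guidance on for the whole run.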

Task

Source Audio (Srcaudiopath) op('acestep').par.Srcaudiopath File

Source audio file for cover/repaint/lego/extract/complete.

Default:: "" (Empty String)

Reference Audio (Referenceaudiopath) op('acestep').par.Referenceaudiopath File

Reference audio for style guidance.

Default:: "" (Empty String)

Cover Strength (Audiocoverstrength) op('acestep').par.Audiocoverstrength Float

Default:: 1.0
Range:: 0 to 1

Cover Noise Strength (Covernoisestrength) op('acestep').par.Covernoisestrength Float

0.0 = pure noise, 1.0 = closest to source.

Default:: 0.0
Range:: 0 to 1

Instruction (Instruction) op('acestep').par.Instruction Str

Instruction for the model. Default works for most tasks.

Default:: Fill the audio semantic mask based on the given conditions:

Repaint / Complete Header

Repaint Start (s) (Repaintingstart) op('acestep').par.Repaintingstart Float

Default:: 0.0
Range:: 0 to 600

Repaint End (s) (Repaintingend) op('acestep').par.Repaintingend Float

0 = auto (full duration).

Default:: 0.0
Range:: 0 to 600

Repaint Strength (Repaintstrength) op('acestep').par.Repaintstrength Float

Only used in balanced mode. 0=conservative, 1=aggressive.

Default:: 0.5
Range:: 0 to 1
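The "0 = auto (full duration)" rule for Repaint End can be made explicit when preparing values in a script. This sketch resolves the end time and checks the window against the documented 0 to 600 second range. The helper is hypothetical and the source duration is assumed to be known.

```python
MAX_SECONDS = 600.0  # upper bound of the Repaint Start / End parameter range


def resolve_repaint_window(start_s, end_s, duration_s):
    """Return (start, end) in seconds, treating end == 0 as the full duration."""
    if end_s == 0:
        end_s = duration_s
    if not 0.0 <= start_s <= MAX_SECONDS or not 0.0 <= end_s <= MAX_SECONDS:
        raise ValueError("repaint times must lie between 0 and 600 seconds")
    if end_s <= start_s:
        raise ValueError("repaint end must be after repaint start")
    return start_s, end_s


to_the_end = resolve_repaint_window(5.0, 0.0, 30.0)   # end resolves to 30.0
explicit = resolve_repaint_window(5.0, 12.0, 30.0)
```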

LM

Enable LLM (Initllm) op('acestep').par.Initllm Toggle

Load a 5Hz language model for prompt expansion. Improves quality but uses 1-8GB of extra VRAM. Requires a SideCar restart to take effect.

Default:: False

Enable LM Thinking (Thinking) op('acestep').par.Thinking Toggle

Use 5Hz LM chain-of-thought (higher quality, slower). Requires LLM enabled.

Default:: False

Enhance Input via LM (Useformat) op('acestep').par.Useformat Toggle

Use format_sample() to enhance caption/lyrics. Requires LLM enabled.

Default:: False

LM Temperature (Lmtemperature) op('acestep').par.Lmtemperature Float

Default:: 0.85
Range:: 0 to 2

LM CFG Scale (Lmcfgscale) op('acestep').par.Lmcfgscale Float

Default:: 2.5
Range:: 0 to 10

LM Top P (Lmtopp) op('acestep').par.Lmtopp Float

Default:: 0.9
Range:: 0 to 1
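Two of the toggles above (Thinking and Useformat) are documented as requiring Initllm. A small pre-flight check, sketched below, can flag combinations that would have no effect. The helper name and message wording are hypothetical.

```python
def lm_option_warnings(initllm, thinking, useformat):
    """Return warnings for LM toggles that are on while the LLM is disabled."""
    warnings = []
    if thinking and not initllm:
        warnings.append("Thinking requires Initllm (Enable LLM) to be on")
    if useformat and not initllm:
        warnings.append("Useformat requires Initllm (Enable LLM) to be on")
    return warnings


problems = lm_option_warnings(initllm=False, thinking=True, useformat=True)

# In TouchDesigner (assumed usage):
#   p = op('acestep').par
#   for msg in lm_option_warnings(p.Initllm.eval(), p.Thinking.eval(), p.Useformat.eval()):
#       print(msg)
```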

Tool Toggles

Generate Music (Enablegenerate) op('acestep').par.Enablegenerate Toggle

Default:: False

Playback (Enableplayback) op('acestep').par.Enableplayback Toggle

Expose playback tool — transport controls, track switching, volume.

Default:: False

Get Tracks (Enablegettracks) op('acestep').par.Enablegettracks Toggle

Expose get_tracks tool — browse generation history and track metadata.

Default:: False

Changelog

v3.0.0 (2026-05-02)

Added Toolname/Tooldescription pars for all 3 tools (generate, playback, get_tracks)
Added _tool_decl helper for parameter-driven tool name/description resolution
Updated category to Pipelines
Rewrote GetTool to use tool_definition wrapper format
Major extension rewrite (578 insertions)
Added docs
Rewrote extension as thin HTTP client to ACE-Step 1.5 API sidecar
Removed embedded pipeline, dependency management, and repo cloning logic
Added aiohttp-based task submission, polling, and audio download

v2.0.0 (2025-07-19)

🎨 Audio Visualization & Enhanced User Experience

Real-Time Audio Visualization:

Professional waveform visualization with frequency analysis
High-quality grayscale waveform rendering at 1280x960 resolution
Dynamic amplitude processing with transient emphasis
Frequency-based brightness variation for rich visual feedback
Anti-aliasing and smooth envelope generation

Automatic Visualization Triggers:

Auto-visualization after successful generation
Manual visualization via Currentaudio parameter changes
Smart path tracking to prevent redundant processing

Visual Feedback Enhancements:

Black screen clearing when no audio is selected
Visual confirmation of current audio status
Seamless integration with audio playback controls

TECHNICAL IMPROVEMENTS:

Async Visualization Processing:

Non-blocking waveform generation using TDAsyncIO
Thread-safe audio analysis with librosa integration
Graceful fallback to synchronous processing when needed

Robust Error Handling:

Fixed critical len() type errors in visualization pipeline
Comprehensive try/catch blocks around FFT processing
Safe numpy array type checking throughout audio pipeline

Parameter Callback System:

New Currentaudio() callback method for parameter-driven visualization
Intelligent request state checking to prevent conflicts during generation
Path validation and existence checking before processing

BUG FIXES:

Critical Stability Fixes:

Resolved TouchDesigner crashes caused by async visualization errors
Fixed "object of type 'int' has no len()" errors in audio processing
Improved error handling in FFT frequency analysis
Safe handling of edge cases in audio array processing

Visualization Pipeline Fixes:

Proper numpy array type validation throughout processing chain
Graceful handling of malformed or empty audio files
Improved error logging for debugging visualization issues

USER EXPERIENCE ENHANCEMENTS:

Visual Audio Management:

Immediate visual feedback when changing current audio file
Clear visual indication when no audio is loaded (black screen)
Smooth integration between generation and visualization workflows

Status & Logging Improvements:

Enhanced logging for visualization processes
Clear status messages for audio loading and processing
Improved error messages for troubleshooting

TECHNICAL DETAILS:

Visualization Engine:

Uses librosa for professional audio analysis
Implements RMS and peak envelope detection
FFT-based frequency analysis for visual brightness variation
Supports both sync and async processing modes

Integration Points:

Seamless connection with existing audio playback system
Compatible with all generation modes (text2music, audio2audio, editing)
Maintains full backward compatibility with v1.0.0 workflows

PERFORMANCE:

Optimized waveform generation with configurable resolution
Efficient memory usage in visualization processing
Non-blocking UI during visualization generation

v1.0.0 (2025-06-20)

🎵 Initial Release - ACE-Step Music Generation Integration

NEW FEATURES:

Text-to-Music Generation: Generate music from text prompts and descriptive tags
Lyrics Support: Full lyrics integration with structure tags like [verse], [chorus]
Audio2Audio Mode: Transform existing audio using prompts and lyrics as guidance
Advanced Audio Editing: Complete suite of audio manipulation tools:

Edit Audio Content: Modify existing audio with target prompts/lyrics
Repaint Audio Segment: Replace specific time segments
Retake Full Audio: Generate variations of entire audio
Extend Audio Duration: Extend audio beyond original length

Professional Parameter Control:

Inference steps, guidance scales, scheduler types (Euler, Heun)
CFG types (APG, CFG, Zero STAR, Double Condition)
ERG (Exponentially Smoothed Moving Average Guidance) controls
Manual seed support for reproducible generation

SideCar Integration: Seamless integration with SideCar server for distributed processing
Dependency Management: Automatic detection and installation of required Python packages
Output Management:

Configurable output folders and filenames
Automatic unique timestamp suffixes
JSON parameter saving for reproducibility

Settings Management: Load/save generation parameters from JSON files
Audio Playback: Built-in audio playback with playhead control
Model Management: Initialize, load, and unload models on demand

TECHNICAL FEATURES:

Three-Page Parameter Layout:

Main: Core generation and output settings
Edit: Audio editing and manipulation controls
Advanced: Professional diffusion and guidance parameters

Async Processing: Non-blocking generation via TDAsyncIO integration
Error Handling: Comprehensive dependency checking and error recovery
Status Monitoring: Real-time status updates and progress tracking

SUPPORTED WORKFLOWS:

Text → Music: Generate music from descriptive prompts
Audio → Audio: Transform existing audio with new characteristics
Audio Editing: Professional audio manipulation and refinement
Batch Processing: Generate multiple variations with different seeds

REQUIREMENTS:

ACE-Step repository (user must clone and configure)
SideCar operator for processing
Python dependencies (auto-installed when possible)
Optional: Custom checkpoints directory
