# ACE-Step (v3.0.0)

## Overview

The ACE-Step Music Generator integrates the ACE-Step diffusion model into TouchDesigner for text-to-music generation, audio-to-audio transformation, and audio editing. All inference runs on the external SideCar process, keeping TouchDesigner responsive. Generated audio includes a real-time waveform visualizer and optional autoplay.
## Key Features

- Text-to-Music: Generate music from descriptive tags, genres, and structured lyrics
- Audio-to-Audio: Transform existing audio using a reference file with adjustable influence strength
- Audio Editing: Edit, repaint, retake, or extend existing audio with fine-grained control
- SideCar Architecture: All model loading and inference is offloaded to an external process
- Auto Repository Setup: Prompts to download and clone the ACE-Step repository on first use
- Settings Recall: Save and reload generation parameters from previous outputs
## Requirements

- SideCar Operator: Must be running and connected. All model inference happens there
- SideCar Python Environment: All ACE-Step dependencies (`torch`, `torchaudio`, `librosa`, `diffusers`, etc.) must be installed in the SideCar's Python environment. This operator does not manage packages
- Git: Must be installed and in your system PATH for automatic repository cloning
## Input/Output

### Inputs

None required. Queries are configured via parameters. Reference audio files are specified by file path for audio-to-audio and editing modes.
### Outputs

- Waveform Visualization: Real-time visual waveform rendered to an internal scriptTOP
- Audio Files: Generated WAV files saved to the configured output folder
## Usage Examples

### Text-to-Music Generation

- Ensure the SideCar is running and connected (check the About page, 'SideCar Operator')
- On the ACE-Step page, enter descriptive tags in ‘Prompt / Tags’ (e.g., “upbeat pop, catchy melody, female singer”)
- Enter structured lyrics in 'Lyrics' using tags like `[verse]` and `[chorus]`
- Set 'Audio Duration' to the desired length in seconds
- Pulse ‘Generate Music’
- If this is your first time, a dialog will prompt you to download the ACE-Step repository — click Download and wait for it to finish, then pulse ‘Generate Music’ again
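The steps above can also be driven from a script (for example a Text DAT). This is a sketch, assuming the scripting parameter names listed in the Parameters section (`Prompt`, `Lyrics`, `Duration`, `Seed`, `Generate`); since `op()` only exists inside TouchDesigner, small stand-in classes are included here so the sketch is self-contained.

```python
# Minimal stand-ins for TouchDesigner's op()/parameter objects so the sketch
# can be read and exercised outside TD. Inside TouchDesigner, delete these
# and use the built-in op().
class _Par:
    def __init__(self, val=None):
        self.val = val
        self.pulsed = False
    def pulse(self):
        self.pulsed = True

class _Pars:
    def __init__(self, **pars):
        for name, val in pars.items():
            setattr(self, name, _Par(val))

class _FakeOp:
    def __init__(self):
        self.par = _Pars(Prompt='groovy funky syncopated', Lyrics='',
                         Duration=15.0, Seed=-1, Generate=None)

_ops = {}
def op(path):
    # TouchDesigner provides op() globally; this cache mimics it
    return _ops.setdefault(path, _FakeOp())

# --- configure and fire a text-to-music request ---
gen = op('acestep')
gen.par.Prompt.val = 'upbeat pop, catchy melody, female singer'
gen.par.Lyrics.val = '[verse]\nCity lights are calling\n[chorus]\nWe run all night'
gen.par.Duration.val = 30      # seconds, within the documented 10-600 range
gen.par.Seed.val = 42          # fixed seed for a reproducible take
gen.par.Generate.pulse()       # equivalent to pulsing 'Generate Music' in the UI
```

Inside TouchDesigner the last six lines are all you need; the first-run repository download dialog still applies the first time you pulse Generate.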
### Audio-to-Audio Transformation

- On the ACE-Step page, toggle 'Enable Audio2Audio' to On
- Set ‘Reference Audio Input’ to your source audio file
- Adjust ‘Reference Audio Strength’ — higher values stay closer to the reference
- Enter a prompt and lyrics to guide the transformation
- Pulse ‘Generate Music’
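Per the v3.0.0 changelog, the extension is a thin HTTP client to an ACE-Step sidecar. As a rough illustration of what an audio-to-audio request carries, here is a hypothetical request-body builder; every field name (`task`, `ref_audio_input`, and so on) is an assumption for illustration, not the sidecar's actual schema.

```python
# Hypothetical request-body builder for audio-to-audio. Field names are
# illustrative assumptions only; the real sidecar schema may differ.
def build_audio2audio_request(prompt, lyrics, ref_path, ref_strength, duration=15.0):
    if not 0.0 <= ref_strength <= 1.0:
        raise ValueError('reference strength must be in [0, 1]')
    return {
        'task': 'audio2audio',               # hypothetical task identifier
        'prompt': prompt,                    # tags/genres, as in text-to-music
        'lyrics': lyrics,
        'ref_audio_input': ref_path,         # reference file, by path
        'ref_audio_strength': ref_strength,  # higher = closer to the reference
        'audio_duration': duration,          # seconds
    }

req = build_audio2audio_request(
    'lofi hip hop, warm vinyl texture', '[instrumental]',
    'audio_in/loop.wav', 0.7)
```

The strength check mirrors the 0-1 range of the 'Reference Audio Strength' parameter.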
### Audio Editing (Edit, Repaint, Retake, Extend)

- On the Edit page, toggle 'Enable Audio Editing/Manipulation' to On
- Set ‘Source Audio Path’ to the audio you want to modify
- Select an ‘Edit Mode’:
- Edit Audio Content: Changes the content of the audio using original and target prompts/lyrics. Requires filling in ‘Original Prompt’ and ‘Original Lyrics’ — pulse ‘Load Src Credentials’ to auto-fill these from a previous generation’s saved parameters
- Extend Audio Duration: Extends the audio by setting ‘Extend Start’ and ‘Extend End’ beyond the original boundaries
- Repaint Audio Segment: Regenerates a time region defined by start/end times
- Retake Full Audio: Regenerates the entire audio with variance control
- Adjust ‘Variance’ and ‘Variant Seed’ for variation control
- Pulse ‘Generate Music’ on the ACE-Step page
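The distinction between the modes above can be made concrete with a small time-window validator: repaint regenerates a region inside the source audio, while extend must reach beyond its boundaries. This is a sketch of the rules as described, not the operator's own validation code.

```python
def check_edit_window(mode, start, end, source_duration):
    """Validate an edit time window, in seconds.

    Sketch of the rules described above: 'repaint' must stay inside the
    source audio, 'extend' must reach beyond at least one boundary.
    """
    if end <= start:
        raise ValueError('end must be after start')
    if mode == 'repaint':
        if start < 0 or end > source_duration:
            raise ValueError('repaint region must stay inside the source audio')
    elif mode == 'extend':
        if start >= 0 and end <= source_duration:
            raise ValueError('extend must go beyond the original boundaries')
    else:
        raise ValueError(f'unknown mode: {mode}')
    return True
```

For example, extending a 15-second track requires 'Extend End' past 15.0 (or 'Extend Start' below 0.0), whereas a repaint window of 2.0 to 8.0 is valid as-is.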
### Reloading Previous Settings

- Set 'Current Audio' to a previously generated WAV file
- Pulse ‘Settings from Current Audio’ — this loads all generation parameters from the associated JSON file saved alongside the audio
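Conceptually, this recall step reads the parameter JSON saved next to the WAV. A plain-Python sketch, assuming the JSON shares the audio file's base name (the exact naming convention is an assumption):

```python
import json
from pathlib import Path

def load_generation_settings(wav_path):
    """Load the parameter JSON saved alongside a generated WAV.

    Assumes the JSON shares the audio file's base name (take_01.wav ->
    take_01.json); the exact naming is an assumption, adjust as needed.
    """
    json_path = Path(wav_path).with_suffix('.json')
    if not json_path.exists():
        raise FileNotFoundError(f'no settings file found next to {wav_path}')
    with open(json_path, encoding='utf-8') as f:
        return json.load(f)
```

This is also a convenient way to inspect or diff the settings of two takes outside TouchDesigner.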
## Best Practices

- Set 'Manual Seed' to a specific value for reproducible results, or leave at -1 for random
- Toggle ‘Add Unique Suffix to Filename’ to On to prevent overwriting previous outputs
- For audio editing, always use ‘Load Src Credentials’ to auto-fill the original prompt and lyrics rather than typing them manually
- Higher ‘Inference Steps’ improve quality but increase generation time — 60 is a good starting point
- On the Advanced page, ‘Use bfloat16 Precision’ speeds up inference on supported GPUs. Disable it on macOS or if you encounter errors
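The seed and batch tips above combine naturally into a reproducible-batch workflow. The helper below is hypothetical, showing one convention for deriving per-variation seeds from a single base seed; the operator's internal seeding scheme may differ.

```python
import random

def variation_seeds(base_seed, count):
    """Derive one deterministic seed per batch variation from a base seed.

    With a fixed base seed the whole batch is reproducible; -1 picks a
    random base first, mirroring the Manual Seed convention above. A
    sketch of a workflow, not the operator's internal seeding scheme.
    """
    if base_seed == -1:
        base_seed = random.randrange(1_000_000_000)
    return [(base_seed + i) % 1_000_000_000 for i in range(count)]

seeds = variation_seeds(42, 4)  # -> [42, 43, 44, 45]
```

Logging these seeds next to your outputs means any single variation can be regenerated later without rerunning the whole batch.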
## Troubleshooting

- SideCar Not Connected: Check that the SideCar server is running. Verify the 'SideCar Operator' reference on the About page points to the correct operator
- Repository Missing: If the clone prompt appears repeatedly, check your internet connection and Git installation. Review the TouchDesigner console for detailed errors
- Missing Dependencies: Errors about missing Python packages (e.g., `torch`, `librosa`) mean you need to install them in the SideCar's Python environment manually
- torch.compile() Not Supported on Windows: The ACE-Step model does not support `torch.compile()` on Windows. Leave this toggle off unless running on Linux
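Many of these issues come down to watching the operator's `Status` parameter until it settles. A generic polling helper illustrates the pattern; `fake_status_ready` below is a stand-in for a real check, such as a callable that reads `op('acestep').par.Status` inside TouchDesigner.

```python
import time

def wait_until(predicate, timeout=60.0, interval=0.5):
    """Poll predicate() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Demonstration: a counter stands in for a real status check.
state = {'ticks': 0}
def fake_status_ready():
    state['ticks'] += 1
    return state['ticks'] >= 3

ready = wait_until(fake_status_ready, timeout=5.0, interval=0.01)
```

Note that inside TouchDesigner a blocking loop like this would stall the UI; prefer a Timer CHOP or run-delayed callback there, and use this pattern only in external scripts.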
## Research Citation

### Research & Licensing

The ACE-STEP project is an open-source initiative focused on advancing AI music generation. ACE-Step is a foundation model for music generation that integrates diffusion-based generation with advanced encoding and transformation techniques.
### Technical Details
- Combines diffusion with DCAE and linear transformer architecture
- Uses MERT and m-hubert for semantic representation alignment (REPA)
- Supports text-to-music, audio-to-audio, edit, repaint, retake, and extend tasks
### Research Impact
- Provides a holistic open-source architecture for state-of-the-art music generation
- Enables original music generation across diverse genres for creative production and education
### Citation
@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}

### Key Research Contributions
- Open-source foundation model for music generation using diffusion with Deep Compression AutoEncoder (DCAE) and lightweight linear transformer
- Leverages MERT and m-hubert for semantic alignment (REPA) enabling rapid training convergence
- Faster synthesis than LLM-based models (up to 4 minutes of music in 20 seconds on A100 GPU)
- Supports voice cloning, lyric editing, remixing, and track generation through fine-grained acoustic control
### License
Apache License 2.0 - This model is freely available for research and commercial use.
## Parameters

### ACE-Step

- `op('acestep').par.Status` (Str): Default `""` (empty string)
- `op('acestep').par.Generate` (Pulse): Default `False`
- `op('acestep').par.Prompt` (Str): Music description (max 512 chars). Tags, genres, mood. Default `groovy funky syncopated`
- `op('acestep').par.Lyrics` (Str): Lyrics with structure tags like [verse], [chorus]. Use [Instrumental] for no vocals. Default `[[instrumental]]`
- `op('acestep').par.Duration` (Float): Audio duration in seconds. Default `15.0`, Range 10 to 600
- `op('acestep').par.Bpm` (Int): Beats per minute. 0 = auto-detect. Default `0`, Range 0 to 300
- `op('acestep').par.Seed` (Int): -1 for random seed. Default `-1`, Range -1 to 1000000000
- `op('acestep').par.Batchsize` (Int): Number of variations to generate. Default `1`, Range 1 to 8
- `op('acestep').par.Outputfolder` (Folder): Folder to save downloaded audio. Relative to project or absolute. Default `audio_out`
- `op('acestep').par.Currentaudio` (File): Default `""` (empty string)
- `op('acestep').par.Checkhealth` (Pulse): Default `False`
- `op('acestep').par.Loadsettings` (Pulse): Default `False`
- `op('acestep').par.Serverhost` (Str): Default `127.0.0.1`
- `op('acestep').par.Serverport` (Int): Default `0`, Range 1 to 65535
### Playback

- `op('acestep').par.Autoplay` (Toggle): Automatically play audio after generation completes. Default `True`
- `op('acestep').par.Play` (Pulse): Default `False`
- `op('acestep').par.Pause` (Pulse): Default `False`
- `op('acestep').par.Stop` (Pulse): Default `False`
- `op('acestep').par.Replay` (Pulse): Default `False`
- `op('acestep').par.Next` (Pulse): Default `False`
- `op('acestep').par.Previous` (Pulse): Default `False`
- `op('acestep').par.Playhead` (Float): Scrub position (0-1). Default `0.0`, Range 0 to 1
- `op('acestep').par.Volume` (Float): Default `0.7`, Range 0 to 1
- `op('acestep').par.Playlistindex` (Int): Current track index in the generation history. Default `1`, Range 0 to 999
- `op('acestep').par.Playlistcount` (Int): Default `0`, Range 0 to 1
- `op('acestep').par.Clearhistory` (Pulse): Default `False`
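Since `Playhead` is normalized to 0-1, scrubbing to an absolute time means dividing by the track duration. A small pair of helpers, assuming times in seconds:

```python
def seconds_to_playhead(seconds, duration):
    """Map a time in seconds to the normalized Playhead range (0-1)."""
    if duration <= 0:
        raise ValueError('duration must be positive')
    return min(max(seconds / duration, 0.0), 1.0)

def playhead_to_seconds(playhead, duration):
    """Inverse mapping, for displaying the scrub position in seconds."""
    return min(max(playhead, 0.0), 1.0) * duration
```

For example, `op('acestep').par.Playhead = seconds_to_playhead(7.5, 15.0)` would scrub a 15-second track to its midpoint.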
### Advanced

- `op('acestep').par.Infersteps` (Int): Turbo: 1-20, Base: 1-200. Default `8`, Range 1 to 200
- `op('acestep').par.Guidancescale` (Float): Default `7.0`, Range 1 to 15
- `op('acestep').par.Shift` (Float): Only effective for base models. Default `3.0`, Range 1 to 5
- `op('acestep').par.Useadg` (Toggle): Default `False`
- `op('acestep').par.Cfgintervalstart` (Float): Default `0.0`, Range 0 to 1
- `op('acestep').par.Cfgintervalend` (Float): Default `1.0`, Range 0 to 1
- `op('acestep').par.Srcaudiopath` (File): Source audio file for cover/repaint/lego/extract/complete. Default `""` (empty string)
- `op('acestep').par.Referenceaudiopath` (File): Reference audio for style guidance. Default `""` (empty string)
- `op('acestep').par.Audiocoverstrength` (Float): Default `1.0`, Range 0 to 1
- `op('acestep').par.Covernoisestrength` (Float): 0.0 = pure noise, 1.0 = closest to source. Default `0.0`, Range 0 to 1
- `op('acestep').par.Instruction` (Str): Instruction for the model. Default works for most tasks. Default `Fill the audio semantic mask based on the given conditions:`
- `op('acestep').par.Repaintingstart` (Float): Default `0.0`, Range 0 to 600
- `op('acestep').par.Repaintingend` (Float): 0 = auto (full duration). Default `0.0`, Range 0 to 600
- `op('acestep').par.Repaintstrength` (Float): Only used in balanced mode. 0 = conservative, 1 = aggressive. Default `0.5`, Range 0 to 1
- `op('acestep').par.Initllm` (Toggle): Load a 5Hz language model for prompt expansion. Improves quality but uses 1-8GB extra VRAM. Requires sidecar restart to take effect. Default `False`
- `op('acestep').par.Thinking` (Toggle): Use 5Hz LM chain-of-thought (higher quality, slower). Requires LLM enabled. Default `False`
- `op('acestep').par.Useformat` (Toggle): Use format_sample() to enhance caption/lyrics. Requires LLM enabled. Default `False`
- `op('acestep').par.Lmtemperature` (Float): Default `0.85`, Range 0 to 2
- `op('acestep').par.Lmcfgscale` (Float): Default `2.5`, Range 0 to 10
- `op('acestep').par.Lmtopp` (Float): Default `0.9`, Range 0 to 1
### Tool Toggles

- `op('acestep').par.Enablegenerate` (Toggle): Default `False`
- `op('acestep').par.Enableplayback` (Toggle): Expose playback tool — transport controls, track switching, volume. Default `False`
- `op('acestep').par.Enablegettracks` (Toggle): Expose get_tracks tool — browse generation history and track metadata. Default `False`
## Changelog

### v3.0.0 (2026-05-02)

- Added Toolname/Tooldescription pars for all 3 tools (generate, playback, get_tracks)
- Added `_tool_decl` helper for parameter-driven tool name/description resolution
- Updated category to Pipelines
- Rewrote GetTool to use tool_definition wrapper format
- Major extension rewrite (578 insertions); added docs
- Rewrote extension as thin HTTP client to ACE-Step 1.5 API sidecar
- Removed embedded pipeline, dependency management, and repo cloning logic
- Added aiohttp-based task submission, polling, and audio download
### v2.0.0 (2025-07-19)

🎨 Audio Visualization & Enhanced User Experience

- Real-Time Audio Visualization:
  - Professional waveform visualization with frequency analysis
  - High-quality grayscale waveform rendering at 1280x960 resolution
  - Dynamic amplitude processing with transient emphasis
  - Frequency-based brightness variation for rich visual feedback
  - Anti-aliasing and smooth envelope generation
- Automatic Visualization Triggers:
  - Auto-visualization after successful generation
  - Manual visualization via `Currentaudio` parameter changes
  - Smart path tracking to prevent redundant processing
- Visual Feedback Enhancements:
  - Black screen clearing when no audio is selected
  - Visual confirmation of current audio status
  - Seamless integration with audio playback controls

TECHNICAL IMPROVEMENTS:

- Async Visualization Processing:
  - Non-blocking waveform generation using TDAsyncIO
  - Thread-safe audio analysis with librosa integration
  - Graceful fallback to synchronous processing when needed
- Robust Error Handling:
  - Fixed critical `len()` type errors in visualization pipeline
  - Comprehensive try/catch blocks around FFT processing
  - Safe numpy array type checking throughout audio pipeline
- Parameter Callback System:
  - New `Currentaudio()` callback method for parameter-driven visualization
  - Intelligent request state checking to prevent conflicts during generation
  - Path validation and existence checking before processing

BUG FIXES:

- Critical Stability Fixes:
  - Resolved TouchDesigner crashes caused by async visualization errors
  - Fixed "object of type 'int' has no len()" errors in audio processing
  - Improved error handling in FFT frequency analysis
  - Safe handling of edge cases in audio array processing
- Visualization Pipeline Fixes:
  - Proper numpy array type validation throughout processing chain
  - Graceful handling of malformed or empty audio files
  - Improved error logging for debugging visualization issues

USER EXPERIENCE ENHANCEMENTS:

- Visual Audio Management:
  - Immediate visual feedback when changing current audio file
  - Clear visual indication when no audio is loaded (black screen)
  - Smooth integration between generation and visualization workflows
- Status & Logging Improvements:
  - Enhanced logging for visualization processes
  - Clear status messages for audio loading and processing
  - Improved error messages for troubleshooting

TECHNICAL DETAILS:

- Visualization Engine:
  - Uses librosa for professional audio analysis
  - Implements RMS and peak envelope detection
  - FFT-based frequency analysis for visual brightness variation
  - Supports both sync and async processing modes
- Integration Points:
  - Seamless connection with existing audio playback system
  - Compatible with all generation modes (text2music, audio2audio, editing)
  - Maintains full backward compatibility with v1.0.0 workflows

PERFORMANCE:

- Optimized waveform generation with configurable resolution
- Efficient memory usage in visualization processing
- Non-blocking UI during visualization generation
### v1.0.0 (2025-06-20)
🎵 Initial Release - ACE-Step Music Generation Integration
NEW FEATURES:
- Text-to-Music Generation: Generate music from text prompts and descriptive tags
- Lyrics Support: Full lyrics integration with structure tags like [verse], [chorus]
- Audio2Audio Mode: Transform existing audio using prompts and lyrics as guidance
- Advanced Audio Editing: Complete suite of audio manipulation tools:
- Edit Audio Content: Modify existing audio with target prompts/lyrics
- Repaint Audio Segment: Replace specific time segments
- Retake Full Audio: Generate variations of entire audio
- Extend Audio Duration: Extend audio beyond original length
- Professional Parameter Control:
- Inference steps, guidance scales, scheduler types (Euler, Heun)
- CFG types (APG, CFG, Zero STAR, Double Condition)
- ERG (Exponentially Smoothed Moving Average Guidance) controls
- Manual seed support for reproducible generation
- SideCar Integration: Seamless integration with SideCar server for distributed processing
- Dependency Management: Automatic detection and installation of required Python packages
- Output Management:
- Configurable output folders and filenames
- Automatic unique timestamp suffixes
- JSON parameter saving for reproducibility
- Settings Management: Load/save generation parameters from JSON files
- Audio Playback: Built-in audio playback with playhead control
- Model Management: Initialize, load, and unload models on demand
TECHNICAL FEATURES:
- Three-Page Parameter Layout:
- Main: Core generation and output settings
- Edit: Audio editing and manipulation controls
- Advanced: Professional diffusion and guidance parameters
- Async Processing: Non-blocking generation via TDAsyncIO integration
- Error Handling: Comprehensive dependency checking and error recovery
- Status Monitoring: Real-time status updates and progress tracking
SUPPORTED WORKFLOWS:
- Text → Music: Generate music from descriptive prompts
- Audio → Audio: Transform existing audio with new characteristics
- Audio Editing: Professional audio manipulation and refinement
- Batch Processing: Generate multiple variations with different seeds
REQUIREMENTS:
- ACE-Step repository (user must clone and configure)
- SideCar operator for processing
- Python dependencies (auto-installed when possible)
- Optional: Custom checkpoints directory