TTS Kyutai

v1.1.2 (Updated)

The TTS Kyutai operator runs local text-to-speech synthesis using Kyutai’s neural TTS models (derived from the Moshi speech-text foundation model). It launches an external Python worker process for GPU-accelerated inference and streams audio frames back into TouchDesigner as they are generated.

Key Features

Local inference — no API keys or cloud services required
Streaming synthesis — audio frames arrive progressively during generation
250+ built-in voices across multiple voice sets (VCTK, Expresso, CML-TTS, Unmute)
Voice search — filter the large voice library by keyword
Extend mode — append new speech to existing audio instead of replacing it
Auto-save to disk — optionally save every synthesis as WAV or OGG with metadata
TCP worker reattach — survive TouchDesigner file saves without reloading the model

Requirements

Python packages: moshi, torch (with CUDA 12.1), huggingface_hub — install via the Install/Settings page. The installer pins PyTorch 2.4.0 with CUDA 12.1 (cu121) for compatibility.
Models: The TTS model and voice repository must be downloaded from HuggingFace before first use
Hardware: NVIDIA GPU with CUDA 12.1-compatible drivers strongly recommended; CPU inference is supported but significantly slower
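The changelog notes that dependency checking uses importlib.metadata instead of importing moshi, which avoids the import-time freeze. The sketch below applies that idea to the packages above and compares torch against the pinned 2.4.0 release. It is a minimal illustration, not the operator's own check. The function names are hypothetical, and it must run in the same Python environment the worker uses.

```python
from importlib import metadata

# Packages named under Requirements. PINNED_TORCH mirrors the version the
# installer is described as pinning; the CUDA build suffix (e.g. "+cu121")
# is ignored when comparing.
REQUIRED = ("moshi", "torch", "huggingface_hub")
PINNED_TORCH = "2.4.0"


def check_dependencies(packages=REQUIRED):
    """Map each package to its installed version, or None if missing.

    importlib.metadata reads package metadata only, so nothing is imported.
    """
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def torch_matches_pin(versions):
    """True if torch is installed at the pinned release (any local build tag)."""
    installed = versions.get("torch")
    return installed is not None and installed.split("+")[0] == PINNED_TORCH


if __name__ == "__main__":
    versions = check_dependencies()
    print("missing:", [n for n, v in versions.items() if v is None])
    print("torch matches pin:", torch_matches_pin(versions))
```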

Input/Output

Inputs

None — text is entered directly via the Input Text field on the KyutaiTTS page.

Outputs

Output 1: store_output CHOP — generated audio at 24 kHz (mono)
Output 2: synthesis_log DAT — timestamped log of all synthesis operations
Output 3: text_queue DAT — queued text entries

Usage Examples

First-Time Setup

On the Install/Settings page, pulse Dependencies Available if it shows missing packages. Restart TouchDesigner after installation completes.
Pulse Download Model to fetch the TTS model from HuggingFace.
Pulse Download Voices to fetch the voice repository.
On the KyutaiTTS page, pulse Initialize TTS Kyutai to launch the worker process. The status will show “Ready” when the model is loaded.

Basic Speech Generation

Select a voice from the Voice menu (use Search Voices to filter by name or style, e.g. “happy”, “whisper”, “narration”).
Type text into the Input Text field.
Pulse Generate Speech. Audio appears progressively in the store_output CHOP.
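The same steps can be driven from a script, for example from a Text DAT or the Textport. The sketch assumes the component is reachable as op('tts_kyutai') and that the Status parameter starts with "Ready" once the model is loaded, as First-Time Setup describes. The helper name speak is hypothetical. It only uses the documented Inputtext and Texttospeech parameters plus TouchDesigner's standard Par.eval() and Par.pulse().

```python
def speak(comp, text):
    """Send text to a TTS Kyutai component; return False if it isn't ready.

    Assumes Status reads "Ready..." once the worker has loaded the model.
    """
    status = str(comp.par.Status.eval()).strip().lower()
    if not status.startswith("ready"):   # e.g. "TTS engine is not ready"
        return False
    comp.par.Inputtext = text            # same as typing into Input Text
    comp.par.Texttospeech.pulse()        # same as pulsing Generate Speech
    return True

# Inside TouchDesigner:
# speak(op('tts_kyutai'), 'Hello from TouchDesigner.')
```

Checking the status first avoids pulsing Generate Speech while the worker is still starting.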

Audio Playback

The Playback page controls how generated audio is played back through your system’s audio hardware.

Enable Active to hear synthesized audio through your speakers or headphones.
Select a Driver (default DirectSound/CoreAudio, or ASIO for low-latency setups) and choose the target Device from the menu.
Adjust Volume to control playback level.
If playback stalls or behaves unexpectedly, pulse Reset Playback to reinitialize the audio output.

Extending Audio

Enable Extend Current Audio on the KyutaiTTS page to append new speech to the end of the existing audio buffer instead of replacing it. This is useful for building up longer recordings across multiple synthesis passes — each pulse of Generate Speech adds to what is already in the output rather than clearing it first.
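The difference between replacing and extending comes down to whether the buffer is cleared before the streamed frames of a new pass are added. In this sketch, plain Python lists stand in for the store_output CHOP. The function name is hypothetical, and it is not the operator's internal code.

```python
def run_pass(existing, chunks, extend):
    """Return the buffer after one Generate Speech pass.

    existing: samples already in the output; chunks: streamed frames as they arrive.
    extend=True keeps existing audio (Extend Current Audio on); False clears it first.
    """
    buffer = list(existing) if extend else []
    for chunk in chunks:          # frames arrive progressively during generation
        buffer.extend(chunk)
    return buffer
```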

Saving Audio to Disk

On the Playback page, enable Auto Save To Disk to save every synthesis automatically, or pulse Save Current Audio for manual saves.
Set the Save Folder, Base Name (supports $TIMESTAMP placeholder), and File Type (WAV or OGG).
Enable Auto Version Files to avoid overwriting existing files.
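This sketch shows how Save Folder, Base Name, File Type and Auto Version Files could combine into an output path. The timestamp format (YYYYMMDD_HHMMSS), the fallback base name "tts" and the function name are assumptions. The operator's own naming may differ.

```python
import os
from datetime import datetime


def resolve_save_path(folder, base_name, file_type="wav", auto_version=True, now=None):
    """Build a path from Save Folder, Base Name and File Type.

    $TIMESTAMP is replaced with the current time (format assumed). With
    auto_version, _1, _2, ... are appended until the name is unused.
    """
    now = now or datetime.now()
    name = (base_name or "tts").replace("$TIMESTAMP", now.strftime("%Y%m%d_%H%M%S"))
    ext = "." + file_type.lower().lstrip(".")
    path = os.path.join(folder, name + ext)
    if not auto_version:
        return path                      # an existing file would be overwritten
    n = 1
    while os.path.exists(path):
        path = os.path.join(folder, f"{name}_{n}{ext}")
        n += 1
    return path
```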

Troubleshooting

“TTS engine is not ready” — Pulse Initialize TTS Kyutai and wait for the status to show “Ready”.
No audio output from the Playback page — Check that Active is enabled and the correct Device and Driver are selected.
Worker crashes on start — Verify CUDA drivers are installed. Try setting Device to “CPU” on the Install/Settings page as a fallback.
Voices menu shows “(Download Voices First)” — Pulse Download Voices on the Install/Settings page.
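When the worker crashes on start, first check whether the worker's PyTorch can see the GPU at all. Run this sketch with the same Python interpreter the worker uses. It relies only on standard PyTorch calls. The function name is hypothetical.

```python
def describe_torch_device():
    """Summarise what the current Python's PyTorch can use for inference."""
    try:
        import torch
    except (ImportError, OSError):   # missing package or a broken CUDA/DLL setup
        return "torch unavailable"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"cuda: {name} (built for CUDA {torch.version.cuda})"
    return "cpu only (no usable CUDA device, or a CPU-only torch build)"


if __name__ == "__main__":
    print(describe_torch_device())
```

A "cpu only" result with an NVIDIA card installed usually points to an outdated driver or a CPU-only torch wheel rather than a problem in the operator.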

Research & Licensing

Kyutai

Kyutai is an AI research lab focused on speech and language technologies. Their Moshi model is a speech-text foundation model built for real-time, full-duplex spoken conversation.

Moshi TTS

The TTS component of Moshi generates speech from text using a dual-stream transformer architecture with voice cloning from reference audio embeddings.

Technical Details

7B parameter transformer architecture for speech processing
24 kHz audio output with streaming frame-by-frame generation
Fully causal and streaming with 80ms frame size
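The 24 kHz output rate and the 80 ms frame size together fix how much audio each streamed frame carries. Here is the arithmetic (the helper name is illustrative):

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate
FRAME_MS = 80            # streaming frame size

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 1920 samples per frame
frames_per_second = 1000 / FRAME_MS                    # 12.5 frames per second


def frames_needed(seconds):
    """Whole frames needed to cover `seconds` of audio (ceiling division)."""
    return -(-round(seconds * SAMPLE_RATE_HZ) // samples_per_frame)
```

So one second of speech spans 12.5 frames, rounded up to 13 whole frames.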

Research Impact

Enables natural real-time conversation with minimal latency
Production-ready implementations in Rust, Python, and MLX

Citation

@techreport{kyutai2024moshi,
  title={Moshi: a speech-text foundation model for real-time dialogue},
  author={Alexandre Défossez and Laurent Mazaré and Manu Orsini and
  Amélie Royer and Patrick Pérez and Hervé Jégou and Edouard Grave and Neil Zeghidour},
  year={2024},
  eprint={2410.00037},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2410.00037},
}

Key Research Contributions

Full-duplex spoken dialogue with dual-stream audio modeling
Low-latency speech generation for real-time dialogue (160 ms theoretical, 200 ms in practice)
Streaming neural audio codec (Mimi) at 1.1 kbps
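The 1.1 kbps figure follows from Mimi's token layout. This sketch uses the configuration reported in the Moshi paper: 12.5 Hz frames, 8 residual codebooks, 2048 entries each. Treat those constants as taken from the paper, not read from the operator.

```python
import math

FRAME_RATE_HZ = 12.5   # one token per codebook every 80 ms
NUM_CODEBOOKS = 8      # residual vector-quantizer levels
CODEBOOK_SIZE = 2048   # entries per codebook

bits_per_token = math.log2(CODEBOOK_SIZE)                     # 11.0 bits
bitrate_bps = FRAME_RATE_HZ * NUM_CODEBOOKS * bits_per_token  # 1100.0 bps = 1.1 kbps
```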

License

CC-BY 4.0: the model is freely available for research and commercial use, provided attribution is given.

Parameters

KyutaiTTS

Status (Status) op('tts_kyutai').par.Status Str

Default:: "" (Empty String)

Generate Speech (Texttospeech) op('tts_kyutai').par.Texttospeech Pulse

Default:: False

Input Text (Inputtext) op('tts_kyutai').par.Inputtext Str

Default:: "" (Empty String)

Initialize TTS Kyutai (Initialize) op('tts_kyutai').par.Initialize Pulse

Default:: False

Shutdown TTS Kyutai (Shutdown) op('tts_kyutai').par.Shutdown Pulse

Default:: False

Initialize On Start (Initializeonstart) op('tts_kyutai').par.Initializeonstart Toggle

Default:: False

Extend Current Audio (Appendtooutput) op('tts_kyutai').par.Appendtooutput Toggle

Default:: False

Voice (Voice) op('tts_kyutai').par.Voice StrMenu

Default:: "" (Empty String)

Menu Options:

cml-tts/fr/10087_11650_000028-0002.wav (cml-tts/fr/10087_11650_000028-0002.wav)
cml-tts/fr/10177_10625_000134-0003.wav (cml-tts/fr/10177_10625_000134-0003.wav)
cml-tts/fr/10179_11051_000005-0001.wav (cml-tts/fr/10179_11051_000005-0001.wav)
cml-tts/fr/12080_11650_000047-0001.wav (cml-tts/fr/12080_11650_000047-0001.wav)
cml-tts/fr/12205_11650_000004-0002.wav (cml-tts/fr/12205_11650_000004-0002.wav)
cml-tts/fr/12977_10625_000037-0001.wav (cml-tts/fr/12977_10625_000037-0001.wav)
cml-tts/fr/1406_1028_000009-0003.wav (cml-tts/fr/1406_1028_000009-0003.wav)
cml-tts/fr/1591_1028_000108-0004.wav (cml-tts/fr/1591_1028_000108-0004.wav)
cml-tts/fr/1770_1028_000036-0002.wav (cml-tts/fr/1770_1028_000036-0002.wav)
cml-tts/fr/2114_1656_000053-0001.wav (cml-tts/fr/2114_1656_000053-0001.wav)
cml-tts/fr/2154_2576_000020-0003.wav (cml-tts/fr/2154_2576_000020-0003.wav)
cml-tts/fr/2216_1745_000007-0001.wav (cml-tts/fr/2216_1745_000007-0001.wav)
cml-tts/fr/2223_1745_000009-0002.wav (cml-tts/fr/2223_1745_000009-0002.wav)
cml-tts/fr/2465_1943_000152-0002.wav (cml-tts/fr/2465_1943_000152-0002.wav)
cml-tts/fr/296_1028_000022-0001.wav (cml-tts/fr/296_1028_000022-0001.wav)
cml-tts/fr/3267_1902_000075-0001.wav (cml-tts/fr/3267_1902_000075-0001.wav)
cml-tts/fr/4193_3103_000004-0001.wav (cml-tts/fr/4193_3103_000004-0001.wav)
cml-tts/fr/4482_3103_000063-0001.wav (cml-tts/fr/4482_3103_000063-0001.wav)
cml-tts/fr/4724_3731_000031-0001.wav (cml-tts/fr/4724_3731_000031-0001.wav)
cml-tts/fr/4937_3731_000004-0001.wav (cml-tts/fr/4937_3731_000004-0001.wav)
cml-tts/fr/5207_3078_000031-0002.wav (cml-tts/fr/5207_3078_000031-0002.wav)
cml-tts/fr/5476_3103_000072-0001.wav (cml-tts/fr/5476_3103_000072-0001.wav)
cml-tts/fr/577_394_000070-0001.wav (cml-tts/fr/577_394_000070-0001.wav)
cml-tts/fr/5790_4893_000052-0001.wav (cml-tts/fr/5790_4893_000052-0001.wav)
cml-tts/fr/579_2548_000015-0001.wav (cml-tts/fr/579_2548_000015-0001.wav)
cml-tts/fr/5830_4703_000037-0001.wav (cml-tts/fr/5830_4703_000037-0001.wav)
cml-tts/fr/6318_7016_000027-0002.wav (cml-tts/fr/6318_7016_000027-0002.wav)
cml-tts/fr/7142_2432_000124-0003.wav (cml-tts/fr/7142_2432_000124-0003.wav)
cml-tts/fr/7400_2928_000100-0001.wav (cml-tts/fr/7400_2928_000100-0001.wav)
cml-tts/fr/7591_6742_000149-0002.wav (cml-tts/fr/7591_6742_000149-0002.wav)
cml-tts/fr/7601_7727_000062-0001.wav (cml-tts/fr/7601_7727_000062-0001.wav)
cml-tts/fr/7762_8734_000048-0002.wav (cml-tts/fr/7762_8734_000048-0002.wav)
cml-tts/fr/8128_7016_000047-0002.wav (cml-tts/fr/8128_7016_000047-0002.wav)
cml-tts/fr/928_486_000075-0001.wav (cml-tts/fr/928_486_000075-0001.wav)
cml-tts/fr/9834_9697_000150-0003.wav (cml-tts/fr/9834_9697_000150-0003.wav)
expresso/ex01-ex02_default_001_channel1_168s.wav (expresso/ex01-ex02_default_001_channel1_168s.wav)
expresso/ex01-ex02_default_001_channel2_198s.wav (expresso/ex01-ex02_default_001_channel2_198s.wav)
expresso/ex01-ex02_enunciated_001_channel1_432s.wav (expresso/ex01-ex02_enunciated_001_channel1_432s.wav)
expresso/ex01-ex02_enunciated_001_channel2_354s.wav (expresso/ex01-ex02_enunciated_001_channel2_354s.wav)
expresso/ex01-ex02_fast_001_channel1_104s.wav (expresso/ex01-ex02_fast_001_channel1_104s.wav)
expresso/ex01-ex02_fast_001_channel2_73s.wav (expresso/ex01-ex02_fast_001_channel2_73s.wav)
expresso/ex01-ex02_projected_001_channel1_46s.wav (expresso/ex01-ex02_projected_001_channel1_46s.wav)
expresso/ex01-ex02_projected_002_channel2_248s.wav (expresso/ex01-ex02_projected_002_channel2_248s.wav)
expresso/ex01-ex02_whisper_001_channel1_579s.wav (expresso/ex01-ex02_whisper_001_channel1_579s.wav)
expresso/ex01-ex02_whisper_001_channel2_717s.wav (expresso/ex01-ex02_whisper_001_channel2_717s.wav)
expresso/ex03-ex01_angry_001_channel1_201s.wav (expresso/ex03-ex01_angry_001_channel1_201s.wav)
expresso/ex03-ex01_angry_001_channel2_181s.wav (expresso/ex03-ex01_angry_001_channel2_181s.wav)
expresso/ex03-ex01_awe_001_channel1_1323s.wav (expresso/ex03-ex01_awe_001_channel1_1323s.wav)
expresso/ex03-ex01_awe_001_channel2_1290s.wav (expresso/ex03-ex01_awe_001_channel2_1290s.wav)
expresso/ex03-ex01_calm_001_channel1_1143s.wav (expresso/ex03-ex01_calm_001_channel1_1143s.wav)
expresso/ex03-ex01_calm_001_channel2_1081s.wav (expresso/ex03-ex01_calm_001_channel2_1081s.wav)
expresso/ex03-ex01_confused_001_channel1_909s.wav (expresso/ex03-ex01_confused_001_channel1_909s.wav)
expresso/ex03-ex01_confused_001_channel2_816s.wav (expresso/ex03-ex01_confused_001_channel2_816s.wav)
expresso/ex03-ex01_desire_004_channel1_545s.wav (expresso/ex03-ex01_desire_004_channel1_545s.wav)
expresso/ex03-ex01_desire_004_channel2_580s.wav (expresso/ex03-ex01_desire_004_channel2_580s.wav)
expresso/ex03-ex01_disgusted_004_channel1_170s.wav (expresso/ex03-ex01_disgusted_004_channel1_170s.wav)
expresso/ex03-ex01_enunciated_001_channel1_388s.wav (expresso/ex03-ex01_enunciated_001_channel1_388s.wav)
expresso/ex03-ex01_enunciated_001_channel2_576s.wav (expresso/ex03-ex01_enunciated_001_channel2_576s.wav)
expresso/ex03-ex01_happy_001_channel1_334s.wav (expresso/ex03-ex01_happy_001_channel1_334s.wav)
expresso/ex03-ex01_happy_001_channel2_257s.wav (expresso/ex03-ex01_happy_001_channel2_257s.wav)
expresso/ex03-ex01_laughing_001_channel1_188s.wav (expresso/ex03-ex01_laughing_001_channel1_188s.wav)
expresso/ex03-ex01_laughing_002_channel2_232s.wav (expresso/ex03-ex01_laughing_002_channel2_232s.wav)
expresso/ex03-ex01_nonverbal_001_channel2_37s.wav (expresso/ex03-ex01_nonverbal_001_channel2_37s.wav)
expresso/ex03-ex01_nonverbal_006_channel1_62s.wav (expresso/ex03-ex01_nonverbal_006_channel1_62s.wav)
expresso/ex03-ex01_sarcastic_001_channel1_435s.wav (expresso/ex03-ex01_sarcastic_001_channel1_435s.wav)
expresso/ex03-ex01_sarcastic_001_channel2_491s.wav (expresso/ex03-ex01_sarcastic_001_channel2_491s.wav)
expresso/ex03-ex01_sleepy_001_channel1_619s.wav (expresso/ex03-ex01_sleepy_001_channel1_619s.wav)
expresso/ex03-ex01_sleepy_001_channel2_662s.wav (expresso/ex03-ex01_sleepy_001_channel2_662s.wav)
expresso/ex03-ex02_animal-animaldir_002_channel2_89s.wav (expresso/ex03-ex02_animal-animaldir_002_channel2_89s.wav)
expresso/ex03-ex02_animal-animaldir_003_channel1_32s.wav (expresso/ex03-ex02_animal-animaldir_003_channel1_32s.wav)
expresso/ex03-ex02_animaldir-animal_008_channel1_147s.wav (expresso/ex03-ex02_animaldir-animal_008_channel1_147s.wav)
expresso/ex03-ex02_animaldir-animal_008_channel2_136s.wav (expresso/ex03-ex02_animaldir-animal_008_channel2_136s.wav)
expresso/ex03-ex02_child-childdir_001_channel1_291s.wav (expresso/ex03-ex02_child-childdir_001_channel1_291s.wav)
expresso/ex03-ex02_child-childdir_001_channel2_69s.wav (expresso/ex03-ex02_child-childdir_001_channel2_69s.wav)
expresso/ex03-ex02_childdir-child_004_channel1_308s.wav (expresso/ex03-ex02_childdir-child_004_channel1_308s.wav)
expresso/ex03-ex02_childdir-child_004_channel2_187s.wav (expresso/ex03-ex02_childdir-child_004_channel2_187s.wav)
expresso/ex03-ex02_laughing_001_channel1_248s.wav (expresso/ex03-ex02_laughing_001_channel1_248s.wav)
expresso/ex03-ex02_laughing_001_channel2_234s.wav (expresso/ex03-ex02_laughing_001_channel2_234s.wav)
expresso/ex03-ex02_narration_001_channel1_674s.wav (expresso/ex03-ex02_narration_001_channel1_674s.wav)
expresso/ex03-ex02_narration_002_channel2_1136s.wav (expresso/ex03-ex02_narration_002_channel2_1136s.wav)
expresso/ex03-ex02_sad-sympathetic_001_channel1_454s.wav (expresso/ex03-ex02_sad-sympathetic_001_channel1_454s.wav)
expresso/ex03-ex02_sad-sympathetic_001_channel2_400s.wav (expresso/ex03-ex02_sad-sympathetic_001_channel2_400s.wav)
expresso/ex03-ex02_sympathetic-sad_008_channel1_215s.wav (expresso/ex03-ex02_sympathetic-sad_008_channel1_215s.wav)
expresso/ex03-ex02_sympathetic-sad_008_channel2_268s.wav (expresso/ex03-ex02_sympathetic-sad_008_channel2_268s.wav)
expresso/ex04-ex01_animal-animaldir_006_channel1_196s.wav (expresso/ex04-ex01_animal-animaldir_006_channel1_196s.wav)
expresso/ex04-ex01_animal-animaldir_006_channel2_49s.wav (expresso/ex04-ex01_animal-animaldir_006_channel2_49s.wav)
expresso/ex04-ex01_animaldir-animal_001_channel1_118s.wav (expresso/ex04-ex01_animaldir-animal_001_channel1_118s.wav)
expresso/ex04-ex01_animaldir-animal_004_channel2_88s.wav (expresso/ex04-ex01_animaldir-animal_004_channel2_88s.wav)
expresso/ex04-ex01_child-childdir_003_channel2_283s.wav (expresso/ex04-ex01_child-childdir_003_channel2_283s.wav)
expresso/ex04-ex01_child-childdir_004_channel1_118s.wav (expresso/ex04-ex01_child-childdir_004_channel1_118s.wav)
expresso/ex04-ex01_childdir-child_001_channel1_228s.wav (expresso/ex04-ex01_childdir-child_001_channel1_228s.wav)
expresso/ex04-ex01_childdir-child_001_channel2_420s.wav (expresso/ex04-ex01_childdir-child_001_channel2_420s.wav)
expresso/ex04-ex01_disgusted_001_channel1_130s.wav (expresso/ex04-ex01_disgusted_001_channel1_130s.wav)
expresso/ex04-ex01_disgusted_001_channel2_325s.wav (expresso/ex04-ex01_disgusted_001_channel2_325s.wav)
expresso/ex04-ex01_laughing_001_channel1_306s.wav (expresso/ex04-ex01_laughing_001_channel1_306s.wav)
expresso/ex04-ex01_laughing_001_channel2_293s.wav (expresso/ex04-ex01_laughing_001_channel2_293s.wav)
expresso/ex04-ex01_narration_001_channel1_605s.wav (expresso/ex04-ex01_narration_001_channel1_605s.wav)
expresso/ex04-ex01_narration_001_channel2_686s.wav (expresso/ex04-ex01_narration_001_channel2_686s.wav)
expresso/ex04-ex01_sad-sympathetic_001_channel1_267s.wav (expresso/ex04-ex01_sad-sympathetic_001_channel1_267s.wav)
expresso/ex04-ex01_sad-sympathetic_001_channel2_346s.wav (expresso/ex04-ex01_sad-sympathetic_001_channel2_346s.wav)
expresso/ex04-ex01_sympathetic-sad_008_channel1_415s.wav (expresso/ex04-ex01_sympathetic-sad_008_channel1_415s.wav)
expresso/ex04-ex01_sympathetic-sad_008_channel2_453s.wav (expresso/ex04-ex01_sympathetic-sad_008_channel2_453s.wav)
expresso/ex04-ex02_angry_001_channel1_119s.wav (expresso/ex04-ex02_angry_001_channel1_119s.wav)
expresso/ex04-ex02_angry_001_channel2_150s.wav (expresso/ex04-ex02_angry_001_channel2_150s.wav)
expresso/ex04-ex02_awe_001_channel1_982s.wav (expresso/ex04-ex02_awe_001_channel1_982s.wav)
expresso/ex04-ex02_awe_001_channel2_1013s.wav (expresso/ex04-ex02_awe_001_channel2_1013s.wav)
expresso/ex04-ex02_bored_001_channel1_254s.wav (expresso/ex04-ex02_bored_001_channel1_254s.wav)
expresso/ex04-ex02_bored_001_channel2_232s.wav (expresso/ex04-ex02_bored_001_channel2_232s.wav)
expresso/ex04-ex02_calm_001_channel2_336s.wav (expresso/ex04-ex02_calm_001_channel2_336s.wav)
expresso/ex04-ex02_calm_002_channel1_480s.wav (expresso/ex04-ex02_calm_002_channel1_480s.wav)
expresso/ex04-ex02_confused_001_channel1_499s.wav (expresso/ex04-ex02_confused_001_channel1_499s.wav)
expresso/ex04-ex02_confused_001_channel2_488s.wav (expresso/ex04-ex02_confused_001_channel2_488s.wav)
expresso/ex04-ex02_desire_001_channel1_657s.wav (expresso/ex04-ex02_desire_001_channel1_657s.wav)
expresso/ex04-ex02_desire_001_channel2_694s.wav (expresso/ex04-ex02_desire_001_channel2_694s.wav)
expresso/ex04-ex02_disgusted_001_channel2_98s.wav (expresso/ex04-ex02_disgusted_001_channel2_98s.wav)
expresso/ex04-ex02_disgusted_004_channel1_169s.wav (expresso/ex04-ex02_disgusted_004_channel1_169s.wav)
expresso/ex04-ex02_enunciated_001_channel1_496s.wav (expresso/ex04-ex02_enunciated_001_channel1_496s.wav)
expresso/ex04-ex02_enunciated_001_channel2_898s.wav (expresso/ex04-ex02_enunciated_001_channel2_898s.wav)
expresso/ex04-ex02_fearful_001_channel1_316s.wav (expresso/ex04-ex02_fearful_001_channel1_316s.wav)
expresso/ex04-ex02_fearful_001_channel2_266s.wav (expresso/ex04-ex02_fearful_001_channel2_266s.wav)
expresso/ex04-ex02_happy_001_channel1_118s.wav (expresso/ex04-ex02_happy_001_channel1_118s.wav)
expresso/ex04-ex02_happy_001_channel2_140s.wav (expresso/ex04-ex02_happy_001_channel2_140s.wav)
expresso/ex04-ex02_laughing_001_channel1_147s.wav (expresso/ex04-ex02_laughing_001_channel1_147s.wav)
expresso/ex04-ex02_laughing_001_channel2_159s.wav (expresso/ex04-ex02_laughing_001_channel2_159s.wav)
expresso/ex04-ex02_nonverbal_004_channel1_18s.wav (expresso/ex04-ex02_nonverbal_004_channel1_18s.wav)
expresso/ex04-ex02_nonverbal_004_channel2_71s.wav (expresso/ex04-ex02_nonverbal_004_channel2_71s.wav)
expresso/ex04-ex02_sarcastic_001_channel1_519s.wav (expresso/ex04-ex02_sarcastic_001_channel1_519s.wav)
expresso/ex04-ex02_sarcastic_001_channel2_466s.wav (expresso/ex04-ex02_sarcastic_001_channel2_466s.wav)
expresso/ex04-ex03_default_001_channel1_3s.wav (expresso/ex04-ex03_default_001_channel1_3s.wav)
expresso/ex04-ex03_default_002_channel2_239s.wav (expresso/ex04-ex03_default_002_channel2_239s.wav)
expresso/ex04-ex03_enunciated_001_channel1_86s.wav (expresso/ex04-ex03_enunciated_001_channel1_86s.wav)
expresso/ex04-ex03_enunciated_001_channel2_342s.wav (expresso/ex04-ex03_enunciated_001_channel2_342s.wav)
expresso/ex04-ex03_fast_001_channel1_208s.wav (expresso/ex04-ex03_fast_001_channel1_208s.wav)
expresso/ex04-ex03_fast_001_channel2_25s.wav (expresso/ex04-ex03_fast_001_channel2_25s.wav)
expresso/ex04-ex03_projected_001_channel1_192s.wav (expresso/ex04-ex03_projected_001_channel1_192s.wav)
expresso/ex04-ex03_projected_001_channel2_179s.wav (expresso/ex04-ex03_projected_001_channel2_179s.wav)
expresso/ex04-ex03_whisper_001_channel1_198s.wav (expresso/ex04-ex03_whisper_001_channel1_198s.wav)
expresso/ex04-ex03_whisper_002_channel2_266s.wav (expresso/ex04-ex03_whisper_002_channel2_266s.wav)
unmute-prod-website/default_voice.wav (unmute-prod-website/default_voice.wav)
unmute-prod-website/degaulle-2.wav (unmute-prod-website/degaulle-2.wav)
unmute-prod-website/developpeuse-3.wav (unmute-prod-website/developpeuse-3.wav)
unmute-prod-website/ex04_narration_longform_00001.wav (unmute-prod-website/ex04_narration_longform_00001.wav)
unmute-prod-website/fabieng-enhanced-v2.wav (unmute-prod-website/fabieng-enhanced-v2.wav)
unmute-prod-website/p329_022.wav (unmute-prod-website/p329_022.wav)
vctk/p225_023.wav (vctk/p225_023.wav)
vctk/p226_023.wav (vctk/p226_023.wav)
vctk/p227_023.wav (vctk/p227_023.wav)
vctk/p228_023.wav (vctk/p228_023.wav)
vctk/p229_023.wav (vctk/p229_023.wav)
vctk/p230_023.wav (vctk/p230_023.wav)
vctk/p231_023.wav (vctk/p231_023.wav)
vctk/p232_023.wav (vctk/p232_023.wav)
vctk/p233_023.wav (vctk/p233_023.wav)
vctk/p234_023.wav (vctk/p234_023.wav)
vctk/p236_023.wav (vctk/p236_023.wav)
vctk/p237_023.wav (vctk/p237_023.wav)
vctk/p238_023.wav (vctk/p238_023.wav)
vctk/p239_023.wav (vctk/p239_023.wav)
vctk/p240_023.wav (vctk/p240_023.wav)
vctk/p241_023.wav (vctk/p241_023.wav)
vctk/p243_023.wav (vctk/p243_023.wav)
vctk/p244_023.wav (vctk/p244_023.wav)
vctk/p245_023.wav (vctk/p245_023.wav)
vctk/p246_023.wav (vctk/p246_023.wav)
vctk/p247_023.wav (vctk/p247_023.wav)
vctk/p248_023.wav (vctk/p248_023.wav)
vctk/p249_023.wav (vctk/p249_023.wav)
vctk/p250_023.wav (vctk/p250_023.wav)
vctk/p251_023.wav (vctk/p251_023.wav)
vctk/p252_023.wav (vctk/p252_023.wav)
vctk/p253_023.wav (vctk/p253_023.wav)
vctk/p254_023.wav (vctk/p254_023.wav)
vctk/p255_023.wav (vctk/p255_023.wav)
vctk/p256_023.wav (vctk/p256_023.wav)
vctk/p257_023.wav (vctk/p257_023.wav)
vctk/p258_023.wav (vctk/p258_023.wav)
vctk/p259_023.wav (vctk/p259_023.wav)
vctk/p260_023.wav (vctk/p260_023.wav)
vctk/p261_023.wav (vctk/p261_023.wav)
vctk/p262_023.wav (vctk/p262_023.wav)
vctk/p263_023.wav (vctk/p263_023.wav)
vctk/p264_023.wav (vctk/p264_023.wav)
vctk/p265_023.wav (vctk/p265_023.wav)
vctk/p266_023.wav (vctk/p266_023.wav)
vctk/p267_023.wav (vctk/p267_023.wav)
vctk/p269_023.wav (vctk/p269_023.wav)
vctk/p270_023.wav (vctk/p270_023.wav)
vctk/p271_023.wav (vctk/p271_023.wav)
vctk/p272_023.wav (vctk/p272_023.wav)
vctk/p273_023.wav (vctk/p273_023.wav)
vctk/p274_023.wav (vctk/p274_023.wav)
vctk/p275_023.wav (vctk/p275_023.wav)
vctk/p276_023.wav (vctk/p276_023.wav)
vctk/p277_023.wav (vctk/p277_023.wav)
vctk/p278_023.wav (vctk/p278_023.wav)
vctk/p279_023.wav (vctk/p279_023.wav)
vctk/p280_023.wav (vctk/p280_023.wav)
vctk/p281_023.wav (vctk/p281_023.wav)
vctk/p282_023.wav (vctk/p282_023.wav)
vctk/p283_023.wav (vctk/p283_023.wav)
vctk/p284_023.wav (vctk/p284_023.wav)
vctk/p285_023.wav (vctk/p285_023.wav)
vctk/p286_023.wav (vctk/p286_023.wav)
vctk/p287_023.wav (vctk/p287_023.wav)
vctk/p288_023.wav (vctk/p288_023.wav)
vctk/p292_023.wav (vctk/p292_023.wav)
vctk/p293_023.wav (vctk/p293_023.wav)
vctk/p294_023.wav (vctk/p294_023.wav)
vctk/p297_023.wav (vctk/p297_023.wav)
vctk/p298_023.wav (vctk/p298_023.wav)
vctk/p299_023.wav (vctk/p299_023.wav)
vctk/p300_023.wav (vctk/p300_023.wav)
vctk/p301_023.wav (vctk/p301_023.wav)
vctk/p302_023.wav (vctk/p302_023.wav)
vctk/p303_023.wav (vctk/p303_023.wav)
vctk/p304_023.wav (vctk/p304_023.wav)
vctk/p305_023.wav (vctk/p305_023.wav)
vctk/p306_023.wav (vctk/p306_023.wav)
vctk/p307_023.wav (vctk/p307_023.wav)
vctk/p308_023.wav (vctk/p308_023.wav)
vctk/p310_023.wav (vctk/p310_023.wav)
vctk/p311_023.wav (vctk/p311_023.wav)
vctk/p312_023.wav (vctk/p312_023.wav)
vctk/p313_023.wav (vctk/p313_023.wav)
vctk/p314_023.wav (vctk/p314_023.wav)
vctk/p315_023.wav (vctk/p315_023.wav)
vctk/p316_023.wav (vctk/p316_023.wav)
vctk/p317_023.wav (vctk/p317_023.wav)
vctk/p318_023.wav (vctk/p318_023.wav)
vctk/p323_023.wav (vctk/p323_023.wav)
vctk/p326_023.wav (vctk/p326_023.wav)
vctk/p329_023.wav (vctk/p329_023.wav)
vctk/p330_023.wav (vctk/p330_023.wav)
vctk/p333_023.wav (vctk/p333_023.wav)
vctk/p334_023.wav (vctk/p334_023.wav)
vctk/p335_023.wav (vctk/p335_023.wav)
vctk/p336_023.wav (vctk/p336_023.wav)
vctk/p339_023.wav (vctk/p339_023.wav)
vctk/p341_023.wav (vctk/p341_023.wav)
vctk/p343_023.wav (vctk/p343_023.wav)
vctk/p345_023.wav (vctk/p345_023.wav)
vctk/p347_023.wav (vctk/p347_023.wav)
vctk/p351_023.wav (vctk/p351_023.wav)
vctk/p360_023.wav (vctk/p360_023.wav)
vctk/p361_023.wav (vctk/p361_023.wav)
vctk/p363_023.wav (vctk/p363_023.wav)
vctk/p364_023.wav (vctk/p364_023.wav)
vctk/p374_023.wav (vctk/p374_023.wav)
vctk/p376_023.wav (vctk/p376_023.wav)
vctk/s5_023.wav (vctk/s5_023.wav)

Search Voices (Voicesearch) op('tts_kyutai').par.Voicesearch Str

Default:: "" (Empty String)
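Voice names encode the voice set and, for Expresso, the speaking style (for example expresso/ex04-ex03_whisper_001_channel1_198s.wav), so a keyword filter over the names is enough to narrow the menu. This sketch assumes case-insensitive matching where every word in the query must appear. Whether Search Voices matches all words or any word is not documented, and the function name is hypothetical.

```python
def search_voices(voices, query):
    """Return voices whose path contains every word of the query (case-insensitive)."""
    words = query.lower().split()
    if not words:
        return list(voices)   # empty search shows the full menu
    return [v for v in voices if all(w in v.lower() for w in words)]
```

For example, searching "expresso happy" keeps only the Expresso recordings labelled happy, and "vctk" keeps the VCTK set.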

TTS Kyutai (Enginestatus) op('tts_kyutai').par.Enginestatus Str

Default:: "" (Empty String)

Streaming Mode (Streamingmode) op('tts_kyutai').par.Streamingmode Toggle

Default:: False

Temperature (Temperature) op('tts_kyutai').par.Temperature Float

Default:: 0.0
Range:: 0 to 1
Slider Range:: 0 to 1
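Temperature controls how randomly the model picks each next token. Here is a textbook sketch of temperature sampling, with 0 treated as greedy (always the most likely token). This is the general technique only. How the worker and moshi apply the value internally is assumed, not documented here.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw logits.

    temperature <= 0 means greedy (argmax); higher values flatten the
    distribution and make the output more varied.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```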

CFG Coefficient (Cfgcoef) op('tts_kyutai').par.Cfgcoef Float

Default:: 0.0
Range:: 0.5 to 4
Slider Range:: 0.5 to 4
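CFG Coefficient refers to classifier-free guidance. The standard formulation blends conditioned and unconditioned predictions, and larger coefficients follow the conditioning more strongly. This sketch shows only that general formula. Kyutai's implementation may apply the coefficient differently (for example, as an input to a distilled model), so treat it as the idea, not the operator's code.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef):
    """Classifier-free guidance: uncond + cfg_coef * (cond - uncond).

    cfg_coef = 1 returns the conditioned logits unchanged; values above 1
    push further in the direction the conditioning points.
    """
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]
```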

Padding Between (sec) (Paddingbetween) op('tts_kyutai').par.Paddingbetween Int

Default:: 0
Range:: 0 to 5
Slider Range:: 0 to 5
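Padding Between is an integer number of seconds. At 24 kHz, each second of padding is 24,000 samples of silence. This sketch assumes the padding is inserted between consecutive synthesized segments (for example, queued entries) and not before the first one. The operator's exact placement is not documented, and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 24_000


def join_with_padding(segments, padding_sec, sample_rate=SAMPLE_RATE_HZ):
    """Concatenate audio segments, inserting padding_sec of silence between them."""
    gap = [0.0] * int(padding_sec * sample_rate)
    out = []
    for i, seg in enumerate(segments):
        if i:                 # no silence before the first segment
            out.extend(gap)
        out.extend(seg)
    return out
```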

Clear Queue (Clearqueue) op('tts_kyutai').par.Clearqueue Pulse

Default:: False

Stop Synthesis (Stopsynth) op('tts_kyutai').par.Stopsynth Pulse

Default:: False

Clear Audio Buffers (Clearaudio) op('tts_kyutai').par.Clearaudio Pulse

Clears all generated audio from memory and from the output CHOPs (store_output and full_audio).

Default:: False

Playback

Audio Device Settings Header

Reset Playback (Resetpulse) op('tts_kyutai').par.Resetpulse Pulse

Default:: False

Active (Audioactive) op('tts_kyutai').par.Audioactive Toggle

Default:: True

Device (Audiodevice) op('tts_kyutai').par.Audiodevice Menu

Default:: default
Options:: default, followed by the audio output devices detected on the local machine (the list differs from system to system)

Volume (Volume) op('tts_kyutai').par.Volume Float

Default:: 1.0
Range:: 0 to 1
Slider Range:: 0 to 1

Auto Save To Disk (Autosavetodisk) op('tts_kyutai').par.Autosavetodisk Toggle

Automatically save generated audio and metadata locally after successful synthesis.

Default:: False

Save Folder (Folder) op('tts_kyutai').par.Folder Folder

Folder where generated audio files and metadata are saved.

Default:: "" (Empty String)

Base Name (Name) op('tts_kyutai').par.Name Str

Base filename for saved audio. Use $TIMESTAMP for unique names.

Default:: "" (Empty String)

Auto Version Files (Autoversion) op('tts_kyutai').par.Autoversion Toggle

Automatically appends _1, _2, etc. if the filename already exists.

Default:: False

Save Current Audio (Savefile) op('tts_kyutai').par.Savefile Pulse

Saves the audio currently in the output CHOP to a file using the settings above.

Default:: False
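If you need to write the generated audio yourself, for example from a CHOP channel's .vals list inside TouchDesigner, the standard-library wave module is enough for 24 kHz mono WAV. This is a minimal sketch, not the operator's save routine. It writes 16-bit PCM only (OGG needs a separate encoder), and the function name is hypothetical.

```python
import sys
import wave
from array import array


def save_wav(path, samples, sample_rate=24_000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()              # WAV data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)          # store_output is mono
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

Samples outside [-1, 1] are clipped rather than wrapped, which avoids loud artifacts if the audio is hot.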

Install/Settings

Dependencies Available (Installdependencies) op('tts_kyutai').par.Installdependencies Pulse

Default:: False

Model Repository (Modelrepo) op('tts_kyutai').par.Modelrepo Str

Default:: "" (Empty String)

Download Model (Downloadmodel) op('tts_kyutai').par.Downloadmodel Pulse

Default:: False

Voice Repository (Voicerepo) op('tts_kyutai').par.Voicerepo Str

Default:: "" (Empty String)

Download Voices (Downloadvoices) op('tts_kyutai').par.Downloadvoices Pulse

Default:: False
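Model Repository and Voice Repository hold Hugging Face repository IDs. This sketch shows how such an ID can be fetched with huggingface_hub's snapshot_download, which returns the local folder path. That the operator's download buttons use this exact call is an assumption, and the helper name is hypothetical. The call blocks until the download finishes, so it is better run outside TouchDesigner's main thread.

```python
def download_repo(repo_id, local_dir=None, downloader=None):
    """Fetch a Hugging Face repository and return its local path.

    local_dir=None keeps files in the shared Hugging Face cache.
    """
    if downloader is None:
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    return downloader(repo_id=repo_id, local_dir=local_dir)

# e.g. download_repo(op('tts_kyutai').par.Voicerepo.eval())
```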

Worker Connection Settings Header

Monitor Worker Logs (stderr) (Monitorworkerlogs) op('tts_kyutai').par.Monitorworkerlogs Toggle

Default:: False

Auto Reattach On Init (Autoreattachoninit) op('tts_kyutai').par.Autoreattachoninit Toggle

Default:: False

Force Attach (Skip PID Check) (Forceattachoninit) op('tts_kyutai').par.Forceattachoninit Toggle

Default:: False
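Reattaching relies on the worker still listening on its TCP port. To check by hand whether anything is accepting connections there, a plain socket probe is enough. The host and port are placeholders (the worker's port is not documented here). This checks reachability only, not the operator's heartbeat protocol.

```python
import socket


def worker_listening(host, port, timeout=0.5):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:           # refused, unreachable, or timed out
        return False
```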

Changelog

v1.1.2 (2026-03-26)

Initial release

v1.1.1 (2026-03-01)

Fix TD 32050+ freeze by removing moshi import at module level
Hardcode DEFAULT_DSM_TTS_REPO constants instead of importing them from moshi
Use importlib.metadata for dependency checking
Initial commit

v1.1.0 (2025-08-17)

NEW: TCP IPC Mode - Added robust TCP communication with worker processes (recommended over STDIO)
NEW: Auto Worker Reattach - Automatically reconnect to existing workers on TD restart/reload
NEW: TCP Heartbeat System - Automatic connection monitoring with reconnect on timeout
NEW: Sophisticated Audio Saving - Auto-save with metadata, versioning, and multiple formats (WAV/OGG)
NEW: Clear Audio Method - Clear audio buffers and CHOPs with one button
NEW: Manual Save Function - Save current audio with comprehensive metadata tracking
IMPROVED: Parameter Organization - Cleaned and reorganized parameter menus for better UX
IMPROVED: Method Naming - Renamed Synthesize to Texttospeech with optional text parameter
IMPROVED: Connection Reliability - Automatic TCP reconnection and worker process persistence
IMPROVED: Audio Management - Enhanced buffering and progressive CHOP updates