TTS Kyutai

v1.1.2 (Updated)

The TTS Kyutai operator runs local text-to-speech synthesis using Kyutai’s neural TTS models (derived from the Moshi speech-text foundation model). It launches an external Python worker process for GPU-accelerated inference and streams audio frames back into TouchDesigner as they are generated.

  • Local inference — no API keys or cloud services required
  • Streaming synthesis — audio frames arrive progressively during generation
  • 250+ built-in voices across multiple voice sets (VCTK, Expresso, CML-TTS, Unmute)
  • Voice search — filter the large voice library by keyword
  • Extend mode — append new speech to existing audio instead of replacing it
  • Auto-save to disk — optionally save every synthesis as WAV or OGG with metadata
  • TCP worker reattach — survive TouchDesigner file saves without reloading the model

Requirements

  • Python packages: moshi, torch, and huggingface_hub. Install them from the Install/Settings page; the installer pins PyTorch 2.4.0 built for CUDA 12.1 (cu121) for compatibility.
  • Models: the TTS model and the voice repository must be downloaded from HuggingFace before first use.
  • Hardware: an NVIDIA GPU with CUDA 12.1-compatible drivers is strongly recommended; CPU inference is supported but significantly slower.
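
If the installer reports missing packages, it can help to check the Python environment by hand. The sketch below uses importlib.metadata (which the changelog mentions for dependency checking) to list which required distributions are absent. The function name missing_packages is illustrative and is not part of the operator.

```python
from importlib import metadata

# Distribution names taken from the package requirements listed above.
REQUIRED = ["moshi", "torch", "huggingface_hub"]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_packages(REQUIRED) or "none")
```

Run it with the same Python interpreter the worker uses; a different interpreter may report a different set of packages.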

Inputs

None. Text is entered directly via the Input Text field on the KyutaiTTS page.

Outputs

  • Output 1: store_output CHOP — generated audio at 24 kHz (mono)
  • Output 2: synthesis_log DAT — timestamped log of all synthesis operations
  • Output 3: text_queue DAT — queued text entries

Setup

  1. On the Install/Settings page, pulse Dependencies Available if it reports missing packages. Restart TouchDesigner after installation completes.
  2. Pulse Download Model to fetch the TTS model from HuggingFace.
  3. Pulse Download Voices to fetch the voice repository.
  4. On the KyutaiTTS page, pulse Initialize TTS Kyutai to launch the worker process. The status will show “Ready” when the model is loaded.

Generating Speech

  1. Select a voice from the Voice menu (use Search Voices to filter by name or style, e.g. “happy”, “whisper”, “narration”).
  2. Type text into the Input Text field.
  3. Pulse Generate Speech. Audio appears progressively in the store_output CHOP.
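
The same steps can be driven from a script. The following is a minimal sketch that uses the parameter names from the reference further down (Voice, Inputtext, Texttospeech); the helper name speak is hypothetical, and the sketch assumes the engine has already been initialized and shows “Ready”.

```python
def speak(tts, text, voice=None):
    """Trigger one synthesis on a TTS Kyutai operator.

    `tts` is the operator itself, e.g. op('tts_kyutai') inside TouchDesigner.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if voice is not None:
        tts.par.Voice.val = voice      # e.g. 'vctk/p225_023.wav'
    tts.par.Inputtext.val = text
    tts.par.Texttospeech.pulse()       # same as pulsing Generate Speech


# Inside TouchDesigner (op() is not available elsewhere):
# speak(op('tts_kyutai'), 'Hello from TouchDesigner.', voice='vctk/p225_023.wav')
```

Passing the operator in as an argument keeps the helper independent of where the component lives in the network.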

The Playback page controls how generated audio is played back through your system’s audio hardware.

  1. Enable Active to hear synthesized audio through your speakers or headphones.
  2. Select a Driver (default DirectSound/CoreAudio, or ASIO for low-latency setups) and choose the target Device from the menu.
  3. Adjust Volume to control playback level.
  4. If playback stalls or behaves unexpectedly, pulse Reset Playback to reinitialize the audio output.
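
Playback can also be configured from a script. This sketch sets the Audioactive, Driver, and Volume parameters listed in the reference below; the helper name configure_playback and its validation rules are assumptions made for illustration, based only on the documented menu options and value range.

```python
DRIVERS = ("default", "asio")  # Driver menu options from the parameter reference


def configure_playback(tts, active=True, driver="default", volume=1.0):
    """Apply basic playback settings and return the volume actually used."""
    if driver not in DRIVERS:
        raise ValueError(f"unknown driver {driver!r}; expected one of {DRIVERS}")
    volume = max(0.0, min(1.0, float(volume)))  # Volume is documented as 0 to 1
    tts.par.Audioactive.val = active
    tts.par.Driver.val = driver
    tts.par.Volume.val = volume
    return volume


# Inside TouchDesigner:
# configure_playback(op('tts_kyutai'), driver='asio', volume=0.8)
```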

Enable Extend Current Audio on the KyutaiTTS page to append new speech to the end of the existing audio buffer instead of replacing it. This is useful for building up longer recordings across multiple synthesis passes — each pulse of Generate Speech adds to what is already in the output rather than clearing it first.
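
The difference between the two modes can be pictured with a small model of the output buffer. This is only a sketch of the behavior described above, using plain Python lists as a stand-in for the CHOP samples; the function names are illustrative and the operator's internal buffering is not documented here.

```python
SAMPLE_RATE = 24_000  # Hz, mono, as stated for the store_output CHOP


def update_buffer(existing, new_samples, extend):
    """Return the output buffer after one synthesis pass."""
    if extend:
        return list(existing) + list(new_samples)  # append to the existing audio
    return list(new_samples)                       # replace the previous audio


def duration_seconds(buffer):
    """Length of the buffer in seconds at the 24 kHz output rate."""
    return len(buffer) / SAMPLE_RATE


first = [0.0] * 24_000    # one second of audio
second = [0.0] * 12_000   # half a second of audio
print(duration_seconds(update_buffer(first, second, extend=True)))   # 1.5
print(duration_seconds(update_buffer(first, second, extend=False)))  # 0.5
```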

Saving Audio

  1. On the Playback page, enable Auto Save To Disk to save every synthesis automatically, or pulse Save Current Audio for manual saves.
  2. Set the Save Folder, Base Name (supports $TIMESTAMP placeholder), and File Type (WAV or OGG).
  3. Enable Auto Version Files to avoid overwriting existing files.
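
To make the naming rules concrete, here is a sketch of how a save path could be resolved from these settings. It assumes a YYYYMMDD_HHMMSS timestamp format, which this page does not specify, and follows the documented _1, _2 suffix scheme for Auto Version Files. The function resolve_save_path is hypothetical and is not the operator's own code.

```python
from datetime import datetime
from pathlib import Path


def resolve_save_path(folder, base_name, file_type="wav", auto_version=True, now=None):
    """Build a save path, expanding $TIMESTAMP and avoiding overwrites."""
    now = now or datetime.now()
    # Assumed timestamp format; the operator's actual format may differ.
    stem = base_name.replace("$TIMESTAMP", now.strftime("%Y%m%d_%H%M%S"))
    folder = Path(folder)
    candidate = folder / f"{stem}.{file_type}"
    if not auto_version:
        return candidate  # may overwrite an existing file
    version = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{version}.{file_type}"
        version += 1
    return candidate
```

With Auto Version Files disabled and a fixed Base Name, each save targets the same path, which is why combining $TIMESTAMP with versioning is the safer setup.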

Troubleshooting

  • “TTS engine is not ready”: pulse Initialize TTS Kyutai and wait for the status to show “Ready”.
  • No audio output from the Playback page: check that Active is enabled and that the correct Driver and Device are selected.
  • Worker crashes on start: verify that CUDA drivers are installed. As a fallback, set Device to “CPU” on the Install/Settings page.
  • Voice menu shows “(Download Voices First)”: pulse Download Voices on the Install/Settings page.
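
When the worker fails to start, a quick way to see whether CUDA is usable is to ask PyTorch directly. The sketch below mirrors the Device menu options (auto, cpu, cuda); how the operator's own auto mode decides is not documented on this page, so the fallback logic here is an assumption.

```python
def pick_device(preferred="auto"):
    """Return 'cuda' or 'cpu' for a Device setting of auto, cpu, or cuda."""
    if preferred not in ("auto", "cpu", "cuda"):
        raise ValueError(f"unknown device setting {preferred!r}")
    if preferred == "cpu":
        return "cpu"
    try:
        import torch  # imported lazily so the check works without torch installed
        has_cuda = torch.cuda.is_available()
    except (ImportError, OSError):
        has_cuda = False
    if preferred == "cuda" and not has_cuda:
        raise RuntimeError("CUDA requested but not available; check drivers or use 'cpu'")
    return "cuda" if has_cuda else "cpu"


if __name__ == "__main__":
    print("Device that would be used:", pick_device("auto"))
```

If this reports cpu on a machine with an NVIDIA GPU, the driver or the CUDA build of PyTorch is the likely culprit.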

Research & Licensing

Kyutai

Kyutai is an AI research lab focused on speech and language technologies. Their Moshi model is a speech-text foundation model designed for real-time conversational AI.

Moshi TTS

The TTS component of Moshi generates speech from text using a dual-stream transformer architecture with voice cloning from reference audio embeddings.

Technical Details

  • 7B parameter transformer architecture for speech processing
  • 24 kHz audio output with streaming frame-by-frame generation
  • Fully causal and streaming with 80ms frame size
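
The sample rate and frame size above determine how much audio arrives per streamed frame. A short worked calculation, using only the two figures from this list:

```python
import math

SAMPLE_RATE_HZ = 24_000  # output sample rate
FRAME_MS = 80            # duration of one generated frame

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 1920 samples
frames_per_second = 1000 / FRAME_MS                    # 12.5 frames


def frames_needed(seconds):
    """Number of whole frames required to cover a given duration."""
    return math.ceil(seconds * frames_per_second)


print(samples_per_frame)   # 1920
print(frames_per_second)   # 12.5
print(frames_needed(2.0))  # 25
```

In other words, each frame delivered to the output carries 1920 new samples, and two seconds of speech take 25 frames.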

Research Impact

  • Enables natural real-time conversation with minimal latency
  • Production-ready implementations in Rust, Python, and MLX

Citation

@techreport{kyutai2024moshi,
  title={Moshi: a speech-text foundation model for real-time dialogue},
  author={Alexandre Défossez and Laurent Mazaré and Manu Orsini and
  Amélie Royer and Patrick Pérez and Hervé Jégou and Edouard Grave and Neil Zeghidour},
  year={2024},
  eprint={2410.00037},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2410.00037},
}

Key Research Contributions

  • Full-duplex spoken dialogue with dual-stream audio modeling
  • Ultra-low latency speech synthesis (160ms theoretical)
  • Streaming neural audio codec (Mimi) at 1.1 kbps
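
The 1.1 kbps figure can be checked with a little arithmetic. The calculation below assumes the codec configuration reported in the Moshi paper (8 codebooks of 2048 entries each at 12.5 frames per second); those three numbers are not stated on this page and are labeled as assumptions in the code.

```python
import math

# Assumed Mimi configuration (from the Moshi paper, not from this page).
FRAME_RATE_HZ = 12.5
CODEBOOKS = 8
CODEBOOK_SIZE = 2048

bits_per_token = int(math.log2(CODEBOOK_SIZE))  # 11 bits per codebook index
bits_per_frame = CODEBOOKS * bits_per_token     # 88 bits per 80 ms frame
bitrate_bps = bits_per_frame * FRAME_RATE_HZ    # 1100.0 bits per second

# For comparison: uncompressed 16-bit mono PCM at 24 kHz.
pcm_bps = 24_000 * 16
print(bitrate_bps)            # 1100.0
print(pcm_bps / bitrate_bps)  # compression ratio relative to raw PCM
```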

License

CC-BY 4.0 - This model is freely available for research and commercial use.

Status (Status) op('tts_kyutai').par.Status Str
Default:
"" (Empty String)
Generate Speech (Texttospeech) op('tts_kyutai').par.Texttospeech Pulse
Default:
False
Input Text (Inputtext) op('tts_kyutai').par.Inputtext Str
Default:
"" (Empty String)
Initialize TTS Kyutai (Initialize) op('tts_kyutai').par.Initialize Pulse
Default:
False
Shutdown TTS Kyutai (Shutdown) op('tts_kyutai').par.Shutdown Pulse
Default:
False
Initialize On Start (Initializeonstart) op('tts_kyutai').par.Initializeonstart Toggle
Default:
False
Extend Current Audio (Appendtooutput) op('tts_kyutai').par.Appendtooutput Toggle
Default:
False
Voice (Voice) op('tts_kyutai').par.Voice StrMenu
Default:
"" (Empty String)
Menu Options:
  • cml-tts/fr/10087_11650_000028-0002.wav (cml-tts/fr/10087_11650_000028-0002.wav)
  • cml-tts/fr/10177_10625_000134-0003.wav (cml-tts/fr/10177_10625_000134-0003.wav)
  • cml-tts/fr/10179_11051_000005-0001.wav (cml-tts/fr/10179_11051_000005-0001.wav)
  • cml-tts/fr/12080_11650_000047-0001.wav (cml-tts/fr/12080_11650_000047-0001.wav)
  • cml-tts/fr/12205_11650_000004-0002.wav (cml-tts/fr/12205_11650_000004-0002.wav)
  • cml-tts/fr/12977_10625_000037-0001.wav (cml-tts/fr/12977_10625_000037-0001.wav)
  • cml-tts/fr/1406_1028_000009-0003.wav (cml-tts/fr/1406_1028_000009-0003.wav)
  • cml-tts/fr/1591_1028_000108-0004.wav (cml-tts/fr/1591_1028_000108-0004.wav)
  • cml-tts/fr/1770_1028_000036-0002.wav (cml-tts/fr/1770_1028_000036-0002.wav)
  • cml-tts/fr/2114_1656_000053-0001.wav (cml-tts/fr/2114_1656_000053-0001.wav)
  • cml-tts/fr/2154_2576_000020-0003.wav (cml-tts/fr/2154_2576_000020-0003.wav)
  • cml-tts/fr/2216_1745_000007-0001.wav (cml-tts/fr/2216_1745_000007-0001.wav)
  • cml-tts/fr/2223_1745_000009-0002.wav (cml-tts/fr/2223_1745_000009-0002.wav)
  • cml-tts/fr/2465_1943_000152-0002.wav (cml-tts/fr/2465_1943_000152-0002.wav)
  • cml-tts/fr/296_1028_000022-0001.wav (cml-tts/fr/296_1028_000022-0001.wav)
  • cml-tts/fr/3267_1902_000075-0001.wav (cml-tts/fr/3267_1902_000075-0001.wav)
  • cml-tts/fr/4193_3103_000004-0001.wav (cml-tts/fr/4193_3103_000004-0001.wav)
  • cml-tts/fr/4482_3103_000063-0001.wav (cml-tts/fr/4482_3103_000063-0001.wav)
  • cml-tts/fr/4724_3731_000031-0001.wav (cml-tts/fr/4724_3731_000031-0001.wav)
  • cml-tts/fr/4937_3731_000004-0001.wav (cml-tts/fr/4937_3731_000004-0001.wav)
  • cml-tts/fr/5207_3078_000031-0002.wav (cml-tts/fr/5207_3078_000031-0002.wav)
  • cml-tts/fr/5476_3103_000072-0001.wav (cml-tts/fr/5476_3103_000072-0001.wav)
  • cml-tts/fr/577_394_000070-0001.wav (cml-tts/fr/577_394_000070-0001.wav)
  • cml-tts/fr/5790_4893_000052-0001.wav (cml-tts/fr/5790_4893_000052-0001.wav)
  • cml-tts/fr/579_2548_000015-0001.wav (cml-tts/fr/579_2548_000015-0001.wav)
  • cml-tts/fr/5830_4703_000037-0001.wav (cml-tts/fr/5830_4703_000037-0001.wav)
  • cml-tts/fr/6318_7016_000027-0002.wav (cml-tts/fr/6318_7016_000027-0002.wav)
  • cml-tts/fr/7142_2432_000124-0003.wav (cml-tts/fr/7142_2432_000124-0003.wav)
  • cml-tts/fr/7400_2928_000100-0001.wav (cml-tts/fr/7400_2928_000100-0001.wav)
  • cml-tts/fr/7591_6742_000149-0002.wav (cml-tts/fr/7591_6742_000149-0002.wav)
  • cml-tts/fr/7601_7727_000062-0001.wav (cml-tts/fr/7601_7727_000062-0001.wav)
  • cml-tts/fr/7762_8734_000048-0002.wav (cml-tts/fr/7762_8734_000048-0002.wav)
  • cml-tts/fr/8128_7016_000047-0002.wav (cml-tts/fr/8128_7016_000047-0002.wav)
  • cml-tts/fr/928_486_000075-0001.wav (cml-tts/fr/928_486_000075-0001.wav)
  • cml-tts/fr/9834_9697_000150-0003.wav (cml-tts/fr/9834_9697_000150-0003.wav)
  • expresso/ex01-ex02_default_001_channel1_168s.wav (expresso/ex01-ex02_default_001_channel1_168s.wav)
  • expresso/ex01-ex02_default_001_channel2_198s.wav (expresso/ex01-ex02_default_001_channel2_198s.wav)
  • expresso/ex01-ex02_enunciated_001_channel1_432s.wav (expresso/ex01-ex02_enunciated_001_channel1_432s.wav)
  • expresso/ex01-ex02_enunciated_001_channel2_354s.wav (expresso/ex01-ex02_enunciated_001_channel2_354s.wav)
  • expresso/ex01-ex02_fast_001_channel1_104s.wav (expresso/ex01-ex02_fast_001_channel1_104s.wav)
  • expresso/ex01-ex02_fast_001_channel2_73s.wav (expresso/ex01-ex02_fast_001_channel2_73s.wav)
  • expresso/ex01-ex02_projected_001_channel1_46s.wav (expresso/ex01-ex02_projected_001_channel1_46s.wav)
  • expresso/ex01-ex02_projected_002_channel2_248s.wav (expresso/ex01-ex02_projected_002_channel2_248s.wav)
  • expresso/ex01-ex02_whisper_001_channel1_579s.wav (expresso/ex01-ex02_whisper_001_channel1_579s.wav)
  • expresso/ex01-ex02_whisper_001_channel2_717s.wav (expresso/ex01-ex02_whisper_001_channel2_717s.wav)
  • expresso/ex03-ex01_angry_001_channel1_201s.wav (expresso/ex03-ex01_angry_001_channel1_201s.wav)
  • expresso/ex03-ex01_angry_001_channel2_181s.wav (expresso/ex03-ex01_angry_001_channel2_181s.wav)
  • expresso/ex03-ex01_awe_001_channel1_1323s.wav (expresso/ex03-ex01_awe_001_channel1_1323s.wav)
  • expresso/ex03-ex01_awe_001_channel2_1290s.wav (expresso/ex03-ex01_awe_001_channel2_1290s.wav)
  • expresso/ex03-ex01_calm_001_channel1_1143s.wav (expresso/ex03-ex01_calm_001_channel1_1143s.wav)
  • expresso/ex03-ex01_calm_001_channel2_1081s.wav (expresso/ex03-ex01_calm_001_channel2_1081s.wav)
  • expresso/ex03-ex01_confused_001_channel1_909s.wav (expresso/ex03-ex01_confused_001_channel1_909s.wav)
  • expresso/ex03-ex01_confused_001_channel2_816s.wav (expresso/ex03-ex01_confused_001_channel2_816s.wav)
  • expresso/ex03-ex01_desire_004_channel1_545s.wav (expresso/ex03-ex01_desire_004_channel1_545s.wav)
  • expresso/ex03-ex01_desire_004_channel2_580s.wav (expresso/ex03-ex01_desire_004_channel2_580s.wav)
  • expresso/ex03-ex01_disgusted_004_channel1_170s.wav (expresso/ex03-ex01_disgusted_004_channel1_170s.wav)
  • expresso/ex03-ex01_enunciated_001_channel1_388s.wav (expresso/ex03-ex01_enunciated_001_channel1_388s.wav)
  • expresso/ex03-ex01_enunciated_001_channel2_576s.wav (expresso/ex03-ex01_enunciated_001_channel2_576s.wav)
  • expresso/ex03-ex01_happy_001_channel1_334s.wav (expresso/ex03-ex01_happy_001_channel1_334s.wav)
  • expresso/ex03-ex01_happy_001_channel2_257s.wav (expresso/ex03-ex01_happy_001_channel2_257s.wav)
  • expresso/ex03-ex01_laughing_001_channel1_188s.wav (expresso/ex03-ex01_laughing_001_channel1_188s.wav)
  • expresso/ex03-ex01_laughing_002_channel2_232s.wav (expresso/ex03-ex01_laughing_002_channel2_232s.wav)
  • expresso/ex03-ex01_nonverbal_001_channel2_37s.wav (expresso/ex03-ex01_nonverbal_001_channel2_37s.wav)
  • expresso/ex03-ex01_nonverbal_006_channel1_62s.wav (expresso/ex03-ex01_nonverbal_006_channel1_62s.wav)
  • expresso/ex03-ex01_sarcastic_001_channel1_435s.wav (expresso/ex03-ex01_sarcastic_001_channel1_435s.wav)
  • expresso/ex03-ex01_sarcastic_001_channel2_491s.wav (expresso/ex03-ex01_sarcastic_001_channel2_491s.wav)
  • expresso/ex03-ex01_sleepy_001_channel1_619s.wav (expresso/ex03-ex01_sleepy_001_channel1_619s.wav)
  • expresso/ex03-ex01_sleepy_001_channel2_662s.wav (expresso/ex03-ex01_sleepy_001_channel2_662s.wav)
  • expresso/ex03-ex02_animal-animaldir_002_channel2_89s.wav (expresso/ex03-ex02_animal-animaldir_002_channel2_89s.wav)
  • expresso/ex03-ex02_animal-animaldir_003_channel1_32s.wav (expresso/ex03-ex02_animal-animaldir_003_channel1_32s.wav)
  • expresso/ex03-ex02_animaldir-animal_008_channel1_147s.wav (expresso/ex03-ex02_animaldir-animal_008_channel1_147s.wav)
  • expresso/ex03-ex02_animaldir-animal_008_channel2_136s.wav (expresso/ex03-ex02_animaldir-animal_008_channel2_136s.wav)
  • expresso/ex03-ex02_child-childdir_001_channel1_291s.wav (expresso/ex03-ex02_child-childdir_001_channel1_291s.wav)
  • expresso/ex03-ex02_child-childdir_001_channel2_69s.wav (expresso/ex03-ex02_child-childdir_001_channel2_69s.wav)
  • expresso/ex03-ex02_childdir-child_004_channel1_308s.wav (expresso/ex03-ex02_childdir-child_004_channel1_308s.wav)
  • expresso/ex03-ex02_childdir-child_004_channel2_187s.wav (expresso/ex03-ex02_childdir-child_004_channel2_187s.wav)
  • expresso/ex03-ex02_laughing_001_channel1_248s.wav (expresso/ex03-ex02_laughing_001_channel1_248s.wav)
  • expresso/ex03-ex02_laughing_001_channel2_234s.wav (expresso/ex03-ex02_laughing_001_channel2_234s.wav)
  • expresso/ex03-ex02_narration_001_channel1_674s.wav (expresso/ex03-ex02_narration_001_channel1_674s.wav)
  • expresso/ex03-ex02_narration_002_channel2_1136s.wav (expresso/ex03-ex02_narration_002_channel2_1136s.wav)
  • expresso/ex03-ex02_sad-sympathetic_001_channel1_454s.wav (expresso/ex03-ex02_sad-sympathetic_001_channel1_454s.wav)
  • expresso/ex03-ex02_sad-sympathetic_001_channel2_400s.wav (expresso/ex03-ex02_sad-sympathetic_001_channel2_400s.wav)
  • expresso/ex03-ex02_sympathetic-sad_008_channel1_215s.wav (expresso/ex03-ex02_sympathetic-sad_008_channel1_215s.wav)
  • expresso/ex03-ex02_sympathetic-sad_008_channel2_268s.wav (expresso/ex03-ex02_sympathetic-sad_008_channel2_268s.wav)
  • expresso/ex04-ex01_animal-animaldir_006_channel1_196s.wav (expresso/ex04-ex01_animal-animaldir_006_channel1_196s.wav)
  • expresso/ex04-ex01_animal-animaldir_006_channel2_49s.wav (expresso/ex04-ex01_animal-animaldir_006_channel2_49s.wav)
  • expresso/ex04-ex01_animaldir-animal_001_channel1_118s.wav (expresso/ex04-ex01_animaldir-animal_001_channel1_118s.wav)
  • expresso/ex04-ex01_animaldir-animal_004_channel2_88s.wav (expresso/ex04-ex01_animaldir-animal_004_channel2_88s.wav)
  • expresso/ex04-ex01_child-childdir_003_channel2_283s.wav (expresso/ex04-ex01_child-childdir_003_channel2_283s.wav)
  • expresso/ex04-ex01_child-childdir_004_channel1_118s.wav (expresso/ex04-ex01_child-childdir_004_channel1_118s.wav)
  • expresso/ex04-ex01_childdir-child_001_channel1_228s.wav (expresso/ex04-ex01_childdir-child_001_channel1_228s.wav)
  • expresso/ex04-ex01_childdir-child_001_channel2_420s.wav (expresso/ex04-ex01_childdir-child_001_channel2_420s.wav)
  • expresso/ex04-ex01_disgusted_001_channel1_130s.wav (expresso/ex04-ex01_disgusted_001_channel1_130s.wav)
  • expresso/ex04-ex01_disgusted_001_channel2_325s.wav (expresso/ex04-ex01_disgusted_001_channel2_325s.wav)
  • expresso/ex04-ex01_laughing_001_channel1_306s.wav (expresso/ex04-ex01_laughing_001_channel1_306s.wav)
  • expresso/ex04-ex01_laughing_001_channel2_293s.wav (expresso/ex04-ex01_laughing_001_channel2_293s.wav)
  • expresso/ex04-ex01_narration_001_channel1_605s.wav (expresso/ex04-ex01_narration_001_channel1_605s.wav)
  • expresso/ex04-ex01_narration_001_channel2_686s.wav (expresso/ex04-ex01_narration_001_channel2_686s.wav)
  • expresso/ex04-ex01_sad-sympathetic_001_channel1_267s.wav (expresso/ex04-ex01_sad-sympathetic_001_channel1_267s.wav)
  • expresso/ex04-ex01_sad-sympathetic_001_channel2_346s.wav (expresso/ex04-ex01_sad-sympathetic_001_channel2_346s.wav)
  • expresso/ex04-ex01_sympathetic-sad_008_channel1_415s.wav (expresso/ex04-ex01_sympathetic-sad_008_channel1_415s.wav)
  • expresso/ex04-ex01_sympathetic-sad_008_channel2_453s.wav (expresso/ex04-ex01_sympathetic-sad_008_channel2_453s.wav)
  • expresso/ex04-ex02_angry_001_channel1_119s.wav (expresso/ex04-ex02_angry_001_channel1_119s.wav)
  • expresso/ex04-ex02_angry_001_channel2_150s.wav (expresso/ex04-ex02_angry_001_channel2_150s.wav)
  • expresso/ex04-ex02_awe_001_channel1_982s.wav (expresso/ex04-ex02_awe_001_channel1_982s.wav)
  • expresso/ex04-ex02_awe_001_channel2_1013s.wav (expresso/ex04-ex02_awe_001_channel2_1013s.wav)
  • expresso/ex04-ex02_bored_001_channel1_254s.wav (expresso/ex04-ex02_bored_001_channel1_254s.wav)
  • expresso/ex04-ex02_bored_001_channel2_232s.wav (expresso/ex04-ex02_bored_001_channel2_232s.wav)
  • expresso/ex04-ex02_calm_001_channel2_336s.wav (expresso/ex04-ex02_calm_001_channel2_336s.wav)
  • expresso/ex04-ex02_calm_002_channel1_480s.wav (expresso/ex04-ex02_calm_002_channel1_480s.wav)
  • expresso/ex04-ex02_confused_001_channel1_499s.wav (expresso/ex04-ex02_confused_001_channel1_499s.wav)
  • expresso/ex04-ex02_confused_001_channel2_488s.wav (expresso/ex04-ex02_confused_001_channel2_488s.wav)
  • expresso/ex04-ex02_desire_001_channel1_657s.wav (expresso/ex04-ex02_desire_001_channel1_657s.wav)
  • expresso/ex04-ex02_desire_001_channel2_694s.wav (expresso/ex04-ex02_desire_001_channel2_694s.wav)
  • expresso/ex04-ex02_disgusted_001_channel2_98s.wav (expresso/ex04-ex02_disgusted_001_channel2_98s.wav)
  • expresso/ex04-ex02_disgusted_004_channel1_169s.wav (expresso/ex04-ex02_disgusted_004_channel1_169s.wav)
  • expresso/ex04-ex02_enunciated_001_channel1_496s.wav (expresso/ex04-ex02_enunciated_001_channel1_496s.wav)
  • expresso/ex04-ex02_enunciated_001_channel2_898s.wav (expresso/ex04-ex02_enunciated_001_channel2_898s.wav)
  • expresso/ex04-ex02_fearful_001_channel1_316s.wav (expresso/ex04-ex02_fearful_001_channel1_316s.wav)
  • expresso/ex04-ex02_fearful_001_channel2_266s.wav (expresso/ex04-ex02_fearful_001_channel2_266s.wav)
  • expresso/ex04-ex02_happy_001_channel1_118s.wav (expresso/ex04-ex02_happy_001_channel1_118s.wav)
  • expresso/ex04-ex02_happy_001_channel2_140s.wav (expresso/ex04-ex02_happy_001_channel2_140s.wav)
  • expresso/ex04-ex02_laughing_001_channel1_147s.wav (expresso/ex04-ex02_laughing_001_channel1_147s.wav)
  • expresso/ex04-ex02_laughing_001_channel2_159s.wav (expresso/ex04-ex02_laughing_001_channel2_159s.wav)
  • expresso/ex04-ex02_nonverbal_004_channel1_18s.wav (expresso/ex04-ex02_nonverbal_004_channel1_18s.wav)
  • expresso/ex04-ex02_nonverbal_004_channel2_71s.wav (expresso/ex04-ex02_nonverbal_004_channel2_71s.wav)
  • expresso/ex04-ex02_sarcastic_001_channel1_519s.wav (expresso/ex04-ex02_sarcastic_001_channel1_519s.wav)
  • expresso/ex04-ex02_sarcastic_001_channel2_466s.wav (expresso/ex04-ex02_sarcastic_001_channel2_466s.wav)
  • expresso/ex04-ex03_default_001_channel1_3s.wav (expresso/ex04-ex03_default_001_channel1_3s.wav)
  • expresso/ex04-ex03_default_002_channel2_239s.wav (expresso/ex04-ex03_default_002_channel2_239s.wav)
  • expresso/ex04-ex03_enunciated_001_channel1_86s.wav (expresso/ex04-ex03_enunciated_001_channel1_86s.wav)
  • expresso/ex04-ex03_enunciated_001_channel2_342s.wav (expresso/ex04-ex03_enunciated_001_channel2_342s.wav)
  • expresso/ex04-ex03_fast_001_channel1_208s.wav (expresso/ex04-ex03_fast_001_channel1_208s.wav)
  • expresso/ex04-ex03_fast_001_channel2_25s.wav (expresso/ex04-ex03_fast_001_channel2_25s.wav)
  • expresso/ex04-ex03_projected_001_channel1_192s.wav (expresso/ex04-ex03_projected_001_channel1_192s.wav)
  • expresso/ex04-ex03_projected_001_channel2_179s.wav (expresso/ex04-ex03_projected_001_channel2_179s.wav)
  • expresso/ex04-ex03_whisper_001_channel1_198s.wav (expresso/ex04-ex03_whisper_001_channel1_198s.wav)
  • expresso/ex04-ex03_whisper_002_channel2_266s.wav (expresso/ex04-ex03_whisper_002_channel2_266s.wav)
  • unmute-prod-website/default_voice.wav (unmute-prod-website/default_voice.wav)
  • unmute-prod-website/degaulle-2.wav (unmute-prod-website/degaulle-2.wav)
  • unmute-prod-website/developpeuse-3.wav (unmute-prod-website/developpeuse-3.wav)
  • unmute-prod-website/ex04_narration_longform_00001.wav (unmute-prod-website/ex04_narration_longform_00001.wav)
  • unmute-prod-website/fabieng-enhanced-v2.wav (unmute-prod-website/fabieng-enhanced-v2.wav)
  • unmute-prod-website/p329_022.wav (unmute-prod-website/p329_022.wav)
  • vctk/p225_023.wav (vctk/p225_023.wav)
  • vctk/p226_023.wav (vctk/p226_023.wav)
  • vctk/p227_023.wav (vctk/p227_023.wav)
  • vctk/p228_023.wav (vctk/p228_023.wav)
  • vctk/p229_023.wav (vctk/p229_023.wav)
  • vctk/p230_023.wav (vctk/p230_023.wav)
  • vctk/p231_023.wav (vctk/p231_023.wav)
  • vctk/p232_023.wav (vctk/p232_023.wav)
  • vctk/p233_023.wav (vctk/p233_023.wav)
  • vctk/p234_023.wav (vctk/p234_023.wav)
  • vctk/p236_023.wav (vctk/p236_023.wav)
  • vctk/p237_023.wav (vctk/p237_023.wav)
  • vctk/p238_023.wav (vctk/p238_023.wav)
  • vctk/p239_023.wav (vctk/p239_023.wav)
  • vctk/p240_023.wav (vctk/p240_023.wav)
  • vctk/p241_023.wav (vctk/p241_023.wav)
  • vctk/p243_023.wav (vctk/p243_023.wav)
  • vctk/p244_023.wav (vctk/p244_023.wav)
  • vctk/p245_023.wav (vctk/p245_023.wav)
  • vctk/p246_023.wav (vctk/p246_023.wav)
  • vctk/p247_023.wav (vctk/p247_023.wav)
  • vctk/p248_023.wav (vctk/p248_023.wav)
  • vctk/p249_023.wav (vctk/p249_023.wav)
  • vctk/p250_023.wav (vctk/p250_023.wav)
  • vctk/p251_023.wav (vctk/p251_023.wav)
  • vctk/p252_023.wav (vctk/p252_023.wav)
  • vctk/p253_023.wav (vctk/p253_023.wav)
  • vctk/p254_023.wav (vctk/p254_023.wav)
  • vctk/p255_023.wav (vctk/p255_023.wav)
  • vctk/p256_023.wav (vctk/p256_023.wav)
  • vctk/p257_023.wav (vctk/p257_023.wav)
  • vctk/p258_023.wav (vctk/p258_023.wav)
  • vctk/p259_023.wav (vctk/p259_023.wav)
  • vctk/p260_023.wav (vctk/p260_023.wav)
  • vctk/p261_023.wav (vctk/p261_023.wav)
  • vctk/p262_023.wav (vctk/p262_023.wav)
  • vctk/p263_023.wav (vctk/p263_023.wav)
  • vctk/p264_023.wav (vctk/p264_023.wav)
  • vctk/p265_023.wav (vctk/p265_023.wav)
  • vctk/p266_023.wav (vctk/p266_023.wav)
  • vctk/p267_023.wav (vctk/p267_023.wav)
  • vctk/p269_023.wav (vctk/p269_023.wav)
  • vctk/p270_023.wav (vctk/p270_023.wav)
  • vctk/p271_023.wav (vctk/p271_023.wav)
  • vctk/p272_023.wav (vctk/p272_023.wav)
  • vctk/p273_023.wav (vctk/p273_023.wav)
  • vctk/p274_023.wav (vctk/p274_023.wav)
  • vctk/p275_023.wav (vctk/p275_023.wav)
  • vctk/p276_023.wav (vctk/p276_023.wav)
  • vctk/p277_023.wav (vctk/p277_023.wav)
  • vctk/p278_023.wav (vctk/p278_023.wav)
  • vctk/p279_023.wav (vctk/p279_023.wav)
  • vctk/p280_023.wav (vctk/p280_023.wav)
  • vctk/p281_023.wav (vctk/p281_023.wav)
  • vctk/p282_023.wav (vctk/p282_023.wav)
  • vctk/p283_023.wav (vctk/p283_023.wav)
  • vctk/p284_023.wav (vctk/p284_023.wav)
  • vctk/p285_023.wav (vctk/p285_023.wav)
  • vctk/p286_023.wav (vctk/p286_023.wav)
  • vctk/p287_023.wav (vctk/p287_023.wav)
  • vctk/p288_023.wav (vctk/p288_023.wav)
  • vctk/p292_023.wav (vctk/p292_023.wav)
  • vctk/p293_023.wav (vctk/p293_023.wav)
  • vctk/p294_023.wav (vctk/p294_023.wav)
  • vctk/p297_023.wav (vctk/p297_023.wav)
  • vctk/p298_023.wav (vctk/p298_023.wav)
  • vctk/p299_023.wav (vctk/p299_023.wav)
  • vctk/p300_023.wav (vctk/p300_023.wav)
  • vctk/p301_023.wav (vctk/p301_023.wav)
  • vctk/p302_023.wav (vctk/p302_023.wav)
  • vctk/p303_023.wav (vctk/p303_023.wav)
  • vctk/p304_023.wav (vctk/p304_023.wav)
  • vctk/p305_023.wav (vctk/p305_023.wav)
  • vctk/p306_023.wav (vctk/p306_023.wav)
  • vctk/p307_023.wav (vctk/p307_023.wav)
  • vctk/p308_023.wav (vctk/p308_023.wav)
  • vctk/p310_023.wav (vctk/p310_023.wav)
  • vctk/p311_023.wav (vctk/p311_023.wav)
  • vctk/p312_023.wav (vctk/p312_023.wav)
  • vctk/p313_023.wav (vctk/p313_023.wav)
  • vctk/p314_023.wav (vctk/p314_023.wav)
  • vctk/p315_023.wav (vctk/p315_023.wav)
  • vctk/p316_023.wav (vctk/p316_023.wav)
  • vctk/p317_023.wav (vctk/p317_023.wav)
  • vctk/p318_023.wav (vctk/p318_023.wav)
  • vctk/p323_023.wav (vctk/p323_023.wav)
  • vctk/p326_023.wav (vctk/p326_023.wav)
  • vctk/p329_023.wav (vctk/p329_023.wav)
  • vctk/p330_023.wav (vctk/p330_023.wav)
  • vctk/p333_023.wav (vctk/p333_023.wav)
  • vctk/p334_023.wav (vctk/p334_023.wav)
  • vctk/p335_023.wav (vctk/p335_023.wav)
  • vctk/p336_023.wav (vctk/p336_023.wav)
  • vctk/p339_023.wav (vctk/p339_023.wav)
  • vctk/p341_023.wav (vctk/p341_023.wav)
  • vctk/p343_023.wav (vctk/p343_023.wav)
  • vctk/p345_023.wav (vctk/p345_023.wav)
  • vctk/p347_023.wav (vctk/p347_023.wav)
  • vctk/p351_023.wav (vctk/p351_023.wav)
  • vctk/p360_023.wav (vctk/p360_023.wav)
  • vctk/p361_023.wav (vctk/p361_023.wav)
  • vctk/p363_023.wav (vctk/p363_023.wav)
  • vctk/p364_023.wav (vctk/p364_023.wav)
  • vctk/p374_023.wav (vctk/p374_023.wav)
  • vctk/p376_023.wav (vctk/p376_023.wav)
  • vctk/s5_023.wav (vctk/s5_023.wav)
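
Because the voice names encode the voice set and, for Expresso, the speaking style, a keyword filter over this list is straightforward. The sketch below shows one plausible matching rule (case-insensitive, every keyword must appear); the operator's actual Search Voices logic is not documented here, and search_voices is a hypothetical helper.

```python
def search_voices(voices, query):
    """Return the voices whose name contains every keyword in the query."""
    terms = query.lower().split()
    return [v for v in voices if all(t in v.lower() for t in terms)]


VOICES = [
    "expresso/ex01-ex02_whisper_001_channel1_579s.wav",
    "expresso/ex03-ex01_happy_001_channel1_334s.wav",
    "vctk/p225_023.wav",
]

print(search_voices(VOICES, "whisper"))
print(search_voices(VOICES, "expresso happy"))
```

An empty query returns the full list, which matches the expectation that clearing the search field restores every voice.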
Search Voices (Voicesearch) op('tts_kyutai').par.Voicesearch Str
Default:
"" (Empty String)
TTS Kyutai (Enginestatus) op('tts_kyutai').par.Enginestatus Str
Default:
"" (Empty String)
Streaming Mode (Streamingmode) op('tts_kyutai').par.Streamingmode Toggle
Default:
False
Temperature (Temperature) op('tts_kyutai').par.Temperature Float
Default:
0.0
Range:
0 to 1
Slider Range:
0 to 1
CFG Coefficient (Cfgcoef) op('tts_kyutai').par.Cfgcoef Float
Default:
0.0
Range:
0.5 to 4
Slider Range:
0.5 to 4
Padding Between (sec) (Paddingbetween) op('tts_kyutai').par.Paddingbetween Int
Default:
0
Range:
0 to 5
Slider Range:
0 to 5
Clear Queue (Clearqueue) op('tts_kyutai').par.Clearqueue Pulse
Default:
False
Stop Synthesis (Stopsynth) op('tts_kyutai').par.Stopsynth Pulse
Default:
False
Clear Audio Buffers (Clearaudio) op('tts_kyutai').par.Clearaudio Pulse

Clears all generated audio from memory and from the output CHOPs (store_output and full_audio).

Default:
False
Audio Device Settings
Reset Playback (Resetpulse) op('tts_kyutai').par.Resetpulse Pulse
Default:
False
Active (Audioactive) op('tts_kyutai').par.Audioactive Toggle
Default:
True
Driver (Driver) op('tts_kyutai').par.Driver Menu
Default:
default
Options:
default, asio
Device (Audiodevice) op('tts_kyutai').par.Audiodevice Menu
Default:
default
Options:
default, plus one entry per audio output device detected on the local machine. Entries are machine-specific and take the form {endpoint ID}||Device_Name||index, for example {0.0.0.00000000}.{d7b929aa-ec27-4f96-bd39-78d6a8c2044a}||Out_1-2_(MOTU_M_Series)||1
Volume (Volume) op('tts_kyutai').par.Volume Float
Default:
1.0
Range:
0 to 1
Slider Range:
0 to 1
Auto Save To Disk (Autosavetodisk) op('tts_kyutai').par.Autosavetodisk Toggle

Automatically save generated audio and metadata locally after successful synthesis.

Default:
False
Save Folder (Folder) op('tts_kyutai').par.Folder Folder

Folder where generated audio files and metadata are saved.

Default:
"" (Empty String)
Base Name (Name) op('tts_kyutai').par.Name Str

Base filename for saved audio. Use $TIMESTAMP for unique names.

Default:
"" (Empty String)
File Type (Filetype) op('tts_kyutai').par.Filetype Menu

Audio file format.

Default:
wav
Options:
wav, ogg
Auto Version Files (Autoversion) op('tts_kyutai').par.Autoversion Toggle

Automatically add _1, _2, etc. if filename exists.

Default:
False
Save Current Audio (Savefile) op('tts_kyutai').par.Savefile Pulse

Saves the audio currently in the output CHOP to a file using the settings above.

Default:
False
Dependencies Available (Installdependencies) op('tts_kyutai').par.Installdependencies Pulse
Default:
False
Model Repository (Modelrepo) op('tts_kyutai').par.Modelrepo Str
Default:
"" (Empty String)
Download Model (Downloadmodel) op('tts_kyutai').par.Downloadmodel Pulse
Default:
False
Voice Repository (Voicerepo) op('tts_kyutai').par.Voicerepo Str
Default:
"" (Empty String)
Download Voices (Downloadvoices) op('tts_kyutai').par.Downloadvoices Pulse
Default:
False
Worker Connection Settings
IPC Mode (Ipcmode) op('tts_kyutai').par.Ipcmode Menu
Default:
tcp
Options:
tcp, stdio
Monitor Worker Logs (stderr) (Monitorworkerlogs) op('tts_kyutai').par.Monitorworkerlogs Toggle
Default:
False
Auto Reattach On Init (Autoreattachoninit) op('tts_kyutai').par.Autoreattachoninit Toggle
Default:
False
Force Attach (Skip PID Check) (Forceattachoninit) op('tts_kyutai').par.Forceattachoninit Toggle
Default:
False
Worker Logging Level (Workerlogging) op('tts_kyutai').par.Workerlogging Menu
Default:
OFF
Options:
OFF, CRITICAL, ERROR, WARNING, INFO, DEBUG
Device (Device) op('tts_kyutai').par.Device Menu
Default:
auto
Options:
auto, cpu, cuda
v1.1.2 (2026-03-26)

Initial release

v1.1.1 (2026-03-01)
  • Fix TD 32050+ freeze by removing the moshi import at module level
      • Hardcode DEFAULT_DSM_TTS_REPO constants instead of importing them from moshi
      • Use importlib.metadata for dependency checking
  • Initial commit
v1.1.0 (2025-08-17)
  • NEW: TCP IPC Mode - Added robust TCP communication with worker processes (recommended over STDIO)
  • NEW: Auto Worker Reattach - Automatically reconnect to existing workers on TD restart/reload
  • NEW: TCP Heartbeat System - Automatic connection monitoring with reconnect on timeout
  • NEW: Sophisticated Audio Saving - Auto-save with metadata, versioning, and multiple formats (WAV/OGG)
  • NEW: Clear Audio Method - Clear audio buffers and CHOPs with one button
  • NEW: Manual Save Function - Save current audio with comprehensive metadata tracking
  • IMPROVED: Parameter Organization - Cleaned and reorganized parameter menus for better UX
  • IMPROVED: Method Naming - Renamed Synthesize to Texttospeech with optional text parameter
  • IMPROVED: Connection Reliability - Automatic TCP reconnection and worker process persistence
  • IMPROVED: Audio Management - Enhanced buffering and progressive CHOP updates