
VAD Silero

v2.1.0 (Updated)

The VAD Silero LOP provides real-time Voice Activity Detection (VAD) using the Silero VAD model. It monitors an audio stream and determines whether someone is currently speaking, exposing speech state through parameters and CHOP-exportable dependencies. This operator is designed for low-latency applications such as triggering speech-to-text, controlling push-to-talk flows, or gating audio processing.

Audio is processed asynchronously in fixed 512-sample chunks (32ms at 16kHz). The operator uses a hysteresis threshold to avoid rapid toggling between speech and silence states.
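The chunk arithmetic can be sketched as follows. This is a minimal illustration, not the operator's internal code: `split_into_chunks` is a hypothetical helper, and returning the trailing partial chunk separately is an assumption made for this sketch.

```python
SAMPLE_RATE = 16000  # Hz, the rate the operator expects
CHUNK_SIZE = 512     # samples per VAD chunk

# Duration of one chunk in milliseconds: 512 * 1000 / 16000 = 32 ms
CHUNK_MS = CHUNK_SIZE * 1000 / SAMPLE_RATE


def split_into_chunks(samples, chunk_size=CHUNK_SIZE):
    """Split a sequence of samples into full fixed-size chunks.

    Returns (chunks, remainder). The remainder is shorter than chunk_size
    and can be prepended to the next incoming buffer by the caller
    (an assumption of this sketch, not documented operator behavior).
    """
    full = len(samples) // chunk_size
    chunks = [samples[i * chunk_size:(i + 1) * chunk_size] for i in range(full)]
    remainder = samples[full * chunk_size:]
    return chunks, remainder
```

With 1100 input samples, this yields two full chunks and a 76-sample remainder.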

  • Python package: silero-vad-lite — pulse Install Dependencies on the VAD Settings page to install it automatically.

This operator has no wired inputs. Audio is received programmatically via the ReceiveAudioChunk method, which accepts a NumPy float32 array of 16kHz mono audio samples. Other LOPs (such as speech-to-text operators) call this method to feed audio data into the VAD pipeline.
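A minimal sketch of preparing audio for ReceiveAudioChunk is shown below. It assumes the source delivers 16kHz int16 PCM; `to_vad_samples` and `feed` are hypothetical helpers, and whether the caller must pre-split audio into 512-sample pieces is not stated here, so the sketch passes the buffer through as-is.

```python
import numpy as np


def to_vad_samples(pcm):
    """Convert int16 PCM (mono, or frames x channels) to float32 mono."""
    pcm = np.asarray(pcm)
    if pcm.ndim == 2:
        pcm = pcm[:, 0]  # keep only the first channel
    # Scale the int16 range to roughly [-1.0, 1.0) as float32
    return pcm.astype(np.float32) / 32768.0


def feed(vad, pcm):
    """Convert a PCM buffer and hand it to the VAD.

    `vad` is any object exposing ReceiveAudioChunk. Inside TouchDesigner
    this would be the operator itself, e.g. op('vad_silero')
    (the operator name is an assumption for this example).
    """
    samples = to_vad_samples(pcm)
    vad.ReceiveAudioChunk(samples)
    return len(samples)
```

The sample rate is not converted here; audio at any other rate would need resampling to 16kHz before this step.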

  • Output 1: Exposes three CHOP-monitorable dependencies:
    • SpeechActive: True while speech is detected, False during silence
    • OnSpeechStart: pulses True for one frame when speech begins
    • OnSpeechEnd: pulses True for one frame when speech ends

The Is Speaking read-only toggle on the VAD Settings page also reflects the current speech state.

  1. Place a VAD Silero LOP in your network.
  2. On the VAD Settings page, pulse Install Dependencies if this is the first time using the operator.
  3. Pulse Load Model to load the Silero VAD model. The Model Ready indicator will turn on when the model finishes loading.
  4. Toggle Active Monitor CHOPin1 to On to begin processing incoming audio.

To load the model automatically instead of pulsing Load Model by hand:

  1. Enable Auto Load on Init on the VAD Settings page.
  2. The model will load automatically each time the project opens, so you do not need to manually pulse Load Model.
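The same setup can be scripted through the operator's parameters. The sketch below is hedged: `request_model` and `activate_when_ready` are hypothetical helper names, and it assumes the parameters behave like standard TouchDesigner parameters (`pulse()`, `eval()`, and assignment). Because loading may take a moment, activation is a separate step to run on a later frame.

```python
def request_model(vad):
    """Pulse Load Model unless the model is already ready.

    Returns True if a load was requested, False if nothing was needed.
    """
    if vad.par.Modelready.eval():
        return False
    vad.par.Loadmodel.pulse()
    return True


def activate_when_ready(vad):
    """Switch Active on only once Model Ready is set.

    Returns True if monitoring was enabled, False if the model
    is not ready yet and the call should be retried later.
    """
    if not vad.par.Modelready.eval():
        return False
    vad.par.Active = True
    return True
```

In TouchDesigner, `vad` would be the operator, for example `op('vad_silero')`. Call `request_model` once, then call `activate_when_ready` on later frames until it returns True.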

All sensitivity parameters update the running model in real time — no need to reload the model after changing them.

  1. Speech Threshold — controls how confident the model must be that speech is present before triggering. Higher values reduce false positives but may miss quiet speech. The end-of-speech threshold is automatically set 0.15 below this value (hysteresis).
  2. Min Silence Duration (ms) — the minimum length of silence before the operator considers speech to have ended. Increase this to avoid cutting off speakers who pause briefly.
  3. Speech Pad (ms) — extra padding added to the start and end of detected speech segments. Useful for capturing the onset of words that begin softly.
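How Speech Threshold and Min Silence Duration interact can be illustrated with a simplified state machine. This is a sketch under stated assumptions, not the operator's actual implementation: the class name `HysteresisVAD` is hypothetical, the default values are illustrative only, silence is counted in whole 32ms chunks, and speech padding is not modeled. The 0.15 offset for the end-of-speech threshold is the one described above.

```python
class HysteresisVAD:
    """Simplified speech/silence state machine with a hysteresis band."""

    def __init__(self, threshold=0.5, min_silence_ms=100, chunk_ms=32):
        self.threshold = threshold
        self.neg_threshold = threshold - 0.15  # end-of-speech threshold
        self.min_silence_ms = min_silence_ms
        self.chunk_ms = chunk_ms
        self.speaking = False
        self.silence_ms = 0

    def update(self, prob):
        """Feed one per-chunk speech probability.

        Returns "start" when speech begins, "end" when it ends, else None.
        """
        if prob >= self.threshold:
            self.silence_ms = 0  # any confident speech resets the silence timer
            if not self.speaking:
                self.speaking = True
                return "start"
        elif prob < self.neg_threshold and self.speaking:
            self.silence_ms += self.chunk_ms
            if self.silence_ms >= self.min_silence_ms:
                self.speaking = False
                self.silence_ms = 0
                return "end"
        # Probabilities between the two thresholds leave the state unchanged
        return None
```

Raising `threshold` makes the "start" event harder to trigger, while raising `min_silence_ms` requires more consecutive low-probability chunks before "end" fires.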

Use a CHOP Export or Expression to read the dependency values from the operator:

  • SpeechActive for a continuous on/off signal
  • OnSpeechStart for a single-frame pulse when speech begins
  • OnSpeechEnd for a single-frame pulse when speech ends

These can drive downstream logic such as starting a recording, triggering STT, or enabling an audio gate.
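As one example of downstream logic, the sketch below collects audio between an OnSpeechStart pulse and the following OnSpeechEnd pulse. `SpeechRecorder` is a hypothetical class, and how the three values and the current audio chunk reach it (CHOP Export, expression, or a script) depends on your network and is left out.

```python
class SpeechRecorder:
    """Collects chunks between a start pulse and the following end pulse."""

    def __init__(self):
        self.recording = False
        self.buffer = []
        self.segments = []  # one list of chunks per finished utterance

    def on_frame(self, speech_start, speech_end, chunk):
        """Call once per frame with the two pulse values and the latest chunk."""
        if speech_start:
            self.recording = True
            self.buffer = []
        if self.recording:
            self.buffer.append(chunk)
        if speech_end and self.recording:
            self.segments.append(self.buffer)
            self.buffer = []
            self.recording = False
```

Each finished entry in `segments` could then be passed on, for instance to a speech-to-text step.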

  • Model fails to load: Make sure dependencies are installed by pulsing Install Dependencies first. Check the Logger inside the operator for specific error messages.
  • No speech detected even though audio is playing: Verify the audio source is 16kHz mono. Multi-channel audio is automatically reduced to its first channel, but the sample rate is not converted and must already be 16kHz.
  • Speech toggles on and off rapidly: Increase Min Silence Duration (ms) to require a longer silence gap before the operator declares speech has ended. You can also raise Speech Threshold to require higher confidence.
  • Model Ready is off after toggling Active: The model must be loaded before activating. Pulse Load Model or enable Auto Load on Init.

Research & Licensing

Silero AI

Silero AI specializes in speech recognition and voice processing, creating enterprise-grade, production-ready speech models released as open source.

Silero VAD

Silero VAD is a pre-trained Voice Activity Detector that provides reliable speech detection with minimal computational requirements, suitable for real-time voice processing systems.

Technical Details

  • Lightweight ONNX runtime for cross-platform compatibility (Windows, Mac, Linux)
  • 16kHz audio processing in 512-sample (32ms) chunks
  • Hysteresis-based threshold with configurable silence duration and speech padding

Research Impact

  • Production-ready VAD for commercial and creative applications
  • Open source alternative to commercial VAD solutions under MIT license
  • Real-time performance suitable for live audio and interactive installations

Citation

@misc{silero2024vad,
  author={Silero Team},
  title={Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD)},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/snakers4/silero-vad}},
  email={hello@silero.ai}
}

Key Research Contributions

  • Enterprise-grade Voice Activity Detection with high accuracy
  • Real-time processing optimized for low-latency applications
  • Pre-trained model requiring no additional training or fine-tuning

License

MIT License - This model is freely available for research and commercial use.

Is Speaking (Isspeaking) op('vad_silero').par.Isspeaking Toggle
  Default: False
Active Monitor CHOPin1 (Active) op('vad_silero').par.Active Toggle
  Default: False
Model Ready (Modelready) op('vad_silero').par.Modelready Toggle
  Default: False
Load Model (Loadmodel) op('vad_silero').par.Loadmodel Pulse
  Default: False
Unload Model (Unloadmodel) op('vad_silero').par.Unloadmodel Pulse
  Default: False
Auto Load on Init (Autoloadoninit) op('vad_silero').par.Autoloadoninit Toggle
  Default: False
Speech Threshold (Speechthreshold) op('vad_silero').par.Speechthreshold Float
  Default: 0.0
  Range: 0 to 1
  Slider Range: 0 to 1
Min Silence Duration (ms) (Minsilenceduration) op('vad_silero').par.Minsilenceduration Int
  Default: 0
  Range: 0 to 1000
  Slider Range: 0 to 1000
Speech Pad (ms) (Speechpadding) op('vad_silero').par.Speechpadding Int
  Default: 0
  Range: 0 to 500
  Slider Range: 0 to 500
Backend (Backend) op('vad_silero').par.Backend Menu
  Default: onnxlite
  Options: onnxlite, pytorch
Install Dependencies (Installdependencies) op('vad_silero').par.Installdependencies Pulse
  Default: False
v2.1.0 (2026-02-18)
  • Remove PyTorch backend option, ONNX-only now - Fix TD 32050+ compatibility (ONNX does not import torch)
  • add tox
  • Initial commit
v2.0.0 (2025-12-12)

Major Update: Cross-Platform ONNX Backend

New Features

  • Added ONNX Lite backend (silero-vad-lite) for cross-platform support (Windows/Mac/Linux)
  • Backend selector parameter to choose between ONNX Lite and PyTorch
  • Live parameter updates - threshold, min silence duration, and speech padding update in real-time without needing to reload

Changes

  • ONNX Lite is now the default backend (no PyTorch dependency required)
  • Removed min_speech_duration parameter (not in original Silero VADIterator, added latency)
  • Removed Apply Settings pulse (parameters now update live)
  • PyTorch backend label corrected to Windows-only (TouchDesigner doesn't run on Linux)

Technical

  • OnnxLiteVADIterator class replicates Silero VADIterator state machine for streaming
  • Both backends share identical parameters: Speechthreshold, Minsilenceduration, Speechpadding
  • Threshold hysteresis maintained (neg_threshold = threshold - 0.15)
v1.0.1 (2025-07-01)

Fixed Parameter Execute frame dropping by restricting execution to specific parameters, matched by these patterns: Active *oad* *peech* *ilence*

v1.0.0 (2025-06-29)

Created the operator.

Simple op that uses Silero VAD to detect speech and silence.

Added speech threshold, silence duration, and padding parameters, plus convenience parameters for status, loading, unloading, and downloading the model.

Requirements are torch and torchaudio.