VAD Silero
Overview
The VAD Silero LOP provides real-time Voice Activity Detection (VAD) using the Silero VAD model. It monitors an audio stream and determines whether someone is currently speaking, exposing speech state through parameters and CHOP-exportable dependencies. This operator is designed for low-latency applications such as triggering speech-to-text, controlling push-to-talk flows, or gating audio processing.
Audio is processed asynchronously in fixed 512-sample chunks (32ms at 16kHz). The operator uses a hysteresis threshold to avoid rapid toggling between speech and silence states.
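The chunk arithmetic above can be checked with a short sketch. The helper `split_into_chunks` is hypothetical and not part of the operator. It only illustrates how a 16kHz buffer divides into 512-sample chunks. Dropping the incomplete tail is an assumption made for this illustration, not a description of how the operator buffers leftovers.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, as stated above
CHUNK_SIZE = 512      # samples per chunk, as stated above

# Duration of one chunk in milliseconds: 512 * 1000 / 16000
CHUNK_MS = CHUNK_SIZE * 1000 / SAMPLE_RATE


def split_into_chunks(samples: np.ndarray) -> list:
    """Split a 1-D buffer into full 512-sample chunks (hypothetical helper).

    Any incomplete tail is dropped in this sketch.
    """
    n_full = len(samples) // CHUNK_SIZE
    return [samples[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] for i in range(n_full)]


# One second of silence yields 31 full chunks, with 128 samples left over.
one_second = np.zeros(SAMPLE_RATE, dtype=np.float32)
chunks = split_into_chunks(one_second)
```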
Requirements
- Python package: silero-vad-lite. Pulse Install Dependencies on the VAD Settings page to install it automatically.
Input/Output
Inputs
This operator has no wired inputs. Audio is received programmatically via the ReceiveAudioChunk method, which accepts a NumPy float32 array of 16kHz mono audio samples. Other LOPs operators (such as speech-to-text operators) call this method to feed audio data into the VAD pipeline.
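As a minimal sketch of preparing data for ReceiveAudioChunk, the snippet below converts 16-bit PCM samples to a float32 array. The helper name `pcm16_to_float32` is hypothetical, and scaling into the range -1.0 to 1.0 is an assumption about the expected amplitude convention; the source only specifies float32, 16kHz, mono.

```python
import numpy as np


def pcm16_to_float32(pcm16: np.ndarray) -> np.ndarray:
    """Convert mono int16 PCM samples to float32 (hypothetical helper).

    Assumes the VAD expects amplitudes scaled to roughly [-1.0, 1.0).
    """
    return np.asarray(pcm16).astype(np.float32) / 32768.0


pcm = np.array([0, 16384, -32768], dtype=np.int16)
chunk = pcm16_to_float32(pcm)

# Inside TouchDesigner, the array would then be handed to the operator:
# op('vad_silero').ReceiveAudioChunk(chunk)
```

The final call is left as a comment because `op` only exists inside TouchDesigner.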
Outputs
- Output 1: Exposes three CHOP-monitorable dependencies:
  - SpeechActive: True while speech is detected, False during silence
  - OnSpeechStart: pulses True for one frame when speech begins
  - OnSpeechEnd: pulses True for one frame when speech ends
The Is Speaking read-only toggle on the VAD Settings page also reflects the current speech state.
Usage Examples
Basic Setup
- Place a VAD Silero LOP in your network.
- On the VAD Settings page, pulse Install Dependencies if this is the first time using the operator.
- Pulse Load Model to load the Silero VAD model. The Model Ready indicator will turn on when the model finishes loading.
- Toggle Active to On to begin processing incoming audio.
Auto-Loading on Startup
- Enable Auto Load on Init on the VAD Settings page.
- The model will load automatically each time the project opens, so you do not need to manually pulse Load Model.
Tuning Detection Sensitivity
All sensitivity parameters update the running model in real time, so there is no need to reload the model after changing them.
- Speech Threshold — controls how confident the model must be that speech is present before triggering. Higher values reduce false positives but may miss quiet speech. The end-of-speech threshold is automatically set 0.15 below this value (hysteresis).
- Min Silence Duration (ms) — the minimum length of silence before the operator considers speech to have ended. Increase this to avoid cutting off speakers who pause briefly.
- Speech Pad (ms) — extra padding added to the start and end of detected speech segments. Useful for capturing the onset of words that begin softly.
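The interaction between Speech Threshold and Min Silence Duration can be illustrated with a simplified state machine. This is a sketch under stated assumptions, not the operator's actual implementation: the class name `HysteresisGate` and its default values are hypothetical, speech padding is omitted, and the silence timer here only counts chunks that fall below the end-of-speech threshold. The 0.15 offset and the 32ms chunk duration come from the descriptions above.

```python
from dataclasses import dataclass

CHUNK_MS = 32.0  # one 512-sample chunk at 16kHz


@dataclass
class HysteresisGate:
    """Simplified speech/silence gate (hypothetical, for illustration only)."""

    threshold: float = 0.5         # illustrative value, not the operator default
    min_silence_ms: float = 100.0  # illustrative value, not the operator default
    active: bool = False
    _silence_ms: float = 0.0

    def update(self, prob: float) -> bool:
        """Feed one per-chunk speech probability and return the speech state."""
        neg_threshold = self.threshold - 0.15  # end-of-speech threshold
        if prob >= self.threshold:
            # Confident speech: switch on and reset the silence timer.
            self.active = True
            self._silence_ms = 0.0
        elif self.active and prob < neg_threshold:
            # Confident silence while speaking: accumulate silence time.
            self._silence_ms += CHUNK_MS
            if self._silence_ms >= self.min_silence_ms:
                self.active = False
                self._silence_ms = 0.0
        # Probabilities between the two thresholds leave the state unchanged.
        return self.active


gate = HysteresisGate(threshold=0.5, min_silence_ms=64.0)
probs = [0.1, 0.6, 0.4, 0.3, 0.3, 0.7]
states = [gate.update(p) for p in probs]
```

In this run the value 0.4 does not end speech because it sits above the end-of-speech threshold of 0.35, and two consecutive chunks at 0.3 are needed to reach the 64ms silence requirement.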
Monitoring Speech State in Your Network
Use a CHOP Export or Expression to read the dependency values from the operator:
- SpeechActive for a continuous on/off signal
- OnSpeechStart for a single-frame pulse when speech begins
- OnSpeechEnd for a single-frame pulse when speech ends
These can drive downstream logic such as starting a recording, triggering STT, or enabling an audio gate.
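To make the relationship between the continuous signal and the two pulses concrete, the following sketch derives single-frame start and end pulses from a sequence of on/off values. The function `speech_edges` is hypothetical and runs outside TouchDesigner; it mirrors the described behavior rather than the operator's internal code.

```python
def speech_edges(active_frames):
    """Yield (active, started, ended) for each frame (hypothetical helper).

    started is True only on the frame where speech begins,
    ended is True only on the frame where speech stops.
    """
    previous = False
    for active in active_frames:
        started = active and not previous
        ended = previous and not active
        yield active, started, ended
        previous = active


frames = [False, True, True, False, False]
events = list(speech_edges(frames))
```

Downstream logic such as starting a recording would typically react to the start pulse, while the continuous value suits gating.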
Troubleshooting
- Model fails to load: Make sure dependencies are installed by pulsing Install Dependencies first. Check the Logger inside the operator for specific error messages.
- No speech detected even though audio is playing: Verify the audio source is 16kHz mono. Multi-channel audio is automatically downmixed to the first channel, but the sample rate must be 16kHz.
- Speech toggles on and off rapidly: Increase Min Silence Duration (ms) to require a longer silence gap before the operator declares speech has ended. You can also raise Speech Threshold to require higher confidence.
- Model Ready is off after toggling Active: The model must be loaded before activating. Pulse Load Model or enable Auto Load on Init.
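If the audio source is not already 16kHz mono, it needs converting before it reaches the operator. The sketch below is one possible approach, assuming a frames-by-channels array layout. The helper `to_16k_mono` is hypothetical, and linear interpolation via `np.interp` is a crude stand-in for a proper resampler because it applies no anti-aliasing filter.

```python
import numpy as np

TARGET_RATE = 16_000  # Hz, the rate required by the operator


def to_16k_mono(audio: np.ndarray, source_rate: int) -> np.ndarray:
    """Keep the first channel and resample to 16kHz (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio[:, 0]  # assumes shape (frames, channels)
    if source_rate == TARGET_RATE:
        return audio
    n_out = int(round(len(audio) * TARGET_RATE / source_rate))
    src_t = np.arange(len(audio)) / source_rate  # input sample times in seconds
    dst_t = np.arange(n_out) / TARGET_RATE       # output sample times in seconds
    return np.interp(dst_t, src_t, audio).astype(np.float32)


# One second of stereo audio at 48kHz becomes 16000 mono samples.
stereo_48k = np.zeros((48_000, 2), dtype=np.float32)
mono_16k = to_16k_mono(stereo_48k, 48_000)
```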
Research & Licensing
Silero AI
Silero AI specializes in speech recognition and voice processing, creating enterprise-grade, production-ready speech models released as open source.
Silero VAD
Silero VAD is a pre-trained Voice Activity Detector that provides reliable speech detection with minimal computational requirements, suitable for real-time voice processing systems.
Technical Details
- Lightweight ONNX runtime for cross-platform compatibility (Windows, Mac, Linux)
- 16kHz audio processing in 512-sample (32ms) chunks
- Hysteresis-based threshold with configurable silence duration and speech padding
Research Impact
- Production-ready VAD for commercial and creative applications
- Open source alternative to commercial VAD solutions under MIT license
- Real-time performance suitable for live audio and interactive installations
Citation
@misc{silero2024vad,
author={Silero Team},
title={Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD)},
year={2024},
publisher={GitHub},
journal={GitHub repository},
howpublished={\url{https://github.com/snakers4/silero-vad}},
email={hello@silero.ai}
}
Key Research Contributions
- Enterprise-grade Voice Activity Detection with high accuracy
- Real-time processing optimized for low-latency applications
- Pre-trained model requiring no additional training or fine-tuning
License
MIT License - This model is freely available for research and commercial use.
Parameters
VAD Settings
- op('vad_silero').par.Isspeaking (Toggle)
  - Default: False
- op('vad_silero').par.Active (Toggle)
  - Default: False
- op('vad_silero').par.Modelready (Toggle)
  - Default: False
- op('vad_silero').par.Loadmodel (Pulse)
  - Default: False
- op('vad_silero').par.Unloadmodel (Pulse)
  - Default: False
- op('vad_silero').par.Autoloadoninit (Toggle)
  - Default: False
- op('vad_silero').par.Speechthreshold (Float)
  - Default: 0.0
  - Range: 0 to 1
  - Slider Range: 0 to 1
- op('vad_silero').par.Minsilenceduration (Int)
  - Default: 0
  - Range: 0 to 1000
  - Slider Range: 0 to 1000
- op('vad_silero').par.Speechpadding (Int)
  - Default: 0
  - Range: 0 to 500
  - Slider Range: 0 to 500
- op('vad_silero').par.Installdependencies (Pulse)
  - Default: False
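When setting these parameters from a script, it can help to keep values inside the documented ranges. The sketch below is a hypothetical convenience, not part of the operator: `PARAM_RANGES` and `clamp_param` are names introduced here, and only the parameter names and ranges are taken from the list above.

```python
# Documented ranges for the three tunable parameters.
PARAM_RANGES = {
    "Speechthreshold": (0.0, 1.0),
    "Minsilenceduration": (0, 1000),  # milliseconds
    "Speechpadding": (0, 500),        # milliseconds
}


def clamp_param(name, value):
    """Clamp a value to the documented range for a parameter (hypothetical helper)."""
    low, high = PARAM_RANGES[name]
    return max(low, min(high, value))


settings = {"Speechthreshold": 1.2, "Minsilenceduration": 250, "Speechpadding": -10}
clamped = {name: clamp_param(name, value) for name, value in settings.items()}

# Inside TouchDesigner, a clamped value could then be assigned, for example:
# op('vad_silero').par.Speechthreshold = clamped["Speechthreshold"]
```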
Changelog
v2.1.0 (2026-02-18)
- Remove PyTorch backend option, ONNX-only now
- Fix TD 32050+ compatibility (ONNX does not import torch)
- add tox
- Initial commit
v2.0.0 (2025-12-12)
Major Update: Cross-Platform ONNX Backend
New Features
- Added ONNX Lite backend (silero-vad-lite) for cross-platform support (Windows/Mac/Linux)
- Backend selector parameter to choose between ONNX Lite and PyTorch
- Live parameter updates - threshold, min silence duration, and speech padding update in real-time without needing to reload
Changes
- ONNX Lite is now the default backend (no PyTorch dependency required)
- Removed min_speech_duration parameter (not in original Silero VADIterator, added latency)
- Removed Apply Settings pulse (parameters now update live)
- PyTorch backend label corrected to Windows-only (TouchDesigner doesn't run on Linux)
Technical
- OnnxLiteVADIterator class replicates Silero VADIterator state machine for streaming
- Both backends share identical parameters: Speechthreshold, Minsilenceduration, Speechpadding
- Threshold hysteresis maintained (neg_threshold = threshold - 0.15)
v1.0.1 (2025-07-01)
Fixed paraexec frame dropping by linking only specific parameters for execution: Active, *oad*, *peech*, *ilence*.
v1.0.0 (2025-06-29)
Created the operator.
A simple operator that uses Silero VAD to detect speech and silence.
Added speech threshold, silence duration, and padding parameters, plus convenience parameters for status, loading, unloading, and downloading the model.
Requirements are torch and torchaudio.