VAD Silero

v2.1.0 Updated

Overview

The VAD Silero LOP provides real-time Voice Activity Detection (VAD) using the Silero VAD model. It monitors an audio stream and determines whether someone is currently speaking, exposing speech state through parameters and CHOP-exportable dependencies. This operator is designed for low-latency applications such as triggering speech-to-text, controlling push-to-talk flows, or gating audio processing.

Audio is processed asynchronously in fixed 512-sample chunks (32ms at 16kHz). The operator uses a hysteresis threshold to avoid rapid toggling between speech and silence states.
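
The chunk arithmetic above can be checked with a short sketch. The `ChunkBuffer` class below is a hypothetical illustration of how arbitrary-length audio might be regrouped into fixed 512-sample chunks; it is not the operator's internal code, and only the 512-sample and 16kHz figures come from this page.

```python
SAMPLE_RATE_HZ = 16_000   # sample rate stated on this page
CHUNK_SAMPLES = 512       # chunk size stated on this page

# Duration of one chunk in milliseconds: 512 * 1000 / 16000 = 32 ms
CHUNK_MS = CHUNK_SAMPLES * 1000 / SAMPLE_RATE_HZ


class ChunkBuffer:
    """Accumulates samples and returns complete fixed-size chunks (illustrative)."""

    def __init__(self, chunk_samples=CHUNK_SAMPLES):
        self.chunk_samples = chunk_samples
        self._pending = []

    def push(self, samples):
        """Add samples; return a list of full chunks and keep any remainder."""
        self._pending.extend(samples)
        chunks = []
        while len(self._pending) >= self.chunk_samples:
            chunks.append(self._pending[:self.chunk_samples])
            del self._pending[:self.chunk_samples]
        return chunks

    @property
    def pending(self):
        """Number of samples waiting for the next full chunk."""
        return len(self._pending)


buf = ChunkBuffer()
chunks = buf.push([0.0] * 1300)   # 1300 samples = 2 full chunks + 276 left over
print(CHUNK_MS, len(chunks), buf.pending)
```

Keeping the remainder between calls means callers can push audio in whatever block size their source delivers.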

Requirements

Python package: silero-vad-lite — pulse Install Dependencies on the VAD Settings page to install it automatically.

Input/Output

Inputs

This operator has no wired inputs. Audio is received programmatically via the ReceiveAudioChunk method, which accepts a NumPy float32 array of 16kHz mono audio samples. Other LOPs (such as speech-to-text operators) call this method to feed audio data into the VAD pipeline.
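
As a rough sketch of preparing audio for ReceiveAudioChunk, the helper below converts int16 PCM to mono float32. The helper names `to_vad_samples` and `feed` are hypothetical, and the assumption that the method takes a single array argument is inferred from the description above rather than confirmed.

```python
import numpy as np


def to_vad_samples(pcm):
    """Convert PCM (mono, or frames x channels) to mono float32 samples."""
    pcm = np.asarray(pcm)
    if pcm.ndim == 2:
        pcm = pcm[:, 0]                       # keep the first channel only
    if pcm.dtype == np.int16:
        # Scale 16-bit integers into the [-1.0, 1.0) float range
        return pcm.astype(np.float32) / 32768.0
    return pcm.astype(np.float32)


def feed(vad, pcm):
    """Send prepared samples to `vad`, assumed to be the VAD Silero operator."""
    samples = to_vad_samples(pcm)
    vad.ReceiveAudioChunk(samples)            # single-argument call is an assumption
    return samples
```

Inside TouchDesigner this might be called as `feed(op('vad_silero'), pcm)`, provided the audio is already at 16kHz.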

Outputs

Output 1: Exposes three CHOP-monitorable dependencies:
- SpeechActive — True while speech is detected, False during silence
- OnSpeechStart — pulses True for one frame when speech begins
- OnSpeechEnd — pulses True for one frame when speech ends

The Is Speaking read-only toggle on the VAD Settings page also reflects the current speech state.
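
The relationship between the three outputs can be sketched as edge detection on the continuous state. The generator below is an illustration of that relationship, not the operator's implementation; `speech_pulses` is a hypothetical name.

```python
def speech_pulses(active_frames):
    """Yield (active, on_start, on_end) per frame from a sequence of booleans."""
    previous = False
    for active in active_frames:
        on_start = active and not previous    # rising edge: speech just began
        on_end = previous and not active      # falling edge: speech just ended
        yield active, on_start, on_end
        previous = active


frames = [False, True, True, False, False]
result = list(speech_pulses(frames))
for row in result:
    print(row)
```

Each pulse is True for exactly one frame, at the transition, which is why the pulses suit one-shot triggers while SpeechActive suits gating.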

Usage Examples

Basic Setup

Place a VAD Silero LOP in your network.
On the VAD Settings page, pulse Install Dependencies if this is the first time using the operator.
Pulse Load Model to load the Silero VAD model. The Model Ready indicator will turn on when the model finishes loading.
Toggle Active Monitor CHOPin1 to On to begin processing incoming audio.

Auto-Loading on Startup

Enable Auto Load on Init on the VAD Settings page.
The model will load automatically each time the project opens, so you do not need to manually pulse Load Model.

Tuning Detection Sensitivity

All sensitivity parameters update the running model in real time — no need to reload the model after changing them.

Speech Threshold — controls how confident the model must be that speech is present before triggering. Higher values reduce false positives but may miss quiet speech. The end-of-speech threshold is automatically set 0.15 below this value (hysteresis).
Min Silence Duration (ms) — the minimum length of silence before the operator considers speech to have ended. Increase this to avoid cutting off speakers who pause briefly.
Speech Pad (ms) — extra padding added to the start and end of detected speech segments. Useful for capturing the onset of words that begin softly.
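
To make the interaction between Speech Threshold and Min Silence Duration concrete, here is a simplified sketch of a hysteresis gate. It assumes per-chunk speech probabilities as input, uses illustrative default values, and omits speech padding; the real state machine in the operator may differ in detail. `HysteresisGate` is a hypothetical name.

```python
class HysteresisGate:
    """Simplified speech/silence gate with a lower end-of-speech threshold."""

    def __init__(self, threshold=0.5, min_silence_ms=100, chunk_ms=32):
        self.threshold = threshold
        self.neg_threshold = threshold - 0.15   # offset stated on this page
        self.min_silence_ms = min_silence_ms
        self.chunk_ms = chunk_ms
        self.speaking = False
        self._silence_ms = 0

    def update(self, prob):
        """Feed one chunk's speech probability; return the current speech state."""
        if prob >= self.threshold:
            # Confident speech: enter (or stay in) speech and reset the silence timer
            self.speaking = True
            self._silence_ms = 0
        elif self.speaking and prob < self.neg_threshold:
            # Confident silence: only end speech after enough of it accumulates
            self._silence_ms += self.chunk_ms
            if self._silence_ms >= self.min_silence_ms:
                self.speaking = False
                self._silence_ms = 0
        # Probabilities between the two thresholds leave the state unchanged
        return self.speaking


gate = HysteresisGate(threshold=0.5, min_silence_ms=64, chunk_ms=32)
probs = [0.2, 0.6, 0.4, 0.3, 0.6, 0.1, 0.1, 0.1]
states = [gate.update(p) for p in probs]
print(states)
```

In this run the dip to 0.4 does not end speech because it stays above the lower threshold, and the single 0.3 chunk is too short to satisfy the 64 ms silence requirement.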

Monitoring Speech State in Your Network

Use a CHOP Export or Expression to read the dependency values from the operator:

SpeechActive for a continuous on/off signal
OnSpeechStart for a single-frame pulse when speech begins
OnSpeechEnd for a single-frame pulse when speech ends

These can drive downstream logic such as starting a recording, triggering STT, or enabling an audio gate.
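
One possible shape for such downstream logic is a recorder that collects audio while speech is active and hands off a finished utterance when speech ends. The `UtteranceRecorder` class is a hypothetical sketch; how the speech state is read from the operator each frame is left out.

```python
class UtteranceRecorder:
    """Collects chunks while speech is active and stores each finished utterance."""

    def __init__(self):
        self.utterances = []      # completed utterances, each a list of chunks
        self._current = []        # chunks of the utterance in progress
        self._was_active = False

    def on_frame(self, speech_active, chunk):
        """Call once per frame with the current speech state and audio chunk."""
        if speech_active:
            self._current.append(chunk)
        elif self._was_active:
            # Falling edge: speech just ended, so close the utterance
            self.utterances.append(self._current)
            self._current = []
        self._was_active = speech_active


recorder = UtteranceRecorder()
states = [False, True, True, False, True, False]
for state, chunk in zip(states, "abcdef"):
    recorder.on_frame(state, chunk)
print(recorder.utterances)
```

A finished utterance could then be passed to a speech-to-text step instead of being stored.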

Troubleshooting

Model fails to load: Make sure dependencies are installed by pulsing Install Dependencies first. Check the Logger inside the operator for specific error messages.
No speech detected even though audio is playing: Verify the audio source is 16kHz mono. Multi-channel audio is automatically reduced to its first channel, but the sample rate is not converted and must already be 16kHz.
Speech toggles on and off rapidly: Increase Min Silence Duration (ms) to require a longer silence gap before the operator declares speech has ended. You can also raise Speech Threshold to require higher confidence.
Model Ready is off after toggling Active: The model must be loaded before activating. Pulse Load Model or enable Auto Load on Init.
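
If the source audio is not at 16kHz, it has to be resampled before it reaches the operator. The function below is a minimal sketch using plain linear interpolation with no anti-aliasing filter, so it is only a rough stand-in for a proper resampler; `resample_linear` is a hypothetical helper, not part of the operator.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono sequence by linear interpolation (no anti-alias filtering)."""
    if src_rate == dst_rate or len(samples) == 0:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate             # position in source samples
        left = int(pos)
        right = min(left + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


down = resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=32_000)
up = resample_linear([0.0, 1.0], src_rate=8_000)
print(down, up)
```

When downsampling from rates such as 44.1kHz or 48kHz, a filtered resampler from an audio library is likely to give the model cleaner input.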

Research & Licensing

Silero AI

Silero AI specializes in speech recognition and voice processing, creating enterprise-grade, production-ready speech models released as open source.

Silero VAD

Silero VAD is a pre-trained Voice Activity Detector that provides reliable speech detection with minimal computational requirements, suitable for real-time voice processing systems.

Technical Details

Lightweight ONNX runtime for cross-platform compatibility (Windows, Mac, Linux)
16kHz audio processing in 512-sample (32ms) chunks
Hysteresis-based threshold with configurable silence duration and speech padding

Research Impact

Production-ready VAD for commercial and creative applications
Open source alternative to commercial VAD solutions under MIT license
Real-time performance suitable for live audio and interactive installations

Citation

@misc{silero2024vad,
  author={Silero Team},
  title={Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD)},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/snakers4/silero-vad}},
  email={hello@silero.ai}
}

Key Research Contributions

Enterprise-grade Voice Activity Detection with high accuracy
Real-time processing optimized for low-latency applications
Pre-trained model requiring no additional training or fine-tuning

License

MIT License - This model is freely available for research and commercial use.

Parameters

VAD Settings

Is Speaking (Isspeaking) op('vad_silero').par.Isspeaking Toggle

Default: False

Active Monitor CHOPin1 (Active) op('vad_silero').par.Active Toggle

Default: False

Model Ready (Modelready) op('vad_silero').par.Modelready Toggle

Default: False

Load Model (Loadmodel) op('vad_silero').par.Loadmodel Pulse

Default: False

Unload Model (Unloadmodel) op('vad_silero').par.Unloadmodel Pulse

Default: False

Auto Load on Init (Autoloadoninit) op('vad_silero').par.Autoloadoninit Toggle

Default: False

Speech Threshold (Speechthreshold) op('vad_silero').par.Speechthreshold Float

Default: 0.0
Range: 0 to 1
Slider Range: 0 to 1

Min Silence Duration (ms) (Minsilenceduration) op('vad_silero').par.Minsilenceduration Int

Default: 0
Range: 0 to 1000
Slider Range: 0 to 1000

Speech Pad (ms) (Speechpadding) op('vad_silero').par.Speechpadding Int

Default: 0
Range: 0 to 500
Slider Range: 0 to 500

Install Dependencies (Installdependencies) op('vad_silero').par.Installdependencies Pulse

Default: False
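
The sensitivity parameters can also be set from a script. The sketch below clamps values to the ranges listed above before assigning them; `apply_sensitivity` is a hypothetical helper, and the `SimpleNamespace` stand-in exists only so the example runs outside TouchDesigner.

```python
from types import SimpleNamespace


def apply_sensitivity(vad, threshold, min_silence_ms, speech_pad_ms):
    """Clamp values to the documented ranges and assign the three parameters."""
    vad.par.Speechthreshold = min(max(threshold, 0.0), 1.0)
    vad.par.Minsilenceduration = int(min(max(min_silence_ms, 0), 1000))
    vad.par.Speechpadding = int(min(max(speech_pad_ms, 0), 500))


# Stand-in object for illustration; inside TouchDesigner pass op('vad_silero') instead.
fake_vad = SimpleNamespace(par=SimpleNamespace())
apply_sensitivity(fake_vad, threshold=0.6, min_silence_ms=1500, speech_pad_ms=30)
print(fake_vad.par.Speechthreshold,
      fake_vad.par.Minsilenceduration,
      fake_vad.par.Speechpadding)
```

Because these parameters update the running model live, no reload should be needed after calling such a helper.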

Changelog

v2.1.0 2026-02-18

Removed the PyTorch backend option; the operator is now ONNX-only
Fixed TD 32050+ compatibility (the ONNX backend does not import torch)
Added tox
Initial commit

v2.0.0 2025-12-12

Major Update: Cross-Platform ONNX Backend

New Features

Added ONNX Lite backend (silero-vad-lite) for cross-platform support (Windows/Mac/Linux)
Backend selector parameter to choose between ONNX Lite and PyTorch
Live parameter updates - threshold, min silence duration, and speech padding update in real-time without needing to reload

Changes

ONNX Lite is now the default backend (no PyTorch dependency required)
Removed min_speech_duration parameter (not in original Silero VADIterator, added latency)
Removed Apply Settings pulse (parameters now update live)
PyTorch backend label corrected to Windows-only (TouchDesigner doesn't run on Linux)

Technical

OnnxLiteVADIterator class replicates Silero VADIterator state machine for streaming
Both backends share identical parameters: Speechthreshold, Minsilenceduration, Speechpadding
Threshold hysteresis maintained (neg_threshold = threshold - 0.15)

v1.0.1 2025-07-01

Fixed frame dropping caused by the Parameter Execute DAT by restricting it to the parameters that need to trigger execution. The parameter name patterns are: Active *oad* *peech* *ilence*

v1.0.0 2025-06-29

Created the operator.

A simple operator that uses Silero VAD to detect speech and silence.

Added speech threshold, silence duration, and padding parameters, plus convenience parameters for status, loading, unloading, and downloading the model.

Requirements are torch and torchaudio.