Gemini Live

v2.3.3 (Updated)

The Gemini Live LOP provides real-time bidirectional voice conversation powered by Google’s Multimodal Live API. It streams microphone audio to Gemini and plays back the model’s spoken responses with ultra-low latency, while optionally sending video frames and text. Like the Agent LOP, it can discover and call tools from connected operators during a live conversation.

Key Features

Real-time voice-to-voice conversation over WebSockets with automatic turn detection
Three turn management modes: Auto VAD, Push to Talk, and Hybrid
Video frame input (continuous streaming, manual pulse, or auto-send on connect)
Text injection during a live conversation
External tool orchestration via the same Tool sequence pattern used by the Agent LOP
Built-in Google Search grounding
Session history with save, load, and resumption support
Auto-reconnect on server errors with configurable retry

Requirements

Google API Key: A Gemini API key with Live API access. Enter it on the Config page or store it in ChatTD’s Key Manager under the name gemini.
Python Dependency: The google-genai package. Pulse ‘Install/Update google-genai’ on the Config page to install it automatically.

Input/Output

Inputs

Input 1 (Audio CHOP): Microphone audio fed to the operator via ReceiveAudioChunk. Typically a 16 kHz mono CHOP routed through an Audio Device In.
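
As a rough guide to how much audio each send carries, the chunk size follows from the sample rate and the 'Audio Send Interval (sec)' setting on the Config page. The sketch below is only arithmetic: the function name is hypothetical, and the 2 bytes per sample (16-bit PCM) is an assumption, since this page states only 16 kHz mono.

```python
def chunk_size(send_interval_sec, sample_rate=16_000, bytes_per_sample=2):
    """Return (samples, bytes) contained in one mono audio chunk.

    bytes_per_sample=2 assumes 16-bit PCM, which is an assumption and is
    not stated on this page.
    """
    samples = round(send_interval_sec * sample_rate)
    return samples, samples * bytes_per_sample


# The documented interval range is 0.05 to 0.5 seconds.
shortest = chunk_size(0.05)
longest = chunk_size(0.5)
```

Shorter intervals mean smaller, more frequent chunks; longer intervals mean fewer, larger ones.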

Outputs

Output 1: Conversation table DAT (role, message, id, timestamp)
Output 2: Current audio playback CHOP (store_output — latest turn’s audio at 24 kHz)
Output 3: Full session audio CHOP (full_audio — accumulated audio across all turns)
Output 4: Text output DAT (content from the output_text_content tool, when enabled)

Tool Integration

Gemini Live consumes tools from connected LOP operators using the same pattern as the Agent LOP. It does not expose a GetTool() method itself.

Connecting External Tools

On the Tools page, enable ‘Use LOP Tools’.
In the ‘External Op Tools’ sequence, add a block and drag the tool operator into the ‘OP’ field.
Set ‘Mode’ per tool:
- enabled — the model waits for the tool result before continuing (blocking).
- enabled_nonblocking — the tool runs without pausing the conversation.
- disabled — the tool is skipped.
Start the conversation. The model will call tools as needed and incorporate the results into its spoken response.
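
The first two steps can also be scripted. This is a minimal sketch, assuming the 'External Op Tools' sequence already contains one block; the helper name connect_tool is hypothetical, while Usetools and Tool0op are the parameter names listed in the Parameters section. The per-tool 'Mode' is left to the UI because its parameter name is not listed on this page.

```python
def connect_tool(gemini, tool_op_path):
    """Enable LOP tools and point the first tool block at a tool operator.

    `gemini` is the Gemini Live operator, e.g. op('gemini_live') when run
    inside TouchDesigner. `tool_op_path` is the path of the tool operator.
    """
    gemini.par.Usetools = True
    gemini.par.Tool0op = tool_op_path
```

Inside TouchDesigner this might be called as connect_tool(op('gemini_live'), '/project1/my_tool'), where the tool path is a placeholder.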

Built-in Tools

Allow Model to Stop Conversation: When enabled, the model receives a stop_current_conversation tool it can call when the user says goodbye. The stop is deferred until the current turn completes so the model can finish speaking.
Output Text (out4): When enabled, the model receives an output_text_content tool to display text (code snippets, data) in the fourth output DAT without reading it aloud.
Enable Google Search (built in): Adds Google Search grounding so the model can look up current information during the conversation.

Usage Examples

Basic Voice Conversation

Enter your Google API key on the Config page (or let ChatTD’s Key Manager supply it).
On the Gemini Live page, choose a ‘Model’ and a ‘Voice’.
Optionally enter a ‘System Prompt’ to set the assistant’s personality.
Pulse ‘Start’. The status indicators will show the connection is live.
Speak into your microphone. The model responds with voice in real time.
Pulse ‘Stop’ to end the conversation.
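
The same flow can be driven from a script. This is a sketch under the assumption that the component is reachable as op('gemini_live'), as in the Parameters section; the helper names are hypothetical, and pulse() is the standard TouchDesigner method for pulse parameters.

```python
def start_conversation(gemini, system_prompt=""):
    """Optionally set the system prompt, then pulse Start.

    `gemini` is the Gemini Live operator, e.g. op('gemini_live').
    """
    if system_prompt:
        gemini.par.Systemprompt = system_prompt
    gemini.par.Start.pulse()


def stop_conversation(gemini):
    """Pulse Stop to end the conversation."""
    gemini.par.Stop.pulse()
```

For example, start_conversation(op('gemini_live'), 'You are a concise studio assistant.') followed later by stop_conversation(op('gemini_live')).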

Sending Video Frames

On the Image page, enable ‘Enable Image Input’.
Reference a TOP in the ‘TOP’ parameter (this feeds frame_null inside the component).
Set ‘Image Send Mode’:
- Stream (Continuous) sends frames at the interval set in ‘Stream Interval (sec)’.
- Pulse (Manual) sends a single frame when you pulse ‘Send Image’.
- Pulse + Send on Start sends one frame automatically when the conversation connects, then manual pulses afterward.
Choose a resolution preset or select ‘Custom’ and set width/height.
Start the conversation. The model can now see and discuss what is in the video feed.
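
The Image page settings can be prepared in a script as well. The sketch below builds a dictionary of parameter values and clamps them to the ranges given in the Parameters section; the function names are hypothetical, and 'Image Send Mode' is omitted because its parameter name is not listed on this page.

```python
RESOLUTIONS = (
    "use_top", "1024x1024", "512x512", "256x256", "1920x1080",
    "1280x720", "854x480", "1024x768", "800x600", "custom",
)


def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def image_settings(top_path, stream_interval=1.0, resolution="use_top",
                   custom_size=None):
    """Return {parameter name: value} for the Image page."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution preset: {resolution!r}")
    settings = {
        "Enableimage": True,
        "Top": top_path,
        "Imageresolution": resolution,
        "Streaminterval": clamp(float(stream_interval), 0.1, 10.0),
    }
    if resolution == "custom":
        if custom_size is None:
            raise ValueError("custom resolution needs a (width, height) pair")
        width, height = custom_size
        settings["Customwidth"] = clamp(int(width), 64, 2048)
        settings["Customheight"] = clamp(int(height), 64, 2048)
    return settings


def apply_settings(gemini, settings):
    """Write each value onto the operator's parameters."""
    for name, value in settings.items():
        setattr(gemini.par, name, value)
```

Keeping the validation separate from apply_settings makes it possible to check the values before touching the operator.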

Resuming a Previous Session

On the History page, enable ‘Enable Session History’.
Select a saved session from the ‘Session to Load’ menu, then pulse ‘Load Session’. The conversation table populates with the previous messages.
If the session is recent enough (within 12 hours), ‘Resume Session’ is automatically enabled and the Google session resumption handle is restored.
Pulse ‘Start’. The conversation continues with full context from the previous session.
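
The 12-hour rule can be illustrated with a short age check. This is a sketch, not the operator's own code: the function names are hypothetical, timestamps without a timezone are assumed to be UTC, and both Unix timestamps and ISO strings are accepted because the changelog mentions both formats.

```python
from datetime import datetime, timedelta, timezone

RESUME_WINDOW = timedelta(hours=12)


def parse_saved_at(value):
    """Accept a Unix timestamp (int or float) or an ISO 8601 string."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_resumable(saved_at, now=None):
    """Return True when the session is younger than the resume window."""
    now = now or datetime.now(timezone.utc)
    age = now - parse_saved_at(saved_at)
    return timedelta(0) <= age < RESUME_WINDOW
```

Sessions that fail this check can still be loaded for viewing; only the resumption handle is considered too old to reuse.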

Turn Management

The ‘Turn Management Mode’ on the Gemini Live page controls how user speech is detected:

Auto VAD: Audio streams continuously and the server detects speech start/end automatically. Fine-tune detection on the Config page under ‘Configure VAD’ (start/end sensitivity, prefix padding, silence duration).
Push to Talk: Audio is only sent while the ‘Push to Talk’ toggle is active. Ideal for noisy environments.
Hybrid: Auto VAD runs normally, but the ‘Push to Talk’ toggle can override it for manual control when needed.
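
Switching modes and tuning VAD from a script might look like the sketch below. The helper names are hypothetical; the parameter names, menu values, and ranges come from the Parameters section.

```python
TURN_MODES = ("auto_vad", "push_to_talk", "hybrid")


def set_turn_mode(gemini, mode):
    """Select a turn management mode by its menu value."""
    if mode not in TURN_MODES:
        raise ValueError(f"mode must be one of {TURN_MODES}, got {mode!r}")
    gemini.par.Turnmode = mode


def hold_to_talk(gemini, pressed):
    """Mirror a button state onto the Push to Talk toggle."""
    gemini.par.Pushtotalk = bool(pressed)


def configure_vad(gemini, prefix_padding_ms, silence_duration_ms):
    """Enable custom VAD settings, clamped to the documented ranges."""
    gemini.par.Enablevadconfig = True
    gemini.par.Prefixpaddingms = max(0, min(500, int(prefix_padding_ms)))
    gemini.par.Silencedurationms = max(100, min(5000, int(silence_duration_ms)))
```

In a noisy room, for instance, set_turn_mode(op('gemini_live'), 'push_to_talk') combined with hold_to_talk wired to a panel button keeps background sound from being sent.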

Playback

The Playback page controls audio output hardware:

Driver: Default system driver or ASIO for low-latency output.
Device: Select the specific audio output device.
Volume: Adjust playback volume (0.0 to 1.0).
Pulse ‘Clear Audio Buffers’ to flush all generated audio from memory and the output CHOPs.

Troubleshooting

“google-genai not installed” popup on load: Pulse ‘Install/Update google-genai’ on the Config page, then restart TouchDesigner.
Connection drops with 1011 errors: Enable ‘Enable Auto-Reconnect’ on the Config page. Set a reasonable ‘Reconnect Delay’ and ‘Max Reconnect Attempts’. The operator will automatically resume the session after a server error.
Session resumption fails (1008 error): The session handle has expired. The operator clears the stale handle automatically and starts a fresh conversation on the next attempt.
No audio output: Verify the correct ‘Device’ is selected on the Playback page and ‘Active’ is enabled under Audio Device Settings.
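
To make the auto-reconnect settings concrete, here is a sketch of a retry loop with a fixed delay and an attempt limit. It is an illustration of the idea, not the operator's internal logic: the connect callable and the function name are hypothetical, and a limit of 0 is treated as unlimited, following the 'Max Reconnect Attempts' description.

```python
import time


def reconnect(connect, delay_sec=2.0, max_attempts=3, sleep=time.sleep):
    """Call `connect` until it succeeds or the attempt limit is reached.

    connect:      hypothetical zero-argument callable that raises
                  ConnectionError when the connection fails.
    max_attempts: 0 means keep trying indefinitely.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            connect()
            return attempts
        except ConnectionError:
            if max_attempts and attempts >= max_attempts:
                raise  # give up and surface the last error
            sleep(delay_sec)  # wait before the next attempt
```

The sleep argument is injectable so the loop can be exercised without real waiting.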

Parameters

Gemini Live

Status (Status) op('gemini_live').par.Status Str

Default:: "" (Empty String)

Live Connection (Statusconnected) op('gemini_live').par.Statusconnected Toggle

Connection status.

Default:: False

Conversation Status (Statusconversationactive) op('gemini_live').par.Statusconversationactive Toggle

Conversation status.

Default:: False

Start (Start) op('gemini_live').par.Start Pulse

Default:: False

Pause (Pause) op('gemini_live').par.Pause Pulse

Pause the current conversation. Stops and saves the session, then prepares it for resumption.

Default:: False

Stop (Stop) op('gemini_live').par.Stop Pulse

Default:: False

Resume Session (Resumesession) op('gemini_live').par.Resumesession Toggle

When enabled, Start resumes the loaded session. When disabled, Start begins a fresh conversation with an empty table and no Google session resumption.

Default:: False

Model (Model) op('gemini_live').par.Model StrMenu

Google Live API model for the conversation. This is an editable menu: you can type a custom model ID if newer models become available.

Available models:

Gemini 2.0 Flash Live (gemini-2.0-flash-live-001): The standard Live API model, optimized for real-time bidirectional voice and video interactions.
- Input: Audio, video, and text
- Output: Text and audio
- 1M token context window
- Function calling support
- Google Search grounding
- Code execution capabilities
- Low-latency streaming
- Knowledge cutoff: August 2024

Gemini 2.5 Flash Native Audio (gemini-2.5-flash-native-audio-preview-12-2025): Native audio model for natural conversations.
- Input: Audio, video, and text
- Output: Audio and text (interleaved)
- High quality, natural conversational audio
- Thinking support (configurable via thinkingBudget)
- 128K token context window
- Function calling support
- Google Search grounding
- Knowledge cutoff: January 2025

For more model details, see: https://ai.google.dev/gemini-api/docs/models#live-api

Default:: "" (Empty String)

Menu Options:

Gemini 2.0 Flash Live (gemini-2.0-flash-live-001)
Gemini 2.5 Flash Native Audio (gemini-2.5-flash-native-audio-preview-12-2025)

System Prompt (Systemprompt) op('gemini_live').par.Systemprompt Str

System prompt for the conversation.

Default:: "" (Empty String)

Turn Management Mode (Turnmode) op('gemini_live').par.Turnmode Menu

Turn management mode controls how user speech is detected and sent to the API:
- Auto VAD: Continuous audio streaming with server-side Voice Activity Detection. The API automatically detects when you start and stop speaking.
- Push to Talk: Manual control via the Push to Talk toggle. Audio is only sent while the toggle is active. Ideal for noisy environments or precise control.
- Hybrid: Combines automatic VAD with a Push to Talk override. VAD works normally, but Push to Talk can force activity start/stop for manual control when needed.

Default:: auto_vad
Options:: auto_vad, push_to_talk, hybrid

Push to Talk (Pushtotalk) op('gemini_live').par.Pushtotalk Toggle

Toggle to control when audio is sent (works in Push to Talk and Hybrid modes).

Default:: False

Send Image (Sendimage) op('gemini_live').par.Sendimage Pulse

Pulse to send a single video frame (only works in Pulse mode).

Default:: False

TOP (Top) op('gemini_live').par.Top TOP

Default:: "" (Empty String)

Send Text (Sendtext) op('gemini_live').par.Sendtext Pulse

Pulse to send text content from text_input DAT to the conversation.

Default:: False

DAT (Dat) op('gemini_live').par.Dat DAT

Default:: "" (Empty String)

Tools

Gemini Live Tools Header

Allow Model to Stop Conversation (Allowmodelstop) op('gemini_live').par.Allowmodelstop Toggle

If enabled, the model will be given a tool to stop the current conversation.

Default:: False

Output Text (out4) (Outputtext) op('gemini_live').par.Outputtext Toggle

Use this tool to display text content (like code, data, or long text) without speaking it aloud. This will appear in out4.

Default:: False

Enable Google Search (built in) (Enablegrounding) op('gemini_live').par.Enablegrounding Toggle

Allow the model to use Google Search to improve accuracy and recency.

Default:: False

Use LOP Tools Header

Use LOP Tools (Usetools) op('gemini_live').par.Usetools Toggle

Enable external tool operators via Tool sequence blocks (similar to Agent operator).

Default:: False

External Op Tools (Tool) op('gemini_live').par.Tool Sequence

Default:: 0

OP (Tool0op) op('gemini_live').par.Tool0op OP

Default:: "" (Empty String)

Image

Enable Image Input (Enableimage) op('gemini_live').par.Enableimage Toggle

Enable video frame input to the Live API. Requires frame_null TOP inside component.

Default:: False

Resolution [ fit outside ] (Imageresolution) op('gemini_live').par.Imageresolution Menu

Resolution handling for video frames:
- Use TOP Resolution: Send frames at the resolution of the frame_null TOP
- 1024x1024: Square format (recommended max for Live API)
- 512x512: Square format (good balance of quality/bandwidth)
- 256x256: Square format (low bandwidth)
- 1920x1080: Full HD 16:9
- 1280x720: HD 16:9
- 854x480: SD 16:9
- 1024x768: 4:3 aspect ratio
- 800x600: 4:3 aspect ratio
- Custom: Use the custom width/height parameters

Default:: use_top
Options:: use_top, 1024x1024, 512x512, 256x256, 1920x1080, 1280x720, 854x480, 1024x768, 800x600, custom

Stream Interval (sec) (Streaminterval) op('gemini_live').par.Streaminterval Float

Time between frames when in Stream mode (seconds). Lower = more frames, higher bandwidth.

Default:: 0.0
Range:: 0.1 to 10
Slider Range:: 0.1 to 10

Custom Width (Customwidth) op('gemini_live').par.Customwidth Int

Custom frame width (only used when Resolution is set to Custom).

Default:: 0
Range:: 64 to 2048
Slider Range:: 64 to 2048

Custom Height (Customheight) op('gemini_live').par.Customheight Int

Custom frame height (only used when Resolution is set to Custom).

Default:: 0
Range:: 64 to 2048
Slider Range:: 64 to 2048

Playback

Audio Device Settings Header

Active (Audioactive) op('gemini_live').par.Audioactive Toggle

Default:: True

Device (Device) op('gemini_live').par.Device Menu

Default:: default
Options:: default, plus one entry per audio output device detected on the host machine (the list is machine-specific)

Volume (Volume) op('gemini_live').par.Volume Float

Default:: 1.0
Range:: 0 to 1
Slider Range:: 0 to 1

Clear Audio Buffers (Clearaudio) op('gemini_live').par.Clearaudio Pulse

Clears all generated audio from memory and from the output CHOPs (store_output and full_audio).

Default:: False

History

Enable Session History (Enablesessionhistory) op('gemini_live').par.Enablesessionhistory Toggle

Save and load session history to/from JSON files for conversation recall.

Default:: False

Save Session (Savesession) op('gemini_live').par.Savesession Pulse

Manually save current session to history file.

Default:: False

Load Session (Loadsession) op('gemini_live').par.Loadsession Pulse

Load a previous session from history. Shows file dialog.

Default:: False

List Sessions (Listsessions) op('gemini_live').par.Listsessions Pulse

Display all saved sessions with detailed metadata.

Default:: False

List All Sessions (Listallsessions) op('gemini_live').par.Listallsessions Toggle

Default:: False

Config

API Key (Apikey) op('gemini_live').par.Apikey Str

Google API key for authentication.

Default:: "" (Empty String)

Install/Update google-genai (Installgooglegenai) op('gemini_live').par.Installgooglegenai Pulse

Installs or updates the 'google-genai' Python package using the project's Python Manager. This is required for the operator to function.

Default:: False

Enable User Transcription (Enableusertranscription) op('gemini_live').par.Enableusertranscription Toggle

Enable transcription of user speech for conversation memory.

Default:: False

Enable Session Resumption (Enablesessionresumption) op('gemini_live').par.Enablesessionresumption Toggle

Use session resumption for continuous conversation.

Default:: False

Enable Context Compression (Enablecontextcompression) op('gemini_live').par.Enablecontextcompression Toggle

Enable context window compression for longer sessions.

Default:: False

Audio Send Interval (sec) (Audiosendinterval) op('gemini_live').par.Audiosendinterval Float

How often to send audio chunks to Google.

Default:: 0.0
Range:: 0.05 to 0.5
Slider Range:: 0.05 to 0.5

Configure VAD (Enablevadconfig) op('gemini_live').par.Enablevadconfig Toggle

Enable custom Voice Activity Detection (VAD) settings. If False, API defaults are used.

Default:: False

VAD Prefix Padding (ms) (Prefixpaddingms) op('gemini_live').par.Prefixpaddingms Int

Milliseconds of audio to include before detected start of speech. Requires "Configure VAD".

Default:: 0
Range:: 0 to 500
Slider Range:: 0 to 500

VAD Silence Duration (ms) (Silencedurationms) op('gemini_live').par.Silencedurationms Int

Milliseconds of silence to detect as end of speech. Requires "Configure VAD".

Default:: 0
Range:: 100 to 5000
Slider Range:: 100 to 5000

Language Code (BCP-47) (Languagecode) op('gemini_live').par.Languagecode Str

Optional. Sets response language (e.g., en-US, es-ES). Overrides voice/model default if set. May not apply to all models.

Default:: "" (Empty String)

Enable Auto-Reconnect (Enableautoreconnect) op('gemini_live').par.Enableautoreconnect Toggle

Automatically reconnect and resume session when connection is lost due to server errors (like 1011 internal errors).

Default:: False

Reconnect Delay (sec) (Reconnectdelay) op('gemini_live').par.Reconnectdelay Float

How long to wait before attempting to reconnect after a connection failure.

Default:: 0.0
Range:: 1 to 30
Slider Range:: 1 to 30

Max Reconnect Attempts (Maxreconnectattempts) op('gemini_live').par.Maxreconnectattempts Int

Maximum number of reconnection attempts before giving up. Set to 0 for unlimited attempts.

Default:: 0
Range:: 1 to 10
Slider Range:: 1 to 10

Current Reconnect Attempts (Reconnectattempts) op('gemini_live').par.Reconnectattempts Int

Current number of reconnection attempts (read-only status).

Default:: 0
Range:: 0 to 1
Slider Range:: 0 to 1

Changelog

v2.3.3 2026-03-01

Updated Live API models to current IDs, removing the deprecated gemini-2.5-flash-preview-native-audio-dialog and gemini-2.5-flash-exp-native-audio-thinking-dialog
Added gemini-2.5-flash-native-audio-preview-12-2025 as the current native audio model
Changed the Model parameter from menu to strmenu to allow custom model ID entry
Set the default model to gemini-2.5-flash-native-audio-preview-12-2025
Minor change - added reset - does not qualify for release
Initial commit

v2.3.2

🔧 Major Fixes

Fixed MCP Server Integration Issues

Tool Schema Validation: Resolved LiveConnectConfig validation errors that prevented MCP server tools from working
Parameter Format Conversion: Fixed tool parameter type conversion issues causing external tool failures
Schema Compatibility: Enhanced schema cleaning to handle Gemini Live API requirements

🛠 Technical Improvements

Schema Processing Enhancements

Added automatic removal of additionalProperties and $schema fields from tool schemas
Implemented missing items field detection and auto-repair for array-type parameters
Fixed Any type compatibility issues by defaulting to str type for Live API
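
The schema adjustments listed above can be pictured with a small recursive cleaner. This is a sketch of the idea only, not the `_clean_parameters_recursive()` implementation: the function name is hypothetical, and filling a missing items field with a string type is an assumption.

```python
UNSUPPORTED_KEYS = ("additionalProperties", "$schema")


def clean_schema(schema):
    """Return a cleaned copy of a JSON-schema-like structure.

    - drops additionalProperties and $schema at every level
    - adds an items entry to array parameters that lack one
    """
    if isinstance(schema, list):
        return [clean_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {
        key: clean_schema(value)
        for key, value in schema.items()
        if key not in UNSUPPORTED_KEYS
    }
    if cleaned.get("type") == "array" and "items" not in cleaned:
        # Assumption: default to string items when none are declared.
        cleaned["items"] = {"type": "string"}
    return cleaned
```

Because the function builds a new structure, the tool's original schema is left untouched.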

Parameter Transformation System

Enhanced Array Handling: Added support for multiline string to array conversion (newline-separated)
Improved Type Inference: Parameters with items field now correctly inferred as array type even without explicit type: 'array'
Better Format Support: Enhanced comma-separated string parsing and single-value array wrapping
Added comprehensive parameter validation with helpful warning messages
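
As an illustration of the array conversions described above, the sketch below coerces a single tool argument into a list. It is not the `transform_tool_arguments()` code: the function name is hypothetical, and preferring newlines over commas when both appear is an assumption.

```python
def to_array(value):
    """Coerce a tool argument into a list.

    - lists pass through unchanged
    - multiline strings split on newlines
    - other strings split on commas
    - any other single value is wrapped in a one-item list
    """
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        separator = "\n" if "\n" in value else ","
        parts = [part.strip() for part in value.split(separator)]
        return [part for part in parts if part]  # drop empty entries
    return [value]
```

A call such as to_array("red, green,blue") yields three trimmed strings, which is the shape an array-typed tool parameter expects.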

Tool Argument Processing

Fixed string-to-array conversion for tool parameters like insert_column values
Added support for multiline string parsing (common in tool_dat operations)
Improved type coercion for integer, number, boolean, and object parameters
Enhanced error reporting for parameter format mismatches

🐛 Bug Fixes

Tool_DAT Integration: Fixed insert_column and other tool_dat operations failing due to parameter format issues
MCP Server Tools: Resolved "missing field" errors preventing MCP tools from loading
Live API Compatibility: Fixed callable function format issues in LiveConnectConfig
Schema Validation: Eliminated pydantic validation errors for external tool schemas

📝 Developer Notes

All changes were made exclusively to GoogleVoice2VoiceEXT.py:

Enhanced _clean_parameters_recursive() method for better schema cleaning
Improved transform_tool_arguments() method with array type inference
Added _validate_transformed_arguments() method for parameter validation
Updated type mappings to use str instead of Any for Live API compatibility

✅ Verification

MCP server tools now integrate successfully with Gemini Live API
Tool_DAT operations (insert_column, replace_all_table, etc.) working correctly
Parameter format conversions handle various input formats automatically
Enhanced logging for debugging tool parameter issues

Impact: This release resolves all major MCP integration issues and tool parameter format problems, enabling full compatibility between external tools and the Gemini Live API.

v2.3.1 2025-09-01

Improved handling of nested arrays: they are now flattened into strings for Gemini Live to avoid the depth limit of 2 for items in the Gemini Live API function-calling schema format.

v2.3.0 2025-08-18

New Features

Added session history table with readable labels (date, duration, message count, resumable status)
Added pause conversation functionality
Added resume_status boolean column for easy session filtering
Shortened parameter names: Start, Stop, Pause (instead of Startconversation, etc.)

Session Management Improvements

Smart session resumption (only attempts recent sessions < 12 hours old)
Always creates new session IDs even when resuming (preserves conversation history)
Sessions sorted newest first in history table
Compact label format: "Aug18 18:43 25msgs 18s" with "View Only" indicator for old sessions

Bug Fixes

Fixed final message capture when assistant ends conversation (added 500ms + 100ms delays)
Fixed session age parsing for both timestamp and ISO string formats
Fixed table clearing logic to preserve loaded conversation history during resumption
Improved error handling for expired session handles (1008 errors)

Technical Changes

Enhanced session loading with age validation and resumption handle checks
Improved logging transparency for session operations
Model stop tool properly triggers session saving

v2.2.1 2025-08-01

Fixed MCP tool integration: Array parameters now include required 'items' field in schema conversion
Resolved "missing fie" error when using MCP tools with array parameters like search_terms

v2.2.0 2025-07-13

Added and fixed the Tool mode so that blocking or non-blocking behavior can be set per tool block, rather than in the GetTool definition, since it is Gemini-specific. This also gives finer control over whether to wait for a tool to complete. Whether the agent actually waits may still depend on the system prompt, the instructions, or the nature of the tool.

v2.1.0 2025-06-30

Fixed conversation table updates and added a live mode so the table updates as the conversation runs (par.Conversationupdate [menu: 'live', 'on_turn_complete']).
Added a Playback parameter page that controls playback of the agent audio, plus a Clear Audio button to clear the audio buffers.

v2.0.0 2025-06-25

Added video and text input
Tool parity with the Agent operator
Added the text output internal tool

v1.0.0 2025-05-25

Initial release
