Gemini Live
The Gemini Live LOP provides real-time bidirectional voice conversation powered by Google’s Multimodal Live API. It streams microphone audio to Gemini and plays back the model’s spoken responses with ultra-low latency, while optionally sending video frames and text. Like the Agent LOP, it can discover and call tools from connected operators during a live conversation.
Key Features
- Real-time voice-to-voice conversation over WebSockets with automatic turn detection
- Three turn management modes: Auto VAD, Push to Talk, and Hybrid
- Video frame input (continuous streaming, manual pulse, or auto-send on connect)
- Text injection during a live conversation
- External tool orchestration via the same Tool sequence pattern used by the Agent LOP
- Built-in Google Search grounding
- Session history with save, load, and resumption support
- Auto-reconnect on server errors with configurable retry
Requirements
- Google API Key: A Gemini API key with Live API access. Enter it on the Config page or store it in ChatTD’s Key Manager under the name gemini.
- Python Dependency: The google-genai package. Pulse ‘Install/Update google-genai’ on the Config page to install it automatically.
Input/Output
Inputs
- Input 1 (Audio CHOP): Microphone audio fed to the operator via ReceiveAudioChunk. Typically a 16 kHz mono CHOP routed through an Audio Device In.
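As an illustration of the audio format involved, here is a minimal sketch that converts float samples (such as CHOP channel values in the range -1.0 to 1.0) into 16-bit little-endian PCM, which is the raw input format the Live API expects for 16 kHz mono audio. The helper names `floats_to_pcm16` and `chunk_duration_ms` are hypothetical; how the operator itself performs this conversion inside ReceiveAudioChunk is not documented here.

```python
import struct

def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clip first so out-of-range samples cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)

def chunk_duration_ms(pcm_bytes, sample_rate=16000):
    """Duration in milliseconds of a mono 16-bit PCM chunk (2 bytes per sample)."""
    return (len(pcm_bytes) // 2) * 1000.0 / sample_rate
```

A 100 ms chunk at 16 kHz mono is 1600 samples, or 3200 bytes.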
Outputs
- Output 1: Conversation table DAT (role, message, id, timestamp)
- Output 2: Current audio playback CHOP (store_output: latest turn’s audio at 24 kHz)
- Output 3: Full session audio CHOP (full_audio: accumulated audio across all turns)
- Output 4: Text output DAT (content from the output_text_content tool, when enabled)
Tool Integration
Gemini Live consumes tools from connected LOP operators using the same pattern as the Agent LOP. It does not expose a GetTool() method itself.
Connecting External Tools
- On the Tools page, enable ‘Use LOP Tools’.
- In the ‘External Op Tools’ sequence, add a block and drag the tool operator into the ‘OP’ field.
- Set ‘Mode’ per tool:
  - enabled: the model waits for the tool result before continuing (blocking).
  - enabled_nonblocking: the tool runs without pausing the conversation.
  - disabled: the tool is skipped.
- Start the conversation. The model will call tools as needed and incorporate the results into its spoken response.
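The difference between the three tool modes can be sketched with plain asyncio. This is an illustration of the blocking versus non-blocking idea only, not the operator's actual dispatch code; `dispatch_tool`, `demo`, and the `lookup` tool are hypothetical names.

```python
import asyncio

async def dispatch_tool(mode, tool, args, pending):
    """Run a tool according to its per-block mode.

    Returns the tool result for 'enabled', otherwise None.
    """
    if mode == "disabled":
        return None                      # the tool is skipped entirely
    if mode == "enabled":
        return await tool(**args)        # blocking: wait for the result
    if mode == "enabled_nonblocking":
        # Schedule the tool and return immediately; results are collected later.
        pending.append(asyncio.ensure_future(tool(**args)))
        return None
    raise ValueError("unknown tool mode: %r" % (mode,))

async def demo():
    async def lookup(city):              # hypothetical tool
        await asyncio.sleep(0.01)
        return "sunny in " + city

    pending = []
    blocking = await dispatch_tool("enabled", lookup, {"city": "Oslo"}, pending)
    deferred = await dispatch_tool("enabled_nonblocking", lookup, {"city": "Rome"}, pending)
    skipped = await dispatch_tool("disabled", lookup, {"city": "Lima"}, pending)
    late = await asyncio.gather(*pending)  # non-blocking results arrive here
    return blocking, deferred, skipped, late
```

In this sketch only the 'enabled' call holds up the caller; the non-blocking result becomes available once the scheduled task finishes.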
Built-in Tools
- Allow Model to Stop Conversation: When enabled, the model receives a stop_current_conversation tool it can call when the user says goodbye. The stop is deferred until the current turn completes so the model can finish speaking.
- Output Text (out4): When enabled, the model receives an output_text_content tool to display text (code snippets, data) in the fourth output DAT without reading it aloud.
- Enable Google Search (built in): Adds Google Search grounding so the model can look up current information during the conversation.
Usage Examples
Basic Voice Conversation
- Enter your Google API key on the Config page (or let ChatTD’s Key Manager supply it).
- On the Gemini Live page, choose a ‘Model’ and a ‘Voice’.
- Optionally enter a ‘System Prompt’ to set the assistant’s personality.
- Pulse ‘Start’. The status indicators will show the connection is live.
- Speak into your microphone. The model responds with voice in real time.
- Pulse ‘Stop’ to end the conversation.
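The same steps can be driven from a script. The sketch below assumes the component exposes the Systemprompt, Start, and Stop parameters listed in the Parameters section, and that they behave like ordinary TouchDesigner parameters (value assignment and pulse()). The function names are hypothetical, and the Model and Voice parameters are left out because their script names are not given on this page.

```python
def start_conversation(comp, system_prompt=""):
    """Optionally set the system prompt, then pulse Start on the component."""
    if system_prompt:
        comp.par.Systemprompt = system_prompt
    comp.par.Start.pulse()

def stop_conversation(comp):
    """Pulse Stop to end the conversation."""
    comp.par.Stop.pulse()

# Inside TouchDesigner this might be called as (path is an assumption):
#   start_conversation(op('gemini_live'), "You are a concise assistant.")
```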
Sending Video Frames
- On the Image page, enable ‘Enable Image Input’.
- Reference a TOP in the ‘TOP’ parameter (this feeds frame_null inside the component).
- Set ‘Image Send Mode’:
  - Stream (Continuous) sends frames at the interval set in ‘Stream Interval (sec)’.
  - Pulse (Manual) sends a single frame when you pulse ‘Send Image’.
  - Pulse + Send on Start sends one frame automatically when the conversation connects, then manual pulses afterward.
- Choose a resolution preset or select ‘Custom’ and set width/height.
- Start the conversation. The model can now see and discuss what is in the video feed.
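When setting the image options from a script, it can help to clamp values to the ranges listed in the Parameters section (0.1 to 10 seconds for Streaminterval, 64 to 2048 pixels for Customwidth and Customheight). A minimal sketch follows; `image_settings` and `apply_settings` are hypothetical helpers, and `apply_settings` assumes parameters accept plain attribute assignment.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))

def image_settings(stream_interval, width, height):
    """Return parameter values clamped to the documented ranges."""
    return {
        "Streaminterval": float(clamp(stream_interval, 0.1, 10.0)),
        "Customwidth": int(clamp(width, 64, 2048)),
        "Customheight": int(clamp(height, 64, 2048)),
    }

def apply_settings(comp, settings):
    """Write each value onto the matching parameter of the component."""
    for name, value in settings.items():
        setattr(comp.par, name, value)
```

For example, `image_settings(0.01, 4096, 32)` is pulled back to the nearest allowed values before anything is written to the component.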
Resuming a Previous Session
- On the History page, enable ‘Enable Session History’.
- Select a saved session from the ‘Session to Load’ menu, then pulse ‘Load Session’. The conversation table populates with the previous messages.
- If the session is recent enough (within 12 hours), ‘Resume Session’ is automatically enabled and the Google session resumption handle is restored.
- Pulse ‘Start’. The conversation continues with full context from the previous session.
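The 12-hour resumption window can be sketched as a small age check. This is an illustration under stated assumptions, not the operator's code: the saved time may be a Unix timestamp or an ISO 8601 string (the changelog mentions both formats), naive ISO strings are treated as UTC, and the names `parse_saved_at` and `is_resumable` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RESUME_WINDOW = timedelta(hours=12)

def parse_saved_at(value):
    """Accept a Unix timestamp (int or float) or an ISO 8601 string."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed

def is_resumable(saved_at, now=None):
    """True when the session is less than 12 hours old (and not in the future)."""
    now = now or datetime.now(timezone.utc)
    age = now - parse_saved_at(saved_at)
    return timedelta(0) <= age < RESUME_WINDOW
```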
Turn Management
The ‘Turn Management Mode’ on the Gemini Live page controls how user speech is detected:
- Auto VAD: Audio streams continuously and the server detects speech start/end automatically. Fine-tune detection on the Config page under ‘Configure VAD’ (start/end sensitivity, prefix padding, silence duration).
- Push to Talk: Audio is only sent while the ‘Push to Talk’ toggle is active. Ideal for noisy environments.
- Hybrid: Auto VAD runs normally, but the ‘Push to Talk’ toggle can override it for manual control when needed.
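For orientation, the VAD options map onto the Live API's automatic activity detection settings. The sketch below builds that part of a connection config as a plain dict, clamping to the ranges listed for Prefixpaddingms (0 to 500) and Silencedurationms (100 to 5000). The field names follow the google-genai LiveConnectConfig as I understand it and should be checked against the installed SDK version; `vad_config` is a hypothetical helper, and how the operator builds its own config is an assumption.

```python
def vad_config(prefix_padding_ms=None, silence_duration_ms=None, disabled=False):
    """Build the realtime-input portion of a Live API config as a dict.

    disabled=True turns off server-side detection, which is the setup the API
    uses when the client signals activity manually (push-to-talk style).
    """
    detection = {"disabled": disabled}
    if prefix_padding_ms is not None:
        detection["prefix_padding_ms"] = max(0, min(500, int(prefix_padding_ms)))
    if silence_duration_ms is not None:
        detection["silence_duration_ms"] = max(100, min(5000, int(silence_duration_ms)))
    return {"realtime_input_config": {"automatic_activity_detection": detection}}
```

Leaving both values unset keeps the API defaults, which matches the behavior described for ‘Configure VAD’ when it is off.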
Playback
The Playback page controls audio output hardware:
- Driver: Default system driver or ASIO for low-latency output.
- Device: Select the specific audio output device.
- Volume: Adjust playback volume (0.0 to 1.0).
- Pulse ‘Clear Audio Buffers’ to flush all generated audio from memory and the output CHOPs.
Troubleshooting
- “google-genai not installed” popup on load: Pulse ‘Install/Update google-genai’ on the Config page, then restart TouchDesigner.
- Connection drops with 1011 errors: Enable ‘Enable Auto-Reconnect’ on the Config page. Set a reasonable ‘Reconnect Delay’ and ‘Max Reconnect Attempts’. The operator will automatically resume the session after a server error.
- Session resumption fails (1008 error): The session handle has expired. The operator clears the stale handle automatically and starts a fresh conversation on the next attempt.
- No audio output: Verify the correct ‘Device’ is selected on the Playback page and ‘Active’ is enabled under Audio Device Settings.
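The auto-reconnect behavior described for 1011 errors amounts to a bounded retry loop with a fixed delay. The following is a minimal sketch of that pattern, not the operator's implementation: `reconnect` and its `connect` callback are hypothetical, and a max_attempts of 0 is treated as unlimited, as the Maxreconnectattempts description states.

```python
import time

def reconnect(connect, delay=2.0, max_attempts=5, sleep=time.sleep):
    """Call connect() until it succeeds.

    Returns the number of attempts used. Re-raises the last ConnectionError
    once max_attempts is reached; max_attempts=0 means keep trying forever.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            connect()
            return attempts
        except ConnectionError:
            if max_attempts and attempts >= max_attempts:
                raise
            sleep(delay)  # wait before the next attempt
```

Passing `sleep` as an argument keeps the loop easy to test without real waiting.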
Parameters
Gemini Live
op('gemini_live').par.Status Str
- Default: "" (Empty String)
op('gemini_live').par.Statusconnected Toggle Connection status.
- Default: False
op('gemini_live').par.Statusconversationactive Toggle Conversation status.
- Default: False
op('gemini_live').par.Start Pulse
- Default: False
op('gemini_live').par.Pause Pulse Pause the current conversation. Will stop and save the session, then prepare for resumption.
- Default: False
op('gemini_live').par.Stop Pulse
- Default: False
op('gemini_live').par.Resumesession Toggle When enabled, Start will resume the loaded session. When disabled, Start will begin a fresh conversation with empty table and no Google session resumption.
- Default: False
op('gemini_live').par.Systemprompt Str System prompt for the conversation.
- Default: "" (Empty String)
op('gemini_live').par.Pushtotalk Toggle Toggle to control when audio is sent (works in Push to Talk, Manual Activity, and Hybrid modes).
- Default: False
op('gemini_live').par.Sendimage Pulse Pulse to send a single video frame (only works in Pulse mode).
- Default: False
op('gemini_live').par.Top TOP
- Default: "" (Empty String)
op('gemini_live').par.Sendtext Pulse Pulse to send text content from text_input DAT to the conversation.
- Default: False
op('gemini_live').par.Dat DAT
- Default: "" (Empty String)
op('gemini_live').par.Allowmodelstop Toggle If enabled, the model will be given a tool to stop the current conversation.
- Default: False
op('gemini_live').par.Outputtext Toggle Use this tool to display text content (like code, data, or long text) without speaking it aloud. This will appear in out4.
- Default: False
op('gemini_live').par.Enablegrounding Toggle Allow the model to use Google Search to improve accuracy and recency.
- Default: False
op('gemini_live').par.Usetools Toggle Enable external tool operators via Tool sequence blocks (similar to Agent operator).
- Default: False
op('gemini_live').par.Tool Sequence
- Default: 0
op('gemini_live').par.Tool0op OP
- Default: "" (Empty String)
op('gemini_live').par.Enableimage Toggle Enable video frame input to the Live API. Requires frame_null TOP inside component.
- Default: False
op('gemini_live').par.Streaminterval Float Time between frames when in Stream mode (seconds). Lower = more frames, higher bandwidth.
- Default: 0.0
- Range: 0.1 to 10
- Slider Range: 0.1 to 10
op('gemini_live').par.Customwidth Int Custom frame width (only used when Video Resolution is set to Custom).
- Default: 0
- Range: 64 to 2048
- Slider Range: 64 to 2048
op('gemini_live').par.Customheight Int Custom frame height (only used when Video Resolution is set to Custom).
- Default: 0
- Range: 64 to 2048
- Slider Range: 64 to 2048
Playback
op('gemini_live').par.Audioactive Toggle
- Default: True
op('gemini_live').par.Volume Float
- Default: 1.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('gemini_live').par.Clearaudio Pulse Clears all generated audio from memory and from the output CHOPs (store_output and full_audio).
- Default: False
History
op('gemini_live').par.Enablesessionhistory Toggle Save and load session history to/from JSON files for conversation recall.
- Default: False
op('gemini_live').par.Savesession Pulse Manually save current session to history file.
- Default: False
op('gemini_live').par.Loadsession Pulse Load a previous session from history. Shows file dialog.
- Default: False
op('gemini_live').par.Listsessions Pulse Display all saved sessions with detailed metadata.
- Default: False
op('gemini_live').par.Listallsessions Toggle
- Default: False
Config
op('gemini_live').par.Apikey Str Google API key for authentication.
- Default: "" (Empty String)
op('gemini_live').par.Installgooglegenai Pulse Installs or updates the 'google-genai' Python package using the project's Python Manager. This is required for the operator to function.
- Default: False
op('gemini_live').par.Enableusertranscription Toggle Enable transcription of user speech for conversation memory.
- Default: False
op('gemini_live').par.Enablesessionresumption Toggle Use session resumption for continuous conversation.
- Default: False
op('gemini_live').par.Enablecontextcompression Toggle Enable context window compression for longer sessions.
- Default: False
op('gemini_live').par.Audiosendinterval Float How often to send audio chunks to Google.
- Default: 0.0
- Range: 0.05 to 0.5
- Slider Range: 0.05 to 0.5
op('gemini_live').par.Enablevadconfig Toggle Enable custom Voice Activity Detection (VAD) settings. If False, API defaults are used.
- Default: False
op('gemini_live').par.Prefixpaddingms Int Milliseconds of audio to include before detected start of speech. Requires "Configure VAD".
- Default: 0
- Range: 0 to 500
- Slider Range: 0 to 500
op('gemini_live').par.Silencedurationms Int Milliseconds of silence to detect as end of speech. Requires "Configure VAD".
- Default: 0
- Range: 100 to 5000
- Slider Range: 100 to 5000
op('gemini_live').par.Languagecode Str Optional. Sets response language (e.g., en-US, es-ES). Overrides voice/model default if set. May not apply to all models.
- Default: "" (Empty String)
op('gemini_live').par.Enableautoreconnect Toggle Automatically reconnect and resume session when connection is lost due to server errors (like 1011 internal errors).
- Default: False
op('gemini_live').par.Reconnectdelay Float How long to wait before attempting to reconnect after a connection failure.
- Default: 0.0
- Range: 1 to 30
- Slider Range: 1 to 30
op('gemini_live').par.Maxreconnectattempts Int Maximum number of reconnection attempts before giving up. Set to 0 for unlimited attempts.
- Default: 0
- Range: 1 to 10
- Slider Range: 1 to 10
op('gemini_live').par.Reconnectattempts Int Current number of reconnection attempts (read-only status).
- Default: 0
- Range: 0 to 1
- Slider Range: 0 to 1
Changelog
v2.3.3 2026-03-01
- Update Live API models to current IDs (remove deprecated gemini-2.5-flash-preview-native-audio-dialog and gemini-2.5-flash-exp-native-audio-thinking-dialog)
- Add gemini-2.5-flash-native-audio-preview-12-2025 as current native audio model
- Change Model parameter from menu to strmenu for custom model ID entry
- Default model set to gemini-2.5-flash-native-audio-preview-12-2025
- Minor change - added reset - does not qualify for release
- Initial commit
v2.3.2
🔧 Major Fixes
Fixed MCP Server Integration Issues
- Tool Schema Validation: Resolved LiveConnectConfig validation errors that prevented MCP server tools from working
- Parameter Format Conversion: Fixed tool parameter type conversion issues causing external tool failures
- Schema Compatibility: Enhanced schema cleaning to handle Gemini Live API requirements
🛠 Technical Improvements
Schema Processing Enhancements
- Added automatic removal of additionalProperties and $schema fields from tool schemas
- Implemented missing items field detection and auto-repair for array-type parameters
- Fixed Any type compatibility issues by defaulting to str type for Live API
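The schema cleaning described in this list can be sketched as a small recursive function. This is an illustration of the idea only and not the actual `_clean_parameters_recursive()` method; `clean_schema` is a hypothetical name, and the fallback element type of string for arrays without items is an assumption.

```python
def clean_schema(schema):
    """Return a cleaned copy of a JSON-schema-like structure.

    Drops 'additionalProperties' and '$schema' keys at every level and adds a
    default 'items' entry to array types that lack one.
    """
    if isinstance(schema, list):
        return [clean_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {
        key: clean_schema(value)
        for key, value in schema.items()
        if key not in ("additionalProperties", "$schema")
    }
    if cleaned.get("type") == "array" and "items" not in cleaned:
        cleaned["items"] = {"type": "string"}  # assumed fallback element type
    return cleaned
```

The function builds a new structure rather than editing in place, so the original tool schema stays available for logging or comparison.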
Parameter Transformation System
- Enhanced Array Handling: Added support for multiline string to array conversion (newline-separated)
- Improved Type Inference: Parameters with items field now correctly inferred as array type even without explicit type: 'array'
- Better Format Support: Enhanced comma-separated string parsing and single-value array wrapping
- Added comprehensive parameter validation with helpful warning messages
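The array conversions listed above (newline-separated strings, comma-separated strings, and single-value wrapping) can be sketched as one coercion helper. This is a simplified illustration, not the actual `transform_tool_arguments()` method; `to_array` is a hypothetical name, and preferring newlines over commas when both are present is an assumption.

```python
def to_array(value):
    """Coerce a tool argument into a list."""
    if isinstance(value, list):
        return value                      # already an array
    if isinstance(value, str):
        # Prefer newline splitting for multiline strings, else split on commas.
        separator = "\n" if "\n" in value else ","
        parts = [part.strip() for part in value.split(separator)]
        return [part for part in parts if part]
    return [value]                        # wrap a single non-string value
```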
Tool Argument Processing
- Fixed string-to-array conversion for tool parameters like insert_column values
- Added support for multiline string parsing (common in tool_dat operations)
- Improved type coercion for integer, number, boolean, and object parameters
- Enhanced error reporting for parameter format mismatches
🐛 Bug Fixes
- Tool_DAT Integration: Fixed insert_column and other tool_dat operations failing due to parameter format issues
- MCP Server Tools: Resolved "missing field" errors preventing MCP tools from loading
- Live API Compatibility: Fixed callable function format issues in LiveConnectConfig
- Schema Validation: Eliminated pydantic validation errors for external tool schemas
📝 Developer Notes
All changes were made exclusively to GoogleVoice2VoiceEXT.py:
- Enhanced _clean_parameters_recursive() method for better schema cleaning
- Improved transform_tool_arguments() method with array type inference
- Added _validate_transformed_arguments() method for parameter validation
- Updated type mappings to use str instead of Any for Live API compatibility
✅ Verification
- MCP server tools now integrate successfully with Gemini Live API
- Tool_DAT operations (insert_column, replace_all_table, etc.) working correctly
- Parameter format conversions handle various input formats automatically
- Enhanced logging for debugging tool parameter issues
Impact: This release resolves all major MCP integration issues and tool parameter format problems, enabling full compatibility between external tools and the Gemini Live API.
v2.3.1 2025-09-01
Improved handling of nested arrays: they are now flattened into strings for Gemini Live, which avoids the depth limit of 2 for items in the Gemini Live API function-calling schema format.
v2.3.0 2025-08-18
New Features
- Added session history table with readable labels (date, duration, message count, resumable status)
- Added pause conversation functionality
- Added resume_status boolean column for easy session filtering
- Shortened parameter names: Start, Stop, Pause (instead of Startconversation, etc.)
Session Management Improvements
- Smart session resumption (only attempts recent sessions < 12 hours old)
- Always creates new session IDs even when resuming (preserves conversation history)
- Sessions sorted newest first in history table
- Compact label format: "Aug18 18:43 25msgs 18s" with "View Only" indicator for old sessions
Bug Fixes
- Fixed final message capture when assistant ends conversation (added 500ms + 100ms delays)
- Fixed session age parsing for both timestamp and ISO string formats
- Fixed table clearing logic to preserve loaded conversation history during resumption
- Improved error handling for expired session handles (1008 errors)
Technical Changes
- Enhanced session loading with age validation and resumption handle checks
- Improved logging transparency for session operations
- Model stop tool properly triggers session saving
v2.2.1 2025-08-01
- Fixed MCP tool integration: Array parameters now include required 'items' field in schema conversion
- Resolved "missing fie" error when using MCP tools with array parameters like search_terms
v2.2.0 2025-07-13
Added and fixed the Tool mode: each op tool block can now be set to blocking or non-blocking mode, rather than specifying this in the GetTool definition, since it is Gemini-specific. This also allows more specificity in choosing whether to wait for the tool to complete. Whether the agent actually waits may still depend on the system prompt, the instructions, or the nature of the tool used.
v2.1.0 2025-06-30
- Fixed the conversation table so it updates correctly, and added a live mode so the table updates live (par.Conversationupdate [menu: 'live', 'on_turn_complete']).
- Added a Playback parameter page that controls playback of the agent audio, plus a clear audio button to clear the audio buffers.
v2.0.0 2025-06-25
added video + text input
parity of tools between agent op
added textoutput internal tool.
v1.0.0 2025-05-25
Initial release