Gemini Image Gen
The Gemini Image Gen LOP generates images using Google’s Gemini image generation models. It supports text-to-image and image-to-image workflows with configurable resolution (up to 4K), ten aspect ratio presets, and optional multi-modal prompt enrichment via a connected Context Grabber.
Agent Tool Integration
This operator exposes one tool that allows Agent and Gemini Live LOPs to generate images from text prompts with configurable resolution and aspect ratio.
Use the Tool Debugger operator to inspect exact tool definitions, schemas, and parameters.
When connected to an Agent LOP, the agent can call the generate_image tool to create images on demand. The tool description adapts based on the selected model — Gemini 3 Pro emphasizes reasoning capabilities while Gemini 2.5 Flash highlights editing, character consistency, and multi-image blending.
Two parameters on the Gemini page control how agent tool calls behave:
- Agent Execution Mode: Set to “Wait for Completion” to block the agent until the image is ready (the file path is returned), or “Background Processing” to start generation and return immediately.
- Agent Result Content: Controls what the agent receives back — “Status Only”, “File Path”, or “Path + Metadata” (includes dimensions, cost, model, and aspect ratio).
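To make the three Agent Result Content options concrete, here is a minimal Python sketch of how a tool result could be shaped for each mode. The function name `shape_tool_result`, the `job` dictionary, and every key and value in it are hypothetical; the operator's actual return format is not documented here and may differ.

```python
def shape_tool_result(job, mode):
    """Return an illustrative payload for one Agent Result Content mode.

    `job` is a hypothetical dict describing a finished generation job.
    """
    if mode == "Status Only":
        return {"status": job["status"]}
    if mode == "File Path":
        return {"status": job["status"], "file_path": job["file_path"]}
    if mode == "Path + Metadata":
        return {
            "status": job["status"],
            "file_path": job["file_path"],
            "metadata": {
                "dimensions": job["dimensions"],
                "cost": job["cost"],
                "model": job["model"],
                "aspect_ratio": job["aspect_ratio"],
            },
        }
    raise ValueError(f"Unknown result mode: {mode!r}")


# Hypothetical job record; all values are made up for illustration.
example_job = {
    "status": "complete",
    "file_path": "gemini_images/example.png",
    "dimensions": (1024, 1024),
    "cost": 0.039,
    "model": "gemini-2.5-flash-image",
    "aspect_ratio": "1:1",
}
print(shape_tool_result(example_job, "File Path"))
```

The point of the sketch is the trade-off: richer results give the agent more to reason about, at the price of a larger tool response.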
Requirements
- Gemini API Key: Obtain a key from Google AI Studio and enter it in the Gemini API Key field on the Gemini page. The key is stored securely in the ChatTD environment. You can also pulse Get API Key to open AI Studio in your browser.
- Python Packages: `google-genai`, `Pillow`, `opencv-python`, and `numpy`. Install these via the ChatTD Python Manager.
Input/Output
Inputs
- Prompt: Enter text directly in the Prompt field, or set Prompt Source to “Conversation Table” to read from a connected DAT with `role` and `message` columns.
- Input Image (Optional): Reference a TOP to provide a source image for image-to-image editing tasks.
- Context Grabber (Optional): Reference a Context Grabber operator to include its collected text and images in the generation prompt.
Outputs
- Conversation Table (`conversation_dat`): A role/content table compatible with downstream LOPs, logging each prompt and assistant response with image paths, timestamps, model, and cost.
- History Table (`history_dat`): Detailed log of every generation job including job ID, prompt, status, model, aspect ratio, image size, cost estimate, dimensions, and file paths.
- Image Viewer: Displays the generated image selected by the Display Image slider.
- Image Files: Generated images are saved as PNG files in the configured Output Directory (or a default location within the ChatTD environment).
Usage Examples
Basic Image Generation
- On the Gemini page, enter your Gemini API key in the Gemini API Key field.
- Select a model from the Model menu — Gemini 2.5 Flash Image for fast generation, or Gemini 3 Pro Image for higher quality with reasoning.
- Choose an Image Size (1K, 2K, or 4K) and an Aspect Ratio preset.
- Enter your prompt in the Prompt field.
- Pulse Generate Image to start generation.
- Monitor the Status field. Once complete, the image appears in the operator’s viewer.
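The same steps can also be driven from a script. The sketch below assumes the operator is reachable as `op('geminiimagegen')` and touches only the `Prompt` and `Generate` parameters listed in the Parameters section. The helper name `request_image` is hypothetical, and the small stand-in classes exist only so the snippet runs outside TouchDesigner.

```python
from types import SimpleNamespace


def request_image(gen_op, prompt):
    """Set the prompt and pulse Generate on a Gemini Image Gen operator."""
    if not prompt.strip():
        raise ValueError("Prompt must not be empty")
    gen_op.par.Prompt = prompt
    gen_op.par.Generate.pulse()


class _FakePulse:
    """Stand-in for a pulse parameter; counts how often it was pulsed."""

    def __init__(self):
        self.count = 0

    def pulse(self):
        self.count += 1


# Stand-in operator so the example can run outside TouchDesigner.
fake_op = SimpleNamespace(par=SimpleNamespace(Prompt="", Generate=_FakePulse()))
request_image(fake_op, "A lighthouse at dusk, watercolor style")

# Inside TouchDesigner, the equivalent call would be:
# request_image(op('geminiimagegen'), "A lighthouse at dusk, watercolor style")
```

After the pulse, watch the Status parameter as described in the steps above to see when the image is ready.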
Image-to-Image Editing
- Drag a TOP operator into the Input Image (Optional) parameter.
- Write an editing instruction in the Prompt field (e.g., “Make the sky purple and add northern lights”).
- Pulse Generate Image. The model receives both the input image and your prompt.
Using a Context Grabber
- Drag a configured Context Grabber operator into the Context Grabber (Optional) parameter.
- The text and images collected by the Context Grabber are automatically prepended to your prompt.
- Enter any additional instructions in the Prompt field if needed.
- Pulse Generate Image.
Generating from a Conversation Table
- Set Prompt Source to “Conversation Table”.
- Connect a Table DAT with `role` and `message` columns to the operator’s first input.
- Pulse Generate Image. The operator concatenates user and assistant messages from the table into a single prompt.
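As a rough illustration of the concatenation step, the sketch below joins user and assistant rows into one prompt string. The `role: message` line format, the newline separator, and the function name `build_prompt` are assumptions; the operator's actual formatting may differ.

```python
def build_prompt(rows):
    """Join user/assistant messages from (role, message) rows into one prompt.

    `rows` mirrors a Table DAT: a header row followed by data rows.
    """
    header, *data = rows
    role_idx = header.index("role")
    msg_idx = header.index("message")
    lines = []
    for row in data:
        role = row[role_idx]
        message = row[msg_idx].strip()
        # Skip other roles (for example "system") and empty messages.
        if role in ("user", "assistant") and message:
            lines.append(f"{role}: {message}")
    return "\n".join(lines)


table = [
    ["role", "message"],
    ["system", "You are a helpful assistant."],
    ["user", "Draw a red fox in the snow."],
    ["assistant", "Any particular style?"],
    ["user", "Flat vector illustration."],
]
print(build_prompt(table))
```

If the result is an empty string, the table held no usable user or assistant messages, which corresponds to the empty-prompt case covered under Troubleshooting.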
Using with an Agent
- Create an Agent operator and connect the Gemini Image Gen operator to it as a tool.
- Set Agent Execution Mode to “Wait for Completion” so the agent receives the generated file path.
- Set Agent Result Content to “File Path” or “Path + Metadata” depending on how much detail the agent needs.
- Ask the agent to generate an image; it will call the `generate_image` tool automatically.
Model Comparison
| Feature | Gemini 2.5 Flash Image | Gemini 3 Pro Image |
|---|---|---|
| Speed | Fast | Slower (reasoning) |
| Max Resolution | 1K | 4K |
| Text Rendering | Strong | Exceptional |
| Character Consistency | Yes | Yes |
| Image Blending | Yes | — |
| Estimated Cost | ~$0.039/image | ~$0.134 (1K/2K), ~$0.24 (4K) |
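The estimated costs in the table can be turned into a quick budget check. This sketch hard-codes the approximate per-image figures from the table and uses the model identifiers listed in the changelog. The function name `estimate_cost` and the simple multiply-by-count approach are illustrative, and actual billing may differ.

```python
# Approximate per-image costs in USD, copied from the comparison table.
COST_PER_IMAGE = {
    ("gemini-2.5-flash-image", "1K"): 0.039,
    ("gemini-3-pro-image-preview", "1K"): 0.134,
    ("gemini-3-pro-image-preview", "2K"): 0.134,
    ("gemini-3-pro-image-preview", "4K"): 0.24,
}


def estimate_cost(model, image_size, count=1):
    """Return the estimated USD cost of `count` images, rounded to 3 decimals."""
    key = (model, image_size)
    if key not in COST_PER_IMAGE:
        raise ValueError(f"No cost estimate for {model} at {image_size}")
    return round(COST_PER_IMAGE[key] * count, 3)


print(estimate_cost("gemini-2.5-flash-image", "1K", count=100))
print(estimate_cost("gemini-3-pro-image-preview", "4K", count=10))
```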
Best Practices
- Use Gemini 2.5 Flash Image for rapid iteration and editing workflows where speed matters more than maximum resolution.
- Use Gemini 3 Pro Image when you need 2K/4K output or when the prompt is complex and benefits from the model’s reasoning capabilities.
- Set an Output Directory to organize generated images in a known location. If left empty, images are saved to the ChatTD environment’s `gemini_images` folder.
- When using the agent tool in “Wait for Completion” mode, the agent blocks until generation completes. For long-running 4K generations, consider “Background Processing” mode to keep the agent responsive.
Troubleshooting
- “google-genai package not installed”: Install the `google-genai` package via ChatTD’s Python Manager. The operator also requires `Pillow`, `opencv-python`, and `numpy`.
- “Gemini API key not set”: Enter a valid API key in the Gemini API Key field. The key is validated on entry and stored securely.
- API error responses: Check the operator’s Logger for detailed error messages. Common issues include invalid API keys, rate limiting, or content policy violations.
- Empty prompt errors: Ensure the Prompt field is not empty, or that the connected conversation table contains valid user/assistant messages when using “Conversation Table” as the prompt source.
- Higher resolutions not working: 2K and 4K output require Gemini 3 Pro Image. If using Gemini 2.5 Flash Image, the image size setting may be ignored by the API.
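The resolution restriction can be checked before a request is ever sent. The changelog notes that the operator calls the REST API with an `imageConfig` containing `aspectRatio` and `imageSize`; the sketch below builds a request body of that general shape. The nesting under `generationConfig`, the `responseModalities` field, the function name `build_request_body`, and the validation tables are assumptions for illustration, not the operator's actual code.

```python
ASPECT_RATIOS = {
    "1:1", "3:2", "2:3", "4:3", "3:4", "16:9", "9:16", "4:5", "5:4", "21:9",
}

# Image sizes per model, following the comparison table on this page.
SUPPORTED_SIZES = {
    "gemini-2.5-flash-image": {"1K"},
    "gemini-3-pro-image-preview": {"1K", "2K", "4K"},
}


def build_request_body(model, prompt, aspect_ratio="1:1", image_size="1K"):
    """Validate the settings and return a JSON-serializable request body."""
    if model not in SUPPORTED_SIZES:
        raise ValueError(f"Unknown model: {model}")
    if image_size not in SUPPORTED_SIZES[model]:
        raise ValueError(f"{model} does not support {image_size} output")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["IMAGE"],  # assumed field, see lead-in
            "imageConfig": {
                "aspectRatio": aspect_ratio,
                "imageSize": image_size,
            },
        },
    }


body = build_request_body(
    "gemini-3-pro-image-preview", "A city skyline at night", "16:9", "4K"
)
print(body["generationConfig"]["imageConfig"])
```

Failing early with a clear error is one way to avoid the silent fallback described above, where the API may ignore an unsupported size.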
Parameters
Gemini
- `op('geminiimagegen').par.Generate` (Pulse): Start the image generation process with the current settings. Default: `False`.
- `op('geminiimagegen').par.Onin1` (Toggle). Default: `False`.
- `op('geminiimagegen').par.Prompt` (Str): Enter your image generation prompt here. Default: `""` (empty string).
- `op('geminiimagegen').par.Inputimage` (TOP): Optionally provide an input TOP for image-to-image tasks. Default: `""` (empty string).
- `op('geminiimagegen').par.Contextgrabber` (COMP): Optionally provide a ContextGrabber operator to add its context (including images) to the prompt. Default: `""` (empty string).
- `op('geminiimagegen').par.Outputdir` (Folder): Directory to save generated images. If empty, will use default location in ChatTD directory. Default: `""` (empty string).
- `op('geminiimagegen').par.Status` (Str): Displays the current status of the image generator. Default: `""` (empty string).
- `op('geminiimagegen').par.Active` (Toggle). Default: `False`.
- `op('geminiimagegen').par.Displayimage` (Int). Default: `0`. Range: 1 to 1. Slider Range: 1 to 1.
- `op('geminiimagegen').par.Setdisplay` (Toggle). Default: `False`.
- `op('geminiimagegen').par.Apikey` (Str): Enter your Gemini API key from Google AI Studio. It will be stored securely. Default: `""` (empty string).
- `op('geminiimagegen').par.Getapikey` (Pulse): Opens Google AI Studio in your browser to get an API key. Default: `False`.
Changelog
v1.2.1 (2025-12-06)
Added
- `ResetOp()` method to clear all operator data (history table, conversation table, status parameters, logger, and internal state)
v1.2.0 (2025-11-29)
Major Changes
- Switched from LiteLLM to direct REST API calls for proper aspect ratio and resolution support
- Added Aspect Ratio parameter with support for: 1:1, 3:2, 2:3, 4:3, 3:4, 16:9, 9:16, 4:5, 5:4, 21:9
- Added Image Size parameter with support for: 1K (Standard), 2K (Enhanced), 4K (Professional)
Model Updates
- Updated model menu:
  - `gemini-2.5-flash-image`: Stable production model
  - `gemini-3-pro-image-preview`: Preview model with reasoning capabilities
Technical Changes
- Uses direct REST API (`generativelanguage.googleapis.com/v1beta`) instead of SDK
- Bypasses SDK v1.52.0 limitation where `image_config` parameter is not yet available
- Proper `imageConfig` support with `aspectRatio` and `imageSize` in REST payload
Cost Estimation
- Added cost estimation logging:
  - Gemini 2.5 Flash Image: $0.039 per image
  - Gemini 3 Pro Image 1K/2K: $0.134 per image
  - Gemini 3 Pro Image 4K: $0.24 per image
Dependency Changes
- Changed from `litellm` to `google-genai` package
- Added `requests` for direct API calls
v1.1.1 (2025-09-01)
- Added Nano Banana (Gemini 2.5 Flash image generation)
v1.1.0 (2025-06-30)
- Added `GetTool` method to the operator so it can be used by the LOPs controllers
v1.0.0 (2025-05-02)
- Initial release