Gemini Image Gen

v1.2.1 (Updated)

The Gemini Image Gen LOP generates images using Google’s Gemini image generation models. It supports text-to-image and image-to-image workflows with configurable resolution (up to 4K), ten aspect ratio presets, and optional multi-modal prompt enrichment via a connected Context Grabber.

Agent Tool Integration

GetTool enabled: 1 tool

This operator exposes one tool that allows Agent and Gemini Live LOPs to generate images from text prompts with configurable resolution and aspect ratio.

Use the Tool Debugger operator to inspect exact tool definitions, schemas, and parameters.

When connected to an Agent LOP, the agent can call the generate_image tool to create images on demand. The tool description adapts based on the selected model — Gemini 3 Pro emphasizes reasoning capabilities while Gemini 2.5 Flash highlights editing, character consistency, and multi-image blending.

Two parameters on the Gemini page control how agent tool calls behave:

Agent Execution Mode: Set to “Wait for Completion” to block the agent until the image is ready (the file path is returned), or “Background Processing” to start generation and return immediately.
Agent Result Content: Controls what the agent receives back — “Status Only”, “File Path”, or “Path + Metadata” (includes dimensions, cost, model, and aspect ratio).
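The interaction between these two settings can be sketched in plain Python. This is a hypothetical illustration, not the operator's code: `handle_tool_call`, `generate_fn`, and the result keys are assumptions, and the Tool Debugger shows the real tool definition.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical pool that runs "Background Processing" jobs off the caller's thread.
_executor = ThreadPoolExecutor(max_workers=2)

# Fields kept for each Agent Result Content option (key names are assumptions).
RESULT_FIELDS = {
    "Status Only": ("status",),
    "File Path": ("status", "file_path"),
    "Path + Metadata": ("status", "file_path", "width", "height",
                        "cost", "model", "aspect_ratio"),
}


def handle_tool_call(mode: str, content: str,
                     generate_fn: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run generate_fn per Agent Execution Mode, trim its result per Agent Result Content."""
    fields = RESULT_FIELDS[content]  # KeyError for an unknown option
    if mode == "Wait for Completion":
        result = generate_fn()  # blocks until the image file exists
        return {k: result[k] for k in fields if k in result}
    if mode == "Background Processing":
        # Return immediately; no file path is known yet.
        # A real implementation would keep the Future to report completion later.
        _executor.submit(generate_fn)
        return {"status": "started"}
    raise ValueError(f"unknown execution mode: {mode!r}")
```

In “Wait for Completion” mode the agent gets only the fields its Result Content option allows; in “Background Processing” mode it gets a started status regardless.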

Requirements

Gemini API Key: Obtain a key from Google AI Studio and enter it in the Gemini API Key field on the Gemini page. The key is stored securely in the ChatTD environment. You can also pulse Get API Key to open AI Studio in your browser.
Python Packages: google-genai, requests, Pillow, opencv-python, and numpy. Install these via the ChatTD Python Manager.
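To confirm the dependencies are importable from the Python environment TouchDesigner uses, a quick check like the following can help. It is a sketch: `missing_packages` is a hypothetical helper, the pip-name-to-module mapping is the standard one for these packages, and requests is included because the v1.2.0 changelog lists it.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name to import
REQUIRED: Dict[str, str] = {
    "google-genai": "google.genai",
    "requests": "requests",
    "Pillow": "PIL",
    "opencv-python": "cv2",
    "numpy": "numpy",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # For a dotted name such as "google.genai", find_spec imports the
            # parent package first and raises if that parent is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print(missing_packages() or "all dependencies found")
```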

Input/Output

Inputs

Prompt: Enter text directly in the Prompt field, or set Prompt Source to “Conversation Table” to read from a connected DAT with role and message columns.
Input Image (Optional): Reference a TOP to provide a source image for image-to-image editing tasks.
Context Grabber (Optional): Reference a Context Grabber operator to include its collected text and images in the generation prompt.

Outputs

Conversation Table (conversation_dat): A role/content table compatible with downstream LOPs, logging each prompt and assistant response with image paths, timestamps, model, and cost.
History Table (history_dat): Detailed log of every generation job including job ID, prompt, status, model, aspect ratio, image size, cost estimate, dimensions, and file paths.
Image Viewer: Displays the generated image selected by the Display Image slider.
Image Files: Generated images are saved as PNG files in the configured Output Directory (or a default location within the ChatTD environment).
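Per the v1.2.0 changelog, the operator calls the Gemini REST API directly. In that API, generated images come back base64-encoded in `inlineData` parts of the response, so saving them amounts to decoding those parts. A minimal sketch, assuming that response shape; `save_images` and the file-naming scheme are hypothetical, not the operator's own.

```python
import base64
import os
import time
from typing import Any, Dict, List

# Map response MIME types to file extensions; PNG is the expected case.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}


def save_images(response: Dict[str, Any], output_dir: str,
                prefix: str = "gemini") -> List[str]:
    """Decode every inline image in a generateContent response and write it to output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    paths: List[str] = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if not inline:
                continue  # text parts hold commentary, not pixels
            # Fall back to .png when the MIME type is missing or unrecognized.
            ext = EXTENSIONS.get(inline.get("mimeType", ""), ".png")
            path = os.path.join(output_dir, f"{prefix}_{stamp}_{len(paths)}{ext}")
            with open(path, "wb") as f:
                f.write(base64.b64decode(inline["data"]))
            paths.append(path)
    return paths
```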

Usage Examples

Basic Image Generation

On the Gemini page, enter your Gemini API key in the Gemini API Key field.
Select a model from the Model menu — Gemini 2.5 Flash Image for fast generation, or Gemini 3 Pro Image for higher quality with reasoning.
Choose an Image Size (1K, 2K, or 4K) and an Aspect Ratio preset.
Enter your prompt in the Prompt field.
Pulse Generate Image to start generation.
Monitor the Status field. Once complete, the image appears in the operator’s viewer.

Image-to-Image Editing

Drag a TOP operator into the Input Image (Optional) parameter.
Write an editing instruction in the Prompt field (e.g., “Make the sky purple and add northern lights”).
Pulse Generate Image. The model receives both the input image and your prompt.
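With the REST API the operator uses (per the v1.2.0 changelog), image-to-image means the request carries an image part alongside the text part. A minimal sketch, assuming the Input Image TOP has already been encoded to PNG bytes; `build_parts` is a hypothetical helper, and the camelCase field names follow Google's public Gemini REST format.

```python
import base64
from typing import Any, Dict, List, Optional


def build_parts(prompt: str, image_png: Optional[bytes] = None) -> List[Dict[str, Any]]:
    """Build the "parts" list for one user turn of a generateContent request."""
    parts: List[Dict[str, Any]] = [{"text": prompt}]
    if image_png is not None:
        # The source image travels inline as base64 text.
        parts.append({
            "inlineData": {
                "mimeType": "image/png",
                "data": base64.b64encode(image_png).decode("ascii"),
            }
        })
    return parts
```

Without an input image the same function produces a plain text-to-image request.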

Using a Context Grabber

Drag a configured Context Grabber operator into the Context Grabber (Optional) parameter.
The text and images collected by the Context Grabber are automatically prepended to your prompt.
Enter any additional instructions in the Prompt field if needed.
Pulse Generate Image.
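Conceptually, prepending context works like the following hypothetical sketch. The Context Grabber's actual output structure is not documented on this page, so `context_texts` and `context_images` are assumed to arrive as a list of strings and a list of encoded image bytes, and the blank-line separator is an arbitrary choice.

```python
from typing import List, Optional, Tuple


def merge_context(prompt: str, context_texts: List[str],
                  context_images: Optional[List[bytes]] = None) -> Tuple[str, List[bytes]]:
    """Prepend Context Grabber text to the user prompt and pass its images through."""
    # Context blocks come first, in the order they were collected.
    blocks = [t.strip() for t in context_texts if t and t.strip()]
    # The user's own prompt (if any) goes last.
    if prompt.strip():
        blocks.append(prompt.strip())
    return "\n\n".join(blocks), list(context_images or [])
```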

Generating from a Conversation Table

Set Prompt Source to “Conversation Table”.
Connect a Table DAT with role and message columns to the operator’s first input.
Pulse Generate Image. The operator concatenates user and assistant messages from the table into a single prompt.
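A sketch of that concatenation, assuming a header row with `role` and `message` columns and newline separators; the exact joining format the operator uses is not documented here, and `prompt_from_table` is hypothetical. Inside TouchDesigner the rows could be read with `[[c.val for c in r] for r in op('table1').rows()]`, where `table1` is a placeholder name.

```python
from typing import List, Sequence


def prompt_from_table(rows: Sequence[Sequence[str]]) -> str:
    """Join user/assistant messages from a role/message table into one prompt.

    rows[0] is the header row; columns are located by name.
    """
    header = [h.strip().lower() for h in rows[0]]
    role_col, msg_col = header.index("role"), header.index("message")
    lines: List[str] = []
    for row in rows[1:]:
        role = row[role_col].strip().lower()
        message = row[msg_col].strip()
        # Other roles (e.g. system) and blank messages are skipped.
        if role in ("user", "assistant") and message:
            lines.append(message)
    if not lines:
        raise ValueError("conversation table has no user/assistant messages")
    return "\n".join(lines)
```

Raising on an empty result mirrors the “Empty prompt errors” case under Troubleshooting.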

Using with an Agent

Create an Agent operator and connect the Gemini Image Gen operator to it as a tool.
Set Agent Execution Mode to “Wait for Completion” so the agent receives the generated file path.
Set Agent Result Content to “File Path” or “Path + Metadata” depending on how much detail the agent needs.
Ask the agent to generate an image — it will call the generate_image tool automatically.

Model Comparison

Feature	Gemini 2.5 Flash Image	Gemini 3 Pro Image
Speed	Fast	Slower (reasoning)
Max Resolution	1K	4K
Text Rendering	Strong	Exceptional
Character Consistency	Yes	Yes
Image Blending	Yes	—
Estimated Cost	~$0.039/image	~$0.134/image (1K/2K), ~$0.24/image (4K)
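The cost row reduces to a small lookup. This sketch uses the operator's logged estimates from this page, which may not match Google's current pricing; `estimate_cost` is a hypothetical helper that matches models by substring so the `gemini/` menu prefix does not matter.

```python
def estimate_cost(model: str, image_size: str = "1K", images: int = 1) -> float:
    """Rough USD estimate from the per-image figures in the table above."""
    if "2.5-flash-image" in model:
        per_image = 0.039
    elif "3-pro-image" in model:
        # 1K and 2K share a price; 4K costs more.
        per_image = 0.24 if image_size == "4K" else 0.134
    else:
        raise ValueError(f"no price listed for model {model!r}")
    return round(per_image * images, 4)
```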

Best Practices

Use Gemini 2.5 Flash Image for rapid iteration and editing workflows where speed matters more than maximum resolution.
Use Gemini 3 Pro Image when you need 2K/4K output or when the prompt is complex and benefits from the model’s reasoning capabilities.
Set an Output Directory to organize generated images in a known location. If left empty, images are saved to the ChatTD environment’s gemini_images folder.
When using the agent tool in “Wait for Completion” mode, the agent blocks until generation completes. For long-running 4K generations, consider “Background Processing” mode to keep the agent responsive.

Troubleshooting

“google-genai package not installed”: Install the google-genai package via ChatTD’s Python Manager. The operator also requires Pillow, opencv-python, and numpy.
“Gemini API key not set”: Enter a valid API key in the Gemini API Key field. The key is validated on entry and stored securely.
API error responses: Check the operator’s Logger for detailed error messages. Common issues include invalid API keys, rate limiting, or content policy violations.
Empty prompt errors: Ensure the Prompt field is not empty, or that the connected conversation table contains valid user/assistant messages when using “Conversation Table” as the prompt source.
Higher resolutions not working: 2K and 4K output require Gemini 3 Pro Image. If using Gemini 2.5 Flash Image, the image size setting may be ignored by the API.
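Several of these failures can be caught before a request is sent. A hypothetical pre-flight check, assuming the settings are available as plain strings; the ten aspect ratios and three sizes come from the v1.2.0 changelog, and the resolution rule from the item above.

```python
from typing import List

ASPECT_RATIOS = {"1:1", "3:2", "2:3", "4:3", "3:4", "16:9", "9:16", "4:5", "5:4", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}


def preflight(api_key: str, prompt: str, model: str,
              aspect_ratio: str = "1:1", image_size: str = "1K") -> List[str]:
    """Return readable problems that would likely make a request fail or surprise you."""
    problems = []
    if not api_key.strip():
        problems.append("Gemini API key not set")
    if not prompt.strip():
        problems.append("prompt is empty")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {aspect_ratio!r}")
    if image_size not in IMAGE_SIZES:
        problems.append(f"unsupported image size {image_size!r}")
    elif image_size != "1K" and "3-pro-image" not in model:
        # 2K/4K need Gemini 3 Pro Image; Flash may silently ignore the setting.
        problems.append(f"{image_size} needs Gemini 3 Pro Image; this model may ignore it")
    return problems
```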

Parameters

Gemini

Generate Image (Generate) op('geminiimagegen').par.Generate Pulse

Start the image generation process with the current settings.

Default: False

Model (Model) op('geminiimagegen').par.Model StrMenu

Select the image generation model. Gemini 2.5 Flash Image (nano-banana) is the faster option, offering image blending, character consistency, and enhanced world knowledge; Gemini 3 Pro Image (preview) adds reasoning and supports output up to 4K.

Default:

gemini/gemini-2.5-flash-image-preview

Menu Options:

gemini/gemini-3-pro-image-preview (gemini/gemini-3-pro-image-preview)
gemini/gemini-2.5-flash-image (gemini/gemini-2.5-flash-image)

Onin1 (Onin1) op('geminiimagegen').par.Onin1 Toggle

Default: False

Prompt (Prompt) op('geminiimagegen').par.Prompt Str

Enter your image generation prompt here.

Default: "" (Empty String)

Input Image (Optional) (Inputimage) op('geminiimagegen').par.Inputimage TOP

Optionally provide an input TOP for image-to-image tasks.

Default: "" (Empty String)

Context Grabber (Optional) (Contextgrabber) op('geminiimagegen').par.Contextgrabber COMP

Optionally provide a ContextGrabber operator to add its context (including images) to the prompt.

Default: "" (Empty String)

Output Directory (Outputdir) op('geminiimagegen').par.Outputdir Folder

Directory to save generated images. If empty, images are saved to the default location in the ChatTD directory.

Default: "" (Empty String)

Status (Status) op('geminiimagegen').par.Status Str

Displays the current status of the image generator.

Default: "" (Empty String)

Active (Active) op('geminiimagegen').par.Active Toggle

Default: False

Display Image (Displayimage) op('geminiimagegen').par.Displayimage Int

Default: 0
Range: 1 to 1
Slider Range: 1 to 1

Setdisplay (Setdisplay) op('geminiimagegen').par.Setdisplay Toggle

Default: False

Gemini API Key (Apikey) op('geminiimagegen').par.Apikey Str

Enter your Gemini API key from Google AI Studio. It will be stored securely.

Default: "" (Empty String)

Get API Key (Getapikey) op('geminiimagegen').par.Getapikey Pulse

Opens Google AI Studio in your browser to get an API key.

Default: False
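The Python accessors above can also be driven from a script, for example a Text DAT in the same project. A minimal sketch: `generate_with` is a hypothetical helper, and it only touches parameters documented on this page (Prompt, Model, Outputdir, Generate); Image Size and Aspect Ratio are left to the UI because their Python names are not listed here.

```python
def generate_with(gen_op, prompt: str, model: str = "", output_dir: str = "") -> None:
    """Set prompt (and optionally model/output folder) on a Gemini Image Gen COMP, then pulse Generate."""
    gen_op.par.Prompt = prompt
    if model:
        gen_op.par.Model = model  # must match a Model menu entry, e.g. "gemini/gemini-3-pro-image-preview"
    if output_dir:
        gen_op.par.Outputdir = output_dir
    gen_op.par.Generate.pulse()
```

Inside TouchDesigner this would be called as `generate_with(op('geminiimagegen'), 'A lighthouse at dusk, oil painting', model='gemini/gemini-3-pro-image-preview')`, after which the Status parameter reports progress as in the basic workflow.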

Changelog

v1.2.1 (2025-12-06)

geminiimagegen v1.2.1

Added

ResetOp() method to clear all operator data (history table, conversation table, status parameters, logger, and internal state)

v1.2.0 (2025-11-29)

Major Changes

Switched from LiteLLM to direct REST API calls for proper aspect ratio and resolution support
Added Aspect Ratio parameter with support for: 1:1, 3:2, 2:3, 4:3, 3:4, 16:9, 9:16, 4:5, 5:4, 21:9
Added Image Size parameter with support for: 1K (Standard), 2K (Enhanced), 4K (Professional)

Model Updates

Updated model menu:

gemini-2.5-flash-image - Stable production model
gemini-3-pro-image-preview - Preview model with reasoning capabilities

Technical Changes

Uses direct REST API (generativelanguage.googleapis.com/v1beta) instead of SDK
Bypasses SDK v1.52.0 limitation where image_config parameter is not yet available
Proper imageConfig support with aspectRatio and imageSize in REST payload
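A request of the kind described above might be built as follows. This is a sketch, not the operator's code: `build_request` is hypothetical, the `x-goog-api-key` header and camelCase field names follow Google's public Gemini REST documentation, and stripping the `gemini/` menu prefix is an assumption about how menu values map to REST model IDs.

```python
import json
from typing import Any, Dict, Tuple

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(api_key: str, model: str, prompt: str,
                  aspect_ratio: str = "1:1",
                  image_size: str = "1K") -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, json_body) for a generateContent image request."""
    # Assumption: menu values carry a "gemini/" prefix that the REST path omits.
    model_id = model.split("/", 1)[-1]
    url = f"{API_ROOT}/models/{model_id}:generateContent"
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    generation_config: Dict[str, Any] = {
        "responseModalities": ["TEXT", "IMAGE"],
        "imageConfig": {"aspectRatio": aspect_ratio},
    }
    if "3-pro-image" in model_id:
        # This page says 2K/4K need Gemini 3 Pro Image, so only send imageSize there.
        generation_config["imageConfig"]["imageSize"] = image_size
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": generation_config,
    }
    return url, headers, json.dumps(body)
```

Sending it would then be a single call such as `requests.post(url, headers=headers, data=body, timeout=120)`.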

Cost Estimation

Added cost estimation logging:

Gemini 2.5 Flash Image: $0.039 per image
Gemini 3 Pro Image 1K/2K: $0.134 per image
Gemini 3 Pro Image 4K: $0.24 per image

Dependency Changes

Changed from litellm to google-genai package
Added requests for direct API calls

v1.1.1 (2025-09-01)

Added Nano Banana (Gemini 2.5 Flash Image) image generation

v1.1.0 (2025-06-30)

Added GetTool method so the operator can be used as a tool by LOPs controllers

v1.0.0 (2025-05-02)

Initial release