
Gemini Image Gen

v1.2.1 (Updated)

The Gemini Image Gen LOP generates images using Google’s Gemini image generation models. It supports text-to-image and image-to-image workflows with configurable resolution (up to 4K), ten aspect ratio presets, and optional multi-modal prompt enrichment via a connected Context Grabber.

🔧 GetTool Enabled 1 tool

This operator exposes one tool that allows Agent and Gemini Live LOPs to generate images from text prompts with configurable resolution and aspect ratio.

When connected to an Agent LOP, the agent can call the generate_image tool to create images on demand. The tool description adapts based on the selected model — Gemini 3 Pro emphasizes reasoning capabilities while Gemini 2.5 Flash highlights editing, character consistency, and multi-image blending.

Two parameters on the Gemini page control how agent tool calls behave:

  • Agent Execution Mode: Set to “Wait for Completion” to block the agent until the image is ready (the file path is returned), or “Background Processing” to start generation and return immediately.
  • Agent Result Content: Controls what the agent receives back — “Status Only”, “File Path”, or “Path + Metadata” (includes dimensions, cost, model, and aspect ratio).

Requirements

  • Gemini API Key: Obtain a key from Google AI Studio and enter it in the Gemini API Key field on the Gemini page. The key is stored securely in the ChatTD environment. You can also pulse Get API Key to open AI Studio in your browser.
  • Python Packages: google-genai, Pillow, opencv-python, and numpy. Install these via the ChatTD Python Manager.

Inputs

  • Prompt: Enter text directly in the Prompt field, or set Prompt Source to “Conversation Table” to read from a connected DAT with role and message columns.
  • Input Image (Optional): Reference a TOP to provide a source image for image-to-image editing tasks.
  • Context Grabber (Optional): Reference a Context Grabber operator to include its collected text and images in the generation prompt.

Outputs

  • Conversation Table (conversation_dat): A role/content table compatible with downstream LOPs, logging each prompt and assistant response with image paths, timestamps, model, and cost.
  • History Table (history_dat): Detailed log of every generation job including job ID, prompt, status, model, aspect ratio, image size, cost estimate, dimensions, and file paths.
  • Image Viewer: Displays the generated image selected by the Display Image slider.
  • Image Files: Generated images are saved as PNG files in the configured Output Directory (or a default location within the ChatTD environment).
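
As a quick check of the Python package requirement listed above, the following sketch reports which of the four packages cannot be found by the current interpreter. The mapping from pip names to import names (`google.genai`, `PIL`, `cv2`, `numpy`) reflects how those packages are normally imported; the helper name `missing_packages` is hypothetical and not part of the operator.

```python
import importlib.util

# pip package name -> module name used at import time
REQUIRED = {
    "google-genai": "google.genai",
    "Pillow": "PIL",
    "opencv-python": "cv2",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


print(missing_packages() or "all required packages found")
```

Run it inside TouchDesigner's Textport so that it inspects the same Python environment the operator uses. Anything it lists can then be installed through the ChatTD Python Manager.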

Basic Text-to-Image Generation

  1. On the Gemini page, enter your Gemini API key in the Gemini API Key field.
  2. Select a model from the Model menu — Gemini 2.5 Flash Image for fast generation, or Gemini 3 Pro Image for higher quality with reasoning.
  3. Choose an Image Size (1K, 2K, or 4K) and an Aspect Ratio preset.
  4. Enter your prompt in the Prompt field.
  5. Pulse Generate Image to start generation.
  6. Monitor the Status field. Once complete, the image appears in the operator’s viewer.
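
The same steps can be scripted. The sketch below assumes the operator is reachable as `op('geminiimagegen')` and uses only parameter names from the parameter reference on this page (Model, Imagesize, Aspectratio, Prompt, Generate). The helper `start_generation` is a hypothetical wrapper, not a method of the operator, and it assumes the API key has already been entered.

```python
IMAGE_SIZES = ("1K", "2K", "4K")
ASPECT_RATIOS = ("1:1", "3:2", "2:3", "4:3", "3:4",
                 "16:9", "9:16", "4:5", "5:4", "21:9")


def start_generation(gen_op, prompt,
                     model="gemini/gemini-2.5-flash-image",
                     image_size="1K", aspect_ratio="1:1"):
    """Set the documented parameters on gen_op, then pulse Generate."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"image_size must be one of {IMAGE_SIZES}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    gen_op.par.Model.val = model
    gen_op.par.Imagesize.val = image_size
    gen_op.par.Aspectratio.val = aspect_ratio
    gen_op.par.Prompt.val = prompt
    gen_op.par.Generate.pulse()  # same as pulsing Generate Image in the UI


# Inside TouchDesigner (Textport or a Text DAT), for example:
# start_generation(op('geminiimagegen'), 'A lighthouse at dusk',
#                  model='gemini/gemini-3-pro-image-preview',
#                  image_size='2K', aspect_ratio='16:9')
# Progress can then be read from op('geminiimagegen').par.Status.eval()
```

Validating the size and aspect ratio before touching the operator keeps a typo such as a lowercase `2k` from reaching the API.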

Image-to-Image Editing

  1. Drag a TOP operator into the Input Image (Optional) parameter.
  2. Write an editing instruction in the Prompt field (e.g., “Make the sky purple and add northern lights”).
  3. Pulse Generate Image. The model receives both the input image and your prompt.

Using a Context Grabber

  1. Drag a configured Context Grabber operator into the Context Grabber (Optional) parameter.
  2. The text and images collected by the Context Grabber are automatically prepended to your prompt.
  3. Enter any additional instructions in the Prompt field if needed.
  4. Pulse Generate Image.

Using a Conversation Table

  1. Set Prompt Source to “Conversation Table”.
  2. Connect a Table DAT with role and message columns to the operator’s first input.
  3. Pulse Generate Image. The operator concatenates user and assistant messages from the table into a single prompt.
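
To make the concatenation step concrete, here is a minimal sketch of how user and assistant rows might be folded into one prompt string. The operator's actual output format is not documented on this page, so the `role: message` prefix, the newline separator, and the function name `prompt_from_rows` are all assumptions made for illustration.

```python
def prompt_from_rows(rows):
    """Join user/assistant messages into a single prompt string.

    rows: iterable of (role, message) pairs, header row excluded.
    Rows with any other role, or with an empty message, are skipped.
    """
    lines = []
    for role, message in rows:
        role = role.strip().lower()
        message = message.strip()
        if role in ("user", "assistant") and message:
            lines.append(f"{role}: {message}")
    return "\n".join(lines)


rows = [
    ("system", "You are a helpful assistant."),
    ("user", "Draw a red fox."),
    ("assistant", "In what style?"),
    ("user", "Watercolor, soft light."),
]
print(prompt_from_rows(rows))
```

If the table contains no usable user or assistant rows, the result is an empty string, which corresponds to the empty prompt error described in the troubleshooting notes.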

Agent Tool Integration

  1. Create an Agent operator and connect the Gemini Image Gen operator to it as a tool.
  2. Set Agent Execution Mode to “Wait for Completion” so the agent receives the generated file path.
  3. Set Agent Result Content to “File Path” or “Path + Metadata” depending on how much detail the agent needs.
  4. Ask the agent to generate an image — it will call the generate_image tool automatically.

| Feature | Gemini 2.5 Flash Image | Gemini 3 Pro Image |
| --- | --- | --- |
| Speed | Fast | Slower (reasoning) |
| Max Resolution | 1K | 4K |
| Text Rendering | Strong | Exceptional |
| Character Consistency | Yes | Yes |
| Image Blending | Yes | |
| Estimated Cost | ~$0.039/image | ~$0.134 (1K/2K), ~$0.24 (4K) |

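
The estimated costs in the comparison can be turned into a small budgeting helper. This sketch uses only the per-image figures quoted on this page. They are estimates and may not match current pricing, and `estimate_cost` is a hypothetical function rather than part of the operator.

```python
FLASH = "gemini/gemini-2.5-flash-image"
PRO = "gemini/gemini-3-pro-image-preview"


def estimate_cost(model, image_size="1K", count=1):
    """Estimated USD cost for `count` images, using this page's figures."""
    if model == FLASH:
        per_image = 0.039  # Flash output is limited to 1K
    elif model == PRO:
        per_image = 0.24 if image_size == "4K" else 0.134
    else:
        raise ValueError(f"unknown model: {model}")
    return round(per_image * count, 3)


print(estimate_cost(FLASH, count=10))
print(estimate_cost(PRO, "2K", count=10))
print(estimate_cost(PRO, "4K", count=10))
```

For a batch of ten images this makes the trade-off visible: a 4K batch on Gemini 3 Pro Image costs roughly six times as much as the same batch on Gemini 2.5 Flash Image.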
  • Use Gemini 2.5 Flash Image for rapid iteration and editing workflows where speed matters more than maximum resolution.
  • Use Gemini 3 Pro Image when you need 2K/4K output or when the prompt is complex and benefits from the model’s reasoning capabilities.
  • Set an Output Directory to organize generated images in a known location. If left empty, images are saved to the ChatTD environment’s gemini_images folder.
  • When using the agent tool in “Wait” mode, the agent blocks until generation completes. For long-running 4K generations, consider “Background” mode to keep the agent responsive.
  • “google-genai package not installed”: Install the google-genai package via ChatTD’s Python Manager. The operator also requires Pillow, opencv-python, and numpy.
  • “Gemini API key not set”: Enter a valid API key in the Gemini API Key field. The key is validated on entry and stored securely.
  • API error responses: Check the operator’s Logger for detailed error messages. Common issues include invalid API keys, rate limiting, or content policy violations.
  • Empty prompt errors: Ensure the Prompt field is not empty, or that the connected conversation table contains valid user/assistant messages when using “Conversation Table” as the prompt source.
  • Higher resolutions not working: 2K and 4K output require Gemini 3 Pro Image. If using Gemini 2.5 Flash Image, the image size setting may be ignored by the API.
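
Several of the issues above can be caught before pulsing Generate Image. The following sketch is a hypothetical pre-flight check, not something the operator runs itself. It takes plain values, so in TouchDesigner they would be read from the corresponding parameters first, for example with `op('geminiimagegen').par.Prompt.eval()`.

```python
PRO_MODEL = "gemini/gemini-3-pro-image-preview"


def preflight(prompt, api_key, model, image_size):
    """Return a list of human-readable problems; empty means ready to go."""
    problems = []
    if not api_key.strip():
        problems.append("Gemini API key not set")
    if not prompt.strip():
        problems.append("Prompt is empty")
    if image_size in ("2K", "4K") and model != PRO_MODEL:
        problems.append(
            f"{image_size} output requires Gemini 3 Pro Image; "
            "the size setting may be ignored by the API"
        )
    return problems


for issue in preflight("", "", "gemini/gemini-2.5-flash-image", "4K"):
    print("-", issue)
```

An empty list does not guarantee success, since rate limits, invalid keys, and content policy rejections only surface in the API response and the operator's Logger.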
Generate Image (Generate) op('geminiimagegen').par.Generate Pulse

Start the image generation process with the current settings.

Default:
False
Model (Model) op('geminiimagegen').par.Model StrMenu

Select the image generation model. Gemini 2.5 Flash Image (nano-banana) offers fast generation with image blending, character consistency, and enhanced world knowledge. Gemini 3 Pro Image adds reasoning and 2K/4K output.

Default:
gemini/gemini-2.5-flash-image-preview
Menu Options:
  • gemini/gemini-3-pro-image-preview (gemini/gemini-3-pro-image-preview)
  • gemini/gemini-2.5-flash-image (gemini/gemini-2.5-flash-image)
Prompt Source (Promptsource) op('geminiimagegen').par.Promptsource Menu

Source of the prompt text for image generation.

Default:
parameter
Options:
parameter, input_dat
Onin1 (Onin1) op('geminiimagegen').par.Onin1 Toggle
Default:
False
Prompt (Prompt) op('geminiimagegen').par.Prompt Str

Enter your image generation prompt here.

Default:
"" (Empty String)
Input Image (Optional) (Inputimage) op('geminiimagegen').par.Inputimage TOP

Optionally provide an input TOP for image-to-image tasks.

Default:
"" (Empty String)
Context Grabber (Optional) (Contextgrabber) op('geminiimagegen').par.Contextgrabber COMP

Optionally provide a ContextGrabber operator to add its context (including images) to the prompt.

Default:
"" (Empty String)
Output Directory (Outputdir) op('geminiimagegen').par.Outputdir Folder

Directory in which to save generated images. If empty, images are saved to a default location in the ChatTD directory.

Default:
"" (Empty String)
Status (Status) op('geminiimagegen').par.Status Str

Displays the current status of the image generator.

Default:
"" (Empty String)
Active (Active) op('geminiimagegen').par.Active Toggle
Default:
False
Display Image (Displayimage) op('geminiimagegen').par.Displayimage Int
Default:
0
Range:
1 to 1
Slider Range:
1 to 1
Setdisplay (Setdisplay) op('geminiimagegen').par.Setdisplay Toggle
Default:
False
Gemini API Key (Apikey) op('geminiimagegen').par.Apikey Str

Enter your Gemini API key from Google AI Studio. It will be stored securely.

Default:
"" (Empty String)
Get API Key (Getapikey) op('geminiimagegen').par.Getapikey Pulse

Opens Google AI Studio in your browser to get an API key.

Default:
False
Image Size (Imagesize) op('geminiimagegen').par.Imagesize Menu

Output resolution. Higher resolutions available with Gemini 3 Pro. Must use uppercase K.

Default:
1K
Options:
1K, 2K, 4K
Aspect Ratio (Aspectratio) op('geminiimagegen').par.Aspectratio Menu

Aspect ratio for generated images.

Default:
1:1
Options:
1:1, 3:2, 2:3, 4:3, 3:4, 16:9, 9:16, 4:5, 5:4, 21:9
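
The v1.2.0 release notes state that Image Size and Aspect Ratio are sent as `imageSize` and `aspectRatio` inside `imageConfig` in a direct REST payload to `generativelanguage.googleapis.com/v1beta`. The sketch below shows what such a request might look like. The exact payload the operator builds is not shown on this page, so the `generateContent` URL pattern, the `responseModalities` field, and the helper name `build_request` are assumptions. The function only builds the request and sends nothing.

```python
import json

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(model, prompt, aspect_ratio="1:1", image_size="1K"):
    """Return (url, payload) for a text-to-image request; nothing is sent."""
    # The Model menu values carry a "gemini/" prefix; the REST path does not.
    model_id = model.split("/", 1)[-1]
    url = f"{API_ROOT}/models/{model_id}:generateContent"
    payload = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["IMAGE"],  # assumed, see lead-in
            "imageConfig": {
                "aspectRatio": aspect_ratio,
                "imageSize": image_size,  # uppercase K, e.g. "2K"
            },
        },
    }
    return url, payload


url, payload = build_request(
    "gemini/gemini-3-pro-image-preview",
    "A lighthouse at dusk",
    aspect_ratio="16:9",
    image_size="2K",
)
print(url)
print(json.dumps(payload, indent=2))
```

Sending the request would additionally require the API key, typically passed in an `x-goog-api-key` header. As noted in the troubleshooting section, `imageSize` values above 1K are only expected to take effect with Gemini 3 Pro Image.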
v1.2.1 (2025-12-06)

## geminiimagegen v1.2.1

Added

  • ResetOp() method to clear all operator data (history table, conversation table, status parameters, logger, and internal state)
v1.2.0 (2025-11-29)

Major Changes

  • Switched from LiteLLM to direct REST API calls for proper aspect ratio and resolution support
  • Added Aspect Ratio parameter with support for: 1:1, 3:2, 2:3, 4:3, 3:4, 16:9, 9:16, 4:5, 5:4, 21:9
  • Added Image Size parameter with support for: 1K (Standard), 2K (Enhanced), 4K (Professional)

Model Updates

  • Updated model menu:
    • gemini-2.5-flash-image - Stable production model
    • gemini-3-pro-image-preview - Preview model with reasoning capabilities

Technical Changes

  • Uses direct REST API (generativelanguage.googleapis.com/v1beta) instead of SDK
  • Bypasses SDK v1.52.0 limitation where image_config parameter is not yet available
  • Proper imageConfig support with aspectRatio and imageSize in REST payload

Cost Estimation

  • Added cost estimation logging:
    • Gemini 2.5 Flash Image: $0.039 per image
    • Gemini 3 Pro Image 1K/2K: $0.134 per image
    • Gemini 3 Pro Image 4K: $0.24 per image

Dependency Changes

  • Changed from litellm to google-genai package
  • Added requests for direct API calls

v1.1.1 (2025-09-01)

Added nano-banana (Gemini 2.5 Flash Image) image generation.

v1.1.0 (2025-06-30)

Added a GetTool method so the operator can be used as a tool by LOP controllers.

v1.0.0 (2025-05-02)

Initial release