
Caption

v2.0.0

The Caption LOP generates text descriptions of images using vision-capable large language models. Point it at any TOP in your network, write a prompt, and get back a detailed caption — useful for accessibility, content tagging, visual analysis, or feeding image understanding into downstream LLM workflows.

Inputs:
  • Input 1 (TOP): The image to caption. Any TOP operator in your network can be used as the source.
  • Input 2 (DAT, optional): Conversation history table for multi-turn captioning. Must have the columns role, message, id, and timestamp.

Outputs:
  • Output 1 (DAT): Conversation history with the user prompt and assistant response appended.
  • Output 2 (DAT): The generated caption text only.

Requirements:
  • ChatTD Operator: Must be configured with API keys. Set the ChatTD Operator parameter on the About page.
  • Vision-capable model: The selected model must support image inputs (e.g., Gemini Flash, GPT-4o, Claude Sonnet).
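The conversation table on Input 2 (and Output 1) uses the four columns listed above. A minimal sketch in plain Python of building such a table; the id and timestamp formats shown here are illustrative assumptions, not the operator's exact values:

```python
# Sketch of the conversation table expected on Input 2: one header row
# plus one row per message, with columns role / message / id / timestamp.
# The uuid-based id and epoch timestamp are assumptions for illustration.
import time
import uuid

HEADER = ['role', 'message', 'id', 'timestamp']

def make_row(role, message):
    """Build one conversation row with a fresh id and current timestamp."""
    return [role, message, str(uuid.uuid4()), str(time.time())]

history = [HEADER,
           make_row('user', 'Describe this image in detail'),
           make_row('assistant', 'A red cube on a gray studio floor.')]
```

Paste rows like these into a Table DAT to seed a multi-turn session before wiring it into Input 2.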

Basic Captioning

  1. Connect a TOP (e.g., a Movie File In or Render TOP) to the Caption LOP’s TOP input.
  2. On the Caption page, enter your prompt in Caption Prompt (e.g., “Describe this image in detail”).
  3. On the Model page, select an API Server and AI Model that supports vision.
  4. Pulse Generate Caption.
  5. The caption appears on Output 2 (caption text only); the full exchange appears on Output 1.
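The same steps can be driven from a script. A minimal sketch, assuming the operator is named caption and using the parameter names from the reference below; the gemini server and model here are example choices. op() is TouchDesigner's built-in, so the function only does anything inside TD:

```python
# Scripted version of the steps above. Runs inside TouchDesigner, where
# op() is available; outside TD this code only defines the function.
def caption_image(prompt='Describe this image in detail'):
    cap = op('caption')                # the Caption LOP
    cap.par.Prompt = prompt            # Caption Prompt
    cap.par.Apiserver = 'gemini'       # API Server (example; must support vision)
    cap.par.Model = 'gemini-2.0-flash' # AI Model (example vision model)
    cap.par.Call.pulse()               # pulse Generate Caption
```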

Multi-Turn Conversation

  1. Set up a basic caption as above.
  2. Enable Include Input Conversation to carry forward previous exchanges.
  3. Enable Append to Conversation to build up a running dialogue.
  4. Enable Add User Message to include your prompt in the conversation history.
  5. Each time you pulse Generate Caption, the new prompt and response are added to the conversation output, allowing follow-up questions about the same image.
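As a sketch, the three toggles from the steps above can be enabled from a script before asking a follow-up question (runs inside TouchDesigner; outside TD this only defines the function):

```python
# Enable multi-turn captioning, then ask a follow-up about the same image.
# Parameter names are taken from the reference section of this page.
def ask_followup(question):
    cap = op('caption')
    cap.par.Includeinput = True        # Include Input Conversation
    cap.par.Appendconversation = True  # Append to Conversation
    cap.par.Adduser = True             # Add User Message
    cap.par.Prompt = question
    cap.par.Call.pulse()               # Generate Caption
```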

Continuous Captioning

  1. Configure a model and prompt as above.
  2. Toggle Active to On.
  3. While Active is on, the operator generates a new caption each time it is triggered, which is useful for real-time image-analysis pipelines.
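Toggling continuous mode from a script is a one-liner wrapped here as a helper (runs inside TouchDesigner; outside TD this only defines the function):

```python
# Switch continuous captioning on or off.
def set_live_captioning(enabled):
    cap = op('caption')
    cap.par.Active = bool(enabled)  # Active toggle from the reference below
```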

Tips

  • Set Temperature to 0 for consistent, factual descriptions; increase it for more creative or varied captions.
  • Set Max Tokens to control response length — 0 uses the model’s default.
  • For multi-turn conversations, enable both Append to Conversation and Add User Message to maintain full context.
  • Wire a conversation DAT into the input to give the model prior context when captioning related images in sequence.
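The last tip can be scripted as a sketch: build a seed Table DAT and wire it into the Caption LOP's second input. tableDAT and op() are TouchDesigner built-ins, and the '/project1' path and operator names are assumptions for this example (outside TD this only defines the function):

```python
# Create a seed conversation table and connect it to Input 2 of the
# Caption LOP, giving the model prior context for a sequence of images.
def wire_seed_conversation():
    root = op('/project1')                    # assumed network location
    tab = root.create(tableDAT, 'seed_history')
    tab.clear()
    tab.appendRow(['role', 'message', 'id', 'timestamp'])
    tab.appendRow(['user', 'These images are frames from one scene.',
                   'seed-1', '0'])
    # inputConnectors[1] is the second (conversation DAT) input
    op('caption').inputConnectors[1].connect(tab)
```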

Parameters

Caption Prompt (Prompt) op('caption').par.Prompt Str
Default: "" (Empty String)

Include Input Conversation (Includeinput) op('caption').par.Includeinput Toggle
Default: False

Append to Conversation (Appendconversation) op('caption').par.Appendconversation Toggle
Default: False

Add User Message (Adduser) op('caption').par.Adduser Toggle
Default: False

Max Tokens (Maxtokens) op('caption').par.Maxtokens Int
Default: 0   Range: 0 to 1   Slider Range: 0 to 1

Temperature (Temperature) op('caption').par.Temperature Float
Default: 0.0   Range: 0 to 1   Slider Range: 0 to 1

Active (Active) op('caption').par.Active Toggle
Default: False

Generate Caption (Call) op('caption').par.Call Pulse
Default: False

Model Selection (Modelselection) op('caption').par.Modelselection Menu
Default: custom_model
Options: custom_model, chattd_model, controller_model

API Server (Apiserver) op('caption').par.Apiserver Menu
Default: openrouter
Options: openrouter, openai, groq, ollama, gemini, lmstudio, custom

AI Model (Model) op('caption').par.Model StrMenu
Default: "" (Empty String)
Menu Options:
  • gemini-1.5-flash
  • gemini-1.5-flash-002
  • gemini-1.5-flash-8b
  • gemini-1.5-flash-8b-001
  • gemini-1.5-flash-8b-latest
  • gemini-1.5-flash-latest
  • gemini-1.5-pro
  • gemini-1.5-pro-002
  • gemini-1.5-pro-latest
  • gemini-2.0-flash
  • gemini-2.0-flash-001
  • gemini-2.0-flash-exp
  • gemini-2.0-flash-exp-image-generation
  • gemini-2.0-flash-lite
  • gemini-2.0-flash-lite-001
  • gemini-2.0-flash-lite-preview
  • gemini-2.0-flash-lite-preview-02-05
  • gemini-2.0-flash-preview-image-generation
  • gemini-2.0-flash-thinking-exp
  • gemini-2.0-flash-thinking-exp-01-21
  • gemini-2.0-flash-thinking-exp-1219
  • gemini-2.0-pro-exp
  • gemini-2.0-pro-exp-02-05
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash-lite-preview-06-17
  • gemini-2.5-flash-preview-05-20
  • gemini-2.5-flash-preview-tts
  • gemini-2.5-pro
  • gemini-2.5-pro-preview-03-25
  • gemini-2.5-pro-preview-05-06
  • gemini-2.5-pro-preview-06-05
  • gemini-2.5-pro-preview-tts
  • gemini-exp-1206
  • gemma-3-12b-it
  • gemma-3-1b-it
  • gemma-3-27b-it
  • gemma-3-4b-it
  • gemma-3n-e2b-it
  • gemma-3n-e4b-it
  • learnlm-2.0-flash-experimental

Model Controller (Modelcontroller) op('caption').par.Modelcontroller OP
Default: "" (Empty String)

Search Models (Search) op('caption').par.Search Toggle
Default: False

Model Search (Modelsearch) op('caption').par.Modelsearch Str
Default: "" (Empty String)

Version History

v2.0.0 (2025-07-30)
  • Migrated from DotChatUtil to DotLOPUtils base class
  • Reduced code from 380 to 130 lines (about a 66% reduction)
  • Simplified model selection using setup_standard_model_page()
  • Replaced manual ChatTD.Customapicall with make_api_call() method
  • Inherited streaming, tool execution, and error handling from DotLOPUtils
  • Maintained standard conversation_dat output pattern for LLM/agent consistency
  • Preserved all original functionality: vision support, conversation chaining, parameter compatibility
  • Added ResetOp function with logger.Clearlog() integration
v1.0.0 (2024-11-09)

Initial release