The AI Vision node in Fluidfit analyzes images with an AI vision model and extracts insights such as detected text, brands, descriptions, keywords, or product categories.
This feature allows users to convert visual content into structured data that can be used within automated workflows. Whether you are analyzing product photos, packaging images, retail shelves, or visual content from campaigns, the AI Vision node helps transform images into actionable information.
The AI Vision node takes an image as input and processes it using the selected AI model along with a custom prompt that guides the analysis.
The system then generates a structured response that can be used in subsequent workflow steps.
Typical workflow:
Provide an image input
Add analysis instructions (prompt)
Select the AI Vision model
Choose the output format
Generate structured insights
The output is sent to a Text node, from where the data can be copied or passed into downstream workflow components.
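The four choices in the workflow above (image, prompt, model, output format) can be pictured as a small configuration record. The Python sketch below is illustrative only: the `VisionNodeConfig` class, its field names, and the file name are assumptions made for this example and do not correspond to a documented Fluidfit API. The model and format names are the ones listed on this page.

```python
from dataclasses import dataclass

# Model and output-format names as listed in this documentation.
AVAILABLE_MODELS = (
    "Gemini 2.0 Flash",
    "Gemini 2.5 Flash Lite",
    "Gemini 2.5 Flash",
    "Gemini 3.1 Flash Lite Preview",
    "Claude Sonnet 4.6",
    "OpenAI GPT-4o",
    "Replicate LLaVA-13B",
)
OUTPUT_FORMATS = ("JSON", "Raw Text")


@dataclass(frozen=True)
class VisionNodeConfig:
    """Hypothetical record of the choices made on an AI Vision node."""

    image_path: str
    prompt: str
    model: str
    output_format: str = "JSON"

    def __post_init__(self) -> None:
        # Reject configurations that the node could not run.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.model not in AVAILABLE_MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {self.output_format!r}")


config = VisionNodeConfig(
    image_path="shelf_photo.jpg",  # hypothetical file name
    prompt="Analyze the provided image and return a JSON object.",
    model="Gemini 2.5 Flash",
)
print(config.output_format)
```

Validating the record up front mirrors what the node's dropdowns do in the editor: only a listed model and a listed output format can be selected.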
Fluidfit allows users to choose from multiple AI vision models, depending on the complexity of the analysis and on speed and cost requirements.
Available models include:
Gemini 2.0 Flash
Gemini 2.5 Flash Lite
Gemini 2.5 Flash
Gemini 3.1 Flash Lite Preview
Claude Sonnet 4.6
OpenAI GPT-4o
Replicate LLaVA-13B
Each model offers different trade-offs between speed, cost, and analytical capability.
Users can provide a prompt describing how the image should be analyzed.
Example prompt:
Analyze the provided image and return a JSON object containing detected text, brands, product description, keywords, and category.
This prompt helps the AI model understand what information should be extracted from the image.
Fluidfit allows users to select the desired format for the generated output.
Available options:
JSON – Returns structured machine-readable data
Raw Text – Returns a plain text description
Selecting JSON is recommended when the output will be used in automated workflows.
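The difference between the two formats matters most to whatever consumes the output. As a minimal sketch, assuming the node's output text has already been copied out of the Text node, a downstream script could handle both formats as follows. The function name `parse_node_output` is hypothetical, and the fence-stripping step is a defensive assumption (some models wrap JSON in a Markdown code fence); this page does not state whether Fluidfit already removes such wrappers.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_node_output(text: str, output_format: str):
    """Return a dict for JSON output, or the stripped string for Raw Text."""
    if output_format != "JSON":
        return text.strip()

    cleaned = text.strip()
    # Assumption: the model may have wrapped its JSON in a Markdown fence.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    # Raises json.JSONDecodeError if the text is not valid JSON.
    return json.loads(cleaned)


print(parse_node_output('{"category": "Beverages"}', "JSON"))
print(parse_node_output("  A red can on a white table.  ", "Raw Text"))
```

With JSON, a malformed response fails loudly at the parsing step, which is generally easier to handle in an automated workflow than free text that has to be interpreted.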
After configuring the model and prompt:
Click Generate
The AI Vision node processes the image
The output appears in the connected Text node
The generated text can then be copied or used in further workflow steps.
Example JSON output from image analysis:
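As an illustration only, a response to the example prompt could have the shape shown in the Python snippet below. The key names are assumptions derived from the fields the example prompt asks for, and every value is a made-up placeholder; actual output depends on the image, the prompt, and the selected model.

```python
import json

# Illustrative response only: keys mirror the example prompt, values are invented.
example_output = """
{
  "detected_text": ["ExampleBrand", "Sparkling Water", "500 ml"],
  "brands": ["ExampleBrand"],
  "product_description": "A clear plastic bottle of sparkling water with a blue label.",
  "keywords": ["bottle", "water", "beverage"],
  "category": "Beverages"
}
"""

result = json.loads(example_output)
print(result["category"])
print(", ".join(result["keywords"]))
```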
This structured output can be easily used for:
cataloging products
tagging images
automating data pipelines
feeding downstream AI workflows
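For the cataloging case, one possible approach is to flatten each parsed result into a row of a CSV file. The sketch below assumes the output has already been parsed into a Python dict; the key names (`brands`, `category`, `keywords`), the column layout, and the helper names are hypothetical choices for this example, not a Fluidfit export format.

```python
import csv
import io

FIELDS = ["image_id", "brand", "category", "keywords"]


def to_catalog_row(image_id: str, result: dict) -> dict:
    """Flatten one parsed analysis result into a catalog row."""
    brands = result.get("brands") or []
    return {
        "image_id": image_id,
        "brand": brands[0] if brands else "",  # keep the first detected brand
        "category": result.get("category", ""),
        "keywords": ";".join(result.get("keywords") or []),
    }


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize catalog rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data standing in for a parsed node output.
sample = {
    "brands": ["ExampleBrand"],
    "category": "Beverages",
    "keywords": ["bottle", "water"],
}
csv_text = rows_to_csv([to_catalog_row("img-001", sample)])
print(csv_text)
```

Missing keys fall back to empty strings, so an incomplete response still produces a row that can be reviewed later instead of stopping the whole batch.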
The AI Vision node can support many real-world scenarios, including:
Product Recognition
Identify products, brands, and packaging information from images.
Retail Shelf Analysis
Detect items present on shelves and categorize them.
Visual Content Tagging
Automatically generate keywords or descriptions for images.
Text Extraction
Identify text elements present within images.
Image Metadata Generation
Generate structured metadata for digital asset management.
To achieve the best results:
Use clear prompts that specify the expected output format
Select JSON output when building automated workflows
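Both recommendations can be combined by deriving the prompt and the validation step from the same list of expected fields. The following is a minimal sketch under stated assumptions: the key names, `build_prompt`, and `missing_keys` are hypothetical helpers for a script that prepares prompts and checks outputs outside Fluidfit, not built-in features of the node.

```python
# Hypothetical field names, derived from the example prompt on this page.
REQUIRED_KEYS = (
    "detected_text",
    "brands",
    "product_description",
    "keywords",
    "category",
)


def build_prompt(keys=REQUIRED_KEYS) -> str:
    """Build a prompt that names every expected JSON key explicitly."""
    key_list = ", ".join(f'"{key}"' for key in keys)
    return (
        "Analyze the provided image and return only a JSON object "
        f"with exactly these keys: {key_list}."
    )


def missing_keys(result: dict, keys=REQUIRED_KEYS) -> list[str]:
    """List the expected keys that are absent from a parsed result."""
    return [key for key in keys if key not in result]


print(build_prompt())
print(missing_keys({"brands": [], "category": "Beverages"}))
```

Naming the keys in the prompt and checking for the same keys afterwards keeps the two in sync, so a response that omits a field is caught before it reaches later workflow steps.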
The AI Vision node enables Fluidfit users to transform images into structured insights using a choice of AI vision models.
By combining image inputs, custom prompts, and selectable output formats, this feature lets visual data flow directly into automated workflows.
This makes it possible to turn images into usable business intelligence inside Fluidfit pipelines.