baasitsharief
Collaborator

@baasitsharief baasitsharief commented Jul 31, 2025

Overview

  • Goal: Introduce multi-modal image generation in RAG Studio, with admin-controlled enablement and provider selection, and make chat compatible with the latest LlamaIndex chat message schema.
  • Scope: Backend (FastAPI, agent tools, session config), Frontend (Tools page, footer Tools Manager, markdown image rendering), dependency updates, and a static serving route for generated images.

What’s new

  • Image generation tools (beta)
    • OpenAI DALL·E via llama-index-tools-openai-image-generation
    • AWS Bedrock: Stable Diffusion and Titan Image Generator
  • Global configuration
    • Admin-only setting to enable/disable image generation and choose exactly one provider tool to expose to sessions
    • Persisted in tools/image_generation_config.json
  • Serving generated images
    • FastAPI route GET /cache/{filename} serves png/jpg/etc. from llm-service/cache with MIME detection and long-lived cache headers
  • Chat message schema updates
    • Migrate to ChatMessage(blocks=[TextBlock(...)]) for compatibility with newer LlamaIndex versions across streaming and history
  • UI
    • Tools page: new Image Generation section (enable/disable, select provider; blocked deletion of image-gen tools)
    • Tools Manager in chat footer: shows only the globally selected image-gen tool when enabled

Key changes

  • Backend
    • New endpoints under llm-service:
      • GET /tools/image-generation → available image-gen tools for current model source
      • GET /tools/image-generation/config → returns { enabled, selected_tool }
      • POST /tools/image-generation/config → admin-only set config; validates tool availability
    • Image generation tool specs:
      • OpenAIImageGenerationToolSpec (DALL·E)
      • BedrockStableDiffusionToolSpec, BedrockTitanImageToolSpec
      • Bedrock helpers and pydantic request models
    • Chat pipeline:
      • Uses TextBlock-based messages for streaming and history
    • Model provider APIs:
      • New list_image_generation_models() on providers; Bedrock implementation returns IMAGE modality models
    • Static serving:
      • GET /cache/{filename} reads llm-service/cache and returns proper content-type
  • Frontend
    • toolsApi.ts: queries/mutations for image-gen tools and config
    • Tools page (pages/Tools/ToolsPage.tsx):
      • Toggle enablement, select one tool (or auto-select if only one)
      • Save Configuration button; prevents deletion of image-gen tools
    • Footer Tools Manager (RagChatTab/FooterComponents/ToolsManager.tsx):
      • Appends the globally selected image-gen tool (if enabled) to the selectable list
    • Markdown response (MarkdownResponse.tsx):
      • Renders markdown images at full width and strips the sandbox: prefix
  • Dependencies
    • Add llama-index-tools-openai-image-generation>=0.4.0
    • Bump llama-index-llms-openai to >=0.4.7
    • Bump llama-index-llms-azure-openai to >=0.3.4
  • Misc
    • .gitignore: add tools/image_generation_config.json
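As an illustration of the Bedrock list_image_generation_models() behavior described under Backend, here is a minimal sketch of filtering model summaries by output modality. It assumes summaries shaped like Bedrock's ListFoundationModels response (a modelId plus an outputModalities list); the helper name filter_image_generation_models and the sample data are hypothetical, and the PR's actual implementation may differ.

```python
from typing import Any, Dict, List


def filter_image_generation_models(model_summaries: List[Dict[str, Any]]) -> List[str]:
    """Return the IDs of models whose output modalities include IMAGE."""
    return [
        summary["modelId"]
        for summary in model_summaries
        if "IMAGE" in summary.get("outputModalities", [])
    ]


# Hypothetical sample shaped like Bedrock model summaries (not real API output).
sample_summaries = [
    {"modelId": "amazon.titan-image-generator-v2:0", "outputModalities": ["IMAGE"]},
    {"modelId": "stability.sd3-5-large-v1:0", "outputModalities": ["IMAGE"]},
    {"modelId": "example.text-model-v1", "outputModalities": ["TEXT"]},
]
image_model_ids = filter_image_generation_models(sample_summaries)
```

Filtering on the declared modality, rather than matching on model names, should keep the list correct as new image models become available to the account.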

API surface

  • New:
    • GET /tools/image-generation
    • GET /tools/image-generation/config
    • POST /tools/image-generation/config (admin-only)
    • GET /cache/{filename} (static image serving)
  • Behavior:
    • DELETE /tools/{name} now rejects deletion of image-gen tools
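The rules behind the config endpoint and the deletion guard can be sketched as plain functions, independent of FastAPI. This is a minimal illustration under assumptions: the names ImageGenerationConfig, validate_config_update, and can_delete_tool are hypothetical, and the exception types stand in for whatever HTTP errors the service actually returns. Whether an enabled config must name a tool is not stated in this PR, so the sketch leaves that unchecked.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class ImageGenerationConfig:
    """Mirrors the { enabled, selected_tool } payload described above."""
    enabled: bool = False
    selected_tool: Optional[str] = None


def validate_config_update(
    config: ImageGenerationConfig, available_tools: Set[str], is_admin: bool
) -> ImageGenerationConfig:
    """Accept the update only from admins and only for an available tool."""
    if not is_admin:
        raise PermissionError("only admins may change the image generation config")
    if config.selected_tool is not None and config.selected_tool not in available_tools:
        raise ValueError(f"tool {config.selected_tool!r} is not available")
    return config


def can_delete_tool(name: str, image_generation_tools: Set[str]) -> bool:
    """Image-gen tools are protected; every other tool may be deleted."""
    return name not in image_generation_tools
```

In a route handler, the PermissionError and ValueError cases would be translated into the appropriate HTTP responses.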

Configuration

  • File: tools/image_generation_config.json
    • Default: { "enabled": false, "selected_tool": null }
  • Env:
    • OpenAI requires OPENAI_API_KEY
    • Bedrock requires AWS credentials and access to selected image models
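For reference, reading and writing a config file of this shape could look roughly like the sketch below. The function names load_config and save_config are hypothetical, and falling back to the default when the file is missing is an assumption about the intended behavior, not a description of the PR's code.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Default taken from the Configuration section above.
DEFAULT_CONFIG: Dict[str, Any] = {"enabled": False, "selected_tool": None}


def load_config(path: Path) -> Dict[str, Any]:
    """Read the config file, falling back to the default when it is missing."""
    if not path.exists():
        return dict(DEFAULT_CONFIG)
    with path.open("r", encoding="utf-8") as handle:
        stored = json.load(handle)
    # Merge over the defaults so a partially written file still yields both keys.
    return {**DEFAULT_CONFIG, **stored}


def save_config(path: Path, enabled: bool, selected_tool: Optional[str]) -> None:
    """Persist the config, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump({"enabled": enabled, "selected_tool": selected_tool}, handle, indent=2)
```

Usage would be along the lines of `save_config(Path("tools/image_generation_config.json"), True, name)` followed by `load_config(...)` on the next request.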

Security and operational notes

  • Admin-only config mutation; validated against available tools for the current provider
  • Image-gen tools are non-deletable via API and hidden from the regular tools table
  • Cached file serving:
    • Serves from llm-service/cache with MIME detection and long Cache-Control
    • Follow-up recommendation: ensure path traversal is prevented by verifying Path(file).resolve().is_relative_to(_cache_dir) before reading
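The recommended traversal check can be sketched with the standard library alone. This is a minimal illustration, not the route's actual code: resolve_cached_file is a hypothetical helper, the Cache-Control value is an assumed example of a long-lived header, and Path.is_relative_to requires Python 3.9 or newer.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def resolve_cached_file(cache_dir: Path, filename: str) -> Optional[Tuple[Path, str]]:
    """Return (path, content_type) for a file inside cache_dir, or None if unsafe or missing."""
    root = cache_dir.resolve()
    candidate = (root / filename).resolve()
    # Reject anything that resolves outside the cache directory (e.g. "../secret.txt").
    if not candidate.is_relative_to(root):
        return None
    if not candidate.is_file():
        return None
    content_type, _ = mimetypes.guess_type(candidate.name)
    return candidate, content_type or "application/octet-stream"


# Hypothetical long-lived cache header; the PR does not state the exact value.
CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}
```

Resolving both the root and the candidate before comparing them means that `..` segments, absolute paths, and symlinks pointing outside the cache are all rejected by the same check.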

Backwards compatibility

  • Existing chat/query flows remain supported
  • Internal refactor to TextBlock messages aligns with newer LlamaIndex; all internal call sites updated
  • No changes required to existing projects or data sources

How to test

  • Dependencies
    • In llm-service: refresh the environment and dependencies (e.g., uv sync or your standard env refresh)
  • OpenAI flow
    • Set OPENAI_API_KEY
    • In UI Tools page, enable Image Generation, select “OpenAI Image Generation”, save
    • In a chat, enable the tool via Tools Manager and ask for an image; verify a markdown image appears and loads from /llm-service/cache/...
  • Bedrock flow
    • Ensure AWS credentials and access to stability.sd3-5-large-v1:0 or amazon.titan-image-generator-v2:0
    • Enable Image Generation and select the desired Bedrock tool; generate an image and verify
  • Permissions
    • Attempt to POST /tools/image-generation/config without admin headers → 401
    • Attempt to delete an image-gen tool → 400
  • UI
    • Tools page shows Image Generation section; Save button state reflects changes
    • Footer Tools Manager only shows the globally selected image-gen tool when enabled
    • Markdown image renders full width; sandbox: prefix is stripped

Known limitations / follow-ups

  • Add path traversal hardening for /cache/{filename} as noted above
  • Add automated tests for new endpoints and image generation flows
  • OpenAI model discovery is stubbed; consider implementing list_image_generation_models() for OpenAI
  • Add documentation for enabling image generation and for provider prerequisites

jkwatson and others added 30 commits July 10, 2025 11:29
actions-user and others added 30 commits July 22, 2025 21:41