Feature: Add image generation tool #283

baasitsharief · 2025-07-31T19:49:07Z

Overview

Goal: Introduce multi-modal image generation capabilities in RAG Studio with admin-controlled enablement and provider selection, and make chat compatible with the latest LlamaIndex chat message schema.
Scope: Backend (FastAPI, agent tools, session config), Frontend (Tools page, footer Tools Manager, markdown image rendering), dependency updates, and a static serving route for generated images.

What’s new

Image generation tools (beta)
- OpenAI DALL·E via llama-index-tools-openai-image-generation
- AWS Bedrock: Stable Diffusion and Titan Image Generator
Global configuration
- Admin-only setting to enable/disable image generation and choose exactly one provider tool to expose to sessions
- Persisted in tools/image_generation_config.json
Serving generated images
- FastAPI route GET /cache/{filename} serves png/jpg/etc. from llm-service/cache with MIME detection and long-lived cache headers
Chat message schema updates
- Migrate to ChatMessage(blocks=[TextBlock(...)]) for compatibility with newer LlamaIndex versions across streaming and history
UI
- Tools page: new Image Generation section (enable/disable, select provider; blocked deletion of image-gen tools)
- Tools Manager in chat footer: shows only the globally selected image-gen tool when enabled

Key changes

Backend
- New endpoints under llm-service:
  - GET /tools/image-generation → available image-gen tools for current model source
  - GET /tools/image-generation/config → returns { enabled, selected_tool }
  - POST /tools/image-generation/config → admin-only set config; validates tool availability
- Image generation tool specs:
  - OpenAIImageGenerationToolSpec (DALL·E)
  - BedrockStableDiffusionToolSpec, BedrockTitanImageToolSpec
  - Bedrock helpers and pydantic request models
- Chat pipeline:
  - Uses TextBlock-based messages for streaming and history
- Model provider APIs:
  - New list_image_generation_models() on providers; Bedrock implementation returns IMAGE modality models
- Static serving:
  - GET /cache/{filename} reads llm-service/cache and returns proper content-type
Frontend
- toolsApi.ts: queries/mutations for image-gen tools and config
- Tools page (pages/Tools/ToolsPage.tsx):
  - Toggle enablement, select one tool (or auto-select if only one)
  - Save Configuration button; prevents deletion of image-gen tools
- Footer Tools Manager (RagChatTab/FooterComponents/ToolsManager.tsx):
  - Appends the globally selected image-gen tool (if enabled) to the selectable list
- Markdown response (MarkdownResponse.tsx):
  - Renders generated images at full width and strips the sandbox: prefix
Dependencies
- Add llama-index-tools-openai-image-generation>=0.4.0
- Bump llama-index-llms-openai to >=0.4.7
- Bump llama-index-llms-azure-openai to >=0.3.4
Misc
- .gitignore: add tools/image_generation_config.json

API surface

New:
- GET /tools/image-generation
- GET /tools/image-generation/config
- POST /tools/image-generation/config (admin-only)
- GET /cache/{filename} (static image serving)
Behavior:
- DELETE /tools/{name} now rejects deletion of image-gen tools

Configuration

File: tools/image_generation_config.json
- Default: { "enabled": false, "selected_tool": null }
Env:
- OpenAI requires OPENAI_API_KEY
- Bedrock requires AWS credentials and access to selected image models
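For reviewers, a minimal sketch of how the config file and its default could be read and validated. This is an illustration, not the code in this PR: the helper names load_config and validate_config are hypothetical, and the only facts taken from the description are the file shape { enabled, selected_tool }, the default values, and the rule that the selected tool must be one of the available tools.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Default from the Configuration section above.
DEFAULT_CONFIG: Dict[str, Any] = {"enabled": False, "selected_tool": None}


def load_config(path: Path) -> Dict[str, Any]:
    """Return the stored config, or the default when the file is missing.

    Hypothetical helper; the real loader in llm-service may differ.
    """
    if not path.exists():
        return dict(DEFAULT_CONFIG)
    raw = json.loads(path.read_text())
    return {
        "enabled": bool(raw.get("enabled", False)),
        "selected_tool": raw.get("selected_tool"),
    }


def validate_config(config: Dict[str, Any], available_tools: List[str]) -> None:
    """Reject a selected tool that is not available for the current provider."""
    selected = config["selected_tool"]
    if selected is not None and selected not in available_tools:
        raise ValueError(f"Image generation tool not available: {selected}")
```

Because the file is git-ignored, a fresh checkout has no config file and falls back to the disabled default in this sketch.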

Security and operational notes

Admin-only config mutation; validated against available tools for the current provider
Image-gen tools are non-deletable via API and hidden from the regular tools table
Cached file serving:
- Serves from llm-service/cache with MIME detection and long Cache-Control
- Follow-up recommendation: ensure path traversal is prevented by verifying Path(file).resolve().is_relative_to(_cache_dir) before reading
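A minimal sketch of the recommended hardening, combined with MIME detection. It assumes Python 3.9+ (for Path.is_relative_to); the function name resolve_cached_file and the cache_dir parameter are hypothetical, and the actual route handler in this PR may be structured differently.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def resolve_cached_file(cache_dir: Path, filename: str) -> Optional[Tuple[Path, str]]:
    """Return (path, content_type) for a file inside cache_dir, else None.

    Hypothetical helper: resolves the requested name and refuses anything
    that ends up outside the cache directory (for example "../secret.txt").
    """
    root = cache_dir.resolve()
    candidate = (root / filename).resolve()
    # Reject paths that escape the cache directory or do not exist as files.
    if not candidate.is_relative_to(root) or not candidate.is_file():
        return None
    content_type, _ = mimetypes.guess_type(candidate.name)
    # Fall back to a generic binary type when the extension is unknown.
    return candidate, content_type or "application/octet-stream"
```

A route handler could return 404 when this yields None and otherwise send the file with the detected content type. Resolving both the root and the candidate before comparing also covers symlinks that point outside the cache.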

Backwards compatibility

Existing chat/query flows remain supported
Internal refactor to TextBlock messages aligns with newer LlamaIndex; all internal call sites updated
No changes required to existing projects or data sources

How to test

Dependencies
- In llm-service: update env and deps (e.g., uv sync or your standard env refresh)
OpenAI flow
- Set OPENAI_API_KEY
- In UI Tools page, enable Image Generation, select “OpenAI Image Generation”, save
- In a chat, enable the tool via Tools Manager and ask for an image; verify a markdown image appears and loads from /llm-service/cache/...
Bedrock flow
- Ensure AWS credentials and access to stability.sd3-5-large-v1:0 or amazon.titan-image-generator-v2:0
- Enable Image Generation and select the desired Bedrock tool; generate an image and verify
Permissions
- Attempt to POST /tools/image-generation/config without admin headers → 401
- Attempt to delete an image-gen tool → 400
UI
- Tools page shows Image Generation section; Save button state reflects changes
- Footer Tools Manager only shows the globally selected image-gen tool when enabled
- Markdown image renders full width; sandbox: prefix is stripped

Known limitations / follow-ups

Add path traversal hardening for /cache/{filename} as noted above
Add automated tests for new endpoints and image generation flows
OpenAI model discovery is stubbed; consider implementing list_image_generation_models() for OpenAI
Add documentation for enabling image generation and for provider prerequisites

jkwatson and others added 30 commits July 10, 2025 11:29

convert to using blocks for llama index ChatMessages

f4fc127

also use ChatMessage for direct chat

6fbba3a

use text blocks for a couple more cases

3ebe708

update a couple llama index libs

37968e8

fix a bug when data source summary is none

9405431

add in image generation

91cf234

ai generated tests for the evaluators functions

d9498ef

don't try to look up node ids in empty vector stores

9677012

move suggested questions under the sessions route

49bbdb1

fix things up for postgres db access

19f0f43

formatting

b0c6f3c

fixes for not being able to create new dbs

50c1f3f

only set the DB_URL if it isn't already set

b343124

fix the install directory

77e14e1

change location of .nvm and source bash from install_node

0d83634

Update release version to dev-testing

cd202de

removed unused import

3b91002

add logging for initializing the JDBI instance

159b37d

Update release version to dev-testing

1150fbc

wip on ui for metadata

51b6337

wip

af8d53f

update FE types to match python land

3ebf3c5

fix margin bottom consistency

d10843b

Update release version to dev-testing

c3f2e2e

set the username/password for the database if set from env

497c80b

drop databases

0c9b0a2

Update release version to dev-testing

c3de72a

limit number of retries

3f7ac6e

Update release version to dev-testing

7d4ebfd

bumped bedrock converse and fixed a bug in tool calling check

168e79d

actions-user and others added 30 commits July 22, 2025 21:41

Update release version to dev-testing

a65a696

Use match-case for ModelSource

1c9e2d6

refactor: enhance input validation for JDBC URL, username, and password in H2 configuration

5c1a97b

Reduce duplication

83e0239

Merge branch 'mob/main' into mob/spike-multi-modal

7d5e1d1

# Conflicts: # llm-service/uv.lock

Mostly satisfy mypy

60c49ad

Satisfy mypy

1b9b5ce

Remove unused import

e0df455

feat: add ModelSource and get_model_source to module exports to fix warnings

b3e4ece

fix: update API proxy port from 3000 to 8080 in Vite configuration

32ecf9a

feat: implement image generation tools based on model source selection

76cbe5f

refactor: remove unused import of BaseTool from image_generation.py

bbd1f47

refactor: update constructor parameters to use Optional for api_key and download

c801a47

Merge branch 'main' into mob/spike-multi-modal

a6fe657

feat: add endpoint and query for retrieving available image generation tools

2504b9d

feat: add functionality to select and retrieve image generation tools

3dece7f

feat: update selected image generation tool handling to support null values

8d4d020

refactor: improve error logging format for selected image generation tool retrieval

1447660

feat: refactor image generation tool handling to use a unified configuration model

5cd2304

feat: update ToolsManager to handle session updates and refactor tool selection logic

5c4acf5

feat: add cache static file serving functionality to Vite and Express configurations

04bd88f

feat: enhance image handling in chat responses by removing sandbox prefix and updating markdown link format

9d30b12

Merge remote-tracking branch 'origin/main' into mob/spike-multi-modal

62ebd42

# Conflicts: # llm-service/app/services/models/__init__.py

refactor: remove unused regex import from chat service

92dcc34

refactor: improve type definitions and clean up cache static plugin in Vite configuration

3615f53

fix: update valueFormatter to handle undefined values in Metrics component

cc2e6c3

serve images from llm-service/cache

25b6edb

fix mypy

384cffa

Merge branch 'main' into mob/spike-multi-modal

09c6e0c
