A Dify plugin that extracts text from images using the OnnxOCR API. This plugin provides a simple and efficient way to convert images containing text into machine-readable text format within Dify workflows and applications.

ding113/dify-OnnxOCR-plugin

OnnxOCR for Dify

Description

OnnxOCR is a Dify plugin that extracts text from images using the OnnxOCR API. This plugin provides a simple and efficient way to convert images containing text into machine-readable text format within Dify workflows and applications.

Features

  • Image URL Support: Process images from web URLs
  • High Performance: Powered by OnnxOCR's ONNX-optimized PaddleOCR backend
  • Configurable API Endpoint: Use your own OnnxOCR service endpoint
  • Dual Output Format: Provides both clean text and structured JSON data
  • Robust Error Handling: Built-in retry mechanism and comprehensive error reporting
  • Workflow Integration: Provides structured variables for use in Dify workflows

Parameters

Parameter     Type    Required  Default                Description
image_url     string  Yes       -                      URL of the image to process
api_endpoint  string  No        http://127.0.0.1:5005  OCR API endpoint URL

Output Variables

The plugin provides the following variables for use in workflows:

  • extracted_text (string): Clean extracted text from the image (line-separated)
  • ocr_results (object): Complete OCR API response with coordinates and confidence scores
  • processing_time (number): Time taken to process the image in seconds

Output Format

Each run produces the text and JSON formats together and exposes both to workflows:

  1. Text Display: Clean text format shown to users
  2. JSON Data: Complete OCR response available as structured data
  3. Workflow Variables: Both text and JSON formats available for workflow nodes

Usage Examples

Text Output (User Display)

Name
Header
Additional Text

JSON Output (Workflow Data)

{
  "processing_time": 0.456,
  "results": [
    {
      "text": "Name",
      "confidence": 0.9999361634254456,
      "bounding_box": [[4.0, 8.0], [31.0, 8.0], [31.0, 24.0], [4.0, 24.0]]
    },
    {
      "text": "Header", 
      "confidence": 0.9998759031295776,
      "bounding_box": [[233.0, 7.0], [258.0, 7.0], [258.0, 23.0], [233.0, 23.0]]
    }
  ]
}
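As a rough illustration of how the two outputs relate, the sketch below rebuilds the line-separated text from the JSON response shown above. The helper name `lines_from_results` and the `min_confidence` threshold are hypothetical and not part of the plugin; the field names (`results`, `text`, `confidence`) are taken from the example response.

```python
import json

# Sample response copied from the JSON output example above.
SAMPLE = """
{
  "processing_time": 0.456,
  "results": [
    {
      "text": "Name",
      "confidence": 0.9999361634254456,
      "bounding_box": [[4.0, 8.0], [31.0, 8.0], [31.0, 24.0], [4.0, 24.0]]
    },
    {
      "text": "Header",
      "confidence": 0.9998759031295776,
      "bounding_box": [[233.0, 7.0], [258.0, 7.0], [258.0, 23.0], [233.0, 23.0]]
    }
  ]
}
"""


def lines_from_results(response: dict, min_confidence: float = 0.0) -> str:
    """Join recognized fragments one per line, skipping low-confidence ones.

    Hypothetical helper: the plugin's own text assembly may differ.
    """
    kept = [
        item["text"]
        for item in response.get("results", [])
        if item.get("confidence", 0.0) >= min_confidence
    ]
    return "\n".join(kept)


response = json.loads(SAMPLE)
print(lines_from_results(response))
```

A downstream code node could apply the same filtering to `ocr_results` when only high-confidence fragments are wanted.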

Workflow Integration Guide

Basic Integration Steps

  1. Add OnnxOCR Node: Add the OnnxOCR tool to your workflow
  2. Configure Parameters: Set the image URL and optional API endpoint
  3. Use Output Variables: Reference the extracted text in subsequent nodes

Processing Uploaded Images

If you need to process user-uploaded images, use a code execution node to extract file URLs:

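def main(files: list) -> dict:
    # Collect the URL of every uploaded file that is an image.
    image_urls = []

    # `files` may be empty or None when nothing was uploaded.
    for file_obj in files or []:
        is_image = isinstance(file_obj, dict) and file_obj.get('type') == 'image'
        if is_image and 'url' in file_obj:
            image_urls.append(file_obj['url'])

    return {
        'image_urls': image_urls
    }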

Usage Steps:

  1. Add a Code Execution Node with the above code
  2. Pass file uploads as input to the code node
  3. The code node outputs image_urls (Array[string])
  4. Use an Iteration Node to loop through the URL array
  5. Call the OnnxOCR node within the iteration to process each image

Installation

Step 1: Install OnnxOCR Service

This plugin requires a running OnnxOCR service. Set one up from the original OnnxOCR project using either of the options below:

Option 1: Using the Original OnnxOCR Project

  1. Clone the repository:

    git clone https://github.com/jingsongliujing/OnnxOCR.git
    cd OnnxOCR
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run the service:

    python app.py

    The service will start on http://127.0.0.1:5005 by default.

Option 2: Docker Installation

  1. Build the Docker image:

    git clone https://github.com/jingsongliujing/OnnxOCR.git
    cd OnnxOCR
    docker build -t onnxocr .
  2. Run the Docker container:

    docker run -p 5005:5005 onnxocr

Step 2: Install Dify Plugin

  1. Download or clone this plugin repository
  2. Upload the plugin to your Dify instance
  3. Configure the API endpoint if using a custom OnnxOCR service
  4. Start using the tool in your applications or workflows

API Requirements

This plugin requires an OnnxOCR service running and accessible at the configured endpoint. The service should accept POST requests to /ocr with the following format:

Request:

{
  "image": "base64_encoded_image_data"
}

Response:

{
  "processing_time": 0.456,
  "results": [
    {
      "text": "extracted_text",
      "confidence": 0.99,
      "bounding_box": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
    }
  ]
}
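To make the request format concrete, here is a minimal sketch of a client for the /ocr endpoint using only the Python standard library. The function names `build_ocr_request` and `run_ocr`, the 30-second timeout, and the `Content-Type: application/json` header are assumptions for illustration; only the `/ocr` path and the `image` field come from the format described above.

```python
import base64
import json
import urllib.request


def build_ocr_request(endpoint: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a POST request carrying the image as base64 in a JSON body."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/ocr",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(endpoint: str, image_bytes: bytes, timeout: float = 30.0) -> dict:
    """Send the image to the service and return the decoded JSON response."""
    request = build_ocr_request(endpoint, image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a service running locally, `run_ocr("http://127.0.0.1:5005", open("sample.png", "rb").read())` would be expected to return a dictionary shaped like the response above; `sample.png` is a placeholder file name.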

Error Handling

The plugin includes comprehensive error handling for:

  • Invalid image URLs
  • Network connectivity issues
  • API endpoint unavailability
  • Image processing failures

All errors are reported with descriptive messages to help troubleshoot issues.
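The README does not describe how the retry mechanism works, so the following is only a sketch of one common approach: retrying a failed call with exponential backoff. The name `with_retries`, the attempt count, and the delays are hypothetical and do not reflect the plugin's actual settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on network-style errors with backoff.

    OSError covers ConnectionError, socket timeouts, and urllib's URLError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            if attempt == attempts:
                # Out of retries: surface a descriptive error to the caller.
                raise RuntimeError(
                    f"OCR request failed after {attempts} attempts: {exc}"
                ) from exc
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.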

Acknowledgments

This plugin is built on top of the excellent OnnxOCR project by jingsongliujing. OnnxOCR provides a high-performance OCR solution using ONNX Runtime with PaddleOCR models, delivering fast and accurate text recognition capabilities.

Original Project: https://github.com/jingsongliujing/OnnxOCR

We extend our gratitude to the OnnxOCR project maintainers and contributors for their excellent work in making OCR accessible and efficient.

Supported Image Formats

The plugin supports all image formats supported by the underlying OnnxOCR service:

  • JPEG
  • PNG
  • BMP
  • GIF
  • WebP
  • TIFF

License

This plugin is provided as-is for use with Dify. Please refer to the original OnnxOCR project for licensing information regarding the OCR service.

Contributing

Contributions to improve this plugin are welcome. Please ensure that any changes maintain compatibility with the OnnxOCR API format.

Support

For issues related to:

  • Plugin functionality: Open an issue in this repository
  • OCR service: Refer to the OnnxOCR project
  • Dify integration: Check the Dify documentation
