This repository was archived by the owner on Jun 14, 2018. It is now read-only.

tessedit_char_whitelist: detect only predefined chars #78

@MyraBaba

Hi,

We are using pyocr to read labels that contain only letters and digits.

How can I restrict recognition to a specific list of characters?

I tried the following.

In libtesseract/__init__.py:

if "label" in builder.tesseract_configs:
    tesseract_raw.set_is_label(handle, True)

and in tesseract_raw.py:

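def set_is_label(handle, mode):
    # Restrict recognition to the label alphabet when mode is true;
    # an empty whitelist removes the restriction again.
    global g_libtesseract
    assert(g_libtesseract)

    if mode:
        # wl = b"0123456789ABCDEFGHIJKLMNOPRSTUVYZXW"
        wl = b"0123456789ABNOPRSTUVYZXW"
    else:
        wl = b""

    # Variable name and value are passed as byte strings (C char *).
    g_libtesseract.TessBaseAPISetVariable(
        ctypes.c_void_p(handle),
        b"tessedit_char_whitelist",
        wl
    )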

But this did not work.
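If the engine-side whitelist does not take effect, one possible fallback is to filter the recognized text afterwards. This is a different technique from `tessedit_char_whitelist`: it only discards unwanted characters from the output and does not steer recognition. A minimal sketch, where `keep_allowed` and `ALLOWED` are hypothetical names introduced for illustration:

```python
# Post-filter sketch: drop every character that is not in the label alphabet.
# ALLOWED and keep_allowed are illustrative names, not part of pyocr.
ALLOWED = frozenset("0123456789ABNOPRSTUVYZXW")


def keep_allowed(text, allowed=ALLOWED, keep_whitespace=True):
    """Return text with only allowed characters (and optionally whitespace)."""
    return "".join(
        ch for ch in text
        if ch in allowed or (keep_whitespace and ch.isspace())
    )


if __name__ == "__main__":
    # Example input standing in for OCR output.
    print(keep_allowed("AB-12 cZ"))
```

Because the filter runs after recognition, a misread character is removed rather than corrected, so it complements an engine-side whitelist and does not replace it.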

Is there a simpler way to do it, for example:

tool.image_to_string(
    Image.open("tmp.png"),
    lang="eng",
    tessedit_char_whitelist="0123456789ABNOPRSTUVYZXW",
    builder=pyocr.builders.LineBoxBuilder()
)
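The keyword in the call above is a wished-for interface. One route that might get close to it is to attach the whitelist to the builder before calling `image_to_string`. The sketch below rests on two assumptions that should be checked against the installed pyocr version: that the builder exposes a `tesseract_flags` list, and that the command-line backend forwards that list to the `tesseract` executable, whose `-c VAR=VALUE` option sets a variable. `add_char_whitelist` is a hypothetical helper, and a stand-in object is used so the snippet runs without pyocr installed.

```python
from types import SimpleNamespace


def add_char_whitelist(builder, chars):
    """Append `-c tessedit_char_whitelist=<chars>` to builder.tesseract_flags.

    Assumption: the builder has (or accepts) a `tesseract_flags` list that
    is later passed to the tesseract command line.
    """
    if not chars:
        raise ValueError("whitelist must not be empty")
    if any(ch.isspace() for ch in chars):
        raise ValueError("whitelist must not contain whitespace")
    flags = list(getattr(builder, "tesseract_flags", []))
    flags += ["-c", "tessedit_char_whitelist=" + chars]
    builder.tesseract_flags = flags
    return builder


if __name__ == "__main__":
    # Stand-in for a pyocr builder; a real call would pass
    # pyocr.builders.LineBoxBuilder() instead.
    demo = SimpleNamespace(tesseract_flags=[])
    add_char_whitelist(demo, "0123456789ABNOPRSTUVYZXW")
    print(demo.tesseract_flags)
```

If these assumptions hold, the returned builder would be passed as `builder=` to `tool.image_to_string`. This would affect only the command-line backend; the libtesseract path would still need a `TessBaseAPISetVariable` call like the one in `set_is_label`.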

thanks
