Read text from images in pdf/docx while using GCSDirectoryLoader #18266

vishvas-chauhan · 2024-02-28T16:24:31Z

Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

I tested GCSDirectoryLoader on a PDF that had car names at the bottom of car images.
I expected GCSFileLoader to fetch all the text from images in the PDF or DOCX as well.

Would it be possible to add a pre-check function such as catch_image(path) --> boolean that returns True if any image is detected in the PDF/DOCX and False otherwise?
When it returns True, the file would go through a text-from-image processing function; otherwise it would take the default path, which already works fine.
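
To make the idea concrete, here is a minimal sketch of what such a pre-check might look like, using only the Python standard library. It is not part of LangChain; the name `catch_image` comes from the request above, while `IMAGE_SUFFIXES` and `PDF_IMAGE_MARKER` are hypothetical helpers introduced for illustration. It assumes the file has already been downloaded to a local path. For DOCX it relies on the fact that a .docx is a zip archive with embedded pictures under `word/media/`; for PDF it uses a heuristic byte scan for image XObject dictionaries, which would miss inline images.

```python
import re
import zipfile
from pathlib import Path

# Assumed, non-exhaustive list of image extensions found inside .docx packages.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff", ".emf", ".wmf"}

# Image XObjects in a PDF carry "/Subtype /Image" in their dictionary.
PDF_IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def catch_image(path: str) -> bool:
    """Return True if the pdf/docx at `path` appears to contain an image."""
    file = Path(path)
    suffix = file.suffix.lower()

    if suffix == ".docx":
        # A .docx is a zip archive; embedded pictures live under word/media/.
        with zipfile.ZipFile(file) as archive:
            return any(
                name.startswith("word/media/")
                and Path(name).suffix.lower() in IMAGE_SUFFIXES
                for name in archive.namelist()
            )

    if suffix == ".pdf":
        # Heuristic byte scan; it misses inline images drawn inside content streams.
        return PDF_IMAGE_MARKER.search(file.read_bytes()) is not None

    # Other file types are treated as having no images in this sketch.
    return False
```

A caller could then route files where `catch_image(path)` is True to an OCR step and leave the rest on the default path. For the True branch, it may be worth checking whether passing a custom `loader_func` to `GCSDirectoryLoader` (for example one that builds `PyPDFLoader(path, extract_images=True)`) already covers this; that is untested here.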

Suggestion:
You could convert the image to a string and keep only the text content, or use any other approach that suits.
Thanks

Motivation

My feature request relates to a problem: the content loaded from the PDF or DOCX is missing information, which produces an incomplete context and can mislead decisions. I can't adopt the loader until I can validate that all text has been fetched from the documents.

Proposal (If applicable)

No response

Replies: 0 comments
