Read text from images in pdf/docx while using GCSDirectoryLoader #18266
vishvas-chauhan started this conversation in Ideas
Feature request
I tested the GCSDirectoryLoader on a PDF that had car names printed at the bottom of car images.
I was expecting GCSFileLoader to fetch all the text from images in the PDF or DOCX, but that text was not returned.
Would it be possible to add a pre-check function such as
catch_image(path) --> boolean
which returns True if any image is detected in the PDF/DOCX and False otherwise? When it returns True, the file would go through a
text-from-image processing function,
and otherwise the default loading path, which was working fine, would be used.
Suggestion: you could convert the image to a string and keep only the text content, or use any other program that suits.
Thanks
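To make the idea concrete, here is a minimal sketch of what such a pre-check could look like. It is not part of LangChain: the name `catch_image` comes from the request above, and the detection logic is my own assumption. It uses only the standard library and two heuristics: a DOCX is a zip archive that normally stores embedded media under `word/media/`, and image XObjects in a PDF carry `/Subtype /Image` in their dictionaries. It can miss inline PDF images and should be treated as a rough filter only.

```python
import re
import zipfile
from pathlib import Path

# Heuristic marker for image XObjects in a PDF (assumption: misses inline images).
_PDF_IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def catch_image(path: str) -> bool:
    """Return True if the PDF or DOCX at `path` appears to contain an image."""
    suffix = Path(path).suffix.lower()
    if suffix == ".docx":
        # A DOCX is a zip archive; embedded media normally live in word/media/.
        with zipfile.ZipFile(path) as archive:
            return any(
                name.startswith("word/media/") for name in archive.namelist()
            )
    if suffix == ".pdf":
        # Scan the raw bytes for an image XObject dictionary entry.
        data = Path(path).read_bytes()
        return _PDF_IMAGE_MARKER.search(data) is not None
    # Other file types are not inspected in this sketch.
    return False
```

A loader could call `catch_image(path)` before parsing and send only the files that return True through an OCR step (for example Tesseract via `pytesseract`, which is an assumption on my side), leaving every other file on the default path.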
Motivation
My feature request is related to a problem: the content loaded from the PDF or DOCX is missing this information, which produces incomplete context and can mislead decisions. I can't adopt the loader until I can confirm that all text has been fetched from the documents.
Proposal (If applicable)
No response