Extractor pdf 2 image #11909

Open
wants to merge 16 commits into
base: main

ic-xu
Contributor

@ic-xu ic-xu commented Dec 20, 2024

Summary

Many users chat with the large model by uploading PDFs. The current approach extracts the text content from the PDF and passes it to the model. We found that this can lose important information in the PDF, such as layout, tables, and the relationships between elements. To give the model as much of that information as possible, this PR adds a feature that converts PDFs to images for input.
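
For readers who want a concrete picture of the idea, here is a minimal sketch of PDF-to-image conversion. It is not the code in this PR: it assumes `pypdfium2` and Pillow are installed, and the function names `render_pdf_pages_to_png` and `to_data_urls` are hypothetical helpers introduced only for illustration. Each page is rendered to PNG bytes, then wrapped as a base64 data URL, one common way to hand an image to a vision-capable model.

```python
import base64
import io
from typing import List


def render_pdf_pages_to_png(pdf_bytes: bytes, scale: float = 2.0) -> List[bytes]:
    """Render every page of a PDF to PNG bytes.

    Assumption: pypdfium2 and Pillow are available. The import is done
    lazily so the rest of the module works without them.
    """
    import pypdfium2 as pdfium  # third-party; not part of the standard library

    pdf = pdfium.PdfDocument(pdf_bytes)
    try:
        pages_png: List[bytes] = []
        for index in range(len(pdf)):
            page = pdf[index]
            # A scale above 1.0 raises the resolution so small text and
            # table borders stay legible in the rendered image.
            bitmap = page.render(scale=scale)
            image = bitmap.to_pil()
            buffer = io.BytesIO()
            image.save(buffer, format="PNG")
            pages_png.append(buffer.getvalue())
        return pages_png
    finally:
        pdf.close()


def to_data_urls(png_pages: List[bytes]) -> List[str]:
    """Wrap PNG bytes as base64 data URLs (hypothetical helper)."""
    return [
        "data:image/png;base64," + base64.b64encode(png).decode("ascii")
        for png in png_pages
    ]
```

Rendering whole pages keeps layout and tables intact, but it produces much larger inputs than extracted text, so the scale factor is a trade-off between legibility and size.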

Screenshots

Before: ...
After: [image] [image]

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

feat: add extractor image support in DocumentExtractorNode
feat: add extractor image support in DocumentExtractorNode
feat: output format
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 🌊 feat:workflow Workflow related stuff. 💪 enhancement New feature or request labels Dec 20, 2024
feat: output format
feat: reformat
feat: reformat
api/libs/helper.py (review thread, outdated)
crazywoola previously approved these changes Dec 27, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 27, 2024
fix: Missing return statement [return]
@laipz8200
Member

Adding a special case for PDF-to-image conversion in the Document Extractor might confuse users. I think this feature should be implemented as a Tool.

Additionally, models that excel at image recognition, such as Gemini and Claude, also support direct PDF input.
