Running v3.1 in Docker containers for testing (per https://docs.papermerge.io/3.1/setup/docker-compose/ ). When you upload a digitally signed document and attempt to OCR it, the process fails silently. The worker logs show the following error, which makes sense on its face:
[2024-03-02 00:20:56,933: ERROR/ForkPoolWorker-8] Task papermerge.core.tasks.ocr_document_task[77be2d59-9703-42df-a3cc-bf920a61eab4] raised unexpected: DigitalSignatureError()
Traceback (most recent call last):
  File "/core_app/.venv/lib/python3.10/site-packages/celery/app/trace.py", line 477, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/core_app/.venv/lib/python3.10/site-packages/celery/app/trace.py", line 760, in __protected_call__
    return self.run(*args, **kwargs)
  File "/core_app/papermerge/core/tasks.py", line 79, in ocr_document_task
    ocr_document(
  File "/core_app/papermerge/core/ocr/document.py", line 86, in ocr_document
    _ocr_document(
  File "/core_app/papermerge/core/ocr/document.py", line 54, in _ocr_document
    ocrmypdf.ocr(
  File "/core_app/.venv/lib/python3.10/site-packages/ocrmypdf/api.py", line 337, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/core_app/.venv/lib/python3.10/site-packages/ocrmypdf/_sync.py", line 388, in run_pipeline
    validate_pdfinfo_options(context)
  File "/core_app/.venv/lib/python3.10/site-packages/ocrmypdf/_pipeline.py", line 204, in validate_pdfinfo_options
    raise DigitalSignatureError()
ocrmypdf.exceptions.DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document,
invalidating the signature.
I can't find any mention of this anywhere, but supporting OCR for digitally signed documents would be nice. Perhaps the version dropdown could indicate something like "Version X w/ OCR and w/o digital signature". Honestly, I don't even care about accessing a version of the document with OCR'd text, so long as the text is there for full-text search. This matters most when dealing with a large number of signed legal documents.
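To make the suggestion concrete, here is a rough sketch of the fallback I have in mind. It is not Papermerge's actual code: `run_ocr`, `OcrResult`, and the local `SignatureError` class are hypothetical stand-ins, and the version labels are only illustrative. In a real worker, `run_ocr(True)` might map to OCRmyPDF's signature-invalidation option (I believe newer releases expose one, but I have not checked whether the version bundled with 3.1 does).

```python
from dataclasses import dataclass
from typing import Callable


class SignatureError(Exception):
    """Hypothetical stand-in for ocrmypdf.exceptions.DigitalSignatureError."""


@dataclass
class OcrResult:
    signature_dropped: bool  # True if OCR only succeeded by giving up the signature
    version_label: str       # illustrative label for the version dropdown


def ocr_with_signature_fallback(
    run_ocr: Callable[[bool], None],
    version: int,
    signature_error: type = SignatureError,
) -> OcrResult:
    """Try a normal OCR pass; if the PDF is signed, retry while allowing
    the signature to be invalidated and label the new version accordingly.

    run_ocr(allow_invalidate) is an assumed callable wrapping the real OCR call.
    """
    try:
        run_ocr(False)
        return OcrResult(False, f"Version {version} w/ OCR")
    except signature_error:
        # Signed input: fall through and retry on the understanding that
        # the OCR'd version will no longer carry a valid signature.
        pass
    run_ocr(True)
    return OcrResult(True, f"Version {version} w/ OCR and w/o digital signature")
```

The original signed file would stay untouched as an earlier version; only the OCR'd version loses the signature, and the label tells the user so.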
Would you mind uploading a digitally signed document that I can experiment with? Of course, I mean a document without sensitive information. A one-page, digitally signed document with a couple of words would do the job just fine.
This will help me understand your request better and, of course, validate the feature while developing it.
Attaching three. One is a digital document pushed straight through DocuSign. One is the same document printed, scanned, and then put through DocuSign. The third is the same printed and scanned document signed with Adobe Acrobat (which I'm least confident will work, because Adobe...)