Search on encrypted files #485
Replies: 1 comment
-
Oh. That note is still in there from Daniel Quinn, and I should probably make that clear. My bad. Anyway, on encryption. Paperless does in fact use a search index, and it does contain terms from your documents in plain text. It does not contain the actual content of the documents, but it is certainly possible to reconstruct some of that content from the index. Encrypting the search index is pretty difficult, primarily because many internal components in paperless need to access and modify not only the search index, but also the other parts that store data (the database, the media directory, and the model that's responsible for the auto matching algorithm).
All of the above require paperless to have the means to decrypt encrypted data. This would require storing the key somewhere on the server (that's what was done in old paperless, and I don't think it's a good solution), and that defeats the purpose of having the data encrypted in the first place. Encryption protects in case of a server breach, and having the key available on that server, well... I'm also not an expert on security, so take that with a grain of salt. Regarding your specific questions:
If I were to add some actual support for multiple users, yes. Or some other means to make sure that users don't find things they're not supposed to see.
See above.
This is exactly what I would do if I were to design and implement a secure DMS: have everything encrypted on the server, with the decryption key derived from a user password and never sent to the server. However, these are no easy changes. In fact, we'd need to reconsider pretty much every aspect of paperless: how documents are added, how data is stored in the database, how searching works. Therefore, it's very unlikely that I'll ever add anything related to encryption to paperless. This would be a new project.
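To make the "key derived from a user password" idea concrete, here is a minimal sketch using PBKDF2 from Python's standard library. It is not code from paperless. The function name `derive_key`, the iteration count, and the salt size are assumptions chosen for illustration, and in the design described above the derivation would run on the client rather than on the server.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a password (hypothetical helper).

    The salt is not secret and can be stored next to the encrypted data.
    The iteration count is an illustrative value; tune it for your hardware.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# A fresh random salt per user (or per index) is assumed here.
salt = os.urandom(16)

key = derive_key("correct horse battery staple", salt)
same_key = derive_key("correct horse battery staple", salt)
other_key = derive_key("wrong password", salt)

# The same password and salt always give the same key,
# so the key itself never has to be stored or transmitted.
print(hmac.compare_digest(key, same_key))
print(hmac.compare_digest(key, other_key))
```

Only the salt and the ciphertext would need to live on the server in such a scheme. The key exists only while the user is logged in, which is also why server-side components such as indexing and auto matching could no longer read the data on their own.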
-
Dear Jonas Winkler,
Thank you for keeping paperless-ng moving forward. Regarding search and encryption as stated by the Important Note:
Is it possible to use an encrypted search index file/files? Encrypted by the user's password, random salt, or both?
A different index per user would be ok, I think?
The search index can be decrypted and stored in a locked-down/privileged portion of disk or memory upon user login or first search if the index is small enough, and destroyed on session end, disconnect, or logout?
Depending on the user's device specs and the size of the index (which I imagine is small even for several hundred thousand files), the decrypted index could even be stored in the user's browser cache/memory, which means decryption would be done on the client device, likely in JavaScript?
Check search techniques used by Proton Mail or Tutanota, maybe?
I am not familiar with the internals of paperless-ng, yet it seems that if the search is index-based and fast, we might be able to do something that makes sense without degrading performance.