-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use in-memory database with shared cache #719
Conversation
@lieryan this is ready for review. |
rope/contrib/autoimport/sqlite.py
Outdated
db_path = ":memory:" | ||
# Allows the in-memory db to be shared across threads | ||
# See https://www.sqlite.org/inmemorydb.html | ||
project_hash = hash(project and project.ropefolder and project.ropefolder.real_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid using hash
function here, instead use a more standard hashes in hashlib. While we don't really need the strong security property of hashing, the hash()
function is tied to the implementation of dict/set rather than being a general purpose hash function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that I am essentially hashing the string that is returned by project.ropefolder.real_path
. I do the checks as a fail-safe, but it should not happen that project is not None, but project.ropefolder is (source).
I can make the distinctions clearer if you want with another if condition:
project_hash = hash(project and project.ropefolder and project.ropefolder.real_path) | |
project_hash: int | |
if project is None or project.ropefolder is None: | |
project_hash = hash(None) | |
else: | |
project_hash = hash(project.ropefolder.real_path) |
I added a unit test that checks for the most common use case I think: if we have two different projects, then we get two different in-memory databases, while same projects share the database.
I find using hashlib
overkill here, as it doesn't add much value since we pass in None
or str
to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, if project.ropefolder is None, that means the .ropeproject
is disabled for that project. In that case, I think we should hash the project's path instead of the ropefolder's path, at the very least they will still share an in memory database. That will be a change from previous behavior, but I think it should be fine.
Also, this
project_hash = hash(None)
does not look right. It meant that when no project is provided, everything will connect to the same in-memory database.
The expectation here is that when no project is provided, that it should always create a new, empty database, so the right way to do this might be to just generate a random hash or leave that case to use unnamed in memory database. That said, IIRC, project = None is really only intended to be used in unittests anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also checked the code and it seems project = None is not allowed when creating an AutoImport instance, and it is allowed by create_database_connection, but not even used in tests by us. My feeling is to let everything set on fire if somebody uses create_database_connection without specifying a project, but I don't want to be a bad person so I just create a random hash that has the same format as the regular hash :)
rope/contrib/autoimport/sqlite.py
Outdated
db_path = ":memory:" | ||
# Allows the in-memory db to be shared across threads | ||
# See https://www.sqlite.org/inmemorydb.html | ||
project_hash: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about moving this into its own function, but that would add another hop for understanding what's happening, increasing obscurity of the code base. So I decided against it. I follow the principle of "small interfaces" and "deep modules"
hash() is intended to be used for sets and dictionaries, where the occasional collisions are expected. Also, in some versions of Python, hash() is affected by hash randomization, it's not clear whether this will be desirable.
This better clearly uniquely identify the database. While the likelihood of a problem in practice should be really, really small, in theory because rope is a library, it may share its memory space with other libraries, which may have used the same naming scheme.
ebc616a
to
f75467d
Compare
@tkrabel thank you for creating this PR |
Description
As discussed in #714 (comment), this PR sets the foundation for using
AutoImport
in a multithreading environment by using an in-memory database with shared cache perProject
ifmemory=True
. On-disk databases are not affected.Checklist (delete if not relevant):