Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use in-memory database with shared cache #719

Merged
merged 16 commits into from
Nov 5, 2023

Conversation

tkrabel
Copy link
Contributor

@tkrabel tkrabel commented Oct 28, 2023

Description

As discussed in #714 (comment), this PR sets the foundation for using AutoImport in a multithreading environment by using an in-memory database with shared cache per Project if memory=True. On-disk databases are not affected.

Checklist (delete if not relevant):

  • I have added tests that prove my fix is effective or that my feature works
  • I have updated CHANGELOG.md
  • I have made corresponding changes to user documentation for new features
  • I have made corresponding changes to library documentation for API changes

@tkrabel
Copy link
Contributor Author

tkrabel commented Oct 28, 2023

@lieryan this is ready for review.

CHANGELOG.md Outdated Show resolved Hide resolved
db_path = ":memory:"
# Allows the in-memory db to be shared across threads
# See https://www.sqlite.org/inmemorydb.html
project_hash = hash(project and project.ropefolder and project.ropefolder.real_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid using hash function here, instead use a more standard hashes in hashlib. While we don't really need the strong security property of hashing, the hash() function is tied to the implementation of dict/set rather than being a general purpose hash function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that I am essentially hashing the string that is returned by project.ropefolder.real_path. I do the checks as a fail-safe, but it should not happen that project is not None, but project.ropefolder is (source).

I can make the distinctions clearer if you want with another if condition:

Suggested change
project_hash = hash(project and project.ropefolder and project.ropefolder.real_path)
project_hash: int
if project is None or project.ropefolder is None:
project_hash = hash(None)
else:
project_hash = hash(project.ropefolder.real_path)

I added a unit test that checks for the most common use case I think: if we have two different projects, then we get two different in-memory databases, while same projects share the database.

I find using hashlib overkill here, as it doesn't add much value since we pass in None or str to it.

Copy link
Member

@lieryan lieryan Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, if project.ropefolder is None, that means the .ropeproject is disabled for that project. In that case, I think we should hash the project's path instead of the ropefolder's path, at the very least they will still share an in memory database. That will be a change from previous behavior, but I think it should be fine.

Also, this

project_hash = hash(None)

does not look right. It meant that when no project is provided, everything will connect to the same in-memory database.

The expectation here is that when no project is provided, that it should always create a new, empty database, so the right way to do this might be to just generate a random hash or leave that case to use unnamed in memory database. That said, IIRC, project = None is really only intended to be used in unittests anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also checked the code and it seems project = None is not allowed when creating an AutoImport instance, and it is allowed by create_database_connection, but not even used in tests by us. My feeling is to let everything set on fire if somebody uses create_database_connection without specifying a project, but I don't want to be a bad person so I just create a random hash that has the same format as the regular hash :)

@tkrabel tkrabel requested a review from lieryan October 29, 2023 10:57
db_path = ":memory:"
# Allows the in-memory db to be shared across threads
# See https://www.sqlite.org/inmemorydb.html
project_hash: int
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about moving this into its own function, but that would add another hop for understanding what's happening, increasing obscurity of the code base. So I decided against it. I follow the principle of "small interfaces" and "deep modules"

hash() is intended to be used for sets and dictionaries, where the
occasional collisions are expected. Also, in some versions of Python,
hash() is affected by hash randomization, it's not clear whether this
will be desirable.
This better clearly uniquely identify the database. While the
likelihood of a problem in practice should be really, really small,
in theory because rope is a library, it may share its memory space with
other libraries, which may have used the same naming scheme.
@lieryan lieryan enabled auto-merge November 5, 2023 15:02
@lieryan lieryan merged commit 78315b5 into python-rope:master Nov 5, 2023
18 checks passed
@lieryan
Copy link
Member

lieryan commented Nov 5, 2023

@tkrabel thank you for creating this PR

@lieryan lieryan added this to the 1.11.0 milestone Nov 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants