Use in-memory database with shared cache #719

tkrabel · 2023-10-28T20:49:19Z

Description

As discussed in #714 (comment), this PR lays the foundation for using AutoImport in a multithreaded environment: when memory=True, each Project gets a named in-memory database with a shared cache. On-disk databases are not affected.
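For readers unfamiliar with the mechanism, here is a minimal sketch of a shared-cache in-memory database using only the standard `sqlite3` module. The database name `rope-demo-shared` and the `names` table are made up for illustration and are not taken from this PR.

```python
import sqlite3
import threading

# Hypothetical database name; any name that is unique within the process works.
URI = "file:rope-demo-shared?mode=memory&cache=shared"

# Keep one connection open: a shared in-memory database is discarded
# as soon as its last connection closes.
main_conn = sqlite3.connect(URI, uri=True)
main_conn.execute("CREATE TABLE names (name TEXT)")
main_conn.execute("INSERT INTO names VALUES ('os.path')")
main_conn.commit()

rows_seen_in_thread = []


def worker():
    # Each thread opens its own connection to the same named database.
    conn = sqlite3.connect(URI, uri=True)
    try:
        rows_seen_in_thread.extend(conn.execute("SELECT name FROM names"))
    finally:
        conn.close()


thread = threading.Thread(target=worker)
thread.start()
thread.join()
```

The worker thread sees the row written by the main thread because both connections use the same `file:...?mode=memory&cache=shared` URI. With plain `":memory:"`, the worker would get its own empty database instead.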

Checklist (delete if not relevant):

I have added tests that prove my fix is effective or that my feature works
I have updated CHANGELOG.md
I have made corresponding changes to user documentation for new features
I have made corresponding changes to library documentation for API changes

rope/contrib/autoimport/sqlite.py

tkrabel · 2023-10-28T21:41:54Z

@lieryan this is ready for review.

CHANGELOG.md

lieryan · 2023-10-29T07:26:19Z

rope/contrib/autoimport/sqlite.py

-            db_path = ":memory:"
+            # Allows the in-memory db to be shared across threads
+            # See https://www.sqlite.org/inmemorydb.html
+            project_hash = hash(project and project.ropefolder and project.ropefolder.real_path)


Can we avoid using the hash() function here and instead use one of the standard hashes in hashlib? While we don't really need the strong security properties of a cryptographic hash, hash() is tied to the implementation of dict/set rather than being a general-purpose hash function.

Please note that I am essentially hashing the string returned by project.ropefolder.real_path. The checks are a fail-safe; it should not happen that project is not None but project.ropefolder is None (source).

I can make the distinction clearer with an explicit if condition, if you want:

Suggested change

-            project_hash = hash(project and project.ropefolder and project.ropefolder.real_path)
+            project_hash: int
+            if project is None or project.ropefolder is None:
+                project_hash = hash(None)
+            else:
+                project_hash = hash(project.ropefolder.real_path)

I added a unit test that checks what I think is the most common use case: two different projects get two different in-memory databases, while connections for the same project share one database.
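The property that test checks can be sketched outside of rope roughly as follows. `connect_named` and the database names are hypothetical helpers for illustration, not the PR's actual test code.

```python
import sqlite3


def connect_named(name: str) -> sqlite3.Connection:
    # Hypothetical helper: opens a shared-cache in-memory database by name.
    return sqlite3.connect(f"file:{name}?mode=memory&cache=shared", uri=True)


def table_names(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


# Two connections for the "same project", one for a "different project".
project_a_first = connect_named("rope-demo-project-a")
project_a_second = connect_named("rope-demo-project-a")
project_b = connect_named("rope-demo-project-b")

project_a_first.execute("CREATE TABLE packages (name TEXT)")
project_a_first.commit()

same_project_tables = table_names(project_a_second)
other_project_tables = table_names(project_b)
```

The second connection with the same name sees the `packages` table, while the connection with a different name sees an empty database.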

I find hashlib overkill here; it doesn't add much value, since we only ever pass None or a str to it.

IIRC, if project.ropefolder is None, that means .ropeproject is disabled for that project. In that case, I think we should hash the project's path instead of the ropefolder's path, so that connections for that project still share an in-memory database. That is a change from the previous behavior, but I think it should be fine.

Also, this

project_hash = hash(None)

does not look right. It means that when no project is provided, everything connects to the same in-memory database.

The expectation is that when no project is provided, a new, empty database is always created, so the right way to do this might be to generate a random hash or to leave that case using an unnamed in-memory database. That said, IIRC, project = None is really only intended to be used in unit tests anyway.
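Both options mentioned above can be sketched with the standard library alone. The `rope-demo-` prefix and the `fresh_shared_uri` helper are assumptions made for this example, not names from rope.

```python
import sqlite3
import uuid

# Option 1: unnamed in-memory databases. Every ":memory:" connection
# gets its own private, empty database.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")
first.execute("CREATE TABLE t (x)")
first.commit()
tables_in_second = second.execute(
    "SELECT count(*) FROM sqlite_master"
).fetchone()[0]


# Option 2: a random name, so the URI has the same shape as the
# per-project case but never matches an existing database.
def fresh_shared_uri() -> str:
    # Hypothetical helper and naming scheme.
    return f"file:rope-demo-{uuid.uuid4().hex}?mode=memory&cache=shared"
```

Option 2 keeps a single code path (always a URI), at the cost of generating a name that nothing else will ever connect to.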

I also checked the code: project = None is not allowed when creating an AutoImport instance. It is allowed by create_database_connection, but we don't even use that in our tests. My feeling is to let everything catch fire if somebody calls create_database_connection without specifying a project, but I don't want to be a bad person, so I just create a random hash that has the same format as the regular hash :)

tkrabel · 2023-11-01T17:29:02Z

rope/contrib/autoimport/sqlite.py

-            db_path = ":memory:"
+            # Allows the in-memory db to be shared across threads
+            # See https://www.sqlite.org/inmemorydb.html
+            project_hash: int


I thought about moving this into its own function, but that would add another hop to understanding what's happening and make the code base more obscure, so I decided against it. I follow the principles of "small interfaces" and "deep modules".

hash() is intended for sets and dictionaries, where occasional collisions are expected. Also, in some versions of Python, hash() is affected by hash randomization, and it's not clear whether that is desirable here.

This identifies the database more clearly and uniquely. While the likelihood of a problem in practice should be really, really small, rope is a library, so in theory it may share its memory space with other libraries that use the same naming scheme.
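A minimal sketch of the idea in these two commit messages, assuming a SHA-256 digest of the project path and a library-specific prefix. The function name `memory_db_uri` and the exact `rope-` naming scheme are illustrative assumptions, not the merged implementation.

```python
import hashlib


def memory_db_uri(project_root: str) -> str:
    """Build a shared in-memory database URI from a project path.

    Hypothetical helper: the "rope-" prefix is an assumed naming scheme
    meant to avoid clashing with other libraries in the same process.
    """
    # Unlike the built-in hash(), a hashlib digest is stable across
    # interpreter runs and is not affected by hash randomization.
    digest = hashlib.sha256(project_root.encode("utf-8")).hexdigest()
    return f"file:rope-{digest}?mode=memory&cache=shared"
```

Calling `memory_db_uri` twice with the same path returns the same URI, so every thread that connects for that project reaches the same database, while different paths map to different names.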

lieryan · 2023-11-05T15:03:31Z

@tkrabel thank you for creating this PR

Use in-memory database with shared cache

afc622b

tkrabel added 6 commits October 28, 2023 23:13

add URI

395250d

remove unit test bc the db name is '' for every db created

f021221

unit test share memory

a288c22

unit test: ensure different project uses different in-memory database

fd89af7

add CHANGELOG

9f3c7a3

update CHANGELOG

ab9c09d

lieryan reviewed Oct 29, 2023

lieryan and others added 3 commits October 29, 2023 18:44

Fix grammar

ba65f2e

black format

50ac219

Merge branch 'master' into named-inmemory-db

8cb48cc

tkrabel requested a review from lieryan October 29, 2023 10:57

tkrabel added 2 commits November 1, 2023 18:26

address comment

18f9f4a

Merge branch 'master' into named-inmemory-db

0bd414e

lieryan added 3 commits November 6, 2023 01:23

Merge remote-tracking branch 'origin/master' into named-inmemory-db

7c39119

Change to use sha256 hashing

9856c02

hash() is intended for sets and dictionaries, where occasional collisions are expected. Also, in some versions of Python, hash() is affected by hash randomization, and it's not clear whether that is desirable here.

lieryan force-pushed the named-inmemory-db branch from ebc616a to f75467d November 5, 2023 14:59

Merge remote-tracking branch 'origin/master' into named-inmemory-db

340176a

lieryan enabled auto-merge November 5, 2023 15:02

lieryan merged commit 78315b5 into python-rope:master Nov 5, 2023
18 checks passed

lieryan added this to the 1.11.0 milestone Nov 5, 2023
