Make queries fast, filter all flexible attributes #5240
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.
I'm using the
@@ -1019,7 +1019,7 @@ def find_duplicates(self, lib):
         # temporary `Item` object to generate any computed fields.
         tmp_item = library.Item(lib, **info)
         keys = config["import"]["duplicate_keys"]["item"].as_str_seq()
-        dup_query = library.Album.all_fields_query(
+        dup_query = library.Item.match_all_query(
It seemed to me this was supposed to be `Item` instead of `Album`?
`Album` and `Item` don't redefine `all_fields_query`, so calling it is effectively the same with an album or an item object, as it's just using `LibModel`'s function call. It definitely is a bit confusing though, so it's worth fixing!
I don't think it's the same, as it uses `Album._fields` and `Item._fields`, which are different; see:
    @classmethod
    def field_query(
        cls,
        field,
        pattern,
        query_cls: Type[FieldQuery] = MatchQuery,
    ) -> FieldQuery:
        """Get a `FieldQuery` for this model."""
        return query_cls(field, pattern, field in cls._fields)

    @classmethod
    def all_fields_query(
        cls: Type["Model"],
        pats: Mapping,
        query_cls: Type[FieldQuery] = MatchQuery,
    ):
        """Get a query that matches many fields with different patterns.

        `pats` should be a mapping from field names to patterns. The
        resulting query is a conjunction ("and") of per-field queries
        for all of these field/pattern pairs.
        """
        subqueries = [cls.field_query(k, v, query_cls) for k, v in pats.items()]
        return AndQuery(subqueries)
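To illustrate the point being made, here is a minimal, self-contained sketch of why the class the method is called on matters. The toy `MatchQuery`, the `fast` parameter name, and the reduced `_fields` mappings are assumptions made for this illustration, not the real beets classes; only the `field in cls._fields` expression is taken from the snippet quoted above.

```python
from typing import Any


class MatchQuery:
    """Toy stand-in for a field query (hypothetical, not the beets class)."""

    def __init__(self, field: str, pattern: Any, fast: bool = True):
        self.field = field
        self.pattern = pattern
        # Assumption: the third argument marks whether the field is a
        # fixed column of the model, named `fast` here for illustration.
        self.fast = fast


class Model:
    _fields: dict = {}

    @classmethod
    def field_query(cls, field, pattern, query_cls=MatchQuery):
        # Same expression as in the quoted snippet: the result depends on
        # the `_fields` of the class the method is called on.
        return query_cls(field, pattern, field in cls._fields)


class Item(Model):
    # Hypothetical, heavily reduced set of fixed fields.
    _fields = {"title": str, "artist": str, "album": str}


class Album(Model):
    # Hypothetical, heavily reduced set of fixed fields.
    _fields = {"album": str, "albumartist": str}


item_q = Item.field_query("title", "something")
album_q = Album.field_query("title", "something")
print(item_q.fast, album_q.fast)
```

In this sketch the same field name produces a different third argument depending on whether `Item` or `Album` is used, which is the difference described in the comment above.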
I'll be able to review in a week or two, just end of semester push at the moment.
Hey, also just chiming in to say that it will take some time for me to go through the current batch of PRs. I won't be able to keep up the response times I had last week, but I'll slowly work through all of them.
I've left some comments on specific lines and things, but some general comments. I noticed that you changed I think it's quite clever how you merged all of the flexible attributes and optimised the SQL queries! This is a great PR, will really improve how beets works. Btw, with those later PRs to do, I'd open bugs/enhancements for them so we don't forget! Also, those benchmarks are great. We should add them as a poetry command I think, in another PR would probably be best. What have you used to make those?
Hm, I'm not able to see them! Did you submit the review?
...I did not! I have now.
Well noted @Serene-Arc. I think I adjusted every internal plugin that used this functionality. Otherwise, I haven't seen any external plugins based on this interface.
You're right, this completely needs a mention in the changelog!
This one's a bit complicated. 😅 In short,
In detail
1. Prepare each executable for testing
Now one can run
2. Prepare some queries for testing
I added the following contents to a file
3. Use
@snejus I just updated after this PR, and it seems that everything is painfully slow. Even a simple
@arsaboo what kind of requests are taking longer?
Beet update seems like the worst hit; even beet import seems slow. Let me know what additional information you would like.
hmm, interesting. @Serene-Arc , do not release a new version just yet, I will investigate this over the coming days
@arsaboo, can you provide the output of Also, what's your system, Python and SQLite versions?
I added time to the command....even time beet stats
Tracks: 55012
Total time: 27.3 weeks
Approximate total size: 1.2 TiB
Artists: 16399
Albums: 11414
Album artists: 2361
real 0m27.519s
user 0m24.341s
sys 0m3.152s
Here's another output from
$ time beet import -m -I -t ~/shared/music/ --set genre="Filmi" --search-id 0vGMpTlGXYZ
deezer: Deezer API error: no data
/home/arsaboo/shared/music/Jubin Nautiyal/REDACTED (2024) (1 items)
Match (88.2%):
Jubin Nautiyal - REDACTED
≠ album, tracks
Spotify, None, 2024, None, Tips Industries Ltd, None, None
https://open.spotify.com/album/0vGMpTlGXYZ
* Artist: Jubin Nautiyal
...
➜ [A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort, Print tracks, Open files with Picard,
eDit, edit Candidates?
real 1m15.144s
user 1m1.506s
sys 0m10.224s
~$ sqlite3 --version
3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1
$ python3 --version
Python 3.10.12
@arsaboo can you also confirm you're using Ubuntu?
yes, I am on Ubuntu
Shouldn't sqlite be automatically updated?
I am reverting this change, look for an incoming PR
See #5326
Unfortunately Ubuntu does not keep your packages up to date :(. In any case, even with an up-to-date SQLite you will see that these commands take about twice as long to execute - thus the revert, at least for now!
How unfortunate! This PR was very well done. It does highlight that we should have testing in place though for benchmarking purposes, at least where these types of PRs are concerned. And perhaps a wider range of machines on CI for benchmarking, such as specific ubuntu distros and so on, rather than just the latest.
Fixes #4360
This PR enables querying albums by track fields and tracks by album fields, and speeds up querying albums by `path` field. It originally was part of #5240, however we found that the changes related to the flexible attributes caused degradation in performance. So this PR contains the first part of #5240 which joined `items` and `albums` tables in queries.
@arsaboo have you updated to Ubuntu 24?
Agree with you! I was aware that it may be painfully slow on older versions of SQLite, it's just that I wasn't aware that at that point Ubuntu shipped with such an old version of SQLite.
Yes, updated now (after the initial feedback). Does it work better on 24?
The SQLite version on 22 did not yet include the JSON extension, so it was painfully slow. Try checking out the
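For anyone wanting to check their own setup, here is a minimal sketch that reports the SQLite library version Python is linked against and probes whether the JSON functions are usable. Note that this version may differ from what the `sqlite3 --version` command-line tool prints. The helper name `sqlite_json_available` is made up for this example; it is not part of beets.

```python
import sqlite3


def sqlite_json_available() -> bool:
    """Return True if the SQLite library used by Python has JSON functions."""
    conn = sqlite3.connect(":memory:")
    try:
        # Probe with a trivial JSON call; this fails with
        # "no such function" when JSON support is missing.
        conn.execute("""SELECT json_extract('{"a": 1}', '$.a')""").fetchone()
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


print("SQLite library version:", sqlite3.sqlite_version)
print("JSON functions available:", sqlite_json_available())
```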
@snejus can you rebase the
Description
Another and (hopefully) final attempt to improve querying speed.
Fixes #4360
Fixes #3515
and possibly more issues to do with slow queries.
This PR supersedes #4746.
What's been done
The `album` and `item` tables are joined, and corresponding data from `item_attributes` and `album_attributes` is merged and made available for filtering. This makes the following queries possible:
beet list -a path::some/path
beet list play_count:10
beet list -a title:something
beet list artpath:cover
beet list -a art_source:something
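As a rough illustration of the idea (not the PR's actual SQL), the sketch below joins a toy `albums` table to a toy `items` table and to `item_attributes`, so that albums can be filtered by a track field or by a track's flexible attribute. The schema, column names, sample rows, and helper functions are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Toy schema: a hypothetical, heavily reduced stand-in for the real tables.
    CREATE TABLE albums (id INTEGER PRIMARY KEY, album TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER, title TEXT);
    CREATE TABLE item_attributes (entity_id INTEGER, key TEXT, value TEXT);

    INSERT INTO albums VALUES (1, 'First'), (2, 'Second');
    INSERT INTO items VALUES (10, 1, 'something new'), (11, 2, 'other');
    INSERT INTO item_attributes VALUES
        (10, 'play_count', '10'),
        (11, 'play_count', '3');
    """
)


def albums_by_title(pattern: str) -> list:
    """Albums with at least one track whose title contains `pattern`."""
    sql = """
        SELECT DISTINCT albums.album
        FROM albums
        JOIN items ON items.album_id = albums.id
        WHERE items.title LIKE ?
        ORDER BY albums.album
    """
    return [row[0] for row in conn.execute(sql, (f"%{pattern}%",))]


def albums_by_item_attribute(key: str, value: str) -> list:
    """Albums with at least one track carrying the flexible attribute."""
    sql = """
        SELECT DISTINCT albums.album
        FROM albums
        JOIN items ON items.album_id = albums.id
        JOIN item_attributes AS attrs ON attrs.entity_id = items.id
        WHERE attrs.key = ? AND attrs.value = ?
        ORDER BY albums.album
    """
    return [row[0] for row in conn.execute(sql, (key, value))]


print(albums_by_title("something"))
print(albums_by_item_attribute("play_count", "10"))
```

Both helpers filter entirely inside SQLite, which is the general reason a joined query can avoid loading every row into Python before matching.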
Benchmarks
You can see that now querying speed is more or less constant regardless of the query, and the speed is mostly influenced by how many results need to be printed out
Compare this with what we had previously
To Do
`docs/` to describe it.)
`docs/changelog.rst` near the top of the document.)
Later
#5318
#5319