-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Sorted Set Merge Performance for Disjoint Sets #53
Comments
You are correct. Locally I get these results:
Merging sorted sets appears 7-8x slower. The performance between chained and interleaved values is nearly identical. Let's look at why. The code for
So a new SortedSet is constructed by chaining
And that takes:
So iterating When a new
That's kind of surprising to me. But given the design of sets I guess it's not too surprising. The iterated union of "chained" native sets is sorted. What if the native sets are interleaved:
I'm impressed the iteration order is still sorted. The iterated union of "interleaved" native sets is also sorted. That means the cost of creating a
So we must be slower than native set union because
Calling
Before we draw conclusions, let's increase the size of the sets to avoid slowdowns based on one-offs like calling
Now
I would guess that
When I run that the result is:
I think that's as fast as we can get. |
It seems to me that this can be written as the general case of performing an operation with an "already sorted by my key" container as an argument. If all sorted containers advertised that they were sorted via a "sorted" attribute containing their key or None, then the internals of the SortedList can merely check that "sorted" attribute, and proceed without re-sorting, and not knowing too much about whether it's part of a set union. In other words, a minor change to the "update" function in the SortedList, skipping the sort if the hasattr(iterable,"sorted") and iterable.sorted==self.key. (I've always wanted a "sorted" flag to be a first class container citizen... with the sorted() function automatically setting this flag to the key used, and any unknown operations clearing it.) |
The "sorted" attribute is kind of already present given the type. Good catch on checking the key function as well. I could expose the key function as a readonly attribute. Then you could simply check: "isinstance(other, SortedList) and other.key == self.key" There has also been some discussion of adding Sorted as a kind of collections ABC. If you work on a proposal then I would be glad to contribute. Andrew Barnert also had some thoughts: http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html I was also thinking that a |
Couple updates:
|
I ran a simple benchmark to compare the merge speed between
SortedSet
and a regular (Python 2.7)set
. I expected the merge of sorted sets to be faster but it turns out it's dramatically slower. This seems weird to me, especially in the "chain" case where I'd expect the merge to be a simple concatenation as the min and max of each do not overlap.Here is my benchmark and its results:
Which gives:
Could you confirm/infirm my expectation regarding these results ?
Thanks.
The text was updated successfully, but these errors were encountered: