-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add contains(countIn:where:)
to Sequence
#53
base: main
Are you sure you want to change the base?
Conversation
ed6cdf3
to
5a19dd4
Compare
Have you tried lazy filtering? In general,
The I do think you've hit on something that's hard to do on the first try, but I think we can probably work on something that's a little more general which can be composed easily. It would also help to pare down the APIs needed as the pain points in question don't rely specifically on testing if a number is contained in a range. Let's start with the premise here. Suppose I want to test if a potentially or actually infinite sequence of numbers contains at least four even numbers: let sequence = Swift.sequence(first: 1) { $0 + 1 }
let prefix = sequence.lazy.filter { $0.isMultiple(of: 2) }.prefix(4)
let count = prefix.reduce(0) { x, _ in x + 1 }
print(count == 4) This is certainly unwieldy but not impossible to come up with. The only reason the third line is necessary is that we're not using the That leaves us with the second line as something that's possible to improve upon. Is there something there that can be a nice addition for Swift Algorithms? I think maybe. On the other hand, it's really just composing |
@xwu Thanks for taking a look at this! I think I need to improve the documentation clarifying why this is useful. While the implementation of this function isn’t terribly complex, the addition of this function not only makes it easier to read at the call-site, but helps avoid performance problems with My example in this PR’s description is similar to the kinds of problems I see frequently. When trying to see if there are more than n elements that match a given criteria, the natural way to do this is with something like this: let popUpButton: NSPopUpButton
// …
// Enable the pop up button if the user actually has more than one choice.
popUpButton.isEnabled = popUpButton.itemArray.filter({ $0.isEnabled }).count > 1 This not only creates an intermediary data structure, which can be expensive if you have a lot of elements, but will call the predicate on all n items, even if after just a few, there has already been 2 or more enabled menu items. So the next step is to use popUpButton.isEnabled = popUpButton.itemArray.lazy.filter({ $0.isEnabled }).count > 1 This still results in n comparisons. Using popUpButton.isEnabled = popUpButton.itemArray.lazy.filter({ $0.isEnabled }).reduce(0, { x, _ in x + 1 }) > 1 As far as I’m aware, there isn’t a combination of This function counts (from the beginning) the absolute minimum number necessary to definitively return popUpButton.isEnabled = popUpButton.itemArray.contains(moreThan: 1, where: { $0.isEnabled }) A sample project demonstrates this issue by counting the number of times the predicate is called for each of the alternative methods: // Swap out this line for the one below and it will result in an infinite loop for #1, #2, and #3
let sequence = 1 ... 1_000
//let sequence = Swift.sequence(first: 1, next: { $0 + 1 })
// Keep track of how many times `predicate` is called. This is reset back to before each check for count.
var comparisons: Int = 0
let predicate = { (number: Int) -> Bool in
comparisons += 1
// print("predicate: \(number)")
return number.isMultiple(of: 2)
}
var containsAtLeastFour: Bool = false
// 1. `filter.count`
comparisons = 0
containsAtLeastFour = sequence.filter(predicate).count >= 4
print("comparisons: \(comparisons); filter.count: \(containsAtLeastFour)")
// 2. `lazy.filter.count`
comparisons = 0
containsAtLeastFour = sequence.lazy.filter(predicate).count >= 4
print("comparisons: \(comparisons); lazy.filter.count: \(containsAtLeastFour)")
// 3. `lazy.filter.prefix.count`
comparisons = 0
containsAtLeastFour = sequence.lazy.filter(predicate).prefix(4).count >= 4
print("comparisons: \(comparisons); lazy.filter.prefix.count: \(containsAtLeastFour)")
// 4. `lazy.filter.prefix.reduce`
comparisons = 0
containsAtLeastFour = sequence.lazy.filter(predicate).prefix(4).reduce(0, { x, _ in x + 1 }) >= 4
print("comparisons: \(comparisons); lazy.filter.prefix.reduce: \(containsAtLeastFour)")
// 5. `contains(atLeast:where:)`
comparisons = 0
containsAtLeastFour = sequence.contains(atLeast: 4, where: predicate)
print("comparisons: \(comparisons); contains(atLeast:where:): \(containsAtLeastFour)")
// prints
// comparisons: 1000; filter.count: true
// comparisons: 1000; lazy.filter.count: true
// comparisons: 20; lazy.filter.prefix.count: true
// comparisons: 20; lazy.filter.prefix.reduce: true
// comparisons: 8; contains(atLeast:where:): true All of the methods above except for A more succinct way to describe this function is: |
It's not immediately clear to me why the |
I haven't given these APIs much thought yet, but here's my first impression. It's not immediately obvious to me why all these methods need to take a predicate if Something else I noticed about these methods is that none of them preserve the information of the exact count of the sequence, when this is known. The way I see it, the most fundamental missing functionality that these methods cover can be captured by extension Sequence {
/// Returns the count of the sequence as long as it does not exceed `limit`, and `nil` otherwise.
func count(notExceeding limit: Int) -> Int?
} The whole And finally, the method that takes a |
The original desire was to have a function like Today, if you wanted to check if a sequence contains at least one element matching a certain criteria, you’d use: sequence.contains(where: predicate) If you wanted to check if a sequence contains more than one element matching a certain criteria? The most common solution I’ve seen in practice is something like: sequence.filter(predicate).count > 1 But the performance here is less than ideal, filtering all n elements. Programmers with a bit more familiarity with Swift will avoid the intermediary data structure from If that programmer is in a priority-oriented mindset, they may use I didn’t find it obvious as to why using sequence.lazy.filter(predicate).prefix(2).reduce(0, { x, _ in x + 1 }) == 2
This is a really interesting idea. I had thought of having a kind of lazy sequence.count(where: predicate) > 1 But it would be smart enough to do the least amount of computation necessary to return from that
I think you’re right that those extra five functions aren’t necessary, but I do find reading those are a bit nicer than using a range directly, particularly for ranges like I ran some tests to see the behavior of these various approaches and what I found was surprising. I created two alternate implementations for extension Sequence {
@inlinable
public func _alternateImpl_contains<R: RangeExpression>(
countIn rangeExpression: R,
where predicate: (Element) throws -> Bool
) rethrows -> Bool where R.Bound == Int {
let range = rangeExpression.relative(to: R.Bound.zero..<R.Bound.max)
let threshold: R.Bound
if range.upperBound < R.Bound.max {
threshold = range.upperBound
} else {
threshold = range.lowerBound
}
return try self.lazy.filter(predicate).prefix(threshold).count >= threshold
}
}
extension Sequence {
@inlinable
public func _alternateImpl2_contains<R: RangeExpression>(
countIn rangeExpression: R,
where predicate: (Element) throws -> Bool
) rethrows -> Bool where R.Bound == Int {
let range = rangeExpression.relative(to: R.Bound.zero..<R.Bound.max)
let threshold: R.Bound
if range.upperBound < R.Bound.max {
threshold = range.upperBound
} else {
threshold = range.lowerBound
}
return try self.lazy.filter(predicate).prefix(threshold).reduce(0, { x, _ in x + 1 }) >= threshold
}
} let sequence = Array(1 ... 1_000)
//let sequence = 1 ... 1_000
//let sequence = 1...
//let sequence = Swift.sequence(first: 1, next: { $0 + 1 })
// Keep track of how many times `predicate` is called. This is reset back to before each check for count.
var comparisons: Int = 0
let predicate = { (number: Int) -> Bool in
comparisons += 1
// print("predicate: \(number)")
return number.isMultiple(of: 5)
}
var containsAtLeastFour: Bool = false
// 1. `filter.count`
comparisons = 0
containsAtLeastFour = sequence.filter(predicate).count >= 4
print("comparisons: \(comparisons); filter.count: \(containsAtLeastFour)")
// 2. `lazy.filter.count`
comparisons = 0
containsAtLeastFour = sequence.lazy.filter(predicate).count >= 4
print("comparisons: \(comparisons); lazy.filter.count: \(containsAtLeastFour)")
// 3. `lazy.filter.prefix.count`
comparisons = 0
containsAtLeastFour = sequence.lazy.filter(predicate).prefix(4).count >= 4
print("comparisons: \(comparisons); lazy.filter.prefix.count: \(containsAtLeastFour)")
// 4. `lazy.filter.prefix.reduce`
comparisons = 0
containsAtLeastFour = sequence.lazy.filter(predicate).prefix(4).reduce(0, { x, _ in x + 1 }) >= 4
print("comparisons: \(comparisons); lazy.filter.prefix.reduce: \(containsAtLeastFour)")
// 5. `contains(atLeast:where:)`
comparisons = 0
containsAtLeastFour = sequence.contains(countIn: 4..., where: predicate)
print("comparisons: \(comparisons); contains(atLeast:where:): \(containsAtLeastFour)")
// 6. `_alternateImpl_contains(atLeast:where:)`
comparisons = 0
containsAtLeastFour = sequence._alternateImpl_contains(countIn: 4..., where: predicate)
print("comparisons: \(comparisons); _alternateImpl_contains(atLeast:where:): \(containsAtLeastFour)")
// 7. `_alternateImpl2_contains(atLeast:where:)`
comparisons = 0
containsAtLeastFour = sequence._alternateImpl2_contains(countIn: 4..., where: predicate)
print("comparisons: \(comparisons); _alternateImpl2_contains(atLeast:where:): \(containsAtLeastFour)") // prints
// comparisons: 1000; filter.count: true
// comparisons: 1000; lazy.filter.count: true
// comparisons: 50; lazy.filter.prefix.count: true
// comparisons: 50; lazy.filter.prefix.reduce: true
// comparisons: 20; contains(atLeast:where:): true
// comparisons: 1000; _alternateImpl_contains(atLeast:where:): true
// comparisons: 1000; _alternateImpl2_contains(atLeast:where:): true |
cc @xwu
Partly it's because LazyFilterCollection's /// The position of the first element in a non-empty collection.
///
/// In an empty collection, `startIndex == endIndex`.
///
/// - Complexity: O(*n*), where *n* is the ratio between unfiltered and
/// filtered collection counts.
@inlinable // lazy-performance
public var startIndex: Index {
var index = _base.startIndex
while index != _base.endIndex && !_predicate(_base[index]) {
_base.formIndex(after: &index)
}
return index
} We see prefix actually calls @inlinable
public __consuming func prefix(_ maxLength: Int) -> SubSequence {
_precondition(
maxLength >= 0,
"Can't take a prefix of negative length from a collection")
let end = index(startIndex,
offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
return self[startIndex..<end]
} And also In the example looking filtering for even numbers in
|
5a19dd4
to
79b7e65
Compare
77d9b3e
to
a0c29a2
Compare
@kylemacomber Thanks—that's informative. These APIs are really meant to be composable, and I think we need to think hard about what APIs we offer strictly to squeeze out performance gains and how much that can justify the burden of adding, maintaining, and having users learn a new API. I would propose that any APIs added for this purpose should themselves be deemed useful for further composition and, indeed, be more efficient building blocks than what we currently have. It's also worth considering whether the performance gained is a constant factor (as here) or, say, takes something from O(n^2) to O(n). |
Only if you compare it to the |
I thought that the extension Sequence {
/// Returns an iterator into the sequence that is just past the given number
/// of elements that match the given predicate.
///
/// The following example prints the value after the fifth even element of the range.
///
/// let range = -3..<20
/// if var iterator = range.skippingOver(count: 5, where: { $0.isMultiple(of: 2) }) {
/// if let nextValue = iterator.next() {
/// print(nextValue)
/// } else {
/// print("The range ended at its fifth even number.")
/// }
/// } else {
/// print("The range had less than five even numbers.")
/// }
///
/// // Prints: "7"
///
/// - Precondition:
/// - `count >= 0`.
/// - Either the sequence is finite or at least `count` elements in the
/// sequence match `predicate`.
///
/// - Parameters:
/// - count: The number of occurrences of matching elements to find before
/// stopping.
/// - predicate: A closure that takes an element of the sequence as its
/// argument and returns a Boolean value indicating whether the element
/// is a match.
/// - Returns: If the sequence has fewer than `count` elements that match
/// `predicate`, then `nil`. Otherwise, an iterator that starts with the
/// element after the last match. If that last match was also the last
/// element, then the iterator's virtual sequence will be empty.
///
/// - Complexity: O(*n*), where *n* is the length of the sequence.
public func skippingOver(
count: Int,
where predicate: (Element) throws -> Bool
) rethrows -> Iterator? {
var result = makeIterator()
counter: for _ in 0..<count {
while let element = result.next() {
if try predicate(element) {
continue counter
}
}
return nil
}
return result
}
}
extension Sequence where Element: Equatable {
/// Returns an iterator into the sequence that is just past the given number
/// of elements that equal the given element.
///
/// The following example prints the character after the third "a".
///
/// let sample = "Mary had a little lamb"
/// if var iterator = sample.skippingOver(count: 3, of: "a") {
/// if let nextCharacter = iterator.next() {
/// print("The next character is '\(nextCharacter)'.")
/// } else {
/// print("The third occurence of 'a' is at the last position.")
/// }
/// } else {
/// print("The occurrence count of 'a' is less than 3.")
/// }
///
/// // Prints: "The next character is ' '."
///
/// - Precondition:
/// - `count >= 0`.
/// - Either the sequence is finite or at least `count` elements in the
/// sequence equal `element`.
///
/// - Parameters:
/// - count: The number of occurrences of matching elements to find before
/// stopping.
/// - element: An element to search for in the sequence.
/// - Returns: If the sequence has fewer than `count` elements that equal
/// `element`, then `nil`. Otherwise, an iterator that starts with the
/// element after the last match. If that last match was also the last
/// element, then the iterator's virtual sequence will be empty.
///
/// - Complexity: O(*n*), where *n* is the length of the sequence.
@inlinable
public func skippingOver(count: Int, of element: Element) -> Iterator? {
return skippingOver(count: count, where: { $0 == element })
}
} Maybe this could form a building block to a better API. |
a0c29a2
to
d37b94b
Compare
If you're naming one of the auxiliary methods |
d37b94b
to
a1d95d5
Compare
Sorry for not reviewing this sooner! I'm wondering what this looks like if we approach it from the perspective of
The range-based |
a1d95d5
to
65e90ea
Compare
65e90ea
to
4b68aa8
Compare
Description
This addition allows the caller to compare the count of elements matching a predicate on a sequence.
If you need to compare the count of a filtered sequence, using this method can give you a performance boost over filtering the entire collection, then comparing its count:
Detailed Design
A function named
contains(countIn:where:)
added as an extension toSequence
:Five small wrapper functions added to make working with different ranges easier and more readable at the call-site:
These wrapper functions can make a check like:
more readable:
Documentation Plan
The documentation in
Guides/ContainsCountWhere.md
has been updated to reflect the new functions. Inline code documentation also exists for each of the functions.Test Plan
Source Impact
This is only additive, no existing source should be impacted.
Checklist