Merging Two Sorted Sequences, Attempt 3 #236

Open: CTMacUser wants to merge 34 commits into main.


CTMacUser
Contributor

Description

This is a successor to #184, which itself succeeded #43. It should fix #192.

This library adapts the C++ functions merge and inplace_merge.

This library also adapts the C++ functions set_union, set_intersection, set_difference, and set_symmetric_difference. (They share the same internal implementation as merge.)
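Because merge and the four set operations share one implementation, that shared core can be pictured as a single two-cursor pass. This is an illustrative, eager sketch over arrays, not the PR's code: SubsetFlags and mergeSortedSketch are hypothetical names, the three Booleans stand in for the proposal's MergerSubset, and nil stands in for a plain merge that keeps duplicates (the role the proposal assigns to .sum).

```swift
/// Hypothetical stand-in for the proposal's MergerSubset: which categories to keep.
struct SubsetFlags {
    var firstOnly: Bool   // elements found only in the first input
    var secondOnly: Bool  // elements found only in the second input
    var shared: Bool      // elements found in both (emitted once)
}

/// Eagerly merges two arrays sorted by `less`.
/// - `subset == nil`: plain stable merge keeping every element (like std::merge).
/// - otherwise: equivalent pairs are consolidated and each category is filtered
///   (like std::set_union, set_intersection, set_difference, set_symmetric_difference).
func mergeSortedSketch<T>(_ a: [T], _ b: [T], subset: SubsetFlags?,
                          by less: (T, T) -> Bool) -> [T] {
    var result: [T] = []
    result.reserveCapacity(a.count + b.count)
    var i = a.startIndex, j = b.startIndex
    while i < a.endIndex && j < b.endIndex {
        if less(b[j], a[i]) {
            // b[j] sorts strictly first, so it has no partner in `a`.
            if subset?.secondOnly ?? true { result.append(b[j]) }
            j += 1
        } else if subset == nil || less(a[i], b[j]) {
            // A plain merge takes ties from `a` first (stability);
            // in set mode, a[i] sorts strictly first and is unmatched.
            if subset?.firstOnly ?? true { result.append(a[i]) }
            i += 1
        } else {
            // Equivalent elements form one shared pair.
            if subset?.shared ?? true { result.append(a[i]) }
            i += 1
            j += 1
        }
    }
    // Whatever remains is exclusive to the side that still has elements.
    if subset?.firstOnly ?? true { result.append(contentsOf: a[i...]) }
    if subset?.secondOnly ?? true { result.append(contentsOf: b[j...]) }
    return result
}
```

For example, with [1, 3, 5] and [2, 3, 4], keeping only elements exclusive to the first input (a set difference) yields [1, 5], while the plain merge yields [1, 2, 3, 3, 4, 5].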

The main changes are:

  • The function and parameter names include "sorted," to remind programmers that the sequence arguments need to be sorted.
  • The eager-merging functions were changed from free functions to initializers for RangeReplaceableCollection. Lazy merging still uses a free function, which was renamed.
  • Functions were added to merge adjacent sorted partitions of a MutableCollection. One version uses extra space to work faster; the other works in place to save space.
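The extra-space variant can be pictured as a buffered merge: copy both runs aside, then write the merged result back in order. The sketch below is only an illustration under that assumption; mergePartitionsSketch(across:by:) is a hypothetical name, and the PR may buffer differently (for example, by copying only one run).

```swift
extension MutableCollection {
    /// Hypothetical sketch: merges the sorted runs [startIndex, pivot) and
    /// [pivot, endIndex) by copying both into scratch arrays (O(n) extra space)
    /// and writing the merged sequence back in a single linear pass.
    mutating func mergePartitionsSketch(across pivot: Index,
                                        by less: (Element, Element) -> Bool) {
        let left = Array(self[..<pivot])
        let right = Array(self[pivot...])
        var i = 0, j = 0
        var k = startIndex
        while k != endIndex {
            // Take from the right run only when it is strictly smaller, so
            // equivalent elements keep their original order (a stable merge).
            if j < right.count && (i == left.count || less(right[j], left[i])) {
                self[k] = right[j]
                j += 1
            } else {
                self[k] = left[i]
                i += 1
            }
            formIndex(after: &k)
        }
    }
}
```

Taking from the right run only on a strict less-than comparison is what keeps equal elements from the left run ahead of those from the right.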

Detailed Design

The library consists of:

  • A type enumerating desired set operations.
  • An iterator with the core merging/set operation code.
  • A lazy sequence that vends the merging iterator above.
  • Functions that return the merging sequence above.
  • Initializers that merge (and possibly set-operate) the sequence arguments as the receiver's data.
  • Functions that merge consecutive sorted partitions, with speed- or space-optimizations.
extension MutableCollection {
    /// Given a partition point,
    /// where each side is sorted according to the given predicate,
    /// rearrange the elements until a single sorted run is formed.
    public mutating func mergeSortedPartitions(across pivot: Index, sortedBy areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows
}

extension MutableCollection where Self.Element : Comparable {
    /// Given a partition point, where each side is sorted,
    /// rearrange the elements until a single sorted run is formed.
    @inlinable public mutating func mergeSortedPartitions(across pivot: Index)
}

extension MutableCollection where Self : BidirectionalCollection {
    /// Given a partition point,
    /// where each side is sorted according to the given predicate,
    /// rearrange the elements until a single sorted run is formed,
    /// using minimal scratch memory.
    public mutating func mergeSortedPartitionsInPlace(across pivot: Index, sortedBy areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows
}

extension MutableCollection where Self : BidirectionalCollection, Self.Element : Comparable {
    /// Given a partition point, where each side is sorted,
    /// rearrange the elements until a single sorted run is formed,
    /// using minimal scratch memory.
    @inlinable public mutating func mergeSortedPartitionsInPlace(across pivot: Index)
}

/// Description of which elements of a merger will be retained.
public enum MergerSubset : UInt, CaseIterable {
    case none, firstWithoutSecond, secondWithoutFirst, symmetricDifference,
         intersection, first, second, union,
         sum
}

extension MergerSubset {
    /// Whether the elements exclusive to the first source are emitted.
    @inlinable public var emitsExclusivesToFirst: Bool { get }
    /// Whether the elements exclusive to the second source are emitted.
    @inlinable public var emitsExclusivesToSecond: Bool { get }
    /// Whether the elements shared by both sources are emitted.
    @inlinable public var emitsSharedElements: Bool { get }

    /// Create a filter specifying a full merge (duplicating the shared elements).
    @inlinable public init()
    /// Create a filter specifying which categories of elements are included in
    /// the merger, with shared elements consolidated.
    public init(keepExclusivesToFirst: Bool, keepExclusivesToSecond: Bool, keepSharedElements: Bool)
}

extension RangeReplaceableCollection {
    /// Given two sequences that are both sorted according to the given predicate,
    /// treat them as sets, and create the sorted result of the given set
    /// operation.
    public init<T, U>(mergeSorted first: T, and second: U, retaining filter: MergerSubset = .sum, sortedBy areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows where T : Sequence, U : Sequence, Self.Element == T.Element, T.Element == U.Element
}

extension RangeReplaceableCollection where Self.Element : Comparable {
    /// Given two sorted sequences, treat them as sets, and create the sorted
    /// result of the given set operation.
    @inlinable public init<T, U>(mergeSorted first: T, and second: U, retaining filter: MergerSubset = .sum) where T : Sequence, U : Sequence, Self.Element == T.Element, T.Element == U.Element
}

/// Given two sequences that are both sorted according to the given predicate
/// and treated as sets, apply the given set operation, returning the result as
/// a lazy sequence also sorted by the same predicate.
public func mergeSorted<T, U>(_ first: T, _ second: U, retaining filter: MergerSubset = .sum, sortedBy areInIncreasingOrder: @escaping (T.Element, U.Element) -> Bool) -> MergeSortedSequence<LazySequence<T>, LazySequence<U>> where T : Sequence, U : Sequence, T.Element == U.Element

/// Given two sorted sequences treated as sets, apply the given set operation,
/// returning the result as a sorted lazy sequence.
@inlinable public func mergeSorted<T, U>(_ first: T, _ second: U, retaining filter: MergerSubset = .sum) -> MergeSortedSequence<LazySequence<T>, LazySequence<U>> where T : Sequence, U : Sequence, T.Element : Comparable, T.Element == U.Element

/// A sequence that lazily vends the sorted result of a set operation on
/// two sorted sequences, treated as sets and spliced together, using a single
/// predicate as the sorting criterion for all three sequences involved.
public struct MergeSortedSequence<First, Second>
where First : Sequence, Second : Sequence, First.Element == Second.Element
{ /*...*/ }

extension MergeSortedSequence : Sequence { /*...*/ }

extension MergeSortedSequence : LazySequenceProtocol {}

/// An iterator that applies a set operation on two virtual sequences,
/// both treated as sets sorted according to a predicate, splicing them
/// together to vend a virtual sequence that is also sorted.
public struct MergeSortedIterator<First, Second>
where First : IteratorProtocol, Second : IteratorProtocol, First.Element == Second.Element
{ /*...*/ }

extension MergeSortedIterator : IteratorProtocol { /*...*/ }
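To show how the iterator and the lazy sequence might fit together, here is a sketch with hypothetical names (SubsetSketch, MergeIteratorSketch, MergeSequenceSketch). It assumes the MergerSubset raw values form a bit mask (1 for elements exclusive to the first source, 2 for the second, 4 for shared elements), with sum treated as a full merge. The case order above is consistent with that reading, but the listing does not state it.

```swift
/// Hypothetical stand-in for MergerSubset. ASSUMPTION: raw values are a bit mask
/// (1 = exclusive to first, 2 = exclusive to second, 4 = shared); `sum` is a full merge.
enum SubsetSketch: UInt, CaseIterable {
    case none, firstWithoutSecond, secondWithoutFirst, symmetricDifference,
         intersection, first, second, union, sum

    var emitsExclusivesToFirst: Bool { self == .sum || rawValue & 1 != 0 }
    var emitsExclusivesToSecond: Bool { self == .sum || rawValue & 2 != 0 }
    var emitsSharedElements: Bool { self == .sum || rawValue & 4 != 0 }
}

/// Lazily merges two sorted iterators, keeping only the categories in `subset`.
struct MergeIteratorSketch<First: IteratorProtocol, Second: IteratorProtocol>: IteratorProtocol
where First.Element == Second.Element {
    private var first: First
    private var second: Second
    private var a: First.Element?   // one-element lookahead from `first`
    private var b: Second.Element?  // one-element lookahead from `second`
    private let subset: SubsetSketch
    private let less: (First.Element, First.Element) -> Bool

    init(_ first: First, _ second: Second, subset: SubsetSketch,
         by less: @escaping (First.Element, First.Element) -> Bool) {
        self.first = first
        self.second = second
        self.subset = subset
        self.less = less
        a = self.first.next()
        b = self.second.next()
    }

    mutating func next() -> First.Element? {
        while true {
            switch (a, b) {
            case (.none, .none):
                return nil
            case (let x?, .none):
                a = first.next()
                if subset.emitsExclusivesToFirst { return x }
            case (.none, let y?):
                b = second.next()
                if subset.emitsExclusivesToSecond { return y }
            case (let x?, let y?):
                if less(y, x) {                           // y has no partner in `first`
                    b = second.next()
                    if subset.emitsExclusivesToSecond { return y }
                } else if subset == .sum || less(x, y) {  // full merge, or x unmatched
                    a = first.next()
                    if subset.emitsExclusivesToFirst { return x }
                } else {                                  // equivalent pair, emitted once
                    a = first.next()
                    b = second.next()
                    if subset.emitsSharedElements { return x }
                }
            }
        }
    }
}

/// Sequence that vends MergeIteratorSketch.
struct MergeSequenceSketch<First: Sequence, Second: Sequence>: Sequence
where First.Element == Second.Element {
    let first: First
    let second: Second
    let subset: SubsetSketch
    let less: (First.Element, First.Element) -> Bool

    func makeIterator() -> MergeIteratorSketch<First.Iterator, Second.Iterator> {
        MergeIteratorSketch(first.makeIterator(), second.makeIterator(),
                            subset: subset, by: less)
    }
}
```

Elements are pulled from the sources only as next() needs them (plus one element of lookahead per side), which is what lets the result behave as a lazy sequence.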

Documentation Plan

A simple breakdown list and a guide have been provided.

Test Plan

A test file has been provided.

Source Impact

The changes should be additive.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

CTMacUser added 30 commits July 11, 2024 13:12
Add initializers to RangeReplaceableCollection that merge two sorted sequence arguments. Add a function returning a lazy sequence that can merge two other sorted sequences.

Add initializers to RangeReplaceableCollection that treat their two sorted sequence arguments as sets, then generate the result of the given set operation. These are included because they use the same base splicing code as straight mergers.

Add support types for mergers and set operations. One describes the specific set operation requested, including a straight merger. The others are a sequence that lazily generates a set operation or merger, and its corresponding iterator.

TODO: Add tests and a guide.
Add methods to MutableCollection that merge two sorted partitions within the same collection.

TODO: Add tests and a guide.
The set-operation initializers made everything more convoluted versus using a subset parameter in the base initializer.
Change the code for the primary eager merger function so it can reuse the base sequence and iterator more easily. This required the merger sequence type to accept throwing predicates. The change is internal, and users cannot exploit it.
Since the filter type's values model future changes, change the tense of its description from past to future. Correct some items' access levels, hiding internal details.
Add methods to MutableCollection that merge two sorted partitions within the same collection, but not requiring extra scratch space.
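One well-known way to merge two adjacent runs without a scratch buffer is the buffer-less fallback used by C++ inplace_merge implementations: halve the longer run, binary-search the matching cut in the other run, rotate the middle block into place, and recurse on both halves. The sketch below follows that approach; it is not taken from the PR, mergePartitionsInPlaceSketch(across:by:) is a hypothetical name, and the rotation uses three standard-library reverse() calls. It trades memory for extra work (roughly n log n operations rather than n).

```swift
extension MutableCollection where Self: BidirectionalCollection {
    /// Hypothetical sketch (not the PR's code): stable merge of the sorted runs
    /// [startIndex, pivot) and [pivot, endIndex) with no scratch buffer
    /// (only the recursion stack).
    mutating func mergePartitionsInPlaceSketch(across pivot: Index,
                                               by less: (Element, Element) -> Bool) {
        mergeRuns(startIndex, pivot, endIndex, by: less)
    }

    private mutating func mergeRuns(_ lo: Index, _ mid: Index, _ hi: Index,
                                    by less: (Element, Element) -> Bool) {
        let len1 = distance(from: lo, to: mid)
        let len2 = distance(from: mid, to: hi)
        guard len1 > 0, len2 > 0 else { return }
        if len1 + len2 == 2 {
            // One element per run: swap only if strictly out of order (stable).
            if less(self[mid], self[lo]) { swapAt(lo, mid) }
            return
        }
        let cut1: Index, cut2: Index
        if len1 > len2 {
            // Halve the left run; find the first right element not less than its middle.
            cut1 = index(lo, offsetBy: len1 / 2)
            let key = self[cut1]
            cut2 = partitionIndex(in: mid..<hi) { !less($0, key) }
        } else {
            // Halve the right run; find the first left element greater than its middle.
            cut2 = index(mid, offsetBy: len2 / 2)
            let key = self[cut2]
            cut1 = partitionIndex(in: lo..<mid) { less(key, $0) }
        }
        // Rotate [cut1, mid) past [mid, cut2) using three in-place reversals.
        self[cut1..<mid].reverse()
        self[mid..<cut2].reverse()
        self[cut1..<cut2].reverse()
        let newMid = index(cut1, offsetBy: distance(from: mid, to: cut2))
        // Everything in [lo, newMid) now sorts no later than [newMid, hi).
        mergeRuns(lo, cut1, newMid, by: less)
        mergeRuns(newMid, cut2, hi, by: less)
    }

    /// Binary search: first index in `range` whose element satisfies the predicate,
    /// which must be false for a prefix of the range and true for the rest.
    private func partitionIndex(in range: Range<Index>,
                                where isInSecondPart: (Element) -> Bool) -> Index {
        var low = range.lowerBound
        var remaining = distance(from: low, to: range.upperBound)
        while remaining > 0 {
            let half = remaining / 2
            let middle = index(low, offsetBy: half)
            if isInSecondPart(self[middle]) {
                remaining = half
            } else {
                low = index(after: middle)
                remaining -= half + 1
            }
        }
        return low
    }
}
```

Using a lower bound when cutting the right run and an upper bound when cutting the left run keeps equivalent elements from the left run ahead of those from the right, so the merge stays stable.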
Remove references to deleted initializers. Add references to the in-place partition-merging functions.
Also include a test for the code that appears in the new Guide.
Change the set-operation merger sequence to make its lazy status conditional.
Correct a copy-and-paste error, where some original functions' decreased capabilities were not updated. (Those capabilities stayed in the functions' copies.)

Mark all the functions above as not eagerly generating their results. These functions' results conform to being a lazy sequence only when both source sequences are also lazy.
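Conditional laziness of this kind can be expressed with a conditional conformance. The sketch below uses a hypothetical MergedSketch type (a plain merge only, for brevity) that conforms to LazySequenceProtocol only when both sources already do, so chained calls such as map stay lazy exactly when the inputs were lazy.

```swift
/// Hypothetical minimal merged sequence (plain stable merge, no filtering).
struct MergedSketch<First: Sequence, Second: Sequence>: Sequence
where First.Element == Second.Element, First.Element: Comparable {
    let first: First
    let second: Second

    func makeIterator() -> AnyIterator<First.Element> {
        var a = first.makeIterator(), b = second.makeIterator()
        var x = a.next(), y = b.next()
        return AnyIterator {
            switch (x, y) {
            case (let xv?, let yv?) where yv < xv:
                // Take from `second` only when strictly smaller (stable merge).
                y = b.next()
                return yv
            case (let xv?, _):
                x = a.next()
                return xv
            case (.none, let yv?):
                y = b.next()
                return yv
            case (.none, .none):
                return nil
            }
        }
    }
}

// Lazy only when both inputs are lazy; otherwise `map`, `filter`, etc.
// resolve to the eager Sequence methods that return arrays.
extension MergedSketch: LazySequenceProtocol
where First: LazySequenceProtocol, Second: LazySequenceProtocol {}
```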
Change the general list of functions and the descriptions of the merging functions to reflect the split between the merge-only and subsetting variants.
Change the detailed design to accommodate the split between the full and subsetting merger functions.
Remove the quick partition-merge functions.

Move the lower-memory partition-merge functions to their own file. Rename the functions, lessening the amount of "sort" in the labeling. Replace the general throw markers with exact-error markers.

Make the corresponding changes to the tests.
Remove initializer-based merging. Replace and rename all the free functions and their support sequence/iterator types. Adjust the documentation and testing to match.

Remove filter-less merging. Rename all the (lazy) free functions and their support sequence and iterator types. Replace the merging initializers with (eager) free functions.
Remove the parameter specifying the return type of eager mergers; they now always return the standard Array type. As this gives the eager merging functions the same names as the lazy versions, the lazy versions have been renamed to avoid ambiguity.
The sequence that merges other sequences needs to specify the ordering predicate's error type, but public uses of that type never need the error type. Rename the merging sequence to a non-public name, and add an alias under the old public name that refers to the non-public type with the error type removed.
@ankushkushwaha

Can you please update the PR with the base branch?

Development

Successfully merging this pull request may close these issues.

The merge algorithm for combining sorted collections is missing.
3 participants