Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(search): Multishard cutoffs #1924

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

dranikpg
Copy link
Contributor

No description provided.

@dranikpg
Copy link
Contributor Author

My condolences to the one reviewing it 🙂

@kostasrim
Copy link
Contributor

kostasrim commented Sep 26, 2023

My condolences to the one reviewing it 🙂

I actually wrote Vlad - Kostas 1-1 only to not submit the comment because it would have been noise. Anyway, when you wrote this I could not resist not to write this msg 😛

I would argue mine was simpler because it was mostly boilerplate (I haven't looked yours TBH so it could be the same case) so it's not really 1-1 and probably you are getting the 👑 👨‍🍳

@kostasrim kostasrim self-requested a review September 26, 2023 07:09
@dranikpg dranikpg changed the title WIP feat(search): Multishard cutoffs feat(search): Multishard cutoffs Oct 29, 2023
@dranikpg
Copy link
Contributor Author

dranikpg commented Oct 29, 2023

@kostasrim @chakaz

Please take a quick look at this. There is still some polishment left but the general "structure" is ready.

I'd wish for this to be merged before we announce search because it's important for fast multishard queries

PS: Yes there is lots of code and it's complicated 🙂 I'd start looking from MultiShardSearch::Run because it's the main entry point w everything else being tailored to fit it's needs

Copy link
Collaborator

@chakaz chakaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through this PR, it looks good.
What is missing, except for that TODO around how many items to retrieve?
Also see some minor questions/nits

src/server/search/doc_index.h Outdated Show resolved Hide resolved
src/server/search/doc_index.h Show resolved Hide resolved
Comment on lines +64 to +70
size_t avg_shard_min = hits * intlog2(hits) / (12 + shard_set->size() / 10);
avg_shard_min -= min(avg_shard_min, min(hits, size_t(5)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the rationale here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimentally from #1892 😄 Those formulas might not be final

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd link to that, at least until you change it :)

// If all shards have at least avg min, keep the bare minimum needed to cover the request
size_t limit = requested / shards + 1;

// Aggregations like SORTBY and KNN reorder the result and thus introduce some variance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have to fetch all matching for SORTBY?

Copy link
Contributor Author

@dranikpg dranikpg Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No/Yes 🤓 We have to fetch all, but not serialize all. That is what the doc reference is for. If we find out when building the reply that some entries should have been included but are not serialized, we'll refill (see BuildSortedOrder())

dranikpg and others added 16 commits June 22, 2024 10:05
Signed-off-by: Vladislav Oleshko <[email protected]>
* combine replication tests and reg tests in one flow
* allow replication tests to run on arm
…ragonflydb#3173)

* fix(cluster): Support `FLUSHALL` while slot migration is in progress

Fixes dragonflydb#3132

Also do a small refactor to move cancellation logic into
`RestoreStreamer`.
…db#3171)

* fix: fix RegisterOnChange methods for journal and db_slice. Call db_slice and journal callbacks atomically. Made a hack to avoid deadlock during SAVE
…pes (dragonflydb#3181)

* fix(generic_family): fix RenameGeneric command for non-string data types

fixes dragonflydb#3107, fixes dragonflydb#3113, fixes dragonflydb#307

Signed-off-by: Stepan Bagritsevich <[email protected]>

* refactor: address comments

Signed-off-by: Stepan Bagritsevich <[email protected]>

* refactor: address comments 2

Signed-off-by: Stepan Bagritsevich <[email protected]>

---------

Signed-off-by: Stepan Bagritsevich <[email protected]>
…rt` (dragonflydb#3193)

That was a misleading name, as the logic was the exact opposite (oops 🤦)

This PR introduces a new name for the same flag: break_replication_on_master_restart

We're keeping the previous flag for now, to make transition easier. We'll remove it in a later Dragonfly version (>= 1.22)

Fixes dragonflydb#3192
kostasrim and others added 6 commits June 23, 2024 13:01
* add support for multiple passwords
* add support for deleting passwords
* add support for resetpass
* add tests
* always prefix passwords with hashtag when printed
* chore(tiering): More advanced tiering tests

* fix: fixes
* add replica-priority flag
* add it on info replication command
* add test
Signed-off-by: Vladislav Oleshko <[email protected]>
Signed-off-by: Vladislav Oleshko <[email protected]>
@dranikpg dranikpg force-pushed the multishard-search branch from 52856f4 to d4ee35d Compare June 23, 2024 10:01
Signed-off-by: Vladislav Oleshko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants