Ensure consistent filter query in KNNQueryBuilder across multiple shards #2359

Merged
merged 9 commits into from
Jan 3, 2025

Conversation

buddharajusahil
Contributor

@buddharajusahil buddharajusahil commented Dec 30, 2024

Description

This change resolves issue #2339 by fixing flawed logic in the rewrite function of the KNNQueryBuilder class, introduced in PR #1874. Previously, the filter on the KNNQueryBuilder object was rewritten in place when the query reached the first shard. Because the same instance is shared across all shards, the other shards then used the filter as rewritten for the first shard.

For example, suppose shard 0 has no documents and shard 1 has one document. If the query lands on shard 0 first, the filter is rewritten to match none, because shard 0 has no documents. However, because we were changing the filter inside the KNN query object rather than creating a new rewritten object for shard 0, the filter has already been rewritten to match none by the time the query lands on shard 1. So even though the document is on shard 1, we return no hits.

With this change, the rewrite method returns a new KNNQueryBuilder carrying the rewritten filter for each shard.
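
To illustrate the difference, here is a minimal, self-contained sketch of the two rewrite styles. It is not the plugin code: `Filter`, `RangeFilter`, `MutatingKnnBuilder`, and `ImmutableKnnBuilder` are hypothetical stand-ins, and a shard is modeled as a plain list of integer document values. It only assumes that a filter rewrites to "match none" when a shard has no matching documents, as described above.

```java
import java.util.List;

public class RewriteSketch {

    /** Hypothetical stand-in for a filter query; not an OpenSearch class. */
    interface Filter {
        Filter rewrite(List<Integer> shardDocs); // rewrite against one shard's contents
        boolean matches(int doc);
    }

    /** A filter that matches nothing; rewriting it yields itself. */
    static final Filter MATCH_NONE = new Filter() {
        public Filter rewrite(List<Integer> shardDocs) { return this; }
        public boolean matches(int doc) { return false; }
    };

    /** Range filter that rewrites to MATCH_NONE when the shard has no matching doc. */
    static final class RangeFilter implements Filter {
        private final int min;
        private final int max;
        RangeFilter(int min, int max) { this.min = min; this.max = max; }
        public boolean matches(int doc) { return doc >= min && doc <= max; }
        public Filter rewrite(List<Integer> shardDocs) {
            for (int doc : shardDocs) {
                if (matches(doc)) return this;
            }
            return MATCH_NONE;
        }
    }

    /** Old behavior (sketch): rewrite overwrites the filter on the shared instance. */
    static final class MutatingKnnBuilder {
        Filter filter;
        MutatingKnnBuilder(Filter filter) { this.filter = filter; }
        MutatingKnnBuilder rewrite(List<Integer> shardDocs) {
            filter = filter.rewrite(shardDocs); // leaks shard-specific state
            return this;
        }
    }

    /** New behavior (sketch): rewrite returns a new builder and leaves the original alone. */
    static final class ImmutableKnnBuilder {
        final Filter filter;
        ImmutableKnnBuilder(Filter filter) { this.filter = filter; }
        ImmutableKnnBuilder rewrite(List<Integer> shardDocs) {
            Filter rewritten = filter.rewrite(shardDocs);
            return rewritten == filter ? this : new ImmutableKnnBuilder(rewritten);
        }
    }

    /** Runs one shared mutating builder over all shards and counts hits. */
    static int hitsWithMutatingBuilder(List<List<Integer>> shards, Filter filter) {
        MutatingKnnBuilder shared = new MutatingKnnBuilder(filter);
        int hits = 0;
        for (List<Integer> shard : shards) {
            Filter perShard = shared.rewrite(shard).filter;
            for (int doc : shard) {
                if (perShard.matches(doc)) hits++;
            }
        }
        return hits;
    }

    /** Runs one shared immutable builder over all shards and counts hits. */
    static int hitsWithImmutableBuilder(List<List<Integer>> shards, Filter filter) {
        ImmutableKnnBuilder shared = new ImmutableKnnBuilder(filter);
        int hits = 0;
        for (List<Integer> shard : shards) {
            Filter perShard = shared.rewrite(shard).filter;
            for (int doc : shard) {
                if (perShard.matches(doc)) hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shard 0 is empty, shard 1 holds one document that the filter should match.
        List<List<Integer>> shards = List.of(List.of(), List.of(5));
        System.out.println(hitsWithMutatingBuilder(shards, new RangeFilter(0, 10)));  // prints 0
        System.out.println(hitsWithImmutableBuilder(shards, new RangeFilter(0, 10))); // prints 1
    }
}
```

In the mutating variant the "match none" filter produced for the empty shard is still in place when the second shard is searched, so the document there is missed. In the second variant each shard gets its own rewritten builder and the shared one keeps the original filter.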

Related Issues

Resolves #2339

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Sahil Buddharaju added 3 commits December 30, 2024 15:59
Signed-off-by: Sahil Buddharaju <[email protected]>
Signed-off-by: Sahil Buddharaju <[email protected]>
Signed-off-by: Sahil Buddharaju <[email protected]>
@navneet1v
Collaborator

Previously, the filter on the KNNQueryBuilder object was rewritten in place when the query reached the first shard. Because the same instance is shared across all shards, the other shards then used the filter as rewritten for the first shard.

Hi @buddharajusahil, thanks for creating the PR. Can you please add some details on why the above is impacting the queries?

src/test/java/org/opensearch/knn/index/FaissIT.java (outdated, resolved)
CHANGELOG.md (outdated, resolved)
@jmazanec15
Member

Thanks for the fix @buddharajusahil - this seems consistent with how the bool query does it: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/BoolQueryBuilder.java#L367-L401.

For the test cases, did you confirm that they fail before the change is introduced and pass after it, just to validate the fix?

@buddharajusahil
Contributor Author

@jmazanec15 Hi Jack, I decided to add new ITs, as Martin's were more about checking whether unknown filters were being properly written. The new ITs fail with the old rewrite and succeed with the change.

@shatejas shatejas changed the title Filterbug Ensure consistent filter queries in KNNQueryBuilder across multiple shards. Jan 1, 2025
Collaborator

@shatejas shatejas left a comment

Should we make the fields in KNNQueryBuilder final?

@shatejas shatejas changed the title Ensure consistent filter queries in KNNQueryBuilder across multiple shards. Ensure consistent filter query in KNNQueryBuilder across multiple shards. Jan 1, 2025
@shatejas shatejas changed the title Ensure consistent filter query in KNNQueryBuilder across multiple shards. Ensure consistent filter query in KNNQueryBuilder across multiple shards Jan 1, 2025
@navneet1v
Collaborator

For example, suppose shard 0 has no documents and shard 1 has one document. If the query lands on shard 0 first, the filter is rewritten to match none, because shard 0 has no documents. However, because we were changing the filter inside the KNN query object rather than creating a new rewritten object for shard 0, the filter has already been rewritten to match none by the time the query lands on shard 1. So even though the document is on shard 1, we return no hits.

@buddharajusahil This was a great find. Thanks for the deep-dive.

@navneet1v
Collaborator

Should we make the fields in KNNQueryBuilder final?

Not sure how making things final will help here. Final helps only when references are changing. Do you see any other reason for making things final?

Collaborator

@navneet1v navneet1v left a comment

@buddharajusahil
Code looks good to me. Please address the comments added by @shatejas; I have no further comments apart from his.

Signed-off-by: Sahil Buddharaju <[email protected]>
@shatejas
Collaborator

shatejas commented Jan 3, 2025

Should we make the fields in KNNQueryBuilder final?

Not sure how making things final will help here. Final helps only when references are changing. Do you see any other reason for making things final?

@navneet1v here the filter was getting reassigned on one shard, which affected another shard. Making all fields final makes sure that KNNQueryBuilder becomes immutable (sort of), and any change that reverts a final can be identified.
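
A rough sketch of what that could look like, assuming a simplified query class: `KnnQuery` and `withFilter` are hypothetical names, and the filter is reduced to a string. With `final` fields, an in-place reassignment such as `this.filter = ...` inside a rewrite no longer compiles, so a changed filter has to go into a new instance. The "sort of" still applies: `final` only fixes the reference, and an object it points to can remain mutable.

```java
public class FinalFieldsSketch {

    /** Hypothetical, simplified query holder; not the real KNNQueryBuilder. */
    static final class KnnQuery {
        private final String fieldName;
        private final int k;
        private final String filter; // final: cannot be reassigned after construction

        KnnQuery(String fieldName, int k, String filter) {
            this.fieldName = fieldName;
            this.k = k;
            this.filter = filter;
        }

        /** Returns a copy with a different filter; the receiver is left unchanged. */
        KnnQuery withFilter(String newFilter) {
            // Writing "this.filter = newFilter;" here would be a compile error.
            return new KnnQuery(fieldName, k, newFilter);
        }

        String filter() {
            return filter;
        }
    }

    public static void main(String[] args) {
        KnnQuery original = new KnnQuery("vector", 10, "range");
        KnnQuery forShard0 = original.withFilter("match_none");
        System.out.println(original.filter());  // prints range
        System.out.println(forShard0.filter()); // prints match_none
    }
}
```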

shatejas
shatejas previously approved these changes Jan 3, 2025
Collaborator

@shatejas shatejas left a comment

Looks good overall

Signed-off-by: Sahil Buddharaju <[email protected]>
Signed-off-by: Sahil Buddharaju <[email protected]>
@navneet1v navneet1v merged commit c969f1d into opensearch-project:main Jan 3, 2025
31 checks passed
@navneet1v navneet1v added the Bug Fixes, v2.19.0, and backport 2.x labels Jan 3, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 3, 2025
…rds (#2359)

* Changed filter logic

Signed-off-by: Sahil Buddharaju <[email protected]>

* spotless

Signed-off-by: Sahil Buddharaju <[email protected]>

* Changelog

Signed-off-by: Sahil Buddharaju <[email protected]>

* Added unit tests to test multi shard filters

Signed-off-by: Sahil Buddharaju <[email protected]>

* Changed unit tests and used builder instead of constructor

Signed-off-by: Sahil Buddharaju <[email protected]>

* Spotless apply

Signed-off-by: Sahil Buddharaju <[email protected]>

* slight unit test adjustment

Signed-off-by: Sahil Buddharaju <[email protected]>

* changelog

Signed-off-by: Sahil Buddharaju <[email protected]>

---------

Signed-off-by: Sahil Buddharaju <[email protected]>
Signed-off-by: sahil <[email protected]>
Co-authored-by: Sahil Buddharaju <[email protected]>
(cherry picked from commit c969f1d)
Gankris96 pushed a commit to Gankris96/k-NN that referenced this pull request Jan 8, 2025
…rds (opensearch-project#2359)
owenhalpert pushed a commit to owenhalpert/k-NN that referenced this pull request Jan 9, 2025
…rds (opensearch-project#2359)
Labels
backport 2.x, Bug Fixes, v2.19.0
Projects
Status: 2.19.0
Development

Successfully merging this pull request may close these issues.

[BUG] OpenSearch 2.17 K-NN efficient filtering with a Date Range Filter No Results
5 participants