[FEATURE] Providing better experience for doing Exact Search without Script Score Query #1079

Open
navneet1v opened this issue Aug 31, 2023 · 10 comments
Labels
enhancement Features Introduces a new unit of functionality that satisfies a requirement

Comments

@navneet1v
Collaborator

navneet1v commented Aug 31, 2023

Is your feature request related to a problem?
Currently, to do exact search, a user needs to use a script score query. Ref: https://opensearch.org/docs/latest/search-plugins/knn/knn-score-script/ This is not intuitive and adds an extra hop, which is script compilation. Scripts are also notorious for posing security concerns when run in OpenSearch in multi-tenant environments.
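
For reference, a minimal sketch of what the current approach looks like from a client, written as a Python helper that builds the request body. The body shape follows the k-NN score script documentation linked above; the helper name script_score_exact_query and the default space type are illustrative choices, and the field name and vector are reused from the example later in this thread.

```python
import json


def script_score_exact_query(field, query_vector, space_type="l2", size=2):
    """Build a script_score request body that runs exact k-NN via the knn_score script."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # The inner query selects the documents to score; match_all scores everything.
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": query_vector,
                        "space_type": space_type,
                    },
                },
            }
        },
    }


body = script_score_exact_query("my_vector2", [2, 3, 5, 6])
print(json.dumps(body, indent=2))
```

The user has to know about the script source, the script language, and the params block just to express "score every document against this vector", which is the usability gap described above.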

What solution would you like?
The solution is to provide the exact search feature in the k-NN query clause itself. Because k-NN vectors are stored as doc values in the segment, the query execution code can iterate over the doc values of each segment to do the exact search.
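
A minimal sketch of the idea, in Python rather than the plugin's Java and purely for illustration. The list of (doc_id, vector) pairs is a stand-in for iterating the doc values of one segment, L2 distance is an assumed space type, and the function names are hypothetical.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(segment_vectors, query, k):
    """Score every vector in the segment and keep the k closest.

    segment_vectors: iterable of (doc_id, vector) pairs, standing in for
    the per-segment doc values iterator.
    Returns a list of (doc_id, distance), closest first.
    """
    scored = ((l2_distance(vec, query), doc_id) for doc_id, vec in segment_vectors)
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


segment = [(0, [1.0, 2.0, 3.0]), (1, [2.0, 3.0, 5.0]), (2, [9.0, 9.0, 9.0])]
top = exact_knn(segment, [2.0, 3.0, 4.0], k=2)
print(top)
```

Each segment would produce its own top k this way, and the per-segment results would then be merged, as with any other query.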

What alternatives have you considered?
NA

Do you have any additional context?
NA

Similar Issue: #1078

@navneet1v navneet1v added untriaged enhancement Features Introduces a new unit of functionality that satisfies a requirement backlog and removed untriaged labels Aug 31, 2023
@jmazanec15
Member

jmazanec15 commented Aug 31, 2023

I like it. Something like this?

GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector2": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "exact": true
      }
    }
  }
}

Then default exact to false for backward compatibility (bwc)?
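
A small sketch of how a client-side helper might treat that default, assuming the exact parameter proposed above. The parameter is only a proposal in this thread, not a released API, and the helper name knn_query is hypothetical. When exact is left at its default, the body is identical to today's k-NN query, which is the backward-compatibility property in question.

```python
def knn_query(field, vector, k, exact=False, size=None):
    """Build a knn query body; 'exact' is the hypothetical flag proposed in this issue."""
    clause = {"vector": vector, "k": k}
    if exact:
        # Only emitted when requested, so existing requests are unchanged.
        clause["exact"] = True
    return {"size": k if size is None else size, "query": {"knn": {field: clause}}}


approximate = knn_query("my_vector2", [2, 3, 5, 6], k=2)
brute_force = knn_query("my_vector2", [2, 3, 5, 6], k=2, exact=True)
print(approximate)
print(brute_force)
```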

@navneet1v
Collaborator Author

@jmazanec15 this is in line with the thought I had when creating this issue. Do we see any other alternative here?

I see one, which is creating a new query clause, but that's not a good option, so I never added it. Let's stick to this one unless someone has a better idea.

@vamshin @heemin32 do you have any thoughts here?

@vamshin vamshin moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Oct 5, 2023
@navneet1v
Collaborator Author

navneet1v commented Jun 27, 2024

After thinking this over more, I think we can take another option, one that is already used for text fields. We can add an index parameter (similar to store) to the field mapping and set it to false to indicate that the vectors should not be indexed. Indexing vectors here means creating the k-NN data structures.

Example:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "index":false
      }
    }
  }
}

We can then set this value as an attribute in VectorField and read it in PerFieldCodec to route to a new plain codec that just stores the vectors.
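
A rough sketch of that routing, in Python rather than the plugin's Java and purely for illustration. The VectorField dataclass, the helper names, and the format names are all hypothetical stand-ins, and storing the flag as a string attribute is an assumption; none of this is the plugin's actual code.

```python
from dataclasses import dataclass, field as dc_field


@dataclass
class VectorField:
    """Hypothetical stand-in for the field object that carries per-field attributes."""
    name: str
    dimension: int
    attributes: dict = dc_field(default_factory=dict)


def from_mapping(name, mapping):
    """Copy the mapping's 'index' flag (default true) onto the field as a string attribute."""
    index_flag = mapping.get("index", True)
    return VectorField(name, mapping["dimension"], {"index": str(index_flag).lower()})


def select_vector_format(vector_field):
    """Per-field lookup: skip building ANN structures when index is false."""
    if vector_field.attributes.get("index", "true") == "false":
        return "plain-vectors-only"  # made-up name for the codec that just stores vectors
    return "ann-structures"  # made-up name for the existing engine-specific path


unindexed = from_mapping("my_vector1", {"type": "knn_vector", "dimension": 3, "index": False})
indexed = from_mapping("my_vector2", {"type": "knn_vector", "dimension": 3})
print(select_vector_format(unindexed), select_vector_format(indexed))
```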

This approach will work for all the engines.

cc: @vamshin, @jmazanec15, @luyuncheng what are your thoughts?

@jmazanec15
Member

I think that makes sense. So this would fall back to brute force knn search?

@navneet1v
Collaborator Author

> I think that makes sense. So this would fall back to brute force knn search?

Yes that is correct.

@vamshin vamshin moved this from Backlog (Hot) to 2.17.0 in Vector Search RoadMap Jul 2, 2024
@vamshin vamshin added v2.17.0 and removed backlog labels Jul 2, 2024
@jmazanec15
Member

@navneet1v do you think instead we can do "method": false? Right now, method specifies the structure for ANN search. So, if we set to false, it would make sense that we do not want to do ANN search.

@navneet1v
Collaborator Author

navneet1v commented Jul 18, 2024

> @navneet1v do you think instead we can do "method": false? Right now, method specifies the structure for ANN search. So, if we set to false, it would make sense that we do not want to do ANN search.

Actually, we could do method: false, but in OpenSearch the way we define that something is not indexed is by saying index: false. People can still search using doc values. Similarly, here users can do the search via VectorValues. I see that as a more seamless experience.

Another thing is that method: false doesn't work with LegacyFieldMapper.

@jmazanec15
Member

Got it. Then I think we will need to ensure that method, model_id, and index: false are all mutually exclusive.
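
A minimal sketch of that check, written in Python for illustration only. The function name validate_knn_vector_mapping is hypothetical, and the rule that at most one of the three settings may be present is my reading of "mutually exclusive"; the real check would live in the field mapper's parsing code.

```python
def validate_knn_vector_mapping(mapping):
    """Raise ValueError if more than one of method, model_id, index: false is set."""
    conflicting = []
    if "method" in mapping:
        conflicting.append("method")
    if "model_id" in mapping:
        conflicting.append("model_id")
    if mapping.get("index", True) is False:
        conflicting.append("index: false")
    if len(conflicting) > 1:
        raise ValueError("mutually exclusive settings: " + ", ".join(conflicting))


# Accepted: vectors are stored but no ANN structure is built.
validate_knn_vector_mapping({"type": "knn_vector", "dimension": 3, "index": False})

# Rejected: asks for an ANN method and for no indexing at the same time.
try:
    validate_knn_vector_mapping(
        {"type": "knn_vector", "dimension": 3, "index": False, "method": {"name": "hnsw"}}
    )
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```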

@navneet1v
Collaborator Author

Yes, that is correct. index: false governs only whether we create the vector data structures or not. It has that simple job.

@harishbhakuni

I am working on this. Refer to #2368 for the detailed proposal.
Created these sub-issues for tracking:

The next follow-up could be disabling the scoring script, but we need to think more about that.

Projects
Status: Backlog (Hot)
5 participants