Background:
Today, a user who wants to run an exact search on a given data set has two options:
Either use score scripting. Exact search with score scripting is also helpful when the user wants a space type other than the one configured for the index.
Or set the index.knn index setting to false during index creation, which makes every kNN query on the index run as an exact search (when this setting is false, no graph files are created for the index).
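For reference, the two existing options can be sketched as request bodies. This is a minimal sketch, not taken from the issue: the helper names, the field name my_vector, and the sample vector are illustrative assumptions, while the knn_score script parameters follow the k-NN plugin's documented scoring script.

```python
import json


def score_script_query(field, query_vector, space_type, size=10):
    """Option A: exact k-NN through the plugin's knn_score scoring script."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": query_vector,
                        # May differ from the space type configured for the index.
                        "space_type": space_type,
                    },
                },
            }
        },
    }


def exact_only_index_body(field, dimension):
    """Option B: index body with index.knn set to false (no graph files are built)."""
    return {
        "settings": {"index": {"knn": False}},
        "mappings": {
            "properties": {field: {"type": "knn_vector", "dimension": dimension}}
        },
    }


if __name__ == "__main__":
    # Field name and vector are placeholders for illustration only.
    print(json.dumps(score_script_query("my_vector", [2.0, 3.0, 5.0], "l2", size=4), indent=2))
```

The first body would be sent to the index's _search endpoint, the second used as the body of the index creation request.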
Problems with current approach:
(1) Unnecessary vector data structure creation:
If an index has more than one knn_vector field, vector-related data structures are still created for every vector field, regardless of whether the user wants to run ANN search on all of them.
(2) To run exact search on a given field while the index.knn setting is enabled, the user has to use a score script query:
This involves script compilation overhead during query execution.
There are also often security concerns associated with using scripts during search. That said, there are no known security risks with the knn_score script provided by the plugin, because it only offers a predefined way of executing exact search; the user cannot put custom logic in the script.
Solution:
Solving problem (1): Unnecessary vector data structure creation
Approach 1 (Preferred):
Add a new parameter to the index mapping, searchMode, that takes the value exact or ann and determines which search is performed. The default value is ann.
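A mapping that uses the proposed parameter might look like the sketch below. searchMode does not exist today, and placing it on each knn_vector field (rather than elsewhere in the mapping) is an assumption made here because problem (1) is about per-field behavior. The helper names and field names are hypothetical.

```python
VALID_SEARCH_MODES = ("ann", "exact")


def knn_field_mapping(dimension, search_mode="ann"):
    """Build a knn_vector field mapping carrying the PROPOSED searchMode parameter."""
    if search_mode not in VALID_SEARCH_MODES:
        raise ValueError(f"searchMode must be one of {VALID_SEARCH_MODES}, got {search_mode!r}")
    return {
        "type": "knn_vector",
        "dimension": dimension,
        # Hypothetical parameter from Approach 1; assumed to be field-level.
        "searchMode": search_mode,
    }


def index_body(fields):
    """fields maps a field name to a (dimension, search_mode) tuple."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                name: knn_field_mapping(dimension, mode)
                for name, (dimension, mode) in fields.items()
            }
        },
    }
```

Under this reading, index_body({"vect_field_1": (3, "ann"), "vect_field_2": (3, "exact")}) would describe an index where only vect_field_1 needs vector data structures.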
Approach 2:
Introduce a field-level parameter, approximate_search_min_vectors, which is the number of vectors a segment must have before specialized data structures for approximate search are created.
If set to -1, vector data structures are never created and exact search is used for the field. If set to 0, the data structures are always created.
This is similar to the existing index setting index.knn.advanced.approximate_threshold. The new parameter takes precedence over that index setting.
For example:
PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "vect_field_1": {                          // vector data structures are always built;
        "type": "knn_vector",                    // ANN is used for the field.
        "dimension": 3,
        "approximate_search_min_vectors": 0,
        ...
      },
      "vect_field_2": {                          // vector data structures are never built;
        "type": "knn_vector",                    // exact search is used for the field.
        "dimension": 3,
        "approximate_search_min_vectors": -1,
        ...
      },
      "vect_field_3": {                          // vector data structures are built
        "type": "knn_vector",                    // if the number of vectors in the segment
        "dimension": 6,                          // is greater than 5; ANN is then used for the field.
        "approximate_search_min_vectors": 5,
        ...
      },
      ...
    }
  }
}
Reason not to prefer: this parameter could easily be confused with the existing index setting index.knn.advanced.approximate_threshold.
Approach 3:
Add a new boolean parameter, index, to the field mapping that determines whether exact search should be used for the field.
Reason not to prefer: index is already used as a field parameter today to determine whether the field is indexed and searchable. Reusing it would be confusing, since we still want the field to be searchable.
Solving problem (2): Usage of score script for exact search
Although exact search with a scoring script involves the extra step of script compilation, it is still useful when the user needs a different space type for the kNN search than the one associated with the index.
To provide the same experience without score scripting, we need to provide additional query parameters in the knn query clause itself. One of the following options can be used:
Option 1 (preferred): Introduce two parameters, use_exact_search and exact_search_space_type, in the kNN clause. This option is preferred because it provides better clarity and flexibility in the long run.
Option 2: Introduce a single parameter, exact_search_space_type. When it is set, exact search is executed for the query with the provided space type; otherwise, ANN search is performed if graph structures are available for the field.
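The two options could translate into knn query clauses roughly as sketched below. Only vector and k are existing knn query parameters; use_exact_search and exact_search_space_type are the proposed ones, and the helper names are hypothetical. Rejecting a space type that is passed without use_exact_search in Option 1 is an assumption of this sketch, since the proposal does not define that case.

```python
def knn_query_option1(field, vector, k, use_exact_search=False, exact_search_space_type=None):
    """Option 1: an explicit flag plus an optional space type (both PROPOSED parameters)."""
    if exact_search_space_type is not None and not use_exact_search:
        # Assumption: the proposal does not say how this combination is handled.
        raise ValueError("exact_search_space_type requires use_exact_search=True")
    clause = {"vector": vector, "k": k}
    if use_exact_search:
        clause["use_exact_search"] = True
        if exact_search_space_type is not None:
            clause["exact_search_space_type"] = exact_search_space_type
    return {"query": {"knn": {field: clause}}}


def knn_query_option2(field, vector, k, exact_search_space_type=None):
    """Option 2: the presence of the space type alone requests exact search."""
    clause = {"vector": vector, "k": k}
    if exact_search_space_type is not None:
        clause["exact_search_space_type"] = exact_search_space_type
    return {"query": {"knn": {field: clause}}}
```

Option 1 lets a query request exact search with the default space type, which Option 2 cannot express without naming a space type explicitly.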
Backward compatibility considerations:
After fixing problem (1), we should not allow users to disable the index.knn index setting on an index that is going to hold kNN-related data. We should block index mapping updates if the field type is knn_vector and this setting is not enabled. However, these validations should be guarded by proper index version checks so that they do not impact existing indices during version upgrades.
Until score scripting for exact search is disabled, we need to determine whether the new query parameters can be passed together with the scoring script in the same query. If they can, we need to decide whether to disallow that combination or to handle conflicting values. (This needs more thought.)
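The version-gated mapping validation from the first consideration could be sketched as follows. This is only an illustration of the check's shape: the function name, the tuple representation of versions, and the enforce_from_version cutoff are all hypothetical, and the issue does not name the version at which the validation would start.

```python
def validate_knn_vector_mapping(field_types, knn_enabled, index_created_version, enforce_from_version):
    """Reject knn_vector fields on an index whose k-NN setting is disabled.

    field_types maps field name -> field type.
    Versions are (major, minor, patch) tuples; enforce_from_version is a
    hypothetical cutoff so that indices created earlier are left untouched.
    """
    if index_created_version < enforce_from_version:
        return  # Older index: keep existing behavior across upgrades.
    if knn_enabled:
        return
    offending = sorted(name for name, ftype in field_types.items() if ftype == "knn_vector")
    if offending:
        raise ValueError(
            "knn_vector fields require the k-NN index setting to be enabled: "
            + ", ".join(offending)
        )
```

Comparing against the version the index was created with, rather than the running node version, is what keeps existing indices unaffected by an upgrade.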
Fallback Logic considerations:
If the space type is not provided in the query, the default space type is used for the search.
If the approximate_search_min_vectors field parameter is not provided and no exact-search parameters are passed in the query, the logic falls back to the index.knn.advanced.approximate_threshold index setting. If this index setting value is > 0, ANN search is used for the field when vector data structures are available.
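One way to read the fallback order is sketched below, modeled on Option 1's use_exact_search flag. The function and argument names are hypothetical. Two assumptions go beyond the text: a threshold of 0 is treated as ANN-eligible (following the "always created" meaning given for 0 in Approach 2, whereas the rule above says "> 0"), and the search falls back to exact search whenever vector data structures are not available.

```python
from typing import Optional, Tuple


def resolve_search(
    query_use_exact: bool,
    query_space_type: Optional[str],
    default_space_type: str,
    field_min_vectors: Optional[int],
    index_approximate_threshold: int,
    structures_available: bool,
) -> Tuple[str, str]:
    """Return (search_mode, space_type) for a query on one knn_vector field."""
    # A space type in the query wins; otherwise the default space type is used.
    space_type = query_space_type if query_space_type is not None else default_space_type

    # Exact-search parameters in the query take priority over any thresholds.
    if query_use_exact:
        return ("exact", space_type)

    # The field-level parameter, when present, overrides the index setting.
    if field_min_vectors is not None:
        threshold = field_min_vectors
    else:
        threshold = index_approximate_threshold

    # Assumption: -1 disables ANN; 0 or more allows ANN when structures exist.
    if threshold >= 0 and structures_available:
        # ANN uses the space type the structures were built with.
        return ("ann", default_space_type)

    # Assumption: without usable structures the query is served by exact search.
    return ("exact", space_type)
```

Passing structures_available as an input keeps the sketch independent of how a segment's vector count is compared with the threshold.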
Future enhancements:
Support for a field-level default space type can be added in the future, since the graph file data structures are created per field anyway.