[Proposal] Improved experience for exact search #2368

Open
harishbhakuni opened this issue Jan 6, 2025 · 0 comments
Background:

Today, a user who wants to run an exact search on a data set has to do one of the following:

  • Use score scripting. Exact search with score scripting also helps when the user wants a space type other than the one used for the index.
  • Set the index.knn index setting to false at index creation, which makes every kNN query on the index run as an exact search. (When this setting is false, no graph files are created for the index.)

Problems with current approach:

  1. Unnecessary vector data structure creation:
    1. If an index has more than one knn_vector field, vector data structures are still created for all of them, regardless of whether the user wants to run ANN search on every field.
    2. This consumes a lot of memory.
    3. Related issue: [Feature Request] I have multiple knn_vector fields with settings.index.knn set to true, but it would be ideal if certain fields could use approximate k-NN search while others could only use exact k-NN. #2270
  2. Usage of score script for exact search:
    1. To run exact search on a field when the index.knn setting is enabled, the user has to use a score script query.
    2. This adds script compilation overhead during query execution.
    3. Scripts used during search also tend to raise security concerns. That said, there are no known security risks with the knn_score script provided by the plugin: it only offers a predefined way of executing exact search, and the user cannot add custom logic to it.
    4. Related issue: [FEATURE] Providing better experience for doing Exact Search without Script Score Query #1079

Solution:

Solving problem (1): Unnecessary vector data structure creation

Approach 1 (Preferred):

  • Add a new parameter to the index mapping, searchMode, which accepts exact or ann (default: ann) and determines which kind of search is run.
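
To make Approach 1 concrete, here is a minimal sketch that builds an index-creation body carrying the proposed searchMode parameter. The helper name build_mapping is hypothetical, and placing searchMode on each knn_vector field is an assumption (the proposal only says the parameter lives in the index mapping). Nothing here is an existing plugin API beyond knn_vector, dimension and index.knn.

```python
import json

# Values proposed in this issue; "ann" is the proposed default.
VALID_SEARCH_MODES = {"ann", "exact"}


def build_mapping(fields):
    """Build an index-creation body.

    fields maps a field name to (dimension, search_mode or None).
    A search_mode of None falls back to the proposed default, "ann".
    """
    properties = {}
    for name, (dimension, search_mode) in fields.items():
        mode = "ann" if search_mode is None else search_mode
        if mode not in VALID_SEARCH_MODES:
            raise ValueError(f"unsupported searchMode: {mode!r}")
        properties[name] = {
            "type": "knn_vector",
            "dimension": dimension,
            "searchMode": mode,  # hypothetical parameter from this proposal
        }
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": properties},
    }


body = build_mapping({"vect_field_1": (3, None), "vect_field_2": (3, "exact")})
print(json.dumps(body, indent=2))
```

With this shape, vect_field_1 would keep its ANN data structures while vect_field_2 would skip them and always be searched exactly.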

Approach 2:

  • Introduce a field-level parameter, approximate_search_min_vectors: the number of vectors a segment must have before specialized data structures for approximate search are created.
  • If set to -1, vector data structures are never created and exact search is used for the field. If set to 0, the data structures are always created.
  • This is similar to the existing index setting index.knn.advanced.approximate_threshold. The new parameter takes precedence over that index setting.
  • Something like this:
PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "vect_field_1": {         // vector data structures are always built.
        "type": "knn_vector",   // ANN is used for the field.
        "dimension": 3,
        "approximate_search_min_vectors": 0,
        ...
      },
      "vect_field_2": {         // vector data structures are never built.
        "type": "knn_vector",   // Exact search is used for the field.
        "dimension": 3,
        "approximate_search_min_vectors": -1,
        ...
      },
      "vect_field_3": {         // vector data structures are built if the
        "type": "knn_vector",   // segment has more than 5 vectors;
        "dimension": 6,         // ANN is then used for the field.
        "approximate_search_min_vectors": 5,
        ...
      },
      ...
    }
  }
}
  • Reason not to prefer: this parameter could easily be confused with the existing index setting index.knn.advanced.approximate_threshold.
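
The threshold semantics of Approach 2 can be sketched as two small functions. The function names are hypothetical, and the boundary case is an assumption: the description ("the number of vectors a segment must have") is read here as "at least N", whereas the comment in the example above says "more than". Only the -1, 0 and precedence rules come directly from the proposal.

```python
from typing import Optional


def effective_min_vectors(field_value: Optional[int], index_threshold: int) -> int:
    """Field-level approximate_search_min_vectors wins over the index-level
    index.knn.advanced.approximate_threshold when it is set."""
    return index_threshold if field_value is None else field_value


def builds_ann_structures(min_vectors: int, segment_vector_count: int) -> bool:
    """Decide whether a segment gets ANN data structures for a field."""
    if min_vectors < -1:
        raise ValueError("min_vectors must be -1 or a non-negative integer")
    if min_vectors == -1:
        # -1: never build, so the field is always searched exactly.
        return False
    # 0 always builds; N > 0 builds once the segment holds at least N vectors
    # (assumption: ">=" rather than ">").
    return segment_vector_count >= min_vectors


# Field-level value set to -1 overrides an index-level threshold of 1000.
threshold = effective_min_vectors(-1, 1000)
print(builds_ann_structures(threshold, 50_000))
```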

Approach 3:

  • Add a new boolean parameter, index, to the field mapping to determine whether exact search should be used for the field.
  • Reason not to prefer: index is already used as a field parameter today to determine whether a field is indexed/searchable. Reusing it would be confusing, since we still want the field to be searchable.

Solving problem (2): Usage of score script for exact search

Although exact search with a scoring script adds a script compilation step, it can still be useful when the user needs a different space type for the knn search than the one associated with the index.

To provide the same experience without score scripting, we need to add query parameters to the knn query clause itself. One of the following options can be used:

Option 1 (preferred): Introduce two parameters, use_exact_search and exact_search_space_type, in the kNN clause. This option is preferred because it provides better clarity and flexibility in the long run.
Option 2: Introduce a single parameter, exact_search_space_type. When it is set, exact search is executed for the query with the given space type; otherwise, ANN search is performed if graph structures are available for the field.
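
As a rough illustration of Option 1, the sketch below builds a knn query body with the two proposed parameters. The helper build_knn_query is hypothetical, use_exact_search and exact_search_space_type do not exist in the plugin today, and rejecting a space type without use_exact_search is an assumption about how conflicts might be handled.

```python
import json
from typing import List, Optional


def build_knn_query(
    field: str,
    vector: List[float],
    k: int,
    use_exact_search: bool = False,
    exact_search_space_type: Optional[str] = None,
) -> dict:
    """Build a knn query body using the parameters proposed in Option 1."""
    clause = {"vector": vector, "k": k}
    if use_exact_search:
        clause["use_exact_search"] = True  # proposed parameter
        if exact_search_space_type is not None:
            # proposed parameter; when omitted, the default space type applies
            clause["exact_search_space_type"] = exact_search_space_type
    elif exact_search_space_type is not None:
        # Assumption: a space type without use_exact_search is rejected.
        raise ValueError("exact_search_space_type requires use_exact_search")
    return {"query": {"knn": {field: clause}}}


exact_query = build_knn_query(
    "vect_field_1", [1.0, 2.0, 3.0], k=2,
    use_exact_search=True, exact_search_space_type="l2",
)
ann_query = build_knn_query("vect_field_1", [1.0, 2.0, 3.0], k=2)
print(json.dumps(exact_query, indent=2))
```

Under Option 2 the use_exact_search flag would disappear and the presence of exact_search_space_type alone would select exact search.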

Backward compatibility considerations:

  • After fixing problem (1), we should not allow users to disable the index.knn index setting on an index they plan to use for knn data. We should block index mapping updates if the field type is knn_vector and this setting is not enabled. However, these validations should be guarded by proper index version checks so that they do not affect existing indices during version upgrades.
  • Until score scripting for exact search is disabled, we need to determine whether the new query parameters can be passed together with the scoring script in the same query. If they can, we need to decide whether to disallow that or to resolve conflicts between the values. (This needs more thought.)

Fallback Logic considerations:

  • If no space type is provided in the query, the default space type is used for the search.
  • If the approximate_search_min_vectors field parameter is not set and no exact search parameters are passed in the query, the logic falls back to the index.knn.advanced.approximate_threshold index setting. If that setting's value is > 0, ANN search is used for the field, provided vector data structures are available.
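
One possible reading of the fallback rules is sketched below. The function resolve_search and its arguments are hypothetical. The proposal only specifies the default space type and the index-setting fallback with a value > 0; falling back to exact search in every other case, and the handling of a field-level value, are assumptions.

```python
from typing import Optional, Tuple


def resolve_search(
    query_requests_exact: bool,
    exact_space_type: Optional[str],
    default_space_type: str,
    field_min_vectors: Optional[int],
    index_threshold: int,
    structures_available: bool,
) -> Tuple[str, str]:
    """Return (search kind, space type) for a knn query on one field."""
    if query_requests_exact:
        # Exact search was asked for; use the query's space type if given,
        # otherwise the default space type.
        return "exact", exact_space_type or default_space_type

    if field_min_vectors is not None:
        # Assumption: a field-level value takes precedence, and -1 means
        # the structures never exist, so search must be exact.
        use_ann = field_min_vectors != -1 and structures_available
    else:
        # Fallback from the proposal: index setting > 0 and structures present.
        use_ann = index_threshold > 0 and structures_available

    # Assumption: anything that cannot use ANN is answered with exact search.
    return ("ann" if use_ann else "exact"), default_space_type


print(resolve_search(False, None, "l2", None, 1000, True))
```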

Future enhancements:

  • Support for a field-level default space type can be added in the future, since the graph file data structures are created per field anyway.
Status: 2.19.0