Add Query for reranking KnnFloatVectorQuery with full-precision vectors #14009
base: main
Conversation
The build fails with [...]

Edit: It has been moved to backward codecs. Will use something more stable.
I have a preliminary benchmark here (top-k=100, fanout=0) using the Cohere 768 dataset. Anyhow, I can see these 2 things that should be addressed:
}
Weight weight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1.0f);
HitQueue queue = new HitQueue(k, false);
for (var leaf : reader.leaves()) {
Should this be switched to parallel execution similar to AbstractKnnVectorQuery?
Good question. I was using a single thread as a simple first version so I could benchmark the latency, since multi-threading could add some overhead as well. This class only does vector loading and similarity computation for a small set of vectors (k * oversample), so it's not as CPU-intensive as AbstractKnnVectorQuery.
I'll also try multi-threading and run the benchmark again, although since the benchmark below shows the re-ranking phase only adds a trivial amount of latency, it might not help much. Also, the benchmark code seems to force-merge so there's only a single partition; we need to change it so that there are multiple partitions.
Edit: My previous benchmark was wrong because the vectors were corrupted. The first benchmark shows the recall improvement for each oversample value with reranking. It now aligns with what was produced in #13651. The second benchmark compares the latency across all algorithms. We are still adding only a small latency for the reranking phase. For the last benchmark, I just ran oversample without reranking, but still cut off at the original K (so it acts similar to fanout). This is just to make sure that the reranking phase actually adds value. As expected, the recall does not improve much compared to the reranking.
Also, this is the luceneutil branch I used for benchmarking: https://github.com/dungba88/luceneutil/tree/dungba88/two-phase-search, which incorporates the test for the BQ implementation by @benwtrent and the two-phase search.
float expectedScore = VECTOR_SIMILARITY_FUNCTION.compare(targetVector, docVector);
Assert.assertEquals(
    "Score does not match expected similarity for doc ord: " + scoreDoc.doc + ", id: " + id,
    expectedScore,
    scoreDoc.score,
    1e-5);
}
We can test that the results are sorted by exact distance.
Maybe we can also test that the result of the same query with oversample will be "at least the same or better" than without oversample? By "better" I mean we should have higher recall. But I'm not sure if it's deterministic.
Thinking again, the docs should be sorted by ord, so my first point should be irrelevant.
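For the second idea (comparing recall with and without oversampling), here is a minimal sketch under stated assumptions: the helper names are hypothetical and not part of this PR, and the ground-truth top-k is assumed to come from a brute-force scan over the full-precision vectors. As noted above, with an approximate graph search the comparison is not strictly deterministic, so it would likely need a fixed seed or a force-merged single-segment index.

```java
import java.util.Arrays;
import java.util.Set;
import org.apache.lucene.search.TopDocs;

final class RecallCheckSketch {

  /** Fraction of the exact (brute-force) top-k that the approximate result recovered. */
  static double recall(TopDocs approximate, Set<Integer> exactTopK) {
    long found =
        Arrays.stream(approximate.scoreDocs).filter(sd -> exactTopK.contains(sd.doc)).count();
    return (double) found / exactTopK.size();
  }

  /** Assert that oversampling does not reduce recall against the exact ground truth. */
  static void assertOversampleNotWorse(
      TopDocs withOversample, TopDocs withoutOversample, Set<Integer> exactTopK) {
    if (recall(withOversample, exactTopK) < recall(withoutOversample, exactTopK)) {
      throw new AssertionError("oversampling should give at least the same recall");
    }
  }
}
```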
HitQueue queue = new HitQueue(k, false);
for (var leaf : reader.leaves()) {
Here we have access to IndexSearcher#getTaskExecutor and could use it to parallelize the work across segments (like we did earlier with some other query rewrites). But the HitQueue here isn't thread-safe. I don't know if using concurrency after making insertWithOverflow thread-safe would really be helpful, since it looks like the added cost is cheap? Or maybe it will be?
That's right. In order to apply parallelism we need to use a per-segment queue, then merge them like in AbstractKnnVectorQuery.mergeLeafResults. I think the added latency is already low, but I still want to try it and see if it helps.
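A minimal sketch of that direction follows, assuming the query has access to the IndexSearcher. The scoreLeaf method is a hypothetical stand-in for the existing per-segment loop, and the reduction uses TopDocs.merge in the same spirit as AbstractKnnVectorQuery#mergeLeafResults; this is not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TaskExecutor;
import org.apache.lucene.search.TopDocs;

final class ParallelRerankSketch {

  /**
   * Hypothetical stand-in for the existing per-segment loop: load the full-precision vectors
   * for this leaf's candidates, score them into a leaf-local HitQueue of size k, and return
   * them as TopDocs with the leaf's docBase already applied.
   */
  TopDocs scoreLeaf(LeafReaderContext leaf, int k) {
    throw new UnsupportedOperationException("placeholder for the per-segment scoring loop");
  }

  TopDocs rerankInParallel(IndexSearcher indexSearcher, int k) throws Exception {
    IndexReader reader = indexSearcher.getIndexReader();
    TaskExecutor taskExecutor = indexSearcher.getTaskExecutor();
    List<Callable<TopDocs>> tasks = new ArrayList<>();
    for (LeafReaderContext leaf : reader.leaves()) {
      tasks.add(() -> scoreLeaf(leaf, k)); // one task and one queue per segment
    }
    List<TopDocs> perLeaf = taskExecutor.invokeAll(tasks);
    // Reduce to a global top-k across segments, same idea as AbstractKnnVectorQuery#mergeLeafResults.
    return TopDocs.merge(k, perLeaf.toArray(new TopDocs[0]));
  }
}
```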
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Description
fixes #13564
Added a new Query which wraps around KnnFloatVectorQuery and does re-ranking for a quantized index using full-precision vectors. The idea is to first run KnnFloatVectorQuery with an over-sampled k (x1.5, x2, x5, etc.), then re-rank the docs using the full-precision (original, non-quantized) vectors, and finally take the top k. A sketch of this flow follows below.
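To make the flow concrete, here is a minimal sketch under stated assumptions: it is not the code in this PR, the class and method names are made up, and it assumes the iterator-style FloatVectorValues API (advance/vectorValue) of Lucene 9.x, which differs from the ord-based access on current main.

```java
import java.io.IOException;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.HitQueue;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

final class FullPrecisionRerankSketch {

  /**
   * Over-sample the approximate phase, re-score the hits with the stored full-precision
   * vectors, and keep the best k.
   */
  static TopDocs search(
      IndexSearcher searcher,
      String field,
      float[] target,
      int k,
      double oversample,
      VectorSimilarityFunction similarity)
      throws IOException {
    int overSampledK = (int) Math.ceil(k * oversample);
    TopDocs approximate =
        searcher.search(new KnnFloatVectorQuery(field, target, overSampledK), overSampledK);

    IndexReader reader = searcher.getIndexReader();
    HitQueue queue = new HitQueue(k, false); // keeps only the k best full-precision scores
    for (ScoreDoc hit : approximate.scoreDocs) {
      int leafIndex = ReaderUtil.subIndex(hit.doc, reader.leaves());
      LeafReaderContext leaf = reader.leaves().get(leafIndex);
      int leafDoc = hit.doc - leaf.docBase;
      FloatVectorValues vectors = leaf.reader().getFloatVectorValues(field);
      if (vectors == null || vectors.advance(leafDoc) != leafDoc) {
        continue; // candidate has no float vector in this segment
      }
      float exactScore = similarity.compare(target, vectors.vectorValue());
      queue.insertWithOverflow(new ScoreDoc(hit.doc, exactScore));
    }

    // Drain the queue into descending-score order.
    ScoreDoc[] results = new ScoreDoc[queue.size()];
    for (int i = results.length - 1; i >= 0; i--) {
      results[i] = queue.pop();
    }
    return new TopDocs(approximate.totalHits, results);
  }
}
```

A real implementation would group candidates per leaf and advance a single iterator per segment rather than re-opening the vector values for every hit; the sketch trades that efficiency for brevity.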
Questions:

1. Should we expose the target inside KnnFloatVectorQuery so that users don't need to pass the target twice? Currently it only exposes getTargetCopy(), which requires an array copy so it's inefficient, but I assume the intention is to encapsulate the array so that it won't be modified from outside?
2. Should we use mlock for preventing the quantized vectors from being swapped out, as loading fp vectors (although only a small set per query) means there will be more pressure on RAM?

Usage:
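The usage snippet itself did not survive extraction. Below is a hedged reconstruction based purely on the description above; the wrapper class name (RerankKnnFloatVectorQuery) and its constructor shape are assumptions, not necessarily the exact API added by this PR.

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

class RerankUsageSketch {

  TopDocs searchWithRerank(IndexSearcher searcher, float[] target, int k) throws IOException {
    int overSampledK = k * 2; // run the approximate phase with 2x the requested k
    KnnFloatVectorQuery approximate =
        new KnnFloatVectorQuery("vector_field", target, overSampledK);
    // Hypothetical wrapper from this PR: re-ranks the over-sampled hits with the
    // full-precision vectors and cuts the result back to the top k.
    Query reranked = new RerankKnnFloatVectorQuery(approximate, target, k);
    return searcher.search(reranked, k);
  }
}
```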