From 6e7d9e040f9be9734277c3f27b2cb364a67f442d Mon Sep 17 00:00:00 2001
From: simshi
Date: Tue, 28 May 2024 11:47:04 -0700
Subject: [PATCH] fix algorithm of spreading vectors over shards (#3374)

Summary:
Simple math shows the old code producing a negative `ni` for the last shard (`i0 = idx * shard_size`, `ni = n - i0`):

| **input n** | **input nshards** | shard_size | idx | i0 | ni |
| -- | -- | -- | -- | -- | -- |
| 19 | 6 | 4 | 5 | 20 | **-1** |
| 1000 | 37 | 28 | 36 | 1008 | -8 |
| 1000 | 64 | 16 | 63 | 1008 | -8 |

root cause: `shard_size = (n + nshards - 1) / nshards` rounds up, so for the last shards `i0 = idx * shard_size` can exceed `n`, and `n - i0` goes negative.

my solution: each shard takes at least `base_shard_size = n / nshards` vectors. Let `remain = n % nshards`, so `0 <= remain < nshards`. The `remain` leftover vectors are assigned to the first `remain` shards, i.e. each of the first `remain` shards takes one more vector.

```c++
auto i0 = idx * base_shard_size;
if (idx < remain) { // if current idx is one of the first `remain` shards
    i0 += idx;
} else {
    i0 += remain;
}
```

The code above simplifies to: `i0 = idx * base_shard_size + std::min(size_t(idx), n % nshards);`

Pull Request resolved: https://github.com/facebookresearch/faiss/pull/3374

Reviewed By: fxdawnn

Differential Revision: D57867910

Pulled By: junjieqi

fbshipit-source-id: 7e72ea5cd197af4f3446fb7a3fd34ad08901dbb2
---
 faiss/gpu/GpuIcmEncoder.cu | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/faiss/gpu/GpuIcmEncoder.cu b/faiss/gpu/GpuIcmEncoder.cu
index 434fae9e36..8bd60f91b8 100644
--- a/faiss/gpu/GpuIcmEncoder.cu
+++ b/faiss/gpu/GpuIcmEncoder.cu
@@ -82,7 +82,7 @@ void GpuIcmEncoder::encode(
         size_t n,
         size_t ils_iters) const {
     size_t nshards = shards->size();
-    size_t shard_size = (n + nshards - 1) / nshards;
+    size_t base_shard_size = n / nshards;
 
     auto codebooks = lsq->codebooks.data();
     auto M = lsq->M;
@@ -94,8 +94,14 @@ void GpuIcmEncoder::encode(
 
     // split input data
     auto fn = [=](int idx, IcmEncoderImpl* encoder) {
-        size_t i0 = idx * shard_size;
-        size_t ni = std::min(shard_size, n - i0);
+        size_t i0 = idx * base_shard_size + std::min(size_t(idx), n % nshards);
+        size_t ni = base_shard_size;
+        if (size_t(idx) < n % nshards) {
+            ++ni;
+        }
+        if (ni <= 0) { // only if n < nshards
+            return;
+        }
         auto xi = x + i0 * d;
         auto ci = codes + i0 * M;
         std::mt19937 geni(idx + seed); // different seed for each shard
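
The splitting rule described in the summary can be checked in isolation. The following is a minimal host-side sketch, not code from faiss: `shard_range` and `covers_all` are hypothetical helper names introduced here for illustration, and the sketch assumes each of the first `n % nshards` shards receives exactly one extra vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of faiss): returns {i0, ni}, the start offset
// and the number of vectors assigned to shard `idx` out of `nshards`.
std::pair<std::size_t, std::size_t> shard_range(
        std::size_t n,
        std::size_t nshards,
        std::size_t idx) {
    assert(nshards > 0 && idx < nshards);
    std::size_t base_shard_size = n / nshards;
    std::size_t remain = n % nshards; // 0 <= remain < nshards
    // Every earlier shard among the first `remain` holds one extra vector,
    // so the start offset shifts by min(idx, remain).
    std::size_t i0 = idx * base_shard_size + std::min(idx, remain);
    std::size_t ni = base_shard_size + (idx < remain ? 1 : 0);
    return {i0, ni};
}

// Checks that the shards are contiguous and together cover exactly [0, n).
bool covers_all(std::size_t n, std::size_t nshards) {
    std::size_t next = 0;
    for (std::size_t idx = 0; idx < nshards; ++idx) {
        std::pair<std::size_t, std::size_t> r = shard_range(n, nshards, idx);
        if (r.first != next) {
            return false; // gap or overlap between consecutive shards
        }
        next = r.first + r.second;
    }
    return next == n;
}
```

Under these assumptions, `n = 19` and `nshards = 6` give shard sizes 4, 3, 3, 3, 3, 3, and the last shard starts at offset 16 instead of the out-of-range offset 20 from the table. When `n < nshards`, the trailing shards get `ni == 0`, which is the case the early `return` in the patch handles.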