
[Bug]: Milvus indexing use GPU but searching use CPU #38986

Open
namogg opened this issue Jan 3, 2025 · 12 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@namogg

namogg commented Jan 3, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: milvusdb/milvus:v2.5.1-gpu
- Deployment mode: standalone    
- SDK version: 2.5.2
- OS(Ubuntu or CentOS): Ubuntu 22.04
- CPU/Memory: 33GB
- GPU: RTX 4060Ti

Current Behavior

I have installed Milvus standalone with GPU support. Milvus does indexing on the GPU successfully, but when I try to search, Milvus uses all of my CPU.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@namogg namogg added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2025
@yanliang567
Contributor

@namogg how did you determine that indexing uses the GPU, and that searching uses the CPU?
Please attach the Milvus pod logs for investigation.

@yanliang567 yanliang567 assigned namogg and unassigned yanliang567 Jan 3, 2025
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2025
@xiaofan-luan
Collaborator

Even search on GPU will incur some CPU usage, for example:
filtering
serialization/deserialization
routing and RPC
....

What are your CPU usage and GPU usage separately? And the QPS and data size?
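
For example, a small sampler can log CPU and GPU utilization side by side while the benchmark runs; a minimal sketch, assuming psutil is installed and nvidia-smi is on the PATH (neither is part of this thread's setup):

import subprocess
import time

import psutil  # assumed installed: pip install psutil

def sample_usage(seconds=30, interval=1.0):
    # Print CPU and GPU utilization once per interval while the benchmark runs.
    end = time.time() + seconds
    while time.time() < end:
        cpu = psutil.cpu_percent(interval=interval)  # blocks for `interval`, returns a percentage
        gpu = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout.strip()
        print(f"CPU: {cpu:.0f}%  GPU: {gpu}%")

sample_usage()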

@COCO-hy

COCO-hy commented Jan 4, 2025

I have the same problem; please tell me why.

@namogg
Author

namogg commented Jan 5, 2025

Here is my testing code.

from pymilvus import (
    connections, 
    Collection, 
    CollectionSchema, 
    FieldSchema, 
    DataType
)
import numpy as np
import time
import math

# Step 1: Connect to Milvus
connections.connect(host="127.0.0.1", port="19530")

# Step 2: Define Collection Schema
collection_name = "gpu_performance_test"
dim = 512  # Vector dimension
db_size = 40000
batch_size = 1000  # Batch size for insertion
vectors = np.random.random((db_size, dim)).astype("float32")
# Define fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields, description="Test GPU Collection")


# Step 3: Recreate the collection (drop any stale copy first)
collection = Collection(name=collection_name, schema=schema)
collection.drop()
collection = Collection(name=collection_name, schema=schema)
print(f"Collection {collection_name} created.")

# Step 4: Create IVF-PQ Index
index_params = {
    "index_type": "GPU_IVF_PQ",  # GPU-accelerated index
    "metric_type": "L2",     # Distance metric
    "params": {"nlist": int(math.sqrt(db_size)), "m": 16},  # Index params: nlist and m
}
# index_params = {
#     "metric_type": "L2",
#     "index_type": "GPU_CAGRA",
#     "params": {
#         'intermediate_graph_degree': 64,
#         'graph_degree': 32
#     }
# }
# index_params = {
#     "index_type": "HNSW",  # GPU-accelerated index
#     "metric_type": "L2",     # Distance metric
# }
collection.create_index(field_name="embedding", index_params=index_params)

print("Creating index...")
  # Total number of vectors


# Step 5: Insert Data in Batches
print("Inserting data in batches...")
insert_time_start = time.time()

for i in range(0, db_size, batch_size):
    batch_vectors = vectors[i:i + batch_size]
    collection.insert([batch_vectors])  # Insert the current batch
    print(f"Inserted batch {i // batch_size + 1} with {len(batch_vectors)} vectors.")

insert_time_end = time.time()
print(f"Data insertion completed in {insert_time_end - insert_time_start:.4f} seconds.")

collection.load()
print("Index created successfully.")

# Step 6: Run Search and Measure Performance
# Generate query vectors
num_queries = 500
query_vectors = np.random.random((num_queries, dim)).astype("float32")

# Define search parameters
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 20},
}

# Run search and measure latency
for i in range(10000):
    start_time = time.time()
    results = collection.search(query_vectors, "embedding", param=search_params, limit=10)
    end_time = time.time()

    # Display search results and performance
    print(f"Search completed in {end_time - start_time:.4f} seconds.")
# for i, result in enumerate(results):
#     print(f"Query {i + 1} results:")
#     for hit in result:
#         print(f"  ID: {hit.id}, Distance: {hit.distance}")

# Step 7: Clean Up
print("Performance test completed. Cleaning up...")
collection.drop()
print("Collection dropped.")

GPU usage is around 5-20%.
CPU usage is almost 90%.
Query time for 1,000 vectors is 300 ms, which is too high if the GPU were being used.

@namogg
Author

namogg commented Jan 5, 2025

I have tried increasing the DB size to 1 million, and it is using the GPU now. But is it normal for CPU usage to be this high if I am using the GPU? It also seems to be really slow.

@namogg
Author

namogg commented Jan 5, 2025

@namogg how did you determine that indexing uses the GPU, and that searching uses the CPU? Please attach the Milvus pod logs for investigation.

How do I get the Milvus pod logs?
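
For a Docker Compose standalone deployment, the container logs are what is wanted here; a minimal sketch, assuming the container is named milvus-standalone as in the compose file posted later in this thread:

import subprocess

# Capture the standalone container's logs to a file for attaching to the issue.
# The container name "milvus-standalone" is taken from the compose file below.
with open("milvus.log", "wb") as log_file:
    subprocess.run(
        ["docker", "logs", "milvus-standalone"],
        stdout=log_file,
        stderr=subprocess.STDOUT,
        check=True,
    )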

@namogg
Author

namogg commented Jan 5, 2025

If my DB size is around 1M, Milvus starts to use the GPU after some iterations. If my DB size is 100k, it still uses the CPU all the time. I wonder if there is any mistake in my implementation or config.

@xiaofan-luan
Collaborator

If my DB size is around 1M, Milvus starts to use the GPU after some iterations. If my DB size is 100k, it still uses the CPU all the time. I wonder if there is any mistake in my implementation or config.

Actually, you can run pprof to check your CPU usage flame graph.

But I guess the search is most likely running on the GPU. The reason the GPU usage you see is not high is that the performance bottleneck is not on vector search (maybe RPC and result serdes).

After you increase the data size, the computation becomes heavier.

Also, for NQ=1000, 300 ms might not be a very bad number (I'm assuming CAGRA could do better).

We need more profiling results on a CPU instance.
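
A sketch of pulling a CPU profile, assuming the standalone's HTTP port (9091 in the compose file below) exposes the standard Go pprof endpoints; treat the endpoint as an assumption and adjust host/port to your deployment:

import urllib.request

# Fetch a 30-second CPU profile from the (assumed) Go pprof endpoint on port 9091.
url = "http://localhost:9091/debug/pprof/profile?seconds=30"
with urllib.request.urlopen(url) as resp, open("milvus_cpu.pprof", "wb") as out:
    out.write(resp.read())
# Then inspect with: go tool pprof -http=:8080 milvus_cpu.pprof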

@COCO-hy

COCO-hy commented Jan 6, 2025

I checked my Milvus-standalone Docker container, and I can see the GPU information and the Milvus process running inside. Even though I created a GPU index, my GPU memory does not change at all during queries.
Below is the .yml configuration file used to start my milvus-standalone.

standalone:
  container_name: milvus-standalone
  image: milvusdb/milvus:v2.5.1-gpu
  command: ["milvus", "run", "standalone"]
  security_opt:
    - seccomp:unconfined
  environment:
    MILVUS_WEB_UI_ENABLED: true
    ETCD_ENDPOINTS: etcd:2379
    MINIO_ADDRESS: minio:9000
  volumes:
    - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
  ports:
    - "8888:19530"
    - "9091:9091"
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            capabilities: ['gpu']
            device_ids: ['0']
  depends_on:
    - "etcd"
    - "minio"

attu:
  container_name: attu
  image: zilliz/attu:v2.5.0
  environment:
    MILVUS_URL: standalone:19530
  ports:
    - "7676:3000"
  depends_on:
    - "standalone"

networks:
  default:
    name: milvus

@xiaofan-luan
Collaborator

I checked my Milvus-standalone Docker container, and I can see the GPU information and the Milvus process running inside. Even though I created a GPU index, my GPU memory does not change at all during queries. Below is the .yml configuration file used to start my milvus-standalone.

standalone: container_name: milvus-standalone image: milvusdb/milvus:v2.5.1-gpu command: ["milvus", "run", "standalone"] security_opt: - seccomp:unconfined environment: MILVUS_WEB_UI_ENABLED: true ETCD_ENDPOINTS: etcd:2379 MINIO_ADDRESS: minio:9000 volumes: - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus ports: - "8888:19530" - "9091:9091" deploy: resources: reservations: devices: - driver: nvidia capabilities: ['gpu'] device_ids: ['0'] depends_on: - "etcd" - "minio"

attu: container_name: attu image: zilliz/attu:v2.5.0 environment: MILVUS_URL: standalone:19530 ports: - "7676:3000" depends_on: - "standalone"

networks: default: name: milvus

Because you don't flush your collections.

@xiaofan-luan
Collaborator

Before a segment is sealed, all of its data stays on the CPU, since GPU indexes don't support growing segments.
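
Applied to the reproduction script above, that implies inserting and flushing before building the GPU index, so the search runs against sealed segments; a minimal reordering sketch using the same collection, vectors, and index_params as in that script:

# Insert first, then flush so growing segments are sealed ...
for i in range(0, db_size, batch_size):
    collection.insert([vectors[i:i + batch_size]])
collection.flush()  # seals growing segments so the GPU index can cover them

# ... then build the GPU index over the sealed data and load for search.
collection.create_index(field_name="embedding", index_params=index_params)
collection.load()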

@COCO-hy

COCO-hy commented Jan 6, 2025

Before a segment is sealed, all of its data stays on the CPU, since GPU indexes don't support growing segments.

I have made sure that the collection has been flushed.
