
[Bug]: Milvus indexing use GPU but searching use CPU #38986

Open
namogg opened this issue Jan 3, 2025 · 12 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@namogg

namogg commented Jan 3, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: milvusdb/milvus:v2.5.1-gpu
- Deployment mode: standalone    
- SDK version: 2.5.2
- OS(Ubuntu or CentOS): Ubuntu 22.04
- CPU/Memory: 33GB
- GPU: RTX 4060Ti

Current Behavior

I have installed Milvus standalone with GPU support. Milvus does indexing on the GPU successfully, but when I try to search, Milvus uses all of my CPU.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@namogg namogg added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2025
@yanliang567
Contributor

@namogg how did you determine that indexing uses the GPU, and that searching uses the CPU?
Please attach the Milvus pod logs for investigation.

@yanliang567 yanliang567 assigned namogg and unassigned yanliang567 Jan 3, 2025
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2025
@xiaofan-luan
Collaborator

Even search on GPU will incur some CPU usage, for example:
filtering
serialization/deserialization
routing and RPC
....

What are your CPU usage and GPU usage separately? And the QPS and data size?
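
For example, a small sampler can log CPU and GPU utilization side by side while the benchmark runs; a minimal sketch, assuming psutil is installed and nvidia-smi is on the PATH (neither is part of this thread's setup):

import subprocess
import time

import psutil  # assumed installed: pip install psutil

def sample_usage(seconds=30, interval=1.0):
    # Print CPU and GPU utilization once per interval while the benchmark runs.
    end = time.time() + seconds
    while time.time() < end:
        cpu = psutil.cpu_percent(interval=interval)  # blocks for `interval`, returns a percentage
        gpu = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout.strip()
        print(f"CPU: {cpu:.0f}%  GPU: {gpu}%")

sample_usage()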

@COCO-hy

COCO-hy commented Jan 4, 2025

I have the same problem; please tell me why.

@namogg
Author

namogg commented Jan 5, 2025

Here is my testing code.

from pymilvus import (
    connections, 
    Collection, 
    CollectionSchema, 
    FieldSchema, 
    DataType
)
import numpy as np
import time
import math

# Step 1: Connect to Milvus
connections.connect(host="127.0.0.1", port="19530")

# Step 2: Define Collection Schema
collection_name = "gpu_performance_test"
dim = 512  # Vector dimension
db_size = 40000
batch_size = 1000  # Batch size for insertion
vectors = np.random.random((db_size, dim)).astype("float32")
# Define fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields, description="Test GPU Collection")


# Step 3: Recreate the collection (drop any stale copy first)
collection = Collection(name=collection_name, schema=schema)
collection.drop()
collection = Collection(name=collection_name, schema=schema)
print(f"Collection {collection_name} created.")

# Step 4: Create IVF-PQ Index
index_params = {
    "index_type": "GPU_IVF_PQ",  # GPU-accelerated index
    "metric_type": "L2",     # Distance metric
    "params": {"nlist": int(math.sqrt(db_size)), "m": 16},  # Index params: nlist and m
}
# index_params = {
#     "metric_type": "L2",
#     "index_type": "GPU_CAGRA",
#     "params": {
#         'intermediate_graph_degree': 64,
#         'graph_degree': 32
#     }
# }
# index_params = {
#     "index_type": "HNSW",  # GPU-accelerated index
#     "metric_type": "L2",     # Distance metric
# }
collection.create_index(field_name="embedding", index_params=index_params)

print("Creating index...")
  # Total number of vectors


# Step 5: Insert Data in Batches
print("Inserting data in batches...")
insert_time_start = time.time()

for i in range(0, db_size, batch_size):
    batch_vectors = vectors[i:i + batch_size]
    collection.insert([batch_vectors])  # Insert the current batch
    print(f"Inserted batch {i // batch_size + 1} with {len(batch_vectors)} vectors.")

insert_time_end = time.time()
print(f"Data insertion completed in {insert_time_end - insert_time_start:.4f} seconds.")

collection.load()
print("Index created successfully.")

# Step 6: Run Search and Measure Performance
# Generate query vectors
num_queries = 500
query_vectors = np.random.random((num_queries, dim)).astype("float32")

# Define search parameters
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 20},
}

# Run search and measure latency
for i in range(10000):
    start_time = time.time()
    results = collection.search(query_vectors, "embedding", param=search_params, limit=10)
    end_time = time.time()

    # Display search results and performance
    print(f"Search completed in {end_time - start_time:.4f} seconds.")
# for i, result in enumerate(results):
#     print(f"Query {i + 1} results:")
#     for hit in result:
#         print(f"  ID: {hit.id}, Distance: {hit.distance}")

# Step 7: Clean Up
print("Performance test completed. Cleaning up...")
collection.drop()
print("Collection dropped.")

GPU usage is around 5-20%.
CPU usage is almost 90%.
Query time for 1,000 vectors is 300 ms, which is too high if the GPU were being used.

@namogg
Author

namogg commented Jan 5, 2025

I have tried increasing the DB size to 1 million, and it is using the GPU now. But is it normal for CPU usage to be this high if I am using the GPU? It also seems to be really slow.

@namogg
Author

namogg commented Jan 5, 2025

@namogg how did you determine that indexing uses the GPU, and that searching uses the CPU? Please attach the Milvus pod logs for investigation.

How do I get the Milvus pod logs?
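
For a Docker Compose standalone deployment, the container logs are what is wanted here; a minimal sketch, assuming the container is named milvus-standalone as in the compose file posted later in this thread:

import subprocess

# Capture the standalone container's logs to a file for attaching to the issue.
# The container name "milvus-standalone" is taken from the compose file below.
with open("milvus.log", "wb") as log_file:
    subprocess.run(
        ["docker", "logs", "milvus-standalone"],
        stdout=log_file,
        stderr=subprocess.STDOUT,
        check=True,
    )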

@namogg
Author

namogg commented Jan 5, 2025

If my DB size is around 1M, Milvus starts to use the GPU after some iterations. If my DB size is 100k, it still uses the CPU all the time. I wonder if there is any mistake in my implementation or config.

@xiaofan-luan
Collaborator

If my DB size is around 1M, Milvus starts to use the GPU after some iterations. If my DB size is 100k, it still uses the CPU all the time. I wonder if there is any mistake in my implementation or config.

Actually, you can run pprof to check your CPU usage flame graph.

But I guess the search is most likely running on the GPU. The reason the GPU usage you see is not high is that the performance bottleneck is not on vector search (maybe RPC and result serdes).

After you increase the data size, the computation becomes heavier.

Also, for NQ=1000, 300 ms might not be a very bad number (I'm assuming CAGRA could do better).

We need more profiling results on a CPU instance.
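
A sketch of pulling a CPU profile, assuming the standalone's HTTP port (9091 in the compose file below) exposes the standard Go pprof endpoints; treat the endpoint as an assumption and adjust host/port to your deployment:

import urllib.request

# Fetch a 30-second CPU profile from the (assumed) Go pprof endpoint on port 9091.
url = "http://localhost:9091/debug/pprof/profile?seconds=30"
with urllib.request.urlopen(url) as resp, open("milvus_cpu.pprof", "wb") as out:
    out.write(resp.read())
# Then inspect with: go tool pprof -http=:8080 milvus_cpu.pprof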

@COCO-hy

COCO-hy commented Jan 6, 2025

I checked my Milvus-standalone Docker container, and I can see the GPU information and the Milvus process running inside. Even though I created a GPU index, my GPU memory does not change at all during queries.
Below is the .yml configuration file used to start my milvus-standalone.

standalone:
  container_name: milvus-standalone
  image: milvusdb/milvus:v2.5.1-gpu
  command: ["milvus", "run", "standalone"]
  security_opt:
    - seccomp:unconfined
  environment:
    MILVUS_WEB_UI_ENABLED: true
    ETCD_ENDPOINTS: etcd:2379
    MINIO_ADDRESS: minio:9000
  volumes:
    - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
  ports:
    - "8888:19530"
    - "9091:9091"
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            capabilities: ['gpu']
            device_ids: ['0']
  depends_on:
    - "etcd"
    - "minio"

attu:
  container_name: attu
  image: zilliz/attu:v2.5.0
  environment:
    MILVUS_URL: standalone:19530
  ports:
    - "7676:3000"
  depends_on:
    - "standalone"

networks:
  default:
    name: milvus

@xiaofan-luan
Collaborator

I checked my Milvus-standalone Docker container, and I can see the GPU information and the Milvus process running inside. Even though I created a GPU index, my GPU memory does not change at all during queries. Below is the .yml configuration file used to start my milvus-standalone.

standalone: container_name: milvus-standalone image: milvusdb/milvus:v2.5.1-gpu command: ["milvus", "run", "standalone"] security_opt: - seccomp:unconfined environment: MILVUS_WEB_UI_ENABLED: true ETCD_ENDPOINTS: etcd:2379 MINIO_ADDRESS: minio:9000 volumes: - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus ports: - "8888:19530" - "9091:9091" deploy: resources: reservations: devices: - driver: nvidia capabilities: ['gpu'] device_ids: ['0'] depends_on: - "etcd" - "minio"

attu: container_name: attu image: zilliz/attu:v2.5.0 environment: MILVUS_URL: standalone:19530 ports: - "7676:3000" depends_on: - "standalone"

networks: default: name: milvus

Because you don't flush your collections.

@xiaofan-luan
Collaborator

Before a segment is sealed, all of its data stays on the CPU, since GPU indexes don't support growing segments.
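
Applied to the reproduction script above, that implies inserting and flushing before building the GPU index, so the search runs against sealed segments; a minimal reordering sketch using the same collection, vectors, and index_params as in that script:

# Insert first, then flush so growing segments are sealed ...
for i in range(0, db_size, batch_size):
    collection.insert([vectors[i:i + batch_size]])
collection.flush()  # seals growing segments so the GPU index can cover them

# ... then build the GPU index over the sealed data and load for search.
collection.create_index(field_name="embedding", index_params=index_params)
collection.load()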

@COCO-hy

COCO-hy commented Jan 6, 2025

Before a segment is sealed, all of its data stays on the CPU, since GPU indexes don't support growing segments.

I have made sure that the collection has been flushed.
