CUDA error - not sure of solution #54

Open
JGBurgess1 opened this issue Nov 1, 2023 · 3 comments

Comments

@JGBurgess1

Dear Uni-Dock developers,

Thanks for producing this software for us to use!
I really appreciate that.

We've run into an error when using multiple ligands as input parameters:

"....
.....
Computing Vina grid ... done.
Total ligands: 283
Batch 1 size: 283
> CUDA error at /apps/chpc/bio/Uni-Dock/unidock/src/cuda/precalculate.cu:198 code=34(cudaErrorStubLibrary) "cudaMalloc(&atom_xs_gpu, thread * max_atom_num * sizeof(sz))"

Do you know what is happening?

By the way, I think we have NVIDIA V100 16GB cards; will this cause an issue? I saw in your code that you check for the 32GB V100 when determining how much memory to allocate. Is this part of the problem?
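
(For context: I understand the free and total device memory can also be queried at runtime with cudaMemGetInfo rather than inferred from the card model; a minimal sketch, assuming only the standard CUDA runtime API, and not Uni-Dock's actual allocation code:)

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        size_t free_bytes = 0, total_bytes = 0;
        // Reports free and total memory on the current device, so a
        // 16GB and a 32GB V100 can be told apart without hard-coding.
        if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
            printf("cudaMemGetInfo failed\n");
            return 1;
        }
        printf("free: %zu MiB, total: %zu MiB\n",
               free_bytes >> 20, total_bytes >> 20);
        return 0;
    }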

Regards

Jeremy

@caic99
Member

caic99 commented Nov 1, 2023

Hi Jeremy @JGBurgess1,
I noticed that you are hitting code=34(cudaErrorStubLibrary). This indicates that your CUDA library or driver is not installed correctly. Run nvidia-smi to check whether the system can use the GPU.
A V100 16GB should work. Please let us know if you encounter CUDA out-of-memory errors.
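
For reference, cudaErrorStubLibrary is typically returned when a binary was linked against the stub libcuda.so that ships with the CUDA toolkit (under lib64/stubs) instead of the real driver library. A minimal probe, not part of Uni-Dock, that surfaces the same error code on an affected machine:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        // Any runtime call that has to reach the driver fails with
        // cudaErrorStubLibrary (34) when the stub libcuda.so was linked
        // instead of the driver installed on the node.
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            printf("code=%d(%s) \"%s\"\n", (int)err, cudaGetErrorName(err),
                   cudaGetErrorString(err));
            return 1;
        }
        printf("found %d CUDA device(s)\n", count);
        return 0;
    }

Build it with nvcc and run it on the GPU node; if it prints a device count, the driver linkage is fine.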

@JGBurgess1
Author

JGBurgess1 commented Nov 2, 2023

I ran nvidia-smi on the GPU node:

Wed Nov  1 21:03:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                  Off |
| N/A   32C    P0    34W / 250W |      0MiB / 16384MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:AF:00.0 Off |                  Off |
| N/A   53C    P0   191W / 250W |   1392MiB / 16384MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  Off  | 00000000:D8:00.0 Off |                  Off |
| N/A   45C    P0   119W / 250W |    378MiB / 16384MiB |     72%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A    340553      C   ...u/amber/18/bin/pmemd.cuda     1388MiB |
|    2   N/A  N/A    325488      C   ...pu/gromacs/2020.1/bin/gmx      374MiB |
+-----------------------------------------------------------------------------+

Does this help answer the question?

Regards,

Jeremy

@caic99
Member

caic99 commented Nov 2, 2023

@JGBurgess1
I guess you are using a cluster with Slurm/PBS? If so, the login node likely has the CUDA toolkit but no NVIDIA driver, so the build links against the stub library. Please try compiling and running Uni-Dock on the GPU nodes.
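
A quick way to confirm (a sketch using the standard runtime API, not a Uni-Dock tool): build the snippet below on the login node, then run it on the GPU node. A driver version of 0, or an error, suggests no usable driver was loaded at link time.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int driver = 0, runtime = 0;
        // Version reported by the driver library the binary actually
        // loaded; 0 indicates no usable driver (e.g. the stub library).
        cudaDriverGetVersion(&driver);
        // Version of the CUDA runtime the binary was built against.
        cudaRuntimeGetVersion(&runtime);
        printf("driver API version: %d, runtime version: %d\n", driver, runtime);
        return 0;
    }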
