I have searched the Issue Tracker and confirmed that this hasn't already been reported. (Comment there if it has.)
I have tried the latest version of nvitop in a new isolated virtual environment.
What version of nvitop are you using?
1.3.2
Operating system and version
Ubuntu 22.04
NVIDIA driver version
560.35.03
NVIDIA-SMI
$ nvidia-smi -i 0
Tue Nov 19 14:59:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:1D:00.0 Off |                  N/A |
| 30%   28C    P8             31W /  350W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Python environment
$ python3 -m pip freeze | python3 -c 'import sys; print(sys.version, sys.platform); print("".join(filter(lambda s: any(word in s.lower() for word in ("nvi", "cuda", "nvml", "gpu")), sys.stdin)))'
3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] linux
gpustat==1.1.1
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.535.108
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
nvitop==1.3.2
Problem description
nvitop exits with a segmentation fault when one of the GPUs is lost from the bus.
First of all, this is not a problem with nvitop itself.
I ran into this and would like to suggest that nvitop keep displaying the other GPUs when one GPU is faulty, instead of crashing with a segmentation fault.
It would be nice if nvitop could skip the faulty GPU (like gpustat).
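To make the suggestion concrete, here is a rough sketch of one way a tool could work out which GPUs are safe to query. Because the crash happens inside libnvidia-ml, a Python `try`/`except` cannot catch it, so the sketch probes each index in a short-lived child process and keeps only the indices whose probe exits cleanly. This is not how nvitop or gpustat work internally. `usable_gpu_indices` and `PROBE_SNIPPET` are hypothetical names, and the probe assumes the `pynvml` module from `nvidia-ml-py` is importable.

```python
import subprocess
import sys

# Hypothetical probe, run in a child interpreter. It assumes the `pynvml`
# module (from nvidia-ml-py) is installed. If NVML crashes while touching
# the device, only the child process dies.
PROBE_SNIPPET = (
    "import sys, pynvml; "
    "pynvml.nvmlInit(); "
    "handle = pynvml.nvmlDeviceGetHandleByIndex(int(sys.argv[1])); "
    "pynvml.nvmlDeviceGetMemoryInfo(handle); "
    "pynvml.nvmlShutdown()"
)


def usable_gpu_indices(indices, snippet=PROBE_SNIPPET, timeout=10.0):
    """Return the indices whose probe process exited with status 0."""
    healthy = []
    for index in indices:
        try:
            result = subprocess.run(
                [sys.executable, "-c", snippet, str(index)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A probe that hangs is treated the same as one that crashes.
            continue
        # A non-zero status covers Python exceptions (for example an NVML
        # error) as well as death by signal, such as a segfault.
        if result.returncode == 0:
            healthy.append(index)
    return healthy
```

A caller would pass the candidate indices, for example `usable_gpu_indices(range(4))`, and display only the returned ones. I have not checked that a per-device probe isolates the crash reported here. If NVML already segfaults during initialization, every probe would fail and the list would come back empty.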
Steps to Reproduce
Unplug a GPU from the PCIe bus so that it is lost from the bus. (I don't know how to trigger this deliberately.)
nvitop
Traceback
nvitop[2398212]: segfault at 0 ip 00007f2113c7128b sp 00007ffc6e223820 error 4 in libnvidia-ml.so.560.35.03[7f2113c00000+1d3000]
Logs
No response
Expected behavior
It would be nice if nvitop could skip the faulty GPU.
For example, gpustat can still show the faulty GPU.
Additional context
No response