
Support for AMD ROCm devices #123

Closed


Junyi-99

Issue Type

  • Feature implementation

Description

I've implemented ROCm support in nvitop, enabling it to run on AMD GPUs. It has been tested on MI50, MI100, and MI210 machines, and I have confirmed that full functionality on NVIDIA GPUs is preserved.

Motivation and Context

I really need nvitop on AMD GPUs.

#74

Testing

Tested on:

  • MI50
  • MI100
  • MI210

Images / Videos

Screenshot on MI100 (top: nvitop; bottom-left: rocm-smi; bottom-right: PyTorch code)

@XuehaiPan XuehaiPan self-assigned this Mar 13, 2024
@XuehaiPan XuehaiPan added enhancement New feature or request api Something related to the core APIs labels Mar 13, 2024
@XuehaiPan XuehaiPan linked an issue Mar 13, 2024 that may be closed by this pull request
@XuehaiPan
Owner

XuehaiPan commented Mar 15, 2024

Hi @Junyi-99, thanks for the contribution! Is there any PyPI package that provides the ROCm-SMI bindings like nvidia-ml-py for the NVIDIA NVML library? Maybe we should ship the ROCm support with:

pip3 install nvitop[rocm]
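One way an optional extra like `nvitop[rocm]` could be wired up is to choose the vendor backend at runtime, based on which binding is importable. This is a minimal sketch, assuming the NVIDIA binding is importable as `pynvml` (the module shipped by `nvidia-ml-py`) and the AMD one as `amdsmi`; the function `detect_backend` and its priority order are hypothetical and not part of nvitop.

```python
import importlib.util
from typing import Iterable, Optional, Tuple

# Candidate bindings in priority order: (backend label, importable module name).
# Hypothetical table for illustration; nvitop does not define it.
_BACKENDS: Tuple[Tuple[str, str], ...] = (
    ("nvml", "pynvml"),    # provided by the nvidia-ml-py package
    ("amdsmi", "amdsmi"),  # AMD SMI Python bindings, e.g. pulled in by a `rocm` extra
)


def detect_backend(candidates: Iterable[Tuple[str, str]] = _BACKENDS) -> Optional[str]:
    """Return the label of the first backend whose module is installed, or None."""
    for label, module_name in candidates:
        # find_spec locates a top-level module without importing it, so a
        # missing optional dependency is detected cheaply and without errors.
        if importlib.util.find_spec(module_name) is not None:
            return label
    return None


if __name__ == "__main__":
    print(detect_backend() or "no GPU management library found")
```

Finding the module only shows that the binding is installed; the chosen backend's own initialization (for example `pynvml.nvmlInit()`) would still have to succeed on the machine before any GPU data can be read.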

@Junyi-99
Author

Oh, I think shipping through nvitop[rocm] is a very good suggestion. Currently there is a ROCm binding, but it is not very functional.

@XuehaiPan XuehaiPan force-pushed the main branch 3 times, most recently from 4e67ba0 to 6bc8a8b July 4, 2024 08:57
@hartmark

hartmark commented Aug 8, 2024

+1, I'd love to have this support. How is the development going?

@kswain98

kswain98 commented Aug 8, 2024

+1, it would be great to have this for the MI300X.

@unclemusclez

unclemusclez commented Aug 22, 2024

Trying this now with Hugging Face AutoTrain on an AMD Radeon 7900XT (Navi31, gfx1100), installed via pip install git+https://github.com/XuehaiPan/nvitop.git

I still receive the errors:

Your installed package `nvidia-ml-py` is corrupted. Skip patch functions `nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses`. You may get incorrect or incomplete results. Please consider reinstall package `nvidia-ml-py` via `pip3 install --force-reinstall nvidia-ml-py nvitop`.
Your installed package `nvidia-ml-py` is corrupted. Skip patch functions `nvmlDeviceGetMemoryInfo`. You may get incorrect or incomplete results. Please consider reinstall package `nvidia-ml-py` via `pip3 install --force-reinstall nvidia-ml-py nvitop`.

@dmitrii-galantsev

@Junyi-99 Would it be possible to use the rocmsmi repo as a submodule instead?
Are there any modifications beyond formatting?

Also, please note that we're working on migrating to AMDSMI, and long term it would be much better to use that :).
ROCMSMI will eventually be deprecated.

In fact, RDC migrated to amdsmi fairly recently.

Cheers!
-- Dev from SMI team at AMD.

@dmitrii-galantsev

Is there any PyPI package that provides the ROCm-SMI bindings like nvidia-ml-py for the NVIDIA NVML library?

@XuehaiPan This is planned for amdsmi :)

@dmitrii-galantsev

Some more info:

  • You can build and install amdsmi python package fairly easily.
# if on ubuntu get dependencies:
# sudo apt install git python3 python3-pip cmake clang build-essential pkg-config libdrm-dev
git clone https://github.com/ROCm/amdsmi &&
cd amdsmi &&
cmake -B build &&
make -C build -j $(nproc) &&
cd build/py-interface/python_package &&
python3 -m pip install .

Now you should be able to use the api: https://github.com/ROCm/amdsmi/tree/amd-staging/py-interface#usage

  • amd-smi process returns some useful info. Here is me running rocm-validation-suite in the background on dual NV21s:
$ amd-smi process
GPU: 0
    PROCESS_INFO:
        NAME: rvs
        PID: 468813
        MEMORY_USAGE:
            GTT_MEM: 2.1 MB
            CPU_MEM: 253.1 MB
            VRAM_MEM: 1.1 GB
        MEM_USAGE: 1.4 GB
        USAGE:
            GFX: 0 ns
            ENC: 0 ns

GPU: 1
    PROCESS_INFO:
        NAME: rvs
        PID: 468813
        MEMORY_USAGE:
            GTT_MEM: 2.1 MB
            CPU_MEM: 253.1 MB
            VRAM_MEM: 1.1 GB
        MEM_USAGE: 1.4 GB
        USAGE:
            GFX: 0 ns
            ENC: 0 ns
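For reference, reading GPU data through the amdsmi Python package could look like the sketch below. It follows the usage pattern of the py-interface README linked above (initialize, get processor handles, query, shut down). The exact call names and the `vram_total`/`vram_used` keys are assumptions to verify against the installed amdsmi version, and `gpu_vram_usage` is a hypothetical helper, not part of nvitop or amdsmi.

```python
from typing import Dict, List

# The amdsmi package is only present after the build/install steps above;
# fall back gracefully when it is missing.
try:
    import amdsmi
except ImportError:
    amdsmi = None


def gpu_vram_usage() -> List[Dict[str, int]]:
    """Return one dict per AMD GPU with its VRAM usage, or [] if amdsmi is missing."""
    if amdsmi is None:
        return []
    amdsmi.amdsmi_init()
    try:
        usage = []
        for index, handle in enumerate(amdsmi.amdsmi_get_processor_handles()):
            # Assumed (per the README) to return a dict with these two keys.
            vram = amdsmi.amdsmi_get_gpu_vram_usage(handle)
            usage.append({"gpu": index, "total": vram["vram_total"], "used": vram["vram_used"]})
        return usage
    finally:
        # Release the library even if one of the queries raises.
        amdsmi.amdsmi_shut_down()


if __name__ == "__main__":
    for row in gpu_vram_usage():
        print(row)
```

Guarding the import keeps a tool usable on machines without the AMD bindings, which fits the idea of shipping ROCm support as an optional extra.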

@unclemusclez

Does this work on WSL2?


@dmitrii-galantsev

@unclemusclez AFAIK, no.
SMI needs access to the amdgpu driver.
Rule of thumb: if /sys/class/drm/card*/device/gpu_metrics exists, SMI will work.
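The rule of thumb above is easy to script. This is a minimal sketch that only checks for the sysfs file named in the comment; it does not load any SMI library, and the helper name `gpu_metrics_files` is hypothetical.

```python
import glob
import os
from typing import List

# Rule of thumb from the comment above: a gpu_metrics file under a DRM card
# means the amdgpu driver is exposing metrics, so SMI should work.
GPU_METRICS_GLOB = "/sys/class/drm/card*/device/gpu_metrics"


def gpu_metrics_files(pattern: str = GPU_METRICS_GLOB) -> List[str]:
    """Return the sorted, de-duplicated gpu_metrics paths matching `pattern`."""
    # Resolve symlinks so that sysfs entries pointing at the same device
    # are reported only once.
    return sorted({os.path.realpath(p) for p in glob.glob(pattern)})


if __name__ == "__main__":
    found = gpu_metrics_files()
    if found:
        print("gpu_metrics found, SMI should work:")
        for path in found:
            print("  " + path)
    else:
        print("no gpu_metrics file found, SMI is unlikely to work here")
```

In an environment where the amdgpu driver is not exposed through sysfs, as the answer above suggests for WSL2, the list would be expected to come back empty.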

@Junyi-99
Author

Junyi-99 commented Sep 5, 2024

@dmitrii-galantsev I'll try it this weekend.

@ehartford

ehartford commented Nov 27, 2024

Hello, what happened to this PR? I would really like to have nvitop on AMD GPUs.

Labels
api Something related to the core APIs enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[Feature Request] Add support to AMD's ROCm GPU
7 participants