ROCm/HIP #452

Draft
wants to merge 1 commit into
base: master
189 changes: 189 additions & 0 deletions docs/user/software/development/rocm.md
@@ -0,0 +1,189 @@
---
title: ROCm/HIP
summary: A quick guide to getting set up for ROCm/HIP development on Solus
---

# ROCm/HIP

ROCm is AMD's open-source software stack for GPU computation.

Note that ROCm is not required for, say, your display or browser to
use GPU-acclerated rendering. These are more on the driver side of things and

Member

Suggested change
use GPU-acclerated rendering. These are more on the driver side of things and
use GPU-accelerated rendering. These are more on the driver side of things and

Other than this, spelling checks out

are
handled by the kernel and/or Mesa. ROCm is mainly focused on GPU-accelerated
computing, such as GPU rendering in Blender or GPU-accelerated machine learning
in PyTorch.

## Install ROCm/HIP

Contributor

Add an introductory sentence to this block:

Suggested change
To install ROCm, execute the following command:

```bash
sudo eopkg it rocm-hip rocm-opencl
```

If you are also developing with ROCm/HIP, install the
development files and the `hipcc` compiler driver as well:
```bash
sudo eopkg it rocm-hip-devel
```

## Necessary Environment Variables

It is recommended and safe to put these environment variables in your
`~/.bashrc`:
```bash
export ROCM_PATH=/usr
export HIP_PATH=/usr
```

If you're developing with ROCm/HIP, the following environment variables will
save you a lot of hassle:
```bash
export HIP_DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
export DEVICE_LIB_PATH=$HIP_DEVICE_LIB_PATH
export HIP_PLATFORM=amd
export HIP_RUNTIME=amd
export HIP_ROCCLR_HOME=$ROCM_PATH
```
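
To confirm that the variables above actually reach child processes, a small
helper along these lines may be useful. This is only a sketch: the function
name `check_rocm_env` is made up for illustration and is not part of ROCm or
Solus. It only checks that each variable is exported, not that its value is
correct.

```bash
# Hypothetical helper: report which variables from this page are not exported.
# Prints one line per missing variable and returns non-zero if any are missing.
check_rocm_env() {
    missing=0
    for name in ROCM_PATH HIP_PATH HIP_DEVICE_LIB_PATH DEVICE_LIB_PATH \
                HIP_PLATFORM HIP_RUNTIME HIP_ROCCLR_HOME; do
        # printenv only sees exported variables, which is what programs inherit.
        if ! printenv "$name" >/dev/null; then
            echo "unset: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

After adding the function to your `~/.bashrc` and opening a new shell, running
`check_rocm_env` should print nothing if every variable is exported.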

## Supported Hardware and GPU Architectures

<!--
ROCm is designed such that, for compiled binaries to run on a
certain GPU model, that GPU must be specified as a compilation target
at build time.
!-->

ROCm is designed such that in order for a library to support N different GPU
Contributor

This is a wall of text, nobody will read this.

With that being said, is it even necessary to mention all of this? I mean, for end users what matters is whether the GPU architecture is supported or not: If it's supported, they'll continue their day without reading everything; if it's not supported, they'll reach out using the available support channels.

Suggested change
ROCm is designed such that in order for a library to support N different GPU
:::tip
Install and run `rocminfo` to see your GPU architecture.
:::
The ROCm package included in Solus supports the following GPU architectures out of the box:
- `gfx803`
- `gfx900`
- `gfx906`
- `gfx908`
- `gfx90a`
- `gfx1010`. This architecture can run programs compiled for `gfx101*` GPUs such as `gfx1011` and `gfx1012`, see [Emulating
as a Supported Architecture](#emulating-as-a-supported-architecture).
- `gfx1030`. This architecture can run programs compiled for `gfx103*` GPUs such as `gfx1031` and `gfx1032`, see [Emulating
as a Supported Architecture](#emulating-as-a-supported-architecture).
- `gfx1010`
- `gfx1011`
- `gfx1012`
If your GPU is not on the list, please open an issue in our [issue tracker](https://github.com/getsolus/packages/issues/new/choose) with your GPU model and release year.
This list is only the minimum supported architectures. Some packages like [Blender](#blender) support more architectures.

architectures, that library must be compiled N times, once for each
architecture, so the build time of a package grows linearly with the number of
architectures. For example, if we want PyTorch to support running on 5 different
GPU architectures, we essentially need to compile PyTorch 5 times. This quickly
becomes a maintenance burden as the number of GPU models we want to support
increases.

Therefore, we have carefully picked the following baseline architectures to
support as much reasonably recent hardware as possible without causing
compilation times to skyrocket. Any GPU architecture in the list below should
work out of the box.

- `gfx803`
- `gfx900`
- `gfx906`
- `gfx908`
- `gfx90a`
- `gfx1010`; for `gfx101*` GPUs such as `gfx1011` and `gfx1012`, see the [Emulating
as a Supported Architecture](#emulating-as-a-supported-architecture) section.
- `gfx1030`; for `gfx103*` GPUs such as `gfx1031` and `gfx1032`, see the [Emulating
as a Supported Architecture](#emulating-as-a-supported-architecture) section.
- `gfx1010`
- `gfx1011`
- `gfx1012`

:::tip
Contributor

Move this note to the top of the section (See my first comment for more info)


Run `rocminfo`, provided by the `rocminfo` package, to
see which architecture your GPU(s) use.

:::

:::note
Contributor

This shouldn't be a note (See my first comment for more info)


This list covers only the minimum set of supported architectures. Some packages
like [Blender](#blender) are built with support for even more architectures.

:::

If your GPU model is not on the list, please open an issue in
Contributor

Move this right after the GPU list.

our [Issue Tracker] with your GPU model and the year in which it was
released.
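
The check described above can be sketched in shell. The names `BASELINE_ARCHS`,
`list_gfx_archs`, and `is_baseline_arch` are hypothetical helpers, not part of
any package, and the sketch assumes that `rocminfo` prints architecture names
in the `gfx…` form used on this page.

```bash
# Baseline architectures, copied from the list in this section.
BASELINE_ARCHS="gfx803 gfx900 gfx906 gfx908 gfx90a gfx1010 gfx1030 gfx1011 gfx1012"

# Read rocminfo-style text on stdin and print each distinct gfx name once.
list_gfx_archs() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Succeed if the given architecture name is in the baseline list.
is_baseline_arch() {
    case " $BASELINE_ARCHS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With these defined, `rocminfo | list_gfx_archs` prints the architectures found
on your system, and `is_baseline_arch gfx1030 && echo supported` reports
whether one of them is in the baseline list.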

### Emulating as a Supported Architecture

Several GPU architectures, such as those in the Navi 1 family, have
almost identical (if not exactly identical) ISA that allows a program compiled for
Contributor

Simplify this line:

Suggested change
almost identical (if not exactly identical) ISA that allows a program compiled for
similar ISAs. This allows programs compiled for

one architecture to run seamlessly on other.
Contributor

Avoid excessive claims

Suggested change
one architecture to run seamlessly on other.
one architecture to run on another.

For example, any program compiled for the `gfx1030` architecture can also run on
the `gfx1031` and `gfx1032` architectures. A list of such architectures is
Contributor

In general, it's not a good idea to say things like "the previous section" or "the previous step". Instead, refer to the specific section name and include a link to it.

Suggested change
the `gfx1031` and `gfx1032` architectures. A list of such architectures is
the `gfx1031` and `gfx1032` architectures. See the _Supported Hardware and GPU Architectures_ list for the supported architectures.

listed in the previous section.

To emulate your GPU as a supported architecture, the environment variable
Contributor

Avoid passive voice:

Suggested change
To emulate your GPU as a supported architecture, the environment variable
To emulate your GPU as a supported architecture, add the `HSA_OVERRIDE_GFX_VERSION` environment variable to your system. For example:

`HSA_OVERRIDE_GFX_VERSION` must be specified. Examples:

Emulating as `gfx1030`:
```bash
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```

Emulating as `gfx1010`:
```bash
export HSA_OVERRIDE_GFX_VERSION=10.1.0
```

Emulating as `gfx900`:
```bash
export HSA_OVERRIDE_GFX_VERSION=9.0.0
```
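
The three examples above follow one pattern: the last digit of the `gfx` name
becomes the third component, the digit before it becomes the second, and the
remaining digits become the first. A sketch of that conversion is shown below.
The function name `gfx_to_override` is hypothetical, and the rule is an
assumption inferred from the examples on this page; names containing letters,
such as `gfx90a`, are deliberately rejected.

```bash
# Hypothetical helper: turn a name such as gfx1030 into the dotted form 10.3.0.
# Only all-decimal names with at least three digits are handled.
gfx_to_override() {
    digits=${1#gfx}
    case "$digits" in
        "$1" | "" | ? | ?? | *[!0-9]*)
            # No gfx prefix, too short, or contains a non-decimal character.
            echo "cannot convert: $1" >&2
            return 1 ;;
    esac
    rest=${digits%?}              # everything except the last digit
    stepping=${digits#"$rest"}    # last digit
    major=${rest%?}               # leading digits
    minor=${rest#"$major"}        # second-to-last digit
    echo "$major.$minor.$stepping"
}
```

For example, `export HSA_OVERRIDE_GFX_VERSION=$(gfx_to_override gfx1030)` sets
the same value as the first example above.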

## Specifying Which GPU to Use

Sometimes, it may be hard or impossible to tell your program to use the GPU
that you want. This does not only happen on systems with multiple GPUs; it can
also happen when your CPU is made by AMD and has an
integrated GPU. You can check whether your CPU has usable integrated graphics
by running `linux-driver-management status`. If your CPU has
integrated graphics and you have turned on switchable/hybrid graphics in your
BIOS, you may see something like the following:
```
Hybrid Graphics
╒ Primary GPU (iGPU)
╞ Device Name : Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
╞ Manufacturer : Advanced Micro Devices, Inc. [AMD/ATI]
╞ Product ID : 0x1669
╞ Vendor ID : 0x1002
╞ X.Org PCI ID : PCI:7:0:0
╘ Boot VGA : yes
╒ Secondary GPU (dGPU)
╞ Device Name : Navi 23 [Radeon RX 6600/6600 XT/6600M]
╞ Manufacturer : Advanced Micro Devices, Inc. [AMD/ATI]
╞ Product ID : 0x73ab
╞ Vendor ID : 0x1002
╞ X.Org PCI ID : PCI:2:0:0
╘ Boot VGA : no
```

ROCm/HIP offers the environment variable `HIP_VISIBLE_DEVICES` to control which
GPUs are visible to a process from the ROCm/HIP API. Only devices whose index
is present in the sequence are visible to HIP. For example, `export
HIP_VISIBLE_DEVICES=0` makes only the GPU with device index 0 visible, and
`export HIP_VISIBLE_DEVICES=0,2` makes only the GPUs with device indices 0 and 2
visible.

:::caution

The device index is **NOT** its agent number in the output of `rocminfo`! You
can find your device's corresponding index through the output of `rocm-smi`,
provided by the `rocm-smi` package.

:::

:::note

As suggested by its name, `HIP_VISIBLE_DEVICES` only hides the GPU from the
ROCm/HIP side. A program can still access GPUs hidden by `HIP_VISIBLE_DEVICES`
by calling other graphics APIs such as OpenGL.

:::
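
Exporting `HIP_VISIBLE_DEVICES` affects every program started from that shell.
To limit the setting to a single command instead, a wrapper like the following
may help. This is a minimal sketch; the function name `with_hip_devices` is
hypothetical and not provided by any ROCm package.

```bash
# Hypothetical helper: run one command with only the listed HIP device
# indices visible. Usage: with_hip_devices 0,2 some-command [args...]
with_hip_devices() {
    devices=$1
    shift
    # env sets the variable for this one command only; the calling shell
    # keeps its own value (or lack of one).
    env HIP_VISIBLE_DEVICES="$devices" "$@"
}
```

For example, `with_hip_devices 0 ./my-hip-program` (where `./my-hip-program`
stands for your own executable) would expose only the GPU with device index 0
to that program.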

## Software-Specific Instructions

### Blender

### PyTorch


## Reporting an Issue

