
Add sub-RFC for increased availability of NUMA API #1545

Open
wants to merge 10 commits into
base: dev/vossmjp/rfc_numa_support
from
125 changes: 125 additions & 0 deletions rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org
@@ -0,0 +1,125 @@
# -*- fill-column: 80; -*-

#+title: Link ~tbbbind~ with static HWLOC to improve predictability of NUMA support API
Contributor

The title is too long. It's okay, though, since it is not part of the main doc set, but I suggest rephrasing it to something like this:
Link tbbbind with Static HWLOC for NUMA API Predictability


*Note:* This is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535.
Contributor

Suggested change
*Note:* This is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535.
*Note:* This document is a sub-RFC of the https://github.com/oneapi-src/oneTBB/pull/1535.

Specifically, its section about "Increased availability of NUMA support".
Contributor

Suggested change
Specifically, its section about "Increased availability of NUMA support".
Specifically, the "Increased availability of NUMA support" section.


* Introduction
oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded
Contributor

Suggested change
oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded
oneTBB has a soft dependency on several variants of ~tbbbind~, which

by the library as part of its initialization stage. In turn, each ~tbbbind~ has
Contributor

Suggested change
by the library as part of its initialization stage. In turn, each ~tbbbind~ has
the library loads during the initialization stage. Each ~tbbbind~, in turn, has

a hard dependency on a concrete version of the HWLOC library [1, 2]. The soft
Contributor

Suggested change
a hard dependency on a concrete version of the HWLOC library [1, 2]. The soft
a hard dependency on a specific version of the HWLOC library [1, 2]. The soft

dependency of oneTBB on ~tbbbind~ allows the library to continue its execution
Contributor

Suggested change
dependency of oneTBB on ~tbbbind~ allows the library to continue its execution
dependency means that the library continues the execution

even if the system loader is unable to resolve the hard dependency on HWLOC for
Contributor

Suggested change
even if the system loader is unable to resolve the hard dependency on HWLOC for
even if the system loader fails to resolve the hard dependency on HWLOC for

~tbbbind~. In this case, the HW topology is not discovered and the machine is
Contributor

Suggested change
~tbbbind~. In this case, the HW topology is not discovered and the machine is
~tbbbind~. In this case, oneTBB does not discover the hardware topology.

seen as if all CPU cores were uniform, which is the default TBB behavior when
Contributor

Suggested change
seen as if all CPU cores were uniform, which is the default TBB behavior when
Instead, it defaults to viewing all CPU cores as uniform, consistent with TBB behavior when

Contributor

Does TBB here mean the old TBB?

NUMA constraints are not used. Thus, the following code returns the values that
Contributor

Suggested change
NUMA constraints are not used. Thus, the following code returns the values that
NUMA constraints are not used. As a result, the following code returns the irrelevant values that

do not reflect the real topology and do not matter:
Contributor

Suggested change
do not reflect the real topology and do not matter:
do not reflect the actual topology:


#+begin_src C++
#include <oneapi/tbb/info.h>  // oneapi::tbb::info::numa_nodes(), core_types()
#include <vector>

std::vector<oneapi::tbb::numa_node_id> numa_nodes = oneapi::tbb::info::numa_nodes();
std::vector<oneapi::tbb::core_type_id> core_types = oneapi::tbb::info::core_types();
#+end_src
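The silent fallback can at least be detected from application code. The oneTBB specification states that ~info::numa_nodes()~ and ~info::core_types()~ return a vector with a single ~task_arena::automatic~ element when the system topology cannot be parsed. A minimal sketch of such a check, assuming that ~task_arena::automatic~ equals ~-1~ (as in current oneTBB) and that the missing-~tbbbind~ case reports the same shape, might look as follows; ~looks_like_topology_fallback~ is a hypothetical helper, not part of the oneTBB API:

#+begin_src C++
#include <vector>

// Assumption: mirrors oneapi::tbb::task_arena::automatic, which is -1 in oneTBB.
constexpr int automatic_id = -1;

// Hypothetical helper: returns true if a topology query result has the shape
// that the oneTBB specification describes for failed topology parsing, i.e. a
// single element equal to task_arena::automatic. A successfully parsed
// topology is expected to report real (non-negative) node or core-type ids.
bool looks_like_topology_fallback(const std::vector<int>& ids) {
    return ids.size() == 1 && ids.front() == automatic_id;
}
#+end_src

In an application linked with oneTBB, it could be called as ~looks_like_topology_fallback(oneapi::tbb::info::numa_nodes())~, since ~numa_node_id~ and ~core_type_id~ are aliases for ~int~. It only reveals the problem after the fact; it does not make topology data available.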

This lack of valid HW topology data due to absence of a third party library is
Contributor

Suggested change
This lack of valid HW topology data due to absence of a third party library is
This lack of valid HW topology, caused by the absence of a third-party library, is

the major problem with the current oneTBB behavior. There is no diagnostics for
Contributor

Suggested change
the major problem with the current oneTBB behavior. There is no diagnostics for
the major problem with the current oneTBB behavior. The problem lies in the lack of diagnostics

the issue, which likely makes it unnoticeable by developers, and the code that
Contributor

Suggested change
the issue, which likely makes it unnoticeable by developers, and the code that
making it difficult for developers to detect.

uses oneTBB NUMA support facilities continues running but does not use NUMA as
Contributor

Suggested change
uses oneTBB NUMA support facilities continues running but does not use NUMA as
As a result, the code continues to run but fails to use NUMA as intended.

intended.
Contributor

Suggested change
intended.


Having a dependency on a shared HWLOC library has advantages:
Contributor

Suggested change
Having a dependency on a shared HWLOC library has advantages:
Dependency on a shared HWLOC library has the following benefits:

1. Code reuse, with all of its positive consequences, including relying on the
same code that has been tested and debugged, allowing the OS
Contributor

But in fact, most Linux distributions ship obsolete HWLOC versions, and relying on them does not provide these benefits.
IMO, bundling the most up-to-date static HWLOC with recent versions of oneTBB is beneficial: fixes and new features become available to oneTBB immediately.

Contributor Author

Thanks, I did not know that. I will consider this in the future changes.

Contributor

I'd rewrite it to something like this:
1. Reliability. Using a tested and debugged shared library, oneTBB benefits from established, reliable functionality.
2. Code Reuse. Reuse the same code across different processes, improving cache locality and reducing memory footprint, which is the primary purpose of shared libraries.
3. Drop-In Replacement. Use your version of HWLOC without recompiling oneTBB. It can be useful in the following cases:

  • You need to apply a hotfix to support your hardware that has not yet been integrated into the HWLOC project.
  • You use a HWLOC version that may never be upstreamed. For example, if the hardware is unavailable to the broader market.
  • You want to test a development version of HWLOC on your system.

to share it among different processes, which in turn improves cache
locality and reduces the memory footprint. That's the primary purpose of shared
libraries.
2. A drop-in replacement. Users can use their own version of HWLOC without
recompiling oneTBB. That version of HWLOC could include a hotfix that supports
particular or new hardware a customer has, but whose support is not yet
upstreamed to the HWLOC project. Such support might never be upstreamed if
that hardware is not going to be widely available. It could also be a
development version of HWLOC that someone wants to test on their systems
first. The same is possible with a static HWLOC, but it is more cumbersome
because it requires recompiling every dependent component.

The only disadvantage of depending on the HWLOC library dynamically is that the
Contributor

I'm not sure if we need to present it as a disadvantage. Maybe make it a prerequisite:
Before using oneTBB's NUMA support API, make the library available and accessible for oneTBB by doing one of the following:

  • End-User Installation: Pre-install the necessary version of the HWLOC library.
  • Bundled Distribution: Bundle the required HWLOC version alongside other components as part of the product release.

developers who use oneTBB's NUMA support API need to make sure the library is
available and can be found by oneTBB. Depending on how a developer's code is
distributed, this is achieved either by:
1. Asking the end user to pre-install the necessary version of the dependency.
2. Bundling the necessary HWLOC version with the other parts of a product
release.
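Whichever distribution model is chosen, a developer can check on a POSIX system whether the dynamic loader resolves HWLOC from the application's environment before relying on the NUMA API. A minimal sketch, assuming POSIX ~dlopen~; ~loader_can_resolve~ is a hypothetical helper, and ~libhwloc.so.15~ is only an example name (to our knowledge, the Linux SONAME of HWLOC 2.x; confirm the one your ~tbbbind~ links against, e.g., with ~ldd~). It only approximates what the loader does for ~tbbbind~, whose own RUNPATH may differ from the application's:

#+begin_src C++
#include <dlfcn.h>
#include <string>

// Hypothetical helper: asks the dynamic loader whether `soname` can be found
// in the current environment (LD_LIBRARY_PATH, rpath/runpath of the calling
// binary, ld.so cache, default system directories).
bool loader_can_resolve(const std::string& soname) {
    void* handle = dlopen(soname.c_str(), RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        return false;  // dlerror() would describe why loading failed
    }
    dlclose(handle);  // only the success of loading matters here
    return true;
}
#+end_src

For example, ~loader_can_resolve("libhwloc.so.15")~ returning ~false~ on a target machine would suggest that the ~tbbbind~ variants depending on that library cannot be loaded there, so the NUMA API would silently report a uniform machine.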

However, the requirement to fulfill one of the above steps for the NUMA API to
Contributor

However, the need to complete one of the above steps for the NUMA API to function effectively may be seen as inconvenient. More importantly, it is not always immediately clear that these steps are required, especially given the silent fallback behavior when the HWLOC library is not found in the environment.

start paying off may be considered an inconvenience and, more importantly, it
is not always obvious that one of these steps is needed, especially because
oneTBB behaves silently when the HWLOC library cannot be found in the
environment.

This proposal suggests an improvement to reduce the effect of the disadvantage
Contributor

Suggested change
This proposal suggests an improvement to reduce the effect of the disadvantage
The proposal is to reduce the effect of the disadvantage

being dependent on a dynamic version of HWLOC library by having it linked
Contributor

Suggested change
being dependent on a dynamic version of HWLOC library by having it linked
of relying on a dynamic HWLOC library.

statically with one of the ~tbbbind~ libraries that are distributed together
Contributor

Suggested change
statically with one of the ~tbbbind~ libraries that are distributed together
The improvements involve statically linking HWLOC with one of the ~tbbbind~ libraries distributed together

with oneTBB, yet leaving possibility to specify another version of HWLOC library
Contributor

Suggested change
with oneTBB, yet leaving possibility to specify another version of HWLOC library
with oneTBB. At the same time, you retain the flexibility to specify a different version of the HWLOC library

if users see the need.
Contributor

Suggested change
if users see the need.
if needed.


Since HWLOC 1.x is an old version of HWLOC and modern versions of operating
Contributor

Suggested change
Since HWLOC 1.x is an old version of HWLOC and modern versions of operating
Since HWLOC 1.x is an older version and modern operating

systems install HWLOC 2.x by default, the probability of someone who is
Contributor

Suggested change
systems install HWLOC 2.x by default, the probability of someone who is
systems install HWLOC 2.x by default, the probability of users being

constrained by using only HWLOC 1.x on their system is relatively small. Thus,
Contributor

Suggested change
constrained by using only HWLOC 1.x on their system is relatively small. Thus,
restricted to HWLOC 1.x is relatively small. Thus,

the filename of the ~tbbbind~ library that is linked against HWLOC 1.x can be
Contributor

Suggested change
the filename of the ~tbbbind~ library that is linked against HWLOC 1.x can be
we can reuse the filename of the ~tbbbind~ library linked to HWLOC 1.x

re-used for the library that is linked against static HWLOC version 2.x.
Contributor

Suggested change
re-used for the library that is linked against static HWLOC version 2.x.
for the library linked against a static HWLOC 2.x.


* Proposal
1. Replace the dynamic link of ~tbbbind~ library which is currently linked
Contributor

Suggested change
1. Replace the dynamic link of ~tbbbind~ library which is currently linked
1. Replace the dynamic link of ~tbbbind~ library currently linked

against HWLOC 1.x with the link to a static HWLOC library version 2.x.
Contributor

Suggested change
against HWLOC 1.x with the link to a static HWLOC library version 2.x.
against HWLOC 1.x with a link to a static HWLOC library version 2.x.

2. Add loading of that ~tbbbind~ variant as the last attempt to resolve the
dependency on functionality provided by ~tbbbind~ layer.
Contributor

Suggested change
dependency on functionality provided by ~tbbbind~ layer.
dependency on functionality provided by the ~tbbbind~ layer.

3. Update the oneTBB documentation considering [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]] to
Contributor

Suggested change
3. Update the oneTBB documentation considering [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]] to
3. Update the oneTBB documentation, including [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these pages]], to

include steps determining the variant of ~tbbbind~ being used.
Contributor

Suggested change
include steps determining the variant of ~tbbbind~ being used.
detail the steps for identifying which ~tbbbind~ is being used.
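
The resulting lookup order can be modeled as a first-match search over ~tbbbind~ candidates in which the proposed statically linked variant comes last, so it is used only when no variant backed by a user-provided or system HWLOC can be loaded. This is a minimal sketch, not oneTBB's loader code: the candidate names are assumptions based on the existing ~tbbbind_2_5~ and ~tbbbind_2_0~ variants and on reusing the HWLOC 1.x variant's name, and ~pick_tbbbind~ and its ~can_load~ predicate (standing in for the system loader) are hypothetical.

#+begin_src C++
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the proposed lookup order, tried from first to last:
//   "tbbbind_2_5", "tbbbind_2_0": link HWLOC 2.x dynamically (user/system HWLOC)
//   "tbbbind": today's HWLOC 1.x variant name, reused for the static HWLOC 2.x build
const std::vector<std::string> tbbbind_candidates{"tbbbind_2_5", "tbbbind_2_0", "tbbbind"};

// Returns the first candidate the loader can resolve, or std::nullopt if none
// can be loaded, in which case all CPU cores are treated as uniform.
std::optional<std::string> pick_tbbbind(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& can_load) {
    for (const auto& name : candidates) {
        if (can_load(name)) {
            return name;
        }
    }
    return std::nullopt;
}
#+end_src

Because the static variant has no external HWLOC dependency, ~can_load("tbbbind")~ is expected to succeed whenever that file is shipped, so the ~std::nullopt~ branch (the silent uniform-core fallback) would be reached only when the oneTBB binaries themselves are incompletely distributed.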


** Advantages
1. The proposed behavior allows having a mechanism for resolving a dependency on
Contributor

Suggested change
1. The proposed behavior allows having a mechanism for resolving a dependency on
1. The proposed behavior introduces a fallback mechanism for resolving

HWLOC library in case it cannot be found in the environment, while still
Contributor

Suggested change
HWLOC library in case it cannot be found in the environment, while still
the HWLOC library dependency when it is not in the environment, while still

preferring user-provided version of HWLOC. As a result, the problematic use of
Contributor

Suggested change
preferring user-provided version of HWLOC. As a result, the problematic use of
preferring user-provided versions. As a result, the problematic oneTBB API usage

oneTBB API mentioned above should work as expected, returning enumerated list
Contributor

Suggested change
oneTBB API mentioned above should work as expected, returning enumerated list
works as expected, returning an enumerated list

of actual NUMA nodes and core types on the system the code is running on,
Contributor

of actual NUMA nodes and core types, provided that:

  • The loaded HWLOC library is compatible with the system.
  • The application properly distributes all oneTBB binaries and configures the environment to locate and load the required tbbbind library variant.

provided that the loaded HWLOC library works on that system and that the
application properly distributes all oneTBB binaries and sets up the environment
so that the necessary variant of the ~tbbbind~ library can be found and loaded.
2. The drop of support for HWLOC 1.x allows to not introducing additional
Contributor

Suggested change
2. The drop of support for HWLOC 1.x allows to not introducing additional
2. Dropping support for HWLOC 1.x avoids introducing an additional

~tbbbind~ variant of the library, yet maintaining support for popular
Contributor

Suggested change
~tbbbind~ variant of the library, yet maintaining support for popular
~tbbbind~ variant while maintaining support for widely used

versions of HWLOC.
Contributor

Actually, you still introduce a new variant of tbbbind, as ALL distributions already ship only libtbbbind_2_5.so.

This means that all TBB distributions across all package managers need to be updated.

Contributor Author

Or they may continue supporting the current variant with a system-specific version of HWLOC, as it is still a working approach.


** Disadvantages
By default still no diagnostics if users failed to setup environment with their
Contributor

Suggested change
By default still no diagnostics if users failed to setup environment with their
By default, there are still no diagnostics if you fail to correctly set up an environment with your

own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~
Contributor

Why would users use their own version of HWLOC?

Contributor Author

We introduced that approach in the past, and changing it at once may break backward compatibility. Perhaps we will decide to prioritize our own version of HWLOC someday, but AFAIK we still need to leave the possibility to specify a user-provided one.

Contributor

Suggested change
own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~
version of HWLOC. Although, specifying the ~TBB_VERSION=1~

envar will help identifying an issue with setup of environment pretty quickly.
Contributor

Suggested change
envar will help identifying an issue with setup of environment pretty quickly.
environment variable helps identify configuration issues quickly.


* Alternative handling of inability to parse system topology
Contributor

Suggested change
* Alternative handling of inability to parse system topology
* Alternative Handling for Missing System Topology

An alternative behavior when the HWLOC library cannot be found is to be more explicit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach to handle the absence of the HWLOC library is to adopt a more explicit response:

  • Issue a warning about the missing component.
  • Require one of the tbbbind variants to be loaded by refusing to work or throwing an exception.

about the problem of a missing component and either to issue a warning or to
refuse to work until one of the ~tbbbind~ variants is loaded (e.g., by throwing
an exception).
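
Both alternatives change only what happens on the fallback path. The sketch below contrasts them with today's silent behavior; ~missing_topology_policy~, ~topology_unavailable~, and ~on_missing_tbbbind~ are hypothetical names that do not exist in oneTBB, and ~-1~ is assumed to mirror ~task_arena::automatic~. The exception class illustrates the additional exception hierarchy this alternative would require.

#+begin_src C++
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical policies for the case when no tbbbind variant could be loaded.
enum class missing_topology_policy { silent, warn, raise };

// Hypothetical exception type; oneTBB has no such class.
class topology_unavailable : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Assumption: mirrors oneapi::tbb::task_arena::automatic (-1 in oneTBB).
constexpr int automatic_id = -1;

// Returns the NUMA node ids to report when topology discovery is unavailable.
std::vector<int> on_missing_tbbbind(missing_topology_policy policy) {
    switch (policy) {
    case missing_topology_policy::warn:
        // May go unnoticed if the standard error stream is closed or redirected.
        std::cerr << "warning: no tbbbind variant was loaded; "
                     "NUMA topology is not available\n";
        break;
    case missing_topology_policy::raise:
        // Existing callers that do not catch this will let it propagate.
        throw topology_unavailable("no tbbbind variant was loaded");
    case missing_topology_policy::silent:
        break;  // current behavior
    }
    return {automatic_id};  // single "automatic" entry, as in the current fallback
}
#+end_src

The ~raise~ branch turns a previously harmless query into one that existing code would have to wrap in a ~try~ block, while the ~warn~ branch keeps the return value unchanged and only adds a message.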

The following sections compare these alternative approaches with the one proposed.
Contributor

should this be a heading?

** Common Advantages
- Explicitly tells that the functionality being used is not going to work
Contributor

Suggested change
- Explicitly tells that the functionality being used is not going to work
- Explicitly indicates that the functionality being used does not work,

instead of just being silent.
Contributor

Suggested change
instead of just being silent.
instead of failing silently.

- Does not require additional variant of ~tbbbind~ library to be distributed
Contributor

Suggested change
- Does not require additional variant of ~tbbbind~ library to be distributed
- Avoids the need to distribute an additional variant of ~tbbbind~ library.

along with the others.
Contributor

Suggested change
along with the others.


** Common Disadvantages
- Requires an additional step from the user to resolve the problem. In other
words, it does not provide a complete solution to the problem.

** Disadvantages of Issuing a Warning
Contributor

should this section and the next one be subparts of the common disadvantages section?

- The warning may still not be visible, especially if standard streams are
Contributor

Suggested change
- The warning may still not be visible, especially if standard streams are
- The warning may be unnoticed, especially if standard streams are

closed.

** Disadvantages of Throwing an Exception
- May break existing code as it does not expect an exception to be thrown.
Contributor

Suggested change
- May break existing code as it does not expect an exception to be thrown.
- May break existing code that does not expect an exception to be thrown.

- Requires introducing an additional exception hierarchy.

* References
1. [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]]
2. [[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]]