Add sub-RFC for increased availability of NUMA API #1545
base: dev/vossmjp/rfc_numa_support
Changes from all commits
fdc26b4
ce6746d
258b82c
90bfaba
58a441f
5e8b79e
d0bf373
a10984c
35d7f55
81021be
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,125 @@ | ||||||
# -*- fill-column: 80; -*- | ||||||
|
||||||
#+title: Link ~tbbbind~ with static HWLOC to improve predictability of NUMA support API | ||||||
|
||||||
*Note:* This is a sub-RFC of https://github.com/oneapi-src/oneTBB/pull/1535. | ||||||
Specifically, it covers the section about "Increased availability of NUMA support". | ||||||
|
||||||
* Introduction | ||||||
oneTBB has a soft dependency on several variants of ~tbbbind~, which are loaded | ||||||
by the library as part of its initialization stage. In turn, each ~tbbbind~ has | ||||||
a hard dependency on a concrete version of the HWLOC library [1, 2]. The soft | ||||||
dependency of oneTBB on ~tbbbind~ allows the library to continue its execution | ||||||
even if the system loader is unable to resolve the hard dependency on HWLOC for | ||||||
~tbbbind~. In this case, the HW topology is not discovered and the machine is | ||||||
seen as if all CPU cores were uniform, which is the default TBB behavior when | ||||||
NUMA constraints are not used. Thus, the following code returns values that | ||||||
do not reflect the real topology and are therefore meaningless: | ||||||
Review comment: Here, does TBB mean the old TBB?
|
||||||
#+begin_src C++ | ||||||
std::vector<oneapi::tbb::numa_node_id> numa_nodes = oneapi::tbb::info::numa_nodes(); | ||||||
std::vector<oneapi::tbb::core_type_id> core_types = oneapi::tbb::info::core_types(); | ||||||
#+end_src | ||||||
|
||||||
This lack of valid HW topology data due to the absence of a third-party library | ||||||
is the major problem with the current oneTBB behavior. There are no diagnostics | ||||||
for the issue, which likely makes it go unnoticed by developers, and the code | ||||||
that uses oneTBB NUMA support facilities continues running but does not use | ||||||
NUMA as intended. | ||||||
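Because the fallback is silent, an application can only detect it by inspecting the returned values. The sketch below is a minimal illustration of such a check, not part of the RFC. It assumes that ~oneapi::tbb::numa_node_id~ is an integral type and that, when topology parsing is unavailable, ~numa_nodes()~ yields a single element equal to ~task_arena::automatic~ (taken here to be ~-1~). The helper name ~looks_like_fallback~ is hypothetical, and a plain ~std::vector<int>~ stands in for the real return type so that the snippet compiles without oneTBB.

```cpp
#include <cassert>
#include <vector>

// Stand-in for oneapi::tbb::numa_node_id (assumed to be an integral type).
using numa_node_id = int;

// Assumed sentinel reported when topology parsing is unavailable
// (task_arena::automatic, taken here to be -1).
constexpr numa_node_id automatic_id = -1;

// Hypothetical helper: returns true when the reported nodes look like the
// uniform fallback, i.e. a single placeholder entry instead of real indexes.
bool looks_like_fallback(const std::vector<numa_node_id>& nodes) {
    return nodes.size() == 1 && nodes.front() == automatic_id;
}
```

In real code the argument would be the result of ~oneapi::tbb::info::numa_nodes()~. A check like this only flags the placeholder value; it does not explain why no ~tbbbind~ variant was loaded.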
|
||||||
Having a dependency on a shared HWLOC library has advantages: | ||||||
1. Code reuse with all of its positive consequences, including | ||||||
relying on the same code that has been tested and debugged, allowing the OS | ||||||
to share it among different processes, which consequently improves cache | ||||||
locality and memory footprint. That's the primary purpose of shared | ||||||
libraries. | ||||||
Review comment: But in fact, most Linux OSes have obsolete hwloc versions, and relying on them does not provide benefits.
Reply: Thanks, I did not know that. I will consider this in future changes.
Review comment: I'd rewrite it to something like this:
2. A drop-in replacement. Users are able to use their own version of HWLOC | ||||||
without recompilation of oneTBB. This specific version of HWLOC could include | ||||||
a hotfix to support particular and/or new hardware that a customer has, but | ||||||
whose support is not yet upstreamed to the HWLOC project. It is also possible | ||||||
that such support won't be upstreamed at all if that hardware is not going to | ||||||
be available to a broad user base. It could also be a development version of | ||||||
HWLOC that someone wants to test on their systems first. Of course, they can | ||||||
do it with the static version as well, but that's more cumbersome as it | ||||||
requires recompilation of every dependent component. | ||||||
|
||||||
The only disadvantage of depending on the HWLOC library dynamically is that the | ||||||
developers that use oneTBB's NUMA support API need to make sure the library is | ||||||
available and can be found by oneTBB. Depending on the distribution model of a | ||||||
developer's code, this is achieved by either: | ||||||
1. Asking the end user to have the necessary version of a dependency pre-installed. | ||||||
2. Bundling the necessary HWLOC version together with other pieces of a product | ||||||
release. | ||||||
Review comment: I'm not sure if we need to present it as a disadvantage. Maybe make it a prerequisite:
|
||||||
However, the requirement to fulfill one of the above steps for the NUMA API to | ||||||
start paying off may be considered an inconvenience and, more importantly, | ||||||
it is not always obvious that one of these steps is needed, especially | ||||||
because of the silent behavior when the HWLOC library cannot be found in the | ||||||
environment. | ||||||
Review comment: However, the need to complete one of the above steps for the NUMA API to function effectively may be seen as inconvenient. More importantly, it is not always immediately clear that these steps are required, especially due to the silent fallback behavior when the HWLOC library is not found in the environment.
|
||||||
This proposal suggests an improvement to reduce the effect of the disadvantage | ||||||
of being dependent on a dynamic version of the HWLOC library by linking it | ||||||
statically with one of the ~tbbbind~ libraries that are distributed together | ||||||
with oneTBB, while leaving the possibility to specify another version of the | ||||||
HWLOC library if users see the need. | ||||||
|
||||||
Since HWLOC 1.x is an old version of HWLOC and modern versions of operating | ||||||
systems install HWLOC 2.x by default, the probability that someone is | ||||||
constrained to using only HWLOC 1.x on their system is relatively small. Thus, | ||||||
the filename of the ~tbbbind~ library that is linked against HWLOC 1.x can be | ||||||
re-used for the library that is linked against static HWLOC version 2.x. | ||||||
|
||||||
* Proposal | ||||||
1. Replace the dynamic link of the ~tbbbind~ library that is currently linked | ||||||
against HWLOC 1.x with a link to a static HWLOC library version 2.x. | ||||||
2. Add loading of that ~tbbbind~ variant as the last attempt to resolve the | ||||||
dependency on functionality provided by the ~tbbbind~ layer. | ||||||
3. Update the oneTBB documentation, in particular [[https://oneapi-src.github.io/oneTBB/search.html?q=tbb%3A%3Ainfo][these documentation pages]], to | ||||||
include steps for determining the variant of ~tbbbind~ being used. | ||||||
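The loading order described in steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the oneTBB implementation: the function ~pick_tbbbind~ and the ~can_load~ hook are hypothetical (the hook stands in for the real dynamic-loading call), and the file names in ~load_order~ are assumed for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: tries each candidate in order and returns the first
// one that loads. `can_load` stands in for the real dynamic-loading call.
// An empty string means no variant could be loaded.
std::string pick_tbbbind(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& can_load) {
    for (const auto& name : candidates) {
        if (can_load(name)) return name;
    }
    return std::string();  // nothing loaded: uniform-topology fallback
}

// Illustrative order (file names are assumptions): variants that need a
// HWLOC 2.x found in the environment come first, and the variant with
// statically linked HWLOC is the last attempt.
const std::vector<std::string> load_order = {
    "libtbbbind_2_5.so",  // dynamically linked HWLOC variant (assumed name)
    "libtbbbind_2_0.so",  // dynamically linked HWLOC variant (assumed name)
    "libtbbbind.so",      // static HWLOC 2.x under the reused name (assumed)
};
```

Keeping the statically linked variant last preserves the preference for a user-provided HWLOC: it is selected only when none of the dynamically linked variants can be resolved.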
|
||||||
** Advantages | ||||||
1. The proposed behavior provides a mechanism for resolving a dependency on | ||||||
the HWLOC library in case it cannot be found in the environment, while still | ||||||
preferring a user-provided version of HWLOC. As a result, the problematic use | ||||||
of the oneTBB API mentioned above should work as expected, returning an | ||||||
enumerated list of actual NUMA nodes and core types on the system the code is | ||||||
running on, provided that the loaded HWLOC library works on that system and | ||||||
that the application properly distributes all binaries of oneTBB and sets the | ||||||
environment so that the necessary variant of the ~tbbbind~ library can be | ||||||
found and loaded. | ||||||
Review comment: of actual NUMA nodes and core types, provided that:
2. Dropping support for HWLOC 1.x avoids introducing an additional | ||||||
variant of the ~tbbbind~ library while maintaining support for popular | ||||||
versions of HWLOC. | ||||||
Review comment: Actually, you still introduce a new variant of tbbbind, as ALL distributions already ship only libtbbbind_2_5.so. It means all TBB distributions across all package managers need to be updated.
Reply: Or they may continue supporting the current variant with a system-specific version of HWLOC, as it is still a working approach.
|
||||||
** Disadvantages | ||||||
By default, there are still no diagnostics if users fail to set up the | ||||||
environment with their own version of the HWLOC library correctly. However, | ||||||
specifying the ~TBB_VERSION=1~ environment variable helps identify an issue | ||||||
with the environment setup fairly quickly. | ||||||
Review comment: Why would a user have their own version of hwloc?
Reply: We introduced that approach in the past and it may be backward incompatible if we change it at once. Perhaps we will decide to prioritize our own version of HWLOC someday, but AFAIK we still need to leave the possibility to specify a user one.
|
||||||
* Alternative handling of inability to parse system topology | ||||||
An alternative behavior in case the HWLOC library cannot be found is to be more | ||||||
explicit about the problem of a missing component and to either issue a warning | ||||||
or refuse to work, requiring one of the ~tbbbind~ variants to be loaded (e.g., | ||||||
throw an exception). | ||||||
Review comment: An alternative approach to handle the absence of the HWLOC library is to adopt a more explicit response:
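The two alternatives, next to the current silent behavior, can be pictured as a policy choice. The sketch below is purely illustrative: ~missing_topology_policy~ and ~on_missing_topology~ are hypothetical names, not oneTBB API, and ~std::runtime_error~ stands in for whatever exception type would actually be introduced.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical policy for the case when no tbbbind variant could be loaded.
enum class missing_topology_policy { silent, warn, fail };

// Returns true if execution may continue with the uniform-topology fallback.
// Throws for the `fail` policy; std::runtime_error is a placeholder type.
bool on_missing_topology(missing_topology_policy policy, std::ostream& log) {
    switch (policy) {
    case missing_topology_policy::silent:
        return true;  // current behavior: continue without any diagnostics
    case missing_topology_policy::warn:
        log << "warning: no tbbbind variant loaded; NUMA topology is unavailable\n";
        return true;
    case missing_topology_policy::fail:
        throw std::runtime_error("no tbbbind variant loaded");
    }
    return true;  // not reached for valid enumerators
}
```

The ~warn~ branch depends on the stream being visible to someone, and the ~fail~ branch changes control flow for existing callers, which is what the disadvantages listed below refer to.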
|
||||||
Below, these alternative approaches are compared to the one proposed. | ||||||
Review comment: Should this be a heading?
** Common Advantages | ||||||
- Explicitly tells the user that the functionality being used is not going to | ||||||
work instead of just being silent. | ||||||
- Does not require an additional variant of the ~tbbbind~ library to be | ||||||
distributed along with the others. | ||||||
|
||||||
** Common Disadvantages | ||||||
- Requires an additional step from the user to resolve the problem. In other | ||||||
words, it does not provide a complete solution to the problem. | ||||||
|
||||||
** Disadvantages of Issuing a Warning | ||||||
- The warning may still not be visible, especially if standard streams are | ||||||
closed. | ||||||
Review comment: Should this section and the next one be subparts of the common disadvantages section?
|
||||||
** Disadvantages of Throwing an Exception | ||||||
- May break existing code as it does not expect an exception to be thrown. | ||||||
- Requires introduction of an additional exception hierarchy. | ||||||
|
||||||
* References | ||||||
1. [[https://www.open-mpi.org/projects/hwloc/][HWLOC project main page]] | ||||||
2. [[https://github.com/open-mpi/hwloc][HWLOC project repository on GitHub]] |
Review comment: The title is too long. It's okay, though, since it is not part of the main doc set, but I suggest rephrasing it to something like this:
Link tbbbind with Static HWLOC for NUMA API Predictability