25 October 2024

Ilektra Christidi edited this page Nov 22, 2024 · 9 revisions

To ask:

  • Why does the documentation say it uses only managed memory, when the code actually uses both managed and non-managed?
    • Docs out of date, don't worry about it. Non-managed is/should be used where needed.
  • From profile:
    • To profile only the kernels of interest, out of the gazillion ApplyKernels: first run with nsys, note after which kernel invocation the region of interest starts, and then profile only those kernels with ncu
    • What are the small CUDA kernels with lots of small data transfers called, the ones between the big kernel calls? Are those the ones we want to optimise, or just the big kernel?
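That two-step workflow might look like the following sketch (the binary name, the kernel-name filter, and the skip/count values are placeholders to be read off the nsys output, not verified commands):

```shell
# 1. Get the full kernel timeline with nsys to see where the region of
#    interest starts (hypothetical binary name):
nsys profile -o timeline ./Benchmark_app
nsys stats timeline.nsys-rep     # count kernel invocations before the region

# 2. Re-run under ncu, skipping the first N matching launches and profiling
#    only the next M (N=100, M=10 here are made-up values from step 1):
ncu --kernel-name ApplyKernel --launch-skip 100 --launch-count 10 \
    -o profile ./Benchmark_app
```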
  • What to do with the tests that don't run: modify the build so they aren't built with the wrong options, merge that upstream...?
    • If you re-enable Mobius, the tests build, so do that (still using Ed's patch). Also pull the latest develop from the Grid repo.
    • Some tests fail, and differently on MG's machine vs. Tursa - on CPU only. Look into this in more detail on Tursa once we get some CPU nodes (see below), and report upstream any tests that fail but are not relevant to us.
    • Also look at the TeamCity CI. Log in as guest.
    • TODO: MG to run make check on GPU as well
  • What are the correct config + execution options for benchmarking (MPI partitioning, number of threads... --grid and --mpi )?
    • --grid = number of lattice sites in each dimension; --mpi = number of MPI ranks in each dimension.
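As a sanity check on those two options, the relationship can be sketched in a few lines (illustrative Python, not Grid code; the 32.32.32.64 example values are assumptions):

```python
# --grid gives the global lattice sites per dimension, --mpi the MPI ranks
# per dimension; each rank gets the element-wise quotient as its local
# lattice, so every dimension must divide evenly.

def local_lattice(grid, mpi):
    """Per-rank local lattice dimensions for a given global grid and MPI layout."""
    if len(grid) != len(mpi):
        raise ValueError("--grid and --mpi must have the same number of dimensions")
    for g, m in zip(grid, mpi):
        if g % m != 0:
            raise ValueError(f"{g} sites cannot be split evenly over {m} ranks")
    return [g // m for g, m in zip(grid, mpi)]

# e.g. --grid 32.32.32.64 --mpi 2.2.2.4 gives a 16^4 local lattice per rank:
print(local_lattice([32, 32, 32, 64], [2, 2, 2, 4]))  # [16, 16, 16, 16]
```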
  • How does the threading work?
    • It's in a macro in threads.h. You can't have a #pragma inside a macro, so you can't grep for it... To use: set OMP_NUM_THREADS before running.
    • We definitely need threading on. On a CPU node: 1 MPI rank/chiplet = 8 ranks/node, with 16 threads/rank = 128 cores/node.
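For concreteness, a launch line matching those numbers might look like this sketch (binary name, mpirun flags, and lattice sizes are illustrative assumptions, not a verified Tursa job script):

```shell
# One CPU node: 8 MPI ranks (one per chiplet) x 16 OpenMP threads = 128 cores.
export OMP_NUM_THREADS=16
mpirun -np 8 ./Benchmark_app --mpi 2.2.2.1 --grid 32.32.32.32
```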
  • Tursa is GPU-only - we're burning GPU time to look at CPUs!
    • TODO: IC to ask for a separate project code for CPU hours for this RSE project (as many as they'd give us, at least 10k CPU hours - we'd burn through them quickly, since you get a whole node at a time, with 128 cores). -> DONE, we got some more CPU credits on Tursa
    • If that fails, do CPU studies/development on CSD3 or dial3.
  • What's the feeling about introducing some linting over the code (clang-format)?
    • NOOOO! Peter will never speak to you again!
  • Automated testing: there appear to be some automated unit tests, but the end-to-end tests are more like examples - they create output and nothing else; the user would have to check it manually. We'd like to have some automated regression tests. Do we want to do this locally, only in the Sp2n folder/tests we care about, or make something more general to contribute to upstream Grid?
    • Look at Grid/util/FlightRecorder.h/cc for regression test facilities (used in unit tests) - most likely not appropriate...
    • TODO: EB will talk to Ryan (RSE/postdoc working on Grid at Edinburgh) to talk to Peter about how acceptable a contribution would be. -> DONE, the answer is no... We'll therefore need our own script (or another regression-testing system that doesn't affect the rest of Grid) inside the test folder(s) of interest.
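A minimal sketch of what such a script could look like, assuming the test writes one observable per line and we keep a golden reference file next to it (file names, format and tolerance are all assumptions, not anything from Grid):

```python
# Standalone regression check for the Sp2n test folder(s): compare a test
# run's numeric output against a stored golden reference within a tolerance.
import math

def load_values(path):
    """One floating-point observable per non-empty line of an output file."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def compare_observables(measured, reference, rtol=1e-10):
    """True iff every measured value matches its golden reference within rtol."""
    return (len(measured) == len(reference)
            and all(math.isclose(m, r, rel_tol=rtol, abs_tol=1e-14)
                    for m, r in zip(measured, reference)))

# Usage (hypothetical file names):
#   ok = compare_observables(load_values("run.out"), load_values("golden.out"))
```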