15 November 2024
We discussed the initial VTune results (flame graph) and where we should focus our investigations. The derivative evaluations have two parts: Gauges and Fermions. The time is split roughly 50/50 between the Gauges and the Fermions, whereas the Gauges are expected to require far less computational time.
In the flame graph, the Fermions are the large flat parts which lead to the WilsonKernelImplementation and the Dhop and DhopDag functions we discussed previously. These are assumed to be well optimised already, as they are part of the SU3 solver.
The part that is SP2N specific and has been less optimised is in the Gauges. This is the region on the left of the graph with the small peaks, and it is where we should focus our initial investigations. One place we identified is GaugeImplTypes.h:144 => ProjectOnGeneralGroup.
It was suggested that we put some print statements in the code to help narrow down the timing information, as the logs Grid produces are very hard to interpret. This also aligns with the NVTX idea, which could help analyse the CPU code with nsys (although in a limited way).
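As a rough sketch of the print-statement idea, a scoped timer along the following lines could be placed around suspect regions so that each one emits a single tagged line that is easy to grep out of the Grid logs. Everything named here (`ScopedTimer`, the `USE_NVTX` build flag, `timed_dummy_work`, the `[timing]` tag) is hypothetical and not part of Grid. The only external calls are `nvtxRangePushA` and `nvtxRangePop` from the NVTX header, and they are compiled in only when the assumed flag is defined.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

#ifdef USE_NVTX                 // hypothetical build flag, not a Grid option
#include <nvtx3/nvToolsExt.h>   // NVTX header shipped with the CUDA toolkit
#endif

// Hypothetical helper: times the enclosing scope and writes one tagged line
// to the given stream when the scope ends.
class ScopedTimer {
  using Clock = std::chrono::steady_clock;

public:
  explicit ScopedTimer(std::string label, std::ostream &out = std::cout)
      : label_(std::move(label)), out_(out), start_(Clock::now()) {
    assert(!label_.empty());
#ifdef USE_NVTX
    nvtxRangePushA(label_.c_str());  // the same region shows up as a range in nsys
#endif
  }

  ~ScopedTimer() {
#ifdef USE_NVTX
    nvtxRangePop();
#endif
    out_ << "[timing] " << label_ << " " << elapsed_us() << " us\n";
  }

  ScopedTimer(const ScopedTimer &) = delete;
  ScopedTimer &operator=(const ScopedTimer &) = delete;

  // Microseconds elapsed since construction.
  long long elapsed_us() const {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               Clock::now() - start_).count();
  }

private:
  std::string label_;
  std::ostream &out_;
  Clock::time_point start_;
};

// Example: time a stand-in workload and capture the log line in a string.
std::string timed_dummy_work() {
  std::ostringstream log;
  {
    ScopedTimer t("dummy_work", log);
    volatile double acc = 0.0;  // volatile so the loop is not optimised away
    for (int i = 0; i < 100000; ++i) acc = acc + i * 0.5;
  }
  return log.str();
}
```

In practice the timer would wrap a real region instead of the dummy loop, for example a block around the call we identified at GaugeImplTypes.h:144, and the output could then be filtered with `grep "\[timing\]"`. Inside threaded regions each thread would print its own line, so it is probably simplest to place the timer outside any OpenMP parallel section.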
We further discussed Asif's initial observations and ongoing investigations with OpenMP, where he had some concerns. He will keep investigating and report back on those.
Over the coming week we intend to dig deeper into the code in GaugeImplTypes.h so that we can discuss the issues in this part of the code in more detail.