
Questions about How much bandwidth does the L2 have to give, anyway? #96

Open
moep0 opened this issue Dec 5, 2022 · 1 comment

moep0 commented Dec 5, 2022

I tried to reproduce the results in this article.

I know that uarch-bench can use perf, but I always get jevents errors, so I wrote a simple implementation of this part of uarch-bench. The code is here, and it can be run directly from cont.sh.

The experiments were run on my Intel i7-10700. The data are shown below.

image-20221122141412549

The three PMU events are as follows.

  • r1d1: Retired load uops with L1 cache hits as data source
  • r2d1: Retired load uops with L2 cache hits as data source
  • r4d1: Retired load uops with L3 cache hits as data source
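For readers unfamiliar with the raw event syntax, here is a minimal sketch of how codes such as r1d1 break down, assuming the usual x86 perf encoding in which the low byte of the hex value is the event select and the next byte is the unit mask. The helper name decode_raw_event is hypothetical and is not part of uarch-bench or of the code linked above.

```python
def decode_raw_event(code):
    """Split a perf raw event such as 'r1d1' into (event_select, umask).

    Assumes the x86 layout: bits 0-7 are the event select and
    bits 8-15 are the unit mask.
    """
    if not code.startswith("r"):
        raise ValueError("raw perf events start with 'r': %r" % code)
    value = int(code[1:], 16)
    return value & 0xFF, (value >> 8) & 0xFF


for code in ("r1d1", "r2d1", "r4d1"):
    event, umask = decode_raw_event(code)
    print(f"{code}: event=0x{event:02x} umask=0x{umask:02x}")
```

All three codes share event select 0xd1 and differ only in the unit mask (0x01, 0x02, 0x04), which is what selects the L1, L2, or L3 hit variant.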

The first column, size, is the size of the region to be traversed, in KiB. Each run uses 500000 iterations.

The number of hits per cycle for the L1 and L2 caches is not quite the same as stated in the article. Am I doing something wrong here?
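To make the comparison concrete, here is a minimal sketch of how a hits-per-cycle figure can be derived from raw counter values, assuming the cycle count is measured over the same interval as the three events. The function name hits_per_cycle is hypothetical, and the sample numbers are placeholders rather than measurements from this issue.

```python
def hits_per_cycle(counts, cycles):
    """Normalize raw event counts by the elapsed core cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return {name: value / cycles for name, value in counts.items()}


# Placeholder numbers for illustration only, not measured data.
sample = {"r1d1": 1_000_000, "r2d1": 250_000, "r4d1": 0}
rates = hits_per_cycle(sample, cycles=2_000_000)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f} hits/cycle")
```

If the cycle counter and the event counters cover different intervals (for example, if setup code is included in one but not the other), the ratio will be skewed, so that is worth checking when numbers disagree with a reference.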


moep0 commented Dec 7, 2022

I have also re-run the code you provided in the article on an i5-8265U, with the prefetchers both enabled and disabled. Enabling the prefetchers does not seem to cause any performance loss. Instead, performance drops when they are disabled. It seems that Intel has made some improvements here.
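The comment does not say how the prefetchers were toggled. One common approach on Intel client cores is MSR 0x1A4, where setting a bit disables the corresponding prefetcher. The sketch below only builds the mask and prints commands for the wrmsr tool from msr-tools. The bit layout is an assumption that should be checked against Intel's documentation for the specific CPU, and the names in the table are hypothetical labels.

```python
# Assumed bit layout of MSR 0x1A4 (a set bit disables the prefetcher).
PREFETCHER_DISABLE_BITS = {
    "l2_hw": 0,        # L2 hardware prefetcher
    "l2_adjacent": 1,  # L2 adjacent cache line prefetcher
    "dcu": 2,          # L1 data cache (DCU) prefetcher
    "dcu_ip": 3,       # L1 data cache IP-based prefetcher
}


def prefetch_msr_value(disabled):
    """Return the MSR value that disables the named prefetchers."""
    value = 0
    for name in disabled:
        value |= 1 << PREFETCHER_DISABLE_BITS[name]
    return value


all_off = prefetch_msr_value(PREFETCHER_DISABLE_BITS)
all_on = prefetch_msr_value([])

# Print the commands instead of running them; writing MSRs needs root.
print(f"wrmsr -a 0x1a4 {all_off:#x}  # disable all four prefetchers")
print(f"wrmsr -a 0x1a4 {all_on:#x}  # re-enable them")
```

Recording the exact value written alongside each result makes it easier for others to reproduce the on/off comparison.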
