Continuous performance unit testing #15
really enjoyed this early stage read! my comments are mostly referring to trivial things like typos etc. plus a few questions. nothing serious I would say.
content/blog/how-i-learned-to-stop-worrying-and-love-performance-regression-tests.asciidoc
You can find a complete list of all JFR event types by JDK version in this https://bestsolution-at.github.io/jfr-doc/[nice matrix] created by https://twitter.com/tomsontom[Tom Schindl].
The number of JFR event types is growing constantly; as of JDK 15, there are 157 of them.
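Since the set of event types differs between JDK versions (and possibly vendors), it can help to check what the JVM at hand actually offers. Below is a minimal sketch using the JDK's standard `jdk.jfr` API, assuming JDK 11 or later with Flight Recorder available. The class name `ListJfrEventTypes` is made up for illustration, and the printed count depends on the JDK you run it on:

```java
import java.util.List;
import java.util.stream.Collectors;

import jdk.jfr.EventType;
import jdk.jfr.FlightRecorder;

// Hypothetical helper: lists the JFR event types registered in the running JVM.
public class ListJfrEventTypes {

    // Returns the sorted names of all event types known to Flight Recorder.
    static List<String> eventTypeNames() {
        return FlightRecorder.getFlightRecorder()
                .getEventTypes()
                .stream()
                .map(EventType::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = eventTypeNames();
        names.forEach(System.out::println);
        System.out.println(names.size() + " event types");
    }
}
```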
note: I stopped processing here ;-)
♻️ PR Preview ddf9d24 has been successfully destroyed since this PR has been closed. 🤖 By surge-preview
Hey @hpgrahsl, thanks a lot for that first round of review! I've addressed most of your remarks, and I've also added the missing parts to the post, i.e. it's now ready for a review of the remainder, should you have the time and interest. There's also a rendered preview available now. I'm planning to do some more polishing and also updates to the images, but overall, it's 95% of what I had in mind. Thank you so much!
content/blog/towards-continuous-performance-regression-testing.asciidoc
but also to identify regressions -- bugs in existing functionality introduced by a code change.
The situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones:
How to detect increased response times in a web application?
How to identify decreased throughput?
While I understand what you mean, I'd say that these specific examples are a bit off since they are very much metrics which I would NOT test using your new tool?
There's a need for testing at both levels:
- very micro, such as "This sort operation here can be performed with less than Z memory allocated, even for an N-sized array"
- system-wide impact, such as system design and integration being such that overall throughput (on a certain machine) is within X,Y with some margins.
I'd use a tool like JfrUnit for the first category only; it seems a slippery slope to try abusing it beyond this, and that's probably a claim I'd be uncomfortable with :)
> There's a need for testing at both levels
Agreed. I think JfrUnit's testing approach can play a role for both, though. For the second, it wouldn't help you answer the question "overall throughput is within X,Y with some margins" directly, but it would help you identify potential regressions that work against that goal. This should become clearer in the discussion towards the end; perhaps I need to reword this part a bit, too.
These aspects are typically hard to test in an automated and reliable way in the development workflow,
as they depend on the underlying hardware and the workload of an application.
But best to split what you're going to measure here into categories. Since JfrUnit is about "indirect" metrics, you can't really verify that not having a certain Sleep(1000) or not having a certain GC event will necessarily give you the throughput you're after.
This post introduces https://github.com/gunnarmorling/jfrunit[JfrUnit], which offers a fresh angle on this topic by supporting assertions not on metrics like latency/throughput themselves, but on _indirect metrics_ which may impact those.
Based on https://openjdk.java.net/jeps/328[JDK Flight Recorder] events, JfrUnit allows you to define and execute assertions e.g. against expected memory allocation, database I/O, or number of executed SQL statements, for a given workload.
Starting off from a defined baseline, future failures of such assertions are indicators for potential performance regressions in an application, as a code change may have introduced higher GC pressure,
the retrieval of unnecessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.
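The core idea behind such an allocation assertion can be sketched with nothing but the JDK's own `jdk.jfr` package: record allocation events while a workload runs, add up their sizes, and compare the sum against a baseline budget. This sketch does not use or reproduce JfrUnit's API. Assumptions: JDK 11+ with JFR available; the class `AllocationBudgetCheck`, the `workload()` method and the 64 MiB budget are hypothetical; only allocations made on the current thread are counted. Because allocations inside a TLAB emit no events, each `jdk.ObjectAllocationInNewTLAB` event is charged with its full `tlabSize`, so the result approximates rather than exactly equals the allocated bytes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedThread;
import jdk.jfr.consumer.RecordingFile;

// Sketch of a JFR-based allocation assertion; not JfrUnit's implementation.
public class AllocationBudgetCheck {

    static final String IN_NEW_TLAB = "jdk.ObjectAllocationInNewTLAB";
    static final String OUTSIDE_TLAB = "jdk.ObjectAllocationOutsideTLAB";

    // Hypothetical workload standing in for the code under test (allocates 16 MiB).
    static byte[][] sink;

    static void workload() {
        sink = new byte[256][];
        for (int i = 0; i < sink.length; i++) {
            sink[i] = new byte[64 * 1024];
        }
    }

    // Approximates the bytes allocated by the current thread while running the task.
    static long measureAllocatedBytes(Runnable task) {
        long threadId = Thread.currentThread().getId();
        try {
            Path file = Files.createTempFile("allocations", ".jfr");
            try {
                try (Recording recording = new Recording()) {
                    recording.enable(IN_NEW_TLAB);
                    recording.enable(OUTSIDE_TLAB);
                    recording.start();
                    task.run();
                    recording.stop();
                    recording.dump(file);
                }
                long total = 0;
                for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                    RecordedThread thread = event.getThread();
                    if (thread == null || thread.getJavaThreadId() != threadId) {
                        continue; // ignore allocations made by other threads
                    }
                    String type = event.getEventType().getName();
                    if (IN_NEW_TLAB.equals(type)) {
                        total += event.getLong("tlabSize"); // charge the whole new TLAB
                    } else if (OUTSIDE_TLAB.equals(type)) {
                        total += event.getLong("allocationSize");
                    }
                }
                return total;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long budget = 64L * 1024 * 1024; // hypothetical baseline budget: 64 MiB
        long allocated = measureAllocatedBytes(AllocationBudgetCheck::workload);
        System.out.println("allocated approx. " + allocated + " bytes");
        if (allocated > budget) {
            throw new AssertionError("Allocation budget exceeded: " + allocated + " bytes");
        }
    }
}
```

In a real test suite, the budget would come from a measured baseline, and the assertion would fail once a code change pushes allocation beyond it.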
well N+1 is meh... it's certainly good to test against it (and maybe we should see to make this easier with some helper and a blog) but that could be done in much simpler / more traditional ways.
I feel it's a bit distracting from the real potential of the JFR metrics.
Happy to discuss another example around Hibernate/SQL, if you have one? N+1 was the first thing that came to my mind. It's going to be discussed in a follow-up post (in January), so you got some time to think about it ;)
A TLAB is a pre-allocated memory block that's exclusively used by a single thread.
Creating new objects within a TLAB can happen without costly synchronization with other threads.
Once a thread's current TLAB capacity is about to be exceeded by a new object allocation,
a new TLAB will be allocated for that thread.
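The refill behavior described above can be illustrated with a toy model. This is a deliberately simplified sketch, not HotSpot's implementation: the class `ToyTlab`, the fixed TLAB size, and the rule that an object larger than a whole TLAB goes straight to the shared heap are assumptions for illustration (HotSpot sizes TLABs adaptively and applies further heuristics such as a refill-waste limit). What it shows is that a TLAB does not grow; it is used up by bump-pointer allocations, and the thread then obtains a fresh one:

```java
// Toy model of a thread-local allocation buffer (TLAB); not HotSpot's actual algorithm.
public class ToyTlab {

    private final long tlabSize;        // fixed TLAB size in bytes (real JVMs adapt this)
    private long top;                   // next free offset within the current TLAB
    private int tlabsAllocated;         // TLABs this thread has obtained so far
    private int outsideTlabAllocations; // objects too large for any TLAB

    public ToyTlab(long tlabSize) {
        this.tlabSize = tlabSize;
        refill(); // the thread starts out with one TLAB
    }

    // Simulates allocating an object of the given size on the owning thread.
    public void allocate(long size) {
        if (size > tlabSize) {
            // Would not fit even into an empty TLAB: allocate in the shared heap,
            // which requires synchronization with other threads.
            outsideTlabAllocations++;
            return;
        }
        if (top + size > tlabSize) {
            // Current TLAB is used up: its remaining space is wasted, a new TLAB is obtained.
            refill();
        }
        top += size; // bump-pointer allocation, no synchronization needed
    }

    private void refill() {
        top = 0;
        tlabsAllocated++;
    }

    public int tlabsAllocated() { return tlabsAllocated; }

    public int outsideTlabAllocations() { return outsideTlabAllocations; }

    public static void main(String[] args) {
        ToyTlab tlab = new ToyTlab(1024);
        for (int i = 0; i < 10; i++) {
            tlab.allocate(300); // at most three 300-byte objects fit into one 1 KiB TLAB
        }
        tlab.allocate(4096);    // larger than a TLAB: allocated outside
        System.out.println(tlab.tlabsAllocated() + " TLABs, "
                + tlab.outsideTlabAllocations() + " outside-TLAB allocation(s)");
    }
}
```

Running `main` prints `4 TLABs, 1 outside-TLAB allocation(s)`: ten 300-byte objects need four TLABs of 1 KiB, and the 4 KiB object bypasses TLABs entirely.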
hum I don't remember the limit defaults, but I think it's a bit misleading to let people think it will just keep growing, as the whole point of our most important optimisations is to not exceed the limit.
I'm not quite following on that one; what is the "it" in "it will just keep growing"? The TLAB doesn't grow; it is used up by allocations, and when the next allocation doesn't fit into it, the thread will get a new TLAB for the next set of allocations. You won't avoid that, unless you do zero allocations or allocate in your own self-managed byte[].
* *Hardware-independent:* You can identify potential regressions also when running tests on hardware which is different (i.e. less powerful) from the actual production hardware
* *Fast feedback cycle:* Being able to run performance regression tests on developer laptops, even in the IDE, allows for fast identification of potential regressions right during development, instead of having to wait for the results of less frequently executed test runs in a traditional performance test lab environment
* *Robustness:* Tests are robust and not prone to factors such as the load induced by parallel jobs of a CI server or a virtualized/containerized environment
* *Pro-active identification of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actually materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to finding out about such a regression only later on
Suggested change:
* *Pro-active identification of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actually materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to when finding out about such regression only later on
Also might want to consider: in a complete production system there might be more components being integrated than the ones we have in our testbeds.
When measuring the allocation rate of components individually, they might not seem problematic, as it's easier to keep the allocation budget within the cheaper TLAB range; so while their individual allocation rate might not seem to have a problematic impact on throughput measurements performed on simple benchmarks, excessive use of allocations could still translate into problematic bottlenecks on a more complex system, as the same TLAB region is shared with other libraries.
That's an interesting one; any idea how you'd identify such a bottleneck? I don't think it needs discussion in this post (it's super long already), but I would like to understand it better and see what perhaps can even be done in JfrUnit itself towards that end.
Co-authored-by: Sanne Grinovero <[email protected]>
Ok, so I think I've addressed the critical review remarks by all of you. Thank you so much. Latest update just pushed; going to look into zoomable images next.
Some more word-smithing and fixing. Pushing now. Thank you all, I'm deeply grateful for all the feedback you provided on such short notice 🙏!