Continuous performance unit testing #15

Merged
merged 14 commits from continuous-performance-unit-testing into master
Dec 16, 2020

Conversation

gunnarmorling
Owner

No description provided.


@hpgrahsl hpgrahsl left a comment

really enjoyed this early stage read! my comments mostly refer to trivial things like typos etc., plus a few questions. nothing serious, I would say.


You can find a complete list of all JFR event types by JDK version in this https://bestsolution-at.github.io/jfr-doc/[nice matrix] created by https://twitter.com/tomsontom[Tom Schindl].
The number of JFR event types is growing constantly; as of JDK 15, there are 157 different ones.

note: I stopped processing here ;-)
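As a companion to the excerpt above on JFR event types: besides the built-in events, applications can define their own with the standard `jdk.jfr` API (JDK 11+). A minimal, self-contained sketch; the `HelloEvent` class and the message text are made up for illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class HelloJfr {

    // Application-defined JFR event; custom events are enabled by default
    @Label("Hello")
    static class HelloEvent extends Event {
        @Label("Message")
        String message;
    }

    // Commits a HelloEvent during a recording, dumps the recording to a file,
    // and reads the event's message back via the JFR consumer API
    static String recordAndRead() throws Exception {
        Path file = Files.createTempFile("hello", ".jfr");
        try (Recording recording = new Recording()) {
            recording.start();

            HelloEvent event = new HelloEvent();
            event.message = "hello, JFR";
            event.commit();

            recording.stop();
            recording.dump(file);
        }
        for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
            // The default event type name is the class's fully qualified name
            if (event.getEventType().getName().endsWith("HelloEvent")) {
                return event.getString("message");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recordAndRead());
    }
}
```

JfrUnit builds on this same event stream, wrapping it with test-friendly assertions.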

@gunnarmorling gunnarmorling force-pushed the continuous-performance-unit-testing branch from 4215aee to 1ce973f on December 12, 2020 10:09
@github-actions

github-actions bot commented Dec 12, 2020

♻️ PR Preview ddf9d24 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

@gunnarmorling gunnarmorling force-pushed the continuous-performance-unit-testing branch from 1ce973f to 886a48c on December 14, 2020 21:15
@gunnarmorling gunnarmorling force-pushed the continuous-performance-unit-testing branch from 886a48c to ed98607 on December 14, 2020 21:16
@gunnarmorling
Owner Author

Hey @hpgrahsl, thanks a lot for that first round of review! I've addressed most of your remarks, and I've also added the missing parts to the post, i.e. it's ready now for a review of the remainder, should you have the time and interest. There's also a rendered preview available now. I'm planning to do some more polishing and also updates to the images, but overall, it's 95% of what I had in mind. Thank you so much!

but also to identify regressions -- bugs in existing functionality introduced by a code change.
The situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones:
How to detect increased response times in a web application?
How to identify decreased throughput?
Contributor

While I understand what you mean, I'd say that these specific examples are a bit off, since they are exactly the kind of metrics which I would NOT test using your new tool?

There's a need for testing at both levels:

  • very micro, such as "This sort operation here can be performed with less than Z memory allocated, even for an N-sized array"
  • system-wide impact, such as the system design and integration being such that overall throughput (on a certain machine) is within X and Y, with some margins.

I'd use a tool like JfrUnit for the first category only; it seems a slippery slope to try abusing it beyond this, and that's probably a claim I'd be uncomfortable with :)

Owner Author

There's a need for testing at both levels

Agreed. I think JfrUnit's testing approach can play a role for both, though. For the second, it wouldn't help you answer the question "overall throughput is within X,Y with some margins" directly, but it would help you to identify potential regressions working against that goal. This should become clearer in the discussion towards the end; perhaps I need to reword here a bit, too.

How to identify decreased throughput?

These aspects are typically hard to test in an automated and reliable way in the development workflow,
as they are dependent on the underlying hardware and the workload of an application.
Contributor

But it's best to split what you're going to measure here into categories. Since JfrUnit is about "indirect" metrics, you can't really verify that not having a certain Sleep(1000) or not having a certain GC event will necessarily give you the throughput you're after.

This post introduces https://github.com/gunnarmorling/jfrunit[JfrUnit], which offers a fresh angle on this topic by supporting assertions not on metrics like latency/throughput themselves, but on _indirect metrics_ which may impact those.
Based on https://openjdk.java.net/jeps/328[JDK Flight Recorder] events, JfrUnit allows you to define and execute assertions, e.g. against expected memory allocation, database I/O, or the number of executed SQL statements, for a given workload.
Starting off from a defined baseline, future failures of such assertions are indicators of potential performance regressions in an application, as a code change may have introduced higher GC pressure,
the retrieval of unnecessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.
Contributor

well, N+1 is meh.. it's certainly good to test against it (and maybe we should look into making this easier with some helper and a blog post), but that could be done in much simpler / traditional ways.

I feel it's a bit distracting from the real potential of the Jfr metrics.

Owner Author

Happy to discuss another example around Hibernate/SQL, if you have one? N+1 was the first thing that came to my mind. It's going to be discussed in a follow-up post (in January), so you got some time to think about it ;)
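To make the assertion style from the excerpt above concrete without depending on the JfrUnit API itself, here's a rough plain-JDK sketch of the underlying idea: run a workload under a JFR recording, aggregate the allocation events, and fail if a baseline is exceeded. The workload and the 200 MB threshold are arbitrary values for illustration; note also that `jdk.ObjectAllocationInNewTLAB`/`jdk.ObjectAllocationOutsideTLAB` events are sampled (one per TLAB refill), so the sum is a lower bound, not an exact total:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class AllocationAssertion {

    // Runs the given workload under a JFR recording and returns the total
    // bytes reported by the (sampled) allocation events
    static long allocatedBytes(Runnable workload) throws Exception {
        Path file = Files.createTempFile("allocation", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ObjectAllocationInNewTLAB");
            recording.enable("jdk.ObjectAllocationOutsideTLAB");
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(file);
        }
        List<RecordedEvent> events = RecordingFile.readAllEvents(file);
        return events.stream()
                .filter(e -> e.getEventType().getName().startsWith("jdk.ObjectAllocation"))
                .mapToLong(e -> e.getLong("allocationSize"))
                .sum();
    }

    public static void main(String[] args) throws Exception {
        long bytes = allocatedBytes(() -> {
            byte[][] sink = new byte[1_000][];
            for (int i = 0; i < sink.length; i++) {
                sink[i] = new byte[100_000]; // ~100 MB of allocations in total
            }
        });
        System.out.println("Allocated (sampled via TLAB events): " + bytes + " bytes");
        // A JfrUnit-style assertion: fail if allocation exceeds the baseline
        if (bytes > 200_000_000L) {
            throw new AssertionError("Allocation regression: " + bytes + " bytes");
        }
    }
}
```

JfrUnit wraps this pattern in JUnit-friendly annotations and assertions, so tests don't have to deal with recording files directly.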

A TLAB is a pre-allocated memory block that's exclusively used by a single thread.
Creating new objects within a TLAB can happen without costly synchronization with other threads.
Once a thread's current TLAB capacity is about to be exceeded by a new object allocation,
a new TLAB will be allocated for that thread.
Contributor

hum, I don't remember the default limits, but I think it's a bit misleading to let people think it will just keep growing, as the whole point of our most important optimisations is to not exceed the limit.

Owner Author

I'm not quite following on that one; what is the "it" in "it will just keep growing"? The TLAB doesn't grow, it will be used up by allocations, and when the next allocation doesn't fit into it, the thread will get a new TLAB for the next set of allocations. You won't avoid that, unless you do zero allocations or allocate in your own self-managed byte[].
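As a side note to the TLAB discussion above (not something JfrUnit itself relies on): HotSpot also exposes a per-thread allocated-bytes counter via `com.sun.management.ThreadMXBean`, which offers a cheap way to observe the per-thread allocation behavior described here, TLAB-backed or not. A small sketch, assuming a HotSpot-based JVM:

```java
import java.lang.management.ManagementFactory;

import com.sun.management.ThreadMXBean;

public class ThreadAllocation {

    // Returns the approximate bytes allocated by the current thread
    // while running the given workload
    static long allocatedByCurrentThread(Runnable workload) {
        // HotSpot-specific subinterface with per-thread allocation counters
        ThreadMXBean threadBean = (ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();
        long before = threadBean.getThreadAllocatedBytes(threadId);
        workload.run();
        return threadBean.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedByCurrentThread(() -> {
            byte[][] sink = new byte[100][];
            for (int i = 0; i < sink.length; i++) {
                sink[i] = new byte[1_000_000]; // ~100 MB in total
            }
        });
        System.out.println("Allocated by current thread: " + bytes + " bytes");
    }
}
```

This isn't a substitute for the JFR allocation events (it gives totals only, with no stack traces or object types), but it's handy for quick sanity checks.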

* *Hardware independent:* You can identify potential regressions even when running tests on hardware which is different from (i.e. less powerful than) the actual production hardware
* *Fast feedback cycle:* Being able to run performance regression tests on developer laptops, even in the IDE, allows for fast identification of potential regressions right during development, instead of having to wait for the results of less frequently executed test runs in a traditional performance test lab environment
* *Robustness:* Tests are robust and not prone to factors such as the load induced by parallel jobs of a CI server or a virtualized/containerized environment
* *Pro-active identification of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actually materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to finding out about such a regression only later on
Contributor

Suggested change
* *Pro-active idenfication of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actual materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to when finding out about such regression only later on
* *Pro-active identification of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actual materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to when finding out about such regression only later on

Contributor

Also might want to consider: in a complete production system there might be more components being integrated than the ones we have in our testbeds.

When measuring the allocation rate of components individually, they might not seem problematic, as it's easier to keep the allocation budget within the cheaper TLAB range. So while their individual allocation rate might not seem to have a problematic impact on throughput measurements performed on simple benchmarks, excessive allocations could still translate into problematic bottlenecks on a more complex system, as the same TLAB region is shared with other libraries.

Owner Author

That's an interesting one; any idea how you'd identify such a bottleneck? I don't think it needs discussion in this post (it's super long already), but I would like to better understand it and see what perhaps could even be done in JfrUnit itself towards that end.

@gunnarmorling
Owner Author

Ok, so I think I've addressed the critical review remarks by all of you. Thank you so much. Latest update just pushed, going to look into zoomable images next.

@gunnarmorling
Owner Author

Some more word-smithing and fixing. Pushing now. Thank you all, I'm deeply grateful for all the feedback you provided on such short notice 🙏 !

@gunnarmorling gunnarmorling merged commit a9fcc4b into master Dec 16, 2020
3 participants