Continuous performance unit testing #15

Merged
merged 14 commits from continuous-performance-unit-testing into master
Dec 16, 2020

Conversation

gunnarmorling
Owner

No description provided.


@hpgrahsl hpgrahsl left a comment

really enjoyed this early stage read! my comments mostly refer to trivial things like typos etc., plus a few questions. nothing serious, I would say.


You can find a complete list of all JFR event types by JDK version in this https://bestsolution-at.github.io/jfr-doc/[nice matrix] created by https://twitter.com/tomsontom[Tom Schindl].
The number of JFR event types is growing constantly; as of JDK 15, there are 157 different ones.

note: I stopped processing here ;-)
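As a companion to the excerpt above on JFR event types: besides the built-in events, applications can define their own with the standard `jdk.jfr` API (JDK 11+). A minimal, self-contained sketch; the `HelloEvent` class and the message text are made up for illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class HelloJfr {

    // Application-defined JFR event; custom events are enabled by default
    @Label("Hello")
    static class HelloEvent extends Event {
        @Label("Message")
        String message;
    }

    // Commits a HelloEvent during a recording, dumps the recording to a file,
    // and reads the event's message back via the JFR consumer API
    static String recordAndRead() throws Exception {
        Path file = Files.createTempFile("hello", ".jfr");
        try (Recording recording = new Recording()) {
            recording.start();

            HelloEvent event = new HelloEvent();
            event.message = "hello, JFR";
            event.commit();

            recording.stop();
            recording.dump(file);
        }
        for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
            // The default event type name is the class's fully qualified name
            if (event.getEventType().getName().endsWith("HelloEvent")) {
                return event.getString("message");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recordAndRead());
    }
}
```

JfrUnit builds on this same event stream, wrapping it with test-friendly assertions.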

@gunnarmorling gunnarmorling force-pushed the continuous-performance-unit-testing branch from 4215aee to 1ce973f on December 12, 2020 10:09
@github-actions

github-actions bot commented Dec 12, 2020

♻️ PR Preview ddf9d24 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

@gunnarmorling gunnarmorling force-pushed the continuous-performance-unit-testing branch from 1ce973f to 886a48c on December 14, 2020 21:15
@gunnarmorling gunnarmorling force-pushed the continuous-performance-unit-testing branch from 886a48c to ed98607 on December 14, 2020 21:16
@gunnarmorling
Owner Author

Hey @hpgrahsl, thanks a lot for that first round of review! I've addressed most of your remarks, and I've also added the missing parts to the post, i.e. it's ready now for a review of the remainder, should you have the time and interest. There's also a rendered preview available now. I'm planning to do some more polishing and also updates to the images, but overall, it's 95% of what I had in mind. Thank you so much!

but also to identify regressions -- bugs in existing functionality introduced by a code change.
The situation looks different though when it comes to regressions related to non-functional requirements, in particular performance-related ones:
How to detect increased response times in a web application?
How to identify decreased throughput?
Contributor

While I understand what you mean, I'd say that these specific examples are a bit off, since they are exactly the kind of metrics which I would NOT test using your new tool?

There's a need for testing at both levels:

  • very micro, such as "This sort operation here can be performed with less than Z memory allocated, even for an N-sized array"
  • system-wide impact, such as the system design and integration being such that overall throughput (on a certain machine) is within X and Y, with some margins.

I'd use a tool like JfrUnit for the first category only; it seems a slippery slope to try abusing it beyond this, and that's probably a claim I'd be uncomfortable with :)

Owner Author

There's a need for testing at both levels

Agreed. I think JfrUnit's testing approach can play a role for both, though. For the second, it wouldn't help you answer the question "overall throughput is within X,Y with some margins" directly, but it would help you to identify potential regressions working against that goal. This should become clearer in the discussion towards the end; perhaps I need to reword here a bit, too.

How to identify decreased throughput?

These aspects are typically hard to test in an automated and reliable way in the development workflow,
as they are dependent on the underlying hardware and the workload of an application.
Contributor

But it's best to split what you're going to measure here into categories. Since JfrUnit is about "indirect" metrics, you can't really verify that not having a certain Sleep(1000) or not having a certain GC event will necessarily give you the throughput you're after.

This post introduces https://github.com/gunnarmorling/jfrunit[JfrUnit], which offers a fresh angle on this topic by supporting assertions not on metrics like latency/throughput themselves, but on _indirect metrics_ which may impact those.
Based on https://openjdk.java.net/jeps/328[JDK Flight Recorder] events, JfrUnit allows you to define and execute assertions, e.g. against expected memory allocation, database I/O, or the number of executed SQL statements, for a given workload.
Starting off from a defined baseline, future failures of such assertions are indicators of potential performance regressions in an application, as a code change may have introduced higher GC pressure,
the retrieval of unnecessary data from the database, or SQL problems commonly induced by ORM tools, like N+1 SELECT statements.
Contributor

well, N+1 is meh.. it's certainly good to test against it (and maybe we should look into making this easier with some helper and a blog post), but that could be done in much simpler / traditional ways.

I feel it's a bit distracting from the real potential of the Jfr metrics.

Owner Author

Happy to discuss another example around Hibernate/SQL, if you have one? N+1 was the first thing that came to my mind. It's going to be discussed in a follow-up post (in January), so you got some time to think about it ;)
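To make the assertion style from the excerpt above concrete without depending on the JfrUnit API itself, here's a rough plain-JDK sketch of the underlying idea: run a workload under a JFR recording, aggregate the allocation events, and fail if a baseline is exceeded. The workload and the 200 MB threshold are arbitrary values for illustration; note also that `jdk.ObjectAllocationInNewTLAB`/`jdk.ObjectAllocationOutsideTLAB` events are sampled (one per TLAB refill), so the sum is a lower bound, not an exact total:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class AllocationAssertion {

    // Runs the given workload under a JFR recording and returns the total
    // bytes reported by the (sampled) allocation events
    static long allocatedBytes(Runnable workload) throws Exception {
        Path file = Files.createTempFile("allocation", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ObjectAllocationInNewTLAB");
            recording.enable("jdk.ObjectAllocationOutsideTLAB");
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(file);
        }
        List<RecordedEvent> events = RecordingFile.readAllEvents(file);
        return events.stream()
                .filter(e -> e.getEventType().getName().startsWith("jdk.ObjectAllocation"))
                .mapToLong(e -> e.getLong("allocationSize"))
                .sum();
    }

    public static void main(String[] args) throws Exception {
        long bytes = allocatedBytes(() -> {
            byte[][] sink = new byte[1_000][];
            for (int i = 0; i < sink.length; i++) {
                sink[i] = new byte[100_000]; // ~100 MB of allocations in total
            }
        });
        System.out.println("Allocated (sampled via TLAB events): " + bytes + " bytes");
        // A JfrUnit-style assertion: fail if allocation exceeds the baseline
        if (bytes > 200_000_000L) {
            throw new AssertionError("Allocation regression: " + bytes + " bytes");
        }
    }
}
```

JfrUnit wraps this pattern in JUnit-friendly annotations and assertions, so tests don't have to deal with recording files directly.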

A TLAB is a pre-allocated memory block that's exclusively used by a single thread.
Creating new objects within a TLAB can happen without costly synchronization with other threads.
Once a thread's current TLAB capacity is about to be exceeded by a new object allocation,
a new TLAB will be allocated for that thread.
Contributor

hum, I don't remember the default limits, but I think it's a bit misleading to let people think it will just keep growing, as the whole point of our most important optimisations is to not exceed the limit.

Owner Author

I'm not quite following on that one; what is the "it" in "it will just keep growing"? The TLAB doesn't grow, it will be used up by allocations, and when the next allocation doesn't fit into it, the thread will get a new TLAB for the next set of allocations. You won't avoid that, unless you do zero allocations or allocate in your own self-managed byte[].
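As a side note to the TLAB discussion above (not something JfrUnit itself relies on): HotSpot also exposes a per-thread allocated-bytes counter via `com.sun.management.ThreadMXBean`, which offers a cheap way to observe the per-thread allocation behavior described here, TLAB-backed or not. A small sketch, assuming a HotSpot-based JVM:

```java
import java.lang.management.ManagementFactory;

import com.sun.management.ThreadMXBean;

public class ThreadAllocation {

    // Returns the approximate bytes allocated by the current thread
    // while running the given workload
    static long allocatedByCurrentThread(Runnable workload) {
        // HotSpot-specific subinterface with per-thread allocation counters
        ThreadMXBean threadBean = (ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();
        long before = threadBean.getThreadAllocatedBytes(threadId);
        workload.run();
        return threadBean.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedByCurrentThread(() -> {
            byte[][] sink = new byte[100][];
            for (int i = 0; i < sink.length; i++) {
                sink[i] = new byte[1_000_000]; // ~100 MB in total
            }
        });
        System.out.println("Allocated by current thread: " + bytes + " bytes");
    }
}
```

This isn't a substitute for the JFR allocation events (it gives totals only, with no stack traces or object types), but it's handy for quick sanity checks.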

* *Hardware independent:* You can identify potential regressions even when running tests on hardware which is different from (i.e. less powerful than) the actual production hardware
* *Fast feedback cycle:* Being able to run performance regression tests on developer laptops, even in the IDE, allows for fast identification of potential regressions right during development, instead of having to wait for the results of less frequently executed test runs in a traditional performance test lab environment
* *Robustness:* Tests are robust and not prone to factors such as the load induced by parallel jobs of a CI server or a virtualized/containerized environment
* *Pro-active identification of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actually materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to finding out about such a regression only later on
Contributor

Suggested change
* *Pro-active idenfication of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actual materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to when finding out about such regression only later on
* *Pro-active identification of performance issues:* Asserting a metric like memory allocation can help to identify future performance problems before they actual materialize; while the additional allocation rate may make no difference with the system's load as of today, it may negatively impact latency and throughput as the system reaches its limits with increased load; being able to identify the increased allocation rate early on allows for a more efficient handling of the situation while working on the code, compared to when finding out about such regression only later on

Contributor

Also might want to consider: in a complete production system there might be more components being integrated than the ones we have in our testbeds.

When measuring the allocation rate of components individually, they might not seem problematic, as it's easier to keep the allocation budget within the cheaper TLAB range. So while their individual allocation rate might not seem to have a problematic impact on throughput measurements performed on simple benchmarks, excessive allocations could still translate into problematic bottlenecks on a more complex system, as the same TLAB region is shared with other libraries.

Owner Author

That's an interesting one; any idea how you'd identify such a bottleneck? I don't think it needs discussion in this post (it's super long already), but I would like to better understand it and see what perhaps could even be done in JfrUnit itself towards that end.

@gunnarmorling
Owner Author

Ok, so I think I've addressed the critical review remarks by all of you. Thank you so much. Latest update just pushed, going to look into zoomable images next.

@gunnarmorling
Owner Author

Some more word-smithing and fixing. Pushing now. Thank you all, I'm deeply grateful for all the feedback you provided on such short notice 🙏 !

@gunnarmorling gunnarmorling merged commit a9fcc4b into master Dec 16, 2020
3 participants