Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize for topology term #1310

Merged
merged 2 commits into from
Dec 13, 2024

Conversation

huww98
Copy link
Contributor

@huww98 huww98 commented Dec 11, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Greatly reduce the CPU and memory usage, especially for drivers assign different topology to each host. Benchmark shows 30x CPU reduction and 21x memory allocation size reduction.

We were performing a stress test recently, and found the bottleneck to be the CPU usage of external-provisioner. From profiling, topologyTerm.hash takes almost all the CPU time (apart from logGRPC).
image
Our setup is a 3k node cluster, each node has a distinct topology. It took about 50s for provisioning 3k immediate binding volumes, despite that we have raised the CPU limit to 10. external-provisioner also took ~3GiB memory, which is far higher than expected.

This is my initial attempt (not included in this PR) to optimize this. I re-implemented the compare function for sort to avoid hash() and added some benchmarks. The benchmark result is:

goos: darwin
goarch: arm64
pkg: github.com/kubernetes-csi/external-provisioner/v5/pkg/controller
cpu: Apple M2 Pro
BenchmarkDedupAndSortZone/baseline-12               1641            643224 ns/op          697858 B/op      15032 allocs/op
BenchmarkDedupAndSortZone/compare-12                1812            642864 ns/op          697004 B/op      15012 allocs/op
BenchmarkDedupAndSortZone/slices.Sort-12            1795            644037 ns/op          696948 B/op      15010 allocs/op
BenchmarkDedupAndSortZone/slices.Compact-12          906           1309465 ns/op          540267 B/op       8060 allocs/op
BenchmarkDedupAndSortHost/baseline-12                 50          22663181 ns/op        27017105 B/op     530151 allocs/op
BenchmarkDedupAndSortHost/compare-12                 156           7625107 ns/op         5039063 B/op      60242 allocs/op
BenchmarkDedupAndSortHost/slices.Sort-12             156           7646612 ns/op         5040749 B/op      60260 allocs/op
BenchmarkDedupAndSortHost/slices.Compact-12          193           6147562 ns/op         3457850 B/op      37009 allocs/op

By comparing compare and baseline for BenchmarkDedupAndSortHost, re-implementing compare already gives 3x speedup and 5x memory allocation size reduction. But when I try to replace the hash() func completely by slices.Compact, the host case is faster, but the zone case becomes slower.

While I'm trying to find more opportunity to optimize, I do another profile for Host/slices.Sort case:
image
Making slice, accessing map, and sorting keys can all be avoid if we define topologyTerm as a sorted slice instead of map (the first commit of this PR). The benchmark result from the first commit is:

BenchmarkDedupAndSortZone/slices.Sort-12                    1975            527407 ns/op          747079 B/op      15014 allocs/op
BenchmarkDedupAndSortZone/slices.Compact-12                10000            114418 ns/op           74904 B/op         11 allocs/op
BenchmarkDedupAndSortHost/slices.Sort-12                     652           1847723 ns/op         2948386 B/op      30078 allocs/op
BenchmarkDedupAndSortHost/slices.Compact-12                 1597            739441 ns/op         1250314 B/op       9002 allocs/op

Host/slices.Sort case is speedup several times again, the Zone/slices.Sort case is also improved because of the more efficient hash(). With the very cheap compare(), using slices.Compact to deduplicate becomes beneficial for both case!
So, with the second commit, the hash() is removed completely.

Note that for fair comparation, the benchmark also includes toCSITopology invocation, because if topologyTerm is a slice, we need to convert it back to map for GRPC. Actually, after this refactor, most of the time is spent on toCSITopology(). If I remove it, the result is:

BenchmarkDedupAndSortZone-12               10725            111114 ns/op           73753 B/op          1 allocs/op
BenchmarkDedupAndSortHost-12                4029            287086 ns/op           73830 B/op          3 allocs/op

This is ~78x speedup compared to the baseline, with zero allocation in deduplication and sort step!

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Dec 11, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 11, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @huww98. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 11, 2024
pkg/controller/topology.go Show resolved Hide resolved
pkg/controller/topology.go Show resolved Hide resolved
pkg/controller/topology.go Show resolved Hide resolved
@@ -39,8 +39,14 @@ import (
"k8s.io/klog/v2"
)

type topologyKey struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it strange to call something a Key if it has a Key field inside. topologyKeyValue? topologyItem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to topologySegment, according to CSI spec.

@jsafrane
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 12, 2024
Compared with map, slice is a lot faster. And we can avoid allocation in the compare() func, which is a very hot path.
This allow us to get rid of the hash() func, which is slow and do too many allocations.
@huww98 huww98 force-pushed the optimize-topologyTerm branch from 266c667 to bd3b4a1 Compare December 12, 2024 17:35
@jsafrane
Copy link
Contributor

/lgtm
/approve
I am really glad we have a very good unit test coverage around.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: huww98, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2024
@k8s-ci-robot k8s-ci-robot merged commit c79d156 into kubernetes-csi:master Dec 13, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants