
Reduce in memory queue limit by 16x #2455

Merged: 1 commit, Dec 23, 2024

Conversation

jackkleeman
Contributor

This limit previously applied across all partitions, but since 1.1 it's per partition, and it is 350M per partition. Those entries are not initially used, but as you scale to 1m invocations per partition, all the memory pages in the queue's ring buffer are dirtied and contribute to RSS. This leads to 9G of usage on a 24 partition node.

This PR reduces the limit by 16x to 21M per partition, or 562M on a 24 partition node, which it will reach after 1.5 million invocations. That is a more manageable figure, even if it still appears as a 'leak' until that amount is reached.
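
As a rough check of the arithmetic above, here is a minimal sketch that multiplies the per-partition figure by the partition count. The helper name `node_queue_rss_mb` is hypothetical and not taken from the Restate codebase, and the inputs are the rounded figures quoted in the description, so the results only approximate the 9G and 562M totals, which were presumably derived from unrounded values.

```rust
/// Hypothetical helper (not from the Restate codebase): worst-case RSS in
/// megabytes once every partition's queue buffer has been fully touched.
fn node_queue_rss_mb(per_partition_mb: f64, partitions: u32) -> f64 {
    per_partition_mb * f64::from(partitions)
}

fn main() {
    let partitions = 24;
    let old_limit_mb = 350.0; // rounded per-partition figure quoted above
    let new_limit_mb = old_limit_mb / 16.0; // the 16x reduction

    println!(
        "old: {:.0}M per node",
        node_queue_rss_mb(old_limit_mb, partitions)
    );
    println!(
        "new: {:.1}M per partition, {:.0}M per node",
        new_limit_mb,
        node_queue_rss_mb(new_limit_mb, partitions)
    );
}
```

The point of the sketch is only that the node-level footprint scales linearly with both the per-partition limit and the partition count, so a 16x smaller limit gives a 16x smaller ceiling.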

Contributor

@AhmedSoliman AhmedSoliman left a comment


Great work investigating and proposing an improvement to this @jackkleeman. Changes look good to me.

@jackkleeman jackkleeman merged commit 1ac1f70 into restatedev:main Dec 23, 2024
11 checks passed
@jackkleeman jackkleeman deleted the in-memory-queue-limit branch December 23, 2024 14:34
jackkleeman added a commit that referenced this pull request Dec 23, 2024