feat: force snapshot under memory pressure #25726

Closed
wants to merge 1 commit into from

praveen-influx
Contributor

  • The core of the change is a new method, `force_flush_buffer`, in the `Wal` trait. This gives a handle to choose when to kick off a snapshot.
  • A higher-level background loop checks the overall table buffer size every `N` seconds and calls `force_flush_buffer` if the size exceeds a threshold (`X`). Both `N` and `X` are configurable through the CLI; `N` defaults to 10s and `X` defaults to 70%.
  • Some refactoring ensures that the calls made via the `Wal` trait to flush the buffer and clean up any snapshot are reused across both branches (forced snapshot and normal WAL buffer flush).

closes: #25685
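
A minimal sketch of the loop described above, for readers who want to see the shape of the check. It is not the code in this PR: the `Wal` trait here is a synchronous one-method stand-in, and `over_threshold`, `check_once`, `run_checks`, `CountingWal`, the constant names, and the idea of comparing against a byte budget are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Defaults taken from the PR description (N = 10s, X = 70%).
/// The constant names are hypothetical.
pub const DEFAULT_CHECK_INTERVAL: Duration = Duration::from_secs(10);
pub const DEFAULT_MEM_THRESHOLD_PERCENT: u64 = 70;

/// Simplified, synchronous stand-in for the `Wal` trait (assumption: the
/// real trait has more methods and a different signature).
pub trait Wal {
    /// Flush the WAL buffer and kick off a snapshot on demand.
    fn force_flush_buffer(&self);
}

/// True when `used_bytes` is strictly above `threshold_percent` of
/// `budget_bytes`. Integer math in u128 avoids float rounding and overflow.
pub fn over_threshold(used_bytes: u64, budget_bytes: u64, threshold_percent: u64) -> bool {
    u128::from(used_bytes) * 100 > u128::from(budget_bytes) * u128::from(threshold_percent)
}

/// One iteration of the check: force a snapshot only when over the threshold.
pub fn check_once(wal: &dyn Wal, used_bytes: u64, budget_bytes: u64, threshold_percent: u64) -> bool {
    let forced = over_threshold(used_bytes, budget_bytes, threshold_percent);
    if forced {
        wal.force_flush_buffer();
    }
    forced
}

/// Runs `ticks` checks, sleeping `interval` before each one, and returns how
/// many of them forced a snapshot. `buffer_size` reports the current overall
/// table buffer size in bytes.
pub fn run_checks(
    wal: &dyn Wal,
    interval: Duration,
    ticks: usize,
    budget_bytes: u64,
    threshold_percent: u64,
    mut buffer_size: impl FnMut() -> u64,
) -> usize {
    let mut forced = 0;
    for _ in 0..ticks {
        thread::sleep(interval);
        if check_once(wal, buffer_size(), budget_bytes, threshold_percent) {
            forced += 1;
        }
    }
    forced
}

/// Test double that only counts forced flushes.
#[derive(Default)]
pub struct CountingWal {
    pub forced: AtomicUsize,
}

impl Wal for CountingWal {
    fn force_flush_buffer(&self) {
        self.forced.fetch_add(1, Ordering::SeqCst);
    }
}
```

Keeping the threshold comparison in a pure function separate from the sleep loop makes the decision testable without waiting on a timer.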

Member

@pauldix pauldix left a comment

Left a comment about the logic; I think there's one bit to be cleaned up. Will have a quick call to walk through.

/// Interval to check buffer size (and compare with `force_snapshot_mem_threshold`)
#[clap(
long = "force-snapshot-interval",
env = "INFLUXDB3_FORCE_SNAPSHOT_INTERVAL",
Member

I don't think this needs to be a configuration option. Just have it as a constant. You can then either initialize with that or, in tests, with something smaller.
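
One way to read this suggestion, sketched with hypothetical names (`FORCE_SNAPSHOT_CHECK_INTERVAL`, `MemoryChecker`, and `with_interval` are not from the PR): production code always initializes from the constant, and tests pass a shorter interval through a second constructor.

```rust
use std::time::Duration;

/// Fixed production interval; 10s is the default named in the PR description.
pub const FORCE_SNAPSHOT_CHECK_INTERVAL: Duration = Duration::from_secs(10);

/// Hypothetical holder for the interval used by the background check.
pub struct MemoryChecker {
    check_interval: Duration,
}

impl MemoryChecker {
    /// Production path: no CLI flag or env var, just the constant.
    pub fn new() -> Self {
        Self::with_interval(FORCE_SNAPSHOT_CHECK_INTERVAL)
    }

    /// Test path: initialize with something smaller so tests run quickly.
    pub fn with_interval(check_interval: Duration) -> Self {
        Self { check_interval }
    }

    pub fn check_interval(&self) -> Duration {
        self.check_interval
    }
}

impl Default for MemoryChecker {
    fn default() -> Self {
        Self::new()
    }
}
```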

Contributor Author

I've removed this interval in the other PR, although I was planning to use it in e2e tests.

influxdb3_wal/src/lib.rs (resolved)
influxdb3_wal/src/object_store.rs (resolved)
@@ -118,6 +123,29 @@ impl SnapshotTracker {
})
}

fn snapshot_up_to_last_wal_period(&mut self) -> Option<SnapshotInfo> {
Member

This function name is a bit off. During normal snapshot operation, we want to snapshot flush size WAL periods but leave behind flush size / 2 periods. That means we wait until we have flush size + flush size / 2 periods and then take the oldest flush size periods, leaving behind flush size / 2.

If we're in a situation where we can't flush the WAL and leave behind periods because the timestamps of the data are all interleaved, we flush everything except the most recent WAL period. We do this because the buffer snapshots what is in it and then puts the most recent period into the snapshot.

In the case of forcing a snapshot, we don't need to check `should_snapshot`; we just treat it like the situation where we have 3x the flush size. So we want to snapshot everything minus the last WAL period.
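
The two policies in this comment can be sketched as follows. This is a simplified model rather than the real `SnapshotTracker`: periods are reduced to sequence numbers, timestamps (and so the interleaved-data fallback at 3x the flush size) are not modelled, and `PeriodTracker` and its methods are hypothetical names.

```rust
/// Simplified model of a snapshot tracker: WAL periods are just sequence
/// numbers, stored oldest first.
pub struct PeriodTracker {
    periods: Vec<u64>,
    flush_size: usize,
}

impl PeriodTracker {
    pub fn new(flush_size: usize) -> Self {
        Self { periods: Vec::new(), flush_size }
    }

    pub fn add_period(&mut self, sequence_number: u64) {
        self.periods.push(sequence_number);
    }

    /// Periods still tracked (not yet handed to a snapshot).
    pub fn remaining(&self) -> &[u64] {
        &self.periods
    }

    /// Normal path: wait for flush_size + flush_size / 2 periods, then take
    /// the oldest flush_size of them, leaving flush_size / 2 behind.
    pub fn snapshot(&mut self) -> Option<Vec<u64>> {
        let trigger = self.flush_size + self.flush_size / 2;
        if self.flush_size == 0 || self.periods.len() < trigger {
            return None;
        }
        Some(self.periods.drain(..self.flush_size).collect())
    }

    /// Forced path: skip the trigger check and take everything except the
    /// most recent period. Returns None when there is nothing older than it.
    pub fn force_snapshot(&mut self) -> Option<Vec<u64>> {
        if self.periods.len() < 2 {
            return None;
        }
        let keep_from = self.periods.len() - 1;
        Some(self.periods.drain(..keep_from).collect())
    }
}
```

With a flush size of 4, the normal path does nothing until six periods exist and then takes the oldest four; the forced path with three periods takes the oldest two and keeps the last.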

Contributor Author

Thanks for the explanation - I've addressed it here in the other PR

@praveen-influx
Contributor Author

Thanks @pauldix - I'll close this in favour of #25727, it takes a different approach so no point in continuing with the same PR. I've also addressed your comments that are still valid in the other PR.
