main, view: Pair view builder drain with its start #21909

dawmd · 2024-12-12T18:50:28Z

This PR reverts a patch that solved a problem related to an initialization
order. Unfortunately, after merging it, we ran into another one -- after
changing the place where group0_service was initialized, it ended up
being destroyed AFTER the CDC generation service would. Since CDC
generations are accessed in storage_service::topology_state_load():

for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) {
    rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id);
    co_await _cdc_gens.local().handle_cdc_generation(gen_id);

we started getting the following failure:

Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed.

To solve the issue, we revert the changes and approach the problem
in another way: before that patch, we only drained the view builder
when stopping it, which was paired with its construction. However,
since the view builder depends on group0_service, which is initialized
AFTER the view builder is constructed, that could lead to segmentation
faults. To amend it, we pair draining the view builder with its start, while
leaving the lambda destroying it paired with its construction. This way
we should have a proper initialization order, avoid bugs, and reinforce
our understanding of how Scylla starts up.

Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too.

Fixes #20772
Fixes #21534

scylladb-promoter · 2024-12-12T19:20:10Z

🔴 CI State: FAILURE

Build Details:

Duration: 28 min
Builder: i-0648fc04d5a267f56 (m5d.8xlarge)

scylladb-promoter · 2024-12-14T07:15:09Z

🔴 CI State: FAILURE

✅ - Build
✅ - Docker Test
✅ - dtest with topology changes
✅ - dtest
✅ - dtest with tablets
✅ - Offline-installer Artifact Tests
❌ - Unit Tests

Failed Tests (1/36079):

test_view_build_status_cleanup_on_remove_node.dev.1 🔍

Build Details:

Duration: 3 hr 53 min
Builder: spider8.cloudius-systems.com

dawmd · 2024-12-16T12:21:12Z

test_view_build_status_cleanup_on_remove_node.dev.1 🔍

Looks like #21628. Rerunning CI

scylladb-promoter · 2024-12-16T17:56:49Z

🟢 CI State: SUCCESS

✅ - Build
✅ - Docker Test
✅ - dtest with tablets
✅ - dtest
✅ - Offline-installer Artifact Tests
✅ - dtest with topology changes
✅ - Unit Tests

Build Details:

Duration: 5 hr 34 min
Builder: i-05fc31621ab76222d (m5d.8xlarge)

piodul

Looks good, I have only one comment.

piodul · 2024-12-17T10:12:12Z

db/view/view.cc

 future<> view_builder::stop() {
    vlogger.info("Stopping view builder");
-    return drain();
+    co_return;
 }


Nit: put an assertion here which checks that drain() was indeed run (it needs to be run in order for the view builder to be properly stopped) - for example, you could check if _as.abort_requested() is true

dawmd · 2024-12-17T11:30:15Z

Version 2:

Added an assert checking if we've drained a view builder before stopping it.

scylladb-promoter · 2024-12-17T19:34:55Z

🔴 CI State: FAILURE

✅ - Build
✅ - Docker Test
✅ - dtest with topology changes
✅ - Offline-installer Artifact Tests
✅ - dtest with tablets
✅ - dtest
❌ - Unit Tests

Failed Tests (2/36177):

Build Details:

Duration: 8 hr 4 min
Builder: i-004ca5bdcc17620cf (m5ad.8xlarge)

dawmd · 2024-12-19T11:50:53Z

* [test_mv_topology_change.dev.2](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/13897/testReport/junit/topology_custom/test_mv_topology_change/Tests___Unit_Tests___test_mv_topology_change_dev_2) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_mv_topology_change.dev.2)

* [test_mv_topology_change.dev.3](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/13897/testReport/junit/topology_custom/test_mv_topology_change/Tests___Unit_Tests___test_mv_topology_change_dev_3) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_mv_topology_change.dev.3)

#21912

Rerunning CI

scylladb-promoter · 2024-12-19T20:04:47Z

🔴 CI State: FAILURE

✅ - Build
✅ - Docker Test
✅ - dtest with tablets
✅ - Offline-installer Artifact Tests
✅ - dtest with topology changes
❌ - dtest
✅ - Unit Tests

Failed Tests (1/36344):

test_add_node_with_large_partition1 🔍

Build Details:

Duration: 8 hr 13 min
Builder: spider4.cloudius-systems.com

piodul · 2024-12-20T16:35:34Z

Failed Tests (1/36344):

test_add_node_with_large_partition1 🔍

I believe it's #15715. Retriggering CI

scylladb-promoter · 2024-12-20T20:23:51Z

🟢 CI State: SUCCESS

✅ - Build
✅ - Docker Test
✅ - dtest
✅ - Offline-installer Artifact Tests
✅ - dtest with tablets
✅ - dtest with topology changes
✅ - Unit Tests

Build Details:

Duration: 3 hr 47 min
Builder: spider2.cloudius-systems.com

xemul · 2024-12-24T08:01:42Z

db/view/view.cc

-    return drain();
+    // If the view builder has been started, verify that it's also been drained before we stop it.
+    SCYLLA_ASSERT(!_started_flag || _as.abort_requested());
+    co_return;


Before these changes, we only drained the view builder when stopping it,

No, we also drain v.b. in storage_service::do_drain(). Why is it not enough?

which was paired with its construction. However, since the view
builder depends on group0_service, which is initialized AFTER the view
builder is constructed, that could lead to segmentation faults.

Then group0 shouldn't be initialized after a service that depends on it. Why can't group0 service be started before view builder?

No, we also drain v.b. in storage_service::do_drain(). Why is it not enough?

Between the initialisation of the view builder and the construction of the deferred lambda responsible for calling storage_service::drain_on_shutdown() there are a number of other services being initialised. If any of those initalisations fails, not having drained the view builder will prevent Scylla from shutting down. I've only realised it now, but in the current version of the code, we already call drain on the view builders twice for that reason (once in the storage service code and once in view_builder::stop()).

Then group0 shouldn't be initialized after a service that depends on it. Why can't group0 service be started before view builder?

It is still started before it. My wording in the commit message is a bit vague, but what happens is this:

the view builder is constructed,

group0_service starts,

the view builder starts.

The problem I was referring to in the commit is #20772: view building could still be ongoing after stopping group0_service; when it ends, it tries to access group0, which causes a segmentation fault (cf. #20772 (comment)). The fix we applied was 7bad837, which moved the initialisation of group0 before the construction of the view builders. While that helped with the segmentation fault issue, another one popped up: #21534. This PR tries to solve both the right way.

I'll update the commit message because after two weeks off, I didn't understand it immediately myself. Thanks for your feedback, Pavel!

Between the initialisation of the view builder and the construction of the deferred lambda responsible for calling storage_service::drain_on_shutdown() there are a number of other services being initialised. If any of those initalisations fails, not having drained the view builder will prevent Scylla from shutting down.

For that, view_builder::stop() explicitly calls .drain(), so even if drain-on-shutdown wasn't scheduled, view_builder::drain() would still be invoked.

It is still started before it. My wording in the commit message is a bit vague, but what happens is this:

the view builder is constructed,

group0_service starts,

the view builder starts.

Ah, I see. And you need to undo the item #3 before undoing item #2 on stop.

However, I'm looking into code and see that the sequence is different:

group0_service.start().get(); auto stop_group0_service = defer_verbose_shutdown("group 0 service", [&group0_service] { sl_controller.local().abort_group0_operations(); group0_service.abort().get(); }); utils::get_local_injector().inject("stop_after_starting_group0_service", [] { std::raise(SIGSTOP); }); // Set up group0 service earlier since it is needed by group0 setup just below ss.local().set_group0(group0_service); supervisor::notify("starting view update generator"); view_update_generator.start(std::ref(db), std::ref(proxy), std::ref(stop_signal.as_sharded_abort_source())).get(); auto stop_view_update_generator = defer_verbose_shutdown("view update generator", [] { view_update_generator.stop().get(); }); supervisor::notify("starting the view builder"); view_builder.start(std::ref(db), std::ref(sys_ks), std::ref(sys_dist_ks), std::ref(mm_notifier), std::ref(view_update_generator), std::ref(group0_client), std::ref(qp)).get(); auto stop_view_builder = defer_verbose_shutdown("view builder", [cfg] { view_builder.stop().get(); });

so it's rather

grou0_service starts

view_builder is constructed

...

view_builder starts

on stop view_builder is stopped before group0_service is

The patch solved a problem related to an initialization order (scylladb#20772), but we ran into another one: scylladb#21534. After moving the initialization of group0_service, it ended up being destroyed AFTER the CDC generation service would. Since CDC generations are accessed in `storage_service::topology_state_load()`: ``` for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) { rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id); co_await _cdc_gens.local().handle_cdc_generation(gen_id); ``` we started getting the following failure: ``` Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed. ``` We're reverting the patch to go back to a more stable version of Scylla and in the following commit, we'll solve the original issue in a more systematic way. This reverts commit 7bad837.

What happens in these changes is the following: * We "split" `view_builder::stop()` and `view_builder::drain()` to avoid implicit calls to `drain()`, which before this commit was called by `stop()`. The goal is to have a clearer understanding what happens when. If the user wants to call both, they should do it explicitly and in the right order. Note that `stop()` becomes almost a NO-OP: the only thing it does is verify that the precondition (the view builder having been drained and no prior calls to `stop()` having been issued) is satisfied and log that the view builder is being stopped. * More importantly, we pair draining the view builder with its start. To better understand what was done and why, let's first look at the situation before this commit and the context of it: (a) The following things happened in order: 1. The view builder would be constructed. 2. Right after that, a deferred lambda would be created to stop the view builder during shutdown. 3. group0_service would be started. 4. A deferred lambda stopping group0_service would be created right after that. 5. The view builder would be started. (b) Because the view builder depends on group0_client, it couldn't be started before starting group0_service. On the other hand, other services depend on the view builder, e.g. the stream manager. That makes changing the order of initialization a difficult problem, so we want to avoid doing that unless we're sure it's the right choice. (c) Since the view builder uses group0_client, there was a possibility of running into a segmentation fault issue in the following scenario: 1. A call to `view_builder::mark_view_build_success()` is issued. 2. We stop group0_service. 3. `view_builder::mark_view_build_success()` calls `announce_with_raft()`, which leads to a use-after-free because group0_service has already been destroyed. This very scenario took place in scylladb#20772. Initially, we decided to solve the issue by initializing group0_service a bit earlier (scylladb/scylladb@7bad837). Unfortunately, it led to other issues described in scylladb#21534. We reverted that change in the previous commit. These changes are the second attempt to the problem where we want to solve it in a safer manner. The solution we came up with is to pair the start of the view builder with a deferred lambda that deinitializes it by calling `view_builder::drain()`. No other component of the system should be able to use the view builder anymore, so it's safe to do that. Furthermore, that pairing makes the analysis of initialization/deinitialization order much easier. We also solve the aformentioned use-after-free issue because the view builder itself will no longer attempt to use group0_client. Note that we still pair a deferred lambda calling `view_builder::stop()` with the construction of the view builder. Fixes scylladb#20772

dawmd · 2025-01-07T15:11:21Z

Version 3:

Improved commit messages. They should describe the problem and the solution in detail now.

scylladb-promoter · 2025-01-08T00:16:26Z

🔴 CI State: FAILURE

✅ - Build
✅ - dtest with tablets
✅ - dtest
✅ - dtest with topology changes
❌ - Offline-installer Artifact Tests
✅ - Docker Test
❌ - Unit Tests

Build Details:

Duration: 8 hr 54 min
Builder: i-0d23222a5fc004d5d (m5ad.8xlarge)

dawmd requested review from nyh and piodul as code owners December 12, 2024 18:50

dawmd requested a review from xemul December 12, 2024 18:50

dawmd self-assigned this Dec 12, 2024

dawmd added the backport/6.2 should be backported to 6.2 label Dec 12, 2024

scylladbbot added status/ci in progress and removed status/ci in progress labels Dec 14, 2024

scylladbbot added status/ci in progress and removed status/ci in progress labels Dec 16, 2024

piodul reviewed Dec 17, 2024

View reviewed changes

dawmd force-pushed the mv_init_fix branch from 94407a8 to e99921f Compare December 17, 2024 11:29

scylladbbot added status/ci in progress and removed status/ci in progress labels Dec 17, 2024

scylladbbot added status/ci in progress and removed status/ci in progress labels Dec 19, 2024

piodul mentioned this pull request Dec 20, 2024

[dtest] Tests / Sanity Tests / update_cluster_layout_tests.TestUpdateClusterLayout.test_add_node_with_large_partition2 is flakey #15715

Open

scylladbbot added status/ci in progress and removed status/ci in progress labels Dec 20, 2024

xemul reviewed Dec 24, 2024

View reviewed changes

dawmd force-pushed the mv_init_fix branch from e99921f to 3bfa167 Compare January 7, 2025 15:10

dawmd requested review from xemul and piodul January 7, 2025 15:11

scylladbbot added status/ci in progress and removed status/ci in progress labels Jan 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

main, view: Pair view builder drain with its start #21909

main, view: Pair view builder drain with its start #21909

dawmd commented Dec 12, 2024 •

edited

Loading

scylladb-promoter commented Dec 12, 2024

scylladb-promoter commented Dec 14, 2024

dawmd commented Dec 16, 2024

scylladb-promoter commented Dec 16, 2024

piodul left a comment

piodul Dec 17, 2024

dawmd commented Dec 17, 2024

scylladb-promoter commented Dec 17, 2024

dawmd commented Dec 19, 2024

scylladb-promoter commented Dec 19, 2024

piodul commented Dec 20, 2024

Failed Tests (1/36344):

scylladb-promoter commented Dec 20, 2024

xemul Dec 24, 2024

dawmd Jan 7, 2025

xemul Jan 9, 2025

dawmd commented Jan 7, 2025

scylladb-promoter commented Jan 8, 2025

main, view: Pair view builder drain with its start #21909

Are you sure you want to change the base?

main, view: Pair view builder drain with its start #21909

Conversation

dawmd commented Dec 12, 2024 • edited Loading

scylladb-promoter commented Dec 12, 2024

🔴 CI State: FAILURE

Build Details:

scylladb-promoter commented Dec 14, 2024

🔴 CI State: FAILURE

Failed Tests (1/36079):

Build Details:

dawmd commented Dec 16, 2024

scylladb-promoter commented Dec 16, 2024

🟢 CI State: SUCCESS

Build Details:

piodul left a comment

Choose a reason for hiding this comment

piodul Dec 17, 2024

Choose a reason for hiding this comment

dawmd commented Dec 17, 2024

scylladb-promoter commented Dec 17, 2024

🔴 CI State: FAILURE

Failed Tests (2/36177):

Build Details:

dawmd commented Dec 19, 2024

scylladb-promoter commented Dec 19, 2024

🔴 CI State: FAILURE

Failed Tests (1/36344):

Build Details:

piodul commented Dec 20, 2024

Failed Tests (1/36344):

scylladb-promoter commented Dec 20, 2024

🟢 CI State: SUCCESS

Build Details:

xemul Dec 24, 2024

Choose a reason for hiding this comment

dawmd Jan 7, 2025

Choose a reason for hiding this comment

xemul Jan 9, 2025

Choose a reason for hiding this comment

dawmd commented Jan 7, 2025

scylladb-promoter commented Jan 8, 2025

🔴 CI State: FAILURE

Build Details:

dawmd commented Dec 12, 2024 •

edited

Loading