Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

main, view: Pair view builder drain with its start #21909

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dawmd
Copy link
Contributor

@dawmd dawmd commented Dec 12, 2024

This PR reverts a patch that solved a problem related to an initialization
order. Unfortunately, after merging it, we ran into another one -- after
changing the place where group0_service was initialized, it ended up
being destroyed AFTER the CDC generation service would. Since CDC
generations are accessed in storage_service::topology_state_load():

for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) {
    rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id);
    co_await _cdc_gens.local().handle_cdc_generation(gen_id);

we started getting the following failure:

Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed.

To solve the issue, we revert the changes and approach the problem
in another way: before that patch, we only drained the view builder
when stopping it, which was paired with its construction. However,
since the view builder depends on group0_service, which is initialized
AFTER the view builder is constructed, that could lead to segmentation
faults. To amend it, we pair draining the view builder with its start, while
leaving the lambda destroying it paired with its construction. This way
we should have a proper initialization order, avoid bugs, and reinforce
our understanding of how Scylla starts up.

Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too.

Fixes #20772
Fixes #21534

@dawmd dawmd requested review from nyh and piodul as code owners December 12, 2024 18:50
@dawmd dawmd requested a review from xemul December 12, 2024 18:50
@dawmd dawmd self-assigned this Dec 12, 2024
@dawmd dawmd added the backport/6.2 should be backported to 6.2 label Dec 12, 2024
@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

Build Details:

  • Duration: 28 min
  • Builder: i-0648fc04d5a267f56 (m5d.8xlarge)

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Docker Test
✅ - dtest with topology changes
✅ - dtest
✅ - dtest with tablets
✅ - Offline-installer Artifact Tests
❌ - Unit Tests

Failed Tests (1/36079):

Build Details:

  • Duration: 3 hr 53 min
  • Builder: spider8.cloudius-systems.com

@dawmd
Copy link
Contributor Author

dawmd commented Dec 16, 2024

Looks like #21628. Rerunning CI

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Docker Test
✅ - dtest with tablets
✅ - dtest
✅ - Offline-installer Artifact Tests
✅ - dtest with topology changes
✅ - Unit Tests

Build Details:

  • Duration: 5 hr 34 min
  • Builder: i-05fc31621ab76222d (m5d.8xlarge)

Copy link
Contributor

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I have only one comment.

db/view/view.cc Outdated
Comment on lines 2098 to 2104
future<> view_builder::stop() {
vlogger.info("Stopping view builder");
return drain();
co_return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: put an assertion here which checks that drain() was indeed run (it needs to be run in order for the view builder to be properly stopped) - for example, you could check if _as.abort_requested() is true

@dawmd
Copy link
Contributor Author

dawmd commented Dec 17, 2024

Version 2:

  • Added an assert checking if we've drained a view builder before stopping it.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Docker Test
✅ - dtest with topology changes
✅ - Offline-installer Artifact Tests
✅ - dtest with tablets
✅ - dtest
❌ - Unit Tests

Failed Tests (2/36177):

Build Details:

  • Duration: 8 hr 4 min
  • Builder: i-004ca5bdcc17620cf (m5ad.8xlarge)

@dawmd
Copy link
Contributor Author

dawmd commented Dec 19, 2024

* [test_mv_topology_change.dev.2](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/13897/testReport/junit/topology_custom/test_mv_topology_change/Tests___Unit_Tests___test_mv_topology_change_dev_2) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_mv_topology_change.dev.2)

* [test_mv_topology_change.dev.3](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/13897/testReport/junit/topology_custom/test_mv_topology_change/Tests___Unit_Tests___test_mv_topology_change_dev_3) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_mv_topology_change.dev.3)

#21912

Rerunning CI

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Docker Test
✅ - dtest with tablets
✅ - Offline-installer Artifact Tests
✅ - dtest with topology changes
❌ - dtest
✅ - Unit Tests

Failed Tests (1/36344):

Build Details:

  • Duration: 8 hr 13 min
  • Builder: spider4.cloudius-systems.com

@piodul
Copy link
Contributor

piodul commented Dec 20, 2024

Failed Tests (1/36344):

I believe it's #15715. Retriggering CI

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Docker Test
✅ - dtest
✅ - Offline-installer Artifact Tests
✅ - dtest with tablets
✅ - dtest with topology changes
✅ - Unit Tests

Build Details:

  • Duration: 3 hr 47 min
  • Builder: spider2.cloudius-systems.com

return drain();
// If the view builder has been started, verify that it's also been drained before we stop it.
SCYLLA_ASSERT(!_started_flag || _as.abort_requested());
co_return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before these changes, we only drained the view builder when stopping it,

No, we also drain v.b. in storage_service::do_drain(). Why is it not enough?

which was paired with its construction. However, since the view
builder depends on group0_service, which is initialized AFTER the view
builder is constructed, that could lead to segmentation faults.

Then group0 shouldn't be initialized after a service that depends on it. Why can't group0 service be started before view builder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we also drain v.b. in storage_service::do_drain(). Why is it not enough?

Between the initialisation of the view builder and the construction of the deferred lambda responsible for calling storage_service::drain_on_shutdown() there are a number of other services being initialised. If any of those initalisations fails, not having drained the view builder will prevent Scylla from shutting down. I've only realised it now, but in the current version of the code, we already call drain on the view builders twice for that reason (once in the storage service code and once in view_builder::stop()).

Then group0 shouldn't be initialized after a service that depends on it. Why can't group0 service be started before view builder?

It is still started before it. My wording in the commit message is a bit vague, but what happens is this:

  • the view builder is constructed,
  • group0_service starts,
  • the view builder starts.

The problem I was referring to in the commit is #20772: view building could still be ongoing after stopping group0_service; when it ends, it tries to access group0, which causes a segmentation fault (cf. #20772 (comment)). The fix we applied was 7bad837, which moved the initialisation of group0 before the construction of the view builders. While that helped with the segmentation fault issue, another one popped up: #21534. This PR tries to solve both the right way.

I'll update the commit message because after two weeks off, I didn't understand it immediately myself. Thanks for your feedback, Pavel!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between the initialisation of the view builder and the construction of the deferred lambda responsible for calling storage_service::drain_on_shutdown() there are a number of other services being initialised. If any of those initalisations fails, not having drained the view builder will prevent Scylla from shutting down.

For that, view_builder::stop() explicitly calls .drain(), so even if drain-on-shutdown wasn't scheduled, view_builder::drain() would still be invoked.

It is still started before it. My wording in the commit message is a bit vague, but what happens is this:

  • the view builder is constructed,
  • group0_service starts,
  • the view builder starts.

Ah, I see. And you need to undo the item #3 before undoing item #2 on stop.

However, I'm looking into code and see that the sequence is different:

            group0_service.start().get();
            auto stop_group0_service = defer_verbose_shutdown("group 0 service", [&group0_service] {
                sl_controller.local().abort_group0_operations();
                group0_service.abort().get();
            });

            utils::get_local_injector().inject("stop_after_starting_group0_service",
                [] { std::raise(SIGSTOP); });

            // Set up group0 service earlier since it is needed by group0 setup just below
            ss.local().set_group0(group0_service);

            supervisor::notify("starting view update generator");
            view_update_generator.start(std::ref(db), std::ref(proxy), std::ref(stop_signal.as_sharded_abort_source())).get();
            auto stop_view_update_generator = defer_verbose_shutdown("view update generator", [] {
                view_update_generator.stop().get();
            });
            
            supervisor::notify("starting the view builder");
            view_builder.start(std::ref(db), std::ref(sys_ks), std::ref(sys_dist_ks), std::ref(mm_notifier), std::ref(view_update_generator), std::ref(group0_client), std::ref(qp)).get();
            auto stop_view_builder = defer_verbose_shutdown("view builder", [cfg] {
                view_builder.stop().get();
            });

so it's rather

  • grou0_service starts
  • view_builder is constructed
  • ...
  • view_builder starts

on stop view_builder is stopped before group0_service is

The patch solved a problem related to an initialization order
(scylladb#20772), but we ran into another one: scylladb#21534.
After moving the initialization of group0_service, it ended up being destroyed
AFTER the CDC generation service would. Since CDC generations are accessed
in `storage_service::topology_state_load()`:

```
for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) {
    rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id);
    co_await _cdc_gens.local().handle_cdc_generation(gen_id);
```

we started getting the following failure:

```
Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed.
```

We're reverting the patch to go back to a more stable version of Scylla
and in the following commit, we'll solve the original issue in a more
systematic way.

This reverts commit 7bad837.
What happens in these changes is the following:

* We "split" `view_builder::stop()` and `view_builder::drain()` to avoid
  implicit calls to `drain()`, which before this commit was called by
  `stop()`. The goal is to have a clearer understanding what happens when.
  If the user wants to call both, they should do it explicitly and in
  the right order. Note that `stop()` becomes almost a NO-OP: the only
  thing it does is verify that the precondition (the view builder having
  been drained and no prior calls to `stop()` having been issued) is
  satisfied and log that the view builder is being stopped.

* More importantly, we pair draining the view builder with its start.
  To better understand what was done and why, let's first look at the
  situation before this commit and the context of it:

  (a) The following things happened in order:

    1. The view builder would be constructed.
    2. Right after that, a deferred lambda would be created to stop the
       view builder during shutdown.
    3. group0_service would be started.
    4. A deferred lambda stopping group0_service would be created right
       after that.
    5. The view builder would be started.

  (b) Because the view builder depends on group0_client, it couldn't be
      started before starting group0_service. On the other hand, other
      services depend on the view builder, e.g. the stream manager. That
      makes changing the order of initialization a difficult problem,
      so we want to avoid doing that unless we're sure it's the right
      choice.

  (c) Since the view builder uses group0_client, there was a possibility
      of running into a segmentation fault issue in the following
      scenario:

      1. A call to `view_builder::mark_view_build_success()` is issued.
      2. We stop group0_service.
      3. `view_builder::mark_view_build_success()` calls
	 `announce_with_raft()`, which leads to a use-after-free because
         group0_service has already been destroyed.

      This very scenario took place in scylladb#20772.

  Initially, we decided to solve the issue by initializing
  group0_service a bit earlier (scylladb/scylladb@7bad837).
  Unfortunately, it led to other issues described in scylladb#21534.
  We reverted that change in the previous commit. These changes are the
  second attempt to the problem where we want to solve it in a safer manner.

  The solution we came up with is to pair the start of the view builder
  with a deferred lambda that deinitializes it by calling
  `view_builder::drain()`. No other component of the system should be
  able to use the view builder anymore, so it's safe to do that.
  Furthermore, that pairing makes the analysis of
  initialization/deinitialization order much easier. We also solve the
  aformentioned use-after-free issue because the view builder itself
  will no longer attempt to use group0_client.

  Note that we still pair a deferred lambda calling
  `view_builder::stop()` with the construction of the view builder.

Fixes scylladb#20772
@dawmd
Copy link
Contributor Author

dawmd commented Jan 7, 2025

Version 3:

  • Improved commit messages. They should describe the problem and the solution in detail now.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - dtest with tablets
✅ - dtest
✅ - dtest with topology changes
❌ - Offline-installer Artifact Tests
✅ - Docker Test
❌ - Unit Tests

Build Details:

  • Duration: 8 hr 54 min
  • Builder: i-0d23222a5fc004d5d (m5ad.8xlarge)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/6.2 should be backported to 6.2
Projects
None yet
5 participants