Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk creation/deletion takes more than 30 seconds #7188

Closed
askfongjojo opened this issue Nov 28, 2024 · 8 comments
Closed

Disk creation/deletion takes more than 30 seconds #7188

askfongjojo opened this issue Nov 28, 2024 · 8 comments
Assignees
Milestone

Comments

@askfongjojo
Copy link

Disk creation and deletion is substantially slower on rack2 since it was updated to commit e7d32ae2375b0231193f1dc84271f900915b2d6b. I was waiting for crucible#1571 and crucible#1572 to be fixed/root-caused before conflating this issue with other underlying problems. But after updating to commit 41d7c9b, the problem still persists.

I'll add some of the saga examples. Here is a delete disk saga I was looking at yesterday which took 34 seconds to complete:

root@oxz_nexus_65a11c18:~# grep 7f241c14-dacf-4655-a28e-b5db32c088e6 /var/svc/log/oxide-nexus\:default.log | looker
00:19:16.695Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): preparing saga
    file = nexus/src/app/saga.rs:255
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:16.696Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): saga create
    dag = {"end_node":14,"graph":{"edge_property":"directed","edges":[[0,1,null],[1,2,null],[2,3,null],[4,5,null],[4,6,null],[6,7,null],[5,7,null],[7,8,null],[8,9,null],[9,10,null],[10,11,null],[3,4,null],[11,12,null],[13,0,null],[12,14,null]],"node_holes":[],"nodes":[{"Action":{"action_name":"disk_delete.delete_disk_record","label":"DeleteDiskRecord","name":"deleted_disk"}},{"Action":{"action_name":"disk_delete.space_account","label":"SpaceAccount","name":"no_result1"}},{"Constant":{"name":"params_for_volume_delete_subsaga","value":{"serialized_authn":{"kind":{"Authenticated":[{"actor":{"SiloUser":{"silo_id":"7bd7623a-68ed-4636-8ecb-b59e3b068787","silo_user_id":"906e74cb-eab8-4a87-bda8-2cb0914bf853"}}},{"mapped_fleet_roles":{"admin":["admin"]}}]}},"volume_id":"95c5741c-f346-4b48-a9e5-a4017b49f3ae"}}},{"SubsagaStart":{"params_node_name":"params_for_volume_delete_subsaga","saga_name":"volume-delete"}},{"Action":{"action_name":"volume_delete.decrease_crucible_resource_count","label":"DecreaseCrucibleResourceCount","name":"crucible_resources_to_delete"}},{"Action":{"action_name":"volume_delete.delete_crucible_regions","label":"DeleteCrucibleRegions","name":"no_result_1"}},{"Action":{"action_name":"volume_delete.delete_crucible_running_snapshots","label":"DeleteCrucibleRunningSnapshots","name":"no_result_2"}},{"Action":{"action_name":"volume_delete.delete_crucible_snapshots","label":"DeleteCrucibleSnapshots","name":"no_result_3"}},{"Action":{"action_name":"volume_delete.delete_crucible_snapshot_records","label":"DeleteCrucibleSnapshotRecords","name":"no_result_4"}},{"Action":{"action_name":"volume_delete.find_freed_crucible_regions","label":"FindFreedCrucibleRegions","name":"freed_crucible_regions"}},{"Action":{"action_name":"volume_delete.delete_freed_crucible_regions","label":"DeleteFreedCrucibleRegions","name":"no_result_5"}},{"Action":{"action_name":"volume_delete.hard_delete_volume_record","label":"HardDeleteVolumeRecord","name":"volume_hard_deleted"}},{"SubsagaEnd":{"name":"volume_delete_subsaga_no_result"}},{"Start":{"params":{"disk_id":"71076db8-3879-4121-b6a0-36fcc172874f","project_id":"5e49b6de-cb2d-438d-83af-95c415bbb901","serialized_authn":{"kind":{"Authenticated":[{"actor":{"SiloUser":{"silo_id":"7bd7623a-68ed-4636-8ecb-b59e3b068787","silo_user_id":"906e74cb-eab8-4a87-bda8-2cb0914bf853"}}},{"mapped_fleet_roles":{"admin":["admin"]}}]}},"volume_id":"95c5741c-f346-4b48-a9e5-a4017b49f3ae"}}},"End"]},"saga_name":"disk-delete","start_node":13}
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/steno-0.4.1/src/sec.rs:1146
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
    sec_id = 65a11c18-7f59-41ac-b9e7-680627f996e7
00:19:16.696Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): creating saga
    file = nexus/db-queries/src/db/sec_store.rs:50
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:16.732Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): starting saga
    file = nexus/src/app/saga.rs:353
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:16.732Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): saga start
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/steno-0.4.1/src/sec.rs:967
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
    sec_id = 65a11c18-7f59-41ac-b9e7-680627f996e7
00:19:16.732Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 13
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.751Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 13
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.755Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 0
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.770Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Object {"block_size": String("Traditional"), "create_image_id": String("cb6eb1e9-69fd-40ad-9373-83926f8b32d9"), "create_snapshot_id": Null, "identity": Object {"description": String("ubuntu 22.04 mysql instance 11"), "id": String("71076db8-3879-4121-b6a0-36fcc172874f"), "name": String("sbmysql-11"), "time_created": String("2024-10-15T20:58:31.634624Z"), "time_deleted": Null, "time_modified": String("2024-10-15T20:58:31.634624Z")}, "pantry_address": Null, "project_id": String("5e49b6de-cb2d-438d-83af-95c415bbb901"), "rcgen": Number(1), "runtime_state": Object {"attach_instance_id": Null, "disk_state": String("detached"), "gen": Number(2), "time_updated": String("2024-10-15T20:58:36.747394Z")}, "size": Number(161061273600), "slot": Null, "volume_id": String("95c5741c-f346-4b48-a9e5-a4017b49f3ae")})
    node_id = 0
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.773Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 1
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.776Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): roles
    actor_id = 906e74cb-eab8-4a87-bda8-2cb0914bf853
    authenticated = true
    roles = RoleSet { roles: {} }
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
    saga_node = SpaceAccount
00:19:16.777Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): authorize result
    action = Query
    actor = Some(Actor::SiloUser { silo_user_id: 906e74cb-eab8-4a87-bda8-2cb0914bf853, silo_id: 7bd7623a-68ed-4636-8ecb-b59e3b068787, .. })
    actor_id = 906e74cb-eab8-4a87-bda8-2cb0914bf853
    authenticated = true
    resource = Database
    result = Ok(())
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
    saga_node = SpaceAccount
00:19:16.793Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 1
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.797Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 2
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.801Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Object {"serialized_authn": Object {"kind": Object {"Authenticated": Array [Object {"actor": Object {"SiloUser": Object {"silo_id": String("7bd7623a-68ed-4636-8ecb-b59e3b068787"), "silo_user_id": String("906e74cb-eab8-4a87-bda8-2cb0914bf853")}}}, Object {"mapped_fleet_roles": Object {"admin": Array [String("admin")]}}]}}, "volume_id": String("95c5741c-f346-4b48-a9e5-a4017b49f3ae")})
    node_id = 2
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.804Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.807Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:16.810Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 4
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:33.413Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Object {"V3": Object {"region_snapshots": Array [], "regions": Array [String("83539da4-cbad-48bf-89ca-f97d9bad1be3"), String("3320fd96-3858-41f3-b75f-16692ec36f27"), String("1a9074fe-afd7-4824-8bdb-dfe108f3a5ed")]}})
    node_id = 4
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:33.417Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 5
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:33.421Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 6
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:33.426Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 6
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:33.490Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 59.808423ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.490Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 102.04438ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.494Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 77.721667ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.556Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 129.938791ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.578Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 177.105542ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.599Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 172.519559ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.692Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 191.929827ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.761Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 207.89555ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.778Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 128.594134ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.891Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 237.654797ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.914Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 138.34675ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:33.975Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 87.969829ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.060Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 283.520345ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.069Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 251.275613ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.133Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 243.273794ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.326Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 288.094007ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.350Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 268.38349ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.383Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 263.520793ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.620Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 157.09792ms
    file = nexus/src/app/crucible.rs:828
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.625Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 260.76599ms
    file = nexus/src/app/crucible.rs:828
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.654Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 282.249634ms
    file = nexus/src/app/crucible.rs:828
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.785Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.891Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.941Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:50.066Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 5
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.070Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 7
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.073Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 7
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.077Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 8
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.081Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 8
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.084Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 9
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.195Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Array [])
    node_id = 9
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.199Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 10
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.203Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 10
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.206Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 11
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.214Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): allocated regions for 95c5741c-f346-4b48-a9e5-a4017b49f3ae is empty, calling volume_hard_delete
    file = nexus/src/app/sagas/volume_delete.rs:531
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:50.217Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Bool(true))
    node_id = 11
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.221Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 12
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.224Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Bool(true))
    node_id = 12
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.227Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 14
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.231Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 14
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.234Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): update for saga cached state
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/steno-0.4.1/src/sec.rs:1384
    new_state = Done
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    sec_id = 65a11c18-7f59-41ac-b9e7-680627f996e7
00:19:50.234Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): updating state
    file = nexus/db-queries/src/db/sec_store.rs:95
    new_state = done
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:19:50.242Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): saga finished
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/steno-0.4.1/src/sec.rs:1025
    result = success
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
    sec_id = 65a11c18-7f59-41ac-b9e7-680627f996e7
00:19:50.242Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): saga finished
    file = nexus/src/app/saga.rs:404
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
    saga_result = SagaResult { saga_id: 7f241c14-dacf-4655-a28e-b5db32c088e6, saga_log: SagaLog { saga_id: 7f241c14-dacf-4655-a28e-b5db32c088e6, unwinding: false, events: [N013 started, N013 succeeded, N000 started, N000 succeeded, N001 started, N001 succeeded, N002 started, N002 succeeded, N003 started, N003 succeeded, N004 started, N004 succeeded, N005 started, N006 started, N006 succeeded, N005 succeeded, N007 started, N007 succeeded, N008 started, N008 succeeded, N009 started, N009 succeeded, N010 started, N010 succeeded, N011 started, N011 succeeded, N012 started, N012 succeeded, N014 started, N014 succeeded], node_status: {0: Succeeded(Object {"block_size": String("Traditional"), "create_image_id": String("cb6eb1e9-69fd-40ad-9373-83926f8b32d9"), "create_snapshot_id": Null, "identity": Object {"description": String("ubuntu 22.04 mysql instance 11"), "id": String("71076db8-3879-4121-b6a0-36fcc172874f"), "name": String("sbmysql-11"), "time_created": String("2024-10-15T20:58:31.634624Z"), "time_deleted": Null, "time_modified": String("2024-10-15T20:58:31.634624Z")}, "pantry_address": Null, "project_id": String("5e49b6de-cb2d-438d-83af-95c415bbb901"), "rcgen": Number(1), "runtime_state": Object {"attach_instance_id": Null, "disk_state": String("detached"), "gen": Number(2), "time_updated": String("2024-10-15T20:58:36.747394Z")}, "size": Number(161061273600), "slot": Null, "volume_id": String("95c5741c-f346-4b48-a9e5-a4017b49f3ae")}), 1: Succeeded(Null), 2: Succeeded(Object {"serialized_authn": Object {"kind": Object {"Authenticated": Array [Object {"actor": Object {"SiloUser": Object {"silo_id": String("7bd7623a-68ed-4636-8ecb-b59e3b068787"), "silo_user_id": String("906e74cb-eab8-4a87-bda8-2cb0914bf853")}}}, Object {"mapped_fleet_roles": Object {"admin": Array [String("admin")]}}]}}, "volume_id": String("95c5741c-f346-4b48-a9e5-a4017b49f3ae")}), 3: Succeeded(Null), 4: Succeeded(Object {"V3": Object {"region_snapshots": Array [], "regions": Array [String("83539da4-cbad-48bf-89ca-f97d9bad1be3"), String("3320fd96-3858-41f3-b75f-16692ec36f27"), String("1a9074fe-afd7-4824-8bdb-dfe108f3a5ed")]}}), 5: Succeeded(Null), 6: Succeeded(Null), 7: Succeeded(Null), 8: Succeeded(Null), 9: Succeeded(Array []), 10: Succeeded(Null), 11: Succeeded(Bool(true)), 12: Succeeded(Bool(true)), 13: Succeeded(Null), 14: Succeeded(Null)} }, kind: Ok(SagaResultOk { saga_output: Bool(true), node_outputs: {"crucible_resources_to_delete": Object {"V3": Object {"region_snapshots": Array [], "regions": Array [String("83539da4-cbad-48bf-89ca-f97d9bad1be3"), String("3320fd96-3858-41f3-b75f-16692ec36f27"), String("1a9074fe-afd7-4824-8bdb-dfe108f3a5ed")]}}, "deleted_disk": Object {"block_size": String("Traditional"), "create_image_id": String("cb6eb1e9-69fd-40ad-9373-83926f8b32d9"), "create_snapshot_id": Null, "identity": Object {"description": String("ubuntu 22.04 mysql instance 11"), "id": String("71076db8-3879-4121-b6a0-36fcc172874f"), "name": String("sbmysql-11"), "time_created": String("2024-10-15T20:58:31.634624Z"), "time_deleted": Null, "time_modified": String("2024-10-15T20:58:31.634624Z")}, "pantry_address": Null, "project_id": String("5e49b6de-cb2d-438d-83af-95c415bbb901"), "rcgen": Number(1), "runtime_state": Object {"attach_instance_id": Null, "disk_state": String("detached"), "gen": Number(2), "time_updated": String("2024-10-15T20:58:36.747394Z")}, "size": Number(161061273600), "slot": Null, "volume_id": String("95c5741c-f346-4b48-a9e5-a4017b49f3ae")}, "freed_crucible_regions": Array [], "no_result1": Null, "no_result_1": Null, "no_result_2": Null, "no_result_3": Null, "no_result_4": Null, "no_result_5": Null, "params_for_volume_delete_subsaga": Object {"serialized_authn": Object {"kind": Object {"Authenticated": Array [Object {"actor": Object {"SiloUser": Object {"silo_id": String("7bd7623a-68ed-4636-8ecb-b59e3b068787"), "silo_user_id": String("906e74cb-eab8-4a87-bda8-2cb0914bf853")}}}, Object {"mapped_fleet_roles": Object {"admin": Array [String("admin")]}}]}}, "volume_id": String("95c5741c-f346-4b48-a9e5-a4017b49f3ae")}, "volume_delete_subsaga_no_result": Bool(true), "volume_hard_deleted": Bool(true)} }) }
00:23:22.326Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): observed saga start
    background_task = saga_recovery
    file = nexus/saga-recovery/src/recovery.rs:76
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
00:23:22.326Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): found saga that may be done (will be sure on the next pass)
    background_task = saga_recovery
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6

I'll add some other even longer-running ones from the latest build.

@askfongjojo askfongjojo added this to the 12 milestone Nov 28, 2024
@askfongjojo askfongjojo changed the title Disk creation and deletion takes more than 30 seconds in some cases Disk creation and deletion takes more than 30 seconds Nov 28, 2024
@askfongjojo
Copy link
Author

A more recent example from a Terraform destroy operation of 20 instances and their boot disks (concurrency = 10). This one took minutes:

root@oxz_nexus_65a11c18:~# grep 197b3fdc-c57a-4722-abcd-1eb137118d2d `svcs -L nexus` | egrep -v 'authorize|role' | looker
07:18:11.815Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): preparing saga
    file = nexus/src/app/saga.rs:255
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:18:11.815Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): creating saga
    file = nexus/db-queries/src/db/sec_store.rs:50
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:18:11.821Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): starting saga
    file = nexus/src/app/saga.rs:353
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:18:11.821Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): saga start
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/steno-0.4.1/src/sec.rs:967
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
    sec_id = 65a11c18-7f59-41ac-b9e7-680627f996e7
07:18:11.821Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 13
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:11.828Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 13
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:11.831Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 0
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:11.841Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Object {"block_size": String("Traditional"), "create_image_id": String("d4c821a2-a87d-43ce-ba5a-ac95ac6bb138"), "create_snapshot_id": Null, "identity": Object {"description": String("application Boot Disk"), "id": String("a6f238ed-c92c-4fad-a05a-d944cba3cbb9"), "name": String("app-boot-0"), "time_created": String("2024-11-25T22:48:05.658488Z"), "time_deleted": Null, "time_modified": String("2024-11-25T22:48:05.658488Z")}, "pantry_address": Null, "project_id": String("fe0da422-5c48-4b52-8010-f2fc401f090f"), "rcgen": Number(1), "runtime_state": Object {"attach_instance_id": Null, "disk_state": String("detached"), "gen": Number(2), "time_updated": String("2024-11-25T22:49:18.576042Z")}, "size": Number(10737418240), "slot": Null, "volume_id": String("f863681c-2315-4203-9767-c33111aba52f")})
    node_id = 0
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:11.873Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 1
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:11.982Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 1
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:12.015Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 2
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:12.022Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 3
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:12.026Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 3
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:18:12.029Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 4
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:19:34.425Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): observed saga start
    background_task = saga_recovery
    file = nexus/saga-recovery/src/recovery.rs:76
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:19:34.425Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): found saga that can be ignored (already running)
    background_task = saga_recovery
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:20:24.206Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Object {"V3": Object {"region_snapshots": Array [], "regions": Array [String("a7243f15-b0f0-427f-8285-6b98980b2b29"), String("314164a2-ca84-4673-87d4-e94a8e65361d"), String("df2d7654-8f7c-44bb-8b82-333e6bb7b9bc")]}})
    node_id = 4
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:20:24.211Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 5
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:20:24.215Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 6
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:20:24.219Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 6
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:20:24.285Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 115.441586ms
    file = nexus/src/app/crucible.rs:828
    region_id = a7243f15-b0f0-427f-8285-6b98980b2b29
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.292Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 52.090702ms
    file = nexus/src/app/crucible.rs:828
    region_id = df2d7654-8f7c-44bb-8b82-333e6bb7b9bc
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.349Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 133.628385ms
    file = nexus/src/app/crucible.rs:828
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.358Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 96.947532ms
    file = nexus/src/app/crucible.rs:828
    region_id = df2d7654-8f7c-44bb-8b82-333e6bb7b9bc
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.407Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 155.588577ms
    file = nexus/src/app/crucible.rs:828
    region_id = a7243f15-b0f0-427f-8285-6b98980b2b29
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.461Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 196.684445ms
    file = nexus/src/app/crucible.rs:828
    region_id = df2d7654-8f7c-44bb-8b82-333e6bb7b9bc
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.490Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 61.041153ms
    file = nexus/src/app/crucible.rs:828
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.556Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 137.356113ms
    file = nexus/src/app/crucible.rs:828
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.569Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = a7243f15-b0f0-427f-8285-6b98980b2b29
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.665Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = df2d7654-8f7c-44bb-8b82-333e6bb7b9bc
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.699Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 205.948298ms
    file = nexus/src/app/crucible.rs:828
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:24.912Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 238.585694ms
    file = nexus/src/app/crucible.rs:828
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:25.159Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): Transient(region not deleted yet), trying again in 309.743833ms
    file = nexus/src/app/crucible.rs:828
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:20:25.475Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 314164a2-ca84-4673-87d4-e94a8e65361d
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
    saga_name = disk-delete
07:23:58.503Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 5
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:23:58.507Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 7
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:23:58.512Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 7
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:23:58.517Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 8
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:23:58.522Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 8
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d
07:23:58.526Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Started
    node_id = 9
    saga_id = 197b3fdc-c57a-4722-abcd-1eb137118d2d

@leftwo
Copy link
Contributor

leftwo commented Nov 29, 2024

Here is the saga show for the first saga ID in the initial report:

root@oxz_switch1:~# /tmp/omdb-saga db sagas show 7f241c14-dacf-4655-a28e-b5db32c088e6 2> /dev/null
 id                                   | time_created                   | name        | state                                                    
--------------------------------------+--------------------------------+-------------+-----------------------                                  
 7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.696040 UTC | disk-delete | SagaCachedState(Done)                                
                                                                                                                                                                                        
                             saga id | event time                     | node id                                              | event type | data
------------------------------------ | ------------------------------ | ---------------------------------------------------- | ---------- | ---                                         
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.732722 UTC |  13: start                                           | started    |                                             
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.751858 UTC |  13: start                                           | succeeded  |                                             
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.755859 UTC |   0: disk_delete.delete_disk_record                  | started    |                                             
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.770124 UTC |   0: disk_delete.delete_disk_record                  | succeeded  | {"block_size":"Traditional","create_image_id
":"cb6eb1e9-69fd-40ad-9373-83926f8b32d9","create_snapshot_id":null,"identity":{"description":"ubuntu 22.04 mysql instance 11","id":"71076db8-3879-4121-b6a0-36fcc172874f","name":"sbmysq
l-11","time_created":"2024-10-15T20:58:31.634624Z","time_deleted":null,"time_modified":"2024-10-15T20:58:31.634624Z"},"pantry_address":null,"project_id":"5e49b6de-cb2d-438d-83af-95c415
bbb901","rcgen":1,"runtime_state":{"attach_instance_id":null,"disk_state":"detached","gen":2,"time_updated":"2024-10-15T20:58:36.747394Z"},"size":161061273600,"slot":null,"volume_id":"
95c5741c-f346-4b48-a9e5-a4017b49f3ae"}                                                                                                                                                  
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.773696 UTC |   1: disk_delete.space_account                       | started    |                                             7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.793826 UTC |   1: disk_delete.space_account                       | succeeded  | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.797354 UTC |   2: params_for_volume_delete_subsaga                | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.801116 UTC |   2: params_for_volume_delete_subsaga                | succeeded  | {"serialized_authn":{"kind":{"Authenticated"
:[{"actor":{"SiloUser":{"silo_id":"7bd7623a-68ed-4636-8ecb-b59e3b068787","silo_user_id":"906e74cb-eab8-4a87-bda8-2cb0914bf853"}}},{"mapped_fleet_roles":{"admin":["admin"]}}]}},"volume_
id":"95c5741c-f346-4b48-a9e5-a4017b49f3ae"}                                                                                                                                             
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.804557 UTC |   3: subsaga start volume-delete                     | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.807059 UTC |   3: subsaga start volume-delete                     | succeeded  | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.810554 UTC |   4: volume_delete.decrease_crucible_resource_count  | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.413136 UTC |   4: volume_delete.decrease_crucible_resource_count  | succeeded  | {"V3":{"region_snapshots":[],"regions":["835
39da4-cbad-48bf-89ca-f97d9bad1be3","3320fd96-3858-41f3-b75f-16692ec36f27","1a9074fe-afd7-4824-8bdb-dfe108f3a5ed"]}}                         
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.417634 UTC |   5: volume_delete.delete_crucible_regions           | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.421323 UTC |   6: volume_delete.delete_crucible_running_snapshots | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.426417 UTC |   6: volume_delete.delete_crucible_running_snapshots | succeeded  | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.065982 UTC |   5: volume_delete.delete_crucible_regions           | succeeded  | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.070045 UTC |   7: volume_delete.delete_crucible_snapshots         | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.073648 UTC |   7: volume_delete.delete_crucible_snapshots         | succeeded  |   
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.077414 UTC |   8: volume_delete.delete_crucible_snapshot_records  | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.081356 UTC |   8: volume_delete.delete_crucible_snapshot_records  | succeeded  | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.084225 UTC |   9: volume_delete.find_freed_crucible_regions       | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.195882 UTC |   9: volume_delete.find_freed_crucible_regions       | succeeded  | []  
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.199337 UTC |  10: volume_delete.delete_freed_crucible_regions     | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.203121 UTC |  10: volume_delete.delete_freed_crucible_regions     | succeeded  |     
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.206253 UTC |  11: volume_delete.hard_delete_volume_record         | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.217791 UTC |  11: volume_delete.hard_delete_volume_record         | succeeded  | true
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.221598 UTC |  12: subsaga end volume_delete_subsaga_no_result     | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.224238 UTC |  12: subsaga end volume_delete_subsaga_no_result     | succeeded  | true
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.227555 UTC |  14: end                                             | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.231251 UTC |  14: end                                             | succeeded  | 

Here is the saga show output for the second saga:

root@oxz_switch1:~# /tmp/omdb-saga db sagas show 197b3fdc-c57a-4722-abcd-1eb137118d2d 2> /dev/null
 id                                   | time_created                   | name        | state                 
--------------------------------------+--------------------------------+-------------+-----------------------
 197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.815172 UTC | disk-delete | SagaCachedState(Done) 

                             saga id | event time                     | node id                                              | event type | data
------------------------------------ | ------------------------------ | ---------------------------------------------------- | ---------- | ---
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.821161 UTC |  13: start                                           | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.828065 UTC |  13: start                                           | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.831154 UTC |   0: disk_delete.delete_disk_record                  | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.841430 UTC |   0: disk_delete.delete_disk_record                  | succeeded  | {"block_size":"Traditional","create_image_id":"d4c821a2-a87d-43ce-ba5a-ac95ac6bb138","create_snapshot_id":null,"identity":{"description":"application Boot Disk","id":"a6f238ed-c92c-4fad-a05a-d944cba3cbb9","name":"app-boot-0","time_created":"2024-11-25T22:48:05.658488Z","time_deleted":null,"time_modified":"2024-11-25T22:48:05.658488Z"},"pantry_address":null,"project_id":"fe0da422-5c48-4b52-8010-f2fc401f090f","rcgen":1,"runtime_state":{"attach_instance_id":null,"disk_state":"detached","gen":2,"time_updated":"2024-11-25T22:49:18.576042Z"},"size":10737418240,"slot":null,"volume_id":"f863681c-2315-4203-9767-c33111aba52f"}
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.873831 UTC |   1: disk_delete.space_account                       | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:11.982575 UTC |   1: disk_delete.space_account                       | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:12.015614 UTC |   2: params_for_volume_delete_subsaga                | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:12.019493 UTC |   2: params_for_volume_delete_subsaga                | succeeded  | {"serialized_authn":{"kind":{"Authenticated":[{"actor":{"SiloUser":{"silo_id":"7bd7623a-68ed-4636-8ecb-b59e3b068787","silo_user_id":"906e74cb-eab8-4a87-bda8-2cb0914bf853"}}},{"mapped_fleet_roles":{"admin":["admin"]}}]}},"volume_id":"f863681c-2315-4203-9767-c33111aba52f"}
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:12.022739 UTC |   3: subsaga start volume-delete                     | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:12.026550 UTC |   3: subsaga start volume-delete                     | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:18:12.029920 UTC |   4: volume_delete.decrease_crucible_resource_count  | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:20:24.206388 UTC |   4: volume_delete.decrease_crucible_resource_count  | succeeded  | {"V3":{"region_snapshots":[],"regions":["a7243f15-b0f0-427f-8285-6b98980b2b29","314164a2-ca84-4673-87d4-e94a8e65361d","df2d7654-8f7c-44bb-8b82-333e6bb7b9bc"]}}
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:20:24.211317 UTC |   5: volume_delete.delete_crucible_regions           | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:20:24.215283 UTC |   6: volume_delete.delete_crucible_running_snapshots | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:20:24.219539 UTC |   6: volume_delete.delete_crucible_running_snapshots | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:23:58.503744 UTC |   5: volume_delete.delete_crucible_regions           | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:23:58.507616 UTC |   7: volume_delete.delete_crucible_snapshots         | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:23:58.512471 UTC |   7: volume_delete.delete_crucible_snapshots         | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:23:58.517298 UTC |   8: volume_delete.delete_crucible_snapshot_records  | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:23:58.522164 UTC |   8: volume_delete.delete_crucible_snapshot_records  | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:23:58.526820 UTC |   9: volume_delete.find_freed_crucible_regions       | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.897366 UTC |   9: volume_delete.find_freed_crucible_regions       | succeeded  | []
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.901750 UTC |  10: volume_delete.delete_freed_crucible_regions     | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.904613 UTC |  10: volume_delete.delete_freed_crucible_regions     | succeeded  | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.909368 UTC |  11: volume_delete.hard_delete_volume_record         | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.922559 UTC |  11: volume_delete.hard_delete_volume_record         | succeeded  | true
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.925690 UTC |  12: subsaga end volume_delete_subsaga_no_result     | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.933145 UTC |  12: subsaga end volume_delete_subsaga_no_result     | succeeded  | true
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.936363 UTC |  14: end                                             | started    | 
197b3fdc-c57a-4722-abcd-1eb137118d2d | 2024-11-28 07:26:33.939892 UTC |  14: end                                             | succeeded  | 

@askfongjojo askfongjojo changed the title Disk creation and deletion takes more than 30 seconds Disk deletion takes more than 30 seconds Dec 2, 2024
@jmpesp
Copy link
Contributor

jmpesp commented Dec 3, 2024

Looking at saga 7f241c14-dacf-4655-a28e-b5db32c088e6, the majority of time used seems to be database related. There's two indications of this:

The first is:

7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:16.810554 UTC |   4: volume_delete.decrease_crucible_resource_count  | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.413136 UTC |   4: volume_delete.decrease_crucible_resource_count  | succeeded  | {"V3":{"region_snapshots":[],"regions":["83539da4-cbad-48bf-89ca-f97d9bad1be3","3320fd96-3858-41f3-b75f-16692ec36f27","1a9074fe-afd7-4824-8bdb-dfe108f3a5ed"]}}

That saga node only has a call to DataStore::soft_delete_transaction, and still requires 17ish seconds to do this!

The second is:

7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.417634 UTC |   5: volume_delete.delete_crucible_regions           | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.421323 UTC |   6: volume_delete.delete_crucible_running_snapshots | started    | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:33.426417 UTC |   6: volume_delete.delete_crucible_running_snapshots | succeeded  | 
7f241c14-dacf-4655-a28e-b5db32c088e6 | 2024-11-27 00:19:50.065982 UTC |   5: volume_delete.delete_crucible_regions           | succeeded  | 

In the volume delete saga, the delete_crucible_regions saga node is run concurrently with with delete_crucible_running_snapshots. Here, delete_crucible_running_snapshots requires a minuscule amount of time to run as it's not deleting anything (the region_snapshots array in the output of the decrease_crucible_resource_count node is empty), but delete_crucible_regions ends up requiring another 17ish seconds to run in order to delete three regions.

Looking into the log messages posted earlier in this issue:

00:19:34.785Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 83539da4-cbad-48bf-89ca-f97d9bad1be3
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.891Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 1a9074fe-afd7-4824-8bdb-dfe108f3a5ed
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete
00:19:34.941Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): region deleted
    file = nexus/src/app/crucible.rs:813
    region_id = 3320fd96-3858-41f3-b75f-16692ec36f27
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6
    saga_name = disk-delete

<space inserted by me>

00:19:50.066Z DEBG 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): recording saga event
    event_type = Succeeded(Null)
    node_id = 5
    saga_id = 7f241c14-dacf-4655-a28e-b5db32c088e6

This indicates to me that it's spending a significant amount of time recording the saga node success, after having deleted three regions within about a second of the saga node starting.

@askfongjojo
Copy link
Author

askfongjojo commented Dec 4, 2024

This particular database update alone can take up to 6+ seconds (found this as I used dtrace to get the database transaction latencies):

conn_id: 71984b2b-a8cf-4fe2-884c-d72cdd526c80, latency: 6423584 us, query: 'SELECT "disk"."id", "disk"."name", "disk"."description", "disk"."time_created", "disk"."time_modified", "disk"."time_deleted", "disk"."rcgen", "disk"."project_id", "disk"."volume_id", "disk"."disk_state", "disk"."attach_instance_id", "disk"."state_generation", "disk"."time_state_updated", "disk"."slot", "disk"."size_bytes", "disk"."block_size", "disk"."origin_snapshot", "disk"."origin_image", "disk"."pantry_address", "virtual_provisioning_resource"."id", "virtual_provisioning_resource"."time_modified", "virtual_provisioning_resource"."resource_type", "virtual_provisioning_resource"."virtual_disk_bytes_provisioned", "virtual_provisioning_resource"."cpus_provisioned", "virtual_provisioning_resource"."ram_provisioned", "volume"."id", "volume"."time_created", "volume"."time_modified", "volume"."time_deleted", "volume"."rcgen", "volume"."data", "volume"."resources_to_clean_up" FROM (("disk" LEFT OUTER JOIN "virtual_provisioning_resource" ON ("virtual_provisioning_resource"."id" = "disk"."id")) LEFT OUTER JOIN "volume" ON ("disk"."volume_id" = "volume"."id")) WHERE ("disk"."time_deleted" IS NOT NULL) -- binds: []'

@askfongjojo askfongjojo changed the title Disk deletion takes more than 30 seconds Disk creation/deletion takes more than 30 seconds Dec 5, 2024
@askfongjojo
Copy link
Author

askfongjojo commented Dec 5, 2024

On rack2 after upgrading to 37c7f18, disk deletion times took even longer. The virtual_provisioning_resource table update query took more than a minute to complete:

conn_id: d0a48184-90c2-400a-8df2-e5c94bb573cd, latency: 73821423 us, query: 'SELECT "disk"."id", "disk"."name", "disk"."description", "disk"."time_created", "disk"."time_modified", "disk"."time_deleted", "disk"."rcgen", "disk"."project_id", "disk"."volume_id", "disk"."disk_state", "disk"."attach_instance_id", "disk"."state_generation", "disk"."time_state_updated", "disk"."slot", "disk"."size_bytes", "disk"."block_size", "disk"."origin_snapshot", "disk"."origin_image", "disk"."pantry_address", "virtual_provisioning_resource"."id", "virtual_provisioning_resource"."time_modified", "virtual_provisioning_resource"."resource_type", "virtual_provisioning_resource"."virtual_disk_bytes_provisioned", "virtual_provisioning_resource"."cpus_provisioned", "virtual_provisioning_resource"."ram_provisioned", "volume"."id", "volume"."time_created", "volume"."time_modified", "volume"."time_deleted", "volume"."rcgen", "volume"."data", "volume"."resources_to_clean_up" FROM (("disk" LEFT OUTER JOIN "virtual_provisioning_resource" ON ("virtual_provisioning_resource"."id" = "disk"."id")) LEFT OUTER JOIN "volume" ON ("disk"."volume_id" = "volume"."id")) WHERE ("disk"."time_deleted" IS NOT NULL) -- binds: []'

@jgallagher
Copy link
Contributor

On rack2 after upgrading to 37c7f18, disk deletion times took even longer. The virtual_provisioning_resource table update took more than a minute to complete:

conn_id: d0a48184-90c2-400a-8df2-e5c94bb573cd, latency: 73821423 us, query: 'SELECT "disk"."id", "disk"."name", "disk"."description", "disk"."time_created", "disk"."time_modified", "disk"."time_deleted", "disk"."rcgen", "disk"."project_id", "disk"."volume_id", "disk"."disk_state", "disk"."attach_instance_id", "disk"."state_generation", "disk"."time_state_updated", "disk"."slot", "disk"."size_bytes", "disk"."block_size", "disk"."origin_snapshot", "disk"."origin_image", "disk"."pantry_address", "virtual_provisioning_resource"."id", "virtual_provisioning_resource"."time_modified", "virtual_provisioning_resource"."resource_type", "virtual_provisioning_resource"."virtual_disk_bytes_provisioned", "virtual_provisioning_resource"."cpus_provisioned", "virtual_provisioning_resource"."ram_provisioned", "volume"."id", "volume"."time_created", "volume"."time_modified", "volume"."time_deleted", "volume"."rcgen", "volume"."data", "volume"."resources_to_clean_up" FROM (("disk" LEFT OUTER JOIN "virtual_provisioning_resource" ON ("virtual_provisioning_resource"."id" = "disk"."id")) LEFT OUTER JOIN "volume" ON ("disk"."volume_id" = "volume"."id")) WHERE ("disk"."time_deleted" IS NOT NULL) -- binds: []'

Hitting this with EXPLAIN shows several full index scans:

  distribution: full
  vectorized: true

  • hash join (left outer)
  │ estimated row count: 38,684
  │ equality: (volume_id) = (id)
  │ right cols are key
  │
  ├── • merge join (left outer)
  │   │ estimated row count: 38,684
  │   │ equality: (id) = (id)
  │   │ left cols are key
  │   │ right cols are key
  │   │
  │   ├── • filter
  │   │   │ estimated row count: 38,684
  │   │   │ filter: time_deleted IS NOT NULL
  │   │   │
  │   │   └── • scan
  │   │         estimated row count: 39,122 (100% of the table; stats collected 29 days ago)
  │   │         table: disk@disk_pkey
  │   │         spans: FULL SCAN
  │   │
  │   └── • scan
  │         estimated row count: 576 (100% of the table; stats collected 3 days ago)
  │         table: virtual_provisioning_resource@virtual_provisioning_resource_pkey
  │         spans: FULL SCAN
  │
  └── • scan
        estimated row count: 898 (100% of the table; stats collected 2 days ago)
        table: volume@volume_pkey
        spans: FULL SCAN

  index recommendations: 1
  1. type: index replacement
     SQL commands: CREATE INDEX ON disk (volume_id) STORING (name, description, time_created, time_modified, time_deleted, rcgen, project_id, disk_state, attach_instance_id, state_generation, slot, time_state_updated, size_bytes, block_size, origin_snapshot, origin_image, pantry_address); DROP INDEX disk@lookup_any_disk_by_volume_id;
(36 rows)

The one that jumps out is disk@disk_pkey with 39,000+ rows. It looks like we're filtering specifically for deleted disks in this query; given we soft delete and don't current come back and garbage collect/hard delete, I think that means this query as written will continue to slow down as we accumulate more deleted disks. I know there was some chat recently about overloading time_deleted - is this another case of that? Would this query still work correctly if we started hard deleting rows that had previously been soft deleted?

@jgallagher
Copy link
Contributor

Quick followup that might raise more questions than answers; I tried running this under EXPLAIN ANALYZE to make sure we weren't seeing bad stats estimates or something. The actual row counts are pretty similar to what EXPLAIN predicted:

  planning time: 2ms
  execution time: 89ms
  distribution: full
  vectorized: true
  rows read from KV: 40,811 (8.3 MiB)
  cumulative time spent in KV: 89ms
  maximum memory usage: 3.0 MiB
  network usage: 41 MiB (316 messages)

  • hash join (left outer)
  │ nodes: n1, n4, n5
  │ actual row count: 38,904
  │ estimated max memory allocated: 3.9 MiB
  │ estimated max sql temp disk usage: 0 B
  │ estimated row count: 38,684
  │ equality: (volume_id) = (id)
  │ right cols are key
  │
  ├── • merge join (left outer)
  │   │ nodes: n1, n5
  │   │ actual row count: 38,904
  │   │ estimated max memory allocated: 2.1 MiB
  │   │ estimated max sql temp disk usage: 0 B
  │   │ estimated row count: 38,684
  │   │ equality: (id) = (id)
  │   │ left cols are key
  │   │ right cols are key
  │   │
  │   ├── • filter
  │   │   │ nodes: n1
  │   │   │ actual row count: 38,904
  │   │   │ estimated row count: 38,684
  │   │   │ filter: time_deleted IS NOT NULL
  │   │   │
  │   │   └── • scan
  │   │         nodes: n1
  │   │         actual row count: 39,338
  │   │         KV time: 60ms
  │   │         KV contention time: 0µs
  │   │         KV rows read: 39,338
  │   │         KV bytes read: 7.1 MiB
  │   │         estimated max memory allocated: 7.6 MiB
  │   │         estimated row count: 39,122 (100% of the table; stats collected 29 days ago)
  │   │         table: disk@disk_pkey
  │   │         spans: FULL SCAN
  │   │
  │   └── • scan
  │         nodes: n5
  │         actual row count: 577
  │         KV time: 11ms
  │         KV contention time: 0µs
  │         KV rows read: 577
  │         KV bytes read: 42 KiB
  │         estimated max memory allocated: 120 KiB
  │         estimated row count: 576 (100% of the table; stats collected 3 days ago)
  │         table: virtual_provisioning_resource@virtual_provisioning_resource_pkey
  │         spans: FULL SCAN
  │
  └── • scan
        nodes: n4
        actual row count: 896
        KV time: 17ms
        KV contention time: 0µs
        KV rows read: 896
        KV bytes read: 1.2 MiB
        estimated max memory allocated: 2.8 MiB
        estimated row count: 898 (100% of the table; stats collected 2 days ago)
        table: volume@volume_pkey
        spans: FULL SCAN
(69 rows)


Time: 92ms total (execution 92ms / network 1ms)

What I can't explain is the query time - 92ms seems perfectly reasonable for reading ~40,000 rows, I guess, but what could possibly have made that same query take 73 seconds in production?

@askfongjojo
Copy link
Author

Fixed by #7212 (see all related notes in #7208 which has a different failure mode but the same root cause)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants