Pantry crashed when processing a region with mismatched information #6353
Just an extra data point: I happened to be testing sled expungement (details in this ticket). I saw that the expunged regions, including a snapshot, were successfully replaced. So it would look like the pantry was able to work on other regions despite the crash? I haven't tracked down the exact timing of when the replacements happened. I found only one non-empty pantry log file in the past 24 hours. It has been uploaded to catacomb:/staff/core/omicron-6353.
The region is actually related to a region replaced on the sled I expunged. Here are some Nexus log lines from a later time about this region:
Apparently it's a replacement of the region
The code to replace a snapshot is not in yet, so if it was reported that a snapshot was "fixed", that report is not accurate.
Looking at dogfood now, OMDB shows region replacements in progress (for some time now):
From the pantry, we have the line:
Of all the region-replacement tasks, none are replacing
No region with
With the three regions I'm tracking so far: looking for any of our regions, I only find one:
This could make sense, I think: if the saga for a replacement failed, the new region would be deleted when things unwound.
That looks a lot like a temporary volume used during replacement.
Having a look now.
Some rough notes James collected during a debugging session with me, @jmpesp, @iximeow, and @papertigers:

TODO:
do some validate checks:
seems orthogonal?
== issue: new region id is gone!
region-replacement-start (fe446f50-e43b-46cb-86b5-1b1651c10edd) saga shows it was ensured ok:
but it was deleted right after?!
which agent?
create logs:
something requested delete 1 second later!
zpool history:
what called that request id?
why is region-replacement-finish happening here?!
this replacement request id is completely unrelated to the replacement we started looking at but in the volume delete sub saga in delete_freed_crucible_regions, it's picking up the region we just created:
(these timestamps line up with the crucible agent request log) so why?! wtf let's look at the first drive saga:
this first drive saga overlaps with the other region-replacement-finish saga that deleted the new region, but that doesn't matter: it shouldn't be deleting the region anyway
I previously forgot about passing the fetch limit and got confused. You might need to re-check in case this matters:
I looked into the failed/stuck repair with id 1ed5e8e1-1d98-4420-bef0-135cba510238. Here are the details:
This failed repair happened on the pantry at fd00:1122:3344:105::4.
On sled 9, looking for pantry logs:
Then, searching them with:
Just get the /pool/ext path that matches what we know we have, then put in the zone name and path to the log file.
That gives us our log as:

So, the regions we care about are:

The old region ID won't exist in the pantry logs because we are starting up the pantry. Here is the log from the pantry, starting with our "new" region:
Pantry starts up and starts reconcile.
This aborts the reconcile:
The downstairs client is at fd00:1122:3344:10a::7
That IP is this zone:
So, over on 23, in the global zone, we use zfs to look for region e03d2ffc
Inside the crucible zone that has the agent responsible for this, we look
For the region, we can find the SMF logs from it where it has started up and is
This is the same time the upstairs sees it disconnect. The agent on 23 in the crucible zone creates the region here:
Then, the agent is requested to delete it, 7 seconds later:
From that agent request, we can see where the delete came from:

Which is over on sled 8, in the nexus zone there:
Now looking in the Nexus log:
Here is the region-replacement-drive saga starting:
Starting a little later, we have this saga:
This later saga is part of the region replacement 915f8726-a882-4960-9e5c-dfe359f54911.
That saga's info:
At the end of all this, we are left with why did this other
The pantry panic here is a second-order issue; let me explain how it happened. A pantry was started up to handle a region replacement. Meanwhile... a new downstairs region was created for some other purpose, but it re-used the same IP:port that
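To make that failure mode concrete, here's a minimal sketch (hypothetical names and types, not the actual Pantry or Crucible code) of the kind of check that would turn a region-metadata mismatch into an error rather than a panic:

```rust
use std::net::SocketAddrV6;

// Hypothetical types for illustration only; the real Pantry/Crucible
// structures differ.
#[derive(Debug, Clone, PartialEq)]
struct RegionMetadata {
    region_id: String,
    block_size: u64,
    extent_count: u32,
}

#[derive(Debug)]
enum AttachError {
    /// The downstairs answering at this address is not the region we expected,
    /// e.g. the IP:port was re-used by a region created for another purpose.
    MismatchedRegion {
        target: SocketAddrV6,
        expected: RegionMetadata,
        actual: RegionMetadata,
    },
}

/// Rather than asserting (and panicking) when a downstairs reports metadata
/// that does not match what we were asked to attach, return an error so the
/// caller can fail the attach cleanly and retry or give up.
fn check_downstairs(
    target: SocketAddrV6,
    expected: &RegionMetadata,
    actual: &RegionMetadata,
) -> Result<(), AttachError> {
    if expected != actual {
        return Err(AttachError::MismatchedRegion {
            target,
            expected: expected.clone(),
            actual: actual.clone(),
        });
    }
    Ok(())
}
```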
Create new enum types and return those to give more information to callers:

- `create_region_snapshot_replacement_step` and `insert_region_snapshot_replacement_step` now return `InsertRegionSnapshotReplacementStepResult`
- `volume_replace_region` and `volume_replace_snapshot` now return `VolumeReplaceResult`

Notably, `VolumeReplaceResult::ExistingVolumeDeleted` replaces the previous error type `TargetVolumeDeleted`, and is not an error, allowing the caller to take action if the existing volume was deleted.

This commit was peeled off work in progress to address oxidecomputer#6353.
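As a rough illustration of the shape of this change (only `ExistingVolumeDeleted`, `TargetVolumeDeleted`, `volume_replace_region`, and `volume_replace_snapshot` come from the commit message; the other variant and the caller are assumptions), a caller can now branch on the result instead of treating a deleted volume as a hard error:

```rust
/// Sketch of the result type named above; only `ExistingVolumeDeleted` is
/// taken from the commit message, the `Done` variant is assumed for
/// illustration.
enum VolumeReplaceResult {
    /// The replacement target was swapped into the volume.
    Done,
    /// The volume being modified was already soft- or hard-deleted. This is
    /// not an error; the caller decides what to do next.
    ExistingVolumeDeleted,
}

fn handle_replace(result: VolumeReplaceResult) {
    match result {
        VolumeReplaceResult::Done => {
            // Keep driving the replacement forward.
        }
        VolumeReplaceResult::ExistingVolumeDeleted => {
            // Previously this surfaced as the `TargetVolumeDeleted` error and
            // unwound the saga; now the caller can instead mark the
            // replacement request as complete and clean up.
        }
    }
}
```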
The underlying issue that I've been addressing since this was opened is that region replacement and region snapshot replacement do not currently account for soft-deleted or hard-deleted volumes. Originally I thought I would have to deal with this by having three new related sagas that would transition each of the related objects into a new state to represent that the associated volume was deleted, but this week I have tried to incorporate checks into the existing sagas instead, and this seems to work without a whole lot of extra effort. I'm going to take this approach unless I find a fundamental flaw during testing. I've been peeling off this work in progress in order to have smaller PRs to review. Here's what has been opened so far related to this work:
Volumes can be deleted at any time, but the tasks and sagas that perform region replacement did not account for this. This commit adds checks in a few places for whether a volume is soft-deleted or hard-deleted, and bails out of any affected region replacement accordingly:

- If the replacement request is in the Requested state and the volume was seen to be soft-deleted or hard-deleted in the "region replacement" background task, then transition the region replacement request to Complete.
- If the replacement request is in the Running state, and the volume was seen to be soft-deleted or hard-deleted in the region replacement drive saga, then skip any operations on that volume in that saga and allow the saga to transition the region replacement request to ReplacementDone. Later, the rest of the region replacement machinery will transition the request to Complete and clean up resources as appropriate.

Testing this required fleshing out the simulated Crucible Pantry with support for the new endpoints that the region replacement drive saga queries. Full parity is left for future work, and the endpoints required were left in but commented out.

This commit was peeled off work in progress to address oxidecomputer#6353.
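A minimal sketch of the decision this adds, using the state names from the commit message but hypothetical helper and type names (the real omicron types and transitions are more involved):

```rust
/// Hypothetical stand-ins for the volume lookup result and the replacement
/// request states.
enum VolumeState {
    Live,
    SoftDeleted,
    HardDeleted,
}

enum ReplacementState {
    Requested,
    Running,
    ReplacementDone,
    Complete,
}

/// Decide what to do with a region replacement request once we notice that
/// its volume has gone away, mirroring the two cases above.
fn next_state_for_deleted_volume(
    request: ReplacementState,
    volume: VolumeState,
) -> Option<ReplacementState> {
    match (request, volume) {
        // The volume still exists: nothing special to do here.
        (_, VolumeState::Live) => None,

        // Requested + deleted volume: there is nothing left to replace, so
        // the background task moves the request straight to Complete.
        (ReplacementState::Requested, _) => Some(ReplacementState::Complete),

        // Running + deleted volume: the drive saga skips operations on the
        // volume and transitions the request to ReplacementDone; the rest of
        // the machinery later moves it to Complete and cleans up.
        (ReplacementState::Running, _) => Some(ReplacementState::ReplacementDone),

        // Other states are handled by the existing machinery.
        _ => None,
    }
}
```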
"Fix for deleted volumes during region replacement" addresses this problem for region replacement.
Last week I discovered that read-only regions do not have proper reference counting like region snapshots do. This is only a problem if region snapshots are replaced through an expungement or by a manual request, as that's the only thing that currently creates read-only regions. I've confirmed that dogfood is not affected by this: there are no ongoing region snapshot replacements, nor any read-only regions that have already been allocated. I've opened #6728 to disable region snapshot replacement until I can fix this.
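For context, a rough sketch of what "reference counting" means here, with purely illustrative names (not the actual omicron schema): a read-only region should only be hard-deleted once no volume references it any more.

```rust
/// Hypothetical sketch of the missing accounting: a read-only region can be
/// referenced by several volumes (for example a snapshot volume plus volumes
/// that use it as a read-only parent).
struct ReadOnlyRegion {
    id: String,
    referencing_volumes: usize,
}

fn can_hard_delete(region: &ReadOnlyRegion) -> bool {
    // Without this count, deleting one referencing volume could free a
    // read-only region that other volumes still point at.
    region.referencing_volumes == 0
}
```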
#6805 adds reference counting for read-only regions.
Volumes can be deleted at any time, but the tasks and sagas that perform region snapshot replacement did not account for this. This commit adds checks in a few places for whether a volume is soft-deleted or hard-deleted, and bails out of any affected region snapshot replacement accordingly:

- If a volume that has the region snapshot being replaced is soft-deleted, then skip making a region snapshot replacement step for it.
- If a region snapshot replacement step has the volume deleted after the step was created, transition it directly to the VolumeDeleted state.
- If a region snapshot replacement step has the volume deleted during the saga invocation, then skip notifying any Upstairs and allow the saga to transition the request to Complete, where the associated clean-up can then proceed.

An interesting race condition emerged during unit testing: the read-only region allocated to replace a region snapshot would be swapped into the snapshot volume, but would be susceptible to being deleted by the user, and therefore unable to be swapped into other volumes that have that snapshot volume as a read-only parent. This requires an additional volume that uses that read-only region in order to bump the reference count associated with that region, so that the user cannot delete it before it has been used to replace all other uses of the region snapshot it was meant to replace. This additional volume's lifetime lasts as long as the region snapshot replacement, and it therefore needs to be deleted when the region snapshot replacement is finished. This required a new region snapshot replacement finish saga, which in turn required a new "Completing" state to perform the same type of state-based lock on the replacement request as is done for all the other sagas.

Testing also revealed that there were scenarios where `find_deleted_volume_regions` would return volumes for hard-deletion prematurely. The function now returns a struct instead of a list of tuples, and in that struct, regions freed for deletion are now distinct from volumes freed for deletion. Volumes are now only returned for hard-deletion when all associated read/write regions have been (or are going to be) deleted.

Fixes oxidecomputer#6353
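Here's a hedged sketch of the `find_deleted_volume_regions` return-type change described above; only the function name and the region/volume split come from the commit message, while the struct, field, and id type names are illustrative guesses, not the actual omicron definitions:

```rust
/// Illustrative sketch only. The point is that regions freed for deletion are
/// kept distinct from volumes freed for deletion, instead of one flat list of
/// tuples, so volumes are only offered for hard-deletion once all of their
/// read/write regions have been (or will be) deleted.
struct FreedCrucibleResources {
    /// Regions that no volume references any more and that can be deleted,
    /// along with the dataset each lives on.
    datasets_and_regions: Vec<(DatasetId, RegionId)>,
    /// Volumes that are safe to hard-delete.
    volumes: Vec<VolumeId>,
}

// Simple id newtypes so the sketch is self-contained.
struct DatasetId(String);
struct RegionId(String);
struct VolumeId(String);
```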
Finally, "Handle region snapshot replacement volume deletes" will address the problem of deleted volumes for all of region snapshot replacement, closing this issue.
This issue was seen on rack2 after it was updated to omicron commit 4a6be3a40f6c9f876ff53308532f6c37f030b13f. Here are the log lines at the time the pantry crashed:

The core file is available in catacomb under /staff/dock/rack2/mupdate-20240815/cores.