Dataset UUID conflicts (blueprint vs sled agent) #7265

Closed
smklein opened this issue Dec 17, 2024 · 8 comments · Fixed by #7266

smklein commented Dec 17, 2024

Context

  • Previously, only the Sled Agent knew about Omicron datasets
  • Nexus recently learned about datasets, and sled agents now record each dataset's UUID as a ZFS property

In the limit, we have the following goals:

  • RSS and blueprints know about all datasets - durable, transient filesystem, etc
  • During RSS and blueprint execution, we tell sled agents to "create all datasets" and then "create all zones".
  • During zone creation, we should not create any new datasets

However, for backward compatibility, the following code exists in omicron_zones_ensure:

https://github.com/oxidecomputer/omicron/blob/main/sled-agent/src/sled_agent.rs#L965-L970

The Sled Agent, during zone creation, "ensures" that necessary datasets exist. This is needed for backward compatibility, because we cannot assume that every deployed system has its transient datasets created before its zones launch. (We have a PR ready to remove this code, see #7160, but it has not merged because systems in the field still have old blueprints that do not know about datasets. That is why the change was reverted in #7157: we still want it, but we first need to ensure all systems are ready to forgo this "dataset ensuring within zone ensure" call.)

Issue

The issue we're seeing arises from a conflict of "source-of-truth" for dataset UUIDs.
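A minimal sketch of how such a conflict can arise. It assumes the dataset is first created with one UUID (for example by the zone-ensure path) and is later requested with a different UUID by blueprint execution. `Sled` and `ensure_dataset` are hypothetical stand-ins, not the actual sled-agent API; the dataset name and UUIDs are taken from the first error in the log below.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a sled's ZFS state: dataset name -> UUID property.
#[derive(Default)]
struct Sled {
    datasets: HashMap<String, String>,
}

impl Sled {
    /// Create the dataset if it is absent. If it already exists, succeed only
    /// when the stored UUID matches the requested one.
    fn ensure_dataset(&mut self, name: &str, requested: &str) -> Result<(), String> {
        if let Some(has) = self.datasets.get(name) {
            if has.as_str() != requested {
                return Err(format!(
                    "Dataset {} exists with a different uuid (has {}, requested {})",
                    name, has, requested
                ));
            }
            return Ok(());
        }
        self.datasets.insert(name.to_string(), requested.to_string());
        Ok(())
    }
}

fn main() {
    let mut sled = Sled::default();
    let name = "oxp_c93031be-26a2-4a74-9ea8-77a9f8b6a2ab/crucible";

    // Assumption: an earlier path created the dataset with its own UUID.
    sled.ensure_dataset(name, "074b4516-47be-4121-aeda-ba2df47d5b0d")
        .expect("first creation succeeds");

    // Blueprint execution then requests the same dataset with a different UUID.
    match sled.ensure_dataset(name, "10795f0d-888d-404a-9ef9-a209831a8f13") {
        Ok(()) => println!("dataset ensured"),
        Err(e) => println!("{}", e),
    }
}
```

In this sketch, whichever caller reaches the dataset first fixes its UUID, so a second caller with a different idea of the UUID can never succeed by retrying.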

This can surface as errors in subtle ways. @iliana observed this on a new rack, where the "dataset ensure" step of blueprint execution failed.

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_executor
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
task: "blueprint_executor"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 366, triggered by a periodic timer firing
    started at 2024-12-17T23:45:32.808Z (33s ago) and ran for 4172ms
    target blueprint: 648b5619-6cf0-41ec-89cc-c12ba2e48d4a
    execution:        enabled
    status:           failed at: Deploy datasets (step 4/16)
    error:            step failed: Deploy datasets
      caused by:      4 errors encountered
      caused by:      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: c93031be-26a2-4a74-9ea8-77a9f8b6a2ab (zpool), kind: External } }, err: Some("Dataset oxp_c93031be-26a2-4a74-9ea8-77a9f8b6a2ab/crucible exists with a different uuid (has 074b4516-47be-4121-aeda-ba2df47d5b0d, requested 10795f0d-888d-404a-9ef9-a209831a8f13)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 8d25b9b7-2598-4887-becb-9a9fc34b8820 (zpool), kind: External } }, err: Some("Dataset oxp_8d25b9b7-2598-4887-becb-9a9fc34b8820/crucible exists with a different uuid (has 186679d6-c306-4c2f-b37b-51c78db0e186, requested 396994e7-4109-4568-81c7-a9e98060eabd)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 1816eaba-1a11-4dc1-ab17-c11c7d86f764 (zpool), kind: External } }, err: Some("Dataset oxp_1816eaba-1a11-4dc1-ab17-c11c7d86f764/crucible exists with a different uuid (has e27e9b5b-46cc-47e4-9070-3526d2420d3c, requested 46c52874-4f43-4394-82e7-7edd664c0401)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 3a50eb96-447b-46de-ae6f-ca2baa0e3011 (zpool), kind: External } }, err: Some("Dataset oxp_3a50eb96-447b-46de-ae6f-ca2baa0e3011/crucible exists with a different uuid (has 49f114e3-5126-4577-b6bf-758418f934b7, requested 51162c6f-670e-45ab-ad86-747ada0da490)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 9356d62c-2fb9-485d-a508-3ea11cd2057f (zpool), kind: External } }, err: Some("Dataset oxp_9356d62c-2fb9-485d-a508-3ea11cd2057f/crucible exists with a different uuid (has 046cb7fa-6f3e-4698-a8cf-98f134627357, requested 7a5fc75d-5ea1-42a0-8de8-b86e9ea47ad1)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: e5cba5ab-3219-49b7-9a75-f111b93ad09b (zpool), kind: 
External } }, err: Some("Dataset oxp_e5cba5ab-3219-49b7-9a75-f111b93ad09b/crucible exists with a different uuid (has e13c563b-1fb5-48b3-bc85-74aa85f355b8, requested 7f4bdce7-9845-4557-adef-09adf5a4008c)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: 0811e9c6-da24-474a-9c70-de7a1571d691 (zpool), kind: External } }, err: Some("Dataset oxp_0811e9c6-da24-474a-9c70-de7a1571d691/crypt/cockroachdb exists with a different uuid (has a6c27977-22ec-4675-84af-3fa339e99aa7, requested b71444d4-742f-468b-a2dd-d1686793ffac)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 0811e9c6-da24-474a-9c70-de7a1571d691 (zpool), kind: External } }, err: Some("Dataset oxp_0811e9c6-da24-474a-9c70-de7a1571d691/crucible exists with a different uuid (has 110dce89-bc87-4cb2-bb2f-e636cf5c6370, requested cf1f8b46-02f9-434f-b9ff-06721589d315)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: c93031be-26a2-4a74-9ea8-77a9f8b6a2ab (zpool), kind: External } }, err: Some("Dataset oxp_c93031be-26a2-4a74-9ea8-77a9f8b6a2ab/crypt/cockroachdb exists with a different uuid (has 5752fd6a-b1bc-4406-b002-13c55ffc1a6d, requested d1c0a521-b278-4ed4-b386-2ed8ea39ec15)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: f0e1da30-7e23-4907-b5fe-cee8799aa92f (zpool), kind: External } }, err: Some("Dataset oxp_f0e1da30-7e23-4907-b5fe-cee8799aa92f/crucible exists with a different uuid (has 2e6fe58f-d2b5-46d6-9ff4-2c2708d4c844, requested f28140c5-75e3-45c4-bd73-075f14cb198a)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 6623fac6-ffbb-4afc-89c2-eb9f49dd320e (zpool), kind: External } }, err: Some("Dataset oxp_6623fac6-ffbb-4afc-89c2-eb9f49dd320e/crucible exists with a different uuid (has 24b3e9d9-eadb-4215-91fe-f887bd9cc583, requested 
f4c0c079-95eb-44a6-bdf4-8b3bf6b1761f)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 1066d5fa-a973-4763-9b43-b1bd2a5928fb (zpool), kind: External } }, err: Some("Dataset oxp_1066d5fa-a973-4763-9b43-b1bd2a5928fb/crucible exists with a different uuid (has 2b40fac6-78ca-4aac-b263-d2655c5dd6d4, requested f7efcbfd-9578-48c2-a92b-4fc8e8f3833a)") }]                                                                                                                       
                      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: a35ee3ea-9de3-4f41-a4ef-3db3e33e46de (zpool), kind: External } }, err: Some("Dataset oxp_a35ee3ea-9de3-4f41-a4ef-3db3e33e46de/crucible exists with a different uuid (has 65c8bd50-db0d-4459-97db-3bb050157894, requested 0400eff3-ce0e-426e-ab1c-ef78fc171137)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 18b261d5-151d-462d-8b99-1fa5cb71eee8 (zpool), kind: External } }, err: Some("Dataset oxp_18b261d5-151d-462d-8b99-1fa5cb71eee8/crucible exists with a different uuid (has 41d1aa68-c6a9-43f8-a360-3b291bf9745e, requested 0d36849a-4a4c-41b8-bb0b-3cc63245a714)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 8ca478cc-835c-48be-8001-35f2c3df3611 (zpool), kind: External } }, err: Some("Dataset oxp_8ca478cc-835c-48be-8001-35f2c3df3611/crucible exists with a different uuid (has 03bbab0f-9a1d-4901-8226-e525617f1323, requested 0fa05d03-c7c5-43d8-bb15-6bea2d0981ab)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 3f71220e-3773-4999-8bd9-0ba1131e131d (zpool), kind: External } }, err: Some("Dataset oxp_3f71220e-3773-4999-8bd9-0ba1131e131d/crucible exists with a different uuid (has 72d998c4-39d3-4d3e-a70f-b0fb76b677e7, requested 41caa1c0-d99c-4e5a-a4b8-609c95b558b2)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: a35ee3ea-9de3-4f41-a4ef-3db3e33e46de (zpool), kind: External } }, err: Some("Dataset oxp_a35ee3ea-9de3-4f41-a4ef-3db3e33e46de/crypt/cockroachdb exists with a different uuid (has 0fc95ad1-fb49-40ad-9018-60b63502e82d, requested 50f154a4-94fd-4a41-bf83-cfb529a3324e)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: ff8584e6-c4d7-4a6b-a4cf-a823ce216a74 (zpool), 
kind: External } }, err: Some("Dataset oxp_ff8584e6-c4d7-4a6b-a4cf-a823ce216a74/crucible exists with a different uuid (has 00ab3308-8c90-4080-b6ce-8011d698a37f, requested 8a8716ea-c962-44f5-9006-7997cac2c083)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: InternalDns, pool_name: ZpoolName { id: a35ee3ea-9de3-4f41-a4ef-3db3e33e46de (zpool), kind: External } }, err: Some("Dataset oxp_a35ee3ea-9de3-4f41-a4ef-3db3e33e46de/crypt/internal_dns exists with a different uuid (has c1c3d8be-4e77-43f7-9739-7703de9b3b32, requested 8da35f44-8baa-4eb5-a775-d8d903f25c38)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 97a16bbd-49da-4294-a058-c898a56e0be5 (zpool), kind: External } }, err: Some("Dataset oxp_97a16bbd-49da-4294-a058-c898a56e0be5/crucible exists with a different uuid (has 7a1707af-fc42-4061-973a-0cd4718c8c1d, requested a08c713c-45c1-414f-93b0-033220af6c36)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 0513e823-33e4-4dfb-88d8-5227f70d8c5d (zpool), kind: External } }, err: Some("Dataset oxp_0513e823-33e4-4dfb-88d8-5227f70d8c5d/crucible exists with a different uuid (has c5de2dad-7281-45d7-b988-14ebc684ca50, requested c4498daa-63fb-4375-8d7c-e25de2a41238)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 21f3b575-92fd-48c4-b356-70b7ba2b83d8 (zpool), kind: External } }, err: Some("Dataset oxp_21f3b575-92fd-48c4-b356-70b7ba2b83d8/crucible exists with a different uuid (has b6fd83a6-c1bf-4a69-966c-1964d6a54fb1, requested d02c808b-3645-4189-af56-ad9f2265f5f0)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: c0e5dffc-702e-4bae-8bd4-f876bb8ce751 (zpool), kind: External } }, err: Some("Dataset oxp_c0e5dffc-702e-4bae-8bd4-f876bb8ce751/crucible exists with a different uuid (has 9418aed6-ae58-46ae-bf85-eae8fde2d258, requested 
f78a8c8e-7a96-4977-87f2-f135b9b0fa2c)") }]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 1b898b56-ccb0-4504-a513-45a5493bb914 (zpool), kind: External } }, err: Some("Dataset oxp_1b898b56-ccb0-4504-a513-45a5493bb914/crucible exists with a different uuid (has c6a39dc3-e0dd-41a4-93f0-0d3d7ed901f9, requested 0d845bbb-1c50-4b80-8056-1a63f18aa6a6)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 4bc14018-4db8-4308-a6d4-b0b58304531d (zpool), kind: External } }, err: Some("Dataset oxp_4bc14018-4db8-4308-a6d4-b0b58304531d/crucible exists with a different uuid (has 17e4fb03-8b7a-41df-8d0c-f334252bf063, requested 1b1699b3-4dba-4597-9a77-647485919a85)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 82f9c2fe-6c6b-45c3-b560-8e473f619640 (zpool), kind: External } }, err: Some("Dataset oxp_82f9c2fe-6c6b-45c3-b560-8e473f619640/crucible exists with a different uuid (has cde8f6f1-a752-47a4-a95d-fea45a77770e, requested 2ddf4ba9-dd17-4476-b50e-7b5b5a251c38)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 7b63d939-6415-469c-a297-a865825b6f80 (zpool), kind: External } }, err: Some("Dataset oxp_7b63d939-6415-469c-a297-a865825b6f80/crucible exists with a different uuid (has 53d1554f-8a9e-4c3a-a60a-a8d5b1edc8ad, requested 4e5ce5a1-83e5-4aaf-ac5d-a9bfa2a9ec38)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: 1b898b56-ccb0-4504-a513-45a5493bb914 (zpool), kind: External } }, err: Some("Dataset oxp_1b898b56-ccb0-4504-a513-45a5493bb914/crypt/cockroachdb exists with a different uuid (has 4c9ad1f5-b15f-44d5-b975-ca1a8f18a23f, requested 5b0a6030-4072-48af-8826-023615fa4128)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: ExternalDns, pool_name: ZpoolName { id: 1b898b56-ccb0-4504-a513-45a5493bb914 (zpool), 
kind: External } }, err: Some("Dataset oxp_1b898b56-ccb0-4504-a513-45a5493bb914/crypt/external_dns exists with a different uuid (has 841547e7-fbfe-40ad-bac6-ef65a07a1710, requested ad564f4c-b201-4db3-b726-77cbd745972b)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 752bce1d-1c70-41ba-9eab-1e765a97ccd3 (zpool), kind: External } }, err: Some("Dataset oxp_752bce1d-1c70-41ba-9eab-1e765a97ccd3/crucible exists with a different uuid (has c8001870-cb58-49cf-a1b4-c1ff5d99be59, requested beaffbb5-8f19-4782-b360-820d63bd6755)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: InternalDns, pool_name: ZpoolName { id: 1b898b56-ccb0-4504-a513-45a5493bb914 (zpool), kind: External } }, err: Some("Dataset oxp_1b898b56-ccb0-4504-a513-45a5493bb914/crypt/internal_dns exists with a different uuid (has 66b691ed-86c6-482d-b924-f6197993d9ff, requested c47e0f9a-b5d6-4240-9d31-0b39d0298673)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: a8025497-7eb1-409d-854b-8ba40e96095b (zpool), kind: External } }, err: Some("Dataset oxp_a8025497-7eb1-409d-854b-8ba40e96095b/crucible exists with a different uuid (has e52d4fb9-1122-4035-90bc-0a813c56c2f5, requested ce8f2a3e-912c-4df5-8d5a-99d1337961f5)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: a4a04165-8b0b-4d38-afe4-e0f2641f523d (zpool), kind: External } }, err: Some("Dataset oxp_a4a04165-8b0b-4d38-afe4-e0f2641f523d/crucible exists with a different uuid (has 2a07ea98-8c37-4a25-af3a-afcced72e3b7, requested d593af43-1ece-45f9-9f9e-44df7471b5e6)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 28dd172d-609b-4510-95b6-f56bc4d6da49 (zpool), kind: External } }, err: Some("Dataset oxp_28dd172d-609b-4510-95b6-f56bc4d6da49/crucible exists with a different uuid (has 8f600bca-1851-4f56-ba64-cc787ac24fb0, requested 
e781f837-6758-47cf-abb9-faf6b2c10768)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 560eb52a-f519-43e2-9c46-14523ea40623 (zpool), kind: External } }, err: Some("Dataset oxp_560eb52a-f519-43e2-9c46-14523ea40623/crucible exists with a different uuid (has 1816133d-5c99-4fde-95b7-e39f9ce3bb61, requested ee279684-97aa-4c8a-815f-bb0944476f6e)") }]                                                                                                       
                      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 18eddf74-dafd-4b95-bd73-ee8c81d6a6c3 (zpool), kind: External } }, err: Some("Dataset oxp_18eddf74-dafd-4b95-bd73-ee8c81d6a6c3/crucible exists with a different uuid (has b1765be5-f489-458d-851c-65ac31d2e4be, requested 0e71fd71-e6e1-4368-b273-0727b2dca7fe)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 196ba252-d3ec-49a5-a475-e5d25ffa6cc6 (zpool), kind: External } }, err: Some("Dataset oxp_196ba252-d3ec-49a5-a475-e5d25ffa6cc6/crucible exists with a different uuid (has 355aa14a-5104-491d-9bf0-e9ce913b9300, requested 12544abe-e8fc-43cb-abed-df70fc9b740d)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: ebdbc188-7bb2-4757-944a-95bffd7a1836 (zpool), kind: External } }, err: Some("Dataset oxp_ebdbc188-7bb2-4757-944a-95bffd7a1836/crucible exists with a different uuid (has af68ccf9-dc56-45af-9940-72a7e1ea86da, requested 35f5317f-3acc-4386-aed3-c7b57e0ae17d)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: a29a754a-10e1-41a6-ba1c-ae5a171ce796 (zpool), kind: External } }, err: Some("Dataset oxp_a29a754a-10e1-41a6-ba1c-ae5a171ce796/crucible exists with a different uuid (has fb35b6ff-817e-4bbe-8fed-502b2cd5d8a9, requested 518e81b1-5cee-46e8-a533-b262776ae8fe)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: c18703b8-7f5d-4448-a7bf-77d1617d40bf (zpool), kind: External } }, err: Some("Dataset oxp_c18703b8-7f5d-4448-a7bf-77d1617d40bf/crucible exists with a different uuid (has f6e0bfec-0969-482d-b357-dc3d6397c589, requested 526e9bb7-3a99-4fb4-b614-aefa426ad475)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: InternalDns, pool_name: ZpoolName { id: 292bb446-4069-4ba4-9444-806651fc9c08 (zpool), kind: 
External } }, err: Some("Dataset oxp_292bb446-4069-4ba4-9444-806651fc9c08/crypt/internal_dns exists with a different uuid (has e26b72a2-c9b1-4ac4-9961-0386ecd779c3, requested 5849e92f-778a-4602-a9ff-762cdb496a19)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: 292bb446-4069-4ba4-9444-806651fc9c08 (zpool), kind: External } }, err: Some("Dataset oxp_292bb446-4069-4ba4-9444-806651fc9c08/crypt/cockroachdb exists with a different uuid (has c15b5c0f-3c63-4339-8df3-b5171ef12249, requested 762e7f34-da2f-4c1f-b950-43e85dc91aa9)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Clickhouse, pool_name: ZpoolName { id: 292bb446-4069-4ba4-9444-806651fc9c08 (zpool), kind: External } }, err: Some("Dataset oxp_292bb446-4069-4ba4-9444-806651fc9c08/crypt/clickhouse exists with a different uuid (has 25bc826d-9dde-44c4-94e3-584b2a16aebd, requested a4b60a27-2a9f-4655-bf9f-56fb8af5f9b9)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 2c92ec21-fc46-44e7-aeff-2e1334a99e47 (zpool), kind: External } }, err: Some("Dataset oxp_2c92ec21-fc46-44e7-aeff-2e1334a99e47/crucible exists with a different uuid (has ba68aa1e-e698-45bc-8368-68a15446e146, requested adaa390b-f6e0-410a-86fe-ef6fce713782)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 292bb446-4069-4ba4-9444-806651fc9c08 (zpool), kind: External } }, err: Some("Dataset oxp_292bb446-4069-4ba4-9444-806651fc9c08/crucible exists with a different uuid (has 8fb6b2de-82aa-4a9b-9647-9ec6e4d1c001, requested b9432c01-f59d-44e4-8064-18c2f018b5fc)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: a5246c0e-b931-4f2a-b83e-758148f81bbf (zpool), kind: External } }, err: Some("Dataset oxp_a5246c0e-b931-4f2a-b83e-758148f81bbf/crucible exists with a different uuid (has d4d9c22b-cbd3-4e8b-8e58-da37d207aa79, requested 
d32828aa-1d9e-44f8-80ad-5af577335bbc)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 572bf165-a286-48c2-a00f-2ab5eabf7f80 (zpool), kind: External } }, err: Some("Dataset oxp_572bf165-a286-48c2-a00f-2ab5eabf7f80/crucible exists with a different uuid (has f43a5b63-565e-4f5e-aca4-c92c7231fd0d, requested de9c8951-a841-41a2-96fe-c2c452af2e31)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 34e8834d-1a32-4ea8-a34f-cf533f8420f2 (zpool), kind: External } }, err: Some("Dataset oxp_34e8834d-1a32-4ea8-a34f-cf533f8420f2/crucible exists with a different uuid (has 4c375660-1be8-4999-afd1-d90989a0726c, requested ea225cd8-c412-43c0-9260-bd41ab6ca538)") }]

With the code block here:

https://github.com/oxidecomputer/omicron/blob/main/sled-agent/src/sled_agent.rs#L965-L970

We effectively override the "dataset UUID" value for all datasets to the "zone UUID" value whenever we call omicron_zones_ensure.

However, whenever we try to execute the blueprint -- and call datasets_ensure -- we do not use the zone UUID and instead rely on the value recorded explicitly in the blueprint. Unfortunately, if this is not the zone UUID, datasets_ensure fails with the error @iliana saw: "dataset ... exists with a different uuid".
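
To make the conflict concrete, here is a minimal Rust sketch of the two code paths fighting over the same property. It is not the sled-agent implementation: the `Zfs` struct, the method names, and the string-typed UUIDs are assumptions made for illustration, and the only thing taken from the issue is the shape of the error message.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a UUID; the real code uses typed UUIDs.
type Uuid = &'static str;

/// Models the `oxide:uuid` ZFS property, keyed by dataset name.
/// This is an illustrative model, not sled-agent state.
#[derive(Default)]
struct Zfs {
    uuid_prop: HashMap<String, Uuid>,
}

impl Zfs {
    /// Zone-ensure path (assumed behavior): unconditionally stamps the
    /// zone UUID onto the dataset backing the zone.
    fn zones_ensure(&mut self, dataset: &str, zone_id: Uuid) {
        self.uuid_prop.insert(dataset.to_string(), zone_id);
    }

    /// Dataset-ensure path (assumed behavior): refuses to touch an
    /// existing dataset whose UUID differs from the requested one.
    fn datasets_ensure(&mut self, dataset: &str, requested: Uuid) -> Result<(), String> {
        if let Some(has) = self.uuid_prop.get(dataset) {
            if *has != requested {
                return Err(format!(
                    "Dataset {dataset} exists with a different uuid \
                     (has {has}, requested {requested})"
                ));
            }
        }
        self.uuid_prop.insert(dataset.to_string(), requested);
        Ok(())
    }
}
```

In this model, calling `datasets_ensure` with the blueprint's UUID, then `zones_ensure` with the zone UUID, then `datasets_ensure` again with the blueprint's UUID yields the "exists with a different uuid" error on the last call.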

@smklein
Collaborator Author

smklein commented Dec 18, 2024

#7266 is my proposed fix here, but it has drawbacks. However, my hope is that:

  • It deletes this source of UUID conflict, and stops further cases of "zone UUID used as dataset UUID" from propagating.
  • It allows blueprints to act as a source-of-truth, and update these ZFS properties.

Unfortunately, it does come with some downsides:

  • It continues to rely on "referencing datasets by (pool, dataset kind)", and hoping this combination is unique. If the dataset UUID is mutable (as #7266, "[sled-agent] Avoid causing UUID conflicts", relies on to change it from "zone ID" to "whatever the blueprint thinks it should be"), we are prevented from using the UUID as a stable identifier for a while longer. However, my hope is that this is a temporary condition, and that we can eventually trust the dataset UUID to be a unique identifier once blueprint execution has sufficiently propagated.
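
As a rough illustration of the trade-off described above, the sketch below keys datasets by (pool, kind) and lets the blueprint's UUID overwrite whatever is already stored. It is a sketch under stated assumptions, not the code in #7266: `DatasetKey`, `Datasets`, and `ensure_from_blueprint` are hypothetical names, and UUIDs are plain strings.

```rust
use std::collections::HashMap;

// Hypothetical key: the comment above describes datasets being referenced
// by (pool, dataset kind) rather than by UUID.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct DatasetKey {
    pool: String,
    kind: String,
}

/// Models the stored `oxide:uuid` value per dataset (illustrative only).
#[derive(Default)]
struct Datasets {
    uuid_by_key: HashMap<DatasetKey, String>,
}

impl Datasets {
    /// Treats the blueprint as the source of truth: overwrites any existing
    /// UUID and returns the replaced value if it differed.
    fn ensure_from_blueprint(&mut self, key: DatasetKey, blueprint_id: &str) -> Option<String> {
        let previous = self.uuid_by_key.insert(key, blueprint_id.to_string());
        previous.filter(|old| old.as_str() != blueprint_id)
    }
}
```

Because the lookup never consults the UUID, the UUID can change underneath callers, which is exactly why it cannot yet serve as a stable identifier in this scheme.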

@jgallagher
Contributor

I balked at it initially, but I'm starting to come around to @davepacheco's suggestion to change RSS to assign dataset UUIDs to match zone UUIDs, which works around this from the other direction: it makes sled-agent's incorrect choice incidentally correct. I think we could recover from this in the blueprint system in the long run (by checking for equal IDs and generating new dataset UUIDs, once we know sled-agent won't clobber them). Happy to chat more tomorrow.

@smklein
Collaborator Author

smklein commented Dec 18, 2024

I balked at it initially, but I'm starting to come around to @davepacheco's suggestion to change RSS to assign dataset UUIDs to match zone UUIDs, which works around this from the other direction: it makes sled-agent's incorrect choice incidentally correct. I think we could recover from this in the blueprint system in the long run (by checking for equal IDs and generating new dataset UUIDs, once we know sled-agent won't clobber them). Happy to chat more tomorrow.

What about systems with blueprints that have dataset UUIDs != zone UUIDs? Are we making the assumption that these don't exist, in this proposal?

EDIT: ah, never mind, clearly misread:

I think we could recover from this in the blueprint system in the long run (by checking for equal IDs and generating new dataset UUIDs, once we know sled-agent won't clobber them)

So presumably, we would just make the blueprint system "handle all cases" - it might have dataset UUIDs == zone UUIDs, or it might not
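
One way to picture "handle all cases" is a small decision helper on the blueprint side. This is only a sketch of the idea discussed in this thread, not planner code: `plan_dataset_id`, the `sled_agent_may_clobber` flag, and the `fresh_id` generator are all hypothetical, and IDs are plain strings.

```rust
/// Hypothetical planner helper: decides which dataset UUID a blueprint
/// should carry, given the current dataset and zone IDs.
fn plan_dataset_id(
    dataset_id: &str,
    zone_id: &str,
    sled_agent_may_clobber: bool,
    fresh_id: impl FnOnce() -> String,
) -> String {
    if dataset_id == zone_id && !sled_agent_may_clobber {
        // Legacy collision, and sled-agent no longer rewrites the property:
        // it is now safe to give the dataset its own identifier.
        fresh_id()
    } else {
        // Either the IDs are already distinct, or changing them now would
        // just be overwritten again; keep the current value.
        dataset_id.to_string()
    }
}
```

The generator is passed in as a closure so the sketch stays free of any particular UUID library.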

@iliana
Contributor

iliana commented Dec 18, 2024

Just to confirm, I am seeing this again on madrid after installing 020fde1 and double-checking that I did clean slate properly. The symptoms are the same as we observed on dublin. The correct dataset ID for this particular Crucible dataset is aacd1786-b55f-4004-934c-7700e947afdc, but oxide:uuid was later rewritten to 6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71, the zone ID.

root@BRM42220007:~# zfs get oxide:uuid oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible
NAME                                               PROPERTY    VALUE                                 SOURCE
oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible  oxide:uuid  6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71  local

root@BRM42220007:~# zoneadm list | grep 6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71
oxz_crucible_6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71

root@BRM42220007:~# zpool history oxp_295f61f3-4edd-4b54-847b-d868bd7af718 | grep oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible
1986-12-28.00:19:45 zfs create -o zoned=on -o mountpoint=/data oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible
1986-12-28.00:19:50 zfs set quota=none reservation=none compression=off oxide:uuid=aacd1786-b55f-4004-934c-7700e947afdc oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible
2024-12-18.00:47:00 zfs set oxide:uuid=6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71 oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible
2024-12-18.00:47:06 zfs set oxide:uuid=6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71 oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible
2024-12-18.00:47:20 zfs create oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible/regions

root@BRM42220007:~# jq '.datasets["aacd1786-b55f-4004-934c-7700e947afdc"]' /pool/int/62cf5414-d097-4f67-b181-925e7f9c5009/config/omicron-datasets.json
{
  "id": "aacd1786-b55f-4004-934c-7700e947afdc",
  "name": {
    "pool_name": "oxp_295f61f3-4edd-4b54-847b-d868bd7af718",
    "kind": "crucible"
  },
  "compression": {
    "type": "off"
  },
  "quota": null,
  "reservation": null
}

And blueprint execution fails in the same way:

root@oxz_switch1:~# omdb nexus background-tasks show blueprint_executor
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
task: "blueprint_executor"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 13, triggered by a dependent task completing
    started at 2024-12-18T00:57:38.243Z (16s ago) and ran for 662ms
    target blueprint: 4335eb7f-a07b-4f0b-8245-1687a490942c
    execution:        enabled
    status:           failed at: Deploy datasets (step 4/16)
    error:            step failed: Deploy datasets
      caused by:      4 errors encountered
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
      caused by:      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: f68be2a4-6d42-49fe-bfe6-70f576904390 (zpool), kind: External } }, err: Some("Dataset oxp_f68be2a4-6d42-49fe-bfe6-70f576904390/crucible exists with a different uuid (has 9f47f721-4572-4428-aef3-c2687409879d, requested 3e77e1e7-0d5b-4b18-b6f4-8152f408a781)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 374a0e3b-149d-4160-8bba-13609abb1123 (zpool), kind: External } }, err: Some("Dataset oxp_374a0e3b-149d-4160-8bba-13609abb1123/crucible exists with a different uuid (has a15e9ee7-5fec-4697-a965-7a9579445930, requested 3f76cfc2-2920-4eb8-9ce2-6fe67f71c44b)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: cc1e5c46-80a2-460e-837e-f8068c6141f6 (zpool), kind: External } }, err: Some("Dataset oxp_cc1e5c46-80a2-460e-837e-f8068c6141f6/crucible exists with a different uuid (has 562e3f50-5f53-4024-9954-2a46546e16a5, requested 41514b48-14a3-4bb0-9d5b-0b1f0f8ae700)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: 9b601d89-79b8-44b5-a78a-319e46a7ff86 (zpool), kind: External } }, err: Some("Dataset oxp_9b601d89-79b8-44b5-a78a-319e46a7ff86/crypt/cockroachdb exists with a different uuid (has 33a3722d-ff6f-48ca-8e32-c7156e43d7e5, requested 786a7778-9814-4ab2-b88e-8ceb54834100)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 63264b81-8af8-405e-8c73-1b09714130ec (zpool), kind: External } }, err: Some("Dataset oxp_63264b81-8af8-405e-8c73-1b09714130ec/crucible exists with a different uuid (has ff4ba4f2-e510-4b59-a096-dd0ba559a245, requested 8ceadeaf-1918-46f5-a6be-6dcd09147dac)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: e14e618c-7ac2-40c6-9381-4f6cfb850d7e (zpool), 
kind: External } }, err: Some("Dataset oxp_e14e618c-7ac2-40c6-9381-4f6cfb850d7e/crucible exists with a different uuid (has 5f781b2e-e18e-40d9-9cbb-b39048d4ec77, requested a068ffc3-212b-436b-b7bf-84e7a8ca48ac)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 295f61f3-4edd-4b54-847b-d868bd7af718 (zpool), kind: External } }, err: Some("Dataset oxp_295f61f3-4edd-4b54-847b-d868bd7af718/crucible exists with a different uuid (has 6bb2ce7d-c48d-41a8-9545-9a3bff9c5f71, requested aacd1786-b55f-4004-934c-7700e947afdc)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 9c5906bd-aec6-4def-923f-c942fd9e4d4b (zpool), kind: External } }, err: Some("Dataset oxp_9c5906bd-aec6-4def-923f-c942fd9e4d4b/crucible exists with a different uuid (has 5274ece9-6723-47e1-aaa3-d0e1c1b08b78, requested ab646477-ccb3-4438-9445-43da9dc765d9)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: InternalDns, pool_name: ZpoolName { id: 9b601d89-79b8-44b5-a78a-319e46a7ff86 (zpool), kind: External } }, err: Some("Dataset oxp_9b601d89-79b8-44b5-a78a-319e46a7ff86/crypt/internal_dns exists with a different uuid (has 896343ab-b8fa-4756-8608-f5d3e1b22b06, requested bc60c583-e814-4de8-a834-e7e5e9962a22)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 9b601d89-79b8-44b5-a78a-319e46a7ff86 (zpool), kind: External } }, err: Some("Dataset oxp_9b601d89-79b8-44b5-a78a-319e46a7ff86/crucible exists with a different uuid (has 71f3f490-7551-4197-ad4f-88c09d65bbd6, requested d7d5eb33-dc8b-476d-b60e-11eb26c1576e)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Clickhouse, pool_name: ZpoolName { id: 9b601d89-79b8-44b5-a78a-319e46a7ff86 (zpool), kind: External } }, err: Some("Dataset oxp_9b601d89-79b8-44b5-a78a-319e46a7ff86/crypt/clickhouse exists with a different uuid (has 846635f5-64aa-4f83-a54b-6fffb403b20e, requested 
ee2a87f2-a1e1-4ce0-ba99-0f5b7118e01e)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 5e671abd-4d28-4055-af8a-e08c4e24046e (zpool), kind: External } }, err: Some("Dataset oxp_5e671abd-4d28-4055-af8a-e08c4e24046e/crucible exists with a different uuid (has 22c2d915-a63c-4df4-bfd8-485e7409983f, requested ef9da8be-08a0-4356-92a9-62ca285e8b56)") }]                                                                                                          
                      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: InternalDns, pool_name: ZpoolName { id: d91df18c-a00d-464a-841a-03cda67f3f8b (zpool), kind: External } }, err: Some("Dataset oxp_d91df18c-a00d-464a-841a-03cda67f3f8b/crypt/internal_dns exists with a different uuid (has 7df5d59f-a5e7-47f5-84ad-72cf8e4d9923, requested 21241953-32c4-4720-8955-a44374e782f7)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 619267a6-7fd2-4f60-a973-2be1346bac2f (zpool), kind: External } }, err: Some("Dataset oxp_619267a6-7fd2-4f60-a973-2be1346bac2f/crucible exists with a different uuid (has 45bc9753-d218-4264-b8e6-9ad366297847, requested 3e39e72c-5b40-4196-a55d-163540655579)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 2c908eb8-302c-427e-847c-f3e2ef3881b8 (zpool), kind: External } }, err: Some("Dataset oxp_2c908eb8-302c-427e-847c-f3e2ef3881b8/crucible exists with a different uuid (has bb578a21-4ced-43c9-a1a0-f144debbf960, requested 3eb35689-bf10-4347-9cd9-3766cbe0d750)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: ExternalDns, pool_name: ZpoolName { id: d91df18c-a00d-464a-841a-03cda67f3f8b (zpool), kind: External } }, err: Some("Dataset oxp_d91df18c-a00d-464a-841a-03cda67f3f8b/crypt/external_dns exists with a different uuid (has 4ffe6d68-e43f-4c9c-a48d-43f994f26112, requested 432bd824-697b-4da9-9d40-39a91f601a89)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: e6732d6f-e1f4-4213-b913-750b2b27ba97 (zpool), kind: External } }, err: Some("Dataset oxp_e6732d6f-e1f4-4213-b913-750b2b27ba97/crucible exists with a different uuid (has b48016f6-f012-473b-9e79-b8695d6929a5, requested 4676f827-c591-45da-b0f0-5c8314537e33)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 
4a662bc2-9e83-467a-97b7-b6c409085813 (zpool), kind: External } }, err: Some("Dataset oxp_4a662bc2-9e83-467a-97b7-b6c409085813/crucible exists with a different uuid (has 74758955-a0bc-4b34-ad59-8d12cad69158, requested 74346e89-e624-4fb5-b025-0f9814941a66)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 30482ef8-897e-43a2-a94c-7f4bc2a53e7b (zpool), kind: External } }, err: Some("Dataset oxp_30482ef8-897e-43a2-a94c-7f4bc2a53e7b/crucible exists with a different uuid (has 751b199c-c6e8-4de7-9bb7-0bc0a91b5ad0, requested 938c04de-96ea-4f0b-86c1-8645983bf0ef)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 2f48b9ee-c1d3-4c19-b1e2-4010dcb599c9 (zpool), kind: External } }, err: Some("Dataset oxp_2f48b9ee-c1d3-4c19-b1e2-4010dcb599c9/crucible exists with a different uuid (has 0f844ea8-c99a-4e51-90af-6d69ae1be2e4, requested 959051f7-2bf7-41fa-95f2-cd61a65ef667)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 737c7f72-d9d1-48b1-83f0-45f45636b187 (zpool), kind: External } }, err: Some("Dataset oxp_737c7f72-d9d1-48b1-83f0-45f45636b187/crucible exists with a different uuid (has 300258cd-58a1-421a-aa21-8d7c27b26ee5, requested a315b8e1-5cd9-4f47-af16-086739b07ce5)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 64215f90-17e8-4258-81d2-16ae61feaa19 (zpool), kind: External } }, err: Some("Dataset oxp_64215f90-17e8-4258-81d2-16ae61feaa19/crucible exists with a different uuid (has 2773d920-1d16-4420-9d43-ed5382d39121, requested a8336f08-e5f7-4f86-bb10-ad4347716f0c)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: d91df18c-a00d-464a-841a-03cda67f3f8b (zpool), kind: External } }, err: Some("Dataset oxp_d91df18c-a00d-464a-841a-03cda67f3f8b/crypt/cockroachdb exists with a different uuid (has 
b25b97a9-d704-4cae-a120-a15109e48086, requested be937109-951e-4f1e-b0c4-eb15684aa2c7)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: c718e6d6-1a45-47b7-981c-f9c07c1ff004 (zpool), kind: External } }, err: Some("Dataset oxp_c718e6d6-1a45-47b7-981c-f9c07c1ff004/crucible exists with a different uuid (has 221c8bff-6b6a-4fe1-a1c4-d6a3e23caba8, requested c5e2399d-aa1b-4fb8-9585-91227b933a08)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: d91df18c-a00d-464a-841a-03cda67f3f8b (zpool), kind: External } }, err: Some("Dataset oxp_d91df18c-a00d-464a-841a-03cda67f3f8b/crucible exists with a different uuid (has ddb5fdf7-6ecd-4dcd-a70c-a4297b339156, requested ecfd5f0a-e88f-4bec-8fbf-978adf8ed316)") }]
                      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: InternalDns, pool_name: ZpoolName { id: 62449d99-c7c3-49f3-851a-aa382108607b (zpool), kind: External } }, err: Some("Dataset oxp_62449d99-c7c3-49f3-851a-aa382108607b/crypt/internal_dns exists with a different uuid (has 698350f4-0c6c-41db-8ae1-0043d6b43070, requested 05751f71-33d8-4e17-a1d2-b287b6c64a1d)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: d8c9cdfb-0ebd-4c4d-9dbf-c1ec26bbc3d5 (zpool), kind: External } }, err: Some("Dataset oxp_d8c9cdfb-0ebd-4c4d-9dbf-c1ec26bbc3d5/crucible exists with a different uuid (has 49bc2d66-0e0a-4576-9173-83a926044b8d, requested 14665dd4-57a4-4d71-be7f-88cbf5642931)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 0c867dac-b526-4f88-b940-93f9147220f4 (zpool), kind: External } }, err: Some("Dataset oxp_0c867dac-b526-4f88-b940-93f9147220f4/crucible exists with a different uuid (has 9de691bd-e831-40af-ab61-d88cec52b4c1, requested 2078b0b3-141d-4082-a5a8-d4c6b13aadcf)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: b68eddc8-3803-4153-bd33-b984df55525e (zpool), kind: External } }, err: Some("Dataset oxp_b68eddc8-3803-4153-bd33-b984df55525e/crucible exists with a different uuid (has e414a07f-6359-40c9-a59e-066bb90a27de, requested 5b2ebadf-fd85-4598-af3f-3b10dddad9bc)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 2b7940ff-c48b-4bc8-bbd9-9c6d2a9aabbb (zpool), kind: External } }, err: Some("Dataset oxp_2b7940ff-c48b-4bc8-bbd9-9c6d2a9aabbb/crucible exists with a different uuid (has 4ea18236-d35d-4d77-98fc-0ca6e7a0480a, requested 7068e0ec-2025-4cff-85d5-2cd9211044bb)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 89d7695d-1e15-45e2-bd18-b7bbed142349 (zpool), 
kind: External } }, err: Some("Dataset oxp_89d7695d-1e15-45e2-bd18-b7bbed142349/crucible exists with a different uuid (has 3f2b90b5-c532-4fc2-befe-36703fa1eb5d, requested 71eade4c-d260-431d-a19e-fc9ab4815b4a)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: aa9c56cd-4ade-410c-a04b-ddbd83fb9538 (zpool), kind: External } }, err: Some("Dataset oxp_aa9c56cd-4ade-410c-a04b-ddbd83fb9538/crucible exists with a different uuid (has d9966af2-daf7-4afb-8a27-031cb3e316af, requested bb1b0d19-bfd2-4483-938b-cca3fe5eb27b)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: 62449d99-c7c3-49f3-851a-aa382108607b (zpool), kind: External } }, err: Some("Dataset oxp_62449d99-c7c3-49f3-851a-aa382108607b/crypt/cockroachdb exists with a different uuid (has be912fd6-4b08-4e19-813a-a073092729dd, requested bb5cfb7c-7262-4f2d-a769-073a3a24a352)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 9b64a304-709d-4dc3-a026-56cd1adfef72 (zpool), kind: External } }, err: Some("Dataset oxp_9b64a304-709d-4dc3-a026-56cd1adfef72/crucible exists with a different uuid (has 1582c17e-d989-44ef-978e-39f1cbf72829, requested d1bb3d35-2c29-4dc2-b493-3c4298dceb3c)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: d167f45f-87b9-432b-836f-8b532766fde9 (zpool), kind: External } }, err: Some("Dataset oxp_d167f45f-87b9-432b-836f-8b532766fde9/crucible exists with a different uuid (has c6be4237-b8bc-4476-ae7d-5fc89b89ecb5, requested dea84a2a-432a-4859-a21b-87920ca90c45)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 963e1276-d253-4329-b078-7b205b123ab2 (zpool), kind: External } }, err: Some("Dataset oxp_963e1276-d253-4329-b078-7b205b123ab2/crucible exists with a different uuid (has 8bb34df3-904f-4c28-a772-070627cc414a, requested 
e0ad381d-0f65-4614-869c-9860d21c4502)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 62449d99-c7c3-49f3-851a-aa382108607b (zpool), kind: External } }, err: Some("Dataset oxp_62449d99-c7c3-49f3-851a-aa382108607b/crucible exists with a different uuid (has 542a22fa-4079-400c-974c-71c9a94cdd65, requested ebe36e83-abd3-40b2-8511-3b173c431824)") }]                                                                                                                    
                      Error: failure deploying datasets: [DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 70f43e09-8d50-440f-baf1-28f804578c52 (zpool), kind: External } }, err: Some("Dataset oxp_70f43e09-8d50-440f-baf1-28f804578c52/crucible exists with a different uuid (has 6a42d949-347d-4262-9ab1-210b6d4d5e79, requested 1bfbc574-6020-46f7-a614-a85799249180)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 1102b537-71ba-46a7-9841-7e08c2e73314 (zpool), kind: External } }, err: Some("Dataset oxp_1102b537-71ba-46a7-9841-7e08c2e73314/crucible exists with a different uuid (has fdf01a7e-c570-4b1f-9011-8f193b20c751, requested 3880e98b-7ff9-4b5b-9e4a-3db003b3a4da)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 16711d75-8d75-4785-bf53-665a39f7ff37 (zpool), kind: External } }, err: Some("Dataset oxp_16711d75-8d75-4785-bf53-665a39f7ff37/crucible exists with a different uuid (has cb19d9d9-0334-44e5-b5b9-cf348b2f3840, requested 3e5be34d-3aa7-4bc2-9f0a-e3bf58d3ba11)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 34bbfab4-5510-4bc3-9860-88546f3827ef (zpool), kind: External } }, err: Some("Dataset oxp_34bbfab4-5510-4bc3-9860-88546f3827ef/crucible exists with a different uuid (has 8c942000-9dd9-48a5-b63c-4593343ae43c, requested 67d0f014-13a7-4ccb-bc21-8c469d74e475)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: fed2fdfc-17a5-4538-951d-6a4994270046 (zpool), kind: External } }, err: Some("Dataset oxp_fed2fdfc-17a5-4538-951d-6a4994270046/crucible exists with a different uuid (has 9d186a86-c595-4722-b14c-9bb67ef0a546, requested 6b4acf2d-ca73-45f6-b553-1bd5a3e1d995)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 9cb4f62b-eed4-4474-9abe-8a0ee6f94532 (zpool), kind: 
External } }, err: Some("Dataset oxp_9cb4f62b-eed4-4474-9abe-8a0ee6f94532/crucible exists with a different uuid (has a3c52caf-96f4-4925-b037-a34dba181dce, requested 7c49862d-9282-446f-9f69-9fac2a9a9555)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 4ff0ed1f-4b23-4f14-854e-a8871d603905 (zpool), kind: External } }, err: Some("Dataset oxp_4ff0ed1f-4b23-4f14-854e-a8871d603905/crucible exists with a different uuid (has a08a0b12-f2a2-43af-8ce7-86172b07ae17, requested 8e69c374-497d-4370-be5a-5fe9ec95eada)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 7b6be2a4-5ec0-4e6b-96ad-3af2d4c581cc (zpool), kind: External } }, err: Some("Dataset oxp_7b6be2a4-5ec0-4e6b-96ad-3af2d4c581cc/crucible exists with a different uuid (has 936557c4-f101-44d7-88ee-e296365744e6, requested adecb4e8-869a-4dd0-843f-d3a6b2db60fb)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: 7b6be2a4-5ec0-4e6b-96ad-3af2d4c581cc (zpool), kind: External } }, err: Some("Dataset oxp_7b6be2a4-5ec0-4e6b-96ad-3af2d4c581cc/crypt/cockroachdb exists with a different uuid (has d815f78e-126b-4f2f-81e6-8543d22d22fe, requested dedbcf8e-e512-485c-87f6-5ac3604a503d)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: f0678e32-f1ca-4db8-8399-9fbceb18c085 (zpool), kind: External } }, err: Some("Dataset oxp_f0678e32-f1ca-4db8-8399-9fbceb18c085/crucible exists with a different uuid (has 166cac8d-5770-4085-a766-cef8da353b60, requested ec4bda34-6c81-4f5c-9adb-697e5a8d0c82)") }, DatasetManagementStatus { dataset_name: DatasetName { kind: Crucible, pool_name: ZpoolName { id: 4392bb10-7c68-46d4-978a-5c2b536dca53 (zpool), kind: External } }, err: Some("Dataset oxp_4392bb10-7c68-46d4-978a-5c2b536dca53/crucible exists with a different uuid (has 71ac999e-cb43-44d6-8476-e4deb4a897dc, requested ef0b68fa-8d67-429d-90f4-21f5d5648cef)") 
}, DatasetManagementStatus { dataset_name: DatasetName { kind: Cockroach, pool_name: ZpoolName { id: f0678e32-f1ca-4db8-8399-9fbceb18c085 (zpool), kind: External } }, err: Some("Dataset oxp_f0678e32-f1ca-4db8-8399-9fbceb18c085/crypt/cockroachdb exists with a different uuid (has 4a87f1cf-a1c3-4af9-9c70-111052807ec4, requested f779213b-6a9d-4b11-a7ed-0372dc11c6bd)") }] 

@jgallagher
Contributor

> What about systems with blueprints that have dataset UUIDs != zone UUIDs? Are we making the assumption that these don't exist, in this proposal?

I was assuming that, yeah. I thought prior to R12 there weren't any systems that had blueprints with dataset UUIDs?

@jgallagher
Contributor

> The Sled Agent, during zone creation, "ensures" that necessary datasets exist. This is needed for backwards compatibility, because we do not know that all deployed systems will have transient datasets that get created prior to their zones launching.

Hmm, I think I'm missing something. The dataset that omicron_zones_ensure is ensuring is the durable dataset, not the transient dataset, right? dataset_name() here:

let Some(dataset_name) = zone.dataset_name() else {

lands in the OmicronZoneTypeExt trait where it only returns Some for durable datasets.
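
As a rough illustration of the "only returns Some for durable datasets" behavior, here is a minimal sketch. The `ZoneKind` enum and `durable_dataset_name` function are hypothetical stand-ins, not the real `OmicronZoneTypeExt` API; the path suffixes are taken from the error log above, and the assumption that Nexus and NTP zones have no durable dataset is mine.

```rust
/// Hypothetical, simplified set of zone kinds (not the real Omicron enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ZoneKind {
    Crucible,
    CockroachDb,
    InternalDns,
    Nexus,
    Ntp,
}

/// Returns the durable dataset name for a zone on `pool`, or `None` when
/// the zone only has a transient root filesystem (assumed here for Nexus
/// and NTP).
fn durable_dataset_name(kind: ZoneKind, pool: &str) -> Option<String> {
    let suffix = match kind {
        ZoneKind::Crucible => "crucible",
        ZoneKind::CockroachDb => "crypt/cockroachdb",
        ZoneKind::InternalDns => "crypt/internal_dns",
        // No durable dataset: callers fall through their `let ... else`.
        ZoneKind::Nexus | ZoneKind::Ntp => return None,
    };
    Some(format!("{pool}/{suffix}"))
}
```

With a shape like this, a `let Some(dataset_name) = ... else { ... }` caller skips zones that have no durable dataset, so any transient dataset has to be handled by a different path.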

Is there a separate path in sled-agent that ensures transient datasets exist?

@smklein
Collaborator Author

smklein commented Dec 18, 2024

> Is there a separate path in sled-agent that ensures transient datasets exist?

Yes. There is a different mechanism for picking the transient dataset:

let filesystem_pool = match (&zone.filesystem_pool, zone.dataset_name())
{
    // If a pool was explicitly requested, use it.
    (Some(pool), _) => pool.clone(),
    // NOTE: The following cases are for backwards compatibility.
    //
    // If no pool was selected, prefer to use the same pool as the
    // durable dataset. Otherwise, pick one randomly.
    (None, Some(dataset)) => dataset.pool().clone(),
    (None, None) => all_u2_pools
        .choose(&mut rand::thread_rng())
        .ok_or_else(|| Error::U2NotFound)?
        .clone(),
};

morlandi7 added this to the 12 milestone Dec 18, 2024
@smklein
Collaborator Author

smklein commented Dec 18, 2024

Closing the loop a little for history's sake: we had a meeting about this and plan to take the #7266 route (do not overwrite the dataset UUID and keep checking it, but stop setting it to the zone ID in the omicron_zones_ensure pathway).
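
To make the agreed behavior concrete, here is a minimal sketch under stated assumptions. `Datasets`, `ensure`, `uuid_of`, and `EnsureError` are hypothetical names, and an in-memory map stands in for the ZFS property that holds the dataset UUID; this is not the sled agent's actual code. The zone-ensure pathway would pass `None` (never set a UUID), while the datasets-ensure pathway would pass `Some(id)` (set it if absent, and still reject a mismatch).

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for datasets and their UUID property.
/// Maps dataset name -> recorded UUID (None if no UUID was ever set).
#[derive(Default)]
struct Datasets {
    uuids: HashMap<String, Option<String>>,
}

#[derive(Debug, PartialEq)]
enum EnsureError {
    UuidMismatch { name: String, has: String, requested: String },
}

impl Datasets {
    /// Ensure `name` exists. `requested` is `None` when the caller does not
    /// know the dataset UUID and `Some(id)` when it does.
    fn ensure(&mut self, name: &str, requested: Option<&str>) -> Result<(), EnsureError> {
        let current: Option<String> = self.uuids.get(name).cloned().flatten();
        match (current.as_deref(), requested) {
            // Keep checking: an existing UUID is never overwritten.
            (Some(has), Some(req)) if has != req => Err(EnsureError::UuidMismatch {
                name: name.to_string(),
                has: has.to_string(),
                requested: req.to_string(),
            }),
            // No UUID recorded yet: the first caller that knows it sets it.
            (None, Some(req)) => {
                self.uuids.insert(name.to_string(), Some(req.to_string()));
                Ok(())
            }
            // Matching UUID, or no UUID requested: create if missing and
            // leave any recorded UUID untouched.
            _ => {
                self.uuids.entry(name.to_string()).or_insert(None);
                Ok(())
            }
        }
    }

    /// Look up the recorded UUID, if any.
    fn uuid_of(&self, name: &str) -> Option<&str> {
        self.uuids.get(name).and_then(|u| u.as_deref())
    }
}
```

In this sketch, a zone-ensure call that runs before datasets-ensure creates the dataset without a UUID, so the later datasets-ensure call can record its own value instead of hitting "exists with a different uuid".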

smklein added a commit that referenced this issue Dec 18, 2024
- Avoids overwriting the value of "dataset UUID" when creating datasets
from `omicron_zones_ensure`. Instead, don't set any dataset UUID, which
lets subsequent calls to `datasets_ensure` set the right value here.

Fixes #7265