
Remove ip and port columns from dataset table #7210

Open
andrewjstone opened this issue Dec 5, 2024 · 6 comments
Assignees: andrewjstone
Labels: cleanup (Code cleanliness), database (Related to database access)

Comments

@andrewjstone
Contributor

These columns are optional and only used by crucible. This makes working with datasets trickier than it needs to be - doubly so for those who don't know these columns are only used by crucible.

It also looks like the port itself is not actually used; the port in the region table is used instead. Regions map 1:1 with datasets, so we should move the ip column to the region table and remove both columns from the dataset table. It should be reasonably straightforward to do this in a schema migration. Then we'll update the code to point at the region table.

@andrewjstone added the cleanup and database labels on Dec 5, 2024
@andrewjstone self-assigned this on Dec 5, 2024
@andrewjstone
Contributor Author

This is a prerequisite for #6998.

@andrewjstone
Contributor Author

andrewjstone commented Dec 6, 2024

After reading more code and thinking through this, I have changed the implementation plan. Our goal is to remove the ip and port columns from the dataset table. For existing regions it's easy to migrate the ip column from the dataset to a new column in region. However, this doesn't work for regions that haven't been created yet, since the region will need some way to get its address. That's why the dataset carries an address in the first place. So we need to step back and ask where that address originally comes from.

The address for the dataset is filled in by the reconfigurator and comes from the crucible zone associated with the dataset. Datasets map to a specific zone and are currently not relocated across zones. It's unclear whether we'll ever want to do that, and it's somewhat irrelevant to the discussion at hand. So what we really want is for region allocation to be able to find the zone associated with the dataset it has chosen for the region. Then the IP address the region uses will come from the zone, not from the dataset.

Now, how can we find the zone associated with a dataset so that a region can use the right address? That information already exists in the target blueprint. We can loop through all the crucible zones, find the one whose dataset matches, and return its IP. We may want to add an index for this. Alternatively, we could put a backpointer to the zone_id in the bp_omicron_dataset table and do a point lookup, since the dataset id in the dataset table is the same as in the blueprint. Both approaches use only the target blueprint, which moves us toward the eventual goal of eliminating the duplicate dataset table as described in #6998.
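To make the two lookups concrete, here's a minimal Rust sketch of both approaches. Every type, field, and name below is an illustrative stand-in, not the actual omicron definition:

```rust
use std::collections::BTreeMap;
use std::net::Ipv6Addr;

// Illustrative stand-ins only -- not the real omicron types.
type DatasetId = u32;
type ZoneId = u32;

struct CrucibleZone {
    id: ZoneId,
    underlay_ip: Ipv6Addr,
    dataset_id: DatasetId,
}

struct TargetBlueprint {
    crucible_zones: Vec<CrucibleZone>,
    // Hypothetical backpointer, as it might look if bp_omicron_dataset
    // grew a zone_id column: dataset id -> zone id.
    dataset_to_zone: BTreeMap<DatasetId, ZoneId>,
}

impl TargetBlueprint {
    /// Approach 1: loop over all crucible zones and return the IP of
    /// the zone whose dataset matches.
    fn ip_by_scan(&self, dataset: DatasetId) -> Option<Ipv6Addr> {
        self.crucible_zones
            .iter()
            .find(|z| z.dataset_id == dataset)
            .map(|z| z.underlay_ip)
    }

    /// Approach 2: point lookup through the zone_id backpointer, then
    /// fetch the zone itself (a linear scan here; an index in practice).
    fn ip_by_backpointer(&self, dataset: DatasetId) -> Option<Ipv6Addr> {
        let zone_id = *self.dataset_to_zone.get(&dataset)?;
        self.crucible_zones
            .iter()
            .find(|z| z.id == zone_id)
            .map(|z| z.underlay_ip)
    }
}
```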

Ok, that solves how we find the IP address for a region, but what about the port? Well, AFAICT, the port in the dataset table isn't actually used - and it couldn't be, since multiple downstairs share a dataset. The ports already exist in the region table, and we'll continue to use those.
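Putting the two halves together, assembling a region's full address might then look like this (again, purely illustrative shapes, and assuming an IPv6 underlay address for the zone):

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

// Illustrative stand-in for a row in the region table.
struct RegionRow {
    port: u16, // ports already live in the region table
}

/// IP from the crucible zone (found via the target blueprint), port
/// from the region row itself.
fn region_address(zone_ip: Ipv6Addr, region: &RegionRow) -> SocketAddrV6 {
    SocketAddrV6::new(zone_ip, region.port, 0, 0)
}
```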

@smklein
Collaborator

smklein commented Dec 6, 2024

FWIW I like the idea of using the blueprint to access this information -- we'd need to check that the blueprint we're reading is "still the target" in whatever allocation database requests we make, but that feels like a more reasonable source-of-truth than the raw dataset table.

The address/port information in the dataset table is a vestigial artifact; it was needed well before we had blueprints.

@andrewjstone
Contributor Author

andrewjstone commented Dec 6, 2024

> FWIW I like the idea of using the blueprint to access this information -- we'd need to check that the blueprint we're reading is "still the target" in whatever allocation database requests we make, but that feels like a more reasonable source-of-truth than the raw dataset table.

Yes, I think serializability should give us this if we read the current target blueprint table in a transaction.

@smklein
Collaborator

smklein commented Dec 6, 2024

You may be interested in this helper:

/// Creates a transaction iff the current blueprint is "bp_id".
///
/// - The transaction is retryable and named "name"
/// - The "bp_id" value is checked as the first operation within the
/// transaction.
/// - If "bp_id" is still the current target, then "f" is called,
/// within a transactional context.
pub async fn transaction_if_current_blueprint_is<Func, R>(

@andrewjstone
Contributor Author

> You may be interested in this helper:
>
> omicron/nexus/db-queries/src/db/datastore/deployment.rs, lines 112 to 119 in e73a30e

Interesting. Thanks!
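To illustrate the contract described in that helper's doc comment - check the target blueprint as the first operation, and only then run the closure - here is a schematic, self-contained sketch. Everything below is a mock stand-in (no real database, no retries), not the actual omicron implementation:

```rust
type BlueprintId = u64;

#[derive(Debug)]
enum TxnError {
    TargetBlueprintChanged,
}

/// Mock transaction context; in the real datastore this would be a
/// database transaction whose serializability keeps the check valid
/// for the transaction's lifetime.
struct Txn {
    current_target: BlueprintId,
}

/// Run `f` iff `bp_id` is still the current target blueprint.
fn transaction_if_current_blueprint_is<F, R>(
    txn: &Txn,
    bp_id: BlueprintId,
    f: F,
) -> Result<R, TxnError>
where
    F: FnOnce(&Txn) -> R,
{
    // The blueprint check is the first operation in the transaction.
    if txn.current_target != bp_id {
        return Err(TxnError::TargetBlueprintChanged);
    }
    Ok(f(txn))
}

fn main() {
    let txn = Txn { current_target: 7 };
    // Succeeds: our cached view of the target (7) is still current.
    let res = transaction_if_current_blueprint_is(&txn, 7, |_txn| {
        "allocate a region here"
    });
    println!("{res:?}");
}
```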
