Add TXG timestamp database #16853
This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG. With this information, it becomes possible to perform scrubs based on specific date ranges, improving the granularity of data management and recovery operations. Signed-off-by: Mariusz Zaborski <[email protected]>
It crashes on |
This reminds me we recently added |
ret = sscanf(timestr, "%4d-%2d-%2d %2d:%2d", &tm.tm_year, &tm.tm_mon,
    &tm.tm_mday, &tm.tm_hour, &tm.tm_min);
if (ret < 3) {
	fprintf(stderr, gettext("Failed to parse the date.\n"));
	usage(B_FALSE);
}

// Adjust struct
tm.tm_year -= 1900;
tm.tm_mon -= 1;

return (timegm(&tm));
I wonder if strptime() or something else specialized would be better.
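A minimal sketch of what the strptime() route could look like, assuming the same two input shapes the sscanf() call accepts ("YYYY-MM-DD" with an optional " HH:MM") and UTC semantics via timegm(). The helper name parse_utc_date() is hypothetical and not part of the PR.

```c
#define	_GNU_SOURCE	/* exposes strptime() and timegm() on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse "YYYY-MM-DD HH:MM" or "YYYY-MM-DD" as UTC.
 * Returns 0 and stores seconds since the epoch in *out, or -1 on error.
 */
static int
parse_utc_date(const char *timestr, time_t *out)
{
	static const char *const formats[] = { "%Y-%m-%d %H:%M", "%Y-%m-%d" };

	assert(timestr != NULL && out != NULL);

	for (size_t i = 0; i < sizeof (formats) / sizeof (formats[0]); i++) {
		struct tm tm;
		const char *end;

		/* Fields the format does not set (hour, minute) stay 0. */
		memset(&tm, 0, sizeof (tm));
		end = strptime(timestr, formats[i], &tm);

		/* Accept only if the whole string was consumed. */
		if (end != NULL && *end == '\0') {
			*out = timegm(&tm);
			return (0);
		}
	}
	return (-1);
}
```

Unlike the sscanf() version, strptime() fills tm_year and tm_mon in their struct tm conventions already, so the manual "-= 1900" and "-= 1" adjustments go away, and trailing garbage is rejected.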
static const spa_feature_t txg_log_time_deps[] = {
	SPA_FEATURE_EXTENSIBLE_DATASET,
	SPA_FEATURE_NONE
};
zfeature_register(SPA_FEATURE_TXG_TIMELOG,
    "com.klaraystems:txg_log_time", "txg_log_time",
    "Log history of txg.",
    ZFEATURE_FLAG_PER_DATASET | ZFEATURE_FLAG_READONLY_COMPAT,
    ZFEATURE_TYPE_BOOLEAN, txg_log_time_deps, sfeatures);
I wonder why we need this feature at all. I don't see a problem with the pool being imported by an implementation that does not support this feature. Sure, some TXGs won't be recorded, but so what? This functionality seems to be best effort anyway.
And if we do need it for some reason, why is it ZFEATURE_FLAG_PER_DATASET? So far it feels pool-wide.
/* Load time log */
error = spa_load_txg_log_time(spa);
if (error != 0)
	return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO));
I think we could instead delete it and start from scratch. Not a big deal.
@@ -10229,6 +10343,7 @@ spa_sync(spa_t *spa, uint64_t txg)
	}

	spa_sync_rewrite_vdev_config(spa, tx);
	spa_sync_time_logger(spa, tx);
It seems you are dirtying the pool too late in the sync process. It does not need to be that late. You could move it earlier, somewhere around brt_pending_apply().
typedef struct {
	int rrd_head;	/* head (beginning) */
	int rrd_tail;	/* tail (end) */
	size_t rrd_length;
I wonder why length is size_t, while head/tail are int. They all address the same array and are limited by RRD_MAX_ENTRIES. Plus you are writing this data directly to the pool, while int and size_t (and maybe hrtime_t?) can have different sizes on different platforms.
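One way to address the size concern is to use fixed-width types for everything that reaches the disk and to pin the layout with compile-time checks. The following is only a sketch: the type names, the rrdd_txg field, and the value of RRD_MAX_ENTRIES are assumptions, since the PR's full definitions are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define	RRD_MAX_ENTRIES	256	/* assumed value, for illustration only */

/* Hypothetical on-disk entry: identical size on every platform. */
typedef struct {
	uint64_t rrdd_time;	/* seconds since the epoch */
	uint64_t rrdd_txg;	/* assumed field: transaction group number */
} rrd_data_sketch_t;

/* Hypothetical on-disk database: indices share one fixed-width type. */
typedef struct {
	uint64_t rrd_head;	/* head (beginning) */
	uint64_t rrd_tail;	/* tail (end) */
	uint64_t rrd_length;	/* number of valid entries */
	rrd_data_sketch_t rrd_entries[RRD_MAX_ENTRIES];
} rrd_sketch_t;

/* Fail the build if padding or type sizes ever change the layout. */
_Static_assert(sizeof (rrd_data_sketch_t) == 16,
	"entry must be exactly two 64-bit words");
_Static_assert(sizeof (rrd_sketch_t) == 24 + 16 * RRD_MAX_ENTRIES,
	"database must contain no implicit padding");
```

Because every member is a uint64_t, the whole structure can also be treated as an array of 64-bit integers, which makes byte-order handling uniform.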
if (data == NULL || mindiff > rrd_abs(tv - cur->rrdd_time)) {
	data = cur;
	mindiff = rrd_abs(tv - cur->rrdd_time);
}
For scrub we might want strict rounding down for the start time and rounding up for the end time, not the closest entry. It is better to scrub more rather than less.
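The difference between "closest" and directional rounding can be sketched with two lookups over a list of (time, txg) samples. The types and function names below are hypothetical and the scan is linear like the loop in the diff; this is not the PR's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample: one recorded (time, txg) pair. */
typedef struct {
	uint64_t time;
	uint64_t txg;
} txg_time_t;

/* Latest sample with time <= tv (round down), or NULL if none exists. */
static const txg_time_t *
txg_time_floor(const txg_time_t *e, size_t n, uint64_t tv)
{
	const txg_time_t *best = NULL;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (e[i].time <= tv &&
		    (best == NULL || e[i].time > best->time))
			best = &e[i];
	}
	return (best);
}

/* Earliest sample with time >= tv (round up), or NULL if none exists. */
static const txg_time_t *
txg_time_ceil(const txg_time_t *e, size_t n, uint64_t tv)
{
	const txg_time_t *best = NULL;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (e[i].time >= tv &&
		    (best == NULL || e[i].time < best->time))
			best = &e[i];
	}
	return (best);
}
```

A caller would use the floor lookup for the scrub start and the ceil lookup for the scrub end. A NULL result means no sample bounds the requested time on that side, in which case falling back to the first or the current txg keeps the "scrub more rather than less" property.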
VERIFY0(zap_add(spa_meta_objset(spa),
    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TXG_LOG_TIME_MINUTES,
    1, sizeof (spa->spa_txg_log_time.dbr_minutes),
    &spa->spa_txg_log_time.dbr_minutes, tx));
VERIFY0(zap_add(spa_meta_objset(spa),
    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TXG_LOG_TIME_DAYS,
    1, sizeof (spa->spa_txg_log_time.dbr_days),
    &spa->spa_txg_log_time.dbr_days, tx));
VERIFY0(zap_add(spa_meta_objset(spa),
    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TXG_LOG_TIME_MONTHS,
    1, sizeof (spa->spa_txg_log_time.dbr_months),
    &spa->spa_txg_log_time.dbr_months, tx));
You are storing the data as an array of bytes. Where is the byteswap handling? We already have BRT, which is not endian-safe and must be fixed; please don't add more.
Motivation and Context
This feature enables tracking of when TXGs are committed to disk, providing an estimated timestamp for each TXG.
With this information, it becomes possible to perform scrubs based on specific date ranges, improving the granularity of data management and recovery operations.
Description
To achieve this, we implemented a round-robin database that keeps track of time. Tracking is split into three resolutions: minutes, days, and months (the dbr_minutes, dbr_days, and dbr_months databases), which we believe gives the best resolution for time management. This feature does not record the exact time of each transaction group (txg); it provides an estimate. The txg database can also be used in other scenarios where mapping dates to transaction groups is required.
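The round-robin idea can be sketched as a fixed-capacity ring that overwrites its oldest sample once full. This shows a single ring only, with hypothetical names and a tiny capacity chosen for illustration; the PR keeps one such database per resolution and its real layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	RING_CAP	4	/* tiny capacity, for illustration only */

typedef struct {
	uint64_t time;	/* when the txg was synced */
	uint64_t txg;	/* transaction group number */
} ring_entry_t;

typedef struct {
	uint64_t head;		/* index of the oldest entry */
	uint64_t length;	/* number of valid entries */
	ring_entry_t entries[RING_CAP];
} ring_t;

/* Append a sample; once the ring is full the oldest one is overwritten. */
static void
ring_add(ring_t *r, uint64_t time, uint64_t txg)
{
	uint64_t slot;

	assert(r != NULL);
	slot = (r->head + r->length) % RING_CAP;
	r->entries[slot].time = time;
	r->entries[slot].txg = txg;
	if (r->length < RING_CAP)
		r->length++;
	else
		r->head = (r->head + 1) % RING_CAP;	/* oldest dropped */
}

/* Oldest sample still retained, or NULL if the ring is empty. */
static const ring_entry_t *
ring_oldest(const ring_t *r)
{
	assert(r != NULL);
	return (r->length == 0 ? NULL : &r->entries[r->head]);
}
```

Storage stays constant no matter how long the pool lives, at the cost of losing precision for older history, which is why the result is an estimate rather than an exact per-txg timestamp.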