CRC removal during diskless full sync with TLS enabled. #1479
base: unstable
Signed-off-by: Tal Shachar <[email protected]>
@talxsha before I look into this, let's put some details in the top comment. Just linking the issue is not what we usually do.
@@ -1244,11 +1244,12 @@ void syncCommand(client *c) {
 * the primary can accurately lists replicas and their listening ports in the
 * INFO output.
 *
- * - capa <eof|psync2|dual-channel>
+ * - capa <eof|psync2|dual-channel|disable_sync_crc>
I am not super fond of the disable_sync_crc capability name. Maybe a better name would be bypass_crc?
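For context on what the name affects: the replica announces capabilities as option/value pairs (REPLCONF capa eof capa psync2 ...), and the primary ORs each recognized name into a bitmask, ignoring names it does not know. That last property is what makes a new capability backward compatible. A minimal sketch, assuming hypothetical CAPA_* flag names and the bypass_crc spelling proposed above; it is not the actual replconfCommand code.

```c
#include <assert.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical capability bits; the real flag names live in the server headers. */
enum {
    CAPA_NONE = 0,
    CAPA_EOF = 1 << 0,
    CAPA_PSYNC2 = 1 << 1,
    CAPA_DUAL_CHANNEL = 1 << 2,
    CAPA_BYPASS_CRC = 1 << 3, /* the new capability, using the proposed name */
};

/* Map one capability name to its bit. Unknown names map to CAPA_NONE and are
 * silently ignored, so a replica can announce the new capability to an older
 * primary without breaking the handshake. */
static int capa_from_name(const char *name) {
    if (strcasecmp(name, "eof") == 0) return CAPA_EOF;
    if (strcasecmp(name, "psync2") == 0) return CAPA_PSYNC2;
    if (strcasecmp(name, "dual-channel") == 0) return CAPA_DUAL_CHANNEL;
    if (strcasecmp(name, "bypass_crc") == 0) return CAPA_BYPASS_CRC;
    return CAPA_NONE;
}

/* Walk the option/value pairs that follow REPLCONF, e.g.
 * {"listening-port", "6380", "capa", "eof", "capa", "bypass_crc"},
 * and collect every recognized "capa" value. */
static int parse_capas(int argc, const char **argv) {
    int capa = CAPA_NONE;
    for (int i = 0; i + 1 < argc; i += 2) {
        if (strcasecmp(argv[i], "capa") == 0) capa |= capa_from_name(argv[i + 1]);
    }
    return capa;
}
```

Only the string sent on the wire matters for compatibility, so the naming choice can be made purely on readability.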
@@ -3156,6 +3156,7 @@ static int applyClientMaxMemoryUsage(const char **err) {
 standardConfig static_configs[] = {
 /* Bool configs */
 createBoolConfig("rdbchecksum", NULL, IMMUTABLE_CONFIG, server.rdb_checksum, 1, NULL, NULL),
+createBoolConfig("disable-sync-crc", NULL, MODIFIABLE_CONFIG, server.disable_sync_crc, 0, NULL, NULL),
We normally do not like to introduce new configurations. In this case the feature is controlled via a capability, so there are no compatibility issues. Is there a case where the config would still be required?
Yes, please don't introduce a config for this.
@@ -2218,6 +2218,7 @@ void initServerConfig(void) {
 server.fsynced_reploff_pending = 0;
 server.rdb_client_id = -1;
 server.loading_process_events_interval_ms = LOADING_PROCESS_EVENTS_INTERVAL_DEFAULT;
+server.repl_meet_disable_crc_cond = 0;
I don't follow why we need to keep this flag on the server. Aren't all the checks in readSyncBulkPayload already valid at the point where we decide to flag the RDB?
@@ -1838,6 +1842,7 @@ struct valkeyServer {
 double stat_fork_rate; /* Fork rate in GB/sec. */
 long long stat_total_forks; /* Total count of fork. */
 long long stat_rejected_conn; /* Clients rejected because of maxclients */
+size_t stat_total_crc_disabled_syncs_stated; /* Total number of full syncs stated with CRC checksum disabled */ // AMZN
Please remove the AMZN tag; we are not in Kansas anymore. Also, the stat name is odd. Maybe we can use another name, e.g. sync_bypass_crc?
Also note that in general I don't see any reason to have this statistic unless it is used for writing tests, right? Such stats might be better placed under the debug section of INFO, but we already have so many stats that I would let it pass.
@@ -3601,6 +3612,12 @@ int rdbSaveToReplicasSockets(int req, rdbSaveInfo *rsi) {
 }
 serverSetCpuAffinity(server.bgsave_cpulist);

+if (disable_sync_crc_capa == 1) {
Suggested change:
-if (disable_sync_crc_capa == 1) {
+if (disable_sync_crc_capa) {
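The hunk does not show the body of this branch. One plausible shape, sketched under assumptions: the forked child skips the per-write checksum update and writes a zero trailer, reusing the existing convention visible in the loader hunk of rdbLoadRioWithLoadingCtx, where cksum == 0 means "saved with checksum disabled, no check performed". The sync_writer type and the rotate-and-xor toy_checksum are hypothetical stand-ins for the rio and crc64() used by the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum (rotate-and-xor) so the example is self-contained.
 * It is NOT Valkey's crc64(); only the control flow matters here. */
static uint64_t toy_checksum(uint64_t acc, const unsigned char *p, size_t len) {
    for (size_t i = 0; i < len; i++) acc = ((acc << 1) | (acc >> 63)) ^ p[i];
    return acc;
}

/* Hypothetical writer standing in for the rio used by the forked child. */
typedef struct {
    uint64_t cksum; /* running checksum of the payload */
    int skip_crc;   /* set when the replica announced the capability */
    size_t written; /* bytes written; stands in for the pipe/socket write */
} sync_writer;

static void writer_init(sync_writer *w, int replica_capa_skip_crc) {
    w->cksum = 0;
    w->skip_crc = replica_capa_skip_crc ? 1 : 0;
    w->written = 0;
}

static void writer_write(sync_writer *w, const void *buf, size_t len) {
    /* Pay for the checksum only when the replica will verify it. */
    if (!w->skip_crc) w->cksum = toy_checksum(w->cksum, buf, len);
    w->written += len;
}

/* Value for the 8-byte RDB trailer. A zero trailer is what the loader
 * already treats as "saved without a checksum" (the cksum == 0 branch). */
static uint64_t writer_trailer(const sync_writer *w) {
    return w->skip_crc ? 0 : w->cksum;
}
```

Both writers emit the same payload bytes; only the checksum work and the trailer differ, which is where the CPU saving would come from.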
@@ -3354,7 +3355,7 @@ int rdbLoadRioWithLoadingCtx(rio *rdb, int rdbflags, rdbSaveInfo *rsi, rdbLoadin
 if (rioRead(rdb, &cksum, 8) == 0) goto eoferr;
 if (server.rdb_checksum && !server.skip_checksum_validation) {
 memrev64ifbe(&cksum);
-if (cksum == 0) {
+if (cksum == 0 || (rdb->flags & RIO_FLAG_DISABLE_CRC) != 0) {
Suggested change:
-if (cksum == 0 || (rdb->flags & RIO_FLAG_DISABLE_CRC) != 0) {
+if (cksum == 0 || (rdb->flags & RIO_FLAG_DISABLE_CRC)) {
Also, please add a justification for why we should do this only when TLS is enabled. Given that the network has built-in checksumming, I'm still not convinced about the tradeoff we are making, given that steady-state replication is not checksummed.
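To make the tradeoff concrete, the loader's trailer check after this change reduces to a small decision table. A sketch that models the condition in the rdbLoadRioWithLoadingCtx hunk, assuming a hypothetical bit value for RIO_FLAG_DISABLE_CRC and a trailer already converted to host byte order (the memrev64ifbe step):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit value; the PR defines the real RIO_FLAG_DISABLE_CRC. */
#define RIO_FLAG_DISABLE_CRC (1 << 5)

typedef enum { CKSUM_VERIFIED, CKSUM_SKIPPED, CKSUM_MISMATCH } cksum_result;

/* Model of the trailer check: `stored` is the 8-byte trailer in host order,
 * `computed` is the checksum the loader accumulated while reading. */
static cksum_result check_trailer(int rdb_checksum, int skip_validation,
                                  uint64_t stored, uint64_t computed, int rio_flags) {
    /* Outer guard, unchanged by the PR. */
    if (!rdb_checksum || skip_validation) return CKSUM_SKIPPED;
    /* Existing zero-trailer case plus the new per-sync flag. */
    if (stored == 0 || (rio_flags & RIO_FLAG_DISABLE_CRC)) return CKSUM_SKIPPED;
    return stored == computed ? CKSUM_VERIFIED : CKSUM_MISMATCH;
}
```

The last case in the assertions is the crux of the question above: once the flag is set, even a non-zero, mismatching trailer is not compared, so payload integrity rests entirely on the transport.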
@madolson should I tag it as major-decision? I think it's worth a discussion.
// Set a flag to determine later whether or not the replica will skip CRC calculations for this sync.
// Disable CRC on the replica if: (1) TLS is enabled; (2) the replica's disable_sync_crc is enabled; (3) diskless sync is enabled on both replica and primary.
// Otherwise, CRC should be enabled/disabled as per server.rdb_checksum.
if (connIsTLS(conn) && server.disable_sync_crc && use_diskless_load && usemark)
a. I think we should encapsulate this whole condition in a function to make the code more readable.
b. The connIsTLS part of the decision is somewhat too intrusive, IMO. Maybe we can add an API to the connection abstraction, like connIntegrityChecked or something similar. There may be non-TLS connections (e.g. QUIC) that provide some integrity mechanism but are not defined as "TLS".
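Both suggestions can be sketched together. Everything below is hypothetical: connIntegrityChecked is only proposed in this thread, so the vtable hook, the conn_type/connection structs, and the function names are illustrative. feature_enabled stands for whatever replaces the disable-sync-crc config (reviewers asked not to add one), and usemark mirrors the variable in the hunk (the primary sent an EOF-marked diskless payload). Computing the result at the point of use also removes the need to keep the decision on the server struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport vtable with the suggested integrity hook. */
typedef struct conn_type {
    const char *name;
    bool (*integrity_checked)(void); /* NULL means "no guarantee" */
} conn_type;

typedef struct {
    const conn_type *type;
} connection;

/* TLS records are MAC/AEAD protected, so tampering or corruption is detected. */
static bool tls_integrity_checked(void) { return true; }

static const conn_type CT_SOCKET = {"tcp", NULL}; /* only TCP's 16-bit checksum */
static const conn_type CT_TLS = {"tls", tls_integrity_checked};

/* Hypothetical stand-in for the proposed connIntegrityChecked(). */
static bool conn_integrity_checked(const connection *conn) {
    return conn->type->integrity_checked != NULL && conn->type->integrity_checked();
}

/* The whole decision in one place, so readSyncBulkPayload could set the rio
 * flag directly from its result instead of storing state on the server. */
static bool replica_can_skip_sync_crc(const connection *conn, bool feature_enabled,
                                      bool use_diskless_load, bool usemark) {
    return conn_integrity_checked(conn) && feature_enabled && use_diskless_load && usemark;
}
```

A new transport such as QUIC would then opt in by providing its own hook, without touching the replication code.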
For now it's not. It's just an internal one. I would probably just ping PingXie directly, and the core team if anyone else is interested.
Implemented a mechanism to eliminate CRC64 checksumming during full sync when not writing to disk (with TLS enabled), as it adds overhead with minimal benefit. TLS already provides strong data integrity checks.
Replica can skip CRC calculations when these conditions are met:
Primary can skip CRC calculations when these conditions are met:
Closes #1129