
Mark the node as FAIL when the node is marked as NOADDR and broadcast the FAIL #1191

Open
wants to merge 7 commits into unstable

Conversation

enjoy-binbin
Member

Imagine we have a cluster, for example a three-shard cluster.
If shard 1 does a CLUSTER RESET HARD, it will change its node
name, and the other nodes will then mark it as NOADDR, since the
node name received in the PONG has changed.

In the eyes of the other nodes, there is one working primary
left but with no address. In this case, the address reported in
MOVED will be invalid and will confuse clients. At the same
time, the replica will not fail over, since its primary is not
in the FAIL state. And the cluster looks OK to everyone.

This leaves a cluster that appears OK but has no coverage for
shard 1. Obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.

But the point here is that we can mark the NOADDR node as FAIL
to advance the cluster state. If a node is NOADDR, it does not
have a valid address, so we won't reconnect to it, we won't
send it PING, and we won't gossip it; it seems reasonable to
mark it as FAIL.
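
Below is a minimal sketch of the intended behavior. The flags and
helpers (CLUSTER_NODE_FAIL, clusterSendFail(), clusterDoBeforeSleep())
are the real ones from cluster_legacy.c, but the trigger site is
simplified and is not the literal patch:

```c
/* Sketch only, not the literal patch: when a node has just been
 * flagged NOADDR and is not already failing, flag it FAIL as well
 * and broadcast the failure, so the whole cluster converges and the
 * shard's replica can start a failover. */
if ((node->flags & CLUSTER_NODE_NOADDR) && !(node->flags & CLUSTER_NODE_FAIL)) {
    node->flags &= ~CLUSTER_NODE_PFAIL;
    node->flags |= CLUSTER_NODE_FAIL;
    node->fail_time = mstime();
    clusterSendFail(node->name); /* broadcast FAIL to all reachable nodes */
    clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE | CLUSTER_TODO_SAVE_CONFIG);
}
```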


Signed-off-by: Binbin <[email protected]>

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.82%. Comparing base (d00c856) to head (ba0e235).
Report is 3 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cluster_legacy.c | 75.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1191      +/-   ##
============================================
- Coverage     70.86%   70.82%   -0.04%     
============================================
  Files           119      119              
  Lines         64852    64858       +6     
============================================
- Hits          45958    45938      -20     
- Misses        18894    18920      +26     
| Files with missing lines | Coverage Δ |
|---|---|
| src/cluster_legacy.c | 86.90% <75.00%> (+0.05%) ⬆️ |

... and 14 files with indirect coverage changes

Contributor

@zuiderkwast zuiderkwast left a comment


It seems to be correct, but I want someone else to take a look. @PingXie?

So when a node is in NOADDR state, it will never be PFAIL and later FAIL? We never get any updates from the old node ID so it should automatically become PFAIL at some point?

Are there any other cases where a node can be marked as NOADDR and come back again? A changed IP address of the server, but still running?

@enjoy-binbin
Member Author

So when a node is in NOADDR state, it will never be PFAIL and later FAIL? We never get any updates from the old node ID so it should automatically become PFAIL at some point?

Yes, it will never be PFAIL or FAIL. It also won't automatically become PFAIL, since in clusterCron we skip the NOADDR node in the timeout check (see the sketch below).
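
For reference, this is the skip, condensed from clusterCron() in cluster_legacy.c:

```c
/* Condensed from clusterCron(): nodes that are myself, NOADDR, or
 * still in handshake are skipped entirely, so a NOADDR node is never
 * pinged and can never time out into PFAIL. */
while ((de = dictNext(di)) != NULL) {
    clusterNode *node = dictGetVal(de);
    if (node->flags & (CLUSTER_NODE_MYSELF | CLUSTER_NODE_NOADDR | CLUSTER_NODE_HANDSHAKE))
        continue;
    /* ... ping and timeout checks that can set CLUSTER_NODE_PFAIL ... */
}
```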

Are there any other cases where a node can be marked as NOADDR and come back again? A changed IP address of the server, but still running?

Maybe, but I am not aware of one. If the IP changes, we call nodeUpdateAddressIfNeeded to update it, so the node won't enter the NOADDR state.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Oct 19, 2024
@zuiderkwast
Contributor

Just an idea: can we set it to PFAIL? It is unreachable from myself's point of view, but maybe another node can reach it somehow? Marking a node as FAIL is usually a majority decision. We can wait for a majority of nodes to mark it as FAIL, but that takes more time. Is that a problem?

@enjoy-binbin
Member Author

We won't include the NOADDR node in the gossip section. That is the problem: we will never get the majority.
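
Lightly paraphrased from markNodeAsFailingIfNeeded() in cluster_legacy.c; failure reports only arrive via gossip, so a node that is never gossiped can never reach the quorum:

```c
/* Lightly paraphrased from markNodeAsFailingIfNeeded(): the FAIL
 * decision needs a majority of primaries to have reported the node
 * as failing, and those reports only arrive via gossip. */
int needed_quorum = (server.cluster->size / 2) + 1;
int failures = clusterNodeFailureReportsCount(node);
failures++; /* my own PFAIL view also counts (when I am a voting primary) */
if (failures < needed_quorum) return; /* no majority, so never FAIL */
```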

@zuiderkwast
Contributor

We won't include the NOADDR node in the gossip section. That is the problem: we will never get the majority.

I guess another option is to start including it in the gossip section then.

@PingXie
Member

PingXie commented Oct 22, 2024

Imagine we have a cluster, for example a three-shard cluster.
If shard 1 does a CLUSTER RESET HARD, it will change its node
name, and the other nodes will then mark it as NOADDR, since the
node name received in the PONG has changed.

This sounds like a human error to me to begin with. If proper steps, such as failing over the primary, CLUSTER FORGET, etc., were followed, we wouldn't enter this state, IMO. Did this actually happen due to runtime errors?

Obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.

Generally speaking, I am more in favor of operational mitigations for non-runtime errors like this. That said, the proposed fix makes sense to me too.

@zuiderkwast
Contributor

zuiderkwast commented Oct 22, 2024

Obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.

Generally speaking, I am more in favor of operational mitigations for non-runtime errors like this. That said, the proposed fix makes sense to me too.

Good point. Then the problem is that the admin API is not safe. In a good API, it should not be possible to mess up the runtime state like this.

When CLUSTER RESET HARD is used, can we handle it as if CLUSTER FORGET of the old node id was called? Add the old node id to forgotten nodes blacklist, etc.
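
A hypothetical sketch of that idea; clusterBlacklistAddNode() and clusterDelNode() are the helpers CLUSTER FORGET already uses, while the rename-detection predicate is invented for illustration:

```c
/* Hypothetical sketch: when a PONG from a known node carries a new
 * node ID (the peer did CLUSTER RESET HARD), treat the old ID as
 * forgotten instead of keeping it around as NOADDR.
 * senderChangedNodeId() is a made-up predicate for illustration;
 * the blacklist is already propagated to other nodes via gossip. */
if (senderChangedNodeId(node, hdr)) {
    clusterBlacklistAddNode(node); /* gossip the old ID as forgotten */
    clusterDelNode(node);          /* drop it from the local view */
}
```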

@enjoy-binbin
Member Author

When CLUSTER RESET HARD is used, can we handle it as if CLUSTER FORGET of the old node id was called? Add the old node id to forgotten nodes blacklist, etc.

I have also thought about this, for example gossiping the CLUSTER RESET, but finally gave up; I'm not sure whether there are other issues.

@zuiderkwast
Contributor

Gossiping the CLUSTER RESET seems like a new message, which means more work. We already gossip the forgotten nodes, so if we reuse that, it's not too difficult, I think. Also not sure whether there are other issues, though.

@enjoy-binbin enjoy-binbin changed the title Mark the node as FAIL when the node is marked as NOADDR Mark the node as FAIL when the node is marked as NOADDR and broadcast the FAIL Oct 22, 2024
@enjoy-binbin
Member Author

If we don't want to set the NOADDR node to FAIL, another way is to add a check for NOADDR nodes in clusterCron / CLUSTER INFO so that the cluster can report FAIL, and add a check for a NOADDR primary so that a replica has a way to trigger a failover. A rough sketch follows below.
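
A rough sketch of that alternative, with the condition invented for illustration (numslots is the real field on clusterNode):

```c
/* Hypothetical alternative: instead of flagging the node FAIL, have
 * clusterCron() notice a slot-owning primary with no address and
 * surface it, e.g. by degrading the cluster state reported by
 * CLUSTER INFO and letting the node's replicas react. */
if ((node->flags & CLUSTER_NODE_NOADDR) && node->numslots > 0) {
    /* report missing coverage, and/or allow the replica to treat
     * its primary as failed */
}
```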

@zuiderkwast
Contributor

If we don't want to set the NOADDR node to FAIL, another way is to add a check for NOADDR nodes in clusterCron / CLUSTER INFO so that the cluster can report FAIL, and add a check for a NOADDR primary so that a replica has a way to trigger a failover.

This sounds like a workaround. I think we can set it to FAIL.

But I agree with Ping, we should probably broadcast the FAIL, because FAIL is normally a cluster majority agreed state. The cluster should have a consistent view of which nodes are FAIL and which are not.

@zuiderkwast zuiderkwast added bug Something isn't working cluster release-notes This issue should get a line item in the release notes labels Oct 25, 2024
@enjoy-binbin enjoy-binbin requested a review from hpatro December 23, 2024 04:23