Validator disconnect on step regression #323
Comments
@mdyring I'd suggest opening a corresponding issue on https://github.com/tendermint/tendermint. I do not believe the KMS is closing the connection here; my best guess is that Tendermint closes the connection after receiving a double-signing error response. It'd be good to confirm both, though.
@tarcieri done, thanks. From the Tendermint log, it looks like Tendermint sees the connection close. I have added the log output to tendermint/tendermint#3844.
Here's the relevant KMS codepath. It returns an error response to Tendermint, but otherwise considers the request successfully handled and therefore it shouldn't abort the connection. You can also turn on debug logging on the KMS by passing the |
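To illustrate the behaviour described above, here is a minimal sketch of a request loop that turns a refused signature into an error response and keeps serving the connection. This is not the actual KMS code; `DoubleSignError`, `Response`, `handle_requests`, and `demo_sign` are hypothetical names made up for the example.

```python
from dataclasses import dataclass


class DoubleSignError(Exception):
    """Raised by the (hypothetical) signer when a request must be refused."""


@dataclass
class Response:
    ok: bool
    detail: str


def handle_requests(requests, sign):
    """Answer every request; a refused signature becomes an error response."""
    responses = []
    for request in requests:
        try:
            responses.append(Response(True, sign(request)))
        except DoubleSignError as err:
            # Report the refusal to the peer, but do not abort the loop:
            # the request counts as handled and the connection stays open.
            responses.append(Response(False, str(err)))
    return responses


def demo_sign(request):
    # Hypothetical signer: refuses anything labelled "regression".
    if request == "regression":
        raise DoubleSignError("step regression")
    return "signature-for-" + request


if __name__ == "__main__":
    for response in handle_requests(["a", "regression", "b"], demo_sign):
        print(response)
```

The point of the sketch is only that the request after the refused one still gets answered. If the peer closes the connection on receiving the error response, that happens on the other side of this loop.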
I looked at the attached log output, but I don't see where Tendermint shows the connection being closed remotely by the KMS.
Sorry, I should have been clearer: the first line has an "EOF" at the end, indicating the connection was closed. That suggests the closure was unexpected for Tendermint, but it could also be that Tendermint closed the connection earlier, then failed on a later read and logged that.
Isn't this only for the |
@ebuchman good catch, that appears to be what's happening. Perhaps the KMS should send back an error to Tendermint on |
In the log below, 5 validators are connected to a single tmkms process on the same chain-id. It appears that during a Proposal, some validators (.240 and .108 below) get disconnected due to a step regression.
Is this expected? It does not seem to happen when validators attempt to double sign with different block IDs, so I would think not.
Disconnecting seems undesirable when trying to do HA. ;-)
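For readers unfamiliar with the term, here is a rough sketch of what a step regression check could look like, assuming the signer remembers the last signed (height, round, step) and treats any request that sorts before it as a regression. The `ConsensusState` type, the `is_regression` helper, and the step numbering are assumptions for illustration, not taken from the KMS source.

```python
from typing import NamedTuple


class ConsensusState(NamedTuple):
    height: int
    round: int
    step: int  # assumed ordering: 1 = proposal, 2 = prevote, 3 = precommit


def is_regression(last, new):
    """True when `new` sorts strictly before `last` by (height, round, step)."""
    # NamedTuple instances compare like plain tuples, field by field.
    return new < last


if __name__ == "__main__":
    last_signed = ConsensusState(height=10, round=0, step=2)
    # Same height and round, but an earlier step than the one already signed.
    print(is_regression(last_signed, ConsensusState(10, 0, 1)))
    # A later round at the same height moves forward, so it is not a regression.
    print(is_regression(last_signed, ConsensusState(10, 1, 1)))
```

With several validators sharing one signer state, a request from a validator that is slightly behind the others would fall into the first case under these assumptions.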