[NEW] Rethink and Redesign the MEET Protocol to Address Increasing Brittleness and Complexity #1471
Comments
I'm not sure what you're suggesting. If we move to a single source of truth, then we would have a different meet system and would get consensus through the other library we pick. The current meet approach just didn't handle partial timeouts, which is part of the price we pay for building a bespoke consensus algorithm.
Are you talking about cluster v2? If so, I don't think we should tie the improvement of the existing cluster to the timeline of the v2 project. I think that we at least need to document the current meet behavior and then see if we can reason about future patches more easily. "Redesign" does sound like the wrong word, though. Let me update the title to clarify the issue better.
Yeah.
I'm always here for better documentation. The more I think about cluster v2, the more I think we should really just take a fresh pass at the current cluster code, clean it up, and try to simplify it. There are many undocumented assumptions and confusing code paths, and the algorithms that were used are brittle in failure modes. Some fuzz/chaos testing might also be a good idea.
What I want to add to MEET is an authenticated meet / authenticated handshake. We often suffer from cluster merging problems, so internally we added meet auth and handshake auth to control CLUSTER MEET and clusterAddNode (the gossip one). Our current code relies too much on CLUSTER MEET / clusterAddNode. If the administrator enters the wrong IP/port, or there are dead nodes in the cluster, the cluster will be thrown into chaos when the IP/port is reused.
At AWS we talked about having a config to isolate a node, so that it would just silently drop all meet and pong packets and wouldn't try to communicate with a cluster. Having a unique cluster secret that is shared between all the nodes in a cluster also makes a lot of sense. I think we could add that with a ping extension?
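To make the two ideas above concrete, here is a minimal sketch of how a node might filter incoming cluster bus packets. It is not Valkey code and not a proposed wire format: `Packet`, `Node`, the `cluster_secret` field, and the `isolated` flag are all hypothetical names for illustration. It also assumes the secret travels as-is in an extension field, whereas a real design would more likely carry a MAC derived from it.

```python
import hmac
from dataclasses import dataclass
from enum import Enum


class PacketType(Enum):
    PING = "ping"
    PONG = "pong"
    MEET = "meet"


@dataclass
class Packet:
    type: PacketType
    sender_id: str
    # Hypothetical extension field carrying the sender's cluster secret.
    cluster_secret: bytes = b""


@dataclass
class Node:
    cluster_secret: bytes
    # Hypothetical config: when set, the node silently drops MEET and PONG.
    isolated: bool = False

    def should_accept(self, pkt: Packet) -> bool:
        """Return True if the packet should be processed, False to drop it."""
        if self.isolated and pkt.type in (PacketType.MEET, PacketType.PONG):
            return False
        # Constant-time comparison so a mismatch does not leak timing info.
        return hmac.compare_digest(pkt.cluster_secret, self.cluster_secret)
```

With a check like this, a MEET sent to a reused IP/port that now belongs to a different cluster would be dropped instead of merging the two clusters, because the secrets would not match.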
@madolson / @PingXie / @enjoy-binbin
Originally posted by @hpatro in #1436 (review)