This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
[BUG] Index management can hang cluster on master failure if history is enabled #447
Labels
bug
Describe the bug
When the active master terminates while index state management (ISM) history is enabled, the cluster can hang. Once it hangs, all cluster tasks are blocked behind an election-to-master task that never completes. I believe https://discuss.opendistrocommunity.dev/t/killed-active-master-not-being-removed-from-the-cluster-state/5011 describes the same issue.
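One way to confirm that tasks are piling up is to poll the cluster's pending-task queue and look for entries that have been waiting a long time. The sketch below is illustrative only: it assumes the standard `GET /_cluster/pending_tasks` endpoint is reachable, the helper names and the 60-second threshold are hypothetical, and with the security plugin installed the request would also need HTTPS and credentials. When the cluster is fully hung, the call itself may time out.

```python
import json
import urllib.request


def fetch_pending_tasks(base_url):
    """Fetch GET /_cluster/pending_tasks and return the parsed "tasks" list.

    base_url is an assumption, e.g. "http://localhost:9200".
    """
    url = f"{base_url}/_cluster/pending_tasks"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("tasks", [])


def stuck_tasks(tasks, min_queue_millis=60_000):
    """Return tasks queued for at least min_queue_millis, longest wait first."""
    stuck = [t for t in tasks if t.get("time_in_queue_millis", 0) >= min_queue_millis]
    return sorted(stuck, key=lambda t: t["time_in_queue_millis"], reverse=True)
```

A long-waiting entry at the head of this list, with every other task queued behind it, would match the symptom described here.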
Other plugins installed
Using the standard ODFE install including security plugin
To Reproduce
Difficult to reproduce. We've had ISM enabled for a while, and this did not happen until recently. I'm not sure of the exact conditions that trigger it, but it seems to happen when network traffic is high.
Expected behavior
When a master node dies, it is cleanly replaced.
Additional context
When index state management history is enabled and the master node dies, the new master seems to hang while processing a task, and the cluster is left in a state where no further cluster operations can occur. The result is a practically useless cluster that cannot perform any operation requiring a cluster update. On the master node, the following stack traces appear:
If I disable index state management history and delete the history write alias, the problem goes away and masters are re-elected normally.
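For anyone who wants to try the same workaround, the two steps can be sketched as REST requests. The setting name `opendistro.index_state_management.history.enabled` and the alias name `.opendistro-ism-managed-index-history-write` are assumptions based on the plugin's usual defaults, not values confirmed in this report, so check them against your own cluster first. The helper functions are hypothetical and only build the requests; they do not send anything.

```python
import json

# Assumed names; verify against your cluster before using them.
HISTORY_SETTING = "opendistro.index_state_management.history.enabled"
HISTORY_WRITE_ALIAS = ".opendistro-ism-managed-index-history-write"


def disable_history_request():
    """Build a cluster-settings update that turns ISM history off persistently."""
    body = {"persistent": {HISTORY_SETTING: False}}
    return "PUT", "/_cluster/settings", json.dumps(body)


def remove_alias_request(alias=HISTORY_WRITE_ALIAS):
    """Build an _aliases request that removes the alias from every index carrying it."""
    body = {"actions": [{"remove": {"index": "*", "alias": alias}}]}
    return "POST", "/_aliases", json.dumps(body)


if __name__ == "__main__":
    for method, path, payload in (disable_history_request(), remove_alias_request()):
        print(method, path, payload)
```

Both requests are cluster state updates, so they only go through while a master is functioning. They would need to be applied before a master failure, not after the cluster has already hung.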