docs: High availability explanation page #940
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# High Availability | ||
|
||
High availability (HA) is a core feature of {{ product }}, ensuring that | ||
a Kubernetes cluster remains operational and resilient, even when nodes or | ||
critical components encounter failures. This capability is crucial for | ||
maintaining continuous service for applications and workloads running in | ||
production environments. | ||
|
||
HA is automatically enabled in {{ product }} for clusters with three or | ||
more nodes independent of the deployment method. By distributing key components | ||
across multiple nodes, HA reduces the risk of downtime and service | ||
interruptions, offering built-in redundancy and fault tolerance. | ||
|
||
## Key Components of a Highly Available Cluster | ||
Review comment: Nit: Key components of a highly available cluster
||
|
||
A highly available Kubernetes cluster exhibits the following characteristics: | ||
|
||
### 1. **Multiple Nodes for Redundancy** | ||
Review comment: Nit: Multiple nodes for redundancy
||
|
||
Having multiple nodes in the cluster ensures workload distribution and | ||
redundancy. If one node fails, workloads can be rescheduled on other available | ||
nodes without disrupting services. This node-level redundancy minimizes the | ||
impact of hardware or network failures. | ||
Review comment on lines +21 to +22: Can you add a sentence or two explaining whether rescheduling needs manual intervention or happens automatically?
|
||
### 2. **Control Plane Redundancy** | ||
Review comment: Nit: Control plane redundancy
||
|
||
The control plane manages the cluster’s state and operations. For high | ||
availability, the control plane components—such as the API server, scheduler, | ||
and controller-manager—are distributed across multiple nodes. This prevents a | ||
single point of failure from rendering the cluster inoperable. | ||
|
||
### 3. **Highly Available Datastore** | ||
Review comment: Nit: Highly available datastore
||
|
||
By default, {{ product }} uses **dqlite** to manage the Kubernetes | ||
cluster state. Dqlite leverages the Raft consensus algorithm for leader | ||
election and voting, ensuring reliable data replication and failover | ||
capabilities. When a leader node fails, a new leader is elected seamlessly | ||
without administrative intervention. This mechanism allows the cluster to | ||
remain operational even in the event of node failures. More details on | ||
replication and leader elections can be found in | ||
the [dqlite replication documentation][dqlite-replication]. | ||
|
||
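The three-node threshold and the Raft-based failover described above both come down to majority arithmetic. The following is a minimal sketch of that arithmetic, not dqlite's implementation. It assumes every node counted is a Raft voter, whereas dqlite may assign other roles to some nodes. The function names `quorum` and `tolerated_failures` are hypothetical and exist only for this illustration.

```python
def quorum(voters: int) -> int:
    """Return the smallest majority of `voters` Raft voting members."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Return how many voters can fail while a majority is still reachable."""
    return voters - quorum(voters)


# Assumption: all nodes are voters; real clusters may mix voting and non-voting roles.
for n in (1, 2, 3, 5):
    print(f"{n} voters: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

Under these assumptions, one or two voters tolerate no failures, while three voters tolerate one. That is consistent with a cluster needing at least three nodes before it can survive the loss of a node.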
<!-- LINKS --> | ||
[dqlite-replication]: https://dqlite.io/docs/explanation/replication |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ channels | |
clustering | ||
ingress | ||
epa | ||
high-availability | ||
security | ||
cis | ||
``` | ||
|
Review comment: Nit: High availability