This feature would allow more than one replica of Gatus with the exact same configuration to coexist by leveraging leader election through the new `postgres` `storage.type`.
Programmatically, this is how I envision it to work:
1. First instance of Gatus, henceforth G1, starts.
2. G1 tries to acquire the lock by querying the new `instance` table in the Postgres database.
3. Because the row specifying whether an instance has claimed the role of leader does not exist yet, G1 creates a row with the column `label` set to `default`, the `role` set to `LEADER` and the `last_heartbeat` set to `CURRENT_TIMESTAMP`.
4. G1 is now the leader, therefore it begins monitoring the configured services.
5. Every minute, G1 updates the `last_heartbeat` timestamp in the Postgres database.
6. Second instance of Gatus, henceforth G2, starts.
7. G2 tries to acquire the writer lock by querying the `instance` table in the Postgres database for the label `default` and the role `LEADER`.
8. G2 fails to acquire the lock, because another instance has already acquired it and the `last_heartbeat` timestamp is within the past 5 minutes. These 5 minutes shall be defined as the time until reelection.
9. G2 retries to acquire the writer lock every 2 minutes.
10. Now, let's assume that G1 runs into an issue and crashes.
11. G1 restarts and tries to acquire the lock, but as documented by step 8, it fails.
12. 5 minutes go by and the time for reelection has come, after which either G1 or G2 will grab the lock.
During this entire time, both G1 and G2 can read from the database, and therefore handle HTTP requests. The only restriction is that no more than one leader for one label can write at any given time.
```yaml
distributed:
  mode: HA
  label: default
```
The parameter `distributed.label` is optional, and will default to the value `default`.
Why do we need a label?
This will be needed for #64 -- basically, let's say you wanted to deploy Gatus in 3 isolated environments which all have access to the Postgres database; let's call them `alpha`, `bravo` and `charlie`. Of course, each environment has its own set of services to monitor.
You'd use the label to differentiate these environments and allow one leader per environment to push its data to the database, all while keeping each separate environment highly available.
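As a sketch of how that could look in one of those environments (hypothetical values, reusing the `distributed` section proposed above):

```yaml
# Configuration for the "alpha" environment; bravo and charlie
# would be identical except for the label and their services.
distributed:
  mode: HA
  label: alpha
storage:
  type: postgres
```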
Requirements:
`storage.type` must be set to `postgres`
Could you make HA available without the use of a database?
If we know the endpoints (IPs) of all Gatus instances in advance, we could simply list them in the configuration and they could elect a leader by talking to each other. One well-known algorithm for this is Raft: https://raft.github.io/
I think an easier/quicker path to HA might be to model it after Prometheus and leverage Alertmanager to de-dupe alerts.
I've only taken a brief look so far, but I think the existing custom notification will work with Alertmanager as long as the notification limiter is commented out.
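A rough sketch of what that could look like, assuming Gatus's custom alerting provider is pointed at Alertmanager's `/api/v2/alerts` endpoint (the exact body placeholders and field names here are illustrative, not verified against a specific Gatus version):

```yaml
# Hypothetical: post alerts to Alertmanager and let it de-duplicate/route them.
alerting:
  custom:
    url: "http://alertmanager:9093/api/v2/alerts"
    method: "POST"
    body: |
      [{"labels": {"alertname": "[SERVICE_NAME]", "severity": "critical"}}]
```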