High availability mode #176

Open
TwiN opened this issue Sep 17, 2021 · 3 comments
Labels
feature New feature or request

Comments

@TwiN
Owner

TwiN commented Sep 17, 2021

This feature would allow more than one replica of Gatus with the exact same configuration to run side by side, using leader election backed by the new postgres storage type (storage.type: postgres).

Programmatically, this is how I envision it to work:

  1. First instance of Gatus, henceforth G1, starts.
  2. G1 tries to acquire the lock by querying the new instance table in the Postgres database.
  3. Because no row designating an instance as leader exists yet, G1 creates one with the label column set to default, the role column set to LEADER, and the last_heartbeat column set to CURRENT_TIMESTAMP.
  4. G1 is now the leader, so it begins monitoring the configured services.
  5. Every minute, G1 updates the last_heartbeat timestamp in the Postgres database.
  6. Second instance of Gatus, henceforth G2, starts.
  7. G2 tries to acquire the writer lock by querying the instance table in the Postgres database for the label default and the role LEADER.
  8. G2 fails to acquire the lock, because another instance already holds it and its last_heartbeat timestamp is within the past 5 minutes. This 5-minute window is defined as the time until reelection.
  9. G2 tries to acquire the writer lock every 2 minutes.
  10. Now, let's assume that G1 runs into an issue and crashes.
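  11. G1 restarts and tries to acquire the lock, but, as described in step 8, it fails, because the last heartbeat written before the crash is still less than 5 minutes old.
  12. Once 5 minutes have passed since that last heartbeat, the time for reelection has come, and either G1 or G2 will grab the lock.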
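The steps above can be sketched as a simple lease check. This is a minimal in-memory sketch, not Gatus code: the InstanceTable class, try_acquire method and instance ids are hypothetical, timestamps are plain seconds passed in by the caller rather than CURRENT_TIMESTAMP, and a dict stands in for the Postgres table, so it ignores the atomicity the database would have to provide. It also assumes a restarted process comes back with a new instance id, which is what makes step 11 fail.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical value taken from the proposal: a leader whose last heartbeat
# is at least this many seconds old can be replaced ("time until reelection").
TIME_UNTIL_REELECTION = 300


@dataclass
class LeaderRow:
    holder: str          # id of the instance that currently holds the lock
    last_heartbeat: int  # seconds; supplied by the caller instead of CURRENT_TIMESTAMP


class InstanceTable:
    """In-memory stand-in for the proposed instance table, one LEADER row per label."""

    def __init__(self) -> None:
        self.leaders: Dict[str, LeaderRow] = {}

    def try_acquire(self, label: str, instance_id: str, now: int) -> bool:
        row = self.leaders.get(label)
        if row is None:
            # Step 3: no LEADER row exists yet, so this instance claims it.
            self.leaders[label] = LeaderRow(instance_id, now)
            return True
        if row.holder == instance_id:
            # Step 5: the current leader refreshes its heartbeat.
            row.last_heartbeat = now
            return True
        if now - row.last_heartbeat >= TIME_UNTIL_REELECTION:
            # Step 12: the old leader's heartbeat is stale, so take over.
            self.leaders[label] = LeaderRow(instance_id, now)
            return True
        # Steps 8 and 11: another holder's heartbeat is still fresh.
        return False


if __name__ == "__main__":
    table = InstanceTable()
    table.try_acquire("default", "G1", now=0)   # G1 becomes leader
    table.try_acquire("default", "G1", now=60)  # G1 heartbeat, then G1 crashes
    for who, t in [("G2", 120), ("G1-restarted", 200), ("G2", 360)]:
        print(who, t, table.try_acquire("default", who, now=t))
```

Running it shows G2 and the restarted G1 being rejected until 300 seconds have passed since G1's last heartbeat at t=60, at which point G2 takes over.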

During this entire time, both G1 and G2 can read from the database and can therefore handle HTTP requests. The only restriction is that, for a given label, no more than one leader can write at any given time.

```yaml
distributed:
  mode: HA
  label: default
```

The distributed.label parameter is optional and defaults to the value default.

Why do we need a label?

This will be needed for #64. Let's say you want to deploy Gatus in 3 isolated environments, call them alpha, bravo and charlie, all of which have access to the Postgres database. Naturally, each environment has its own set of services to monitor.

You'd use the label to differentiate these environments, allowing one leader per environment to push its data to the database while still keeping each environment highly available.
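A possible shape for the per-label lock as a single atomic SQL statement, sketched under assumptions: the instance table schema (label, role, holder, last_heartbeat) and the try_acquire helper are hypothetical, and SQLite from Python's standard library stands in for Postgres because it accepts the same INSERT ... ON CONFLICT ... DO UPDATE ... WHERE form (SQLite 3.24 or newer). Because the primary key is (label, role), alpha, bravo and charlie each get their own LEADER row and their own leader.

```python
import sqlite3

# Hypothetical schema: at most one LEADER row per label.
SCHEMA = """CREATE TABLE IF NOT EXISTS instance (
    label          TEXT    NOT NULL,
    role           TEXT    NOT NULL,
    holder         TEXT    NOT NULL,
    last_heartbeat INTEGER NOT NULL,
    PRIMARY KEY (label, role)
)"""

# Insert the LEADER row if it is missing. If it exists, overwrite it only when
# the caller already holds it (a heartbeat) or its heartbeat is stale.
ACQUIRE = """INSERT INTO instance (label, role, holder, last_heartbeat)
VALUES (:label, 'LEADER', :holder, :now)
ON CONFLICT (label, role) DO UPDATE
SET holder = excluded.holder, last_heartbeat = excluded.last_heartbeat
WHERE instance.holder = excluded.holder
   OR excluded.last_heartbeat - instance.last_heartbeat >= :ttl"""


def try_acquire(conn: sqlite3.Connection, label: str, holder: str,
                now: int, ttl: int = 300) -> bool:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(ACQUIRE, {"label": label, "holder": holder,
                                     "now": now, "ttl": ttl})
    # 1 changed row means the row was inserted or updated; 0 means the
    # DO UPDATE ... WHERE condition rejected this instance.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    for env in ("alpha", "bravo", "charlie"):
        first = try_acquire(conn, env, f"{env}-replica-1", now=0)
        second = try_acquire(conn, env, f"{env}-replica-2", now=10)
        print(env, first, second)
```

In Postgres the staleness check would more naturally compare TIMESTAMPTZ values (for example instance.last_heartbeat < now() - interval '5 minutes'); integer seconds are used here only to keep the sketch deterministic.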

Requirements:

  • storage.type must be set to postgres
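A sketch of how the distributed block, its default label and the postgres requirement could be checked when the configuration is loaded. The parse_distributed function and DistributedConfig type are hypothetical, the configuration is assumed to be already parsed from YAML into a dict, and treating any mode other than HA as an error is an assumption, since HA is the only mode mentioned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DistributedConfig:
    mode: str
    label: str = "default"  # distributed.label is optional


def parse_distributed(config: dict) -> Optional[DistributedConfig]:
    """Validate the hypothetical distributed block of an already-parsed config."""
    section = config.get("distributed")
    if section is None:
        return None  # no distributed block: HA mode is not enabled
    mode = section.get("mode")
    if mode != "HA":
        raise ValueError(f"unsupported distributed.mode: {mode!r}")
    storage_type = (config.get("storage") or {}).get("type")
    if storage_type != "postgres":
        raise ValueError("distributed.mode HA requires storage.type to be set to postgres")
    # Fall back to "default" when the label is omitted or empty.
    return DistributedConfig(mode=mode, label=section.get("label") or "default")


if __name__ == "__main__":
    cfg = {"storage": {"type": "postgres"}, "distributed": {"mode": "HA"}}
    print(parse_distributed(cfg))
```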
TwiN added the feature (New feature or request) label on Sep 17, 2021
@guillomep

Could you make HA available without using a database?

If we know the endpoints (IPs) of all Gatus instances in advance, we could simply list them in the configuration and let them elect a leader by talking to each other. One well-known algorithm for this is Raft: https://raft.github.io/
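For reference, the core of Raft's leader election is a majority vote in which each node grants at most one vote per term. The sketch below shows only that rule, under heavy simplification: the Node class and run_election function are hypothetical, peers are called directly instead of over the network, and it omits randomized election timeouts, heartbeats and the log up-to-date check that a real Raft implementation needs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Node:
    name: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate: str) -> bool:
        # Seeing a newer term moves this node into it and clears its vote.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # Grant at most one vote per term; requests from stale terms are refused.
        # (Real Raft also refuses candidates whose log is behind; omitted here.)
        if term == self.current_term and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(candidate: Node, peers: Dict[str, Node], reachable: Set[str]) -> bool:
    """Start a new term; win only with votes from a strict majority of the cluster."""
    candidate.current_term += 1
    candidate.voted_for = candidate.name  # a candidate always votes for itself
    votes = 1
    for name, peer in peers.items():
        # Unreachable peers (e.g. across a network partition) cannot vote.
        if name in reachable and peer.request_vote(candidate.current_term, candidate.name):
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    print(run_election(a, {"b": b, "c": c}, reachable={"b", "c"}))
    print(run_election(b, {"a": a, "c": c}, reachable=set()))
```

With three configured instances, any two that can reach each other can still elect a leader, while an isolated one cannot, which is the property that keeps partitioned replicas from both acting as leader.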

@BrianInAz

I think an easier and quicker path to HA might be to model it after Prometheus and leverage Alertmanager to deduplicate alerts.

I've only taken a brief look so far, but I think the existing custom notification will work with Alertmanager as long as the notification limiter is commented out.
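A sketch of what a deduplicatable alert could look like, assuming it is pushed to Alertmanager's v2 API (POST /api/v2/alerts). Alertmanager treats alerts with the same label set as the same alert, so every replica has to send identical labels and nothing replica-specific. The label names (alertname, endpoint) and the build_alerts and send_to_alertmanager helpers are hypothetical, not part of Gatus.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import List


def build_alerts(endpoint_name: str, description: str, resolved: bool,
                 now: datetime) -> List[dict]:
    """Build a POST /api/v2/alerts body; identical inputs give identical alerts."""
    alert = {
        # Labels identify the alert in Alertmanager, so nothing replica-specific
        # (hostname, pod name, instance id) may appear here.
        "labels": {"alertname": "GatusEndpointUnhealthy", "endpoint": endpoint_name},
        "annotations": {"description": description},
    }
    timestamp = now.astimezone(timezone.utc).isoformat()
    # A firing alert carries startsAt; a resolved one sets endsAt to now.
    alert["endsAt" if resolved else "startsAt"] = timestamp
    return [alert]


def send_to_alertmanager(base_url: str, alerts: List[dict]) -> int:
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    body = build_alerts("api", "api is down", resolved=False,
                        now=datetime.now(timezone.utc))
    print(json.dumps(body, indent=2))
    # send_to_alertmanager("http://alertmanager:9093", body)  # hypothetical address
```

Two replicas running the same checks would then push identical alerts, which Alertmanager treats as a single alert rather than two.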

@beatkind

Hi there, I think this issue has lost a bit of traction. Is there any status update on this topic beyond what is described in this issue?

Projects
None yet
Development

No branches or pull requests

4 participants