DM-47199: Docs updates: new env provisioning and IDF env descriptions #44

Merged (1 commit) on Oct 31, 2024
Binary file added docs/_static/bootstrap_forwarding_rule.png
Binary file added docs/_static/forwarding_rule_details.png
Binary file added docs/_static/gcp_ip_addresses.png
Binary file added docs/_static/promote_ip_address.png
1 change: 1 addition & 0 deletions docs/developer-guide/index.rst
@@ -18,6 +18,7 @@ A Sasquatch developer is responsible for maintaining the Sasquatch components an
kafka-shutdown
broker-migration
connectors
new-environment

.. toctree::
:caption: Troubleshooting
141 changes: 141 additions & 0 deletions docs/developer-guide/new-environment.rst
@@ -0,0 +1,141 @@
################################
Deploying into a new environment
################################

Deploying Sasquatch into a new environment requires multiple ArgoCD syncs with some manual information gathering and updating in between.


Enable Sasquatch in Phalanx
===========================

#. Cut a `Phalanx`_ development branch.
#. Ensure the ``strimzi`` and ``strimzi-access-operator`` Phalanx applications are enabled and synced in the new environment by adding them to the :samp:`environments/values-{environment}.yaml` file, and adding a blank :samp:`values-{environment}.yaml` file to their ``applications/`` directories.
`These docs <https://phalanx.lsst.io/developers/switch-environment-to-branch.html>`_ can help you enable them from your development branch.
#. Enable the ``sasquatch`` app in the environment.
For the :samp:`applications/sasquatch/values-{environment}.yaml` file, copy one from an existing environment that has the same enabled services that you want in the new environment.
Change all of the environment references to the new environment, and change or add anything else you need for the new environment.
#. Comment out any ``loadBalancerIP`` entries in the :samp:`applications/sasquatch/values-{environment}.yaml` file.
We'll fill these in later.
#. In the new environment's ArgoCD, point the ``sasquatch`` app at your Phalanx development branch, and sync it.
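The application toggles from the steps above can be sketched as follows; this is a minimal illustration of the Phalanx environments values layout, and the ``idfdev`` environment name is just a placeholder for your environment:

.. code:: yaml

   # environments/values-idfdev.yaml (environment name is illustrative)
   applications:
     strimzi: true
     strimzi-access-operator: true
     sasquatch: true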

This first sync will not be successful.
The `cert-manager`_ ``Certificate`` resource will be stuck in a progressing state until we update some values and provision some DNS.

.. _Phalanx: https://phalanx.lsst.io
.. _cert-manager: https://cert-manager.io/

Gather IP addresses and update Phalanx config
=============================================

.. note::

The public IP address gathering and modification described here apply only to environments deployed on `GCP`_.
This process will differ for other types of environments.

#. Get the broker IDs, which are the node IDs of the Kafka brokers.
In this example, the broker IDs are ``0``, ``1``, and ``2``:

.. code::

❯ kubectl get kafkanodepool -n sasquatch
NAME DESIRED REPLICAS ROLES NODEIDS
controller 3 ["controller"] [3,4,5]
kafka 3 ["broker"] [0,1,2]

#. A GCP public IP address will be provisioned for each of these broker nodes.
Another IP address will be provisioned for the external `kafka bootstrap servers`_ endpoint.
You can see all of the provisioned IP addresses in your GCP project here: :samp:`https://console.cloud.google.com/networking/addresses/list?authuser=1&hl=en&project={project name}`:

.. figure:: /_static/gcp_ip_addresses.png
:name: GCP IP addresses

#. One by one, click on the ``Forwarding rule`` links in each row until you find the ones annotated with :samp:`\{"kubernetes.io/service-name":"sasquatch/sasquatch-kafka-{broker node id}"\}` for each broker node.
Note the IP address and node number.

.. figure:: /_static/forwarding_rule_details.png
:name: Forwarding rule details

#. Find and note the IP address that is annotated with ``{"kubernetes.io/service-name":"sasquatch/sasquatch-kafka-external-bootstrap"}``:

.. figure:: /_static/bootstrap_forwarding_rule.png
:name: Bootstrap forwarding rule
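Instead of clicking through each row in the console, something like the following ``gcloud`` sketch can list the forwarding rules with their annotations; this assumes GKE records the ``kubernetes.io/service-name`` annotation in the rule's ``description`` field, and ``my-project`` is a placeholder:

.. code:: sh

   # List forwarding rules with the annotation that names the Kubernetes service
   gcloud compute forwarding-rules list \
       --project my-project \
       --format="table(name, IPAddress, description)"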

#. Promote all of these IP addresses to GCP static IP addresses by choosing the option in the three-vertical-dots menu for each IP address (you may have to scroll horizontally).
This ensures that we won't lose these IP addresses and have to update DNS later:

.. figure:: /_static/promote_ip_address.png
:name: Promote IP address
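Promotion can also be done from the command line; here is a hedged sketch, where the address name, region, and project are placeholders and the IP comes from the ``idfint`` example:

.. code:: sh

   # Promote the ephemeral IP 34.171.69.125 to a named static address
   gcloud compute addresses create sasquatch-kafka-0 \
       --addresses 34.171.69.125 \
       --region us-central1 \
       --project my-project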

#. Update the :samp:`applications/sasquatch/values-{environment}.yaml` ``strimzi-kafka.kafka`` config with ``loadBalancerIP`` and ``host`` entries that correspond to the node IDs that you found.
Here is an example from ``idfint``.
Note that the broker node IDs are in the ``broker`` entries, and that the ``host`` entries have numbers in them that match those IDs.

.. code:: yaml

strimzi-kafka:
kafka:
externalListener:
tls:
enabled: true
bootstrap:
loadBalancerIP: "35.188.187.82"
host: sasquatch-int-kafka-bootstrap.lsst.cloud

brokers:
- broker: 0
loadBalancerIP: "34.171.69.125"
host: sasquatch-int-kafka-0.lsst.cloud
- broker: 1
loadBalancerIP: "34.72.50.204"
host: sasquatch-int-kafka-1.lsst.cloud
- broker: 2
loadBalancerIP: "34.173.225.150"
host: sasquatch-int-kafka-2.lsst.cloud

#. Push these changes to your Phalanx branch and sync ``sasquatch`` in ArgoCD.

.. _GCP: https://cloud.google.com
.. _kafka bootstrap servers: https://kafka.apache.org/documentation/#producerconfigs_bootstrap.servers

Provision DNS for TLS certificate
=================================

#. Provision ``CNAME`` records (probably in AWS Route53) for `LetsEncrypt`_ verification for each of the ``host`` entries in the ``strimzi-kafka.kafka`` values.
Continuing with the ``idfint`` example:

.. code:: text

_acme-challenge.sasquatch-int-kafka-0.lsst.cloud (_acme-challenge.tls.lsst.cloud)
_acme-challenge.sasquatch-int-kafka-1.lsst.cloud (_acme-challenge.tls.lsst.cloud)
_acme-challenge.sasquatch-int-kafka-2.lsst.cloud (_acme-challenge.tls.lsst.cloud)
_acme-challenge.sasquatch-int-kafka-bootstrap.lsst.cloud (_acme-challenge.tls.lsst.cloud)

#. Provision ``A`` records for each of the ``host`` entries with their matching IP address values:

.. code:: text

sasquatch-int-kafka-0.lsst.cloud (34.171.69.125)
sasquatch-int-kafka-1.lsst.cloud (34.72.50.204)
sasquatch-int-kafka-2.lsst.cloud (34.173.225.150)
sasquatch-int-kafka-bootstrap.lsst.cloud (35.188.187.82)

#. Wait for the ``Certificate`` Kubernetes resource to finish provisioning in ArgoCD!
This might take several minutes.
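You can sanity-check the DNS records and watch the certificate from the command line; continuing the ``idfint`` example:

.. code:: sh

   # Should print the matching static IP (35.188.187.82 in the idfint example)
   dig +short sasquatch-int-kafka-bootstrap.lsst.cloud

   # Watch the cert-manager Certificate resource become Ready
   kubectl get certificate -n sasquatch -w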

.. _LetsEncrypt: https://letsencrypt.org

Configure Gafaelfawr OIDC authentication
========================================

Sasquatch assumes that Chronograf will use OIDC authentication.
Follow `these instructions <https://gafaelfawr.lsst.io/user-guide/openid-connect.html#chronograf>`_ to set it up.

.. warning::

This requires a Gafaelfawr restart.
It could also affect all of the apps in an environment if done incorrectly.
If your new environment is a production environment, you should probably wait for a maintenance window to do this step!

Merge your Phalanx branch!
==========================

If all is well, of course.
58 changes: 58 additions & 0 deletions docs/environments.rst
@@ -17,6 +17,12 @@ The table below summarizes the Sasquatch environments and their main entry point
+---------------------------+---------------------------------------------------+-----------------------------------+----------------+
| :ref:`USDF dev<usdfdev>` | https://usdf-rsp-dev.slac.stanford.edu/chronograf | ``usdfdev_efd`` | Not required |
+---------------------------+---------------------------------------------------+-----------------------------------+----------------+
| :ref:`IDF<idf>` | https://data.lsst.cloud/chronograf | (not available) | Not required |
+---------------------------+---------------------------------------------------+-----------------------------------+----------------+
| :ref:`IDF int<idfint>` | https://data-int.lsst.cloud/chronograf | (not available) | Not required |
+---------------------------+---------------------------------------------------+-----------------------------------+----------------+
| :ref:`IDF dev<idfdev>` | https://data-dev.lsst.cloud/chronograf | ``idfdev_efd`` | Not required |
+---------------------------+---------------------------------------------------+-----------------------------------+----------------+
| :ref:`TTS<tts>` | https://tucson-teststand.lsst.codes/chronograf | ``tucson_teststand_efd`` | NOIRLab VPN |
+---------------------------+---------------------------------------------------+-----------------------------------+----------------+
| :ref:`BTS<bts>` | https://base-lsp.lsst.codes/chronograf | ``base_efd`` | Chile VPN |
@@ -75,6 +81,58 @@ Intended audience: Project staff.
- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
- Kafka REST proxy API: ``https://usdf-rsp-dev.slac.stanford.edu/sasquatch-rest-proxy``

.. _idf:

IDF
---

Sasquatch production environment for the community science platform in Google Cloud.
This instance is mainly used for :ref:`application metrics<appmetrics>`.

Intended audience: Project staff.

- Chronograf: ``https://data.lsst.cloud/chronograf``
- InfluxDB HTTP API: ``https://data.lsst.cloud/influxdb``
- Kafdrop UI: ``https://data.lsst.cloud/kafdrop``
- Kafka bootstrap server: ``sasquatch-kafka-bootstrap.lsst.cloud:9094``
- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
- Kafka REST proxy API: (not available)

.. _idfint:

IDF int
-------

Sasquatch integration environment for the community science platform in Google Cloud.
This instance is used for testing.
There is no direct EFD integration.

Intended audience: Project staff.

- Chronograf: ``https://data-int.lsst.cloud/chronograf``
- InfluxDB HTTP API: ``https://data-int.lsst.cloud/influxdb``
- Kafdrop UI: ``https://data-int.lsst.cloud/kafdrop``
- Kafka bootstrap server: ``sasquatch-int-kafka-bootstrap.lsst.cloud:9094``
- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
- Kafka REST proxy API: ``https://data-int.lsst.cloud/sasquatch-rest-proxy``

.. _idfdev:

IDF dev
-------

Sasquatch dev environment for the community science platform in Google Cloud.
This instance is used for testing.

Intended audience: Project staff.

- Chronograf: ``https://data-dev.lsst.cloud/chronograf``
- InfluxDB HTTP API: ``https://data-dev.lsst.cloud/influxdb``
- Kafdrop UI: ``https://data-dev.lsst.cloud/kafdrop``
- Kafka bootstrap server: ``sasquatch-dev-kafka-bootstrap.lsst.cloud:9094``
- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
- Kafka REST proxy API: ``https://data-dev.lsst.cloud/sasquatch-rest-proxy``

.. _tts:

Tucson Test Stand (TTS)
4 changes: 3 additions & 1 deletion docs/user-guide/app-metrics.rst
@@ -1,9 +1,11 @@
.. _appmetrics:

===================
Application metrics
===================

Applications can use Sasquatch infrastructure to publish metrics events to `InfluxDB`_ via `Kafka`_.
- Setting certain Sasquatch values in Phalanx will create a Kafka user and topic, and configure a Telegraf consumer to put messages from that topic into the ``telegraf-kafka-app-metrics-consumer`` database in the Sasquatch InfluxDB instance.
+ Setting certain Sasquatch values in Phalanx will create a Kafka user and topic, and configure a Telegraf consumer to put messages from that topic into the ``lsst.square.metrics`` database in the Sasquatch InfluxDB instance.

The messages are expected to be in :ref:`Avro <avro>` format, and schemas are expected to be in the `Schema Registry`_ for any messages that are encoded with a schema ID.
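As a rough illustration of "setting certain Sasquatch values" (the exact key names live in the Sasquatch Phalanx chart and may differ; ``myapp`` is a hypothetical application name):

.. code:: yaml

   # applications/sasquatch/values-{environment}.yaml (keys illustrative)
   app-metrics:
     enabled: true
     apps:
       - myapp   # creates a Kafka user/topic and Telegraf consumer config for this app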

6 changes: 6 additions & 0 deletions docs/user-guide/directconnection.rst
@@ -15,11 +15,17 @@ This guide describes the most secure and straightforward option, assuming th
Generating Kafka credentials
============================

.. note::

The ``strimzi-access-operator`` `Phalanx`_ app must be enabled.
It provides the ``KafkaAccess`` CRD that is used in this guide.

You can generate Kafka credentials by creating a couple of `Strimzi`_ resources:

* A `KafkaUser`_ resource, in the ``sasquatch`` namespace, to configure a user in the Kafka cluster and provision a Kubernetes Secret with that user's credentials
* A `KafkaAccess`_ resource, in your app's namespace, to make those credentials and other Kafka connection information available to your app
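A minimal sketch of the two resources follows; the names are hypothetical, and the field layout is taken from the Strimzi and kafka-access-operator references linked from the bullets above, so double-check against those docs:

.. code:: yaml

   apiVersion: kafka.strimzi.io/v1beta2
   kind: KafkaUser
   metadata:
     name: myapp                      # hypothetical user name
     namespace: sasquatch
     labels:
       strimzi.io/cluster: sasquatch  # ties the user to the Sasquatch Kafka cluster
   spec:
     authentication:
       type: tls
   ---
   apiVersion: access.strimzi.io/v1alpha1
   kind: KafkaAccess
   metadata:
     name: myapp-kafka
     namespace: myapp                 # your app's namespace
   spec:
     kafka:
       name: sasquatch
       namespace: sasquatch
     user:
       kind: KafkaUser
       apiGroup: kafka.strimzi.io
       name: myapp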

.. _Phalanx: https://phalanx.lsst.io
.. _Strimzi: https://strimzi.io
.. _KafkaUser: https://strimzi.io/docs/operators/latest/configuring.html#type-KafkaUser-reference
.. _KafkaAccess: https://github.com/strimzi/kafka-access-operator