Merge branch 'master' into demo-runbook
arikalon1 authored Dec 8, 2024
2 parents cc09e01 + 142ed7d commit 48c4766
Showing 61 changed files with 1,008 additions and 415 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/release.yaml
@@ -105,10 +105,6 @@ jobs:
name: helm-chart
path: helm/robusta/

- name: Upload helm chart
run: |
cd helm && ./upload_chart.sh
- name: Release Docker to Dockerhub
run: |-
docker buildx build \
@@ -118,3 +114,7 @@ jobs:
--tag robustadev/robusta-runner:${{env.RELEASE_VER}} \
--push \
.
- name: Upload helm chart
run: |
cd helm && ./upload_chart.sh
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -113,7 +113,8 @@
"tutorials/playbook-failed-liveness.html": "/master/playbook-reference/kubernetes-examples/playbook-failed-liveness.html",
"tutorials/playbook-track-secrets.html": "/master/playbook-reference/kubernetes-examples//playbook-track-secrets.html",
"tutorials/alert-remediation.html": "/master/playbook-reference/prometheus-examples/alert-remediation.html",
-"tutorials/alert-custom-enrichment.html": "/master/playbook-reference/prometheus-examples/alert-custom-enrichment.html"
+"tutorials/alert-custom-enrichment.html": "/master/playbook-reference/prometheus-examples/alert-custom-enrichment.html",
+"catalog/sinks/slack.html": "/master/configuration/sinks/slack.html"


}
45 changes: 43 additions & 2 deletions docs/configuration/ai-analysis.rst
@@ -223,7 +223,7 @@ To use HolmesGPT with the Robusta UI, one further step may be necessary, dependi
* If you store the Robusta UI token in a Kubernetes secret, follow the instructions below.

Note: the same Robusta UI token is used for the Robusta UI sink and for HolmesGPT.

Reading the Robusta UI Token from a secret in HolmesGPT
************************************************************

@@ -249,7 +249,7 @@ Reading the Robusta UI Token from a secret in HolmesGPT
.. code-block:: yaml
holmes:
-additional_env_vars:
+additionalEnvVars:
....
- name: ROBUSTA_UI_TOKEN
valueFrom:
@@ -428,3 +428,44 @@ Finally, after updating your ``generated_values.yaml``, apply the changes to you
helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
This will update the deployment to use the custom Docker image, which includes the new binaries. The ``toolsets`` defined in the configuration will now be available for Holmes to use, including any new binaries like ``jq``.


Adding Permissions for Additional Resources
----------------------------------------------

HolmesGPT may need access to additional Kubernetes resources or CRDs to perform specific analyses or interact with external tools.

Whenever HolmesGPT needs resources that are not included in its default configuration, you must extend its ClusterRole rules.

Common Scenarios for Adding Permissions:

* External Integrations and CRDs: When HolmesGPT needs to access custom resources (CRDs) in your cluster, like ArgoCD Application resources or Istio VirtualService resources.
* Additional Kubernetes resources: By default, Holmes can only access a limited number of Kubernetes resources. For example, Holmes has no access to Kubernetes secrets by default. You can give Holmes access to more built-in cluster resources if it is useful for your use case.

As an example, suppose we ask HolmesGPT to troubleshoot application deployments managed by Argo CD. To do so, it must analyze the state of Argo CD applications and projects, but by default it has no access to the relevant CRDs.

**Steps to Add Permissions for Argo CD:**

1. **Update generated_values.yaml with Required Permissions:**

Add the following ``customClusterRoleRules`` configuration under the ``holmes`` section:

.. code-block:: yaml

    enableHolmesGPT: true
    holmes:
      customClusterRoleRules:
        - apiGroups: ["argoproj.io"]
          resources: ["applications", "appprojects"]
          verbs: ["get", "list", "watch"]

2. **Apply the Configuration:**

Deploy the updated configuration using Helm:

.. code-block:: bash

    helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

This grants HolmesGPT the permissions it needs to analyze Argo CD applications and projects.
You can now ask HolmesGPT questions such as "What is the current status of all Argo CD applications in the cluster?" and it will be able to answer.
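
Every entry in ``customClusterRoleRules`` has the same shape, so rules for several CRDs can be generated instead of written by hand. The following is a minimal sketch and not part of Robusta: ``read_only_rule`` is a hypothetical helper, and the Istio API group ``networking.istio.io`` comes from Istio's own CRDs rather than from this guide. It prints a JSON fragment (JSON is generally accepted wherever YAML is) that mirrors the structure shown in step 1.

```python
import json


def read_only_rule(api_group, resources):
    """Build one ClusterRole rule granting read-only access (hypothetical helper)."""
    return {
        "apiGroups": [api_group],
        "resources": list(resources),
        "verbs": ["get", "list", "watch"],
    }


# Same layout as the generated_values.yaml snippet in step 1,
# extended with the Istio VirtualService example mentioned earlier.
values_fragment = {
    "enableHolmesGPT": True,
    "holmes": {
        "customClusterRoleRules": [
            read_only_rule("argoproj.io", ["applications", "appprojects"]),
            read_only_rule("networking.istio.io", ["virtualservices"]),
        ]
    },
}

print(json.dumps(values_fragment, indent=2))
```

Restricting the verbs to ``get``, ``list``, and ``watch`` keeps the added access read-only, matching the Argo CD example above.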
@@ -50,3 +50,9 @@ To allow the Grafana dashboard to persist after the Grafana instance restarts, y
enabled: true
Apply the change by performing a :ref:`Helm Upgrade <Simple Upgrade>`.

Troubleshooting
---------------------

Having issues with Prometheus? Follow this guide to resolve some :ref:`common errors <Common Errors>`.

126 changes: 0 additions & 126 deletions docs/configuration/cluster-misconfigurations.rst

This file was deleted.

123 changes: 122 additions & 1 deletion docs/configuration/exporting/exporting-data.rst
@@ -1,7 +1,7 @@
Alert History Import and Export API
===================================

-GET https://api.robusta.dev/api/alerts
+GET https://api.robusta.dev/api/query/alerts
--------------------------------------

Use this endpoint to export alert history data. You can filter the results based on specific criteria using query parameters such as ``alert_name``, ``account_id``, and time range.
@@ -149,6 +149,127 @@ Response Fields
- The node where the resource is located.


GET https://api.robusta.dev/api/query/report
--------------------------------------------

Use this endpoint to retrieve aggregated alert data: the number of times each type of alert occurred during a specified time range. Filter the results with query parameters such as ``account_id`` and the time range.


Query Parameters
^^^^^^^^^^^^^^^^

.. list-table::
   :widths: 20 10 70 10
   :header-rows: 1

   * - Parameter
     - Type
     - Description
     - Required
   * - ``account_id``
     - string
     - The unique account identifier (found in your ``generated_values.yaml`` file).
     - Yes
   * - ``start_ts``
     - string
     - Start timestamp for the query (in ISO 8601 format, e.g., ``2024-10-27T04:02:05.032Z``).
     - Yes
   * - ``end_ts``
     - string
     - End timestamp for the query (in ISO 8601 format, e.g., ``2024-11-27T05:02:05.032Z``).
     - Yes


Example Request
^^^^^^^^^^^^^^^

The following ``curl`` command queries aggregated alert data for a specified time range:

.. code-block:: bash

    curl --location 'https://api.robusta.dev/api/query/report?account_id=XXXXXX-XXXX_XXXX_XXXXX7&start_ts=2024-10-27T04:02:05.032Z&end_ts=2024-11-27T05:02:05.032Z' \
    --header 'Authorization: Bearer TOKEN_HERE'

In the command, replace the following placeholders:

- ``account_id``: Your account ID, which can be found in your ``generated_values.yaml`` file.
- ``TOKEN_HERE``: Your API token for authentication. Generate this token in the platform by navigating to **Settings** -> **API Keys** -> **New API Key**, and creating a key with the "Read Alerts" permission.
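
The same request can be made from a script. The sketch below uses only the Python standard library and assumes nothing about the API beyond the query parameters and header documented here. The function names ``to_iso8601_utc``, ``build_report_url``, and ``fetch_report`` are illustrative, not part of any Robusta client.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request, urlopen

REPORT_URL = "https://api.robusta.dev/api/query/report"


def to_iso8601_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC with milliseconds, e.g. 2024-10-27T04:02:05.032Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_report_url(account_id: str, start: datetime, end: datetime) -> str:
    """Assemble the report URL with percent-encoded query parameters."""
    query = urlencode(
        {
            "account_id": account_id,
            "start_ts": to_iso8601_utc(start),
            "end_ts": to_iso8601_utc(end),
        }
    )
    return f"{REPORT_URL}?{query}"


def fetch_report(account_id: str, start: datetime, end: datetime, token: str):
    """Send the GET request with a Bearer token and return the parsed JSON body."""
    request = Request(
        build_report_url(account_id, start, end),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Pass timezone-aware ``datetime`` objects; a naive ``datetime`` would be interpreted as local time by ``astimezone``. Unlike the ``curl`` example, ``urlencode`` percent-encodes the colons in the timestamps, which is the standard query-string encoding.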



Request Headers
^^^^^^^^^^^^^^^

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Header
     - Description
   * - ``Authorization``
     - Bearer token for authentication (e.g., ``Bearer TOKEN_HERE``). The token must have the "Read Alerts" permission.

Response Format
^^^^^^^^^^^^^^^

The API returns a JSON array of aggregated alerts. Each object contains:

- ``aggregation_key``: The unique identifier of the alert type (e.g., ``KubeJobFailed``).
- ``alert_count``: The total number of occurrences of this alert type within the specified time range.

Example Response
^^^^^^^^^^^^^^^^

.. code-block:: json

    [
        {"aggregation_key": "KubeJobFailed", "alert_count": 17413},
        {"aggregation_key": "KubePodNotReady", "alert_count": 11893},
        {"aggregation_key": "KubeDeploymentReplicasMismatch", "alert_count": 2410},
        {"aggregation_key": "KubeDeploymentRolloutStuck", "alert_count": 923},
        {"aggregation_key": "KubePodCrashLooping", "alert_count": 921},
        {"aggregation_key": "KubeContainerWaiting", "alert_count": 752},
        {"aggregation_key": "PrometheusRuleFailures", "alert_count": 188},
        {"aggregation_key": "KubeMemoryOvercommit", "alert_count": 187},
        {"aggregation_key": "PrometheusOperatorRejectedResources", "alert_count": 102},
        {"aggregation_key": "KubeletTooManyPods", "alert_count": 94},
        {"aggregation_key": "NodeMemoryHighUtilization", "alert_count": 23},
        {"aggregation_key": "TargetDown", "alert_count": 19},
        {"aggregation_key": "test123", "alert_count": 7},
        {"aggregation_key": "KubeAggregatedAPIDown", "alert_count": 4},
        {"aggregation_key": "KubeAggregatedAPIErrors", "alert_count": 4},
        {"aggregation_key": "KubeMemoryOvercommitTEST2", "alert_count": 1},
        {"aggregation_key": "TestAlert", "alert_count": 1},
        {"aggregation_key": "TestAlert2", "alert_count": 1},
        {"aggregation_key": "dsafd", "alert_count": 1},
        {"aggregation_key": "KubeMemoryOvercommitTEST", "alert_count": 1},
        {"aggregation_key": "vfd", "alert_count": 1}
    ]

Response Fields
^^^^^^^^^^^^^^^

.. list-table::
   :widths: 25 10 70
   :header-rows: 1

   * - Field
     - Type
     - Description
   * - ``aggregation_key``
     - string
     - The unique key representing the type of alert (e.g., ``KubeJobFailed``).
   * - ``alert_count``
     - integer
     - The number of times this alert occurred within the specified time range.
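
Because each element carries only ``aggregation_key`` and ``alert_count``, the response is easy to post-process. The sketch below is one possible way to rank alert types and total their counts. It uses three entries copied from the example response, and ``top_alerts`` is a hypothetical helper name.

```python
import json

# Three entries taken from the example response above.
sample_response = """
[
    {"aggregation_key": "KubeJobFailed", "alert_count": 17413},
    {"aggregation_key": "TargetDown", "alert_count": 19},
    {"aggregation_key": "KubePodNotReady", "alert_count": 11893}
]
"""


def top_alerts(report, limit=2):
    """Return (aggregation_key, alert_count) pairs for the most frequent alert types."""
    ranked = sorted(report, key=lambda entry: entry["alert_count"], reverse=True)
    return [(entry["aggregation_key"], entry["alert_count"]) for entry in ranked[:limit]]


report = json.loads(sample_response)
total_alerts = sum(entry["alert_count"] for entry in report)

print(f"Total alerts in range: {total_alerts}")
for key, count in top_alerts(report):
    print(f"{key}: {count}")
```

The sketch sorts explicitly so that it does not depend on the order in which the API returns entries.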

Notes
^^^^^^^^^^^^^^^

- Ensure that the ``start_ts`` and ``end_ts`` parameters are in ISO 8601 format and cover the desired time range.
- Use an ``Authorization`` token with sufficient permissions to access the alert data ("Read Alerts").


POST https://api.robusta.dev/api/alerts
--------------------------------------
Use this endpoint to send alert data to Robusta. You can send up to 1000 alerts in a single request.
7 changes: 7 additions & 0 deletions docs/configuration/sinks/Opsgenie.rst
@@ -5,6 +5,12 @@ Robusta can report issues and events in your Kubernetes cluster to the OpsGenie

To configure OpsGenie, we need an OpsGenie API key. It can be configured using the OpsGenie team integration.

Customizing OpsGenie Extra Details
------------------------------------------------

To add Prometheus alert labels to the extra details of OpsGenie alerts, set ``extra_details_labels`` to ``true`` in the ``sinksConfig`` section.


Configuring the OpsGenie sink
------------------------------------------------

@@ -21,6 +27,7 @@ Configuring the OpsGenie sink
- "sre"
tags:
- "prod a"
extra_details_labels: false # optional, default is false
Save the file and run
