diff --git a/config/_default/menus/menus.en.yaml b/config/_default/menus/menus.en.yaml index 89b20c62d9b06..55afbe13dd01f 100644 --- a/config/_default/menus/menus.en.yaml +++ b/config/_default/menus/menus.en.yaml @@ -1328,16 +1328,21 @@ main: parent: slos identifier: slos_metric weight: 2 + - name: Time Slice SLOs + url: service_management/service_level_objectives/time_slice/ + parent: slos + identifier: slos_time_slice + weight: 3 - name: Error Budget Alerts url: service_management/service_level_objectives/error_budget/ parent: slos identifier: error_budget - weight: 3 + weight: 4 - name: Burn Rate Alerts url: service_management/service_level_objectives/burn_rate/ parent: slos identifier: burn_rate - weight: 4 + weight: 5 - name: Guides url: service_management/service_level_objectives/guide/ parent: slos diff --git a/content/en/service_management/service_level_objectives/_index.md b/content/en/service_management/service_level_objectives/_index.md index 8f63bd2da79c0..4553c0b30d288 100644 --- a/content/en/service_management/service_level_objectives/_index.md +++ b/content/en/service_management/service_level_objectives/_index.md @@ -51,18 +51,27 @@ Service Level Agreement (SLA) Error Budget : The allowed amount of unreliability derived from an SLO's target percentage (100% - target percentage) that is meant to be invested into product development. +## SLO types + +When creating SLOs, you can choose from the following types: +- **Metric-based SLOs**: Use when you want the SLI calculation to be count-based. The SLI is calculated as the sum of good events divided by the sum of total events. +- **Monitor-based SLOs**: Use when you want the SLI calculation to be time-based. The SLI is based on the monitor's uptime. Monitor-based SLOs must be based on a new or existing Datadog monitor, and any adjustments must be made to the underlying monitor (they cannot be made through SLO creation). 
+- **Time Slice SLOs**: Use when you want the SLI calculation to be time-based. The SLI is based on your custom uptime definition (the amount of time your system exhibits good behavior divided by the total time). Time Slice SLOs do not require a Datadog monitor; you can try out different metric filters and thresholds and instantly explore downtime during SLO creation. + +For a full comparison, see the [SLO Type Comparison][1] chart. + ## Setup -Use Datadog's [Service Level Objectives status page][1] to create new SLOs or to view and manage all your existing SLOs. You can also add [SLO widgets](#slo-widgets) to your dashboards to visualize your SLO statuses at a glance. +Use Datadog's [Service Level Objectives status page][2] to create new SLOs or to view and manage all your existing SLOs. ### Configuration -1. On the [SLO status page][1], select **New SLO +**. -2. Define the source for your SLO. You can create an SLO from [metrics][2] or [monitors][3]. +1. On the [SLO status page][2], select **New SLO +**. +2. Select the SLO type. You can create an SLO with any of the following types: [Metric-based][3], [Monitor-based][4], or [Time Slice][5]. 3. Set a target and a rolling time window (past 7, 30, or 90 days) for the SLO. Datadog recommends you make the target stricter than your stipulated SLAs. If you configure more than one time window, select one to be the primary time window. This time window is displayed on SLO lists. By default, the shortest time window is selected. 4. Finally, give the SLO a title, describe it in more detail or add links in the description, add tags, and save it. -After you set up the SLO, select it from the [Service Level Objectives list view][1] to open the details side panel. The side panel displays the overall status percentage and remaining error budget for each of the SLO's targets, as well as status bars (monitor-based SLOs) or bar graphs (metric-based SLOs) of the SLI's history. 
If you created a grouped monitor-based SLO using one [multi alert monitor][4] or a grouped metric-based SLO using the [`sum by` clause][5], the status percentage and remaining error budget for each individual group is displayed in addition to the overall status percentage and remaining error budget. +After you set up the SLO, select it from the [Service Level Objectives list view][2] to open the details side panel. The side panel displays the overall status percentage and remaining error budget for each of the SLO's targets, as well as status bars (monitor-based SLOs) or bar graphs (metric-based SLOs) of the SLI's history. If you created a grouped monitor-based SLO using one [multi alert monitor][6] or a grouped metric-based SLO using the [`sum by` clause][7], the status percentage and remaining error budget for each individual group is displayed in addition to the overall status percentage and remaining error budget. **Example:** If you create a monitor-based SLO to track latency per availability-zone, the status percentages and remaining error budget for the overall SLO and for each individual availability-zone that the SLO is tracking are displayed. @@ -78,9 +87,9 @@ Setting a 100% target means having an error budget of 0% since error budget is e **Note:** The number of decimal places you can specify for your SLOs differs depending on the type of SLO and the time windows you choose. Refer to the links below for more information for each respective SLO type. -[Monitor-based SLOs][6]: Up to two decimal places are allowed for 7-day and 30-day targets, up to three decimal places are allowed for 90-day targets. +[Monitor-based SLOs][8]: Up to two decimal places are allowed for 7-day and 30-day targets, up to three decimal places are allowed for 90-day targets. -[Metric-based SLOs][7]: Up to three decimal places are allowed for all targets. +[Metric-based SLOs][9]: Up to three decimal places are allowed for all targets. 
## Edit an SLO @@ -90,13 +99,13 @@ To edit an SLO, hover over the SLO's row in the list view and click the edit pen ### Role based access -All users can view SLOs and [SLO status corrections](#slo-status-corrections), regardless of their associated [role][8]. Only users attached to roles with the `slos_write` permission can create, edit, and delete SLOs. +All users can view SLOs and [SLO status corrections](#slo-status-corrections), regardless of their associated [role][10]. Only users attached to roles with the `slos_write` permission can create, edit, and delete SLOs. -To create, edit, and delete status corrections, users require the `slos_corrections` permissions. A user with this permission can make status corrections, even if they do not have permission to edit those SLOs. For the full list of permissions, see the [RBAC documentation][9]. +To create, edit, and delete status corrections, users require the `slos_corrections` permissions. A user with this permission can make status corrections, even if they do not have permission to edit those SLOs. For the full list of permissions, see the [RBAC documentation][11]. ### Granular access controls -Restrict access to individual SLOs by specifying a list of [roles][8] that are allowed to edit it. +Restrict access to individual SLOs by specifying a list of [roles][10] that are allowed to edit it. {{< img src="service_management/service_level_objectives/slo_set_permissions.png" style="width:100%; background:none; border:none; box-shadow:none;" alt="SLO permissions option in the cog menu">}} @@ -112,11 +121,11 @@ Restrict access to individual SLOs by specifying a list of [roles][8] that are a To maintain your edit access to the SLO, the system requires you to include at least one role that you are a member of before saving. Users on the access control list can add roles and can only remove roles other than their own. 
-**Note**: Users can create SLOs on any monitor even if they do not have write permissions to the monitor. Similarly, users can create SLO alerts even if they do not have write permissions to the SLO. For more information on RBAC permissions for Monitors, see the [RBAC documentation][10] or the [guide on how to set up RBAC for Monitors][11]. +**Note**: Users can create SLOs on any monitor even if they do not have write permissions to the monitor. Similarly, users can create SLO alerts even if they do not have write permissions to the SLO. For more information on RBAC permissions for Monitors, see the [RBAC documentation][12] or the [guide on how to set up RBAC for Monitors][13]. ## Searching SLOs -The [Service Level Objectives status page][1] lets you run an advanced search of all SLOs so you can find, view, edit, clone or delete SLOs from the search results. +The [Service Level Objectives status page][2] lets you run an advanced search of all SLOs so you can find, view, edit, clone or delete SLOs from the search results. Advanced search lets you query SLOs by any combination of SLO attributes: @@ -136,15 +145,15 @@ Group your SLOs by *team*, *service* or *environment* to get a summary view of y Sort SLOs by the *status* and *error budget* columns to prioritize which SLOs need your attention. The SLO list displays the details of SLOs over the primary time window selected in your [configuration](#configuration). All other configuration time windows are available to view in the individual side panel. Open the SLO details side panel by clicking the respective table row. -**Note**: You can view your SLOs from your mobile device home screen by downloading the [Datadog Mobile App][12], available on the [Apple App Store][13] and [Google Play Store][14]. +**Note**: You can view your SLOs from your mobile device home screen by downloading the [Datadog Mobile App][14], available on the [Apple App Store][15] and [Google Play Store][16]. 
{{< img src="service_management/service_level_objectives/slos-mobile.png" style="width:100%; background:none; border:none; box-shadow:none;" alt="SLOs on iOS and Android">}} ### SLO tags -When you create or edit an SLO, you can add tags for filtering on the [SLO status page][1] or for creating [SLO saved views][15]. +When you create or edit an SLO, you can add tags for filtering on the [SLO status page][2] or for creating [SLO saved views][17]. -Add tags to SLOs in bulk with the *Edit Tags* and the *[Edit Teams][16]* dropdown options at the top of the SLO list. +Add tags to SLOs in bulk with the *Edit Tags* and the *[Edit Teams][18]* dropdown options at the top of the SLO list. ### SLO default view @@ -204,7 +213,7 @@ Three types of SLO audit events appear in the Event Explorer: To get a full list of all SLO audit events, enter the search query `tags:audit,slo` in the Event Explorer. To view the list of audit events for a specific SLO, enter `tags:audit,slo_id:` with the ID of the desired SLO. -You can also query the Event Explorer programmatically using the [Datadog Events API][17]. +You can also query the Event Explorer programmatically using the [Datadog Events API][19]. **Note:** If you don't see events appear in the UI, be sure to set the time frame of the Event Explorer to a longer period, for example, the past 7 days. @@ -219,11 +228,11 @@ For example, if you wish to be notified when a specific SLO's configuration is m After creating your SLO, you can visualize the data through Dashboards and widgets. - Use the SLO Summary widget to visualize the status of a single SLO. - Use the SLO List widget to visualize a set of SLOs - - Graph 15 months' worth of metric-based SLO data with the [SLO data source][18] in both timeseries and scalar (query value, top list, table, change) widgets. + - Graph 15 months' worth of metric-based SLO data with the [SLO data source][20] in both timeseries and scalar (query value, top list, table, change) widgets. 
-For more information about SLO Widgets, see the [SLO Summary][19] and [SLO List][20] widget pages. For more information on the SLO data source, see the guide on how to [Graph historical SLO data on Dashboards][18]. +For more information about SLO Widgets, see the [SLO Summary][21] and [SLO List][22] widget pages. For more information on the SLO data source, see the guide on how to [Graph historical SLO data on Dashboards][20]. -To proactively manage the configurations of your SLOs, set an [Event Monitor][21] to notify you when events corresponding to certain tags occur. +To proactively manage the configurations of your SLOs, set an [Event Monitor][23] to notify you when events corresponding to certain tags occur. ## SLO status corrections @@ -235,8 +244,9 @@ Status corrections allow you to exclude specific time periods from SLO status an When you apply a correction, the time period you specify is dropped from the SLO's calculation. - For monitor-based SLOs, the correction time window is not counted. - For metric-based SLOs, all good and bad events in the correction window are not counted. +- For Time Slice SLOs, the correction time window is treated as uptime. -You have the option to create one-time corrections for ad hoc adjustments, or recurring corrections for predictable adjustments that occur on a regular cadence. One-time corrections require a start and end time, while recurring corrections require a start time, duration, and interval. Recurring corrections are based on [iCalendar RFC 5545's RRULE specification][22]. The supported rules are `FREQ`, `INTERVAL`, `COUNT`, and `UNTIL`. Specifying an end date for recurring corrections is optional in case you need the correction to repeat indefinitely. +You have the option to create one-time corrections for ad hoc adjustments, or recurring corrections for predictable adjustments that occur on a regular cadence. 
One-time corrections require a start and end time, while recurring corrections require a start time, duration, and interval. Recurring corrections are based on [iCalendar RFC 5545's RRULE specification][24]. The supported rules are `FREQ`, `INTERVAL`, `COUNT`, and `UNTIL`. Specifying an end date for recurring corrections is optional in case you need the correction to repeat indefinitely. For either type of correction, you must select a correction category that states why the correction is being made. The available categories are `Scheduled Maintenance`, `Outside Business Hours`, `Deployment`, and `Other`. You can optionally include a description to provide additional context if necessary. @@ -253,7 +263,7 @@ The 90-day limits per SLO are as follows: | Weekly recurring | 3 | | Monthly recurring | 5 | -You may configure status corrections through the UI by selecting `Correct Status` in your SLO's side panel, the [SLO status corrections API][23], or a [Terraform resource][24]. +You may configure status corrections through the UI by selecting `Correct Status` in your SLO's side panel, the [SLO status corrections API][25], or a [Terraform resource][26]. {{< img src="service_management/service_level_objectives/slo-corrections-ui.png" alt="SLO correction UI" >}} @@ -271,31 +281,39 @@ To access SLO status corrections in the UI: To view, edit, and delete existing status corrections, click on the **Corrections** tab at the top of an SLO's detailed side panel view. +## SLO Calendar View + +The SLO Calendar View is available on the [SLO status page][2]. On the top right corner, switch from the "Primary" view to the "Weekly" or "Monthly" view to see 12 months of historical SLO status data. The Calendar View is supported for Metric-based SLOs and Time Slice SLOs. 
+ +{{< img src="service_management/service_level_objectives/calendar-view-slo.png" alt="SLO calendar view" >}} + ## Further Reading {{< partial name="whats-next/whats-next.html" >}} -[1]: https://app.datadoghq.com/slo -[2]: /service_management/service_level_objectives/metric/ -[3]: /service_management/service_level_objectives/monitor/ -[4]: /monitors/types/metric/?tab=threshold#alert-grouping -[5]: /service_management/service_level_objectives/metric/#define-queries -[6]: /service_management/service_level_objectives/monitor/#set-your-slo-targets -[7]: /service_management/service_level_objectives/metric/#set-your-slo-targets -[8]: /account_management/rbac/ -[9]: /account_management/rbac/permissions/#service-level-objectives/ -[10]: /account_management/rbac/permissions/#monitors -[11]: /monitors/guide/how-to-set-up-rbac-for-monitors/ -[12]: /mobile -[13]: https://apps.apple.com/app/datadog/id1391380318 -[14]: https://play.google.com/store/apps/details?id=com.datadog.app -[15]: /service_management/service_level_objectives/#saved-views -[16]: /account_management/teams/#associate-resources-with-team-handles -[17]: /api/latest/events/ -[18]: /dashboards/guide/slo_data_source/ -[19]: /dashboards/widgets/slo/ -[20]: /dashboards/widgets/slo_list/ -[21]: /monitors/types/event/ -[22]: https://icalendar.org/iCalendar-RFC-5545/3-8-5-3-recurrence-rule.html -[23]: /api/latest/service-level-objective-corrections/ -[24]: https://registry.terraform.io/providers/DataDog/datadog/latest/docs/resources/slo_correction +[1]: /service_management/service_level_objectives/guide/slo_types_comparison/ +[2]: https://app.datadoghq.com/slo +[3]: /service_management/service_level_objectives/metric/ +[4]: /service_management/service_level_objectives/monitor/ +[5]: /service_management/service_level_objectives/time_slice/ +[6]: /monitors/types/metric/?tab=threshold#alert-grouping +[7]: /service_management/service_level_objectives/metric/#define-queries +[8]: 
/service_management/service_level_objectives/monitor/#set-your-slo-targets +[9]: /service_management/service_level_objectives/metric/#set-your-slo-targets +[10]: /account_management/rbac/ +[11]: /account_management/rbac/permissions/#service-level-objectives/ +[12]: /account_management/rbac/permissions/#monitors +[13]: /monitors/guide/how-to-set-up-rbac-for-monitors/ +[14]: /mobile +[15]: https://apps.apple.com/app/datadog/id1391380318 +[16]: https://play.google.com/store/apps/details?id=com.datadog.app +[17]: /service_management/service_level_objectives/#saved-views +[18]: /account_management/teams/#associate-resources-with-team-handles +[19]: /api/latest/events/ +[20]: /dashboards/guide/slo_data_source/ +[21]: /dashboards/widgets/slo/ +[22]: /dashboards/widgets/slo_list/ +[23]: /monitors/types/event/ +[24]: https://icalendar.org/iCalendar-RFC-5545/3-8-5-3-recurrence-rule.html +[25]: /api/latest/service-level-objective-corrections/ +[26]: https://registry.terraform.io/providers/DataDog/datadog/latest/docs/resources/slo_correction diff --git a/content/en/service_management/service_level_objectives/guide/_index.md b/content/en/service_management/service_level_objectives/guide/_index.md index ed279ffb9bf59..d56df3e1d09c5 100644 --- a/content/en/service_management/service_level_objectives/guide/_index.md +++ b/content/en/service_management/service_level_objectives/guide/_index.md @@ -7,6 +7,7 @@ disable_toc: true {{< whatsnext desc="General guides:">}} {{< nextlink href="/service_management/service_level_objectives/guide/slo-checklist" >}}SLO Checklist{{< /nextlink >}} + {{< nextlink href="/service_management/service_level_objectives/guide/slo_types_comparison" >}}SLO Type Comparison{{< /nextlink >}} {{< /whatsnext >}} {{< whatsnext desc="Dashboard guides:">}} diff --git a/content/en/service_management/service_level_objectives/guide/slo_types_comparison.md b/content/en/service_management/service_level_objectives/guide/slo_types_comparison.md new file mode 100644 index 
0000000000000..bc7ed3112d9e1 --- /dev/null +++ b/content/en/service_management/service_level_objectives/guide/slo_types_comparison.md @@ -0,0 +1,53 @@ +--- +title: SLO Type Comparison +kind: Guide +further_reading: +- link: "/service_management/service_level_objectives/" + tag: "Documentation" + text: "Overview of Service Level Objectives" +- link: "/service_management/service_level_objectives/metric/" + tag: "Documentation" + text: "Metric-based SLOs" +- link: "/service_management/service_level_objectives/monitor/" + tag: "Documentation" + text: "Monitor-based SLOs" +- link: "/service_management/service_level_objectives/time_slice/" + tag: "Documentation" + text: "Time Slice SLOs" +--- + +## Overview + +When creating SLOs, you can choose from the following types: +- **Metric-based SLOs**: Use when you want the SLI calculation to be count-based. The SLI is calculated as the sum of good events divided by the sum of total events. +- **Monitor-based SLOs**: Use when you want the SLI calculation to be time-based. The SLI is based on the monitor's uptime. Monitor-based SLOs must be based on a new or existing Datadog monitor, and any adjustments must be made to the underlying monitor (they cannot be made through SLO creation). +- **Time Slice SLOs**: Use when you want the SLI calculation to be time-based. The SLI is based on your custom uptime definition (the amount of time your system exhibits good behavior divided by the total time). Time Slice SLOs do not require a Datadog monitor; you can try out different metric filters and thresholds and instantly explore downtime during SLO creation. 
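The difference between the count-based and time-based SLI calculations listed above can be illustrated with a minimal Python sketch. The function names and sample numbers are hypothetical and chosen only for illustration; this is not Datadog's implementation, and raising an error on an empty denominator is an assumption of the sketch.

```python
def count_based_sli(good_events: float, total_events: float) -> float:
    """Count-based SLI (metric-based SLOs): good events / total events, in percent."""
    if total_events <= 0:
        # Assumption for this sketch: an empty window is an error, not 100%.
        raise ValueError("total_events must be positive")
    return 100.0 * good_events / total_events


def time_based_sli(uptime_minutes: float, total_minutes: float) -> float:
    """Time-based SLI (monitor-based and Time Slice SLOs): uptime / total time, in percent."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * uptime_minutes / total_minutes


# Hypothetical sample values: 99,950 good requests out of 100,000,
# and 715 minutes of uptime in a 720-minute (12-hour) period.
print(count_based_sli(good_events=99_950, total_events=100_000))
print(time_based_sli(uptime_minutes=715, total_minutes=720))
```

A count-based SLI weights every event equally, so a burst of failures during peak traffic costs more than the same duration of failures at a quiet time. A time-based SLI weights every minute equally, regardless of traffic volume.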
+ +## Comparison chart + +| | **Metric-based SLO** | **Monitor-based SLO** | **Time Slice SLO** | +|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------| +| **Supported data types** | Metrics with type of count, rate, or distribution | Metric Monitor types, Synthetic Monitors, and Service Checks | All metric types (including gauge metrics) | +| **Functionality for SLO with Groups** | SLO calculated based on all groups

Can view all groups in SLO side panel, up to 20 groups in SLO summary widget | Supported for SLOs with a single multi alert Monitor

**Option 1:** SLO calculated based on all groups (can view 5 worst groups in SLO side panel and SLO summary widget)
**Option 2:** SLO calculated based on up to 20 selected groups (can view all selected groups in SLO side panel and SLO summary widget) | SLO calculated based on all groups

Can view all groups in SLO side panel, up to 20 groups in SLO summary widget | +| **SLO details side panel (up to 90 days of historical data)** | Can set custom time windows to view SLO info | Cannot set custom time windows to view SLO info (can view 7, 30, or 90 day history) | Can set custom time windows to view SLO info | +| **SLO alerting ([Error Budget][1] or [Burn Rate][2] Alerts)** | Available | Available for SLOs based on Metric Monitor types only (not available for Synthetic Monitors or Service Checks) | Not available | +| [**SLO Status Corrections**][3] | Correction periods are ignored from SLO status calculation | Correction periods are ignored from SLO status calculation | Correction periods are counted as uptime in SLO status calculation | +| **[SLO Widgets][4] (up to 90 days of historical data)** | Available | Available | Available | +| [**SLO Data Source**][5] | Available (with up to 15 months of historical data) | Not available | Not available | +| **Handling missing data in the SLO calculation** | Missing data is ignored in SLO status and error budget calculations | Missing data is handled based on the [underlying Monitor's configuration][6] | Missing data is treated as uptime in SLO status and error budget calculations | +| **Uptime Calculations** | N/A | Uptime calculations are based on the underlying Monitor

If groups are present, overall uptime requires *all* groups to have uptime| [Uptime][7] is calculated by looking at discrete time chunks, not rolling time windows

If groups are present, overall uptime requires *all* groups to have uptime | +| **Calendar View on SLO Manage Page** | Available | Not available | Available | +| **Public [APIs][8] and Terraform Support** | Available | Available | Not available | + +## Further Reading + +{{< partial name="whats-next/whats-next.html" >}} + +[1]: https://docs.datadoghq.com/service_management/service_level_objectives/error_budget/ +[2]: https://docs.datadoghq.com/service_management/service_level_objectives/burn_rate/ +[3]: https://docs.datadoghq.com/service_management/service_level_objectives/#slo-status-corrections +[4]: https://docs.datadoghq.com/service_management/service_level_objectives/#slo-widgets +[5]: https://docs.datadoghq.com/dashboards/guide/slo_data_source/ +[6]: https://docs.datadoghq.com/service_management/service_level_objectives/monitor/#missing-data +[7]: /service_management/service_level_objectives/time_slice/#uptime-calculations +[8]: https://docs.datadoghq.com/api/latest/service-level-objectives/ \ No newline at end of file diff --git a/content/en/service_management/service_level_objectives/time_slice.md b/content/en/service_management/service_level_objectives/time_slice.md new file mode 100644 index 0000000000000..eac998a74f068 --- /dev/null +++ b/content/en/service_management/service_level_objectives/time_slice.md @@ -0,0 +1,95 @@ +--- +title: Time Slice SLOs +kind: documentation +is_beta: true +further_reading: +- link: "service_management/service_level_objectives/" + tag: "Documentation" + text: "Overview of Service Level Objectives" +--- + +{{< jqmath-vanilla >}} + +## Overview + +Time Slice SLOs allow you to measure reliability using a custom definition of uptime. You define uptime as a condition over a metric timeseries. For example, you can create a latency SLO by defining uptime as whenever p95 latency is less than 1 second. + +Time Slice SLOs are a convenient alternative to Monitor-based SLOs. 
You can create an uptime SLO without going through a monitor, so you don't have to create and maintain both a monitor and an SLO. + +## Create a Time Slice SLO + +You can create a Time Slice SLO in the following ways: +- [Create an SLO from the create page](#create-an-slo-from-the-create-page) +- [Export an existing Monitor-based SLO](#export-an-existing-monitor-slo) +- [Import from a monitor](#import-from-a-monitor) + +### Create an SLO from the create page + +{{< img src="service_management/service_level_objectives/time_slice/create_and_configuration.png" alt="Configuration options to create a Time Slice SLO" style="width:100%;" >}} + +1. Navigate to [**Service Management > SLOs**][1]. +1. Click **+ New SLO** to open the Create SLO page. +1. Select **By Time Slices** to define your SLO measurement. +1. Define your uptime condition by choosing a metric query, comparator, and threshold. For example, define uptime as whenever p95 latency is less than 1 second. Alternatively, you can [import the uptime from a monitor](#import-from-a-monitor). +1. Choose your timeframe and target. +1. Name and tag your SLO. +1. Click **Create**. + +### Export an existing monitor SLO + +
Only single metric monitor SLOs can be exported. Non-metric monitors or multi-monitor SLOs cannot be exported.
+ +Create a Time Slice SLO by exporting an existing Monitor-based SLO. From a monitor SLO, click **Export to Time Slice SLO**. + +{{< img src="service_management/service_level_objectives/time_slice/export_monitor_slo.png" alt="On a Monitor-based SLO detail side panel, the button to Export to Time Slice is highlighted" style="width:90%;" >}} + +### Import from a monitor + +
Only metric monitor SLOs appear in the monitor selection for import.
+ +From the **Create or Edit SLO** page, under **Define your SLI**, click **Import from Monitor** and select from the dropdown or search in the monitor selector. + +**Note**: Time Slice SLOs do not support rolling periods. Rolling periods do not transfer from a monitor query to a Time Slice query. + +{{< img src="service_management/service_level_objectives/time_slice/import_from_monitor.png" alt="Highlighted option to Import From Monitor in the Define your SLI section of an SLO configuration" style="width:90%;" >}} + +## Uptime calculations + +To calculate the uptime percentage for a Time Slice SLO, Datadog cuts the timeseries into equal-duration intervals, called "slices". Each slice is 5 minutes long, and this length is not configurable. The space and time aggregation are determined by the metric query. For more information on time and space aggregation, see the [metrics][2] documentation. + +For each slice, there is a single value for the timeseries, and the uptime condition (such as `value < 1`) is evaluated for each slice. If the condition is met, the slice is considered uptime; otherwise, it is considered downtime. + +{{< img src="service_management/service_level_objectives/time_slice/uptime_latency.png" alt="Time Slice SLO detail panel showing application latency with one uptime violation" style="width:100%;" >}} + +For the above example, exactly one point in the timeseries violates the uptime condition (in this case, the condition is that the p95 latency is less than or equal to 2.5 seconds). Since the total time period shown is 12 hours (720 minutes), and 715 minutes are considered uptime (720 min total time - 5 min downtime), the uptime percentage is approximately 715/720 * 100 = 99.305%. + +### Groups and overall uptime + +Time Slice SLOs allow you to track uptime for individual groups, where groups are defined in the "group by" portion of the metric query. + +When groups are present, uptime is calculated for each individual group. However, overall uptime works differently. 
To match existing monitor SLO functionality, Time Slice SLOs use the same definition of overall uptime. When **all** groups have uptime, it is considered overall uptime. Conversely, if **any** group has downtime, it is considered overall downtime. Overall uptime is therefore never greater than the uptime of any individual group. + +{{< img src="service_management/service_level_objectives/time_slice/uptime_latency_groups.png" alt="Time Slice SLO detail panel of application latency uptime with groups" style="width:100%;" >}} + +In the example above, environment "prod" has 5 minutes of downtime over a 12-hour (720-minute) period, resulting in approximately 715/720 * 100 = 99.305% of uptime. Environment "dev" also had 5 minutes of downtime, resulting in the same uptime. Because the two downtime periods do not overlap, overall downtime (any time either prod or dev had downtime) was 10 minutes, resulting in approximately (720-10)/720 * 100 = 98.611% uptime. + +### Corrections + +Time Slice SLOs count correction periods as uptime in all calculations. Since the total time remains constant, the error budget is always a fixed amount of time as well. This is a significant simplification and improvement over how corrections are handled for monitor-based SLOs. + +For monitor-based SLOs, corrections are periods that are removed from the calculation. If a one-day-long correction is added to a 7-day SLO, the calculation covers 6 days (8,640 minutes) instead of 7 days (10,080 minutes), so 1 hour of downtime counts as approximately 0.7% instead of 0.6%: + +$$ 60/8640 *100 = ~0.7% $$ + +The effects on error budget can be unusual. Removing time from an uptime SLO causes time dilation, where each minute of downtime represents a larger fraction of the total time. + +### Missing data + +In Time Slice SLOs, missing data is always treated as uptime. It is, however, shown in gray on the timeline visualization. 
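The arithmetic from the Uptime calculations, Groups and overall uptime, Corrections, and Missing data sections can be sketched in Python. This is a minimal illustration under stated assumptions, not Datadog's implementation: all function names and sample values are hypothetical, `None` is used here to model a slice with missing data, and the corrections example assumes the downtime falls outside the corrected period.

```python
import operator
from typing import Callable, Dict, List, Optional, Sequence

SLICE_MINUTES = 5  # slice length stated in the text; not configurable


def slice_is_uptime(value: Optional[float],
                    comparator: Callable[[float, float], bool],
                    threshold: float) -> bool:
    """A slice is uptime when data is missing (None) or the uptime condition holds."""
    return value is None or comparator(value, threshold)


def uptime_percentage(flags: Sequence[bool]) -> float:
    """Share of slices flagged as uptime, in percent."""
    if not flags:
        raise ValueError("at least one slice is required")
    return 100.0 * sum(flags) / len(flags)


def overall_flags(groups: Dict[str, List[bool]]) -> List[bool]:
    """A slice is overall uptime only when every group is up in that slice."""
    return [all(column) for column in zip(*groups.values())]


def downtime_share_monitor_based(downtime_min: float, window_min: float,
                                 correction_min: float) -> float:
    """Monitor-based SLO: the corrected period is removed from the total time."""
    return 100.0 * downtime_min / (window_min - correction_min)


def downtime_share_time_slice(downtime_min: float, window_min: float) -> float:
    """Time Slice SLO: corrected time counts as uptime, so total time is unchanged."""
    return 100.0 * downtime_min / window_min


slices_in_12h = 12 * 60 // SLICE_MINUTES  # 144 slices

# Single timeseries: one slice violates the condition `value <= 2.5`,
# and one slice has missing data (treated as uptime).
latency: List[Optional[float]] = [1.0] * slices_in_12h
latency[10] = 3.2
latency[20] = None
single = [slice_is_uptime(v, operator.le, 2.5) for v in latency]
print(uptime_percentage(single))

# Two groups, each with one downtime slice, at different times.
prod = [True] * slices_in_12h
dev = [True] * slices_in_12h
prod[10] = False
dev[50] = False
print(uptime_percentage(overall_flags({"prod": prod, "dev": dev})))

# One hour of downtime in a 7-day window with a one-day correction.
week_min = 7 * 24 * 60
print(downtime_share_monitor_based(60, week_min, 24 * 60))
print(downtime_share_time_slice(60, week_min))
```

Working in whole slices rather than minutes keeps the sketch close to the description above: one downtime slice out of 144 is the same fraction as 5 minutes out of 720.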
+ +## Further Reading + +{{< partial name="whats-next/whats-next.html" >}} + +[1]: https://app.datadoghq.com/slo/manage +[2]: /metrics/#time-and-space-aggregation diff --git a/static/images/service_management/service_level_objectives/calendar-view-slo.png b/static/images/service_management/service_level_objectives/calendar-view-slo.png new file mode 100644 index 0000000000000..007465ffa672a Binary files /dev/null and b/static/images/service_management/service_level_objectives/calendar-view-slo.png differ diff --git a/static/images/service_management/service_level_objectives/time_slice/create_and_configuration.png b/static/images/service_management/service_level_objectives/time_slice/create_and_configuration.png new file mode 100644 index 0000000000000..bca22ee39e23b Binary files /dev/null and b/static/images/service_management/service_level_objectives/time_slice/create_and_configuration.png differ diff --git a/static/images/service_management/service_level_objectives/time_slice/export_monitor_slo.png b/static/images/service_management/service_level_objectives/time_slice/export_monitor_slo.png new file mode 100644 index 0000000000000..3f11506d879f4 Binary files /dev/null and b/static/images/service_management/service_level_objectives/time_slice/export_monitor_slo.png differ diff --git a/static/images/service_management/service_level_objectives/time_slice/import_from_monitor.png b/static/images/service_management/service_level_objectives/time_slice/import_from_monitor.png new file mode 100644 index 0000000000000..cf110ebb5ef87 Binary files /dev/null and b/static/images/service_management/service_level_objectives/time_slice/import_from_monitor.png differ diff --git a/static/images/service_management/service_level_objectives/time_slice/time_slice_detail_panel_group.png b/static/images/service_management/service_level_objectives/time_slice/time_slice_detail_panel_group.png new file mode 100644 index 0000000000000..689974f06d3d2 Binary files /dev/null and 
b/static/images/service_management/service_level_objectives/time_slice/time_slice_detail_panel_group.png differ diff --git a/static/images/service_management/service_level_objectives/time_slice/uptime_latency.png b/static/images/service_management/service_level_objectives/time_slice/uptime_latency.png new file mode 100644 index 0000000000000..2074abf097660 Binary files /dev/null and b/static/images/service_management/service_level_objectives/time_slice/uptime_latency.png differ diff --git a/static/images/service_management/service_level_objectives/time_slice/uptime_latency_groups.png b/static/images/service_management/service_level_objectives/time_slice/uptime_latency_groups.png new file mode 100644 index 0000000000000..dc2ed8a768f01 Binary files /dev/null and b/static/images/service_management/service_level_objectives/time_slice/uptime_latency_groups.png differ