[DOCS-6680] Add keyword dictionary and priority level #20991

Merged
merged 13 commits into from
Dec 23, 2023
63 changes: 34 additions & 29 deletions content/en/sensitive_data_scanner.md
@@ -28,49 +28,59 @@

Often, businesses are required to identify, remediate, and prevent the exposure of such sensitive data within their logs due to organizational policies, compliance requirements, industry regulations, and privacy concerns. This is especially true within industries such as banking, financial services, healthcare, and insurance.

## Sensitive Data Scanner

Sensitive Data Scanner is a stream-based, pattern matching service that you can use to identify, tag, and optionally redact or hash sensitive data. Security and compliance teams can implement Sensitive Data Scanner as a new line of defense, helping prevent sensitive data leaks and limiting non-compliance risks.

Sensitive Data Scanner can be found under [Organization Settings][1].

{{< img src="sensitive_data_scanner/sds_main_28_03_23.png" alt="Sensitive Data Scanner in Organization Settings" style="width:90%;">}}

### Setup

- **Define Scanning Groups:** A scanning group determines what data to scan. It consists of a query filter and a set of toggles to enable scanning for Logs, APM, RUM, and/or Events. See the [Log Search Syntax][2] documentation to learn more about query filters.
- For Terraform, see the [datadog_sensitive_data_scanner_group][3] resource.
- **Define Scanning Rules:** A scanning rule determines what sensitive information to match within the data. Within a scanning group, add predefined scanning rules from Datadog's Scanning Rule Library or create your own rules from scratch to scan using custom regex patterns.
- For Terraform, see the [datadog_sensitive_data_scanner_rule][4] resource.

Sensitive Data Scanner supports Perl Compatible RegEx (PCRE), but the following patterns are not supported:
- Backreferences and capturing sub-expressions (lookarounds)
- Arbitrary zero-width assertions
- Subroutine references and recursive patterns
- Conditional patterns
- Backtracking control verbs
- The \C "single-byte" directive (which breaks UTF-8 sequences)
- The \R newline match
- The \K start of match reset directive
- Callouts and embedded code
- Atomic grouping and possessive quantifiers
## Setup

1. **Define Scanning Groups:** A scanning group determines what data to scan. It consists of a query filter and a set of toggles to enable scanning for Logs, APM, RUM, and Events. See the [Log Search Syntax][2] documentation to learn more about query filters.

- For Terraform, see the [datadog_sensitive_data_scanner_group][3] resource.
2. **Define Scanning Rules:** A scanning rule determines what sensitive information to match within the data. Within a scanning group, add predefined scanning rules from Datadog's Scanning Rule Library or create your own rules from scratch to scan using custom regex patterns.
- For Terraform, see the [datadog_sensitive_data_scanner_rule][4] resource.

**Note:**
- Any rules that you add or update only affect data coming into Datadog after the rule was defined.
- Sensitive Data Scanner does not affect any rules you define on the Datadog Agent directly.
- To turn off Sensitive Data Scanner entirely, set the toggle to **off** for each Scanning Group and Scanning Rule so that they are disabled.

### Custom Scanning Rules
### Define Scanning Rules


#### Out-of-the-box Scanning Rules


The Scanning Rule Library contains an ever-growing collection of predefined rules maintained by Datadog. These rules detect common patterns such as email addresses, credit card numbers, API keys, and authorization tokens.

{{< img src="sensitive_data_scanner/sds-library-28-03-23.png" alt="Scanning Rule Library" style="width:90%;">}}

- **Define pattern:** Specify the regex pattern to be used for matching against events. Test with sample data to verify that your regex pattern is valid.
#### Custom Scanning Rules


- Define custom scanning rules to scan for sensitive data specific to your business.
- **Define match conditions:** Specify the regex pattern to be used for matching against events. Test with sample data to verify that your regex pattern is valid.
- Sensitive Data Scanner supports Perl Compatible RegEx (PCRE), but the following patterns are not supported:
- Backreferences and capturing sub-expressions (lookarounds)
- Arbitrary zero-width assertions
- Subroutine references and recursive patterns
- Conditional patterns
- Backtracking control verbs
- The \C "single-byte" directive (which breaks UTF-8 sequences)
- The \R newline match
- The \K start of match reset directive
- Callouts and embedded code
- Atomic grouping and possessive quantifiers
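
As a rough illustration of testing a custom pattern with sample data, the sketch below runs a hypothetical Visa-like pattern against a sample string and does a textual check for a few of the unsupported constructs listed above. It uses Python's `re` module, which is neither PCRE nor the engine Sensitive Data Scanner uses, so treat it as a local pre-check only. The pattern, function names, and marker expressions are assumptions made for this example, and the check is a heuristic that can produce false positives (for example, on escaped quantifier characters).

```python
import re

# Hypothetical custom pattern: a 16-digit number starting with 4 (Visa-like).
CUSTOM_PATTERN = r"4[0-9]{15}"

# Heuristic markers for some of the constructs listed as unsupported.
# This is a plain textual check, not a regex parser.
UNSUPPORTED_MARKERS = {
    "lookaround": re.compile(r"\(\?<?[=!]"),          # (?= (?! (?<= (?<!
    "backreference": re.compile(r"\\[1-9]"),          # \1 ... \9
    "atomic group": re.compile(r"\(\?>"),             # (?>...)
    r"\K, \R, or \C": re.compile(r"\\[KRC]"),         # \K, \R, \C directives
    "possessive quantifier": re.compile(r"[*+?}]\+"), # *+ ++ ?+ }+
}


def find_unsupported(pattern: str) -> list[str]:
    """Return the names of unsupported constructs that appear in the pattern."""
    return [name for name, marker in UNSUPPORTED_MARKERS.items()
            if marker.search(pattern)]


def matches_sample(pattern: str, sample: str) -> list[str]:
    """Return every substring of the sample that the pattern matches."""
    return [m.group(0) for m in re.finditer(pattern, sample)]
```

For example, `find_unsupported(r"(?<=card )\d+")` flags the lookbehind, while `matches_sample(CUSTOM_PATTERN, "visa card 4111111111111111 on file")` returns the sixteen-digit value.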

#### Define rule target and action

- **Create keyword dictionary:** Add keywords to tune detection accuracy when matching regex conditions. For example, if you are scanning for a sixteen-digit Visa credit card number, you can add keywords like `visa`, `credit`, and `card`, and require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
- **Define scope:** Specify whether you want to scan the entire event or just specific attributes. You can also choose to exclude specific attributes from the scan.
- **Add tags:** Specify the tags you want to associate with events where the values match the specified regex pattern. Datadog recommends using `sensitive_data` and `sensitive_data_category` tags. These tags can then be used in searches, dashboards, and monitors.
- **Process matching values:** Optionally, specify whether you want to redact, partially redact, or hash matching values. When redacting, specify placeholder text to replace the matching values with. When partially redacting, specify the position (start or end) and length (number of characters) to redact within matching values. Redaction, partial redaction, and hashing are all irreversible actions.
- **Add tags:** Specify the tags you want to associate with events where the values match the specified regex pattern. Datadog recommends using `sensitive_data` and `sensitive_data_category` tags. These tags can then be used in searches, dashboards, and monitors.
- **Set priority level:** Set the priority level for a rule based on your business needs.
- **Name the rule:** Provide a human-readable name for the rule.
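
The keyword dictionary step above can be pictured with a small sketch: a regex match only counts when a keyword appears within a fixed number of characters before it. This is an illustration of the idea, not Datadog's implementation. The function name, the case-insensitive comparison, the "any one keyword is enough" rule, and the requirement that the whole keyword fall inside the window are all assumptions made for this example.

```python
import re


def matches_with_keywords(text, pattern, keywords, window=30):
    """Return regex matches that have at least one keyword in the
    `window` characters immediately before the match (assumed semantics)."""
    hits = []
    for m in re.finditer(pattern, text):
        # Slice out the characters that precede the match, up to `window`.
        prefix = text[max(0, m.start() - window):m.start()].lower()
        if any(keyword.lower() in prefix for keyword in keywords):
            hits.append(m.group(0))
    return hits
```

With the keywords `visa`, `credit`, and `card`, the string `"visa card: 4111111111111111"` yields a match, while `"order id 4111111111111111"` does not, which is how a keyword dictionary can reduce false positives on arbitrary sixteen-digit values.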

{{< img src="sensitive_data_scanner/sds_rules_28_03_23.png" alt="A Sensitive Data Scanner custom rule" style="width:90%;">}}
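
To make the three options under **Process matching values** concrete, here is a minimal sketch of redaction, partial redaction, and hashing applied to a matched value. The function names, the `*` mask character, the `[REDACTED]` placeholder, and the choice of SHA-256 are assumptions for illustration; the source does not state which mask or hash algorithm Sensitive Data Scanner uses.

```python
import hashlib


def redact(value, placeholder="[REDACTED]"):
    """Replace the whole matched value with placeholder text."""
    return placeholder


def partially_redact(value, position="start", length=4, mask="*"):
    """Mask `length` characters at the start or end of the matched value."""
    length = min(length, len(value))
    if position == "start":
        return mask * length + value[length:]
    if position == "end":
        return value[:len(value) - length] + mask * length
    raise ValueError("position must be 'start' or 'end'")


def hash_value(value):
    """Replace the matched value with a hex digest (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

In each case the original value cannot be recovered from the output, which is consistent with the note that these actions are irreversible.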

### Redact sensitive data in tags
#### Redact sensitive data in tags

To redact sensitive data contained in tags, you must [remap][5] the tag to an attribute and then redact the attribute. Uncheck `Preserve source attribute` in the remapper processor so that the tag is not preserved during the remapping.

@@ -97,11 +107,6 @@
7. Optionally, add tags.
8. Click **Add Rules**.

### Out-of-the-box Scanning Rules

The Scanning Rule Library contains an ever-growing collection of predefined rules maintained by Datadog for detecting common patterns such as email addresses, credit card numbers, API keys, authorization tokens, and more.
{{< img src="sensitive_data_scanner/sds-library-28-03-23.png" alt="Scanning Rule Library" style="width:90%;">}}

### Permissions

By default, users with the Datadog Admin role can view and define scanning rules. To give other users access, grant read or write permissions for Data Scanner under **Compliance**. See the [Custom RBAC documentation][7] for details on Roles and Permissions.