add(clickhouse): local on-disk cache for remote files #609

Merged
14 changes: 13 additions & 1 deletion docs/products/clickhouse/concepts/clickhouse-tiered-storage.md
@@ -10,6 +10,8 @@ The tiered storage feature introduces a method of organizing and storing data in
On top of this default data allocation mechanism, you can control the tier your
data is stored in using custom data retention periods.

## Tiered storage architecture

The tiered storage in Aiven for ClickHouse® consists of the following two
layers:

@@ -18,12 +20,22 @@ layers:
- Object storage - the second tier: Affordable storage with unlimited capacity,
better suited for historical and rarely queried data, but relatively slower

Aiven for ClickHouse's tiered storage supports
[local on-disk cache for remote files](/docs/products/clickhouse/howto/local-cache-tiered-storage),
which is enabled by default. You can
[disable the cache](/docs/products/clickhouse/howto/local-cache-tiered-storage#disable-the-cache)
or
[drop it](/docs/products/clickhouse/howto/local-cache-tiered-storage#free-up-space) to free
up the space it occupies.

## Supported cloud platforms

On the Aiven tenant (in non-[BYOC](/docs/platform/concepts/byoc) environments), Aiven for
ClickHouse tiered storage is supported on the following cloud platforms:

- Microsoft Azure
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Google Cloud

## Why use it

62 changes: 62 additions & 0 deletions docs/products/clickhouse/howto/local-cache-tiered-storage.md
@@ -0,0 +1,62 @@
---
title: Local on-disk cache for remote files in Aiven for ClickHouse®'s tiered storage
sidebar_label: Local on-disk cache for remote files
---

Aiven for ClickHouse®'s tiered storage features a local on-disk cache for remote files, which improves query performance and reduces latency.

Aiven for ClickHouse's tiered storage keeps data in local storage and in remote storage.
When data is read from remote storage, Aiven for ClickHouse uses a local on-disk cache to
avoid fetching the same data repeatedly.

## How it works

When a query requires parts of a table stored in the remote tier, Aiven for ClickHouse
fetches them from the remote storage and automatically stores them in a local cache
directory on the disk. For subsequent queries, Aiven for ClickHouse checks the local
cache first:

- If the data is found in the cache, it is read directly from the local disk.
- If the data is not found in the cache, it is fetched from the remote storage and stored
in the local cache.
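
The lookup order described above can be sketched as a toy read-through cache. This is an
illustrative model only, not how ClickHouse is implemented: the `ReadThroughCache` class
and the dictionaries standing in for remote storage and for the cache directory are
hypothetical names introduced for this example.

```python
class ReadThroughCache:
    """Toy model of a local cache in front of remote storage (illustrative only)."""

    def __init__(self, remote):
        self.remote = remote      # dict standing in for remote object storage
        self.local = {}           # dict standing in for the local cache directory
        self.remote_fetches = 0   # counts how often remote storage is read

    def read(self, part):
        # Cache hit: serve the part from local disk without a remote fetch.
        if part in self.local:
            return self.local[part]
        # Cache miss: fetch from remote storage and keep a local copy.
        self.remote_fetches += 1
        data = self.remote[part]
        self.local[part] = data
        return data

    def drop(self):
        # Free the local space; later reads fall back to remote storage.
        self.local.clear()


cache = ReadThroughCache(remote={"part_1": b"rows"})
cache.read("part_1")  # miss: fetched from remote storage, then cached
cache.read("part_1")  # hit: read from the local cache
print(cache.remote_fetches)  # 1
```

In this model, dropping the cache only costs one extra remote fetch per part on the next
read, which is the trade-off to weigh before freeing up the space.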

Local on-disk cache for remote files is enabled by default for Aiven for ClickHouse's
tiered storage. You can
[disable the cache](/docs/products/clickhouse/howto/local-cache-tiered-storage#disable-the-cache)
or
[drop it](/docs/products/clickhouse/howto/local-cache-tiered-storage#free-up-space) to
free up the space it occupies.

## Prerequisites

- At least one Aiven for ClickHouse service using tiered storage
- A command-line SQL client, for example the
  [ClickHouse client](/docs/products/clickhouse/howto/connect-with-clickhouse-cli),
  installed

## Disable the cache

To disable the local cache for a single query, set `enable_filesystem_cache` to `false`
for that query.
To do so, append `SETTINGS enable_filesystem_cache = false` to the query when running it
in an SQL client (for example, the
[ClickHouse client](/docs/products/clickhouse/howto/connect-with-clickhouse-cli)):

```sql
SELECT 1
SETTINGS enable_filesystem_cache = false;
```
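
If queries are built in application code, the clause can be appended programmatically.
The following is a minimal sketch, assuming the query has no `SETTINGS` clause of its own;
`with_cache_disabled` is a hypothetical helper name, not part of any ClickHouse client
library.

```python
def with_cache_disabled(query: str) -> str:
    """Return `query` with the cache-disabling SETTINGS clause appended.

    Assumption: `query` does not already contain a SETTINGS clause.
    """
    # Drop surrounding whitespace and the terminating semicolon, if any.
    base = query.strip().rstrip(";").rstrip()
    return f"{base}\nSETTINGS enable_filesystem_cache = false;"


print(with_cache_disabled("SELECT 1;"))
```

The helper places the clause before the terminating semicolon, matching the statement
shown above.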

## Free up space

To drop the local cache and free up the space it occupies, run the following statement:

```sql
SYSTEM DROP FILESYSTEM CACHE 'remote_cache';
```

## Related pages

- [About tiered storage in Aiven for ClickHouse](/docs/products/clickhouse/concepts/clickhouse-tiered-storage)
- [Check data distribution between SSD and object storage](/docs/products/clickhouse/howto/check-data-tiered-storage)
- [Configure data retention thresholds for tiered storage](/docs/products/clickhouse/howto/configure-tiered-storage)
1 change: 1 addition & 0 deletions sidebars.ts
@@ -1301,6 +1301,7 @@ const sidebars: SidebarsConfig = {
'products/clickhouse/howto/configure-tiered-storage',
'products/clickhouse/howto/check-data-tiered-storage',
'products/clickhouse/howto/transfer-data-tiered-storage',
'products/clickhouse/howto/local-cache-tiered-storage',
],
},
],