feat(snapshots): Support direct cloud storage (#117)
* use recent build to make it work

* feat(snapshots): Support direct cloud storage

* use v1.11.0 version instead

* rename s3 to dir to be more common

* remove unnecessary image change

* add docs for feature

* small updates

* fix issue with dir missing
Pothulapati authored Oct 27, 2023
1 parent 04c3e71 commit 2034628
Showing 4 changed files with 139 additions and 5 deletions.
7 changes: 7 additions & 0 deletions api/v1alpha1/dragonfly_types.go
```diff
@@ -91,6 +91,13 @@ type DragonflySpec struct {
 }
 
 type Snapshot struct {
+	// (Optional) The path to the snapshot directory
+	// This can also be an S3 URI with the prefix `s3://` when
+	// using S3 as the snapshot backend
+	// +optional
+	// +kubebuilder:validation:Optional
+	Dir string `json:"dir,omitempty"`
+
 	// (Optional) Dragonfly snapshot schedule
 	// +optional
 	// +kubebuilder:validation:Optional
```
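For illustration, a hypothetical manifest snippet using the new field together with a PVC might look like this (field names come from the struct above; the path, schedule, and PVC values are assumptions, and an `s3://` URI would switch snapshots to cloud storage instead):

```yaml
spec:
  snapshot:
    # Overrides the default /dragonfly/snapshots mount path.
    dir: /data/snapshots
    cron: "0 * * * *"  # hypothetical hourly schedule
    persistentVolumeClaimSpec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 2Gi
```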
5 changes: 5 additions & 0 deletions config/crd/bases/dragonflydb.io_dragonflies.yaml
```diff
@@ -1093,6 +1093,11 @@ spec:
               cron:
                 description: (Optional) Dragonfly snapshot schedule
                 type: string
+              dir:
+                description: (Optional) The path to the snapshot directory This
+                  can also be an S3 URI with the prefix `s3://` when using S3
+                  as the snapshot backend
+                type: string
               persistentVolumeClaimSpec:
                 description: (Optional) Dragonfly PVC spec
                 properties:
```
15 changes: 10 additions & 5 deletions internal/resources/resources.go
```diff
@@ -165,13 +165,17 @@ func GetDragonflyResources(ctx context.Context, df *resourcesv1.Dragonfly) ([]cl
 	}
 
 	if df.Spec.Snapshot != nil {
-		// err if pvc is not specified while cron is specified
-		if df.Spec.Snapshot.Cron != "" && df.Spec.Snapshot.PersistentVolumeClaimSpec == nil {
+		// err if pvc is not specified & s3 dir is not present while cron is specified
+		if df.Spec.Snapshot.Cron != "" && df.Spec.Snapshot.PersistentVolumeClaimSpec == nil && df.Spec.Snapshot.Dir == "" {
 			return nil, fmt.Errorf("cron specified without a persistent volume claim")
 		}
 
+		dir := "/dragonfly/snapshots"
+		if df.Spec.Snapshot.Dir != "" {
+			dir = df.Spec.Snapshot.Dir
+		}
+
 		if df.Spec.Snapshot.PersistentVolumeClaimSpec != nil {
 			// attach and use the PVC if specified
 			statefulset.Spec.VolumeClaimTemplates = append(statefulset.Spec.VolumeClaimTemplates, corev1.PersistentVolumeClaim{
 				ObjectMeta: metav1.ObjectMeta{
@@ -187,11 +191,12 @@
 
 			statefulset.Spec.Template.Spec.Containers[0].VolumeMounts = append(statefulset.Spec.Template.Spec.Containers[0].VolumeMounts, corev1.VolumeMount{
 				Name:      "df",
-				MountPath: "/dragonfly/snapshots",
+				MountPath: dir,
 			})
 		}
 
-		statefulset.Spec.Template.Spec.Containers[0].Args = append(statefulset.Spec.Template.Spec.Containers[0].Args, "--dir=/dragonfly/snapshots")
+		statefulset.Spec.Template.Spec.Containers[0].Args = append(statefulset.Spec.Template.Spec.Containers[0].Args, fmt.Sprintf("--dir=%s", dir))
+
 		if df.Spec.Snapshot.Cron != "" {
 			statefulset.Spec.Template.Spec.Containers[0].Args = append(statefulset.Spec.Template.Spec.Containers[0].Args, fmt.Sprintf("--snapshot_cron=%s", df.Spec.Snapshot.Cron))
 		}
```
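As a rough sketch of what this change produces: with `snapshot.dir: s3://dragonfly-backup` and a cron schedule set (values assumed for the example), no volume is mounted and the rendered StatefulSet container receives flags along these lines:

```yaml
containers:
  - name: dragonfly  # container name assumed for illustration
    args:
      - --dir=s3://dragonfly-backup
      - --snapshot_cron=*/5 * * * *
```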
117 changes: 117 additions & 0 deletions s3.md
@@ -0,0 +1,117 @@
# Configure Snapshots to S3 with the Dragonfly Operator

In this guide, we will configure Dragonfly instances to use S3 as a backup location through the Dragonfly Operator. While simply having AWS credentials in the environment (from a file or environment variables) is enough to use S3, in this guide we will use [AWS IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) to provide the credentials to the Dragonfly pod.

Because this mechanism uses OIDC (OpenID Connect) to authenticate the service account, we also get credential isolation and automatic credential rotation, which lets us avoid passing long-lived credentials. EKS handles all of this automatically.

## Create an EKS cluster

```bash
eksctl create cluster --name df-storage --region us-east-1
```

## Create and Associate IAM OIDC Provider for your cluster

Following the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html), create and associate an IAM OIDC provider for your cluster. This is required for the steps that follow.
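If you use `eksctl`, one way to do this is the following (a sketch, assuming the `df-storage` cluster created above):

```bash
# Create and associate an IAM OIDC provider for the cluster.
eksctl utils associate-iam-oidc-provider \
  --cluster df-storage \
  --region us-east-1 \
  --approve
```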

## Create an S3 bucket

Next, create an S3 bucket to store the snapshots, either from the AWS console or with the AWS CLI:

```bash
aws s3api create-bucket --bucket dragonfly-backup --region us-east-1
```

## Create a policy to access a specific S3 bucket

We will now create a policy that allows the Dragonfly instance to read from and write to the bucket created in the previous step.

```bash
cat <<EOF > policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::dragonfly-backup/*",
        "arn:aws:s3:::dragonfly-backup"
      ]
    }
  ]
}
EOF
```

```bash
aws iam create-policy --policy-name dragonfly-backup --policy-document file://policy.json
```
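The next step references this policy by ARN, which includes your AWS account number; if you don't have it handy, you can look it up with the AWS CLI:

```bash
# Prints the account ID to substitute for <account-no> below.
aws sts get-caller-identity --query Account --output text
```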

## Associate the policy with a role

Now, we will attach the policy from the previous step to a role. This role will be used by a service account called `dragonfly-backup`, which is also created in this step.

```bash
eksctl create iamserviceaccount --name dragonfly-backup --namespace default --cluster df-storage --role-name dragonfly-backup --attach-policy-arn arn:aws:iam::<account-no>:policy/dragonfly-backup --approve
```
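To confirm the wiring, you can inspect the generated service account; on EKS the role is attached through an annotation (a quick optional check, not part of the original guide):

```bash
# Expect an eks.amazonaws.com/role-arn annotation pointing at the
# dragonfly-backup role.
kubectl get serviceaccount dragonfly-backup -n default -o yaml
```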

## Create a Dragonfly Instance with that service account

Let's create a Dragonfly instance that uses the service account from the previous step, with the snapshot directory set to the S3 bucket we created earlier.

Note that this feature is only available from Dragonfly `v1.12.0`. For now, we will use the weekly release of Dragonfly to get this feature.

```bash
kubectl apply -f - <<EOF
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-sample
spec:
  replicas: 1
  serviceAccountName: dragonfly-backup
  image: <>
  snapshot:
    dir: "s3://dragonfly-backup"
EOF
```
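Because a PVC is no longer required when `dir` is set, you can also combine the S3 location with a schedule for periodic snapshots (an illustrative variant; the cron value is an assumption):

```yaml
spec:
  replicas: 1
  serviceAccountName: dragonfly-backup
  snapshot:
    dir: "s3://dragonfly-backup"
    cron: "*/5 * * * *"  # hypothetical: snapshot every 5 minutes
```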

## Verify that the Dragonfly Instance is running

```bash
kubectl describe dragonfly dragonfly-sample
```
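You can also check the pod directly:

```bash
# The single replica should reach Running and Ready.
kubectl get pod dragonfly-sample-0
```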

## Load Data and Terminate the Dragonfly Instance

Now, we will load some data into the Dragonfly instance and then delete its pod.

```bash
kubectl run -it --rm --restart=Never redis-cli --image=redis:7.0.10 -- redis-cli -h dragonfly-sample.default SET 1 2
```

```bash
kubectl delete pod dragonfly-sample-0
```
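The operator's StatefulSet recreates the pod automatically; optionally, wait for it to come back before verifying:

```bash
# Wait for the replacement pod to become Ready.
kubectl wait --for=condition=ready pod/dragonfly-sample-0 --timeout=120s
```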

## Verification

### Verify that the backups are created in the S3 bucket

```bash
aws s3 ls s3://dragonfly-backup
```

### Verify that the data is automatically restored

```bash
kubectl run -it --rm --restart=Never redis-cli --image=redis:7.0.10 -- redis-cli -h dragonfly-sample.default GET 1
```

The `GET` returns `"2"`, the value we set earlier: the data was automatically restored from the S3 bucket, since the Dragonfly instance is configured to use that bucket as its snapshot location and loads the latest snapshot on startup.
