Cain

Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.

Cain supports the following cloud storage services:

  • AWS S3
  • Minio S3
  • Azure Blob Storage
  • Google Cloud Storage

Cain is now an official part of the Helm incubator/cassandra chart!

Install

Prerequisites

  1. git
  2. dep

From a release

Download the latest release from the Releases page, or use the Docker image.

From source

mkdir -p $GOPATH/src/github.com/maorfr && cd $_
git clone https://github.com/maorfr/cain.git && cd cain
make

Commands

Backup Cassandra cluster to cloud storage

Cain performs a backup in the following way:

  1. Back up the keyspace schema (using cqlsh).
  2. Take a snapshot using nodetool snapshot. This creates a snapshot of the keyspace in every Cassandra pod in the given namespace that matches the selector.
  3. Copy the files in parallel to cloud storage using Skbn. The files are copied to the specified dst, under namespace/<cassandraClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
  4. Clear all snapshots.
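
The destination layout from step 3 can be sketched as a small shell function. This is an illustration only: `backup_prefix` is a hypothetical helper, not part of Cain, the schema hash `1a2b3c` is a made-up placeholder, and the sketch assumes each path segment is the plain value (namespace name, cluster name, keyspace name, schema checksum, tag).

```shell
# Hypothetical helper (not part of Cain): print the prefix under which
# a backup would land, following the layout described in step 3.
# Usage: backup_prefix DST NAMESPACE CLUSTER KEYSPACE SCHEMA_HASH TAG
backup_prefix() {
    # "${1%/}" drops one trailing slash from DST to avoid "//" in the result.
    printf '%s/%s/%s/%s/%s/%s/\n' "${1%/}" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values; "1a2b3c" stands in for a real schema checksum.
backup_prefix s3://db-backup/cassandra default ring01 keyspace 1a2b3c 20180903091624
```

With these placeholder values the function prints `s3://db-backup/cassandra/default/ring01/keyspace/1a2b3c/20180903091624/`.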

Usage

$ cain backup --help
backup cassandra cluster to cloud storage

Usage:
  cain backup [flags]

Flags:
  -b, --buffer-size float           in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
      --cassandra-data-dir string   cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
  -c, --container string            container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
      --dst string                  destination to backup to. Example: s3://bucket/cassandra. Overrides $CAIN_DST
  -k, --keyspace string             keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string            namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -p, --parallel int                number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
  -l, --selector string             selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
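
Because --buffer-size is allocated per file and --parallel sets how many files are copied at once, the two flags together bound the buffer memory in use. The sketch below is a rough estimate only. It assumes one buffer per in-flight file and ignores all other memory overhead; `buffer_memory_mb` is a hypothetical helper, not a Cain command.

```shell
# Rough estimate (assumption: one buffer per in-flight file, nothing else counted).
# Usage: buffer_memory_mb PARALLEL BUFFER_SIZE_MB
buffer_memory_mb() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b }'
}

# Four parallel copies with the default 6.75 MB buffer.
buffer_memory_mb 4 6.75
```

This prints `27.00`. With `-p 0` (full parallelism) the estimate would instead scale with the number of files being copied, so pass that count as the first argument.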

Examples

Backup to AWS S3

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra

Backup to Azure Blob Storage

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst abs://my-account/db-backup-container/cassandra

Backup to Google Cloud Storage

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst gcs://db-backup/cassandra

Restore Cassandra backup from cloud storage

Cain performs a restore in the following way:

  1. Restore schema if schema is specified.
  2. Truncate all tables in keyspace.
  3. Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/) - restore is only possible for the same keyspace schema.
  4. Load new data using nodetool refresh.
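
The --src value for a restore relates to the --dst used for the backup: judging by the examples in this README, src is dst followed by the namespace and the cluster name. The sketch below shows that relationship and the prefix from step 3. Both functions are hypothetical helpers, not part of Cain, and the schema hash `1a2b3c` is a made-up placeholder.

```shell
# Hypothetical helpers (not part of Cain).

# Usage: restore_src DST NAMESPACE CLUSTER
# Builds a --src value from the --dst that was used for the backup.
restore_src() {
    printf '%s/%s/%s\n' "${1%/}" "$2" "$3"
}

# Usage: restore_prefix SRC KEYSPACE SCHEMA_HASH TAG
# Builds the prefix under SRC that step 3 reads files from.
restore_prefix() {
    printf '%s/%s/%s/%s/\n' "${1%/}" "$2" "$3" "$4"
}

src=$(restore_src s3://db-backup/cassandra default ring01)
echo "$src"
# "1a2b3c" is a placeholder for the keyspace schema checksum.
restore_prefix "$src" keyspace 1a2b3c 20180903091624
```

The first line printed is `s3://db-backup/cassandra/default/ring01`, which matches the --src used in the restore examples.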

Usage

$ cain restore --help
restore cassandra cluster from cloud storage

Usage:
  cain restore [flags]

Flags:
  -b, --buffer-size float           in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
      --cassandra-data-dir string   cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
  -c, --container string            container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
  -k, --keyspace string             keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string            namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -p, --parallel int                number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
  -s, --schema string               schema version to restore (optional). Overrides $CAIN_SCHEMA
  -l, --selector string             selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
      --src string                  source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name. Overrides $CAIN_SRC
  -t, --tag string                  tag to restore. Overrides $CAIN_TAG
      --user-group string           user and group who should own restored files. Overrides $CAIN_USER_GROUP (default "cassandra:cassandra")

Examples

Restore from S3

cain restore \
    --src s3://db-backup/cassandra/default/ring01 \
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Restore from Azure Blob Storage

cain restore \
    --src abs://my-account/db-backup-container/cassandra/default/ring01 \
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Restore from Google Cloud Storage

cain restore \
    --src gcs://db-backup/cassandra/default/ring01 \
    -n default \
    -k keyspace \
    -l release=cassandra \
    -t 20180903091624

Describe keyspace schema

Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).

Usage

$ cain schema --help
get schema of cassandra cluster

Usage:
  cain schema [flags]

Flags:
  -c, --container string   container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
  -k, --keyspace string    keyspace to act on. Overrides $CAIN_KEYSPACE
  -n, --namespace string   namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
  -l, --selector string    selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
      --sum                print only checksum. Overrides $CAIN_SUM

Examples

cain schema \
    -n default \
    -l release=cassandra \
    -k keyspace

cain schema \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --sum

Environment variables support

Cain commands accept environment variables in place of flags. For example, the backup command can be run with flags, as in the earlier example:

cain backup \
    -n default \
    -l release=cassandra \
    -k keyspace \
    --dst s3://db-backup/cassandra

Alternatively, set the corresponding environment variables (CAIN_ followed by the long flag name in upper case, with _ instead of -):

export CAIN_NAMESPACE=default
export CAIN_SELECTOR=release=cassandra
export CAIN_KEYSPACE=keyspace
export CAIN_DST=s3://db-backup/cassandra

cain backup
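
The naming rule can be sketched as a shell function that maps a long flag to its variable name. `flag_to_env` is a hypothetical helper written for illustration, not something Cain provides. The expected names are taken from the "Overrides $CAIN_..." notes in the help output above.

```shell
# Hypothetical helper (not part of Cain): map a long flag to its
# environment variable name, e.g. --buffer-size -> CAIN_BUFFER_SIZE.
flag_to_env() {
    # Drop the leading "--", then upper-case letters and turn "-" into "_".
    printf 'CAIN_%s\n' "$(printf '%s' "${1#--}" | LC_ALL=C tr 'a-z-' 'A-Z_')"
}

for flag in --namespace --cassandra-data-dir --user-group; do
    flag_to_env "$flag"
done
```

The loop prints `CAIN_NAMESPACE`, `CAIN_CASSANDRA_DATA_DIR`, and `CAIN_USER_GROUP`, one per line.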

Support for additional storage services

Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.

Skbn compatibility matrix

Cain version   Skbn version
0.6.0          0.5.0
0.5.1          0.4.2
0.5.0          0.4.1
0.4.2          0.4.1
0.4.1          0.4.1
0.4.0          0.4.0
0.3.0          0.3.0
0.2.0          0.2.0
0.1.0          0.1.1

Credentials

Kubernetes

Cain tries to get credentials in the following order:

  1. If the KUBECONFIG environment variable is set, Cain uses the current context from that config file.
  2. Otherwise, if ~/.kube/config exists, Cain uses the current context from that config file with an out-of-cluster client configuration.
  3. If ~/.kube/config does not exist, Cain assumes it is running inside a pod and uses an in-cluster client configuration.
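
To check which of these cases applies in a given shell, the lookup order can be mirrored in a few lines. This is a sketch of the order as documented above, not Cain's actual code. `kube_config_source` and the labels it prints are hypothetical.

```shell
# Hypothetical helper (not part of Cain): report which credentials source
# the documented lookup order would select in the current environment.
kube_config_source() {
    if [ -n "${KUBECONFIG:-}" ]; then
        # Case 1: KUBECONFIG points at a config file.
        echo "KUBECONFIG"
    elif [ -f "${HOME}/.kube/config" ]; then
        # Case 2: default kubeconfig, out-of-cluster client.
        echo "home-kubeconfig"
    else
        # Case 3: no kubeconfig found, in-cluster client.
        echo "in-cluster"
    fi
}

kube_config_source
```

Note that the sketch treats an empty KUBECONFIG as unset, which is an assumption.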

AWS

Cain uses the default AWS credentials chain.

Azure Blob Storage

Cain uses AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.
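
A quick pre-flight check before a backup to an abs:// destination might look like the following. `check_azure_env` is a hypothetical helper, not part of Cain. It only verifies that both variables are non-empty and does not validate the credentials themselves.

```shell
# Hypothetical helper (not part of Cain): fail early if either Azure
# variable named above is missing from the environment.
check_azure_env() {
    _missing=""
    [ -n "${AZURE_STORAGE_ACCOUNT:-}" ] || _missing="$_missing AZURE_STORAGE_ACCOUNT"
    [ -n "${AZURE_STORAGE_ACCESS_KEY:-}" ] || _missing="$_missing AZURE_STORAGE_ACCESS_KEY"
    if [ -n "$_missing" ]; then
        # Report the missing names on stderr and signal failure.
        echo "missing environment variables:$_missing" >&2
        return 1
    fi
}

if check_azure_env; then
    echo "Azure credential variables are set"
fi
```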

Google Cloud Storage

Cain uses Google Application Default Credentials. It first looks for the GOOGLE_APPLICATION_CREDENTIALS environment variable. If that is not defined, it falls back to the default service account, and returns an error if none is configured.

Examples

  1. Helm example
  2. Code example