- Build Data Vault powered by dbtVault and Greenplum
- Deploy Infrastructure as Code with Terraform and Yandex.Cloud
- Instant development with GitHub Codespaces
- Assignment checks with GitHub Actions
- Fork this repository
- Configure Developer Environment
- Deploy Infrastructure
- Check database connection
- Populate Data Vault day-by-day
- Build Business Vault on top of Data Vault
- Create PR and pass CI tests
- You have 3 options to set up:

  Start with GitHub Codespaces / Dev Container:
  - Open in GitHub Codespaces
  - Or open in a local Dev Container (VS Code)

  Set up Docker containers manually:

  Install Docker and run commands:
      # build & run container
      docker-compose build
      docker-compose up -d

      # alias docker exec command
      alias dbt="docker-compose exec dev dbt"

Alternatively, install on a local machine:
- Configure the profile manually. By default, dbt expects the `profiles.yml` file to be located in the `~/.dbt/` directory. Use this template and enter your own credentials.
- Install yc CLI
- Install Terraform
- Populate the `.env` file

  `.env` is used to store secrets as environment variables.

  Copy the template file `.env.template` to `.env`:

      cp .env.template .env

  Open the file in an editor and set your own values.

  ❗️ Never commit secrets to git
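
As an illustration of how the values in `.env` can end up as environment variables, here is a minimal sketch. It assumes the file uses plain `KEY=value` shell syntax; the function name `load_env` is hypothetical and not part of this repository.

```bash
# Sketch: export every variable defined in a dotenv-style file.
# Assumes plain KEY=value shell syntax; lines starting with # are ignored by the shell.
load_env() {
  env_file="${1:-.env}"          # hypothetical default path
  case "$env_file" in
    */*) ;;                      # path already contains a directory part
    *) env_file="./$env_file" ;; # make sure `.` does not search PATH
  esac
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a                         # auto-export every variable assigned from here on
  . "$env_file"
  set +a                         # stop auto-exporting
}
```

Calling `load_env .env` in the current shell makes the variables visible to child processes such as `terraform` and `dbt`. Unlike a plain word-split of the file contents, sourcing the file tolerates comments and quoted values that contain spaces.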
- Get familiar with Managed Service for Greenplum
- Install and configure the `yc` CLI: Getting started with the command-line interface by Yandex Cloud

      yc init

- Set environment variables:

      export YC_TOKEN=$(yc iam create-token)
      export YC_CLOUD_ID=$(yc config get cloud-id)
      export YC_FOLDER_ID=$(yc config get folder-id)
      export $(xargs <.env)
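
Before deploying, it can help to confirm that the variables above are actually set, since an empty `YC_TOKEN` or folder ID tends to surface later as a less obvious error. The following is a small sketch; the helper name `require_vars` is hypothetical.

```bash
# Sketch: fail early if any of the named environment variables is empty or unset.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_vars YC_TOKEN YC_CLOUD_ID YC_FOLDER_ID TF_VAR_greenplum_password || exit 1` stops a deployment script before any cloud resources are touched.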
- Deploy using yc CLI

  Add network, Greenplum, egress NAT (S3)
```bash
yc managed-greenplum cluster create gp_datavault \
--network-name default \
--zone-id ru-central1-a \
--environment prestable \
--master-host-count 2 \
--segment-host-count 2 \
--master-config resource-id=s3-c2-m8,disk-size=30,disk-type=network-ssd \
--segment-config resource-id=s3-c2-m8,disk-size=30,disk-type=network-ssd \
--segment-in-host 1 \
--user-name greenplum \
--user-password "${TF_VAR_greenplum_password}" \
--greenplum-version 6.22 \
--assign-public-ip
```
- Deploy using Terraform

      terraform init
      terraform validate
      terraform fmt
      terraform plan
      terraform apply

  Store Terraform output values as environment variables:

      export DBT_HOST=$(terraform output -raw greenplum_host_fqdn)
      export DBT_USER='greenplum'
      export DBT_PASSWORD=${TF_VAR_greenplum_password}

  Or set them manually (example values):

      export DBT_HOST='rc1b-j9injttb11tl6ohd.mdb.yandexcloud.net,rc1b-o0tu24372qtf0qko.mdb.yandexcloud.net'
      export DBT_USER='greenplum'
      export DBT_PASSWORD='greenplum'

[EN] Reference: Getting started with Terraform by Yandex Cloud
[RU] Reference: Начало работы с Terraform (Getting started with Terraform) by Yandex Cloud
! To connect to external sources, set up a NAT gateway for the subnet hosting the Managed Service for Greenplum® cluster: https://cloud.yandex.com/en/docs/vpc/operations/create-nat-gateway
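
To check the database connection with the `DBT_*` variables exported earlier, a sketch along these lines may be enough. It assumes the `psql` client is installed. `DBT_PORT` and `DBT_DBNAME` are hypothetical variable names, and the defaults `6432` and `postgres` are assumptions to verify against your cluster settings.

```bash
# Sketch: build a libpq connection string from the DBT_* variables.
# DBT_PORT and DBT_DBNAME are hypothetical; their defaults are assumptions.
gp_conninfo() {
  printf 'host=%s port=%s dbname=%s user=%s' \
    "$DBT_HOST" "${DBT_PORT:-6432}" "${DBT_DBNAME:-postgres}" "$DBT_USER"
}

# Run a trivial query; needs psql and network access to the cluster.
check_connection() {
  PGPASSWORD="$DBT_PASSWORD" psql "$(gp_conninfo)" -c 'select version();'
}
```

Run `check_connection` after the exports. libpq accepts a comma-separated host list, so a `DBT_HOST` holding several master hosts can be passed through unchanged.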
- First read the official guide:
- Install dependencies
The initial repo is intended to run on Snowflake only.
I have forked it and adapted it to run on Greenplum/PostgreSQL. Check out what has been changed: 47e0261cea67c3284ea409c86dacdc31b1175a39
`packages.yml`:

    packages:
      # - package: Datavault-UK/dbtvault
      #   version: 0.7.3
      - git: "https://github.com/kzzzr/dbtvault.git"
        revision: master
        warn-unpinned: false

Install package:

    dbt deps

- Adapt models to Greenplum/PostgreSQL
Check out the commit history.
- a97a224 - adapt prepared staging layer for greenplum - Artemiy Kozyr (HEAD -> master, kzzzr/master)
- dfc5866 - configure raw layer for greenplum - Artemiy Kozyr
- bba7437 - configure data sources for greenplum - Artemiy Kozyr
- aa25600 - configure package (adapted dbt_vault) for greenplum - Artemiy Kozyr
- eafed95 - configure dbt_project.yml for greenplum - Artemiy Kozyr
- Run models step-by-step
Load one day to Data Vault structures:
dbt run -m tag:raw
dbt run -m tag:stage
dbt run -m tag:hub
dbt run -m tag:link
dbt run -m tag:satellite
dbt run -m tag:t_link
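
The same sequence can be sketched as a loop that stops at the first failing layer. `DBT_CMD` is a hypothetical override, not something this repository defines. It allows a dry run with `echo`, or a run through the container (for example `DBT_CMD="docker-compose exec dev dbt"`), since shell aliases are not expanded inside scripts.

```bash
# Sketch: run the Data Vault layers in dependency order, stopping on the first failure.
# DBT_CMD is a hypothetical override; it defaults to the plain `dbt` executable.
run_layers() {
  for tag in raw stage hub link satellite t_link; do
    # Left unquoted on purpose so a multi-word command is split into words.
    ${DBT_CMD:-dbt} run -m "tag:$tag" || {
      echo "dbt run failed for tag:$tag" >&2
      return 1
    }
  done
}
```

`DBT_CMD=echo run_layers` prints the six commands without executing them, which is a cheap way to confirm the order before touching the database.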
- Load next day
Simulate the next day's load by incrementing the `load_date` variable:

    # dbt_project.yml
    vars:
      load_date: '1992-01-08' # increment by one day '1992-01-09'
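
Instead of editing the file for every day, `load_date` can also be overridden on the command line with dbt's `--vars` option. The sketch below assumes GNU `date` (the `-d` option differs on BSD/macOS). The helper names `next_day` and `load_day` and the `DBT_CMD` override are hypothetical.

```bash
# Sketch: compute the next load date and pass it to dbt as a variable override.
# Requires GNU date; BSD/macOS date uses different flags.
next_day() {
  date -u -d "$1 + 1 day" +%F
}

# DBT_CMD is a hypothetical override; it defaults to the plain `dbt` executable.
# The date is quoted inside the YAML so dbt receives it as a string.
load_day() {
  ${DBT_CMD:-dbt} run --vars "{load_date: '$1'}"
}
```

For example, `load_day "$(next_day 1992-01-08)"` runs the project with `load_date` set to `1992-01-09`. Add `-m tag:...` selectors inside `load_day` if you want to keep loading layer by layer.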