ZenML ECB Interest Rate Project

This project uses ZenML to create production-ready machine learning pipelines for predicting European Central Bank (ECB) interest rates. It demonstrates best practices for building and iterating on ML pipelines using ZenML's framework and integrations. The dataset is a slightly modified version (column names changed) of the one available on the European Central Bank website.

📊 Project Overview

*Preview of all pipelines*

The project consists of three main pipelines (a minimal code sketch follows the list):

  1. ETL Pipeline (Runs on Airflow)

    • extract_data: Extracts raw ECB interest rate data
    • transform: Transforms and cleans the data
    • Output: ecb_transformed_dataset
  2. Feature Engineering Pipeline (Runs on Airflow)

    • Input: ecb_transformed_dataset
    • augment: Augments the dataset with additional features
    • Output: ecb_augmented_dataset
  3. Model Training Pipeline (Runs on Airflow; the trainer step runs on Vertex AI)

    • Input: ecb_augmented_dataset
    • train_xgboost_model: Trains an XGBoost regression model
    • promote_model: Evaluates and potentially promotes the new model
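
For orientation, here is a minimal sketch of how the ETL pipeline is wired together with ZenML's @step and @pipeline decorators. The real implementations live in pipelines/etl.py and steps/; the step bodies below are illustrative stand-ins, not the project's actual code:

# Minimal sketch of the ETL pipeline shape (illustrative only).
import pandas as pd
from zenml import pipeline, step

@step
def extract_data() -> pd.DataFrame:
    """Load the raw ECB interest rate data (local develop-mode variant)."""
    return pd.read_csv("data/raw_data.csv")

@step
def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw data; the result is tracked as ecb_transformed_dataset."""
    return raw.dropna()

@pipeline
def etl_pipeline():
    raw = extract_data()
    transform(raw)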

🚀 Getting Started

The pipelines run in two modes: develop and production.

  • develop mode is the default: nothing is pushed to or pulled from GCP. Data is written to and read from the local file system. This is good for iterating locally and requires only the data/raw_data.csv file to be present.
  • production mode is what you switch to when you want to run on a stack that contains Airflow or Vertex AI. It reads from a remote storage location and uses BigQuery to persist the results. The mode-to-config mapping is sketched below.
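
Under the hood, each pipeline/mode pair corresponds to a YAML file in configs/ (see the project structure below). A hypothetical sketch of how run.py might resolve and apply it, using ZenML's with_options:

# Hypothetical sketch: resolve the config file for a pipeline/mode pair
# and run the pipeline with it. The import path and helper are assumptions.
from pipelines.etl import etl_pipeline

def run_with_mode(mode: str = "develop") -> None:
    assert mode in ("develop", "production")
    config_path = f"configs/etl_{mode}.yaml"  # e.g. configs/etl_develop.yaml
    etl_pipeline.with_options(config_path=config_path)()

run_with_mode("develop")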
  1. Set up a Python virtual environment:
# Set up a Python virtual environment, if you haven't already
python3 -m venv .venv
source .venv/bin/activate
# Install requirements
pip install -r requirements.txt
# Install integrations
zenml integration install gcp airflow
  2. Configure your stack:

This is very simple using the ZenML GCP Stack Terraform module:

module "zenml_stack" {
  source  = "zenml-io/zenml-stack/gcp"

  project_id = "your-gcp-project-id"
  region = "europe-west1"
  orchestrator = "vertex" # or "skypilot" or "airflow"
  zenml_server_url = "https://your-zenml-server-url.com"
  zenml_api_key = "ZENKEY_1234567890..."
}
output "zenml_stack_id" {
  value = module.zenml_stack.zenml_stack_id
}
output "zenml_stack_name" {
  value = module.zenml_stack.zenml_stack_name
}
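
After terraform init and terraform apply complete, the module registers the stack with your ZenML server; you can then activate it with zenml stack set, passing the name from the zenml_stack_name output.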

To learn more about the Terraform module, read the ZenML documentation or see the Terraform registry.

Looking for a different way to provision or register a stack? Check out the in-browser stack deployment wizard or the stack registration wizard for a shortcut to deploying & registering a cloud stack.

  3. Configure your pipelines:

In order to use production mode, edit the following config files for your dataset:

  • Point the data_path and table_id in the etl_production config to where your dataset lives and where the data should be stored in BigQuery.
  • Point the table_id in the feature_engineering_production config to where the output data should be stored. (How these values reach the steps is sketched below.)
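
For reference, ZenML supplies values like these from the config file as step parameters at run time. A hypothetical sketch of a remote extract step shaped like steps/extract_data_remote.py (the body is an illustrative stand-in):

# Hypothetical sketch: data_path and table_id arrive as step parameters
# from the production YAML config instead of being hard-coded.
import pandas as pd
from zenml import step

@step
def extract_data_remote(data_path: str, table_id: str) -> pd.DataFrame:
    """Read raw data from remote storage; table_id names the BigQuery
    table that downstream steps persist results to."""
    return pd.read_csv(data_path)  # e.g. a gs:// URI (requires gcsfs)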
  4. Run the pipelines:

Here are some examples. In general, run the ETL pipeline first, then feature engineering, then training, as each pipeline consumes the output of the previous one.

# Run the ETL pipeline
python run.py --etl

# Run the ETL pipeline in production mode, i.e., reading from remote storage and writing to BigQuery
python run.py --etl --mode production

# Run the feature engineering pipeline with the latest transformed dataset version
python run.py --feature --mode production

# Run the model training pipeline with the latest augmented dataset version
python run.py --training --mode production

# Run the feature engineering pipeline with a specific transformed dataset version
python run.py --feature --transformed_dataset_version "200"

# Run the model training pipeline with a specific augmented dataset version
python run.py --training --augmented_dataset_version "120"
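
If you're curious how such flags are typically wired up, here is a hypothetical sketch (the actual CLI lives in run.py; check python run.py --help for the real options):

# Hypothetical sketch of the CLI surface shown above (the real run.py
# may be implemented differently, e.g. with click).
import argparse

parser = argparse.ArgumentParser(description="Run the project's ZenML pipelines")
parser.add_argument("--etl", action="store_true", help="Run the ETL pipeline")
parser.add_argument("--feature", action="store_true", help="Run feature engineering")
parser.add_argument("--training", action="store_true", help="Run model training")
parser.add_argument("--mode", choices=["develop", "production"], default="develop")
parser.add_argument("--transformed_dataset_version", default=None)
parser.add_argument("--augmented_dataset_version", default=None)
args = parser.parse_args()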

After running the pipelines, you can check the results in the ZenML UI by following the link printed in the terminal.

Next steps:

  • Explore the CLI options: python run.py --help
  • Review the project structure and code
  • Read the ZenML documentation to learn more about ZenML concepts
  • Start customizing the project for your specific needs

📜 Project Structure

The project follows the recommended ZenML project structure:

├── .assets                  # Asset files for the project
├── .git_cache               # Git cache
├── .zen                     # ZenML configuration files
├── configs                  # Pipeline configuration files
│   ├── etl_develop.yaml
│   ├── etl_production.yaml
│   ├── feature_engineering_develop.yaml
│   ├── feature_engineering_production.yaml
│   ├── training_develop.yaml
│   └── training_production.yaml
├── data                     # Data files
│   └── raw_data.csv
├── materializers            # Custom materializers
│   ├── bq_dataset_materializer.py
│   ├── bq_dataset.py
│   ├── csv_dataset_materializer.py
│   ├── csv_dataset.py
│   └── dataset.py
├── pipelines                # ZenML pipeline implementations
│   ├── etl.py
│   ├── feature_engineering.py
│   └── training.py
├── steps                    # ZenML step implementations
│   ├── extract_data_local.py
│   ├── extract_data_remote.py
│   └── transform.py
├── feature_engineering      # Feature engineering steps
│   ├── augment.py
│   └── promote.py
├── training                 # Training steps
│   └── model_trainer.py
├── tmp                      # Temporary files
├── .dockerignore
├── .gitignore
├── demo.py                  # Demo script
├── LICENSE
├── Makefile
├── README.md                # This file
├── requirements.txt         # Python dependencies
└── run.py                   # CLI tool to run pipelines

Feel free to modify and expand upon this project to suit your specific ECB interest rate prediction needs!