Heart Disease Classification using AzureML pipelines (MLOps)

In this project we'll build a heart disease classifier using AzureML pipelines (model details here) (dataset details here and here). All the pipelines are built with the AzureML SDK and run on an AzureML instance in your Azure account. We'll end up with two pipelines:

Training pipeline
Batch Inference pipeline

Training pipeline

The training pipeline is composed of 6 steps:

Data validation
Data preparation
Model training
Model evaluation
Model comparison
Model registration

The Data Validation step validates the training data, checking for missing columns and incorrect data types. The Data Preparation step preprocesses the data and splits it into train and test datasets. The Model Training step trains the new model. The Model Evaluation step calculates metrics such as F1-score and recall and logs them to the AzureML experiment run. The Model Comparison step compares the new model to the latest model registered on AzureML: if the new model's recall is greater than or equal to the registered model's, execution proceeds; otherwise the step fails and displays an error. The Model Registration step registers the new model on AzureML.
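The comparison rule described above can be sketched in a few lines of plain Python. This is only an illustration of the gate, not the project's actual step: the function names `recall` and `compare_models` are hypothetical, and treating a missing registered model as an automatic pass is an assumption.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def compare_models(new_recall, registered_recall):
    """Raise if the new model's recall is below the registered model's.

    registered_recall is None when no model has been registered yet
    (assumption: the first run always passes).
    """
    if registered_recall is not None and new_recall < registered_recall:
        raise ValueError(
            f"New model recall {new_recall:.3f} is below "
            f"registered model recall {registered_recall:.3f}"
        )
    return True


# Toy labels for illustration only: 3 of the 4 positive cases are found.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
new_recall = recall(y_true, y_pred)
compare_models(new_recall, registered_recall=0.75)  # equal recall passes
```

Raising an exception is one simple way to make a pipeline step fail, which in turn stops the registration step from running.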

Batch Inference pipeline

The batch inference pipeline is composed of 4 steps:

Data validation
Data preparation
Batch inference
Persist inference

The Data Validation step validates the data to be scored, checking for missing columns and incorrect data types. The Data Preparation step downloads the model assets and uses the registered, already-fitted preprocessor to preprocess the data. The Batch Inference step downloads the model and makes predictions. The Persist Inference step saves the outputs to a specified path.
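Both pipelines start with the same kind of check: are all expected columns present, and do the values have the expected types? A minimal sketch of that idea follows, using only the standard library. The schema, the column names and the `validate_rows` helper are hypothetical; the project's real validation step may work differently, for example on a DataFrame.

```python
# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"age": int, "sex": int, "chol": int, "oldpeak": float}


def validate_rows(rows, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, row in enumerate(rows):
        missing = [col for col in schema if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                problems.append(
                    f"row {i}: column '{col}' expected {expected.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return problems


# Illustrative rows only, not taken from the real dataset.
rows = [
    {"age": 63, "sex": 1, "chol": 233, "oldpeak": 2.3},
    {"age": 41, "sex": 0, "chol": "204"},  # wrong type and a missing column
]
for problem in validate_rows(rows, EXPECTED_SCHEMA):
    print(problem)
```

Collecting every problem before failing, rather than stopping at the first one, tends to make a failed validation run easier to diagnose.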

Set up the environment

Run the script local-setup.sh to automate the local setup. Run the following commands from your project folder on Linux, replacing YOUR-AZURE-SUBSCRIPTION with your Azure subscription:

chmod a+x local-setup.sh
./local-setup.sh "YOUR-AZURE-SUBSCRIPTION"

This installs virtualenv, then creates and activates the virtual environment. It also installs the project's required libraries into that environment. Finally, it logs in to your Azure subscription using the Azure CLI.

Next, run the script create-dotenv-file.sh to create the environment variables used in this project and save them to a hidden file named .env. Before running it, replace some values inside the script with the ones for your AzureML instance.

In create-dotenv-file.sh, only the resource group and workspace name must be replaced for this project to work. If you want to use this project as a starting framework for further custom development, you can replace the other values as well, or add new ones to the script.

RESOURCE_GROUP_NAME="YOUR-AZUREML-RG"
AML_WORKSPACE_NAME="YOUR-AZUREML-NAME"
STORAGE_ACCOUNT_NAME="YOUR-STORAGE"
KV_SERVICE_PRINCIPAL_SECRET_NAME="YOUR-KEY-VAULT-NAME"
AML_VNET_NAME=""
AML_SUBNET_NAME=""

Then execute the commands:

chmod a+x create-dotenv-file.sh
./create-dotenv-file.sh
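To show how the values in .env could then reach Python code, here is a minimal standard-library sketch. It assumes the file contains simple KEY="value" lines like the ones above; the exact format written by create-dotenv-file.sh is not shown here, and the helpers `parse_dotenv` and `load_dotenv_file` are hypothetical (the project itself may rely on a library for this).

```python
import os


def parse_dotenv(text):
    """Parse KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_dotenv_file(path=".env"):
    """Read the file at `path` and copy its values into os.environ."""
    with open(path, encoding="utf-8") as handle:
        values = parse_dotenv(handle.read())
    os.environ.update(values)
    return values


sample = 'RESOURCE_GROUP_NAME="YOUR-AZUREML-RG"\nAML_VNET_NAME=""\n'
print(parse_dotenv(sample))
```

After `load_dotenv_file()` runs, a value such as the workspace name would be available through `os.environ["AML_WORKSPACE_NAME"]`.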

If you're using the VSCode IDE, some configurations are ready for you to use, such as the launcher for running and testing local files and for validating, running and deploying AzureML pipelines on Azure.

To run the pipelines without the launcher, make sure you've activated the venv first.

For example:

# activate the virtual environment
. ./venv/bin/activate

# run training pipeline validation
python ~/your-path/heart-disease-classifier/ml/heart_disease/pipeline_train.py --validate

# run training pipeline experiment
python ~/your-path/heart-disease-classifier/ml/heart_disease/pipeline_train.py --run

# deploy training pipeline
python ~/your-path/heart-disease-classifier/ml/heart_disease/pipeline_train.py --deploy

# run inference pipeline validation
python ~/your-path/heart-disease-classifier/ml/heart_disease/pipeline_batch_inference.py --validate

# run inference pipeline experiment
python ~/your-path/heart-disease-classifier/ml/heart_disease/pipeline_batch_inference.py --run

# deploy inference pipeline
python ~/your-path/heart-disease-classifier/ml/heart_disease/pipeline_batch_inference.py --deploy
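
The commands above all follow the same pattern: one script, one of three flags. A pipeline script could expose that interface with argparse roughly as sketched below. This is not the project's actual code: `build_parser` and `selected_action` are hypothetical names, and making the three flags mutually exclusive is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that accepts exactly one of the three action flags."""
    parser = argparse.ArgumentParser(description="Pipeline entry point (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--validate", action="store_true", help="validate the pipeline")
    group.add_argument("--run", action="store_true", help="run the pipeline as an experiment")
    group.add_argument("--deploy", action="store_true", help="deploy the pipeline")
    return parser


def selected_action(argv):
    """Return the name of the single action chosen on the command line."""
    args = build_parser().parse_args(argv)
    for name in ("validate", "run", "deploy"):
        if getattr(args, name):
            return name


print(selected_action(["--validate"]))
```

With this layout, passing no flag or more than one flag makes argparse print a usage message and exit, so the script never starts an ambiguous action.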
