changing the classifier example
bcdurak committed Dec 12, 2024
1 parent 05f77b9 commit f393f96
Showing 6 changed files with 112 additions and 71 deletions.
100 changes: 59 additions & 41 deletions classifier-e2e/README.md
@@ -11,58 +11,76 @@ pinned: false
license: apache-2.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# ZenML MLOps Breast Cancer Classification Demo

# 📜 ZenML Stack Show Case
## 🌍 Project Overview

This project aims to demonstrate the power of stacks. The code in this
project assumes that you have quite a few stacks registered already.
This is a minimalistic MLOps project demonstrating how to put machine learning
workflows into production using ZenML. The project focuses on building a breast
cancer classification model with end-to-end ML pipeline management.

## default
* `default` Orchestrator
* `default` Artifact Store
### Key Features

```commandline
zenml stack set default
python run.py --training-pipeline
- 🔬 Feature engineering pipeline
- 🤖 Model training pipeline
- 🧪 Batch inference pipeline
- 📊 Artifact and model lineage tracking
- 🔗 Integration with Weights & Biases for experiment tracking

## 🚀 Installation

1. Clone the repository
2. Install requirements:
```bash
pip install -r requirements.txt
```
3. Install ZenML integrations:
```bash
zenml integration install sklearn xgboost wandb -y
zenml login
zenml init
```
4. You need to register a stack with a [Weights & Biases Experiment Tracker](https://docs.zenml.io/stack-components/experiment-trackers/wandb).

## 🧠 Project Structure

- `steps/`: Contains individual pipeline steps
- `pipelines/`: Pipeline definitions
- `run.py`: Main script to execute pipelines

## 🔍 Workflow and Execution

First, you need to set your stack:

```bash
zenml stack set stack-with-wandb
```

## local-sagemaker-step-operator-stack
* `default` Orchestrator
* `s3` Artifact Store
* `local` Image Builder
* `aws` Container Registry
* `Sagemaker` Step Operator
### 1. Data Loading and Feature Engineering

```commandline
zenml stack set local-sagemaker-step-operator-stack
zenml integration install aws -y
python run.py --training-pipeline
- Uses the Breast Cancer dataset from scikit-learn
- Splits data into training and inference sets
- Preprocesses data for model training

```bash
python run.py --feature-pipeline
```

## sagemaker-airflow-stack
* `Airflow` Orchestrator
* `s3` Artifact Store
* `local` Image Builder
* `aws` Container Registry
* `Sagemaker` Step Operator

```commandline
zenml stack set sagemaker-airflow-stack
zenml integration install airflow -y
pip install apache-airflow-providers-docker apache-airflow~=2.5.0
zenml stack up
### 2. Model Training

- Supports multiple model types (SGD, XGBoost)
- Evaluates and compares model performance
- Tracks model metrics with Weights & Biases

```bash
python run.py --training-pipeline
```

## sagemaker-stack
* `Sagemaker` Orchestrator
* `s3` Artifact Store
* `local` Image Builder
* `aws` Container Registry
* `Sagemaker` Step Operator
### 3. Batch Inference

```commandline
zenml stack set sagemaker-stack
python run.py --training-pipeline
- Loads production model
- Generates predictions on new data

```bash
python run.py --inference-pipeline
```
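
Review note: the README invokes `run.py` with three flags, but `run.py` itself is not part of this diff. Assuming the flags are plain boolean switches, a minimal dispatcher might look like the sketch below. The names `parse_args` and `select_pipelines` and the returned pipeline labels are hypothetical; the real script may use a different CLI library and would call the actual ZenML pipelines instead of returning labels.

```python
import argparse


def parse_args(argv=None):
    """Parse the three pipeline flags named in the README."""
    parser = argparse.ArgumentParser(description="Run the classifier pipelines.")
    parser.add_argument("--feature-pipeline", action="store_true")
    parser.add_argument("--training-pipeline", action="store_true")
    parser.add_argument("--inference-pipeline", action="store_true")
    return parser.parse_args(argv)


def select_pipelines(args):
    """Return the selected pipeline labels in workflow order.

    The labels are placeholders for this sketch, not names from the repository.
    """
    selected = []
    if args.feature_pipeline:
        selected.append("feature_engineering")
    if args.training_pipeline:
        selected.append("training")
    if args.inference_pipeline:
        selected.append("inference")
    return selected


if __name__ == "__main__":
    for name in select_pipelines(parse_args()):
        print(f"would run: {name}")
```

Keeping the flags independent lets one invocation run several pipelines in order, which matches the README's step-by-step workflow.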
20 changes: 16 additions & 4 deletions classifier-e2e/run_full.ipynb
@@ -941,10 +941,17 @@
" .ravel()\n",
" .tolist(),\n",
" }\n",
" log_model_metadata(metadata={\"wandb_url\": wandb.run.url})\n",
" log_artifact_metadata(\n",
"\n",
" try:\n",
" if get_step_context().model:\n",
" log_metadata(metadata=metadata, infer_model=True)\n",
" except StepContextError:\n",
" # If a model is not configured, it is not able to log metadata\n",
" pass\n",
"\n",
" log_metadata(\n",
" metadata=metadata,\n",
" artifact_name=\"breast_cancer_classifier\",\n",
" artifact_version_id=get_step_context().inputs[\"model\"].id,\n",
" )\n",
"\n",
" wandb.log({\"train_accuracy\": metadata[\"train_accuracy\"]})\n",
@@ -1073,6 +1080,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "1e2130b9",
"metadata": {},
"outputs": [],
"source": [
@@ -1083,6 +1091,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "476cbf5c",
"metadata": {},
"outputs": [],
"source": [
@@ -1091,6 +1100,7 @@
},
{
"cell_type": "markdown",
"id": "75df10e7",
"metadata": {},
"source": [
"The full run has now executed on the local stack, and the experiment is tracked using the Model Control Plane and Weights & Biases.\n",
@@ -1103,6 +1113,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "bfd6345f",
"metadata": {},
"outputs": [],
"source": [
@@ -1113,6 +1124,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "24358031",
"metadata": {},
"outputs": [],
"source": [
@@ -1136,7 +1148,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.11.3"
}
},
"nbformat": 4,
15 changes: 11 additions & 4 deletions classifier-e2e/run_skip_basics.ipynb
@@ -829,10 +829,17 @@
" .ravel()\n",
" .tolist(),\n",
" }\n",
" log_model_metadata(metadata={\"wandb_url\": wandb.run.url})\n",
" log_artifact_metadata(\n",
"\n",
" try:\n",
" if get_step_context().model:\n",
" log_metadata(metadata=metadata, infer_model=True)\n",
" except StepContextError:\n",
" # If a model is not configured, it is not able to log metadata\n",
" pass\n",
"\n",
" log_metadata(\n",
" metadata=metadata,\n",
" artifact_name=\"breast_cancer_classifier\",\n",
" artifact_version_id=get_step_context().inputs[\"model\"].id,\n",
" )\n",
"\n",
" wandb.log({\"train_accuracy\": metadata[\"train_accuracy\"]})\n",
@@ -1211,7 +1218,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.11.3"
}
},
"nbformat": 4,
6 changes: 5 additions & 1 deletion classifier-e2e/steps/deploy_endpoint.py
@@ -7,6 +7,7 @@
 from utils.aws import get_aws_config
 from utils.sagemaker_materializer import SagemakerPredictorMaterializer
 from zenml import ArtifactConfig, get_step_context, log_artifact_metadata, step
+from zenml.enums import ArtifactType


 @step(
@@ -16,7 +17,10 @@
 def deploy_endpoint() -> (
     Annotated[
         Predictor,
-        ArtifactConfig(name="sagemaker_endpoint", is_deployment_artifact=True),
+        ArtifactConfig(
+            name="sagemaker_endpoint",
+            artifact_type=ArtifactType.SERVICE
+        ),
     ]
 ):
     role, session, region = get_aws_config()
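
Review note: this commit replaces the boolean `ArtifactConfig` flags with an explicit artifact type (`is_deployment_artifact=True` becomes `ArtifactType.SERVICE` here, and `is_model_artifact=True` becomes `ArtifactType.MODEL` in `model_trainer.py`). The sketch below restates that mapping as plain Python so other call sites can be checked against it. `ArtifactKind` and `kind_from_legacy_flags` are hypothetical stand-ins written for this note; they are not part of ZenML, and the rule that both flags cannot be set at once is an assumption of the sketch.

```python
from enum import Enum
from typing import Optional


class ArtifactKind(Enum):
    """Hypothetical stand-in for the two enum members used in this commit."""

    MODEL = "model"
    SERVICE = "service"


def kind_from_legacy_flags(
    is_model_artifact: bool = False,
    is_deployment_artifact: bool = False,
) -> Optional[ArtifactKind]:
    """Map the removed boolean flags to the enum member that replaces them."""
    if is_model_artifact and is_deployment_artifact:
        # Assumption of this sketch: the two legacy flags are mutually exclusive.
        raise ValueError("an artifact cannot be both a model and a deployment")
    if is_model_artifact:
        return ArtifactKind.MODEL
    if is_deployment_artifact:
        return ArtifactKind.SERVICE
    return None  # neither flag set: no explicit type in this sketch
```

A single enum-valued argument avoids the ambiguous state in which both booleans are set.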
35 changes: 16 additions & 19 deletions classifier-e2e/steps/model_evaluator.py
@@ -21,12 +21,7 @@
 import wandb
 from sklearn.base import ClassifierMixin
 from sklearn.metrics import confusion_matrix
-from zenml import (
-    get_step_context,
-    log_artifact_metadata,
-    log_model_metadata,
-    step,
-)
+from zenml import step, log_metadata, get_step_context
 from zenml.client import Client
 from zenml.exceptions import StepContextError
 from zenml.logger import get_logger
@@ -60,12 +55,12 @@ def model_evaluator(
     step to force the pipeline run to fail early and all subsequent steps to
     be skipped.
-    This step is parameterized to configure the step independently of the step code,
-    before running it in a pipeline. In this example, the step can be configured
-    to use different values for the acceptable model performance thresholds and
-    to control whether the pipeline run should fail if the model performance
-    does not meet the minimum criteria. See the documentation for more
-    information:
+    This step is parameterized to configure the step independently of the step
+    code, before running it in a pipeline. In this example, the step can be
+    configured to use different values for the acceptable model performance
+    thresholds and to control whether the pipeline run should fail if the model
+    performance does not meet the minimum criteria. See the documentation for
+    more information:
         https://docs.zenml.io/user-guide/advanced-guide/configure-steps-pipelines
@@ -89,17 +84,19 @@ def model_evaluator(
         dataset_tst.drop(columns=[target]),
         dataset_tst[target],
     )
-    logger.info(f"Train accuracy={trn_acc*100:.2f}%")
-    logger.info(f"Test accuracy={tst_acc*100:.2f}%")
+    logger.info(f"Train accuracy={trn_acc * 100:.2f}%")
+    logger.info(f"Test accuracy={tst_acc * 100:.2f}%")

     messages = []
     if trn_acc < min_train_accuracy:
         messages.append(
-            f"Train accuracy {trn_acc*100:.2f}% is below {min_train_accuracy*100:.2f}% !"
+            f"Train accuracy {trn_acc * 100:.2f}% is below "
+            f"{min_train_accuracy * 100:.2f}% !"
         )
     if tst_acc < min_test_accuracy:
         messages.append(
-            f"Test accuracy {tst_acc*100:.2f}% is below {min_test_accuracy*100:.2f}% !"
+            f"Test accuracy {tst_acc * 100:.2f}% is below "
+            f"{min_test_accuracy * 100:.2f}% !"
         )
     else:
         for message in messages:
@@ -115,14 +112,14 @@ def model_evaluator(
     }
     try:
         if get_step_context().model:
-            log_model_metadata(metadata={"wandb_url": wandb.run.url})
+            log_metadata(metadata=metadata, infer_model=True)
     except StepContextError:
         # if model not configured not able to log metadata
         pass

-    log_artifact_metadata(
+    log_metadata(
         metadata=metadata,
-        artifact_name="breast_cancer_classifier",
+        artifact_version_id=get_step_context().inputs["model"].id,
     )

     wandb.log(
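
Review note: the accuracy check reformatted in this file can be exercised without ZenML or Weights & Biases. The sketch below lifts the two threshold comparisons into a standalone function so the message strings are easy to verify. The function name `accuracy_gate_messages` and the default thresholds of `0.0` are assumptions made for this note; what the step does with the messages afterwards (warn or fail) is not modelled here.

```python
from typing import List


def accuracy_gate_messages(
    trn_acc: float,
    tst_acc: float,
    min_train_accuracy: float = 0.0,
    min_test_accuracy: float = 0.0,
) -> List[str]:
    """Collect one message per accuracy that falls below its threshold."""
    messages = []
    if trn_acc < min_train_accuracy:
        messages.append(
            f"Train accuracy {trn_acc * 100:.2f}% is below "
            f"{min_train_accuracy * 100:.2f}% !"
        )
    if tst_acc < min_test_accuracy:
        messages.append(
            f"Test accuracy {tst_acc * 100:.2f}% is below "
            f"{min_test_accuracy * 100:.2f}% !"
        )
    return messages


if __name__ == "__main__":
    # Train accuracy passes, test accuracy misses the 80% threshold.
    for message in accuracy_gate_messages(0.9, 0.7, 0.8, 0.8):
        print(message)
```

Splitting the long f-strings across two adjacent literals, as the commit does, leaves the resulting message text unchanged.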
7 changes: 5 additions & 2 deletions classifier-e2e/steps/model_trainer.py
@@ -13,7 +13,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-#

 from typing import Optional

@@ -23,6 +22,7 @@
 from typing_extensions import Annotated
 from utils.sagemaker_materializer import SagemakerMaterializer
 from zenml import ArtifactConfig, step
+from zenml.enums import ArtifactType
 from zenml.logger import get_logger

 logger = get_logger(__name__)
@@ -39,7 +39,10 @@ def model_trainer(
     target: Optional[str] = "target",
 ) -> Annotated[
     ClassifierMixin,
-    ArtifactConfig(name="breast_cancer_classifier", is_model_artifact=True),
+    ArtifactConfig(
+        name="breast_cancer_classifier",
+        artifact_type=ArtifactType.MODEL,
+    ),
 ]:
     """Configure and train a model on the training dataset.
