Using the SageWorks Python API
After onboarding SageWorks to your AWS Account (see AWS Onboarding), you're ready to have your Data Science team start doing science!
Note: There's no special setup for SageWorks itself. If you already have your local/laptop AWS PROFILE/SSO user set up and working, you can skip this. If you haven't, please see our guide AWS SSO Setup.
Note: You only need to do this setup once; you're ready to go from then on.
SageWorks requires Python 3.10 or higher. PyEnv is great, but feel free to use any Python environment (Anaconda, VirtualEnv, PyEnv) you'd like.
pip install sageworks
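To confirm the install worked, you can do a quick check from Python (just a sanity-check sketch using the standard library's importlib.metadata, not a SageWorks command):
from importlib.metadata import version, PackageNotFoundError

# Sanity check only: report the installed sageworks version (or a helpful message)
try:
    print("sageworks version:", version("sageworks"))
except PackageNotFoundError:
    print("sageworks is not installed in this Python environment")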
Note: You may want to put this ENV var in your ~/.bashrc or ~/.zshrc, or in your Windows Environment Variables.
export SAGEWORKS_BUCKET=mycompany-sageworks-bucket        # Linux/macOS (or whatever your bucket is called)
set SAGEWORKS_BUCKET=mycompany-sageworks-bucket           # on Windows (Command Prompt)
$Env:SAGEWORKS_BUCKET = "mycompany-sageworks-bucket"      # on Windows (PowerShell)
For Windows, put these commands in a script, place it in any folder, and add that folder to your system path.
$env:AWS_PROFILE = "my-aws-profile"
$env:SAGEWORKS_BUCKET = "mycompany-sageworks-bucket"
aws sso login --profile my-aws-profile
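Before diving in, a quick sanity check from Python can save some head scratching. This is just an illustrative sketch (it assumes boto3 is available in your environment) that verifies the bucket ENV var and your AWS identity:
import os
import boto3

# Illustrative check only (not part of SageWorks): confirm the bucket ENV var is set
print("SAGEWORKS_BUCKET:", os.environ.get("SAGEWORKS_BUCKET", "NOT SET"))

# This call fails if AWS_PROFILE is wrong or your SSO token has expired
identity = boto3.client("sts").get_caller_identity()
print("AWS Account:", identity["Account"])
print("AWS Identity:", identity["Arn"])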
- Make sure your AWS_PROFILE is set correctly
- Renew your SSO Token
- Try it out
$ ipython
In [1]: from sageworks.views.artifacts_text_view import ArtifactsTextView
In [2]: ArtifactsTextView().summary()
===============================================================================================================
GLUE_JOBS
===============================================================================================================
Name            GlueVersion  Workers  WorkerType  Modified          LastRun           Status
NSM_Log_Loader  4.0          4        G.1X        2023-08-14 17:09  2023-09-02 02:48  SUCCEEDED
dns_load_heavy  4.0          4        G.1X        2023-06-06 16:10  2023-06-06 16:10  SUCCEEDED
===============================================================================================================
DATA_SOURCES
===============================================================================================================
Name               Ver  Size(MB)  Modified          Num Columns  DataLake  Tags            Input
abalone_data       25   0.07      2023-09-22 22:40  9            False     abalone:public  /Users/briford/work/sageworks/data/abalone.csv
abalone_data_copy  20   0.07      2023-09-24 01:54  9            False     abalone:public  abalone_data
test_data          10   0.01      2023-09-22 23:34  10           False     test:small      DataFrame
===============================================================================================================
FEATURE_SETS
===============================================================================================================
Feature Group        Size(MB)  Catalog DB              Athena Table                    Online  Created           Tags            Input
test_feature_set     0.47      sagemaker_featurestore  test_feature_set_1695520074     True    2023-09-24 01:47  test:small      test_data
abalone_feature_set  0.59      sagemaker_featurestore  abalone_feature_set_1695423652  True    2023-09-22 23:00  abalone:public  abalone_data
===============================================================================================================
MODELS
===============================================================================================================
Model Group         Ver  Status     Description               Created           Tags                Input
abalone-regression  1    Completed  Abalone Regression Model  2023-09-22 23:15  abalone:regression  abalone_feature_set
===============================================================================================================
ENDPOINTS
===============================================================================================================
Name                    Status     Created           DataCapture  Sampling(%)  Tags                Input
abalone-regression-end  InService  2023-09-22 23:15  False        -            abalone:regression  abalone-regression
SageWorks is currently quite verbose in its logging, so if you want to quiet it down a bit, you can reduce the logging level with this bit of code:
import logging
logging.getLogger("sageworks").setLevel(logging.WARNING)
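If you'd rather keep those messages around, you can also send the SageWorks log to a file in addition to the console (just an example pattern using standard Python logging; the file name is arbitrary):
import logging

# Example only: also write SageWorks messages (WARNING and above) to a file
sageworks_log = logging.getLogger("sageworks")
sageworks_log.setLevel(logging.WARNING)
sageworks_log.addHandler(logging.FileHandler("sageworks.log"))  # any path you like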
If you get an error like this, it means your AWS_PROFILE needs to be set or you need to renew your SSO Token:
RuntimeError: AWS Identity Check Failure: Check AWS_PROFILE and/or Renew SSO Token...
You can of course run any of the SageWorks API calls in a Jupyter Notebook, so feel free to look at the Tutorials below. Also, if you want to set an ENV var inside a notebook, you can just do:
import os
os.environ['SAGEWORKS_BUCKET'] = 'mycompany-sageworks-bucket'
- Notebook: SageWorks Pipeline (building an AWS® ML Pipeline from start to finish)
- Video: Coding with SageWorks (informal coding + chatting while building a full ML pipeline)
SageWorks can use an OPTIONAL Redis database as a temporal cache to minimize AWS Service calls. If you have access to a Redis database, set this ENV var:
export REDIS_HOST=<your redis host>
You can also spin up a local Docker image quite easily. Again, this is optional; it will make SageWorks more responsive but is not required.
docker run --name my-redis -p 6379:6379 -d redis
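Once the container is up (or REDIS_HOST points at an existing server), a quick connectivity check from Python looks like this (assumes the redis Python client is installed, e.g. pip install redis):
import os
import redis

# Connectivity check only: ping the Redis host that SageWorks will use as a cache
host = os.environ.get("REDIS_HOST", "localhost")
r = redis.Redis(host=host, port=6379)
print("Redis ping:", r.ping())  # True means the cache is reachable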