
Using the SageWorks Python API

Brian Wylie edited this page Nov 5, 2023 · 15 revisions

After onboarding SageWorks to your AWS account (see AWS Onboarding), you're ready to have your Data Science team start doing science!

SSO Setup for AWS CLI/Python Usage

Note: There's no special setup for SageWorks. If you already have AWS CLI/SSO access working, you can skip this section. If you haven't yet set up your local machine/laptop with an AWS profile/SSO user, please see our guide, AWS SSO Setup.

Note: You only need to do this setup once; after that, you're ready to go.

Python Setup: Virtual Environments

SageWorks requires Python 3.10 or higher. PyEnv works well, but feel free to use any Python environment manager you like (Anaconda, VirtualEnv, PyEnv).
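If you're not sure which interpreter your environment is using, a quick check like the following can confirm it meets the 3.10 minimum. This is a minimal sketch using only the standard library; the helper name python_is_supported is hypothetical and not part of SageWorks.

```python
import sys

# Minimum version stated above: SageWorks requires Python 3.10+
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is OK for SageWorks")
    else:
        print(f"Python {current} is too old; SageWorks needs 3.10 or higher")
```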

Installing SageWorks

pip install sageworks
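To confirm the install landed in the environment you're actually using, you can ask the standard library's importlib.metadata for the installed distribution version. This is a hedged sketch; installed_version is a hypothetical helper, and it only reads package metadata (it doesn't import or call SageWorks).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "sageworks") -> Optional[str]:
    """Return the installed version string of `package`, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("sageworks version:", installed_version() or "not installed")
```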

Set the SageWorks Artifacts Bucket ENV Var

Note: You may want to put this ENV var in your ~/.bashrc or ~/.zshrc (or in your Windows environment variables) so it's set in every new shell.

# macOS/Linux (bash/zsh); replace with your bucket name
export SAGEWORKS_BUCKET=mycompany-sageworks-bucket
REM Windows (Command Prompt); no trailing comment, since cmd would include it in the value
set SAGEWORKS_BUCKET=mycompany-sageworks-bucket
# Windows (PowerShell)
$Env:SAGEWORKS_BUCKET = "mycompany-sageworks-bucket"
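Changes to ~/.bashrc or ~/.zshrc only apply to shells started afterwards, so it can be worth confirming the variable is actually visible to Python. A minimal sketch; get_sageworks_bucket is a hypothetical helper that only reads the environment and does not call any SageWorks API.

```python
import os


def get_sageworks_bucket() -> str:
    """Return SAGEWORKS_BUCKET, raising a clear error if it is missing or blank."""
    bucket = os.environ.get("SAGEWORKS_BUCKET", "").strip()
    if not bucket:
        raise RuntimeError(
            "SAGEWORKS_BUCKET is not set; export it (or open a new shell) and try again"
        )
    return bucket


if __name__ == "__main__":
    print("Using SageWorks bucket:", get_sageworks_bucket())
```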

Windows PowerShell Instructions (for Anaconda Installs)

For Windows, save these lines as a PowerShell script (.ps1) in any folder and add that folder to your system PATH.

$env:AWS_PROFILE = "my-aws-profile"
$env:SAGEWORKS_BUCKET = "mycompany-sageworks-bucket"
aws sso login --profile my-aws-profile

Testing out the AWS Connection

  • Make sure your AWS_PROFILE is set correctly
  • Renew your SSO token (aws sso login --profile my-aws-profile)
  • Try it out:
$ ipython
In [1]: from sageworks.views.artifacts_text_view import ArtifactsTextView
In [2]: ArtifactsTextView().summary()

===============================================================================================================
GLUE_JOBS
===============================================================================================================
          Name GlueVersion  Workers WorkerType         Modified          LastRun    Status
NSM_Log_Loader         4.0        4       G.1X 2023-08-14 17:09 2023-09-02 02:48 SUCCEEDED
dns_load_heavy         4.0        4       G.1X 2023-06-06 16:10 2023-06-06 16:10 SUCCEEDED

===============================================================================================================
DATA_SOURCES
===============================================================================================================
             Name Ver Size(MB)         Modified  Num Columns  DataLake           Tags                                          Input
     abalone_data  25     0.07 2023-09-22 22:40            9     False abalone:public /Users/briford/work/sageworks/data/abalone.csv
abalone_data_copy  20     0.07 2023-09-24 01:54            9     False abalone:public                                   abalone_data
        test_data  10     0.01 2023-09-22 23:34           10     False     test:small                                      DataFrame

===============================================================================================================
FEATURE_SETS
===============================================================================================================
      Feature Group Size(MB)             Catalog DB                   Athena Table Online          Created           Tags        Input
   test_feature_set     0.47 sagemaker_featurestore    test_feature_set_1695520074   True 2023-09-24 01:47     test:small    test_data
abalone_feature_set     0.59 sagemaker_featurestore abalone_feature_set_1695423652   True 2023-09-22 23:00 abalone:public abalone_data

===============================================================================================================
MODELS
===============================================================================================================
       Model Group  Ver    Status              Description          Created               Tags               Input
abalone-regression    1 Completed Abalone Regression Model 2023-09-22 23:15 abalone:regression abalone_feature_set

===============================================================================================================
ENDPOINTS
===============================================================================================================
                  Name    Status          Created DataCapture Sampling(%)               Tags              Input
abalone-regression-end InService 2023-09-22 23:15       False           - abalone:regression abalone-regression
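If the summary above fails, it can help to check your AWS credentials on their own, independent of SageWorks. The sketch below assumes boto3 (the AWS SDK for Python) is installed and uses the STS get_caller_identity call, which succeeds only when the active profile has valid credentials. The helper name check_aws_identity is hypothetical and not part of SageWorks.

```python
def check_aws_identity(session=None) -> str:
    """Return the caller ARN for the active AWS credentials.

    If the profile is missing or the SSO token has expired, the underlying
    boto3/botocore exception is raised unchanged.
    """
    if session is None:
        # Imported lazily so the helper can be exercised without boto3 installed
        import boto3

        # boto3 picks up AWS_PROFILE from the environment automatically
        session = boto3.Session()
    identity = session.client("sts").get_caller_identity()
    return identity["Arn"]


if __name__ == "__main__":
    print("Authenticated as:", check_aws_identity())
```

If this raises a token or credentials error, renewing the SSO token with aws sso login is the usual fix.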

Reduce logging verbosity

SageWorks is currently quite verbose in its logging. To make it quieter, you can raise the logging level with this bit of code:

import logging
logging.getLogger("sageworks").setLevel(logging.WARNING)
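If you only want quiet output for part of a script or notebook, a small context manager can raise the level temporarily and restore it afterwards. This is a sketch built on the standard logging module; it assumes SageWorks logs through the "sageworks" logger, as the snippet above implies, and quiet_logger is a hypothetical helper name.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name: str = "sageworks", level: int = logging.WARNING):
    """Temporarily set a logger's level, restoring the previous level on exit."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore even if the body raised an exception
        logger.setLevel(previous)
```

For example, wrapping a call such as ArtifactsTextView().summary() in `with quiet_logger():` would suppress INFO/DEBUG messages from that logger only for the duration of the call.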

Errors

If you get an error like the one below, it means that your AWS_PROFILE needs to be set or you need to renew your SSO token:

RuntimeError: AWS Identity Check Failure: Check AWS_PROFILE and/or Renew SSO Token...

Jupyter Notebooks

You can run any of the SageWorks API in a Jupyter notebook, so feel free to look at the tutorials below. To set an ENV var from within a notebook, you can do:

import os
os.environ['SAGEWORKS_BUCKET'] = 'mycompany-sageworks-bucket'

Using SageWorks Tutorials

Redis (Optional)

SageWorks can use an OPTIONAL Redis database as a cache to minimize AWS service calls. If you have access to a Redis database, set the REDIS_HOST ENV var:

export REDIS_HOST=<your redis host>

You can also easily spin up a local Redis container with Docker. Again, this is optional: it makes SageWorks more responsive but isn't required.

docker run --name my-redis -p 6379:6379 -d redis
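To check that a Redis server is reachable before relying on it, you can send it a PING. The sketch below assumes the redis-py client (pip install redis) and port 6379, the port mapped in the docker command above; it falls back to the REDIS_HOST variable when no host is passed. The helper name redis_available is hypothetical, and this does not reflect how SageWorks itself connects to Redis.

```python
import os
from typing import Optional


def redis_available(host: Optional[str] = None, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server answers PING at host:port.

    Returns False if no host is configured, redis-py isn't installed,
    or the server can't be reached within the timeout.
    """
    host = host or os.environ.get("REDIS_HOST")
    if not host:
        return False
    try:
        import redis  # redis-py client library
    except ImportError:
        return False
    try:
        client = redis.Redis(host=host, port=port, socket_connect_timeout=timeout)
        return bool(client.ping())
    except redis.exceptions.RedisError:
        # Covers connection refused and connect timeouts
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_available())
```

For the local container, redis_available("localhost") should return True once the container is running and redis-py is installed.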