- What is a real-time feature pipeline?
- Cool, but how can I implement one?
- What is this repo about?
- Run the whole thing in 10 minutes
- Wanna learn more real-time ML?
Machine learning models are only as good as the input features you feed them at training and inference time.
And for many real-world applications, like financial trading, these features must be generated and served as fast as possible, so the ML system produces the best predictions possible.
Generating and serving features fast is what a real-time feature pipeline does.
Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with JVM-based tools like Apache Spark or Apache Flink.
However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax that expose a pure Python API on top of a highly efficient language like Rust.
So you get the best of both worlds:
- Rust's speed and performance, plus
- Python's rich ecosystem of libraries.
So you can develop highly performant and scalable real-time pipelines, leveraging top-notch Python libraries.
In this repository you will learn how to develop and deploy a real-time feature pipeline in 100% Python that
- fetches real-time trade data (aka raw data) from the Coinbase Websocket API
- transforms trade data into OHLC data (aka features) in real-time using Bytewax, and
- stores these features in the Hopsworks Feature Store
You will also build a dashboard using Bokeh and Streamlit to visualize the final features in real time.
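To make the transformation step concrete, here is a minimal sketch of how trades can be grouped into OHLC candles. It uses plain Python on an in-memory list rather than Bytewax on a live stream, so it is not the code in this repo; the `Trade` and `Candle` classes, the `trades_to_ohlc` function, and the 30-second window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Trade:
    """Hypothetical shape of one raw trade."""
    timestamp: float  # seconds since the Unix epoch
    price: float
    volume: float


@dataclass
class Candle:
    """One OHLC feature row for a fixed time window."""
    window_start: int  # start of the window, seconds since the Unix epoch
    open: float
    high: float
    low: float
    close: float
    volume: float


def trades_to_ohlc(trades: Iterable[Trade], window_seconds: int = 30) -> List[Candle]:
    """Group trades into fixed (tumbling) windows and build one candle per window."""
    candles: Dict[int, Candle] = {}
    # Sort by time so the first trade in a window is the open and the last is the close.
    for trade in sorted(trades, key=lambda t: t.timestamp):
        window_start = int(trade.timestamp // window_seconds) * window_seconds
        candle = candles.get(window_start)
        if candle is None:
            # First trade seen in this window: it sets open, high, low and close.
            candles[window_start] = Candle(
                window_start, trade.price, trade.price, trade.price, trade.price, trade.volume
            )
        else:
            candle.high = max(candle.high, trade.price)
            candle.low = min(candle.low, trade.price)
            candle.close = trade.price
            candle.volume += trade.volume
    return [candles[start] for start in sorted(candles)]


trades = [
    Trade(timestamp=0.0, price=100.0, volume=1.0),
    Trade(timestamp=10.0, price=105.0, volume=2.0),
    Trade(timestamp=29.0, price=95.0, volume=0.5),
    Trade(timestamp=31.0, price=98.0, volume=1.0),
]
candles = trades_to_ohlc(trades, window_seconds=30)
for candle in candles:
    print(candle)
```

A streaming engine does the same bookkeeping incrementally, emitting each candle when its window closes instead of waiting for the full list of trades.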
- Create a Python virtual environment with the project dependencies:
  $ make init
- Set your Hopsworks API key and project name in set_environment_variables_template.sh, rename the file to set_environment_variables.sh, and run it (sign up for free at hopsworks.ai to get these 2 values):
  $ . ./set_environment_variables.sh
- Run the feature pipeline locally:
  $ make run
- Spin up a Streamlit dashboard to visualize the data in real time:
  $ make frontend
- To run the feature pipeline on an AWS EC2 instance, you first need an AWS account and the aws-cli tool installed on your local machine. Then deploy the feature pipeline onto an EC2 instance:
  $ make deploy
- Feature pipeline logs are sent to AWS CloudWatch. To get the URL where you can see the logs, run:
  $ make info
- To shut down the feature pipeline on AWS and free resources, run:
  $ make undeploy
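If the pipeline fails right after startup, a common cause is that the Hopsworks variables from the setup step were not exported in the current shell. The sketch below shows one way to check for them from Python. The variable names `HOPSWORKS_PROJECT_NAME` and `HOPSWORKS_API_KEY` and the `load_settings` helper are assumptions for illustration; use the names defined in the template script.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; replace with the ones in the repo's template script.
REQUIRED_VARIABLES = ["HOPSWORKS_PROJECT_NAME", "HOPSWORKS_API_KEY"]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or fail early with the names that are missing."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```

Calling `load_settings()` at the top of a script surfaces a missing value immediately, rather than as an authentication error later on.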
I am preparing a new hands-on tutorial where you will learn to build a complete real-time ML system, from A to Z.
➡️ Subscribe to The Real-World ML Newsletter to be notified when the tutorial is out.