You've finished your wonderful web prototype for the Data Fest: your model is awesome and that visualization is going to blow everyone's minds. You get on stage for a live demo, open the web app, and find that nothing works. Spooky.
To prevent that from happening, it's important to write tests. Think about it: when you write a function, how do you know it works™? You probably try it on a few ad hoc inputs and, if you get the expected results, you move on. Maybe your function does indeed work today, but what if, in the following weeks, you change a few lines to make it faster and don't bother testing it again (it works™)? You may have just introduced a bug without noticing.
For that reason, it's important to write automated tests: they check your code the same way every time you run them, so you can be confident that everything keeps working consistently and catch bugs introduced by future changes.
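As a minimal sketch of what an automated test looks like, here is a hypothetical `average()` function together with a few pytest-style test functions. The function, the test names, and the choice of pytest (rather than, say, `unittest`) are assumptions for illustration only. Each test encodes one of the "ad hoc inputs" you would otherwise check by hand, so the checks run again on every change.

```python
# Hypothetical function under test: the name and behavior are illustrative only.
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("average() requires at least one value")
    return sum(values) / len(values)


# pytest collects functions whose names start with "test_" from files named
# test_*.py or *_test.py, and reports a failure whenever an assert is False.
def test_average_of_integers():
    assert average([1, 2, 3, 4]) == 2.5


def test_average_of_single_value():
    assert average([7]) == 7


def test_average_rejects_empty_input():
    # A plain try/except keeps this file runnable without importing pytest;
    # with pytest available, `with pytest.raises(ValueError):` is the idiomatic form.
    try:
        average([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Saved as, for example, `test_average.py`, running `pytest` in the same directory executes all three tests. If a later "optimization" of `average()` changes its results, these tests fail on the next run instead of during your live demo.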
This tutorial is divided into three parts. The first addresses unit testing, a simple but powerful form of testing. The second is devoted specifically to testing Data Science pipelines.
The last part covers Continuous Integration (CI), a technique for automatically running all of your tests every time you push new code to GitHub. CI is language-agnostic, but this tutorial focuses on Python because it is the language we use the most.
- Part 1: Unit Testing with Python (interactive tutorial), How I Learned to Stop Worrying and Love Unit Testing (teachout slides - Kat Rasch)
- Part 2: Testing Python Data Science Pipelines
- Part 3: Continuous Integration