See our developing doc for build prerequisites.
To run a single test, first connect to a cluster and then run the following:
scripts/run-oneshot.sh --accelerator v2-8 --file tests/tensorflow/nightly/mnist.libsonnet --type functional
To build all of the templates and output Kubernetes resources, run the following:
scripts/gen-tests.sh
This command writes Kubernetes CronJob resources into the k8s/ directory.
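As a quick sanity check after generation, the sketch below counts the CronJob manifests under an output directory. It assumes the generated files are YAML with a top-level "kind: CronJob" line, which this document does not confirm, and count_cronjobs is a hypothetical helper, not a script in this repository.

```shell
# Hypothetical helper: count CronJob manifests under a directory (default: k8s/).
# Assumes YAML output with a top-level "kind: CronJob" line in each manifest.
count_cronjobs() {
  dir="${1:-k8s}"
  # -r recurses into subdirectories, -h hides file names so each match is one line.
  # A missing directory yields no matches, so the count falls back to 0.
  n="$(grep -rhE '^kind:[[:space:]]*CronJob[[:space:]]*$' "$dir" 2>/dev/null | wc -l | tr -d '[:space:]')"
  echo "$n"
}

# Print the number of CronJob manifests found under k8s/.
count_cronjobs k8s
```

A count of 0 right after running scripts/gen-tests.sh may simply mean the output format differs from the assumption above, so inspect the directory by hand before treating it as a failure.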
Note: Googlers and contributors working out of this repository don't need to manually deploy generated Kubernetes resources with kubectl, since we have triggers set up to do that automatically.
To create a new test, start by copying a similar test file for the same ML framework and version. Update the training commands as necessary, and add the new file to the targets.jsonnet file in the same directory.
See here for details on configuring alerts and recording the training metrics of your test.
Before you send your code for review, we recommend that you run a one-shot test using the command above to ensure that the test works as expected. If you're not sure what the generated name of your test will be, try running multifile.jsonnet to see what the file names of the generated tests are.