Template for reproducible audio ML research on TU Berlin's HPC cluster using DVC, Docker, and TensorBoard.

tu-studio/hpc-cluster-ml-workflow

HPC-Cluster-ML-Workflow

This template provides a structured workflow tailored for audio machine learning research on the HPC Cluster of ZECM at TU Berlin. It was developed for projects that require continuous management of multiple experiments to ensure high reproducibility and reliability of results. By incorporating tools such as DVC, Docker, and TensorBoard, the template not only enhances reproducibility but also provides a robust framework for effective collaboration and seamless sharing of experiments.

Features

  • Reproducible Experiments:
    • Tracks all dependencies, configurations, and artifacts to ensure experiments can be easily reproduced and shared.
    • Uses containerization to maintain consistency across different systems.
  • Resource Optimization:
    • Reuses unchanged stages to avoid redundant computations, speeding up workflows and conserving resources.
  • Automation:
    • Reduces manual tasks through automated builds, data pipelines, and syncing, allowing you to focus on research.
  • HPC Integration:
    • Extends DVC for multi-node parallel experiments, optimizing HPC resource utilization.
    • Supports Docker for development, with automated conversion to Singularity for seamless HPC deployment.
  • TensorBoard Integration:
    • Visualizes and compares DVC experiments in TensorBoard, including TensorBoard's audio logging support.
    • Enables real-time monitoring and quick decisions on underperforming runs.
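
To make the "reuses unchanged stages" idea above concrete, here is a minimal sketch of content-based stage caching. It is not DVC's implementation and does not use DVC's API. The names `fingerprint`, `run_stage`, and the in-memory `cache` dictionary are hypothetical and chosen only for illustration. The assumption is that a stage needs to be rerun only when the contents of its dependency files or its parameters change.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(deps, params):
    """Hash dependency file contents together with the stage parameters."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in deps):
        digest.update(path.encode())
        digest.update(Path(path).read_bytes())
    # sort_keys makes the hash independent of dictionary ordering
    digest.update(json.dumps(params, sort_keys=True).encode())
    return digest.hexdigest()


def run_stage(name, deps, params, action, cache):
    """Run `action` only if the fingerprint differs from the cached one."""
    key = fingerprint(deps, params)
    if cache.get(name) == key:
        return "skipped"
    action()
    cache[name] = key
    return "ran"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.txt"
        data.write_text("raw audio placeholder")
        cache = {}
        for _ in range(2):
            # The second iteration finds an unchanged fingerprint and skips.
            status = run_stage("train", [data], {"lr": 0.001}, lambda: None, cache)
            print("train:", status)
```

In this sketch the second call is skipped because neither the file nor the parameters changed. Editing the file or changing a parameter yields a new fingerprint and triggers a rerun.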

Overview

The table below summarizes the key tools involved in the HPC-Cluster-ML-Workflow, detailing their primary roles and providing links to their official documentation for further reference.

| Tool | Role | Documentation |
|------|------|---------------|
| Git | Version control for code. | Git Docs |
| DVC | Data version control and pipeline management. | DVC Docs |
| TensorBoard | DVC experiment visualization and monitoring. | TensorBoard Docs |
| Docker | Containerization for development, converted to Singularity for HPC. | Docker Docs |
| Singularity | HPC-compatible containerization tool. | Singularity Docs |
| SLURM | Job scheduling and workload management on the HPC cluster. | SLURM Docs |
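
As a rough illustration of how SLURM, Singularity, and DVC from the table can fit together, the sketch below renders a batch script that runs a DVC experiment inside a Singularity image. It is an assumption about a typical setup, not the scripts shipped with this template. The helper `render_sbatch`, the image name `image.sif`, and the resource values are hypothetical, and cluster-specific settings such as the partition are left out.

```python
def render_sbatch(job_name, image, command, nodes=1, gpus=1, time_limit="02:00:00"):
    """Return the text of a SLURM batch script that runs `command` in a container."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "",
        # --nv exposes the host's NVIDIA GPUs inside the container
        f"singularity exec --nv {image} {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; adapt the image path and limits to your cluster.
    print(render_sbatch("train", "image.sif", "dvc exp run"))
```

The rendered text could be written to a file and submitted with `sbatch`. Generating it from a function keeps the resource requests in one place when several experiments are queued.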

System Transfer

The figure below offers a simplified overview of how data is transferred between systems. While some of the commands depicted are automated by the provided workflows, the visualization is intended for comprehension and not as a direct usage reference.

Simplified diagram of dependency transfer between systems

Prerequisites

  • macOS, Windows or Linux operating system.
  • Access to an HPC cluster with the SLURM scheduler.
  • Local Python installation.
  • Familiarity with Git, DVC, and Docker.
  • Docker Hub account.

Setup

Follow the setup instructions below for step-by-step guidance on configuring this template repository, which offers a basic PyTorch project that you can customize, reuse, or reference for your pipeline implementation.

Usage

Once the setup is complete, refer to the User Guide to start working with the template. The guide explains how to develop your project, launch experiments, and monitor training runs.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

References

Schulz, F. [faressc]. (n.d.). Guitar LSTM [pytorch-version]. GitHub. Link
