Releases: microsoft/DeepSpeed

DeepSpeed v0.3.10

12 Jan 18:17

v0.3.10 Release notes

Combined release notes since the November 12th v0.3.1 release

  • Various updates to torch.distributed initialization
    • New deepspeed.init_distributed API, #608, #645, #644
    • Improved AzureML support for patching torch.distributed backend, #542
    • Simplify dist init and only init if needed #553
  • Transformer kernel updates
    • Support for different hidden dimensions #559
    • Support arbitrary sequence-length #587
  • Elastic training support (#602)
    • NOTE: This feature is still in initial piloting; more details to come.
  • Module replacement support #586
    • NOTE: This will see wider use and documentation in the near term; it helps automatically inject/replace DeepSpeed ops in client models.
  • Remove the psutil and cpufeature dependencies #528
  • Various ZeRO 1 and 2 bug fixes and updates: #531, #532, #545, #548
  • Backwards-compatible checkpoints with the older DeepSpeed v0.2 version #543
  • Add static_loss_scale support to unfused optimizer #546
  • Bug fix for norm calculation in absence of model parallel group #551
  • Switch CI from azure pipelines to github actions
  • Deprecate client ability to disable gradient reduction #552
  • Bug fix for tracking optimizer step in cpu-adam when loading checkpoint #564
  • Improved support for Ampere architecture #572, #570, #577, #578, #591, #642
  • Fix potential random layout inconsistency issues in sparse attention modules #534
  • Support customizing kwargs for lr_scheduler #584
  • Support deepspeed.initialize with dict configuration instead of arg #632
  • Allow DeepSpeed models to be initialized with optimizer=None #469
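
As an illustration of the dict-based configuration (#632) and the lr_scheduler kwargs (#584) listed above, the sketch below builds a configuration as a plain Python dict instead of a JSON file. The helper name build_ds_config and all values are hypothetical. The keyword used to pass the dict to deepspeed.initialize (shown only in a comment, as config_params) is an assumption and may differ between DeepSpeed versions, so check the signature of the version you have installed.

```python
import json


def build_ds_config(batch_size=8, lr=1e-4, warmup_steps=100):
    """Return a DeepSpeed-style config as a plain dict (hypothetical helper)."""
    return {
        "train_batch_size": batch_size,
        "optimizer": {"type": "Adam", "params": {"lr": lr}},
        # Everything under "params" is forwarded to the scheduler as kwargs.
        "scheduler": {
            "type": "WarmupLR",
            "params": {
                "warmup_min_lr": 0.0,
                "warmup_max_lr": lr,
                "warmup_num_steps": warmup_steps,
            },
        },
        "fp16": {"enabled": True},
    }


ds_config = build_ds_config()
# The same dict can still be dumped to JSON for file-based workflows.
print(json.dumps(ds_config, indent=2))

# Assumed hand-off (not executed here; keyword name is an assumption):
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model,
#     model_parameters=model.parameters(),
#     config_params=ds_config,
# )
```

Building the dict in code makes it easy to derive values such as the warmup length from other run parameters before initialization.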

Special thanks to our contributors in this release

@stas00, @gcooper-isi, @g-karthik, @sxjscience, @brettkoonce, @carefree0910, @Justin1904, @harrydrippin

DeepSpeed v0.3.1

12 Nov 19:55
31f46fe

Updates

  • Efficient and robust compressed training through progressive layer dropping
  • JIT compilation of C++/CUDA extensions
  • Python-only install support, ~10x faster install time
  • PyPI hosted installation via pip install deepspeed
  • Removed apex dependency
  • Bug fixes for ZeRO-offload and CPU-Adam
  • Transformer support for dynamic sequence length (#424)
  • Linear warmup+decay lr schedule (#414)
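
To make the linear warmup+decay schedule (#414) concrete, here is a minimal sketch of the general technique: the learning rate ramps up linearly for a fixed number of steps, then decays linearly to zero by the final step. The function name and parameters are hypothetical, and this is not DeepSpeed's own implementation, whose warmup curve and defaults may differ.

```python
def linear_warmup_decay(step, warmup_steps, total_steps, max_lr, min_lr=0.0):
    """Learning rate at `step` for a linear warmup followed by linear decay.

    Illustrative only: names and defaults are assumptions, not DeepSpeed's API.
    """
    if step < warmup_steps:
        # Warmup phase: interpolate from min_lr up to max_lr.
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    # Decay phase: scale max_lr by the fraction of post-warmup steps left.
    remaining = max(total_steps - step, 0)
    return max_lr * remaining / max(total_steps - warmup_steps, 1)


# Sample the schedule at a few steps of a 100-step run with 10 warmup steps.
for s in (0, 5, 10, 55, 100):
    print(s, linear_warmup_decay(s, warmup_steps=10, total_steps=100, max_lr=1.0))
```

Steps past total_steps are clamped to a learning rate of zero rather than going negative.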

DeepSpeed v0.3.0

10 Sep 19:44

New features

Software improvements

  • Refactor codebase to make cleaner distinction between ops/runtime/zero/etc.
  • Conditional Op builds
    • Not all users should have to spend time building transformer kernels if they don't want to use them.
    • Some features require unique dependencies that not everyone will be able or want to install; building them conditionally keeps DeepSpeed portable across environments.
  • DeepSpeed launcher supports different backends in addition to pdsh, such as Open MPI and MVAPICH.

DeepSpeed v0.2.0

16 Jun 06:32
96c4daa

DeepSpeed 0.2.0 Release Notes

Features

  • ZeRO-1 with reduce scatter
  • ZeRO-2
  • Transformer kernels
  • Various bug fixes and usability improvements
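
The "ZeRO-1 with reduce scatter" item can be pictured with a toy, single-process simulation. With partitioned optimizer state, each rank only needs the summed gradients for its own partition, so a reduce-scatter (sum, then hand each rank one shard) is sufficient where an all-reduce would give every rank the full result. The function below is a hypothetical illustration using Python lists in place of tensors and ranks; it does not use torch.distributed and is not DeepSpeed's code.

```python
def reduce_scatter(grads_per_rank):
    """Simulate reduce-scatter: sum gradients element-wise across ranks,
    then give rank r only the r-th contiguous shard of the sum.

    grads_per_rank: list of equal-length lists, one per simulated rank.
    """
    world_size = len(grads_per_rank)
    numel = len(grads_per_rank[0])
    assert numel % world_size == 0, "toy version needs an even split"
    shard = numel // world_size
    # Element-wise sum over all ranks (the "reduce" part).
    summed = [sum(vals) for vals in zip(*grads_per_rank)]
    # Each rank keeps only its own slice (the "scatter" part).
    return [summed[r * shard:(r + 1) * shard] for r in range(world_size)]


grads = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two simulated ranks
print(reduce_scatter(grads))  # [[11, 22], [33, 44]]
```

Each simulated rank ends up holding 1/world_size of the reduced gradient, which matches the slice of optimizer state it would update.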

DeepSpeed v0.1.0

19 May 06:41
c61e23b

DeepSpeed 0.1.0 Release Notes

Features

  • Distributed Training with Mixed Precision
    • 16-bit mixed precision
    • Single-GPU/Multi-GPU/Multi-Node
  • Model Parallelism
    • Support for Custom Model Parallelism
    • Integration with Megatron-LM
  • Memory and Bandwidth Optimizations
    • Zero Redundancy Optimizer (ZeRO) stage 1 with all-reduce
    • Constant Buffer Optimization (CBO)
    • Smart Gradient Accumulation
  • Training Features
    • Simplified training API
    • Gradient Clipping
    • Automatic loss scaling with mixed precision
  • Training Optimizers
    • Fused Adam optimizer and arbitrary torch.optim.Optimizer
    • Memory bandwidth optimized FP16 Optimizer
    • Large Batch Training with LAMB Optimizer
    • Memory efficient Training with ZeRO Optimizer
  • Training Agnostic Checkpointing
  • Advanced Parameter Search
    • Learning Rate Range Test
    • 1Cycle Learning Rate Schedule
  • Simplified Data Loader
  • Performance Analysis and Debugging
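
For readers unfamiliar with "Automatic loss scaling with mixed precision", the following is a minimal sketch of the general dynamic loss scaling technique: shrink the scale and skip the step when gradients overflow, and grow the scale again after a run of clean steps. The class name, defaults, and method are hypothetical and chosen for illustration; DeepSpeed's own scaler and its default values may differ.

```python
class ToyLossScaler:
    """Illustrative dynamic loss scaler (not DeepSpeed's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=1000, min_scale=1.0):
        self.scale = init_scale
        self.factor = factor        # multiplicative grow/shrink factor
        self.window = window        # clean steps required before growing
        self.min_scale = min_scale  # never shrink below this value
        self.good_steps = 0

    def update(self, overflow):
        """Adjust the scale; return True if the optimizer step should run."""
        if overflow:
            # Inf/NaN gradients: shrink the scale and skip this step.
            self.scale = max(self.scale / self.factor, self.min_scale)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps == self.window:
            # A full window without overflow: try a larger scale.
            self.scale *= self.factor
            self.good_steps = 0
        return True


scaler = ToyLossScaler(init_scale=8.0, window=2)
for overflow in (True, False, False):
    stepped = scaler.update(overflow)
    print(overflow, stepped, scaler.scale)
```

In a training loop the loss would be multiplied by scaler.scale before the backward pass, and the gradients divided by the same value before the optimizer step.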