Releases · microsoft/DeepSpeed
DeepSpeed v0.3.10
v0.3.10 Release notes
Combined release notes since the November 12th v0.3.1 release
- Various updates to torch.distributed initialization
- Transformer kernel updates
- Elastic training support (#602)
  - NOTE: More details to come; this feature is still in an initial piloting phase.
- Module replacement support #586
  - NOTE: This will be documented and used more broadly in the short term to help automatically inject/replace DeepSpeed ops into client models.
- #528 removes the psutil and cpufeature dependencies
- Various ZeRO 1 and 2 bug fixes and updates: #531, #532, #545, #548
- #543 makes checkpoints backwards compatible with older DeepSpeed v0.2 releases
- Add static_loss_scale support to unfused optimizer #546
- Bug fix for norm calculation in absence of model parallel group #551
- Switch CI from Azure Pipelines to GitHub Actions
- Deprecate client ability to disable gradient reduction #552
- Bug fix for tracking optimizer step in cpu-adam when loading checkpoint #564
- Improved support for Ampere architecture #572, #570, #577, #578, #591, #642
- Fix potential random layout inconsistency issues in sparse attention modules #534
- Support customizing kwargs for lr_scheduler #584
- Support passing the DeepSpeed configuration to deepspeed.initialize as a dict instead of via args #632 (see the sketch below)
- Allow DeepSpeed models to be initialized with optimizer=None #469
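Taken together, #632 and #469 change how deepspeed.initialize can be called. Below is a minimal sketch of that usage with a toy torch.nn.Linear model; the config_params keyword and the exact config keys are assumptions based on later DeepSpeed documentation, not confirmed against v0.3.10.

```python
import torch
import deepspeed

# Toy model for illustration only.
model = torch.nn.Linear(10, 10)

# DeepSpeed configuration passed as a Python dict instead of a JSON file path (#632).
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    # With optimizer=None (#469), the optimizer can instead be declared in the config.
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# config_params is the assumed keyword for the dict config; check the
# deepspeed.initialize signature of your installed version.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config_params=ds_config,
)
```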
Special thanks to our contributors in this release
@stas00, @gcooper-isi, @g-karthik, @sxjscience, @brettkoonce, @carefree0910, @Justin1904, @harrydrippin
DeepSpeed v0.3.1
Updates
- Efficient and robust compressed training through progressive layer dropping
- JIT compilation of C++/CUDA extensions
- Python-only install support, ~10x faster install time
- PyPI hosted installation via pip install deepspeed
- Removed apex dependency
- Bug fixes for ZeRO-offload and CPU-Adam
- Transformer support for dynamic sequence length (#424)
- Linear warmup+decay lr schedule (#414)
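As a companion to the #414 item above, here is a minimal configuration sketch enabling a linear warmup+decay learning-rate schedule. The scheduler name WarmupDecayLR and its parameter names are taken from later DeepSpeed configuration docs and should be treated as assumptions for this release.

```python
# DeepSpeed config fragment for a linear warmup + decay LR schedule (#414).
# Scheduler/parameter names (WarmupDecayLR, warmup_num_steps, ...) are assumed,
# not confirmed against the v0.3.1 documentation.
ds_config = {
    "train_batch_size": 16,
    "optimizer": {"type": "Adam", "params": {"lr": 3e-4}},
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0.0,
            "warmup_max_lr": 3e-4,
            "warmup_num_steps": 1000,
            "total_num_steps": 10000,  # LR decays linearly toward 0 after warmup
        },
    },
}
```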
DeepSpeed v0.3.0
New features
Software improvements
- Refactor codebase to make a cleaner distinction between ops/runtime/zero/etc.
- Conditional Op builds
  - Not all users should have to spend time building transformer kernels if they don't want to use them.
  - Some features require unique dependencies that not everyone will be able to or want to install; making these builds conditional keeps DeepSpeed portable across environments (see the sketch after this list).
- The DeepSpeed launcher supports different backends in addition to pdsh, such as Open MPI and MVAPICH.
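To illustrate the conditional/JIT op-build direction, here is a small sketch of probing and building a single op at runtime rather than at install time. The import path and builder class (deepspeed.ops.op_builder.CPUAdamBuilder) come from later DeepSpeed releases and are assumptions here, not a confirmed v0.3.0 API.

```python
# Probe whether one op's build prerequisites are present, and JIT-compile it on
# demand, instead of building every C++/CUDA extension during installation.
# Import path and builder name are assumed from later DeepSpeed releases.
from deepspeed.ops.op_builder import CPUAdamBuilder

builder = CPUAdamBuilder()
if builder.is_compatible():
    cpu_adam = builder.load()  # triggers the JIT build the first time it is called
else:
    print("cpu_adam prerequisites are missing; this op will be skipped")
```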
DeepSpeed v0.2.0
DeepSpeed 0.2.0 Release Notes
Features
- ZeRO-1 with reduce scatter
- ZeRO-2 (see the config sketch after this list)
- Transformer kernels
- Various bug fixes and usability improvements
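A minimal configuration sketch for turning on ZeRO stage 1 or 2 (the ZeRO items above). The key names under zero_optimization follow later DeepSpeed configuration docs and are assumptions for this release.

```python
# Enable ZeRO stage 2 (optimizer state + gradient partitioning) via the config.
# Key names are assumed from later DeepSpeed configuration documentation.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,              # 1 = optimizer state partitioning, 2 = adds gradient partitioning
        "reduce_scatter": True,  # use reduce-scatter instead of all-reduce for gradients
        "overlap_comm": True,    # overlap gradient communication with the backward pass
    },
}
```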
DeepSpeed v0.1.0
DeepSpeed 0.1.0 Release Notes
Features
- Distributed Training with Mixed Precision
  - 16-bit mixed precision
  - Single-GPU/Multi-GPU/Multi-Node
- Model Parallelism
  - Support for Custom Model Parallelism
  - Integration with Megatron-LM
- Memory and Bandwidth Optimizations
  - Zero Redundancy Optimizer (ZeRO) stage 1 with all-reduce
  - Constant Buffer Optimization (CBO)
  - Smart Gradient Accumulation
- Training Features
  - Simplified training API (see the training-loop sketch after this list)
  - Gradient Clipping
  - Automatic loss scaling with mixed precision
- Training Optimizers
  - Fused Adam optimizer and arbitrary torch.optim.Optimizer
  - Memory bandwidth optimized FP16 Optimizer
  - Large Batch Training with LAMB Optimizer
  - Memory efficient Training with ZeRO Optimizer
- Training Agnostic Checkpointing
- Advanced Parameter Search
  - Learning Rate Range Test
  - 1Cycle Learning Rate Schedule
- Simplified Data Loader
- Performance Analysis and Debugging
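As a companion to the simplified training API item above, here is a minimal end-to-end sketch. The toy model, dataset, and ds_config.json are placeholders, and the config keyword on deepspeed.initialize is an assumption based on later releases; older versions read the config path from args instead.

```python
import torch
import deepspeed

# Toy model and dataset for illustration only.
model = torch.nn.Linear(784, 10)
dataset = torch.utils.data.TensorDataset(torch.randn(256, 784),
                                         torch.randint(0, 10, (256,)))

# deepspeed.initialize returns an engine that owns the optimizer, loss scaling,
# gradient accumulation, and a distributed data loader built from training_data.
model_engine, optimizer, train_loader, _ = deepspeed.initialize(
    args=None,
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
    config="ds_config.json",  # assumed keyword; older versions take the path via args
)

criterion = torch.nn.CrossEntropyLoss()
for x, y in train_loader:
    x, y = x.to(model_engine.device), y.to(model_engine.device)
    loss = criterion(model_engine(x), y)
    model_engine.backward(loss)  # handles loss scaling and gradient accumulation
    model_engine.step()          # optimizer step, LR schedule, and gradient zeroing
```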