Releases · microsoft/DeepSpeed
v0.10.3: Patch release
What's Changed
- Add Mixed Precision ZeRO++ tutorial by @HeyangQin in #4241
- DeepSpeed-Chat Llama2/stability release by @awan-10 in #4240
- Update README.md by @awan-10 in #4244
- Pin Triton version to >=2.0.0 and <2.1.0 by @lekurile in #4251
- Allow modification of zero partitioned parameters by @tjruwase in #4192
- Checks for user injection policy by @satpalsr in #3052
- Add check that opening issues on CI failure requires schedule by @loadams in #4242
- Code Refactoring by @tosemml in #4262
- tolerating missing optimizer states for MoE [2nd attempt] by @clumsy in #4120
- Fix nv-inference/un-pin transformers by @loadams in #4269
- check for zero (empty) param groups in llama + hf/accelerate. by @awan-10 in #4270
- use `non_reentrant_checkpoint` fix requires_grad of input must be true for activation checkpoint layer in pipeline train. by @inkcherry in #4224
- The PostBackwardFunction class should be more clearly named to distinguish it from the PreBackwardFunction class. by @Crispig in #2548
- fix iteration timing used in autotuning when gradient_accumulation_steps > 1 by @cli99 in #2888
- Update README.md by @NinoRisteski in #4284
- update deepspeed to run with the most recent triton 2.1.0 by @stephen-youn in #4278
- Keep hpz secondary tensor in forward pass by @HeyangQin in #4288
- Support iterators with incompletely defined len functions by @codedecde in #2445
- AMD Kernel Compatibility Fixes by @cmikeh2 in #3180
- ZeRO-Inference refresh by @tjruwase in #4197
- fix user args parsing of string with spaces on runner by @YudiZh in #4265
- Update index.md by @NinoRisteski in #4297
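The Mixed Precision ZeRO++ tutorial and the hpz fixes above all revolve around the ZeRO++ configuration knobs. A minimal sketch of such a config, with illustrative values (consult the ZeRO++ tutorial for the exact keys and recommended settings on your hardware):

```json
{
  "zero_optimization": {
    "stage": 3,
    "zero_hpz_partition_size": 16,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true
  }
}
```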
New Contributors
- @tosemml made their first contribution in #4262
- @Crispig made their first contribution in #2548
- @NinoRisteski made their first contribution in #4284
- @codedecde made their first contribution in #2445
- @YudiZh made their first contribution in #4265
Full Changelog: v0.10.2...v0.10.3
v0.10.2: Patch release
What's Changed
- MP ZeRO++ by @HeyangQin in #3954
- do allgather only in shared optimizer states groups by @inkcherry in #4167
- Permit empty environment variables as unset in `setup.py` by @loadams in #4185
- enable autoTP for mpt in huggingface model hub without trust_remote_c… by @sywangyi in #4062
- Fix nv-nightly workflow by @mrwyattii in #4163
- Fix the path in tutorial by @kytimmylai in #4193
- Add unit test to check HF low_cpu_mem_usage_flag by @loadams in #4184
- Fix ZeRO parameter initialization for tensors with `requires_grad=True` by @XuehaiPan in #4138
- DeepSpeed Ulysses tutorial by @minjiaz in #4200
- Load z3 checkpoints for inference by @tjruwase in #4171
- DeepSpeed Ulysses release by @samadejacobs in #4198
- Deepspeed-Ulysses blog by @samadejacobs in #4201
- Ds ulysses news by @samadejacobs in #4202
- DS-Ulysses formating by @samadejacobs in #4204
- Update Ulyssess by @samadejacobs in #4205
- Update README.md by @samadejacobs in #4211
- Add Japanese blog of DS-Ulysses by @tohtana in #4209
- DeepSpeed Ulysses Chinese blog translation by @HeyangQin in #4210
- add ulysses blog index by @conglongli in #4215
- Add MuP optimizers by @mrwyattii in #2043
- Simplify Gradient Attribute Names by @jomayeri in #4214
- add meta onDevice support for LLAMA2 by @dc3671 in #4147
- Fixes timer error referenced in #4212 by @bjoernpl in #4213
- Fix pipeline dataloader when batch elements contain tuple by @ghosthamlet in #565
- feat(activation_checkpointing): add `non_reentrant_checkpoint` to support inputs require no grad by @hughpu in #4118
- add npu support dtypes by @CurryRice233 in #4223
- Fix fused qkv sizing for bloom by @molly-smith in #4161
- added port argument for ssh by @Hiromasa-H in #4117
- Empty tensor size check by @jomayeri in #4186
- fix: linker issues in conda environments #3929 by @maximegmd in #4235
- Enable AMD MI200 and H100 to run on branches for testing by @loadams in #4238
- fix MegatronLayerPolicy to be compatible with the newest ParallelTransformerLayer by @dc3671 in #4236
- Enable hpz when running with torch.no_grad by @HeyangQin in #4232
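The `setup.py` change above (#4185) treats an empty environment variable the same as an unset one. A standalone sketch of that convention (the helper name `env_or_default` and the variable name are hypothetical, not DeepSpeed API):

```python
import os

def env_or_default(name: str, default: str) -> str:
    """Treat an empty environment variable the same as an unset one."""
    value = os.environ.get(name, "")
    return value if value != "" else default

os.environ["DS_BUILD_EXAMPLE"] = ""             # set, but empty
print(env_or_default("DS_BUILD_EXAMPLE", "1"))  # empty counts as unset → 1
```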
New Contributors
- @kytimmylai made their first contribution in #4193
- @bjoernpl made their first contribution in #4213
- @Hiromasa-H made their first contribution in #4117
- @maximegmd made their first contribution in #4235
Full Changelog: v0.10.1...v0.10.2
v0.10.1: Patch release
What's Changed
- [docs] add zero++ paper link by @jeffra in #3974
- Avoid race condition with port selection in unit tests by @mrwyattii in #3975
- Remove duplicated inference unit tests by @mrwyattii in #3951
- Switch to torch.linalg.norm by @loadams in #3984
- Simplify chain comparisons, remove redundant parentheses by @digger-yu in #3912
- [CPU] Support HBM flatmode and fakenuma mode by @delock in #3918
- Fix checkpoint conversion when model layers share weights by @awaelchli in #3825
- fixing flops profiler formatting, units and precision by @clumsy in #3927
- Specify language=python in pre-commit hook by @wangruohui in #3994
- [CPU] Skip CPU support unimplemented error by @Yejing-Lai in #3633
- ZeRO Gradient Accumulation Dtype. by @jomayeri in #2847
- [CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node) by @delock in #3919
- Re-enable skipped unit tests by @mrwyattii in #3939
- Make AMD/ROCm apex install to /blob to save test/compile time. by @loadams in #3997
- Option to exclude frozen weights for checkpoint save by @tjruwase in #3953
- Allow user to select name of .deepspeed_env by @loadams in #4006
- Silence backend warning by @mrwyattii in #4009
- Fix user arg parsing in single node deployment by @mrwyattii in #4007
- Specify triton 2.0.0 requirement by @mrwyattii in #4008
- Re-enable elastic training for torch 2+ by @loadams in #4010
- add /dev/shm size to ds_report by @jeffra in #4015
- Make Ascend NPU available by @hipudding in #3831
- RNNprofiler: fix gates size retrieval logic in _rnn_flops by @pinstripe-potoroo in #3921
- fix typo in SECURITY.md by @jstan327 in #4019
- add llama2 autoTP support in replace_module by @dc3671 in #4022
- [zero_to_fp32] 3x less cpu memory requirements by @stas00 in #4025
- [CPU] FusedAdam and CPU training support by @delock in #3991
- remove duplicate check for pp and zero stage by @inkcherry in #4033
- Pass missing positional arguments in `DeepSpeedHybridEngine.generate()` by @XuehaiPan in #4026
- Remove print of weight parameter in RMS norm by @puneeshkhanna in #4031
- Monitored Loss Calculations by @jomayeri in #4030
- fix(pipe): make pipe module `load_state_dir` non-strict-mode work by @hughpu in #4020
- polishing timers and log_dist by @clumsy in #3996
- Engine side fix for loading llama checkpoint fine-tuned with zero3 by @minjiaz in #3981
- fix: Remove duplicate word the by @digger-yu in #4051
- [Bug Fix] Fix comm logging for inference by @delock in #4043
- fix opt-350m shard loading issue in AutoTP by @sywangyi in #3600
- enable autoTP for MPT by @sywangyi in #3861
- autoTP for fused qkv weight by @inkcherry in #3844
- [CPU] Faster reduce kernel for SHM allreduce by @delock in #4049
- Multiple zero stage 3 related fixes by @tjruwase in #3886
- Fix deadlock when SHM based allreduce spin too fast by @delock in #4048
- [MiCS] [Bugfix] set self.save_non_zero_checkpoint=True only for first partition group by @zarzen in #3787
- add reproducible compilation environment by @fecet in #3943
- fix: remove unnecessary `#` punct in the second `sed` command by @hughpu in #4061
- Refactor autoTP inference for HE by @molly-smith in #4040
- Fix transformers unit tests by @mrwyattii in #4079
- Fix Stable Diffusion Injection by @lekurile in #4078
- Spread layers more uniformly when using partition_uniform by @marcobellagente93 in #4053
- fix typo: change polciies to policies by @digger-yu in #4090
- update ut/doc for glm/codegen by @inkcherry in #4057
- zero_to_fp32 script adds support for tag argument by @EeyoreLee in #4089
- add type checker ignore by @EeyoreLee in #4102
- Fix generate config validation error on inference unit tests by @mrwyattii in #4107
- use correct ckpt path when base_dir not available by @polisettyvarma in #4101
- Disable z3 tracing profiler by @tjruwase in #4106
- Pass correct node size for ZeRO++ by @cmikeh2 in #4085
- add deepspeed chat arxiv report by @conglongli in #4110
- enable pipeline checkpoint loading mode by @leiwen83 in #3629
- Fix Issue 4083 by @jomayeri in #4084
- Add full list of DS_BUILD_* by @loadams in #4119
- Update nightly workflows to open an issue if CI fails by @loadams in #3952
- Update torch1.9 tests to 1.10 to match latest accelerate. by @loadams in #4126
- Handle PermissionError in os.chmod Call - Update engine.py by @M-Chris in #4139
- Generalize frozen weights unit test by @tjruwase in #4140
- Respect memory pinning config by @tjruwase in #4131
- Remove incorrect async-io library checking code. by @loadams in #4150
- Return nn.parameter type for weights and biases by @molly-smith in #4146
- Fixes #4151 by @saforem2 in #4152
- Handling for SIGTERM as well by @loadams in #4160
- Fix CI Badges by @mrwyattii in #4162
- Add DS-Chat CI workflow by @lekurile in #4127
- [CPU][Bugfix] Make uid and addr_port part of SHM name in CCL backend by @delock in #4115
- Add DSE branch input to nv-ds-chat by @lekurile in #4173
- Pin transformers by @mrwyattii in #4174
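The "Spread layers more uniformly when using partition_uniform" fix (#4053) concerns splitting N pipeline layers across P stages so stage sizes differ by at most one. A self-contained sketch of that idea (this is not DeepSpeed's actual `partition_uniform` implementation):

```python
def uniform_partition(num_items: int, num_parts: int) -> list:
    """Split num_items contiguous layers into num_parts parts whose
    sizes differ by at most one; returns boundary indices, so part p
    owns layers [bounds[p], bounds[p+1])."""
    base, extra = divmod(num_items, num_parts)
    bounds = [0]
    for p in range(num_parts):
        # the first `extra` parts absorb one leftover layer each
        bounds.append(bounds[-1] + base + (1 if p < extra else 0))
    return bounds

print(uniform_partition(10, 4))  # → [0, 3, 6, 8, 10]
```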
New Contributors
- @awaelchli made their first contribution in #3825
- @wangruohui made their first contribution in #3994
- @jstan327 made their first contribution in #4019
- @XuehaiPan made their first contribution in #4026
- @puneeshkhanna made their first contribution in #4031
- @hughpu made their first contribution in #4020
- @fecet made their first contribution in #3943
- @marcobellagente93 made their first contribution in #4053
- @polisettyvarma made their first contribution in #4101
- @leiwen83 made their first contribution in #3629
- @M-Chris made their first contribution in #4139
Full Changelog: v0.10.0...v0.10.1
DeepSpeed v0.10.0
New features
- ZeRO++: A leap in speed for LLM and chat model training with 4X less communication [English] [中文] [日本語]
- H100 support and testing with FP8 using NVIDIA's TransformerEngine
What's Changed
- Documentation for DeepSpeed Accelerator Abstraction Interface by @delock in #3184
- FP8 unittest for H100 by @jomayeri in #3731
- Fix apex install bugs by @loadams in #3741
- Fix Autotuner get_gas_from_user_config by @straywarrior in #3664
- Include cublas error details when getting cublas handle fails by @jli in #3695
- fix hybrid engine mlp module by @tensor-tang in #3736
- Fix output transpose dimension bugs by @loadams in #3747
- remove UtilsBuilder load, use torch (un)flatten ops by @inkcherry in #3728
- add Chinese Zhihu social account by @conglongli in #3755
- Account for expert parameters when calculating the total number of pa… by @alito in #3720
- fix ccl_backend and residual_add problems by @dc3671 in #3642
- Fix url in getting-started guide (docs) by @acforvs in #3768
- Update deepspeed-chat/japanese/README.md by @eltociear in #3765
- Add H100 workflow and status badge. by @loadams in #3754
- Add an api in deepspeed engine for adjusting micro batch size during training by @kisseternity in #3773
- Prevent hangs in CI during parallel run compilation by @mrwyattii in #2844
- Revert "Prevent hangs in CI during parallel run compilation" by @jeffra in #3817
- [Docs] `chrome://tracing` is deprecated by @keyboardAnt in #3805
- Support model declaration in zero.Init context by @tohtana in #3592
- Update zeropp.md by @samadejacobs in #3821
- Reduce Unit Test Times (Part 1) by @mrwyattii in #3829
- Re-enable GPT-J unit tests and refactor inference tests by @mrwyattii in #3618
- Fix racing condition in GatheredParameters by @HeyangQin in #3819
- zero/mics.py: use on_accelerator instead of cuda only by @guoyejun in #3806
- Disable AMD test flows in YML by @loadams in #3847
- Reduce Unit Test Time (Part 2) by @mrwyattii in #3838
- [profiling]add show_straggler argument to log_summary() by @delock in #3579
- checking process_group before merging bucket ranges (#3521) by @clumsy in #3577
- scripts/check-torchcuda.py: add checking for tensor.is_cuda by @guoyejun in #3843
- Zero3 Fix allreduce optimization for extra large tensor by @hablb in #3832
- [zero] revert PR #3166, it disabled grad clip for bf16 by @jeffra in #3790
- Fix transpose convolution FLOPS profiler (retrieval of out_channels) by @pinstripe-potoroo in #3834
- Fix LoRA Fuse/Unfuse in Hybrid Engine by @sxjscience in #3563
- Update pytorch-lightning version in CI by @mrwyattii in #3882
- [Docs] MMEngine has integrated deepspeed. by @HAOCHENYE in #3879
- Add FALCON Auto-TP Support by @RezaYazdaniAminabadi in #3640
- Update apex installation to resolve apex's pyproject.toml issues. by @loadams in #3745
- Extend HE-Lora test with Z3 support + Fix/add guard in HE for Z3 by @awan-10 in #3883
- Separate ZeRO3 InflightParamRegistry for train and eval by @HeyangQin in #3884
- Add GPTNeoX AutoTP support by @Yejing-Lai in #3778
- Fix Meta Tensor checkpoint load for BLOOM models by @lekurile in #3885
- fix error :Dictionary expression not allowed in type annotation Pylance by @digger-yu in #3708
- Fix rnn flop profiler to compute flops instead of macs by @pinstripe-potoroo in #3833
- Update workflows for merge queue by @mrwyattii in #3892
- Avoid deprecation warnings in `CHECK_CUDA` by @Flamefire in #3854
- Silence comm.py warning by @mrwyattii in #3893
- Fix a typo of global variable in comm.py by @hipudding in #3852
- [ROCm] Enable TestCUDABackward::test_backward unit tests by @rraminen in #3849
- [profiling][mics]Fix some issues for log_summary(). by @ys950902 in #3899
- fix "undefined symbol: curandCreateGenerator" for quantizer op by @jinzhen-lin in #3846
- fix memory leak with zero-3 by @jeffra in #3903
- fix some typo docs/ by @digger-yu in #3917
- fix: change ==NONE to is under deepspeed/ by @digger-yu in #3923
- Del comment deepspeed.zero.Init() can be used as a decorator by @hipudding in #3894
- Remove the param.ds_tensor from print by @HeyangQin in #3928
- Reduce Unit Test Times (Part 3) by @mrwyattii in #3850
- Update zero_to_fp32.py - to support deepspeed_stage_1 by @PicoCreator in #3936
- [docs] add xTrimoPGLM by @jeffra in #3940
- Update Nvidia docker base image by @KaiChen1008 in #3930
- Fix inference tutorial docs for checkpoints by @loadams in #3955
- fix Megatron-DeepSpeed links by @conglongli in #3956
- skip bcast when enable pp but pp_group_size=1 by @inkcherry in #3915
- Use device_name instead of device index to support other device by @hipudding in #3933
- Create accelerator for apple silicon GPU Acceleration by @NripeshN in #3907
- fix(cpu_accelerator): 🐛 Convert LOCAL_SIZE to integer by @javsalgar in #3971
New Contributors
- @straywarrior made their first contribution in #3664
- @alito made their first contribution in #3720
- @acforvs made their first contribution in #3768
- @keyboardAnt made their first contribution in #3805
- @pinstripe-potoroo made their first contribution in #3834
- @HAOCHENYE made their first contribution in #3879
- @Yejing-Lai made their first contribution in #3778
- @Flamefire made their first contribution in #3854
- @hipudding made their first contribution in #3852
- @PicoCreator made their first contribution in #3936
- @KaiChen1008 made their first contribution in #3930
- @NripeshN made their first contribution in #3907
- @javsalgar made their first contribution in #3971
Full Changelog: v0.9.4...v0.10.0
v0.9.5: Patch release
What's Changed
- Documentation for DeepSpeed Accelerator Abstraction Interface by @delock in #3184
- FP8 unittest for H100 by @jomayeri in #3731
- Fix apex install bugs by @loadams in #3741
- Fix Autotuner get_gas_from_user_config by @straywarrior in #3664
- Include cublas error details when getting cublas handle fails by @jli in #3695
- fix hybrid engine mlp module by @tensor-tang in #3736
- Fix output transpose dimension bugs by @loadams in #3747
- remove UtilsBuilder load, use torch (un)flatten ops by @inkcherry in #3728
- add Chinese Zhihu social account by @conglongli in #3755
- Account for expert parameters when calculating the total number of pa… by @alito in #3720
- fix ccl_backend and residual_add problems by @dc3671 in #3642
- Fix url in getting-started guide (docs) by @acforvs in #3768
- Update deepspeed-chat/japanese/README.md by @eltociear in #3765
- Add H100 workflow and status badge. by @loadams in #3754
- Zero++ tutorial PR by @HeyangQin in #3783
- [Fix] _conv_flops_compute when padding is a str and stride=1 by @zhiruiluo in #3169
- fix interpolate flops compute by @cli99 in #3782
- use `Flops Profiler` to test `model.generate()` by @CaffreyR in #2515
- [zero] revert PR #3611 by @jeffra in #3786
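The flops-profiler fixes above (`_conv_flops_compute` with string padding and stride=1 in #3169, interpolate flops in #3782) all hinge on getting the convolution output size right. A sketch using the standard PyTorch output-length convention (an assumption here, not code from the profiler):

```python
def conv_out_len(n: int, kernel: int, stride: int = 1,
                 padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (standard PyTorch convention).
    'same'-style padding (kernel // 2 for odd kernels) with stride=1
    preserves the input length."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

print(conv_out_len(32, 3, stride=1, padding=1))  # → 32
```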
New Contributors
- @straywarrior made their first contribution in #3664
- @alito made their first contribution in #3720
- @acforvs made their first contribution in #3768
- @zhiruiluo made their first contribution in #3169
- @CaffreyR made their first contribution in #2515
Full Changelog: v0.9.4...v0.9.5
v0.9.4: Patch release
What's Changed
- [MiCS] [Fix] saving and loading model checkpoint logic for MiCS sharding by @zarzen in #3440
- fix some typo by @digger-yu in #3675
- Use logger in accelerator by @tjruwase in #3682
- Update README to add ICS'23 paper on Tensor Parallel MoEs by @siddharth9820 in #3687
- non-JIT build fix on ROCm by @rraminen in #3638
- Fix local rank mismatch error when training on nodes with different number of GPUs by @byungsoo-oh in #3409
- Correct world_size/backend for mpi by @abhilash1910 in #3694
- Fix incorrectly formatted f string in hostfile checking by @loadams in #3698
- fix typo name of hybrid engine func by @tensor-tang in #3689
- Revert "fix typo name (#3689)" by @loadams in #3702
- Fix gpt-j inference issue by @RezaYazdaniAminabadi in #3639
- change partititon_name to partition_name by @digger-yu in #3700
- Fix unit test typo in tests/unit/ops/transformer/inference by @mrwyattii in #3697
- Small tweak on cuda version mismatch documentation by @jli in #3706
- DeepSpeed overview in Japanese by @conglongli in #3709
- zero3 performance optimizations by @hablb in #3622
- Fix typo in name of hybrid engine function by @loadams in #3704
- Increase tensor creator coverage by @tjruwase in #3684
- [Bugfix][CPU] Remove C++ version in CPU OpBuilder by @delock in #3643
- Single Node is using unreferenced pdsh kill cmd while terminating by @abhilash1910 in #3730
- Update Dockerfile with newer cuda and torch. by @loadams in #3716
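Several fixes above (the local-rank mismatch on nodes with differing GPU counts in #3409, the hostfile f-string check in #3698) concern multi-node launches driven by a hostfile. For reference, the DeepSpeed hostfile format lists one host per line with its GPU slot count (hostnames here are illustrative):

```
worker-1 slots=8
worker-2 slots=4
```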
New Contributors
- @byungsoo-oh made their first contribution in #3409
- @abhilash1910 made their first contribution in #3694
- @tensor-tang made their first contribution in #3689
- @jli made their first contribution in #3706
Full Changelog: v0.9.3...v0.9.4
v0.9.3: Patch release
What's Changed
- Enable auto TP policy for llama model by @jianan-gu in #3170
- Allow users to use mis-matched CUDA versions by @mrwyattii in #3436
- Hybrid Engine Refactor and Llama Inference Support by @cmikeh2 in #3425
- add sharded checkpoint loading for AutoTP path to reduce the peak mem… by @sywangyi in #3102
- launcher/multinode_runner.py: mapping env variables by @YizhouZ in #3372
- Update automatic-tensor-parallelism.md by @sywangyi in #3198
- Build: Update license in setup by @PabloEmidio in #3484
- Doc corrections by @goodship1 in #3435
- Fix spelling errors in comments and documents by @digger-yu in #3486
- Fix spelling error in function GetMaxTokenLength() by @luliyucoordinate in #3482
- Fix a type error on bf16+Pipeline Parallelism by @ys950902 in #3441
- Fix spelling errors in DeepSpeed codebase by @digger-yu in #3494
- fix spelling error with docs/index.md by @digger-yu in #3443
- delete the line to keep user_zero_stages by @MrZhengXin in #3473
- Update Inference Engine checkpoint loading + meta tensor assertions by @lekurile in #2940
- fix regression in shard checkpoint loading in AutoTP Path caused by qkv_copy() is deleted and add UT case for shard checkpoint loading in AutoTP by @sywangyi in #3457
- Add snip_momentum structured pruning which supports higher sparse ratio by @ftian1 in #3300
- Update README.md by @goodship1 in #3504
- Hybrid Engine Fix Llama by @lekurile in #3505
- fix spelling error with deepspeed/runtime/ by @digger-yu in #3509
- Skip autoTP if tp_size is 1 by @molly-smith in #3449
- Changing monitor loss to aggregate loss over gradient accumulation steps by @jomayeri in #3428
- change actions/checkout@v2 to v3 by @digger-yu in #3526
- fix typo with docs/ by @digger-yu in #3523
- Doc updates by @goodship1 in #3520
- Fix bug in Hybrid Engine by @mrwyattii in #3497
- Fix wrong passing of offload_optimizer_config to DeepSpeedZeRoOffload by @mmhab in #3420
- Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 by @YizhouZ in #2999
- share inflight registry between PartitionedParameterCoordinators by @HeyangQin in #3462
- Syncing FusedAdam with new Apex features by @jomayeri in #3434
- fix typo in comments with deepspeed/ by @digger-yu in #3537
- [ROCm] Hip headers fix by @rraminen in #3532
- [CPU] Support Intel CPU inference by @delock in #3041
- Clone tensors to avoid torch.save bloat by @tjruwase in #3348
- Fix attribute error when loading FusedAdamBuilder() by @rraminen in #3527
- fix typo by @inkcherry in #3559
- Fixing bf16 test by @jomayeri in #3551
- Fix Hybrid Engine for BLOOM by @lekurile in #3580
- Fix op_builder against PyTorch nightly by @malfet in #3596
- data efficiency bug fix, avoid invalid range step size by @conglongli in #3609
- DS init should not broadcast or move zero.Init models by @tjruwase in #3611
- Expose Consecutive Hysteresis to Users by @Quentin-Anthony in #3553
- Align InferenceEngine to store ms in _model_times by @HolyFalafel in #3501
- AISC launcher fixes by @jeffra in #3637
- stage3.py: do not scale if gradient_predivide_factor is 1.0 by @guoyejun in #3630
- Add Ascend NPU accelerator support by @CurryRice233 in #3595
- Skip tests on docs-only changes by @mrwyattii in #3651
- Update megatron.md by @wjessup in #3641
- Typo Correction by @MicahZoltu in #3621
- deepspeed/comm/comm.py: fix typo of warning message by @guoyejun in #3636
- Fix RuntimeError when using ZeRO Stage3 with mpu: #3564 by @eggiter in #3565
- Allow dict datatype for checkpoints (inference) by @mrwyattii in #3007
- fix typo with deepspeed/ by @digger-yu in #3547
- flops_profiler: add option recompute_fwd_factor for the case of activation c… by @guoyejun in #3362
- fix typo deepspeed/runtime by @digger-yu in #3663
- Refactor check_enabled root validator in DeepSpeedMonitorConfig by @bgr8 in #3616
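The `stage3.py` change above (#3630) skips gradient pre-division when the factor is 1.0, avoiding a pointless pass over the gradients. A trivial standalone sketch of that fast path (the function name is hypothetical):

```python
def predivide(grad: float, factor: float) -> float:
    """Sketch of the #3630 idea: skip the division entirely when the
    gradient_predivide_factor is 1.0 (it would be a no-op anyway)."""
    if factor == 1.0:
        return grad          # fast path: no scaling needed
    return grad / factor
```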
New Contributors
- @jianan-gu made their first contribution in #3170
- @YizhouZ made their first contribution in #3372
- @PabloEmidio made their first contribution in #3484
- @luliyucoordinate made their first contribution in #3482
- @ys950902 made their first contribution in #3441
- @MrZhengXin made their first contribution in #3473
- @ftian1 made their first contribution in #3300
- @mmhab made their first contribution in #3420
- @malfet made their first contribution in #3596
- @HolyFalafel made their first contribution in #3501
- @CurryRice233 made their first contribution in #3595
- @wjessup made their first contribution in #3641
- @MicahZoltu made their first contribution in #3621
- @eggiter made their first contribution in #3565
- @bgr8 made their first contribution in #3616
Full Changelog: v0.9.2...v0.9.3
v0.9.2: Patch release
What's Changed
- MiCS implementation by @zarzen in #2964
- Fix formatting by @mrwyattii in #3343
- [ROCm] Hipify cooperative_groups headers by @rraminen in #3323
- Diffusers 0.15.0 bug fix by @molly-smith in #3345
- Print default values for DeepSpeed --help by @mrwyattii in #3347
- add bf16 cuda kernel support by @dc3671 in #3092
- README.md: Update MosaicML docs link by @kobindra in #3344
- hybrid_engine: check tuple size when fusing lora params by @adammoody in #3311
- fix mpich launcher issue in multi-node by @sywangyi in #3078
- Update DS-Chat issue template by @mrwyattii in #3368
- add deepspeed chat blog links, add tags by @conglongli in #3369
- Fix redundant shared_params in zero_to_fp32.py by @ShijieZZZZ in #3149
- fixing default communication_data_type for bfloat16_enabled and docs by @clumsy in #3370
- Auto TP Tutorial with T5 Example by @molly-smith in #2962
- stage_1_and_2.py: do gradient scale only for fp16 by @guoyejun in #3166
- Fix memory leak in zero2 contiguous gradients by @hablb in #3306
- remove megatron-lm, no longer pip installable by @jeffra in #3389
- Fix pipeline module evaluation when contiguous activation checkpoin… by @hablb in #3005
- doc updates by @goodship1 in #3415
- Save tensors in context of memory_efficient_linear by @tohtana in #3413
- Add HE support for the rest of model containers by @RezaYazdaniAminabadi in #3191
- Update PyTorch Lightning/DeepSpeed examples links by @loadams in #3424
- Fix `PipelineEngine.eval_batch` result by @nrailgun in #3316
- OPT Activation Function Hotfix by @cmikeh2 in #3400
- Add ZeRO 1 support to PP for BF16. by @jomayeri in #3399
- [zero_to_fp32] fix shared param recovery by @stas00 in #3407
- Adagrad support in ZeRO by @jomayeri in #3401
- Update 2020-09-09-sparse-attention.md by @goodship1 in #3432
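The communication-data-type fix above (#3370) concerns which dtype collectives use when bf16 training is enabled. A minimal config sketch making the choice explicit rather than relying on the default (values illustrative):

```json
{
  "bf16": { "enabled": true },
  "communication_data_type": "bf16"
}
```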
New Contributors
- @dc3671 made their first contribution in #3092
- @kobindra made their first contribution in #3344
- @hablb made their first contribution in #3306
- @nrailgun made their first contribution in #3316
Full Changelog: v0.9.1...v0.9.2
v0.9.1: Patch release
What's Changed
- Update DS-Chat docs for v0.9.0 by @mrwyattii in #3216
- Update DeepSpeed-Chat docs with latest changes to scripts by @mrwyattii in #3219
- Nested zero.Init() and dynamically defined model class by @tohtana in #2989
- Update torch version check in building sparse_attn by @loadams in #3152
- Fix for Stable Diffusion by @mrwyattii in #3218
- [update] reference in cifar-10 by @dtunai in #3212
- [fp16/doc] correct initial_scale_power default value by @stas00 in #3275
- update link to PL docs by @Borda in #3237
- fix typo in autotuner.py by @eltociear in #3269
- improving int4 asymmetric quantization accuracy by @HeyangQin in #3190
- Update install.sh by @digger-yu in #3270
- Fix cupy install version detection by @mrwyattii in #3276
- [ROCm] temporary workaround till __double2half support enabled in HIP by @bmedishe in #3236
- Fix pydantic and autodoc_pydantic version to <2.0.0 until support is added. by @loadams in #3290
- Add contribution images to readme by @digger-yu in #3282
- remove `torch.cuda.is_available()` check when compiling ops by @jinzhen-lin in #3085
- Update MI200 workflow to install apex with changes from pip by @loadams in #3294
- Add pre-compiling ops test by @loadams in #3277
- Update README.md by @digger-yu in #3315
- Update Dockerfile to use python 3.6 specifically by @bobowwb in #3298
- zero3 checkpoint frozen params by @tjruwase in #3205
- Fix for dist not being initialized when constructing main config by @mrwyattii in #3324
- Fix missing scale attributes for GPTJ by @cmikeh2 in #3256
- Explicitly check for OPT activation function by @cmikeh2 in #3278
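The `initial_scale_power` documentation fix above (#3275) corrects the stated default of the fp16 loss-scaling config. A minimal fp16 section spelling out those knobs (values shown are the commonly documented defaults; verify against the current config docs):

```json
{
  "fp16": {
    "enabled": true,
    "initial_scale_power": 16,
    "hysteresis": 2
  }
}
```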
New Contributors
- @dtunai made their first contribution in #3212
- @Borda made their first contribution in #3237
- @digger-yu made their first contribution in #3270
- @bmedishe made their first contribution in #3236
- @jinzhen-lin made their first contribution in #3085
- @bobowwb made their first contribution in #3298
Full Changelog: v0.9.0...v0.9.1
DeepSpeed v0.9.0
What's Changed
- [docs] add MCR-DL paper to readme/docs by @Quentin-Anthony in #3066
- Several fixes to unblock CI by @loadams in #3047
- Assert mp_size is factor of model dimensions by @molly-smith in #2891
- [CI] follow-up fixes by @jeffra in #3072
- fix return prev key and value, added strides to from_blob by @mzusman in #2828
- Remove bf16 from inference config dtype enum by @molly-smith in #3010
- Softmax Scheduling Cleanup by @cmikeh2 in #3046
- Fix nebula in save_16bit_model issue by @FreyaRao in #3023
- Allow lists by @satpalsr in #3042
- Goodbye Torch 1.8 by @mrwyattii in #3082
- Empty ZeRO3 partition cache by @tjruwase in #3060
- pre-commit check for torch.cuda in code by @delock in #2981
- Move cuda check into utils by @loadams in #3074
- update yapf version and style settings by @jeffra in #3098
- Fix comms benchmark import issues and support MPI/slurm launching by @Quentin-Anthony in #2932
- Disable Stage 1&2 CPUAdam pathways by @mrwyattii in #3097
- ♻️ replace deprecated functions for communication by @mayank31398 in #2995
- Make fp32 default communication data type by @tjruwase in #2970
- Update DeepSpeed copyright license to Apache 2.0 by @mrwyattii in #3111
- Add Full Apache License by @mrwyattii in #3119
- VL MoE Blog by @yaozhewei in #3120
- Update SD triton version in requirements-sd.txt by @lekurile in #3135
- Fix launch issue by @tjruwase in #3137
- Fix CI badges by @mrwyattii in #3138
- Optimize Softmax Kernel by @molly-smith in #3112
- Use generic O_DIRECT by @tjruwase in #3115
- Enable autoTP for bloom by @sywangyi in #3035
- [cleanup] remove `pass` calls where they aren't needed by @stas00 in #2826
- [ci] `nv-transformers-v100` - use the same torch version as transformers CI by @stas00 in #3096
- Fixes code and tests skipping/asserting incorrectly on torch 2+. by @loadams in #3136
- fix example symlink about DeepSpeed+AzureML by @EeyoreLee in #3127
- Remove Extra Bracket by @VHellendoorn in #3101
- Recover shared parameters by @ShijieZZZZ in #3033
- Fix for Diffusers 0.14.0 by @molly-smith in #3142
- Fix copyright check, add copyright replace script by @mrwyattii in #3141
- Update curriculum-learning.md by @goodship1 in #3031
- Remove benchmark code by @mrwyattii in #3157
- fixing a bug in CPU Adam and Adagrad by @xiexbing in #3109
- op_builder: conditionally compute relative path for hip compiled files by @adammoody in #3095
- zero.Init() should pin params in GPU memory as requested by @tjruwase in #2953
- deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache by @guoyejun in #2803
- Add DeepSpeed-Chat Blogpost by @awan-10 in #3185
- [docs] add run command for 13b by @awan-10 in #3187
- add news item. by @awan-10 in #3188
- DeepSpeed Chat by @tjruwase in #3186
- Fix references to figures by @tohtana in #3189
- Fix typo by @zhouzaida in #3183
- Fix typo by @dawei-wang in #3164
- Chatgpt chinese blog by @yaozhewei in #3193
- Add Japanese version of ChatGPT-like pipeline blog by @tohtana in #3194
- fix hero figure by @conglongli in #3199
- feat: Add support for `NamedTuple` when sharding parameters [#3029] by @alexandervaneck in #3037
- fix license badge by @conglongli in #3200
- Update AMD workflows by @loadams in #3179
- [CPU support] Optionally bind each rank to different cores on host by @delock in #2881
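The CPU core-binding feature above (#2881) pins each local rank to a distinct contiguous slice of host cores. A sketch of the core-set computation only (the helper name is hypothetical; the real launcher option also accounts for NUMA domains and hyperthread siblings):

```python
def cores_for_rank(rank: int, cores_per_rank: int) -> set:
    """Contiguous core set a local rank would be pinned to."""
    start = rank * cores_per_rank
    return set(range(start, start + cores_per_rank))

# On Linux, the current process could then be pinned with e.g.:
#   os.sched_setaffinity(0, cores_for_rank(rank, cores_per_rank))
print(cores_for_rank(1, 4))  # → {4, 5, 6, 7}
```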
New Contributors
- @mzusman made their first contribution in #2828
- @FreyaRao made their first contribution in #3023
- @sywangyi made their first contribution in #3035
- @EeyoreLee made their first contribution in #3127
- @VHellendoorn made their first contribution in #3101
- @goodship1 made their first contribution in #3031
- @zhouzaida made their first contribution in #3183
- @dawei-wang made their first contribution in #3164
- @alexandervaneck made their first contribution in #3037
Full Changelog: v0.8.3...v0.9.0