DeepSpeed v0.9.0
New features
What's Changed
- [docs] add MCR-DL paper to readme/docs by @Quentin-Anthony in #3066
- Several fixes to unblock CI by @loadams in #3047
- Assert mp_size is factor of model dimensions by @molly-smith in #2891
- [CI] follow-up fixes by @jeffra in #3072
- fix return prev key and value, added strides to from_blob by @mzusman in #2828
- Remove bf16 from inference config dtype enum by @molly-smith in #3010
- Softmax Scheduling Cleanup by @cmikeh2 in #3046
- Fix nebula in save_16bit_model issue by @FreyaRao in #3023
- Allow lists by @satpalsr in #3042
- Goodbye Torch 1.8 by @mrwyattii in #3082
- Empty ZeRO3 partition cache by @tjruwase in #3060
- pre-commit check for torch.cuda in code by @delock in #2981
- Move cuda check into utils by @loadams in #3074
- update yapf version and style settings by @jeffra in #3098
- Fix comms benchmark import issues and support MPI/slurm launching by @Quentin-Anthony in #2932
- Disable Stage 1&2 CPUAdam pathways by @mrwyattii in #3097
- ♻️ replace deprecated functions for communication by @mayank31398 in #2995
- Make fp32 default communication data type by @tjruwase in #2970
- Update DeepSpeed copyright license to Apache 2.0 by @mrwyattii in #3111
- Add Full Apache License by @mrwyattii in #3119
- VL MoE Blog by @yaozhewei in #3120
- Update SD triton version in requirements-sd.txt by @lekurile in #3135
- Fix launch issue by @tjruwase in #3137
- Fix CI badges by @mrwyattii in #3138
- Optimize Softmax Kernel by @molly-smith in #3112
- Use generic O_DIRECT by @tjruwase in #3115
- Enable autoTP for bloom by @sywangyi in #3035
- [cleanup] remove `pass` calls where they aren't needed by @stas00 in #2826
- [ci] `nv-transformers-v100` - use the same torch version as transformers CI by @stas00 in #3096
- Fixes code and tests skipping/asserting incorrectly on torch 2+. by @loadams in #3136
- fix example symlink about DeepSpeed+AzureML by @EeyoreLee in #3127
- Remove Extra Bracket by @VHellendoorn in #3101
- Recover shared parameters by @ShijieZZZZ in #3033
- Fix for Diffusers 0.14.0 by @molly-smith in #3142
- Fix copyright check, add copyright replace script by @mrwyattii in #3141
- Update curriculum-learning.md by @goodship1 in #3031
- Remove benchmark code by @mrwyattii in #3157
- fixing a bug in CPU Adam and Adagrad by @xiexbing in #3109
- op_builder: conditionally compute relative path for hip compiled files by @adammoody in #3095
- zero.Init() should pin params in GPU memory as requested by @tjruwase in #2953
- deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache by @guoyejun in #2803
- Add DeepSpeed-Chat Blogpost by @awan-10 in #3185
- [docs] add run command for 13b by @awan-10 in #3187
- add news item. by @awan-10 in #3188
- DeepSpeed Chat by @tjruwase in #3186
- Fix references to figures by @tohtana in #3189
- Fix typo by @zhouzaida in #3183
- Fix typo by @dawei-wang in #3164
- Chatgpt chinese blog by @yaozhewei in #3193
- Add Japanese version of ChatGPT-like pipeline blog by @tohtana in #3194
- fix hero figure by @conglongli in #3199
- feat: Add support for `NamedTuple` when sharding parameters [#3029] by @alexandervaneck in #3037
- fix license badge by @conglongli in #3200
- Update AMD workflows by @loadams in #3179
- [CPU support] Optionally bind each rank to different cores on host by @delock in #2881
New Contributors
- @mzusman made their first contribution in #2828
- @FreyaRao made their first contribution in #3023
- @sywangyi made their first contribution in #3035
- @EeyoreLee made their first contribution in #3127
- @VHellendoorn made their first contribution in #3101
- @goodship1 made their first contribution in #3031
- @zhouzaida made their first contribution in #3183
- @dawei-wang made their first contribution in #3164
- @alexandervaneck made their first contribution in #3037
Full Changelog: v0.8.3...v0.9.0