v0.6.1: Patch release #3040
hiyouga announced in Announcements
This patch mainly fixes #2983.
In commit 9bec3c9, we built the optimizer and scheduler inside the trainers, which inadvertently introduced a bug. When DeepSpeed was enabled, the trainers in transformers would build an optimizer and scheduler before calling the `create_optimizer_and_scheduler` method [1]. The optimizer created by our method would then overwrite the original one, while the scheduler would not be overwritten. Consequently, the scheduler no longer affected the learning rate of the optimizer actually in use, leading to a regression in the training results. We have fixed this bug in 3bcd41b and 8c77b10. Thanks to @HideLord for helping us identify this critical bug.

[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881
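The failure mode described above can be illustrated with a minimal, library-free sketch. It is not the code from the affected commits; `ToyOptimizer` and `ToyLinearDecay` are hypothetical stand-ins that only mimic the idea that a scheduler holds a reference to one specific optimizer and rewrites that optimizer's learning rate.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer: stores a learning rate."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]


class ToyLinearDecay:
    """Hypothetical stand-in for a scheduler.

    It keeps a reference to ONE optimizer and rewrites that
    optimizer's learning rate on every call to step().
    """

    def __init__(self, optimizer, base_lr, total_steps):
        self.optimizer = optimizer
        self.base_lr = base_lr
        self.total_steps = total_steps
        self.step_count = 0

    def step(self):
        self.step_count += 1
        remaining = max(0, self.total_steps - self.step_count)
        for group in self.optimizer.param_groups:
            group["lr"] = self.base_lr * remaining / self.total_steps


# First optimizer/scheduler pair, built together.
original_optimizer = ToyOptimizer(lr=1.0)
scheduler = ToyLinearDecay(original_optimizer, base_lr=1.0, total_steps=4)

# A second optimizer replaces the first one for training,
# but the scheduler is left pointing at the original.
active_optimizer = ToyOptimizer(lr=1.0)

for _ in range(2):
    scheduler.step()

stale_lr = original_optimizer.param_groups[0]["lr"]  # decayed by the scheduler
active_lr = active_optimizer.param_groups[0]["lr"]   # never touched
print(stale_lr, active_lr)
```

In this sketch the learning rate of the optimizer actually used for training stays at its initial value, because the scheduler only updates the optimizer it was constructed with. Building the scheduler from the final optimizer avoids the mismatch.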
We have also fixed #2961, #2981, #2982, #2983, #2991 and #3010.
This discussion was created from the release v0.6.1: Patch release.