deepspeed-chat: fix incorrect lr when using lora only (#756)
When using LoRA only, get_optimizer_grouped_parameters() returns a list of 3
parameter groups, of which only the second is non-empty.
DeepSpeed then removes the empty parameter groups.
[ref: DeepSpeedEngine._configure_optimizer(), DeepSpeed v0.10.3]
However, the lr_scheduler still contains 3 groups, so it updates the
LoRA params with the wrong lr.

Fix it by removing all empty groups in get_optimizer_grouped_parameters().

Change-Id: I520841312bdedd6a572cf4c827e0bbf06f983575

Signed-off-by: Moshe Island <[email protected]>
Co-authored-by: Moshe Island <[email protected]>
mosheisland authored Oct 3, 2023
1 parent 4bf1924 commit ca03bd7
Showing 1 changed file with 6 additions and 3 deletions.
applications/DeepSpeed-Chat/training/utils/utils.py

@@ -209,9 +209,12 @@ def get_optimizer_grouped_parameters(
             0.0,
         },
     ]
-    if not optimizer_grouped_parameters[1]["params"]:
-        optimizer_grouped_parameters.pop(1)
-    return optimizer_grouped_parameters
+
+    non_empty_groups = []
+    for group in optimizer_grouped_parameters:
+        if group["params"]:
+            non_empty_groups.append(group)
+    return non_empty_groups
 
 
 def _z3_params_to_fetch(param_list):
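The effect of the patch can be sketched standalone: filter out any group whose
"params" list is empty, so the optimizer and the lr_scheduler end up agreeing
on the group count. The group contents and weight_decay values below are toy
stand-ins, not the repository's actual output:

```python
def drop_empty_groups(optimizer_grouped_parameters):
    # Keep only groups that actually contain parameters, mirroring
    # DeepSpeed's removal of empty groups in _configure_optimizer().
    return [g for g in optimizer_grouped_parameters if g["params"]]

# Toy stand-ins for the three groups built by
# get_optimizer_grouped_parameters() in the LoRA-only case:
groups = [
    {"params": [], "weight_decay": 0.01},                    # frozen base weights
    {"params": ["lora_A", "lora_B"], "weight_decay": 0.0},   # LoRA params
    {"params": [], "weight_decay": 0.0},                     # no-decay params
]

filtered = drop_empty_groups(groups)
print(len(filtered))  # only the LoRA group survives
```

With the filtering done inside get_optimizer_grouped_parameters(), the
scheduler is constructed from the already-pruned list and its per-group lrs
line up with the optimizer's groups.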
