deepspeed-chat: filter stage3 too long prompts #782
Conversation
Hi @mosheisland, thank you for the contribution! The reasoning behind this PR makes sense to me, but I'd be curious about the implications for training convergence on the OPT model. We've previously done a full OPT Step 3 sweep across various configurations and found training to converge in all tested cases. We have a sweeping script to make characterization simple. I'd be very curious about the effect of this prompt change on convergence; I don't expect it to necessarily be negative, but it is probably worth running nonetheless. Thanks,
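For readers unfamiliar with such a sweep, here is a minimal sketch of what a configuration sweep over Step 3 could look like. It is illustrative only, not the actual sweep script mentioned above; the `--actor_zero_stage` and `--critic_zero_stage` flags exist in DeepSpeed-Chat's step3 `main.py`, but the sweep dimensions and paths here are assumptions:

```python
# Illustrative sweep driver (NOT the actual DeepSpeed-Chat sweep script):
# launch step3 training across a few ZeRO-stage combinations and save a
# log per configuration for later comparison of reward curves.
import itertools
import subprocess

# Assumed sweep dimensions; the real sweep may cover more configurations.
actor_zero_stages = [2, 3]
critic_zero_stages = [2, 3]

for actor_zs, critic_zs in itertools.product(actor_zero_stages, critic_zero_stages):
    tag = f"actor_zs{actor_zs}_critic_zs{critic_zs}"
    cmd = [
        "deepspeed", "main.py",
        "--actor_zero_stage", str(actor_zs),
        "--critic_zero_stage", str(critic_zs),
        "--output_dir", f"./output_{tag}",
    ]
    # Capture stdout/stderr per run so each configuration can be inspected.
    with open(f"sweep_{tag}.log", "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
```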
Force-pushed from 06a94f8 to ad55abc.
Hi @lekurile, I have run the sweep with and without this PR.

@lekurile, please note that the sweep test is currently broken due to commit:

Please also note that the number of training steps with this PR is smaller than without it, since prompts that are too long are now dropped from the dataset.

DeepSpeed versions used for testing:

Instead of 16xV100, I used 8xA100-80G.

RESULTS

Following are the GENERIC sweep EMA reward results:

Following are the MPL sweep EMA reward results:

Following are TensorBoard images of the GENERIC sweep runs before this PR and with this PR, and of the MPL sweep runs before this PR and with this PR:
Thanks for the amazing work, @mosheisland! Appreciate the thoroughness. The data looks good and shows that training is still very stable. I'll approve the PR and run the tests. Thanks,
In case stage3 prompts are too long, the prompts are still used, but they are arbitrarily sliced at the start to fit into the configured max prompt length. This arbitrary slicing sometimes makes prompts less meaningful, which in turn causes the generator to generate garbage. This phenomenon was observed to destabilize RLHF stage3. To fix it, we filter out prompts that are too long.

In addition, the dataset rebuild flag is propagated to the other consumers that require it. Note that since generated datasets are cached on disk, this commit takes effect only after the cached step3 datasets are cleaned up.

Change-Id: I440f09decf0784e4c2c8167a893006dff312281b
Signed-off-by: Moshe Island <[email protected]>
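For illustration, a minimal sketch of the filtering idea follows. This is not the exact DeepSpeed-Chat code; the `filter_long_prompts` helper, the tokenizer choice, and the example threshold are assumptions for the example:

```python
# Hypothetical sketch of the fix in this PR: drop prompts whose tokenized
# length exceeds the configured maximum, instead of slicing them to fit.
from transformers import AutoTokenizer

def filter_long_prompts(prompts, tokenizer, max_prompt_seq_len):
    """Keep only prompts that fit within max_prompt_seq_len tokens."""
    kept = []
    for prompt in prompts:
        token_ids = tokenizer(prompt)["input_ids"]
        if len(token_ids) <= max_prompt_seq_len:
            kept.append(prompt)  # short enough: keep the prompt intact
        # else: drop the prompt entirely rather than slice it arbitrarily
    return kept

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
prompts = ["Human: Please summarize this article...\nAssistant:"]
filtered = filter_long_prompts(prompts, tokenizer, max_prompt_seq_len=256)
```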
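Because the generated datasets are cached on disk, a cleanup like the following is needed before the new filtering takes effect. The cache location below is an assumption (the path given by `--data_output_path` during training, `/tmp/data_files` by default); adjust it to your setup:

```python
# Hedged cleanup helper: remove cached step3 datasets so the filtering is
# applied on the next run. The default path here is an assumption.
import shutil
from pathlib import Path

def cleanup_cached_datasets(data_output_path="/tmp/data_files"):
    cache_dir = Path(data_output_path)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)  # delete all cached dataset files
        print(f"Removed cached datasets under {cache_dir}")
    else:
        print(f"No cache found at {cache_dir}")

if __name__ == "__main__":
    cleanup_cached_datasets()
```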