I am opening this PR in the hope of adding Megatron 0.6 support to veRL (although I noticed that the veRL paper seems to have already used Megatron 0.6 as its test version). From my naive perspective, I envision two possible approaches:
In the current draft, when self._pp_rank == pp_rank, the code directly uses the buffer defined in Megatron 0.6 (without even checking whether use_distributed_optimizer is set) and communicates at the parameter level during parameter synchronization. This, of course, incurs some performance overhead. At the very least, this approach seems feasible.
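To make the overhead of parameter-level communication concrete, here is a toy sketch that compares one collective call per parameter with a single call over a flattened buffer. It is not Megatron or veRL code: FakeComm, sync_per_parameter, and sync_flat_buffer are hypothetical names invented for illustration, and the "broadcast" only counts calls instead of moving data between ranks.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FakeComm:
    """Hypothetical stand-in for a collective backend; it only counts calls."""
    calls: int = 0
    elements: int = 0

    def broadcast(self, payload: List[float]) -> List[float]:
        # A real backend would send the payload to the other ranks here.
        self.calls += 1
        self.elements += len(payload)
        return list(payload)


def sync_per_parameter(params: Dict[str, List[float]], comm: FakeComm) -> Dict[str, List[float]]:
    """Parameter-level sync: one collective call for every parameter."""
    return {name: comm.broadcast(values) for name, values in params.items()}


def sync_flat_buffer(params: Dict[str, List[float]], comm: FakeComm) -> Dict[str, List[float]]:
    """Buffer-level sync: pack all parameters, send once, then unpack."""
    flat: List[float] = []
    offsets: Dict[str, Tuple[int, int]] = {}
    for name, values in params.items():
        offsets[name] = (len(flat), len(flat) + len(values))
        flat.extend(values)
    received = comm.broadcast(flat)
    return {name: received[start:end] for name, (start, end) in offsets.items()}


if __name__ == "__main__":
    params = {"weight": [1.0, 2.0, 3.0], "bias": [4.0]}
    per_param, flat = FakeComm(), FakeComm()
    sync_per_parameter(params, per_param)
    sync_flat_buffer(params, flat)
    print("per-parameter calls:", per_param.calls)
    print("flat-buffer calls:", flat.calls)
```

Both strategies move the same number of elements; the difference is the number of collective calls, which is where the extra latency of the parameter-level path would come from in a real run.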