Support megatron 0.6 in veRL #85

Chendong98 · 2025-01-07T12:36:51Z

I am opening this PR to add Megatron 0.6 support to veRL (although I noticed that the veRL paper seems to have already used Megatron 0.6 as the test version). From my naive perspective, there are two possible approaches:

1. Communicate at the parameter level.
2. Create a MemoryBuffer in veRL that is fully aligned with the ParamAndGradBuffer in Megatron 0.6, then perform broadcast and other communication operations on this buffer.

In the current draft, when self._pp_rank == pp_rank, the code directly uses the buffer defined in Megatron 0.6 (without even checking whether use_distributed_optimizer is set) and communicates at the parameter level during parameter synchronization. This, of course, incurs some performance overhead.

At the very least, this approach seems feasible.
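
To make the trade-off between the two approaches concrete, here is a minimal, self-contained sketch. It is not code from this PR and does not use Megatron or torch.distributed: `FlatParamBuffer` and `CountingComm` are hypothetical stand-ins, and the "broadcast" is just an in-process copy that counts how many collectives would be issued. It only illustrates that parameter-level sync issues one call per parameter, while an aligned flat buffer can be sent in a single call.

```python
from array import array
from math import prod


class FlatParamBuffer:
    """Toy contiguous buffer; a stand-in, not Megatron's ParamAndGradBuffer."""

    def __init__(self, shapes):
        # shapes maps parameter name -> shape tuple; insertion order fixes the layout.
        self.index = {}
        offset = 0
        for name, shape in shapes.items():
            numel = prod(shape)
            self.index[name] = (offset, offset + numel)
            offset += numel
        self.data = array("f", [0.0] * offset)

    def view(self, name):
        # Writable view into the flat storage; no copy is made.
        start, end = self.index[name]
        return memoryview(self.data)[start:end]


class CountingComm:
    """Hypothetical in-process 'broadcast' that only copies and counts calls."""

    def __init__(self):
        self.calls = 0

    def broadcast(self, src, dst):
        self.calls += 1
        dst[:] = src


def sync_per_parameter(comm, src, dst):
    # Approach 1: one collective per parameter.
    for name in src.index:
        comm.broadcast(src.view(name), dst.view(name))


def sync_flat(comm, src, dst):
    # Approach 2: a single collective, valid only if both layouts match.
    if src.index != dst.index:
        raise ValueError("buffer layouts are not aligned")
    comm.broadcast(memoryview(src.data), memoryview(dst.data))
```

With two parameters, `sync_per_parameter` records two calls and `sync_flat` records one. The layout check in `sync_flat` is the part that corresponds to keeping the veRL buffer aligned with the Megatron one.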

Signed-off-by: chendong-1998 <[email protected]>

PeterSH6 · 2025-01-08T06:47:25Z

This looks really nice. We'll take some time to check how to align the two buffers to accelerate the resharding process.

PeterSH6 · 2025-01-08T06:49:23Z

Another question: are there any issues in MCore 0.6?
If not, we may not need to patch upstream Megatron anymore.

Chendong98 added 2 commits January 7, 2025 17:29

support megatron-0.6 in verl

6f37c68

Signed-off-by: chendong-1998 <[email protected]>

small fix

9c55d46

Signed-off-by: chendong-1998 <[email protected]>

Wodswos mentioned this pull request Jan 7, 2025

Actor model didn't update correctly when upgrade megatron to core-r0.6.0 #64

Open
