From NVIDIA Megatron-LM for visibility #18

RaymondLi0 · 2023-01-24T20:01:13Z

No description provided.

Co-authored-by: Deepak Narayanan <[email protected]> Co-authored-by: Oliver Koenig <[email protected]> Co-authored-by: James Shen <[email protected]> Co-authored-by: Kirthi Shankar Sivamani <[email protected]> Co-authored-by: Keshav Santhanam <[email protected]> Co-authored-by: jasonwan <[email protected]>

Add support for PyTorch FSDP-2 See merge request ADLR/megatron-lm!2150

RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12

RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12

ko3n1g and others added 28 commits November 12, 2024 08:48

ADLR/megatron-lm!2337 - ci: Disable auto-format on forks

8666fdb

Merge branch 'ko3n1g/ci/fix-auto-format-forks' into 'main'

aded519

ci: Disable auto-format on forks See merge request ADLR/megatron-lm!2337

ADLR/megatron-lm!2311 - NVLM tile tag support

b94bbb4

Merge branch 'trintamaki/nvlm-tile-tag' into 'main'

0e29f58

NVLM tile tag support See merge request ADLR/megatron-lm!2311

ADLR/megatron-lm!2085 - Check common state dict consistancy across ranks and log warning in case of mismatch.

2e7030e

Co-authored-by: Shanmugam Ramasamy <[email protected]>

Merge branch 'dist_common_fix' into 'main'

64cbae5

Check common state dict consistancy across ranks and log warning in case of mismatch. See merge request ADLR/megatron-lm!2085

ADLR/megatron-lm!2267 - Llava pp > 0 fixes

ff790ad

Merge branch 'trintamaki/llava-pp-fixes' into 'main'

00e76ee

Llava pp > 0 fixes See merge request ADLR/megatron-lm!2267

ADLR/megatron-lm!2240 - Rename optimizer's model_parallel_group -> grad_stats_parallel_group.

26b8b64

Merge branch 'lmcafee/distopt-doc-oct24' into 'main'

ae9c141

Rename optimizer's model_parallel_group -> grad_stats_parallel_group. See merge request ADLR/megatron-lm!2240

Merge branch 'boxiangw/fsdp2' into 'main'

4c4215f

Add support for PyTorch FSDP-2 See merge request ADLR/megatron-lm!2150

ADLR/megatron-lm!2345 - Update simple_text_generation_controller.py

229e225

Merge branch 'shanmugamr-main-patch-24278' into 'main'

8e22e5b

Update simple_text_generation_controller.py See merge request ADLR/megatron-lm!2345

ADLR/megatron-lm!2273 - Updating all T5 attention masks (encoder, decoder, encoder-decoder) to be compatible with all 3 TE backends

c1728c1

Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: root <[email protected]>

Merge branch 'huvu/update_t5_attentionmasktype' into 'main'

2163865

Updating all T5 attention masks (encoder, decoder, encoder-decoder) to be compatible with all 3 TE backends See merge request ADLR/megatron-lm!2273

ADLR/megatron-lm!2279 - Add hierarchical cp comm group

645c329

Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: root <[email protected]>

Merge branch 'add_hierarchical_cp_comm_group' into 'main'

2bdc60c

Add hierarchical cp comm group See merge request ADLR/megatron-lm!2279

ADLR/megatron-lm!2351 - Add missing arg to save_checkpoint call

8b72751

Merge branch 'jbarker-main-patch-72619' into 'main'

63b8520

Add missing arg to save_checkpoint call See merge request ADLR/megatron-lm!2351

ADLR/megatron-lm!2306 - NVLM example scripts

4131b07

Merge branch 'trintamaki/nvlm-example-scripts' into 'main'

ce507ee

NVLM example scripts See merge request ADLR/megatron-lm!2306

ADLR/megatron-lm!2348 - ci: Re-enable llava tests

9e9d4f5

Merge branch 'ko3n1g/ci/re-enable-mm-tests' into 'main'

6c88bfc

ci: Re-enable llava tests See merge request ADLR/megatron-lm!2348

ADLR/megatron-lm!2357 - ci: Retry download assets

06c67b4

Merge branch 'ko3n1g/ci/retry-download' into 'main'

5438d15

ci: Retry download assets See merge request ADLR/megatron-lm!2357

ADLR/megatron-lm!2260 - Support etp==tp when epp==0 and enforce torch ckpt-format when epp>1

57ed924

Co-authored-by: Jon Barker <[email protected]>

Merge branch 'jbarker/etp_equals_tp' into 'main'

0f389f2

Support etp==tp when epp==0 and enforce torch ckpt-format when epp>1 See merge request ADLR/megatron-lm!2260

Matthieu Le and others added 30 commits December 23, 2024 11:19

ADLR/megatron-lm!2491 - Move mmodal evaluation code to its own folder

e51a3ac

Merge branch 'mmodal_eval_in_folder' into 'main'

2da43ef

Move mmodal evaluation code to its own folder See merge request ADLR/megatron-lm!2491

ADLR/megatron-lm!2471 - Updating T5 codes to fix bugs

48103f4

Co-authored-by: Huy Vu2 <[email protected]>

Merge branch 'huvu/t5_fixes_updates' into 'main'

076972e

Updating T5 codes to fix bugs See merge request ADLR/megatron-lm!2471

ADLR/megatron-lm!2467 - ci: Add memory consumption to tests

9238a5e

Merge branch 'ko3n1g/tests/add-memory-consumption' into 'main'

24e0126

ci: Add memory consumption to tests See merge request ADLR/megatron-lm!2467

ADLR/megatron-lm!2483 - Reuse optimizer's main_params to compute param norm in a memory-efficient way

079dc66

Merge branch 'dnarayanan/fix_param_norm_memory_main' into 'main'

f682bd0

Reuse optimizer's main_params to compute param norm in a memory-efficient way See merge request ADLR/megatron-lm!2483

ADLR/megatron-lm!2460 - Add NeMo MoE test.

a6ba070

Co-authored-by: Oliver Koenig <[email protected]>

Merge branch 'denliu/moe_nemo_test' into 'main'

30ffe88

Add NeMo MoE test. See merge request ADLR/megatron-lm!2460

ADLR/megatron-lm!2496 - ci: Move most of LTS tests to nightly

47b8470

Merge branch 'ko3n1g/ci/prune-tests' into 'main'

2d7c521

ci: Move most of LTS tests to nightly See merge request ADLR/megatron-lm!2496

ADLR/megatron-lm!2500 - Video training

82a6dfd

Merge branch 'video_training' into 'main'

86e5481

Video training See merge request ADLR/megatron-lm!2500

ADLR/megatron-lm!2511 - ci: Update golden values of nightlies

c383fe9

Merge branch 'ko3n1g/ci/update-nightlies' into 'main'

15517f6

ci: Update golden values of nightlies See merge request ADLR/megatron-lm!2511

ADLR/megatron-lm!2370 - Make generate function only return results for newly added requests

342e359

Merge branch 'generate_fix' into 'main'

df28200

Make generate function only return results for newly added requests See merge request ADLR/megatron-lm!2370

ADLR/megatron-lm!2507 - ci: Use torchrun

6e09dd4

Merge branch 'ko3n1g/ci/use-torchrun' into 'main'

ab171c5

ci: Use torchrun See merge request ADLR/megatron-lm!2507

ADLR/megatron-lm!2519 - chore: Fix local generator script

c8d12e6

Merge branch 'ko3n1g/chore/fix-local-generator-script' into 'main'

65720c8

chore: Fix local generator script See merge request ADLR/megatron-lm!2519

ADLR/megatron-lm!2430 - Fix log probs output for inference

5ff34d0

Co-authored-by: William Dykas <[email protected]> Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: William Dykas <[email protected]> Co-authored-by: root <[email protected]>

Merge branch 'wdykas/fix-logprobs' into 'main'

4dc8977

Fix log probs output for inference See merge request ADLR/megatron-lm!2430

ADLR/megatron-lm!2489 - Add tests for MoE models with average_in_collective=True

c99a5fe

Co-authored-by: Oliver Koenig <[email protected]>

Merge branch 'add_test_for_average_in_collective_ddp' into 'main'

ad41174

Add tests for MoE models with average_in_collective=True See merge request ADLR/megatron-lm!2489

ADLR/megatron-lm!2509 - ci: Allow running nemo-ci

6ce0da5

Merge branch 'ko3n1g/ci/run-nemo-ci' into 'main'

05780f3

ci: Allow running nemo-ci See merge request ADLR/megatron-lm!2509

ADLR/megatron-lm!2520 - ci: Fail-fast on unit tests

9220838

Merge branch 'ko3n1g/ci/fail-fast-unit-tests' into 'main'

1ce944c

ci: Fail-fast on unit tests See merge request ADLR/megatron-lm!2520
