From NVIDIA Megatron-LM for visibility #18

Open
wants to merge 3,624 commits into
base: multi-query-attention

Conversation

RaymondLi0
Collaborator

No description provided.

RaymondLi0 changed the base branch from multi-query-attention to before-merge on June 20, 2023, 20:12
RaymondLi0 changed the base branch from before-merge to multi-query-attention on June 20, 2023, 20:12
ko3n1g and others added 28 commits November 12, 2024 08:48
ci: Disable auto-format on forks

See merge request ADLR/megatron-lm!2337
NVLM tile tag support

See merge request ADLR/megatron-lm!2311
…nks and log warning in case of mismatch.

Co-authored-by: Shanmugam Ramasamy <[email protected]>
Check common state dict consistency across ranks and log warning in case of mismatch.

See merge request ADLR/megatron-lm!2085
Llava pp > 0 fixes

See merge request ADLR/megatron-lm!2267
Rename optimizer's model_parallel_group -> grad_stats_parallel_group.

See merge request ADLR/megatron-lm!2240
Co-authored-by: Deepak Narayanan <[email protected]>
Co-authored-by: Oliver Koenig <[email protected]>
Co-authored-by: James Shen <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Keshav Santhanam <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Add support for PyTorch FSDP-2

See merge request ADLR/megatron-lm!2150
Update simple_text_generation_controller.py

See merge request ADLR/megatron-lm!2345
…oder, encoder-decoder) to be compatible with all 3 TE backends

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: root <[email protected]>
Updating all T5 attention masks (encoder, decoder, encoder-decoder) to be compatible with all 3 TE backends

See merge request ADLR/megatron-lm!2273
Co-authored-by: root <[email protected]>
Add hierarchical cp comm group

See merge request ADLR/megatron-lm!2279
Add missing arg to save_checkpoint call

See merge request ADLR/megatron-lm!2351
NVLM example scripts

See merge request ADLR/megatron-lm!2306
ci: Re-enable llava tests

See merge request ADLR/megatron-lm!2348
ci: Retry download assets

See merge request ADLR/megatron-lm!2357
Support etp==tp when epp==0 and enforce torch ckpt-format when epp>1

See merge request ADLR/megatron-lm!2260
Matthieu Le and others added 30 commits December 23, 2024 11:19
Move mmodal evaluation code to its own folder

See merge request ADLR/megatron-lm!2491
Updating T5 codes to fix bugs

See merge request ADLR/megatron-lm!2471
ci: Add memory consumption to tests

See merge request ADLR/megatron-lm!2467
Reuse optimizer's main_params to compute param norm in a memory-efficient way

See merge request ADLR/megatron-lm!2483
Add NeMo MoE test.

See merge request ADLR/megatron-lm!2460
ci: Move most of LTS tests to nightly

See merge request ADLR/megatron-lm!2496
Video training

See merge request ADLR/megatron-lm!2500
ci: Update golden values of nightlies

See merge request ADLR/megatron-lm!2511
Make generate function only return results for newly added requests

See merge request ADLR/megatron-lm!2370
ci: Use torchrun

See merge request ADLR/megatron-lm!2507
chore: Fix local generator script

See merge request ADLR/megatron-lm!2519
Co-authored-by: William Dykas <[email protected]>
Co-authored-by: Mcore Bot <[email protected]>
Co-authored-by: root <[email protected]>
Fix log probs output for inference

See merge request ADLR/megatron-lm!2430
Add tests for MoE models with average_in_collective=True

See merge request ADLR/megatron-lm!2489
ci: Allow running nemo-ci

See merge request ADLR/megatron-lm!2509
ci: Fail-fast on unit tests

See merge request ADLR/megatron-lm!2520