11 implement moe #116
base: main
Conversation
Great work! Well done! :)
For me, the tests pass locally, but GitHub Actions seems to have a problem installing modalities for the tests.
When running pre-commit run --all-files locally, two files get altered. Please fix this.
A clarification question:
In this initial MoE implementation, we could face inefficient training where only a few lucky experts get trained intensively, since no auxiliary loss, expert capacity, or similar technique is implemented yet, right?
E.g. as mentioned here: https://huggingface.co/blog/moe#load-balancing-tokens-for-moes
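For reference, a minimal sketch of a Switch-Transformer-style auxiliary load-balancing loss that could address this; the names and shapes are illustrative and not taken from this PR:

import torch
import torch.nn.functional as F


def load_balancing_loss(router_probs: torch.Tensor, top_experts: torch.LongTensor, num_experts: int) -> torch.Tensor:
    """Switch-Transformer-style auxiliary loss: num_experts * sum_e f_e * P_e,
    where f_e is the fraction of tokens routed to expert e and P_e is the
    mean router probability assigned to expert e."""
    # Fraction of (token, slot) assignments dispatched to each expert (hard top-k assignment).
    expert_mask = F.one_hot(top_experts.reshape(-1), num_classes=num_experts).float()
    tokens_per_expert = expert_mask.mean(dim=0)  # f_e
    # Mean softmax probability mass the router puts on each expert.
    router_prob_per_expert = router_probs.mean(dim=0)  # P_e
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)

Adding this term (scaled by a small coefficient) to the language-modeling loss pushes the router toward spreading tokens more evenly across experts.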
# if poe_type is not PositionTypes.NOPE and RotaryTransform in [
#     config.type_hint.value for config in attention_config.qkv_transforms
# ]:
#     raise ValueError('It is expected to use "RotaryTransform" together with "NOPE".')
Can this be deleted?
We had commented out this part during development; I re-added the check in the code.
self.layer = nn.Linear(self.hidden_size, self.moe_num_experts, bias=False)

def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.LongTensor]:
    if self.training and self.moe_jitter_eps is not None:
Would it be possible to add a small section in the README, or here as part of the docs, explaining what the different flags do and what their purpose is? I.e. jitter, normalization of expert weights, and uniform expert assignment.
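For illustration, a rough sketch of what these three flags typically control in routers of this kind; this is an assumption based on common MoE router implementations (e.g. DBRX's), not a description of this PR's code:

import torch

# jitter: multiplicative noise on the router input, applied only during training,
# to discourage the router from collapsing onto a few experts.
def apply_jitter(x: torch.Tensor, moe_jitter_eps: float) -> torch.Tensor:
    return x * torch.empty_like(x).uniform_(1.0 - moe_jitter_eps, 1.0 + moe_jitter_eps)

# normalization of expert weights: rescale each token's top_k routing weights by their
# p-norm, so that (for p=1) the kept weights form a proper convex combination.
def normalize_top_weights(top_weights: torch.Tensor, p: float) -> torch.Tensor:
    return top_weights / top_weights.norm(p=p, dim=-1, keepdim=True)

# uniform expert assignment: debugging override that ignores the router and assigns
# experts round-robin, so every expert sees roughly the same number of tokens.
def uniform_assignment(top_experts: torch.LongTensor, num_experts: int) -> torch.LongTensor:
    return torch.arange(top_experts.numel(), device=top_experts.device).remainder(num_experts).view_as(top_experts)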
x.view(-1, x.shape[-1])
is the information basis for the routing. What does it translate to? It is not an aggregate of embeddings, but seems only to merge away the first dimension. Since the first dimension is usually the batch size, we get a resulting matrix of size [batch_size * context_length, hidden_size]. We do this because we must select top_k experts for each token, so that each expert ends up processing a bunch of contextualized embeddings. One could imagine it as a "gap text" for each expert, with different, potentially overlapping gaps and non-gaps between the gap texts.
I think so too, and I really liked the "gap text" analogy/explanation, a nice way of visualizing what's going on. Plus, I think the first dimension is more being combined with the sequence length (which sounds more accurate than "getting rid of", I guess) to create a batch of sequence elements, each of which is then independently routed to the experts.
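A small sketch of the routing view being discussed; the router layer, top_k, and the concrete shapes are illustrative assumptions following the snippet quoted above:

import torch

batch_size, context_length, hidden_size, num_experts, top_k = 2, 4, 8, 4, 2
x = torch.randn(batch_size, context_length, hidden_size)

# Merge batch and sequence dimensions: every token becomes an independent routing unit.
flat_x = x.view(-1, x.shape[-1])  # [batch_size * context_length, hidden_size]

router = torch.nn.Linear(hidden_size, num_experts, bias=False)
weights = torch.softmax(router(flat_x), dim=-1)          # [num_tokens, num_experts]
top_weights, top_experts = weights.topk(top_k, dim=-1)   # top_k experts per token
print(flat_x.shape, top_weights.shape, top_experts.shape)
# torch.Size([8, 8]) torch.Size([8, 2]) torch.Size([8, 2])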
The code was very readable, especially with the tests you added. We left comments regarding implementation changes.
Besides that, MoEExperts.forward() is still unclear to us. It would be nice to go through the code there together ;-)
)

def forward(self, x: torch.Tensor, top_weights: torch.Tensor, top_experts: torch.LongTensor) -> torch.Tensor:
    bsz, q_len, hidden_size = x.shape
Suggested change:
- bsz, q_len, hidden_size = x.shape
+ batch_size, sequence_length, hidden_size = x.shape
    hidden_size=hidden_size, ffn_hidden_size=ffn_hidden_size, moe_num_experts=moe_num_experts, act_fn=act_fn
)

def forward(self, x: torch.Tensor, top_weights: torch.Tensor, top_experts: torch.LongTensor) -> torch.Tensor:
It would be nice to go through this together. We were just debugging but did not get the full picture of what happens here.
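In the meantime, a rough sketch of what a forward with this signature typically does, assuming a simple loop over experts; the expert modules and all names below are illustrative, not the PR's actual implementation:

import torch


def dispatch_to_experts(
    x: torch.Tensor,                # [batch_size, sequence_length, hidden_size]
    top_weights: torch.Tensor,      # [num_tokens, top_k] routing weights
    top_experts: torch.LongTensor,  # [num_tokens, top_k] expert indices
    experts: torch.nn.ModuleList,   # one MLP per expert (illustrative)
) -> torch.Tensor:
    batch_size, sequence_length, hidden_size = x.shape
    flat_x = x.view(-1, hidden_size)  # [num_tokens, hidden_size]
    out = torch.zeros_like(flat_x)
    for expert_idx, expert in enumerate(experts):
        # Which (token, slot) pairs were routed to this expert?
        token_idx, slot_idx = torch.where(top_experts == expert_idx)
        if token_idx.numel() == 0:
            continue
        # Run only those tokens through the expert, weight the result, and
        # accumulate it back at the original token positions.
        expert_out = expert(flat_x[token_idx])
        out.index_add_(0, token_idx, expert_out * top_weights[token_idx, slot_idx].unsqueeze(-1))
    return out.view(batch_size, sequence_length, hidden_size)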
):
    super().__init__()
    self.moe_num_experts = moe_num_experts
    self.mlp = MoEExpertGLU(
Is mlp a good variable name here? It is also used in gpt2_model.py for MoEFFN.
Totally agreed, that name is probably not the best. We just wanted to be consistent with the names already used, as you can see here:
self.mlp = TransformerMLP(n_embd=n_embd, ffn_hidden=ffn_hidden, bias=bias, dropout=dropout)
…ling for NOPE and RotaryTransform to work even when attention_config is not passed
…nt config files for MoE inference run
This has been reviewed as a peer review. Nice work!