[BUG] Is conversion and training supported for the DeepSeek V2 (MoE) series? #166
Comments
Currently, ChatLearn does not support conversion and training for the DeepSeek V2 MoE model. You can refer to the WIP PR #95 that adds Mixtral training support. Support for DeepSeek V2 MoE will be added to the development plan once the Mixtral adaptation is complete. We also welcome community PRs that add SFT, reward, and alignment training support for DeepSeek V2 MoE.
Compared with Mixtral, what are the main development points, and along which dimensions would adaptation changes be needed?
For example, DeepSeek V2 has modules that Mixtral does not, such as MLA and shared experts. Because the tensor layout on the vLLM side differs from Megatron's, one development point is how to synchronize these modules into the vLLM model and support parameter synchronization between the training side and the inference side under different parallelism strategies.
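To make this development point concrete, below is a minimal sketch of re-sharding one column-parallel weight (e.g. a shared-experts up-projection) from a training-side tensor-parallel degree to a different inference-side degree. This is not ChatLearn's actual implementation; the function name, shapes, and TP degrees are illustrative assumptions.

```python
# Hypothetical sketch: re-shard a column-parallel weight between different
# tensor-parallel degrees (e.g. Megatron training TP=4 -> vLLM serving TP=2).
# Not ChatLearn's real API; names and shapes are illustrative assumptions.
import torch

def reshard_column_parallel(train_shards, infer_tp_size):
    """Concatenate training-side shards along the output dim, then
    re-split them for the inference-side tensor-parallel degree."""
    # Column-parallel linear weights are split along dim 0 (output features).
    full_weight = torch.cat(train_shards, dim=0)
    assert full_weight.shape[0] % infer_tp_size == 0
    return list(torch.chunk(full_weight, infer_tp_size, dim=0))

# Example: a shared-experts up-projection trained with TP=4, served with TP=2.
train_tp, infer_tp = 4, 2
hidden, ffn = 1024, 4096
train_shards = [torch.randn(ffn // train_tp, hidden) for _ in range(train_tp)]
infer_shards = reshard_column_parallel(train_shards, infer_tp)
assert infer_shards[0].shape == (ffn // infer_tp, hidden)
```

A real synchronization path would additionally have to map MLA's low-rank projection weights and the router/per-expert weights onto vLLM's parameter names and layouts, which is where most of the adaptation effort described above would go.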
Closing this issue due to a long period of inactivity. If there are any other problems, feel free to reopen the issue at any time :)
Describe the bug
The documentation only mentions support for the Llama series. I would like to ask whether MoE models such as DeepSeek V2 have any known issues with conversion and training.