
[BUG] Is conversion and training supported for the DeepSeek V2 (MoE) series? #166

Closed
yiyepiaoling0715 opened this issue Dec 3, 2024 · 4 comments

Comments

@yiyepiaoling0715

The documentation only mentions support for the Llama series. I'd like to ask whether MoE models such as DeepSeek V2 are supported, or whether there are known issues with them.

@haolin-nju
Collaborator

Currently, ChatLearn does not support checkpoint conversion or training for DeepSeek V2 MoE models. You can refer to the WIP PR #95, which adds support for Mixtral model training. Support for DeepSeek V2 MoE will be added to the development plan after the Mixtral adaptation is complete. We also welcome community PRs that add SFT, reward, and alignment training support for DeepSeek V2 MoE models.

@yiyepiaoling0715
Author

> Currently, ChatLearn does not support checkpoint conversion or training for DeepSeek V2 MoE models. You can refer to the WIP PR #95, which adds support for Mixtral model training. Support for DeepSeek V2 MoE will be added to the development plan after the Mixtral adaptation is complete. We also welcome community PRs that add SFT, reward, and alignment training support for DeepSeek V2 MoE models.

Compared with Mixtral, where does the main development effort lie? Which aspects need adaptation or modification?

@haolin-nju
Collaborator

haolin-nju commented Dec 5, 2024

> Compared with Mixtral, where does the main development effort lie? Which aspects need adaptation or modification?

For example, DeepSeek V2 has modules that Mixtral does not, such as MLA and shared experts. Since the tensor layout on the vLLM side differs from Megatron's, one of the main development points is how to synchronize these modules to the vLLM model and how to support parameter synchronization between the different parallel strategies used on the training and inference sides.
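
To make the parameter-synchronization point concrete, here is a minimal, illustrative sketch (not ChatLearn's actual API; the function name, the column-parallel/dim-0 split, and the shapes are assumptions) of re-sharding a tensor-parallel weight from a Megatron-style training layout to a vLLM-style inference layout with a different TP degree:

```python
import torch

def resync_column_parallel(train_shards, infer_tp):
    """Merge training-side TP shards (assumed split on dim 0, i.e. a
    column-parallel weight) and re-split them for the inference-side TP degree."""
    full = torch.cat(train_shards, dim=0)            # undo the training-side TP split
    assert full.shape[0] % infer_tp == 0, "output dim must divide inference TP size"
    return list(torch.chunk(full, infer_tp, dim=0))  # one shard per vLLM rank

# Hypothetical example: 4-way training TP -> 2-way inference TP for a
# kv_b_proj-like weight of a DeepSeek V2 MLA block (shapes are made up).
shards = [torch.randn(128, 512) for _ in range(4)]
vllm_shards = resync_column_parallel(shards, infer_tp=2)
print([tuple(s.shape) for s in vllm_shards])         # [(256, 512), (256, 512)]
```

In practice the same idea would have to be worked out per module (MLA projections, shared vs. routed experts), each with its own split dimension and parameter naming on the training and inference sides.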

@haolin-nju
Collaborator

Closing this issue due to a long period of inactivity. If there are any other problems, feel free to reopen the issue at any time :)
