Store last.ckpt as symlink when appropriate to save space #14973
Comments
This has already been proposed in #4335, but this cannot be done because the
Thanks for the context! I'm not very familiar with
Yes, it can be implemented. It's open to community contributions.
I'm happy to give this a try. But to verify the best way to do it: currently I'm considering changing
And of course, if we don't want this behavior to be the default, we can introduce a flag like
Could this be revived? This is not only a problem of disk space: the way it's done now also takes 2x longer to save the checkpoint, which for a 100B model could take a really long time. Additionally, as I shared in #18670, there is a race condition and one could end up with 2 last files. I proposed an alternative solution used by other frameworks, which uses an actual file. But my main need is to remove the overhead of saving the file 2 times, which is a significant problem, especially on a slow filesystem. Thank you!
🚀 Feature
Currently, if I'm understanding correctly, last.ckpt is always stored as a separate model file (unless it's disabled, of course). But sometimes the same checkpoint already exists, for example as the best checkpoint, or as one of the top-k checkpoints. When this happens, it'd be very helpful if only a symlink were stored, to reduce space usage.
Motivation
Modern models could take up a lot of disk space. In the typical use case where only the best and last checkpoints are stored, this could reduce the space usage by ~half.
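To make the request concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the library's implementation: the function name `link_last_checkpoint` and the example file names are hypothetical, and it assumes a local POSIX filesystem where symlinks are available (on Windows, creating symlinks may require extra privileges). The link is first created under a temporary name and then renamed over last.ckpt, so a reader should never see a missing or half-written last.ckpt.

```python
import os
from pathlib import Path


def link_last_checkpoint(existing_ckpt, last_ckpt):
    """Make `last_ckpt` a symlink to an already-saved checkpoint file.

    Hypothetical helper for illustration; not part of any library API.
    """
    existing_ckpt = Path(existing_ckpt)
    last_ckpt = Path(last_ckpt)
    if not existing_ckpt.is_file():
        raise FileNotFoundError(existing_ckpt)

    # Use a relative target so the checkpoint directory can be moved as a whole.
    target = os.path.relpath(existing_ckpt, start=last_ckpt.parent)

    # Create the link under a temporary name first (assumed name: "<last>.tmp").
    tmp_link = last_ckpt.with_name(last_ckpt.name + ".tmp")
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target, tmp_link)

    # os.replace renames the link itself over any previous last.ckpt
    # (regular file or older symlink) in a single step on POSIX.
    os.replace(tmp_link, last_ckpt)
```

For example, after a top-k checkpoint has been written, calling `link_last_checkpoint("ckpts/epoch=3.ckpt", "ckpts/last.ckpt")` would leave only one full copy of the weights on disk and skip the second save. A remote or object-store filesystem would need a different mechanism, since symlinks are generally not available there.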
cc @Borda @awaelchli