Some questions about the SViT method #14
I have an interesting question: since most token pruning methods only prune layers 4 to 12 of the ViT model, have you tried pruning the early layers (layer 1 ~ layer 3)? And how was the performance? Hope to get your reply, thanks!

Yes, we tried pruning tokens in layers 1~3 during some initial experiments on classification, and the performance dropped significantly. We also tested some other pruning ratios for different layers, but didn't easily find one that outperformed the default setting provided by DynamicViT and EViT. However, there is a relevant paper on pruning ratios for different layers: "DiffRate: Differentiable Compression Rate for Efficient Vision Transformers". I hope this might be helpful!

@kaikai23 Thanks for your reply! May I ask how you set the token keep ratio for all the layers? I think most token pruning methods follow a [k, k^2, k^3] setting in layers 4 ~ 12.

@kaikai23 Oh, I mean the settings of all 12 layers (layer 1 ~ layer 12).

Hi, we kept all the tokens in layer 1 ~ layer 3, and kept 70%, 70%, 70%, 49%, 49%, 49%, 34.3%, 34.3%, 34.3% of the tokens in layer 4 ~ layer 12.
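For reference, here is a minimal sketch of how that schedule could be generated in Python, assuming a 12-layer ViT and the [k, k^2, k^3] convention with k = 0.7; the `keep_ratio_schedule` helper is hypothetical and not part of the SViT repository:

```python
# Minimal sketch: per-layer token keep ratios (relative to the original
# token count) for a 12-layer ViT. Pruning is applied at layers 4, 7, and
# 10, each time keeping `base_keep` of the remaining tokens, which yields
# the 100% / 70% / 49% / 34.3% schedule described above.
# `keep_ratio_schedule` is a hypothetical helper, not from the SViT repo.

def keep_ratio_schedule(num_layers=12, base_keep=0.7, prune_start=4, period=3):
    """Return the fraction of the original tokens kept at each layer (1-indexed)."""
    ratios = []
    keep = 1.0
    for layer in range(1, num_layers + 1):
        # Prune once every `period` layers, starting at `prune_start`.
        if layer >= prune_start and (layer - prune_start) % period == 0:
            keep *= base_keep
        ratios.append(round(keep, 4))
    return ratios

if __name__ == "__main__":
    for layer, r in enumerate(keep_ratio_schedule(), start=1):
        print(f"layer {layer:2d}: keep {r:.1%} of tokens")
    # layers 1-3: 100.0%, layers 4-6: 70.0%,
    # layers 7-9: 49.0%, layers 10-12: 34.3%
```

Note that each pruning step is multiplicative on the surviving tokens, so the per-layer fractions are powers of k rather than fixed decrements.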