General purpose weight clustering support #1148
-
The resource reductions will depend on the reuse factor you choose for your model. If it is 1, there is generally more freedom in how to implement weight sharing. Depending on the granularity of the sharing, you may or may not have to store cluster-ID entries in BRAM. For example, if you know that entire rows (or columns) of the weight matrix are shared (i.e. all the weights in the first row are the same, all the weights in the second row are the same, and so on), then you can simply implement the summation of the inputs in HLS and multiply the sum by the corresponding weight. A similar concept is DSP-aware pruning (https://ieeexplore.ieee.org/abstract/document/10416091), with the hls4ml implementation of the hardware part described in #809, which can serve as a starting point for your hls4ml implementation.
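To make the row-sharing case concrete, here is a minimal, hypothetical HLS C++ sketch (not hls4ml's actual dense implementation; the type widths, function name, and pragmas are illustrative assumptions): when every weight in row `i` equals a single value `row_weight[i]`, the inputs can be summed once and each output then needs only one multiplication.

```cpp
#include <ap_fixed.h>

typedef ap_fixed<16, 6>  data_t;
typedef ap_fixed<16, 6>  weight_t;
typedef ap_fixed<24, 10> accum_t;

// Dense layer where all weights in row i share the single value row_weight[i]:
// sum the inputs once, then use one multiplication per output.
template <int N_IN, int N_OUT>
void dense_row_shared(const data_t x[N_IN],
                      const weight_t row_weight[N_OUT],  // one shared weight per output row
                      accum_t y[N_OUT]) {
    accum_t in_sum = 0;
InputSum:
    for (int j = 0; j < N_IN; j++) {
        #pragma HLS UNROLL
        in_sum += x[j];  // unrolls into an adder tree, no DSPs needed
    }
Outputs:
    for (int i = 0; i < N_OUT; i++) {
        #pragma HLS UNROLL
        y[i] = row_weight[i] * in_sum;  // one multiplication (one DSP) per output
    }
}
```

With a reuse factor of 1 the input-sum loop fully unrolls, so the multiplier count drops from N_IN * N_OUT to N_OUT for this (admittedly extreme) sharing pattern.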
-
I have a weight-clustering algorithm that I would like to apply to the MNIST CNN. It clusters similar weights into a single averaged value, so that the corresponding inputs of a neuron can all be added together and multiplied only once by that weight.
I would start from a TensorFlow-supported weight-clustering algorithm, which I think follows a similar idea (https://www.tensorflow.org/model_optimization/guide/clustering), but I'm not sure whether this is supported directly by hls4ml. If not, what would be the correct path to obtain the DSP resource reductions of such an algorithm in HLS? Would I have to modify the multiplications handled in, e.g., dense_resource.h by adding a new BRAM that holds the cluster ID of each weight group? I'm looking for the least-effort but reasonably correct way of implementing this optimization with hls4ml.
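For illustration, here is a rough sketch of the kind of modification described above (purely hypothetical, not based on the actual structure of dense_resource.h; the array names, types, and sizes are made up): each weight is replaced by a cluster ID indexing a small codebook of centroids, the inputs are accumulated per cluster, and each partial sum is multiplied by its centroid once, so the number of multiplications per output scales with the number of clusters rather than with the number of inputs.

```cpp
#include <ap_fixed.h>
#include <ap_int.h>

typedef ap_fixed<16, 6>  data_t;
typedef ap_fixed<16, 6>  weight_t;
typedef ap_fixed<24, 10> accum_t;

// Dense layer where each weight is replaced by a cluster ID indexing a small
// codebook of centroids. Inputs are accumulated per cluster, so only
// N_CLUSTERS multiplications are needed per output instead of N_IN.
template <int N_IN, int N_OUT, int N_CLUSTERS>
void dense_clustered(const data_t x[N_IN],
                     const ap_uint<8> cluster_id[N_OUT][N_IN],  // cluster IDs, stored in BRAM
                     const weight_t centroid[N_CLUSTERS],       // shared codebook of averaged weights
                     accum_t y[N_OUT]) {
Outputs:
    for (int i = 0; i < N_OUT; i++) {
        accum_t cluster_sum[N_CLUSTERS];
        #pragma HLS ARRAY_PARTITION variable=cluster_sum complete
    InitSums:
        for (int c = 0; c < N_CLUSTERS; c++) {
            #pragma HLS UNROLL
            cluster_sum[c] = 0;
        }
        // Route each input to the partial sum of its weight's cluster.
    Inputs:
        for (int j = 0; j < N_IN; j++) {
            cluster_sum[cluster_id[i][j].to_uint()] += x[j];
        }
        // One multiplication per cluster instead of one per input weight.
        accum_t acc = 0;
    Clusters:
        for (int c = 0; c < N_CLUSTERS; c++) {
            #pragma HLS UNROLL
            acc += centroid[c] * cluster_sum[c];
        }
        y[i] = acc;
    }
}
```

In this sketch the cluster_id table would be the new BRAM-resident array mentioned above, while the small centroid codebook can be fully partitioned into registers.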