Ideally, the two approaches should yield the same accuracy. Speed-wise, the pipelines differ:
The torch.quantization path: torch model -> quantized torch model -> quantized ONNX -> TRT
The onnx.quantization path: torch model -> ONNX model -> quantized ONNX -> TRT
It really depends on how compatible the quantized-torch-to-ONNX conversion is with the TRT version you use. In most cases, onnx.quantization gives better performance because we can carefully control where the QDQ (QuantizeLinear/DequantizeLinear) nodes are inserted in the ONNX graph, which we cannot do if we quantize in torch.
Regarding the effect of quantization on speed and accuracy, which quantization method is recommended?