Ideally, the two approaches should yield the same accuracy. Speed-wise, the pipelines differ:
The torch.quantization path: torch model -> quantized torch model -> quantized ONNX -> TRT
The onnx.quantization path: torch model -> ONNX model -> quantized ONNX -> TRT
It really depends on how compatible the quantized-torch-to-ONNX conversion is with the TRT version you use. In most cases, onnx.quantization gives better performance because we can carefully control where the QDQ (QuantizeLinear/DequantizeLinear) nodes are inserted in the ONNX graph, which we cannot do if we quantize in torch.
Regarding the effect of quantization on speed and accuracy, which quantization method is recommended?