Large drop in accuracy in QKeras => hls4ml conversion regardless of quant/overflow params #614

mackncheesiest · 2022-07-20T18:22:29Z

We're training QKeras models with fixed precision and then converting them to hls4ml models (we tried both 0.6.0 and db943b7). However, we can't get the hls4ml model's outputs to match the baseline behavior/accuracy of our QKeras model. This is quite different from what we expected as we're under the assumption that QKeras' quantization model should match hls4ml's for some selection of ap_fixed<*, *, Q, O> quantization and overflow modes.

At the bottom of this post is a proof-of-concept Python script that shows what we're doing with a sample MNIST network: it sweeps through various word/integer widths and quantization/overflow modes, retraining and testing for each one. We never seem to find a configuration where hls4ml matches QKeras more than ~80% of the time (with the exception of, say, some extremely large 32-bit models). We've seen this behavior across different image classification tasks too, so we believe whatever issue we're having is fairly data independent.

If anyone has thoughts as to what we might be doing wrong, we'd appreciate any feedback.
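
As a reference point for the sweep, the Q (quantization) and O (overflow) parameters of ap_fixed<W, I, Q, O> can be emulated in plain Python. This is only a minimal sketch, not hls4ml or Vivado code: the helper name to_ap_fixed is made up for illustration, and only AP_TRN/AP_RND and AP_WRAP/AP_SAT are modelled. It shows what happens to a raw pixel value of 255 when it is pushed through a signed format with 8 integer bits.

```python
import math


def to_ap_fixed(x, width, integer, rounding="AP_TRN", overflow="AP_WRAP"):
    """Emulate a signed ap_fixed<width, integer, rounding, overflow> cast.

    Hypothetical helper for illustration only; it is not part of hls4ml.
    `integer` counts the sign bit, as in the Xilinx ap_fixed type.
    """
    scale = 2.0 ** (width - integer)  # 2^(number of fractional bits)
    scaled = x * scale

    # Quantization mode: how bits below the LSB are dropped.
    if rounding == "AP_TRN":      # truncate toward minus infinity
        raw = math.floor(scaled)
    elif rounding == "AP_RND":    # round half toward plus infinity
        raw = math.floor(scaled + 0.5)
    else:
        raise ValueError(f"rounding mode not modelled here: {rounding}")

    # Overflow mode: what happens outside the representable range.
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    if overflow == "AP_SAT":      # clamp to the nearest representable value
        raw = max(lowest, min(highest, raw))
    elif overflow == "AP_WRAP":   # two's-complement wrap-around
        raw = (raw - lowest) % (1 << width) + lowest
    else:
        raise ValueError(f"overflow mode not modelled here: {overflow}")

    return raw / scale


# A raw MNIST pixel of 255 does not fit in a signed format with 8 integer bits.
wrapped = to_ap_fixed(255.0, 16, 8, overflow="AP_WRAP")    # -1.0
saturated = to_ap_fixed(255.0, 16, 8, overflow="AP_SAT")   # 127.99609375
```

With either overflow mode the value that reaches the first layer differs from what the Keras model saw, which is one way a single default precision can hurt regardless of which Q and O are chosen.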

import hls4ml
from tensorflow import keras
from tensorflow.keras.layers import Activation, MaxPooling2D, Flatten, Input

from qkeras.qlayers import QDense, QActivation
from qkeras.qconvolutional import QConv2D
from qkeras.quantizers import quantized_bits, quantized_relu

import numpy as np

import itertools

def mnist_dataset(one_hot = True):
    (x_train,y_train),(x_test,y_test) = keras.datasets.mnist.load_data()
    if one_hot:
        y_train = keras.utils.to_categorical(y_train)
        y_test = keras.utils.to_categorical(y_test)
    return (x_train,y_train),(x_test,y_test)

def quantizers(word_width,
               int_width,
               alpha,
               symmetric,
              ):
    quant_relu = quantized_relu(word_width)
    kernel_quant = quantized_bits( bits = word_width, 
                                   integer = int_width, 
                                   symmetric = symmetric, 
                                   alpha=alpha,
                                 )
    bias_quant = quantized_bits( bits = word_width, 
                                 integer = int_width, 
                                 symmetric = symmetric, 
                                 alpha=alpha,
                               )
    return quant_relu,kernel_quant,bias_quant

def get_network(filt_size,
                n_filt,
                n_dense,
                word_width = 12,
                int_width = 4,
                alpha = 1,
                symmetric = 1):
    quant_relu,kernel_quant,bias_quant = quantizers(word_width,int_width,
                                                    alpha,symmetric)

    # make the NN architecture
    in_layer = Input((28,28,1))
    x = in_layer
    x = QConv2D(filters=n_filt,
                kernel_size=filt_size,
                padding='valid',
                kernel_quantizer=kernel_quant,
                bias_quantizer=bias_quant,
                kernel_initializer='glorot_uniform',
                )(x)
    x = QActivation(activation=quant_relu)(x)
    x = MaxPooling2D()(x)
    x = Flatten()(x)
    x = QDense(n_dense, kernel_quantizer=kernel_quant, bias_quantizer=bias_quant)(x)
    x = QActivation(activation=quant_relu )(x)
    x = QDense(10, kernel_quantizer=kernel_quant, bias_quantizer=bias_quant)(x)
    out_layer = Activation(activation='softmax')(x)

    model = keras.models.Model(in_layer,out_layer)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    
    return model

def test_config(word_width=12, int_width=4, quant_mode='AP_TRN', overflow_mode='AP_WRAP'):
    alpha = 1
    symmetric = 0
    proj_dir = './hls4ml_prjdir'
    fpga_part = 'xczu28dr-ffvg1517-2-e'
    
    default_precision = f'ap_fixed<{word_width}, {int_width}, {quant_mode}, {overflow_mode}>'
    
    (x_train, y_train), (x_test, y_test) = mnist_dataset()
    # Note: pixel values are left in the raw 0..255 range (no normalization)
    x_train = x_train.astype(float)
    x_test = x_test.astype(float)
    
    filt_size = 2
    n_filt = 3
    n_dense = 5
    nn = get_network(filt_size, n_filt, n_dense, word_width=word_width, int_width=int_width,
                     alpha=alpha, symmetric=symmetric)
    
    nn.fit(x_train, y_train, epochs=5)

    # The reuse factor is an integer, not a string
    config = hls4ml.utils.config_from_keras_model(nn,
                                                  granularity='name',
                                                  default_precision=default_precision,
                                                  default_reuse_factor=1)

    for layer in config['LayerName'].keys():
        config['LayerName'][layer]['Trace'] = True

    hls_model = hls4ml.converters.convert_from_keras_model(nn,
                                                           hls_config=config,
                                                           output_dir=proj_dir,
                                                           io_type='io_stream',
                                                           part=fpga_part)

    hls_model.compile()

    y_hls = hls_model.predict(x_test)
    y_nn = nn.predict(x_test)

    hls_classes = np.argmax(y_hls, axis=1)
    nn_classes = np.argmax(y_nn, axis=1)

    # Fraction of test samples where both models predict the same class
    match_pct = np.mean(hls_classes == nn_classes) * 100.0
    print(f"Match percent between keras and hls4ml was: {match_pct}")

if __name__ == "__main__":
    word_widths = range(8, 32+1, 4)
    int_widths = range(8, 32+1, 4)
    quantization_modes = ["AP_RND", "AP_RND_ZERO", "AP_RND_MIN_INF", "AP_RND_INF", "AP_RND_CONV", "AP_TRN", "AP_TRN_ZERO"]
    overflow_modes = ["AP_SAT", "AP_SAT_ZERO", "AP_WRAP"]

    test_configs = list(itertools.product(word_widths, int_widths, quantization_modes, overflow_modes))
    
    for config in test_configs:
        print("========================")
        print(f"Testing config: {config}")
        print("========================")
        word_width, int_width, quantization_mode, overflow_mode = config

        # Only sweep formats that leave at least one fractional bit
        if int_width >= word_width:
            print("Skipping configuration (int_width >= word_width)")
            continue

        test_config(word_width, int_width, quantization_mode, overflow_mode)

mackncheesiest · 2022-07-20T19:28:37Z

mackncheesiest
Jul 20, 2022
Author

To show the output we get, the results of this sweep are below (modified to run different configurations in parallel with a multiprocessing pool, and presented as "configuration: match %").

The only configurations that give a 100% match between the QKeras and hls4ml models all use 32-bit words, but for our use case that is quite excessive and would yield resource usage that is way too high.

ap_fixed<20, 12, AP_RND_INF, AP_WRAP>: 35.120000000000005%
ap_fixed<16, 8, AP_RND_MIN_INF, AP_SAT_ZERO>: 29.12%
ap_fixed<16, 8, AP_RND, AP_SAT>: 51.839999999999996%
ap_fixed<12, 8, AP_RND, AP_SAT>: 32.08%
ap_fixed<20, 8, AP_RND, AP_SAT>: 47.17%
ap_fixed<24, 8, AP_RND, AP_SAT>: 47.660000000000004%
ap_fixed<20, 12, AP_RND_CONV, AP_SAT>: 35.92%
ap_fixed<16, 8, AP_RND, AP_SAT_ZERO>: 30.4%
ap_fixed<12, 8, AP_RND, AP_SAT_ZERO>: 25.11%
ap_fixed<20, 8, AP_RND, AP_SAT_ZERO>: 33.12%
ap_fixed<24, 8, AP_RND, AP_SAT_ZERO>: 25.240000000000002%
ap_fixed<16, 8, AP_RND_MIN_INF, AP_WRAP>: 18.8%
ap_fixed<16, 8, AP_RND, AP_WRAP>: 14.96%
ap_fixed<20, 8, AP_RND, AP_WRAP>: 14.829999999999998%
ap_fixed<12, 8, AP_RND, AP_WRAP>: 20.7%
ap_fixed<20, 12, AP_RND_CONV, AP_SAT_ZERO>: 41.160000000000004%
ap_fixed<24, 8, AP_RND, AP_WRAP>: 18.04%
ap_fixed<16, 8, AP_RND_INF, AP_SAT>: 47.63%
ap_fixed<20, 12, AP_RND_CONV, AP_WRAP>: 34.39%
ap_fixed<20, 8, AP_RND_ZERO, AP_SAT>: 51.12%
ap_fixed<12, 8, AP_RND_ZERO, AP_SAT>: 37.26%
ap_fixed<24, 8, AP_RND_ZERO, AP_SAT>: 50.12%
ap_fixed<16, 8, AP_RND_ZERO, AP_SAT>: 38.95%
ap_fixed<16, 8, AP_RND_INF, AP_SAT_ZERO>: 24.92%
ap_fixed<20, 12, AP_TRN, AP_SAT>: 43.769999999999996%
ap_fixed<24, 8, AP_RND_ZERO, AP_SAT_ZERO>: 23.34%
ap_fixed<12, 8, AP_RND_ZERO, AP_SAT_ZERO>: 22.35%
ap_fixed<20, 8, AP_RND_ZERO, AP_SAT_ZERO>: 20.28%
ap_fixed<16, 8, AP_RND_ZERO, AP_SAT_ZERO>: 25.15%
ap_fixed<16, 8, AP_RND_INF, AP_WRAP>: 18.89%
ap_fixed<20, 12, AP_TRN, AP_SAT_ZERO>: 31.430000000000003%
ap_fixed<12, 8, AP_RND_ZERO, AP_WRAP>: 12.44%
ap_fixed<20, 8, AP_RND_ZERO, AP_WRAP>: 29.29%
ap_fixed<16, 8, AP_RND_ZERO, AP_WRAP>: 12.509999999999998%
ap_fixed<24, 8, AP_RND_ZERO, AP_WRAP>: 15.52%
ap_fixed<20, 12, AP_TRN, AP_WRAP>: 34.12%
ap_fixed<16, 8, AP_RND_CONV, AP_SAT>: 14.82%
ap_fixed<12, 8, AP_RND_MIN_INF, AP_SAT>: 38.68%
ap_fixed<16, 8, AP_RND_MIN_INF, AP_SAT>: 48.089999999999996%
ap_fixed<20, 8, AP_RND_MIN_INF, AP_SAT>: 54.379999999999995%
ap_fixed<24, 8, AP_RND_MIN_INF, AP_SAT>: 40.21%
ap_fixed<20, 12, AP_TRN_ZERO, AP_SAT>: 37.55%
ap_fixed<16, 8, AP_RND_CONV, AP_SAT_ZERO>: 31.66%
ap_fixed<12, 8, AP_RND_MIN_INF, AP_SAT_ZERO>: 29.28%
ap_fixed<24, 8, AP_RND_CONV, AP_WRAP>: 19.98%
ap_fixed<20, 8, AP_RND_MIN_INF, AP_SAT_ZERO>: 30.61%
ap_fixed<24, 8, AP_RND_MIN_INF, AP_SAT_ZERO>: 21.029999999999998%
ap_fixed<12, 8, AP_RND_MIN_INF, AP_WRAP>: 24.47%
ap_fixed<24, 8, AP_TRN, AP_SAT>: 42.309999999999995%
ap_fixed<20, 8, AP_RND_MIN_INF, AP_WRAP>: 35.05%
ap_fixed<16, 8, AP_RND_CONV, AP_WRAP>: 19.25%
ap_fixed<20, 12, AP_TRN_ZERO, AP_SAT_ZERO>: 32.07%
ap_fixed<24, 8, AP_RND_MIN_INF, AP_WRAP>: 14.62%
ap_fixed<12, 8, AP_RND_INF, AP_SAT>: 55.13%
ap_fixed<24, 8, AP_TRN, AP_SAT_ZERO>: 25.019999999999996%
ap_fixed<20, 12, AP_TRN_ZERO, AP_WRAP>: 32.15%
ap_fixed<20, 8, AP_RND_INF, AP_SAT>: 50.96000000000001%
ap_fixed<16, 8, AP_TRN, AP_SAT>: 51.23%
ap_fixed<24, 8, AP_RND_INF, AP_SAT>: 50.68%
ap_fixed<12, 8, AP_RND_INF, AP_SAT_ZERO>: 23.26%
ap_fixed<24, 8, AP_TRN, AP_WRAP>: 14.299999999999999%
ap_fixed<16, 8, AP_TRN, AP_SAT_ZERO>: 27.22%
ap_fixed<20, 16, AP_RND, AP_SAT>: 32.93%
ap_fixed<20, 8, AP_RND_INF, AP_SAT_ZERO>: 24.099999999999998%
ap_fixed<12, 8, AP_RND_INF, AP_WRAP>: 17.919999999999998%
ap_fixed<24, 8, AP_RND_INF, AP_SAT_ZERO>: 22.38%
ap_fixed<16, 8, AP_TRN, AP_WRAP>: 13.94%
ap_fixed<24, 8, AP_TRN_ZERO, AP_SAT>: 44.05%
ap_fixed<20, 8, AP_RND_INF, AP_WRAP>: 23.25%
ap_fixed<20, 16, AP_RND, AP_SAT_ZERO>: 26.82%
ap_fixed<24, 8, AP_RND_INF, AP_WRAP>: 20.26%
ap_fixed<12, 8, AP_RND_CONV, AP_SAT>: 42.39%
ap_fixed<24, 8, AP_TRN_ZERO, AP_SAT_ZERO>: 16.85%
ap_fixed<16, 8, AP_TRN_ZERO, AP_SAT>: 51.88%
ap_fixed<20, 8, AP_RND_CONV, AP_SAT>: 50.349999999999994%
ap_fixed<20, 16, AP_RND, AP_WRAP>: 30.4%
ap_fixed<24, 8, AP_RND_CONV, AP_SAT>: 45.95%
ap_fixed<12, 8, AP_RND_CONV, AP_SAT_ZERO>: 22.919999999999998%
ap_fixed<24, 8, AP_TRN_ZERO, AP_WRAP>: 25.11%
ap_fixed<16, 8, AP_TRN_ZERO, AP_SAT_ZERO>: 23.76%
ap_fixed<20, 8, AP_RND_CONV, AP_SAT_ZERO>: 23.77%
ap_fixed<20, 16, AP_RND_ZERO, AP_SAT>: 30.240000000000002%
ap_fixed<24, 8, AP_RND_CONV, AP_SAT_ZERO>: 30.09%
ap_fixed<16, 8, AP_TRN_ZERO, AP_WRAP>: 17.09%
ap_fixed<24, 12, AP_RND, AP_SAT>: 32.42%
ap_fixed<20, 8, AP_RND_CONV, AP_WRAP>: 35.699999999999996%
ap_fixed<12, 8, AP_RND_CONV, AP_WRAP>: 21.97%
ap_fixed<20, 16, AP_RND_ZERO, AP_SAT_ZERO>: 30.049999999999997%
ap_fixed<24, 16, AP_TRN, AP_SAT>: 37.71%
ap_fixed<16, 12, AP_RND, AP_SAT>: 21.04%
ap_fixed<20, 8, AP_TRN, AP_SAT>: 51.4%
ap_fixed<24, 12, AP_RND, AP_SAT_ZERO>: 36.88%
ap_fixed<12, 8, AP_TRN, AP_SAT>: 32.47%
ap_fixed<20, 16, AP_RND_ZERO, AP_WRAP>: 36.77%
ap_fixed<24, 16, AP_TRN, AP_SAT_ZERO>: 38.379999999999995%
ap_fixed<16, 12, AP_RND, AP_SAT_ZERO>: 30.15%
ap_fixed<24, 12, AP_RND, AP_WRAP>: 25.39%
ap_fixed<20, 8, AP_TRN, AP_SAT_ZERO>: 30.159999999999997%
ap_fixed<12, 8, AP_TRN, AP_SAT_ZERO>: 13.639999999999999%
ap_fixed<20, 16, AP_RND_MIN_INF, AP_SAT>: 26.5%
ap_fixed<24, 16, AP_TRN, AP_WRAP>: 36.4%
ap_fixed<16, 12, AP_RND, AP_WRAP>: 30.04%
ap_fixed<20, 8, AP_TRN, AP_WRAP>: 14.879999999999999%
ap_fixed<24, 12, AP_RND_ZERO, AP_SAT>: 33.33%
ap_fixed<12, 8, AP_TRN, AP_WRAP>: 18.790000000000003%
ap_fixed<20, 16, AP_RND_MIN_INF, AP_SAT_ZERO>: 32.6%
ap_fixed<16, 12, AP_RND_ZERO, AP_SAT>: 28.48%
ap_fixed<24, 16, AP_TRN_ZERO, AP_SAT>: 35.870000000000005%
ap_fixed<20, 8, AP_TRN_ZERO, AP_SAT>: 51.15%
ap_fixed<24, 12, AP_RND_ZERO, AP_SAT_ZERO>: 30.54%
ap_fixed<20, 16, AP_RND_MIN_INF, AP_WRAP>: 29.12%
ap_fixed<12, 8, AP_TRN_ZERO, AP_SAT>: 48.1%
ap_fixed<24, 16, AP_TRN_ZERO, AP_SAT_ZERO>: 36.38%
ap_fixed<16, 12, AP_RND_ZERO, AP_SAT_ZERO>: 31.929999999999996%
ap_fixed<20, 8, AP_TRN_ZERO, AP_SAT_ZERO>: 21.33%
ap_fixed<24, 12, AP_RND_ZERO, AP_WRAP>: 35.44%
ap_fixed<20, 16, AP_RND_INF, AP_SAT>: 39.68%
ap_fixed<12, 8, AP_TRN_ZERO, AP_SAT_ZERO>: 36.27%
ap_fixed<24, 16, AP_TRN_ZERO, AP_WRAP>: 38.48%
ap_fixed<20, 8, AP_TRN_ZERO, AP_WRAP>: 21.78%
ap_fixed<24, 12, AP_RND_MIN_INF, AP_SAT>: 28.93%
ap_fixed<16, 12, AP_RND_ZERO, AP_WRAP>: 40.32%
ap_fixed<20, 16, AP_RND_INF, AP_SAT_ZERO>: 38.800000000000004%
ap_fixed<12, 8, AP_TRN_ZERO, AP_WRAP>: 29.13%
ap_fixed<24, 20, AP_RND, AP_SAT>: 27.01%
ap_fixed<20, 12, AP_RND, AP_SAT>: 42.72%
ap_fixed<24, 12, AP_RND_MIN_INF, AP_SAT_ZERO>: 37.45%
ap_fixed<16, 12, AP_RND_MIN_INF, AP_SAT>: 31.28%
ap_fixed<20, 16, AP_RND_INF, AP_WRAP>: 46.57%
ap_fixed<28, 8, AP_RND, AP_SAT>: 27.060000000000002%
ap_fixed<20, 12, AP_RND, AP_SAT_ZERO>: 33.95%
ap_fixed<24, 12, AP_RND_MIN_INF, AP_WRAP>: 39.98%
ap_fixed<24, 20, AP_RND, AP_SAT_ZERO>: 31.65%
ap_fixed<16, 12, AP_RND_MIN_INF, AP_SAT_ZERO>: 30.740000000000002%
ap_fixed<20, 16, AP_RND_CONV, AP_SAT>: 27.01%
ap_fixed<20, 12, AP_RND, AP_WRAP>: 38.54%
ap_fixed<24, 12, AP_RND_INF, AP_SAT>: 33.339999999999996%
ap_fixed<16, 12, AP_RND_MIN_INF, AP_WRAP>: 32.05%
ap_fixed<28, 8, AP_RND, AP_SAT_ZERO>: 24.04%
ap_fixed<24, 20, AP_RND, AP_WRAP>: 28.49%
ap_fixed<20, 16, AP_RND_CONV, AP_SAT_ZERO>: 36.7%
ap_fixed<20, 12, AP_RND_ZERO, AP_SAT>: 33.800000000000004%
ap_fixed<24, 12, AP_RND_INF, AP_SAT_ZERO>: 24.83%
ap_fixed<16, 12, AP_RND_INF, AP_SAT>: 39.18%
ap_fixed<28, 8, AP_RND, AP_WRAP>: 21.68%
ap_fixed<24, 20, AP_RND_ZERO, AP_SAT>: 35.6%
ap_fixed<20, 16, AP_RND_CONV, AP_WRAP>: 31.330000000000002%
ap_fixed<20, 12, AP_RND_ZERO, AP_SAT_ZERO>: 19.11%
ap_fixed<24, 12, AP_RND_INF, AP_WRAP>: 35.410000000000004%
ap_fixed<16, 12, AP_RND_INF, AP_SAT_ZERO>: 33.25%
ap_fixed<28, 8, AP_RND_ZERO, AP_SAT>: 25.929999999999996%
ap_fixed<20, 16, AP_TRN, AP_SAT>: 19.29%
ap_fixed<24, 20, AP_RND_ZERO, AP_SAT_ZERO>: 33.17%
ap_fixed<20, 12, AP_RND_ZERO, AP_WRAP>: 17.93%
ap_fixed<24, 12, AP_RND_CONV, AP_SAT>: 27.76%
ap_fixed<16, 12, AP_RND_INF, AP_WRAP>: 36.85%
ap_fixed<24, 20, AP_RND_ZERO, AP_WRAP>: 27.169999999999998%
ap_fixed<20, 16, AP_TRN, AP_SAT_ZERO>: 21.65%
ap_fixed<28, 8, AP_RND_ZERO, AP_SAT_ZERO>: 22.509999999999998%
ap_fixed<20, 12, AP_RND_MIN_INF, AP_SAT>: 32.7%
ap_fixed<24, 12, AP_RND_CONV, AP_SAT_ZERO>: 34.54%
ap_fixed<16, 12, AP_RND_CONV, AP_SAT>: 36.27%
ap_fixed<20, 16, AP_TRN, AP_WRAP>: 28.610000000000003%
ap_fixed<24, 20, AP_RND_MIN_INF, AP_SAT>: 23.52%
ap_fixed<28, 8, AP_RND_ZERO, AP_WRAP>: 14.63%
ap_fixed<20, 12, AP_RND_MIN_INF, AP_SAT_ZERO>: 31.080000000000002%
ap_fixed<24, 12, AP_RND_CONV, AP_WRAP>: 34.46%
ap_fixed<16, 12, AP_RND_CONV, AP_SAT_ZERO>: 27.18%
ap_fixed<20, 16, AP_TRN_ZERO, AP_SAT>: 26.810000000000002%
ap_fixed<24, 20, AP_RND_MIN_INF, AP_SAT_ZERO>: 36.3%
ap_fixed<28, 8, AP_RND_MIN_INF, AP_SAT>: 42.47%
ap_fixed<20, 12, AP_RND_MIN_INF, AP_WRAP>: 31.830000000000002%
ap_fixed<24, 12, AP_TRN, AP_SAT>: 28.68%
ap_fixed<16, 12, AP_RND_CONV, AP_WRAP>: 30.61%
ap_fixed<20, 16, AP_TRN_ZERO, AP_SAT_ZERO>: 34.33%
ap_fixed<24, 20, AP_RND_MIN_INF, AP_WRAP>: 37.34%
ap_fixed<28, 8, AP_RND_MIN_INF, AP_SAT_ZERO>: 35.870000000000005%
ap_fixed<20, 12, AP_RND_INF, AP_SAT>: 38.09%
ap_fixed<24, 12, AP_TRN, AP_SAT_ZERO>: 30.34%
ap_fixed<20, 16, AP_TRN_ZERO, AP_WRAP>: 35.77%
ap_fixed<16, 12, AP_TRN, AP_SAT>: 13.420000000000002%
ap_fixed<24, 20, AP_RND_INF, AP_SAT>: 29.599999999999998%
ap_fixed<28, 8, AP_RND_MIN_INF, AP_WRAP>: 33.07%
ap_fixed<24, 12, AP_TRN, AP_WRAP>: 34.260000000000005%
ap_fixed<20, 12, AP_RND_INF, AP_SAT_ZERO>: 38.45%
ap_fixed<28, 12, AP_TRN_ZERO, AP_SAT>: 35.089999999999996%
ap_fixed<24, 20, AP_RND_INF, AP_SAT_ZERO>: 31.979999999999997%
ap_fixed<28, 8, AP_RND_INF, AP_SAT>: 47.099999999999994%
ap_fixed<16, 12, AP_TRN, AP_SAT_ZERO>: 24.709999999999997%
ap_fixed<28, 20, AP_TRN_ZERO, AP_SAT_ZERO>: 29.18%
ap_fixed<28, 12, AP_TRN_ZERO, AP_SAT_ZERO>: 41.85%
ap_fixed<24, 20, AP_RND_INF, AP_WRAP>: 34.94%
ap_fixed<28, 8, AP_RND_INF, AP_SAT_ZERO>: 29.17%
ap_fixed<16, 12, AP_TRN, AP_WRAP>: 17.69%
ap_fixed<24, 12, AP_TRN_ZERO, AP_SAT>: 32.11%
ap_fixed<28, 20, AP_TRN_ZERO, AP_WRAP>: 31.71%
ap_fixed<24, 20, AP_RND_CONV, AP_SAT>: 28.02%
ap_fixed<28, 8, AP_RND_INF, AP_WRAP>: 14.09%
ap_fixed<16, 12, AP_TRN_ZERO, AP_SAT>: 27.13%
ap_fixed<24, 12, AP_TRN_ZERO, AP_SAT_ZERO>: 43.309999999999995%
ap_fixed<28, 12, AP_TRN_ZERO, AP_WRAP>: 26.86%
ap_fixed<28, 24, AP_RND, AP_SAT>: 28.74%
ap_fixed<24, 20, AP_RND_CONV, AP_SAT_ZERO>: 35.05%
ap_fixed<28, 8, AP_RND_CONV, AP_SAT>: 34.25%
ap_fixed<16, 12, AP_TRN_ZERO, AP_SAT_ZERO>: 28.33%
ap_fixed<24, 12, AP_TRN_ZERO, AP_WRAP>: 31.85%
ap_fixed<28, 16, AP_RND, AP_SAT>: 22.650000000000002%
ap_fixed<28, 24, AP_RND, AP_SAT_ZERO>: 30.380000000000003%
ap_fixed<24, 20, AP_RND_CONV, AP_WRAP>: 28.21%
ap_fixed<28, 8, AP_RND_CONV, AP_SAT_ZERO>: 28.37%
ap_fixed<16, 12, AP_TRN_ZERO, AP_WRAP>: 25.509999999999998%
ap_fixed<24, 16, AP_RND, AP_SAT>: 32.41%
ap_fixed<28, 16, AP_RND, AP_SAT_ZERO>: 30.29%
ap_fixed<28, 24, AP_RND, AP_WRAP>: 32.22%
ap_fixed<24, 20, AP_TRN, AP_SAT>: 17.630000000000003%
ap_fixed<32, 8, AP_RND, AP_SAT>: 100.0%
ap_fixed<24, 16, AP_RND, AP_SAT_ZERO>: 34.12%
ap_fixed<28, 16, AP_RND, AP_WRAP>: 31.3%
ap_fixed<28, 8, AP_RND_CONV, AP_WRAP>: 27.689999999999998%
ap_fixed<28, 24, AP_RND_ZERO, AP_SAT>: 32.32%
ap_fixed<24, 20, AP_TRN, AP_SAT_ZERO>: 29.549999999999997%
ap_fixed<24, 16, AP_RND, AP_WRAP>: 25.119999999999997%
ap_fixed<32, 8, AP_RND, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 8, AP_TRN, AP_SAT>: 47.93%
ap_fixed<28, 16, AP_RND_ZERO, AP_SAT>: 37.4%
ap_fixed<28, 24, AP_RND_ZERO, AP_SAT_ZERO>: 32.690000000000005%
ap_fixed<32, 8, AP_RND, AP_WRAP>: 100.0%
ap_fixed<24, 16, AP_RND_ZERO, AP_SAT>: 31.009999999999998%
ap_fixed<24, 20, AP_TRN, AP_WRAP>: 21.39%
ap_fixed<28, 8, AP_TRN, AP_SAT_ZERO>: 17.54%
ap_fixed<28, 16, AP_RND_ZERO, AP_SAT_ZERO>: 30.669999999999998%
ap_fixed<28, 24, AP_RND_ZERO, AP_WRAP>: 23.51%
ap_fixed<32, 8, AP_RND_ZERO, AP_SAT>: 100.0%
ap_fixed<24, 16, AP_RND_ZERO, AP_SAT_ZERO>: 28.48%
ap_fixed<24, 20, AP_TRN_ZERO, AP_SAT>: 35.25%
ap_fixed<28, 8, AP_TRN, AP_WRAP>: 13.74%
ap_fixed<28, 16, AP_RND_ZERO, AP_WRAP>: 27.650000000000002%
ap_fixed<24, 16, AP_RND_ZERO, AP_WRAP>: 26.950000000000003%
ap_fixed<24, 20, AP_TRN_ZERO, AP_SAT_ZERO>: 32.379999999999995%
ap_fixed<28, 24, AP_RND_MIN_INF, AP_SAT>: 36.559999999999995%
ap_fixed<28, 8, AP_TRN_ZERO, AP_SAT>: 45.94%
ap_fixed<32, 8, AP_RND_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 16, AP_RND_MIN_INF, AP_SAT>: 41.97%
ap_fixed<24, 16, AP_RND_MIN_INF, AP_SAT>: 35.96%
ap_fixed<24, 20, AP_TRN_ZERO, AP_WRAP>: 33.269999999999996%
ap_fixed<28, 24, AP_RND_MIN_INF, AP_SAT_ZERO>: 26.47%
ap_fixed<28, 8, AP_TRN_ZERO, AP_SAT_ZERO>: 26.58%
ap_fixed<32, 8, AP_RND_ZERO, AP_WRAP>: 100.0%
ap_fixed<28, 16, AP_RND_MIN_INF, AP_SAT_ZERO>: 34.410000000000004%
ap_fixed<24, 16, AP_RND_MIN_INF, AP_SAT_ZERO>: 18.12%
ap_fixed<32, 12, AP_RND, AP_SAT>: 100.0%
ap_fixed<28, 24, AP_RND_MIN_INF, AP_WRAP>: 20.919999999999998%
ap_fixed<28, 8, AP_TRN_ZERO, AP_WRAP>: 13.420000000000002%
ap_fixed<32, 8, AP_RND_MIN_INF, AP_SAT>: 100.0%
ap_fixed<28, 16, AP_RND_MIN_INF, AP_WRAP>: 28.849999999999998%
ap_fixed<24, 16, AP_RND_MIN_INF, AP_WRAP>: 30.5%
ap_fixed<32, 12, AP_RND, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 24, AP_RND_INF, AP_SAT>: 24.529999999999998%
ap_fixed<28, 12, AP_RND, AP_SAT>: 32.11%
ap_fixed<28, 16, AP_RND_INF, AP_SAT>: 34.63%
ap_fixed<32, 8, AP_RND_MIN_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<24, 16, AP_RND_INF, AP_SAT>: 22.52%
ap_fixed<32, 12, AP_RND, AP_WRAP>: 100.0%
ap_fixed<28, 24, AP_RND_INF, AP_SAT_ZERO>: 33.83%
ap_fixed<28, 12, AP_RND, AP_SAT_ZERO>: 38.67%
ap_fixed<32, 8, AP_RND_MIN_INF, AP_WRAP>: 100.0%
ap_fixed<28, 16, AP_RND_INF, AP_SAT_ZERO>: 31.55%
ap_fixed<24, 16, AP_RND_INF, AP_SAT_ZERO>: 33.11%
ap_fixed<32, 12, AP_RND_ZERO, AP_SAT>: 100.0%
ap_fixed<28, 24, AP_RND_INF, AP_WRAP>: 31.25%
ap_fixed<28, 12, AP_RND, AP_WRAP>: 34.03%
ap_fixed<32, 8, AP_RND_INF, AP_SAT>: 100.0%
ap_fixed<28, 16, AP_RND_INF, AP_WRAP>: 32.06%
ap_fixed<32, 12, AP_RND_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 24, AP_RND_CONV, AP_SAT>: 31.929999999999996%
ap_fixed<24, 16, AP_RND_INF, AP_WRAP>: 28.77%
ap_fixed<28, 12, AP_RND_ZERO, AP_SAT>: 31.89%
ap_fixed<32, 8, AP_RND_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 16, AP_RND_CONV, AP_SAT>: 39.26%
ap_fixed<28, 24, AP_RND_CONV, AP_SAT_ZERO>: 22.439999999999998%
ap_fixed<32, 12, AP_RND_ZERO, AP_WRAP>: 100.0%
ap_fixed<28, 12, AP_RND_ZERO, AP_SAT_ZERO>: 28.78%
ap_fixed<32, 8, AP_RND_INF, AP_WRAP>: 100.0%
ap_fixed<24, 16, AP_RND_CONV, AP_SAT>: 25.480000000000004%
ap_fixed<28, 16, AP_RND_CONV, AP_SAT_ZERO>: 34.28%
ap_fixed<28, 24, AP_RND_CONV, AP_WRAP>: 46.97%
ap_fixed<32, 12, AP_RND_MIN_INF, AP_SAT>: 100.0%
ap_fixed<28, 12, AP_RND_ZERO, AP_WRAP>: 23.080000000000002%
ap_fixed<32, 8, AP_RND_CONV, AP_SAT>: 100.0%
ap_fixed<28, 16, AP_RND_CONV, AP_WRAP>: 29.57%
ap_fixed<24, 16, AP_RND_CONV, AP_SAT_ZERO>: 43.09%
ap_fixed<28, 24, AP_TRN, AP_SAT>: 20.979999999999997%
ap_fixed<32, 12, AP_RND_MIN_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 12, AP_RND_MIN_INF, AP_SAT>: 35.57%
ap_fixed<32, 8, AP_RND_CONV, AP_SAT_ZERO>: 100.0%
ap_fixed<24, 16, AP_RND_CONV, AP_WRAP>: 35.55%
ap_fixed<28, 16, AP_TRN, AP_SAT>: 33.78%
ap_fixed<32, 12, AP_RND_MIN_INF, AP_WRAP>: 100.0%
ap_fixed<28, 24, AP_TRN, AP_SAT_ZERO>: 26.290000000000003%
ap_fixed<28, 12, AP_RND_MIN_INF, AP_SAT_ZERO>: 35.6%
ap_fixed<32, 8, AP_RND_CONV, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_RND, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 16, AP_TRN, AP_SAT_ZERO>: 32.98%
ap_fixed<32, 12, AP_RND_INF, AP_SAT>: 100.0%
ap_fixed<28, 24, AP_TRN, AP_WRAP>: 22.53%
ap_fixed<28, 12, AP_RND_MIN_INF, AP_WRAP>: 33.98%
ap_fixed<32, 8, AP_TRN, AP_SAT>: 100.0%
ap_fixed<32, 20, AP_RND, AP_WRAP>: 100.0%
ap_fixed<28, 16, AP_TRN, AP_WRAP>: 20.18%
ap_fixed<32, 12, AP_RND_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 24, AP_TRN_ZERO, AP_SAT>: 28.03%
ap_fixed<28, 12, AP_RND_INF, AP_SAT>: 40.14%
ap_fixed<32, 8, AP_TRN, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_RND_ZERO, AP_SAT>: 100.0%
ap_fixed<32, 12, AP_RND_INF, AP_WRAP>: 100.0%
ap_fixed<28, 16, AP_TRN_ZERO, AP_SAT>: 35.97%
ap_fixed<28, 24, AP_TRN_ZERO, AP_SAT_ZERO>: 28.82%
ap_fixed<28, 12, AP_RND_INF, AP_SAT_ZERO>: 37.22%
ap_fixed<32, 8, AP_TRN, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_RND_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 24, AP_TRN_ZERO, AP_WRAP>: 35.31%
ap_fixed<32, 12, AP_RND_CONV, AP_SAT>: 100.0%
ap_fixed<28, 16, AP_TRN_ZERO, AP_SAT_ZERO>: 32.86%
ap_fixed<28, 12, AP_RND_INF, AP_WRAP>: 23.25%
ap_fixed<32, 8, AP_TRN_ZERO, AP_SAT>: 100.0%
ap_fixed<32, 20, AP_RND_ZERO, AP_WRAP>: 100.0%
ap_fixed<32, 28, AP_RND, AP_WRAP>: 100.0%
ap_fixed<28, 16, AP_TRN_ZERO, AP_WRAP>: 36.9%
ap_fixed<32, 12, AP_RND_CONV, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 12, AP_RND_CONV, AP_SAT>: 33.019999999999996%
ap_fixed<32, 8, AP_TRN_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_RND_MIN_INF, AP_SAT>: 100.0%
ap_fixed<32, 28, AP_RND_ZERO, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_RND, AP_SAT>: 32.87%
ap_fixed<32, 12, AP_RND_CONV, AP_WRAP>: 100.0%
ap_fixed<28, 12, AP_RND_CONV, AP_SAT_ZERO>: 19.580000000000002%
ap_fixed<32, 8, AP_TRN_ZERO, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_RND_MIN_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 28, AP_RND_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 12, AP_TRN, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_RND, AP_SAT_ZERO>: 38.4%
ap_fixed<28, 12, AP_RND_CONV, AP_WRAP>: 31.0%
ap_fixed<32, 28, AP_RND_ZERO, AP_WRAP>: 100.0%
ap_fixed<28, 20, AP_RND, AP_WRAP>: 32.029999999999994%
ap_fixed<32, 12, AP_TRN, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_RND_MIN_INF, AP_WRAP>: 100.0%
ap_fixed<28, 12, AP_TRN, AP_SAT>: 31.290000000000003%
ap_fixed<32, 12, AP_TRN, AP_WRAP>: 100.0%
ap_fixed<28, 20, AP_RND_ZERO, AP_SAT>: 30.79%
ap_fixed<32, 28, AP_RND_MIN_INF, AP_SAT>: 100.0%
ap_fixed<32, 20, AP_RND_INF, AP_SAT>: 100.0%
ap_fixed<28, 12, AP_TRN, AP_SAT_ZERO>: 30.55%
ap_fixed<32, 12, AP_TRN_ZERO, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_RND_ZERO, AP_SAT_ZERO>: 32.01%
ap_fixed<32, 28, AP_RND_MIN_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_RND_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 12, AP_TRN, AP_WRAP>: 33.46%
ap_fixed<28, 20, AP_RND_ZERO, AP_WRAP>: 32.37%
ap_fixed<32, 12, AP_TRN_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 28, AP_RND_MIN_INF, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_RND_INF, AP_WRAP>: 100.0%
ap_fixed<32, 12, AP_TRN_ZERO, AP_WRAP>: 100.0%
ap_fixed<28, 20, AP_RND_MIN_INF, AP_SAT>: 32.39%
ap_fixed<32, 28, AP_RND_INF, AP_SAT>: 100.0%
ap_fixed<32, 20, AP_RND_CONV, AP_SAT>: 100.0%
ap_fixed<32, 16, AP_RND, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_RND_MIN_INF, AP_SAT_ZERO>: 29.42%
ap_fixed<32, 28, AP_RND_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_RND_CONV, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 20, AP_RND_MIN_INF, AP_WRAP>: 31.269999999999996%
ap_fixed<32, 16, AP_RND, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 28, AP_RND_INF, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_RND_CONV, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_RND, AP_WRAP>: 100.0%
ap_fixed<28, 20, AP_RND_INF, AP_SAT>: 39.79%
ap_fixed<32, 28, AP_RND_CONV, AP_SAT>: 100.0%
ap_fixed<32, 20, AP_TRN, AP_SAT>: 100.0%
ap_fixed<32, 16, AP_RND_ZERO, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_RND_INF, AP_SAT_ZERO>: 33.48%
ap_fixed<32, 28, AP_RND_CONV, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_TRN, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 20, AP_RND_INF, AP_WRAP>: 23.87%
ap_fixed<32, 16, AP_RND_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 28, AP_RND_CONV, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_TRN, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_RND_ZERO, AP_WRAP>: 100.0%
ap_fixed<28, 20, AP_RND_CONV, AP_SAT>: 30.94%
ap_fixed<32, 28, AP_TRN, AP_SAT>: 100.0%
ap_fixed<32, 20, AP_TRN_ZERO, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_RND_CONV, AP_SAT_ZERO>: 33.660000000000004%
ap_fixed<32, 16, AP_RND_MIN_INF, AP_SAT>: 100.0%
ap_fixed<32, 28, AP_TRN, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_TRN_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 20, AP_RND_CONV, AP_WRAP>: 28.189999999999998%
ap_fixed<32, 28, AP_TRN, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_RND_MIN_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 20, AP_TRN_ZERO, AP_WRAP>: 100.0%
ap_fixed<28, 20, AP_TRN, AP_SAT>: 32.81%
ap_fixed<32, 16, AP_RND_MIN_INF, AP_WRAP>: 100.0%
ap_fixed<32, 28, AP_TRN_ZERO, AP_SAT>: 100.0%
ap_fixed<28, 20, AP_TRN, AP_SAT_ZERO>: 34.53%
ap_fixed<32, 24, AP_RND, AP_SAT>: 100.0%
ap_fixed<32, 16, AP_RND_INF, AP_SAT>: 100.0%
ap_fixed<32, 28, AP_TRN_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 20, AP_TRN, AP_WRAP>: 35.0%
ap_fixed<32, 24, AP_RND, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 28, AP_TRN_ZERO, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_RND_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<28, 20, AP_TRN_ZERO, AP_SAT>: 31.94%
ap_fixed<32, 24, AP_RND, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_RND_INF, AP_WRAP>: 100.0%
ap_fixed<32, 24, AP_RND_ZERO, AP_SAT>: 100.0%
ap_fixed<32, 16, AP_RND_CONV, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_RND_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 16, AP_RND_CONV, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 24, AP_RND_ZERO, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_RND_CONV, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_TRN, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_RND_MIN_INF, AP_SAT>: 100.0%
ap_fixed<32, 16, AP_TRN, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 24, AP_RND_MIN_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 16, AP_TRN, AP_WRAP>: 100.0%
ap_fixed<32, 24, AP_RND_MIN_INF, AP_WRAP>: 100.0%
ap_fixed<32, 16, AP_TRN_ZERO, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_RND_INF, AP_SAT>: 100.0%
ap_fixed<32, 16, AP_TRN_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 24, AP_RND_INF, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 16, AP_TRN_ZERO, AP_WRAP>: 100.0%
ap_fixed<32, 24, AP_RND_INF, AP_WRAP>: 100.0%
ap_fixed<32, 20, AP_RND, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_RND_CONV, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_RND_CONV, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 24, AP_RND_CONV, AP_WRAP>: 100.0%
ap_fixed<32, 24, AP_TRN, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_TRN, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 24, AP_TRN, AP_WRAP>: 100.0%
ap_fixed<32, 24, AP_TRN_ZERO, AP_SAT>: 100.0%
ap_fixed<32, 24, AP_TRN_ZERO, AP_SAT_ZERO>: 100.0%
ap_fixed<32, 24, AP_TRN_ZERO, AP_WRAP>: 100.0%
ap_fixed<32, 28, AP_RND, AP_SAT>: 100.0%
ap_fixed<32, 28, AP_RND, AP_SAT_ZERO>: 100.0%
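
To make the pattern in the list above easier to see, the lines can be grouped by word width. The sketch below is a hypothetical helper (best_match_by_word_width is not part of the original sweep script) that parses lines in the "configuration: match %" format shown above and keeps the best match percentage per word width.

```python
import re

# Matches lines such as "ap_fixed<20, 12, AP_RND_INF, AP_WRAP>: 35.12%"
LINE_RE = re.compile(
    r"ap_fixed<\s*(\d+)\s*,\s*(\d+)\s*,\s*(\w+)\s*,\s*(\w+)\s*>:\s*([\d.]+)%"
)


def best_match_by_word_width(lines):
    """Return {word_width: best match percentage} for the given result lines."""
    best = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # ignore anything that is not a result line
        width = int(match.group(1))
        pct = float(match.group(5))
        best[width] = max(pct, best.get(width, 0.0))
    return best


# A few lines copied from the sweep output above
sample = [
    "ap_fixed<20, 12, AP_RND_INF, AP_WRAP>: 35.120000000000005%",
    "ap_fixed<20, 8, AP_RND_MIN_INF, AP_SAT>: 54.379999999999995%",
    "ap_fixed<12, 8, AP_RND_INF, AP_SAT>: 55.13%",
    "ap_fixed<32, 8, AP_RND, AP_SAT>: 100.0%",
]
summary = best_match_by_word_width(sample)
```

Feeding the full list through the same function gives one number per word width, which makes it quick to check whether anything below 32 bits ever approaches a full match.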


thesps · 2022-07-21T08:47:00Z

thesps
Jul 21, 2022
Maintainer

Hi,

> This is quite different from what we expected as we're under the assumption that QKeras' quantization model should match hls4ml's

At the step config = hls4ml.utils.config_from_keras_model(nn, granularity='name', default_precision=default_precision, ...), hls4ml picks up the precision used in the QKeras model for the kernel, bias, and activation quantizers (with AP_RND_CONV, AP_SAT, by the way). What remains at the default precision are the accumulators (accum in the config) and the layer output tensors of the Dense/Conv layers (result in the config). For these, you can use profiling and tracing to find appropriate values, which can be different for each layer. In your case you also need to make sure the input precision is appropriate for the input data (I think ap_uint<8>), and I think you need Strategy = Stable for the Softmax at the end.
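
The per-layer settings described above could be applied to the config dict along these lines. This is a sketch under assumptions rather than a tested recipe: it assumes the dict returned with granularity='name' has a 'LayerName' section whose Dense/Conv entries hold a 'Precision' dict, and the layer names and bit widths used in the example are placeholders, not tuned values. It runs on a plain dict, so it can be tried without hls4ml installed.

```python
def apply_precision_overrides(config, input_layer, softmax_layer, overrides):
    """Mutate an hls4ml-style config dict in place and return it.

    `overrides` maps a layer name to e.g. {"accum": "...", "result": "..."}.
    Hypothetical helper; the dict layout is an assumption about the
    output of config_from_keras_model(..., granularity='name').
    """
    layers = config["LayerName"]

    # Raw 8-bit pixel inputs: use an unsigned 8-bit type for the input tensor.
    layers[input_layer]["Precision"] = {"result": "ap_uint<8>"}

    # Softmax implementation suggested in the reply above.
    layers[softmax_layer]["Strategy"] = "Stable"

    # Accumulator/result types are not inferred from QKeras; set them per layer.
    for name, per_variable in overrides.items():
        precision = layers[name].setdefault("Precision", {})
        if not isinstance(precision, dict):
            raise TypeError(f"expected a dict of precisions for layer {name!r}")
        precision.update(per_variable)
    return config


# Placeholder layer names and widths, for illustration only
example_config = {
    "LayerName": {
        "input_1": {"Precision": "ap_fixed<16,8>"},
        "q_dense": {"Precision": {"weight": "ap_fixed<12,5>", "bias": "ap_fixed<12,5>"}},
        "activation": {"Precision": "ap_fixed<16,8>"},
    }
}
apply_precision_overrides(
    example_config,
    input_layer="input_1",
    softmax_layer="activation",
    overrides={"q_dense": {"accum": "ap_fixed<24,12>", "result": "ap_fixed<24,12>"}},
)
```

The weight and bias entries are left untouched on purpose, since those are the ones hls4ml already takes from the QKeras quantizers.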


mackncheesiest Jul 25, 2022
Author

Oh, that's interesting. I didn't realize that the default precision wasn't just overriding any parameters inferred by the conversion of the QKeras model. Thanks for the feedback; I'll give these suggestions a shot and see where they get me.

mackncheesiest Jul 25, 2022
Author

Is it strange that we see many of the same behaviors if we instead train a model composed of non-QKeras layers? Or is the basic idea (that we need to give up on setting a single default quantizer for all layers) still probably the largest factor?

mackncheesiest · 2022-08-08T19:36:07Z

mackncheesiest
Aug 8, 2022
Author

As a follow-up here, AutoQKeras should be able to perform per-layer quantization for us, right? Even if I adapt the AutoQKeras segment from the CNN tutorial notebook, I still see large differences between the behavior of the "best model" from AutoQKeras's search and the performance of that model after conversion via hls4ml. I'm assuming I must still be doing something wrong somewhere, but I can't figure out what it is.

With the script at the end of this post, I end up with the following output:

Match % (AutoQ Keras vs AutoQ hls4ml): 11.540000000000001
Accuracy AutoQ Keras:  93.35666666666667
Accuracy AutoQ hls4ml: 11.236666666666666

i.e., the hls4ml model gets a training accuracy basically equivalent to random guessing, despite AutoQKeras giving a model with reasonable accuracy. Is there a chance that this is related to our usage of Xilinx Vivado 2019.2? We still get good performance when we run that CNN notebook using the same virtualenv, so it doesn't seem to be a package/environment issue.

#!/usr/bin/env python
# coding: utf-8

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

import numpy as np
import hls4ml

from qkeras.autoqkeras import AutoQKeras
from qkeras.autoqkeras.utils import print_qmodel_summary
from sklearn.metrics import accuracy_score

import os
os.environ['PATH'] = '/media/jmack2545/data_drive/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']

def print_dict(d, indent=0):
    align = 20  # column at which values are printed
    for key, value in d.items():
        print('  ' * indent + str(key), end='')
        if isinstance(value, dict):
            print()
            print_dict(value, indent+1)
        else:
            print(':' + ' ' * (align - len(str(key)) - 2 * indent) + str(value))

def generate_mnist_dataset(one_hot = True):
    (x_train,y_train),(x_test,y_test) = tf.keras.datasets.mnist.load_data()
    if one_hot:
        y_train = tf.keras.utils.to_categorical(y_train)
        y_test = tf.keras.utils.to_categorical(y_test)
    return (x_train,y_train),(x_test,y_test)
            
def generate_mnist_network(filt_size, n_filt, n_dense):
    # build the baseline architecture
    in_layer = Input((28,28,1))
    x = in_layer
    x = Conv2D(filters=n_filt,
               kernel_size=filt_size,
               padding='valid',
               kernel_initializer='glorot_uniform',
              )(x)
    x = BatchNormalization(name='bn_conv_1')(x)
    x = Activation(activation='relu')(x)
    x = MaxPooling2D()(x)
    x = Flatten()(x)
    x = Dense(n_dense)(x)
    x = BatchNormalization(name='bn_dense_1')(x)
    x = Activation(activation='relu')(x)
    # avoid touching these with the autoqkeras optimization process
    x = Dense(10, name='output_dense')(x)
    out_layer = Activation(activation='softmax', name='output_softmax')(x)
    
    model = Model(inputs=[in_layer],outputs=[out_layer])

    # check hls4ml synthesizability with reuse factor = 1
    for layer in model.layers:
        if layer.__class__.__name__ in ['Conv2D', 'Dense']:
            w = layer.get_weights()[0] # 0 = weights, 1 = biases
            layersize = np.prod(w.shape)
            print("{}: {}".format(layer.name,layersize))
            if (layersize > 4096): # layersize is the number of weights in the layer
                print("Layer {} is too large ({}), are you sure you want to train?".format(layer.name,layersize))

    # finishing touches...
    LOSS        = tf.keras.losses.CategoricalCrossentropy()
    OPTIMIZER   = tf.keras.optimizers.Adam(learning_rate=3E-3, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=True)

    model.compile(loss=LOSS, optimizer=OPTIMIZER, metrics=["accuracy"])
    
    return model

def get_autoq_model(nn, x_train, y_train, x_test, y_test):
    quantization_config = {
        "kernel": {
            "quantized_bits(2,0,1,alpha=1.0)": 2,
            "quantized_bits(4,0,1,alpha=1.0)": 4,
            "quantized_bits(6,0,1,alpha=1.0)": 6,
            "quantized_bits(8,0,1,alpha=1.0)": 8,
        },
        "bias": {
            "quantized_bits(2,0,1,alpha=1.0)": 2,
            "quantized_bits(4,0,1,alpha=1.0)": 4,
            "quantized_bits(6,0,1,alpha=1.0)": 6,
            "quantized_bits(8,0,1,alpha=1.0)": 8,
        },
        "activation": {
            "quantized_relu(3,1)": 3,
            "quantized_relu(4,2)": 4,
            "quantized_relu(8,2)": 8,
            "quantized_relu(8,4)": 8,
            "quantized_relu(16,6)": 16
        },
        "linear": {
            "quantized_bits(2,0,1,alpha=1.0)": 2,
            "quantized_bits(4,0,1,alpha=1.0)": 4,
            "quantized_bits(6,0,1,alpha=1.0)": 6,
            "quantized_bits(8,0,1,alpha=1.0)": 8,
        }
    }

    # Layer-type limitations on which configurations are acceptable
    limit = {
        "Dense": [8, 8, 16],
        "Conv2D": [8, 8, 16],
        "Activation": [16],
    }

    goal_energy = {
        "type": "energy",
        "params": {
            "delta_p": 8.0,
            "delta_n": 8.0,
            "rate": 1.5,
            "stress": 1.0,
            "process": "horowitz",
            "parameters_on_memory": ["sram", "sram"],
            "activations_on_memory": ["sram", "sram"],
            "rd_wr_on_io": [False, False],
            "min_sram_size": [0, 0],
            "source_quantizers": ["fp32"],
            "reference_internal": "int8",
            "reference_accumulator": "int32"
        }
    }

    run_config = {
        # Note: goal_bits seems to just not work well in general even though it's more of what we want
        "goal": goal_energy,
        "quantization_config": quantization_config,
        "learning_rate_optimizer": False,
        "transfer_weights": False, # True would start from the unquantized model's weights
        "mode": "bayesian", # This can be bayesian,random,hyperband
        "seed": 1000,
        "limit": limit,
        "tune_filters": "layer",
        "tune_filters_exceptions": "^output",
        "distribution_strategy": None,
        "max_trials": 2 # Let's just do 2 trials for this demonstrator, ideally you should do as many as possible
    }

    autoqk = AutoQKeras(nn, output_dir=f'autoq_mnist_sweep', metrics=["acc"], custom_objects={}, **run_config)
    # yeah yeah, test data != validation data
    autoqk.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5)

    aqmodel = autoqk.get_best_model()
    print_qmodel_summary(aqmodel)
    
    return aqmodel

def strip_autoq_model(aqmodel):
    aqmodel.save_weights(f"autoq_mnist_finalparams.h5")

    layers = [l for l in aqmodel.layers]
    x = layers[0].output
    for i in range(1, len(layers)):
        x = layers[i](x)

    new_model = Model(inputs=[layers[0].input], outputs=[x])   
    LOSS        = tf.keras.losses.CategoricalCrossentropy()
    OPTIMIZER   = tf.keras.optimizers.Adam(learning_rate=3E-3, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=True)

    new_model.compile(loss=LOSS, optimizer=OPTIMIZER, metrics=["accuracy"])
    new_model.summary()
    new_model.load_weights(f"autoq_mnist_finalparams.h5")
    print_qmodel_summary(new_model)  
    
    return new_model

def convert_via_hls4ml(new_model):
    hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
    hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
    hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

    hls_config_aq = hls4ml.utils.config_from_keras_model(new_model, granularity='name')
    hls_config_aq['Model']['ReuseFactor'] = 1
    hls_config_aq['Model']['Precision'] = 'ap_fixed<16,6>'
    hls_config_aq['Model']['Strategy'] = 'Latency'
    hls_config_aq['LayerName']['output_softmax']['Strategy'] = 'Stable'
    print_dict(hls_config_aq)

    cfg_aq = hls4ml.converters.create_config(backend='Vivado')
    cfg_aq['IOType']     = 'io_stream'
    cfg_aq['HLSConfig']  = hls_config_aq
    cfg_aq['KerasModel'] = new_model
    cfg_aq['OutputDir']  = f'autoq_mnist_hls4ml/'
    cfg_aq['XilinxPart'] = 'xczu28dr-ffvg1517-2-e'
    #pprint.pprint(cfg_aq)
  
    hls_model_aq = hls4ml.converters.keras_to_hls(cfg_aq)
    # hls_model_aq.build(reset=True, synth=False, vsynth=False, csim=False, cosim=False)
    hls_model_aq.compile()
    
    return hls_model_aq

if __name__ == "__main__":
    (x_train,y_train), (x_test,y_test) = generate_mnist_dataset()
    x_train = x_train.astype(float)
    x_test = x_test.astype(float)

    # instantiate a baseline architecture
    nn = generate_mnist_network(filt_size = 3, n_filt = 5, n_dense = 4)

    # execute the autoqkeras search
    aqmodel = get_autoq_model(nn, x_train, y_train, x_test, y_test)

    # train the best model found by autoqkeras a bit more
    aqmodel.fit(x_train,
                y_train,
                epochs = 5,
                validation_data = (x_test, y_test),
                callbacks = [], 
                verbose=1)

    # This model (apparently) has some remnants from the optimization procedure attached to it
    new_model = strip_autoq_model(aqmodel)
    
    # Create the hls4ml project directory and compile our C++ representation
    hls_model_aq = convert_via_hls4ml(new_model)

    # compare the two networks
    y_predict_aq        = new_model.predict(x_train)
    y_predict_hls4ml_aq = hls_model_aq.predict(x_train)

    num_matches = sum(np.argmax(y_predict_aq, axis=1) == np.argmax(y_predict_hls4ml_aq, axis=1))
    num_images = y_predict_aq.shape[0]

    print(f"Match % (AutoQ Keras vs AutoQ hls4ml): {num_matches / num_images * 100.0}")

    accuracy_keras  = 100 * float(accuracy_score (np.argmax(y_train, axis=1), np.argmax(y_predict_aq, axis=1)))
    accuracy_hls4ml = 100 * float(accuracy_score (np.argmax(y_train, axis=1), np.argmax(y_predict_hls4ml_aq, axis=1)))

    print("Accuracy AutoQ Keras:  {}".format(accuracy_keras))
    print("Accuracy AutoQ hls4ml: {}".format(accuracy_hls4ml))

4 replies

mackncheesiest Aug 10, 2022
Author

Sorry to be a bother (and of course any feedback is voluntary and greatly appreciated), but if anyone has any ideas about why we see mismatches with the above code, I'd really appreciate it.

vloncar Aug 10, 2022
Maintainer

Hi Joshua, did you try increasing the accum type precision that @thesps suggested? Barring any bugs in the conversion of the model, this is the most common culprit: QKeras does all intermediate computation in full-precision (floating-point) types, while in hls4ml everything is quantized to fixed precision, even the intermediate results (represented by the accum type), and that is where the difference comes from. Beyond that, you can explore the profiling package and go layer by layer to see where the difference in the output tensor is coming from. Sorry I can't provide a more detailed answer now; it would require running your model and debugging it, but I'm swamped and don't have time. This is valuable feedback for us, and we will do our best to improve the documentation to provide concise suggestions for the next release.
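To make the accum-type point concrete, here is a small NumPy sketch (not hls4ml code) of what happens when every intermediate sum is forced onto a fixed-point grid. `to_fixed` and `fixed_dot` are hypothetical helpers written for this illustration; they assume the default `ap_fixed` behaviour of truncation and wrap-around on overflow, and the input values are made up.

```python
import numpy as np

def to_fixed(x, width, integer, saturate=False):
    """Snap x onto a signed fixed-point grid with `width` total bits and
    `integer` integer bits (sign included). Truncates toward -inf and
    wraps on overflow unless saturate=True."""
    x = np.asarray(x, dtype=np.float64)
    scale = 2.0 ** (width - integer)           # 1 / LSB
    q = np.floor(x * scale)                    # truncation toward -inf
    lo, hi = -2.0 ** (width - 1), 2.0 ** (width - 1) - 1
    if saturate:
        q = np.clip(q, lo, hi)                 # clamp to representable range
    else:
        q = (q - lo) % (2.0 ** width) + lo     # two's-complement wrap-around
    return q / scale

def fixed_dot(x, w, width, integer):
    """Dot product where the running sum is re-quantized after every step."""
    acc = 0.0
    for xi, wi in zip(x, w):
        acc = float(to_fixed(acc + xi * wi, width, integer))
    return acc

x = [20.0, 20.0, 20.0]
w = [1.0, 1.0, 1.0]
print(float(np.dot(x, w)))        # 60.0 (floating-point accumulation)
print(fixed_dot(x, w, 16, 6))     # -4.0 (range is [-32, 32), so the sum wraps)
print(fixed_dot(x, w, 32, 16))    # 60.0 (enough integer bits)
```

The narrow accumulator does not lose a little precision here; it wraps to a completely different value, which is the kind of error that can turn a working classifier into a random one.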

mackncheesiest Aug 15, 2022
Author

No worries, thank you for the feedback so far. I've been experimenting with accum and layer-level profiling, but no luck so far, even with accum as high as ap_fixed<32, 24>. It looks like setting an activity_regularizer on some of my layers might help reduce the dynamic range of the produced model, but that's still a work in progress.
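One way to check whether dynamic range is the limiting factor is to compute how many integer bits a tensor's observed values need. The helper below is a hypothetical sketch, not part of hls4ml. It assumes the `ap_fixed<W,I>` convention in which `I` includes the sign bit, so the representable range is roughly `[-2**(I-1), 2**(I-1))`. The example arrays are made up.

```python
import numpy as np

def min_integer_bits(values):
    """Smallest I (sign bit included) such that every value lies in
    [-2**(I-1), 2**(I-1))."""
    values = np.asarray(values, dtype=np.float64)
    if not np.all(np.isfinite(values)):
        raise ValueError("values must be finite")
    lo, hi = float(values.min()), float(values.max())
    bits = 1
    while not (-2.0 ** (bits - 1) <= lo and hi < 2.0 ** (bits - 1)):
        bits += 1
    return bits

print(min_integer_bits([0.0, 0.5]))      # 1
print(min_integer_bits([-32.0, 31.9]))   # 6, i.e. fits ap_fixed<16,6>
print(min_integer_bits([0.0, 255.0]))    # 9, e.g. unnormalized 8-bit pixel values
```

Running this on each layer's floating-point output gives a lower bound on the integer width needed for that layer's result type.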

mackncheesiest Aug 15, 2022
Author

One question regarding hls4ml.model.profiling.numerical: my brief read through the code seems to imply that the HLS models produced by hls4ml.converters.keras_to_hls include all of the optimizations that are listed in the final / after optimization images. However, I don't actually know if this is the case. Do you have any insight on which of these plots are actually representative of the model performance I should see when running hls_model.predict?
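Independently of the plots, a layer-by-layer numerical comparison can show where the two models first diverge. The sketch below only assumes two dictionaries that map layer names to output arrays, one from the Keras model and one from the compiled hls4ml model. In the hls4ml tutorials such dictionaries come from `hls4ml.model.profiling.get_ymodel_keras` and `hls_model.trace` (with `Trace` enabled in the layer config); treat those calls as something to verify against your installed version. `first_divergent_layer` and the sample data are hypothetical.

```python
import numpy as np

def first_divergent_layer(ref_trace, hls_trace, tol=0.05):
    """Walk the layers in ref_trace order and return (name, max_abs_diff)
    for the first layer whose outputs differ by more than tol.
    Returns None if every shared layer matches within tol."""
    for name, ref in ref_trace.items():
        if name not in hls_trace:
            continue  # layer not present in the other trace (e.g. fused away)
        ref = np.asarray(ref, dtype=np.float64)
        hls = np.asarray(hls_trace[name], dtype=np.float64).reshape(ref.shape)
        diff = float(np.max(np.abs(ref - hls)))
        if diff > tol:
            return name, diff
    return None

# Made-up traces: the conv output agrees, the dense output does not.
keras_trace = {"conv": np.array([1.0, 2.0]), "dense": np.array([3.0, 4.0])}
hls_trace = {"conv": np.array([1.0, 2.01]), "dense": np.array([3.0, -4.0])}
print(first_divergent_layer(keras_trace, hls_trace))  # ('dense', 8.0)
```

The first layer reported is usually the one whose precision settings deserve attention, since errors in later layers may just be inherited from it.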

mackncheesiest · 2022-08-17T16:51:09Z

mackncheesiest
Aug 17, 2022
Author

Follow-up here: I didn't have much luck modifying accum, but I've found that it is possible to make the two models match by setting the result quantizer absurdly large, with an hls4ml config process that looks like

hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND_CONV'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

hls_config = hls4ml.utils.config_from_keras_model(autoq_network, granularity='name')
hls_config['Model']['ReuseFactor'] = 1
hls_config['Model']['Strategy'] = 'Latency'
hls_config['LayerName']['output_softmax']['Strategy'] = 'Stable'

quantizers_to_modify = ['result']
big_quantizer = 'ap_fixed<64,48,AP_RND_CONV,AP_SAT>'

for layer in hls_config['LayerName'].keys():
    if isinstance(hls_config['LayerName'][layer]['Precision'], dict):
        for quantizer in hls_config['LayerName'][layer]['Precision'].keys():
            if quantizer in quantizers_to_modify:
                hls_config['LayerName'][layer]['Precision'][quantizer] = big_quantizer

cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType']     = 'io_stream'
cfg['HLSConfig']  = hls_config
cfg['KerasModel'] = autoq_network
cfg['OutputDir']  = f'hls4ml_proj/'
cfg['XilinxPart'] = 'xczu28dr-ffvg1517-2-e'

hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()

which then gives

Match % (AutoQ Keras vs AutoQ hls4ml): 99.7
Accuracy AutoQ Keras:  89.4
Accuracy AutoQ hls4ml: 89.7

Thanks for the suggestion there, @thesps.
It's hopefully just a process of tuning from here on out.
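The tuning step could be organized as a search from narrow to wide result types that stops at the first one that still matches the Keras model. This is only a sketch: `match_fraction` is a hypothetical callable that would rebuild the hls4ml model with the given result precision and return the fraction of predictions that agree with Keras. The candidate widths and the stand-in scoring function below are made up.

```python
def ap_fixed_str(width, integer, rounding="AP_RND_CONV", saturation="AP_SAT"):
    """Format a precision string in the same style as big_quantizer above."""
    return f"ap_fixed<{width},{integer},{rounding},{saturation}>"

def narrowest_passing(candidates, match_fraction, threshold=0.99):
    """Try (width, integer) pairs from narrowest to widest and return the
    first precision string whose match fraction reaches the threshold."""
    for width, integer in sorted(candidates):
        precision = ap_fixed_str(width, integer)
        if match_fraction(precision) >= threshold:
            return precision
    return None  # nothing in the candidate list was good enough

# Stand-in for a real evaluation: pretend only the two widest types match.
def fake_match_fraction(precision):
    good = {ap_fixed_str(32, 16), ap_fixed_str(64, 48)}
    return 0.997 if precision in good else 0.115

print(narrowest_passing([(64, 48), (16, 6), (32, 16)], fake_match_fraction))
# ap_fixed<32,16,AP_RND_CONV,AP_SAT>
```

In practice the search would probably be done per layer rather than with one shared result type, since the layers' output ranges differ.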

0 replies
