View the runnable example on GitHub

Quantize TensorFlow Model for Inference by Specifying Accuracy Control

To quantize your TensorFlow models while keeping the accuracy drop under control, you can call the InferenceOptimizer.quantize API with a few extra parameters. It takes only a few lines of code.

As an example, let’s take an EfficientNetB0 model pretrained on the ImageNet dataset and finetuned on the Imagenette dataset for validation (see the full definitions of prepare_dataset and create_model in the runnable example):

[ ]:
train_set, test_set, calibration_set, ds_info = prepare_dataset()
ori_model = create_model()
ori_model.fit(train_set,
              epochs=10,
              steps_per_epoch=(ds_info.splits['train'].num_examples // 512 + 1),
              )

By default, InferenceOptimizer.quantize() doesn’t search the tuning space and returns the fully quantized model without considering the accuracy drop. To search the quantization tuning space for a model that satisfies an accuracy constraint, you need to specify a few more parameters.

The following parameters help you tune the results for both INC and POT quantization:

  • metric: A tensorflow.keras.metrics.Metric object for evaluation.

  • accuracy_criterion: A dictionary to specify the acceptable accuracy drop, e.g. {'relative': 0.01, 'higher_is_better': True}

    • relative / absolute: The drop type, i.e. whether the acceptable accuracy drop is measured relative to the baseline or as an absolute difference from it

    • higher_is_better: Indicates whether a larger metric value means better accuracy

  • max_trials: Maximum number of trials in the search. If the algorithm can’t find a satisfying model within this limit, it exits and raises an error.

  • batch: The batch size of the dataset. It only takes effect during evaluation. If it’s not set, batch=1 is used for evaluation.
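To make the accuracy_criterion semantics concrete, here is a minimal sketch of how a relative or absolute drop could be checked against a baseline. The helper meets_accuracy_criterion is hypothetical and written only for illustration. It is not part of InferenceOptimizer, and the actual tuning logic inside INC or POT may differ in detail.

```python
def meets_accuracy_criterion(baseline, quantized, criterion):
    """Return True if the quantized metric stays within the allowed drop.

    Hypothetical helper for illustration only; not a BigDL-Nano API.
    """
    higher_is_better = criterion.get('higher_is_better', True)
    # Signed drop: a positive value means the quantized model is worse.
    if higher_is_better:
        drop = baseline - quantized
    else:
        drop = quantized - baseline

    if 'relative' in criterion:
        # Relative drop: the tolerance is a fraction of the baseline value.
        return drop <= criterion['relative'] * abs(baseline)
    if 'absolute' in criterion:
        # Absolute drop: the tolerance is in the metric's own units.
        return drop <= criterion['absolute']
    raise ValueError("criterion needs a 'relative' or an 'absolute' key")


# Baseline accuracy 0.90, at most 1% relative drop allowed (0.009 absolute).
criterion = {'relative': 0.01, 'higher_is_better': True}
acceptable = meets_accuracy_criterion(0.90, 0.895, criterion)
rejected = meets_accuracy_criterion(0.90, 0.88, criterion)
```

Under this reading, a relative tolerance scales with the baseline, while an absolute tolerance stays fixed regardless of how good the original model is.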

[ ]:
from bigdl.nano.tf.keras import InferenceOptimizer
from tensorflow.keras.metrics import CategoricalAccuracy

q_model = InferenceOptimizer.quantize(ori_model,
                                      x=calibration_set,
                                      metric=CategoricalAccuracy(),
                                      accuracy_criterion={'relative': 0.1,
                                                          'higher_is_better': True},
                                      tuning_strategy='bayesian',
                                      timeout=0,
                                      max_trials=10)

📝 Note

By default, InferenceOptimizer quantizes your TensorFlow models to int8 precision through static post-training quantization. The ‘dynamic’ approach is not supported yet. With the static approach, x (the calibration data) is required for accuracy control.

Please refer to the API documentation for more information on InferenceOptimizer.quantize.

You can then run the usual inference steps with the quantized model:

[ ]:
import tensorflow as tf

x = tf.random.normal(shape=(2, 224, 224, 3))
# use the optimized model here
y_hat = q_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
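As a quick sanity check after quantization, you may want to see how often the quantized model picks the same class as the original one on the same inputs. The sketch below uses plain Python lists as stand-ins for model outputs. The helpers top1 and agreement_rate are hypothetical names introduced here, not part of any library.

```python
def top1(scores):
    """Index of the largest score in each row, like tf.argmax(scores, axis=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def agreement_rate(ref_scores, quant_scores):
    """Fraction of samples where both models predict the same class."""
    if not ref_scores or len(ref_scores) != len(quant_scores):
        raise ValueError("both inputs must be non-empty and of equal length")
    ref, quant = top1(ref_scores), top1(quant_scores)
    matches = sum(r == q for r, q in zip(ref, quant))
    return matches / len(ref)


# Stand-in scores for four samples and two classes (illustrative values only).
ref_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
quant_scores = [[0.2, 0.8], [0.7, 0.3], [0.6, 0.4], [0.55, 0.45]]
rate = agreement_rate(ref_scores, quant_scores)
```

With real models, the two score lists could be obtained from something like ori_model(x).numpy().tolist() and q_model(x).numpy().tolist(), assuming both calls return tensors of shape (batch, num_classes).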