View the runnable example on GitHub
Quantize TensorFlow Model for Inference by Specifying Accuracy Control
To quantize a TensorFlow model while keeping the accuracy drop under control, you can call the InferenceOptimizer.quantize API with a few extra parameters, which takes only a few lines of code.
Let’s use an EfficientNetB0 model, pretrained on the ImageNet dataset and finetuned on the Imagenette dataset, as an example (see the full definitions of prepare_dataset and create_model in the runnable example):
[ ]:
train_set, test_set, calibration_set, ds_info = prepare_dataset()
ori_model = create_model()
ori_model.fit(train_set,
              epochs=10,
              steps_per_epoch=(ds_info.splits['train'].num_examples // 512 + 1),
              )
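The steps_per_epoch expression above is a floor division plus one. A minimal sketch of that arithmetic follows, assuming 512 is the training batch size (the source does not say so explicitly). The helper names and sample dataset sizes are hypothetical and used for illustration only:

```python
import math


def steps_floor_plus_one(num_examples: int, batch_size: int = 512) -> int:
    """The expression used in the example: floor division plus one."""
    return num_examples // batch_size + 1


def steps_ceil(num_examples: int, batch_size: int = 512) -> int:
    """Number of batches needed to cover every example exactly once."""
    return math.ceil(num_examples / batch_size)


# Hypothetical dataset sizes, not the real Imagenette split size.
for n in (1000, 1024):
    print(n, steps_floor_plus_one(n), steps_ceil(n))
```

The two forms agree unless the dataset size is an exact multiple of the batch size, in which case the floor-plus-one form requests one extra step.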
By default, InferenceOptimizer.quantize() doesn’t search the tuning space and returns the fully-quantized model without considering the accuracy drop. If you need to search quantization tuning space for a model with accuracy control, you may need to specify a few parameters.
The following parameters help you tune the results for both INC and POT quantization:
- metric: A tensorflow.keras.metrics.Metric object for evaluation.
- accuracy_criterion: A dictionary that specifies the acceptable accuracy drop, e.g. {'relative': 0.01, 'higher_is_better': True}.
    - relative / absolute: The drop type, i.e. whether the accuracy drop is measured relative to the baseline or as an absolute difference from it.
    - higher_is_better: Whether a larger metric value means better accuracy.
- max_trials: The maximum number of trials in the search. If the algorithm cannot find a satisfying model within this limit, it exits and raises an error.
- batch: The batch size of the dataset. It only takes effect during evaluation. If it is not set, batch=1 is used for evaluation.
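To make the difference between a relative and an absolute criterion concrete, here is a minimal sketch of how such a dictionary could be interpreted. It only illustrates the semantics described above and is not the code that INC or POT actually runs. The function name meets_criterion and the default of higher_is_better=True are assumptions:

```python
def meets_criterion(baseline: float, quantized: float, criterion: dict) -> bool:
    """Return True if the quantized metric stays within the acceptable drop.

    Illustrative only: a hypothetical helper, not part of BigDL-Nano.
    """
    # Assumption: treat a missing 'higher_is_better' key as True.
    higher_is_better = criterion.get("higher_is_better", True)
    # A positive drop always means the quantized model got worse.
    drop = baseline - quantized if higher_is_better else quantized - baseline
    if "relative" in criterion:
        # Relative: the drop is measured as a fraction of the baseline.
        return drop <= criterion["relative"] * abs(baseline)
    # Absolute: the drop is measured in the metric's own units.
    return drop <= criterion["absolute"]


# Baseline accuracy 0.90, quantized accuracy 0.892.
print(meets_criterion(0.90, 0.892, {"relative": 0.01, "higher_is_better": True}))
print(meets_criterion(0.90, 0.880, {"relative": 0.01, "higher_is_better": True}))
```

With a baseline of 0.90, a relative criterion of 0.01 tolerates a drop of up to 0.009, while an absolute criterion of 0.01 tolerates a drop of up to 0.01 regardless of the baseline.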
[ ]:
from bigdl.nano.tf.keras import InferenceOptimizer
from tensorflow.keras.metrics import CategoricalAccuracy
q_model = InferenceOptimizer.quantize(ori_model,
                                      x=calibration_set,
                                      metric=CategoricalAccuracy(),
                                      accuracy_criterion={'relative': 0.1,
                                                          'higher_is_better': True},
                                      tuning_strategy='bayesian',
                                      timeout=0,
                                      max_trials=10)
📝 Note
By default, InferenceOptimizer quantizes your TensorFlow models to int8 precision through static post-training quantization. The ‘dynamic’ approach is not supported yet. In this case, x (the calibration data) is required for accuracy control.
Please refer to the API documentation for more information on InferenceOptimizer.quantize.
You can then run inference as usual with the quantized model:
[ ]:
import tensorflow as tf

x = tf.random.normal(shape=(2, 224, 224, 3))
# use the optimized model here
y_hat = q_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
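As a quick sanity check after quantization, you may want to see how often the quantized model picks the same class as the original one. The sketch below is a plain-Python illustration of that idea. The helper top1_agreement and the sample scores are hypothetical, and it works on nested lists rather than tensors:

```python
def top1_agreement(scores_a, scores_b) -> float:
    """Fraction of samples for which both score lists pick the same class."""
    def argmax(row):
        # Index of the largest score in one row.
        return max(range(len(row)), key=row.__getitem__)

    matches = sum(argmax(a) == argmax(b) for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical class scores for 3 samples and 3 classes.
fp32_scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
int8_scores = [[0.2, 0.6, 0.2], [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
print(top1_agreement(fp32_scores, int8_scores))
```

To apply it to real outputs, convert the outputs of the original and the quantized model to nested lists first, for example with .numpy().tolist() if they are TensorFlow tensors.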
📚 Related Readings