There are many approaches to object detection with convolutional neural networks, but the following two families of models are dominant:
Once we have bounding boxes, the same object is often detected several times, in boxes that are very similar in size and shifted only slightly relative to one another. In such cases we need a method that selects one of the boxes and rejects the others.
Non-maximal suppression (NMS) does this as follows: for a given bounding box, it finds all other boxes that overlap it substantially (that is, whose intersection over union, or IoU, with it exceeds a threshold), keeps the one box from this set (including the original box) that has the highest confidence, and discards the rest. This substantially improves the quality of the output, though at some computational cost.
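The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration of the common greedy variant (process boxes in descending order of confidence), not the implementation of any particular detection library. The function names `iou` and `nms`, the `[x1, y1, x2, y2]` box layout, and the default threshold of 0.5 are assumptions made for this example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each given as [x1, y1, x2, y2].

    Assumes every box has positive area.
    """
    # Corners of the intersection rectangle.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Width and height are clipped at zero for boxes that do not overlap.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximal suppression; returns the indices of the boxes to keep."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard every remaining box that overlaps the kept box too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Two near-duplicate detections of one object, plus a separate object.
boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # -> [0, 2]
```

The first two boxes have an IoU of about 0.82, so only the more confident one survives. The third box overlaps neither and is kept. The loop compares each kept box against all remaining boxes, which is the source of the computational cost mentioned above.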
Post-training quantization is a common technique for reducing model size. It can also lower latency by roughly 2 to 3 times, with little degradation in model accuracy.
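To make the size reduction concrete, here is a minimal sketch of the idea behind quantizing already-trained weights: 32-bit floats are mapped to 8-bit integers plus a scale and a zero point. The affine int8 scheme and the helper names `quantize_int8` and `dequantize` are assumptions chosen for illustration; real toolkits differ in the details. The sketch shows only the storage saving and the rounding error, not the latency gain, which depends on the hardware and runtime.

```python
import numpy as np

def quantize_int8(weights):
    """Affine quantization of a float32 array to int8 (illustrative scheme)."""
    # Extend the range to include zero so that 0.0 maps to an exact integer.
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:  # all-zero tensor: any positive scale works
        scale = 1.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

# Stand-in for a trained weight matrix (random values, for illustration only).
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)

q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

print(weights.nbytes / q.nbytes)  # -> 4.0
print(float(np.max(np.abs(weights - restored))) <= scale)  # -> True
```

The int8 copy takes a quarter of the storage, and each restored weight differs from the original by no more than one quantization step, which is why accuracy usually changes little.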
Quantization can be done in two ways: