Deep Learning with OpenCV

01 Jun 2020

Deep learning with OpenCV using its built-in dnn module

Introduction to the dnn module

The dnn module of OpenCV was included in the main repository in v3.3.

With OpenCV 3.3 or later, we can use pre-trained networks from popular deep learning frameworks. Because they are pre-trained, we don’t need to spend many hours training the network; instead, we can complete a forward pass and use the output to make a decision within our application.

For more information about supported frameworks and networks, follow this link.

OpenCV deep learning functions and frameworks

Using OpenCV 3.3, we can preprocess images loaded from disk and convert them into input blobs using the following functions inside dnn:

  • cv2.dnn.blobFromImage
  • cv2.dnn.blobFromImages
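
Here is a minimal sketch of how blobFromImage is typically used; the image path is a placeholder, and the 224x224 size and ImageNet mean values are assumptions that must match whichever network will consume the blob:

```python
import cv2

# "input.jpg" is a placeholder path.
image = cv2.imread("input.jpg")

# Convert the image into a 4-D blob: resize to 224x224, keep pixel
# values unscaled, and subtract the (assumed) ImageNet channel means.
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                             mean=(104, 117, 123))

print(blob.shape)  # (1, 3, 224, 224) -- batch, channels, height, width
```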

Use the “read” methods to load a serialized model from disk directly:

  • cv2.dnn.readNetFromCaffe
  • cv2.dnn.readNetFromTensorflow
  • cv2.dnn.readNetFromTorch

Once we have loaded the model from disk, we hand our blob to the network with the .setInput method, and the .forward method is used to forward-propagate it and obtain the actual classification.
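
Putting these pieces together, here is a minimal sketch of loading a Caffe model and running a forward pass; the file names and image path are placeholders, and the preprocessing values assume a GoogLeNet-style classifier:

```python
import cv2

# Placeholder file names: the .prototxt describes the architecture,
# the .caffemodel holds the trained weights.
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")

# Preprocess an image into a blob (values assume GoogLeNet).
image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(image, 1.0, (224, 224), (104, 117, 123))

# Feed the blob in and forward-propagate it through the network.
net.setInput(blob)
preds = net.forward()

# For a classifier, preds[0] holds one score per class.
idx = preds[0].argmax()
print("predicted class index:", idx)
```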

We’ll take a look at how to load a model and classify images using a pre-trained model.

Image classification

For this example, we’ll use:

  • Caffe framework
  • GoogLeNet (also known as Inception, pre-trained on ImageNet) network

To learn more about how we load a pre-trained Caffe model and use it to classify an image using OpenCV, follow this.

Object Detection

Image classification just classifies an image into one of ImageNet’s 1,000 class labels, but it cannot tell us where an object resides in the image. In order to obtain the bounding box (x, y) coordinates for an object in an image, we need to apply Object Detection.

When it comes to deep learning based object detection, there are primarily three methods:

Faster R-CNNs

This technique can be difficult to understand, hard to implement, and challenging to train. The algorithm is quite slow, on the order of 7 FPS.

YOLO

This technique is faster, capable of processing 40-90 FPS on a Titan X GPU. The super-fast variant of YOLO can even reach 155 FPS. The problem with YOLO is its accuracy, which is noticeably lower.

SSDs

This technique is a balance between the two above. The algorithm is straightforward to understand and implement, and it gives a comparatively fast throughput of 22-46 FPS depending on which variant of the network we use. SSDs also tend to be more accurate than YOLO.

MobileNets: Efficient (deep) neural networks

When building object detection networks, we normally take an existing network architecture, such as VGG or ResNet, and use it inside the object detection pipeline. But the problem is that these networks can be very large, on the order of 200-500 MB. They are unsuitable for resource-constrained devices due to their sheer size and the resulting number of computations.

Instead, we can use MobileNets (Howard et al., 2017), proposed in another paper by Google researchers. These networks are designed for resource-constrained devices such as smartphones and IoT devices. In place of the standard convolutions used in traditional CNNs, they use depthwise separable convolutions.
The general idea behind depthwise separable convolution is to split the convolution into two stages:

  • A 3x3 depthwise convolution
  • Followed by a 1x1 pointwise convolution

This allows us to reduce the number of parameters in our network.
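
As a rough back-of-the-envelope illustration of that saving (the kernel and channel sizes below are made-up numbers, not figures from the MobileNet paper):

```python
# Assume a 3x3 kernel over M = 64 input channels producing
# N = 128 output channels (hypothetical sizes).
k, M, N = 3, 64, 128

# Standard convolution: one k x k x M filter per output channel.
standard = k * k * M * N        # 73,728 parameters

# Depthwise separable convolution:
#   depthwise stage: one k x k filter per input channel,
#   pointwise stage: one 1x1 x M filter per output channel.
separable = k * k * M + M * N   # 576 + 8,192 = 8,768

print(standard / separable)     # ~8.4x fewer parameters
```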

But we lose some accuracy with MobileNets.

To work around this shortcoming, we combine MobileNets and SSDs for fast, efficient deep learning based object detection.

For this example, we’ll use a MobileNet SSD.

The MobileNet SSD was first trained on the COCO dataset (Common Objects in COntext) and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean Average Precision).

We can therefore detect 20 object classes in images (+1 for the background class), including airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and tv monitors.
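
As a hedged sketch of how such a detector can be driven through the dnn module (the file names, image path, preprocessing constants, and confidence threshold below are common choices for this model, not values stated in this post):

```python
import cv2

# Placeholder paths to the Caffe-format MobileNet SSD files.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

# The 20 PASCAL VOC classes plus the background class, in the
# order the model was trained with.
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow",
           "diningtable", "dog", "horse", "motorbike", "person",
           "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

image = cv2.imread("input.jpg")  # placeholder image path
h, w = image.shape[:2]

# MobileNet SSD expects 300x300 inputs scaled to roughly [-1, 1]:
# scalefactor 1/127.5 and mean 127.5 are the commonly used values.
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 127.5,
                             size=(300, 300), mean=(127.5, 127.5, 127.5))
net.setInput(blob)
detections = net.forward()

# detections has shape (1, 1, N, 7); each row holds
# [batch_id, class_id, confidence, x1, y1, x2, y2], with box
# coordinates normalized to [0, 1].
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:  # arbitrary confidence threshold
        class_id = int(detections[0, 0, i, 1])
        box = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
        print(CLASSES[class_id], confidence, box)
```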