Session: Object Detection: from Viola-Jones to Convolutional-Neural-Networks-based object detector

Target-audience:
Advanced

Object Detection: from Viola-Jones to Convolutional-Neural-Networks-based object detector

Problem formulation.

The task of object detection consists of (1):

  • finding the locations of all instances of an object in an image;
  • determining the sizes of all found instances;
  • tolerating, to a certain degree, changes in viewpoint and illumination as well as occlusions;
  • depending on the application, running in real time or at a speed that makes batch processing feasible.

One example of the above is face detection. The input to the algorithm is an image that may or may not contain faces, and the algorithm should find bounding boxes for each face. At the same time, the algorithm should be robust with regard to:

  • intra-class variation: detecting both baby faces and bearded faces;
  • differences in pose and expression: detecting slightly tilted faces and faces with varying expressions;
  • occlusions: detecting faces partially obscured by objects such as glasses.

Often, face detection must also run very rapidly (2). For example, some cameras adjust focus and exposure in real time based on the faces found in the scene (3).
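To give a feel for why speed matters, the sketch below enumerates the candidate windows a naive multi-scale sliding-window detector would have to classify. The function name and every parameter value (24-pixel base window, scale factor 1.25, step of 10% of the window size) are assumptions chosen for illustration; they are not taken from the talk.

```python
def sliding_windows(img_w, img_h, base=24, scale=1.25, step_ratio=0.1):
    """Yield (x, y, size) for square candidate windows over an image.

    All default values are illustrative assumptions.
    """
    size = base
    while size <= min(img_w, img_h):
        # Larger windows are shifted by proportionally larger steps.
        step = max(1, int(size * step_ratio))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        # Grow the window to search for larger instances of the object.
        size = int(round(size * scale))


# Count the candidates for a single VGA-sized frame.
n_windows = sum(1 for _ in sliding_windows(640, 480))
print(n_windows)
```

Every one of these windows needs a decision, so the cost per window has to be very small for real-time use.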

Talk overview.

In the talk we plan to cover these topics:

  • The algorithmic stages of the first real-time object detection framework, by Viola and Jones (2), which was originally developed for face detection (2).
  • The implementation of the algorithm available in OpenCV (4) and its problems, including how hard OpenCV is to install for typical users.
  • Our implementation of the algorithm (5), which can be fully integrated into the scikit-image image processing library (4) and avoids the patent in a way similar to OpenCV's implementation.
  • One modern approach to object detection using Convolutional Neural Networks (6), which combines the accuracy of Convolutional Neural Networks (7) with the speed of the attentional cascade from the original Viola-Jones paper (2).

Viola-Jones object detector.

This section describes the main building blocks of the Viola-Jones algorithm:

  • Haar features (2) and Multi-Block Local Binary Pattern (MB-LBP) features (8); the latter were shown to give better results (8).
  • Integral images: a representation of the image that lets features such as Haar and MB-LBP be computed in constant time at any scale (2).
  • AdaBoost feature selection: an approach that selects a limited number of features (Haar or MB-LBP), each chosen as the best-performing weak classifier in its boosting round (2).
  • Attentional cascade: an approach that combines weak classifiers into a strong one while also speeding up evaluation. It quickly rejects easy examples in the first stages of the cascade and carefully evaluates the small number of challenging candidates in the last stages (2) (9).
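The constant-time property of integral images can be illustrated with a minimal pure-Python sketch. The function names and the zero-padded table layout are assumptions made for this example; this is not the code presented in the talk. Once the table is built, the sum over any rectangle takes four lookups, and a two-rectangle Haar-like feature takes two such sums regardless of the rectangle size.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above (previous table row) plus the current row so far.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w x h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left w x h half minus right w x h half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


# A 4x4 toy image with a vertical edge between a dark and a bright half.
img = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4))                  # 48: 8 pixels of 1 plus 8 pixels of 5
print(haar_two_rect_horizontal(ii, 0, 0, 2, 4))  # -32: left sum 8 minus right sum 40
```

The large magnitude of the feature value reflects the strong vertical edge; on a uniform region the same feature would be zero.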

OpenCV implementation and related problems.

In this section we cover these topics:

  • The OpenCV implementation: the speed-up from the TBB library, which makes it possible to use multiple threads while training the classifier, and the GPU implementation.
  • The complexity of installing OpenCV. Although OpenCV provides highly efficient implementations, first-time users run into problems, and a solution that is easy to install and easy to use is missing.
  • An implementation that avoids the patent.

Our implementation.

In the section describing our implementation we cover (10):

  • The choice of MB-LBP features over Haar features to speed up the computation.
  • The Cython implementation of critical sections.
  • The use of OpenMP in Cython to employ multiple threads.
  • An implementation that avoids the patent, analogous to OpenCV's.
  • Training dataset creation: generating negative and positive training examples.
  • Support for the XML files with weights produced by OpenCV.
  • Possible future use of CUDA to improve performance.
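As a rough illustration of the MB-LBP feature mentioned in the first item, the sketch below computes one 8-bit code in plain Python. It is not the Cython code from our implementation. The function names, the clockwise bit order starting at the top-left block, and the use of `>=` for the comparison are assumptions made for this example. The block includes a small summed-area-table helper so that it runs on its own.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w x h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def mb_lbp(ii, x, y, bw, bh):
    """8-bit MB-LBP code for a 3x3 grid of bw x bh blocks with top-left corner (x, y)."""
    center = rect_sum(ii, x + bw, y + bh, bw, bh)
    # Neighbour blocks as (column, row) grid positions, clockwise from the
    # top-left block. This bit order is an assumed convention.
    offsets = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    code = 0
    for dx, dy in offsets:
        neighbour = rect_sum(ii, x + dx * bw, y + dy * bh, bw, bh)
        # All blocks have equal area, so comparing sums equals comparing means.
        code = (code << 1) | (1 if neighbour >= center else 0)
    return code


# 3x3 toy image with 1x1 blocks: only the top row is brighter than the centre.
img = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
print(mb_lbp(integral_image(img), 0, 0, 1, 1))  # 224, i.e. 0b11100000
```

Each code needs nine rectangle sums and eight comparisons whatever the block size, and the result is a small integer that a classifier can use directly as a categorical value.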

Convolutional-Neural-Networks-based approach.

  • Recent success of Convolutional Neural Networks (CNNs) in multiple areas over approaches based on hand-crafted features like Haar features and MB-LBP (7).
  • Recent hybrid approach that combines accuracy of CNNs and speed that is achieved by using Attentional Cascade (6).
  • Description of the overall algorithm of the paper.
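The early-rejection idea behind the attentional cascade can be sketched independently of what each stage computes. In the toy example below the three scoring functions and their thresholds are hypothetical stand-ins. In a CNN-based cascade each stage would be a network of increasing cost, and in the Viola-Jones detector a boosted classifier. Nothing here reproduces the algorithm of the paper.

```python
def run_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one candidate window.

    `stages` is a list of (score_fn, threshold) pairs, ordered from the
    cheapest to the most expensive stage.
    """
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i  # early rejection: later stages are never run
    return True, len(stages)


# Hypothetical stand-in stages operating on a flat list of pixel intensities.
def mean_brightness(win):
    return sum(win) / len(win)


def contrast(win):
    return max(win) - min(win)


def centre_value(win):
    return win[len(win) // 2]


stages = [(mean_brightness, 50), (contrast, 30), (centre_value, 100)]

print(run_cascade([10, 12, 11, 9, 10], stages))           # (False, 1)
print(run_cascade([100, 100, 100, 100, 100], stages))     # (False, 2)
print(run_cascade([60, 120, 200, 110, 70], stages))       # (True, 3)
```

Because most candidate windows in a typical image are background, most of them exit after the first, cheapest stage, which is where the speed-up comes from.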

Conclusion and discussion.

Our talk gives an overview of how to implement a highly accurate, real-time, and patent-unencumbered object detection algorithm in Python. We discuss in some detail the usage of Cython and OpenMP for this purpose, and how the algorithm was integrated into scikit-image. Finally, we give an overview of recent successes in object detection, based on an approach that combines the cascade approach and convolutional neural networks.