Facial landmarks with dlib, OpenCV, and Python

So far, we have learned how to install and configure dlib with Python bindings.

Now we are going to use dlib and OpenCV to detect facial landmarks in an image.

Facial landmarks are used to localize and represent salient regions of the face, such as:

  • Eyes
  • Eyebrows
  • Nose
  • Mouth
  • Jawline

Facial landmarks have been successfully applied to face alignment, head pose estimation, face swapping, blink detection and much more.

So today we’ll be focusing on the basics of facial landmarks, including:

  1. Exactly what facial landmarks are and how they work.
  2. How to detect and extract facial landmarks from an image using dlib, OpenCV, and Python.
  3. How to extract and label specific facial regions based on these landmarks.

To learn more about facial landmarks, just keep reading.



The first part of this blog post will discuss facial landmarks and why they are used in computer vision applications.

From there, I’ll demonstrate how to detect and extract facial landmarks using dlib, OpenCV, and Python.

Finally, we’ll look at some results of applying facial landmark detection to images.

What are facial landmarks?

Figure 1: Facial landmarks are used to label and identify key facial attributes in an image.



Detecting facial landmarks is a subset of the shape prediction problem. Given an input image (and normally an ROI that specifies the object of interest), a shape predictor attempts to localize key points of interest along the shape.

In the context of facial landmarks, our goal is to detect important facial structures on the face using shape prediction methods.



Detecting facial landmarks is therefore a two-step process:

  • Step #1: Localize the face in the image.
  • Step #2: Detect the key facial structures on the face ROI.

Face detection (Step #1) can be achieved in a number of ways.

We could use OpenCV’s built-in Haar cascades.

We might apply a pre-trained HOG + Linear SVM object detector specifically for the task of face detection.

Or we might even use one of the many deep learning-based algorithms for face localization.

In any case, the actual algorithm used to detect the face in the image doesn’t matter. Instead, what’s important is that through some method we obtain the face bounding box (i.e., the (x, y)-coordinates of the face in the image).

Given the face region we can then apply Step #2: detecting key facial structures in the face region.

There are a variety of facial landmark detectors, but all methods essentially try to localize and label the following facial regions:

  • Mouth
  • Right eyebrow
  • Left eyebrow
  • Right eye
  • Left eye
  • Nose
  • Jaw

The facial landmark detector included in the dlib library is an implementation of the One Millisecond Face Alignment with an Ensemble of Regression Trees paper by Kazemi and Sullivan (2014).

This method starts by using:

  1. A training set of labeled facial landmarks on an image. These images are manually labeled, specifying specific (x, y)-coordinates of regions surrounding each facial structure.
  2. Priors, or more specifically, the probability of the distance between pairs of input pixels.

Given this training data, an ensemble of regression trees is trained to estimate the facial landmark positions directly from the pixel intensities themselves (i.e., no “feature extraction” is taking place).

The end result is a facial landmark detector that can be used to detect facial landmarks in real-time with high quality predictions.

For more information and details on this specific technique, be sure to read the paper by Kazemi and Sullivan linked to above, along with the official dlib announcement.

Understanding dlib’s facial landmark detector

The pre-trained facial landmark detector inside the dlib library is used to estimate the location of 68 (x, y)-coordinates that map to facial structures on the face.

The indexes of the 68 coordinates can be visualized on the image below:

Figure 2: Visualizing the 68 facial landmark coordinates from the iBUG 300-W dataset.

These annotations are part of the 68 point iBUG 300-W dataset which the dlib facial landmark predictor was trained on.

It’s important to note that other flavors of facial landmark detectors exist, including the 194 point model that can be trained on the HELEN dataset.

Regardless of which dataset is used, the same dlib framework can be leveraged to train a shape predictor on the input training data — this is useful if you would like to train facial landmark detectors or custom shape predictors of your own.

In the remainder of this blog post I’ll demonstrate how to detect these facial landmarks in images.

Future blog posts in this series will use these facial landmarks to extract specific regions of the face, apply face alignment, and even build a blink detection system.

Detecting facial landmarks with dlib, OpenCV, and Python

In order to prepare for this series of blog posts on facial landmarks, I’ve added a few convenience functions to my imutils library, specifically inside face_utils/helpers.py.

We’ll be reviewing two of these functions, rect_to_bb and shape_to_np, inside helpers.py.

The first utility function is rect_to_bb, short for “rectangle to bounding box”:

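The original snippet isn’t reproduced in this copy of the post, but a sketch of rect_to_bb, mirroring the helper shipped in imutils (the exact code in your installed version may differ slightly):

```python
def rect_to_bb(rect):
    # take a bounding box predicted by dlib and convert it
    # to the format (x, y, w, h) as we would normally do
    # with OpenCV
    x = rect.left()
    y = rect.top()
    w = rect.right() - x
    h = rect.bottom() - y

    # return a tuple of (x, y, w, h)
    return (x, y, w, h)
```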

This function accepts a single argument, rect, which is assumed to be a bounding box rectangle produced by a dlib detector (i.e., the face detector).

The rect object includes the (x, y)-coordinates of the detection.

However, in OpenCV, we normally think of a bounding box in terms of “(x, y, width, height)”, so as a matter of convenience, the rect_to_bb function takes this rect object and transforms it into a 4-tuple of coordinates. Again, this is simply a matter of convenience and taste.

Secondly, we have the shape_to_np function:

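Again as a sketch, mirroring the imutils helper (your installed version may differ slightly):

```python
import numpy as np

def shape_to_np(shape, dtype="int"):
    # initialize the array of (x, y)-coordinates
    coords = np.zeros((shape.num_parts, 2), dtype=dtype)

    # loop over the facial landmarks and convert each one
    # to a 2-tuple of (x, y)-coordinates
    for i in range(0, shape.num_parts):
        coords[i] = (shape.part(i).x, shape.part(i).y)

    # return the array of (x, y)-coordinates
    return coords
```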

The dlib face landmark detector will return a shape object containing the 68 (x, y)-coordinates of the facial landmark regions.

Using the shape_to_np function, we can convert this object to a NumPy array.

Given these two helper functions, we are now ready to detect facial landmarks in images.

Open up a new file, name it facial_landmarks.py, and insert the following code:

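The code itself is missing from this copy of the post, but a sketch of facial_landmarks.py following the two-step process described above might look like this (it assumes the rect_to_bb and shape_to_np helpers from imutils’ face_utils reviewed earlier):

```python
# import the necessary packages
from imutils import face_utils
import argparse
import imutils
import dlib
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
    help="path to facial landmark predictor")
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
args = vars(ap.parse_args())

# Step #1 setup: initialize dlib's HOG-based face detector,
# then load the facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(args["shape_predictor"])

# load the input image, resize it, and convert it to grayscale
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Step #1: detect faces in the grayscale image
rects = detector(gray, 1)

# loop over the face detections
for (i, rect) in enumerate(rects):
    # Step #2: determine the facial landmarks for the face
    # region, then convert them to a NumPy array
    shape = predictor(gray, rect)
    shape = face_utils.shape_to_np(shape)

    # convert dlib's rectangle to an OpenCV-style bounding
    # box and draw it, along with the face number
    (x, y, w, h) = face_utils.rect_to_bb(rect)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, "Face #{}".format(i + 1), (x - 10, y - 10),
        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # draw each of the (x, y)-coordinates as a small red circle
    for (x, y) in shape:
        cv2.circle(image, (x, y), 1, (0, 0, 255), -1)

# show the output image with detections + landmarks
cv2.imshow("Output", image)
cv2.waitKey(0)
```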

Facial landmark visualizations

Before we test our facial landmark detector, make sure you have upgraded to the latest version of imutils which includes the face_utils.py file:


$ pip install --upgrade imutils

Note: If you are using Python virtual environments, make sure you upgrade imutils inside the virtual environment.

Our script parses two command line arguments:

  • --shape-predictor : This is the path to dlib’s pre-trained facial landmark detector. You can use the Downloads section of this post to grab the code + example images + pre-trained detector.
  • --image : The path to the input image that we want to detect facial landmarks on.

From there, use the Downloads section of this guide to download the source code, example images, and pre-trained dlib facial landmark detector.

Once you’ve downloaded the .zip archive, unzip it, change directory to facial-landmarks, and execute the following command:


$ python facial_landmarks.py --shape-predictor shape_predictor_68_face_landmarks.dat \
--image images/example_01.jpg

Figure 3: Applying facial landmark detection using dlib, OpenCV, and Python.

Notice how the bounding box of my face is drawn in green while each of the individual facial landmarks is drawn in red.

The same is true for this second example image:


$ python facial_landmarks.py --shape-predictor shape_predictor_68_face_landmarks.dat \
--image images/example_02.jpg

Figure 4: Facial landmarks with dlib.


Here we can clearly see that the red circles map to specific facial features, including my jawline, mouth, nose, eyes, and eyebrows.

Let’s take a look at one final example, this time with multiple people in the image:


$ python facial_landmarks.py --shape-predictor shape_predictor_68_face_landmarks.dat \
--image images/example_03.jpg

Figure 5: Detecting facial landmarks for multiple people in an image.

For each person in the image above, the face is not only detected correctly, but also annotated with facial landmarks.

Summary of Phase-1

So far, we have learned what facial landmarks are and how to detect them using dlib, OpenCV, and Python.

Detecting facial landmarks in an image is a two-step process:

  1. First we must localize one or more faces in the image. This can be accomplished using a number of different techniques, but it normally involves either Haar cascades or HOG + Linear SVM detectors (any approach that produces a bounding box around the face will suffice).
  2. Apply the shape predictor, specifically a facial landmark detector, to obtain the (x, y)-coordinates of the face regions in the face ROI.

Given these facial landmarks we can apply a number of computer vision techniques, including:

  • Face part extraction (i.e., nose, eyes, mouth, jawline, etc.)
  • Facial alignment
  • Head pose estimation
  • Face swapping
  • Blink detection
  • …and much more!

Detect eyes, nose, lips, and jaw with dlib, OpenCV, and Python

So far, we have discussed how to detect facial landmarks in images.

Now we are going to take the next step and use our detected facial landmarks to help us label and extract face regions, including:

  • Mouth
  • Right eyebrow
  • Left eyebrow
  • Right eye
  • Left eye
  • Nose
  • Jaw

Facial landmark indexes for face regions

The facial landmark detector implemented inside dlib produces 68 (x, y)-coordinates that map to specific facial structures. These 68 point mappings were obtained by training a shape predictor on the labeled iBUG 300-W dataset.

Below we can visualize what each of these 68 coordinates map to:

Figure 1: Visualizing each of the 68 facial coordinate points from the iBUG 300-W dataset.


Examining the image, we can see that facial regions can be accessed via simple Python indexing (the figure is one-indexed, while Python indexing is zero-based):

  • The mouth can be accessed through points [48, 68].
  • The right eyebrow through points [17, 22].
  • The left eyebrow through points [22, 27].
  • The right eye using [36, 42].
  • The left eye with [42, 48].
  • The nose using [27, 35].
  • And the jaw via [0, 17].
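Assuming the landmarks have been converted to a NumPy array (e.g., via shape_to_np), the ranges above translate directly into Python slices. A minimal sketch with placeholder coordinates:

```python
import numpy as np

# placeholder 68x2 array standing in for real landmark coordinates
shape = np.zeros((68, 2), dtype="int")

# the bracketed ranges above are half-open when used as slices:
# the start index is inclusive, the end index is exclusive
mouth = shape[48:68]
right_eyebrow = shape[17:22]
left_eyebrow = shape[22:27]
right_eye = shape[36:42]
left_eye = shape[42:48]
jaw = shape[0:17]
```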

These mappings are encoded inside the FACIAL_LANDMARKS_IDXS dictionary inside face_utils of the imutils library:

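As a sketch, the dictionary looks something like this, mirroring the ranges listed above (the exact contents may vary slightly between imutils versions):

```python
from collections import OrderedDict

# region name -> (start index, end index) into the 68-point array
FACIAL_LANDMARKS_IDXS = OrderedDict([
    ("mouth", (48, 68)),
    ("right_eyebrow", (17, 22)),
    ("left_eyebrow", (22, 27)),
    ("right_eye", (36, 42)),
    ("left_eye", (42, 48)),
    ("nose", (27, 35)),
    ("jaw", (0, 17)),
])

# look up the index range for a region by supplying a string key
(j, k) = FACIAL_LANDMARKS_IDXS["right_eye"]
```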

Using this dictionary, we can easily look up the indexes into the facial landmark array and extract various facial features simply by supplying a string as a key.

Visualizing facial landmarks with OpenCV and Python

A slightly harder task is to visualize each of these facial landmarks and overlay the results on an input image.

To accomplish this, we’ll use the visualize_facial_landmarks function, which is already included in the imutils library.

This function loops over each of the facial landmark regions individually and draws them on an overlay image.

The last step is to create a transparent overlay via the cv2.addWeighted function.

Extracting parts of the face using dlib, OpenCV, and Python

From there, open up a new file, name it detect_face_parts.py, and insert the following code:
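The code is missing from this copy of the post, but a condensed sketch of detect_face_parts.py might look like this (it assumes the FACIAL_LANDMARKS_IDXS dictionary and visualize_facial_landmarks helper from imutils’ face_utils discussed above):

```python
# import the necessary packages
from imutils import face_utils
import argparse
import imutils
import dlib
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
    help="path to facial landmark predictor")
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
args = vars(ap.parse_args())

# initialize dlib's face detector and the landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(args["shape_predictor"])

# load the input image, resize it, and convert it to grayscale
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect faces in the grayscale image
rects = detector(gray, 1)

# loop over the face detections
for rect in rects:
    # predict landmarks and convert them to a NumPy array
    shape = predictor(gray, rect)
    shape = face_utils.shape_to_np(shape)

    # loop over the face parts individually
    for (name, (j, k)) in face_utils.FACIAL_LANDMARKS_IDXS.items():
        # clone the original image so we can draw on it,
        # then display the name of the face part
        clone = image.copy()
        cv2.putText(clone, name, (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        # draw the subset of landmarks for this region
        for (x, y) in shape[j:k]:
            cv2.circle(clone, (x, y), 1, (0, 0, 255), -1)

        # show the particular face part
        cv2.imshow("ROI", clone)
        cv2.waitKey(0)

    # visualize all facial landmarks with a transparent overlay
    output = face_utils.visualize_facial_landmarks(image, shape)
    cv2.imshow("Image", output)
    cv2.waitKey(0)
```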

Face part labeling results

Now that our example has been coded up, let’s take a look at some results.

Be sure to use the Downloads section of this guide to download the source code + example images + dlib facial landmark predictor model.

From there, you can use the following command to visualize the results:


$ python detect_face_parts.py --shape-predictor shape_predictor_68_face_landmarks.dat \
--image images/example_01.jpg

Notice how my mouth is detected first, followed by each of the remaining facial regions:

  • Mouth
  • Right eyebrow
  • Left eyebrow
  • Right eye
  • Left eye
  • Jaw
  • Inner mouth

I have created a GIF animation of the output:

Real-time facial landmark detection

Now let’s expand our implementation of facial landmarks to work in real-time video streams, paving the way for more real-world applications, including the next tutorial on blink detection.

Facial landmarks in video streams

Let’s go ahead and get this facial landmark example started.

Open up a new file, name it video_facial_landmarks.py, and insert the following code:
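The code is missing from this copy of the post, but a sketch of video_facial_landmarks.py, adapting the image pipeline above to frames from imutils’ VideoStream, might look like this:

```python
# import the necessary packages
from imutils.video import VideoStream
from imutils import face_utils
import argparse
import imutils
import time
import dlib
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
    help="path to facial landmark predictor")
args = vars(ap.parse_args())

# initialize dlib's face detector and the landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(args["shape_predictor"])

# start the video stream and let the camera sensor warm up
vs = VideoStream(src=0).start()
time.sleep(2.0)

# loop over frames from the video stream
while True:
    # grab the frame, resize it, and convert it to grayscale
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # detect faces in the grayscale frame
    rects = detector(gray, 0)

    # loop over the face detections
    for rect in rects:
        # predict landmarks, convert them to a NumPy array,
        # and draw each point on the frame
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        for (x, y) in shape:
            cv2.circle(frame, (x, y), 1, (0, 0, 255), -1)

    # show the frame and break out of the loop on the 'q' key
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# clean up
cv2.destroyAllWindows()
vs.stop()
```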

Real-time facial landmark results

To test our real-time facial landmark detector using OpenCV, Python, and dlib, make sure you use the “Downloads” section of this blog post to download an archive of the code, project structure, and facial landmark predictor model.

If you are using a standard webcam/USB camera, you can execute the following command to start the video facial landmark predictor:


$ python video_facial_landmarks.py \
--shape-predictor shape_predictor_68_face_landmarks.dat

I have included the full video output below:

Summary

In this blog post, we explored facial landmarks and applied them to the task of real-time detection.

As our results demonstrated, we are fully capable of detecting facial landmarks in a video stream in real-time using a system with a modest CPU.

Now that we understand how to access a video stream and apply facial landmark detection, we can move on to next week’s real-world computer vision application — blink detection.