What is in that picture?

3/4/23

Understanding image segmentation

Identifying objects in images

Identifying objects in images would be a more accurate title for this blog, but I would like to bridge the gap between image segmentation and something familiar: sharing a family photo. When a family member sees the family photo they immediately identify all of the relatives in it. Now let's say you have an album with hundreds of family photos; it's your cousin's birthday and you would like to find all of the pictures which contain you and your cousin. You would look through them all and choose a nice photo; you'd then share the photo of you two on their social media with a happy birthday message. You could do this by hand, or you could have a computer help you. This is where image segmentation comes in.

Real life examples of image segmentation

When you take a picture with your smartphone it may draw "bounding boxes" around heads in the viewfinder. The boxes exist to help you focus your shot (whether you like it or not). Another example is with video meetings. Most video meeting tools allow you to "blur" your background: the video (a continuous stream of images) shows you clearly and blurs out everything behind you. A very useful feature to help with privacy concerns.

Running your first image segmentation model

To hit the ground running with an image segmentation model we will download one from Hugging Face, an AI company which provides services around state-of-the-art deep learning models (mostly text, vision, and audio). They have a landing page on image segmentation (https://huggingface.co/tasks/image-segmentation) where they explain the task, give examples, provide a demo, and link to models.
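As an aside, if you just want a quick result from Python, the transformers library wraps this whole task in a one-line pipeline. A minimal sketch, assuming a local photo (the file path below is a made-up placeholder):

from transformers import pipeline

# The image-segmentation pipeline handles download, preprocessing and post-processing
segmenter = pipeline("image-segmentation", model="facebook/detr-resnet-50-panoptic")
results = segmenter("path/to/your/photo.jpg")  # hypothetical path; URLs and PIL images also work
for r in results:
    print(r["label"])  # each result also carries a PIL mask under r["mask"]

The rest of this post does the same thing by hand so we can poke at the intermediate pieces.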

The first thing to try is the demo on their page using their example image (no luck, see below).

Next is to try running the model locally. The demo above uses the "DETR (End-to-End Object Detection) model with ResNet-50 backbone" (https://huggingface.co/facebook/detr-resnet-50-panoptic). This is the most popular segmentation model on Hugging Face and I won't try to explain it here. Let's download the model and use it with our image.

To simplify creating an environment I'm going to run this on Google Colab (https://colab.research.google.com/drive/1pGayh4ZWeWjhhHRQJqeAfeV9lw8ynvhP#scrollTo=_xiokfNaIxpN) and I'll include the code here for reference.

# Install dependencies (xarray is preinstalled on Colab; add it here if running elsewhere)
!pip install transformers timm rich pyyaml==5.4.1


# DetrFeatureExtractor preprocesses images; DetrForSegmentation is the panoptic model
from transformers import DetrFeatureExtractor, DetrForSegmentation
from PIL import Image
from rich import inspect
import requests
import xarray as xr


# Download the example image from the Hugging Face task page
url = 'https://huggingface.co/tasks/assets/image-segmentation/image-segmentation-input.jpeg'
image = Image.open(requests.get(url, stream=True).raw)

image       # Colab renders the PIL image inline
image.size  # (width, height) in pixels


# Download the pretrained weights from the Hugging Face Hub
feature_extractor = DetrFeatureExtractor.from_pretrained('facebook/detr-resnet-50-panoptic')
model = DetrForSegmentation.from_pretrained('facebook/detr-resnet-50-panoptic')


# Preprocess the image into PyTorch tensors and run a forward pass
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

inspect(outputs)  # rich prints every field of the model output


# pred_masks holds one low-resolution mask per object query:
# shape (batch, num_queries, height/4, width/4)
pred_masks = outputs.pred_masks
pred_masks.shape

# Assign each pixel to the query with the highest score and plot the result
# (flipud because DataArray.plot() puts the y origin at the bottom)
xr.DataArray(pred_masks.squeeze().argmax(axis=0).flipud()).plot()
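The plot shows which query "won" each pixel, but not what those queries represent. A minimal sketch (my own addition, not part of the notebook above) of mapping each winning query to its most likely class label, using outputs.logits and model.config.id2label; the last logit index is DETR's "no object" class, so queries landing there can be skipped:

import torch

seg_map = pred_masks.squeeze().argmax(axis=0)          # query id per pixel
class_ids = outputs.logits.squeeze().argmax(axis=-1)   # best class per query
no_object_id = outputs.logits.shape[-1] - 1            # DETR's "no object" class

# Print a label for every query that won at least one pixel
for query_id in torch.unique(seg_map).tolist():
    class_id = class_ids[query_id].item()
    if class_id != no_object_id:
        print(query_id, model.config.id2label[class_id])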

You can see in the image above that if the cat were in a Zoom call and wanted to "blur background", the display would blur the pixels outside of the outline of the cat.
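To make that concrete, here is a minimal sketch of using one of these masks to blur everything outside a segment. It is my own addition, not part of the notebook: it assumes you have read the cat's query id off the plot or the labels above (cat_query_id below is a placeholder), upsamples that query's low-resolution mask to the image size with torch.nn.functional.interpolate, and composites a Gaussian-blurred copy behind the original with Pillow:

import torch
import torch.nn.functional as F
from PIL import ImageFilter

cat_query_id = 0  # hypothetical: read the cat's query id off the plot above

# Upsample that query's mask logits to the full image size
mask = pred_masks[0, cat_query_id].detach()[None, None]        # (1, 1, h/4, w/4)
mask = F.interpolate(mask, size=image.size[::-1], mode='bilinear')
mask = ((mask.squeeze() > 0).to(torch.uint8) * 255).numpy()    # binary 0/255 mask

# Keep the original pixels where the mask is white, blurred pixels elsewhere
blurred = image.filter(ImageFilter.GaussianBlur(radius=12))
Image.composite(image, blurred, Image.fromarray(mask, mode='L'))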

How does image segmentation work?