What is in that picture?
3/4/23
Understanding image segmentation
Identifying objects in images
Identifying objects in images would be a more accurate title for this blog, but I would like to bridge the gap between image segmentation and something familiar: sharing a family photo. When a family member sees the family photo, they immediately identify all of the relatives in it. Now let's say you have an album with hundreds of family photos. It's your cousin's birthday, and you would like to find all of the pictures which contain you and your cousin, look through them, and choose a nice one; you'll then share the photo of you two on their social media with a nice happy birthday message. You could do this by hand, or you could have a computer help you. This is where image segmentation comes in.
Real life examples of image segmentation
When you take a picture with your smartphone, it may draw "bounding boxes" around heads in the viewfinder. The boxes exist to help you focus your shot (whether you like it or not). Another example is with video meetings. Most video meeting tools allow you to "blur" your background: the video (a stream of images) shows you sharply and blurs everything behind you. A very useful feature to help with privacy concerns.
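A bounding box is just the smallest rectangle that encloses the pixels a model flagged as an object. As a minimal sketch (the `mask_to_bbox` helper and the toy mask are my own, not part of any library), here is how you could derive one from a binary segmentation mask with NumPy:

```python
import numpy as np

def mask_to_bbox(mask):
    """Return (top, left, bottom, right) of the smallest box enclosing a binary mask."""
    rows = np.any(mask, axis=1)  # which rows contain any "object" pixel
    cols = np.any(mask, axis=0)  # which columns contain any "object" pixel
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return int(top), int(left), int(bottom), int(right)

# Toy 6x6 mask standing in for a detected head
mask = np.zeros((6, 6), dtype=bool)
mask[2:5, 1:4] = True
print(mask_to_bbox(mask))  # (2, 1, 4, 3)
```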
Running your first image segmentation model
To hit the ground running with an image segmentation model, we will download one from Hugging Face. Hugging Face is an AI company which provides services around state-of-the-art deep learning models (mostly text, vision, and audio). They have a landing page on image segmentation (https://huggingface.co/tasks/image-segmentation) where they explain the task, give examples, provide a demo, and link to models.
The first thing to try is their demo, using the example image on that page (no luck, see below).
No luck with Hugging Face's image segmentation demo
Next is to try running the model locally. The demo above uses the "DETR (End-to-End Object Detection) model with ResNet-50 backbone" (https://huggingface.co/facebook/detr-resnet-50-panoptic). This is the most popular segmentation model on Hugging Face, and I won't try to explain it here. Let's download the model and use it with our image.
To simplify creating an environment, I'm going to run this on Google Colab (https://colab.research.google.com/drive/1pGayh4ZWeWjhhHRQJqeAfeV9lw8ynvhP#scrollTo=_xiokfNaIxpN) and I'll include the code here for reference.
!pip install transformers timm rich pyyaml==5.4.1

from transformers import DetrFeatureExtractor, DetrForSegmentation
from PIL import Image
from rich import inspect
import requests
import xarray as xr

# Download Hugging Face's example image
url = 'https://huggingface.co/tasks/assets/image-segmentation/image-segmentation-input.jpeg'
image = Image.open(requests.get(url, stream=True).raw)
image
image.size

# Load the pretrained feature extractor and the panoptic segmentation model
feature_extractor = DetrFeatureExtractor.from_pretrained('facebook/detr-resnet-50-panoptic')
model = DetrForSegmentation.from_pretrained('facebook/detr-resnet-50-panoptic')

# Preprocess the image into PyTorch tensors and run the model
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
inspect(outputs)

# One predicted mask (logits) per object query: shape (batch, queries, height, width)
pred_masks = outputs.pred_masks
pred_masks.shape

# For each pixel, keep the query with the highest logit, flip vertically so the
# plot's origin matches the image, and draw it with xarray/matplotlib
xr.DataArray(pred_masks.squeeze().argmax(axis=0).flipud()).plot()
Each image pixel predicted to a COCO class (https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/)
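The key step in the plot above is the argmax: the model emits a stack of mask logits, and for each pixel we keep the index of the mask with the highest score, collapsing the stack into a single class map. A minimal NumPy sketch with made-up logits (the shapes and values here are illustrative, not real model output):

```python
import numpy as np

# Toy stand-in for outputs.pred_masks after squeeze():
# one logit map per object query, shape (num_queries, height, width)
logits = np.zeros((3, 2, 2))
logits[0] = [[5, 1], [1, 1]]  # query 0 is most confident at pixel (0, 0)
logits[1] = [[1, 5], [1, 1]]  # query 1 wins at pixel (0, 1)
logits[2] = [[1, 1], [5, 5]]  # query 2 wins on the bottom row

# For each pixel, the index of the winning mask
class_map = logits.argmax(axis=0)
print(class_map)
# [[0 1]
#  [2 2]]
```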
You can see in the image above that if the cat were in a Zoom call and wanted to "blur background", the display would blur the pixels outside of the cat's outline.
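That blur trick can be sketched with Pillow: blur the whole frame, then composite the sharp original back in wherever the mask says "foreground". The `blur_background` helper and the toy red-square "cat" below are my own illustration, not code from any video-meeting tool:

```python
import numpy as np
from PIL import Image, ImageFilter

def blur_background(image, mask, radius=8):
    """Keep `image` sharp where the binary mask is True; blur everywhere else."""
    blurred = image.filter(ImageFilter.GaussianBlur(radius))
    # Image.composite keeps the first image where the mask is white (255)
    mask_img = Image.fromarray(mask.astype(np.uint8) * 255, mode="L")
    return Image.composite(image, blurred, mask_img)

# Toy example: a solid red square "cat" on a noisy background
rng = np.random.default_rng(0)
pixels = rng.integers(0, 255, (64, 64, 3), dtype=np.uint8)
pixels[16:48, 16:48] = [255, 0, 0]
img = Image.fromarray(pixels)

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True

out = blur_background(img, mask)  # red square stays crisp, noise gets blurred
```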