PEPSE

PErsonal Picture Search Engine

Insoo Chung, Kwangkyu Hwang, Youngki Kim, Jinhyun Park

Semantic search engine for your own photos

Consider the query: "Insoo and Jihyun at Grand Canyon"

Tag-based

Works pretty well!

Semantic search

Results: N/A

🧐...

Can we get best of both worlds?

Tag-based + semantic search

👉 Code 👈

Approach

Data Collection

1) Our team members gathered 1,500 images, contributing approximately 300-400 each. We preprocessed these images by eliminating redundant spaces in their filenames and standardizing their extensions to .jpg.

2) For face tagging, each of us provided nine selfies taken under varied lighting and from different angles; these serve as the ground-truth reference images.
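The filename cleanup in step 1 can be sketched roughly as below. This is a minimal sketch, not the project's actual script: the helper name `normalize_filename`, the choice to collapse whitespace runs into a single underscore, and the decision to rename only JPEG variants are all assumptions. Files in other formats would need a real conversion rather than a rename.

```python
import re
from pathlib import Path

def normalize_filename(name: str) -> str:
    """Hypothetical helper: tidy spaces in the stem and standardize JPEG extensions."""
    path = Path(name)
    # Assumption: runs of whitespace become one underscore; outer spaces are dropped.
    stem = re.sub(r"\s+", "_", path.stem.strip())
    suffix = path.suffix.lower()
    # Only JPEG variants are renamed to .jpg; other formats keep their
    # (lowercased) extension because renaming does not convert the image data.
    if suffix in {".jpg", ".jpeg"}:
        suffix = ".jpg"
    return stem + suffix

if __name__ == "__main__":
    for raw in ["  grand   canyon 01.JPEG", "IMG 0012.jpg", "scan.PNG"]:
        print(repr(raw), "->", normalize_filename(raw))
```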

Pretrained models

1) Salesforce/blip-image-captioning-large is utilized for generating image captions.

2) sentence-transformers/all-distilroberta-v1 is used to embed captions and queries as vectors.

3) DeepFace is used to extract facial embeddings.

Faces matching

1) We prepared reference face images of ourselves (Insoo, Kwangkyu, Youngki, Jinhyun) and precomputed their face embeddings.

2) Face bounding boxes are detected in each image, and a face embedding is computed for each box (offline inference).

3) Distances between these embeddings and the reference embeddings determine whose face it is; the matching face tags are then attached to the image (offline inference).

4) Names are identified in the user query to compute the face tag score!

* For steps 2 and 3, DeepFace is used.
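Steps 3 and 4 above can be illustrated with a small, dependency-free sketch. It assumes the face embeddings have already been extracted (by DeepFace in the pipeline above) and are passed in as plain lists of floats. The cosine distance, the 0.4 threshold, the scoring rule, and the function names `tag_face`, `names_in_query`, and `face_tag_score` are illustrative assumptions, not the project's actual settings.

```python
import math
import re

def cosine_distance(u, v):
    """1 - cosine similarity; assumed metric, the project may use another."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def tag_face(face_embedding, references, threshold=0.4):
    """Return the name of the closest reference, or None if nothing is close enough.

    `references` maps a name to a list of precomputed reference embeddings.
    The threshold value is a placeholder, not a tuned number.
    """
    best_name, best_dist = None, float("inf")
    for name, embeddings in references.items():
        for ref in embeddings:
            dist = cosine_distance(face_embedding, ref)
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

def names_in_query(query, known_names):
    """Find known names in a query, tolerating possessives such as "Jinhyun's"."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return [name for name in known_names if name.lower() in tokens]

def face_tag_score(query_names, photo_tags):
    """Assumed scoring rule: fraction of queried names present among the photo's tags."""
    if not query_names:
        return 0.0
    hits = sum(1 for name in query_names if name in photo_tags)
    return hits / len(query_names)
```

With toy 3-dimensional vectors, `tag_face([0.9, 0.1, 0.0], {"Insoo": [[1.0, 0.0, 0.0]]})` returns `"Insoo"`, while a face far from every reference returns `None` and gets no tag.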

Sentence-based semantic search

1) Captions are generated from images, then a BERT embedding matrix for all captions is computed (offline inference)

2) A user query is received, then a BERT embedding vector for it is computed (online inference)

3) Dot-product scores between embedding vectors are computed as a measure of similarity!
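The scoring in step 3 can be sketched as follows. The embeddings here are stand-in lists of floats; in the pipeline above they would come from the sentence embedding model. The function name `rank_by_dot_product` and the `top_k` parameter are hypothetical, and a real implementation would more likely use a single matrix-vector product.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_by_dot_product(query_vec, caption_matrix, top_k=5):
    """Score every caption embedding against the query and return the best matches.

    `caption_matrix` holds one precomputed embedding per photo (offline step);
    `query_vec` is the embedding of the user query (online step).
    Returns a list of (photo_index, score) pairs, highest score first.
    """
    scores = [dot(query_vec, row) for row in caption_matrix]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

if __name__ == "__main__":
    # Toy 2-dimensional embeddings, for illustration only.
    captions = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    print(rank_by_dot_product([0.0, 1.0], captions, top_k=2))
```

Because the caption matrix is computed offline, only the query embedding and one pass of dot products are needed at search time.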

Results

Selected (but not 🍒-picked) results

Jinhyun's baby photo in a frame

Beer

Insoo with his cat and his wife

A dude enjoying beer

Youngki and his friends

A photo of us all jumping

Pic of us posing in a train station

Jinhyun enjoying his coffee

Caption-centric vs Tag-centric search

Kwangkyu's pic in front of a tower

Tag-centric

When the name is weighted heavily, pictures tagged as Kwangkyu are favored (although the 2nd and 3rd pictures are incorrectly tagged).

Caption-centric

With the same query, if we weight the caption score more heavily, we get more pictures with towers.

Kwangkyu's winter trip

Tag-centric

All Kwangkyu pics.

Caption-centric

No Kwangkyu pics.
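One simple way to get the tag-centric and caption-centric behaviors compared above is a weighted sum of the two scores. The sketch below assumes a linear combination with a single `tag_weight` knob; the formula, the weight values, and the names `combined_score` and `rank_photos` are assumptions for illustration, not necessarily how PEPSE combines its scores.

```python
def combined_score(tag_score, caption_score, tag_weight=0.5):
    """Assumed linear blend: tag_weight near 1 is tag-centric, near 0 is caption-centric."""
    return tag_weight * tag_score + (1.0 - tag_weight) * caption_score

def rank_photos(photos, tag_weight):
    """Sort photos (dicts with 'tag' and 'caption' scores) by the blended score."""
    return sorted(
        photos,
        key=lambda p: combined_score(p["tag"], p["caption"], tag_weight),
        reverse=True,
    )

if __name__ == "__main__":
    # Toy scores: one photo matches the name tag, the other matches the caption.
    photos = [
        {"id": "tagged_no_tower", "tag": 1.0, "caption": 0.2},
        {"id": "tower_no_tag", "tag": 0.0, "caption": 0.9},
    ]
    print([p["id"] for p in rank_photos(photos, tag_weight=0.8)])  # tag-centric
    print([p["id"] for p in rank_photos(photos, tag_weight=0.2)])  # caption-centric
```

With a high tag weight the tagged photo ranks first even though it has no tower; with a low tag weight the tower photo wins even though it carries no name tag.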

Evaluation: nDCG@5

We use nDCG@5 to evaluate the performance of our model on a selected set of queries.
For each query, oracle results were composed to compute the IDCG.
- Example oracle results for "Insoo taking a selfie": (photo id: 54 / rel: 5, photo id: 415 / rel: 5, photo id: 58 / rel: 5, photo id: 1010 / rel: 3, photo id: 953 / rel: 3)
Evaluation results for sample queries (nDCG@5):

╔═══════════════════════╦════════╦═══════════════╗
║ Query                 ║ <Ours> ║ <Apple Photo> ║
╠═══════════════════════╬════════╬═══════════════╣
║ Insoo taking a selfie ║ 0.82   ║ 0.00          ║
╠═══════════════════════╬════════╬═══════════════╣
║ Youngki with friends  ║ 0.67   ║ 0.00          ║
╠═══════════════════════╬════════╬═══════════════╣
║ Theme park            ║ 0.50   ║ 0.30          ║
╠═══════════════════════╬════════╬═══════════════╣
║ Kwangkyu with snow    ║ 0.75   ║ 0.00          ║
╚═══════════════════════╩════════╩═══════════════╝
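For reference, nDCG@5 can be computed as in the sketch below. It assumes the linear-gain form of DCG (relevance divided by log2 of rank + 1); the exponential-gain variant is also common, and the source does not say which one was used. The function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the first k results (linear gain assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(retrieved_relevances, oracle_relevances, k=5):
    """DCG of the retrieved ranking divided by the IDCG of the oracle ranking."""
    ideal = dcg_at_k(sorted(oracle_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(retrieved_relevances, k) / ideal

if __name__ == "__main__":
    # Oracle relevances from the "Insoo taking a selfie" example.
    oracle = [5, 5, 5, 3, 3]
    print(ndcg_at_k(oracle, oracle))           # retrieving the oracle order
    print(ndcg_at_k([0, 0, 0, 0, 0], oracle))  # no relevant photo in the top 5
```

Under this definition, a score of 0.00 means none of the top five retrieved photos had non-zero relevance, and a score of 1.0 means the retrieved order is as good as the oracle order.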

Shortcomings

Tag errors

The two-stage process of 1) detecting faces and 2) comparing them against the references allows errors to propagate: a mistake in detection carries over into matching.

Caption errors

Caption: a photography of two people standing in front of a pool

Since existing image captioning models aren't perfect, the generated captions can be inaccurate, which in turn hurts semantic search.

Conclusion

Pros:

PEPSE combines tag search and semantic search using BERT embeddings, letting users search their photos by who is in them as well as by what they depict.
PEPSE uses DeepFace for face matching and BERT for semantic matching, so photos can be found even if they were never labeled with specific tags.
PEPSE makes organizing and finding your photos a lot easier, which saves users time and effort.
PEPSE gives good overall results on our sample queries (nDCG@5 between 0.50 and 0.82).

Cons:

Face tags and captions are sometimes wrong, which can be frustrating for users who rely on the engine to search for their photos.
PEPSE may be less effective for photos with very specific or unique features that neither face tags nor captions capture.

Drink PEPSE 🥤
