PErsonal Picture Search Engine
Insoo Chung, Kwangkyu Hwang, Youngki Kim, Jinhyun Park
Consider the query: "Insoo and Jinhyun at Grand Canyon"
Works pretty well!
Results: N/A
Can we get the best of both worlds?
Tag-based + semantic search
1) Our four team members gathered 1,500 images, approximately 300-400 each. We preprocessed these images by removing redundant spaces from their filenames and standardizing their extensions to .jpg.
2) For face tagging, each of us provided nine selfies taken under various lighting conditions and from various angles; these served as the ground-truth references for face tagging.
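One plausible reading of the filename preprocessing step, sketched in Python. It assumes that "redundant spaces" means any whitespace in the filename stem and that only JPEG variants (.jpeg, .JPG, ...) are renamed to .jpg without re-encoding; `normalize_filename` and `normalize_directory` are hypothetical helper names, not the project's code.

```python
import re
from pathlib import Path

# Assumption: these suffixes are JPEG variants that only need renaming, not re-encoding.
JPEG_SUFFIXES = {".jpg", ".jpeg", ".jpe"}


def normalize_filename(name: str) -> str:
    """Remove whitespace from the stem and unify JPEG extensions to lowercase '.jpg'."""
    path = Path(name)
    stem = re.sub(r"\s+", "", path.stem)  # assumption: all whitespace counts as redundant
    suffix = path.suffix.lower()
    if suffix in JPEG_SUFFIXES:
        suffix = ".jpg"
    return stem + suffix


def normalize_directory(folder: str) -> list[str]:
    """Rename files in `folder` in place; return the new names of the renamed files."""
    renamed = []
    for p in sorted(Path(folder).iterdir()):  # materialize the listing before renaming
        if not p.is_file():
            continue
        target = p.with_name(normalize_filename(p.name))
        if target != p and not target.exists():  # never overwrite an existing file
            p.rename(target)
            renamed.append(target.name)
    return renamed
```

The `exists()` guard skips a rename rather than silently overwriting a file whose cleaned name collides with another photo.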
1) Salesforce/blip-image-captioning-large is used to generate image captions.
2) sentence-transformers/all-distilroberta-v1 is used to create caption and query embeddings (vectors).
3) DeepFace is used to detect faces and extract facial embeddings.
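A minimal sketch of calling the captioning model through the Hugging Face `transformers` BLIP classes. The lazy, cached loader and the 30-token generation limit are assumptions for illustration, not settings reported by the project; running `generate_caption` downloads the model weights.

```python
from functools import lru_cache

CAPTION_MODEL_ID = "Salesforce/blip-image-captioning-large"


@lru_cache(maxsize=1)
def load_captioner():
    """Load the BLIP processor and model once (lazy import; downloads weights)."""
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(CAPTION_MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(CAPTION_MODEL_ID)
    model.eval()
    return processor, model


def generate_caption(image_path, max_new_tokens=30):
    """Offline step: produce an unconditional caption for one image file."""
    import torch
    from PIL import Image

    processor, model = load_captioner()
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Caching the loader matters because captioning all 1,500 photos would otherwise reload the model for every image.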
1) We prepared reference face images of ourselves (Insoo, Kwangkyu, Youngki, Jinhyun) and precomputed face embeddings from them.
2) Face bounding boxes are detected in each image, and an embedding is computed for each detected face (offline inference).
3) The distances between each face embedding and the reference embeddings determine whose face it is; the corresponding face tags are then attached to the image (offline inference).
4) At query time, names are identified in the user query and used to compute a face-tag score.
* DeepFace is used for steps 2 and 3.
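The four face-tagging steps above can be sketched as follows. Several details are assumptions, not the project's documented choices: the Facenet backbone, averaging each person's nine selfie embeddings into one reference, cosine distance with a hypothetical 0.4 threshold, and a face-tag score defined as the fraction of queried names tagged in the photo. `DeepFace.represent` is assumed to return one dict with an `"embedding"` key per detected face, as in recent DeepFace versions.

```python
import re

import numpy as np


def extract_face_embeddings(image_path):
    """Steps 2-3 (offline): detect faces and embed each one with DeepFace."""
    from deepface import DeepFace  # lazy import; heavy dependency

    # Recent DeepFace versions return one dict per detected face.
    faces = DeepFace.represent(img_path=image_path, model_name="Facenet",
                               enforce_detection=False)
    return [np.asarray(face["embedding"], dtype=float) for face in faces]


def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def build_references(selfies):
    """Step 1: average each person's normalized selfie embeddings into one reference."""
    return {name: _unit(np.mean([_unit(e) for e in embs], axis=0))
            for name, embs in selfies.items()}


def tag_faces(face_embeddings, references, threshold=0.4):
    """Step 3: tag each face with its nearest reference if the cosine distance is small."""
    tags = set()
    if not references:
        return tags
    for face in face_embeddings:
        f = _unit(face)
        distances = {name: 1.0 - float(f @ ref) for name, ref in references.items()}
        best = min(distances, key=distances.get)
        if distances[best] < threshold:  # hypothetical threshold
            tags.add(best)
    return tags


def names_in_query(query, names):
    """Step 4 (online): find known names in the query, e.g. "Kwangkyu's winter trip"."""
    q = query.lower()
    return {n for n in names if re.search(rf"\b{re.escape(n.lower())}\b", q)}


def face_tag_score(query_names, photo_tags):
    """Hypothetical score: fraction of the queried names that are tagged in the photo."""
    if not query_names:
        return 0.0
    return len(query_names & photo_tags) / len(query_names)
```

Because tagging happens offline, each query only needs the cheap set operations in `names_in_query` and `face_tag_score`.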
1) A caption is generated for each image, and a BERT embedding matrix for all captions is then computed (offline inference).
2) When a user query is received, its BERT embedding vector is computed (online inference).
3) The dot product between the query vector and each caption vector is used as the similarity score.
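The three semantic-search steps above, sketched with NumPy. The encoder is passed in as a function so the scoring logic can be tried without downloading a model; `st_encode` wraps `SentenceTransformer.encode` with `normalize_embeddings=True` (an assumption here) so that the dot product equals cosine similarity. The slides call these "BERT" embeddings; the listed model is sentence-transformers/all-distilroberta-v1.

```python
from functools import lru_cache

import numpy as np

EMBEDDING_MODEL_ID = "sentence-transformers/all-distilroberta-v1"


@lru_cache(maxsize=1)
def _load_model():
    # Lazy import: only needed when the real encoder is used (downloads weights).
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(EMBEDDING_MODEL_ID)


def st_encode(texts):
    """Real encoder: unit-length embeddings, so dot product equals cosine similarity."""
    return _load_model().encode(list(texts), normalize_embeddings=True)


def build_caption_matrix(captions, encode_fn=st_encode):
    """Step 1 (offline): embed every caption once; rows align with the image list."""
    return np.asarray(encode_fn(captions), dtype=float)  # shape (n_images, dim)


def semantic_scores(query, caption_matrix, encode_fn=st_encode):
    """Steps 2-3 (online): embed the query, then score it against every caption."""
    q = np.asarray(encode_fn([query]), dtype=float)[0]  # shape (dim,)
    return caption_matrix @ q  # one dot product per image, shape (n_images,)


def top_k(scores, k=5):
    """Indices of the k highest scores (stable order for ties)."""
    return np.argsort(-scores, kind="stable")[:k].tolist()
```

Precomputing the caption matrix means each query costs one encoder call plus a single matrix-vector product.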
Kwangkyu's pic in front of a tower
Kwangkyu's winter trip
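Queries like the two above mix a name (handled by face tags) with a description (handled by semantic search). The slides do not state how the two scores are merged; the sketch below assumes a hypothetical weighted sum with weight `alpha`, purely to illustrate one way a tag-based and a semantic score could be fused per photo.

```python
import numpy as np


def combined_scores(face_scores, semantic_scores, alpha=0.5):
    """Hypothetical fusion: weighted sum of face-tag and semantic scores per photo."""
    face = np.asarray(face_scores, dtype=float)
    sem = np.asarray(semantic_scores, dtype=float)
    return alpha * face + (1.0 - alpha) * sem


def rank_photos(face_scores, semantic_scores, k=5, alpha=0.5):
    """Indices of the top-k photos under the combined score (stable for ties)."""
    scores = combined_scores(face_scores, semantic_scores, alpha)
    return np.argsort(-scores, kind="stable")[:k].tolist()
```

For a query such as "Kwangkyu's winter trip", a photo tagged with Kwangkyu and captioned with snow would outrank both an untagged snow photo and a tagged beach photo, since it scores on both terms.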
We use nDCG@5 to evaluate our model on a selected set of queries.
For each query, we composed oracle (ideal) results; the DCG of the oracle ranking is used as the IDCG.
Example oracle results for "Insoo taking a selfie": (photo id: 54 / rel: 5, photo id: 415 / rel: 5, photo id: 58 / rel: 5, photo id: 1010 / rel: 3, photo id: 953 / rel: 3)
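A sketch of nDCG@5 using the oracle above. The slides do not say which gain function was used; this assumes the common linear form DCG@k = sum over positions i = 1..k of rel_i / log2(i + 1), with nDCG@k = DCG@k / IDCG@k and photos missing from the oracle treated as relevance 0.

```python
import math


def dcg_at_k(relevances, k=5):
    """Linear-gain DCG (assumption): rel_i / log2(i + 1) for positions i = 1..k."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, oracle, k=5):
    """oracle maps photo id -> graded relevance; unlisted photos have relevance 0."""
    gains = [oracle.get(pid, 0) for pid in ranked_ids]
    idcg = dcg_at_k(sorted(oracle.values(), reverse=True), k)  # oracle ranking
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Oracle for "Insoo taking a selfie", as listed in the slides.
ORACLE = {54: 5, 415: 5, 58: 5, 1010: 3, 953: 3}
```

A ranking that reproduces the oracle order scores 1.0, and a ranking with no oracle photos in its top five scores 0.0.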
Evaluation results for sample queries (nDCG@5):
╔═══════════════════════╦══════╦══════════════╗
║ Query                 ║ Ours ║ Apple Photos ║
╠═══════════════════════╬══════╬══════════════╣
║ Insoo taking a selfie ║ 0.82 ║ 0.00         ║
╠═══════════════════════╬══════╬══════════════╣
║ Youngki with friends  ║ 0.67 ║ 0.00         ║
╠═══════════════════════╬══════╬══════════════╣
║ Theme park            ║ 0.50 ║ 0.30         ║
╠═══════════════════════╬══════╬══════════════╣
║ Kwangkyu with snow    ║ 0.75 ║ 0.00         ║
╚═══════════════════════╩══════╩══════════════╝
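The two-step process of 1) face detection and 2) comparing the detected faces against the references is prone to error propagation: a face that is missed or mis-detected in step 1 cannot be matched correctly in step 2.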
Caption: a photography of two people standing in front of a pool
Because existing image-captioning models are imperfect, their captions can be inaccurate or incomplete, and those errors carry over into semantic search.
Pros:
PEPSE combines tag-based search (face tags) with semantic search over BERT caption embeddings, letting users search their photos by multiple factors, such as who appears in them and what they depict.
PEPSE uses DeepFace for face matching and BERT embeddings for semantic matching, so photos can be found even when they were never manually labeled with specific tags.
PEPSE makes organizing and finding photos easier, saving users time and effort.
On our sample queries, PEPSE achieved nDCG@5 between 0.50 and 0.82 and outperformed Apple Photos (0.00 to 0.30) on every query, which suggests it is effective as a personal picture search engine.
Cons:
Errors in face detection, face matching, or caption generation propagate into the search results, which can frustrate users who rely on the engine to find their photos.
PEPSE may be less effective for photos whose distinguishing features are too specific or unusual to be captured by a generated caption or a face tag.