Daniel de Monteiro - demonteiro@wisc.edu

Adeel Iqbal - aiqbal3@wisc.edu

Theodore Nguyen - ttnguyen38@wisc.edu

Motivation

What We Are Trying To Solve:

Today, cameras are more accessible than ever: 97% of Americans own a cellphone of some kind, and 85% own a smartphone (Pew Research Center, 2021). This means most Americans carry a high-quality camera that fits easily in a pocket. With these phones, people continually take pictures to share life experiences on Facebook and Instagram, while others keep their photos mainly for themselves. When taking group photos or landscape pictures, people usually take several shots to get a single winning one. This leaves many near-duplicate photos of the same subject in the user’s camera roll. The user must then manually scan through them and decide which photos to keep for each subject.

Why We Want To Solve It:

We want to solve this issue because it will save people time. After each “photo session”, people typically look through the photos they took and decide which ones to keep. Most of the time, the photos look almost identical, with only minor differences among them. Automating this process will save time and make photo sharing quicker. Furthermore, as smartphone cameras increase in quality every year, photo file sizes are increasing too. Filtering out the unnecessary photos will leave more storage space on the user’s phone and in cloud backups.



The Approach

Possible Steps To The Solution:

The solution we propose is a program that scans through the camera roll on a smartphone and finds pictures that are visually similar to each other. Within each group of visually similar images, the program would pick the best image and discard the rest. This would be repeated for every group in the user’s camera roll.

To create this program, two major algorithms are needed. The first finds the visually similar images: a user’s camera roll could contain thousands of photos, and the algorithm must scan through them and find the ones that look nearly identical. The second selects the best image within each group of similar images, based on factors such as the blurriness of the image, its brightness, and its zoom level (in a group photo, making sure everyone is included).
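As an illustration of the first algorithm, here is a minimal Python sketch of one common way to detect near-identical images: an average hash compared by Hamming distance. This is an assumption for illustration only, not necessarily the technique used in this project. The function names, the 8x8 hash size, and the `max_distance` threshold are hypothetical, and images are assumed to be already loaded as 2-D lists of grayscale values (0-255).

```python
def average_hash(gray, hash_size=8):
    """Return hash_size * hash_size bits describing a grayscale image.

    The image is divided into a hash_size x hash_size grid; each bit says
    whether that grid cell is brighter than the average cell.
    """
    height, width = len(gray), len(gray[0])
    cells = []
    for row in range(hash_size):
        top = row * height // hash_size
        bottom = max((row + 1) * height // hash_size, top + 1)
        for col in range(hash_size):
            left = col * width // hash_size
            right = max((col + 1) * width // hash_size, left + 1)
            block = [gray[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if cell > mean else 0 for cell in cells]


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def are_similar(gray_a, gray_b, max_distance=5):
    """Treat two images as near-duplicates if their hashes nearly match.

    max_distance is an arbitrary illustrative threshold, not a tuned value.
    """
    return hamming_distance(average_hash(gray_a),
                            average_hash(gray_b)) <= max_distance
```

A hash like this tolerates small brightness changes because each bit is relative to the image's own average, but it compares coarse layout only, so unrelated images with a similar light/dark layout can still collide.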

Time Table:

Oct 12 - Turn in project proposal

Oct 22 - Gather all images to use for project (dataset), start project webpage

Nov 2 - Finish first part of project (algorithm to find visually similar images in a dataset)

Nov 9 - Mid-term report due

Nov 29 - Finish second part of project (algorithm to select best image among visually similar images)

Dec 7 - Final presentation

Dec 16 - Project webpage due



Implementation

  • We created a program in MATLAB that takes a collection of pictures as input

  • Our algorithm finds similar photos within that collection and groups them into separate folders

  • Within each group of similar photos, our program selects the “best” photo

  • The best photo is chosen based on criteria such as brightness level and blurriness level

  • At the end, only the best photos are left, removing as many “duplicate” photos as possible (saving storage on the phone, computer, etc.)
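The “best photo” step above can be sketched as follows. This is a hedged Python illustration, not the project's MATLAB code: it assumes blurriness is measured by the variance of a 4-neighbour Laplacian and brightness by the distance of the mean pixel value from mid-gray. The function names, `target_brightness`, and `brightness_weight` are hypothetical choices, and images are again assumed to be 2-D lists of grayscale values (0-255).

```python
def mean_brightness(gray):
    """Average pixel value of a grayscale image."""
    pixels = [value for row in gray for value in row]
    return sum(pixels) / len(pixels)


def sharpness(gray):
    """Variance of a 4-neighbour Laplacian; blurrier images score lower."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(gray[y - 1][x] + gray[y + 1][x]
                             + gray[y][x - 1] + gray[y][x + 1]
                             - 4 * gray[y][x])
    if not responses:  # image too small to have interior pixels
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def quality_score(gray, target_brightness=128.0, brightness_weight=1.0):
    """Higher is better: reward sharpness, penalise over/under-exposure.

    The target and weight are illustrative assumptions, not tuned values.
    """
    exposure_penalty = abs(mean_brightness(gray) - target_brightness)
    return sharpness(gray) - brightness_weight * exposure_penalty


def select_best(group):
    """group maps a photo name to its pixels; return the best photo's name."""
    return max(group, key=lambda name: quality_score(group[name]))
```

Sharpness and brightness live on very different numeric scales, so in practice the weight between them would need tuning on real photos.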



Results

  • Initial collection of photos


  • Best photos selected after running the collection through our program

  • The program is not perfect and does not always group similar images together

    • However, in this example, it freed up 55% of the storage used by the original collection
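For reference, a storage-savings percentage like the one above can be computed from file sizes as in this small Python sketch. The function name is hypothetical, and the inputs are assumed to be lists of file sizes in bytes.

```python
def percent_freed(original_sizes, kept_sizes):
    """Percentage of storage freed when only the kept photos remain."""
    original_total = sum(original_sizes)
    if original_total == 0:  # empty collection: nothing to free
        return 0.0
    return 100.0 * (original_total - sum(kept_sizes)) / original_total
```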



Implementation Problems

We had a few difficulties implementing the feature that finds similar images in the dataset. Our end goal was to group each set of similar images together and separate it from the other images. At first, however, our program could only take a single image and find the other images in the dataset that were similar to it. This issue was eventually resolved.
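This write-up does not record how the issue was resolved. One common way to go from a pairwise “is this image similar to that one?” test to full groups is to merge similar pairs with a union-find structure, sketched below in Python. The function name `group_similar` and the `is_similar` callback are hypothetical, and this is an illustration rather than the project's actual fix.

```python
def group_similar(items, is_similar):
    """Partition items into groups using a pairwise similarity test.

    Items end up in the same group if they are linked by a chain of
    similar pairs (A~B and B~C puts A, B, and C together).
    """
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if is_similar(items[i], items[j]):
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for index, item in enumerate(items):
        groups.setdefault(find(index), []).append(item)
    return list(groups.values())
```

Comparing every pair is quadratic in the number of photos, which is acceptable for a small test collection but would need indexing or bucketing for a camera roll with thousands of images.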

Another issue was that our algorithm sometimes rated two unrelated images as more similar than two related ones. For example, it scored a picture of an Xbox One controller as more visually similar to a picture of a YETI cup than two pictures of YETI cups were to each other.



What We Learned

  • An automated process for deleting photos may sometimes make errors, categorizing dissimilar photos as visually similar

    • Our algorithm classified a picture of an Xbox One controller as similar to a picture of a YETI cup

    • The program may be prone to unintentional biases

  • Users differ in whether they want to manually review photos prior to deletion

    • The program should offer an option to recover photos after the algorithm finishes its run

  • Database tools may make it easier to handle large amounts of data
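The recovery option mentioned above could work by moving rejected photos into a holding folder instead of deleting them outright. The following Python sketch shows that idea under stated assumptions: the function names and the trash-folder layout are hypothetical and are not part of the project's implementation.

```python
import shutil
from pathlib import Path


def move_to_trash(photo_paths, trash_dir):
    """Move photos into trash_dir; return {trashed path: original path}."""
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    moved = {}
    for photo in map(Path, photo_paths):
        destination = trash / photo.name
        counter = 1
        while destination.exists():  # avoid overwriting same-named files
            destination = trash / f"{photo.stem}_{counter}{photo.suffix}"
            counter += 1
        shutil.move(str(photo), str(destination))
        moved[str(destination)] = str(photo)
    return moved


def restore(moved):
    """Undo move_to_trash by moving each photo back to its original path."""
    for trashed, original in moved.items():
        Path(original).parent.mkdir(parents=True, exist_ok=True)
        shutil.move(trashed, original)
```

Keeping the returned mapping (for example, saved to a small file) is what makes the deletion reversible; the trash folder could then be emptied only after the user has reviewed the result.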



References:

Pew Research Center. “Demographics of Mobile Device Ownership and Adoption in the United States.” Pew Research Center: Internet, Science & Tech, 23 Nov. 2021, https://www.pewresearch.org/internet/fact-sheet/mobile/.