Peter Blank, Jack Reed, Chris Beer, Jessie Keck, Josh Schneider, Hilary Thorsen
After experimenting with Google Cloud Vision's API, DLSS software engineer Jack Reed thought we might be able to glean some useful image classification data through this automated service.
Metadata development for the extensive image collections at Stanford Library is expensive and time-consuming, yet it is essential if we want our collections to be discoverable and navigable by researchers. At the same time, the commercial services available to us offer results that cater to business applications rather than to the needs of libraries, archives, and museums.
This project will explore the cost/benefit of "out-of-the-box" services vs. more customized solutions. If we find that we need customized models to produce useful labels for our image collections, can we generalize those models enough to make them applicable not only across our own holdings but across those of our partner institutions as well? And can we develop models tailored to particular types of image collections that offer valuable specialized results?
Using the image above, titled "Martin Luther King Jr. & Joan Baez march to integrate schools, Grenada, MS, 1966," we compared the results from three different services: Amazon Rekognition, Clarifai, and Google Cloud Vision. The results below show the labels each service assigned to the image, with a confidence score beside each label.
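To illustrate how such a comparison is produced, here is a minimal sketch of requesting label annotations from Google Cloud Vision via its Python client library. It assumes the `google-cloud-vision` package is installed and that credentials are configured through `GOOGLE_APPLICATION_CREDENTIALS`; the image filename is hypothetical. The other services expose similar request/response shapes.

```python
def filter_labels(labels, min_score=0.5):
    """Keep (description, score) pairs at or above a confidence
    threshold, highest score first."""
    return sorted(
        [(desc, score) for desc, score in labels if score >= min_score],
        key=lambda pair: pair[1],
        reverse=True,
    )

def detect_labels(image_path, min_score=0.5):
    """Ask Cloud Vision for labels on a local image file."""
    # Imported inside the function so the pure helper above is usable
    # without the client library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return filter_labels(
        [(l.description, l.score) for l in response.label_annotations],
        min_score,
    )

if __name__ == "__main__":
    # Hypothetical filename for the march photograph discussed above.
    for description, score in detect_labels("mlk_march_grenada_1966.jpg"):
        print(f"{description}: {score:.2f}")
```

Filtering by a confidence threshold mirrors the tables above: each service reports a score per label, and low-confidence labels are usually the first candidates to discard.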
There is also inconsistency between services in facial recognition. In the example below, Google's service recognized 10 faces while Clarifai recognized half as many.
Faces recognized
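A face-count comparison like the one above can be sketched as follows. The Cloud Vision call is real but requires installed credentials; the per-service counts in the usage example are the figures reported in this post, and the helper quantifying disagreement is our own illustration, not part of any API.

```python
def count_disagreement(counts):
    """Given {service_name: face_count}, return the spread between the
    most and fewest faces detected across services."""
    values = list(counts.values())
    return max(values) - min(values)

def count_faces(image_path):
    """Return the number of faces Cloud Vision finds in a local image."""
    from google.cloud import vision  # requires google-cloud-vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.face_detection(image=image)
    return len(response.face_annotations)

if __name__ == "__main__":
    # Counts reported in the text for the example image.
    counts = {"google": 10, "clarifai": 5}
    print(f"Disagreement: {count_disagreement(counts)} faces")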
Though none of the services were able to match the faces to known individuals, Google provides a Web Detection feature that shows where the entire image can be found on the web. The first two hits are from the Stanford Libraries' Exhibit for this collection. Further down is a link to Jack Reed's post about this experiment.
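The web-matching lookup above can be sketched with Cloud Vision's web detection method. The grouping helper is our own addition, to make repository hits (such as a Stanford Exhibits domain) easy to spot among the returned pages; it is not part of the API.

```python
from urllib.parse import urlparse

def group_by_host(urls):
    """Group matching-page URLs by hostname, so hits from a single
    repository cluster together."""
    hosts = {}
    for url in urls:
        hosts.setdefault(urlparse(url).netloc, []).append(url)
    return hosts

def find_matching_pages(image_path):
    """Return URLs of web pages containing the full image, per
    Cloud Vision web detection."""
    from google.cloud import vision  # requires google-cloud-vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.web_detection(image=image)
    return [page.url for page in
            response.web_detection.pages_with_matching_images]

if __name__ == "__main__":
    pages = find_matching_pages("mlk_march_grenada_1966.jpg")
    for host, urls in group_by_host(pages).items():
        print(f"{host}: {len(urls)} matching page(s)")
```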
Hundreds of thousands of photographs from the Stanford Library collections.
This project will evaluate the kind of first-order classification provided by these services, as shown above, to determine whether it yields enough useful information. Depending on the outcome of that test, we may also want to experiment with unsupervised learning across the public collections to see if automated clustering, without any pre-defined labels, will help us devise new, useful classifications for images.
If the first order approach is at least somewhat useful, it will help us build label sets for sharing images across institutional repositories. Similarly, if we can uncover meaningful classifications from the automated machine clustering process, we may be able to develop a classification algorithm that is more appropriate to library, archive and museum content than those provided by the commercial services.
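None of the commercial services offers this kind of clustering out of the box, so the idea would be to cluster feature vectors extracted from the images ourselves. A minimal pure-Python sketch of the core k-means step, under the assumption that each image has already been reduced to a small feature vector (in practice one would use embeddings from a pretrained network and an optimized library):

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means over feature vectors (lists of floats).
    Returns a cluster index for each input vector. Illustration only:
    no empty-cluster handling or convergence check."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors))
                       if assignments[i] == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return assignments

if __name__ == "__main__":
    # Toy 2-D "image features": two well-separated groups.
    features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(kmeans(features, k=2))
```

Clusters discovered this way carry no labels; the hope described above is that inspecting them would suggest classifications better suited to library, archive, and museum content.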
https://cloud.google.com/solutions/image-search-app-with-cloud-vision
Peter Blank, Josh Schneider and Hilary Thorsen
We ran some additional images through the Cloud Vision API to see whether it would yield any useful information.