The aim of this project was to compare object records from the Metropolitan Museum of Art API with corresponding data from the Wikidata API. The Met data is limited to the object records flagged 'isHighlight' in the collection.
The visualizations created from this data mashup are the result of counting how often each 'instanceOfLabel' value from Wikidata occurs in the Met highlight object records.
The counts made it very clear that paintings are the most common objects in the Met highlights selection!
I started out by querying the Met API for the object records flagged 'isHighlight'. I did this by including the filter in the search URL:
https://collectionapi.metmuseum.org/public/collection/v1/search?isHighlight=true&q=*
I used information from the Met API website to build this URL. Working through the returned object records, I stored each one as an individual JSON file in a directory named objects_json (using glob and os.path). I then used glob to open all of the object JSON files and wrote a regular expression to extract each object's Wikidata QID from the data. I saved the Wikidata QIDs in their own JSON file.
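The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: it uses the standard library's urllib in place of whichever HTTP library was really used, the function names are hypothetical, and the regular expression assumes the QID appears in the record as part of a wikidata.org URL (as in the Met API's objectWikidata_URL field).

```python
import glob
import json
import os
import re
import urllib.request

BASE = "https://collectionapi.metmuseum.org/public/collection/v1"

# Assumed pattern: a QID embedded in a wikidata.org URL somewhere in the record.
QID_PATTERN = re.compile(r"wikidata\.org/(?:wiki|entity)/(Q\d+)")


def object_url(object_id):
    """Build the object endpoint URL from an ID instead of hardcoding one."""
    return f"{BASE}/objects/{object_id}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def save_highlights(out_dir="objects_json"):
    """Save every highlight record as its own JSON file (one request per object)."""
    os.makedirs(out_dir, exist_ok=True)
    search = fetch_json(f"{BASE}/search?isHighlight=true&q=*")
    for object_id in search.get("objectIDs") or []:
        record = fetch_json(object_url(object_id))
        path = os.path.join(out_dir, f"{object_id}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(record, f)


def extract_qids(in_dir="objects_json"):
    """Scan the saved files and return the unique QIDs, sorted."""
    qids = set()
    for path in glob.glob(os.path.join(in_dir, "*.json")):
        with open(path, encoding="utf-8") as f:
            qids.update(QID_PATTERN.findall(f.read()))
    return sorted(qids)
```

Calling save_highlights() followed by extract_qids() would yield the list of QIDs, which could then be written to its own file with json.dump. Records without a Wikidata link simply contribute nothing to the result.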
In another script I built my request to Wikidata's SPARQL endpoint as a triple-quoted string in Python. I tested the query in the SPARQL query service before inserting it into the code. I also used the QID JSON file and Python's str.join method to include the QIDs in my SPARQL request, and I made sure that all of the Wikidata results were returned as JSON. I then extracted the 'item', 'itemLabel', 'instanceOf', and 'instanceOfLabel' values for my list of QIDs.
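As an illustration of the triple-quoted string and join approach, here is one way such a query could be assembled. The query text is an assumption, since the original query is not shown: it uses a VALUES clause for the QIDs and wdt:P31 ("instance of") for the instanceOf variable, and the function name build_query is hypothetical.

```python
# Hypothetical query template; %s is replaced by the space-separated QIDs.
QUERY_TEMPLATE = """
SELECT ?item ?itemLabel ?instanceOf ?instanceOfLabel WHERE {
  VALUES ?item { %s }
  ?item wdt:P31 ?instanceOf .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_query(qids):
    """Prefix each QID with wd: and join them into the VALUES clause."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return QUERY_TEMPLATE % values


# Example with two placeholder QIDs.
print(build_query(["Q1", "Q2"]))
```

The %-style substitution is used here only because SPARQL's own curly braces would need escaping with str.format or an f-string.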
I then looped through the returned JSON to extract the values of the 'instanceOfLabel' key, counting them in a dictionary that started out empty. I saved this dictionary to a separate JSON file. Finally, I wrote another script to split the count dictionary into individual dictionaries, producing a different JSON file that was more digestible for Tableau visualization.
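A rough sketch of the counting and reshaping steps is below. It assumes the response follows the standard SPARQL JSON results layout (results, then bindings, then one value object per variable); the function names and the "count" key in the output rows are hypothetical choices for illustration.

```python
def count_labels(sparql_json):
    """Tally each instanceOfLabel value, starting from an empty dictionary."""
    counts = {}
    for binding in sparql_json["results"]["bindings"]:
        label = binding.get("instanceOfLabel", {}).get("value")
        if label:
            counts[label] = counts.get(label, 0) + 1
    return counts


def to_tableau_rows(counts):
    """Turn {label: n} into one small dictionary per label, largest count first."""
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [{"instanceOfLabel": label, "count": n} for label, n in ordered]


# Tiny made-up response, only to show the shapes involved.
sample = {
    "results": {
        "bindings": [
            {"instanceOfLabel": {"value": "painting"}},
            {"instanceOfLabel": {"value": "sculpture"}},
            {"instanceOfLabel": {"value": "painting"}},
        ]
    }
}
rows = to_tableau_rows(count_labels(sample))
```

A list of flat row dictionaries like this tends to be easier for Tableau to read as a table than a single nested dictionary, and either structure can be written out with json.dump.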
Met object record JSON
Wikidata QIDs JSON
Instance count JSON
Visualization JSON
From the data it was evident that the most common 'instanceOfLabel' values in the Met highlight object records were paintings, sculptures, literary works, and drawings. The frequency of literary works surprised me the most because I was unaware that the Met had such an extensive collection of textual and print materials. The image on the right is of a highlighted 1535 Albrecht Dürer geometry manual (image accessed from the Metropolitan Museum website 2022).
My first challenge was constructing the query to the Met API. I struggled with the request format and with the way the data is returned. The request URL has to include the relevant object ID in order to access the full object record, so I had to make sure the URL was not hardcoded with a particular object ID. I also had to figure out how best to retrieve all of the 'isHighlight' records when the API required the q parameter to be filled. Professor Matt Miller helped by suggesting I populate that parameter with *.
I also struggled with the Wikidata endpoint. Although my query worked on the SPARQL website, it would not work in my Python code. Professor Miller suggested replacing the GET request with a POST request, which finally returned the data.
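For illustration, a POST request to the endpoint could be put together as follows. This is a sketch using the standard library rather than the original code, and the function names and User-Agent string are hypothetical. One plausible reason the switch helped, though this is only a guess, is that a query listing many QIDs makes a GET URL very long, whereas POST carries the query in the request body.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_post_request(query):
    """Put the query in a form-encoded body rather than in the URL."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        SPARQL_ENDPOINT,
        data=body,
        headers={
            # Ask for results in the SPARQL JSON format.
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier for this script.
            "User-Agent": "met-highlights-sketch/0.1",
        },
        method="POST",
    )


def run_query(query):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_post_request(query)) as response:
        return json.load(response)
```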
Finally, I had never done data visualization before and had to teach myself how to use Tableau Public! In future iterations of the project I would definitely try for some more complex visualizations as I quite enjoyed the process.
This project would not have been possible without the help of Professor Matt Miller, who teaches Pratt Institute School of Information's Programming for Cultural Heritage INFO 664 class!