An exploration of dogs painted by artists, using data from Wikidata's open data repository
The inspiration for this project came to me while visiting the Metropolitan Museum of Art, where I happened to see the exhibition "In Praise of Painting: Dutch Masterpieces at The Met". As a dog owner, I found myself spotting all the dogs in the domestic scenes on display as I walked through the gallery. My background in Art History also drew me to the subject. When I finally sat down to choose a topic for this project, I decided to examine the Wikidata repository and limit my query to paintings that depict dogs. My main objective was to extract as much data as this query could return and to analyze the information it presented.
The main source of data was Wikidata, an open data repository. I used the Wikidata Query Service to write a SPARQL query that retrieves information on the paintings. The dataset is available on my GitHub, linked below.
Step I: Wikidata.org
My first step was to construct a query in the Wikidata Query Service, test it for errors in the service's visual interface, and then review the results.
The query was built to gather the QIDs (Wikidata item identifiers) of items whose "instance of" property has the value "painting" and whose "depicts" property has the value "dog".
In addition, the query includes the following optional properties: image, location, collection, owned by, creator, made from material, country, genre, described at URL, copyright status, time period, movement, inception, and title.
The query I built is available at this link: Wikidata Query for Dogs in Paintings
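The query at the link above is the one actually used. As a hedged illustration of its shape, the Python sketch below builds an equivalent SPARQL string. It uses the Wikidata IDs P31 (instance of), Q3305213 (painting), P180 (depicts), and Q144 (dog), plus the property IDs for the optional fields listed above. The variable names (owner, material, and so on) and the choice of English labels are assumptions for illustration, not taken from the original query.

```python
# Wikidata property IDs for the optional fields listed above.
# Each entry: query variable name -> (property ID, whether the value is an item with a label).
OPTIONAL_PROPERTIES = {
    "image": ("P18", False),
    "location": ("P276", True),
    "collection": ("P195", True),
    "owner": ("P127", True),
    "creator": ("P170", True),
    "material": ("P186", True),
    "country": ("P17", True),
    "genre": ("P136", True),
    "describedAtUrl": ("P973", False),
    "copyrightStatus": ("P6216", True),
    "timePeriod": ("P2348", True),
    "movement": ("P135", True),
    "inception": ("P571", False),
    "title": ("P1476", False),
}


def build_query(properties: dict = OPTIONAL_PROPERTIES, language: str = "en") -> str:
    """Return a SPARQL query for paintings (Q3305213) that depict dogs (Q144)."""
    select_vars = ["?painting", "?paintingLabel"]
    optional_lines = []
    for var, (pid, is_item) in properties.items():
        select_vars.append(f"?{var}")
        if is_item:
            # The label service fills ?<var>Label for item-valued variables.
            select_vars.append(f"?{var}Label")
        # OPTIONAL keeps a painting in the results even if it lacks this property.
        optional_lines.append(f"  OPTIONAL {{ ?painting wdt:{pid} ?{var}. }}")
    lines = [
        f"SELECT {' '.join(select_vars)} WHERE {{",
        "  ?painting wdt:P31 wd:Q3305213;  # instance of: painting",
        "            wdt:P180 wd:Q144.     # depicts: dog",
        *optional_lines,
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}',
        "}",
    ]
    return "\n".join(lines)
```

Note that `wdt:P31 wd:Q3305213` matches only direct instances of "painting". Catching items typed as a subclass of painting would need the path `wdt:P31/wdt:P279*` instead, at the cost of a slower query.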
Step II: Requesting data from Wikidata using Python
The second step was to send the query to Wikidata from Python, using the requests library and the json module.
The objective was to receive the results as JSON, by requesting that format in the request headers, and to view the printed output in the terminal.
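A minimal sketch of this step follows, assuming the public Wikidata Query Service endpoint and the standard SPARQL JSON results format, where rows sit under `results` → `bindings`. The User-Agent string is a placeholder: Wikimedia's User-Agent policy asks automated clients to identify themselves. The helper names `build_request`, `fetch_bindings`, and `print_preview` are hypothetical.

```python
import json

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for SPARQL JSON results. The User-Agent value is a placeholder;
# Wikimedia asks automated clients to send a descriptive one with contact details.
HEADERS = {
    "Accept": "application/sparql-results+json",
    "User-Agent": "dogs-in-paintings-sketch/0.1 (contact: replace-me@example.org)",
}


def build_request(query: str) -> dict:
    """Keyword arguments for requests.get() that send the query to WDQS."""
    return {
        "url": WDQS_ENDPOINT,
        "params": {"query": query},
        "headers": HEADERS,
        "timeout": 60,
    }


def fetch_bindings(query: str) -> list:
    """Run the query and return the list of result rows ("bindings")."""
    import requests  # third-party; imported here so build_request() works without it

    response = requests.get(**build_request(query))
    response.raise_for_status()  # stop on HTTP errors such as timeouts (e.g. 500, 504)
    return response.json()["results"]["bindings"]


def print_preview(rows: list, n: int = 3) -> None:
    """Pretty-print the first n rows to the terminal for inspection."""
    print(json.dumps(rows[:n], indent=2, ensure_ascii=False))
```

Typical use would be `rows = fetch_bindings(query_string)` followed by `print_preview(rows)`, where `query_string` holds the SPARQL text copied from the Query Service.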
Step III: Saving Wikidata data to a JSON file
The third step was to convert the Wikidata output into a JSON file containing only the chosen keys.
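In the SPARQL JSON format, each row maps a variable name to an object such as `{"type": "uri", "value": "..."}`, and variables from unmatched OPTIONAL clauses are simply absent. The sketch below flattens rows to plain `{key: value}` records for a chosen set of keys and saves them. `CHOSEN_KEYS` is a hypothetical selection; the real one would match the variables in the query's SELECT clause.

```python
import json

# Hypothetical choice of variables to keep; they must match names in the SELECT clause.
CHOSEN_KEYS = ("painting", "paintingLabel", "creatorLabel", "collectionLabel",
               "materialLabel", "inception", "image")


def simplify_bindings(bindings: list, keys: tuple = CHOSEN_KEYS) -> list:
    """Flatten rows of the form {var: {"type": ..., "value": ...}} into {var: value}.

    A variable absent from a row (an OPTIONAL that did not match) becomes "".
    """
    return [
        {key: (row[key]["value"] if key in row else "") for key in keys}
        for row in bindings
    ]


def write_json(records: list, path: str) -> None:
    """Save the flattened records as UTF-8 JSON, keeping non-ASCII titles readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Filling missing values with empty strings keeps every record the same shape, which makes the later CSV conversion straightforward.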
Step IV: Writing a CSV file
The fourth step was to write a script that transforms the JSON file into a CSV file.
This step made it possible to view the data in tabular form and to begin cleaning it.
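One way to sketch this conversion uses Python's built-in `csv.DictWriter`, assuming the JSON file holds a list of flat records (one dictionary of strings per row). The function names are hypothetical. The UTF-8 byte-order mark (`utf-8-sig`) is an assumption added so that Excel detects the encoding of non-English titles when opening the file.

```python
import csv
import io
import json


def records_to_csv_text(records: list) -> str:
    """Render a list of flat dicts as CSV text, one column per key."""
    # Union of keys across all records, in first-seen order.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    # restval="" fills columns that a particular record lacks.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def json_file_to_csv(json_path: str, csv_path: str) -> None:
    """Read a JSON list of records and write it out as a CSV file."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    # newline="" keeps the csv module's \r\n line endings from being translated
    # again (which shows up as blank rows on Windows); utf-8-sig adds a BOM for Excel.
    with open(csv_path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(records_to_csv_text(records))
```

The csv module quotes any field containing a comma, so titles such as "Dog, Resting" stay in a single column.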
Step V: Cleaning the dataset
I used OpenRefine and Microsoft Excel to tidy my data.
OpenRefine handled simple transformations, and Excel was used to review the data row by row.
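The write-up does not list which transformations were applied, so the functions below are hypothetical examples of the same kind of simple, column-wide cleanup, written in Python for anyone who prefers a script. They trim and collapse whitespace (similar to OpenRefine's common whitespace transforms), pull the QID out of an entity URI, and extract the year from an inception timestamp in the form WDQS returns, such as `1650-01-01T00:00:00Z`.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Trim the ends and collapse internal runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def qid_from_uri(uri: str) -> str:
    """'http://www.wikidata.org/entity/Q42' -> 'Q42'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def year_from_inception(value: str):
    """Year as an int from a timestamp like '1650-01-01T00:00:00Z'; None if blank.

    A leading minus sign (used for BCE dates) is kept.
    """
    match = re.match(r"^(-?\d+)-", value.strip())
    return int(match.group(1)) if match else None
```

One caution: the direct (`wdt:`) value of inception is returned without its precision, so a date recorded only to the decade or century still arrives as a full timestamp and should not be read as an exact year.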
Step VI: Tableau
My final step was to use Tableau software to create visualizations with the cleaned dataset.
Please see my analysis page for the findings and visualizations.
Overall, this was a very enriching experience. Unfortunately, I learned late in the process that the data exported from Wikidata contained many duplicate records, which inflated the results. After hours of removing duplicates, the total went from roughly 11,000 to roughly 2,000 rows. In the future, I would write a script to handle this, because going through every row in Microsoft Excel was time-consuming. As a beginner with OpenRefine, I used it only for simple transformations. Ultimately, there were still plenty of results to interpret! The paintings returned by the query belong to several different collections around the world. Most of them were made with oil paint, and a look at the image results in the Gallery shows that dogs were depicted mainly as companions, spectators, and part of daily life.
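A plausible cause of this kind of row inflation, though not verified against this dataset, is how SPARQL handles OPTIONAL patterns: a painting with two materials and two collections comes back as 2 × 2 = 4 rows, one per combination. A minimal sketch of a deduplication script collapses rows that share a painting ID into one row per painting, joining distinct values. It assumes flat records of strings with a `painting` column (a hypothetical column name).

```python
def collapse_by_painting(records: list, id_key: str = "painting", sep: str = "; ") -> list:
    """Merge rows that share the same painting ID into one row per painting.

    Distinct non-empty values of each column are kept in first-seen order and
    joined with `sep`, so a painting with two materials keeps both.
    """
    merged = {}  # painting ID -> {column: [distinct values]}; dicts keep insertion order
    for record in records:
        fields = merged.setdefault(record[id_key], {})
        for column, value in record.items():
            values = fields.setdefault(column, [])  # keeps the column even if always empty
            if value and value not in values:
                values.append(value)
    return [
        {column: sep.join(values) for column, values in fields.items()}
        for fields in merged.values()
    ]
```

An alternative is to collapse rows inside the query itself with SPARQL 1.1's `GROUP BY ?painting` and `GROUP_CONCAT(DISTINCT ...)`. Either way, each row then means "one painting", so counts per material or per collection would need the joined fields to be split again.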
Please refer to my GitHub for files and documentation.