What information is included in the dataset?
The dataset includes a list of over 1,000 AAA and indie games released between 1980 and 2023. The variables include release date, developer, genre, and reviews.
What information, events, or phenomena can your dataset illuminate, and what can it not?
The data can highlight how the popularity of certain game genres has changed over time and which genres tend to be paired in development. With the release date variable, we can also examine whether the ratings of games shifted around certain socioeconomic events. However, because the dataset lacks live statistics such as current player count, our EDA is limited to tracking trends in categorical variables rather than numerical statistics.
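As a rough illustration of the genre-pairing question above, the sketch below counts how often two genres appear on the same game. It assumes each game's genres have already been parsed into a list of strings; the field names `title`, `year`, and `genres` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def genre_pair_counts(games):
    """Count how often each unordered pair of genres appears on the same game."""
    counts = Counter()
    for game in games:
        # Sort and de-duplicate so ("RPG", "Adventure") and ("Adventure", "RPG")
        # are counted as the same pair.
        genres = sorted(set(game["genres"]))
        counts.update(combinations(genres, 2))
    return counts


# Made-up rows for illustration only; field names are assumptions.
sample_games = [
    {"title": "Game A", "year": 2015, "genres": ["Adventure", "RPG"]},
    {"title": "Game B", "year": 2015, "genres": ["RPG", "Adventure", "Indie"]},
    {"title": "Game C", "year": 2020, "genres": ["Platform"]},
]

pair_counts = genre_pair_counts(sample_games)
print(pair_counts.most_common(3))
```

Grouping the same rows by release year before counting would give one possible view of how pairings change over time.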
What are the ideological effects of the way in which your sources have been divided into data? (dataset ontology)
This dataset was filtered by the author of the Kaggle post from an original dataset of over 100k games. The author mentions in the description that they used a mixed selection method to capture both popular games and indie games, which means the current dataset may suffer from selection bias. Another ideological influence on the data is its origin: a website called Backloggd. The website focuses on adding a social element to video game data collection, which makes the dataset better suited to humanities research, since users can comment and give feedback directly on the website. One example of this emphasis is a column containing individual user reviews. This is a useful element to have in a dataset because it gives greater access to player reactions for sentiment analysis.
What information is included in the dataset?
This dataset includes over 39,000 unique games. It includes details such as developer, publisher, console, and genre, as well as critic score and total sales.
What information, events, or phenomena can your dataset illuminate, and what can it not?
The dataset includes the release date of each game, ranging from 1971 to 2024, along with its platform/console. The most important data for our project is each game's sales figures, which we can use to gauge the popularity of each listed game. The dataset also illuminates the advancement of gaming technology over the years. However, it omits detailed console specifications and lists only critic review scores, not player review scores. As a result, we will have less information about players' sentiment toward each game.
What are the ideological effects of the way in which your sources have been divided into data? (dataset ontology)
This dataset records not only total sales but also sales by region. It is interesting to note which regions are represented: North America, Japan, Europe & Africa, and Rest of World. Japan is the only country with its own sales column, which suggests the author considers Japan an especially important country for video games. The combination of Europe & Africa into one column implies the author believes these regions are less important for video game sales, since the two regions are vastly different both geographically and culturally. Combining them is less of a humanities approach because it loses information about more diverse video game player populations. The dataset also has an element indicating when the data was last updated. This implies that the data is still being actively maintained, so the dataset should reflect current trends as well. This adds more of a humanities approach to the dataset, as it can give information on longer periods.
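To make the regional split concrete, here is a minimal sketch of how each region's share of recorded sales could be computed. The column names in `REGION_COLUMNS` and the sample rows are hypothetical stand-ins for the four regional columns described above, not the dataset's actual headers or values. Treating a missing value as zero is also an assumption, and it can understate regions with sparser reporting.

```python
# Hypothetical column names for the four regional sales columns.
REGION_COLUMNS = ["na_sales", "jp_sales", "eu_africa_sales", "other_sales"]


def regional_shares(rows):
    """Return each region's fraction of the summed regional sales."""
    totals = {col: 0.0 for col in REGION_COLUMNS}
    for row in rows:
        for col in REGION_COLUMNS:
            # Assumption: a missing or empty value is treated as zero sales.
            totals[col] += float(row.get(col) or 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {col: 0.0 for col in REGION_COLUMNS}
    return {col: totals[col] / grand_total for col in REGION_COLUMNS}


# Made-up figures for illustration only.
sample_rows = [
    {"na_sales": 2.0, "jp_sales": 1.0, "eu_africa_sales": 1.0, "other_sales": 0.0},
    {"na_sales": 1.0, "jp_sales": None, "eu_africa_sales": 2.0, "other_sales": 1.0},
]

shares = regional_shares(sample_rows)
for region, share in shares.items():
    print(f"{region}: {share:.1%}")
```

Because Europe & Africa share a single column, a calculation like this cannot separate the two regions, which is the information loss noted above.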