Our group project investigates the representation of women in video games from 1980 to 2020. To support our analysis, we examined three online datasets: Brisa Palomar’s Gender Representation in Video Games (Kaggle), Alice Corona’s Women in Video Games dataset (GitHub), and Rabie El Kharoua’s Predict Online Gaming Behavior dataset (Kaggle). Each dataset illuminates a different aspect of our research (sexualization, narrative role, and audience engagement, respectively) and offers a partial answer to our overarching question: how have women been represented in games, and how does that representation affect real-world player experience and industry dynamics?
The Gender Representation in Video Games dataset is the most structured and directly relevant of the three. It includes 64 top-selling or top-rated games from 2012 to 2022, divided into three tables: “games,” “characters,” and “sexualization.” The final table assigns each character a four-point sexualization score based on binary indicators: “sexualized clothing,” “trophy,” “damsel in distress,” and “sexualized cutscenes.” This dataset provides a valuable attempt to quantify patterns of representation, allowing us to construct a timeline of sexualization scores and analyze how character traits, roles, and gender correlate with objectification. Its integration of multiple variables, ranging from team demographics to protagonist gender, also makes it possible to explore deeper connections between gendered design and critical reception. However, the dataset has notable limitations. With only 64 games over ten years, the sample is too small to support broader generalizations, especially given our project’s scope across four decades. Some years include as few as five titles, resulting in unstable averages (e.g., a sexualization score of 0 for an entire year, which is highly implausible). Additionally, the four-point metric, while helpful for pattern detection, is reductive. Definitions for categories like “trophy” or “sexualized clothing” remain vague and subjective, with no documentation about who assigned the scores or how their reliability was ensured. As such, these indicators should be treated as suggestive rather than definitive. Despite its issues, the dataset offers a strong foundation for quantitative analysis of gender representation in recent video games.
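To make the scoring and the small-sample problem concrete, the following is a minimal sketch rather than the dataset's actual pipeline. It assumes the four-point score is simply the count of binary indicators present, and it uses invented toy rows with hypothetical field names (`game`, `year`, and the four indicator keys). The `min_games` threshold is likewise an arbitrary illustration, not a value taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical indicator names; the real column names may differ.
INDICATORS = (
    "sexualized_clothing",
    "trophy",
    "damsel_in_distress",
    "sexualized_cutscenes",
)

# Invented toy rows for illustration only; these are not real data.
characters = [
    {"game": "Game A", "year": 2012, "sexualized_clothing": 1, "trophy": 0,
     "damsel_in_distress": 1, "sexualized_cutscenes": 0},
    {"game": "Game A", "year": 2012, "sexualized_clothing": 0, "trophy": 0,
     "damsel_in_distress": 0, "sexualized_cutscenes": 0},
    {"game": "Game B", "year": 2012, "sexualized_clothing": 1, "trophy": 1,
     "damsel_in_distress": 1, "sexualized_cutscenes": 1},
    {"game": "Game C", "year": 2013, "sexualized_clothing": 0, "trophy": 0,
     "damsel_in_distress": 0, "sexualized_cutscenes": 0},
    {"game": "Game C", "year": 2013, "sexualized_clothing": 0, "trophy": 0,
     "damsel_in_distress": 0, "sexualized_cutscenes": 0},
]


def sexualization_score(row):
    """Assumed scoring rule: count how many of the four indicators are set."""
    return sum(row[name] for name in INDICATORS)


def yearly_summary(rows, min_games=5):
    """Average score per year, plus the number of games behind each average."""
    scores_by_year = defaultdict(list)
    games_by_year = defaultdict(set)
    for row in rows:
        scores_by_year[row["year"]].append(sexualization_score(row))
        games_by_year[row["year"]].add(row["game"])

    summary = {}
    for year in sorted(scores_by_year):
        n_games = len(games_by_year[year])
        summary[year] = {
            "mean_score": mean(scores_by_year[year]),
            "n_games": n_games,
            # Flag years whose average rests on very few titles.
            "small_sample": n_games < min_games,
        }
    return summary


summary = yearly_summary(characters)
for year, stats in summary.items():
    print(year, stats)
```

Reporting the number of games alongside each yearly mean makes it easier to see when an average of 0 reflects a thin sample rather than a real absence of sexualized characters.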
The Women in Video Games dataset comprises several CSV files, the most relevant being the “Damsel in Distress Trope” table. This subset covers 266 games in which female characters are portrayed as damsels in distress, with data on game profiles, release years, and user reviews. It enables us to visualize how the trope appears across genres, particularly in action and adventure titles. This genre-level insight strengthens our argument that certain game types are especially prone to disempowering portrayals of women. The dataset also includes qualitative descriptions of each game’s plot and character dynamics, which help us contextualize tropes within broader narratives and complement our more quantitative sources. Nonetheless, this dataset presents challenges of its own. Its narrow focus on a single trope risks oversimplifying the diverse ways women are portrayed in games. The absence of metadata or methodological transparency makes it difficult to assess the accuracy or consistency of its classifications: it is unclear how games were selected, what qualified a character as a “damsel,” or how sources were verified. Moreover, the dataset dates from about a decade ago, so it likely misses recent shifts in industry practices and representation trends. While it serves as a meaningful case study of harmful narrative conventions, it is best used as a historical supplement rather than as a basis for broad generalizations.
The third dataset, Predict Online Gaming Behavior, takes a different approach by focusing on player behavior and demographics. It is suited to machine learning tasks and can be used to model gender disparities in gameplay patterns, such as average playtime, engagement, and genre preferences. We used it to show that female players are consistently underrepresented across all game genres, which lends real-world context to our analysis of in-game character representation. The underrepresentation of women players may stem from various factors, including misogynistic gaming cultures, dissatisfaction with poorly portrayed female characters, and a lack of identification with existing avatars. As Teresa Lynch et al.’s 2025 study suggests, well-represented female characters can be empowering for players, particularly women and gender-diverse individuals, who often express strong emotional and critical responses to character design (Lynch et al.). By situating content analysis alongside data about who plays and why, this dataset helps ground our project in social realities and underscores the broader cultural impact of representation. This dataset has drawbacks as well. Gender is treated as binary, excluding nonbinary and trans players whose representation is also critical to our feminist analysis. Furthermore, the dataset measures only player behavior and does not address in-game content or character portrayal, so it cannot by itself link any particular portrayal to who plays.
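A per-genre comparison of this kind could be computed along the following lines. This is a hedged sketch, not the analysis itself: the field names `gender` and `genre`, the label `"Female"`, and the toy rows are all assumptions made for illustration, and the printed shares are not findings from the dataset.

```python
from collections import Counter

# Invented toy rows for illustration only; these are not real data.
players = [
    {"gender": "Female", "genre": "Strategy"},
    {"gender": "Male", "genre": "Strategy"},
    {"gender": "Male", "genre": "Strategy"},
    {"gender": "Male", "genre": "Strategy"},
    {"gender": "Female", "genre": "Sports"},
    {"gender": "Male", "genre": "Sports"},
]


def female_share_by_genre(rows):
    """Return the fraction of players recorded as female within each genre."""
    totals = Counter(row["genre"] for row in rows)
    women = Counter(row["genre"] for row in rows if row["gender"] == "Female")
    # Counter returns 0 for genres with no female players.
    return {genre: women[genre] / totals[genre] for genre in sorted(totals)}


shares = female_share_by_genre(players)
for genre, share in shares.items():
    print(f"{genre}: {share:.0%} female")
```

Because the source data records gender as a binary field, any share computed this way inherits that limitation.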
Admittedly, our own approach to this data is not neutral. We selected datasets that emphasize problematic portrayals of women, which may reflect a confirmation bias shaped by our commitment to feminist critique. Our interpretation of “empowerment” or “sexualization” is influenced by our cultural background and assumptions. As D’Ignazio and Klein argue in Data Feminism, “the numbers don’t speak for themselves”; we speak for them. This reminder urges us to remain self-aware, especially when working with imperfect datasets or making interpretive judgments.
Despite their flaws, these datasets collectively allow us to approach gender representation in video games from multiple angles: quantitative, qualitative, and contextual. By combining character-level analysis with player data and trope examination, we develop a more holistic picture of the industry’s treatment of women. While none of these datasets is comprehensive, their limitations invite us to ask deeper questions and imagine better methods for future work, perhaps through community-based tagging, intersectional frameworks, or original data collection rooted in transparency and care.
Corona, Alice. “Women in Video Games.” GitHub, 2015, https://github.com/ali-ce/datasets/tree/master/Women-in-Videogames. Accessed 23 June 2025.
D’Ignazio, Catherine, and Lauren F. Klein. “The Numbers Don’t Speak for Themselves.” Data Feminism, MIT Press, 2020, https://data-feminism.mitpress.mit.edu/pub/czq9dfs5. Accessed 23 June 2025.
El Kharoua, Rabie. “Predict Online Gaming Behavior Dataset.” Kaggle, 2024, https://doi.org/10.34740/kaggle/dsv/8742674. Accessed 23 June 2025.
Lynch, Teresa, et al. “Empowered by the Experience: Playing as Female Characters in Video Games.” Media and Communication, vol. 13, 2025, https://doi.org/10.17645/mac.8733. Accessed 23 June 2025.
Palomar, Brisa. “Gender Representation in Video Games.” Kaggle, 2022, https://doi.org/10.34740/kaggle/dsv/4482342. Accessed 23 June 2025.