WNBA game attendance rates: WNBA game attendance rates provide a tangible measure of fan commitment that complements online indicators of interest. By examining attendance figures for each game over time, we can check whether spikes in media discussion translate into larger live audiences. Attendance data also allow us to control for factors such as team performance and high-profile matchups that naturally affect turnout. Finally, tracking attendance over the long term reveals whether digital narratives lead to sustained growth in fan support rather than fleeting moments of excitement.
Media topics and their sentiment over the years: Mapping the themes and sentiment of media conversation over the years reveals how the narrative around the WNBA has evolved and influenced support. Year-by-year analysis of topic clusters and sentiment scores makes it possible to identify shifts in discourse from pure game recaps to personal brand building or advocacy efforts. By measuring the relative impact of different types of posts, such as highlight clips or social justice messages, we can see which narratives align most closely with rises in viewership and attendance. Tracking these changes also shows how fan communities have matured and diversified as they engage in richer and more varied dialogues.
Search-Interest Trends (Google Trends): Search-interest trends from Google Trends offer an independent measure of public curiosity that can be compared to media and attendance data. Tracking search volumes for terms such as WNBA player names and team brands helps distinguish passive information seeking from active fan engagement. It also allows us to identify interest cycles around events such as drafts or major games that often set the stage for media bursts. Finally, comparing peaks in search volume with shifts in online sentiment and real-world support lets us cross-validate the timing of digital conversations and their impact on broader fan behavior.
How have discussions on media outlets shaped national enthusiasm and support for the WNBA?
With our research question focused on the media's influence on the WNBA, we set out to find as many relevant datasets as possible. To assemble the evidence for our DH project, we began by defining the key concepts (“media discussions,” “national enthusiasm,” and “support for the WNBA”) and then mapped them to measurable indicators.
We ended up with several datasets that share two common themes: WNBA game attendance and media coverage. The attendance dataset was compiled by Across the Timeline, an independent data archive dedicated to women’s basketball. It includes attendance figures for every game in WNBA history, dating back to the league’s inaugural season in 1997. Each entry contains the date of the game, the teams involved, the location, and the reported number of attendees. Because it spans multiple decades, this dataset supports the analysis of long-term trends and fluctuations in fan engagement.
The dataset on WNBA media representation is our own: we scraped media headlines relating to the WNBA from 1995 to 2025 via the MediaCloud API. To understand how the WNBA is represented, we combined keyword-based tagging (looking specifically for controversies, milestones, player movement, sponsorships, and other non-specific headlines) with sentiment analysis, topical focus, and publication frequency.
We also drew on Google Trends, which publicly reports relative search interest for chosen keywords. Our Google Trends dataset captures weekly search-interest values for the term “WNBA” from January 2004 through June 2025. Each record consists of:
Date (week-ending): The calendar week corresponding to the observation.
Search Interest Index: A normalized value (0-100) reflecting relative query volume nationwide.
URL to Trend Snapshot: A direct link to the Trends chart for that week, enabling drill-down into related queries.
These datasets provide an essential quantitative foundation for our project on the recent rise in WNBA popularity. One of our central research questions is how changes in cultural attitudes, media coverage, and player visibility (especially in the media) have affected real-world fan behavior. Attendance data provides a concrete, longitudinal signal of fan interest that can help validate or challenge claims about surging popularity. For example, if we identify a substantial increase in attendance figures in 2023 and 2024, particularly around games featuring high-profile players like Caitlin Clark, Angel Reese, or Sabrina Ionescu, we can contextualize that trend alongside broader cultural or technological shifts. Attendance figures can also be disaggregated by team and location, allowing us to examine regional differences or team-specific fan engagement. These figures feed directly into the attendance bar graphs on the front page of the website.
The media-focused data similarly allows us to identify trends in sentiment over time, sentiment by topic, topic distributions, and media outlet frequencies, all of which are highlighted in our visualizations. Together with the Google Trends series, these data components allow us to match spikes and declines in public attention (as measured by search behavior) with specific media-coverage events or league milestones. By integrating the attendance archive, our scraped headline database, and the Google Trends series, we obtain a multi-dimensional view of how “media discussions” (volume, tone, topic) align with, and potentially drive, “national enthusiasm” (digital interest and in-person attendance) and broader “support for the WNBA” over time.
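One way to line up the attendance archive with the Google Trends series is to aggregate both to the same calendar unit and compare them. The sketch below is a minimal illustration under stated assumptions, not our actual pipeline: the `(date, value)` record layout, the helper names (`yearly_mean`, `pearson`, `align_by_year`), and the sample numbers are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def yearly_mean(records):
    """Average the value of (date, value) records per calendar year."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[day.year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def align_by_year(attendance, search_interest):
    """Return (years, mean attendance, mean search index) for years in both series."""
    att = yearly_mean(attendance)
    trend = yearly_mean(search_interest)
    years = sorted(set(att) & set(trend))
    return years, [att[y] for y in years], [trend[y] for y in years]


# Made-up illustrative values, NOT figures from our datasets.
attendance = [
    (date(2022, 6, 1), 5000), (date(2022, 7, 1), 6000),
    (date(2023, 6, 1), 7000), (date(2023, 7, 1), 7000),
    (date(2024, 6, 1), 10000), (date(2024, 7, 1), 12000),
]
search_interest = [
    (date(2022, 6, 5), 10), (date(2023, 6, 4), 20), (date(2024, 6, 2), 60),
]

years, att_means, trend_means = align_by_year(attendance, search_interest)
r = pearson(att_means, trend_means)
```

A high correlation in a comparison like this would only indicate that the two series move together; on its own it would not show that media attention or search interest drives attendance.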
The Google Trends series for the keyword "wnba" shows that search interest in the WNBA has increased significantly over the years, with an especially large surge from 2023 onward. With this in mind, and to ensure transparency and avoid bias in how we categorized WNBA media headlines, we developed a systematic tagging process based on manually curated keyword lists, automated sentiment analysis, and iterative refinement through exploratory data analysis (EDA).
Topic Categorization: We assigned each headline to a single primary topic based on its central subject, using a combination of keyword matching and contextual reading. The categories were defined prior to analysis and cover a broad range of WNBA-related themes:
activism_social_justice – Headlines about protests, advocacy, racial justice, LGBTQ+ rights
pay_equity – Mentions of salary, contracts, equal pay
recognition_milestones – Awards, records, career achievements
offcourt_drama – Controversies, suspensions, disputes, or public criticism
injury_report – Injury updates or medical news
game_result – Match outcomes, scores, and in-game performance
business_media_coverage – Broadcasting, visibility, sponsorships, media deals
intersectionality – Stories involving overlapping identities (race, gender, class)
gender_focus – Headlines emphasizing gendered framing or roles
other – General updates or neutral headlines that didn’t strongly fit any category
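The keyword-matching half of this step can be sketched as follows. This is a minimal illustration, not our exact tagger: the keyword lists are hypothetical stand-ins for our manually curated lists, only a subset of the categories is shown, and the first-match priority order is an assumption. The contextual reading we did by hand is not represented.

```python
import re

# Hypothetical keyword lists for illustration only; the project's curated
# lists were refined during EDA and are not reproduced here.
# Dictionary order doubles as the (assumed) priority order between topics.
TOPIC_KEYWORDS = {
    "activism_social_justice": ["protest", "racial justice", "lgbtq"],
    "pay_equity": ["salary", "equal pay", "contract"],
    "recognition_milestones": ["mvp", "record", "award"],
    "offcourt_drama": ["suspended", "suspension", "controversy"],
    "injury_report": ["injury", "injured", "acl"],
    "game_result": ["beat", "defeat", "win over"],
    "business_media_coverage": ["sponsorship", "broadcast", "media deal"],
}


def assign_topic(headline):
    """Return the first topic with a whole-word keyword match, else 'other'."""
    text = headline.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        for keyword in keywords:
            # \b anchors keep 'acl' from matching inside an unrelated word.
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return topic
    return "other"
```

For example, `assign_topic("Players push for equal pay in new CBA")` returns `"pay_equity"`, while a headline that matches no list falls through to `"other"`. Whole-word matching is a design choice that trades recall (plural or inflected forms are missed unless listed) for fewer false positives.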
To reduce misclassification, we reviewed edge cases manually and refined category definitions iteratively during data cleaning.
Sentiment Tagging: We applied two sentiment analysis tools:
VADER (Valence Aware Dictionary and Sentiment Reasoner): tailored for short text like headlines
TextBlob: offered additional polarity and subjectivity scores
Each headline was assigned a vader_compound score between -1 (very negative) and +1 (very positive). We then derived vader_sentiment_label from standard thresholds:
Positive if compound ≥ 0.05
Neutral if -0.05 < compound < 0.05
Negative if compound ≤ -0.05
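The threshold rule above can be written as a small function. This is a minimal sketch: the function name and the range check are our own illustrative choices, and the compound score itself is assumed to come from VADER (for example, `SentimentIntensityAnalyzer().polarity_scores(headline)["compound"]` in the `vaderSentiment` package), which is not called here so that the snippet stays dependency-free.

```python
def vader_sentiment_label(compound):
    """Map a VADER compound score in [-1, 1] to Positive / Neutral / Negative."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie between -1 and 1")
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    # Everything strictly between the two cutoffs counts as neutral.
    return "Neutral"
```

Note that the boundary values themselves (0.05 and -0.05) are labeled Positive and Negative, so only scores strictly between the cutoffs are Neutral.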
These thresholds are widely used in computational social science research and were not modified to influence results. To further mitigate potential bias, we avoided manually assigning sentiment and allowed the NLP tools to classify tone based on natural language features. We also compared sentiment across topics, rather than judging headlines in isolation, to identify broader framing trends.
Bias Reduction Measures:
We did not allow personal opinions to influence how headlines were tagged or interpreted.
We normalized media outlet names to avoid overrepresentation from duplicated sources.
“Other” was used sparingly and only when no category clearly applied - not as a way to filter out inconvenient data.
Our final visualizations include both positive and negative sentiment across all topics, helping to ensure a balanced view of WNBA media coverage.
By following these transparent and repeatable steps, we aimed to preserve the integrity of our analysis and allow the data to speak for itself. Below is a visualization that shows headline sentiment over time.
While these datasets are extremely valuable, they are not without limitations. One key limitation is the accuracy and consistency of attendance reporting. Reported attendance is often based not on ticket scans or actual turnstile counts but on tickets distributed, which can include unused comps or giveaways. As a result, it may inflate the appearance of fan engagement, particularly in earlier years when marketing strategies relied heavily on bulk ticket distribution. Additionally, the dataset includes no metadata explaining the methods of data collection or whether definitions of “attendance” changed over time. This lack of transparency can introduce measurement error, especially when comparing modern figures against those from the early 2000s.
Our own dataset has limitations as well. API rate limits prevented us from scraping every piece of relevant data, and duplicate headlines frequently appeared in our results. Additionally, the snippet fields were empty after our scraping process, which forced us to depend on headlines alone for our media assessment. Finally, media outlet representation was uneven and changed dramatically depending on both the time period and the topic, which can give certain parts of our data more weight than intended. To mitigate these issues, we scraped our data in large batches across multiple dates to capture as much coverage as possible. We manually verified the uniqueness of headlines and normalized outlet names so that the same item is not counted more than once. Finally, we refined our keyword lists through much trial and error, putting our dataset through EDA and cleaning it overall.
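The outlet normalization and duplicate check can be sketched as follows. This is a minimal illustration rather than our exact cleaning script: we verified uniqueness manually, so the automated first pass, the row layout (dictionaries with hypothetical `outlet` and `title` keys), and the specific normalization rules shown here are all assumptions.

```python
import re


def normalize_outlet(name):
    """Lowercase, collapse whitespace, and drop a leading 'www.' or 'the '."""
    cleaned = re.sub(r"\s+", " ", name.strip().lower())
    cleaned = re.sub(r"^www\.", "", cleaned)
    cleaned = re.sub(r"^the ", "", cleaned)
    return cleaned


def headline_key(title):
    """Case- and punctuation-insensitive key for spotting duplicate headlines."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(rows):
    """Keep the first row for each (normalized outlet, headline key) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (normalize_outlet(row["outlet"]), headline_key(row["title"]))
        if key not in seen:
            seen.add(key)
            # Store the normalized outlet name so later counts are consistent.
            unique.append({**row, "outlet": key[0]})
    return unique
```

Keying on the outlet as well as the headline keeps the same story from two different outlets as two rows, which matters when counting outlet frequencies; rows flagged as duplicates by a pass like this would still merit a manual check.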
Another challenge is that attendance alone cannot capture the full scope of popularity. With the rise of streaming services and social media, fan engagement increasingly occurs outside the physical arena. Someone who follows every game on YouTube TV, engages with team accounts on Instagram, or participates in Twitter discourse contributes meaningfully to the WNBA's cultural footprint but will not appear in this data. We must therefore be careful not to conflate in-person attendance with overall relevance or support. In our project, we pair the attendance dataset with scraped media data (specifically news and newspaper headlines) to provide a fuller picture of how the WNBA’s visibility and fan culture are evolving.
Despite these limitations, we believe that the attendance dataset can help illuminate which moments and players catalyze public interest, especially when compared against specific events such as the NCAA-to-WNBA transitions of players like Clark and Reese. It can also help identify long-term structural patterns, such as whether attendance peaks during Olympic years, or how league expansion and relocation affect fan bases. By displaying these changes through an interactive visualization, we hope not only to make the data more accessible but also to highlight under-appreciated moments of growth and stability in the league’s history.
We are also aware of our own position in interpreting this data. Since we are trying to show the changes in popularity of the WNBA, we may be especially attuned to moments of growth and less focused on sustained fan bases that have supported the WNBA over decades. It is important not to let the current media hype overshadow the contributions of past players and communities. Additionally, our decision to center attendance as a key metric, even with noted limitations, reflects a bias toward quantifiable, visible signals of legitimacy. While necessary for this project’s scope, we recognize that popularity and cultural significance are not always easily measured. Our approach is informed in part by Catherine D’Ignazio and Lauren Klein’s framework in Data Feminism, which encourages scholars to question not just what data shows, but what is missing and who benefits from its collection. As D’Ignazio and Klein argue, “Working with data without considering power can reinforce the very systems of inequality we seek to challenge” (D’Ignazio and Klein 12). By applying this lens, we are prompted to ask not only why attendance is increasing but also whose interests are being served by the renewed attention to the WNBA and whose stories still remain overlooked.
In summary, the Across the Timeline attendance dataset and our scraped headline dataset are valuable resources that give structure and continuity to our study of WNBA popularity. They provide measurable insight into patterns of engagement, while also requiring careful contextualization and supplementation. By situating these datasets within a broader interpretive framework and acknowledging both their affordances and constraints, we hope to create a more meaningful and representative narrative about how and why the WNBA is finally receiving nationwide attention.