The goal of this project is to create a set of interactive, dynamic data visualizations from The Simpsons transcripts on Kaggle, so users can analyze and explore the data. The web app consists of several D3-powered visualizations, including a bar chart, histogram, chord diagram, word cloud, and line chart.
The project was developed primarily in JavaScript, HTML, and CSS. To run it locally, clone the main branch of the GitHub repository and open index.html in a web browser. The Simpsons transcripts were obtained from Kaggle, and the visualizations were created with D3.js.
An interactive bar chart displays individual characters sorted by total lines and features a brush to scroll through the data. Hovering over a bar or label displays a tooltip with an image of the character and their info. Clicking a bar or label filters all the visualizations to show only that character's lines.
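The ranking behind the bar chart can be sketched as a simple count-and-sort over the transcript rows. This is a minimal sketch, not the app's actual code: the function name `countLinesByCharacter` is hypothetical, and it assumes each row exposes the `raw_character_text` field from the data set and that rows without a speaker should be skipped.

```javascript
// Count lines per character and sort descending by total.
// Assumption: each row has a raw_character_text field (character name).
function countLinesByCharacter(rows) {
  const counts = new Map();
  for (const row of rows) {
    const name = row.raw_character_text;
    if (!name) continue; // skip rows without a speaker
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Convert to an array of { character, lines } objects, largest first.
  return Array.from(counts, ([character, lines]) => ({ character, lines }))
    .sort((a, b) => b.lines - a.lines);
}

// Usage with a tiny hand-made sample (not real transcript data).
const sampleLines = [
  { raw_character_text: "Homer Simpson" },
  { raw_character_text: "Marge Simpson" },
  { raw_character_text: "Homer Simpson" },
  { raw_character_text: "" },
];
const rankedCharacters = countLinesByCharacter(sampleLines);
console.log(rankedCharacters);
```

The resulting array is already in display order, so it could be bound to the bars directly, with the brush selecting a window of it.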
Displays the most frequent locations for either the entire show or the current filter. Hover over a location to see its name and the number of times it appears. Click on a location to filter the page by that location.
A D3-powered histogram displays the total number of lines per episode, colored by season. It features a brush to select an area of the histogram and tooltips that show each episode's number, season, and title, along with a thumbnail image from the TVMaze API. Click on a bar to filter the page to the selected episode or season, depending on the value of the dropdown in the top right corner.
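One way to prepare the histogram's input is to collapse the rows into one record per episode that carries the season (for coloring) and the title (for the tooltip). The sketch below assumes the `episode_id`, `season`, and `episode_title` fields have already been parsed, with numeric ids; `countLinesByEpisode` is a hypothetical name.

```javascript
// Build one record per episode: { episode_id, season, title, lines }.
// Assumption: episode_id and season are numbers, episode_title is a string.
function countLinesByEpisode(rows) {
  const byEpisode = new Map();
  for (const row of rows) {
    let entry = byEpisode.get(row.episode_id);
    if (!entry) {
      entry = {
        episode_id: row.episode_id,
        season: row.season,
        title: row.episode_title,
        lines: 0,
      };
      byEpisode.set(row.episode_id, entry);
    }
    entry.lines += 1;
  }
  // Sort by episode id so the bars appear in broadcast order.
  return Array.from(byEpisode.values())
    .sort((a, b) => a.episode_id - b.episode_id);
}

// Usage with made-up rows.
const episodeSample = [
  { episode_id: 2, season: 1, episode_title: "Episode B" },
  { episode_id: 1, season: 1, episode_title: "Episode A" },
  { episode_id: 2, season: 1, episode_title: "Episode B" },
];
const episodeBars = countLinesByEpisode(episodeSample);
console.log(episodeBars);
```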
The chord diagram displays the relationships and interactions between characters. Hover over a chord to see the characters involved and the number of times they spoke to each other. The tooltip also includes a picture of each character from the Simpsons Wiki API. Click on a chord to filter the page to the selected characters.
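A chord layout such as `d3.chord()` takes a square matrix of flows between groups. The sketch below shows one way such a matrix might be built. It assumes that `prev_speaker` holds the name of the character who spoke the preceding line and that it can be matched against `raw_character_text`; the source does not state how the app derives interactions, so treat this as an illustration only. `buildInteractionMatrix` is a hypothetical name.

```javascript
// Build a square matrix where matrix[i][j] counts how often names[j]
// spoke directly after names[i].
// Assumption: prev_speaker contains a character name comparable to
// raw_character_text.
function buildInteractionMatrix(rows, names) {
  const index = new Map(names.map((name, i) => [name, i]));
  const matrix = names.map(() => new Array(names.length).fill(0));
  for (const row of rows) {
    const from = index.get(row.prev_speaker);
    const to = index.get(row.raw_character_text);
    // Ignore speakers outside the chosen set and self-replies.
    if (from === undefined || to === undefined || from === to) continue;
    matrix[from][to] += 1;
  }
  return matrix;
}

// Usage with made-up rows.
const chordNames = ["Homer Simpson", "Marge Simpson"];
const chordRows = [
  { prev_speaker: "Homer Simpson", raw_character_text: "Marge Simpson" },
  { prev_speaker: "Marge Simpson", raw_character_text: "Homer Simpson" },
  { prev_speaker: "Homer Simpson", raw_character_text: "Marge Simpson" },
  { prev_speaker: "Bart Simpson", raw_character_text: "Homer Simpson" },
];
const chordMatrix = buildInteractionMatrix(chordRows, chordNames);
console.log(chordMatrix);
```

Restricting `names` to the most frequent speakers keeps the diagram readable; the total shown for a pair would be the sum of both directions.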
The word cloud displays the most frequently used words for either the entire show or the current filter. Hover over a word to see the word and the number of times it is used, or click on a word to update the line chart and show the number of times the word is used in each episode.
The line chart displays the number of times a word is used in each episode of the show. Hover over the line to see each episode's number, season, title, and the number of times the word is used. Use the input box to enter a specific word or phrase.
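The word counts feeding the cloud can be sketched as a frequency table over `normalized_text`. This is a rough sketch under assumptions: that `normalized_text` is lowercase with punctuation removed, and that very common words are dropped. The short `STOP_WORDS` list and the name `topWords` are hypothetical; the app's actual word list is not given in the source.

```javascript
// Hypothetical stop-word list; the real app may use a different one or none.
const STOP_WORDS = new Set(["the", "a", "and", "to", "of", "i", "you"]);

// Return the `limit` most frequent words as { text, count } objects.
function topWords(rows, limit = 50) {
  const counts = new Map();
  for (const row of rows) {
    const words = (row.normalized_text || "").split(/\s+/).filter(Boolean);
    for (const word of words) {
      if (STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return Array.from(counts, ([text, count]) => ({ text, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

// Usage with made-up rows.
const cloudRows = [
  { normalized_text: "i like the donut" },
  { normalized_text: "donut time" },
  { normalized_text: "like donut" },
];
const cloudWords = topWords(cloudRows, 2);
console.log(cloudWords);
```

Passing the currently filtered rows instead of the full data set gives the per-filter cloud described above.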
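Counting a word or phrase per episode might look like the following. The sketch matches whole words by comparing token sequences, so "ham" does not match "hamster". It assumes `normalized_text` is lowercase and punctuation-free and that `episode_id` is numeric; `countPhrase` and `phraseCountsByEpisode` are hypothetical names.

```javascript
// Count whole-word occurrences of a phrase in one line of text.
function countPhrase(text, phrase) {
  const words = text.split(/\s+/).filter(Boolean);
  const target = phrase.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (target.length === 0) return 0;
  let count = 0;
  // Slide a window of target.length words across the line.
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((word, j) => words[i + j] === word)) count += 1;
  }
  return count;
}

// Sum the occurrences per episode, including episodes with zero hits.
function phraseCountsByEpisode(rows, phrase) {
  const totals = new Map();
  for (const row of rows) {
    const hits = countPhrase(row.normalized_text || "", phrase);
    totals.set(row.episode_id, (totals.get(row.episode_id) || 0) + hits);
  }
  return Array.from(totals, ([episode_id, count]) => ({ episode_id, count }))
    .sort((a, b) => a.episode_id - b.episode_id);
}

// Usage with made-up rows.
const lineRows = [
  { episode_id: 1, normalized_text: "doh doh" },
  { episode_id: 1, normalized_text: "hello" },
  { episode_id: 2, normalized_text: "no doh here" },
  { episode_id: 3, normalized_text: "nothing" },
];
const dohSeries = phraseCountsByEpisode(lineRows, "Doh ");
console.log(dohSeries);
```

Keeping zero-count episodes in the series lets the line drop to the axis instead of skipping episodes.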
To use this web application, visit the latest deployment or clone the GitHub repo and open the index.html file in a web browser. The data should be automatically loaded and the data visualizations should appear on the screen.
All of the visualizations support filtering. In the character bar chart, click on one or more bars to display only the selected characters. Similarly, select a bar in the histogram to filter the data by the selected episode or season, depending on the value in the dropdown box. Clicking a chord in the chord diagram will apply a filter on those two characters. Finally, selecting a word in the word cloud will update the line chart to show the use of that word over time.
After applying a filter or selection, click "Reset Charts" to reset the data and refresh all visualizations.
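The shared filtering behaviour can be sketched as a small filter object that every chart reads from. This is an illustrative design, not the app's confirmed implementation: `createFilter`, `toggleCharacter`, `selectBar`, and `applyFilter` are hypothetical names, and the `"episode"`/`"season"` mode strings stand in for whatever values the dropdown actually uses.

```javascript
// An empty filter matches every row; "Reset Charts" would return to this state.
function createFilter() {
  return { characters: new Set(), episodeId: null, season: null };
}

// Add or remove a character, supporting multi-select in the bar chart.
function toggleCharacter(filter, name) {
  const characters = new Set(filter.characters);
  if (characters.has(name)) characters.delete(name);
  else characters.add(name);
  return { ...filter, characters };
}

// Filter by episode or season depending on the dropdown mode (assumed values).
function selectBar(filter, bar, mode) {
  return mode === "season"
    ? { ...filter, season: bar.season, episodeId: null }
    : { ...filter, episodeId: bar.episode_id, season: null };
}

// Keep only the rows that satisfy every active part of the filter.
function applyFilter(rows, filter) {
  return rows.filter((row) =>
    (filter.characters.size === 0 || filter.characters.has(row.raw_character_text)) &&
    (filter.episodeId === null || row.episode_id === filter.episodeId) &&
    (filter.season === null || row.season === filter.season));
}

// Usage with made-up rows.
const allRows = [
  { raw_character_text: "Homer Simpson", episode_id: 1, season: 1 },
  { raw_character_text: "Marge Simpson", episode_id: 1, season: 1 },
  { raw_character_text: "Homer Simpson", episode_id: 30, season: 2 },
  { raw_character_text: "Bart Simpson", episode_id: 30, season: 2 },
];
let activeFilter = toggleCharacter(createFilter(), "Homer Simpson");
const homerRows = applyFilter(allRows, activeFilter);
activeFilter = selectBar(activeFilter, { episode_id: 30, season: 2 }, "season");
const homerSeasonTwo = applyFilter(allRows, activeFilter);
const afterReset = applyFilter(allRows, createFilter());
console.log(homerRows.length, homerSeasonTwo.length, afterReset.length);
```

Returning a new filter object on each change, rather than mutating one in place, makes it straightforward to redraw every chart from the same filtered rows.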
This web app uses The Simpsons transcripts obtained from Kaggle. The data includes a unique identifier for each line of dialogue (id), the episode identifier (episode_id), episode number (number), and title (episode_title). Other information includes the raw text of the line (raw_text), timestamp in milliseconds (timestamp_in_ms), and whether it's a speaking line (speaking_line). Character and location information includes the character's identifier (character_id), the raw text of the character's name (raw_character_text), and the location identifier (location_id).
Additional data includes the raw text of the location (raw_location_text), spoken words (spoken_words), normalized text (normalized_text), and word count (word_count). The season (season), prior speaker (prev_speaker), next speaker (next_speaker), and episode of the season (episode_of_season) are also included.
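CSV loaders typically deliver every field as a string, so the numeric and boolean columns listed above usually need converting before use. The sketch below shows one possible row parser; `parseRow` is a hypothetical name, and the assumption that `speaking_line` is stored as the text "true" or "false" should be checked against the actual file. A function like this could be passed as the row accessor to `d3.csv`.

```javascript
// Convert one raw CSV row (all strings) into typed values.
// Assumption: speaking_line is stored as the text "true" / "false".
function parseRow(raw) {
  // Empty or missing cells become null instead of 0 or NaN.
  const toNumber = (value) =>
    value === "" || value == null ? null : Number(value);
  return {
    ...raw, // keep text columns such as raw_text and spoken_words as-is
    id: toNumber(raw.id),
    episode_id: toNumber(raw.episode_id),
    number: toNumber(raw.number),
    timestamp_in_ms: toNumber(raw.timestamp_in_ms),
    speaking_line: String(raw.speaking_line).toLowerCase() === "true",
    character_id: toNumber(raw.character_id),
    location_id: toNumber(raw.location_id),
    word_count: toNumber(raw.word_count),
    season: toNumber(raw.season),
    episode_of_season: toNumber(raw.episode_of_season),
  };
}

// Usage with a made-up row.
const parsedRow = parseRow({
  id: "10",
  episode_id: "1",
  number: "9",
  timestamp_in_ms: "8000",
  speaking_line: "true",
  character_id: "2",
  location_id: "",
  raw_character_text: "Homer Simpson",
  word_count: "3",
  season: "1",
  episode_of_season: "1",
});
console.log(parsedRow);
```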