'Deep Space Nine' is set on a space station orbiting the planet Bajor. This time, Commander Benjamin Sisko is in charge of a diverse crew, and there is no USS Enterprise to help them. Sisko and the crew must fight off rival alien species who want control of Deep Space Nine because of its strategic position close to a wormhole, which allows speedy travel to the far reaches of space. In all their crises we play a very important part: we look at all their conversations and analyse them for class while they're off fighting aliens.
FUN!
We got the data from Kaggle at this link: https://www.kaggle.com/datasets/gjbroughton/start-trek-scripts?resource=download. The Kaggle dataset was scraped from this website: http://www.chakoteya.net/StarTrek/index.html.
This graph shows the number of views of this particular dataset. It's not the most popular, but text datasets are very difficult to analyse anyway.
D3.js (also known as D3, short for Data-Driven Documents) is a JavaScript library for producing dynamic, interactive data visualizations in web browsers. It makes use of Scalable Vector Graphics (SVG), HTML5, and Cascading Style Sheets (CSS) standards.
NumPy and Pandas (Python) were also used to preprocess the data.
Preprocessing:
The data originally covered every Star Trek series. We only wanted to look at Deep Space Nine, so we went through and removed all of the JSON data that was not associated with DS9.
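The filtering step can be sketched in a few lines of Python. This is only an illustration: it assumes the JSON is a dictionary keyed by series name (with "DS9" as one key), and the helper name keep_series and the tiny sample are made up for the example rather than taken from our code.

```python
import json


def keep_series(all_series: dict, series_key: str = "DS9") -> dict:
    """Return a new dict that contains only the requested series."""
    return {
        name: episodes
        for name, episodes in all_series.items()
        if name == series_key
    }


# Made-up sample standing in for the real JSON file (assumed layout:
# series -> episode -> character -> list of lines).
sample = {
    "TNG": {"episode 0": {"PICARD": ["Engage."]}},
    "DS9": {"episode 0": {"SISKO": ["Report."], "KIRA": ["Shields up."]}},
}

ds9_only = keep_series(sample)
print(json.dumps(ds9_only, indent=2))
```

With the real file, json.load would replace the sample dictionary, and json.dump would write the reduced data back out.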
Another preprocessing step was changing Chief O'Brien's name to OBrien, because the apostrophe was causing issues in our code.
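A minimal sketch of the rename, assuming character names are dictionary keys in the per-character data (the key spelling "O'BRIEN" and the helper name strip_apostrophes are assumptions for illustration). Only the names are changed; apostrophes inside the dialogue are left alone.

```python
def strip_apostrophes(lines_by_character: dict) -> dict:
    """Remove apostrophes from character names, leaving their lines untouched."""
    return {
        name.replace("'", ""): lines
        for name, lines in lines_by_character.items()
    }


# Made-up sample episode, not real script data.
episode = {
    "O'BRIEN": ["I'll have it fixed in an hour."],
    "BASHIR": ["Chief?"],
}

cleaned = strip_apostrophes(episode)
print(sorted(cleaned))
```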
The data had two JSON files. One JSON file contained the entire script. The other JSON file had each character with all of their lines for each episode.
Depending on the visualization, we processed either the all-lines data or the lines-per-character data. Most of the visualizations could use the lines-per-character data because they do not look at interactions. The exception is the scene breakdown visualizations.
Data view:
We used four different types of visualizations.
We used chord diagrams, bar charts, line charts and word clouds.
We used bar charts the most. We chose bar charts to display things like the number of words per season and who is speaking them.
There are drop-downs to interact with most of the data. There are drop-downs for the most important characters, and also for a specific season and the episodes in that season. Each view updates according to the drop-down in its container.
This is a chord diagram, which is used to show the relationships between the top 10 characters of the show. The data for this was prepared in Python.
Steps:
Breaking the data up scene by scene
Finding all the characters in each scene
Adding the number of interactions to a matrix
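The steps above can be sketched roughly as follows. The sketch assumes the scenes have already been split out as lists of (speaker, line) pairs, and it counts one interaction per scene for every pair of tracked characters who both speak in it. That counting rule, the function name interaction_matrix and the sample scenes are assumptions for illustration. Plain nested lists are used so the example has no dependencies.

```python
from itertools import combinations


def interaction_matrix(scenes, characters):
    """Build a symmetric matrix of how many scenes each pair of characters shares."""
    index = {name: i for i, name in enumerate(characters)}
    n = len(characters)
    matrix = [[0] * n for _ in range(n)]
    for scene in scenes:
        # Unique tracked characters who speak in this scene.
        present = sorted({speaker for speaker, _ in scene if speaker in index})
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


# Made-up scenes, not real script data.
characters = ["SISKO", "KIRA", "ODO"]
scenes = [
    [("SISKO", "Report."), ("KIRA", "All quiet."), ("SISKO", "Good.")],
    [("KIRA", "Constable."), ("ODO", "Major.")],
    [("SISKO", "Major."), ("KIRA", "Commander."), ("ODO", "Hmph.")],
]

matrix = interaction_matrix(scenes, characters)
for name, row in zip(characters, matrix):
    print(name, row)
```

A square matrix like this, with rows and columns in the same character order, is the input shape a D3 chord layout expects.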
This is data that does not need to be tracked over time. We chose to use the line charts for things that could be tracked over time, like the gayness of Garak. We used the chord diagram to compare which characters were speaking to each other in which scenes. The word cloud was a fun way to visualize the words each character spoke most often.
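The word counts behind a word cloud can be sketched with the standard library. The tokenizer, the small stop-word list and the sample lines below are all assumptions made up for this example; they are not the exact rules or data from our project.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "to", "of", "and", "i", "you", "it", "is"}


def top_words(lines, n=10):
    """Return the n most frequent non-stop-words across a character's lines."""
    words = []
    for line in lines:
        words.extend(re.findall(r"[a-z']+", line.lower()))
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Invented placeholder lines, not quotes from the scripts.
sample_lines = [
    "My dear doctor, it is only a story.",
    "A story, doctor, nothing more.",
    "Doctor, you wound me.",
]

top_two = top_words(sample_lines, n=2)
print(top_two)
```

Each (word, count) pair can then drive the size of a word in the cloud.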
Using the website:
The page looks empty when you first load it.
In the top-left block the user can select the season and episode they would like information about.
In this example we selected S5E7 and found that Bashir spoke the most in that episode.
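Finding the top speaker for a selected episode can be sketched as below. The example assumes that "spoke the most" means the highest word count (line count would be another reasonable choice) and that the episode is a dictionary from character name to a list of lines. The sample data is made up and is not the real S5E7 script.

```python
def words_per_character(episode):
    """Count the words spoken by each character in one episode."""
    return {
        name: sum(len(line.split()) for line in lines)
        for name, lines in episode.items()
    }


def top_speaker(episode):
    """Return the character with the highest word count."""
    totals = words_per_character(episode)
    return max(totals, key=totals.get)


# Made-up sample episode.
episode = {
    "BASHIR": ["I need more time.", "Hand me that scanner."],
    "OBRIEN": ["Right away."],
    "SISKO": ["Report."],
}

totals = words_per_character(episode)
print(totals, top_speaker(episode))
```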
The word cloud on the right is also showing the same information.
The charts at the bottom are also updated; they show information about
This is a static visualization which was a bonus assignment.
This is a video tutorial of our website