CS 424 Data and Visualization
Home
Project 3
Visualization
Data
Files
Interesting Discovery
Team
2/20/2020
Project 1
(*NEW*)Project 1 Resources
Data
GitHub
Interesting Finds
Application Description
Xrite Test
First ShinyApp Link
Sources
Viewers Choice
Data
IMDb Web-page
Website: ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/
Files Used in Visualization
release-dates.list
running-times.list
certificates.list
genres.list
keywords.list
movies.list
ratings.list
How the Data Was Cleaned
We started by downloading the data files from the FTP site listed above.
Each data set began with a paragraph explaining the data.
We removed that paragraph from the top of every data set and replaced it with a heading row made up of the numbers 1 through 50.
With this heading in place, we could easily separate the movie names from the rest of the data.
We did this for every file listed above.
After this we used grepl (a base R function) together with the tidyverse to comb through the data and text mine it for specific values.
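The cleaning steps above can be sketched in R roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the cleaned files are tab-separated, uses a made-up three-row sample in place of a real .list file, shortens the heading to 5 columns instead of 50, and assumes that TV series titles are wrapped in double quotes. The variable names (sample_lines, raw, films, years) are hypothetical.

```r
# Hypothetical stand-in for a cleaned .list file: the explanatory paragraph
# has been removed and the first line is the numeric heading.
# (5 columns here for brevity; the page describes a heading of 1 through 50.)
sample_lines <- c(
  paste(1:5, collapse = "\t"),
  "Movie A (1999)\t\tUSA:120",
  "Movie B (2005)\tGermany:95",
  "\"Series C\" (2010)\t\t\tUK:45"
)
path <- tempfile(fileext = ".list")
writeLines(sample_lines, path)

# The numeric heading fixes the number of columns; fill = TRUE pads rows
# that have fewer fields. quote = "" keeps literal double quotes in titles,
# and check.names = FALSE keeps the column names as "1", "2", and so on.
raw <- read.delim(path, header = TRUE, sep = "\t", fill = TRUE,
                  quote = "", stringsAsFactors = FALSE, check.names = FALSE)

# Column "1" holds the title, so it can be separated from the rest of the row.
titles <- raw[["1"]]

# grepl returns TRUE/FALSE for each title; here it flags quoted titles
# (assumed to be TV series) so they can be dropped.
is_series <- grepl('^"', titles)
films <- raw[!is_series, ]

# Pull the four-digit release year out of the title text.
years <- as.integer(sub(".*\\(([0-9]{4})\\).*", "\\1", films[["1"]]))

print(films[["1"]])
print(years)
```

The same pattern (read with the numeric heading, then filter rows with grepl on a pattern) would apply to each of the files listed above, with the pattern changed to match the value being extracted.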