Data
IMDb Web-page
Website: ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/
- Files Used in Visualization
  - release-dates.list
  - running-times.list
  - certificates.list
  - genres.list
  - keywords.list
  - movies.list
  - ratings.list
How the Data Was Cleaned
- We started by downloading the data files from the FTP site listed above
- Each data set began with a paragraph explaining its contents
- We removed the paragraph at the top of every data set and replaced it with a header row of column names numbered 1 through 50
- The numbered header let us easily separate the movie names from the rest of the data
- We applied this step to every data set
- After this, we used grepl (a base R function) together with tidyverse tools to search the text and extract specific values
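
The cleaning steps above can be sketched in R roughly as follows. This is a minimal illustration, not the exact script we ran: the sample lines, the preamble length, the tab-separated layout, and all variable names are assumptions, and base R's read.table stands in for whichever reader was actually used. The sample uses 5 numbered columns instead of 50 to keep it short.

```r
# Hypothetical stand-in for one .list file: a descriptive preamble followed
# by tab-separated records (the exact layout is an assumption).
raw_lines <- c(
  "GENRES LIST",
  "This paragraph describes the file.",
  "===================================",
  "Alpha (2001)\t\tDrama",
  "Beta (1999)\tComedy",
  "Gamma (2010)\t\t\tDrama"
)

# Drop the explanatory paragraph at the top (assumed to be 3 lines here).
n_preamble <- 3
records <- raw_lines[-seq_len(n_preamble)]

# Replace it with a header row of numbered column names.
n_cols <- 5
header <- paste(seq_len(n_cols), collapse = "\t")

# fill = TRUE pads short rows, so every record fits the numbered header.
movies <- read.table(
  text = c(header, records),
  sep = "\t", header = TRUE, fill = TRUE,
  quote = "", comment.char = "",
  stringsAsFactors = FALSE, check.names = FALSE
)

# Column "1" now holds the movie names, separated from the rest of the data.
titles <- movies[["1"]]

# grepl returns TRUE/FALSE per element, which we use to pick out rows.
is_drama <- grepl("Drama", records, fixed = TRUE)
drama_titles <- titles[is_drama]

# A regular expression can pull out titles by release decade.
titles_1990s <- titles[grepl("\\(199[0-9]\\)", titles)]
```

Reading the first column by its numbered name keeps the title apart from the variable number of tab-padded fields that follow it, which is why the fixed header is useful even though most of the numbered columns stay empty.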