Data

IMDb Web-page

FTP site: ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/

  • Files Used in Visualization
    • release-dates.list
    • running-times.list
    • certificates.list
    • genres.list
    • keywords.list
    • movies.list
    • ratings.list


How the Data Was Cleaned

  1. We downloaded the data files from the site listed above
  2. Each data set began with a paragraph explaining its contents
  3. We removed the paragraph at the top of every data set and replaced it with a header row consisting of the numbers 1 through 50, separated by spaces
  4. With the header in place, we could easily separate the movie names from the rest of the data
  5. We repeated this for every data set
  6. After this, we used grepl (a base R function) alongside the tidyverse to text-mine the data and retrieve specific values
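
The cleaning steps above can be illustrated with a minimal sketch. It is written in Python rather than the R workflow described here, so it is an analogue and not the code we ran. The sample text, the "==========" marker line, the tab-separated layout, and all function names are assumptions made for illustration; the real .list files may be laid out differently. The sketch parses records directly, so it skips the numeric header row from step 3.

```python
import re

# Hypothetical sample mimicking a .list file: a free-text preamble,
# then records with a title, one or more tabs, and a value.
# This layout is an assumption, not a description of the real files.
SAMPLE = """\
This paragraph describes the data set.
It can span several lines.
==========
Example Movie (1999)\t\t120
Another Film (2004)\tUSA:95
Short Piece (2010)\t\t\t12
"""


def strip_preamble(lines, marker="=========="):
    """Return the lines that follow the first line starting with `marker`."""
    for i, line in enumerate(lines):
        if line.startswith(marker):
            return lines[i + 1:]
    return lines  # no marker found: assume there is no preamble


def split_record(line):
    """Split a record into (title, rest) at the first run of tabs."""
    parts = re.split(r"\t+", line.strip(), maxsplit=1)
    if len(parts) != 2:
        return None  # blank or malformed line
    return parts[0], parts[1]


def extract_number(value):
    """Pull the trailing integer out of a value such as '120' or 'USA:95'."""
    match = re.search(r"(\d+)\s*$", value)
    return int(match.group(1)) if match else None


def parse(text):
    """Drop the preamble and return a list of (title, number) pairs."""
    records = []
    for line in strip_preamble(text.splitlines()):
        record = split_record(line)
        if record is None:
            continue
        title, rest = record
        number = extract_number(rest)
        if number is not None:
            records.append((title, number))
    return records


def titles_matching(records, pattern):
    """Keep records whose title matches `pattern` (similar in spirit to grepl)."""
    return [r for r in records if re.search(pattern, r[0])]


if __name__ == "__main__":
    for title, number in parse(SAMPLE):
        print(title, number)
```

Calling parse on a file's contents yields one (title, value) pair per record, and titles_matching then filters those pairs with a regular expression, which is roughly the role grepl played in step 6.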