Assignment V: Advanced Data Handling

The dataset provided for this assignment contains 6000 movie titles from 1995 to 2018 (with year and genres) and a total of around 100,000 reviews (with ratings and epoch timestamps).

Using all of the provided data (and all the skills you've learned so far in this course), build 10 DataFrames with the following specifications. Every DataFrame must be sorted in descending order.

1. Movies with their average ratings.

2. Movies with the number of ratings each gets.

3. Movies with the length of their titles.
- First, omit the movie's translated title written in parentheses ().
- Also omit the quotation marks "" if there are any.
- Spaces and dashes (-) count toward the title length.

4. User Ids with the number of ratings they have.

5. User Ids with the number of ratings given in each genre category.

6. User Ids with the average ratings they give.

7. User Ids with the longest period of giving ratings (max - min timestamp for each userId).

8. Years with the number of movies in each year.

9. Years with the movies that have the highest average rating in that year. (Include both the movies and their average ratings.)

10. The correlation table for movie genres. (Round values to 3 decimal places.)
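
A few of the specifications above can be sketched with pandas as follows. This is only a minimal illustration on made-up data: the column names (movieId, title, genres, userId, rating, timestamp), the "|" separator in genres, and the helper clean_title are assumptions, not taken from the assignment files. The sketch also assumes the release year sits in parentheses at the end of the title and removes it together with the translation.

```python
import re

import pandas as pd

# Hypothetical toy data; the real files and column names may differ.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": [
        "City of Lost Children, The (Cité des enfants perdus, La) (1995)",
        '"Great Performances" Cats (1998)',
        "Toy Story (1995)",
    ],
    "genres": ["Adventure|Drama", "Musical", "Adventure|Comedy"],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 3, 3, 2, 3],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
    "timestamp": [964982703, 964981247, 1000000000, 1000086400, 1100000000],
})


def clean_title(title: str) -> str:
    """Drop every parenthesised group (assumed: translation and year) and quotes."""
    without_parens = re.sub(r"\s*\([^)]*\)", "", title)
    return without_parens.replace('"', "").strip()


movies["clean_title"] = movies["title"].map(clean_title)

# No. 3: title length, spaces and dashes included, longest first.
title_length = (
    movies.assign(title_length=movies["clean_title"].str.len())
    [["clean_title", "title_length"]]
    .sort_values("title_length", ascending=False)
    .reset_index(drop=True)
)

# No. 1: average rating per movie, highest first.
rated = ratings.merge(movies, on="movieId")
avg_rating = (
    rated.groupby("clean_title")["rating"].mean()
    .sort_values(ascending=False)
    .rename("avg_rating")
    .reset_index()
)

# No. 7: rating period per user in seconds (max - min epoch timestamp).
rating_period = (
    ratings.groupby("userId")["timestamp"]
    .agg(lambda s: s.max() - s.min())
    .sort_values(ascending=False)
    .rename("period_seconds")
    .reset_index()
)

# No. 10: one 0/1 column per genre, then pairwise correlation to 3 decimals.
genre_flags = movies["genres"].str.get_dummies(sep="|")
genre_corr = genre_flags.corr().round(3)
```

The same groupby pattern (count instead of mean, or grouping by userId or year) should cover most of the remaining items.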

At the end, do the following:

Convert all of the results (with appropriate titles) to CSV files and save them programmatically (not manually) in the top10 folder.
Print the top 10 rows of each DataFrame, with an appropriate heading line for each (such as "Top 10 of Movies from the ratings"). Take care with no. 10: since it is a matrix, print only the top 10 pairs of movie genres with their correlations (Hint: use unstack).
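
One possible way to approach these two final steps is sketched below. The small genre_corr matrix, the results dictionary, and the file name genre_correlation.csv are made up for illustration; in the assignment they would be replaced by the ten real DataFrames and whatever titles you choose. Keeping only pairs whose first genre sorts before the second is one assumed way of dropping self-correlations and mirrored duplicates.

```python
from pathlib import Path

import pandas as pd

# Hypothetical correlation matrix standing in for DataFrame no. 10.
genre_corr = pd.DataFrame(
    [[1.0, 0.5, -0.2], [0.5, 1.0, 0.1], [-0.2, 0.1, 1.0]],
    index=["Action", "Comedy", "Drama"],
    columns=["Action", "Comedy", "Drama"],
)

# Map an output name to each result; only one entry is shown here.
results = {"genre_correlation": genre_corr}

# Create the folder from code so nothing is placed there manually.
out_dir = Path("top10")
out_dir.mkdir(exist_ok=True)
for name, frame in results.items():
    frame.to_csv(out_dir / f"{name}.csv")

# unstack turns the matrix into a Series indexed by (genre, genre) pairs.
pairs = genre_corr.unstack()

# Keep each unordered pair once and drop the diagonal.
first = pairs.index.get_level_values(0)
second = pairs.index.get_level_values(1)
pairs = pairs[first < second]

top_pairs = pairs.sort_values(ascending=False).head(10)

print("Top 10 of Movie Genre Pairs by correlation")
print(top_pairs)
```

For the other nine DataFrames, calling head(10) on the already sorted result and printing it under its own heading should be enough.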
