Assignment I: Data Handling

Using pandas, read the movie.csv file and save it in a variable called movie.
Save the column 'budget' into a variable, then print the average budget of all the movies in the following sentence: The average budget of all the movies is {budget} USD.
Using len and drop_duplicates(), count how many different directors are in our dataset and print the result as follows: The number of different directors in the dataset is {num_diff_directors}.
Our movie dataset has some missing values. In the aspect_ratio column, fill the missing values with 2.35, then replace the original column in the variable movie with the filled column.
Remove the rows in movie where the value in budget, imdb_score, or title_year is missing.
Remove the rows in movie where title_year is less than 1990.
Make a new dataset called director_budget. This new dataset consists of the following columns: director_name, movie_title, budget, imdb_score, and title_year.
In director_budget, make a new column called director_expense. This new column contains the total budget for each director. (Movies with the same director all have the same director_expense value.) Hint: you can use groupby, transform, and np.sum.
In director_budget, group the rows by director using the groupby method, then print the descriptive statistics of the groups using describe().
Save the director_budget dataset as dir_budget.csv.
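
The steps above can be sketched on a small example. This is only an illustration under stated assumptions, not a reference solution: the toy DataFrame is hypothetical and stands in for the result of pd.read_csv("movie.csv"), the names avg_budget and out_path are chosen for the example, and the string "sum" is used in transform where the hint suggests np.sum (both give the per-director total).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy data; in the assignment this would be
# movie = pd.read_csv("movie.csv")
movie = pd.DataFrame({
    "director_name": ["Ann", "Ann", "Bob", "Cy", "Bob"],
    "movie_title": ["m1", "m2", "m3", "m4", "m5"],
    "budget": [100.0, 300.0, 50.0, np.nan, 20.0],
    "imdb_score": [7.0, 8.0, 6.5, 5.0, 6.0],
    "title_year": [1995, 2001, 1985, 2010, 2005],
    "aspect_ratio": [1.85, np.nan, 2.35, np.nan, 1.85],
})

# Average budget; mean() skips missing values by default.
budget = movie["budget"]
avg_budget = budget.mean()
print(f"The average budget of all the movies is {avg_budget} USD.")

# Number of distinct directors. Note: a missing name, if present,
# would be kept by drop_duplicates() and counted as one entry.
num_diff_directors = len(movie["director_name"].drop_duplicates())
print(f"The number of different directors in the dataset is {num_diff_directors}.")

# Fill missing aspect ratios and write the filled column back.
movie["aspect_ratio"] = movie["aspect_ratio"].fillna(2.35)

# Drop rows with a missing budget, imdb_score, or title_year.
movie = movie.dropna(subset=["budget", "imdb_score", "title_year"])

# Keep only movies from 1990 onwards.
movie = movie[movie["title_year"] >= 1990]

# New dataset with the requested columns; copy() avoids modifying a view.
columns = ["director_name", "movie_title", "budget", "imdb_score", "title_year"]
director_budget = movie[columns].copy()

# Total budget per director, repeated on every row of that director.
director_budget["director_expense"] = (
    director_budget.groupby("director_name")["budget"].transform("sum")
)

# Descriptive statistics per director.
print(director_budget.groupby("director_name").describe())

# Written to a temporary directory here to keep the sketch free of side
# effects; for the assignment, pass "dir_budget.csv" directly.
out_path = os.path.join(tempfile.mkdtemp(), "dir_budget.csv")
director_budget.to_csv(out_path, index=False)
```

Because transform returns a result aligned with the original rows, the per-director total can be assigned straight to a new column, which is why the hint points to it rather than to a plain aggregation.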
