Assignment III: Data Analysis

In this assignment, we will use the dir_budget.csv dataset you made in Assignment I. Copy and paste its contents into the empty dir_budget.csv.

1. Print out the mean, median, lower and upper quartiles, skewness, and kurtosis for both IMDB score and budget.
2. Floor the IMDB score of every movie, save the result in a variable, then count the number of movies for each floored IMDB score and print it out.
3. Divide the budget into 10 bins, then count the number of movies in each bin and print it out.
4. Make a contingency table for the floored IMDB score and the binned budget you made in no. 2 and no. 3. Normalize the columns, then multiply by 100 so the values are easier to read as percentages.
5. Calculate the variance for the IMDB score groups 6 and 7, then perform a t-test on them. (Uncomment the commented-out code.)
6. Perform a one-way ANOVA on the budget of movies with an IMDB score of 5, 6, 7, and 8, then print out the F-value and p-value. (Uncomment the commented-out code.)
7. Find the covariance and correlation of the original dir_budget.csv data, excluding the director_budget column.
8. Find the correlation of IMDB score against budget using both the Pearson and Spearman methods.
9. Perform a chi-square test on the contingency table you made in no. 4, then print out the chi-square statistic and p-value.
10. Scale the original IMDB score and budget, then apply the following 8 transformations: linear (x), reciprocal (1/x), square root (x^(1/2)), cube root (x^(1/3)), square (x^2), and logarithm with base 10, base e, and base 2. Find out which one gives the highest Pearson's r correlation.
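
As a starting point, the first four tasks above (summary statistics, flooring, binning, and the contingency table) might be sketched with pandas roughly as follows. This is only an illustration under assumptions: the column names `imdb_score` and `budget`, the helper `summarize`, and the tiny made-up sample are all hypothetical, so substitute the actual columns of your dir_budget.csv.

```python
import numpy as np
import pandas as pd

# Made-up sample so the sketch runs on its own. In the assignment you would
# load your own file instead, e.g. df = pd.read_csv("dir_budget.csv").
# The column names "imdb_score" and "budget" are assumptions.
df = pd.DataFrame({
    "imdb_score": [5.2, 5.9, 6.1, 6.8, 7.0, 7.4, 7.9, 8.3],
    "budget": [10, 20, 30, 40, 50, 60, 80, 100],
})


def summarize(s):
    """Return the summary statistics asked for in the first task (hypothetical helper)."""
    return {
        "mean": s.mean(),
        "median": s.median(),
        "lower quartile": s.quantile(0.25),
        "upper quartile": s.quantile(0.75),
        "skewness": s.skew(),
        "kurtosis": s.kurt(),
    }


for column in ["imdb_score", "budget"]:
    print(column, summarize(df[column]))

# Floor each score, then count the movies per floored score.
score_floor = np.floor(df["imdb_score"]).astype(int)
score_counts = score_floor.value_counts().sort_index()
print(score_counts)

# Split the budget into 10 equal-width bins, then count the movies per bin.
budget_bins = pd.cut(df["budget"], bins=10)
bin_counts = budget_bins.value_counts().sort_index()
print(bin_counts)

# Contingency table of floored score vs. budget bin, with each column
# normalized to sum to 1 and then scaled to percentages.
table = pd.crosstab(score_floor, budget_bins, normalize="columns") * 100
print(table.round(1))
```

Here `pd.cut` with `bins=10` produces equal-width bins; if equal-count bins are wanted instead, `pd.qcut` is the usual alternative. Budget bins that contain no movies do not appear as columns in the table, so every column that does appear sums to 100.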
