Just as the relationship between variables is graphically representable, it is also measurable by a statistical estimate. When working with numeric variables, the estimate is a correlation. When you work with categorical variables, the estimate is an association, and the chi-square statistic is the most frequently used tool for measuring association between features.
Covariance indicates whether two variables tend to move together relative to their means: it is positive when both tend to be above (or below) their means at the same time, and negative when one tends to be above its mean while the other is below.
The scale of the variables you observe influences covariance, so you should use a different, standardized measure. The solution is to use correlation, which is the covariance computed after standardizing the variables.
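The effect of scale can be checked with a small sketch. The data below is synthetic, and the helper names `covariance` and `correlation` are illustrative assumptions rather than functions from any library; the point is only that rescaling one variable changes the covariance but leaves the correlation untouched.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

def covariance(a, b):
    """Population covariance: mean product of deviations from the means."""
    return np.mean((a - a.mean()) * (b - b.mean()))

def correlation(a, b):
    """Covariance of the standardized (z-scored) variables."""
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return covariance(za, zb)

# Rescale x by 100 (for example, metres to centimetres).
cov_original = covariance(x, y)
cov_rescaled = covariance(x * 100, y)
corr_original = correlation(x, y)
corr_rescaled = correlation(x * 100, y)

print(cov_original, cov_rescaled)    # covariance grows by a factor of 100
print(corr_original, corr_rescaled)  # correlation is unchanged
```

With a pandas DataFrame, the same quantities are available through `DataFrame.cov()` and `DataFrame.corr()`.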
Correlation works well when your variables are numeric and their relationship is strictly linear. Sometimes, however, a feature is ordinal, or the relationship is nonlinear because the data are not normally distributed. In such cases, you can use a nonparametric correlation, such as the Spearman rank-order correlation.
A Spearman correlation transforms your numeric values into rankings and then correlates the rankings, thus minimizing the influence of any nonlinear (but still monotonic) relationship between the two variables.
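As a minimal sketch of the ranking idea, the example below uses a made-up, perfectly monotonic but nonlinear relationship (`y = exp(x)`, an assumption chosen only for illustration). It compares Pearson's correlation on the raw values, Spearman's correlation from SciPy, and Pearson's correlation computed by hand on the ranks.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Illustrative data: y always increases with x, but not linearly.
x = np.linspace(1, 10, 50)
y = np.exp(x)

# Pearson on the raw values is pulled below 1 by the curvature.
pearson_r, _ = pearsonr(x, y)

# Spearman only looks at the ordering of the values.
spearman_rho, _ = spearmanr(x, y)

# The same number obtained by ranking first and then applying Pearson.
rank_r, _ = pearsonr(rankdata(x), rankdata(y))

print(pearson_r, spearman_rho, rank_r)
```

In pandas, `Series.corr(other, method='spearman')` performs the same ranking step internally.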
You can apply another nonparametric test for relationship when working with cross-tables. This test is applicable to both categorical and numeric data (after it has been discretized into bins).
The chi-square statistic tells you whether the table distribution of two variables is statistically comparable to a table in which the two variables are hypothesized as not related to each other (the so-called independence hypothesis).
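The following sketch walks through that procedure on synthetic data: a numeric variable is discretized into bins, cross-tabulated against a categorical variable, and tested for independence. The column names `price` and `rooms` and the generated values are assumptions for illustration only; the quantile edges passed to `pd.qcut` produce five bins of 20% each.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic data (illustrative): price is deliberately related to rooms.
rng = np.random.default_rng(42)
rooms = rng.integers(3, 7, size=300)               # 3 to 6 rooms
price = rooms * 50 + rng.normal(0, 20, size=300)   # numeric feature
df = pd.DataFrame({"rooms": rooms, "price": price})

# Discretize the numeric variable into five quantile bins (20% each).
df["price_bin"] = pd.qcut(df["price"], [0, 0.2, 0.4, 0.6, 0.8, 1.0])

# Cross-table of observed counts: price bin vs. number of rooms.
table = pd.crosstab(df["price_bin"], df["rooms"])

# Chi-square test of independence on the observed counts.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.1f}, p={p_value:.3g}, dof={dof}")
```

A small p-value suggests that the observed table differs from what independence would produce, so the two variables are likely associated; a large p-value means the data give no evidence against independence.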
What Influences the Taxes?
Using homes.csv, try to find out the following:
1. The correlation between all variables in homes.
2. The Spearman and Pearson correlations between Sell and Taxes.
3. Bin the sell, list, acres, and taxes columns, concatenate the results, and save them in the variable homes_binned. Use 20% intervals for binning.
4. Using homes_binned, make a contingency table for each of the following: Sell vs. Number of Rooms, List vs. Number of Rooms, Area vs. Number of Rooms, and Taxes vs. Number of Rooms.
5. Run the chi-square test on each contingency table from step 4. Check whether the number of rooms can be used effectively to distinguish between groups of homes.