Up to now, we have explored variables one at a time. If you followed along with the examples, you have built a univariate description of the data. The data holds more information than that, because the variables also vary together. To use it, the next step is a bivariate exploration, which looks at how pairs of variables relate to each other.
Visualizations make it easier to see the statistical characteristics of the variables and the relationships between them.
Boxplots provide a way to represent distributions and their extreme values. Let us inspect a boxplot of MLB players' heights, grouped by each player's position.
We use the .boxplot() method here.
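As a minimal sketch of such a grouped boxplot, the snippet below assumes that `.boxplot()` refers to the pandas `DataFrame.boxplot` method. The MLB table itself is not reproduced here, so the `players` frame, its `Position` and `Height` column names, and all values are synthetic stand-ins chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the MLB table: names and numbers are assumptions.
rng = np.random.default_rng(0)
players = pd.DataFrame({
    "Position": np.repeat(["Catcher", "Pitcher", "Shortstop"], 30),
    "Height": np.concatenate([
        rng.normal(72, 2, 30),
        rng.normal(74, 2, 30),
        rng.normal(71, 2, 30),
    ]),
})

fig, ax = plt.subplots()
# One box per position: median line, quartile box, whiskers, outlier points.
players.boxplot(column="Height", by="Position", ax=ax)
ax.set_ylabel("Height (inches)")
fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." super-title
```

With the real data, you would replace `players` with the loaded table and use its actual column names.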
After you have spotted a possible group difference in a variable, a t-test or a one-way Analysis of Variance (ANOVA) can tell you whether the difference between the groups' means is statistically significant.
The t-test compares two groups at a time, and it requires that you specify whether the groups have similar variances or not.
You interpret the p-value as the probability of observing a t statistic at least as extreme as the calculated one if the groups' means were actually equal. Usually, when it is below 0.05, you reject the hypothesis of equal means and conclude that the groups' means are significantly different.
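The two-group comparison might look like the sketch below, which assumes SciPy's `scipy.stats.ttest_ind` is the t-test in use. The two samples are synthetic, and the variance-ratio cutoff of 2 is only an informal threshold picked for illustration, not a rule taken from the text.

```python
import numpy as np
from scipy import stats

# Synthetic samples standing in for one variable measured in two groups.
rng = np.random.default_rng(1)
group_a = rng.normal(70, 2, 40)
group_b = rng.normal(76, 3, 40)

# Sample variances (ddof=1) help decide which form of the test to run.
var_a = np.var(group_a, ddof=1)
var_b = np.var(group_b, ddof=1)
print(f"variance A = {var_a:.2f}, variance B = {var_b:.2f}")

# Illustrative cutoff (an assumption): treat variances as similar if their ratio is below 2.
similar_variance = max(var_a, var_b) / min(var_a, var_b) < 2

# equal_var=True runs Student's t-test; equal_var=False runs Welch's t-test.
result = stats.ttest_ind(group_a, group_b, equal_var=similar_variance)

alpha = 0.05
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.3g}: the group means differ significantly")
else:
    print(f"p = {result.pvalue:.3g}: no significant difference detected")
```

When in doubt about the variances, passing `equal_var=False` is the more cautious choice, since Welch's test does not assume they are equal.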
You can check more than two groups at once using the one-way ANOVA test. In this case, the p-value is interpreted in the same way as for the t-test: a small value indicates that at least one group mean differs from the others.
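For more than two groups, a one-way ANOVA can be sketched as follows, assuming SciPy's `scipy.stats.f_oneway`. The three samples are synthetic placeholders for one variable split by a grouping column.

```python
import numpy as np
from scipy import stats

# Synthetic samples standing in for one variable measured in three groups.
rng = np.random.default_rng(2)
group_1 = rng.normal(70, 2, 40)
group_2 = rng.normal(72, 2, 40)
group_3 = rng.normal(76, 2, 40)

# f_oneway takes one array per group and returns the F statistic and its p-value.
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)

alpha = 0.05
if p_value < alpha:
    print(f"F = {f_stat:.2f}, p = {p_value:.3g}: at least one group mean differs")
else:
    print(f"F = {f_stat:.2f}, p = {p_value:.3g}: no significant difference detected")
```

A significant result does not say which groups differ; pairwise t-tests or a post-hoc procedure would be needed for that.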
Make two boxplots: 1. For 'Sell' vs number of bathrooms. 2. For 'Acre' vs number of bathrooms.
Calculate the variance of 'Sell' price for houses with 1 and 2 bathrooms. Then, perform the t-test and print out the conclusion of the test.
Perform one-way ANOVA test on 'Acre' for homes with 1, 2, and 3 bathrooms. Then, print out the conclusion of the test.