The first thing you can do with the data is produce some summary measures that help you figure out what is going on in it. You learn measures such as the maximum and minimum values, and you determine which intervals are the best place to start.
Mean and median are the first measures to calculate for numeric variables. They give you an estimate of where each variable is centered, and they are most informative when the distribution is roughly symmetric.
Using pandas, you can quickly compute both means and medians.
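As a minimal sketch of this step, the snippet below computes both measures for every numeric column of a DataFrame. The toy data and the column names height and weight are assumptions made for illustration; they do not come from any dataset used in this text.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 165.0, 170.0, 205.0],
    "weight": [50.0, 60.0, 65.0, 70.0, 80.0],
})

means = df.mean()      # arithmetic mean of each numeric column
medians = df.median()  # middle value of each numeric column

print("Means:")
print(means)
print("Medians:")
print(medians)
```

In this toy data the single large height value pulls the mean of that column above its median, while the symmetric weight column has the same mean and median.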
As a next step, you should check the variance by using its square root, the standard deviation. The standard deviation is as informative as the variance, but it is expressed in the same units as the variable, and it is a good indicator of whether the mean is a suitable summary of the variable's distribution.
The higher the variance, the farther from the mean you can expect some values to appear.
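To illustrate, here is a small sketch with two hypothetical columns, tight and wide, that share the same mean but differ in spread. Note that pandas computes the sample variance and standard deviation (ddof=1) by default.

```python
import pandas as pd

# Two made-up columns with the same mean (10) but very different spread.
df = pd.DataFrame({
    "tight": [9.0, 10.0, 10.0, 10.0, 11.0],
    "wide": [0.0, 5.0, 10.0, 15.0, 20.0],
})

variance = df.var()  # sample variance (ddof=1 is the pandas default)
std = df.std()       # standard deviation, the square root of the variance

print(df.mean())
print(variance)
print(std)
```

Both columns have a mean of 10, but the mean describes the tight column far better than the wide one, which is exactly what the larger standard deviation signals.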
In addition, you should check the range, which is the difference between the maximum and minimum values of each quantitative variable.
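The range can be obtained by subtracting the column-wise minimum from the column-wise maximum. The sketch below again uses a hypothetical toy DataFrame; the column names are assumptions.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 165.0, 170.0, 205.0],
    "weight": [50.0, 60.0, 65.0, 70.0, 80.0],
})

# Range of each column: maximum minus minimum.
value_range = df.max() - df.min()
print(value_range)
```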
Because the median is the value in the central position of your distribution of values, you may need to consider other notable positions. Apart from the minimum and maximum, the position at 25% of your values (the lower quartile) and the position at 75% (the upper quartile) are useful for determining the data distribution, and they are the basis of an illustrative graph called a boxplot.
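The notable positions described above can be read off with the quantile method. This is a minimal sketch on a made-up Series; pandas interpolates linearly between observations by default, so other tools may report slightly different quartiles on the same data.

```python
import pandas as pd

# Made-up values chosen so the quartiles fall exactly on observations.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

# Minimum, lower quartile, median, upper quartile, maximum.
quartiles = values.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(quartiles)

# Interquartile range: the width of the box in a boxplot.
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
print("IQR:", iqr)
```

If matplotlib is installed, calling the boxplot method on a DataFrame draws the corresponding graph from these same five positions.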
Skewness is a measure of the asymmetry of a distribution. A symmetrical dataset has a skewness equal to 0, so a normal distribution has a skewness of 0. Skewness essentially measures the relative size of the two tails: positive values indicate a longer right tail, and negative values indicate a longer left tail.
Kurtosis is a measure of the combined sizes of the two tails. It measures the amount of probability in the tails. The value is often compared to the kurtosis of the normal distribution, which is equal to 3. If the kurtosis is greater than 3, then the dataset has heavier tails than a normal distribution (more in the tails). If the kurtosis is less than 3, then the dataset has lighter tails than a normal distribution (less in the tails).
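Both measures are available directly on pandas objects, as sketched below on two made-up Series. One caveat when comparing against the threshold of 3: the pandas kurt method reports bias-corrected excess kurtosis, for which a normal distribution scores 0, so its output corresponds roughly to the kurtosis described above minus 3.

```python
import pandas as pd

# Made-up samples: one symmetric, one with a long right tail.
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = pd.Series([1.0, 1.0, 1.0, 2.0, 10.0])

print("skewness (symmetric):   ", symmetric.skew())
print("skewness (right-skewed):", right_skewed.skew())

# kurt() returns EXCESS kurtosis: 0 for a normal distribution, not 3.
excess_kurtosis = symmetric.kurt()
print("excess kurtosis (symmetric):", excess_kurtosis)
```

The evenly spaced symmetric sample has negative excess kurtosis, meaning lighter tails than a normal distribution.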
When performing the skewness and kurtosis tests, you check whether the p-value is less than or equal to 0.05. If so, you reject normality.
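One way to run the skewness test is with scipy.stats.skewtest, which tests whether the sample skewness differs from that of a normal distribution. The sketch below is an illustration on synthetic samples, not necessarily the procedure the original listing used; the seed and sample sizes are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import skewtest

# Synthetic samples (assumed for illustration): one normal, one exponential.
rng = np.random.default_rng(0)
symmetric = rng.normal(size=500)
skewed = rng.exponential(size=500)

for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    result = skewtest(sample)
    # Apply the 0.05 decision rule described in the text.
    verdict = "reject normality" if result.pvalue <= 0.05 else "cannot reject normality"
    print(f"{name}: statistic={result.statistic:.3f}, "
          f"p-value={result.pvalue:.4f} -> {verdict}")

skewed_pvalue = skewtest(skewed).pvalue
```

The exponential sample is strongly right-skewed, so its p-value falls far below 0.05.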
You can perform another test for kurtosis, as shown in the following code:
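The original listing is not reproduced here. As a stand-in, the following minimal sketch uses scipy.stats.kurtosistest, which tests whether the sample kurtosis matches that of a normal distribution. The uniform sample, seed, and sample size are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import kurtosistest

# Synthetic sample (assumed for illustration): a uniform distribution
# has lighter tails than a normal distribution.
rng = np.random.default_rng(0)
light_tailed = rng.uniform(size=1000)

result = kurtosistest(light_tailed)
# Apply the same 0.05 decision rule as for the skewness test.
verdict = "reject normality" if result.pvalue <= 0.05 else "cannot reject normality"
print(f"statistic={result.statistic:.3f}, p-value={result.pvalue:.4g} -> {verdict}")
```

A negative test statistic points to lighter tails than normal, and a positive one points to heavier tails.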
Using homes.csv, compute and print the following:
Means of Sell & List Price
Medians of Sell & List Price
Range of Sell & List Price
10th and 90th percentiles of Sell & List Price