Hypothesis 1:
H0: There is no difference in the mean value of the feature energy between pop songs and non-pop songs.
· T-test:
The t-test is used to test for a difference between the means of two samples drawn independently from normal distributions. The t statistic measures how far apart the two sample means are relative to the variability within the samples, and a p-value is calculated from it. If the p-value is lower than the threshold, we can reject the null hypothesis. Here we set the threshold to 0.05.
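For reference, one common form of the statistic is Welch's version, which does not assume equal variances in the two groups. The report does not state which variant was used, so this choice and the symbols below are assumptions made for illustration: \bar{x} is a sample mean, s^2 a sample variance, and n a sample size, with subscripts for the pop and non-pop groups.

```latex
t = \frac{\bar{x}_{\mathrm{pop}} - \bar{x}_{\mathrm{non}}}
         {\sqrt{\dfrac{s_{\mathrm{pop}}^{2}}{n_{\mathrm{pop}}} + \dfrac{s_{\mathrm{non}}^{2}}{n_{\mathrm{non}}}}}
```

A large absolute value of t means the difference in means is large compared with the sampling noise, which corresponds to a small p-value.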
Motivation:
In the real world, there is no connection between pop songs and non-pop songs, so the two groups can be treated as independent samples. A reasonable assumption is that the energy values of pop songs and of non-pop songs each come from a normal distribution. To compare the mean energy of two independent, normally distributed samples, the t-test is a common choice.
Experiment:
1. Select the records whose Parentcat is pop as the pop group.
2. The remaining records form the non-pop group.
3. Apply a t-test to the energy values of the two groups.
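The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code used in the experiment: the toy records, the label value "pop", the helper name energy_t_test, and the choice of Welch's variant (equal_var=False) are all assumptions. Only the field names Parentcat and energy and the 0.05 threshold come from the text.

```python
from scipy import stats


def energy_t_test(records, pop_label="pop", alpha=0.05):
    """Split records into pop / non-pop and t-test their energy values.

    `records` is a list of dicts with "Parentcat" and "energy" keys.
    `pop_label` is an assumed label value; adjust it to the real data.
    """
    # Step 1: records whose Parentcat is pop form the pop group.
    pop_energy = [r["energy"] for r in records if r["Parentcat"] == pop_label]
    # Step 2: all remaining records form the non-pop group.
    non_pop_energy = [r["energy"] for r in records if r["Parentcat"] != pop_label]
    # Step 3: two-sample t-test on energy (Welch's variant, an assumption).
    result = stats.ttest_ind(pop_energy, non_pop_energy, equal_var=False)
    reject_h0 = bool(result.pvalue < alpha)
    return float(result.statistic), float(result.pvalue), reject_h0


# Hypothetical toy data, only to show the call; not the real dataset.
toy_records = [
    {"Parentcat": "pop", "energy": 0.80},
    {"Parentcat": "pop", "energy": 0.85},
    {"Parentcat": "pop", "energy": 0.90},
    {"Parentcat": "pop", "energy": 0.82},
    {"Parentcat": "rock", "energy": 0.30},
    {"Parentcat": "jazz", "energy": 0.35},
    {"Parentcat": "folk", "energy": 0.40},
    {"Parentcat": "rock", "energy": 0.32},
]

t_stat, p_value, reject_h0 = energy_t_test(toy_records)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}, reject H0: {reject_h0}")
```

Setting equal_var=True instead would give the classic Student's t-test, which additionally assumes that both groups have the same variance.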
Result: The statistical test result is
Since the p-value is smaller than 0.05, we can reject the null hypothesis. There is a significant difference between the mean energy values of pop songs and non-pop songs. We can infer that songs of a given category have their own characteristics, which makes us more confident in conducting further analysis such as classification.