Independence test using Chi-square test

Post date: May 15, 2014 3:22:06 PM

To test whether two variables (cnt and label) are independent, the chi-square test is a simple choice. In this example, cnt takes the values (cnt_0, cnt_1) and label takes the values (label1, label2).

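When computing Weight of Evidence (WoE), two consecutive intervals, label1(cnt_0, cnt_1) and label2(cnt_0, cnt_1), can be merged if cnt and label are independent, that is, if the chi-square test does not reject independence at the chosen significance level (0.05 here).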
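The merge rule above can be sketched as a small helper. This is only an illustration: the function name `can_merge`, its arguments, and the default `alpha = 0.05` are assumptions, not part of the original post. It uses base R's `chisq.test` on the 2x2 table formed by the two intervals.

```r
# Hypothetical helper (not from the original post): decide whether two
# adjacent intervals may be merged, based on a chi-square independence test.
# Each argument is a count vector c(cnt_0, cnt_1) for one interval.
can_merge <- function(interval1, interval2, alpha = 0.05) {
  tbl <- rbind(interval1, interval2)  # 2x2 contingency table
  # chisq.test warns when expected counts are small; silence that here
  p <- suppressWarnings(chisq.test(tbl)$p.value)
  # Merge only when independence is not rejected at level alpha
  p > alpha
}

can_merge(c(10, 5), c(40, 20))   # same cnt_0:cnt_1 ratio in both intervals
can_merge(c(10, 90), c(90, 10))  # very different ratios
```

With these inputs, the first call should allow the merge and the second should not. Note that a p-value above the threshold means independence was not rejected, which is weaker than proving the two intervals behave identically.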

## Experiment on chi-square test

# Example 1: both labels have the same cnt0:cnt1 ratio (2:1)
cnt0 <- c(10,40)
cnt1 <- c(5,20)
df <- data.frame(cnt0, cnt1)
row.names(df) <- c('label1','label2')
chi.test <- chisq.test(df)
cat(sprintf('p-value = %.2f',chi.test$p.value))
# p-value = 1.00, which is greater than 0.05 --> independence of label and cnt is not rejected
# --> label1 and label2 can be treated as the same

# Example 2: slightly different ratios
cnt0 <- c(10,40)
cnt1 <- c(6,20)
df <- data.frame(cnt0, cnt1)
row.names(df) <- c('label1','label2')
chi.test <- chisq.test(df)
cat(sprintf('p-value = %.2f',chi.test$p.value))
# p-value = 0.99, which is greater than 0.05 --> independence of label and cnt is not rejected
# --> label1 and label2 can be treated as the same

# Example 3: different ratios, but label1 has very few observations
cnt0 <- c(10,1000)
cnt1 <- c(5,1000)
df <- data.frame(cnt0, cnt1)
row.names(df) <- c('label1','label2')
chi.test <- chisq.test(df)
cat(sprintf('p-value = %.2f',chi.test$p.value))
# p-value = 0.30, which is greater than 0.05 --> independence of label and cnt is not rejected
# --> label1 and label2 can be treated as the same
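
To see where these p-values come from, the statistic for the last table (label1 = (10, 5), label2 = (1000, 1000)) can be recomputed by hand. This is a minimal sketch: it assumes the default behaviour of `chisq.test` on a 2x2 table, which applies Yates' continuity correction, and the variable names `obs`, `expected`, `stat`, and `p` are chosen here for illustration.

```r
# Observed counts: rows are labels, columns are cnt0 / cnt1
obs <- matrix(c(10, 1000, 5, 1000), nrow = 2,
              dimnames = list(c('label1', 'label2'), c('cnt0', 'cnt1')))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Chi-square statistic with Yates' continuity correction (subtract 0.5
# from each |observed - expected|; valid here because every gap exceeds 0.5)
stat <- sum((abs(obs - expected) - 0.5)^2 / expected)

# A 2x2 table has (2-1)*(2-1) = 1 degree of freedom
p <- pchisq(stat, df = 1, lower.tail = FALSE)
cat(sprintf('X-squared = %.3f, p-value = %.2f\n', stat, p))
```

The value of `p` should agree with `chisq.test(obs)$p.value`. Because label1 has only 15 observations, even a 2:1 versus 1:1 ratio does not yield a small p-value.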