Exploring the Tacit Intention of Respondents Who Decline to State a Position

Scenario

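The purpose of the questionnaire is to uncover the intention a respondent is concealing at the moment of answering. The items must not point too directly at that purpose; otherwise respondents may answer to fit it, or deliberately against it, and the design loses its point. The items embody a theoretical model that explains why the questions are asked this way and what results the answers can yield. Put more directly, the questionnaire is used to test whether that theoretical model holds. Without a theoretical model, a questionnaire only collects a pile of personal information and contributes nothing to building or discovering knowledge. The real difficulty is that constructing a theoretical model involves many considerations and is not easy.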

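Consider a set of questionnaire data that records each respondent's stated position (yes/no) on a certain idea; many respondents declined to state one (NA). The theoretical model hypothesizes that the stated position is influenced by five other factors, one of which is age. Apart from age, some respondents also declined to disclose all of these factors.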

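Problem

Is there a method and procedure for inferring the hidden intention of those who decline to state a position, that is, for inferring a respondent's position (yes/no) from the influencing factors?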

Data

    • Records: 1077, of which Attitude = yes: 670, Attitude = no: 350, Attitude = NA: 57

    • Record counts per factor:

    • Summary statistics per factor:

                • RawData <- read.table("RawData.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
                  summary(RawData)

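    • Regression analysis of one factor: at the 5% significance level, Factor2 and Attitude are the only predictors with a significant effect on Factor4, although the model explains under 3% of the variance (R-squared 0.027).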

        • Call:
          lm(formula = Factor4 ~ Age + Factor1 + Factor2 + Factor3 + Attitude, data = SRC)

          Residuals:
              Min      1Q  Median      3Q     Max
          -6.1798 -1.1013  0.0496  1.8800  4.6377

          Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
          (Intercept)  4.448575   0.427161  10.414  < 2e-16 ***
          Age         -0.002476   0.006456  -0.384  0.70140
          Factor1     -0.027297   0.110072  -0.248  0.80421
          Factor2      0.164183   0.068692   2.390  0.01710 *
          Factor3      0.005206   0.089847   0.058  0.95381
          Attitude     1.148066   0.363276   3.160  0.00164 **
          ---
          Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          Residual standard error: 2.536 on 721 degrees of freedom
          Multiple R-squared:  0.02735, Adjusted R-squared:  0.02061
          F-statistic: 4.055 on 5 and 721 DF,  p-value: 0.001229
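As a quick sanity check on the regression output above, each t value is the estimate divided by its standard error, and a term is significant at the 5% level when its absolute t value exceeds the critical value. The following Python sketch recomputes this from the reported coefficients. The cutoff of 1.96 is an assumption (the normal approximation, which should be close for 721 degrees of freedom), and the variable names are chosen here for illustration.

```python
# Coefficients copied from the lm() output above: (estimate, standard error).
coefficients = {
    "(Intercept)": (4.448575, 0.427161),
    "Age": (-0.002476, 0.006456),
    "Factor1": (-0.027297, 0.110072),
    "Factor2": (0.164183, 0.068692),
    "Factor3": (0.005206, 0.089847),
    "Attitude": (1.148066, 0.363276),
}

# Assumption: with 721 degrees of freedom the two-sided 5% critical value
# is close to the normal value 1.96.
CRITICAL_T = 1.96

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in coefficients.items()}
significant = [name for name, t in t_values.items() if abs(t) > CRITICAL_T]

for name, t in t_values.items():
    print(f"{name:<12} t = {t:7.3f}")
print("Significant at 5%:", significant)
```

Apart from the intercept, only Factor2 and Attitude pass the cutoff, which matches the stars in the R output.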

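Data Processing Procedure

    • Split the data: based on the Attitude column of the raw data, write the {1=Y, 0=N} records and the {NA} records to separate files. The Python programs follow: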

                • # Data Normalization
                  # Filter out NA Records

                  # Source File
                  fileName = "RawData.csv"
                  # Normalized File
                  solidFileName = "Data_NA_None.csv"

                  isFirstLine = True
                  csvHeader = []
                  with open(fileName, 'r') as csvFile, open(solidFileName, 'w') as solidCsvFile:
                      for csvLine in csvFile:
                          if isFirstLine:
                              # Header row: copy the names of columns 1-6
                              isFirstLine = False
                              csvHeader = csvLine.strip().split(',')
                              solidCsvFile.write('"%s","%s","%s","%s","%s","%s"\n' % (csvHeader[1], csvHeader[2], csvHeader[3], csvHeader[4], csvHeader[5], csvHeader[6]))
                          else:
                              csvFields = csvLine.strip().split(',')
                              # Columns 1-5 are checked for NA; column 6 is written without a check
                              isNA = False
                              for i in range(1, 6):
                                  if csvFields[i] == 'NA':
                                      isNA = True
                                      break
                              if not isNA:
                                  # Recode Attitude: 1 -> Y, anything else -> N
                                  Attitude = 'N'
                                  if csvFields[1] == '1':
                                      Attitude = 'Y'
                                  solidCsvFile.write('"%s",%s,%s,%s,%s,%s\n' % (Attitude, csvFields[2], csvFields[3], csvFields[4], csvFields[5], csvFields[6]))

                • # Data Normalization
                  # Extract Attitude=NA Records
                  # Filter out Others=NA Records

                  # Source File
                  fileName = "RawData.csv"
                  # Normalized File
                  solidFileName = "Data_NA.csv"

                  isFirstLine = True
                  csvHeader = []
                  with open(fileName, 'r') as csvFile, open(solidFileName, 'w') as solidCsvFile:
                      for csvLine in csvFile:
                          if isFirstLine:
                              # Header row: copy the names of columns 1-6
                              isFirstLine = False
                              csvHeader = csvLine.strip().split(',')
                              solidCsvFile.write('"%s","%s","%s","%s","%s","%s"\n' % (csvHeader[1], csvHeader[2], csvHeader[3], csvHeader[4], csvHeader[5], csvHeader[6]))
                          else:
                              csvFields = csvLine.strip().split(',')
                              # Columns 2-5 are checked for NA; Attitude (column 1) is handled below
                              isNA = False
                              for i in range(2, 6):
                                  if csvFields[i] == 'NA':
                                      isNA = True
                                      break
                              # Write only the records whose Attitude is NA; keeping the write
                              # inside this branch avoids using an unset or stale Attitude value
                              if not isNA and csvFields[1] == 'NA':
                                  Attitude = 'NA'
                                  solidCsvFile.write('"%s",%s,%s,%s,%s,%s\n' % (Attitude, csvFields[2], csvFields[3], csvFields[4], csvFields[5], csvFields[6]))

    • Convert the data format: convert the CSV files to the Weka ARFF format.

        • @attribute Attitude {Y,N,NA}
          @attribute FactorA numeric
          @attribute FactorB numeric
          @attribute FactorC numeric
          @attribute FactorD numeric
          @attribute Age numeric
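The split and the ARFF conversion described above can also be sketched as one small in-memory pipeline using Python's csv module. This is only an illustration under stated assumptions: the function names split_by_attitude and to_arff, the relation name attitude_known, the ID column name, and the sample rows are all hypothetical, and the column layout (column 0 ignored, column 1 Attitude, columns 2-6 the remaining fields) is inferred from the scripts above.

```python
import csv
import io


def split_by_attitude(csv_text):
    """Split survey rows into labelled (Y/N) and unlabelled (NA) groups.

    Assumed layout: column 0 is ignored, column 1 is Attitude and
    columns 2-6 are the remaining fields. Rows with NA in any of
    columns 2-5 are dropped from both groups.
    """
    labelled, unlabelled = [], []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)[1:7]
    for fields in reader:
        if "NA" in fields[2:6]:
            continue  # a factor is missing, so the row is not used
        if fields[1] == "NA":
            unlabelled.append(["NA"] + fields[2:7])
        else:
            attitude = "Y" if fields[1] == "1" else "N"
            labelled.append([attitude] + fields[2:7])
    return header, labelled, unlabelled


def to_arff(relation, header, rows):
    """Render rows as ARFF text.

    Assumes the first column is the nominal class {Y,N,NA} and every
    other column is numeric, as in the attribute list above.
    """
    lines = ["@relation " + relation, ""]
    lines.append("@attribute %s {Y,N,NA}" % header[0])
    lines += ["@attribute %s numeric" % name for name in header[1:]]
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical sample rows, invented for illustration only.
sample = """ID,Attitude,Factor1,Factor2,Factor3,Factor4,Age
1,1,3,4,2,5,34
2,0,2,NA,1,4,51
3,NA,4,3,3,6,27
4,0,1,2,2,3,45
"""

header, labelled, unlabelled = split_by_attitude(sample)
arff = to_arff("attitude_known", header, labelled)
print(arff)
print(unlabelled)
```

Keeping both groups in memory avoids reading RawData.csv twice; for a file of about a thousand records either approach should behave the same.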

Classification Models (Random Tree)

    • Multilayer Perceptron

    • Naive Bayes

    • Logistic

    • K-nearest Neighbours
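The classifiers named here share one idea: learn from the respondents who did state a position, then predict a position for those who did not. The sketch below illustrates that idea with a hand-written k-nearest-neighbours vote in plain Python. It is not the Weka implementation; the function name knn_predict, the choice of k = 3, the unscaled Euclidean distance, and all sample rows are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for query by majority vote of its k nearest rows.

    train is a list of (label, features) pairs with numeric features.
    Distance is plain Euclidean distance on unscaled features (an
    assumption; real data would usually be standardised first).
    """
    nearest = sorted(train, key=lambda row: math.dist(row[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled rows: (Attitude, [FactorA, FactorB, FactorC, FactorD, Age]).
labelled = [
    ("Y", [3, 4, 2, 5, 34]),
    ("Y", [4, 5, 2, 6, 29]),
    ("Y", [3, 5, 3, 6, 31]),
    ("N", [1, 2, 2, 3, 58]),
    ("N", [2, 1, 1, 2, 61]),
    ("N", [1, 1, 2, 3, 55]),
]

# Hypothetical respondents whose Attitude is NA.
undeclared = [[4, 4, 3, 5, 30], [1, 2, 1, 2, 60]]

predictions = [knn_predict(labelled, features) for features in undeclared]
print(predictions)
```

Because Age has a much larger range than the other factors, it dominates the unscaled distance in this toy data; rescaling the features first would give each factor a comparable influence.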