Title: Predictive Analytics and Sensitivity Analysis of Association Rules
Abstract:
Modern organizations generate large amounts of unstructured textual transaction data on a daily basis. Examples include logbooks of systems and processes, technician reports of product failures, and social media blogs. Transactions typically include semantic descriptors that require specialized methods of analysis. Association rule (AR) analysis is a powerful semantic data analytic technique for extracting information from transaction databases. It was originally developed for shopping basket analysis, where the combinations of items in a shopping cart are evaluated. To generate ARs, the collection of frequent itemsets, where an itemset is a set of two or more items, is first detected. Then, as a second step, all possible ARs are generated from each itemset in this collection. The ARs are then ranked using measures of association such as support, confidence, and lift. These measures are labelled “measures of interest”. The R package “arules” provides more than a dozen such measures, including the relative linkage disequilibrium (RLD), which normalizes classical Euclidean distances of the itemset from a surface of independence. The talk will show how to conduct a latent clustering and sensitivity analysis of ARs by clustering documents and repeatedly splitting the data into training and validation sets. These methods provide expanded interpretability and generalizability of AR analysis, even beyond the currently popular Rapid Automatic Keyword Extraction (RAKE) algorithm in Natural Language Processing.
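The two-step procedure and the split-based sensitivity check described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the arules implementation or the method presented in the talk: the toy transactions are made up, itemsets are restricted to pairs for brevity, and the helper names (support, rule_measures, frequent_pairs, rules_from_itemsets, split_sensitivity) are hypothetical. The RLD measure is not reproduced here.

```python
import random
from itertools import combinations

# Toy transactions (made-up data, for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(lhs, rhs, transactions):
    """Support, confidence and lift of the rule lhs -> rhs."""
    s_lhs = support(lhs, transactions)
    s_rhs = support(rhs, transactions)
    s_both = support(set(lhs) | set(rhs), transactions)
    # Confidence and lift are undefined when a side never occurs.
    confidence = s_both / s_lhs if s_lhs > 0 else float("nan")
    lift = confidence / s_rhs if s_rhs > 0 else float("nan")
    return {"support": s_both, "confidence": confidence, "lift": lift}


def frequent_pairs(transactions, min_support):
    """Step 1 (simplified to pairs): itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    return [frozenset(pair) for pair in combinations(items, 2)
            if support(pair, transactions) >= min_support]


def rules_from_itemsets(itemsets):
    """Step 2: both single-item rules that each 2-item itemset can produce."""
    rules = []
    for itemset in itemsets:
        a, b = sorted(itemset)
        rules += [((a,), (b,)), ((b,), (a,))]
    return rules


def split_sensitivity(lhs, rhs, transactions,
                      n_splits=20, train_frac=0.7, seed=0):
    """Repeatedly split into training/validation sets; return lift pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        shuffled = transactions[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, valid = shuffled[:cut], shuffled[cut:]
        results.append((rule_measures(lhs, rhs, train)["lift"],
                        rule_measures(lhs, rhs, valid)["lift"]))
    return results
```

On the toy data, rule_measures(("bread",), ("milk",), TRANSACTIONS) gives support 0.5, confidence 2/3 and lift 8/9. The spread of the (training, validation) lift pairs returned by split_sensitivity gives a rough indication of how stable a rule's ranking is across splits; with very small samples some splits can yield undefined (NaN) values.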
References
Hahsler, M., Grün, B., and Hornik, K. (2005). arules – A computational environment for mining association rules and frequent item sets. Journal of Statistical Software, 14(15):1–25. ISSN 1548-7660. URL http://www.jstatsoft.org/v14/i15/.
Kenett, R. (1983). On an Exploratory Analysis of Contingency Tables. Journal of the Royal Statistical Society, series D, 32, pp. 395-403.
Kenett, R. S., and Salini, S. (2010). Measures of association applied to operational risks. In Operational Risk Management: A Practical Approach to Intelligent Data Analysis, pp. 149-167. John Wiley and Sons.
Kenett, R. S., Gotwalt, C., Freeman, L., and Deng, X. (2022). Self-supervised cross validation using data generation structure. Applied Stochastic Models in Business and Industry, 38(5), 750-765.
Kenett, R. S., Zacks, S., and Gedeck, P. (2022). Modern Statistics: A Computer-Based Approach with Python, 1st edn. Springer-Birkhäuser.
Kenett, R. S., and Gotwalt, C. (2024). The analysis of association rules: latent class analysis. Statistical Analysis and Data Mining, in press.
Vives-Mestres, M., Kenett, R. S., Thió-Henestrosa, S., and Martín-Fernández, J. A. (2022). Measurement, selection, and visualization of association rules: A compositional data perspective. Quality and Reliability Engineering International, 38(3), 1327-1339.