// wk3:

Critiquing Lin et al. (2002)

"Efficient Adaptive-Support Association Rule Mining for Recommender Systems" by Lin et al. (2002) provides a detailed explanation of an approach to association rule mining, the Adaptive-Support Association Rule Mining (ASARM) algorithm, that builds off of the Apriori and CBG-RD algorithms but is novel in its improved runtime. Previously existing association rule mining algorithms, while performing well for market basket analysis, performed poorly in the context of personalized recommendations. Lin et al. observed that unlike market basket analysis, which takes into account aggregate customer behavior and thus considers all users as target items to mine rules, recommender systems mine rules for one target item at a time. In other words, rule heads may be specified ahead of rule-mining time, which drastically decreases runtime. Furthermore, users looking for recommendations are unlikely to ask to see all suggestions above a given support threshold all at once; rather, they are likely to incrementally ask for more and more high-quality suggestions. Recommender systems may capitalize on this practicality by requesting some range of number of rules to recommend and adjusting the minimum support threshold automatically. This additionally decreases runtime in comparison with the standard algorithm for market basket analysis, which requires the user to input the minimum support themselves and potentially yields either too few or too many rules. By mining for rules relevant to a target item, ASARM is finally able to mine rules for items that appear in transaction data infrequently (i.e., new movies, articles, products). Rules are of the form “{user_a : like} AND {user_b : like} ⇒ {user_c : like} with confidence 90% and support 30%,” and translate to recommendations of the form "90% of articles liked by users A and B are also liked by user C, [where] 30% of all articles are liked by all of them." ASARM explores associations in terms of likes and dislikes (implying positive negative correlations between users' tastes respectively). Finally, associations are made both among users and among articles with the aim of making use of overlaps of different users' tastes.

After reading about the work done by Lin et al., I feel very familiar with association rule mining, its measures of correlation and significance, and how it may be altered to suit different applications. The motivation to cut down runtime is especially applicable to my project's goal of learning design rules. Our greatest challenge currently is knowing what constitutes a meaningful design rule and how to look for one; a brute-force search could produce an exponential number of candidate rules. Drawing from ASARM, however, we might begin to narrow down our search and thus drastically decrease runtime. For example, we might exploit rule heads specified in advance by choosing only a few lines of code that the user has recently looked at (measured through an IDE-hosted event logger), and possibly enforcing the order of the lines of code to further narrow the search. Because our goal is to provide a few high-quality recommendations, we might also specify a target range for the number of rules rather than a fixed support threshold. Lastly, Lin et al. score each rule as the product of its support and confidence, an intuitive calculation that we might adopt as well.
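As a small illustration of that scoring idea, the toy sketch below ranks already-mined rules by the product of their support and confidence and keeps only the top few as recommendations. The rule tuples (hypothetical "line of code" items from our project setting) and the cutoff of three are assumptions for illustration, not part of Lin et al.'s implementation.

    def rank_rules(rules, top_n=3):
        """rules: iterable of (antecedent, head, support, confidence) tuples."""
        return sorted(rules, key=lambda r: r[2] * r[3], reverse=True)[:top_n]

    # Hypothetical rules over recently viewed lines of code (identifiers invented).
    example_rules = [
        ({"line_12", "line_40"}, "line_57", 0.30, 0.90),  # score 0.270
        ({"line_12"},            "line_57", 0.45, 0.70),  # score 0.315
        ({"line_40"},            "line_57", 0.20, 0.95),  # score 0.190
    ]
    for antecedent, head, sup, conf in rank_rules(example_rules):
        print(f"{sorted(antecedent)} => {head}: score = {sup * conf:.3f}")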

With respect to the paper itself, one question that remains unanswered for me concerns the implementation of "set-trees," which Lin et al. use to help "facilitate the join and prune operations on candidate itemsets." There is little to no description of what a set-tree is, how it differs from other data structures, or why it was the best option for this algorithm. Because one of our greatest challenges is determining the representation of our equivalent items, transactions, and, ultimately, rules, it would be especially useful to see some elaboration on Lin et al.'s implementation. Specific to my project, one additional question that persists is how to tokenize and choose the 'IF' and 'THEN' parts of our rules, and the order in which to place those tokens. By specifying rule heads only and not rule tails, we might improve performance, but we might also restrict our search for meaningful rules. For the moment, it will be most useful to continue considering Lin et al.'s treatment of associations among articles; associations among users seem like an interesting step in the context of learning design rules, albeit one that will be left for future work.
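Since the set-tree itself is left undescribed, the most I can do is sketch the generic Apriori-style join and prune that such a structure is presumably meant to accelerate. The stand-in below uses an ordinary set of frozensets rather than whatever tree Lin et al. actually implemented, so it shows only the operations the data structure must support, not their design.

    # Generic Apriori candidate generation: join frequent (k-1)-itemsets that
    # share a prefix, then prune any candidate with an infrequent (k-1)-subset.
    # A plain set of frozensets stands in for the paper's (undescribed) set-tree.
    from itertools import combinations

    def join_and_prune(frequent_prev):
        """frequent_prev: set of frozensets, all of size k-1.
        Returns the candidate k-itemsets that survive the prune step."""
        prev = sorted(sorted(s) for s in frequent_prev)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                # Join: two (k-1)-itemsets that differ only in their last item.
                if prev[i][:-1] == prev[j][:-1]:
                    candidate = frozenset(prev[i]) | frozenset(prev[j])
                    # Prune: every (k-1)-subset must itself be frequent.
                    if all(frozenset(sub) in frequent_prev
                           for sub in combinations(sorted(candidate), len(candidate) - 1)):
                        candidates.add(candidate)
        return candidates

    frequent_2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
    print(join_and_prune(frequent_2))  # only {a, b, c} survives; {b, c, d} is pruned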

Bibliography

Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. "Efficient Adaptive-Support Association Rule Mining for Recommender Systems". In: Data Mining and Knowledge Discovery 6.1 (2002), pp. 83–105. ISSN: 1573-756X. DOI: 10.1023/A:1013284820704. URL: https://doi.org/10.1023/A:1013284820704.