Caleb Sotelo's UCSD CSE Honors Project

Pattern Explorer is a novel graphical interface for visual discovery of patterns in data, currently being developed by Caleb Sotelo. The goal of this project is to complete and publish Pattern Explorer with full documentation, and to write an accompanying paper of sufficient quality and novelty to be submitted to a professional data mining conference. The project will span Winter and Spring Quarters 2011, and is to be undertaken with guidance from Professor Charles Elkan of the Computer Science & Engineering Department.

A typical challenge in data mining is discovering patterns that are actually interesting to the user, as opposed to patterns that are coincidental or already well known. One solution is a system that lets the user fluidly explore the rule space and so facilitates the discovery of significant patterns. One such system was proposed and implemented by Lei Zhang et al. [1] for Motorola Inc., but to our knowledge no similar system exists as open source software. RapidMiner is a widely used, highly functional, and robust open source data mining environment written in Java. Our goal is to develop a graphical interface extension to RapidMiner, named Pattern Explorer, that allows intuitive and user-friendly pattern exploration, inspired by the Motorola system. To this end, the system will include novel interactive capabilities such as exploration-session history management, drill-down, color-coded data, options for economizing memory and running time, and level-of-detail view options. The system will also follow software engineering best practices aimed at providing an extensible, secure, and responsive pattern exploration environment, such as inner-loop optimizations, testing with very large data sets, a multi-threaded architecture, and extensive end-user and developer documentation.

[1] Zhang, L., Liu, B., Benkler, J., Zhou, C. "Finding actionable knowledge via automated comparison." IEEE ICDE, 2009.