Email: hyifen AT cs DOT cmu DOT edu
Language Technologies Institute
School of Computer Science
My thesis research focuses on mixed-initiative learning, which explores a new type of machine learning task. In a mixed-initiative learning task, a user and a machine collaborate by making interleaved contributions to the task, and both can update their model assumptions by learning from the other's contributions. My research goal is to make machine learning techniques more accessible and practical for non-expert users. I have also worked on question answering and speech recognition in the past.
Advisor: Tom Mitchell
Dissertation: Mixed-Initiative Clustering, April 2010
Abstract
Mixed-initiative clustering is a task where a user and a machine work collaboratively to analyze a large set of documents. We hypothesize that a user and a machine can both learn better clustering models through enriched communication and interactive learning from each other.
The first contribution of this thesis is a framework for mixed-initiative clustering. The framework consists of machine learning and teaching phases and user learning and teaching phases, connected in an interactive loop that allows bi-directional communication. The bi-directional communication languages define the types of information exchanged through an interface. Coordination between the two communication languages and the adaptation capability of the machine's clustering model is the key to building a mixed-initiative clustering system.
The second contribution comes from successfully building several systems using our proposed framework. Two systems are built with incrementally enriched communication languages: one enables user feedback on features for non-hierarchical clustering, and the other accepts user feedback on hierarchical clustering results. This achievement validates our framework and also demonstrates that it is possible to develop machine learning algorithms that work with conceptual properties.
The third contribution comes from the study of enabling real-time interactive capability in our full-fledged mixed-initiative clustering system. We provide several guidelines on practical issues that developers of mixed-initiative learning systems may encounter.
The fourth contribution is the design of user studies for examining the effectiveness of a mixed-initiative clustering system. We design the studies according to two scenarios: a learning scenario where a user develops a topic ontology from an unfamiliar data set, and a teaching scenario where a user knows the ontology and wants to transfer this knowledge to a machine. Results of the user studies demonstrate that mixed-initiative clustering has advantages over non-mixed-initiative approaches, both in helping users learn an ontology and in helping users teach a known ontology to a machine.
- "Exploring Hierarchical User Feedback in Email Clustering," Yifen Huang and Tom Mitchell. Enhanced Messaging Workshop, AAAI 2008, Chicago, IL, July 2008.
- "A Framework for Mixed-Initiative Clustering," Yifen Huang and Tom Mitchell. NESCAI 2007, Ithaca, NY, April 2007.
- "Text Clustering with Extended User Feedback," Yifen Huang and Tom Mitchell. SIGIR 2006, Seattle, WA, August 2006.
- "Extracting Knowledge about Users’ Activities from Raw Workstation Contents," Tom Mitchell, Sophie Wang, Yifen Huang and Adam Cheyer. AAAI 2006, Boston, MA, July 2006.
- "Inferring Ongoing Activities of Workstation Users by Clustering Email," Yifen Huang, Dinesh Govindaraju, Tom Mitchell, Vitor R. Carvalho, William Cohen. First Conference on Email and Spam, Mountain View, CA, July 2004.
- "Towards Light Semantic Processing for Question Answering," Benjamin Van Durme, Yifen Huang, Anna Kupsc, Eric Nyberg. HLT/NAACL Workshop on Text Meaning, Edmonton, Canada, May 31, 2003.
- "The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning," E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. Lita, V. Pedro, D. Svoboda and B. Van Durme. 12th Text REtrieval Conference, November 2003.
- "The JAVELIN Question-Answering System at TREC 2002," E. Nyberg, T. Mitamura, J. Carbonell, J. Callan, K. Collins-Thompson, K. Czuba, M. Duggan, L. Hiyakumoto, N. Hu, Y. Huang, J. Ko, L. Lita, S. Murtagh, V. Pedro and D. Svoboda. Proceedings of TREC 11, November 2002.