Ph.D. Graduate
Email: hyifen AT cs DOT cmu DOT edu


Language Technologies Institute
School of Computer Science

Carnegie Mellon University



Research Interests

My thesis research focuses on mixed-initiative learning, a new type of machine learning task. In a mixed-initiative learning task, a user and a machine collaborate by making interleaved contributions to the task, and each can update its model assumptions by learning from the other's contributions. My research goal is to make machine learning techniques more accessible and practical for non-expert users. I have also worked on question answering and speech recognition in the past.

Advisor: Tom Mitchell


Publications


Dissertation: Mixed-Initiative Clustering, April 2010
Abstract
Mixed-initiative clustering is a task where a user and a machine work collaboratively to analyze a large set of documents. We hypothesize that a user and a machine can both learn better clustering models through enriched communication and interactive learning from each other.
    The first contribution of this thesis is a framework for mixed-initiative clustering. The framework consists of machine learning and teaching phases, and user learning and teaching phases, connected in an interactive loop that allows bi-directional communication. The bi-directional communication languages define the types of information exchanged in an interface. Coordination between the two communication languages and the adaptation capability of the machine's clustering model is the key to building a mixed-initiative clustering system.
    The second contribution comes from successfully building several systems using our proposed framework. Two systems are built with incrementally enriched communication languages -- one enables user feedback on features for non-hierarchical clustering and the other accepts user feedback on hierarchical clustering results. This achievement validates our framework and also demonstrates that machine learning algorithms can be developed to work with conceptual properties.
    The third contribution comes from the study of enabling real-time interactive capability in our full-fledged mixed-initiative clustering system. We provide several guidelines on practical issues that developers of mixed-initiative learning systems may encounter.
    The fourth contribution is the design of user studies for examining the effectiveness of a mixed-initiative clustering system. We design the studies according to two scenarios: a learning scenario, where a user develops a topic ontology from an unfamiliar data set, and a teaching scenario, where a user knows the ontology and wants to transfer this knowledge to a machine. Results of the user studies demonstrate that mixed-initiative clustering has advantages over non-mixed-initiative approaches, both in helping users learn an ontology and in helping users teach a known ontology to a machine.


Conference Papers: