EDUCATION
PUBLICATION
Book Chapters:
Peer-Reviewed Conference Papers:
Workshop papers:
RESEARCH/TEACHING/WORK EXPERIENCE:
SKILLS:
· Technique:
  o Data Mining: Classification (Decision Tree, Association Rule, Naïve Bayes, Support Vector Machines, Genetic Algorithm, Genetic Programming); Clustering; Frequent Itemset and Rule Mining; Probabilistic Methods (Bayesian, Maximum A Posteriori, Maximum Likelihood)
  o Information Retrieval: Crawling, Parsing, Indexing, Searching, Ranking (PageRank, HITS); Recommender System (Collaborative Filtering); Personalization, Visualization; Information Extraction (Hidden Markov Model, Conditional Random Field); Social Network Analysis
  o Tools: Lucene, Weka, GATE, Prefuse, GUESS
· Languages:
  o Java, C/C++
  o Python, JSP
  o MATLAB
  o PL/SQL
KEY PROJECTS:
1. Evidence Mining (dissertation topic and proposal for NSF)
Just as an open-ended question calls for an explanation instead of a yes or no, some applications, such as recommender systems and literature-based discovery, require evidence in support of suggested relationships. I proposed a graph-based algorithm to search for and rank evidence, and an exploratory approach to present the evidence found to users.
2. Personalization in Education (funded by NSF)
For the NSDL (National Science Digital Library) to have a significant impact on the future of education, it needs to be integrated into the current pedagogical practices of educators and students. We are exploring how best to tailor NSDL functionality toward particular communities of users, including learners and educators. Current strategies include Facebook applications and DSpace deployment.
3. Stepping Stones and Pathways (funded by NSF)
Given a query covering multiple topics, traditional information retrieval techniques cannot readily find appropriate answers. SSP is proposed to answer such a query with a chain of related documents.
4. Support Vector Machines
Current solutions for Support Vector Machines are based on duality. We are employing multiple mathematical smoothing approaches to calculate the separating hyperplane. Our approach is tested against other well-known implementations of Support Vector Machines and other classification models.
5. Online Knowledge Community Mining (proposal for NSF)
There is no specific metric to evaluate the quality of knowledge communities such as discussion forums, and no guidance to help them better serve their members. We are proposing to fill this gap. A prototype was built using GUESS to visualize communities on the VBcity forum and to employ a variety of SNA metrics to evaluate their quality.
6. Personalized News Portal
Given a variety of news feeds, we proposed to provide personalized news according to users’ feedback. A variety of models, such as Genetic Algorithm and Genetic Programming, were applied to find news more interesting to individuals.
7. Active Web Intermediaries (Master’s thesis at National Univ. of Singapore)
With the maturity of content adaptation technologies (such as I-CAP and OPES) in web intermediaries and of content personalization in the network, data integrity has become a key concern when such technologies are deployed. To address this issue, I proposed a data integrity framework to support content adaptation, which was deployed in the Squid proxy server.
HONORS/AWARDS:
First Place in College of Engineering, Virginia Tech, for the demo, Embodied Data Objects: Tangible Interfaces to Information Appliances, at the 22nd Annual Research Symposium and Exposition, Virginia Tech, 2006.
PROFESSIONAL SERVICES:
Reviewer: Journal of the American Society for Information Science and Technology, 2007.
Reviewer: Journal of Natural Language Engineering, 2007.
Reviewer: Journal of Management Information Systems, 2007.
Reviewer: Transactions on Information Systems, 2006.
Xiaoyan Yu
Email: xiaoyan.yu at gmail dot com
Homepage: http://xiaoyan.yu.googlepages.com/
Please get the print version of my CV.
Copyright © 2007 Xiaoyan Yu