Curriculum Vitae




 


EDUCATION                                                              

  • Ph.D. in Computer Science, 08/2003 to 08/2008 (expected). GPA to date: 3.9/4.0.
    Virginia Tech, Blacksburg, Virginia.

    Dissertation Title: Mining Evidence for Relationships in Heterogeneous Object Spaces
    Advisor: Dr. Edward A. Fox
    Committee: Dr. Patrick Fan, Dr. Naren Ramakrishnan, Dr. Lillian Cassel, Dr. Manuel Pérez-Quiñones

  • Master's in Computer Science, 08/2001 to 07/2003. GPA: 3.8/4.0.
    National Univ. of Singapore, Singapore.
  • Bachelor's in Computer Engineering, 08/1997 to 06/2001. GPA: 3.3/4.0.
    Fudan Univ., Shanghai, China.

 

PUBLICATIONS

Book Chapters:

  • Xiaoyan Yu, Manas Tungare, Weiguo Fan, Yubo Yuan, Manuel Pérez-Quiñones, Edward A. Fox, William Cameron, Lillian Cassel: Automatic Syllabus Classification using Support Vector Machines. (To appear in) Handbook of Research on Text and Web Mining Technologies, 2008, edited by Min Song, Yi-Fang Wu, Idea Group Inc.
  • Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Pérez-Quiñones, Edward A. Fox, William Cameron, Lillian Cassel: Automatic Genre-specific Text Classification. (To appear in) Encyclopedia of Data Warehousing and Mining, 2008, edited by John Wang, Idea Group, Inc.

Journal/Magazine Papers:

  • Edward A. Fox, Fernando Das Neves, Xiaoyan Yu, Rao Shen, Seonho Kim, Weiguo Fan: Exploring the computing literature with visualization and stepping stones & pathways. Commun. ACM 49(4): 52-58, 2006
  • Xiaoyan Yu, Fernando A. Das Neves, Edward A. Fox: Hard queries can be addressed with query splitting plus stepping stones and pathways. IEEE Data Eng. Bull. 28(4): 29-38, 2005

Peer-Reviewed Conference Papers:

  • Xiaoyan Yu, Manas Tungare, Weiguo Fan, Yubo Yuan, Manuel Pérez-Quiñones, Edward A. Fox, William Cameron, Lillian Cassel: Using Automatic Metadata Extraction to Build a Structured Syllabus Repository. (To appear in) Proceedings of the 10th International Conference on Asian Digital Libraries, 2007.
  • Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Pérez-Quiñones, Edward A. Fox, Lillian Cassel: Automatic syllabus classification. In Proceedings of the 2007 Conference on Digital Libraries (Vancouver, BC, Canada, June 18 - 23, 2007). JCDL '07. ACM Press, New York, NY, 440-441.
  • Manas Tungare, Xiaoyan Yu, Manuel Pérez-Quiñones, Edward A. Fox, Weiguo Fan, Lillian Cassel: Towards a Syllabus Repository for Computer Science Courses. In Proc. Special Interest Group on Computer Science Education (SIGCSE) 2007.
  • Manas Tungare, Patha Pyla, Pradyut Bafna, Vladimir Glina, Wenjie Zheng, Xiaoyan Yu, Umut Bali, Steve Harrison: Embodied data objects: tangible interfaces to information appliances. In Proceedings of the 44th ACM Southeast Conference, 2006. 
  • Fernando A. Das Neves, Edward A. Fox, Xiaoyan Yu: Connecting topics in document collections with stepping stones and pathways. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (Bremen, Germany, October 31 - November 05, 2005). CIKM '05. ACM Press, New York, NY, 91-98.

Workshop papers:

  • Xiaoyan Yu, Pengbo Liu, Mohammad Alkandari, Manuel Pérez-Quiñones: Visualizing a personal social network of email archives for re-finding. In Proceedings of the 2nd Workshop on Personal Information Management (PIM 2006) at SIGIR 2006.
  • Manas Tungare, Xiaoyan Yu, Manuel Pérez-Quiñones, Edward A. Fox, Weiguo Fan, Lillian Cassel: Towards a standardized representation of syllabi to facilitate sharing and personalization of digital library content. In Proceedings of the 4th International Workshop on Applications of Semantic Web Technologies for E-Learning, Dublin, Ireland, 2006.
  • Hung-Chi Chi, Xiaoyan Yu, Wenjie Zheng: Data integrity framework and language support for active web intermediaries. In Proceedings of the 9th International Workshop on Web Content Caching and Distribution (IWCW 2004), 2004.

RESEARCH/TEACHING/WORK EXPERIENCE:

  • Research Assistant, Dept. of Computer Science, Virginia Tech, 08/2004 – Present
    • Help write several funding proposals
    • Brainstorm ideas with professors, and design and implement prototypes
    • Present ongoing work at conferences and in related classes
  • Mentor, Dept. of Computer Science, Virginia Tech, 08/2006 – Present
    • Guide an undergraduate student and a graduate student in their research
    • Introduce them to the necessary background knowledge
    • Refer them to relevant research studies and discuss those studies with them to deepen their understanding
  • Teaching Assistant, CS 2604 Data Structures and Algorithms, Dept. of Computer Science, Virginia Tech, 08/2003 – 05/2004
    • Grade assignments, answer students’ questions, coordinate lab activities
    • Give lectures when instructors are absent
  • Intern, Cougaar Software Inc., VA, 05/2004 – 08/2004
    • Learn the company’s Artificial Intelligence platform
    • Design and implement (in Java) a workflow editor and monitor for the platform

SKILLS:

  • Techniques:
    • Data Mining:
      • Classification (Decision Tree, Association Rule, Naïve Bayes, Support Vector Machines, Genetic Algorithm, Genetic Programming)
      • Clustering
      • Frequent Itemset and Rule Mining
      • Probabilistic Methods (Bayesian, Maximum A Posteriori, Maximum Likelihood)
    • Information Retrieval:
      • Crawling, Parsing, Indexing, Searching, Ranking (PageRank, HITS)
      • Recommender System (Collaborative Filtering)
      • Personalization, Visualization
      • Information Extraction (Hidden Markov Model, Conditional Random Field)
      • Social Network Analysis
    • Tools:
      • Lucene
      • Weka
      • GATE
      • Prefuse, GUESS
  • Languages:
    • Java, C/C++
    • Python, JSP
    • MATLAB
    • PL/SQL

 

KEY PROJECTS: 

1. Evidence Mining (dissertation topic and proposal for NSF)

Just as an open-ended question calls for an explanation rather than a yes or no, some applications, such as recommender systems and literature-based discovery, require evidence in support of the relationships they suggest. I proposed a graph-based algorithm to search for and rank evidence, and an exploratory approach to presenting the evidence found to users.

2. Personalization in Education (funded by NSF)

For the NSDL (National Science Digital Library) to have a significant impact on the future of education, it needs to be integrated into the current pedagogical practices of educators and students. We are exploring how to best tailor NSDL functionality toward particular communities of users, including learners and educators. Current strategies include Facebook applications and DSpace deployment.

3. Stepping Stones and Pathways (funded by NSF)

Given a query covering multiple topics, traditional information retrieval techniques cannot find appropriate answers. Stepping Stones and Pathways (SSP) answers such a query with a chain of related documents.

4. Support Vector Machines

Current solutions for Support Vector Machines are based on duality. We are employing multiple mathematical smoothing approaches to compute the separating hyperplane. Our approach is tested against other well-known implementations of Support Vector Machines and other classification models.

5. Online Knowledge Community Mining (proposal for NSF)

There is no specific metric for evaluating the quality of knowledge communities such as discussion forums, and no guidance on how they can better serve their members. We propose to fill this gap. A prototype was built using GUESS to visualize communities on the VBcity forum and to apply a variety of SNA metrics to evaluate their quality.

6. Personalized News Portal

Given a variety of news feeds, we proposed to provide personalized news based on users’ feedback. A variety of models, such as Genetic Algorithms and Genetic Programming, were applied to find the news most interesting to each individual.

7. Active Web Intermediaries (Master’s thesis at National Univ. of Singapore)

With the maturity of content adaptation technologies (such as ICAP and OPES) in web intermediaries and of content personalization in the network, data integrity becomes a key concern when such technologies are deployed. To address this issue, I proposed a data integrity framework to support content adaptation, which was deployed in the Squid proxy server.

 

HONORS/AWARDS:

First Place in College of Engineering, Virginia Tech, for the demo, Embodied Data Objects: Tangible Interfaces to Information Appliances, at the 22nd Annual Research Symposium and Exposition, Virginia Tech, 2006.

PROFESSIONAL SERVICE:

Reviewer: Journal of the American Society for Information Science and Technology, 2007.

Reviewer: Journal of Natural Language Engineering, 2007.

Reviewer: Journal of Management Information Systems, 2007.

Reviewer: Transactions on Information Systems, 2006.

Xiaoyan Yu

Email: xiaoyan.yu at gmail dot com

Homepage: http://xiaoyan.yu.googlepages.com/
