Homepage


Welcome to Min's NLP homepage! I have been pursuing a CS PhD at The City University of New York since August 2012. My advisor is Andrew Rosenberg. Here is my recent CV; the contents below on this homepage are no longer updated.

                                                                                
Undergraduate (Senior)
Beijing University of Posts and Telecommunications.
Lab: Natural Language Processing Group,
State Key Lab of Intelligent Technology and Systems,
National Lab for Information Science and Technology,
Tsinghua University, Beijing, China.

ACL member, CCF student member

Email: minma.cs.bupt@gmail.com 
Skype ID: Chinesemin99


Lab Address: Room 4-506, FIT Building, Tsinghua University, Beijing, P. R. China, 100084.


                                                                                                                       
(Photo: Min Ma at MSRA for the SIGIR & ACL Summer School, Beijing, August 29th, 2011)



Spotlights

My team was awarded Meritorious Winner among 735 teams from different countries in ICM 2011 (U.S. Interdisciplinary Contest in Modeling)!    More

My full paper, "Acoustic Feature-based Clustering for the Phonetic Degree of Chinese Phonograms", was accepted by EECM 2011!     More

My team has developed a speech recognition toolkit based on the Scilab platform!     Demo

Research Interests
I am applying for a PhD in Computer Science, with a focus on large-scale text mining, information extraction and retrieval. I also have broad interests in applications of Natural Language Processing technologies, such as topic detection and tracking in social media, knowledge-based QA systems, and multilingual machine translation systems. I am currently working as a research assistant in the NLP group of the State Key Lab of Intelligent Technology and Systems at Tsinghua University. My graduation thesis concentrates on keyword extraction from microblogs using the Online Labeled Latent Dirichlet Allocation (OLDA) model. (CV)

Knowledge

Text Mining: Latent Semantic Indexing, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, Social Networks, Hierarchical Dirichlet Process, Labeled Latent Dirichlet Allocation.

Statistics Theories: Markov Chain Monte Carlo, Gibbs Sampling, Expectation Maximization, Maximum Likelihood Estimation, Hidden Markov Model, Conditional Random Field, Gaussian Processes, Poisson Processes, Dynamic Programming, Graphical Models.

Machine Learning:  Linear and Logistic Regression, Decision Trees, Linear Discriminant Analysis, Support Vector Machines, Boosting, Clustering Algorithms.

Natural Language Processing Knowledge: Parsing and POS Tagging, Lexical Semantics, Knowledge Representation, Temporal Reasoning and Extraction, Statistical Natural Language Modeling, Reinforcement Learning, Learning to Rank, Paraphrasing.

Language Skills: Chinese (Mandarin): proficient; English: proficient; French: basic; Japanese: basic.


Previous Projects


Natural Language Processing Research                       Mentor: Maosong Sun               July 2011–Present

Research Assistant, NLP Group, State Key Lab of Intelligent Technology and Systems, Tsinghua University

✓  Topic:  Keyword Extraction Based on Topic Models

·   Implemented and applied an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). The program could analyze streaming data and performed well on keyword extraction from Sina microblog.

·   To overcome LDA's inherent credit attribution problem when extracting keywords from large multi-labeled corpora, built one-to-one correspondences between LDA's latent topics and users' tags. Ran experiments on Sina microblog data.

·   Proposed using adaptive online Labeled LDA for extraction from text streams, the Kullback-Leibler divergence between topics to measure their correlation, and a hidden Markov chain to describe how topics evolve along the timeline. Potential applications include opinion analysis and recommendation systems.
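To illustrate the topic-correlation idea in the last bullet, here is a minimal sketch and not the project's actual code. It assumes each topic is a probability distribution over the same small vocabulary; the function names, the symmetrized form, and the toy distributions are my own choices for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # eps guards against log(0) when q assigns zero mass to a word
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so the score is order-independent."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Toy topic-word distributions (hypothetical values).
topic_a = [0.5, 0.3, 0.2]
topic_b = [0.5, 0.3, 0.2]
topic_c = [0.1, 0.2, 0.7]

same = symmetric_kl(topic_a, topic_b)       # identical topics
different = symmetric_kl(topic_a, topic_c)  # dissimilar topics
```

A smaller value means two topics are closer; identical distributions score zero. Plain KL divergence is asymmetric, which is why the sketch averages both directions before comparing topics.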

✓  Topic:  Chinese Word Segmentation and Short Text Classification Techniques Based on CRF model

·   Inspired by the character selection algorithms used in text classification, recast Chinese word segmentation as a sequence tagging problem through a character tagging process. Employed a conditional random field (CRF) model for parsing.

·   Proposed a refined parsing method that expands the scope of potential parsing candidates beyond the local context, using word clustering and the HowNet dictionary.
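The character tagging formulation above can be sketched as follows. This is an illustration only: it assumes the common B/M/E/S tag scheme (begin, middle, end, single-character word), which may differ from the tag set used in the project, and it shows only the conversion between words and tags, not the CRF itself.

```python
def words_to_tags(words):
    """Map a segmented sentence to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


segmented = ["清华", "大学", "在", "北京"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

With this encoding, segmentation reduces to predicting one tag per character, which is the kind of sequence labeling task a CRF is designed for.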

Quantum Telecommunication Theory Research                     Mentor: Fenzhuo Guo                                     July 2010–June 2011

Research Assistant, Quantum Information Group, State Key Lab of Network and Switch, BUPT

  Topic I:  Security Analysis and Improvements of Quantum telecommunication Protocols

(Supported by College Student Innovation Fund of BUPT)                                                                       

·    Studied Shannon information theory and quantum cryptography, and used them to analyze various quantum telecommunication protocols, including BB84, B92, and two-step direct telecommunication. Investigated the security of quantum key distribution (QKD) under individual attacks using the Einstein-Podolsky-Rosen protocol.

·    Proposed an attack method against a new quantum signature protocol and disproved its absolute security.

·    Compiled the first explanatory booklet on Quantum Computation and Quantum Information (Nielsen & Chuang).

Mathematical Modeling Application                                          Mentor: Zuguo He                                  March 2010–September 2010

Research Assistant, Math and Physics Innovation Lab, BUPT                                               

   Topic I:  Impact Factors of TV Consumers’ behavior                                                                                       (April 2010–May 2010)

·    Probed the impact factors through statistical analysis of relevant data and in-depth interviews with six respondents. Introduced subjective norm, balanced demand, and perceived risk variables into the classical Technology Acceptance Model, then identified the top four impact factors and their psychological and transmission effects.

·    Produced a survey report and proposed development strategies for the TV market based on these impact factors.

    Topic II:  Rectification of Phonetic Degree of Chinese Phonogram                                                (August 2010–September 2010)

·    Introduced acoustic features to design a comprehensive similarity measure for Chinese phonograms, developed a K-means dynamic clustering algorithm for classification, used overlapped area to describe tone closeness, and finally recalibrated the overall phonetic degree of Chinese phonograms to 50.0%.

·    Proposed new classification standards for Chinese phonograms. Produced a new classification chart for foreign learners.

    Topic III:  Optimal Decision of Bus Paths                                                                                                                                     (August 2010)

·    Simplified the transportation system into a directed graph. Considering the basic needs of stops, transfers and cost with different priorities, constructed a multi-objective optimization model based on spot searching. Solved it with an improved Dijkstra algorithm.

·    Proposed a double-objective optimization model based on line searching. Introduced the concepts of virtual station and virtual path to transform the original problem into a dynamic programming problem. Extended the model to a bus-subway system by separating each real subway station into several sub-stations.
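As a rough illustration of searching a directed graph with prioritized objectives, the sketch below runs Dijkstra's algorithm with tuple costs compared lexicographically. It is not the improved algorithm from the contest solution: the cost tuple (transfers, stops), the priority order, and the toy graph are assumptions made for this example.

```python
import heapq


def lexicographic_dijkstra(graph, source, target):
    """Best (transfers, stops) cost from source to target, or None if unreachable.

    graph maps a node to a list of (neighbor, (transfers, stops)) edges.
    Tuples compare lexicographically, so transfers are minimized first.
    """
    best = {source: (0, 0)}
    heap = [((0, 0), source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, cost):
            continue  # stale heap entry
        for neighbor, (d_transfers, d_stops) in graph.get(node, []):
            new_cost = (cost[0] + d_transfers, cost[1] + d_stops)
            if neighbor not in best or new_cost < best[neighbor]:
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None


# Hypothetical network: A -> B -> D stays on one line (7 stops, 0 transfers);
# A -> C -> D is shorter (2 stops) but needs one transfer.
graph = {
    "A": [("B", (0, 3)), ("C", (0, 1))],
    "B": [("D", (0, 4))],
    "C": [("D", (1, 1))],
}
route_cost = lexicographic_dijkstra(graph, "A", "D")
```

Because transfers come first in the tuple, the search prefers the longer direct route over the shorter one that requires a transfer. Reordering the tuple changes which objective takes priority.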

     Topic IV: Assessment of Urban Heavy Metal Pollution                                                                                                    (September 2011)

·    Visualized the distributions of the main heavy metal elements in the city using Matlab, and assessed the degree of pollution with the single factor method and a fuzzy comprehensive model. Located the sources of pollution by factor analysis. Constructed partial differential equations based on the propagation characteristics to describe and simulate the migration of edatope.


Current Project

            Online Labeled LDA: a partially supervised topic model for topic extraction from microblog                   (December 2011-present)



Professional Skills

·   IT Skills: C, C++, Python, Java, HTML, XML

·    Mathematical Software Skills: Mathematica, Scilab, SPSS, Sybase, Lingo



Awards and Honors

✓  Meritorious Winner, International Interdisciplinary Contest in Modeling (ICM), April 2011

✓  First Prize of Beijing Division, China Undergraduate Mathematical Contest in Modeling (CUMCM), December 2010

✓  Second Place, Humanistic Knowledge Contest of BUPT, November 2009

✓  Third Prize, Beijing College Humanistic Knowledge Competition, December 2009

✓  Scholarship Recipient, Beijing University of Posts and Telecommunications, September 2011

✓  Participation Award, French-Chinese Open Source Software Competition, September 2011

✓  Piano Skill: 10th grade (Highest Grade) in the National Piano Grade Test, August 2008

✓  Singing Skill: 9th grade (Highest Grade) in the Vocal Grade Test of Arts Academy of Yunnan Province, August 2009

✓  Yellow Belt of Taekwondo in the Grade Test of World Taekwondo Association, January 2009