Automatic Terminology Extraction


   Terminology extraction plays an important role in dictionary compilation, thesaurus construction, ontology learning, and related tasks. The World Wide Web (WWW) is one of the most important data sources for terminology extraction.
Figure 1 shows the process for acquiring new terms from a collection of web documents. Our research in terminology extraction focuses on using machine learning and linguistic information to improve extraction performance.
       Our Projects:
  • National Key Project of Scientific and Technical Supporting Programs funded by the Ministry of Science & Technology of China: Information Service System of Scientific and Technical Documents: Key Techniques and Application Demonstration (No. 2006BAH03B02, 2006BAH03B04) (2006-2009)
  • China Postdoctoral Science Foundation (No. 20080430463) & Special Foundation (No. 200801105): Key Techniques in Multilingual Domain Ontology Learning (2008-2009), PI
  • Project of the Education Ministry's Humanities and Social Science (No. 08JC870007): Multilingual Domain Ontology Learning (2008-2010), PI
  • Youth Research Support Fund funded by Nanjing University of Science & Technology (No. JGQN0701): Domain Ontology Learning (2007-2009), PI
       Our Publications:
  1. Jie Gui, Peng Li, Chengzhi Zhang, Ying Li and Zhaofeng Zhang. Integrating CRF and Rule method for Knowledge extraction in Patent Mining Task at NTCIR-8. In: Proceedings of NTCIR-8 Workshop Meeting (NTCIR8-PATMN). Tokyo, Japan, 2010: 341-344.
  2. Chengzhi Zhang. Extracting Chinese-English Bilingual Core Terminology from Parallel Classified Corpora in Special Domain. In: Proceedings of Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2009) in conjunction with Conference on Web Intelligence (WI/IAT-09). Milan, Italy, 2009: 271-274.
  3. Chengzhi Zhang. Using Integration Strategy and Multi-level Termhood to Extract Terminology. Journal of the China Society for Scientific and Technical Information, 2010, Accepted. (in Chinese with English abstract)