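Information on the Internet is increasingly multilingual, and cross-language information processing and retrieval is important for strengthening scientific, economic, and cultural exchange among countries. Cross-language information processing and retrieval falls within the scope of Multilingual Information Access research, which mainly draws on information science, artificial intelligence, and other related fields, as shown in Fig. 1. Our group focuses on cross-language processing of multilingual text, mainly Chinese and English (Cross-language Information Processing), including cross-language keyword extraction, cross-language text classification, cross-language text clustering, and cross-language retrieval (Cross-language Information Retrieval).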
Fig. 1. Fields related to Multilingual Information Access. [See: Douglas W. Oard, IRAL99]
Links:
- National Natural Science Foundation (No. 70903032): Multilingual Documents Clustering Based on Comparable Corpus (2010-2012), PI
- National Key Project of Scientific and Technical Supporting Programs funded by Ministry of Science & Technology of China: Information Service System of Scientific and Technical Documents: Key Techniques and Application Demonstration (No. 2006BAH03B02, 2006BAH03B04) (2006-2009)
Our Publications:
- Zhang Chengzhi. Extracting Chinese-English Bilingual Core Terminology from Parallel Classified Corpora in Special Domain. In: Proceedings of Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2009) in conjunction with Conference on Web Intelligence (WI/IAT-09). Milan, Italy, 2009: 271-274. [PPT]
- Zhang Chengzhi, Wang Huilin. Survey on Multilingual Document Clustering. New Technology of Library and Information Service, 2009, (6): 31-36. (in Chinese with English abstract)
- Wu Dan, He Daqing, Wang Huilin, Shi Chongde, Zhang Chengzhi. Does Query Length Matter? A Comparison of Query Expansion Methods in English-Chinese Cross-Language Information Retrieval. Journal of Computational Information Systems, 2008, 4(3): 1213-1222.