Cross-language Information Processing & Retrieval

  
    Description
 
    An increasing share of the information on the Internet is multilingual. Cross-language information processing and retrieval can help promote understanding and communication between countries in fields such as science and technology, economics, and culture. It is a subfield of multilingual information access, which draws mainly on information science, artificial intelligence, and related fields (see Figure 1).
    Our work in the field of cross-language information processing and retrieval mainly covers multilingual corpus generation, cross-language keyword extraction, cross-language text classification and clustering, and cross-language information retrieval.
 
 

Fig. 1. Fields related to multilingual information access. [See: Douglas W. Oard, IRAL99]

     
         Related Links:
   
  • Cross-Language Information Retrieval Resources (By Doug Oard)
  • Cross-Lingual Text Classification (By Jie Tang)
  • LREC 2008 Workshop on Comparable Corpora
  • CLIA2007 workshop
  • CLIA2008 workshop
  • CLIP2007
  • CLIP2006
  • CLIP2005
  • MMIES-2
  • MMIES-1
  • RANLP-2007 Workshop on Acquisition and Management of Multilingual Lexicons
  • NIPS 2006 Workshop Machine Learning for Multilingual Information Access
  • EACL 2006 Workshop on Cross-language Knowledge Induction
  • Eurolan 2005 (Workshop on Cross-Language Knowledge Induction)
  • ACL 2005 Workshop on Building and Using Parallel Texts
  • HLT/NAACL 2003 Workshop on Building and Using Parallel Texts
  • Special Topic Section on Multilingual Information Systems (JASIST, 2006, 57(5))
  • Multi-lingual Language Processing
  • Machine Translation Archive
  • Sentence Alignment and Word Alignment: Projects, Papers, Evaluation, etc.
  • Automatic Summarization of Multiple (Multilingual) Documents
  • Language Technology at the JRC 
  • JRC Workshop on EUROVOC
  • EuroVoc: the EU's multilingual thesaurus
  • Top 10 Languages
  • Mining Multilingual Documents
  • CINDOR
  • 语料天涯 (roughly, "Corpus Horizons")
  • The International Corpus of English (ICE)
  • ACL SIGWAC
  • Multilingual Glossary of technical and popular medical terms in nine European Languages
  • MultiLingual Computing
  • Language Grid
  • EDR Electronic Dictionary
  • Multilingual Information Management: Current Levels and Future Abilities
  • PEKING: The Linguistic Classification System (LCS)
  • Proteus Project Research
  • Machine Translation of Microsoft Research
  • Multilingual Systems Group of Microsoft Research
  • TTC - Terminology Extraction, Translation Tools and Comparable Corpora (EU)
  • ACCURAT - Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation (EU)
  • PANACEA - Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies (EU)
  • Language Technologies of EU-FP7
  • MultilingualWeb project  
  • LT4eL - Language Technology for eLearning (uses multilingual language technology tools and Semantic Web techniques to improve the retrieval of learning material)
  • KNOW: Developing large-scale multilingual technologies for language understanding
  • KNOW2: Language understanding technologies for multilingual domain-oriented information access 

  • Language Technology at JRC
  • NLP Group at UNED
  • Web Knowledge Discovery Lab at Sinica
  • JULIE Lab
  • KLE: Knowledge & Language Engineering
  • Natural Language Processing Lab @ Linkoping University
  • Computational Linguistics Group @ University of Vigo
  • Wikipedia Laboratory
  • Ralf Steinberger
  • Christopher C. Yang
  • Chih-Ping Wei
  • Pascale Fung
  • Chung-Hsing Yeh
  • Hsin-Hsi Chen
  • Eduard Hovy
  • Bruno Pouliquen
  • Jian-Yun Nie 
  • Woosung Kim
  • Nicola Cancedda
  • Saif Mohammad
  • Philip Resnik
  • Pierre Zweigenbaum
  • Carol Peters
  • Martin Volk
  • Kalervo Järvelin
  • Ke Ping
  • Key-Sun Choi
  • Nigel Collier
  • Keita Tsuji
  • Viktor Pekar
  • Diana Inkpen
  • Gayo Diallo
  • Catherine Roussey
  • Paul Rayson
  • Willy Vandeweghe
  • Wolfgang Teubert
  • Tuomas Talvensaari
  • Dragos Stefan Munteanu
  • Fatiha Sadat
  • Jiangping Chen
  • Jason S. Chang
  • Jörg Tiedemann (Uppsala University)
  • Bing Zhao
  • Akiko Aizawa
  • Andrea Mulloni
  • Yanjun Ma
  • Noah Smith
  • Nuria Bel
  • Philipp Koehn
  • Regina Barzilay
  • Béatrice Daille (University of Nantes, Monolingual Text Mining, Multilingual Text Mining, Multi-Word Units, Discourse, Document and Corpora)
  • Lorraine Goeuriot (Nanyang Technological University, Comparable corpus)
  • Soto Montalvo
  • Ying Zhang (Carnegie Mellon University)
  • Stephan Vogel (Carnegie Mellon University)
  • Heng A. Ji (City University of New York)
  • Brett W. Bader (Sandia National Laboratories)
  • Manuel Montes y Gómez (Automatic Text Processing)
  • Luis Villaseñor-Pineda
  • Pablo Gamallo Otero (Universidade de Santiago de Compostela, NLP, Ontology design, etc.)
  • Cyril Goutte (National Research Council Canada, Machine Learning)
  • Sudeshna Sarkar (Indian Institute of Technology Kharagpur)

             Related Resources
     
             Our Projects:
    • National Natural Science Foundation (No. 70903032): Multilingual Documents Clustering Based on Comparable Corpus (2010-2012), PI
    • National Key Project of Scientific and Technical Supporting Programs, funded by the Ministry of Science & Technology of China: Information Service System of Scientific and Technical Documents: Key Techniques and Application Demonstration (No. 2006BAH03B02, 2006BAH03B04) (2006-2009)
             Our Publications:
    • Chengzhi Zhang, Huilin Wang. Multilingual Domain Ontology Learning for Digital Library. Library and Information Service, 2010, to appear. (in Chinese with English abstract)
    • Sha Liu, Chengzhi Zhang. Survey of Multilingual Document Representation. New Technology of Library and Information Service, 2010, (6): 33-41.  (in Chinese with English abstract)
    • Jie Gui, Peng Li, Chengzhi Zhang, Ying Li and Zhaofeng Zhang. Integrating CRF and Rule method for Knowledge extraction in Patent Mining Task at NTCIR-8. NTCIR8-PATMN. 2010, Submitted.
    • Hongjiao Xu, Huilin Wang, Chengzhi Zhang. Research on Automatic Construction of Query and Translation Dictionary for Cross-language Information Retrieval. Information Studies: Theory & Application, 2010, 33(3): 105-109.  (in Chinese with English abstract)
    • Chengzhi Zhang. Extracting Chinese-English Bilingual Core Terminology from Parallel Classified Corpora in Special Domain. In: Proceedings of the Workshop on Natural Language Processing and Ontology Engineering (NLPOE 2009), in conjunction with the Conference on Web Intelligence (WI/IAT-09). Milan, Italy, 2009: 271-274.
    • Chengzhi Zhang, Huilin Wang. Survey on Multilingual Document Clustering. New Technology of Library and Information Service, 2009, (6): 31-36. (in Chinese with English abstract)
    • Xiaoli Kang, Chengzhi Zhang, Huilin Wang. Survey on Bilingual Terminology Extraction from Comparable Corpora. New Technology of Library and Information Service, 2009, (10): 7-13. (in Chinese with English abstract)
    • Dan Wu, Daqing He, Huilin Wang, Chongde Shi, Chengzhi Zhang. Does Query Length Matter? A Comparison of Query Expansion Methods in English-Chinese Cross-Language Information Retrieval. Journal of Computational Information Systems, 2008, 4(3): 1213-1222.
     
     
     
     
     
     
     