
Research

Multilingual, Multiview Text Categorization collection

My colleague Massih Amini and I are making available the pre-processed version of the Reuters corpus data that we used in our papers on multilingual, multiview document categorization:

Reuters RCV1/RCV2 Multilingual, Multiview Text Categorization Test collection

It is a comparable corpus of around 110K documents, each originally written in one of five languages (English, French, German, Italian and Spanish). Each document has also been translated into the other four languages using a Statistical Machine Translation system.

As there are few benchmarks for multilingual text categorization, we hope this will help promote comparison between systems.
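The multiview structure described above can be pictured with a minimal Python sketch. It assumes a hypothetical in-memory layout: `MultiviewDocument`, `build_document` and the `translate` callback are illustrative names only, not the collection's actual file format or the SMT system we used. Each document keeps its original text plus one machine-translated view per other language, giving five views per document.

```python
from dataclasses import dataclass, field

# The five languages of the collection.
LANGUAGES = ("en", "fr", "de", "it", "es")


@dataclass
class MultiviewDocument:
    """One document seen in all five languages (hypothetical layout)."""
    doc_id: str
    source_lang: str                           # language the document was written in
    views: dict = field(default_factory=dict)  # language code -> text

    def is_complete(self) -> bool:
        # Complete when the original plus the four translations are present.
        return set(self.views) == set(LANGUAGES)


def build_document(doc_id, source_lang, text, translate):
    """Build the five views: the original text plus one translation per other language.

    `translate(text, src, tgt)` is a hypothetical stand-in for an SMT system.
    """
    if source_lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {source_lang}")
    views = {source_lang: text}
    for tgt in LANGUAGES:
        if tgt != source_lang:
            views[tgt] = translate(text, source_lang, tgt)
    return MultiviewDocument(doc_id, source_lang, views)


if __name__ == "__main__":
    # Dummy translator that only tags the text, for illustration.
    fake_smt = lambda text, src, tgt: f"[{src}->{tgt}] {text}"
    doc = build_document("d1", "fr", "Les marchés ont progressé.", fake_smt)
    print(sorted(doc.views))  # ['de', 'en', 'es', 'fr', 'it']
    print(doc.is_complete())  # True
```

Under this layout, a collection of around 110K documents holds roughly 550K texts in total (110K documents × 5 views).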

Talks

(tbc)

Program committees

2010: EAMT-10, SIGIR-10, CoNLL-10.

2009: SIGIR-09, CORIA-09, EMNLP-09, Canadian AI graduate symposium, special issue of the TAL journal on machine learning for NLP, Machine Translation special issue on pushing the frontier of Statistical Machine Translation.

2008: CORIA-08.

2007: CORIA-07, EMNLP-07.

2006: IIIA-06, MLIA-06.

In recent years I have also reviewed for Neural Computation, Computational Linguistics, ACM Transactions on Information Systems, IEEE Transactions on Neural Networks, SHARCNet, IEEE Signal Processing Letters, Agence Nationale de la Recherche, IEEE Transactions on Biomedical Engineering, and IEEE Transactions on Audio, Speech and Language Processing.