Multilingual, Multiview Text Categorization collection

My colleague Massih Amini and I are making available the pre-processed version of the Reuters corpus data that we used in our papers on multilingual, multiview document categorization: Reuters RCV1/RCV2 Multilingual, Multiview Text Categorization Test collection.

It is a comparable corpus containing around 110K documents, each written in one of 5 languages (English, French, German, Italian and Spanish). Each document has also been translated into the other 4 languages using a Statistical Machine Translation system. As there are few benchmarks for multilingual text categorization, we hope this will help promote comparison between systems.

Talks

(tbc)

Program committees

2010: EAMT-10, SIGIR-10, CoNLL-10.
2009: SIGIR-09, CORIA-09, EMNLP-09, Canadian AI graduate symposium, special issue of the journal TAL on machine learning for natural language processing, Machine Translation special issue on pushing the frontier of Statistical Machine Translation.
2008: CORIA-08.

In recent years I have also reviewed for Neural Computation, Computational Linguistics, ACM Transactions on Information Systems, IEEE Transactions on Neural Networks, SHARCNet, IEEE Signal Processing Letters, Agence Nationale de la Recherche, IEEE Transactions on Biomedical Engineering, and IEEE Transactions on Audio, Speech and Language Processing.