Colin Cherry
Research Scientist, Google Translate, Montreal
Google Scholar: Link
E-mail address: colin.a.cherry@gmail.com
Previously, I was a Research Officer in Text Analytics at the National Research Council Canada and a Researcher in the Natural Language Processing group at Microsoft Research. Before that, I was a PhD student at the University of Alberta.
I work in Natural Language Processing and Machine Translation. I'm interested in learning how to best use all of the exciting advances across AI to build specialist systems that can master a well-defined task and carry it out efficiently, with translation being my primary focus.
I'm a proud member of the Association for Computational Linguistics. You can find nearly everything I've done at the ACL Anthology.
Service Highlights
General Chair: NAACL 2025
Chair, North American Chapter of the Association for Computational Linguistics (NAACL) Executive Board: 2020 - 2021
Secretary, North American Chapter of the Association for Computational Linguistics (NAACL) Executive Board: 2016 - 2019
Research Program Chair, The 13th Conference of The Association for Machine Translation in the Americas (AMTA 2018)
Founding co-organizer of the Workshop on Deep Learning for Low Resource NLP (DeepLo) at ACL 2018, EMNLP 2019 and NAACL 2022
ACL Rolling Review Senior Action Editor: 2022 - Present
Transactions of the ACL (TACL) Action Editor: 2016 - Present
Computational Linguistics Journal Editorial Board: 2013 - 2015
Area Chair: EACL 2024 (Senior Area Chair for ML), EMNLP 2023 (MT), EMNLP 2021 (MT), ACL-IJCNLP 2021 (MT), EMNLP 2020 (ML), EMNLP 2019 (MT), IJCNLP 2017 (MT), ACL 2014 (MT)
Best Paper Selection Committee Member for ACL 2014
Publications Chair for HLT-NAACL 2013
Workshop Program Chair for HLT-NAACL 2012
Publications
Please see my Google Scholar page for an up-to-date list.
2021
Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry, Assessing Reference-Free Peer Evaluation for Machine Translation, NAACL 2021.
Daniel Li, Te I, Naveen Arivazhagan, Colin Cherry, Dirk Padfield, Sentence Boundary Augmentation For Neural Machine Translation Robustness, ICASSP 2021.
2020
Julia Kreutzer, George Foster, Colin Cherry, Inference Strategies for Machine Translation with Conditional Masking, EMNLP, November 2020.
Markus Freitag, George Foster, David Grangier, Colin Cherry, Human-Paraphrased References Improve Neural Machine Translation, WMT, November 2020.
Naveen Arivazhagan, Colin Cherry, Te I, Wolfgang Macherey, Pallavi Baljekar and George Foster, Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation, ICASSP, May 2020.
Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey and George Foster, Re-translation versus Streaming for Simultaneous Translation, IWSLT, July 2020.
2019
Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen and Yonghui Wu, Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, arXiv, July 2019.
Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li and Colin Raffel, Monotonic Infinite Lookback Attention for Simultaneous Machine Translation, In Proceedings of ACL, July 2019.
Colin Cherry and George Foster, Thinking Slow about Latency Evaluation for Simultaneous Machine Translation, arXiv, May 2019.
Saeed Najafi, Colin Cherry and Grzegorz Kondrak, Efficient sequence labeling with actor-critic training, Canadian Conference on Artificial Intelligence, May 2019.
Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen and the Lingvo Team, Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling, arXiv, February 2019.
2018
Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey, Revisiting Character-Based Neural Machine Translation with Capacity and Compression, In Proceedings of EMNLP, October 2018.
2017
Pierre Isabelle, Colin Cherry and George Foster, A Challenge Set Approach to Evaluating Machine Translation, In Proceedings of EMNLP, September 2017. [Follow up here.]
Boxing Chen, Colin Cherry, George Foster and Samuel Larkin, Cost Weighting for Neural Machine Translation Domain Adaptation, In Proceedings of the First Workshop on Neural Machine Translation, August 2017.
Chi-kiu Lo, Boxing Chen, Colin Cherry, George Foster, Samuel Larkin, Darlene Stewart and Roland Kuhn, NRC Machine Translation System for WMT 2017, In Proceedings of WMT, September 2017.
2016
Boxing Chen, Roland Kuhn, George Foster, Colin Cherry and Fei Huang, Bilingual Methods for Adaptive Training Data Selection for Machine Translation, In Proceedings of AMTA, October 2016.
Chi-kiu Lo, Colin Cherry, George Foster, Darlene Stewart, Rabib Islam, Anna Kazantseva and Roland Kuhn, NRC Russian-English Machine Translation System for WMT 2016, In Proceedings of WMT, August 2016.
Colin Cherry, An Empirical Evaluation of Noise Contrastive Estimation for the Neural Network Joint Model of Translation, In Proceedings of NAACL, June 2016.
Mohammad Salameh, Colin Cherry and Grzegorz Kondrak, Integrating Morphological Desegmentation into Phrase-based Decoding, In Proceedings of NAACL, June 2016.
Saif M Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, Colin Cherry, SemEval-2016 Task 6: Detecting Stance in Tweets, In Proceedings of SemEval, June 2016.
Saif M Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, Colin Cherry, A Dataset for Detecting Stance in Tweets, In Proceedings of LREC, May 2016.
2015
Colin Cherry, Hongyu Guo and Chengbi Dai, NRC: Infused Phrase Vectors for Named Entity Recognition in Twitter, In Proceedings of the ACL 2015 Workshop on Noisy User-generated Text (W-NUT), July 2015.
Colin Cherry and Hongyu Guo, The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition, In Proceedings of NAACL, June 2015.
Garrett Nicolai, Colin Cherry and Grzegorz Kondrak, Inflection Generation as Discriminative String Transduction, In Proceedings of NAACL, June 2015.
Mohammad Salameh, Colin Cherry and Grzegorz Kondrak, What Matters Most in Morphologically Segmented SMT Models?, In Proceedings of the Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST), June 2015.
Garrett Nicolai, Colin Cherry and Grzegorz Kondrak, Morpho-syntactic Regularities in Continuous Word Representations: A Multilingual Study, In Proceedings of the Workshop on Vector Space Modeling for NLP, June 2015.
2014
Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, Saif M Mohammad, NRC-Canada-2014: Detecting aspects and sentiment in customer reviews, In Proceedings of SemEval, August 2014.
Mohammad Salameh, Colin Cherry, Grzegorz Kondrak, Lattice Desegmentation for Statistical Machine Translation, In Proceedings of ACL, June 2014.
Boxing Chen, Colin Cherry, A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU, In Proceedings of the Workshop on Statistical Machine Translation (WMT), June 2014.
2013
Michel Galley, Chris Quirk, Colin Cherry, Kristina Toutanova, Regularized Minimum Error Rate Training, In Proceedings of EMNLP, October 2013.
Colin Cherry, Improved Reordering for Phrase-Based Translation using Sparse Features, In Proceedings of NAACL, June 2013.
Mohammad Salameh, Colin Cherry, Grzegorz Kondrak, Reversing Morphological Tokenization in English-to-Arabic SMT, In Proceedings of the NAACL Student Research Workshop, June 2013.
Colin Cherry, Xiaodan Zhu, Joel Martin, Berry de Bruijn, À la Recherche du Temps Perdu: Extracting temporal relations from medical text in the 2012 i2b2 NLP challenge, Journal of the American Medical Informatics Association (JAMIA), March 2013.
2012
Wei Xu, Alan Ritter, Bill Dolan, Ralph Grishman and Colin Cherry, Paraphrasing for Style, In Proceedings of COLING, December 2012.
Colin Cherry, Robert C. Moore and Chris Quirk, On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding, In Proceedings of the NAACL Workshop on Statistical Machine Translation, June 2012.
Colin Cherry and George Foster, Batch Tuning Strategies for Statistical Machine Translation, In Proceedings of NAACL, June 2012. [Updated with improved scores for PRO]
Colin Cherry, Saif M. Mohammad and Berry de Bruijn, Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes, Biomedical Informatics Insights 5 (Suppl. 1), January 2012.
2011
Alan Ritter, Colin Cherry and Bill Dolan, Data-Driven Response Generation in Social Media, In Proceedings of EMNLP, July 2011.
Berry de Bruijn, Colin Cherry, Svetlana Kiritchenko, Joel Martin and Xiaodan Zhu, Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010, Journal of the American Medical Informatics Association (JAMIA), May 2011.
Svetlana Kiritchenko and Colin Cherry, Lexically-Triggered Hidden Markov Models for Clinical Document Coding, In Proceedings of ACL-HLT, June 2011. [Dictionary]
Colin Cherry and Shane Bergsma, Joint Training of Dependency Parsing Filters through Latent Support Vector Machines, In Proceedings of ACL-HLT, June 2011. [Code]
2010
Shane Bergsma and Colin Cherry, Fast and Accurate Arc Filtering for Dependency Parsing, In Proceedings of COLING, August 2010.
Alan Ritter, Colin Cherry and Bill Dolan, Unsupervised Modeling of Twitter Conversations, In Proceedings of NAACL, June 2010.
Sittichai Jiampojamarn, Colin Cherry and Grzegorz Kondrak, Integrating Joint n-gram Features into a Discriminative Training Framework, In Proceedings of NAACL, June 2010.
2009
Kristina Toutanova and Colin Cherry, A Global Model for Joint Lemmatization and Part-of-speech Prediction, In Proceedings of ACL, August 2009.
Colin Cherry and Hisami Suzuki, Discriminative Substring Decoding for Transliteration, In Proceedings of EMNLP, August 2009.
Hoifung Poon, Colin Cherry, and Kristina Toutanova, Unsupervised Morphological Segmentation with Log-Linear Models, In Proceedings of NAACL-HLT, June 2009. [Best Paper]
Susan Bartlett, Grzegorz Kondrak, and Colin Cherry, On the Syllabification of Phonemes, In Proceedings of NAACL-HLT, June 2009.
Nguyen Bach, Stephan Vogel, and Colin Cherry, Cohesive Constraints in A Beam Search Phrase-based Decoder, In Proceedings of NAACL-HLT, June 2009.
2008
Colin Cherry and Chris Quirk, Discriminative, Syntactic Language Modeling through Latent SVMs, In Proceedings of AMTA (Association for Machine Translation in the Americas), October 2008.
Colin Cherry, Cohesive Phrase-Based Decoding for Statistical Machine Translation, In Proceedings of ACL: HLT, June 2008.
Sittichai Jiampojamarn, Colin Cherry, and Grzegorz Kondrak, Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion, In Proceedings of ACL: HLT, June 2008.
Susan Bartlett, Grzegorz Kondrak, and Colin Cherry, Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion, In Proceedings of ACL: HLT, June 2008. [Best Student Paper]
2007
Colin Cherry and Dekang Lin, Inversion Transduction Grammar for Joint Phrasal Translation Modeling, In Proceedings of the NAACL-HLT/AMTA Workshop on Syntax and Structure in Statistical Translation (SSST), April 2007.
2006
Colin Cherry and Dekang Lin, Soft Syntactic Constraints for Word Alignment through Discriminative Training, In Proceedings of COLING/ACL, July 2006.
Qin Wang, Colin Cherry, Dan Lizotte, and Dale Schuurmans, Improved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization, In Proceedings of CoNLL, June 2006.
Colin Cherry and Dekang Lin, A Comparison of Syntactically Motivated Word Alignment Spaces, In Proceedings of EACL, April 2006.
2005
Colin Cherry and Shane Bergsma, An Expectation Maximization Approach to Pronoun Resolution, In Proceedings of CoNLL, June 2005.
Chris Quirk, Arul Menezes, and Colin Cherry, Dependency Treelet Translation: Syntactically Informed Phrasal SMT, In Proceedings of ACL, June 2005.
2004
Chris Quirk, Arul Menezes, and Colin Cherry, Dependency Tree Translation: Syntactically Informed Phrasal SMT, Microsoft Research Technical Report MSR-TR-2004-113, November 2004.
2003
Colin Cherry and Dekang Lin, A Probability Model to Improve Word Alignment, In Proceedings of ACL, July 2003.
Dekang Lin and Colin Cherry, Word Alignment with Cohesion Constraint, In Proceedings of HLT/NAACL, May 2003.
Dekang Lin and Colin Cherry, ProAlign: Shared Task Description, In Proceedings of the HLT/NAACL Workshop on Building and Using Parallel Texts, May 2003.