Overview

1993 was a watershed year in the development of empirical methods for processing parallel corpora. Seminal publications by Gale and Church at Bell Labs (CL, 1993) and Brown and colleagues at IBM (CL, 1993) established the methodology, models, and algorithms that form the basis of the modern statistical approaches to machine translation and multilingual text processing. In that year the first Workshop on Very Large Corpora (which would ultimately become EMNLP) was also held, a sign of the broader sea change that transformed how problems in natural language processing are approached.

This workshop, co-located with EMNLP 2013 in Seattle, is an opportunity to look back on the 20-year history of statistical models of bitext processing and to ask where the field will be in another 20 years.

We are looking forward to a fun workshop! If you have any questions, please contact us at 20yearsofbitext@googlegroups.com.

Workshop Program (Friday, October 18)

 Time  Agenda  Speaker
 9:15 ~ 9:30  Welcome  Organizers
 9:30 ~ 10:00  What a Translation Model Tastes Like  Kevin Knight, USC/ISI
 10:00 ~ 10:30  TBD  Philipp Koehn, Edinburgh
 10:30 ~ 11:00  Coffee Break
 11:00 ~ 12:00  Oh, Yes, Everything's Right on Schedule, Fred [audio and transcript]  Peter F. Brown & Robert L. Mercer, Renaissance Technologies
 12:00 ~ 1:00  Panel Discussion: The Development of Statistical Machine Translation [audio and transcript]  Moderator: Philip Resnik, UMD
 1:00 ~ 3:00  Lunch
 3:00 ~ 3:30  Poster Spotlight Session  Workshop Participants
 3:00 ~ 4:30  Posters and coffee
 4:00 ~ 4:30  Google Translate: Past, Present, Future  Franz Josef Och, Google
 4:30 ~ 5:30  Panel Discussion: The Future of Bitext and Machine Translation  Panelists: Daniel Marcu (SDL), Dekai Wu (HKUST), Chris Quirk (Microsoft), Robert C. Moore (Google), Adam Lopez (JHU)

 Authors  Title
 Adam Lopez and Matt Post  Beyond Bitext: Five Open Problems in Machine Translation
 Daniel Zeman and Ondřej Bojar  Twenty Flavors of One Text
 Dekai Wu  What SMT Learns
 Heng Yu, Liang Huang, and Haitao Mi  Violation-Fixing Perceptron and Forced Decoding for Scalable MT Training
 Jim Chang, Joseph Chee Chang, Jian-cheng Wu, and Jason S. Chang  Aligning Words in Bitexts using the Bilingual Web
 Jim White  The Pedantic Javadoc Corpus: Comments and Code as Bitext
 Jörg Tiedemann, Lonneke van der Plas, and Begoña Villada Moirón  Bitexts as Semantic Mirrors
 Kilian Evang and Johan Bos  Using parallel corpora to bootstrap multilingual semantic parsers
 Marine Carpuat  Bitext as Word Sense Annotation: Lessons from Evaluating Machine Translation on Lexical Semantics Tasks
 Qing Dou and Kevin Knight  Beyond Parallel Data -- A Decipherment Approach to Machine Translation
 Seung-won Hwang  On Applying and Extending Bitext for Entity Translation
 ThuyLinh Nguyen  Lexicalized Reordering Model in Chart-based Machine Translation
 Waleed Ammar, Chris Dyer, and Noah A. Smith  Discrete Log-Linear Autoencoders for Unsupervised Learning of Linguistic Structure

Participation

We invite 2-page (including references) extended abstracts for poster presentations on the use of bitext in NLP, including:
  • Representations:  morphology, syntax, semantics, words, phrases;
  • Models:  generative, discriminative, Bayesian, hybrid, neural;
  • Speculation:  e.g., will we still be using Model 4 in 2033?;
  • Applications of bitext models:  machine translation, syntax, semantics, object recognition, topic modeling, paraphrase, textual entailment, question answering;
  • Learning:  latent variables, non-convex optimization, semi-supervision, spectral methods, information theoretic approaches;
  • Formalisms:  automata, transducers, grammars;
  • (and of course) Data: parallel corpora and other artifacts of multilinguality.
To encourage inclusiveness and the presentation of speculative and recent work, abstracts will not be published in a proceedings, but simply reviewed by the conference organizers and panelists to ensure that they are on topic and reasonably coherent. Abstracts on work published or submitted elsewhere are welcome.

Abstracts, indicating authors and their affiliations, should be submitted as an attachment in PDF format to 20yearsofbitext@googlegroups.com before 11:59PM PDT on Friday, August 30, 2013.

Important Dates

 Event  Date
 Abstract deadline  Friday, August 30, 2013
 Acceptance notification  Monday, September 9, 2013
 Workshop  Friday, October 18, 2013

Workshop Organizers

Sponsors

Google
