Workshop on Computational Neurolinguistics

NAACL HLT 2010 Workshops

Epilogue: The papers can now be accessed in full at the ACL website, and a selection of presentations is also available for download, linked below.

Programme, Sunday 6th June, 2010, Los Angeles

9:00-10:30
Invited Talk: Prof. Tom Mitchell, Carnegie Mellon University
Full Presentation: Learning Semantic Features for fMRI Data from Definitional Text (Pereira, Botvinick & Detre) [Presentation]

10:30-11:00 coffee

11:00-12:30
Full Presentation: Concept Classification with Bayesian Multi-Task Learning (van Gerven & Simanova) [Presentation]
Full Presentation: WordNet Based Features for Predicting Brain Activity associated with Meanings of Nouns (Jelodar, Alizadeh & Khadivi)
Short Presentation: Network Analysis of Korean Word Associations (Jung, Li & Akama)

12:30-13:30 lunch

13:30-15:00
Full Presentation: Detecting Semantic Category in Simultaneous EEG/MEG Recordings (Murphy & Poesio) [Presentation]
Short Presentation: Hemispheric processing of Chinese polysemy in the disyllabic verb/noun compounds: an event-related potential study (Huang & Lee) [Presentation]
Short Presentation: An Investigation on Polysemy and Lexical Organization of Verbs (Germann, Villavicencio & Siqueira)
Mini-Tutorial: Crash Course in Computational Neuroscience of Language (Murphy) [Presentation]

15:00-15:30 coffee

15:30-17:00
Mini-Tutorial: Crash Course in Computational Neuroscience of Language (Murphy, contd.)
Full Presentation: Acquiring Human-like Feature-Based Conceptual Representations from Corpora (Kelly, Devereux & Korhonen) [Presentation]
Full Presentation: Using fMRI Activation to Conceptual Stimuli to Evaluate Methods for Extracting Conceptual Representations from Corpora (Devereux, Kelly & Korhonen) [Presentation]

17:00-18:00 Discussion

Presentation Guidelines for authors can be found here.
  • Full paper oral presentations are 25 minutes in duration, plus 5 minutes for questions.
  • Short paper oral presentations are 15 minutes in duration, plus 5 minutes for questions.

Outline

Computational neurolinguistics is an emerging research area that integrates recent advances in computational linguistics and cognitive neuroscience, with the objective of developing cognitively plausible models of language and gaining a better understanding of the human language system. It builds on research in decoding cognitive states from recordings of neural activity, and on computational models of lexical representations and sentence processing. Published work in this area includes the discovery of semantic features in neural activity (Mitchell et al., 2008), using brain signals for the relative evaluation of corpus semantic models (Murphy et al., 2009), and recognizing the semantics of adjective-noun meaning composition (Chang et al., 2009).

Ongoing research focuses on topics such as brain-computer interfaces to provide dictation systems for paraplegic patients, and algorithms to perform tagging and shallow parsing of neural activity recorded during sentence comprehension. Both computational linguistics and neuroscience stand to gain from these techniques. In computational linguistics, the cognitive plausibility of language models has primarily been evaluated against collections of subjective intuitions (e.g. semantic feature norms, grammaticality judgments, corpus annotations, dictionaries). Evaluation of the large body of computational linguistics work based on data-driven distributional approaches has also relied on hand-crafted resources such as WordNet or data sets manually tagged with a predefined list of categories. Comparison with neural data may provide a more objective yardstick for both models and resources. In brain imaging, language-related research has often been limited to relatively coarse analyses (e.g. high-level features such as animacy or part-of-speech), but computational neurolinguistic methods have now leveraged the richness of corpus-based descriptions to extract finer-grained representations for single lexemes.

Advances in computational neurolinguistics require close collaboration between computational linguists and neuroscientists. To this end, an interdisciplinary workshop can play a key role in advancing existing and initiating new research. We hope that it will attract an interdisciplinary target audience consisting of computational linguists, machine learning researchers, computational neuroscientists and cognitive scientists.

Topics of Interest

  • Computational Linguistic Focus
    • Word-level analyses (e.g. corpus semantic models, lexica, lexical relations and ontologies, parts-of-speech, word senses, morphology)
    • Phrase-level analyses (e.g. word compounds, meaning composition in multi-word expressions)
  • Machine Learning Focus
    • Decoding of cognitive states from neural activity
    • Feature selection and data mining techniques for decoding linguistic information
  • Neural Science Focus
    • Brain imaging techniques: fMRI, EEG, MEG, NIRS, including cross-modality analysis (e.g. combining fMRI and EEG)
    • Localizing Regions of Interest (e.g. identify the roles / functions of brain regions)
  • Cognitive Science Focus
    • Comparisons with behavioral (e.g. priming experiments, eye-tracking, self-paced reading) and elicited data (e.g. semantic feature norms)
    • Biologically plausible connectionist approaches

Shared Data-Sets

Submissions based on any data-sets or tasks are welcomed, and originality of approach is encouraged. However, to assist researchers who are new to this topic, we are providing the data used in Mitchell et al. (2008) and Murphy et al. (2009), as well as a number of sample shared tasks. Submissions are welcome that follow the tasks in whole or in part, or that simply use them as an evaluation baseline for their own work. Performance will not be independently validated by the organizers, and will only be one of the criteria used to select among submissions.
  • The CMU fMRI data-set of 60 concrete concepts, in 12 categories, collected while nine English speakers were presented with 60 line drawings of objects with text labels and were instructed to think of the same properties of the stimulus object consistently during each presentation. For each concept there are 6 instances of ~20k neural activity features (brain blood oxygenation levels).
  • The Trento EEG data-set of 60 concrete concepts, in 2 categories (work tools and land mammals), collected while seven Italian speakers were silently naming photographic images that represent these concepts. For each concept there are 6 instances of ~15k neural activity features (spectral power in voltage signals).

Sample Shared Tasks

As noted above, submissions on any task are welcomed, and these tasks are primarily intended to provide a possible starting point for researchers who are new to the topic.
  • Concept-pair neural discrimination task: For two concepts randomly left out of training, teach a classifier to match recorded neural data to the correct lexeme. This may be achieved by taking advantage of corpus-based models of word meaning, as in published research, or otherwise. This task is based on the evaluation method used with fMRI data in Mitchell et al. (2008), and replicated with EEG data in Murphy et al. (2009).
  • Corpus semantic model evaluation task: Teach a classifier to predict the neural activity observed for single concepts, based on each of several corpus semantic models. The average similarity between observed activity and predicted activity over all concepts can be taken as a metric of corpus model fidelity.
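
For newcomers, the two sample tasks above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the procedure used in the cited papers or by the organizers: the data are synthetic, the 6 instances per concept are assumed to have been averaged already, and the ridge regression, the cosine similarity measure, and all function names (fit_ridge, leave_two_out, demo) are choices made for this example.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression (no intercept) from semantic to neural features."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def leave_two_out(semantic, neural, alpha=1.0):
    """Hold out every pair of concepts, train on the rest, and score the pair.

    Returns (pair matching accuracy, mean similarity between predicted
    and observed activity for the held-out concepts).
    """
    n = semantic.shape[0]
    correct, pairs, sims = 0, 0, []
    for i in range(n):
        for j in range(i + 1, n):
            train = [k for k in range(n) if k not in (i, j)]
            W = fit_ridge(semantic[train], neural[train], alpha)
            pred_i, pred_j = semantic[i] @ W, semantic[j] @ W
            sim_ii, sim_jj = cosine(pred_i, neural[i]), cosine(pred_j, neural[j])
            sim_ij, sim_ji = cosine(pred_i, neural[j]), cosine(pred_j, neural[i])
            # The pair counts as correct if the true assignment beats the swapped one.
            correct += (sim_ii + sim_jj) > (sim_ij + sim_ji)
            sims.extend([sim_ii, sim_jj])
            pairs += 1
    return correct / pairs, float(np.mean(sims))


def demo():
    """Run both measures on small synthetic data (hypothetical sizes)."""
    rng = np.random.default_rng(0)
    semantic = rng.normal(size=(12, 5))    # 12 concepts x 5 corpus-derived features
    true_map = rng.normal(size=(5, 40))    # hidden linear map to 40 neural features
    neural = semantic @ true_map + 0.1 * rng.normal(size=(12, 40))
    return leave_two_out(semantic, neural)


if __name__ == "__main__":
    accuracy, mean_similarity = demo()
    print(f"pair accuracy: {accuracy:.2f}, mean similarity: {mean_similarity:.2f}")
```

With real data, the semantic matrix would be replaced by one row of corpus-model features per concept and the neural matrix by the corresponding recorded activity. Running the same loop once per corpus semantic model and comparing the mean similarities gives one possible reading of the model evaluation task.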

Submissions

Authors are invited to submit full papers on original, unpublished work in the topic area of this workshop via the NAACL submission site. Submissions should be formatted using the NAACL 2010 stylefiles, prepared for blind review, and should not exceed 8 pages plus an extra page for references. The stylefiles are available at http://naaclhlt2010.isi.edu/authors.html. The PDF files will be submitted electronically through the NAACL submission system; the link will be available later. Each submission will be reviewed by at least two members of the programme committee. Accepted papers will be published in the workshop proceedings. Dual submissions to the main NAACL 2010 conference and this workshop are allowed; if you submit to the main session, indicate this when you submit to the workshop. If your paper is accepted for the main session, you should withdraw your paper from the workshop upon notification by the main session.

Important Dates

  • March 10, 2010: Deadline for submission of workshop papers
  • March 30, 2010: Notification of acceptance
  • April 12, 2010: Camera-ready papers due
  • June 6, 2010: Workshop date

Organisers

  • Brian Murphy, Centre for Mind/Brain Studies, University of Trento, Italy
  • Kai-min Kevin Chang, Language Technologies Institute, Carnegie Mellon University, USA
  • Anna Korhonen, Computer Laboratory, University of Cambridge, UK

Invited Speakers

  • Tom Mitchell, Carnegie Mellon University, USA

Program Committee

  • Afra Alishahi, Saarland University, Germany
  • Ben Amsel, University of Toronto, Canada
  • Stefano Anzellotti, Harvard University, USA
  • Colin Bannard, University of Texas Austin, USA
  • Marco Baroni, University of Trento, Italy
  • Gemma Boleda, Universitat Politècnica de Catalunya, Spain
  • Ina Bornkessel, Max Planck Leipzig, Germany
  • Augusto Buchweitz, Carnegie Mellon University, USA
  • George Cree, University of Toronto, Canada
  • Barry Devereux, University of Cambridge, UK
  • Katrin Erk, University of Texas Austin, USA
  • Stefan Evert, University of Osnabrück, Germany
  • Adele Goldberg, Princeton University, USA
  • Chu-Ren Huang, Hong Kong Polytechnic University, Hong Kong
  • Aravind Joshi, University of Pennsylvania, USA
  • Marcel Just, Carnegie Mellon University, USA
  • Frank Keller, University of Edinburgh, UK
  • Charles Kemp, Carnegie Mellon University, USA
  • Mirella Lapata, University of Edinburgh, UK
  • Chia-Ying Lee, Academia Sinica, Taiwan
  • Roger Levy, University of California San Diego, USA
  • Angelika Lingnau, University of Trento, Italy
  • Brad Mahon, University of Rochester, USA
  • Robert Mason, Carnegie Mellon University, USA
  • Diana McCarthy, Lexical Computing Ltd, UK
  • Ken McRae, University of Western Ontario, Canada
  • Tom Mitchell, Carnegie Mellon University, USA
  • Fermin Moscoso del Prado Martin, University of Provence, France
  • Sebastian Padó, University of Stuttgart, Germany
  • Francisco Pereira, Princeton University, USA
  • Massimo Poesio, University of Trento, Italy
  • Thierry Poibeau, CNRS and Ecole Normale Supérieure, France
  • Dean Pomerleau, Intel Labs Pittsburgh, USA
  • Ari Rappoport, Hebrew University of Jerusalem, Israel
  • Brian Roark, Oregon Health & Science University, USA
  • Kenji Sagae, University of Southern California, USA
  • Hinrich Schütze, Stuttgart University, Germany
  • Sabine Schulte im Walde, University of Stuttgart, Germany
  • Svetlana Shinkareva, University of South Carolina, USA
  • Nathaniel Smith, University of San Diego, USA
  • Aline Villavicencio, Federal University of Rio Grande do Sul, Brazil
  • David Vinson, University College London, UK
  • Yang ChinLung, City University of Hong Kong, China
