Description

Mathematical models of natural language semantics oscillate between two opposing approaches: word-based statistical models and sentence-based compositional models.

Word-based models rely on the ideas of Harris and Firth that words occurring in similar contexts have similar meanings. They gather co-occurrence information for words from large corpora, represent this information as vectors in a vector space, and reason about meaning similarity using measures such as geometric distance. These vector models can be traced back to Lund and Burgess 1996, Schütze 1998, and Lin 1998.
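
As a rough illustration of the word-based pipeline just described, the following sketch builds co-occurrence vectors from a toy corpus and compares words by cosine similarity. The corpus, the window size, and the function names are assumptions made for this example; none of them is taken from the cited work, and real models use far larger corpora and weighted counts.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical; stands in for a large corpus).
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fall on the market",
]

def cooccurrence_vectors(sentences, window=2):
    # Map each word to a Counter of the words seen within `window` positions of it.
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    # Cosine of the angle between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(CORPUS)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_market = cosine(vectors["cat"], vectors["market"])
print(sim_cat_dog, sim_cat_market)
```

On this toy data, "cat" comes out closer to "dog" than to "market", because the first two share more of their contexts.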

Compositional models, in the sense of Montague 1970, systematically associate the steps of a syntactic derivation with semantic operations acting on the interpretations of the constituents. With respect to word meanings, the compositional approach is agnostic, hence the joke: the meaning of life is LIFE. The compositional design comes out particularly well in categorial type-logics, where a derivation takes the form of a proof in a calculus of syntactic types, which is then mapped compositionally to a derivation in a semantic type calculus. Pioneering work on the calculus of syntactic types was done by Lambek 1958, 1961; in the 1980s, Van Benthem combined this with compositional interpretation along the lines of Curry's "proofs as programs" ideas.
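
To make the pairing of syntactic steps with semantic operations concrete, here is a minimal sketch restricted to the two application rules of a categorial grammar. It is not the full Lambek calculus, which also has rules for hypothetical reasoning. The type encoding, the three-word lexicon, and the function names are assumptions made for this example. Word meanings are left as uninterpreted constants, in the spirit of the joke above.

```python
# Syntactic types: atoms are strings; ("/", B, A) encodes B/A (seeks an A to its
# right); ("\\", A, B) encodes A\B (seeks an A to its left). Hypothetical encoding.
NP, S = "np", "s"
IV = ("\\", NP, S)   # np\s: intransitive verb phrase
TV = ("/", IV, NP)   # (np\s)/np: transitive verb

# Lexicon entries pair a syntactic type with a meaning.
LEXICON = {
    "alice": (NP, "alice"),
    "bob": (NP, "bob"),
    "likes": (TV, lambda obj: lambda subj: f"likes({subj}, {obj})"),
}

def forward(left, right):
    # Syntactic step: B/A followed by A yields B.
    # Semantic step: apply the meaning of the left item to that of the right.
    (ltype, lsem), (rtype, rsem) = left, right
    if isinstance(ltype, tuple) and ltype[0] == "/" and ltype[2] == rtype:
        return (ltype[1], lsem(rsem))
    return None

def backward(left, right):
    # Syntactic step: A followed by A\B yields B.
    # Semantic step: apply the meaning of the right item to that of the left.
    (ltype, lsem), (rtype, rsem) = left, right
    if isinstance(rtype, tuple) and rtype[0] == "\\" and rtype[1] == ltype:
        return (rtype[2], rsem(lsem))
    return None

# Derivation of "alice likes bob": first verb + object, then subject + verb phrase.
verb_phrase = forward(LEXICON["likes"], LEXICON["bob"])
sentence = backward(LEXICON["alice"], verb_phrase)
print(sentence)
```

Each syntactic rule has exactly one semantic counterpart (function application), so the meaning of the sentence is fixed by its derivation and the lexicon.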

A recent wave of work combines these approaches to obtain vector representations of sentence meaning in a compositional way. Some references here are Mitchell and Lapata 2008, Clark and Pulman 2009, Baroni and Zamparelli 2010, Coecke, Clark, Sadrzadeh 2010, and Grefenstette and Sadrzadeh 2011. Despite their initial promise, these models are still in their infancy: they either model composition by a structure-forgetting operation such as vector addition, or restrict attention to small fragments of language such as adjective-noun combinations and transitive sentences.
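
The sense in which vector addition forgets structure can be seen in a small sketch. The three-dimensional word vectors below are made up by hand for illustration and do not come from any of the cited models.

```python
# Hypothetical hand-picked word vectors (not derived from any corpus).
WORD_VECTORS = {
    "dog": [1.0, 0.0, 2.0],
    "bites": [0.0, 3.0, 1.0],
    "man": [2.0, 1.0, 0.0],
}

def additive_sentence_vector(sentence):
    # Compose a sentence vector by summing its word vectors componentwise.
    total = [0.0, 0.0, 0.0]
    for word in sentence.split():
        total = [t + w for t, w in zip(total, WORD_VECTORS[word])]
    return total

v1 = additive_sentence_vector("dog bites man")
v2 = additive_sentence_vector("man bites dog")
print(v1 == v2)  # prints True
```

Because addition is commutative, the two sentences receive the same vector even though subject and object are swapped. A composition operation that is sensitive to grammatical structure would have to tell them apart.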

In the meantime, within the compositional type-logical group, a variety of techniques have been developed to overcome the expressive limitations of the original calculi, resulting in grammar logics that can face the computational confrontation with real data. One can think here of multimodal extensions of categorial grammars in the logical and combinatorial traditions; calculi of discontinuity combining concatenation and wrapping operations; hybrid type-logical grammars mixing directional and non-directional operations; and continuation-based semantic approaches. References include Moortgat 1996, 2009, Steedman 2001, Morrill-Valentin-Fadda 2011, Kubota-Levine 2013, and Barker-Shan 2014. What these approaches have in common is that interpretations are set up in terms of set-theoretic models, and that corpus-based statistical data for word meanings have not yet been incorporated.

This workshop is an attempt to bring together active researchers from these seemingly separate fields to address problems of both a theoretical and a practical nature. One major goal is to introduce the statistical researchers to the advanced type-logical techniques that have been developed to handle challenging grammatical phenomena; a second is to help researchers in the logical field enhance their systems with vector representations. The overall goal is to help both groups collaborate on systems where both word vectors and complex grammatical structures can be reasoned about in a compositional and computationally tractable way.