DisCoCat: A Monoidal Translation between English and Spanish

Graduate Student Mentors: Jade Master & Christian Williams

How do computers represent natural language efficiently? Not with dancing cats, but with a technique called distributional compositional categorical models of meaning (DisCoCat). In knowledge representation of natural language there are two camps: one represents grammar through systems of formal rules, and the other represents word meanings using statistics. The rule-based logical systems are good for understanding how words combine into sentences, but can be too rigid to model the open-ended complexity of language. Distributional methods, on the other hand, model complex linguistic meaning effectively but do not take the grammatical structure of language into account.

DisCoCat studies how to unify these two approaches using applied category theory. Kinds of grammatical units, such as verbs, adjectives, phrases and sentences, are assembled into a structure called a category, and this category is then modeled by a structure-preserving map (a functor) into a category of meanings. The ways in which these grammatical units can be plugged together are represented by diagrams similar to the one shown below.
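As a rough illustration of the idea above, the following Python sketch treats grammar as pregroup-style types that cancel against each other, and meaning as tensors that are contracted along the same pattern. It is a toy under stated assumptions, not the project's method: the type names, the two-dimensional noun space, the one-dimensional sentence space, and the words alice, bob and likes are all made up for this example.

```python
# Toy DisCoCat sketch. All types, dimensions and word vectors are
# hypothetical choices made for illustration only.

# Pregroup-style types: a noun has type n; a transitive verb has type
# n^r s n^l, written here as the strings "n.r", "s", "n.l".
NOUN = ("n",)
TRANSITIVE_VERB = ("n.r", "s", "n.l")

# Adjacent pairs that cancel in a pregroup: n * n^r <= 1 and n^l * n <= 1.
CANCELLING_PAIRS = {("n", "n.r"), ("n.l", "n")}


def reduce_types(types):
    """Repeatedly cancel adjacent pairs until no more reductions apply."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            if (types[i], types[i + 1]) in CANCELLING_PAIRS:
                del types[i:i + 2]
                changed = True
                break
    return tuple(types)


def sentence_meaning(subject, verb, obj):
    """Contract verb[i][k][j] with subject[i] and obj[j].

    The two contractions mirror the two type cancellations above and
    leave a vector in the sentence space, indexed by k.
    """
    sentence_dim = len(verb[0])
    return [
        sum(
            subject[i] * verb[i][k][j] * obj[j]
            for i in range(len(subject))
            for j in range(len(obj))
        )
        for k in range(sentence_dim)
    ]


# Nouns as basis vectors in a 2-dimensional noun space.
alice = [1, 0]
bob = [0, 1]

# "likes" as a tensor in N (x) S (x) N with a 1-dimensional sentence space:
# the only nonzero entry says "alice likes bob".
likes = [
    [[0, 1]],  # subject = alice
    [[0, 0]],  # subject = bob
]

grammatical = reduce_types(NOUN + TRANSITIVE_VERB + NOUN) == ("s",)
alice_likes_bob = sentence_meaning(alice, likes, bob)
bob_likes_alice = sentence_meaning(bob, likes, alice)
```

Here the type of "alice likes bob" reduces to the sentence type s, and the same wiring applied to the word tensors yields a sentence vector. In this sketch the structure-preserving map is implicit: each type cancellation corresponds to one summed index in sentence_meaning.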

Recent work has shown how to use category theory to develop methods of automatic translation between DisCoCat models. In this project we will explore methods of combining simpler DisCoCat models to build up complex models of natural languages. This project has connections to topological quantum field theory, linguistics, automatic translation and artificial intelligence.