Multimodal Interaction

Spoken and Multimodal Dialogue Systems 
Communication is often achieved by coordinated visual and linguistic representations. Documents include images and diagrams, and face-to-face conversations are accompanied by iconic gestures. There seems to be some kind of magic in multimodal communication that allows interlocutors to achieve a shared understanding. This is partly due to the systematic and conventional rules that govern the interpretation and generation of visual and spatial representations, and partly due to the psychological meanings that they carry. What are these systematic rules? To what extent can we use well-developed natural language techniques to understand the organization of multimodal presentations? What is the cognitive basis for understanding and designing diagrams? 
Although humans interpret such multimodal forms of communication effortlessly, this is a very difficult task for computers. The difficulty is due in part to our limited scientific knowledge of the structure and organization of these presentations. I work towards designing conversational systems that can synthesize multimodal presentations to convey information to human users and systems that can deal appropriately with the multimodal communication produced by people.  Here are some of my publications related to this topic. 
  • Multimodal Strategies for Ambiguity Resolution in Conversational Systems, M. Alikhani, E. Selfridge, M. Stone, M. Johnston, submitted to AAAI 2019. 
  • Arrows are the Verbs of Diagrams, M. Alikhani, M. Stone, In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics.
  • Exploring Coherence in Visual Explanations, M. Alikhani, M. Stone, In Proceedings of the First International Workshop on Multimedia Pragmatics. 

Natural Language Generation

Effective communication depends on using language to refer to objects and entities around us. Words can flexibly refer to different ranges of continuous values in different contexts. This variability is most apparent with relative gradable adjectives such as "long" and "short". The use of these words seems to vary across people, objects, and contextual expectations. For instance, one may refer to a person as tall in one context but may not refer to the same person as a tall basketball player. How does forced choice with set alternatives affect vague terms? Will the absence of a vague term affect the category boundaries of neighboring terms? How do expectations for vague terms allow for effective communication? The work listed below addresses these questions. To gain insight into people's expectations for vague words, we have looked at two vague categories, probability and color. The results show that the flexibility of vague terms depends on how well defined their categories are. For example, basic color terms are argued to have well-defined, non-overlapping categories, whereas probability terms flexibly refer to different values as a function of the available alternative choices. 
  • Vague Categories in Communication, M. Alikhani, K. Persaud, B. McMahan, K. Pei, P. Hemmer, M. Stone, In preparation.

Generating Referring Expressions

Natural language generation is concerned with producing linguistic material from non-linguistic material. Referring expressions are the ways we use language to refer to entities around us. How do people produce such expressions? What drives how choices are made and understood in producing referring expressions? How can we efficiently compute the properties to include in a description, such that it successfully identifies the target while not triggering false conversational implicatures? To generate a distinguishing referring expression, basic algorithms take three inputs: an intended referent, a knowledge base of entities characterised by properties expressed as attribute-value pairs, and a context consisting of other entities that are salient. From these they choose a set of attribute-value pairs that uniquely identifies the intended referent. These are my publications related to this topic.
  • Designing Grounded Representations for Semantic Coordination, B. McMahan, M. Alikhani, M. Stone, In preparation.
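
The basic attribute-selection procedure described above can be illustrated with a minimal Python sketch. It is not taken from the publications listed here; it is a simple greedy selection over a fixed preference order, and the function name, the entity identifiers, the toy knowledge base, and the preference order are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each entity maps attributes to values.
Entity = Dict[str, str]


def distinguishing_description(
    target: str,
    knowledge_base: Dict[str, Entity],
    context: List[str],
    preferred_attributes: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Greedily pick attribute-value pairs of the target that rule out
    every other salient entity; return None if that is impossible."""
    referent = knowledge_base[target]
    # Distractors are the salient entities other than the target.
    distractors = [e for e in context if e != target]
    description: List[Tuple[str, str]] = []

    for attribute in preferred_attributes:
        if not distractors:
            break  # the target is already uniquely identified
        if attribute not in referent:
            continue
        value = referent[attribute]
        # Distractors that share this value are not ruled out by it.
        remaining = [
            d for d in distractors
            if knowledge_base[d].get(attribute) == value
        ]
        # Keep the pair only if it excludes at least one distractor,
        # so the description carries no redundant properties.
        if len(remaining) < len(distractors):
            description.append((attribute, value))
            distractors = remaining

    return description if not distractors else None


# Toy example (assumed data): pick out d1 among two dogs and a cat.
kb = {
    "d1": {"type": "dog", "color": "black", "size": "small"},
    "d2": {"type": "dog", "color": "white", "size": "small"},
    "c1": {"type": "cat", "color": "black", "size": "small"},
}
print(distinguishing_description("d1", kb, ["d1", "d2", "c1"],
                                 ["type", "color", "size"]))
```

In this toy context the sketch selects the type and the color of the target and skips size, since size excludes no remaining distractor. Skipping properties that rule nothing out is one simple way to avoid over-specified descriptions, which can trigger false conversational implicatures.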