Malihe Alikhani

I am a Ph.D. candidate in the Department of Computer Science at Rutgers University, working with Matthew Stone. Find out about my research, publications, bio and professional services.

Research Blurb: A wide range of communicative artifacts involve the coordinated presentation of visual and linguistic information. My research reflects the long-term scientific challenge of understanding how people formulate and understand such presentations. I work towards designing conversational systems that can synthesize multimodal presentations to convey information to human users and systems that can deal appropriately with the multimodal communication produced by people.

Email: ma1195 AT       GitHub     Twitter 

Multimodal Interaction
Spoken and Multimodal Dialogue Systems 

Communication is often achieved by coordinated visual and linguistic representations. Documents include images and diagrams, and face-to-face conversations are accompanied by iconic gestures. There seems to be some kind of magic that happens in multimodal communication that allows interlocutors to achieve a shared understanding. This is partly due to the systematic and conventional rules that govern the interpretation and generation of visual and spatial representations and partly due to the psychological meanings that they carry. What are these systematic rules? To what extent can we use well-developed natural language techniques to understand the organization of multimodal presentations? What is the cognitive basis for the understanding and design of diagrams?
Although humans interpret such multimodal forms of communication effortlessly, this is a very difficult task for computers. The difficulty is due in part to our limited scientific knowledge of the structure and organization of these presentations. I work towards designing conversational systems that can synthesize multimodal presentations to convey information to human users and systems that can deal appropriately with the multimodal communication produced by people.  Here are some of my publications related to this topic. 
What's Going on in Images and Captions, M. Alikhani, M. Stone, submitted to NAACL 2019. 
  1. CITE: A Corpus of Image-Text Discourse Relations, M. Alikhani, S. Nag Chowdhury, G. de Melo, M. Stone, In Proceedings of NAACL19.
  2. "Caption" as a Coherence Relation: Evidence and Implications, M. Alikhani, M. Stone, In Proceedings of NAACL19, Workshop on Shortcomings in Vision and Language.
  3. A Coherence Approach to Data-Driven Inference in Visual Communication, M. Alikhani, T. Hiippala, M. Stone, CVPR2019 Workshop on Language and Vision.
  4. Multimodal Strategies for Ambiguity Resolution in Conversational Systems, M. Alikhani, E. Selfridge, M. Stone, M. Johnston, In submission.
  5. Arrows are the Verbs of Diagrams, M. Alikhani, M. Stone, In Proceedings of COLING2018, the 27th International Conference on Computational Linguistics.
  6. Exploring Coherence in Visual Explanations, M. Alikhani, M. Stone, In Proceedings of First International Workshop on Multimedia Pragmatics. 

Natural Language Generation

Effective communication depends on using language to refer to objects and entities around us. Words can flexibly refer to different ranges of continuous values in different contexts. This variability is most apparent with relative gradable adjectives such as "long" and "short". The use of these words seems to vary across people, objects, and contextual expectations. For instance, one may refer to a person as tall in one context but may not refer to the same person as a tall basketball player. How does forced choice with set alternatives affect vague terms? Will the absence of a vague term affect the category boundaries of neighboring terms? How do expectations for vague terms allow for effective communication? These questions are discussed in the following papers. To gain insight into people's expectations for vague words, we have looked at two vague categories, probability and color. The results show that the flexibility of vague terms depends on how well defined their categories are. For example, basic color terms are argued to have well-defined, non-overlapping categories, whereas probability terms flexibly refer to different values as a function of the available alternative choices.
  1. Vague Categories in Communication, M. Alikhani, K. Persaud, B. McMahan, K. Pei, P. Hemmer, M. Stone, In preparation.
  2. The Influence of Alternative Terms on Speakers’ Choice of Vague Description, M. Alikhani, K. Persaud, B. McMahan, K. Pei, P. Hemmer, M. Stone, In submission.
  3. When is Likely Unlikely: Investigating Variability of Vagueness, K. Persaud, B. McMahan, M. Alikhani, K. Pei, P. Hemmer, M. Stone, In Proceedings of the Cognitive Science Society Conference.

Generating Referring Expressions

Natural language generation is concerned with generating linguistic material from non-linguistic material. Referring expressions are the ways we use language to refer to entities around us. How do people produce such expressions? What drives the understanding and making of choices in producing referring expressions? How can we efficiently compute the properties to include in a description, such that it successfully identifies the target while not triggering false conversational implicatures? Basic algorithms take as input an intended referent, a knowledge base of entities characterized by properties expressed as attribute-value pairs, and a context consisting of other entities that are salient. To generate a distinguishing referring expression, they choose a set of attribute-value pairs that uniquely identifies the intended referent. These are my publications related to this topic.
  1. Designing Grounded Representations for Semantic Coordination, B. McMahan, M. Alikhani, M. Stone, In preparation.
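
The basic attribute-selection procedure described in this section can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the method of the publication listed above: entities are modeled as plain dictionaries of attribute-value pairs, and the function name `distinguishing_description`, the fixed preference order over attributes, and the example entities are all hypothetical. The sketch walks through the attributes in order and keeps a property only when it rules out at least one remaining distractor, which avoids including properties that do no work.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an entity is a mapping attribute -> value.
Entity = Dict[str, str]


def distinguishing_description(
    target: Entity,
    distractors: List[Entity],
    preferred_attributes: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Greedily pick attribute-value pairs of `target` until no distractor matches.

    Returns None when the available attributes cannot single out the target.
    """
    remaining = list(distractors)
    description: List[Tuple[str, str]] = []
    for attribute in preferred_attributes:
        if not remaining:
            break  # the target is already uniquely identified
        if attribute not in target:
            continue
        value = target[attribute]
        still_matching = [d for d in remaining if d.get(attribute) == value]
        # Keep the property only if it excludes at least one distractor.
        if len(still_matching) < len(remaining):
            description.append((attribute, value))
            remaining = still_matching
    return description if not remaining else None


# Assumed toy context: one target and two salient distractors.
target = {"type": "dog", "color": "black", "size": "small"}
distractors = [
    {"type": "dog", "color": "white", "size": "small"},
    {"type": "cat", "color": "black", "size": "small"},
]
description = distinguishing_description(
    target, distractors, ["type", "color", "size"]
)
print(description)  # [('type', 'dog'), ('color', 'black')]
```

In this toy context "size" is never added, because type and color already exclude both distractors. When a distractor shares every listed attribute value with the target, the function returns None, signaling that no distinguishing description exists over those attributes.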