Research Agenda

I aspire to improve language models by drawing inspiration from human cognition. My research is centered on harnessing the wealth of knowledge from cognitive science to develop better language models that can act as companions and collaborators, ultimately improving human lives.

Lately, I've been interested in making language models more concept-aware. Concepts play a pivotal role in various cognitive abilities such as categorization, learning, communication, planning, and decision-making.

Concepts are the glue that holds our mental model of the world together. They can be thought of as hierarchical categories, used to comprehend the world (similar to knowledge graphs). Concepts can be concrete (“soup”) or abstract (“tasty”). They can also be complex, e.g., “lactose-free and gluten-free chocolate soufflé”. 
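To make this more concrete, here is a toy sketch (my own illustration, not a formal definition from my work) of how such hierarchical, composable concepts could be represented; the `Concept` class and the food examples are purely hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Concept:
    """A toy concept node: a named category with a parent and more specific children."""
    name: str
    parent: Optional["Concept"] = None
    children: List["Concept"] = field(default_factory=list)

    def refine(self, name: str) -> "Concept":
        """Create a more specific child concept, e.g. "food" -> "soup"."""
        child = Concept(name, parent=self)
        self.children.append(child)
        return child

# Concrete and complex concepts live at different depths of the hierarchy.
food = Concept("food")
soup = food.refine("soup")
dessert = food.refine("dessert")
souffle = dessert.refine("lactose-free and gluten-free chocolate soufflé")
```

A static tree like this is of course too rigid; the next paragraph describes why I want concepts that are generated flexibly, depending on context.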

I wish to automatically generate flexible, context-dependent concepts at different levels of abstraction, similar to human thinking. I believe that the ability to extract context-dependent representations is crucial for artificial agents and will have an overarching impact across many areas. Thus, I wish to create models that capture the abstraction into concepts that humans perform when analyzing text. This would better resemble human intuition, since we often ignore superficial signals (e.g., style) and distill the message the text aims to convey.

One of the main tasks LMs are used for is text completion: given a sequence with missing words, the objective is to find the most likely completion. However, their performance is far from perfect. One notable problem is surface form competition: different surface forms compete for probability mass even when they represent the same underlying concept in a given context, e.g., “computer” and “PC” (or even more trivial differences such as capitalization). This splits the probability mass among variations of the same concept, distorting the ranking.
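As a rough illustration of both the problem and one possible mitigation, the sketch below scores several surface forms with an off-the-shelf causal LM and then aggregates their probability mass per concept. The prompt, the concept clusters, and the use of GPT-2 via Hugging Face transformers are assumptions made for the example, not part of any specific method:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Total log-probability the model assigns to `completion` given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    # The logits at position i predict the token at position i + 1,
    # so each completion token is scored against the previous position.
    return sum(
        log_probs[0, pos - 1, full_ids[0, pos]].item()
        for pos in range(prompt_len, full_ids.shape[1])
    )

prompt = "She saved the document on her"
# Hypothetical concept clusters: each concept groups several surface forms.
concepts = {
    "computer": [" computer", " PC", " laptop"],
    "phone": [" phone", " smartphone"],
}
for concept, forms in concepts.items():
    form_logps = {form: completion_logprob(prompt, form) for form in forms}
    # Summing probability mass over surface forms ranks concepts rather than strings.
    concept_mass = sum(math.exp(lp) for lp in form_logps.values())
    print(concept, {f.strip(): round(lp, 2) for f, lp in form_logps.items()}, round(concept_mass, 5))
```

Ranking by the aggregated concept mass rather than by individual strings is in the spirit of concept-aware completion; the hard part, of course, is deciding which surface forms belong to the same concept in a given context.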

This problem emphasizes the need for concepts in order to create more robust LMs that can truly understand human language by abstracting the superficial words into the concepts they convey. Moreover, I believe that shifting language models from the token level to the concept level will enhance their performance on downstream NLP tasks.

However, context-dependent concept generation is a challenging, ambitious goal. In my recent EMNLP'23 paper, “Towards Concept-Aware Large Language Models”, I started exploring ways to make language models more concept-aware. Currently, I am working on optimizing concept-aware language models. If you find this direction interesting and wish to hear more about it -- let's talk!