Research

Vision-based robot interaction

Training robots to perform manipulation tasks from visual inputs is challenging. A common approach is to train an end-to-end model from scratch on domain-specific data of robots interacting with the physical world. However, collecting such large and diverse datasets can be expensive, and the resulting models may not generalize well. We are exploring several research avenues in this domain, such as extracting generic visual representations from human video data and incorporating natural language cues to guide robot manipulation. This could circumvent the need for labor-intensive data collection specific to each robotic task, enabling robots to perform well across a range of unseen scenarios.

Tabular data understanding

While research on unstructured data (such as images and text) has developed extensively, there has been a shortage of studies focused on extracting information from structured data, particularly tables. Large language models (LLMs) struggle with understanding two-dimensional structures like tables, as they are primarily trained on one-dimensional text data. To address this limitation, we are conducting research on table question answering (QA), aiming to enhance LLMs' understanding of tables through techniques such as fine-tuning or prompt engineering.
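As an illustration of the prompt-engineering route, the sketch below flattens a two-dimensional table into a one-dimensional text prompt that an LLM can consume. It is a minimal example under stated assumptions, not a description of our method: the function names (`table_to_markdown`, `build_table_qa_prompt`), the markdown-style row layout, the prompt wording, and the toy table are all hypothetical.

```python
def table_to_markdown(header, rows):
    """Flatten a 2-D table into 1-D text, one pipe-delimited row per line."""
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def build_table_qa_prompt(header, rows, question):
    """Wrap the serialized table and the question in a simple instruction."""
    table_text = table_to_markdown(header, rows)
    return (
        "Answer the question using only the table below.\n\n"
        f"{table_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy data, invented for illustration only.
header = ["item", "units_sold"]
rows = [["widget", 3], ["gadget", 5]]
prompt = build_table_qa_prompt(header, rows, "Which item sold more units?")
print(prompt)
```

The choice of serialization (row order, delimiters, whether headers are repeated) is itself a design variable in table QA, since the model only ever sees the flattened string.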

Retrieval augmented generation

While large language models (LLMs) exhibit impressive performance, they struggle to encapsulate all information within their parameters and to incorporate the latest knowledge. In response, retrieval augmented generation (RAG) has emerged as a promising solution. We are investigating question answering (QA) with RAG, focusing on multiple modalities such as tables, text, and images. We are particularly interested in multi-hop QA, where answering a question requires gathering information from multiple pieces of evidence or sources.
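To make the multi-hop setting concrete, here is a toy sketch of two-hop retrieval over text passages. It assumes a deliberately simple token-overlap retriever in place of a real dense or sparse retriever; the helper names (`tokenize`, `retrieve`, `two_hop_retrieve`) and the three-passage corpus are invented for illustration. The second hop expands the query with the first retrieved passage, so that an entity named only in that passage can lead to the next piece of evidence.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, exclude=()):
    """Return the index of the passage sharing the most tokens with the query."""
    q = tokenize(query)
    candidates = [i for i in range(len(corpus)) if i not in exclude]
    return max(candidates, key=lambda i: len(q & tokenize(corpus[i])))


def two_hop_retrieve(question, corpus):
    """Retrieve one passage, then retrieve again with the expanded query."""
    first = retrieve(question, corpus)
    expanded = question + " " + corpus[first]
    second = retrieve(expanded, corpus, exclude={first})
    return [corpus[first], corpus[second]]


# Toy corpus with invented facts, for illustration only.
corpus = [
    "The Blue Lab was founded by Dana Reyes.",
    "Dana Reyes grew up in Eastvale.",
    "Eastvale hosts a yearly kite festival.",
]
question = "Where did the founder of the Blue Lab grow up?"
evidence = two_hop_retrieve(question, corpus)
print(evidence)
```

The question alone has little overlap with the second passage; only after the first hop surfaces the founder's name does the passage containing the answer become retrievable. In a full RAG pipeline, the collected evidence would then be passed to the LLM together with the question.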

Black box optimization

We address the challenge of optimizing functions in the absence of derivative information. While zeroth-order optimization shows promise in solving such problems, its applications have been restricted to low-dimensional scenarios. Our goal is to overcome these limitations, unlocking the potential for black box optimization in contexts such as LLM adaptation and privacy-preserving distributed learning.
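For readers unfamiliar with zeroth-order methods, the sketch below shows one common variant: a two-point random-direction gradient estimate plugged into plain gradient descent. It only ever evaluates the function and never its derivative. This is a minimal illustration, not our algorithm; the function names, the smoothing radius `mu`, the sample count, the step size, and the quadratic test function are all assumptions chosen for the example.

```python
import random


def zo_gradient(f, x, mu=1e-3, num_samples=20, rng=random):
    """Estimate the gradient of f at x from function values only.

    Averages (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random
    Gaussian directions u.
    """
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * u[i] / num_samples
    return grad


def zo_minimize(f, x0, lr=0.05, steps=300, seed=0):
    """Gradient descent driven by the zeroth-order estimate above."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = zo_gradient(f, x, rng=rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def sphere(x):
    """Toy black-box objective: sum of squares, minimized at the origin."""
    return sum(xi * xi for xi in x)


x_final = zo_minimize(sphere, [1.0, -2.0, 0.5])
print(sphere(x_final))
```

Each estimate costs `2 * num_samples` function evaluations, and the number of random directions needed for a useful estimate generally grows with the dimension of `x`, which is one way to see why such methods become costly in high-dimensional settings.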