One paper accepted at ICCV 2025
One paper accepted at ICPR 2024
One paper accepted at CVPR 2024
1st place in the SMART-101 Algorithmic Reasoning Challenge
One paper accepted at AAAI 2024
Research Highlights
This project explores spatial and temporal subspaces for video representation learning by introducing information-theoretic approach. Specifically, we aim to learn representations that are disentangled in both spatial and temporal by maximizing the coding rate reduction. As a result, we address the challenging problem of action recognition in a compositional generalization setting, where current vision-language models (VLMs) struggle to perform effectively.
Long-term video is composed of multiple complex semantics. We aim to solve challenging problems like the 'cups and balls trick game' by considering semantic units that make up semantics and disentangling the composed relationships among semantic units.
Research to enhance the performance of prediction of protein structure which is pivotal in determining their functions. The study involves investigating multiple sequence alignments(MSA) and prediction models for protein structure prediction.
Video Question Answering is a task that receives video and question as inputs and outputs an answer to the question. In order to solve this problem efficiently, we are conducting research on how to clearly grasp the spatio-temporal semantic structure of video and extract the core semantic structure.
Video scene graph generation has been an emerging research topic, which aims to interpret a video as a temporally-evolving graph structure by representing video objects as nodes and their relationships as edges. Compared to the task of scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames. We are working on embedding these temporal information into the model by extracting it effectively from semantic features in a frame.
The personlalized multimodal dialog research using personalized memory
Developing a deep learning algorithm for predicting global weather forecasting using multimodal data.