Current Project and My Research Interests

Injecting low-level concepts into open-source VLMs

VLMs struggle to understand low-level concepts such as spatial relations and numerical values, so we inject these concepts by fine-tuning the VLMs.

Keywords I’m interested in: LLM, VLM, Planning, Chain-of-Thought, Embodied Agent, Foundation Models, Embodied Reasoning, In-Context Learning, Zero-Shot, Semantic Reasoning, LoRA, Feedback, Reinforcement Learning.

Preprints

[17th, July 2024] Limitation of Prompt Engineering with Gemini-Flash in Embodied Planning from Ego-Centric Video

PDF: https://drive.google.com/file/d/13SImnJ96m8qxHlPPtTs4CjTWGRRgpIqZ/view?usp=sharing

[20th, Mar 2024] Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs

arXiv: https://arxiv.org/abs/2403.13801   Project page: https://natural-language-as-policies.github.io/

Supervisors: Andrew Melnik, Jun Miura, Ville Hautamäki

Research history

[2022-2024] Embodied Agent with LLMs

Researched embodied control using LLMs. Conducted experiments on simulated benchmarks and applied the Chain-of-Thought framework to improve embodied planning accuracy. This work is available as a preprint and has been submitted to a top conference; we are awaiting reviewer comments.

Preprint: https://arxiv.org/abs/2403.13801 Project Page: https://natural-language-as-policies.github.io/

[2020-2022] Time Series Forecasting with the Transformer Architecture

Worked on time series forecasting with the Transformer architecture. Conducted experiments in PyTorch and improved accuracy by devising new ways to use the attention mechanism.

[2018-2020] Deep Reinforcement Learning on an Educational Robot Arm

I implemented a simple framework that lets people learn about deep reinforcement learning on an educational robot arm. A convolutional neural network was used to capture scene information from images.