Current Project and My Research Interests
Injecting low-level concepts into open-source VLMs
VLMs struggle to understand low-level concepts such as spatial relations and numerical values, so we inject these concepts by fine-tuning the models.
Keywords I’m interested in: LLM, VLM, Planning, Chain-of-Thought, Embodied Agent, Foundation Models, Embodied Reasoning, In-Context Learning, Zero-Shot, Semantic Reasoning, LoRA, Feedback, Reinforcement Learning.
Preprints
[17th, July 2024] Limitation of Prompt Engineering with Gemini-Flash in Embodied Planning from Ego-Centric Video
PDF: https://drive.google.com/file/d/13SImnJ96m8qxHlPPtTs4CjTWGRRgpIqZ/view?usp=sharing
[20th, March 2024] Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs
arXiv: https://arxiv.org/abs/2403.13801
Project page: https://natural-language-as-policies.github.io/
Supervisors: Andrew Melnik, Jun Miura, Ville Hautamäki
Research history
[2022-2024] Embodied Agent with LLMs
Researched embodied control using LLMs. Conducted experiments on simulated benchmarks. Applied the Chain-of-Thought framework to improve embodied planning accuracy. The work is available as a preprint and has been submitted to a top conference; we are awaiting reviewer comments.
Preprint: https://arxiv.org/abs/2403.13801
Project page: https://natural-language-as-policies.github.io/
[2020-2022] Time series forecasting using the Transformer architecture
Worked on time series prediction with the Transformer architecture. Conducted experiments in PyTorch. Improved accuracy by devising new ways to apply the attention mechanism.
[2018-2020] Deep Reinforcement Learning on Educational Robot Arm
Implemented a simple framework that lets people learn about deep reinforcement learning on an educational robot arm. A convolutional neural network processed images of the scene to capture scene information.