Research/Preprint

Keywords/Fields I’m interested in

LLM, VLM, Planning, Chain-of-Thought, Embodied Agent, Foundation Models, Embodied Reasoning, In-Context Learning, Zero-Shot, Semantic Reasoning, LoRA, Feedback, Reinforcement Learning, DPO, RAG, Planning, Memory, Action, Context-Awareness, Multi-Agents.

Preprints

[17th, July 2024] Limitation of Prompt Engineering with Gemini-Flash in Embodied Planning from Ego-Centric Video

ICML 2024 WORKSHOP: Multi-modal Foundation Model meets Embodied AI

PDF: https://drive.google.com/file/d/13SImnJ96m8qxHlPPtTs4CjTWGRRgpIqZ/view?usp=sharing

I tackled the problem of determining the next action to take from a first-person perspective. I had to choose one option from four choices. This was part of a competition held at a prestigious academic conference. I experimented with various prompt engineering techniques and explored the limitations of Gemini-Flash.

[20th, Mar 2024] Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs

Arxiv: https://arxiv.org/abs/2403.13801

Project page: https://natural-language-as-policies.github.io/

Supervisors: Andrew Melnik, Jun Miura, Ville Hautamäki

Research history

[2022-2024] Embodied Agent with LLMs

I conducted research on controlling a robotic arm using a Large Language Model (LLM). Instead of using the traditional method of simply calling functions, I focused on expressing the robot's skills at the natural language level to enhance generalization performance. This approach enabled grounding the robot's control in natural language.

Preprint: https://arxiv.org/abs/2403.13801

Project Page: https://natural-language-as-policies.github.io/

Skills: Python, LLaVA, ChatGPT, OpenAI, GPT4, GPT3, Pytorch, LLaVA-Next, Huggingface.

[2020-2022] Time series forecast using Transformer architecture

I conducted time series forecasting using the Transformer architecture and experimented with PyTorch. By optimizing the use of the attention mechanism, I was able to improve accuracy. Specifically, I designed the attention mechanism to capture the relationships between multiple time series by carefully applying it to multivariate time series data. Additionally, I used MLflow to streamline the management of experiments.

Skills: Python, MLflow, Dash, Pytorch

[2018-2020] Deep Reinforcement Learning on Educational Robot Arm [2018-2020]

I implemented a highly simple framework that allows for the application of deep reinforcement learning to an educational robotic arm. Using convolutional neural networks (CNNs), I captured scene information as images. The framework was trained to minimize the distance between the destination and the target object. This work demonstrated the feasibility of the approach.

Skills: Python, OpenCV.

Page updated

Google Sites

Report abuse