LLM, VLM, Planning, Chain-of-Thought, Embodied Agent, Foundation Models, Embodied Reasoning, In-Context Learning, Zero-Shot, Semantic Reasoning, LoRA, Feedback, Reinforcement Learning, DPO, RAG, Memory, Action, Context-Awareness, Multi-Agents.
[17th, July 2024] Limitation of Prompt Engineering with Gemini-Flash in Embodied Planning from Ego-Centric Video
ICML 2024 WORKSHOP: Multi-modal Foundation Model meets Embodied AI
PDF: https://drive.google.com/file/d/13SImnJ96m8qxHlPPtTs4CjTWGRRgpIqZ/view?usp=sharing
I tackled next-action prediction from a first-person (ego-centric) perspective: given a video, the model had to choose the correct next action from four candidates. The task was part of a competition held at an academic conference. I experimented with various prompt engineering techniques and examined where Gemini-Flash falls short.
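The four-way multiple-choice setup can be sketched as a prompt builder plus an answer parser. This is only an illustration and not the prompts used in the submission: `build_prompt`, `parse_choice`, and the prompt wording are hypothetical, and the call to the model itself is left out.

```python
import re

LETTERS = "ABCD"

def build_prompt(task_goal, options):
    """Format a four-option next-action question as plain text (hypothetical wording)."""
    if len(options) != 4:
        raise ValueError("expected exactly four candidate actions")
    lines = [
        f"Task goal: {task_goal}",
        "Based on the video so far, which action should be taken next?",
    ]
    lines += [f"{letter}. {option}" for letter, option in zip(LETTERS, options)]
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)

def parse_choice(reply):
    """Return the index (0-3) of the first standalone capital A-D in the reply, or None.

    Deliberately naive: a reply that starts with the article "A" would be misread,
    which is one reason to ask the model for a single letter only.
    """
    match = re.search(r"\b([ABCD])\b", reply)
    return LETTERS.index(match.group(1)) if match else None
```

Constraining the reply to one letter keeps parsing simple, at the cost of discarding any reasoning the model might otherwise produce.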
[20th, Mar 2024] Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs
Arxiv: https://arxiv.org/abs/2403.13801
Project page: https://natural-language-as-policies.github.io/
Supervisors: Andrew Melnik, Jun Miura, Ville Hautamäki
I conducted research on controlling a robotic arm with a Large Language Model (LLM). Rather than having the LLM call predefined skill functions, which is the common approach, I expressed the robot's skills in natural language to improve generalization. This grounds the robot's control directly in natural language.
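As a rough illustration of control expressed in natural language rather than function calls, the sketch below pulls coordinate targets out of a plan written as ordinary sentences. The plan text, `COORD_PATTERN`, and `extract_waypoints` are all hypothetical; the actual prompt and output format are described in the paper and are not reproduced here.

```python
import re

# Matches a parenthesised triple such as "(0.25, -0.10, 0.05)".
NUMBER = r"(-?\d+(?:\.\d+)?)"
COORD_PATTERN = re.compile(rf"\(\s*{NUMBER}\s*,\s*{NUMBER}\s*,\s*{NUMBER}\s*\)")

def extract_waypoints(plan_text):
    """Collect the first (x, y, z) target found on each line of a natural-language plan."""
    waypoints = []
    for line in plan_text.splitlines():
        match = COORD_PATTERN.search(line)
        if match:
            waypoints.append(tuple(float(v) for v in match.groups()))
    return waypoints

# Hypothetical LLM output: each step is a sentence, not a function call.
plan = """\
1. Move the gripper above the red block at (0.25, -0.10, 0.20).
2. Lower to (0.25, -0.10, 0.05) and close the gripper.
3. Lift and carry the block to (0.40, 0.15, 0.20)."""

waypoints = extract_waypoints(plan)
```

Because each step stays a readable sentence, the reasoning behind every coordinate remains visible alongside the number itself.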
Skills: Python, LLaVA, LLaVA-Next, ChatGPT, OpenAI, GPT-4, GPT-3, PyTorch, Hugging Face.
[2020-2022] Time Series Forecasting Using the Transformer Architecture
I built time series forecasting models with the Transformer architecture in PyTorch. I designed the attention mechanism to operate across the variables of multivariate data, so that the model captures relationships between the individual series, and this improved accuracy. I also used MLflow to track and manage experiments.
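One way to picture attention across multiple series is to treat each variable's window as a single token and let the variables attend to one another. The sketch below is an assumption about the general idea, not the project's architecture: it uses plain Python instead of PyTorch, has no learned projections, and `cross_series_attention` is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_series_attention(series):
    """Scaled dot-product attention where each token is one whole series.

    `series` is a list of equal-length windows, one per variable. Queries, keys,
    and values are the raw windows (no learned weights), so row i of the returned
    weight matrix shows how strongly variable i attends to every variable.
    """
    dim = len(series[0])
    weights = []
    for query in series:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in series
        ]
        weights.append(softmax(scores))
    # Each output window is the attention-weighted mix of all input windows.
    mixed = [
        [sum(w * value[t] for w, value in zip(row, series)) for t in range(dim)]
        for row in weights
    ]
    return weights, mixed

# Two similar series and one mirrored series.
windows = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [-1.0, -2.0, -3.0]]
weights, mixed = cross_series_attention(windows)
```

In this toy input the first series puts far more weight on the similar second series than on the mirrored third, which is the kind of cross-series relationship the attention weights are meant to expose.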
Skills: Python, MLflow, Dash, PyTorch
[2018-2020] Deep Reinforcement Learning on Educational Robot Arm
I implemented a minimal framework for applying deep reinforcement learning to an educational robotic arm. The scene was observed as images, which were processed by convolutional neural networks (CNNs). The agent was trained to minimize the distance between the target object and its destination. This work demonstrated the feasibility of the approach.
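The distance-minimizing objective can be sketched as a reward signal. The functions below are a hypothetical illustration, assuming the object and destination are available as 2D positions (for example, pixel coordinates detected in the camera image); the reward actually used in the project may differ.

```python
import math

def distance_reward(object_xy, goal_xy):
    """Negative Euclidean distance: the closer the object is to the goal, the higher the reward."""
    return -math.dist(object_xy, goal_xy)

def step_reward(prev_xy, new_xy, goal_xy):
    """Shaped variant: positive when a step reduces the distance to the goal, negative otherwise."""
    return math.dist(prev_xy, goal_xy) - math.dist(new_xy, goal_xy)
```

The first form rewards absolute closeness, while the second rewards progress per step, which is often easier to learn from when episodes start far from the goal.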
Skills: Python, OpenCV.