State Machine
Pixel-to-Real-World Coordinate Transformations
Control Node
Data Collection and Labelling
YOLO11n and CNN Model Training
Vision and Speech to Text Nodes
Pipeline for Vision/Audio Node
Handling of YOLO Bounding Boxes for CNN Model
OpenCLIP for Label/User-Input Encoding and Comparison
Gemini Speech to Text API for user input command
MediaPipe Hand Landmark for palm detection
For this project, we were tasked with using natural communication methods to convey an intended task to a Locobot robot. The goals of this project were to use robot perception to perceive the environment, use audio to hear the person, and use existing libraries (or ones we develop) to take this input and determine the task the human is requesting. Then, using computer vision, we must perceive the environment, control the motion of the robot, and complete the requested task.
The three tasks that we must complete are:
Object Retrieval: Teams will use natural language to request an object and bring the desired object back to a particular location.
Sequential Task: Teams may use natural language to request that a series of actions be performed in the environment.
Group-Chosen Task: The team is given the freedom to pick a collaborative task.
For the third task, our group chose to implement a drop-off task in which the robot hands the retrieved object to a human's open hand. We successfully programmed the Locobot to parse a spoken command from a human, retrieve the requested object, and place it at a specified location.
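The overall flow described above — hear a command, perceive the scene, retrieve the object, then place or hand it off — can be sketched as a simple state machine. This is an illustrative sketch only; the state and function names below are hypothetical and do not correspond to the project's actual code.

```python
from enum import Enum, auto

class State(Enum):
    """Hypothetical high-level states of the command pipeline."""
    LISTEN = auto()    # speech-to-text produces a transcript
    PERCEIVE = auto()  # vision node matches the command to detected objects
    RETRIEVE = auto()  # robot navigates to and grasps the object
    PLACE = auto()     # robot places or hands off the object
    DONE = auto()

def run_pipeline(transcript: str) -> list:
    """Walk the state machine once for a single spoken command.

    Returns the sequence of states visited, which makes the
    control flow easy to inspect in a test.
    """
    history = []
    state = State.LISTEN
    while state is not State.DONE:
        history.append(state.name)
        if state is State.LISTEN:
            # In the real system the transcript would come from a
            # speech-to-text service; here we just keyword-match it.
            command = transcript.lower()
            actionable = "bring" in command or "place" in command
            state = State.PERCEIVE if actionable else State.DONE
        elif state is State.PERCEIVE:
            # Object detection and command matching would happen here.
            state = State.RETRIEVE
        elif state is State.RETRIEVE:
            state = State.PLACE
        elif state is State.PLACE:
            state = State.DONE
    history.append(state.name)
    return history
```

Modeling the pipeline as explicit states keeps each node (audio, vision, control) independently testable, since every transition depends only on the current state and its input.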
The source code can be found here. The code is hosted on Google Drive (rather than GitHub) because Google disabled our API keys when we pushed code containing them to GitHub.