Helping people navigate the world - through sound
Introduction
Many people with visual impairments struggle to move safely and confidently through everyday environments because traditional assistive technologies are often expensive and difficult to use in dynamic, real-world settings. We created VisionNav to address this challenge by turning the iPhone’s camera feed and LiDAR data into real-time audio cues, allowing users to “hear” their surroundings and navigate spaces with greater confidence.
Features
Navigate Safely: audio cues steer the user toward clear paths
3-D audio cues direct the user: left and right via sound in the corresponding AirPod, up and down via varying pitch (see the sketch after this list)
Task Completion: ask for help with a task
Target acquisition - find an object
A Gemini-interpreted voice command tells the app which object to identify with YOLO within a set frame
Miscellaneous tasks - e.g., read a paper aloud
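As a concrete illustration of the 3-D audio cue, here is a minimal sketch using AVFoundation: stereo pan selects the left or right AirPod, and a pitch shift encodes up and down. The class name, tone file, and parameter ranges are illustrative assumptions, not VisionNav's actual code.

```swift
import AVFoundation

// Illustrative sketch (not VisionNav's actual code): a player whose
// stereo pan encodes left/right and whose pitch shift encodes up/down.
final class DirectionalCuePlayer {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    private let timePitch = AVAudioUnitTimePitch()
    private let beep: AVAudioFile   // a short tone bundled with the app

    init(toneFile url: URL) throws {
        beep = try AVAudioFile(forReading: url)
        engine.attach(player)
        engine.attach(timePitch)
        engine.connect(player, to: timePitch, format: beep.processingFormat)
        engine.connect(timePitch, to: engine.mainMixerNode, format: beep.processingFormat)
        try engine.start()
        player.play()
    }

    /// horizontal in -1...1 (-1 = left AirPod, +1 = right);
    /// vertical in -1...1, mapped to +/- one octave of pitch shift.
    func cue(horizontal: Float, vertical: Float) {
        player.pan = max(-1, min(1, horizontal))            // AVAudioMixing.pan
        timePitch.pitch = 1200 * max(-1, min(1, vertical))  // in cents
        player.scheduleFile(beep, at: nil, completionHandler: nil) // fire one beep
    }
}
```

Calling cue(horizontal: -1, vertical: 0.5), for instance, would beep in the left AirPod at a raised pitch.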
User Manual
Understanding the navigation process:
When the user starts to move in a general direction, VisionNav divides the camera frame into five equal-width columns and, based on the objects in frame, chooses the column with the clearest path. The user is then directed right or left by an audio cue in the corresponding AirPod, incorporating the 3-D audio aspect of our app; the column-scoring step is sketched below.
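A minimal sketch of the column-scoring step, assuming a per-pixel LiDAR depth buffer for the frame; the function name and the mean-depth "clearance" heuristic are illustrative assumptions:

```swift
/// depths: per-pixel depth values for one camera frame, row-major.
func bestColumnPan(depths: [Float], width: Int, height: Int, columns: Int = 5) -> Float {
    // Score each column by its mean depth: farther readings mean a clearer path.
    var clearance = [Float](repeating: 0, count: columns)
    var counts = [Int](repeating: 0, count: columns)
    let colWidth = max(width / columns, 1)
    for y in 0..<height {
        for x in 0..<width {
            let c = min(x / colWidth, columns - 1)
            clearance[c] += depths[y * width + x]
            counts[c] += 1
        }
    }
    for c in 0..<columns where counts[c] > 0 {
        clearance[c] /= Float(counts[c])
    }
    // Pick the clearest column and map index 0...columns-1 onto pan -1...+1,
    // so the cue pulls the user toward the open path.
    let best = clearance.indices.max { clearance[$0] < clearance[$1] }!
    return (Float(best) / Float(columns - 1)) * 2 - 1
}
```

The returned pan value can feed a cue player like the one sketched above, pulling the user toward the clearest column.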
Understanding the target acquisition process:
When the user speaks the dedicated command to trigger the target acquisition phase, the app actively looks for the user's hand using the YOLO model, along with the object specified in the user's instruction. While neither the hand nor the target object is in frame, no audio cue plays. We recommend that once the user sets the frame of reference, they hold that scene constant. The user can then move left and right to align their hand with the object along the positive z (forward) axis; a distinct audio cue confirms alignment. As the user proceeds forward, the audio cues get faster, and once the hand is in range a final sound tells the user to grab or feel for the object. One way to map distance to cue tempo is sketched below.
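A minimal sketch of the distance-to-tempo mapping, with illustrative thresholds (the app's actual ranges may differ):

```swift
import Foundation

// Illustrative sketch: the hand-to-target distance along z sets the beep
// interval, so cues speed up on approach. Thresholds are assumptions.
func beepInterval(forDistance meters: Float,
                  grabRange: Float = 0.1,    // within 10 cm: final "grab" cue
                  farRange: Float = 1.5) -> TimeInterval? {
    if meters <= grabRange { return nil }     // nil = play the final sound instead
    let t = min(max((meters - grabRange) / (farRange - grabRange), 0), 1)
    return TimeInterval(0.15 + 0.85 * t)      // 1.0 s when far, 0.15 s when close
}
```

Returning nil signals the caller to stop beeping and play the final grab sound.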
Understanding the miscellaneous tasks:
The user can vocally prompt the app to help perform a task, which is handled with the same behaviors as the tasks described above. The vocal prompt is interpreted by the Gemini model incorporated into the app, as sketched below.
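A minimal sketch of the command-routing step, assuming the GoogleGenerativeAI Swift SDK; the prompt wording and the NAVIGATE / FIND / READ vocabulary are illustrative assumptions:

```swift
import GoogleGenerativeAI

// Illustrative sketch assuming the GoogleGenerativeAI Swift SDK: a transcribed
// utterance is routed to one of the app's task types. The prompt wording and
// command vocabulary are assumptions, not necessarily what VisionNav ships.
func classifyCommand(_ utterance: String, apiKey: String) async throws -> String {
    let model = GenerativeModel(name: "gemini-1.5-flash", apiKey: apiKey)
    let prompt = """
    You route voice commands for a navigation app for blind users.
    Reply with exactly one of: NAVIGATE, FIND <object>, READ.
    Command: "\(utterance)"
    """
    let response = try await model.generateContent(prompt)
    return response.text ?? "NAVIGATE"   // fall back to the safe default
}
```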
Demo
Impact
VisionNav gives people with visual impairments the freedom to move through their environments with confidence rather than limitation. The result is greater trust, reduced reliance on costly devices, and the ability to participate fully in everyday life.
Future
We envision a world where people with visual impairments can navigate confidently without relying on costly or inaccessible tools. VisionNav will continue to grow through wearable integration such as smart glasses, GPS-based outdoor guidance, and AI-powered recognition of landmarks and moving obstacles. The result will be everyday movement that is not only safer and more seamless, but also more liberating.