Portfolio of Md Touhidul Islam

Identifying Crucial Objects in Blind and Low-Vision Individuals’ Navigation

(ASSETS'24)

Problem:
Blind and low-vision (BLV) individuals face numerous challenges while navigating urban and indoor environments, and current computer vision models lack the object-level awareness needed to help. Existing datasets such as ImageNet and MS-COCO lack annotations for many objects crucial to safe and effective BLV navigation (for example, overhanging branches, sidewalk pits, and bus stops), which hinders the development of accessible navigation aids.

Solution:
We created a new object taxonomy (see Table 1) tailored to BLV navigation by:

Analyzing 21 publicly available videos of BLV individuals navigating varied environments.
Identifying 80+ relevant objects by reviewing the videos and noting those that affected navigation, either through direct interaction or as potential hazards (see Table 1).
Conducting a focus group study with six participants (two blind, two low-vision, and two sighted experts) to revise the list, resulting in a curated set of 90 objects across 15 semantic categories (see Fig. 2).
Labeling 31 video segments with the presence/absence of these objects.
Benchmarking seven vision models (YOLOv7, Mask R-CNN, RAM, BLIP, etc.) on this dataset to assess how well existing models recognize these accessibility-critical objects (see Figs. 1 and 3).
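
The presence/absence labeling and benchmarking steps above can be illustrated with a minimal Python sketch. It assumes each video segment is reduced to a set of object names and that scores are micro-averaged over all segments; the averaging scheme, the function name presence_f1, and the example labels are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Set


def presence_f1(truth: List[Set[str]], predicted: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall, and F1 over per-segment object sets.

    Assumption: an object counts as a true positive when it appears in both
    the annotated set and the predicted set for the same segment.
    """
    tp = fp = fn = 0
    for true_objs, pred_objs in zip(truth, predicted):
        tp += len(true_objs & pred_objs)   # present and predicted
        fp += len(pred_objs - true_objs)   # predicted but not present
        fn += len(true_objs - pred_objs)   # present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for two segments (not data from the study).
truth = [{"bus stop", "tree branch"}, {"sidewalk pit"}]
predicted = [{"bus stop"}, {"sidewalk pit", "bench"}]
scores = presence_f1(truth, predicted)
print(scores)
```

In this toy case the model finds two of the three annotated objects and adds one spurious label, so precision, recall, and F1 all come out to two thirds.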

Outcome:
Benchmarking against the curated object list revealed that modern vision models performed poorly at recognizing many of the 90 objects. Detection and segmentation models typically recognized only 12–15 objects, while open-vocabulary models (RAM, BLIP) had broader but still incomplete coverage. BLIP achieved the best F1 score (0.48) but still missed several safety-critical objects (e.g., barrier stumps, tree branches).
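
One rough way to estimate how much of a taxonomy a closed-vocabulary detector can cover at all is to compare the taxonomy against the model's fixed label list. The sketch below assumes exact name matching after lowercasing and trimming; the function name taxonomy_coverage and the sample lists are hypothetical, and the matching procedure actually used in the study may differ.

```python
from typing import Iterable, List, Tuple


def taxonomy_coverage(taxonomy: Iterable[str], model_labels: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split taxonomy objects into those a model's label set names and those it lacks.

    Assumption: names match exactly once lowercased and stripped of whitespace.
    """
    vocab = {label.strip().lower() for label in model_labels}
    covered, missing = [], []
    for obj in taxonomy:
        (covered if obj.strip().lower() in vocab else missing).append(obj)
    return sorted(covered), sorted(missing)


# Hypothetical taxonomy excerpt and model label list (not the study's data).
taxonomy = ["Bus stop", "Tree branch", "Bench", "Sidewalk pit"]
model_labels = ["person", "bench", "bus stop"]
covered, missing = taxonomy_coverage(taxonomy, model_labels)
print(f"covered {len(covered)} of {len(covered) + len(missing)}: {covered}")
print(f"missing: {missing}")
```

Exact matching undercounts when a model uses a synonym (for example, "pothole" for a sidewalk pit), so a real comparison would likely need a manually checked synonym map.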

Impact:
This work exposes a crucial gap in accessibility-oriented AI: state-of-the-art models cannot yet support safe, real-time navigation for BLV users. By releasing this benchmark dataset and object taxonomy, we provide a foundation for creating more inclusive, context-aware navigation tools and training data for future accessibility-focused AI systems.

Paper | GitHub

Tools: ggplot2, Figma, SPSS, Pandas, Python
