Zach
My research investigates whether a lightweight, multitask Vision Transformer-based model can accurately and efficiently track critical eye movements for the purpose of optimizing VR hardware and user experience.
This project explores the viability of a novel computer vision architecture for analyzing eye movements in virtual reality (VR) to support foveated rendering. VR depends on total user immersion to be effective, and drops in performance can disrupt that experience. Artificial intelligence can help optimize VR hardware so that such disruptions occur less often. Conducted with support from my mentor and principal investigator, Dr. Sai Zhang, my research explores the potential utility of lightweight, multitask Vision Transformers (ViTs) in foveated rendering. In foveated rendering, a computer vision model tells the VR headset where and when the user is looking, so that the headset can render that region at full detail while the periphery is maintained at lower graphical fidelity, similar to human vision. This frees processing power for tasks such as maintaining adequate frame rates, which improves the user experience. Built on a ViT backbone, my model will be trained on Meta’s OpenEDS dataset and divides its work among three output heads: 2D gaze estimation, saccade detection, and blink detection. The estimation head uses a mean squared error loss function, and the two detection heads use a cross-entropy loss function; both penalize a head for the “mistakes” it makes as it learns. In preliminary training on Columbia’s Gaze Dataset, the error plateaued prematurely and the loss oscillated, which suggests that the model is failing to learn the relationships within the data. This failure may be influenced by many factors, including dataset reliability, inadequate image cropping, and model architecture issues. Despite these setbacks, ViTs may still provide a viable path forward.
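As a rough illustration of how the three heads' losses described above could be combined into a single training objective, the sketch below computes a mean squared error for the gaze head and a cross-entropy for each detection head, then sums them. It is a minimal plain-Python sketch, not the project's implementation: the function names, the two-class (event / no event) setup for saccades and blinks, and the equal default weights are all assumptions made for illustration.

```python
import math


def mse_loss(pred, target):
    """Mean squared error over the coordinates of one 2D gaze point."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy_loss(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(gaze_pred, gaze_true,
                   saccade_logits, saccade_label,
                   blink_logits, blink_label,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three head losses.

    The weights are hypothetical; equal weighting is an assumption,
    not something stated in the abstract.
    """
    w_gaze, w_saccade, w_blink = weights
    return (w_gaze * mse_loss(gaze_pred, gaze_true)
            + w_saccade * cross_entropy_loss(saccade_logits, saccade_label)
            + w_blink * cross_entropy_loss(blink_logits, blink_label))


if __name__ == "__main__":
    # One made-up example: normalized gaze coordinates plus two-class logits.
    loss = multitask_loss(
        gaze_pred=[0.48, 0.55], gaze_true=[0.50, 0.50],
        saccade_logits=[2.0, -1.0], saccade_label=0,
        blink_logits=[0.3, 0.1], blink_label=1,
    )
    print(f"combined loss: {loss:.4f}")
```

Because the gaze error and the classification errors sit on different scales, the relative weights in a setup like this can affect how stably the heads train together, so they are usually treated as tunable hyperparameters.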