What if we could solve the biggest hurdle in AR, the vergence-accommodation conflict, with a new kind of display?
For years, light field technology was seen as the answer, but it was just an idea—without solid proof.
Augmented reality has moved beyond the realm of science fiction and is now a rapidly advancing technology. With industry leaders eyeing its potential, we’ve seen a wave of AR glasses enter the market, from the Epson Moverio to the HoloLens and Magic Leap.
So, why aren't we all wearing them yet?
While cost is a factor, a more fundamental issue lies in the hardware. Most commercial near-eye displays haven't resolved the Vergence-Accommodation Conflict (VAC). This means that prolonged use can cause users to experience uncomfortable symptoms like dizziness, nausea, and eye strain.
The VAC arises because most 3D displays have a single, fixed optical focal plane onto which the stereoscopic images are projected. Users can converge their eyes to fuse the stereo image pair, but they are still forced to accommodate at that fixed focal plane. The result is a persistent mismatch of depth cues between the object being viewed and the depth at which the eyes accommodate.
An illustration showing the VAC.
So if stereo displays are the problem, what's the answer?
Enter light field displays, a groundbreaking technology developed to do what traditional displays can’t: render virtual objects with truly correct depth and focus cues.
A light field display works by reproducing the light rays of a virtual object as if they were coming from a real object. Imagine light hitting your eyes from all directions, just as it would in the physical world. This is exactly what a light field display does, allowing your eyes to naturally adjust and focus on virtual objects at their correct depth, just like you would with real-world objects.
This isn't just a theory. Light field technology is rapidly gaining traction in both academia and the industry as the ultimate solution to the VAC problem, and it's easy to see why.
Despite all the promise, something was missing. No one had successfully proven that light field displays truly eliminated the VAC issue.
And the challenge runs even deeper. The existing evaluation metrics used to test these devices are often flawed. They're filled with confounding factors, applicable only to very specific scenarios, and far from being a universal standard for the industry.
But sometimes, the most complex problems have solutions hiding in plain sight.
We turned to the field of psychology, which has spent decades perfecting the art of separating different aspects of human perception and cognition. Using a classic visual search task, we were able to design an experiment that would do exactly what was needed: isolate the components that might be affected by VAC, allowing us to directly compare user performance between light field and conventional displays.
To prove that light field technology works, we needed a fair fight. We designed an experiment to compare a Light Field Glass (LFG) with a traditional AR Glass (ARG).
The LFG we used was a prototype from PetaRay Inc., while the ARG was an Epson VM-40 model. We made sure the experiment was as fair as possible by using two glasses that shared the exact same eyepieces. This allowed us to focus on the key differences that really matter: spatial resolution and how they render light.
Light Field AR Glasses (LFG)
Angular Resolution: 3 × 3
Spatial Resolution: 640 × 360 pixels
VM-40 Conventional AR Glasses (ARG)
Spatial Resolution: 1920 × 1080 pixels
Before we started, we made a crucial decision: we didn't have a specific hypothesis about which pair of glasses would perform better.
Why? Because going into an experiment without preconceived notions allows us to analyze the data more rigorously. A two-tailed test does not presuppose which display should come out ahead: an effect in either direction can be detected, while the overall risk of a false positive (a Type I error) stays controlled at the chosen significance level. This approach helped us focus purely on what the data had to say, rather than trying to prove a particular outcome.
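To make the two-tailed idea concrete, here is a minimal, self-contained sketch using an exhaustive sign-flip permutation test on made-up paired differences. This is an illustration of the concept only, not the analysis pipeline we actually used; the numbers are invented.

```python
from itertools import product

# Hypothetical paired RT differences (ARG minus LFG, in ms) for eight
# participants; these values are made up purely for illustration.
diffs = [120, -35, 80, 60, -10, 95, 40, 55]

observed = sum(diffs) / len(diffs)

# Exhaustive sign-flip permutation test: under the null hypothesis each
# difference is equally likely to be positive or negative, so we enumerate
# all 2^n sign assignments and count how often the mean is at least as
# extreme as the observed one in EITHER direction (two-tailed).
extreme = 0
total = 0
for signs in product((1, -1), repeat=len(diffs)):
    mean = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
    total += 1
    if abs(mean) >= abs(observed):
        extreme += 1

p_two_tailed = extreme / total
```

Counting both tails is what keeps the test agnostic about direction: a result where the ARG beats the LFG counts as evidence just as much as the reverse.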
We built two identical testing stations—one for the Light Field Glasses (LFG) and one for the conventional AR Glasses (ARG). Both pairs of glasses were securely mounted on special rails, and a large 24-inch monitor was placed directly in front of them. Crucially, we could precisely adjust the monitor's distance from the glasses, placing it anywhere from 20 cm to 60 cm.
During the test, participants sat in a chair, rested their heads on a chin rest (to keep their viewing angle steady), and looked through the glasses.
Finally, the entire setup was covered with a thick, blackout cloth. This was essential to create a consistent dark environment, eliminating any outside light or distractions so that the only thing participants saw was what the AR glasses were projecting.
We recruited a total of 34 participants (half female, half male) from NTU, with ages ranging from 18 to 28.
Before anyone began, we made sure the test was safe and fair:
Informed Consent: Every person understood the experiment and agreed to take part.
Vision Check: All participants had normal or near-normal vision (within ±0.75 diopters). To avoid confounding factors, we excluded anyone with a significant vision difference between their two eyes.
All the data we collected and processed strictly followed the rules set by the Research Ethics Committee of National Taiwan University (NTU-REC). Our commitment to ethical guidelines ensured the study was conducted safely and responsibly.
To test the glasses, we used a specific type of visual search challenge called "Find the TLT triplet," based on established methods from psychology (like the Guided Search framework). The key goal of this task is to force the user to look directly at every single item on the screen one by one.
Why is this important? Because the time it takes to find the target grows predictably with the number of items. This search also requires sharp focus (high visual acuity), making participants more likely to try to focus at the actual depth of the displayed text, a crucial point for testing the VAC issue.
Participants were asked to find the triplet "TLT" among a group of other "L" and "T" combinations. The items came in two colors, green and purple, and appeared in different quantities (or "set sizes"): 3, 6, or 12 items on the screen at once.
To test how each pair of glasses handles the virtual-real divide, we mixed virtual and real text. When there were 3 items, 2 of them were virtual. For the larger sets of 6 and 12, half of the items were virtual. We arranged them in an alternating pattern on the screen, as shown below, to ensure a fair test.
We made sure all the items were spread out randomly to prevent the "crowding effect," where items are too close together to see clearly. The participant's only job was to quickly determine if the target "TLT" was there or not, and press a key to record their answer. Each time they searched and responded counted as one "trial."
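The stimulus logic described above can be sketched roughly as follows. The function name and parameters are ours, and this is not the code used in the study; in particular, it ignores the on-screen spacing used to avoid the crowding effect.

```python
import random

def make_display(set_size, target_present, rng):
    # Hypothetical sketch of the stimulus logic: one "TLT" target when
    # present, plus three-letter L/T distractors that are never "TLT",
    # shuffled into a random order.
    items = ["TLT"] if target_present else []
    while len(items) < set_size:
        distractor = "".join(rng.choice("LT") for _ in range(3))
        if distractor != "TLT":
            items.append(distractor)
    rng.shuffle(items)
    return items

rng = random.Random(0)
display = make_display(6, True, rng)
```

Because every distractor is drawn from the same two letters as the target, the triplet cannot be found by a quick "pop-out" glance; the participant must inspect items one by one, which is exactly the property the task needs.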
Our experiment was run in two key ways to test the glasses under different conditions:
Pure Virtual Mode (VR Mode): Here, we turned the monitor off. The room was completely dark, and participants only saw the virtual text projected by the AR glasses. This tested how the glasses performed when rendering virtual objects alone, placing them at either 30 cm (close) or 60 cm (far).
Virtual-Real Integration Mode (AR Mode): This is the main event. Participants saw both the virtual text from the glasses and the real text displayed on the monitor. To ensure a fair comparison, we made a crucial design choice:
The virtual and real texts were interleaved, meaning they were mixed together to force the participant's eyes to process both simultaneously.
Most importantly, the virtual and real texts were always placed at the same depth (either 30 cm or 60 cm away).
The entire experiment was built on combining three core variables:
Two Modes: AR Mode and VR Mode.
Two Glasses: Light Field Glass (LFG) and traditional AR Glass (ARG).
Two Distances: 30 cm and 60 cm.
This gave us a total of eight unique test configurations (e.g., ARG-30, LFG-60, etc.). To make sure the order of testing didn't affect the results (like a participant getting fatigued), we carefully balanced the presentation order for everyone.
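As a sketch, the eight configurations, and one simple rotation-based way to balance their presentation order, can be enumerated like this. The rotation scheme is illustrative only; the study's exact counterbalancing procedure may differ.

```python
from itertools import product

modes = ["AR", "VR"]
glasses = ["LFG", "ARG"]
distances = [30, 60]

# Enumerate the eight unique configurations, e.g. "AR-LFG-30".
configs = [f"{m}-{g}-{d}" for m, g, d in product(modes, glasses, distances)]

def rotated_orders(items):
    # Rotation-based counterbalancing sketch: each successive participant
    # starts one configuration later, so every configuration appears in
    # every serial position across the group.
    return [items[i:] + items[:i] for i in range(len(items))]

orders = rotated_orders(configs)
```

Spreading each configuration across all serial positions is what prevents order effects, such as fatigue in later sessions, from being confounded with any one configuration.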
Eight Display Configurations in the Experiment
(a) A photo of the light field AR glasses.
(b) Display configuration of (VR, LFG-30) with a set size of 6.
(c) Display configuration of (VR, LFG-60) with a set size of 6.
(d) to (i) Display configurations in the AR mode with all three set sizes, where (d) to (f) illustrate the interleaved arrangement of real and virtual text (circled), a sequencing that remains consistent for both types of AR glasses across all set sizes. The photos were captured by focusing the camera on the monitor.
(a) A photo of the conventional AR glasses.
(b) Display configuration of (VR, ARG-30) with a set size of 6. This photo was shot by focusing the camera at 30 cm.
(c) The same display configuration as (b) except that the photo was shot by focusing the camera at 6.4 meters.
(d) to (i) Display configurations in the AR mode with all three set sizes. The photos were captured by focusing the camera on the monitor.
For each of the eight testing scenarios, every participant completed seven runs: one practice run followed by six formal runs, totaling 180 trials.
Before the formal testing began, participants told us their initial fatigue and pain levels using a simple 5-point scale. To prevent exhaustion, a mandatory 60-second break (eyes closed) was enforced between each run. Right after each break, participants rated their fatigue again. This gave us a running track of how their comfort levels changed over time.
Participants signaled their answers by pressing keys as quickly and accurately as possible. Because the full experiment was so long (six to eight hours), we split the eight scenarios across two separate days—one for the AR Mode scenarios and one for the VR Mode scenarios—to ensure the results weren't skewed by exhaustion.
In our experiment, we intentionally changed four main factors, or independent variables, to see how they impacted the users:
Viewing Mode (AR or VR): Did the users see only the virtual content (VR mode), or did they see the virtual content blended with the real world (AR mode)? The AR mode required participants to integrate both kinds of text for an efficient search.
Type of AR Glasses (LFG or ARG): We compared the Light Field Glasses (LFG), which let the user focus naturally at any depth, against the traditional AR Glasses (ARG), which have a fixed focus plane.
Viewing Distance (30 cm or 60 cm): We tested both a close distance (30 cm) and a slightly further distance (60 cm). Importantly, in the blended AR mode, the real and virtual texts were always set to appear at the exact same depth.
Target Existence (Present or Absent): Was the specific target ("TLT") present in the scene or not? This is crucial because how a person searches changes based on this:
- Target-Absent trials usually require a thorough, item-by-item check before the participant can confidently say the target isn't there.
- Target-Present trials allow the participant to respond immediately once they spot the target, making the search process much faster.
To understand user performance and experience, we tracked five key results, or dependent variables: Reaction Time (RT), Slope, Intercept, Accuracy, and Fatigue Level.
Reaction Time (RT), Slope, and Intercept
We analyzed the search process by focusing on the Reaction Time (RT)—the speed (in milliseconds) it took a participant to respond. A lower RT means a faster response.
To dig deeper into why a search was fast or slow, we looked at how RT changed as the number of items (set size: 3, 6, or 12) increased. We plotted the average RT against the set size, and used that line to find two crucial values:
Slope (Search Efficiency): This measures how much extra time a participant takes for every single item added to the screen. A steeper slope means the participant is slower and less efficient at searching.
Intercept (Basic Processing Time): This is the value of the fitted line at a set size of zero. It represents the baseline time needed for everything unrelated to scanning the items, such as initially processing the image or choosing a key to press. Since our task didn't involve heavy memory or perceptual loads, we believe any significant change in the Intercept is largely due to the VAC issue.
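Here is a minimal sketch of how slope and intercept fall out of a least-squares fit of mean RT against set size. The RT values are invented for illustration; our real data had many trials per cell.

```python
# Least-squares line through mean RT vs. set size.
set_sizes = [3, 6, 12]
mean_rt_ms = [820.0, 1010.0, 1390.0]  # hypothetical mean RTs in ms

n = len(set_sizes)
mean_x = sum(set_sizes) / n
mean_y = sum(mean_rt_ms) / n

# slope = covariance(x, y) / variance(x); intercept pins the line so it
# passes through the point of means.
slope = (sum((x - mean_x) * (y - mean_y)
             for x, y in zip(set_sizes, mean_rt_ms))
         / sum((x - mean_x) ** 2 for x in set_sizes))
intercept = mean_y - slope * mean_x
# slope ≈ 63.3 ms per extra item (search efficiency);
# intercept = 630 ms of baseline processing time.
```

With these invented numbers, each added item costs roughly 63 ms of search time, while 630 ms is spent on everything else (encoding the display, choosing a key) regardless of set size.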
Accuracy and Fatigue
Accuracy: This is a straightforward measure: the percentage of trials where the participant gave the correct answer. We calculated this separately for trials where the target was present and when it was absent.
Fatigue Level: This was the participant's subjective rating of their comfort, ranging from 1 (Very Fresh) to 5 (Severe Fatigue/Pain). We collected this rating multiple times—before and after every run—to track how the different glasses and scenarios affected user comfort over time.
To interpret our extensive data, we used JASP, a professional statistics program.
We performed a deep dive on performance (slope, intercept, RT, and accuracy) by running an analysis of variance (ANOVA). This allowed us to understand how the three main variables (mode, glasses type, and distance) interacted and affected user performance in both successful and unsuccessful searches.
For the user comfort ratings (fatigue), we used non-parametric multiple comparisons with the Bonferroni-Holm correction to rigorously compare the results across all scenarios.
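For readers unfamiliar with the Bonferroni-Holm correction, here is a small sketch of the step-down procedure on hypothetical p-values. JASP performs this internally; the helper below is ours, written for illustration.

```python
def holm_reject(pvals, alpha=0.05):
    # Bonferroni-Holm step-down procedure: sort p-values ascending and test
    # the k-th smallest against alpha / (m - k); stop at the first failure,
    # since all larger p-values then fail as well.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject

# Four hypothetical pairwise comparisons of fatigue increases.
decisions = holm_reject([0.01, 0.04, 0.03, 0.20])
```

The step-down schedule makes Holm uniformly more powerful than plain Bonferroni while still controlling the family-wise error rate, which is why it is a common default for post-hoc comparisons like ours.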
The data from our rigorous statistical tests painted a clear picture of how the two types of glasses performed.
Overall, the LFG demonstrated a clear advantage:
Accuracy: Users made fewer mistakes with the LFG, showing significantly higher accuracy than the traditional AR Glass (ARG).
Intercept (Processing Time): The LFG also resulted in a lower Intercept, meaning users needed less baseline time to process the image and prepare their response. This strongly suggests that the LFG makes the virtual content easier to perceive and process—likely because it correctly solves the VAC problem.
The most important difference appeared when we combined the LFG with the challenging close-range (30 cm) AR Mode:
In this crucial real-world simulation, the LFG was significantly faster than the ARG.
The LFG at 30 cm was the only scenario that showed both faster speed and higher accuracy simultaneously compared to the ARG. This finding is critical: it proves that the traditional ARG creates a high level of task difficulty for users, while the LFG effectively overcomes that hurdle.
We did observe an unusual result at the far distance (60 cm) for search efficiency (Slope): the LFG actually led to a slightly less efficient search than the ARG.
However, after combining all our speed (RT) and accuracy results, we found no evidence of a speed-accuracy trade-off. This means users weren't sacrificing accuracy for speed or vice versa. The overall benefit of the LFG—especially its speed and accuracy at 30 cm—confirms its superior performance.
We used a specialized statistical test (the Wilcoxon signed-rank test) designed for this type of rating data. Interestingly, our analysis found no significant difference in the increase in fatigue across any of the glasses, modes, or distances.
While our performance metrics were objective, the feedback from participants revealed the true, lived experience of wearing both types of AR glasses. We categorized their comments into four areas: Text Clarity, Virtual-Real Integration, Vision, and Symptoms.
A figure summarizing feedback on blurry text.
A table summarizing feedback on virtual-real integration.
When asked about the text they saw, one issue dominated the feedback for the ARG: Blurriness.
The ARG's virtual text was the most common complaint (mentioned 27 times in AR Mode), especially at the challenging 30 cm close-range distance.
This intense blurriness strongly aligns with the VAC issue. Participants with this feedback likely had their eye focus (accommodation) dictated by where their eyes crossed (vergence), making it difficult to focus on the text at the correct depth.
In contrast, feedback for the LFG was a mix of clear text and an issue they described as "blurry" (which was actually distortion). This distortion wasn't related to VAC, but rather to the LFG's lower screen resolution, causing a "grainy" or "ghostly" look. Importantly, even when seeing this distortion, participants could still clearly see the underlying image—unlike the fundamental focus problem with the ARG.
When participants had to view the real world and virtual text at the same time (AR Mode), the LFG proved superior:
ARG Feedback was mostly Negative: The most common complaint was the sensation of their gaze "switching" between the virtual and real text, even though both were at the same depth. This "switching" is a vivid description of their eyes struggling with the VAC and attempting to accommodate two different objects simultaneously.
LFG Feedback was mostly Positive: Users commonly reported being able to perceive both the virtual and real text clearly at the same time. Many said they couldn't even tell which text was being projected by the glasses, demonstrating a seamless integration that the ARG could not achieve.
A figure summarizing feedback on focus problems.
A figure summarizing reported health symptoms.
Concerns about general vision—like difficulty merging the two images (binocular fusion) or struggling to focus—were overwhelmingly directed at the ARG.
A telling piece of feedback was participants saying they "need physical effort to perceive a clear text." This suggests they were consciously forcing their eye muscles to accommodate, essentially battling the VAC issue to maintain focus. This painful effort was reported with both glasses, indicating that even the LFG can be fatiguing, but the ARG was the primary culprit for serious vision issues.
The survey on health impacts was perhaps the most alarming:
Dizziness, eye strain, and eye soreness were the most frequently reported symptoms.
Crucially, more severe symptoms like tiredness, headache, eye discomfort, and eye pain were only reported when using the ARG.
The highest number of symptoms occurred specifically when using the ARG in VR mode.
This participant feedback, especially the type and frequency of symptoms, provides powerful anecdotal evidence supporting the idea that the traditional AR Glass's hardware issues (VAC) cause significant discomfort and health symptoms during prolonged use.
In this study, we successfully established and demonstrated a robust evaluation framework for assessing user performance and experience across different Augmented Reality (AR) displays. This framework offers valuable insights into controlling for confounding factors in experimental design and is applicable to comparing AR displays with similar principles, such as light field displays.
The framework’s effectiveness is underscored by its ability to yield clear findings and its robustness in the face of complex results. The counterintuitive finding that the ARG showed higher search efficiency at 60 cm does not indicate a flaw in our method; rather, it highlights a critical research challenge for the field: how to fairly compare AR displays with vastly different hardware specifications. Future research should investigate methods to normalize across hardware specifications to further enhance evaluation robustness.
Crucially, the framework led to significant findings regarding AR technology:
The Light Field Glass (LFG) outperforms the traditional AR Glass (ARG) in several key areas.
The LFG provides significantly lower intercept and higher accuracy, while also facilitating better virtual-real integration.
This advantage is particularly important for close-range applications, as our results revealed the pronounced and negative impact of the Vergence-Accommodation Conflict (VAC) on ARG users at a 30 cm viewing distance.
The LFG also consistently received more positive user feedback across both near and far viewing conditions, suggesting its broad applicability.
These detailed and crucial insights highlight the effectiveness of our evaluation framework in identifying critical performance differences. Therefore, our evaluation framework has the potential to become a standard for rigorously evaluating user performance and user experiences for future near-eye AR displays.