When you look at a 2D scatter plot, you interpret distances using the plot’s coordinate system. There is no anchor connecting those distances to anything your body intuitively understands. A word that is “close” in a 2D embedding plot might be 2 pixels or 20 pixels away depending on the zoom level and projection choice — both are meaningless without context.
Physical grounding changes this. When embeddings are rendered in a room-scale 3D environment, distance is measured in meters. You can walk from one word to another. Words that are semantically close are physically close — close enough to touch, or close enough to take two steps to reach. This connects the abstract mathematical structure of embedding space to embodied spatial intuition that humans have developed over a lifetime of navigating physical environments.
The hypothesis behind this project is that this metric grounding — anchoring abstract distances to physical distances — helps users build more accurate mental models of embedding geometry than flat 2D projections.
1. Cluster Boundary Ambiguity
2D projections collapse depth information, so points from different clusters can appear to overlap even when they are well-separated in the higher-dimensional space. In 3D, users can physically move around the cloud and watch an apparent overlap in one view resolve into clear separation from another angle. Seven of nine participants in our study rated cluster definition in the Unity 3D condition at 4 or 5 out of 5; the noisy 2D condition averaged 2.6.
2. Similarity and Distance Judgment
When asked whether ‘doctor’ is closer to ‘nurse’ or ‘engineer’, 78% of participants answered incorrectly from the noisy 2D projection, because the projection placed ‘doctor’ visually near ‘engineer’. In the Unity 3D condition, 78% answered correctly. The additional dimension, combined with physical navigation, let participants judge relative distances more accurately.
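The comparison participants were asked to make can be computed directly from the vectors with cosine similarity. A minimal pure-Python sketch, using made-up 4-D toy vectors in place of the real 300-D GloVe embeddings (the values below are illustrative only, not actual GloVe data):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-D stand-ins for 300-D GloVe vectors (illustrative only).
doctor   = [0.90, 0.80, 0.10, 0.20]
nurse    = [0.85, 0.75, 0.15, 0.25]
engineer = [0.10, 0.30, 0.90, 0.80]

sim_nurse = cosine_similarity(doctor, nurse)
sim_engineer = cosine_similarity(doctor, engineer)
print(sim_nurse > sim_engineer)  # True for these toy vectors
```

This is the ground-truth computation a visualization is trying to convey: whichever similarity is larger determines the correct answer, regardless of where a 2D projection happens to place the points.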
3. Exploratory Sensemaking
Several participants described a qualitative shift in how they engaged with the data in 3D: rather than scanning a static image, they were actively exploring a space. This changes the cognitive task from pattern recognition to spatial navigation, which is a much more practiced human skill. Participants noted they felt more confident in their cluster judgments after walking around the point cloud from multiple angles.
4. Ambiguous Membership Cases
Words that sit near category boundaries are the hardest to place from any projection. In our study, the mystery dot representing ‘power’ (which sits between Moral Concepts and Professions) was most commonly misclassified in all three conditions. However, participants who explored the 3D space performed slightly better on mystery dot accuracy overall (avg 1.56/3 correct in 3D vs 1.0/3 in best 2D), suggesting the additional dimension provides useful disambiguation even in boundary cases.
1. When labels are the bottleneck
In our implementation, word labels were only visible when pointing the controller at a specific sphere. This created a significant usability problem: exploration required repeatedly pointing at individual spheres to discover what each one was. One participant described spending “many tries” just to find a single target word. For datasets where word identification is part of the task, always-visible labels or a search-to-highlight mechanism would be necessary.
2. When dimensionality is very high
This project used UMAP to reduce 300D GloVe vectors to 3D. A 3D UMAP layout preserves local neighborhood structure reasonably well, but long-range distances in it are not reliable indicators of long-range distances in the original 300D space. For very high-dimensional data where UMAP’s global structure degrades significantly, the 3D layout may still mislead users about which words are truly similar across the full embedding space.
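One way to quantify how much local structure survives a reduction is to measure k-nearest-neighbor overlap between the original space and the low-dimensional layout. A minimal NumPy sketch on synthetic data (the function names and the crude three-axis "layout" are hypothetical stand-ins, not the project's actual UMAP output):

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)  # clamp tiny negative values from rounding

def knn_indices(X, k):
    """Each point's k nearest neighbors by index, excluding the point itself."""
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def neighborhood_overlap(X_high, X_low, k=10):
    """Mean fraction of each point's high-D k-NN that survive in the low-D layout."""
    hi = knn_indices(X_high, k)
    lo = knn_indices(X_low, k)
    return float(np.mean([len(set(h) & set(l)) / k for h, l in zip(hi, lo)]))

# Synthetic stand-ins: 200 points in 300-D, plus a crude 3-D "layout"
# (keeping the first three coordinates; a real UMAP embedding would score higher).
rng = np.random.default_rng(0)
X300 = rng.normal(size=(200, 300))
X3 = X300[:, :3]

print(neighborhood_overlap(X300, X300, k=10))  # identical spaces -> 1.0
print(neighborhood_overlap(X300, X3, k=10))    # truncation to 3 axes loses most structure
```

Running a check like this on the actual projected coordinates would show whether a particular 3D layout is trustworthy for the local "walk to the nearest word" judgments the room-scale rendering encourages.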
3. When hardware is a barrier
A Meta Quest 3 headset and the associated setup time create a meaningful access barrier. For quick exploration, a printed 2D plot requires no setup at all and can be handed to a participant in seconds. The overhead of sideloading an APK, charging the headset, and calibrating the environment makes 3D VR impractical for rapid or informal data exploration.
4. When inter-cluster overlap is expected
One participant noted being surprised that the 3D space showed more inter-cluster overlap than the clean 2D projection. This is actually more faithful to the underlying data — the clean 2D projection artificially clarified cluster separation by using the two most discriminative PCA dimensions. For users who need an honest view of genuine boundary ambiguity, the 3D representation is more accurate but also more visually complex.
Always-visible labels or proximity-triggered label expansion significantly reduce exploration friction. Consider rendering labels at different sizes based on distance from the user.
Anchor an information panel to the user’s field of view rather than placing it in world space. World-space panels rotate behind the user and become unreadable.
Scale the point cloud to room scale (2–3 meters across) to make navigation intuitive. Too small collapses all distances; too large requires locomotion rather than walking.
Use UMAP rather than PCA for the 3D projection when local neighborhood structure is the primary insight. PCA preserves global variance but may not separate clusters as clearly in 3D.
For evaluation studies, always include a noisy 2D baseline — the contrast between noisy and clean 2D was the most instructive comparison for participants in our study, independently of the 3D condition.
Consider adding a ‘home’ button that returns the user to a default vantage point. Participants who navigated far from the center of the cloud sometimes struggled to reorient.
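The room-scale sizing recommendation above amounts to a one-step preprocessing pass on the projected coordinates: center the cloud and scale its largest axis to a walkable extent. A minimal NumPy sketch (the function name and 2.5 m target are illustrative choices, not part of the project code):

```python
import numpy as np

def to_room_scale(points, target_extent=2.5):
    """Center a 3-D point cloud and scale its largest axis span to target_extent meters."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)                 # put the cloud's centroid at the origin
    extent = centered.max(axis=0) - centered.min(axis=0)
    scale = target_extent / extent.max()              # uniform scale: preserves relative distances
    return centered * scale

# Raw projected coordinates in arbitrary units, rescaled to a ~2.5 m cloud.
cloud = np.random.default_rng(1).uniform(-40.0, 40.0, size=(500, 3))
room = to_room_scale(cloud)
print(room.max(axis=0) - room.min(axis=0))  # largest axis span is 2.5
```

A uniform scale factor matters here: scaling each axis independently would distort the very distance relationships the room-scale rendering is meant to ground.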