Ken Ming Lee, Paul Barde, Maxime C. Cohen, Derek Nowrouzezahrai
Accepted at AAMAS 2026
In the layout optimization literature, customer trajectories (i.e., movement data) are commonly used to optimize store layouts, but they are costly to obtain.
Researchers therefore often resort to heuristics like the Travelling Salesman Path (TSP) and Probabilistic Nearest Neighbour (PNN) to approximate customer trajectories.
However, research has found that shortest paths diverge from real customer paths by ~28%, limiting their usefulness.
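The page does not spell out the exact weighting used by the PNN heuristic; one common reading is that the next stop is drawn at random with probability that decays with distance from the current position. Below is a minimal sketch under that assumption — the names `pnn_next_stop`/`pnn_path` and the softmax-over-negative-distance weighting are illustrative, not the paper's implementation:

```python
import math
import random

def pnn_next_stop(current, remaining, beta=1.0, rng=random):
    """Draw the next product location with probability that decays with
    distance from `current` (softmax over -beta * distance).
    Illustrative reading of a Probabilistic Nearest Neighbour step."""
    dists = [math.dist(current, p) for p in remaining]
    weights = [math.exp(-beta * d) for d in dists]
    total = sum(weights)
    return rng.choices(remaining, weights=[w / total for w in weights], k=1)[0]

def pnn_path(start, stops, beta=1.0, seed=0):
    """Visit every stop once, always sampling the next stop via pnn_next_stop."""
    rng = random.Random(seed)
    path, current, remaining = [start], start, list(stops)
    while remaining:
        nxt = pnn_next_stop(current, remaining, beta, rng)
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path
```

With `beta` large the heuristic approaches deterministic nearest-neighbour routing; with `beta` small it wanders, which is one way such heuristics trade off between shortest paths and randomness.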
We introduce an agent-based, maximum-entropy reinforcement learning (RL) framework that models customer behaviour with bounded rationality.
We show that RL-generated trajectories match actual customer trajectories significantly better than TSP/PNN, improving estimates of impulse purchases and shelf traffic. Moreover, only RL produces product repositioning decisions that align with those supported by real data, yielding similar predicted profit gains.
Our results suggest that RL provides a practical alternative, bridging the gap between simplistic heuristics and expensive real-world data collection.
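The maximum-entropy idea behind the framework can be illustrated with tabular soft value iteration: an entropy temperature turns the greedy policy into a Boltzmann (softmax) policy, which is one standard way to model bounded rationality. This is a generic sketch only — it does not reproduce the paper's agent, reward design, or store MDP:

```python
import numpy as np

def soft_value_iteration(R, P, alpha=1.0, gamma=0.95, iters=200):
    """Tabular maximum-entropy (soft) value iteration.

    R: (S, A) reward matrix, P: (S, A, S) transition probabilities,
    alpha: entropy temperature. Returns soft Q-values and the induced
    Boltzmann policy; larger alpha -> more stochastic (less rational) paths.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Soft maximum over actions: V(s) = alpha * log sum_a exp(Q(s,a)/alpha)
        V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
        Q = R + gamma * (P @ V)          # Bellman backup with soft values
    policy = np.exp((Q - Q.max(axis=1, keepdims=True)) / alpha)
    policy /= policy.sum(axis=1, keepdims=True)
    return Q, policy
```

As `alpha -> 0` the policy concentrates on shortest-path behaviour (the TSP-like limit); a positive temperature yields the detours and meandering that make simulated trajectories look more human.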
Data provided:
Raw point coordinates of customer paths
Coordinates of shelves (green) and walls (blue)
Initial discretization scheme: Grey represents walls, and purple represents store shelves
Placement of products in the simulator was determined based on physical store visits and trajectory data.
The animated red dot shows the actual human trajectory. The coarse grid kept RL training time low, but the discretization was not fine enough to separate walkways from shelves, causing human trajectories to overlap with shelves in the discretized store (shown in purple).
Final discretization scheme:
Each grid cell corresponds to a 50×50 cm area in the store
Coloured circles represent different product categories
Dollar signs represent checkout points
The red triangle represents the RL agent’s initial position and orientation
The heatmap overlay shows the final positions of customers in the store, suggesting that the data were collected from two distinct checkout points.
Heatmap overlay of all unprocessed (discretized) trajectories, showing a good balance between discretization resolution and store complexity.
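A visit-count heatmap like the overlays above can be built by accumulating discretized trajectories onto the grid. A minimal sketch — the paper's exact weighting, normalization, and rendering may differ:

```python
import numpy as np

def visit_heatmap(trajectories, grid_shape):
    """Count cell visits across discretized trajectories.

    trajectories: iterable of [(row, col), ...] grid-cell sequences.
    Returns a (rows, cols) integer array of visit counts, suitable for
    overlaying on the store map with any plotting library.
    """
    heat = np.zeros(grid_shape, dtype=int)
    for traj in trajectories:
        for r, c in traj:
            heat[r, c] += 1
    return heat
```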
The animated red dot in the video shows the actual human trajectory, while the animated green cell shows its discretized version.
The raw trajectory does not overlap with the shelves in the simulated store, allowing the discretized trajectory to closely match the original.
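The discretization step can be sketched as mapping raw centimetre coordinates into 50×50 cm cells and collapsing consecutive duplicates; the lower-left origin convention here is an assumption for illustration:

```python
def discretize(points_cm, cell_cm=50.0, origin=(0.0, 0.0)):
    """Map raw (x, y) positions in centimetres to grid cells.

    cell_cm matches the 50x50 cm cells described above. Consecutive points
    falling in the same cell are collapsed so each cell appears once per
    visit. `origin` (assumed to be the store's lower-left corner) anchors
    the grid to the raw coordinate frame.
    """
    cells = []
    for x, y in points_cm:
        cell = (int((x - origin[0]) // cell_cm), int((y - origin[1]) // cell_cm))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells
```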
A fully interactive 3D digital twin of the convenience store was also built to complement the paper. Every shelf and product location matches the real physical layout, allowing the research data to be explored with spatial context.
Inside this environment, one can:
View heatmaps of customer and modelled trajectories
Replay paths taken by TSP, PNN, RL agents, and real-world customers
Switch between first-person and overhead modes
Directly observe where and why human behaviour diverges from algorithmic predictions
To keep the paper focused on its core contributions, store representations were deliberately simplified, and trajectory comparisons are presented as static heatmaps. While these are informative, interacting with a store is fundamentally a human experience. To fully understand hotspots in heatmaps and the gaps between actual human trajectories and those predicted by algorithms, one needs to visualize the heatmaps and paths firsthand.
Note:
Mouse and keyboard are required. To view controls, press “View Controls” in the main menu or on the pause screen (accessed with the Esc key).
The embedded iframe below does not allow mouse capture, but keyboard controls still work, so viewing heatmaps and replaying trajectories in the overhead view (press v to toggle) should function properly.
For the best experience, please run the simulation on itch.io.