Real World Engagement Attention Dataset
(RWEAD)
Last updated 2026/01/05
The Real World Engagement Attention Dataset (RWEAD) bridges a critical gap in behavioural research by moving beyond constrained laboratory settings and artificial stimuli to capture how people truly engage with the world around them. The RWEAD project is founded on a simple, powerful premise: to understand engagement, we must observe it where it naturally occurs. This dataset is exclusively composed of real-world video recordings, meticulously captured and annotated to document the subtle and overt signals of human engagement. Whether it's a person absorbed in a performance, a group reacting in a social setting, or an individual's focused interaction with technology in a public space, RWEAD provides a window into genuine behavioural responses.
Core Focus:
Engagement: Measuring the depth of involvement and emotional/cognitive investment in a real-world activity or stimulus.
Attention: Tracking visual and behavioural focus—where do people look, what captures their gaze, and for how long in unstructured environments?
Behaviour: Documenting the accompanying physical manifestations: facial expressions, body language, gestures, and interactions that complete the picture of engagement.
Designed for researchers in computer vision, human-computer interaction, psychology, marketing, and AI development, the RWEAD dataset empowers you to build and validate models that are robust, ethical, and applicable to real-life scenarios. Explore the data, discover the annotations, and join us in advancing a more authentic understanding of human behaviour.
The dataset is being built in modules, with periodic releases.
Note: PoI (Point of Interest) and PoE (Point of Engagement) refer to the same point, except where stated otherwise.
The dataset is structured into three primary categories (under construction):
(i) Corporate:
Amphitheatre (available):
RWEAD1.0: Attention Room (2D: 14 (1080p) + 14 (1520p) 30s videos) - Public
RWEAD1.0: Engagement Amphitheatre (3D: 8 (1080p) + [104 + 52 PoE] (1080p) 30s videos) - Public
RWEAD1.0a: Discussion Thesis (2D: 4 (1080p) ~1h30 videos) - Private
CoffeeBreak (under construction);
Round Tables (under construction).
(ii) Open Spaces/Fairs:
Indoor (under construction);
Outdoor (under construction).
(iii) Stands:
Indoor (under construction);
Outdoor (under construction).
RWEAD 1.0 comprises data from two distinct experimental setups in the Corporate - Amphitheatre category. Both setups feature only male participants (amateur actors aged 20-55), consisting of students and professors without any formal acting training. In the Attention Room setup, participants were instructed to fixate on a static point. In the Engagement Amphitheatre setup, they were asked to track a moving point during their monologue. In both, they were asked to act engaged or not, and with a positive/negative feeling/emotion. All recorded videos have a duration between 30 and 45 seconds. Critically, environmental variables, including lighting, occlusions, etc., were deliberately uncontrolled to reflect naturalistic conditions.
Setup #1 - Attention Room (2D cameras) (Jan. 2026):
The Attention Room setup was conducted in a classroom environment. Participants were instructed to direct their gaze sequentially toward seven distinct Points of Interest (PoIs) located on the front wall. This controlled task served mainly as a validation benchmark to ensure the precise calibration of the attention-tracking model. The setup was recorded simultaneously with two 2D cameras: a Laia camera (1080p) [Laia] and a Hikvision DS-2DE4A425IWG-E (1520p) [Hikvision]. Both cameras were identically positioned to capture the participants (the "audience"). The 7 PoIs/PoEs were spatially distributed at known locations on the wall behind the camera apparatus, facing the participants. Each target was recorded in 2 videos, so the total number of videos is (7x2) 14 at 1080p + 14 at 1520p. The spatial layout of this configuration is presented in Figure AR.
Figure AR. The left image depicts a top-view schematic of the physical setup. The right image illustrates the corresponding distribution of the seven PoIs within the camera's coordinate system, with their positions marked as blue dots.
The top two images illustrate the position of the participants.
The setup metadata lists all distances from the audience to the camera (all measurements are in meters).
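As an illustration of how the Attention Room clips can serve as a calibration benchmark, the minimal Python sketch below assigns a predicted gaze point to the nearest of the seven PoIs. The PoI coordinates used here are placeholders, not the values shipped with the dataset; the actual positions are given in the setup metadata and Figure AR.

    import numpy as np

    # Placeholder PoI positions on the front wall, in meters, in the camera's
    # coordinate system (x = horizontal, y = vertical). Replace with the values
    # provided in the setup metadata / Figure AR.
    POIS = {
        1: (-1.5, 0.5), 2: (-1.0, 1.5), 3: (-0.5, 0.5), 4: (0.0, 1.5),
        5: (0.5, 0.5), 6: (1.0, 1.5), 7: (1.5, 0.5),
    }

    def nearest_poi(gaze_xy):
        """Return the index of the PoI closest (Euclidean) to a predicted gaze point."""
        gaze = np.asarray(gaze_xy, dtype=float)
        ids = list(POIS.keys())
        pts = np.array([POIS[i] for i in ids], dtype=float)
        return ids[int(np.argmin(np.linalg.norm(pts - gaze, axis=1)))]

    # Example: a model predicting a gaze intersection at (0.4, 0.6) m is scored
    # against the PoI the participant was instructed to fixate in that clip.
    print(nearest_poi((0.4, 0.6)))  # -> 5 with this placeholder layout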
Setup #2 - Engagement Amphitheatre (3D cameras) (Jan. 2026):
The Engagement Amphitheatre setup comprises two distinct recording sets. Both sets were filmed in the same amphitheatre, on different days, and participants were asked to represent three engagement categories:
Engaged-Positive: engaged in the activity while exhibiting a positive affective state, ranging from Excited, Delighted, Happy, Content, and Relaxed to Calm;
Engaged-Negative: engaged but exhibiting a negative affective state, ranging from Tense, Angry, Frustrated, Depressed, and Bored to Tired; and
Not Engaged.
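A minimal sketch of how this three-way taxonomy could be encoded for model training is given below; the integer codes and dictionary layout are assumptions for illustration, not part of the dataset annotations.

    # Hypothetical encoding of the three engagement categories listed above.
    ENGAGEMENT_LABELS = {
        "engaged_positive": 2,  # engaged, positive affect (Excited ... Calm)
        "engaged_negative": 1,  # engaged, negative affect (Tense ... Tired)
        "not_engaged": 0,
    }

    # Affect descriptors associated with each engaged category, for finer tagging.
    AFFECT_RANGE = {
        "engaged_positive": ["Excited", "Delighted", "Happy", "Content", "Relaxed", "Calm"],
        "engaged_negative": ["Tense", "Angry", "Frustrated", "Depressed", "Bored", "Tired"],
    }

    def label_of(category: str) -> int:
        """Map a category name to its (assumed) integer label."""
        return ENGAGEMENT_LABELS[category.lower()]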
Both sets were filmed with three Luxonis OAK-D 3D cameras (1080p) [Luxonis].
Set #1: The PoI remained static in front of a dedicated camera. Two additional audience-facing cameras, rotated 180° about the y-axis, completed the three-camera array. Figure EA (top row) shows the configuration. This set contains 4 synchronized video recordings from 2 cameras, totalling (4x2) 8 videos, comprising 8 2D RGB videos (.mp4) and 8 disparity files (.h5). Participants were instructed to gaze directly (or not) at the PoI camera and to display engagement vs. non-engagement actions.
Set #2: The layout for this set is similar to the previous one (see Fig. EA, middle and bottom rows). As before, one (1) camera was dedicated to filming the PoI, here a human presenter on a stage, while two (2) audience-facing cameras, rotated 180° around their vertical y-axis, captured the reactions. The key difference in this set is that the PoI (the presenter) moved in front of the stationary audience. This configuration yielded 52 synchronized recordings from each of the three cameras, resulting in a total of (2x52) 104 videos (audience) + 52 videos (PoI), each comprising 2D RGB (.mp4) and disparity data (.h5). For this set, the audience was instructed to exhibit specific behaviours across different seating configurations, including engagement vs. non-engagement and positive vs. negative emotions.
Figure EA. Set #1: the top row shows a top view (left) and a side view (right). The positions of all participants (audience) relative to the PoI camera were computed from their disparity maps and are included in the dataset. For illustrative purposes, the figure shows three audience members. Set #2: the middle row shows the top view (left) and the side view (right).
The top two images illustrate the view of the camera and the respective participants (all measures are in meters).
Note: The audience positions shown are only illustrative, since the exact position of each person can be recovered from the camera's 3D information.
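Because each recording pairs an RGB video (.mp4) with disparity data (.h5), per-person distances can be recovered with the standard stereo relation depth = f x B / disparity. The sketch below illustrates this; the file name, the internal .h5 layout, and the focal-length value are assumptions to be replaced with the calibration provided with the dataset (the OAK-D stereo baseline is nominally 7.5 cm).

    import h5py
    import numpy as np

    # Assumed calibration -- replace with the values provided for each OAK-D unit.
    FOCAL_PX = 800.0     # focal length of the stereo pair, in pixels (assumption)
    BASELINE_M = 0.075   # nominal OAK-D stereo baseline of 7.5 cm

    def disparity_to_depth(disparity_px):
        """Pinhole-stereo relation: depth = f * B / disparity (result in meters)."""
        d = np.asarray(disparity_px, dtype=np.float32)
        depth = np.full(d.shape, np.nan, dtype=np.float32)
        valid = d > 0
        depth[valid] = FOCAL_PX * BASELINE_M / d[valid]
        return depth

    # Hypothetical file name and key; check the actual .h5 layout on OSF.
    with h5py.File("set2_cam1_take01_disparity.h5", "r") as f:
        first_key = list(f.keys())[0]
        disparity = f[first_key][0]   # e.g. first frame of a T x H x W stack
    depth_m = disparity_to_depth(disparity)
    print(np.nanmedian(depth_m))      # rough distance of the dominant scene content

A person's distance to a camera can then be read, for example, as the median depth inside their detected bounding box.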
RWEAD1.0a - Discussion Thesis (2D cameras) - Private (Sept. 2025):
The RWEAD 1.0a dataset consists of four videos (approximately 1.5 hours each) capturing a master's thesis discussion (Corporate - Discussion Thesis). The recordings feature simultaneous views of the thesis jury, the two audience members, and the student. No instructions were given to participants, and therefore no ground-truth engagement labels are available. However, during the first 20 minutes (the student's presentation) the jury's points of attention were the student and the board, and thereafter the student alone. For the audience, attention was divided among three points: the student, the jury, and the board (during the initial 20-minute presentation). The scene was recorded with two synchronized pairs of 2D cameras, including a Laia camera (1080p) [Laia]. Key environmental variables, such as lighting and potential occlusions, were left uncontrolled to ensure naturalistic conditions.
Figure DT. Room layout (all measures are in meters).
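No engagement labels exist for this module, but the description above implies a coarse prior over where each group is expected to look as a function of time. One possible encoding of that prior is sketched below; the 20-minute split and the target sets follow the text, while the function itself (and the behaviour after the presentation) is only an illustrative interpretation, not ground truth.

    PRESENTATION_END_S = 20 * 60  # the student's presentation lasts roughly 20 minutes

    def likely_attention_targets(role: str, t_seconds: float) -> set[str]:
        """Coarse prior over attention targets, derived from the setup description."""
        during_presentation = t_seconds < PRESENTATION_END_S
        if role == "jury":
            return {"student", "board"} if during_presentation else {"student"}
        if role == "audience":
            # The board is listed as a target only during the initial presentation.
            return {"student", "jury", "board"} if during_presentation else {"student", "jury"}
        raise ValueError(f"unknown role: {role}")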
RWEAD 1.0 download available at OSF - https://doi.org/10.17605/OSF.IO/453SH
For RWEAD 1.0 cite:
Lemos, Marco, Cardoso, Pedro J.S., & Rodrigues, João M.F. (2026) MiE: A Microscopic Model for Real-Time Group Engagement Estimation Using Gaze and Posture. Submitted to Journal of Computational Science
(temporary): Lemos, M., Cardoso, P.J.S., Rodrigues, J.M.F. (2025). Microscopic Binary Engagement Model. In: Lees, M.H., et al. Computational Science – ICCS 2025. ICCS 2025. Lecture Notes in Computer Science, vol 15905. Springer, Cham. https://doi.org/10.1007/978-3-031-97632-2_9. [bib]
ALGARVE 2030, Portugal 2030 and by the European Union, ALGARVE-FEDER-01180500, Ref. 17325 - Project AI.EVENTS
ALGARVE 2030, Portugal 2030 and by the European Union, ALGARVE-FEDER-02964500, Ref. 24298 - Project AI.INSIGHTSI