The accuracy achieved by our participants is assessed through the precision, recall, and F-measure of their responses. Higher values of precision, recall, and F-measure support the claim of better accuracy.
The fraction of the model elements (for the first hypothesis) or defects (for the second hypothesis) retrieved by participants that are relevant.
The fraction of relevant model elements (or of relevant defects) retrieved by participants, over the total number of relevant model elements (or seeded defects) in the model.
A measure that combines precision and recall, computed as 2 * (Precision * Recall) / (Precision + Recall); this is the harmonic mean of precision and recall.
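As an illustration, the three accuracy metrics can be computed for a single response as in the following minimal sketch. It assumes the standard information-retrieval definitions, with recall taken over the set of relevant elements. The function name, the element identifiers, and the choice to report 0.0 when a denominator is empty are assumptions made for this example, not part of the study's tooling.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, f_measure) for one participant's response.

    retrieved: elements (or defects) reported by the participant
    relevant:  elements (or seeded defects) that are actually correct
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Assumption: an empty denominator yields 0.0 instead of an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f_measure


# Hypothetical response: four elements reported, three of them among five relevant ones.
p, r, f = precision_recall_f(
    {"g1", "g2", "g3", "x9"},
    {"g1", "g2", "g3", "g4", "g5"},
)
```

In this hypothetical case the participant reaches a precision of 3/4 and a recall of 3/5.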
The speed achieved by our participants is assessed by several time-related indicators. We are interested not only in the overall response time, but also in the time it takes participants to provide valid answers. Lower values of these metrics support the claim that the corresponding concrete syntax is superior in cognitive effectiveness, in the sense that its models are understood and reviewed faster. While the overall duration captures the time spent on the task, the other two metrics give a more detailed picture of the moments when the participant starts and stops providing valid feedback.
The time taken by the participants to complete the task.
The time taken to accurately report the first response element. For the understanding task, this is the time taken to correctly report the first element that answers the question posed in the task; for the reviewing task, it is the time taken to report the first seeded defect in the model. If a participant does not correctly report at least one element, this metric is treated as a missing value and excluded from all further analysis procedures.
The time taken to accurately report the last response element; this is the dual of the First Detection metric.
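The three time metrics can be derived from a timestamped log of a participant's reports, as in the sketch below. The log format (seconds since task start paired with a correctness flag), the function name, and the dictionary keys are hypothetical. `None` stands for a missing value; applying it to the last correct report as well as the first is an assumption of this example.

```python
def time_metrics(reports, task_end):
    """Compute the time metrics for one participant and task.

    reports:  list of (seconds_since_task_start, is_correct) tuples
    task_end: seconds from task start until the participant finished
    """
    correct_times = [t for t, is_correct in reports if is_correct]
    return {
        "duration": task_end,
        # None marks a missing value when nothing was reported correctly.
        "first_detection": min(correct_times) if correct_times else None,
        "last_detection": max(correct_times) if correct_times else None,
    }


# Hypothetical log: two wrong reports and two correct ones.
log = [(12.0, False), (30.5, True), (75.0, True), (90.0, False)]
metrics = time_metrics(log, task_end=120.0)
no_valid = time_metrics([(10.0, False)], task_end=60.0)
```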
The ease with which participants conduct their tasks is assessed by effort measures. Although time measures (such as those we used for speed) are often used as proxies for effort, in the context of the “Physics” of Notations they are better matches for the speed component, which is likely to correlate strongly with ease. Instead, we focus our assessment on two information sources: the physical (visual) effort involved in exploring the model and the perception of effort reported by participants. The former is addressed with eye-tracking measurements, while the latter is assessed through a NASA-TLX questionnaire. A higher number and longer duration of fixations are associated with greater visual attention to a given set of AOIs (in this case, relevant vs. irrelevant model elements) [22–24]. For understanding tasks, a higher Fixation Rate indicates higher efficiency, associated with less effort to find the relevant AOIs [24–28]. For reviewing tasks, a higher ratio indicates more visual effort to find defects [26, 29]. Regarding the Average Fixation Duration, a higher value indicates more time and attention devoted to AOIs [23, 25, 30]; some authors state that this measure is correlated with cognitive processes [31, 32]. A higher number of saccades can be associated with a higher visual effort, meaning the participant may be somewhat “lost” in the model and navigating it more erratically [23, 28, 32, 33]. A higher number of saccades to the key can also be associated with difficulties with the concrete syntax. Concerning the NASA-TLX score, higher scores are associated with a higher effort perceived by the participants [17, 33]. For both the eye-tracking and the NASA-TLX metrics, lower complexity corresponds to greater ease in performing the tasks.
The fraction of the number of fixations in a given Area of Interest (AOI) over the total number of fixations in the Area of Glance (AOG). A fixation is a stabilisation of the eye on a part of the stimulus for a period of time between 200 and 300 ms.
The fraction of the number of fixations in a given AOI over the total number of fixations in the AOG.
The total duration of fixations on relevant AOIs divided by the number of elements in the relevant AOIs.
The total duration of fixations on irrelevant AOIs divided by the number of elements in the irrelevant AOIs.
Total number of saccades while performing the task. A saccade is a sudden and quick eye movement lasting between 40 and 50 ms.
Number of saccades to the key AOI.
Overall weighted score resulting from the application of the NASA-TLX questionnaire, covering perceived mental, physical, and temporal demand, performance, effort, and frustration in performing a task.
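The fixation-based metrics and the overall TLX score defined above can be sketched as follows. The data layout, function names, and sample values are hypothetical. The TLX function assumes the usual weighting procedure, in which each dimension's rating is multiplied by a weight obtained from pairwise comparisons and the sum is divided by the total weight; the text above does not state which weighting variant was applied.

```python
def fixation_metrics(fixations, n_elements):
    """Fixation Rate and Average Fixation Duration per AOI category.

    fixations:  list of (aoi_label, duration_ms), one entry per fixation in the AOG
    n_elements: dict mapping each AOI label to its number of model elements
    """
    total = len(fixations)
    result = {}
    for label, count in n_elements.items():
        durations = [d for aoi, d in fixations if aoi == label]
        result[label] = {
            # Fixations in this AOI over all fixations in the AOG.
            "fixation_rate": len(durations) / total if total else 0.0,
            # Total fixation duration over the number of elements in the AOI.
            "avg_fixation_duration": sum(durations) / count if count else 0.0,
        }
    return result


def tlx_weighted_score(ratings, weights):
    """Weighted TLX score: sum of rating * weight over the sum of weights."""
    return sum(ratings[d] * weights[d] for d in ratings) / sum(weights.values())


# Hypothetical eye-tracking sample: four fixations, durations in milliseconds.
sample = [("relevant", 250), ("relevant", 210), ("irrelevant", 300), ("relevant", 240)]
eye = fixation_metrics(sample, {"relevant": 2, "irrelevant": 4})

# Hypothetical questionnaire answers (ratings on 0-100, weights summing to 15).
ratings = {"mental": 70, "physical": 10, "temporal": 50,
           "performance": 40, "effort": 60, "frustration": 30}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
tlx = tlx_weighted_score(ratings, weights)
```

Saccade counts need no formula beyond counting the saccade events recorded by the eye tracker, either in total or restricted to those ending in the key AOI.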