RQ3: How accurate are LLM1 and LLM2 for the transformation tasks in LeGEND?
Experiment Design
To address RQ3, we investigate the accuracy of LLM1 for the extraction in Phase 1 and LLM2 for the conversion in Phase 2. Since it is difficult to assess the accuracy of these transformations automatically, we answer RQ3 empirically through a user study. Specifically, we conduct an online survey with 10 graduate students recruited from the department of computer science at our institution. Among the participants, six have more than two years of research experience in the field of autonomous driving.
In the study, since it takes a participant about 5 minutes to answer the questions regarding one accident report, we randomly select 5 accident reports from our database as test seeds. We then ask each participant to read each accident report and express their opinions on statements regarding the extracted functional scenarios and the converted logical scenarios. Participants rate their agreement with each statement on a 5-point Likert scale, ranging from “Strongly Disagree” to “Strongly Agree”. In addition, we give participants the opportunity to provide supplementary feedback, such as brief textual comments, particularly for instances on which their opinions are not strong.
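To make the aggregation of the Likert-scale responses concrete, the following is a minimal sketch (not part of LeGEND itself) of how per-statement agreement rates could be computed; the record layout and the helper name summarize_likert are illustrative assumptions, not the study's actual analysis script.

```python
from collections import Counter

# 5-point Likert scale used in the survey
LIKERT = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

def summarize_likert(responses):
    """Aggregate raw survey answers into per-statement rating distributions.

    `responses` is assumed to be a list of dicts, one per
    (participant, report, statement) answer, e.g.:
        {"statement": 1, "report": "A", "rating": "Agree"}
    """
    per_statement = {}
    for r in responses:
        per_statement.setdefault(r["statement"], Counter())[r["rating"]] += 1

    summary = {}
    for stmt, counts in sorted(per_statement.items()):
        total = sum(counts.values())
        agree = counts["Agree"] + counts["Strongly Agree"]
        summary[stmt] = {
            "distribution": {level: counts[level] for level in LIKERT},
            "agreement_rate": agree / total,
        }
    return summary
```

Such a summary would yield, for each statement, the count of each rating level and the fraction of “Agree” or “Strongly Agree” responses, which is one common way to report Likert-scale results per statement and per case.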
Statistical Results
Table 4 lists the statements and the selected accident reports used in the study. We construct five statements for the evaluation. The statements related to the intermediate representation cover aspects such as road structures (Statement #1), initial vehicle actions (Statement #2), and interaction patterns (Statement #3). The statements related to logical scenarios focus on the correctness of the logical scenario template (Statement #4) and the parameter ranges (Statement #5). For the evaluation, we select 5 accident reports, with their main distinguishing features summarized in Table 4.
Figure 6: Results over different cases