The dataset consists of end-to-end test flows for 4 popular, actively maintained open-source applications (50+ contributors, 1000+ commits, 5000+ stars):
Bookstack: A hierarchical documentation management platform with rich text editing. Demo.
Indico: An event management system used for organizing conferences, meetings, and lectures. Demo.
Invoiceninja: A business-oriented invoicing platform with multi-step workflows. Demo.
Prestashop: A full-stack e-commerce platform with product management and checkout features. Demo.
Each app is containerized and pinned to a fixed version for reproducibility and local development.
To show how the components of a test case are constructed, the main steps are outlined below.
1/ Playwright test script
The first step is to construct test scripts to perform UI actions. Below is an excerpt of the script to illustrate.
2/ Recording step traces
Next, we record a ground-truth tuple of (url, action, xpath) for each step at runtime, using a Recorder that intercepts each step of the script.
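A minimal sketch of how such a Recorder could work, assuming it wraps the page object and logs each call before forwarding it. The `Recorder` and `FakePage` classes, the XPaths, and the URL below are hypothetical and not the dataset's actual implementation; `FakePage` stands in for a real browser page so the example runs without a browser.

```python
from typing import Any, List, Tuple

Step = Tuple[str, str, str]  # (url, action, xpath)


class FakePage:
    """Hypothetical stand-in for a browser page; no real browser involved."""

    def __init__(self, url: str) -> None:
        self.url = url

    def fill(self, xpath: str, value: str) -> None:
        pass  # a real page would type `value` into the element

    def click(self, xpath: str) -> None:
        # Simulate navigation: clicking the (made-up) books link changes the URL.
        if xpath == "//a[@id='books']":
            self.url = self.url.rstrip("/") + "/books"


class Recorder:
    """Hypothetical proxy that logs (url, action, xpath) before each action."""

    def __init__(self, page: Any) -> None:
        self._page = page
        self.trace: List[Step] = []

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes Recorder itself lacks, e.g. click/fill/url.
        target = getattr(self._page, name)
        if not callable(target):
            return target

        def intercepted(xpath: str, *args: Any, **kwargs: Any) -> Any:
            # Record the URL the action starts from, then forward the call.
            self.trace.append((self._page.url, name, xpath))
            return target(xpath, *args, **kwargs)

        return intercepted


page = Recorder(FakePage("http://localhost:8080/"))
page.fill("//input[@name='search']", "handbook")
page.click("//a[@id='books']")
for step in page.trace:
    print(step)
```

In this sketch each tuple stores the URL at the moment the action starts. Whether the actual Recorder captures the URL before or after the action is not stated here, so treat that choice as an assumption.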
3/ Construct natural language steps
Finally, to construct the input to the agent, we convert the steps into natural-language descriptions.
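One way to picture this conversion is a template lookup per action. The sketch below is illustrative only: the conversion method actually used is not described here, and the `TEMPLATES` table, the `describe` helper, and the assumption that each step carries a human-readable target label (rather than only an XPath) are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical templates keyed by action name; the wording is illustrative.
TEMPLATES: Dict[str, str] = {
    "click": 'Click "{target}"',
    "fill": 'Type "{value}" into the "{target}" field',
    "goto": "Go to {target}",
}


def describe(action: str, target: str, value: Optional[str] = None) -> str:
    """Render one step as a short natural-language instruction."""
    template = TEMPLATES.get(action)
    if template is None:
        raise ValueError(f"no template for action {action!r}")
    # Unused keyword arguments are ignored by str.format.
    return template.format(target=target, value=value)


steps = [
    ("goto", "the Books page", None),
    ("click", "Create New Book", None),
    ("fill", "Name", "Team Handbook"),
]
for number, (action, target, value) in enumerate(steps, start=1):
    print(f"{number}. {describe(action, target, value)}")
```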
We examine closed GitHub issues labeled "Bug" from each application and analyze their main themes. Bugs are grouped into the following categories:
Visual:
NA: Navigation Error (29)
Functional:
NOOP: Operation no response (21)
NAV: Navigation logic error (13)
EXEC: Unexpected task result (43)
This bug refers to the absence of essential user interface components on the page. Real-world issue: (indico/#239) "Send" button is missing from request recording in lectures.
This bug occurs when a user action triggers no visual feedback or system response. Real-world issue: (bookstack/#5323) No message when user has no right to delete attachment.
This bug arises when user interactions redirect or transition the application to unintended pages. Real-world issue: (prestashop/#39044) Clicking a product in "All Stores" sends you to the "Order" page, not the "Edit Product" page.
This bug occurs when the output or state resulting from an operation does not match the expected specifications or requirements. Real-world issue: (invoiceninja/#10351) Generating a PDF statement for a client shows the wrong client name and address.