The left-hand side (LHS) figure presents the matching results from GVT, while the right-hand side (RHS) figure showcases those from GUIPilot.
In each figure, the original screen is displayed on the left, and the mutated screen appears on the right.
Red boxes: Highlight widgets that could not be matched and will therefore be reported as missing or extra.
Green boxes: Denote widgets that are unaffected by the mutation.
Yellow boxes: Denote widgets whose relative positions changed due to the mutation but that were still matched.
Black lines: Highlight selected matches reported by the approach.
GVT mismatches the remaining widgets when their positions are shifted.
GUIPilot correctly matches the shifted widgets.
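To make the effect of a position shift concrete, the sketch below contrasts two toy matchers on a screen where one inserted widget pushes everything else down. Both functions, the `(x, y, width, height)` box format, and the IoU threshold are assumptions made for illustration; neither is GVT's nor GUIPilot's actual algorithm.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def match_by_position(orig: List[Box], mutated: List[Box],
                      threshold: float = 0.5) -> List[Optional[int]]:
    """Hypothetical matcher: pair widgets that overlap at the same coordinates."""
    used, result = set(), []
    for a in orig:
        best, best_iou = None, threshold
        for j, b in enumerate(mutated):
            score = iou(a, b)
            if j not in used and score >= best_iou:
                best, best_iou = j, score
        if best is not None:
            used.add(best)
        result.append(best)
    return result


def match_by_order(orig: List[Box], mutated: List[Box]) -> List[Optional[int]]:
    """Hypothetical matcher: align widgets in reading order with an LCS over sizes."""
    a = sorted(range(len(orig)), key=lambda i: (orig[i][1], orig[i][0]))
    b = sorted(range(len(mutated)), key=lambda j: (mutated[j][1], mutated[j][0]))
    n, m = len(a), len(b)

    def same(i: int, j: int) -> bool:
        return orig[a[i]][2:] == mutated[b[j]][2:]

    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if same(i, j):
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    result: List[Optional[int]] = [None] * len(orig)
    i = j = 0
    while i < n and j < m:
        if same(i, j):
            result[a[i]] = b[j]
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Three stacked widgets; the mutation inserts a 50x50 widget at the top
# and shifts the originals down by 60 pixels.
orig = [(0, 0, 100, 40), (0, 60, 80, 40), (0, 120, 60, 40)]
mutated = [(0, 0, 50, 50), (0, 60, 100, 40), (0, 120, 80, 40), (0, 180, 60, 40)]

print(match_by_position(orig, mutated))  # [None, 1, 2]: every pairing is wrong
print(match_by_order(orig, mutated))     # [1, 2, 3]: shifted widgets recovered
```

In this toy case the position-based matcher pairs each original widget with whatever now occupies its old location, while the order-based matcher follows the widgets to their new locations.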
GVT fails to match the widget with its swapped counterpart; it therefore does not report the swap and instead reports individual missing or extra widgets.
GUIPilot still matches the widget with its swapped counterpart.
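The dependence of swap reporting on matching can be sketched as follows. This is a minimal illustration under assumed conventions (boxes as `(x, y, width, height)`, a `matches` dictionary from original index to mutated index, and a hypothetical `find_swaps` helper with a pixel tolerance); it is not the reporting logic of either tool.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed format


def find_swaps(orig: List[Box], mutated: List[Box],
               matches: Dict[int, int], tol: int = 5) -> List[Tuple[int, int]]:
    """Return pairs of original widgets that exchanged positions.

    A swap can only be recognized when BOTH widgets of the pair are matched.
    """
    def close(p: Box, q: Box) -> bool:
        # Compare top-left corners within a small tolerance.
        return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol

    swaps = []
    items = sorted(matches.items())
    for n, (i, j) in enumerate(items):
        for k, l in items[n + 1:]:
            # Widget i now sits where k used to be, and vice versa.
            if close(mutated[j], orig[k]) and close(mutated[l], orig[i]):
                swaps.append((i, k))
    return swaps


# Widgets 0 and 1 have the same size and were swapped, so the geometry is
# unchanged; only the (assumed appearance-based) matches reveal the swap.
orig = [(0, 0, 100, 40), (0, 60, 100, 40), (0, 120, 100, 40)]
mutated = [(0, 0, 100, 40), (0, 60, 100, 40), (0, 120, 100, 40)]

print(find_swaps(orig, mutated, {0: 1, 1: 0, 2: 2}))  # [(0, 1)]
print(find_swaps(orig, mutated, {2: 2}))              # []
```

When the swapped widgets are left unmatched, as in the second call, no swap can be derived, and each widget would surface only as an individual missing or extra widget.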
We only show the matching results for GUIPilot.
The main reason for false positives (FP) is that GUIPilot produces mismatches when the widget layout changes too significantly.
A remedy is to design a more robust similarity metric that does not rely on relative position.
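One possible shape for such a metric is sketched below. It scores widget pairs using only widget type, text, and size, and ignores coordinates entirely. The `Widget` fields, the 0.7/0.3 weights, the threshold, and the greedy assignment are all assumptions chosen for illustration, not a metric proposed or evaluated in this work.

```python
from difflib import SequenceMatcher
from typing import Dict, List, NamedTuple


class Widget(NamedTuple):
    kind: str  # e.g. "button", "text" (hypothetical labels)
    text: str
    x: int
    y: int
    w: int
    h: int


def similarity(a: Widget, b: Widget) -> float:
    """Position-free score in [0, 1] from type, text, and area."""
    if a.kind != b.kind:
        return 0.0
    text = SequenceMatcher(None, a.text, b.text).ratio()
    area_a, area_b = a.w * a.h, b.w * b.h
    size = min(area_a, area_b) / (max(area_a, area_b) or 1)
    return 0.7 * text + 0.3 * size  # assumed weights


def match(orig: List[Widget], mutated: List[Widget],
          threshold: float = 0.6) -> Dict[int, int]:
    """Greedy one-to-one matching by descending similarity."""
    scored = sorted(
        ((similarity(a, b), i, j)
         for i, a in enumerate(orig)
         for j, b in enumerate(mutated)),
        reverse=True,
    )
    pairs: Dict[int, int] = {}
    used = set()
    for score, i, j in scored:
        if score < threshold:
            break
        if i in pairs or j in used:
            continue
        pairs[i] = j
        used.add(j)
    return pairs


orig = [
    Widget("button", "Login", 0, 0, 100, 40),
    Widget("text", "Welcome", 0, 60, 200, 30),
    Widget("button", "Sign up", 0, 120, 100, 40),
]
# Heavily rearranged layout: every widget has moved far from its old position.
mutated = [
    Widget("button", "Sign up", 300, 500, 100, 40),
    Widget("button", "Login", 10, 400, 100, 40),
    Widget("text", "Welcome", 50, 10, 200, 30),
]

print(match(orig, mutated))
```

Because coordinates never enter the score, the pairing is unaffected by how far the layout moves. The trade-off is that visually identical widgets become indistinguishable, so a practical metric would likely need additional cues.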
[Figures: two sets of false-positive matching results, Examples 1 to 4 in each set.]
Example 1: We swap two pairs of widgets, and the swapped widgets happen to overlap. The overlap causes one widget to be incorrectly matched with a widget from the other pair.
Example 2: When we swap two widgets, one widget overlays the other, covering a large area and altering the color of the underlying widget.
We only show the matching results for GUIPilot.
The main reasons for false negatives (FN) are:
(1) If the widget layout changes too significantly, GUIPilot produces mismatches: the truly missing or extra elements are wrongly paired with unrelated widgets.
(2) If the inserted widgets are not detected by the object detector, they do not participate in the subsequent matching step and therefore go unreported.
A remedy is to design a more robust similarity metric that does not rely on relative position.
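Both causes can be seen in a small sketch of how missing and extra widgets might be derived from a matching result. The `report` helper and its inputs are hypothetical; the only point is that a widget can be flagged as extra only if it was detected and left unmatched.

```python
from typing import Dict, List, Tuple


def report(num_orig: int, num_detected_mutated: int,
           matches: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """Unmatched original widgets are missing; unmatched detected widgets are extra."""
    matched_mutated = set(matches.values())
    missing = [i for i in range(num_orig) if i not in matches]
    extra = [j for j in range(num_detected_mutated) if j not in matched_mutated]
    return missing, extra


# Ground truth: two original widgets plus one inserted widget on the mutated screen.

# Inserted widget detected (index 1) and left unmatched: it is reported.
print(report(2, 3, {0: 0, 1: 2}))  # ([], [1])

# Cause (1): an original widget is deleted and a new one inserted, but the
# matcher wrongly pairs the two, so neither is reported.
print(report(2, 2, {0: 0, 1: 1}))  # ([], [])

# Cause (2): the detector misses the inserted widget, so only two mutated
# widgets exist as far as the matcher is concerned and nothing is reported.
print(report(2, 2, {0: 0, 1: 1}))  # ([], [])
```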
Example 1: Mismatch
Example 2: Mismatch
Example 3: The object detector did not identify the newly inserted widget.
Example 4: The object detector did not identify the newly inserted widget.
Example 1: Mismatch
Example 2: Mismatch
Example 3: Mismatch
Example 4: Mismatch
To justify the necessity of our design, we also prompt a VLM with the two screens and ask it to report the inconsistencies. We use the following prompt: