Rx Framework
Cross-Device Record and Replay for Android Applications
Record and replay is a trending technology, and cross-device record and replay is a variant that enables developers to record once and replay everywhere, benefiting a broad spectrum of testing and debugging practices. Prior work on record and replay either targets same-device replay or optimistically (and incorrectly) assumes that an app's GUI changes only a little across devices (e.g., allowing some views to be hidden by a scrollable list). However, we observed that an app's GUI layout automatically adapts to a device's screen size and orientation, making an app installed from a single APK look drastically different on distinct devices. To bridge the gap, this paper presents a replay framework called Rx that enables a practical, one-pass greedy cross-device replay by mimicking a human replayer, based on the principle of least surprise in GUI design: an app adapting to distinct devices obeys spatial locality and responsive patterns. We evaluate the efficacy of Rx using 37 popular commercial apps (with 14+7 presented in our paper). The results are promising, showing that Rx outperforms prior work by over 47% in cross-device record and replay. Rx also found 2 unique functional bugs and many accessibility bugs; we reported the accessibility bug of Microsoft Outlook and received the developers' confirmation.
The motivating examples demonstrate two things:
even simple apps, if developed by good developers and well-known software companies, responsively adapt their GUIs to different devices, and
our tool Rx can detect such adaptations and replay accordingly.
In the following examples, Google Calculator and Microsoft Word are installed on distinct devices (Pixel XL and Pixel C) from a single APK of exactly the same version. They look totally different, breaking the assumption of prior work. To the best of our knowledge, even though Google Calculator is extremely simple, no prior work can successfully replay it.
Rx records usage scenarios and replays them using the responsive patterns reveal-option and reveal-tab. It is worth noting that, in the Microsoft Word demo, Rx with default settings cannot find the "New"/"+" button that creates a new document; we extend Rx with a transform plugin to help create a new document. Since this is not the default setting, we do not present the extended portion in our paper.
Advanced Calculation using Advanced Pad (Reveal)
Insert a Default Table (Reveal)
Given an app A, at record time (on device D), Rx takes a well-established approach in event tracing. Each logged event is an object e where:
e.type is the event type, e.g., ui:click or sys:gps,
e.params contains the event's parameters,
e.receiver represents the event's associated receiver object (e.g., an Android view for a GUI event), and
e.context is the event's context object consisting of the timestamp, GUI, etc.
Note that e.receiver (r for abbreviation) and e.context (c for abbreviation) themselves are also objects. For example,
r.id and r.text denote the receiver object's resource ID and displayed text;
c.UI denotes the GUI before e is executed.
For any object, which attributes should be logged depends on the actual system implementation.
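To make the event shape concrete, the following TypeScript sketch models a logged event with the attributes listed above; the exact attribute set and types are illustrative assumptions, not Rx's actual schema.

```ts
// A minimal sketch of a logged event, following the attributes described above.
// Attribute sets beyond e.type/e.params/e.receiver/e.context are illustrative only.

interface Receiver {
  id?: string;    // resource ID of the view
  text?: string;  // displayed text
  desc?: string;  // content description (used by several responsive patterns)
  // ... which further attributes are logged depends on the system implementation
}

interface Context {
  timestamp: number; // when the event was recorded
  UI: unknown;       // the GUI (e.g., a view-hierarchy dump) before e is executed
}

interface RecordedEvent {
  type: string;                    // event type, e.g., "ui:click" or "sys:gps"
  params: Record<string, unknown>; // the event's parameters
  receiver: Receiver;              // associated receiver object (e.g., an Android view)
  context: Context;                // context object (timestamp, GUI, ...)
}

// A recorded trace is simply a list of such events.
type Trace = RecordedEvent[];
```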
Our fundamental observation is that a human replayer has no difficulty in replaying event sequences because well-designed apps have a natural, standard, touch-friendly GUI structure following the principle of least surprise. In particular, we identified that:
Views have spatial locality: views providing similar functionality are likely to be placed in a visually adjacent block (namely, a view segment in our paper, denoted by S).
View groups obey responsive patterns, a limited set of officially recommended UI designs for expanding/collapsing/reflowing views.
Based on the above observations, given an event trace (list) τ recorded on device D, the Rx replay framework finds a proper triggering event sequence to manifest e.receiver's counterpart on the replaying device's GUI via TryReplay.
Please navigate to our paper for the detailed replaying algorithm as well as the framework.
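For intuition only, the sketch below outlines what a one-pass greedy replay loop in the spirit of TryReplay could look like. Every helper here (segmentUI, similarSegment, respPatterns, locate, currentUI, execute) is a hypothetical placeholder mirroring the three interfaces discussed in the next section; this is not the paper's actual algorithm.

```ts
// Hypothetical sketch of a one-pass greedy replay loop (not Rx's real implementation).

type View = { id?: string; text?: string; desc?: string };
type Ev = { type: string; receiver: View; context: { UI: unknown } };
type Segment = { views: View[] };

interface ReplayDeps {
  segmentUI(ui: unknown): Segment[];                                 // cf. SegmentUI
  similarSegment(s: Segment, candidates: Segment[]): Segment | null; // cf. Similarity
  respPatterns(e: Ev, target: Segment): Ev[];                        // cf. RespPatterns
  locate(r: View, s: Segment): View | null; // find r's counterpart inside a segment
  currentUI(): unknown;                     // dump the replaying device's GUI
  execute(e: Ev): void;                     // inject an event on the replaying device
}

function tryReplay(trace: Ev[], deps: ReplayDeps): boolean {
  for (const e of trace) {
    // Segment the recorded GUI and the replaying device's current GUI.
    const recorded = deps.segmentUI(e.context.UI);
    const current = deps.segmentUI(deps.currentUI());

    // Find the recorded receiver's segment and its best-matching segment on the replay side.
    const source = recorded.find((s) => deps.locate(e.receiver, s) !== null);
    const target = source ? deps.similarSegment(source, current) : null;
    if (!target) return false;

    // If the receiver is not visible yet, greedily fire triggering events suggested by
    // responsive patterns, then look for it once more (one pass, no backtracking).
    let receiver = deps.locate(e.receiver, target);
    if (!receiver) {
      for (const trigger of deps.respPatterns(e, target)) deps.execute(trigger);
      receiver = deps.locate(e.receiver, target);
      if (!receiver) return false;
    }
    deps.execute({ ...e, receiver });
  }
  return true;
}
```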
Rx's realization of the three interfaces SegmentUI, Similarity, and RespPatterns takes assumptions and techniques that are well known to the research community [VIPS, SARA, RANDR, CraftDroid, ATM] and whose validity has been confirmed by the corresponding papers. We acknowledge that better implementations (e.g., based on statistical learning) may exist; by taking existing heuristic realizations, we argue that even simple realizations can lead Rx to good results.
General. We provide a simple but classic/standard treatment as a reference realization to demonstrate the effectiveness of the Rx framework. Specifically, SegmentUI adapts and extends VIPS, a standard treatment in the UI segmentation field. It follows a bisection algorithm to recursively segment the UI according to a heuristic Naturalness score. The Naturalness score weighs six σ functions, each of which scores whether two views should be categorized into the same segment or not.
Functions. Of the σ functions, half (3/6) are explicitly defined by VIPS and adapted to Android; the other half (3/6) are Android-specific. All σs receive an equal weight (1/6). A code sketch of how they are combined follows the list below.
[Inherited] σcolor checks the background colors of two views: the more similar the colors, the more likely the views belong to the same segment. E.g., all number views in the Google Calculator app have the same background color, which differs from that of the operator views. This is an intuitive visual signal.
[Inherited] σdistance checks the distance between two views: the farther apart they are, the more likely they belong to different segments. E.g., "0" and "result" are far from each other. This is an intuitive visual signal.
[Inherited] σinfo checks how much information (e.g., the number of available properties) the two views carry: the more the amounts differ, the more likely they belong to different segments. This is because views in a single segment are typically similar.
[Specific] σclass checks the classes of two views: different classes mean they are probably in different segments. E.g., "0" is a button whereas "result" is a text view.
[Specific] σscroll checks the scrollable parents of the two views: different parents mean they are more likely in different segments. E.g., "result" has a scrollable parent whereas "0" does not. This is because a list item and a non-list item usually belong to different segments.
[Specific] σnumber checks the difference in the number of views in the two candidate segments: the greater the difference, the more likely they belong to different segments.
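As referenced above, the following sketch shows one way the six σ functions could be combined with equal (1/6) weights into a Naturalness score. The View shape and the individual σ formulas are simplified assumptions for illustration; the real code lives in algo/defaults/normalizer/segment.ts.

```ts
// Illustrative sketch only: equal-weight combination of the six σ functions described above.

type View = {
  bgColor: string;
  bounds: { x: number; y: number; w: number; h: number };
  className: string;
  propertyCount: number;        // rough proxy for how much information a view carries
  scrollableParentId?: string;
};

// Each σ returns a value in [0, 1]: higher means "more likely the same segment".
type Sigma = (a: View, b: View) => number;

const sigmaColor: Sigma = (a, b) => (a.bgColor === b.bgColor ? 1 : 0);

const sigmaDistance: Sigma = (a, b) => {
  const dx = a.bounds.x - b.bounds.x, dy = a.bounds.y - b.bounds.y;
  return 1 / (1 + Math.hypot(dx, dy) / 100); // farther apart => lower score
};

const sigmaInfo: Sigma = (a, b) =>
  1 - Math.abs(a.propertyCount - b.propertyCount) / Math.max(a.propertyCount, b.propertyCount, 1);

const sigmaClass: Sigma = (a, b) => (a.className === b.className ? 1 : 0);

const sigmaScroll: Sigma = (a, b) => (a.scrollableParentId === b.scrollableParentId ? 1 : 0);

// σnumber compares segment sizes, so it takes the two candidate segments instead of two views.
const sigmaNumber = (s1: View[], s2: View[]): number =>
  1 - Math.abs(s1.length - s2.length) / Math.max(s1.length, s2.length, 1);

// Naturalness of keeping views a and b (from candidate segments s1 and s2) together:
// a simple equal-weight (1/6 each) average of the six scores.
function naturalness(a: View, b: View, s1: View[], s2: View[]): number {
  const scores = [
    sigmaColor(a, b), sigmaDistance(a, b), sigmaInfo(a, b),
    sigmaClass(a, b), sigmaScroll(a, b), sigmaNumber(s1, s2),
  ];
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```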
Other Implementations. Other implementations include processing the segment's screenshot with traditional computer-vision methods (e.g., the segmentation of REMAUI and OldFashion), or with deep-learning computer-vision methods (e.g., Rico's design semantic detection).
Source. Please check algo/defaults/normalizer/segment.ts.
General. The goal of Similarity is to detect the semantic distance between two segments. This is similar to (but different from) semantic event matching; nevertheless, many ideas can be borrowed. Rx's prototype also provides a simple/standard reference implementation that abstracts each segment as a document, feature-engineers it into a tf-idf vector, and calculates the distance between vectors by cosine distance. Such a treatment is extensively used in existing literature such as ATM and CraftDroid.
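As a rough illustration of this reference treatment (not the code in algo/defaults/matcher.ts), each segment can be flattened into a "document" of its views' identifiers and texts, vectorized with tf-idf, and compared by cosine similarity. The tokenization and feature choice below are assumptions for illustration.

```ts
// Simplified illustration of segment similarity via tf-idf and cosine distance.

type Segment = { views: { id?: string; text?: string; desc?: string }[] };

// Abstract a segment as a bag of tokens drawn from its views' ids, texts, and descriptions.
function tokens(s: Segment): string[] {
  return s.views
    .flatMap((v) => [v.id, v.text, v.desc])
    .filter((t): t is string => !!t)
    .flatMap((t) => t.toLowerCase().split(/[^a-z0-9]+/))
    .filter((t) => t.length > 0);
}

// Build tf-idf vectors for a corpus of "documents" (tokenized segments).
function tfidf(docs: string[][]): Map<string, number>[] {
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  return docs.map((doc) => {
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    const vec = new Map<string, number>();
    for (const [term, count] of tf) {
      const idf = Math.log(docs.length / (df.get(term) ?? 1));
      vec.set(term, (count / doc.length) * idf);
    }
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [term, w] of a) { dot += w * (b.get(term) ?? 0); na += w * w; }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Similarity of two segments, computed against the corpus of all segments on both screens.
function segmentSimilarity(s1: Segment, s2: Segment, corpus: Segment[]): number {
  const docs = [s1, s2, ...corpus].map(tokens);
  const [v1, v2] = tfidf(docs);
  return cosine(v1, v2);
}
```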
Other Implementations. Other implementations include comparing segments property-by-property over categorized view properties as in GUIDER; traditional computer-vision-based methods such as LIRAT and Meter, which model each segment as a pure image and compare them via image descriptors like SIFT; or deep-learning methods.
Source. Please check algo/defaults/matcher.ts.
General. We induced 13 concrete patterns by investigating the Material Design Responsive Patterns Guideline, the Android Documentation, and the Top-100 apps of the Google Play Store.
Rank. We rank them by the frequency with which they appear in the Top-100 apps and by our experience, following the principle of least surprise and Occam's razor.
Representative Examples. Please navigate to Material Design for representative examples of some well-known patterns like Reveal and Expand. We also copy them below. These examples clearly present how to find hidden triggering events (widgets). They are a great aid to understanding our paper, and we consider clarifying them further in a future revision.
List and Descriptions. We list the patterns here in the order of their ranks for reference; a minimal code sketch of the pattern interface follows the list. In the following, for any pattern Px(e, S'), we use r to denote e.receiver.
1. Pex-vt(e, S'): Expand-1 (VagueText)
Belonging: Expand
Description: When a view is resized, its text may be expanded to become longer, or shrunk to become shorter.
Responsive: Returns a click event on view v' ∈ S' if v'.text starts with r.text or vice versa.
Inverse: Returns a back key event.
2. Pex-vte(e, S'): Expand-2 (VagueTextExt)
Belonging: Expand
Description: VagueTextExt differs from VagueText in that it finds the view v bottom-up along the segment tree.
Responsive: Same as VagueText, except that the view is found bottom-up along the segment tree.
Inverse: Returns a back key event.
3. Pex-vtd(e, S'): Expand-3 (VagueTextDesc)
Belonging: Expand
Description: VagueTextDesc handles views that hide their text behind the content description on some devices but show it as plain text on other devices.
Responsive: Returns a click event on view v' ∈ S' if v'.desc equals r.text or r.desc equals v'.text.
Inverse: Returns a back key event.
4. Pex-sl(e, S'): Scroll
Belonging: Expand
Description: Scroll is the pattern that handles scrollable parents. It assumes that the view to be triggered resides in a scrollable parent.
Responsive: Returns a scroll event on r.parent if r.parent ∈ S' and is scrollable.
Inverse: Returns a scroll event on r.parent in an opposite direction.
5. Pex-opt(e, S'): Options
Belonging: Reveal
Description: MoreOption is the pattern for the more-options button (often located at the top-right or at the end of a container, and responsible for hiding/showing advanced operations).
Responsive: Returns a click event on v' ∈ S' if v'.desc contains the text "More Options".
Inverse: Returns a click event on a blank space.
6. Pex-dm(e, S'): Menu
Belonging: Reveal
Description: DrawerMenu is the pattern for the drawer button (often located at the top-left corner, and responsible for showing/hiding the navigation drawer). It checks the state of this button and returns a click event accordingly.
Responsive: Expands the menu by returning a click event on v' ∈ S' if v'.desc contains the text "Close Navigation Drawer", or collapses the menu by returning a click event on v' ∈ S' if v'.desc contains the text "Open Navigation Drawer".
Inverse: Returns a back key event.
7. Pex-tht(e, S'): Tranx-1 (TabHostTab)
Belonging: Transform
Description: TabHostTab is the pattern for the tabs of a TabHost. It assumes that all tabs of a TabHost are either placed together or hidden behind some tabs, so one can find the target tab either directly on the GUI, or by first finding one of its sibling tabs and then checking whether the hidden siblings are revealed.
Responsive: Returns a click event on v' ∈ S' if v' is a sibling of r in S and r is a tab.
Inverse: Returns a back key event.
8. Pex-thc(e, S'): Tranx-2 (TabHostContent)
Belonging: Transform
Description: TabHostContent is the pattern for the contents of a TabHost. Like TabHostTab, TabHostContent finds the tab corresponding to the view to be triggered; it makes the same assumptions and applies in the same way as TabHostTab.
Responsive: First locates r's enclosing tab v ∈ S, then returns a click event on v' ∈ S' if v' is a sibling of v in S.
Inverse: Returns a back key event.
9. Pex-dsvp(e, S'): Pager-1 (DoubleSideViewPager)
Belonging: Transform
Description: DoubleSideViewPager is the pattern for handling a ViewPager that exists on both the recording and the replaying device. This pattern assumes that the view to be triggered is hidden by its ViewPager parent and can be manifested via this parent.
Responsive: Returns a click event on v' ∈ S' if r.parent is a ViewPager and v' is its counterpart in S'.
Inverse: Returns a back key event.
10. Pex-ssvp(e, S'): Pager-2 (SingleSideViewPager)
Belonging: Transform
Description: Different from DoubleSideViewPager, SingleSideViewPager is the pattern for handling a ViewPager that appears only on the replaying side. Like DoubleSideViewPager, this pattern manifests the view by operating on its parent.
Responsive: Since the ViewPager appears only on the replaying side, this pattern returns a click event on v' ∈ S' if v'.parent is a ViewPager and r is a child of v'.
Inverse: Returns a back key event.
11. Pex-dfdp(e, S'): Divide-1 (DualFragmentGotoDescriptive)
Belonging: Merge/Divide
Description: DualFragmentGotoDescriptive handles the dual-fragment layout where the recorded action targets the descriptive preview but the replaying device currently shows a detailed fragment; we first need to return to the descriptive preview.
Responsive: Returns a BACK key event if we are in a detailed fragment at present.
Inverse: Returns a click event on the view v* (of the current replaying device's GUI) that is most similar to S'.
12. Pex-dfde(e, S'): Divide-2 (DualFragmentGotoDetailed)
Belonging: Merge/Divide
Description: DualFragmentGotoDetailed handles the dual-fragment layout where an action was recorded in the detailed view but we are currently at the descriptive preview; we first parse and find the selected item in the descriptive preview, then go to the detailed view by tapping that item.
Responsive: Returns a click event on view v' ∈ S' if v'.parent is a list containing a view whose text is r's fragment title, and v' is selected, and v' resides in a descriptive fragment.
Inverse: Returns a back key event.
13. Pex-nu(e, S'): Navigate (NavigationUp)
Belonging: Transform
Description: NavigationUp is the most frequently used Transform pattern; it simulates a navigate-up (return-back) action.
Responsive: This pattern returns a click event on view v' ∈ S' if v'.desc contains text "Navigation Up".
Inverse: Returns a click event on the view v* (of the current replaying device's GUI) that is most similar to S'.
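As referenced at the beginning of this list, a responsive pattern can be viewed as a pair of functions (responsive, inverse) over a recorded event and a target segment. The sketch below illustrates this shape with the Options pattern; the types, the matching on the content description, and the simplified inverse (a back key instead of a blank-space click) are assumptions, not the prototype's actual interfaces in algo/defaults/recognizer/patterns.ts.

```ts
// Illustrative shape of a responsive pattern P(e, S'); types are simplified assumptions.

type View = { id?: string; text?: string; desc?: string };
type Ev = { type: string; receiver: View };
type Segment = { views: View[] };

interface ResponsivePattern {
  name: string;
  // Try to return a triggering event that manifests e.receiver's counterpart within S'.
  responsive(e: Ev, target: Segment): Ev | null;
  // Undo the triggering event (e.g., a back key press) if the attempt did not help.
  inverse(e: Ev, target: Segment): Ev;
}

const backKey: Ev = { type: "ui:key-back", receiver: {} };

// Pattern 5 (Options, a Reveal pattern): click the "More options" button to reveal hidden actions.
const optionsPattern: ResponsivePattern = {
  name: "Options",
  responsive: (_e, target) => {
    const more = target.views.find((v) => v.desc?.toLowerCase().includes("more options"));
    return more ? { type: "ui:click", receiver: more } : null;
  },
  // The list above specifies a click on a blank space; a back key press is used here for brevity.
  inverse: () => backKey,
};
```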
Source. Please check algo/defaults/recognizer/patterns.ts.
The following gives the segmentation and matching results of the apps mentioned in our paper. For each example below, the upper image shows the segmentation of the whole GUI using highlighted red bounding boxes, and the lower image shows the matching using colors (the same color indicates a match).
Segment and Match Ex. 1
Segment and Match Ex. 2
Segment and Match Ex. 3
Segment and Match Ex. 4
Our evaluation is designed around the following two research questions:
RQ1: To what extent does Rx push forward the state-of-the-art of cross-device record and replay?
RQ2: What are the causes of replay failures for Rx?
Test Devices: To answer these two questions, we collect top commercial apps from the Google Play Store and conduct record and replay on three typical emulated Android devices of different screen sizes and pixel densities:
Pixel XL: Portrait, 1440x2560, 560dpi,
Pixel 3 XL: Portrait, 1440x2960, 560dpi, and
Pixel C: Landscape, 2560x1800, 320dpi.
Baselines: We select the state-of-the-art open-source record-and-replay tools RERAN (for single-device record and replay) and SARA (for both single- and cross-device record and replay) as our comparison baselines. For closed-source tools, we contacted the authors of RANDR and V2S for tool binaries but received no response. We also tried to compare with appetizer, a deprecated tool that was once the state of the practice. Unfortunately, we had to exclude it because it is extremely unstable and frequently loses events.
Comparison Subjects: Given that almost every app uses a scrollable list, we collect the first set by picking the top-1 or top-2 apps of each category ranked by AppBrain that are compatible with our evaluation environment. It is worth noting that,
we omitted the Games, Demos and Libraries, and Auto and Vehicles categories, because they do not provide a user interface or use custom rendering;
we omitted the Tools and Social categories, because we cannot run them on some of our devices (e.g., they crash or require a physical device);
we also omitted the Finance category because downloading these apps would violate DMCA copyright.
These selection criteria yielded a set of 30 apps, which we then downloaded from ApkCombo. Note that, since SARA failed to parse 14 of these apps and 2 apps crash frequently, we only present 14 apps in our paper.
Case Study Subjects: The second set of 7 apps concerns the practical usefulness of cross-device record and replay; their usage scenarios are beyond the capability of any known available record-and-replay tool. We selected 7 top-rated "killer" apps developed by the most renowned software companies. Most of them are heavy apps with a complex GUI that behaves quite differently on different screen sizes, except for the interesting case of Android's official Calculator app, which adopts a non-trivial expand responsive pattern.
For each evaluated app, we followed existing work [Lam et al., FSE'17] and [Guo et al., ISSTA'19] to select and create usage scenarios. That is, we create 3-5 typical usage scenarios per app that represent its most common functionalities according to the app description in Google Play and our own usage experience. We then replay them on distinct devices. We conduct both single-device and cross-device replay for all apps. Specifically,
We compare with RERAN and SARA for single-device replay using the comparison apps;
We compare with SARA for cross-device replay using the comparison apps;
We do not compare with any baseline for the case study apps, considering the baselines' incapability of replaying GUIs powered by responsive patterns.
This produces a 103x9x3 "UsageScenarios x Replay x Tool" matrix for the comparison apps, and a 30x9x1 matrix for the case study apps. As presented in our paper, these reduce to a 47x9x3 matrix and a 30x6x1 matrix.
The evaluation results are presented in the following sheets. The first four sheets contain results that are also presented in our paper; the remaining sheets include the full evaluation results with a detailed description of each evaluated technique, including apps not presented in our paper.
In the sheets, a "v" mark represents a replay success, whereas an "x" stands for a replay failure.
Overall, the results show that Rx significantly outperforms existing techniques, pushes forward the state of the art of cross-device record and replay, and solves many non-trivial cross-device replay cases that are beyond the scope of any known existing work, achieving an 83.7% and 79.4% success rate for single-device and cross-device replay on the comparison apps, respectively, as well as an 80.0% success rate on the case study apps.
Interestingly, Rx also found 2 functional bugs and many accessibility bugs, of which one in Microsoft Outlook was confirmed by its developers.
For a comprehensive understanding of the evaluation results, please read the above sheets or navigate to our paper.
Record and replay is a foundational technology for a broad spectrum of Android app testing and debugging practices. This paper makes cross-device record and replay practical for industrial-scale apps by leveraging the principle of least surprise in GUI design, i.e., spatial locality and responsive patterns, with promising experimental results. We hope this work, which pushes forward the state of the art of cross-device record and replay, will serve as a call for future research along this line.
Please navigate to our github repo.
Note:
The tool is named Rx in our paper but dx in our repo. They are exactly the same thing.
The tool is developed with an early version of Deno (a JavaScript/TypeScript runtime created by Ryan Dahl, the author of Node.js) as well as its standard/third-party libraries like Cliffy. It may therefore be somewhat incompatible with Deno's current stable releases. Feel free to create issues or even pull requests if that happens.