Understanding Alexa Skill Security Indicators
Note:
This website provides details about a project under anonymous submission.
This study's data collection was conducted between May 2020 and September 2020. All studies were approved by the university's Institutional Review Board (IRB). No personally identifiable information was collected or stored. To comply with the university's COVID-19 guidelines, no in-person user experiments were conducted.
Amazon Alexa's booming third-party skill market has grown from 160 to 100,000 skills within three years. We aim to demystify the Alexa skill permission system by studying its security indicators. Our user study results show that most of the surveyed Alexa users did not understand the security implications of interacting with third parties via Alexa's voice user interface (VUI). Despite the potential risks of undesired resource sharing, more than two-thirds of the surveyed Alexa users considered third-party skills safe because they think these skills are Alexa- or Amazon-owned applications. Together with other uncovered deficiencies of skill security indicator designs, our study indicates a pressing need for a paradigm shift in designing security indicators for VUI systems.
Two-fold User Study
We design two user studies to explore the never-before-studied skill security indicators.
First, we conducted a user survey with 137 valid Alexa users to quantitatively test the effectiveness of skill security indicators in warning users against potential risks. Specifically, we leveraged a warning analysis model called Communication-Human Information Processing (C-HIP)~\cite{wogalter2006communication} and focused on three key steps between the delivery of a security indicator and users' final behavior: attention, comprehension, and behavior. Each step is a prerequisite for the subsequent ones; a failure at any step renders the warning delivery ineffective.
Second, we performed a qualitative skill experiment, recruiting 41 valid Alexa users to use Alexa skills and then take part in an interactive interview. This experiment not only validates the results of the user survey but also scrutinizes why users behaved the way they did.
What's the difference?
While the user survey aimed to collect user data based on their past experience, the interview-based skill experiment was designed to capture users' perception of skill security indicators in an immediate, in-depth manner.
How do you design your user survey?
When designing the user survey, we adhered to two design principles.
First, following best practices in studying security indicators, both the user survey and the skill experiment were used to correlate and justify the study results. Second, we leveraged the C-HIP model as a guideline for designing both studies. Specifically, three key steps in the C-HIP model are used in this work:
Do users pay attention to a skill security indicator? A user needs to switch focus from the primary task (i.e., installing a skill) to the security indicator, and she needs to focus on the security indicator long enough to read and evaluate it.
Do users understand the risk of granting permissions (for allowing resource access via skill permissions, account linking, or skill I/O) to third-party skills? Users need to understand the scope and implications of the permission.
Do skill security indicators influence users’ installation decisions? Do users ever cancel installation because of the security indicators? Users should not install skills whose permissions exceed their comfort thresholds.
What about recruitment?
We handled recruitment and payment distribution for the user survey using Amazon Mechanical Turk (MTurk). We applied several MTurk recruitment filters to ensure the quality of data collection. For example, we recruited MTurk workers who had high job approval rates (greater than 90%) and a sufficient number of approved jobs (greater than 500). Also, we recruited participants from the United States because our target Alexa language is English (US).
We initially recruited 150 respondents and paid $10 for each finished task. To ensure the quality of this online data collection, we included two attention check questions (e.g., ``Can you open the survey link properly?'') to help filter out bots and inattentive respondents.
We also filtered out respondents who were inconsistent in answering the questions. For example, at the beginning of the survey we asked whether the respondent owns a smartphone, and we excluded the answers of those who responded negatively: we only recruited respondents who have used Alexa, and a smartphone is needed to set up an Echo device or use the Alexa mobile application.
Alexa users may use different portals, so we asked respondents to select their primary way of exploring and installing skills (mobile, web, or voice command). We did not consider the results from those who only use voice commands to install skills, because skills with account linking or skill permissions cannot be installed via voice commands alone. As a result, 137 respondents' answers were collected. We allotted 18 minutes for the survey; the average completion time was 14 minutes 17 seconds.
Among the 137 respondents, 65 were male and 72 were female, with the remainder declining to identify their gender. Most respondents (42%) were between 29 and 39 years old; 22% were between 18 and 28, 26% between 40 and 50, 8% between 51 and 61, and 2% over 62. Most participants were frequent Alexa users (54%), 58% used the web Alexa store as their primary Alexa portal, and more than 65% held a bachelor's degree or higher.
Moreover, we found that some respondents said they had never used skills with permissions. Note that many skills published in the Alexa store only perform simple tasks like question answering; these skills usually do not request skill permissions or account linking. Since our study focuses on permissions, we decided to analyze the results from the 124 respondents who have experience with skill permissions. To address this inconsistency, we changed 137 to 124 and explicitly discuss on this project website why some respondents' answers were excluded.
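The respondent-filtering steps above can be summarized as a small validity check over the survey export. The sketch below is illustrative only; the record fields and example values are hypothetical, not the actual survey data.

```python
# Hypothetical respondent records; field names are illustrative only.
responses = [
    {"owns_smartphone": True,  "passed_attention": True,  "portal": "mobile", "used_permissions": True},
    {"owns_smartphone": True,  "passed_attention": True,  "portal": "voice",  "used_permissions": True},
    {"owns_smartphone": False, "passed_attention": True,  "portal": "web",    "used_permissions": True},
    {"owns_smartphone": True,  "passed_attention": False, "portal": "mobile", "used_permissions": True},
]

def is_valid(record):
    # Consistency check: Alexa use requires a smartphone
    # (to set up an Echo device or run the Alexa app).
    if not record["owns_smartphone"]:
        return False
    # Attention checks filter out bots and inattentive respondents.
    if not record["passed_attention"]:
        return False
    # Voice-only installers are excluded: skills with permissions or
    # account linking cannot be installed via voice commands alone.
    if record["portal"] == "voice":
        return False
    # Only respondents with skill-permission experience are analyzed.
    return record["used_permissions"]

valid = [r for r in responses if is_valid(r)]
```

Applying each criterion independently also makes it easy to report how many respondents each filter removed.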
How do you design the skill experiment?
In the skill experiment, respondents were instructed to stop using the skills once they were installed (except for the first check-in skill) to prevent unwanted information sharing.
For each skill installation instruction, we designed a three-step procedure to mimic the real-world usage of an Alexa skill:
S.1 Participants are asked to use a skill within a given real-world scenario. For mobile Alexa users, we asked respondents to search for the skill on their own. Account credentials are provided if account linking is needed. One example instruction for using the Strava skill in Alexa: "Pretend that you are a frequent runner and you want to use Alexa to access your Strava account. You use the following skill after searching for different news skills. Please click the link using your phone or computer. www.amazon.com/Running-history-check-Strava-unofficial/dp/B06XG4Z1N6/"
S.2 Now please try to use it. Note we do not evaluate how you use the Alexa skill. The following interview questions were designed to collect your feedback for using Alexa skills.
S.3 After you used the skill for the first time, please 1) disable the skill, 2) close the skill homepage, and 3) finish the following survey questions.
Please feel free to stop using it if you decide not to proceed with the skill.
Skill Permission Prompt
The permission prompt should not be designed like a permission manager. The current design gives the false impression that the skill has already been installed and that users are being asked to remove any preset permissions if they want.
As discussed in Finding 1, users tend to dismiss the prompt window as quickly as possible by clicking the ``Save Permission'' button. First, skill permissions should not be preset to true.
Second, the prompt window design should encourage users to carefully check the skill permission descriptions because they may not be sure of the risks (Finding 2). For example, similar to the mobile permission prompt design, the skill permissions' prompt window should ask users to choose either ``Allow'' or ``Reject'' for each permission. This can potentially increase the users' willingness to check permission details.
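The recommended prompt design can be sketched as follows. This is a minimal illustration of the proposed interaction, not Amazon's actual API: every permission starts undecided, and the user must explicitly allow or reject each one instead of saving a preset list.

```python
def prompt_permissions(permissions, ask):
    """Ask the user to Allow/Reject each requested permission.

    `permissions` is a list of permission names; `ask` is a callable
    returning True (allow) or False (reject). In a real UI, `ask`
    would render one Allow/Reject dialog per permission, with no
    preset default and no single "Save Permissions" shortcut.
    """
    decisions = {}
    for perm in permissions:
        # Nothing is preset: the user must decide on every permission.
        decisions[perm] = bool(ask(perm))
    return decisions

# Example: a user allows reminders but rejects address access
# (permission names here are illustrative).
answers = {"Alexa Reminders": True, "Device Address": False}
decisions = prompt_permissions(list(answers), lambda p: answers[p])
```

Forcing a per-permission decision, as mobile permission prompts do, gives users a natural pause to read each permission description before granting it.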
Skill I/O
As mentioned in Finding 5, it is not sufficient to only use a passive warning to alert the users to the risks of skill I/O. How a skill could leverage skill I/O should be clearly described in a more obvious manner. First, skills' I/O-related warning can be considered as a type of skill permission to be displayed in the permission prompt window. This would help increase the attention rate and help users understand what the risks could be.
Another way to alert the users is to play audio warnings when users are interacting with the skills. Unfortunately, this is a challenging task because it will significantly decrease usability by reducing the effective interaction time. Audio-based warning messages could still be useful. For example, Alexa may use a different voice or accent when playing speech from a third-party skill.
We also believe that post-usage warnings can be helpful. For example, in the current iOS permission system, a post-usage warning for background resource usage will warn users of how many times an app (such as Google Maps) has used the user's resource (e.g., location) in the background. Alexa may learn the lesson of leveraging post-usage warnings to help users understand implicit permission usage.
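A post-usage warning of this kind boils down to counting implicit resource accesses per skill and reporting them back to the user. The sketch below illustrates the idea under assumed data: the access log, skill names, and message wording are all hypothetical, not part of any existing Alexa API.

```python
from collections import Counter

# Hypothetical log of implicit (background) resource accesses,
# recorded as (skill name, resource) pairs.
access_log = [
    ("Running history check", "location"),
    ("Running history check", "location"),
    ("Daily Quiz", "reminders"),
]

def post_usage_warnings(log):
    """Summarize how often each skill used each resource implicitly,
    mirroring iOS-style post-usage background-access warnings."""
    counts = Counter(log)
    return [
        f'"{skill}" accessed your {resource} {n} time(s) in the background.'
        for (skill, resource), n in counts.items()
    ]

warnings = post_usage_warnings(access_log)
```

Surfacing such summaries periodically, rather than at install time, would help users notice implicit permission usage after the fact.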
Skill Identity and Reviews
According to Finding 3, many Alexa users may not know who manages the skills. When further analyzing the user study results, we find that users often paid more attention to other elements (instead of the skill security indicators), such as skill names and user reviews. We recommend adding more indicator-related designs for these elements to the skill homepages.
For example, Twitter uses a blue verified badge beside an account's public profile name to let users know that ``an account of public interest is authentic.'' Similarly, the platform could highlight useful reviews to help users understand a skill's problems. This would potentially help users pay more attention to third-party skills' security- and privacy-related issues.
Educational UI
We find that many users have limited knowledge of Alexa-specific permissions. An educational user interface should be applied in the Alexa installation process. In the current permission prompt, only vague descriptions are provided for Alexa-specific permissions, such as the unclear definition of "Alexa Reminders"; these are not sufficient to help users understand the potential risks. First, any descriptions shown in skill security indicators should be accurate and easy to understand. Second, more contextual information (e.g., how third parties would use users' data) can be included on the skill homepage.