Research

1. Algebraic-datatype taint tracking, with applications to understanding Android identifier leaks

Current taint analyses track flow from sources to sinks and report the results simply as source → sink pairs, or flows. This is imprecise and ineffective in many real-world scenarios; examples include taint sources that are mutually exclusive, or flows that combine sources (e.g., IMEI and MAC address concatenated, hashed, and leaked vs. IMEI and MAC address hashed separately and leaked separately). These shortcomings are particularly acute in the context of Android, where sensitive identifiers can be combined, processed, and then leaked in complicated ways. To address these issues, we introduce a novel, algebraic-datatype taint analysis that generates rich yet concise taint signatures involving AND, XOR, and hashing, akin to algebraic (product and sum) types. We implemented our approach as a static analysis for Android that derives app leak signatures: an algebraic representation of how, and where, hardware/software identifiers are manipulated before being exfiltrated to the network. We performed six empirical studies of algebraic-datatype taint tracking on 1,000 top apps from Google Play and their embedded libraries, including: discerning between “raw” and hashed flows, which eliminates a source of imprecision in current analyses; finding apps and libraries that go against Google Play’s guidelines by (ab)using hardware identifiers; showing that third-party code, rather than app code, is the predominant source of leaks; exposing potential de-anonymization practices; and quantifying how apps have become more privacy-friendly over the past two years.
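
A leak signature is, in essence, an algebraic datatype over identifier sources. Below is a minimal Python sketch of how such signatures could be modeled; the constructor names (Source, And, Xor, Hashed) are illustrative, not the paper's actual implementation.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Source:
    name: str                 # a base identifier, e.g., "IMEI" or "MAC"

@dataclass(frozen=True)
class And:                    # product type: sources combined (e.g., concatenated)
    left: "Signature"
    right: "Signature"

@dataclass(frozen=True)
class Xor:                    # sum type: mutually exclusive sources
    left: "Signature"
    right: "Signature"

@dataclass(frozen=True)
class Hashed:                 # a value hashed before reaching the sink
    inner: "Signature"

Signature = Union[Source, And, Xor, Hashed]

# The two flows from the example above get distinct signatures:
combined_then_hashed = Hashed(And(Source("IMEI"), Source("MAC")))          # concatenated, then hashed
hashed_separately    = And(Hashed(Source("IMEI")), Hashed(Source("MAC")))  # hashed separately
assert combined_then_hashed != hashed_separately   # the analysis can tell these apart
```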

Details can be found HERE

Presentation can be found here

2. Diagnosing Medical Score Calculator Apps

Mobile medical score calculator apps are widely used among practitioners to help make decisions regarding patient treatment and diagnosis. Errors in score definition, input, or calculation can result in severe and potentially life-threatening situations. Despite these high stakes, there has been no systematic or rigorous effort to examine and verify score calculator apps. We address these issues via a novel, interval-based score-checking approach. Based on our observation that medical reference tables themselves may contain errors (which can propagate to apps), we first introduce automated correctness checking of reference tables. Specifically, we reduce score correctness checking to partition checking (coverage and non-overlap) over score parameters’ ranges. We checked 12 scoring systems used in emergency, intensive, and acute care. Surprisingly, though some of these scores have been used for decades, we found errors in 5 score specifications: 8 coverage violations and 3 non-overlap violations. Second, we design and implement an automatic, dynamic analysis-based approach for verifying score correctness in a given Android app; the approach combines efficient, automatic GUI extraction and app exploration with partition/consistency checking to expose app errors. We applied the approach to 90 Android apps that implement medical score calculators, and found 23 coverage violations in 11 apps, 32 non-overlap violations in 12 apps, and 16 incorrect score calculations in 16 apps. We reported all findings to developers, which so far has led to fixes in 6 apps.
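
To illustrate the reduction, here is a minimal Python sketch of partition checking over one parameter's range; the interval table is a made-up example, not a real scoring system.

```python
# Minimal sketch of partition checking for one score parameter.
# Intervals are half-open [lo, hi); the table below is illustrative only.

def check_partition(intervals, domain_lo, domain_hi):
    """Check that `intervals` (list of (lo, hi) pairs) partition [domain_lo, domain_hi):
    report coverage gaps and pairwise overlaps."""
    gaps, overlaps = [], []
    ivs = sorted(intervals)
    # Coverage: consecutive intervals must tile the domain with no holes.
    cursor = domain_lo
    for lo, hi in ivs:
        if lo > cursor:
            gaps.append((cursor, lo))         # coverage violation
        cursor = max(cursor, hi)
    if cursor < domain_hi:
        gaps.append((cursor, domain_hi))
    # Non-overlap: every pair of intervals must be disjoint.
    for i, (lo1, hi1) in enumerate(ivs):
        for lo2, hi2 in ivs[i + 1:]:
            if lo2 < hi1:                     # sorted, so overlap iff the next starts early
                overlaps.append(((lo1, hi1), (lo2, hi2)))
    return gaps, overlaps

# Hypothetical "heart rate" rows from a reference table, with a deliberate gap and overlap:
rows = [(0, 40), (40, 60), (55, 100), (110, 300)]   # misses [100, 110), overlaps at [55, 60)
print(check_partition(rows, 0, 300))
# -> ([(100, 110)], [((40, 60), (55, 100))])
```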

Details can be found HERE

Presentation can be found here

3. Detecting Potential User-data Save & Export Losses due to Android App Termination

A common feature in Android apps is saving, or exporting, the user’s work (e.g., a drawing) or data (e.g., a spreadsheet) onto local storage as a file. Due to the volatile nature of the OS and the mobile environment in general, the system can terminate apps without notice, which prevents the execution of file write operations; consequently, user data that was supposed to be saved/exported is instead lost. Testing apps for such potential losses raises several challenges: how to identify data originating from user input or resulting from user action (and then check whether it is saved), and how to reproduce a potential error by terminating the app at the exact moment when unsaved changes are pending. We address these challenges via an approach that finds potential “lost writes”, i.e., user data that was supposed to be written to a file, but whose file write does not take place due to system-initiated termination. Our approach consists of two phases: a static analysis that finds potential losses, and a dynamic loss-verification phase where we compare lossy and lossless system-level file write traces to confirm errors. We ran our analysis on 2,182 apps from Google Play and 38 apps from F-Droid. Our approach found 163 apps where termination caused losses, including loss of the user’s app-specific data, notes, photos, work, and settings. In contrast, two state-of-the-art tools aimed at finding volatility errors in Android apps failed to discover the issues we found.
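
The verification idea can be pictured as a simple comparison of file-write traces. The sketch below is illustrative only; the trace format (a list of path/size records) is an assumption, not the tool's actual representation.

```python
# Minimal sketch of the trace-comparison idea behind loss verification.
# Assumes each trace is a list of (path, bytes_written) file-write records,
# e.g., recovered from system-level tracing; the format is illustrative.

def lost_writes(lossless_trace, lossy_trace):
    """Return writes present in the lossless run but absent when the app
    was terminated mid-run: candidates for user-data loss."""
    written_when_killed = {path for path, _ in lossy_trace}
    return [(path, size) for path, size in lossless_trace
            if path not in written_when_killed]

# Lossless run: the user's note is flushed to disk on a clean exit.
clean = [("/data/data/app/files/note.txt", 412),
         ("/data/data/app/files/prefs.xml", 96)]
# Lossy run: the system terminated the app before the note was written.
killed = [("/data/data/app/files/prefs.xml", 96)]

print(lost_writes(clean, killed))   # -> [('/data/data/app/files/note.txt', 412)]
```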

Details can be found HERE


4. Quantifying Nondeterminism and Inconsistency in Self-organizing Map Implementations

Self-organizing maps (SOMs) are a popular approach for neural network-based unsupervised learning. However, the reliability of SOM implementations has not been investigated. Using internal and external metrics, we define and check two basic SOM properties. First, determinism: a given SOM implementation should produce the same SOM when run repeatedly on the same training dataset. Second, consistency: two SOM implementations should produce similar SOMs when presented with the same training dataset. We checked these properties in four popular SOM implementations, running our approach on 381 popular datasets used in health, medicine, and other critical domains. We found that implementations violate these basic properties. For example, 375 out of 381 datasets have nondeterministic outcomes; for 51–92% of datasets, toolkits yield significantly different SOM clusterings; and clustering accuracy can be so inconsistent as to vary by a factor of four between toolkits. This undermines the reliability of SOMs, and of results obtained via SOMs. Our study shines a light on what to expect, in practice, when running actual SOM implementations. Our findings suggest that for critical applications, SOM users should not take reliability for granted; rather, multiple runs and different toolkits should be considered and compared.
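
As an illustration of the determinism check, the sketch below uses MiniSom, a popular Python SOM library (not necessarily one of the four implementations studied): train twice on the same data without fixing a seed, then compare per-sample cluster assignments.

```python
# Sketch of the determinism check: train the same SOM twice on the same data
# and compare the best-matching-unit (BMU) assignment of every sample.
import numpy as np
from minisom import MiniSom

data = np.random.RandomState(0).rand(200, 4)   # stand-in for a real dataset

def train_and_assign(seed=None):
    som = MiniSom(5, 5, 4, sigma=1.0, learning_rate=0.5, random_seed=seed)
    som.train_random(data, 1000)
    return [som.winner(x) for x in data]       # BMU grid cell per sample

run1, run2 = train_and_assign(), train_and_assign()
agreement = np.mean([a == b for a, b in zip(run1, run2)])
print(f"samples assigned to the same unit across runs: {agreement:.0%}")
# anything below 100% across repeated unseeded runs indicates nondeterminism
```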

Details can be found HERE

Presentation can be found here

5. On the effectiveness of random testing for Android: or how I learned to stop worrying and love the monkey

Random testing of Android apps is attractive due to its ease of use and scalability, but its effectiveness can be questioned. Prior studies have shown that Monkey, a simple approach and tool for random testing of Android apps, is surprisingly effective, "beating" much more sophisticated tools by achieving higher coverage. We study how Monkey's parameters affect code coverage (at the class, method, block, and line levels) and set out to answer several research questions centered on improving the effectiveness of Monkey-based random testing in Android and on how it compares with manual exploration. First, we show that random stress testing via Monkey is extremely efficient (85 seconds on average) and effective at crashing apps, including 15 widely-used apps that have millions (or even billions) of installs. Second, we varied Monkey's event distribution to change app behavior and measured the resulting coverage; we found that, except for isolated cases, altering Monkey's default event distribution is unlikely to lead to higher coverage. Third, we manually explored 62 apps and compared the resulting coverage; coverage achieved via manual exploration is just 2–3% higher than that achieved via Monkey exploration. Finally, our analysis shows that coarse-grained coverage is highly indicative of fine-grained coverage, hence coarse-grained coverage (which imposes low collection overhead) hits a performance vs. accuracy sweet spot.
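
For illustration, here is what a Monkey run with a modified event distribution looks like, wrapped in Python for reproducibility; the package name is a placeholder and the flag values are arbitrary.

```python
# Illustrative Monkey invocation via adb; the target package is hypothetical.
# The --pct-* flags reshape Monkey's event distribution.
import subprocess

cmd = [
    "adb", "shell", "monkey",
    "-p", "com.example.app",     # hypothetical target package
    "-s", "42",                  # fixed seed, so a crashing run can be replayed
    "--throttle", "100",         # ms pause between events
    "--pct-touch", "60",         # raise the share of touch events
    "--pct-appswitch", "10",     # inject more activity switches
    "-v",                        # verbose output
    "5000",                      # total number of injected events
]
subprocess.run(cmd, check=True)
```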

Details can be found HERE

Presentation can be found here

6. Keystroke/Mouse Usage-Based Emotion Detection and User Identification

Emotions are primarily thought of as mental experiences of body states, shown mostly in the face through precise and specific muscle patterns. Emotion is perhaps the most critical attribute of living beings, yet it is extremely difficult to detect and generate artificially, and its detection remains a classical, well-explored problem. Existing approaches for detecting human emotions generally demand significant infrastructural overhead. In this paper, we propose a much simpler way of detecting emotion that avoids these overheads. We induced different emotional states through various multimedia components, and then collected participants’ keystrokes (free text) and mouse usage data through a custom-developed survey. We used several existing classifiers (KNN, KStar, RandomCommittee, and RandomForest) and a newly proposed lightweight classifier, Bounded K-means Clustering, to analyze the usage data for different emotional states. Our analysis demonstrates that emotion can be detected from usage data up to a certain level. Moreover, our proposed classifier detects five emotional states (happiness, inspiration, sympathy, disgust, and fear) better than the existing classifiers. The analysis also reveals that user identification through usage dynamics does not achieve a good level of accuracy when usage is influenced by different emotional states.
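
As a rough illustration of the classification setup (the study itself used Weka's classifiers plus the proposed Bounded K-means, not scikit-learn), here is a sketch with hypothetical keystroke/mouse features and placeholder data.

```python
# Minimal sketch of emotion classification from usage features using KNN.
# Feature names and data are hypothetical placeholders, for illustration only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical per-session features: mean key dwell time, mean flight time,
# typing speed, mouse speed, mouse click rate.
rng = np.random.RandomState(0)
X = rng.rand(150, 5)
# Labels: 0=happiness, 1=inspiration, 2=sympathy, 3=disgust, 4=fear
y = rng.randint(0, 5, size=150)

knn = KNeighborsClassifier(n_neighbors=5)
print("cross-validated accuracy:", cross_val_score(knn, X, y, cv=5).mean())
```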

Details can be found HERE

Presentation can be found HERE 

Video 

7. Evaluation of Android Security (Undergraduate Thesis)

Nowadays, the number of smartphones is increasing rapidly with the advancement of networking and technology, and smartphones have gained popularity through their use in daily life. Users make the most of their smartphones through various kinds of applications, which use smartphone features and make themselves a necessity for the user. These applications can be downloaded from trusted sources as well as from third-party websites. Trusted stores cannot fully guarantee, but largely assure, harmless applications; third-party websites offer no such assurance. As a result of downloading applications from such websites, users sometimes fall victim to losing their valuable information to third parties, so detecting malicious applications has become a major issue for users. Reverse engineering in malware analysis is the process of studying a piece of malware in order to understand its operation, code structure, and functionality. Our work aims to understand the operation of malware and investigate the parameters, code, and structures created or modified by the malicious software. We surveyed various tools for investigating malware, which fall into two categories: static and dynamic analysis. We used Androguard for static analysis and ASEF for dynamic analysis, along with Drozer, a leading security framework that identifies security holes in Android applications.
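
As an illustration of the kind of static inspection Androguard supports, the sketch below loads an APK (the path is a placeholder) and lists facts relevant to malware triage, such as the requested permissions.

```python
# Illustrative Androguard usage for static analysis; "suspect.apk" is a placeholder path.
from androguard.misc import AnalyzeAPK

a, d, dx = AnalyzeAPK("suspect.apk")   # APK object, DEX objects, analysis object
print("Package:", a.get_package())
print("Permissions requested:")
for perm in a.get_permissions():
    print(" ", perm)                   # e.g., android.permission.READ_SMS
print("Activities:", a.get_activities())
```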

Details can be found HERE

Poster Presentation can be found HERE 

Related presentations can be found here: presentation1, presentation2