Assessing Intrusiveness of Smartphone Apps

by
Fan Zhang
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2012
© 2012 Fan Zhang. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce and to
distribute publicly paper and electronic copies of this thesis document
in whole and in part in any medium now known or hereafter created.
Author
Department of Electrical Engineering and Computer Science
August 22, 2012
Certified by
Hal Abelson
Professor
Thesis Supervisor
Accepted by
Prof. Dennis M. Freeman
Chairman, Masters of Engineering Thesis Committee
Assessing Intrusiveness of Smartphone Apps
by
Fan Zhang
Submitted to the Department of Electrical Engineering and Computer Science
on August 22, 2012, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
We tackle the challenge of improving transparency for smartphone apps by focusing
on the intrusiveness component of assessing privacy risk. Specifically, we develop
a framework for qualitatively assessing and quantitatively measuring the intrusiveness of apps based on their data access behavior. This framework has two essential
components: 1) the Privacy Fingerprint, a concise yet holistic visual that captures
the data access patterns unique to each app, including which types and under which
privacy-relevant usage contexts sensitive data are collected, and 2) an Intrusiveness
Score that numerically measures each app's level of intrusiveness, based on real data
accesses gathered from empirical testing on about 40 popular Android apps across
4 app categories. Used together, the Privacy Fingerprint and Intrusiveness Score
help smartphone users easily and accurately assess the relative intrusiveness of apps
during the decision-making process of installing apps. Our study demonstrates that
the Intrusiveness Score is especially useful in helping to compare apps that exhibit
similar types of data accesses. Another major contribution of the thesis is the identification and quantification of the proportion of accesses that are made while the
user is idle. As our preliminary user study will show, this level of idle access activity
significantly enhances the profiling potential of apps, increasing the app's intrusiveness. When quantified, idle access activity exerts significant impact on changes in
an app's Intrusiveness Score and its relative intrusiveness ranking within a given app
category.
Thesis Supervisor: Hal Abelson
Title: Professor
Acknowledgments
This thesis would not have been possible without the help and support of many
people.
First and foremost, I would like to thank my advisor, Hal Abelson, for his guidance
and encouragement throughout these past years. His shrewd insights and constructive
critiques have pushed me to refine my ideas and turn them into more compelling
arguments. He has always been very supportive, believing in me even when I was
faltering.
Fuming Shih has been a wonderful source of ideas and illuminating discussions.
I thank him for his numerous insights and patience through our long weekly brainstorming sessions, where our wandering thoughts crystallized into fully-formed ideas.
I appreciate his constant encouragement and his contagious enthusiasm. The novel
ideas presented in this thesis are inspired by our talks.
Wen Dong helped me tremendously in processing the location data and turning
them into much more appealing visuals and animations.
I thank him for all his
thorough explanations and useful tips.
Brian Patt was an enormous help in the initial stages of this thesis. He provided
much valued expertise in navigating through undocumented Android Source Code.
His advice was always very useful, and without his help, I would not have been able
to move forward from the implementation stages.
K. Krasnow Waterman was a great source of background knowledge during the
initial phases of framing the thesis. I thank her for helping me find a suitable topic
that was worth studying.
My friends have provided me with constant emotional support. I thank them for
their understanding, even when I disappeared for long stretches of time to work on
this thesis, and for being there to entertain me during my much needed breaks.
Finally, I'd like to thank my parents for their unwavering support and encouragement in all my endeavors. They have always been patient and kind, even when I was
undeserving, and this thesis is a tribute to them.
Contents
1 Introduction
    1.1 Background
    1.2 Our Research
    1.3 Contribution
    1.4 Thesis Roadmap
2 Motivating Scenario
    2.1 Scenario Description
    2.2 Scenario Discussion
    2.3 Benefit of Privacy Fingerprint and Intrusiveness Score
    2.4 Descriptions of the Privacy Fingerprint and Intrusiveness Score
3 Related Work
    3.1 Current Transparency Mechanisms
    3.2 Information Monitoring Frameworks
    3.3 Using Transparency to Increase User Control
    3.4 Improving Transparency Interfaces
    3.5 Quantifying App Behavior
4 AppWindow Architecture
    4.1 Monitoring Framework
    4.2 Logger Application
    4.3 Use Case
5 Intrusiveness Studies
    5.1 Qualitative Intrusiveness Study
        5.1.1 Experimental Design
        5.1.2 Profiling Simulation
    5.2 Quantitative Intrusiveness Study
        5.2.1 Experimental Design
6 Quantifying Intrusiveness
    6.1 Intrusiveness Factors
        6.1.1 Usage Factors
        6.1.2 Datatype Identifiability
    6.2 Formulating the Privacy Fingerprint
        6.2.1 High-level Approach
        6.2.2 Step 1: Generating Access Contexts
        6.2.3 Step 2: Assigning Intrusiveness Weights to Each Context
        6.2.4 Step 3: Computing the Frequency Distribution of Access Contexts
        6.2.5 Step 4: Visualizing the Privacy Fingerprint
    6.3 Computing the Intrusiveness Score
        6.3.1 Intrusiveness Score Formula
        6.3.2 Intrusiveness Subscores
    6.4 Discussion
7 Results for Qualitative Intrusiveness Study
    7.1 Unexpected Accesses
    7.2 Profiling Simulations
        7.2.1 Movement Profiling
        7.2.2 Activity Profiling
        7.2.3 Interaction Profiling
8 Results for Quantitative Intrusiveness Study
    8.1 Average Intrusiveness Scores for App Categories
    8.2 Intrusiveness for Individual Apps within App Categories
        8.2.1 Intrusiveness for Games Apps
        8.2.2 Intrusiveness for Photography Apps
        8.2.3 Intrusiveness for Messaging Apps
        8.2.4 Intrusiveness for Social Apps
9 Discussion
    9.1 Qualitative Intrusiveness Study
    9.2 Quantitative Intrusiveness Study
10 Conclusion
    10.1 Summary
    10.2 Future Work
A Privacy Fingerprints
    A.1 Privacy Fingerprints for Game Apps
    A.2 Privacy Fingerprints for Photography Apps
    A.3 Privacy Fingerprints for Messaging Apps
    A.4 Privacy Fingerprints for Social Apps
B AppWindow Code
    B.1 Modifications to Android APIs
    B.2 Source Code
List of Figures
2-1 Intrusiveness Assessment Interface
4-1 AppWindow Architecture
6-1 Privacy Fingerprint for Badoo - Meet New People
7-1 Heatmap of locations accessed by Google Maps over 4 weeks, collected while user was idle
7-2 The user's level of activity is estimated by the average number of data accesses logged by AppWindow each day.
7-3 User activity is estimated by the average number of data accesses logged by AppWindow each hour.
7-4 Interaction patterns over a week are estimated by counting up the number of times a given bluetooth device is logged by AppWindow each day.
7-5 Interaction patterns over a day are estimated by counting up the number of times a given bluetooth device is logged by AppWindow each hour.
8-1 Privacy Fingerprint showing the data access behavior for Angry Birds
A-1 Privacy Fingerprint showing the data access behavior for Angry Birds
A-2 Privacy Fingerprint showing the data access behavior for Bow Man
A-3 Privacy Fingerprint showing the data access behavior for Cut the Rope Free
A-4 Privacy Fingerprint showing the data access behavior for Fishing Star
A-5 Privacy Fingerprint showing the data access behavior for Flow
A-6 Privacy Fingerprint showing the data access behavior for Fruit Ninja
A-7 Privacy Fingerprint showing the data access behavior for Paper Toss app
A-8 Privacy Fingerprint showing the data access behavior for Scramble Free
A-9 Privacy Fingerprint showing the data access behavior for Temple Run
A-10 Privacy Fingerprint showing the data access behavior for UNOO
A-11 Privacy Fingerprint showing the data access behavior for Camera360 Ultimate
A-12 Privacy Fingerprint showing the data access behavior for FxCamera
A-13 Privacy Fingerprint showing the data access behavior for Instagram
A-14 Privacy Fingerprint showing the data access behavior for Photo Grid
A-15 Privacy Fingerprint showing the data access behavior for PicsArt - Photo Studio
A-16 Privacy Fingerprint showing the data access behavior for Pudding Camera
A-17 Privacy Fingerprint showing the data access behavior for Facebook Messenger
A-18 Privacy Fingerprint showing the data access behavior for GO SMS Pro
A-19 Privacy Fingerprint showing the data access behavior for Google Voice
A-20 Privacy Fingerprint showing the data access behavior for Handcent SMS
A-21 Privacy Fingerprint showing the data access behavior for Kakao Talk
A-22 Privacy Fingerprint showing the data access behavior for Pinger
A-23 Privacy Fingerprint showing the data access behavior for Skype
A-24 Privacy Fingerprint showing the data access behavior for WhatsApp
A-25 Privacy Fingerprint showing the data access behavior for Badoo - Meet New People
A-26 Privacy Fingerprint showing the data access behavior for Bump
A-27 Privacy Fingerprint showing the data access behavior for Facebook
A-28 Privacy Fingerprint showing the data access behavior for Foursquare
A-29 Privacy Fingerprint showing the data access behavior for Google+
A-30 Privacy Fingerprint showing the data access behavior for WhatsApp
A-31 Privacy Fingerprint showing the data access behavior for MeetMe - Meet New People
A-32 Privacy Fingerprint showing the data access behavior for Skout
A-33 Privacy Fingerprint showing the data access behavior for Twitter
List of Tables
2.1 Android Permissions for Hypothetical Social Hangout Apps
4.1 List of Sensitive Datatypes logged by AppWindow
6.1 Intrusiveness Weights for Sensitive Access Contexts
6.2 Distribution of access contexts for Badoo
7.1 Percentage of Idle Access Activity
8.1 Ranking of App Categories by Average Intrusiveness
8.2 Average Intrusiveness Subscores by Category
8.3 Ranking of App Categories by Average Intrusiveness Subscores
8.4 Ranking of Game Apps by Decreasing Intrusiveness Score
8.5 Intrusiveness Subscore Breakdown of Game Apps
8.6 Rankings of Game Apps by Intrusiveness Subscores
8.7 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Games Apps
8.8 Intrusiveness Scores for Photography Apps
8.9 Intrusiveness Subscore Breakdown of Photography Apps
8.10 Ranking of Photography Apps by Intrusiveness Subscores
8.11 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Photography Apps
8.12 Intrusiveness Subscore Breakdown of Messaging Apps
8.13 Ranking of Messaging Apps by Intrusiveness Subscores
8.14 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Messaging Apps
8.15 Social Apps Ranked by Decreasing Intrusiveness Score
8.16 Intrusiveness Subscore Breakdown of Social Apps
8.17 Ranking of Social Apps by Intrusiveness Subscores
8.18 Effect of Idle Access Activity on Overall Intrusiveness Ranking of Social Apps
B.1 Android API methods that were modified for AppWindow
Chapter 1
Introduction
1.1 Background
In today's world, where data is the new currency, social networks the new playing
ground, and mobile phones the new vehicle for information exchange, privacy is becoming obsolete. As consumers increasingly rely on smartphones for their daily needs,
more of their personal information is placed onto mobile devices, providing apps and
third party advertisers with a wealth of privacy sensitive information to mine and profile user behavior. Indeed, numerous studies [7, 20, 4] led by academic researchers and
investigative journalists alike have confirmed consumer fears over potential privacy
breaches of their data.
Most consumers voluntarily offer their personal data in return for free and useful
services, oftentimes without sufficient realization of the privacy risks that accompany such personal information disclosures. This lack of understanding, however, can
hardly be blamed on app users. They have been brought up in an environment that
offers users little choice and control over what types of data they disclose, and little
visibility into how they are used once obtained.
Various efforts have been made by app developers and mobile platforms like iOS
and Android to increase transparency in order to gain user trust, but most of these
mechanisms are ineffective, and at best, incomplete. Android's permission framework
[1] requires developers to specify each piece of sensitive data (e.g., fine and coarse
grain location, device id, personal contacts, etc.) required to run their application
software, asking smartphone users for all the necessary permissions upon installation.
However, permission descriptions are rarely read by users, and even when they are,
rarely understood, most likely due to their length and abundant technical jargon [10].
This lack of user interest and comprehension may be motivated by the fact that users
have little choice in installing an Android app. They must accept all permissions,
however overreaching, or be precluded from using the app altogether.
iOS conducts its own review process for content, quality, and security, but its
criteria are available only to developers and not for public perusal. Thus,
it does not enhance transparency for users. Given that numerous popular iPhone
apps [4, 20] were exposed as collecting sensitive information (device id, contact data)
unnecessary for app functionality, it is doubtful how effective Apple's review process
actually is at filtering out rogue apps. Currently, iPhone apps are only required to
present notifications for user consent to gather location information, while accesses
to other types of potentially sensitive data (device id, contact info, etc.) occur in
the background unbeknownst to the user. Nevertheless, the notification scheme is
received positively by consumers, although it often fosters a false sense of assurance,
as many users subsequently mistake location to be the only type of personal data
collected by the app [15].
Both platforms also support Privacy Policies posted by app companies, but these
policies are often consumed by legalese, rendering them inaccessible to the typical
user who is eager to explore the functionalities of a new app and does not have the
patience to sift through 60-page documents in fine print (and even finer print on a
tiny smartphone screen) about legal matters.
1.2 Our Research
With privacy breaches (intentional or not) hitting the news every couple of months
and the lack of supporting transparency structures that effectively educate consumers
and assuage their privacy concerns, there is a compelling need for mechanisms that
clearly and concisely help users gain a holistic and accurate understanding about the
sensitive access behaviors of their apps, empowering them to make better, informed
decisions about choosing which apps to install.
We tackle the challenge of improving transparency for smartphone apps by focusing on the intrusiveness component of assessing privacy risk. Specifically, the purpose
of this thesis is to provide a framework for qualitatively assessing and quantitatively
measuring the intrusiveness of apps based on their sensitive data access behavior, so
that users can easily and effectively assess and compare the relative intrusiveness of
apps. This work is accomplished in three high-level steps, each motivating a different
component of our study:
1. We determine the types and patterns of data accesses performed by apps.
2. We qualitatively assess the intrusiveness of such data collections.
3. We construct a framework to quantify the intrusiveness of apps so that their
intrusiveness can be easily assessed and compared by users.
The first step is accomplished by building AppWindow, a context-aware information monitoring system constructed from existing Android Open Source Code
that logs every instance of privacy sensitive data accessed by an app (See Table 4.1
in Chapter 4 for a comprehensive list of the sensitive datatypes logged by AppWindow). We also log contextual information for each data request, because it is crucial
for appropriately assessing the level of intrusiveness for each data access (See Section
4.2 for the list of context information logged by AppWindow).
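To make the logging step concrete, the following is a minimal sketch of what a single logged access might look like. The field names and the two context flags are illustrative assumptions, not AppWindow's actual log schema, which is described in Chapter 4.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRecord:
    """One logged sensitive-data access (hypothetical fields, not AppWindow's schema)."""
    app: str           # name of the requesting app
    datatype: str      # e.g. "fine_location", "device_id", "contacts"
    timestamp: str     # ISO 8601 time of the request
    user_idle: bool    # assumed context flag: user was not interacting with the phone
    user_moving: bool  # assumed context flag: device was in motion


def make_record(app, datatype, user_idle, user_moving, when=None):
    """Build a record, stamping it with the current UTC time unless one is given."""
    when = when or datetime.now(timezone.utc)
    return AccessRecord(app, datatype, when.isoformat(), user_idle, user_moving)


def to_log_line(record):
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping the context flags next to the datatype in each record is what would later allow accesses to be grouped by the conditions under which they occurred.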
The second step is reached by the first of two experimental studies: the Qualitative Intrusiveness Study. We use this study to confirm the prevalence of data
accesses performed under unexpected, and thereby intrusive, conditions. To justify
the importance of this study, we assess the privacy implications of long-term intrusive accesses. We do this by performing aggregate analysis on data collected over
one month from one smartphone user, simulating the powerful inferences that can be
gleaned by an app about the user's movement, activity, and interaction patterns based
on the location data collected by the app and accompanying contextual information.
The third step is achieved by the Quantitative Intrusiveness Study. We
handpick about 40 Android apps from the top 20 apps listed in each of 4 app categories
(Games, Messaging, Photography, Social) from the Google Play Store and rigorously
test each app under a set of predefined usage conditions, using AppWindow to monitor
all sensitive data accesses. We break down an app's data accesses into a set of access
contexts, assigning each a different intrusiveness weight based on the type of sensitive
data accessed and the privacy-relevant usage contexts under which the data was collected.
We use these weighted access contexts to generate a Privacy Fingerprint, visually
capturing the sensitive data access patterns unique to each app. We then quantify
the intrusiveness of these data accesses by computing a single-number Intrusiveness
Score based on a normalized sum of the weighted access contexts, so that users can
easily and accurately assess the relative intrusiveness of apps when they choose which
ones to install (See Appendix A for the Privacy Fingerprints generated
for all the tested apps).
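The idea of a normalized sum of weighted access contexts can be sketched as follows. This is only an illustration under stated assumptions: the weights are placeholder values, an access context is reduced to a (datatype, user_idle) pair, and the normalization (dividing by the largest weight) is one plausible choice. The formula actually used is given in Section 6.3.1 and is not reproduced here.

```python
from collections import Counter

# Hypothetical weights per access context (datatype, user_idle); placeholder values only.
WEIGHTS = {
    ("fine_location", True): 1.0,
    ("fine_location", False): 0.6,
    ("device_id", True): 0.5,
    ("device_id", False): 0.3,
}


def context_distribution(accesses):
    """Return the relative frequency of each access context in the log."""
    counts = Counter(accesses)
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def intrusiveness_score(accesses, weights=WEIGHTS):
    """Weighted sum of context frequencies, scaled by the largest weight to lie in [0, 1].

    Every context in `accesses` must have an entry in `weights`.
    """
    if not accesses:
        return 0.0
    dist = context_distribution(accesses)
    weighted = sum(weights[ctx] * freq for ctx, freq in dist.items())
    return weighted / max(weights.values())
```

Because the score depends on the frequency distribution rather than raw counts, two apps with the same mix of access contexts receive the same score regardless of how long each was tested.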
1.3 Contribution
Together, the Privacy Fingerprint and Intrusiveness Score make significant improvements to existing transparency mechanisms in the following ways:
Our framework quantifies an app's intrusiveness.
Current transparency mechanisms (e.g., Android Permissions Model, TaintDroid)
focus on qualitatively assessing an app's potential privacy risk by identifying sensitive
data accesses. Our research probes deeper into an app's access behaviors, quantifying
the intrusiveness of access contexts based on various privacy-relevant usage factors.
We use these access contexts, weighted by intrusiveness, to compute a single-number
Intrusiveness Score, which can be used to efficiently compare the relative intrusiveness
of apps.
The Intrusiveness Score is calculated based on real data accesses performed by apps.
Unlike Android Permissions Lists and official app Privacy Policies, which provide
overly technical and ambiguous cues into possible sensitive data accesses, the Intrusiveness Score is computed based on an app's real access behavior collected under
various privacy-relevant usage contexts, using the empirical testing data logged by
AppWindow.
The Privacy Fingerprint and Intrusiveness Score are concise representations.
The visual representation of data access patterns and single score for intrusiveness are concise and intuitive for users to understand. Users are no longer required
to sift through inaccessible technical jargon and lengthy Privacy Policies and Terms
of Service agreements to understand what kind of data apps are collecting and under
which privacy-relevant usage conditions they are collecting it.
The Intrusiveness Score accounts for various intrusive usage contexts.
We quantify intrusiveness not only as a function of the sensitivity of data accessed,
but also the sensitivity of usage contexts under which the data is accessed. For
example, data retrieved while a user is idle is considered intrusive, because it is likely
unexpected by the user. We also deem data accessed under moving conditions more
intrusive than data retrieved while the user is not moving, because the app is able
to glean more about a user's movement profile under the former condition.
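The ordering described above (idle accesses weigh more than active ones, and accesses while moving weigh more than stationary ones) can be illustrated with a small sketch. The base sensitivities, the multipliers, and the choice to combine them multiplicatively are all assumptions made for illustration; the weights actually used are listed in Table 6.1.

```python
# Hypothetical base sensitivity per datatype (placeholder values).
DATATYPE_SENSITIVITY = {
    "fine_location": 3.0,
    "coarse_location": 2.0,
    "device_id": 1.0,
}

# Hypothetical multipliers for the two usage contexts discussed in the text.
IDLE_MULTIPLIER = 2.0    # access while the user is idle is likely unexpected
MOVING_MULTIPLIER = 1.5  # access while moving reveals more about movement


def context_weight(datatype, user_idle, user_moving):
    """Combine datatype sensitivity with usage-context multipliers (assumed multiplicative)."""
    weight = DATATYPE_SENSITIVITY[datatype]
    if user_idle:
        weight *= IDLE_MULTIPLIER
    if user_moving:
        weight *= MOVING_MULTIPLIER
    return weight
```

Under these assumed values, an idle fine-location access weighs twice as much as the same access made during active use, which reflects the qualitative ordering in the text but not its actual magnitudes.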
1.4 Thesis Roadmap
In the following thesis, we present both the technical and conceptual framework for
assessing the intrusiveness of smartphone applications, identifying important intrusiveness factors and evaluating their effect on the relative intrusiveness of apps. We
first motivate the need for a better transparency framework by presenting a concrete
scenario that illustrates the limitations of current mechanisms, namely Android's Permission Model. We follow this hypothetical, yet realistic scenario-driven discussion
with a comprehensive review of related work, including user studies on the effectiveness
of existing transparency frameworks, information monitoring frameworks, access
control frameworks, context-aware privacy frameworks, and an approach for quantifying data access patterns of apps. We highlight the contribution and limitations of
each study.
Next, we take a step back from the conceptual discourse and delve into the implementation details behind AppWindow, explaining how it was built and how it
monitors the data access activities of applications. We use AppWindow to log the
sensitive data requests made by apps in our two experimental studies (Qualitative
Intrusiveness Study, Quantitative Intrusiveness Study), and we describe the methodology and experimental design decisions for each. Key findings from each study are
then presented and discussed. For the Qualitative Intrusiveness Study, we present
the powerful inferences that can be made by apps about a smartphone user's movement, activity, and interaction patterns. For the Quantitative Intrusiveness Study, we
explain how the Intrusiveness Score is computed, justifying the inclusion of various
contributing factors. We compare and contrast the Intrusiveness Scores of various
apps in four categories (Games, Photography, Messaging, Social), highlighting trends
in intrusiveness within and across each app category. Most notably, we find that
Messaging apps are the most intrusive out of the four app categories, whereas Photography apps, on average, are far less intrusive. We also find that most of the tested
apps, especially in the Messaging and Social categories, exhibit significant idle access
activity, which significantly affects the relative intrusiveness rankings of apps within
each category. Finally, we summarize our findings and qualify our contributions,
suggesting areas for improvement and further research.
Chapter 2
Motivating Scenario
We describe a scenario that illustrates the limitations of the Android Permission
Model in promoting app transparency to users. We walk through the internal motivations and concerns of one hypothetical Android user named Anna when she is
searching for an app that recommends fun hangout places for her and her friends.
We then discuss how the Privacy Fingerprint and Intrusiveness Score can be used to
assuage her privacy and usability concerns, enabling her to make a more informed,
confident decision in choosing which app to install.
Note to reader: Though hypothetical, our scenario reflects the concerns of users
interviewed in numerous studies conducted to assess the effectiveness of Android's
Permission Model [10, 15, 14, 9]. We focus on Android apps for the scenario because
our current implementation of AppWindow is tied to the Android framework. However, our conceptual framework for the Privacy Fingerprint is generalizable to any
smartphone platform.
2.1 Scenario Description
As a new freshman in college, Anna is interested in exploring the party scene, but
she and her new friends are all unfamiliar with the area. Anna's parents just bought
her an Android smartphone as a going-away present, so she decides to use an app
to find popular hangout places in the city. Anna knows there are several well-known
apps that provide this service, so she goes to the Google Play Store to find the best
one. Anna types "social hangout" into the search bar, and she is presented with
a list of candidate apps. She focuses on the top 3: FriendHangout, PlacesForYou,
and HotSpots. Each has more than 100,000 Android smartphone users. Anna reads
the reviews for all three. They are mostly positive, and the features look similar. All
three are rated 4.5 stars, but HotSpots has the fewest users. Anna still can't make a
decision based on these metrics, so she clicks on another tab, "Permissions", to see
what extra information it can offer. Here is a summary of what she sees:
Table 2.1: Android Permissions for Hypothetical Social Hangout Apps
FriendHangout              PlacesForYou                 HotSpots
(123,143 users)            (145,345 users)              (121,341 users)
Coarse location            Coarse location              Coarse location
Fine location              Fine location                Fine location
Contact data               Contact data                 Contact data
Calendar data              Send SMS                     Change wifi state
Device id                  Record audio                 Browsing data
Take pictures and videos   Modify/delete USB storage    Coarse location
                           and SD card content
Anna notices that all three apps access fine and coarse location and contact data.
Beyond that, they all request different information, but she does not know how to
assess the further permissions. There are descriptions displayed under each permission
of varying length, but she tires of reading them because most of them are rather
long and technical. For example, she sees the following description for fine location
permission (as opposed to coarse):
Access fine location sources such as the Global Positioning System on
the tablet, where available.
Malicious apps may use this to determine
where you are, and may consume additional battery power. Access fine
location sources such as the Global Positioning System on the phone,
where available. Malicious apps may use this to determine where you are,
and may consume additional battery power.
Anna only reads the first two sentences and then quickly loses focus because the
description is too long.
The word "malicious" strikes her as a bit disconcerting,
but she reasons that the app is safe because almost all of the popular apps require
the Fine Location permission. She glances over the other permissions required by
the three apps under consideration and is confused by the practical implications of
"Modify/delete USB storage and SD card content" and "Change wifi state".
Eager to try out the app, Anna decides to install PlacesForYou because it has
the most users and received good ratings.
She also likes the idea that it allows
users to send SMS and share pictures and videos (as deduced from the Permissions
List). Though the permissions would give the app access to much of her personal
data (contact, SMS, pictures, location), Anna reasons that given its popularity and
high ratings, the app cannot be doing anything too egregious, lest the company lose
consumer confidence. She also naively assumes that a big and reputable company
like Google must perform some sort of review process for all the apps uploaded onto
Google Play Store, as Apple does for iPhone, in order to filter out malicious and rogue
apps that may try to sell her personal data for money.
2.2 Scenario Discussion
Anna's experience demonstrates the ineffectiveness of the Android Permissions List in
conveying an app's privacy risk. Indeed, the permissions list barely raises any privacy
concerns because Anna bases her trust on the app's positive reviews, popularity, high
ratings, and Google's reputation [15]. If anything, she uses the list as an indicator of
features, not privacy risk. Furthermore, the Android Permissions List, though intended
to provide greater transparency, is so inaccessible that it does not effectively promote
understanding of an app's access activities. On the contrary, the interface actually
facilitates misunderstanding as a consequence of the user's resulting ignorance. Without
a clear grasp of the apps' sensitive access behaviors, Anna falls back on a false sense
of security, causing her to misplace trust in the Android platform and its applications.
To summarize, we identify the following limitations of the Android Permission
Model as barriers to user understanding:
• Long descriptions (Don't want to read it)
• Technical jargon (Don't understand it)
• Naive trust in platform (Might as well trust it)
• Hard to compare apps (How do I pick one over the other?)
In the next section, we show how the Privacy Fingerprint and Intrusiveness Score
can be used to overcome these hurdles, instilling greater knowledge and assurance in
smartphone users over the sensitive behavior of their apps.
2.3
Benefit of Privacy Fingerprint and Intrusiveness Score
We argue that when used together, the Privacy Fingerprint and Intrusiveness Score
are more effective than the existing Android Permission Model at promoting transparency. We cite the following reasons:
• Concise (Very little explanatory text, primarily a visual and a numerical score)
• Minimal technical jargon (Focus on intrusiveness and not understanding technical terms)
• Informed trust (More understanding about how apps access data)
• Enables easy comparison of apps (Intrusiveness Scores are easy to compare)
Imagine that instead of perusing a lengthy Permissions List cluttered
by verbose descriptions, Anna clicks on a tab called "Intrusiveness Assessment" for
PlacesForYou and is presented with the interface shown in Figure 2-1. The first thing
[Figure 2-1 screenshot: the Privacy Fingerprint for PlacesForYou (axis label: proportion of data accesses) shown beside its Intrusiveness Score of 18.7 and a "Compare with Similar Apps" chart listing intrusiveness ranking, app name, and Intrusiveness Score: 1, PlacesForYou, 18.7; 3, FriendHangout, 17.2; 20, HotSpots, 11.0.]
Figure 2-1: Intrusiveness Assessment Interface
that catches her eye is the big, bolded Intrusiveness Score displayed at the top right
corner. Not knowing what this number means, she quickly scans the page and finds
the comparison chart immediately below it, with an arrow pointing to the current app
under consideration. There, she finds that PlacesForYou is the most intrusive out of
the apps that offer similar services. Whereas FriendHangout is only two spots behind,
HotSpots is ranked significantly lower in intrusiveness. Thus, the Intrusiveness Score,
when combined with the comparison chart, gives Anna a good sense of the relative
intrusiveness of social hangout apps.
Although Anna finds the score comparison helpful, she does not know what this
"intrusiveness" means, nor how it is calculated. She now looks to the visual representation (Privacy Fingerprint) of the app's data access behavior displayed on the
left-hand side of the interface.
The long bands nearing the bottom of the fingerprint indicate that PlacesForYou spends the most time accessing contact information. Since the bands are at the bottom, they also represent more intrusive accesses.
Anna quickly glances over the other data types displayed in bold, and finds that the
app also accesses gps, phone number, and device id. Moving up the visual, she is
surprised, and a bit spooked, to find that all of this information is gathered when
she's not even using the app, with 61% of the accesses taking place while the user
is idle. She considers this practice to be sneaky and unwarranted and wonders how
this access behavior compares to those of FriendHangout and HotSpots, which rank
lower in intrusiveness. Anna now clicks on the link for HotSpots in the comparison
chart at the bottom right-hand corner, which takes her to another similar interface.
There (not shown), she finds that HotSpots only accesses device id information and
has an idle access percentage of 10%, significantly lower than PlacesForYou's 61%.
Now, Anna has a better intuitive understanding of why PlacesForYou was ranked so
much higher in intrusiveness than HotSpots. HotSpots accesses fewer pieces of sensitive information, as shown by the smaller number of bands in its Privacy Fingerprint
(1 vs. 5). Furthermore, HotSpots' access of device id is not as intrusive as the accesses of
contact info, phone number, and gps performed by PlacesForYou (as indicated by the device id band's
higher vertical placement in the fingerprint). Finally, most of the sensitive information gathered by HotSpots is collected while the user is active, unlike PlacesForYou, which
accesses the majority of its data while the user is idle.
Anna now considers all the information she has at hand about each of the three
apps - comparable reviews, comparable ratings, comparable popularity (although
PlacesForYou has about 20,000 more users than HotSpots), but vastly different Intrusiveness Scores. The Intrusiveness Assessment Interface has helped to differentiate
the otherwise similar apps on one important factor - intrusiveness, thereby simplifying Anna's decision-making process. Instead of going with PlacesForYou based on her
original rationale of greater popularity, she now compromises that factor in favor of
reduced intrusiveness and chooses to install HotSpots. Her increased understanding
of the apps' access behaviors not only alerts her to the surprising fact that
apps access data while users are idle, but also gives her comfort and assurance that
the app she has chosen is not accessing her more personal information (sms, phone
number, etc.) behind her back and sending it to advertisers. Anna now feels confident that she has installed an app that provides great performance and usability,
with minimal intrusiveness.
2.4
Descriptions of the Privacy Fingerprint and Intrusiveness Score
We spent the last section walking through a scenario and explaining how the Privacy
Fingerprint and Intrusiveness Score can improve the transparency of sensitive app behavior, aiding in quicker, more accurate risk assessment during the decision-making
process of choosing apps. Now, we provide a brief discussion of what these components are and how they are generated. The step-by-step details of this computation
are further explained in Chapter 6.
The Intrusiveness Score captures the aggregate intrusiveness of data accesses performed by the app in a single, memorable number. This concise yet potent metric
helps smartphone users make quick yet accurate assessments about the relative intrusiveness of apps, facilitating the decision-making process. To compute the Intrusiveness Score, we divide the data accesses of each app into unique access contexts, assigning each context an intrusiveness weight, based on various factors that contribute
to intrusiveness. These factors include the sensitivity of data collected (determined by
how personally identifiable a particular data type is), the percentage of total accesses
accounted for by that particular access context, and the privacy-relevant usage conditions under which the data is accessed (user moving vs. user not moving, user active
vs. user idle). The intrusiveness formula finally distills these various intrusiveness
factors into one number, so that the intrusiveness of apps can be easily compared.
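As a rough illustration of how the factors above might combine, the following Python sketch computes a weighted sum over access contexts. The sensitivity values, the context multipliers, and the multiplicative form are placeholder assumptions chosen for illustration only; the actual formula and weights are those explained in Chapter 6.

```python
from dataclasses import dataclass

# Hypothetical sensitivity weights (higher = more personally identifiable).
SENSITIVITY = {"contact": 3.0, "gps": 2.5, "phone number": 2.0, "device id": 1.0}

# Hypothetical multipliers for privacy-relevant usage conditions.
IDLE_MULTIPLIER = 2.0    # access while the user is idle
MOVING_MULTIPLIER = 1.5  # access while the user is moving


@dataclass(frozen=True)
class AccessContext:
    datatype: str
    user_idle: bool
    user_moving: bool
    share_of_accesses: float  # fraction of the app's total accesses, 0..1


def context_weight(ctx):
    """Intrusiveness weight of one access context (assumed multiplicative)."""
    weight = SENSITIVITY[ctx.datatype]
    if ctx.user_idle:
        weight *= IDLE_MULTIPLIER
    if ctx.user_moving:
        weight *= MOVING_MULTIPLIER
    return weight


def intrusiveness_score(contexts):
    """Sum each context's weight scaled by its share of total accesses."""
    return sum(context_weight(c) * c.share_of_accesses for c in contexts)


# Two made-up apps: one reads contacts mostly while the user is idle,
# the other only reads the device id while the user is active.
app_a = [
    AccessContext("contact", user_idle=True, user_moving=False, share_of_accesses=0.6),
    AccessContext("device id", user_idle=False, user_moving=False, share_of_accesses=0.4),
]
app_b = [
    AccessContext("device id", user_idle=False, user_moving=False, share_of_accesses=1.0),
]
score_a = intrusiveness_score(app_a)
score_b = intrusiveness_score(app_b)
```

Under these placeholder weights the first app scores higher than the second, because it touches a more sensitive datatype and does so while the user is idle.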
Accompanying the numerical metric of intrusiveness is its necessary complement, the Privacy Fingerprint. It is used as an explanatory visual aid for the Intrusiveness
Score, efficiently capturing the rationale behind each app's Intrusiveness Score to
facilitate understanding and aid the decision-making process in cases of vacillation
caused by consideration of other app factors besides intrusiveness (e.g., performance,
usability, popularity). Generated from empirical testing data collected under a set of
predefined usage conditions, the Privacy Fingerprint provides a concise yet holistic
representation of the data access patterns unique to each app. By displaying the
breakdown and intrusiveness of each of an app's data access contexts (composed of
the type of sensitive data collected and the privacy-relevant usage conditions under
which it is obtained), the Privacy Fingerprint allows smartphone users to quickly
grasp the various factors that lead to different and similar Intrusiveness Scores. Thus,
the fingerprint provides Anna with greater transparency and understanding into an
app's access behavior, details which are otherwise masked by the Intrusiveness Score.
To summarize, this newly proposed Intrusiveness Assessment Interface for smartphone apps provides a transparency mechanism that is concise, informative and holistic, and enables easier comparison across apps, a significant improvement upon existing tools (e.g., Android Permissions List). By encouraging user engagement, these
improved features increase the likelihood that users like Anna will more accurately
understand the access behavior of apps and appropriately assess and compare the
intrusiveness of apps during their decision-making process.
In the long term, we hope increased engagement with the Intrusiveness Assessment Interface will push app developers to minimize the frequency and volume of
sensitive data they collect in order to maintain a competitive advantage over similar
app companies.
Chapter 3
Related Work
Here we discuss work that has been done in the field of smartphone transparency and
permission control. We first motivate the need for better transparency mechanisms
by discussing the ineffectiveness of existing transparency models. Then we examine
proposed solutions to this problem, highlighting their contributions and their limitations and explaining how our study differs from each. We then look at quantitative
approaches that can be used to aid our work in developing a framework for measuring
intrusiveness.
3.1
Current Transparency Mechanisms
Recent user studies have shown that existing transparency frameworks on the Android and iOS platforms are largely ineffective in conveying an app's true actions,
motivating the need for improved transparency mechanisms that will increase user
understanding and facilitate more accurate risk assessment of apps.
Felt et al [10] interviewed and observed 25 Android users to assess the effectiveness of Android permissions at warning users about the potential risks of installing
applications. These studies showed that only 17% of participants even looked at permissions. Moreover, only 21% of those who did accurately understood their content.
42% were completely unaware of Android's Permission Lists, leading the authors to
conclude that current Android permissions warnings do not help most users make
security decisions. While the platform's goal is to inform the user of the capabilities
[their] applications have [1], the study's findings suggest that the permissions system
is an ineffective way of providing transparency to users.
King [15] interviewed 24 iPhone and Android users about their privacy expectations of both their apps and the smartphone platform. The author found there was a
noticeable "privacy gap" among "users' privacy expectations, smartphone usage, and
the current information access practices by application developers". The interviews
showed that the majority of Android users demonstrate an over-reliance on existing assurance structures, wrongly believing that Google reviews all applications for
quality and security control before posting them in the Google Play Store for users
to download. Furthermore, only 2 out of the 13 Android users felt they understood
what the permissions granted after reading them, suggesting the language may have
been ambiguous or overly technical. Despite their lack of understanding, however,
the majority of the Android interviewees expressed that they liked being asked permission about personal data collection.
This response indicates that transparency
efforts are indeed appreciated by users. King's study also showed that participants
were more comfortable sharing their real-time location, device ID, and location history than photos, address book, call logs, text messages, and files stored on SD cards,
with iPhone users selecting location twice as often as Android users. The author
posits that the iOS runtime prompt for accessing location has made iPhone users
more aware of (and perhaps less sensitive to) location requests. Overall, King's study
shows that current transparency efforts, though appreciated by users, have created
confusion and caused them to overestimate the actual amount of assurance provided
by platforms over protecting users against unscrupulous apps.
Kelley et al [14] also conducted interviews with 20 Android users in two American
cities to assess their understanding of Android permissions.
The study found that
while users generally view and read the Android permissions list before installing an
app, the permissions themselves are not well understood. Based on the interviews,
Kelley et al partly attribute this lack of understanding to language that is "at best
vague, and at worst confusing, misleading, jargon-filled, and poorly grouped". Thus,
the permission descriptions are not likely to be a significant factor in making the
decision to install an app. Furthermore, interviewees showed difficulty describing
the possible harm that could be caused by applications collecting and sharing their
personal information, highlighting another area of mobile app privacy design that can
be improved. Our profiling study shows what types of deep inferences can be made
from seemingly benign data accesses when performed over a longer period of time (4
weeks in this study) and quantifies this level of profiling potential in our Intrusiveness
Score.
3.2
Information Monitoring Frameworks
Several tools have been built to monitor the sensitive information collected by applications on the Android platform. While these frameworks provide a good foundation for elucidating the types of data collected by smartphone applications, their
use is currently limited to the identification of certain and possible undesirable transactions, rather than holistic assessments of an app's overall intrusiveness based on
aggregate analysis of real-time accessed data. Furthermore, most of these systems
[7, 13, 12, 11, 21, 6] fail to log contextual information about data accesses, an important factor in determining privacy intrusion.
Fuchs et al [11] developed a tool called ScanDroid that provides automated security certification of Android Applications. It performs incremental checks on apps
during install-time to detect possible security breaching behavior. It extracts security
specifications from manifests and checks whether data flows through those applications are consistent with those specifications.
ScanDroid is limited to install-time
checking, so it cannot detect leaks that are outside the scope of installation.
Enck et al [7] created TaintDroid as an information tracking tool for privacy-sensitive smartphone data. It uses dynamic taint analysis to track the flow of private data on a
smartphone and any leaks to third-party servers. Unlike ScanDroid, TaintDroid can
also detect leaks that occur outside install-time activity. Like AppWindow,
TaintDroid is a modification of the Android Open Source code and can be used to track the
information flow of existing Android apps. However, unlike AppWindow, TaintDroid
lacks contextual information for its taints (e.g., location, surrounding bluetooth device information, screen mode, current running application), which we use later to
generate the Privacy Fingerprints and calculate the Intrusiveness Scores for apps.
Egele et al [6] created a similar tool for iOS, called PiOS, which applies static
analysis on iOS apps to detect possible privacy leaks to third parties.
PiOS first
constructs the control flow graph of an iOS app and then determines whether there
exists an execution path from the nodes that access privacy sources to the nodes that perform
network operations. If such a path exists, PiOS deems there to be a potential privacy
leak. While this system can be used to identify potentially undesirable app behavior,
the leak is found based on static analysis, and not holistic analysis of real-time data
accesses, which our Privacy Fingerprint provides. Furthermore, like TaintDroid, PiOS
does not use contextual data to identify these privacy leaks, and the severity of these
leaks (most often device ID) may change according to the usage context and varying
levels of privacy sensitivity across users. In our study, these limitations are overcome
by the Privacy Fingerprint.
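The path check described above for PiOS can be illustrated with a small graph search. This is a minimal Python sketch over a hand-built control flow graph, not PiOS's actual implementation; the node names and the `has_potential_leak` helper are hypothetical.

```python
from collections import deque


def has_potential_leak(cfg, sources, sinks):
    """Return True if any sink node is reachable from any source node.

    cfg maps each node to the list of nodes it can transfer control to.
    """
    sinks = set(sinks)
    visited = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for successor in cfg.get(node, []):
            if successor not in visited:
                visited.add(successor)
                queue.append(successor)
    return False


# Toy control flow graph (hypothetical node names).
cfg = {
    "start": ["read_device_id", "show_menu"],
    "read_device_id": ["build_request"],
    "build_request": ["send_http"],
    "show_menu": ["exit"],
}

# A path exists from the privacy source to the network operation.
leak = has_potential_leak(cfg, sources=["read_device_id"], sinks=["send_http"])
# No path leads from the menu code to the network operation.
no_leak = has_potential_leak(cfg, sources=["show_menu"], sinks=["send_http"])
```

A search like this only says that a path exists in the code; it says nothing about how often, or under which usage context, the access actually happens at runtime.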
Not all monitoring frameworks lack context awareness, however. Researchers at
the MIT Media Lab developed Funf [16], an open-source extensible sensing and data
processing framework for mobile devices. Funf allows developers to use various sensor
probes to log information about context and sensory data related to data transactions. It also provides visualizations (e.g. heatmaps, frequency stats) on data usage
given the collected contextual information. Although Funf can be used to contextually monitor data accesses by apps, it is not compatible with existing Android apps.
Funf's focus is to provide developers with tools for context-aware data collection,
analysis, and visualization when they create new apps, instead of providing transparency mechanisms for smartphone users to evaluate the actual activities of existing
apps.
ConUCON [2] can also log contextual information for data accessed by apps, but
unlike Funf, it targets the user rather than the developer. ConUCON provides users
with fine-grained control of Android permissions and a well-defined policy enforcement
framework which uses contextual information to resolve user-specified policies relating to data protection and resource usage control during runtime. The framework
monitors accesses to resources, data, and files along with context types that are relevant to the user's specified policies (e.g. temporal, spatial, battery, signal strength,
acceleration, Bluetooth state, WiFi state, CPU utilization, and memory amount).
While ConUCON can be used as a context-aware information monitoring system for
existing Android apps, the purpose of its monitoring is to enhance user control over
application data, rather than elucidate the access activities of applications, which is
the focus of our study.
3.3
Using Transparency to Increase User Control
Many other studies like ConUCON have built monitoring systems to empower users,
affording them finer-grained control over the specific permissions required by each app
at runtime [19, 22, 12, 13]. Some of these systems [13, 3] alert the user to potentially
undesirable information leaks, giving them greater insight into what apps are doing
with sensitive data, while others provide an interface for users to define their own
policies that prevent the disclosure of certain types of sensitive data under specified
contexts [19, 22]. It is imperative to note that controlling disclosure is out of the scope
of our thesis, because we believe the majority of smartphone users are ill-equipped
to intelligently and meaningfully decide which data accesses they ought to revoke or
allow (they always click OK) [15], given their lack of comprehension of the existing
Android Permissions [10]. Also, revoking data collection may reduce and impair app
performance [13].
3.4
Improving Transparency Interfaces
Lin et al [17] propose a new privacy interface that uses privacy expectations to
model the privacy concerns of users. The study uses crowdsourcing techniques (on
179 Android users through Amazon Mechanical Turk) to capture users' expectations
of which types of sensitive data are requested by apps and then uses the crowdsourced
data to display the most unexpected accesses for a given app. By operating under a
model of privacy expectations, the interface aims to correct misconceptions of users in
order to allay privacy concerns in cases of uncertainty and misunderstanding, as well
as educate them about the real types of data disclosed by apps in order to aid them
in making better trust decisions. Like our study, Lin et al's work aims to provide
a quantifiable indicator for privacy risk by measuring user expectations of various
types of data accesses. They include the purpose of data collection (e.g., major app
functionality, minor supporting app functionality, targeted advertising) as a context
of their data accesses, whereas we capture usage contexts. Both contexts are relevant
to quantifying intrusiveness, and the two studies can be combined to enhance the
coverage and accuracy of the overall Intrusiveness Score.
3.5
Quantifying App Behavior
While many studies have qualitatively assessed the risk of apps by uncovering instances of unexpected data collections and detecting potentially malicious behavior
[8], very little has been done to quantify such risk for each app. One study [5] uses a
probabilistic framework to quantify usage behaviors unique to each user of a "bag" of
Android apps. This "bag-of-apps" model robustly represents the level of phone usage
over specific times of the day. Whereas this empirical approach was used to identify
the unique usage behaviors of users, we use it to quantify the data access patterns
unique to each application in order to create each app's Privacy Fingerprint. We then
distill the fingerprint data into a single-number Intrusiveness Score.
Lin et al [17] used crowdsourcing to capture users' expectations of which sensitive resources are used by various Android apps, as well as to ask users to guess for
which purpose(s) (e.g., major app functionality, minor supporting app functionality,
targeted advertising) the data are used. The study uses TaintDroid to track the destinations of sensitive data transfers in order to determine the likely purpose of certain
data accesses and compares this "ground truth" purpose to the purposes guessed by
users. Then, users were asked to rate their comfort level, from +2 (very comfortable)
to -2 (very uncomfortable), with a given data access and its purpose (e.g. location is used
by Toss it for targeted advertising). Researchers correlated the comfort level with the
level of users' expectation and correct identification of the purpose. Findings suggest
that both users' expectations and the purpose for which sensitive resources are used have
a major impact on users' subjective feelings and their trust decisions. We believe
this study is a valuable starting point in quantifying the sensitivity of certain data
accesses. Currently, we only account for the identifiability of the datatype and the
intrusiveness of various usage contexts. Adding the user's comfort level would make
the intrusiveness calculation customized to each user's privacy sensitivity. This factor
would add a further dimension to our Intrusiveness Score, and Lin's study provides an
excellent starting point for quantifying this subjective feeling. Another major finding
is that properly informing users of the purpose of resource access can ease their privacy concerns to some extent, particularly in the case where users expect app
behavior to be more intrusive than it actually is (e.g., expected behavior is targeted
advertising, but actual behavior is support for a minor feature of an app). This finding further motivates the need for our research and the development of an interface
that clearly and concisely conveys an app's privacy-relevant data accesses, in order
to aid users in more effectively assessing privacy risk as well as assuage unnecessarily
paranoid fears.
Chapter 4
AppWindow Architecture
In this chapter, we explain the implementation details behind AppWindow, an information monitoring framework we developed from existing Android Open Source code
to track all sensitive data requests made by Android apps. AppWindow is composed
of two major components:
1. Monitoring Framework
2. Logger Application
4.1
Monitoring Framework
The current version of AppWindow's Monitoring Framework is a light modification of
existing Android framework source code, version 2.3.4_r1 (Gingerbread). To monitor
the sensitive data accessed by apps, we placed additional logging code into public API
methods of various Android framework modules responsible for managing potentially
privacy sensitive information (see Figure 4-1).
In this thesis, we define privacy sensitive information (18 total) to include:
Table 4.1: List of Sensitive Datatypes logged by AppWindow

Sensitive Datatypes
sms                     sim serial number
audio                   imsi
photo                   msisdn
contact name            device id
contact email           wifi info (SSID, BSSID, RSSI, ip address, mac address)
contact phone number    cell location
browsing info           sensor accelerometer
voicemail               sensor rotation vector
gps location            bluetooth info (device address, name)

For more information about which Android classes and public API methods were
modified and the correspondence of specific data types to publicly accessible methods,
please see Section B of the Appendix.
We also created our own LoggingManager module as part of Android's existing
framework to manage the logging information captured for each data access. The
LoggingManager packages up the type and content of sensitive data accessed, along
with the requesting app's name into an Android Intent and sends it to the dedicated
Logger Application residing on the client device.
4.2
Logger Application
We create the Logger App as an Android application in order to store the sensitive
data accesses in a dedicated sqlite database on the smartphone. The Logger App logs
a transaction record for each privacy sensitive request made by any app installed on
the smartphone. This transaction record includes:
• name of requesting application
• datatype requested
• time of request
• contextual screenmode (on/off)
• contextual app status (foreground/background request)
• address and name of nearby bluetooth devices
• contextual location (longitude, latitude)
4.3
Use Case
[Figure 4-1 diagram: an Android smartphone containing the Android apps (Google Maps, Evernote, Foursquare, Cut the Rope), the Android Framework (LocationManager, WifiManager, TelephonyManager, BluetoothService, SensorManager, ContentResolver, LoggingManager), and the Logger with its Sqlite db; process numbers 1 through 5 mark the steps of the use case.]
Figure 4-1: AppWindow Architecture.
Components outlined in red represent structures that are either a modified version
of existing Android Source Code or newly created for AppWindow. The Monitoring
Framework is shown on the left side of the vertical line, and the Logger App is displayed on the right.
Figure 4-1 shows how the two components (Monitoring Framework and Logger
Application) fit together in the overall architecture of AppWindow. We now
use the diagram to walk through a use case with the sample app FriendHangout
(See Chapter 2) in order to more clearly convey how AppWindow monitors and logs
sensitive information requested by apps. Each step below corresponds to a process
number shown in the architecture diagram.
1. FriendHangout wants to access a user's location so that it can recommend
nearby hangout places. It calls the getLastKnownLocation() function available
through the Android API.
2. The LocationManager module, which is part of the existing Android Framework,
is responsible for returning the cached location to FriendHangout.
3. Our modified version of LocationManager passes the datatype (gps location) of
the accessed data to the LoggingManager.
4. The LoggingManager adds contextual information (time of access, name of requesting application) to the information received from LocationManager and
packages all the data into an Android Intent. It sends this intent to the Logger
App (written for AppWindow) installed on the smartphone. The Logger App
receives the intent and writes its contents, along with additional contextual
information (screenmode on/off, app in foreground or background, location,
bluetooth info) gathered by the Logger App itself, into a sqlite database.
5. The LocationManager returns the last known location to FriendHangout.
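The five steps above can be mimicked with a small simulation. The Python classes below are hypothetical stand-ins for the Android components (the real framework is written in Java, and an Android Intent is modeled here as a plain dictionary); the table layout is also an assumption, and the bluetooth and location context is left out for brevity. The sketch only shows the order in which information flows.

```python
import sqlite3
import time


class LoggerApp:
    """Stand-in for the Logger App: stores each access in a SQLite table."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE access_log ("
            " app_name TEXT, datatype TEXT, request_time REAL,"
            " screen_on INTEGER, in_foreground INTEGER)"
        )
        # Context the Logger App gathers by itself (fixed here for simplicity).
        self.screen_on = False
        self.foreground_app = None

    def on_receive(self, intent):
        in_foreground = self.foreground_app == intent["app_name"]
        self.db.execute(
            "INSERT INTO access_log VALUES (?, ?, ?, ?, ?)",
            (intent["app_name"], intent["datatype"], intent["time"],
             int(self.screen_on), int(in_foreground)),
        )
        self.db.commit()


class LoggingManager:
    """Stand-in for the LoggingManager module added to the framework."""

    def __init__(self, logger_app):
        self.logger_app = logger_app

    def log(self, app_name, datatype):
        # Step 4: add time and requester, package as an "intent", send it on.
        intent = {"app_name": app_name, "datatype": datatype, "time": time.time()}
        self.logger_app.on_receive(intent)


class LocationManager:
    """Stand-in for the modified LocationManager."""

    def __init__(self, logging_manager, cached_location):
        self.logging_manager = logging_manager
        self.cached_location = cached_location

    def get_last_known_location(self, calling_app):
        # Step 3: report the datatype of the access to the LoggingManager.
        self.logging_manager.log(calling_app, "gps location")
        # Step 5: return the cached location to the caller.
        return self.cached_location


logger_app = LoggerApp()
location_manager = LocationManager(LoggingManager(logger_app), (42.3601, -71.0942))

# Steps 1-2: the app asks the framework for the last known location.
location = location_manager.get_last_known_location("FriendHangout")
rows = logger_app.db.execute(
    "SELECT app_name, datatype, screen_on, in_foreground FROM access_log"
).fetchall()
```

The requesting app receives its location as usual, while the access is recorded as a side effect that the app never sees.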
In summary, AppWindow provides a "window" into the sensitive access behavior
of existing Android apps. It not only logs privacy sensitive requests made by apps,
but also records privacy-relevant context information corresponding to each data request, so that the intrusiveness of each access can be more accurately assessed (See
Chapter 6 for details on how this is done). Like TaintDroid, AppWindow serves as a
monitoring tool, but it does not figure out the destination of data transfers, just their
content and usage context. While the contextual information logged by AppWindow
can ultimately be used to resolve user-defined policies regarding an app's collection
of personal data [19, 22], the current prototype does not include a policy enforcement
module. We believe that once smartphone users are sufficiently and accurately informed of apps' sensitive data access behaviors and the privacy implications involved,
they will be better equipped to make smart and meaningful decisions about which
sensitive permissions to revoke or allow, making the use of policy enforcement add-ons
more relevant and useful.
Chapter 5
Intrusiveness Studies
In this chapter, we present the methodology and design decisions for two experimental
studies conducted to assess the intrusiveness of apps. The first study is the Qualitative
Intrusiveness Study, wherein we aim to 1) confirm the prevalence of unexpected,
intrusive data accesses by popular Android apps and 2) assess the long-term privacy
implications of these intrusive accesses by examining the inferences that can be made
about a user's behavioral profile. The second study is the Quantitative Intrusiveness
Study. For this experiment, we test about 40 popular Android apps across 4 different
app categories under pre-defined usage contexts and use the empirical data collected
to construct a framework for quantifying the intrusiveness of apps.
5.1
Qualitative Intrusiveness Study
We conducted the Qualitative Intrusiveness Study to confirm the prevalence of unexpected, intrusive data accesses and gain a preliminary glimpse into the powerful
inferences that can be made about a user's personal profile as a consequence of those
intrusive accesses. Results from the profiling analysis can be found in Chapter 7.
5.1.1
Experimental Design
We simulate this profiling exercise by tracking the data logged by AppWindow for
one user over the course of one month for each app installed on the smartphone.
AppWindow (See Section B.2 in Appendix for link to source code) was installed onto
a Google Nexus One smartphone and handed over to the user. The user was a female
college graduate student, aged 23, and familiar with Android devices. No specific app
installation requirements were imposed on the user. She used the Nexus One as her
only phone for using apps for one month, therefore simulating real usage conditions.
The user initially selected 15 apps chosen from the Google Play Store to install on
her phone based on personal needs, but some were later removed due to resource and
performance constraints. In Chapter 7, we walk through the profile inferences that
can be made about a smartphone user's hourly and daily movement, activity, and
interaction patterns, based on data collected for one of 15 apps installed by the user.
To keep the discussion simple yet potent, we walk through the profiling capabilities
of only one app - Google Maps. We focus on Google Maps because we believe that
its data accesses afford the most interesting analysis, as the app runs as a background
service and collects an enormous amount of data even absent direct interaction from
the smartphone user. Though not comprehensive, we hope that the discussion will
provide a good preliminary sense of the profiling capabilities of apps.
5.1.2
Profiling Simulation
We break down profiling capabilities into three main categories:
1. Movement profiling
2. Activity profiling
3. Interaction profiling
Movement Profiling
We define a movement profile to be a person's daily movements from place to
place. We create this profile in order to gain a better sense of a smartphone user's
movement habits and most frequented locations.
Movement profiling is achieved by performing analysis on the location data (either
gps or wifi) gathered by AppWindow. We use the longitude and latitude provided by
gps location accesses to pinpoint the smartphone user's precise location. Wifi offers
less precision, but we still manage to approximate the user's position based on the
SSID, RSSI, and MAC address packaged in each wifi info request. Using these fine and
coarse location data, we create an animated map of movement patterns for each of the
8 most used applications, showing the user's locations during each hour of the day,
aggregated and averaged over one month. We also use the location data to identify
location hotspots and use the time of day to predict the location type (e.g., home,
work, leisure). This information is ultimately used to provide educated guesses about
a user's location at a given time of the day, as well as day of the week (e.g., weekday
vs. weekend).
Overall, this profiling exercise demonstrates that apps can infer an
incredible amount of information about a user's movement patterns and frequented
places, allowing them to peer into many personal aspects of their users' lives.
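The hotspot and location-type steps described above can be sketched roughly as follows. This is only an illustration, not AppWindow's actual analysis: the record layout `(hour, latitude, longitude)`, the grid size, and the hour ranges used to guess a location type are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def grid_cell(lat, lon, cell_deg=0.001):
    """Snap a coordinate to a grid cell (cell size is an assumed parameter)."""
    return (round(lat / cell_deg), round(lon / cell_deg))

def guess_type(hour_counts):
    """Guess a location type from the hours a cell is occupied (assumed ranges)."""
    total = sum(hour_counts.values())
    night = sum(c for h, c in hour_counts.items() if h >= 21 or h < 8)
    day = sum(c for h, c in hour_counts.items() if 9 <= h < 18)
    if night / total > 0.5:
        return "home"
    if day / total > 0.5:
        return "work"
    return "leisure"

def find_hotspots(records, top_n=3):
    """records: iterable of (hour_of_day, lat, lon) tuples (hypothetical layout).

    Returns the top_n most visited cells as (cell, visit_count, guessed_type).
    """
    visits = Counter()
    hours = defaultdict(Counter)
    for hour, lat, lon in records:
        cell = grid_cell(lat, lon)
        visits[cell] += 1
        hours[cell][hour] += 1
    return [(cell, count, guess_type(hours[cell]))
            for cell, count in visits.most_common(top_n)]
```

A cell occupied mostly at night is labeled "home" and one occupied mostly during the day is labeled "work"; everything else falls back to "leisure". The thresholds are arbitrary and would need tuning against real logs.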
Activity Profiling
An activity profile refers to a person's active and dormant patterns over the course
of a day and week. We create an activity profile in order to capture a user's level of
activity over the course of a day and week, coupling patterns of activity levels with
contextual location data to estimate typical wake times, sleep times, work times, and
leisure times.
Activity profiling is accomplished by plotting the number of data accesses at different times of the day (segmenting each hour of the 24-hour day into time buckets
- morning, afternoon, evening, night). If location or approximate location (inferred
from wifi information) was logged, an app can estimate the wake-up, work, sleep,
and leisure times each day based on the location type (e.g., home, work, other).
The app can then aggregate the daily logs over each week to estimate the patterns of
activity for a given day of the week (e.g., different patterns for weekday vs. weekend).
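A minimal sketch of the bucketing step is given below. The text names the four time buckets but not their boundaries, so the hour ranges here are assumptions, as is the input format (a list of access timestamps).

```python
from collections import Counter
from datetime import datetime

# Assumed bucket boundaries (start hour inclusive, end hour exclusive).
BUCKETS = [(6, 12, "morning"), (12, 18, "afternoon"),
           (18, 24, "evening"), (0, 6, "night")]

def time_bucket(hour):
    """Map an hour of the day (0-23) to one of the four time buckets."""
    for start, end, name in BUCKETS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")

def activity_profile(timestamps):
    """Count data accesses per (weekday or weekend, time bucket) pair."""
    profile = Counter()
    for ts in timestamps:
        day_kind = "weekend" if ts.weekday() >= 5 else "weekday"
        profile[(day_kind, time_bucket(ts.hour))] += 1
    return profile
```

Buckets with few or no accesses would then be read as dormant periods, and buckets with many accesses as active periods.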
Interaction Profiling
An interaction profile refers to a person's interaction patterns. We create an interaction profile to get a better sense of a user's acquaintances and likely times of
interaction, identifying acquaintances based on the physical proximity of their bluetooth devices to the smartphone user.
Specifically, we use contextual bluetooth information to determine which devices
(often named with the owner's first and last name), and thus which people, are near the smartphone user, using contextual location and time to guess their relationship to the user
(e.g., Device A, owned by Alice Smith, is often detected when the user is at a location
determined to be her workplace. Thus, we infer that Alice is a work acquaintance).
Although bluetooth data was not gathered by any of the apps that the user installed
on her phone for the user study, we present the profiling potential of bluetooth data
as an aside to demonstrate the inference potential of apps that do use bluetooth data.
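The inference in the Alice Smith example can be sketched as a simple co-occurrence count. The input layout (pairs of device name and already-inferred location type) and the minimum-sightings threshold are assumptions of this sketch, not part of the study.

```python
from collections import Counter, defaultdict

def infer_relationships(sightings, min_sightings=3):
    """sightings: iterable of (device_name, location_type) pairs (hypothetical layout).

    Returns {device_name: most common location type} for devices that were
    seen at least min_sightings times.
    """
    by_device = defaultdict(Counter)
    for device, location_type in sightings:
        by_device[device][location_type] += 1
    relationships = {}
    for device, places in by_device.items():
        if sum(places.values()) >= min_sightings:
            relationships[device] = places.most_common(1)[0][0]
    return relationships
```

A device that is mostly detected at the location labeled "work" would be mapped to "work", suggesting a work acquaintance.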
5.2
Quantitative Intrusiveness Study
Whereas the Qualitative Intrusiveness Study provides a more visual and holistic
overview of the various profiling capabilities of smartphone apps, the Quantitative
Intrusiveness Study focuses on breaking down this profiling potential into measurable
units, so that the intrusiveness of different apps can be directly and meaningfully
compared. We report the intrusiveness scores and data access patterns in Chapter 8.
5.2.1
Experimental Design
Herein we study the data access patterns of 8-10 apps in the top 20 apps listed for
each of 4 Android app categories: Games (10 apps), Messaging (8 apps), Photography
(9 apps), and Social (9 apps), for a total of 36 apps.
In deciding which apps to test, we chose to focus on depth (testing more apps
within a category) rather than breadth (testing more categories), so that the Intrusiveness Scores of similar apps (clustered by app category) can be compared. We
assigned two research assistants familiar with the study to test the 36 apps under 4
predefined privacy-relevant usage contexts:
1. User idle, user moving
2. User idle, user not moving
3. User active, user moving
4. User active, user not moving
Reasons for using these conditions are explained in Chapter 6. For the "user idle"
usage contexts (1, 2), the experimenter clicks on the app so that it appears at the
foreground and then turns the screen off, representing the idle state. For the "user
active" usage contexts (3, 4), the experimenter explores all features that could activate the permissions listed for the app as they appear on the app's Google Play Store
page. For the "moving" usage contexts (1, 3), we require the experimenters to move
around while using the app; conversely, they must stay in one location for tests
that are conducted under the "not moving" contexts (2, 4).
Time Constraints
For each app, experimenters spent a minimum of 5 minutes testing out features
(some took longer than others, e.g., Games apps) and a minimum of 10 minutes testing
idle times. We extend the time period for idle testing based on the assumption that
apps make sparser requests when the user is not using the app in order to save battery
life.
Chapter 6
Quantifying Intrusiveness
We use the empirical data collected from the two experimental studies to develop a
framework for quantifying intrusiveness. In this chapter, we discuss the details of our
approach, including the methodology for generating the visual Privacy Fingerprint
and the computational steps involved in calculating the Intrusiveness Score. At the
end of the chapter, we discuss how these two metrics complement each other in helping
to assess the overall and relative intrusiveness of apps.
6.1
Intrusiveness Factors
We aim to measure the intrusiveness of each app by accounting for two main factors
that contribute to the intrusiveness of each data access. These intrusiveness factors
are:
1. Usage factors
2. Datatype identifiability
6.1.1
Usage Factors
Based on the results from the Qualitative Intrusiveness Study, we hypothesized that
an app's data accesses are triggered by certain usage patterns. We identify two
important usage factors below:
1. User's activity status for the requesting app (active or idle)
2. User's movement status at the time of the data request (moving/not moving)
These usage factors are relevant to calculating intrusiveness because accesses triggered
under certain usage conditions may have varying levels of intrusiveness. For example,
if an app is accessing a user's location while her activity with respect to the app is
idle (usage factor 1), then we deem that to be a highly intrusive data access. This
is because accesses occurring during a user's idle period are most likely unexpected.
Additionally, if an app's request for location data is triggered every time the user
moves (usage factor 2), the app gets more information about her movement patterns
than if the access was only triggered based on user activation of a certain location
feature.
6.1.2
Datatype Identifiability
Datatype identifiability quantifies how easily a particular type of data can be used
to determine a person's identity. This concept is related but not equivalent to the
concept of Personally Identifiable Information (PII), an official term coined by the
U.S. National Institute of Standards and Technology to refer to:
any information about an individual maintained by an agency, including
(1) any information that can be used to distinguish or trace an individual's
identity, such as name, social security number, date and place of birth,
mother's maiden name, or biometric records; and (2) any other information
that is linked or linkable to an individual, such as medical, educational,
financial, and employment information. [18]
Whereas PII only includes data that are directly linkable to personal identity, datatype
identifiability, as we define it, is a characteristic of any type of sensitive data, including
those that cannot directly be tied to personal identity. Thus, datatype identifiability
casts a broader net than PII for sensitive data.
For example, a phone number can easily be tied to one specific individual and
is considered an example of PII. In contrast, a single gps location is not useful in
ascertaining the user's identity, only her location at a point in time. Thus, gps is not
considered PII. However, we argue that over time, gps location can tell a story about
a user's movement patterns and provide telling clues (e.g., locations that are occupied
for long periods of time during the morning and evening are likely to be places of
residence) that strongly point to a single person's identity.
Thus, though a single piece of gps data would not be useful in identifying a user, an
aggregation of data collected over a longer period of time would aid in identification.
However, because a phone number is more easily tied to a person's identity, it would
rank higher in datatype identifiability than a singular gps location. In general, PII
have higher datatype identifiability than sensitive data that are not PII.
6.2
Formulating the Privacy Fingerprint
The Privacy Fingerprint serves as a visual complement to the numerical Intrusiveness
Score, providing an explanatory aid for understanding which intrusiveness factors contribute to the final single-number score. Specifically, the Privacy Fingerprint provides
a compact yet holistic visualization of the data access patterns unique to each app.
Below, we first explain the high-level approach for generating the Privacy Fingerprint
visualization and then elaborate on each step involved.
6.2.1
High-level Approach
First, we generate a list of contexts for data accesses based on the 2 intrusiveness
factors (usage factors, datatype identifiability) described in Section 6.1.
Second, we assign an intrusiveness weight to each access context based on several
heuristics and rank the list of contexts by increasing intrusiveness. Third,
we compute the frequency distribution of the relevant access contexts for the app.
Fourth, we create a visual representation of an app's data access patterns, placing
the ranked contexts on the vertical axis and the percent contribution as bars (or
fingerprint bands) on the horizontal axis.
6.2.2
Step 1: Generating Access Contexts
The list of access contexts is generated by computing all possible combinations of
usage factors and datatypes. We call each of these contributing factors context
factors, which we list below:
• Datatypes and their corresponding data identifiability weights (19 values)
• User's app activity status (2 values)
• User's movement status (2 values)
Combining each of the factors, we get a total of 76 unique contexts (19 x 2 x 2).
Some of these contexts are trivial, so we remove them from consideration. Specifically, datatypes that are not location-aware (sms, contact, phone number, email,
audio, photo, device id, sim serial number, msisdn, voicemail, browsing, imsi, sensor accelerometer, sensor rotation) will never be triggered due to a user's movement.
There are 28 of these access contexts (14 datatypes x 1 moving status x 2 app activity
statuses). We remove these from the context list, leaving 48 remaining contexts (See
Table 6.1 for a list of these contexts along with their intrusiveness weights).
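The counting argument above (76 combinations, 28 removed, 48 kept) can be checked with a short sketch. The datatype names and identifiability values follow Table 6.1; the variable and function names are illustrative only.

```python
from itertools import product

# Identifiability values as listed in Table 6.1.
LOCATION_AWARE = {"gps location": 5, "wifi": 4, "cell location": 3,
                  "neighboring cell": 2, "ip address": 2}
NOT_LOCATION_AWARE = {"sms": 10, "audio": 10, "photo": 10, "contact": 10,
                      "email": 10, "phone number": 10, "browsing": 9,
                      "voicemail": 9, "device id": 5, "sim serial": 5,
                      "imsi": 5, "msisdn": 5,
                      "sensor accelerometer": 1, "sensor rotation": 1}
DATATYPES = {**LOCATION_AWARE, **NOT_LOCATION_AWARE}

def generate_contexts():
    """Return (all combinations, combinations kept after pruning)."""
    # Each context is (datatype, app activity status, moving?).
    all_contexts = list(product(DATATYPES, ("idle", "active"), (True, False)))
    # A "moving" context is kept only for location-aware datatypes.
    kept = [(d, activity, moving) for d, activity, moving in all_contexts
            if d in LOCATION_AWARE or not moving]
    return all_contexts, kept

all_contexts, kept = generate_contexts()
print(len(all_contexts), len(all_contexts) - len(kept), len(kept))
```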
6.2.3
Step 2: Assigning Intrusiveness Weights to Each Context
We assign intrusiveness weights by ranking the contexts in terms of increasing intrusiveness. To determine a relative ranking of intrusiveness, we rank the contexts by
sorting the list of 48 contexts on each of the 3 context factors shown in Step 1, in the
following order:
1. User's app activity status (active before idle)
2. Datatype identifiability tied to each datatype (in ascending order)
3. User's movement status (not moving before moving)
First, we sort the list by the user's app activity status (idle or active), because we
believe this context factor to be the most relevant to intrusiveness. Second, we sort
the list on the datatype identifiability associated with each datatype, capturing the
sensitivity of that particular datatype access. Third, we sort on the movement
status, because we believe that factor to be the least relevant to intrusiveness out of all the
context factors. Using this sorting process, we are able to generate a newly ranked
list of contexts, assigning an intrusiveness weight of 1 to the least intrusive context
(at the bottom of the list) and incrementing by one every time we encounter a new
combination of context factors as we move up the list. The final context weighting is
shown in Table 6.1. Note that certain contexts are weighted the same because they
have the same data identifiability, movement status, and app activity values.
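The ranking procedure can be sketched as a three-key sort followed by a dense ranking, in which contexts with identical factor values share a weight. This is an illustrative reading of the steps above rather than the code used in the study; the datatype list follows Table 6.1 and the function name is hypothetical.

```python
# (datatype, identifiability, location_aware) following Table 6.1.
DATATYPES = [
    ("sms", 10, False), ("audio", 10, False), ("photo", 10, False),
    ("contact", 10, False), ("email", 10, False), ("phone number", 10, False),
    ("browsing", 9, False), ("voicemail", 9, False),
    ("gps location", 5, True), ("device id", 5, False),
    ("sim serial", 5, False), ("imsi", 5, False), ("msisdn", 5, False),
    ("wifi", 4, True), ("cell location", 3, True),
    ("neighboring cell", 2, True), ("ip address", 2, True),
    ("sensor accelerometer", 1, False), ("sensor rotation", 1, False),
]

def ranked_contexts():
    """Build the 48 contexts and assign dense-rank intrusiveness weights."""
    contexts = []
    for name, ident, location_aware in DATATYPES:
        for activity in ("active", "idle"):
            for moving in ((False, True) if location_aware else (False,)):
                contexts.append({"datatype": name, "identifiability": ident,
                                 "activity": activity, "moving": moving})

    def key(c):
        # Active before idle, identifiability ascending, not moving before moving.
        return (c["activity"] == "idle", c["identifiability"], c["moving"])

    contexts.sort(key=key)
    weight, previous = 0, None
    for c in contexts:
        if key(c) != previous:  # new combination of context factors
            weight += 1
            previous = key(c)
        c["weight"] = weight
    return contexts
```

Under these assumptions the least intrusive contexts (sensor accesses while the user is active) receive weight 1 and the most intrusive (identifiability 10 while idle) receive weight 22.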
Table 6.1: Intrusiveness Weights for Sensitive Access Contexts

Context ID  Datatype              Identifiability  Moving  User Activity  Context Weight
1           sms                   10               no      idle           22
2           audio                 10               no      idle           22
3           photo                 10               no      idle           22
4           contact               10               no      idle           22
5           email                 10               no      idle           22
6           phone number          10               no      idle           22
7           browsing              9                no      idle           21
8           voicemail             9                no      idle           21
9           gps location          5                yes     idle           20
10          gps location          5                no      idle           19
11          device id             5                no      idle           19
12          sim serial            5                no      idle           19
13          imsi                  5                no      idle           19
14          msisdn                5                no      idle           19
15          wifi                  4                yes     idle           18
16          wifi                  4                no      idle           17
17          cell location         3                yes     idle           16
18          cell location         3                no      idle           15
19          neighboring cell      2                yes     idle           14
20          ip address            2                yes     idle           14
21          neighboring cell      2                no      idle           13
22          ip address            2                no      idle           13
23          sensor accelerometer  1                no      idle           12
24          sensor rotation       1                no      idle           12
25          sms                   10               no      active         11
26          audio                 10               no      active         11
27          photo                 10               no      active         11
28          contact               10               no      active         11
29          email                 10               no      active         11
30          phone number          10               no      active         11
31          browsing              9                no      active         10
32          voicemail             9                no      active         10
33          gps location          5                yes     active         9
34          gps location          5                no      active         8
35          device id             5                no      active         8
36          sim serial            5                no      active         8
37          imsi                  5                no      active         8
38          msisdn                5                no      active         8
39          wifi                  4                yes     active         7
40          wifi                  4                no      active         6
41          cell location         3                yes     active         5
42          cell location         3                no      active         4
43          neighboring cell      2                yes     active         3
44          ip address            2                yes     active         3
45          neighboring cell      2                no      active         2
46          ip address            2                no      active         2
47          sensor accelerometer  1                no      active         1
48          sensor rotation       1                no      active         1
6.2.4
Step 3: Computing the Frequency Distribution of Access Contexts
To calculate the frequency distribution of the access contexts for a given app, we first
assign each data access to 1 of the 48 available context buckets. We then compute
the proportional frequency $p_i$ of each context bucket $i$ by taking the number of data
accesses $f_i$ falling into that particular context bucket $i$ and dividing by the total
number of data accesses conducted by the app, $\sum_j f_j$. In other words:

$$p_i = \frac{f_i}{\sum_j f_j}$$

The distribution of access contexts for a sample app (Badoo) is shown in Table 6.2.
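As a small illustration of this step, the sketch below computes the proportional frequency of each context bucket from a log of context IDs. The input format (one context ID between 1 and 48 per logged access) is an assumption of the example.

```python
from collections import Counter

def context_distribution(access_log, num_contexts=48):
    """Return [p_1, ..., p_k], where p_i = f_i / sum_j f_j.

    access_log: iterable of context IDs (1..num_contexts), one per data access.
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_contexts
    return [counts.get(cid, 0) / total for cid in range(1, num_contexts + 1)]
```

For example, a log with 3 accesses in context 16, 5 in context 40, and 2 in context 13 yields proportions of 0.3, 0.5, and 0.2 for those contexts and 0 elsewhere.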
Table 6.2: Distribution of access contexts for Badoo

Context ID  Datatype              Identifiability  Moving  User Activity  Context Weight  p
1           sms                   10               no      idle           22              0.00
2           audio                 10               no      idle           22              0.00
3           photo                 10               no      idle           22              0.00
4           contact               10               no      idle           22              0.00
5           email                 10               no      idle           22              0.00
6           phone number          10               no      idle           22              0.00
7           browsing              9                no      idle           21              0.00
8           voicemail             9                no      idle           21              0.00
9           gps location          5                yes     idle           20              0.00
10          gps location          5                no      idle           19              0.00
11          device id             5                no      idle           19              0.02
12          sim serial            5                no      idle           19              0.00
13          imsi                  5                no      idle           19              0.18
14          msisdn                5                no      idle           19              0.00
15          wifi                  4                yes     idle           18              0.01
16          wifi                  4                no      idle           17              0.30
17          cell location         3                yes     idle           16              0.15
18          cell location         3                no      idle           15              0.17
19          neighboring cell      2                yes     idle           14              0.00
20          ip address            2                yes     idle           14              0.00
21          neighboring cell      2                no      idle           13              0.16
22          ip address            2                no      idle           13              0.00
23          sensor accelerometer  1                no      idle           12              0.00
24          sensor rotation       1                no      idle           12              0.00
25          sms                   10               no      active         11              0.00
26          audio                 10               no      active         11              0.00
27          photo                 10               no      active         11              0.00
28          contact               10               no      active         11              0.00
29          email                 10               no      active         11              0.00
30          phone number          10               no      active         11              0.00
31          browsing              9                no      active         10              0.00
32          voicemail             9                no      active         10              0.00
33          gps location          5                yes     active         9               0.04
34          gps location          5                no      active         8               0.10
35          device id             5                no      active         8               0.09
36          sim serial            5                no      active         8               0.00
37          imsi                  5                no      active         8               0.07
38          msisdn                5                no      active         8               0.00
39          wifi                  4                yes     active         7               0.00
40          wifi                  4                no      active         6               0.53
41          cell location         3                yes     active         5               0.01
42          cell location         3                no      active         4               0.10
43          neighboring cell      2                yes     active         3               0.00
44          ip address            2                yes     active         3               0.00
45          neighboring cell      2                no      active         2               0.06
46          ip address            2                no      active         2               0.00
47          sensor accelerometer  1                no      active         1               0.00
48          sensor rotation       1                no      active         1               0.00
6.2.5
Step 4: Visualizing the Privacy Fingerprint
We now use the ranked list of contexts and percentage contributions to generate the
Privacy Fingerprint, a visualization that captures the data access patterns unique to
each app. The visual is essentially a bar graph that plots the ranked contexts on
the vertical axis by increasing intrusiveness and the percent contribution as bars (or
fingerprint bands) on the horizontal axis.
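A rough text-only rendering of such a bar graph is sketched below, as a stand-in for the actual plotting code, which is not described here. The row layout `(label, weight, proportion)`, the bar width, and the choice to list the most intrusive context first are all assumptions of this sketch.

```python
def render_fingerprint(rows, width=40):
    """rows: list of (label, context_weight, proportion) tuples (hypothetical layout).

    Returns one text line per accessed context, most intrusive first, with a
    band whose length is proportional to the share of data accesses.
    """
    lines = []
    for label, weight, p in sorted(rows, key=lambda r: r[1], reverse=True):
        if p > 0:  # contexts with no accesses produce no band
            band = "#" * max(1, round(p * width))
            lines.append(f"{label:<32} |{band} {p:.2f}")
    return lines

# Three idle contexts with proportions taken from Table 6.2.
for line in render_fingerprint([
    ("wifi (not moving)", 17, 0.30),
    ("imsi", 19, 0.18),
    ("sms", 22, 0.00),
]):
    print(line)
```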
Below, we show the Privacy Fingerprint generated for Badoo based on the data
shown in Tables 6.1 and 6.2.
[Figure: Privacy Fingerprint bar chart for Badoo, with an idle panel (49%) and an active panel (51%); vertical axis: ranked access contexts; horizontal axis: proportion of data accesses]
Figure 6-1: Privacy Fingerprint for Badoo - Meet New People
6.3
Computing the Intrusiveness Score
The Intrusiveness Score quantifies the intrusiveness of an app's sensitive data accesses.
We calculate the Intrusiveness Score based on the sum of weighted contexts (48 contexts total) used to generate the Privacy Fingerprint, normalized by the proportional
frequency of idle data accesses and active data accesses.
Below we provide the formula for calculating intrusiveness:
6.3.1
Intrusiveness Score Formula
$$x = i \sum_{n=1}^{k/2} w_n \, p_n + a \sum_{n=25}^{k} w_n \, p_n \qquad (6.1)$$

Variables
$x$ = an app's overall intrusiveness
$i$ = % of all data accesses that were conducted during the user's idle period
$a$ = % of all data accesses that were conducted during the user's active period
$k$ = total number of access contexts (in this study, $k = 48$)
$w_n$ = intrusiveness weight assigned to the nth context
$p_n$ = the fraction of an app's total data accesses belonging to the nth access context
In words, we first assign an intrusiveness weight to each access context using its
corresponding context weight. We then compute the contribution of each access context to the app's overall Intrusiveness Score by multiplying the context intrusiveness
weight $w_n$ by the proportion frequency $p_n$ of that particular access context. This
decimal value is generated through short-term empirical testing of the app. We then
compute the aggregate intrusiveness of all access contexts falling under the user's
idle period, $\sum_{n=1}^{24} w_n p_n$. We call this the Idle Intrusiveness Subscore. We do the
same for the intrusiveness of all access contexts falling under the user's active period,
$\sum_{n=25}^{48} w_n p_n$, which we call the Active Intrusiveness Subscore. We normalize the Idle
Intrusiveness Subscore by the proportion frequency of all idle accesses (determined
empirically), and we normalize the Active Intrusiveness Subscore by the proportion
frequency of all active accesses (determined empirically). Finally, we add these two
normalized subscores to arrive at the overall Intrusiveness Score for the app.
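The computation can be sketched directly from Equation 6.1. The weights below are copied from Table 6.1 and the nonzero p values from Table 6.2, used exactly as printed; the idle and active shares (0.49 and 0.51) are read off the labels of Figure 6-1. The function name and return layout are illustrative, and the numbers printed are simply this sketch's arithmetic, not scores quoted from Chapter 8.

```python
# Context weights w_1..w_48, copied from Table 6.1.
WEIGHTS = ([22] * 6 + [21] * 2 + [20] + [19] * 5
           + [18, 17, 16, 15, 14, 14, 13, 13, 12, 12]
           + [11] * 6 + [10] * 2 + [9] + [8] * 5
           + [7, 6, 5, 4, 3, 3, 2, 2, 1, 1])

def intrusiveness_score(p, idle_share, active_share, weights=WEIGHTS):
    """Evaluate Equation 6.1.

    p: {context ID: p_n}, with missing IDs treated as 0.
    Returns (overall score, idle subscore, active subscore).
    """
    k = len(weights)
    half = k // 2
    idle_sub = sum(weights[n] * p.get(n + 1, 0.0) for n in range(half))
    active_sub = sum(weights[n] * p.get(n + 1, 0.0) for n in range(half, k))
    return idle_share * idle_sub + active_share * active_sub, idle_sub, active_sub

# Nonzero p_n values for Badoo from Table 6.2, keyed by context ID.
BADOO_P = {11: 0.02, 13: 0.18, 15: 0.01, 16: 0.30, 17: 0.15, 18: 0.17, 21: 0.16,
           33: 0.04, 34: 0.10, 35: 0.09, 37: 0.07, 40: 0.53, 41: 0.01, 42: 0.10,
           45: 0.06}

score, idle_sub, active_sub = intrusiveness_score(BADOO_P, 0.49, 0.51)
print(round(idle_sub, 2), round(active_sub, 2), round(score, 2))
```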
6.3.2
Intrusiveness Subscores
While a single Intrusiveness Score is more concise and easier to interpret for smartphone users, we break down this score into 2 subscores - Idle Intrusiveness Subscore
and Active Intrusiveness Subscore - to assess the impact made by the user's app
activity on the relative intrusiveness ranking of apps. Specifically, we use these subscores to assess how an app's level of idle access activity, which has been overlooked in
previous transparency studies (e.g., TaintDroid, AppFence, AppScanner), influences
the relative intrusiveness ranking of apps within the same category. To quantify this
impact on ranking, we first generate a relative ranking of apps based on the Active
Intrusiveness Subscore alone. We then generate a relative ranking based on overall
intrusiveness, which factors in an app's level of idle access activity. The impact of
relative intrusiveness is reflected in an app's subsequent fluctuation in ranking. We
report the findings of such impact in Chapter 8.
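The ranking comparison can be sketched as follows. The app names and scores in the usage example are made up purely for illustration, and the convention that rank 1 means most intrusive is an assumption of the sketch.

```python
def rank(scores):
    """Map app name -> rank, where rank 1 is the most intrusive app."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {app: position for position, app in enumerate(ordered, start=1)}

def ranking_shift(active_subscores, overall_scores):
    """Rank change per app once idle accesses are factored in.

    A positive value means the app moved toward 'more intrusive'.
    """
    active_rank = rank(active_subscores)
    overall_rank = rank(overall_scores)
    return {app: active_rank[app] - overall_rank[app] for app in active_rank}

# Hypothetical apps and scores, for illustration only.
active = {"AppA": 6.0, "AppB": 5.0, "AppC": 4.0}
overall = {"AppA": 7.0, "AppB": 11.0, "AppC": 5.0}
print(ranking_shift(active, overall))
```

In this made-up example, AppB overtakes AppA once idle accesses are included, which is the kind of fluctuation in ranking the subscores are meant to expose.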
6.4
Discussion
The Privacy Fingerprint and Intrusiveness Score are complementary metrics that help
to assess and compare the intrusiveness among apps.
Used alone, the Privacy Fingerprint provides a holistic visual of the breakdown of
the data access patterns that are unique to each app. By glancing at an app's Privacy
Fingerprint, a smartphone user can quickly see which types of sensitive data are
collected by the app, under which privacy-relevant usage contexts they are accessed,
and the distribution of those accesses. The placement of the fingerprint bands, each
representing a different type of data access (differing either in type of data collected or
condition under which they are collected), also provides a quick sense of the relative
intrusiveness of different types of data accesses, so that an app with longer bands
nearing the bottom of the fingerprint, signifying accesses of higher intrusiveness, is
suggested to be more intrusive than an app with shorter bands nearing the top of the
fingerprint, indicating accesses of lower intrusiveness.
However, when apps have more similar types of data access patterns (e.g., apps
within the same category), their relative intrusiveness is harder to gauge from the
Privacy Fingerprint alone. This is where the Intrusiveness Score offers additional
insight. As a single numerical score, it provides a clear-cut comparison of the intrusiveness of apps, so that smartphone users can get an immediate sense of which apps,
out of those under consideration, are the most and least intrusive.
To compare further, the Privacy Fingerprint and Intrusiveness Score are both generated from the same set of sensitive accesses and intrusiveness factors, thus reflecting
the same level of intrusiveness for a given app. However, due to the difference in presentation, the two metrics aid the intrusiveness assessment process in different ways.
The Privacy Fingerprint displays more information about the data access behavior of
an app through the fingerprint visual, whereas the Intrusiveness Score captures the
intrusiveness of that access behavior in a single number. Thus, for smartphone users
wishing to make quick judgments based on relative intrusiveness alone, the Intrusiveness Score can help them arrive at a fast decision about which app to install, without
needing to understand the types of data accesses that are involved. For users who
are more curious and selective about which types of data are accessed by apps, the
Intrusiveness Score is insufficient, because the supporting rationale for that score is
hidden in a black box. For these users, the Privacy Fingerprint offers an explanatory aid to the Intrusiveness Score, as well as details about the app's data accesses
which users can use to arrive at their own decisions about intrusiveness. For example, PlacesForYou has a higher Intrusiveness Score than FriendHangout (See Section
2.1 in the Motivating Scenarios Chapter) mainly because the percentage of idle accesses is higher. However, FriendHangout accesses more types of sensitive data than
PlacesForYou. Even though our score deems PlacesForYou to be more intrusive than
FriendHangout, a user may come to the opposite conclusion, being more concerned
with the breadth of datatypes accessed than the proportion of accesses that are made
while the user is idle. Thus, in addition to providing more transparency and
understanding about an app's data accesses, the Privacy Fingerprint also allows users
to use that data access behavior to customize their own intrusiveness calculations,
reflecting their own intuitions about intrusiveness. Current transparency mechanisms
(Android Permissions List, TaintDroid) provide information about the types of sensitive data collected but do not summarize the implications of those accesses so that
users can understand the meaning and impact of those accesses. We provide meaning
to the identification of sensitive data accesses by measuring their intrusiveness under
varying privacy-relevant usage contexts.
Together, the Privacy Fingerprint and Intrusiveness Score facilitate better understanding of the sensitive data access behavior of mobile apps, aiding smartphone users
in making more informed and efficient assessments about the relative intrusiveness of
apps when they are choosing which apps to install.
Chapter 7
Results for Qualitative
Intrusiveness Study
In this chapter, we report the preliminary results obtained from the Qualitative Intrusiveness Study. First, we confirm our hypothesis about the prevalence of unexpected
accesses by reporting the level of idle access behavior for 9 apps installed on the
user's smartphone. Next, we gauge the intrusiveness of these unexpected accesses by
performing a profiling simulation on the access patterns of the app with the highest
level of idle access behavior - Google Maps, walking through the possible inferences
that can be made about the user's movement, activity, and interaction profiles.
7.1
Unexpected Accesses
In this section, we confirm our initial suspicion about the prevalence of unexpected
access activity performed by 9 popular apps tested over a period of 4 weeks by one
user in the Qualitative Intrusiveness Study.
Table 7.1: Percentage of Idle Access Activity

App                 % Idle
Google Maps         93.30%
Antivirus           84.40%
Facebook            59.15%
Scramble Free       38.14%
Foursquare          30.42%
Cut the Rope Free   3.90%
Yelp                3.39%
Evernote            3.37%
Google Voice        0.00%
As shown in Table 7.1, more than half of the user's apps demonstrated significant
idle access activity, with 5 out of the 9 apps making at least 30% of their sensitive
data accesses while the user was idle. This behavior is most pronounced for Google Maps, which performed a
whopping 93.30% of all its accesses without interaction from the user. Antivirus follows
behind, with an idle access percentage of 84.40%. It makes sense that Antivirus may
want to perform many sensitive data accesses, because it must monitor the activity
of other apps to detect malicious behavior. However, continuous access behavior may
not be necessary for Google Maps. Although app developers may argue that continuous pinging of the user's location allows Maps to quickly provide an up-to-date
location once the user actually decides to use the app for navigation purposes, some
smartphone users may find this performance enhancement excessive and overly intrusive. Yelp, another app that relies on accurate and up-to-date locations to provide
its core functionality of recommending nearby businesses, makes fewer idle accesses
(3.39%) and is still highly rated in the Google Play Store. Yelp gets an overall rating
of 4.3, slightly lower than Google Maps' 4.4. Though Yelp makes drastically fewer
location update requests while the user is idle, any possible performance deterioration
is not reflected significantly in its user ratings. This suggests that Google could make
fewer location update requests without significantly undermining usability.
7.2
Profiling Simulations
In this section, we aim to qualitatively assess the profiling potential of Google Maps
by performing a series of profiling simulations on the user's movement, activity, and
interaction patterns to get a better sense of just how "intrusive" unexpected accesses
can be.
7.2.1
Movement Profiling
[Figure: heatmap titled "Locations accessed by Google Maps over 4 weeks"; color key runs from densest to sparsest]
Figure 7-1: Heatmap of locations accessed by Google Maps over 4 weeks, collected
while user was idle
Frequented Spots
We logged both coarse and fine location data accesses (wifi networks, cellular towers,
gps location) to generate a heatmap of geographic regions frequented by the user over
a period of 4 weeks, while the user was not directly using the Google Maps application
(i.e., Maps was running as a background service). The white spots indicate areas
that were most frequented, whereas the blue-green areas indicate spots that were less
frequented. Using this data, a mobile ad provider could easily ascertain that the user
is most likely an MIT student, given that the hotspots (white spots in the map) are
located throughout the MIT buildings and along a row of dorms. It is also likely that
the user does not own a car, given that she does not exhibit much movement beyond
a 5-mile radius from the center of the map shown above.
Movement Patterns
Using the same aggregate location data logged by AppWindow, we also generate an
animated map to capture the user's movement patterns during every hour of the day
(Go to: http://web.mit.edu/frango/Public/AppWindow/maps-animation.html for
the animation). We summarize these results in a heatmap (See Figure 7-1) that highlights frequented places with time labels so that we can identify likely locations for
residence and work. From this figure, we find that the user spends most evenings and
mornings in Region A, most likely to be her place of residence. Regions B and C are
frequented during the day, suggesting that they are most likely places of work. The
others are likely to be leisure places.
7.2.2
Activity Profiling
Google Maps runs as an application service on the smartphone, meaning that it
continuously makes API requests in the background even when the user is not using
the phone and/or not moving. From the data accesses logged by AppWindow, we find
that Google Maps requests location data more frequently when the user is moving
(most likely to keep the cache updated with the user's most recent location) and
active on the phone, and less frequently (accesses every 10 minutes or longer) when
the user is not active and not moving. Due to the continuous nature of this access
behavior, we can use the data accesses made by Maps to estimate the user's active and
dormant times. Below, we display two figures capturing the average access activity of
the Maps application during every day of the week (See Figure 7-2) and every hour
of the day (See Figure 7-3). We then use this access activity as an indicator for user
activity.
[Figure: bar chart titled "Maps: Average Data Accesses per Day (over 4 weeks)"; horizontal axis: Day of Week (Mon through Sun)]
Figure 7-2: The user's level of activity is estimated by the average number of data
accesses logged by AppWindow each day.
While there is noise (shown by the standard error bars) in the user's app activity
over individual days of the week, the weekend does seem to experience less activity
than the weekdays, especially Sunday. This suggests that the user uses
her phone less over the weekend (perhaps due to traveling and other activities). Tuesday
exhibits tremendous variance in the amount of activity. From an ad provider's perspective, it may be better to target the user with promotions that are more relevant
during the weekdays, since the user seems to use her phone more during those times.
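The per-day averages and standard error bars discussed above could be computed along the following lines. This is a sketch under assumptions: the input is taken to be a mapping from calendar date to the number of accesses logged that day, and the standard error is computed as the sample standard deviation divided by the square root of the number of observations.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_profile(daily_counts):
    """daily_counts: {date: number of data accesses logged that day}.

    Returns {day name: (mean accesses, standard error of the mean)}.
    """
    by_day = defaultdict(list)
    for day, count in daily_counts.items():
        by_day[DAY_NAMES[day.weekday()]].append(count)
    profile = {}
    for name, counts in by_day.items():
        # Standard error needs at least two observations; use 0.0 otherwise.
        sem = stdev(counts) / len(counts) ** 0.5 if len(counts) > 1 else 0.0
        profile[name] = (mean(counts), sem)
    return profile
```

With four weeks of data, each day of the week contributes about four observations, which is why individual days can show large error bars.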
[Figure: chart titled "Maps: Average Data Accesses per Hour (over 4 weeks)"; horizontal axis: Hour of day (0 to 24)]
Figure 7-3: User activity is estimated by the average number of data accesses logged
by AppWindow each hour.
The dips indicate periods of dormancy, whereas peaks indicate high levels of activity. From Figure 7-3, we can guess that the user has a late sleeping schedule.
There is a noticeable drop in phone activity from the hours of 5 a.m. to 10 a.m.,
suggesting that she is asleep for most of that time. There is a jump in phone activity
at 4 a.m. (however, with huge variance), perhaps indicating a surge in phone activity
right before bedtime. There is another local dip of activity between 4 p.m. and 8
p.m., suggesting that she is occupied during the time (likely with work). From a
mobile ad provider's perspective, this user seems to be a late worker and late riser,
suggesting that she most likely does not have a normal 9 a.m. to 5 p.m. day job. She
is also awake during the very late hours of the night, indicating that she would be a
viable target for clubs or restaurants that are open late.
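The hourly averaging behind Figure 7-3 can be sketched as follows. This is a minimal illustration, not the thesis's actual analysis code: the function name `hourly_profile`, the input format (a list of access timestamps), and the sample log are all assumptions made for the example. It buckets accesses by calendar day and hour, then reports the mean and standard error across days for each hour.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt
from statistics import mean, stdev


def hourly_profile(timestamps):
    """Return {hour: (mean accesses per day, standard error)} for hours 0-23."""
    counts = defaultdict(int)  # (calendar day, hour) -> number of accesses
    days = set()
    for ts in timestamps:
        counts[(ts.date(), ts.hour)] += 1
        days.add(ts.date())
    if not days:
        return {}
    profile = {}
    for hour in range(24):
        # A day with no access in this hour contributes a zero.
        per_day = [counts.get((day, hour), 0) for day in sorted(days)]
        avg = mean(per_day)
        err = stdev(per_day) / sqrt(len(per_day)) if len(per_day) > 1 else 0.0
        profile[hour] = (avg, err)
    return profile


# Hypothetical log (not thesis data): three accesses at 2 a.m. on one day,
# one access at 2 a.m. on the next day.
log = [datetime(2012, 6, 4, 2, m) for m in (0, 10, 20)]
log.append(datetime(2012, 6, 5, 2, 5))
profile = hourly_profile(log)
print(profile[2])  # mean and standard error for the 2 a.m. bucket
```

Hours whose mean stays near zero across days would correspond to the dormant periods discussed above; the same bucketing with `ts.weekday()` in place of `ts.hour` would give the per-day view.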
7.2.3 Interaction Profiling
We use the contextual data collected by AppWindow about the user's surrounding
bluetooth devices to identify likely acquaintances, as well as likely times of interaction.
It is important to note that none of the apps installed by the user in the Qualitative Intrusiveness Study actually accessed any surrounding bluetooth information.
However, we provide a section here to discuss the privacy implications of disclosing
bluetooth data, in order to better inform users of the profiling potential of accesses
that collect information about surrounding bluetooth devices.
Below we present two heatmaps that capture the user's interaction patterns during
every day of the week (Figure 7-4) and every hour of the day (Figure 7-5). We isolate
the most frequent interactions by identifying the top 25 bluetooth devices that appear
most often in the contextual bluetooth logging data. We then plot the frequency of
data accesses using the color gradient shown in the color key to the left of each
heatmap, where purple indicates most frequently logged bluetooth devices and blue
indicates less frequently logged devices. The surrounding bluetooth data is collected
by AppWindow every minute as long as the phone is turned on. Thus, the frequencies
displayed are independent of app usage and can be used as a telling indicator of the
frequency of interaction between the user and owners of the bluetooth devices.
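The counting behind the heatmaps can be sketched as below. This is an illustrative reading only: the function name `top_devices_by_weekday`, the `(timestamp, device_name)` record format, and the sample sightings are assumptions, not AppWindow's actual log schema. It selects the most frequently logged devices and tallies their sightings per day of the week.

```python
from collections import Counter
from datetime import datetime


def top_devices_by_weekday(sightings, top_n=25):
    """Map each of the top_n most-logged devices to 7 counts (Monday first)."""
    sightings = list(sightings)  # allow two passes over the records
    totals = Counter(name for _, name in sightings)
    top = [name for name, _ in totals.most_common(top_n)]
    grid = {name: [0] * 7 for name in top}
    for ts, name in sightings:
        if name in grid:
            grid[name][ts.weekday()] += 1  # weekday(): Monday == 0
    return grid


# Hypothetical sightings (not thesis data): device "A" on a Monday and a
# Wednesday, device "B" once on the Wednesday.
sample = [
    (datetime(2012, 6, 4, 14, 0), "A"),
    (datetime(2012, 6, 6, 14, 0), "A"),
    (datetime(2012, 6, 6, 15, 0), "B"),
]
grid = top_devices_by_weekday(sample, top_n=1)
print(grid)
```

Each row of the resulting grid corresponds to one heatmap row; replacing `ts.weekday()` with `ts.hour` (and 7 with 24) would give the hourly variant.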
[Figure 7-4 heatmap: "Top 25 Bluetooth Devices: Frequency per Day"; rows: the 25 (renamed) bluetooth devices; x-axis: Day of Week; color key: # Data Accesses (0 to 2000)]
Figure 7-4: Interaction patterns over a week are estimated by counting up the number
of times a given bluetooth device is logged by AppWindow each day.
[Figure 7-5 heatmap: "Top 25 Bluetooth Devices: Frequency per Hour"; rows: the 25 (renamed) bluetooth devices; x-axis: Hour of Day; color key: # Data Accesses (0 to 2000)]
Figure 7-5: Interaction patterns over a day are estimated by counting up the number
of times a given bluetooth device is logged by AppWindow each hour.
Note: The top 25 device names in Figures 7-4 and 7-5 have been changed to protect the
identity of their owners.
Likely Acquaintances
Owners of devices that correspond to purple cells are likely personal acquaintances
of the user, rather than random encounters. Specifically, those owners are most likely
work acquaintances, given that the interactions usually take place between 2 p.m.
and 6 p.m. (see Figure 7-5) and on weekdays (see Figure 7-4).
Likely Times of Interaction
From Figure 7-4, we find that the user seems to experience the most interactions between Monday and Wednesday, with the greatest concentration on Wednesday,
possibly owing to more meetings scheduled on that day.
From Figure 7-5, we find that 2 p.m. is the most frequent interaction period,
suggesting that the user is surrounded by more people during this hour. Coupling
this finding with the data presented in Figure 7-4, we guess that there may be
2 p.m. meetings scheduled on Wednesday. This deduction is corroborated by the
activity figure (see Figure 7-3 in Section 7.2.2), which indicates a dip in app
activity at 2 p.m. This behavior makes sense if the user was occupied with work.
Overall, it seems that the user is most likely to interact with people (in the work
environment) between 2 p.m. and 9 p.m. on weekdays, specifically between Monday
and Wednesday.
From a mobile ad provider's perspective, the interaction patterns can be used to
identify prime times for advertising social promotions. The ad provider can also use
the data to gain a better understanding of a user's network, by gauging how many
surrounding bluetooth devices are logged.
Furthermore, since the bluetooth device address is extracted along with the
unique device name, ad providers can also use it to track users indirectly, through
the phones of other users. Thus, as long as bluetooth is turned on, there is a
potential for indirectly leaking location and interaction information.
Chapter 8
Results for Quantitative Intrusiveness Study
We now proceed to quantify the profiling potential examined in the previous chapter
by applying our Intrusiveness Formula (Equation 6.1) to the data accesses logged by
AppWindow for 8-10 apps in each of 4 Android App categories (Games, Photography,
Messaging, Social). We first report the average Intrusiveness Scores and Subscores
for each app category, highlighting broad ranking trends among app categories. Then
we delve deeper into the Intrusiveness Scores and Subscores of each app, showing how
an app's level of idle access activity can affect the relative ranking of apps within
each category, thereby impacting users' risk assessments during the decision making
process of installing apps.
8.1 Average Intrusiveness Scores for App Categories
Here we report the average Intrusiveness Scores for each of 4 app categories tested in
the Quantitative Intrusiveness Study and provide a relative ranking based on those
scores in order to gain a broad sense of intrusiveness across app categories.
We find that Messaging and Social apps are significantly more intrusive than Games
and Photography apps. This noticeable gap in intrusiveness reflects the breadth of
sensitive data accessed by Messaging and Social apps, as well as the prominence of
their access activity during the user's idle periods, which makes their overall Intrusiveness
Scores much higher.
Table 8.1: Ranking of App Categories by Average Intrusiveness
Ranking  App Category  Intrusiveness Score  Standard Deviation
1        Messaging     14.89                3.43
2        Social        13.46                3.54
3        Games         8.91                 4.64
4        Photography   5.81                 5.05
Among the lower ranked categories, Photography apps were, on average, by far the
least intrusive, with an average score less than half that of the average Messaging app
(5.81 versus 14.89). However, the lower average Intrusiveness Scores for Games and
Photography apps mask the more intrusive apps in those categories, as their scores are
spread more widely than those of Messaging and Social apps (indicated by their higher
standard deviations).
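As a check on how the category figures relate to the per-app scores, the sketch below recomputes the Games row of Table 8.1 from the ten per-app Intrusiveness Scores listed in Table 8.4. That the reported spread is the sample standard deviation (dividing by n - 1) is an assumption on our part; it is the reading that reproduces the published value.

```python
from statistics import mean, stdev

# Per-app Intrusiveness Scores for the Games category, copied from Table 8.4.
games_scores = [18.51, 12.18, 10.61, 10.08, 9.81, 8.23, 6.67, 6.52, 5.45, 1.00]

category_mean = mean(games_scores)
category_sd = stdev(games_scores)  # sample standard deviation (n - 1)

print(round(category_mean, 2), round(category_sd, 2))
```

The wide spread is driven largely by the two extremes of the list (Angry Birds at 18.51 and Temple Run at 1.00), which is the masking effect described above.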
Table 8.2: Average Intrusiveness Subscores by Category
App Category  Avg Active Score  Avg Idle Score  Avg Active %  Avg Idle %
Messaging     10.40             18.55           59.34%        40.66%
Social        8.94              19.95           60.05%        39.95%
Games         6.58              15.13           79.96%        20.04%
Photography   5.02              9.24            58.61%        8.05%
Table 8.3: Ranking of App Categories by Average Intrusiveness Subscores
Ranking  Active       Idle
1        Messaging    Social
2        Social       Messaging
3        Games        Games
4        Photography  Photography
Delving deeper into the Intrusiveness Subscores, we find that the relative intrusiveness rankings of Messaging and Social apps invert when idle access activity is
considered.
During the user's active periods, Messaging apps are ranked as more intrusive than
Social apps because Messaging apps spend a greater portion of their data requests
accessing more personally identifiable information. Specifically, sms data are often
a necessary access for Messaging apps, and these types of requests rank high on
the context intrusiveness scale.
In addition, most Messaging apps access contact
information from the user's address book in order to quickly find the recipients for a
text, as well as allow users to send photos and audio recordings through an sms or
mms. In contrast, while Social apps also access many highly identifiable data types
such as contact, email, photo, and audio, none of them access sms data, which is a
crucial contributor to the Intrusiveness Score of all tested Messaging apps.
When the user is idle, however, Social apps rank higher than Messaging apps. As
the Privacy Fingerprints will show (See Section A.4 in the Appendix), Social apps
collected a greater number of sensitive data types during the idle period than Messaging apps. Whereas Messaging apps curbed the breadth of sensitive data accesses
from the active period, Social apps seemed to continue performing similar types of
data accesses into the idle period. As shown through the Privacy Fingerprints of
the Social apps, the data access patterns between active and idle fingerprints look
remarkably similar.
8.2 Intrusiveness for Individual Apps within App Categories
We now provide the relative rankings of individual apps within each category, based
on the overall Intrusiveness Score of each app. As described in Section 6.3.1, the
overall Intrusiveness Score sums the individual active and idle Intrusiveness Subscores, weighting each by its percentage contribution to the total number of data
accesses. In addition to providing a relative ranking of apps within each category, we
also inspect their Intrusiveness Subscores (Active Intrusiveness Subscore, Idle Intrusiveness Subscore) to quantify just how much idle access activity impacts an app's
overall intrusiveness ranking within a category. The effect of idle access behavior on
relative intrusiveness is an important contribution of our study, as it demonstrates
how idle access activity, previously unassessed and unquantified, can significantly
impact the intrusiveness assessment of apps, thereby affecting users' decisions when
they choose which apps to install.
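The combination of subscores described above can be sketched as a weighted sum. The Intrusiveness Formula itself (Equation 6.1) is defined earlier and is not reproduced here, so the function below is only our reading of the description; the name `overall_score` is hypothetical. The inputs are the Angry Birds and Scramble rows of Table 8.5, and the results agree with the overall scores in Table 8.4.

```python
def overall_score(active_score, idle_score, active_pct, idle_pct):
    """Weight each subscore by its share (in percent) of all data accesses."""
    return active_score * active_pct / 100 + idle_score * idle_pct / 100


# Subscores and access percentages taken from Table 8.5.
angry_birds = overall_score(8.05, 20.05, 12.80, 87.20)
scramble = overall_score(7.97, 19.00, 61.86, 38.14)

print(round(angry_birds, 2), round(scramble, 2))
```

Because the idle subscore is weighted by the idle share of accesses, an app with a moderate active subscore can still rank first if most of its accesses occur while the user is idle, as Angry Birds does.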
8.2.1 Intrusiveness for Games Apps
Table 8.4: Ranking of Game Apps by Decreasing Intrusiveness Score
Ranking  Games App     Intrusiveness Score
1        Angry Birds   18.51
2        Scramble      12.18
3        Fishing Star  10.61
4        Bow Man       10.08
5        Paper Toss    9.81
6        Fruit Ninja   8.23
7        UNO®          6.67
8        Cut the Rope  6.52
9        Flow          5.45
10       Temple Run    1.00
Games apps exhibited great variance in overall intrusiveness score, from scores as low
as 1.00 to scores as high as 18.51. This great spread in scores reflects the differences
in data access patterns of apps, as well as varying levels of idle access activity. For
more information about the different breakdowns of data accesses for each Games
app, please see the Privacy Fingerprints in Section A of the Appendix.
Table 8.5: Intrusiveness Subscore Breakdown of Game Apps
Game App      Active Score  Idle Score  Active %  Idle %
Angry Birds   8.05          20.05       12.80%    87.20%
Scramble      7.97          19.00       61.86%    38.14%
Fishing Star  7.36          19.00       72.13%    27.87%
Bow Man       8.09          19.29       82.28%    17.72%
Paper Toss    7.85          19.00       82.46%    17.54%
Fruit Ninja   8.05          20.00       98.46%    1.54%
UNO®          6.01          17.00       94.00%    6.00%
Cut the Rope  6.00          18.00       95.63%    4.37%
Flow          5.45          0.00        100.00%   0.00%
Temple Run    1.00          0.00        100.00%   0.00%
Table 8.6: Rankings of Game Apps by Intrusiveness Subscores
Ranking  Active        Idle
1        Bow Man       Angry Birds
2        Angry Birds   Fruit Ninja
3        Fruit Ninja   Bow Man
4        Scramble      Fishing Star
5        Paper Toss    Paper Toss
6        Fishing Star  Scramble
7        UNO®          Cut the Rope
8        Cut the Rope  UNO®
9        Flow          Flow
10       Temple Run    Temple Run
While most Game apps made sensitive data requests during the user's idle period,
most of these accesses only account for a small percentage of the app's total accesses (7
out of 10 apps made at most 20% of their accesses while idle). However, Angry Birds
was the notable exception, with 87.2% of all its data accesses being made while the
user wasn't even playing the game. This prominent idle access activity, coupled with
the app's breadth of sensitive data accesses, landed Angry Birds as the most intrusive
app in the Games category, a pronounced 6 points ahead of the next app on the list.
During active periods alone, however, Bow Man ranks ahead of Angry Birds (see Table 8.6).
This is because Bow Man's data accesses have greater context intrusiveness than Angry
Birds' (See Figure A-2 and Figure A-1 for their Privacy Fingerprints). Specifically,
in addition to accessing gps location (like Angry Birds), Bow Man also accesses the
user's browsing information, which ranks higher on the context intrusiveness scale than
any of the data accesses made by Angry Birds.
Table 8.7: Effect of Idle Access Activity on Overall Intrusiveness Ranking of Games Apps
Game App           % Change in Ranking
Angry Birds        10.00%
Cut the Rope Free  10.00%
Fruit Ninja        10.00%
Fishing Star       0.00%
Flow               0.00%
Paper Toss         0.00%
Temple Run         0.00%
Scramble           0.00%
UNO®               -10.00%
Bow Man            -20.00%
Note: A negative change denotes a drop in intrusiveness ranking, whereas a
positive change denotes a boost.
Angry Birds, Cut the Rope Free, and Fruit Ninja increased their relative intrusiveness rankings once idle access activity was factored into the intrusiveness score
calculation. Originally ranking second highest in active intrusiveness, Angry Birds
jumped to the number one spot due to the overwhelming contribution of its idle access behavior (87.2%), which is considered at least twice as intrusive as any data
access performed during a user's active period. Despite having a high Idle Intrusiveness Subscore, Bow Man dropped two spots in overall intrusiveness ranking, because
its idle activity contribution (17.72%) was less than that of Fishing Star (27.87%),
which had a comparable Idle Intrusiveness Subscore. UNO® and Cut the Rope Free,
originally only differing by 0.01 points in active intrusiveness (6.01 versus 6.00), also switched positions
due to the greater breadth of sensitive data accesses performed by Cut the Rope Free
during the idle period.
[Figure 8-1 plot: Privacy Fingerprint for Angry Birds; panels: idle (87%) and active (13%); x-axis: proportion of data accesses (0 to 1); rows: data types by usage context]
Figure 8-1: Privacy Fingerprint showing the data access behavior for Angry Birds
From Angry Birds' Privacy Fingerprint above, a user can visually see the breakdown of sensitive datatypes accessed by the app, as well as get a general sense of the
relative intrusiveness between the user's active and idle periods. For example, we see
that Angry Birds accesses wifi, imsi, device id, and gps. When the user is active,
most of the accesses are devoted to static data, such as imsi and device id. When
the user is idle, however, there is a noticeable shift in data access patterns, with gps
location taking up most of the requests. Because there are longer bands nearing the
bottom of the fingerprint, the Privacy Fingerprint seems to suggest that Angry Birds
is more intrusive during idle periods, since contexts become more intrusive as you
move down the vertical axis.
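The band lengths in a fingerprint are proportions of data accesses within each usage state. A minimal sketch of that computation follows; the function name `fingerprint`, the `(state, data_type)` record format, and the sample accesses are hypothetical and chosen only to mirror the pattern described above (static identifiers dominating when active, gps dominating when idle). They are not measurements from the study.

```python
from collections import Counter


def fingerprint(accesses):
    """Return {state: {data_type: proportion of that state's accesses}}."""
    by_state = {"active": Counter(), "idle": Counter()}
    for state, data_type in accesses:
        by_state[state][data_type] += 1
    result = {}
    for state, counts in by_state.items():
        total = sum(counts.values())
        result[state] = (
            {dtype: n / total for dtype, n in counts.items()} if total else {}
        )
    return result


# Hypothetical access records (illustrative only, not thesis data).
sample = [
    ("active", "imsi"), ("active", "imsi"),
    ("active", "device id"), ("active", "gps"),
    ("idle", "gps"), ("idle", "gps"), ("idle", "gps"), ("idle", "wifi"),
]
bands = fingerprint(sample)
print(bands["active"]["imsi"], bands["idle"]["gps"])
```

Ordering the data types by context intrusiveness along the vertical axis, as the fingerprint does, is a separate step that depends on the intrusiveness scale defined earlier in the thesis.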
8.2.2 Intrusiveness for Photography Apps
Photography apps exhibited greater variance in Intrusiveness Scores than the other categories. Specifically, 1/3 of the Photography apps tested accessed no sensitive data
(PhotoEditor, PicsSay, Pixlr-o-matic). This accounts for the category's low ranking in
intrusiveness. However, Photo Grid exhibited very high intrusiveness, with a score
greater than all but one (Angry Birds) of the apps tested in the Games category.
This is due to the fact that Photo Grid continually accesses the images of every contact from the user's address book, even during the idle period. Camera360, Pudding
Camera, and PicsArt all access the user's gps location (presumably to tag photos),
accounting for their relatively high Intrusiveness Scores. The good news is that most Photography apps do not perform significant access activity when the user is idle, with
idle data requests accounting for at most 20% of all data accesses and less than 5% for
about half of the apps. This relatively low idle access activity accounts for this category's
relatively low average Intrusiveness Score.
Table 8.8: Intrusiveness Scores for Photography Apps
Ranking  Photography App  Intrusiveness Score
1        Photo Grid       12.59
2        Instagram        11.00
3        Camera360        9.36
4        PicsArt          8.33
5        Pudding Camera   7.79
6        FxCamera         3.20
7        PhotoEditor      0.00
7        PicsSay          0.00
7        Pixlr-o-matic    0.00
Table 8.9: Intrusiveness Subscore Breakdown of Photography Apps
Photography App  Active Score  Idle Score  Active %  Idle %
Photo Grid       10.93         21.98       85.00%    15.00%
Instagram        11.00         0.00        100.00%   0.00%
Camera360        7.23          18.16       80.49%    19.51%
PicsArt          7.77          12.00       86.67%    13.33%
Pudding Camera   7.24          19.00       95.35%    4.65%
FxCamera         1.00          12.00       80.00%    20.00%
PhotoEditor      0.00          0.00        0.00%     0.00%
PicsSay          0.00          0.00        0.00%     0.00%
Pixlr-o-matic    0.00          0.00        0.00%     0.00%
Table 8.10: Ranking of Photography Apps by Intrusiveness Subscores
Ranking  Active          Idle
1        Instagram       Photo Grid
2        Photo Grid      Pudding Camera
3        PicsArt         Camera360
4        Pudding Camera  FxCamera
5        Camera360       PicsArt
6        FxCamera        Instagram
7        PhotoEditor     PhotoEditor
7        PicsSay         PicsSay
7        Pixlr-o-matic   Pixlr-o-matic
Table 8.11: Effect of Idle Access Activity on Overall Intrusiveness Ranking of Photography Apps
App Name        % Change in Ranking
Camera360       22.22%
Photo Grid      11.11%
FxCamera        0.00%
PhotoEditor     0.00%
PicsSay         0.00%
Pixlr-o-matic   0.00%
Pudding Camera  -11.11%
PicsArt         -11.11%
Instagram       -11.11%
Camera360 climbed two spots in ranking once its idle access activity was factored
into the Intrusiveness Score. The ranking boost was caused by the app's relatively
high proportion of idle access activity (19.51%), the second-highest idle access percentage
among the Photography apps tested (after FxCamera at 20.00%). Instagram dropped in ranking because it exhibited
no idle access behavior, although its overall ranking still remained high due to the
higher identifiability of the data (photo) it accessed during the active period. This drop
in rank also promoted Photo Grid's position in the overall ranking. Pudding Camera
dropped because of its low idle access activity (4.65%), and PicsArt dropped because
of the lower intrusiveness of the data it accessed once the user became idle. Instead of
accessing gps, wifi, and device id, it only requested accelerometer data during the idle
period.
8.2.3 Intrusiveness for Messaging Apps
Messaging apps rank the highest in categorical intrusiveness due to the variety of
sensitive data requested (sms, photo, audio, contact, email, gps location, device id,
imsi, msisdn, wifi) and their prominent access activity during the user's idle period. For
half of the Messaging apps tested, idle request activity contributed at least 40% of
all data accesses, with Kakao Talk and Pinger dominating the category with 83.46%
and 70.91%, respectively. Unlike the Photography and Games categories, Messaging apps
also exhibit the narrowest spread in Intrusiveness Scores, suggesting that they exhibit
similar types of data access patterns.
Google Voice and Facebook Messenger,
products of two of the biggest and wealthiest content providers in the world, rank the
lowest on the intrusiveness scale. This low score results from the limited types of
sensitive data they collect, as well as the low contribution of idle access activity
(at most 5.45% of all sensitive data accesses; see Table 8.12).
Their conservative access behaviors seem to
suggest that data is only accessed when necessary for app functionality. Google Voice
only accesses sensitive data (contact info, msisdn) while the user is active (and not
idle), suggesting that its requests are necessary for functionality. Facebook Messenger
accesses more types of sensitive data than Google Voice, but most of its accesses involve
gps location. This reduces its relative intrusiveness ranking, because gps has a lower
context intrusiveness than the other more personally identifiable datatypes that are
more aggressively collected by other Messaging apps.
Table 8.12: Intrusiveness Subscore Breakdown of Messaging Apps
Messaging App       Active Score  Idle Score  Active %  Idle %
Kakao Talk          10.79         21.51       16.54%    83.46%
Pinger              10.40         21.95       29.09%    70.91%
WhatsApp            10.86         21.97       59.90%    40.10%
Go SMS Pro          10.90         21.96       51.54%    48.46%
Skype               10.91         21.99       62.39%    37.61%
Handcent            9.10          20.05       60.67%    39.33%
Google Voice        10.90         0.00        100.00%   0.00%
Facebook Messenger  9.35          19.00       94.55%    5.45%
Table 8.13: Ranking of Messaging Apps by Intrusiveness Subscores
Ranking  Active              Idle
1        Skype               Skype
2        Go SMS Pro          WhatsApp
3        Google Voice        Go SMS Pro
4        WhatsApp            Pinger
5        Kakao Talk          Kakao Talk
6        Pinger              Handcent
7        Facebook Messenger  Facebook Messenger
8        Handcent            Google Voice
Table 8.14: Effect of Idle Access Activity on Overall Intrusiveness Ranking of Messaging Apps
Messaging App       % Change in Ranking
Pinger              50.00%
Kakao Talk          50.00%
Handcent            25.00%
WhatsApp            12.50%
Facebook Messenger  -12.50%
Go SMS Pro          -25.00%
Skype               -50.00%
Google Voice        -50.00%
As shown in Table 8.14, idle access behavior has a significant impact on the relative
intrusiveness ranking of apps in the Messaging category. Kakao Talk and Pinger both
rose up half the list, even though neither of them ranks within the top three for active
intrusiveness or idle intrusiveness. Nevertheless, it is the overwhelming contribution
of their idle access activity (83.46% for Kakao Talk and 70.91% for Pinger) that lands
them in the top 2 positions for overall intrusiveness ranking in the Messaging category.
Conversely, Google Voice drops down half the list due to the non-existence of its idle
access activity, while other apps like Handcent and WhatsApp climb up due to the
context intrusiveness and relative prominence of their idle access activity.
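The text does not spell out how the "% Change in Ranking" column is computed. One reading that reproduces the cases the text describes explicitly is the number of positions gained between the active ranking and the overall ranking, divided by the number of apps in the category. The sketch below encodes that assumption; the function name `rank_change_percent` is ours, and the reading should not be taken as the thesis's definition.

```python
def rank_change_percent(active_rank, overall_rank, num_apps):
    """Positions gained (positive) or lost (negative), as a percent of list size."""
    return (active_rank - overall_rank) / num_apps * 100


# Kakao Talk: 5th of 8 in active intrusiveness, 1st overall (Messaging).
kakao = rank_change_percent(5, 1, 8)
# Angry Birds: 2nd of 10 in active intrusiveness, 1st overall (Games).
angry_birds = rank_change_percent(2, 1, 10)
# Twitter: 7th of 9 in active intrusiveness, 3rd overall (Social).
twitter = rank_change_percent(7, 3, 9)

print(kakao, angry_birds, round(twitter, 2))
```

Under this reading, Kakao Talk's move from fifth to first corresponds to the 50.00% entry in Table 8.14, and a drop in rank yields a negative value.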
8.2.4 Intrusiveness for Social Apps
Like Messaging apps, Social apps exhibit relatively lower variance in their Intrusiveness Scores. This reflects the high level of homogeneity in datatypes accessed. Despite
this coherence in Intrusiveness Score however, we notice a clear dichotomy in the level
of idle access activity among Social apps. 5 out of the 9 apps tested had an idle access
percentage of at least 40%, with Twitter and Bump dominating at 79% and 72%, respectively. On the other end of the spectrum, each of the 4 remaining apps exhibited
significantly lower idle activity, accounting for at most 15% of all data accesses.
Table 8.15: Social Apps Ranked by Decreasing Intrusiveness Score
Ranking  Social App  Intrusiveness Score
1        Bump        18.81
2        Facebook    16.94
3        Twitter     16.76
4        Foursquare  15.06
5        Skout       12.52
6        Google+     11.42
7        Badoo       11.17
8        LinkedIn    9.27
9        MeetMe      9.16
Table 8.16: Intrusiveness Subscore Breakdown of Social Apps
Social App  Active Score  Idle Score  Active %  Idle %
Bump        10.70         21.88       27.45%    72.55%
Facebook    8.12          21.63       34.72%    65.28%
Twitter     8.00          19.04       20.61%    79.39%
Foursquare  10.51         21.42       58.27%    41.73%
Skout       10.90         21.86       85.21%    14.79%
Google+     10.47         19.43       89.39%    10.61%
Badoo       6.21          16.27       50.70%    49.30%
LinkedIn    8.00          19.00       88.42%    11.58%
MeetMe      7.52          19.00       85.64%    14.36%
Table 8.17: Ranking of Social Apps by Intrusiveness Subscores
Ranking  Active      Idle
1        Skout       Bump
2        Bump        Skout
3        Foursquare  Facebook
4        Google+     Foursquare
5        Facebook    Google+
6        LinkedIn    Twitter
7        Twitter     LinkedIn
8        MeetMe      MeetMe
9        Badoo       Badoo
Table 8.18: Effect of Idle Access Activity on Overall Intrusiveness Ranking of Social Apps
Social App  % Change in Ranking
Twitter     44.44%
Facebook    33.33%
Badoo       22.22%
Bump        11.11%
Foursquare  -11.11%
MeetMe      -11.11%
Google+     -22.22%
LinkedIn    -22.22%
Skout       -44.44%
The relative intrusiveness ranking of Social apps stays remarkably stable between
active accesses and idle accesses, with all but one app (Facebook) shifting up or down at most one spot. This
similarity in rankings suggests that Social apps do not significantly alter their data
access patterns when the user switches from active to idle state. However, when we
weight each of the Intrusiveness Subscores by its percentage contribution to total
data accesses, we find that the intrusiveness ranking is affected dramatically.
Twitter, originally ranking 7th in active intrusiveness due to its limited breadth in
types of data accessed (only requested gps location), climbed up to number 3 in overall
intrusiveness, owing to its high level of idle access activity. Facebook, Bump, and
Badoo also increased their intrusiveness ranking for similar reasons, with idle access
percentages of 65.28%, 72.55%, and 49.30%, respectively, while the remaining apps
were displaced downwards in ranking.
Chapter 9
Discussion
In this section, we summarize the main findings from the Qualitative Intrusiveness
Study and the Quantitative Intrusiveness Study, highlighting their broader privacy
implications and qualifying their contributions.
9.1 Qualitative Intrusiveness Study
We performed the Qualitative Intrusiveness Study to gain a preliminary glimpse into
the profiling potential of intrusive apps. In this first study, we aimed to investigate
two questions about apps:
1. How much of an app's data collection is unexpected?
2. How can the collected data be used to profile a user?
Answering the first question, the results of the Qualitative Intrusiveness Study
showed that many apps made sensitive data requests while the user was idle. Out
of the 9 apps that were installed by the user, Maps and Antivirus collected most of
their data while the user was not using the app, and other apps such as Foursquare,
Facebook, and Scramble Free also made a significant proportion of their data requests
during the user's idle period. These preliminary results convinced us that many popular apps were making requests for sensitive data that were likely unbeknownst to, and
therefore unpermitted by, the user. However, not all apps that offered similar services exhibited the same level of idle access activity. For example, both Google Maps
and Yelp provide location-aware services as their core functionality, but their levels
of idle access activity differ enormously, with Google Maps at 93.30% and Yelp at a
low 3.39%. Games also exhibit a noticeable difference, with Scramble Free at 38.14%
and Cut the Rope Free at 3.90%. These initial findings showed that a more thorough examination of unexpected and therefore "intrusive" behavior was warranted,
as many apps which appear similar to users actually exhibit striking differences in
their sensitive access behavior. These surprising results motivated the second study, the Quantitative Intrusiveness Study, where we sought to differentiate apps by developing a framework for quantifying intrusiveness based on real testing data gathered
for about 40 of Android's most popular apps.
To justify the importance of quantifying intrusiveness, we answered the second
question by performing some simple analysis on the data gathered by the preliminary
user study, focusing on the profile inferences that could be gleaned from location and
bluetooth data. We used these data to visualize and characterize a user's movement,
activity, and interaction profile. Overall, we found that smartphone apps can gain
a holistic view of a user's lifestyle based on location alone. Not only can movement
paths and times be mapped out, but likely places of residence, work, and leisure can be
identified, as well as socioeconomic factors tied to those frequented locations. A user's
active and dormant patterns could also be estimated based on the frequency of data
accesses.
We also identified a user's likely acquaintances and interaction patterns
using the contextual bluetooth data collected by AppWindow, thereby capturing
the social dimension of the user's lifestyle. Indeed, smartphones are highly personal
devices, and apps can use them to track users' actions and deduce powerful inferences
about their lifestyle patterns.
9.2 Quantitative Intrusiveness Study
One major contribution of the Quantitative Intrusiveness Study is to demonstrate
that apps not only vary greatly in the types of sensitive data they access, but also
in the usage conditions under which they access that data. These disparities account
for differences in the Intrusiveness Scores of smartphone apps, both across and within
app categories. Indeed, each app truly does have a singular fingerprint that uniquely
quantifies its data access behavior. These differences in access behavior, and their
subsequent impacts on relative intrusiveness, highlight the need for a method that
allows users to assess apps that provide similar functionalities but exhibit highly
variable sensitive access activity, so that they can easily and accurately distinguish
more intrusive apps from less intrusive ones when choosing which apps to install.
To elaborate, there are noticeable differences in the datatypes accessed by Games
apps and those accessed by Messaging apps. Most of the Messaging apps regularly
request data types with higher personal identifiability, such as sms, contact, email,
photo, audio, whereas Games apps tend to access data types with lower personal
identifiability, accounting for the category's lower average intrusiveness score. While
some of these more sensitive data accesses are crucial for app functionality, we still
quantify the data accesses as more intrusive given their potential for making deeper
inferences about the user's personal profile.
Whereas apps across categories exhibit noticeable differences in the data types
they access, apps within each category demonstrate greater homogeneity. In these
cases, the level of idle access activity causes significant changes in relative intrusiveness
ranking. This observation is exemplified by the top ranking app for intrusiveness in
the Messaging category: Kakao Talk. Based on data access patterns alone, the app
was ranked 5th out of 8 apps for intrusiveness. However, once the level of idle access
activity was factored into the intrusiveness score, Kakao Talk shot up to number
one in the overall intrusiveness ranking, mostly due to the fact that an overwhelming
majority (83.46%) of its accesses were performed while the user was idle. Even though
Kakao Talk accessed fewer highly identifiable data types than other Messaging apps
(e.g., Facebook Messenger, Google Voice, Skype, WhatsApp), its significantly higher level of idle
access activity was able to make it more intrusive than any other app tested in the
Messaging category.
This suggests that the level of idle activity is sometimes more impactful on relative
intrusiveness than the types of data that are accessed. This is especially true for apps
that access similar types of data, such as apps within the same category. Thus, the
level of idle activity is a highly relevant factor for assessing intrusiveness, because
users tend to compare apps within the same category when choosing which apps
to install. This is an important finding, because current transparency mechanisms
(Android Permissions Model, Lin et al. [17], TaintDroid [7]) only focus on identifying
the types of data that are accessed by apps, and not the usage contexts under which
they are accessed. In this study, we show that there are other important factors in
assessing intrusiveness besides just the type of sensitive data accessed.
Another notable finding of the Quantitative Intrusiveness Study is that popularity
and reputation are not reliable indicators for reduced intrusiveness. All of the apps
chosen for testing ranked in the top 20 apps for their respective categories. Each had
ratings of at least 4 stars and at least 10,000 users, most with more than 100,000.
Yet, the majority of them exhibited a significant level of idle access activity, many
of them being for highly sensitive data (contact info, sms).
Company reputation
also does not guarantee reduced privacy risk. Even though Facebook Messenger was
found to be the least intrusive app out of all the Messaging apps that were tested, the
actual Facebook application was found to be the second most intrusive app out of all
Social apps tested. Twitter, a reputable company, ranked 3rd in intrusiveness, while
Google+ and LinkedIn ranked 6th and 8th, respectively. It is also worth reiterating
that Google Maps was found to conduct the most idle accesses during the preliminary
Qualitative Intrusiveness Study, making 93.30% of location requests while the user
was not interacting with the app. Thus, users should not assume that a reputable
app company follows good privacy practices.
Chapter 10
Conclusion
10.1 Summary
We have demonstrated that popular Android apps exhibit intrusive access behavior in
the types of data they access and the conditions under which they access them. One
major contribution of our study is the construction of a framework for quantifying the
intrusiveness of smartphone apps. This framework has two essential components: 1)
the Privacy Fingerprint, a concise yet potent visualization that reflects the data access
patterns unique to each app and 2) an Intrusiveness Score that numerically measures
each app's level of intrusiveness based on empirical testing data. Used together, the
Intrusiveness Score and Privacy Fingerprint help a user easily and accurately assess
the relative intrusiveness of apps when they choose which apps to install. Our study
demonstrates that the Intrusiveness Score is especially useful when users are comparing apps that exhibit similar data access patterns, given that current transparency
tools treat them as having comparable intrusiveness. Another major contribution of
our study is the identification and quantification of an app's proportion of idle access
activity. Though idle access activity is overlooked by current transparency mechanisms, our findings from
the Qualitative and Quantitative Intrusiveness Studies show that it
is a prevalent yet unexpected practice among popular apps. More importantly, this
idle activity exerts significant impact on an app's relative intrusiveness ranking within
a given app category, thereby likely affecting a user's decision-making process during
app installation.
10.2 Future Work
While we believe our framework for quantifying app intrusiveness is a promising start
for aiding users in more efficiently assessing the relative intrusiveness of apps, the
formula for calculating the Intrusiveness Score needs more refinement. In particular,
many variables and weights that are used to calculate an app's overall intrusiveness
were either subjectively determined (based on educated guesses) or not sufficiently
validated through empirical testing. Namely, the identifiability weights assigned to
each data type were subjectively determined, based on an intuitive assessment of how
personally identifiable each data type was. The sequence of intrusiveness factors used
for sorting the access context list by decreasing intrusiveness was also subjectively
imposed. To recall, we sorted first on a user's app activity status (either active or
idle), second on the personal identifiability of each data type, and third on the user's
movement status. We chose this order because we believed it intuitively represented
decreasing contribution to intrusiveness. Lastly, the percentages of active and idle
access activity generated for each app need to be more thoroughly tested, to account for possible variations in user-specific behavior. For example, an app that is
heavily used will have a greater active access activity percentage than the same app when it is not
heavily used. Apps should be tested across different types of users (perhaps through
crowdsourcing) in order to get a more representative percentage for each
app's proportion of active and idle access activity. The same can be done for the
percentages assigned to each access context.
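The three-level ordering recalled above (activity status, then identifiability, then movement status) can be sketched as a sort on a tuple key. This is only an illustration under stated assumptions: the identifiability weights below are placeholder values rather than the ones used in the study, treating "moving" as more intrusive than "not moving" is an assumption, and the names `AccessContext` and `intrusiveness_key` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder identifiability weights (assumed values, not the study's).
IDENTIFIABILITY = {"contact": 3, "device id": 2, "wifi": 1}

@dataclass(frozen=True)
class AccessContext:
    data_type: str
    idle: bool    # True if the user was not interacting with the app
    moving: bool  # True if the user was moving during the access

def intrusiveness_key(ctx):
    # Tuples compare element by element, which yields the ordering:
    # activity status first, identifiability second, movement status third.
    # Ranking "moving" above "not moving" is an assumption.
    return (ctx.idle, IDENTIFIABILITY[ctx.data_type], ctx.moving)

def sort_by_decreasing_intrusiveness(contexts):
    return sorted(contexts, key=intrusiveness_key, reverse=True)

contexts = [
    AccessContext("contact", idle=False, moving=False),
    AccessContext("wifi", idle=True, moving=True),
    AccessContext("device id", idle=True, moving=False),
    AccessContext("wifi", idle=True, moving=False),
]
ranked = sort_by_decreasing_intrusiveness(contexts)
for ctx in ranked:
    print(ctx)
```

Because the factor order is encoded only in the key tuple, an alternative ordering (for example, identifiability before activity status) can be tested by permuting the tuple elements, which is one way the subjectively imposed sequence could be validated against user feedback.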
Appendix A
Privacy Fingerprints
Here we show the Privacy Fingerprints for each smartphone app that was tested in
the Quantitative Intrusiveness Study. We tested 36 in total, but we only show 33
here, because 3 Photography apps did not exhibit any sensitive access behavior.
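To make the structure of a fingerprint concrete, the following minimal sketch derives the per-panel proportions that each figure plots on its "proportion of data accesses" axis. It assumes proportions are normalized within each panel (idle or active); the record layout, the function names, and the text rendering are hypothetical and are not the plotting code used for the figures.

```python
from collections import Counter, defaultdict

def fingerprint(accesses):
    """Map each panel ("idle" or "active") to {data_type: share of that panel's accesses}."""
    counts = defaultdict(Counter)
    for data_type, state in accesses:
        counts[state][data_type] += 1
    return {
        state: {dt: n / sum(c.values()) for dt, n in c.items()}
        for state, c in counts.items()
    }

def render(fp, width=20):
    """Render a fingerprint as text bars, largest share first within each panel."""
    lines = []
    for state, shares in fp.items():
        lines.append(state)
        for dt, p in sorted(shares.items(), key=lambda kv: -kv[1]):
            lines.append(f"  {dt:<12}{'#' * round(p * width)} {p:.2f}")
    return "\n".join(lines)

# Hypothetical access log of (data_type, user_state) pairs.
sample = [
    ("wifi", "idle"),
    ("wifi", "idle"),
    ("device id", "idle"),
    ("gps", "idle"),
    ("device id", "active"),
]
fp = fingerprint(sample)
print(render(fp))
```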
A.1 Privacy Fingerprints for Game Apps
Below we present the Privacy Fingerprints for each of the 10 free Game Apps that
were tested in the Quantitative Intrusiveness Study. They are shown in alphabetical
order.
[Figure: Privacy Fingerprint for Angry Birds; panels: active (13%), idle (87%); x-axis: proportion of data accesses]
Figure A-1: Privacy Fingerprint showing the data access behavior for Angry Birds
[Figure: Privacy Fingerprint for Bow Man; panels: idle, active; x-axis: proportion of data accesses]
Figure A-2: Privacy Fingerprint showing the data access behavior for Bow Man
[Figure: Privacy Fingerprint for Cut the Rope Free; panels: idle (4%), active; x-axis: proportion of data accesses]
Figure A-3: Privacy Fingerprint showing the data access behavior for Cut the Rope Free
[Figure: Privacy Fingerprint for Fishing Star; panels: idle (28%), active; x-axis: proportion of data accesses]
Figure A-4: Privacy Fingerprint showing the data access behavior for Fishing Star
[Figure: Privacy Fingerprint for Flow; panel: active; x-axis: proportion of data accesses]
Figure A-5: Privacy Fingerprint showing the data access behavior for Flow
[Figure: Privacy Fingerprint for Fruit Ninja; panels: idle (2%), active (98%); x-axis: proportion of data accesses]
Figure A-6: Privacy Fingerprint showing the data access behavior for Fruit Ninja
[Figure: Privacy Fingerprint for Paper Toss; panels: active (82%), idle (18%); x-axis: proportion of data accesses]
Figure A-7: Privacy Fingerprint showing the data access behavior for Paper Toss app
[Figure: Privacy Fingerprint for Scramble Free; panels: idle (38%), active (62%); x-axis: proportion of data accesses]
Figure A-8: Privacy Fingerprint showing the data access behavior for Scramble Free
[Figure: Privacy Fingerprint for Temple Run; panel: active; x-axis: proportion of data accesses]
Figure A-9: Privacy Fingerprint showing the data access behavior for Temple Run
[Figure: Privacy Fingerprint for UNO; panels: idle (6%), active (94%); x-axis: proportion of data accesses]
Figure A-10: Privacy Fingerprint showing the data access behavior for UNO®
A.2 Privacy Fingerprints for Photography Apps
Below we present the Privacy Fingerprints for 6 of the 9 free Photography Apps that were tested in the Quantitative Intrusiveness Study. They are
shown in alphabetical order. Note that 3 apps (Photo Editor, PicsSay, Pixlr-o-matic)
are missing Privacy Fingerprints, because they did not exhibit any sensitive access
behavior as logged by AppWindow.
[Figure: Privacy Fingerprint for Camera360; panels: idle (20%), active (80%); x-axis: proportion of data accesses]
Figure A-11: Privacy Fingerprint showing the data access behavior for Camera360 Ultimate
[Figure: Privacy Fingerprint for FxCamera; panels: idle (20%), active (80%); x-axis: proportion of data accesses]
Figure A-12: Privacy Fingerprint showing the data access behavior for FxCamera
[Figure: Privacy Fingerprint for Instagram; panel: active; x-axis: proportion of data accesses]
Figure A-13: Privacy Fingerprint showing the data access behavior for Instagram
[Figure: Privacy Fingerprint for Photo Grid; panels: idle (15%), active (85%); x-axis: proportion of data accesses]
Figure A-14: Privacy Fingerprint showing the data access behavior for Photo Grid
[Figure: Privacy Fingerprint for PicsArt; panels: idle (13%), active (87%); x-axis: proportion of data accesses]
Figure A-15: Privacy Fingerprint showing the data access behavior for PicsArt Photo Studio
[Figure: Privacy Fingerprint for Pudding Camera; panels: idle (5%), active (95%); x-axis: proportion of data accesses]
Figure A-16: Privacy Fingerprint showing the data access behavior for Pudding Camera
A.3 Privacy Fingerprints for Messaging Apps
Below we present the Privacy Fingerprints for each of the 8 free Messaging Apps that
were tested in the Quantitative Intrusiveness Study. They are shown in alphabetical
order.
[Figure: Privacy Fingerprint for Facebook Messenger; panels: idle (5%), active (95%); x-axis: proportion of data accesses]
Figure A-17: Privacy Fingerprint showing the data access behavior for Facebook Messenger
[Figure: Privacy Fingerprint for GO SMS Pro; panels: idle (40%), active; x-axis: proportion of data accesses]
Figure A-18: Privacy Fingerprint showing the data access behavior for GO SMS Pro
[Figure: Privacy Fingerprint for Google Voice; x-axis: proportion of data accesses]
Figure A-19: Privacy Fingerprint showing the data access behavior for Google Voice
[Figure: Privacy Fingerprint for Handcent; panels: idle (39%), active (61%); x-axis: proportion of data accesses]
Figure A-20: Privacy Fingerprint showing the data access behavior for Handcent SMS
[Figure: Privacy Fingerprint for Kakao Talk; panels: idle (83%), active (17%); x-axis: proportion of data accesses]
Figure A-21: Privacy Fingerprint showing the data access behavior for Kakao Talk
[Figure: Privacy Fingerprint for Pinger; panels: idle (71%), active (29%); x-axis: proportion of data accesses]
Figure A-22: Privacy Fingerprint showing the data access behavior for Pinger
[Figure: Privacy Fingerprint for Skype; panels: idle (38%), active (62%); x-axis: proportion of data accesses]
Figure A-23: Privacy Fingerprint showing the data access behavior for Skype
[Figure: Privacy Fingerprint for WhatsApp; panels: idle (48%), active (52%); x-axis: proportion of data accesses]
Figure A-24: Privacy Fingerprint showing the data access behavior for WhatsApp
A.4 Privacy Fingerprints for Social Apps
Below we present the Privacy Fingerprints for each of the 9 free Social Apps that
were tested in the Quantitative Intrusiveness Study. They are shown in alphabetical
order.
[Figure: Privacy Fingerprint for Badoo; panels: idle (49%), active (51%); x-axis: proportion of data accesses]
Figure A-25: Privacy Fingerprint showing the data access behavior for Badoo - Meet New People
[Figure: Privacy Fingerprint for Bump; panels: idle (73%), active (27%); x-axis: proportion of data accesses]
Figure A-26: Privacy Fingerprint showing the data access behavior for Bump
[Figure: Privacy Fingerprint for Facebook; panels: idle (65%), active (35%); x-axis: proportion of data accesses]
Figure A-27: Privacy Fingerprint showing the data access behavior for Facebook
[Figure: Privacy Fingerprint for Foursquare; panels: idle (42%), active (58%); x-axis: proportion of data accesses]
Figure A-28: Privacy Fingerprint showing the data access behavior for Foursquare
[Figure: Privacy Fingerprint for Google+; panels: idle, active; x-axis: proportion of data accesses]
Figure A-29: Privacy Fingerprint showing the data access behavior for Google+
[Figure: Privacy Fingerprint for LinkedIn; panels: active, idle (11%); x-axis: proportion of data accesses]
Figure A-30: Privacy Fingerprint showing the data access behavior for LinkedIn
[Figure: Privacy Fingerprint for MeetMe; panels: idle (14%), active (86%); x-axis: proportion of data accesses]
Figure A-31: Privacy Fingerprint showing the data access behavior for MeetMe Meet New People
[Figure: Privacy Fingerprint for Skout; panels: idle (15%), active (85%); x-axis: proportion of data accesses]
Figure A-32: Privacy Fingerprint showing the data access behavior for Skout
[Figure: Privacy Fingerprint for Twitter; panels: idle (79%), active (21%); x-axis: proportion of data accesses]
Figure A-33: Privacy Fingerprint showing the data access behavior for Twitter
Appendix B
AppWindow Code
B.1 Modifications to Android APIs
Here we provide a table that lists the Android API methods that were modified to
log sensitive data requests made by apps. To download the actual source code for
AppWindow, follow the instructions in the next section.
Table B.1: Android API methods that were modified for AppWindow

Sensitive Data           Android methods modified
sms                      ContentResolver.java: query()
audio                    ContentResolver.java: query()
contact name             ContentResolver.java: query()
contact email            ContentResolver.java: query()
contact phone number     ContentResolver.java: query()
browsing info            ContentResolver.java: query()
voicemail                TelephonyManager.java: getVoiceMailNumber()
                         TelephonyManager.java: getCompleteVoiceMailNumber()
                         TelephonyManager.java: getVoiceMailNumberCount()
gps location             ContextImpl.java: getLocationManager()
                         LocationManager.java: requestLocationUpdates()
                         LocationManager.java: getLastKnownLocationLogged()
device id                TelephonyManager.java: getDeviceId()
sim serial number        TelephonyManager.java: getSimSerialNumber()
imsi                     TelephonyManager.java: getSubscriberId()
msisdn                   TelephonyManager.java: getLine1Number()
wifi info                ContextImpl.java: getWifiManager()
                         WifiManager.java: reconnect()
                         WifiManager.java: reassociate()
                         WifiInfo.java: getSSID()
                         WifiInfo.java: getBSSID()
                         WifiInfo.java: getRssi()
                         WifiInfo.java: getMacAddress()
                         WifiInfo.java: getNetworkId()
                         WifiInfo.java: getSupplicantState()
                         WifiInfo.java: getHiddenSSID()
                         WifiInfo.java: getDetailedState()
                         WifiInfo.java: getIpAddress()
cell location            TelephonyManager.java: getCellLocation()
sensor accelerometer     SensorManager.java: getDefaultSensor()
sensor rotation vector   SensorManager.java: getDefaultSensor()
bluetooth info           BluetoothAdapter.java: constructor
                         BluetoothService.java: getDefaultAdapter()
                         BluetoothService.java: getState()
                         BluetoothService.java: getRemoteDevice()
                         BluetoothService.java: enable()
                         BluetoothService.java: getName()
                         BluetoothService.java: getScanMode()
                         BluetoothService.java: startDiscovery()
                         BluetoothService.java: getBondedDevices()
                         BluetoothService.java: listenUsingRfcommOn()
                         BluetoothService.java: listenUsingRfcommWithServiceRecord()
                         BluetoothService.java: listenUsingInsecureRfcommWithServiceRecord()
                         BluetoothService.java: listenUsingInsecureRfcommOn()
                         BluetoothService.java: listenUsingScoOn()
                         BluetoothService.java: readOutOfBandData()
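When post-processing logged requests, a lookup of this kind could translate a logged method back into its sensitive data type. The sketch below covers only a subset of Table B.1, restricted to methods that map to exactly one data type (ContentResolver.query() and SensorManager.getDefaultSensor() serve several types and would need their arguments inspected). The `Class.method()` string format and the function name `classify` are assumptions for illustration and do not describe AppWindow's actual log output.

```python
# Subset of Table B.1: methods that correspond to a single sensitive data type.
METHOD_TO_DATA_TYPE = {
    ("TelephonyManager", "getDeviceId"): "device id",
    ("TelephonyManager", "getSimSerialNumber"): "sim serial number",
    ("TelephonyManager", "getSubscriberId"): "imsi",
    ("TelephonyManager", "getLine1Number"): "msisdn",
    ("TelephonyManager", "getCellLocation"): "cell location",
    ("TelephonyManager", "getVoiceMailNumber"): "voicemail",
    ("LocationManager", "requestLocationUpdates"): "gps location",
    ("WifiInfo", "getSSID"): "wifi info",
    ("WifiInfo", "getMacAddress"): "wifi info",
}

def classify(call):
    """Return the data type for a call such as 'TelephonyManager.getDeviceId()', or None."""
    cls, _, method = call.partition(".")
    method = method.split("(")[0]  # drop a trailing "()" if present
    return METHOD_TO_DATA_TYPE.get((cls.strip(), method.strip()))

# Hypothetical logged calls.
for call in ["TelephonyManager.getDeviceId()", "WifiInfo.getSSID()", "Foo.bar()"]:
    print(call, "->", classify(call))
```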
B.2 Source Code
AppWindow was constructed from Android Open Source Code, version 2.3.4_r1.
To download this code, follow the instructions at http://source.android.com/source/downloading.html.
Be careful to check out the correct branch. This can be done using the following
command:
repo init -u https://android.googlesource.com/platform/manifest -b android-2.3.4_r1
Next, replace the frameworks/base folder with the modified frameworks/base folder
written for AppWindow, found here: http://air.csail.mit.edu/frances/appwindow-2.3.4_r1.tar.gz.
You will have to extract the archive and find the frameworks/base directory, copying all of that code into the Android source you previously downloaded.
Finally, compile the newly modified Android source code using the instructions at
http://source.android.com/source/building.html.
Bibliography
[1] Android. Android security overview - android open source, August 2012. [Online;
accessed 3-Aug-2012].
[2] Guangdong Bai, Liang Gu, Tao Feng, Yao Guo, and Xiangqun Chen. Context-aware usage control for android. In Sushil Jajodia and Jianying Zhou, editors,
Security and Privacy in Communication Networks, volume 50 of Lecture Notes of
the Institute for Computer Sciences, Social Informatics and Telecommunications
Engineering, pages 326-343. Springer Berlin Heidelberg, 2010.
[3] Alastair R. Beresford, Andrew Rice, Nicholas Skehin, and Ripduman Sohan.
Mockdroid: trading privacy for application functionality on smartphones. In
Proceedings of the 12th Workshop on Mobile Computing Systems and Applications, HotMobile '11, pages 49-54, New York, NY, USA, 2011. ACM.
[4] Nick Bilton. Disruptions: So many apologies, so much data mining, February
2012. [Online; accessed 23-Jun-2012].
[5] Trinh-Minh-Tri Do and Daniel Gatica-Perez. By their apps you shall understand
them: mining large-scale patterns of mobile phone usage. In Proceedings of the
9th International Conference on Mobile and Ubiquitous Multimedia, MUM '10,
pages 27:1-27:10, New York, NY, USA, 2010. ACM.
[6] Manuel Egele, Christopher Kruegel, Engin Kirda, and Giovanni Vigna. PiOS:
Detecting privacy leaks in iOS applications. In Proceedings of the 18th Annual
Network & Distributed System Security Symposium (NDSS), February 2011.
[7] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung,
Patrick McDaniel, and Anmol N. Sheth. TaintDroid: an information-flow tracking
system for realtime privacy monitoring on smartphones. In Proceedings of the 9th
USENIX conference on Operating systems design and implementation, OSDI'10,
pages 1-6, Berkeley, CA, USA, 2010. USENIX Association.
[8] William Enck, Machigar Ongtang, and Patrick McDaniel. On lightweight mobile
phone application certification. In Proceedings of the 16th ACM conference on
Computer and communications security, CCS '09, pages 235-245, New York,
NY, USA, 2009. ACM.
[9] Adrienne Porter Felt, Kate Greenwood, and David Wagner. The effectiveness
of application permissions. In Proceedings of the 2nd USENIX conference on
Web application development, WebApps'11, pages 7-7, Berkeley, CA, USA, 2011.
USENIX Association.
[10] Adrienne Porter Felt, Elizabeth Ha, Serge Egelman, Ariel Haney, Erika Chin,
and David Wagner. Android permissions: User attention, comprehension, and
behavior. Technical Report UCB/EECS-2012-26, EECS Department, University
of California, Berkeley, Feb 2012.
[11] Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster. SCanDroid: Automated
Security Certification of Android Applications. Technical Report CS-TR-4991,
Department of Computer Science, University of Maryland, College Park, November 2009.
[12] Peter Gilbert, Byung-Gon Chun, Landon P. Cox, and Jaeyeon Jung. Automating privacy testing for smartphone applications. Technical Report CS-2011-02,
Duke University, February 2011.
[13] Peter Hornyack, Seungyeop Han, Jaeyeon Jung, Stuart Schechter, and David
Wetherall. These aren't the droids you're looking for: retrofitting android to
protect data from imperious applications. In Proceedings of the 18th ACM conference on Computer and communications security, CCS '11, pages 639-652, New
York, NY, USA, 2011. ACM.
[14] Patrick Gage Kelley, Sunny Consolvo, Lorrie Faith Cranor, Jaeyeon Jung, Norman Sadeh, and David Wetherall. A conundrum of permissions: Installing applications on an Android smartphone. In Proceedings of the Workshop on Usable
Security (USEC), February 2012.
[15] Jennifer King. "How come I'm allowing strangers to go through my phone?": Smart
phones and privacy expectations. Preprint, 2012.
[16] MIT Media Lab. funf - open sensing framework, 2011. [Online; accessed 5-Jun-2012].
[17] Jialiu Lin, Shahriyar Amini, Jason I. Hong, Norman Sadeh, Janne Lindqvist,
and Joy Zhang. Expectation and purpose: Understanding users' mental models of mobile app privacy through crowdsourcing. In Proceedings of the 14th
International conference on Ubiquitous computing. ACM, September 2012.
[18] Erika McCallister, Tim Grance, and Karen Scarfone. Guide to protecting the
confidentiality of personally identifiable information (PII) - recommendations of
the national institute of standards and technology. Technical Report NIST Special Publication 800-122, National Institute of Standards and Technology, April
2010.
[19] Mohammad Nauman, Sohail Khan, and Xinwen Zhang. Apex: extending android permission model and enforcement with user-defined runtime constraints.
In Proceedings of the 5th ACM Symposium on Information, Computer and Communications
Security, ASIACCS '10, pages 328-332, New York, NY, USA, 2010.
ACM.
[20] Scott Thurm and Yukari Iwatani Kane. Your apps are watching you, December
2010. [Online; accessed 10-Jan-2011].
[21] Aydan R. Yumerefendi, Benjamin Mickle, and Landon P. Cox. Tightlip: keeping
applications from spilling the beans. In Proceedings of the 4th USENIX conference on Networked systems design & implementation, NSDI'07, pages
12-12, Berkeley, CA, USA, 2007. USENIX Association.
[22] Yajin Zhou, Xinwen Zhang, Xuxian Jiang, and Vincent W. Freeh. Taming
information-stealing smartphone applications (on android). In Proceedings of
the 4th international conference on Trust and trustworthy computing, TRUST'11,
pages 93-107, Berlin, Heidelberg, 2011. Springer-Verlag.