CS846 Final Report: Semantic Code Reduction for Android SystemAPIs
Dheeraj Vagavolu
d2vagavo@uwaterloo.ca
Abstract
Finding vulnerabilities in Android System APIs is a well-researched topic [1, 7]. Due to the best-effort nature of the
access control measures, there can be more than one way to
ensure that the APIs are secure. Hence, existing approaches
that use inconsistency in similar APIs as a metric to determine vulnerabilities are not enough. Our thesis aims to use a
machine learning approach to learn from the code patterns
and predict the access control requirements. In order to do
this, we first need to collect appropriate features from the
source code, such as token names, types of variables and
unique method calls. Android System APIs often contain
hundreds or thousands of lines of code, making the extraction of such valuable features difficult. Moreover, the source
code often contains unnecessary code, such as sanity checks and
logging behaviour. Such code is not essential in determining
potential vulnerabilities. Our project aims to reduce these
System APIs based on observed patterns from the source
code, hoping to perform better in the vulnerability prediction
phase based on the extracted features. The two main features
we look at are Input Validations and Error Validations. On average, we show that Android System APIs can be reduced in size by 12%, and we evaluate our reduction approach using a naive Machine Learning model.
1 Introduction
The Android Application framework is rich in services that
provide users with access to hardware resources and ways
of interacting with other sensitive data [3]. These include
access to WIFI, location tracking, camera and many more
such features. OEMs constantly modify the Android Open
Source framework and introduce new features specific to
their devices. Such services are easily interchangeable by
vendors across different devices or even different android
versions [3]. These are known as System Services, and the
methods they expose are known as SystemAPIs.
The Android Framework plays a crucial role in protecting
private user data and the device’s resources. It does this by
following several access-control models such as PID checks,
UID checks, and permission checks [1]. Different Android
framework APIs are protected by different permission levels
such as normal, dangerous and system. Apps must declare the corresponding permissions in their manifest files and ask users to grant them either at install time or at run time in order to access the protected resources. The access-control model for Android permissions is decentralized, which means that it is each service's responsibility to protect the resources it exposes by enforcing the necessary permissions.
My thesis aims to predict vulnerable System APIs in the
Android Framework using a Machine Learning approach.
Some of the features we can use for this task are Natural
language-based features such as method calls and variable
names, and structural features such as method-level ASTs and CFGs. However, the System APIs in Android often contain hundreds or thousands of lines of code, making the extraction of valuable features difficult. Many of these code statements are not necessary and are present only for sanity checks and logging purposes. Some statements might be necessary for the execution but not for determining the vulnerability of the API. As the project for this course, we propose
"Semantic" code reduction from System APIs using static
analysis and machine learning.
Feature extraction is an essential part of the entire pipeline
as we extract the valuable information from the SystemAPIs
[12]. We know that a Natural language or text-based feature
space has very high dimensionality. The existence of unwanted information in the feature space results in poor performance and scalability of artificial neural networks when
used for text classification tasks [5]. In the literature, we have
seen that several works have focused on feature reduction
to improve the precision and recall of neural network-based
classification models [4, 13, 14].
To directly translate feature reduction from natural language classification to code classification, we need to understand patterns in the text that are specific to the source
code. In essence, we need to understand patterns based on
domain knowledge that can be used to reduce unwanted
code in the android framework. In this work, we focus on
the Input Validations and Error Validations as the target patterns and develop a tool to remove such code patterns from
the SystemAPIs. We evaluate our reduction approach using a naive machine learning-based classifier and report our
findings. We will look at the code patterns in depth in the
following sections.
2 Detailed Problem Statement
Inconsistent enforcement of permissions in third-party modifications can result in some unacceptable scenarios. For example, an application might be granted numerous system-level permissions to access a simple Date and Time API.
Sometimes, the application might not even require any permissions for accessing critical resources that can put the
users at risk.
In the literature, researchers have studied the permission model in the framework; however, very few works have studied the correctness of the policy enforcement in the Android framework. Several challenges need to be solved to study Android
policy enforcement. Some of them include discovering the
protected resources in a SystemAPI and understanding the
various security checks in the system service. Another challenge is the Inter-process Communication (IPC) through the
Binder mechanism and remote procedure calls, which allow
calls to go beyond the expected boundary of the SystemAPI.
Researchers have come up with several static analysis approaches, such as Kratos [15] and Acedroid [1], which leverage consistency analysis to find vulnerabilities. These aim
to find any inconsistent policy enforcement within similar
system services or services with the same functionality. However, these require lists of manually defined security checks
and protected resources that are based on observation. On
the other hand, ACminer [7] is a semi-automatic approach
that does not require security checks or protected resources
as an input as it uses regular expressions to discover potential security checks. However, it introduces heuristics
to reduce false positives. Consistency analysis shows that
even academics and professional developers can introduce
exploitable bugs, which lead to security issues. Hence it is
necessary to study policy enforcement.
The basic approach in a consistency analysis study is to
find and group SystemAPIs that access similar protected resources. The next step is to determine the similarity of the
access control checks in the path leading to the protected
resource in such SystemAPIs. If similar access control checks
are not used, an inconsistency in the access control is determined and flagged to the developer. Figure 1 shows the
intuition behind consistency analysis for two SystemAPIs.
2.1 Limitations of Consistency Analysis
Inconsistent policy enforcement does not always mean that a
SystemAPI is exploitable. For consistency analysis, one must
look at a pair of SystemAPIs with similar functionality or
ones that access the same protected resource. However, such
conditions introduce false positives as discovering protected
resources is still an open challenge in the android framework.
Several works in the Android application layer have utilized
neural networks to find potential vulnerabilities. In our work,
we aim to use a machine learning-based approach to learn
patterns from the source code of SystemAPIs, which can
directly be used to predict whether a SystemAPI requires
protection. This would avoid the overhead caused by static
analysis and avoid the need to find pairs of SystemAPIs
with similar functionality. Hence, the entire process can
be broken down into three steps: 1) feature extraction, 2) labelling the data, and 3) training and testing ML models. In our current work, we focus on Step 1, i.e., feature extraction, and more specifically on feature reduction.

Figure 1. Consistency Analysis Intuition
2.2 Code Reduction Problem
We know that Android has a decentralized and diverse access
control model. The access control checks range from direct
permission checks to indirect checks involving the device
state or the application status. Sometimes, the Android APIs
are also protected by conditional branches that check the
method arguments’ correctness. Such checks are known
as input validations. While detecting access control checks is straightforward based on prior knowledge of the diverse nature of access checks, detecting input validations is challenging. Input validations are simple conditional checks that verify the correctness of the input method parameters; the structural similarity between input validations and ordinary conditionals can be seen in Listings 1 and 2.
Listing 1. Input Validation
void method(param) {
    if (param != NULL) {
        // Do something
    } else {
        // Exit
    }
}
Listing 2. Not an Input Validation
void method(param) {
    if (param < 5) {
        // Do something
    } // Do something else
}
For our problem, the unnecessary code is present in the
failure block of both the access control and input validation
checks. When we look at the Android framework using a
decompiler, we can see that such failure blocks can be identified based on specific heuristics involving Log statements
and Termination conditions. Overall, we aim to look at prior
works and the android framework and develop concrete
heuristics which can be used to identify and remove the unnecessary code statements from the Android System APIs.
In the following sections, we will look at both access control
validations and input validations in detail.
3 Background
This section will introduce some core concepts and terminology used in the report.
3.1 SystemAPIs
Several system services can be found in the Android framework that enhance its capabilities. Such system services are built as extensions to the Android framework and are constantly added by vendors. Unlike framework services, system services are easily interchangeable and vary from vendor to vendor.
Usually, such services are meant to be used by the Android OS. However, most of the time, vendors expose specific APIs through well-defined interfaces that are accessible to other services through the Binder IPC [1]. In such cases, those APIs need to impose solid access control checks. Weak access control in these exposed System
APIs could lead to vulnerabilities and exploits by malicious
entities.
We observe that the permission checking in such APIs has
to be enforced by the developers of those APIs. Android has
specific guidelines for permission enforcement which the
framework services follow in most cases. However, the same
cannot be guaranteed for the System services.
3.2 Android Permission Mapping
The Android framework protects different resources such as hardware resources, private user data, and other critical device information. To access them, each application must hold the corresponding permission. Permissions are simply strings associated with each application's UID.
Different permission levels protect different types of resources. For example, if there is very little or no risk to the user's privacy, Normal permissions are used. These permissions are granted at install time. Signature permissions are also granted at install time by the Android framework; however, the application asking for a signature permission must be signed with the same signature as the app that defines that permission.
Dangerous permissions protect critical resources that can
be used to identify or harm the user somehow. For example,
if one wants to read contacts from the phone or access the
file storage system, one must have the required dangerous
permissions declared by the android framework. Dangerous
permissions are run-time permissions; that is, the user is
prompted to grant the permission at run-time by showing
an alert or a dialogue.
Android lacks a central permission-checking mechanism and relies on each framework service that exposes sensitive resources to enforce the necessary permission checks. Moreover,
like framework services, system services are responsible for
permission enforcement for all the sensitive resources they
expose. The android framework does not require a specific
format or location for permission checking in these system
services, making them easily exchangeable by various vendors.
4 Methodology
4.1 Removable Features
The first step is to look at a decompiled Android ROM and
develop the features that can be safely removed to improve
or at least not affect the model’s performance. Based on prior
work and manual inspection, we consider the following
features for reduction.
Input Validations. Input validations are used as a sanity
check to discard ill-formed or malicious inputs to the SystemAPIs. The first requirement of input validation is that the
input must be propagated to a comparison statement through
data flow and compared against some pre-configured static
values. Based on the comparison, the API decides which
action to take.
Access Control Validations. Similar to input validations,
access control checks are standard in SystemAPIs, and they
filter out any calls to the API based on some pre-defined access-defining parameters. These can range from straightforward permission checks to application status checks. Many works in the literature identify and map access control validations. The mapping of the access control blocks is a challenge
because of the best-effort nature of the access control enforcement and a lack of central policy.
Failure Branches. The structural characteristics of input
validations are inherently different from general branching
statements. Specifically, an input validation not only compares the input with other data but also terminates its normal
execution immediately when the validation fails [17]. We
extend the exact characterization to detect access control
failure blocks.
4.2 Why Input Validations and Error Validations
Input validations and access control validations are strongly
embedded in almost every SystemAPI because of the decentralized access control architecture of Android. Using such
validations, the framework developers enforce the necessary
security checks.

Figure 2. Static Analysis using WALA to extract features and generate labels from Android SystemAPIs

However, from the perspective of a prediction model whose goal is to infer the access control requirements from the body of the code, such validations do not need to be present in the body.
Zhang et al. [17] aim to discover and predict sensitive input validations using natural language models. We build on their techniques, extend them to be more general, and discover input validation and access control failure blocks.
4.3 Heuristics for detection
Our work for detecting validations can be broken down into
two major steps:
Detecting Conditional Statements. We use DefUse analysis and Access Control Detection Logic to detect input validation and access control conditions. Using DefUse analysis,
we aim to track the input arguments from the methods to
the conditional statements, and similarly, we track the access
control statements using prior knowledge.
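To illustrate the idea, the sketch below shows a simplified version of this tracking in Python; it is not our actual WALA implementation, and the Stmt record, the value numbers, and the helper name are hypothetical. The parameter's values are propagated through the def-use sets of the statements until a fixed point, and any conditional that uses a propagated value is flagged as a candidate validation.

from dataclasses import dataclass, field

@dataclass
class Stmt:
    """A simplified SSA-style statement: which values it defines and uses."""
    kind: str                      # e.g. "assign", "conditional", "call"
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)

def find_dependent_conditionals(stmts, param_values):
    """Return indices of conditionals whose condition depends on a parameter.

    Values derived (directly or transitively) from the method parameters
    are propagated with a fixed-point loop over the def/use sets.
    """
    tainted = set(param_values)
    changed = True
    while changed:                               # fixed point over data flow
        changed = False
        for s in stmts:
            if s.uses & tainted and not s.defs <= tainted:
                tainted |= s.defs
                changed = True
    return [i for i, s in enumerate(stmts)
            if s.kind == "conditional" and s.uses & tainted]

# Toy example: v1 is the method parameter, v2 is derived from v1.
stmts = [
    Stmt("assign", defs={2}, uses={1}),          # v2 <- f(v1)
    Stmt("conditional", uses={2}),               # if (v2 < 5)
    Stmt("conditional", uses={3}),               # unrelated conditional
]
print(find_dependent_conditionals(stmts, param_values={1}))  # -> [1]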
Detecting Failure Branch. Our primary focus is the failure
branch of the validation checks. For example, instead of looking at Listing 3, we need to find the code inside the failure branch of the validation, as shown in Listing 4. However, note that the failure and success branches cannot be differentiated by statically evaluating the expressions. For example, it is difficult for us to know whether a parameter is supposed to be NULL or not; in Figure 4, the input parameter
provider is supposed to be null in order to access some of the
functionality of the SystemAPI.
Listing 3. Success Branch of Input Validation
void method(param) {
    if (param != NULL) {
        // Main code
        ...
    }
}
Listing 4. Failure Branch of Input Validation
void method(param) {
    if (param == NULL) {
        // should not reach here
    }
    ...
}
Figure 3. Tracking Arguments to the Validations Conditions
Figure 4. Simple NULL check is not enough to Detect Failure Branches of Validations
However, based on manual inspection and previous works, we know that input validations can be tracked using heuristics such as termination conditions in the form of security exceptions and return statements. For our work, we use such heuristics to detect the failure blocks of both input validations and access control validations. Specifically, we use a combination of exceptions, return statements and Log/Debug statements to detect the failure branches of validations (Table 1).
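As an illustration, the sketch below shows how such heuristics could be applied if the statements of a branch were available as decompiled text; the regular expressions and the helper name are our own illustrative choices, and the real pipeline operates on WALA IR instructions rather than strings.

import re

# Heuristic markers from Table 1 (assumed textual forms of decompiled code).
EXCEPTION_PATTERN = re.compile(r"throw\s+new\s+\w*(Exception|Error)")
RETURN_PATTERN = re.compile(r"return\s+(null|0|false|-1)?\s*;")
LOG_PATTERN = re.compile(r"\b(Log|Slog)\.\w+\(")

def is_failure_branch(branch_statements):
    """Heuristically decide whether a branch is the failure branch of a
    validation: it terminates early (exception or primitive return) and/or
    only logs an error before exiting."""
    terminates = any(EXCEPTION_PATTERN.search(s) or RETURN_PATTERN.search(s)
                     for s in branch_statements)
    logs = any(LOG_PATTERN.search(s) for s in branch_statements)
    return terminates or (logs and len(branch_statements) <= 2)

# Hypothetical decompiled branch of a SystemAPI validation check.
branch = [
    'Slog.e("LocationManager", "provider must not be null");',
    'throw new SecurityException("missing permission");',
]
print(is_failure_branch(branch))  # -> True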
4.4 Static Analysis using WALA
We rely on static analysis to analyze and extract features from System Services. Static analysis allows us to examine the desired source code without executing it, in contrast to dynamic analysis [9]. Another advantage that static analysis provides over dynamic analysis is higher code coverage. Static analysis tools such as Soot [8] or WALA are usually used to generate an abstract model of the source code in terms of call graphs (CGs) or control flow graphs (CFGs). This abstract model can then be queried for variable names or used to perform taint analysis or data-flow analysis to obtain more in-depth information.

Table 1. Heuristics used to detect the failure branches of validations

Heuristic         | Usage                                                              | Examples
Exceptions        | Look for the presence of exceptions                                | InputValidation exceptions, security exceptions
Return statements | Look for return statements which return primal types               | return null; return 0;
Log statements    | Look for classes derived from Log and the method's string argument | Log.log("Error")
We have observed that many works in the literature have used static analysis tools to examine the Android framework and applications [9]. We use these works as motivation to use the static analysis tool WALA for analyzing SystemAPIs.
Class Hierarchy Analysis. One of the simplest forms of
analysis that one can perform without generating computationally costly call graphs is the Class Hierarchy Analysis (CHA). Using Class Hierarchy Analysis, one can iterate
through all the classes from the Android framework and
the instructions they contain. Instruction types, targets of
invoking instructions, and field names are specific features
that can be examined using Class Hierarchy Analysis.
Call Graph generation. To gain more in-depth knowledge about the methods in Android services, such as the variable names in various field-access instructions, we need to build the call graph. The call graph begins at a specific method and then recursively maps all the methods called in each method's body. This results in a directed graph in which each node is a caller/callee method and is connected to its callee methods through edges. Further, each node in a call graph has its own symbol table, which can be queried for method, parameter, and variable names from the instructions. In this work, we only focus on variable names and method names from field-access and invoke instructions, and hence a call graph suffices. For extracting more complicated features, such as the code structure, we would need to build the control flow graph, which allows data-dependency analysis across methods.
Entry Point Detection. Unlike usual object-oriented programs, Android services and applications do not have a fixed main method that can be used as an entry point for call graph generation. Hence, researchers generally need to create a list of their own entry points depending on the context of the static analysis. In our work, we need to look at SystemAPIs in the Android framework, so we must select entry points that allow our call graph to include all the necessary SystemAPIs. The simplest way to do so is to generate a call graph with all the necessary SystemAPIs as the entry points.
DefUse Analysis. Using WALA, we track the definitions and
usages of variables across the control flow. We do this in
order to cover different cases, as shown in Figure 3, where
the dependency of the argument is being tracked.
Dominator Analysis. Using static analysis, exploring conditional blocks is not straightforward: it is hard to determine the statements inside the conditionals with an imprecise control flow graph. In this project, we use dominator analysis to retrieve the target code from inside the failure branches of the validation statements. First, we construct a dominator tree from the control flow graph using a class provided by WALA. In a dominator tree, a node a is said to dominate another node b if every path from the entry to b in the control flow graph passes through a. Using this tree, we look for all the statements that are dominated by the first instruction just after the validation statement. This gives us the required instructions from inside the conditional block. However, to use this trick, we must assume that there is at least one statement inside the conditional block.
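The sketch below illustrates the dominator-based extraction on a toy control flow graph; it uses Python and the networkx library instead of WALA's dominator class, and the node names are purely illustrative.

import networkx as nx

# Toy control flow graph of a method: entry -> validation check -> either the
# failure branch (log + throw) or the normal code (all names illustrative).
cfg = nx.DiGraph()
cfg.add_edges_from([
    ("entry", "check"),          # if (param == null)
    ("check", "fail_log"),       # failure branch starts here
    ("fail_log", "fail_throw"),
    ("check", "main_code"),
    ("fail_throw", "exit"),
    ("main_code", "exit"),
])

def failure_branch_nodes(cfg, entry, branch_head):
    """Return all nodes dominated by the first node of the failure branch.

    A node n is dominated by branch_head if every path from entry to n goes
    through branch_head; walking up the immediate-dominator tree from n
    checks exactly that.
    """
    idom = nx.immediate_dominators(cfg, entry)
    dominated = set()
    for node in cfg.nodes:
        walk = node
        while walk != entry:
            if walk == branch_head:
                dominated.add(node)
                break
            walk = idom[walk]
    return dominated

print(failure_branch_nodes(cfg, "entry", "fail_log"))  # {'fail_log', 'fail_throw'}

Any node dominated by the head of the failure branch can only be reached through that branch, so it is safe to treat it as part of the removable failure block.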
4.5 Overall System Design
To summarise, Figure 6 encapsulates the entire process starting from the ROM and generating reduced code snippets.
We can group the code into the following two core modules:
Detector Module. This module is responsible for detecting
the required SystemAPIs in the Android framework. It does
so by using Class Hierarchy Analysis and prior knowledge of the interaction between the Binder and the SystemAPIs.
Control Flow Analysis. This module does most of the work
in detecting input and access control validations. This module wraps the DefUse analysis, Heuristic-based validation
detection and Dominator Analysis and generates reduced
SystemAPIs.
4.6 ML based evaluation
To evaluate the reduction pipeline, we develop a separate ML pipeline using word embeddings, as shown in Figure 5. The pipeline for both training and testing the model is the same: we generate a document embedding for each SystemAPI using Word2vec and then feed it, along with its label, to a classifier. For testing, we generate the embeddings of the test data and use the trained classifier to predict the labels. These predicted labels can then be compared to the actual labels to compute the scores.
Figure 5. Machine Learning pipeline for a Binary Classifier
For our evaluation, we collect data from the decompiled ROM of a Xiaomi Redmi K20 running Android version 9. Using static analysis, we collect data from 644 SystemAPIs from a total of 21 service classes. Of these, 149 SystemAPIs were labelled as protected by the static analysis step, constituting around 23% of the total data points. These values are summarized in Table 2. Further, for the training and evaluation of the model, we use a standard 80%-20% train-test split. The entire process is repeated for the reduced version of the SystemAPIs.
Naive ML approach. By treating AOSP and its permission
mapping as the ground truth, we take the extracted features from the previous step and create labelled data for the
model's training. We start with basic binary labelling, that is, whether the API requires protection or not. In this part, we
pre-process the features to a viable format as input to the
machine learning model. The pre-processing of the features
involves steps such as feature scaling and feature selection
based on manual analysis. The last part feeds the training
data into a simple supervised machine learning model and
analyzes the results.
Pre-processing. Pre-processing of the data is essential in
any machine learning pipeline. For each SystemAPI, we prepare a column with comma-separated words collected from
the instructions’ variable names and method names. We call
these collections of words the token set of that SystemAPI.
Next, we perform a camelCase split of the words based
on our observation. For example, getCallingUID is broken
into ‘get’, ‘Calling’, and ‘Uid’. This ensures that the tokens
can be represented as a combination of well-known words.
For example, it is not straightforward for a model to know
that getTemporaryAddress and getPermanentAddress are related unless they are broken down into their constituent camelCase words. It becomes simpler for a model to understand their similarity based on the common words ‘get’ and ‘Address’.

Table 2. Statistics from the static analysis of the Android framework (Xiaomi Redmi K20 with Android version 9)

Total System Services       | 66
Total SystemAPIs            | 1463
Access Validations detected | 77
Input Validations detected  | 494
Total time taken            | 7h 11m
Instructions Removed        | 12%
To ensure generalizability, we use a stemmer to reduce all the words to their base form; for example, the word calling is reduced to the word call. We use the PorterStemmer class from the gensim library in Python to achieve this. Further, a common practice is to remove stop words such as ‘a’, ‘is’ and ‘the’, along with other words that are very short and do not make sense on their own. In our case, we keep only those words that are at least three characters in length.
At the end of the pre-processing stage, we have a collection
of processed tokens and the labels for each SystemAPI.
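A minimal sketch of this pre-processing stage is shown below; it assumes gensim's PorterStemmer, and the camelCase-splitting regular expression is an illustrative choice rather than the exact one used in our pipeline.

import re
from gensim.parsing.porter import PorterStemmer

stemmer = PorterStemmer()

def preprocess(token_set):
    """camelCase-split, lowercase, stem, and drop tokens shorter than 3 chars."""
    words = []
    for token in token_set:
        # "getCallingUID" -> ["get", "Calling", "UID"]
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)
        words.extend(parts)
    stemmed = [stemmer.stem(w.lower()) for w in words]
    return [w for w in stemmed if len(w) >= 3]

print(preprocess(["getCallingUID", "mProviderName"]))
# e.g. ['get', 'call', 'uid', 'provid', 'name']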
Word embedding model. Based on our current features
extracted from the SystemAPIs, we can model our machine learning pipeline as a text classification problem. We observe from the literature that word-embedding models go hand in hand with NLP-based tasks such as sentiment analysis [2] and text classification [10]. Most of the time,
word-embedding models are decoupled from the actual task and used to generate vectors for text. These vectors can then be used for various tasks separately.

Figure 6. Overall System Design

GloVe [11] and Word2vec
[6] are the most popular word-embedding models used by researchers. Word2vec focuses on the relationship of the words,
whereas GloVe focuses on the probability of co-occurrence
of the words. Word2vec allows for more modifications and
generalizes well on datasets with large vocabularies; hence
in our work, we use the Word2vec model for generating
word-embeddings.
The processed tokens are fed to the Word2vec model to
generate the vectors. The output from the Word2vec model is
one vector per word. Consequently, we end up with multiple
vectors for each SystemAPI. These word vectors can be combined in several ways. The most common approach is to take
the mean of all the vectors. However, more sophisticated approaches also exist, such as taking the weighted sum based
on the word frequency, called the tf-idf approach. Currently,
we take the mean of the vectors for generating the combined
vector, also known as the document vector. These document
embeddings can then be used to train models for various
tasks.
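The sketch below shows this embedding step under the assumption of gensim 4.x; the hyper-parameter values are illustrative, and the document vector is simply the mean of the word vectors as described above.

import numpy as np
from gensim.models import Word2Vec

def document_vectors(token_lists, vector_size=100):
    """Train Word2vec on the token sets and average word vectors per API."""
    model = Word2Vec(sentences=token_lists, vector_size=vector_size,
                     window=5, min_count=1, workers=4, seed=42)
    docs = []
    for tokens in token_lists:
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        docs.append(np.mean(vecs, axis=0) if vecs else np.zeros(vector_size))
    return np.vstack(docs)

# One token list per SystemAPI (illustrative tokens).
apis = [["get", "call", "uid", "permiss", "check"],
        ["get", "wifi", "state", "return"]]
X = document_vectors(apis)
print(X.shape)  # (2, 100)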
Model. In this project, we look at three well-known binary
classifiers for the task of text classification. They are Support
Vector Machines (SVM), Decision Tree and Random Forest
methods. We use the popular machine learning library scikit-learn for developing the machine learning model. We use the RandomForest model, as it is an ensemble method and has performed better than SVMs and other classifiers on several occasions [16].
The results from the RandomForest classifier show an accuracy of 73% averaged over 20 trials. Moreover, the precision and recall of either class are similar, at around 70%. This means that our model has learned to identify both protected and unprotected APIs with almost identical performance. Hence, we use the RandomForest model for evaluating both the Normal and Reduced SystemAPIs.
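A sketch of the corresponding training and evaluation step with scikit-learn is shown below; it assumes the document vectors X and binary labels y produced by the previous steps, and mirrors the 80%-20% split and the averaging over 20 trials described above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate(X, y, trials=20):
    """Average RandomForest accuracy over several random 80/20 splits."""
    scores = []
    for seed in range(trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed, stratify=y)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, clf.predict(X_te)))
    return np.mean(scores)

# X: document embeddings of (normal or reduced) SystemAPIs, y: 1 = protected.
# print(evaluate(X, y))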
                   | Accuracy
Normal SystemAPIs  | 72%
Reduced SystemAPIs | 74%

Table 3. Accuracy of the RandomForest classifier with Reduced vs. Normal SystemAPIs
Results. From Table 2, we see that on average 12% of the code is removed from the SystemAPIs. However, from Table 3, we can see that the overall accuracy of the RandomForest classifier has not shown much improvement. This is a reasonable result, as there is still work to be done towards representing code and building a more complex machine learning model, as discussed in Section 5.
5 Limitations
In this section, we discuss some of the most significant limitations, which we intend to address in future work.
5.1 Naive Model
We use a naive Decision Tree model to evaluate code reduction within this project's scope. However, it is possible that the token-based representation of the code and the simple nature of the model cannot capture the improvement from code reduction. A more powerful neural network-based model might be better suited for evaluating code reduction.
5.2 Dominator Analysis
The more intuitive approach is to use a control flow analysis
based on Control Flow Graphs to capture the inner statements in an identified control flow block. However, we used
the Dominator Analysis for this task due to technical barriers
during implementation. Although we have not performed
any comparison as to which might be better, intuitively,
Dominator Analysis reduces the efficiency of the process.
5.3 Correctness of Heuristics
Formally, we are not able to prove or argue about the completeness of the heuristics used. Hence, there might still be
some cases that are missed by our heuristics and lead to false negatives. We can argue that the heuristics based on termination conditions leave very little room for false positives.
6 Discussion and Conclusion
In this project, we answer the following research questions:
What type of code features can be reduced? In this work,
we start with reducing failure branches of input validations
and access control validations. In the future, we intend to look at more Java-specific features such as method decorators or variable types.
What percentage of code does not contribute to the
access control requirements? We find that, on average,
12% of the code is not essential to the functionality of the
SystemAPIs. However, we have not looked at the distribution
of the value. In some cases, we could remove 70% of the code
where the call graph was quite complicated.
Does the performance of a naive access control classification model improve? According to our results, the
performance boost is insignificant for the effort required to
remove the validations. However, we think that using a naive
model without further work on the source code representation is not a good way to evaluate the reduction of unnecessary
features.
Overall, we developed a tool using WALA that can automatically detect and remove failure blocks in the form of input and access control validations, reducing the code by 12% on average. The tool can be easily extended to detect other
patterns due to its modular nature. As part of future work,
more work needs to be put into recognizing unnecessary
features and developing a complex machine learning model
that can use the code’s inherent structural nature.
7 Source Code
The source code can be found in a private repository on GitLab at https://git.uwaterloo.ca/d2vagavo/cs846_sourcecode_dheerajvagavolu. (An invite has been sent to the username cnsun at UW GitLab.)
References
[1] Aafer, Y., Huang, J., Sun, Y., Zhang, X., Li, N., and Tian, C. Acedroid:
Normalizing diverse android access control checks for inconsistency
detection. In NDSS (2018).
[2] Acosta, J., Lamaute, N., Luo, M., Finkelstein, E., and Andreea, C.
Sentiment analysis of twitter messages using word2vec. Proceedings
of Student-Faculty Research Day, CSIS, Pace University 7 (2017), 1–7.
[3] Backes, M., Bugiel, S., Derr, E., McDaniel, P., Octeau, D., and Weisgerber, S. On demystifying the android application framework: Revisiting android permission specification analysis. In 25th USENIX Security Symposium (USENIX Security 16) (2016), pp. 1101–1118.
[4] Basha, S. R., Rani, J. K., and Yadav, J. P. A novel summarization-based approach for feature reduction, enhancing text classification
accuracy. Engineering, Technology & Applied Science Research 9, 6
(2019), 5001–5005.
[5] Brunzell, H., and Eriksson, J. Feature reduction for classification of
multidimensional data. Pattern Recognition 33, 10 (2000), 1741–1748.
[6] Church, K. W. Word2vec. Natural Language Engineering 23, 1 (2017),
155–162.
[7] Gorski, S. A., Andow, B., Nadkarni, A., Manandhar, S., Enck, W.,
Bodden, E., and Bartel, A. Acminer: Extraction and analysis of
authorization checks in android’s middleware. In Proceedings of the
Ninth ACM Conference on Data and Application Security and Privacy
(2019), pp. 25–36.
[8] Lam, P., Bodden, E., Lhoták, O., and Hendren, L. The soot framework
for java program analysis: a retrospective. In Cetus Users and Compiler
Infastructure Workshop (CETUS 2011) (2011), vol. 15.
[9] Li, L., Bissyandé, T. F., Papadakis, M., Rasthofer, S., Bartel, A.,
Octeau, D., Klein, J., and Traon, L. Static analysis of android apps:
A systematic literature review. Information and Software Technology
88 (2017), 67–95.
[10] Lilleberg, J., Zhu, Y., and Zhang, Y. Support vector machines and
word2vec for text classification with semantic features. In 2015 IEEE
14th International Conference on Cognitive Informatics & Cognitive
Computing (ICCI* CC) (2015), IEEE, pp. 136–140.
[11] Pennington, J., Socher, R., and Manning, C. D. Glove: Global
vectors for word representation. In Proceedings of the 2014 conference
on empirical methods in natural language processing (EMNLP) (2014),
pp. 1532–1543.
[12] Pham, T.-N., Nguyen, V.-Q., Tran, V.-H., Nguyen, T.-T., and Ha, Q.-T.
A semi-supervised multi-label classification framework with feature
reduction and enrichment. Journal of Information and Telecommunication 1, 2 (2017), 141–154.
[13] Phinyomark, A., Phukpattaranont, P., and Limsakul, C. Feature
reduction and selection for emg signal classification. Expert systems
with applications 39, 8 (2012), 7420–7431.
[14] Raychaudhuri, S., Sutphin, P. D., Chang, J. T., and Altman, R. B.
Basic microarray analysis: grouping and feature reduction. TRENDS
in Biotechnology 19, 5 (2001), 189–193.
[15] Shao, Y., Chen, Q. A., Mao, Z. M., Ott, J., and Qian, Z. Kratos:
Discovering inconsistent security policy enforcement in the android
framework. In NDSS (2016).
[16] Valecha, H., Varma, A., Khare, I., Sachdeva, A., and Goyal, M.
Prediction of consumer behaviour using random forest algorithm.
In 2018 5th IEEE Uttar Pradesh Section International Conference on
Electrical, Electronics and Computer Engineering (UPCON) (2018), IEEE,
pp. 1–6.
[17] Zhang, L., Yang, Z., He, Y., Zhang, Z., Qian, Z., Hong, G., Zhang,
Y., and Yang, M. Invetter: Locating insecure input validations in
android services. In Proceedings of the 2018 ACM SIGSAC Conference
on Computer and Communications Security (2018), pp. 1165–1178.