CS846 Final Report: Semantic Code Reduction for Android SystemAPIs

Dheeraj Vagavolu
d2vagavo@uwaterloo.ca

Abstract
Finding vulnerabilities in Android System APIs is a well-researched topic [1, 7]. Due to the best-practice nature of the access control measures, there can be more than one way to ensure that the APIs are secure. Hence, existing approaches that use inconsistency between similar APIs as a metric to determine vulnerabilities are not enough. Our thesis aims to use a machine learning approach to learn from code patterns and predict access control requirements. To do this, we first need to collect appropriate features from the source code, such as token names, variable types and unique method calls. Android System APIs often contain hundreds or thousands of lines of code, making the extraction of such valuable features difficult. Moreover, the source code often contains unnecessary code in sanity checks and logging behaviour. Such code is not essential in determining potential vulnerabilities. Our project aims to reduce these System APIs based on patterns observed in the source code, hoping to perform better in the vulnerability prediction phase based on the extracted features. The two main features we look at are Input Validations and Error Validations. On average, we show that Android System APIs can be reduced by 12% of their size, and we evaluate our reduction approach using a naive machine learning model.

1 Introduction
The Android Application framework is rich in services that provide users with access to hardware resources and ways of interacting with other sensitive data [3]. These include access to WiFi, location tracking, the camera and many more such features. OEMs constantly modify the Android Open Source framework and introduce new features specific to their devices. Such services are easily interchangeable by vendors across different devices or even different Android versions [3].
These are known as System Services, and the methods they expose are known as SystemAPIs. The Android framework plays a crucial role in protecting private user data and the device's resources. It does this by following several access-control models such as PID checks, UID checks, and permission checks [1]. Different Android framework APIs are protected by different permission levels such as normal, dangerous and system. Apps must declare the corresponding permissions in their manifest files and ask users to authorize them either at install time or at run time. The access-control model for Android permissions is decentralized, which means that it is each service's responsibility to protect the resources it exposes by enforcing the necessary permissions. Our thesis aims to predict vulnerable SystemAPIs in the Android framework using a machine learning approach. Some of the features we can use for this task are natural-language-based features, such as method calls and variable names, and structural features, such as method-level ASTs and CFGs. However, SystemAPIs in Android often contain hundreds or thousands of lines of code, making the extraction of valuable features difficult. Many code statements are not necessary and are present only for sanity checks and logging purposes. Some statements might be necessary for execution but not for determining the vulnerability of the API. As the project for this course, we propose "semantic" code reduction of SystemAPIs using static analysis and machine learning. Feature extraction is an essential part of the entire pipeline, as it is where we extract the valuable information from the SystemAPIs [12]. We know that a natural-language or text-based feature space has very high dimensionality. The existence of unwanted information in the feature space results in poor performance and scalability of artificial neural networks when used for text classification tasks [5].
In the literature, we have seen that several works have focused on feature reduction to improve the precision and recall of neural-network-based classification models [4, 13, 14]. To directly translate feature reduction from natural language classification to code classification, we need to understand patterns in the text which are specific to source code. In essence, we need to understand patterns, based on domain knowledge, that can be used to reduce unwanted code in the Android framework. In this work, we focus on Input Validations and Error Validations as the target patterns and develop a tool to remove such code patterns from the SystemAPIs. We evaluate our reduction approach using a naive machine-learning-based classifier and report our findings. We will look at the code patterns in depth in the following sections.

2 Detailed Problem Statement
Inconsistent enforcement of permissions in third-party modifications can result in some unacceptable scenarios. For example, an application might be granted numerous system-level permissions to access a simple Date and Time API.

Conference'17, July 2017, Washington, DC, USA

Sometimes, the application might not even require any permissions for accessing critical resources that can put users at risk. In the literature, researchers have studied the permission model in the framework; however, very few works have studied the correctness of policy enforcement in the Android framework. Several challenges need to be solved to study Android policy enforcement. Some of them include discovering the protected resources in a SystemAPI and understanding the various security checks in the system service. Another challenge is the Inter-Process Communication (IPC) through the Binder mechanism and remote procedure calls, which allow calls to go beyond the expected boundary of the SystemAPI.
Researchers have come up with several static analysis approaches, such as Kratos [15] and AceDroid [1], which leverage consistency analysis to find vulnerabilities. These aim to find any inconsistent policy enforcement within similar system services or services with the same functionality. However, they require manually defined lists of security checks and protected resources that are based on observation. On the other hand, ACMiner [7] is a semi-automatic approach that does not require security checks or protected resources as input, as it uses regular expressions to discover potential security checks. However, it introduces heuristics to reduce false positives. Consistency analysis shows that even academic and professional developers can introduce exploitable bugs, which lead to security issues. Hence, it is necessary to study policy enforcement. The basic approach in a consistency analysis study is to find and group SystemAPIs that access similar protected resources. The next step is to determine the similarity of the access control checks on the path leading to the protected resource in such SystemAPIs. If similar access control checks are not used, an inconsistency in the access control is determined and flagged to the developer. Figure 1 shows the intuition behind consistency analysis for two SystemAPIs.

2.1 Limitations of Consistency Analysis
Inconsistent policy enforcement does not always mean that a SystemAPI is exploitable. For consistency analysis, one must look at a pair of SystemAPIs with similar functionality or ones that access the same protected resource. However, such conditions introduce false positives, as discovering protected resources is still an open challenge in the Android framework. Several works in the Android application layer have utilized neural networks to find potential vulnerabilities.
In our work, we aim to use a machine-learning-based approach to learn patterns from the source code of SystemAPIs, which can directly be used to predict whether a SystemAPI requires protection. This would avoid the overhead caused by static analysis and avoid the need to find pairs of SystemAPIs with similar functionality. Hence, the entire process can be broken down into three steps: 1) Feature Extraction, 2) Labelling the Data, and 3) Training and Testing of ML models.

Figure 1. Consistency Analysis Intuition

In our current work, we focus on Step 1, i.e., feature extraction, and more specifically on feature reduction.

2.2 Code Reduction Problem
We know that Android has a decentralized, diverse access control model. The access control checks range from direct permission checks to indirect checks involving the device state or the application status. Sometimes, Android APIs are also protected by conditional branches that check the correctness of the method arguments. Such checks are known as input validations. While detecting access control checks is straightforward given prior knowledge of the diverse nature of access checks, detecting input validations is challenging. Input validations are simple conditional checks on the correctness of the input method parameters; their structural similarity to ordinary branches can be seen in Listings 1 and 2.

Listing 1. Input Validation
    void method(param) {
        if (param != NULL) {
            // Do something
        } else {
            // Exit
        }
    }

Listing 2. Not an Input Validation
    void method(param) {
        if (param < 5) {
            // Do something
        }
        // Do something else
    }

For our problem, the unnecessary code is present in the failure block of both the access control and input validation checks.
When we look at the Android framework using a decompiler, we can see that such failure blocks can be identified based on specific heuristics involving log statements and termination conditions. Overall, we aim to look at prior works and the Android framework and develop concrete heuristics which can be used to identify and remove unnecessary code statements from the Android SystemAPIs. In the following sections, we will look at both access control validations and input validations in detail.

3 Background
This section introduces some core concepts and terminology used in the report.

3.1 SystemAPIs
We can find several System services in the Android framework that enhance its capabilities. Such System services are built as extensions to the Android framework and are constantly added by vendors. Unlike framework services, System services are easily interchangeable and vary from vendor to vendor. Usually, such services are meant to be used by the Android OS. However, most of the time, vendors expose specific APIs through well-defined interfaces that are accessible to other services through the Binder IPC [1]. In such cases, there is a need for solid access control checks imposed by those APIs. Weak access control in these exposed SystemAPIs could lead to vulnerabilities and exploits by malicious entities. We observe that permission checking in such APIs has to be enforced by the developers of those APIs. Android has specific guidelines for permission enforcement, which the framework services follow in most cases. However, the same cannot be guaranteed for System services.

3.2 Android Permission Mapping
The Android framework protects different resources such as hardware resources, private user data, and other critical device information. To access them, each application must hold the corresponding permission. Permissions are nothing but strings that are associated with each application's UID.
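As a toy illustration of this string-based model (the UIDs and the mapping here are hypothetical; real Android resolves grants through the PackageManager), a decentralized permission check reduces to a set-membership test performed by each service:

```python
# Toy model of Android-style permission strings: each app UID maps to
# the set of permission strings it was granted. All values are invented
# for illustration; this is not the real Android grant table.
GRANTED = {
    10042: {"android.permission.INTERNET", "android.permission.CAMERA"},
    10043: {"android.permission.INTERNET"},
}

def check_permission(uid: int, permission: str) -> bool:
    """Return True if the calling UID holds the given permission string."""
    return permission in GRANTED.get(uid, set())

print(check_permission(10042, "android.permission.CAMERA"))  # True
print(check_permission(10043, "android.permission.CAMERA"))  # False
```

Because every service performs this lookup on its own, nothing forces two services that expose the same resource to demand the same permission string, which is precisely the inconsistency problem discussed earlier.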
Different permission levels protect different types of resources. For example, if there is very little or no risk to the user's privacy, Normal permissions are used. These permissions are granted at install time. Signature permissions are also granted at install time by the Android framework; however, the application asking for a signature permission must be signed with the same signature as the app that defines it. Dangerous permissions protect critical resources that can be used to identify or harm the user in some way. For example, if one wants to read contacts from the phone or access the file storage system, one must hold the required dangerous permissions declared by the Android framework. Dangerous permissions are run-time permissions; that is, the user is prompted to grant the permission at run time through an alert or a dialogue. Android lacks a central permission checking mechanism and relies on each framework service exposing sensitive resources to enforce the necessary permission checks. Moreover, like framework services, System services are responsible for permission enforcement for all the sensitive resources they expose. The Android framework does not require a specific format or location for permission checking in these System services, making them easily exchangeable by various vendors.

4 Methodology
4.1 Removable Features
The first step is to look at a decompiled Android ROM and identify the features that can be safely removed to improve, or at least not affect, the model's performance. Based on prior work and manual inspection, we consider the following features for reduction.

Input Validations. Input validations are used as a sanity check to discard ill-formed or malicious inputs to the SystemAPIs.
The first requirement of an input validation is that the input must be propagated to a comparison statement through data flow and compared against some pre-configured static values. Based on the comparison, the API decides which action to take.

Access Control Validations. Similar to input validations, access control checks are standard in SystemAPIs, and they filter out calls to the API based on some pre-defined access-defining parameters. These can range from straightforward permission checks to application status checks. Many works in the literature identify and map access control validations. Mapping the access control blocks is a challenge because of the best-effort nature of access control enforcement and the lack of a central policy.

Failure Branches. The structural characteristics of input validations are inherently different from general branching statements. Specifically, an input validation not only compares the input with other data but also terminates its normal execution immediately when the validation fails [17]. We extend the same characterization to detect access control failure blocks.

4.2 Why Input Validations and Error Validations
Input validations and access control validations are strongly embedded in almost every SystemAPI because of the decentralized access control architecture of Android. Using such validations, the framework developers enforce the necessary security checks.

Figure 2. Static Analysis using WALA to extract features and generate labels from Android SystemAPIs

However, from the perspective of a prediction model whose goal is to infer access control enforcement from the code's body, we deem that such validations do not need to be present in the body. Zhang et al. aim to discover and predict sensitive input validations using natural language models in their work.
We build on their techniques, extend them to be more generalized, and discover input validation and access control failure blocks.

4.3 Heuristics for Detection
Our work for detecting validations can be broken down into two major steps:

Detecting Conditional Statements. We use DefUse analysis and access control detection logic to detect input validation and access control conditions. Using DefUse analysis, we track the input arguments from the methods to the conditional statements, and similarly, we track the access control statements using prior knowledge.

Detecting the Failure Branch. Our primary focus is the failure branch of the validation checks. For example, instead of looking at Listing 3, we need to find the code inside the failure branch of the validation, as shown in Listing 4. However, note that the failure and success branches cannot be differentiated by statically evaluating the expressions. For example, it is difficult for us to know whether a parameter is supposed to be NULL or not. In Figure 4, the input parameter provider is supposed to be null in order to access some of the functionality of the SystemAPI.

Listing 3. Success Branch of Input Validation
    void method(param) {
        if (param != NULL) {
            // Main code
            ...
        }
    }

Listing 4. Failure Branch of Input Validation
    void method(param) {
        if (param == NULL) {
            // should not reach here
        }
        ...
    }

Figure 3. Tracking Arguments to the Validation Conditions
Figure 4. A simple NULL check is not enough to detect failure branches of validations

However, based on manual inspection and previous works, we know that input validations can be tracked using heuristics such as termination conditions, e.g., security exceptions and return statements. For our work, we use such heuristics to detect the failure blocks of both input validations and access control validations.
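The termination-condition heuristics can be sketched as a source-level filter. This is a simplified Python illustration only: our actual pipeline works on WALA IR rather than source text, and the exact pattern list here is an illustrative assumption, not the full set we use.

```python
import re

# Heuristic sketch: a conditional branch is flagged as a failure branch
# if it terminates quickly via an exception or a primitive-typed return,
# or merely logs an error. Pattern list is illustrative, not exhaustive.
TERMINATION_PATTERNS = [
    r"\bthrow\s+new\s+\w*(Security|IllegalArgument|IllegalState)\w*Exception\b",
    r"\breturn\s+(null|0|-1|false)\s*;",
    r"\bLog\.(e|w|wtf)\s*\(",
]

def is_failure_branch(branch_statements):
    """Return True if any statement matches a termination/log heuristic."""
    return any(
        re.search(pat, stmt)
        for stmt in branch_statements
        for pat in TERMINATION_PATTERNS
    )

print(is_failure_branch(['throw new SecurityException("no permission");']))  # True
print(is_failure_branch(["cache.put(key, value);"]))  # False
```

A branch flagged by this filter is treated as the failure block of the enclosing validation, and its statements become candidates for removal.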
Specifically, we use a combination of exceptions, return statements and log/debug statements to detect failure validation branches.

4.4 Static Analysis using WALA
We rely on static analysis to analyze and extract features from System Services. Static analysis allows us to examine the desired source code without executing it, in contrast to dynamic analysis [9]. Another advantage that static analysis provides over dynamic analysis is high code coverage. Static analysis tools such as Soot [8] or WALA are usually used to generate an abstract model of the source code in terms of call graphs (CGs) or control flow graphs (CFGs). This abstract model can then be queried for variable names, or used to perform taint analysis or data-flow analysis for more in-depth information. We have observed that many works in the literature have used static analysis tools to examine the Android framework and applications [9]. We use these works as motivation to use the static analysis tool WALA for analyzing SystemAPIs.

Heuristic           Usage                                                Examples
Exceptions          Look for the presence of exceptions                  input validation exceptions, security exceptions
Return statements   Look for return statements which return primitive    return null; return 0;
                    types
Log statements      Look for classes derived from Log and the method's   Log.log("Error")
                    argument string

Table 1. Heuristics used to detect failure branches of validations.

Class Hierarchy Analysis. One of the simplest forms of analysis that one can perform, without generating computationally costly call graphs, is Class Hierarchy Analysis (CHA). Using CHA, one can iterate through all the classes in the Android framework and the instructions they contain. Instruction types, targets of invoke instructions, and field names are specific features that can be examined using Class Hierarchy Analysis.

Call Graph Generation.
To gain more in-depth knowledge about the methods in Android services, such as the variable names in field access instructions, we need to build the call graph. Call graph construction begins at a specific method and then recursively maps all the methods invoked in each method's body. This results in a directed graph in which each node is a caller/callee method and is connected to its callee methods through edges. Further, each node in a call graph has its own symbol table, which can be queried for method, parameter, and variable names from the instructions. In this work, we focus only on variable names and method names from field access and invoke instructions, and hence a call graph suffices. For extracting more complicated features, such as code structure, we would need to build the control flow graph, which allows data-dependency analysis across methods.

Entry Point Detection. Unlike usual object-oriented programs, Android services and applications don't have a fixed main method that can be used as an entry point for call graph generation. Hence, researchers generally need to create a list of entry points depending on the context of the static analysis. In our work, we need to look at SystemAPIs in the Android framework, so we must select entry points that allow our call graph to include all the necessary SystemAPIs. The simplest way to do so is to generate a call graph with all the necessary SystemAPIs as the entry points.

DefUse Analysis. Using WALA, we track the definitions and usages of variables across the control flow. We do this in order to cover different cases, as shown in Figure 3, where the dependency of the argument is being tracked.

Dominator Analysis. Using static analysis, exploring the conditional blocks is not straightforward. It is hard to determine the statements inside the conditionals with an imprecise control flow graph.
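A dominator-based workaround, described next, gets around this. WALA provides dominators out of the box; as a minimal sketch of the underlying idea (toy CFG, hypothetical node names), the classic set-based fixpoint computes, for each node, the set of nodes that every entry-to-node path must pass through:

```python
# Minimal dominator computation on a toy CFG. WALA exposes this via its
# Dominators class; this set-based fixpoint just illustrates the idea:
# a dominates b if every path from the entry to b passes through a.
def dominators(cfg, entry):
    nodes = set(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    preds = {n: [p for p in nodes if n in cfg[p]] for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = ({n} | set.intersection(*(dom[p] for p in preds[n]))) if preds[n] else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# Toy method CFG: a validation check branches to a failure-branch head
# (which logs and exits) or to the success path; both reach the exit.
cfg = {
    "entry": ["check"],
    "check": ["fail_head", "success"],
    "fail_head": ["log"],
    "log": ["exit"],
    "success": ["exit"],
    "exit": [],
}
dom = dominators(cfg, "entry")
# Statements to remove: every node dominated by the failure-branch head.
removable = {n for n in cfg if "fail_head" in dom[n]}
print(sorted(removable))  # ['fail_head', 'log']
```

Note that the exit node is not dominated by the failure-branch head (the success path also reaches it), so only the failure block itself is collected.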
In this project, we use dominator analysis to retrieve the target code from inside the failure branches of the validation statements. First, we construct a dominator tree from the control flow graph using a class provided by WALA. In a dominator tree, a node a is said to dominate another node b if every path to b in the control flow graph passes through a. Using this tree, we look for all the statements which are dominated by the first instruction just after the validation statement. This gives us the required instructions from inside the conditional statement. However, to use this trick, we must assume that there is at least one statement inside the conditional block.

4.5 Overall System Design
To summarise, Figure 6 encapsulates the entire process, starting from the ROM and ending with reduced code snippets. We can group the code into the following two core modules:

Detector Module. This module is responsible for detecting the required SystemAPIs in the Android framework. It does so by using Class Hierarchy Analysis and prior knowledge of the interaction between the Binder and the SystemAPIs.

Control Flow Analysis. This module does most of the work in detecting input and access control validations. It wraps the DefUse analysis, heuristic-based validation detection and dominator analysis, and generates reduced SystemAPIs.

4.6 ML-based Evaluation
To evaluate the reduction pipeline, we develop a separate ML pipeline using word embeddings, as shown in Figure 5. The pipeline for both training and testing the model is the same; we generate the document embedding for each SystemAPI using Word2vec and then feed the embeddings, along with the labels, to a classifier. For testing, we generate the embeddings of the test data and use them to predict the labels. These predicted labels can then be compared to the actual labels to compute the scores.

Figure 5.
Machine Learning pipeline for a Binary Classifier

For our evaluation, we collect data from the decompiled ROM of a Xiaomi Redmi K20 with Android version 9. Using static analysis, we collect data from 644 SystemAPIs across a total of 21 service classes. From this data, 149 SystemAPIs were labelled protected by the static analysis step, constituting around 23% of the total data points. These values are summarized in Table 2. Further, for the training and evaluation of the model, we use a standard 80%-20% train-test split. The entire process is repeated for the reduced version of the SystemAPIs.

Naive ML Approach. By treating AOSP and its permission mapping as the ground truth, we take the features extracted in the previous step and create labelled data for the model's training. We start with a basic binary labelling, that is, whether the API requires protection or not. In this part, we pre-process the features into a viable format as input to the machine learning model. The pre-processing of the features involves steps such as feature scaling and feature selection based on manual analysis. The last part feeds the training data into a simple supervised machine learning model and analyzes the results.

Pre-processing. Pre-processing of the data is essential in any machine learning pipeline. For each SystemAPI, we prepare a column of comma-separated words collected from the variable names and method names in the instructions. We call this collection of words the token set of that SystemAPI. Next, we perform a camelCase split of the words based on our observation. For example, getCallingUID is broken into 'get', 'Calling', and 'UID'. This ensures that the tokens can be represented as a combination of well-known words.
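The camelCase split and length filter can be sketched as follows; stemming (done with gensim's PorterStemmer in our pipeline) is omitted here for brevity, and the stop-word list is an illustrative subset:

```python
import re

# Sketch of the token pre-processing: camelCase-split an identifier,
# lowercase the parts, and drop short words and stop words.
STOPWORDS = {"the", "and", "for", "with"}  # illustrative subset only

def tokenize(identifier: str):
    """Split an identifier on camelCase boundaries and filter the parts."""
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", identifier)
    return [p.lower() for p in parts if len(p) >= 3 and p.lower() not in STOPWORDS]

print(tokenize("getCallingUID"))        # ['get', 'calling', 'uid']
print(tokenize("getTemporaryAddress"))  # ['get', 'temporary', 'address']
```

The lookahead alternative in the regular expression keeps trailing acronyms such as UID intact instead of splitting them into single letters.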
For example, it is not straightforward for a model to know that getTemporaryAddress and getPermanentAddress are related unless they are broken down into their constituent camelCase words. It becomes simpler for a model to understand their similarity based on the common words 'get' and 'Address'. To ensure generalizability, we use a stemmer to reduce all the words to their base form; for example, the word calling is reduced to the word call. We use the PorterStemmer class from the gensim library in Python to achieve this. Further, a common practice is to remove stop words such as 'a', 'is' and 'the', along with other words which are very short and do not make sense on their own. In our case, we keep only those words that are at least three characters in length. At the end of the pre-processing stage, we have a collection of processed tokens and the labels for each SystemAPI.

Total System Services        66
Total SystemAPIs             1463
Access Validations detected  77
Input Validations detected   494
Total time taken             7h 11m
Instructions removed         12%

Table 2. Statistics from the static analysis of the Android framework (Xiaomi Redmi K20 with Android version 9)

Word Embedding Model. Based on the features extracted from the SystemAPIs, we can model our machine learning pipeline as a text classification problem. We observe from the literature that word-embedding models go hand-in-hand with NLP-based tasks such as sentiment analysis [2] and text classification [10]. Most of the time, word-embedding models are decoupled from the actual task and used to generate vectors for text. These vectors can then be used for various tasks separately.

Figure 6. Overall System Design

GloVe [11] and Word2vec [6] are the most popular word-embedding models used by researchers. Word2vec focuses on the relationship between words, whereas GloVe focuses on the probability of co-occurrence of words.
Word2vec allows for more modifications and generalizes well on datasets with large vocabularies; hence, in our work, we use the Word2vec model for generating word embeddings. The processed tokens are fed to the Word2vec model to generate the vectors. The output from the Word2vec model is one vector per word. Consequently, we end up with multiple vectors for each SystemAPI. These word vectors can be combined in several ways. The most common approach is to take the mean of all the vectors. However, more sophisticated approaches also exist, such as taking a weighted sum based on word frequency, as in the tf-idf approach. Currently, we take the mean of the vectors to generate the combined vector, also known as the document vector. These document embeddings can then be used to train models for various tasks.

Model. In this project, we look at three well-known binary classifiers for the task of text classification: Support Vector Machines (SVM), Decision Tree and Random Forest. We use the popular machine learning library scikit-learn for developing the machine learning model. We use the Random Forest model, as it is an ensemble method and has performed better than SVMs and other classifiers on several occasions [16]. The results from the Random Forest classifier show an average accuracy of 73% over 20 trials. Moreover, the precision and recall of either class are similar, at around 70%. This means our model has learned to identify both protected and unprotected APIs with almost identical performance. Hence, we use the Random Forest model for evaluating both the normal and reduced SystemAPIs.

                     Accuracy
Normal SystemAPIs    72%
Reduced SystemAPIs   74%

Table 3. Accuracy of the Random Forest classifier with reduced vs normal SystemAPIs

Results. From Table 2, we see that on average 12% of the code is removed from the SystemAPIs.
However, from Table 3, we can see that the overall accuracy of the Random Forest classifier has not shown much improvement. This is a reasonable result, as there is still work to be done towards representing code and building a more complex machine learning model, as discussed in Section 5.

5 Limitations
In this section, we discuss the most significant limitations, which we intend to work on in the future.

5.1 Naive Model
For the scope of this project, we use a naive tree-based model to evaluate code reduction. However, it is possible that the token-based representation of the code and the simple nature of the model cannot capture the improvement from code reduction. A more powerful neural-network-based model might be better suited for evaluating code reduction.

5.2 Dominator Analysis
The more intuitive approach is to use a control flow analysis based on control flow graphs to capture the inner statements of an identified control flow block. However, we used dominator analysis for this task due to technical barriers during implementation. Although we have not performed any comparison of the two, intuitively, dominator analysis reduces the efficiency of the process.

5.3 Correctness of Heuristics
Formally, we are not able to prove or argue about the completeness of the heuristics used. Hence, there might still be some cases that are missed by our heuristics and lead to false negatives. We can argue, however, that heuristics based on termination conditions leave very little room for false positives.

6 Discussion and Conclusion
In this project, we answer the following research questions:

What type of code features can be reduced? In this work, we start by reducing the failure branches of input validations and access control validations. In the future, we intend to look at more Java-specific features such as method annotations or variable types.
What percentage of code does not contribute to the access control requirements? We find that, on average, 12% of the code is not essential to the functionality of the SystemAPIs. However, we have not looked at the distribution of this value. In some cases, where the call graph was quite complicated, we could remove up to 70% of the code.

Does the performance of a naive access control classification model improve? According to our results, the performance boost is insignificant compared to the effort required to remove the validations. However, we think that a naive model, without any work on the source code representation, is not a good metric for evaluating the reduction of unnecessary features.

Overall, we developed a tool on top of WALA that can automatically detect and remove failure blocks in the form of input and access validations and reduce the code by 12% on average. The tool can easily be extended to detect other patterns due to its modular nature. As future work, more effort needs to be put into recognizing unnecessary features and developing a more complex machine learning model that can use the code's inherent structural nature.

7 Source Code
The source code can be found in a private repository on GitLab at https://git.uwaterloo.ca/d2vagavo/cs846_sourcecode_dheerajvagavolu. (An invite has been sent to the username cnsun on UW GitLab.)

References
[1] Aafer, Y., Huang, J., Sun, Y., Zhang, X., Li, N., and Tian, C. AceDroid: Normalizing diverse Android access control checks for inconsistency detection. In NDSS (2018).
[2] Acosta, J., Lamaute, N., Luo, M., Finkelstein, E., and Andreea, C. Sentiment analysis of Twitter messages using word2vec. Proceedings of Student-Faculty Research Day, CSIS, Pace University 7 (2017), 1–7.
[3] Backes, M., Bugiel, S., Derr, E., McDaniel, P., Octeau, D., and Weisgerber, S. On demystifying the Android application framework: Revisiting Android permission specification analysis. In 25th USENIX Security Symposium (USENIX Security 16) (2016), pp. 1101–1118.
[4] Basha, S. R., Rani, J. K., and Yadav, J. P. A novel summarization-based approach for feature reduction, enhancing text classification accuracy. Engineering, Technology & Applied Science Research 9, 6 (2019), 5001–5005.
[5] Brunzell, H., and Eriksson, J. Feature reduction for classification of multidimensional data. Pattern Recognition 33, 10 (2000), 1741–1748.
[6] Church, K. W. Word2vec. Natural Language Engineering 23, 1 (2017), 155–162.
[7] Gorski, S. A., Andow, B., Nadkarni, A., Manandhar, S., Enck, W., Bodden, E., and Bartel, A. ACMiner: Extraction and analysis of authorization checks in Android's middleware. In Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy (2019), pp. 25–36.
[8] Lam, P., Bodden, E., Lhoták, O., and Hendren, L. The Soot framework for Java program analysis: a retrospective. In Cetus Users and Compiler Infrastructure Workshop (CETUS 2011) (2011), vol. 15.
[9] Li, L., Bissyandé, T. F., Papadakis, M., Rasthofer, S., Bartel, A., Octeau, D., Klein, J., and Le Traon, Y. Static analysis of Android apps: A systematic literature review. Information and Software Technology 88 (2017), 67–95.
[10] Lilleberg, J., Zhu, Y., and Zhang, Y. Support vector machines and word2vec for text classification with semantic features. In 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC) (2015), IEEE, pp. 136–140.
[11] Pennington, J., Socher, R., and Manning, C. D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 1532–1543.
[12] Pham, T.-N., Nguyen, V.-Q., Tran, V.-H., Nguyen, T.-T., and Ha, Q.-T. A semi-supervised multi-label classification framework with feature reduction and enrichment. Journal of Information and Telecommunication 1, 2 (2017), 141–154.
[13] Phinyomark, A., Phukpattaranont, P., and Limsakul, C. Feature reduction and selection for EMG signal classification. Expert Systems with Applications 39, 8 (2012), 7420–7431.
[14] Raychaudhuri, S., Sutphin, P. D., Chang, J. T., and Altman, R. B. Basic microarray analysis: grouping and feature reduction. Trends in Biotechnology 19, 5 (2001), 189–193.
[15] Shao, Y., Chen, Q. A., Mao, Z. M., Ott, J., and Qian, Z. Kratos: Discovering inconsistent security policy enforcement in the Android framework. In NDSS (2016).
[16] Valecha, H., Varma, A., Khare, I., Sachdeva, A., and Goyal, M. Prediction of consumer behaviour using random forest algorithm. In 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON) (2018), IEEE, pp. 1–6.
[17] Zhang, L., Yang, Z., He, Y., Zhang, Z., Qian, Z., Hong, G., Zhang, Y., and Yang, M. Invetter: Locating insecure input validations in Android services. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (2018), pp. 1165–1178.