PPT Version of Presentation Slides

Profiling and Identifying Individual Users by Their Command Line Usage and Writing Style
Darusalam (100111555)
Supervisor: Associate Prof Helen Ashman

Overview
• Introduction
• Motivation
• Literature Review
• Research Question
• Methodology
• Result
• Contribution
• Future Work

Introduction
• Profiling groups things or individuals into categories based on their characteristics (N.P.Dau et al., 2000).
• E.g.
  Profiling -> user usage pattern of computer
  Profiling -> user identification
• This research aims to identify a user in natural language (Jane Austen and William Shakespeare) and formal language (command line history) based on the investigation of psychometric user characteristics.

Motivation
• Previous research: biometric characteristics.
• The minor thesis extends this by focusing on a psychometric user characteristic.
• The research considers users' writing in two different scenarios (natural and formal language), which can be analyzed with n-grams in order to identify the user.

Literature Review
• Computer science -> profiling in online social networks
  – Ashman and Holland (in draft) examined users for anomaly detection over a user model.
  – Ahmed & Traore (2005), Department of Electrical and Computer Engineering, University of Victoria, Canada, outline the use of behavioral biometrics for intrusion detection applications.
• N-gram based analysis
  – Luo et al. (2010): n-gram-based malicious code feature extraction algorithm with a statistical language model.
  – N-gram analysis based on author profiles is also applied in authorship attribution (Kešelj et al., 2003).

Research Question
The research will answer the following questions:
• Q1: Does the use of n-gram analysis to profile users' writing styles in social network situations allow accurate user identification?
  a. If so, does it allow both positive and negative identification?
• Q2: Does the use of n-gram analysis to profile users' command usage in their command line histories allow accurate user identification?
  a. If so, does it allow both positive and negative identification?
• Q3: If the profiling of both writing styles and command usage allows accurate user profiling, which is the more accurate?

Research Question (cont)
[Figure: Positive Identification (Machine A, Machine B); Negative Identification (Machine A, Machine B)]

Methodology
• What is n-gram analysis? An n-gram is a language model based on collinear relation (Luo et al., 2010), and 'N-gram is a subset of overlapping n-sized portion of a series of letter, words, syllables, phonemes or base pairs' (Ashman and Holland, in draft).
• 3-gram, 5-gram, 11-gram and 15-gram are used for analysis.
• Normalization: the normalizations used are percentage, max-min and z-score.
• T-test: a method to compare the styles of two paired samples.
Process: N-gram (3, 5, 11 & 15) -> Normalization (percentage, max-min & z-score) -> T-Test (t-Test: paired two sample for means) -> Result

Formal language comparison
[Figure: Positive Identification (User1-history1, User1-history2, User1-history3, User1-history4); Negative Identification (User2-history1, User3-history1, User4-history1, User5-history1)]

Natural language comparison
[Figure: Jane Austen (Positive Identification); William Shakespeare (Positive Identification); Negative Identification between the two]

Result of Formal Language Comparison
Positive Identification (User1 command line history for different machines)

N-gram    Normalization   Positive Result            Negative Result          Total     Rate
          Type            (Correct Identification)   (False Identification)   Correct   Percentage
3 Gram    Percentage      6/6                        0                        6         100%
3 Gram    Max-Min         1/6                        5                        1         16.6%
3 Gram    Z Score         6/6                        0                        6         100%
5 Gram    Percentage      6/6                        0                        6         100%
5 Gram    Max-Min         1/6                        5                        1         16.6%
5 Gram    Z Score         6/6                        0                        6         100%
11 Gram   Percentage      6/6                        0                        6         100%
11 Gram   Max-Min         1/6                        5                        1         16.6%
11 Gram   Z Score         6/6                        0                        6         100%
15 Gram   Percentage      6/6                        0                        6         100%
15 Gram   Max-Min         1/6                        5                        1         16.6%
15 Gram   Z Score         6/6                        0                        6         100%

Result of Formal Language Comparison (cont)
Negative Identification: User1 vs (User2, User3, User4, User5)

N-gram    Normalization   Positive Result            Negative Result          Total     Rate
          Type            (Correct Identification)   (False Identification)   Correct   Percentage
3 Gram    Percentage      23/30                      7                        23        76.7%
3 Gram    Max-Min         20/30                      8                        20        66.7%
3 Gram    Z Score         19/30                      11                       19        63.3%
5 Gram    Percentage      13/30                      17                       13        43.3%
5 Gram    Max-Min         17/30                      8                        17        56.7%
5 Gram    Z Score         14/30                      11                       14        46.7%
11 Gram   Percentage      16/30                      14                       16        53.3%
11 Gram   Max-Min         26/30                      4                        26        86.7%
11 Gram   Z Score         16/30                      14                       16        53.3%
15 Gram   Percentage      23/30                      17                       23        76.7%
15 Gram   Max-Min         24/30                      16                       24        80.0%
15 Gram   Z Score         24/30                      16                       24        80.0%

Result of Natural Language Comparison
Positive Identification: Jane Austen writing style

N-gram    Normalization   Positive Result            Negative Result          Total     Rate
          Type            (Correct Identification)   (False Identification)   Correct   Percentage
3 Gram    Percentage      18/18                      0                        18        100%
3 Gram    Max-Min         2/18                       16                       2         11.11%
3 Gram    Z Score         18/18                      0                        18        100%
5 Gram    Percentage      18/18                      0                        18        100%
5 Gram    Max-Min         2/18                       5                        2         11.11%
5 Gram    Z Score         18/18                      0                        18        100%
11 Gram   Percentage      18/18                      0                        18        100%
11 Gram   Max-Min         6/18                       3                        6         33.33%
11 Gram   Z Score         18/18                      0                        18        100%
15 Gram   Percentage      18/18                      0                        18        100%
15 Gram   Max-Min         9/18                       3                        9         50%
15 Gram   Z Score         18/18                      0                        18        100%

Result of Natural Language Comparison (cont)
Negative Identification: Jane Austen vs William Shakespeare

N-gram    Normalization   Positive Result            Negative Result          Total     Rate
          Type            (Correct Identification)   (False Identification)   Correct   Percentage
3 Gram    Percentage      0/16                       16                       0         0%
3 Gram    Max-Min         16/16                      0                        16        100%
3 Gram    Z Score         0/16                       16                       0         0%
5 Gram    Percentage      0/16                       16                       0         0%
5 Gram    Max-Min         16/16                      0                        16        100%
5 Gram    Z Score         0/16                       16                       0         0%
11 Gram   Percentage      0/16                       16                       0         0%
11 Gram   Max-Min         16/16                      0                        16        100%
11 Gram   Z Score         0/16                       16                       0         0%
15 Gram   Percentage      0/16                       16                       0         0%
15 Gram   Max-Min         2/16                       14                       2         12.5%
15 Gram   Z Score         0/16                       16                       0         0%
Average: 4.17%

Result Summary
• Formal Language
  1. Positive Identification -> Successful user identification
     Normalization Type   Success Total
     Percentage           100%
     Max-min              16.66%
     z Score              100%
  2. Negative Identification -> Successful user identification
     Normalization Type   Success Total
     Percentage           62.50%
     Max-min              72.50%
     z Score              60.83%

Result Summary (cont)
• Natural Language
  1. Positive Identification -> Successful user identification
     Normalization Type   Success Total
     Percentage           100%
     Max-min              26.38%
     z Score              100%
  2. Negative Identification -> Failed to identify user
     Normalization Type   Success Total
     Percentage           0%
     Max-min              100%
     z Score              0%
• Which is the most accurate? Formal language.

Contribution
• New methods for user identification in formal language and natural language.
• It could enable intrusion detection where intruders masquerade as real users.

Future Work
• For formal language, compare histories from one machine divided by period of time.
• Use other n-gram sizes, e.g. 2, 4, 6, 7, 8, 9, 10, 12, 13, since each size gives a different result.
• A user could have more than one writing style.
• Compare both participants in all possible scenarios.

Any questions?
Thank you

References
• ALMASSIAN, N., AZMI, R. & BERENJI, S. 2009. AIDSLK: An Anomaly Based Intrusion Detection System in Linux Kernel. Information Systems, Technology and Management, 232-243.
• ASHMAN, H. & HOLLAND, S. Profiling and identifying users with n-gram analysis on their command line histories.
• BALDUZZI, M., PLATZER, C., HOLZ, T., KIRDA, E., BALZAROTTI, D. & KRUEGEL, C. 2010. Abusing Social Networks for Automated User Profiling. In: JHA, S., SOMMER, R. & KREIBICH, C. (eds.) Recent Advances in Intrusion Detection. Springer Berlin / Heidelberg.
• BHATTACHARYYA, P., GARG, A. & WU, S. F. 2009. Social Network Model Based on Keyword Categorization. Social Network Analysis and Mining, 2009. ASONAM '09. International Conference on Advances in, 20-22 July 2009. 170-175.
• BOYD, D. M. & ELLISON, N. B. 2008. Social network sites: Definition, history, and scholarship. Journal of Computer Mediated Communication, 13, 210-230.
• CHA, B. 2005. Host Anomaly Detection Performance Analysis Based on System Call of Neuro-Fuzzy Using Soundex Algorithm and N-gram Technique. Proceedings of the 2005 Systems Communications. IEEE Computer Society.
• DWYER, C., HILTZ, S. R. & PASSERINI, K. 2007. Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace. Citeseer.
• HUBBALLI, N., BISWAS, S. & NANDI, S. 2011. Sequencegram: n-gram modeling of system calls for program based anomaly detection. Communication Systems and Networks (COMSNETS), 2011 Third International Conference on, 4-8 Jan. 2011. 1-10.
• KEŠELJ, V., PENG, F., CERCONE, N. & THOMAS, C. 2003. N-gram-based author profiles for authorship attribution. Citeseer.
• PENG, F., SCHUURMANS, D., KEŠELJ, V. & WANG, S. Language Independent Authorship Attribution using Character Level Language Models.
• MAIA, M., ALMEIDA, J. & ALMEIDA, V. 2008. Identifying user behavior in online social networks. Proceedings of the 1st Workshop on Social Network Systems. Glasgow, Scotland: ACM.
• MCKINNEY, S. & REEVES, D. S. 2009. User identification via process profiling: extended abstract. Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies. Oak Ridge, Tennessee: ACM.
• N.P.DAU, V., RAU, V. & J.TEMPLETON, S. 2000. Profiling users in the UNIX OS Environment.
• PANNELL, G. & ASHMAN, H. 2010. User Modelling for Exclusion and Anomaly Detection: A Behavioural Intrusion Detection System. In: DE BRA, P., KOBSA, A. & CHIN, D. (eds.) User Modeling, Adaptation, and Personalization. Springer Berlin / Heidelberg.
• RAAD, E., CHBEIR, R. & DIPANDA, A. 2010. User Profile Matching in Social Networks. Network-Based Information Systems (NBiS), 2010 13th International Conference on, 14-16 Sept. 2010. 297-304.
• REDDY, D. K. S. & PUJARI, A. K. 2006. N-gram analysis for computer virus detection. Journal in Computer Virology, 2, 231-239.
• VOSECKY, J., DAN, H. & SHEN, V. Y. 2009. User identification across multiple social networks. Networked Digital Technologies, 2009. NDT '09. First International Conference on, 28-31 July 2009. 360-365.
• WEI, W., XIAOHONG, G. & XIANGLIANG, Z. 2004. Profiling program and user behaviors for anomaly intrusion detection based on non-negative matrix factorization. Decision and Control, 2004. CDC. 43rd IEEE Conference on, 14-17 Dec. 2004. 99-104 Vol.1.
• ZHANG, B., YIN, J., HAO, J., WANG, S. & ZHANG, D. 2007. New Malicious Code Detection Based on N-Gram Analysis and Rough Set Theory. In: WANG, Y., CHEUNG, Y.-M. & LIU, H. (eds.) Computational Intelligence and Security. Springer Berlin / Heidelberg.
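
Appendix: illustrative sketch of the Methodology process

The Methodology slide describes a process of n-gram extraction, normalization (percentage, max-min, z-score) and a paired two-sample t-test for means. The Python sketch below shows one way those steps could fit together. It is not the thesis implementation. Several details are assumptions made only for illustration: character-level n-grams, aligning the two samples over the union of their n-grams, the sample standard deviation in the z-score, and all function names (char_ngrams, compare, and so on). It returns the t statistic and degrees of freedom and leaves the choice of significance level to the reader.

```python
from collections import Counter
from math import copysign, sqrt
from statistics import mean, stdev


def char_ngrams(text, n):
    """Return every overlapping n-character window of text (assumed character-level)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def percentage(values):
    """Express each count as a percentage of the total count."""
    total = sum(values)
    return [100.0 * v / total for v in values]


def max_min(values):
    """Rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)  # stdev needs at least 2 values
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def paired_t(a, b):
    """Paired two-sample t statistic for means, with its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        # All differences identical: t is 0 if they are all zero, else unbounded.
        return (0.0 if m == 0 else copysign(float("inf"), m)), n - 1
    return m / (sd / sqrt(n)), n - 1


def compare(sample_a, sample_b, n, normalize):
    """Count n-grams in both samples, normalize, and run the paired t-test.

    Assumption: the two frequency vectors are aligned over the union of the
    n-grams seen in either sample (an n-gram missing from one sample counts 0).
    """
    counts_a = Counter(char_ngrams(sample_a, n))
    counts_b = Counter(char_ngrams(sample_b, n))
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = normalize([counts_a[g] for g in vocab])
    vec_b = normalize([counts_b[g] for g in vocab])
    return paired_t(vec_a, vec_b)
```

A hypothetical use, with made-up command histories:

```python
history_a = "ls -la\ncd src\ngit status\n"
history_b = "ls -la\ncd docs\ngit pull\n"
t, df = compare(history_a, history_b, n=3, normalize=max_min)
```

One property of this particular sketch is worth noting: because both vectors share one vocabulary, percentage and z-score normalization give the two vectors the same mean, so the paired t statistic is zero (up to rounding) by construction. Only max-min normalization leaves a mean difference for the test to detect under these assumptions.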
