PPT Version of Presentation Slides

advertisement
Profiling and Identifying Individual
Users by Their Command Line
Usage and Writing Sytle
Darusalam (100111555)
Supervisor Associate Prof Helen Ashman
1
Overview
•
•
•
•
•
•
•
•
Introduction
Motivation
Literature Review
Research Question
Methodology
Result
Contribution
Future Work
2
Introduction

Profiling ->, it groups things or individuals into categories based on
characteristics (N.P.Dau et al., 2000).

E.g Profiling -> user usage pattern of computer
Profiling -> user identification

It aims to identify a user in natural language (Jane Austen and William
Shakespeare) and Formal language (command line history) based on the
investigation of psychometric user characteristics
3
Motivation
• Previous research Biometric characteristic.
• The minor thesis extends this by focusing on a
psychometric user characteristic.
• Research will consider user’s writings in two
different scenarios (Natural and Formal language)
and can be analyzed with n-gram in order to
identify the user.
4
Literature Review
• Computer science -> profiling in online social network
– Research by Ashman and Holland (on draft). They examined users to identifying
Anomaly detection over user model.
– Department of electrical and computer Engineering, University of Victoria Canada
outline about the use of behavioral biometrics for intrusion detection applications
(Ahmed & Traore 2005).
• N-gram based analysis
– Luo et at (2010) N-gram-based malicious code feature extraction algorithm with
statical language model.
– N-gram analysis based on author profile also applies in authorship attribution (Keöelj
et al. 2003).
5
Research Question
The research will answer the questions
•
Q1: does the use of n-gram analysis to profile users’ writing styles in social network
situations allow accurate user identification?
a. if so, does it allow both positive and negative identification?
•
Q2: does the use of n-gram analysis to profile users’ command usage in their command
line histories allow accurate user identification?
a. if so, does it allow both positive and negative identification?
•
Q3: if the profiling of both writing styles and command usage allow accurate user
profiling, which is the most accurate?
6
Research Question Cont
Positive Identification
Machine A
Machine B
Negative Identification
Machine A
Machine B
7
Methodology
•
•
•
•
What is N-gram analysis ?
N-gram is a language model based on collinear relation (Luo et al., 2010) & ‘N-gram is a
subset of overlapping n-sized portion of a series of letter, words, syllables, phonemes or
based pairs’ (Ashman and Holland (on draft)).
3-gram, 5-gram, 11-gram and 15-gram is used for analysis.
Normalization Data used are percentage, Max-min and Z score
T-Test ?
Method to compare the styles of two pairs of samples.
N-gram
(3,5,11 & 15)
Normalization
A percentage, Maxmin & z Score
T-Test
(t-Test: paired two
sample for means)
Result
8
Formal language comparison
Positive Identification
User1-history1
User1-history3
User1-history2
User1-history4
Negative Identification
User2-history1
User4-history1
User3-history1
User5-history1
9
Natural language comparison
Jane Austen
Positive Identification
William Shakespeare
Positive Identification
Negative Identification
10
Result of Formal Language Comparison
Positive Identification (User1 Command Line history for different machines)
N-gram
Normalizati
on Type
3 Gram
Percentage
5 Gram
11 Gram
15 Gram
Positive
Result
(Correct
Identifica
tion)
6/6
Negative
Result
(False
Identificati
on )
0
Max-Min
1/6
Z Score
Percentage
Total
Correct
Rate
Percen
tage
6
100%
5
1
16,6%
6/6
6/6
0
0
6
6
100%
100%
Max-Min
1/6
5
1
16,6%
Z Score
Percentage
6/6
6/6
0
0
6
6
100%
100%
Max-Min
1/6
5
1
16,6%
Z Score
Percentage
6/6
6/6
0
0
6
6
100%
100%
Max-Min
1/6
5
1
16,6%
Z Score
6/6
0
6
100%
5-gram
11
Result of Formal Language comparison
(cont)
Negative Identification User1 VS (User2 ,User3, User4, User5)
N-gram
Normalizat
ion Type
3 Gram
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
5 Gram
11 Gram
15 Gram
Positive
Result
(Correct
Identificat
ion)
23/30
20/30
19/30
13/30
17/30
14/30
16/30
26/30
16/30
23/30
24/30
24/30
Negative
Result
(False
Identificati
on )
7
8
11
17
8
11
14
4
14
17
16
16
Total
Correct
23
20
19
13
17
14
16
26
16
23
24
24
Rate
Percent
age
76.7%
66.7%
63.3%
43.3%
56.7%
46.7%
53.3%
86.7%
53.3%
76.7%
80.0%
80.0%
5-gram
12
Result of Natural Language comparison
Positive Identification Jane Austen writing style
N-gram
Normalizatio
n Type
3 Gram
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
5 Gram
11 Gram
15 Gram
Positive
Result
(Correct
Identificatio
n)
18/18
2/18
18/18
18/18
2/18
18/6
18/18
6/18
18/18
18/18
9/6
18/18
Negative
Result
(False
Identification
)
0
16
0
0
5
0
0
3
0
0
3
0
Total
Correct
18
2
18
18
2
18
18
6
18
18
9
19
Rate
Percentage
100%
11.11%
100%
100%
11.11%
100%
100%
33.33%
100%
100%
50%
100%
5-gram
13
Result of Natural Language comparison
(cont)
Negative Identification Jane Austen vs William Shakespeare
N-gram
Normalizatio
n Type
3 Gram
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
Percentage
Max-Min
Z Score
5 Gram
11 Gram
15 Gram
Positive
Result
(Correct
Identificatio
n)
0/16
16/16
0/16
0/16
16/16
0/16
0/16
16/16
0/16
0/16
2/16
0/16
Negative
Result
(False
Identification
)
16
0
16
16
0
16
16
0
16
16
14
16
Total
Correct
0
16
0
0
16
0
0
16
0
0
2
0
Average
Rate
Percentage
0%
100%
0%
0%
100%
0%
0%
100%
0%
0%
12.5%
0%
4.17%
5-gram
14
Result Summary
• Formal Language
1. Positive Identification  Successful user identification
Normalization Type
Percentage
Max-min
z Score
Success Total
100%
16,66%
100%
1. Negative Identification  Successful user identification
Normalization Type
Success Total
Percentage
Max-min
62,50%
72.50%
z Score
60,83%
15
Result Summary cont
• Natural Language
1. Positive Identification  Successful user identification
Normalization Type Success Total
Percentage
Max-min
z Score
100%
26,38%
100%
1. Negative Identification  Failed to identify user
Normalization Type Success Total
Percentage
Max-min
z Score
0%
100%
0%
• which is the most accurate? Formal Language
Contribution
• New methods for user identification in formal
language and natural language.
• It could enable intrusion detection where
intruders masquerade as real users.
17
Future Work
• For formal language, trying to compare one
machine divided by period of time
• Use other gram, e.g. 2,4,6,7,8,9,10,12,13,
since each gram gives a different result
• User could have more than one writing style
• Compare both participants in all possible
scenarios.
18
Any Question
Thank you
19
References
•
ALMASSIAN, N., AZMI, R. & BERENJI, S. 2009. AIDSLK: An Anomaly Based Intrusion Detection System in Linux Kernel. Information Systems,
Technology and Management, 232-243.
•
ASHMAN, H. & HOLLAND, S. Profiling and identifying users with n-gram analysis on their command line histories.
•
BALDUZZI, M., PLATZER, C., HOLZ, T., KIRDA, E., BALZAROTTI, D. & KRUEGEL, C. 2010. Abusing Social Networks for Automated User Profiling. In:
JHA, S.,
•
•
OMMER, R. & KREIBICH, C. (eds.) Recent Advances in Intrusion Detection. Springer Berlin / Heidelberg.
•
BHATTACHARYYA, P., GARG, A. & WU, S. F. Social Network Model Based on Keyword Categorization. Social Network Analysis and Mining, 2009.
ASONAM '09. International Conference on Advances in, 20-22 July 2009 2009. 170-175.
•
OYD, D. M. & ELLISON, N. B. 2008. Social network sites: Definition, history, and scholarship. Journal of Computer Mediated Communication, 13,
210-230.
•
CHA, B. 2005. Host Anomaly Detection Performance Analysis Based on System Call of Neuro-Fuzzy Using Soundex Algorithm and N-gram
Technique. Proceedings of the 2005 Systems Communications. IEEE Computer Society.
•
DWYER, C., HILTZ, S. R. & PASSERINI, K. Trust and privacy concern within social networking sites: A comparison of Facebook and MySpace. 2007.
Citeseer.
•
HUBBALLI, N., BISWAS, S. & NANDI, S. Sequencegram: n-gram modeling of system calls for program based anomaly detection. Communication
Systems and Networks (COMSNETS), 2011 Third International Conference on, 4-8 Jan. 2011 2011. 1-10.
•
KEÖELJ, V., PENG, F., CERCONE, N. & THOMAS, C. N-gram-based author profiles for authorship attribution. 2003. Citeseer.
•
KESELJ, F. P. D. S. V. & WANG, S. Language Independent Authorship Attribution using Character Level Language Models.
20
MAIA, M., ALMEIDA, J., VIRG\, \#237 & ALMEIDA, L. 2008. Identifying user behavior in online social networks.
Proceedings of the 1st Workshop on Social Network Systems. Glasgow, Scotland: ACM.
MCKINNEY, S. & REEVES, D. S. 2009. User identification via process profiling: extended abstract. Proceedings of the
5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information
Intelligence Challenges and Strategies. Oak Ridge, Tennessee: ACM.
N.P.DAU, V., RAU, V. & J.TEMPLETON, S. 2000. profiling users in the UNIX OS Environment.
PANNELL, G. & ASHMAN, H. 2010. User Modelling for Exclusion and Anomaly Detection: A Behavioural Intrusion
Detection System. In: DE BRA, P., KOBSA, A. &
CHIN, D. (eds.) User Modeling, Adaptation, and Personalization. Springer Berlin / Heidelberg.
RAAD, E., CHBEIR, R. & DIPANDA, A. User Profile Matching in Social Networks. Network-Based Information Systems
(NBiS), 2010 13th International Conference on, 14-16 Sept. 2010 2010. 297-304.
REDDY, D. K. S. & PUJARI, A. K. 2006. N-gram analysis for computer virus detection. Journal in Computer Virology, 2,
231-239.
VOSECKY, J., DAN, H. & SHEN, V. Y. User identification across multiple social networks. Networked Digital
Technologies, 2009. NDT '09. First International Conference on, 28-31 July 2009 2009. 360-365.
WEI, W., XIAOHONG, G. & XIANGLIANG, Z. Profiling program and user behaviors for anomaly intrusion detection
based on non-negative matrix factorization.
Decision and Control, 2004. CDC. 43rd IEEE Conference on, 14-17 Dec. 2004 2004. 99-104 Vol.1.
ZHANG, B., YIN, J., HAO, J., WANG, S. & ZHANG, D. 2007. New Malicious Code Detection Based on N-Gram Analysis
and Rough Set Theory. In: WANG, Y.,
CHEUNG, Y.-M. & LIU, H. (eds.) Computational Intelligence and Security. Springer Berlin / Heidelberg.
21
Download