
Christopher Gates, Ninghui Li, Zenglin Xu
Purdue University
Suresh N. Chari, Ian Molloy, Youngja Park
IBM TJ Watson Research
Intellectual property theft is an important security problem
Insider Threat
◦  Legitimate access
◦  In-depth knowledge of resources
◦  Knowledge of deployed security mechanisms
Stolen Credentials
◦  Can exploit another person's legitimate access
Limit exposure via access control
◦  Users need access
◦  Productivity is often seen as more important than restricting access
Encrypt data at rest
◦  Does not stop legitimate access
Use high-level statistics for detection
◦  Does not capture fine-grained detail
◦  Does not give specific guidance about the violation
Exploit knowledge about resources to detect deviation from access history
Can also be viewed as estimating/controlling risks of aggregated accesses by one user
Two kinds of malicious insiders
◦  Impetuous
◦  Patient
Generate a score for a set of accesses given a history
Score between two files: $\mathrm{score}(f, g)$
Related to all history: $\mathrm{score}[f \mid A_H] = \mathrm{agg}_{g \in A_H}[\mathrm{score}(f, g)]$
All files in current period: $\mathrm{sumScore} = \sum_{k=1}^{M} \mathrm{score}[f_k \mid A_H]$
Normalize: $\mathrm{aveScore} = \mathrm{sumScore} / M$
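The scoring pipeline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `file_score`, `sum_score`, `ave_score`, and `toy_pair_score` are hypothetical, the pairwise score is a toy stand-in (0 for an identical path, 1 otherwise), and `min` is used as the default aggregation.

```python
from typing import Callable, Iterable, Sequence

PairScore = Callable[[str, str], float]
Aggregate = Callable[[Iterable[float]], float]


def file_score(f: str, history: Sequence[str], pair_score: PairScore,
               agg: Aggregate = min) -> float:
    """score[f | A_H]: aggregate score(f, g) over every file g in the history."""
    if not history:
        raise ValueError("history must contain at least one file")
    return agg([pair_score(f, g) for g in history])


def sum_score(current: Sequence[str], history: Sequence[str],
              pair_score: PairScore, agg: Aggregate = min) -> float:
    """sumScore: add up score[f_k | A_H] for the M files of the current period."""
    return sum(file_score(f, history, pair_score, agg) for f in current)


def ave_score(current: Sequence[str], history: Sequence[str],
              pair_score: PairScore, agg: Aggregate = min) -> float:
    """aveScore: sumScore normalized by M, the number of current files."""
    return sum_score(current, history, pair_score, agg) / len(current)


def toy_pair_score(f: str, g: str) -> float:
    # Hypothetical stand-in for score(f, g): 0 if same path, 1 otherwise.
    return 0.0 if f == g else 1.0


history = ["/proj/a/x.c", "/proj/a/y.c"]
current = ["/proj/a/x.c", "/other/z.c"]
total = sum_score(current, history, toy_pair_score)
average = ave_score(current, history, toy_pair_score)
```

Passing the pairwise score and the aggregation as parameters keeps the two design choices (how files are compared, how a file is related to the whole history) independent of each other.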
Files are not accessed randomly within a hierarchy
There are reasons to access specific areas
◦  Job function
◦  Project
◦  Related content
Similarity can also have many facets
◦  Distance
◦  Access similarity
◦  File type/content
Similarity measures (table of name and formula)
◦  Binary
◦  Full Distance
◦  Lowest Common Ancestor (LCA)
◦  Log LCA
◦  Access Similarity
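To make the idea of hierarchy-based distance concrete, here is a hedged sketch of two path-based quantities. The formulas on the slide are not reproduced here, so everything below is an assumption: `lca_depth` counts the directories two files share from the root, and `tree_distance` counts the edges on the path between two files in the directory tree. Both function names are hypothetical and may differ from the measures the authors actually used.

```python
from pathlib import PurePosixPath


def lca_depth(f: str, g: str) -> int:
    """Number of leading directory components shared by f and g.

    For absolute paths the root "/" counts as one shared component.
    """
    dirs_f = PurePosixPath(f).parent.parts
    dirs_g = PurePosixPath(g).parent.parts
    depth = 0
    for a, b in zip(dirs_f, dirs_g):
        if a != b:
            break
        depth += 1
    return depth


def tree_distance(f: str, g: str) -> int:
    """Edges on the tree path from f up to the shared directory and down to g."""
    if f == g:
        return 0
    shared = lca_depth(f, g)
    up = len(PurePosixPath(f).parts) - shared
    down = len(PurePosixPath(g).parts) - shared
    return up + down


same_dir = tree_distance("/proj/a/x.c", "/proj/a/y.c")
sibling_dirs = tree_distance("/proj/a/x.c", "/proj/b/y.c")
```

Under these assumptions, files in the same directory are closer than files in sibling directories, which matches the intuition that accesses cluster around a job function or project.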
3 aggregation functions: $\mathrm{score}[f \mid A_H] = \mathrm{agg}_{g \in A_H}[\mathrm{score}(f, g)]$
Relates $f$ to all files $g \in A_H$ in the history
◦  min : The lowest score(f, g)
◦  ave : Average of all score(f, g)
◦  k-nearest : Compares to the k lowest score(f, g)
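The three aggregation functions can be sketched as follows. This is an illustration under assumptions: the slide says only that k-nearest "compares to" the k lowest scores, so averaging those k values is a guess, and the function names and the default `k` are hypothetical.

```python
from typing import Sequence


def agg_min(scores: Sequence[float]) -> float:
    """min: the lowest pairwise score between f and any file in the history."""
    return min(scores)


def agg_ave(scores: Sequence[float]) -> float:
    """ave: the average of all pairwise scores."""
    return sum(scores) / len(scores)


def agg_k_nearest(scores: Sequence[float], k: int = 3) -> float:
    """k-nearest: average of the k lowest scores (averaging is an assumption)."""
    lowest = sorted(scores)[:k]
    return sum(lowest) / len(lowest)


# Pairwise scores between one current file and four history files.
pair_scores = [4.0, 2.0, 6.0, 0.0]
lowest = agg_min(pair_scores)
average = agg_ave(pair_scores)
nearest_two = agg_k_nearest(pair_scores, k=2)
```

With `min`, one close file in the history is enough to make an access look normal; `ave` and k-nearest require broader support from the history.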
CMVC Source Code Management System
◦  Log data: [user, timestamp, action, resource]
For evaluation we used 1 year of log data
◦  ~512k unique files
◦  ~133k unique directories
◦  ~2k users
◦  1 period to bootstrap, 10 to train, 1 to test.
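A log in the [user, timestamp, action, resource] form can be bucketed into per-user, per-period access sets before scoring. The sketch below is illustrative only: the slides do not give the period length, so `period_days=30` is an assumption, and `bucket_accesses` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, datetime, str, str]  # (user, timestamp, action, resource)


def bucket_accesses(records: Iterable[Record], start: datetime,
                    period_days: int = 30) -> Dict[Tuple[str, int], Set[str]]:
    """Group accessed resources by (user, period index).

    period_days is an assumed value; the slides do not state the period length.
    """
    periods: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for user, timestamp, _action, resource in records:
        index = (timestamp - start).days // period_days
        periods[(user, index)].add(resource)
    return periods


start = datetime(2020, 1, 1)
records = [
    ("alice", datetime(2020, 1, 5), "checkout", "/proj/a/x.c"),
    ("alice", datetime(2020, 1, 9), "checkout", "/proj/a/x.c"),
    ("alice", datetime(2020, 2, 15), "checkout", "/proj/b/y.c"),
    ("bob", datetime(2020, 1, 7), "view", "/proj/a/x.c"),
]
buckets = bucket_accesses(records, start)

# Split from the slide: 1 period to bootstrap, 10 to train, 1 to test.
bootstrap_periods = [0]
train_periods = list(range(1, 11))
test_periods = [11]
```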
Check a user's current access against their history
Simple
◦  Easy to understand
◦  Detects deviations from past behavior
This can catch an impetuous attacker.
A patient adversary can seed file accesses in previous time periods to affect distance-based similarity scores
Gives a relation of expected behavior across all profiles.
◦  A malicious user can only affect their own history.
[Figure: matrix of scores with one row per user (user1, user2, …, userN); each row holds the entries u1Score, u2Score, …, uNScore]
Features and descriptions
Unique File Count: Main technique currently used in practice
New Unique File Count: Binary method, new unique files in window
Average Similarity Score: LogLCA self score values, in [0,1]
Sum Similarity Score: LogLCA sum score values
Mean Distance:
◦  Find a single point to summarize previous periods over the similarity-between-user features.
◦  Use cosine similarity to find the distance between the current point and the expected point.
Mean Distance * New Unique: Since the goal is to detect theft of files, and mean distance has no feature representing the number of files accessed, we multiply the mean distance by the number of new unique files.
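The Mean Distance feature and its product with the new unique file count can be sketched as below. This is a hedged illustration: it assumes the "single point" is the component-wise mean of the previous periods' feature vectors and that distance is one minus cosine similarity. The function names and the sample vectors are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mean_point(vectors: Sequence[Sequence[float]]) -> list:
    """Component-wise mean of the previous periods' feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_distance_features(previous: Sequence[Sequence[float]],
                           current: Sequence[float],
                           new_unique: int) -> Tuple[float, float]:
    """Return (Mean Distance, Mean Distance * New Unique)."""
    distance = cosine_distance(current, mean_point(previous))
    return distance, distance * new_unique


# Each vector holds one user's scores against the profiles of two users.
previous = [[1.0, 0.0], [1.0, 0.0]]
shifted = mean_distance_features(previous, [0.0, 1.0], new_unique=5)
unchanged = mean_distance_features(previous, [2.0, 0.0], new_unique=5)
```

Cosine distance ignores vector magnitude, which is why the slide multiplies by the number of new unique files: the product grows only when behavior both shifts direction and touches many new files.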
No ground truth data for malicious behavior
Generate simulated attacks by injecting directories
◦  Represents targeted attacks on specific data
Three size ranges for the injection
◦  500-1000 : 10 unique attacks
◦  1000-2500 : 12 unique attacks
◦  5000+ : 2 unique attacks
Inject in two ways
◦  Impetuous Attacker : Inject X accesses in current period
◦  Patient Attacker : Seed the current user's history with files from the injection, then inject
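The two injection modes can be sketched as simple list operations. This is an assumption-laden illustration, not the authors' simulator: `inject_impetuous`, `inject_patient`, and the `seed_count` parameter are hypothetical names, and the sketch assumes the patient attacker's final injection still contains the full set of target files.

```python
from typing import List, Sequence, Tuple


def inject_impetuous(history: Sequence[str], current: Sequence[str],
                     attack_files: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Impetuous attacker: add all attack accesses to the current period only."""
    return list(history), list(current) + list(attack_files)


def inject_patient(history: Sequence[str], current: Sequence[str],
                   attack_files: Sequence[str],
                   seed_count: int) -> Tuple[List[str], List[str]]:
    """Patient attacker: seed the history with some attack files, then inject."""
    seeded_history = list(history) + list(attack_files[:seed_count])
    return seeded_history, list(current) + list(attack_files)


history = ["/proj/a/x.c", "/proj/a/y.c"]
current = ["/proj/a/x.c"]
attack = ["/secret/d1/f1", "/secret/d1/f2", "/secret/d1/f3"]

imp_history, imp_current = inject_impetuous(history, current, attack)
pat_history, pat_current = inject_patient(history, current, attack, seed_count=1)
```

Because the patient attacker's seeded files sit in the history, a purely self-referential score sees the later injection as close to past behavior, which is the weakness the cross-user features are meant to address.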
Similarity scores may help in communicating events
◦  Better detection of truly anomalous activity
–  Go beyond simple file counts
–  Create a ranking of the most anomalous users
◦  Better understanding of what is causing the score
–  Ranking the files that a user is accessing
–  Allows an incident response team to more quickly understand why a user received a high score
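Both rankings mentioned above reduce to sorting by score. The sketch below assumes that a higher score means more anomalous; the function names and the sample scores are hypothetical and not taken from the evaluation.

```python
from typing import Dict, List, Tuple


def rank_users(user_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Users ordered from most to least anomalous (higher score first)."""
    return sorted(user_scores.items(), key=lambda item: item[1], reverse=True)


def top_files(file_scores: Dict[str, float], limit: int = 3) -> List[Tuple[str, float]]:
    """The files contributing the highest scores for a single user."""
    ranked = sorted(file_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


user_scores = {"alice": 0.12, "bob": 0.87, "carol": 0.40}
bob_files = {"/secret/d1/f1": 0.9, "/proj/a/x.c": 0.0, "/secret/d1/f2": 0.8}

user_ranking = rank_users(user_scores)
bob_explanation = top_files(bob_files, limit=2)
```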
Explored using file similarity features to
identify malicious insiders
Evaluated with real access logs and synthetic
attacks