PREDICTING ZERO-DAY
SOFTWARE VULNERABILITIES
THROUGH DATA MINING
Su Zhang
Department of Computing and Information Science
Kansas State University
OUTLINE
Motivation.
Related work.
Proposed approach.
Possible techniques.
Plan.

THE TREND OF VULNERABILITY NUMBERS
ZERO-DAY VULNERABILITY


What is a zero-day vulnerability?
A vulnerability that underground hackers find before it is made public.
Increasing threat from zero-day vulnerabilities.
Many attacks are attributed to zero-day vulnerabilities.
E.g. in 2010 Microsoft confirmed a vulnerability in Internet Explorer that affected some versions released in 2001.
OUR GOAL

Risk awareness. A comprehensive risk assessment for enterprise networks must consider the possibility of zero-day vulnerabilities.
ENTERPRISE RISK ASSESSMENT FRAMEWORK
PROBLEM
Predict information about zero-day vulnerabilities from software configurations.
RELATED WORK

O. H. Alhazmi and Y. K. Malaiya, 2005.

Andy Ozment, 2007.

Kyle Ingols, et al, 2009.

Miles A. McQueen, et al, 2009.
PROPOSED APPROACH


Predict the likelihood of zero-day vulnerabilities for specific software applications.
Data source: NVD (National Vulnerability Database)
  Available since 2002.
  Rich data source that includes the preconditions and consequences of vulnerabilities. It can be used both to build our model and to validate our work.
SYSTEM ARCHITECTURE
Target machine (IE, WinXP, Firefox, …)
-> Scanner (e.g. Nessus or OVAL)
-> CPE (Common Platform Enumeration)
-> Our prediction model
-> Output (MTTNV and CVSS metrics)
PREDICTION MODEL

Predictive data: CPE (Common Platform Enumeration)
  Indicates the software configuration on a host.
Predicted data: MTTNV (Mean Time to Next Vulnerability) and CVSS metrics
  MTTNV indicates the probability of zero-day vulnerabilities.
  CVSS metrics indicate the properties of the predicted vulnerabilities.
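
As a rough illustration of the predicted quantity, MTTNV for one product could be estimated from past disclosure dates as the mean gap between consecutive disclosures. This is only a sketch of one plausible reading of "mean time to next vulnerability"; the function name and the sample dates are hypothetical and are not taken from NVD.

```python
from datetime import date


def mean_time_to_next_vulnerability(disclosure_dates):
    """Return the mean gap in days between consecutive disclosures.

    Returns None when fewer than two dates are available, because no
    gap can be measured in that case.
    """
    ordered = sorted(disclosure_dates)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical disclosure dates for a single product (not real NVD data).
sample_dates = [date(2008, 1, 10), date(2008, 3, 10), date(2008, 4, 9)]
mttnv_days = mean_time_to_next_vulnerability(sample_dates)
print(mttnv_days)
```

A shorter MTTNV would suggest that a new, still unknown vulnerability is more likely to exist in the near term.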

CPE (COMMON PLATFORM ENUMERATION)

What is CPE?
CPE is a structured naming scheme for information technology systems, software, and packages.
Example (in primitive format):
cpe:/a:acme:product:1.0:update2:pro:en-us
Professional edition of the "Acme Product 1.0 Update 2 English".
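
To show how such a name could feed a model, the sketch below splits a CPE name into its components. It assumes the CPE 2.2 URI layout (part, vendor, product, version, update, edition, language); the helper name `parse_cpe_uri` is hypothetical, and percent-encoded characters are not handled.

```python
# Component order of a CPE 2.2 URI: cpe:/part:vendor:product:version:update:edition:language
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe_uri(uri):
    """Split a CPE 2.2 URI into a dict of named components.

    Trailing components that are omitted in the URI come back as empty strings.
    """
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError("not a CPE 2.2 URI: " + uri)
    values = uri[len(prefix):].split(":")
    if len(values) > len(CPE_FIELDS):
        raise ValueError("too many components: " + uri)
    values += [""] * (len(CPE_FIELDS) - len(values))
    return dict(zip(CPE_FIELDS, values))


parsed = parse_cpe_uri("cpe:/a:acme:product:1.0:update2:pro:en-us")
print(parsed["vendor"], parsed["product"], parsed["version"])
```

The part component is "a" for applications, "o" for operating systems, and "h" for hardware, so each component can serve as a discrete input feature.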
CPE LANGUAGE
CVSS (COMMON VULNERABILITY SCORING SYSTEM)
An open framework for communicating the characteristics and impacts of IT vulnerabilities.
Metric vector:
  access complexity (H, M, L)
  authentication (R, NR)
  confidentiality (N, P, C)
  ...
CVSS score: calculated from the metric vector above. It indicates the severity of a vulnerability.
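
As an illustration of how a score follows from a metric vector, here is a minimal sketch that assumes the CVSS version 2 base equations; the slides do not say which CVSS version is used. Note that in version 2 the authentication metric takes the values M, S, and N rather than the R and NR listed above. The function name is hypothetical.

```python
# Numeric weights from the CVSS v2 base metric tables.
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}
IMPACT_WEIGHT = {"N": 0.0, "P": 0.275, "C": 0.660}


def cvss2_base_score(vector):
    """Compute the CVSS v2 base score from a vector such as 'AV:N/AC:L/Au:N/C:P/I:P/A:P'."""
    metrics = dict(item.split(":") for item in vector.split("/"))
    impact = 10.41 * (
        1
        - (1 - IMPACT_WEIGHT[metrics["C"]])
        * (1 - IMPACT_WEIGHT[metrics["I"]])
        * (1 - IMPACT_WEIGHT[metrics["A"]])
    )
    exploitability = (
        20
        * ACCESS_VECTOR[metrics["AV"]]
        * ACCESS_COMPLEXITY[metrics["AC"]]
        * AUTHENTICATION[metrics["Au"]]
    )
    # The score is forced to zero when the vulnerability has no impact at all.
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)


print(cvss2_base_score("AV:N/AC:L/Au:N/C:P/I:P/A:P"))
```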
CVSS USED IN RISK ASSESSMENT


We use CVSS to derive a conditional probability: how likely a vulnerability is to be exploited successfully, given that all of its preconditions are fulfilled.
By combining these conditional probabilities with an attack graph, we can calculate cumulative probabilities and obtain an overall estimate of how likely a given machine is to be compromised.
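
A toy sketch of this combination is shown below. It is not the calculation used in this work: the mapping from access complexity to probability is hypothetical, all events are assumed independent, steps that must all succeed are multiplied, and alternative attack paths are combined as "at least one succeeds".

```python
from math import prod

# Hypothetical mapping from CVSS access complexity to exploit success probability.
AC_TO_PROBABILITY = {"L": 0.9, "M": 0.6, "H": 0.2}


def all_succeed(probabilities):
    """Probability that every step succeeds, assuming independence."""
    return prod(probabilities)


def any_succeeds(probabilities):
    """Probability that at least one alternative succeeds, assuming independence."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Toy attack graph: the web server can be compromised through either of two
# vulnerabilities; the database host additionally requires exploiting a third one.
web_server = any_succeeds([AC_TO_PROBABILITY["L"], AC_TO_PROBABILITY["H"]])
database_host = all_succeed([web_server, AC_TO_PROBABILITY["M"]])
print(web_server, database_host)
```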
POSSIBLE TECHNIQUES



Linear regression (inputs are continuous variables).
Statistical classification (inputs are discrete variables).
Maximum likelihood and least squares (for determining the parameters of our model).
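
For a sense of the parameter-fitting step, the following is a minimal least-squares fit of a line with one input variable. Under the usual assumption of Gaussian noise, the maximum likelihood estimate of the line coincides with this least-squares solution. The variable names and data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: product age in years vs. observed days to the next vulnerability.
ages = [1.0, 2.0, 3.0, 4.0]
days_to_next = [30.0, 50.0, 70.0, 90.0]
slope, intercept = fit_line(ages, days_to_next)
predicted = slope * 5.0 + intercept  # extrapolate to a five-year-old product
print(slope, intercept, predicted)
```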
VALIDATION METHODOLOGY

Earlier years of NVD: build our model.
Later years of NVD: validate our model.
Criterion: our estimates are closer to the actual values than estimates made without considering zero-day vulnerabilities.
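
The split by time could look like the sketch below, which holds out later years for validation and measures error against the observed values. The record layout, the cutoff year, and the placeholder "model" (predicting the training mean) are all hypothetical stand-ins.

```python
def split_by_year(records, cutoff_year):
    """Put records before cutoff_year into training, the rest into validation."""
    train = [r for r in records if r["year"] < cutoff_year]
    validation = [r for r in records if r["year"] >= cutoff_year]
    return train, validation


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observed values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical per-year MTTNV observations (not real NVD data).
records = [
    {"year": 2003, "mttnv": 40.0},
    {"year": 2004, "mttnv": 50.0},
    {"year": 2008, "mttnv": 60.0},
    {"year": 2009, "mttnv": 70.0},
]
train, validation = split_by_year(records, cutoff_year=2006)

# Placeholder model: always predict the mean MTTNV seen in the training years.
training_mean = sum(r["mttnv"] for r in train) / len(train)
error = mean_absolute_error(
    [training_mean] * len(validation), [r["mttnv"] for r in validation]
)
print(training_mean, error)
```

The same error measure can then be computed for an estimate that ignores zero-day vulnerabilities, and the two errors compared.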
PLAN

Next phase: study data-mining tools (e.g. Support Vector Machines), then build our prediction model.
  Validate the model on NVD.
Final phase:
  If the previous phase provides a good model, we will incorporate the generated results into MulVAL.
  Otherwise, we will investigate why the model falls short.
REFERENCES











[1] Andrew Buttner et al., “Common Platform Enumeration (CPE) – Specification,” 2008.
[2] NVD, http://nvd.nist.gov/home.cfm.
[3] O. H. Alhazmi et al., “Modeling the Vulnerability Discovery Process,” 2005.
[4] Omar H. Alhazmi et al., “Prediction Capabilities of Vulnerability Discovery Models,” 2006.
[5] Andy Ozment, “Improving Vulnerability Discovery Models,” 2007.
[6] R. Gopalakrishna and E. H. Spafford, “A trend analysis of vulnerabilities,” 2005.
[7] Christopher M. Bishop, “Pattern Recognition and Machine Learning,” 2006.
[8] Xinming Ou et al., “MulVAL: A logic-based network security analyzer,” 2005.
[9] Kyle Ingols et al., “Modeling Modern Network Attacks and Countermeasures Using Attack Graphs,” 2009.
[10] Miles A. McQueen et al., “Empirical Estimates and Observations of 0Day Vulnerabilities,” 2009.
[11] Alex J. Smola et al., “A Tutorial on Support Vector Regression,” 1998.
THANK YOU!
Questions & Answers