A Journey Towards Rigorous Cybersecurity Experiments

Michel Cukier
Director, Advanced Cybersecurity Experience for Students
Associate Director for Education, Maryland Cybersecurity Center
Associate Professor, Reliability Engineering
Outline
•  Security data from other organizations
•  In-house security datasets
–  Incidents
–  Intrusion prevention system (IPS) alerts
–  Network flows
–  Honeypot data
•  Criminological theories:
–  Routine activity theory – IPS alerts
–  Rational choice theory – Honeypot data
Availability of Security Data
•  NSF helped initiate collaborations, but none succeeded (2001)
•  The few available datasets have issues (e.g., MIT LL 98/99)
•  NSF workshop on the lack of available data
(2010)
•  DHS PREDICT dataset
Outline
•  Security data from other organizations
•  In-house security datasets
–  Incidents
–  Intrusion prevention system (IPS) alerts
–  Network flows
–  Honeypot data
•  Criminological theories:
–  Routine activity theory – IPS alerts
–  Rational choice theory – Honeypot data
A Rare Collaboration
•  Unique relationship with
–  G. Sneeringer, Director of Security, and his security team at the Division of Information Technology
•  Access to security-related data collected on the UMD network
•  Development of testbeds for monitoring attackers
⇒ Enables unique empirical studies
Incident Data
•  Incidents:
–  Confirmed compromised computers
–  More than 12,000 records since June 2001
•  Models:
–  Software reliability growth models, time series,
epidemiological models
•  Questions:
–  # incidents: relevant metric?
–  Impact of time (age, duration)?
Timeline of Incident Data (top 10)
[Figure: timeline of the top 10 incident types]
Software Reliability Growth Models
•  We applied the following models to our data:
–  Goel-Okumoto (G-O)
–  S-Shaped
–  K-Stage (K = 3)
–  Duane model
•  These models were selected because they are often seen in the literature
•  We compared the model fit values to our data using Pearson’s chi-square test:

$\chi^2 = \sum_{i=1}^{T} (o_i - e_i)^2 / e_i$

where $o_i$ and $e_i$ are the observed and expected counts in interval $i$, and $T$ is the number of intervals
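To make the procedure concrete, here is a minimal sketch (not the original analysis code) that fits the G-O mean value function m(t) = a(1 − e^(−bt)) to cumulative incident counts and scores the fit with the chi-square statistic above; the data and parameter values are synthetic placeholders, not the UMD incident data.

```python
# Minimal sketch, assuming synthetic data: fit the Goel-Okumoto model to
# cumulative incident counts and score the fit with Pearson's chi-square.
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    """G-O mean value function: expected cumulative incidents by time t."""
    return a * (1.0 - np.exp(-b * t))

# Illustrative data: weeks since monitoring began, cumulative incident counts.
t = np.arange(1, 53, dtype=float)
observed = 1200 * (1 - np.exp(-0.04 * t)) + np.random.default_rng(0).normal(0, 5, t.size)

# Estimate (a, b) by least squares, then compute chi^2 = sum (o_i - e_i)^2 / e_i.
params, _ = curve_fit(goel_okumoto, t, observed, p0=(observed[-1], 0.01))
expected = goel_okumoto(t, *params)
chi_square = np.sum((observed - expected) ** 2 / expected)
print(f"a = {params[0]:.1f}, b = {params[1]:.4f}, chi-square = {chi_square:.1f}")
```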
Chi-square Fit Values
Incidents            G-O        S-Shaped   K-Stage     Duane
All data             797,069    333,191    213,386     774,200
worm_msblast         496        3,049      10,804      3,807
virus_generic_bot    21,721     38,239     177,952     43,025
virus_klez           15,452     125,272    647,881     8,914
bagle_worm           10,975     15,115     25,594      52,068
irc_bot              25,486     251,787    7,966,766   17,292
virus_agobot         30,996     24,568     109,530     59,942
rogue_ftp            37,277     38,573     1,104,622   41,978
nethicsreq           21,852     12,561     689,714     24,848
spamrelay            57,250     13,724     2,997,065   10,662
worm_nachi           826        9,174      71,563      2,630
Observations: “irc bot” Example
[figure]
Forecasting and Splitting the Data
•  An application of the models might be to make
predictions or forecasts
•  It makes sense to evaluate the models based on
predictive abilities
•  We split the data (by time) into two sets:
–  “Training” (parameter estimation) set—first 2/3 of the
data
–  “Testing” (forecast validation) set—remaining 1/3 of the
data
•  Compare the forecasted and actual values using the
same chi-square measure as before
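A minimal sketch of this split-and-forecast evaluation follows, again on synthetic data rather than the UMD incidents; the G-O model stands in for all four models.

```python
# Minimal sketch: estimate parameters on the first 2/3 of the data ("training"),
# forecast the remaining 1/3 ("testing"), and score the forecast with the same
# chi-square measure. Synthetic data; the G-O model stands in for all four.
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    return a * (1.0 - np.exp(-b * t))

t = np.arange(1, 53, dtype=float)
observed = 1200 * (1 - np.exp(-0.04 * t)) + np.random.default_rng(0).normal(0, 5, t.size)

split = int(t.size * 2 / 3)  # first 2/3 for parameter estimation
params, _ = curve_fit(goel_okumoto, t[:split], observed[:split], p0=(observed[split - 1], 0.01))

forecast = goel_okumoto(t[split:], *params)  # forecast the final 1/3
chi_square_test = np.sum((observed[split:] - forecast) ** 2 / forecast)
print(f"testing chi-square = {chi_square_test:.1f}")
```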
Forecast Results
[Table: training-set (first 2/3) and testing-set (final 1/3) chi-square values for the G-O, S-shaped, K-stage, and Duane models, for all data and for each of the top 10 incident types; for all data, the training values are 820,773 (G-O), 151,563 (S-shaped), 187,151 (K-stage), and 162,992 (Duane)]
Model and Forecast for “spamrelay”
[figure]
Intrusion Prevention System (IPS) Data
•  Intrusion Prevention System (IPS) alerts:
–  IPSs located at the border and inside UMD
network
–  More than 7 million events since September 2006
•  Models:
–  Identify outliers, define metrics containing some
memory
•  In-house validation
Network Flows
•  Network flows:
–  130,000 IP addresses monitored (two class B
networks belonging to UMD)
•  Tool:
–  Goal: increase network visibility
–  Nfsight (available on sourceforge)
•  In-house validation
•  Next goal:
–  An efficient flow-based IDS
Backend Algorithm

Example (Host 1 = client 10.0.0.1 connecting to tcp/80; Host 2 = server 10.1.2.3 hosting tcp/80):
Request flow: 2009-07-30 09:34:56.321 TCP 10.0.0.1:2455 → 10.1.2.3:80
Reply flow: 2009-07-30 09:34:56.322 TCP 10.1.2.3:80 → 10.0.0.1:2455
Bi-flow: 2009-07-30 09:34:56.321 TCP 10.0.0.1:2455 → 10.1.2.3:80

Algorithm:
•  Receive a batch of 5 minutes of flows
•  Pair up unidirectional flows using {src/dst IP/port and protocol} (sketched below)
•  Run heuristics and calculate probabilities for each end point to host a service
•  Output end point results and bidirectional flows
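A hedged sketch of the pairing step: match unidirectional flows whose 5-tuples mirror each other within a batch, treating the earlier flow as the request. The Flow type and field names are illustrative assumptions, not Nfsight’s actual internals.

```python
# Sketch of flow pairing, assuming an illustrative Flow record (not Nfsight's
# actual data structures): two unidirectional flows form a bi-flow when their
# {protocol, src/dst IP, src/dst port} tuples are mirror images.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    ts: float       # start timestamp (seconds)
    proto: str      # e.g. "TCP"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def pair_flows(batch):
    """Return (request, reply) pairs found in one 5-minute batch."""
    by_key = {}
    for f in batch:
        by_key.setdefault((f.proto, f.src_ip, f.src_port, f.dst_ip, f.dst_port), []).append(f)
    pairs = []
    for f in batch:
        mirror = (f.proto, f.dst_ip, f.dst_port, f.src_ip, f.src_port)
        for r in by_key.get(mirror, []):
            if f.ts < r.ts:  # the earlier flow is treated as the request
                pairs.append((f, r))
    return pairs

batch = [
    Flow(1248946496.321, "TCP", "10.0.0.1", 2455, "10.1.2.3", 80),
    Flow(1248946496.322, "TCP", "10.1.2.3", 80, "10.0.0.1", 2455),
]
print(pair_flows(batch))  # one (request, reply) pair, as in the example above
```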
Heuristics

Heuristic ID   Features and Formula Used                   Output Values
Timing:
Heuristic 0    Timestamp of request < timestamp of reply   [0, …]
Port numbers:
Heuristic 1    Src port > Dst port                         {0, 0.5, 1}
Heuristic 2    Src port > 1024 > Dst port                  {0, 0.5, 1}
Heuristic 3    Port in /etc/services                       {0, 0.5, 1}
Fan in/out relationships:
Heuristic 4    # ports related                             [0, …]
Heuristic 5    # IP related                                [0, …]
Heuristic 6    # tuples related                            [0, …]
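As an illustration, heuristics 1–3 might be implemented as below; the exact scoring rules and the /etc/services lookup are assumptions, with the table’s {0, 0.5, 1} encoding read as 1 = evidence the destination hosts a service, 0 = evidence the source does, 0.5 = undecided.

```python
# Illustrative implementations of heuristics 1-3 (assumed scoring rules):
# 1.0 = destination looks like the server, 0.0 = source does, 0.5 = undecided.
def h1_src_gt_dst(src_port: int, dst_port: int) -> float:
    if src_port > dst_port:
        return 1.0
    return 0.0 if src_port < dst_port else 0.5

def h2_ephemeral_to_privileged(src_port: int, dst_port: int) -> float:
    if src_port > 1024 > dst_port:
        return 1.0
    return 0.0 if dst_port > 1024 > src_port else 0.5

WELL_KNOWN = {22, 25, 53, 80, 443}  # stand-in for parsing /etc/services

def h3_known_port(src_port: int, dst_port: int) -> float:
    if dst_port in WELL_KNOWN and src_port not in WELL_KNOWN:
        return 1.0
    return 0.0 if src_port in WELL_KNOWN and dst_port not in WELL_KNOWN else 0.5

# The example bi-flow (10.0.0.1:2455 -> 10.1.2.3:80) scores 1.0 on all three.
print(h1_src_gt_dst(2455, 80), h2_ephemeral_to_privileged(2455, 80), h3_known_port(2455, 80))
```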
Front-end
[figure]
Case Study: Scanning Activity
[figure]
Case Study: Worm Outbreak
[figure]
Case Study: Distributed Attacks
[figure]
Honeypot (HP) Data
•  Honeypot data:
–  Malicious activity collected on more than 1,200
HPs (low and high interaction)
–  Low interaction HPs deployed at UIUC, AT&T,
PJM, France and Morocco
–  High interaction HPs for study of attacks/attackers
Details of Experiment
•  Easy access to honeypots through entry point: SSH
•  Multiple honeypots per attacker for an extended period
of time: one month
•  Configure honeypots given to one attacker with
increasing network limitations: some ports blocked
•  Collect data such as network traffic, keystrokes entered
and rogue software downloaded
Configuration Details
•  The network gateway has two network interfaces:
–  One in front of the Internet, configured with 40 public IP
addresses from the University of Maryland
–  One configured with a private IP address
•  OpenSSH was modified to reject SSH attempts on its public IP addresses until the 150th try (gating idea sketched after this slide)
•  Up to 40 honeypots can exist in parallel
•  Each attacker can be given up to 3 honeypots
•  Honeypots:
–  HP1: no network limitation
–  HP2: main IRC port blocked (port 6667)
–  HP3: every port blocked except HTTP, HTTPS, FTP,
DNS, and SSH
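A conceptual sketch of that gating idea follows; it is not the actual OpenSSH patch, and whether attempts were counted per source or per honeypot address is not stated in the slides (this sketch counts per honeypot address). Only the 150-attempt threshold comes from the slides.

```python
# Conceptual sketch (not the actual OpenSSH modification): reject SSH attempts
# against a public honeypot address until the 150th try, so only persistent
# brute-force scanners get in. Counting per honeypot address is an assumption.
from collections import defaultdict

ATTEMPT_THRESHOLD = 150
attempts = defaultdict(int)  # public honeypot IP -> attempts seen so far

def allow_attempt(honeypot_ip: str) -> bool:
    attempts[honeypot_ip] += 1
    return attempts[honeypot_ip] >= ATTEMPT_THRESHOLD

# The first 149 attempts are rejected; the 150th is allowed through.
results = [allow_attempt("198.51.100.7") for _ in range(150)]
print(results.count(False), results.count(True))  # 149 1
```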
Attacker Identification
•  Attacker IP address
•  Attacker AS number (identifies network on
the Internet)
•  Attacker actions:
–  Rogue software origin
–  Way of performing specific actions
–  Files accessed
•  Comparison of keystroke profiles
Attacker Skills
•  An analyst assesses attacker skill
•  This approach was preferred because it is easier to reproduce
•  Criteria based on:
–  Is the attacker careful about not being seen?
–  Does the attacker check the target environment?
–  How familiar is the attacker with the rogue
software?
–  Is the attacker protecting the compromised target?
Attacker Skills (Cont.)
Criterion                  Assessment
Hide                       Ratio of # sessions where the attacker hid
Restore deleted files      Ratio of # sessions where deleted files were restored
Check presence             Ratio of # sessions where presence was checked
Delete downloaded file     0 if downloaded file is not deleted, 1 otherwise
Check system               0 if system has never been checked, 1 otherwise
Edit configuration file    0 if configuration file has never been edited, 1 otherwise
Change system              0 if system has never been modified, 1 otherwise
Change password            0 if password has never been changed, 1 otherwise
Create new user            0 if no new user has been created, 1 otherwise
Rogue software adequacy    0 if less than half of the installed rogue software is adequate, 1 otherwise
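The slides do not say how the ten criteria are combined; as a hypothetical illustration, an equally weighted mean over the ratio-valued and binary criteria could yield a single skill score:

```python
# Hypothetical skill score: equally weighted mean of the ten criteria above
# (ratio criteria contribute their ratio, binary criteria contribute 0 or 1).
# The aggregation rule is an assumption for illustration only.
def skill_score(criteria: dict) -> float:
    """criteria maps criterion name -> value in [0, 1]."""
    return sum(criteria.values()) / len(criteria)

attacker = {
    "hide": 0.8,                      # hid in 80% of sessions
    "restore_deleted_files": 0.2,
    "check_presence": 1.0,
    "delete_downloaded_file": 1,
    "check_system": 1,
    "edit_configuration_file": 0,
    "change_system": 1,
    "change_password": 1,
    "create_new_user": 0,
    "rogue_software_adequacy": 1,
}
print(f"skill score: {skill_score(attacker):.2f}")  # 0.70
```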
Overall Results
•  Experiment run from May 17th, 2010 to November
5th, 2010
Honeypot   # sessions   # non-empty sessions
All        312          211 (68%)
HP1        160          110 (69%)
HP2        105          74 (70%)
HP3        47           27 (57%)
•  Results:
–  95% check presence or system
–  79% delete downloaded file
–  77% change the password
–  15% create a new user
[Bar chart: percentage of attackers meeting each criterion ID (1–10) across all honeypots; values include 95%, 95%, 79%, 77%, 59%, 56%, 49%, 46%, 21%, and 15%]
Analysis as a Function of Attacker Skill
•  There might be a link between attackers’ actions and their skills
Collaboration Among Attackers?
•  For the 60 deployed honeypots, 9 (15%) were targeted by more than one attacker
•  7 honeypots were targeted by 2 different attackers, one honeypot by 3 different attackers, and 1 honeypot by 5 different attackers
•  Raises the important issue of how access is shared and why
•  Even though 77% of the attackers changed the password, 15% did share access with at least 1 other attacker
[Bar chart: average number of attackers per honeypot type (HP1, HP2, HP3, All); values include 1.44, 1.20, 1.18, and 1]
Challenges
•  Generalization?
–  Replication (same method)
–  Reproduction (different method)
–  Re-analysis of data
•  Issues:
–  Need collaborations for replication
–  Need to develop a new method for reproduction
–  Re-analysis might not be possible
Outline
•  Security data from other organizations
•  In-house security datasets
–  Incidents
–  Intrusion prevention system (IPS) alerts
–  Network flows
–  Honeypot data
•  Criminological theories:
–  Routine activity theory – IPS alerts
–  Rational choice theory – Honeypot data
Theories from Social Sciences to Add Science to Cybersecurity
•  For the past several years:
–  Focus on criminological theories
–  Collaboration with criminologists
•  Consider various criminological theories
•  Identify theories that need to be adapted to
cybersecurity
Routine Activity Theory
•  Crime is normal and depends on the
opportunities available
•  For a crime to be committed, three elements must converge:
–  Motivated offender
–  Suitable target
–  Lack of capable guardian
•  Guardian: does not need to be a person,
could be an object or environmental design
Use of IPS Alerts
•  Application of Routine Activity Theory:
–  Offender: attacker launching the attack
–  Target: UMD user
–  Guardian: IPS
•  Alerts = Attack attempts (blocked by IPS)
•  Results:
–  Number of alerts is linked to daily activity
–  Origin of attack is linked to user origin
Rational Choice Theory
•  Assumes offenders are rational people who
seek to maximize their pleasure and
minimize their pain
•  Focuses on offenders as rational decision
makers calculating where their self-interest
lies
–  Choice to engage in crime
–  Aspects of effective punishment
Rational Choice Theory and Deterrence Theory
•  Rational choice theory recognizes the
balance between costs and benefits in
determining criminal behaviors
•  Deterrence theory focuses mainly on formal sanctions (costs) and individuals’ fear of these sanctions
Deterrence Aspects
•  Warning: banner announcing surveillance
“* * * WARNING * * * Unauthorized access to this computer is in violation of Md. Annotated Code, Criminal Law Article sections 8-606 and 7-302 and the Computer Fraud and Abuse Act, 18 U.S.C. sections 1030 et seq. …”
•  Surveillance: surveillance and monitoring tools
Research Question
•  Are computer focused crimes (i.e., after an
attacker gains unauthorized access to a
computer, the use of this computer to launch
an attack towards an external target)
impacted by a surveillance warning banner
and/or surveillance tools?
Experimental Design
•  Provide access to the honeypots through a
frequently scanned and vulnerable entry point: a
modified SSH service in our case
•  Provide one or more honeypots for every attacker
and for an extended period of time
•  Randomly assign a honeypot configuration to each attacker to analyze the impact of the configuration on the observed crimes (see the sketch after this list)
•  Collect relevant data to characterize the attacks
launched by the compromised hosts, such as
network activity and keystrokes
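A minimal sketch of the random-assignment step, assuming the four banner/processes combinations listed on the Honeypot Types slide below; the assignment mechanics are illustrative, not the deployed system:

```python
# Illustrative random assignment of a honeypot configuration to each attacker,
# using the four surveillance banner/processes combinations (types 0-3).
import random

HONEYPOT_TYPES = {
    0: {"banner": False, "processes": False},
    1: {"banner": True,  "processes": False},
    2: {"banner": False, "processes": True},
    3: {"banner": True,  "processes": True},
}

def assign_configuration(rng: random.Random) -> int:
    """Pick one of the four honeypot types uniformly at random."""
    return rng.choice(list(HONEYPOT_TYPES))

rng = random.Random(42)
print([assign_configuration(rng) for _ in range(8)])  # e.g. [0, 3, 1, ...]
```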
Surveillance Effect on the Attacks Launched by the Honeypots (Crime Focused)
•  Experimental Setup
–  300 public IP addresses
–  4 different honeypot types
•  Metrics
–  Number of crimes
–  Temporal distribution of the crimes
•  Datasets
–  Network flow records to identify crimes
–  Keystrokes to identify malicious software installations
Honeypot Types

Honeypot Type   Surveillance Banner   Surveillance Processes
0               No                    No
1               Yes                   No
2               No                    Yes
3               Yes                   Yes
Surveillance Banner
[figure]
Surveillance Processes
[figure]
Observations
•  Results from April 2012 to October 2013
•  2,914 honeypots deployed
•  611 crimes committed
Crime                       Instances   Honeypots
Reconnaissance Activities   389         79
Flooding Attacks            180         10
Brute force Attacks         40          12
Phishing Attacks            2           2
Results
•  64% of the honeypots with activity were used to build malicious attacks
•  Only 3% of the honeypots were used to commit at least one crime
Type     Honeypots deployed   At least one crime   Ratio Crime   Malicious Activity   Ratio Malicious
Type 0   710                  25                   0.04          456                  0.64
Type 1   763                  17                   0.02          481                  0.63
Type 2   694                  26                   0.04          448                  0.65
Type 3   747                  18                   0.02          474                  0.63
Total    2914                 86                   0.03          1859                 0.64
Deterrence Summary
[Diagram: brute-force attack → first attack session → second attack session → … → crimes]
•  Malicious activity (64% of the honeypots):
–  Warning and surveillance significantly reduce the duration of the sessions
–  Warning and surveillance impact whether commands are typed in the first session
•  Crimes (3% of the honeypots):
–  Deterrence has no effect
Issues
•  Mismatch between what criminological theories need and what HP data contain
•  Need statistically significant results (e.g., 6
months, over 120 HPs/week deployed, about
2900 HPs, 3700 sessions)
•  Experiments need to be deployed over a long
period of time: attacks/attackers might evolve
Some Good News
•  Empirical studies are solid scientific work
•  The approaches developed can be applied at other locations
•  Results do not need to be identical (e.g.,
crime varies between cities)
Acknowledgments
•  Gerry Sneeringer and the Div. IT Security
team
•  Danielle Chrun, Bertrand Sobesto, and Ed
Condon
•  David Maimon, Amy Sariti, Mariel Alper,
Teddy Wilson, and Alex Testa
•  Maryland Cybersecurity Center (MC2)