Slides - Trusted Infrastructure Workshop

Trustworthy Smart Grid Infrastructures

Threats, Challenges, and Countermeasures

Saman Zonouz

University of Miami

1

Cyber Security and Forensics Research Group

• www.4-n-6.org

– Founded in January 2012

• Research

– Real-time Smart-Grid Situational Awareness and Intrusion Tolerance

– Cyber-Physical Power Grid Malware Analysis and Intrusion Detection

– Host-based Intrusion Diagnosis and Root-Cause Forensics Analysis

• Collaborators

– UIUC, PSU, WSU, Purdue, FIU, EPFL, Google, AT&T Research, IBM Research, Qualcomm Research

• Sponsors (~$3M)

– NSF, ONR, ARPA-E, DOE, Fortinet, UM

2

Outline

• Smart Grid Infrastructures

• Potential Threats

• Countermeasures

• Conclusions

3

POWER GRID CRITICAL INFRASTRUCTURES

4

The Electric Grid Structure*

[Figure: power sources and distributed generation (DG) feed the transmission network (backbone) of inter-connected regional transmission network operators, which supplies the edge (distribution) networks serving power consumers]

*Jim Kurose, Networking Challenges for the Smart Grid, IIT Mumbai, 2013.

5

Traditional Power Grid

Generation, Transmission, and Distribution*

*PowerWorld Simulator.

6

Smart Grid Overview

7

Background: Smart-Grid

• Definition

– Efficiently, reliably, flexibly and sustainably monitor and control the generation, distribution and use of electricity

• Functional characteristics [DOE]

– Self-healing from power disturbance events

– Enabling active participation by consumers in demand response

– Operating resiliently against physical and cyber attacks

– Enabling new products, services, and markets

– Optimizing assets and operating efficiently

• Intelligent distributed measurement and control in the smart grid introduce new attack surfaces for complex cyber-physical intrusions!

– More accessibility from remote sites

– More remote system observability and controllability

– Harder to detect, analyze, and respond to attacks!

8

GUIDE TO INDUSTRIAL CONTROL SYSTEMS (ICS) SECURITY

This strategy includes firewalls, the use of demilitarized zones and intrusion detection capabilities throughout the ICS architecture. The use of several demilitarized zones in Figure 5-5 provides the added capability to separate functionalities and access privileges and has proved to be very effective in protecting large architectures comprised of networks with different operational mandates. Intrusion

Power Control Network*

Figure 5-5. CSSP Recommended Defense-In-Depth Architecture

*K. Stouffer, J. Falco, K. Scarfone, Guide to Industrial Control Systems Security, NIST.

5.5 General Firewall Policies for ICS

Once the defense-in-depth architecture is in place, the work of determining exactly what traffic should be allowed through the firewalls begins. Configuring the firewalls to deny all except for the traffic absolutely required for business needs is every organization’s basic premise, but the reality is much more difficult. Exactly what does “absolutely required for business” mean and what are the security impacts of allowing that traffic through? For example, many organizations considered allowing SQL traffic through the firewall as required for business for many data historian servers. Unfortunately, SQL was also the vector for the Slammer worm. Many important protocols used in the industrial world, such as HTTP, FTP, OPC/DCOM, EtherNet/IP, and MODBUS/TCP, have significant security vulnerabilities.

The remaining material in this section summarizes some of the key points from the CPNI Good Practice Guide on Firewall Deployment for SCADA and Process Control Networks [34] document.


9

Power System Monitoring and Control

• Observation/sensing

– Current transformers, voltage transformers, PMUs, etc.

– Measurement noise, incomplete sensors → state estimation

• Inputs: system topology, generation output, load

• Output: state vector (voltage phasors)

• Control (e.g., by operators/HMI servers)

– Relays, generation set points, PLC controllers
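The state-estimation step above (topology and measurements in, state vector out) can be sketched under the linearized DC power-flow model, where measurements relate linearly to bus voltage angles, z = Hx + e, and the weighted least-squares estimate is x̂ = (HᵀWH)⁻¹HᵀWz. The 3-bus network, line susceptances, measurement set, and noise level below are all hypothetical, and production estimators typically solve the nonlinear AC problem iteratively.

```python
import numpy as np

def wls_estimate(H, z, weights):
    """Weighted least-squares state estimate x_hat = (H^T W H)^-1 H^T W z."""
    W = np.diag(weights)
    G = H.T @ W @ H  # gain matrix
    return np.linalg.solve(G, H.T @ W @ z)

# Hypothetical 3-bus DC model: bus 1 is the angle reference (theta_1 = 0),
# so the state vector is x = [theta_2, theta_3] in radians.
b12, b13, b23 = 10.0, 5.0, 4.0  # assumed line susceptances (p.u.)
H = np.array([
    [-b12, 0.0],        # flow P12 = b12 * (theta_1 - theta_2)
    [0.0, -b13],        # flow P13 = b13 * (theta_1 - theta_3)
    [b23, -b23],        # flow P23 = b23 * (theta_2 - theta_3)
    [b12 + b23, -b23],  # injection P2 = b12*(th2 - th1) + b23*(th2 - th3)
])

x_true = np.array([-0.02, -0.05])
sigma = 0.001                          # assumed meter noise (p.u.)
rng = np.random.default_rng(0)
z = H @ x_true + rng.normal(0.0, sigma, size=H.shape[0])

# Equal-quality meters: weight each measurement by 1 / sigma^2.
x_hat = wls_estimate(H, z, weights=np.full(H.shape[0], 1.0 / sigma**2))
print(x_hat)
```

With four measurements and two unknown angles the system is overdetermined, which is what later makes residual-based bad-data checks possible.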

10

Cyber-Physical System Security

• Integrated cyber and physical components

– Potentially more catastrophic security incidents

• Security objectives: AIC (availability, integrity, confidentiality)

– Particularly in safety-critical infrastructures

• Control network security solutions

– Intrusion detection sensors

– Access control policy enforcement

– Incident auditing, logging, and analysis

Power Control Network

Cyber attack could blow up a generator

Targeting nuclear plants [Symantec]

11


Cyber-Physical Threat

[Figure: attack surfaces spanning the measurements, the control center, power applications, actuators, apps, and operators]

12

Lack of security auditing solutions (traceability)

LOG-FREE INSTANTANEOUS POST-ATTACK FORENSICS ANALYSIS

13

Problem Formulation

• Goal: once an attack occurs, determine how it happened

– Detection point: a malicious file modification

• e.g., a password file changed by Firefox

• Existing solution

– Heavy auditing and logging of individual incidents

• Problem: requires a large trusted zone for log storage, parsing, and analysis

– High runtime/logging performance overhead

– Storage requirements

– Slow log parsing and analysis (not real-time)

• Challenge

– Can a system/TCB provide root-cause analysis capability by design?

14

Solution

• Design/modify access control policies to facilitate post-intrusion root-cause analysis

– Log-Free and Instantaneous root-cause analysis

• Case study: automated modification of the SE-Linux modular targeted policy base

Access control policies: allow A read socket; allow D write sensitive_file;

[Figure: in the original SE-Linux policy, domain A transitions to domain D through either B or C; in the rewritten access policy, D is duplicated so that the path through B and the path through C end in separate copies of D]

Detection point: sensitive file modification by a process in domain D

Question: A → B → D or A → C → D?
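The duplication idea on this slide can be sketched at the graph level. In this sketch (our own simplification, not FloTracker's code), each predecessor of the detection domain D gets its own copy of D, so a detection in a copy identifies the path by walking back through unique predecessors. The copy names D_B and D_C are hypothetical, and a real rewrite would also have to emit new SE-Linux types and allow rules for each copy, which this sketch does not do.

```python
def rewrite_with_duplicates(edges, target):
    """Give each predecessor of `target` its own copy of the target domain.

    The copy for predecessor p is named f"{target}_{p}" (hypothetical naming).
    """
    return [(s, f"{target}_{s}") if d == target else (s, d) for s, d in edges]

def trace_back(edges, detected):
    """Walk predecessors back from the detection domain while they are unique.

    Returns (path, leftover_predecessors); the second item is non-empty when
    the walk stopped at an ambiguous step (or would revisit a domain).
    """
    path = [detected]
    while True:
        preds = [s for s, d in edges if d == path[-1]]
        if len(preds) != 1 or preds[0] in path:
            return list(reversed(path)), preds
        path.append(preds[0])

original = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
rewritten = rewrite_with_duplicates(original, "D")
print(trace_back(original, "D"))     # (['D'], ['B', 'C'])
print(trace_back(rewritten, "D_B"))  # (['A', 'B', 'D_B'], [])
```

Against the original policy the walk stops immediately with two candidate predecessors; against the rewritten one it recovers the full path without consulting any log.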

15

Solution Overview

Fig. 1. FloTracker block diagram [components: Policy Generation (off-line): Fact Collector (valid file types, system entries), original policy, preprocessing, Template Attack Graph Generator Module, System Specific Graph Modification Module, Attack Tree Generation Module (user constraints), Policy Modification Module, policy database; Root-Cause Analysis (on-line): detection information, root-cause database, Root-Cause Analysis Engine, attack path]

relation set t, FloTracker searches the template policy for all possible domain transitions. This step depends on the original policy. For example, in the SE-Linux policy used in our implementation, the following permissions must be granted to allow a domain transition from state s_t to state d_t by executing a file of type f_t [14]:

• allow s_t f_t : file { getattr execute };

• allow d_t f_t : file entrypoint;

• type_transition s_t f_t : process d_t;

• allow s_t d_t : process transition;

The first rule allows a process in domain s_t to execute (and get the attributes of) files of type f_t. The second rule makes files of type f_t an entrypoint to domain d_t, i.e., executing such a file is allowed to place the process in domain d_t.

According to the third rule, after executing a file of type f_t, a process in domain s_t will try to transition to domain d_t, and the last rule allows that domain transition. Based on the initial states $S_0$ and the state transition relation set t, FloTracker uses the depth-first search (DFS) algorithm to create the set of graphs $W = \{G_1(V_1, E_1), \dots, G_{N_e}(V_{N_e}, E_{N_e})\}$. Finally, the TAG is created as $G = G_1 \cup G_2 \cup \dots \cup G_{N_e}$.
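The two steps just described (find the domain transitions the four rules permit, then DFS from each initial state and take the union) can be sketched as follows. Rules are modeled as plain tuples rather than parsed policy text, the first rule is reduced to its `execute` permission, and the domain names, file types, and helper `transition_rules` are hypothetical.

```python
def transition_rules(s, f, d):
    """Hypothetical helper: the four rules that jointly permit s -> d via files of type f."""
    return {
        ("allow", s, f, "file", "execute"),
        ("allow", d, f, "file", "entrypoint"),
        ("type_transition", s, f, "process", d),
        ("allow", s, d, "process", "transition"),
    }

def domain_transitions(rules):
    """Edges (s, d) for which every rule needed by the transition is present."""
    edges = set()
    for rule in rules:
        if rule[0] != "type_transition":
            continue
        _, s, f, _cls, d = rule
        if transition_rules(s, f, d) <= rules:
            edges.add((s, d))
    return edges

def dfs_graph(edges, start):
    """Vertices and edges reachable from `start` (iterative depth-first search)."""
    adj = {}
    for s, d in edges:
        adj.setdefault(s, []).append(d)
    seen, stack, sub_edges = {start}, [start], set()
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            sub_edges.add((u, v))
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen, sub_edges

rules = set()
for s, f, d in [("A", "b_exec", "B"), ("A", "c_exec", "C"),
                ("B", "d_exec", "D"), ("C", "d_exec", "D")]:
    rules |= transition_rules(s, f, d)
# An incomplete transition: type_transition without the matching allow rules.
rules.add(("type_transition", "B", "x_exec", "process", "X"))

edges = domain_transitions(rules)
S0 = ["A"]  # assumed initial domains
W = [dfs_graph(edges, s0) for s0 in S0]
tag_vertices = set().union(*(v for v, _ in W))
tag_edges = set().union(*(e for _, e in W))
print(sorted(tag_edges))
```

The B to X entry is dropped because its allow rules are missing, which mirrors the requirement that all four rules be present.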

System Specific Attack Graph Generation. The template attack graph (TAG) turns out to be large for a given target system; therefore, FloTracker extracts and uses system-specific information to further refine the graph. The original policy can be general, designed to cover a wide range of software, whereas a given host has only a limited set of software installed. The parts of the policy that cover software not installed on the host can be removed from the TAG by removing the related domains. To this end, FloTracker creates a set of valid states $S_V \subseteq S$. To find $S_V$, FloTracker needs a relationship between the system-specific information and all domains defined in the original policy. For example, in our SE-Linux implementation, we used the type_transition rule to relate domains to file types.

FloTracker searches the policy for all type_transition rules, which have the format type_transition s_t f_t : process d_t;

First, the set $G = \{(s_1, f_1, d_1), \dots, (s_M, f_M, d_M)\}$ is created, and then FloTracker creates the set of all valid file types on the machine.

Fact Collector searches over all files in the file system and gets the security context assigned to each of them.

The result of the search is the set of valid types $T_V$.

The System Specific Attack Graph Generation module modifies G to $G^*$ based on $T_V$ and G.
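The text does not spell out exactly how G becomes G*; one plausible reading, assumed here, is to keep only the type_transition triples whose entry file type actually occurs on the host. The fact-collector part reads each file's `security.selinux` extended attribute with Python's `os.getxattr` (Linux only); the function names and the example triples are hypothetical.

```python
import os

def type_of(context):
    """Extract the type field from an SE-Linux context such as 'system_u:object_r:bin_t:s0'."""
    return context.split(":")[2]

def collect_valid_types(root):
    """Walk `root` and return the set of file types found in security.selinux xattrs.

    Returns an empty set on platforms without os.getxattr (i.e., non-Linux).
    """
    getxattr = getattr(os, "getxattr", None)
    types = set()
    if getxattr is None:
        return types
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                raw = getxattr(path, "security.selinux", follow_symlinks=False)
            except OSError:
                continue  # no label, unsupported filesystem, or permission denied
            types.add(type_of(raw.decode().rstrip("\x00")))
    return types

def specialize(triples, valid_types):
    """Keep only (s, f, d) type_transition triples whose file type f exists on the host."""
    return {(s, f, d) for (s, f, d) in triples if f in valid_types}

G = {("A", "b_exec", "B"), ("A", "c_exec", "C"), ("B", "d_exec", "D"),
     ("C", "d_exec", "D"), ("A", "ftp_exec", "FTP")}
# On a real host: T_V = collect_valid_types("/"); here we assume FTP is not installed.
T_V = {"b_exec", "c_exec", "d_exec"}
G_star = specialize(G, T_V)
print(sorted(G_star))
```

Dropping the FTP triple removes the FTP domain and every path through it, which is how the graph shrinks for a concrete host.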

V. ATTACK TREE GENERATION

Although the system-specific attack graph is smaller than the template attack graph, root-cause analysis on it is still uncertain. Without loss of generality, we assume that there is one entry domain $s_e$ and one target domain $s_t$ in the system. Given the system-specific attack graph, there is a set of paths $[s_e, \dots, s_t]$ from the entry domain $s_e$ to the target domain $s_t$. The more such paths there are, the weaker the root-cause analysis capability. This uncertainty is due to the design of the original policy. Hence, based on the available attack graph, we design a new attack graph with root-cause analysis capability, i.e., one that can highlight the desired attack paths. After designing the desired attack graph, we change the original policy so that it results in the desired attack graph.

Objectives. Given an original policy with functionality F, the redesigned policy should have the same functionality F with higher root-cause analysis certainty. This higher certainty comes at the cost of a bigger access control policy, which means the system may consume more CPU and memory. This trade-off leads to an optimization problem. In this paper we do not solve the optimization problem; instead, our design receives the maximum acceptable cost as a user constraint input. In our implementation, we also measured the correlation between policy size and system costs. Depending on the administrator's desired scenario, FloTracker modifies the attack graph in different ways. FloTracker accepts four scenarios. First, signature-based, in which the administrator cares about a specific attack path. Second, asset-based, in which the administrator lists specific domains as critical domains. Third, whole-policy modification, which provides root-cause analysis capability for transitions through any path to a set of critical domains. Finally, cooperative policy modification, which uses a set of machines with different policies, each machine covering a set of attack paths.

Signature-Based Policy: In some cases, an attack has a known behavior. This behavior is called the signature of the


SE-Linux Results

17

False sensor measurement data injection

CYBER-PHYSICAL SECURITY STATE ESTIMATION IN POWER GRID

18

False Data Injection

to Mislead the State Estimation Server

[Figure: bus voltage magnitudes |V| (pu), angles θ (deg), P load (MW), and Q load (MVAr) of a small test system, comparing the reality with the values presented to the state estimation server under attack]

Attack design: specifically chosen “interacting” measurements that satisfy the power flow equations

All states at non-malicious buses are preserved!

Q: How can the corrupted sensor data be identified?

19

Current Bad Data Detection Solutions:

Residual-Based Approaches

• Need to account for possibility of bad data

– Bad data definition*: “measurements that are grossly in error”

– Bad data can potentially result in incorrect power-state estimates

• Measurement residuals – typical bad data detection for state estimation

If $\|z - H\hat{x}\| \le \tau$: no bad measurements detected

• Coordinated attacks modify “interacting bad-measurements” that satisfy the power flow solution equations

– difficult or impossible to detect using conventional means

20

* A. Monticelli, State estimation in electric power systems: a generalized approach. Kluwer Academic Publishers, 1999.
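The residual test and its blind spot can be made concrete with the same kind of hypothetical 3-bus DC model. A single gross error leaves a large residual, but an attack vector of the form a = Hc (a standard construction in the false-data-injection literature) lies in the column space of H, so it shifts the estimate by c while leaving the residual unchanged. The network, threshold, and attack values are assumptions for illustration.

```python
import numpy as np

# Hypothetical 3-bus DC model; state x = [theta_2, theta_3], bus 1 is the reference.
b12, b13, b23 = 10.0, 5.0, 4.0
H = np.array([
    [-b12, 0.0],        # P12
    [0.0, -b13],        # P13
    [b23, -b23],        # P23
    [b12 + b23, -b23],  # injection at bus 2
])

def estimate(H, z):
    """Least-squares state estimate (equal measurement weights assumed)."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return x_hat

def residual_test(H, z, tau):
    """Return (passes, r): passes is True when r = ||z - H x_hat|| <= tau."""
    r = float(np.linalg.norm(z - H @ estimate(H, z)))
    return r <= tau, r

tau = 0.1                          # threshold in p.u. (assumed)
x_true = np.array([-0.02, -0.05])
z_clean = H @ x_true               # noise-free for a deterministic example

z_bad = z_clean.copy()
z_bad[0] += 0.5                    # a single gross error: caught

c = np.array([0.01, 0.0])          # attacker's intended state shift
z_attack = z_clean + H @ c         # interacting errors a = H c: not caught

for name, z in [("clean", z_clean), ("bad", z_bad), ("attack", z_attack)]:
    print(name, residual_test(H, z, tau))
```

The attack touches three of the four meters consistently, which is exactly the "interacting bad-measurements" case the slide says conventional residual checks miss.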

Residual-based Approaches:

Insufficient against Security Compromises

• Multiple interacting bad-measurements*

– Case 1

– Case 2

(*) A. Monticelli, F. F. Wu, M. Yen, “Multiple bad data identification for state estimation by combinatorial optimization,” IEEE Transactions on Power Delivery, 1986.

21

Cyber-Physical State Estimation*

• Co-utilize information from the cyber and power networks

– to determine the state of the cyber-physical system more precisely

Example: [Figure: network with nodes A–F and measurements i and j, annotated “Measurements i and j may be compromised”]

• Use combined security state to provide a scalable approach to detecting bad data caused by a cyber event

*S. A. Zonouz, K. M. Rogers, R. Berthier, R. B. Bobba, W. H. Sanders, T. J. Overbye, “CPSE: Security-Oriented Cyber-Physical State Estimation for Power-Grid Critical Infrastructures,” in review for IEEE Transactions on Smart Grid.

22

Terminology

• System state notion

1. Privilege domains

• What the attacker can do (proactive response)

2. Past consequences

• What the attacker has done (needs recovery)

[Figure: state s = (attacker's privilege(s), past consequences)]

• Example: state s_i with privileges Root(A), User(B) and past consequences Opened(R1)
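The two-part state notion can be encoded as an immutable, hashable value so that it can serve directly as a CMDP state. The class and method names below are hypothetical; the sketch only shows how adversarial actions grow the state and how a recovery action removes a past consequence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SecurityState:
    """Hypothetical encoding of a state s = (attacker privileges, past consequences)."""
    privileges: frozenset = frozenset()    # what the attacker can do, e.g. "Root(A)"
    consequences: frozenset = frozenset()  # what the attacker has done, e.g. "Opened(R1)"

    def gain(self, privilege):
        """Adversarial transition: the attacker acquires a new privilege."""
        return SecurityState(self.privileges | {privilege}, self.consequences)

    def cause(self, consequence):
        """Adversarial transition: the attacker causes a physical consequence."""
        return SecurityState(self.privileges, self.consequences | {consequence})

    def recover(self, consequence):
        """Responsive transition: a recovery action undoes a past consequence."""
        return SecurityState(self.privileges, self.consequences - {consequence})

# The slide's example state s_i.
s_i = SecurityState().gain("Root(A)").gain("User(B)").cause("Opened(R1)")
print(sorted(s_i.privileges), sorted(s_i.consequences))
```

Because the dataclass is frozen, equal states hash equally, so they can be used as dictionary keys in a value table.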

23

System Model Generation

• Competitive Markov decision process (CMDP)

– Stochastic attack graph/finite state machine

• Network Access Policy

– e.g., firewall rules

• Network connectivity matrix and CMDP

– automatically generated

Network connectivity matrix:

        Rmt.  A   B   R1
Rmt.    1     0   1   0
A       1     1   0   0
B       1     1   1   0
R1      0     0   1   1

[Figure: hosts A and B and relay R1]
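The slide does not state which direction an entry encodes. The sketch below assumes that the entry in row i, column j is 1 when host j can access host i; that is the reading under which a remote attacker reaches A and B and then relay R1, consistent with the attack paths on the next slide. It then runs a breadth-first search to list every host reachable from a starting point.

```python
from collections import deque

HOSTS = ["Rmt", "A", "B", "R1"]
# Connectivity matrix as laid out on the slide (rows and columns in HOSTS order).
M = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

def can_reach(src, dst):
    # Assumed reading: M[row][col] == 1 means host `col` can access host `row`.
    return M[HOSTS.index(dst)][HOSTS.index(src)] == 1

def reachable_from(start):
    """Breadth-first search over the access relation: every host an attacker at
    `start` could eventually reach, excluding `start` itself."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in HOSTS:
            if v not in seen and can_reach(u, v):
                seen.add(v)
                queue.append(v)
    return seen - {start}

print(sorted(reachable_from("Rmt")))  # ['A', 'B', 'R1']
```

If the matrix is meant row-to-column instead, only `can_reach` needs to change; the search itself is the same.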

24

IDSes: CMDP sensors

[Figure: CMDP states Ø (no privileges), A, B, {A, B}, and R1.O (relay R1 opened), with transitions observed by sensors IDS-1 to IDS-4]

25

Algorithm Step 1

Potentially-bad Data Identification

• ID sensor reports

– attacker’s current privileges (probabilistically)

• Cyber-Physical interconnection

– the measurements that might have been modified by the adversary

[Figure: CMDP]

• Example

– CP interconnection: the i-th measurement (by PMU i) is the real power of bus B2

– IDS alerts: PMU i is compromised

→ The i-th measurement might have been corrupted!

26

Algorithm Step 2

Power State Estimation & Verification

• Throw the potentially-bad data away, and run a power state estimation using the remaining power measurements

• Compute the estimates of the discarded measurements, and identify the corrupted ones

– based on how much they differ from their estimates
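Steps 1 and 2 can be sketched together as follows. The set of suspect measurements is taken as given (it stands in for the output of Step 1, which the sketch does not model), and the 3-bus model, measurement list, corruption values, and threshold are hypothetical; the function name `cpse_verify` is ours.

```python
import numpy as np

b12, b13, b23 = 10.0, 5.0, 4.0  # hypothetical line susceptances (p.u.)
# Hypothetical DC measurement model; state x = [theta_2, theta_3], bus 1 is the reference.
H = np.array([
    [-b12, 0.0],         # 0: flow P12 measured at bus 1
    [0.0, -b13],         # 1: flow P13
    [b23, -b23],         # 2: flow P23
    [b12 + b23, -b23],   # 3: injection at bus 2
    [-b23, b13 + b23],   # 4: injection at bus 3
    [b12, 0.0],          # 5: flow P21 measured at bus 2
])
x_true = np.array([-0.02, -0.05])
z = H @ x_true
z[0] += 0.5   # corruption at one end of line 1-2
z[5] -= 0.5   # corruption at the other end

def cpse_verify(H, z, suspect, tau):
    """Drop suspect measurements, estimate from the rest, then keep as corrupted
    only the suspects that deviate from their estimates by more than tau."""
    keep = [i for i in range(len(z)) if i not in suspect]
    if np.linalg.matrix_rank(H[keep]) < H.shape[1]:
        raise ValueError("remaining measurements leave the state unobservable")
    x_hat, *_ = np.linalg.lstsq(H[keep], z[keep], rcond=None)
    z_hat = H @ x_hat  # estimates of every measurement, including discarded ones
    corrupted = sorted(i for i in suspect if abs(z[i] - z_hat[i]) > tau)
    return x_hat, corrupted

# Step 1 output (assumed): IDS alerts plus the cyber-physical mapping flag
# measurements 0, 2, and 5; measurement 2 is a false alarm.
x_hat, corrupted = cpse_verify(H, z, suspect={0, 2, 5}, tau=0.1)
print(corrupted)  # [0, 5]
```

The rank check matters: if the IDS flags too many meters, the remaining ones may not determine the state, and the verification step cannot run.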

27

IEEE TRANSACTIONS ON SMART GRID 8

Cyber-Physical State Estimation Benefits

• Improved bad-data detection

– Accuracy and scalability

• Improved state estimates

TABLE II: CYBER SECURITY-STATE ESTIMATION RESULTS [columns: probability, HMM's state ID, compromised hosts]


Fig. 6. Automatically generated AGT for the IEEE 24-bus power control networks

TABLE I

(IP, HOST) MAPPINGS FROM FIGURE 4(B) TO FIGURE 6

Host   IP Address        Host   IP Address
h1     28.1.1.1          h2     120.2.1.65
h3     172.16.101.122    h4     172.16.104.20
h5     172.16.201.45     h6     172.90.200.251
h7     101.11.0.3        h8     101.10.0.3

Fig. 7. Single measurement (sensor 1) corruption

Fig. 8. Multiple non-interacting measurement (sensors 1 and 27) corruption

[Plots: real, corrupted, and estimated measurement values per power sensor, comparing the proposed approach with the residual-based approach]

to the estimation server. The proposed ID-based solution’s estimation of the first sensor’s measurement, 0.1224, was almost equal to its correct value, 0.1247 (i.e., with only 0.002 p.u. difference), and hence far from its modified value, resulting in a large measurement residual. The residual-based algorithm was also able to detect the data corruption, as its calculated measurement residual value, 0.502, was above the predefined threshold (τ = 0.1 p.u.). However, its estimated value was not as accurate as that of the ID-based algorithm, because it was affected by the corrupted value, which was wrongly considered good and used by the state estimator during the residual-based approach’s first power state estimation.

The second attack aimed to cause noninteracting measurement modifications on two power sensors. In particular, measurements from the bus 1 sensor on the 1−2 line and from the bus 16 sensor on the 16−17 line were corrupted. The corruptions were both 1 p.u., and were intentionally designed not to match the underlying power system equations. In practice, such noninteracting bad data usually result from non-malicious natural and accidental failures. Figure 8 shows the measurement estimations resulting from the two algorithms. As in the case above, both the proposed algorithm and the residual-based approach were able to detect the data corruption. However, the residual-based approach did not estimate the power system measurements of the compromised sensors accurately.

During a more complicated attack scenario, the attacker intentionally modified two measurements from sensors 1 (on bus 1) and 14 (on bus 2), which were monitoring the two ends of the 1−2 power line. The data modifications were intentionally designed in such a way that they still satisfied the power flow equations (i.e., an unobservable attack). In particular, the measurement corruptions on sensors 1 and 14 were +1 and −1 p.u., respectively. Figure 9 shows the results for the interacting measurement corruption scenario. The proposed ID-based algorithm was still able to detect the measurement corruption and correctly estimate the state by ignoring the set of bad measurements. In comparison, the locally consistent bad measurements deceived the residual-based algorithm, which accepted them as correct, since the measurement residual value was 0.002, i.e., below the predefined threshold τ. That example shows that SCPSE is effective at detecting interacting malicious measurement corruption attacks and producing an accurate estimate of the system state, while residual-based approaches fail to detect such attacks and thereby can lead to a faulty estimate.

Fig. 9. Multiple interacting measurement (sensors 1 and 14) corruption [plot: real, corrupted, and SCPSE-estimated values per power sensor]

We also evaluated SCPSE’s ability to detect maliciously altered power system measurements in our case study power grid in 30 different cyber-originated attack scenarios. SCPSE’s HMM smoothing component took 914 milliseconds on average to calculate the posterior distribution for each attack over the state space. SCPSE’s detection component was able to identify the exact subset of corrupted measurements in all the scenarios within 11.7 seconds by going through no more than 14 states (across the attack scenarios) on the ordered state list (Algorithm 1). That demonstrates that SCPSE has the potential …

VII. RELATED WORK

Recently, there has been increasing interest in security incident detection in power-critical infrastructures [25], including work on false data injection attacks [2] and defenses against them. However, most of the past work has focused on either cyber or power side solutions [26]–[28]. In this section, we …

We first discuss related work whose focus is on the dependability … al. [29] introduce a power system reliability analysis algorithm. The proposed method focuses only on accidental failures due to natural causes, and hence does not consider maliciously failed power components. Zhou et al. [30] present a sequential power system state estimation algorithm that uses reports from first … in the presence of adversaries. … limitations. Detection accuracy of some approaches, e.g., least-square error-based algorithms [33], is usually low against coordinated attacks, as they initially consider all the measurements good. Furthermore, some other approaches, e.g., combinatorial techniques [32], do not scale well: their search space for detecting bad measurements grows exponentially with the number of measurements.

Recent research has focused on false data injection attacks, in which an attacker modifies multiple measurements in a coordinated fashion to influence the estimate of the state without being detected by traditional bad-data detection schemes, and on defenses against such attacks (e.g., [28], [34], [35]). The impact of such false data injections on power system operations, including power market operations, has been considered in [3]–[5]. Specifically, [3] and [4] show that false data injection attacks can be used to manipulate real-time prices in electricity markets, while [5] shows that they can cause operators to make suboptimal power dispatch decisions.

Kosut et al. [28] introduce an algorithm to detect and localize false data injection attacks using the generalized likelihood ratio test. However, that work does not take into account the cyber network topology or its current state, which might be the root cause of the problem; hence, it does not provide a complete cyber-physical picture.

Bobba et al. [34] and Dan et al. [35] demonstrate how knowledge of power system topology and the correlation present in power system measurement data can be leveraged to provide effective, cost-efficient solutions for detecting malicious false data injection, and also to provide insight into the nature of unobservable attacks. Giani et al. [36] provide further characterization of unobservable attacks. However, unlike SCPSE, those efforts only leverage power system measurements, except for [35], which leverages communication infrastructure topology information as well.

We now discuss related work that is focused on the security of cyber infrastructure. Cyber-based diagnostics mechanisms try to estimate the security state of a computer network [37].

BotHunter [38] extends ideas from multi-sensor data fusion to probabilistically correlate triggered alerts generated by intrusion detection systems (IDSes). The main goal is to identify the set of compromised hosts [39]; however, in a power grid context, the goal is the overall safety and reliability of the grid and not the security of individual hosts. Because such solutions do not take the impact on the physical system into account, they are unable to provide a complete picture. Ten et al. [40] propose a vulnerability assessment technique to evaluate the vulnerabilities of SCADA systems at three levels: system, scenarios, and access points. By calculating the risk of each asset’s compromise, Mohajerani et al. [41] introduce a method to detect and improve the vulnerability of power networks against intrusions and malicious acts of cyber hackers. Both [40] and [41] operate offline, and hence cannot monitor the system for malicious activities while it is in its operational mode. Wilken et al. [42] propose a software fault diagnosis solution that uses data redundancy to detect faults that have been caused by probabilistic system failures [43].

Therefore, software crashes that result from vulnerability exploitations cannot be completely detected using their proposed approach. For cyber systems, there have been extensive investigations into intrusion detection techniques such as anomaly-based [44], signature-based [45], and (recently) specification-based solutions [46]. However, those traditional cyber diagnostics solutions ignore the topology and configuration of the underlying physical power system [47]. In contrast, our framework leverages the topology and configuration of the underlying physical power system to validate the outcomes of traditional IDSes.

For process control networks, Cardenas et al. [48] investigate an intrusion detection technique in which the attack’s final target is assumed to be given. That assumption could be exploited by attackers to further damage the process control network by targeting other critical goals. SCPSE, while generating the attack graph, considers all possible attack paths, even those that do not end up in critical assets, e.g., an internal Web server.

In summary, unlike previous techniques, SCPSE leverages information from the cyber network (control network topology,

Dynamic topology reconfigurations

SECURITY-ORIENTED CYBER-PHYSICAL CONTINGENCY ANALYSIS

29

Contingency Analysis

• Answers the question: “What happens if component X goes out of service?”

• Run every 2–5 min to determine potential problems

– Runs on the current state estimate

– The list of contingencies must be picked carefully

• The “N-1” criterion is used to operate the system

– no violations when any one element is taken offline

• Future requirements are strengthening the security criteria

– (“N-1-1”): many contingencies need to be solved*

• Example: For 1000 lines

• N-1 means solving 1000 line outages

• N-2 means solving 499500 line outages (1000 choose 2)

*Charles Davis, Thomas Overbye: Linear Analysis of Multiple Outage Interaction. HICSS 2009: 1-8
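The outage counts in the example are binomial coefficients, C(n, k), which is why exhaustive N-k screening quickly becomes expensive. The sketch below only counts simultaneous k-element outages; N-1-1 (a first outage, system adjustment, then a second outage) is a sequential criterion that this count does not model.

```python
from math import comb

def contingency_count(n_elements, k):
    """Number of distinct simultaneous k-element outages among n_elements."""
    return comb(n_elements, k)

for k in (1, 2, 3):
    print(f"N-{k}: {contingency_count(1000, k):,} cases")
# N-1: 1,000 cases
# N-2: 499,500 cases
# N-3: 166,167,000 cases
```

Growth from roughly n to roughly n²/2 to roughly n³/6 cases is the motivation for pruning the search space, which is what the security-oriented variant on the next slides does.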

Power System Contingency Analysis

List of contingencies

Violations caused by contingency

Violation summary

What happens during contingency

Security-Oriented Contingency Analysis

• “What happens when X goes out of service?”

– X could be either a critical power or cyber asset

– Unlike in traditional scenarios, the root cause is a cyber intrusion

• Solution

– Analyze the cyber network topology model (CMDP)

– Measure how close the current security state is to serious power component contingencies

• Benefit

– Search space is significantly smaller

– (limited by the cyber topology and strict access control policies)

Cyber-Physical Contingency Analysis

MANUSCRIPT SUBMITTED TO IEEE TRANSACTIONS ON SMART GRID 6

[Figure: (a) IEEE 24-bus system; (b) power network topology, including the Internet, perimeter firewall, DMZ (120.2.0.0/16), Corporate SubNW #1 (101.10.0.0/16) and #2 (101.11.0.0/16), corporate super router, GenSite firewalls, PLC network (48.61.48.0/20), historian banks (172.90.200.0/24, 172.16.201.0/24), Wonderware HMIs (172.16.104.0/24), generation optimizers (172.16.110.0/24), simulators (172.16.20.0/24, 172.16.2.0/23), and DCS-HMI servers (172.16.101.0/24); labeled compromised assets include ID 1 Business User #1 (101.10.0.3), ID 6 Internet Host (28.1.1.1), ID 7 Business User #2 (101.11.0.3), ID 8 (172.16.104.20), and ID 9 (172.16.201.45)]

Fig. 2: Experimental Power Grid Testbed Architecture (with Compromised Asset IDs)

non-disclosure agreement. Figure 2(b) shows the topology of a single control network that has 59 nodes, e.g., host systems and firewalls. The first control network monitors and controls buses 1–12 in the power system (Figure 2(a)), and the second network monitors and controls buses 13–24. In particular, each power bus is monitored and controlled by a single host system in the corresponding control network.

MDP generation. Given the power network topology and the access policy rules (about 100 firewall rules), SOCCA constructed the network connectivity matrix and generated the corresponding MDP model. Because MDP models may not scale, especially for large-scale power-grid infrastructures, SOCCA uses the envelope algorithm [21], in which the MDP is generated partially, so not every individual state needs to be enumerated and analyzed. More technically, given the current system state, only states reachable within some finite horizon are explored and used for the contingency analysis. Figure 3 illustrates a simplified version of the generated MDP, in which states with exclusively cyber contingencies are drawn in white, while states with physical consequences are in gray.
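The idea of exploring only states reachable within a finite horizon can be sketched as a depth-limited breadth-first expansion of attacker states. This is a simple illustration of bounded reachability, not the envelope algorithm of [21] itself; the host names and access relation are hypothetical.

```python
from collections import deque

def explore(reach, entry, start, horizon):
    """Depth-limited breadth-first expansion of attacker states.

    A state is the frozenset of compromised hosts; reach[h] lists the hosts that
    host h can access (hypothetical access-policy input). The attacker can always
    act from `entry` (e.g., a remote Internet host). Returns {state: depth}.
    """
    depth = {start: 0}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if depth[state] == horizon:
            continue  # do not expand beyond the horizon
        footholds = state | {entry}
        targets = {t for h in footholds for t in reach.get(h, ())} - state - {entry}
        for t in sorted(targets):
            nxt = state | {t}
            if nxt not in depth:
                depth[nxt] = depth[state] + 1
                frontier.append(nxt)
    return depth

REACH = {"Internet": ["DMZ"], "DMZ": ["HMI", "Historian"], "HMI": ["Relay"]}
states = explore(REACH, "Internet", frozenset(), horizon=2)
for s, d in sorted(states.items(), key=lambda item: (item[1], sorted(item[0]))):
    print(d, sorted(s))
```

Raising the horizon, or giving each host more accessible neighbors (a higher vulnerability factor), makes the number of explored states grow quickly, which matches the trend in the performance results that follow.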

The first number on each state represents its ID. Table I maps each state ID to the IDs of the compromised assets (shown in Figure 2) in each state. As shown in the generated MDP, the attacker initially resides remotely on the Internet with no privileges on the power network (MDP state 0) and can then traverse different attack paths to access a particular host in the power network. Each MDP edge represents an access (i.e., possibly a vulnerability exploitation) allowed from a source to a destination host in the power network.

Performance. To validate SOCCA’s efficiency on various networks with different sizes and topologies, we measured how long it takes to generate the MDP model for randomly generated power networks. One important parameter that affects the model generation complexity is the vulnerability factor, defined as the number of host computers that could be accessed and compromised from a particular host in the power control network. Figures 4(a) and 4(b) show the MDP generation time and the model’s size for vulnerability factor 1, i.e., once the attacker compromises any host system, he or she can always find one other vulnerable machine to compromise. The results were averaged over 1000 runs. As illustrated, for large-scale power networks with 330K host computers, SOCCA analyzed the inputs and generated the MDP model within 24 milliseconds.

For cases with the vulnerability factor of 2, SOCCA generated the corresponding MDP graph within 400 milliseconds for a network with 37 nodes (Figures 4(c) and 4(d)). Figure 5 shows the time taken to solve the security index of individual states. A vulnerability factor of two means that from every host, two other host computers are both accessible

(i.e., access control policies allow that) and vulnerable to exploitations. We believe this is very pessimistic and not a very common scenario in industrial control infrastructures due to the extremely strict global access control policies and system security patching within real-world power control networks.

The main intention in doing this experiment in the paper was to evaluate, under such pessimistic assumptions, how the increasing network size affects size of the power grid’s

MDP model. We continued the experiment for cases with even higher (and much less realistic) vulnerability factors.

The model generation for a network with 18 and vulnerability factor of 3 (4) nodes took 74 (5521) milliseconds on average.

As expected, for a fixed network topology, the generated model size, and hence the overall performance overhead, increases exponentially with the vulnerability factor.

33
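To make the role of the vulnerability factor concrete, the following is a minimal Python sketch of how an attacker-state MDP could be enumerated from host reachability. It is not SOCCA’s implementation: it assumes, following Table I, that a state is the set of compromised hosts, that state 0 is the empty set (attacker on the Internet), and that each host can exploit exactly k others. The names `random_network` and `build_attack_mdp` are hypothetical, and transition probabilities and rewards are omitted.

```python
import random
from collections import deque


def random_network(n_hosts, k, seed=0):
    """Hypothetical random power control network: each host can access and
    exploit exactly k other hosts (the vulnerability factor)."""
    rng = random.Random(seed)
    hosts = list(range(n_hosts))
    return {h: rng.sample([x for x in hosts if x != h], k) for h in hosts}


def build_attack_mdp(network, entry_hosts):
    """Enumerate attacker states breadth-first.

    A state is the frozenset of compromised hosts; state 0 is the empty set
    (attacker on the Internet with no privileges). Each edge is one exploit
    of a host reachable from an entry point or from a compromised host.
    """
    start = frozenset()
    state_ids = {start: 0}
    edges = []  # (source state ID, destination state ID, exploited host)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Hosts the attacker can currently reach and exploit.
        reachable = set(entry_hosts)
        for host in state:
            reachable.update(network[host])
        for target in sorted(reachable - state):
            successor = state | {target}
            if successor not in state_ids:
                state_ids[successor] = len(state_ids)
                queue.append(successor)
            edges.append((state_ids[state], state_ids[successor], target))
    return state_ids, edges


if __name__ == "__main__":
    for k in (1, 2, 3):
        states, edges = build_attack_mdp(random_network(12, k), entry_hosts=[0])
        print(f"vulnerability factor {k}: {len(states)} states, {len(edges)} edges")
```

Because each state records the whole set of compromised hosts, a vulnerability factor of 1 only lets the attacker extend a single path, while larger factors let the compromised set branch in many orders, which is consistent with the exponential growth described above.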

Slow response mechanisms against attacks

GAME-THEORETIC INTRUSION

RESPONSE AND RECOVERY

34

Problem Formulation

• Given

– Power grid model (CMDP)

– Contingency analysis results

• Goal

– Trustworthy (semi-)automated intrusion tolerance

• Recover from past damage

• Take proactive response actions (to avoid further damage)

35

Trusted Intrusion Tolerance Engine

[Diagram: Monitor 1 … Monitor n send alerts and monitoring info to the Intrusion Response System, which issues commands to Action 1 … Action m.]

36
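The engine in the diagram can be sketched as a small pipeline in which monitors report alerts and the response system turns accumulated evidence into a command. This is an illustrative assumption, not the engine’s actual design: the `Alert` and `IntrusionResponseSystem` names, the idea of estimating the state as a set of compromised assets, and the lookup in a precomputed policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    monitor: str  # which monitor raised the alert (e.g., a host-based IDS)
    asset: str    # asset the monitor reports as compromised


class IntrusionResponseSystem:
    """Hypothetical IRS: folds monitor alerts into a state estimate and
    looks up a precomputed response policy for that state."""

    def __init__(self, policy, default_command="no_op"):
        # policy maps a frozenset of compromised assets to a response command
        self.policy = policy
        self.default_command = default_command
        self.compromised = set()

    def ingest(self, alerts):
        # Accumulate evidence from every monitor, then decide once.
        for alert in alerts:
            self.compromised.add(alert.asset)
        return self.decide()

    def decide(self):
        state = frozenset(self.compromised)
        return self.policy.get(state, self.default_command)


policy = {
    frozenset({"hmi"}): "isolate hmi",
    frozenset({"hmi", "historian"}): "isolate control-center LAN",
}
irs = IntrusionResponseSystem(policy)
print(irs.ingest([Alert("host-ids-1", "hmi")]))
```

Keeping the policy as a lookup table separates the slow step (computing a response strategy offline) from the fast step (reacting to alerts online).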

Good vs. Bad

• The (RRE vs. Attacker) battle

– modeled as a two-player game

• Stackelberg game

• Sequential game scheme

– Leader and Follower

– s → Leader takes action → s’ → Follower acts → s’’

• Leader’s Goal

– minimize the maximum possible damage by the follower, i.e., the maximum damage the attacker can cause afterwards

37
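The leader’s min-max goal can be illustrated with a one-step, pure-strategy Stackelberg choice. This Python sketch is only an illustration: the damage table, the action names, and the restriction to a single leader move followed by a single follower move are hypothetical simplifications.

```python
def stackelberg_response(damage):
    """Pick the leader's (RRE's) response under a pure-strategy Stackelberg scheme.

    damage[a][o] is the damage if the leader commits to response a and the
    follower (attacker), after observing a, plays o. Values are hypothetical.
    """
    # The follower best-responds to each committed action by maximizing damage.
    worst_case = {a: max(row.values()) for a, row in damage.items()}
    # The leader commits to the action with the smallest worst-case damage.
    best = min(worst_case, key=worst_case.get)
    return best, worst_case


damage = {
    "isolate_host":    {"exploit_hmi": 2, "no_op": 1},
    "restart_service": {"exploit_hmi": 7, "no_op": 0},
    "no_op":           {"exploit_hmi": 9, "no_op": 0},
}
best, worst_case = stackelberg_response(damage)
print(best, worst_case)
# isolate_host {'isolate_host': 2, 'restart_service': 7, 'no_op': 9}
```

Note that `restart_service` and `no_op` look harmless when the attacker stays idle; the leader rejects them because it assumes the follower will observe the response and then pick its most damaging reply.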

State-based Modeling

• Based on Competitive Markov Decision Processes (CMDPs)

– Framework for decision making

• CMDP (S, A, R, Pr, γ)

– S: state space

– A: action space

– R(s,a): reward function

– Pr(s,a,s’): transition probability function

– γ: discount factor

[Diagram: states s_i, s_j, s_k, s_m, s_n connected by adversarial actions and responsive actions.]

• Two (conflicting) players

– RRE and the attacker

38
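One hedged way to write the optimality (Bellman) condition for this two-player CMDP is shown below. It assumes, beyond what the slide states, that S splits into states where the RRE moves (S_R) and states where the attacker moves (S_A), that A(s) ⊆ A denotes the actions available in state s, and that R(s,a) is read as a security reward, so the RRE maximizes and the attacker minimizes. Minimizing the maximum damage is the same as maximizing the minimum reward when damage is taken as -R.

```latex
V^*(s) =
\begin{cases}
\max_{a \in A(s)} \Big[ R(s,a) + \gamma \sum_{s' \in S} \Pr(s,a,s')\, V^*(s') \Big], & s \in S_R \\[4pt]
\min_{a \in A(s)} \Big[ R(s,a) + \gamma \sum_{s' \in S} \Pr(s,a,s')\, V^*(s') \Big], & s \in S_A
\end{cases}
```

Value iteration solves this fixed point by starting from V(s) = 0 and repeatedly applying the right-hand side until the values stop changing.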

Optimal Response Strategy Selection

Blue states: secure.

Red states: insecure.

[Figure: state graph over states S3 through S13, with transitions labeled A^i_j (action leading from state Si to state Sj); the current state is S9.]

What is the optimal action?

There are three options:

· A^9_8

· A^9_10

· A^9_3

39

Optimal Response Strategy

• RRE solves the CMDP

– Bellman’s equation

– Stackelberg game

– Value iteration

[Figure: state graph over states S3 through S13, with transitions labeled A^i_j (action leading from state Si to state Sj); the current state is S9.]

What is the optimal action? There are three options:

· A^9_8: the attacker will take A^8_12 → S12 (insecure)

· A^9_10: the attacker will take no action → S10 (insecure)

· A^9_3: the attacker can take either no action or A^3_4 → S3 or S4 (secure)

40
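The reasoning on this slide can be reproduced with a small value-iteration sketch over a turn-based two-player game. This is not RRE’s implementation: the `Node` structure, the rewards (+1 when the attacker’s move leaves a secure state, -1 when it leaves an insecure one), and the absorbing state "end" are assumptions, and the attacker’s options at S8, S10, and S3 are limited to those listed on the slide. The action label "A9->8" stands for A^9_8.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    player: str  # "leader" (RRE), "attacker", or "terminal"
    # action -> (reward R(s,a), [(Pr(s,a,s'), s'), ...])
    actions: dict = field(default_factory=dict)
    terminal_value: float = 0.0  # fixed value of an absorbing state


def q_value(V, reward, outcomes, gamma):
    # R(s,a) + gamma * sum over s' of Pr(s,a,s') * V(s')
    return reward + gamma * sum(p * V[nxt] for p, nxt in outcomes)


def value_iteration(game, gamma=0.9, tol=1e-9, max_iter=10_000):
    V = {s: 0.0 for s in game}
    for _ in range(max_iter):
        new_V = {}
        for s, node in game.items():
            if node.player == "terminal":
                new_V[s] = node.terminal_value
                continue
            qs = [q_value(V, r, outs, gamma) for r, outs in node.actions.values()]
            # The RRE maximizes security; the attacker minimizes it.
            new_V[s] = max(qs) if node.player == "leader" else min(qs)
        delta = max(abs(new_V[s] - V[s]) for s in game)
        V = new_V
        if delta < tol:
            break
    return V


def best_action(game, V, state, gamma=0.9):
    node = game[state]
    pick = max if node.player == "leader" else min
    return pick(node.actions, key=lambda a: q_value(V, *node.actions[a], gamma))


SECURE, INSECURE = 1.0, -1.0  # hypothetical security rewards
game = {
    "S9": Node("leader", {
        "A9->8":  (0.0, [(1.0, "S8")]),
        "A9->10": (0.0, [(1.0, "S10")]),
        "A9->3":  (0.0, [(1.0, "S3")]),
    }),
    # Attacker replies as listed on the slide; each reply ends the episode.
    "S8":  Node("attacker", {"A8->12": (INSECURE, [(1.0, "end")])}),
    "S10": Node("attacker", {"no action": (INSECURE, [(1.0, "end")])}),
    "S3":  Node("attacker", {
        "no action": (SECURE, [(1.0, "end")]),
        "A3->4":     (SECURE, [(1.0, "end")]),
    }),
    "end": Node("terminal"),
}
V = value_iteration(game)
print(best_action(game, V, "S9"))  # A9->3
```

With these values, V(S8) = V(S10) = -1 and V(S3) = 1, so the RRE picks A^9_3, matching the slide: it is the only option whose worst attacker reply still leaves the grid in a secure state.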

Intrusion Response Demonstration

41

Acknowledgements

We thank

1. Our collaborators

– Robin Berthier, Stephen McLaughlin, Devin Pohly, Himanshu Khurana, Tim Yardley, William Sanders, Rakesh Bobba, Matt Davis, Kate Davis, Kaustubh Joshi, Hari Ramasamy, Anurag Srivastava.

2. Our sponsors

– ONR, NSF, DOE, DHS, IBM, ARPA-E, UM, Fortinet.

42

Conclusions

• Cyber-Physical Security State Estimation

• Security-Oriented CP Contingency Analysis

• Trusted Intrusion Tolerance

Thanks for your Attention!

Questions?

s.zonouz@miami.edu

43
