Advanced Metering Infrastructure Incident Response
Final Report

Electric Power Research Institute
August 10, 2012

This document contains three separate EPRI reports that have been combined into a single document for convenience of review:
• Advanced Metering Infrastructure Common Alarms and Events
• Intrusion Detection System for Advanced Metering Infrastructures
• Advanced Metering Infrastructure Cyber Security Incident Response Guidelines

Advanced Metering Infrastructure Common Alarms and Events
Draft Release – August 10, 2012

DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES

THIS DOCUMENT WAS PREPARED BY THE ORGANIZATION(S) NAMED BELOW AS AN ACCOUNT OF WORK SPONSORED OR COSPONSORED BY THE ELECTRIC POWER RESEARCH INSTITUTE, INC. (EPRI). NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, THE ORGANIZATION(S) BELOW, NOR ANY PERSON ACTING ON BEHALF OF ANY OF THEM:

(A) MAKES ANY WARRANTY OR REPRESENTATION WHATSOEVER, EXPRESS OR IMPLIED, (I) WITH RESPECT TO THE USE OF ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT, INCLUDING MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR (II) THAT SUCH USE DOES NOT INFRINGE ON OR INTERFERE WITH PRIVATELY OWNED RIGHTS, INCLUDING ANY PARTY'S INTELLECTUAL PROPERTY, OR (III) THAT THIS DOCUMENT IS SUITABLE TO ANY PARTICULAR USER'S CIRCUMSTANCE; OR

(B) ASSUMES RESPONSIBILITY FOR ANY DAMAGES OR OTHER LIABILITY WHATSOEVER (INCLUDING ANY CONSEQUENTIAL DAMAGES, EVEN IF EPRI OR ANY EPRI REPRESENTATIVE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES) RESULTING FROM YOUR SELECTION OR USE OF THIS DOCUMENT OR ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT.

Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by EPRI.

The following organization(s), under contract to EPRI, prepared this report:
Southwest Research Institute

This is an EPRI Technical Update report. A Technical Update report is intended as an informal report of continuing research, a meeting, or a topical study. It is not a final EPRI technical report.

NOTE

For further information about EPRI, call the EPRI Customer Assistance Center at 800.313.3774 or e-mail askepri@epri.com.

Electric Power Research Institute, EPRI, and TOGETHER...SHAPING THE FUTURE OF ELECTRICITY are registered service marks of the Electric Power Research Institute, Inc.

Copyright © 2011 Electric Power Research Institute, Inc. All rights reserved.

ACKNOWLEDGMENTS

The following organization(s), under contract to the Electric Power Research Institute (EPRI), prepared this report:

Southwest Research Institute
6220 Culebra Road
P.O. Drawer 28510
San Antonio, Texas 78228-0510

Principal Investigator
T. Do

This report describes research sponsored by the Electric Power Research Institute (EPRI).

This publication is a corporate document that should be cited in the literature in the following manner:
Title of Document: Subtitle. EPRI, Palo Alto, CA: <Year>. <Product ID#>.

ABSTRACT

In order to identify a common set of Advanced Metering Infrastructure (AMI) electric meter alarms and events for standardization, it is important to identify which alarms and events are most critical and valuable for detecting and responding to AMI security incidents. This document contains the results of the Common AMI Alarms and Events Task, which is a component of the Electric Power Research Institute's (EPRI) AMI Incident Response Project.
The document provides information that can be referenced in order to develop a standard for AMI alarms and events. This standard will help meet electric utilities’ need for greater interoperability and standardization in AMI electric meter alarms and events and enable Security Information and Event Management (SIEM) vendors to provide better situational awareness and cyber event detection in their products.

CONTENTS

1 INTRODUCTION
  1.1 Purpose and Scope
  1.2 EPRI AMI Incident Response Project
  1.3 Document Development Process
  1.4 Document Organization
2 OPERATING ENVIRONMENT
  2.1 AMI Security Threats and Objectives
3 CATEGORIES OF ALARMS AND EVENTS
  3.1 Authentication
    3.1.1 C12.XX
  3.2 Integrity
    3.2.1 Event Log Storage and Management
    3.2.2 Usage Data
    3.2.3 Meter Configuration
  3.3 Controls
    3.3.1 Meter Disconnect Switch
  3.4 Anomaly Detection Services
    3.4.1 Metrology
    3.4.2 Firmware/Software
  3.5 Cryptographic Services
    3.5.1 Key Management and Certificates
  3.6 Notification and Signaling Services
    3.6.1 Communications Interfaces
    3.6.2 System Security Alarms and Events
    3.6.3 Device Alarms and Events
4 COMMON SECURITY OBJECT DEVELOPMENT PLAN
  4.1 Attributes
  4.2 Retention Capabilities
  4.3 Development Roadmap
5 CONCLUSION AND NEXT STEPS
6 APPENDIX: REFERENCES, GLOSSARIES, AND INDEXES
  6.1 References
  6.2 Acronyms
7 APPENDIX: MALICIOUS AMI SECURITY EVENT SCENARIOS
  7.1 Scenario 1: Simple Physical Meter Attack
    7.1.1 Attack
  7.2 Scenario 2: Complex Cyber-Physical Meter Attack
    7.2.1 Attack Stage 1: Physical Removal of Meter
    7.2.2 Attack Stage 2: Physical Attack of Meter Hardware
    7.2.3 Attack Stage 3: Usage of Stolen Credentials

LIST OF FIGURES

Figure 1 Common Alarms and Events Document Development Process
Figure 2 AMI Network Diagram (Courtesy Justin Searle, UtiliSec)
Figure 3 Simple Physical Meter Attacks
Figure 4 Complex Attack Stage 1
Figure 5 Complex Attack Stage 2
Figure 6 Complex Attack Stage 3

LIST OF TABLES

Table 1 C12.XX Alarms and Events
Table 2 Event Log Storage and Management Alarms and Events
Table 3 Usage Data Alarms and Events
Table 4 Meter Configuration Alarms and Events
Table 5 Meter Disconnect Switch Alarms and Events
Table 6 Metrology Alarms and Status Events
Table 7 Software Alarms and Events
Table 8 Key Management Alarms and Status Events
Table 9 Communication Interface Specific Events
Table 10 System Security Alarms and Events
Table 11 Physical and Device Alarms and Events
Table 12 Attributes

1 INTRODUCTION

1.1 Purpose and Scope

The Advanced Metering Infrastructure (AMI) Common Alarms and Events document is intended to provide a foundation on which AMI vendors and asset owners can develop a standard for AMI alarms and events. The standard will address electric utilities’ need for greater interoperability and standardization of security events and enable Security Information and Event Management (SIEM) vendors to provide better situational awareness and cyber event detection in their products.

The scope of this document includes only alarms and events generated by the meter and not those generated by supporting AMI components such as the collection engine, technician service tool, or other AMI equipment at the head end. Additionally, while this document attempts to begin codifying the meaning of alarms and events, it does not attempt to codify their interpretation.

1.2 EPRI AMI Incident Response Project

The goal of the EPRI AMI Incident Response Project is to address several challenges that confront utilities when they are designing systems for detecting and responding to AMI incidents:
• Lack of a standard set of alarms and events across AMI vendors
• Difficulty of designing a scalable Intrusion Detection System (IDS)
• Lack of clear guidelines for responding to AMI incidents

If a utility deploys meters from multiple vendors, the different data formats and naming conventions for alarms and events can make it challenging and expensive to integrate the AMI systems into the utility's SIEM. A set of common security information objects is needed to standardize the AMI alarms and events across vendors. Codifying security information objects will reduce the resources necessary for a utility to deploy meters from multiple AMI vendors and also reduce the customization that each AMI vendor must do for a specific utility. In the long run, this will also make it easier for third-party security appliance manufacturers to apply their technology to the AMI domain, helping utilities better manage their security.

A utility's AMI environment has several characteristics that make effective IDS design different from IDS design in a traditional Information Technology (IT) environment. For example, simply deploying a perimeter IDS may not provide the coverage necessary for an AMI system.
Because AMI systems tend to include mesh networks in addition to IP-based backhaul networks, an IDS positioned at the AMI head-end system could miss malicious activity in the mesh network. Additionally, there can be scalability issues, as some utilities deploy millions of meters in their service territory.

In addition to the challenges of integrating AMI systems into utility backend systems, it is also unclear what the best practices are for responding to AMI incidents. There is a need for clear guidelines for determining the type of incident (natural or manmade, malicious or non-malicious) as well as the best approach for responding to the incident.

This report describes the results of the AMI Alarms and Events Task, which was created to identify a set of AMI alarms and events that can be standardized and to develop a plan for the standardization of common AMI security objects. The other AMI incident response efforts are summarized in a separate report.

1.3 Document Development Process

This document is informed by and builds upon documents such as the Security Profile for AMI v2.0, AMI System Security Requirements v1.01, NISTIR 7628, and the Smart Grid (SG) Network System Requirements Specification.

The Security Profile for AMI represents the security concerns of the Advanced Security Acceleration Project for the Smart Grid (ASAP-SG) AMI Security (AMI-SEC) Task Force and provides guidance and security controls to organizations developing or implementing AMI solutions. The intent of the document is to provide prescriptive, actionable guidance on how to build in and implement security for AMI smart grid functionality. The scope of the Security Profile for AMI extends from the Meter Data Management System (MDMS) up to and including the Home Area Network (HAN) interface of the smart meter.

AMI System Security Requirements, also a product of the AMI-SEC Task Force, provides the utility industry and vendors with a set of security requirements for AMI. The requirements are intended to be used in the procurement process and represent a superset of requirements gathered from current cross-industry accepted security standards and best practice guidance documents.

NISTIR 7628, Guidelines for Smart Grid Cyber Security, was developed by the Cyber Security Working Group (CSWG) of the Smart Grid Interoperability Panel (SGIP), a public-private partnership launched by NIST. NISTIR 7628 was created to provide organizations with an analytical framework for developing effective cyber security strategies tailored to their Smart Grid implementations and the associated risks and vulnerabilities.

The SG Network System Requirements Specification was created to provide utilities, vendors, and standards development organizations with a system requirements specification for smart grid communication.

The process followed in the development of this document is depicted in Figure 1. First, AMI vendors were contacted to build support and participation. These parties helped develop a questionnaire that was then distributed to the smart grid community, including groups such as the National Institute of Standards and Technology (NIST) Smart Grid Interoperability Panel Cyber Security Working Group (SGIP–CSWG) AMI Security Group and the National Electric Sector Cybersecurity Organization Resource (NESCOR). Members of the community then formed a small informal task force to review and provide input into the creation of this document.

The task force’s charter consisted of providing information and insight from industry stakeholders as input to this document. The task force reviewed responses to the questionnaire and provided additional content. Task force members then converted the questionnaire responses into the content presented in this document.

Each draft version of this document will be distributed to the OpenSG AMI Security Group and NESCOR for feedback. The working group (WG) will consider this feedback when preparing new versions of the document.

Figure 1 Common Alarms and Events Document Development Process

1.4 Document Organization

Section 2 describes the operating environment of AMI systems, including a high-level system architecture, security threats and objectives, and basic security event scenarios. Section 3 is the core of the document and describes categories of alarms and events that will support the development of common security objects. Section 4 describes a common security object development plan.

2 OPERATING ENVIRONMENT

AMI systems enable utility back office networks to communicate with electric meters through Neighborhood Area Networks (NANs), backhaul networks, and AMI demilitarized zone (DMZ) networks. Figure 2 depicts an abstract AMI network diagram. A typical interface within a NAN is a Radio Frequency Local Area Network (RFLAN) interface. These interfaces form a mesh network among electric meters and often use proprietary communication standards. A collector joins the RFLAN and also has an additional backhaul interface, such as a cellular or Ethernet interface. This backhaul interface is used to communicate with an AMI DMZ network and the head-end system. The head-end system is connected to back office networks and can send pricing signals and push firmware updates down to electric meters and collectors.

The remainder of Section 2 discusses AMI security threats and objectives followed by security event scenarios.

Figure 2 AMI Network Diagram (Courtesy Justin Searle, UtiliSec)

2.1 AMI Security Threats and Objectives

This section lists threats that can lead to both malicious and non-malicious events. Since the vast majority of cyber incidents on AMI systems are non-malicious, incident reports should be correlated to detect suspicious events and limit false positives.

Threats to AMI implementations include non-malicious actions and events caused by:
• Operational mistakes
• Employees who bypass security for convenience
• Safety system failures
• Equipment failures
• Natural disasters

Threats to AMI implementations include malicious actions and events caused by:
• Insider threats including disgruntled employees
• Vandalism (random and malicious)
• Loss of information privacy
• Extortion
• Cyber hacking
• Viruses and worms
• Theft (theft of device and theft of energy)
• Terrorism

Security goals for AMI security event management systems include:
• Detection of malicious actions
• Assessment and characterization of events
• Notification of appropriate system operators
• Enabling appropriate responses to events

3 CATEGORIES OF ALARMS AND EVENTS

This section contains information on the common alarms and events to be implemented on the electric meter. Each system security requirement category is listed in bold typeface. For each system security requirement category, alarms and events are grouped under sub-categories denoted in bold and italicized typeface. The system security requirement categories were selected from the categories listed in the AMI System Security Requirements document. The sub-categories were selected by evaluating the identified common alarms and events and grouping them under major components of the AMI.
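
As a rough illustration of this grouping, the sketch below arranges the categories and sub-categories of Section 3 as a lookup keyed by the prefix of the event type IDs used in the tables of this section (for example, TAB for C12.XX entries). The category names, sub-category names, and ID prefixes are taken from this document; the dictionary layout and the function name `classify` are assumptions made for illustration only and are not part of any AMI standard.

```python
# Illustrative lookup: event type ID prefix -> (category, sub-category).
# The names come from the Section 3 headings and tables; the data
# structure itself is an assumption, not something the report defines.
CATEGORY_BY_PREFIX = {
    "TAB": ("Authentication", "C12.XX"),
    "LOG": ("Integrity", "Event Log Storage and Management"),
    "USG": ("Integrity", "Usage Data"),
    "CFG": ("Integrity", "Meter Configuration"),
    "SWI": ("Controls", "Meter Disconnect Switch"),
    "MET": ("Anomaly Detection Services", "Metrology"),
    "FIR": ("Anomaly Detection Services", "Firmware/Software"),
    "CRY": ("Cryptographic Services", "Key Management and Certificates"),
    "COM": ("Notification and Signaling Services", "Communications Interfaces"),
    "SEC": ("Notification and Signaling Services", "System Security Alarms and Events"),
    "PHY": ("Notification and Signaling Services", "Device Alarms and Events"),
}


def classify(event_type_id):
    """Return (category, sub-category) for an event type ID such as 'TAB.5'."""
    prefix, separator, number = event_type_id.partition(".")
    if not separator or not number.isdigit():
        raise ValueError(f"malformed event type ID: {event_type_id!r}")
    if prefix not in CATEGORY_BY_PREFIX:
        raise ValueError(f"unknown event type prefix: {prefix!r}")
    return CATEGORY_BY_PREFIX[prefix]


if __name__ == "__main__":
    for example in ("TAB.5", "SWI.1", "PHY.2"):
        print(example, "->", classify(example))
```

A lookup of this kind could help a SIEM integration route meter events by category without parsing vendor-specific names, which is the interoperability concern raised in Section 1.2.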

Within this section, the following terminology is used to differentiate between “alarms” and “events”:
• Alarms are considered security critical and should be reported immediately to the system operator or security event management system. Alarms and events may have a direct impact on critical functions such as system control.
• Events may or may not be security related but are useful as forensic evidence during the investigation of an AMI cyber-security incident. Correlated events may also indicate an attack in progress. Events are stored locally on the device and periodically sent to the event management system. For each device, events can happen at different logical layers within the system, including network communication, operating system, or application.

This section contains tables describing the alarms and events. Each table contains the following fields: name, description, type, and traceability. The name field is a short, unique way to identify the alarm or event. The description field contains more information or a definition. The type field is populated with an A for an alarm and an E for an event. The traceability field lists the requirements to which the alarm or event traces, or is marked as not applicable when appropriate.

When possible, these alarm and event categories are traced to the requirements listed in AMI System Security Requirements and NISTIR 7628.
The AMI System Security Requirements categories that are applicable to alarms and events consist of the following:
• Authentication (FAT)
• Integrity (FIN)
• Accounting (FAC)
• Anomaly Detection Services (FAS)
• Cryptographic Services (FCS)
• Notification and Signaling Services (FNS)

The AMI System Security Requirements topics that are not applicable to alarms and events consist of the following:
• Confidentiality and Privacy (FCP)
• Availability (FAV)
• Identification (FID)
• Authorization (FAZ)
• Non-Repudiation (FNR)
• Boundary Services (FBS)
• Resource Management Services (FRS)
• Trust and Certificate Services (FTS)
• Development Rigor (ADR)
• Organizational Rigor (AOR)
• Handling/Operating Rigor (AHR)
• Accountability (AAY)
• Access Control (AAC)

In cases where an alarm or event traces to multiple items in AMI System Security Requirements, the following convention is used: ABC.1,2,..,N, where ABC denotes the AMI System Security Requirements category code and 1-N denotes the specific requirements in the category. In cases where multiple categories are referenced, a semicolon (;) is used to separate the categories (e.g., ABC.1,2; DEF.1,3).

The NISTIR 7628 categories that are applicable to alarms and events consist of the following:
• Access Control (SG.AC)
• Audit and Accountability (SG.AU)
• Identification and Authentication (SG.IA)
• Communication Protection (SG.SC)
• Information Integrity (SG.SI)

In cases where an alarm or event traces to multiple items in NISTIR 7628, the following convention is used: AB.CD-1,2,..,N, where AB.CD denotes the category code and 1-N denotes the specific requirements in the category. In cases where multiple categories are referenced, a semicolon (;) is used to separate the categories (e.g., AB.CD-1,2; EF.GH-1,3).

This section also includes material related to the Security Profile for AMI.
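
The two traceability conventions described above lend themselves to mechanical parsing. The following is a minimal sketch, assuming that AMI System Security Requirements codes are always three capital letters followed by a period and that NISTIR 7628 codes always have the form of two capital letters, a period, two capital letters, and a hyphen. The function name `parse_traces`, the document labels, and the tolerance for spaces after commas are illustrative choices, not part of either requirements document.

```python
import re

# Labels for the two requirement documents (illustrative constants).
AMI_SEC = "AMI System Security Requirements"
NISTIR = "NISTIR 7628"

# One pattern per convention: "ABC.1,2,..,N" and "AB.CD-1,2,..,N".
_PATTERNS = (
    (AMI_SEC, re.compile(r"([A-Z]{3})\.(\d+(?:,\d+)*)")),
    (NISTIR, re.compile(r"([A-Z]{2}\.[A-Z]{2})-(\d+(?:,\d+)*)")),
)


def parse_traces(field):
    """Parse a traceability string into (document, category, [numbers]) tuples."""
    references = []
    for item in field.split(";"):
        # Tolerate spaces around commas, e.g. "SG.CM-3, 4, 5".
        item = re.sub(r"\s*,\s*", ",", item.strip())
        if not item:
            continue  # skip empty entries left by a stray semicolon
        for document, pattern in _PATTERNS:
            match = pattern.fullmatch(item)
            if match:
                numbers = [int(n) for n in match.group(2).split(",")]
                references.append((document, match.group(1), numbers))
                break
        else:
            raise ValueError(f"unrecognized traceability entry: {item!r}")
    return references


if __name__ == "__main__":
    for reference in parse_traces("FAC.3; SG.AC-3,4,8; SG.SC-8,9,20"):
        print(reference)
```

A strict parser like this rejects entries that omit the separator (for example, a category code followed by a space instead of a hyphen), which makes it a simple consistency check on traceability fields before they are loaded into other tools.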
If applicable, these relationships are indicated along with the section of that document referenced. 351 3.1 352 This section describes alarms, and events related to authentication. Authentication 3-7 353 3.1.1 C12.XX 354 355 356 357 The ANSI C12.18, C12.19, and C12.22 communication standards are implemented in the majority of AMI electric meters in North America. As it relates to security, the ANSI standards provide specifications for authentication controls on the meter. C12.XX alarms and events are included in Table 1. 358 359 360 361 These communication standards may be used to access AMI electric meters using field tools. Additional guidance regarding field tools is included in the Security Profile for AMI Section AMISP-2.10.1. Additional guidance regarding system connection is included in the Security Profile for AMI Section DHS-2.8.18. 362 363 C12.22 is a standard generally used in North America. In subsequent versions, this document will be generalized and include DLMS/COSEM alarms and events. 364 365 Table 1 C12.XX Alarms, and Events Event Type ID Name Description Type Traces To TAB.1 C12.18 Successful Login This event occurs when a user successfully logs in. E FAC.3; SG.AC-3,4 TAB.2 C12.18 Failed Login This event occurs when a user attempts to log in but is not successful. E FAC.3; SG.AC-3,4,8 TAB.3 C12.19 User Password Modified This event occurs when a user password is modified in the meter’s C12.19 tables. E FAC.3; SG.AC-21 TAB.4 C12.19 User Permissions Modified This event occurs when a user's permissions are modified in the meter’s C12.19 tables. E FAC.3; SG.AC-3,4 TAB.5 C12.19 Least Privilege Alert This alert occurs when a user logs in with a lower privilege password and attempts to modify C12.19 tables that require a higher privilege. A FAC.3; SG.AC 7 TAB.6 C12.22 Failed Authentication This event occurs when a message authentication fails. 
This may occur as a result of a meter receiving a message with missing or invalid digital signatures for authentication. E FAC.3; SG.AC-3,4,8; SG.SC-8,9,20 TAB.7 C12.22 Successful Login This event occurs when a user successfully logs in C12.18 over C12.22. E FAC.3; SG.AC-3,4 TAB.8 C12.22 Failed Login This event occurs when a user attempts to log in E FAC.3; SG.AC-3,4,8 3-8 C12.18 over C12.22 but is not successful. 366 3.2 Integrity 367 The following section describes alarms and events related to integrity. 368 3.2.1 Event Log Storage and Management 369 370 Alarms and status events related to event log storage and management are included in Table 2. The purpose of the following events is to document the operation and use of event storage. 371 372 Table 2 Event Log Storage and Management Alarms and Events Event Type ID Name Description Type Traces To LOG.1 Local Event Store Cleared This event occurs when the local event store is cleared. E FAC.23 LOG.2 Local Event Log Overflowed This event occurs when the local event log overflows. This may occur as a result of the meter receiving a high number of events. A FAC.23; SG.AU-4,5 LOG.3 Event Log Configuration Change This event occurs when the event log configuration changes. Types of configuration changes may include types or number of events logged. E FAC.23; SG.CM-3,4,5,6 LOG.4 Event Log Disable This event occurs when the event logging functionality is disabled. E FAC.23; SG.CM-6 3-9 373 3.2.2 Usage Data 374 375 Alarms and status events related to the integrity of customer-specific usage data are included in Table 3. 376 377 Table 3 Usage Data Alarms and Events Event Type ID Name Description Type Traces To USG.1 Usage Data Cleared This event occurs when usage data is cleared from the device. E FAC.23 USG.2 C12.19 Energy Usage Data Table Modified This event occurs when the C12.19 usage data tables are modified on the device. 
E FAC.23 USG.3 Usage Data Configuration Change This event occurs when there is a usage data configuration change on the device. Types of usage data configurations may include consumption units and multipliers. E FAC.23 378 3.2.3 Meter Configuration 379 Alarms and status events related to the configuration of the AMI meter are included in Table 4. 380 381 Table 4 Meter Configuration Alarms and Events Event Type ID Name Description Type Traces To CFG.1 C12.19 Device Configuration Change This event occurs when a device's configuration is changed in the meter’s C12.19 tables. E SG.CM-3, 4, 5 382 383 3.3 Controls 384 385 The section describes the alarms and events as it relates to the accountability of control functions on the meter. 386 3.3.1 Meter Disconnect Switch 387 388 389 390 The meter disconnect switch enables the utility to send a signal to the AMI electric meter to disconnect a customer’s power if they are moving, have not paid their electric bills, or for safety purposes. Alarms and events related to triggering must be in place to detect malicious activity. Alarms and status events related to the meter disconnect switch are included in Table 5. 3-10 391 392 Table 5 Meter Disconnect Switch Alarms and Events Event Type ID Name Description Type SWI.1 Remote Disconnect This alarm occurs when power is remotely disconnected. A high frequency of disconnects or many in a single neighborhood could indicate a problem. A SWI.2 Remote Reconnect This alarm occurs when power is remotely reconnected. A SWI.3 New Connect This event occurs when a new power connection is established. A SWI.4 Local Disconnect This event occurs when power is locally disconnected. This event may be triggered through the meter’s C12.18 optical port interface. A SWI.5 Local Connect This event occurs when power is locally connected. This event may be triggered through the meter’s C12.18 optical port interface. 
A Traces To FAS.1,4 393 3.4 Anomaly Detection Services 394 395 This section describes the alarms and events as it relates to the anomaly detection functions on the meter. 396 3.4.1 Metrology 397 398 Alarms and status events related to detection of anomalies on the metrology board are included in Table 6. 399 400 Table 6 Metrology Alarms and Status Events Event Type ID Name Description Type Traces To MET.1 Abnormal Time Signal Readings This event occurs when data communicated from the metrology board to the register board is received at an abnormal rate. E FAS.1,4 3-11 Event Type ID Name Description Type Traces To MET.2 Metrology Failed Integrity Check This event occurs when there is a failed integrity check in data communicated from the metrology chip to the register chip. A FAS.1,4 SG.SC-8 401 3.4.2 Firmware/Software 402 403 404 Alarms and status events related to detections of anomalies in the firmware/software are included in Table 7. Guidance regarding firmware is included in the Security Profile for AMI Section DHS-2.10. 405 406 Table 7 Software Alarms, and Events Event Type ID Name Description Type Traces To FIR.1 Upgrade Initiation This event occurs when a firmware upgrade is initiated. The source of initiation should be included in the event. E SG.AU-16 FIR.2 Software Failed Integrity Check This event occurs when there is a failed integrity check concerning the firmware. A FNS.1;FAS.1,4; ; SG.AU-2; SG.SI-7 FIR.3 Upgrade Complete This event occurs when the firmware is successfully upgraded. E FIR.4 Software Error This alarm occurs when there is a software error. Information regarding what type of error has occurred including buffer overflows, exceptions, and integrity check failures is included. A FNS.1;FAS.1,4; SG.SI-7 407 3.5 Cryptographic Services 408 409 410 This section describes cryptographic services alarms and status events. 
Guidance regarding cryptographic services is included in the Security Profile for AMI Sections DHS-2.8.11, DHS-2.8.12, and DHS-2.8.15.

3.5.1 Key Management and Certificates

Alarms and events related to key management and certificates are included in Table 8. It is up to the vendor to provide sufficient information, in addition to the defined alarms and events, to distinguish the source of these key management and certificate alarms and events.

Table 8 Key Management Alarms and Status Events
Event Type ID | Name | Description | Type | Traces To
CRY.1 | Certificate Renewal | This event occurs when the certificate renewal request is made before the certificate expires. The old key and certificate are retained until the new certificate is available. | E | FCS.3,4; SG.IA-3; SG.SC-11,15
CRY.2 | Key Generation | This event occurs when a new key is generated on the device. | E | FCS.3,4; SG.IA-3; SG.SC-11
CRY.3 | Key Update | This event occurs when the key is updated. | E | FCS.3,4; SG.IA-3; SG.SC-11,14
CRY.4 | Key Revocation | This event occurs when the key is revoked. | E | FCS.3,4; SG.IA-3; SG.SC-11

3.6 Notification and Signaling Services

This section describes alarms and status events related to notification and signaling services.

3.6.1 Communications Interfaces

Alarms and status events related to general communication interfaces are included in Table 9. Guidance regarding communications interfaces can be found in the Security Profile for AMI Section DHS-2.8.18. These interfaces could include the following:
• Ethernet
• Cellular
• RF mesh

Table 9 Communication Interface Specific Events
Event Type ID | Name | Description | Type | Traces To
COM.1 | Communications Established | This event occurs when a communications connection is established or re-established. This event may be used to detect signal interference or jamming attacks. | E | FNS.2,3
COM.2 | Communications Lost | This event occurs when an existing communications connection is broken. This event may be used to detect signal interference or jamming attacks. | E | FNS.2,3
COM.3 | Communications Integrity Check Failure | This event occurs when an integrity check fails. This event may be used to detect signal interference or packet injection attacks. | E | FNS.1,2,3; FAS.1,4; SG.SI-8

3.6.2 System Security Alarms and Events

Alarms and status events related to security are included in Table 10.

Table 10 System Security Alarms and Events
Event Type ID | Name | Description | Type | Traces To
SEC.1 | C12.22 Replay Detected | This alarm occurs when a C12.22 replay is detected. | A | FNS.6
SEC.2 | C12.22 Invalid Packet | This alarm occurs when an invalid C12.22 packet is received. | A | FNS.6; SG.SI-7; SG.SC-20
SEC.3 | C12.22 Registration | This event occurs when a C12.22 registration occurs. | E | FNS.2,3; FAC.3; SG.IA-5
SEC.4 | C12.22 Deregistration | This event occurs when a C12.22 deregistration occurs. | E | FNS.2,3; FAC.3; SG.IA-5
SEC.5 | Time Synchronization Failed | This event occurs when time synchronization fails. | E | FNS.2,3

3.6.3 Device Alarms and Events

Alarms and events related to the device are included in Table 11. The following alarms and events document changes in the physical status of the AMI components.

Table 11 Physical and Device Alarms and Events
Event Type ID | Name | Description | Type | Traces To
PHY.1 | Inversion Tamper | This alarm occurs when an inversion tamper is detected. | A | FNS.1; FAS.2,3
PHY.2 | Removal Tamper | This alarm occurs when a removal tamper is detected. | A | FNS.1; FAS.2,3
PHY.3 | Reverse Rotation | This alarm occurs when a reverse rotation tamper is detected. | A | FNS.1; FAS.2,3
PHY.4 | Battery Voltage Low | This event occurs when the device's battery becomes low. | E |
PHY.5 | Device Power On | This event is generated when power is restored. | E | FNS.2,3
PHY.6 | Device Reset | This event is generated when the device has been reset via a hardware reset action. | E | FNS.2,3
PHY.7 | Device Outage | This event is generated when a loss of power is detected. This event may be generated if a backup power source (e.g., battery or capacitive) is provided. | E | FNS.2,3
PHY.8 | Electromagnetic Attack | This alarm occurs when an electromagnetic attack is detected. This may occur when an attempt is made to alter energy usage by placing a magnet on the meter. | A | FNS.1; FAS.2,3

4 COMMON SECURITY OBJECT DEVELOPMENT PLAN

This section describes the general issues related to the formalization of the AMI alarms and events document and the path towards the development of an accepted definition of the structure, content, and semantics of AMI alarms and events, or common AMI security objects. Specifically, a common set of attributes must be ascribed to each alarm and event, and a set of requirements for retention capabilities must be agreed upon.

4.1 Attributes

An alarm or event object should contain sufficient information for an event management system or system operator to make an informed decision about how to react to a security or system event. Every alarm and event should be authenticated at the source and linked to the identity of the device where it is generated. Basic attributes for all alarms and events are included in Table 12.

Table 12 Attributes
Name | Description | Traces To
Event Type ID | Event Type ID uniquely identifies the type of event that occurred. | FAC.7,8; SG.AU-3
Name | Name is a word or short phrase that identifies the event. | SG.AU-3
Originator ID | Originator ID identifies the source of the event. The Originator ID should contain information that allows the device type and equipment identity (e.g., serial number) to be determined. | FAC.8,9; SG.AU-3
Event Properties | Event-specific properties which provide additional information about the security event. One example is a disconnect event that includes a property indicating whether the disconnect was invoked locally or remotely. Another example would be a security failure event that indicates whether the failure was related to an optical interface or WAN. | FAC.8,20; SG.AU-3
Time Stamp | Time stamp information should conform to the following: it identifies the date and time of occurrence; it should use ISO 8601 time format with the ability to record up to millisecond precision; time stamps should be adequately synchronized across systems. | FAC.8,30; SG.AU-3,8
Event Quality | Event quality information should conform to the following: it helps determine if the event data has possibly been tampered with, including information such as time not synchronized, physical tamper triggered, and device errors. | FAC.13,21; SG.AU-9
Description | Description includes a brief textual description of the event type. | FAC.20; SG.AU-3

4.2 Retention Capabilities

This section describes a set of best practices for the retention of alarm and event information. The Security Profile for AMI provides guidance regarding backup and recovery in its section DHS-2.10.4. Additionally, NISTIR 7628 includes related guidance in its section 3.9.

Best practices based on this guidance related to remote storage/backup capabilities include the following:

• At minimum, an AMI electric meter should retain two weeks' worth of data, but it should also be flexible and configurable to allow shorter or longer retention times based on regional and organizational regulations and policy.

• At minimum, event logs should be transferred to a storage/backup system weekly before they are cleared locally on the meter.

• Locally stored event log information should be readily correlatable to remotely stored event log information to allow for forensic investigation of meters without network connectivity.

• Log backups should be physically separated from the AMI system.
At the head end, these log backups should be stored in information systems that are physically separate from the AMI system, in accordance with the organization's policies for enterprise data retention.

• Log backups should be stored in a secure environment with controlled access and provisions for disaster recovery.

Regarding the format of audit records and storage, in order to encourage interoperability between event management systems and AMI vendors, logs should conform to an open standard such as the IETF SYSLOG protocol or HTTP with XML delineated data.

Data should be stored in an aggregated event management system, either distributed or centralized. The data should be accessible independently from the AMI system in order to support offline analysis.

4.3 Development Roadmap

The following is a proposed roadmap of the steps that must be taken to develop common AMI security objects.

1) Formalize AMI Common Alarms and Events Document
   a. Engage key stakeholders in a public process to review the requirements, identify gaps, and formalize the document.
   b. An existing industry working group such as OpenSG would be preferred for this process.

2) Develop Stakeholder and Vendor Consensus
   a. Solicit key stakeholders, such as asset owners, AMI vendors, and security information and event management vendors, to form a working group tasked with the development of common AMI security objects.
   b. Once a final set of requirements has been developed, the document shall be presented to a working group body for review and formalization.
   c. The development of common AMI security objects will be beneficial to all parties, as it allows more resources to be applied towards detection of threats rather than integration of systems.

3) Facilitate the Development of AMI rule sets for SIEM systems
   a. Engage security application vendors in the development of AMI rule sets for existing SIEM systems.
   b. Provide expertise and guidance regarding standards and interoperability specific to AMI rule sets for SIEM systems.

4) Develop Conformity Testing and Certification Framework
   a. A conformity testing and certification framework shall be developed to ensure interoperability between SIEM systems that have AMI-specific functionality.
   b. The framework shall be verified in a laboratory environment.

5 CONCLUSION AND NEXT STEPS

This document has identified a set of common AMI alarms and events. Based on this information, a set of common AMI security objects will be defined and documented. The development roadmap in Section 4 lays out the path towards formalization of this document and development of the common AMI security objects.

The next steps in this process will be to develop a schedule to complete the roadmap identified in Section 4. This schedule will culminate with EPRI project P183.009, Standardized Security Objects for AMI. P183.009 intends to develop standard security objects for AMI systems. Standardized Security Objects for AMI is expected to build on AMI incident response work that was performed in the PDU Cyber Security and Privacy Initiative. In the Initiative, EPRI coordinated with major AMI vendors to identify specific alarms and events for which standard security objects should be developed. Standardized Security Objects for AMI will be developed based on a consensus among the vendors for how each of the alarms and events should be represented. This accepted definition of the structure, content, and semantics of AMI alarms and events will enable a new generation of AMI SIEM systems. Third-party security application vendors may also be included to accelerate the development of AMI SIEM systems.
Finally, the interoperability of the AMI vendors' implementations is intended to be verified in a laboratory environment.

6 APPENDIX: REFERENCES, GLOSSARIES, AND INDEXES

This section will contain a listing of the document's supporting references, a glossary, and an index.

6.1 References

This section will contain a listing of the document's supporting references.

IEC/TS 62351-7 TS Ed.1 (2008-04): Power systems management and associated information exchange - Data and communication security - Part 7: Network and system management (NSM) data object models

6.2 Acronyms

AMI - Advanced Metering Infrastructure
AMI-SEC - AMI Security
ANSI - American National Standards Institute
ASAP-SG - Advanced Security Acceleration Project for the Smart Grid
CBKE - Certificate Based Key Exchange
CSV - Comma Separated Value
C12.18 standard - ANSI standard for type 2 optical port
C12.22 standard - ANSI specification for interfacing to data communication networks
CSWG - Cyber Security Working Group
DPA - Differential Power Analysis
DMZ - DeMilitarized Zone
HAN - Home Area Network
HTTP - HyperText Transfer Protocol
IDS - Intrusion Detection System
IEC - International Electrotechnical Commission
IETF - Internet Engineering Task Force
IT - Information Technology
MAC - Media Access Control
MDMS - Meter Data Management System
MIB - Management Information Base
NAN - Neighborhood Area Network
NESCOR - National Electric Sector Cybersecurity Organization Resource
NIST - National Institute of Standards and Technology
NIST CSWG - NIST Cyber Security Working Group
NISTIR - National Institute of Standards and Technology Interagency Report
PDU - Protocol Data Unit (also Power Delivery and Utilization)
RFLAN - RF Local Area Network
SGIP-CSWG - Smart Grid Interoperability Panel Cyber Security Working Group
SIEM - Security Information and Event Management
SOAP - Simple Object Access Protocol
SPA - Simple Power Analysis
SYSLOG - IETF standard for computer data logging
WG - Working Group
XML - eXtensible Markup Language
ZigBee - HAN communication protocol

7 APPENDIX: MALICIOUS AMI SECURITY EVENT SCENARIOS

Malicious AMI security events consist of compromise attempts, a potentially successful malicious action, and symptoms of the attack. This section describes two possible security event scenarios as background: 1) a simple security event that can be detected with a single category of alarms and 2) a more complex event that requires correlating multiple security alarms.

Each scenario consists of a description of the attack, the possible alarms triggered (each with an Event Type ID that can be used to look up additional information in Section 3), and actions that the utility can take to mitigate the attack.

7.1 Scenario 1: Simple Physical Meter Attack

7.1.1 Attack

In this scenario, an attacker attaches an optical probe and attempts to authenticate to the deployed meter using the ANSI C12.18 protocol. Once authenticated, the attacker attempts to obtain other credentials such as keys and passwords stored within the respective C12.19 tables in order to gain access to other systems. Figure 3 depicts this attack.

Figure 3 Simple Physical Meter Attacks

7.1.1.1 Possible Alarms Triggered

The following alarms and events are triggered:
• C12.18 Failed Login (TAB.2) – Triggered repeatedly as the attacker fails to gain access.

7.1.1.2 Utility Actions

After alarms are sent to the utility, a decision will be made to respond and send personnel to the location. These personnel may or may not be trained to assess whether other malicious attacks have occurred; if the equipment was not destroyed, it could be replaced by a new meter.
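The alarm pattern of Scenario 1 lends itself to a simple rule: repeated C12.18 Failed Login (TAB.2) events from one meter, possibly followed by a C12.18 Successful Login (TAB.1). The Python sketch below is one hypothetical way an event management system might express that rule. The Event Type IDs come from this document. The `MeterEvent` structure, the function name, the threshold of three failures, and the ten-minute window are illustrative assumptions, not values from this report or from any vendor product.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MeterEvent:
    """Hypothetical minimal event record (not a standardized security object)."""
    event_type_id: str   # e.g. "TAB.2" (C12.18 Failed Login)
    originator_id: str   # identity of the meter that raised the event
    timestamp: datetime


def detect_optical_port_bruteforce(events, max_failures=3,
                                   window=timedelta(minutes=10)):
    """Flag meters showing repeated C12.18 failed logins (TAB.2), and any
    successful login (TAB.1) that follows them within the window.

    max_failures and window are assumed tuning values for illustration.
    Returns a list of (originator_id, timestamp, reason) tuples.
    """
    recent_failures = {}  # originator_id -> deque of failure timestamps
    findings = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        failures = recent_failures.setdefault(ev.originator_id, deque())
        # Forget failures that are older than the correlation window.
        while failures and ev.timestamp - failures[0] > window:
            failures.popleft()
        if ev.event_type_id == "TAB.2":
            failures.append(ev.timestamp)
            if len(failures) == max_failures:
                findings.append((ev.originator_id, ev.timestamp,
                                 "repeated C12.18 failed logins"))
        elif ev.event_type_id == "TAB.1" and len(failures) >= max_failures:
            findings.append((ev.originator_id, ev.timestamp,
                             "successful login after repeated failures"))
            failures.clear()
    return findings
```

A caller would pass the events collected from the meters and forward any findings to the operator who decides whether to send personnel. An isolated failed login followed by a successful one, which is typical of a technician mistyping a password, does not produce a finding under these assumed settings.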
Scenario 1 (Section 7.1.1.1) also triggers the following event:
• C12.18 Successful Login (TAB.1) – Triggered once when the attacker successfully gains access.

7.2 Scenario 2: Complex Cyber-Physical Meter Attack

7.2.1 Attack Stage 1: Physical Removal of Meter

The attacker gains physical access to the meter and removes it from the meter enclosure. Figure 4 depicts this attack stage.

Figure 4 Complex Attack Stage 1

7.2.1.1 Possible Alarms Triggered

The following alarms and events are triggered:
• Removal Tamper (PHY.2) – Triggered once the meter is removed from its enclosure.
• Device Outage (PHY.7) – Triggered once the meter is disconnected from its power source.

7.2.1.2 Utility Action

The alarms and events are received by the utility as the meter is being removed. A decision could be made to respond and send personnel to the location.

7.2.2 Attack Stage 2: Physical Attack of Meter Hardware

In the second stage, the attacker attaches hardware and tools to the device to obtain security-relevant information. Figure 5 depicts this attack stage.

Figure 5 Complex Attack Stage 2

7.2.2.1 Possible Alarms Triggered

The following alarms and events are triggered:
• Software Error (FIR.4) – Triggered as software errors occur while the attacker attempts to utilize various exploits.
• Device Reset (PHY.6) – Triggered as the attacker resets the device while trying to read out the memory.

7.2.2.2 Utility Action

If the meter is reconnected to the system, the alarms will be received by the utility. A decision could be made to respond and send personnel to the location.

7.2.3 Attack Stage 3: Usage of Stolen Credentials

As a result of reading out the memory, the attacker has gained access to the C12.18 login credentials. The attacker has figured out a way to communicate with the meter over C12.22 and attempts to use the stolen credentials to log into the meter. After several unsuccessful attempts, the attacker logs into the meter and issues a remote disconnect command. Figure 6 depicts this attack stage.

Figure 6 Complex Attack Stage 3

7.2.3.1 Possible Alarms Triggered

The following alarms and events are triggered:
• C12.22 Failed Login (TAB.8) – Triggered as the attacker makes unsuccessful login attempts.
• C12.22 Successful Login (TAB.7) – Triggered as the attacker successfully logs in.
• Remote Disconnect (SWI.1) – Triggered as the attacker issues a remote disconnect command.

7.2.3.2 Utility Action

The utility receives multiple C12.22 failed logins, a successful login, and a remote disconnect alarm. A decision could be made to respond and send personnel to the location.

Intrusion Detection System for Advanced Metering Infrastructure

Product ID Number

Draft Release – August 10, 2012

DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES

THIS DOCUMENT WAS PREPARED BY THE ORGANIZATION(S) NAMED BELOW AS AN ACCOUNT OF WORK SPONSORED OR COSPONSORED BY THE ELECTRIC POWER RESEARCH INSTITUTE, INC. (EPRI).
NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, THE ORGANIZATION(S) BELOW, NOR ANY PERSON ACTING ON BEHALF OF ANY OF THEM:

(A) MAKES ANY WARRANTY OR REPRESENTATION WHATSOEVER, EXPRESS OR IMPLIED, (I) WITH RESPECT TO THE USE OF ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT, INCLUDING MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR (II) THAT SUCH USE DOES NOT INFRINGE ON OR INTERFERE WITH PRIVATELY OWNED RIGHTS, INCLUDING ANY PARTY'S INTELLECTUAL PROPERTY, OR (III) THAT THIS DOCUMENT IS SUITABLE TO ANY PARTICULAR USER'S CIRCUMSTANCE; OR

(B) ASSUMES RESPONSIBILITY FOR ANY DAMAGES OR OTHER LIABILITY WHATSOEVER (INCLUDING ANY CONSEQUENTIAL DAMAGES, EVEN IF EPRI OR ANY EPRI REPRESENTATIVE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES) RESULTING FROM YOUR SELECTION OR USE OF THIS DOCUMENT OR ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT.

Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by EPRI.

The following organization(s), under contract to EPRI, prepared this report:
University of Illinois at Urbana-Champaign

This is an EPRI Technical Update report. A Technical Update report is intended as an informal report of continuing research, a meeting, or a topical study. It is not a final EPRI technical report.

NOTE

For further information about EPRI, call the EPRI Customer Assistance Center at 800.313.3774 or e-mail askepri@epri.com.

Electric Power Research Institute, EPRI, and TOGETHER...SHAPING THE FUTURE OF ELECTRICITY are registered service marks of the Electric Power Research Institute, Inc.

Copyright © 2012 Electric Power Research Institute, Inc. All rights reserved.
ACKNOWLEDGMENTS

The following organization(s), under contract to the Electric Power Research Institute (EPRI), prepared this report:

University of Illinois at Urbana-Champaign
1308 W. Main St.
Urbana, Illinois 61801

Principal Investigator: Robin Berthier

This report describes research sponsored by the Electric Power Research Institute (EPRI). This publication is a corporate document that should be cited in the literature in the following manner: Title of Document: Subtitle. EPRI, Palo Alto, CA: <Year>. <Product ID#>.

ABSTRACT

The deployment of Advanced Metering Infrastructures (AMI) significantly increases the attack surface that utilities have to protect. As a result, there is a critical need for efficient monitoring solutions to supplement protective measures and keep the infrastructure secure. This document investigates current industrial and academic efforts to address the challenge of detecting security events across the range of AMI networks and devices. The goal of this study is to help utilities and vendors to understand intrusion detection requirements, gaps in existing approaches, and research problems that need to be solved to build and deploy a scalable and comprehensive security monitoring solution.

CONTENTS

1 INTRODUCTION
  1.1 Purpose and Scope
  1.2 Document Organization
2 MONITORING REQUIREMENTS AND CURRENT APPROACHES
  2.1 AMI Security Threats and Monitoring Requirements
  2.2 Major Security Concerns
  2.3 Industry Solutions
  2.4 Academic Solutions
3 GUIDELINES FOR A SCALABLE AND COMPREHENSIVE IDS FOR AMI
  3.1 Characteristics of an IDS Architecture for AMI
  3.2 Case Study
    3.2.1 Intrusion Detection Operations Required
    3.2.2 Monitoring Architecture Components, Topology, and Communications
    3.2.3 Alert Correlation and Aggregation
4 CONCLUSION AND NEXT STEPS
5 APPENDIX: REFERENCES, GLOSSARIES, AND INDEXES
  5.1 References
  5.2 Acronyms

LIST OF FIGURES

Figure 1: Percentages of IDS vendors for different technologies and environments. Source: publicly available information from top 15 smart grid security solution vendors.
Figure 2: Characteristics of a scalable and comprehensive intrusion detection system for AMI
Figure 3: AMI Network Diagram Instrumented with IDS Components (Courtesy Justin Searle, UtiliSec)

LIST OF TABLES

Table 1: Overview of research publications related to IDS for AMI or SCADA environments
Table 2: Monitoring Operations and Sensor Placement Based on Attack Consequences

1 INTRODUCTION

1.1 Purpose and Scope

This Intrusion Detection Systems (IDSes) for Advanced Metering Infrastructure (AMI) document is a product of the EPRI AMI Incident Response Project. The document is intended to give AMI vendors and asset owners a clear understanding of the unique monitoring requirements of AMI and to identify key research challenges related to intrusion detection technology and large-scale deployment.

The effective design and deployment of IDSes in a utility’s AMI environment have several characteristics that differentiate them from design and deployment in traditional information technology (IT) environments. For example, simply deploying a perimeter IDS may not provide the coverage necessary for an AMI system. Since there tend to be mesh networks in addition to IP-based backhaul networks, positioning an IDS at the AMI head-end system could miss malicious activity in the mesh network. In addition, there can be scalability issues, as some utilities deploy millions of meters in their service territories.

The scope of this document includes monitoring requirements for the core components of an AMI (i.e., collection engine, meter data management system, data collection unit, and meters) and does not cover the home area network (HAN) or third-party communication equipment.
1.2 Document Organization

Section 2 reports on requirements for monitoring the security of AMI and on existing approaches from both industry and academia to address those requirements. This review leads to a gap analysis presented in Section 3, where key research challenges and guidelines for deploying a scalable IDS for AMI are identified.

2 MONITORING REQUIREMENTS AND CURRENT APPROACHES

2.1 AMI Security Threats and Monitoring Requirements

The deployment of an AMI represents a significant increase in the attack surface that utilities have to protect. The addition of a communication infrastructure and the processing capabilities of AMI devices, coupled with the physical accessibility of smart meters and even access points, enable new ways to penetrate the system and could attract a wide array of threats. Among the attack motivations that are specific to AMI, we consider:
• Energy fraud
• Service disruption for the purposes of extortion (e.g., through a denial of service), vandalism, hacktivism, or terrorism (e.g., power disruption through remote disconnects)
• Theft of sensitive information
• Abuse of communication infrastructure (e.g., by creating a botnet)

Malicious activities that achieve those goals could have a heavy financial impact on utilities and would likely result in major losses of customer trust and technology adoption. As a result, it is critical that utilities have a way to perform timely detection and identification of malicious actions and incidents so that local issues can be mitigated before they escalate. This objective requires the implementation of an efficient monitoring solution.

The challenges to address when designing a monitoring solution for a large and complex system include:
• What information should be collected?
• Where should sensors be deployed, and how can visibility over the information required for detection be gained?
• Which detection technologies would be best suited to triggering alarms when malicious activity occurs?
• How should appropriate system operators be notified?
• Should a separate communication channel be used to exchange intrusion detection information and configuration?
• Which data aggregation and correlation techniques should be deployed to optimally differentiate malicious events from legitimate ones?

The last question is crucial, because the likelihood of legitimate events that could trigger intrusion detection alerts is high. For instance, security alarms could be triggered because of operational mistakes, misconfigurations, system failures, or disruptive events such as natural disasters. The misidentification of legitimate failures as malicious actions would generate false positives and lessen the efficacy of the monitoring solution.

2.2 Major Security Concerns

As part of the initiative to develop common alerts and alarms for AMI, and to understand the needs for AMI cyber security incident detection and response, a questionnaire was developed and sent to utility partners. Respondents represented a diversity of environments (urban, suburban, and rural) and deployment phases (pilot planned, started, and completed). The top security concerns expressed were loss of controllability over AMI devices, followed by loss of observability due to a lack of data integrity. Cyber threats specific to AMIs included meter compromise and massive remote disconnects. Indeed, meters, along with pole-top collectors, are the components that are most vulnerable to cyber intrusion by an external entity, while the headend system and vendor access are seen as the most vulnerable to insider attacks.
At a lower level, utilities expressed concerns with respect to the following security events:
• Unauthorized massive remote disconnect
• Device tampering: malware and malicious code injection (e.g., through buffer overflow attacks), rogue device attachment, meter tampering, access to firmware password, and zero-day attacks against AMI devices
• Encryption issues: access to decryption keys or discovery of flaws in encryption
• Denial of service against routers or cell relays
• Unauthorized modifications to system configurations and physical components

This list offers an initial guide to understanding the need for a comprehensive monitoring solution and identifying where intrusion detection sensors should be deployed. For example, the importance of threats targeting devices in the field indicates that the integrity and health of AMI devices should be closely monitored. However, instrumenting and monitoring every device may be too expensive, and, as explained in the following sections, current security solutions indeed do not cover this requirement.

2.3 Industry Solutions

Utilities are investing in monitoring solutions to complement the anti-tampering alarms and event-logging capabilities already offered by smart meters. As shown in Figure 7, a survey of the 15 leading security vendors that offer AMI monitoring solutions showed that products mostly fall into two categories:
1. Centralized network-based intrusion detection sensors, and
2. Centralized security information and event managers (SIEMs).

[Figure 7: bar chart, horizontal axis 0% to 100%. Categories: Field monitoring solution for SCADA; Field monitoring solution for AMI; Centralized SIEM in utility network; Network-based monitoring sensor; Host-based sensor for embedded device]

Figure 7: Percentages of IDS vendors for different technologies and environments. Source: publicly available information from top 15 smart grid security solution vendors.
Network-based IDSes are sometimes coupled with a firewall to gain intrusion prevention capabilities, and sit in the head-end behind decryption servers to have access to clear traffic. Either they perform packet header analysis only, or they also include application-level dissectors to analyze payloads. SIEMs are also installed in the utility network and receive logs from security appliances and devices using Syslog and a variety of information sources. They offer a central database to ease event aggregation, correlation, and visualization across components and over time.

Those products offer a cost-effective solution to monitoring events and communication traffic from a large volume of AMI devices. Most of them were designed for SCADA monitoring, but an increasing number now integrate AMI protocol analysis capabilities and data-mining approaches to process AMI events.

While those products are important for monitoring the infrastructure, the cost advantage of deploying only a centralized solution has to be weighed against the limitation of not having visibility over events that occur at the edge of the network. In particular, neighborhood area networks (NANs) are usually deployed with a wireless mesh communication infrastructure, in which a significant portion of the traffic occurs among network nodes and is invisible to monitoring devices at the head-end. As a result, threats such as unauthorized remote disconnects originating from the field cannot be detected by current centralized solutions. Indeed, utilities have expressed the need for security solutions that could provide situational awareness over all parts of the infrastructure.
Another security monitoring gap emphasized by utilities has been the lack of host-based intrusion detection sensors embedded on AMI devices to permit remote checking of firmware integrity and to identify compromised devices. In addition, utilities expressed strong interest in large-scale patch management solutions for embedded devices deployed in the field.

The responses to the questionnaire on AMI Incident Response Guidelines and Best Practices indicate that the main reasons for the industry's current push for centralized monitoring solutions and the lack of distributed IDSes in the field have been 1) the need for high cost efficiency, 2) the lack of maturity of AMI security (e.g., how to assess the likelihood and criticality of a smart meter compromise), and 3) the difficulty of integrating proprietary communication protocols (e.g., most mesh network communication technologies at the lower layers are proprietary).

2.4 Academic Solutions

Research from academic institutions and national labs can be organized into three categories: 1) efforts to understand the threat environment, 2) efforts to develop security monitoring architectures, and 3) design of network- or host-based intrusion detection sensors. Those research efforts are summarized in Table 13 and detailed below.
Table 13: Overview of research publications related to IDS for AMI or SCADA environments

Publications | Threat analysis | Host IDS | Network IDS | AMI | SCADA
[1] Energy theft in the AMI (McLaughlin, 2010) ✓ ✓
[2] Multi-vendor penetration testing in the AMI (McLaughlin, 2010) ✓ ✓
[3] Cyber security issues for AMI (Cleveland, 2008) ✓ ✓
[4] Intrusion detection for AMI: requirements and architectural directions (Berthier, 2010)
[5] Cumulative attestation kernels for embedded systems (LeMay, 2009) ✓ ✓ ✓ ✓
[6] An IDS for wireless PCS (Roosta, 2008) ✓ ✓ ✓ ✓ ✓ ✓
[9] Intrusion detection in SCADA networks (Barbosa, 2010) ✓ ✓
[10] Sophia proof of concept report (Rueff, 2010) ✓ ✓
[8] Intrusion monitoring in PCS (Valdes, 2009)
[11] Distributed IDS in a multilayer network architecture of smart grids (Zhang, 2011) ✓ ✓ ✓
[12] Specification-based IDS for HAN (Jokar, 2011) ✓ ✓
[13] Specification-based IDS for AMI (Berthier, 2011) ✓ ✓ ✓

In the first category, [3] reviewed the security requirements for AMIs and the related threats, emphasizing that encryption and authentication alone will not be sufficient security protections, and that monitoring solutions are a critical complement. [1] provided a detailed security analysis of the issue of energy theft. The authors explained that AMI would significantly increase the risk of energy theft because of the interconnected nature of the infrastructure and the large-scale deployment of identical devices, leading to an amplification of effort, a division of labor, and an extended attack surface.
In [2], the same authors introduce a penetration testing method to evaluate AMI components, revealing vulnerabilities such as the sending of unencrypted passwords over optical ports, the possibility of replaying authentications, and the derivation of encryption keys from meter passwords.

In the category of work on security monitoring architectures, [4] outlines the requirements for a comprehensive intrusion detection system for AMI, based on an analysis of the threat model and the information required for detection. In particular, the authors explained that specification-based intrusion detection systems that enable the deployment of a white-listed network would offer a strong security monitoring solution in an AMI environment where communications are tightly controlled and deterministic. This assumption of well-behaved network traffic is characterized in [9], in which the authors explained how the fixed number of network devices, the limited number of protocols, and the regular communication patterns found in the SCADA environment would enable precise network traffic models that can be leveraged for intrusion detection. That notion was used to build a tool called Sophia [10] that captures network traffic in industrial control systems to build a baseline model, and then triggers alerts when deviations are detected. [8] used a combination of specifications, change detection, and statistical anomalies to monitor process control systems protocols such as Modbus and DNP3. Once again, taking advantage of the regularity of network communication patterns enables the hybrid approach to detect both known and unknown attacks. [11] described a distributed intrusion detection system for both AMI and SCADA systems that relies on anomaly-based sensors deployed in the HAN, NAN, WAN, and SCADA environments.
The sensors collect security-relevant information from the communication flows, and two machine-learning algorithms, a support vector machine and an artificial immunity approach called clonal selection, process the data to identify malicious behavior. Those algorithms offer high detection accuracies if they are correctly and sufficiently trained.

Finally, in the category of work on host-based intrusion detection sensors, [5] introduces an architecture called the cumulative attestation kernel that addresses the issue of securely auditing firmware updates in embedded systems such as smart meters. The system is designed to be cost-, power-, computation-, and memory-efficient. A prototype is implemented to demonstrate the feasibility of the solution as well as to formally prove that it meets remote attestation requirements.

With respect to network-based intrusion detection systems, [6] proposes a model-based sensor working on top of the WirelessHART protocol to monitor and protect wireless process control systems. The hybrid architecture consists of a central component that collects information periodically from distributed field sensors. A set of eight detection rules working on the physical, data-link, and routing layers covers threats including signal jamming, node compromise, and packet modification. [12] presents and evaluates a specification-based intrusion detection sensor for HAN designed to monitor the physical and MAC layers of the ZigBee protocol. [13] also took advantage of a specification-based approach to monitor the ANSI C12.22 protocols through dedicated sensors deployed in the NAN. That solution was unique in using formal methods to prove that specification-based checkers offer sufficient coverage with respect to an AMI security policy.
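
The baseline-then-deviation idea behind the white-list approaches surveyed in this section can be sketched in a few lines of Python. This is an illustrative toy only, not the method of Sophia [10] or of any other cited work. It assumes that flows are represented as hypothetical (source, destination, protocol) tuples; the function names and node names are invented for the example. The sketch learns the set of flows observed during a trusted training period and flags any later flow outside that set.

```python
def learn_whitelist(training_flows):
    """Build the baseline: every flow seen during a trusted period is allowed."""
    return set(training_flows)


def find_deviations(whitelist, observed_flows):
    """Return the observed flows that are absent from the baseline, in order."""
    return [flow for flow in observed_flows if flow not in whitelist]


# Hypothetical training traffic: meters talk to their collector,
# and the collector talks to the head-end.
baseline = learn_whitelist([
    ("meter-1", "collector-1", "C12.22"),
    ("meter-2", "collector-1", "C12.22"),
    ("collector-1", "head-end", "C12.22"),
])

deviations = find_deviations(baseline, [
    ("meter-1", "collector-1", "C12.22"),  # known flow, no alert
    ("meter-2", "meter-1", "C12.22"),      # meter-to-meter flow, not in baseline
])
```

Such a model works only as long as the traffic really is as regular as assumed; a real sensor would also need a way to update the baseline after approved configuration changes.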
3 GUIDELINES FOR A SCALABLE AND COMPREHENSIVE IDS FOR AMI

3.1 Characteristics of an IDS Architecture for AMI

The study of the threat model, the needs expressed by utilities, the current solutions from security providers, and the latest research efforts on AMI monitoring provide a set of key insights into the characteristics of a comprehensive IDS for AMI. Those characteristics are summarized in Figure 8 and described below.

1. Monitoring of AMI communications at the head-end is necessary but not sufficient. Important threats that occur at the edge of the network mean that it is also necessary to instrument field devices or to deploy sensors in the field.

2. Monitoring of embedded operating systems in devices deployed in the field with host-based intrusion detection systems is critical. It empowers security operators to validate security alerts by checking whether the integrity of devices has been altered. This capability should be combined with an efficient patch distribution and management mechanism.

3. Network-based intrusion detection systems should leverage the deterministic nature of energy system communications through the implementation of a white-list approach in order to gain higher detection accuracy, to handle unknown attacks, and to work without the need for frequent updates.

4. IDS developers should embrace formal verification tools to validate the design of checkers for both host- and network-based intrusion detection systems. Those tools have been successfully used to check hardware design of critical systems, and they can offer strong mathematical guarantees to ensure that the stringent security requirements of AMI are met.

5. The deployment of IDS sensors in the field requires strong protection mechanisms and separate communication channels to prevent the IDS from becoming compromised. In addition, trust-building schemes such as majority voting should be implemented to ensure that attackers cannot easily forge alerts.

6. The monitoring architecture should scale to AMIs made of millions of devices. This high scalability requirement means that distributed detection technologies should be favored, in addition to smart data aggregation schemes that would enable operators to gain situational awareness without being overwhelmed by the number of alerts.

7. Finally, any security solution deployed in the smart grid environment has to be highly practical by reinforcing security layers without affecting the core mission of delivering energy. This requirement also applies to the monitoring architecture, which means that autonomous sensors and self-learning algorithms should be leveraged.

Figure 8: Characteristics of a scalable and comprehensive intrusion detection system for AMI

3.2 Case Study

Based on the characteristics and requirements outlined in the previous subsection, we now illustrate the design of a scalable and comprehensive IDS architecture for AMI through an example. We assume a traditional AMI architecture made of two types of network: a back-haul connecting the utility to a set of collectors deployed in the field, and neighborhood area networks to connect meters to collectors.

3.2.1 Intrusion Detection Operations Required

In order to define detection technologies and sensor placement, the first step is to translate the threat model into attack consequences. Those consequences are key to understanding the information required by an intrusion detection system for the successful identification of attacks.
In Table 14, which has been updated from one provided in [4], monitoring operations are defined based on a generic but comprehensive list of attack consequences, and organized according to three detection technologies:

• Stateful specification-based monitoring: to track the behavior of nodes in the network over time and to compare operations to protocol specifications and security policy in order to flag deviations (e.g., by validating the sequence of C12.22 requests and responses and by monitoring the frequency of critical operations such as remote disconnects).

• Stateless specification-based monitoring: to verify a security property (e.g., the integrity of firmware, or the correct format of a C12.22 payload) without having to keep a state over time.

• Anomaly-based monitoring: again, to verify security properties, but with respect to statistical metrics (e.g., network bandwidth) rather than detailed system specifications (e.g., by monitoring the signal power level and packet losses in wireless mesh networks).

The locations of sensors are defined by the information accessible at each location (head-end, collectors, or meters) and the processing capabilities available and required by the monitoring operations. Typically, stateful monitoring operations require more computation than stateless operations. Finally, in the case of network-based monitoring, the rightmost column provides indications of which protocol layers of the OSI model have to be monitored.
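
To make the difference between stateful and stateless checks concrete, the following is a minimal Python sketch, not an implementation taken from [4] or from any product. It assumes a hypothetical event format (a node identifier plus a timestamp in seconds) and an illustrative policy of at most three remote disconnects per 60-second window. The class and function names are invented for the example, and the length comparison merely stands in for real C12.22 format validation.

```python
from collections import defaultdict, deque


class DisconnectRateMonitor:
    """Stateful check: remembers recent remote disconnects per node."""

    def __init__(self, max_events=3, window_s=60.0):
        # Both thresholds are illustrative assumptions, not values from a standard.
        self.max_events = max_events
        self.window_s = window_s
        self._history = defaultdict(deque)

    def observe(self, node_id, timestamp):
        """Record one disconnect; return True if the policy is violated."""
        events = self._history[node_id]
        events.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_s:
            events.popleft()
        return len(events) > self.max_events


def payload_length_ok(payload, declared_length):
    """Stateless check: one message is validated on its own, with no history."""
    return len(payload) == declared_length


monitor = DisconnectRateMonitor(max_events=3, window_s=60.0)
# Four disconnects within 30 seconds, then one much later.
alerts = [monitor.observe("collector-1", t) for t in (0, 10, 20, 30, 200)]
```

Only the fourth disconnect exceeds the assumed policy; by the time of the fifth, the earlier events have left the window. The stateful monitor needs memory proportional to the number of tracked nodes, which is one reason such checks are better suited to collectors or the head-end than to meters.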
Table 14: Monitoring Operations and Sensor Placement Based on Attack Consequences

Attack Consequences | Detection Goal and Operation | Sensor Locations | Protocol Layers (OSI Model)

Stateful Specification-based Monitoring
Integrity of configuration and routing protocols | Checking of configuration and routing operations against security policy and network configuration | Collectors | 3-4
Illegitimate network operations | Stateful checking of protocol operations against security policy and application configurations | Collectors and/or head-end | 5-7

Stateless Specification-based Monitoring
Inconsistent traffic origin or destination | Checking of packet header against security policy and network configuration | Collectors and/or head-end | 3-4
Integrity of communication traffic | Checking of packet payload against protocol specifications | Collectors and/or head-end | 3-7
Illegitimate use of credentials | Checking of system logs against security policy | Head-end | 5-7
Integrity of node software or hardware | Operating system, application, and file integrity checking | Meters, collectors, and head-end | Host
Unresponsive nodes | Checking of protocol operations against security policy and application configuration | Collectors | 2-7

Anomaly-based Monitoring
High bandwidth usage | Traffic monitoring against normal statistical profiles | Meters and/or collectors | 3-4
High signal power level | Checking of wireless signal against normal statistical profiles | Meters and/or collectors | 1-2

3.2.2 Monitoring Architecture Components, Topology, and Communications

Following guidance on intrusion detection systems from [14], a monitoring architecture for AMI can be decomposed into the following components:

• Sensors: software or hardware components to capture and analyze network or system activity. In the case of an AMI, sensors should be deployed at the head-end, collectors, and meters. Head-end sensors would process a large volume of traffic, while sensors in meters should have minimum computing requirements.

• Management server: information generated by sensors should be sent to one or several management servers. The roles of the management server are 1) to store events in a database, and 2) to run a correlation and aggregation process to detect intrusions that could not be identified locally.

• Database server: repository for event information recorded by sensors and management servers. The combination of the management server and the database server is often called a Security Information and Event Management (SIEM) system. A SIEM can log security events from sensors other than the ones deployed in the AMI.

• Console: interface that security administrators can use 1) to configure the intrusion detection systems, 2) to monitor the security state of the AMI, 3) to visualize and explore alerts, and 4) to conduct forensics activity.

Figure 9 presents the topology of the monitoring architecture with the locations of the various components in the AMI.

[Figure 9: network diagram. Labeled components: SIEM, mgmt server, database, host-based IDS at head-end, network-based IDS at head-end, IDS sensors on collectors, embedded sensor on meter, IDS console.]

Figure 9: AMI Network Diagram Instrumented with IDS Components (Courtesy Justin Searle, UtiliSec)

Communications among IDS components should be isolated from metering traffic. At the head-end and in the backhaul, that can be achieved through an encrypted VLAN. In NANs, depending on the communication medium, IDS management traffic and alerts can be carried either on a separate protocol (e.g., XML or JSON over SSL) or through the AMI communication traffic (e.g., ANSI C12.22) but with separate encryption keys.
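
As an illustration of the kind of alert a field sensor might send over such a separate channel, the following Python sketch serializes one alert as JSON. The field names (sensor_id, location, event, severity, timestamp) and the sample values are assumptions made for this example; they do not come from a standard schema or from the report. Transport security (the SSL layer mentioned above) is outside the scope of the sketch.

```python
import json


def build_alert(sensor_id, location, event, severity, timestamp):
    """Serialize one IDS alert as a compact JSON string (hypothetical schema)."""
    alert = {
        "sensor_id": sensor_id,
        "location": location,    # e.g., "meter", "collector", or "head-end"
        "event": event,
        "severity": severity,
        "timestamp": timestamp,  # seconds since the epoch, supplied by the caller
    }
    # Sorted keys give a stable byte representation, convenient for signing or hashing.
    return json.dumps(alert, sort_keys=True, separators=(",", ":"))


message = build_alert(
    "collector-17", "collector", "remote_disconnect_rate_exceeded", "high", 1344556800
)
decoded = json.loads(message)
```

Keeping the message small and self-describing matters on constrained NAN links, and a fixed key order makes it easier to authenticate alerts with keys that are separate from those used for metering traffic.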
To reach high scalability, IDS sensors should preprocess the activity they collect and be as autonomous as possible in order to send only the most relevant information to the management server. The management server should use machine-learning algorithms to correlate and aggregate events over time with the goals of 1) translating raw sensor data into actionable information, and 2) reducing false positive rates. Correlation and aggregation techniques are described in the next subsection. Additionally, for large-scale AMIs, the management server and sensors at the head-end can be deployed in a two-tier, load-balancing architecture, such that a first set of appliances preprocesses monitoring traffic (e.g., to decrypt payloads) and routes it to the correct server for final processing and storage.

Finally, two important mechanisms enable the monitoring architecture to become more resilient: 1) reduction of the trust of individual sensors by requiring majority voting or event correlation across multiple sensors before triggering of alerts, and 2) removal of single points of failure through deployment of multiple management servers and/or distribution of IDS information among sensors (e.g., using distributed hash tables).

3.2.3 Alert Correlation and Aggregation

The significant size of AMI requires deploying highly efficient security event managers in order to process a large volume of alerts while providing timely information about critical events and keeping a low volume of false positives. Alert processing operations [7] are organized according to the following steps:

1. Pre-processing:
   a. Normalization and storage of alerts into a standard format (e.g., IDMEF).
   b. Organization of normalized alerts into a relational database. Tables should be created for AMI components (e.g., meters, collectors, routers, etc.) and events (e.g., C12.22 failed authentication, remote disconnect, etc.).

2. Aggregation:
   a. Computation of probabilistic similarity measures among alerts (e.g., across space, such as several meters reporting high numbers of packet losses in the same NAN, and across time, such as collectors reporting scanning attempts over a similar period).
   b. Reduction of the volume of alerts through clustering and merging following attribute analysis and similarity measures (e.g., alerts targeting the same ApTitle), or through filtering following rules learned over time (e.g., discarding C12.18 authentication alerts if related to approved maintenance operations).

3. Correlation:
   a. Correlation using predefined attack scenarios, specified by experts or learned over time (e.g., an energy theft attempt would likely combine anti-tampering alerts with outage notifications). This correlation approach is only effective for known attacks.
   b. To complement the previous approach and handle unknown attacks, correlation of alerts can be made by linking attack steps over prerequisites and consequences of attacks (e.g., integrity violation of a meter system following a network buffer overflow exploit targeting the same meter).
   c. Correlation through multiple information sources, by combining knowledge about policies (e.g., maximum frequency for remote disconnect operations), maintenance operations (e.g., configuring a meter with a field device), and alerts.

4 CONCLUSION AND NEXT STEPS

This document has identified a set of characteristics of a scalable and comprehensive monitoring architecture for AMI, based on the review of AMI threats, utility needs, security vendor solutions, and the research literature. Those characteristics were illustrated through a case study that presented IDS components along with a topology and a discussion about IDS communication architecture and alert correlation and aggregation techniques.
The next steps to improving the technology will be to work with vendors, utilities, and third parties to ensure the interoperability of IDS components for AMI through the identification of standard interfaces and standard communication protocols.

5 APPENDIX: REFERENCES, GLOSSARIES, AND INDEXES

5.1 References

[1] S. McLaughlin, D. Podkuiko, and P. McDaniel, “Energy theft in the advanced metering infrastructure,” in Proceedings of the Critical Information Infrastructures Security, pp. 176–187, 2010.

[2] S. McLaughlin, D. Podkuiko, S. Miadzvezhanka, A. Delozier, and P. McDaniel, “Multi-vendor penetration testing in the advanced metering infrastructure,” in Proceedings of the 26th Annual Computer Security Applications Conference, ACM, 2010, pp. 107–116.

[3] F. Cleveland, “Cyber security issues for advanced metering infrastructure (AMI),” in Proceedings of the Power and Energy Society General Meeting: Conversion and Delivery of Electrical Energy in the 21st Century, IEEE, 2008, pp. 1–5.

[4] R. Berthier, W. Sanders, and H. Khurana, “Intrusion detection for advanced metering infrastructures: Requirements and architectural directions,” in Proceedings of the First IEEE International Conference on Smart Grid Communications (SmartGridComm), IEEE, 2010, pp. 350–355.

[5] M. LeMay and C. Gunter, “Cumulative attestation kernels for embedded systems,” in Proceedings of Computer Security–ESORICS 2009, pp. 655–670, 2009.

[6] T. Roosta, D. Nilsson, U. Lindqvist, and A. Valdes, “An intrusion detection system for wireless process control systems,” in Proceedings of the 5th IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS 2008), IEEE, 2008, pp. 866–872.

[7] U. Zurutuza and R. Uribeetxeberria, “Intrusion detection alarm correlation: a survey,” in Proceedings of the IADAT International Conference on Telecommunications and Computer Networks (TCN'04), Donostia, Spain, December 2004.

[8] A. Valdes and S. Cheung, “Intrusion monitoring in process control systems,” in Proceedings of the 42nd Hawaii International Conference on System Sciences, pp. 1–7, 2009.

[9] R. Barbosa and A. Pras, “Intrusion detection in SCADA networks,” in Lecture Notes in Computer Science: Mechanisms for Autonomous Management of Networks and Services, vol. 6155, pp. 163–166, Springer, 2010.

[10] G. Rueff, C. Thuen, and J. Davidson, “Sophia Proof of Concept Report,” Idaho National Laboratory (INL), 2010.

[11] Y. Zhang, L. Wang, W. Sun, R. C. Green II, M. Alam, and others, “Distributed IDS in a multilayer network architecture of smart grids,” IEEE Transactions on Smart Grid, num. 99, page 1, 2011.

[12] P. Jokar, H. Nicanfar, and V. Leung, “Specification-based IDS for home area networks in smart grids,” in Proceedings of the IEEE International Conference on Smart Grid Communication (SmartGridComm), pp. 208–213, 2011.

[13] R. Berthier and W. H. Sanders, “Specification-based IDS for AMI,” in Proceedings of the 17th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 184–193, 2011.

[14] NIST SP 800-94, “Guide to Intrusion Detection and Prevention Systems (IDPS),” 2007.

5.2 Acronyms

AMI: Advanced Metering Infrastructure
AMI-SEC: AMI security
ANSI: American National Standards Institute
ASAP-SG: Advanced Security Acceleration Project for the Smart Grid
CBKE: certificate-based key exchange
CSV: comma-separated value
C12.18 standard: ANSI standard for type 2 optical port
C12.22 standard: ANSI specification for interfacing to data communication networks
DPA: differential power analysis
DMZ: demilitarized zone
HAN: home area network
HTTP: Hypertext Transfer Protocol
IDS: intrusion detection system
IEC: International Electrotechnical Commission
IETF: Internet Engineering Task Force
IT: information technology
MAC: media access control
MDMS: meter data management system
MIB: management information base
NAN: neighborhood area network
NESCOR: National Electric Sector Cybersecurity Organization Resource
NIST: National Institute of Standards and Technology
NIST CSWG: NIST Cyber Security Working Group
PDU: protocol data unit
RFLAN: RF local area network
SGIP-CSWG: Smart Grid Interoperability Panel Cyber Security Working Group
SIEM: Security Information and Event Management
SOAP: Simple Object Access Protocol
SPA: simple power analysis
SYSLOG: IETF standard for computer data logging
WG: working group
XML: Extensible Markup Language
ZigBee: HAN communication protocol

AMI Cyber Security Incident Guidelines

Product ID Number

Advanced Metering Infrastructure Cyber Security Incident Guidelines
Product ID Number
Technical Update, August 10, 2012
DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES

THIS DOCUMENT WAS PREPARED BY THE ORGANIZATION(S) NAMED BELOW AS AN ACCOUNT OF WORK SPONSORED OR COSPONSORED BY THE ELECTRIC POWER RESEARCH INSTITUTE, INC. (EPRI). NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, THE ORGANIZATION(S) BELOW, NOR ANY PERSON ACTING ON BEHALF OF ANY OF THEM:

(A) MAKES ANY WARRANTY OR REPRESENTATION WHATSOEVER, EXPRESS OR IMPLIED, (I) WITH RESPECT TO THE USE OF ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT, INCLUDING MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR (II) THAT SUCH USE DOES NOT INFRINGE ON OR INTERFERE WITH PRIVATELY OWNED RIGHTS, INCLUDING ANY PARTY'S INTELLECTUAL PROPERTY, OR (III) THAT THIS DOCUMENT IS SUITABLE TO ANY PARTICULAR USER'S CIRCUMSTANCE; OR

(B) ASSUMES RESPONSIBILITY FOR ANY DAMAGES OR OTHER LIABILITY WHATSOEVER (INCLUDING ANY CONSEQUENTIAL DAMAGES, EVEN IF EPRI OR ANY EPRI REPRESENTATIVE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES) RESULTING FROM YOUR SELECTION OR USE OF THIS DOCUMENT OR ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT.

Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by EPRI.

The following organization(s), under contract to EPRI, prepared this report:
Southwest Research Institute (SwRI)
Automation and Data Systems Division
6220 Culebra Road, Building 68
San Antonio, TX 78238-5166, USA

This is an EPRI Technical Update report. A Technical Update report is intended as an informal report of continuing research, a meeting, or a topical study. It is not a final EPRI technical report.
NOTE

For further information about EPRI, call the EPRI Customer Assistance Center at 800.313.3774 or e-mail askepri@epri.com.

Electric Power Research Institute, EPRI, and TOGETHER...SHAPING THE FUTURE OF ELECTRICITY are registered service marks of the Electric Power Research Institute, Inc.

Copyright © 2011 Electric Power Research Institute, Inc. All rights reserved.

ACKNOWLEDGMENTS

The following organization(s), under contract to the Electric Power Research Institute (EPRI), prepared this report:

Southwest Research Institute
6220 Culebra Road
San Antonio, TX 78238-5166

Principal Investigators
Tam Do; Gary Ragsdale, Ph.D., P.E.; Will Arensman

This report describes research sponsored by EPRI.

This publication is a corporate document that should be cited in the literature in the following manner: Title of Document: Subtitle. EPRI, Palo Alto, CA: <Year>. <Product ID#>.

ABSTRACT

This document is intended to be used by system and asset owners to assist in preparing for and responding to AMI cyber security incidents. It was developed by conducting interviews with EPRI members, AMI asset owners, and vendors regarding the practices involved in responding to AMI cyber security incidents, and by mapping the responses to requirements put forth by the Department of Homeland Security (DHS), the National Institute of Standards and Technology (NIST), the Open Smart Grid (Open-SG) Working Group, and the Advanced Metering Infrastructure Security (AMI-SEC) working group.
Keywords
Cyber Security, Incident Response, Best Practices

CONTENTS

1 INTRODUCTION
  1.1 Usage
  1.2 References
2 REFERENCES AND BIBLIOGRAPHIES
  2.1 List of References
  2.2 Definition of Terms
3 AMI INCIDENT RESPONSE PREPARATION GUIDELINES
  3.1 Incident Response Organization
    3.1.1 AMI Entity Identification and Oversight [9 – SG.IR-1]
    3.1.2 Incident Response Program [9 – SG.IR-1]
    3.1.3 AMI Incident Response Development Facility [9 – SG.IR-3, SG.IR-4]
    3.1.4 Incident Response Roles and Responsibilities [9 – SG.IR-2]
  3.2 Incident Response Planning
    3.2.1 Defining AMI Cyber Incident Response Requirements [3 – 4.1; 9 – SG.IR-1, SG.AT-1]
    3.2.2 Incident Scenarios Identification [9 – SG.IR-1, SG.AT-1]
    3.2.3 Establishing a Continuity of Operations Plan [1 – 2.12.2; 9 – SG.CP-2]
    3.2.4 Identifying Roles and Responsibilities for Continuity of Operations [9 – SG.AT-6]
  3.3 Risk and Impact Assessment
    3.3.1 Identifying Internal Objectives and Functions
    3.3.2 AMI Incident Classification [3 – 2]
    3.3.3 Impact Ranking of AMI Functions [4 – 2.3]
    3.3.4 AMI Incident Impact Classification [4 – 2.3]
    3.3.5 AMI Incident Impact Analysis [4 – 2.3]
  3.4 Incident Response Training
    3.4.1 Incident Process Training [1 – 2.7.5; 3 – 2.12.1; 9 – SG.AT-6]
    3.4.2 Process Verification Testing [1 – 2.7.6]
4 AMI INCIDENT RESPONSE BEST PRACTICES GUIDELINES
  4.1 Incident Management Process [9 – SG.IR-5]
    4.1.1 Incident Identification
    4.1.2 Incident Remediation
    4.1.3 Incident Review
5 CONCLUSION
APPENDIX A: HISTORY OF AMI CYBER SECURITY INCIDENTS
APPENDIX B: CATALOG OF AMI INCIDENT ARTIFACTS AND FORENSICS
APPENDIX C: DESCRIPTIONS OF AMI INCIDENT SCENARIOS

LIST OF FIGURES

Figure 1 – AMI Incident Impact Analysis Flowchart
Figure 2 – AMI Incident Management Process

LIST OF TABLES

Table 1 – Scope of Impact
Table 2 – AMI Incident Impact Classification
Table 3 – Likelihood of Occurrence
Table 4 – Risk Rating
Table 5 – AMI Incident Type
Table 6 – Incident Scenario Description
Table 7 – Physical Incident Scenarios
Table 8 – Authentication Incident Scenarios
Table 9 – Communications Incident Scenarios
Table 10 – Software Incident Scenarios
Table 11 – Incident Remediation Strategies
1 INTRODUCTION

This document provides cyber security incident response guidelines and best practice recommendations specific to AMI systems. The document assumes that incident response guidelines and practices already exist for enterprise systems. The guidelines and practices described are specific to the unique components comprising the AMI system, hereafter referred to as the AMI logical architecture as defined in [3 – 4.2].

AMI logical architectures differ from other energy-related system architectures in several regards. The AMI logical architecture exists within the distribution grid and is closely associated with the utility’s customers. Because the AMI is concerned with energy distribution and not transmission, it does not fall under the scope of NERC CIP, ISO, or other bulk energy controls. Instead, the AMI is more subject to mass media and customer scrutiny. The AMI also differs from other energy-related systems in that major components of the AMI, such as the AMI meter, operate outside the fence lines of utilities in areas that are easily physically accessible compared to traditional electrical distribution grid components (e.g., substations and transformers). In some instances, AMI meters and web portals interact directly with customer-provided appliances and energy management systems using well-understood protocols. This document is dedicated to addressing these differences to the extent that they relate to AMI incident response. The document does not address incident response within the AMI-associated components external to the AMI logical architecture as described in [3 – 4.2].

It is also assumed that the reader has access to and will become familiar with the applicable standards cited in this document. The standards are included by citation only. The reader is encouraged to reference and become familiar with standards applicable to AMI systems.

1.1 Usage

The format of this document closely follows that of the NIST Special Publication (SP) 800-53, Information Security document [5] and the “U.S. Department of Homeland Security, Catalog of Control Systems Security: Recommendations for Standards Developers” document [1]. This document adopts the convention of adding a “supplemental guidance” section to each requirement from the ASAP-SG Security Profile 2.0 document [3]. In this document, the supplemental guidance section contains the identified industry best practices for implementing the requirement.

Requirements are listed in the following format:

• Requirement Title
  o Description
  o Requirement
  o Supplemental Guidance
  o Requirements Enhancement
    ▪ Rationale

This document is divided into two major sections: “AMI Incident Response Preparation Guidelines” and “AMI Incident Response Best Practices Guidelines.”

The goal of the “AMI Incident Response Preparation Guidelines” section is to lay the organizational groundwork for developing policies to respond to Advanced Metering Infrastructure (AMI) cyber security incidents.

The goal of the “AMI Incident Response Best Practices Guidelines” section is to provide guidance on how best to respond to identified AMI cyber security threats.

1.2 References

References throughout this document adopt the following format: [<Reference #> – <Section #>].

• <Reference #> – Refers to the document source listed in the reference section
• <Section #> – Refers to the specific section within the reference source

The absence of a <Section #> within the citation implicitly refers to the document in its entirety.

2 REFERENCES AND BIBLIOGRAPHIES

2.1 List of References

1. U.S. Department of Homeland Security, Catalog of Control Systems Security: Recommendations for Standards Developers, April 2011.
2. U.S. Department of Commerce, National Institute of Standards and Technology, Computer Security Incident Handling Guide, Special Publication 800-61, Revision 1.
3. The Advanced Security Acceleration Project (ASAP-SG) - Security Profile For Advanced Metering Infrastructure, Version 2.0, June 22, 2010.
4. AMI-SEC (Advanced Metering Infrastructure Security) Task Force - AMI Risk Assessment (draft document), AMI-SEC Risk Assessment v0.9a-20080319.doc.
5. NIST Special Publication (SP) 800-53, Information Security, Revision 3, Updated 5/1/2010, http://csrc.nist.gov/publications/nistpubs/800-53-Rev3/sp800-53-rev3final_updated-errata_05-01-2010.pdf.
6. UCAIUG: AMI-SEC-ASAP AMI System Security Requirements V1.01, 12/17/2008.
7. EPRI IECSA Volume II, Final Release, EPRI Use Case Repository, EPRI.com.
8. AMI Network V 3.0.doc, EPRI Use Case Repository, EPRI.com.
9. The Smart Grid Interoperability Panel (SGIP) – Cyber Security Working Group (CSWG), NISTIR 7628 – Guidelines for Smart Grid Cyber Security: Vol. 1, Smart Grid Cyber Security Strategy, Architecture and High-Level Requirements, August 2010.

2.2 Definition of Terms

AMI entity – the owner and/or operator of the AMI logical architecture (e.g., a cooperative, municipal agency, or corporate body).

Use case – a set of related activities and communications that render value in support of an AMI objective.

AMI function – a set of use cases supporting the achievement of a common AMI objective (e.g., billing).

AMI component – a subdivision of the AMI logical architecture primarily involved in one or more AMI use case communications or sequences of activities. AMI components may secondarily participate in non-AMI use cases. AMI components are described in [3 – 4.2].

AMI system – synonymous with the AMI internal architecture as described in [3 – 4.2].

AMI-associated components – subdivisions of the AMI logical architecture external to the AMI internal architecture (e.g., Outage Management System, Customer Information System, and Distribution Automation System). The AMI-associated components are depicted as the AMI logical architecture external perspective in [3 – 4.2].

AMI logical architecture [3 – 4.2] – In this document, the term AMI logical architecture refers to the AMI logical architecture internal perspective. The components encompassed within the AMI logical architecture internal perspective are:

• AMI Meters
• Field Tools
• AMI Communications Network Devices
• AMI Head End
• Meter Data Management System
• AMI Network Management System
• AMI Meter Management System

Level of effort – a measure of the resources required to achieve an end (e.g., successfully attacking an Advanced Metering Infrastructure (AMI) asset). Resources may include combinations of time, skill, money, and access to technology.

Loss of Control – The inability to exercise AMI control functions (e.g., remote disconnect).

Loss of Communications – The inability to communicate with the AMI (e.g., send/receive commands from a meter).

Loss of Trust/Integrity – The inability to verify the security of functions or data within the AMI (e.g., compromise of keys, integrity check failures, billing data compromise).

Loss of Privacy – The inability to protect the privacy of information (e.g., compromise of keys, loss of encryption functionality).

Loss of Attribution – The inability to attribute an AMI action (e.g., meter log on) to its actor.

Natural Event – An event which can be correlated with other situational awareness data (e.g., weather forecast, fires, earthquakes). These events are generally caused by an act of nature.

Man-Made Event – An event that is not caused by nature and generally results from human activity.

Malicious Event – An event which has been caused with ill intent. It generally occurs in the absence of natural or unintentional events. These events are identified by correlating the event with other situational awareness data (e.g., meter tampers, meter authentication attempts). Depending on the severity of the event, the AMI entity may need to notify law enforcement (e.g., local, federal). Types of malicious events include:

• AMI cyber attack
• Civil unrest
• Vandalism
• Energy theft

Non-Malicious Event – An event which has been caused unintentionally, or as the result of operator or system design error. These events generally arise as a result of AMI operation, misconfiguration, or design errors.

Objective – a level of achievement necessary for a desired outcome or state of AMI entity business activity (e.g., customer communication).

3 AMI INCIDENT RESPONSE PREPARATION GUIDELINES

The purpose of Section 3 and its subsections is to define the preparation guidelines for planning the incident response procedures defined in Section 4. The guidelines provide the organizational framework upon which incident response procedures are rationally created, validated, and justified.

For this section, a subset of guidelines has been selected that pertains specifically to the unique aspects of the Advanced Metering Infrastructure (AMI), such as:

• AMI meters operate in geographically distributed and easily physically accessible locations (e.g., homes, apartment buildings, and strip malls)
• AMI meters may contain remote connect/disconnect capabilities
• AMI meter data may be used in organizational demand response programs; therefore, the integrity of the data is of concern
• AMI systems often contain mesh communication networks, allowing for the potential to leverage the network to attack a large number of meters

The guidelines in this section should be considered a supplement to any existing organizational incident response plan, and should be used to augment incident response approaches as applicable to AMI.

This section is divided into four subsections: incident response organization, incident response planning, risk and impact assessment, and training.

3.1 Incident Response Organization

This subsection describes the requirements and provides guidance on how to organize an AMI incident response plan.

3.1.1 AMI Entity Identification and Oversight [9 – SG.IR-1]

The responsibility and accountability for AMI incidents fall to the AMI entity having responsibility and authority over an AMI logical architecture.

An individual or a group of individuals responsible for the different domains within the AMI entity should be identified and charged with leading the development of the AMI incident response program. It is important for this individual leader, or collective group of leaders, to take responsibility for the planning and implementation of cyber security incident response processes and procedures.

3.1.1.1 Requirement

An individual or a collective group of individuals with authority that encompasses all aspects of the AMI logical architecture should oversee the AMI incident response team and process.

Organizations may subdivide leadership of the AMI into different operational domains. Examples of operational domains are:

• Business Processes
  o Responsible for handling the business operations of the AMI logical architecture, such as revenue assurance and billing.
• AMI Operations
  o Responsible for day-to-day operation of the AMI internal architecture, such as maintenance and implementation of the meter data management system.
• Information Technology (IT) Operations
  o Responsible for overall management of the AMI-associated components, such as servers, backend appliances, and the organization’s security information and event management (SIEM) system.

For each identified operational domain within the organization, a leader should be identified to coordinate the incident response process as it pertains to that domain.

3.1.1.2 Supplemental Guidance

The designated leader within the AMI entity directs the formation of the incident response capability and governs the administration of the capability once it is operational. The AMI incident response designated leader has approval authority for incident plans, policies, and procedures [1 – 2.2, 2.3].

3.1.1.3 Requirements Enhancement

None

3.1.2 Incident Response Program [9 – SG.IR-1]

The AMI logical architecture realizes use cases, thereby achieving business objectives specific to the automated collection of meter data, provisioning of power to a customer, and ensuring the integrity of the entity-customer relationship.
AMI-specific use cases must be realized in a timely, reliable, and secure fashion for the relationship to prosper and entity business objectives to be achieved.

The purpose of the AMI incident response program is to sustain or restore the AMI use cases despite incidents that would otherwise disrupt the AMI logical architecture. An AMI incident response program addresses the unique cyber security risks with a corresponding capability to continue or resume operations of an Advanced Metering Infrastructure (AMI) in the event of disruption of AMI use cases and objectives.

The AMI incident response program entails the preparation, testing, and maintenance of AMI-specific policies, procedures, processes, systems, and skill sets. The program enables the AMI entity to restore the AMI’s operational status after the occurrence of a disruption. Disruptions can come from natural disasters, such as earthquakes, tornadoes, and floods, or from man-made events like riots, terrorism, or vandalism. The ability of the AMI to function after such an event depends directly on implementing the program before incidents occur, using the organization’s planning process [1 – 2.12].

3.1.2.1 Requirement

The AMI entity establishes an AMI incident response program as an identifiable and recognized organizational function.

3.1.2.2 Supplemental Guidance

The official establishment and visibility of the AMI incident response program as a formal and unique activity is important to its adoption and support within the AMI entity.

3.1.2.3 Requirements Enhancement

None

3.1.3 AMI Incident Response Development Facility [9 – SG.IR-3, SG.IR-4]

AMI logical architectures comprise complex internal and associated components that satisfy AMI entity goals, including reliability, integrity, and functionality. As with other systems, it is desirable to develop and verify incident response programs without risk to an operational AMI logical architecture. Therefore, it is good practice to use an AMI incident response development facility separate from the production AMI system.

3.1.3.1 Requirement

A facility exists as a development and test vehicle for the creation, maintenance, and verification of AMI incident response policies, procedures, training, processes, and components.

3.1.3.2 Supplemental Guidance

AMI systems differ distinctly from the enterprise systems often found within an AMI entity and are composed of systems such as:

• Electric Meters
• RF, Wireless, Wired, and Cellular Communications Networks
• Signing and Encryption Appliances
• Meter Data Management Systems (MDMS)

Additionally, many components of the AMI system operate in potentially harsh and physically unsecured environments.

The facility should, to the extent possible, replicate the unique aspects of the AMI system and its environment. The facility may consist of functional test beds used in development to test configurations of meters and systems prior to wide-scale deployments.

3.1.3.3 Requirements Enhancement

None

3.1.4 Incident Response Roles and Responsibilities [9 – SG.IR-2]

The skill sets required to develop, test, and execute an AMI incident response program are many and in many cases unique to the AMI logical architecture. Likewise, the duties to be performed within the program are often unique to the AMI system. For example, the understanding of the AMI meter and its communications networks (e.g., wide-area and neighborhood-area) is a skill set different from that required for a typical IP-based enterprise system.
The designated roles to respond to a meter intrusion or an AMI network denial-of-service attack are also different, as a meter technician may require training in non-IP communication protocols, AMI hardware configuration parameters, AMI diagnostic techniques, and cyber security methods.

3.1.4.1 Requirement

The duties, skill sets, and responsibilities of roles within the AMI incident response program are clearly defined and communicated within the AMI entity.

3.1.4.2 Supplemental Guidance

Each organization may subdivide its AMI leadership into different functional domains. The AMI incident response team should consist of members from each of these functional domains and should include representatives from domains relating to the AMI-associated components.

The following groups should be considered in the formation of a cross-functional team that plans for and responds to AMI incidents:

• Business (e.g., Legal, Accounting, Human Resources, Public Relations)
  o Charged with responding to the business impacts of the AMI incident.
• AMI Operations
  o Charged with responding to the operational impacts of the AMI incident.
• Information Technology (IT) Operations
  o Charged with responding to the impacts of the AMI incident as it relates to the IT system.

The cross-functional team may be adjusted depending on the scope of impact of the AMI incident.

3.1.4.3 Requirement Enhancement

The AMI framework is an integral part of the larger organizational framework. The organizational framework includes AMI incident response roles with well-defined skills, accountability, and responsibilities.

3.2 Incident Response Planning

This section provides requirements and guidance on how to plan and prepare for an AMI incident.

3.2.1 Defining AMI Cyber Incident Response Requirements [3 – 4.1; 9 – SG.IR-1, SG.AT-1]

The AMI system is implemented to satisfy a set of AMI-specific use cases and derive business value. Typical use cases can be found within the Advanced Security Acceleration Project (ASAP-SG) - Security Profile for Advanced Metering Infrastructure, Version 2.0, document [3 – 4.1]. The document provides traceability of AMI requirements to business use cases and associated business objectives supported by the AMI system.

The goal of the AMI incident response program is to satisfy AMI requirements associated with data integrity, reliability, timeliness, privacy, and accessibility when incidents disrupt the realization of AMI use cases. A thorough understanding of AMI requirements is necessary for the proper formulation of the policies, procedures, training, and resources comprising the AMI incident response program.

3.2.1.1 Requirement

The AMI incident response program satisfies a set of clearly defined and articulated requirements traceable to the AMI system use cases and business objectives.

3.2.1.2 Supplemental Guidance

The AMI use cases satisfy a set of business objectives when successfully achieved, or produce a set of business impacts when disrupted by an incident. AMI incident response requirements should be considered in light of the business objectives and impacts associated with a given incident scenario.

3.2.1.3 Requirement Enhancements

None

3.2.2 Incident Scenarios Identification [9 – SG.IR-1, SG.AT-1]

A proactive stance toward AMI incidents requires that the AMI program anticipate incidents before they are encountered.
The incident scenario analysis considers the impacts upon use cases and business objectives caused by benign events or malicious attacks on AMI meters, communication devices, head end, forecasting system, meter management system, network management system, and other AMI components as defined in [3 – 4.3].

3.2.2.1 Requirement

The AMI incident response program identifies AMI incident scenarios and the possible causes of those AMI incidents. The incident scenarios should identify the forensic evidence required to detect the AMI incident. The AMI incident response program should anticipate AMI incident scenarios that may not yet have occurred.

3.2.2.2 Supplemental Guidance

To proactively identify AMI incidents, it is important that the organization develop an overall AMI incident response plan that provides a process for responding to AMI security incidents involving vulnerabilities that have been identified but not yet remediated, as well as vulnerabilities that the organization has accepted as a risk.

The AMI incident response plan should include methods for assigning incident impact and asset-value metrics (e.g., monetary or objective-oriented).

The AMI incident plan should include practices for:

• Reviewing and updating incident response procedures
• Responding to newly released vulnerabilities
• Defining metrics for determining the impact of newly identified vulnerabilities
• Defining methods for determining the type of AMI incident (e.g., malicious, non-malicious, natural)

The risk analysis described in Section 3.3.5 should be applied to scenarios after the plausibility of the scenario has been adequately established through examination or actual occurrence.

Additionally, the AMI incident response plan should include:

• Risk mitigation strategies
• Policies and procedures for responding to incidents based on the impact and type of AMI incident

3.2.2.3 Requirement Enhancement

None

3.2.3 Establishing a Continuity of Operations Plan [1 – 2.12.2; 9 – SG.CP-2]

A continuity of operations plan ensures the resumption of services in the event of an AMI incident.

3.2.3.1 Requirement

The organization should develop, test, and implement a continuity of operations plan dealing with the overall issue of maintaining or re-establishing operation of the AMI system in case of an undesirable interruption. The plan should address roles, responsibilities, assigned individuals with contact information, and activities associated with restoring system operations after a disruption or failure. Designated officials within the organization review and approve the continuity of operations plan. Individuals responsible for the systems covered by the continuity of operations plan should be trained on its implementation and execution.

3.2.3.2 Supplemental Guidance

A continuity of operations plan addresses both business continuity planning and recovery of all vital AMI system operations.

3.2.3.3 Requirements Enhancement

The continuity of operations plan may include:

• Verification of communications with all AMI meters, collectors, and relays
• The restoration of disconnect switches to known states
• Verification of the integrity of meter firmware
• Verification of the integrity of meter data

The continuity of operations plan should also address:

• Unintentional AMI cyber security incidents (e.g., operator error, accidents)
• Natural events (e.g., high-velocity winds, flooding, earthquakes)
• Intentional incidents as a result of:
  o Disgruntled current and past employees
  o Hackers

3.2.3.3.1 Rationale

Experience demonstrates that systems fail and mistakes occur. An AMI continuity of operations plan allows for an orderly recovery from such situations. Critical analysis of a system often brings weaknesses to light, thereby giving the AMI incident response program opportunities to design additional safeguards into the AMI system necessary for an orderly recovery.

3.2.4 Identifying Roles and Responsibilities for Continuity of Operations [9 – SG.AT-6]

The organization’s continuity of operations plan should define and communicate the specific roles and responsibilities for each part of the plan in relation to various types of disruptions to the operation of the AMI system.

3.2.4.1 Requirement

The organization’s continuity of operations plan defines and communicates the specific roles and responsibilities for each part of the plan in relation to various types of AMI incidents [1 – 2.12.3].

3.2.4.2 Supplemental Guidance

An AMI incident continuity of operations plan defines the roles and responsibilities of the various employees and contractors in the event of an incident. The plan identifies responsible personnel to lead the recovery and response effort if an incident occurs.

The following roles and responsibilities are representative of the organizational domains involved with AMI and should therefore be considered within the continuity of operations plan:

• Incident Coordinator
  o Responsible for organizing, dispatching, and coordinating the incident response team
  o Trained in the incident response policies, procedures, roles, responsibilities, and goals
• AMI Operations Lead
  o Responsible for the day-to-day operations of the AMI system within the organization
  o Assists in identifying key personnel within the AMI operations group to assist in the incident response process
  o Leads efforts to quantify and categorize incident impacts
  o Trained in the AMI functions, use cases, goals, metrics, and incident response procedures
• IT Operations Lead
  o Responsible for the day-to-day operations of information systems within the organization
  o Assists in identifying key personnel within the IT operations group to assist in the incident response process
  o Assists the AMI Operations Lead in the quantification and categorization of incident impacts
  o Trained in information technology and enterprise cyber security
• Business Operations Lead
  o Responsible for the day-to-day business operations within the organization
  o Assists in contacting the affected business entities within the organization
  o Assists the AMI Operations Lead in the quantification and categorization of incident impacts
  o Leads the mapping of the AMI entity’s organizational objectives into cyber security metrics
  o Trained in the AMI entity’s planning and budgeting
• Public Relations Lead
  o Responsible for correspondence with the media, regulators, and other external groups
  o Trained in the legal entity’s external communications, reporting, and marketing policies and practices
• Legal Lead
  o Responsible for liaising with law enforcement and regulatory agencies regarding planning for and responding to AMI cyber security incidents
  o Trained in judicial evidentiary data collection and custodial control requirements, external reporting procedures and policies, and regulations governing cyber security incidents

These generic categories were compiled from the interviews conducted with EPRI members, AMI asset owners, and vendors regarding the practices involved in responding to AMI cyber security incidents. Each organization may have different titles and categories for these roles and should make its best determination in mapping the provided categories to its organization.

3.2.4.3 Requirements Enhancement

None

3.2.4.3.1 Rationale

Specific roles and responsibilities within continuity plans help alleviate confusion in the event of a significant incident. This clarification can ease the process and better focus the organization when recovering from a disruption.

3.3 Risk and Impact Assessment

This section provides requirements and guidance on how to perform a risk and impact assessment with the goal of assisting the AMI entity in prioritizing incident response efforts.

3.3.1 Identifying Internal Objectives and Functions

AMI entity objectives should be considered as part of the risk and impact assessment. Methods and use cases for those objectives should identify the tangible and intangible benefits to the AMI entity.

3.3.1.1 Requirements

The organization identifies AMI entity objectives and the use cases implemented to achieve those objectives.

3.3.1.2 Supplemental Guidance

An AMI entity should identify internal objectives and functions that would be affected by an AMI incident and should consider those objectives when assigning a risk and impact rating to the identified incidents.
Examples of AMI entity internal objectives that may be affected by AMI incidents are:

• Real-Time Pricing (RTP) – Top Level [7 – D19]
• Remote Connect/Disconnect [6 – 2.1.1]
• Demand Response – Utility Commanded Load Control [7 – D12]
• AMI Network (Moving Data Elements from the AMI Head-End to Smart Meter & from the Smart Meter to the AMI Head-End) [8]
• Firmware Updates [9 – 2.3.17]

3.3.1.3 Requirements Enhancement
AMI entity objectives and use cases are impacted by AMI incidents in varying degrees. Use cases affected by AMI cyber-security incidents are identified and associated with organizational objectives.

3.3.2 AMI Incident Classification [3 - 2]
The AMI program should define common methods, vernacular, and criteria for describing a given AMI incident. Incidents may be classified according to common incident characteristics (e.g., failure mechanisms, mitigation methods, resource requirements, impact assessment ranking). Categorizing incidents facilitates better planning, organization, and uniformity within the incident response program.

3.3.2.1 Requirement
AMI incidents are subdivided into categories and classified according to scope of impact, likelihood of occurrence, and incident type.

3.3.2.2 Supplemental Guidance
The following criteria should be used to describe AMI incidents:

Scope of Impact:
A classification based on the number of systems affected by the AMI incident. Scope of impact ratings can be used to determine an incident's severity and impact on the operation of the AMI logical architecture.

Likelihood of Occurrence:
A classification, based on the current threat environment, of how likely the AMI incident is to occur. Likelihood of occurrence ratings can be used to reprioritize responses to AMI incidents.

Type:
A classification based on the intent of the perpetrator of the AMI incident (e.g., natural, man-made, malicious, or non-malicious).

3.3.2.3 Requirements Enhancement
None

3.3.3 Impact Ranking of AMI Functions [4 - 2.3]
The AMI program should assess the level of AMI entity concern associated with the loss of an AMI function.

3.3.3.1 Requirement
AMI incident impact rankings are assigned to assets, functions, or use cases according to the consequence and severity of an incident that disrupts portions or all of the functional capability.

3.3.3.2 Supplemental Guidance
AMI incident impact rankings should be developed through threat assessment exercises. The threat assessment exercise should enumerate all components of the AMI logical architecture, enumerate the components' interactions with external systems, assess the potential vulnerabilities within each component and component interface, and evaluate the impact (e.g., organizational and operational) of the vulnerability being exercised.

Example impact ranking definitions are provided below in Table 15.

Table 15 – Scope of Impact
| Rating | Scope of Impact | Description |
|---|---|---|
| 1 | Low | The incident has a low impact limited to a single unit (e.g., Electric Meter). Measures should be taken to address the threat related to the single unit. |
| 2 | Medium | The incident has a medium impact limited to units in a single grouping of AMI logical architecture components (e.g., a meter cell). Measures should be taken to identify the root cause of the threat and remediate the threat across all affected systems. |
| 3 | High | The incident has a high impact and the effects span the entire AMI logical architecture (e.g., meter, routers, head end). Measures should be taken to immediately contain the threat, and a cross-functional team of AMI operators, IT system operators, AMI vendor, and business representatives should be formed to remediate the threat. |
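The example ratings in Table 15 can be expressed as a small classification function. The following Python sketch is illustrative only: the function name and its three parameters are hypothetical inputs chosen for this example, and the treatment of incidents that touch more than one grouping (rated High here) is a conservative assumption that Table 15 does not state.

```python
from enum import IntEnum


class ScopeOfImpact(IntEnum):
    """Example ratings taken from Table 15."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def scope_of_impact(affected_units: int, affected_groups: int,
                    spans_architecture: bool) -> ScopeOfImpact:
    """Rate an incident by how far its effects reach.

    All parameters are hypothetical inputs for this sketch:
    affected_units     -- number of individual devices (e.g., meters) affected
    affected_groups    -- number of component groupings (e.g., meter cells) affected
    spans_architecture -- True if effects span meters, routers, and the head end
    """
    if affected_units < 1 or affected_groups < 1:
        raise ValueError("an incident must affect at least one unit and grouping")
    # Assumption: more than one grouping is treated the same as an
    # architecture-wide incident, since Table 15 leaves this case open.
    if spans_architecture or affected_groups > 1:
        return ScopeOfImpact.HIGH
    # Several units inside a single grouping (e.g., one meter cell).
    if affected_units > 1:
        return ScopeOfImpact.MEDIUM
    # Exactly one unit (e.g., a single electric meter).
    return ScopeOfImpact.LOW
```

An organization adopting this approach would replace the simple counts with thresholds suited to the function and number of devices in its own deployment.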
The scope of impact ratings listed here are provided as an example. Scope of impact for vulnerabilities should be determined on a case-by-case basis depending on the function and number of devices impacted.

3.3.3.3 Requirements Enhancement
None

3.3.4 AMI Incident Impact Classification [4 - 2.3]
The AMI incident response program should classify AMI incidents based upon the impact of the AMI incident to AMI entity objectives.

3.3.4.1 Requirement
Incidents are assigned an impact classification according to the AMI entity objective impacted by an incident.

3.3.4.2 Supplemental Guidance
Table 16 provides an example mapping of AMI incidents to impact classifications based upon the incident's scope of impact. It is recommended that readers formulate their own classification of AMI incidents and tailor it based on the unique aspects of their AMI deployment (e.g., usage of C12.18 keys) and their organization's AMI objectives.

Table 16 – AMI Incident Impact Classification
| ID | AMI Incident Classification | Description | Impact Classification |
|---|---|---|---|
| 1 | Theft of Energy | An attacker attempts to steal energy by modifying the meter and/or the metering data. | Low – The attack is associated with a single occurrence of energy theft and has negligible impact. Medium – The attack is associated with multiple occurrences of energy theft and has a direct monetary impact. High – The attack is associated with a large scale theft of energy and can have both monetary and functional impacts. |
| 2 | Theft of C12.18 Logon Credentials | An attacker has stolen the C12.18 login credentials from a single meter. | Low – The C12.18 credentials are different for each meter and have a minimal impact. Medium – The C12.18 credentials are used by multiple meters, but the impact is limited due to the physical requirement for exploitation. High – The C12.18 credentials are in use by multiple meters and also have a high scope of impact due to the ability to perform the attack wirelessly or remotely. |
| 3 | Bypass of Security Controls | An attacker has bypassed security controls on the meter. | Low – The bypass of security controls only affects a single device at a time and has a limited impact. Medium – The bypass of security controls affects multiple devices but has limited impact in terms of control. High – The bypass of security controls affects multiple devices and allows full bypass of security controls on the affected devices. |
| 4 | Failed Communications Integrity | An integrity check failure has occurred on the meter. | Low – The integrity check failure occurs in a limited number of instances and affects a limited number of meters. Medium – The integrity check failure occurs on a large scale and may affect the quality of data being received from the meters. High – The integrity check failure occurs across all data being received from the meters. Thresholds on number of meters affected should be defined by each organization to rank the impact of the incident. |
| 5 | Power Outage | An AMI incident results in a power outage. | Low – A limited number of customers are affected (e.g., single household or business). Medium – Multiple customers are affected, but the effects are limited and isolated to a specific geographical area (e.g., neighborhood). High – A large scale power outage occurs affecting a significant portion of the AMI deployment (e.g., city or state). High impact outages may also include life and safety critical services (e.g., hospitals and emergency first responders). |
| 6 | Software Error | An AMI incident results in a software error occurring on the meter. | Low – The software error occurs in a limited number of devices and has a low impact (e.g., causes a reset or reboot). Medium – The software error occurs in multiple meters and has a high impact (e.g., toggling of disconnect switch, denial of service). High – The software error occurs across a significant portion of the AMI deployment (e.g., city or state) and has a high impact (e.g., toggling of disconnect switch). |

3.3.4.3 Requirements Enhancement
None

3.3.5 AMI Incident Impact Analysis [4 - 2.3]
Define the methods for assessing the value of the AMI function to an AMI entity in terms that can also be used to quantify incident impact if part or all of the AMI logical architecture is rendered ineffective.

3.3.5.1 Requirement
The AMI incident impact metrics assigned to the AMI incidents quantify incident response risk exposure.

3.3.5.2 Supplemental Guidance
The AMI entity should identify the AMI functions and use cases to be included in the incident impact analysis.

The following methodology may be adopted in assigning risk rankings to AMI assets.

[Figure 10 – AMI Incident Impact Analysis Flowchart. Box labels: Define and Categorize AMI Incidents; Assign Likelihood of Occurrence; Assign Scope of Impacts; Risk Rating]

First, an impact ranking is assigned to AMI assets using a methodology such as that defined in the Table.

Next, an exercise is performed in which AMI incidents are defined and categorized. AMI incidents are assigned a scope of impact ranking based on the number of AMI assets they affect.

Finally, a likelihood of occurrence rating is assigned to the AMI incident based on historical evidence of similar types of incidents. An example of likelihood of occurrence ratings to be used is provided below in Table 17. The likelihood of occurrence table provided is a simplified version based on the AMI-SEC likelihood interpretation policy [4 – 2.6.1].
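The combination of a scope of impact rating and a likelihood of occurrence rating into a risk rating can be sketched as a table lookup. The Python example below uses the example values from Table 18 and assumes that the rows of that table are likelihood of occurrence and the columns are scope of impact. The function name and the sample incident list are hypothetical and exist only for illustration.

```python
RATINGS = ("Low", "Medium", "High")

# Example matrix values from Table 18.
# Assumption: outer key = likelihood of occurrence, inner key = scope of impact.
RISK_MATRIX = {
    "Low":    {"Low": "Low", "Medium": "Medium", "High": "Medium"},
    "Medium": {"Low": "Low", "Medium": "Medium", "High": "High"},
    "High":   {"Low": "Low", "Medium": "High",   "High": "High"},
}


def risk_rating(likelihood: str, scope: str) -> str:
    """Look up the risk rating for one incident category."""
    if likelihood not in RATINGS or scope not in RATINGS:
        raise ValueError(f"ratings must be one of {RATINGS}")
    return RISK_MATRIX[likelihood][scope]


# Hypothetical incident categories: (name, likelihood, scope of impact).
incidents = [
    ("Theft of Energy", "High", "Low"),
    ("Bypass of Security Controls", "Medium", "High"),
    ("Software Error", "Low", "Medium"),
]

# Order incident categories from highest to lowest risk for planning purposes.
prioritized = sorted(
    incidents,
    key=lambda item: RATINGS.index(risk_rating(item[1], item[2])),
    reverse=True,
)
```

Keeping the matrix as data rather than as branching logic makes it straightforward for an organization to substitute its own ratings.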
Table 17 – Likelihood of Occurrence
| Rating | Likelihood of Occurrence | Description |
|---|---|---|
| 1 | Low | Not expected to occur, or may occur in exceptional circumstances. |
| 2 | Medium | Could occur at some time. |
| 3 | High | Will probably occur in most circumstances. |

Once the AMI incidents have been defined, categorized, and assigned an impact and likelihood of occurrence rating, a risk rating can be determined by using a matrix such as that shown in Table 18.

Table 18 – Risk Rating
| Likelihood of Occurrence | Scope of Impact: 1 Low | Scope of Impact: 2 Medium | Scope of Impact: 3 High |
|---|---|---|---|
| 1 Low | Low | Medium | Medium |
| 2 Medium | Low | Medium | High |
| 3 High | Low | High | High |

The risk ratings help to guide planning, budgeting, and resource allocation processes to address the issues of highest concern and impact.

Additional metrics, such as the type of AMI incident, may be included in the risk rating in the event of an incident response.

The following type classification has been provided as an example:

Table 19 – AMI Incident Type
| Rating | Type | Description |
|---|---|---|
| 1 | Malicious | The incident has been determined to have been caused by a malicious actor. Additional measures may be taken to contain the threat until the threat has been remediated. |
| 2 | Non-Malicious | The incident has been determined to have been caused by a non-malicious actor such as an operational error. Depending on the severity of the error, measures may be taken to contain the incident impact. |
| 3 | Natural | The incident has been determined to have been caused by a natural event such as an earthquake or storm. Normal operational procedures should be utilized to contain and remediate the incident. |

Such ratings are dynamic and may not be determinable before an incident occurs; however, they may be used to adjust the incident response in the course of investigating an AMI incident.

3.3.5.3 Requirements Enhancement
None

3.4 Incident Response Training
This section provides requirements and guidance on how to develop and assess incident response training.

3.4.1 Incident Process Training [1-2.7.5; 3-2.12.1; 9 - SG.AT-6]
The AMI incident response program should identify the types of training and materials required to prepare the AMI entity, its personnel, and those affected by AMI incidents.

3.4.1.1 Requirement
The program includes training on the creation and implementation of the AMI incident response plans for employees, contractors, and stakeholders. The program provides refresher training annually. The training covers employees, contractors, and stakeholders in the implementation of the continuity of operations plan.

3.4.1.2 Supplemental Guidance
Training is to be provided to individuals in the AMI community so that all understand the content, purpose, and implementation of the incident response plans. Different levels of incident response training may be prepared for personnel with different levels of AMI responsibility. Refer to 3.2.4 for the different AMI incident response roles and responsibilities and recommended training.

3.4.1.3 Requirements Enhancement
None

3.4.2 Process Verification Testing [1-2.7.6]
Regular testing helps to determine the effectiveness of the AMI incident response plans.

3.4.2.1 Requirement
The program regularly tests AMI incident response plans to validate the efficacy of the program.

3.4.2.2 Supplemental Guidance
Following the preparation of the AMI incident response plans, a schedule is developed to review and test each of the plans to ensure that it continues to meet its AMI objectives.

3.4.2.3 Requirements Enhancement
None

4 AMI INCIDENT RESPONSE BEST PRACTICES GUIDELINES

This section focuses on the best practices for responding to AMI incidents.
This section introduces the AMI incident management process and provides specific best practice guidance regarding incident detection, remediation and review.

4.1 Incident Management Process [9 – SG.IR-5]
A phased approach can greatly streamline the incident notification and mobilization process. An example of one such approach is provided. The organization may tailor the approach to best fit its needs.

1) Incident Identification
   a. Incident is identified through an automated process such as an SIEM or reported manually through another source
   b. Evidence such as alarms, events and situational awareness data is collected and preserved
   c. Potential incident scenarios are classified and identified based on available evidence and the incident response teams are notified
2) Remediation
   a. If a remediation does not exist, the incident response team will develop a remediation strategy which may involve a root cause analysis of the incident
   b. The applicable remediation strategy is applied based on the identified scenario
3) Review
   a. If necessary, regression testing is performed in order to verify that the vulnerability or exploit has been remediated.
   b. A review of the incident is performed in order to determine if the incident response strategy was effective and whether additional training for incident responders is necessary or if the incident detection rule sets need to be fine-tuned.

Figure 11 provides a detailed view of the AMI incident management process.

[Figure 11 – AMI Incident Management Process. Flowchart node labels: AMI Incident Reported Out of Band; SIEM Receives Event; SIEM Applies AMI Incident Detection Rule Sets; Incident Scenario Determined?; Gather Situational Awareness Data; Remediation Strategy Exists?; Develop Remediation Strategy; Apply Remediation Strategy; Incident Resolved?; Perform Incident Review]

The following sections provide guidance and best practices for the incident management process.

4.1.1 Incident Identification
Incident identification is the first step in the incident management process. Identification of AMI incidents involves the collection of forensic evidence (e.g., alarms, events and situational awareness data) in order to determine the incident scenario. Identifying the AMI incident and the incident scenario allows the AMI entity to determine the appropriate next steps, such as developing a remediation strategy or accepting the risk.

4.1.1.1 Security Information and Event Management Systems [9 – SG.IR-6]
Security Information and Event Management (SIEM) systems allow an organization to organize and correlate live situational awareness data from multiple systems, including non-AMI systems, to detect security events or incidents.

4.1.1.1.1 Guidance
Organizations should develop AMI specific rule sets for their SIEM systems. SIEM rule sets may include situational awareness data to assist in the detection of AMI incidents and incident scenarios.

The SIEM should be configured to collect alarms and events from all components of the AMI logical architecture.

Once the rule sets have been developed and the SIEMs have been configured, the organization should fine tune the rule sets so that they do not result in an unacceptable number of false positive AMI incident detections.

4.1.1.2 Data Mining
Data mining allows an organization to analyze archives of past security logs to identify intrusions.

4.1.1.2.1 Guidance
Organizations should archive, for an organizationally defined period of time, AMI-related SIEM alarms, events and situational awareness data.
Data mining solutions may then be used in forensic investigations to detect signs of current or past intrusions and to fine-tune AMI SIEM rule sets so that they detect intrusions with fewer false positive AMI incident detections.

4.1.1.3 Log Message Integrity
Integrity checks on log messages can be used to determine if messages have been tampered with or modified at any point from acquisition through transmission, storage, and analysis. Additionally, integrity checks should be used to allow for preservation of forensic evidence for legal purposes.

4.1.1.3.1 Guidance
Organizations should utilize integrity checks such as cryptographic signatures to detect tampering with or modification of log messages during transmission. Modification to data in transit may indicate a cyber-security intrusion. Organizations should also employ integrity checks to preserve SIEM data for forensic and legal purposes.

4.1.1.4 Incident Scenarios [9 – SG.IR-8]
This section provides guidance on how to identify specific AMI incidents. The fields in Table 20 are used to describe each of the incident scenarios listed in the sections below. After the incident scenario has been determined, the incident classification field in Table 25 is used to determine the appropriate remediation strategy to employ.

Table 20 – Incident Scenario Description
| Field | Description |
|---|---|
| ID | Identifier for the incident scenario |
| Incident Scenario | Description of the incident scenario |
| Alarms and Events | Alarms and events associated with the incident scenario |
| Situation Awareness Data | Evidence to support that the incident has occurred |
| Incident Classification | Classification of the incident to allow the incident response team to determine an appropriate remediation strategy to employ |

4.1.1.4.1 Physical
The following incident scenarios occur as a result of a physical action being taken upon the meter.
Table 21 – Physical Incident Scenarios
| ID | Incident Scenario | Alarms and Events | Situational Awareness Data | Incident Classification |
|---|---|---|---|---|
| PHY.1 | Attacker installs bypass device to steal electricity | Removal Tamper | A physical bypass device has been installed. | Theft of Energy |
| PHY.2 | Attacker tampers with meter to steal electricity | Removal Tamper | Tamper seals on the meter have been broken. Signs of hardware modification present (e.g., wires or headers soldered onto board) | Theft of Energy |
| PHY.3 | Attacker vandalizes meter to steal electricity | Removal Tamper; Communications Lost | Signs of physical damage to the meter. | Theft of Energy |
| PHY.4 | A natural event (e.g., earthquake, fire, or flood) causes the meter to lose power. | Device Outage; Communications Lost | Weather or news reports may indicate that a natural event has occurred. | Power Outage |
| PHY.5 | Attacker removes meter and inserts it backwards into the meter housing to steal electricity | Removal Tamper; Reverse Rotation | Physical inspection reveals meter is inserted backwards. | Theft of Energy |

4.1.1.4.2 Authentication
The following incident scenarios occur as a result of an attacker attempting to bypass authentication controls on the meter.
Table 22 – Authentication Incident Scenarios
| ID | Incident Scenario | Alarms and Events | Situational Awareness Data | Incident Classification |
|---|---|---|---|---|
| AUTH.1 | Attacker attempts to guess the login credentials to the C12.18 optical port interface | C12.18 Failed Login | Event occurs in the absence of any known maintenance activity | Theft of C12.18 Logon Credentials |
| AUTH.2 | Attacker attempts to bypass the meter's security controls by injecting C12.22 messages with invalid credentials | C12.22 Failed Authentication | Inspection of packet captures to detect maliciously modified packets | Bypass of Meter Security Controls |

The Alarms and Events column of Table 22 also lists "C12.18 Successful Login"; its row assignment is not recoverable from this copy.

4.1.1.4.3 Communications
The following incident scenarios occur as a result of a disruption of communications between the meter and the head end.

Table 23 – Communications Incident Scenarios
| ID | Incident Scenario | Alarms and Events | Situational Awareness Data | Incident Classification |
|---|---|---|---|---|
| COM.1 | Poor communications between the AMI meter and the head end causes integrity check failures | Failed Communications Integrity Check | Failed communications integrity checks occur sporadically without any discernible pattern (e.g., types of messages failing integrity checks). | Poor Communications Link |
| COM.2 | Attacker injects fuzzed packets on the communications link in order to bypass security controls | Failed Communications Integrity Check | Events occur with a discernible pattern (e.g., types of messages failing, period in which events occur) | Bypass of Security Controls |
| COM.3 | Attacker injects malformed packets in an attempt to spoof the meter's energy usage | Failed Integrity Check | Events occur with a discernible pattern related to billing or metering functions. | Theft of Electricity |
| COM.4 | Attacker employs an active RF jamming attack to disrupt meter communications | Communications Lost | A number of meters in the same physical locality report a loss of communications | Denial of Service |
| COM.5 | Attacker floods the communications channel with traffic in order to disrupt meter communications | Communications Lost | Meter traffic flow monitoring tools report an abnormally large volume of traffic | Denial of Service |
| COM.6 | Attacker exploits a meter vulnerability to cause a permanent denial of service | Communications Lost | Event occurs in the absence of any additional situational awareness data (e.g., natural event, tamper events) | Denial of Service |

The Alarms and Events column of Table 23 also lists "Malformed PDU" (twice) and a further "Failed Communications Integrity Check"; their row assignments are not recoverable from this copy.

4.1.1.4.4 Software
The following incident scenarios occur as a result of a software error on the meter.

Table 24 – Software Incident Scenarios
| ID | Incident Scenario | Alarms and Events | Situational Awareness Data | Incident Classification |
|---|---|---|---|---|
| LOG.1 | Attacker injects malicious code in order to bypass security controls on the meter | Software Error | Additional signs of suspicious behavior (e.g., unauthorized remote disconnects) | Bypass of Security Controls |
| LOG.2 | Meter software flaw causes software errors to occur | Software Error | Incident occurs under specific circumstances in similarly configured meters | Meter Malfunction |

The Alarms and Events column of Table 24 also lists "Malformed PDU"; its row assignment is not recoverable from this copy.

4.1.2 Incident Remediation
The next step in the incident management process is remediation. Remediation of AMI incidents involves identifying the appropriate remediation strategy to employ. In most circumstances, remediation of AMI incidents should be performed after the AMI incident scenario and incident classification have been identified.

The following best practices have been identified for the remediation of AMI incidents.

4.1.2.1 Event Log Retention [9 – SG.IR-10]
Event logs allow for the forensic analysis of AMI incidents.
4.1.2.1.1 Guidance
Event logs (e.g., alarms, events, and situational awareness data) should be periodically archived, as defined by organizational policy, for the following purposes:

• Data Mining
  o Historical event log data can be used in conjunction with data mining methods to improve SIEM algorithms and detect specific AMI incidents.
• Evidence in Court
  o Historical event log data may be used to prosecute criminals in a court of law.

Due to the distributed nature of AMI electric meters, it is recommended that event logs be periodically retrieved from the meter and stored in a centralized or distributed system. The amount of time to store AMI event log data may be determined by corporate policies.

4.1.2.2 Developing a Remediation Strategy [9 – SG.IR-9]
The AMI entity should create incident classifications and develop remediation strategies to apply in the event that the identified incident has occurred. Table 25 provides a list of incident classifications and remediation strategies. The specific remediation strategy to employ will depend on the impact ranking of the AMI incident. An example of how to develop impact rankings is provided in Section 3.3.4.

Table 25 – Incident Remediation Strategies
| ID | AMI Incident Classification | Remediation Strategy |
|---|---|---|
| REM.1 | Theft of Energy | The organization should determine a quantitative threshold of events that should occur before an AMI incident response team is deployed for investigation. If theft of energy has been determined, the incident is reported to the business department of the AMI entity in order to recover lost revenue. If the incident has been determined to be a part of a large scale incident, the AMI entity may deploy its incident response team for further investigation. The organization should determine a threshold (e.g., monetary) at which local, state, or federal law enforcement may be notified to assist in the investigation. |
| REM.2 | Theft of C12.18 Logon Credentials | The organization should determine a quantitative threshold of failed C12.18 login attempts before an AMI incident response team is deployed for investigation. The organization should focus on events that occur outside of maintenance windows to reduce false positive detections. If the C12.18 credentials have been determined to be compromised, the incident response team should assess the impact of the stolen credentials on the AMI logical architecture. Based on the impact assessment, the incident response team may accept the risk or deploy a remediation strategy such as replacing the C12.18 login credentials on affected AMI meters. |
| REM.3 | Bypass of Security Controls | The organization should determine a quantitative threshold of failed C12.22 authentication events before an AMI incident response team is deployed for investigation. If security controls have been determined to have been bypassed, the incident response team should assess the impact of the bypass of security controls. The incident response team should then attempt to investigate the root cause of the event and may decide to deploy a remediation such as issuing a software patch. |
| REM.4 | Failed Communications Integrity | The incident is reported to the AMI operations department of the AMI entity in order to correct the poor communications link. |
| REM.5 | Power Outage | The incident is reported to the AMI operations department of the AMI entity and organizational power restoration procedures are followed. |
| REM.6 | Software Error | The incident response team should work with the vendor to assess the impact of the software error on the AMI meter. The incident response team should then attempt to investigate the root cause of the event and may decide to deploy a remediation such as issuing a software patch. If the incident has been determined to be a malicious attack, the incident response team should work with the organization to determine the appropriate course of action. |
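The threshold approach described for REM.2 (counting failed C12.18 login attempts and disregarding those that fall inside maintenance windows) can be illustrated with a short Python sketch. The class and function names, the representation of events as timestamps, and the threshold value in the usage note are all hypothetical; each organization would define its own threshold and event format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class MaintenanceWindow:
    """A scheduled period during which failed logins are expected (hypothetical type)."""
    start: datetime
    end: datetime

    def covers(self, timestamp: datetime) -> bool:
        return self.start <= timestamp <= self.end


def count_outside_windows(failed_logins: Iterable[datetime],
                          windows: List[MaintenanceWindow]) -> int:
    """Count failed C12.18 login events that are not explained by maintenance."""
    return sum(
        1 for ts in failed_logins
        if not any(window.covers(ts) for window in windows)
    )


def should_dispatch(failed_logins: Iterable[datetime],
                    windows: List[MaintenanceWindow],
                    threshold: int) -> bool:
    """Return True when unexplained failed logins reach the organization's threshold."""
    return count_outside_windows(failed_logins, windows) >= threshold
```

For example, with a maintenance window covering a working day and a threshold of three, a single failed login during the window would be ignored, while three failed logins overnight would trigger deployment of the incident response team.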
In certain circumstances, an organization may encounter a new or unidentified vulnerability for which a remediation currently does not exist. In this case, the AMI entity should deploy an incident response team to investigate the incident, develop an incident response strategy, and update the AMI incident response plan as necessary.

4.1.3 Incident Review
Following the occurrence of an AMI incident, organizations should review their incident handling procedures for effectiveness. The following sections provide guidance on the activities to be performed following an AMI incident.

4.1.3.1 Forensic Analysis
A forensic analysis is the process of collecting digital evidence, such as AMI event logs, for the purposes of a legal investigation.

4.1.3.1.1 Guidance
In some circumstances the root cause of an AMI incident cannot be determined before the threat has been remediated. A forensic analysis of AMI event logs should be performed in order to determine the root cause of the incident.

Depending on the nature of the incident and the AMI architecture, the organization should collect and archive logs from all components of the AMI logical architecture.

4.1.3.2 Incident Reporting [9 – SG.IR-11]
Depending on the nature of the event (e.g., widespread energy theft), the organization should report the incident to the appropriate authority if there are implications of activities violating local, state, or federal laws.

4.1.3.2.1 Guidance
The AMI entity should follow existing guidance for reporting incidents.

4.1.3.3 Incident Impact Analysis [9 – SG.IR-9]
Performing an incident impact analysis allows the organization to determine the actual impact of the incident in order to update its AMI incident response plans.

4.1.3.3.1 Guidance
The AMI entity should follow existing guidance for performing an incident impact analysis.
5 CONCLUSION

The preceding sections highlight the key elements of AMI Incident Response in an AMI technology agnostic fashion. The information contained herein is the accumulation and combination of industrial input and experience. As such, it represents the current best practices for AMI incident response as AMI systems currently exist.

This document is a snapshot in time and must evolve as the AMI systems evolve. Future AMI operational experience and greater adoption of AMI security standards should be added to this document in the form of more elaborate and specific incident response guidelines.

Export Control Restrictions

Access to and use of EPRI Intellectual Property is granted with the specific understanding and requirement that responsibility for ensuring full compliance with all applicable U.S. and foreign export laws and regulations is being undertaken by you and your company. This includes an obligation to ensure that any individual receiving access hereunder who is not a U.S. citizen or permanent U.S. resident is permitted access under applicable U.S. and foreign export laws and regulations. In the event you are uncertain whether you or your company may lawfully obtain access to this EPRI Intellectual Property, you acknowledge that it is your obligation to consult with your company's legal counsel to determine whether this access is lawful. Although EPRI may make available on a case-by-case basis an informal assessment of the applicable U.S. export classification for specific EPRI Intellectual Property, you and your company acknowledge that this assessment is solely for informational purposes and not for reliance purposes. You and your company acknowledge that it is still the obligation of you and your company to make your own assessment of the applicable U.S. export classification and ensure compliance accordingly. You and your company understand and acknowledge your obligations to make a prompt report to EPRI and the appropriate authorities regarding any access to or use of EPRI Intellectual Property hereunder that may be in violation of applicable U.S. or foreign export laws or regulations.

The Electric Power Research Institute Inc. (EPRI, www.epri.com) conducts research and development relating to the generation, delivery and use of electricity for the benefit of the public. An independent, nonprofit organization, EPRI brings together its scientists and engineers as well as experts from academia and industry to help address challenges in electricity, including reliability, efficiency, health, safety and the environment. EPRI also provides technology, policy and economic analyses to drive long-range research and development planning, and supports research in emerging technologies. EPRI's members represent more than 90 percent of the electricity generated and delivered in the United States, and international participation extends to 40 countries. EPRI's principal offices and laboratories are located in Palo Alto, Calif.; Charlotte, N.C.; Knoxville, Tenn.; and Lenox, Mass.

Together…Shaping the Future of Electricity

© 2010 Electric Power Research Institute (EPRI), Inc. All rights reserved. Electric Power Research Institute, EPRI, and TOGETHERSHAPING THE FUTURE OF ELECTRICITY are registered service marks of the Electric Power Research Institute, Inc.