Building Reliable, Secure and Manageable Substation Communications
Dragan Dokic | CCIE, CISSP, MCSE

Introduction – Experience
• Dragan Dokic | President, Summit Energy Tech
• Focus on utility sector
  – Infrastructure systems management
  – Custom business systems software development
• 16 years of experience in IT industry
• 10 years in utility sector
  – Managed network operations for PNGC Power [Portland, OR] from September 2002 to October 2011
  – Presentation focuses on lessons learned in field network reliability, security and manageability from this experience

Introduction
• PNGC’s 2001 – 2011 field network
  – 92 office, substation and repeater sites at 11 distribution utilities in Oregon and Idaho
• System mission
  – Gather real-time load data 24/7 for power scheduling operations in Portland
  – Support local utility SCADA/AMI/site security operations

PNGC Power WAN – July 2011
[Map labels: Toledo, OR; Boardman, OR; Junction City, OR; Lewiston, ID; Malta, ID; The Moon]

Areas of Focus
• Reliability
• Security
• Manageability
Presentation available for download at in the Events section

Reliability – Network Design
• Keys to success
  – Diversity in media
    • Combine land lines, fixed wireless [private/public], mobile wireless and satellite
  – Diversity in providers
    • Local and national
  – Dynamic routing [OSPF]
    • Routers exchange knowledge of the local network with neighboring routers
    • Enterprise-grade routers/switches are a requirement
• Perfect-world configuration
  – Private wired/wireless ‘island’ with two Internet gateways using distinct media and distinct providers

Connectivity Overview
[Diagram: primary router and backup router]

Link Cost Overview
[Diagram: primary and backup paths]

Link Cost Calculation
• Sub A -> Main Office via satellite tunnel: 3 + 1 = 4
• Sub A -> Main Office via 900 MHz + DSL tunnel: 1 + 1 + 1 = 3

Open Shortest Path
• The link cost via the satellite tunnel [4] is higher than via the DSL tunnel [3]; therefore, packets traverse the 900 MHz/DSL tunnel in normal operation

Normal Operation – Open Shortest Path
[Diagram: path from Substation A to Main Office]
[Diagram: path from Substation B to Main Office]

Link Down Operation
• If the DSL tunnel is down, packets traverse the satellite tunnel
[Diagram: Sub A -> Main Office, failed DSL tunnel marked X]
[Diagram: Sub B -> Main Office, failed DSL tunnel marked X]

Questions?

Security – Overview
• Wireless link encryption
• Function-specific VLANs
• No default routes!

Wireless Link Encryption
• Media device level [e.g., radio, modem]
  – WEP, WPA, WPA2
• Routing device level [e.g., Cisco 891 router]
  – IPsec
• End device level [e.g., Digi TS4 port server]
  – SSL
• At what level to secure data?

Security – Wireless Link Encryption [continued]
• Most secure option?
  – Use all three if management overhead is not an issue
• Most efficient option that is still secure enough?
  – Use the routing device’s site-to-site VPN capabilities
  – Advantages:
    • Support for the best commercially available security technologies [e.g., AES-256]
    • Comprehensive change-logging capabilities
    • Standardized configuration throughout the system [less management overhead]

Security – Function-Specific VLANs
• Define VLANs per business function
  – SCADA, AMI, security system, wireless, VoIP, network management
• Firewall traffic between VLANs on a need-to-access basis
  – E.g., allow personnel attached to the substation wireless VLAN to access documentation stored on a server at the main office, but prevent them from accessing recloser controls in the SCADA VLAN
• Reliability advantages
  – Non-critical VLANs [e.g., AMI, security] can be shut down automatically/remotely if link quality is too poor to carry all traffic but still good enough to carry SCADA
[Diagram: one VLAN per business function]
[Diagram: high-speed link outage scenario]

Security – No Default Route!
• Do not use default routes through service-provider-supplied gateways
• Define a single host route back to the main office, then establish the default route through the VPN tunnel
• This is the most effective method to prevent attacks sourced from the Internet
• Always use it in conjunction with a regular firewall configuration [not a substitute!]
[Diagram: less secure vs. more secure use of the provider gateway]

Questions?

Manageability – Overview
• Tools – network management systems
• Addressing – developing a scheme
• Watchdog system – preventing lockout

Manageability – Tools
• Network Management Systems [NMS]
• Protocols used
  – SNMP, Syslog, ICMP, HTTP
• Applications
  – PRTG
  – SolarWinds Kiwi Syslog

Manageability – Tools [continued]
• How to collect data? Push vs. pull
  – Pull: poll devices using SNMP/HTTP/ICMP at regular intervals [e.g., every …]
  – Push: devices send data per defined event triggers
    • SNMP traps
    • Syslog messages
• What data to collect?
  – Availability [ping]
  – Network utilization
  – Input voltages
  – RSSI [radio link quality]

Manageability – Tools [continued]
• Pull example:
  – 5-minute SNMP poll of the UPS for input voltage
  – If voltage stays below the 108 VAC threshold for longer than 5 minutes, the NMS triggers an alert [e-mail, text message, event log]
  – But what if voltage drops for only 2 minutes in between polls? You may never know it happened.
• Push comes to the rescue:
  – The UPS sends an SNMP trap to the NMS as soon as voltage drops below 108 VAC
  – The NMS triggers an alert when the trap is received
[Screenshot: Paessler PRTG]
[Screenshot: SolarWinds Kiwi Syslog]

Manageability – Addressing
• Develop a consistent scheme to use system-wide
• Recommended private range: 10.0.0.0/8
  – First octet: same for entire system
  – Second octet: site ID [e.g., 8 = Springfield Sub]
  – Third octet: business function ID [e.g., 4 = AMI]
  – Fourth octet: device itself [e.g., Collector #1]
• Subnet mask: 255.255.255.0
[Diagram: 1st octet ‘fixed’, 2nd octet = site ID, 3rd octet = VLAN/business function, 4th octet = device]

Manageability – Addressing [continued]
• Large network?
  – Group sites by region using the second octet
  – Allows for address summarization if needed
• Example:
  – Eastern division region:
    • 10.64.0.0 – 10.127.255.255 [second octet 64–127]
    • Summary address: 10.64.0.0/10
  – Western division region:
    • 10.128.0.0 – 10.191.255.255 [second octet 128–191]
    • Summary address: 10.128.0.0/10

Manageability – Watchdog System
• General concept
  – Reboot key remote communications devices if connectivity to the central site is interrupted
• Benefit
  – Prevent unnecessary site visits due to
    • Operator error
    • Device lock-up [e.g., buggy firmware, heat issues]

Manageability – Watchdog System [continued]
• Hardware requirements:
  – SNMP-capable switched PDU with task scheduling and delayed power-cycling command capabilities
  – Example: APC AP7900 8-port 15A PDU
• Software capability requirements:
  – Centralized command override mechanism using the NMS
  – Send an SNMP ‘Set’ to cancel the pending power-cycling command

Manageability – Watchdog System Example
• ‘Delayed’ power-cycle schedule is defined on the PDU:
  – Outlets to power cycle: 1, 2 [e.g., radio, router]
  – Frequency: 60 minutes
  – Command execute delay: 30 minutes
• The network management system at the main office sends the PDU an SNMP message cancelling the delayed power-cycle command
  – Frequency: every 5 minutes
• Process
  – If the cancel command cannot reach the PDU at least once during the 30-minute reboot delay period, outlets 1 and 2 are power cycled and communication will (hopefully!) be restored

Questions?
Thank you!
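The link-cost slides in the Reliability section can be checked with a shortest-path computation. OSPF picks routes by running Dijkstra's SPF algorithm over link costs; the sketch below runs a plain Dijkstra over a hypothetical topology. The node names (SatHub, DSLGW) and the per-hop split (satellite 3, 900 MHz radio 1, DSL 1, each tunnel 1) are assumptions chosen so the totals match the slide values of 4 and 3. Links are assumed symmetric, and real OSPF details (areas, LSAs, equal-cost multipath) are left out.

```python
import heapq


def build_graph(links):
    """Turn (a, b, cost) tuples into an adjacency map; links are assumed symmetric."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm: return (total_cost, path), or (None, []) if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, link_cost in graph.get(node, {}).items():
            candidate = cost + link_cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in visited:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical topology; per-hop costs are assumed so the totals match the slides.
LINKS = [
    ("SubA", "SatHub", 3),        # satellite link
    ("SatHub", "MainOffice", 1),  # satellite tunnel
    ("SubA", "SubB", 1),          # 900 MHz radio
    ("SubB", "DSLGW", 1),         # DSL line
    ("DSLGW", "MainOffice", 1),   # DSL tunnel
]

normal = build_graph(LINKS)
print(shortest_path(normal, "SubA", "MainOffice"))
# (3, ['SubA', 'SubB', 'DSLGW', 'MainOffice'])

# DSL tunnel down: drop that link and recompute, as OSPF would after reconvergence.
dsl_down = build_graph([link for link in LINKS if link[:2] != ("DSLGW", "MainOffice")])
print(shortest_path(dsl_down, "SubA", "MainOffice"))
# (4, ['SubA', 'SatHub', 'MainOffice'])
print(shortest_path(dsl_down, "SubB", "MainOffice"))
# (5, ['SubB', 'SubA', 'SatHub', 'MainOffice'])
```

In this model Sub B reaches the satellite tunnel through Sub A's 900 MHz link. That mirrors the "Link Down Operation" diagrams only under the assumed topology.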
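The "Security – No Default Route!" slide depends on longest-prefix matching. A /32 host route to the main office VPN peer is more specific than 0.0.0.0/0, so only traffic to that peer leaves through the provider gateway, and every other destination follows the default route into the tunnel. The sketch below models just that lookup with Python's ipaddress module. The peer address (from the RFC 5737 documentation range) and the next-hop labels are hypothetical. A real router would also carry the OSPF-learned 10.x routes.

```python
import ipaddress

# Hypothetical main office VPN endpoint (documentation address range).
MAIN_OFFICE_VPN_PEER = "203.0.113.10"

# "More secure": one host route to the VPN peer via the provider gateway,
# and the default route pointed into the VPN tunnel.
MORE_SECURE = [
    (ipaddress.ip_network(MAIN_OFFICE_VPN_PEER + "/32"), "provider-gateway"),
    (ipaddress.ip_network("0.0.0.0/0"), "vpn-tunnel"),
]

# "Less secure": default route straight out the provider gateway.
LESS_SECURE = [
    (ipaddress.ip_network("0.0.0.0/0"), "provider-gateway"),
]


def next_hop(routes, destination):
    """Longest-prefix match: the most specific route containing the destination wins."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


for label, table in (("less secure", LESS_SECURE), ("more secure", MORE_SECURE)):
    for dest in (MAIN_OFFICE_VPN_PEER, "198.51.100.7"):
        print(f"{label}: {dest} -> {next_hop(table, dest)}")
```

With the "more secure" table, an arbitrary Internet host such as 198.51.100.7 is only reachable through the tunnel, so traffic to it passes the main office firewall rather than leaving directly through the provider gateway.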
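The pull-versus-push UPS example on the Manageability slides can be simulated in a few lines. The sketch assumes one voltage sample per minute and a hypothetical 15-minute trace with a 2-minute sag. It uses the slide's 108 VAC threshold and 5-minute poll interval. It only shows which events each method can observe, and it does not model the "longer than 5 minutes" alert-duration rule.

```python
THRESHOLD_VAC = 108.0  # alert threshold from the slide


def polled_low_readings(voltage_by_minute, interval_min=5):
    """Pull: the NMS only sees the samples it polls, every interval_min minutes."""
    return [
        t
        for t in range(0, len(voltage_by_minute), interval_min)
        if voltage_by_minute[t] < THRESHOLD_VAC
    ]


def trap_times(voltage_by_minute):
    """Push: the UPS sends a trap the moment voltage crosses below the threshold."""
    traps = []
    was_low = False
    for t, volts in enumerate(voltage_by_minute):
        is_low = volts < THRESHOLD_VAC
        if is_low and not was_low:
            traps.append(t)
        was_low = is_low
    return traps


# Hypothetical trace: 120 VAC with a 2-minute sag at minutes 6 and 7.
trace = [120.0] * 15
trace[6] = trace[7] = 101.0

print(polled_low_readings(trace))  # [] : polls at minutes 0, 5 and 10 miss the sag
print(trap_times(trace))           # [6]
```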
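The addressing scheme on the "Manageability – Addressing" slides maps directly to code. The sketch below builds first-octet.site.function.device addresses, derives the /24 (255.255.255.0) subnet for a site's business-function VLAN, and checks region membership against the two /10 summaries. The helper names are hypothetical. Using device ID 1 for "Collector #1" is an assumption, since the slide does not give its fourth-octet value.

```python
import ipaddress

FIRST_OCTET = 10  # same for the entire system


def device_address(site_id, function_id, device_id):
    """Build FIRST_OCTET.site.function.device per the addressing scheme."""
    if not (0 <= site_id <= 255 and 0 <= function_id <= 255):
        raise ValueError("site and function IDs must fit in one octet (0-255)")
    if not 1 <= device_id <= 254:
        # .0 and .255 are the network and broadcast addresses of a /24
        raise ValueError("device ID must be 1-254 in a 255.255.255.0 subnet")
    return ipaddress.IPv4Address(f"{FIRST_OCTET}.{site_id}.{function_id}.{device_id}")


def function_subnet(address):
    """The /24 holding one business-function VLAN at one site."""
    return ipaddress.IPv4Network(f"{address}/24", strict=False)


REGIONS = {
    "Eastern": ipaddress.IPv4Network("10.64.0.0/10"),   # second octet 64-127
    "Western": ipaddress.IPv4Network("10.128.0.0/10"),  # second octet 128-191
}


def region_of(address):
    """Return the region whose summary contains the address, or None."""
    return next((name for name, net in REGIONS.items() if address in net), None)


collector = device_address(8, 4, 1)  # Springfield Sub, AMI, Collector #1 (ID assumed)
print(collector, function_subnet(collector))  # 10.8.4.1 10.8.4.0/24

# Each site owns 10.<site>.0.0/16; the 64 eastern sites collapse to one summary route.
east_sites = [ipaddress.IPv4Network(f"10.{n}.0.0/16") for n in range(64, 128)]
print(list(ipaddress.collapse_addresses(east_sites)))  # [IPv4Network('10.64.0.0/10')]
```

Springfield Sub (site 8) falls outside both example regions, so `region_of` returns None for it. Summaries only cover sites numbered inside a region's second-octet range.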
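The watchdog timing on the "Watchdog System Example" slide can be checked with a small simulation. The PDU arms a delayed power cycle every 60 minutes, and the cycle fires 30 minutes later unless at least one cancel from the NMS (sent every 5 minutes) arrives inside that window. Several details are assumptions: cancels go out at multiples of 5 minutes starting at minute 0, the window is treated as [armed, armed + 30), the outage is hypothetical, and restoration after a power cycle is not modeled.

```python
def power_cycle_times(link_up, horizon_min, frequency_min=60, delay_min=30, cancel_every_min=5):
    """Return the minutes at which outlets 1 and 2 would be power cycled.

    link_up(minute) -> bool says whether a cancel sent at that minute reaches the PDU.
    """
    cycles = []
    for armed_at in range(0, horizon_min, frequency_min):
        fires_at = armed_at + delay_min
        cancel_minutes = (t for t in range(armed_at, fires_at) if t % cancel_every_min == 0)
        cancelled = any(link_up(t) for t in cancel_minutes)
        if not cancelled:
            cycles.append(fires_at)
    return cycles


def link_up(minute):
    """Hypothetical outage: the path to the PDU is down from minute 50 to minute 129."""
    return not (50 <= minute < 130)


print(power_cycle_times(link_up, horizon_min=180))  # [90]
```

Under these assumptions, an outage that begins just after a successful cancel can go up to 60 + 30 = 90 minutes before the first power cycle. This is the trade-off the frequency and delay settings control.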