Do ‘Black Swans’ fly in flocks? Societies’ vulnerability to infrastructure failure.
University of Bath Institute for Policy Research Lecture, 22nd October 2013
Prof. Jim Norton FREng
Past President BCS, The Chartered Institute for IT
External Board Member UK Parliamentary Office of Science & Technology
Member of ippr Commission on National Security in the 21st Century
www.profjimnorton.com

Issues to be covered
• Setting the scene - impact of exponential growth.
• Disasters do happen…
• Why worry now?
• The trap of “accidental systems”
• Outdated and insecure systems
• Is there a good understanding of resilience?
• Creating markets in resilience.
• Summing up - final thoughts.

The second half of the chessboard
Original idea: George Gilder at the Cato-Brookings Institution conference "Regulation in the Digital Age," held in Washington D.C. on April 17-18, 1997.

The cost-performance of electronics doubles every 18-24 months (Moore’s Law)
[Chart: logarithmic vertical axis from 1 to 1,000,000,000,000; horizontal axis 1940 to 2030; annotation "38 Doublings"]
Source: Gordon E. Moore. Cramming more components onto integrated circuits. Electronics Magazine 38(8), 19/4/1965, available at ftp://download.intel.com/museum/Moores_Law/ArticlesPress_Releases/Gordon_Moore_1965_Article.pdf and Analysys

Moore’s Law in Action:
The family of processors known as “Ivy Bridge” is based on circuits with a feature size of just 22 nanometres, reduced from the 32 nanometres of the previous generation, “Sandy Bridge”. The latest target is a 5 nanometre feature size, using extreme ultraviolet light to mask chips prior to etching.
Source: Intel & Financial Times

Opto-electronics follow the same path (Moore’s Law operates in telecoms, too)
[Chart: vertical axis in Mbit/s from 0 to 45 000; horizontal axis 1975 to 2005; annotation "45 Doublings"]
Karlsruhe Institute of Technology reports a transmission rate of 26 Terabits/sec over a distance of 50 km using 325 multiplexed optical channels from a single laser on a single fibre…
Source: http://www.zdnet.co.uk/news/emerging-tech/2011/05/24/scientists-hit-record-breaking26tbps-by-laser-40092859/

The cost-performance of magnetic storage doubles roughly every 18 months…
[Chart: logarithmic vertical axis from 1 to 1,000,000,000,000; horizontal axis 1940 to 2030; annotation "31 Doublings"]
Source: E. Grochowski and R. D. Halem. Technological impact of magnetic hard disk drives on storage systems. IBM Systems Journal 42 (2) 21/4/2003, available at https://www.research.ibm.com/journal/sj/422/grochowski.html and Silicon Image

Cooper’s law for wireless
[Chart: logarithmic vertical axis from 1 to 100,000,000,000,000; horizontal axis 1895 to 2005; annotation "45 Doublings"]
Cooper’s Law (after ArrayComm Chairman Martin Cooper) states that the number of conversations (voice and data) conducted over a given area, in all of the useful radio spectrum, has doubled every two and a half years for the last 107 years, ever since Marconi discovered radio in 1895.
Latest is WiGig Alliance developing 9Gbit/sec WiFi…
Source: ArrayComm
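
The doubling counts annotated on the growth charts above can be turned into growth factors with a few lines of Python. This is only a rough sketch: the function names are hypothetical, the doubling counts are taken as annotated on the charts, and the "second half of the chessboard" test assumes the usual 64-square board, where square 33 is reached after 32 doublings.

```python
"""Rough check of the doubling arithmetic quoted on the growth slides."""


def growth_factor(doublings: float) -> float:
    """Return the multiplicative growth after `doublings` doublings."""
    return 2.0 ** doublings


def doublings_in(years: float, doubling_period_years: float) -> float:
    """Return how many doublings fit into `years` at a fixed doubling period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return years / doubling_period_years


def in_second_half_of_chessboard(doublings: float) -> bool:
    """True from square 33 onwards, i.e. after at least 32 doublings.

    Assumption: a 64-square board with one grain on the first square.
    """
    return doublings >= 32


if __name__ == "__main__":
    # Doubling counts as annotated on the charts (not recomputed here).
    annotated = {
        "electronics (Moore's Law)": 38,
        "opto-electronics": 45,
        "magnetic storage": 31,
        "wireless (Cooper's Law)": 45,
    }
    for name, n in annotated.items():
        half = "second" if in_second_half_of_chessboard(n) else "first"
        print(f"{name}: {n} doublings = x{growth_factor(n):.3g} ({half} half)")

    # Cooper's Law as worded: one doubling every 2.5 years for 107 years.
    print(f"Cooper's Law as worded: {doublings_in(107, 2.5):.1f} doublings")
```

Dividing 107 years by a 2.5 year doubling period gives about 42.8 doublings, a little under the 45 annotated on the wireless chart; the sketch does not try to reconcile the two figures.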
USA Eastern & Mid-Western power outage
New York power outage, August 14, 2003, 1:33 PM
Boing Boing is among the first to report that a massive power outage just hit much of the Northeast, including New York, Cleveland and Detroit. CNN's breaking news just confirmed it.
Source: St. Andrews Management Institute: SAMI Consulting http://www.samiconsulting.co.uk

New York power outage, August 2003
• 3 reported deaths
• 3000 fires from use of candles
• 80,000 calls to 911
• 300 arrested for looting
• Shut down of cellular coverage
• 400 flights cancelled
• Cost estimated at over £5bn
Source: St. Andrews Management Institute: SAMI Consulting http://www.samiconsulting.co.uk

2003 was not a great year for power companies…
Source: http://www.mapreport.com/subtopics/d/o.html

The Manchester cable chamber fire
“Damage to ‘choke points’
In March 2004, a fire broke out in a BT cable tunnel in Manchester and put 130,000 land lines out of action, affecting internet services and disrupting several parts of the emergency services communications network including Derbyshire and Cheshire police forces and the Greater Manchester ambulance service. Many bank cash machines in the area were closed since they make security checks over phone lines and local shops could not use credit and debit card machines for the same reason. The incident highlights the vulnerability of parts of our communications infrastructure, and demonstrates how a single failure can cascade across multiple areas and services.
At the same time, some organisations which had thought they had back-up communication routes in place should the BT services go down, found that these alternative routes used duct space in the same cable tunnel, and so were lost as well.”
Source: UK Parliamentary IT Committee http://www.pitcom.org.uk/briefings/PitComms1-CNI.pdf

“Pitt” Floods Review
Quote from Sir Michael Pitt’s covering letter to the Secretaries of State: “Better planning and higher levels of protection for critical infrastructure are needed to avoid the loss of essential services such as water and power”.
Source: The Pitt Review. See: http://www.environment-agency.gov.uk/research/library/publications/33889.aspx

A ghost at the feast?
“We live today in a complex, densely networked and heavily technology-reliant society. Extensive privatisation and the pursuit of competitive advantage in globalised markets, have also led us to pare down the systems we rely upon until little or no margin for error remains. We have switched to lean production, stretched supply chains, decreased stock inventories and reduced redundancy in our systems. We have outsourced, offshored and embraced a just-in-time culture with little heed for just-in-case. This magnifies not only efficiency but also vulnerability. Everything depends on infrastructure functioning smoothly and the infrastructure of modern life can be brittle: interdependent systems can make for cascades of concatenated failure when one link in the chain is broken”.

Some key reports…
A series of reports published in the summer of 2009 stressed the need for major investment in infrastructure renewal and hardening against a wide range of threats…

Quotes from the reports (1)…
Recommendation 52: Government should review its powers to mandate realistic minimum levels of resilience in relation to all critical infrastructures and in relation to all areas of interdependence between different infrastructure sectors. Where wider interpretation or amendment of existing legislation is not sufficient and new primary legislation is required, this should be included in the planned further Bill on Civil Contingencies.
Recommendation 53: Government should bring together regulators of the different infrastructure industries and require them to enforce higher resilience standards in their own sectors, as well as to investigate and strengthen resilience in areas of interdependencies between sectors and in sector supply chains.
Recommendation 54: Government should go further and signal to sector regulators that it would welcome investment by utility providers in relevant areas outside their own core business areas where such investment would reduce interdependence on other elements of the infrastructure. Investment by the power generators, national grid and energy distribution companies in mobile communications that are more resilient against power failure, for example, would be welcome.
Recommendation 57: Government should task the Centre for the Protection of National Infrastructure (CPNI) with the development of security recommendations aimed at mitigating command and control risks associated with Smart Grids…

Quotes from the reports (2)…
“We do not believe that the NI can continue on its current trajectory, for three main reasons:
• it is highly fragmented, both in terms of delivery and governance…
• its resilience against systemic failure is significantly weakening through a combination of:
  o ageing infrastructure components;
  o greater complexity and interconnectivity between the different infrastructure sectors; and
  o nearing maximum capacity as a result of increased social and economic pressures
• the significant challenges posed by climate change and socio-demographic changes, which mean that:
  o there is an urgent need for a major change in devising low carbon solutions to meet the 80% target for reducing greenhouse gas emissions by 2050;
  o core pieces of infrastructure need to be ‘future-proofed’ against extreme natural events; and
  o they need to be able to respond to future demographic, social and life style changes”.

Quotes from the reports (3)…
• “We recommend that the government creates a single point of authority for infrastructure resilience to coordinate the work of the agencies responsible for dealing with individual sectors and threats and recognise interdependency. This would provide the fundamental overview that is lacking, consider how to fill in the gaps and address the areas of infrastructure defence which are currently ignored.
• With climate change identified as the biggest threat currently facing the UK’s infrastructure, government must ensure that the newly created Natural Hazards Team is effective. Government should invest the Natural Hazards Team with the power to provide strong leadership to asset owners and ensure legislation is properly enforced.
• Government must give clearer guidance to sector regulators such as Ofgem and Ofwat. At present these regulators’ remit is largely the short-term prices paid by end users. In order to deliver the improvements to resilience identified as necessary by government and the overview function for infrastructure resilience, regulators must have the capacity to address asset resilience as well as broader and longer term consumer interests. Regulators require the ability to ensure asset owners build in reserve capacity to critical infrastructure and that they are fully prepared for any emergency scenario.”

A case study: Global Navigation Satellite Systems (GNSS)
Royal Academy of Engineering report:
• Initiated in 2009 following a US GAO report warning that GPS service levels might not be maintained
• The RAEng had concerns that the degree of dependence on GNSS was not well understood
• The report was published in March 2011.
Downloadable from: http://www.raeng.org.uk/news/publications/list/mostrecent.htm

The wide range of civil GNSS applications
Sectors and applications: Rail; Law Enforcement; Humanitarian; Road; Elderly and Disabled; Communications; Aviation; Maritime; Agriculture and Fisheries; Energy; Financial & Banking; Surveying; Scientific & Environmental; Emergency Services; Dredging
Source: Royal Academy of Engineering report “Global Navigation Space Systems: reliance & vulnerability”, published March 2011

Examples of GPS dependencies (source K. VanDyke, DOT, USA)
Source: Research & Radionavigation – General Lighthouse Authorities UK & Ireland

Vulnerabilities
• Failure of the “Positioning Navigation and Timing” (PNT) service would instigate common-cause failures of many applications that are otherwise independent and may be interrelated. For example, loss of PNT could lead to accidents, disruption to the emergency services, and loss of telecommunications.
• GNSS jammers are cheap, widely available and widely used: the JLOC detection network in the USA detects thousands of incidents each day.
Source: Royal Academy of Engineering report “Global Navigation Space Systems: reliance & vulnerability”, published March 2011

Key concerns
• Reliance on GNSS for PNT is high and increasing.
• Risk from out-of-cycle solar events is unquantifiable.
• Risk from jamming is growing.
• Risk from spoofing is emerging and serious.
• GPS, Galileo and GLONASS have the same vulnerabilities.
Source: Royal Academy of Engineering report “Global Navigation Space Systems: reliance & vulnerability”, published March 2011

Implications
• The UK (like other developed countries) is already dangerously dependent on GPS as a source of PNT.
• Backup systems are often inadequate or untested.
• Jammers are too easily available and the risks are increasing.
• No-one has a full picture of the dependencies and consequent vulnerabilities.
• These risks could be mitigated cost-effectively.
Source: Royal Academy of Engineering report “Global Navigation Space Systems: reliance & vulnerability”, published March 2011

A worked example of vulnerability: e-Navigation needs resilient PNT
e-Navigation by 2018-2020: “the harmonized collection, integration, exchange, presentation and analysis of maritime information ...to enhance berth to berth navigation…”
IMO says: “e-Navigation systems should be resilient …. robust, reliable and dependable.
Requirements for redundancy, particularly in relation to position fixing systems should be considered”
[Diagram labels: “≠ resilient PNT”, “+”, “due to GNSS vulnerability…”]
Picture courtesy of US National Executive Committee
Source: Research & Radionavigation – General Lighthouse Authorities UK & Ireland

THV Galatea Trial 2009
With full denial of the GPS signal, some large position errors were observed
- Blue ship icon is GPS indicated position
- Green circle is true position (from eLoran): eLoran was unaffected by the jamming
Reference: http://www.gla-rrnav.org/file.html?file=0968bb9a8ee1ea4ad8c35241ac29c951
Source: Research & Radionavigation – General Lighthouse Authorities UK & Ireland

THV Galatea Trial 2009
With higher power jamming, the GPS position visited some ‘exotic locations’.
Lower power jamming, comparable to the GPS signal level, caused:
- Hazardously Misleading Information (HMI)
- No alarms sounded
- Erroneous positions and velocities, some of them barely noticeable!
Source: Research & Radionavigation – General Lighthouse Authorities UK & Ireland

THV Galatea Trial 2009
With a little more jammer power, alarms began to sound
Eventually all of these bridge systems failed….
- ECDIS: Electronic Chart Display
- Ship’s Autopilot
- DGPS: Differential GPS
- Heli-deck stabilisation system
- DSC-GMDSS: Maritime Distress Safety System
- Radar
- Gyro-compass
- AIS: Automatic Identification System
Source: Research & Radionavigation – General Lighthouse Authorities UK & Ireland

Alternatives to GNSS do exist - eLoran
• Independent, complementary and dissimilar to GNSS
• Low frequency / high power / terrestrial
• Modern s/w Rx on powerful h/w platform available
• Chip-level integration possible
• Meets maritime performance requirements for coastal and harbour approach navigation accuracy, availability, integrity, and continuity
Source: Research & Radionavigation – General Lighthouse Authorities UK & Ireland

A case study from the USA: SCADA systems
• Much of the underpinning system design and software in historic command and control systems (such as Supervisory Control and Data Acquisition, SCADA) is poor.
Aurora Generator Test – Idaho Labs

Plenty of good advice on which to draw…
Reports from the UK Royal Academy of Engineering:
• http://www.raeng.org.uk/news/publications/list/reports/Engineering_values_in_IT.pdf
• http://www.raeng.org.uk/news/publications/list/reports/Complex_IT_Projects.pdf
Report from the US National Academy of Sciences
• http://www.nap.edu/catalog.php?record_id=11923 (there is a link to download a free PDF)
Report from the US National Security Agency demonstrator project
• http://www.adacore.com/home/products/sparkpro/tokeneer
Report from the global work on Information Security Economics
• http://www.cl.cam.ac.uk/~rja14/Papers/econ_czech.pdf

Results drawn from detailed telephone interviews with a balanced sample of 500 IoD members
[Pie chart: Sample by employee numbers (%). Bands: 1-25, 26-50, 51-100, 101-200, 201-500, 501+]
[Pie chart: Distribution of sample by sector (%). Sectors: Bus & Prof Servs; Financial services; Govt, Educ, Health & Personal Servs; Distribution & Hotels; Manufacturing; Other, inc. Construction, Mining & Transport]
Source: IoD Business Opinion Surveys carried out by GfK-NoP

Impacts of specific infrastructure failure
Question asked: “On a scale of 1 to 7, where 7 would imply complete shutdown of your operations for the duration of the outage and 1 minimal impact, please assess the impact of failure of supply for 24 hours during the working week of the following infrastructure…”
[Bar chart: Mean organisational impacts of loss of services for 24 hours during the working week. Services: Gas Supply, Water Supply, Sewage Removal, Mobile V&D Comms., Fixed V&D Comms., Electrical Power. Axis: level of impact, from 1 (Minimal) to 7 (Closure)]
Source: IoD Business Opinion Survey. Research carried out March 2008

Expectation on continued availability of other infrastructure after region-wide power loss
Question asked: “In an emergency that involves loss of electrical power for 12 hours over a wide area, would you expect to be able to continue to use the following services during that period?”
[Bar chart: Expectation of continuing availability of other services during a 12 hour wide area failure of electrical power. Services: Gas Supply, Fixed V&D Comms, Sewage Removal, Water Supply, Mobile V&D Comms. Axis: percentage expectation of availability, 0 to 90]
Source: IoD Business Opinion Survey. Research carried out March 2008

Creating markets in resilience…?
• May be more sustainable than simple, blunt regulation…
• Solve the problems through the potential of profit from the new opportunities?
• Clearly applicable in communications and water industries… demonstrable untapped markets.
• Further potential in electrical power and gas industries?

Scoping the potential mobile comms. customers…
There will be many potential customers in both the public and private sectors, where the implications of ‘Critical National Infrastructure’ (CNI) support extend.

Sector                              SIC 2007   Employment (K)   % need   Accessible market (K)
Transport/Storage/Communications    H&J        2,887            25%      721.5
Electricity, Gas, Water Sewerage    D&E        316              50%      158
Financial & Insurance               K          1,127            15%      169.1
Manufacturing                       C          2,627            10%      262.7
Health & Social Work                Q          4,145            10%      414.5
Public Administration and Defence   O          1,563            5%       78.2
Indicative accessible UK market: 1,804 K
Source: UK Office for National Statistics: Labour Market Statistics Release Oct 2013

How might this be engineered: Key questions…?
A resilient voice & data overlay on an existing cellular network?
• Dedicated SIMs that allow ‘phones and other devices to access both the resilient overlay and the base network?
• Minimal use of ‘pico’ and ‘micro’ cells within the overlay to reduce backup power costs and maximise wide area coverage, at the cost of spectrum efficiency?
• Based on ‘3G’ or ‘4G’ technology?
• Using new dedicated spectrum or a ‘carve out’ from existing spectrum as conventional users migrate from 2G to 3G?
• Also supporting the next generation of “blue light” services?

What might be the charging model?
• Sold as “just like a normal cellular device” but with enhanced resilience?
• Enhanced monthly rental but ‘normal’ call charges in conventional use? (No scope for ‘pay as you go’…!)
• Enhanced call charges when basic network unavailable, or during tests and exercises?
• Regulatory support from sector regulators (e.g. energy, communications, broadcasting, water supply, financial services …) requiring the availability of resilient mobile communications to key personnel within their sectors?
• Scope for incremental sales to private users?
• Market and brand differentiation against less resilient operators?

Could infrastructure projects break the vicious circle?
• Even with the challenging economic backdrop, we are likely to see extensive investment in enhancing and hardening the UK national infrastructure over the next several years.
• It seems to me to be crucially important that this investment is based on the best principles of secure design and implementation, especially in terms of software and embedded systems…
• If we want high confidence that a system has some desired properties (e.g. specific security properties), then this can only be shown by analysis, supported (to a degree) by testing.
• Once that is accepted, it dictates the whole strategy for development, because it requires that the desired properties are expressed in a formal language, and that the software is developed using notations and languages that can be rigorously analysed to show that the system they describe has the required properties. If there is a market for certifiably secure software, then there will be a market for the languages, methods and analysis tools that will be needed.

Achieving the breakout…
• Take the new infrastructure projects as the catalyst for a fundamental change in practice, leveraging Government’s role in regulation and, to a lesser extent, procurement.
• Adopt a mandatory two-stage procurement, with an initial step in which a systems architect would capture, formalise and analyse the customer's requirements;
• Demand that key operational software should always be delivered with an evidence-based argument that it meets the security specification;
• Rely far more on analysis and far less on testing as the core evidence.

From breakout to critical mass
Recommendation 60: Government should also approach the European Commission … to sponsor a programme for the creation of a range of secure and reliable standard software modules (such as simple operating systems, database management systems and graphical user interfaces).
These modules should be developed using formal methods and be made available free of charge through an ‘Open Source’ licence to encourage their widespread use.

UK Government’s Cyber Security Vision
Source: UK Cyber Security Strategy 2011 & PwC http://www.cabinetoffice.gov.uk/resource-library/cyber-security-strategy

Some perspectives on “Risk”
"Until one is committed, there is hesitancy, the chance to draw back, always ineffectiveness…. Whatever you can do, or dream you can, begin it. Boldness has genius, power and magic in it. Begin it now." - Johann Wolfgang Von Goethe
“There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success than to take the lead in the introduction of a new order of things”. Niccolo Machiavelli from The Prince
“Business without risk is like farming without water” from “Managing Uncertain Markets”

Risk identification and mitigation are key Board responsibilities
• Cyber risk is growing steadily in importance.
• Mitigation measures and regular testing/exercising are essential.
• Ensuring that proper risk assessment and control processes are in place and working is a key responsibility of the Board.
• The management approach needs to be balanced and holistic.
• Responsibility for identifying and managing risk and ensuring continuity of business is pervasive throughout an organisation.

Some final thoughts….
• The UK has a good history of voluntary co-ordination and trusted information sharing.
• However, electrical power and communications have the key roles in blocking cascades of infrastructure failure.
• ‘Mobile’ comms. has now surpassed ‘fixed’ as the access network of choice under many circumstances, yet UK mobile networks were never designed for wide-area emergency resilience.
• Perhaps, in the head-long rush to move up the supply chain into content, mobile operators in particular are overlooking an opportunity in their core market?
• Opportunities also exist in other basic services.
• Surely market forces can complement regulation in driving up emergency resilience?

Slides can be downloaded from: www.profjimnorton.com/bathipr1.pdf
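
As a check on the arithmetic behind the “Scoping the potential mobile comms. customers” table earlier in the deck, the accessible market can be recomputed as employment multiplied by the assumed percentage need. The sketch below is illustrative only: the class and function names are hypothetical, and the figures are copied from the slide rather than from the underlying ONS release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sector:
    """One row of the market-scoping table (figures as shown on the slide)."""
    name: str
    employment_k: float   # employment, in thousands
    need_fraction: float  # assumed share needing resilient mobile comms

    @property
    def accessible_k(self) -> float:
        """Accessible market in thousands: employment x share in need."""
        return self.employment_k * self.need_fraction


SECTORS = [
    Sector("Transport/Storage/Communications", 2887, 0.25),
    Sector("Electricity, Gas, Water Sewerage", 316, 0.50),
    Sector("Financial & Insurance", 1127, 0.15),
    Sector("Manufacturing", 2627, 0.10),
    Sector("Health & Social Work", 4145, 0.10),
    Sector("Public Administration and Defence", 1563, 0.05),
]


def total_accessible_k(sectors) -> float:
    """Sum the accessible market, in thousands, over all sectors."""
    return sum(s.accessible_k for s in sectors)


if __name__ == "__main__":
    for s in SECTORS:
        print(f"{s.name}: {s.accessible_k:.1f} K")
    print(f"Indicative accessible UK market: {total_accessible_k(SECTORS):,.0f} K")
```

Recomputed this way, the first row comes out at 721.75 K rather than the 721.5 K shown on the slide; the rounded total is 1,804 K either way.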