Predictive Storm Damage Modeling and Optimizing Crew Response to Improve Storm Response Operations

by

Sean David Whipple

B.S.E., University of Michigan, Ann Arbor, 2008

Submitted to the MIT Sloan School of Management and the Engineering Systems Division in partial fulfillment of the requirements for the degrees of

Master of Business Administration
and
Master of Science in Systems Engineering

in conjunction with the Leaders for Global Operations Program at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2014

© Sean David Whipple, MMXIV. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Signature redacted
MIT Sloan School of Management and the Engineering Systems Division
May 9, 2014

Certified by: Signature redacted
James Kirtley, Thesis Supervisor
Professor of Electrical Engineering
Department of Electrical Engineering and Computer Science

Certified by: Signature redacted
Georgia Perakis, Thesis Supervisor
William F. Pounds Professor of Management Science
MIT Sloan School of Management

Approved by: Signature redacted
Richard Larson
Mitsui Professor of Engineering Systems
Chair, Engineering Systems Division Education Committee

Approved by: Signature redacted
Maura Herson
Director, MBA Program
MIT Sloan School of Management

Predictive Storm Damage Modeling and Optimizing Crew Response to Improve Storm Response Operations

by

Sean David Whipple

Submitted to the MIT Sloan School of Management and the Engineering Systems Division on May 9, 2014, in partial fulfillment of the requirements for the degrees of Master of Business Administration and Master of Science in Systems Engineering

Abstract

Utility infrastructure is routinely damaged by naturally occurring weather. Such damage interrupts customer service, and repairs are necessary to return the system to normal operation. In most cases these events are few and far between, but major storm events (e.g., Hurricane Sandy) cause damage on a significantly larger scale: large numbers of customers have their service interrupted, and repair costs run into the millions of dollars. The ability to predict damage before the event and to optimize the response can significantly cut costs.

The first task was to develop a model to predict outages on the network. We used weather data and outage data from the past six storms, together with asset information (framing, pole age, etc.) and environmental information, to understand the interactions that lead to outages (for example, assets in forested areas are more likely to have outages than underground assets). Using data mining and machine learning techniques, we developed a model that gathers the data and applies a classification tree model to predict outages caused by weather. Next, we developed an optimization model to allocate repair crews across Atlantic Electric staging locations in response to the predicted damage, so as to achieve the earliest possible restoration time. Regulators impose constraints such as cost and return-to-service time on utility firms, and these constraints will largely drive the distribution of repair crews.
While the model starts with predicted results, the use of robust optimization will allow Atlantic Electric to optimize its response despite uncertainty in the outage information, which will lead to more effective response planning and execution across a variety of weather-related outages. Using these models, Atlantic Electric will have a data-driven capability not only to predict how much damage an incoming storm will produce, but also to plan how to allocate its repair crews. These tools will help Atlantic Electric properly plan for storm events, and their efficacy will increase as more storms occur.

Thesis Supervisor: James Kirtley
Title: Professor of Electrical Engineering, Department of Electrical Engineering and Computer Science

Thesis Supervisor: Georgia Perakis
Title: William F. Pounds Professor of Management Science, MIT Sloan School of Management

Acknowledgments

First and foremost, I would like to thank my advisers, Prof. Georgia Perakis and Prof. James Kirtley, for their invaluable insight, guidance, and support on this project and throughout my MIT experience. In addition to Georgia Perakis, this work was done in close collaboration with Prof. Vivek Farias (Sloan), Matthieu Monsch (PhD, MIT ORC), and Anna Papush (PhD candidate, MIT ORC), and I want to thank them for their support, as well as the Atlantic Electric team for making my internship a challenging and rewarding experience. I would also like to thank my fellow LGO and Sloan classmates for making my graduate experience at MIT and Sloan one I will always treasure. Finally, I would like to thank my friends, family, and girlfriend, Lisa Kurajian, for their encouragement, love, and support during my time at MIT.

Contents

1 Introduction to Storm Response at Atlantic Electric
1.1 Current Storm Response Process
1.2 Literature Review
1.2.1 Outage Prediction Literature Review
1.2.2 Optimization Literature Review
1.3 Thesis Outline and Contributions
2 Predicting Outages
2.1 Data
2.1.1 Electrical Asset Information
2.1.2 Land Cover
2.1.3 Historical Outages
2.1.4 Historical Weather Information
2.2 Segment Clustering
2.3 Weather Profiles
2.3.1 Weather Logs
2.3.2 Weather Forecasts
2.4 Model Formulation
2.4.1 Maximum Likelihood Estimator
2.4.2 Classification Trees
2.5 Results
2.6 Implementation of Prediction Model
2.7 Conclusion
3 Repair Crew Optimization
3.1 Introduction to Crew Allocation at Atlantic Electric
3.2 Solving the Master Problem
3.3 Relaxing the Problem
3.4 Comparison of Master and Relaxed Formulations
3.4.1 Test Evenly Distributed Workload Assumption
3.4.2 Comparison of Crew Assignments Between Master and Relaxed Formulation
3.5 Optimization Under Uncertainty
3.5.1 Robust Optimization Using Box Constraints
3.5.2 Robust Optimization Using Bertsimas-Sim Uncertainty Sets
3.5.3 Determining Robustness of Solution
3.5.4 Comparison of Optimization Results
3.6 Conclusion
4 Conclusions and Future Work
4.1 Improvements to Prediction Model
4.1.1 Bootstrapping
4.1.2 Parallelization
4.1.3 Principal Component Analysis on Weather Information
4.2 Improvements to Optimization Model
4.2.1 Data Inaccuracies
4.2.2 In Storm Planning
4.2.3 Customer Constraints
4.3 Conclusions
A In Sample and Out of Sample Prediction Results
B Complete Maximum Likelihood Formulation
B.1 Without Censoring
B.2 With Censoring
C Complete Variable Notation
C.1 Outage Prediction Variable Notation
C.2 Crew Optimization Variable Notation
Bibliography

List of Figures

2-1 Asset Description
2-2 Land Cover Distribution
2-3 Weather Station Locations
2-4 2010 Storm Results
2-5 Web Application User Interface
2-6 Web Application Output Window
3-1 Job Allocation
3-2 Crew Workload Comparison
3-3 Comparison of Results in Master and Relaxed Formulations
3-4 Historical Non-Storm Outage Histogram
3-5 Γ Simulation Histogram
3-6 Comparison of Optimization Results

List of Tables

2.1 Historical Outages
2.2 Sensitivity and Specificity Results of Outage Prediction
3.1 Master Formulation Variable Notation
3.2 Relaxed Formulation Variable Notation
A.1 October 2011 Winter Storm Results
A.2 December 2010 Winter Storm Results
A.3 Hurricane Sandy
A.4 Hurricane Irene
A.5 February 2010 Snow Storm
A.6 December 2008 Snow Storm
C.1 Outage Prediction Variable Notation
C.2 Crew Optimization Variable Notation

Chapter 1
Introduction to Storm Response at Atlantic Electric

Severe weather often causes significant damage to Atlantic Electric¹ assets, prompting Atlantic Electric to respond by dispatching crews to repair the damage and return the system to normal operation.
Large storms such as hurricanes are particularly problematic: they not only require significant, costly repairs and interrupt service to hundreds of thousands of customers, but the Department of Public Utilities (DPU) can also fine Atlantic Electric and other utility firms if it determines that a storm response plan was not properly created and implemented. Because Atlantic Electric often does not have enough crews to repair the damage in an acceptable amount of time, it needs to bring in outside contract crews to meet the need. This is standard practice in the industry: most firms are staffed for normal operation and maintenance, and staffing for very large storms would add unnecessary cost throughout normal operating periods. Because large storms affect wide areas, many utilities end up competing for the same limited supply of repair crews. As a result, Atlantic Electric and other utilities try to obtain these crews several days in advance of a major storm to allow travel and planning time, so that the utility can respond to the storm as efficiently as possible.

During storm events, Atlantic Electric stations its repair crews at sites called platforms. These platforms are the home base for the crews throughout the duration of the storm (or until the crews are reallocated). Once stationed, crews repair damage to the Atlantic Electric system when outages occur in locations associated with their platform. Storm response is not only a large driver of public opinion of Atlantic Electric; it also has a significant impact on Atlantic Electric's costs. Large storms incur total repair costs in the millions of dollars, much of which may not be reimbursed by the DPU.

¹ The name of the actual utility has been replaced with Atlantic Electric.

To motivate the problem, we can examine the response of Atlantic Electric to Hurricane Irene and a snowstorm in October 2011.
After reviewing Atlantic Electric's response to these events, Attorney General Martha Coakley recommended that the DPU seek $16 million in penalties, the largest ever sought by the DPU [22]. This was after Atlantic Electric had already spent $47 million on repairs for Hurricane Irene alone. Large storms have a significant direct financial impact on a utility's bottom line. Additionally, because utilities are essentially legal monopolies, government regulators limit the earnings a utility can obtain based upon its customer satisfaction level (customers tend not to be happy when they are without power for significant periods of time).

1.1 Current Storm Response Process

Currently, Atlantic Electric's process for determining the number of crews it needs and for stationing them effectively relies heavily on human judgment. Regional managers estimate damage based on their previous experience and request a specific number of external crews. Each manager has their own methods and experience driving that decision; there are no central decision-making criteria. Atlantic Electric aggregates this information and tries to meet the requested number of crews. Once the storm has hit, Atlantic Electric holds three meetings per day in which regional managers update progress on repairs and report their surplus or deficit of crews. Again, these surplus and deficit numbers are driven by managers' intuition about what they need to efficiently return the system to normal operation, and Atlantic Electric managers do their best to meet all the needs using their experience.

Atlantic Electric's process is inherently human and can be quite flawed. While the managers are all skilled and experienced, the lack of data-driven approaches makes their decisions hard to defend to regulators such as the DPU. Customers and the public in general are skeptical of analyses driven solely by human intuition. The process also creates a serious knowledge transfer problem.
When managers are unavailable to work or leave the company, all of their expertise goes with them. Understanding the effect of weather on Atlantic Electric assets is not simple and takes years of experience to develop. A well-maintained data-driven model can be operated by someone with significantly less training; the operator's only requirements would be to feed the model new data as it becomes available and to interpret the results. Such a system would yield consistent, continually improving results as storm data is collected over time.

1.2 Literature Review

1.2.1 Outage Prediction Literature Review

Society's increasing dependence on technology, media and communication has created a growing reliance on the electrical supply industry. Stemming from this dependence, weather-based power outages have recently become a very significant concern for both distributors and consumers. In the past decade there has been a good deal of research on this problem from multiple angles. Most of the literature in this field can be categorized into three distinct lines of work.

The first line of work is research related to climate change and weather-based forecasting. Climate variation over the past several decades has sparked a great deal of academic interest in terms of both data collection and modeling. Synoptic weather typing is a scheme for classifying weather conditions into distinct types. It has frequently been used as a valuable tool in climate impact applications such as pollution, disaster planning, agriculture and human health. [10] and [17] present automated and manual approaches, respectively, to this kind of weather analysis. The work in [10] predicts occurrences of freezing rain by applying automated synoptic typing to differences in air masses. By studying hourly meteorological readings, the authors identify weather types correlated with freezing rain and apply stepwise logistic regression to predict its likelihood.
Similarly, [17] employs airborne particle concentrations and daily weather data to build a manual synoptic typing that categorizes storm types in advance. This branch of research also encompasses the socio-economic effects of climate change. Through time series modeling, [13] predicts daily variability in ski resort attendance based on a combination of surrounding urban and mountain weather. The authors of [24] and [3] consider the potentially harmful impacts of climate variability on temperature-related mortality and air-pollution-related health effects by analyzing correlations with weather parameters.

A second branch of the literature considers electrical system reliability with respect to weather and the environment. Foundational work in this direction is done in [5], [6] and [7]. These papers propose single- and two-state weather models, then expand these to a three-state weather model that captures normal, adverse and extreme weather conditions. Through the resulting calculations of reliability indices, they demonstrate the need for weather to be considered in practical system assessments. Sensitivity studies show that disregarding weather effects produces overly optimistic system appraisals, and that inclement weather conditions must be divided into at least two types. Prior to this work, multiple investigative studies considered specific types of weather events and their impacts on system reliability. These works, such as [16], [1], and [8], ultimately aim to improve reliability through system redesign. The work in [16] analyzes drought conditions and their effects on tree faults; using the Palmer Drought Index, the authors show the influence of drought on tree-caused power outages. The latter two works consider lightning storms and ice storms, respectively.
By modeling system response and storm characteristics, [1] presents a Monte Carlo simulation that evaluates system reliability and helps identify weaker areas for system redesign. Using a similar approach, [8] models weather, vulnerability and restoration times in order to estimate system component reliability during severe ice storms. More recently, there have been general weather reliability studies following Billinton's works. To present a cost-benefit analysis of overhead-to-underground line conversions, the work in [28] estimates damage rates based on hurricane wind speeds and simulates the resulting restoration process. Caswell et al. [9] consider correlations between reliability indices and various weather parameters to account for system variability.

The third line of work entails the prediction of weather-related electrical power outages. Some of the earlier treatments of this problem appear in [23], [11] and [12]. The approach in [23] uses artificial neural networks (ANNs) to predict the number of power interruptions from input weather parameters; it combines time series and regression to develop a learning algorithm. The follow-up works [11] and [12] consider the effects of normal daily weather conditions on distribution system interruptions; using Poisson regression models, they determine the weather parameters that contribute most to daily outages. Later works such as [27] and [29] incorporate other statistical techniques. In [27], contingency probability estimators for a transmission failure rate are computed using maximum likelihood (ML) estimation and multiple linear regression on transformed weather data. Using both a Poisson regression model and a Bayesian network model, [29] proposes a method for predicting the number of annual overhead distribution line failures caused by weather.
The seminal series of papers by Liu et al., including [19] and [20], presents a statistical approach to predicting the spatial distribution of power outages and restoration times resulting from hurricane and ice storm damage. They employ a generalized linear mixed regression model (GLMM); however, instead of using quantitative characteristics of each storm, they create indicator variables that map each outage to its respective storm. Furthermore, their model predicts damage at the outage level, meaning that it indicates whether a given device will open. This lacks granularity, in that an outage may be caused by five trees falling across the lines or by only one broken pole. Their spatial prediction is executed on a 3 km × 3 km grid cell in a given area serviced by a utility company. Building on this approach, [14] uses generalized linear models (GLM) as well as generalized additive models (GAM), with measurable storm data replacing the indicator variables. To avoid variable collinearity, the data was transformed using principal component analysis (PCA), which ensures that the transformed variables are uncorrelated. This work also predicts on a grid level, now 3.66 km × 2.44 km, in order to estimate the numbers of outages, customers without power, and damaged poles and transformers. Although this approach increases the prediction granularity, it still makes several assumptions about the conditions that cause outages, such as the wind speed necessary to down a pole or uproot a tree. In the more recent work by Hongfei et al. [15], a Bayesian hierarchical statistical modeling approach is used to predict the number of outages and capture uncertainty in the outage data. Although the prediction is not categorized by type of outage, the model also displays the uncertainty of the damage forecasts geographically.

1.2.2 Optimization Literature Review

Optimizing operations is not a novel concept and has been applied across many fields and industries.
Balwani showed that a stochastic programming optimization can reduce overtime repairs on Atlantic Electric's gas assets [2]. However, the model presented in [2] assumed data certainty, whereas a fundamental challenge in storm response is that most of the data will not be certain. [4] uses mixed integer programming to solve classic optimization problems under uncertainty (e.g., the traveling salesman problem). Other efforts have been made to create optimization formulations with data uncertainty, but they often accepted sub-optimal results. Bertsimas and Sim offer a more attractive trade-off between data uncertainty and solution optimality using probabilistic bounds on the uncertainty in the data.

1.3 Thesis Outline and Contributions

This thesis first presents the methodology used to generate predicted outages in Chapter 2. We discuss the data available to us and outline the methodology used to transform that data into outage predictions. While models currently exist that attempt to solve similar problems, our solution provides analysis at a level of granularity that does not currently exist. The model presented gives Atlantic Electric a unique understanding of the incoming damage from a storm at the device level (as opposed to the town level or other more aggregate levels). Because our methods are purely data driven, they are well suited to continuous improvement as additional data becomes available.

In Chapter 3 we rely heavily on Bertsimas-Sim uncertainty sets to show that once Atlantic Electric begins planning crew allocations (based upon predicted outages or actual damage reported from the field), restoration time can be significantly reduced without complete knowledge of the damage profile of any given storm. Reducing restoration time will allow Atlantic Electric to release external contract crews earlier and use less manpower overall, resulting in significantly lower storm response cost.
Additionally, reducing restoration time is critical for Atlantic Electric's public image and regulatory filings, to show that Atlantic Electric is sufficiently planning for, and responding to, major weather storms.

Chapter 2
Predicting Outages

2.1 Data

To construct the model, we built a database combining Atlantic Electric asset information, outage information from past storm events, and historical weather information relevant to those storms for the state of Massachusetts. Because we want to predict outages during future weather events to improve storm response planning, it is critical to understand the vulnerabilities of assets on the Atlantic Electric network using their asset features together with environmental and relevant weather information.

2.1.1 Electrical Asset Information

To predict outages on the network, we first need an understanding of the electrical asset information. This analysis focuses on the distribution network of Atlantic Electric, that is, everything downstream of a substation. Substations and everything upstream of them in the power generation and distribution system were not used in this analysis (Figure 2-1 illustrates this scope).

Figure 2-1: High-level representation of the distribution network (power station, substations, circuits, segments, low-voltage power lines, and houses). This analysis focused solely on everything downstream of substations. Everything upstream (substations, high-voltage transmission lines, and power generation) was not considered.

The electrical distribution network is composed of segments that form a distribution tree downstream from the substation. These segments are any electrical equipment (poles, underground lines, etc.) that connect devices (fuses, breakers, transformers, etc.).
In total, Atlantic Electric has approximately 60,000 devices and 280,000 segments (some, but not all, segments have devices located on them) on 300 circuits across Massachusetts, serving approximately 1.2 million customers. To fully understand the asset information, we obtained the physical attributes of the 280,000 segments across Massachusetts. These segments are described by 35 physical properties (insulation, density, age, framing, length, wiring, etc.). Using these segments, we define a new term, the asset: a device grouped with all the segments downstream of that device up to the next device.

2.1.2 Land Cover

Environmental information such as land cover and altitude was added to the segment description to help capture key interactions that lead to outages. For example, a bare framed pole with thinner insulation is more likely to go out in a forested area, due to falling trees, than on flat open ground. According to data from MassGIS, Massachusetts has 33 different land cover types (forested, low density residential, commercial, etc.). Figure 2-2 illustrates the distribution of Atlantic Electric segments and devices across those land cover types.

Figure 2-2: This histogram depicts the variety of land cover types where segments lie. The green bars represent the total number of segments in each category, while the blue bars represent the number of segments in each category that are part of an asset damaged in one of the six storms considered in this work.

2.1.3 Historical Outages

To train the model, we used the outage history across several storms as the response to the weather and asset interactions. Table 2.1 contains summary statistics of the historical outages Atlantic Electric experienced. Note that in this table the term "outage" refers to the failure of a device, not a segment.
Because damage to one or more segments can lead to an outage, an outage by itself is not a good indicator of damage at or near the device that went out.

Table 2.1: Historical Outages

Storm Name                   First Outage   Days   Outages   Customers Out
Winter Storm December 2008   2008-12-12       10     1,784         185,931
Wind Storm February 2010     2010-02-24        6       615         151,350
Winter Storm December 2010   2010-12-26        4       444         106,346
Tropical Storm Irene 2011    2011-10-29       11     2,746         291,672
Hurricane Sandy 2012         2012-10-29        7     1,466         180,416

Even the worst storm caused only approximately 4.5% of the total devices to go out (2,746 of roughly 60,000). This presents a particularly difficult prediction problem, as we have so few positive samples to work with.

2.1.4 Historical Weather Information

Two sources of historical weather information were available. The first was a set of weather logs provided by a third-party vendor, which includes time series data at several hundred locations across MA and contains information on weather features such as temperature, wind (speed, gust, and direction), precipitation rates, etc. Each station contained one data point per hour throughout each storm. The second source was a collection of weather forecasts aggregated from several freely available weather providers that regularly report to Atlantic Electric. While this data was free and Atlantic Electric already had the mechanisms in place to obtain it, it was considerably less granular than the weather logs. This data set has fewer locations, fewer features (e.g., it does not contain precipitation rates or wind gust information), and only one data point per day during each storm. This reduction in granularity significantly limits the quality of our outage predictions.
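To make the class imbalance concrete, the short Python sketch below recomputes the share of devices that went out in each storm from the Table 2.1 counts. It assumes the approximately 60,000 devices quoted in Section 2.1.1 as the denominator, which is an approximation since the exact device count is not reported; the function names are illustrative only.

```python
# Device outages per storm, taken from Table 2.1.
OUTAGES = {
    "Winter Storm December 2008": 1784,
    "Wind Storm February 2010": 615,
    "Winter Storm December 2010": 444,
    "Tropical Storm Irene 2011": 2746,
    "Hurricane Sandy 2012": 1466,
}

# Approximate number of devices on the network (Section 2.1.1); an assumption,
# since the exact count is not reported.
TOTAL_DEVICES = 60_000


def outage_fraction(device_outages: int, total_devices: int = TOTAL_DEVICES) -> float:
    """Share of all devices that went out during a single storm."""
    return device_outages / total_devices


def negatives_per_positive(device_outages: int, total_devices: int = TOTAL_DEVICES) -> float:
    """Devices that stayed in service for every device that went out."""
    return (total_devices - device_outages) / device_outages


for storm, count in OUTAGES.items():
    print(f"{storm:<28} {outage_fraction(count):6.2%}  "
          f"{negatives_per_positive(count):6.1f} negatives per positive")

worst = max(OUTAGES, key=OUTAGES.get)
print(f"Worst storm: {worst} ({outage_fraction(OUTAGES[worst]):.2%} of devices)")
```

For the worst storm in the table this gives just under 4.6% of devices, or roughly 21 devices that stayed in service for every one that went out, which is the imbalance the text refers to.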
Figure 2-3 illustrates the locations where weather information is available.

Figure 2-3: Locations of available weather stations across MA. Blue stations represent locations of weather provided by a private vendor. Orange locations are publicly available weather stations that readily provide forecasts.

2.2 Segment Clustering
The first step in organizing the data is to group the segments into types. We do this using the k-means clustering algorithm (also referred to as Lloyd's algorithm [21]). This algorithm partitions the segments into k clusters, associating each segment with the center of its cluster. All segments in a given cluster are then considered equivalent and share the same properties. The number of centers (k) was initially set to 20 and eventually reduced to 10 to simplify the model and increase generality: a higher number of centers represents the segments more faithfully, but it also adds more parameters to optimize in the final model.

For the clustering algorithm to work, some of the raw features were normalized by segment length: the total number of poles (respectively, customers) was converted to pole density (respectively, customer density). Without this preprocessing step the clustering gives very poor results, because segment lengths vary widely (from 1 meter to 86 miles).

Each segment type is then further categorized by land cover: forested areas, highly residential areas, open rural areas, etc. (the 33 land cover categories were reduced to 10 by grouping similar ones together and removing land cover types that covered negligible areas). This brings the total number of segment types to 100, one for every combination of the 10 k-means clusters and the 10 land cover types. We can now represent every Atlantic Electric asset as

$$a_i = d_i : \{l_1, l_2, \ldots, l_{100}\}_i$$

where $a_i$ represents asset i, $d_i$ is the device on that asset, and the set $\{l_1, \ldots, l_{100}\}_i$ gives the total length of each segment type directly downstream of $d_i$ (downstream meaning all segments between $d_i$ and the immediate downstream devices of $d_i$ in the distribution network).

2.3 Weather Profiles
How the weather information is used depends on whether the source is the weather logs or the forecasts described in Section 2.1.4. While the weather logs are clearly better suited to outage prediction, the logistics of having the vendor provide tailored forecasts have not yet been completed. The public forecasts are therefore necessary for prediction until tailored vendor forecasts are available in advance of major storm events. We present approaches for both data sources.

2.3.1 Weather Logs
To make the weather information usable, each weather feature was binned into dummy variables, where each bin contains the number of logs that fell into that particular range. Each feature (wind speed, temperature, etc.) was given 10 bins evenly spaced between the lowest and highest values of that feature. Increasing the number of bins could potentially increase the accuracy of the model, but at the cost of degrees of freedom and generality. For each device, all weather stations within a 5 mile radius were used to create that device's weather profile. Not all of Atlantic Electric's assets fall within a 5 mile radius of a weather station, so those assets had to be excluded from the model; however, they accounted for only approximately 5% of Atlantic Electric's assets.
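The feature construction in Sections 2.2 and 2.3.1 can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the production tool: Lloyd's algorithm is written out directly, feature columns are z-scored before clustering (the text does not say whether this was done), land cover is assumed to be pre-grouped into 10 integer codes, station distances use the haversine great-circle formula, and all function and argument names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (empty clusters stay put).
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def segment_types(counts, other, length, land_cover, k=10, n_land=10):
    """Return a segment type in [0, k * n_land) for every segment.

    counts: (n, c) raw counts such as poles and customers, converted to densities.
    other: (n, f) features that do not scale with length (hypothetical inputs).
    land_cover: (n,) grouped land cover code in [0, n_land).
    """
    X = np.column_stack([counts / length[:, None], other])
    # Assumption: z-score the columns so no single feature dominates the distance.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    labels, _ = kmeans(X, k)
    return labels * n_land + land_cover


def asset_length_vectors(asset_idx, seg_type, length, n_assets, n_types=100):
    """Row i is {l_1, ..., l_100}_i: total length of each type downstream of d_i."""
    L = np.zeros((n_assets, n_types))
    np.add.at(L, (asset_idx, seg_type), length)
    return L


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; arguments broadcast like NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


def weather_profile(device_latlon, station_latlon, station_of_log, log_values,
                    lo, hi, n_bins=10, radius=5.0):
    """Counts of one feature's logs per bin, from stations within `radius` miles.

    Returns None when no station is in range (such devices were excluded)."""
    d = haversine_miles(device_latlon[0], device_latlon[1],
                        station_latlon[:, 0], station_latlon[:, 1])
    near = np.flatnonzero(d <= radius)
    if near.size == 0:
        return None
    values = log_values[np.isin(station_of_log, near)]
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, hi))
    return counts
```

One profile per weather feature would then be concatenated with the asset's length vector to form its model input.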
2.3.2 Weather Forecasts
Because the forecasts lack the granularity to bin the information as we did with the weather logs, we instead used the worst-case value of each forecast feature across every station within a 20 mile radius (the radius had to be extended because forecasts are available at fewer locations than logs). This results in a significant decrease in predictive capability: the model captures much less variance than the model using weather logs. The loss in prediction quality is not surprising given the amount of information lost. However, examining the data shows that it would be difficult for any model to yield quality predictions using only the forecasts. We looked at the expected value of each weather feature at each asset, conditional on whether that asset had an outage, and computed the ratio

$$\frac{E[w_{f,a} \mid O_a = 1]}{E[w_{f,a} \mid O_a = 0]} \quad \forall f \tag{2.1}$$

where $w_{f,a}$ is the value of weather feature f at asset a and $O_a$ indicates whether asset a had an outage. Because this ratio was approximately 1 for most of the weather features, there is little to no signal from which to generate quality predictions.

2.4 Model Formulation
2.4.1 Maximum Likelihood Estimator
Much of the work in this section was done in collaboration with Matthieu Monsch (PhD, Massachusetts Institute of Technology Operations Research Center) and can be found in his dissertation [25]. The complete model formulation from [25] is given in Appendix B; we summarize it here. We begin with some assumptions about damage caused by weather events.

Assumption 1. Damaging events occur independently on Atlantic Electric's network as a Poisson process.

It is safe to assume that a damaging event is unlikely to influence damaging events at other locations (e.g. if a fuse fails in Worcester, that failure is unlikely to affect the failure rates of other devices across MA).
It then follows that damaging events occur as a Poisson process whose rate depends on weather and asset features.

Assumption 2. Damaging rates are uniform across each segment.

This assumption is reasonable because, once the segments have been clustered into types, any one segment is subject to the same weather vulnerabilities as other segments of its type. Combining Assumptions 1 and 2, damaging events on segment s at time t occur with rate

$$\lambda_{s,t} = l_s \, e^{g_s \cdot w_t}$$

where $l_s$ is the length of segment s, $g_s$ is the vector of vulnerabilities for that segment, and $w_t$ is the vector of weather features surrounding that segment. We make no assumption about particular weather features and allow segments to be more vulnerable to damage under certain weather conditions than others [25]. Because of data granularity we cannot compute damage at the segment level, but we can aggregate it at the asset level:

$$\lambda_{a,t} = \sum_{s \in a} l_s \, e^{g_s \cdot w_t} = \sum_{c} l_{a,c} \, e^{g_c \cdot w_t}$$

where c ranges over segment types and $l_{a,c}$ is the total length of type-c segments downstream of asset a; more generally, $\lambda_{a,t}$ is a function of a parameter vector $\gamma$ and the features $X_{a,t}$ of asset a at time t. The maximum likelihood estimator $\gamma^*$ can be found efficiently and, due to the non-linear nature of this estimator, it provides better predictions than the linear estimators found in some of the existing literature (e.g. logistic regression) [25].

2.4.2 Classification Trees
While a maximum likelihood estimator of this form outperforms typical linear models, classification trees also present a viable option, because they likewise avoid imposing a single linear relationship on the data. Classification trees only require the response to be well described within regions of the predictor variable space, not across the entire space. Recursive partitioning can effectively identify regions of the predictor space that are prone to outages without being influenced by regions that rarely lead to outages (e.g. underground segments almost never experience weather-caused outages) [26].
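To make the estimator of Section 2.4.1 concrete, the following sketch fits one vulnerability vector $g_c$ per segment type by maximizing a likelihood built on $\lambda_{a} = \sum_c l_{a,c} e^{g_c \cdot w_a}$. It assumes, beyond what is stated above, that each asset's observation is binary, with an outage recorded whenever at least one damaging event occurs, so that $P(O_a = 1) = 1 - e^{-\lambda_a}$; the full formulation in [25] and Appendix B may differ. SciPy's general-purpose L-BFGS-B optimizer stands in for whatever solver was actually used, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def asset_rates(G, L, W):
    """lambda_a = sum_c L[a, c] * exp(G[c] . W[a]) for every asset a.

    G: (C, F) vulnerability vector per segment type; L: (A, C) downstream
    length of each type per asset; W: (A, F) weather features at each asset."""
    z = np.clip(W @ G.T, -50.0, 50.0)  # guard exp() against overflow
    return (L * np.exp(z)).sum(axis=1)


def neg_log_likelihood(theta, L, W, outage):
    G = theta.reshape(L.shape[1], W.shape[1])
    lam = np.maximum(asset_rates(G, L, W), 1e-12)
    # Assumption: an outage is observed when at least one damaging event occurs,
    # so P(O_a = 1) = 1 - exp(-lambda_a) and log P(O_a = 0) = -lambda_a.
    log_p1 = np.log(-np.expm1(-lam))
    return -(outage * log_p1 - (1 - outage) * lam).sum()


def fit_vulnerabilities(L, W, outage):
    """Maximum likelihood estimate of G, starting from zero vulnerability."""
    theta0 = np.zeros(L.shape[1] * W.shape[1])
    res = minimize(neg_log_likelihood, theta0, args=(L, W, outage), method="L-BFGS-B")
    return res.x.reshape(L.shape[1], W.shape[1]), res
```

The model is non-linear in $g_c$ because of the exponential inside the sum, which is why a numerical optimizer is used rather than a closed-form fit.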
We used the scikit-learn¹ package in Python (its implementation is discussed in detail later), which uses random tree generation to lower generalization error and produce more diverse classification trees [18].

2.5 Results
To test the model we performed both in-sample and out-of-sample testing. In-sample testing used all storms as training data; out-of-sample testing held back the storm being tested and trained on the other five storms. This manner of testing protects against overfitting and also simulates the environment in which the model would be used at Atlantic Electric, where incoming storm information would never be part of the training set. The in-sample results, however, demonstrate the model's ability to learn new outage responses as data becomes available.

¹ http://scikit-learn.org/

Table 2.2 reports results as sensitivity and specificity, where

$$\text{sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}, \qquad \text{specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}$$

Table 2.2: Columns marked "IS" give in-sample results and columns marked "OS" give out-of-sample results
Storm Name | Sensitivity (OS) | Specificity (OS) | Sensitivity (IS) | Specificity (IS)
Winter Storm Dec '08 | .702 | .989 | .733 | .999
Wind Storm Feb '10 | .967 | .991 | .755 | .999
Winter Storm Dec '10 | .713 | .991 | .819 | .999
Tropical Storm Irene | .978 | .976 | .767 | .999
Hurricane Sandy | .669 | .998 | .724 | .999

Visualizing the results also helps users internalize and interpret them. Figure 2-4 is one such example, showing how well we predict compared to an actual storm. For demonstration purposes, we aggregated outages around Atlantic Electric's crew platform locations. This visual was produced for this document using Google Maps²; later we show screen captures of the production system built to let Atlantic Electric users interface with the model.
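The testing protocol above (hold out one storm, train on the rest, and report sensitivity and specificity) can be sketched with scikit-learn as follows. The tree depth and random seed are illustrative assumptions rather than the settings behind Table 2.2, and `X`, `y`, and `storm` are hypothetical arrays holding asset-level features, outage labels, and the storm each row came from.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier


def sensitivity_specificity(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def evaluate_by_storm(X, y, storm, max_depth=8, seed=0):
    """Return {storm: (sens_OS, spec_OS, sens_IS, spec_IS)}, mirroring Table 2.2."""
    def make_tree():
        return DecisionTreeClassifier(max_depth=max_depth, random_state=seed)

    in_sample = make_tree().fit(X, y)                    # trained on every storm
    results = {}
    for s in np.unique(storm):
        test = storm == s
        held_out = make_tree().fit(X[~test], y[~test])   # storm s never seen
        results[s] = (*sensitivity_specificity(y[test], held_out.predict(X[test])),
                      *sensitivity_specificity(y[test], in_sample.predict(X[test])))
    return results
```

The in-sample tree is trained once on all storms and reused for every row of the table, while the held-out tree is refit per storm; only the latter reflects how the model would perform on an incoming storm.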
A complete summary of the model's prediction results can be found in Appendix A. For simplicity and space, all results there are given in table format rather than with the visualization used in Figure 2-4.

² http://map.google.com

Figure 2-4: Out-of-sample results for Winter Storm 2010. Outages are grouped to the nearest Atlantic Electric staging platform by location. Circle diameter varies linearly with the number of outages associated with that platform. Red circles show the number of outages that actually occurred in the storm; blue circles show out-of-sample predictions from the model.

2.6 Implementation of Prediction Model
The model described above was built into a Python³ tool that uses several packages (SciPy⁴, NumPy⁵, scikit-learn⁶, pandas⁷), all interfacing with a MySQL database. The entire project is deployed with Flask⁸ as a web application that can be accessed with a browser. These Python packages provide the numerical tools needed for the analysis described in this chapter. Because of the size of the computation, we hosted the project on Amazon Web Services⁹, which has ample computing capacity for the demands of the model. This lets Atlantic Electric pay only for the computing power it needs and lets multiple users access the tool from multiple locations.

³ http://python.org
⁴ http://scipy.org/
⁵ http://numpy.org
⁶ http://scikit-learn.org/
⁷ http://pandas.org/
⁸ http://flask.pocoo.org/
⁹ http://aws.amazon.com/

Figure 2-5: This user interface allows users to select training storms, determine which platforms to open, input data, and run the model.

Figure 2-6: After the model has predicted outages given the user input, the web application visualizes the data by grouping outages to their closest opened platform, allowing users to visually internalize the results of the model. Outages are only grouped to platforms that have been enabled, or opened, by the user.

2.7 Conclusion
The methods presented in this chapter outline a machine learning approach that allows Atlantic Electric to quantify the expected damage from an impending weather event without human bias. Even with the limited data available, we were able to predict outages with sufficient accuracy, and that accuracy will continue to improve as Atlantic Electric experiences more storm events. Because the model clearly articulates the expected outages from weather, Atlantic Electric is well positioned to contract the necessary resources from outside firms, deploy storm response resources effectively both before and during the storm, and justify its actions to the public and government regulators. The next chapter details methods that help Atlantic Electric make deployment decisions based on these outage predictions.

Chapter 3
Repair Crew Optimization

3.1 Introduction to Crew Allocation at Atlantic Electric
During storm events Atlantic Electric stations its repair crews at sites called platforms. These platforms, or staging areas, are the home base for the crews throughout the storm (or until crews are re-allocated). They are generally Atlantic Electric facilities but can also include municipal locations, such as schools or parks, where Atlantic Electric can temporarily stage crews during a storm. Once stationed, crews repair damage to the Atlantic Electric system when outages occur in locations associated with their platform.
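Figure 2-6 groups predicted outages to their closest opened platform, and the same association decides which platform's crews repair a given outage. A minimal sketch of that grouping is below. It assumes outage and platform locations are given as latitude/longitude pairs and that "closest" means great-circle (haversine) distance, which ignores the road network; the names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; arguments broadcast like NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


def group_to_open_platforms(outage_latlon, platform_latlon, is_open):
    """Map each outage to its nearest opened platform and count outages per platform."""
    open_idx = np.flatnonzero(is_open)
    if open_idx.size == 0:
        raise ValueError("at least one platform must be opened")
    plat = platform_latlon[open_idx]
    # Pairwise distances, shape (n_outages, n_open_platforms).
    d = haversine_miles(outage_latlon[:, None, 0], outage_latlon[:, None, 1],
                        plat[None, :, 0], plat[None, :, 1])
    nearest = open_idx[d.argmin(axis=1)]            # index into all platforms
    counts = np.bincount(nearest, minlength=len(platform_latlon))
    return nearest, counts
```

Closed platforms always receive a count of zero, matching the behavior described for Figure 2-6.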
Storm response is not only a large driver of public opinion of Atlantic Electric; it also has a significant impact on Atlantic Electric's costs. Large storms cause damage that can require millions of dollars in repairs. Atlantic Electric can petition the DPU for reimbursement of these repairs. However, if the DPU determines that Atlantic Electric did not sufficiently plan for and respond to the storm, it can refuse reimbursement or even fine Atlantic Electric.

Currently Atlantic Electric uses a largely manual process to station its repair crews in a storm response scenario. Managers use their experience and intuition to estimate the number of crews needed to repair the system within an acceptable amount of time. These managers have tremendous experience and insight; however, human decisions are inherently sub-optimal, and it is often difficult to transfer that knowledge and expertise to employees who do not have the same years of experience. Since storms are infrequent, it generally takes employees years to see enough storms to manage the storm response process effectively.

The previous model produces predicted outages at the platform level from weather forecast input. The next step is to use those predictions in Atlantic Electric's storm response planning. Knowing where damage is anticipated allows Atlantic Electric to station crews so that they are best positioned to repair damage as quickly as possible. This will ultimately restore service to customers sooner, reduce costs for Atlantic Electric, and help Atlantic Electric justify its actions to the DPU in its regulatory filings.
3.2 Solving the Master Problem
To address storm response planning we created a formulation that determines where to station crews and which jobs each crew is responsible for completing. While this is not a scheduler, it gives Atlantic Electric a better understanding of the anticipated work of each repair crew given the expected outages from the outage prediction model. The complete formulation and explanation of variables is given below (all formulation notation can also be found in Appendix C).

Table 3.1: Master Formulation Variable Notation
Type | Notation | Description
Decision | $x_{ijk}$ | Crew i assigned to job j at platform k
Decision | $x_{ik}$ | Crew i assigned to platform k
Decision | $Z$ | Objective value equal to the worst-case repair time
Data | $\gamma_{jk}$ | Time required to do job j from platform k
Data | $M_k$ | Crew capacity for platform k

Note that we are currently treating $\gamma_{jk}$ as taking a known, nominal value. In reality repair times are unknown, since an outage can be caused by any combination of problems with varying repair times. These unknowns have serious effects on both the solution and the optimization formulation; the uncertainty of $\gamma_{jk}$ and its effects on the optimization problem are discussed later in this chapter. The model can be represented mathematically as follows:

Objective: minimize $Z$ subject to
$$\sum_{j}\sum_{k} \gamma_{jk} x_{ijk} \le Z \quad \forall i \tag{3.1}$$
$$\sum_{i} x_{ik} \le M_k \quad \forall k \tag{3.2}$$
$$\sum_{i}\sum_{k} x_{ijk} \ge 1 \quad \forall j \tag{3.3}$$
$$\sum_{k} x_{ik} \le 1 \quad \forall i \tag{3.4}$$
$$x_{ijk} \le x_{ik} \quad \forall i, j, k \tag{3.5}$$
$$x_{ijk}, x_{ik} \in \{0, 1\} \quad \forall i, j, k \tag{3.6}$$

All notation for this formulation can be found in Table 3.1 (and Appendix C). Constraint 3.2 enforces platform capacities, and 3.3 ensures that all jobs are completed. Constraints 3.4 and 3.5 ensure that each crew is assigned to only one platform and can only repair jobs assigned to that platform. Constraint 3.1 stipulates that no repair crew's total repair time can exceed the worst-case repair time given by the objective value Z.
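For a toy instance, the master formulation (3.1)-(3.6) can be written directly as a mixed integer program. The sketch below uses `scipy.optimize.milp` (available in SciPy 1.9 and later) as an illustrative stand-in for whatever solver was used in this work; the dense constraint matrix and the hypothetical helper names are only practical for tiny examples.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def solve_master(gamma, M, n_crews, time_limit=60.0):
    """Master formulation (3.1)-(3.6). gamma: (J, K) times; M: (K,) crew capacities."""
    J, K = gamma.shape
    I = n_crews
    n_x = I * J * K
    n = n_x + I * K + 1                     # variables: x_ijk, then x_ik, then Z
    z = n - 1

    def xv(i, j, k):                        # column of x_ijk
        return (i * J + j) * K + k

    def yv(i, k):                           # column of x_ik
        return n_x + i * K + k

    rows, lo, hi = [], [], []

    def add(terms, lb, ub):
        row = np.zeros(n)
        for col, coef in terms:
            row[col] += coef
        rows.append(row)
        lo.append(lb)
        hi.append(ub)

    for i in range(I):   # (3.1) each crew's total repair time <= Z
        add([(xv(i, j, k), gamma[j, k]) for j in range(J) for k in range(K)]
            + [(z, -1.0)], -np.inf, 0.0)
    for k in range(K):   # (3.2) platform capacity
        add([(yv(i, k), 1.0) for i in range(I)], -np.inf, M[k])
    for j in range(J):   # (3.3) every job done by some crew at some platform
        add([(xv(i, j, k), 1.0) for i in range(I) for k in range(K)], 1.0, np.inf)
    for i in range(I):   # (3.4) each crew at one platform
        add([(yv(i, k), 1.0) for k in range(K)], -np.inf, 1.0)
    for i in range(I):   # (3.5) crews only take jobs at their own platform
        for j in range(J):
            for k in range(K):
                add([(xv(i, j, k), 1.0), (yv(i, k), -1.0)], -np.inf, 0.0)

    c = np.zeros(n)
    c[z] = 1.0                              # minimize Z
    integrality = np.ones(n, dtype=int)
    integrality[z] = 0                      # (3.6) binaries; Z is continuous
    ub = np.ones(n)
    ub[z] = np.inf
    return milp(c, integrality=integrality, bounds=Bounds(np.zeros(n), ub),
                constraints=LinearConstraint(np.array(rows), lo, hi),
                options={"time_limit": time_limit})
```

The same indexing shows why the full problem is hard: with 300 crews, 600 jobs, and 6 platforms there are 300 × 600 × 6 = 1,080,000 variables $x_{ijk}$, and constraint (3.5) contributes the same number of rows, so the model has roughly 1.08 million variables and 1.08 million constraints, consistent with the approximately 2 million decision variables and constraints discussed next.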
This formulation does obtain optimal solutions, but with very long solve times. Given the structure of the variables and constraints, serious storm events significantly increase the number of decision variables and constraints. Consider a case of 300 crews, 600 outages, and 6 platforms at which to station those crews: the resulting formulation has approximately 2 million decision variables and constraints. Solve time for this case was on the order of days (relaxing $x_{ijk}$ to a continuous variable on [0, 1] still produced integral solutions, but the solve time remained too long). The solver may have been exploring an excessive number of solutions because of their potential symmetry (interchangeable crews yield equivalent solutions). Even reaching an optimality gap of 3% took hours. Given the importance of timing in Atlantic Electric's storm response planning, an alternate simplified formulation was explored.

3.3 Relaxing the Problem
Deeper analysis of the previous formulation showed that, in optimal solutions, crews were essentially splitting the work evenly at each platform. To achieve the best statewide completion time, the optimization drives solutions such that all crews complete their work at approximately the same time. By assuming that crews evenly split the work at each platform, we can drastically simplify the problem: we no longer need to assign crews to jobs and platforms. Our decisions reduce to assigning jobs to platforms and choosing the number of crews at each platform.
Table 3.2: Relaxed Formulation Variable Notation
Type | Notation | Description
Decision | $x_{jk}$ | Job j assigned to platform k
Decision | $C_k$ | Platform k fractional workload
Decision | $C^*_k$ | Number of crews assigned to platform k
Decision | $C$ | Objective value indicating worst platform fractional workload
Data | $\gamma_{jk}$ | Time required to do job j from platform k
Data | $M_k$ | Crew capacity for platform k
Data | $C^*$ | Total number of crews available

The new mathematical model can be represented as follows:

Objective: minimize $C$ subject to
$$\sum_k C_k \le C \tag{3.1}$$
$$\sum_j \gamma_{jk} x_{jk} \le C_k \quad \forall k \tag{3.2}$$
$$\sum_k x_{jk} \ge 1 \quad \forall j \tag{3.3}$$

We interpret the number of crews at each platform as
$$C^*_k = \frac{C_k}{C} C^* \quad \forall k \tag{3.4}$$
$$C^*_k \le M_k \quad \forall k \tag{3.5}$$

The constraints are analogous to those in the master formulation. The major difference is that we now ensure a statewide completion time through the combination of constraints (3.1) and (3.2): because the crews at platform k split its workload $C_k$ evenly, each finishes after $C_k / C^*_k = C / C^*$, so minimizing C minimizes the statewide completion time. The constraints ensuring that all jobs are completed and that platform capacities are respected are similarly modified from the original formulation.

Figure 3-1: This demonstrates the method used by the optimization to assign each job to a platform. The amount of time to complete a job is represented by the dashed red line, which is unique to each platform.

This new formulation significantly decreases problem complexity. In the master formulation we examined a scenario of 300 crews, 600 jobs, and 6 platforms, which contained approximately 2 million decision variables and constraints. The relaxed formulation reduces that problem to approximately 7 thousand decision variables and constraints, and solve time is now on the order of seconds.
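The relaxed formulation can be sketched the same way. Substituting (3.4) into (3.5) gives $C^* C_k \le M_k C$, which is linear, so the whole model stays a MIP; this linearization is our own step and assumes C > 0. The crew counts $C^*_k$ come out fractional here, and rounding them to whole crews is left outside the sketch. SciPy's `milp` and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def solve_relaxed(gamma, M, total_crews):
    """Relaxed formulation (3.1)-(3.5). gamma: (J, K) times; M: (K,) capacities."""
    J, K = gamma.shape
    n = J * K + K + 1                 # variables: x_jk, then C_k, then C
    ck0, c_col = J * K, J * K + K
    rows, lo, hi = [], [], []

    def add(row, lb, ub):
        rows.append(row)
        lo.append(lb)
        hi.append(ub)

    row = np.zeros(n)                 # (3.1) sum_k C_k - C <= 0
    row[ck0:ck0 + K] = 1.0
    row[c_col] = -1.0
    add(row, -np.inf, 0.0)
    for k in range(K):                # (3.2) sum_j gamma_jk x_jk - C_k <= 0
        row = np.zeros(n)
        row[np.arange(J) * K + k] = gamma[:, k]
        row[ck0 + k] = -1.0
        add(row, -np.inf, 0.0)
    for j in range(J):                # (3.3) each job assigned to a platform
        row = np.zeros(n)
        row[j * K:(j + 1) * K] = 1.0
        add(row, 1.0, np.inf)
    for k in range(K):                # (3.4) + (3.5) linearized: C* C_k - M_k C <= 0
        row = np.zeros(n)
        row[ck0 + k] = total_crews
        row[c_col] = -M[k]
        add(row, -np.inf, 0.0)

    c = np.zeros(n)
    c[c_col] = 1.0                    # minimize C
    integrality = np.zeros(n, dtype=int)
    integrality[:J * K] = 1           # x_jk integer, binary via bounds below
    ub = np.full(n, np.inf)
    ub[:J * K] = 1.0
    res = milp(c, integrality=integrality, bounds=Bounds(np.zeros(n), ub),
               constraints=LinearConstraint(np.array(rows), lo, hi))
    if res.status != 0:
        raise RuntimeError(res.message)
    C_k, C = res.x[ck0:ck0 + K], res.x[c_col]
    crews = total_crews * C_k / C     # (3.4): fractional crews per platform
    return C, C_k, crews, np.round(res.x[:J * K]).reshape(J, K)
```

For example, with three jobs taking 1, 1, and 2 units from platform 0 and 5 units each from platform 1, and 4 crews in total, ample capacity sends every job to platform 0 (C = 4). Capping platform 0 at 2 crews forces the 2-unit job to platform 1 and raises C to 7, even though that job takes longer there.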
3.4 Comparison of Master and Relaxed Formulations
While the improvement in computation time allows Atlantic Electric to use the relaxed model within the time constraints imposed by storm operations, it is important to verify that the new formulation produces adequately optimal results given the assumptions made. To validate the relaxed model, we first check whether the master formulation actually allocates work evenly among crews. Next we compare crew workloads in the master and relaxed formulations at the platform level¹.

¹ Here we are still using nominal $\gamma_{jk}$ values as input data to our models.

3.4.1 Testing the Evenly Distributed Workload Assumption
To test the validity of our assumption, we ran the master formulation on several notional data set scenarios. The data sets were created by randomly sampling outages from previous storms and opening only two platforms for stationing. These are representative of actual storm scenarios that Atlantic Electric faces but are on a smaller scale to keep solve times manageable. The histograms in Figure 3-2 show the crew workloads at each platform for all three scenarios.

Figure 3-2: Crew workloads for three randomly sampled sets (histograms of crew workload by platform, Platform 1 and Platform 2).

Ultimately the crew workloads differ by only minutes (compared to hours of total work). The difference arises because we do not allow crews that finish early to "help" another crew that is still finishing a job.
Overall these differences suggest that our assumption of an equal distribution of workload is reasonable.

3.4.2 Comparison of Crew Assignments Between Master and Relaxed Formulations
We now examine the same three scenarios and compare the optimization results from the original and relaxed formulations. Ultimately we want to ensure that both formulations produce similar numbers of crews at the platform level. The crew allocations for the three test scenarios are given in Figure 3-3.

Figure 3-3: Comparison of Results in Master and Relaxed Formulations (crews assigned to Platform 1 and Platform 2 in Simulations 1, 2, and 3).

Our relaxed formulation never deviates from the master solution by more than a few crews. Given the granularity of the data and our current accuracy in outage prediction, we believe this is sufficiently close to the optimal solution. Individual crew workloads may occasionally differ by more than 10%, but the answer to the operational question we are trying to answer, how many crews to station at each platform, is quite close to its master counterpart.

3.5 Optimization Under Uncertainty
As stated earlier, we have been using known, nominal values of $\gamma_{jk}$ in our optimization formulations. In practice this will never be the case, because:
• Our weather forecast information, W, always contains inaccuracy and is an input we are constrained by.
• Predictions currently only assign P(Outage | W) (or a classification of an outage).
• Any given outage can be caused by a combination of issues (e.g. a broken pole and three downed trees, or icing on a transformer causing open breakers and downed wires), which leads to variable repair times.

These uncertainties cannot be escaped, so we must build them into the optimization to ensure that our solutions remain valid despite variation in the potential workload values.
Recall from our relaxed formulation that the only constraint using our unknown workload values ($\gamma_{jk}$) was

$$\sum_j \gamma_{jk} x_{jk} \le C_k \quad \forall k$$

We can amend this constraint to hold for all possible values of $\gamma_{jk}$, ensuring that any solution given by our optimization is valid:

$$\sum_j \gamma_{jk} x_{jk} \le C_k \quad \forall k, \ \forall \gamma_{jk} \in U$$

Here U is the set of all values that $\gamma$ can potentially take on. With this new constraint all solutions from the model are guaranteed to be valid. However, in this form we no longer have a mixed integer program (MIP), since the constraint must hold for infinitely many values of $\gamma$. Understanding the nature of U is important for re-modeling the formulation back into a MIP [4].

3.5.1 Robust Optimization Using Box Constraints
The simplest method is to assume that all $\gamma_{jk}$ values reside in a "box" (a one-dimensional range $\gamma_{jk} \in [l_{jk}, u_{jk}]$). Our uncertainty set then takes the form

$$U_k = \{\gamma \mid \forall j,\ l_{jk} \le \gamma_{jk} \le u_{jk}\} \quad \forall k$$

Recall the original constraint containing uncertainty:

$$\sum_j \gamma_{jk} x_{jk} \le C_k \quad \forall k, \ \gamma_{jk} \in U_k$$

Because $x_{jk} \ge 0$, the left-hand side is largest when every $\gamma_{jk}$ takes its upper bound, so this constraint can be rewritten as

$$\sum_j u_{jk} x_{jk} \le C_k \quad \forall k$$

While this is easy to implement, note that we now assume all jobs take on their worst-case values. Since we assumed that outages occur independently, we can assume that their repair times are also independent random variables. It is highly unlikely that all restoration times take on their upper bound simultaneously, so this approach yields a significantly worse, overly conservative objective value [4].

3.5.2 Robust Optimization Using Bertsimas-Sim Uncertainty Sets
Instead of assuming that all restoration times take a worst-case value, we can consider an uncertainty set U in which only a fraction of the values assume their worst case and the remainder are held at their nominal values.
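Consider the following representation of the uncertainty set:

$$U_k = \left\{\gamma \;\middle|\; \forall j,\ \gamma_{jk} \in [\bar{\gamma}_{jk} - \hat{\gamma}_{jk},\ \bar{\gamma}_{jk} + \hat{\gamma}_{jk}],\ \sum_j \frac{|\gamma_{jk} - \bar{\gamma}_{jk}|}{\hat{\gamma}_{jk}} \le \Gamma \right\} \quad \forall k$$

In this specification of $U_k$, $\bar{\gamma}_{jk}$ represents the nominal value of $\gamma_{jk}$ and $\hat{\gamma}_{jk}$ represents the half width of the interval in which $\gamma_{jk}$ can reside. $\Gamma$ then bounds the total scaled deviation from the nominal values in the uncertainty set. Ultimately $\Gamma$ is a parameter that specifies the number of values that assume their extreme values ($\bar{\gamma}_{jk} - \hat{\gamma}_{jk}$ or $\bar{\gamma}_{jk} + \hat{\gamma}_{jk}$) [4]. Selection of the $\Gamma$ parameter is discussed later in this chapter.

To make the constraint robust we write

$$\gamma_{jk} = \bar{\gamma}_{jk} + \hat{\gamma}_{jk} u_{jk}$$

where $\bar{\gamma}_{jk}$ is the nominal workload value, $\hat{\gamma}_{jk}$ is the potential deviation from the nominal value, and $u_{jk}$ indicates the direction and size of that deviation (positive or negative). The total deviation from nominal must be bounded, so all $u_{jk}$ must reside in the uncertainty set

$$U_{k,u} = \left\{u \;\middle|\; \forall j,\ u_{jk} \in [-1, 1],\ \sum_j |u_{jk}| \le \Gamma \right\}$$

With this representation the original constraint becomes

$$\sum_j \bar{\gamma}_{jk} x_{jk} + \max_{u \in U_{k,u}} \sum_j u_{jk} \hat{\gamma}_{jk} x_{jk} \le C_k \quad \forall k$$

The max problem on the left-hand side is the following linear optimization problem, where $u_{jk} = u^+_{jk} - u^-_{jk}$ is split into nonnegative parts:

$$\begin{aligned} \text{maximize} \quad & \sum_j (u^+_{jk} - u^-_{jk}) \hat{\gamma}_{jk} x_{jk} \\ \text{subject to} \quad & \sum_j (u^+_{jk} + u^-_{jk}) \le \Gamma \\ & 0 \le u^+_{jk}, u^-_{jk} \le 1 \quad \forall j \end{aligned}$$

This problem has a nonempty, bounded feasible region and thus attains a finite optimum. By strong duality, the following dual problem is also feasible and attains the same optimal value [4]:

$$\begin{aligned} \text{minimize} \quad & \Gamma R_k + \sum_j (r^+_{jk} + r^-_{jk}) \\ \text{subject to} \quad & R_k + r^+_{jk} \ge \hat{\gamma}_{jk} x_{jk} \quad \forall j \\ & R_k + r^-_{jk} \ge -\hat{\gamma}_{jk} x_{jk} \quad \forall j \\ & R_k, r^+_{jk}, r^-_{jk} \ge 0 \quad \forall j \end{aligned}$$

Using these transformations, our original robust constraint

$$\sum_j \bar{\gamma}_{jk} x_{jk} + \max_{u \in U_{k,u}} \sum_j u_{jk} \hat{\gamma}_{jk} x_{jk} \le C_k$$

is equivalent to the following set of constraints:

$$\sum_j \bar{\gamma}_{jk} x_{jk} + \sum_j (r^+_{jk} + r^-_{jk}) + \Gamma R_k \le C_k \quad \forall k$$
$$R_k + r^+_{jk} \ge \hat{\gamma}_{jk} x_{jk} \quad \forall j, k$$
$$R_k + r^-_{jk} \ge -\hat{\gamma}_{jk} x_{jk} \quad \forall j, k$$
$$R_k, r^+_{jk}, r^-_{jk} \ge 0 \quad \forall j, k$$

Combining this with our original relaxed formulation gives a robust formulation that does not unnecessarily limit the objective value of the solution.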
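For a fixed assignment $x_{jk}$, the inner maximization has a simple closed form: with $a_j = \hat{\gamma}_{jk} x_{jk} \ge 0$, the worst case pushes the $\lfloor \Gamma \rfloor$ largest $a_j$ fully to their upper deviation and spends the fractional remainder of $\Gamma$ on the next largest. The sketch below computes that value directly and, as a check on the duality argument, also solves the dual LP with `scipy.optimize.linprog`. The function names are hypothetical.

```python
import math

import numpy as np
from scipy.optimize import linprog


def protection_greedy(a, gamma_budget):
    """max sum_j u_j a_j  s.t. |u_j| <= 1, sum_j |u_j| <= Gamma (closed form)."""
    a = np.sort(np.abs(np.asarray(a, dtype=float)))[::-1]
    full = min(int(math.floor(gamma_budget)), len(a))
    value = a[:full].sum()                       # floor(Gamma) largest at full deviation
    if full < len(a):
        value += (gamma_budget - full) * a[full]  # fractional remainder of Gamma
    return value


def protection_dual(a, gamma_budget):
    """min Gamma*R + sum_j (r+_j + r-_j)  s.t. R + r+_j >= a_j, R + r-_j >= -a_j."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    # Variable order: [R, r+_1..r+_n, r-_1..r-_n], all nonnegative.
    c = np.concatenate([[gamma_budget], np.ones(2 * n)])
    A_ub = np.zeros((2 * n, 2 * n + 1))
    A_ub[:, 0] = -1.0
    A_ub[np.arange(n), 1 + np.arange(n)] = -1.0          # -(R + r+_j) <= -a_j
    A_ub[n + np.arange(n), 1 + n + np.arange(n)] = -1.0  # -(R + r-_j) <= a_j
    b_ub = np.concatenate([-a, a])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.fun
```

For $a = (4, 3, 2, 1)$ and $\Gamma = 2.5$ both routines return 8 (= 4 + 3 + 0.5 × 2), the extra workload platform k must be able to absorb on top of $\sum_j \bar{\gamma}_{jk} x_{jk}$.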
Note, however, that the above solution still allows for the possibility that some constraints will be violated: the realized workload at a platform may exceed $C_k$ if the actual repair times deviate from nominal by more than the budget allows [4]. The probability of this occurrence is governed by $\Gamma$. Larger $\Gamma$ values produce more robust formulations but worsen the resulting objective.

Our formulation can now be written completely as follows:

Objective: minimize $C$ subject to
$$\sum_j \bar{\gamma}_{jk} x_{jk} + \sum_j (r^+_{jk} + r^-_{jk}) + \Gamma R_k \le C_k \quad \forall k \tag{3.1}$$
$$R_k + r^+_{jk} \ge \hat{\gamma}_{jk} x_{jk} \quad \forall j, k \tag{3.2}$$
$$R_k + r^-_{jk} \ge -\hat{\gamma}_{jk} x_{jk} \quad \forall j, k \tag{3.3}$$
$$R_k, r^+_{jk}, r^-_{jk} \ge 0 \quad \forall j, k \tag{3.4}$$
$$\sum_k x_{jk} \ge 1 \quad \forall j \tag{3.5}$$
$$\sum_k C_k \le C \tag{3.6}$$

We interpret the number of crews at each platform as
$$C^*_k = \frac{C_k}{C} C^* \quad \forall k \tag{3.7}$$
$$C^*_k \le M_k \quad \forall k \tag{3.8}$$

As in the earlier formulations, constraint 3.8 ensures that platform capacity $M_k$ (the maximum number of crews that can be stationed at platform k) is not exceeded, constraint 3.5 ensures that all jobs are completed, and constraint 3.6 ensures the earliest statewide completion time. Ultimately the model determines $C^*_k$, the number of crews assigned to each platform, through the platform workloads $C_k$ (refer to Table 3.2 for a complete description of model variables). In the next section we discuss the trade-offs between different choices of $\Gamma$ by examining historical repair information from Atlantic Electric and noting how it informs our choice of the half width values ($\hat{\gamma}$) and $\Gamma$.

3.5.3 Determining Robustness of Solution
To choose the parameters of our robust optimization, we must first understand previous damaging events. Because our prediction model predicts only outages and not damage (a limitation of the available data, not of the model), examining historical information is critical. Atlantic Electric's information on damage is rather limited, but it has a much richer data set on previous outages.
The data set records which device opened, when it opened, how long until it was restored, and how many customers were affected. To build a proxy for repair time, we used the difference between the time a device initially went out (open time) and the time it was re-energized (closed time), excluding outages during storm events. During a storm a particular outage may be repaired while the circuit is not re-energized, for a number of reasons:
• A circuit will not be re-energized while other crews are still repairing damage on the part of the network that a given device feeds.
• Only qualified Atlantic Electric employees can re-energize a circuit. A contract crew may make the repairs, but the system will not turn on until an Atlantic Electric employee re-energizes the circuit, which delays the turn-on time.
• Storms have an "emergency" mode in which crews only repair damage that poses a risk to the public. Other, non-threatening outages therefore have longer downtimes, not because repairs take longer but because crews are not yet authorized to work on them.

Because of these operational differences, restoration time is not a good proxy for repair time during a storm. Excluding storm outages from the historical data set gives the restoration profile in Figure 3-4.

Figure 3-4: Historical Non-Storm Outage Histogram (repair time in minutes; $\bar{\gamma}_{jk} = 166.67$, $\hat{\gamma}_{jk} = 544.09$).

While the data clearly do not follow a normal (or even centered) distribution, choosing a nominal $\bar{\gamma}_{jk}$ equal to the average of this distribution and a half width that covers a large portion of the histogram is sufficient. The histogram yields $\bar{\gamma}_{jk} = 166.67$ and a half width $\hat{\gamma}_{jk} = 544.09$ (2 times the standard deviation of the data). Because these are all non-storm restoration times, an average that is higher than the median is still reasonable.
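The parameter choices in this section (the nominal value, the half width, and the $\Gamma$ simulation described next) can be sketched as follows. The sketch assumes open and close times are already in minutes, that each outage carries a storm flag, and that the simulated statistic mirrors the budget in $U_k$, i.e. $\sum_j |\gamma_{jk} - \bar{\gamma}_{jk}| / \hat{\gamma}_{jk}$. Names are hypothetical, and no outlier trimming is applied.

```python
import numpy as np


def repair_time_parameters(open_time, close_time, is_storm):
    """Nominal value and half width from non-storm restoration times (minutes)."""
    t = np.asarray(close_time, float) - np.asarray(open_time, float)
    t = t[~np.asarray(is_storm, bool)]      # storm restorations are a poor proxy
    return t, t.mean(), 2.0 * t.std()       # half width = 2 standard deviations


def simulate_gamma(repair_times, nominal, half_width, n_outages=600, n_sims=50,
                   pct=90, seed=0):
    """Bootstrap Gamma = sum_j |gamma_j - nominal| / half_width over n_outages draws."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(np.asarray(repair_times, float), size=(n_sims, n_outages),
                       replace=True)        # one row per simulated storm
    gammas = (np.abs(draws - nominal) / half_width).sum(axis=1)
    return np.percentile(gammas, pct), gammas
```

Re-running `simulate_gamma` with each storm's predicted outage count, as recommended below, only requires changing `n_outages`.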
Similarly, our half width value $\hat{\gamma}_{jk}$ was selected to encompass essentially all of the distribution (excluding a select few major outliers). These values are somewhat conservative, but they are still a valid representation of the data, and they make the robust solution significantly easier to implement. We can generate a $\Gamma$ histogram by simulating random draws with replacement from the historical repair time data and calculating the cumulative relative error from the average repair time:

$$\Gamma = \sum_j \frac{|\gamma_{jk} - \bar{\gamma}_{jk}|}{\hat{\gamma}_{jk}}$$

Running 50 simulations of a scenario containing 600 outages, using random draws with replacement from our historical data, produces the result in Figure 3-5.

Figure 3-5: Γ Simulation Histogram (imputed Γ values; the 90th percentile, Γ = 15.81, is marked).

Selecting a final $\Gamma$ parameter that encompasses a significant portion of this histogram ensures that our solution remains valid with very high probability. A 90th percentile value of $\Gamma = 15.81$ is sufficiently robust for the operational situation that Atlantic Electric faces. Even large changes in the number of outages and in the number of simulations yield relatively similar $\Gamma$ values; nonetheless, we recommend re-running the simulation with each storm.

While the nominal $\bar{\gamma}_{jk}$ is 166.67 in our scenario, we still add a distance-based weighting to that value for each platform. Even though the repair time itself is the same, the added time helps account for variables such as travel. This weighting can be tuned by the user, who can make estimates based on weather conditions, logistical constraints, and other factors, independent of the repairs themselves, that increase the time until a device is repaired.

3.5.4 Comparison of Optimization Results
We can compare our model's results with actual Atlantic Electric operations using data from Hurricane Irene.
Figure 3-6 shows the number of crews stationed at each platform for three scenarios: actual deployment, base model results, and increased robustness. The increased robustness scenario is one in which $\Gamma$ is higher than the previously calculated value, to further ensure feasibility of the solution.

[Figure 3-6: Comparison of Optimization Results. Number of crews stationed at each platform for actual deployment, the base model, and increased $\Gamma$.]

We can see that Atlantic Electric staged the vast majority of their crews at Platforms H and Q. This practice is common, as those platforms are the largest and Atlantic Electric often waits until damage assessment processes have been completed before stationing crews. Because our outage prediction model gives an advance look at where damage will likely occur, we can use our model to station crews optimally ahead of time. In this case Atlantic Electric would have benefited from a more even distribution of their crews across their staging locations.

3.6 Conclusion

The work presented in this chapter provides Atlantic Electric with clear guidance for stationing their repair crews despite significant unknowns in this process. The model accounts for data error and ensures that the solution achieves an appropriate statewide completion time. Some factors that may affect Atlantic Electric's decisions were not considered in the model. For example, Atlantic Electric may want to stage more crews in a specific location not because it has more outages, but because it has more customers or more critical customers (schools, hospitals, etc.). Given that these factors are not present in the model, it is unlikely that Atlantic Electric will follow the model results precisely; however, the model will still provide concrete intuition as to how they should generally station their crews to best serve their customers.
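The $\Gamma$ selection and the crew-placement comparison in this chapter can be tied together in a short sketch. It is a hedged illustration under several assumptions, not the thesis's formulation: the historical repair times are synthetic (so the resulting $\Gamma$ will not match 15.81), the robust workload at each platform is taken as the Bertsimas and Sim [4] worst case of that platform's repair times under budget $\Gamma$, and crews are placed with a simple greedy rule that repeatedly gives the next crew to the platform with the worst workload per crew. The platform counts, capacities, and function names are hypothetical.

```python
import heapq
import math
import random
from statistics import mean, stdev


def simulate_gamma(history, nominal, half_width, n_outages=600, n_sims=50, seed=0):
    """Imputed Gamma per simulation: draw n_outages repair times with replacement
    and sum (draw - nominal) / half_width (signed cumulative relative error)."""
    rng = random.Random(seed)
    return [sum((g - nominal) / half_width for g in rng.choices(history, k=n_outages))
            for _ in range(n_sims)]


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def protected_total(nominals, half_widths, gamma):
    """Bertsimas-Sim worst case of a sum when at most Gamma terms deviate:
    nominal total + floor(Gamma) largest half-widths + a fraction of the next."""
    gamma = min(max(gamma, 0.0), len(half_widths))
    widths = sorted(half_widths, reverse=True)
    whole = math.floor(gamma)
    extra = sum(widths[:whole])
    if whole < len(widths):
        extra += (gamma - whole) * widths[whole]
    return sum(nominals) + extra


def allocate_crews(workload, total_crews, capacity=None):
    """Greedy min-max rule: each crew goes to the platform whose workload per
    crew is currently worst, skipping platforms at their (optional) capacity."""
    capacity = capacity or {}
    crews = {k: 0 for k in workload}
    # heapq is a min-heap, so store negated workload per crew; a platform with
    # work but no crew yet gets -inf and is served first.
    heap = [(-math.inf if w > 0 else 0.0, k) for k, w in workload.items()]
    heapq.heapify(heap)
    for _ in range(total_crews):
        while heap:
            _, k = heapq.heappop(heap)
            if crews[k] < capacity.get(k, total_crews):
                break  # this platform still has room
        else:
            break  # every platform is full; stop assigning
        crews[k] += 1
        heapq.heappush(heap, (-workload[k] / crews[k], k))
    return crews


# Usage with synthetic stand-in data (not Atlantic Electric's records).
data_rng = random.Random(42)
history = [data_rng.expovariate(1 / 166.67) for _ in range(5000)]
nominal, half_width = mean(history), 2 * stdev(history)
gamma_90 = percentile(simulate_gamma(history, nominal, half_width), 90)

predicted = {"H": 120, "Q": 90, "A": 40, "B": 30}  # hypothetical outage counts
workload = {k: protected_total([nominal] * n, [half_width] * n, gamma_90)
            for k, n in predicted.items()}
plan = allocate_crews(workload, total_crews=50, capacity={"H": 20})
```

With identical half-widths the worst case reduces to $n\bar{\gamma} + \Gamma\hat{\gamma}$; for 600 outages, $\bar{\gamma} = 166.67$, $\hat{\gamma} = 544.09$ and $\Gamma = 15.81$ that is 108,604.06 minutes. The greedy rule only balances workload per crew; it ignores travel weighting and the customer-priority factors noted in Section 3.6.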
Chapter 4
Conclusions and Future Work

4.1 Improvements to Prediction Model

While the work presented here will help Atlantic Electric significantly improve their storm response operations, several improvements to these models could still add significant value to Atlantic Electric.

4.1.1 Bootstrapping

As mentioned in Section 2.1.3, we have very few positive samples to train the model. It would be possible to aid the prediction model by bootstrapping the positive samples to match the number of negative samples. Bootstrapping here is the process of sampling with replacement from the existing positive sample set to create a new data set equal in size to the number of negative samples. This helps the model capture the variance of the processes we are trying to model and weights the positive and negative samples evenly. An even weighting of positive and negative samples would help ensure that the model is not overfitting to non-outages. It is possible that bootstrapping could lead to lower tree diversity [18] and reduce our model's specificity. However, given the imbalance between positive and negative samples and the current level of specificity, it is likely that bootstrapping offers a beneficial trade-off.

4.1.2 Parallelization

Currently the model is completely serial, and computation time grows directly with the number of storms used. As Atlantic Electric continues to gather data from future storms, the model will take longer to train. A single decision tree cannot easily be parallelized, but ensemble models such as random forests can, because each tree in the ensemble is trained independently. Because of this inherent parallelism, Atlantic Electric could use larger multi-core machines on AWS to train the prediction model faster.
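Both ideas can be illustrated with a toy ensemble. The sketch below is not the thesis's model: it balances the classes by resampling positives with replacement, as described in Section 4.1.1, and then trains a bagged ensemble of one-split decision stumps, each on its own bootstrap sample and random feature subset, in the spirit of a random forest. Because the members never share state, they can be submitted to an executor concurrently. All names and data are hypothetical. A thread pool keeps the example portable; pure-Python CPU-bound training would need a process pool (or a library such as scikit-learn, whose `RandomForestClassifier` accepts an `n_jobs` parameter) to actually use multiple cores.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def balance_by_bootstrap(rows, labels, seed=0):
    """Resample positives (label 1) with replacement until they match the
    number of negatives (label 0); returns the balanced rows and labels."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    boot = rng.choices(pos, k=len(neg))
    return neg + boot, [0] * len(neg) + [1] * len(boot)


def fit_stump(rows, labels, features):
    """Best one-split rule 'predict 1 if row[f] > t' over the given features."""
    best = (-1, features[0], 0.0)  # (training hits, feature, threshold)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            hits = sum((r[f] > t) == (y == 1) for r, y in zip(rows, labels))
            if hits > best[0]:
                best = (hits, f, t)
    return best[1], best[2]


def fit_member(rows, labels, n_features, seed):
    """One ensemble member: a bootstrap sample plus a random feature subset."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(rows)) for _ in rows]
    feats = rng.sample(range(len(rows[0])), n_features)
    return fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)


def fit_ensemble(rows, labels, n_members=25, n_features=1, workers=4):
    """Members are independent, so they are trained concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(fit_member, rows, labels, n_features, s)
                for s in range(n_members)]
        return [job.result() for job in jobs]


def predict(ensemble, row):
    """Majority vote of the members' stump rules."""
    votes = sum(row[f] > t for f, t in ensemble)
    return int(2 * votes > len(ensemble))
```

For example, `X, y = balance_by_bootstrap(rows, labels)` followed by `model = fit_ensemble(X, y)` trains on the balanced set; each member is seeded separately, so results are reproducible regardless of the order in which threads finish.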
It is possible that a random forest might not train as well as the original model, but with sufficient data and enough trials a random forest model is likely to yield sufficiently accurate results.

4.1.3 Principal Component Analysis on Weather Information

Reducing the number of features in the model improves generality and helps prevent overfitting. While we removed features that would obviously not impact outages (e.g., indoor temperature in the weather logs) and features with extremely low variance, feature reduction could be taken further by applying principal component analysis (PCA) to the available weather features and/or asset features. PCA could identify redundant features, or combinations of features that carry little of the variation in the data; these are candidates for having little or no effect on producing outages. Removing such features would not only improve model generality but also create the opportunity to improve the model further: because more degrees of freedom would be available, new feature combinations could be tested to improve the model's efficacy.

4.2 Improvements to Optimization Model

4.2.1 Data Inaccuracies

Atlantic Electric currently lacks a data set that captures the operational reality of repairs during a storm. Even their restoration data is laced with variables that are difficult to quantify, because currently the only well-documented aspect of a storm is which devices opened and when they were closed. Because we do not know exactly when crews began work on a particular damage point and when they finished (re-energizing a circuit does not necessarily occur when the damage is repaired), we cannot extrapolate precise repair times. A richer data set that details the repair times of outages is critical to building an optimization formulation that can produce an optimal solution without the requirement of remaining robust.
4.2.2 In-Storm Planning

Once a storm has passed, Atlantic Electric crews scout assets and perform damage assessment. Currently these processes are all done on paper, and not all damaging events are documented. Digitizing this information would allow the optimization model to be re-run after the storm has hit, using actual damage data, to continue to help with crew deployment and redeployment. Atlantic Electric is currently developing a new damage assessment system that will use tablets and cell phones for damage assessment data entry. With the information in a digital format, it can easily be used in the optimization model, which can then be re-run after the storm has passed. Because Atlantic Electric continually repositions crews throughout the storm, the optimization model can be re-run with updated damage information as the storm response progresses to aid in operational decisions.

4.2.3 Customer Constraints

Atlantic Electric often places crews to ensure that they first repair the outages that affect larger numbers of customers, or their more critical customers (schools, hospitals, etc.). These constraints are not currently included in the model, largely because of the amount of time it would take to quantify them effectively. Adding such constraints would align the model more closely with Atlantic Electric's incentives with regard to their customers.

4.3 Conclusions

Traditionally, if a firm provided a poor service or product, customers would simply move to another provider. While this is not the case for utilities, their revenues are directly tied to customer service quality as determined by their government regulators. Additionally, weather causes significant damage to their assets that requires costly repairs.
The damage from weather and the subsequent power outages are unavoidable, but utilities such as Atlantic Electric must rely on proper planning and operational execution after the event to ensure that assets are repaired in a quick, cost-effective manner. The models outlined in this thesis provide a data-driven approach to improve Atlantic Electric's ability to analyze incoming weather patterns and create an appropriate response plan. As regulatory demands increase, response plans built on data-driven models present a strong case to regulators to justify the operational response of utilities. These models also allow for a more seamless transfer of knowledge, which is crucial given that utilities currently require subject matter experts to make educated guesses for storm response planning. Transferring the knowledge of these subject matter experts is a long process that requires new employees to gain years of experience. The models presented here still require knowledge from critical employees, but they do not require years of training and experience to run. Furthermore, as higher-quality data becomes available, the models' efficacy will continue to improve. As Atlantic Electric continues to encounter damaging weather events, the model will continue to develop new understanding of the weather and asset feature interactions that lead to outages, which will aid Atlantic Electric's ability to repair any resulting damage and restore service to its customers.

Appendix A
In Sample and Out of Sample Prediction Results

The following tables show the model's predicted outages and the observed outages for each storm. There are four sets of model results for each storm, created by combining two weather inputs (weather logs or forecasts) with two testing modes (in sample or out of sample). In-sample tests train on all storms, including the storm being tested. Out-of-sample tests train on all storms except the storm being tested.
The out-of-sample result is a better indicator of how well the model will perform in operation, as a storm's precise weather information will not be available to assist in training the model.

[Table A.1: October 2011 Winter Storm Results. Columns: Actual, Logs In Sample, Logs Out of Sample, Forecasted In Sample, Forecasted Out Of Sample.]

[Table A.2: December 2010 Winter Storm Results. Same columns as Table A.1.]

[Table A.3: Hurricane Sandy. Same columns as Table A.1.]

[Table A.4: Hurricane Irene. Same columns as Table A.1.]
[Table A.5: February 2010 Snow Storm. Columns: Actual, Logs In Sample, Logs Out of Sample, Forecasted In Sample, Forecasted Out Of Sample.]

[Table A.6: December 2008 Snow Storm. Same columns as Table A.5.]

Appendix B
Complete Maximum Likelihood Formulation

This appendix gives a complete description of the model formulation that uses maximum likelihood techniques to solve the outage prediction problem, as developed in [25]. Failure events occur independently across the network, at a rate proportional to segment length and linearly dependent on the surrounding weather features $w_t$:

$$\lambda_{s,t} = l_s\, g_{c_s}^\top w_t$$

where $l_s$ is the length of segment $s$ and $g_{c_s}$ is the vector of vulnerabilities for segment type $c_s$ ($c_s$ being the type of segment $s$). This makes no assumption about the weather features and allows different segment types to be vulnerable to damage under different weather conditions. Because of data granularity, Monsch aggregated damaging events at the asset level. Because of the independence assumption, we can represent the damage rate at each asset $a$ as:

$$\lambda_{a,t} = \sum_{s \in a} l_s\, g_{c_s}^\top w_t = \sum_{c} l_{a,c}\, g_c^\top w_t = l_a^\top g\, w_t$$

where $l_a$ is the vector of lengths of each segment type in asset $a$, and $g$ is the matrix whose rows are the vectors $g_c^\top$. The previous equation is linear in the coefficients of the matrix $g$.
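Because the rate is linear in the coefficients of $g$, it can be written as a single inner product, which is what makes the likelihoods in Sections B.1 and B.2 tractable. The standard-library sketch that follows uses made-up toy numbers to check that identity (flattening $g$ row by row into $\gamma$ and $l_a w_t^\top$ into $x_{a,t}$) and to evaluate the uncensored (B.1) and censored (B.2) negative log-likelihoods. The function names are hypothetical, and the code only evaluates the losses; fitting $\gamma$ would additionally require a convex optimizer.

```python
import math


def flatten(matrix):
    """Row-major flattening: the 'vector representation' of a matrix."""
    return [v for row in matrix for v in row]


def features(l_a, w_t):
    """x_{a,t} = vec(l_a w_t^T): each segment-type length times each weather feature."""
    return flatten([[l * w for w in w_t] for l in l_a])


def rate(gamma, x):
    """lambda_i = gamma^T x_i, the Poisson rate for one (asset, storm) pair."""
    return sum(g * v for g, v in zip(gamma, x))


def nll_counts(gamma, xs, ys):
    """B.1 loss: sum_i [gamma^T x_i - y_i ln(gamma^T x_i)] (needs every rate > 0)."""
    return sum(rate(gamma, x) - y * math.log(rate(gamma, x)) for x, y in zip(xs, ys))


def nll_censored(gamma, xs, zs):
    """B.2 loss: sum_{z_i=0} gamma^T x_i - sum_{z_i=1} ln(1 - exp(-gamma^T x_i))."""
    total = 0.0
    for x, z in zip(xs, zs):
        lam = rate(gamma, x)
        # -expm1(-lam) equals 1 - exp(-lam) but stays accurate for small lam.
        total += -math.log(-math.expm1(-lam)) if z == 1 else lam
    return total


# Toy example: k = 2 segment types, F = 2 weather features (made-up numbers).
g = [[0.1, 0.2],   # row c holds g_c, the vulnerability vector of segment type c
     [0.4, 0.0]]
l_a = [2.0, 0.5]   # length of each segment type in asset a
w_t = [1.0, 3.0]   # weather features for storm t
direct = sum(l * rate(row, w_t) for l, row in zip(l_a, g))   # l_a^T g w_t
gamma = flatten(g)
x = features(l_a, w_t)
assert abs(rate(gamma, x) - direct) < 1e-12   # both equal 1.6 up to rounding
```

Both losses are convex in $\gamma$ because each rate is affine in $\gamma$ and the per-sample terms $\lambda - y\ln\lambda$ and $-\ln(1 - e^{-\lambda})$ are convex in $\lambda$ for $\lambda > 0$.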
By rearranging the terms, we can rewrite it in a more common way as a product of vectors:

$$\lambda_{a,t} = \gamma^\top x_{a,t}$$

where $\gamma$ is the vector of coefficients of $g$ and $x_{a,t}$ is the vector representation of the matrix $l_a w_t^\top$, which contains the features for an asset under specific weather conditions. We will use $i$ to represent the $(a, t)$ pairs from now on. By definition of a Poisson process, the total number $Y_i$ of events on each asset is a Poisson random variable with parameter $\gamma^\top x_i$:

$$Y_i \sim \mathcal{P}(\gamma^\top x_i)$$

From here we also get the probability distribution of a failure happening on a given asset. Let $Z_i$ be the indicator variable corresponding to at least one event having happened on an asset: $Z_i = 1$ if and only if at least one event occurred on one of its segments, i.e. $Z_i = \min(Y_i, 1)$. Therefore $Z_i$ follows a Bernoulli distribution with success probability $1 - e^{-\gamma^\top x_i}$:

$$Z_i \sim \mathcal{B}\left(1 - e^{-\gamma^\top x_i}\right)$$

When estimating $\gamma$ there are two cases to consider: first, the case where $Y_i$ is observable (i.e. we have access to the underlying event data), and second, the case where only $Z_i$ is observable. The next sections show that the maximum likelihood estimator for $\gamma$ can be computed efficiently in both cases.

B.1 Without Censoring

In this case, we are able to recover the exact count of events that occurred on an asset. The corresponding likelihood function is therefore given by:

$$L(\gamma) = P(\forall i,\ Y_i = y_i \mid \gamma)$$

As is commonly done in the literature, we focus on the negative log-likelihood loss function (where $K$ is a constant that does not affect the optimal choice of parameters):

$$\mathcal{L}(\gamma) = -\ln L(\gamma) + K = \sum_i \left[\gamma^\top x_i - y_i \ln(\gamma^\top x_i)\right]$$

We then have the following result:

Theorem 1. The maximum likelihood problem $\max_{\gamma} L(\gamma)$, or equivalently $\min_{\gamma} \mathcal{L}(\gamma)$, is convex.

Proof. Convexity is immediate when observing that the Hessian matrix of the loss function can be written as follows:

$$H_{k,l} = \sum_i \frac{y_i\, x_{i,k}\, x_{i,l}}{(\gamma^\top x_i)^2}, \qquad H = (DX)^\top (DX)$$

where $X$ is the matrix whose rows are the vectors $x_i^\top$ and $D$ is a diagonal matrix with entries $D_{i,i} = \sqrt{y_i} / (\gamma^\top x_i)$. $H$ is therefore a positive semi-definite matrix.

B.2
With Censoring

Because of the available data, Monsch focused on the following likelihood function instead:

$$L(\gamma) = P(\forall i,\ Z_i = z_i \mid \gamma)$$

and the corresponding negative log-likelihood function:

$$\mathcal{L}(\gamma) = -\ln L(\gamma) + K = \sum_{i:\, z_i = 0} \gamma^\top x_i - \sum_{i:\, z_i = 1} \ln\left(1 - e^{-\gamma^\top x_i}\right)$$

A similar result holds in this setting:

Theorem 2. The maximum likelihood problem $\max_{\gamma} L(\gamma)$, or equivalently $\min_{\gamma} \mathcal{L}(\gamma)$, is convex.

Proof. Once again, the Hessian matrix is positive semi-definite, as can be seen from the following expression:

$$H_{k,l} = \sum_{i:\, z_i = 1} \frac{e^{\gamma^\top x_i}\, x_{i,k}\, x_{i,l}}{\left(e^{\gamma^\top x_i} - 1\right)^2}, \qquad H = (DX)^\top (DX)$$

where $D$ is a diagonal matrix with entries $D_{i,i} = \mathbb{1}[z_i = 1]\, e^{\gamma^\top x_i / 2} / \left(e^{\gamma^\top x_i} - 1\right)$.

Appendix C
Complete Variable Notation

C.1 Outage Prediction Variable Notation

Table C.1: Outage Prediction Variable Notation
Variable | Description
$k$ | The number of segment clusters selected to describe Atlantic Electric assets
$d_i$ | Device $i$
$l_m$ | Length of segment of type $m$ ($k$ total)
$a_i$ | Asset indexed by $i$, linked with device $d_i$
$w_f$ | Value for weather feature $f$ at asset $a$
$O_a$ | Outage at asset $a$
$\lambda_{s,t}$ | Probability of damage at segment $s$ on storm $t$
$l_s$ | Length of segment $s$
$g_c$ | Vector of vulnerability values for each type of segment
$w_t$ | Weather for storm $t$
$\lambda_{a,t}$ | Probability of outage at asset $a$ on storm $t$

C.2 Crew Optimization Variable Notation

Table C.2: Crew Optimization Variable Notation
Variable | Description
$x_{ijk}$ | Crew $i$ assigned to job $j$ at platform $k$
$x_{ik}$ | Crew $i$ assigned to platform $k$
$\gamma_{jk}$ | Time required to do job $j$ from platform $k$
[illegible] | Crew capacity for platform $k$
$Z$ | Worst repair time in master formulation
$x_{jk}$ | Job $j$ assigned to platform $k$
[illegible] | Fractional workload at platform $k$
$C_k$ | Number of crews assigned to platform $k$
[illegible] | Worst fractional workload (simplified formulations)
$C$ | Total number of crews available

Bibliography

[1] Nagaraj Balijepalli, Subrahmanyam S Venkata, Charles W Richter Jr, Richard D Christie, and Vito J Longo. Distribution system reliability assessment due to lightning storms.
Power Delivery, IEEE Transactions on, 20(3):2153-2159, 2005.
[2] Siddharth Balwani. Operational Efficiency through Resource Planning Optimization and Work Process Improvement. Master's thesis, Massachusetts Institute of Technology, 2012.
[3] Susan M Bernard, Jonathan M Samet, Anne Grambsch, Kristie L Ebi, and Isabelle Romieu. The potential impacts of climate variability and change on air pollution-related health effects in the United States. Environmental Health Perspectives, 109(Suppl 2):199, 2001.
[4] Dimitris Bertsimas and Melvyn Sim. The Price of Robustness. Operations Research, 52(1):35-53, 2004.
[5] R Billinton and JR Acharya. Weather-based distribution system reliability evaluation. IEE Proceedings-Generation, Transmission and Distribution, 153(5):499-506, 2006.
[6] R Billinton and G Singh. Application of adverse and extreme adverse weather: modelling in transmission and distribution system reliability evaluation. In Generation, Transmission and Distribution, IEE Proceedings-, volume 153, pages 115-120. IET, 2006.
[7] Roy Billinton and Janak Acharya. Distribution system reliability assessment incorporating weather effects. In Universities Power Engineering Conference, 2006. UPEC'06. Proceedings of the 41st International, volume 1, pages 282-286. IEEE, 2006.
[8] Elin Broström and Lennart Söder. Modelling of ice storms for power transmission reliability calculations. In Proc. 15th Power Systems Computation Conference PSCC2005, Liege, 2005.
[9] Heidemarie C Caswell, Vincent J Forte, John C Fraser, Anil Pahwa, Tom Short, Mark Thatcher, and Val G Werner. Weather normalization of reliability indices. Power Delivery, IEEE Transactions on, 26(2):1273-1279, 2011.
[10] Chad Shouquan Cheng, Heather Auld, Guilong Li, Joan Klaassen, Bryan Tugwood, and Qian Li. An automated synoptic typing procedure to predict freezing rain: An application to Ottawa, Ontario, Canada. American Meteorological Society, 19(4):751-768, April 2004.
[11] A Domijan Jr, A Islam, WS Wilcox, RK Matavalam, JR Diaz, L Davis, and J D'Agostini. Modeling the effect of weather parameters on power distribution interruptions. 7th IASTED Int. Conf. Power and Energy Systems, Clearwater Beach, Fl, USA, 2004.
[12] A Domijan Jr, RK Matavalam, A Montenegro, WS Wilcox, YS Joo, L Delforn, JR Diaz, L Davis, and JD Agostini. Effects of normal weather conditions on interruptions in distribution systems. International Journal of Power & Energy Systems, 25(1):54-61, 2005.
[13] Lawrence C Hamilton, Cliff Brown, and Barry D Keim. Ski areas, weather and climate: time series models for New England case studies. International Journal of Climatology, 27(15):2113-2124, 2007.
[14] Seung-Ryong Han, Seth D Guikema, Steven M Quiring, Kyung-Ho Lee, David Rosowsky, and Rachel A Davidson. Estimating the spatial distribution of power outages during hurricanes in the Gulf Coast region. Reliability Engineering & System Safety, 94(2):199-210, 2009.
[15] Hongfei Li, L.A. Treinish, and J.R.M. Hosking. A statistical model for risk management of electric outage forecasts. IBM Journal of Research and Development, 54(3):8:1-8:11, May-June 2010.
[16] Vincent J. Forte Jr. Faults caused by drought stressed trees. Distribution system engineering, National Grid, November 2004.
[17] Barry D Keim, Loren David Meeker, and John F Slater. Manual synoptic climate classification for the east coast of New England (USA) with an application to PM2.5 concentration. Climate Research, 28:143-154, 2005.
[18] Fei Tony Liu, Kai Ming Ting, and Wei Fan. Maximizing tree diversity by building completely-random decision trees.
[19] Haibin Liu, Rachel A Davidson, and T Apanasovich. Statistical forecasting of electric power restoration times in hurricanes and ice storms. Power Systems, IEEE Transactions on, 22(4):2270-2279, 2007.
[20] Haibin Liu, Rachel A Davidson, and Tatiyana V Apanasovich.
Spatial generalized linear mixed models of electric power outages due to hurricanes and ice storms. Reliability Engineering & System Safety, 93(6):897-912, 2008.
[21] Stuart P. Lloyd. Least Squares Quantization in PCM. IEEE Transactions on Information Theory, IT-28(2):129-137, 1982.
[22] Christopher Loh. AG Seeks More Than $16 Million in Penalties for Inadequate Storm Response by National Grid, 2012 (accessed February 18, 2014).
[23] Roop Kishore R Matavalam. Power Distribution Reliability as a Function of Weather. PhD thesis, University of Florida, 2004.
[24] Michael A McGeehin and Maria Mirabelli. The potential impacts of climate variability and change on temperature-related morbidity and mortality in the United States. Environmental Health Perspectives, 109(Suppl 2):185, 2001.
[25] Matthieu Monsch. Large Scale Prediction Models and Algorithms. PhD thesis, Massachusetts Institute of Technology, 2013.
[26] Galit Shmueli, Nitin R. Patel, and Peter C. Bruce. Data Mining for Business Intelligence. John Wiley and Sons, Inc., 2nd edition.
[27] Fei Xiao, James D McCalley, Yan Ou, John Adams, and Steven Myers. Contingency probability estimation using weather and geographical data for on-line security assessment. In Probabilistic Methods Applied to Power Systems, 2006. PMAPS 2006. International Conference on, pages 1-7. IEEE, 2006.
[28] L. Xu and R.E. Brown. A framework of cost-benefit analysis for overhead-to-underground conversions in Florida. In Power & Energy Society General Meeting, 2009. PES '09. IEEE, pages 1-7, Calgary, AB, July 2009.
[29] Yujia Zhou, Anil Pahwa, and Shie-Shien Yang. Modeling weather-related failures of overhead distribution lines. Power Systems, IEEE Transactions on, 21(4):1683-1690, 2006.