Anomaly Detection in Premise Energy Consumption Data Jason W Black (Energy Systems, Battelle) Yi Zhang, Weiwei Chen (MSL, GE Global Research Center) CMU Electricity Conf March 2012 Demand Response Estimation • Critical X’s: weather, Temperature, seasons, days of week, devices, time0 • X’s are gathered from: • AMI data • Real time weather forecasts • Historical data DR Forecast 18,000 17,500 • Methods: • Sampling Methods • Categorization Techniques • Empirical Methods • Regression techniques • Bayesian methods • Artificial Intelligence – Neural Nets, SVM,0 Load (MW) 17,000 16,500 16,000 15,500 Base Load Forecast DR Actual 15,000 DR Forecast 14,500 13 14 15 16 17 18 Hour 19 20 21 22 23 24 • 2 Error Sources: - Base Load Forecast - Response Estimate 2/ GE Title or job number / 3/13/2012 Typical Household Load Patterns Weekend with heating/AC Weekday without heating/AC Weekday with heating/AC Anomaly 3/ GE Title or job number / 3/13/2012 Proposed Methods for Anomaly Detection Regression-Based Anomaly Detection Distribution-Based Anomaly Detection Attribute-Based Anomaly Detection Data collections and extraction Which method? Entropy calculation Data preprocession Distribution-based method Attribution generation Regression-based method Attribute-based method Output the anomalies 4/ GE Title or job number / 3/13/2012 Regression-based Anomaly Detection 5/ GE Title or job number / 3/13/2012 Distribution-based Anomaly Detection In information theory, entropy is a measure of the uncertainty associated with a random variable. Normal Weekday E(Abnormal) < E(Normal) High Entrop y Low Entrop y Abnormal Weekday 6/ GE Title or job number / 3/13/2012 Attribute-based Anomaly Detection Attributes generated from daily consumption data are: mean Variance Maximum range (=maximum - minimum) ratio (=maximum/minimum). Log(Var) Normal Weekday High usage mean and peak Vacation Days Weekdays Weekends Low usage mean and peak Vacation Weekday Log(Mean) 7/ GE Title or job number / 3/13/2012 Raw Data 2 sets of raw data Single customer data Detailed usage 8 months Time 1/1/2010 1/1/2010 1/1/2010 1/1/2010 Air A/C Down A/C Up Handler Air Entertain Condensi Condensi Downstair Handler Computer Dishwash Dryer(Gas ment Microwav ng Unit ng Unit s Upstairs Room er ) Center e Other Range 0:00 0 0 0 0 0.03 0 0 0.01 0 0.01 0:15 0 0 0 0.06 0.03 0 0 0.01 0 0.01 0:30 0 0 0 0.03 0 0 0.01 0 0 0:45 0 0 0 0.05 0.03 0 0 0.01 0 0.02 Pool Pump 0 0 0 0 Indoor Temperat Indoor ure Temperat Refrigerat Downstair ure Weather or Main Washer s Upstairs Temp 0 0.06 0.1 0 281.97 283.64 53.6 0 0.01 0.1 0 282.07 283.53 53.6 0 0.02 0.05 0 282.01 283.64 53.6 0 0.07 0.18 0 282.14 283.71 53.6 9-customer data Some information is missing 90 days 8/ GE Title or job number / 3/13/2012 Abnormal Days Detection (use statistics, machine learning, etc. to detect days with abnormal behavior/responsiveness in nonevent days or during DR events) 1.2 1 0.8 0.6 0.4 1 1.5 1 0.5 0.5 5 10 15 20 Usage(Kw) 2 1.5 1 0.8 0.6 0.4 0.2 5 10 15 20 4 3 2 1 5 1015 20 0.5 0.4 0.3 0.2 0.1 5 10 15 20 0.2 0.2 0.2 0.1 0.1 5 10 15 20 5 10 15 20 1 5 10 15 20 3 3 2 2 1 1 5 10 15 20 0.1 5 10 15 20 5 10 15 20 0.5 5 10 15 20 5 4 3 2 1 5 10 15 20 2.5 2 1.5 1 0.5 4 3 2 1 2 5 10 15 20 5 10 15 20 2 1.5 1 0.5 3 4 3 2 1 5 10 15 20 0.3 5 10 15 20 5 10 15 20 2 1.5 1 0.5 0.4 0.3 5 10 15 20 4 3 2 1 2 1.5 1 0.5 0.4 5 10 15 20 10 8 6 4 2 5 10 15 20 2 1.5 1 0.5 5 10 15 20 5 10 15 20 2 1.5 1 0.5 5 10 15 20 5 10 15 20 0.4 2 1.5 1 0.5 5 10 15 20 5 10 15 20 1.2 1 0.8 0.6 0.4 5 4 3 2 1 2 1.5 1 0.5 5 10 15 20 5 10 15 20 5 10 15 20 1 0.8 0.6 0.4 0.2 2 1.5 1 0.5 1 5 10 15 20 5 10 15 20 0.3 4 3 2 1 3 2 1 5 10 15 20 5 10 15 20 1.5 1 0.8 0.6 0.4 1 0.8 0.6 0.4 0.3 0.2 5 10 15 20 0.1 5 10 15 20 Time (hr) 9/ GE Title or job number / 3/13/2012 Simulation Results: 9-customer Data 10 / GE Title or job number / 3/13/2012 Simulation Results: Single-customer Data Detection Results ROC Curve 11 / GE Title or job number / 3/13/2012 Conclusions Anomaly detection achievable in demand response estimation. Proposed Regression-based, distribution-based, and attribute based methods can be used. Data mining on each customer’s power consumption profile is required for algorithm selection and parameter tuning 12 / GE Title or job number / 3/13/2012 Backup 13 / GE Title or job number / 3/13/2012 Distribution-based Anomaly Detection cont. 14 / GE Title or job number / 3/13/2012 Piecewise Regression 15 / GE Title or job number / 3/13/2012 Attribute-based Anomaly Detection cont. Algorithm: 1. Generate attributes from raw energy consumption data 2. Run k-Means clustering n times using random start point and select the best one output 3. Identify the cluster with the smallest mean usage as anomaly days 16 / GE Title or job number / 3/13/2012 Different Anomaly Detection Methodologies Regression based Distribution based Attribute based Log(Var) Vacation Days Weekdays Weekends Log(Mean) 17 / GE Title or job number / 3/13/2012 Demand Response (DR) Process Estimation of future energy consumption Estimation of future energy generation No Is DR needed? Finish Yes Assign DR event to customers DR estimation Does reduction amount meet needs? No Yes Adjust DR event assignment Sent DR event to customers 18 / GE Title or job number / 3/13/2012