A Multi-core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems Tao Cui and Franz Franchetti Email: tcui@ece.cmu.edu Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA Motivation and Background Methods & System Problems from Power System Distribution Power Flow Model • Uncertainties in electric power distribution system: I abc n c Vabc m d I abc m Vabc m A Vabc n B I abc m • Forward / Backward Sweep • Wind power, DG, PHEV, responsive load, etc. • Small complex Matrix-Vector Mult • Lack of real time measurements: Latest Results Performance Result: High Performance Computing Engine Speed (Gflops) Generate Random Samples • Probabilistic Model for Uncertainties: C++STL Speed Optimized C++STL Intel Fully Optimized C Array C Pattern SSE SSE+Pthread >40x 120 100 ~ 500x 1 Estimate Density 80 0.14 Load Flow Solver 1 60 0.12 0.1 0.1 • Wind forecast model, EV pattern, Load profile. 0.5 Load Flow Solver 0 1 … 2~64 parallel threads ... • A Probabilistic Solution of Distribution System 20 0.06 0.01 0.04 0 4 0.02 0.5 0 40 0.08 1 0.5 Load Flow Solver 0 Optimized Scalar SIMD (SSE or AVX) Multi-Core 140 10 Solve Load Flows Speed (Gflop/s) 180 160 Monte Carlo Simulation: • AMI, Smart metering may help, but still in preliminary stage. Performance on Different Machines for IEEE37 Testfeeder Performance 4 core Core2Extreme 2.66 GHz 0 -10 -5 0 5 10 16 64 Node (Bus) Numbers 15 Core2Extreme 2.66GHz (4-core, SSE) 256 Xeon X5680 3.33GHz (6-core, SSE) Core i5-2400 3.10GHz (4-core, AVX) 2 Xeon7560 2.27GHz (16-core, SSE) Translate speed (Gflop/s) into run time: Input with Randomness Physically Deterministic Method: Load Flow Output result with probability features Distribution System Probabilistic Monitoring System x Algorithm Campus Network Internet Web Server @ ECE CMU Web UI y Problem Size (IEEE Test Feeders) Approx. flops Approx. Time / Core2 Extreme Approx. Time / Core i5 Baseline. C++ ICC –o3 (~300x faster then Matlab) IEEE37: one iteration 12 K ~ 0.3 us ~ 0.3 us IEEE37: one load flow (5 Iter) 60 K ~ 1.5 us ~ 1.5 us IEEE37: 1 million load flow 60 G ~<2s ~<1s ~ 60 s (>5 hrs Matlab) IEEE123: 1 million load flow 200 G ~ < 10 s ~ < 3.5 s ~ 200 s (>15 hrs Matlab) Comments 0.01 kVA error SCADA Interval: 4 seconds Monte Carlo Results and Web User Interface Monte Carlo solver Simulated Meters Using Computing Nodes Active Role of Distribution System in Smart Grid 100 run Phase A *Original figure by Andrew Hsu (EESG) Web UI Advances in High Performance Computing Multi-core Desktop Computing Server Web UI Data Aggregator Meter Meter 10000 run 100000 run 1000000 run 10000000 run 20 20 20 20 20 15 15 15 15 15 15 10 10 10 10 10 10 5 5 5 5 5 5 0 0 0 0 0 0.8 Phase B Web UI 1000 run 20 0.9 1 0.8 0.9 1 0.8 0.9 1 0.8 0.9 1 0 0.8 0.9 1 0.8 100 100 100 100 100 100 80 80 80 80 80 80 60 60 60 60 60 60 40 40 40 40 40 40 20 20 20 20 20 20 0 0 0 0 0 0 0.9 1 Out: Voltage on Node 738 -3 4 0.98 1.02 0.98 1 1.02 0.98 1 1.02 0.98 1 1.02 0.98 1 1.02 0.98 1 1.02 3 2 Year 2010: Phase C Aggressive Code Optimizations on Monte Carlo Solver Image: Dell Workstation + GPU accelerator 1 x 10 • Data structure optimization + Core i7 + Tesla C1060 1Tflop/s peak performance $5,000 class • Algorithm level optimization Code Generation (Spiral) Substation N0 Image: Nvidia N2 Load N4 N5 Load Load Load pN0 pN5 pN1 pN4 pN2 pN3 pN3 pN2 pN4 pN1 pN5 pN0 High Performance Computing Enabled Power System Solution • Monte Carlo based Probabilistic Solution of Distribution System: • “Golden standard” for probabilistic power flow • Well fitted problem on modern computing platform • Extensible to contingency analysis, steady state time-series… An Affordable Supercomputing Center for Distribution Substation Pseudo-code for Matrix-Vector Multiplication: Matrix Pattern: N5 Data: Parameters Current Voltage N4 Data: Parameters Current Voltage r 0 0 0 r 0 0 0 r i 0 0 0 i 0 0 0 i c11 c12 c13 c21 c22 c23 c31 c32 c33 Array Access r: real number, i: imaginary number c: complex number • Require knowledge in specific domain & computer architecture • Potential Benefit: Moore’s law! Consecutive in Data Array ... • NO free speedup, parallelism, memory , special instructions N3 Backward Sweep Pointer Array ... • Fully utilizing the computing power is a difficult problem Load N1 Forward Sweep Pointer Array • SIMD vectorization // input[0~2]: vector real part; // input[3~5]: vector imaginary part; // constant: parameters (r or c on left). switch (matrix_pattern){ case real_diagonal_equal_matrix: output[0] = *constant * input[0]; ... output[5] = *constant * input[5]; break; case imag_diagonal_equal_matrix: output[0] = -*constant * input[3]; output[1] = -*constant * input[4]; output[2] = -*constant * input[5]; output[3] = *constant * input[0]; output[4] = *constant * input[1]; output[5] = *constant * input[2]; break; //cases for other frequent matrix patterns ... default: //default full 3x3 complex matrix ... } • Multithreading: Switch Buffer A,B Scalar Register: 1 float Scalar Instructions Sample FBS Load Flow Result Vector Register: 4 floats in SSE Computing Thd N RNG & Load Flow in Buf AN RNG & Load Flow in Buf BN Computing Thd 2 RNG & Load Flow in Buf A2 RNG & Load Flow in Buf B2 Computing Thd 1 RNG & Load Flow in Buf A1 RNG & Load Flow in Buf B1 SIMD Instructions Scheduling Thd 0 Sample FBS Load Flow Result KDE in all Buf Bs Result Out RNG: Random Number Generator KDE: Kernel Density Estimation Real Time Interval (SCADA Interval) Sync Signal Out KDE in all Buf As Result Out Sync Signal Squeezing Computation Power out of the Computer Architecture. Push Performance to the Hardware Peak. 150 150 150 150 150 150 100 100 100 100 100 100 50 50 50 50 50 50 0 0.93 0.94 0.95 voltage, (p.u.) 0 0.93 0.94 0.95 voltage, (p.u.) 0 0.93 0.94 0.95 voltage, (p.u.) 0 0.93 0.94 0.95 voltage, (p.u.) 0 0.93 0.94 0.95 voltage, (p.u.) 0 0.93 1 0 -400 0.94 0.95 voltage, (p.u.) -200 0 200 Active power, kw 400 In: Active Power P~ u=0,std=100kw on Phase A of Node 738,711,741 Conclusions & Future Work Program optimization / parallelization: • Enable fast computation of large amount of power flow. Performance can be further increased on new platform: • Tracking new development in CPU micro-architecture. • GPU: small, less powerful but many more cores. Applications of fast distribution power flow solver: • Probabilistic monitor of distribution system • Fast time series solution; statistical analysis… Acknowledgement & Publications • This work is supported by NSF through awards 0931978 and 0702386. • Related Publications: 1. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Distribution Power Flow,” The 43rd North American Power Symposium (NAPS), Boston, USA, Aug 2011. 2. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems,” IEEE PES General Meeting 2012, San Diego, CA, USA.