Document 10752432

advertisement
A Multi-core High Performance Computing Framework for
Probabilistic Solutions of Distribution Systems
Tao Cui and Franz Franchetti
Email: tcui@ece.cmu.edu
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
Motivation and Background
Methods & System
 Problems from Power System
 Distribution Power Flow Model
• Uncertainties in electric power distribution system:
 I abc n  c  Vabc m   d    I abc m
Vabc m   A  Vabc n   B    I abc m
• Forward / Backward Sweep
• Wind power, DG, PHEV, responsive load, etc.
• Small complex Matrix-Vector Mult
• Lack of real time measurements:
Latest Results
 Performance Result: High Performance Computing Engine
Speed (Gflops)
Generate Random Samples
• Probabilistic Model for Uncertainties:
C++STL Speed Optimized
C++STL Intel Fully Optimized
C Array
C Pattern
SSE
SSE+Pthread
>40x
120
100
~
500x
1
Estimate Density
80
0.14
Load Flow Solver
1
60
0.12
0.1
0.1
• Wind forecast model, EV pattern, Load profile.
0.5
Load Flow Solver
0
1
… 2~64 parallel
threads ...
• A Probabilistic Solution of Distribution System
20
0.06
0.01
0.04
0
4
0.02
0.5
0
40
0.08
1
0.5
Load Flow Solver
0
Optimized Scalar
SIMD (SSE or AVX)
Multi-Core
140
10
Solve Load Flows
Speed (Gflop/s)
180
160
 Monte Carlo Simulation:
• AMI, Smart metering may help, but still in preliminary stage.
Performance on Different Machines for IEEE37 Testfeeder
Performance 4 core Core2Extreme 2.66 GHz
0
-10
-5
0
5
10
16
64
Node (Bus) Numbers
15
Core2Extreme
2.66GHz
(4-core, SSE)
256
Xeon X5680
3.33GHz
(6-core, SSE)
Core i5-2400
3.10GHz
(4-core, AVX)
2 Xeon7560
2.27GHz
(16-core, SSE)
Translate speed (Gflop/s) into run time:
Input with
Randomness
Physically
Deterministic
Method: Load
Flow
Output result
with probability
features
 Distribution System Probabilistic Monitoring System
x
Algorithm
Campus Network
Internet
Web Server
@ ECE CMU
Web UI
y
Problem Size
(IEEE Test Feeders)
Approx.
flops
Approx. Time /
Core2 Extreme
Approx. Time
/ Core i5
Baseline. C++ ICC –o3
(~300x faster then Matlab)
IEEE37: one iteration
12 K
~ 0.3 us
~ 0.3 us
IEEE37: one load flow (5 Iter)
60 K
~ 1.5 us
~ 1.5 us
IEEE37: 1 million load flow
60 G
~<2s
~<1s
~ 60 s (>5 hrs Matlab)
IEEE123: 1 million load flow
200 G
~ < 10 s
~ < 3.5 s
~ 200 s (>15 hrs Matlab)
Comments
0.01 kVA error
SCADA Interval:
4 seconds
 Monte Carlo Results and Web User Interface
Monte Carlo solver
Simulated Meters Using Computing Nodes
Active Role of Distribution System in Smart Grid
100 run
Phase A
*Original figure by Andrew Hsu (EESG)
Web UI
 Advances in High Performance Computing
Multi-core Desktop
Computing Server
Web UI
Data Aggregator Meter
Meter
10000 run
100000 run
1000000 run
10000000 run
20
20
20
20
20
15
15
15
15
15
15
10
10
10
10
10
10
5
5
5
5
5
5
0
0
0
0
0
0.8
Phase B
Web UI
1000 run
20
0.9
1
0.8
0.9
1
0.8
0.9
1
0.8
0.9
1
0
0.8
0.9
1
0.8
100
100
100
100
100
100
80
80
80
80
80
80
60
60
60
60
60
60
40
40
40
40
40
40
20
20
20
20
20
20
0
0
0
0
0
0
0.9
1
Out: Voltage on
Node 738
-3
4
0.98
1.02
0.98
1
1.02
0.98
1
1.02
0.98
1
1.02
0.98
1
1.02
0.98
1
1.02
3
2
Year 2010:
Phase C
 Aggressive Code Optimizations on Monte Carlo Solver
Image: Dell
Workstation
+ GPU accelerator
1
x 10
• Data structure optimization
+
Core i7 + Tesla C1060
1Tflop/s peak performance
$5,000 class
• Algorithm level optimization
Code Generation (Spiral)
Substation
N0
Image: Nvidia
N2
Load
N4 N5
Load
Load
Load
pN0
pN5
pN1
pN4
pN2
pN3
pN3
pN2
pN4
pN1
pN5
pN0
 High Performance Computing Enabled Power System Solution
• Monte Carlo based Probabilistic Solution of Distribution System:
• “Golden standard” for probabilistic power flow
• Well fitted problem on modern computing platform
• Extensible to contingency analysis, steady state time-series…
An Affordable Supercomputing Center for Distribution Substation
Pseudo-code for Matrix-Vector Multiplication:
Matrix Pattern:
N5 Data:
Parameters
Current
Voltage
N4 Data:
Parameters
Current
Voltage
r
0
0
0
r
0
0
0
r
i
0
0
0
i
0
0
0
i
c11 c12 c13
c21 c22 c23
c31 c32 c33
Array Access
r: real number,
i: imaginary number
c: complex number
• Require knowledge in specific domain & computer architecture
• Potential Benefit: Moore’s law!
Consecutive
in Data Array
...
• NO free speedup, parallelism, memory , special instructions
N3
Backward
Sweep
Pointer
Array
...
• Fully utilizing the computing power is a difficult problem
Load
N1
Forward
Sweep
Pointer
Array
• SIMD vectorization
// input[0~2]: vector real part;
// input[3~5]: vector imaginary part;
// constant: parameters (r or c on left).
switch (matrix_pattern){
case real_diagonal_equal_matrix:
output[0] = *constant * input[0];
...
output[5] = *constant * input[5];
break;
case imag_diagonal_equal_matrix:
output[0] = -*constant * input[3];
output[1] = -*constant * input[4];
output[2] = -*constant * input[5];
output[3] = *constant * input[0];
output[4] = *constant * input[1];
output[5] = *constant * input[2];
break;
//cases for other frequent matrix patterns
...
default: //default full 3x3 complex matrix
...
}
• Multithreading:
Switch Buffer A,B
Scalar Register:
1 float
Scalar Instructions
Sample
FBS Load Flow
Result
Vector Register:
4 floats in SSE
Computing Thd N
RNG & Load Flow in Buf AN
RNG & Load Flow in Buf BN
Computing Thd 2
RNG & Load Flow in Buf A2
RNG & Load Flow in Buf B2
Computing Thd 1
RNG & Load Flow in Buf A1
RNG & Load Flow in Buf B1
SIMD Instructions
Scheduling Thd 0
Sample
FBS Load Flow
Result
KDE in all Buf Bs Result Out
RNG: Random Number Generator
KDE: Kernel Density Estimation
Real Time Interval
(SCADA Interval)
Sync
Signal Out
KDE in all Buf As Result Out
Sync Signal
Squeezing Computation Power out of the Computer Architecture.
Push Performance to the Hardware Peak.
150
150
150
150
150
150
100
100
100
100
100
100
50
50
50
50
50
50
0
0.93
0.94 0.95
voltage, (p.u.)
0
0.93
0.94 0.95
voltage, (p.u.)
0
0.93
0.94 0.95
voltage, (p.u.)
0
0.93
0.94 0.95
voltage, (p.u.)
0
0.93
0.94 0.95
voltage, (p.u.)
0
0.93
1
0
-400
0.94 0.95
voltage, (p.u.)
-200
0
200
Active power, kw
400
In: Active Power
P~ u=0,std=100kw
on Phase A of
Node 738,711,741
Conclusions & Future Work
 Program optimization / parallelization:
• Enable fast computation of large amount of power flow.
 Performance can be further increased on new platform:
• Tracking new development in CPU micro-architecture.
• GPU: small, less powerful but many more cores.
 Applications of fast distribution power flow solver:
• Probabilistic monitor of distribution system
• Fast time series solution; statistical analysis…
Acknowledgement & Publications
• This work is supported by NSF through awards 0931978 and 0702386.
• Related Publications:
1. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Distribution Power Flow,”
The 43rd North American Power Symposium (NAPS), Boston, USA, Aug 2011.
2. T. Cui, F. Franchetti, “A Multi-Core High Performance Computing Framework for Probabilistic Solutions of Distribution Systems,”
IEEE PES General Meeting 2012, San Diego, CA, USA.
Download