Opportunities for Machine Learning in Stochastic Optimization, with Applications in Energy Resource Planning

Cornell University
Computational Sustainability Seminar
March 5, 2010
Warren Powell
CASTLE Laboratory
Princeton University
http://www.castlelab.princeton.edu
© 2010 Warren B. Powell, Princeton University
Research challenges in energy
R&D portfolio optimization
Stochastic storage problems
» Making commitments for wind energy in the presence of storage and pricing uncertainties
» How to allocate energy from wind and sun to different types of storage options
Learning by doing – How to plan tax subsidies to encourage adoption, given the uncertainty in how technology evolves in the marketplace.
Planning technology investment and replacement – Multidecade investment planning under uncertainty regarding energy technology, tax policy, commodity prices and our understanding of climate change.
R&D Portfolio optimization
Ceramic (solid oxide) electrolyte
» Used for stationary power production; high operating temperatures allow the use of CH4 and other non-H2 gases.
» Research areas: anode, cathode, electrolyte, bipolar plates, seal and pressure vessel.
R&D Portfolio optimization
Components, technologies and the parameters used to model them:

Component        Technology             Parameter
Anode            Surface Area           ASA
                 Power Density          APD
                 Production Cost        ACOST
Cathode          Surface Area           CSA
                 Power Density          CPD
                 Production Cost        CCOST
Electrolyte      Reaction Stability     ERS
                 Degradation            EDEG
                 Production Cost        ECOST
Bipolar Plates   Temperature Stability  BTS
                 Conductivity           BCON
                 Production Cost        BCOST
Seal             Temperature Stability  STS
                 Chemical Stability     SCS
                 Production Cost        SCOST
Pressure Vessel  Design                 PDES
                 Production Cost        PCOST
R&D Portfolio optimization
One-stage portfolio optimization problem:

$\min_x \mathbb{E}\, F(x, W)$

where $x = (0,1,1,0,0,0,0,1,0,1,1,1,0,\ldots)$ is the portfolio of proposals to support, $W$ is the random outcome of the research, and $F$ is a nonlinear, nonseparable cost function. With this many binary choices there are on the order of $10^{81}$ possible portfolios.
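Even evaluating a single portfolio requires estimating an expectation over the random research outcomes. A minimal sketch of how one might do this by Monte Carlo sampling, where the cost function `portfolio_cost` and the outcome sampler `sample_outcome` are hypothetical stand-ins for the real model:

```python
import numpy as np

def portfolio_cost(x, w):
    # Hypothetical nonlinear, nonseparable cost of funding portfolio x
    # when the research outcomes are w (placeholder for the real model).
    return -np.sum(x * w) + 0.01 * np.sum(x) ** 2

def sample_outcome(rng, n_proposals):
    # Hypothetical random research outcomes W, one per proposal.
    return rng.lognormal(mean=0.0, sigma=1.0, size=n_proposals)

def estimate_expected_cost(x, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[F(x, W)] for a fixed 0/1 portfolio x."""
    rng = np.random.default_rng(seed)
    costs = [portfolio_cost(x, sample_outcome(rng, len(x))) for _ in range(n_samples)]
    return float(np.mean(costs))

x = np.array([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0])
print(estimate_expected_cost(x))
```

The hard part, of course, is not evaluating one portfolio but searching over the astronomically many candidate portfolios with a limited measurement budget.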
Optimal control of wind and storage
Wind
» Varies with multiple frequencies (seconds, hours, days, seasonal).
» Spatially uneven, and generally not aligned with population centers.
Solar
» Shines primarily during the day (when it is needed), but not entirely reliably.
» Strongest in the south/southwest.
Optimal control of wind and storage
[Figure: wind power output over 30 days and over 1 year]
Optimal control of wind and storage
Storage options:
» Hydroelectric
» Batteries
» Flywheels
» Ultracapacitors
Optimal control of wind and storage
The optimization of energy flows into and out of storage may be best formulated as a policy optimization problem:

$\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X^\pi(S_t)\big)$

» $\pi$ – the policy, consisting of a vector of tunable parameters.
» $S_t$ – the state variable, capturing the energy, pressure, latency and age of energy in each storage device.
» $X^\pi(S_t)$ – the decision function, parameterized by the tunable parameters in $\pi$.
» How do we formulate tunable decision functions (policies), and how do we tune multidimensional vectors of parameters?
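As an illustration of what a tunable decision function might look like, here is a minimal sketch of a simple two-parameter threshold policy for charging and discharging a single storage device, evaluated by simulation and tuned by a crude grid search. All names, dynamics and parameter values are hypothetical, not the model in the talk:

```python
import numpy as np

def threshold_policy(state, theta):
    """Hypothetical policy X^pi(S_t): charge when the spot price is low,
    discharge when it is high; theta = (low_price, high_price)."""
    storage, price = state
    if price <= theta[0]:
        return min(1.0, 10.0 - storage)   # charge, up to capacity 10
    if price >= theta[1]:
        return -min(1.0, storage)         # discharge what we have
    return 0.0

def simulate_policy(theta, T=1000, seed=0):
    """Estimate the average cost of following the policy for T hours."""
    rng = np.random.default_rng(seed)
    storage, total_cost = 5.0, 0.0
    for _ in range(T):
        price = 30.0 + 10.0 * rng.standard_normal()   # toy price process
        x = threshold_policy((storage, price), theta)
        total_cost += price * x                       # pay to charge, earn to discharge
        storage = min(10.0, max(0.0, storage + x))
    return total_cost / T

# Crude tuning of the two thresholds by grid search; any derivative-free
# stochastic search could be used instead.
grid = [(lo, hi) for lo in range(15, 30, 2) for hi in range(31, 46, 2)]
best = min(grid, key=lambda th: simulate_policy(th))
print(best, simulate_policy(best))
```

Real policies have many more parameters, which is exactly why tuning multidimensional parameter vectors is posed as a research question.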
Energy resource modeling
Need to plan long term energy investments…
[Figure: investment timeline from 2010 to 2030, with uncertainties in tax policy, solar panels, batteries, the price of oil, carbon capture and sequestration, and climate change]
Energy resource modeling
… in the presence of hourly wind variations …
Energy resource modeling
... and storage.
[Figure: time series of water in reservoir, rainfall and demand]
Energy resource modeling
Stochastic, multiscale policy modeling
» We need to plan energy use and storage in the presence
of hourly wind variations, seasonal rainfall and
demand, while optimizing energy investments over a
multidecade horizon.
$V_t(S_t) = \min_x \Big( C(S_t, x_t) + \gamma \,\mathbb{E}\, V_{t+1}\big(S_{t+1}(S_t, x_t, W_{t+1})\big) \Big)$

» $S_t$ – amount in storage, state of wind, rainfall, demand, prices, and current energy investments.
» $x_t$ – how much to store and where, and what technologies to invest in.
» $W_{t+1}$ – random information about wind, rain, demand, prices, and changes in energy technology, policy and climate.
Energy resource modeling
Notes:
» 200,000 time periods for full policy model.
» 20 million variables for a deterministic linear program.
» Many sources of uncertainty:
• Fine-grained:
– Hourly variations in wind, solar, demand
– Daily variations in prices, rainfall, snowmelt
– Seasonal variations in weather, prices and demand
• Coarse-grained:
– Breakthrough in batteries?
– Can we sequester carbon?
– Is there a carbon tax?
– What is our understanding of climate change?
Outline
Dynamic programming and the curses of
dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem
Introduction to dynamic programming
Bellman’s optimality equation:

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$

States:
» R&D portfolio optimization – a vector of several dozen technology parameters.
» Wind energy and storage – a vector of roughly a dozen parameters describing storage, wind history and price history.
» Energy investment policy – a vector of energy investments, energy storage, and the “state of the world” – several dozen (aggregate) to several thousand (disaggregate) dimensions.
Introduction to dynamic programming
Bellman’s optimality equation:

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$

Problem: the three curses of dimensionality
» State space
» Outcome space
» Action space (feasible region)
Introduction to dynamic programming
Bellman’s optimality equation:

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$

Goal: We would like a robust method for solving general classes of these problems:
» Vector-valued states, actions and information.
» States may have discrete, continuous and categorical elements.
» Action/decision vectors may have dozens to thousands of dimensions.
» No tunable parameters!
Introduction to dynamic programming
The computational challenges:

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$

» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?
Introduction to ADP
Classical ADP
» Most applications of ADP focus on the challenge of handling multidimensional state variables.
» Start with

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$

» Now replace the value function with some sort of approximation, for example a linear model over basis functions:

$V_{t+1}(S_{t+1}) \approx \bar{V}_{t+1}(S_{t+1}) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_{t+1})$
Introduction to ADP
But this does not solve our problem.
» Assume we have an approximate value function.
» We still have to solve a problem that looks like

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\Big[ \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_{t+1}) \Big] \Big)$

» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) that contains an expectation.
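To make the basis-function idea concrete, here is a minimal sketch (the features, states and data are hypothetical) that fits the coefficients $\theta$ by least squares from sampled states and observed values, and then evaluates the resulting approximation:

```python
import numpy as np

def features(state):
    """Hypothetical basis functions phi_f(S): constant, levels and squares."""
    s = np.asarray(state, dtype=float)
    return np.concatenate(([1.0], s, s ** 2))

def fit_theta(states, observed_values):
    """Least-squares fit of V(S) ~ sum_f theta_f * phi_f(S)."""
    Phi = np.array([features(s) for s in states])
    theta, *_ = np.linalg.lstsq(Phi, np.asarray(observed_values), rcond=None)
    return theta

def value_approx(state, theta):
    return float(features(state) @ theta)

# Toy example: states are two-dimensional resource levels.
rng = np.random.default_rng(0)
states = rng.uniform(0, 10, size=(200, 2))
values = 3.0 * states[:, 0] - 0.2 * states[:, 0] ** 2 + states[:, 1] + rng.normal(0, 0.5, 200)
theta = fit_theta(states, values)
print(value_approx([4.0, 2.0], theta))
```

In ADP the "observed values" are themselves generated recursively from the approximation, which is where the expectation and the maximization complicate matters.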
The post-decision state
New concept:
» The classical dynamic system:

$S_{t+1} = S^M(S_t, x_t, W_{t+1})$

» The “pre-decision” state variable:
$S_t$ = the information required to make a decision $x_t$.
» The “post-decision” state variable:
$S_t^x$ = the state of what we know immediately after we make a decision.
The post-decision state
Classical form of Bellman’s equation:

$V_t(S_t) = \max_{x \in \mathcal{X}} \Big( C_t(S_t, x_t) + \gamma \,\mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid S_t \big] \Big)$

Bellman’s equations around pre- and post-decision states:
» Optimization problem (making the decision):

$V_t(S_t) = \max_x \Big( C_t(S_t, x_t) + V_t^x\big( S^{M,x}(S_t, x_t) \big) \Big)$

Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):

$V_t^x(S_t^x) = \mathbb{E}\big[ V_{t+1}\big( S^{M,W}(S_t^x, W_{t+1}) \big) \mid S_t^x \big]$
The post-decision state
Challenges
» For most practical problems, we are not going to be able to compute $V_t^x(S_t^x)$ in

$V_t(S_t) = \max_x \Big( C_t(S_t, x_t) + V_t^x(S_t^x) \Big)$

» Concept: replace it with an approximation $\bar{V}_t(S_t^x)$ and solve

$V_t(S_t) = \max_x \Big( C_t(S_t, x_t) + \bar{V}_t(S_t^x) \Big)$

» So now we face:
• What should the approximation look like?
• How do we estimate it?
The post-decision state
Value function approximations:
» Linear (in the resource state):

$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$

» Piecewise linear, separable:

$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$

» Indexed piecewise linear, separable:

$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\big(R_{ta}^x \mid \text{features}_t\big)$
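To make the piecewise linear, separable form concrete, here is a minimal sketch (the breakpoints and slopes are hypothetical) of a concave piecewise linear value of holding one resource, stored as marginal values per unit, and summed across resource types:

```python
import numpy as np

class PiecewiseLinearValue:
    """Concave piecewise linear value of holding r units of one resource,
    represented by the marginal value of each additional unit."""
    def __init__(self, slopes):
        # Decreasing slopes => concave function (diminishing value of storage).
        self.slopes = np.asarray(slopes, dtype=float)

    def __call__(self, r):
        full, frac = int(np.floor(r)), r - int(np.floor(r))
        full = min(full, len(self.slopes))
        value = self.slopes[:full].sum()
        if full < len(self.slopes):
            value += frac * self.slopes[full]
        return float(value)

# Separable approximation: a sum of one-dimensional pieces, one per resource a.
vfa = {"hydro": PiecewiseLinearValue([10, 8, 5, 2, 1]),
       "battery": PiecewiseLinearValue([6, 4, 1])}

def separable_value(resource_levels):
    return sum(vfa[a](r) for a, r in resource_levels.items())

print(separable_value({"hydro": 2.5, "battery": 1.0}))
```

Concavity of each one-dimensional piece is what makes the later convergence argument (for a single storage device) go through without explicit exploration.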
Approximate dynamic programming
[Figures: stepping forward in time through the ADP algorithm]
Approximate dynamic programming
With luck, your objective function improves:
[Figure: objective function versus iteration (0 to 1000), improving from roughly 1.2 million toward 1.9 million]
Approximate dynamic programming
Step 1: Start with a pre-decision state $S_t^n$.
Step 2 (deterministic optimization): Solve the deterministic optimization using an approximate value function,

$\hat{v}_t^n = \max_x \Big( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}\big( S^{M,x}(S_t^n, x_t) \big) \Big)$,

to obtain $x_t^n$.
Step 3 (recursive statistics): Update the value function approximation:

$\bar{V}_{t-1}^{n}(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1}\, \hat{v}_t^n$

Step 4 (simulation): Obtain a Monte Carlo sample of $W_t(\omega^n)$ and compute the next pre-decision state:

$S_{t+1}^n = S^M\big(S_t^n, x_t^n, W_{t+1}(\omega^n)\big)$

Step 5: Return to step 1.
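A minimal sketch of this forward pass for a toy storage problem, using a lookup-table approximation around the post-decision state. Every model function, state and parameter here is hypothetical, not the model in the talk:

```python
import numpy as np

CAPACITY, ACTIONS, T, N_ITER = 10, (-1, 0, 1), 24, 200
rng = np.random.default_rng(0)
vbar = np.zeros((T + 1, CAPACITY + 1))      # lookup-table VFA on the post-decision storage level

def contribution(x, price):
    return -price * x                        # pay to charge (x=+1), earn to discharge (x=-1)

def post_state(storage, x):
    return int(np.clip(storage + x, 0, CAPACITY))

for n in range(N_ITER):
    alpha = 1.0 / (n + 1)                    # harmonic stepsize
    storage = CAPACITY // 2                  # Step 1: initial pre-decision state
    for t in range(T):
        price = 30 + 10 * rng.standard_normal()
        # Step 2: deterministic optimization against the current approximation
        values = [contribution(x, price) + vbar[t, post_state(storage, x)] for x in ACTIONS]
        best = int(np.argmax(values))
        x, vhat = ACTIONS[best], values[best]
        # Step 3: recursive update of the previous post-decision value
        if t > 0:
            vbar[t - 1, prev_post] = (1 - alpha) * vbar[t - 1, prev_post] + alpha * vhat
        prev_post = post_state(storage, x)
        # Step 4: simulate exogenous information (here only the price is random)
        storage = prev_post                  # next pre-decision state
print(np.round(vbar[0], 2))
```

The sketch hides the real difficulties: choosing the stepsize, choosing the approximation architecture, and deciding which states to visit, which are exactly the research challenges listed next.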
Approximate dynamic programming
Research challenges
» How do we approximate the value function?
» How do we update the value function with new
information (stepsizes)?
» How do we solve the exploration-exploitation
problem so we sample the right states to generate
the best approximation?
Outline
Dynamic programming and the curses of
dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem
Energy resource modeling
The dispatch problem
Energy resource modeling
Storage
» We can hold energy in water reservoirs, batteries and flywheels, carrying it from hour t to hour t+1.
[Figure: energy flows from hour t to hour t+1, annotated with the value of holding water in the reservoir for future time periods]
Energy resource modeling
[Figure: the hourly dispatch model steps through hours 1, 2, 3, 4, …, 8760 of 2008 and then continues into 2009]
Energy resource modeling
[Figure: the 2008 model hands off resource states for oil, wind, natural gas and coal to the 2009 model]
Energy resource modeling
[Figure: piecewise linear value functions $\bar{V}_t(R_t)$ for 2009 – value versus amount of resource – for oil, wind, natural gas and coal]
Energy resource modeling
[Figure: the model steps year by year from 2008 through 2038; each annual model solves in roughly 5 seconds]
SMART – a stochastic, multiscale model
Use statistical methods to learn the value $\bar{V}_t(R_t)$ of resources in the future. Resources may be:
» Stored energy
» Storage capacity
• Batteries
• Flywheels
• Compressed air
• Hydro
• Flywheel energy
• …
» Energy transmission capacity
• Transmission lines
• Gas lines
• Shipping capacity
» Energy production sources
• Wind mills
• Solar panels
• Nuclear power plants
Energy resource modeling
[Figure: optimal reservoir level, rainfall and demand from the linear program for the deterministic model]
[Figure: reservoir level, rainfall and demand under the approximate dynamic programming (ADP) solution]
Energy resource modeling
[Figure: ADP reservoir levels at the last iteration versus the optimal levels for individual scenarios, under stochastic rainfall, over 800 time periods]
Energy resource modeling
Notes:
» The algorithm is provably convergent to the optimal policy as long as there is only one type of storage.
J. Nascimento and W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Energy Dispatch Problem with Grid-Level Storage,” under review, SIAM J. Control and Optimization.
» The algorithm uses no exploration; convergence comes from the concavity of the value function.
» But we would like to solve much more general problems.
Outline
Dynamic programming and the curses of
dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem
Machine learning for general value functions
Most papers in approximate dynamic programming use one of four strategies:
» Lookup tables – estimate $\bar{V}(S)$ for each discrete state $S$.
» Basis functions:

$V_t(S_t) \approx \bar{V}_t(S_t) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_t)$

» Neural networks – popular in engineering.
» Nonparametric regression – growing in popularity.
Lookup tables
Strategies for lookup table representations:
» One value for each state
• This is what blows up with the curse of dimensionality.
» Aggregate states, and estimate the value for each aggregated state (a related method is tile coding)
• What level of aggregation? The best level of aggregation changes with the iterations, and depends on the state.
» Aggregate states, and then steadily move to more disaggregate representations
• The right level of aggregation depends on how often you visit a state.
• How do we determine the best rate at which to move to more disaggregate representations?
Hierarchical aggregation
[Figures: successive levels of aggregation of the state space]
Approximating value functions
Hierarchical aggregation
» The standard strategy is to use a linear combination of the estimates at different levels of aggregation:

$\bar{v}_s^n = \sum_{g \in \mathcal{G}} \theta^{g,n} \bar{v}_s^{g,n}$

» Update the coefficients using a stochastic approximation procedure:

$\theta^{g,n+1} = \theta^{g,n} - \alpha_n \nabla f(\theta^n, \hat{v}^{n+1})$

» Challenges:
• A single set of weights (they do not depend on the state).
• Scaling problems!
Approximating value functions
Hierarchical aggregation
» Approximate the function using a weighted sum of estimates at different levels of aggregation:

$\bar{v}_s^n = \sum_{g \in \mathcal{G}} w_s^{g,n} \bar{v}_s^{g,n}$

» The weights are computed using the inverse of the total variation (the variance of each estimate plus the square of its bias):

$w_s^{g,n} = \dfrac{\left( (\bar{\sigma}_s^{g,n})^2 + (\bar{\mu}_s^{g,n})^2 \right)^{-1}}{\sum_{g' \in \mathcal{G}} \left( (\bar{\sigma}_s^{g',n})^2 + (\bar{\mu}_s^{g',n})^2 \right)^{-1}}$

» The weights depend on $s$, which means that there can be millions of weights.
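A minimal sketch of this weighting scheme for a single state estimated at two levels of aggregation; the estimates, variances and biases below are hypothetical numbers standing in for the sample statistics one would track during the ADP iterations:

```python
import numpy as np

def aggregation_weights(variance, bias):
    """Weights proportional to the inverse of (variance + bias^2) at each level."""
    total_variation = np.asarray(variance, float) + np.asarray(bias, float) ** 2
    inv = 1.0 / total_variation
    return inv / inv.sum()

def weighted_estimate(level_estimates, variance, bias):
    w = aggregation_weights(variance, bias)
    return float(np.dot(w, level_estimates)), w

# Toy example: a noisy but unbiased disaggregate estimate and a smoother
# but biased aggregate estimate for the same state.
level_estimates = [12.4, 10.0]      # [disaggregate, aggregate]
variance = [4.0, 0.5]               # sampling variance of each estimate
bias = [0.0, 2.0]                   # estimated bias relative to the disaggregate level
est, w = weighted_estimate(level_estimates, variance, bias)
print("weights:", np.round(w, 3), "estimate:", round(est, 3))
```

Early on, when the disaggregate estimates are based on few observations, their variance dominates and the weight shifts toward the aggregate levels; as observations accumulate, the weight shifts back toward the disaggregate levels, which is the behavior shown in the next figure.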
Hierarchical aggregation
[Figure: over roughly 1200 iterations, the weight on the most disaggregate level grows toward about 0.3 while the weights on the most aggregate levels shrink toward zero]
Multiple levels of aggregation
[Figure: objective function versus iterations (0 to 1000) using a single aggregate level versus a single disaggregate level]
Multiple levels of aggregation
[Figure: objective function versus iterations (0 to 1000) for the weighted combination, compared with the aggregate and disaggregate approximations]
Basis functions for continuous states
Imagine that we know a set of basis functions $\phi_f(S)$ such that

$V(S \mid \theta) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(S)$

» Can we design an implementable algorithm that will find $\theta$?
• Yes, but approximate value iteration does not work!
Approximate value iteration
This is the forward-pass algorithm described above: solve the deterministic optimization with the current approximation to obtain $\hat{v}_t^n$ and $x_t^n$, update the value function approximation with $\hat{v}_t^n$, then simulate one step forward to the next pre-decision state.
Approximate policy iteration
Step 1: Start with a pre-decision state $S^n$.
Step 2: Inner loop: do for $m = 1, \ldots, M$:
Step 2a: Solve the deterministic optimization using an approximate value function,

$\hat{v}^m = \max_x \Big( C(S^m, x) + \bar{V}^{n-1}\big( S^{M,x}(S^m, x) \big) \Big)$,

to obtain $x^m$.
Step 2b: Update the value function approximation:

$\bar{V}^{n-1,m}(S^{x,m}) = (1 - \alpha_{m-1})\, \bar{V}^{n-1,m-1}(S^{x,m}) + \alpha_{m-1}\, \hat{v}^m$

Step 2c: Obtain a Monte Carlo sample of $W(\omega^m)$ and compute the next pre-decision state:

$S^{m+1} = S^M\big(S^m, x^m, W(\omega^m)\big)$

Step 3: Update $\bar{V}^n(S)$ using $\bar{V}^{n-1,M}(S)$ and return to step 1.
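A minimal sketch of this structure on the same kind of toy storage model used earlier: the policy implied by the previous approximation is held fixed during the inner simulation loop, and only at the end of the loop does the approximation get replaced. All names, dynamics and parameters are hypothetical:

```python
import numpy as np

CAPACITY, ACTIONS, M_INNER, N_OUTER = 10, (-1, 0, 1), 500, 20
rng = np.random.default_rng(1)

def contribution(x, price):
    return -price * x                        # pay to charge (x=+1), earn to discharge (x=-1)

def post_state(storage, x):
    return int(np.clip(storage + x, 0, CAPACITY))

vbar = np.zeros(CAPACITY + 1)                # value of each post-decision storage level
for n in range(N_OUTER):
    vbar_prev = vbar.copy()                  # the policy is defined by the previous approximation
    vbar_new, counts = np.zeros(CAPACITY + 1), np.zeros(CAPACITY + 1)
    storage = CAPACITY // 2
    for m in range(M_INNER):
        price = 30 + 10 * rng.standard_normal()
        # Step 2a: decision from the fixed policy (argmax against vbar_prev)
        values = [contribution(x, price) + vbar_prev[post_state(storage, x)] for x in ACTIONS]
        x, vhat = ACTIONS[int(np.argmax(values))], float(max(values))
        # Step 2b: update the new approximation at the visited post-decision state
        s_post = post_state(storage, x)
        counts[s_post] += 1
        alpha = 1.0 / counts[s_post]
        vbar_new[s_post] = (1 - alpha) * vbar_new[s_post] + alpha * vhat
        # Step 2c: simulate forward (only the price is exogenous in this toy model)
        storage = s_post
    vbar = vbar_new                          # Step 3: replace the approximation and repeat
print(np.round(vbar, 2))
```

Holding the policy fixed while fitting the approximation is what distinguishes this scheme from the approximate value iteration loop above.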
Dirichlet mixture model
Dirichlet process mixtures of generalized linear
regression models (DP-GLM).
» Cluster states and estimate linear regressions around
each cluster:
Dirichlet mixture model
Features:
» Handles continuous, discrete and categorical states.
» Naturally adapts from early iterations with few observations to
later iterations with more observations.
» Naturally adapts to heteroscedastic data.
Merging machine learning and optimization
The value of a resource
depends on the “state of
the world.”
» Is there a carbon tax?
» What is the state of battery
research?
» Have there been major new
oil discoveries?
» What is the price of oil?
» Did the international
community adopt strict
limits on carbon emissions?
» Have there been advances in our understanding of climate change?
• Instead of one piecewise linear
value function for each resource
and time period…
• We need one for each state of the
world. There can be thousands of
these.
Outline
Dynamic programming and the curses of
dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem
» Peter Frazier (faculty member in ORIE at Cornell)
» Ilya Ryzhov (graduate of ORIE)
» Warren Scott (Ph.D. student at Princeton)
» Martijn Mes (University of Twente)
The exploration vs. exploitation problem
The decision depends on our estimate of future value.
[Figures: a sequence of decisions on a map of locations (MN, CO, NY, CA, TX). Initially every value estimate $\bar{V}^0$ is zero, so the decision is driven entirely by the immediate rewards ($125–$800 per move). As moves are made, the value estimates are updated (for example $\bar{V}^1(TX) = 450$), and later decisions depend on these estimates as well as the immediate rewards.]
The exploration vs. exploitation problem
Current research:
» We are focusing on the problem of maximizing an unknown function:

$\max_x \mathbb{E}\, F(x, W)$

» We assume a measurement $x$ is “expensive.”
» Important problem dimensions:
• The complexity of the measurement $x$.
• The nature of the belief structure.
• The complexity of the decision problem which uses the information we are collecting.
The exploration vs. exploitation problem
The knowledge gradient:
» Assume you can make only one measurement, after which you have to make a final choice (the implementation decision).
» What choice would you make now to maximize the expected value of the implementation decision?
[Figure: five options; a measurement of option 5 changes our estimate of its value]
The exploration vs. exploitation problem
The knowledge gradient
» The knowledge gradient is the marginal (expected) value of a single measurement $x$, given by

$\nu_x^{KG} = \mathbb{E}\Big[ \max_y \bar{F}\big(y, K(x)\big) \Big] - \max_y \bar{F}(y, K)$

where
• $x$ is the measurement decision and $y$ is the implementation decision;
• $K$ is the current knowledge state and $K(x)$ is the updated knowledge state after measuring $x$;
• the first term is the expectation, over the possible measurement outcomes, of the optimization (implementation) problem solved with the new knowledge, and the second term is the optimization problem given what we know now.
» The knowledge gradient policy chooses the measurement with the highest marginal value.
» This can be viewed as a kind of coordinate ascent algorithm.
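For the special case of independent normal beliefs about each alternative, the knowledge gradient has a known closed form; here is a minimal sketch, where `mu` and `sigma` are the current posterior means and standard deviations and `noise_std` is the measurement noise (all numbers below are hypothetical):

```python
import math
import numpy as np

def _phi(z):   # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def knowledge_gradient(mu, sigma, noise_std):
    """Knowledge gradient nu_x^KG for independent normal beliefs."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    # Std dev of the change in the posterior mean from one measurement of x.
    sigma_tilde = sigma ** 2 / np.sqrt(sigma ** 2 + noise_std ** 2)
    kg = np.empty_like(mu)
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))
        zeta = -abs(mu[x] - best_other) / sigma_tilde[x]
        kg[x] = sigma_tilde[x] * (zeta * _Phi(zeta) + _phi(zeta))
    return kg

mu = [4.0, 3.5, 5.0, 2.0, 4.8]       # current estimates of each alternative
sigma = [2.0, 0.5, 0.3, 3.0, 1.0]    # how uncertain we are about each
kg = knowledge_gradient(mu, sigma, noise_std=1.0)
print(np.round(kg, 4), "-> measure alternative", int(np.argmax(kg)))
```

Note that the best alternative to measure is generally not the one with the highest estimated value: an uncertain alternative whose mean is close to the current best can have a much larger knowledge gradient.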
The knowledge gradient with correlated beliefs
An important problem class involves correlated beliefs – measuring one alternative tells us something about other alternatives.
[Figure: five alternatives; measuring one of them also shifts our beliefs about the neighboring alternatives]
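Correlated beliefs are typically carried as a multivariate normal distribution over the values of all the alternatives. A minimal sketch (the covariance model is a hypothetical squared-exponential prior) of how one noisy measurement of a single alternative updates the means and covariances of every alternative:

```python
import numpy as np

def update_correlated_beliefs(mu, Sigma, x, y_obs, noise_var):
    """Bayesian update of a multivariate normal belief (mu, Sigma)
    after observing y_obs = value[x] + noise at alternative x."""
    Sigma_x = Sigma[:, x]                          # covariance of every alternative with x
    gain = Sigma_x / (Sigma[x, x] + noise_var)     # Kalman-style gain vector
    mu_new = mu + gain * (y_obs - mu[x])
    Sigma_new = Sigma - np.outer(gain, Sigma_x)
    return mu_new, Sigma_new

# Toy prior: nearby alternatives are strongly correlated.
n = 5
mu = np.zeros(n)
Sigma = np.array([[np.exp(-0.5 * (i - j) ** 2) for j in range(n)] for i in range(n)])
mu, Sigma = update_correlated_beliefs(mu, Sigma, x=2, y_obs=1.5, noise_var=0.25)
print(np.round(mu, 3))   # the measurement at alternative 2 also moves alternatives 1 and 3
```

The knowledge gradient with correlated beliefs takes the expectation of the best posterior mean under this kind of update, so a single measurement is credited with what it teaches us about all of its neighbors.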
The exploration vs. exploitation problem
» Approximate the function using hierarchical aggregation; from this model we can infer correlated beliefs.
» Exploit the correlated beliefs when computing the knowledge gradient for this surface.
[Figure: a two-dimensional surface over concentration and temperature, used to test the performance of the knowledge gradient with correlated beliefs]
Measuring two-dimensional surfaces
Initially we think the concentration is the same everywhere.
[Figure: estimated performance and knowledge gradient surfaces before any measurements]
» We want to measure where the knowledge gradient is highest. This is the measurement that teaches us the most.
Measuring two-dimensional surfaces
After four measurements:
[Figure: estimated performance with the measurements marked, and the knowledge gradient showing the value of another measurement; the new optimum is at the same location]
» Whenever we measure at a point, the value of another measurement at the same point goes down. The knowledge gradient guides us toward measuring areas of high uncertainty.
Measuring two-dimensional surfaces
[Figures: estimated performance and knowledge gradient surfaces after five, six, seven, eight, nine and ten samples]
Measuring two-dimensional surfaces
After 10 measurements, our estimate of the surface:
[Figure: estimated performance compared with the true concentration]
Optimal learning in physical sciences
Materials research
» How do we find the best
material for converting
sunlight to electricity?
» What is the best battery
design for storing
energy?
» We need a method to sort
through potentially
thousands of
experiments.
Optimal learning in physical sciences
Designing molecules
» X and Y are sites where we can hang substituents to change the
behavior of the molecule
Optimal learning in physical sciences
We express our belief using a linear, additive QSAR model:
» $X_{ij}^m$ = indicator variable: 1 if molecule $m$ has substituent $j$ at site $i$, 0 otherwise.
» $Y^m = \theta_0 + \sum_{\text{sites } i} \sum_{\text{substituents } j} \theta_{ij} X_{ij}^m$
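A minimal sketch of this belief model: each candidate molecule is encoded as a 0/1 indicator vector over (site, substituent) pairs, and its predicted value is linear in those indicators. The sites, substituents and coefficient values below are hypothetical:

```python
import itertools
import numpy as np

sites = ["X", "Y"]
substituents = ["H", "CH3", "OH", "Cl"]
pairs = list(itertools.product(sites, substituents))   # all (site, substituent) pairs

def encode(molecule):
    """Indicator vector X_ij for a molecule given as {site: substituent}."""
    return np.array([1.0 if molecule.get(s) == sub else 0.0 for s, sub in pairs])

def predict(molecule, theta0, theta):
    """Linear, additive QSAR belief: Y = theta0 + sum_ij theta_ij * X_ij."""
    return theta0 + float(encode(molecule) @ theta)

# Hypothetical current beliefs about the contribution of each substitution.
rng = np.random.default_rng(0)
theta0, theta = 1.0, rng.normal(0, 0.5, len(pairs))

# Enumerate all candidate molecules and rank them under the current beliefs.
candidates = [dict(zip(sites, combo))
              for combo in itertools.product(substituents, repeat=len(sites))]
best = max(candidates, key=lambda m: predict(m, theta0, theta))
print(best, round(predict(best, theta0, theta), 3))
```

Because the belief is linear in a modest number of coefficients, measuring one compound updates the coefficients and therefore the predictions for every other compound, which is what makes it possible to search 10,000 candidates with a small number of experiments.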
Optimal learning in physical sciences
Learning the best of 10,000 compounds.
» Results from 15 sample paths