Opportunities for Machine Learning in Stochastic Optimization, with Applications in Energy Resource Planning
Cornell University Computational Sustainability Seminar, March 5, 2010
Warren Powell, CASTLE Laboratory, Princeton University
http://www.castlelab.princeton.edu
© 2010 Warren B. Powell, Princeton University

Research challenges in energy
» R&D portfolio optimization
» Stochastic storage problems
  • Making commitments for wind energy in the presence of storage and pricing uncertainties.
  • How to allocate energy from wind and sun to different types of storage options.
» Learning by doing – How to plan tax subsidies to encourage adoption, given the uncertainty of the evolution of technology in the marketplace.
» Planning technology investment and replacement – Multidecade investment planning under uncertainty regarding energy technology, tax policy, commodity prices and our understanding of climate change.

R&D portfolio optimization
» Ceramic (solid oxide) electrolyte fuel cells, used for stationary power production; high running temperatures allow the use of CH4 and other non-H2 gases.
» Research areas: anode, cathode, electrolyte, bipolar plates, seal and pressure vessel.

R&D portfolio optimization
Component – Technology – Parameter
» Anode: Surface Area (ASA), Power Density (APD), Production Cost (ACOST)
» Cathode: Surface Area (CSA), Power Density (CPD), Production Cost (CCOST)
» Electrolyte: Reaction Stability (ERS), Degradation (EDEG), Production Cost (ECOST)
» Bipolar Plates: Temperature Stability (BTS), Conductivity (BCON), Production Cost (BCOST)
» Seal: Temperature Stability (STS), Chemical Stability (SCS), Production Cost (SCOST)
» Pressure Vessel: Design (PDES), Production Cost (PCOST)

R&D portfolio optimization
One-stage portfolio optimization problem:
$\min_x \mathbb{E} F(x, W)$
» $x = (0,1,1,0,0,0,0,1,0,1,1,1,0,\ldots)$ is the portfolio of proposals to support – roughly $10^{81}$ possible portfolios.
» $W$ is the random outcome of the research.
» $F$ is a nonlinear, nonseparable cost function.

Optimal control of wind and storage
Wind
» Varies with multiple frequencies (seconds, hours, days, seasonal).
» Spatially uneven, generally not aligned with population centers.
Solar
» Shines primarily during the day (when it is needed), but not entirely reliably.
» Strongest in the south/southwest.

Optimal control of wind and storage
[Figures: wind power output over 30 days and over 1 year.]

Optimal control of wind and storage
Storage technologies: hydroelectric, batteries, flywheels, ultracapacitors.

Optimal control of wind and storage
The optimization of energy flows into and out of storage may be best formulated as a policy optimization problem:
$\min_\pi \mathbb{E} \sum_{t=0}^{T} C\bigl(S_t, X^\pi(S_t)\bigr)$
» $\pi$ is the policy, consisting of a vector of tunable parameters.
» $S_t$ is the state variable, capturing energy, pressure, latency and age of energy in each storage device.
» $X^\pi(S_t)$ is the decision function, parameterized by the tunable parameters in $\pi$.
» How do we formulate tunable decision functions (policies), and how do we tune multidimensional vectors of parameters?

Energy resource modeling
Need to plan long term energy investments…
[Figure: multidecade timeline, 2010–2030, with uncertainties in tax policy, solar panels, batteries, the price of oil, carbon capture and sequestration, and climate change.]

Energy resource modeling
… in the presence of hourly wind variations …

Energy resource modeling
... and storage.
[Figure: water in reservoir, rainfall, and demand over time.]

Energy resource modeling
Stochastic, multiscale policy modeling
» We need to plan energy use and storage in the presence of hourly wind variations, seasonal rainfall and demand, while optimizing energy investments over a multidecade horizon.
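To make the policy-optimization formulation $\min_\pi \mathbb{E}\sum_t C(S_t, X^\pi(S_t))$ concrete, here is a minimal sketch (not the model used in the talk): a threshold storage policy $X^\pi(S_t \mid \theta)$ with two tunable parameters, evaluated by Monte Carlo simulation and tuned by a crude search over candidate parameter vectors. The price process, wind process, costs and thresholds are all illustrative assumptions.

```python
# Minimal sketch of tuning a parameterized storage policy by simulation.
# Everything here (dynamics, costs, thresholds) is a hypothetical toy model.
import numpy as np

def simulate_policy(theta, T=24 * 30, seed=0):
    """Average cost of a 'charge when price is low, discharge when high' policy."""
    rng = np.random.default_rng(seed)
    low, high = theta                  # tunable thresholds = the policy parameters
    storage, cost = 0.0, 0.0
    for t in range(T):
        price = 30 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5)  # exogenous info W_t
        wind = max(0.0, rng.normal(5, 2))                                 # exogenous info W_t
        if price < low:                # decision X^pi(S_t): store wind energy
            storage = min(storage + wind, 100.0)
        elif price > high and storage > 0:   # discharge to offset purchases at high prices
            released = min(storage, 10.0)
            storage -= released
            cost -= price * released
        cost += price * 2.0            # fixed demand purchased from the grid
    return cost / T

# Crude policy search: evaluate candidate parameter vectors by repeated simulation.
candidates = [(20, 40), (25, 45), (15, 35)]
best = min(candidates, key=lambda th: np.mean([simulate_policy(th, seed=s) for s in range(20)]))
print("best thresholds:", best)
```

In practice the parameter vector would be tuned with a stochastic search or gradient-based method rather than enumerating a few candidates.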
Energy resource modeling
$V_t(S_t) = \min_x \Bigl( C(S_t, x) + \mathbb{E}\, V_{t+1}\bigl(S_{t+1}(S_t, x_t, W_{t+1})\bigr) \Bigr)$
» $S_t$: amount in storage, state of wind, rainfall, demand, prices, and current energy investments.
» $x_t$: how much to store and where, and what technologies to invest in.
» $W_{t+1}$: random information about wind, rain, demand, prices, and changes in energy technology, policy and climate.

Energy resource modeling
Notes:
» 200,000 time periods for the full policy model.
» 20 million variables for a deterministic linear program.
» Many sources of uncertainty:
  • Fine-grained:
    – Hourly variations in wind, solar, demand
    – Daily variations in prices, rainfall, snowmelt
    – Seasonal variations in weather, prices and demand
  • Coarse-grained:
    – Breakthrough in batteries?
    – Can we sequester carbon?
    – Is there a carbon tax?
    – What is our understanding of climate change?

Outline
Dynamic programming and the curses of dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem

Introduction to dynamic programming
Bellman's optimality equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Bigl( C_t(S_t, x_t) + \mathbb{E}\bigl[ V_{t+1}(S_{t+1}) \mid S_t \bigr] \Bigr)$
States:
» R&D portfolio optimization – Vector of several dozen technology parameters.
» Wind energy and storage – Vector of roughly a dozen parameters describing storage, wind history and price history.
» Energy investment policy – Vector of energy investments, energy storage, and the “state of the world” – several dozen (aggregate) to several thousand (disaggregate) dimensions.

Introduction to dynamic programming
Bellman's optimality equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Bigl( C_t(S_t, x_t) + \mathbb{E}\bigl[ V_{t+1}(S_{t+1}) \mid S_t \bigr] \Bigr)$
Problem: the three curses of dimensionality
» State space
» Outcome space
» Action space (feasible region)

Introduction to dynamic programming
Goal: We would like to have a robust method for solving general classes of these problems:
» Vector-valued states, actions and information.
» States may have discrete, continuous and categorical elements.
» Action/decision vectors may have dozens to thousands of dimensions.
» No tunable parameters!

Introduction to dynamic programming
The computational challenges:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Bigl( C_t(S_t, x_t) + \mathbb{E}\bigl[ V_{t+1}(S_{t+1}) \mid S_t \bigr] \Bigr)$
» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?

Introduction to ADP
Classical ADP
» Most applications of ADP focus on the challenge of handling multidimensional state variables.
» Start with
$V_t(S_t) = \max_{x \in \mathcal{X}} \Bigl( C_t(S_t, x_t) + \mathbb{E}\bigl[ V_{t+1}(S_{t+1}) \mid S_t \bigr] \Bigr)$
» Now replace the value function with some sort of approximation:
$V_{t+1}(S_{t+1}) \approx \bar{V}_{t+1}(S_{t+1}) = \sum_{f \in F} \theta_f \phi_f(S_{t+1})$

Introduction to ADP
But this does not solve our problem
» Assume we have an approximate value function.
» We still have to solve a problem that looks like
$V_t(S_t) = \max_{x \in \mathcal{X}} \Bigl( C_t(S_t, x_t) + \mathbb{E} \sum_{f \in F} \theta_f \phi_f(S_{t+1}) \Bigr)$
» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) with an embedded expectation.

The post-decision state
New concept:
» The classical dynamic system: $S_{t+1} = S^M(S_t, x_t, W_{t+1})$
» The “pre-decision” state variable $S_t$: the information required to make a decision $x_t$.
» The “post-decision” state variable $S_t^x$: the state of what we know immediately after we make a decision.
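As a concrete illustration of the basis-function approximation $\bar{V}(S) = \sum_f \theta_f \phi_f(S)$ introduced in the Classical ADP slides above, the sketch below fits the coefficients $\theta$ by least squares to sampled state/value observations. The two-dimensional state, the features and the simulated values are assumptions made for the example, not part of the talk.

```python
# Minimal sketch: fit a linear-in-the-parameters value function approximation.
import numpy as np

def phi(S):
    """Basis functions of a 2-dimensional state (e.g., storage level, wind level)."""
    storage, wind = S
    return np.array([1.0, storage, wind, storage * wind, storage ** 2])

rng = np.random.default_rng(1)
states = rng.uniform(0, 10, size=(200, 2))
# Noisy sampled value observations (purely synthetic for illustration).
vhat = 5 + 2 * states[:, 0] + 0.5 * states[:, 0] * states[:, 1] + rng.normal(0, 1, 200)

Phi = np.array([phi(S) for S in states])
theta, *_ = np.linalg.lstsq(Phi, vhat, rcond=None)   # regression coefficients theta_f

print("approximate value of S=(5, 3):", phi((5, 3)) @ theta)
```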
The post-decision state
Classical form of Bellman's equation:
$V_t(S_t) = \max_{x \in \mathcal{X}} \Bigl( C_t(S_t, x_t) + \mathbb{E}\bigl[ V_{t+1}(S_{t+1}) \mid S_t \bigr] \Bigr)$
Bellman's equations around pre- and post-decision states:
» Optimization problem (making the decision):
$V_t(S_t) = \max_x \Bigl( C_t(S_t, x_t) + V_t^x\bigl(S^{M,x}(S_t, x_t)\bigr) \Bigr)$
Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):
$V_t^x(S_t^x) = \mathbb{E}\bigl[ V_{t+1}\bigl(S^{M,W}(S_t^x, W_{t+1})\bigr) \mid S_t^x \bigr]$

The post-decision state
Challenges
» For most practical problems, we are not going to be able to compute $V_t^x(S_t^x)$ in
$V_t(S_t) = \max_x \Bigl( C_t(S_t, x_t) + V_t^x(S_t^x) \Bigr)$
» Concept: replace it with an approximation $\bar{V}_t(S_t^x)$ and solve
$V_t(S_t) = \max_x \Bigl( C_t(S_t, x_t) + \bar{V}_t(S_t^x) \Bigr)$
» So now we face:
  • What should the approximation look like?
  • How do we estimate it?

The post-decision state
Value function approximations:
» Linear (in the resource state): $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$
» Piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$
» Indexed piecewise linear, separable: $\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\bigl(R_{ta}^x \mid \phi(\text{features}_t)\bigr)$

Approximate dynamic programming
[Figures spanning several slides: the algorithm stepping forward in time t.]

Approximate dynamic programming
With luck, your objective function improves.
[Figure: objective function vs. iterations (0–1,000).]

Approximate dynamic programming
Step 1: Start with a pre-decision state $S_t^n$.
Step 2 (deterministic optimization): Solve the deterministic optimization using an approximate value function,
$\hat{v}_t^n = \max_x \Bigl( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}\bigl(S^{M,x}(S_t^n, x_t)\bigr) \Bigr)$
to obtain $x_t^n$.
Step 3 (recursive statistics): Update the value function approximation:
$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1}\, \hat{v}_t^n$
Step 4 (simulation): Obtain a Monte Carlo sample of $W_t(\omega^n)$ and compute the next pre-decision state:
$S_{t+1}^n = S^M\bigl(S_t^n, x_t^n, W_{t+1}(\omega^n)\bigr)$
Step 5: Return to step 1.

Approximate dynamic programming
Research challenges
» How do we approximate the value function?
» How do we update the value function with new information (stepsizes)?
» How do we solve the exploration-exploitation problem so we sample the right states to generate the best approximation?

Outline
Dynamic programming and the curses of dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem

Energy resource modeling
The dispatch problem
[Figure.]

Energy resource modeling
Storage
» We can hold energy in water reservoirs, batteries and flywheels.
[Figure: network linking hour t to hour t+1 through storage; the value of holding water in the reservoir for future time periods.]
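The sketch below ties the five-step ADP loop above to the storage setting just introduced: a single storage device with a lookup-table approximation of the post-decision value of energy in storage. The toy price process, the discretization, the discount factor and the stepsize rule are illustrative assumptions, not the talk's model.

```python
# Minimal toy sketch of the ADP loop around the post-decision state.
import numpy as np

rng = np.random.default_rng(2)
vbar = np.zeros(11)          # lookup-table approximation of the post-decision value V(S^x)
gamma = 0.9                  # discount factor (keeps the toy value function bounded)

R, price, prev_Rx = 5, 30.0, None
for n in range(1, 1001):
    # Step 2: deterministic optimization at the pre-decision state (storage R, price)
    best_val, best_Rx = -np.inf, R
    for x in range(-min(R, 3), min(10 - R, 3) + 1):     # buy (x>0) or sell (x<0) up to 3 units
        Rx = R + x                                       # post-decision storage S^x
        val = -price * x + gamma * vbar[Rx]              # contribution plus downstream value
        if val > best_val:
            best_val, best_Rx = val, Rx
    # Step 3: recursive statistics - smooth the observation into the previous post-decision state
    if prev_Rx is not None:
        alpha = 1.0 / n
        vbar[prev_Rx] = (1 - alpha) * vbar[prev_Rx] + alpha * best_val
    prev_Rx = best_Rx
    # Step 4: Monte Carlo sample of the exogenous price process; next pre-decision state
    price = max(5.0, price + rng.normal(0, 2))
    R = best_Rx
print("approximate value of holding storage:", np.round(vbar, 1))
```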
Energy resource modeling
[Figures spanning several slides: the hourly dispatch problem chained over hours 1 through 8760 of 2008 and into 2009; the years linked by resource value functions $\bar{V}_t(R_t)$ for oil, wind, natural gas and coal (value vs. amount of resource); the yearly models chained from 2008 through 2038, each year simulating in roughly 5 seconds.]

SMART – Stochastic, multiscale model
Use statistical methods to learn the value $\bar{V}_t(R_t)$ of resources in the future. Resources may be:
» Stored energy
» Storage capacity
  • Batteries
  • Flywheels
  • Compressed air
  • Hydro
  • …
» Energy transmission capacity
  • Transmission lines
  • Gas lines
  • Shipping capacity
» Energy production sources
  • Wind mills
  • Solar panels
  • Nuclear power plants
[Figure: value vs. amount of resource.]

Energy resource modeling
[Figures: reservoir level, rainfall and demand over time – the optimal solution from a linear program for the deterministic model, compared with the approximate dynamic programming (ADP) solution.]

Energy resource modeling
ADP vs. optimal reservoir levels for stochastic rainfall
[Figure: reservoir level vs. time period, comparing the ADP solution at the last iteration with the optimal solutions for individual scenarios.]

Energy resource modeling
Notes:
» The algorithm is provably convergent to the optimal policy as long as there is only one type of storage.
  J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Energy Dispatch Problem with Grid-Level Storage,” under review, SIAM J. Control and Optimization.
» The algorithm uses no exploration. Convergence comes from the concavity of the value function.
» But we would like to solve much more general problems.

Outline
Dynamic programming and the curses of dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem

Machine learning for general value functions
Most papers in approximate dynamic programming use one of four strategies:
» Lookup tables – estimate $\bar{V}(S)$ for each discrete state $S$.
» Basis functions – $V_t(S_t) \approx \bar{V}_t(S_t) = \sum_{f \in F} \theta_f \phi_f(S_t)$.
» Neural networks – popular in engineering.
» Nonparametric regression – growing in popularity.

Lookup tables
Strategies for lookup table representations (a sketch follows this list):
» One value for each state
  • This is what blows up with the curse of dimensionality.
» Aggregate states, and estimate the value for each aggregated state (a related method is tiles)
  • What level of aggregation? The best level of aggregation changes with iterations, and depends on the state.
» Aggregate states, and then steadily use more disaggregate representations
  • The right level of aggregation depends on how often you visit a state.
  • How do we determine the best rate to move to more disaggregate representations?
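The sketch referenced above illustrates the third lookup-table strategy: estimate values at an aggregate level by default and switch to the disaggregate estimate once a state has been visited often enough. The aggregation map, the visit-count threshold and the simulated observations are all illustrative assumptions.

```python
# Minimal sketch: lookup-table value estimates that move from aggregate to disaggregate.
import numpy as np
from collections import defaultdict

counts = defaultdict(int)          # visits per disaggregate state
v_fine = defaultdict(float)        # estimate per disaggregate state
v_coarse = defaultdict(float)      # estimate per aggregated state
THRESHOLD = 10                     # visits required before trusting the disaggregate estimate

def aggregate(s):
    return s // 10                 # e.g., pool 10 neighboring states

def update(s, vhat, alpha=0.1):
    counts[s] += 1
    v_fine[s] += alpha * (vhat - v_fine[s])
    g = aggregate(s)
    v_coarse[g] += alpha * (vhat - v_coarse[g])

def value(s):
    """Use the disaggregate estimate only after enough observations; otherwise fall back."""
    return v_fine[s] if counts[s] >= THRESHOLD else v_coarse[aggregate(s)]

rng = np.random.default_rng(6)
for _ in range(2000):
    s = int(rng.integers(0, 100))
    update(s, vhat=float(s % 10) + rng.normal(0, 1))
print("V(37) =", round(value(37), 2))
```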
Hierarchical aggregation
[Figures spanning several slides: the state space represented at successive levels of aggregation.]

Approximating value functions
Hierarchical aggregation
» The standard strategy is to use a linear combination of estimates at different levels of aggregation $g$:
$\bar{v}_s^n = \sum_{g \in \mathcal{G}} \theta^{g,n} \bar{v}_s^{g,n}$
» Update the coefficients using a stochastic approximation procedure:
$\theta^{g,n+1} = \theta^{g,n} - \alpha_n \nabla f(\theta^n, \hat{v}^{n+1})$
» Challenges:
  • A single set of weights (the weights do not depend on the state).
  • Scaling problems!

Approximating value functions
Hierarchical aggregation
» Approximate the function using a weighted sum of estimates at different levels of aggregation:
$\bar{v}_s^n = \sum_{g \in \mathcal{G}} w_s^{g,n} \bar{v}_s^{g,n}$
» The weights are computed using the inverse of the total variation (variance plus bias squared) of each estimate:
$w_s^{g,n} = \dfrac{\Bigl((\bar{\sigma}_s^{g,n})^2 + (\bar{\beta}_s^{g,n})^2\Bigr)^{-1}}{\sum_{g' \in \mathcal{G}} \Bigl((\bar{\sigma}_s^{g',n})^2 + (\bar{\beta}_s^{g',n})^2\Bigr)^{-1}}$
» The weights depend on $s$, which means that there can be millions of weights.

Hierarchical aggregation
[Figure: weights vs. iterations, showing the weight on the most disaggregate level and the weights on the most aggregate levels.]

Multiple levels of aggregation
[Figures: objective function vs. iterations, comparing the aggregate representation, the disaggregate representation, and their weighted combination.]

Basis functions for continuous states
Imagine that we know a set of basis functions $\phi_f(S)$ such that
$V(S \mid \theta) = \sum_{f \in F} \theta_f \phi_f(S)$
» Can we design an implementable algorithm that will find $\theta$?
  • Yes, but approximate value iteration does not work!

Approximate value iteration
Step 1: Start with a pre-decision state $S_t^n$.
Step 2 (deterministic optimization): Solve the deterministic optimization using an approximate value function,
$\hat{v}_t^n = \max_x \Bigl( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}\bigl(S^{M,x}(S_t^n, x_t)\bigr) \Bigr)$
to obtain $x_t^n$.
Step 3 (recursive statistics): Update the value function approximation:
$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1})\, \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1}\, \hat{v}_t^n$
Step 4 (simulation): Obtain a Monte Carlo sample of $W_t(\omega^n)$ and compute the next pre-decision state:
$S_{t+1}^n = S^M\bigl(S_t^n, x_t^n, W_{t+1}(\omega^n)\bigr)$
Step 5: Return to step 1.

Approximate policy iteration
Step 1: Start with a pre-decision state $S^n$.
Step 2 (inner loop): Do for $m = 1, \ldots, M$:
  Step 2a: Solve the deterministic optimization using an approximate value function,
  $\hat{v}^m = \max_x \Bigl( C(S^m, x) + \bar{V}^{n-1}\bigl(S^{M,x}(S^m, x)\bigr) \Bigr)$
  to obtain $x^m$.
  Step 2b: Update the value function approximation:
  $\bar{V}^{n-1,m}(S^{x,m}) = (1 - \alpha_{m-1})\, \bar{V}^{n-1,m-1}(S^{x,m}) + \alpha_{m-1}\, \hat{v}^m$
  Step 2c: Obtain a Monte Carlo sample of $W(\omega^m)$ and compute the next pre-decision state:
  $S^{m+1} = S^M\bigl(S^m, x^m, W(\omega^m)\bigr)$
Step 3: Update $\bar{V}^n(S)$ using $\bar{V}^{n-1,M}(S)$ and return to step 1.

Dirichlet mixture model
Dirichlet process mixtures of generalized linear regression models (DP-GLM).
» Cluster states and estimate linear regressions around each cluster.
[Figure.]

Dirichlet mixture model
Features:
» Handles continuous, discrete and categorical states.
» Naturally adapts from early iterations with few observations to later iterations with more observations.
» Naturally adapts to heteroscedastic data.

Merging machine learning and optimization
The value of a resource depends on the “state of the world.”
» Is there a carbon tax?
» What is the state of battery research?
» Have there been major new oil discoveries?
» What is the price of oil?
» Did the international community adopt strict limits on carbon emissions?
» Have there been advances in our understanding of climate change?
• Instead of one piecewise linear value function for each resource and time period…
• …we need one for each state of the world. There can be thousands of these.
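A minimal sketch of the weighted hierarchical-aggregation estimator described above: estimates of $\bar{v}_s$ at three levels of aggregation are combined with weights inversely proportional to (variance + bias²). The toy state space, the aggregation map and the way the bias is proxied are illustrative assumptions, not the talk's implementation.

```python
# Minimal sketch: weighted combination of estimates at several aggregation levels.
import numpy as np

rng = np.random.default_rng(4)
true_v = {s: 10 + 3 * (s % 5) for s in range(50)}                       # 50 states
obs = [(int(s), true_v[int(s)] + rng.normal(0, 2)) for s in rng.integers(0, 50, 300)]

def agg(s, g):
    if g == 0:
        return s            # disaggregate: the state itself
    if g == 1:
        return s // 10      # middle level: groups of 10 states
    return 0                # most aggregate: everything pooled

def estimate(s_query):
    total_w = total = 0.0
    direct = [v for (s, v) in obs if s == s_query]
    for g in range(3):
        vals = [v for (s, v) in obs if agg(s, g) == agg(s_query, g)]
        if len(vals) < 2:
            continue
        mean_g = np.mean(vals)
        var_g = np.var(vals, ddof=1) / len(vals)                        # variance of the estimate
        bias_g = mean_g - (np.mean(direct) if direct else mean_g)       # aggregation bias proxy
        w = 1.0 / (var_g + bias_g ** 2 + 1e-6)                          # inverse of total variation
        total_w += w
        total += w * mean_g
    return total / total_w

print("estimate of V(7):", round(estimate(7), 2), " true value:", true_v[7])
```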
Outline
Dynamic programming and the curses of dimensionality
Introduction to ADP
Energy resource modeling
Machine learning for general value functions
The exploration vs. exploitation problem

The exploration vs. exploitation problem
» Peter Frazier (faculty member in ORIE at Cornell)
» Ilya Ryzhov (graduate of ORIE)
» Warren Scott (Ph.D. student at Princeton)
» Martijn Mes (University of Twente)

The exploration vs. exploitation problem
The decision depends on the estimate of future value.
[Figure sequence: choosing among alternatives located in MN, CO, NY, CA and TX, each with an initial value estimate of zero and dollar rewards on the connecting arcs; as alternatives are visited, their estimated values are updated (e.g., $\bar{V}(TX) = 450$, $\bar{V}(NY) = 600$, $\bar{V}(CA) = 800$), which changes the subsequent decisions.]

The exploration vs. exploitation problem
Current research:
» We are focusing on the problem of maximizing an unknown function:
$\max_x \mathbb{E} F(x, W)$
» We assume a measurement $x$ is “expensive.”
» Important problem dimensions:
  • The complexity of the measurement $x$.
  • The nature of the belief structure.
  • The complexity of the decision problem which uses the information we are collecting.

The exploration vs. exploitation problem
The knowledge gradient:
» Assume you can make only one measurement, after which you have to make a final choice (the implementation decision).
» What choice would you make now to maximize the expected value of the implementation decision?
[Figure: beliefs about five alternatives; the change in the estimate of the value of option 5 due to a measurement.]

The exploration vs. exploitation problem
The knowledge gradient
» The knowledge gradient is the marginal value of a single measurement $x$, given by
$\nu_x^{KG} = \mathbb{E}\Bigl[ \max_y \bar{F}\bigl(y, K(x)\bigr) \Bigr] - \max_y \bar{F}(y, K)$
where the expectation is over the different measurement outcomes; the first term is the optimization problem given the knowledge state $K(x)$ updated by measuring $x$, and the second term is the optimization problem given what we know now (knowledge state $K$).
  • $x$ = measurement decision
  • $y$ = implementation decision
» The knowledge gradient policy chooses the measurement with the highest marginal value.
» This can be viewed as a kind of coordinate ascent algorithm.

The knowledge gradient with correlated beliefs
An important problem class involves correlated beliefs – measuring one alternative tells us something about other alternatives.
[Figure: measuring one of five alternatives also changes our beliefs about the neighboring alternatives.]

The exploration vs. exploitation problem
Approximate this function using hierarchical aggregation. From this model, we can infer correlated beliefs.
[Figure: performance of the knowledge gradient using correlated beliefs for this surface.]
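For intuition, here is a minimal sketch of the knowledge-gradient computation for the simplest setting of independent normal beliefs about a handful of alternatives (the correlated-beliefs and hierarchical versions in the talk are richer). The priors, the measurement noise, and the closed-form expression $\tilde{\sigma}\,\bigl(z\Phi(z) + \varphi(z)\bigr)$ come from the standard ranking-and-selection formulation, not from the slides.

```python
# Minimal sketch: knowledge gradient with independent normal beliefs.
import numpy as np
from scipy.stats import norm

mu = np.array([1.0, 1.2, 0.8, 1.1, 0.9])       # prior means for 5 alternatives
sigma = np.array([0.5, 0.5, 0.5, 0.5, 0.5])    # prior standard deviations
noise = 0.3                                     # measurement noise standard deviation

def knowledge_gradient(mu, sigma, noise):
    # Predictive reduction in uncertainty if alternative x is measured once.
    sigma_tilde = sigma ** 2 / np.sqrt(sigma ** 2 + noise ** 2)
    kg = np.empty(len(mu))
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))
        z = -abs(mu[x] - best_other) / sigma_tilde[x]
        kg[x] = sigma_tilde[x] * (z * norm.cdf(z) + norm.pdf(z))
    return kg

kg = knowledge_gradient(mu, sigma, noise)
print("KG values:", np.round(kg, 4), "-> measure alternative", int(np.argmax(kg)))
```

The knowledge gradient policy simply measures the alternative with the largest of these values, then updates the beliefs and repeats.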
Exploiting correlated beliefs
[Figure: a two-dimensional surface over concentration and temperature.]

Measuring two-dimensional surfaces
Initially we think the concentration is the same everywhere:
[Figure: estimated performance and knowledge gradient surfaces.]
» We want to measure the value where the knowledge gradient is the highest. This is the measurement that teaches us the most.

Measuring two-dimensional surfaces
After four measurements:
[Figure: estimated performance (measurements, with the new optimum at the same location) and the knowledge gradient, i.e., the value of another measurement.]
» Whenever we measure at a point, the value of another measurement at the same point goes down. The knowledge gradient guides us to measuring areas of high uncertainty.

Measuring two-dimensional surfaces
After five, six, seven, eight, nine and ten samples:
[Figures: the estimated performance and knowledge gradient surfaces after each additional measurement.]

Measuring two-dimensional surfaces
After 10 measurements, our estimate of the surface:
[Figure: estimated performance vs. the true concentration.]

Optimal learning in physical sciences
Materials research
» How do we find the best material for converting sunlight to electricity?
» What is the best battery design for storing energy?
» We need a method to sort through potentially thousands of experiments.

Optimal learning in physical sciences
Designing molecules
» X and Y are sites where we can hang substituents to change the behavior of the molecule.

Optimal learning in physical sciences
We express our belief using a linear, additive QSAR model:
» $X^m = (X_{ij}^m)$, where $X_{ij}^m$ is an indicator variable for molecule $m$ (equal to 1 if substituent $j$ is attached at site $i$).
» $Y = \theta_0 + \sum_{\text{sites } i} \ \sum_{\text{substituents } j} \theta_{ij} X_{ij}$

Optimal learning in physical sciences
Learning the best of 10,000 compounds.
» Results from 15 sample paths.
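A minimal sketch of the linear, additive QSAR belief model above: each candidate molecule is encoded by the indicator variables $X_{ij}$, and the coefficients $\theta$ are fit by least squares to simulated responses. The number of sites and substituents, and the simulated assay values, are illustrative assumptions, not the talk's compound library.

```python
# Minimal sketch: linear, additive QSAR belief model Y = theta_0 + sum_ij theta_ij * X_ij.
import numpy as np
from itertools import product

sites, n_subst = 2, 3
molecules = list(product(range(n_subst), repeat=sites))        # one substituent per site

def features(mol):
    x = np.zeros(1 + sites * n_subst)
    x[0] = 1.0                                                  # intercept theta_0
    for i, j in enumerate(mol):
        x[1 + i * n_subst + j] = 1.0                            # indicator X_ij
    return x

rng = np.random.default_rng(5)
theta_true = rng.normal(0, 1, 1 + sites * n_subst)
X = np.array([features(m) for m in molecules])
y = X @ theta_true + rng.normal(0, 0.1, len(molecules))         # simulated assay results

theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)               # fitted belief about theta
best = molecules[int(np.argmax(X @ theta_hat))]
print("predicted best substituent pattern:", best)
```

Because the model is linear in the indicators, beliefs about the coefficients induce correlated beliefs across all compounds that share a substituent, which is what makes knowledge-gradient-style learning over thousands of compounds feasible.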