Confidential draft – Do not distribute

Integrating multiple analytic modules around the Operational Intelligence Platform (OIP)
The case of water distribution

Claude Le Pape, Alfredo Samperio, Gratien Bonvin
Draft, June 2, 2014

The investigation of analytic problems related to water distribution as part of the Arrowhead project, together with the examination of an exploratory use case presented by a customer of Schneider Electric, strongly suggests the need to integrate multiple “analytic” modules around a common data basis. In parallel, the ongoing development of the OI Platform (which aims to manage in a uniform manner the types of data often encountered in Schneider Electric, starting with time series, and to gather analytic services) suggests that such a common data basis should be compatible or interfaced with the OI Platform.

In this document, we propose a relational data model inspired by (i) the EPANET water distribution standard, (ii) the OI Platform, and (iii) other elements from the Arrowhead project which we believe could be generalized and linked once and for all to the OI Platform. We also describe various analytic problems which we believe could be addressed from this basis, using specific software already available within Schneider Electric (e.g., hydraulic simulation and optimization), more generic software under development within Schneider Electric (e.g., for demand prediction), or software that could be available from external partners (e.g., from Artelys for planning and demand-response management). Part of the interest of integrating analytic components around a common basis lies in enabling an easier evaluation of external components, in comparison with or as a complement to our current offer.

Let us note that we focus here on offline analytics, i.e., analytics aimed at planning actions in advance.
These shall be complemented with real-time control analytics (e.g., from SpecificEnergy for real-time pump selection), which we do not consider in the present document.

In its current version, this document is intended as a draft aimed at triggering discussion, in order to decide how to go further. Comments and suggestions for improvement are most welcome.

1. Proposed Data Model

We describe a proposed data model in relational form. As much as possible, we have tried to stick to the concepts of the EPANET model. We have allowed ourselves to deviate from EPANET whenever there was a clear advantage in doing so, either to better represent elements of the exploratory use case alluded to above, or to adopt concepts used in the OI Platform and the Arrowhead project. [1]

1.1. Network Nodes: Junctions and Tanks

A network in EPANET is described in terms of Nodes and Links. We focus on two types of Nodes, i.e., Junctions and Tanks. For both types, the coordinates of the node are generally useful, for analytic calculations and/or to enable some geometrical display of the network. The coordinates relate to an arbitrary origin and are expressed in meters. At this point, to keep things simple, we ignore the impact of the height of the water in a tank and use the altitude of the tank as the only relevant height parameter.

In both cases, we have allowed a DemandPattern to be attached to the node. [2] The DemandPattern describes either a history or a prediction of how much water leaves the node over time, in general to serve end customers. Predicting the DemandPattern of a given node is one of the main analytic functions we consider.

For optimization purposes, it might also be useful to associate with a source node a function describing the cost of a cubic meter of water at this node.
This will be done through the association of a WaterTariff object to the node. When no water can be produced at a given node (or injected from a non-described node), the WaterTariff attribute is null.

[1] As we are not specialists of EPANET, we may have missed important elements or opportunities to remain closer to the EPANET model, while allowing easy integration with the OI Platform. For this first version, we have also tried to keep things simple, making simplifying assumptions which could be criticized. As already mentioned, the contents of this document are proposed as a starting point, open for debate.

[2] In our understanding, EPANET enables the specification of demands only for junctions and not for tanks. Strictly speaking this would be sufficient, as a “tank” might simply be linked to a “junction”. However, we feel it could be appropriate to simplify the network description and allow a tank to be considered as a consuming node of the network.

Unlike junctions, tanks are places in which water can be stored. The minimal data needed to manage this storage include the maximal volume of water storable in the tank, the minimal volume that shall remain in the tank at all times (by default, 0), and the current (initial) volume.

In the relational model we propose, the JUNCTION table includes the following columns:
- A JUNCTION_ID which unambiguously identifies the junction.
- The X_COORDINATE of the junction.
- The Y_COORDINATE of the junction.
- The Z_COORDINATE (altitude) of the junction.
- An optional DEMAND_PATTERN_ID identifying a demand pattern for the junction. When no demand pattern is provided (DEMAND_PATTERN_ID = null), it is assumed that the junction is merely an intermediate node in the network, from which water is forwarded to other nodes.
- An optional WATER_TARIFF_ID identifying a cost function for the water at the junction if the junction is a source.
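To make the JUNCTION description above more concrete, here is a minimal sketch of how the table might be declared and queried. The column names come from the text; the SQL types, the use of an in-memory SQLite database, the helper name `junctions_without_demand`, and the sample rows are all assumptions made for illustration and are not part of the proposal.

```python
import sqlite3

# Column names follow the JUNCTION description; the SQL types and the choice
# of SQLite are assumptions made for this illustration only.
JUNCTION_DDL = """
CREATE TABLE JUNCTION (
    JUNCTION_ID       TEXT PRIMARY KEY,
    X_COORDINATE      REAL NOT NULL,  -- meters, arbitrary origin
    Y_COORDINATE      REAL NOT NULL,  -- meters, arbitrary origin
    Z_COORDINATE      REAL NOT NULL,  -- altitude
    DEMAND_PATTERN_ID TEXT,           -- NULL when no demand is attached
    WATER_TARIFF_ID   TEXT            -- NULL when the junction is not a source
)
"""


def create_junction_table(conn):
    """Create the JUNCTION table on the given connection."""
    conn.execute(JUNCTION_DDL)


def junctions_without_demand(conn):
    """Return the IDs of junctions that have no demand pattern attached."""
    rows = conn.execute(
        "SELECT JUNCTION_ID FROM JUNCTION "
        "WHERE DEMAND_PATTERN_ID IS NULL ORDER BY JUNCTION_ID"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_junction_table(conn)
conn.executemany(
    "INSERT INTO JUNCTION VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("J1", 0.0, 0.0, 120.0, None, "WT1"),     # hypothetical source node
        ("J2", 500.0, 0.0, 115.0, "DP1", None),   # hypothetical consuming node
        ("J3", 250.0, 100.0, 118.0, None, None),  # hypothetical intermediate node
    ],
)
print(junctions_without_demand(conn))
```

The two optional columns are left nullable so that the conventions stated in the text (no demand pattern, no water tariff) map directly onto NULL values.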
The TANK table includes the following columns:
- A TANK_ID which unambiguously identifies the tank.
- The X_COORDINATE of the tank.
- The Y_COORDINATE of the tank.
- The Z_COORDINATE (altitude) of the tank.
- The minimal volume VOLUME_MIN to be kept at all times in the tank.
- The maximal volume VOLUME_MAX that can be kept in the tank.
- The INITIAL_VOLUME at the beginning of the overall time period under consideration.
- An optional DEMAND_PATTERN_ID identifying a demand pattern for the tank.

1.2. Network Links: Pipes, Pumps, Pumping Stations, Valves

Nodes are connected by several types of Links. At this stage, we assume the network is oriented; hence each Link has a START_NODE and an END_NODE.

A Pipe is a passive Link between its START_NODE and its END_NODE. Energy losses occur in a Pipe depending on various parameters including its LENGTH and DIAMETER, as well as on the MATERIAL constituting the Pipe. The PIPE table contains the following columns:
- A PIPE_ID which unambiguously identifies the pipe.
- A START_NODE_ID where multiple links might join before the pipe.
- An END_NODE_ID from where multiple links might branch after the pipe.
- A MATERIAL_ID identifying the material constituting the pipe.
- The LENGTH of the pipe.
- The DIAMETER of the pipe. [3]

A Pump is an active Link in which a motor converts electrical power into mechanical power, which is used to pump water from the START_NODE to the END_NODE. The most important characteristics of a Pump describe the relations between the electrical power, the mechanical power, and the flow. In addition, a Pump can be controlled by a VARIABLE_SPEED drive. [4] The PUMP table contains the following columns:
- A PUMP_ID which unambiguously identifies the pump.
- A START_NODE_ID where multiple links might join before the pump.
- An END_NODE_ID from where multiple links might branch after the pump.
- A Boolean VARIABLE_SPEED indicating whether the pump is controllable or not.
- Two limits FLOW_MIN and FLOW_MAX providing the minimal and maximal flow recommended by the pump manufacturer to maintain the health of the pump.
- The minimal power POWER_MIN of the pump and the POWER_SLOPE describing the dependency between the mechanical power used by the pump and the water flow enabled by it (when the pump is not controlled by a drive). In practice, these can be determined from the minimal and maximal flows recommended by the pump manufacturer and two curves: the FLOW_TO_HEAD curve providing the relation between the flow enabled by the pump and the pressure (expressed in meters) and the FLOW_TO_EFFICIENCY curve characteristic of the pump.
- A MOTOR_EFFICIENCY factor between 0.0 and 1.0.

[3] Aging models might be associated with pipes, suggesting the use of additional characteristics such as INSTALLATION_TIME. At this point, however, it is still unclear to us which data would actually be useful (e.g., characteristics of the material). Aging models will have to be defined in a second version of this document.

[4] Aging models might be associated with pumps, suggesting the use of additional characteristics such as INSTALLATION_TIME. At this point, however, it is still unclear to us which data would actually be useful (age of the pump, age of the motor, etc.). Aging models will have to be defined in a second version of this document.

A PumpingStation consists of one or several pumps in parallel, i.e., with the same START_NODE and END_NODE. The characteristics of a PumpingStation can be inferred from the characteristics of the individual pumps. Hence, the PUMPING_STATION table is optional. When it is provided, it includes:
- A PUMPING_STATION_ID which unambiguously identifies the pumping station.
- A START_NODE_ID where multiple links might join before the pumping station.
- An END_NODE_ID from where multiple links might branch after the pumping station.

A Valve is a Link in which water flow can be limited.
At this stage, we associate no specific parameter with a Valve and assume the Valve can be used to set any upper limit on the flow. The VALVE table contains the following columns:
- A VALVE_ID which unambiguously identifies the valve.
- A START_NODE where multiple links might join before the valve.
- An END_NODE from where multiple links might branch after the valve. [5]

[5] In practice, there are multiple types of valves, depending on whether (1) the opening is set to a specific value, in which case flow and pressure drop follow hydraulic equations; (2) the valve is controlled to maintain a given pressure drop; or (3) the valve is controlled to maintain a given flow. In this version, we assume that the flow is controllable. We might introduce different types of valves in the future.

1.3. Materials and ageing models

At this stage, the MATERIAL table contains the following columns:
- A MATERIAL_ID which unambiguously identifies the material.
- Its DARCY_FRICTION_FACTOR, used in classical models for estimating energy losses in a pipe.

According to the Darcy–Weisbach equation, the pressure loss in a Pipe can be written as follows:

$f_D \cdot (L / D) \cdot (\rho V^2 / 2)$

where:
- $L$ is the LENGTH of the Pipe.
- $D$ is the DIAMETER of the Pipe.
- $V$ is the velocity of the water flow in the Pipe (in m/s), which can also be written as $Q/S$, where $Q$ is the flow of water in the pipe (in m³/s) and $S = \pi (D/2)^2$ is the cross-section of the pipe (in m²).
- $\rho$ is the density of the water in kg/m³, hence 1000.
- $f_D$ is the dimensionless DARCY_FRICTION_FACTOR. In reality, this factor depends on the relative roughness of the pipe and on the speed of water in the pipe. As a first approximation, however, it can be assumed constant and associated with the pipe material.

We expect that Materials will also be used to describe ageing models. At this point, this is still to be explored.

1.4.
Demand Patterns

A DemandPattern is a time series defining an expected output flow (to final customers or to another non-represented portion of the network) from a given node. When appropriate, the flow can be defined to be periodic over a given time period and renewed from one year to the next, possibly according to a given ANNUAL_RENEWAL_FACTOR.

In the relational model we propose, the WaterDemands table can be used to specify water demand patterns. It includes the following columns:
- A WATER_DEMAND_TIME_SERIES_ID which unambiguously identifies the time series.
- A START_TIME.
- An END_TIME.
- The FLOW between the given START_TIME and the given END_TIME.
- An optional PERIODICITY (e.g., “NONE”, “DAY”, “WEEKDAY”, “WEEKEND”) indicating that the given demand element repeats itself periodically. When this column is not used, it is assumed that there is no periodic repetition of the demand.
- An optional PERIOD_START_TIME and an optional PERIOD_END_TIME limiting the extent over which the periodical repetition applies.
- An optional ANNUAL_RENEWAL Boolean (0 or 1) indicating whether the given demand element repeats itself every year. When this column is not used, it is assumed that there is no annual repetition.
- An optional ANNUAL_RENEWAL_FACTOR indicating that the given demand element repeats itself every year, multiplied by the given factor.

1.5. Tariffs

Tariff descriptions can be used both for water costs and electricity costs. A Tariff is described as a time series of curves, enabling the cost to vary with the flow of water or the electrical power that is used. The TARIFFS table includes the following columns: [6]
- A TARIFF_TIME_SERIES_ID which unambiguously identifies the time series.
- A START_TIME.
- An END_TIME.
- Six columns describing the curve that applies from the given START_TIME to the given END_TIME: CAPACITY_MIN, CAPACITY_MAX, COST_MIN, COST_MAX, FIXED_COST, and VARIABLE_COST.
  o When the power or flow equals CAPACITY_MIN, the cost for being at this power or flow level for one unit of time is COST_MIN.
  o As soon as CAPACITY_MIN is exceeded, i.e., as soon as the power or flow becomes strictly greater than CAPACITY_MIN, a penalty corresponding to the given FIXED_COST is paid. FIXED_COST is often equal to 0. The corresponding column is optional.
  o Between CAPACITY_MIN and CAPACITY_MAX, the cost grows from (COST_MIN + FIXED_COST) to COST_MAX as a quadratic function of the power or flow with the given VARIABLE_COST as initial slope. In usual cases, COST_MAX – (COST_MIN + FIXED_COST) = VARIABLE_COST * (CAPACITY_MAX – CAPACITY_MIN) and the cost grows linearly with the power or flow.
  o When the power or flow equals CAPACITY_MAX, the cost for being at this power or flow level for one unit of time is COST_MAX.
- An optional PERIODICITY (e.g., “NONE”, “DAY”, “WEEKDAY”, “WEEKEND”) indicating that the given tariff element repeats itself periodically. When this column is not used, it is assumed that there is no periodic repetition of the tariff.
- An optional PERIOD_START_TIME and an optional PERIOD_END_TIME limiting the extent over which the periodical repetition applies.
- An optional ANNUAL_RENEWAL Boolean (0 or 1) indicating whether the given tariff element repeats itself every year. When this column is not used, it is assumed that there is no annual repetition.
- An optional ANNUAL_RENEWAL_FACTOR indicating that the given tariff element repeats itself every year, with all costs multiplied by the given factor.

[6] Let us note that an additional table might be necessary if we want to incorporate a choice of contract (and, in particular, of contracted power) in the optimization problem.

2. Analytic Modules

This section presents the three analytic components considered at this point.

2.1. Demand Prediction

The demand prediction component aims at extending a given water demand pattern into the future.
A prediction model linking demand with other variables (e.g., weather conditions) is first learned. Then the model is used to extend a given demand pattern for a given period of time. A more precise specification of such an analytic component will be provided in another document in preparation.

2.2. Pumping Plan Optimization / Planning for Demand Response

Multiple options for the optimization of pumping plans and demand response could be considered. In this section, we attempt to describe an approximate “simple” model which would make sense in the exploratory use case we are aware of. An open question is whether the approximations we make are reasonable. In particular, we ignore all transient factors and proceed as if a steady-state approach could be used over a given number of individual time periods.

Given are H time periods $\mathrm{PERIOD}_1, \mathrm{PERIOD}_2, \dots, \mathrm{PERIOD}_H$:
- With start time $st_t$ and end time $et_t$ ($1 \le t \le H$).
- With electricity cost (tariff) over the period. To ease the following description, we restrict ourselves in this section to linear tariffs and assume that for each period $t$, a cost per kWh $c_t$ is given.

Given are N water towers (tanks) $\mathrm{TOWER}_1, \mathrm{TOWER}_2, \dots, \mathrm{TOWER}_N$:
- With minimal and maximal volumes $\mathrm{vmin}_i$ and $\mathrm{vmax}_i$ ($1 \le i \le N$).
  o The minimum is supposed to be given. However, it would be interesting to study how the energy cost and the non-delivery risks vary with this minimum.
- With a (predicted) water consumption profile $PF_1, PF_2, \dots, PF_N$.
  o $PF_i$ is a deterministic function specifying a consumption $c_{i,t}$ for all $t$ in {1 … H}.
  o Later we may want to use a probabilistic function and introduce a notion of robustness of the plan with respect to variability of the demand. We ignore such a potential extension for the moment.

Before each water tower $\mathrm{TOWER}_i$ there is a valve $\mathrm{VALVE}_i$, which makes it possible to limit the flow, and a pipe $\mathrm{PIPE}_i$.
The goal is to define, for each period $t$ in {1 … H}, the flow $F_{i,t}$ between the pumping station and the water tower $\mathrm{TOWER}_i$ in a way that guarantees that the demand will be satisfied (in the deterministic version) and that minimizes cost.

The volume $V_{i,t}$ in the water tower $\mathrm{TOWER}_i$ at the end of period $\mathrm{PERIOD}_t$ is obtained as follows:

$V_{i,t} = V_{i,t-1} + F_{i,t} \cdot (et_t - st_t) - c_{i,t}$

We impose $\mathrm{vmin}_i \le V_{i,t} \le \mathrm{vmax}_i$. $V_{i,0}$ is the initial volume at the beginning of the first period. This value is given.

The discharge pressure $PR_t$ at the end node of the pumping station that is needed during period $t$ depends on the flows $F_{i,t}$ as follows:

$PR_t \ge \mathrm{FORMULA}(F_{i,t})$

We want to vary the formula, using more or less precise models, with influence on three factors: (i) the amount of data needed, and hence the cost of the solution implementation; (ii) the computation time; (iii) the precision of the results. The key point is that if approximate models lead to pumping schedules which are close to the pumping schedules that would be obtained with more precise models, then the approximate models are acceptable. Several elements shall be considered.
  o Precise physical models are likely to need a lot of data on pipe characteristics: can we avoid this need?
  o Can dynamics be ignored without degrading the approximation too much?
  o Theoretically, the needed pressure also depends on the altitudes of the water towers for which the valve is open: can we ignore this?
  o An interesting option would consist in building a data-driven model (e.g., building from past data a table that approximates the actual function) rather than using a physical model.
  o If the pressure is never much higher than the minimal hydrostatic pressure needed, an option might be to proceed as if the pressure were constant or a simple linear or piecewise linear function of the total flow.
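The volume balance and the bound constraints stated earlier in this section can be checked for a candidate plan with a few lines of code. The following is a minimal sketch for a single water tower; the function name `simulate_tank_volumes`, the units (flows in m³/h, durations in hours, volumes and consumptions in m³), and the sample values are assumptions made for illustration.

```python
def simulate_tank_volumes(v0, flows, consumptions, durations, vmin, vmax):
    """Apply V[t] = V[t-1] + F[t] * (et[t] - st[t]) - c[t] period by period.

    Returns the list of end-of-period volumes and raises ValueError as soon
    as a volume leaves the interval [vmin, vmax].
    Assumed units: flows in m3/h, durations in h, volumes in m3.
    """
    if not (len(flows) == len(consumptions) == len(durations)):
        raise ValueError("flows, consumptions and durations must have equal length")
    volumes = []
    volume = v0
    for t, (flow, consumption, duration) in enumerate(
        zip(flows, consumptions, durations), start=1
    ):
        volume = volume + flow * duration - consumption
        if not (vmin <= volume <= vmax):
            raise ValueError(f"volume {volume} out of bounds at period {t}")
        volumes.append(volume)
    return volumes


# Hypothetical three-period plan for one tower.
plan = simulate_tank_volumes(
    v0=500.0,
    flows=[100.0, 0.0, 50.0],
    consumptions=[60.0, 80.0, 40.0],
    durations=[1.0, 1.0, 1.0],
    vmin=200.0,
    vmax=1000.0,
)
print(plan)
```

A feasibility check of this kind only validates a given plan; choosing the flows so as to minimize cost is the optimization problem discussed in this section.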
When the pumping station is directly linked to each water tower, one specific model we may use is the following:

$PR_t \ge \rho g h_i + \mathrm{dff}_i \cdot (L_i / D_i) \cdot (\rho V_{i,t}^2 / 2)$ for each $i$

where:
- $\rho$ is the density of the water in kg/m³, hence 1000.
- $g$ is the gravitational acceleration (9.81 m/s²).
- $h_i$ is the difference of altitude between the water tower $\mathrm{TOWER}_i$ and the pumping station.
- $\mathrm{dff}_i$ is the DARCY_FRICTION_FACTOR of the material of the pipe $\mathrm{PIPE}_i$.
- $L_i$ is the LENGTH of $\mathrm{PIPE}_i$.
- $D_i$ is the DIAMETER of $\mathrm{PIPE}_i$.
- $V_{i,t}$ is the velocity of the water flow in the pipe, i.e., $V_{i,t} = F_{i,t} / (\pi (D_i/2)^2)$. In this formula $V_{i,t}$ denotes a velocity, not the volume in the water tower.

When there are intermediate pipes and junctions, the same formula has to be used iteratively from the tanks to compute the discharge pressure at each junction. At each junction, the application of the inequality for each outgoing pipe guarantees that the most constraining branch is taken into account.

If the pumps had no loss, the power $\mathrm{POWER}_t$ needed over time period $t$ would be $PR_t \cdot \sum_i F_{i,t}$. [7] Taking into account the efficiency of pumps brings an additional difficulty. In practice, each pump $\mathrm{PUMP}_j$ contributes a flow $Q_{j,t}$ with $\sum_i F_{i,t} = \sum_j Q_{j,t}$. When there is no drive, the mechanical power deployed by each pump $\mathrm{PUMP}_j$ is roughly of the form:

$\mathrm{POWER\_MIN}_j + \mathrm{POWER\_SLOPE}_j \cdot Q_{j,t}$

Taking into account the efficiency of the motor leads to:

$\mathrm{POWER}_t = \sum_j (1 / \mathrm{MOTOR\_EFFICIENCY}_j) \cdot (\mathrm{POWER\_MIN}_j + \mathrm{POWER\_SLOPE}_j \cdot Q_{j,t})$

The total energy cost to minimize is equal to:

$\sum_t \mathrm{POWER}_t \cdot (et_t - st_t) \cdot c_t$

Once a pumping plan is obtained, studying the opportunities of demand-response could be done in multiple ways, e.g., by varying the electricity tariff or using the framework previously developed by Schneider Electric and Artelys.

2.3. Network Simulation

At this point, we do hope (but this needs to be checked) that a network description in the proposed relational model can be used as an input to perform simulations using the hydraulic tools available in Schneider Electric.
This would enable us to link these tools with the OI Platform and hence with other analytic tools developed on top of the platform (e.g., demand prediction). A more precise specification of such a link needs to be written in the future.

[7] With some constraints on the possible values of $\mathrm{POWER}_t$ depending on the characteristics of the pumps, use of drives, etc. In particular, in the absence of drive, $\mathrm{POWER}_t$ would take its values in a discrete set {p0 = 0, p1, p2, …}.
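As an illustration of the simple model of Section 2.2 (pumping station directly linked to each water tower, no drive), the pressure, power and cost formulas can be evaluated as in the sketch below. The function names, the dictionary keys, the units (SI units for the pressure computation; kW, hours and cost per kWh for the energy cost) and all numerical values are assumptions made for this example. It also assumes that every listed pump is running, since the power formula gives no special treatment to a stopped pump.

```python
import math

RHO = 1000.0  # density of water, kg/m3
G = 9.81      # gravitational acceleration, m/s2


def branch_pressure(flow, height, dff, length, diameter):
    """Pressure needed for one tower: rho*g*h + dff*(L/D)*(rho*V^2/2).

    flow is in m3/s; height, length and diameter are in m (assumed units).
    """
    velocity = flow / (math.pi * (diameter / 2.0) ** 2)  # m/s
    return RHO * G * height + dff * (length / diameter) * (RHO * velocity ** 2 / 2.0)


def discharge_pressure(branches):
    """Smallest pressure satisfying the inequality for every branch."""
    return max(branch_pressure(**branch) for branch in branches)


def electrical_power(pumps, flows):
    """Sum over pumps of (1 / MOTOR_EFFICIENCY) * (POWER_MIN + POWER_SLOPE * Q)."""
    return sum(
        (pump["power_min"] + pump["power_slope"] * q) / pump["motor_efficiency"]
        for pump, q in zip(pumps, flows)
    )


def energy_cost(powers_kw, durations_h, costs_per_kwh):
    """Sum over periods of POWER_t * (et_t - st_t) * c_t."""
    return sum(p * d * c for p, d, c in zip(powers_kw, durations_h, costs_per_kwh))


# Hypothetical data: two towers fed directly by the pumping station.
branches = [
    {"flow": 0.05, "height": 30.0, "dff": 0.02, "length": 2000.0, "diameter": 0.3},
    {"flow": 0.02, "height": 45.0, "dff": 0.02, "length": 1500.0, "diameter": 0.2},
]
# Hypothetical pump characteristics (power in kW, flow in m3/s).
pumps = [{"power_min": 5.0, "power_slope": 100.0, "motor_efficiency": 0.8}]

print(discharge_pressure(branches))
print(electrical_power(pumps, [0.07]))
print(energy_cost([10.0, 20.0], [1.0, 2.0], [0.1, 0.2]))
```

Taking the maximum over the branches is simply the tightest value allowed by the set of inequalities; a network with intermediate junctions would require the iterative computation described in Section 2.2.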