Further Development of Bartlett-Lewis model for fine-resolution rainfall ∗

Jo Kaczmarska
Department of Statistical Science, University College London, Gower Street, London, WC1E 6BT, UK. (jo@stats.ucl.ac.uk)

April 15, 2011

Abstract

In a recent development in the literature, a new temporal rainfall model, based on the Bartlett-Lewis clustering mechanism, and intended for sub-hourly application, was introduced. That model replaced the rectangular rain cells of the original Bartlett-Lewis model with a Poisson process of instantaneous pulses, in order to allow greater variability in rainfall intensity over small time intervals. A version with two superposed processes provided a good fit to five-minute data from New Zealand, but required a large number of parameters. In the present paper the basic instantaneous pulse model is extended, following the approach developed in an earlier study, by randomising the cell duration parameter, thus allowing the durations of cells within a single storm to be dependent. Moments up to 3rd order for the aggregated rainfall process are developed for the new model, which is then fitted to 69 years of 5-minute data from Bochum, Germany. The new model is compared with a number of other Bartlett-Lewis variants, and found to perform well, improving on the non-random version, and providing a parameter-efficient method of allowing for different storm types. A further improvement is found by fitting a model variant in which pulse depths within cells are dependent.

∗ Research Report No. 312, Department of Statistical Science, University College London. Date: April 2011

1 Introduction

Temporal rainfall models based on the clustered Poisson process approach introduced by Rodriguez-Iturbe et al. (1987) have been used for over twenty years, in order to simulate the artificial rainfall series required as input for hydrological models, for example for flood risk analysis, sewerage system design, and the design of reservoirs.
The models assume that rain-events arrive in a Poisson process. Each rain event consists of a cluster of rain cells, with the temporal location of cells relative to the event origin specified by one of two clustering mechanisms - Bartlett-Lewis or Neyman-Scott. In the most commonly used models, each cell is assumed to have a random duration, during which rain with a constant random intensity is deposited, giving rise to their description as “rectangular pulse models”. The models’ ability to generate simulations in continuous time is one of their principal advantages, allowing aggregation of the properties and simulations to different timescales in a consistent way. A further important feature of the models is their representation of the physical rainfall process in a realistic, if simplified way, such that the hierarchical structure of rainfall is represented, and the parameters have interpretable meanings. This means that fitted model parameters can provide some insight into the nature of differences between sites, or indeed between different potential future climate conditions. Since their introduction, many refinements have been introduced. Key amongst these have been those which have allowed for different types of rainfall. These include models with multiple cell-types (Cowpertwait 1994), or multiple superposed processes (Cowpertwait 2004, Cowpertwait et al. 2007). In order to keep parameter numbers manageable, these methods have generally limited the number of cell types or processes to just two, which can be thought of as representing heavy, short-duration convective and lighter, long-duration stratiform types of rainfall. An alternative modification to enable variation between storms is the randomisation of the cell duration parameter between storms (Rodriguez-Iturbe et al. (1988), Entekhabi et al. (1989)). In effect this allows a continuous range of storm types. 
The primary motivation was to improve the fit of the models to the probability of no rain in an interval (“proportion dry”), particularly for longer periods of several hours or more. In the Bartlett-Lewis case (Rodriguez-Iturbe et al. 1988) the model was re-parameterised such that different storms essentially had the same structure, but operated on different timescales. This model was recommended by Wheater et al. (2005), following a practical review of several models, for combining good performance with a relatively parsimonious model structure. A more recent approach from Cowpertwait (2010) addresses the issue of different types of rainfall, by assuming a continuum of storm types of random type Z, whose parameters are functions of Z, and investigates this approach in the special case where Z is uniformly distributed. This approach again has the advantage of being relatively parsimonious, but choices for the distribution of Z and parameter functions of Z are likely to be limited by tractability.

Other variations of the basic models include the addition of a jitter to make the cell intensity more realistically irregular (Rodriguez-Iturbe et al. 1987, Gyasi-Agyei & Willgoose 1997), the introduction of dependence between cell duration and intensity (Kakou & Onof 1996) and a more realistic assumption for the shape of rainfall intensity within cells (Northrop & Stone 2005).

The models are fitted to discrete data from rain-gauges, typically using the generalised method of moments (the complexity of the models, particularly when aggregated, making a maximum likelihood method impracticable). This is a fairly subjective method, for which there is considerable flexibility, particularly in terms of the number and types of properties chosen for fitting and the weights applied to these. Examples of practical application are numerous (Onof et al. 2000, Wheater et al. 2005, Cowpertwait 2006, Kilsby et al. 2007, Burton et al.
2008), with generally very good performance, and little to choose between the two clustering mechanisms (Wheater et al. 2005). Although some shortcomings in performance are found, it is not always clear to what extent these are due to the models, or the fitting, the subjective fitting method being at once a disadvantage, and an advantage (since weights and properties can be selected to focus specifically on areas in which the hydrologist is most interested for a particular application). The most commonly noted shortcomings relate to the reproduction of wet/dry properties and to extremes. The former was addressed to some extent by randomising the cell duration parameter, as discussed; the latter by the introduction of the skewness coefficient as one of the fitting properties (Cowpertwait 1997). Parameter identifiability can be a problem, particularly with the model variants with relatively high numbers of parameters, such as those with multiple cell types or superposed processes. Cowpertwait (2010) suggests that any more than eight parameters per season is likely to be excessive since the sample moments used in model fitting are highly correlated. This is backed-up by empirical studies, for example Wheater et al. (2005), in a comparison of models, found that the Bartlett-Lewis model with random cell duration with a one-parameter cell intensity distribution (6 parameters) had reasonably well-identified parameters, whereas a model with two cell types, each with a two-parameter intensity function (10 parameters) did not. Another shortcoming of these models is that they are stationary, and much recent development has focused on simulating future rainfall, allowing for potential impacts of climate change. However, to date no straightforward approach has been found, and many of the approaches in the existing literature continue to use the clustered Poisson models within their methodology. 
For example, such models are used for downscaling climate model output (Kilsby et al. 2007, Burton et al. 2010), or for the disaggregation of simulations produced using alternative modelling strategies from daily to sub-daily (see for example Glasbey et al. (1995), Koutsoyiannis & Onof (2000, 2001) for methodology and Chandler et al. (2007) for application). An alternative approach from Fowler et al. (2000) used the Neyman-Scott rectangular pulse model to simulate rainfall within a given weather-state, the states themselves being modelled using a semi-Markov process.

While much of the application of the models has been at hourly or longer timescales, there is also a significant requirement for sub-hourly resolution, in particular for the design of stormwater sewerage systems. This was the motivation for the development of the Bartlett-Lewis Pulse model (Cowpertwait et al. 2007), which replaces the rectangular rain cells of the original Bartlett-Lewis model with a Poisson process of instantaneous pulses (thus incorporating two levels of clustering, and allowing greater variability in rain intensity at short timescale). We will refer to this model as the Bartlett-Lewis Instantaneous Pulse model (BLIP). The model achieved a very good fit to a time-series of five-minute rainfall data from a site near Wellington, New Zealand, using two superposed storm processes.

In this paper, we examine the performance of the BLIP model on another long series of five-minute rainfall, from a single rain-gauge in Bochum, Germany. We confirm the problems of parameter identifiability of the 11-parameter model with two superposed processes, and therefore the need to develop a parsimonious model structure that is still capable of allowing for different types of precipitation. We therefore go on to develop a version of the BLIP model with a random cell duration parameter, following the approach of the Random Parameter (or Random η) Bartlett-Lewis model of Rodriguez-Iturbe et al.
(1988), and compare the fit using this model against the non-random version, and against other key variants. Note that although we are focusing here purely on temporal models, i.e. those fitted to a single site, such models may readily be extended to the spatial dimension and fitted to rain-gauges following the approach of Cowpertwait (1995) and Cowpertwait et al. (2002).

2 Specification of the Bartlett-Lewis suite of models

2.1 Summary of Existing Models

In the basic Bartlett-Lewis Rectangular Pulse (BLRP) model, rain-events arrive in a Poisson process of rate λ, each event generating a cluster of cell arrivals. The Bartlett-Lewis clustering mechanism assumes that the time intervals between successive cells are independent, identically distributed random variables (whereas in the Neyman-Scott model, it is the temporal distances of the cells from their storm origin which are independent and identically distributed). It is normally assumed that the intervals between cells are exponentially distributed, so that the cell arrivals constitute a secondary Poisson process of rate β. Each cell is associated with a rectangular pulse of rain, of random duration, L, and with random intensity, X. In the simplest version of the model, these are both assumed to be exponentially distributed with parameters η and 1/µX respectively, and are independent of each other. The cell origin process terminates after a time that is also exponentially distributed with rate γ. This basic version thus has five parameters in total. Additional flexibility can be added by allowing for a distribution with more parameters for pulse intensities. A distribution with a longer tail may help in particular with the fit of extreme values, and popular variants include the Gamma and Weibull distributions. One additional parameter is required in order to use either of these.
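The structure of the basic BLRP model can be made concrete with a short simulation. The following is a minimal Python sketch, not the software used in this study: the function names and the parameter values in the usage lines are hypothetical, all distributions are taken to be exponential as in the simplest version described above, and a cell is placed at each storm origin. The total intensity at time t is computed as the sum of the intensities of all cells active at t.

```python
import random


def simulate_blrp(lam, beta, gamma, eta, mu_x, t_end, seed=0):
    """Simulate rectangular cells as (start, end, intensity) tuples.

    lam: storm arrival rate; beta: cell arrival rate within a storm;
    gamma: rate of the exponential storm (cell-origin process) lifetime;
    eta: rate of the exponential cell duration; mu_x: mean cell intensity.
    Only storms with origin in [0, t_end) are generated.
    """
    rng = random.Random(seed)
    cells = []
    storm = rng.expovariate(lam)                  # first storm origin
    while storm < t_end:
        storm_end = storm + rng.expovariate(gamma)
        origin = storm                            # a cell at the storm origin
        while origin < storm_end:
            duration = rng.expovariate(eta)       # mean 1/eta
            intensity = rng.expovariate(1.0 / mu_x)  # mean mu_x
            cells.append((origin, origin + duration, intensity))
            origin += rng.expovariate(beta)       # next cell origin
        storm += rng.expovariate(lam)             # storms may overlap
    return cells


def intensity_at(cells, t):
    """Total intensity Y(t): sum over all cells active at time t."""
    return sum(x for s, e, x in cells if s <= t < e)


def depth_in(cells, a, b):
    """Rainfall depth accumulated over the interval [a, b)."""
    return sum(x * max(0.0, min(e, b) - max(s, a)) for s, e, x in cells)


# Illustrative (hypothetical) parameter values, with time in hours.
cells = simulate_blrp(lam=0.02, beta=0.5, gamma=0.1, eta=2.0, mu_x=1.0,
                      t_end=2000.0, seed=1)
hourly = [depth_in(cells, h, h + 1) for h in range(2000)]
```

Because the simulation is in continuous time, the same list of cells can be aggregated to any timescale by changing the interval passed to `depth_in`.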
Both storms and cells may overlap, and the total intensity of rain at any point in time, Y (t) is given by the sum of all pulses “active” at time t. The Random Parameter Bartlett-Lewis model (Rodriguez-Iturbe et al. 1988) extends this basic model by allowing the parameter η, that specifies the duration of cells, to vary randomly between storms. This is achieved by assuming that the η values for distinct storms are independent, identically distributed random variables from a Gamma distribution with index α and scale parameter ν. The model is re-parameterised so that, rather than keeping the cell arrival rate, β, and the storm termination rate, γ constant for each storm, it is the ratio of both of these parameters to η that is kept constant. Thus, for a higher η (i.e. typically shorter cell durations), we have correspondingly shorter storm durations, and shorter cell interarrival times. This is desirable as it is in line with what we observe in practice - that short duration convective rain is more intense than the longer duration stratiform rain. Essentially the effect is that all storms have a common structure, but distinct storms occur on different (random) timescales. An issue exists with the original random η model (Verhoest et al. 2010), which led to the development of the Truncated Random η model (Onof, C., T. Meca-Figueras, J. M. Kaczmarska, R. E. Chandler, and L. Hege, Modelling rainfall with a Bartlett-Lewis process: third-order moments, proportion dry, and a truncated random parameter version, (manuscript in preparation, 2011)), where the Gamma distribution for the cell duration parameter, η, is truncated, with support (ε, ∞). The issue arises due to the divergence at zero of integrals over η for the variance and skewness of the aggregated series, for certain values of the shape parameter α. 
In fact, if the skewness coefficient is to be included in the fitting, α in the original model would need to be greater than 4, which in practice is an undesirable constraint. The lower limit, ε, for the integrals over η can be pre-specified, or alternatively, as in the Truncated Random η model, can constitute a further parameter to be determined.

The Bartlett-Lewis Instantaneous Pulse model (Cowpertwait et al. 2007), intended for fitting to fine-scale (of the order of five to fifteen minute) data, has a minimum of six parameters (one more than the original Bartlett-Lewis model), and is defined and parameterised as follows:

• Storm origins arrive in a Poisson process of rate λ.
• Each storm origin initiates a Poisson process of cell origins of rate β; in contrast to the basic Bartlett-Lewis model, it is not assumed that there is a cell at the storm origin itself, so a storm may have no rainfall. This is purely for mathematical convenience and does not lead to any loss of generality.
• Each cell origin initiates a further Poisson process of rainfall pulses of rate ξ. Again, it is not assumed that there is a pulse at the cell origin, so a cell may have no rainfall. Note that the pulses are instantaneous - they have a depth, but no duration. This Poisson process of instantaneous pulses replaces the rectangular pulse assumption of the original Bartlett-Lewis model.
• Both the storm duration (the duration of the cell origin process), and the cell duration are assumed to be exponentially distributed, the former with rate γ, and the latter with rate η. The process of pulses terminates with the cell or storm lifetime, whichever is the sooner.
• Associated with each pulse is a depth, X, so the pulse process is a marked point process (Cox & Isham 1980). The model developed by Cowpertwait et al. (2007) allows pulse depths from a single cell to be dependent, but those from distinct cells are assumed independent.
No specific dependence structure is specified, and the model fitted in the paper assumed independent, exponentially distributed pulse depths, with mean depth µX. The fitted model also assumed two superposed processes, with a common depth parameter across the two storm types, giving a total of eleven parameters.

2.2 Developing a Random η Version of the Bartlett-Lewis Instantaneous Pulse model

For the randomisation of η in the BLIP model, we take the same approach as for the original Bartlett-Lewis model, but now with the additional assumption that the ratio of the pulse arrival rate to the cell duration parameter (ι = ξ/η) is kept constant. In order to calculate the moments, it is helpful to think of the random η model as the superposition of a continuum of independent processes with random cell duration parameter, η, and storm origin rate, λ f(η), where f(η) is the density function of η. Now, the rth cumulant of a sum of independent random variables is the sum of their rth cumulants. Therefore the mean, variance and 3rd central moment (which are the first three cumulants) can simply be obtained by replacing λ with λ f(η) in their original equations, and integrating over possible values of η.

The integration approach described requires some expectations of functions of η. In particular, we will need to use E_η[(1/η)^k e^{−ηx}] for k = 1 and various values of x, given by:

\[
E_\eta\left[\left(\frac{1}{\eta}\right)^{k} e^{-\eta x}\right]
= \frac{\nu^{\alpha}}{\Gamma[\alpha]} \int_0^{\infty} \eta^{\alpha-1-k}\, e^{-(\nu+x)\eta}\, d\eta
= \frac{\nu^{\alpha}}{\Gamma[\alpha]} \times \frac{\Gamma[\alpha-k]}{(\nu+x)^{\alpha-k}}
\]

Note that, in order for the integral not to diverge at zero, we require α > k. This proved to be an issue for the original Bartlett-Lewis model, as discussed above, where the skewness integral included elements with k = 4. For the Bartlett-Lewis Instantaneous Pulse model, we only need k = 1, so that we require α > 1, which does not significantly prejudice the fit, and a “truncated” version is thus not required. The moments are derived from the original equations of Cowpertwait et al.
(2007), by taking expectations over η and using the formula above, as discussed. All the moments can be expressed exactly, which is an advantage for this type of model where numerical approximations can lead to slow computational speeds. The moments for the new model are given in the Appendix.

3 Fitting the Models

The generalised method of moments (GMM) is used for fitting. This is an extension of the method of moments which estimates parameters by equating expressions for population moments with their sample values. In the GMM, the number of properties that we want to fit to exceeds the number of unknown parameters, and our estimator is given by the value of θ that minimises:

\[
S(\theta \mid T) = (T - \tau(\theta))'\, W\, (T - \tau(\theta))
\]

for some positive definite weighting matrix W, where θ is the unknown parameter vector, T is the vector of observed values for a set of k properties, and τ(θ) is the vector of their expected values under the model. S is referred to as the “objective function”. Here, we take W to be a diagonal matrix, so that the objective function becomes

\[
S(\theta \mid T) = \sum_{i=1}^{k} w_i \left[ T_i(y) - \tau_i(\theta) \right]^2 ,
\]

with the w_i equal to 1/Var(T_i(y)). This is a slight simplification of the theoretically optimal approach (in terms of the identifiability of parameters) of Hansen (1982), where W is the inverse of the covariance matrix of statistics.

Note that, since the number of properties included in S exceeds the number of parameters, there is no guarantee that there will be a good fit to all the fitting properties. The adequacy of the fit is thus assessed by considering properties used in the fitting procedure, as well as others that are of interest in hydrological applications. Some properties will need to be assessed using simulations, for example, extreme values. We follow Cowpertwait et al. (2007) in our choice of fitting properties - the hourly mean, plus the coefficient of variation, lag-1 correlation and skewness at timescales of 5 minutes, 1 hour, 6 hours and 24 hours.
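The diagonal-weight objective function is straightforward to evaluate once the model moments are available. The sketch below, in Python, is an illustration only: `toy_moments` is a hypothetical stand-in for the analytical BLIPR moment expressions (which are given in the Appendix and not reproduced here), and the observed values and variances are invented numbers. With the fitting properties listed above there would be 1 + 3 × 4 = 13 entries in each vector, rather than the three used here.

```python
def gmm_objective(theta, observed, model_moments, variances):
    """Weighted sum of squares S(theta | T) with weights w_i = 1 / Var(T_i)."""
    expected = model_moments(theta)
    return sum((t - tau) ** 2 / v
               for t, tau, v in zip(observed, expected, variances))


def toy_moments(theta):
    """Hypothetical two-parameter model with three properties (illustration)."""
    a, b = theta
    return [a, a * b, a + b]


# Invented observed properties and their sampling variances.
observed = [2.0, 6.0, 5.0]
variances = [1.0, 4.0, 0.25]

s_exact = gmm_objective((2.0, 3.0), observed, toy_moments, variances)
s_off = gmm_objective((1.0, 3.0), observed, toy_moments, variances)
```

Dividing each squared discrepancy by the sampling variance of the corresponding statistic means that properties estimated with high precision contribute more to S than those that vary strongly from year to year.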
Minimisation of S requires a numerical optimisation routine. The approach followed here to fit the Bochum data, is that of Wheater et al. (2005), and we have used the optimisation routines developed for that project. Firstly, a set number of optimisations are carried out using the Nelder-Mead method, each starting with a different initial value for the set of parameters. This set of initial values is generated by random perturbation about a single user-supplied value. The best parameter set is then used as a new starting value for a further set of optimisations, which now use a Newton-type algorithm. The reason for the use of two different optimisation routines is that the first is more robust and thus well suited to identifying promising regions of the parameter space, whereas the second is more powerful if given good starting values. We used the method outlined by Wheater et al. (2005), which is based on the theory of estimating equations, to estimate standard errors. However, we found that numerical instabilities in the calculation of the standard errors could give very different answers for different iterations, even when broadly the same solution for the parameter set was found, and for some of the more complex models, standard errors could not be found at all (due to singularity of the Hessian matrix, required in the calculations). In terms of assessing parameter uncertainty, we therefore preferred the alternative approach suggested by Wheater et al. (2005), which is the examination of profile objective functions. Each parameter in turn is fixed at each of a set of values, and the objective function is optimised over the remaining parameters. The resulting plot for each parameter showing the optimised objective function against the set of parameter values provides a useful means for assessing the identifiability of the parameter - for example, a very flat objective function indicates a wide range of plausible values. 
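The profile objective function procedure described above can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not the optimisation code of Wheater et al. (2005): it uses two hypothetical toy objective functions with only two parameters, so that the inner optimisation over the remaining parameter can be done with a simple golden-section search. In `composite`, the two parameters enter only through their product, which is one way in which a flat profile can arise.

```python
import math


def golden_min(f, lo, hi, tol=1e-10):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    x = 0.5 * (a + b)
    return x, f(x)


def profile(objective, grid, lo, hi):
    """Fix the first parameter at each grid value; optimise over the second."""
    return [golden_min(lambda b: objective(a, b), lo, hi)[1] for a in grid]


def identified(a, b):
    """Toy objective in which both parameters are well identified."""
    return (a - 2.0) ** 2 + (b - 3.0) ** 2


def composite(a, b):
    """Toy objective that depends on the parameters only through a * b."""
    return (a * b - 6.0) ** 2


curved = profile(identified, [1.0, 2.0, 3.0], 0.1, 20.0)
flat = profile(composite, [1.0, 2.0, 3.0, 4.0], 0.1, 20.0)
```

In this toy setting `curved` rises on either side of a = 2, whereas `flat` stays at essentially zero across the grid, so that a wide range of values of the first parameter is equally plausible.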
Approximate 95% confidence intervals can also be calculated using the objective function itself (although here again there may be problems with numerical instabilities). We will use this approach to give an idea of parameter uncertainty for our new model.

4 Comparison of Models on Bochum Data

4.1 Models Fitted

The models were fitted, using the methodology and fitting properties discussed, to 69 years of five-minute rainfall data from a single site in Bochum in Germany. The measurements were obtained using a Hellmann rain gauge, in which rain displaces a float and a marking pen attached to the float makes a continuous trace on a recording chart. A separate fit was produced for each month, to allow for seasonality. In each case, we assume that σX/µX = 1, and that the skewness coefficient of X is 2 (effectively X is exponentially distributed). For the Instantaneous Pulse models, initially we also assume that all pulse depths are independent, and, for the two storm type version, we follow Cowpertwait et al. (2007) in assuming a common mean depth for both types. Initially, no constraints were imposed on the parameters other than that they should be greater than zero. The six models initially fitted were:

Rectangular Pulse Models
1. the Bartlett-Lewis Rectangular Pulse model (BLRP)
2. the Bartlett-Lewis Truncated random η Model (BLRPR)
3. the Bartlett-Lewis Rectangular Pulse model with two superposed processes (BLRP2);

Instantaneous Pulse Models
1. the Bartlett-Lewis Instantaneous Pulse Model (BLIP)
2. the Bartlett-Lewis Instantaneous Pulse Random η model, developed in Section 2.2 (BLIPR)
3. the Bartlett-Lewis Instantaneous Pulse model with two superposed processes (BLIP2)

For the Bartlett-Lewis Rectangular Pulse model, on randomising η, the fitted solution gave such a high precision to the mean cell duration, that it effectively replicated the non-random solution.
Thus, the fitted parameter set for the BLRPR model is simply a re-parameterised version of the set of BLRP parameters, and there is thus no improvement in the fit compared with the fixed η version. This appears to contradict examples in the literature where the randomised η version had shown an improved fit compared to the fixed η model (Rodriguez-Iturbe et al. 1988, Wheater et al. 2005). On further investigation, we concluded that the improvement in the fit to proportion dry that had previously been found by randomising η was at the expense of a deterioration in the fit to the skewness, which had not been included as a fitting property in these earlier analyses.

Fitting the models with two superposed processes proved problematic. Although the BLRP2 model with no parameter constraints gave a very good fit in terms of a low minimum objective function value, the parameters thus obtained were highly unstable, unrealistic and inconsistent from month to month, and no standard errors could be found. It was clear that there was insufficient information in our observed data to identify the large number of required parameters. Introducing constraints for the parameters increased the minimum objective function values, and did not resolve the situation, with resulting solutions having many parameters lying on the constraint boundaries. We therefore concluded that ensuring realistic and reasonably smooth parameters across months would require constraints on the relationships between parameters, rather than just setting bounds on individual parameters.

Although it has two more parameters than the BLRP2 model, the Bartlett-Lewis Instantaneous Pulse model with two superposed processes (BLIP2) proved slightly less problematic. For this model, with minimal constraints, we found parameters for most months which were within realistic bounds and which gave a very low minimum objective function value.
However, here too we found the solutions to be quite unstable, with issues of parameter identifiability, particularly in the summer months, and again no standard errors could be found. We came to the conclusion that both of these models’ parameter identifiability issues made them unsuitable for practical application.

Given the above findings, we present results here for the following three models only: BLRP, BLIP, BLIPR. For the Bartlett-Lewis Instantaneous Pulse Random η model (BLIPR), the unconstrained solution gave an extremely high number of pulses per hour, so for practical reasons, we constrained µX to be above 0.001. This resulted in the fitted µX being at the constraint level for all months (effectively reducing the number of parameters by one), with all other parameters broadly as before, except for a corresponding change in ι. The quality of the fit was unchanged with this constraint, as the product term µX ι effectively forms a single composite parameter over most of the possible parameter space, as we will see in Section 6 from the profile objective functions. We also constrained α to be above 1, as discussed in Section 2.2.

In the next section we will compare these three models, firstly in terms of the moments and the minimum objective function value, and then by considering wet/dry properties, which were not included within the objective function.

4.2 Performance Comparison of the Fitted Models

4.2.1 Moments

Plots of the fits of the models (BLRP, BLIP, BLIPR) against the observed data for each month in respect of the mean, variance, lag-1 correlation and skewness coefficient are shown in Figures 1-4. All the models generally perform well with respect to the properties included in the fitting. They reproduce the mean exactly (this is not a given, since the number of properties fitted exceeds the number of parameters), and fit the variance well at all timescales.
All tend to underestimate the lag-1 auto-correlation at longer timescales, and the skewness at the shorter ones. It is interesting that the BLRP model generally outperforms the BLIP model, with a lower minimum objective function value in all months except January and December. The model with rectangular pulses has generally been considered unsuitable for timescales shorter than the mean cell duration, due to the unrealistic intensity shape. However, when fine-scale data is available for fitting, the fitted model tends to have shorter, more frequent cells than if only hourly data is available (of the order of 5-10 minutes, compared with 20-40 minutes for most months), which are arguably more realistic, and which broadly resolve the problem. The best fit, however, is achieved by the new BLIPR model, and this has a lower minimum objective function value for all months than the BLRP or BLIP models.

4.2.2 Wet/dry properties

The proportion of dry intervals is a very important property for hydrological applications. Although this could have been included as one of the fitting properties, it is useful to reserve an important feature for subsequent model validation, as this gives an independent test of the appropriateness of the model structure. Plots of the fits of the models against the observed data for each month in respect of the proportion dry are shown in Figure 5. The BLIPR model can be seen to outperform the other models (including the BLIP2 model) strongly with respect to the fit to proportion dry, across all timescales. It is also of interest to consider the wet and dry spell transition probabilities (i.e. the probability that a wet interval is followed by another wet interval, or a dry by another dry), which are important for the accurate modelling of antecedent conditions. Figure 6 shows that the BLIPR model again outperforms the other models with respect to the wet spell transition probability.
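As an illustration of how these validation properties might be computed from an observed or simulated series of interval depths, a minimal Python sketch follows. The function names, the zero-depth threshold used to define a wet interval, and the example series are assumptions made for illustration; the treatment of measurement resolution in the actual analysis may differ.

```python
def wet_dry_stats(depths, threshold=0.0):
    """Proportion dry and wet-wet / dry-dry transition probabilities."""
    wet = [d > threshold for d in depths]
    pairs = list(zip(wet, wet[1:]))            # consecutive interval pairs
    from_wet = [nxt for cur, nxt in pairs if cur]
    from_dry = [nxt for cur, nxt in pairs if not cur]
    nan = float("nan")
    return {
        "prop_dry": wet.count(False) / len(wet),
        "p_wet_wet": from_wet.count(True) / len(from_wet) if from_wet else nan,
        "p_dry_dry": from_dry.count(False) / len(from_dry) if from_dry else nan,
    }


def aggregate(depths, k):
    """Sum consecutive non-overlapping blocks of k values."""
    return [sum(depths[i:i + k]) for i in range(0, len(depths) - k + 1, k)]


# Invented example: eight consecutive interval depths.
depths = [0.0, 0.0, 1.5, 0.5, 0.0, 0.0, 0.0, 2.0]
fine = wet_dry_stats(depths)
coarse = wet_dry_stats(aggregate(depths, 2))
```

For five-minute data, `aggregate(depths, 12)` would give hourly totals, so the same statistics can be compared across timescales.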
While the BLRP model has a good fit at the hourly timescale, and the BLIP2 model at five minutes, these both perform poorly at other timescales, with only the BLIPR model showing consistency of performance across timescales. There is less difference between models for the dry spell transition probabilities, with all models providing a reasonable fit at all timescales.

Based on the properties examined so far, the BLIPR model gives the best performance. Finally, in the next Section, we consider the fit of this model to extreme values, and include one further minor modification to the structure of pulses within cells to improve this aspect.

5 Extreme Value Performance

In the derivation of moments for the BLIP model (Cowpertwait et al. 2007), pulse depths for pulses within the same cell were allowed to be dependent, although the empirical fits assumed independence, as have the fits we have carried out so far. Intuitively, dependent pulse depths should allow higher values of extremes at short timescales, which is desirable since we are currently understating five-minute skewness. We suppose here the most extreme form of dependence in which pulse depths within the same cell have a common depth, with depths in different cells still allowed to vary (denoted the BLIPRd model). This was found to give a lower minimum objective function value than the independent pulse version. The fit to five-minute skewness was much improved, albeit with a slight deterioration in the variance at the 24-hour timescale. Table 1 shows the minimum objective function value for each of the models that we have successfully fitted, for each month. Since the same set of moments and weights was used for each model, these are directly comparable.
        BLRP    BLIP   BLIPR  BLIPRd
Jan       83      67      45      40
Feb       38      56      30      24
Mar      100     113      58      48
Apr      110     168      85      66
May      141     239      93      76
Jun      152     275      92      72
Jul      162     345     110      95
Aug      140     268      86      76
Sep      149     271      87      65
Oct       92     150      71      50
Nov       68      76      30      25
Dec       68      67      32      28

Table 1: Comparison of minimum objective function value

For our data, the months with the highest rainfall, rainfall variability and skewness are the summer months, and these are also the months with the highest extremes. A comparison of the fit of extremes for July for the BLIPR model is given in Figure 7, using Gumbel plots. These compare the observed annual maxima (for the month of July) against fifteen simulations, where each simulation is of the same length as our observed data. The maximum rainfall per unit-time is plotted against the “reduced-variate” −ln(−ln(1 − 1/R)), where R is the return period, i.e. the average time period within which rainfall of the specified magnitude can be expected to occur once. The graphs for July show that the model has a slight tendency to underestimate extremes, as has been noted before for this type of model. Results for other months give a fairly similar picture.

A comparison showing mean annual extremes (averaged over fifteen simulations) for a number of alternative models at the five minute and hourly timescales is also shown in Figure 8. At the five minute timescale, the BLIPRd model gives the best performance, although all the models underestimate the extremes. Results are closer at the one-hour timescale, and for longer timescales, there is essentially no difference between models.

Note that, although the simulated extremes under-estimate the observed values for all timescales, this is partly due here to sampling variation. We have fitted to the mean observed properties, including the skewness coefficient, averaging over each of the 69 years of data.
For our data, this gave a lower observed skewness coefficient than we would have obtained by calculating over all 69 years, although in practice the difference could go either way. The latter would have given us a slightly better fit to our extremes, but does not permit the calculation of the covariance matrix for the observed statistics, since it is just a single sample. Based on our analysis, the BLIPRd model is shown to be the best performing of the models compared, both in terms of the moments fitted and, more importantly, in respect of the wet/dry properties and extreme values not included in the fit. The fitted parameter set for the BLIPRd model is given in Table 2. It is interesting to consider the parameters in terms of their physical realism, and to consider also the intuition behind our results. Comparing with empirical observations from Houze & Hobbs (1982), the parameter values seem reasonable. Winter storms last several hours and have around 20 cells, each lasting on average around 22 minutes. In summer, storms have a similar mean duration, but only around 8 cells. However, these have a correspondingly much higher pulse rate, giving broadly the same amount of rainfall per storm over all months. In terms of the intuition behind our results, we conclude that it is not the replacement of rectangular pulses by instantaneous ones that leads to the improved performance of the BLIPR model, compared with the BLRPR model. This is clear from the fact that the better-performing version of the instantaneous pulse model is the one where all pulses within a cell have the same depth. With its very short pulse inter-arrival times, that version effectively replicates rectangular pulses.
Instead, we attribute the improved performance of the BLIPR and BLIPRd models to the fact that, unlike the BLRPR model, these allow rainfall intensity to vary with cell duration, since the pulse rate effectively drives the intensity and is proportional to the cell duration parameter, η. Our model thus gives a simple, but effective way of introducing dependence between cell duration and intensity.

Month       λ       µX       α      α/ν       κ       φ      ι
Jan    0.0236   0.0010  2.1468   4.5905  1.0273  0.0458    173
Feb    0.0235   0.0010  3.6795   4.3936  1.0958  0.0582    187
Mar    0.0227   0.0010  1.9210   5.5057  0.7161  0.0436    203
Apr    0.0240   0.0010  1.9902   6.7368  0.5176  0.0387    248
May    0.0274   0.0010  1.5185   7.7972  0.5094  0.0604    393
Jun    0.0321   0.0010  1.2407  10.1391  0.4679  0.0675    530
Jul    0.0308   0.0010  1.6347  10.3442  0.1883  0.0452    899
Aug    0.0298   0.0010  1.2842  11.1444  0.5649  0.0796    521
Sep    0.0256   0.0010  1.3859   8.9390  0.4163  0.0540    505
Oct    0.0206   0.0010  2.1263   6.6978  0.5801  0.0406    286
Nov    0.0251   0.0010  1.9307   5.3808  1.0629  0.0491    181
Dec    0.0264   0.0010  2.0346   4.5838  1.0926  0.0544    188

Table 2: Parameters for Bartlett-Lewis Pulse random η model, with common pulse depths within the same cell

6 Parameter Identifiability and Confidence Intervals for the Bartlett-Lewis Random η Pulse model

Finally, we explore the parameter identifiability of the new model using profile objective functions as described in Section 3. For these, we fitted the model on the natural log scale of the parameters, which gave the same fitted solution as we had before, but with greater stability, so that we could derive a Hessian and approximate confidence intervals for all months. 95% intervals are given for illustration for the month of January in Table 3. Profile objective functions for the log of all parameters, again for the month of January, are shown in Figure 9. These illustrate the large range over which the profile objective functions for µX and ι are flat, as discussed in Section 4.
Over this range, the product of these two parameters effectively constitutes a single parameter, such that an increase in one of them can be directly compensated for by a corresponding decrease in the other. Once ι gets too small, however, terms in its reciprocal in the skewness equation start to be significant and the relationship changes. The spikes in some of the plots are an indication of numerical difficulties in the fitting. Re-running the plots would tend not to replicate these, but might produce other spikes at different locations.

Parameter   95% Interval
λ           (0.021, 0.027)
µX          (NA, 0.0030)
α           (1.611, 2.970)
α/ν         (3.642, 5.671)
κ           (0.842, 1.233)
φ           (0.037, 0.057)
ι           (54.206, NA)

Table 3: Approximate confidence intervals for January’s parameter estimates for the Bartlett-Lewis Pulse random η model, with common pulse depths within the same cell

7 Discussion and Conclusions

In this paper, an extension to the Bartlett-Lewis Pulse model has been developed, the BLIP Random η model, which allows the cell duration parameter to vary randomly between storms, following the approach of the original Random η Bartlett-Lewis Rectangular Pulse model. A version of the model in which all pulses within the same cell have the same depth is found to be an improvement on the assumption of independent pulse depths. The new model is found to perform well at all timescales, with a marked improvement over the fixed η version in the fit to skewness at short timescales and to the proportion dry at all timescales. The model also outperforms the original Random η Bartlett-Lewis model, which is found to give the same solution as the fixed η version if skewness is included as one of the fitting properties. The fit to extremes at very short timescales, although better than for the BLRP and BLIP models, remains a potential area for improvement, but for most months, the fit at timescales of one hour or more is satisfactory.
It is possible that an alternative distribution for the pulse intensities might improve the fit to extremes further. This was not investigated in depth here, other than replacing the Exponential distribution with the more flexible Gamma, which had no positive impact on the minimum objective function value nor on the fit to extremes. In fact, the fitted parameter σX/µX, allowed to vary from its previous constrained value of 1, was close to zero for most months, such that the pulse depths were effectively fixed at their mean value. A longer-tailed distribution, such as the Pareto or Weibull, might be more effective, but this was not pursued here.

The BLIPRd model has seven parameters, effectively reduced to six, since for most of the parameter space the product of the mean depth, µX, and ι (the ratio of the mean cell duration to the mean pulse inter-arrival time) constitutes a single parameter. The model is therefore far more stable than the alternative “two superposed processes” version, which also aims to allow for different storm types. Even so, our profile objective function plots show that issues of parameter identifiability remain, and constraints may be desirable to ensure that parameters are physically realistic. The BLIPRd model is therefore our preferred model for practical application, improving on the fit of the commonly used BLRPR model, with no greater complexity. Although not pursued further here, it would be interesting also to consider linking the cell intensity random variable X in the BLRPR model to the cell duration parameter η, by assuming that the mean intensity µX varies in proportion to η. This is expected to give similar results.

8 Acknowledgements

Deutsche Montan Technologie and Emschergenossenschaft/Lippeverband in Germany are gratefully acknowledged for providing the data. I would also like to thank Valerie Isham, Christian Onof, Richard Chandler and Joao Jesus for helpful advice.
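As a numerical companion to the closed-form expressions in Appendix A (A.1 to A.3), the following minimal sketch evaluates the mean, variance and lag-k covariance of the aggregated process for the BLIPRd model. Several things here are assumptions rather than statements from the text: the function name `blipr_moments` is hypothetical; pulse depths are taken to be exponential with a common depth within each cell, so that E(X_ijk X_ijl) = E(X^2) = 2µX^2; ν is recovered from the tabulated ratio α/ν; the mean number of pulses per storm is taken as µp = κι/(φ(1 + φ)), the value consistent with the prefactor of the third-moment expression in A.4; and the parameters are assumed to be expressed in hours and millimetres.

```python
import math


def blipr_moments(lam, mu_x, alpha, alpha_over_nu, kappa, phi, iota, h, lag=1):
    """Mean, variance and lag-`lag` covariance of h-hour totals (Appendix A.1-A.3)."""
    nu = alpha / alpha_over_nu                  # nu recovered from the tabulated ratio alpha/nu
    mu_p = kappa * iota / (phi * (1.0 + phi))   # ASSUMPTION: mean number of pulses per storm
    ex2 = 2.0 * mu_x ** 2                       # ASSUMPTION: exponential depths, E(X^2) = 2 mu_X^2
    exx = ex2                                   # ASSUMPTION: common depth, E(X_ijk X_ijl) = E(X^2)
    bracket = exx - mu_x ** 2 * kappa * phi / (phi + 2.0)
    inv_eta = nu / (alpha - 1.0)                # E(1/eta); requires alpha > 1

    def e_exp_over_eta(c):
        # E_eta[exp(-c * eta) / eta] when eta has the Gamma distribution of Appendix A
        return nu ** alpha / ((alpha - 1.0) * (nu + c) ** (alpha - 1.0))

    def var_term(a):
        return e_exp_over_eta(a * h) - inv_eta + a * h

    def cov_term(a):
        return (e_exp_over_eta(a * (lag - 1) * h)
                - 2.0 * e_exp_over_eta(a * lag * h)
                + e_exp_over_eta(a * (lag + 1) * h))

    mean = lam * mu_p * mu_x * h
    var = lam * mu_p * (
        ex2 * h
        + 2.0 * mu_x ** 2 * kappa * iota / phi ** 2 * var_term(phi)
        + bracket * 2.0 * iota / (phi + 1.0) ** 2 * var_term(phi + 1.0)
    )
    cov = lam * mu_p * iota * (
        mu_x ** 2 * kappa / phi ** 2 * cov_term(phi)
        + bracket / (phi + 1.0) ** 2 * cov_term(phi + 1.0)
    )
    return mean, var, cov


# January parameter values as listed in Table 2
JAN = dict(lam=0.0236, mu_x=0.0010, alpha=2.1468, alpha_over_nu=4.5905,
           kappa=1.0273, phi=0.0458, iota=173.0)

if __name__ == "__main__":
    mean, var, cov = blipr_moments(h=1.0, lag=1, **JAN)
    print("1-hour mean:", mean)
    print("1-hour variance:", var)
    print("lag-1 autocorrelation:", cov / var)
```

A useful internal check on any implementation of A.2 and A.3 is the identity Var[Y^(2h)] = 2 Var[Y^(h)] + 2 Cov(Y_i^(h), Y_(i+1)^(h)), which holds for any stationary process and therefore ties the variance and covariance expressions together.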
Appendices

A Moments for the Bartlett-Lewis Instantaneous Pulse Random η model

Parameter definitions
• λ - storm arrival rate
• α - shape parameter for the Gamma distribution of the cell duration parameter, η
• ν - scale parameter for the Gamma distribution of η
• κ - ratio of the cell arrival rate to η (i.e. β/η)
• φ - ratio of the storm duration parameter to η (i.e. γ/η)
• ι - ratio of the pulse arrival rate to η (i.e. ξ/η)
• µX - mean cell intensity
• E(X_ijk X_ijl) - product moment of the depths of 2 pulses within the same cell
• E(X_ijk X_ijl X_ijm) - product moment of the depths of 3 pulses within the same cell

A.1 Mean

\[
E[Y_i^h] = \lambda \mu_p \mu_X h
\]

A.2 Variance

\[
\begin{aligned}
\mathrm{Var}[Y_i^h]
&= \lambda\mu_p \Bigg\{ E(X^2)h
 + \frac{2\mu_X^2\kappa\iota}{\phi^2}\,
   E_\eta\!\left( \frac{e^{-\phi\eta h}}{\eta} - \frac{1}{\eta} + \phi h \right) \\
&\qquad + \left[ E(X_{ijk}X_{ijl}) - \mu_X^2\kappa\,\frac{\phi}{\phi+2} \right]
   \frac{2\iota}{(\phi+1)^2}\,
   E_\eta\!\left( \frac{e^{-(\phi+1)\eta h}}{\eta} - \frac{1}{\eta} + (\phi+1)h \right) \Bigg\} \\
&= \lambda\mu_p \Bigg\{ E(X^2)h
 + \frac{2\mu_X^2\kappa\iota}{\phi^2}
   \left( \frac{\nu^{\alpha}}{(\alpha-1)(\nu+\phi h)^{\alpha-1}} - \frac{\nu}{\alpha-1} + \phi h \right) \\
&\qquad + \left[ E(X_{ijk}X_{ijl}) - \mu_X^2\kappa\,\frac{\phi}{\phi+2} \right]
   \frac{2\iota}{(\phi+1)^2}
   \left( \frac{\nu^{\alpha}}{(\alpha-1)(\nu+(\phi+1)h)^{\alpha-1}} - \frac{\nu}{\alpha-1} + (\phi+1)h \right) \Bigg\}
\end{aligned}
\]

A.3 Covariance (k ≥ 1)

\[
\begin{aligned}
\mathrm{Cov}(Y_i^h, Y_{i+k}^h)
&= \lambda\mu_p\iota \Bigg[ \frac{\mu_X^2\kappa}{\phi^2}\,
   E_\eta\!\left( \frac{e^{-\phi\eta(k-1)h} - 2e^{-\phi\eta kh} + e^{-\phi\eta(k+1)h}}{\eta} \right) \\
&\qquad + \left( E(X_{ijk}X_{ijl}) - \mu_X^2\kappa\,\frac{\phi}{\phi+2} \right)
   \frac{1}{(1+\phi)^2}\,
   E_\eta\!\left( \frac{e^{-(\phi+1)\eta(k-1)h} - 2e^{-(\phi+1)\eta kh} + e^{-(\phi+1)\eta(k+1)h}}{\eta} \right) \Bigg] \\
&= \lambda\mu_p\iota\,\frac{\nu}{\alpha-1} \Bigg[ \frac{\mu_X^2\kappa}{\phi^2}
   \Bigg\{ \left(\frac{\nu}{\nu+\phi(k-1)h}\right)^{\alpha-1}
   - 2\left(\frac{\nu}{\nu+\phi kh}\right)^{\alpha-1}
   + \left(\frac{\nu}{\nu+\phi(k+1)h}\right)^{\alpha-1} \Bigg\} \\
&\qquad + \left( E(X_{ijk}X_{ijl}) - \mu_X^2\kappa\,\frac{\phi}{\phi+2} \right)
   \frac{1}{(1+\phi)^2}
   \Bigg\{ \left(\frac{\nu}{\nu+(\phi+1)(k-1)h}\right)^{\alpha-1}
   - 2\left(\frac{\nu}{\nu+(\phi+1)kh}\right)^{\alpha-1}
   + \left(\frac{\nu}{\nu+(\phi+1)(k+1)h}\right)^{\alpha-1} \Bigg\} \Bigg]
\end{aligned}
\]

A.4 3rd Central Moment

\[
\begin{aligned}
E[(Y^h - E(Y^h))^3]
&= \lambda\kappa\iota^3 \Bigg\{ \frac{6}{(1+\phi)^3}
   \left[ \frac{E(X_{ijk}X_{ijl}X_{ijm})}{\phi}
        + \frac{2E(X_{ijk}X_{ijl})\mu_X\kappa}{\phi(2+\phi)}
        - \frac{\mu_X^3\kappa^2}{2+\phi} \right] \\
&\qquad\quad \times \left[ h - \frac{2\nu}{(\alpha-1)(1+\phi)}
        + \frac{2\nu}{(1+\phi)(\alpha-1)} \left(\frac{\nu}{\nu+(1+\phi)h}\right)^{\alpha-1}
        + h\left(\frac{\nu}{\nu+(1+\phi)h}\right)^{\alpha} \right] \\
&\quad + \frac{6}{(1+\phi)(2+\phi)^2}
   \left[ -\frac{2E(X_{ijk}X_{ijl})\mu_X\kappa}{1+\phi} + \frac{\mu_X^3\kappa^2}{3+\phi} \right] \\
&\qquad\quad \times \left[ h - \frac{\nu}{\alpha-1} \left\{ \frac{3+2\phi}{(1+\phi)(2+\phi)}
        - \frac{2+\phi}{1+\phi} \left(\frac{\nu}{\nu+(1+\phi)h}\right)^{\alpha-1}
        + \frac{1+\phi}{2+\phi} \left(\frac{\nu}{\nu+(2+\phi)h}\right)^{\alpha-1} \right\} \right] \\
&\quad + \frac{6\mu_X^3\kappa^2}{\phi^3(1+\phi)}
   \left[ h - \frac{2\nu}{\phi(\alpha-1)}
        + \frac{2\nu}{\phi(\alpha-1)} \left(\frac{\nu}{\nu+\phi h}\right)^{\alpha-1}
        + h\left(\frac{\nu}{\nu+\phi h}\right)^{\alpha} \right] \\
&\quad + \frac{6}{\phi(1+\phi)^2}
   \left[ \frac{2E(X_{ijk}X_{ijl})\mu_X\kappa}{\phi} - \frac{\mu_X^3\kappa^2}{2+\phi} \right] \\
&\qquad\quad \times \left[ h - \frac{\nu}{\alpha-1} \left\{ \frac{1+2\phi}{\phi(1+\phi)}
        - \frac{1+\phi}{\phi} \left(\frac{\nu}{\nu+\phi h}\right)^{\alpha-1}
        + \frac{\phi}{1+\phi} \left(\frac{\nu}{\nu+(1+\phi)h}\right)^{\alpha-1} \right\} \right] \\
&\quad + \frac{6E(X_{ijk}^2X_{ijl})}{\iota\phi(1+\phi)^2}
   \left[ h - \frac{\nu}{(1+\phi)(\alpha-1)}
        \left\{ 1 - \left(\frac{\nu}{\nu+(1+\phi)h}\right)^{\alpha-1} \right\} \right] \\
&\quad + \frac{6E(X^2)\mu_X\kappa}{\iota\phi^2(1+\phi)}
   \Bigg[ h - \frac{\nu}{\phi(\alpha-1)}
        + \frac{\nu}{\phi(\alpha-1)} \left(\frac{\nu}{\nu+\phi h}\right)^{\alpha-1} \\
&\qquad\quad - \frac{\phi^2}{(1+\phi)(2+\phi)}
        \left( h - \frac{\nu}{(1+\phi)(\alpha-1)}
        + \frac{\nu}{(1+\phi)(\alpha-1)} \left(\frac{\nu}{\nu+(1+\phi)h}\right)^{\alpha-1} \right) \Bigg] \\
&\quad + \frac{E(X^3)h}{\iota^2\phi(1+\phi)} \Bigg\}
\end{aligned}
\]

B Figures

[Figure 1: The mean 1 hour rainfall by month, fitted v observed. Axes: month (J to D) against 1 hr mean, mm. Series: obs, BLRP, BLIP, BLIPR.]

[Figure 2: Variance by month, fitted v observed. Panels: 5-min, 1 hour, 6 hour, 24 hour. Axes: month against Var, mm. Series: obs, BLRP, BLIP, BLIPR.]

[Figure 3: Lag-1 correlation by month, fitted v observed. Panels: 5-min, 1 hour, 6 hour, 24 hour. Axes: month against ac lag 1. Series: obs, BLRP, BLIP, BLIPR.]

[Figure 4: Coefficient of skewness by month, fitted v observed. Panels: 5-min, 1 hour, 6 hour, 24 hour. Axes: month against skewness coeff. Series: obs, BLRP, BLIP, BLIPR, BLIP2.]

[Figure 5: Proportion dry by month, fitted v observed. Panels: 5-min, 1 hour, 6 hour, 24 hour. Axes: month against proportion dry. Series: obs, BLRP, BLIP, BLIPR.]

[Figure 6: Transition probability of a wet interval being followed by another wet interval, by month, fitted v observed. Panels: 5-min, 1 hour, 6 hour, 24 hour. Axes: month against wet spell transition probability. Series: obs, BLRP, BLIP, BLIPR.]

[Figure 7: Gumbel plots of observed v simulated extremes for July, using the Bartlett-Lewis Instantaneous Pulse random η model; pulses within the same cell assumed to have a common depth. Panels: 5-min, 1 hour, 6 hour, 24 hour.]

[Figure 8: Annual Gumbel plots of observed v simulated extremes for variants of the Bartlett-Lewis model. Panels: (a) 5 minute, (b) 1 hour. Axes: Gumbel reduced variate and return period (years) against rainfall, mm. Series: obs, BLRP, BLIP, BLIPR (indep), BLIPR (dep).]

[Figure 9: Profile Objective Function Plots for January for the Bartlett-Lewis Pulse random η model; pulses within the same cell assumed to have a common depth. Panels: log(λ), log(µX), log(α), log(α/ν), log(κ), log(φ), log(ι). Axes: parameter value against objective function; approximate 95% and 99% confidence limits marked.]

References

Burton, A., Fowler, H., Blenkinsop, S. & Kilsby, C. (2010), ‘Downscaling transient climate change using a Neyman-Scott Rectangular Pulses stochastic rainfall model’, Journal of Hydrology 381(1-2).
Burton, A., Kilsby, C. G., Fowler, H. J., Cowpertwait, P. S. P. & O’Connell, P. E. (2008), ‘Rainsim: A spatial-temporal stochastic rainfall modelling system’, Environmental Modelling & Software 23.
Chandler, R. E., Isham, V. S., Wheater, H. S., Onof, C. J., Leith, N., Frost, A. J. & Segond, M.-L. (2007), Spatial-temporal rainfall modelling with climate change scenarios, Technical Report FD2113, DEFRA/EA.
Cowpertwait, P., Isham, V. & Onof, C. (2007), ‘Point process models of rainfall: Developments for fine-scale structure’, Proc. R. Soc. Lond. A 463.
Cowpertwait, P., Kilsby, C. & O’Connell, P. (2002), ‘A space-time Neyman-Scott model of rainfall: empirical analysis of extremes’, Water Resources Research 38(8), 1131. doi:10.1029/2001WR000709.
Cowpertwait, P. S. P. (1994), ‘A generalized point process model for rainfall’, Proc. R. Soc. Lond. A 447, 23–37.
Cowpertwait, P. S. P. (1995), ‘A generalized spatial-temporal model of rainfall based on a clustered point process’, Proc. R. Soc. Lond. A 450, 163–175.
Cowpertwait, P. S. P. (1997), ‘A Poisson-cluster model of rainfall: high-order moments and extreme values’, Proc. R. Soc. Lond. A 454, 885–898.
Cowpertwait, P. S. P. (2004), ‘Mixed rectangular pulses models of rainfall’, Hydrology and Earth System Sciences 8(5).
Cowpertwait, P. S. P. (2006), ‘A spatial-temporal point process model of rainfall for the Thames catchment, UK’, Journal of Hydrology 330.
Cowpertwait, P. S. P.
(2010), ‘A Neyman-Scott model with continuous distribution of storm types’, Australian and New Zealand Industrial and Applied Mathematics Journal 51, 97–108.
Cox, D. & Isham, V. (1980), Point Processes, Chapman and Hall.
Entekhabi, D., Rodriguez-Iturbe, I. & Eagleson, P. S. (1989), ‘Probabilistic representation of the temporal rainfall process by a modified Neyman-Scott rectangular pulses model: parameter estimation and validation’, Water Resources Research 25(2), 295–302.
Fowler, H., Kilsby, C. & O’Connell, P. (2000), ‘A stochastic rainfall model for the assessment of regional water resource systems under changed climatic conditions’, Hydrol. Earth Sys. Sci. 4(2), 263–282.
Glasbey, C. A., Cooper, G. & McGechan, M. B. (1995), ‘Disaggregation of daily rainfall by conditional simulation from a point process model’, Journal of Hydrology 165.
Gyasi-Agyei, Y. & Willgoose, G. R. (1997), ‘A hybrid model for point rainfall modelling’, Water Resources Research 33(7).
Hansen, L. P. (1982), ‘Large sample properties of generalized method of moments estimators’, Econometrica 50, 1029–1054.
Houze, R. A. & Hobbs, P. V. (1982), ‘Organization and structure of precipitating cloud systems’, Advances in Geophysics 24, 225–315.
Kakou, A. & Onof, C. (1996), ‘A point process model for rainfall with duration intensity dependence’, Annales Geophysicae, Suppl. II to vol. 14: part II, C302.
Kilsby, C., Jones, P., Burton, A., Ford, A., Fowler, H., Harpham, C., James, P., Smith, A. & Wilby, R. (2007), ‘A daily weather generator for use in climate change studies’, Environmental Modelling and Software 22.
Koutsoyiannis, D. & Onof, C. (2000), ‘HYETOS - a computer program for stochastic disaggregation of fine-scale rainfall’. http://www.itia.ntua.gr/e/softinfo/3/.
Koutsoyiannis, D. & Onof, C. (2001), ‘Rainfall disaggregation using adjusting procedures on a Poisson cluster model’, Journal of Hydrology 246, 109–122.
Northrop, P. J. & Stone, T. M. (2005), ‘A point process model for rainfall with truncated Gaussian rain cells’. Research Report No. 251, Department of Statistical Science, University College London.
Onof, C., Chandler, R., Kakou, A., Northrop, P., Wheater, H. & Isham, V. (2000), ‘Rainfall modelling using Poisson-cluster processes: a review of developments’, Stochastic Environmental Research and Risk Assessment 14, 384–411.
Rodriguez-Iturbe, I., Cox, D. & Isham, V. (1987), ‘Some models for rainfall based on stochastic point processes’, Proc. R. Soc. Lond. A 410, 269–288.
Rodriguez-Iturbe, I., Cox, D. & Isham, V. (1988), ‘A point process model for rainfall: further developments’, Proc. R. Soc. Lond. A 417, 283–298.
Verhoest, N., Vandenberghe, S., Cabus, P., Onof, C., Meca-Figueras, T. & Jameleddine, S. (2010), ‘Are stochastic point rainfall models able to preserve extreme flood statistics?’, Hydrological Processes 24, 3439–3445.
Wheater, H. S., Chandler, R. E., Onof, C. J., Isham, V. S., Bellone, E., Yang, C., Lekkas, D., Lourmas, G. & Segond, M.-L. (2005), ‘Spatial-temporal rainfall modelling for flood risk estimation’, Stoch Environ Res Risk Assess 19.