Supplementary Information (SI) for: Increased record-breaking precipitation events under global warming

Jascha Lehmann*, Dim Coumou, Katja Frieler

*Corresponding author. Email: jlehmann@pik-potsdam.de

S1: ‘Testing the iid assumption’

The iid assumption for the stationary model is justified because the detrended observational time series are close to iid. To show this, we first detrend the original Rx1day time series by subtracting the smoothed mean value calculated using singular spectrum analysis with window length of 15 years. The residuals contain the year-to-year variability for a specific month. We then test whether the residuals are temporally independent by calculating the serial correlation in the detrended Rx1day time series for each calendar month. We find that the correlation values are randomly distributed over all land areas and are generally small and within -0.2 and 0.2 for all months (Fig. S13). Some outliers reach values of -0.6 and +0.5, but for such relatively small values of serial correlation the 1/n solution holds¹.

¹ Coumou, D., A. Robinson, and S. Rahmstorf (2013), Global increase in record-breaking monthly-mean temperatures, Clim. Change, 118(3-4), 771–782, doi:10.1007/s10584-012-0668-1.

S2: Additional figures

Fig. S1 Location of the 11 391 observing weather stations used to create the HadEX2 data set which is given on a 3.75° x 2.5° grid.

Fig. S2 Number of grid points with monthly maximum 1-day precipitation data for each point in time given in (a) absolute numbers and (b) relative to the total number of grid points with data.

Fig. S3 Same as Fig. 1, but for the GHCNDEX data set.

Fig. S4 Same as Fig. 2, but for the GHCNDEX data set.

Fig. S5 Time series of the annual record-breaking anomaly calculated from HadEX2 (grey bars) and GHCNDEX (pink bars) shown for (a) Global, (b) northern extratropics, (c) northern subtropics, (d) tropics, and (e) southern subtropics.
The long-term non-linear trend of the record-breaking anomaly (solid line) is calculated using singular spectrum analysis with a window length of 15 years. (f)-(j) and (k)-(o) are the same as (a)-(e), respectively, but for seasonal record-breaking anomalies representing NDJFM (middle panel) and MJJAS (right panel).

To ensure comparability between both data sets, record-breaking anomalies were only calculated for the period 1951-2010, for which both data sets provide data. For each region and season, we computed the Pearson correlation coefficient (ρ_XY) between the record-breaking anomaly time series of both data sets, which is shown in the corresponding panels in Fig. S5. In general, results are in good agreement between the two data sets, as indicated by high positive Pearson correlation coefficients, implying that the variables are positively linearly related. In the tropics, correlation coefficients are in the range 0.07-0.20, indicating a positive but weaker linear relationship. This could be due to larger uncertainties arising from sparse data coverage in this region. A large and consistent increase in record-breaking anomaly can be found in both data sets over the northern extratropics, the northern subtropics, and on the global scale. However, the increase is slightly stronger in GHCNDEX than in HadEX2. Over the southern subtropics, both data sets show no trend in record-breaking anomaly.

Fig. S6 Schematic illustration of (Step 1) finding region-specific time boundaries for the shuffling process, (Step 2) computing the observed regional record-breaking anomaly, and (Step 3) computing a set of modeled record-breaking anomalies based on the iid-model. In the first step, for each month, the Rx1day data is organized in a p x n matrix where the number of rows p equals the number of grid cells and thus denotes the location, and the number of columns n refers to the number of years. For each region a time period is defined for which this region provides data (see colored rectangles).
The following steps are applied to each individual region within these time boundaries. To compute the observed record-breaking anomaly (“Step 2”), record-breaking events are identified in each row, i.e. for each grid cell in the given region, with a value of 1 denoting that a value has set a new record and a value of 0 denoting that it was not a record-breaking event (see upper matrix in middle panel of Step 2). We subsequently sum all values of this matrix over the p grid cells, which leaves a vector of length n giving the total number of record-breaking events per year in the given region. This vector is normalized by the number of expected record-breaking events (lower matrix in middle panel) using eq. [1] to obtain a time series of the regional record-breaking anomaly (right panel). The black dashed vertical lines in the middle panel of “Step 2” denote the time period which fulfills the applied data requirements. “Step 3” explains how the iid-model is computed. First, the n columns are randomly shuffled, which destroys the order in time but keeps the spatial correlation within the given region. From the shuffled matrix, a time series of simulated regional record-breaking anomaly is computed in the same way as described for the observational data. The resulting record-breaking anomaly represents one realization of the iid-model. The full procedure described in “Step 3” is repeated 10,000 times to create a set of possible record-breaking anomalies under the null hypothesis of the iid assumption. From this set of time series, the 90% and 95% confidence intervals are determined.

Fig. S7 Same as Fig. 2 in main manuscript but, here, confidence ranges are estimated using a shuffling process which does not account for spatial correlation but instead keeps missing values fixed in space and time and thus conserves trends in the number of data points per year. This leads to generally smaller confidence ranges compared to Fig. 2, where spatial correlation within each region is taken into account at the expense of neglecting changes in the number of data points per year.

Fig. S8 Temporal heterogeneity in monthly data coverage, shown for January as an example. In the left panel (a), for each grid point, years with data are colored corresponding to the region the grid point belongs to. Only those values are colored for which the data requirements for the full year are fulfilled, i.e. a minimum time series length of 30 years and a minimum of 100 time series per year. For each region, grid points are sorted by the start year of the given time series to illustrate temporal heterogeneity within individual regions. For each year we sum up all grid points with data, which results in a time series of the total (global) number of available data, as depicted by black circles in panel (b). Randomly shuffling the years of each time series in each region within its individual time boundaries leads, on average, to a nearly equal distribution of data coverage in the given region. This is shown by red circles in panel (b). This time series is characterized by step-like increases in years where new regions start to supply data, as indicated by the vertical dashed lines. The shuffling method is thus able to reproduce a similar curve of changes in the amount of data over time.

Fig. S9 Same as Fig. S8 but with data requirements applied to winter season.

Fig. S10 Same as Fig. S8 but shown for June and with data requirements applied to summer season.

Fig. S11 Same as Fig. 2 in main manuscript but, here, confidence ranges are estimated using a block-shuffling method with a fixed block size of 2 years.

Fig. S12 Same as Fig. S11 but with a fixed block size of 3 years.

Fig. S13 Serial correlation in the non-linear detrended HadEX2 Rx1day time series for each calendar month.

Fig. S14 Time series of annual record-breaking anomaly shown for the global mean (black line).
Colored bars represent the ENSO time series (Niño 3.4 index) with positive values indicating El Niño years (blue bars) and negative values corresponding to La Niña years (red bars).

S3: Additional tables

#   Region Label
1   Alaska
2   Australia
3   Central Africa
4   Central America
5   CGI
6   Central North America
7   Central West Asia
8   Eastern Asia
9   Eastern North America
10  Europe
11  Mediterranean
12  Northern Asia
13  Northern South America
14  Southern Africa
15  Sahara
16  Southern Asia
17  Southern South America
18  South East Asia
19  Tibetan Plateau
20  Western South America
21  Western North America
22  Northern Extratropics
23  Northern Subtropics
24  Tropics
25  Southern Subtropics

lat1 lon1 lat2 lon2 lat3 lon3 lat4 lon4 lat5 lon5 lat6 lon6 lat7 lon7

60 105 60 168 73 169 73 105 -------50 110 -10 110 -10 155 -30 155 -30 180 -50 180 ---11 -20 15 -20 15 52 -11 52 ------11 50 -68 -10 -1 50 -80 105 29 85 118 105 29 85 -90 -10 --- --- --- --- --- --- 50 -85 29 -85 29 105 50 105 -- -- -- -- -- -- 15 20 40 100 50 50 40 100 50 50 75 145 30 20 75 145 30 -- 60 -- 15 -- 60 -- --- --- 25 45 30 50 -60 -10 -10 40 25 75 45 70 -85 -10 -10 40 50 75 45 70 -85 40 40 180 50 45 30 50 -60 40 40 180 ----- ----- ----- ----- ----- ----- -20 -66 -1 -80 11 -69 11 -50 0 -50 0 -34 -20 -34 -35 15 5 -10 -20 60 -11 30 30 -10 -20 60 -11 30 30 52 40 100 -35 15 20 52 40 100 --20 --95 --5 --95 ---- ---- -20 -39 -57 -39 -57 -67 -50 -72 -20 -66 -- -- -- -- -10 95 20 95 20 155 -10 155 -- -- -- -- -- -- 30 75 50 75 50 100 30 100 -- -- -- -- -- -- -1 -80 -20 -66 -50 -72 -57 -67 -57 -82 1 -82 -- -- 29 105 29 130 60 130 60 105 -- -- -- -- -- -- 40 180 90 180 90 180 40 180 -- -- -- -- -- -- 20 -20 180 180 40 20 180 180 40 20 180 180 20 -20 180 180 --- --- --- --- --- --- -40 180 -20 180 -20 180 -40 180 -- -- -- -- -- --

Table S1 Coordinates of corners of regions displayed in Fig. 4 and Fig. 5. Values are given in degrees North (for latitudes) and in degrees East (for longitudes).
Regions 1-21 are used to compute the global aggregate.
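
The counting and shuffling procedure described in the caption of Fig. S6 (Steps 2 and 3) can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: eq. [1] is not reproduced in this supplement, so the anomaly is assumed here to be (observed - expected) / expected, with the expected number of records in the k-th year of an iid series being 1/k per grid cell; the data matrix is assumed to be complete (no missing values); and the 90% range is taken as the 5th to 95th percentile of the shuffled ensemble. The function names (record_flags, record_anomaly, iid_ensemble) and the toy matrix are hypothetical.

```python
import numpy as np


def record_flags(x):
    """Return a p x n 0/1 matrix: 1 where a value exceeds all earlier values in its row."""
    flags = np.ones(x.shape, dtype=int)              # the first year is always a record
    running_max = np.maximum.accumulate(x, axis=1)   # largest value seen so far, per grid cell
    flags[:, 1:] = x[:, 1:] > running_max[:, :-1]    # strict exceedance of all previous years
    return flags


def record_anomaly(x):
    """Regional record-breaking anomaly per year (ASSUMED form of eq. [1])."""
    p, n = x.shape
    observed = record_flags(x).sum(axis=0)           # records per year, summed over p grid cells
    expected = p / np.arange(1, n + 1)               # iid expectation: 1/k per cell in year k
    return (observed - expected) / expected


def iid_ensemble(x, n_shuffles=10_000, seed=0):
    """Shuffle whole columns (years) so that spatial correlation is kept, time order is lost."""
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    ensemble = np.empty((n_shuffles, n))
    for i in range(n_shuffles):
        ensemble[i] = record_anomaly(x[:, rng.permutation(n)])
    return ensemble


# Toy example: p = 2 grid cells, n = 4 years (values are made up for illustration).
rx1day = np.array([[1.0, 3.0, 2.0, 5.0],
                   [2.0, 1.0, 4.0, 3.0]])
anomaly = record_anomaly(rx1day)                     # -> [0., 0., 0.5, 1.]
ensemble = iid_ensemble(rx1day, n_shuffles=200)
lower, upper = np.percentile(ensemble, [5, 95], axis=0)  # assumed 90% range of the iid-model
```

Permuting column indices, rather than shuffling each row independently, is what preserves the spatial correlation within a region: every grid cell receives the same reordering of years.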