Supplementary Materials:
Full Methods
Tables S1-S3
Figures S1-S11

Downscaling reveals diverse effects of anthropogenic climate warming on the potential for local environments to support malaria transmission

Krijn P. Paaijmans, Justine I. Blanford, Robert G. Crane, Michael E. Mann, Liang Ning, Kathleen V. Schreiber, Matthew B. Thomas

Supplemental Material

1. Climate Downscaling

1.1 Data and Methodology

The climate in a local area (in this case, a meteorological reporting station) is a function of the large-scale atmospheric forcing, local forcing (such as topography, water bodies and land use) and stochastic variability. General Circulation Models (GCMs) are most effective at capturing the larger-scale processes, and climate downscaling involves the development of an empirical or dynamically based transfer function that relates the larger-scale climatic state to local weather statistics.

In the methodology described here, Self-Organizing Maps (SOMs) are used to characterize the larger-scale atmospheric state using 2.5° gridded reanalysis data from the U.S. National Centers for Environmental Prediction (NCEP; Kanamitsu et al., 2002). Reanalysis data are generated using a global atmospheric forecast model, where a data assimilation scheme is used to constrain or "nudge" the model output toward the available global observations (derived from the global network of surface and upper air reporting stations, aircraft and satellite observations). The reanalysis variables tend to be more accurate where observational data densities are greatest, but they are also the most reliable "observations" available in data-sparse regions.

SOMs are a category of artificial neural network that serves as both a data analysis and a data visualization technique.
The SOM takes a multi-dimensional data set (in this case, a time series of regional atmospheric parameters) and partitions it by defining a set of locations (nodes) in the multidimensional data space, where each node is the multivariate mean of the surrounding cluster of data points and where nodes are unevenly distributed through the data space such that there are more nodes where data densities are greatest. All data points, to a greater or lesser extent, contribute to the definition of each node, so the nodes do not represent a discrete partitioning of the data space. In this respect, SOMs are analogous to a fuzzy clustering algorithm.

The NCEP reanalysis data are regridded to a nominal 2° x 2° grid surrounding the meteorological station. The 19 hexagonal cells centered on and surrounding the station are used to train the SOM. For each of the 19 cells we use standardized values of the surface air temperature, the specific and relative humidities at 850 hPa, and the sine of the Julian day, giving a total of 76 variables (19 cells x 4 variables) that define the large-scale atmospheric state. The weighting given to the day of year is somewhat arbitrary, but it compensates for the degree of spatial autocorrelation in the atmospheric parameters. For each station, one SOM is trained with an n x m input data matrix, where m is the number of variables (76) and n is the number of days in the data set (for the period 1979-2007).

In this application we use an 11 x 9 array for the SOM, giving 99 nodes. Smaller SOM arrays produce more generalized groupings, while larger arrays allow for more subtle differences between groups. Each node in the SOM is defined by a 76-element reference vector that corresponds to the 76 variables in the input data matrix. The training procedure starts by randomly assigning values to each reference vector. We then take the 76 variables from day one and compare them to each of the SOM node reference vectors.
The winning node is the node that is closest to the data (using Euclidean distance), and the reference vector for that node is nudged slightly in the direction of that day's data. The reference vectors for the surrounding nodes are also nudged in the same direction, but by a smaller amount. Every row (day) in the data matrix is passed through the SOM in the same manner, and the procedure is repeated iteratively until there is essentially no change in the SOM reference vectors. Updating the surrounding nodes as well as the winning node forces nodes that represent very different parts of the data space to move further apart in SOM space, while nodes that are very similar are located close together. If the data set consisted of two very different clusters or "atmospheric states", for example, the two groups would map to opposite corners of the SOM, and the nodes between would represent the transition states from one to the other. The procedure for training SOMs in this fashion and using them as a type of fuzzy clustering algorithm is described in Hewitson and Crane (2002) and Crane and Hewitson (2003).

SOMs are particularly useful in this context because they assume no underlying statistical distribution for the data and they are very forgiving of missing data. A continuous data set is not required, and if there are missing data within a row of the data matrix, the comparison with the reference vectors is carried out on whichever pairs of data elements are present for that day.

For each station, once the SOM has been trained on the reanalysis data and every day is mapped to a SOM node, we take all the days on a given node, extract the local (meteorological station) observed temperature for those days, and construct a cumulative frequency distribution of observed local temperatures associated with that particular historical large-scale atmospheric state.
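The training procedure described above, including the comparison on whichever elements are present for a day, can be illustrated with a minimal sketch. This is not the authors' code. It assumes a rectangular (not hexagonal) node grid, a Gaussian neighbourhood, linearly decaying learning rate and radius, a fixed number of passes instead of a convergence check, and a shuffled presentation order; the names train_som and map_to_nodes and all parameter values are hypothetical.

```python
import numpy as np

def train_som(data, rows=11, cols=9, n_epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a SOM on an (n_days, n_vars) matrix; NaN marks missing values.

    All schedule parameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_days, n_vars = data.shape
    # Reference vectors start from random values, one vector per node.
    weights = rng.standard_normal((rows * cols, n_vars))
    # Grid coordinates of each node, used to measure distance in SOM space.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = n_epochs * n_days
    step = 0
    for _ in range(n_epochs):
        # Days are presented in shuffled order (a choice made for this sketch).
        for i in rng.permutation(n_days):
            x = data[i]
            present = ~np.isnan(x)
            frac = step / n_steps
            step += 1
            if not present.any():
                continue
            # Learning rate and neighbourhood radius shrink as training proceeds.
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            # Winning node: smallest Euclidean distance over the elements present.
            diff = x[present] - weights[:, present]
            winner = np.argmin(np.sum(diff ** 2, axis=1))
            # Surrounding nodes are nudged too, by less the further away they sit.
            grid_d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            influence = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights[:, present] += lr * influence[:, None] * diff
    return weights

def map_to_nodes(data, weights):
    """Return the index of the closest node for each day (row)."""
    nodes = np.empty(len(data), dtype=int)
    for i, x in enumerate(data):
        present = ~np.isnan(x)
        diff = x[present] - weights[:, present]
        nodes[i] = np.argmin(np.sum(diff ** 2, axis=1))
    return nodes

# Synthetic stand-in for standardized reanalysis data: two distinct "atmospheric states".
rng = np.random.default_rng(1)
state_a = rng.normal(-2.0, 0.3, size=(100, 76))
state_b = rng.normal(2.0, 0.3, size=(100, 76))
days = np.vstack([state_a, state_b])
days[5, :10] = np.nan  # one day with some missing elements
weights = train_som(days)
nodes = map_to_nodes(days, weights)
```

With this synthetic input, days from the two states should land on different nodes, which is the clustering behaviour the text describes; the real input would be the n x 76 matrix of standardized reanalysis variables.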
We can then take any day, map it to the SOM, and randomly extract a (downscaled) temperature value from the associated cumulative frequency distribution. We can also take GCM daily data (regridded to the same 2° x 2° grid) for the present and future time periods and map those to the same SOM created from the reanalysis, and again extract an observed temperature associated with the model-simulated atmospheric state. From this point, we create a suite of data sets for each meteorological observing station that includes daily time series of: the observed station temperature, the temperature downscaled from the reanalysis data, the temperature downscaled from a suite of present-day GCM runs, and the temperature downscaled from a suite of GCMs projecting mid-century climate change.

A separate SOM mapping and downscaling is carried out for each meteorological station, and for each station we downscale the daily maximum and minimum temperatures and the daily average temperature ((max + min) / 2). We then construct a joint frequency distribution of paired maximum and minimum temperatures to downscale the diurnal range. For the future projections, the GCM-simulated temperature difference (Future - Present) is added back to the downscaled temperatures. In this way, the change in the GCM large-scale mean captures changes due to direct radiative forcing and a generally warmer world, while the downscaling captures the regional variation in those changes as a function of atmospheric state and local variability.

In summary, the downscaling procedure involves:

For each target location (meteorological station):

1) Defining the large-scale atmospheric state using NCEP reanalysis data
   Standardize the data and regrid to an approximate 2° grid.
   Extract the training data for the 19 hexagonal cells centered on and surrounding the station location.
   Train a SOM to group days with similar atmospheric characteristics.
2) Defining the temperature regime associated with each group (SOM node)
   Take each day that maps to a group, extract the target meteorological station's observed temperature for those days, and define the temperature frequency distribution for the node.
3) Downscaling temperature
   Standardize and regrid the GCM present-day and future projection climate data using the same variables that were used to train the SOM.
   Map these data to the SOM trained on the NCEP reanalysis data.
   Randomly select a temperature value from the node's temperature distribution function, and add the temperature delta values for the future projection.

This is then repeated for each GCM and for all meteorological stations.

Full details on the downscaling procedure and its application to downscaling precipitation in South Africa and in Pennsylvania can be found in Hewitson and Crane (2006) and Ning et al. (2012a; b).

The meteorological stations used for the East Africa analysis are listed in Table S1, and the GCMs used for the projection of present-day (1961-2000) and future (2046-2065) climates are given in Table S2. The present-day simulation uses historical atmospheric greenhouse gas concentrations from the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project (CMIP3) 20c3m scenario. The mid-century projections are based on the A2 emissions scenario. The data and descriptions of the GCMs can be found at the WCRP CMIP3 Multi-Model Data website¹, and the GCMs used are those that archived the daily data necessary for the precipitation downscaling described in Ning et al. (2012a; b).
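Steps 2 and 3 of the summary above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the node index for each day is taken as given (for example from a trained SOM), drawing uniformly from a node's observed values is used as the equivalent of sampling from its empirical cumulative frequency distribution, and the future delta is treated as a single scalar because the source does not say how the delta is resolved in time. The names node_distributions and downscale, and the example values, are hypothetical.

```python
import numpy as np

def node_distributions(nodes, station_temps, n_nodes):
    """Collect the sorted observed station temperatures for the days on each node."""
    dists = []
    for k in range(n_nodes):
        t = station_temps[nodes == k]
        dists.append(np.sort(t[~np.isnan(t)]))  # drop days with no station record
    return dists

def downscale(day_nodes, distributions, rng, delta=0.0):
    """Draw one observed temperature per day from the distribution of its node.

    delta is the GCM-simulated (future minus present) temperature change,
    assumed here to be a single scalar; use 0.0 for present-day runs.
    """
    out = np.full(len(day_nodes), np.nan)
    for i, k in enumerate(day_nodes):
        sample = distributions[k]
        if sample.size:  # nodes with no observed days are left as NaN
            out[i] = rng.choice(sample) + delta
    return out

# Toy example: six training days mapped to nodes 0-2, one missing observation.
rng = np.random.default_rng(0)
train_nodes = np.array([0, 0, 0, 1, 1, 2])
station_temps = np.array([18.0, 19.0, np.nan, 24.0, 25.0, 21.0])
dists = node_distributions(train_nodes, station_temps, n_nodes=4)

# GCM days mapped to the same SOM; node 3 never occurred in the training period.
gcm_nodes = np.array([0, 1, 1, 2, 3])
present = downscale(gcm_nodes, dists, rng)
future = downscale(gcm_nodes, dists, rng, delta=1.5)
```

Because each value is drawn from observed station temperatures, the downscaled series inherits the stochastic variability of the observations; the delta then shifts the future series by the large-scale GCM change.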
1.2 Downscaling Validation and Results

The close agreement between the observed meteorological station data and the downscaled temperatures derived using the NCEP reanalysis data is shown in Figure S1, which compares the two probability distributions, Observed and Downscaled, for mean daily temperature and the diurnal temperature range at all four stations. By randomly selecting values from the observed temperature frequency distribution, and by constructing individual SOMs for each target location, the downscaling not only captures the local forcing constrained by the larger-scale atmospheric state, but also incorporates the stochastic variability present in the observations and is able to match both the means and the extremes of the distributions. The ability of the downscaling to recreate the monthly means and the seasonal cycle in temperature is shown in Figure S2.

Figure S3 shows the fractional mean square error (MSE) between the observed temperatures and (i) the temperatures downscaled from the NCEP reanalysis data, (ii) the temperatures downscaled from present-day GCM data, and (iii) the raw GCM simulations. The fractional MSE for the downscaled GCM data is very close to that of the NCEP recreated temperatures (in the range of 10-25%), and much smaller than the errors for the raw GCM simulation data (typically > 100%) at all four stations. The downscaled projections averaged across all eight GCMs are shown in Figure S4 for maximum, minimum and mean temperatures, as well as the diurnal temperature range, at all four stations.

References

Crane, R. G. and B. C. Hewitson. Clustering and upscaling of station precipitation records to regional patterns using self-organizing maps (SOMs). Climate Research, 25:95-107 (2003).
Hewitson, B. C. and R. G. Crane. Self-Organizing Maps: Applications to synoptic climatology. Climate Research, 22:13-26 (2002).
Ning, L., M. E. Mann, R. G. Crane, T. Wagener.
Probabilistic Projections of Climate Change for the Mid-Atlantic Region of the United States: Validation of Precipitation Downscaling During the Historical Era. J. Climate, 25:509-526 (2012a).
Ning, L., M. E. Mann, R. G. Crane, T. Wagener, R. Najjar, R. Singh. Probabilistic Projections of Anthropogenic Climate Impacts on Precipitation for the Mid-Atlantic Region of the United States. J. Climate, 25:5273-5291 (2012b).
Kanamitsu, M., W. Ebisuzaki, J. Woollen, S-K Yang, J. J. Hnilo, M. Fiorino, and G. L. Potter. NCEP-DOE AMIP-II Reanalysis (R-2). Bulletin of the American Meteorological Society, 83:1631-1643 (2002).

¹ https://esg.llnl.gov:8443/index.jsp

Table S1: Location of the Downscaled Meteorological Stations

Station   Latitude  Longitude  Altitude (m)  Observation time period
Kitale      1.02     35.00      1,875        1982/12-2005/10
Kisumu     -0.10     34.75      1,146        1978/12-2010/04
Kericho    -0.37     35.35      2,184        1987/02-1998/02
Garissa    -0.47     39.63        147        1980/06-2010/04

Table S2: GCMs Used for the Temperature Downscaling Over Kenya

Model            Country        Resolution (long x lat)
CCCMA_CGCM3.1    Canada         Spectral T47 (2.5° x 2.5°)
CNRM_CM3         France         Spectral T63 (1.9° x 1.9°)
CSIRO_MK3.0      Australia      Spectral T63 (1.9° x 1.9°)
GFDL_CM2.0       USA            2.5° x 2°
IPSL_CM4         France         3.75° x 2.5°
MIUB_ECHO_G      Germany/Korea  Spectral T30 (3.75° x 3.75°)
MPI_ECHAM5       Germany        Spectral T63 (1.9° x 1.9°)
MRI_CGCM2.3.2A   Japan          Spectral T44 (2.7° x 2.7°)

Table S3. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic (1981-2000) and future climates (2046-2065), for 4 sites across Kenya. Data represent 20-year averages from the ensemble of individual models, and the standard error of the mean. Values in parentheses are the lowest and highest average values reported by any model.

Fig. S1.
The observed (blue) and NCEP downscaled (red) probability distributions of daily average temperatures (left column) and diurnal temperature ranges (DTRs; right column) over the 4 stations: 1. Kitale; 2. Kisumu; 3. Kericho; 4. Garissa. The vertical scales differ between panels in order to fit the results.

Fig. S2. The observed (blue) and NCEP downscaled (red) annual cycles of daily average temperatures (left column) and diurnal temperature ranges (DTRs; right column) over the 4 stations: 1. Kitale; 2. Kisumu; 3. Kericho; 4. Garissa (Unit: °C). The vertical scales differ between panels in order to fit the results.

Fig. S3. The average fractional MSE from NCEP-downscaled data (blue), GCM-downscaled data (green), and raw GCM simulations (red) over the 4 stations, and the average across the 4 stations: 1. Kitale; 2. Kisumu; 3. Kericho; 4. Garissa.

Fig. S4. Projected changes of the annual average maximum temperature, minimum temperature, average temperature, and DTR between the period 2046-2065 and the period 1981-2000 for each of the four stations: 1. Kitale; 2. Kisumu; 3. Kericho; 4. Garissa (Unit: °C). The squares are the ensemble downscaled average across the eight GCMs, and the whiskers are the inter-GCM uncertainties.

Fig. S5. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Kericho in western Kenya. The grey shading indicates the range of outputs from individual climate models (full details of models in Table S2). The dotted black line represents the average from the ensemble of models; the solid black line represents the available weather station data recorded at Kericho for a subset of the historic time series.

Fig. S6. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Kitale in western Kenya.
The grey shading indicates the range of outputs from individual climate models (full details of models in Table S2). The dotted black line represents the average from the ensemble of models; the solid black line represents the available weather station data recorded at Kitale for a subset of the historic time series.

Fig. S7. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Garissa in eastern Kenya. The grey shading indicates the range of outputs from individual climate models (full details of models in Table S2). The dotted black line represents the average from the ensemble of models; the solid black line represents the available weather station data recorded at Garissa for a subset of the historic time series.

Fig. S8. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Kericho in western Kenya. Different colored lines represent outputs from individual climate models (full details of models in Table S2).

Fig. S9. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Kitale in western Kenya. Different colored lines represent outputs from individual climate models (full details of models in Table S2).

Fig. S10. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Kisumu in western Kenya. Different colored lines represent outputs from individual climate models (full details of models in Table S2).

Fig. S11. Mean temperature and diurnal temperature range (DTR) as estimated by the raw GCMs or the downscaled (DS) models, for recent historic and future climates for Garissa in eastern Kenya.
Different colored lines represent outputs from individual climate models (full details of models in Table S2).