INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. (2014)
Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/joc.4127

Creating a topoclimatic daily air temperature dataset for the conterminous United States using homogenized station data and remotely sensed land skin temperature

Jared W. Oyler,a,b* Ashley Ballantyne,b Kelsey Jencso,b Michael Sweetb and Steven W. Runninga

a Numerical Terradynamic Simulation Group, Department of Ecosystem and Conservation Sciences, University of Montana, Missoula, MT, USA
b Montana Climate Office, Montana Forest and Conservation Experiment Station, University of Montana, Missoula, MT, USA

ABSTRACT: Gridded topoclimatic datasets are increasingly used to drive many ecological and hydrological models and assess climate change impacts. The use of such datasets is ubiquitous, but their inherent limitations are largely unknown or overlooked, particularly in regard to spatial uncertainty and climate trends. To address these limitations, we present a statistical framework for producing a 30-arcsec (∼800-m) resolution gridded dataset of daily minimum and maximum temperature and related uncertainty from 1948 to 2012 for the conterminous United States. Like other datasets, we use weather station data and elevation-based predictors of temperature, but also implement a unique spatio-temporal interpolation that incorporates remotely sensed 1-km land skin temperature. The framework is able to capture several complex topoclimatic variations, including minimum temperature inversions, and represent spatial uncertainty in interpolated normal temperatures. Overall mean absolute errors for annual normal minimum and maximum temperature are 0.78 and 0.56 °C, respectively. Homogenization of input station data also allows interpolated temperature trends to be more consistent with US Historical Climate Network trends compared to those of existing interpolated topoclimatic datasets.
The framework and resulting temperature data can be an invaluable tool for spatially explicit ecological and hydrological modelling and for facilitating better end-user understanding and community-driven improvement of these widely used datasets.

KEY WORDS kriging; air temperature; land skin temperature; homogenization; MODIS

Received 28 January 2014; Revised 31 May 2014; Accepted 14 July 2014

* Correspondence to: J. W. Oyler, Numerical Terradynamic Simulation Group, Department of Ecosystem and Conservation Sciences, University of Montana, 32 Campus Drive, Missoula, MT 59812, USA. E-mail: jared.oyler@ntsg.umt.edu

© 2014 Royal Meteorological Society

1. Introduction

Given that climate is a key driver of many ecological and hydrological processes (Running et al., 1987), the effects of climate change have increasingly become a central focus within different areas of environmental research, conservation, and natural resource management (Wiens and Bachelet, 2010; Glick et al., 2011; Millard et al., 2012; Morisette, 2012). As a result, the demand for accurate and spatially continuous climate data that match the scales of local environmental processes and land management decision-making has continued to rise (Daly, 2006; Wiens and Bachelet, 2010; Beier et al., 2011). Assessments of climate change impacts across smaller regions present a challenge, however, owing to the mismatch in scale between local topoclimatic factors and synoptic outputs from global climate models (GCMs) and atmospheric reanalyses (Beniston, 2006; Daly, 2006). This mismatch is especially apparent in mountainous landscapes where topography frequently drives rapid changes in temperature and precipitation over relatively small spatial scales (Beniston, 2006; Barry, 2008). Accordingly, gridded topoclimatic datasets (TCDs) that account for local topoclimatic factors are often necessary to assess local environmental impacts.
TCDs generally exist at spatial resolutions ≤10 km, the scale at which the influence of topoclimatic factors such as elevation, cold air drainage potential, and coastal zones becomes greatest (Daly, 2006). Within the conterminous United States (CONUS), the most frequently used TCDs are the interpolated PRISM (Daly et al., 2002; Daly et al., 2008) and Daymet (Thornton et al., 1997) datasets. Both datasets use point-source weather station data and a digital elevation model (DEM) to incorporate the effects of topoclimatic factors and statistically interpolate climate variables to a regular grid. The use of PRISM and Daymet is ubiquitous and recent environmental modelling applications include various GCM statistical downscaling efforts (e.g. Maurer and Hidalgo, 2008; Abatzoglou and Brown, 2012), climate impact assessments (e.g. Elsner et al., 2010; Littell et al., 2010), wildfire hazard and risk assessments (e.g. Keane et al., 2010), and analyses of trends in ecosystem productivity (e.g. Turner et al., 2011) and plant species distributions (e.g. Crimmins et al., 2011).

Figure 1. Process flow diagram of the TopoWx (‘Topography Weather’) statistical framework. Numbers above components represent sections where components are described. GWR is geographically weighted regression.

While TCDs like PRISM and Daymet are clearly valuable and have been diligently maintained over many years, their inherent limitations are often overlooked by end-users, particularly in regard to spatial interpolation uncertainty and their appropriateness in assessing interdecadal and long-term climate trends (Beier et al., 2011; Bishop and Beier, 2013). Although model validation and performance evaluations have been conducted by TCD developers (Thornton et al., 1997; Daly et al., 2008), there are currently no grid-cell-specific metrics of uncertainty.
It is consequently difficult for end-users to determine the quality of a TCD for a specific region of interest or to incorporate uncertainty into subsequent analyses. Additionally, while TCDs usually have basic quality assurance (QA) checks on input station data, they do not account for changes in station siting, instrumentation, exposure or observation and data processing practices through time – types of changes, termed inhomogeneities, that can result in significant artificial jumps and trends in climate (Menne et al., 2009; Trewin, 2010). The climate community has conducted substantial research in this area (e.g. Alexandersson, 1986; Peterson et al., 1998; Reeves et al., 2007; Menne et al., 2009) and four global gridded temperature datasets that account for inhomogeneities are now available (Smith et al., 2008; Hansen et al., 2010; Jones et al., 2012; Rohde et al., 2013), but inhomogeneity detection and correction algorithms (i.e. homogenization algorithms) have not yet been integrated into TCDs. Lastly, most TCD models require expert knowledge to run and are closed-source systems that cannot be easily extended or improved by the general climate impacts research community.

Addressing these limitations, we present an open source statistical framework for modelling topoclimatic air temperature (Figure 1). Targeted to create a 30-arcsec (∼800 m) resolution CONUS dataset of 1948–2012 daily minimum and maximum temperatures (Tmin, Tmax), the objectives of the framework, termed TopoWx (‘Topography Weather’), are to provide (1) improved temporal and spatial representations of topoclimatic air temperature; (2) grid-cell level uncertainty estimations; and (3) an impetus to increase both end-user understanding of TCD limitations and end-user involvement in TCD development.

2. Materials and methods

2.1. Overview

Similar to existing TCDs (Thornton et al., 1997; Daly et al., 2008), we use weather station data and spatial grids of auxiliary predictors to model the influence of topoclimatic factors and spatially interpolate daily Tmin and Tmax. However, to address the limitations of existing TCDs and meet the framework objectives, we differentiate the TopoWx framework through several carefully constructed components (Figure 1). The first component consists of comprehensive QA procedures (Durre et al., 2010) that better ensure the overall quality of the input station observation records (Section 2.2.2.). The second component consists of homogenization procedures (Menne and Williams, 2009) that we apply to the quality assured station data (Section 2.2.3.). Without homogenization, inhomogeneities in the station records have the potential to significantly bias temperature trends in the final gridded output (Menne et al., 2009). The third component consists of missing value infilling procedures (Schafer, 1997; Stacklies et al., 2007) that generate a serially complete record at each station location (Section 2.2.4.). Missing value infilling ensures a spatially consistent set of input stations throughout the entire 1948–2012 time period, yet still allows for the incorporation of important data from short-term or incomplete station records. Lastly, a set of several interpolation components consists of the main spatio-temporal interpolation procedures that take the homogenized, serially complete station data as input and produce the final gridded topoclimatic temperature and uncertainty estimates (Section 2.3.). The spatio-temporal interpolation procedures include geostatistical kriging (Isaaks and Srivastava, 1989; Hengl, 2009), geographically weighted regression (GWR) (Fotheringham et al., 2002), and a novel application of remotely sensed land skin temperature (LST) as a spatial predictor of topoclimatic air temperature (Wan and Li, 2011).

Figure 2. Maps of (a) the final set of 14 087 stations used as input to TopoWx and (b) underlying topography of the conterminous US. Station networks include the daily Global Historical Climatology Network (GHCN-D), Remote Automatic Weather Stations (RAWS) network, and the Snowpack Telemetry (SNOTEL) network. Each station has ≥5 years of raw data for each month. Boundaries represent US climate divisions.

2.2. Weather station data

2.2.1. Data sources

As our primary weather station data source, we use the daily Global Historical Climatology Network (GHCN-D; Menne et al., 2012), a global weather station dataset consisting of observations from a multitude of different networks and sources. We spatially limit GHCN-D stations to North America between 53° and 22° N latitude and 126° and 64° W longitude, resulting in a total of 14 729 potential stations with temperature observations (Figure 2(a)). To gain better spatial coverage in the topographically complex areas of the western CONUS (Figure 2(b)), we also obtain 764 potential station records from the more remote Natural Resources Conservation Service (NRCS) Snowpack Telemetry (SNOTEL) network and 1308 potential station records from the US Forest Service and Bureau of Land Management Remote Automatic Weather Stations (RAWS) network. For inclusion in the TopoWx framework, we require a station to have at least 5 years of observations in each month, a threshold much shorter than the 20-year threshold imposed by other longer-term TCDs (e.g. Livneh et al., 2013). Our 5-year threshold was chosen based on the finding that at least 5–7 years of observations are required before pairwise relationships between stations begin to stabilize (Hubbard, 1994; Camargo and Hubbard, 1999).
The ability to reliably model relationships between a station and its neighbours is critical for infilling and extending shorter station records back to 1948 (see Section 2.2.4.).

2.2.2. Quality assurance

To check for possible duplicate observations, outliers and numerous internal, temporal, and spatial inconsistencies in the GHCN-D, RAWS, and SNOTEL station data, we use the QA procedures of Durre et al. (2010). After we mark any QA-flagged observation as missing, if a station falls below the 5-year threshold in any 1 month, we drop it from the framework. Similar to PRISM (Daly et al., 2008), we additionally check all station elevations for consistency with corresponding location elevations from a DEM. We manually investigate any station elevation having a discrepancy greater than 200 m (Daly et al., 2008) and modify either the station elevation or location.

2.2.3. Homogenization

Although the QA procedures remove bad observations, they address neither potential inhomogeneities nor the occurrence of time of observation departures (TODs) where a station’s reported daily Tmin or Tmax is off by a calendar day (Janis, 2002). TOD can be a significant problem for daily spatial interpolations given that the various input station time series are assumed to be aligned at a daily time step. To reduce TOD and inhomogeneities in the input station data, we apply two adjustment procedures: a simple daily time-step correction of Tmax observations having a morning observation time and a homogenization algorithm developed by Menne and Williams (2009).

Many US stations within GHCN-D are part of the National Oceanic and Atmospheric Administration (NOAA) Cooperative Observer Program (COOP) Network and staffed by volunteers. For convenience, Tmin and Tmax observations at COOP stations are often manually taken once daily over a 24-h period that does not directly correspond to the typical midnight-to-midnight calendar day (Karl et al., 1986).
For these non-midnight observation times, the most common and consistent instance of TOD is for morning observations of Tmax as the recorded Tmax is likely for the previous calendar day (Janis, 2002; Holder et al., 2006). Therefore, we shift all morning Tmax observations back a day. For morning Tmax observations at COOP stations in North Carolina, Holder et al. (2006) found that this simple shift significantly improved correlations with midnight-to-midnight observations at automated collocated stations. While less frequent, depending on the time of year and the passage of fronts, TOD issues can also occur for other observation times of Tmin and Tmax, but their detection and correction are more complex and would likely require the use of hourly data (Janis, 2002). Consequently, we limit explicit daily TOD corrections to the more consistent and frequent TOD occurrence within morning observations of Tmax. We also do not apply the Tmax TOD correction to the 7.9% of input GHCN-D COOP Tmax observations that are missing a documented observation time (n = 8 706 353 of 110 051 460).

In addition to TOD, non-midnight observations can also result in a time-of-observation bias (TOB) where a single Tmin during a very cold morning or a single Tmax during a very warm late afternoon is recorded over two successive days (DeGaetano, 1999; Janis, 2002). Even slight 1-day shifts can result in seasonal biases for monthly temperature (Karl et al., 1986). The TOB issue is particularly evident when trying to assess temperature trends at stations whose time-of-observation has changed through time and is one of the main network-wide inhomogeneities in the US temperature record (Menne et al., 2009). Adjustment methods have been developed to correct for TOB at a monthly time step (e.g. Karl et al., 1986), but there has been less focus on corrections for daily data.
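The morning Tmax shift described in this section can be illustrated with a minimal sketch. The record layout, the function name `shift_morning_tmax`, and the definition of "morning" as an observation hour before 12:00 are assumptions made for illustration; the text does not state the cutoff that was used.

```python
from datetime import date, timedelta

def shift_morning_tmax(records, morning_cutoff_hour=12):
    """Reassign morning-observed Tmax values to the previous calendar day.

    records: list of (obs_date, obs_hour, tmax) tuples. obs_hour is the
    documented local observation hour, or None when it is undocumented.
    Hour 0 is treated as a midnight observer and is not shifted.
    The 12:00 cutoff is an assumption, not a value taken from the paper.
    """
    shifted = []
    for obs_date, obs_hour, tmax in records:
        if obs_hour is not None and 0 < obs_hour < morning_cutoff_hour:
            obs_date = obs_date - timedelta(days=1)
        shifted.append((obs_date, obs_hour, tmax))
    return shifted

records = [
    (date(2000, 7, 2), 7, 31.5),     # 07:00 reading: moved to 1 July
    (date(2000, 7, 2), 17, 29.0),    # afternoon reading: left unchanged
    (date(2000, 7, 3), None, 30.2),  # no documented time: left unchanged
]
result = shift_morning_tmax(records)
```

Observations without a documented time pass through untouched, mirroring the decision not to correct the 7.9% of COOP Tmax observations that lack one.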
For simplicity, we consolidate corrections for TOB changes and all other network-wide and local inhomogeneities within the monthly time-step pairwise homogenization algorithm (PHA) of Menne and Williams (2009). PHA uses a recursive implementation of the standard normal homogeneity test (SNHT; Alexandersson, 1986) and numerous pairwise comparisons of temperature time series to identify inhomogeneities in a station’s observations relative to surrounding stations (Menne and Williams, 2009). Once specific artificial changepoints in a station’s temperature series are identified, PHA estimates their magnitude and adjusts the segments between changepoints relative to the most current identifiable homogeneous segment.

Although these adjustments effectively remove the trend bias at a station, it is important to note that PHA does not adjust for a station’s mean temperature bias (Menne et al., 2009). For instance, if a station switches to a morning time-of-observation that causes an artificial drop in monthly temperature, PHA will adjust all previous observations downward to remove the trend bias caused by the change. Nonetheless, the station will still have a cool bias in its mean monthly temperatures relative to stations that are at midnight-to-midnight observation time. In the end, the purpose of PHA is not to adjust all station records to a theoretical set of standard observation practices, siting, and instrumentation. Instead, the purpose of PHA is to remove trend biases caused by individual station changes in such items.

Within the TopoWx framework, we use the default configuration of the PHA v52i software (Menne and Williams, 2009; Williams et al., 2012), which is currently applied to homogenize monthly station data for the US Historical Climatology Network (USHCN) v2.5 dataset (Menne et al., 2009) and the GHCN-Monthly v3.2 dataset (Lawrimore et al., 2011).
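The adjustment step alone (not changepoint detection or magnitude estimation, which PHA performs through pairwise comparisons) can be illustrated with a small sketch. The function `adjust_to_latest_segment` and its inputs are hypothetical and are not part of the PHA software; changepoints and step sizes are assumed to be already known.

```python
def adjust_to_latest_segment(values, changepoints, steps):
    """Remove known step changes so earlier segments match the latest one.

    values: monthly temperatures, oldest first.
    changepoints: indices at which a new segment starts (ascending).
    steps: artificial jump at each changepoint (new minus old), same order.
    """
    adjusted = list(values)
    for start, step in zip(changepoints, steps):
        # Shift everything recorded before the changepoint by the size of
        # the jump, so it lines up with the data recorded after the change.
        for i in range(start):
            adjusted[i] += step
    return adjusted

# A flat 10.0 record with an artificial 0.5 drop starting at index 3,
# e.g. after a switch to a morning observation time.
raw = [10.0, 10.0, 10.0, 9.5, 9.5, 9.5]
homogenized = adjust_to_latest_segment(raw, changepoints=[3], steps=[-0.5])
```

The earlier values are moved down to 9.5, which removes the spurious trend but leaves the whole record with the cool bias of the most recent observing practice, as noted above.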
As PHA runs on a monthly time step, we first aggregate the daily station data to monthly means, apply PHA, and then scale the daily values to match the PHA-adjusted monthly means. This is similar to the procedure of Vincent et al. (2002) who homogenized daily data at stations in Canada by adjusting daily observations to match homogenized monthly and annual data. Although scaling daily observations to match the homogenized monthly data only corrects the mean and not the variance or skewness of a station’s temperature distribution (Della-Marta and Wanner, 2006; Kuglitsch et al., 2009), the approach is relatively straightforward and provides daily temperature series that match the trends and variations in the homogenized monthly data without the added complexities and uncertainties in detecting and correcting inhomogeneities at a daily time step (Vincent et al., 2002).

2.2.4. Missing value infilling

The frequent incompleteness of weather station observations creates an additional challenge for climate research and spatio-temporal interpolations (Huth and Nemesova, 1995). Simply interpolating raw incomplete data could produce inhomogeneities in the gridded output as the number of stations and station spatial coverage vary during the 1948–2012 time period (Guentchev et al., 2010). The issue is particularly acute in the mountainous areas of the western CONUS where many remote and higher elevation SNOTEL and RAWS stations have only come online in the past 30 years. The use of non-missing neighbouring observations to infill missing data at a target station has been regularly used to create serially complete station data (DeGaetano et al., 1995; Huth and Nemesova, 1995; Eischeid et al., 2000). The most generally accurate infilling method, termed spatial regression (Durre et al., 2010), uses overlapping observation periods to develop regression models between a target station and neighbouring stations and then uses the models to infill the target’s missing values (Kemp et al., 1983; Huth and Nemesova, 1995; Hubbard and You, 2005). Quantified with a correlation metric, more weight is often given to those neighbours having a stronger relationship with the target (Kemp et al., 1983; Hubbard and You, 2005; Durre et al., 2010).

Building upon the spatial regression assumption that there is a useful correlation structure between a target station and its neighbours, we adopt a two-step statistical procedure (Appendix S1) to infill missing temperature values in the homogenized station records using not only neighbouring longer-term stations, but also synoptic atmospheric conditions as provided by the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis dataset (Kalnay et al., 1996). The procedure is identical for Tmin and Tmax and we complete it separately for each variable. Using both an expectation maximization-based infilling (Schafer, 1997) and a principal component analysis method robust to missing values (Stacklies et al., 2007), we estimate the 1948–2012 daily temperature mean and variance for an incomplete station time series and then infill the daily anomalies around the mean (Appendix S1). We found that this approach reduces mean absolute error (MAE) and maintains observed temperature variance better than the pure spatial regression methods.

To ensure that station time series are consistent through time, for any station that has more than 5 continuous years of missing data from 1948–2012, we replace all the station’s temperature observations with values from the station’s infill model. While this will likely have some effect on daily interpolation accuracy, the accuracy trade-off allows us to still incorporate valuable data from short-term stations while avoiding the introduction of even slight artificial changepoints in temperature means and variances.

2.3. Temperature interpolation

2.3.1. Auxiliary spatial predictors

At smaller spatial scales, synoptic-scale atmospheric conditions are mediated in the boundary layer by several main topoclimatic factors, namely elevation, topographic convergence and cold air drainage potential, slope and aspect, water bodies, and land cover (Daly, 2006; Dobrowski et al., 2009). To define the interpolation grid and represent the main topoclimatic factor of elevation, we use the 30-arcsec PRISM DEM derived from the National Elevation Dataset (Gesch et al., 2002) by Daly et al. (2008). We chose the DEM used by PRISM because it facilitates straightforward comparisons between TopoWx and PRISM and allows for easier development of models that can combine the two datasets.

To account for other topoclimatic factors not completely represented by the DEM, we use spatially continuous remotely sensed observations of LST. Compared to the thermodynamic temperature that is typically measured 1.5 to 2 m above the ground, LST is the radiometric temperature of the ground surface (Jin and Dickinson, 2010). Properties of the land surface, such as land cover, topography, albedo, and soil characteristics, and their interaction with atmospheric conditions, control spatial patterns of LST (Mostovoy et al., 2006; Jin and Dickinson, 2010). While LST and air temperature have different physical meanings, LST spatial and temporal variability have been found to be highly correlated with air temperature and LST has been used to inform air temperature interpolations where weather station observations are sparse (e.g. Mostovoy et al., 2006; Vancutsem et al., 2010; Hengl et al., 2011; Benali et al., 2012).

For observations of LST, we use the Moderate Resolution Imaging Spectroradiometer (MODIS), 8-day, 1-km LST product (MYD11A2; Dozier, 1996; Wan, 2008). MYD11A2 estimates LST using the thermal infrared signal received by the MODIS sensor and a split-window algorithm that uses differential absorption in adjacent infrared bands to correct for atmospheric attenuation and land cover classification-based emissivities to account for variability in surface emissivity (Dozier, 1996; Snyder et al., 1998). The 8-day product is an average of daily clear-sky LST during a respective 8-day period. We use MYD11A2 from the Aqua satellite since its day and night overpass times more closely correspond to the diurnal timing of Tmax and Tmin in the CONUS (Crosson et al., 2012). As Aqua MYD11A2 is only available from mid-2002 and we are interpolating temperature back to 1948, we calculate 10-year (2003–2012) monthly LST means for both day and night observations and use them as static auxiliary predictors analogous to the elevation predictor, but monthly-varying. In other words, we use a different mean LST predictor for each month and temperature variable (Tmin or Tmax) for a total of 24 auxiliary mean LST predictors. We quantify mean LST using the eight MYD11A2 8-day periods centred around each respective month.

Similar to the station data, MYD11A2 also suffers from a significant amount of missing data largely due to cloud contamination (Crosson et al., 2012). For a single 8-day pixel value, if the MODIS QA flags indicate cloud contamination or other possible issues resulting in an average emissivity error >0.02 or average LST error >2 °C, we do not consider the value in the 10-year mean. We also completely remove any grid cell missing more than two thirds of its 2003–2012 8-day values. When only using non-missing data to calculate mean LST, we found that missing data, especially during regional cloudy periods in winter, resulted in discontinuities and spatial artefacts.
Using the three nearest stations to each MYD11A2 grid cell, we consequently apply the same mean estimation procedure used for the station data (Appendix S1) to better estimate 2003–2012 mean LST values.

To further characterize the influence of topography on daily cold air drainage, we derive a multi-scale topographic dissection index (TDI; Holden et al., 2011a) from the PRISM DEM:

$$\mathrm{TDI}(s_0) = \sum_{i=1}^{n} \frac{z(s_0) - z_{\min}(i)}{z_{\max}(i) - z_{\min}(i)} \qquad (1)$$

where $\mathrm{TDI}(s_0)$ is the final multi-scale TDI value for grid-cell location $s_0$, $z(s_0)$ is the elevation of grid-cell location $s_0$, $z_{\min}(i)$ is the overall minimum grid-cell elevation in spatial window $i$, $z_{\max}(i)$ is the overall maximum grid-cell elevation in spatial window $i$ and $n$ is the number of spatial windows (Holden et al., 2011a). The TDI for a specific window size reflects the height of a grid cell relative to the surrounding terrain and ranges from 0 (lower than the surrounding terrain) to 1 (higher than the surrounding terrain). Across a network of temperature sensors in complex terrain, Holden et al. (2011a) found a multi-scale TDI to be well correlated with daily patterns of Tmin anomalies influenced by cold air drainage. We calculate our multi-scale TDI, which ranges in value from 0 to 5, across a total of five spatial window sizes (3, 6, 9, 12, and 15 km). Although our selection of these five window sizes is subjective, the window sizes account for spatial variations in an optimal TDI scale across the CONUS domain, yet still maintain a spatially static definition of the TDI predictor.

2.3.2. Monthly normal temperature interpolation

Similar to the two-step infilling algorithm, we use a two-step interpolation procedure that first interpolates the monthly temperature normals at a grid cell and then the 1948–2012 daily variation around the normals. The procedure is again identical for both Tmin and Tmax.
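The multi-scale TDI of Equation (1) in Section 2.3.1 can be illustrated with a minimal sketch on a toy elevation grid. Several details are assumptions made for illustration only: windows are square and expressed as half-widths in grid cells, they are truncated at the grid edge, and a perfectly flat window contributes 0.

```python
def multiscale_tdi(dem, row, col, half_widths):
    """Sum single-scale TDI values (each between 0 and 1) over several windows.

    dem: elevation grid as a list of rows.
    half_widths: window half-widths in grid cells (hypothetical units; the
    paper specifies window sizes in km).
    """
    total = 0.0
    n_rows, n_cols = len(dem), len(dem[0])
    for hw in half_widths:
        r0, r1 = max(0, row - hw), min(n_rows, row + hw + 1)
        c0, c1 = max(0, col - hw), min(n_cols, col + hw + 1)
        window = [dem[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        z_min, z_max = min(window), max(window)
        if z_max > z_min:  # assumption: a flat window contributes 0
            total += (dem[row][col] - z_min) / (z_max - z_min)
    return total

dem = [
    [900.0, 800.0, 900.0],
    [800.0, 500.0, 800.0],
    [900.0, 800.0, 1000.0],
]
valley = multiscale_tdi(dem, 1, 1, half_widths=[1])   # lowest cell in its window
peak = multiscale_tdi(dem, 2, 2, half_widths=[1, 2])  # highest cell in both windows
```

With n windows the index is bounded by 0 and n, which is why five window sizes give a range of 0 to 5.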
We define a month’s normal Tmin or Tmax as the month’s mean value from 1981–2010, the latest 30-year normal period defined by NOAA’s National Climatic Data Center. We adopt a regression-kriging (RK) framework (Hengl et al., 2004) that assumes monthly normal temperature represents a spatial process that can be expressed by the sum of deterministic and spatially autocorrelated stochastic components:

$$\bar{T}(s_0, m_0) = \bar{T}_{\mu}(s_0, m_0) + \bar{T}_{e}(s_0, m_0) \qquad (2)$$

where $\bar{T}(s_0, m_0)$ is the final interpolated normal temperature at grid-cell location $s_0$ and month $m_0$, $\bar{T}_{\mu}(s_0, m_0)$ is the deterministic spatial trend or drift in normal temperature modelled by station horizontal locations and auxiliary predictors, and $\bar{T}_{e}(s_0, m_0)$ is a stochastic spatially autocorrelated residual with mean zero (Hengl et al., 2004; Webster and Oliver, 2007). Following the RK framework of Hengl et al. (2004) and the multiple linear regression model of Florio et al. (2004), we use linear regression to fit $\bar{T}_{\mu}(s_0, m_0)$ and ordinary kriging (OK) to interpolate $\bar{T}_{e}(s_0, m_0)$:

$$\bar{T}(s_0, m_0) = \beta_0 + \beta_1 x + \beta_2 y + \beta_3 z + \beta_4\,\mathrm{lst}(m_0) + \sum_{i=1}^{n} w_i(s_0, m_0) \cdot \bar{T}_{e}(s_i, m_0) \qquad (3)$$

where $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the estimated regression trend model coefficients for the intercept, longitude, latitude, elevation, and monthly average LST, respectively; $x$, $y$, $z$, and $\mathrm{lst}(m_0)$ are the longitude, latitude, elevation, and average LST for $m_0$ at grid-cell location $s_0$; $w_i(s_0, m_0)$ are weights defined by residual spatial covariance; and $\bar{T}_{e}(s_i, m_0)$ are the regression residuals for $n$ stations. In addition to the interpolation of $\bar{T}$, RK provides an important estimate of kriging prediction standard error ($\sigma_k$) at every grid cell and month, which is a straightforward method to represent general spatial uncertainty in interpolated monthly normals.
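The structure of Equation (3), a linear trend plus ordinary-kriged residuals, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the TopoWx implementation: the exponential covariance model and its `sill` and `corr_range` values are hypothetical, a single fixed covariance replaces the locally fitted variograms, the trend uses only two predictors, and all station values are made up.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def covariance(p, q, sill=1.0, corr_range=50.0):
    """Exponential covariance between two (x, y) points (assumed model)."""
    return sill * math.exp(-math.dist(p, q) / corr_range)

def regression_kriging(coords, predictors, temps, coord0, predictors0):
    """Predict temperature at coord0 as a linear trend plus kriged residual."""
    # 1. Ordinary least squares trend via the normal equations.
    design = [[1.0] + list(p) for p in predictors]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, temps)) for i in range(k)]
    beta = solve(xtx, xty)

    def trend(row):
        return sum(b * v for b, v in zip(beta, row))

    resid = [t - trend(r) for r, t in zip(design, temps)]
    # 2. Ordinary kriging of the residuals; the extra row and column hold the
    #    Lagrange multiplier that forces the weights to sum to 1.
    n = len(coords)
    a = [[covariance(coords[i], coords[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [covariance(c, coord0) for c in coords] + [1.0]
    weights = solve(a, b)[:n]
    return trend([1.0] + list(predictors0)) + sum(w * e for w, e in zip(weights, resid))

# Made-up stations: (x, y) in km; predictors are (elevation in km, mean LST in deg C).
coords = [(0.0, 0.0), (40.0, 10.0), (15.0, 35.0), (60.0, 50.0), (30.0, 70.0)]
predictors = [(0.5, 12.0), (1.2, 9.5), (0.9, 9.0), (2.0, 6.0), (1.5, 8.5)]
temps = [11.0, 7.5, 9.4, 2.8, 6.1]
at_station = regression_kriging(coords, predictors, temps, coords[0], predictors[0])
new_cell = regression_kriging(coords, predictors, temps, (25.0, 25.0), (1.0, 10.0))
```

Because the assumed covariance has no nugget, the sketch reproduces the observed value when the prediction point coincides with a station. It does not compute the prediction standard error discussed in the text.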
RK $\sigma_k$ is a composite uncertainty measure that reflects not only the interpolation error associated with the regression trend model, but also the geographical arrangement of stations (Hengl et al., 2004). For instance, RK $\sigma_k$ will be higher for grid cells that are located further away from station locations and from the centre of the station predictor space (Hengl et al., 2004). We estimate RK $\sigma_k$ through the calculation of the universal kriging $\sigma_k$ (Cressie, 1993; Hengl et al., 2004).

Like most traditional kriging analyses, we use a variogram model to define the spatial covariance structure of $\bar{T}_{e}$ (Isaaks and Srivastava, 1989). However, given the large and diverse landscape of the CONUS, it is likely not valid to use a single global variogram that assumes the covariance structure is the same within the entire domain and across months (Lloyd, 2009). Additionally, performing RK at each grid cell using a global regression model and the entire population of stations would be computationally inefficient (Hengl, 2009) and prone to oversmooth the interpolations or result in less accurate predictions and uncertainty estimates compared to more locally defined models (Lloyd, 2009). To account for non-stationarity in regression parameters and $\bar{T}_{e}$ covariance, we use a local moving window kriging (MWK; Haas, 1990) implementation of RK (MW-RK), a kriging approach that fits a separate local regression and variogram around each and every interpolation point using only n surrounding stations (Appendix S2).

2.3.3. Daily temperature anomaly interpolation

Similar to the MW-RK monthly normal temperature interpolations, we assume that 1948–2012 daily temperature anomalies from the 1981–2010 normals can be expressed as the sum of a regression-modelled spatial trend and an interpolated residual. However, because there are 23 742 days from 1948–2012, a daily varying MW-RK approach is not computationally practical.
We therefore apply a simpler daily-varying interpolation model where we use a moving window GWR for the spatial trend and inverse distance weighting (IDW) for the residual interpolation. GWR is similar to regular linear regression except observation points are weighted according to their distance from a prediction location (Fotheringham et al., 2002). The GWR model is identical to the regression component in Equation (3) except we add TDI as an additional auxiliary predictor. For each month, we use the optimization procedures discussed in Appendix S2 to obtain locally optimal n values for the number of surrounding stations to use in the GWR and IDW. We calculate station weights for the GWR via a bisquare weighting function:

$$w_i(s_0) = \left[ 1 - \left( \frac{h(s_0)_i}{r} \right)^{2} \right]^{2} \qquad (4)$$

where $w_i(s_0)$ is the distance-based weight of station $i$ at interpolation location $s_0$, $r$ is the interpolation window radius defined as the distance to the $(n + 1)$th closest station, and $h(s_0)_i$ is the distance between station $i$ and interpolation location $s_0$. We use a power parameter of 2 for the IDW. While the GWR model and IDW interpolations vary daily, both the locally optimal n and the GWR and IDW weights remain constant for each month. We obtain a final estimate of actual daily temperature by combining $\bar{T}$ and the interpolated anomaly:

$$T(s_0, d_0) = \bar{T}(s_0, m_0) + \delta T(s_0, d_0) \qquad (5)$$

where $T(s_0, d_0)$ is the temperature at interpolation point $s_0$ for day $d_0$ within $m_0$ and $\delta T(s_0, d_0)$ is the daily temperature anomaly at interpolation point $s_0$ for day $d_0$. Our combined use of a more complex procedure to interpolate $\bar{T}$ and a simpler, faster method to interpolate $\delta T$ can be considered a form of climatologically aided interpolation (CAI; Willmott and Robeson, 1995).
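The two weighting schemes named above can be written out in a few lines. This sketch covers only the bisquare weights of Equation (4) and an IDW mean with a power of 2; the weighted regression fit itself is omitted, and the function names and example distances are hypothetical.

```python
def bisquare_weights(distances, radius):
    """Equation (4): weights fall from 1 at distance 0 to 0 at the window radius."""
    return [(1.0 - (h / radius) ** 2) ** 2 if h < radius else 0.0 for h in distances]

def idw(distances, values, power=2.0):
    """Inverse distance weighted mean of station values."""
    for h, v in zip(distances, values):
        if h == 0.0:  # prediction point coincides with a station
            return v
    w = [1.0 / h ** power for h in distances]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

# Distances (km) from one grid cell to its n = 3 nearest stations; the radius
# is the distance to the (n + 1)th nearest station, taken here as 40 km.
weights = bisquare_weights([0.0, 20.0, 30.0], radius=40.0)

# IDW of two regression residuals located 10 km and 20 km from the grid cell.
residual = idw([10.0, 20.0], [1.0, -1.0])
```

Defining the radius by the (n + 1)th station means all n stations inside the window receive a strictly positive weight. In a GWR fit these weights would enter a weighted least-squares regression for the trend, with the IDW step then interpolating what the trend leaves unexplained.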
However, unlike traditional implementations of CAI that model δT with univariate methods like pure IDW (Willmott and Robeson, 1995), we incorporate auxiliary predictors that can be critical for properly representing topoclimatic spatial patterns of δT. Holden et al. (2011b) showed topoclimatic factors in a mountainous region to be directly related to spatial patterns of δT, especially during stable atmospheric conditions favourable for cold air inversions.

2.4. Validation

2.4.1. Basic error statistics

For a basic validation of the infilled daily station temperatures, interpolated monthly normal temperatures, and interpolated daily temperatures, we use three main model performance metrics: MAE (Willmott and Matsuura, 2005), bias, and the refined index of agreement (dr), a dimensionless measure of average error (Willmott et al., 2012; Appendix S3). The dr metric (Equation S5) ranges from −1.0 to 1.0 with a value >0.5 indicating a predictive ability greater than the observed mean (Willmott et al., 2012; Legates and McCabe, 2013). Unlike basic correlation measures, dr is sensitive to differences in magnitude and variance between observed and modelled values (Legates and McCabe, 1999). Since the largest mode of variability in a station’s time series is normally the seasonal cycle, we also apply a baseline adjustment to dr (Legates and McCabe, 1999; Willmott et al., 2012; Appendix S3). This effectively avoids inflated dr values that are simply the result of the model capturing the main seasonality, but not necessarily day-to-day variability (Legates and McCabe, 1999; Willmott et al., 2012). We use three separate sets of stations to validate the daily temperature infill models: long-term GHCN-D stations that are part of USHCN and at least 95% complete for the 1948–2012 time period and SNOTEL and RAWS stations that have at least 20 years of data.
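For readers unfamiliar with these metrics, the following Python sketch shows one way to compute MAE, bias, and the refined index of agreement in the form published by Willmott et al. (2012), with scaling constant c = 2. The bias sign convention (predicted minus observed) and the optional baseline argument are assumptions; the exact baseline-adjusted form used here is given in Equation S5, which is not reproduced in this section.

```python
import numpy as np

def mae(pred, obs):
    """Mean absolute error."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean(np.abs(pred - obs)))

def bias(pred, obs):
    """Mean error, assumed here to be predicted minus observed."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean(pred - obs))

def refined_index_of_agreement(pred, obs, baseline=None, c=2.0):
    """Refined index of agreement d_r (Willmott et al., 2012).

    baseline: reference the observed deviations are measured against.
    The default is the observed mean; passing a per-observation array
    (for example a seasonal climatology) is an assumed way to apply a
    baseline adjustment and may differ from Equation S5.
    """
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    if baseline is None:
        baseline = obs.mean()
    err = np.sum(np.abs(pred - obs))
    dev = c * np.sum(np.abs(obs - baseline))
    if err <= dev:
        return float(1.0 - err / dev)
    return float(dev / err - 1.0)
```

With the default baseline, a model that always predicts the observed mean scores exactly 0.5, which is why values above 0.5 indicate skill beyond the mean.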
Assuming a worst-case missing data scenario, for each station, we set all but its last 5 years of observations to missing, build the 1948–2012 temperature infill models and then compare the infilled values with the observed values that were artificially set to missing. We calculate an overall daily MAE, bias, and mean dr for the three networks. We also calculate the MAE of average station temperatures (AVG-MAE), which is essentially the mean of the absolute station biases.

To evaluate model performance in the interpolation of 1981–2010 temperature normals and 1948–2012 daily Tmin and Tmax, we perform a leave-one-out cross-validation (LOOCV; Willmott and Robeson, 1995) with every station in the interpolation domain. We summarize MAE, bias, and dr by US climate division (Guttman and Quayle, 1996) and, following Abatzoglou (2013), October–April (‘cold’ season) and May–September (‘warm’ season) time periods. For daily temperature, we limit the LOOCV to only non-missing, non-infilled observations to provide a better indication of the error associated with actual observed temperature and not the infilled values.

2.4.2. Homogenization

Since TopoWx is the first CONUS-scale TCD to use homogenized station data, it is important to specifically validate the homogenization process. As a validation dataset, we use homogenized monthly observations from the official USHCN v2.5 product (Menne et al., 2009; version 2.5.0 20130622). For each USHCN station (n = 1218), we extract the TopoWx interpolated 1948–2012 daily temperatures from the nearest 30-arcsec grid cell. Following Menne et al. (2009), we then calculate annual temperature anomalies (1981–2010 base period) for each USHCN station location for both the USHCN v2.5 and TopoWx data and interpolate the anomalies to a 0.25° grid using the IDW method of Willmott et al. (1985).
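A minimal Python sketch of the anomaly step described above, together with an area-weighted average of a regular latitude-longitude anomaly grid, is given below. The cosine-of-latitude weighting is an assumption (a common approximation for equal-angle grids); the weighting actually used is not specified in this section, and the function names are illustrative.

```python
import numpy as np

def annual_anomalies(years, annual_means, base=(1981, 2010)):
    """Annual anomalies relative to the mean over an inclusive base period."""
    years = np.asarray(years)
    annual_means = np.asarray(annual_means, dtype=float)
    in_base = (years >= base[0]) & (years <= base[1])
    return annual_means - annual_means[in_base].mean()

def area_weighted_mean(grid, lats):
    """Area-weighted mean of a (nlat, nlon) grid with NaN outside the domain.

    lats: cell-centre latitudes in degrees, one per grid row.
    Weights are cos(latitude), an assumed approximation of cell area.
    """
    grid = np.asarray(grid, dtype=float)
    w = np.broadcast_to(np.cos(np.deg2rad(lats))[:, None], grid.shape)
    valid = ~np.isnan(grid)
    return float(np.sum(grid[valid] * w[valid]) / np.sum(w[valid]))
```

Applying `area_weighted_mean` to each year's 0.25° anomaly grid yields one annual series per dataset, and differencing those series gives the kind of comparison shown in Figure 5.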
From the 0.25° anomaly grids, we calculate and compare the area-weighted CONUS mean annual anomalies of TopoWx and USHCN v2.5. If TopoWx effectively homogenizes the input station data, 1948–2012 CONUS annual anomaly differences between TopoWx and USHCN v2.5 should be small. Although USHCN v2.5 is the official homogenized station dataset for the CONUS, it also uses PHA and is not completely independent from TopoWx. Therefore, we also compare TopoWx CONUS annual anomalies to those of Berkeley Earth, a global temperature dataset that uses an entirely different procedure to account for station record inhomogeneities (Rohde et al., 2013).

To examine whether the homogenized TopoWx TCD does in fact improve upon non-homogenized TCDs, we additionally apply the same USHCN v2.5 and Berkeley Earth annual anomaly comparison to three non-homogenized datasets: TopoWx interpolations based on non-homogenized station data (TopoWx Raw), the Daymet 1-km product (Thornton et al., 2012), and the PRISM 2.5-min monthly product (PRISM Climate Group, 2013a). In conducting these comparisons, we acknowledge that most non-homogenized TCDs were never intended to be used to analyse temperature trends (PRISM Climate Group, 2013b). Nonetheless, the use of TCDs in such a context continues to occur (e.g. Diaz and Eischeid, 2007; van Mantgem et al., 2009; Crimmins et al., 2011).

2.4.3. Land skin temperature

Within the TopoWx framework, the application of remotely sensed LST as an auxiliary predictor likely has the greatest potential to improve spatial representations of temperature normals (Hengl et al., 2011). To quantify the influence of the LST predictors and whether the influence differs between Tmin and Tmax, we first compare temperature normal biases between TopoWx and three TCDs that do not use LST: TopoWx without an LST predictor (TopoWx-No-LST), the Daymet 1-km product (Thornton et al., 2012), and the PRISM 30-arcsec 1981–2010 monthly normals product (PRISM Climate Group, 2012).
Both Daymet and PRISM use a GWR approach to interpolation, but Daymet only accounts for elevation (Thornton et al., 1997), while PRISM has a sophisticated station weighting scheme to account for numerous other topoclimatic factors (Daly et al., 2002; Daly et al., 2008). We focus the bias analysis on stations in the more topographically complex western CONUS (n = 4923; Figure 2(a)). For all four datasets, we calculate bias in relation to an index of station LST spatial setting (LST-I). We generate LST-I values for each station by applying the TDI calculation in Equation (1) to the LST grids. An LST-I value of 0 represents an area with an LST value relatively colder than surrounding terrain while a value of 5 represents an area with an LST value relatively warmer than surrounding terrain.

In addition to the bias analysis, we also analyse the absolute and relative influence of LST and the other MW-RK predictors (longitude, latitude, and elevation) on interpolations of western CONUS monthly normals. At each station location, we perform basic monthly multiple linear regressions of the predictors and monthly normals. We quantify relative predictor influence by partitioning the proportion of model variance explained (R2) accounted for by each predictor using the ‘lmg’ method (Lindeman et al., 1980) of the relaimpo package (Grömping, 2006) within the R environment for statistical computing (R Core Team, 2012). The lmg method averages the sequential sum of squares over different predictor orderings to better account for multicollinearity.

2.4.4. Uncertainty

In addition to improved temporal and spatial representations of topoclimatic air temperature, one of the main objectives of TopoWx is to provide accurate grid-cell level estimations of uncertainty. To assess the accuracy of MW-RK prediction standard error (σk), we evaluate the relationship between station LOOCV monthly normal MAE and σk.
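As an aside on the lmg decomposition used in Section 2.4.3, the idea can be sketched in a few lines of Python: for every ordering of the predictors, record the increase in R2 as each predictor enters the model, then average those increments per predictor. This is an illustrative brute-force version, not the R implementation used for the analysis; it enumerates all orderings, which is only practical for a handful of predictors such as the four considered here.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R2 of an ordinary least-squares fit of y on the selected columns."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def lmg(X, y):
    """Average sequential R2 contribution of each predictor over all orderings."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    shares = np.zeros(p)
    orderings = list(itertools.permutations(range(p)))
    for order in orderings:
        used, prev = [], 0.0
        for j in order:
            used.append(j)
            cur = r_squared(X, y, used)
            shares[j] += cur - prev  # gain in R2 when predictor j enters
            prev = cur
    return shares / len(orderings)
```

The returned shares sum to the R2 of the full model, so dividing by that total gives the proportion of explained variance attributed to each predictor.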
If σk properly accounts for local variability in station monthly normals, σk should have a strong positive correlation with MAE (Harris et al., 2010). We also examine the relationship between σk and MAE at a regional scale by quantifying the correlation between climate division average MAE and σk. In addition to correlating σk with MAE, we can, by assuming normality, use σk to estimate symmetric prediction confidence intervals (PCIs). If the PCIs are accurate, a given n% of LOOCV predictions should fall within their n% PCI (Harris et al., 2010). For instance, 95% of LOOCV predictions should fall within their respective 95% PCI. We quantify PCI accuracy across the full range of interval probabilities with the G-statistic (Goovaerts, 2001; Harris et al., 2010). The G-statistic ranges from 0.0 to 1.0 with values closer to 1.0 indicating higher PCI accuracy.

As previously described, σk is a composite measure that incorporates uncertainty from both the deterministic and spatially autocorrelated stochastic components of the MW-RK procedure (Hengl et al., 2004). Because most other TCDs use forms of GWR that only model a deterministic spatial trend (Thornton et al., 1997; Daly et al., 2008), we also compare σk to uncertainty estimates from a GWR version of the MW-RK trend model (Equation (3)). We use Equation (4) to define local GWR weights and calculate GWR prediction standard errors (σGWR) according to Leung et al. (2000). We compare the σGWR MAE correlations, G-statistics, and average PCI widths to those of σk and also examine differences in spatial patterns between the two uncertainty measures.

3. Results and discussion

3.1. Basic error statistics

3.1.1. Infilled missing values

After the removal of QA-flagged observations (Table S1), 14 087 stations met the minimum criteria of 5 years of observations for Tmin and/or Tmax (Figure 2(a)). Of these, a total of 626 USHCN stations, 376 RAWS stations, and 541 SNOTEL stations were used for infill model cross-validation based on their longer periods of record. Overall, cross-validation errors for the infill models appeared to be reasonable, especially considering that the cross-validation procedure limited model building to only 5 years of data (Table 1(a)). Except for a RAWS Tmin bias of −0.12 °C, temperature bias for each network was within ±0.10 °C. For all three networks, AVG-MAE was <0.25 °C, daily MAE was <2.0 °C, and dr was ≥0.75 (Table 1(a)).

Table 1. Infill error statistics. (a) Cross-validation error statistics for daily 1948–2012 temperature infilling based on using only 5 years of data to build the infill models. (b) Error statistics for daily 1948–2012 temperature infilling on days with both infilled and observed values for stations within the CONUS.

                               Tmin                                   Tmax
Network (number of stations)   Bias    AVG-MAE  Daily MAE  dr         Bias    AVG-MAE  Daily MAE  dr
                               (°C)    (°C)     (°C)                  (°C)    (°C)     (°C)
(a)
GHCN-D (626)                   +0.00   0.22     1.36       0.82       +0.01   0.23     1.48       0.82
RAWS (376)                     −0.12   0.19     1.58       0.76       +0.04   0.18     1.40       0.83
SNOTEL (541)                   −0.07   0.15     1.67       0.75       +0.09   0.23     1.77       0.78
(b)
GHCN-D (9480)                  +0.00   0.03     1.06       0.85       +0.00   0.03     1.03       0.87
RAWS (1244)                    −0.02   0.06     1.10       0.83       −0.00   0.06     1.01       0.88
SNOTEL (691)                   −0.01   0.04     1.10       0.83       +0.00   0.05     1.14       0.86

Error metrics are defined in Section 2.4.1.

As described in Section 2.2.4, to minimize even slight artificial mean and variance changepoints, for any station with more than 5 continuous years of missing data from the period 1948–2012, we replace all the station’s temperature observations with values from the station’s infill model. For both Tmin and Tmax, around 80% of stations fell into this category and had their observations replaced with infilled values (Tmin n = 11 289; Tmax n = 11 315). This still resulted in around 3000 long-term stations retaining non-infilled observations.
To make sure that the infilled values adequately represented the original observations at the shorter-term stations, we calculated error summaries for all stations in the CONUS (Table 1(b)). These error statistics are different from those of the cross-validation procedure as they represent residuals between infilled values and the observations from which the infill models were actually built. For all three networks and both Tmin and Tmax, bias was within ±0.02 °C, AVG-MAE was <0.10 °C, daily temperature MAE was <1.15 °C, and dr was ≥0.83.

3.1.2. Interpolated monthly normal temperatures

Across the CONUS, overall LOOCV monthly normal Tmin MAE (0.80–0.84 °C) was higher than overall monthly normal Tmax MAE (0.60–0.62 °C; Figure 3). Likely a reflection of the multifaceted relationship between Tmin and elevation (e.g. Bolstad et al., 1998; Lundquist et al., 2008; Daly et al., 2010; Holden et al., 2011a), higher monthly normal Tmin MAE was most apparent in the topographically complex areas of the western CONUS (Figure 3). Monthly normal Tmin MAE was less elevated in climate divisions with relatively flat and homogeneous landscapes, especially in the interior plains of the central CONUS (Figure 3). The highest monthly normal Tmax MAE was during the May–September time period within climate divisions along the California Pacific coast (Figure 3). During the summer, owing to the relatively cool California current and the position of the North Pacific High, coastal marine inversion layers and stratus clouds produce a strong Tmax gradient from the coast to more inland areas and a greatly complicated relationship between elevation and Tmax (Daly et al., 2008; Iacobellis and Cayan, 2013).

Overall monthly normal Tmin was slightly positively biased (+0.01 °C) for the CONUS while monthly normal Tmax was slightly negatively biased (−0.03 to −0.01 °C; Figure 3).
At the scale of individual climate divisions, Tmin generally had marginally larger biases than Tmax. For instance, 33% of climate divisions had a monthly normal Tmin absolute bias >0.1 °C for both the October–April and May–September time periods compared to 23% of climate divisions for Tmax.

3.1.3. Interpolated daily temperatures

Compared to the monthly normals, daily temperature LOOCV MAE was greater with overall daily Tmin (Tmax) MAE ranging from 1.43 °C (1.34 °C) in the May–September time period to 1.75 °C (1.61 °C) in the October–April time period (Figure 4). Similar to the monthly normals, higher daily Tmin MAE was noticeable in the topographically complex areas of the western CONUS and daily Tmax MAE had higher values along the Pacific coast during the May–September time period (Figure 4). From October–April, a north-south swath of higher daily Tmax MAE was also evident through portions of the Rocky Mountains and Great Plains (Figure 4). This region of higher daily Tmax MAE could be a result of both the occurrence of wintertime Tmax inversions (Daly et al., 2010) and the higher frequency and magnitude of wintertime cold and warm fronts (Camargo and Hubbard, 1999). Given the relatively flat, open prairie and agricultural landscapes of the Great Plains, it is also likely that winter spatial patterns of daily Tmax in the region are less a function of the underlying terrain than they are a function of air mass and front positions.

Figure 3. Leave-one-out cross-validation error statistics for interpolated 1981–2010 monthly temperature normals summarized by US climate division. Statistics are based on all input GHCN-D, SNOTEL, and RAWS stations within the conterminous United States (n = 11 589 for minimum temperature; n = 11 619 for maximum temperature). MAE is mean absolute error. Point maps of individual station MAE and bias can be found in Figures S1–S4.
Overall LOOCV Tmin dr ranged from 0.73 to 0.78 while Tmax dr ranged from 0.79 to 0.82 (Figure 4). Spatial patterns of dr were generally similar to those of daily MAE with weaker dr values in the western CONUS (Figure 4). In contrast to daily MAE patterns, weaker dr values were also found in the Florida peninsula during the May–September time period where climate division Tmin dr ranged from 0.59 to 0.66 and Tmax dr ranged from 0.60 to 0.72 (Figure 4). We found summer station observations in Florida to have the lowest temporal standard deviations of any stations in the CONUS. Thus, even though daily MAE is relatively low in Florida during the summer (Figure 4), small differences between interpolated and observed values have greater potential to reduce dr than in regions or time periods with greater observation seasonality and daily variability (Hubbard, 1994). Sea breezes along the Florida coast and associated convective activity are also strongest in summer (e.g. Pielke, 1974), likely making spatial patterns in daily Tmax harder to resolve.

3.2. Homogenization

Compared to the non-homogenized TCDs, TopoWx CONUS annual temperature anomalies appeared to be more temporally consistent with USHCN v2.5 data and Berkeley Earth, especially for Tmax (Figure 5). The TopoWx 1948–2012 CONUS Tmax trend of 0.123 °C decade−1 was nearly identical to the USHCN v2.5 Tmax trend of 0.125 °C decade−1 and only slightly warmer than the 0.118 °C decade−1 Berkeley Earth trend (Table 2). In contrast, TopoWx Raw and PRISM 1948–2012 Tmax trends were non-significant and much less positive (Table 2). The cold bias in the TopoWx Raw, PRISM, and Daymet Tmax trends is a well-known attribute of the non-homogenized US Tmax record and is attributed to the general conversion from evening to morning observation times and the switch from liquid-in-glass thermometers to the maximum–minimum temperature system (Menne et al., 2009).
Figure 4. Leave-one-out cross-validation error statistics for interpolated 1948–2012 daily temperatures summarized by US climate division. Statistics are based on observed, non-missing observations at all input GHCN-D, SNOTEL, and RAWS stations within the conterminous United States (n = 11 589 for minimum temperature; n = 11 619 for maximum temperature). MAE is mean absolute error. Mean dr is the mean refined index of agreement. Note inverted color bar for dr. Point maps of individual station MAE and dr can be found in Figures S5–S8.

Table 2. Annual temperature trends for the CONUS based on USHCN v2.5 data, Berkeley Earth, TopoWx, TopoWx Raw, PRISM, and Daymet.

               1948–2012 trend (°C decade−1)    1981–2010 trend (°C decade−1)
Dataset        Tmin       Tmax                  Tmin       Tmax
USHCN v2.5     +0.185*    +0.125*               +0.199*    +0.266*
Berk Earth     +0.181*    +0.118*               +0.182*    +0.252*
TopoWx         +0.160*    +0.123*               +0.177*    +0.272*
TopoWx Raw     +0.134*    +0.025                +0.169*    +0.117
PRISM          +0.142*    +0.000                +0.193*    +0.080
Daymet         NA         NA                    +0.191*    +0.077

*p-value ≤ 0.10.

Homogenization also appeared to improve the correspondence in CONUS Tmin anomalies between TopoWx and both USHCN v2.5 and Berkeley Earth, but not to the extent of Tmax (Figure 5). The TopoWx 1948–2012 Tmin trend of 0.160 °C decade−1, while greater than the 0.134 °C decade−1 TopoWx Raw and 0.142 °C decade−1 PRISM trends, was still biased cold in relation to the USHCN v2.5 0.185 °C decade−1 and Berkeley Earth 0.181 °C decade−1 trends (Table 2). Additionally, over the 1981–2010 time period, both Daymet and PRISM had Tmin trends more similar to USHCN v2.5 while TopoWx was closer to Berkeley Earth (Table 2). The remaining cold bias in the TopoWx Tmin trend in relation to USHCN v2.5 and Berkeley Earth could be the result of PHA not entirely adjusting for TOB inhomogeneities in the Tmin record.
For the USHCN v2.5 data, a specific monthly TOB correction (Karl et al., 1986) is applied before PHA. Additionally, in contrast to the 1895–present USHCN v2.5 period-of-record, we only run PHA over the 1948–2012 time period. Nevertheless, given the closer match in annual anomalies between TopoWx and both USHCN v2.5 and Berkeley Earth, the 1948–2012 PHA-only homogenization still appears to largely account for the main network-wide inhomogeneities in the raw station data and is a clear improvement over the non-homogenized TCDs (Figure 5).

Figure 5. Differences in average mean annual temperature anomalies for the conterminous United States. (a) Difference from USHCN v2.5 minimum temperature anomalies, (b) difference from USHCN v2.5 maximum temperature anomalies, (c) difference from Berkeley Earth minimum temperature anomalies, and (d) difference from Berkeley Earth maximum temperature anomalies. Differences are the respective dataset values minus USHCN v2.5 or Berkeley Earth values. TopoWx Raw is TopoWx driven by non-homogenized station data. Daymet is only available from 1980 onwards.

3.3. Land skin temperature

Unlike TCDs without an LST predictor, TopoWx had consistently low monthly normal Tmin bias across seasons and different LST spatial settings (Figure 6). TopoWx-No-LST, PRISM, and Daymet tended to overestimate Tmin in areas with colder LST values and underestimate Tmin in areas with warmer LST values (Figure 6). Averaged across all western CONUS stations, LST was also the most important predictor of monthly normal Tmin, accounting for >50% of the variance explained across all months (Figure 7(b)). In contrast, the relative importance of the elevation predictor remained near or below 20% for most months and only rose to near 30% during the spring (Figure 7(b)).

Throughout the mountainous western CONUS, microclimate influences on Tmin can be strong and Tmin cold air pools and inversions are a common phenomenon, especially during periods of atmospheric stability and significant radiative cooling (e.g. Lundquist et al., 2008; Daly et al., 2010; Holden et al., 2011b). As a result, Tmin often does not have a simple linear relationship with elevation, which limits the ability of an individual elevation predictor to properly represent Tmin spatial patterns (Daly et al., 2008; Dobrowski et al., 2009). On the basis of the high relative importance of the LST predictor and the decreased bias of TopoWx over TopoWx-No-LST, the addition of LST appears to help overcome the limitations of the elevation predictor and provides significant added value to the monthly normal Tmin interpolations (Figure 7(b)).

Figure 6. Dataset bias for stations in the western United States (n = 4923) grouped by an index of land skin temperature (LST). (a) Cold season minimum temperature, (b) cold season maximum temperature, (c) warm season minimum temperature, and (d) warm season maximum temperature. An LST index value of 0 represents an area with an LST value relatively colder than surrounding terrain while a value of 5 represents an area with an LST value relatively warmer than surrounding terrain. TopoWx-No-LST is TopoWx without an LST predictor.

Figure 7. Diagnostic statistics of moving window monthly multiple linear regressions relating the moving window regression kriging auxiliary predictors (elevation, land skin temperature, latitude, and longitude) and 1981–2010 monthly temperature normals within the western United States. (a) Overall variance explained (R2); and proportion of R2 attributed to each predictor for (b) minimum temperature and (c) maximum temperature. Statistics are averaged across 4923 western US station locations.
Although the LST predictor appeared to decrease Tmax bias at the lowest LST-I values, differences between TopoWx and TCDs without LST were not as significant as those seen for Tmin (Figure 6). Except for the lowest LST-I values, Tmax bias for all the datasets was within ±0.25 °C (Figure 6). Furthermore, in contrast to Tmin, the relative importance of the LST predictor in predicting western CONUS Tmax normals was less than that of elevation in all months except for December and January (Figure 7(c)). There are likely two main reasons for this result. First, since Tmax generally displays a simpler linear decrease with elevation, elevation is already a strong predictor of Tmax without the addition of LST (Daly, 2006; Daly et al., 2008; Dobrowski et al., 2009). This was not the case for Tmin where elevation was a relatively weak predictor (Figure 7(b)). Second, owing to solar radiation effects on the thermal infrared signal (Vancutsem et al., 2010; Benali et al., 2012), different mediating effects of land cover and moisture regimes on the surface energy balance (e.g. Mildrexler et al., 2011), and increased daytime convective turbulence and advection compared to nighttime conditions (Pielke et al., 2007; Kloog et al., 2012), the relationship between Tmax and LST is often more complex than that of Tmin and LST (Vancutsem et al., 2010; Benali et al., 2012; Kloog et al., 2012). Given that MODIS LST can only be retrieved under a relatively cloudless atmosphere, the maximum LST predictor is also likely biased to clear-sky conditions when the difference between maximum LST and Tmax is normally greatest because of increased insolation (Jin et al., 1997). For the winter months that did display slightly higher relative influence values for LST (Figure 7(c)), lower wintertime insolation is likely resulting in a more linear correspondence between LST and Tmax across different surface conditions.
In the winter, climatological Tmax inversions and snow cover in many mountainous regions of the western CONUS (Whiteman et al., 1999; Pepin et al., 2011) could also be lessening the predictive power of elevation and increasing that of LST. Ultimately, even though the MW-RK linear model had overall greater predictive power for Tmax than Tmin (Figure 7(a)), the added value of LST on interpolations of monthly normal Tmax in the western CONUS appears to be mainly confined to specific months (Figure 7(c)) or environmental settings (Figure 6) and is less significant than the added value seen for Tmin.

Table 3. Performance metrics of monthly normal prediction standard error (σ) for moving window regression kriging (MW-RK) and geographically weighted regression (GWR).

                                                  Tmin               Tmax
Metric                                            MW-RK    GWR       MW-RK    GWR
G-statistic                                       0.995    0.984     0.992    0.981
MAE and σ correlation                             0.41     0.36      0.35     0.33
Average climate division MAE and σ correlation    0.89     0.90      0.81     0.82
Average PCI width (°C)                            1.636    1.735     1.218    1.274

Metrics are defined in Section 2.4.4.

3.4. Uncertainty

The 1981–2010 monthly normal temperature PCIs derived from σk displayed high accuracy with G-statistics of 0.995 and 0.992 for Tmin and Tmax, respectively (Table 3). For instance, the actual percentage of LOOCV monthly normal predictions within the 95% PCI was 94.6% for Tmin and 94.5% for Tmax. Compared to the high PCI accuracy, the correlation between individual station LOOCV MAE and σk was positive, but not overwhelmingly strong (Table 3). The MAE and σk correlation was 0.41 for Tmin and 0.35 for Tmax. Nonetheless, the correlations are similar to those from the best performing kriging models reviewed by Harris et al. (2010). At the scale of US climate divisions, the correlations between MAE and σk were also much stronger with Tmin and Tmax at 0.89 and 0.81, respectively (Table 3). These results are similar to those of Daly et al. (2008) who found PCIs derived from the PRISM interpolation model to be more highly correlated with MAE at larger regional aggregations.

Although accuracy metrics for σk were favourable, they were not significantly better than those of the deterministic GWR model (Table 3). The GWR G-statistics and individual station MAE and σGWR correlations were lower than those of σk, but nearly indistinguishable. For similarly performing uncertainty models, the one with the smaller average PCI widths is normally preferred (Harris et al., 2010). While σk again performed better than σGWR in this regard, differences were not substantial (Table 3). The average σk PCI widths were 4.4–5.7% smaller than the average σGWR PCI widths.

In contrast to the accuracy metrics, differences in local spatial patterns between σk and σGWR were much more distinguishable. As a local example, in the western climate division of Montana, USA, August monthly normal Tmin σk displayed bullseyes of decreased uncertainty around station locations while σGWR did not (Figure 8). Unlike σGWR, which only represents model goodness of fit (Daly et al., 2008), σk accounts for the geographical arrangement of stations (Hengl et al., 2004). The σk field was also smooth while σGWR had circular arcs of discontinuities likely resulting from specific stations moving in or out of the local GWR radius. In contrast to these differences, both σk and σGWR had spatial patterns that followed the underlying elevation and/or LST values of the grid cells. In the end, the uncertainty spatial patterns are reflective of the advantages of MW-RK σk over not only GWR σGWR, but also OK σk. The GWR σGWR measure only represents model goodness of fit while OK σk only accounts for the geographical arrangement of stations. As evident in the local example, MW-RK σk is able to combine both components of uncertainty into a single composite measure (Hengl et al., 2004).

Figure 8. Maps of prediction standard error for 1981–2010 August minimum temperature normals for the western climate division of Montana, USA. (a) Climate division topography and geographical context; (b) TopoWx moving window regression kriging prediction standard error; and (c) geographically weighted regression prediction standard error. Dots in (a) are weather station locations.

3.5. Example output and comparison with other datasets

As an example of the final TopoWx output for the CONUS, we concentrate on the summer month of August. In August, nighttime microclimate influences and Tmin inversions are more consistent in many mountainous regions of the western CONUS due to increased nighttime atmospheric stability (e.g. Finklin, 1986; Holden et al., 2011b). Coastal marine inversion layers also increase Tmax spatial complexity along the Pacific coast (Daly et al., 2008; Iacobellis and Cayan, 2013). We examine spatial patterns in both August Tmin and Tmax normals and corresponding uncertainty. We also compare TopoWx August Tmin and Tmax normals within the western CONUS to those of the Daymet 1-km product (Thornton et al., 2012) and the PRISM 30-arcsec 1981–2010 monthly normals product (PRISM Climate Group, 2012).

In August, TopoWx Tmax displayed a strong correspondence to elevation gradients, especially in the western CONUS (Figure 9). Cooler Tmax temperatures were also noticeable along the Pacific coast. In contrast, TopoWx August Tmin displayed more complexity with relationships to not only elevation, but also convergent valleys, large inland lakes and rivers, and urban areas (Figure 9). Uncertainty patterns for both August Tmin and Tmax (Figure 9) directly corresponded to warm season MAE (Figure 3). Higher August Tmin σk values were seen throughout the topographically complex western CONUS, while higher Tmax σk values were mainly confined to the Pacific coast (Figure 9). Although regional differences in σk dominated the spatial patterns at the scale of CONUS, uncertainty patterns related to station locations and topographical patterns were still discernible (Figure 9).

Figure 9. Conterminous US maps of TopoWx 1981–2010 August temperature normals and corresponding uncertainty. Note different scales for Tmin normals and Tmax normals.

Figure 10. Western US maps of differences in 1981–2010 August temperature normals between PRISM and TopoWx and between Daymet and TopoWx.

Differences in western CONUS August Tmin normals between TopoWx and the existing PRISM and Daymet TCDs were substantial (Figure 10). Only 47% of western CONUS grid cells for Daymet were within 1.0 °C of TopoWx Tmin and Daymet had an overall −0.83 °C cold bias in relation to TopoWx western CONUS Tmin. Differences between PRISM and TopoWx western CONUS Tmin were smaller, but still significant (Figure 10). PRISM Tmin displayed an overall −0.30 °C cold bias in relation to TopoWx Tmin and 57% of PRISM grid cells were within 1.0 °C of TopoWx Tmin. In mountainous terrain, both Daymet and PRISM generally displayed warmer valley and cooler mountain Tmin than TopoWx, but PRISM tended to better match TopoWx spatial patterns. For instance, within the undulating basin and range topography of the northeastern climate division of Nevada, USA, Daymet Tmin significantly smoothed out terrain influences while PRISM displayed Tmin inversion patterns more similar to TopoWx (Figure 11). Elevation, the only topoclimatic factor accounted for by Daymet, is a poor predictor of August Tmin in this Nevada climate division. For the TopoWx MW-RK Tmin model, the average relative importance of elevation within the region was only 6% of variance explained. Conversely, LST was the dominant predictor at over 77% of variance explained.
Daymet differences from TopoWx were consequently negatively correlated with LST (r = −0.82). While the PRISM model does not use LST as a predictor, its sophisticated station weighting scheme better accounts for Tmin inversions and other topoclimatic factors (Daly et al., 2008). Nevertheless, with a negative correlation between PRISM differences from TopoWx and the LST predictor (r = −0.61), PRISM Tmin still tended to be warmer in the valleys and cooler in the mountains than TopoWx Tmin (Figure 11).

Figure 11. Comparison of TopoWx, PRISM, and Daymet 1981–2010 August minimum temperature normals within the northeastern climate division of Nevada, USA.

In addition to these differences in mountainous terrain, Daymet and PRISM August Tmin were also cooler than TopoWx over large inland water bodies like the Great Salt Lake (Figure 10). While TCDs are usually only used over land and not expected to be valid over water, an ability to better represent temperature patterns directly over water bodies would be a beneficial advancement. However, given the significant differences in LST values over water bodies compared to their surrounding terrestrial landscapes, a lack of station observations over many lakes, and generally higher water body σk uncertainty values (e.g. Figure 8), further validation is likely required to confirm the accuracy of Tmin spatial patterns over water. Compared to water bodies, differences related to urban areas in the western CONUS were less visually discernible except in the Central Valley of California. Within the Central Valley, Daymet and PRISM were generally warmer than TopoWx except for islands of warmer TopoWx Tmin over urban areas like Fresno (Figure 10).
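The comparison statistics quoted for Tmin (overall bias relative to TopoWx, percentage of grid cells within 1.0 °C, and the correlation of differences with LST) can be illustrated with a minimal Python sketch. It assumes the datasets have already been resampled to a common grid and flattened to paired lists; the function names and the five sample values are hypothetical and are not taken from the TopoWx code or data.

```python
from math import sqrt


def comparison_stats(other, reference, threshold=1.0):
    """Summarise differences (other - reference) over paired grid cells."""
    diffs = [o - r for o, r in zip(other, reference)]
    n = len(diffs)
    # Mean difference: negative means `other` is colder than the reference.
    bias = sum(diffs) / n
    # Percentage of cells whose absolute difference is within the threshold.
    pct_within = 100.0 * sum(abs(d) <= threshold for d in diffs) / n
    return diffs, bias, pct_within


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented August Tmin normals (deg C) for five grid cells (illustration only).
topowx = [10.0, 12.0, 8.0, 15.0, 11.0]
other = [9.5, 10.5, 8.5, 13.0, 11.0]
# Invented values of an LST predictor at the same cells.
lst = [1.0, 3.0, -1.0, 4.0, 1.0]

diffs, bias, pct = comparison_stats(other, topowx)
r = pearson_r(diffs, lst)

print(f"bias = {bias:+.2f} deg C")             # bias = -0.70 deg C
print(f"within 1.0 deg C = {pct:.0f}% of cells")  # 60% of cells
print(f"r(differences, LST) = {r:.2f}")
```

A negative bias here corresponds to the "cold bias" wording used in the text, and a strongly negative r means the other dataset is relatively warmer where LST is low and cooler where LST is high.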
Differences likely unrelated to underlying terrain or land cover were also noticeable, especially in northwestern California where Daymet August Tmin was more than 10 °C colder than both TopoWx and PRISM throughout much of the region (Figure 10).

In contrast to Tmin, differences between TopoWx August Tmax and the other TCDs were not as significant (Figure 10). The percentage of western CONUS grid cells within 1.0 °C of TopoWx Tmax was 91% for Daymet and 90% for PRISM. In relation to TopoWx Tmax, Daymet was biased +0.06 °C and PRISM was biased +0.27 °C within the western CONUS. Corresponding to TopoWx Tmax uncertainty (Figure 9), the most substantial Tmax differences were mainly confined to areas near and along the Pacific coast (Figures 10 and 12).

Figure 12. Comparison of TopoWx, PRISM, and Daymet 1981–2010 August maximum temperature normals within the north coast drainage climate division of California, USA.

Owing to the frequent onshore presence of a marine layer in the summer (Johnstone and Dawson, 2010; Iacobellis and Cayan, 2013), the California Pacific coast represents one of the few areas where Tmax and elevation do not have a simple linear relationship (Daly et al., 2008). For example, within the north coast drainage climate division of California (Figure 12), elevation had an average relative importance of only 14% within the TopoWx MW-RK Tmax model, while LST relative importance was 47%. Correspondingly, in viewing TCD outputs for the climate division, Daymet Tmax normals were oversmoothed in certain areas while both PRISM and TopoWx displayed more realistic coastal and topographic influences (Figure 12). Similar to the Nevada example for Tmin, TopoWx relies on LST to overcome limitations of the elevation predictor while PRISM uses a station weighting scheme based on coastal proximity and terrain blockage of the marine layer (Daly et al., 2008).
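The relative-importance percentages quoted for elevation and LST can be illustrated with a small sketch of how explained variance may be shared between two correlated predictors. It assumes an LMG-style decomposition (Lindeman et al., 1980), as implemented in the relaimpo package (Grömping, 2006): each predictor's share is its gain in R² averaged over both orders of entry into the regression. Whether this matches the exact configuration used for TopoWx is an assumption, and the function names and station values below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lmg_two_predictors(y, x1, x2):
    """LMG shares of R^2 for the linear model y ~ x1 + x2.

    Returns (share of x1, share of x2, full-model R^2); the two shares
    sum to the full-model R^2.
    """
    r1 = pearson_r(y, x1)
    r2 = pearson_r(y, x2)
    r12 = pearson_r(x1, x2)
    # R^2 of the two-predictor least-squares fit, written in terms of the
    # pairwise correlations (requires x1 and x2 not perfectly collinear).
    r2_full = (r1**2 + r2**2 - 2.0 * r1 * r2 * r12) / (1.0 - r12**2)
    # Average each predictor's R^2 gain over the two orders of entry:
    # entering first (its own squared correlation) and entering second
    # (what it adds on top of the other predictor).
    share1 = 0.5 * (r1**2 + (r2_full - r2**2))
    share2 = 0.5 * (r2**2 + (r2_full - r1**2))
    return share1, share2, r2_full


# Invented station values for illustration only.
elev = [1.0, 2.0, 3.0, 4.0, 5.0]    # elevation (km)
lst = [2.0, -1.0, 3.0, 0.0, 1.0]    # LST predictor (deg C)
tmin = [9.0, 4.5, 8.0, 3.0, 4.0]    # observed Tmin normal (deg C)

share_elev, share_lst, r2_full = lmg_two_predictors(tmin, elev, lst)
print(f"elevation: {100 * share_elev:.0f}% of variance explained")
print(f"LST:       {100 * share_lst:.0f}% of variance explained")
print(f"full model R^2: {r2_full:.2f}")
```

Because the shares are expressed as fractions of total variance, they sum to the model R² rather than to 100%; percentages such as those quoted in the text leave the remainder to other predictors and unexplained variance.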
Within this context, TopoWx Tmax tended to display a deeper inland penetration of the cooling maritime influence on the Pacific coast than PRISM Tmax (Figure 12). One reason for this difference could be related to the fog and low stratus clouds that frequently accompany the marine layer (Johnstone and Dawson, 2010; Iacobellis and Cayan, 2013). As LST observations can only be retrieved under relatively cloudless conditions, the TopoWx Tmax spatial patterns could be biased toward what is more frequently seen under a clear-sky atmosphere. While a more detailed validation would be required to determine the exact advantages and disadvantages of PRISM and TopoWx along the Pacific coast, this represents a good example of one potential limitation of the LST predictor and the spatial patterns it produces.

Overall, the differences in western CONUS August Tmin and Tmax normals between TopoWx and the other TCDs are consistent with the results of the LST predictor analysis (Figures 6 and 7). As a strong predictor of Tmin, LST is likely driving many of the spatial differences between TopoWx and the other TCDs. The influence of the LST predictor was clearly evident in the Nevada example (Figure 11), where it had high relative importance and was negatively correlated with Daymet and PRISM differences from TopoWx. In contrast, with the overall lower relative importance of LST in Tmax interpolations, there were correspondingly fewer differences between the datasets except in regions where the linear relationship between Tmax and elevation was not as strong (Figure 12).

4. Conclusion

As evident in our validation, TopoWx contributes three main advancements to topoclimatic temperature interpolation: (1) an improved representation of interdecadal and long-term temperature trends; (2) an improved representation of complex temperature spatial patterns, particularly for Tmin; and (3) a spatial representation of uncertainty that accounts for both model goodness of fit and the geographical arrangement of stations. These advancements were made through the use of previously developed homogenization procedures (Menne and Williams, 2009), remotely sensed LST as an auxiliary predictor of topoclimatic air temperature, and a unique implementation of MW-RK.

In the context of these advancements, several caveats should still be noted. Homogenization procedures largely remove artificial jumps and trends, but they can also smooth out finer-scale trend variations by imposing the regional climate signal on each station (Pielke et al., 2007). Additionally, as illustrated by differences between USHCN trends and those of relatively finer resolution reanalysis datasets (Vose et al., 2012), there are still uncertainties in regional climate signals even within homogenized datasets. TopoWx is also a daily product, but it is homogenized on a monthly time step. Future work should look to improve corrections for daily time-of-observation departures and biases and to incorporate daily homogenization schemes (e.g. Della-Marta and Wanner, 2006; Kuglitsch et al., 2009) that additionally correct for artificial changes in temperature distributions, not just the mean. More rigorous inter-comparisons with other TCDs that use truly independent station data are also warranted to fully understand the advantages and disadvantages of TopoWx in specific regions of interest, particularly in regions of high uncertainty like the California Pacific coast.
Since the RK approach already lends itself to using any arbitrary model for the deterministic trend component (Hengl et al., 2007), more sophisticated modelling methods that move beyond linear regression should also be investigated. Regression trees or generalized additive models could be used to account for complex, nonlinear predictor relationships and possibly improve LST predictive power. Lastly, while the σk metric provides a good indication of spatial uncertainty in temperature normals, it does not propagate uncertainty from the station data infilling step, nor does it reflect changes in daily temperature uncertainty through time.

TopoWx will remain a work-in-progress and we encourage community-driven enhancements, feedback, and derivative datasets. All associated TopoWx input/output data, software code, validation metrics, and station QA, homogenization and infill statistics will be available at http://www.ntsg.umt.edu/project/TopoWx.

Even with the model’s remaining caveats, TopoWx takes an important next step in addressing the main limitations of current TCDs, particularly in regard to representing topoclimatic variations in Tmin, improving upon issues stemming from non-homogenized station data, and quantifying spatial uncertainty. The TopoWx methods developed for temperature should also be applicable to other climate variables. For instance, the station data record extension methods that combine atmospheric reanalysis and local long-term station data could be key for better representing interdecadal temporal variability in precipitation at higher elevation station locations (Luce et al., 2013). Ultimately, TopoWx should help advance climate-driven ecological and hydrological modelling and facilitate more openness in TCDs and a better end-user understanding of their uncertainties and limitations.

Acknowledgements

We thank Dr. Anna Klene and three anonymous reviewers for invaluable feedback on previous drafts.
This study is based on work supported by the National Science Foundation under EPSCoR Grant EPS-1101342, the US Geological Survey North Central Climate Science Center Grant G-0734-2 and the US Geological Survey Energy Resources Group Grant G11AC20487. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Supporting Information

The following supporting information is available as part of the online article:
Appendix S1. Methods for missing value infilling.
Appendix S2. Methods for moving window regression kriging.
Appendix S3. Model performance metrics.
Table S1. Number of daily temperature observations from 1948 to 2012 flagged by Durre et al. (2010) quality assurance procedures for GHCN-D, SNOTEL and RAWS station networks.
Figure S1. Leave-one-out cross-validation mean absolute error (MAE) for interpolated 1981–2010 monthly minimum temperature normals. Points are all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 589).
Figure S2. Leave-one-out cross-validation mean absolute error (MAE) for interpolated 1981–2010 monthly maximum temperature normals. Points are all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 619).
Figure S3. Leave-one-out cross-validation bias for interpolated 1981–2010 monthly minimum temperature normals. Points are all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 589).
Figure S4. Leave-one-out cross-validation bias for interpolated 1981–2010 monthly maximum temperature normals. Points are all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 619).
Figure S5. Leave-one-out cross-validation mean absolute error (MAE) for interpolated 1948–2012 daily minimum temperatures.
MAE is based on observed, non-missing observations at all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 589).
Figure S6. Leave-one-out cross-validation mean absolute error (MAE) for interpolated 1948–2012 daily maximum temperatures. MAE is based on observed, non-missing observations at all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 619).
Figure S7. Leave-one-out cross-validation refined index of agreement (dr) for interpolated 1948–2012 daily minimum temperatures. The dr is based on a monthly-varying baseline and observed, non-missing observations at all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 589).
Figure S8. Leave-one-out cross-validation refined index of agreement (dr) for interpolated 1948–2012 daily maximum temperatures. The dr is based on a monthly-varying baseline and observed, non-missing observations at all input GHCN-D, SNOTEL, and RAWS stations within the contiguous United States (n = 11 619).

References

Abatzoglou JT. 2013. Development of gridded surface meteorological data for ecological applications and modelling. Int. J. Climatol. 33: 121–131, doi: 10.1002/joc.3413.
Abatzoglou JT, Brown TJ. 2012. A comparison of statistical downscaling methods suited for wildfire applications. Int. J. Climatol. 32: 772–780, doi: 10.1002/joc.2312.
Alexandersson H. 1986. A homogeneity test applied to precipitation data. J. Climatol. 6: 661–675, doi: 10.1002/joc.3370060607.
Barry RG. 2008. Mountain Weather and Climate, 3rd edn. Cambridge University Press: Cambridge, UK.
Beier CM, Signell SA, Luttman A, DeGaetano AT. 2011. High-resolution climate change mapping with gridded historical climate products. Landsc. Ecol. 27: 327–342, doi: 10.1007/s10980-011-9698-8.
Benali A, Carvalho AC, Nunes JP, Carvalhais N, Santos A. 2012. Estimating air surface temperature in Portugal using MODIS LST data. Remote Sens. Environ. 124: 108–121, doi: 10.1016/j.rse.2012.04.024.
Beniston M. 2006. Mountain weather and climate: a general overview and a focus on climatic change in the Alps. Hydrobiologia 562: 3–16, doi: 10.1007/s10750-005-1802-0.
Bishop DA, Beier CM. 2013. Assessing uncertainty in high-resolution spatial climate data across the US Northeast. PLoS One 8: e70260, doi: 10.1371/journal.pone.0070260.
Bolstad PV, Swift L, Collins F, Régnière J. 1998. Measured and predicted air temperatures at basin to regional scales in the southern Appalachian mountains. Agric. For. Meteorol. 91: 161–176, doi: 10.1016/S0168-1923(98)00076-8.
Camargo MB, Hubbard KG. 1999. Spatial and temporal variability of daily weather variables in sub-humid and semi-arid areas of the United States high plains. Agric. For. Meteorol. 93: 141–148, doi: 10.1016/S0168-1923(98)00122-1.
Cressie N. 1993. Statistics for Spatial Data. Wiley: New York, NY.
Crimmins SM, Dobrowski SZ, Greenberg JA, Abatzoglou JT, Mynsberge AR. 2011. Changes in climatic water balance drive downhill shifts in plant species’ optimum elevations. Science 331: 324–327, doi: 10.1126/science.1199040.
Crosson WL, Al-Hamdan MZ, Hemmings SNJ, Wade GM. 2012. A daily merged MODIS Aqua–Terra land surface temperature data set for the conterminous United States. Remote Sens. Environ. 119: 315–324, doi: 10.1016/j.rse.2011.12.019.
Daly C. 2006. Guidelines for assessing the suitability of spatial climate data sets. Int. J. Climatol. 26: 707–721, doi: 10.1002/joc.1322.
Daly C, Gibson WP, Taylor GH, Johnson GL, Pasteris P. 2002. A knowledge-based approach to the statistical mapping of climate. Clim. Res. 22: 99–113.
Daly C, Halbleib M, Smith JI, Gibson WP, Doggett MK, Taylor GH, Curtis J, Pasteris PP. 2008. Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. Int. J. Climatol. 28: 2031–2064, doi: 10.1002/joc.1688.
Daly C, Conklin DR, Unsworth MH. 2010.
Local atmospheric decoupling in complex topography alters climate change impacts. Int. J. Climatol. 30: 1857–1864, doi: 10.1002/joc.2007.
DeGaetano AT. 1999. A method to infer observation time based on day-to-day temperature variations. J. Clim. 12: 3443–3456, doi: 10.1175/1520-0442(1999)012<3443:AMTIOT>2.0.CO;2.
DeGaetano AT, Eggleston KL, Knapp WW. 1995. A method to estimate missing daily maximum and minimum temperature observations. J. Appl. Meteorol. 34: 371–380, doi: 10.1175/1520-0450-34.2.371.
Della-Marta PM, Wanner H. 2006. A method of homogenizing the extremes and mean of daily temperature measurements. J. Clim. 19: 4179–4197, doi: 10.1175/JCLI3855.1.
Diaz HF, Eischeid JK. 2007. Disappearing “alpine tundra” Köppen climatic type in the western United States. Geophys. Res. Lett. 34: L18707, doi: 10.1029/2007GL031253.
Dobrowski SZ, Abatzoglou JT, Greenberg JA, Schladow S. 2009. How much influence does landscape-scale physiography have on air temperature in a mountain environment? Agric. For. Meteorol. 149: 1751–1758, doi: 10.1016/j.agrformet.2009.06.006.
Dozier J. 1996. A generalized split-window algorithm for retrieving land-surface temperature from space. IEEE Trans. Geosci. Remote Sens. 34: 892–905, doi: 10.1109/36.508406.
Durre I, Menne MJ, Gleason BE, Houston TG, Vose RS. 2010. Comprehensive automated quality assurance of daily surface observations. J. Appl. Meteorol. Climatol. 49: 1615–1633, doi: 10.1175/2010JAMC2375.1.
Eischeid JK, Pasteris PA, Diaz HF, Plantico MS, Lott NJ. 2000. Creating a serially complete, national daily time series of temperature and precipitation for the western United States. J. Appl. Meteorol. 39: 1580–1591, doi: 10.1175/1520-0450(2000)039<1580:CASCND>2.0.CO;2.
Elsner MM, Cuo L, Voisin N, Deems JS, Hamlet AF, Vano JA, Mickelson KEB, Lee S, Lettenmaier DP. 2010. Implications of 21st century climate change for the hydrology of Washington State. Clim. Change 102: 225–260, doi: 10.1007/s10584-010-9855-0.
Finklin AI. 1986.
A climatic handbook for Glacier National Park – with data for Waterton Lakes National Park. General Technical Report INT-204, US Department of Agriculture Forest Service Intermountain Research Station, Ogden, UT.
Florio EN, Lele SR, Chi Chang Y, Sterner R, Glass GE. 2004. Integrating AVHRR satellite data and NOAA ground observations to predict surface air temperature: a statistical approach. Int. J. Remote Sens. 25: 2979–2994, doi: 10.1080/01431160310001624593.
Fotheringham AS, Brunsdon C, Charlton M. 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley: Chichester, UK.
Gesch D, Oimoen M, Greenlee S, Nelson C, Steuck M, Tyler D. 2002. The National Elevation Dataset. Photogramm. Eng. Remote Sens. 68: 5–11.
Glick P, Stein BA, Edelson NA (eds). 2011. Scanning the Conservation Horizon: A Guide to Climate Change Vulnerability Assessment. National Wildlife Federation: Washington, DC.
Goovaerts P. 2001. Geostatistical modelling of uncertainty in soil science. Geoderma 103: 3–26, doi: 10.1016/S0016-7061(01)00067-2.
Grömping U. 2006. Relative importance for linear regression in R: the package relaimpo. J. Stat. Softw. 17: 1–27.
Guentchev G, Barsugli JJ, Eischeid J. 2010. Homogeneity of gridded precipitation datasets for the Colorado River Basin. J. Appl. Meteorol. Climatol. 49: 2404–2415, doi: 10.1175/2010JAMC2484.1.
Guttman NB, Quayle RG. 1996. A historical perspective of U.S. Climate Divisions. Bull. Am. Meteorol. Soc. 77: 293–303, doi: 10.1175/1520-0477(1996)077<0293:AHPOUC>2.0.CO;2.
Haas TC. 1990. Kriging and automated variogram modeling within a moving window. Atmos. Environ. 24A: 1759–1769, doi: 10.1016/0960-1686(90)90508-K.
Hansen J, Ruedy R, Sato M, Lo K. 2010. Global surface temperature change. Rev. Geophys. 48: RG4004, doi: 10.1029/2010RG000345.
Harris P, Charlton M, Fotheringham AS. 2010. Moving window kriging with geographically weighted variograms. Stoch. Environ. Res. Risk Assess.
24: 1193–1209, doi: 10.1007/s00477-010-0391-2.
Hengl T. 2009. A Practical Guide to Geostatistical Mapping. Lulu Publishers: Raleigh, NC.
Hengl T, Heuvelink GBM, Stein A. 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120: 75–93, doi: 10.1016/j.geoderma.2003.08.018.
Hengl T, Heuvelink GBM, Rossiter DG. 2007. About regression-kriging: from equations to case studies. Comput. Geosci. 33: 1301–1315, doi: 10.1016/j.cageo.2007.05.001.
Hengl T, Heuvelink GBM, Perčec Tadić M, Pebesma EJ. 2011. Spatio-temporal prediction of daily temperatures using time-series of MODIS LST images. Theor. Appl. Climatol. 107: 265–277, doi: 10.1007/s00704-011-0464-2.
Holden ZA, Abatzoglou JT, Luce CH, Baggett LS. 2011a. Empirical downscaling of daily minimum air temperature at very fine resolutions in complex terrain. Agric. For. Meteorol. 151: 1066–1073, doi: 10.1016/j.agrformet.2011.03.011.
Holden ZA, Crimmins MA, Cushman SA, Littell JS. 2011b. Empirical modeling of spatial and temporal variation in warm season nocturnal air temperatures in two North Idaho mountain ranges, USA. Agric. For. Meteorol. 151: 261–269, doi: 10.1016/j.agrformet.2010.10.006.
Holder C, Boyles R, Syed A, Niyogi D, Raman S. 2006. Comparison of collocated automated (NCECONet) and manual (COOP) climate observations in North Carolina. J. Atmos. Oceanic Technol. 23: 671–682, doi: 10.1175/JTECH1873.1.
Hubbard KG. 1994. Spatial variability of daily weather variables in the high plains of the USA. Agric. For. Meteorol. 68: 29–41, doi: 10.1016/0168-1923(94)90067-1.
Hubbard KG, You J. 2005. Sensitivity analysis of quality assurance using the spatial regression approach – a case study of the maximum/minimum air temperature. J. Atmos. Oceanic Technol. 22: 1520–1530, doi: 10.1175/JTECH1790.1.
Huth R, Nemesova I. 1995. Estimation of missing daily temperatures: Can a weather categorization improve its accuracy? J.
Clim. 8: 1901–1916, doi: 10.1175/1520-0442(1995)008<1901:EOMDTC>2.0.CO;2.
Iacobellis SF, Cayan DR. 2013. The variability of California summertime marine stratus: impacts on surface air temperatures. J. Geophys. Res. Atmos. 118: 9105–9122, doi: 10.1002/jgrd.50652.
Isaaks EH, Srivastava RM. 1989. Applied Geostatistics. Oxford University Press: Oxford, UK.
Janis MJ. 2002. Observation-time-dependent biases and departures for daily minimum and maximum air temperatures. J. Appl. Meteorol. 41: 588–603, doi: 10.1175/1520-0450(2002)041<0588:OTDBAD>2.0.CO;2.
Jin M, Dickinson RE. 2010. Land surface skin temperature climatology: benefitting from the strengths of satellite observations. Environ. Res. Lett. 5: 044004, doi: 10.1088/1748-9326/5/4/044004.
Jin M, Dickinson RE, Vogelmann AM. 1997. A comparison of CCM2–BATS skin temperature and surface-air temperature with satellite and surface observations. J. Clim. 10: 1505–1524, doi: 10.1175/1520-0442(1997)010<1505:ACOCBS>2.0.CO;2.
Johnstone JA, Dawson TE. 2010. Climatic context and ecological implications of summer fog decline in the coast redwood region. Proceedings of the National Academy of Sciences of the United States of America. 107: 4533–4538, doi: 10.1073/pnas.0915062107.
Jones PD, Lister DH, Osborn TJ, Harpham C, Salmon M, Morice CP. 2012. Hemispheric and large-scale land-surface air temperature variations: an extensive revision and an update to 2010. J. Geophys. Res. 117: D05127, doi: 10.1029/2011JD017139.
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo K, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D. 1996. The NCEP/NCAR 40-year reanalysis project. Bull. Am. Meteorol. Soc. 77: 437–471, doi: 10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.
Karl TR, Williams CN, Young PJ, Wendland WM. 1986.
A model to estimate the time of observation bias associated with monthly mean maximum, minimum and mean temperatures for the United States. J. Clim. Appl. Meteorol. 25: 145–160, doi: 10.1175/1520-0450(1986)025<0145:AMTETT>2.0.CO;2.
Keane RE, Drury SA, Karau EC, Hessburg PF, Reynolds KM. 2010. A method for mapping fire hazard and risk across multiple scales and its application in fire management. Ecol. Model. 221: 2–18, doi: 10.1016/j.ecolmodel.2008.10.022.
Kemp WP, Burnell DG, Everson DO, Thomson AJ. 1983. Estimating missing daily maximum and minimum temperatures. J. Clim. Appl. Meteorol. 22: 1587–1593, doi: 10.1175/1520-0450(1983)022<1587:EMDMAM>2.0.CO;2.
Kloog I, Chudnovsky A, Koutrakis P, Schwartz J. 2012. Temporal and spatial assessments of minimum air temperature using satellite surface temperature measurements in Massachusetts, USA. Sci. Total Environ. 432: 85–92, doi: 10.1016/j.scitotenv.2012.05.095.
Kuglitsch FG, Toreti A, Xoplaki E, Della-Marta PM, Luterbacher J, Wanner H. 2009. Homogenization of daily maximum temperature series in the Mediterranean. J. Geophys. Res. 114: D15108, doi: 10.1029/2008JD011606.
Lawrimore JH, Menne MJ, Gleason BE, Williams CN, Wuertz DB, Vose RS, Rennie J. 2011. An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. J. Geophys. Res. 116: D19121, doi: 10.1029/2011JD016187.
Legates DR, McCabe G. 1999. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35: 233–241, doi: 10.1029/1998WR900018.
Legates DR, McCabe GJ. 2013. A refined index of model performance: a rejoinder. Int. J. Climatol. 33: 1053–1056, doi: 10.1002/joc.3487.
Leung Y, Mei C-L, Zhang W-X. 2000. Statistical tests for spatial nonstationarity based on the geographically weighted regression model. Environ. Plann. A 32: 9–32, doi: 10.1068/a3162.
Lindeman R, Merenda P, Gold R. 1980.
Introduction to Bivariate and Multivariate Analysis. Scott Foresman: Glenview, IL.
Littell JS, Oneil EE, McKenzie D, Hicke JA, Lutz JA, Norheim RA, Elsner MM. 2010. Forest ecosystems, disturbance, and climatic change in Washington State, USA. Clim. Change 102: 129–158, doi: 10.1007/s10584-010-9858-x.
Livneh B, Rosenberg EA, Lin C, Nijssen B, Mishra V, Andreadis KM, Maurer EP, Lettenmaier DP. 2013. A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States: update and extensions. J. Clim. 26: 9384–9392, doi: 10.1175/JCLI-D-12-00508.1.
Lloyd CD. 2009. Nonstationary models for exploring and mapping monthly precipitation in the United Kingdom. Int. J. Climatol. 30: 390–405, doi: 10.1002/joc.1892.
Luce CH, Abatzoglou JT, Holden ZA. 2013. The missing mountain water: slower westerlies decrease orographic enhancement in the Pacific Northwest USA. Science 342: 1360–1364, doi: 10.1126/science.1242335.
Lundquist JD, Pepin N, Rochford C. 2008. Automated algorithm for mapping regions of cold-air pooling in complex terrain. J. Geophys. Res. 113: D22107, doi: 10.1029/2008JD009879.
van Mantgem PJ, Stephenson NL, Byrne JC, Daniels LD, Franklin JF, Fulé PZ, Harmon ME, Larson AJ, Smith JM, Taylor AH, Veblen TT. 2009. Widespread increase of tree mortality rates in the western United States. Science 323: 521–524, doi: 10.1126/science.1165000.
Maurer EP, Hidalgo HG. 2008. Utility of daily vs. monthly large-scale climate data: an intercomparison of two statistical downscaling methods. Hydrol. Earth Syst. Sci. 12: 551–563, doi: 10.5194/hess-12-551-2008.
Menne MJ, Williams CN. 2009. Homogenization of temperature series via pairwise comparisons. J. Clim. 22: 1700–1717, doi: 10.1175/2008JCLI2263.1.
Menne MJ, Williams CN, Vose RS. 2009. The U.S. Historical Climatology Network Monthly Temperature Data, Version 2. Bull. Am. Meteorol. Soc. 90: 993–1007, doi: 10.1175/2008BAMS2613.1.
Menne MJ, Durre I, Vose RS, Gleason BE, Houston TG. 2012.
An overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Oceanic Technol. 29: 897–910, doi: 10.1175/JTECH-D-11-00103.1.
Mildrexler DJ, Zhao M, Running SW. 2011. A global comparison between station air temperatures and MODIS land surface temperatures reveals the cooling role of forests. J. Geophys. Res. 116: G03025, doi: 10.1029/2010JG001486.
Millard MJ, Czarnecki CA, Morton JM, Brandt LA, Briggs JS, Shipley FS, Sayre R, Sponholtz PJ, Perkins D, Simpkins DG, Taylor J. 2012. A national geographic framework for guiding conservation on a landscape scale. J. Fish Wildl. Manage. 3: 175–183, doi: 10.3996/052011-JFWM-030.
Morisette JT (ed). 2012. North Central Climate Science Center – Science agenda 2012–2017: U.S. Geological Survey Open-File Report 2012–1265, USGS, Reston, VA, 19 pp.
Mostovoy GV, King RL, Reddy KR, Kakani VG, Filippova MG. 2006. Statistical estimation of daily maximum and minimum air temperatures from MODIS LST data over the state of Mississippi. GISci. Remote Sens. 43: 78–110, doi: 10.2747/1548-1603.43.1.78.
Pepin NC, Daly C, Lundquist J. 2011. The influence of surface versus free-air decoupling on temperature trend patterns in the western United States. J. Geophys. Res. 116: D10109, doi: 10.1029/2010JD014769.
Peterson TC, Easterling DR, Karl TR, Groisman P, Nicholls N, Plummer N, Torok S, Auer I, Boehm R, Gullett D, Vincent L, Heino R, Tuomenvirta H, Mestre O, Szentimrey T, Salinger J, Førland EJ, Hanssen-Bauer I, Alexandersson H, Jones P, Parker D. 1998. Homogeneity adjustments of in situ atmospheric climate data: a review. Int. J. Climatol. 18: 1493–1517, doi: 10.1002/(SICI)1097-0088(19981115)18:13<1493::AID-JOC329>3.0.CO;2-T.
Pielke RA. 1974. A three-dimensional numerical model of the sea breezes over south Florida. Mon. Weather Rev. 102: 115–139, doi: 10.1175/1520-0493(1974)102<0115:ATDNMO>2.0.CO;2.
Pielke RA, Davey CA, Niyogi D, Fall S, Steinweg-Woods J, Hubbard K, Lin X, Cai M, Lim Y-K, Li H, Nielsen-Gammon J, Gallo K, Hale R, Mahmood R, Foster S, McNider RT, Blanken P. 2007. Unresolved issues with the assessment of multidecadal global land surface temperature trends. J. Geophys. Res. 112: D24S08, doi: 10.1029/2006JD008229.
PRISM Climate Group. 2012. Norm81m dataset, Oregon State University, Corvallis, OR. ftp://prism.nacse.org/normals_800m (accessed 5 April 2014).
PRISM Climate Group. 2013a. AN81m dataset, Oregon State University, Corvallis, OR. ftp://prism.nacse.org/monthly (accessed 5 April 2014).
PRISM Climate Group. 2013b. Descriptions of PRISM spatial climate datasets for the conterminous United States, Oregon State University, Corvallis, OR. http://www.prism.oregonstate.edu/documents/PRISM_datasets_aug2013.pdf (accessed 27 May 2014).
R Core Team. 2012. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna.
Reeves J, Chen J, Wang XL, Lund R, Lu Q. 2007. A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteorol. Climatol. 46: 900–915, doi: 10.1175/JAM2493.1.
Rohde R, Muller R, Jacobsen R, Perlmutter S, Rosenfeld A, Wurtele J, Curry J, Wickham C, Mosher S. 2013. Berkeley Earth temperature averaging process. Geoinform. Geostat. 1(2): 1–13, doi: 10.4172/2327-4581.1000103.
Running SW, Nemani RR, Hungerford RD. 1987. Extrapolation of synoptic meteorological data in mountainous terrain and its use for simulating forest evapotranspiration and photosynthesis. Can. J. For. Res. 17: 472–483, doi: 10.1139/x87-081.
Schafer JL. 1997. Analysis of Incomplete Multivariate Data. Chapman and Hall/CRC: Boca Raton, FL.
Smith TM, Reynolds RW, Peterson TC, Lawrimore J. 2008. Improvements to NOAA’s historical merged land–ocean surface temperature analysis (1880–2006). J. Clim. 21: 2283–2296, doi: 10.1175/2007JCLI2100.1.
Snyder WC, Wan Z, Zhang Y, Feng Y. 1998.
Classification-based emissivity for land surface temperature measurement from space. Int. J. Remote Sens. 19: 2753–2774, doi: 10.1080/014311698214497.
Stacklies W, Redestig H, Scholz M, Walther D, Selbig J. 2007. pcaMethods–a bioconductor package providing PCA methods for incomplete data. Bioinformatics 23: 1164–1167, doi: 10.1093/bioinformatics/btm069.
Thornton PE, Running SW, White MA. 1997. Generating surfaces of daily meteorological variables over large regions of complex terrain. J. Hydrol. 190: 214–251, doi: 10.1016/S0022-1694(96)03128-9.
Thornton PE, Thornton MM, Mayer BW, Wilhelmi N, Wei Y, Cook RB. 2012. Daymet: Daily surface weather on a 1 km grid for North America, 1980–2012, Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, TN, doi: 10.3334/ORNLDAAC/Daymet_V2. http://daymet.ornl.gov/ (accessed 19 November 2013).
Trewin B. 2010. Exposure, instrumentation, and observing practice effects on land temperature measurements. WIREs: Clim. Change 1: 490–506, doi: 10.1002/wcc.46.
Turner DP, Ritts WD, Yang Z, Kennedy RE, Cohen WB, Duane MV, Thornton PE, Law BE. 2011. Decadal trends in net ecosystem production and net ecosystem carbon balance for a regional socioecological system. For. Ecol. Manage. 262: 1318–1325, doi: 10.1016/j.foreco.2011.06.034.
Vancutsem C, Ceccato P, Dinku T, Connor SJ. 2010. Evaluation of MODIS land surface temperature data to estimate air temperature in different ecosystems over Africa. Remote Sens. Environ. 114: 449–465, doi: 10.1016/j.rse.2009.10.002.
Vincent LA, Zhang X, Bonsal BR, Hogg WD. 2002. Homogenization of daily temperatures over Canada. J. Clim. 15: 1322–1334, doi: 10.1175/1520-0442(2002)015<1322:HODTOC>2.0.CO;2.
Vose RS, Applequist S, Menne MJ, Williams CN, Thorne P. 2012. An intercomparison of temperature trends in the U.S. Historical Climatology Network and recent atmospheric reanalyses. Geophys. Res. Lett. 39: L10703, doi: 10.1029/2012GL051387.
Wan Z. 2008.
New refinements and validation of the MODIS land-surface temperature/emissivity products. Remote Sens. Environ. 112: 59–74, doi: 10.1016/j.rse.2006.06.026.
Wan Z, Li Z. 2011. MODIS land surface temperature and emissivity. In Land Remote Sensing and Global Environmental Change, Ramachandran B, Justice CO, Abrams MJ (eds). Springer: New York, NY.
Webster R, Oliver MA. 2007. Geostatistics for Environmental Scientists, 2nd edn. Wiley: Chichester, UK.
Whiteman CD, Bian X, Zhong S. 1999. Wintertime evolution of the temperature inversion in the Colorado plateau basin. J. Appl. Meteorol. 38: 1103–1117, doi: 10.1175/1520-0450(1999)038<1103:WEOTTI>2.0.CO;2.
Wiens JA, Bachelet D. 2010. Matching the multiple scales of conservation with the multiple scales of climate change. Conserv. Biol. 24: 51–62, doi: 10.1111/j.1523-1739.2009.01409.x.
Williams CN, Menne MJ, Thorne PW. 2012. Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. J. Geophys. Res. 117: D05116, doi: 10.1029/2011JD016761.
Willmott CJ, Matsuura K. 2005. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30: 79–82, doi: 10.3354/cr030079.
Willmott CJ, Robeson SM. 1995. Climatologically aided interpolation (CAI) of terrestrial air temperature. Int. J. Climatol. 15: 221–229, doi: 10.1002/joc.3370150207.
Willmott CJ, Rowe CM, Philpot WD. 1985. Small-scale climate maps: a sensitivity analysis of some common assumptions associated with grid-point interpolation and contouring. Am. Cartogr. 12: 5–16.
Willmott CJ, Robeson SM, Matsuura K. 2012. A refined index of model performance. Int. J. Climatol. 32: 2088–2094, doi: 10.1002/joc.2419.