Climatic Research Unit (CRU) Datasets

Climatic Research Unit (CRU)
Datasets – and some analyses!
Phil Jones
Climatic Research Unit
University of East Anglia
Norwich, NR4 7TJ, UK
• Datasets
• Data needs to be on the web
• Where possible data needs to contain
• Dataset needs to have a peer-reviewed
publication to back it up
• Don’t just say “I used CRU data”! I’ve seen this in a few
papers I’ve been sent for review, and also in countless
emails where the sender asks for details on how (and
sometimes why) they should be using the data
• CRU datasets have names for a purpose: so that people can
refer to them! We will be moving to DOIs, but
there are issues with how DOIs handle regularly updated datasets
• Most of our datasets are backed up by peer-reviewed
papers, which give details of their construction.
The web site gives some details, mainly on layout,
units, etc. We can’t put some of the papers up on the web site
• CRU datasets are generally supplied in NetCDF and
ASCII (for the smaller ones). It is up to users to read them
into their own software
• CRU isn’t able to extract windows (sub-regions) out of the global-scale gridded datasets
• Here I’ll be discussing these three datasets
• CRUTEM4 (Jones, P.D., Lister, D.H., Osborn, T.J., Harpham, C., Salmon, M. and Morice,
C.P., 2012: Hemispheric and large-scale land surface air temperature variations: An extensive
revision and an update to 2010. J. Geophys. Res. 117, D05127, doi:10.1029/2011JD017139)
• HadCRUT4 (Morice, C.P., Kennedy, J.J., Rayner, N.A. and Jones, P.D., 2012:
Quantifying uncertainties in global and regional temperature change using an ensemble of
observational estimates: the HadCRUT4 dataset. J. Geophys. Res. 117,
D08101, doi:10.1029/2011JD017187)
• CRU TS 3.10 (Harris, I., Jones, P.D., Osborn, T.J. and Lister, D.H., 2013: Updated
high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset. Int. J.
Climatol. (in press))
• These are by no means all the datasets at CRU
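Since users have to pull sub-regions out of the global grids themselves, here is a minimal Python sketch of the index arithmetic involved. It assumes a regular grid stored as a list of rows running south to north from 90°S, with columns running west to east from 180°W. Real files may order latitude the other way, so check the coordinate variables first. The function names are hypothetical, and reading the NetCDF or ASCII file into such a list is left to whatever library the user prefers.

```python
import math


def window_indices(lat_min, lat_max, lon_min, lon_max, res=5.0):
    """Row and column indices of the res x res boxes covering a lat/lon window.

    Assumed layout (check the file's own coordinate variables): rows run
    south to north starting at 90S, columns run west to east starting at 180W.
    """
    nlon = int(round(360.0 / res))
    rows = list(range(int(math.floor((lat_min + 90.0) / res)),
                      int(math.ceil((lat_max + 90.0) / res))))
    c0 = int(math.floor((lon_min + 180.0) / res))
    c1 = int(math.ceil((lon_max + 180.0) / res))
    if lon_max < lon_min:  # window crosses the dateline, so wrap round
        c1 += nlon
    cols = [c % nlon for c in range(c0, c1)]
    return rows, cols


def extract_window(grid, lat_min, lat_max, lon_min, lon_max, res=5.0):
    """Cut a sub-grid out of a global field stored as a list of rows."""
    rows, cols = window_indices(lat_min, lat_max, lon_min, lon_max, res)
    return [[grid[r][c] for c in cols] for r in rows]


# UK window (50-60N, 0-10W) on a 5-degree grid whose cells hold (row, col)
grid = [[(r, c) for c in range(72)] for r in range(36)]
print(extract_window(grid, 50, 60, -10, 0))
# -> [[(28, 34), (28, 35)], [(29, 34), (29, 35)]]
```

The same functions work for the 0.5° CRU TS grid by passing res=0.5.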
What are the datasets?
• CRUTEM4 – gridded (5° by 5° lat/long) monthly
temperature anomalies (relative to 1961-90) based on land
stations. Extends from 1850 and does no infilling, so
if there are no station data, the grid-box value is missing
• HadCRUT4 – combination of CRUTEM4 with HadSST3
(a similar gridded dataset of SST anomalies)
• Both HadCRUT4 and CRUTEM4 are updated monthly, but
much more extensively every year (~April), with
updated homogenized data provided by NMSs, either offline or
through their websites
• CRU TS 3.10 – gridded (0.5° by 0.5° lat/long) monthly
anomalies (relative to 1961-90) of mean temperature, diurnal temperature range (DTR),
precipitation total, vapour pressure, sunshine and
potential evapotranspiration (PET). Extends from
1901 and is as spatially complete as possible for all
variables. Interpolation/extrapolation only occurs
over land areas. The Antarctic (south of 60°S) is excluded
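The no-infilling rule for CRUTEM4 can be illustrated with a simplified sketch. Station anomalies, assumed to be already computed relative to each station's own 1961-90 normals, are averaged with equal weight within each 5° box, and a box with no reporting station is simply left out. This is an illustration under those assumptions, not the actual CRUTEM4 code, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def box_of(lat, lon, res=5.0):
    """Row/column of the box containing a station (rows from 90S, columns from 180W)."""
    nlat, nlon = int(round(180.0 / res)), int(round(360.0 / res))
    row = min(int(math.floor((lat + 90.0) / res)), nlat - 1)  # 90N goes in the top row
    col = int(math.floor((lon + 180.0) / res)) % nlon           # 180E wraps to column 0
    return row, col


def grid_month(stations, res=5.0):
    """Grid one month of station anomalies with no infilling.

    stations: iterable of (lat, lon, anomaly), anomaly None when missing.
    Returns {(row, col): mean anomaly}. Boxes without a reporting station
    are absent from the result, i.e. missing.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, anom in stations:
        if anom is None:
            continue
        key = box_of(lat, lon, res)
        sums[key] += anom
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in counts}


# Two UK stations share a box, one NZ station has its own, one station is missing
stations = [(52.0, -1.0, 0.4), (53.0, -2.0, 0.6), (-40.0, 172.0, 0.1), (10.0, 10.0, None)]
print(grid_month(stations))
```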
Citation Statistics
• The current three versions are quite new, but there
were earlier versions of all three datasets
(CRUTEM2/3, HadCRUT2/3, CRU TS 1.0/2.1)
• Which dataset gets the most citations?
• The most-cited one is popular because it provides the data as both anomalies and
absolute values, and derives national averages for
~200 countries and territories
• Some data papers appear to get more citations than
modelling/analysis papers, even though many users don’t
cite them
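National averages like these are usually area-weighted means over the grid cells inside a country. The sketch below assumes a hypothetical boolean country mask and cell areas proportional to cos(latitude). CRU's own country masks and weighting may differ. Because the weighted mean is linear, absolute national values follow from adding the climatology to the anomalies cell by cell before averaging.

```python
import math


def area_weighted_mean(values, mask, lats):
    """Area-weighted mean of the grid cells flagged True in mask.

    values, mask: lists of rows with the same shape; lats: centre latitude of
    each row. Cell area is taken as proportional to cos(latitude), and
    missing cells (None) are skipped. Returns None if no cell qualifies.
    """
    num = den = 0.0
    for row_vals, row_mask, lat in zip(values, mask, lats):
        weight = math.cos(math.radians(lat))
        for value, inside in zip(row_vals, row_mask):
            if inside and value is not None:
                num += weight * value
                den += weight
    return num / den if den > 0 else None


# Hypothetical 2 x 2 example: rows centred on the equator and 60N
lats = [0.0, 60.0]
anoms = [[1.0, 2.0], [3.0, None]]
normals = [[10.0, 12.0], [4.0, 5.0]]
country = [[True, False], [True, True]]
# Absolute values = anomaly + climatology, cell by cell
absolute = [[None if a is None else a + n for a, n in zip(ra, rn)]
            for ra, rn in zip(anoms, normals)]
print(area_weighted_mean(anoms, country, lats))
print(area_weighted_mean(absolute, country, lats))
```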
An initial comparison
• CRU TS 3.10 is complete over global land (except Antarctica)
• Next plot degrades its resolution to that of CRUTEM4 (5°) and then
removes all areas that are missing in CRUTEM4
• Comparison plots of trends, 1951-2009
• Subsequent slide shows whether the trends are significantly
different. Only two grid boxes show a significant difference
• Similar pair of plots comparing CRU TS 3.10 for precipitation
against GPCCv5 (from the Global Precipitation Climatology
Centre at DWD)
• Similar comparisons for additional periods (1901-50 and 1901-2009)
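The degrade-and-mask step and the trend comparison can be sketched as follows. The sketch assumes 0.5° cells are averaged 10 × 10 into 5° boxes with equal weight. It tests whether two trends differ by checking the ordinary least-squares trend of their difference series against roughly twice its standard error. That test ignores autocorrelation and is only one plausible choice; the slides do not say which test was used. All names are hypothetical.

```python
import math


def degrade(fine, factor=10):
    """Average factor x factor blocks of a fine grid (None = missing).

    With factor=10, 0.5-degree cells become 5-degree boxes. Cells are
    averaged with equal weight, ignoring the small change in cell area
    across one 5-degree box.
    """
    out = []
    for i in range(len(fine) // factor):
        row = []
        for j in range(len(fine[0]) // factor):
            cells = [fine[i * factor + a][j * factor + b]
                     for a in range(factor) for b in range(factor)]
            cells = [c for c in cells if c is not None]
            row.append(sum(cells) / len(cells) if cells else None)
        out.append(row)
    return out


def mask_like(grid, reference):
    """Set boxes to None wherever the reference grid (e.g. CRUTEM4) is missing."""
    return [[v if r is not None else None for v, r in zip(g_row, r_row)]
            for g_row, r_row in zip(grid, reference)]


def trend_and_se(x, y):
    """Ordinary least-squares slope of y on x and its standard error."""
    n = len(y)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    slope = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    resid = [yi - ym - slope * (xi - xm) for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return slope, se


def trends_differ(years, a, b, t_crit=2.0):
    """True if the trend of the difference series a - b is significant.

    Assumes independent residuals (no autocorrelation correction), so
    t_crit=2.0 is only a rough 5% two-sided threshold.
    """
    diff = [ai - bi for ai, bi in zip(a, b)]
    slope, se = trend_and_se(years, diff)
    return abs(slope) > t_crit * se
```

Note that masking before computing trends matters: it ensures both datasets are compared over exactly the same boxes.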
Series at smaller scales (region definition)
I. Harris et al., 2013: Updated high-resolution grids of monthly climatic observations
– the CRU TS3.10 Dataset. Int. J. Climatol. (in press)
Underlying Station Data/Code
• Partly because we’ve included station series sent to us by
National Met Services (NMSs), we have not been able to release
the individual station series
• In late 2009, we contacted all NMSs to see if we could release
the station series we hold for their countries. Only 40% replied,
and only one country (Poland) said no. Together with the UK Met Office, we
decided to overrule Poland and release all the station data. The
station series are updated each year
• We intend to release all the station data for the CRU TS 3.10
dataset as well
• The Met Office released a version of the code used to calculate
CRUTEM3/4. This is not the original Fortran, but a version in
Perl, which runs with freely available software
• GPCC (part of DWD) have many different versions of their
gridded precipitation products, but don’t release the underlying
station data. It is therefore difficult to know where some of their
‘oddish’ values come from
• The key issue is increasingly whether gridded datasets are traceable back
to the original ‘raw’ data
Updating Issues
• Our principle for homogeneity adjustment is to make
as much use as possible of homogenized series
produced by National Met Services (NMSs)
• This means that each year we have to access NMS
web sites to update series
• Generally, we find that many NMSs have improved
their homogeneity adjustments and added new long series, so
updating is not straightforward
• We must check each series against what we had and
recalculate 1961-90 normals
• NCDC now update their homogeneity adjustments every
month, but only update the station data once a year
• As mentioned earlier, how dataset DOIs should handle regular
updates is not yet clear
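Rechecking a re-downloaded NMS series against the archived one and recomputing its normals might look like the minimal sketch below. The 15-year completeness threshold, the 0.05 tolerance and the {(year, month): value} layout are assumptions for illustration, not CRU's documented settings.

```python
def normals(series, start=1961, end=1990, min_years=15):
    """Monthly normals for start-end from a {(year, month): value} series.

    min_years is an assumed completeness threshold: a month with fewer
    values than this in the base period gets no normal (None).
    """
    result = {}
    for month in range(1, 13):
        vals = [series[(y, month)] for y in range(start, end + 1)
                if series.get((y, month)) is not None]
        result[month] = sum(vals) / len(vals) if len(vals) >= min_years else None
    return result


def anomalies(series, norms):
    """Anomalies from the normals; None where either value is missing."""
    return {key: (None if v is None or norms[key[1]] is None else v - norms[key[1]])
            for key, v in series.items()}


def changed_values(old, new, tol=0.05):
    """(year, month) keys where a re-downloaded series differs from the archive."""
    changed = []
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if (a is None) != (b is None) or (a is not None and abs(a - b) > tol):
            changed.append(key)
    return changed
```

Any series flagged by changed_values would need its normals and anomalies recomputed before it replaces the archived version.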
Uncertainties (grid boxes)
• Ever since we first produced the hemispheric averages, we’ve
been asked about the accuracy of these averages and
also of the individual grid-box series
• To address this, we developed variance-adjusted versions for
CRUTEM2/3/4 (CRUTEM2v/3v/4v) and also for
• Variance adjustment (Jones et al., 1997) attempts to make each
grid-box series internally consistent and not affected by
changing station numbers (each series is adjusted to one based
on an infinitely sampled grid box)
• This study leads to the concept of the effective number of
spatial degrees of freedom (Neff), which decreases with
increasing timescale. Neff is larger for variables,
such as precipitation, that are much more spatially variable
• For temperature, the fact that Neff is smaller at longer
timescales means that relatively few well-placed series can capture
large-scale variations, which is what enables proxy reconstructions to be made
Jones, P.D., Osborn, T.J. and Briffa, K.R., 1997: Estimating sampling errors in
large-scale temperature averages. J. Climate 10, 2548-2568.
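Assume that every station in a box has the same variance s² and that the mean inter-station correlation is rbar. Then the variance of an n-station box mean is s²[1 + (n − 1)rbar]/n. That gives Neff = n / [1 + (n − 1)rbar]. As n grows without limit the variance tends to s²·rbar, so scaling by sqrt(rbar·Neff) mimics an infinitely sampled box. The sketch below applies this textbook relation, not necessarily the exact procedure of Jones et al. (1997). It also reproduces the Neff behaviour described above: higher rbar (temperature at longer timescales) gives a smaller Neff, and lower rbar (precipitation) gives a larger one. The function names are hypothetical.

```python
import math


def n_eff(n, rbar):
    """Effective number of independent series among n series whose mean
    inter-series correlation is rbar: the variance of their average is
    s**2 / n_eff when every series has variance s**2."""
    return n / (1.0 + (n - 1) * rbar)


def variance_adjust_factor(n, rbar):
    """Factor scaling an n-station box mean to the variance it would have
    with infinitely many stations (s**2 * rbar): sqrt(rbar * n_eff)."""
    return math.sqrt(rbar * n_eff(n, rbar))


def variance_adjust(anomalies, counts, rbar):
    """Rescale each month's box anomaly for that month's station count.

    Anomalies are scaled about zero, i.e. assumed to be departures from
    a base-period mean.
    """
    return [None if a is None or n == 0 else a * variance_adjust_factor(n, rbar)
            for a, n in zip(anomalies, counts)]


# A one-station month is damped most; many stations leave it almost unchanged
print(variance_adjust_factor(1, 0.5), variance_adjust_factor(1000, 0.5))
```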
Uncertainties (Hemispheres/Globe)
• Variance adjustment still omitted the effect on
large-scale averages of regions dropping out in the
early years
• This was addressed in 2003 and, more completely, by
Brohan et al. (2006)
• Few users made use of these uncertainty estimates, as some
components were dependent in time and space
• So this was addressed in a different way in HadCRUT4
Brohan, P., Kennedy, J., Harris, I., Tett, S.F.B. and Jones, P.D., 2006:
Uncertainty estimates in regional and global observed temperature changes: a
new dataset from 1850. J. Geophys. Res. 111, D12106,
Global time-series at annual resolution
Red – homogeneity
Green – sampling
Blue – buckets
Combined error is the sum in quadrature, as the
various errors are independent
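Adding independent errors in quadrature means taking the square root of the sum of their squares. The second function shows why this breaks down once components are correlated in time or space, which is the problem noted for the earlier uncertainty estimates. The component sizes in the example are invented for illustration.

```python
import math


def combine_in_quadrature(*sigmas):
    """Total 1-sigma error from independent error components."""
    return math.sqrt(sum(s * s for s in sigmas))


def combine_two_correlated(a, b, rho):
    """Total error of two components with correlation rho; rho=0 reduces
    to adding in quadrature, rho=1 to adding linearly."""
    return math.sqrt(a * a + b * b + 2.0 * rho * a * b)


# Hypothetical component sizes (degC) for one year: homogeneity, sampling, buckets
print(combine_in_quadrature(0.03, 0.04, 0.0))
```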
Latest Uncertainties (Ensemble)
Used in 2012 releases to ensure users took the uncertainties properly
into account
Developed by deriving 100 realizations of the past, drawing from the
distributions for the various error and bias components
Users wanting just one realization take the best guess for each grid
box, and also the best guess for each hemispheric average and the global average
An FAQ was needed to explain why averaging the best guesses for each grid box doesn’t
produce the best-guess hemispheric and global averages
Users need to understand the structure of the uncertainties and the
error estimates of the various components
Knowledge of the error structure is vital for finding ways to
reduce the error. It shows that reductions will come from digitising
more data in areas currently without data, not from regions that
already have extensive coverage – so we need more series from Africa rather than
more from parts of North America
Also improvements will come from improved methods of adjusting for
biases in both the marine and land components
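A toy version of the ensemble approach, and of the point the FAQ has to explain, is sketched below. It assumes only two error types. The first is an offset drawn once per realization and applied to every year, which is fully correlated in time and loosely resembles a persistent bias. The second is independent year-to-year noise. HadCRUT4's real error model has more components with spatial and temporal structure. The sketch also assumes the best guess is the ensemble median. Two points follow from it. Users should keep each realization whole, because the persistent offset cancels out of differences between years within a realization. And, in the three-member example, the median of the realizations' box averages (1.5) does not equal the average of the per-box medians (1.0). All names and numbers are hypothetical.

```python
import random
import statistics


def make_ensemble(best, bias_sd, noise_sd, n_real=100, seed=0):
    """Draw n_real realizations of a series around a best estimate.

    bias_sd: spread of an offset drawn once per realization and applied to
    every year (a fully time-correlated component).
    noise_sd: spread of an error drawn independently for every year.
    """
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_real):
        offset = rng.gauss(0.0, bias_sd)
        ensemble.append([x + offset + rng.gauss(0.0, noise_sd) for x in best])
    return ensemble


def median_of_averages(ensemble):
    """Average each realization over its (equal-area) boxes, then take the median."""
    return statistics.median(statistics.fmean(member) for member in ensemble)


def average_of_medians(ensemble):
    """Average of the per-box best guesses (median across realizations)."""
    n_boxes = len(ensemble[0])
    return statistics.fmean(
        statistics.median(member[i] for member in ensemble) for i in range(n_boxes))


# Toy series: 100 realizations around a best estimate
best = [0.0, 0.1, 0.3]
ens = make_ensemble(best, bias_sd=0.1, noise_sd=0.05)
members_at_t1 = [m[1] for m in ens]
q = statistics.quantiles(members_at_t1, n=20)
print(statistics.median(members_at_t1), q[0], q[-1])  # best guess and 5-95% range

# Three hypothetical realizations of two equal-area boxes
small = [[0.0, 3.0], [1.0, 0.0], [5.0, 1.0]]
print(median_of_averages(small))   # 1.5
print(average_of_medians(small))   # 1.0
```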
Comparison of CRUTEM4 with papers by Callendar (1938, 1961)
Includes the error ranges for CRUTEM4 developed by Morice et al. (2012)
Further comparisons with earlier work are in Chapter 1 of the IPCC AR4 WG1 report
HadCRUT4 vs other groups
Each series has its full coverage
Web Locations
• CRUTEM4 and HadCRUT4 are also
available at
20CR LSAT trends compared to
conventional large-scale averages
• Paper by Compo et al. (2013) accepted by GRL
20CR LSAT versus conventional series for land (90N-60S)
20CR (right) versus the infilled CRU dataset (CRU TS 3.10, left): trends over 1952-2010
Separate plots for LSAT (90N-60S) and differences (20CR minus …); 20CR seems far too warm in some WW2 years
UK (50-60°N, 0-10°W), annual
NZ (165-180°E, 35-50°S), annual
Using ERA-Interim to assess
changes in extremes across Europe
• Uses 1979-2010 for ERA-Interim and compares the trends in
extremes with station data from E-OBS, and also the E-OBS
• Four indices of extremes (Tx90p, Tx10p, Tn90p and Tn10p) all
calculated using the ETCCDI software
• Work not yet completed
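Tx90p, Tx10p, Tn90p and Tn10p are the percentages of days on which the daily maximum (Tx) or minimum (Tn) temperature lies above the 90th or below the 10th percentile of a base period. The sketch below is a simplified stand-in for the ETCCDI software, not a reimplementation of it. It uses one threshold for the whole year, whereas the ETCCDI definitions use calendar-day percentiles from a 5-day window, with bootstrapping for years inside the base period. The function names are hypothetical.

```python
import statistics


def percentile_threshold(base_values, q):
    """q-th percentile (1-99) of base-period daily values."""
    return statistics.quantiles(base_values, n=100, method="inclusive")[q - 1]


def percent_days_above(values, threshold):
    """Share of valid days (%) exceeding threshold, e.g. Tx90p or Tn90p."""
    valid = [v for v in values if v is not None]
    return 100.0 * sum(v > threshold for v in valid) / len(valid) if valid else None


def percent_days_below(values, threshold):
    """Share of valid days (%) below threshold, e.g. Tx10p or Tn10p."""
    valid = [v for v in values if v is not None]
    return 100.0 * sum(v < threshold for v in valid) / len(valid) if valid else None


# Toy daily maxima: a base period, then a later period shifted 5 degrees warmer
base = list(range(1, 101))
tx90 = percentile_threshold(base, 90)
later = [v + 5 for v in base]
print(percent_days_above(base, tx90), percent_days_above(later, tx90))
```

By construction roughly 10% of base-period days exceed the 90th percentile, so a trend in Tx90p away from 10% indicates a change in warm extremes.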
Peer-Reviewed Papers
• Useful to have these to back up
datasets. IPCC requires this!
• Not necessary to update regularly, but
useful if this can be done
• Don’t worry that datasets don’t always
get referenced
• Dataset journals are coming along, setting
up DOIs for datasets in a similar way to
• There are many CRU datasets, just as there are many datasets at NCDC and
many different versions of GCM/RCM simulations
• This talk compared CRU’s high-resolution (CRU TS 3.10) and low-resolution
(CRUTEM4) datasets, and also compared CRU TS 3.10 precipitation with GPCCv5
• Using as much NMS homogeneity-adjusted land data as possible means that
updating in near-real time creates additional burdens
• Uncertainties are addressed at the grid-box and the larger scales
• To use these effectively, the latest version of our combined
dataset (HadCRUT4) provides multiple realizations of the past
(in an ensemble nature similar to many GCM simulations)
• Knowledge of the error structure is vital to developing effective
ways of reducing the error