gcbb12047-sup-0001-DataS1

Supporting Online Material (SOM) for: Global Simulation of Bioenergy Crop
Productivity: Analytical Framework and Case Study for Switchgrass
Shujiang Kang*, Sujithkumar S. Nair, Keith. L. Kline, Jeffery A. Nichols, Dali Wang, Wilfred
M. Post, Craig C. Brandt, Stan D. Wullschleger, Nagendra Singh, and Yaxing Wei
*Corresponding author: tel. 865-574-5948, fax 865-574-9501, email: kangs@ornl.gov
SOM Contents






S1 Description of high-performance computing procedures of HPC-EPIC
S2 Model calibration
S3 Efficiency of HPC-EPIC simulation
S4 Uncertainty in global biomass productivity simulation of switchgrass
S5 Field-Trials Database Availability and Updates
S6 Python script for processing weather data into EPIC weather input files
S1 Description of high-performance computing procedures of HPC-EPIC
This section details the global simulation of switchgrass production conducted on a cluster at
Oak Ridge National Laboratory, as referenced in the Description of EPIC and HPC-EPIC in
Materials and Methods. Parallelization is achieved by creating multiple packages
and distributing them to different processors for execution (Fig. S1). A package is a set of
simulations along with their associated input data executed independently on one
processor. Because of the data independence of the package, the simulations can proceed in
parallel. The processing speed attained is essentially linear with respect to the number of
processors utilized. This allows us to vary the number of packages to best fit the computational
resources available. Packaging the inputs, simulation configuration, and outputs into a single file
further optimizes the utilization of hardware, input/output, and scheduling resources because
moving one large file is more time efficient than moving thousands of small files. The packaging
and processing procedures were developed and tested recently in collaboration with the Great
Lakes Bioenergy Research Center (GLBRC). In that project, high-resolution modeling with
HPC-EPIC was conducted to assess bioenergy crop sustainability in two Midwest US states
(Nichols et al., 2011; Zhang et al., 2010).
For this case study, the data processing and packaging procedures described in Nichols et al.
(2011) were generally followed with one major exception: a modification was incorporated to
produce packages of the smallest possible size that still contain all required files. In the earlier
design, all possible input files were duplicated in each package without regard to the
minimum requirements for a given simulation site. That approach was more efficient in terms of
assembling the packages but meant that many packages had data that were not relevant for a
given site-simulation. We changed how packages were built because the datasets required for the
global simulations are large, numerous, and unique for each location. Therefore, packages were
assembled by region and only the input files required for the particular region were included in
each package. This modification enabled global, 30-year crop simulations to be completed
rapidly (in less than three hours) as described with the results below.
S2 Model calibration
For areas without established parameters or calibration datasets, we used the parameters from the
nearest calibrated zones or from a zone with a similar climate. Three categories of ecological
zones were handled in this way. The first category is the same ecological zone located on a
different continent or in a different region; here, the parameters from the calibrated zone are
shared directly. For example, we applied parameters calibrated for ecological zones in the
Northern Hemisphere to the same zones located in the Southern Hemisphere. The second
category comprises ecological zones with no calibration data but with a climate similar to that of
other calibrated zones. For example, we have no calibration data for subtropical desert, so the
calibrated parameters from subtropical steppe zones were used for these zones. The last category
covers zones that fall outside the first two categories, for which we used the parameters from the
nearest ecological zones. For example, we assigned parameters calibrated for subtropical
ecological zones to tropical zones.
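The three fallback categories amount to an ordered lookup, sketched below. The zone names, mappings, and parameter dictionaries are illustrative placeholders, not the calibrated values used in the study.

```python
def zone_parameters(zone, calibrated, same_zone_map, climate_analog, nearest_zone):
    # Zone was calibrated directly: use its own parameters.
    if zone in calibrated:
        return calibrated[zone]
    # Category 1: same ecological zone, calibrated on another continent/hemisphere.
    if zone in same_zone_map:
        return calibrated[same_zone_map[zone]]
    # Category 2: no calibration, but a climatically similar zone was calibrated.
    if zone in climate_analog:
        return calibrated[climate_analog[zone]]
    # Category 3: fall back to the nearest calibrated ecological zone.
    return calibrated[nearest_zone[zone]]

# Example: subtropical desert borrows parameters from subtropical steppe (category 2).
calibrated = {'subtropical_steppe': {'WA': 25.0}, 'subtropical_humid': {'WA': 30.0}}
params = zone_parameters(
    'subtropical_desert', calibrated,
    same_zone_map={},
    climate_analog={'subtropical_desert': 'subtropical_steppe'},
    nearest_zone={'tropical_rainforest': 'subtropical_humid'})
```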
S3 Efficiency of HPC-EPIC simulation
The ORNL Institutional Cluster supports high-speed computations for climate change
simulations and other scientific research (see www.cnms.ornl.gov/capabilities/oic-ornl.pdf for
specifications of the nodes used for these simulations). The HPC-EPIC simulations for 50
packages as described above in methods were performed in parallel and took from 30 to 166
minutes to complete (Fig. S2). We estimate that traditional, serial computation of this set of
EPIC simulations on desktop computers would have taken approximately 500 hours. The actual
execution time for each package on a cluster was dependent on the capacity available at the
computing nodes which in turn was influenced by the number of cores available at each node and
the node’s total load (the nodes were processing other data separate from this task). If the
assigned node had a heavy load, memory competition and input-output would slow the execution
of the EPIC simulation package. The other factor affecting execution speed was the simulated
biophysical processes of the EPIC model itself. For example, failure of switchgrass to grow
under harsh conditions, such as cold in arctic areas or drought in desert areas, shortened
the EPIC run time for packages involving simulations for these zones.
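The gain from parallel execution can be checked with a quick back-of-the-envelope calculation using the figures above; wall time is governed by the slowest of the 50 packages.

```python
# Values reported above: ~500 h for serial desktop execution, and 30-166 min
# per package when the 50 packages run in parallel on the cluster.
serial_hours = 500.0
slowest_package_minutes = 166.0

wall_hours = slowest_package_minutes / 60.0   # ~2.8 h, i.e. "less than three hours"
speedup = serial_hours / wall_hours           # roughly a 180-fold speedup
```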
The computer hours required to run the simulations were negligible compared to the amount of
time required to complete other steps to develop and test the HPC-EPIC platform (Fig. S3).
Approximately twelve months of research staff time were invested in developing the global
platform and generating the initial case study results for switchgrass. The process of identifying,
downloading, verifying quality, and transforming or composing the data sources into the input
files needed by HPC-EPIC took approximately six months of research staff time, making these
basic data collection and assembly steps the most time-consuming part of the case study.
Additional steps to complete the simulations involved the development of management files (2
months effort) and iterative model tests, corrections and calibrations (an additional 3 months).
For example, the weather data were initially transferred into the simulation input files with
a formatting error that was identified when the model was first tested. Organizing the simulation
outputs to provide appropriate structure for review and analysis took approximately one month
of staff effort. This time estimate does not include additional effort required to interpret results,
improve visualization and prepare corresponding reports.
S4 Uncertainty in global biomass productivity simulation of switchgrass
Despite significant efforts within specific communities (e.g., climate modeling; IPCC, 2006),
there is not yet full agreement on terminology and typology of uncertainties associated with
systems modeling (Walker et al., 2003), leading to “Balkanized views and interpretations of
probabilities, possibilities, likelihood and uncertainty” (Ricci et al., 2003). A generic issue is
that models by definition are simplified representations and the sheer size and complexity of the
world means that global-scale models depend on data that are averaged, aggregated or otherwise
processed in manners that may increase uncertainty, especially when results attempt to estimate
an outcome for a specific place and time rather than larger-scale trends. Each data set may
involve several dimensions of uncertainty.
It is not possible to quantify and discuss in detail the many types and sources of uncertainty
inherent in global modeling. For example, global weather data are based on a set of point
estimates that cannot capture the full range of variability in weather phenomena across space and
time. Because weather infrastructure is limited in many parts of the world, global weather data
are estimated for many simulation units. Data that are based on extrapolation and other models
are more uncertain than data derived from direct measurement. Even where direct measurement
is possible, data gaps and errors are common due to multiple sources of human error and
mechanical malfunction. Therefore, we attempt to make specific observations about the data sets
and methods used for this case study.
Soil inputs were derived from the Harmonized World Soil Database (HWSD), and soil properties
of dominant soils were converted to the half-degree simulation units used in this study (see
methods). Several sources of potential uncertainty exist associated with the soils data. The
HWSD itself involves data from multiple sources, various resolutions, and differing degrees of
aggregation. High-resolution soil data (< 1:1,000,000) were used for China, Europe, North
America, South America, and parts of Africa, but the resolution for other areas in the HWSD
is low (about 1:5,000,000). Higher data resolution does not necessarily
imply higher accuracy. The HWSD documentation acknowledges variability in supporting data
and relatively lower reliability in West Africa and Australia compared to areas complemented by
the Soils and Terrain (SOTER) databases: e.g., Southern and Eastern Africa, Latin America and
the Caribbean, Central and Eastern Europe (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012).
Converting and up-scaling the HWSD soil data to the 62,482 simulation units used as model
inputs in our case study also introduces uncertainty. When we aggregated data for simulation
units, we smoothed soil properties and lost soil variability across cells. Additionally, the
simplification step of selecting only the dominant soil for each simulation cell adds to
uncertainty. Generally, two or more types of soils could be present in each HWSD soil map unit,
and there may be several soils with nearly equal proportions in a specific simulation cell.
Therefore, choosing one dominant soil type cannot accurately and consistently represent soil
properties across each cell.
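The dominant-soil simplification described above amounts to a largest-share selection per cell. A minimal sketch follows; the soil names and area shares are invented for illustration.

```python
def dominant_soil(soil_shares):
    # Select the soil type covering the largest fraction of the cell's area.
    # soil_shares: {soil_type: fraction of cell area}
    return max(soil_shares, key=soil_shares.get)

# Two soils with nearly equal shares: picking a single dominant soil
# discards almost half of the information about this cell.
cell = {'Luvisol': 0.48, 'Vertisol': 0.46, 'Fluvisol': 0.06}
choice = dominant_soil(cell)
```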
A low-input management system with replanting of switchgrass every 12 years was designed for
this global simulation. The annual application rates of 60 kg N ha-1 and 6.7 kg P ha-1 were drawn
from currently available experimental trials and recommended management practices, which
are primarily associated with cultivated farmland. A few trial data sets were from nutrient-limited
or less fertile areas. Management factors that are customized to better fit local conditions
and integrated with higher resolution simulation units and supporting field trial data will
contribute to the ability to make improvements in future simulations.
Soil degradation after conversion of tropical forests is known to lead to relatively quick loss of
fertility. We believe that switchgrass productivity is overestimated when simulated for 30 years
on lands that are currently in tropical forests. In addition, constraints such as steep slopes that are
unsuitable for mechanized farming, saline or alkaline soils, and seasonal ponding, are not
adequately considered at the coarse resolution applied in this case study. The potential to
develop switchgrass cultivars and rotations that are better suited to these areas (e.g., ones that
could maintain fertility and productivity under low-input agro-system management or help
restore previously degraded soils) is not known. Similarly, more intensive management
options and irrigation are not considered by the management files used in the simulation. On
balance, we believe that these variables lead to high uncertainty and are likely to overestimate
total biomass potential in simulation units in the tropics and those units characterized by
mountains and periodic flooding. Therefore, the quantitative results for these areas are
speculative and have the highest degrees of uncertainty.
S5 Field-Trials Database Availability and Updates
Consistently measured, spatially-explicit data describing how climate, soils, and crop
management practices influence biomass production and environmental indicators are scarce.
This scarcity has been documented at local scales (Nichols et al., 2011), and the challenges are
magnified at global scales. High-quality field data for model development and validation need
to be collected, verified and made accessible to the research community. To foster continued
development and extension of the modeling framework, we have established an accessible
website that includes the field-trials and other data sets described above. Two versions of the
field-trials data are available. We archived a static version of the field-trials data and EPIC
management files used for the simulations presented in this paper; these are permanently
archived at https://www.bioenergykdf.net/content/global-switchgrass-field-trial-production-and-management-dataset. More importantly, we will create a dynamic version of the field-trials data
in the KDF to support the modeling community. The goal of the dynamic version is to address
many of the limitations of the current simulation by expanding the number of sites and detailed
management practice information in the data set. Researchers are encouraged to access these data
and expand their utility by contributing additional data when possible. Both datasets contain
ASCII files of the data and metadata. Instructions for contributing additional data to the dynamic
version are available at https://www.bioenergykdf.net/content/global-switchgrass-field-trial-production-and-management-dataset.
References
FAO/IIASA/ISRIC/ISSCAS/JRC (2012) Harmonized World Soil Database (version 1.2). FAO,
Rome, IT and IIASA, Laxenburg, AT.
IPCC (2006) IPCC guidelines for national greenhouse gas inventories. Chapter 3: Uncertainty.
Nichols JA, Kang S, Post W et al. (2011) HPC-EPIC for high resolution simulations of
environmental and sustainability assessment. Computers and Electronics in Agriculture, 79,
112-115.
Ricci PF, Rice D, Ziagos J, Cox LA Jr (2003) Precaution, uncertainty and causation in
environmental decisions. Environment International, 29, 1-19.
Walker W, Harremoes P, Rotmans J, van der Sluijs J, van Asselt M, Janssen P, Krayer von
Krauss M (2003) Defining uncertainty: a conceptual basis for uncertainty management in
model-based decision support. Integrated Assessment, 4. Available at:
http://journals.sfu.ca/int_assess/index.php/iaj/article/view/122/79 (Date accessed: 06 Oct.
2012).
Zhang X, Izaurralde RC, Manowitz DM et al. (2010) An integrated modeling framework to
evaluate the productivity and sustainability of biofuel crop production systems. GCB
Bioenergy doi: 10.1111/j.1757-1707.2010.01046.x.
Fig. S1. Workflow of HPC-EPIC execution on the ORNL Institutional Cluster (OIC) at Oak Ridge National
Laboratory.
Fig. S2. Execution times for the 50 packages of HPC-EPIC simulations on the OIC cluster at Oak Ridge
National Laboratory.
Fig. S3. Time allocation (person-months) for development of major components of the global bioenergy
crop simulation platform and the switchgrass case study.
Fig. S4. Ecological zones and calibration sites used in the global simulation of switchgrass.
S6 Python script for processing weather data into EPIC weather input files
Six daily weather variables (maximum temperature, minimum temperature, solar radiation,
precipitation, wind speed, and relative humidity) were exported from CRU-NCEP data in
NetCDF format into a single weather file for each simulation cell.
# Python 2 script. The NetCDF import was lost in extraction; scipy.io.netcdf
# provides a compatible netcdf_file reader and is assumed here.
import sys, os, math, datetime, logging
from scipy.io import netcdf

firstFile = True
numOutFiles = 0
outFileList = []
baseYear = 1980
baseDay = 1
# Data starts on January 1st
baseMonth = 1
finalYear = 2010
SECS_PER_HOUR = 60.0 * 60.0
ZERO_K_AS_C = -273.15

logging.basicConfig(stream = sys.stdout, level = logging.INFO)

def convertToCelsius(kelvin):
    return kelvin + ZERO_K_AS_C

def calcRelativeHumidity(t, p, q):
    "t = temperature (deg C), p = pressure (Pa), q = mixing ratio qair (kg/kg)"
    # Saturation vapor pressure (Pa) from the Buck equation
    e_s = 6.1121 * math.exp( ((18.678 - t/234.5) * t) / (257.14 + t) ) * 100.0
    # Actual vapor pressure (Pa) estimated from the mixing ratio
    e_v = q * p * 29.0 / 18.0
    rel_hum = e_v / e_s
    if rel_hum > 1.0:
        return 1.0
    else:
        return rel_hum

def open_NetCDF_AnnualFile(filePath):
    print filePath
    try:
        fileRef = netcdf.netcdf_file(filePath, 'r')
    except IOError:
        logging.critical('Problem opening file ' + filePath)
        sys.exit(-1)
    return fileRef

def OpenEPICWeatherFile(outDir, latindx, lonindx):
    outFilePath = (outDir + os.sep + 'EPIC_CRUNCEP_lat' + str(latindx) +
                   '_lon' + str(lonindx) + '.wth')
    try:
        outFile = open(outFilePath, 'w')
    except IOError as e:
        logging.critical('Could not open output file ' + outFilePath + ' !!!\n' +
                         'Exception: ' + str(e))
        sys.exit(-1)
    return outFile

def AddDataToEPICWeatherFile(outFile, currDatetime, solarRad, maxTemp, minTemp,
                             rain, relHum, windSpd):
    # Fixed-width EPIC daily weather record: year, month, day, then six values
    outFile.write(('%6d%4d%4d' + (6 * '%6.2f') + '\n') %
                  (currDatetime.year, currDatetime.month, currDatetime.day,
                   solarRad, maxTemp, minTemp, rain, relHum, windSpd))

def CloseEPICWeatherFile(outFile):
    outFile.close()

def gsb_weather_CRUNCEP(latindx, lonindx, baseDir, outDir, optsDict):
    outFile = OpenEPICWeatherFile(outDir, latindx, lonindx)
    for year in xrange(baseYear, finalYear + 1):
        logging.info('Processing year ' + str(year) + '...')
        currDatetime = datetime.datetime(year, baseMonth, baseDay)
        currDirPrefix = baseDir + os.sep
        currDirSuffix = str(year) + '.nc'
        press_FileName = currDirPrefix + 'press' + os.sep + 'cruncep_press_' + currDirSuffix
        qair_FileName = currDirPrefix + 'qair' + os.sep + 'cruncep_qair_' + currDirSuffix
        tair_FileName = currDirPrefix + 'tair' + os.sep + 'cruncep_tair_' + currDirSuffix
        rain_FileName = currDirPrefix + 'rain' + os.sep + 'cruncep_rain_' + currDirSuffix
        swdown_FileName = currDirPrefix + 'swdown_total' + os.sep + 'cruncep_swdown_' + currDirSuffix
        uwind_FileName = currDirPrefix + 'uwind' + os.sep + 'cruncep_uwind_' + currDirSuffix
        vwind_FileName = currDirPrefix + 'vwind' + os.sep + 'cruncep_vwind_' + currDirSuffix
        # (disabled during development) if optsDict['wind']: ...
        # Open all NetCDF files for current year
        pressFile = open_NetCDF_AnnualFile(press_FileName)
        qairFile = open_NetCDF_AnnualFile(qair_FileName)
        tairFile = open_NetCDF_AnnualFile(tair_FileName)
        rainFile = open_NetCDF_AnnualFile(rain_FileName)
        swdownFile = open_NetCDF_AnnualFile(swdown_FileName)
        uwindFile = open_NetCDF_AnnualFile(uwind_FileName)
        vwindFile = open_NetCDF_AnnualFile(vwind_FileName)
        if year % 4 == 0:
            maxDay = 366
        else:
            maxDay = 365
        for dayNum in xrange(0, maxDay):
            # logging.info('\tProcessing day ' + str(dayNum + 1) + '...')
            timeIndex = dayNum * 4  # four 6-hourly records per day
            try:
                filepath = "pressure"
                pressureDaily = pressFile.variables['press'][timeIndex:timeIndex+4, latindx, lonindx]
                pressureAvgDaily = sum(pressureDaily) / 4.0
                # print "average pressure = ", pressureAvgDaily
                filepath = "qair"
                qairDaily = qairFile.variables['qair'][timeIndex:timeIndex+4, latindx, lonindx]
                qairAvgDaily = sum(qairDaily) / 4.0
                # print "average qair = ", qairAvgDaily
                filepath = "tair"
                tairDaily = tairFile.variables['tair'][timeIndex:timeIndex+4, latindx, lonindx]
                tairAvgDaily = convertToCelsius(sum(tairDaily) / 4.0)
                maxTemp = convertToCelsius(max(tairDaily))
                minTemp = convertToCelsius(min(tairDaily))
                # print "max tair = ", maxTemp, "min tair = ", minTemp
                filepath = "relative humidity"
                relHum = calcRelativeHumidity(tairAvgDaily, pressureAvgDaily, qairAvgDaily)
                # print "relative humidity = ", relHum
                filepath = "rain"
                rainDaily = rainFile.variables['rain'][timeIndex:timeIndex+4, latindx, lonindx]
                rainTotDaily = sum(rainDaily)  # mm/6h records summed to a daily total
                # print "total rain = ", rainTotDaily
                filepath = "swdown"
                swdownDaily = swdownFile.variables['swdown'][timeIndex:timeIndex+4, latindx, lonindx]
                # Units = MJ/m^2. Per Yaxing, the _total file has the total over each 6-hour period.
                swdownTotDaily = sum(swdownDaily) / 1000000.0
                # print "total swdown = ", swdownTotDaily
                filepath = "wind"
                uwindDaily = uwindFile.variables['uwind'][timeIndex:timeIndex+4, latindx, lonindx]
                vwindDaily = vwindFile.variables['vwind'][timeIndex:timeIndex+4, latindx, lonindx]
                # Daily mean wind speed from the u and v components
                uSquared = map(lambda x: x*x, uwindDaily)
                vSquared = map(lambda x: x*x, vwindDaily)
                sumSquared = [a + b for a, b in zip(uSquared, vSquared)]
                speedDaily = map(math.sqrt, sumSquared)
                speedAvgDaily = sum(speedDaily) / 4.0
                # print "avg wind speed = ", speedAvgDaily
            except Exception:
                logging.critical('Problem accessing variables in file ' + filepath)
                sys.exit(-1)
            # Write out data for this day
            AddDataToEPICWeatherFile(outFile, currDatetime, swdownTotDaily, maxTemp,
                                     minTemp, rainTotDaily, relHum, speedAvgDaily)
            # Done with this day. Increment currDatetime to next day
            currDatetime += datetime.timedelta(days = 1)
        # Close NetCDF files for current year
        pressFile.close()
        qairFile.close()
        tairFile.close()
        rainFile.close()
        swdownFile.close()
        uwindFile.close()
        vwindFile.close()
    CloseEPICWeatherFile(outFile)