Supporting Online Material (SOM) for: Global Simulation of Bioenergy Crop Productivity: Analytical Framework and Case Study for Switchgrass

Shujiang Kang*, Sujithkumar S. Nair, Keith L. Kline, Jeffery A. Nichols, Dali Wang, Wilfred M. Post, Craig C. Brandt, Stan D. Wullschleger, Nagendra Singh, and Yaxing Wei

*Corresponding author: tel. 865-574-5948, fax 865-574-9501, email: kangs@ornl.gov

SOM Contents
S1 Description of high-performance computing procedures of HPC-EPIC
S2 Model calibration
S3 Efficiency of HPC-EPIC simulation
S4 Uncertainty in global biomass productivity simulation of switchgrass
S5 Field-Trials Database Availability and Updates
S6 Python script for weather data processing of EPIC weather input files

S1 Description of high-performance computing procedures of HPC-EPIC

This section provides a detailed description of the global simulation of switchgrass production conducted on a cluster at Oak Ridge National Laboratory, as referenced in the Description of EPIC and HPC-EPIC in Materials and Methods. Parallelization is achieved by creating multiple packages and distributing them to different processors for execution (Fig. S1). A package is a set of simulations, along with their associated input data, executed independently on one processor. Because each package is data-independent, the simulations can proceed in parallel, and the processing speed attained is essentially linear with respect to the number of processors utilized. This allows us to vary the number of packages to best fit the computational resources available. Bundling the inputs, simulation configuration, and outputs into a single file further optimizes the utilization of hardware, input/output, and scheduling resources, because moving one large file is more time-efficient than moving thousands of small files. The packaging and processing procedures were developed and tested recently in collaboration with the Great Lakes Bioenergy Research Center (GLBRC).
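The package-parallel pattern described above can be sketched as follows. This is a minimal illustration, not the actual HPC-EPIC interface: the package structure, field names, and the trivial worker body are hypothetical, and a real worker would unpack inputs, invoke the EPIC binary for each site, and repack the outputs.

```python
# Hedged sketch of data-independent packages executed concurrently.
# Field names ("id", "sites") are illustrative only.
from concurrent.futures import ThreadPoolExecutor

def run_package(package):
    # Stand-in for running EPIC over every site bundled in one package;
    # here we simply report how many sites the package contains.
    return package["id"], len(package["sites"])

def run_all_packages(packages, max_workers=4):
    # Packages share no data, so they can run concurrently; with one
    # package per processor, throughput scales roughly linearly.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_package, packages))
```

In practice the number of packages would be chosen to match the processors available, as the text notes.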
In that project, high-resolution modeling with HPC-EPIC was conducted to assess bioenergy crop sustainability in two Midwest US states (Nichols et al., 2011; Zhang et al., 2010). For this case study, the data processing and packaging procedures described in Nichols et al. (2011) were generally followed, with one major exception: a modification was incorporated to produce packages of the smallest possible size that still contain all required files. In the original design, all possible input files were duplicated in each package without regard to the minimum requirements of a given simulation site. This was more efficient for assembling the packages, but it meant that many packages carried data that were not relevant to a given site simulation. We changed how packages were built because the datasets required for the global simulations are large, numerous, and unique to each location. Packages were therefore assembled by region, and each package included only the input files required for that region. This modification enabled the global, 30-year crop simulations to be completed rapidly (in less than three hours), as described with the results below.

S2 Model calibration

For areas without established parameters or calibration datasets, we used parameters from the nearest zones with established parameters or from a zone with a similar climate. Three categories of ecological zones are involved. The first category is the same ecological zone located on a different continent or in a different region; the parameters from the calibrated zone are shared directly. For example, parameters calibrated for an ecological zone in the northern hemisphere were assigned to the same zone in the southern hemisphere. The second category comprises ecological zones for which no calibration data exist but whose climate is similar to that of other calibrated zones.
For example, we have no calibration data for the subtropical desert zone, so calibrated parameters from the subtropical steppe zones are used there. The last category covers zones that fall outside the first two categories; for these, we used parameters from the nearest calibrated ecological zones. For example, we assigned parameters calibrated for subtropical ecological zones to tropical zones.

S3 Efficiency of HPC-EPIC simulation

The ORNL Institutional Cluster supports high-speed computations for climate change simulations and other scientific research (see www.cnms.ornl.gov/capabilities/oic-ornl.pdf for specifications of the nodes used for these simulations). The HPC-EPIC simulations for the 50 packages described in the methods were performed in parallel and took from 30 to 166 minutes to complete (Fig. S2). We estimate that traditional, serial computation of this set of EPIC simulations on a desktop computer would have taken approximately 500 hours. The actual execution time for each package on the cluster depended on the capacity available at the computing nodes, which in turn was influenced by the number of cores available at each node and the node's total load (the nodes were processing other data separate from this task). If the assigned node had a heavy load, memory contention and input/output would slow the execution of the EPIC simulation package. The other factor affecting execution speed was the biophysical processes simulated by the EPIC model itself. For example, failure of switchgrass to grow under harsh conditions, such as cold in arctic areas or drought in deserts, shortened the EPIC run time for packages covering these zones. The computer hours required to run the simulations were negligible compared with the time required to complete the other steps of developing and testing the HPC-EPIC platform (Fig. S3).
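The three-tier parameter fallback described in S2 can be sketched as a lookup with successive fallbacks. The zone-to-zone mappings and the parameter table below are hypothetical stand-ins for the actual calibration data, chosen only to mirror the examples given in the text.

```python
# Hedged sketch of the S2 parameter-assignment fallback. The mapping
# tables and parameter names are illustrative, not the real calibration.
SIMILAR_CLIMATE = {"subtropical desert": "subtropical steppe"}
NEAREST_CALIBRATED = {"tropical dry forest": "subtropical steppe"}

def assign_parameters(zone, calibrated):
    # Tier 1: the same ecological zone was calibrated elsewhere
    # (e.g., the other hemisphere), so its parameters are shared directly.
    if zone in calibrated:
        return calibrated[zone]
    # Tier 2: no calibration for this zone, but a climatically similar
    # zone was calibrated.
    similar = SIMILAR_CLIMATE.get(zone)
    if similar in calibrated:
        return calibrated[similar]
    # Tier 3: otherwise, fall back to the nearest calibrated zone.
    nearest = NEAREST_CALIBRATED.get(zone)
    if nearest in calibrated:
        return calibrated[nearest]
    raise KeyError("no calibration source for zone: " + zone)
```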
Approximately twelve months of research staff time were invested in developing the global platform and generating the initial case study results for switchgrass. The process of identifying, downloading, quality-checking, and transforming or composing the data sources into the input files needed by HPC-EPIC took approximately six months of research staff time, making these basic data collection and assembly steps the most time-consuming part of the case study. Additional steps to complete the simulations involved the development of management files (2 months of effort) and iterative model tests, corrections, and calibrations (an additional 3 months). For example, the weather data were initially transferred into the simulation input files with a formatting error that was identified when the model was first tested. Organizing the simulation outputs into an appropriate structure for review and analysis took approximately one month of staff effort. This estimate does not include the additional effort required to interpret results, improve visualization, and prepare the corresponding reports.

S4 Uncertainty in global biomass productivity simulation of switchgrass

Despite significant efforts within specific communities (e.g., climate modeling; IPCC, 2006), there is not yet full agreement on the terminology and typology of uncertainties associated with systems modeling (Walker et al., 2003), leading to "Balkanized views and interpretations of probabilities, possibilities, likelihood and uncertainty" (Ricci et al., 2003). A generic issue is that models are by definition simplified representations, and the sheer size and complexity of the world mean that global-scale models depend on data that are averaged, aggregated, or otherwise processed in ways that may increase uncertainty, especially when results are used to estimate an outcome for a specific place and time rather than larger-scale trends. Each data set may involve several dimensions of uncertainty.
It is not possible to quantify and discuss in detail all of the types and sources of uncertainty inherent in global modeling. For example, global weather data are based on a set of point estimates that cannot capture the full range of variability in weather phenomena across space and time. Because weather-monitoring infrastructure is limited in many parts of the world, global weather data are estimated for many simulation units. Data based on extrapolation and other models are more uncertain than data derived from direct measurement. Even where direct measurement is possible, data gaps and errors are common owing to multiple sources of human error and mechanical malfunction. We therefore attempt to make specific observations about the data sets and methods used for this case study. Soil inputs were derived from the Harmonized World Soil Database (HWSD), and soil properties of dominant soils were converted to the half-degree simulation units used in this study (see methods). Several potential sources of uncertainty are associated with the soils data. The HWSD itself combines data from multiple sources with various resolutions and differing degrees of aggregation. High-resolution soil data (< 1:1,000,000) were used for China, Europe, North America, South America, and parts of Africa, but the resolution for other areas in the HWSD is low (about 1:5,000,000). Higher data resolution does not necessarily imply higher accuracy. The HWSD documentation acknowledges variability in the supporting data and relatively lower reliability in West Africa and Australia compared with areas complemented by the Soils and Terrain (SOTER) databases, e.g., Southern and Eastern Africa, Latin America and the Caribbean, and Central and Eastern Europe (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012). Converting and up-scaling the HWSD soil data to the 62,482 simulation units used as model inputs in our case study also introduces uncertainty.
When we aggregated data for simulation units, we smoothed soil properties and lost soil variability across cells. Additionally, the simplification of selecting only the dominant soil for each simulation cell adds to uncertainty. Two or more soil types are generally present in each HWSD soil map unit, and several soils may occur in nearly equal proportions within a specific simulation cell. Choosing one dominant soil type therefore cannot accurately and consistently represent soil properties across each cell. A low-input management system, with switchgrass replanted every 12 years, was designed for this global simulation. The annual application rates of 60 kg N ha-1 and 6.7 kg P ha-1 were drawn from currently available experimental trials and recommended management practices, which are primarily associated with cultivated farmland. Only a few trial data sets came from nutrient-limited or less fertile areas. Management inputs customized to better fit local conditions, integrated with higher-resolution simulation units and supporting field-trial data, will contribute to improvements in future simulations. Soil degradation after conversion of tropical forests is known to lead to relatively rapid loss of fertility. We believe that switchgrass productivity is overestimated when simulated for 30 years on lands that are currently in tropical forest. In addition, constraints such as steep slopes unsuitable for mechanized farming, saline or alkaline soils, and seasonal ponding are not adequately considered at the coarse resolution applied in this case study. The potential to develop switchgrass cultivars and rotations better suited to these areas, e.g., ones that could maintain fertility and productivity under low-input agro-system management or help restore previously degraded soils, is not known.
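The dominant-soil simplification discussed earlier in this section can be illustrated with a few lines of code. This is a sketch under assumed data shapes: the soil names and the (soil_type, areal_share) pair representation are hypothetical, not the HWSD schema.

```python
# Hedged sketch: keep only the soil type with the largest areal share
# in each simulation cell; input representation is illustrative.
def dominant_soil(cell_soils):
    """cell_soils: list of (soil_type, areal_share) pairs for one cell."""
    return max(cell_soils, key=lambda pair: pair[1])[0]
```

When the top shares are nearly equal (e.g., 0.45 vs. 0.40), the chosen soil represents barely half the cell, which is exactly the source of uncertainty the text describes.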
Similarly, more intensive management options and irrigation are not considered by the management files used in the simulation. On balance, we believe that these factors introduce high uncertainty and are likely to cause overestimation of total biomass potential in simulation units in the tropics and in units characterized by mountains and periodic flooding. The quantitative results for these areas are therefore speculative and carry the highest degrees of uncertainty.

S5 Field-Trials Database Availability and Updates

Consistently measured, spatially explicit data describing how climate, soils, and crop management practices influence biomass production and environmental indicators are scarce. This scarcity has been documented at local scales (Nichols et al., 2011), and the challenges are magnified at global scales. High-quality field data for model development and validation need to be collected, verified, and made accessible to the research community. To foster continued development and extension of the modeling framework, we have established an accessible website that includes the field-trials and other data sets described above. Two versions of the field-trials data are available. A static version of the field-trials data and the EPIC management files used for the simulations presented in this paper is permanently archived at https://www.bioenergykdf.net/content/global-switchgrass-field-trial-production-and-management-dataset. More importantly, we will create a dynamic version of the field-trials data in the KDF to support the modeling community. The goal of the dynamic version is to address many of the limitations of the current simulation by expanding the number of sites and the detail of management-practice information in the data set. Researchers are encouraged to access these data and to expand their utility by contributing additional data when possible. Both datasets contain ASCII files of the data and metadata.
Instructions for contributing additional data to the dynamic version are available at https://www.bioenergykdf.net/content/global-switchgrass-field-trial-production-and-management-dataset.

References

FAO/IIASA/ISRIC/ISSCAS/JRC (2012) Harmonized World Soil Database (version 1.2). FAO, Rome, IT and IIASA, Laxenburg, AT.

IPCC (2006) IPCC guidelines for national greenhouse gas inventories. Chapter 3: Uncertainty.

Nichols JA, Kang S, Post W et al. (2011) HPC-EPIC for high resolution simulations of environmental and sustainability assessment. Computers and Electronics in Agriculture, 79, 112-115.

Ricci PF, Rice D, Ziagos J, Cox LA Jr. (2003) Precaution, uncertainty and causation in environmental decisions. Environment International, 29, 1-19.

Walker W, Harremoes P, Rotmans J, van der Sluijs J, van Asselt M, Janssen P, Krayer von Krauss M (2003) Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support. Integrated Assessment, 4. Available at: http://journals.sfu.ca/int_assess/index.php/iaj/article/view/122/79 (accessed 6 Oct. 2012).

Zhang X, Izaurralde RC, Manowitz DM et al. (2010) An integrated modeling framework to evaluate the productivity and sustainability of biofuel crop production system. GCB Bioenergy, doi: 10.1111/j.1757-1707.2010.01046.x.

Fig. S1. Workflow of HPC-EPIC execution on the ORNL Institutional Cluster (OIC) at Oak Ridge National Laboratory.

Fig. S2. Time consumed by the 50 packages of HPC-EPIC simulations on the OIC cluster at Oak Ridge National Laboratory.

Fig. S3. Time allocation (person-months) for development of the major components of the global simulation platform for bioenergy crops and the switchgrass case study.

Fig. S4. Ecological zones and calibration sites used in the global simulation of switchgrass.

S6 Python script for weather data processing of EPIC weather input files
Six daily weather variables (maximum temperature, minimum temperature, solar radiation, precipitation, wind speed, and relative humidity) were exported from the CRU-NCEP data (NetCDF format) into a single weather file for each simulation cell.

    import sys, os, math, datetime, logging
    from scipy.io import netcdf  # NetCDF reader assumed; the original import was
                                 # not shown; any module exposing netcdf_file works

    baseYear = 1980
    baseMonth = 1
    baseDay = 1          # data start on January 1st
    finalYear = 2010
    ZERO_K_AS_C = -273.15

    logging.basicConfig(stream=sys.stdout, level=logging.INFO)

    def convertToCelsius(kelvin):
        return kelvin + ZERO_K_AS_C

    def calcRelativeHumidity(t, p, q):
        "t = temperature (deg C), p = pressure (Pa), q = specific humidity qair (kg/kg)"
        # Buck saturation vapour pressure (hPa), converted to Pa
        e_s = 6.1121 * math.exp(((18.678 - t / 234.5) * t) / (257.14 + t)) * 100.0
        # approximate vapour pressure from specific humidity (molar-mass ratio 29/18)
        e_v = q * p * 29.0 / 18.0
        rel_hum = e_v / e_s
        if rel_hum > 1.0:
            return 1.0
        else:
            return rel_hum

    def open_NetCDF_AnnualFile(filePath):
        print(filePath)
        try:
            fileRef = netcdf.netcdf_file(filePath, 'r')
        except:
            logging.critical('Problem opening file ' + filePath)
            sys.exit(-1)
        return fileRef

    def OpenEPICWeatherFile(outDir, latindx, lonindx):
        outFilePath = (outDir + os.sep + 'EPIC_CRUNCEP_lat' + str(latindx) +
                       '_lon' + str(lonindx) + '.wth')
        try:
            outFile = open(outFilePath, 'w')
        except IOError as e:
            logging.critical('Could not open output file ' + outFilePath + ' !!!\n' +
                             'Exception: ' + str(e))
            sys.exit(-1)
        return outFile

    def AddDataToEPICWeatherFile(outFile, currDatetime, solarRad, maxTemp, minTemp,
                                 rain, relHum, windSpd):
        outFile.write(('%6d%4d%4d' + (6 * '%6.2f') + '\n') %
                      (currDatetime.year, currDatetime.month, currDatetime.day,
                       solarRad, maxTemp, minTemp, rain, relHum, windSpd))

    def CloseEPICWeatherFile(outFile):
        outFile.close()

    def gsb_weather_CRUNCEP(latindx, lonindx, baseDir, outDir, optsDict):
        # optsDict is reserved for options (e.g., optional wind handling)
        # and is unused in this version of the script
        outFile = OpenEPICWeatherFile(outDir, latindx, lonindx)
        for year in range(baseYear, finalYear + 1):
            logging.info('Processing year ' + str(year) + '...')
            currDatetime = datetime.datetime(year, baseMonth, baseDay)
            currDirPrefix = baseDir + os.sep
            currDirSuffix = str(year) + '.nc'
            press_FileName = currDirPrefix + 'press' + os.sep + 'cruncep_press_' + currDirSuffix
            qair_FileName = currDirPrefix + 'qair' + os.sep + 'cruncep_qair_' + currDirSuffix
            tair_FileName = currDirPrefix + 'tair' + os.sep + 'cruncep_tair_' + currDirSuffix
            rain_FileName = currDirPrefix + 'rain' + os.sep + 'cruncep_rain_' + currDirSuffix
            swdown_FileName = currDirPrefix + 'swdown_total' + os.sep + 'cruncep_swdown_' + currDirSuffix
            uwind_FileName = currDirPrefix + 'uwind' + os.sep + 'cruncep_uwind_' + currDirSuffix
            vwind_FileName = currDirPrefix + 'vwind' + os.sep + 'cruncep_vwind_' + currDirSuffix

            # Open all NetCDF files for the current year
            pressFile = open_NetCDF_AnnualFile(press_FileName)
            qairFile = open_NetCDF_AnnualFile(qair_FileName)
            tairFile = open_NetCDF_AnnualFile(tair_FileName)
            rainFile = open_NetCDF_AnnualFile(rain_FileName)
            swdownFile = open_NetCDF_AnnualFile(swdown_FileName)
            uwindFile = open_NetCDF_AnnualFile(uwind_FileName)
            vwindFile = open_NetCDF_AnnualFile(vwind_FileName)

            # 1980-2010 contains no century years, so the simple rule suffices
            if year % 4 == 0:
                maxDay = 366
            else:
                maxDay = 365

            for dayNum in range(0, maxDay):
                timeIndex = dayNum * 4      # four 6-hourly records per day
                try:
                    filepath = "pressure"
                    pressureDaily = pressFile.variables['press'][timeIndex:timeIndex + 4, latindx, lonindx]
                    pressureAvgDaily = sum(pressureDaily) / 4.0

                    filepath = "qair"
                    qairDaily = qairFile.variables['qair'][timeIndex:timeIndex + 4, latindx, lonindx]
                    qairAvgDaily = sum(qairDaily) / 4.0

                    filepath = "tair"
                    tairDaily = tairFile.variables['tair'][timeIndex:timeIndex + 4, latindx, lonindx]
                    tairAvgDaily = convertToCelsius(sum(tairDaily) / 4.0)
                    maxTemp = convertToCelsius(max(tairDaily))
                    minTemp = convertToCelsius(min(tairDaily))

                    filepath = "relative humidity"
                    relHum = calcRelativeHumidity(tairAvgDaily, pressureAvgDaily, qairAvgDaily)

                    filepath = "rain"
                    rainDaily = rainFile.variables['rain'][timeIndex:timeIndex + 4, latindx, lonindx]
                    rainTotDaily = sum(rainDaily)   # mm per 6 h, summed to mm/day

                    filepath = "swdown"
                    swdownDaily = swdownFile.variables['swdown'][timeIndex:timeIndex + 4, latindx, lonindx]
                    # Per Yaxing, the _total file holds the total over each
                    # 6-hour period; converted here to MJ/m^2
                    swdownTotDaily = sum(swdownDaily) / 1000000.0

                    filepath = "wind"
                    uwindDaily = uwindFile.variables['uwind'][timeIndex:timeIndex + 4, latindx, lonindx]
                    vwindDaily = vwindFile.variables['vwind'][timeIndex:timeIndex + 4, latindx, lonindx]
                    uSquared = [u * u for u in uwindDaily]
                    vSquared = [v * v for v in vwindDaily]
                    sumSquared = [a + b for a, b in zip(uSquared, vSquared)]
                    speedDaily = [math.sqrt(s) for s in sumSquared]
                    speedAvgDaily = sum(speedDaily) / 4.0
                except Exception:
                    logging.critical('Problem accessing variables in file ' + filepath)
                    sys.exit(-1)

                # Write out data for this day
                AddDataToEPICWeatherFile(outFile, currDatetime, swdownTotDaily, maxTemp,
                                         minTemp, rainTotDaily, relHum, speedAvgDaily)
                # Done with this day; advance currDatetime to the next day
                currDatetime += datetime.timedelta(days=1)

            # Close NetCDF files
            pressFile.close()
            qairFile.close()
            tairFile.close()
            rainFile.close()
            swdownFile.close()
            uwindFile.close()
            vwindFile.close()

        CloseEPICWeatherFile(outFile)
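The humidity conversion used in the script can be exercised in isolation. The standalone copy below repeats the same arithmetic as calcRelativeHumidity (Buck saturation vapour pressure plus an approximate vapour pressure from specific humidity); the input values in the usage note are illustrative, not taken from the CRU-NCEP data.

```python
import math

def relative_humidity(t_c, p_pa, qair):
    """Relative humidity (0-1) from air temperature (deg C), pressure (Pa),
    and specific humidity (kg/kg), mirroring the script's calcRelativeHumidity."""
    # Buck saturation vapour pressure (hPa), converted to Pa
    e_s = 6.1121 * math.exp(((18.678 - t_c / 234.5) * t_c) / (257.14 + t_c)) * 100.0
    # approximate vapour pressure from specific humidity (molar-mass ratio 29/18)
    e_v = qair * p_pa * 29.0 / 18.0
    return min(e_v / e_s, 1.0)
```

For example, at 20 deg C, 101325 Pa, and q = 0.010 kg/kg this gives a relative humidity of roughly 0.7, and sufficiently moist air saturates at the 1.0 cap, matching the clamping in the script.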