
A Guide to Global Atmospheric Inversions

Prepared by: Martha Butler

Last Updated: March 19, 2011


This document is organized generally in the order in which steps are taken to do a global atmospheric inversion. Refer also to Butler et al. (2010) in Tellus B for a general description.

Codes and examples of some of the control files are included in subdirectories with titles that match the major sections here. This document includes file formats of many of the netCDF files used. Code is provided “as-last-used” with no guarantees that it will work without modification.

Obvious changes required include directory specifications and local naming conventions.


Region Definitions and Observation Locations for Model Sampling

[Refer to directories 1_region_defs and 1_obs_locns in the inv_doc_pkg]

One of the first decisions to be made is the spatial resolution of the inversion. This section covers the region definitions adopted for the inversion used as an example in this document. The region boundaries were determined roughly by eco-region, drawing upon expert knowledge.

Region codes (a minimal Fortran sketch of reading these files appears after the file listings below):

read_regions3.pro: Input is the netCDF file regions3.nc, which contains the mapping of PCTM (2x2.5) grid cells to region numbers. Displays some variables, including the mapping of regions to aggregated regions (called "groups") or typical TransCom regions.

read_fine_regions.pro: Input is regions1x1.nc. Similar to read_regions3.pro, but for a finer-scale grid. This is useful in downscaling inversion-estimated fluxes from regions to the common grid used for comparing inversion results.

plotregall3.pro: An example of a global map with region boundaries, with optional overplotting of observation locations. This happens to be the map shown in the Tellus B paper. Input is a list of observation sites, in this case maplist_Paper1_rev.dat.

Region files:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/regions[5005]% ncdump -h regions3.nc
netcdf regions3 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    region = 48 ;
    scheme = 3 ;
    group = 40 ;
    long_namelen = 30 ;
    short_namelen = 10 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    int region_map(latitude, longitude) ;
    int translation_array(scheme, region) ;
    char long_region_name(region, long_namelen) ;
    char short_region_name(region, short_namelen) ;
    char group_name(group, long_namelen) ;
    int group_region_count(group) ;
    int group_region_mapping(region, group) ;
    float region_area(region) ;
        region_area:units = "m2" ;
    float group_area(group) ;
        group_area:units = "m2" ;
}

netcdf regions1x1 {
dimensions:
    longitude = 360 ;
    latitude = 180 ;
    region = 48 ;
    long_namelen = 30 ;
    short_namelen = 10 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    int region_map_1x1(latitude, longitude) ;
    char long_region_name(region, long_namelen) ;
    char short_region_name(region, short_namelen) ;
    float region_area(region) ;
        region_area:units = "m2" ;
    int land_sea_mask(latitude, longitude) ;
}
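As orientation, here is a minimal sketch (not one of the codes in the package; it assumes the netCDF-Fortran library is available) of reading the region_map variable from regions3.nc; read_regions3.pro does the equivalent in IDL:

program read_regions_sketch
  use netcdf
  implicit none
  ! CDL order region_map(latitude, longitude) reverses in Fortran: (lon, lat)
  integer :: region_map(144, 91)
  integer :: ncid, varid
  call check(nf90_open('regions3.nc', NF90_NOWRITE, ncid))
  call check(nf90_inq_varid(ncid, 'region_map', varid))
  call check(nf90_get_var(ncid, varid, region_map))
  call check(nf90_close(ncid))
  print '(a,i4)', 'region number of the first grid cell: ', region_map(1,1)
contains
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check
end program read_regions_sketch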

Observation Sites:

There are a number of ASCII files that contain observation sites and characteristics. These are used to develop site lists for processing raw observation time series, locating sites on the PCTM grid, preparing point-by-point selections for PCTM, and preparing the site instructions for post-forward-model processing of model samples.

The following used an early stnloc.list, which evolved into resolved_stnloc.list and master_obs.list (through much manual editing). This list has more locations on it than actually have observation programs. Heading rows define the columns, which include time offset from UTC, location type, observation frequency, and responsible monitoring agency (if any). Other files used in locating sites in grid cells include landmask.lores.dat (routinely used to differentiate land/ocean/permanent-ice grid cells) and surf.height (containing the nominal surface elevations assumed for each grid cell).

Please be aware that the algorithm for finding grid cells for station locations follows the convention in the CSU version of PCTM of the latitude index beginning with zero.

Sampling site codes (a sketch of the grid-location arithmetic follows this list):

locatestat.f90: Locates sites on the grid with origin at Greenwich/South Pole.

locatestat2.f90: Same, but for a Dateline/South Pole origin.

create_pbp2.f90: Input are stnloc.list and landmask.lores.dat. Output is pbp_profsite, which is input to PCTM for nearest-grid-box model sampling (profiles are chosen here).

create_postproc1v2.f90: Input is stnloc.list; output is postproc1.list, which instructs the post-processing reduction of model sample profiles to station samples.
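A minimal sketch (not the original locatestat.f90, which also consults landmask.lores.dat and surf.height) of nearest-grid-box location on the 2x2.5 PCTM grid, assuming a Greenwich/South Pole origin and the zero-based latitude index convention noted above:

program locate_sketch
  implicit none
  real :: sitelat, sitelon
  integer :: ilat, ilon
  sitelat = 40.79        ! example site latitude (degrees north)
  sitelon = -77.85       ! example site longitude (degrees east)
  ! latitude rows are pole-centered: lat(j) = -90 + 2*j, j = 0..90
  ilat = nint((sitelat + 90.0) / 2.0)
  ! longitude columns start at Greenwich: lon(i) = 2.5*i, i = 0..143
  ilon = modulo(nint(modulo(sitelon, 360.0) / 2.5), 144)
  print '(a,2i5)', 'zero-based (lat,lon) grid indices: ', ilat, ilon
end program locate_sketch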

Files used in controlling the processing and preparation of raw observation files: all_active_20080617.dat, continuous_active_20080617.dat, flask_active_20080617.dat, aircraft_active_20080617.dat.

And finally, station lists used in preparing the inversion-specific lists of observation sites used in an inversion execution (the format is dictated by the inversion code): list_dat_master_20080617.dat, list_dat_master_active_20080617.dat.

Preparation of Background Fluxes for Forward Transport

[Refer to directory 2_background_flux]

Background Fluxes

In general, the assumption is that one would choose more recent, perhaps higher-resolution (in time and space) background fluxes for forward transport runs than I used. In that light, this section assumes that maps at PCTM spatial resolution have been created (examples are given under Files below). The codes here are given as examples of how to prepare the maps for use in PCTM. A caveat for all flux map preparation: know whether your PCTM setup is configured for Dateline/South Pole origin or Greenwich/South Pole origin!

Background fluxes used in my inversions include:

CASA monthly mean (from TransCom): monthly maps reused every year.

SiB3 hourly (from Ian Baker): maps specific to years. These fluxes were sourced directly at CSU; I have only monthly summaries, which were used for post-inversion analysis.

Takahashi 2002 ocean fluxes: monthly maps reused every year.

Biomass burning (GFED2): monthly maps specific to years.

Annual fossil fuel: annual maps specific to years. These maps share the same spatial pattern, but are scaled to global emission totals that can be found at CDIAC. The CDIAC annual totals change over time; the ones here date to roughly 2006 or 2007, if I recall.

Seasonal fossil fuel: monthly maps specific to years. If you need the code used to create these, please let me know. Mine differs somewhat, I think, from the way that Zhu did it.

For later use in post-inversion processing, but included here because these codes use the same flux map files, is the creation of summary 'presub' files. These take the background flux maps and summarize them at the level of region/month or group/month.

Codes for creating PCTM input flux maps:

makemonflux.f90: Input is regrid_casa_mon.nc or regrid_ocn_mon.nc. Output are 12 monthly maps.

makefossilflux2.f90: Input is annfossil_6yr.nc. Output are 5 annual mean maps (2000-2004).

makefossilflux3.f90: Input is seasfossil_5yr.nc. Output are five annual files, each with 12 monthly maps.

fire1.f90: Input is fire_5yr.nc. Output are five annual files, each with 12 monthly maps.

Verification codes (a sketch of the area-weighted totals they report follows below):

checkmonflux.f90: reports totals for CASA or ocean fluxes.

fire2.f90: reports totals for biomass burning fluxes.

seasfossil6.f90: reports totals for seasonal fossil fluxes.

Codes for summary presub file creation:

presub_iav.f90: Input are seasfossil_5yr.nc, SiBmaps_5yr.nc, regrid_ocn_mon.nc, and fire_5yr.nc (for example). Output is presub_iav_v1.nc (for this combination of background fluxes).

make_presub_grp.f90: Input are regions3.nc and presub_iav_v1.nc. Output is presub_iav_grp_v1.nc, with summaries at the group level, including TransCom regions.
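As a sanity check on units, a minimal sketch (not one of the codes in the package; it assumes a 2x2.5 pole-centered grid and fluxes in mol/m2/s) of the kind of area-weighted global total that the verification codes report:

program flux_total_sketch
  implicit none
  integer, parameter :: nlon = 144, nlat = 91
  real, parameter :: pi = 3.14159265, rearth = 6.371e6
  real, parameter :: dlon = 2.5, dlat = 2.0
  real :: flux(nlon, nlat), area(nlat), lat, total
  integer :: j
  flux = 1.0e-8                     ! stand-in for a map read from netCDF
  total = 0.0
  do j = 1, nlat
    lat = -90.0 + dlat * real(j - 1)
    ! band area for a pole-centered row, clipped at the poles (m2)
    area(j) = (pi / 180.0) * dlon * rearth**2 * &
              (sin(min(lat + dlat / 2.0, 90.0) * pi / 180.0) - &
               sin(max(lat - dlat / 2.0, -90.0) * pi / 180.0))
    total = total + sum(flux(:, j)) * area(j)
  end do
  print '(a,es12.4)', 'global total (mol/s): ', total
end program flux_total_sketch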

Files:

eddy:/abl/s0/users/mpbutler/transfer_psutocsu[5078]% ncdump -h annfossil_6yr.nc
netcdf annfossil_6yr {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    year = 6 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float annual_mag(year) ;
        annual_mag:units = "mol/s" ;
    float seasfossil(latitude, longitude, year) ;
        seasfossil:units = "mol/m2/s" ;
}

eddy:/abl/s0/users/mpbutler/transfer_psutocsu[5079]% ncdump -h seasfossil_5yr.nc
netcdf seasfossil_5yr {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    year = 5 ;
    month = 12 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float monthly_mag(year, month) ;
        monthly_mag:units = "mol/s" ;
    float seasfossil(year, month, latitude, longitude) ;
        seasfossil:units = "mol/m2/s" ;
}

eddy:/abl/s0/users/mpbutler/transfer_psutocsu[5080]% ncdump -h fire_5yr.nc
netcdf fire_5yr {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    time1 = 12 ;
    time2 = 5 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float flux_mag(time2, time1) ;
        flux_mag:units = "mol/s" ;
    float monthly_flux(time2, time1, latitude, longitude) ;
        monthly_flux:units = "mol/m2/s" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5087]% ncdump -h regrid_casa_mon.nc
netcdf regrid_casa_mon {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    time = 12 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float monthly_mag(time) ;
        monthly_mag:units = "mol/s" ;
    float casa_flux(time, latitude, longitude) ;
        casa_flux:units = "mol/m2/s" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5088]% ncdump -h regrid_ocn_mon.nc
netcdf regrid_ocn_mon {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    time = 12 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float monthly_mag(time) ;
        monthly_mag:units = "mol/s" ;
    float ocn_flux(time, latitude, longitude) ;
        ocn_flux:units = "mol/m2/s" ;
}

netcdf presub_iav_v1 {
dimensions:
    len = 10 ;
    presub = 4 ;
    region = 47 ;
    month = 12 ;
    year = 5 ;
variables:
    char presubnm(presub, len) ;
    char regnm(region, len) ;
    float global_totals(year, presub) ;
        global_totals:units = "Gt C" ;
    float region_totals(year, region, presub) ;
        region_totals:units = "Gt C" ;
    float presub(year, month, region, presub) ;
        presub:units = "Gt C" ;
}

netcdf presub_iav_grp_v1 {
dimensions:
    longlen = 30 ;
    len = 10 ;
    presub = 4 ;
    group = 40 ;
    region = 47 ;
    month = 12 ;
    year = 5 ;
variables:
    char presubnm(presub, len) ;
    char regnm(region, len) ;
    char grpnm(group, longlen) ;
    float global_totals(year, presub) ;
        global_totals:units = "Gt C" ;
    float region_totals(year, region, presub) ;
        region_totals:units = "Gt C" ;
    float group_totals(year, group, presub) ;
    float presub(year, month, region, presub) ;
        presub:units = "Gt C" ;
    float group_presub(year, month, group, presub) ;
        group_presub:units = "Gt C" ;
}


Preparation of Basis Functions for Forward Transport

[Refer to directory 3_basis_functions]

Most regridding to the PCTM grid is based on a code (regrid.f) provided by Kevin Gurney, which regridded TransCom input (0.5 x 0.5 degree in file input.new.dat) to PCTM resolution.

The land patterns attribute to each grid cell in a region a weight, or proportion, of the biological activity in the region. The ocean patterns are 'flat', with equal weighting for each grid cell within the region (except where there is seasonal sea ice). No attempt was made to blur continental or regional margins. Each grid cell is in one and only one region, and each grid cell is land, ocean, or permanent ice. But see the TransCom3 protocol for other approaches.

Codes (a sketch of the pulse-map construction follows this list):

pattern.f90: Input are input.new.dat and landmask.lores.dat from/for TransCom. Output are hires_landpattern.nc (0.5x0.5), landpattern.nc (2.5x2.0), and hires_ocnpattern.nc (0.5x0.5) [the ocean pattern file is not used until later].

landbasis.f90: Input are landpattern.nc and the region-number map grid predecessor to regions3.nc. Output is landbasis5.nc.

regridobasis.f90: Input are input.new.dat and landmask.lores.dat. Output is ocnbasis5.nc.

makepulse.f90: Input is landbasis5.nc. Output are 36 2D maps (one for each land region) in a format suitable for PCTM input. Note that the pattern is the same for all months of the year.

makeopulse.f90: Input is ocnbasis5.nc. Output are 132 2D maps (one for each ocean region and month) in a format suitable for PCTM input. Patterns differ by month for the few ocean regions with seasonal sea ice.

Verification/checking codes: pulsecheck.f90 (for landbasis5.nc) and pulsemonth.f90 (for ocnbasis5.nc).

For use (much later) in downscaling inversion flux estimates at region scale to 1x1 degree (the common format for comparison across inversions):

pattern2.f90: Input are hires_landpattern.nc and hires_ocnpattern.nc. Output are landpattern1x1.nc and oceanpattern1x1.nc. Note that the 1x1 ocean pattern file only contains a 1 or 0 in each grid box, indicating what is ocean and what is not.
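A minimal sketch (not makepulse.f90 itself; the array names stand in for fields read from landpattern.nc and the region map) of the construction of one region's pulse map: mask the pattern to the region and normalize so the map spreads a unit pulse over the region:

program pulse_sketch
  implicit none
  integer, parameter :: nlon = 144, nlat = 91
  integer :: region_map(nlon, nlat)
  real :: pattern(nlon, nlat), area(nlon, nlat), basis(nlon, nlat), wsum
  integer :: r
  r = 5                                          ! example land region number
  region_map = 5; pattern = 1.0; area = 3.5e10   ! stand-in data
  where (region_map == r)
    basis = pattern
  elsewhere
    basis = 0.0
  end where
  wsum = sum(basis * area)             ! pattern-weighted area of the region
  if (wsum > 0.0) basis = basis / wsum ! units 1/m2: multiply by a pulse
                                       ! magnitude in mol/s to get mol/m2/s
  print '(a,es12.4)', 'check (should be 1): ', sum(basis * area)
end program pulse_sketch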

Files:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5046]% ncdump -h hires_landpattern.nc
netcdf hires_landpattern {
dimensions:
    longitude = 720 ;
    latitude = 360 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    double lpattern(latitude, longitude) ;
        lpattern:units = "none" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5047]% ncdump -h landpattern.nc
netcdf landpattern {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    double lpattern(latitude, longitude) ;
        lpattern:units = "none" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5048]% ncdump -h landbasis5.nc
netcdf landbasis5 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    region = 36 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east of dateline" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    double landbasis(region, latitude, longitude) ;
        landbasis:units = "mol/m2/s" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5049]% ncdump -h hires_ocnpattern.nc
netcdf hires_ocnpattern {
dimensions:
    longitude = 720 ;
    latitude = 360 ;
    time = 12 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float opattern(time, latitude, longitude) ;
        opattern:units = "none" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5050]% ncdump -h ocnbasis5.nc
netcdf ocnbasis5 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    region = 11 ;
    time = 12 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    double ocnbasis(time, region, latitude, longitude) ;
        ocnbasis:units = "mol/m2/s" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5051]% ncdump -h landpattern1x1.nc
netcdf landpattern1x1 {
dimensions:
    longitude = 360 ;
    latitude = 180 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    double lpattern(latitude, longitude) ;
        lpattern:units = "none" ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/input[5052]% ncdump -h oceanpattern1x1.nc
netcdf oceanpattern1x1 {
dimensions:
    longitude = 360 ;
    latitude = 180 ;
    month = 12 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    float opattern(month, latitude, longitude) ;
        opattern:units = "none" ;
}


Forward Transport (PCTM)

[Refer to directory 4_PCTM]

All comments here apply to the CSU version of PCTM. If you have a sign-on to a Scott Denning server (I am not sure that I still have a valid sign-on) and can find the directory /user1/martha/, you can locate the whole code package that I used. The specific codes that I modified for my own purposes are included in the package accompanying this document.

The changes to the PCTM codes are annotated, usually at the beginning of the file as well as at the changes themselves. Modifications were made to handle the time resolutions of the different flux inputs, the output of profiles at the sampling locations, the output of 10-day mean atmospheric grids, differences from the original CSU code for GEOS-4 driver data, and the 'Nick Parazoo fix' to prevent excessive vertical mixing when using 6-hourly (rather than 3-hourly) meteorological driver data.

Protocol for Forward Runs:

Assuming that the driver data are available and matched to the years of the background fluxes: run each background flux with emissions ON for one year and OFF for the remaining years. For example, the 2000 ocean flux was run with emissions for 2000 and continued without emissions for 2001-2004; the 2001 ocean flux was run with emissions for 2001 and continued without emissions for 2002-2004. The basis (response) functions were run for one month with emissions, which were then turned off for the remainder of the run. For example, the January 2000 response functions were run beginning with January 2000 driver data for one month ON, and then for 24 months with no emissions. Twenty-four months of circulation after emissions was chosen as the time by which most of the response-function emissions were completely mixed in the model atmosphere; this was verified with the 2002 response functions, which were run until the end of the driver data (end of 2004). And, yes, this does mean running each region/month response function as a separate tracer (in my case, 47 regions x 12 months x 5 years), using many multiple-tracer runs. For computing efficiency, many basis function tracers (for the same month, but different regions) were run in the same PCTM execution.

Output of PCTM runs: pbp_profsite files specify the horizontal grid locations where vertical column profiles (as well as the corresponding surface pressures and temperature vertical profiles) are written out hourly. Disk space availability precluded retention of the column data, which was reduced in a post processing step. A 10-day mean history of the atmospheric grid for each tracer was also written out in each forward run.

PCTM codes modified/added: modules_mpb_bg.f90, modules_mpb_lpulse.f90, modules_mpb_opulse.f90, chemparm_bg6.f90, chemparm_pulse.f90, mainpchem6.f90, pbp_setup.f90, pbp_setup_call_mpb.f90, pchem_co2_fvdas.f90, read_fvdas6.f90, readcloud6.f90, set_params_mpb.f90, write_pbp_mpb.f90, writehistory_mpb.f90, zenith_angle_low.f90, and Makefile.linux_mpb6. As you will see in the makefile, replaced codes were generally renamed with the standard name before execution of the makefile. The replaced original code was saved in the /user1/martha/ libraries.

Files: pbp_profsite, qsub script examples (bg and pulse), runtime examples (bg and pulse)


Post Processing of the Forward Transport Runs

[Refer to directory 5_postproc_PCTM]

The first step in post processing is to resolve the column output of the PCTM forward run to model samples at site locations. Another objective is to turn the binary PCTM output into netCDF files for future use. (These netCDF files were FTP'd from CSU to Penn State.) There are two streams of files produced here: one is the sampling site responses to the basis function and background flux forward runs of PCTM; the second is an archive of 10-day mean atmospheric grids (and matching surface pressure/temperature profiles) for the PCTM forward runs. These last have been used for some basic verification and analysis, but are underutilized.

Possible to use for an OSSE?

Codes: postproc1.f90, which requires postproc1.list (instructions for reducing columns to site samples) and postproc1.control. postproc1.list is the same for each run of postproc1.f90; postproc1.control is specific to the PCTM run being processed. Input are the point-by-point binary files output from PCTM. Output are netCDF files with names ending in '_response.nc'. For the basis function PCTM runs there are four netCDF files for each month from the beginning of 2000 to the end of 2004 (three groups of land regions and one group of ocean regions). For each background flux there are five netCDF files, one for each year.

Codes for creating the netCDF versions of the atmospheric history grids (and matching surface pressure and temperature profiles): readbinhist.f90 (basis function atmospheric grids), readbinpst.f90 (matching surface pressure and temperature profiles), readbinhistbg.f90 (background flux atmospheric grids), and readbinhistpst.f90 (matching surface pressure and temperature profiles).

File examples:

Example netCDF file output of postproc1.f90 for January 2000 responses at sampling sites, for the first of three groups of land regions:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/basisresponsenew[5005]% ncdump -h LG1Jan0_response.nc
netcdf LG1Jan0_response {
dimensions:
    station = 864 ;
    tracer = 12 ;
    time = 18288 ;
    len = 15 ;
variables:
    char station(station, len) ;
    char tracer(tracer, len) ;
    int time(time) ;
        time:units = "hours of integration" ;
    float model_level(time, station) ;
    float response(time, tracer, station) ;
        response:units = "ppm" ;
}

Example netCDF file output of postproc1.f90 for January 2000 responses at sampling sites for the ocean regions:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/basisresponsenew[5006]% ncdump -h OGJan0_response.nc
netcdf OGJan0_response {
dimensions:
    station = 864 ;
    tracer = 11 ;
    time = 18288 ;
    len = 15 ;
variables:
    char station(station, len) ;
    char tracer(tracer, len) ;
    int time(time) ;
        time:units = "hours of integration" ;
    float model_level(time, station) ;
    float response(time, tracer, station) ;
        response:units = "ppm" ;
}

Example netCDF file output of postproc1.f90 for responses at sampling sites for the 2000 SiB background flux:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/bgresponsenew[5010]% ncdump -h SiB00_response.nc
netcdf SiB00_response {
dimensions:
    station = 864 ;
    tracer = 1 ;
    time = 43848 ;
    len = 15 ;
variables:
    char station(station, len) ;
    char tracer(tracer, len) ;
    int time(time) ;
        time:units = "hours of integration" ;
    float model_level(time, station) ;
    float response(time, tracer, station) ;
        response:units = "ppm" ;
}

Example netCDF file output of readbinhist.f90: 10-day mean atmospheric grids for land region 5 in January 2000:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/basisoutputnew/2000[5016]% ncdump -h lnd05m01.nc
netcdf lnd05m01 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    level = 25 ;
    time = 76 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    int level(level) ;
        level:units = "model level" ;
    float mixingratio(time, level, latitude, longitude) ;
        mixingratio:units = "ppm" ;
}

Example netCDF file output of readbinpst.f90, matching the January 2000 basis function atmospheric grids:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/basisoutputnew/2000[5017]% ncdump -h pst_20000101.nc
netcdf pst_20000101 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    level = 25 ;
    time = 76 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    int level(level) ;
        level:units = "model level" ;
    float surface_pressure(time, latitude, longitude) ;
        surface_pressure:units = "hPa" ;
    float temperature_profile(time, level, latitude, longitude) ;
        temperature_profile:units = "K" ;
}

Example netCDF file output of readbinhistbg.f90: 10-day mean atmospheric grids for the SiB 2000 background flux:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/bgoutputnew[5022]% ncdump -h SiB00.nc
netcdf SiB00 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    level = 25 ;
    time = 182 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    int level(level) ;
        level:units = "model level" ;
    float mixingratio(time, level, latitude, longitude) ;
        mixingratio:units = "ppm" ;
}

Example netCDF file output of readbinhistpst.f90, matching the 2000 background fluxes:

eddy:/abl/s0/users/mpbutler/inverse/PCTM/bgoutputnew[5023]% ncdump -h pst_2000.nc
netcdf pst_2000 {
dimensions:
    longitude = 144 ;
    latitude = 91 ;
    level = 25 ;
    time = 182 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees east" ;
    float latitude(latitude) ;
        latitude:units = "degrees north" ;
    int level(level) ;
        level:units = "model level" ;
    float surface_pressure(time, latitude, longitude) ;
        surface_pressure:units = "hPa" ;
    float temperature_profile(time, level, latitude, longitude) ;
        temperature_profile:units = "K" ;
}


Acquisition of CO2 Observation Time Series

[Refer to directory 6_obsprep]

For this, please see the separate document CO2_Obs_Acquisition_20101217.doc, "Collecting and processing raw observation time series," which references observation sources and the codes to make common-format hourly/daily/monthly gap-filled time series. For the inversions here, time series were processed for many sites, but only those missing 12 or fewer monthly records during 2000-2004 were included in the inversion (with a few exceptions). The lengthy procedure for acquiring ARM/Southern Great Plains CO2 data is obsolete, as those data can now be acquired from the Ameriflux site.

N.B. If you use the GLOBALVIEW product as input, ask someone (David Baker?) about generating model-data mismatch error to put into the observations file. I used the natural monthly variability of the observations.

The directory 6_obsprep contains codes used to process observations sourced from NOAA ESRL and WDCGG archives. Let me know if you need code for specific flux tower sites. Be aware that file formats may change at the archive sites.


Packaging the Observations for Inversion

[Refer to directory 7_pkgobs]

Monthly averaged observations are put into the format expected by the inversion code. Also described here is the creation of a matrix (essentially an index of observation hours by site) that will be used to create a co-sampled transport matrix for the inversion. And there is code to merge the active observation data set into a larger data set containing simulated observations, used to test the potential value of observation sites where model samples were taken but no observations actually exist. Simulated observations are described later.

Codes:

package_obs2.f90: Input are the master_obs.list (all sites marked 'active' are selected) and the final gap-filled, standard-deviation-adjusted monthly averaged observations at active observation sites. Output is a netCDF file of observations for the 205 active sites potentially available for use. There is an option in this code to add model-data mismatch error based on a classification code for each site in the master_obs.list (e.g., coastal, continental, remote). Based on analysis of posterior statistics, I did not use the model-data mismatch (except to test the effect of adding it on the inversion); the natural monthly variability of the raw observations seemed to be adequate. The format of the packaged observations is dictated by the inversion code. However, the 'station_count' variable (not used in the inversion code) is used to hold a position index of the site.

build_obshr_arrayf.f90: Input are the all_active_20080617.dat list and all of the raw observations at their finest temporal resolution. An array variable in the output file contains a 0 or 1 for every hour in the time span 2000-2004; '1' indicates that there is an observation for that hour. There are also variables for the index position of the active observation site in the context of the 424-site model sample files (most likely to be used) and the 864-site model sample files (output of postproc1.f90). These indexes are used in building different versions of the transport matrix. (A sketch of the hour-flagging idea follows the file listings below.)

merge_obs_simobs2.f90: Input are the 205-site active observations file and a 424-site simulated observations file. Output is a merged file with the active observations overlaying the simulated observations for the active observation sites. This merged 'observations' file is used in inversion variations testing potential observation sites.

eddy:/abl/s0/users/mpbutler/inverse/PCTM/obs[5029]% ncdump -h active_obs205_20090911.nc
netcdf active_obs205_20090911 {
dimensions:
    station_count = 205 ;
    mtot = 60 ;
    datatype = 2 ;
    latlon = 2 ;
    namelen = 15 ;
variables:
    char site_name(station_count, namelen) ;
    int station_count(station_count) ;
    float mtot(mtot) ;
    float station_location(latlon, station_count) ;
    float CO2_concentration(datatype, mtot, station_count) ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/obs[5030]% ncdump -h obshrf_20090708.nc
netcdf obshrf_20090708 {
dimensions:
    obs_sites = 205 ;
    max_year = 5 ;
    max_month = 12 ;
    max_day = 31 ;
    max_hour = 24 ;
    len = 15 ;
variables:
    char active_sites(obs_sites, len) ;
    int a_index(obs_sites) ;
        a_index:long_name = "index into 424 site files" ;
    int b_index(obs_sites) ;
        b_index:long_name = "index into 864 site files" ;
    int obs_hour(max_hour, max_day, max_month, max_year, obs_sites) ;
        obs_hour:long_name = "hour selected for cosampling" ;
        obs_hour:valid_range = 0, 1 ;
}

eddy:/abl/s0/users/mpbutler/inverse/PCTM/simobs[5036]% ncdump -h merged_obs424_20090929.nc
netcdf merged_obs424_20090929 {
dimensions:
    station_count = 424 ;
    mtot = 60 ;
    datatype = 2 ;
    latlon = 2 ;
    namelen = 15 ;
variables:
    char site_name(station_count, namelen) ;
    int station_count(station_count) ;
    float mtot(mtot) ;
    float station_location(latlon, station_count) ;
    float CO2_concentration(datatype, mtot, station_count) ;
}
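A minimal sketch (not one of the codes in the package) of the hour-flagging idea behind build_obshr_arrayf.f90, shown for a single site; note that the on-disk dimension ordering of obs_hour differs, and the example timestamp is invented:

program obshr_sketch
  implicit none
  integer :: obs_hour(24, 31, 12, 5)   ! (hour, day, month, year index), one site
  integer :: hh, dd, mm, yy
  obs_hour = 0
  ! suppose an observation exists at 2000-01-15 14:00 UTC
  yy = 1; mm = 1; dd = 15; hh = 15     ! 1-based indices (hour 14 -> index 15)
  obs_hour(hh, dd, mm, yy) = 1
  print '(a,i8)', 'total flagged hours: ', sum(obs_hour)
end program obshr_sketch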


Creating the Transport Matrix for the Inversion

[Refer to directory 8_transport_matrix]

The model samples (both background fluxes and response functions) must be averaged to the time resolution of the inversion (monthly in this case) and packaged in a format dictated by the inversion code. (This format differs a bit from TransCom IAV in that the interannual variability of the transport has been taken into account; in other words, and for example, the response to the 2000 SiB background flux is not assumed to be the same as the response to the 2001 SiB background flux.) There are three main processes here: averaging the background flux responses and the response function (pulse) responses from hourly to monthly, spinning up the first year of the background responses, and then creating the transport matrix (gmatrix) from the response functions and four background fluxes (SiB or CASA, ocean, fire, annual (ff) or seasonal (ffs) fossil).

The three options here are whether to use the cosampled, default, or all-hours protocol in the averaging process. The co-sampling version is listed first (more complicated than the others...); after the averaging, however, the processing is about the same and the file formats are the same. Cosampling means using only the model output hours that match actual observation hours for each site. The all-hours protocol simply averages all the model output hours. The default protocol uses the hours that would be used for each site IF observations were present for all hours. For example, for a continental, continuous sampling site, only model samples from 12-16 LST will be included in the averaging. For flask sites this is problematic: how does one know when the 'missing' samples would have been taken? The default protocol I used for flasks is somewhat arbitrary: choose the time of day when the flask samples for the site are usually taken (based on time-of-day analysis for 2000-2004) and choose 5 days during the month.

Averaging

Code for co-sampled background fluxes: average_bg_cosampled2.f90. Choose a background flux to process (in the code); five years of response files will be averaged into one output file. Input: the station list of 205 active stations (all_active_20080617.dat), the observation hour array (obshrf_20090708.nc), and the five yearly response files (e.g., SiByy_response.nc, where yy is {00, 01, 02, 03, 04}). Output is (for example) SiB_cos205v4_monresponse.nc (format below). A minimal sketch of the co-sampled averaging follows at the end of this list.

Code for co-sampled response functions (aka 'pulses' here): average_pulse_cosampled.f90. Choose a year to process (in the code); all of the pulse responses for that year will be averaged by month and incorporated into the output file. Input: the list of 205 active stations (all_active_20080617.dat), the observation hour array (obshrf_20090708.nc), and all the pulse files for the year ({LG1, LG2, LG3, OG}{JanY,...,DecY}_response.nc, where Y = {0, 1, 2, 3, 4}). Output is named, for example for 2000, pulse00_cos205v4_monresponse.nc (format below).

Code for default background flux sampling (average_bg_default2.f90) and all-hours background flux sampling (average_bg_allhr2.f90) differs in the station list (the 424-site list_dat_resp_20080617.dat), and the observation hour array is not used.

Code for default pulse sampling (average_pulse_default.f90) and all-hours pulse sampling (average_pulse_allhr.f90) likewise differs in the station list (the 424-site list_dat_resp_20080617.dat), and the observation hour array is not used.
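A minimal sketch (not one of the codes in the package) of the co-sampled monthly averaging for one site and one month; the arrays stand in for data read from the *_response.nc and obshrf files:

program cosample_sketch
  implicit none
  integer, parameter :: nhr = 744           ! hours in an example 31-day month
  real :: resp_hourly(nhr), monmean
  integer :: obs_hour(nhr), nsel
  resp_hourly = 380.0                       ! stand-in hourly model samples (ppm)
  obs_hour = 0
  obs_hour(13:17) = 1                       ! e.g. five afternoon hours observed
  nsel = sum(obs_hour)
  if (nsel > 0) then
    monmean = sum(resp_hourly, mask=(obs_hour == 1)) / real(nsel)
  else
    monmean = -999.0                        ! no observations this month
  end if
  print '(a,f8.2)', 'co-sampled monthly mean (ppm): ', monmean
end program cosample_sketch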

Spinup

To minimize wild fluxes in the first year of the inversion, the year 2000 background flux responses were front-loaded with four years of responses to the 2000 fluxes (as a substitute for the 1996, 1997, 1998, and 1999 responses in 2000). This is confusing; just look at the code. If you can run more years of background flux transport before the beginning of the inversion, use those responses instead. My experiment was limited by the number of years of transport fields available.

Code for spinning up the cosampled background fluxes: spinup.f90. Choose the background flux, the number of years (4), and cosampled/default (in the code). Input is (for example) SiB_cos205v4_monresponse.nc; output is (for example) SiB_cos205v4s4_monresponse.nc. The format is the same. Spinning up the default version uses the same code; all-hours spinup uses spinup_allhr.f90.

Creating the Transport Matrix (gmatrix)

Monthly responses from the four background flux files and the five yearly pulse files are reorganized and packaged according to the expectations of the inversion code. (A schematic of how the gmatrix enters the inversion follows below.)

Code: make_gmatrix2.f90. Choose (in the code) whether this is for active, default, or all hours, and which four background fluxes to use. Input are the four (spun-up) background flux files and the five yearly pulse files. Output is a gmatrix file for the inversion (for example, gmatrix_cos205v4s_ffsSof.nc for the cosampled transport matrix with seasonal fossil, SiB, ocean, and fire background fluxes) (format below).

Merging the transport matrix for an inversion using active and simulated sites (if used):

Code: merge_gmatrix.f90. Input: the station list (all_active_20080617.dat, with the indexes of the 205 active sites in the 424-site file), the default gmatrix, and the cosampled gmatrix. Output is a merged 424-site gmatrix with the co-sampled responses overlaying the default responses for the 205 active sites. Use gmatrix files with the same background fluxes for both the default and cosampled inputs. The merged file format is the same as the default file (station=424).
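Schematically (my notation, not the code's variable names): for station i and observation month t, the modeled concentration assembled from a gmatrix file is

    c_mod(i,t) = offset + sum_k presub_response(i,t,k) + sum_{r,m} G(i,t; r,m) * s(r,m)

where k runs over the four background (presub) fluxes, r over the 47 regions, m over the pulse months/years, and s(r,m) are the monthly flux adjustments that the inversion solves for.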

Example monthly averaged background response file. Default and all-hours files differ only in the station dimension. Spun-up files have the same format.

eddy:/abl/s0/users/mpbutler/inverse/PCTM/monresponsenew[5043]% ncdump -h SiB_cos205v4_monresponse.nc
netcdf SiB_cos205v4_monresponse {
dimensions:
    station = 205 ;
    tracer = 5 ;
    responsemonth = 60 ;
    len = 15 ;
    datatype = 2 ;
variables:
    char station(station, len) ;
    int station_aindex(station) ;
    char tracer(tracer, len) ;
    int respmonth_count(tracer) ;
    float bgmonresponse(datatype, responsemonth, tracer, station) ;
        bgmonresponse:units = "ppm" ;
}

Example monthly averaged pulse file. Default and all-hours files differ only in the station dimension.

eddy:/abl/s0/users/mpbutler/inverse/PCTM/monresponsenew[5044]% ncdump -h pulse00_cos205v4_monresponse.nc
netcdf pulse00_cos205v4_monresponse {
dimensions:
    station = 205 ;
    tracer = 47 ;
    pulsemonth = 12 ;
    responsemonth = 25 ;
    len = 15 ;
    datatype = 2 ;
variables:
    char station(station, len) ;
    int station_aindex(station) ;
    char tracer(tracer, len) ;
    int respmonth_count(pulsemonth) ;
    float monresponse(datatype, responsemonth, pulsemonth, tracer, station) ;
}

Example gmatrix file. Default, all-hours, and merged files differ only in the station dimension.

eddy:/abl/s0/users/mpbutler/inverse/PCTM/monresponsenew[5046]% ncdump -h gmatrix_cos205v4s_ffsSof.nc
netcdf gmatrix_cos205v4s_ffsSof {
dimensions:
    station_count = 205 ;
    region = 47 ;
    year = 5 ;
    monthpulse = 12 ;
    time1 = 25 ;
    time2 = 60 ;
    no_presub = 4 ;
variables:
    int tracer_resplen(monthpulse, year) ;
    int presub_resplen(year) ;
    float tracer_response_month(time1, monthpulse, year, region, station_count) ;
    float presub_response_month(time2, year, no_presub, station_count) ;
}


Creating Simulated Observations

[Refer to directory 9_simobs]

This turned into a rather complicated process, with subjective judgment inserted along the way. Consider this one way to do it. I would not apply this procedure globally, as it requires that there be some real observations in the same general area (continent/latitude band) in order to tune the fitting parameters in the final steps. The process worked OK for the northern mid-latitudes of North America.

With that disclaimer: the process starts with compositing the responses to the forward runs of the background fluxes (a combination of bio/ocn/fire/fossil) into a monthly response. This was done for active observation sites and the key sites on the to-be-simulated list. Then fits were determined for these composited background responses, for the actual observation sites, and for the GLOBALVIEW marine boundary layer (MBL) data set at the latitudes of the active and to-be-simulated sites. The simulated observations were created using the offset and trend from the MBL and the harmonics from the background response, adjusted by a factor (multiplier) derived (subjectively) from actual observation sites relatively nearby (a schematic form of this construction follows this paragraph). This last was intended to counteract the tendency of the SiB flux, in particular, to have an amplitude much larger than observed. The multiplier factor for the harmonics would be less of an issue if using CASA for the biosphere background flux. By creating simulated observations for sites with real observations, it was possible to compare the end product with the goal; this comparison was used to tune the fits for simulated sites with similar characteristics. The method neglects the lag in seasonality between the MBL and far-from-the-MBL continental sites. The following is a bit convoluted, as it was truly an exploratory effort that, once done to some degree of satisfaction, was left alone.

Codes:

average_bg_default2.f90 (described earlier): creates monthly averaged response files for each of the background fluxes to be used (in this case SiB, ocean, biomass burning, and seasonal fossil).

spinup.f90 (described earlier): spins up the first year of the responses.

make_bg_simobs.f90: which (in spite of the name) is only the first step. Input are the 4 spun-up background flux files averaged monthly using 'default' selection rules. Output is a composited file at monthly resolution.

netcdf bgsimobs_def_ffsSof_v1s {
dimensions:
    station_count = 424 ;
    mtot = 60 ;
    datatype = 2 ;
    latlon = 2 ;
    namelen = 15 ;
variables:
    char site_name(station_count, namelen) ;
    int station_count(station_count) ;
    float mtot(mtot) ;
    float station_location(latlon, station_count) ;
    float CO2_concentration(datatype, mtot, station_count) ;
}


The next three codes all use the same fitting routine (IDL curvefit), but applied to the composited background responses, the Globalview marine boundary layer dataset, and the actual monthly observations file.

Codes: calc_fits_modresp.pro (for the composited background response file), create_fits_gvmbl.pro (for the MBL), and calc_fits.pro (for the observations). Output are files of fit parameters. (A note on the likely form of the fit follows the listings below.)

netcdf modresp_ffsSof_def424_6parmfit {
dimensions:
    station_count = 424 ;
    mtot = 60 ;
    namelen = 15 ;
    parameters = 6 ;
variables:
    char site_name(station_count, namelen) ;
    float mtot(mtot) ;
    float fit_parameters(parameters, station_count) ;
    float reduced_chisq(station_count) ;
    float response_stdv(mtot, station_count) ;
}

netcdf mbl424_6parmfit {
dimensions:
    station_count = 424 ;
    mtot = 60 ;
    namelen = 15 ;
    parameters = 6 ;
variables:
    char site_name(station_count, namelen) ;
    float mtot(mtot) ;
    float fit_parameters(parameters, station_count) ;
    float reduced_chisq(station_count) ;
    float mbl_site_stdv(mtot, station_count) ;
    float mbl_site_month(mtot, station_count) ;
}

netcdf active_obs205_20090708_6parmfit {
dimensions:
    station_count = 205 ;
    mtot = 60 ;
    namelen = 15 ;
    parameters = 6 ;
variables:
    char site_name(station_count, namelen) ;
    float mtot(mtot) ;
    float fit_parameters(parameters, station_count) ;
    float reduced_chisq(station_count) ;
    float obs_stdv(mtot, station_count) ;
}
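The exact parameterization of the 6-parameter fit is in the .pro codes; a common choice for CO2 time series with this parameter count, and plausibly what is meant here, is an offset, a linear trend, and annual plus semi-annual harmonics:

    c(t) = p1 + p2*t + p3*sin(2*pi*t) + p4*cos(2*pi*t) + p5*sin(4*pi*t) + p6*cos(4*pi*t)

with t in years; the harmonic pairs carry the amplitude and phase of the seasonal cycle referred to above.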

Codes for comparing the fits: read_fit_parms2a.pro (input are fit parameters for the MBL and the observations; output is a fit comparisons file) and read_fit_parms2b.pro (input are fit parameters for the MBL and the model responses; output is a fit comparisons file).

netcdf mbl_obs_fit_comparisons {
dimensions:
    station_count = 205 ;
    namelen = 15 ;
    latlon = 2 ;
variables:
    char site_name(station_count, namelen) ;
    int station_count(station_count) ;
    float station_location(latlon, station_count) ;
    float offset_ratio(station_count) ;
    float trend_ratio(station_count) ;
    float amplitude_ratio(station_count) ;
    int lag(station_count) ;
}

netcdf mbl_mod_fit_comparisons {
dimensions:
    station_count = 424 ;
    namelen = 15 ;
variables:
    char site_name(station_count, namelen) ;
    float offset_ratio(station_count) ;
    float trend_ratio(station_count) ;
    float amplitude_ratio(station_count) ;
    int lag(station_count) ;
}

{magic happens here}

Code: create_simobs_part1.pro. Input are the MBL fits, the MBL-obs fit comparisons, the MBL-model fit comparisons, AND a list of sites with schemes and multipliers (the list is included in the directory for your reference). Output is a file of fit factors to be applied to the model response fit parameters file. The code create_simobs_part2.pro takes those fit factors and the model response parameters file, constructs simulated observations, and writes out a simulated observations file in the format used in the inversion. This is referenced in the merging procedure in the section covering packaging of observations for the inversion.

netcdf all424_fitfactors_20090929 {
dimensions:
    station_count = 424 ;
    latlon = 2 ;
    namelen = 15 ;
variables:
    char site_name(station_count, namelen) ;
    int station_count(station_count) ;
    float station_location(latlon, station_count) ;
    float offset(station_count) ;
    float trend(station_count) ;
    float amp_factor(station_count) ;
}

netcdf def424_ffsSof_simobs_20090929 {
dimensions:
    station_count = 424 ;
    mtot = 60 ;
    datatype = 2 ;
    latlon = 2 ;
    namelen = 15 ;
variables:
    char site_name(station_count, namelen) ;
    int station_count(station_count) ;
    float mtot(mtot) ;
    float station_location(latlon, station_count) ;
    float CO2_concentration(datatype, mtot, station_count) ;
}


Prior Fluxes and Uncertainties

[Refer to directory 10_prior_flux]

The file format/contents are dictated by the inversion code and method. The sources_land and sources_ocean priors (datatype_count=1) are estimates of the adjustments to the background (presub) fluxes for each region/month. As with TransCom, the sources_land and sources_ocean priors vary by month but repeat each year. Unlike TransCom, I have used zeros in these variables, indicating 'no adjustment' to the background fluxes as my prior. The datatype_count=2 variables are the uncertainties in these prior flux estimates; these I have derived from the primary background fluxes for each region (terrestrial or ocean)...hence the different prior files for inversions using SiB3 vs. CASA (see the description in the Tellus paper). Many versions of these prior files exist; most were made for experimenting with the sensitivity to the magnitude of the uncertainties. (A sketch of this construction follows the code list below.)

It does not seem to matter what is in the sources_offset variable (a single number meant to be a global mean from which all model response fields are offset). An offset is calculated in the inversion and reported in the inversion posterior flux output file.

Codes: make_priors2_iav.f90 (for the SiB3 and CASA versions of the priors), make_priors3_iav.f90 (for the version of priors derived from the TransCom IAV priors), report_priors_iav.pro (skeleton code to read variables in the priors files), and readtcpriors.pro (reads the TransCom annual mean and IAV priors).

Files: Prior files sharing the format required by the inversion code, as documented here: priors2_iav_base.nc (base-level uncertainties for inversions using SiB), priors2c_iav_base.nc (base-level uncertainties for inversions using CASA monthly background fluxes), and priors_sprd_tc3iav_nolu2.nc (uncertainties derived from TransCom IAV, omitting land use change priors). See also, below, the file descriptions for the TransCom seasonal mean and IAV inversions. Note that I have made changes to the time dimensions/variables.
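A minimal sketch (not make_priors2_iav.f90 itself; the scale factor and background values are stand-ins) of the construction described above: zero adjustments as the prior mean, with uncertainties scaled from a background-flux magnitude for each region/month:

program priors_sketch
  implicit none
  integer, parameter :: nland = 36, nyear = 5, nmon = 12
  ! CDL sources_land(datatype_count, month, year, land_region), reversed for Fortran
  real :: sources_land(nland, nyear, nmon, 2), bgflux(nland, nyear, nmon)
  bgflux = 0.1                                   ! stand-in region/month background (Gt C)
  sources_land(:, :, :, 1) = 0.0                 ! 'no adjustment' prior mean
  sources_land(:, :, :, 2) = 1.0 * abs(bgflux)   ! uncertainty scaled from background
  print '(a,f6.2)', 'example uncertainty (Gt C): ', sources_land(1, 1, 1, 2)
end program priors_sketch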

File format as dictated by the inversion code, documented here:

netcdf priors2_iav_base {
dimensions:
    presub_count = 4 ;
    presubp1 = 5 ;
    datatype_count = 2 ;
    land_region = 36 ;
    ocean_region = 11 ;
    month = 12 ;
    year = 5 ;
    namelen = 10 ;
variables:
    char presub_name(presubp1, namelen) ;
    float sources_offset(datatype_count) ;
    float time(month, year) ;
    float sources_presub(datatype_count, year, presub_count) ;
    float sources_land(datatype_count, month, year, land_region) ;
    float sources_ocean(datatype_count, month, year, ocean_region) ;
}


TransCom3 IAV prior file descriptions (for comparison):

Priors for the TransCom IAV inversion:

netcdf priors.tdi.L1and0.3.newo {
dimensions:
    presub_count = 4 ;
    presubp1 = 5 ;
    datatype_count = 2 ;
    land_region = 11 ;
    ocean_region = 11 ;
    month = 276 ;
    year = 23 ;
    namelen = 10 ;
variables:
    char presub_name(presubp1, namelen) ;
    float time(month) ;
    float sources_offset(datatype_count) ;
    float sources_presub(datatype_count, year, presub_count) ;
    float sources_land(datatype_count, month, land_region) ;
    float sources_ocean(datatype_count, month, ocean_region) ;
}

Priors for the TransCom seasonal mean inversion:

netcdf priors2_iav_base {
dimensions:
    presub_count = 4 ;
    presubp1 = 5 ;
    datatype_count = 2 ;
    land_region = 36 ;
    ocean_region = 11 ;
    month = 12 ;
    year = 5 ;
    namelen = 10 ;
variables:
    char presub_name(presubp1, namelen) ;
    float sources_offset(datatype_count) ;
    float time(month, year) ;
    float sources_presub(datatype_count, year, presub_count) ;
    float sources_land(datatype_count, month, year, land_region) ;
    float sources_ocean(datatype_count, month, year, ocean_region) ;
}


The Inversion

[Refer to directory 11_inversion]

In the directory I have included the code packages as well as an example inversion directory structure with the required files and sample output files that are created by the inversion code. I have also included Peter Rayner’s description of the SVDcalc version of the inversion.

Code sub-directories: tc3 (the TransCom IAV code, which is what I started with) and newinv_iav (my version).

Code changes in my version: upgraded to F90, including using some built-in Fortran functions and changing functions to subroutines as appropriate; rewrote (or made major modifications to) some key code modules (e.g., invmain_iav.f90, buildgreens_iav.f90, buildfrechet_iav.f90, and output_iav.f90); omitted some dead code; added much in-line documentation (my comments are marked with '!'; previous comments are identified with 'c'). buildgreens_iav.f90, buildfrechet_iav.f90, and svdcalc.f90 are not for the faint of heart. I needed to change the build... codes in order to allow for interannual variability in the transport. Be aware that the order of some arrays within the code must have made sense to Peter Rayner, but may not to some others (me, for example). This will be important in post-processing one very key output (the posterior covariance).

Start with invmain_iav.f90 to understand the code.

There are two options for doing the inversion: the straightforward method (using bayesinv.f90) and the more stable Rayner (1999) SVD version (svdcalc.f90); a schematic of the estimate being computed follows this paragraph. LAPACK linear algebra routines are used. (You will likely need to make changes to the Makefile.) Please note that there is an effort on the part of Peter Rayner, Thomas Lauvaux, et al. to make this svdcalc approach into a licensed commercial product. PSU has/will have a research license; I think that you just need to ask.
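For orientation, in standard notation (not necessarily the code's variable names), the estimate being computed is the usual Bayesian synthesis-inversion solution:

    s_post = s_prior + (G^T Cd^-1 G + Cs^-1)^-1 G^T Cd^-1 (d - G s_prior)
    C_post = (G^T Cd^-1 G + Cs^-1)^-1

where G is the transport (gmatrix) operator, d the observations, Cd the observation/model-data-mismatch covariance, and Cs the prior flux covariance. The Rayner (1999) SVD version evaluates the same estimate more stably via a singular value decomposition of the pre-whitened G.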

I also did not make use of the multiple-runs-at-a-time scripts that were needed for TransCom (especially since I could never make them work!); I did one-off runs and then moved the output to directories for each inversion run. I am sure that you can do it more gracefully.

Changes I should have made, but did not: There are some partial filedumps during execution that I used in debugging and should have turned off. The use of 'recipes' is currently nonfunctional, but would need to be activated if, for example, carbon isotope observations were used in addition to the carbon dioxide observations. The purpose of 'recipes' is to translate into common units (if I recall correctly). I would also get rid of the 'groups' processing; it was annoying to get it to work, and then I followed the TransCom route of doing all the 'group' processing in the post processing. There are also 'stubs' for calling code to implement prior covariance structures on sources (region fluxes) and observations, which is a needed improvement; this would also require some changes in places where the prior covariance matrices are inverted. Thomas Lauvaux has done this, but not with this version of the code(?).

See the sample_inversion sub-directory for the following. This is what should appear immediately after the inversion is run, before post processing. ‘iav’ is the executable output of the make.

The inversion is run using a 'control' file; an example follows, with a description of all the necessary files that must be in the files4run/ subdirectory.

eddy:/abl/s0/users/mpbutler/flib/newinv_iav/net49_ffsS_base[5038]% more control
&control
 title = 'iav - net49 base ffsS - svd - 20100127'
 datflist = 'files4run/list_dat_act205_net49.dat'
 constf = 'files4run/constraints_none.dat'
 groupf = 'files4run/spatial_groups.dat'
 sourcef = 'files4run/priors2_iav_base.nc'
 datf = 'files4run/active_obs205_20090911.nc'
 greenf = 'files4run/gmatrix_cos205v4s_ffsSof.nc'
 outputf = 'files4run/global-output'
 nlandreg = 36
 noceanreg = 11
 npresub = 5
 numsvdvecs = 20
 doinfluence = .true.
 dosvd = .true.
 dumpmats = .true.
 dumpgroups = .false.
 dumpgreens = .true.
 dumpfrech = .true.
 firstsrc = 2000
 lastsrc = 2004
 firstdat = 2000
 lastdat = 2004
/
recipef = 'files4run/recipes.dat'
/

Necessary files (descriptions and examples/partial examples of files not already described):

datflist: This is the list of possible observation sites, with the sites to be used in this inversion run 'turned on'. There are indexes that match the station order in some other input files. The station names must match what is in the observation file. The number of stations is the number of stations in the list, not necessarily the number used in the inversion. The first column is a sequence number only; the second is '9999' if the station is to be used in the inversion; the third is the observation site name; the fourth and fifth are latitude and longitude; the sixth and seventh are not used. Partial listing of datflist (note that the actual file is aligned much more neatly!):

eddy:/abl/s0/users/mpbutler/flib/newinv_iav/files4run[5041]% more list_dat_act205_net49.dat
##
# station list for inversion
# format statement:
# (2i6,3x,a15,2f9.2,i6,3x,a1)
# next line is (a24,2x,i4)
# number of stations: 205
     1     0   alt_01D0     82.45   -62.52     1   A
     2     0   alt482n00    82.45   -62.52     1   B
     3     0   alt482n00    82.45   -62.52     1   B
     4  9999   alt482n01    82.45   -62.52     1   B
     5     0   ams137s00   -37.95    77.53     2   B
     6     0   brw_01D0     71.32  -156.60     6   A
     7  9999   brw_01C0     71.32  -156.60     6   A
     8  9999   cgo_01D0    -42.00   142.50     8   A
     9     0   cgo540s00   -42.00   142.50     8   B
    10  9999   cmn644n00    44.18    10.70     9   B
    11     0   coi243n00    43.15   145.50    11   E
    12  9999   cpt134s00   -34.35    18.49    13   B
    13  9999   fsd449n00    49.88   -81.57    14   B
    14     0   hat224n00    24.05   123.80    15   E
    15     0   hun_01D0     46.95    16.65    16   A
    16     0   izo_01D0     28.30   -16.48    18   A
    17  9999   izo128n00    28.30   -16.48    18   B

constf: This is a constraints file...not used. I think(?) this is used in the seasonal mean inversion to fix the global trend in carbon dioxide over the period of the inversion.

eddy:/abl/s0/users/mpbutler/flib/newinv_iav/files4run[5043]% more constraints_none.dat
constraint file
0

groupf (partial listing): The following is as streamlined as possible given the expectations of the code. I did not use the raw group output of the inversion.

eddy:/abl/s0/users/mpbutler/flib/newinv_iav/files4run[5044]% more spatial_groups.dat
global aggregate groups for the spatial inversion
50
files4run/groups/lnd01.av lnd01.av
files4run/groups/lnd02.av lnd02.av
files4run/groups/lnd03.av lnd03.av
files4run/groups/lnd04.av lnd04.av
files4run/groups/lnd05.av lnd05.av
files4run/groups/lnd06.av lnd06.av
files4run/groups/lnd07.av lnd07.av
files4run/groups/lnd08.av lnd08.av
files4run/groups/lnd09.av lnd09.av
files4run/groups/lnd10.av lnd10.av
files4run/groups/lnd11.av lnd11.av
files4run/groups/lnd12.av lnd12.av
files4run/groups/lnd13.av lnd13.av
files4run/groups/lnd14.av lnd14.av
files4run/groups/lnd15.av lnd15.av
files4run/groups/lnd16.av lnd16.av
files4run/groups/lnd17.av lnd17.av
files4run/groups/lnd18.av lnd18.av
files4run/groups/lnd19.av lnd19.av
files4run/groups/lnd20.av lnd20.av
files4run/groups/lnd21.av lnd21.av
...
files4run/groups/lnd33.av lnd33.av
files4run/groups/lnd34.av lnd34.av
files4run/groups/lnd35.av lnd35.av
files4run/groups/lnd36.av lnd36.av
files4run/groups/ocn01.av ocn01.av
files4run/groups/ocn02.av ocn02.av
files4run/groups/ocn03.av ocn03.av
files4run/groups/ocn04.av ocn04.av
files4run/groups/ocn05.av ocn05.av
files4run/groups/ocn06.av ocn06.av
files4run/groups/ocn07.av ocn07.av
files4run/groups/ocn08.av ocn08.av
files4run/groups/ocn09.av ocn09.av
files4run/groups/ocn10.av ocn10.av
files4run/groups/ocn11.av ocn11.av
files4run/groups/lnd.all lnd.all
files4run/groups/ocn.all ocn.all
files4run/groups/global.all global.all


sourcef: The netCDF file with the priors.

datf: The observations netCDF file.

greenf: The transport matrix netCDF file (gmatrix).

outputf: Directions for output. 'global-output' refers to 'global-block', which is essentially a no-op. 'docovar' looked interesting, but wasn't used. Aha, now I see that maybe 'dogroups = .false.' might have been useful?

eddy:/abl/s0/users/mpbutler/flib/newinv_iav/files4run[5046]% more global-output
&outputnml
 blockf = 'files4run/global-block'
 dosources = .true.
 dodata = .true.
 dogroups = .true
 docovar = .false./
&end

eddy:/abl/s0/users/mpbutler/flib/newinv_iav/files4run[5047]% more global-block
one global block for 1980s inversion
0
average-block average-block

The following is a directory listing for an example inversion:

control  datlist.dat  frechet.dat  greens.dat  influence.dat  predicted.data  predicted.groups  predicted.sources  prior.data  prior.groups  prior.sources  srclist.dat  svdump.dat  tsmondump.dat  tspresubdump.dat  vardump

'prior.sources' and 'prior.data' are ascii file listings of the priors (region/month) and the observations (site/month), respectively. 'predicted.sources' and 'predicted.data' are the corresponding ascii listings of the inversion output. Be aware that 'predicted.sources' contains only the adjustments to the prior (zero, in my examples) plus the background fluxes (presubs).

'vardump' is (half of) the posterior flux covariance matrix, stored as a binary file. Converting it to a more usable form is covered in Post Processing Inversion Output.

'influence.dat' is interesting: it purports to show sensitivities between observations and region fluxes, but it was not easy to interpret.

The remaining files are dumps or partial dumps of intermediate or group-related files and will not be referenced again (some could or should be suppressed, but they were useful in debugging).


Post Processing Inversion Output

[Refer to directory 12_post_inversion]

Now that there is inversion output, more processing is needed to make it useful. The posterior flux files contain only the adjustments to the prior fluxes, and the posterior covariance is difficult to interpret. I started with some code borrowed from Kevin Gurney (unfinished code for the seasonal inversion) and from David Baker/Kevin Gurney (for TransCom IAV). I reused and rewrote, and ended up with a bit of a mess (lots of things are output and never used, for example). I mostly maintained the ability to process multiple runs into mean results, but did not use this feature (so cannot vouch for the results). I also retained the feature of omitting one or more years of the direct inversion output from both the beginning and end of the inversion time series. This was useful in TransCom because it avoided spin-up issues by not using the first years of output. For example, I used only the middle three of the five years of my inversion, which made some of the output (e.g., deseasonalized fluxes) of limited use.

Subsequent analysis showed that I could have used the last year of the run, but probably not the first six months. I always meant to go back and clean up this code…

The initial post-inversion processing consists of four primary codes. Everything after that is re-packaging of output for convenience, specific analyses, or (inevitably) satisfying requests to provide the output downscaled to a 1x1 degree grid for inversion comparisons. There are also a few other codes not covered yet: one handles the simple task of creating the observation file expected by the post-processing code, containing the subset of observations exactly matching those used in each inversion. Also used here are the summary 'presub' files whose creation was covered in the section on Background Fluxes earlier.

Codes: makedfile.f90

Inputs are the observations netCDF file (e.g., active_obs205_20090911.nc) and the network listing for a specific inversion network (e.g., list_dat_act205_net49.dat). Output is an ascii file of observations for the post processing (e.g., datafile.net49.dat).
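The essential operation in makedfile.f90 is: read the observations netCDF file, keep the sites named in the network list, and write ascii. Here is a compact sketch of that flow; the dimension and variable names ('station', 'month', 'obs', 'unc', 'station_name') and the list-file layout are my assumptions, so verify them against the actual files with ncdump -h:

program makedfile_sketch
  use netcdf
  implicit none
  integer :: ncid, dimid, varid, ierr, nsite, nmonth, i, m
  real,    allocatable :: obs(:,:), unc(:,:)
  character(len=15), allocatable :: code(:)
  character(len=15) :: want

  ierr = nf90_open('active_obs205_20090911.nc', NF90_NOWRITE, ncid)
  ierr = nf90_inq_dimid(ncid, 'station', dimid)
  ierr = nf90_inquire_dimension(ncid, dimid, len=nsite)
  ierr = nf90_inq_dimid(ncid, 'month', dimid)
  ierr = nf90_inquire_dimension(ncid, dimid, len=nmonth)
  allocate(obs(nsite, nmonth), unc(nsite, nmonth), code(nsite))
  ! note: CDL order obs(month, station) maps to Fortran obs(station, month)
  ierr = nf90_inq_varid(ncid, 'obs', varid)
  ierr = nf90_get_var(ncid, varid, obs)
  ierr = nf90_inq_varid(ncid, 'unc', varid)
  ierr = nf90_get_var(ncid, varid, unc)
  ierr = nf90_inq_varid(ncid, 'station_name', varid)
  ierr = nf90_get_var(ncid, varid, code)
  ierr = nf90_close(ncid)

  open(10, file='list_dat_act205_net49.dat', status='old')   ! one site code per line (assumed)
  open(11, file='datafile.net49.dat', status='replace')
  do
     read(10, '(a)', end=99) want
     do i = 1, nsite
        if (trim(code(i)) == trim(want)) then
           do m = 1, nmonth
              write(11, '(a,1x,i4,2f12.4)') trim(code(i)), m, obs(i, m), unc(i, m)
           end do
        end if
     end do
  end do
99 close(10)
  close(11)
end program makedfile_sketch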

[See the section on background fluxes for these 'presub' codes.]

presub_iav.f90, presub_iav2.f90, presub_iav3.f90, presub_iav4.f90, readSIBbin.f90, make_presub_grp.f90: These are a series of codes to create netCDF files with region/month and aggregated region (group)/month totals for each of the background (presub) fluxes used in the inversion. A sketch of the region-total arithmetic follows.
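The core of these totals codes is a masked sum over the grid using region_map from regions3.nc. A minimal sketch for one month, assuming a gridded flux in kg C m-2 s-1 and an array of cell areas (the names and the mean-month constant are illustrative):

program region_totals_sketch
  implicit none
  integer, parameter :: nlon = 144, nlat = 91, nreg = 48
  real    :: flux(nlon, nlat), area(nlon, nlat)   ! kg C m-2 s-1; m2
  integer :: region_map(nlon, nlat)               ! CDL (latitude, longitude) reversed in Fortran
  real    :: total(nreg)                          ! Gt C for this month
  real, parameter :: secmon = 30.4375 * 86400.0   ! mean month length (assumption)
  integer :: i, j, r

  ! flux, area, region_map are filled from the input netCDF files (omitted)
  total = 0.0
  do j = 1, nlat
     do i = 1, nlon
        r = region_map(i, j)
        if (r >= 1) total(r) = total(r) + flux(i, j) * area(i, j) * secmon * 1.0e-12
     end do                                       ! 1.0e-12 converts kg C to Gt C
  end do
end program region_totals_sketch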

Example of the presub region/month and group/month totals for the SiB3, seasonal fossil, Takahashi ocean, and biomass burning combination of background fluxes:

netcdf presub_iav_grp_v1 {
dimensions:

longlen = 30 ;

len = 10 ;

presub = 4 ;

group = 40 ;

region = 47 ;

month = 12 ;


year = 5 ;
variables:

char presubnm(presub, len) ;

char regnm(region, len) ;

char grpnm(group, longlen) ;

float global_totals(year, presub) ;

global_totals:units = "Gt C" ;

float region_totals(year, region, presub) ;

region_totals:units = "Gt C" ;

float group_totals(year, group, presub) ;

float presub(year, month, region, presub) ;

presub:units = "Gt C" ;

float group_presub(year, month, group, presub) ;

group_presub:units = "Gt C" ;

}
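Aggregating region totals to group totals uses group_region_count and group_region_mapping from regions3.nc. A sketch, assuming group_region_mapping lists the member regions of each group (that interpretation is mine; verify it against read_regions3.pro), and remembering that Fortran reverses the dimension order shown in the CDL dumps:

program group_totals_sketch
  implicit none
  integer, parameter :: nreg = 47, ngrp = 40
  real    :: rtot(nreg), gtot(ngrp)
  integer :: group_region_count(ngrp)
  integer :: group_region_mapping(ngrp, nreg)   ! CDL (region, group) reversed in Fortran
  integer :: g, k, r

  ! rtot comes from the region totals; the mapping arrays from regions3.nc (omitted)
  gtot = 0.0
  do g = 1, ngrp
     do k = 1, group_region_count(g)
        r = group_region_mapping(g, k)          ! assumed: k-th member region of group g
        gtot(g) = gtot(g) + rtot(r)
     end do
  end do
end program group_totals_sketch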

First Steps in Post Processing

Recall that the posterior fluxes output from the inversion are adjustments to the background fluxes (presubs) for each region/month. In the first steps of post processing, all the relevant files are gathered (the datafile and presub files noted just above, the region definitions file, and the inversion output) for preliminary analysis. The antecedent codes borrowed from TransCom are process.f and process.tdi.pn13.f [the second more useful than the first]. My versions split these into a couple of parts. Note that the first two codes contain lots of commented-out remains and write out variables that have not necessarily been used or found useful. Follow-on code that 'packages' results contains the results reported from this inversion experiment. Most of these codes work on the principle of omitting the first and last years of the inversion results from the analysis. There are versions of these codes that omit only the first year; the last year looks usable.

Codes: post_part1.f90

Outputs are displays of chi-square statistics and files of regional flux correlations (post_correl.dat) and model-data mismatch statistics (post.dataresid.dat). Most of this (except perhaps the chi-square stats done the way TransCom did them) is also dealt with elsewhere.

post_part2.f90

Outputs are model-specific results (post_run_region_month.txt, post_run_group_month.txt, post_run_summary.txt) and mean results (post_mean_region_month.txt, post_mean_group_month.txt, post_mean_summary.txt) by month and year for regions and aggregated regions (groups). These codes were originally intended for analyzing output from the same inversion using multiple transport model responses; that is not necessarily how these rewritten codes have been used. The facility to include multiple inversion runs in one execution is retained, but has not been tested or used.

reformat_vardump2.f90

Input is the binary vardump file from the inversion containing all the posterior flux covariance data; output is a netCDF file with the vardump reorganized in a more understandable way (at least I think so). The sequence of variables in the vardump file is not easy to understand.
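Whatever the actual record layout, the underlying operation is unpacking half of a symmetric matrix and mirroring it. A sketch, assuming the lower triangle is stored row by row (the true vardump ordering must be taken from reformat_vardump2.f90 itself):

program unpack_covar_sketch
  implicit none
  integer, parameter :: n = 1692        ! covar_len in the example below
  real, allocatable :: packed(:), covar(:,:)
  integer :: i, j, k

  allocate(packed(n*(n+1)/2), covar(n, n))
  ! packed(:) is read from the binary vardump (layout assumption:
  ! lower triangle, row by row)
  k = 0
  do i = 1, n
     do j = 1, i
        k = k + 1
        covar(i, j) = packed(k)
        covar(j, i) = packed(k)          ! mirror to the upper triangle
     end do
  end do
end program unpack_covar_sketch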

Example of the reformatted vardump:

netcdf net49_ffsS_base_36month_covar {
dimensions:

short_namelen = 10 ;

region = 47 ;

covar_len = 1692 ;
variables:

char short_region_name(region, short_namelen) ;

int region_start_index(region) ;

float covar_matrix(covar_len, covar_len) ;

}

Packaging the Flux and Data Results for Further Analysis

The following codes and file examples are intended to consolidate output files for further analysis.


Code: package_results.f90

Inputs are reference files and the post_run_region_month.txt and post_run_group_month.txt for a specific inversion execution. Outputs are a region-level and a group-level netCDF file with fluxes in multiple units (originally for the NACP Interim Synthesis). The group-level file contains results for only the 22 original TransCom regions.

netcdf net49_ffsS_base_region {
dimensions:

longlen = 30 ;

region = 47 ;

month = 36 ;
variables:

char region_name(region, longlen) ;

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float region_sourcesink(month, region) ;

region_sourcesink:units = "PgC yr-1" ;

float region_unc(month, region) ;

region_unc:units = "PgC yr-1" ;

float region_fossil_source(month, region) ;

region_fossil_source:units = "PgC yr-1" ;

float region_fire_source(month, region) ;

region_fire_source:units = "PgC yr-1" ;

float region_flux(month, region) ;

region_flux:units = "mol m-2 s-1" ;

float region_fossil_flux(month, region) ;

region_fossil_flux:units = "mol m-2 s-1" ;

float region_fire_flux(month, region) ;

region_fire_flux:units = "mol m-2 s-1" ;

}

netcdf net49_ffsS_base_group {
dimensions:

longlen = 30 ;

group = 22 ;

month = 36 ;
variables:

char group_name(group, longlen) ;

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float group_sourcesink(month, group) ;

group_sourcesink:units = "PgC yr-1" ;

float group_unc(month, group) ;

group_unc:units = "PgC yr-1" ;

float group_fossil_source(month, group) ;

group_fossil_source:units = "PgC yr-1" ;

float group_fire_source(month, group) ;

group_fire_source:units = "PgC yr-1" ;

float group_flux(month, group) ;

group_flux:units = "mol m-2 s-1" ;

float group_fossil_flux(month, group) ;

group_fossil_flux:units = "mol m-2 s-1" ;

float group_fire_flux(month, group) ;


group_fire_flux:units = "mol m-2 s-1" ;

}
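The two unit systems in these files are related through the region area (from regions3.nc) and the molar mass of carbon. A sketch of the conversion from PgC yr-1 to mol m-2 s-1; the constants here are mine, and package_results.f90 may use slightly different values:

program flux_units_sketch
  implicit none
  real, parameter :: g_per_pg   = 1.0e15           ! grams per petagram
  real, parameter :: g_per_mol  = 12.011           ! molar mass of carbon
  real, parameter :: sec_per_yr = 365.25 * 86400.0
  real :: sourcesink_pgc_yr, area_m2, flux_mol_m2_s

  sourcesink_pgc_yr = 0.5        ! example region source, PgC yr-1
  area_m2           = 1.0e13     ! example region area from regions3.nc

  flux_mol_m2_s = sourcesink_pgc_yr * g_per_pg / g_per_mol / area_m2 / sec_per_yr
  write(*, '(a,es12.4)') 'flux (mol m-2 s-1) = ', flux_mol_m2_s
end program flux_units_sketch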

Code: package_expanded_results.f90

Similar to package_results.f90 but with some different variables in the output files, including all of the groups defined in the regions3.nc file.

netcdf net49_ffsS_base_exp_region {
dimensions:

len = 10 ;

longlen = 30 ;

region = 47 ;

month = 36 ;

presub = 4 ;
variables:

char presub_name(presub, len) ;

char region_name(region, longlen) ;

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float region_sourcesink(month, region) ;

region_sourcesink:units = "PgC yr-1" ;

float region_unc(month, region) ;

region_unc:units = "PgC yr-1" ;

float region_prior_unc(month, region) ;

region_prior_unc:units = "PgC yr-1" ;

float region_fossil_source(month, region) ;

region_fossil_source:units = "PgC yr-1" ;

float region_psbio_sourcesink(month, region) ;

region_psbio_sourcesink:units = "PgC yr-1" ;

float region_psocn_sourcesink(month, region) ;

region_psocn_sourcesink:units = "PgC yr-1" ;

float region_fire_source(month, region) ;

region_fire_source:units = "PgC yr-1" ;

}

netcdf net49_ffsS_base_exp_group {
dimensions:

len = 10 ;

longlen = 30 ;

group = 40 ;

month = 36 ;

presub = 4 ;
variables:

char presub_name(presub, len) ;

char group_name(group, longlen) ;

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float group_sourcesink(month, group) ;

group_sourcesink:units = "PgC yr-1" ;

float group_unc(month, group) ;

group_unc:units = "PgC yr-1" ;

float group_prior_unc(month, group) ;

group_prior_unc:units = "PgC yr-1" ;


float group_fossil_source(month, group) ;

group_fossil_source:units = "PgC yr-1" ;

float group_psbio_sourcesink(month, group) ;

group_psbio_sourcesink:units = "PgC yr-1" ;

float group_psocn_sourcesink(month, group) ;
group_psocn_sourcesink:units = "PgC yr-1" ;
float group_fire_source(month, group) ;
group_fire_source:units = "PgC yr-1" ;
}

Codes for producing summary results files (aka 'martha learns linear algebra'). Note that reformat_vardump2.f90 and package_expanded_results.f90 must be run before these codes. These should replicate exactly (and in a much more succinct fashion) the group and annual summary processing found in post_part2.f90. The aggregation is straight linear algebra; see the sketch after this list.

rmon_to_ryr.f90: Output is at region/year resolution.
rmon_to_gmon.f90: Output is at aggregated region (group)/month resolution.
rmon_to_gyr.f90: Output is at aggregated region (group)/year resolution.
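The linear algebra involved is the standard rule for a linear transform: if the summary fluxes are y = A x, where x is the region/month flux vector and A holds the aggregation weights, then the summary covariance is Cy = A Cx A^T. A sketch using matmul, assuming the covar_matrix ordering produced by reformat_vardump2.f90:

program aggregate_covar_sketch
  implicit none
  integer, parameter :: nx = 1692, ny = 141   ! region/month -> region/year (example dims)
  real, allocatable :: A(:,:), Cx(:,:), Cy(:,:)

  allocate(A(ny, nx), Cx(nx, nx), Cy(ny, ny))
  ! A holds the aggregation weights: A(r_yr, r_mon) = month weight when the
  ! month belongs to that region/year, else 0 (construction omitted)
  ! Cx is the reformatted vardump covariance (read omitted)
  Cy = matmul(A, matmul(Cx, transpose(A)))
  ! posterior uncertainties are then the sqrt of the diagonal of Cy
end program aggregate_covar_sketch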

netcdf net49_ffsS_base_region_annual {
dimensions:

longlen = 30 ;

region = 47 ;

year = 3 ;

covar_len = 141 ;
variables:

char region_name(region, longlen) ;

float year(year) ;

year:units = "year with decimal value of mid-year day" ;

float region_sourcesink(year, region) ;

region_sourcesink:units = "PgC yr-1, including fire source" ;

float covar_matrix(covar_len, covar_len) ;

int region_start_index(region) ;

float region_prior_unc(year, region) ;

region_prior_unc:units = "PgC yr-1" ;

}

netcdf net49_ffsS_base_group_month {
dimensions:

longlen = 30 ;

group = 40 ;

month = 36 ;

covar_len = 1440 ;
variables:

char group_name(group, longlen) ;

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float group_sourcesink(month, group) ;

group_sourcesink:units = "PgC yr-1, including fire source" ;

float covar_matrix(covar_len, covar_len) ;

int group_start_index(group) ;

float group_prior_unc(month, group) ;

}

netcdf net49_ffsS_base_group_annual {


dimensions:

longlen = 30 ;

group = 40 ;

year = 3 ;

covar_len = 120 ; variables:

char group_name(group, longlen) ;

float year(year) ;

year:units = "year with decimal value of mid-year day" ;

float group_sourcesink(year, group) ;

group_sourcesink:units = "PgC yr-1, including fire source" ;

float covar_matrix(covar_len, covar_len) ;

int group_start_index(group) ;

float group_prior_unc(year, group) ;

}

Codes for data analysis: dataresid3.pro

Inputs are the datafile....dat prepared for the inversion and the prior.data and predicted.data from the inversion output. Calculates station residuals, RMS errors by station, Taylor skill scores, and chi-square by station, and puts all the results into a netCDF file (a sketch of these statistics follows the dump below).

netcdf net49_ffsS_base_datastat {
dimensions:

station = 78 ;

month = 36 ;

namelen = 15 ;
variables:

char station_name(station, namelen) ;

float station_latitude(station) ;

float station_longitude(station) ;

float obs(month, station) ;

float unc(month, station) ;

float predicted_obs(month, station) ;

float residual(month, station) ;

float obs_mean(station) ;

float obs_stddev(station) ;

float pred_mean(station) ;

float pred_stddev(station) ;

float correlation(station) ;

float rmse(station) ;

float score1(station) ;

float score4(station) ;

float chisq(station) ;

float mresid(station) ;

float mresid_stdv(station) ;

}
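The per-station statistics are straightforward once obs, unc, and predicted_obs are in hand. dataresid3.pro is IDL; here is the same arithmetic sketched in Fortran (the per-station normalization of chi-square is my assumption):

program datastat_sketch
  implicit none
  integer, parameter :: nsta = 78, nmon = 36
  real :: obs(nmon, nsta), unc(nmon, nsta), pred(nmon, nsta)
  real :: resid(nmon, nsta), mresid(nsta), rmse(nsta), chisq(nsta)
  integer :: s

  ! obs, unc, pred are read from the inversion datafile and output (omitted)
  do s = 1, nsta
     resid(:, s) = obs(:, s) - pred(:, s)
     mresid(s)   = sum(resid(:, s)) / real(nmon)            ! mean residual
     rmse(s)     = sqrt(sum(resid(:, s)**2) / real(nmon))   ! RMS error
     chisq(s)    = sum((resid(:, s) / unc(:, s))**2) / real(nmon)
  end do
end program datastat_sketch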


Downscaling the inversion results to grid scale (and finer resolution)

[Refer to directory 13_downscaling_flux]

Downscaling the inversion results from region to grid scale (PCTM grid and 1x1)

I have mixed feelings about the validity of this exercise. However, inversion comparison exercises often call for standard gridded 1 x 1 degree results; this is driven by those inversions that are done at the grid scale of the transport model rather than with this inversion's 'big region' method. Here is how I did it (one of many attempts); you may choose some other way.

Codes for downscaling results: region_to_grid.f90

Inputs are the landpattern.nc and ocnpattern.nc used in basis function map creation and the grid-to-region mapping in regions3.nc. Output is cellfraction.nc.

netcdf cellfraction {
dimensions:

longitude = 144 ;

latitude = 91 ;

month = 12 ;
variables:

float longitude(longitude) ;

longitude:units = "degrees east of dateline" ;

float latitude(latitude) ;

latitude:units = "degrees north" ;

double cellfraction(month, latitude, longitude) ;

cellfraction:units = "none: cell fraction of region" ;

}

region_to_grid3.f90

The strategy here is to take the flux adjustments calculated by the inversion, distribute them to the PCTM grid cells in each region, and add them to the sum of the imposed background fluxes (bio+burn for land, or ocean). This is kinda complicated! See the code (and the sketch after the dump below). Result is monthly grid maps of fluxes on the PCTM grid.

netcdf net49_ffsS_base_TC_2.5x2 {
dimensions:

longitude = 144 ;

latitude = 91 ;

month = 36 ; variables:

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float longitude(longitude) ;

longitude:units = "degrees east" ;

longitude:valid_range = -180.f, 177.5f ;

longitude:long_name = "longitude grid cell center" ;

float latitude(latitude) ;

latitude:units = "degrees north" ;

latitude:valid_range = -90.f, 90.f ;

latitude:long_name = "latitude grid cell center" ;

float post_land_flux(month, latitude, longitude) ;

post_land_flux:units = "mol m-2 s-1" ;


post_land_flux:long_name = "posterior land flux without biomass burning" ;

float post_ocean_flux(month, latitude, longitude) ;

post_ocean_flux:units = "mol m-2 s-1" ;

post_ocean_flux:long_name = "posterior ocean flux without biomass burning" ;

float fossil_flux(month, latitude, longitude) ;

fossil_flux:units = "mol m-2 s-1" ;

fossil_flux:long_name = "imposed fossil emissions" ;

float fire_flux(month, latitude, longitude) ;

fire_flux:units = "mol m-2 s-1" ;

fire_flux:long_name = "imposed biomass burning emissions" ;

float prior_land_flux(month, latitude, longitude) ;

prior_land_flux:units = "mol m-2 s-1" ;

prior_land_flux:long_name = "prior land flux" ;

float prior_ocean_flux(month, latitude, longitude) ;
prior_ocean_flux:units = "mol m-2 s-1" ;
prior_ocean_flux:long_name = "prior ocean flux" ;
}
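Stripped of the bookkeeping, the distribution step gives each grid cell its cellfraction share of the region's monthly adjustment, converts that to mol m-2 s-1, and adds the background flux. A sketch for one month; the unit constants and the PgC-per-month convention for the adjustments are my assumptions:

program distribute_adjust_sketch
  implicit none
  integer, parameter :: nlon = 144, nlat = 91, nreg = 47
  real    :: adjust(nreg)                 ! inversion adjustment, PgC per month (assumed)
  real    :: cellfrac(nlon, nlat)         ! cell's fraction of its region, this month
  real    :: presub(nlon, nlat), post(nlon, nlat), area(nlon, nlat)  ! mol m-2 s-1; m2
  integer :: region_map(nlon, nlat)       ! CDL (latitude, longitude) reversed in Fortran
  real, parameter :: secmon = 30.4375 * 86400.0
  integer :: i, j, r

  ! adjust, cellfrac, presub, area, region_map come from the inversion output,
  ! cellfraction.nc, the presub files, and regions3.nc (reads omitted)
  do j = 1, nlat
     do i = 1, nlon
        r = region_map(i, j)
        post(i, j) = presub(i, j)
        if (r >= 1) post(i, j) = post(i, j) + &
             adjust(r) * cellfrac(i, j) * 1.0e15 / 12.011 / area(i, j) / secmon
     end do
  end do
end program distribute_adjust_sketch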

regrid_fine3.f90

This does a 'simple' regridding of the 2.5x2 degree results to the 1x1 grid, including a global rescaling to preserve the totals from the original (see the sketch after the dump below). A land/sea mask is also included (as required for a TransCom intercomparison).

netcdf PSU_SiB_1x1 {
dimensions:

month = 36 ;

longitude = 360 ;

latitude = 180 ;
variables:

float month(month) ;

month:units = "seconds since 1970-01-01 00:00:00" ;

float decimal_month(month) ;

decimal_month:units = "year with decimal value of mid-month day" ;

float longitude(longitude) ;

longitude:units = "degrees east" ;

longitude:valid_range = -179.5f, 179.5f ;

longitude:long_name = "longitude grid cell center" ;

float latitude(latitude) ;

latitude:units = "degrees north" ;

latitude:valid_range = -89.5f, 89.5f ;

latitude:long_name = "latitude grid cell center" ;

float post_land_flux(month, latitude, longitude) ;

post_land_flux:units = "mol m-2 s-1" ;

post_land_flux:long_name = "posterior land flux without biomass burning" ;

float post_ocean_flux(month, latitude, longitude) ;

post_ocean_flux:units = "mol m-2 s-1" ;

post_ocean_flux:long_name = "posterior ocean flux without biomass burning" ;

float fossil_flux(month, latitude, longitude) ;

fossil_flux:units = "mol m-2 s-1" ;

fossil_flux:long_name = "imposed fossil emissions" ;

float fire_flux(month, latitude, longitude) ;

fire_flux:units = "mol m-2 s-1" ;

fire_flux:long_name = "imposed biomass burning emissions" ;

float prior_land_flux(month, latitude, longitude) ;

prior_land_flux:units = "mol m-2 s-1" ;

prior_land_flux:long_name = "prior land flux" ;

float prior_ocean_flux(month, latitude, longitude) ;


prior_ocean_flux:units = "mol m-2 s-1" ;

prior_ocean_flux:long_name = "prior ocean flux" ;

int land_sea_mask(latitude, longitude) ;

land_sea_mask:valid_range = 0, 1, 2 ;

land_sea_mask:long_name = "ice 0, land 1,ocean 2" ;

}
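The rescaling step preserves the global total: after interpolating to the 1x1 grid, compute the area-weighted global total on each grid and scale the fine grid by the ratio. A sketch (the interpolation itself is omitted):

program rescale_sketch
  implicit none
  real, allocatable :: coarse(:,:), fine(:,:), area_c(:,:), area_f(:,:)
  real :: tot_c, tot_f

  allocate(coarse(144, 91), fine(360, 180), area_c(144, 91), area_f(360, 180))
  ! coarse/fine hold one month of flux (mol m-2 s-1); areas in m2 (fill omitted)
  tot_c = sum(coarse * area_c)            ! global total, mol s-1
  tot_f = sum(fine * area_f)
  if (tot_f /= 0.0) fine = fine * (tot_c / tot_f)   ! preserve the coarse-grid total
end program rescale_sketch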
