Session 2: Databases and Archiving Data
BVOC Working Group
Monday, September 9, 2002
Discussion Leaders: Nick Hewitt and Christine Wiedinmyer
Recorder: Shelley Pressley
N. HEWITT – Welcome to Lancaster University. We have about 10,000 students here and the University is growing. Princess Alexandra is the new chancellor of the University. Glad you could all make it.
C. WIEDINMYER – The objectives of this session are to discuss archiving data and databases. Currently the data are somewhat lacking, the information is difficult to find, and there are different formats. Another question is how do we interpret the data that are available? And how can we use it? How do we use the data from the lab or field studies for model verification etc., and how do we make the data more accessible to the research community? What kind of information do people want and how do we make it useful for them to retrieve?
J. KARLIK – are experimental data and model output apples and oranges? Are they different things?
P. HARLEY – please describe what we currently have (to C. Wiedinmyer).
C. WIEDINMYER – N. Hewitt has developed an enclosure database that describes the emission factors for specific species. NCAR has taken that a step further by also logging what other parameters were measured during the study, and what the experiment entailed.
These are parameters that could possibly explain the difference in some of the emission rates.
A. GUENTHER – On J. Karlik’s question about model output: yes, they are different. One type of model output is an emission inventory. GEIA started with emission inventory activities – or what modelers wanted? The primary goal of GEIA was to develop one global inventory (for 1 year) of monthly emissions on a 1° x 1° grid.
However, emission modelers need more. Modelers want all the pieces and tools to describe the VOC emissions. So we need to provide the model output – that is one goal – but we must also provide all the pieces needed to get to that output, i.e. references for the pieces that go into the output. Another GEIA goal is determining the uncertainties associated with those models.
Q: How do we evaluate the pieces that are used to model BVOC emissions?
Also, big difference between global and regional emissions estimates.
J. KARLIK – concerned about the early emission factors reported in the literature and the environment they were measured in. This work was done before we realized the importance of sun vs. shade leaves for example.
M. POTASNAK – do people currently ask for the input used in the Guenther ‘95 model?
A. GUENTHER – yes
N. HEWITT – even now 10 years later I still get questions about what went into models and it would have been nice to keep track of those that expressed interest.
It is very important that we keep track of where the numbers come from for input into models – but that doesn’t mean the number is correct. From a historical point it is important and we must remember this is a community approach.
Q: How do we keep track of these requests? Use this to identify the need for the data, emission model pieces and emission model outputs.
P. HARLEY – but similar to how the JPL works, we need to filter out the data that is uncertain.
A. GUENTHER – label each emission measurement but still come up with an emission factor that is based on discussions about the quality of data that is out there.
N. HEWITT – the analogy with gas kinetics (JPL) is a little different. Their numbers are generated in the lab and can be reproduced, typically. But for our application, the environmental conditions are different and we may not even be measuring the same species each time.
N. HEWITT – maybe it’s the method that is being judged not the number?
P. HARLEY – A good outcome of this meeting would be to describe the ideal method for future enclosure measurements
A. STEINER – from a modeler's point of view it’s nice to have datasets for verifying the models
I. GALBALLY – we seem to be moving from good and bad data to a set of qualifiers or a protocol that should be followed, and then we can use that protocol to judge the usefulness of historical data. Also, in these systems there isn’t one emission factor for a given species, even though we would like there to be one. There may be a number of good values that we accept – and in other cases those numbers may converge to a single value. So maybe we should focus on a range of values – not a number.
There is a lot of variability in the published values – therefore should we come up with a range of numbers (for emission factors) instead of a specific number?
P. HARLEY - do modelers want a number? Or is a range acceptable?
Y. WU - for high resolution models we should let the modelers use the data in their best judgment
W. VIZUETE – allow the modelers to look at the different emission factors and decide which value would be the best to use. The choice of number to use is also based on the purpose of the model and what they are trying to model. As long as all the information is present, then the modeler can decide
M. POTASNAK – do the modelers want species level emission factors, or emissions on an ecosystem level?
A. GUENTHER – some modelers want that species level emission. So it is important
J. RINNE – the scale of the model will change which emission factor you use, so we need both species level and ecosystem level emission factors. As far as ecosystem levels, some of them depend on how the lower level emission factors are scaled up.
C. WIEDINMYER – if there was a database with emission factors and a protocol for determining which emission factors are of good quality, would it be useful?
W. VIZUETE – yeah it would be useful – even without a protocol
J. KARLIK – from a CA perspective, their group is very interested in this idea
C. WIEDINMYER – the enclosure database already exists. Do we want to add other types of datasets, such as flux and vertical measurements? For example, the Blodgett flux dataset (Allen Goldstein) is not easily available on the web. So how do we come up with a format for people to use in order to submit that kind of data, and what kind of metadata should be included? What about a fair use policy? How do you recognize people’s data when you use it?
J. RINNE – if someone is using the data they should contact the owner of the data – they should be required to contact the owner of the raw data.
C. GERON – is that practical? What about students that move on and are hard to contact?
P. HARLEY – generally most of the data is published.
A. GUENTHER – not necessarily, even if you publish a paper that raw data is not available, but it should be
T. KARL – maybe that should be indicated on the website, if the data is published or not so that it is clear
P. HARLEY – Standard formats for flux data have probably been set up with other data acquisition systems
A. GUENTHER – J. Rinne’s point is important – you should let the owners know how their data is being used.
C. WIEDINMYER – would you ask to be a coauthor? That is another issue to deal with.
J. RINNE – that’s not really a big issue, it depends on the situation and how much the data is used.
M. POTASNAK – what about the difference between enclosure data and flux data?
C. WIEDINMYER – if an emission factor is simply used, you probably don’t need to include the reference, but for large field datasets you should. The database would include a large literature review of the emission factors and, using the protocol, certain emission factors would be recommended for modeling. The next section would be field measurements for model validation. The 3rd section would have various emission inventories (possibly at different scales) and our recommendation for what you should use.
It would also have the algorithms that are currently being used – i.e. all the pieces that go into the models. These are all questions that have been asked of NCAR – it would be nice to have it in one place?
M. POTASNAK – For the 2nd section, other databases only store the metadata, but to actually get the data you have to go to an individual website. This makes it more difficult for the user to really get the data, but the owner has some control of how people are using the data.
A. GUENTHER – That method makes it hard to maintain the data
J. RINNE – for long term it’s easier to keep the data in one place.
M. POTASNAK – will there be a standard structure for the data?
C. WIEDINMYER – e.g. for AmeriFlux there is a standard format – how realistic would it be for BVOC datasets to fit that format?
S. OWEN – in theory it sounds like a great organization – but does it work in practice?
For the BEMA database it didn’t work well. There were a lot of metadata files but no data
N. HEWITT – it must be extremely easy and it must be focused. Links to other websites probably don’t work. But you also don’t want a large computer storage space where people just dump things. Overall, it must be user friendly.
J. RINNE – there must be good documentation of the data stored there, and ASCII format is typically easy – so something simple like ASCII is best.
J. KARLIK – as we go along some of these decisions will be made by the manager of the database. This will need to be an iterative process
A. GUENTHER – we would like this to happen, but there is no long term funding and we as a community must make this happen. We need to build this on our own.
W. VIZUETE – ultimately it will fall on who will do the work. There is a lot of work for the manager getting things in the correct format, even with the support of the community.
S. PRESSLEY – for the PROPHET program we used the NASA format and it was fairly easy to use. It was ugly at the beginning, but was usable. S. Hayward sneezed.
C. WIEDINMYER – but how easy is it to use netCDF formats? Or the NASA format?
They are binary and it is more difficult to use that format.
L. OTTER – many of the SAFARI people had difficulties with the netCDF format
P. HARLEY – I think we are putting the cart before the horse. I imagined something very simple. For enclosures it’s simple, but it becomes more complex with 10 Hz type data.
J. RINNE – that’s why there are 3 parts… the simple emission factors in section 1 and the 2nd section for flux data where there is much more data.
C. WIEDINMYER – we have 2 parts, enclosure stuff and above canopy data and models.
A. GUENTHER – do we need a community effort to put together a database of enclosures? Do we need a community effort to put together a database for flux datasets?
Do we need one for modeling efforts?
M. POTASNAK – Would like to offer one last pitch for above canopy flux datasets – they are not available easily. In some cases the large datasets are already available in other databases i.e. PROPHET type data
J. KARLIK – I still think the database would be helpful.
W. VIZUETE – What would the 3rd section be again?
C. WIEDINMYER – the 3rd section would be the recommendations on how to use the data – this is what we suggest you use for your emission factors, here are the algorithms that are used, and model input information, i.e. modeling tools.
A. GUENTHER – species level emission factor stuff in the first section and in the modeling tool section ecosystem type emission factors. It would also be nice to include satellite information that can be used for modeling in the last section.
J. RINNE – how do we list the algorithms? When we don’t even know how they work
N. HEWITT – how do we develop a database when the field is constantly evolving?
A. STEINER – is this necessary? Will there be more people like me that need this information, or will there be more people making measurements that would benefit from this protocol?
C. WIEDINMYER – as a graduate student it would have been great to have something like this as a guide. It also would have been nice to post your own data to get it out there for others to see and use.
P. HARLEY – if you build it they will come…there will be more people interested in the field.
J. RINNE – from an emission point of view – it is really handy to verify your own measurements. To see if someone else has measured it before
C. GERON – The USEPA point of view. There is a need for summarizing the literature on a global scale – for example methanol emissions from which plants? We are already incorporating methanol into the emission inventories and we don’t know anything about methanol.
A. GUENTHER – getting the users involved to give feedback to the investigators would be helpful
Y. WU – this interaction does occur currently; there is a lot of input from the measurement people and the modelers.
J. KARLIK – in CA we had a BVOC working group, he is encouraging us to continue.
C. WIEDINMYER – would the people here be willing to contribute to the database?
Would we help and guide the way data should be used? As a modeler would you be willing to ask for the info you need? i.e. all the data from Malaysia? Even small as it is – it’s the only data available
C. GERON – Everyone here is supportive of the idea, it’s the people that aren’t here that we worry about. That makes the management end of it very difficult. It must include convincing people to put their data on the site.
N. HEWITT – we have a project here and it takes one full time person to get people to put their data on the site. Therefore user friendly-ness is important for this site.
J. RINNE – one advantage, if you put your data on the database you know where it is in the future.
A. GUENTHER – that is very helpful, so you can find your own data in the future. But it is also helpful for getting your data referenced etc.
C. WIEDINMYER – is it worth it to just put the minimal amount of data on the website, considering we don’t have that many people here?
N. HEWITT – back to the fair use policy. The contributors must feel like they can get something out of their efforts. How will they benefit from getting their data on the database?
A. GUENTHER – one way to show a benefit is to show the site is getting hits. This will increase collaborations. What about showing citations that are based on the data that is posted?
I. GALBALLY – Another idea is before people can download data, they must register and they should commit that they will cite the appropriate reference, especially if it is unpublished data.
M. POTASNAK – how will the emission factor data be referenced?
I. GALBALLY – It should be referenced back to the database, to include those people (manager) that spend time preparing the recommendation.
S. OWEN – reference the database? How? This isn’t good format for most journals. You must reference the original authors of that work. Not the database
I. GALBALLY – if you cite the original emission factor, then cite the author of that work, but if you cite a group decision for the recommended emission factor?
C. WIEDINMYER – the JPL reference is cited, but it is published as a supplement every year. For example, Louisa Emmons has put together a database, but she has written a paper to go along with her work that can be used for reference.
A. GUENTHER – for the emission recommendation we need to have a reference, but for the emission data part the database should not be the reference. You should reference the individual researcher for those.
C. WIEDINMYER – there is already unpublished data on the site – for example, Rei Rasmussen has tons of information that hasn’t been published.
A. GUENTHER – that could be one of the steps of the protocol, to be published, and it should be important. But we should include both published and unpublished data on the website. Question for Ian Galbally: how did the data you’ve worked with in the past meet the protocol? Was it a binary yes/no qualification if they met the protocol?
I. GALBALLY – yes it is binary – there were major and minor protocols that had to be met
A. GUENTHER – should we do this? Or should we let the user decide how to use the data?
S. OWEN – C. Wiedinmyer is this your full time job?
C. WIEDINMYER – well, not really, and it would require someone full time to manage this. Maybe we need to look for someone to fund this position?
A. GUENTHER – well it is a high priority for C. Wiedinmyer whether she wants to accept this or not. But it depends on the community and what they want. NCAR does have the resources and community resources and the EPA interagency agreement, and if there is a need there will be a high priority for this. So the question about getting the resources – yes we can do it. But it would be nice to have some European input too.
N. HEWITT – yes there is a need for this database.
C. WIEDINMYER – about 90 different people have logged on to the current website and we have a record of those that have logged on. So far the list of people in the community is limited to just those that I have contacted, but it will continue to grow.
N. HEWITT – people outside of our community would also be interested in this database, for example agency people that work on emissions. And they are a potential source of funding.
A. GUENTHER - Could we get some help (not necessarily funding) from other agencies such as TX? Help in terms of managing databases etc.
C. WIEDINMYER – there has been a lot of interest from these other agencies – their interest is air pollution control strategy studies. And they are interested in the community input. So help from the research community is very important.
Other Notes:
Data submittal – Formatting
If a format is recommended for data submitted to this site, it MUST be:
user-friendly (for both user and submitter)
focused
structured so that people can put as much or as little into it as they wish
well-documented
simple and easy to read
Q: who will put the data into a specified format? Will it be the PI or the database manager?
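As one possible illustration of the formatting requirements listed above, a plain ASCII submission could carry optional "key: value" metadata lines ahead of comma-separated data, so that a contributor can supply as much or as little metadata as they wish. The sketch below is in Python. The "#" header convention, the field names (site, species, published), the function name parse_submission and the sample values are all hypothetical placeholders; none of them were agreed at the meeting.

```python
import csv
import io


def parse_submission(text):
    """Split an ASCII submission into (metadata dict, list of data rows).

    Assumed layout (hypothetical, not a format agreed at the meeting):
    lines starting with '#' hold optional 'key: value' metadata; the first
    non-comment line is a comma-separated column header; the rest is data.
    """
    metadata = {}
    data_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith("#"):
            key, sep, value = stripped[1:].partition(":")
            if sep:  # keep only well-formed 'key: value' lines
                metadata[key.strip().lower()] = value.strip()
        else:
            data_lines.append(stripped)
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return metadata, rows


# Invented placeholder content, not real measurements.
example = """\
# site: example forest
# species: example species
# published: no
time,isoprene_flux
12:00,1.5
12:30,1.7
"""

meta, rows = parse_submission(example)
print(meta["published"], len(rows))
```

Because every metadata line is optional, a file containing only the column header and data rows parses the same way. A "published" field of this kind would also let the site indicate whether a dataset has been published, as was suggested in the discussion.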
This is an evolving process, and can be modified as we go…. We need to build a web site and database that can allow for development.
A bulletin board for comments and collaborative discussions should be set up.
The FAIR USE agreement needs to make contributors feel like they are getting something out of putting information on the site. It should show that the web site and database are being accessed, and show the published papers that have utilized the database/web site.
Some of the criteria for the recommended emission factors:
data must be published
unequivocal identification of species…
etc. (Ian makes a list later in the day)
If rating data, make a protocol that identifies a yes or no to the criteria for each data point. Then a rating (e.g. 50% of the criteria have been met) can be assigned to each point. Then should we summarize this, or let the user decide?
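A minimal sketch of the yes/no rating idea, in Python, might look like the following. It assumes every criterion counts equally, which the notes do not specify (the discussion also mentioned major and minor protocols). Only the first two criterion names echo the notes; the other two, and the function name rate_data_point, are hypothetical.

```python
def rate_data_point(criteria_met):
    """Return the fraction of criteria met, given a dict of criterion -> bool."""
    if not criteria_met:
        raise ValueError("at least one criterion is required")
    return sum(1 for met in criteria_met.values() if met) / len(criteria_met)


# Yes/no answers for one illustrative data point.
point = {
    "published": True,
    "unequivocal_species_identification": True,
    "light_and_temperature_reported": False,  # hypothetical criterion
    "sun_or_shade_leaf_recorded": False,      # hypothetical criterion
}

rating = rate_data_point(point)
print(f"{rating:.0%} of the criteria met")  # prints "50% of the criteria met"
```

Storing the individual yes/no answers alongside the summary rating would leave open the choice raised in the notes: the site can present the summary, or users can apply their own judgment to the underlying answers.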