2011NovAPECSWebinar - Federation of Earth Science

Responsible Data Use and Local Data Management
Ruth Duerr
National Snow and Ice Data Center
Responsible Data Use and Local Data Management; Presented 8 Nov 2011, APECS Webinar
Overview
• Responsible Data Use
  • Fair access and use
  • Data restrictions
  • Citation and credit
  • Providing feedback
• Local Data Management
  • File names
  • Directory structures
  • Backing up your data
  • Data formats
  • Documentation and metadata
Responsible Data Use
(or what should you do if you find yourself re-using someone else’s data)
Your Responsibilities as a Data User
• Determining the suitability of data for your purposes
• Following applicable data access and use policies
• Giving credit to archives and data creators
• Providing the data source with feedback about any errors
or limitations you discover in the data
Just because it is “good” data, doesn’t mean that it is right
for your project!
Corollary
Just because it isn’t right for your project, doesn’t mean that
it is “bad” data!
Hints for Determining Data Suitability
• Read any papers, documentation and metadata provided –
it is there for a reason!
• See http://nsidc.org/data/mod10a1v5.html for an example of a fairly
well documented data set
• If you still have questions, check what support is available and,
if appropriate, ask!
• See http://nsidc.org/data/g02199.html for an example of a poorly
documented data set with an extremely low level of available support
• Be aware that due to documentation and support
limitations, the best data for your purposes may not be
available to or usable by you
A few words about data access and use
• The trend in many disciplines is towards greater data sharing, but…
• Norms vary by discipline (and country); for example, you may need to:
• Submit an application for access
• Sign a data transfer and usage agreement
• Travel to the repository to obtain access
• Moreover there are legitimate reasons for restricting access, for
example:
• To protect the confidentiality of human subjects
• To protect the rights of local and traditional knowledge holders
• To protect information that, if released, may cause harm (e.g., location of
endangered species, sacred sites, etc.) [1]
• It is your responsibility to understand and follow the norms for the data
you are using
[1] See the IPY Data Policy at classic.ipy.org/Subcommittees/final_ipy_data_policy.pdf
Would you share your data if you didn’t
know that you were going to be given
credit for your work?
So cite the data you use!
Data Citation – Now
• Currently data citation standards and requirements vary
1. From journal to journal
2. From repository to repository
3. From discipline to discipline
4. Sometimes from author to author
• Do your best to honor these existing norms
• What might a data citation look like?
• Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, J. Minster, J.
Spinhirne, and R. Thomas. 2003. GLAS/ICESat L1A Global Altimetry
Data V018, 15 October to 18 November 2003. National Snow and Ice
Data Center. Data set accessed 2011-07-21 at
doi:10.3334/NSIDC/gla01.
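The layout of the example citation above can be sketched as a small helper. This is only an illustration of that one example's layout (authors, year, title, archive, access date, DOI); the function name and its parameters are assumptions made here, and the journal or repository you are working with may require a different style.

```python
def format_data_citation(authors, year, title, archive, accessed, doi):
    """Build a data citation string laid out like the example on this slide.

    Hypothetical helper: the field order mirrors the slide's example only.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        author_text = authors[0]
    elif len(authors) == 2:
        author_text = f"{authors[0]} and {authors[1]}"
    else:
        author_text = ", ".join(authors[:-1]) + ", and " + authors[-1]
    # Avoid a doubled period when the last author ends in an initial.
    author_text = author_text.rstrip(".") + "."
    return (
        f"{author_text} {year}. {title}. {archive}. "
        f"Data set accessed {accessed} at doi:{doi}."
    )


citation = format_data_citation(
    authors=["Zwally, H.J.", "R. Schutz", "C. Bentley", "J. Bufton",
             "T. Herring", "J. Minster", "J. Spinhirne", "R. Thomas"],
    year=2003,
    title=("GLAS/ICESat L1A Global Altimetry Data V018, "
           "15 October to 18 November 2003"),
    archive="National Snow and Ice Data Center",
    accessed="2011-07-21",
    doi="10.3334/NSIDC/gla01",
)
print(citation)
```

Recording the access date alongside the DOI matters because a data set may be revised after you retrieve it.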
Data Citation – In the Near Future
• DataCite and other groups are working to make data
citation a normal part of the scientific process
• For example, as of this year (2011) Thomson Reuters Web of Science and
Web of Knowledge include published data sets (i.e., those that have a DOI)
Why provide feedback?
• Prevent other users from repeating your mistakes
• Improve the data or their documentation
• Better science, perhaps even new results, papers, and
collaborators
A few words about providing feedback
• Feedback to a PI
• Your reasons for using someone else’s data are likely different from
their reasons for acquiring it in the first place
• So, they probably weren’t thinking of your needs when they acquired and
documented the data and made it available
• Yet, if they thought their data would be useful to a community they
probably would be eager to help
• Diplomacy and tact may be called for (especially if you really think
you’ve found an error not just a documentation problem)
• Feedback to a data center is almost always welcome
Local Data Management
(or managing your own data)
The 5 P’s matter!
(prior planning prevents poor performance)
Local Data Management
• File names
• Directory structures
• Backing up your data
• Data formats
• Documentation and metadata
Dilbert’s file naming convention
Assign descriptive file names
• File names should be unique and reflect the file
contents
• Bad file names
• Mydata
• 2001_data
• A better file name might be
  • bigfoot_agro_2000_gpp.tif
    • BigFoot is the project name
    • Agro is the field site name
    • 2000 is the calendar year
    • GPP represents Gross Primary Productivity data
    • tif is the file type – GeoTIFF
• But only if you document the naming convention!
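One way to document a naming convention is to write it down as code that builds and checks names. The sketch below assumes the project_site_year_variable.extension pattern from the BigFoot example; the function names, the lower-casing rule, and the ban on underscores inside a field are assumptions made for illustration, not part of the original convention.

```python
from typing import NamedTuple


class DataFileName(NamedTuple):
    """Fields encoded in a name such as bigfoot_agro_2000_gpp.tif."""
    project: str
    site: str
    year: int
    variable: str
    extension: str


def build_name(project, site, year, variable, extension):
    """Assemble a file name from its documented parts (hypothetical helper)."""
    parts = [project, site, str(year), variable]
    for part in parts:
        # Underscores separate fields, so they cannot appear inside one.
        if not part or "_" in part:
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(p.lower() for p in parts) + "." + extension.lower()


def parse_name(filename):
    """Split a file name back into fields, rejecting names that do not fit."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        raise ValueError(f"no file extension in {filename!r}")
    fields = stem.split("_")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields in {filename!r}, got {len(fields)}")
    project, site, year, variable = fields
    return DataFileName(project, site, int(year), variable, extension)


name = build_name("BigFoot", "Agro", 2000, "GPP", "tif")
print(name)              # bigfoot_agro_2000_gpp.tif
print(parse_name(name))
```

Names such as Mydata or 2001_data fail the check in parse_name, which is the point: the convention is enforced as well as described.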
Organize files logically
• Make sure your file system is logical and efficient

Biodiversity
    Lake
        Experiments
            Biodiv_H20_heatExp_2005_2008.csv
            Biodiv_H20_predatorExp_2001_2003.csv
            …
        Field work
            Biodiv_H20_planktonCount_start2001_active.csv
            Biodiv_H20_chla_profiles_2003.csv
            …
    Grassland
Courtesy of S. Hampton, UC-Santa Barbara
Backup Your Data!!!
Why?
Do you think it's easy to recover data from
• A broken DVD?
• A burned-up memory stick?
• A drowned laptop?
• A crashed hard drive?
Backing up your data files
• Create back-up copies often
• Ideally three copies
• original, one on-site (external), and one off-site
• Frequency based on need / risk
• Higher value data should be backed up more often
• Sensor data collected at high frequency should be backed up more frequently
• Ensure that all backup copies are identical to the original
files
• Use checksums or file comparisons
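A checksum comparison of the kind suggested above might look like the following sketch. It uses SHA-256 from Python's standard hashlib module; the function names and the example paths in the usage note are placeholders chosen here, not something prescribed by the slides.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, *copies):
    """True only if every backup copy has the same checksum as the original."""
    expected = sha256_of(original)
    return all(sha256_of(copy) == expected for copy in copies)
```

For example, `copies_match("data/site1.csv", "/mnt/external/site1.csv")` (hypothetical paths) returns False if the external copy differs from the original by even one byte.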
Test your backups
• Automatically test backup copies of files frequently to
ensure they are viable
• Media degrade over time
• Test copies using checksums or file comparisons
• Be certain that you can recover from a data loss
• Periodically test your ability to restore information (at least once a
year)
• Simulate an actual loss, by trying to recover solely from the backed up
copies
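A restore test can be partly automated. The sketch below assumes you record a manifest (relative path and checksum for every file) while the originals are known to be good, then restore the backup into a scratch directory and compare. The function names and the manifest layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes loads the whole file; read very large files in chunks.
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = checksum
    return manifest


def restore_problems(expected_manifest, restored_root):
    """List files that are missing or changed in a restored copy."""
    actual = build_manifest(restored_root)
    problems = []
    for name, checksum in expected_manifest.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != checksum:
            problems.append(f"changed: {name}")
    return problems
```

An empty list from restore_problems means every file in the manifest came back intact. Storing the manifest separately from the backup media lets the check still work after the originals are lost.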
Data Formats – Best Practices
• Don’t use a proprietary format!
• These have a short shelf life and will probably become unreadable
after a few years
• Don’t invent your own format!
• No one but you will have the tools to read it
• Use open source, well-documented, community-based
standard formats wherever possible, especially if they are
self-describing
Self-describing data formats
• Information describing the data contents of the file is
embedded within the data file itself:
• Names for various fields
• Data types – Standardized, portable, machine independent
• Pointers to various fields, making it efficient to extract the particular
fields you want without reading the entire file
• Attributes and flags related to the primary fields with extra information
such as units, fill values, etc.
• Self-describing formats come with a standard API and portable data
access libraries in a variety of languages
• There are tools that can open and work with arbitrary files,
using the embedded descriptions to interpret the data.
Some example self-describing formats
• HDF – Hierarchical Data Format
• HDF4 and HDF5 versions are in use today
• A NASA variant called HDF-EOS is used within the Earth Observing
System program.
• NetCDF – Network Common Data Form
• Widely used by agencies including NASA and NOAA
• Climate and Forecast (CF) metadata conventions help standardize
how some things are described in NetCDF files.
Documentation (metadata)
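Poor data practice results in loss of information
[Figure: information content of a data set plotted against time. Content is highest at the time of publication and declines afterwards: specific details are lost first, then general details, with further losses at retirement or career change, accident, and death (Michener et al. 1997)]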
Don't you think it would be more efficient if you didn't have to remember
• the name of that file?
• the directory where you put it?
• the units those measurements were taken in?
• which sample site was which?
• etc.
Making Your Research Easier and Cheaper
Write it down!
Needed Documentation
• Who: Data Set Creator and Contact
• Where: Geographic Extent and Location of Data Set Coverage
• What: Title of Data Set and Keywords Describing the Data Set
• When: Temporal Coverage of the Data Set
• Why: Description and Purpose of the Data Set
• How: How the Data Set was Created and How to Access the Data
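The who/where/what/when/why/how checklist can be kept next to the data as a small machine-readable record. The sketch below writes such a record as JSON and refuses to do so while any of the six answers is blank. The key names, the helper functions, and every value in the example record are placeholders invented for illustration; a repository you deposit with will likely have its own metadata standard.

```python
import json

# The six questions from the slide, used here as hypothetical record keys.
REQUIRED_FIELDS = ("who", "what", "where", "when", "why", "how")


def missing_fields(record):
    """Return the checklist questions that the record leaves unanswered."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


def to_sidecar_text(record):
    """Serialize a complete record as JSON text for a sidecar file."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("undocumented: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values only; replace them with your own data set's details.
example_record = {
    "who": "Data set creator and contact (name, email)",
    "what": "Title of the data set; keywords",
    "where": "Geographic extent and location of coverage",
    "when": "Temporal coverage, e.g. start and end dates",
    "why": "Description and purpose of the data set",
    "how": "How the data were created and how to access them",
}
print(to_sidecar_text(example_record))
```

Saving this text in the same directory as the data file keeps the documentation from being separated from the data it describes.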
Documentation Best Practices
• Document your conventions as they’re established
• Revise documentation contemporaneously, not “after the fact”
• This work is also the basis for end-user or reviewer documentation
• What should you document? Everything!
• Data import, manipulation, QC procedures, special flags and encoding
• Naming conventions, layouts, headings, units and abbreviations
• Does TEMP mean “temporary,” “air temperature at time of observation,” or something else?
• Formulae and constants
References and Resources
• Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner, and S.G. Stafford. 1997. “Nongeospatial metadata for the ecological sciences.” Ecological Applications 7(1):330-342.
• Data management training materials in development are available at http://wiki.esipfed.org/index.php/Data_Management_Course_Outline
• A short list of data management related resources available on the web can be found at http://wiki.esipfed.org/index.php/Data_Management_Resources