Data Management …
a “nuts-and-bolts” part of
Responsible Conduct of Research
March 21, 2015
Enid Karr, Sr. Bibliographer for Biology, Earth & Environmental Sciences,
Environmental Studies
enid.karry@bc.edu
Sally Wyman, Collection Development Librarian, Sr. Bibliographer for
Chemistry, Physics, Environmental Studies
sally.wyman@bc.edu
Barbara Mento, Data/GIS Librarian, Sr. Bibliographer for Computer
Science, Economics, Mathematics
barbara.mento@bc.edu
A “Typical” Data Management Plan
1-2 pages describing the project and how data will be:
§  Collected (including formats, size, etc.) … Secured … Analyzed … Shared … Preserved
Details about access/sharing
§  Potential audience(s) for the data
§  How access will be provided and how others will find it: “Access” (freely available) vs. “Sharing” (by request)
§  Stipulations for privacy, confidentiality, IP or other rights
§  Allowed re-use of the data, derivative products
Metadata standards to be used
How long data will be retained: archiving, long-term preservation, and format migration
File Formats
Whenever possible, save your data using open standards.
Avoid proprietary formats. Some examples:
§  TXT, PDF or PDF/A (archival PDF), not Word (doc, docx)
§  ASCII text (e.g., CSV), not Excel (xls, xlsx)
§  MPEG-4, not QuickTime (qtff)
§  TIFF or JPEG2000, not GIF or JPG
§  XML or RDF, not RDBMS
Ideally, save files in both original format AND one of the preferred ones
listed above.
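As a rough illustration of keeping an open-format copy next to the original, the sketch below writes a small table as plain-text CSV using only Python's standard library. The function name `table_to_csv_text` and the sample column names are hypothetical, chosen for this example only.

```python
import csv
import io


def table_to_csv_text(header, rows):
    """Return the table as plain-text CSV (one header line, one line per row)."""
    buffer = io.StringIO()
    # Use "\n" line endings so the text looks the same on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    # csv.writer writes None as an empty (blank) cell.
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data: the second row has a missing measurement.
text = table_to_csv_text(["sample_id", "temp_c"], [["A1", 21.5], ["A2", None]])
print(text)
```

The returned text can then be saved with a `.csv` extension alongside the original spreadsheet, so both versions are kept.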
Organization
File Naming Conventions/Best Practices
§  Consistent, descriptive, UNIQUE … avoid spaces and special characters
§  Use brief names
§  Can contain:
  §  Project acronyms
  §  Researchers’ initials
  §  File type information
  §  Version number
  §  Date
  §  File status
Example: IUS_v02_092011_final.csv
(Internet Usage Study, version 2, Sept. 2011, final draft, in CSV format)
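One way to keep names consistent is to build them from their parts rather than typing them by hand. The sketch below is only an illustration: the function `build_file_name`, its parameter order, and the set of allowed characters are assumptions, not a prescribed standard. It reproduces the example name above.

```python
import re

# Assumed rule: letters, digits, underscores, hyphens, and dots only,
# which rules out spaces and special characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def build_file_name(project, version, date, status, extension):
    """Join the name components with underscores and check the result."""
    parts = [project, f"v{version:02d}", date, status]
    name = "_".join(parts) + "." + extension
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe characters in file name: {name!r}")
    return name


print(build_file_name("IUS", 2, "092011", "final", "csv"))
# prints: IUS_v02_092011_final.csv
```

A name containing a space, such as a project label of "my study", raises `ValueError` instead of producing a file that is awkward to handle in scripts.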
Data Entry and Quality Control
§  Whatever you use, be consistent
§  Define abbreviations in a readme.txt file or in a “codebook”
§  Record dates for best sorting (YYYYMMDD)
§  Check periodically for data corruption/integrity using a checksum, for example
§  Flag problematic data
§  Handling of null values: problematic in moving across software platforms
  §  Consider using blanks: treated as null values by R, Python, Excel
  §  Don’t use text (as in, “no data”) in a data column formatted for numbers
§  Avoid manual data entry whenever possible
§  Consider making your raw data files “read only”
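Three of the practices above (sortable dates, checksums, and read-only raw files) can be sketched with Python's standard library. This is a minimal illustration, not a recommended tool: the function names are hypothetical, and SHA-256 is an assumed choice of checksum algorithm.

```python
import hashlib
import os
import stat
from datetime import date


def sortable_date(day):
    """Format a date as YYYYMMDD so that text order matches date order."""
    return day.strftime("%Y%m%d")


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Remove write permission for owner, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A possible routine: record `file_checksum(...)` for each raw file when it is first stored, call `make_read_only(...)` on it, and recompute the checksum from time to time. A changed value suggests the file was altered or corrupted.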
Citing Data Sets
Essential citation elements; style will vary:
•  author or creator
•  title or description
•  year of publication
•  publisher and/or the database/archive from which it was retrieved
•  the URL or DOI if the data set is online
National Center for Biotechnology Information.
PubChem Compound Database; CID=5934766,
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5934766
(accessed Feb. 22, 2011).
Mackey, R.A., Mackey, E.F., and O’Brien, B.A.
(1990). Lasting relationships research data archive
(eScholarship version) [Data file]. Boston
College School of Social Work.
http://hdl.handle.net/2345/2228
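Because citation style varies, it can help to store the essential elements separately and assemble them only when needed. The sketch below is one possible arrangement, not a citation standard: the class `DataCitation`, its field names, and the output pattern (modeled on the second example above) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    creator: str    # author or creator
    year: str       # year of publication
    title: str      # title or description
    publisher: str  # publisher and/or database/archive
    locator: str    # URL or DOI, if the data set is online

    def format(self):
        """Assemble the elements in one assumed author-date pattern."""
        return (
            f"{self.creator} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.locator}"
        )


citation = DataCitation(
    creator="Mackey, R.A., Mackey, E.F., and O'Brien, B.A.",
    year="1990",
    title="Lasting relationships research data archive (eScholarship version) [Data file]",
    publisher="Boston College School of Social Work",
    locator="http://hdl.handle.net/2345/2228",
)
print(citation.format())
```

Keeping the elements as separate fields means the same record can be reformatted later if a journal or funder requires a different style.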
Additional Support
§  The Libraries
  §  The Data Management LibGuide
  libguides.bc.edu/dataplan
  §  Subject Specialists
  www.bc.edu/libraries/help/askalib.html
  §  eScholarship@BC
  escholarship.bc.edu
§  The Office for Sponsored Programs Research
http://www.bc.edu/research/osp.html
§  ITS/Research Services
http://www.bc.edu/offices/researchservices/
§  Office for Research Integrity and Compliance
http://www.bc.edu/research/oric/compliance.html
§  The Office for Technology Transfer and Licensing
http://www.bc.edu/research/ottl/
Some Useful Links
§  Data Management and Sharing Snafu in 3 Short Acts (NYU Health Sciences Library)
https://www.youtube.com/watch?v=N2zK3sAtr-4
§  DataOne Best Practices
https://www.dataone.org/all-best-practices-download-pdf
§  DCC (Digital Curation Center) Disciplinary Metadata Standards
http://www.dcc.ac.uk/resources/metadata-standards
§  DCC (Digital Curation Center) Metadata Standards – Physical Sciences
http://www.dcc.ac.uk/resources/subject-areas/physical-science
§  Guide to Writing “Readme” Style Metadata (Cornell)
http://data.research.cornell.edu/content/readme