Data Management … a “nuts-and-bolts” part of Responsible Conduct of Research
March 21, 2015

Enid Karr, Sr. Bibliographer for Biology, Earth & Environmental Sciences, Environmental Studies
enid.karry@bc.edu
Sally Wyman, Collection Development Librarian, Sr. Bibliographer for Chemistry, Physics, Environmental Studies
sally.wyman@bc.edu
Barbara Mento, Data/GIS Librarian, Sr. Bibliographer for Computer Science, Economics, Mathematics
barbara.mento@bc.edu

A “Typical” Data Management Plan
1-2 pages describing the project and how data will be:
§ Collected (including formats, size, etc.) … Secured … Analyzed … Shared … Preserved
Details about access/sharing
§ Potential audience(s) for the data
§ How access will be provided and how others will find it: “Access” (freely-available) vs. “Sharing” (by request)
§ Stipulations for privacy, confidentiality, IP or other rights
§ Allowed re-use of the data, derivative products
Metadata standards to be used
How long data will be retained -- archiving, long-term preservation and format migration

File Formats
Whenever possible, save your data using open standards. Avoid proprietary formats. Some examples:
§ TXT, PDF/PDF Archival, not Word (doc, docx)
§ ASCII, not Excel (xls, xlsx)
§ MPEG-4, not QuickTime (qtff)
§ TIFF or JPEG2000, not GIF or JPG
§ XML or RDF, not RDBMS
Ideally, save files in both original format AND one of the preferred ones listed above.
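As a rough illustration of the File Formats advice, the short Python sketch below flags files saved in a proprietary format and suggests the open alternative named on the slide. The function name `suggest_open_format`, the extension table, the `.mov`/`.qt` extensions for QuickTime, and the reading of "ASCII" as CSV are assumptions made for this example, not part of the original presentation.

```python
from pathlib import Path

# Proprietary extension -> open alternative from the File Formats slide.
# The table itself is illustrative; extend it for your own project.
OPEN_ALTERNATIVES = {
    ".doc": "TXT or PDF/A",
    ".docx": "TXT or PDF/A",
    ".xls": "ASCII text such as CSV",
    ".xlsx": "ASCII text such as CSV",
    ".mov": "MPEG-4",
    ".qt": "MPEG-4",
    ".gif": "TIFF or JPEG2000",
    ".jpg": "TIFF or JPEG2000",
}


def suggest_open_format(filename):
    """Return the suggested open format, or None if there is no suggestion."""
    suffix = Path(filename).suffix.lower()
    return OPEN_ALTERNATIVES.get(suffix)


if __name__ == "__main__":
    for name in ["survey_results.xlsx", "field_notes.txt", "site_photo.JPG"]:
        suggestion = suggest_open_format(name)
        if suggestion:
            print(f"{name}: also save a copy as {suggestion}")
        else:
            print(f"{name}: no change suggested")
```

The sketch only reports suggestions; the actual conversion still has to be done in the software that produced the file, keeping the original alongside the open-format copy.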
Organization
File Naming Conventions/Best Practices
§ Consistent, descriptive, UNIQUE … avoid spaces and special characters
§ Use brief names
§ Can contain:
  § Project acronyms
  § Researchers’ initials
  § File type information
  § Version number
  § Date
  § File Status
IUS_v02_092011_final.csv
Internet Usage Study version 2, Sept 2011, final draft, in csv format

Data Entry and Quality Control
§ Whatever you use, be consistent
§ Define abbreviations in a readme.txt file or in a “codebook”
§ Record dates for best sorting (YYYYMMDD)
§ Check periodically for data corruption/integrity, using a checksum, for example
§ Flag problematic data
§ Handling of null values: problematic in moving across software platforms
  § Consider using blanks: treated as null values by R, Python, Excel
  § Don’t use text (as in, “no data”) in a data column formatted for numbers
§ Avoid manual data entry whenever possible
§ Consider making your raw data files “read only”

Citing Data Sets
Essential citation elements; style will vary:
• author or creator
• title or description
• year of publication
• publisher and/or the database/archive from which it was retrieved
• the URL or DOI if the data set is online
National Center for Biotechnology Information. PubChem Compound Database; CID=5934766, http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5934766 (accessed Feb. 22, 2011).
Mackey, R.A., Mackey, E.F., and O’Brien, B.A. (1990). Lasting relationships research data archive (eScholarship version) [Data file]. Boston College School of Social Work.
http://hdl.handle.net/2345/2228

Additional Support
§ The Libraries
  § The Data Management LibGuide libguides.bc.edu/dataplan
  § Subject Specialists www.bc.edu/libraries/help/askalib.html
  § eScholarship@BC escholarship.bc.edu
§ The Office for Sponsored Programs Research http://www.bc.edu/research/osp.html
§ ITS/Research Services http://www.bc.edu/offices/researchservices/
§ Office for Research Integrity and Compliance http://www.bc.edu/research/oric/compliance.html
§ The Office for Technology Transfer and Licensing http://www.bc.edu/research/ottl/

Some Useful Links
§ Data Management and Sharing Snafu in 3 Short Acts (NYU Health Sciences Library) https://www.youtube.com/watch?v=N2zK3sAtr-4
§ DataOne Best Practices https://www.dataone.org/all-best-practices-download-pdf
§ DCC (Digital Curation Centre) Disciplinary Metadata Standards http://www.dcc.ac.uk//resources/metadata-standards
§ DCC (Digital Curation Centre) Metadata Standards – Physical Sciences http://www.dcc.ac.uk/resources/subject-areas/physical-science
§ Guide to Writing “Readme” Style Metadata (Cornell) http://data.research.cornell.edu/content/readme
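
The file-naming and quality-control practices described earlier (descriptive names, sortable dates, checksums, read-only raw files) could be sketched in Python roughly as follows. The helper names `build_name`, `sha256_of`, and `make_read_only` are hypothetical, and SHA-256 is just one possible checksum. Note that the sketch writes the date as YYYYMMDD, following the date-sorting recommendation, rather than the MMYYYY form used in the IUS example.

```python
import hashlib
import os
import stat
from datetime import date


def build_name(acronym, version, when, status, ext):
    """Assemble a file name from project acronym, version, date, and status."""
    # No spaces or special characters; the date is written as YYYYMMDD.
    return f"{acronym}_v{version:02d}_{when:%Y%m%d}_{status}.{ext}"


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Clear the write permission bits so raw data is not edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

For example, `build_name("IUS", 2, date(2011, 9, 15), "final", "csv")` returns `IUS_v02_20110915_final.csv` (the day of the month is made up for illustration). Recording the output of `sha256_of` when a raw file is first stored, and comparing it against a fresh digest later, is one way to carry out the periodic integrity check.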