Presentation to ARGIS - Atlanta Region GIS User Group
October 30, 2013

Jennifer Doty | jennifer.doty@emory.edu
Data Management Specialist
Emory Center for Digital Scholarship

Overview

Best practices for managing geospatial data:
• File formats
• Naming conventions
• Folder structure
• Storage and backup
• Documentation

Trends in geospatial data archiving:
• Federal funding agencies’ requirements
• State initiatives for preservation

Best Practices: File Formats

Type of data: Geospatial data (vector and raster data)

Acceptable formats for sharing, reuse and preservation:
• ESRI Shapefile (essential - .shp, .shx, .dbf, optional - .prj, .sbx, .sbn)
• geo-referenced TIFF (.tif, .tfw)
• CAD data (.dwg)
• tabular GIS attribute data

Other acceptable formats for data preservation:
• ESRI Geodatabase format (.mdb, .gdb)
• MapInfo Interchange Format (.mif) for vector data
• Keyhole Mark-up Language (KML) (.kml)
• Adobe Illustrator (.ai), CAD data (.dxf or .svg)
• binary formats of GIS and CAD packages

UK Data Archive File Formats guide, http://www.data-archive.ac.uk/create-manage/format/formats-table

Best Practices: File Formats

GeoMAPP Geospatial Data File Formats Reference Guide:
• provides quick reference of common geospatial raster and vector dataset types
• serves as tool to identify geospatial format types based on file extensions
• also includes information on standards and specifications for documenting geospatial data

http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls

Best Practices: Naming Conventions

• Create meaningful but brief naming conventions for your project
• Use file names to classify broad types of files
• Avoid using spaces and special characters
• Begin names with letters, not numbers, e.g.
Census2010_blockgroups_GA, not 2010Census…
• Avoid very long file names

Best Practices: Naming Conventions

Example: keyword_steward_extent_date.ext
• Keyword (essential): be as descriptive of the contents of the data as possible by using a word or short phrase
• Steward (essential): either the creator of the dataset or the last one to make a significant modification to it
• Extent (optional): may be included to indicate resolution of the data (e.g. county, state, or international)
• Date (optional): may be used to indicate the date of creation or the age range of the content. Recommended format is YYYYMMDD

Indiana Geographic Information Council, http://www.igic.org/standards/namingstandard.pdf

Best Practices: Naming Conventions

Versioning:
• useful to indicate file revisions or edits, especially in collaborations
• can be through discrete or continuous numbering, depending on minor or major revisions
  – think of software versioning: ArcGIS 10 was a significant change from 9.x, but ArcGIS 10.1 was a (relatively) minor change to 10

Best Practices: Folder Structure

• Separate directories for scratch workspace and final data
• Hierarchy: is deep or shallow best for your project?

Image: Tape library, CERN, Geneva by Cory Doctorow / CC BY-SA 2.0

Best Practices: Storage & Backup

Storage

Considerations:
• Accessibility
• Read/Write speed
• Size limits: overall vs. file size

Options:
• Local: PC drive, flash drive, external hard drive
• Server: department/organization server space
• Cloud: Dropbox, Google Drive, etc.
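
Returning to the keyword_steward_extent_date.ext convention from the naming slides, the rules (essential keyword and steward, optional extent and date, no spaces or special characters, names that begin with a letter, dates as YYYYMMDD) could be sketched as a small helper. This is only an illustration: the function name build_name, the cleaning rule, and the sample values "ARC" and "GA" are assumptions made for this example and are not taken from the IGIC standard.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical helper: the name and the exact cleaning rule are assumptions
# for illustration, not part of the IGIC naming standard.
_UNSAFE = re.compile(r"[^A-Za-z0-9]+")


def _clean(part: str) -> str:
    """Remove spaces and special characters, keeping letters and digits."""
    return _UNSAFE.sub("", part)


def build_name(keyword: str, steward: str, extent: Optional[str] = None,
               when: Optional[date] = None, ext: str = "shp") -> str:
    """Assemble keyword_steward[_extent][_date].ext."""
    parts = [_clean(keyword), _clean(steward)]
    if not parts[0] or not parts[1]:
        raise ValueError("keyword and steward are essential")
    if not parts[0][0].isalpha():
        raise ValueError("names should begin with a letter, not a number")
    if extent:
        parts.append(_clean(extent))
    if when:
        # Recommended date format from the slide: YYYYMMDD
        parts.append(when.strftime("%Y%m%d"))
    return "_".join(parts) + "." + ext.lstrip(".")


# Example with made-up values:
print(build_name("Census block groups", "ARC", "GA", date(2013, 10, 30)))
# prints Censusblockgroups_ARC_GA_20131030.shp
```

Rejecting a keyword that starts with a digit, rather than silently rewriting it, keeps the decision about a better name with the person who knows the data.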
Best Practices: Storage & Backup

Backup

Considerations:
• Accessibility (local, server, cloud)
• Redundancy (rule of thumb: here, near, far)

Options:
• Incremental/Snapshot
• Automated

Image: Metadata is a love note… by sarah0s / CC BY-NC-ND 2.0

Best Practices: Documentation

“When thoughtfully populated, geospatial metadata can be a critical resource for understanding and managing geospatial data for current and future GIS practitioners and those trying to preserve the data.”
- Utilizing Geospatial Metadata to Support Data Preservation Practices, January 2011, GeoMAPP (http://www.geomapp.net/publications_categories.htm)

Best Practices: Documentation

Metadata: represents the who, what, when, where, why and how

Standards:
• CSDGM (FGDC)
• ISO 19115:2003 / 19139

FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM)
http://www.fgdc.gov/csdgmgraphical/index.html

CSDGM Fields for Preservation

Checklist: CSDGM Fields for Preservation

Identification Information - basic info about data set, including:
• party responsible: usually creator
• publication date: date the data set is completed and ready for use
• title: “where” “what” “when”
• maintenance/update frequency: annually, as needed, based on census, etc.
• bounding coordinates
• keywords (theme and place)
• access and use constraints: any restrictions, disclaimers, or guidance on data set attribution
• contact details

GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf

Checklist: CSDGM Fields for Preservation

Data Quality Information - provides historical lineage and source descriptions for the data used in the creation of the data set, including:
• originator
• publisher, publication date & place
• “currentness” of source data
• process description

GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf

Checklist: CSDGM Fields for Preservation

Spatial Reference Information - description of the reference frame for, and the means to encode, coordinates in the data set, including:
• map projection name
• coordinate system name
• unit of measure
• geodetic model: datum, ellipsoid

GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf

Checklist: CSDGM Fields for Preservation

Entity and Attribute Information - details about content of the data set (the entities, their attributes, and domains from which attribute values may be assigned), including:
• entity label
• attribute label and description

GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf

Checklist: CSDGM Fields for Preservation

Metadata Reference Information - information on the party responsible for creating the metadata and the currentness of the metadata:
• metadata standard name
• metadata standard version

GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf

Data Management Initiatives

Federal agency mandates for sponsored research:
• NSF & NIH requirements for DM plans
• GIS Inventory (Ramona) & Federal Grants data sharing plans: gisinventory.net

Other related initiatives:
• USGS DM working group
• DM training for early career researchers

FGDC Geospatial Data Lifecycle Model
http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf

State & National Initiatives in Geospatial Data Archiving

GeoMAPP - Geospatial Multistate Archive and Preservation Partnership (www.geomapp.net):
• federally funded partnership between the Library of Congress and state geospatial and archives staff from North Carolina, Kentucky, Montana, and Utah

National Digital Stewardship Alliance (NDSA), Geospatial Content Team (www.digitalpreservation.gov/ndsa):
• report identifying appraisal and selection activities as they affect decisions defining geospatial content of enduring value for the nation

Open GeoPortal @ Emory

Image: NASA Goddard Photo and Video / CC BY

Image: Green Question Mark by mikecogh on Flickr / CC BY

Contact Information:

Jennifer Doty | jennifer.doty@emory.edu
Data Management Specialist

Michael Page | michael.page@emory.edu
Geographer & Geospatial Data Librarian

Emory Center for Digital Scholarship
digitalscholarship.emory.edu