Best Practices for GIS - the Atlanta Regional Commission

Presentation to ARGIS - Atlanta Region GIS User Group
October 30, 2013
Jennifer Doty | jennifer.doty@emory.edu
Data Management Specialist
Emory Center for Digital Scholarship
Overview
Best practices for managing geospatial data:
• File formats
• Naming conventions
• Folder structure
• Storage and backup
• Documentation
Trends in geospatial data archiving:
• Federal funding agencies’ requirements
• State initiatives for preservation
Best Practices: File Formats
Type of data: Geospatial data (vector and raster data)
Acceptable formats for sharing, reuse and preservation:
• ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
• geo-referenced TIFF (.tif, .tfw)
• CAD data (.dwg)
• tabular GIS attribute data
Other acceptable formats for data preservation:
• ESRI Geodatabase format (.mdb, .gdb)
• MapInfo Interchange Format (.mif) for vector data
• Keyhole Mark-up Language (KML) (.kml)
• Adobe Illustrator (.ai), CAD data (.dxf or .svg)
• binary formats of GIS and CAD packages
UK Data Archive File Formats guide, http://www.data-archive.ac.uk/create-manage/format/formats-table
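A Shapefile is only usable when its essential sidecar files travel with it. The sketch below is one possible way to check that before sharing or archiving, using the extensions listed in the Shapefile entry above. The function name `check_shapefile` is hypothetical, and the check assumes lowercase extensions.

```python
from pathlib import Path

# Extensions as listed in the Shapefile entry above.
ESSENTIAL = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbx", ".sbn")


def check_shapefile(shp_path):
    """Report which component files are missing next to a .shp file.

    Hypothetical helper: it only tests for files that share the same
    base name and use lowercase extensions.
    """
    base = Path(shp_path)
    present = {
        ext for ext in ESSENTIAL + OPTIONAL
        if base.with_suffix(ext).exists()
    }
    return {
        "missing_essential": [e for e in ESSENTIAL if e not in present],
        "missing_optional": [e for e in OPTIONAL if e not in present],
    }
```

An empty `missing_essential` list suggests the dataset is complete enough to open; a missing `.prj` is worth flagging because the coordinate system would then be undocumented.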
Best Practices: File Formats
GeoMAPP Geospatial Data File Formats
Reference Guide:
• provides quick reference of common
geospatial raster and vector dataset types
• serves as tool to identify geospatial format
types based on file extensions
• also includes information on standards and
specifications for documenting geospatial data
http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls
Best Practices: Naming Conventions
• Create meaningful but brief naming
conventions for your project
• Use file names to classify broad types of files
• Avoid using spaces and special characters
• Begin names with letters, not numbers
e.g. Census2010_blockgroups_GA, not 2010Census…
• Avoid very long file names
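The rules above can be checked automatically. This is a minimal sketch, not a standard: the 40-character limit and the choice to allow only letters, digits, underscores and hyphens are assumptions, and `name_problems` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed limit; the guidance only says to avoid very long names.
MAX_STEM_LENGTH = 40


def name_problems(filename):
    """Return a list of naming-rule violations for one file name."""
    stem = Path(filename).stem  # name without the extension
    problems = []
    if " " in stem:
        problems.append("contains spaces")
    # Assumption: anything other than letters, digits, "_" and "-"
    # counts as a special character (spaces are reported separately).
    if re.search(r"[^A-Za-z0-9_\- ]", stem):
        problems.append("contains special characters")
    if not stem[:1].isalpha():
        problems.append("does not begin with a letter")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("too long")
    return problems
```

For example, `name_problems("Census2010_blockgroups_GA.shp")` returns an empty list, while `name_problems("2010Census.shp")` reports that the name does not begin with a letter.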
Best Practices: Naming Conventions
Example: keyword_steward_extent_date.ext
• Keyword (essential)—be as descriptive of the contents of
the data as possible by using a word or short phrase
• Steward (essential)—either the creator of the dataset or
the last one to make a significant modification to a dataset
• Extent (optional)—may be included to indicate resolution
of the data (e.g. county, state, or international)
• Date (optional)—may be used to indicate the date of
creation or the age range of the content. Recommended
format is YYYYMMDD
Indiana Geographic Information Council, http://www.igic.org/standards/namingstandard.pdf
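A small helper can keep the keyword_steward_extent_date.ext pattern consistent across a team. The sketch below assumes the parts themselves contain no underscores; `build_name` and the sample values are hypothetical.

```python
from datetime import date


def build_name(keyword, steward, ext, extent=None, when=None):
    """Assemble keyword_steward[_extent][_date].ext.

    keyword and steward are essential; extent and date are optional.
    The date is written as YYYYMMDD, as recommended above.
    """
    parts = [keyword, steward]
    if extent:
        parts.append(extent)
    if when:
        parts.append(when.strftime("%Y%m%d"))
    return "_".join(parts) + "." + ext.lstrip(".")


# Hypothetical example values:
print(build_name("blockgroups", "ARC", "shp",
                 extent="GA", when=date(2013, 10, 30)))
```

With these sample values the result is `blockgroups_ARC_GA_20131030.shp`.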
Best Practices: Naming Conventions
Versioning:
• useful to indicate file revisions or edits,
especially in collaborations
• can be through discrete or continuous
numbering, depending on minor or major
revisions
– think of software versioning: ArcGIS 10 was a
significant change from 9.x, but ArcGIS 10.1 was a
(relatively) minor change to 10
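One way to apply major and minor numbering to file names is a version suffix. The `_v<major>_<minor>` suffix below is an assumed convention chosen to avoid extra dots in file names; it is not prescribed by the presentation, and `bump_version` is a hypothetical helper.

```python
import re

# Assumed suffix convention: name_v<major>_<minor>
_VERSION = re.compile(r"_v(\d+)_(\d+)$")


def bump_version(stem, major=False):
    """Return the file stem with its version suffix incremented.

    A stem without a suffix starts at v1_0. A major revision resets
    the minor number to 0; a minor revision increments it.
    """
    match = _VERSION.search(stem)
    if match is None:
        return f"{stem}_v1_0"
    major_no, minor_no = int(match.group(1)), int(match.group(2))
    if major:
        major_no, minor_no = major_no + 1, 0
    else:
        minor_no += 1
    return f"{stem[:match.start()]}_v{major_no}_{minor_no}"
```

A small edit to `roads_ARC_v1_0` would become `roads_ARC_v1_1`, while a substantial rework would become `roads_ARC_v2_0`.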
Best Practices: Folder Structure
• Separate directories for scratch workspace
and final data
• Hierarchy—is deep or shallow best for your
project?
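As an illustration only, the sketch below sets up separate scratch and final directories and reports how deep a project tree has become, which may help when weighing a deep against a shallow hierarchy. The folder names and both function names are assumptions.

```python
from pathlib import Path


def make_project_dirs(root, subfolders=("scratch", "final")):
    """Create the working folders under root (names are assumed)."""
    root = Path(root)
    created = []
    for name in subfolders:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def max_depth(root):
    """Return the number of folder levels below root (0 if none)."""
    root = Path(root)
    depths = [
        len(p.relative_to(root).parts)
        for p in root.rglob("*") if p.is_dir()
    ]
    return max(depths, default=0)
```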
Tape library, CERN, Geneva by Cory Doctorow / CC BY-SA 2.0
Best Practices: Storage & Backup
Storage Considerations:
• Accessibility
• Read/Write speed
• Size limits—overall vs. file size
Options:
• Local—PC drive, flash drive, external hard drive
• Server—department/organization server space
• Cloud—Dropbox, Google Drive, etc.
Best Practices: Storage & Backup
Backup Considerations:
• Accessibility (local, server, cloud)
• Redundancy (rule of thumb—here, near, far)
Options:
• Incremental/Snapshot
• Automated
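To make the incremental option concrete, here is a minimal sketch that copies only files that are new or have changed since the last run. It is not a substitute for dedicated backup software. It assumes the destination lies outside the source folder, and it treats a file as changed when its size differs or the source is more than two seconds newer (an assumed tolerance for file systems with coarse timestamps). `incremental_backup` is a hypothetical name.

```python
import shutil
from pathlib import Path

# Assumed tolerance for file systems with coarse timestamps.
MTIME_TOLERANCE_SECONDS = 2


def incremental_backup(source, dest):
    """Copy new or changed files from source to dest.

    Returns the relative paths that were copied on this run.
    """
    source, dest = Path(source), Path(dest)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        target = dest / rel
        if target.exists():
            s, t = src.stat(), target.stat()
            same_size = s.st_size == t.st_size
            not_newer = s.st_mtime <= t.st_mtime + MTIME_TOLERANCE_SECONDS
            if same_size and not_newer:
                continue  # unchanged since the last backup
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

Running the same function against a local drive, a server share and a synced cloud folder is one way to approximate the "here, near, far" rule of thumb; scheduling it covers the automated option.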
Metadata is a love note… by sarah0s / CC BY-NC-ND 2.0
Best Practices: Documentation
“When thoughtfully populated, geospatial
metadata can be a critical resource for
understanding and managing geospatial data for
current and future GIS practitioners and those
trying to preserve the data.”
-Utilizing Geospatial Metadata to Support Data Preservation
Practices, January 2011, GeoMAPP
(http://www.geomapp.net/publications_categories.htm)
Best Practices: Documentation
Metadata—represents the who, what, when,
where, why and how
Standards:
• CSDGM (FGDC)
• ISO 19115:2003 / 19139
FGDC’s Content Standard
for Digital Geospatial
Metadata (CSDGM)
http://www.fgdc.gov/csdgmgraphical/index.html
CSDGM Fields for
Preservation
Checklist: CSDGM Fields for
Preservation
Identification Information - basic info about data set, including:
• party responsible—usually creator
• publication date—date the data set is completed and ready for use
• title (“where,” “what,” “when”)
• maintenance/update frequency—annually, as needed, based on
census, etc.
• bounding coordinates
• keywords (theme and place)
• access and use constraints—any restrictions, disclaimers, or
guidance on data set attribution
• contact details
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for
Preservation
Data Quality Information – provides historical
lineage and source descriptions for the data
used in the creation of the data set, including:
• originator
• publisher, publication date & place
• “currentness” of source data
• process description
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for
Preservation
Spatial Reference Information - description of
the reference frame for, and the means to
encode, coordinates in the data set, including:
• map projection name
• coordinate system name
• unit of measure
• geodetic model—datum, ellipsoid
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for
Preservation
Entity and Attribute Information - details about
content of the data set—the entities, their
attributes, and domains from which attribute
values may be assigned, including:
• entity label
• attribute label and description
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
Checklist: CSDGM Fields for
Preservation
Metadata Reference Information - information
on the party responsible for creating the
metadata and the currentness of the metadata:
• metadata standard name
• metadata standard version
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
http://www.geomapp.net/docs/GeoMetadata_Items_for_Preservation_2011_0110.pdf
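The checklist items above lend themselves to an automated check of a CSDGM XML record. The sketch below covers a subset of them. The element paths use the short tag names of the FGDC CSDGM XML encoding as best understood here and should be verified against the standard before relying on them; `missing_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Checklist label -> path below the root <metadata> element.
# Assumption: short tag names follow the FGDC CSDGM XML encoding.
CHECKLIST = {
    "originator": "idinfo/citation/citeinfo/origin",
    "publication date": "idinfo/citation/citeinfo/pubdate",
    "title": "idinfo/citation/citeinfo/title",
    "update frequency": "idinfo/status/update",
    "bounding coordinates": "idinfo/spdom/bounding",
    "theme keyword": "idinfo/keywords/theme/themekey",
    "place keyword": "idinfo/keywords/place/placekey",
    "access constraints": "idinfo/accconst",
    "use constraints": "idinfo/useconst",
    "contact": "idinfo/ptcontac",
    "process description": "dataqual/lineage/procstep/procdesc",
    "metadata standard name": "metainfo/metstdn",
    "metadata standard version": "metainfo/metstdv",
}


def missing_fields(xml_text):
    """Return checklist labels that are absent or empty in the record."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in CHECKLIST.items():
        node = root.find(path)
        # An element counts as populated if it or its children hold text.
        if node is None or not "".join(node.itertext()).strip():
            missing.append(label)
    return missing
```

Passing the text of a metadata file to `missing_fields` yields the list of fields still to be filled in; an empty list means every checked item is populated, not that its content is accurate.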
Data Management Initiatives
Federal agency mandates for sponsored research:
• NSF & NIH requirements for DM plans
• GIS Inventory (Ramona) & Federal Grants data
sharing plans—gisinventory.net
Other related initiatives:
• USGS DM working group
• DM training for early career researchers
FGDC Geospatial Data Lifecycle Model
http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
State & National Initiatives in
Geospatial Data Archiving
GeoMAPP - Geospatial Multistate Archive and
Preservation Partnership (www.geomapp.net):
• federally funded partnership between the Library of
Congress and state geospatial and archives staff from
North Carolina, Kentucky, Montana, and Utah
National Digital Stewardship Alliance (NDSA), Geospatial
Content Team (www.digitalpreservation.gov/ndsa):
• report identifying appraisal and selection activities as
they affect decisions defining geospatial content of
enduring value for the nation
Open GeoPortal @ Emory
NASA Goddard Photo and Video / CC BY
Green Question Mark by mikecogh on Flickr / CC BY
Contact Information:
Jennifer Doty | jennifer.doty@emory.edu
Data Management Specialist
Michael Page | michael.page@emory.edu
Geographer & Geospatial Data Librarian
Emory Center for Digital Scholarship
digitalscholarship.emory.edu