NGDA Format Registry
Why do we need a FR?
  We are designing with long-term storage in mind (> 100 years).
  Cannot depend on format spec to be available via URL, or even on a format registry that might not still be up to date or in existence.
  Thus the semantic definition of the format must be archived with the object itself.
  This semantic definition must be comprehensive so that the format can be accessed even if current access mechanisms no longer exist!
May 16, 2005
Catherine Masi, National Geospatial Digital Archive

NGDA Format Registry
Two major tasks
  Analyze and define spatial data formats (Meredith Williams)
  Develop local format registry with programmatic interface to existing authoritative/collaborative FR (Catherine Masi)

Analyze and define spatial data formats
  Is there a comprehensive list of geospatial formats? Are they defined? How?
  List of Spatial Data Formats - MW
    Digital Map Formats
    Vector File Formats
    Raster File Formats
    Other categories - TIN, ASCII, 3D, Tabular Databases
    Unacceptable Formats

Analyze and define spatial data formats
  What formats do we have in ADL? How do we define them?
  ADL format documentation
    ADL website: http://www.alexandria.ucsb.edu/adl/Collection%20Development/BucketDescrip.htm
    MIME types: http://www.iana.org/assignments/media-types/
    ADL literature/presentations:
      Format type:
        hierarchical vocabulary: ADL Object Format Thesaurus
        loosely based on MIME
        multiple values: union
        compare: DC.Format
    ADL Webclient list: http://webclient.alexandria.ucsb.edu/mw/index.jsp

Analyze and define spatial data formats
  What are our preferred formats for NGDA, if any?
  MW tested three geospatial formats using Sustainability Test derived from LCDF.
  GJ - "we can ingest anything if we have the definition representation information"
  Decided to limit allowed formats to a few in the first year: the CASIL test suite (GeoTIFF, shapefile).
  What if there is free proprietary software, such as from ESRI, that allows one to view the files? Should we request and archive that as well? - No (UCSB)

Analyze and define spatial data formats
  How will we define our formats?
    Using Meredith's list of Spatial Data Formats
    Begin defining using LoC Digital Formats as an example
  How do we know that we have sufficient semantic information to define each geospatial format?
    What information is required to make the format usable? Ask the users.
    What information is required to programmatically access the format if current access mechanisms become obsolete?
  Prioritize and start with most important/ubiquitous formats for our archive
  Coordinate with format definitions in JHOVE

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  What format registries are out there?
    Library of Congress Digital Formats (LCDF)
    Global Digital Format Registry (GDFR)
    Harvard Global Digital Format Registry Description
    Ockerbloom's Format Registry Demonstrator (FRED)
    PRONOM - File format registry - UK archives
    Practical, in use, not geospatial

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  Coordinate our efforts with the LCDF, GDFR, FRED, TOM
  NH initiated contact (Stephen Abrams, John Ockerbloom, Steve Morris, etc.) at DLF
  Questions for DLF meeting to get discussion started.
  Questions that we formulated showed that we have to solve a lot of these problems on our own, especially with regard to the technical aspects of building a FR and interaction mechanisms between LC, GDFR and our local FR.

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  Do the existing format registries contain geospatial formats?
    No, in the future we will contribute geospatial formats to an existing registry effort such as LCDF or GDFR.

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  Do the existing format registries support access and contribution mechanisms?
    No.

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  How are Library of Congress Digital Formats stored internally? Database? XML? Directory structure?
    In MS Word files

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  Is there a data dictionary or other mechanism for defining fields in LCDF?
    FDD

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  CM contacted Steve Morris (NCSU - NDIIPP), Stephen Abrams (Harvard - GDFR) and John Mark Ockerbloom (Penn - FRED) to open up a discussion on the technical aspects of developing a geospatial format registry. S.
  Abrams responded that GDFR is still only an idea rather than a reality and that a technical discussion of how our GIS formats should be managed in a GDFR-conformant way is a bit premature.

Develop local format registry with programmatic interface to existing authoritative/collaborative FR
  What are the requirements for the NGDA Format Registry?
    independent
    contains sufficient semantic information to programmatically access format (UCSB)
    contains geospatial reference information
    definitions exist in simple documented format in simple directory structure
    access/search mechanism not necessary for access
    interfaces with collaborative authoritative FR for updates and contributions

First steps:
  CM began prototyping the physical structure of the format registry using two CASIL formats, GeoTIFF and shapefile.
    Created directory-based registry.
    Incorporated info from MW's documents Spatial Data Formats and Sustainability Test.
    Created record layout loosely based on Library of Congress Digital Formats but including spatial reference information.
    Included format spec as local website (in the case of GeoTIFF) and as local PDF file (in the case of shapefile).
    All links on record referred to local copies of format information.
    All documentation about the format is located locally in that format's directory.
  Entries are not complete. This is just a first pass at what the HTML-rendered format entries will look like. Focus here is on physical structure rather than content.

First steps:
  Refining content using input from DV, MW and from actual data users as to what is needed to adequately define a format.
  Determine sufficient semantic info to define geospatial formats.
  Review CASIL formats.
  Began to flesh out sufficient semantic info. Started with GeoTIFF, shapefile.
  Review record layout and add, change and delete fields.

Next steps
  Make sure format spec is complete and all information is located locally where possible.
  Determine where we draw the line between format registry information/policy/higher level descriptive metadata. Format registry will stick to format spec and a few other important fields only.
  Develop XML stylesheet of record layout. Decided that HTML, XML and PDF are acceptable archivable formats for format registry information.
  Flatten the directory structure (hierarchy) because tfw, for example, is not a subtype of GeoTIFF but can be attached to a TIFF or another format.
  Work more on trying to find a sensible organization for the files in our FR.
  Link to other parts of Archive (Descriptive Metadata) from within FR.

Later
  Develop method of search, retrieval, update.
  Begin to develop programmatic interface to LoC Digital Formats or other authoritative/collaborative format registry.
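
Illustrative sketch (not part of the prototype): the requirements and first steps call for a flat, directory-based registry in which every link on a format record points to a local copy. One way to check that property is shown below. The record file name `record.html`, the one-directory-per-format layout, and the function names are assumptions made for this example only; the slides do not specify them.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect every href/src target found in an HTML format record."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def check_entry(entry_dir, record_name="record.html"):
    """Return a list of problems for one format directory.

    record_name is a hypothetical convention, not taken from the slides.
    """
    entry_dir = Path(entry_dir)
    record = entry_dir / record_name
    if not record.is_file():
        return [f"{entry_dir.name}: missing {record_name}"]

    parser = LinkCollector()
    parser.feed(record.read_text(encoding="utf-8"))

    problems = []
    for link in parser.links:
        parsed = urlparse(link)
        if parsed.scheme or parsed.netloc:
            # The record depends on something outside the archive.
            problems.append(f"{entry_dir.name}: external link {link}")
            continue
        local_path = unquote(parsed.path)
        # Pure fragment links (e.g. "#top") have an empty path and are skipped.
        if local_path and not (entry_dir / local_path).exists():
            problems.append(f"{entry_dir.name}: missing local file {local_path}")
    return problems


def check_registry(root):
    """Check each format directory directly under the registry root (flat layout)."""
    problems = []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            problems.extend(check_entry(entry))
    return problems
```

Calling `check_registry` on the registry root returns an empty list when every record is present and self-contained. Because the layout is flat, a companion format such as tfw can have its own directory and be linked from a TIFF-based record with a relative path such as `../tfw/record.html`, which the check treats as local.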