NGDA Format Registries Presentation

NGDA Format Registry
 Why do we need a FR?




May 16, 2005
We are designing with long-term storage in mind (> 100 years).
We cannot depend on the format spec being available via URL, or even on a format registry, which might no longer be up to date or in existence.
Thus the semantic definition of a format must be archived with the object itself.
This semantic definition must be comprehensive, so that the format can be accessed even if current access mechanisms no longer exist!
Catherine Masi, National Geospatial
Digital Archive
NGDA Format Registry
 Two major tasks
 Analyze and define spatial data formats (Meredith Williams)
 Develop local format registry with programmatic interface to existing authoritative/collaborative FR (Catherine Masi)
Analyze and define spatial data formats
 Is there a comprehensive list of geospatial
formats? Are they defined? How?

List of Spatial Data Formats - MW
 Digital Map Formats
 Vector File Formats
 Raster File Formats
 Other categories - TIN, ASCII, 3D, Tabular Databases
 Unacceptable Formats
Analyze and define spatial data formats
 What formats do we have in ADL? How do we define them?

ADL format documentation



ADL website:
http://www.alexandria.ucsb.edu/adl/Collection%20Development/BucketDescrip.htm
MIME types: http://www.iana.org/assignments/media-types/
ADL literature/presentations:
Format
type: hierarchical
vocabulary: ADL Object Format Thesaurus
 loosely based on MIME
multiple values: union
compare: DC.Format
ADL Webclient list:
http://webclient.alexandria.ucsb.edu/mw/index.jsp
Analyze and define spatial data formats
 What are our preferred formats for NGDA, if any?

MW tested three geospatial formats using Sustainability Test derived from LCDF.
GJ - "we can ingest anything if we have the definition representation information"
Decided to limit allowed formats to a few the first year – CASIL test suite (geotiff, shapefile).
What if there is free proprietary software, such as from ESRI, that allows one to look at the files? Should we request and archive that as well? - No (UCSB)
Analyze and define spatial data formats
 How will we define our formats?


Using Meredith's list of Spatial Data Formats
Begin defining using LoC Digital Formats as an
example
 How do we know that we have sufficient semantic
information to define each geospatial format?


What information is required to make the format
usable? Ask the users.
What information is required to programmatically
access the format if current access mechanisms
become obsolete?
 Prioritize and start with most important/ubiquitous
formats for our archive
 Coordinate with format definitions in JHOVE
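As an illustration of what "sufficient semantic information to programmatically access the format" could mean in practice, the following is a minimal sketch that reads the fixed 100-byte main-file header of a shapefile using only the byte layout given in the published ESRI Shapefile Technical Description. The function name and the returned dictionary keys are assumptions made for this example; they are not part of any NGDA tool.

```python
import struct

HEADER_SIZE = 100   # the shapefile main-file header is a fixed 100 bytes
FILE_CODE = 9994    # magic number stored big-endian in bytes 0-3


def parse_shapefile_header(data: bytes) -> dict:
    """Decode the main-file header from raw bytes (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("shapefile header must be at least 100 bytes")
    # Bytes 0-3: file code, big-endian integer.
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != FILE_CODE:
        raise ValueError("not a shapefile: unexpected file code")
    # Bytes 24-27: file length in 16-bit words, big-endian integer.
    (length_words,) = struct.unpack(">i", data[24:28])
    # Bytes 28-35: version and shape type, little-endian integers.
    version, shape_type = struct.unpack("<ii", data[28:36])
    # Bytes 36-67: Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    bbox = struct.unpack("<4d", data[36:68])
    return {
        "length_bytes": length_words * 2,
        "version": version,
        "shape_type": shape_type,
        "bbox": bbox,
    }
```

If the archived definition records byte offsets, byte order and field meanings at this level of detail, a reader can be rebuilt from the definition alone, without relying on any current software.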
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 What format registries are out there?



Library of Congress Digital Formats (LCDF)
Global Digital Format Registry (GDFR) Harvard
 Global Digital Format Registry Description
 Ockerbloom's Format Registry Demonstrator
(FRED)
PRONOM - File format registry - UK archives
Practical, in use, not geo-spatial
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 Coordinate our efforts with the LCDF, GDFR,
FRED, TOM


NH initiated contact (Stephen Abrams, John
Ockerbloom, Steve Morris, etc.) at DLF
Questions for DLF meeting to get discussion started.
Questions that we formulated showed that we
have to solve a lot of these problems on our own,
especially with regard to the technical aspects of
building a FR and interaction mechanisms
between LC, GDFR and our local FR
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 Do the existing format registries contain geospatial formats?
No. In the future we will contribute geospatial formats to an existing registry effort such as LCDF or GDFR.
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 Do the existing format registries support access and contribution mechanisms?
No.
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 How are Library of Congress Digital Formats stored internally? Database? XML? Directory structure?
In MS Word files
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 Is there a data dictionary or other mechanism for defining fields in LCDF?
FDD
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR

CM contacted Steve Morris (NCSU - NDIIPP),
Stephen Abrams (Harvard - GDFR) and John Mark
Ockerbloom (Penn - FRED), to open up a
discussion on the technical aspects of developing
a geospatial format registry.
 S. Abrams responded that GDFR is still only an idea rather than a reality and that a technical discussion of how our GIS formats should be managed in a GDFR-conformant way is a bit premature.
Develop local format registry with
programmatic interface to existing
authoritative/collaborative FR
 What are the requirements for the NGDA Format Registry?
independent
contains sufficient semantic information to programmatically access format (UCSB)
contains geospatial reference information
definitions exist in simple documented format in simple directory structure
access/search mechanism not necessary for access
interfaces with collaborative authoritative FR for updates and contributions
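The requirement that no access/search mechanism be necessary can be pictured with a minimal sketch: if every format has its own directory holding one record file, finding a definition is just path construction. The layout `<root>/<format>/index.html`, the record file name, and both function names are assumptions made for this illustration, not the actual NGDA layout.

```python
from pathlib import Path

RECORD_NAME = "index.html"  # hypothetical name of the per-format record file


def list_formats(root: Path) -> list[str]:
    """Return the formats defined in the registry, one per directory."""
    return sorted(
        entry.name for entry in root.iterdir()
        if (entry / RECORD_NAME).is_file()
    )


def record_path(root: Path, format_name: str) -> Path:
    """Locate a format's record by path alone; no index or search is used."""
    path = root / format_name / RECORD_NAME
    if not path.is_file():
        raise FileNotFoundError(f"no registry record for format {format_name!r}")
    return path
```

Under this assumption a person browsing the file system can find the same record that the code finds, which is the point of keeping the structure simple.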
First steps:
 CM began prototyping the physical structure of format registry using 2 CASIL formats, geotiff and shapefile.
Created directory-based registry.
Incorporated info from MW's documents Spatial Data Formats and Sustainability Test.
Created record layout loosely based on Library of Congress Digital Formats but including spatial reference information.
Included format spec as local website (in the case of geotiff) and as local PDF file (in the case of shapefile).
All links on record refer to local copies of format information.
All documentation about the format is located locally in that format's directory.
Entries are not complete. This is just a first pass at what the HTML-rendered format entries will look like. Focus here is on physical structure rather than content.
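The rule that all links on a record point to local copies could be checked mechanically. The sketch below is one possible check, assuming the records are HTML and that "local" means a relative link with no URL scheme or host; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect every href and src attribute value found in an HTML record."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def external_links(html_text: str) -> list[str]:
    """Return links that carry a scheme or host, i.e. are not local copies."""
    collector = LinkCollector()
    collector.feed(html_text)
    result = []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme or parts.netloc:
            result.append(link)
    return result
```

An empty result for every record would indicate that the registry does not depend on any outside web site remaining available.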
First steps:
 Refining content using input from DV, MW and from
actual data users as to what is needed to adequately
define a format.
 Determine sufficient semantic info to define
geospatial formats
 Review CASIL formats. Began to flesh out
sufficient semantic info. Started with geotiff,
shapefile.
 Review record layout and add, change and
delete fields.
Next steps
 Make sure format spec is complete and all information is located locally where possible.
Determine where we draw the line between format registry information/policy/higher level descriptive metadata. Format registry will stick to format spec and a few other important fields only.
Develop XML stylesheet of record layout. Decided that HTML, XML and PDF are acceptable archivable formats for format registry information.
Flatten the directory structure (hierarchy) because tfw, for example, is not a subtype of geotiff but can be attached to a tiff or another format. Work more on trying to find a sensible organization for the files in our FR.
Link to other parts of Archive (Descriptive Metadata) from within FR.
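One way to picture the flattened structure is a registry where every format sits at the same level and the tfw relationship is recorded as an explicit "attaches to" link instead of a parent directory. This is only a sketch under that assumption; the `FormatEntry` class, its field names, and the sample entries are hypothetical and do not describe the organization NGDA eventually chose.

```python
from dataclasses import dataclass, field


@dataclass
class FormatEntry:
    name: str                       # format identifier, also its directory name
    attaches_to: list[str] = field(default_factory=list)  # formats it can accompany


# Flat registry: no entry is nested under another (sample data, assumed).
REGISTRY = {
    "tiff": FormatEntry("tiff"),
    "geotiff": FormatEntry("geotiff"),
    "shapefile": FormatEntry("shapefile"),
    "tfw": FormatEntry("tfw", attaches_to=["tiff"]),
}


def companions_of(registry: dict, format_name: str) -> list[str]:
    """List the formats that may be attached to the given format."""
    return sorted(
        entry.name for entry in registry.values()
        if format_name in entry.attaches_to
    )
```

Because the relationship is data and not directory nesting, the same tfw entry can later be linked to further formats by extending its `attaches_to` list, without moving any files.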
Later
 Develop method of search, retrieval,
update
 Begin to develop programmatic
interface to LoC Digital Formats or
other authoritative/collaborative
format registry