PREMIS

PREMIS
• What is PREMIS?
o Preservation Metadata Implementation Strategies
• When is PREMIS used?
o PREMIS is used for “repository design, evaluation, and
archived information packaged among preservation
repositories”
• How is PREMIS used?
o PREMIS Data Dictionary provides guidelines regarding
“the information a repository uses to support the
digital preservation process”
PREMIS
• What does “preservation metadata” refer to?
o “It is information that supports and documents the digital
preservation process”, which includes information such as:
 Provenance – refers to who has had custody or ownership of the
digital object
 Authenticity – refers to whether the digital object is what it
purports to be
 Preservation activity – refers to the activities that have been
carried out to preserve the digital object
 Technical environment – refers to what is needed to render,
interpret, and use the digital object
 Rights management – refers to the intellectual property rights
that must be observed
PREMIS Data Dictionary
• Conventions for each entry in the PREMIS Data Dictionary
o Name of the semantic unit: a descriptive name for a piece of information or
knowledge
 Example: objectIdentifier under <object>
o Semantic components: the sub-units held within a container.
o Definition: the meaning of the semantic unit.
o Rationale: explains why the semantic unit is needed.
o Data constraint: indicates how the semantic unit should be encoded.
 Container: an XML element that holds no value of its own and instead groups related
semantic units
 None: the semantic unit can take a value of any form
 Value should be taken from a controlled vocabulary: “PREMIS Data Dictionary does not
specify what this authority list of values should be, and it is assumed that different
repositories will use different vocabularies”.
 Extension container: a container that provides a place for non-PREMIS
metadata
PREMIS Data Dictionary
o Object category: specifies the category of object to which the semantic
unit applies (a representation, file, or bitstream).
o Applicability: indicates whether the semantic unit applies to that
category of object.
o Example: sample values that the semantic unit may take.
o Repeatability: indicates whether the semantic unit can take multiple values.
o Obligation: indicates whether the value of the semantic unit is
mandatory, meaning a repository must know this information.
o Creation/Maintenance notes: further detail on how the values
are created and/or updated.
o Usage notes: information on the intended use of the semantic
unit.
PREMIS Data Dictionary Mandatory Semantic Units
• objectIdentifier *
• objectCategory
• objectCharacteristics *
• format *
• storage *
• eventIdentifier
• eventType
• eventDateTime
• agentIdentifier *
Note: * indicates semantic units that are repeatable
PREMIS Data Model
PREMIS Intellectual Entity
• Intellectual Entity – refers to content that can
be described as a unit (e.g. books, maps,
articles)
PREMIS Object Entity
• Objects – refer to units of information in digital form.
PREMIS defines three kinds of object: a file, a bitstream,
or a representation
o File – a computer file, such as a PDF, TXT, or JPEG
o Bitstream – refers to data bits within a file that
share common properties for preservation
purposes
o Representation – refers to the set of files, including
structural metadata, that must be identified, stored, and
maintained in order to assemble a complete rendition of
an Intellectual Entity.
 For example, the text files and image files of a magazine
together form a representation.
PREMIS Object Entity
• Sample syntax
<object> </object>
• The units of information that can be recorded include:
o Type of object (file, bitstream, or representation)
o A unique identifier for the object under <objectIdentifier>
 <objectIdentifierType> names the domain within which the object identifier is
unique, and <objectIdentifierValue> holds the identifier itself.
• For example,
<object xsi:type="representation">
<objectIdentifier>
<objectIdentifierType>FDsys ACP</objectIdentifierType>
<objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>
</objectIdentifier>
</object>
This segment states that the object is a representation (that is, a set of files) and gives the
unique identifier assigned to that representation.
PREMIS Object Example
• Other units of information that can be recorded include:
o “Information indicating the policy on the set of preservation functions to
be applied to an object” under <preservationLevel>
<object xsi:type="file">
<objectIdentifier>
<objectIdentifierType>FDsys ACP</objectIdentifierType>
<objectIdentifierValue>D09002ee180b003a9</objectIdentifierValue>
</objectIdentifier>
<preservationLevel>
<preservationLevelValue>full</preservationLevelValue>
</preservationLevel>
PREMIS Object Entity
• Other units of information that can be recorded include:
o Whether the object is subject to one or more processes
of decoding or unbundling, under <compositionLevel>
o Information used to verify whether an object has been changed in an
undocumented or unauthorized way, under <fixity>
 <messageDigestAlgorithm> names the algorithm used to produce the
message digest for the digital object.
 <messageDigest> holds the
“output of the message digest algorithm”.
 <messageDigestOriginator> names the agent that created the original
message digest against which later fixity checks are compared.
PREMIS Object Entity
• Other units of information that can be recorded include:
o The size of the object under <size>
o The format of the object under <format>
 <formatDesignation> refers to the “identification of the format of the object”
• <formatName> names the format of the
file or bitstream.
 <formatRegistry> identifies additional information about the format by pointing to an
entry in a format registry.
• <formatRegistryName> identifies the
format registry that was used.
• <formatRegistryKey> refers to the “unique
key used to reference an entry for this format in a format registry”
 <formatNote> contains additional information about the format
For example:
PREMIS Object Example
<objectCharacteristics>
<compositionLevel>0</compositionLevel>
<fixity>
<messageDigestAlgorithm>SHA-256</messageDigestAlgorithm>
<messageDigest>4977070b92f0bb2642c6be368ad68a8d1d1c5dbbb3310544db781f56a860b0a1</messageDigest>
<messageDigestOriginator>FDsys</messageDigestOriginator>
</fixity>
<size>9326</size>
<format>
<formatDesignation>
<formatName>text/plain</formatName>
</formatDesignation>
<formatRegistry>
<formatRegistryName>PRONOM</formatRegistryName>
<formatRegistryKey>x-fmt/111</formatRegistryKey>
</formatRegistry>
<formatNote>Plain Text File</formatNote>
</format>
</objectCharacteristics>
PREMIS Object Entity
• Other units of information that can be recorded include:
o The original name of the object (before it was renamed by the
repository) under <originalName>
o Information about where and how a file is stored in the repository
under <storage>
 <contentLocation> stores the information needed to retrieve a file
from a storage system.
• <contentLocationType> refers to
the means of referencing the location of the content.
• <contentLocationValue> refers
to the “location of the content used by the storage system”.
 <storageMedium> names the medium on which the object is
stored.
PREMIS Object Entity
o Information describing a relationship between an object and one or
more other objects under <relationship>
 <relationshipType> gives a high-level categorization of the relationship.
 <relationshipSubType> gives a more specific characterization of the
relationship.
 <relatedObjectIdentification> refers to “the identifier of the
related resource”.
• <relatedObjectIdentifierType> names the domain within which the
identifier is unique.
• <relatedObjectIdentifierValue>
refers to “the value of the identifier”.
PREMIS Object Example
<originalName>S3880IS.txt</originalName>
<storage>
<contentLocation>
<contentLocationType>URI</contentLocationType>
<contentLocationValue>file:/u02/app/emc/documentum/data/fdsysprod1/fdsysprod1/content_storage_01/00002ee1/80/55/b0/48.txt</contentLocationValue>
</contentLocation>
<storageMedium>hard disk</storageMedium>
</storage>
<relationship>
<relationshipType>structural</relationshipType>
<relationshipSubType>is part of</relationshipSubType>
<relatedObjectIdentification>
<relatedObjectIdentifierType>FDsys ACP</relatedObjectIdentifierType>
<relatedObjectIdentifierValue>R0b002ee180b003b0</relatedObjectIdentifierValue>
</relatedObjectIdentification>
</relationship>
</object>
PREMIS Event Entity
• Events – refer to actions that involve at least one object or agent
known to the system
o Events are critical for maintaining the digital provenance of an
object (which helps demonstrate the authenticity of the object)
• Examples of Events:
o Modifying a document
o Actions that create new relationships
 An object can become related to another object as a result of a particular event,
for instance if a program takes file 1 and generates a different version
known as file 2
o Actions that check the validity and integrity of objects (e.g. a
virus scan)
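The file 1 / file 2 case could be recorded on the derived file roughly as in the sketch below. It is illustrative only: the identifier values are hypothetical placeholders, the identifier types simply mirror the FDsys style used elsewhere in these slides, and the relationship type and subtype values are examples that a repository would take from its own controlled vocabulary.

```xml
<!-- Hypothetical sketch: sits inside the <object> element that describes file 2 -->
<relationship>
  <relationshipType>derivation</relationshipType>
  <relationshipSubType>has source</relationshipSubType>
  <!-- file 1, the object that file 2 was generated from (placeholder identifier) -->
  <relatedObjectIdentification>
    <relatedObjectIdentifierType>FDsys ACP</relatedObjectIdentifierType>
    <relatedObjectIdentifierValue>EXAMPLE-FILE-1</relatedObjectIdentifierValue>
  </relatedObjectIdentification>
  <!-- the event that produced file 2, such as a migration (placeholder identifier) -->
  <relatedEventIdentification>
    <relatedEventIdentifierType>FDsys:event</relatedEventIdentifierType>
    <relatedEventIdentifierValue>EXAMPLE-EVENT-1</relatedEventIdentifierValue>
  </relatedEventIdentification>
</relationship>
```

Linking the relationship to the event keeps the "why" next to the "what": a reader can follow the event identifier to see which agent ran the program and when.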
PREMIS Event Entity
• Sample syntax
<event> </event>
• The information that can be recorded under event includes:
o A unique identifier for the event under <eventIdentifier>
 <eventIdentifierType> names the domain within which the event identifier
is unique. <eventIdentifierValue> holds the
value of the event identifier.
o The type of event under <eventType>
 Classifies the nature of the event.
o Date and time of the event under <eventDateTime>
PREMIS Event Entity
• Additional information that can be recorded under event includes:
o A detailed description of the event under <eventDetail>
o The outcome of the event under <eventOutcomeInformation>
 <eventOutcome> indicates whether the event was a success, partial success, or failure.
o Agents involved in the event and their specific roles under <linkingAgentIdentifier>
 <linkingAgentIdentifierType> names the domain within which the linking agent
identifier is unique. <linkingAgentIdentifierValue> refers to the
“value of the linking agent identifier”. <linkingAgentRole> indicates the role of
the agent in the event.
 Agent roles are defined here because an agent can perform different roles in different
events
o Objects involved in the event and their specific roles under <linkingObjectIdentifier>
 <linkingObjectIdentifierType> names the domain within which the linking object
identifier is unique. <linkingObjectIdentifierValue> refers to the
“value of the linking object identifier”. <linkingObjectRole> indicates the role of
the object in the event.
PREMIS Event Example
<event>
<eventIdentifier>
<eventIdentifierType>FDsys:event</eventIdentifierType>
<eventIdentifierValue>1cdd2b6c-5a2d-449b-b386-ebb15eb4af11</eventIdentifierValue>
</eventIdentifier>
<eventType>Rendition Submitted</eventType>
<eventDateTime>2010-10-06T19:38:47-04:00</eventDateTime>
<eventDetail>Rendition R0b002ee180b003b0, uploaded by hotfolderadmin, was submitted in the Submission Information package
P0b002ee180b003af</eventDetail>
<eventOutcomeInformation>
<eventOutcome>Success</eventOutcome>
</eventOutcomeInformation>
<linkingAgentIdentifier>
<linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
<linkingAgentIdentifierValue>hotfolderadmin</linkingAgentIdentifierValue>
<linkingAgentRole>implementer</linkingAgentRole>
</linkingAgentIdentifier>
<linkingObjectIdentifier>
<linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>
<linkingObjectIdentifierValue>R0b002ee180b003b0</linkingObjectIdentifierValue>
<linkingObjectRole>outcome</linkingObjectRole>
</linkingObjectIdentifier>
</event>
PREMIS Agent Entity
• Agents – refer to people, organizations, or
software associated with events in the life of an object,
in particular preservation events
o In the data model diagram there is no arrow from
the Agent entity to the Object entity because
Agents influence Objects only indirectly, through
Events.
PREMIS Agent Entity
• Sample syntax
<agent> </agent>
• The information that can be recorded under agent includes:
o A unique identifier for the agent under <agentIdentifier>
 <agentIdentifierType> names the domain within which the agent
identifier is unique.
 <agentIdentifierValue> refers to the
“value of the agent identifier”.
o The agent’s name under <agentName>
o The type of agent (person, organization, or software) under
<agentType>
PREMIS Agent Example
<agent>
<agentIdentifier>
<agentIdentifierType>FDsys:agent</agentIdentifierType>
<agentIdentifierValue>hotfolderadmin</agentIdentifierValue>
</agentIdentifier>
<agentName>hotfolderadmin</agentName>
<agentType>Person</agentType>
</agent>
PREMIS Rights Entity
• Rights – refer to the rights and permissions that are directly
relevant to preserving objects
• Sample syntax
<rights> </rights>
• The information that can be recorded under rights includes:
o A unique identifier for the rights statement
o The action(s) that the rights statement allows
o The object(s) to which the statement applies
o The agents involved in the rights statements and their roles
• Note: Keep in mind that FDsys doesn’t use <rights>
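Because FDsys records no <rights>, there is no real sample to show; the sketch below is a hypothetical PREMIS 2.0 rights statement written only to illustrate the four items listed above. Every identifier and value is a placeholder, and the fragment has not been validated against the schema. The <rightsBasis> and <copyrightInformation> elements are not covered in the list above; they are included on the assumption that PREMIS 2.0 expects each rights statement to declare its basis.

```xml
<!-- Hypothetical sketch only: FDsys does not record <rights>; all values are placeholders -->
<rights>
  <rightsStatement>
    <!-- a unique identifier for the rights statement -->
    <rightsStatementIdentifier>
      <rightsStatementIdentifierType>EXAMPLE:rights</rightsStatementIdentifierType>
      <rightsStatementIdentifierValue>EXAMPLE-RIGHTS-1</rightsStatementIdentifierValue>
    </rightsStatementIdentifier>
    <!-- assumed: the basis on which the rights are asserted -->
    <rightsBasis>copyright</rightsBasis>
    <copyrightInformation>
      <copyrightStatus>publicdomain</copyrightStatus>
      <copyrightJurisdiction>us</copyrightJurisdiction>
    </copyrightInformation>
    <!-- the action that the statement allows -->
    <rightsGranted>
      <act>replicate</act>
    </rightsGranted>
    <!-- the object to which the statement applies -->
    <linkingObjectIdentifier>
      <linkingObjectIdentifierType>EXAMPLE:object</linkingObjectIdentifierType>
      <linkingObjectIdentifierValue>EXAMPLE-OBJECT-1</linkingObjectIdentifierValue>
    </linkingObjectIdentifier>
    <!-- the agent involved in the statement and its role -->
    <linkingAgentIdentifier>
      <linkingAgentIdentifierType>EXAMPLE:agent</linkingAgentIdentifierType>
      <linkingAgentIdentifierValue>EXAMPLE-AGENT-1</linkingAgentIdentifierValue>
      <linkingAgentRole>grantor</linkingAgentRole>
    </linkingAgentIdentifier>
  </rightsStatement>
</rights>
```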
premis.xml Header
• xmlns – declares the default namespace, a unique name that identifies the vocabulary
the elements belong to (Note: the XML parser
does not use the namespace URI to look up information)
• xmlns:xsi – declares the XML Schema instance namespace, which makes
attributes such as xsi:schemaLocation and xsi:type available in the document
• xsi:schemaLocation – the first value is the namespace being described
and the second value is the location of the schema for that namespace,
in this case the PREMIS XML schema.
• version – the PREMIS version
• Example:
<premis xmlns="info:lc/xmlns/premis-v2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="info:lc/xmlns/premis-v2
http://www.loc.gov/standards/premis/premis.xsd"
version="2.0">
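Putting the header together with the Object, Event, and Agent entities, a complete premis.xml might be laid out as in the sketch below. It reuses identifiers from the FDsys samples in these slides, abbreviates each entity to a few units, and has not been validated against the schema, so treat the exact set and order of elements as an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Abbreviated sketch of a whole premis.xml; not schema-validated -->
<premis xmlns="info:lc/xmlns/premis-v2"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/premis.xsd"
        version="2.0">
  <!-- one <object> per representation, file, or bitstream -->
  <object xsi:type="representation">
    <objectIdentifier>
      <objectIdentifierType>FDsys ACP</objectIdentifierType>
      <objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>
    </objectIdentifier>
  </object>
  <!-- one <event> per recorded action -->
  <event>
    <eventIdentifier>
      <eventIdentifierType>FDsys:event</eventIdentifierType>
      <eventIdentifierValue>1cdd2b6c-5a2d-449b-b386-ebb15eb4af11</eventIdentifierValue>
    </eventIdentifier>
    <eventType>Rendition Submitted</eventType>
    <eventDateTime>2010-10-06T19:38:47-04:00</eventDateTime>
  </event>
  <!-- one <agent> per person, organization, or software agent -->
  <agent>
    <agentIdentifier>
      <agentIdentifierType>FDsys:agent</agentIdentifierType>
      <agentIdentifierValue>hotfolderadmin</agentIdentifierValue>
    </agentIdentifier>
    <agentName>hotfolderadmin</agentName>
    <agentType>Person</agentType>
  </agent>
</premis>
```

The xsi prefix declared in the header is what allows xsi:type="representation" on <object>, which is why the namespace declaration matters even though the parser never dereferences the URI.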
Additional Information On Using premis.xml
• When will premis.xml be used by METS (aip.xml)?
o A PREMIS digital object whose components must be organized so that its integrity is
preserved needs structural metadata, and METS is used to supply it.
o METS uses a pointer to metadata that is located outside the METS document. More
specifically, it uses an xlink:href attribute to indicate the location of that file.
o Example code from aip.xml:
<!-- PREMIS OBJECT -->
<mets:amdSec ID="AMD_OTHER">
<mets:techMD ID="D09002ee180affcca-TEC">
<mets:mdRef ID="M09002ee180affcca-tdiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
</mets:techMD>
<mets:digiprovMD ID="D09002ee180affcca-DIG">
<mets:mdRef ID="M09002ee180affcca-ddiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
</mets:digiprovMD>
</mets:amdSec>
Additional Information On Using premis.xml
o METS uses a structural map (<mets:structMap>) to organize the components of the PREMIS object.
o Example code from aip.xml:
<mets:structMap>
<mets:div ID="R0b002ee180b0044c-div" LABEL="xml-submitted">
<mets:fptr FILEID="D09002ee180b00449" />
<mets:div ID="R0b002ee180b00452-div" LABEL="Graphic Support Documents">
<mets:fptr FILEID="D09002ee180affca3" />
<mets:fptr FILEID="D09002ee180b0045b" />
<mets:fptr FILEID="D09002ee180b00464" />
<mets:fptr FILEID="D09002ee180b0046e" />
<mets:fptr FILEID="D09002ee180b00477" />
<mets:fptr FILEID="D09002ee180b0047c" />
<mets:fptr FILEID="D09002ee180b00483" />
<mets:fptr FILEID="D09002ee180b00493" />
<mets:fptr FILEID="D09002ee180b00499" />
<mets:fptr FILEID="D09002ee180b004a4" />
<mets:fptr FILEID="D09002ee180b004af" />
<mets:fptr FILEID="D09002ee180b004bc" />
<mets:fptr FILEID="D09002ee180b004c6" />
<mets:fptr FILEID="D09002ee180b004ce" />
<mets:fptr FILEID="D09002ee180b004cf" />
</mets:div>
</mets:div>
</mets:structMap>
References
• Understanding PREMIS
o http://www.loc.gov/standards/premis/understanding-premis.pdf
• Data Dictionary for Preservation Metadata
o http://www.oclc.org/research/activities/past/orprojects/pmwg/premis-final.pdf
• W3Schools
o http://www.w3schools.com/xml/default.asp