• What is PREMIS?
o Preservation Metadata: Implementation Strategies
• When is PREMIS used?
o PREMIS is used for “repository design, evaluation, and archived information package exchange among preservation repositories”
• How is PREMIS used?
o The PREMIS Data Dictionary provides guidelines regarding “the information a repository uses to support the digital preservation process”
• What does “preservation metadata” refer to?
o “It is information that supports and documents the digital preservation process.” It includes information such as:
Provenance – refers to who has had custody or ownership of the digital object
Authenticity – refers to validating that the digital object is what it claims to be and has not been altered in an undocumented way
Preservation activity – refers to the actions that have been carried out to preserve the digital object
Technical environment – refers to the technical requirements, such as hardware and software, needed to render and use the digital object
Rights management – refers to the intellectual property rights that must be cleared before preservation actions can be carried out
• Conventions for each entry in the PREMIS Data Dictionary
o Name of the semantic unit: a descriptive name that refers to a piece of information or knowledge
Example: objectIdentifier under <object>
o Semantic components: refers to sub-units held within a container
o Definition: refers to the meaning of the semantic unit
o Rationale: explains why the semantic unit is needed
o Data constraint: indicates how the semantic unit should be encoded
Container: the semantic unit is an XML element that has no value of its own and instead serves to group related semantic units
None: the semantic unit can take a value of any form
Value should be taken from a controlled vocabulary: “PREMIS Data Dictionary does not specify what this authority list of values should be, and it is assumed that different repositories will use different vocabularies”.
Extension container: a container designed to hold non-PREMIS metadata
o Object category: specifies the category of object to which the semantic unit applies (representation, file, or bitstream)
o Applicability: indicates whether the semantic unit applies to each object category
o Examples: sample values that the semantic unit may take
o Repeatability: indicates whether a semantic unit can take multiple values
o Obligation: indicates whether the value of the semantic unit is mandatory, meaning a repository must know this information
o Creation/Maintenance notes: further detail on how the values are created and/or updated
o Usage notes: information about how the semantic unit should be used
• objectIdentifier *
• objectCategory
• objectCharacteristics *
• format *
• storage *
• eventIdentifier
• eventType
• eventDateTime
• agentIdentifier *
Note: * indicates semantic units that are repeatable
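The repeatability flags in the list above can be checked mechanically. The sketch below is a hypothetical helper (`repeatability_errors` is not part of any PREMIS tool) built on Python's standard xml.etree.ElementTree; it covers only the nine units listed here and flags a non-repeatable unit that appears more than once directly under the same parent element. It assumes un-namespaced element names, as in the fragments in these notes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Repeatability of the semantic units listed above (True = marked with *).
# Only these nine units are covered; this is not the full Data Dictionary.
REPEATABLE = {
    "objectIdentifier": True,
    "objectCategory": False,
    "objectCharacteristics": True,
    "format": True,
    "storage": True,
    "eventIdentifier": False,
    "eventType": False,
    "eventDateTime": False,
    "agentIdentifier": True,
}


def repeatability_errors(parent: ET.Element) -> list:
    """Return a message for each non-repeatable unit that occurs more than
    once among the direct children of `parent`."""
    counts = Counter(child.tag for child in parent)
    return [
        f"{tag} occurs {n} times but is not repeatable"
        for tag, n in counts.items()
        if REPEATABLE.get(tag) is False and n > 1
    ]


# Two <eventType> elements in one event are flagged;
# two <objectIdentifier> elements in one object are allowed.
bad_event = ET.fromstring(
    "<event><eventType>a</eventType><eventType>b</eventType></event>"
)
ok_object = ET.fromstring(
    "<object><objectIdentifier/><objectIdentifier/></object>"
)
print(repeatability_errors(bad_event))  # ['eventType occurs 2 times but is not repeatable']
print(repeatability_errors(ok_object))  # []
```

In the XML examples below, the object category is carried by the xsi:type attribute rather than by a child element, so the objectCategory entry will not match anything there.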
o Representation – refers to the set of files, including structural metadata, that must be identified, stored and maintained in order to assemble a complete rendition of an Intellectual Entity.
For example, the text files and image files of a magazine are needed to form a representation.
• Sample syntax
<object> </object>
• The units of information that can be recorded include:
o Type of object (file, bitstream, or representation)
o A unique identifier for the object under <objectIdentifier>
It stores the identifier's type and value: <objectIdentifierType> names the domain that created the identifier, and <objectIdentifierValue> holds the value of the identifier.
• For example,
<object xsi:type="representation">
<objectIdentifier>
<objectIdentifierType>FDsys ACP</objectIdentifierType>
<objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>
</objectIdentifier>
</object>
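To see how the object category and the identifier pair are read back out, here is a minimal sketch using Python's standard xml.etree.ElementTree. It parses the fragment above as shown, with one assumption: the xsi prefix is declared on <object> so the fragment parses on its own (in a full premis.xml that declaration sits on the <premis> root). The function name `read_object` is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# The fragment above, with xmlns:xsi added so it parses standalone.
SAMPLE = """<object xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="representation">
  <objectIdentifier>
    <objectIdentifierType>FDsys ACP</objectIdentifierType>
    <objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>
  </objectIdentifier>
</object>"""


def read_object(xml_text: str):
    """Return (category, [(identifierType, identifierValue), ...])."""
    obj = ET.fromstring(xml_text)
    # xsi:type is a namespaced attribute, so ElementTree stores it as {ns}type.
    category = obj.get(f"{{{XSI_NS}}}type")
    identifiers = [
        (oid.findtext("objectIdentifierType"), oid.findtext("objectIdentifierValue"))
        for oid in obj.findall("objectIdentifier")  # repeatable, so collect all
    ]
    return category, identifiers


print(read_object(SAMPLE))
# ('representation', [('FDsys ACP', 'R0b002ee180b003b0')])
```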
This segment states that the object is a representation (that is, a set of files) and that the representation has a unique identifier.
• Other units of information that can be recorded include:
o “Information indicating the policy on the set of preservation functions to be applied to an object” under <preservationLevel>
<object xsi:type="file">
<objectIdentifier>
<objectIdentifierType>FDsys ACP</objectIdentifierType>
<objectIdentifierValue>D09002ee180b003a9</objectIdentifierValue>
</objectIdentifier>
<preservationLevel>
<preservationLevelValue>full</preservationLevelValue>
</preservationLevel>
• Other units of information that can be recorded include:
o Information indicating whether the object is subject to one or more processes of decoding or unbundling, under <compositionLevel>
o Information used to verify whether an object has been changed in an undocumented or unauthorized way, under <fixity>
Information contained within <messageDigestAlgorithm> refers to the algorithm used to produce the message digest for the digital object.
Information contained within <messageDigest> refers to the “output of the message digest algorithm”.
Information contained within <messageDigestOriginator> refers to the agent that generated the original message digest, which is the value later fixity checks are compared against.
o The size of the object, in bytes, under <size>
o The format of the object under <format>
<formatDesignation> refers to the “identification of the format of the object”
• Information contained within <formatName> designates the format of the file or bitstream.
<formatRegistry> identifies additional information about the format by referencing an entry in a format registry.
• Information contained within <formatRegistryName> identifies the format registry that was used.
• Information contained within <formatRegistryKey> refers to the “unique key used to reference an entry for this format in a format registry”.
<formatNote> contains additional information about the format
• For example
<objectCharacteristics>
<compositionLevel>0</compositionLevel>
<fixity>
<messageDigestAlgorithm>SHA-256</messageDigestAlgorithm>
<messageDigest>4977070b92f0bb2642c6be368ad68a8d1d1c5dbbb3310544db781f56a860b0a1</messageDigest>
<messageDigestOriginator>FDsys</messageDigestOriginator>
</fixity>
<size>9326</size>
<format>
<formatDesignation>
<formatName>text/plain</formatName>
</formatDesignation>
<formatRegistry>
<formatRegistryName>PRONOM</formatRegistryName>
<formatRegistryKey>x-fmt/111</formatRegistryKey>
</formatRegistry>
<formatNote>Plain Text File</formatNote>
</format>
</objectCharacteristics>
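A fixity check recomputes the digest with the algorithm named in <messageDigestAlgorithm> and compares it with the stored <messageDigest>; the recorded <size> can be checked the same way. The sketch below uses Python's hashlib and xml.etree.ElementTree. The function names are hypothetical, and mapping a name such as "SHA-256" to hashlib's "sha256" by dropping the hyphen is an assumption about how the repository spells algorithm names.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def file_digest(path: str, algorithm: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(path: str, fixity: ET.Element) -> bool:
    """Recompute the digest named in <messageDigestAlgorithm> and compare it
    with the stored <messageDigest>."""
    # Assumption: names look like "SHA-256"; hashlib expects "sha256".
    algorithm = fixity.findtext("messageDigestAlgorithm").replace("-", "").lower()
    expected = fixity.findtext("messageDigest").strip().lower()
    return file_digest(path, algorithm) == expected


def check_size(path: str, object_characteristics: ET.Element) -> bool:
    """Compare the file's byte count with the recorded <size>."""
    return os.path.getsize(path) == int(object_characteristics.findtext("size"))


# Example (hypothetical local copy of the file described above):
# chars = <the <objectCharacteristics> element shown above, parsed with ET>
# ok = check_fixity("S3880IS.txt", chars.find("fixity")) and check_size("S3880IS.txt", chars)
```

A mismatch in either check indicates the file changed in a way no recorded event explains, which is exactly what <fixity> is meant to detect.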
• Other units of information that can be recorded include:
o The original name of the object (prior to being named by the repository) under <originalName>
o Information about where and how files are stored in the repository under <storage>
<contentLocation> stores information needed to retrieve a file from a storage system.
• Information contained within <contentLocationType> refers to the means of referencing the location of the content (for example, a URI).
• Information contained within <contentLocationValue> refers to the “location of the content used by the storage system”.
The medium on which an object is stored is recorded under <storageMedium>.
o Information describing a relationship between the object and one or more other objects, under <relationship>
<relationshipType> classifies the nature of the relationship.
<relationshipSubType> gives a more specific characterization of the relationship.
<relatedObjectIdentification> refers to “the identifier of the related resource”.
Information contained within <relatedObjectIdentifierType> refers to the classification of the domain that creates the identifier.
Information contained within <relatedObjectIdentifierValue> refers to “the value of the identifier”.
<originalName>S3880IS.txt</originalName>
<storage>
<contentLocation>
<contentLocationType>URI</contentLocationType>
<contentLocationValue>file:/u02/app/emc/documentum/data/fdsysprod1/fdsysprod1/content_storage_01/00002ee1/80/55/b0/48.txt</contentLocationValue>
</contentLocation>
<storageMedium>hard disk</storageMedium>
</storage>
<relationship>
<relationshipType>structural</relationshipType>
<relationshipSubType>is part of</relationshipSubType>
<relatedObjectIdentification>
<relatedObjectIdentifierType>FDsys ACP</relatedObjectIdentifierType>
<relatedObjectIdentifierValue>R0b002ee180b003b0</relatedObjectIdentifierValue>
</relatedObjectIdentification>
</relationship>
</object>
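The structural "is part of" relationship is what ties this file (D09002ee180b003a9) back to its representation (R0b002ee180b003b0). The sketch below, using xml.etree.ElementTree, follows that link. The function name `containers_of` is hypothetical, the sample is an abbreviated copy of the file object with the xsi prefix declared so it parses standalone, and the vocabulary strings "structural" and "is part of" are the ones FDsys uses here; other repositories may use different vocabularies.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the file object above.
FILE_OBJECT = """<object xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="file">
  <objectIdentifier>
    <objectIdentifierType>FDsys ACP</objectIdentifierType>
    <objectIdentifierValue>D09002ee180b003a9</objectIdentifierValue>
  </objectIdentifier>
  <relationship>
    <relationshipType>structural</relationshipType>
    <relationshipSubType>is part of</relationshipSubType>
    <relatedObjectIdentification>
      <relatedObjectIdentifierType>FDsys ACP</relatedObjectIdentifierType>
      <relatedObjectIdentifierValue>R0b002ee180b003b0</relatedObjectIdentifierValue>
    </relatedObjectIdentification>
  </relationship>
</object>"""


def containers_of(obj: ET.Element) -> list:
    """Return (type, value) identifiers of the objects this object 'is part of'."""
    found = []
    for rel in obj.findall("relationship"):
        if (rel.findtext("relationshipType") == "structural"
                and rel.findtext("relationshipSubType") == "is part of"):
            for rel_id in rel.findall("relatedObjectIdentification"):
                found.append((
                    rel_id.findtext("relatedObjectIdentifierType"),
                    rel_id.findtext("relatedObjectIdentifierValue"),
                ))
    return found


print(containers_of(ET.fromstring(FILE_OBJECT)))
# [('FDsys ACP', 'R0b002ee180b003b0')]
```

The returned pair matches the <objectIdentifier> of the representation example earlier, so the two records can be joined on (type, value).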
• Events – refers to actions that involve or affect at least one object or agent known to the repository
o Events are critical for maintaining the digital provenance of an object (they help demonstrate the authenticity of the object)
• Examples of events:
o Modifying a document
o Actions that create new relationships
An object could become related to another object as the result of an event, for instance when a program takes file 1 and generates a different version known as file 2
o Actions that check the validity and integrity of objects (e.g. a virus scan)
• Sample syntax
<event> </event>
• The information that can be recorded under event includes:
o A unique identifier for the event under <eventIdentifier>
The <eventIdentifierType> refers to the classification of the domain that creates the event identifier. The <eventIdentifierValue> refers to the value of the event identifier.
o The type of event under <eventType>
Classifies the nature of the event.
o The date and time of the event under <eventDateTime>
• Additional information that can be recorded under event includes:
o A detailed description of the event under <eventDetail>
o The outcome of the event under <eventOutcomeInformation>
Indicates whether the event was a success, partial success, or failure.
o Agents involved in the event and their specific roles under <linkingAgentIdentifier>
The <linkingAgentIdentifierType> refers to the classification of the domain that creates the linking agent identifier. The <linkingAgentIdentifierValue> refers to the “value of the linking agent identifier”. The <linkingAgentRole> indicates the role of the agent in the event.
Agent roles are recorded here because an agent can play different roles in different events.
o Objects involved in the event and their specific roles under <linkingObjectIdentifier>
The <linkingObjectIdentifierType> refers to the classification of the domain that creates the linking object identifier. The <linkingObjectIdentifierValue> refers to the “value of the linking object identifier”. The <linkingObjectRole> indicates the role of the object in the event.
<event>
<eventIdentifier>
<eventIdentifierType>FDsys:event</eventIdentifierType>
<eventIdentifierValue>1cdd2b6c-5a2d-449b-b386-ebb15eb4af11</eventIdentifierValue>
</eventIdentifier>
<eventType>Rendition Submitted</eventType>
<eventDateTime>2010-10-06T19:38:47-04:00</eventDateTime>
<eventDetail>Rendition R0b002ee180b003b0, uploaded by hotfolderadmin, was submitted in the Submission Information package P0b002ee180b003af</eventDetail>
<eventOutcomeInformation>
<eventOutcome>Success</eventOutcome>
</eventOutcomeInformation>
<linkingAgentIdentifier>
<linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
<linkingAgentIdentifierValue>hotfolderadmin</linkingAgentIdentifierValue>
<linkingAgentRole>implementer</linkingAgentRole>
</linkingAgentIdentifier>
<linkingObjectIdentifier>
<linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>
<linkingObjectIdentifierValue>R0b002ee180b003b0</linkingObjectIdentifierValue>
<linkingObjectRole>outcome</linkingObjectRole>
</linkingObjectIdentifier>
</event>
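A repository typically writes an event record like the one above whenever it performs an action. The sketch below builds an <event> with the same element order using xml.etree.ElementTree. The function name `make_event` is hypothetical, the identifier types "FDsys:event", "FDsys:agent" and "FDsys" are copied from the example, and using a random UUID for <eventIdentifierValue> is an assumption (the example value happens to have UUID form). An offset-aware datetime produces the same timestamp style as <eventDateTime> above.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone


def make_event(event_type, detail, outcome, agent_id, agent_role,
               object_id, object_role, when):
    """Build an <event> element with the same structure as the example above."""
    event = ET.Element("event")

    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "FDsys:event"
    # Assumption: a random UUID is an acceptable unique identifier value.
    ET.SubElement(ident, "eventIdentifierValue").text = str(uuid.uuid4())

    ET.SubElement(event, "eventType").text = event_type
    # isoformat() on an offset-aware datetime yields e.g. 2010-10-06T19:38:47-04:00
    ET.SubElement(event, "eventDateTime").text = when.isoformat()
    ET.SubElement(event, "eventDetail").text = detail

    info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(info, "eventOutcome").text = outcome

    agent = ET.SubElement(event, "linkingAgentIdentifier")
    ET.SubElement(agent, "linkingAgentIdentifierType").text = "FDsys:agent"
    ET.SubElement(agent, "linkingAgentIdentifierValue").text = agent_id
    ET.SubElement(agent, "linkingAgentRole").text = agent_role

    obj = ET.SubElement(event, "linkingObjectIdentifier")
    ET.SubElement(obj, "linkingObjectIdentifierType").text = "FDsys"
    ET.SubElement(obj, "linkingObjectIdentifierValue").text = object_id
    ET.SubElement(obj, "linkingObjectRole").text = object_role
    return event


when = datetime(2010, 10, 6, 19, 38, 47, tzinfo=timezone(timedelta(hours=-4)))
event = make_event("Rendition Submitted", "Rendition R0b002ee180b003b0 was submitted",
                   "Success", "hotfolderadmin", "implementer",
                   "R0b002ee180b003b0", "outcome", when)
print(event.findtext("eventDateTime"))  # 2010-10-06T19:38:47-04:00
print(ET.tostring(event, encoding="unicode"))
```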
• Sample syntax
<agent> </agent>
• The information that can be recorded under agent includes:
o A unique identifier for the agent under <agentIdentifier>
Information contained within <agentIdentifierType> refers to the classification of the domain that creates the agent identifier.
Information contained within <agentIdentifierValue> refers to the “value of the agent identifier”.
o The agent’s name under <agentName>
o The type of agent (person, organization, or software) under <agentType>
<agent>
<agentIdentifier>
<agentIdentifierType>FDsys:agent</agentIdentifierType>
<agentIdentifierValue>hotfolderadmin</agentIdentifierValue>
</agentIdentifier>
<agentName>hotfolderadmin</agentName>
<agentType>Person</agentType>
</agent>
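The event and agent records are linked only through matching identifier pairs: the event's (linkingAgentIdentifierType, linkingAgentIdentifierValue) must equal an agent's (agentIdentifierType, agentIdentifierValue). The sketch below performs that join with xml.etree.ElementTree on abbreviated copies of the two examples; the function name `agents_for_event` is hypothetical.

```python
import xml.etree.ElementTree as ET

EVENT = """<event>
  <linkingAgentIdentifier>
    <linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
    <linkingAgentIdentifierValue>hotfolderadmin</linkingAgentIdentifierValue>
    <linkingAgentRole>implementer</linkingAgentRole>
  </linkingAgentIdentifier>
</event>"""

AGENT = """<agent>
  <agentIdentifier>
    <agentIdentifierType>FDsys:agent</agentIdentifierType>
    <agentIdentifierValue>hotfolderadmin</agentIdentifierValue>
  </agentIdentifier>
  <agentName>hotfolderadmin</agentName>
  <agentType>Person</agentType>
</agent>"""


def agents_for_event(event, agents):
    """Return (role, agentName, agentType) for each agent linked to the event."""
    # Index agents by every (type, value) pair; agentIdentifier is repeatable.
    index = {}
    for agent in agents:
        for aid in agent.findall("agentIdentifier"):
            key = (aid.findtext("agentIdentifierType"), aid.findtext("agentIdentifierValue"))
            index[key] = agent
    result = []
    for link in event.findall("linkingAgentIdentifier"):
        key = (link.findtext("linkingAgentIdentifierType"),
               link.findtext("linkingAgentIdentifierValue"))
        agent = index.get(key)
        if agent is not None:  # Element truthiness depends on children, so test for None
            result.append((link.findtext("linkingAgentRole"),
                           agent.findtext("agentName"), agent.findtext("agentType")))
    return result


print(agents_for_event(ET.fromstring(EVENT), [ET.fromstring(AGENT)]))
# [('implementer', 'hotfolderadmin', 'Person')]
```

Because the role lives on the link rather than on the agent, the same agent record can appear as "implementer" in one event and in a different role in another.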
• Rights – refers to the rights and permissions that are directly relevant to preserving objects
• Sample syntax
<rights> </rights>
• The information that can be recorded under rights includes:
o A unique identifier for the rights statement
o The action(s) that the rights statement allows
o The object(s) to which the statement applies
o The agents involved in the rights statement and their roles
• Note: Keep in mind that FDsys doesn’t use <rights>
• xmlns – declares the default namespace of the document, identified by a unique URI (Note: the XML parser does not use the namespace URI to look up information)
• xmlns:xsi – declares the xsi prefix for the XML Schema instance namespace, which makes attributes such as xsi:schemaLocation and xsi:type available; declaring it does not by itself make a parser validate the document
• xsi:schemaLocation: the first value refers to the namespace and the second value refers to the location of the schema for that namespace; in this case it is the PREMIS XML schema (premis.xsd)
• version: refers to the PREMIS version
• Example:
<premis xmlns="info:lc/xmlns/premis-v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2
http://www.loc.gov/standards/premis/premis.xsd" version="2.0">
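The default xmlns has a practical consequence when premis.xml is processed: every unprefixed element is in the info:lc/xmlns/premis-v2 namespace, so queries must be namespace-qualified. The sketch below uses xml.etree.ElementTree, which does not fetch or validate against the schema; the "premis" prefix in NS is a local choice for this script only, and the sample body is the representation example from earlier.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NS = {"premis": PREMIS_NS}  # prefix used only in this script's queries

DOC = """<premis xmlns="info:lc/xmlns/premis-v2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="info:lc/xmlns/premis-v2
    http://www.loc.gov/standards/premis/premis.xsd" version="2.0">
  <object xsi:type="representation">
    <objectIdentifier>
      <objectIdentifierType>FDsys ACP</objectIdentifierType>
      <objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>
    </objectIdentifier>
  </object>
</premis>"""


def summarize(xml_text):
    root = ET.fromstring(xml_text)
    # schemaLocation holds whitespace-separated (namespace, location) pairs.
    parts = root.get(f"{{{XSI_NS}}}schemaLocation").split()
    schemas = dict(zip(parts[0::2], parts[1::2]))
    # The default xmlns puts every element in the PREMIS namespace.
    values = [v.text for v in root.findall(
        "premis:object/premis:objectIdentifier/premis:objectIdentifierValue", NS)]
    return root.tag, root.get("version"), schemas, values


print(summarize(DOC))
# An unqualified query finds nothing, because "object" has no namespace:
print(ET.fromstring(DOC).findall("object"))  # []
```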
• When is premis.xml used by METS (aip.xml)?
o A PREMIS digital object whose components must be organized so that its integrity is preserved (this organization is known as structural metadata) uses METS to accomplish this.
o METS uses a pointer to metadata located outside the METS document. More specifically, it uses an xlink:href attribute to indicate the location of that file.
o Example code from aip.xml:
<!-- PREMIS OBJECT -->
<mets:amdSec ID="AMD_OTHER">
<mets:techMD ID="D09002ee180affcca-TEC">
<mets:mdRef ID="M09002ee180affcca-tdiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
</mets:techMD>
<mets:digiprovMD ID="D09002ee180affcca-DIG">
<mets:mdRef ID="M09002ee180affcca-ddiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
</mets:digiprovMD>
</mets:amdSec>
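To follow these pointers programmatically, a script can collect every <mets:mdRef> with MDTYPE="PREMIS" and read its xlink:href. The sketch below does this with xml.etree.ElementTree. The fragment is the amdSec above with xmlns:mets added so it parses on its own; the METS URI http://www.loc.gov/METS/ is assumed to be what aip.xml declares on its root. The function name `premis_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS}

AMD_SEC = """<mets:amdSec xmlns:mets="http://www.loc.gov/METS/" ID="AMD_OTHER">
  <mets:techMD ID="D09002ee180affcca-TEC">
    <mets:mdRef ID="M09002ee180affcca-tdiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
      LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
  </mets:techMD>
  <mets:digiprovMD ID="D09002ee180affcca-DIG">
    <mets:mdRef ID="M09002ee180affcca-ddiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
      LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
  </mets:digiprovMD>
</mets:amdSec>"""


def premis_refs(amd_sec):
    """Return (section kind, href) for every PREMIS mdRef in an amdSec."""
    refs = []
    for section in amd_sec:
        kind = section.tag.split("}", 1)[1]  # e.g. "techMD" or "digiprovMD"
        for md_ref in section.findall("mets:mdRef", NS):
            if md_ref.get("MDTYPE") == "PREMIS":
                refs.append((kind, md_ref.get(f"{{{XLINK_NS}}}href")))
    return refs


print(premis_refs(ET.fromstring(AMD_SEC)))
# [('techMD', 'file:/premis.xml'), ('digiprovMD', 'file:/premis.xml')]
```

Both sections point at the same premis.xml, which is consistent with one PREMIS file holding both the object (technical) and event/agent (digital provenance) information.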
o METS uses a structural map (<mets:structMap>) to organize the components of the PREMIS object.
o Example code from aip.xml:
<mets:structMap>
<mets:div ID="R0b002ee180b0044c-div" LABEL="xml-submitted">
<mets:fptr FILEID="D09002ee180b00449" />
<mets:div ID="R0b002ee180b00452-div" LABEL="Graphic Support Documents">
<mets:fptr FILEID="D09002ee180affca3" />
<mets:fptr FILEID="D09002ee180b0045b" />
<mets:fptr FILEID="D09002ee180b00464" />
<mets:fptr FILEID="D09002ee180b0046e" />
<mets:fptr FILEID="D09002ee180b00477" />
<mets:fptr FILEID="D09002ee180b0047c" />
<mets:fptr FILEID="D09002ee180b00483" />
<mets:fptr FILEID="D09002ee180b00493" />
<mets:fptr FILEID="D09002ee180b00499" />
<mets:fptr FILEID="D09002ee180b004a4" />
<mets:fptr FILEID="D09002ee180b004af" />
<mets:fptr FILEID="D09002ee180b004bc" />
<mets:fptr FILEID="D09002ee180b004c6" />
</mets:div>
</mets:div>
<mets:fptr FILEID="D09002ee180b004ce" />
<mets:fptr FILEID="D09002ee180b004cf" />
</mets:structMap>
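Each <mets:div> in the structural map is a component, and each <mets:fptr FILEID="…"> points to a file entry elsewhere in the METS document (in METS, FILEID refers to a <mets:file> in the file section). The sketch below walks the nested divs with xml.etree.ElementTree and prints an indented outline. It uses an abbreviated copy of the structMap with the METS namespace declared (assumed to be http://www.loc.gov/METS/), and the function name `outline` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

STRUCT_MAP = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/">
  <mets:div ID="R0b002ee180b0044c-div" LABEL="xml-submitted">
    <mets:fptr FILEID="D09002ee180b00449" />
    <mets:div ID="R0b002ee180b00452-div" LABEL="Graphic Support Documents">
      <mets:fptr FILEID="D09002ee180affca3" />
      <mets:fptr FILEID="D09002ee180b0045b" />
    </mets:div>
  </mets:div>
</mets:structMap>"""


def outline(struct_map):
    """Return (depth, LABEL, [FILEID, ...]) for each div, in document order."""
    rows = []

    def visit(div, depth):
        file_ids = [f.get("FILEID") for f in div.findall("mets:fptr", NS)]
        rows.append((depth, div.get("LABEL"), file_ids))
        for child in div.findall("mets:div", NS):  # nested components
            visit(child, depth + 1)

    for top in struct_map.findall("mets:div", NS):
        visit(top, 0)
    return rows


for depth, label, file_ids in outline(ET.fromstring(STRUCT_MAP)):
    print("  " * depth + f"{label}: {', '.join(file_ids)}")
```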