PREMIS



• What is PREMIS?

o PREservation Metadata: Implementation Strategies

• When is PREMIS used?

o PREMIS is used for “repository design, evaluation, and archived information packaged among preservation repositories”

• How is PREMIS used?

o The PREMIS Data Dictionary provides guidelines regarding “the information a repository uses to support the digital preservation process”


• What is “preservation metadata” referring to?

o “It is information that supports and documents the digital preservation process”. This includes information such as:

▪ Provenance – refers to who has had custody or ownership of the digital object

▪ Authenticity – refers to whether the digital object is what it purports to be

▪ Preservation activity – refers to the activities that have been carried out to preserve the digital object

▪ Technical environment – refers to what is needed to render, interpret, and use the digital object

▪ Rights management – refers to the intellectual property rights that must be observed

PREMIS Data Dictionary

• Conventions for each entry in the PREMIS Data Dictionary

o Name of the semantic unit: a descriptive name that refers to a piece of information or knowledge

▪ Example: objectIdentifier under <object>

o Semantic components: refers to the sub-units held within a container.

o Definition: refers to the meaning of the semantic unit.

o Rationale: explains why the semantic unit is needed

o Data constraint: indicates how the semantic unit should be encoded.

▪ Container: refers to an XML tag that has no value of its own and instead serves to group related semantic units

▪ None: indicates that the semantic unit can take a value of any form

▪ Value should be taken from a controlled vocabulary: “PREMIS Data Dictionary does not specify what this authority list of values should be, and it is assumed that different repositories will use different vocabularies”.

▪ Extension container: a container designed to provide a place for non-PREMIS metadata


o Object category: specifies the category of object to which the semantic unit applies (a representation, file, or bitstream).

o Applicability: indicates whether the semantic unit applies to that category of object.

o Example: sample values that the semantic unit may take

o Repeatability: indicates whether the semantic unit can take multiple values

o Obligation: indicates whether the value of the semantic unit is mandatory, meaning the repository must know this information

o Creation/Maintenance notes: further detail on how the values are created and/or updated

o Usage notes: provides information regarding the use of the semantic unit.

PREMIS Data Dictionary Mandatory Semantic Units

• objectIdentifier *

• objectCategory

• objectCharacteristics *

• format *

• storage *

• eventIdentifier

• eventType

• eventDateTime

• agentIdentifier *

Note: * indicates semantic units that are repeatable
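
The mandatory units listed above can be checked mechanically. The following is a minimal sketch, assuming records are written without a namespace prefix as in the XML examples in this document. The `MANDATORY` table and the `missing_units` helper are hypothetical names introduced for illustration, and the table covers only units that are direct children of the entity.

```python
import xml.etree.ElementTree as ET

# Mandatory child elements per entity, taken from the list above.
# objectCategory is left out because the XML examples in this document carry
# it in the xsi:type attribute; objectCharacteristics, format and storage are
# left out because (an assumption of this sketch) they are only checked for
# some object categories.
MANDATORY = {
    "object": ["objectIdentifier"],
    "event": ["eventIdentifier", "eventType", "eventDateTime"],
    "agent": ["agentIdentifier"],
}


def missing_units(xml_text):
    """List mandatory child elements that are absent from one entity record."""
    entity = ET.fromstring(xml_text)
    required = MANDATORY.get(entity.tag, [])
    return [name for name in required if entity.find(name) is None]


# Hypothetical, deliberately incomplete record for illustration.
incomplete_event = "<event><eventType>Rendition Submitted</eventType></event>"
print(missing_units(incomplete_event))
```

A real repository would more likely validate against the PREMIS XML schema; this sketch only illustrates what "mandatory" means in practice.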

PREMIS Data Model

PREMIS Intellectual Entity

Intellectual Entity – refers to content that can be described as a unit (e.g. books, maps, articles)

PREMIS Object Entity

• Objects – refer to units of information in digital form.

PREMIS defines different kinds of objects: an object can be a file, a bitstream, or a representation.

o File – a computer file, such as a PDF, TXT, or JPEG

o Bitstream – refers to data bits within a file that share common properties for preservation purposes


o Representation – refers to the set of files, including structural metadata, that must be identified, stored, and maintained in order to assemble a complete rendition of an Intellectual Entity.

▪ For example, the text files and image files of a magazine are together required to form a representation.


• Sample syntax

<object> </object>

• The units of information that can be recorded include:

o Type of object (file, bitstream, or representation)

o A unique identifier for the object under <objectIdentifier>

▪ Stores the type and value of the identifier. The type (<objectIdentifierType>) refers to the classification of the domain that creates the object identifier. The value (<objectIdentifierValue>) is the identifier itself.

• For example,

<object xsi:type="representation">

<objectIdentifier>

<objectIdentifierType>FDsys ACP</objectIdentifierType>

<objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>

</objectIdentifier>

</object>

This segment states that the object is a representation (that is, a set of files) and that the representation has a unique identifier.
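
As a rough illustration of how a program might read this segment, the sketch below parses the object with Python's standard library. It assumes the `xsi` prefix is declared on the element itself (in a full premis.xml the declaration sits on the root element), and `describe_object` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the xsi prefix; needed to read the xsi:type attribute.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

SAMPLE = """
<object xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="representation">
  <objectIdentifier>
    <objectIdentifierType>FDsys ACP</objectIdentifierType>
    <objectIdentifierValue>R0b002ee180b003b0</objectIdentifierValue>
  </objectIdentifier>
</object>
"""


def describe_object(xml_text):
    """Return (category, identifier type, identifier value) for one object."""
    obj = ET.fromstring(xml_text)
    category = obj.get(f"{{{XSI}}}type")
    id_type = obj.findtext("objectIdentifier/objectIdentifierType")
    id_value = obj.findtext("objectIdentifier/objectIdentifierValue")
    return category, id_type, id_value


print(describe_object(SAMPLE))
```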

PREMIS Object Example

• Other units of information that can be recorded include:

o “Information indicating the policy on the set of preservation functions to be applied to an object” under <preservationLevel>

<object xsi:type="file">

<objectIdentifier>

<objectIdentifierType>FDsys ACP</objectIdentifierType>

<objectIdentifierValue>D09002ee180b003a9</objectIdentifierValue>

</objectIdentifier>

<preservationLevel>

<preservationLevelValue>full</preservationLevelValue>

</preservationLevel>

PREMIS Object Entity

• Other units of information that can be recorded include:

o Information indicating whether the object is subject to one or more processes of decoding or unbundling under <compositionLevel>

o Information used to verify whether an object has been changed in an undocumented or unauthorized way under <fixity>

▪ Information contained within <messageDigestAlgorithm> refers to the algorithm used to produce the message digest for the digital object.

▪ Information contained within <messageDigest> refers to the “output of the message digest algorithm”

▪ Information contained within <messageDigestOriginator> refers to the agent that generated the original message digest against which later fixity checks are compared.


o The size of the object, in bytes, under <size>

o The format of the object under <format>

▪ <formatDesignation> refers to the “identification of the format of the object”

• Information contained within <formatName> classifies the format of the file or bitstream.

▪ <formatRegistry> identifies additional information about the format by using an entry in a format registry.

• Information contained within <formatRegistryName> identifies the format registry that was used.

• Information contained within <formatRegistryKey> refers to the “unique key used to reference an entry for this format in a format registry”

▪ <formatNote> contains additional information about the format

• For example:

PREMIS Object Example

<objectCharacteristics>

<compositionLevel>0</compositionLevel>

<fixity>

<messageDigestAlgorithm>SHA-256</messageDigestAlgorithm>

<messageDigest>4977070b92f0bb2642c6be368ad68a8d1d1c5dbbb3310544db781f56a860b0a1</messageDigest>

<messageDigestOriginator>FDsys</messageDigestOriginator>

</fixity>

<size>9326</size>

<format>

<formatDesignation>

<formatName>text/plain</formatName>

</formatDesignation>

<formatRegistry>

<formatRegistryName>PRONOM</formatRegistryName>

<formatRegistryKey>x-fmt/111</formatRegistryKey>

</formatRegistry>

<formatNote>Plain Text File</formatNote>

</format>

</objectCharacteristics>
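
The fixity values recorded above are meant to be recomputed and compared later. The following is a minimal sketch of such a check, assuming the file content is available as bytes. `fixity_matches` is a hypothetical helper, and the sample bytes are made up for illustration; they are not the FDsys file.

```python
import hashlib


def fixity_matches(data: bytes, algorithm: str, recorded_digest: str) -> bool:
    """Recompute the digest of `data` and compare it with the recorded value."""
    # PREMIS records names such as "SHA-256"; hashlib expects "sha256".
    name = algorithm.replace("-", "").lower()
    computed = hashlib.new(name, data).hexdigest()
    return computed == recorded_digest.strip().lower()


content = b"example file content"               # hypothetical bytes
recorded = hashlib.sha256(content).hexdigest()  # stands in for <messageDigest>

print(fixity_matches(content, "SHA-256", recorded))         # True
print(fixity_matches(content + b"!", "SHA-256", recorded))  # False
```

A mismatch would normally be recorded as a failed fixity-check event rather than silently ignored.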

PREMIS Object Entity

• Other units of information that can be recorded include:

o The original name of the object (prior to being named by the repository) under <originalName>

o Information about where and how files are stored in the repository under <storage>

▪ <contentLocation> stores information needed to retrieve a file from a storage system.

• Information contained within <contentLocationType> refers to the way of accessing the location of the content.

• Information contained within <contentLocationValue> refers to the “location of the content used by the storage system”.

▪ The medium on which an object is stored is contained within <storageMedium>


o Information describing a relationship between an object and one or more other objects under <relationship>

▪ <relationshipType> gives a high-level classification of the nature of the relationship.

▪ <relationshipSubType> gives a more specific characterization of the nature of the relationship.

▪ <relatedObjectIdentification> refers to “the identifier of the related resource”.

• Information contained within <relatedObjectIdentifierType> refers to the classification of the domain that creates the identifier.

• Information contained within <relatedObjectIdentifierValue> refers to “the value of the identifier”.

PREMIS Object Example

<originalName>S3880IS.txt</originalName>

<storage>

<contentLocation>

<contentLocationType>URI</contentLocationType>

<contentLocationValue>file:/u02/app/emc/documentum/data/fdsysprod1/fdsysprod1/content_storage_01/00002ee1/80/55/b0/48.txt</contentLocationValue>

</contentLocation>

<storageMedium>hard disk</storageMedium>

</storage>

<relationship>

<relationshipType>structural</relationshipType>

<relationshipSubType>is part of</relationshipSubType>

<relatedObjectIdentification>

<relatedObjectIdentifierType>FDsys ACP</relatedObjectIdentifierType>

<relatedObjectIdentifierValue>R0b002ee180b003b0</relatedObjectIdentifierValue>

</relatedObjectIdentification>

</relationship>

</object>
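
To show how the relationship block links a file back to its representation, here is a small sketch that lists the related objects of one object. It assumes the identifier type and value are nested inside `<relatedObjectIdentification>`, as the description of that element suggests, and `related_objects` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<object>
  <relationship>
    <relationshipType>structural</relationshipType>
    <relationshipSubType>is part of</relationshipSubType>
    <relatedObjectIdentification>
      <relatedObjectIdentifierType>FDsys ACP</relatedObjectIdentifierType>
      <relatedObjectIdentifierValue>R0b002ee180b003b0</relatedObjectIdentifierValue>
    </relatedObjectIdentification>
  </relationship>
</object>
"""


def related_objects(xml_text):
    """Return (type, subtype, related identifier value) for each relationship."""
    obj = ET.fromstring(xml_text)
    found = []
    for rel in obj.findall("relationship"):
        # One relationship may point at several related objects.
        for target in rel.findall("relatedObjectIdentification"):
            found.append((
                rel.findtext("relationshipType"),
                rel.findtext("relationshipSubType"),
                target.findtext("relatedObjectIdentifierValue"),
            ))
    return found


print(related_objects(SAMPLE))
```

Here the file reports that it "is part of" the representation R0b002ee180b003b0 shown in the earlier representation example.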

PREMIS Event Entity

• Events – refer to actions that involve an object and an agent known to the system

o Events are critical for maintaining the digital provenance of an object (which helps demonstrate the authenticity of the object)

• Examples of events:

o Modifying a document

o Actions that create new relationships

▪ An object can become related to another object as a result of a particular event, for instance if a program takes file 1 and generates a different version known as file 2

o Actions that check the validity and integrity of objects (e.g. a virus scan)


• Sample syntax

<event> </event>

• The information that can be recorded under event includes:

o A unique identifier for the event under <eventIdentifier>

▪ The <eventIdentifierType> refers to the classification of the domain that creates the event identifier. The <eventIdentifierValue> refers to the value of the event identifier.

o The type of event under <eventType>

▪ Classifies the nature of the event.

o The date and time of the event under <eventDateTime>


• Additional information that can be recorded under event includes:

o A detailed description of the event under <eventDetail>

o The outcome of the event under <eventOutcomeInformation>

▪ Indicates if the event was a success, partial success, or failure.

o Agents involved in the event and their specific roles under <linkingAgentIdentifier>

▪ The <linkingAgentIdentifierType> refers to the classification of the domain that creates the linking agent identifier. The <linkingAgentIdentifierValue> refers to the “value of the linking agent identifier”. The <linkingAgentRole> indicates the role of the agent associated with the event.

▪ Agent roles are defined here because agents can perform different roles in different events

o Objects involved in the event and their specific roles under <linkingObjectIdentifier>

▪ The <linkingObjectIdentifierType> refers to the classification of the domain that creates the linking object identifier. The <linkingObjectIdentifierValue> refers to the “value of the linking object identifier”. The <linkingObjectRole> indicates the role of the object associated with the event.

PREMIS Event Example

<event>

<eventIdentifier>

<eventIdentifierType>FDsys:event</eventIdentifierType>

<eventIdentifierValue>1cdd2b6c-5a2d-449b-b386-ebb15eb4af11</eventIdentifierValue>

</eventIdentifier>

<eventType>Rendition Submitted</eventType>

<eventDateTime>2010-10-06T19:38:47-04:00</eventDateTime>

<eventDetail>Rendition R0b002ee180b003b0, uploaded by hotfolderadmin, was submitted in the Submission Information package P0b002ee180b003af</eventDetail>

<eventOutcomeInformation>

<eventOutcome>Success</eventOutcome>

</eventOutcomeInformation>

<linkingAgentIdentifier>

<linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>

<linkingAgentIdentifierValue>hotfolderadmin</linkingAgentIdentifierValue>

<linkingAgentRole>implementer</linkingAgentRole>

</linkingAgentIdentifier>

<linkingObjectIdentifier>

<linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>

<linkingObjectIdentifierValue>R0b002ee180b003b0</linkingObjectIdentifierValue>

<linkingObjectRole>outcome</linkingObjectRole>

</linkingObjectIdentifier>

</event>
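
The event record above can be read programmatically in much the same way. The sketch below is an illustration only: it assumes the event is written without a namespace prefix, trims the example to the fields it reads, and uses a hypothetical helper name, `summarize_event`.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

SAMPLE = """
<event>
  <eventType>Rendition Submitted</eventType>
  <eventDateTime>2010-10-06T19:38:47-04:00</eventDateTime>
  <eventOutcomeInformation>
    <eventOutcome>Success</eventOutcome>
  </eventOutcomeInformation>
  <linkingAgentIdentifier>
    <linkingAgentIdentifierValue>hotfolderadmin</linkingAgentIdentifierValue>
    <linkingAgentRole>implementer</linkingAgentRole>
  </linkingAgentIdentifier>
  <linkingObjectIdentifier>
    <linkingObjectIdentifierValue>R0b002ee180b003b0</linkingObjectIdentifierValue>
    <linkingObjectRole>outcome</linkingObjectRole>
  </linkingObjectIdentifier>
</event>
"""


def summarize_event(xml_text):
    """Collect the type, time, outcome, and linked agents/objects of an event."""
    ev = ET.fromstring(xml_text)
    return {
        "type": ev.findtext("eventType"),
        # The timestamp is ISO 8601 with a UTC offset, so it parses directly.
        "when": datetime.fromisoformat(ev.findtext("eventDateTime")),
        "outcome": ev.findtext("eventOutcomeInformation/eventOutcome"),
        "agents": [
            (a.findtext("linkingAgentIdentifierValue"), a.findtext("linkingAgentRole"))
            for a in ev.findall("linkingAgentIdentifier")
        ],
        "objects": [
            (o.findtext("linkingObjectIdentifierValue"), o.findtext("linkingObjectRole"))
            for o in ev.findall("linkingObjectIdentifier")
        ],
    }


summary = summarize_event(SAMPLE)
print(summary["type"], summary["when"].isoformat(), summary["outcome"])
```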

PREMIS Agent Entity

Agents – refer to people, organizations, or software associated with events (more specifically, preservation events) in the life of an object

o In the data model diagram, there is no arrow from the Agent entity to the Object entity because Agents influence Objects indirectly through Events.


• Sample syntax

<agent> </agent>

• The information that can be recorded under agent includes:

o A unique identifier for the agent under <agentIdentifier>

▪ Information contained within <agentIdentifierType> refers to the classification of the domain that creates the agent identifier.

▪ Information contained within <agentIdentifierValue> refers to the “value of the agent identifier”.

o The agent’s name under <agentName>

o The type of agent (person, organization, or software) under <agentType>

PREMIS Agent Example

<agent>

<agentIdentifier>

<agentIdentifierType>FDsys:agent</agentIdentifierType>

<agentIdentifierValue>hotfolderadmin</agentIdentifierValue>

</agentIdentifier>

<agentName>hotfolderadmin</agentName>

<agentType>Person</agentType>

</agent>
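
Because agents reach objects only through events, finding out who acted on an object means going through the event records. The sketch below illustrates that lookup under stated assumptions: a hypothetical un-namespaced `<premis>` wrapper holding one trimmed event and the agent above, matching on identifier value only, and a hypothetical helper name, `agents_for_object`.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<premis>
  <event>
    <eventType>Rendition Submitted</eventType>
    <linkingAgentIdentifier>
      <linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
      <linkingAgentIdentifierValue>hotfolderadmin</linkingAgentIdentifierValue>
      <linkingAgentRole>implementer</linkingAgentRole>
    </linkingAgentIdentifier>
    <linkingObjectIdentifier>
      <linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>
      <linkingObjectIdentifierValue>R0b002ee180b003b0</linkingObjectIdentifierValue>
      <linkingObjectRole>outcome</linkingObjectRole>
    </linkingObjectIdentifier>
  </event>
  <agent>
    <agentIdentifier>
      <agentIdentifierType>FDsys:agent</agentIdentifierType>
      <agentIdentifierValue>hotfolderadmin</agentIdentifierValue>
    </agentIdentifier>
    <agentName>hotfolderadmin</agentName>
    <agentType>Person</agentType>
  </agent>
</premis>
"""


def agents_for_object(xml_text, object_id):
    """Find agents linked to an object via the events that mention both."""
    root = ET.fromstring(xml_text)
    # Index agent records by identifier value for quick lookup.
    agents = {
        a.findtext("agentIdentifier/agentIdentifierValue"): a
        for a in root.findall("agent")
    }
    results = []
    for event in root.findall("event"):
        linked_objects = [
            o.findtext("linkingObjectIdentifierValue")
            for o in event.findall("linkingObjectIdentifier")
        ]
        if object_id not in linked_objects:
            continue
        for link in event.findall("linkingAgentIdentifier"):
            agent = agents.get(link.findtext("linkingAgentIdentifierValue"))
            if agent is not None:
                results.append((
                    agent.findtext("agentName"),
                    agent.findtext("agentType"),
                    link.findtext("linkingAgentRole"),
                    event.findtext("eventType"),
                ))
    return results


print(agents_for_object(SAMPLE, "R0b002ee180b003b0"))
```

The role comes from the event's linking element rather than from the agent record, since the same agent can play different roles in different events.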

PREMIS Rights Entity

• Rights – refers to the rights and permissions that are directly relevant to preserving objects

• Sample syntax

<rights> </rights>

• The information that can be recorded under rights includes:

o A unique identifier for the rights statement

o The action(s) that the rights statement allows

o The object(s) to which the statement applies

o The agents involved in the rights statement and their roles

• Note: Keep in mind that FDsys doesn’t use <rights>

premis.xml Header

• xmlns – declares the default namespace, a unique value that identifies the vocabulary the elements belong to (Note: the XML parser does not use the namespace URI to look up information)

• xmlns:xsi – declares the XML Schema instance namespace, which makes attributes such as xsi:schemaLocation and xsi:type available so that the document can be validated against a schema

• xsi:schemaLocation – the first value refers to the namespace that will be used and the second value refers to the location of the schema for that namespace; in this case it is the PREMIS XML schema (premis.xsd).

• version – refers to the PREMIS version

• Example:

<premis xmlns="info:lc/xmlns/premis-v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2

http://www.loc.gov/standards/premis/premis.xsd" version="2.0">
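
The header attributes can be inspected with a standard XML parser, which also shows that the namespace is treated as a plain identifier and is never fetched. This is a minimal sketch: the opening tag is written as a self-closing element so it parses on its own, and `read_header` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The header from the example, closed with "/>" so it is a complete document.
HEADER = (
    '<premis xmlns="info:lc/xmlns/premis-v2" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="info:lc/xmlns/premis-v2 '
    'http://www.loc.gov/standards/premis/premis.xsd" version="2.0"/>'
)


def read_header(xml_text):
    """Return (namespace, element name, version, schema locations)."""
    root = ET.fromstring(xml_text)
    # ElementTree stores qualified names as "{namespace}local".
    namespace, _, local = root.tag[1:].partition("}")
    pairs = root.get(f"{{{XSI}}}schemaLocation").split()
    # schemaLocation alternates: namespace, location, namespace, location, ...
    locations = dict(zip(pairs[0::2], pairs[1::2]))
    return namespace, local, root.get("version"), locations


print(read_header(HEADER))
```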

Additional Information On Using premis.xml

• When will premis.xml be used by METS (aip.xml)?

o A PREMIS digital object whose components must be organized so that its integrity is preserved needs structural metadata, and METS is used to supply it.

o METS uses a pointer to metadata located outside the METS document. More specifically, it uses an xlink:href attribute to indicate the location of that file.

o Example code from aip.xml:

<!-- PREMIS OBJECT -->

<mets:amdSec ID="AMD_OTHER">

<mets:techMD ID="D09002ee180affcca-TEC">

<mets:mdRef ID="M09002ee180affcca-tdiv" MDTYPE="PREMIS" MIMETYPE="text/xml"

LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />

</mets:techMD>

<mets:digiprovMD ID="D09002ee180affcca-DIG">

<mets:mdRef ID="M09002ee180affcca-ddiv" MDTYPE="PREMIS" MIMETYPE="text/xml"

LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />

</mets:digiprovMD>

</mets:amdSec>
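
To illustrate how the pointers are followed, the sketch below collects every `mets:mdRef` of type PREMIS and its `xlink:href`. The excerpt does not show where the `mets` prefix is declared, so the sample binds it to the standard METS namespace URI as an assumption; `premis_references` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",        # assumed binding for mets:
    "xlink": "http://www.w3.org/1999/xlink",
}

SAMPLE = """
<mets:amdSec xmlns:mets="http://www.loc.gov/METS/" ID="AMD_OTHER">
  <mets:techMD ID="D09002ee180affcca-TEC">
    <mets:mdRef ID="M09002ee180affcca-tdiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
      LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
  </mets:techMD>
  <mets:digiprovMD ID="D09002ee180affcca-DIG">
    <mets:mdRef ID="M09002ee180affcca-ddiv" MDTYPE="PREMIS" MIMETYPE="text/xml"
      LOCTYPE="URL" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="file:/premis.xml" />
  </mets:digiprovMD>
</mets:amdSec>
"""


def premis_references(xml_text):
    """Return (section name, mdRef ID, href) for each PREMIS metadata pointer."""
    root = ET.fromstring(xml_text)
    href_attr = f"{{{NS['xlink']}}}href"
    refs = []
    for section in root:  # techMD, digiprovMD, ...
        for ref in section.findall("mets:mdRef", NS):
            if ref.get("MDTYPE") == "PREMIS":
                section_name = section.tag.split("}")[1]
                refs.append((section_name, ref.get("ID"), ref.get(href_attr)))
    return refs


for entry in premis_references(SAMPLE):
    print(entry)
```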


o METS uses a structural map (<mets:structMap>) to organize the components of the PREMIS object.

o Example code from aip.xml:

<mets:structMap>

<mets:div ID="R0b002ee180b0044c-div" LABEL="xml-submitted">

<mets:fptr FILEID="D09002ee180b00449" />

<mets:div ID="R0b002ee180b00452-div" LABEL="Graphic Support Documents">

<mets:fptr FILEID="D09002ee180affca3" />

<mets:fptr FILEID="D09002ee180b0045b" />

<mets:fptr FILEID="D09002ee180b00464" />

<mets:fptr FILEID="D09002ee180b0046e" />

<mets:fptr FILEID="D09002ee180b00477" />

<mets:fptr FILEID="D09002ee180b0047c" />

<mets:fptr FILEID="D09002ee180b00483" />

<mets:fptr FILEID="D09002ee180b00493" />

<mets:fptr FILEID="D09002ee180b00499" />

<mets:fptr FILEID="D09002ee180b004a4" />

<mets:fptr FILEID="D09002ee180b004af" />

<mets:fptr FILEID="D09002ee180b004bc" />

<mets:fptr FILEID="D09002ee180b004c6" />

</mets:div>

</mets:div>

<mets:fptr FILEID="D09002ee180b004ce" />

<mets:fptr FILEID="D09002ee180b004cf" />

</mets:structMap>
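
The grouping expressed by the structural map can be recovered by walking its `mets:div` elements. The following sketch uses a shortened copy of the example (two of the graphic file pointers only) and again assumes the standard METS namespace URI for the `mets` prefix; `files_by_division` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed binding for mets:

SAMPLE = """
<mets:structMap xmlns:mets="http://www.loc.gov/METS/">
  <mets:div ID="R0b002ee180b0044c-div" LABEL="xml-submitted">
    <mets:fptr FILEID="D09002ee180b00449" />
    <mets:div ID="R0b002ee180b00452-div" LABEL="Graphic Support Documents">
      <mets:fptr FILEID="D09002ee180affca3" />
      <mets:fptr FILEID="D09002ee180b0045b" />
    </mets:div>
  </mets:div>
</mets:structMap>
"""


def files_by_division(xml_text):
    """Map each division label to the file IDs it points to directly."""
    root = ET.fromstring(xml_text)
    listing = {}
    # iter() visits nested divisions too, in document order.
    for div in root.iter(f"{{{NS['mets']}}}div"):
        listing[div.get("LABEL")] = [
            fptr.get("FILEID") for fptr in div.findall("mets:fptr", NS)
        ]
    return listing


for label, file_ids in files_by_division(SAMPLE).items():
    print(label, file_ids)
```

Each FILEID matches an object identifier value such as the D09002ee180b003a9 style values recorded in premis.xml, which is how the two documents are tied together in these examples.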

References

Understanding PREMIS

o http://www.loc.gov/standards/premis/understanding-premis.pdf

Data Dictionary for Preservation Metadata

o http://www.oclc.org/research/activities/past/orprojects/pmwg/premis-final.pdf

W3Schools

o http://www.w3schools.com/xml/default.asp
