LIS654 lecture 5 DC metadata and omeka tables Thomas Krichel 2011-10-13 foreword • Terminology is one of the difficult problems in digital librarianship. • I will use the double quotes here to represent a term that is used as it is in omeka. • Please open your winscp, omeka web, web admin and phpmyadmin. items • In omeka, you store “items”. • Item are either digital resources – images – video • or something non-digital of which your are storing a digital representation of – person – event table: “items” • It stores data about each “item” – “id” of the item, an autoincrement – “item_type_id”, number | +1 slide – “collection_id”, number | +2 slide – whether it is “featured”, a Boolean – whether it is “public”, a Boolean – when last “modified”, time – when “added”, time table: “item_types” • Each item is of one type. Types are described in the “item_types” table, with the columns – “id” an autoincrement – “name” the name of the item type, string – “description” a longer explanation what the item type means. • Each record in the “item” table references an id in the type. table: “collections” • Each collection is described here – – – – – – – – – “id” auto_increment “name”, a string “description”, a string users who are “collectors”, a string |not further discussed| whether it is “public”, a Boolean whether it is “featured”, a Boolean when “added”, a time when last “modified”, a time the “owner_id” |not further discussed| items • An items has two aspects to it. – There is the metadata about the items. – There is the item itself. This is in fact a collection of “file”s. table: “files” |1| • The fields in that table are – “id” auto_increment – “item_id” of the item the file attaches to – its “size” in bytes – if it “has_derivative_image” a Boolean – the time last “modified”, a time – the time it was “added”, a time – if it was “stored”, a Boolean table: “files” |2| • More fields of this table – the “authentication” an checksum of the path to the file – the “mime_browser”, a mime type as sent to browser – the “mime_os”, the mime type as determined by the omeka installation, using an external application – the “type_os”, the file type as determined by the omeka installation, using an external application – the “archive_filename”, a random file name – the “original_filename”, filename or URL of origin file storage • The “archive” directory stores files. • The original is in “files”. • Derivative files are in – “thumbnails” – “fullsize” – “square_thumbnails” • I don’t know why the original size is not the full size. metadata • Metadata is a descriptions that can be attached to a “record”. • A record is an aggregate concept that groups “files” and “items”. • Metadata is a set of attribute/value pairs. The attributes are called “elements”. table: “elements” • We start with the “elements” table. It contains all the properties one can attach to anything – an “id” auto_increment – a “record_type_id”, the id of a “record_type” |+1 – a “data_type_id”, the id of a “data_type” |+2 – an “element_set_id”, id of an “element_set” |+3 – an “order” that appears always to be null, unused – a “name” for the property – a “description” containing the fill-in instructions. table: “record_types” This table contains three records id | name | description 1 | All | Elements, element sets, and element texts assigned to this record type relate to all possible records. 2 | Item | Elements, element sets, and element texts assigned to this record type relate to item records. 3 | File | Elements, element sets, and element texts assigned to this record type relate to file records. table: “data_types” Only contains these records id | name | description 1 | Text | A long, typically multi-line text string. Up to 65535 characters. 2 | Tiny Text | A short, typically one-line text string. Up to 255 characters. 3 | Date Range | A date range, begin to end. In format yyyymm-dd yyyy-mm-dd. 4 | Integer | Set of numbers consisting of the natural numbers including 0 (0, 1, 2, 3, ...) and their negatives (0, 1, -2, -3, ...). 9 | Date | A date in format yyyy-mm-dd 10| Date Time | A date and time combination in the format: yyyy-mm-dd hh:mm:ss table: “element_sets” Only contains these records id| record_type_id | name | description 1 |1 | Dublin Core | … “common to all Omeka resources, including items, files, collections, exhibits, and entities” … 3 | 2 | Item Type Metadata | “The item type metadata element set, consisting of all item type elements bundled with Omeka and all item type elements created by an administrator.” 4 | 3 | Omeka Legacy File | … 5 | 3 | Omeka Image File | … “previous versions” … 6 | 3 | Omeka Video File | … “previous versions” … clearing the legacy • Since element sets 4, 5, and 6 are legacy, I believe the elements in them can be deleted mysql> delete from omeka_elements where element_set_id=4; Query OK, 13 rows affected (0.00 sec) mysql> delete from omeka_elements where element_set_id=5; Query OK, 8 rows affected (0.00 sec) mysql> delete from omeka_elements where element_set_id=6; Query OK, 6 rows affected (0.00 sec) clearing the legacy • All these elements had record_type_id=3, now gone – mysql> select * from omeka_elements where record_type_id=3; – Empty set (0.00 sec) • The record type can be deleted – mysql> delete from omeka_record_types where id=3; – Query OK, 1 row affected (0.02 sec) clearing the legacy • Element sets 4, 5, and 6 can be deleted mysql> delete from omeka_element_sets where id=4; Query OK, 1 row affected (0.01 sec) mysql> delete from omeka_element_sets where id=5; Query OK, 1 row affected (0.01 sec) mysql> delete from omeka_element_sets where id=6; Query OK, 1 row affected (0.00 sec) • Things start to make a lot more sense. item-type specific metadata • You can create data elements (aka metadata fields) for a specific item. • You can not however, share these fields across item. • So if you want to express the “geekiness” of your items, you have to add “geekiness” as an element for each item separately. omeka tags • A tag is a way for omeka to group items together. • Each item can have multiple tags and each tag can item can have multiple tags. • For LIS650 veterans, it’s like grouping HTML elements in the <body> into classes. table: tags • Each tag is recorded in this table. It has only two columns – “id”, an autoincrement identifier – the “name” a string up to 256 characters long • This table stores all the tags. table: taggings • This table has the following columns – “id” an auto_increment – “relation_id” gives the id of the item that has been tagged. – “tag_id” gives the number of the tag being given – “entity_id” |who did it?, not further discussed – “type”, a type of action taken, not further discussed. – “time” a timestamp when the action happened. Dublin Core data • Dublin Core is a metadata set that is used in omeka. • This is the common set for all types. • We need to review the official meaning of these elements here. • I quote from Hillman’s Dublin core usage guide. http://dublincore.org/documents/usageguide /elements.shtml dublin core: title • “The name given to the resource. Typically, a Title will be a name by which the resource is formally known.” • “If in doubt about what constitutes the title, repeat the Title element.” dublin core: subject • “The topic of the content of the resource. Typically, a Subject will be expressed as keywords or key phrases or classification codes that describe the topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.” dublin core: description • “An account of the content of the resource. Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a freetext account of the content.” • “Use full sentences.” dublin core: type • “The nature or genre of the content of the resource. Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMIType vocabulary ). To describe the physical or digital manifestation of the resource, use the FORMAT element.” dublin core: source • “A Reference to a resource from which the present resource is derived. The present resource may be derived from the Source resource in whole or part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system”… “include in this area information about a resource that is related intellectually to the described resource but does not fit easily into a Relation element.” dublin core: relation • “A reference to a related resource. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.” dublin core: coverage • “The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary. dublin core: creator • “An entity primarily responsible for making the content of the resource. Examples of a Creator include a person, an organization, or a service. Typically the name of the Creator should be used to indicate the entity.” • “Creators should be listed separately, preferably in the same order that they appear in the publication.” dublin core: publisher • “The entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.” • “The intent of specifying this field is to identify the entity that provides access to the resource. “ dublin core: contributor • An entity responsible for making contributions to the content of the resource. Examples of a Contributor include a person, an organization or a service. Typically, the name of Contributor should be used”. • “The same general guidelines for using names of persons or organizations as Creators apply here.” dublin core: rights • “Information about rights held in and over the resource. Typically a Rights element will contain a rights management statement for the resource, or reference a service providing such information.” • “Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.” dublin core: date • “A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601” “and follows the YYYY-MM-DD format.” dublin core: format • “The physical or digital manifestation of the resource. Typically, Format may include the media-type or dimensions of the resource. Examples of dimensions include size and duration.” • “Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [http://www.iana.org/ assignments/mediatypes/]” dublin core: identifier • “An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Examples of formal identification systems include the Uniform Resource Identifier (URI)” … dublin core: language • “A language of the intellectual content of the resource. Recommended best practice for the values of the Language element is defined by RFC 3066 [RFC 3066, http://www.ietf.org/rfc/ rfc3066.txt] which, in conjunction with ISO 639 [ISO 639, http://www.oasisopen.org/cover/iso639a.html]), defines twoand three-letter primary language tags with optional subtags.” item type specific metadata • There are a bunch of different types that are built-in. • Each type takes Dublin Core metadata as well as some extra metadata • These item-specific metadata fields can be changed using the web interface. omeka item types • Document A resource containing textual data. • Moving Image A series of visual representations that, when shown in succession, impart an impression of motion. • Oral History A resource containing historical information obtained in interviews with persons having firsthand knowledge. omeka item types • Sound A resource whose content is primarily intended to be rendered as audio. • Still Image A static visual representation. Examples of still images are: paintings, drawings, graphic designs, plans and maps. • Website A resource comprising of a web page or web pages and all related assets ( such as images, sound and video files, etc. ). omeka item types • Event A non-persistent, time-based occurrence. Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, and responsible agents associated with an event. Examples include an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration. omeka item types • Email A resource containing textual messages and binary attachments sent electronically from one person to another or one person to many people. • Lesson Plan Instructional materials. • Hyperlink Title, URL, Description or annotation. omeka item types • Person An individual, biographical data, birth and death, etc. • Interactive Resource A resource requiring interaction from the user to be understood, executed, or experienced. Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments omeka items and files • Omeka has items. • Omeka also has files. They attach to items. • Omeka sometimes needs to collectively refer to items and types. – In the database tables, the aggregate is called “records”. This is confusing. – I will call them “itofis” here. • This allows me to discuss the tables. http://openlib.org/home/krichel Please shutdown the computers when you are done. Thank you for your attention!