omeka metadata and tables

advertisement
LIS654 lecture 5
DC metadata and omeka tables
Thomas Krichel
2011-10-13
foreword
• Terminology is one of the difficult problems in
digital librarianship.
• I will use the double quotes here to represent
a term that is used as it is in omeka.
• Please open your winscp, omeka web, web
admin and phpmyadmin.
items
• In omeka, you store “items”.
• Item are either digital resources
– images
– video
• or something non-digital of which your are
storing a digital representation of
– person
– event
table: “items”
• It stores data about each “item”
– “id” of the item, an autoincrement
– “item_type_id”, number | +1 slide
– “collection_id”, number | +2 slide
– whether it is “featured”, a Boolean
– whether it is “public”, a Boolean
– when last “modified”, time
– when “added”, time
table: “item_types”
• Each item is of one type. Types are described
in the “item_types” table, with the columns
– “id” an autoincrement
– “name” the name of the item type, string
– “description” a longer explanation what the item
type means.
• Each record in the “item” table references an
id in the type.
table: “collections”
• Each collection is described here
–
–
–
–
–
–
–
–
–
“id” auto_increment
“name”, a string
“description”, a string
users who are “collectors”, a string |not further discussed|
whether it is “public”, a Boolean
whether it is “featured”, a Boolean
when “added”, a time
when last “modified”, a time
the “owner_id”
|not further discussed|
items
• An items has two aspects to it.
– There is the metadata about the items.
– There is the item itself. This is in fact a collection
of “file”s.
table: “files” |1|
• The fields in that table are
– “id” auto_increment
– “item_id” of the item the file attaches to
– its “size” in bytes
– if it “has_derivative_image” a Boolean
– the time last “modified”, a time
– the time it was “added”, a time
– if it was “stored”, a Boolean
table: “files” |2|
• More fields of this table
– the “authentication” an checksum of the path to the
file
– the “mime_browser”, a mime type as sent to browser
– the “mime_os”, the mime type as determined by the
omeka installation, using an external application
– the “type_os”, the file type as determined by the
omeka installation, using an external application
– the “archive_filename”, a random file name
– the “original_filename”, filename or URL of origin
file storage
• The “archive” directory stores files.
• The original is in “files”.
• Derivative files are in
– “thumbnails”
– “fullsize”
– “square_thumbnails”
• I don’t know why the original size is not the
full size.
metadata
• Metadata is a descriptions that can be
attached to a “record”.
• A record is an aggregate concept that groups
“files” and “items”.
• Metadata is a set of attribute/value pairs. The
attributes are called “elements”.
table: “elements”
• We start with the “elements” table. It contains
all the properties one can attach to anything
– an “id” auto_increment
– a “record_type_id”, the id of a “record_type” |+1
– a “data_type_id”, the id of a “data_type” |+2
– an “element_set_id”, id of an “element_set” |+3
– an “order” that appears always to be null, unused
– a “name” for the property
– a “description” containing the fill-in instructions.
table: “record_types”
This table contains three records
id | name | description
1 | All | Elements, element sets, and element texts
assigned to this record type relate to all possible
records.
2 | Item | Elements, element sets, and element
texts assigned to this record type relate to item
records.
3 | File | Elements, element sets, and element texts
assigned to this record type relate to file records.
table: “data_types”
Only contains these records
id | name | description
1 | Text
| A long, typically multi-line text string. Up to
65535 characters.
2 | Tiny Text | A short, typically one-line text string. Up to 255
characters.
3 | Date Range | A date range, begin to end. In format yyyymm-dd yyyy-mm-dd.
4 | Integer | Set of numbers consisting of the natural
numbers including 0 (0, 1, 2, 3, ...) and their negatives (0, 1, -2, -3, ...).
9 | Date
| A date in format yyyy-mm-dd
10| Date Time | A date and time combination in the format:
yyyy-mm-dd hh:mm:ss
table: “element_sets”
Only contains these records
id| record_type_id | name | description
1 |1 | Dublin Core | … “common to all Omeka resources,
including items, files, collections, exhibits, and entities” …
3 | 2 | Item Type Metadata | “The item type metadata
element set, consisting of all item type elements bundled
with Omeka and all item type elements created by an
administrator.”
4 | 3 | Omeka Legacy File | …
5 | 3 | Omeka Image File | … “previous versions” …
6 | 3 | Omeka Video File | … “previous versions” …
clearing the legacy
• Since element sets 4, 5, and 6 are legacy, I
believe the elements in them can be deleted
mysql> delete from omeka_elements where
element_set_id=4;
Query OK, 13 rows affected (0.00 sec)
mysql> delete from omeka_elements where
element_set_id=5;
Query OK, 8 rows affected (0.00 sec)
mysql> delete from omeka_elements where
element_set_id=6;
Query OK, 6 rows affected (0.00 sec)
clearing the legacy
• All these elements had record_type_id=3, now
gone
– mysql> select * from omeka_elements where
record_type_id=3;
– Empty set (0.00 sec)
• The record type can be deleted
– mysql> delete from omeka_record_types where
id=3;
– Query OK, 1 row affected (0.02 sec)
clearing the legacy
• Element sets 4, 5, and 6 can be deleted
mysql> delete from omeka_element_sets where
id=4;
Query OK, 1 row affected (0.01 sec)
mysql> delete from omeka_element_sets where
id=5;
Query OK, 1 row affected (0.01 sec)
mysql> delete from omeka_element_sets where
id=6;
Query OK, 1 row affected (0.00 sec)
• Things start to make a lot more sense.
item-type specific metadata
• You can create data elements (aka metadata
fields) for a specific item.
• You can not however, share these fields across
item.
• So if you want to express the “geekiness” of
your items, you have to add “geekiness” as an
element for each item separately.
omeka tags
• A tag is a way for omeka to group items
together.
• Each item can have multiple tags and each tag
can item can have multiple tags.
• For LIS650 veterans, it’s like grouping HTML
elements in the <body> into classes.
table: tags
• Each tag is recorded in this table. It has only
two columns
– “id”, an autoincrement identifier
– the “name” a string up to 256 characters long
• This table stores all the tags.
table: taggings
• This table has the following columns
– “id” an auto_increment
– “relation_id” gives the id of the item that has
been tagged.
– “tag_id” gives the number of the tag being given
– “entity_id” |who did it?, not further discussed
– “type”, a type of action taken, not further
discussed.
– “time” a timestamp when the action happened.
Dublin Core data
• Dublin Core is a metadata set that is used in
omeka.
• This is the common set for all types.
• We need to review the official meaning of
these elements here.
• I quote from Hillman’s Dublin core usage
guide.
http://dublincore.org/documents/usageguide
/elements.shtml
dublin core: title
• “The name given to the resource. Typically, a
Title will be a name by which the resource is
formally known.”
• “If in doubt about what constitutes the title,
repeat the Title element.”
dublin core: subject
• “The topic of the content of the resource.
Typically, a Subject will be expressed as
keywords or key phrases or classification
codes that describe the topic of the resource.
Recommended best practice is to select a
value from a controlled vocabulary or formal
classification scheme.”
dublin core: description
• “An account of the content of the resource.
Description may include but is not limited to:
an abstract, table of contents, reference to a
graphical representation of content or a freetext account of the content.”
• “Use full sentences.”
dublin core: type
• “The nature or genre of the content of the
resource. Type includes terms describing
general categories, functions, genres, or
aggregation levels for content. Recommended
best practice is to select a value from a
controlled vocabulary (for example, the
DCMIType vocabulary ). To describe the
physical or digital manifestation of the
resource, use the FORMAT element.”
dublin core: source
• “A Reference to a resource from which the
present resource is derived. The present
resource may be derived from the Source
resource in whole or part. Recommended best
practice is to reference the resource by means
of a string or number conforming to a formal
identification system”… “include in this area
information about a resource that is related
intellectually to the described resource but
does not fit easily into a Relation element.”
dublin core: relation
• “A reference to a related resource.
Recommended best practice is to reference
the resource by means of a string or number
conforming to a formal identification system.”
dublin core: coverage
• “The extent or scope of the content of the
resource. Coverage will typically include
spatial location (a place name or geographic
co-ordinates), temporal period (a period label,
date, or date range) or jurisdiction (such as a
named administrative entity). Recommended
best practice is to select a value from a
controlled vocabulary.
dublin core: creator
• “An entity primarily responsible for making
the content of the resource. Examples of a
Creator include a person, an organization, or a
service. Typically the name of the Creator
should be used to indicate the entity.”
• “Creators should be listed separately,
preferably in the same order that they appear
in the publication.”
dublin core: publisher
• “The entity responsible for making the
resource available. Examples of a Publisher
include a person, an organization, or a service.
Typically, the name of a Publisher should be
used to indicate the entity.”
• “The intent of specifying this field is to identify
the entity that provides access to the
resource. “
dublin core: contributor
• An entity responsible for making contributions
to the content of the resource. Examples of a
Contributor include a person, an organization
or a service. Typically, the name of Contributor
should be used”.
• “The same general guidelines for using names
of persons or organizations as Creators apply
here.”
dublin core: rights
• “Information about rights held in and over the
resource. Typically a Rights element will
contain a rights management statement for
the resource, or reference a service providing
such information.”
• “Rights information often encompasses
Intellectual Property Rights (IPR), Copyright,
and various Property Rights. If the rights
element is absent, no assumptions can be
made about the status of these and other
rights with respect to the resource.”
dublin core: date
• “A date associated with an event in the life
cycle of the resource. Typically, Date will be
associated with the creation or availability of
the resource. Recommended best practice for
encoding the date value is defined in a profile
of ISO 8601” “and follows the YYYY-MM-DD
format.”
dublin core: format
• “The physical or digital manifestation of the
resource. Typically, Format may include the
media-type or dimensions of the resource.
Examples of dimensions include size and
duration.”
• “Recommended best practice is to select a value
from a controlled vocabulary (for example, the
list of Internet Media Types
[http://www.iana.org/ assignments/mediatypes/]”
dublin core: identifier
• “An unambiguous reference to the resource
within a given context. Recommended best
practice is to identify the resource by means
of a string or number conforming to a formal
identification system. Examples of formal
identification systems include the Uniform
Resource Identifier (URI)” …
dublin core: language
• “A language of the intellectual content of the
resource. Recommended best practice for the
values of the Language element is defined by
RFC 3066 [RFC 3066, http://www.ietf.org/rfc/
rfc3066.txt] which, in conjunction with ISO
639 [ISO 639, http://www.oasisopen.org/cover/iso639a.html]), defines twoand three-letter primary language tags with
optional subtags.”
item type specific metadata
• There are a bunch of different types that are
built-in.
• Each type takes Dublin Core metadata as well
as some extra metadata
• These item-specific metadata fields can be
changed using the web interface.
omeka item types
• Document
A resource containing textual
data.
• Moving Image A series of visual
representations that, when shown in
succession, impart an impression of motion.
• Oral History A resource containing historical
information obtained in interviews with
persons having firsthand knowledge.
omeka item types
• Sound A resource whose content is
primarily intended to be rendered as audio.
• Still Image
A static visual representation.
Examples of still images are: paintings,
drawings, graphic designs, plans and maps.
• Website A resource comprising of a web page
or web pages and all related assets ( such as
images, sound and video files, etc. ).
omeka item types
• Event A non-persistent, time-based
occurrence. Metadata for an event provides
descriptive information that is the basis for
discovery of the purpose, location, duration,
and responsible agents associated with an
event. Examples include an exhibition,
webcast, conference, workshop, open day,
performance, battle, trial, wedding, tea party,
conflagration.
omeka item types
• Email A resource containing textual
messages and binary attachments sent
electronically from one person to another or
one person to many people.
• Lesson Plan Instructional materials.
• Hyperlink
Title, URL, Description or
annotation.
omeka item types
• Person An individual, biographical data, birth
and death, etc.
• Interactive Resource
A resource requiring
interaction from the user to be understood,
executed, or experienced. Examples include
forms on Web pages, applets, multimedia
learning objects, chat services, or virtual
reality environments
omeka items and files
• Omeka has items.
• Omeka also has files. They attach to items.
• Omeka sometimes needs to collectively refer
to items and types.
– In the database tables, the aggregate is called
“records”. This is confusing.
– I will call them “itofis” here.
• This allows me to discuss the tables.
http://openlib.org/home/krichel
Please shutdown the computers when
you are done.
Thank you for your attention!
Download