Archiving - Hans Rausing Endangered Languages Project

Archiving
David Nathan
Endangered Languages Archive (ELAR)
one of 3 semi-autonomous programs of the
Hans Rausing Endangered Languages Project
staff of 3: archivist, software developer,
technician (plus research assistants etc.)
develop preservation infrastructure,
cataloguing and dissemination; policies;
facilities; training and advice; materials
development and publishing
What is a digital language archive?
A trusted repository created and maintained by
an institution with a commitment to the long-term preservation of archived material
Will have policies and processes for materials
acquisition, cataloguing, preservation,
dissemination, migration to new digital formats
A collection of managed materials
What is archiving of language materials?
Preparing materials in a structured form suitable
for long-term preservation
Creating long-term relationships
It is not backup
It is not dissemination/publication
It should not impinge on good linguistic practice
Kinds of language archives
Many cross-cutting classifications:
Indigenous vs outsider, eg. Squamish Nation
Regional vs international, eg. AILLA, Paradisec;
DoBeS, ELAR
Associated with research institute, eg. AIATSIS, ANLC
Grant-funded, eg. DoBeS, ELAR, OTA
Digital vs physical vs mixed, eg. DoBeS vs Vienna
Sound Archive, ANLC
Potential users
Speakers and their descendants - up to 95% of
users of UCB are community members
Depositors - to create or renew materials
Other researchers - comparative/historical
linguists, typologists, theoreticians,
anthropologists, historians, musicologists etc etc
Other “stakeholders”, eg educationalists
Journalists and the wider public
Archives networks
Digital Endangered Languages and Archives
Network (DELAMAN)
ELAR, DoBeS, ANLC, Paradisec, EMELD,
LACITO, AIATSIS, AMPM (Maori)
Open Language Archives Community (OLAC)
Others, eg. D-LIB
http://www.dlib.org/
Open Archives Initiative
Archive architectures
[Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
The archive needs to define three types of ‘packages’:
ingestion, archive and dissemination.
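In the OAIS reference model these are the Submission, Archival and Dissemination Information Packages. A minimal sketch, assuming a package is simply a set of named files plus a metadata dictionary; the class and function names are hypothetical and do not describe ELAR's own software.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestionPackage:
    """What a depositor delivers: files plus the depositor's own metadata."""
    files: dict[str, bytes]
    metadata: dict[str, str]


@dataclass
class ArchivePackage:
    """What the archive keeps: files, metadata and fixity (checksum) information."""
    accession_number: str
    files: dict[str, bytes]
    metadata: dict[str, str]
    checksums: dict[str, str] = field(default_factory=dict)


@dataclass
class DisseminationPackage:
    """What a user receives: only the files the access protocol permits."""
    files: dict[str, bytes]
    metadata: dict[str, str]


def ingest(pkg: IngestionPackage, accession_number: str) -> ArchivePackage:
    # Record a SHA-256 digest per file so that later copies can be verified.
    checksums = {name: hashlib.sha256(data).hexdigest() for name, data in pkg.files.items()}
    return ArchivePackage(accession_number, dict(pkg.files), dict(pkg.metadata), checksums)


def disseminate(aip: ArchivePackage, permitted: set[str]) -> DisseminationPackage:
    # Hand out only permitted files; the checksums stay inside the archive.
    files = {name: data for name, data in aip.files.items() if name in permitted}
    return DisseminationPackage(files, dict(aip.metadata))
```

Keeping the three types separate lets the archive change how it stores material (the archive package) without changing what depositors deliver or what users receive.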
The way we were ...
ASEDA Aboriginal Studies Electronic Data
Archive at AIATSIS Canberra (modeled on
Oxford Text Archive)
opportunistically collect and catalogue
electronic materials that were at risk or not
accessible
lexica
grammars
texts
etc
How things have changed ...
types of data (modalities and some genres)
means of storage
standardisation and metadata
dissemination
(most explosive) expansion into the practice and
workflow of linguists
What can a language archive offer?
• Security - keep your electronic materials safe
• Preservation - store your materials for the long term
• Discovery - help others to find out about your materials
• Protocols - respect and implement sensitivities and restrictions
• Sharing - share results of your work, if appropriate
• Acknowledgement - create citable acknowledgement
• Mobilisation - create usable language materials for communities
• Quality and standards - advice on ensuring your materials are of the highest quality and meet robust standards
Preservation issues
making materials robust
making storage robust
organisational, ownership and policy issues
changing technologies
refreshing
migrating
Changing technologies
data
data model
• advantages of digital preservation
• based around copying
• also transmission, dissemination
software
• implications
robust formats (standard, open, explicit)
formats with long horizons
formats easy to refresh
formats that don’t require particular software (but
distinguish where software is intrinsic)
may have to describe software or even archive the
software
Two preservation models
• “preserving the bytestream”
• LOCKSS: “lots of copies keep stuff safe”
http://lockss.stanford.edu/
guess which community it came from!
• (plus ...) distributed archiving
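The core LOCKSS idea is that independent copies are compared, so a damaged copy is outvoted and can be repaired from the others. A much simplified sketch, assuming every copy can be read locally: hash each copy and treat the digest held by a strict majority as authoritative. The real LOCKSS system runs a peer-to-peer polling protocol between separate sites; `audit_copies` is a hypothetical name.

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the names of copies that disagree with it.

    `copies` maps a location (server, disk, ...) to that copy's bytes.
    """
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # No strict majority: the archive cannot tell which copy is sound.
        raise ValueError("no majority among copies; manual inspection needed")
    damaged = [site for site, digest in digests.items() if digest != majority]
    return majority, damaged
```

With three copies, a single corrupted one is detected and can be re-copied from either sound copy; with only two copies that disagree, the check can flag the problem but not resolve it, which is one argument for keeping more than two.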
Some backup issues
risk management
undetected problems
useless backups
under some circumstances, ELAR may provide
backup
Documenter & archive interactions
Grant formulation and application
Communications, questions, advice
Training
Archiving
What can you archive?
Media - sound, video
Graphics - images, scans
Text - fieldnotes, grammars, description,
analysis
Structured data - aligned and annotated
transcriptions, databases, lexica
Metadata - structured, standardised contextual
information about the materials
Data portability
Bird and Simons 2003:
(for language documentation) our data needs
to have integrity, flexibility, longevity and broad
utility
Data portability
complete
explicit
documented
preservable
transferable
accessible
adaptable
not technology-specific
(also appropriate, accurate, useful etc!!)
Formats
sound - WAV
image - BMP, TIFF, JPEG. See full advice about images
video - MPEG2
text - plain text, with or without markup
documents - plain text, PDF or postscript
structured text - XML, other markup (with description of markup
system)
• structured data in commonly available Office formats - ELAR will
convert them to archive-suitable formats
• character encoding:
  • preferred encoding is ASCII or Unicode
  • clearly document any other encodings used, e.g. ISO 8859-5
  • discuss with us if you use font substitution to handle non-Roman
characters
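One way to tell which of these cases a text file falls into, as a sketch: try ASCII, then UTF-8, and otherwise report that the encoding must be documented. The conversion helper assumes the depositor has documented the encoding (ISO 8859-5 here, whose Python codec name is `iso8859_5`); the function names are hypothetical. A file that decodes as UTF-8 is very probably, but not certainly, UTF-8, and a file relying on font substitution usually passes as plain ASCII, so this check does not replace documentation.

```python
def classify_encoding(data: bytes) -> str:
    """Report 'ascii', 'utf-8' or 'undocumented' for a file's raw bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Latin-1, ISO 8859-5, a legacy font encoding, ...:
        # the bytes alone cannot say which, so the depositor must document it.
        return "undocumented"


def to_utf8(data: bytes, documented_encoding: str) -> bytes:
    """Re-encode text whose legacy encoding has been documented as UTF-8."""
    return data.decode(documented_encoding).encode("utf-8")
```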
Basic management points
• filenames - extension, ASCII, no spaces, check capitalisation
• use directory structures wisely
• versions
• distinguish formats - working, presentation, and archiving
• handling of characters, fonts, character sets
• file encoding and format (many file types are ASCII)
• metadata!
Data format duty cycle examples
Data type | Raw | Working | Interchange | Archive | Dissemination
Video | DVI | software-specific | MPEG-2 | MPEG-2 | MPEG-2, AVI, QT
Fieldnotes | Shoebox | Shoebox | FOSF | XML | WWW, print dictionary
Audio | ATRAC | WAV | WAV | BWF | MP3
Complex data | multiple | FM Pro database | RTF, XML | XML | Interactive application
Multimodal | multiple | multiple | as above | as above | Multimedia application, page
Archive objects
• informed by traditions, e.g. document archives
• sometimes simply called a “resource”
• it could be a file, a set of files, a directory, a “session” or a coherent item with many parts
• should have archival qualities, e.g. Bird & Simons “7 Dimensions” (or see Thieberger in LDD2)
• may impose standard structures or formats
• need deposit event and processes
  • legal and protocol
  • verification
  • accession
  • ongoing processes
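Verification at accession usually starts from an inventory of what actually arrived. A sketch, assuming the deposit is a local directory: it lists each file's relative path, size, guessed MIME type and SHA-256 digest. `build_manifest` is a hypothetical name, and `mimetypes` only guesses from the extension, so it checks naming rather than content.

```python
import hashlib
import mimetypes
from pathlib import Path


def build_manifest(deposit_dir: str) -> list[dict[str, object]]:
    """Inventory every file under deposit_dir for verification at accession."""
    root = Path(deposit_dir)
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large audio/video files need little memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        mime, _ = mimetypes.guess_type(path.name)
        manifest.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "mime": mime or "unknown",
            "sha256": digest.hexdigest(),
        })
    return manifest
```

Comparing a manifest made by the depositor with one made on receipt shows whether anything was lost or altered in transit; the same digests support later refreshing and migration.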
Selection
Example (video): how much storage volume should be allocated?
Answer: ...
However:
it is unlikely that a linguist is in a position to plan and
consistently create excellent video, so selection is
unavoidable
data has always been selected!
(... selection)
you also:
selected
labeled
transformed/processed/edited
added, corrected, expanded
made links
made or assumed relationships between “whole”
and processed units; invented labels, IDs, scope etc
imposed formats
Examples
Characters
Did my characters come
through?
Answer: ...
há pa ki hená mázaska
wikcémna nú pa iyóphewa-ye ks t DBW
wóz?az?a-s?ni yeló DB OK
wash things-NEG ASS.M
perhaps ELAR should do it? 'he didn't do the wash'
However:
wóz az a-s ni yeló DB OK
wash things-NEG ASS.M
'he didn't do the wash'
Preservation
Is my file preservable?
Note:
characters?
inconsistent segmentation
data as comments
conventions/metadata
Text transcription: “Korimáka”
Language: Choguita Rarámuri
Language used for transcription: Spanish
Consultant: Luz Elena León Ramírez
Linguist: abriela Cabaero
Transcription: erth Fuen & Gabrela Cabaero
Date recorded: 11/02/2006
Date tranbscribed: 11/02/2006
Recording: rec6-LEL.wav
Knowledge representation 1 - before
wama momol chi naron mon chayako (LB) / wama momol chi naron chayako (MD)
wama momol chi nan mon chayako (more emphatic(LB) / wama momol chi nan chayako (MD)
Why don't you and him do it?
+ Notes have both of these sentences without the negator mon.
OK runon naynangkroy ile ri
He ate their sago.
* kipin kannangkroy ngolu
intended: We ate their cassowary.
OK kipin kanangkroy ngolu
We ate their cassowary.
Knowledge representation 1 - after
* kipin kannangkroy ngolu
intended: We ate their cassowary.
OK kipin kanangkroy ngolu
We ate their cassowary.
<sentence.set num="75">
<version>
<walman>Kipin kannangkroy ngolu</walman>
<judgement>*</judgement>
</version>
<english>We ate their cassowary. </english>
</sentence.set>
<sentence.set num="76">
<version>
<walman>Kipin kanangkroy ngolu</walman>
<judgement>OK</judgement>
</version>
<english>We ate their cassowary.</english>
</sentence.set>
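The step from “before” to “after” can be automated once the notes follow a consistent convention. A sketch, assuming each example is a pair of lines: a judgement mark (`OK` or `*`) followed by the Walman sentence, then an English line optionally prefixed `intended:`. Element names follow the XML above; `notes_to_xml` and its `first_num` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def notes_to_xml(lines: list[str], first_num: int = 1) -> list[ET.Element]:
    """Turn judgement/sentence + translation line pairs into <sentence.set> elements."""
    sets = []
    for i in range(0, len(lines) - 1, 2):
        judgement, _, walman = lines[i].strip().partition(" ")
        if judgement not in ("OK", "*"):
            raise ValueError(f"unexpected judgement mark: {judgement!r}")
        english = lines[i + 1].strip().removeprefix("intended:").strip()
        s = ET.Element("sentence.set", num=str(first_num + i // 2))
        version = ET.SubElement(s, "version")
        # Capitalise only the first letter, leaving the rest of the sentence intact.
        ET.SubElement(version, "walman").text = walman[:1].upper() + walman[1:]
        ET.SubElement(version, "judgement").text = judgement
        ET.SubElement(s, "english").text = english
        sets.append(s)
    return sets
```

The explicit `judgement` element is what makes the ungrammatical example safe to reuse: the `*` no longer depends on a layout convention that a later reader or program might miss.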
Knowledge representation 2
<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<PRODUCT BUILD="06/26/2002" NAME="FileMaker Pro" VERSION="6.0v2"/>
<DATABASE DATEFORMAT="M/d/yyyy" LAYOUT="" NAME="Videos"
RECORDS="13" TIMEFORMAT="h:mm:ss a"/>
<METADATA>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Index name" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Image desc" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Date" TYPE="TEXT"/>
<FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Content" TYPE="TEXT"/>
</METADATA>
<RESULTSET FOUND="13">
<ROW MODID="16" RECORDID="40">
<COL><DATA>Morly Beeta</DATA></COL>
<COL><DATA>Interview with Morly Beeta</DATA></COL>
<COL><DATA>Jan/13/05</DATA></COL>
<COL><DATA>Obu history by Morly Beeta</DATA></COL>
</ROW>
<!-- remaining 12 ROW elements omitted -->
</RESULTSET>
</FMPXMLRESULT>
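An FMPXMLRESULT export keeps field names in METADATA and values positionally in each ROW, so a converter has to pair the two. A sketch using Python's standard XML parser, assuming a well-formed export in the namespace shown above; `fmp_rows` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"fm": "http://www.filemaker.com/fmpxmlresult"}


def fmp_rows(xml_text: str) -> list[dict[str, str]]:
    """Convert an FMPXMLRESULT export into one dict per database record."""
    root = ET.fromstring(xml_text)
    names = [f.get("NAME") for f in root.findall("fm:METADATA/fm:FIELD", NS)]
    records = []
    for row in root.findall("fm:RESULTSET/fm:ROW", NS):
        values = []
        for col in row.findall("fm:COL", NS):
            # A repeating field can hold several DATA elements; join them.
            values.append("; ".join(d.text or "" for d in col.findall("fm:DATA", NS)))
        # Column order matches METADATA order, so zip pairs each value with its field.
        records.append(dict(zip(names, values)))
    return records
```

Once records are plain name/value pairs, they can be written out in whatever archival or presentation format is chosen, independent of FileMaker.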
ELAR conversion - original
Language: Unangam Tunuu [Aleut Language]
Dialects: Qawalangin [Eastern Aleut]; Nii}u}i{ [Western Aleut]
Speakers: Maria Turnpaugh, Nick Lekanoff, Clara Golodoff
Place recorded: Unalaska, AK. Ray Hudson Room, Unalaska Public Library.
Date recorded: 7.21.04
Recording name: UNAK2trk1
Duration: 16:21 min.
Recorded by: Alice Taff
Recording equipment: Marantz CDR 300 recorder with one flat filtered table-mounted cardiod microphone. Also audio/video miniDV - Canon GL2.
Translated by: Alice Taff with Maria Turnpaugh 000-493sec. Millie Prokopeuff 455-499sec.
Transcribed by: Alice Taff
Reviewed and corrected by: Moses Dirks

Time | Speaker | Transcription | Translation
129 | ET | Kamagala, afternoon | afternoon
135 | CG | Aang | yes
136 | ET | Sla{chxisaada{, ii? Nice weather. | nice weather
140 | CG | Yeah. Maku{ | that's all right
143 | ET | Alqutaadaltxichin? How are you? | How are you all?
ELAR conversion - XHTML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>ANC14trk1</title>
<link href="taff.css" type="text/css" rel="stylesheet"></link></head><body>
<table class="metadata">
<tr><td>Language</td><td class="language">Unangax̌ (Aleut)</td></tr>
<tr><td>Dialect</td><td class="dialect">Niiĝuĝix̌ (Western Aleut)</td></tr>
<tr><td>Speakers</td><td class="speaker">Alice Petrivelli, Vera Snigaroff, Mary Snigaroff, Vivian Koenig</td></tr>
<tr><td>Place recorded</td><td class="place">Anchorage, Alaska </td></tr>
<tr><td>Date recorded</td><td class="date">Mar. 15, 2005</td></tr>
<tr><td>Recording name</td><td class="rec_name">ANC14trk1</td></tr>
<tr><td>Recorded by</td><td class="rec_by">Alice Taff, Piama Oleyer</td></tr>
<tr><td>Recording equipment</td><td class="rec_equip">Marantz CDR300 CD recorder with one flat-filtered, table-mounted cardioid microphone. </td></tr>
<tr><td>Translated/Transcribed by</td><td>Simeon L. Snigaroff, December 2005</td></tr>
</table>
<table class="transcript">
<tr><td class="time">1</td><td class="speaker">ap</td><td class="transcription">Uqlaĝiix̌, x̌aayax̌, uqlaĝil agach aliguutax̌ ax̌.</td></tr>
<tr><td> </td><td> </td><td class="translation">To take a bath, Steam bath, to take a bath is the one that is Aleut</td></tr>
<tr><td> </td><td> </td><td> </td></tr>
<tr><td class="time">5</td><td class="speaker">vs</td><td class="transcription">uhmm</td></tr>
<!-- remaining transcript rows omitted -->
</table>
</body>
</html>
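A conversion like this can be generated rather than written by hand. A sketch, assuming the metadata arrives as label/value pairs and the transcript as (time, speaker, transcription, translation) tuples; it reuses the transcript class names from the example so a stylesheet can still target them, and omits the XML declaration, DOCTYPE and stylesheet link for brevity. `to_xhtml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def to_xhtml(title: str, metadata: list[tuple[str, str]],
             transcript: list[tuple[str, str, str, str]]) -> str:
    """Render recording metadata and a time-aligned transcript as XHTML."""
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")

    meta = ET.SubElement(body, "table", {"class": "metadata"})
    for label, value in metadata:
        row = ET.SubElement(meta, "tr")
        ET.SubElement(row, "td").text = label
        ET.SubElement(row, "td").text = value

    trans = ET.SubElement(body, "table", {"class": "transcript"})
    for time, speaker, text, translation in transcript:
        row = ET.SubElement(trans, "tr")
        for cls, content in (("time", time), ("speaker", speaker), ("transcription", text)):
            ET.SubElement(row, "td", {"class": cls}).text = content
        # The translation sits on its own row, under the transcription column.
        row = ET.SubElement(trans, "tr")
        ET.SubElement(row, "td")
        ET.SubElement(row, "td")
        ET.SubElement(row, "td", {"class": "translation"}).text = translation
    # Write empty cells as <td></td> rather than <td />, which HTML parsers mishandle.
    return ET.tostring(html, encoding="unicode", short_empty_elements=False)
```

Because the output is generated from structured input, the same data can later be re-rendered in a different presentation format without retyping the transcript.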
ELAR conversion - in browser
Language: Unangax̌ (Aleut)
Dialect: Niiĝuĝix̌ (Western Aleut)
Speakers: Alice Petrivelli, Vera Snigaroff, Mary Snigaroff, Vivian Koenig
Place recorded: Anchorage, Alaska
Date recorded: Mar. 15, 2005
Recording name: ANC14trk1
Recorded by: Alice Taff, Piama Oleyer
Recording equipment: Marantz CDR300 CD recorder with one flat-filtered, table-mounted cardioid microphone.
Translated/Transcribed by: Simeon L. Snigaroff, December 2005

1  ap  Uqlaĝiix̌, x̌aayax̌, uqlaĝil agach aliguutax̌ ax̌.
        To take a bath, Steam bath, to take a bath is the one that is Aleut
Deposit form
is on the web
Protocol
Sensitivities, restrictions: identification,
description and implementation
ELAR Deposit Form “Section C”
ELAR pays careful attention to any sensitivities
or restrictions that apply to any part of your
deposit. There are four ways that Access
Protocol is implemented:
You define permissions for the whole deposit or for
individual files (or parts of files)
We provide defaults to protect your data if you do
not define permissions
You/we keep permissions up to date
You list other rights holders
ELAR Deposit Form “Section C”
P1. Anyone
Any person may view/listen to or receive a digital copy of any part of the deposit
P2. Certain people or groups
Choose any combination of P2A, P2B, and P2C:
P2A. Research community members
What level of access (choose one only)?
P2A1. They can receive a digital copy of requested material
P2A2. They can view/listen but cannot receive a digital copy
P2B. Language community members
See below regarding identifying members
What level of access (choose one only)?
P2B1. They can receive a digital copy of requested material
P2B2. They can view/listen but cannot receive a digital copy
P2C. Particular named people or bodies
See below regarding identifying people/bodies
P3. Depositor is asked permission for each request
You will be contacted and asked for permission on each request.
How do you want to be contacted?
P3A. Requester is given address to contact you directly
P3B. ELAR will relay requests to you
P4. Only the depositor has access
Persons other than the depositor will not be able to request access.
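The P-codes amount to a small decision procedure. A sketch of how an archive system might evaluate a request, assuming the requester's group memberships have already been established (under options M1 to M3); the enum, class and function names are hypothetical, not ELAR's implementation. The form does not say what happens when someone belongs to several P2 groups with different levels; granting the most generous one is an assumption made here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Access(Enum):
    COPY = "may receive a digital copy"
    VIEW = "may view/listen only"
    ASK_DEPOSITOR = "depositor decides this request"
    NONE = "no access"


@dataclass
class Protocol:
    p_code: str                               # "P1", "P2", "P3" or "P4"
    research: Optional[Access] = None         # P2A: COPY (P2A1) or VIEW (P2A2)
    community: Optional[Access] = None        # P2B: COPY (P2B1) or VIEW (P2B2)
    named: dict[str, Access] = field(default_factory=dict)  # P2C: name -> level


@dataclass
class Requester:
    name: str
    is_depositor: bool = False
    is_researcher: bool = False        # membership already established
    is_community_member: bool = False  # e.g. under option M1, M2 or M3


def decide(protocol: Protocol, who: Requester) -> Access:
    """Return the access level a requester gets under a deposit's P-code."""
    if protocol.p_code not in ("P1", "P2", "P3", "P4"):
        raise ValueError(f"unknown P-code: {protocol.p_code}")
    if who.is_depositor or protocol.p_code == "P1":
        return Access.COPY
    if protocol.p_code == "P3":
        return Access.ASK_DEPOSITOR
    if protocol.p_code == "P4":
        return Access.NONE
    # P2: collect the level from every group the requester belongs to and
    # (an assumption) grant the most generous one.
    granted = []
    if who.is_researcher and protocol.research:
        granted.append(protocol.research)
    if who.is_community_member and protocol.community:
        granted.append(protocol.community)
    if who.name in protocol.named:
        granted.append(protocol.named[who.name])
    if Access.COPY in granted:
        return Access.COPY
    return Access.VIEW if granted else Access.NONE
```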
ELAR Deposit Form “Section C”
Identifying people/bodies
If you chose P2B or P2C, tell us how ELAR should determine who is a member of
a group (e.g. language community, educational body). Choose one of the following:
M1. You tell ELAR how to determine membership (tell us in Part D)
M2. ELAR will ask you on each occasion
M3. ELAR will make a judgement about membership
If you chose P2C, then list the names of the people or bodies in Part D.
Contacting you
If you choose P3A or P3B, you will be able to decide about each particular request.
If the choice is P3A, we will send your address to the requester, who can then ask
you directly for permission. You then send us your decision. If the choice is P3B,
ELAR will act as an intermediary, and pass on the request to you, so that your
privacy is maintained. However, if you chose one of P3A or P3B and you (or your
delegate) are not contactable, ELAR will need to make the decision or change the
access permissions.
Similarly, if we need to contact you to ask about group membership, and you (or
your delegate) are not contactable, we will need to make the decision or change
the access permissions.
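The contact rules describe a routing decision with a fallback. A sketch with hypothetical names: `reachable_contact` stands for the address of whichever of the depositor or delegate can be reached, or `None` if neither can.

```python
from typing import Optional


def route_request(p3_option: str, reachable_contact: Optional[str]) -> str:
    """Decide who handles a request under option P3A or P3B."""
    if p3_option not in ("P3A", "P3B"):
        raise ValueError(f"not a P3 option: {p3_option}")
    if reachable_contact is None:
        # Neither depositor nor delegate can be contacted: ELAR must act.
        return "ELAR decides or changes the access permissions"
    if p3_option == "P3A":
        # The requester gets the address and asks directly; the decision is reported to ELAR.
        return f"requester contacts {reachable_contact} directly"
    # P3B: ELAR relays the request, so the depositor's address stays private.
    return f"ELAR relays the request to {reachable_contact}"
```

The fallback branch is checked first because it applies to both options; this is why naming a delegate matters for deposits that use P3.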
Other aspects
defaults
sunset clause
we provide means to change/manage protocol
file or object-level protocol
delegate
other rights holders
effort to identify depositor for long-term
depositor-oriented
ELAR’s holdings
• ELAR currently holds 36 deposits with a total volume of approximately 0.9 TB.
• The average deposit is about 25 GB; however, sizes vary widely, with a few much larger deposits, and the median size is around 10 GB.
• We expect holdings to nearly double over the next year
• See next slides for distribution of data types
ELAR holdings by data type
• This table analyses some data types of interest for a representative sample (70%) of holdings
• Data type by volume and number of files, sorted by volume
Data type | Volume (MB) | Files
audio | 360,411 | 6,312
video | 208,995 | 895
image | 28,592 | 2,221
msword | 223 | 404
pdf | 196 | 134
eaf | 33 | 176
text | 32 | 781
lex | 9 | 29
trs | 5 | 246
xls | 1 | 19
imdi | 1 | 26
ELAR holdings by data type
• This table analyses some data types of interest for a representative sample (70%) of holdings
• Data type by number of files and volume, sorted by number of files
Data type | Files | Volume (MB)
audio | 6,312 | 360,411
image | 2,221 | 28,592
video | 895 | 208,995
text | 781 | 32
msword | 404 | 223
trs | 246 | 5
eaf | 176 | 33
pdf | 134 | 196
lex | 29 | 9
imdi | 26 | 1
xls | 19 | 1
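Both tables are the same sample sorted two ways. A sketch that holds the figures once and derives each ordering plus each type's share of total volume; the numbers are copied from the tables, and the function names are hypothetical.

```python
# (data type, volume in MB, number of files) for the sampled ELAR holdings
HOLDINGS = [
    ("audio", 360_411, 6_312), ("video", 208_995, 895), ("image", 28_592, 2_221),
    ("msword", 223, 404), ("pdf", 196, 134), ("eaf", 33, 176), ("text", 32, 781),
    ("lex", 9, 29), ("trs", 5, 246), ("xls", 1, 19), ("imdi", 1, 26),
]


def by_volume(rows):
    # sorted() is stable, so ties (xls, imdi) keep their original order.
    return sorted(rows, key=lambda r: r[1], reverse=True)


def by_files(rows):
    return sorted(rows, key=lambda r: r[2], reverse=True)


def volume_share(rows, data_type):
    total = sum(r[1] for r in rows)
    return next(r[1] for r in rows if r[0] == data_type) / total


total_mb = sum(r[1] for r in HOLDINGS)  # 598,498 MB, roughly 0.6 TB
```

By volume, audio is about 60% of the sample and audio plus video about 95%, while the text-based formats together come to about 500 MB. The sample total of roughly 0.6 TB is about two thirds of the 0.9 TB held, broadly consistent with the stated 70%.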
Metadata
ELAR metadata set =
Selection from IMDI*, OLAC*, EAD, TEI
ELAR-specific (e.g. protocol, geographical)
Depositor metadata
* i.e. a set of metadata elements that maps onto both IMDI and OLAC
[Diagram: Archive: ELAR metadata set | Deposit: your metadata, all other files]
Types of Metadata
Depositor's / delegates' details
Descriptive metadata
Administrative metadata
Preservation metadata
Access protocols
Metadata for individual files
Depositors and delegates
Name
Address
Contact details (telephone, fax, email, URL)
Role
Affiliation
Date of birth, Nationality
Descriptive metadata
Title, Description, Subject, Summary
Keywords
Subject Language, Community
Location
Timespan
Helps in cataloguing
Administrative metadata
Project details
funding and hosting institutions
Details of external copies
Details of accession agreement
cf. Deposit form
Preservation metadata
Carrier media
Provenance (Source)
Access
access protocols (see elsewhere)
group membership identification
File-level metadata
Media files
duration, file size
MIME type, content type
Text files
font, character set, encoding
format, markup
Metadata files
schema
scope
validity
ELAR Metadata Set
[Diagram: input (Deposit Form; metadata edited via ELAR website) → full ELAR metadata set → output (export to IMDI, OLAC and TEI)]
(Descriptive, Structural, Technical, Administrative, Preservation)
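Exporting means mapping ELAR elements onto another scheme's elements. A sketch of a flat Dublin Core export (the element set that OLAC records build on), assuming a record maps each ELAR element name to a list of values, so comma-separated keywords have already been split. The crosswalk below is illustrative, an assumption rather than ELAR's actual mapping, and a real OLAC record adds OLAC-specific attributes such as language codes.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical crosswalk: ELAR element name -> Dublin Core element name.
ELAR_TO_DC = {
    "Title": "title",
    "Creator": "creator",
    "Summary": "description",
    "Keywords": "subject",
    "Language": "language",
    "Date": "date",
    "Publisher": "publisher",
}


def export_dc(record: dict[str, list[str]]) -> ET.Element:
    """Build a flat Dublin Core record from ELAR elements; unmapped ones are skipped."""
    ET.register_namespace("dc", DC)
    root = ET.Element("metadata")
    for elar_name, values in record.items():
        dc_name = ELAR_TO_DC.get(elar_name)
        if dc_name is None:
            continue  # e.g. protocol and administrative fields stay in the archive
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = value
    return root
```

Because the full ELAR set is richer than any single export target, each export necessarily drops some information; keeping the full set internally means no export has to be lossless.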
Type | ELAR set | Field | ELAR Comments
A | Accession_acquisition | | = EAD <acqinfo>
A | Accession_agreement | depositor_has_signed, depositor_sign_date, ELAR_has_signed, ELAR_sign_date, hard_copy_location |
A | Accession_appraisal | | = EAD <appraisal>
A | Accession_date | | Fixed at accession
A | Accession_number | |
A | Accession_request * | requestnum, request, action_note | Requests to ELAR, e.g. anonymisation, conversions
A | Accession_status | |
P | Carrier * | medium, labeling_system, info | The transmission medium (carrier), e.g. CD, DV tape, La Cie 256MB external hard disk
D | Community * | | Culture or community group(s) represented
D | Metacontent_file * | type; scope (list or description of files covered by this metadata file); format (format of data organisation); schema (a formalisation of a format: by URL, by name of schema, or indicate schema file within deposit); validated; info | For depositors' metadata of various kinds: contextual (e.g. language info resource, bio of speaker, methodology, local history etc); metadata (e.g. depositor's IMDI metadata or file inventory); related resources (imdi 75)
D | Creator * | name, notes | Person primarily associated with production of resource
D | Date * | event, date_or_start_date, end_date | Dates in the lifecycle of the resource, e.g. start and end dates of data collection
A | Delegate | address_line_1, address_line_2, address_line_3, address_state_county, address_country, address_postcode, email, fax, family_name, given_name, title, nationality, telephone, url | Depositor's delegate; has ability to administer deposit
A | Depositor | address_line_1, address_line_2, address_line_3, address_state_county, address_country, address_postcode, affiliation, dob, email, fax, family_name, given_name, title, nationality, role (e.g. collector, fieldworker, donor), telephone, url | Person who has rights in materials, provides them to ELAR, and makes agreement via deposit form
D | Description | | Description of the resource; see Summary, which has higher priority
D | Description_language * | language_name, language_code | ISO 639-2b; = EAD <langusage>
P | Dissemination_format | | Information about presentation formats, status of presentation objects etc
D | Features | | Any unique or outstanding features of the deposit
A | Handle | | ELAR use
P | ID | | Accession number
D | Keywords | | From 2 to 6 keywords related to the content; separate by commas
D | Language * | name, code, alt_names, info | ISO 639-2b, A15; see alt_names; = EAD <langmaterial>
D | Linguistic_genre * | type | Covering type, conventions, key, links, as OLAC type 121122; imdi62; orthographic, phonetic, morphologic, syntactic, translation, …
D | Location | address (address or more specific place than village, e.g. “Lydia’s house”, “Primary school”); country (preferably name in English, standard spelling; otherwise recommended to add country_code); country_code (ISO country code to allow variant naming of country); latitude (in decimal degree format); longitude (in decimal degree format); region; town; village |
T | Mediafile | format; size_time_duration (allow free text, e.g. about 1hr 15 mins); size_data_volume; type |
M | Other_info * | info, importance, domain | ELAR use only, see 205+G95
D | Participant * | role; lifecycle (type A: lifecycle attribute of participant; needed for major participants: whether alive or dead, who are descendants etc); alt_name * (use name, then alt_name, e.g. as referred to in the content; also abbreviation, e.g. as used in transcription or annotation); anonymise (need action notes and record of dates anonymised in order to account for anonymisation obligations); name (as person name abbrev or similar); type (also covers imdi participant.role; see also Metacontent_file) |
A | Project | contact; description (only if short: not meant to include narrative descriptions of over 30 words); funder (Project_sponsors {sponsor, type [host, funder …]}); host; code; title |
A | Protocol_acknowledgement | yes/no, ack_text | See deposit form
A | Protocol_maintenance | action, date | ELAR use only. Need to define actions and vocabulary
A | Protocol_M_code | M_code, instructions | Describes how ELAR determines a person’s group membership or access status; see deposit form
A | Protocol_other_rights_holders | name, role, address, resource_ID (identify particular file(s) affected by these rights) | See deposit form
A | Protocol_P2_names | name, type [individual, organisation], access_type | Where P2C was selected, to list individuals’/organisations’ names and access types
A | Protocol_Pcode | P_code, P2_codes | Identifies people and access conditions. Only one P_code; if P_code = P2, then P2_codes can be any combination of P2A, P2B and P2C; ..... Rights * {right_type, right_holder, contact_info}; Access * {ELAR_code, date_from, date_to, revise_date, revise_notes, other}
A | Protocol_Usage_restrictions | | Default is private study only; = EAD <userestrict>
A | Publisher * | name, address | Where item is also disseminated, e.g. other archive
T | Quality | | Distinguish archive's entry from depositor's. Add here recording conditions and equipment. For quality, do not use subjective terms like good or poor; describe phenomena, e.g. medium tape hiss, slight clipping, road traffic in background between 5 and 7 mins etc
S | Relation * | type, this_role, this_id, that_role, that_id, info | = EAD <revisiondesc>. Types: update, correction, contains, part, replaces, extends/enriches, transcription, translation, annotation, other_alignment, other_description, contains_examples, alternative, authorship, protocol_info
P | Relation_external * | future, repository_name, repository_id, date, info, yes/no, existing_id, info | Materials are deposited with another archive or institution; ~ EAD <repository>
A | Revision | yes/no, info | Does depositor intend to update or revise the deposit?
P | Source | Processed (detail the data processing in deriving the deposit/file from the source, e.g. digitisation parameters); Provenance (use where depositor was not the main creator, or where deposit/file is drawn from existing media or data sources: where materials were located, how depositor came into possession, collector, protocol history, access and protocol for source etc; reference to a specific tape with a unique label; element characterising the media format such as DAT, DV, VHS, Hi-8, …; see also Accession_acquisition) |
A | Status | | Deposit complete or not; ELAR only complete or not; with narrative notes
D | Subject * | | A short description
D | Summary | | A description/summary/abstract (may also describe genres, media, formats); = EAD <abstract>
T | Textfile | Characterset (e.g. Cyrillic, Chinese-traditional); Encoding (e.g. Unicode, Latin-1 Extended, Big-5, ASCII); Font (e.g. IPA-Kiel); Format (can be a flag to apply to any part of deposit; encourage structured narrative field to tabulate); Markup (e.g. TEI-lite, Shoebox, EAF; give or link to schema etc) |
D | Title | | Note that Deposit here means a bundle which is the whole set of files deposited
D | Type | | Genre and/or type of content/event; see Johnson & Dwyer
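A few entries in this table state checkable constraints: 2 to 6 comma-separated keywords, latitude and longitude in decimal degrees, and a project description of no more than 30 words. A sketch of a validator for just those rules, assuming a record is a flat dictionary; the dotted keys such as `Location.latitude` are a hypothetical flattening of the element and field names, and `check_record` is a hypothetical name.

```python
def check_record(rec: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    keywords = [k.strip() for k in rec.get("Keywords", "").split(",") if k.strip()]
    if not 2 <= len(keywords) <= 6:
        problems.append(f"Keywords: expected 2 to 6, found {len(keywords)}")
    for name, limit in (("Location.latitude", 90), ("Location.longitude", 180)):
        if name in rec:
            try:
                value = float(rec[name])
            except ValueError:
                # e.g. degrees/minutes notation such as 53°52'N
                problems.append(f"{name}: not in decimal degrees: {rec[name]!r}")
                continue
            if not -limit <= value <= limit:
                problems.append(f"{name}: {value} outside ±{limit}")
    if len(rec.get("Project.description", "").split()) > 30:
        problems.append("Project.description: over 30 words")
    return problems
```

Checks like these suit the web-based metadata editing step, where a depositor can correct a value immediately rather than at accession.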
Format details
• Filenames
  • characters [A-Z], [a-z], [0-9], underscore and a single period before the extension
  • correct MIME extension, e.g. http://www.utoronto.ca/webdocs/HTMLdocs/Book/Book-3ed/appb/mimetype.html
  • favour lower case letters
  • maximum length 30 characters
  • maximum directory depth 8
• File formats, see http://www.hrelp.org/archive/depositors/formats.html
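The filename rules translate directly into a check that can be run over a deposit before delivery. A sketch, assuming paths are given relative to the deposit root with `/` separators; it assumes the 30-character limit includes the extension, approximates “correct MIME extension” with Python's `mimetypes` table, and reports the lower-case preference as a warning. `check_path` is a hypothetical name.

```python
import mimetypes
import re

# Letters, digits and underscores, then exactly one period and an extension.
NAME_RE = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9_]+")


def check_path(rel_path: str) -> list[str]:
    """Check one deposit path against the filename rules; return problems found."""
    *dirs, name = rel_path.split("/")
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("only A-Z, a-z, 0-9, _ and a single period before the extension")
    if len(name) > 30:  # assumption: the limit includes the extension
        problems.append(f"name is {len(name)} characters; maximum is 30")
    if len(dirs) > 8:
        problems.append(f"directory depth {len(dirs)}; maximum is 8")
    if mimetypes.guess_type(name)[0] is None:
        problems.append("extension not recognised as a MIME type")
    if name != name.lower():
        problems.append("warning: lower case is preferred")
    return problems
```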
Dobbin
Audio evaluation, processing and reporting
ELAR will
• preserve your deposited materials
• provide for making changes where possible
• provide web-based metadata management
• implement your access restrictions etc
• give feedback about materials
• provide advice, general and specific
• assistance, e.g. data conversion
• provide some equipment and services
• on a case by case basis, develop resources
We ask you to
• manage materials well
• collect and provide protocol information
• deliver materials, metadata
• send trial samples etc
• not withhold materials
• share/manage/delegate custodianship of materials
• maintain relationships with language stakeholders and ELAR
Delivery of materials
• mostly we expect to receive copies on computer-readable media such as CD/DVD/HD
• DVDs seem consistently unreliable
• some digitisation of media may be possible
Questions?