slides - Text Encoding Initiative

TEI Projects and Small Libraries
Examining TEI Markup Decisions and Procedures
Richard Wisneski, Head, Bibliographic and Metadata
Virginia Dressler, Digital Librarian
Stephanie Pasadyn, Technical Services Librarian
Kelvin Smith Library
November 2009
The Project:
• Digitizing and Encoding Kelvin Smith Library’s (KSL)
Books on Cleveland, Ohio and the Western Reserve
Digital Text Collection
• Using the “Book Viewer” in KSL’s “Digital Case” (institutional
digital repository) to display each text’s PDF, page images,
and TEI
• Have applied for an NEH Humanities Collections and
Reference Resources grant to fund the project
• Will collaborate with neighboring institutions to
incorporate their texts on the history of Cleveland and the
Western Reserve into our collection
Why Do This Project?
• Availability of Texts is Limited: See Spreadsheet
• No other institution in Northeast Ohio has a comparable project
• Interest in Cleveland and Western Reserve history among
historians and scholars. Cleveland…
Why TEI?
• To allow researchers to have access to an electronic text
that does not require special-purpose software or
• To analyze information – provide a standard text-encoding scheme and metadata language that
accommodates searching, retrieval, etc.
• To share information – have a standard format for data
interchange in humanities research
• Texts are being encoded at Level 3 (structural)
• To create stand-alone electronic texts with a hierarchical structure
• Emphasis on divisions within text, tables, lists, notes,
front and back matter
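A skeleton of how these Level 3 features nest inside a TEI P5 `<text>` element may help: front matter goes in `<front>`, divisions with their lists and notes go in `<body>`, and back matter goes in `<back>`. This is an illustrative sketch using standard TEI P5 elements. Apart from the title of the Harbach report (the sample text used in these slides), the `type` values and the placeholder content are assumptions, not KSL's actual encoding.

```xml
<text>
  <!-- front matter: title page of the printed book -->
  <front>
    <titlePage>
      <docTitle>
        <titlePart type="main">Report on the preliminary surveys for the Cleveland,
          Painesville and Ashtabula Rail Road Company</titlePart>
      </docTitle>
    </titlePage>
  </front>
  <!-- main text: divisions carry their own headings, paragraphs, lists, notes -->
  <body>
    <div type="section" n="1">
      <head><!-- section heading as printed --></head>
      <p>Running text with a note<note place="foot">Note text as printed
        at the foot of the page.</note> kept at its point of attachment.</p>
      <list>
        <item><!-- first list item --></item>
        <item><!-- second list item --></item>
      </list>
    </div>
  </body>
  <!-- back matter: appendices, indexes, etc. (type value is illustrative) -->
  <back>
    <div type="appendix">
      <head><!-- appendix heading --></head>
      <p><!-- appendix text --></p>
    </div>
  </back>
</text>
```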
Current Project Practices
Project Log
Currently kept as a shared MS Excel file on Google Docs:
• Step 1: Review and assess digital images
Review digital content
Organize and assess
Image assessment
Key points in assessment
• Complete, uncorrupted files
• Ascertain image quality as to current practices and
• Check legibility of text for the OCR process
• Compare illustrations and photos with original source if
• Rescan if needed
Optical Character Recognition
• Step 2:
Sidekick 1400u
Image conversion
• Processing TIFF files for the Book Viewer
Book viewer demo
Text Clean-Up
Student workers and volunteers do the clean-up in
OpenOffice and oXygen
TEI Headers
Professional Catalogers create TEI headers:
<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="" type="xml"?>
<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
<title type="main">Report on the preliminary surveys for the Cleveland, Painesville and
Ashtabula Rail Road Company </title>
<title type="sub">An electronic version</title>
<persName>Harbach, Frederick, 1817-1851</persName>
<name xml:id="ksl">Kelvin Smith Library, Case Western Reserve University</name>
<resp>Publisher of TEI-conformant electronic version.</resp>
<name xml:id="mxb">Mary Burns</name>
<resp>TEI Header creator</resp>
<name xml:id="rlw">Richard Wisneski</name>
<extent>1.448 MB</extent>
<publisher>Digital Case, Kelvin Smith Library, Case Western Reserve University</publisher>
<pubPlace>Cleveland, Ohio</pubPlace>
<distributor n="collection">KSL Digital Book Collection</distributor>
<p>This work is in the public domain and may be freely downloaded for personal or
academic use.</p>
<date when-iso="2009-09-01" />
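The header lines above are excerpts. In a complete TEI P5 file they sit inside `<teiHeader>` and `<fileDesc>`, with `<name>`/`<resp>` pairs grouped in `<respStmt>` and the publication details grouped in `<publicationStmt>`. The following is a minimal sketch of how the values shown on the slide would nest, using standard TEI P5 element names and the TEI namespace. The `<sourceDesc>` content and the placeholder `<text>` are assumptions, and the `respStmt` for Richard Wisneski is left out because his `<resp>` line is not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main">Report on the preliminary surveys for the Cleveland,
          Painesville and Ashtabula Rail Road Company</title>
        <title type="sub">An electronic version</title>
        <author><persName>Harbach, Frederick, 1817-1851</persName></author>
        <!-- each responsibility pairs a name with its role -->
        <respStmt>
          <name xml:id="ksl">Kelvin Smith Library, Case Western Reserve University</name>
          <resp>Publisher of TEI-conformant electronic version.</resp>
        </respStmt>
        <respStmt>
          <name xml:id="mxb">Mary Burns</name>
          <resp>TEI Header creator</resp>
        </respStmt>
      </titleStmt>
      <extent>1.448 MB</extent>
      <publicationStmt>
        <publisher>Digital Case, Kelvin Smith Library, Case Western Reserve University</publisher>
        <pubPlace>Cleveland, Ohio</pubPlace>
        <distributor n="collection">KSL Digital Book Collection</distributor>
        <availability>
          <p>This work is in the public domain and may be freely downloaded for
            personal or academic use.</p>
        </availability>
        <date when-iso="2009-09-01"/>
      </publicationStmt>
      <!-- assumption: description of the print original goes here -->
      <sourceDesc>
        <p><!-- bibliographic description of the printed report --></p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- placeholder so the file is complete; structural markup goes here -->
  <text>
    <body>
      <p><!-- encoded text --></p>
    </body>
  </text>
</TEI>
```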
TEI Structural Mark-up
• Text encoders mark up texts following TEI P5, Level 3
<div type="section" xml:id="section1" n="1">
<pb n="5" facs="clecle00-00003.jp2" />
<p>The first settlers of Cleveland were from Connecticut;
and, according to tradition, as soon as three families had
established themselves — it was about the beginning of the
present century — they set up a school for their <hi
rend="ital">five children.</hi> The population had
increased to <hi rend="ital">fifty-seven</hi> in 1810, and
the oldest inhabitants think there was a school taught in
that year. It is certain, however, that it could not have been
very large. The earliest school mentioned in any record was
kept by a Mr. Capman in 1814. But it was not till 1836, the
year of organization under the City Charter, that any system
of <hi rend="ital">public instruction</hi> was adopted.
Previous to this year, the schools, of whatever grade or
character, were supported mainly by private
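Because `<pb/>` is an empty milestone element, a page turn that falls in the middle of a paragraph does not split the `<p>`. The milestone simply sits inline and the paragraph and division close once the text ends. The following minimal sketch shows how the section above might continue and close, assuming (hypothetically) that the next page is page 6 with an image file named `clecle00-00004.jp2`. Neither value is shown on the slide.

```xml
<div type="section" xml:id="section1" n="1">
  <pb n="5" facs="clecle00-00003.jp2" />
  <p><!-- opening sentences as above --> Previous to this year, the schools,
    of whatever grade or character, were supported mainly by private
    <!-- remainder of the sentence is not shown on the slide -->
    <!-- hypothetical page turn: the n and facs values are assumptions -->
    <pb n="6" facs="clecle00-00004.jp2" />
    <!-- text of the next page continues inside the same paragraph -->
  </p>
</div>
```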
TEI Structural Mark-up (continued)
<table rend="boxed" cols="3" rows="4" xml:id="Table2">
<head>TABLE OF CURVATURE.</head>
<cell> </cell>
<cell role="label">SOUTH ROUTE.</cell>
<cell role="label">NORTH ROUTE.</cell>
<cell>Deflections to Right</cell>
<cell>236° </cell>
<cell>Deflections to Left</cell>
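In TEI P5, `<cell>` elements must sit inside `<row>` elements, and the excerpt above shows only the cells. The following sketch restores the rows for the cells that are visible. Values the slide does not show are left as comments rather than guessed, and the fourth row implied by `rows="4"` is only marked, not reconstructed.

```xml
<table rend="boxed" cols="3" rows="4" xml:id="Table2">
  <head>TABLE OF CURVATURE.</head>
  <!-- row 1: column labels -->
  <row>
    <cell/>
    <cell role="label">SOUTH ROUTE.</cell>
    <cell role="label">NORTH ROUTE.</cell>
  </row>
  <!-- row 2 -->
  <row>
    <cell>Deflections to Right</cell>
    <cell>236°</cell>
    <cell><!-- north route value not shown on slide --></cell>
  </row>
  <!-- row 3 -->
  <row>
    <cell>Deflections to Left</cell>
    <cell><!-- not shown on slide --></cell>
    <cell><!-- not shown on slide --></cell>
  </row>
  <!-- row 4 not shown on slide -->
</table>
```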
Text Encoding
Learning TEI
• Practical Application
• Internal Documentation
• CaseLearns
Learning TEI
One on one overview
Creating master outline
Coding page by page
Referring to and updating documentation
Human Error
Evolution of Institutional Practice
Minimal Time Allotment
Limited Opportunity for Continuing
To Be Done
• Rescan some of the books
• Continue to encode
• Hold half-day and full-day workshops on text
encoding for full-time staff
• Create MODS, MARC-XML, and METS metadata
• Re-examine “Book Viewer”
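For the planned METS work, one possible shape is a METS wrapper that embeds a MODS description and points at the page images and the TEI file. This is a hedged sketch of standard METS/MODS element nesting, not the project's design. The IDs, `USE` values, and the TEI filename `clecle00.xml` are hypothetical. Only the title and the image name `clecle00-00003.jp2` (page 5) come from these slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- descriptive metadata: MODS record wrapped inside METS -->
  <mets:dmdSec ID="dmd1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Report on the preliminary surveys for the Cleveland,
              Painesville and Ashtabula Rail Road Company</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- files: page images and the TEI transcription (IDs/USE are hypothetical) -->
  <mets:fileSec>
    <mets:fileGrp USE="page images">
      <mets:file ID="img5" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="clecle00-00003.jp2"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="transcription">
      <mets:file ID="tei1" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="clecle00.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <!-- physical structure: book, then pages, each pointing to its image -->
  <mets:structMap TYPE="physical">
    <mets:div TYPE="book" DMDID="dmd1">
      <mets:div TYPE="page" ORDERLABEL="5">
        <mets:fptr FILEID="img5"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
```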
Discussion Questions
• Ways to expedite text encoding
• Ways to scan texts – outsourcing?
• Funding challenges (outsourcing,
scanning, equipment)
• Book viewer – effective? Ineffective?
• Text-Encoding Level – change?
• Learning TEI – in-house classes and
documentation, TEI-C documentation.
Webinars? Online tutorials? Certificate
Richard Wisneski:
Virginia Dressler:
Stephanie Pasadyn:
Links and references
• Digital Case homepage
• Digital Case Book Viewer collection
• Women Writers Online, Brown University
• Poetess Archive, Miami University (Ohio)
• Victorian Women Writers Project, Indiana University
• Swinburne Project, Indiana University