LaQuAT

Linking and Querying Ancient Texts

Mike Jackson

Architect, EPCC

michaelj@epcc.ed.ac.uk

+44 131 650 5141

What is LaQuAT?

• 6-month JISC-funded ENGAGE project

– Facilitate use of e-infrastructures

• King's College London

– Centre for e-Research (CeRch)

– Arts and Humanities e-Science Support Centre (AHeSSC)

– Application-domain expertise and development

• EPCC

– Technology providers

• National Grid Service (NGS)

– Deployment

Aims

• Link legacy data – keep original data as-is

• Use OGSA-DAI components

– Data access and integration software

– Web services

• Write demonstrators

• Deploy on the NGS

• Publish experiences, guidelines and case studies

• First step towards an e-infrastructure for humanities

Volterra

• Projet Volterra

– Department of History, University College London

• Data on late Roman legal texts

– Imperial pronouncements in Latin

• Relational database

HGV

• Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens

– Institut für Papyrologie, Heidelberg Academy of Sciences

• Greek papyri meta-data

– Bibliography

– Dates

– Places – where found + provenance

• Relational database

IAphrodisias

• Inscriptions of Aphrodisias

– King's College London

• Greek inscriptions meta-data

– Bibliography

– Dates

– Places – where found + provenance

– Transcript

• XML database

– One XML document per inscription

– eXtensible Mark-up Language

– Share structured data

– Encode documents

– EpiDoc XML – Text Encoding Initiative for inscriptions

Example document excerpt

<keywords>
  <term>
    <geogName type="ancientRegion" key="Asia" cert="high" full="yes">Asia</geogName>
  </term>
  <term>
    <geogName type="modernCountry" key="TR" cert="high" full="yes">Turkey</geogName>
  </term>
  <term>
    <placeName type="ancientFindspot" key="Aphrodisias" cert="high" full="yes">Aphrodisias</placeName>
  </term>
  <term>
    <rs type="textType" key="sacer" cert="high">Religious</rs>
  </term>
</keywords>
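As a rough illustration of how such keyword terms could be read programmatically, the sketch below parses a copy of the excerpt with Python's standard `xml.etree.ElementTree`. The helper name `keyword_terms` is hypothetical, and the sketch assumes the excerpt is used exactly as shown, without the TEI namespace that a full EpiDoc document would normally declare.

```python
import xml.etree.ElementTree as ET

# Copy of the excerpt above (assumption: no XML namespace is declared).
EXCERPT = """
<keywords>
  <term><geogName type="ancientRegion" key="Asia" cert="high" full="yes">Asia</geogName></term>
  <term><geogName type="modernCountry" key="TR" cert="high" full="yes">Turkey</geogName></term>
  <term><placeName type="ancientFindspot" key="Aphrodisias" cert="high" full="yes">Aphrodisias</placeName></term>
  <term><rs type="textType" key="sacer" cert="high">Religious</rs></term>
</keywords>
"""


def keyword_terms(xml_text):
    """Map each term's "type" attribute to a (key, display text) pair."""
    root = ET.fromstring(xml_text.strip())
    terms = {}
    for term in root.findall("term"):
        for child in term:  # geogName, placeName or rs
            text = (child.text or "").strip()
            terms[child.get("type")] = (child.get("key"), text)
    return terms


terms = keyword_terms(EXCERPT)
print(terms["ancientFindspot"])
print(terms["textType"])
```

Keying on the `type` attribute rather than the element name means the region, country, findspot and text type can all be looked up the same way.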

Why link?

A prosopographical researcher wants to learn about the patterns of relationships and activities of a group of individuals in a society of a given period, by analysing data from the inscriptions and legal records of that period

• Volterra + HGV (relational + relational)

– Overlaps of dates and places

• Volterra + IAphrodisias (relational + XML)

– Overlaps of dates and people

• Volterra + IAphrodisias + HGV (relational + XML + relational)

– Overlaps of dates

• Insights outwith the scope of any single database

SQL-92 versus the “real world”

• Volterra

– Laws 1 (HonoreEL)

– Law ID

– Honore Ref No

– Titulus (source)

• HGV

– Erwähnte Daten 08-04-25 Kopie

– HGVjuni-9

– Erw. Daten exp.?

• SQL-92 standard rules for table and column names

– Unquoted names may not contain ( , ) , - , spaces, …

• Database products are more lenient
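The naming problem can be sketched in a few lines of Python. The check below is a simplified reading of the SQL-92 rule for regular (unquoted) identifiers: a letter followed by letters, digits or underscores. It is ASCII-only and ignores reserved words, and the function names are hypothetical. Names that fail the check are wrapped as delimited identifiers in double quotes, which SQL-92 permits but which not every tool in a processing chain will accept.

```python
import re

# Simplified SQL-92 regular identifier: letter, then letters/digits/underscores.
# Assumption: ASCII only, and reserved words are not checked.
REGULAR_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def is_regular_identifier(name):
    """True if the name can be used unquoted under the simplified rule."""
    return REGULAR_IDENTIFIER.match(name) is not None


def quote_identifier(name):
    """Return the name unchanged if it is regular, else as a delimited
    identifier: wrapped in double quotes, with embedded quotes doubled."""
    if is_regular_identifier(name):
        return name
    return '"' + name.replace('"', '""') + '"'


# Table and column names taken from the Volterra and HGV examples above.
for name in ["Laws 1 (HonoreEL)", "Law ID", "HGVjuni-9", "Publikation"]:
    print(quote_identifier(name))
```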

Control characters

• Web services are based around XML document exchanges

• HGV in FileMaker Pro

– In the database

BGU XIII 2223.2-3 usually with ÍpÉ §moË in this context

– On the wire

<columnValue>BGU XIII 2223.2-3 ^Kusually with ÍpÉ §moË in this context</columnValue>

• XML document + CTRL-K = invalid XML document
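One possible workaround, sketched here in Python rather than in the project's own code, is to replace any character that falls outside the XML 1.0 `Char` production before the value is placed in a document. CTRL-K is the vertical tab (0x0B), which XML 1.0 does not allow. The function names and the shortened sample value are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=" "):
    """Replace characters that XML 1.0 forbids (such as CTRL-K)."""
    return INVALID_XML_CHARS.sub(replacement, text)


def is_well_formed(document):
    """True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Shortened sample value with an embedded CTRL-K (vertical tab, 0x0B).
value = "BGU XIII 2223.2-3\x0busually in this context"
raw = "<columnValue>" + value + "</columnValue>"
clean = "<columnValue>" + strip_invalid_xml_chars(value) + "</columnValue>"

print(is_well_formed(raw), is_well_formed(clean))
```

Replacing with a space rather than deleting keeps the two words apart, which matches how the value appears in the database.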

Languages and character sets – HGV

Publikation | Datierung | Ort | Originaltitel

1 CPR V 1 | 66, 2. Sept. | Oxyrhynchos | Receipt for Dyke - tax

3 CPR VII 1 | 7 - 4 v.Chr. | Soknopaiu Nesos (Arsinoites) | Petition der Priester des Soknopaios an den Präfekten

9 CPR XV 1 | 3 v.Chr., 29. Aug. - 27. Sept. | Soknopaiu Nesos (Arsinoites) | Traduzione greca dell’atto di rinuncia alla casa - mulino

28 O.Bodl. III 1 (S. XVIII) | 130, 30. Apr. | Syene | Receipt ά ξιον

Languages and character sets

• Multi-lingual

• Accented characters, Greek, German,…

– Erwähnte Daten 08-04-25 Kopie

– Petition der Priester des Soknopaios an den Präfekten

– Traduzione greca dell’atto di rinuncia alla casa - mulino

– Receipt for χειρων ά ξιον

• Character sets and encodings

– UTF-8 – our Linux test server

– CP1252 – my laptop
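The effect of mixing the two encodings can be reproduced with a short Python sketch. It uses one of the titles above; the helper name `can_encode` is hypothetical. Text written as UTF-8 but read as CP1252 turns accented characters into two-character debris, and Greek cannot be written in CP1252 at all.

```python
title = "Petition der Priester des Soknopaios an den Präfekten"

# Bytes as a UTF-8 system (the Linux test server) would write them.
utf8_bytes = title.encode("utf-8")

# The same bytes read on a CP1252 system (the laptop): "ä" becomes two characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# The damage is reversible only if the wrong decoding step is known.
repaired = garbled.encode("cp1252").decode("utf-8")


def can_encode(text, encoding):
    """True if every character of the text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Greek letters have no CP1252 representation.
print(can_encode("ξιον", "utf-8"), can_encode("ξιον", "cp1252"))
```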

Database products and design

• Volterra in Microsoft Access

– Laws data about different time periods in 6 tables

– Not all tables have the same columns

– Not all tables use the same column names

• HGV in FileMaker Pro

– One massive table of 50,000 rows with 75 columns

• Is a database a

– Data access and management tool, or

– Data entry and storage tool?

How it works

• Volterra – legal texts

– SQL views: a view presents N tables as one table

– Queried through OGSA-DAI

• HGV – papyrus records

– SQL views: a view translates German column names to English

– Queried through OGSA-DAI

• DQP (distributed query processing), between the client and the OGSA-DAI services

– Client sends a query to DQP

– DQP parses the query and forms a query plan

– DQP executes subqueries via workflows

– DQP aggregates results
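The flow above can be imitated on a very small scale with Python's built-in `sqlite3` module. This is only a sketch: OGSA-DAI and DQP are not used, two in-memory SQLite databases stand in for Volterra and HGV, and every table name, column name and row other than `Publikation` and `Ort` is invented for illustration. One view merges several law tables into a single table, another renames German columns to English, and a client-side function plays the part of DQP by running one subquery per source and joining the partial results.

```python
import sqlite3

# Stand-in for Volterra (invented tables and rows): law data split across
# per-period tables whose column names differ.
volterra = sqlite3.connect(":memory:")
volterra.executescript("""
    CREATE TABLE "Laws 1" ("Law ID" INTEGER, "Year" INTEGER, "Place" TEXT);
    CREATE TABLE "Laws 2" ("ID" INTEGER, "Date Year" INTEGER, "Issued At" TEXT);
    INSERT INTO "Laws 1" VALUES (1, 300, 'Alexandria');
    INSERT INTO "Laws 2" VALUES (2, 400, 'Rome');
    -- The view presents the N tables as one table with regular names.
    CREATE VIEW laws AS
        SELECT "Law ID" AS law_id, "Year" AS year, "Place" AS place FROM "Laws 1"
        UNION ALL
        SELECT "ID", "Date Year", "Issued At" FROM "Laws 2";
""")

# Stand-in for HGV (invented rows; "Jahr" is a hypothetical column).
hgv = sqlite3.connect(":memory:")
hgv.executescript("""
    CREATE TABLE "HGVjuni-9" ("Publikation" TEXT, "Jahr" INTEGER, "Ort" TEXT);
    INSERT INTO "HGVjuni-9" VALUES ('P.Example 1', 300, 'Alexandria');
    INSERT INTO "HGVjuni-9" VALUES ('P.Example 2', 130, 'Syene');
    -- The view translates German column names to English.
    CREATE VIEW papyri AS
        SELECT "Publikation" AS publication, "Jahr" AS year, "Ort" AS place
        FROM "HGVjuni-9";
""")


def overlaps_by_year_and_place(volterra_db, hgv_db):
    """Toy stand-in for the DQP step: one subquery per source, then the
    partial results are joined on (year, place) in the client."""
    laws = volterra_db.execute("SELECT law_id, year, place FROM laws").fetchall()
    papyri = hgv_db.execute("SELECT publication, year, place FROM papyri").fetchall()
    by_key = {}
    for publication, year, place in papyri:
        by_key.setdefault((year, place), []).append(publication)
    return [
        (law_id, publication)
        for law_id, year, place in laws
        for publication in by_key.get((year, place), [])
    ]


result = overlaps_by_year_and_place(volterra, hgv)
print(result)
```

Because the views carry the awkward names and the table merging, the original tables stay as they are, in line with the aim of keeping legacy data untouched.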

Questions

Challenges

• Volterra

– Viewing multiple tables as a single table

– Full text searches

• HGV

– Mapping between German and English

– Unicode characters

– Full text searches

– Elementary JDBC driver – little meta-data

• IAphrodisias

– Unicode characters

– Full text searches

Challenges

• Dates vary from the day to 50-100 year spans

• Same query

– “find objects from, or references to, the period between 1479 and 1425 BC”

– “find objects associated with the reign of Tuthmosis III”

– Authority lists

• Variants

– Spelling and languages

– Spatial/geographical changes – Tuthmosis expanded Egyptian rule

• Relational-XML data integration

– AIST
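A minimal sketch of the date problem, under stated assumptions: years are stored as signed integers (negative for BC), every record carries a closed (start, end) span, and a small authority list maps a named period to its span. The records, function names and year convention are invented for illustration. With that in place, the two example queries resolve to the same interval and return the same objects, whether a record is dated to a single year or to a century.

```python
# Hypothetical authority list: named period -> (start year, end year),
# with negative numbers for years BC.
AUTHORITY_LIST = {"reign of Tuthmosis III": (-1479, -1425)}


def resolve_period(query):
    """Accept a named period from the authority list or a (start, end) pair."""
    if isinstance(query, str):
        return AUTHORITY_LIST[query]
    return query


def overlaps(a, b):
    """True if the closed intervals a and b share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_objects(records, query):
    """Names of all records whose date span overlaps the queried period."""
    period = resolve_period(query)
    return [name for name, span in records if overlaps(span, period)]


# Invented records with spans of very different widths.
records = [
    ("object dated to a single year", (-1450, -1450)),
    ("object dated to a 100-year span", (-1500, -1400)),
    ("object from a later period", (-1300, -1250)),
]

by_years = find_objects(records, (-1479, -1425))
by_reign = find_objects(records, "reign of Tuthmosis III")
print(by_years == by_reign, by_reign)
```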

Demonstrators

• Existing on-line query forms

• Emulate these but with OGSA-DAI back ends
