Archaeology and Terminology

Ceri Binding
Hypermedia Research Unit, University of Glamorgan, Wales, UK
http://hypermedia.research.glam.ac.uk/
Any process needs (human) validation...
[Pictures from BBC news website]
Translation:
“Pedestrians
look left”
Translation:
“I am not in the office at
the moment. Send any
work to be translated”
STAR project - overview
• AHRC funded project in collaboration with
English Heritage Centre for Archaeology,
Portsmouth
• Aim: to investigate the potential of semantic
technologies for widening access to digital
archaeology resources, including disparate
datasets and associated grey literature.
STAR - general architecture
[Architecture diagram, layers listed top to bottom]
• Applications – Server Side, Rich Client, Browser
• Data access layer – Web Services, SQL, SPARQL
• RDF Based Common Ontology Data Layer (CRM / CRMEH / SKOS)
• Indexing – Grey Literature reports
• Conversion (SKOS) – EH thesauri, glossaries
• Data Mapping / Normalisation – Archaeological Datasets (STAN, RRAD, IADB, LEAP, RPRE)
The Archaeological Archipelagos
[Keith May, English Heritage]
English Heritage controlled vocabularies
• 27 glossaries – from English Heritage recording
manuals (2006)
• 6 main thesauri used:
– Monument Types thesaurus
– Archaeological Sciences thesaurus
– Evidence thesaurus
– Main Building Materials thesaurus
– MDA Object Types thesaurus
– Timelines thesaurus
• Converted to SKOS format for use within STAR
Expressive vs. controlled vocabulary
“…how many of those writing [grey literature] reports
would think to describe what they are recording/writing
about using the same thesauri? […] it would have been a
lot quicker and easier if standardised terminology had
been used in the report text when describing types of
monument, event and artefact, as well as dates/periods
etc.” [G. Falkingham]
“Grey Literature is very often the only place where field workers have any
opportunity to engage in creating their own narrative of the site, both of
the archaeological event and of the archaeological story of the site itself. I
think it would be throwing the baby out with the bath water to
concentrate solely on the data without continuing to offer highly skilled
and experienced fieldworkers the opportunity to actually tell us what they
think the data means...” [S. Jeffrey]
Descriptive, semi-controlled vocabulary…
[Table of distinct values found in three deposit fields]
• Deposit Colour: (Reddy) Brown; 9Reddy) brown; Brown; Brown red;
Brown/reddy; Dark brown; Dark brown/orange; Dark grey brown;
Dark orange brown; Dark orange brown; Dark orange loam;
Dark orange/brown; Dark red brown; Grey brown; Grey/brown; Light brown;
Light yellow brown; Medium brown; Mid brown; Mid red brown; Orange brown;
Orange/brown; Orangy brown; Orangy brown, very light brown on edges and
sides of profile; Red /brown; Red brown; Red/brown; Reddish brown;
Reddy brown; Varies; Very light brown with darker patches; White;
Yellow brown; Yellow/orange brown
• Deposit Texture: Firm; Friable; Friable to loose; Friable/loose;
Friable-loose; Loose; Loose/friabe; Loose/friable
• Deposit Compaction: Plastic; Sticky; Sticky (wet); Sticky/firm; Varies
“…another of my examples has something about some flint that is
‘snuff coloured’ & I don’t know if I’ve ever seen snuff, let alone
know what colour it is, or might have been over 150 years ago,
and I would think it would make sense to take some kind of
integrated approach from the outset, rather than the usual
‘bricolage’ of having one route for the archivists, another for
those interested in searching spreadsheets, another for people
interested in googling graphics, etc.” [G. Carver]
Worst of all worlds?
Terminology control for time periods
• Centuries
• BC / AD years
• 3 age system
• Monarchs / Roman emperors
• Cultural styles
• Geological periods
• Prefixes: pre, post, mid etc.
• Any combinations of these
Examples of periods
encountered in data
MLC2-C3
AD 341-6
Iron Age
First half 1st century?
Antonine
LC2/EC3
Early C3
MLA
Time period alignment –
data cleansing / semantic enrichment
Object No  Period                MIN YEAR  MAX YEAR
1519       AD 354-64                  354       364
1520       1st century AD               1       100
1538       2nd century                101       200
1548       1st century                  1       100
1562       AD 367-75                  367       375
1563       AD 348-50                  348       350
1567       Mid 1st century AD          33        66
1571       First half 1st centu         1        50
1572       Mid first century AD        33        66
1580       c. AD 270                  270       270
1583       First half first cen         1        50
1591       AD 341-6                   341       346
1593       AD 287-93                  287       293
1594       AD 43-44                    43        44
1595       Medieval                  1066      1540
1627       2nd century AD             101       200
1631       ?1st century                 1       100
1635       AD 354-64                  354       364
1664       AD 330-5                   330       335
1681       Medieval                  1066      1540
1701       Romano-British              43       410
1704       Modern?                   1901      2100
98157      post-mediaeval            1540      1901
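The kind of label normalisation shown in this table could be sketched as follows. This is a minimal illustration, not the STAR implementation: the function name `label_to_years`, the regular expressions and the lookup tables are assumptions, and the year ranges for named periods, centuries, "mid" and "first half" are simply copied from the MIN YEAR / MAX YEAR values on the slide. Only AD labels of the forms shown are handled.

```python
import re

# Named periods with the year ranges used on the slide.
NAMED_PERIODS = {
    "medieval": (1066, 1540),
    "post-medieval": (1540, 1901),
    "post-mediaeval": (1540, 1901),
    "romano-british": (43, 410),
    "modern": (1901, 2100),
}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}
# Year offsets within a century, as implied by the slide's values.
QUALIFIERS = {None: (1, 100), "first half": (1, 50), "mid": (33, 66)}

YEARS = re.compile(r"ad\s*(\d+)(?:\s*-\s*(\d+))?")
CENTURY = re.compile(
    r"(first half|mid)?\s*(\d+|first|second|third|fourth)(?:st|nd|rd|th)?"
    r"\s+century(?:\s+ad)?"
)


def label_to_years(label):
    """Return (min_year, max_year) for a period label, or None if unparsed."""
    # Uncertainty markers ("?", "c.") are dropped before matching.
    text = label.replace("?", "").strip().lower()
    text = re.sub(r"^c\.\s*", "", text)

    if text in NAMED_PERIODS:
        return NAMED_PERIODS[text]

    match = YEARS.fullmatch(text)
    if match:
        first, last = match.groups()
        if last is None:
            return (int(first), int(first))
        # An abbreviated end year ("354-64") borrows leading digits from the start.
        prefix = first[: max(0, len(first) - len(last))]
        return (int(first), int(prefix + last))

    match = CENTURY.fullmatch(text)
    if match:
        qualifier, number = match.groups()
        century = ORDINALS[number] if number in ORDINALS else int(number)
        low, high = QUALIFIERS[qualifier]
        base = (century - 1) * 100
        return (base + low, base + high)

    return None  # left for human review
```

Labels that do not parse, such as the truncated "First half 1st centu", return `None` rather than a guess, so that they can be passed to a person for validation.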
Time period relationships
Period P1
occurs before P1*
occurs after P1*
meets P1
met by P1
overlaps P1
overlapped by P1
starts P1*
started by P1*
finishes P1*
finished by P1*
includes P1*
occurs during P1*
equal to P1*
Time
[*Transitive]
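Given start and end years for two periods, the relationship names listed on this slide can be derived by comparing endpoints. The sketch below is one possible way to do so; the function name `relationship` and the order in which the cases are tested are assumptions, and each period is assumed to have start <= end.

```python
def relationship(a_start, a_end, b_start, b_end):
    """Name the relationship of period A to period B using the slide's labels."""
    if (a_start, a_end) == (b_start, b_end):
        return "equal to"
    if a_end < b_start:
        return "occurs before"
    if a_start > b_end:
        return "occurs after"
    # Shared boundary years are tested before the start/finish cases.
    if a_end == b_start:
        return "meets"
    if a_start == b_end:
        return "met by"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished by"
    if a_start < b_start and a_end > b_end:
        return "includes"
    if a_start > b_start and a_end < b_end:
        return "occurs during"
    return "overlaps" if a_start < b_start else "overlapped by"
```

For example, `relationship(1555, 1623, 1558, 1603)` returns "includes" and `relationship(270, 284, 267, 300)` returns "occurs during".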
Time Period Comparison – Closeness Calculation
[Diagram: periods P1, P2 and P3 drawn against a time axis, annotated with the quantities IU, MP, NMP and D used in the formula below]
Match(P1, P2) = W1 (MP / IU) + W2 (IU / (NMP + IU)) + W3 (IU / (D + IU))
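The formula can be turned into code once its terms are pinned down. The slide text does not define them, so the sketch below rests on assumed readings: IU as the full span covered by both periods, MP as the matching (overlapping) part, NMP as the non-matching part, and D as the gap between periods that do not overlap, all counted in whole years. The equal weights are also an assumption, so the function is not expected to reproduce the closeness values reported in this deck.

```python
def closeness(p1, p2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Match(P1, P2) under assumed definitions of IU, MP, NMP and D.

    p1 and p2 are (start_year, end_year) tuples with start <= end.
    """
    (s1, e1), (s2, e2) = p1, p2
    w1, w2, w3 = weights

    # IU (assumed): years from the earliest start to the latest end.
    iu = max(e1, e2) - min(s1, s2) + 1
    # MP (assumed): years covered by both periods.
    mp = max(0, min(e1, e2) - max(s1, s2) + 1)
    # NMP (assumed): years covered by exactly one of the periods.
    nmp = (e1 - s1 + 1) + (e2 - s2 + 1) - 2 * mp
    # D (assumed): years strictly between the periods; zero if they touch or overlap.
    d = max(0, max(s1, s2) - min(e1, e2) - 1)

    return w1 * (mp / iu) + w2 * (iu / (nmp + iu)) + w3 * (iu / (d + iu))
```

With these definitions identical periods score 1, and the score falls as the periods share less of their combined span or move further apart.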
SKOS Concepts + CRM Entities
Time period concepts also have
implicit spatio-temporal context
[Diagram: RDF graph. <#stuart> has rdf:type skos:Concept and rdf:type
crm:E4.Period. crm:E4.Period is an rdfs:subClassOf crm:E2.TemporalEntity,
which links to crm:E52.Time-Span via crm:P4F.has_time-span; crm:E4.Period
links to crm:E53.Place via crm:P7F.took_place_at. The narrower periods
<#jacobean>, <#caroline>, <#restoration>, <#williamandmary> and <#queenanne>
carry skos:broader links, and the periods are related to one another by
crm:P116F.starts, crm:P119F.meets, crm:P118F.overlaps and crm:P115F.finishes.]
Time period alignment – data processing
• Align data relative to closest period concepts
from English Heritage ‘Timelines’ thesaurus
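As a rough illustration of the alignment step, the sketch below picks the known period that best matches a record's date range. It deliberately swaps in a simple overlap ratio (shared years divided by combined years) as the score, which is not the weighted Match formula from this deck. The function names and the tuple layout are assumptions; the identifiers, labels and dates are a small sample taken from the alignment results shown in this deck.

```python
# (id, label, from_year, to_year): a small sample of known period concepts.
KNOWN_PERIODS = [
    (136087, "VESPASIAN", 69, 79),
    (136122, "ALEXANDER SEVERUS", 222, 235),
    (900013, "2ND QUARTER 4TH CENTURY AD", 326, 350),
    (134825, "4TH CENTURY AD", 300, 399),
    (134745, "MEDIEVAL", 1066, 1540),
]


def overlap_ratio(a_start, a_end, b_start, b_end):
    """Shared years divided by combined years (a stand-in score)."""
    shared = max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    combined = (a_end - a_start + 1) + (b_end - b_start + 1) - shared
    return shared / combined


def closest_period(start, end, periods=KNOWN_PERIODS):
    """Return the known period with the highest score for the given range."""
    return max(periods, key=lambda p: overlap_ratio(start, end, p[2], p[3]))
```

Because the score rewards tight fits, a record dated 341 to 346 is aligned to the quarter-century concept rather than to the whole 4th century, even though both contain it.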
Time period alignment - results
Data records relative to closest ‘known’ periods
Left: data record (dates deduced from labels). Right: calculated closest matching known periods.

Label       From  To    Relationship   Label                       From  To    Closeness
1555-1623?  1555  1623  overlaps       JAMES I AND VI              1567  1625  0.895
                        occurs during  POST MEDIEVAL               1540  1901  0.838
                        includes       ELIZABETHAN                 1558  1603  0.814
                        overlapped by  2ND HALF 16TH CENTURY AD    1551  1600  0.808
AD270-284   270   284   occurs during  LATE 3RD CENTURY            267   300   0.885
                        overlaps       4TH QUARTER 3RD CENTURY AD  276   300   0.706
                        includes       PROBUS                      276   282   0.699
                        started by     AURELIAN                    270   275   0.665
                        overlapped by  3RD QUARTER 3RD CENTURY AD  251   275   0.610
                        met by         QUINTILLUS                  270   270   0.532
Data aligned to closest ‘known’ periods
Left: data record (dates deduced from labels). Right: closest controlled match based on dates.

ID    Label          From  To    ID      Label                       From  To
1315  AD 228-31      228   231   136122  ALEXANDER SEVERUS           222   235
1316  AD 364-78      364   378   900014  3RD QUARTER 4TH CENTURY AD  351   375
1317  AD 69-79       69    79    136087  VESPASIAN                   69    79
1318  AD 270-4       270   274   136164  TETRICUS I                  270   274
1319  AD 275-402     275   402   134825  4TH CENTURY AD              300   399
1320  AD 341-6       341   346   900013  2ND QUARTER 4TH CENTURY AD  326   350
1321  AD 268-70      268   270   136154  CLAUDIUS II GOTHICUS        268   270
1322  AD 367-75      367   375   900014  3RD QUARTER 4TH CENTURY AD  351   375
1324  AD 270-84      270   284   135952  LATE 3RD CENTURY            266   299
1325  AD 270-84      270   284   135952  LATE 3RD CENTURY            266   299
1326  AD 367-75      367   375   900014  3RD QUARTER 4TH CENTURY AD  351   375
1327  AD 383-8       383   388   900015  4TH QUARTER 4TH CENTURY AD  376   399
1328  AD 330-40      330   340   900013  2ND QUARTER 4TH CENTURY AD  326   350
1337  Post-medieval  1540  1901  134746  POST MEDIEVAL               1540  1901
1370  Medieval       1066  1540  134745  MEDIEVAL                    1066  1540
1371  AD 1943        1943  1943  134848  SECOND WORLD WAR            1939  1945
Timeline service test client
Semantic enrichment
• Borderline between data cleansing and data
creation…
“Possibly fragment of belt buckle or nail”
•BELT
•Belt Clasp -> use STRAP FITTING
•BUCKLE
•Buckle Plate -> use BUCKLE
•NAIL
•HOBNAIL
•SHOEING NAIL
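One way to generate candidate concepts like those listed for the belt buckle example is to match words from the free-text description against thesaurus labels, following "use" references from non-preferred terms to their preferred term. The sketch below is an assumption-laden illustration: the function name, the substring matching and the four-letter minimum word length are all hypothetical choices, and the entries are only those shown on this slide. Its output is a list of suggestions for a person to confirm or reject, not an automatic classification.

```python
import re

# (label, preferred term to use instead, or None if the label is itself preferred)
ENTRIES = [
    ("BELT", None),
    ("BELT CLASP", "STRAP FITTING"),
    ("BUCKLE", None),
    ("BUCKLE PLATE", "BUCKLE"),
    ("NAIL", None),
    ("HOBNAIL", None),
    ("SHOEING NAIL", None),
]


def suggest_concepts(description, entries=ENTRIES, min_length=4):
    """Return (matched label, concept to index under) pairs for review."""
    # Short words ("of", "or") are skipped to avoid accidental matches.
    words = [w for w in re.findall(r"[a-z]+", description.lower())
             if len(w) >= min_length]
    suggestions = []
    for label, use_instead in entries:
        if any(word in label.lower() for word in words):
            suggestions.append((label, use_instead or label))
    return suggestions
```

Applied to "Possibly fragment of belt buckle or nail", every entry in the sample list is suggested, with "BELT CLASP" redirected to "STRAP FITTING" and "BUCKLE PLATE" to "BUCKLE". This breadth is what places the step on the borderline between cleansing existing data and creating new data.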
“The single most useful thing you can do to ensure the long-term
preservation of your data is to plan for it to be re-used”
[Archaeology Data Service]
Aligning controlled vocabularies
• Different scope notes, same concepts?
• Different thesauri, same concepts?
Archaeological Objects
•SARCOPHAGUS
•SUNDIAL
•WALL PAINTING
•WHIPPING POST
RCHME Monument Types
•SARCOPHAGUS
•SUNDIAL
•WALL PAINTING
•WHIPPING POST
RCHMS Monument Types
RCHMW Monument Types
STAR general architecture
• Windows applications
• Browser components
• Full text search
• Browse concept space
• Navigate via expansion
• Cross search archaeological
datasets
STAR client applications
English Heritage
thesauri (SKOS)
Grey literature
indexing
STAR web services
Archaeological
Datasets (CRM)
STAR datasets
Windows Client Applications
Browse available thesauri
Search across multiple thesauri
Navigate via semantic
expansion
Interactive tools to aid data entry
Controlled types used in main search interface
• Interactive selection from glossary/thesaurus concepts
• Filtered to concepts actually used in indexing
• Group / context types – from (enhanced) cuts and deposits glossary
• Context find materials – from building materials thesaurus
• Context find types – from MDA Object types thesaurus
• Context sample types – from existing data values...
Interactive tools to aid data entry
Summary
• Tension between expressive vs. controlled
vocabulary, but general agreement on benefits
of control
• Better coordination and alignment of
controlled vocabularies would be beneficial
• Web services and interactive tools to aid data
entry and search
• Issues encountered are not about particular
technologies – more fundamental KO issues
Accommodating Approximation
crm:E52.Time-Span – modelling uncertainty
[Diagram: an approximate time period on a time axis, with a zone of uncertainty at its start and at its end]
• crm:P81F.ongoing_throughout – the span from latest start date to earliest end date
• crm:P82F.at_some_time_within – the span from earliest start date to latest end date
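The four bounding dates named on this slide can be held in a small data structure from which both CRM extents are derived. The following is a minimal sketch: the class name, field names and validation rule are assumptions, and the years in the usage note are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApproximateTimeSpan:
    """A time-span whose start and end are each known only within a range."""
    earliest_start: int
    latest_start: int
    earliest_end: int
    latest_end: int

    def __post_init__(self):
        # Assumed ordering: the start range lies wholly before the end range.
        if not (self.earliest_start <= self.latest_start
                <= self.earliest_end <= self.latest_end):
            raise ValueError("bounding dates are not in chronological order")

    def ongoing_throughout(self):
        """Minimum extent (crm:P81F): years the period certainly covered."""
        return (self.latest_start, self.earliest_end)

    def at_some_time_within(self):
        """Maximum extent (crm:P82F): years the period cannot fall outside."""
        return (self.earliest_start, self.latest_end)
```

For a hypothetical span created as `ApproximateTimeSpan(260, 270, 284, 290)`, `ongoing_throughout()` gives (270, 284) and `at_some_time_within()` gives (260, 290).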