Archaeology and Terminology

Ceri Binding
Hypermedia Research Unit, University of Glamorgan, Wales, UK
http://hypermedia.research.glam.ac.uk/

Any process needs (human) validation...
[Pictures from BBC news website]
• Translation: “Pedestrians look left”
• Translation: “I am not in the office at the moment. Send any work to be translated”

STAR project - overview
• AHRC funded project in collaboration with English Heritage Centre for Archaeology, Portsmouth
• Aim: to investigate the potential of semantic technologies for widening access to digital archaeology resources, including disparate datasets and associated grey literature.

STAR - general architecture
• Applications – Server Side, Rich Client, Browser
• Data access layer – Web Services, SQL, SPARQL
• RDF Based Common Ontology Data Layer (CRM / CRMEH / SKOS)
• Indexing – Grey Literature reports
• Conversion (SKOS) – EH thesauri, glossaries
• Data Mapping / Normalisation – Archaeological Datasets: STAN, RRAD, IADB, LEAP, RPRE
The Archaeological Archipelagos [Keith May, English Heritage]

English Heritage controlled vocabularies
• 27 glossaries – from English Heritage recording manuals (2006)
• 6 main thesauri used:
  – Monument Types thesaurus
  – Archaeological Sciences thesaurus
  – Evidence thesaurus
  – Main Building Materials thesaurus
  – MDA Object Types thesaurus
  – Timelines thesaurus
• Converted to SKOS format for use within STAR

Expressive vs. controlled vocabulary

“…how many of those writing [grey literature] reports would think to describe what they are recording/writing about using the same thesauri? […] it would have been a lot quicker and easier if standardised terminology had been used in the report text when describing types of monument, event and artefact, as well as dates/periods etc.” [G. Falkingham]

“Grey Literature is very often the only place where field workers have any opportunity to engage in creating their own narrative of the site, both of the archaeological event and of the archaeological story of the site itself.
I think it would be throwing the baby out with the bath water to concentrate solely on the data without continuing to offer highly skilled and experienced fieldworkers the opportunity to actually tell us what they think the data means...” [S. Jeffrey]

Descriptive, semi-controlled vocabulary…

Column headings: Deposit Colour, Deposit Texture, Deposit Compaction

Deposit Colour values (as recorded):
(Reddy) Brown; 9Reddy) brown; Brown; Brown red; Brown/reddy; Dark brown; Dark brown/orange; Dark grey brown; Dark orange brown; Dark orange brown with darker patches; Dark orange loam; Dark orange/brown; Dark red brown; Grey brown; Grey/brown; Light brown; Light yellow brown; Medium brown; Mid brown; Mid red brown; Orange brown; Orange/brown; Orangy brown; Orangy brown, very light brown on edges and sides of profile; Red /brown; Red brown; Red/brown; Reddish brown; Reddy brown; Varies; Very light brown; White; Yellow brown; Yellow/orange brown

Texture / compaction values (as recorded):
Firm; Friable; Friable to loose; Friable/loose; Friable-loose; Loose; Loose/friabe; Loose/friable; Plastic; Sticky; Sticky (wet); Sticky/firm; Varies

Worst of all worlds?

“…another of my examples has something about some flint that is ‘snuff coloured’ & I don’t know if I’ve ever seen snuff, let alone know what colour it is, or might have been over 150 years ago, and I would think it would make sense to take some kind of integrated approach from the outset, rather than the usual ‘bricolage’ of having one route for the archivists, another for those interested in searching spreadsheets, another for people interested in googling graphics, etc.” [G. Carver]

Terminology control for time periods
• Centuries
• BC / AD years
• 3 age system
• Monarchs / Roman emperors
• Cultural styles
• Geological periods
• Prefixes: pre, post, mid etc.
• Any combinations of these

Examples of periods encountered in data
• MLC2-C3
• AD 341-6
• Iron Age
• First half 1st century?
• Antonine
• LC2/EC3
• Early C3
• MLA

Time period alignment – data cleansing / semantic enrichment

Object No  Period                MIN YEAR  MAX YEAR
1519       AD 354-64             354       364
1520       1st century AD        1         100
1538       2nd century           101       200
1548       1st century           1         100
1562       AD 367-75             367       375
1563       AD 348-50             348       350
1567       Mid 1st century AD    33        66
1571       First half 1st centu  1         50
1572       Mid first century AD  33        66
1580       c. AD 270             270       270
1583       First half first cen  1         50
1591       AD 341-6              341       346
1593       AD 287-93             287       293
1594       AD 43-44              43        44
1595       Medieval              1066      1540
1627       2nd century AD        101       200
1631       ?1st century          1         100
1635       AD 354-64             354       364
1664       AD 330-5              330       335
1681       Medieval              1066      1540
1701       Romano-British        43        410
1704       Modern?               1901      2100
98157      post-mediaeval        1540      1901

Time period relationships
[Diagram: periods drawn against a reference Period P1 on a time axis. * = transitive]
• occurs before P1 *
• occurs after P1 *
• meets P1
• met by P1
• overlaps P1
• overlapped by P1
• starts P1 *
• started by P1 *
• finishes P1 *
• finished by P1 *
• includes P1 *
• occurs during P1 *
• equal to P1 *

Time Period Comparison – Closeness Calculation
[Diagram: Period P1 compared with Period P2 and Period P3 on a time axis, with spans labelled IU, MP, NMP and D]

Time Match(P1, P2) = W1 (MP / IU) + W2 (IU / (NMP + IU)) + W3 (IU / (D + IU))

SKOS Concepts + CRM Entities
Time period concepts also have implicit spatio-temporal context
[Diagram elements:]
• Classes: skos:Concept, crm:E2.TemporalEntity, crm:E4.Period, crm:E52.Time-Span, crm:E53.Place
• Typing and properties: rdf:type, rdfs:subClassOf, crm:P4F.has_time-span, crm:P7F.took_place_at
• Period concepts: <#stuart>, <#jacobean>, <#caroline>, <#restoration>, <#williamandmary>, <#queenanne>
• Relationships between period concepts: skos:broader, crm:P115F.finishes, crm:P116F.starts, crm:P118F.overlaps, crm:P119F.meets

Time period alignment – data processing
• Align data relative to closest period concepts from English Heritage ‘Timelines’ thesaurus

Time period alignment - results
Data records relative to closest ‘known’ periods

Data record (dates deduced from labels): 1555-1623? (From 1555, To 1623)
Calculated closest matching known periods:

Relationship   Label                       From  To    Closeness
overlaps       JAMES I AND VI              1567  1625  0.895
occurs during  POST MEDIEVAL               1540  1901  0.838
includes       ELIZABETHAN                 1558  1603  0.814
overlapped by  2nd HALF 16th CENTURY AD    1551  1600  0.808

Data record (dates deduced from labels): AD270-284 (From 270, To 284)
Calculated closest matching known periods:

Relationship   Label                       From  To    Closeness
occurs during  LATE 3rd CENTURY            267   300   0.885
overlaps       4th QUARTER 3rd CENTURY AD  276   300   0.706
includes       PROBUS                      276   282   0.699
started by     AURELIAN                    270   275   0.665
overlapped by  3rd QUARTER 3rd CENTURY AD  251   275   0.610
met by         QUINTILLUS                  270   270   0.532

Data aligned to closest ‘known’ periods

Data record – dates deduced from labels   Closest controlled match based on dates
ID    Label          From  To             ID      Label                       From  To
1315  AD 228-31      228   231            136122  ALEXANDER SEVERUS           222   235
1316  AD 364-78      364   378            900014  3RD QUARTER 4TH CENTURY AD  351   375
1317  AD 69-79       69    79             136087  VESPASIAN                   69    79
1318  AD 270-4       270   274            136164  TETRICUS I                  270   274
1319  AD 275-402     275   402            134825  4TH CENTURY AD              300   399
1320  AD 341-6       341   346            900013  2ND QUARTER 4TH CENTURY AD  326   350
1321  AD 268-70      268   270            136154  CLAUDIUS II GOTHICUS        268   270
1322  AD 367-75      367   375            900014  3RD QUARTER 4TH CENTURY AD  351   375
1324  AD 270-84      270   284            135952  LATE 3RD CENTURY            266   299
1325  AD 270-84      270   284            135952  LATE 3RD CENTURY            266   299
1326  AD 367-75      367   375            900014  3RD QUARTER 4TH CENTURY AD  351   375
1327  AD 383-8       383   388            900015  4TH QUARTER 4TH CENTURY AD  376   399
1328  AD 330-40      330   340            900013  2ND QUARTER 4TH CENTURY AD  326   350
1337  Post-medieval  1540  1901           134746  POST MEDIEVAL               1540  1901
1370  Medieval       1066  1540           134745  MEDIEVAL                    1066  1540
1371  AD 1943        1943  1943           134848  SECOND WORLD WAR            1939  1945

Timeline service test client

Semantic enrichment
• Borderline between data cleansing and data creation…

“Possibly fragment of belt buckle or nail”
• BELT
• Belt Clasp -> use STRAP FITTING
• BUCKLE
• Buckle Plate -> use BUCKLE
• NAIL
• HOBNAIL
• SHOEING NAIL

“The single most useful thing you can do to ensure the long-term preservation of your data is to plan for it to be re-used”
[Archaeology Data Service]

Aligning controlled vocabularies
• Different scope notes, same concepts?
• Different thesauri, same concepts?

Archaeological Objects
• SARCOPHAGUS
• SUNDIAL
• WALL PAINTING
• WHIPPING POST

RCHME Monument Types
• SARCOPHAGUS
• SUNDIAL
• WALL PAINTING
• WHIPPING POST

RCHMS Monument Types
RCHMW Monument Types

STAR general architecture
• Windows applications
• Browser components
• Full text search
• Browse concept space
• Navigate via expansion
• Cross search archaeological datasets
[Diagram labels: STAR client applications; English Heritage thesauri (SKOS); Grey literature indexing; STAR web services; Archaeological Datasets (CRM); STAR datasets]

Windows Client Applications
• Browse available thesauri
• Search across multiple thesauri
• Navigate via semantic expansion

Interactive tools to aid data entry

Controlled types used in main search interface
• Interactive selection from glossary/thesaurus concepts
• Filtered to concepts actually used in indexing
• Group / context types – from (enhanced) cuts and deposits glossary
• Context find materials – from building materials thesaurus
• Context find types – from MDA Object types thesaurus
• Context sample types – from existing data values...

Summary
• Tension between expressive vs. controlled vocabulary, but general agreement on benefits of control
• Better coordination and alignment of controlled vocabularies would be beneficial
• Web services and interactive tools to aid data entry and search
• Issues encountered are not about particular technologies – more fundamental knowledge organisation (KO) issues

crm:E52.Time-Span – modelling uncertainty
Accommodating approximation
[Diagram: an approximate time period flanked by a span of uncertainty at each end]
• crm:P81F.ongoing_throughout
• crm:P82F.at_some_time_within
• Earliest start date
• Latest start date
• Earliest end date
• Latest end date
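
The four bounds named on the time-span uncertainty slide can be sketched in code. This is a minimal illustration only, not STAR code: the class name FuzzyTimeSpan, its field names and the example years are hypothetical. The mapping of the four bounds onto the two CRM properties follows my reading of the CIDOC CRM definitions (P82 as the outer bounds that certainly contain the period, P81 as the inner core during which it was certainly ongoing) and should be checked against the CRM specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FuzzyTimeSpan:
    """Hypothetical container for the four uncertain bounds of a time-span (years)."""
    earliest_start: int
    latest_start: int
    earliest_end: int
    latest_end: int

    def __post_init__(self) -> None:
        # Each "earliest" bound must not fall after its "latest" counterpart.
        if self.earliest_start > self.latest_start:
            raise ValueError("earliest_start is after latest_start")
        if self.earliest_end > self.latest_end:
            raise ValueError("earliest_end is after latest_end")

    def at_some_time_within(self) -> Tuple[int, int]:
        # Assumed reading of P82: the outer interval that contains the period.
        return (self.earliest_start, self.latest_end)

    def ongoing_throughout(self) -> Optional[Tuple[int, int]]:
        # Assumed reading of P81: the inner interval the period certainly covers.
        # If the start may fall after the earliest possible end, no such core exists.
        if self.latest_start > self.earliest_end:
            return None
        return (self.latest_start, self.earliest_end)


# Illustrative values only: a period starting between AD 40 and 50
# and ending between AD 400 and 420.
span = FuzzyTimeSpan(40, 50, 400, 420)
print(span.at_some_time_within(), span.ongoing_throughout())
```

Keeping all four bounds, rather than a single MIN YEAR / MAX YEAR pair, lets a search distinguish records that certainly fall in a period from records that only possibly do.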
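
The time period relationships listed earlier (occurs before, meets, overlaps, starts, and so on) can be computed from two pairs of years. The sketch below is an assumption-laden illustration, not the STAR implementation: the function name relationship is hypothetical, periods are treated as inclusive (from, to) year pairs, and "meets" is taken to mean that one period ends in the year the other starts (as with MEDIEVAL 1066-1540 and POST MEDIEVAL 1540-1901). The "meets" test is checked before "starts" and "finishes"; that precedence is my choice, made because it reproduces the relationships shown on the results slide.

```python
from typing import Tuple

Period = Tuple[int, int]  # (from_year, to_year), inclusive


def relationship(p1: Period, p2: Period) -> str:
    """Return how p1 relates to p2, using the relationship names from the slides."""
    s1, e1 = p1
    s2, e2 = p2
    if (s1, e1) == (s2, e2):
        return "equal to"
    # Assumption: periods "meet" when one ends in the year the other starts.
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met by"
    if e1 < s2:
        return "occurs before"
    if s1 > e2:
        return "occurs after"
    if s1 == s2:
        return "starts" if e1 < e2 else "started by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished by"
    if s1 > s2 and e1 < e2:
        return "occurs during"
    if s1 < s2 and e1 > e2:
        return "includes"
    # Remaining cases are partial overlaps.
    return "overlaps" if s1 < s2 else "overlapped by"


# Data record "1555-1623?" compared with known periods from the results slide.
record = (1555, 1623)
for label, period in [("JAMES I AND VI", (1567, 1625)),
                      ("POST MEDIEVAL", (1540, 1901)),
                      ("ELIZABETHAN", (1558, 1603))]:
    print(label, "->", relationship(record, period))
```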
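
The MIN YEAR / MAX YEAR columns on the data cleansing slide imply that abbreviated labels such as "AD 341-6" are expanded to full years (341 and 346). A minimal sketch of that one step follows. The function name expand_ad_range and the regular expression are hypothetical, and only the simple "AD n-m" pattern is handled; labels such as "Mid 1st century AD" or "Medieval" would need a lookup against the Timelines thesaurus instead.

```python
import re
from typing import Optional, Tuple

# Matches labels like "AD 354-64", "AD 341-6" or "AD270-284".
_AD_RANGE = re.compile(r"AD\s*(\d+)-(\d+)")


def expand_ad_range(label: str) -> Optional[Tuple[int, int]]:
    """Expand an 'AD n-m' label to (min_year, max_year); return None if it does not match."""
    match = _AD_RANGE.fullmatch(label.strip())
    if match is None:
        return None
    start, end = match.group(1), match.group(2)
    # An abbreviated end year borrows its leading digits from the start year,
    # e.g. "341-6" -> 346 and "354-64" -> 364.
    if len(end) < len(start):
        end = start[:len(start) - len(end)] + end
    return int(start), int(end)


for label in ["AD 354-64", "AD 341-6", "AD 43-44", "Medieval"]:
    print(label, "->", expand_ad_range(label))
```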