Empirical Knowledge Representation
Enrico Motta
Knowledge Media Institute, The Open University, UK

Introduction
• KR/Ont. Eng. research traditionally follows a top-down approach
  – formalisms are designed on the basis of modelling needs and computational considerations; tools and applications based on these formalisms are then realized
  – little history of paying attention to users
• Contrast, e.g., with software engineering, where empirical studies have a long and fine tradition
• However, the emergence of large-scale web semantics means that engaging with the formal representation of knowledge is no longer the preserve of a few samurai
  – a far larger group of users now publishes, consumes and in general tries to make sense of formal semantic structures

A research programme
• Need for a research programme centred on empirical studies of KR and OO, which should
  – foster the construction of an empirical body of evidence about the usability of alternative modelling solutions
  – more ambitiously, help the discipline to start moving away from a purely top-down paradigm and to include a user-centric research element, consistent with most other engineering disciplines
• This goes hand in hand with an emerging recognition in the Ont. Eng. community that, from an epistemological point of view, there isn’t necessarily a unique way of doing things
  – For instance, alternative design patterns exist for representing time, alternative upper-level ontologies exist, etc.
• Our work is also motivated by the fact that it is not uncommon for authors to make statements about the intuitiveness of different solutions
  – Hence, there is awareness that user acceptance is indeed valuable
  – However, such statements tend to reflect an author’s epistemological standpoint, rather than any concrete user experience

An empirical study on time representation
• 13 subjects took part in an empirical study comprising nine different elements, including a modelling task:
  (SE1) William Shakespeare was born and baptised in Stratford-upon-Avon in 1564, which at the time had a population of 5000 people.
  (SE2) In 1582, William Shakespeare, aged 18, married Anne Hathaway, aged 26.
  (SE3) William Shakespeare bought the Lord Chamberlain’s Men playing company in London in 1585 paying in cash.
  (SE4) Richard Burbage was an actor.
  (SE5) In 1592, William Shakespeare was a family man in Stratford-upon-Avon and an actor in London.
  (SE6) In 1595, he wrote the plays Richard II, Romeo and Juliet, and A Midsummer Night’s Dream.
  (SE7) In 1597, William Shakespeare decided that he will retire at New Place in 1613.
  And, by the way:
  (SE8) In 2007, Michael Phelps surpassed Ian Thorpe’s world record from 2001.
• To solve the task, subjects were given a KR formalism with meta-level features but restricted to binary relations
• Four patterns for time representation

3D and 3D+1
3D Pattern (time stamping statements)
    holdsAt (rel (arg1, arg2), t)
3D Example
    holdsAt (population (France, 55M), 1985)
3D+1 Pattern (slicing relations)
    rel@t (arg1, arg2)
    temporalSubpartOf (rel@t, rel)
    atTime (rel@t, t)
3D+1 Example
    population@1985 (France, 55M)
    temporalSubpartOf (population@1985, population)
    atTime (population@1985, 1985)

4D – slicing individuals
Pattern
    rel (arg1@t1, arg2@t2)
    temporalSubpartOf (arg1@t1, arg1)
    temporalSubpartOf (arg2@t2, arg2)
    atTime (arg1@t1, t1)
    atTime (arg2@t2, t2)
Example: “Henry, in 2009, was taller than Boris in 2010”
    taller (henry@2009, boris@2010)
    temporalSubpartOf (henry@2009, henry)
    temporalSubpartOf (boris@2010, boris)
    atTime (henry@2009, 2009)
    atTime (boris@2010, 2010)

N-ary pattern
Generic solution to represent n-ary relations in formalisms that only support binary relations.
The transformation consists of decomposing the n-ary statement into n+1 binary statements, by creating an instance of the relation and then linking this instance to the n arguments of the original statement.
Example: “Michael Phelps was born in Towson in 1985”
    typeOf (birthEvent1, BirthEvent)
    location (birthEvent1, Towson)
    atTime (birthEvent1, 1985)
    subject (birthEvent1, MichaelPhelps)

Results
• The 3D pattern was considered the most user-friendly, in particular by the least experienced users
  – Hence, it seems unfortunate that, in contrast with other standards such as KIF and Common Logic, Web-KR languages for the Semantic Web only provide rather cumbersome mechanisms for representing statements about statements
• 3D+1 is more readily used by subjects who have lower levels of skill in KR
  – Subjects who use the 3D+1 pattern tend to perform less well and take longer to complete the modelling tasks
• The 3D+1 and n-ary patterns appear to be alternative dominant solutions for representing temporal information
  – experts opted for the n-ary pattern, possibly on the basis of its superior representational power and degree of flexibility
  – novices opted for 3D+1, possibly on the basis of surface syntactic features (problem solving by imitation)

Results (continued)
• 4D Pattern
  – The 4D pattern is rated as the least intuitive. This confirms the observations about this pattern found in the literature.
    • However, users with a higher level of expertise in KR were in favor of using it for the modelling task, probably as a result of its representational power and flexibility.
• N-ary Pattern
  – Most widely used pattern throughout the modelling task and, in particular, the choice of users with a higher level of expertise.
    • As in the case of 4D, this is likely because of its generality and flexibility.
    • Positive correlation between using this pattern and getting a high score in the modelling task.
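
As a rough illustration of the four time-representation patterns presented earlier (3D, 3D+1, 4D and n-ary), the sketch below generates the binary statements each pattern would produce. It is only a sketch under stated assumptions: statements are modelled as plain Python tuples, the function names (pattern_3d, pattern_3d_plus_1, pattern_4d, pattern_nary) are hypothetical, and none of this reflects the actual formalism or tooling given to the study subjects.

```python
from typing import Any, Dict, List, Tuple

# Assumption: a binary statement is modelled as (predicate, arg1, arg2).
Statement = Tuple[str, Any, Any]


def pattern_3d(rel: str, arg1: Any, arg2: Any, t: Any) -> List[Statement]:
    """3D: time-stamp the statement itself (a statement about a statement)."""
    return [("holdsAt", (rel, arg1, arg2), t)]


def pattern_3d_plus_1(rel: str, arg1: Any, arg2: Any, t: Any) -> List[Statement]:
    """3D+1: slice the relation at time t."""
    rel_at_t = f"{rel}@{t}"
    return [
        (rel_at_t, arg1, arg2),
        ("temporalSubpartOf", rel_at_t, rel),
        ("atTime", rel_at_t, t),
    ]


def pattern_4d(rel: str, arg1: str, t1: Any, arg2: str, t2: Any) -> List[Statement]:
    """4D: slice the individuals, each at its own time."""
    a1, a2 = f"{arg1}@{t1}", f"{arg2}@{t2}"
    return [
        (rel, a1, a2),
        ("temporalSubpartOf", a1, arg1),
        ("temporalSubpartOf", a2, arg2),
        ("atTime", a1, t1),
        ("atTime", a2, t2),
    ]


def pattern_nary(instance: str, relation_class: str, roles: Dict[str, Any]) -> List[Statement]:
    """n-ary: create an instance of the relation, then link it to each of the n arguments."""
    statements: List[Statement] = [("typeOf", instance, relation_class)]
    statements += [(role, instance, value) for role, value in roles.items()]
    return statements


if __name__ == "__main__":
    # The examples from the slides, one per pattern.
    examples = [
        pattern_3d("population", "France", "55M", 1985),
        pattern_3d_plus_1("population", "France", "55M", 1985),
        pattern_4d("taller", "henry", 2009, "boris", 2010),
        pattern_nary(
            "birthEvent1",
            "BirthEvent",
            {"location": "Towson", "atTime": 1985, "subject": "MichaelPhelps"},
        ),
    ]
    for statements in examples:
        for statement in statements:
            print(statement)
        print()
```

In this encoding the n-ary example with three arguments yields four binary statements, matching the n+1 decomposition described on the N-ary pattern slide, while the 3D pattern is the only one that nests a statement inside another and therefore relies on meta-level features.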