Spinning out of IPAS
Sam Chapman
Ravish Bhagdev, Vitaveska Lanfranchi, Fabio Ciravegna
University of Sheffield. Director, Knowledge Now Ltd.
s.chapman@dcs.shef.ac.uk
s.chapman@k-now.co.uk
http://www.dcs.shef.ac.uk/~sam/
http://www.k-now.co.uk

IPAS
Technology
  ◦ Background
  ◦ K-Search
  ◦ K-Forms
Application of Technology
K-Now
Outline of talk

Integrated Products And Services (IPAS) is essential to the long-term success of businesses in the interlinked emerging global environment.
  ◦ Aim: knowledge transfer between “three worlds”: new service design, new product design, the operation of existing products and services in the field.
IPAS integrates and applies a number of disparate research fields spanning
  ◦ computer science,
  ◦ engineering design,
  ◦ knowledge management,
  ◦ manufacturing,
  ◦ work psychology.
IPAS: what is IPAS?

The objective is to develop and exploit technologies such as
  ◦ meta-data,
  ◦ semantics,
  ◦ ontologies,
  ◦ text mining,
  ◦ search,
  ◦ social interactions,
  ◦ knowledge representation,
  ◦ semantic web services,
to empower the right person with the right information at the right time.
IPAS: objectives

Funds: about £1.5M
  ◦ 39% Rolls-Royce plc
  ◦ 42% Department of Trade and Industry (now TSB)
  ◦ 18% other industrial partners
IPAS: participants

Sheffield’s role
  ◦ Use of text mining, knowledge capture, semantics, ontologies, text and image annotation, knowledge storage and search
  ◦ To manage knowledge sharing and reuse between the three worlds
IPAS: Sheffield's Role

To provide the capability to query legacy document content as if it were structured information
  ◦ Via acquisition of content metadata
      Automatically from legacy data
      Manually and semi-automatically for new documents (including images)
Empowering:
  ◦ Content based retrieval (NOT documents)
  ◦ Automatic quantitative analysis
  ◦ Automatic correlation of facts across documents and archives
To improve the way documents are currently created, providing tools that make it easier to semantically “annotate” the content while writing
IPAS: Sheffield's Role

Existing knowledge management solutions rely upon combinations of basic technologies:
  ◦ Simple document indexing,
  ◦ Database-backed centralised views of how knowledge should be organised, with regimented interfaces to them.
  ◦ But such approaches have a myriad of issues; Sheffield’s role in IPAS tries to resolve these.
Technology: background

Keyword approaches are simple and understood by users but…
  ◦ Uncertain returns / unquantifiable
      10,145 hits, but how many are relevant to the query?
  ◦ Synonyms
      “George W Bush” vs “43rd president of the USA”
  ◦ Homonyms
      “Bank” (river vs financial)
  ◦ Meronyms
      “USA” doesn’t include “Massachusetts”
  ◦ Documents not “knowledge”
      Repeated information – pointer to context, not information directly
Although techniques to mitigate these “defects” can be added to search systems, they are:
  ◦ Partial solutions (addressing some cases only)
  ◦ Unsuited to most knowledge discovery needs
Technology: background-Keyword

Most business knowledge needs are not being met by keyword approaches alone:
  ◦ Business Intelligence
  ◦ Quantitative Analysis
  ◦ Trend Spotting
Uncertain returns are not suitable for many users
Technology: background-Keyword

Other issues in keyword approaches:
  ◦ Repetitive textual information
      Repeated documents/formats confuse returns
  ◦ Constrained language
      “Blade” in the aerospace domain is in >70% of available documents
Context of information is paramount to its meaning, e.g.
  ◦ “Damaged Blade XYZ”
  ◦ “Damaged Blade YXZ housing”
Technology: background-Keyword

Deployment
  ◦ Access to a corpus of documents (crawled or otherwise)
  ◦ Creation of an inverted index
  ◦ Additional features [optional]
      Synonym list(s)
      Query suggestion
      Custom stemming
      OneBox™ style additions
      Federated search combinations
      etc.
Technology: background-Keyword

Storing graph structured typed knowledge rather than index based stores.
  ◦ Extensible and combinable schemas
  ◦ Hierarchically typed information
      Name: <String> Sam Chapman
      Person: <Person> Person3456
      Residence: <Country> UK
      Age: <Integer> 35
  ◦ Precise query support
      Quantifiable
Technology: background-Semantics

Semantic approaches offer great advantages over keyword approaches but…
Have shortcomings:
  ◦ Difficulties in obtaining structured data (“semantic capture”)
  ◦ Queries, although powerful, can be complex from a user perspective
  ◦ Rigid structured organisation constrains user interaction
Not all possible "knowledge" is encoded into a re-usable structured form.
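As a rough illustration of the typed, graph-structured storage and precise query support described in this part of the talk, the sketch below stores facts as subject/predicate/value triples and answers a query with an exact, countable result. It is only a sketch under stated assumptions: the `Triple` type, the `query` helper and the second person record (`Person7890`) are hypothetical and are not taken from IPAS or K-Store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    value: object  # typed value: str, int, ...


# Facts in the style of the slide; Person7890 is an invented second record.
TRIPLES = [
    Triple("Person3456", "type", "Person"),
    Triple("Person3456", "name", "Sam Chapman"),
    Triple("Person3456", "residence", "UK"),
    Triple("Person3456", "age", 35),
    Triple("Person7890", "type", "Person"),
    Triple("Person7890", "name", "Example Person"),
    Triple("Person7890", "residence", "UK"),
    Triple("Person7890", "age", 52),
]


def query(triples, **conditions):
    """Return the subjects that satisfy every (predicate -> test) condition."""
    result = None
    for predicate, test in conditions.items():
        matches = {
            t.subject for t in triples
            if t.predicate == predicate and test(t.value)
        }
        result = matches if result is None else result & matches
    return result if result is not None else set()


# "People resident in the UK and younger than 40": an exact set, not a ranked list.
uk_under_40 = query(
    TRIPLES,
    residence=lambda v: v == "UK",
    age=lambda v: isinstance(v, int) and v < 40,
)
print(sorted(uk_under_40), len(uk_under_40))
```

Because every value is typed and attached to an identified subject, the answer can be counted directly, which is what makes this style of query quantifiable in contrast to a list of keyword hits.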
Technology: background-Semantics

Deployment
  ◦ Requires the acquisition of structured information (semantic capture)
Two methods:
  ◦ Legacy and external document support
  ◦ Capture at point of knowledge generation
The methodology for semantic capture will be detailed
Technology: background-Semantics

What is needed is a way to combine keyword and semantic style interactions into a single hybrid approach to empower flexible query.
  ◦ Users can easily switch between, or combine:
      Keyword approaches
        ◦ Simple
        ◦ Fast
        ◦ Not constrained by structured representations
      Semantic approaches
        ◦ Quantitative
        ◦ Accurate
        ◦ According to structured representations
Technology: K-Search - aim

K-Search:
  ◦ Server-based tool for hybrid searching and sharing of knowledge stored in a hybrid repository.
Advanced query capabilities.
  ◦ Mixed modality and flexibility
Enabling knowledge reuse.
  ◦ Quantitative analysis / Business Intelligence / Trend Analysis
Simple user interface to create queries and visualise results.
  ◦ Hides the complexity of the search mechanism and underlying storage.
Technology: K-Search

Semantic Capture of Legacy documents
  ◦ Uses:
      Document conversion (MS Word and PDF)
      Machine learning (K-ML, based upon T-Rex)
      Rule based extraction (K-Rules, based upon Saxon)
  ◦ To extract structured knowledge using:
      Layout
      Typography
      Linguistic context
      etc.
  ◦ Achieves:
      Precision = 98%
      Recall = 99%
      F-Measure (harmonic) = 98%
Technology: K-Search

K-Search allows the user to perform complex queries of three types:
  ◦ Keyword Search: the user simply enters keywords, and the results are retrieved and displayed.
  ◦ Semantic Search: using ontology concepts to focus through the available knowledge.
      For example: identify geographic regions where staffXYZ filed report<typed> concerning componentXYZ detailing damageMechanismXYZ in the last 3 months.
  ◦ Hybrid Search: perform queries mixing semantic and keyword approaches.
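The three query types listed above can be sketched in a few lines. This is a minimal illustration and not K-Search's actual implementation: the `Report` class, the sample reports, the field names (`component`, `region`) and all function names are assumptions made up for the example. Keyword search uses a small inverted index, semantic search filters on structured facts, and hybrid search intersects the two.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Report:
    doc_id: str
    text: str                                   # free text, searched by keyword
    facts: dict = field(default_factory=dict)   # captured structured knowledge


# Invented sample data in the spirit of the "Damaged Blade" example.
REPORTS = [
    Report("r1", "Damaged blade found during inspection",
           {"component": "blade", "region": "Europe"}),
    Report("r2", "Blade housing damaged by corrosion",
           {"component": "housing", "region": "Asia"}),
    Report("r3", "Routine inspection, blade within limits",
           {"component": "blade", "region": "Asia"}),
]


def build_inverted_index(reports):
    """Map each lower-cased token to the ids of the reports containing it."""
    index = defaultdict(set)
    for report in reports:
        for token in report.text.lower().replace(",", " ").split():
            index[token].add(report.doc_id)
    return index


def keyword_search(index, keywords):
    """Ids of reports containing every keyword (AND semantics)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


def semantic_search(reports, conditions):
    """Ids of reports whose structured facts match every condition."""
    return {
        r.doc_id for r in reports
        if all(r.facts.get(k) == v for k, v in conditions.items())
    }


def hybrid_search(reports, index, keywords, conditions):
    """Combine both: structured conditions narrowed by optional keywords."""
    hits = semantic_search(reports, conditions)
    if keywords:
        hits &= keyword_search(index, keywords)
    return hits


INDEX = build_inverted_index(REPORTS)
print(sorted(keyword_search(INDEX, ["blade"])))
print(sorted(semantic_search(REPORTS, {"component": "blade"})))
print(sorted(hybrid_search(REPORTS, INDEX, ["damaged"], {"component": "blade"})))
```

In this toy corpus the keyword "blade" matches every report, mirroring the constrained-language problem noted earlier, while the hybrid query keeps only the report whose captured component is the blade and whose text mentions damage.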
Technology: K-Search

A query is created via a web form interface enabling easy graphical composition of semantic concepts and keyword-based conditions.
  ◦ Keywords can be inserted into a default form field in a way similar to most search engines;
  ◦ Boolean operators AND and OR can be used to combine them.
  ◦ Additional conditions on conceptual knowledge can be easily added to the query by clicking on an ontology concept.
Technology: K-Search

Quantitative analysis of query results enables
  ◦ Problem identification
  ◦ Trends to be plotted using conceptual information
K-Search supports:
  ◦ Visualisation of quantitative information.
  ◦ Automatically generating graphs and charts.
  ◦ External support for analysis.
      Graphs and documents can be shared.
      Connection to external applications (MS Office, DB and statistical analysis packages).
Technology: K-Search

[Chart: starting point of search (component, problem, engine, event, concept) by user group (service engineers, designers, others); vertical axis 0 to 50]
The starting point of search varies with user experience and focus
Technology: K-Search - eval

[Figure panels: Engineers, Designers, Other]
Different users and groups of users use search differently.
K-Search empowers such flexibility
Technology: K-Search - eval

Usability evaluations using ISO DIS 9241-11
Technology: K-Search - eval

K-Search has been awarded runner-up status in the Rolls-Royce Directors Creativity Award 2007
It is deployed in
  ◦ RR – Event Reports
  ◦ RR – Technical Variance Documents
  ◦ X-Media Box and Kernal
  ◦ Coming soon:
      RR – ERMS
      RR – Module Condition Reports
      RR – C-Sheet
      RR – Event Summary Reports
      RR – Modification bulletins
      RR – Service bulletins
      RR – SDM/Maximo
      Talkback (feedback management for entertainment industry)
Patented technology for hybrid search
Technology: K-Search - Validation

K-Search is designed to work on top of K-Store
  ◦ A hybrid keyword and semantic store
K-Search can query a variety of distributed knowledge
[Diagram: K-Search querying IE capture of legacy documents, database plugins, and K-Forms captured knowledge]
Technology: K-Search

Capturing knowledge from legacy documents requires:
  ◦ Custom extraction per document type.
  ◦ Skilled programmer involvement.
  ◦ A small degree of imprecision.
K-Forms intends to provide an alternative means of knowledge acquisition
  ◦ Capture at point of knowledge generation
Technology: K-Forms

Semantic Capture of New Knowledge
  ◦ K-Forms
      A flexible form design/entry based application
  ◦ Allows dynamic form generation to capture new knowledge at the time needed.
      Doesn’t require specialised knowledge or technical skill.
      Users can design custom forms at any moment.
  ◦ Knowledge is captured according to interlinked knowledge structures (ontologies).
Technology: K-Forms

When a new form is designed
  ◦ Behind the scenes an ontology is generated specifying the form concepts and the relations between them.
  ◦ Linkages are also made to existing knowledge structures, to link knowledge regardless of capture context.
[Diagram: FormA with concepts Person, Name, Feedback, ...]
Technology: K-Forms

[Diagram: FormA and FormB with concepts Person, Name, Feedback, Project, Organisation, ...]
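As a rough sketch of how designing a form could generate ontology statements and link forms that share concepts, consider the following. It is an illustration only: the triple vocabulary (`type`, `hasField`), the function names and the exact field lists for FormA and FormB are assumptions loosely based on the diagram, not the K-Forms implementation.

```python
from collections import defaultdict


def form_to_ontology(form_name, concepts):
    """Generate simple ontology triples for a newly designed form."""
    triples = {(form_name, "type", "Form")}
    for concept in concepts:
        triples.add((concept, "type", "Concept"))
        triples.add((form_name, "hasField", concept))
    return triples


def shared_concepts(ontology):
    """Concepts used by more than one form, i.e. the links between forms."""
    used_by = defaultdict(set)
    for subject, predicate, obj in ontology:
        if predicate == "hasField":
            used_by[obj].add(subject)
    return {c: sorted(forms) for c, forms in used_by.items() if len(forms) > 1}


# Two hypothetical forms; both capture a Person.
ontology = (
    form_to_ontology("FormA", ["Person", "Feedback"])
    | form_to_ontology("FormB", ["Person", "Project", "Organisation"])
)
print(shared_concepts(ontology))
```

Reusing the existing `Person` concept when FormB is designed is what connects knowledge captured through either form, without the form designer having to declare the link explicitly.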
Linkages develop between forms automatically, creating semantically interlinked knowledge at the point of its capture without additional effort from users.
In a corporate setting this will build a complete, interconnected knowledge base automatically through usage.
Technology: K-Forms

A spin-out company from Europe's largest NLP research group, at the University of Sheffield (UK)
Commercialising Business Intelligence and Knowledge Management software.
  ◦ Semantic knowledge capture technologies
  ◦ Hybrid storage technologies
  ◦ Hybrid search technologies
K-Now

The technology commercialised was developed primarily within IPAS.
K-Now is primarily based on the combination of two main ontology based semantic tools:
  ◦ K-Forms – a flexible business tool for semantic capture of conceptual knowledge at the point of generation.
  ◦ K-Search – a server-based business tool for semantic, keyword and hybrid knowledge search.
  ◦ Other technologies: K-Store, K-ML, K-Rules.
Aim
  ◦ Deriving business advantage for dynamic knowledge needs across (and beyond) an organisation.
K-Now

A.-S. Dadzie, R. Bhagdev, A. Chakravarthy, S. Chapman, J. Iria, V. Lanfranchi, J. Magalhães, D. Petrelli and F. Ciravegna: “Applying Semantic Web Technologies to Knowledge Sharing in Aerospace Engineering”, in Journal of Intelligent Manufacturing, to appear in 2008.
R. Bhagdev, S. Chapman, F. Ciravegna, V. Lanfranchi and D. Petrelli: “Hybrid Search: Effectively Combining Keywords and Semantic Searches”, in Proceedings of the 5th European Semantic Web Conference (ESWC 08), Tenerife, June 2008.
V. Lanfranchi, R. Bhagdev, S. Chapman, F. Ciravegna and D. Petrelli: “Extracting and Searching Knowledge for the Aerospace Industry”, in Proceedings of the 1st European Semantic Technology Conference, Vienna, 31 May to 1 June 2007.
s.chapman@k-now.co.uk
http://www.k-now.co.uk
Highlight Publications

Questions