Spinning out of IPAS
Sam Chapman
Ravish Bhagdev, Vitaveska Lanfranchi, Fabio Ciravegna
University of Sheffield.
Director, Knowledge Now Ltd.
s.chapman@dcs.shef.ac.uk
s.chapman@k-now.co.uk
http://www.dcs.shef.ac.uk/~sam/
http://www.k-now.co.uk
IPAS
Technology
◦ Background
◦ K-Search
◦ K-Forms
Application of Technology
K-Now
Outline of talk
2
• Integrated Products And Services (IPAS) is essential to the long-term success of businesses in the interlinked emerging global environment.
◦ Aim: knowledge transfer between “three worlds”:
▪ new service design,
▪ new product design,
▪ the operation of existing products and services in the field.
• IPAS integrates and applies a number of disparate research fields spanning:
◦ computer science,
◦ engineering design,
◦ knowledge management,
◦ manufacturing,
◦ work psychology.
IPAS: what is IPAS?
• The objective is to develop and exploit technologies such as:
◦ meta-data,
◦ semantics,
◦ ontologies,
◦ text mining,
◦ search,
◦ social interactions,
◦ knowledge representation,
◦ semantic web services,
to empower the right person with the right information at the right time.
IPAS: objectives
• Funds: about £1.5M
◦ 39% Rolls-Royce plc
◦ 42% Department of Trade and Industry (now TSB)
◦ 18% other industrial partners
IPAS: participants
• Sheffield’s role
◦ Use of:
▪ Text mining,
▪ Knowledge capture,
▪ Semantics,
▪ Ontologies,
▪ Text and image annotation,
▪ Knowledge storage,
▪ Search,
◦ to manage knowledge sharing and reuse between the three worlds.
IPAS: Sheffield's Role
• To provide the capability to query legacy document content as if it were structured information
◦ via acquisition of content metadata:
▪ automatically from legacy data,
▪ manually and semi-automatically for new documents (including images).
• Empowering:
◦ content-based retrieval (retrieving content, NOT whole documents),
◦ automatic quantitative analysis,
◦ automatic correlation of facts across documents and archives.
• To improve the way documents are currently created, providing tools that make it easier to semantically “annotate” the content while writing.
IPAS: Sheffield's Role
• Existing knowledge management solutions rely upon combinations of basic technologies:
◦ simple document indexing,
◦ database-backed, centralised views of how knowledge should be organised, with regimented interfaces to them.
◦ But such approaches have many issues.
• Sheffield’s role in IPAS tries to resolve these.
Technology: background
8
• Keyword approaches are simple and understood by users, but…
◦ Uncertain/unquantifiable returns
▪ 10,145 hits, but how many are relevant to the query?
◦ Synonyms
▪ “George W Bush” vs “43rd president of the USA”
◦ Homonyms
▪ “Bank” (river vs financial)
◦ Meronyms
▪ A search for “USA” doesn’t match “Massachusetts”, although Massachusetts is part of the USA
◦ Documents, not “knowledge”
▪ Repeated information; results point to the context, not to the information directly
• Although techniques to mitigate these “defects” can be added to search systems, they are:
◦ partial solutions (addressing some cases only),
◦ unsuited to most knowledge discovery needs.
Technology: background-Keyword
9
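The synonym, homonym and meronym problems above, and why bolt-on fixes only go part of the way, can be illustrated with a toy Python sketch. The three documents and the `SYNONYMS` and `PART_OF` tables are hypothetical; query expansion is a generic technique, not a description of any particular search product.

```python
# Hypothetical mini-corpus and lookup tables, for illustration only.
DOCS = {
    "d1": "george w bush visited boston",
    "d2": "the 43rd president of the usa spoke today",
    "d3": "flooding near the river bank in massachusetts",
}
SYNONYMS = {"george w bush": ["43rd president of the usa"]}
PART_OF = {"massachusetts": "usa"}  # meronym table: part -> whole


def keyword_search(phrase: str) -> set[str]:
    """Baseline keyword approach: plain substring matching."""
    return {doc_id for doc_id, text in DOCS.items() if phrase in text}


def expanded_search(phrase: str) -> set[str]:
    """Add listed synonyms and known parts of the phrase before matching.
    This only helps for terms someone has already put in the tables."""
    terms = [phrase] + SYNONYMS.get(phrase, [])
    terms += [part for part, whole in PART_OF.items() if whole == phrase]
    hits: set[str] = set()
    for term in terms:
        hits |= keyword_search(term)
    return hits


print(sorted(keyword_search("george w bush")))   # synonym missed
print(sorted(expanded_search("george w bush")))  # synonym found via table
print(sorted(expanded_search("usa")))            # part (Massachusetts) found via table
print(sorted(keyword_search("bank")))            # homonym: word sense not resolved
```

Expansion recovers d2 and d3 only because those relationships were listed in advance, and the search for "bank" still cannot tell a river bank from a financial one: a partial solution, as the slide says.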
• Most business knowledge needs are not being met by keyword approaches alone:
◦ Business Intelligence
◦ Quantitative Analysis
◦ Trend Spotting
• Uncertain returns are not suitable for many users.
Technology: background-Keyword
10
• Other issues in keyword approaches:
◦ Repetitive textual information
▪ Repeated documents/formats clutter the returns
◦ Constrained language
▪ “Blade” appears in >70% of available documents in the aerospace domain
• The context of information is paramount to its meaning, e.g.
◦ “Damaged Blade XYZ”
◦ “Damaged Blade YXZ housing” (the damage is to the housing, yet both match “damaged” and “blade”)
Technology: background-Keyword
11
• Deployment
◦ Access to a corpus of documents (crawled or otherwise)
◦ Creation of an inverted index
◦ Additional features [optional]
▪ Synonym list(s)
▪ Query suggestion
▪ Custom stemming
▪ OneBox™-style additions
▪ Federated search combinations
▪ etc.
Technology: background-Keyword
12
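A minimal sketch of the deployment steps above (a corpus, an inverted index, a synonym list and custom stemming), assuming a toy plural-stripping stemmer, a hypothetical one-entry synonym list and AND semantics for multi-word queries; a production engine would use a proper tokenizer and stemmer.

```python
from collections import defaultdict

# Hypothetical synonym list: variant -> canonical term.
SYNONYMS = {"vane": "blade"}


def stem(token: str) -> str:
    """Toy stemmer standing in for custom stemming: strips a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalise(token: str) -> str:
    """Lower-case, trim punctuation, stem, then map synonyms to one term."""
    stemmed = stem(token.lower().strip(".,;:!?"))
    return SYNONYMS.get(stemmed, stemmed)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: normalised term -> ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalise(token)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """AND semantics: intersect the posting sets of all query terms."""
    postings = [index.get(normalise(t), set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()


docs = {
    "r1": "Damaged blades found in stage 1.",
    "r2": "Vane crack reported.",
    "r3": "Housing inspected, no damage.",
}
index = build_index(docs)
print(sorted(search(index, "vanes")))        # stemming + synonym list
print(sorted(search(index, "blade crack")))  # AND of two terms
print(sorted(search(index, "damage")))       # "Damaged" is not stemmed
```

A query for "damage" finds only r3, because the toy stemmer does not map "Damaged" to "damage": each add-on feature covers only the cases it was built for.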
• Storing graph-structured, typed knowledge rather than using index-based stores.
◦ Extensible and combinable schemas
◦ Hierarchically typed information
[Figure: typed graph for Person3456 <Person>: Name “Sam Chapman” <String>; Age 35 <Integer>; Residence UK <Country>]
◦ Precise query support
▪ Quantifiable
Technology: background-Semantics
13
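The typed-graph example on this slide can be sketched as a small set of subject/predicate/object triples plus a type table. This is a hypothetical in-memory representation with invented predicate names (`name`, `age`, `residence`), not how K-Store stores data; it shows why typed queries return precise, countable answers rather than a ranked list of documents.

```python
from typing import Callable, NamedTuple, Union

Value = Union[str, int]


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Value


# The slide's example as typed triples; predicate names are illustrative.
TYPES = {"Person3456": "Person", "UK": "Country"}
TRIPLES = [
    Triple("Person3456", "name", "Sam Chapman"),  # <String>
    Triple("Person3456", "age", 35),              # <Integer>
    Triple("Person3456", "residence", "UK"),      # <Country>
]


def query(type_name: str, predicate: str, test: Callable[[Value], bool]) -> list[str]:
    """Subjects of `type_name` whose `predicate` value satisfies `test`."""
    return [
        t.subject
        for t in TRIPLES
        if TYPES.get(t.subject) == type_name
        and t.predicate == predicate
        and test(t.obj)
    ]


# Precise, countable answers: each result is an entity, not a document hit.
over_30 = query("Person", "age", lambda v: isinstance(v, int) and v > 30)
in_a_country = query("Person", "residence", lambda v: TYPES.get(v) == "Country")
print(len(over_30), len(in_a_country))
```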
• Semantic approaches offer great advantages over keyword approaches, but…
• They have shortcomings:
◦ Difficulties in obtaining structured data
▪ “Semantic capture”
◦ Queries, although powerful, can be complex from a user perspective
◦ Rigid structured organisation constrains user interaction
• Not all possible “knowledge” is encoded into a re-usable structured form.
Technology: background-Semantics
14
• Deployment
◦ Requires the acquisition of structured information (semantic capture)
• Two methods:
◦ Legacy and external document support
◦ Capture at the point of knowledge generation
• The methodology for semantic capture will be detailed.
Technology: background-Semantics
15
• What is needed is a way to combine keyword and semantic style interactions into a single hybrid approach that supports flexible queries.
◦ Users can easily switch between, or combine:
▪ Keyword approaches: simple, fast, not constrained by structured representations
▪ Semantic approaches: quantitative, accurate, according to structured representations
Technology: K-Search - aim
16
• K-Search:
◦ Server-based tool for hybrid searching and sharing of knowledge stored in a hybrid repository.
• Advanced query capabilities
◦ Mixed modality and flexibility
• Enabling knowledge reuse
• Quantitative analysis / Business Intelligence / Trend Analysis
• Simple user interface to create queries and visualise results
◦ Hides the complexity of the search mechanism and underlying storage.
Technology: K-Search
17
• Semantic capture of legacy documents
◦ Uses:
▪ Document conversion (MS Word and PDF)
▪ Machine learning (K-ML, based upon T-Rex)
▪ Rule-based extraction (K-Rules, based upon Saxon)
◦ To extract structured knowledge using:
▪ Layout
▪ Typography
▪ Linguistic context
▪ etc.
◦ Achieves:
▪ Precision = 98%
▪ Recall = 99%
▪ F-measure (harmonic mean of precision and recall) = 98%
Technology: K-Search
18
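Assuming the balanced F-measure (the harmonic mean of precision and recall, β = 1), the reported figure can be checked from the other two: F = 2PR / (P + R) = 2 × 0.98 × 0.99 / 1.97 ≈ 0.985, which rounds to 98%.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (beta = 1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


score = f_measure(0.98, 0.99)  # precision and recall reported on the slide
print(f"F = {score:.4f} ({round(score * 100)}%)")
```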
• K-Search allows the user to perform complex queries of three types:
◦ Keyword search: the user simply enters keywords, and the results are retrieved and displayed.
◦ Semantic search: ontology concepts are used to narrow down the available knowledge.
▪ E.g. identify geographic regions where staffXYZ filed a report<typed> concerning componentXYZ detailing damageMechanismXYZ in the last 3 months.
◦ Hybrid search: queries mixing semantic and keyword approaches.
Technology: K-Search
19
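The example semantic query above can be read as a conjunction of typed conditions plus a time window. The sketch below assumes hypothetical `Report` annotations (report type, author, component, damage mechanism, region, date) and approximates "the last 3 months" as 90 days; it is an illustration, not K-Search's query engine.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Report:
    """Hypothetical annotations extracted from one report."""
    report_type: str
    filed_by: str
    component: str
    damage_mechanism: str
    region: str
    filed_on: date


def regions_for(
    reports: Iterable[Report],
    report_type: str,
    staff: str,
    component: str,
    mechanism: str,
    today: date,
    days: int = 90,  # assumption: "last 3 months" approximated as 90 days
) -> list[str]:
    """Distinct regions matching every semantic condition in the query."""
    since = today - timedelta(days=days)
    return sorted({
        r.region
        for r in reports
        if r.report_type == report_type
        and r.filed_by == staff
        and r.component == component
        and r.damage_mechanism == mechanism
        and since <= r.filed_on <= today
    })


reports = [
    Report("EventReport", "staffXYZ", "componentXYZ", "damageMechanismXYZ", "Europe", date(2008, 5, 1)),
    Report("EventReport", "staffXYZ", "componentXYZ", "damageMechanismXYZ", "Asia", date(2008, 1, 1)),
    Report("EventReport", "staffXYZ", "componentXYZ", "erosion", "Americas", date(2008, 5, 2)),
]
print(regions_for(reports, "EventReport", "staffXYZ", "componentXYZ",
                  "damageMechanismXYZ", today=date(2008, 6, 1)))
```

The Asia report falls outside the time window and the Americas report names a different damage mechanism, so only one region is returned.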
• A query is created via a web form interface that enables easy graphical composition of semantic concepts and keyword-based conditions.
◦ Keywords can be entered into a default form field, as in most search engines;
◦ the Boolean operators AND and OR can be used to combine them;
◦ additional conditions on conceptual knowledge can easily be added to the query by clicking on an ontology concept.
Technology: K-Search
20
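The form described above collects three things: keywords, the Boolean operator joining them, and concept conditions. A minimal sketch of evaluating such a hybrid query against annotated documents follows; the `Document` and `HybridQuery` classes and their field names are hypothetical, and exact word matching stands in for a real keyword index.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    annotations: dict[str, str]  # concept -> value, e.g. {"Component": "blade"}


@dataclass
class HybridQuery:
    """Hypothetical model of what the query form collects."""
    keywords: list[str] = field(default_factory=list)
    operator: str = "AND"  # how the keywords are combined: "AND" or "OR"
    concepts: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.operator not in ("AND", "OR"):
            raise ValueError("operator must be 'AND' or 'OR'")

    def matches(self, doc: Document) -> bool:
        words = set(doc.text.lower().split())
        hits = [kw.lower() in words for kw in self.keywords]
        combine = all if self.operator == "AND" else any
        keyword_ok = combine(hits) if hits else True  # no keywords: no filter
        concept_ok = all(doc.annotations.get(c) == v for c, v in self.concepts.items())
        return keyword_ok and concept_ok


docs = [
    Document("oil leak found near bearing", {"Component": "bearing"}),
    Document("crack found on blade root", {"Component": "blade"}),
    Document("routine inspection with no findings", {"Component": "blade"}),
]
either = HybridQuery(["crack", "leak"], "OR")
hybrid = HybridQuery(["found"], "AND", {"Component": "blade"})
print([i for i, d in enumerate(docs) if either.matches(d)])
print([i for i, d in enumerate(docs) if hybrid.matches(d)])
```

Keeping keyword and concept conditions in one query object is what lets a user switch between pure keyword, pure semantic and mixed queries without changing interface.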
• Quantitative analysis of query results enables:
◦ problem identification,
◦ trends to be plotted using conceptual information.
• K-Search supports:
◦ visualisation of quantitative information,
◦ automatic generation of graphs and charts,
◦ external support for analysis.
▪ Graphs and documents can be shared.
▪ Connection to external applications (MS Office, databases and statistical analysis packages).
Technology: K-Search
21
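Plotting trends from conceptual information amounts to grouping annotated results by a concept and counting them over time. This sketch assumes each result is a dict with a `date` and a concept field (here a hypothetical `component`); the output is one monthly series per concept value, which could then be passed to any charting tool.

```python
from collections import Counter, defaultdict
from datetime import date


def trend(results: list[dict], concept: str) -> dict[str, list[tuple[str, int]]]:
    """Count results per month for each value of `concept`, giving one
    time series per value that a charting tool could plot."""
    counts: Counter = Counter()
    for r in results:
        counts[(r[concept], r["date"].strftime("%Y-%m"))] += 1
    series: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for (value, month), n in sorted(counts.items()):
        series[value].append((month, n))
    return dict(series)


# Hypothetical query results, each annotated with a date and a component.
results = [
    {"date": date(2008, 1, 5), "component": "blade"},
    {"date": date(2008, 1, 20), "component": "blade"},
    {"date": date(2008, 2, 3), "component": "blade"},
    {"date": date(2008, 2, 9), "component": "bearing"},
]
print(trend(results, "component"))
```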
[Figure: chart of search starting points for service engineers, designers and others]
The starting point of a search varies with user experience and focus.
Technology: K-Search - eval
22
[Figure: search usage compared across Engineers, Designers and Other users]
Different users and groups of users use search differently. K-Search supports this flexibility.
Technology: K-Search - eval
23
Usability evaluations using ISO DIS 9241-11
Technology: K-Search - eval
24
• K-Search was runner-up in the Rolls-Royce Directors’ Creativity Award 2007.
• It is deployed in:
◦ RR – Event Reports
◦ RR – Technical Variance Documents
◦ X-Media Box and Kernel
◦ Coming soon:
▪ RR – ERMS
▪ RR – Module Condition Reports
▪ RR – C-Sheet
▪ RR – Event Summary Reports
▪ RR – Modification bulletins
▪ RR – Service bulletins
▪ RR – SDM/Maximo
▪ Talkback (feedback management for the entertainment industry)
• Patented technology for hybrid search
Technology: K-Search - Validation
25
• K-Search is designed to work on top of K-Store
◦ a hybrid keyword and semantic store.
• K-Search can query a variety of distributed knowledge sources.
[Figure: K-Search querying distributed sources: IE capture from legacy documents, database plugins, and knowledge captured through K-Forms]
Technology: K-Search
26
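One way to picture K-Search querying several distributed sources is a common plugin interface that each source implements. The `KnowledgeSource` protocol, the `InMemorySource` stand-in and `federated_search` below are hypothetical names for illustration, not K-Search's actual plugin API.

```python
from typing import Protocol


class KnowledgeSource(Protocol):
    """Hypothetical plugin interface (not the actual K-Search API)."""
    name: str

    def search(self, keyword: str) -> list[dict]: ...


class InMemorySource:
    """Stand-in for one plugin, e.g. a database or a K-Forms store."""

    def __init__(self, name: str, records: list[dict]) -> None:
        self.name = name
        self.records = records

    def search(self, keyword: str) -> list[dict]:
        kw = keyword.lower()
        return [r for r in self.records if kw in r["text"].lower()]


def federated_search(sources: list[KnowledgeSource], keyword: str) -> list[dict]:
    """Send the query to every source and tag each hit with its origin."""
    hits: list[dict] = []
    for source in sources:
        for record in source.search(keyword):
            hits.append({**record, "source": source.name})
    return hits


legacy = InMemorySource("IE capture", [{"text": "Blade crack in stage 1"}])
forms = InMemorySource("K-Forms", [{"text": "Feedback on blade coating"}, {"text": "Bearing wear"}])
for hit in federated_search([legacy, forms], "blade"):
    print(hit["source"], "|", hit["text"])
```

Tagging each hit with its source keeps provenance visible when results from legacy extraction and newly captured knowledge are shown together.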
• Capturing knowledge from legacy documents requires:
◦ custom extraction per document type,
◦ skilled programmer involvement,
◦ tolerance of a small degree of imprecision.
• K-Forms is intended to provide an alternative means of knowledge acquisition:
◦ capture at the point of knowledge generation.
Technology: K-Forms
27
• Semantic capture of new knowledge
◦ K-Forms
▪ A flexible application for form design and data entry
◦ Allows dynamic form generation to capture new knowledge at the time it is needed.
▪ Doesn’t require specialised knowledge or technical skill.
▪ Users can design custom forms at any moment.
◦ Captured knowledge is organised according to interlinked knowledge structures (ontologies).
Technology: K-Forms
28
• When a new form is designed:
◦ behind the scenes, an ontology is generated specifying the form’s concepts and the relations between them;
◦ links are also made to existing knowledge structures, so that knowledge is connected regardless of the context in which it was captured.
Technology: K-Forms
[Figure: FormA with its concepts Person, Name and Feedback]
29
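The ontology generation described above can be sketched as a mapping from a form definition to triples. The mapping rules here (sections become concepts, fields become properties) and the relation names `type`, `hasConcept` and `hasProperty` are assumptions made for illustration; K-Forms' actual generation may differ.

```python
Triple = tuple[str, str, str]


def form_to_ontology(form: str, sections: dict[str, list[str]]) -> set[Triple]:
    """Derive (subject, relation, object) triples from a form definition.

    Assumed mapping (hypothetical): each form section becomes a concept
    attached to the form, and each field becomes a property of that concept.
    """
    triples: set[Triple] = {(form, "type", "Form")}
    for concept, fields in sections.items():
        triples.add((concept, "type", "Concept"))
        triples.add((form, "hasConcept", concept))
        for field_name in fields:
            triples.add((concept, "hasProperty", field_name))
    return triples


for triple in sorted(form_to_ontology("FormA", {"Person": ["Name"], "Feedback": []})):
    print(triple)
```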
[Figure: FormA and FormB linked through shared concepts (Person, Name); other concepts shown include Feedback, Project and Organisation]
• Linkages between forms develop automatically, creating semantically interlinked knowledge at the point of capture without additional effort from users.
• In a corporate knowledge base, this will produce a complete, interconnected knowledge base automatically through usage.
Technology: K-Forms
30
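Automatic linkage between forms can be sketched by indexing concepts, and captured values, across forms. This sketch assumes that concepts and entries are matched by exact name and value, which is a simplification; the form contents echo the FormA/FormB diagram, and the entry values are illustrative.

```python
from collections import defaultdict


def shared_concepts(forms: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each concept used by more than one form to the forms using it.
    Matching concepts by name is a simplifying assumption."""
    used_in: dict[str, list[str]] = defaultdict(list)
    for form, concepts in forms.items():
        for concept in concepts:
            used_in[concept].append(form)
    return {c: sorted(fs) for c, fs in used_in.items() if len(fs) > 1}


def link_entries(entries: list[tuple[str, str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group captured (form, concept, value) entries so that entries from
    different forms describing the same thing end up linked."""
    groups: dict[tuple[str, str], set[str]] = defaultdict(set)
    for form, concept, value in entries:
        groups[(concept, value)].add(form)
    return {key: sorted(fs) for key, fs in groups.items() if len(fs) > 1}


forms = {
    "FormA": ["Person", "Name", "Feedback"],
    "FormB": ["Person", "Name", "Project", "Organisation"],
}
entries = [
    ("FormA", "Person", "Sam Chapman"),
    ("FormB", "Person", "Sam Chapman"),
    ("FormB", "Project", "IPAS"),
]
print(shared_concepts(forms))
print(link_entries(entries))
```

Because links are derived from the concepts users already fill in, no extra linking step is asked of them, which is the property the slide emphasises.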
• A spin-out company from Europe’s largest NLP research group, at the University of Sheffield (UK)
• Commercialising business intelligence and knowledge management software:
◦ Semantic knowledge capture technologies
◦ Hybrid storage technologies
◦ Hybrid search technologies
K-Now
31
• The technology commercialised was developed primarily within IPAS.
• K-Now is based mainly on the combination of two ontology-based semantic tools:
◦ K-Forms: a flexible business tool for semantic capture of conceptual knowledge at the point of generation.
◦ K-Search: a server-based business tool for semantic, keyword and hybrid knowledge search.
◦ Other technologies:
▪ K-Store, K-ML, K-Rules.
• Aim
◦ Deriving business advantage for dynamic knowledge needs across (and beyond) an organisation.
K-Now
32
• A.-S. Dadzie, R. Bhagdev, A. Chakravarthy, S. Chapman, J. Iria, V. Lanfranchi, J. Magalhães, D. Petrelli and F. Ciravegna: “Applying Semantic Web Technologies to Knowledge Sharing in Aerospace Engineering”, Journal of Intelligent Manufacturing, to appear in 2008.
• R. Bhagdev, S. Chapman, F. Ciravegna, V. Lanfranchi and D. Petrelli: “Hybrid Search: Effectively Combining Keywords and Semantic Searches”, Proceedings of the 5th European Semantic Web Conference (ESWC 08), Tenerife, June 2008.
• V. Lanfranchi, R. Bhagdev, S. Chapman, F. Ciravegna and D. Petrelli: “Extracting and Searching Knowledge for the Aerospace Industry”, Proceedings of the 1st European Semantic Technology Conference, Vienna, 31 May to 1 June 2007.
Highlight Publications
33
Questions
34