SEEK: Accomplishing Enterprise Information Integration Across Heterogeneous Sources
Authors
William O’Brien, Ph.D., Dept. of Civil & Coastal Engineering
Ph: 352-392-7213, Em: wjob@ce.ufl.edu
R. Raymond Issa, Ph.D., J.D., M.E. Rinker School of Building Construction
Ph: 352-392-7438, Em: raymond-issa@ufl.edu
Joachim Hammer, Ph.D., Computer & Information Science & Engineering
Ph: 352-392-2687, Em: jhammer@cise.ufl.edu
Mark Schmalz, Ph.D., O.D., Computer & Information Science & Engineering
Ph: 352-392-6831, Em: mssz@cise.ufl.edu
Joseph Geunes, Ph.D., Dept. of Industrial & Systems Engineering
Ph: 352-392-1220, Em: geunes@ise.ufl.edu
Sherman Bai, Ph.D., Dept. of Industrial & Systems Engineering
Ph: 352-392-1220, Em: bai@ise.ufl.edu
University of Florida, Gainesville, FL 32611-6120, USA
Keywords
Knowledge sharing, legacy system integration, knowledge capture, supply chain management,
value added analysis
Background
This paper describes a set of enabling technologies to support integration of information stored
in the heterogeneous legacy systems of the many firms on construction projects. Specifically,
we provide a new approach, known as SEEK – Scalable Extraction of Enterprise Knowledge,
that supports extraction and composition of knowledge across legacy sources that are
heterogeneous both physically and semantically. SEEK is not meant to be a general-purpose
extraction/composition toolkit. Rather it supports extraction of specific and limited forms of
knowledge. Current instantiations of SEEK extract knowledge supporting applications in
construction scheduling and supply-chain management.
In particular, our approach to knowledge integration differs significantly from approaches
based on data standards such as the IFC and AECXML. Practically, we believe that it is
implausible that the potentially hundreds of firms in a project supply chain will uniformly
subscribe to a single data standard or even a compatible set of standards. This makes connection
to heterogeneous information systems and translation of semantically heterogeneous data in
those systems dual challenges that must be overcome if knowledge stored in legacy systems is
to be leveraged.
Beyond connection and extraction, there are challenges in knowledge composition. Raw
data must often be transformed into a form suitable for decision-making. Consider, for example,
that much of the data used for operations in firms is detailed in nature, often mimicking
accounting details or a detailed work breakdown structure. This data is too detailed for many
enterprise level decision support and analysis tools (for example, quantitative supply chain
models). In general, we must compose the data needed as input for analysis tools from data used
by legacy applications that were developed for other purposes.
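As a minimal sketch of this kind of composition, the following Python fragment rolls task-level records up to one summary record per activity, the coarser granularity an enterprise-level model might expect. The record layout and field names (`activity`, `task`, `crew_hours`, `cost`) are assumptions made for illustration; they are not taken from SEEK or from any particular legacy system.

```python
from collections import defaultdict

# Hypothetical detailed records, as a legacy scheduling system might store them.
# All field names and values are illustrative assumptions.
DETAILED_TASKS = [
    {"activity": "Foundations", "task": "Excavate footing A", "crew_hours": 16, "cost": 2400.0},
    {"activity": "Foundations", "task": "Pour footing A", "crew_hours": 8, "cost": 3100.0},
    {"activity": "Framing", "task": "Erect wall panels", "crew_hours": 24, "cost": 5200.0},
]


def compose_activity_summary(tasks):
    """Roll task-level detail up to one summary record per activity."""
    totals = defaultdict(lambda: {"crew_hours": 0, "cost": 0.0, "task_count": 0})
    for task in tasks:
        entry = totals[task["activity"]]
        entry["crew_hours"] += task["crew_hours"]
        entry["cost"] += task["cost"]
        entry["task_count"] += 1
    return dict(totals)


summary = compose_activity_summary(DETAILED_TASKS)
```

Here `summary` holds one entry per activity rather than one per task, which is the shape of input an aggregate analysis tool would more plausibly consume.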
Objectives
SEEK is an attempt to overcome the collective challenges of assembling knowledge resident
in numerous legacy information systems. Instantiation of SEEK will allow the implementation
of enterprise level decision support tools to improve construction performance. SEEK is
designed with several aspects of scalability that promise:
• Rapid configuration with semi-automatic set-up: SEEK can be flexibly configured to
accept a wide variety of legacy systems, removing the burden of set-up from the firm.
• Customization of specific instantiations of SEEK components through user
configuration and tuning, assisted by domain experts or knowledge engineers.
• Composition of knowledge via analysis and processing of data extracted from the legacy
source, allowing queries beyond those natively supported by the source.
• Protection of source-specific, proprietary knowledge by establishing a layer between the
source and enterprise/decision maker. (SEEK tools may be provided by a third-party
separate from the decision maker, encouraging adoption.)
• Extended capabilities through upgrades to the modular components of SEEK.
• Application to numerous domains, such as design, prototyping, test, manufacturing, and
maintenance, via the basic SEEK modular architecture.
Methodology
A high-level view of the SEEK architecture is shown in Figure 1. SEEK provides a
middleware layer that bridges the gap between legacy information sources and decision makers
or decision support tools employing information from legacy systems. SEEK thus follows
established mediation/wrapper methodologies.
[Figure 1 labels: end users and decision support applications; SEEK analysis module, knowledge extraction module, and wrapper; source expert; legacy data and systems. Legend: run-time/operational data flow; build-time/set-up/tuning data flow.]
Figure 1: Schematic diagram of SEEK logical architecture.
abstract for ITCON – Special Edition on Knowledge Management in Construction
University of Florida
Novel aspects of SEEK include an analysis module and a knowledge extraction module
integrated with the legacy data interface. The analysis module supports advanced processing of
data extracted from legacy sources by wrappers, further supporting composition of knowledge
required by decision makers or support tools. The knowledge extraction module directs setup of
the analysis module and wrapper, thus supporting (a) automatic connection of legacy sources
with decision makers, and (b) fine tuning of the knowledge extraction process by domain
experts and knowledge engineers or managers.
In practice, the knowledge extraction and analysis modules significantly extend existing
wrappers by (1) accessing knowledge encoded in applications as database schemas, business
rules, high-level code, etc. rather than accessing data only; (2) supporting discovery of
operational knowledge with customizable templates; and (3) integrating machine learning
techniques to simplify and speed up customization of the knowledge extraction templates. Since
the wrapper and analysis tools are integrated in SEEK, the amount of data transferred between
legacy sources and decision makers can be reduced. We also note that security of information
exchange can be improved since the extracted information can be filtered and summarized in
the analysis module prior to transmission over the enterprise network.
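The effect of placing the analysis step next to the wrapper can be sketched in Python as follows. This is an illustrative assumption about how such a layering might look, not the SEEK implementation: the class names (`Wrapper`, `InMemoryWrapper`, `AnalysisModule`) and the record fields, including the proprietary `rate` field, are hypothetical.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Translates a source-neutral request into a source-specific access."""

    @abstractmethod
    def fetch(self, resource):
        """Return raw records for the named resource."""


class InMemoryWrapper(Wrapper):
    """Stand-in for a legacy source; a real wrapper would query the firm's system."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self, resource):
        return [row for row in self._rows if row["resource"] == resource]


class AnalysisModule:
    """Summarizes wrapper output so that only the composed answer leaves the firm."""

    def __init__(self, wrapper):
        self._wrapper = wrapper

    def total_available_hours(self, resource):
        rows = self._wrapper.fetch(resource)
        # Only the aggregate is returned; per-row detail such as 'rate' is dropped.
        return {"resource": resource, "available_hours": sum(r["hours"] for r in rows)}


# Hypothetical raw records; 'rate' stands for proprietary detail the firm keeps private.
RAW_ROWS = [
    {"resource": "crane", "hours": 4, "rate": 310.0},
    {"resource": "crane", "hours": 6, "rate": 295.0},
    {"resource": "loader", "hours": 8, "rate": 120.0},
]

answer = AnalysisModule(InMemoryWrapper(RAW_ROWS)).total_available_hours("crane")
```

In this sketch a single small record crosses the network instead of every raw row, and the `rate` values never leave the source side.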
Results
Our Phase-I development includes initial prototype design, implementation, and validation.
Fig. 2 shows the prototype architecture (an elaboration of the high-level view of Fig. 1). The
wrapper works as follows. Requests from the hub are represented in XML using the global
schema defined in the hub. Based on the given input data and the desired output described in the
query, the wrapper selects a suitable execution plan using a library of customized analysis
functions for decision-support and planning.
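A rough Python sketch of this selection step is given below. The XML request layout, the element names (`query`, `output`, `given`), and the keying of the analysis library by desired output plus the names of the given values are all assumptions for illustration; the hub schema and the actual matching rules are not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical request; the real hub schema is not reproduced in this abstract.
REQUEST = """<query>
  <output>resource_availability</output>
  <given name="resource">crane</given>
  <given name="week">12</given>
</query>"""


def resource_availability(given):
    # Placeholder analysis function; a real one would invoke the wrapper.
    return f"availability of {given['resource']} in week {given['week']}"


# Hypothetical analysis library keyed by (desired output, names of given values).
ANALYSIS_LIBRARY = {
    ("resource_availability", frozenset({"resource", "week"})): resource_availability,
}


def parse_request(xml_text):
    """Extract the desired output and the given values from an XML request."""
    root = ET.fromstring(xml_text)
    output = root.findtext("output")
    given = {g.get("name"): g.text for g in root.findall("given")}
    return output, given


def select_plan(output, given):
    """Return the analysis function whose signature matches the request."""
    signature = (output, frozenset(given))
    try:
        return ANALYSIS_LIBRARY[signature]
    except KeyError:
        raise LookupError(f"invalid signature: {output} given {sorted(given)}") from None


output, given = parse_request(REQUEST)
plan = select_plan(output, given)
result = plan(given)
```

A request whose output and given values match no library entry raises `LookupError`, which loosely corresponds to the "Invalid Signature" path labeled in Figure 2.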
[Figure 2 labels: decision-support request and send/receive (sockets); query XML document, XML parser, parsed XML query DOM, query template (XML document), and DTD for hub schema; signature and given-values matcher (with an invalid-signature path) and analysis library; analysis module; wrapper (SQL query library) and extractor connected to MS Project via JDBC/ODBC; intermediate and final result sets; XML result generator and result XML document. Legend: decision support specific; SEEK firm-specific; SEEK-general modules.]
Figure 2: Conceptual Overview of the SEEK Prototype Architecture.
Scalability is supported by separating the wrapper components into three
groups: (1) decision support tool-specific components (shown as white boxes in Fig. 2), (2)
SEEK general components (dark shaded boxes), and (3) SEEK firm-specific components (light
shaded boxes). The modular architecture supports incremental code enhancement to specific
modules without impacting other modules. For example, we separated the generation of
firm-specific Structured Query Language (SQL) queries in the wrapper module from the code in the
analysis module that uses an internal data format to represent data or requests. Hence, when the
wrapper connects to a firm that supports a different data model or query interface (e.g., PERL),
only the wrapper module needs to be modified. This setup is accomplished semi-automatically
via the knowledge extraction module (not shown in Figure 2).
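The separation described above might look roughly like the following Python sketch, in which an in-memory SQLite database stands in for a firm's project database. The table and column names (`assignments`, `res_name`, `work_date`, `free_hours`), the `SqlWrapper` class, and the internal record format are hypothetical; the prototype's actual SQL query library and internal format are not shown in this abstract.

```python
import sqlite3


class SqlWrapper:
    """Firm-specific part: knows this firm's table and column names."""

    QUERY = "SELECT res_name, work_date, free_hours FROM assignments WHERE res_name = ?"

    def __init__(self, connection):
        self._conn = connection

    def availability(self, resource):
        rows = self._conn.execute(self.QUERY, (resource,)).fetchall()
        # Convert source rows to the internal record format used by the analysis code.
        return [{"resource": n, "date": d, "hours": h} for n, d, h in rows]


def hours_by_date(records):
    """General part: depends only on the internal record format, not on SQL."""
    totals = {}
    for rec in records:
        totals[rec["date"]] = totals.get(rec["date"], 0) + rec["hours"]
    return totals


# Illustrative data standing in for a legacy project database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignments (res_name TEXT, work_date TEXT, free_hours INTEGER)")
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?, ?)",
    [
        ("crane", "2002-03-01", 4),
        ("crane", "2002-03-01", 2),
        ("crane", "2002-03-02", 8),
        ("loader", "2002-03-01", 6),
    ],
)

availability = hours_by_date(SqlWrapper(conn).availability("crane"))
```

Supporting a firm with a different schema or query interface would, under these assumptions, mean writing another class with the same `availability` method, while `hours_by_date` stays unchanged.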
Figure 3 contains a screenshot of our SEEK prototype that depicts the results of a query
about resource availability over time. The source being queried is an MS Project application.
[Screenshot panel title: Subcontractor Resource Availability]
Figure 3: Snapshot of our SEEK prototype, illustrating a query about resource availability.
(Note that we have not invested in interface design, as SEEK will likely sit between
applications and data sources, as shown in Figure 1.)
Acknowledgements
This material is based upon work supported by the National Science Foundation under grant
number CMS-0075407.