Mobile Database Systems

Integration, Diffusion and Merging in
Information Management Discipline
Vijay Kumar
School of Computing and Engineering
University of Missouri-Kansas City
5100 Rockhill Road
Kansas City, MO 64110, USA
kumarv@umkc.edu
Vijay Kumar, UMKC, USA
Outline
Fully Connected Information Space
Proliferation of Data Formats
Information Domains
Information Integration Scenario
Mobile Database System
Transaction Management
Data Broadcast
Conclusion
Fully connected information space
[Figure: the information space spans air, land and water, and underwater environments.]
Integration
Two or more data segments are put together to form
a single meaningful segment. For example, invoices
from two or more different companies are integrated
to bill the customer. These invoices may
have totally different formats.
Final document $D = d_1 \cup d_2 \cup \dots \cup d_n$, where the $d_i$ are
component documents and $\mathrm{format}(d_i) \neq \mathrm{format}(d_j)$.
Even if $\mathrm{format}(d_i) = \mathrm{format}(d_j)$, the semantics may not be
the same.
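The integration step can be sketched as follows. This is only an illustration under stated assumptions: the two invoice layouts, the field names, the company names, and the amounts are all hypothetical. Each component document is first mapped to a common schema, and the final bill is formed from the normalized components.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    vendor: str
    description: str
    amount_cents: int


def from_format_a(doc: dict) -> list[LineItem]:
    # Hypothetical format A: prices are decimal strings under "items".
    return [
        LineItem(doc["company"], it["desc"], int(Decimal(it["price"]) * 100))
        for it in doc["items"]
    ]


def from_format_b(doc: dict) -> list[LineItem]:
    # Hypothetical format B: (description, cents) pairs under "lines".
    return [LineItem(doc["seller"], desc, cents) for desc, cents in doc["lines"]]


def integrate(*components: list[LineItem]) -> dict:
    # The bill D is the combination of all normalized component documents.
    items = [item for component in components for item in component]
    return {"items": items, "total_cents": sum(i.amount_cents for i in items)}


d1 = from_format_a({"company": "Acme", "items": [{"desc": "cable", "price": "12.50"}]})
d2 = from_format_b({"seller": "Globex", "lines": [("adapter", 799)]})
bill = integrate(d1, d2)
```

The per-format adapter functions are where format differences are absorbed; semantic differences between fields with the same name would still need separate handling.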
Integration
Space database
Airline database
Railway database
Bank database
Metro database
Taxi database
Cruise database
Insurance database
Diffusion
A data segment is transposed (diffused) into another
segment that has a different format.
Example: Figure 1 (.jpg or .jif) → Invoice 1 (.pdf)
Merging
Two or more data segments are put together to form
a single meaningful segment. These are semantically
identical data streams but could have different
formats.
Example: Invoice 1 + Invoice 2 → Invoice 3 (1 and 2 are merged)
It is not always easy to distinguish integration, diffusion, and
merging in information management activities.
These problems make information management
quite difficult, and the situation is growing more complex
with the proliferation of mobile environments,
the web, data warehousing, and sensor technology.
We discuss a few disciplines, try to understand
their information management needs, and look at
some solutions. Detailed discussions of these
topics can be found in my papers.
Current health care services are highly federated.
Patients are seen in multiple departments and
physicians' offices.
Prescriptions are filled in pharmacies, and laboratory and
radiographic information is captured in another
environment.
From a data format viewpoint, information from each
source, whether a device or a human, is represented in a
specialized format, and these formats are usually incompatible
with one another. Reconciling them is not only time consuming but also
primitive by current information management standards.
Highly heterogeneous medical informatics domain
Sources feeding the Electronic Medical Record, each with its own data format:
Hand recorded history: data format Fh
Voice recorded data: data format Fv
Physical examination data: data format Fp
OCR data: data format Fo
X-Ray data: data format Fx
NMR data: data format Fn
The data compatibility problem gets worse because of:
Synonyms and homonyms, which may be present
in all or some of the formats.
False data redundancy, which may not be easily
recognizable. For example, two different patients
with the same name may be examined by two
different caregivers, one subjected to OCR
and the other to X-Ray.
Two records may be falsely taken as duplicates,
which may lead to incorrect billing or diagnosis.
There are a finite number of combinations of first
names and surnames. This leads to significant
real-world duplication of partial or entire names.
People are often identified by more than one
name, using a nickname or the middle
name rather than their given first name.
How are we coping?
Identifiers such as SSN (Social Security Number),
or medical record number do not exist for all
people.
A positive DNA identification of individual patients
is not practicable in most locations.
Sequencing technology is currently limited and
expensive, and the resultant data is large.
Medical data acquisition methods increase the
difficulty of assimilating these facts into a
comprehensive patient history.
A majority of medical history is still hand-written
into patient charts, which is difficult or impossible
to acquire electronically.
Snapshot digital images increase the storage
requirements without significant analytical benefit.
Physician dictation is also not easily captured.
Thus, correct and consistent maintenance of the EMR
(Electronic Medical Record) is highly desirable,
and it must not undermine the efficiency of data
access and management.
An approach
We observe that the federated medical data of a
patient is related in a subtle way. We propose to
discover this interrelationship through “activity-result” binding.
An activity-result binding indicates that if the
result value is “x” then the activity must be “y”, that is,
a result of “x” can only be produced by activity
“y”.
An approach
It is the transitive nature of this correlation that
forms the basis of our information gain approach
(if and only if: if x then y, and if y then x). The fact that an event is
observed gives some insight into the activities
and persons involved in the creation of the event.
Conversely, the actors within a process and its
context can assist in interpreting the event
result.
The assertion is that it is possible to develop a
formal mechanism by which search and analysis
algorithms use contextual knowledge to
achieve information gain.
The data storage structure itself can often imply
information about a data acquisition method, the
location of an activity, or the person involved.
Example: If a record exists in a table, which has
been designated as a temporary holding area for
scanned data relating to cardiac catheterization
procedures, it can be inferred that the data
acquisition method was OCR, the encounter type
is cath lab procedure, and the location is cardiac
cath lab.
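The inference in this example can be sketched as a lookup from storage location to implied context. This is a minimal illustration, not the authors' mechanism: the catalog structure and the function names are assumptions, and the table name CathOCR is borrowed from the catheterization example in this deck.

```python
# Hypothetical catalog: table name -> facts implied by a record's presence there.
TABLE_CONTEXT = {
    "CathOCR": {
        "acquisition_method": "OCR",
        "encounter_type": "cath lab procedure",
        "location": "cardiac cath lab",
    },
}


def infer_context(table_name: str) -> dict:
    # Unknown tables imply nothing.
    return dict(TABLE_CONTEXT.get(table_name, {}))


def enrich(record: dict, table_name: str) -> dict:
    # Values stated explicitly in the record take precedence over inferred ones.
    return {**infer_context(table_name), **record}


row = enrich({"Lname": "Doe", "Fname": "John"}, "CathOCR")
```

Letting explicit values override inferred ones is a design choice made here for safety; the source does not say how conflicts should be resolved.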
Example 1: Catheterization procedure data is recorded on paper
and then scanned into the table CathOCR. To insert it into the CathProc
table, each tuple must be associated with a caregiver. The incoming
data must be matched against the repository Caregiver table to
retrieve the identifiers. Often the data collector does not know the
caregiver’s first name, so only an initial is inserted. An automated
batch process attempts to move data from the CathOCR table to
the CathProc table.
Each procedure must be associated with the appropriate caregiver.
A join on CathOCR.CGLname = Caregiver.Lname produces 9
tuples, but if the condition Caregiver.Specialty = Cardiology is added
to the query criteria, only the caregiver with CGID = 1 is matched to
each procedure. Each procedure is thus associated with exactly one
caregiver, which is the desired outcome. The
query result is then inserted into the CathProc table.
CathOCR (OCRStagingDatabase)
Lname  Fname  CGLname  CGFname  ProcData
Doe    John   Johnson  M        ---
Doe    Jane   Johnson  M        ---
Doe    Jamie  Johnson  M        ---
Which M. Johnson?

Caregiver (ClinicalDataRepository)
CGID  Lname    Fname    Middle  Specialty       SSN
1     Johnson  Michael  Thomas  Cardiology      123-45-6789
2     Johnson  M                Neurology       111-22-3333
3     Johnson  Mary             FamilyPractice  123-44-5555
Only one M. Johnson is a cardiologist.

CathProc
CathID  Lname  Fname  CGID  ProcData
1       Doe    John   1     ---
2       Doe    Jane   1     ---
3       Doe    Jamie  1     ---
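The join described in Example 1 can be reproduced with an in-memory SQLite database. This is a sketch under assumptions: the column types are guesses, the SSN column is omitted, and the initial "M" in CGFname is assumed for every scanned row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CathOCR (Lname TEXT, Fname TEXT, CGLname TEXT, CGFname TEXT, ProcData TEXT);
CREATE TABLE Caregiver (CGID INTEGER, Lname TEXT, Fname TEXT, Middle TEXT, Specialty TEXT);
INSERT INTO CathOCR VALUES
  ('Doe', 'John',  'Johnson', 'M', '---'),
  ('Doe', 'Jane',  'Johnson', 'M', '---'),
  ('Doe', 'Jamie', 'Johnson', 'M', '---');
INSERT INTO Caregiver VALUES
  (1, 'Johnson', 'Michael', 'Thomas', 'Cardiology'),
  (2, 'Johnson', 'M',       NULL,     'Neurology'),
  (3, 'Johnson', 'Mary',    NULL,     'FamilyPractice');
""")

JOIN = """
SELECT o.Lname, o.Fname, c.CGID, o.ProcData
FROM CathOCR AS o
JOIN Caregiver AS c ON o.CGLname = c.Lname
"""

# Last name alone is ambiguous: every procedure matches every Johnson.
ambiguous = conn.execute(JOIN).fetchall()

# The storage context (cath lab data) justifies the extra specialty condition.
resolved = conn.execute(JOIN + " WHERE c.Specialty = 'Cardiology'").fetchall()
```

Here `ambiguous` holds the 9 tuples mentioned in the text, while `resolved` holds one row per procedure, each with CGID 1.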
Mobile Database System (MDS)
The MDS that we present here is a ubiquitous
database system in which, unlike conventional
systems, the processing unit can also travel to the data
location for processing. Thus, it can process
debit/credit transactions, pay utility bills, make
airline reservations, and perform other transactions
without being subject to any geographical
constraints.
Mobile Database System (MDS)
[Figure: MDS reference architecture. Labeled components: PSTN, HLR, VLR, MSC, BSC, BS, MU, DBS, DB, Fixed Host, and C1, C2, ..., Cn.]
Mobilaction
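An Execution Fragment $e_{ij}$ is a partial order $e_{ij} = \{\sigma_j, \le_j\}$
where
$\sigma_j = OS_j \cup \{N_j\}$, with $OS_j = \bigcup_k O_{jk}$, $O_{jk} \in \{\text{read}, \text{write}\}$,
and $N_j \in \{\text{abort}, \text{commit}\}$.
For any $O_{jk}$ and $O_{jl}$ where $O_{jk} = R(x)$ and $O_{jl} = W(x)$ for a
data object $x$, either $O_{jk} \le_j O_{jl}$ or $O_{jl} \le_j O_{jk}$.
$\forall O_{jk} \in OS_j$, $O_{jk} \le_j N_j$.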
Mobilaction
A Mobile Transaction $T_i$ is a triple $\langle F_i, L_i, FLM_i \rangle$
where $F_i = \{e_{i1}, e_{i2}, \dots, e_{in}\}$ is a set of execution
fragments, $L_i = \{l_{i1}, l_{i2}, \dots, l_{in}\}$ is a set of locations,
and $FLM_i = \{flm_{i1}, flm_{i2}, \dots, flm_{in}\}$ is a set of fragment
location mappings where $\forall j,\ flm_{ij}(e_{ij}) = l_{ij}$.
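The two Mobilaction definitions can be modeled as plain data structures. This is a sketch only: the class names, the operation labels, and the representation of the order as a set of pairs are assumptions, and the termination operation is taken to follow every operation by construction rather than being checked.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Op:
    name: str  # unique label, e.g. "O1" (hypothetical)
    kind: str  # "read" or "write"
    item: str  # data object the operation touches


@dataclass
class ExecutionFragment:
    ops: list[Op]
    termination: str  # "commit" or "abort"
    # (a, b) means operation a precedes operation b in the partial order.
    before: set[tuple[str, str]] = field(default_factory=set)

    def ordered(self, a: Op, b: Op) -> bool:
        return (a.name, b.name) in self.before or (b.name, a.name) in self.before

    def is_well_formed(self) -> bool:
        if self.termination not in {"commit", "abort"}:
            return False
        if any(op.kind not in {"read", "write"} for op in self.ops):
            return False
        # A read and a write on the same data object must be ordered.
        for a, b in combinations(self.ops, 2):
            conflict = a.item == b.item and {a.kind, b.kind} == {"read", "write"}
            if conflict and not self.ordered(a, b):
                return False
        return True


@dataclass
class MobileTransaction:
    fragments: dict[str, ExecutionFragment]  # fragment id -> fragment
    location_of: dict[str, str]              # fragment id -> location (the mapping)

    def is_well_formed(self) -> bool:
        # Every fragment needs exactly one location and must itself be valid.
        return set(self.fragments) == set(self.location_of) and all(
            f.is_well_formed() for f in self.fragments.values()
        )


r, w = Op("O1", "read", "x"), Op("O2", "write", "x")
good = ExecutionFragment([r, w], "commit", {("O1", "O2")})
bad = ExecutionFragment([r, w], "commit")  # conflicting operations left unordered
txn = MobileTransaction({"e1": good}, {"e1": "Kansas City"})
```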
Mobilaction: Execution and Commitment
A conventional two-phase or three-phase commit
protocol would not work satisfactorily in MDS: it would
generate excessive overhead that an MDS
cannot handle.
Mobilaction: Execution and Commitment
We have developed a commit protocol, which we
refer to as TCOT (Transaction Commit on Timeout),
that meets the following objectives:
It uses a minimum number of wireless messages.
The MU and DBS involved in processing Ti have
independent decision-making capability.
It is non-blocking.
Mobilaction: Execution and Commitment
TCOT is based on the timeout concept. Timeouts are
usually used to identify a failure situation. We
assume instead that the end of the timeout
period indicates success. Thus, at the end of the
timeout it is expected that the transaction is
committed. This is the basis for defining the
completion of transaction commit in TCOT.
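The inverted reading of a timeout can be illustrated with a small decision function. This is not the TCOT protocol itself, only a sketch of its central idea under simplifying assumptions: time is a logical counter, the only message modeled is an abort notice, and an abort arriving after the timeout is ignored.

```python
from dataclasses import dataclass


@dataclass
class Message:
    arrival_time: float  # logical time units since processing started
    sender: str          # e.g. "MU" or "DBS"
    kind: str            # only "abort" is modeled in this sketch


def commit_on_timeout(timeout: float, inbox: list[Message]) -> str:
    # Silence until the timeout expires is read as success, so no
    # explicit "ready" or "commit" messages are needed on the wireless link.
    for msg in inbox:
        if msg.kind == "abort" and msg.arrival_time <= timeout:
            return "abort"
    return "commit"


quiet = commit_on_timeout(5.0, [])
failed = commit_on_timeout(5.0, [Message(3.0, "MU", "abort")])
```

In the quiet case no message is exchanged at all before the decision, which is the property that keeps wireless traffic low.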
Application Recovery in Mobile Database System
We use the processing capability of
mobile agents to manage the application log for
efficient application recovery, in a way that conforms to
MDS limitations and the constraints of the mobile discipline.
Data Dissemination through wireless channels
Satellite broadcast system
Data Dissemination through wireless channels
Traffic
Stock
Airline
Theater
Restaurant
Weather
A sample IC space
Data Dissemination through wireless channels
Country
State
Cities
Zip Code
Major roads
A sample location hierarchy
Data Dissemination through wireless channels
Broadcast arrangement.
Data Dissemination through wireless channels
Broadcast index:
Weather at 6AM, Traffic at 7AM
Weather at 8AM, Traffic at 9AM
Broadcast:
This is a weather channel and weather info. at 6AM follows
Weather report
Weather report
...
This is a traffic channel and traffic info. at 7AM follows
Traffic report
Traffic report
...
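A client can use such an index to find out when the information it wants will next be on air. The sketch below assumes the index is a list of (hour, channel) pairs sorted by hour, using the four entries from the slide; the function name and the data layout are hypothetical.

```python
from bisect import bisect_left

# Index entries from the slide: (broadcast hour, channel), sorted by hour.
INDEX = [(6, "Weather"), (7, "Traffic"), (8, "Weather"), (9, "Traffic")]


def next_slot(index, channel, now_hour):
    """Return the first hour >= now_hour at which `channel` is broadcast, or None."""
    # Skip index entries that are already in the past.
    start = bisect_left(index, (now_hour, ""))
    for hour, name in index[start:]:
        if name == channel:
            return hour
    return None


traffic_at = next_slot(INDEX, "Traffic", 6)
weather_at = next_slot(INDEX, "Weather", 7)
```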
Data Dissemination through wireless channels
[Figure: Infostation. Labeled components: InfoSpace nodes, InfoSpace coordinator, Broadcast Scheduler, InfoSpace server.]
Data Dissemination through wireless channels
[Figure: Infostation. Labeled components: MU, BS, Surrogate.]
Data Dissemination through wireless channels
[Figure: Infostation. Labeled components: Mobile Unit, Client, Proxy, File server, Period routine, Data, Staging Coordinator, Staging Machine; additional labels: BT, PT, T.]
Push → Pull → Push
In data dissemination, either the Push or the Pull model is
used. This is too rigid. We have proposed a
dynamic approach in which data changes its
dissemination mode from Push to Pull to Push.
Under this scheme, a data item is disseminated using the
Push or the Pull model depending on its
popularity factor.
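The mode switch can be sketched as a threshold rule. The assumptions are loud here: the source does not define the popularity factor or the direction of the rule, so this sketch takes popularity to be a request count and assumes that popular items are pushed while unpopular ones are pulled.

```python
def dissemination_mode(request_count: int, threshold: int) -> str:
    # Assumed rule: popular items are broadcast (push), the rest are on demand (pull).
    return "push" if request_count >= threshold else "pull"


def reassign(popularity: dict[str, int], threshold: int) -> dict[str, str]:
    # Re-evaluate every item so that a mode can change as popularity changes.
    return {item: dissemination_mode(n, threshold) for item, n in popularity.items()}


morning = reassign({"weather": 120, "theater": 3}, threshold=50)
evening = reassign({"weather": 10, "theater": 80}, threshold=50)
```

Running the reassignment periodically is what lets an item move from Push to Pull and back again.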
World Wide Web (Web)
The Web is a globally sharable repository and an
excellent platform for e-commerce and m-commerce.
Organizations no longer want to limit
the scope of the web to a repository and a
showcase; rather, they want to use it as a powerful
communication tool to disseminate the latest
information on all kinds of things.
Web services – existing scheme
Mobile users increasingly demand access to
location-based information
(locations of restaurants, movie theatres, etc.) and
desired services (ticket booking, buying pizzas,
etc.) at any time and from anywhere through
mobile devices using Location Dependent Queries
(LDQ).
Web services – existing scheme
Service Provider (SP)
Content Provider (CP)
Location based information scheme
Web services – existing scheme
Each CP provides specific information and supports a
specific format. An SP has to register individually with
each CP it needs in order to satisfy the needs of
a mobile user. In this tight integration, the user may
have to contend with a fixed information format, and if the
user wants information on a particular topic, the SP may
not be able to provide it because it may not be able
to register with the desired CP dynamically.
Our scheme- Web Bazaar
We propose to use a Web service as an interface
(middleware) between the CPs and SPs. Thus, an SP will
interact with Universal Description, Discovery &
Integration (UDDI), which in turn will reach the relevant web
service to get the answer.
Our scheme- Web Bazaar
Our scheme will make it possible to discover location-based
web services easily and cheaply through the
location-aware UDDI. We present a couple of simple
examples to show the usefulness of our proposal.
Our scheme- Web Bazaar
Example 1: User subscribes to SP for service by giving
payment information and preference profile. The user
during his trip to Kansas City wants to go to a coffee
shop. He enters the request, gets the list of coffee
shops (identified using his personal profile), selects the
shop which gives discount on coffee, clicks the link and
pays for the item. In return he gets a transaction id,
goes to the shop, enters the id and gets his coffee.
Our scheme- Web Bazaar
Example 2: A user wants to eat a special pizza. He selects a
pizza store using his mobile device after getting the store’s
information from Web Bazaar. The service selects the
right kinds of pizza using information from his profile. The
pizza order is sent to the shop, and when it is ready the
GPS service is used to get the user’s location. The user’s
location is dispatched to a map web service to obtain a
route for delivery.
Our scheme- Web Bazaar
Pull model
The user requests a transaction; the server looks for an
appropriate service, contacts the CP, retrieves
the information, processes the data, and gives the results
back to the user.
Push model
The server collects the information from different
data sources according to the current location of the
user and pushes it to mobile unit.
Our scheme- Web Bazaar
Our aim is to develop a proactive architecture for m-commerce
applications, so we use push. A proactive
architecture requires caching the context services a user
needs on the mobile unit, which greatly reduces
query processing time because upward communication
from the mobile unit to the middleware is greatly
reduced.
Our scheme- Web Bazaar
Major requirements in mobile middleware are
Semantic profile-driven cache management
Semantic web services description
Semantic web services discovery protocol
A UDDI structure that can search based on the
location context of the user
Broadcasting of web services information
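The location-aware search requirement can be illustrated with a toy registry. This does not use any real UDDI API; the entry structure, the service names, and the matching rule (a service applies when its coverage area is a prefix of the user's position in a country, state, city, zip code hierarchy) are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    category: str
    location: tuple[str, ...]  # coverage area as a path in the location hierarchy


def discover(registry, category, user_location):
    """Entries of `category` whose coverage contains the user, most specific first."""
    hits = [
        e for e in registry
        if e.category == category
        and user_location[: len(e.location)] == e.location
    ]
    return sorted(hits, key=lambda e: len(e.location), reverse=True)


# Hypothetical registry contents.
registry = [
    ServiceEntry("Missouri Roasters", "coffee", ("USA", "Missouri")),
    ServiceEntry("KC Coffee", "coffee", ("USA", "Missouri", "Kansas City")),
    ServiceEntry("Topeka Beans", "coffee", ("USA", "Kansas", "Topeka")),
    ServiceEntry("KC Pizza", "pizza", ("USA", "Missouri", "Kansas City")),
]
user = ("USA", "Missouri", "Kansas City", "64110")
found = [e.name for e in discover(registry, "coffee", user)]
```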
Our scheme- Web Bazaar
[Figure: A reference structure of Web Bazaar. Labeled components: CP/LSP, Middleware, Semantic caching, User profile manager, User profile, Web service registry, Semantic service discovery, User context.]
Sensor Technology
The pervasive nature of the information space is very useful,
but it also creates a serious problem:
capturing information from geographical locations
that humans cannot easily reach,
such as the ocean bed, enemy territories, deep
space, and so on.
Sensor Technology
Such requirements gave rise to sensor technology,
in which a minute device called a “sensor” is used for
data collection, validation, processing, and storage. A
sensor is a programmable, low-cost, low-power, multifunctional device.
One of its functions
is continuously gathering
desired information about the location where it is
deployed.
Sensor Technology
[Figure: sensor deployment in air, above water, and underwater. Legend: a sensor node, a wireless link, a micro-sensornet node, a micro-sensornet.]
Sensor Technology
We define the concept of an “Embedded Sensor Space
(ESS)”, which is a countably infinite set of uniquely
programmed sensors. Thus, $ESS = \{s_1, s_2, \dots\}$,
where each $s_i$ ($i = 1, 2, \dots$) is a programmed sensor. A
node in the embedded sensor net captures data about
its environment and dispatches it to other sensors
through routers.
Sensor Technology
[Figure: landfill sensor data flow. Labeled components: sensors at landfills, Software Interface, DBMS, Database, WebDb, Data warehouse, Browser, Authorized users.]
Conclusions and Future of Data Management
We started with the common problem of data
integration and discussed its effect on a number of
disciplines. We do not have a perfect solution, and
each system handles the problem in its own way. There is
no one standard format and there cannot be one.
Everyone has their own approach, which usually differs
from the others. So the only solution we can think of is
an intelligent interface that achieves integration,
diffusion, and merging.
Thank you