(smadnick@mit.edu) :
[Figure: overview of research areas; labels listed in extraction order]
Information Integration
Total Data Quality (TDQM) Program (5)
Technologies; Applications
COntext INterchange (COIN) (1)
Financial Services (account aggregation)
RFID IT Infrastructure
Others …
Security Analysis
Data Quality
Military Logistics
MIT Information Quality (MIT-IQ) Program
Pros and cons of data standards
System Dynamics Modeling of State Stability (4)
Economic model of alternatives to EU Database Directive (3)
Strategy, Policy & Legal Issues
Security
Stakeholder Perceptions of Security (2)
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .
. . . The navigators (JPL) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin (the contractor).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
[Figure: COIN context mediation]
Concept: Length (Meters, Feet)
Shared Ontologies
Conversion Libraries: f() between meters and feet
Context Management Administrator
Source, with Source Context: part length 17
Context Mediator, performing Context Transformation
Receiver, with Receiver Context
1. Receiver query: Select partlength From catalog Where partno=“12AY”
2. Mediated query sent to Source: Select partlength x 3.35 From catalog Where partno=“12AY”
3. Result returned to Receiver: 55.25
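The query rewriting in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the COIN mediator itself: the `Context` class, the `CONVERSIONS` table and the `mediate` function are hypothetical names, the source is assumed to store lengths in meters and the receiver to expect feet, and the factor comes from the exact definition 1 ft = 0.3048 m rather than from the figure's illustrative numbers.

```python
from dataclasses import dataclass

# Feet per meter, from the exact definition 1 ft = 0.3048 m.
FEET_PER_METER = 1 / 0.3048

# Hypothetical conversion library: (from_unit, to_unit) -> multiplicative factor.
CONVERSIONS = {
    ("meters", "feet"): FEET_PER_METER,
    ("feet", "meters"): 0.3048,
}


@dataclass
class Context:
    """Declares the unit in which a party reads or writes each column."""
    units: dict  # column name -> unit name


def mediate(column: str, table: str, where: str,
            source: Context, receiver: Context) -> str:
    """Rewrite a single-column SELECT so the result arrives in receiver units."""
    src_unit = source.units[column]
    rcv_unit = receiver.units[column]
    if src_unit == rcv_unit:
        expr = column  # contexts agree, so no conversion is needed
    else:
        factor = CONVERSIONS[(src_unit, rcv_unit)]
        expr = f"{column} * {factor:.4f}"
    return f"SELECT {expr} FROM {table} WHERE {where}"


# Assumed contexts: the source records meters, the receiver wants feet.
source_ctx = Context(units={"partlength": "meters"})
receiver_ctx = Context(units={"partlength": "feet"})

mediated_sql = mediate("partlength", "catalog", "partno = '12AY'",
                       source_ctx, receiver_ctx)
print(mediated_sql)
```

The receiver writes its query without knowing the source's units. Only the context declarations and the conversion library change when a new source is added.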
Question 39: People are aware of good security practices.
Gap between Assessment and Importance – for your company
[Chart: MA, MI, and Gap shown for Overall, Misc., Comp X, Comp W, and Comp I on a scale from 4 to 7]
Overall = 1.28 (5.04 vs. 6.32)
Miscellaneous (1) = 2.40 (4.20 vs. 6.60)
Company X (2) = 1.83 (5.00 vs. 6.83)
Company W (2) = 1.89 (4.61 vs. 6.50)
Company I (3) = 0.44 (5.33 vs. 5.78)
(1) Original pilot sample: diverse array of companies, many middle managers
(2) High-tech organizations
(3) Non-USA company
Database protection legislation timeline, 1991 to 2004: incentives for creating databases vs. value creation through data reuse
1991 (Start): Feist case: US Supreme Court decided that databases lacking minimal originality are not copyrightable.
1996 (Adopted): EU introduced the Database Directive, granting database makers a sui generis right to prevent unauthorized extraction and reutilization of the whole, or a substantial part, of database contents, as well as systematic extraction of insubstantial parts.
1996 (Failed): HR 3531 Database Investment and Intellectual Property Antipiracy Act: similar to the EU Directive, with no fair use exceptions.
1998 (Failed): HR 2652 Collections of Information Antipiracy Act: criminal and civil remedies if reuse of a substantial part of another person’s database causes or has the potential to cause harm.
1999 (Failed): HR 354 Collections of Information Antipiracy Act: similar to HR 2652, with more fair use exceptions.
1999 (Failed): HR 1858 Consumer and Investor Access to Information Act: disallows verbatim copying of a database.
2003 (Failed): HR 3261 Database and Collections of Information Misappropriation Act: disallows free riders from creating functionally equivalent databases that reduce the revenue of the creators.
2004 (Failed): HR 3872 Consumer Access to Information Act: prevents free riders from engaging in direct competition that threatens the existence or the quality of the creator's database.
(4) MIT Model – High Level System Dynamics View:
Loads on State Stability vs. Capacity to Manage
[Figure: causal loop diagram; arrows between the variables below carry positive (+) or negative (-) polarity]
Regime Legitimacy
State Institutional Capacity
Economic Performance
Investment in Other Productive Investments
Investment in Internal/External Security
Regime Force and Violence (Regime Anti-terrorism Violent Activity)
Anti-Regime Activity
Dissident Institutional Capacity
Insurgents
Demographic and Socio-political Cleavages
Civic Capacity and Social Liberties
Socio-political Mobilization
Picture of old lady or young lady?
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians said their forces would be in the field in Bavaria by Oct. 20.
The Austrian staff planned based on that date in the Gregorian calendar.
Russia, however, used the ancient Julian calendar, which in 1805 lagged 12 days behind.
The difference allowed Napoleon to surround Austrian General Mack's army at Ulm on Oct. 21, well before the Russian forces arrived.
Source: David Chandler, The Campaigns of Napoleon, New York: Macmillan, 1966, p. 390.
Data Integration: MIT COntext INterchange (COIN) Project
Professor Stuart Madnick & Dr. Michael Siegel, Sloan School of Management, {smadnick,msiegel}@mit.edu
Dealing with multiple semantically heterogeneous data sources has been the subject of intensive research in the COntext INterchange (COIN) research group at MIT for several years. The work, which rests on a solid theoretical foundation documented in several journal publications, has been used to develop a series of prototypes and development systems in applications ranging from financial services to the integration of diverse intelligence data.
The COIN technologies consist of two key components that can be used separately but were designed to work efficiently together: (1) a multimedia data federation engine (aka Cameleon), which can extract and merge data from semistructured HTML and XML web sites (i.e., a “web wrapper”) as well as traditional relational databases, and (2) a powerful data semantics mediation engine (aka COIN mediator), which provides a means for semantic representation and reasoning.
Some key papers on Cameleon can be found at:
“Information Aggregation using the Caméléon# Web Wrapper” at http://web.mit.edu/smadnick/www/wp/2005-06.pdf
“The Cameleon Web Wrapper Engine” at http://web.mit.edu/smadnick/www/wp/2000-03.pdf
“Querying Web-Sources within a Data Federation” at http://web.mit.edu/smadnick/www/wp/2006-09.pdf
Functionality: It is not surprising that data gathered by different agencies (possibly from different countries) for different purposes would represent “similar” information in different ways. This can range from simple things like representing a person’s weight in pounds, kilograms, or even “stones” (in the UK) to what definition of “terrorist” is used in “number of terrorists.” In order to use these diverse sources in combination, these differences must be resolved. A further complication is that these “definitions” (i.e., “contexts”) also change over time (i.e., “temporal context”). As a simple example, military expenditures in France used to be reported in Francs; now they are reported in Euros. We have demonstrated the technology in application areas ranging from financial services to global price shopping to counter-terrorism and intelligence data. Some papers illustrating the functionality can be found at:
“Context Mediation Demonstration of Counter-Terrorism Intelligence (CTI) Integration” at http://web.mit.edu/smadnick/www/wp/2005-03.pdf
“Semantic Information Integration in the Large: Adaptability, Extensibility, and Scalability of the Context Mediation Approach” at http://web.mit.edu/smadnick/www/wp/2005-04.pdf
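The Francs-to-Euros case of temporal context can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not COIN's temporal reasoning: the function names are hypothetical, the cutover year at which the source starts reporting in euros is assumed to be 1999, and the sample figures are made up. The conversion rate is the official fixed rate of 6.55957 francs per euro.

```python
# Fixed, irrevocable conversion rate: 1 euro = 6.55957 French francs.
FRF_PER_EUR = 6.55957

# Assumption for illustration: the hypothetical source switches its
# reporting currency to euros at the start of this year.
EURO_REPORTING_START = 1999


def source_currency(year: int) -> str:
    """Temporal context: which currency the source used in a given year."""
    return "EUR" if year >= EURO_REPORTING_START else "FRF"


def to_euros(amount: float, year: int) -> float:
    """Normalize a reported amount into the receiver's context (euros)."""
    if source_currency(year) == "FRF":
        return amount / FRF_PER_EUR
    return amount


# Made-up figures, used only to exercise the functions.
reported = [(1995, 655.957), (2003, 100.0)]
normalized = [(year, round(to_euros(amount, year), 2))
              for year, amount in reported]
print(normalized)
```

The point of the sketch is that the same column needs a different conversion depending on the date attached to each value, so the context has to be a function of time and not a single constant.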
Technology: The COIN mediation reasoning uses an integrated framework of abductive and constraint logic programming to detect semantic conflicts. This is combined with the ability to dynamically produce complex conversion programs: symbolic equation-solving techniques invert and compose simple conversion components into the conversion programs a query needs.
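The idea of deriving conversions by inverting and composing simple components can be sketched with a deliberately simplified stand-in. The Python below handles only affine conversions of the form y = scale * x + offset and does not use abductive or constraint logic programming; the `Affine` class and the variable names are hypothetical and are not part of COIN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """A conversion of the form y = scale * x + offset."""
    scale: float
    offset: float = 0.0

    def __call__(self, x: float) -> float:
        return self.scale * x + self.offset

    def inverse(self) -> "Affine":
        """Solve y = scale * x + offset for x."""
        return Affine(1 / self.scale, -self.offset / self.scale)

    def then(self, other: "Affine") -> "Affine":
        """Compose two conversions: apply self first, then other."""
        return Affine(other.scale * self.scale,
                      other.scale * self.offset + other.offset)


# Simple components, stored once in a hypothetical conversion library.
pounds_to_kg = Affine(0.45359237)  # exact definition of the avoirdupois pound
stones_to_pounds = Affine(14.0)    # 1 stone = 14 pounds

# Derived conversions that nobody had to write by hand.
stones_to_kg = stones_to_pounds.then(pounds_to_kg)
kg_to_stones = stones_to_kg.inverse()

print(round(stones_to_kg(10), 3))
print(round(kg_to_stones(stones_to_kg(10)), 6))
```

With n units, a library of this kind needs only a chain of n - 1 basic components; every other pairwise conversion is obtained by composition and inversion.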
The basic theory of COIN mediation was presented in this paper:
“Context Interchange: New Features and Formalisms for the Intelligent Integration of Information,” published in ACM Transactions on Information Systems, at http://web.mit.edu/smadnick/www/wp/1997-03.pdf
The research has been extensively expanded since then to include temporal reasoning and much more powerful automatic and dynamic conversion program generation, described in papers such as:
“Information Integration In the Presence of Equational Ontological Conflicts” at http://web.mit.edu/smadnick/www/wp/2002-16.pdf
“Effective Data Integration in the Presence of Temporal Semantic Conflicts” at http://web.mit.edu/smadnick/www/wp/2004-02.pdf
Although important context mediation research issues remain, such as automated generation and reconciliation of ontologies, the current COIN technologies are advanced enough to be immediate and valuable contributors to efforts in enterprise integration, semantic integration and the Semantic Web.