CSE
5810
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155 steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Copyright © 2008 by S. Demurjian, Storrs, CT.
SWEA1
Emerging Discipline in Mid-1990s
Software as Collection of Interacting Components
What are Local Interactions (within Component)?
What are Global Interactions (between Components)?
Advantages of SW Architectural Design
Understand Communication/Synchronization
Definition of Database Requirements
Identification of Performance/Scaling Issues
Detailing of Security Needs and Constraints
Towards Large-Scale Software Development
For Biomedical Informatics:
What are Architectures for Data Sharing?
How is Interoperability Facilitated?
SWEA2
Exceed Traditional Algorithm/Data Structure
Perspective
Emphasize Componentwise Organization and System
Functionality
Focus on Global and Local Interactions
Identify Communication/Synchronization
Requirements
Define Database Needs and Dependencies
Consider Performance/Scaling Issues
Understand Potential Evolution Dimensions
SWEA3
Architecturally:
Modules
Interconnections Among Modules
Decomposition into Subsystems
Code:
Algorithms/Data Structures
Tasking/Control Threads
Executable:
Memory Management
Runtime Environment
Is this a Realistic/Accurate View?
Yes for a Single “Application”
What about Application of Applications?
System of Systems?
SWEA4
Is there any Engineering?
Is there any Science?
Collection of Disparate Techniques:
Data-Flow Diagrams
E-R Diagrams
Finite State Machines
Petri Nets
UML Class, Object, Sequence, Etc.
Design Patterns
Model-Driven Architectures
What is being “Engineered”?
How do we Know we are Done?
E.g. Does Artifact Match Specification?
SWEA5
Specification (Abstract Models, Algebraic Semantics)
Software Structure (Bundling Representation with
Algorithms)
Language Issues (Models, Scope, User-Defined
Types)
Information Hiding (Protect Integrity of Information)
Integrity Constraints (Invariants of Data Structures)
Is this up to date?
What else can be Added to List?
Design Patterns
Model-Driven Architectures
XML – Data Modeling and Dependencies
Others?
SWEA6
Compilers Have Had Great Success
Originally by Hand
Then Compiler Compilers
Parser Generators - Lex/Yacc
Solid Science Behind Compilers
Regular, Context Free, Context Sensitive
Languages
FSAs, PDAs, CFGs, etc.
Science has Provided Engineering Success re. Ease and Accuracy of Modern Compiler Writing
SWEA7
C - Still Remains the Industry Workhorse
Separate Compilation
Decomposition of System into Subsystems, etc.
Shared Declarations
ADTs in C, But Compiler won't Enforce Them
Modula-II and Ada 83 Had
Information Hiding
Public/Private Paradigm
Module/Package Concepts
Import/Export Paradigm
Rigor Enforced by Compiler – but Can’t
Bind/Group Modules into Subsystems
Precisely Specify Interconnections and Interactions
Among Subsystems and Components
SWEA8
C++ and Ada95
Considered “Legacy” Languages - Old
Java, C# - Are they Headed Toward Legacy?
How do they Rate?
What Do they Offer that Hasn't been Offered
Before?
What are Unique Benefits and Potential of Java?
What about new Web Technologies?
JavaScript, Perl, PHP, Python, Ruby
XML and SOAP
Mobile Computing
How do all of these fit into this process?
Particularly in Regard to C/S Solutions!
SWEA9
Architectural Description Languages
Provide Tools to Describe Architectures
Definition and Communication
Codification of Architectural Expertise
Frameworks for Specific Domains
DB vs. GUI vs. Embedded vs. C/S
Formal Underpinning for Engineering Rigor
What has Appeared for Each of these?
Struts for GUI
Open Source Frameworks (mediawiki)
Wide-Ranging Standards (XML)
Model-Driven Architectures
What Else???
SWEA10
What are Popular Architectural Styles?
How are they Characterized?
Example in Practice
Explore a Taxonomy of Styles
Focus on “Micro-Architectures”
Components
Flow Among Components
Represents “Single” Application
Forms Basis for “Macro-Architectures”
System of Systems
Application of Applications
Significantly Scaling Up
SWEA11
Data Flow Systems
Batch Sequential
Pipes and Filters
Call & Return Systems
Main/Subroutines
(C, Pascal)
Object Oriented
Implicit Invocation
Hierarchical Systems
Virtual Machines
Interpreters
Rule Based Systems
Data Centered Systems
DBS
Hypertext
Blackboards
Independent Components
Communicating Processes
Event Systems
Client/Server
Two-Tier
Multi-Tier
SWEA12
Establish Framework of …
Components
Building Blocks for Constructing Systems
A Major Unit of Functionality
Examples Include: Client, Server, Filter, Layer, DB
Connectors
Defining the Ways that Components Interact
What are the Protocols that Mandate the Allowable
Interactions Among Components?
How are Protocols Enforced at Run/Design Time?
Examples Include: Procedure Call, Event Broadcast,
DB Protocol, Pipe
SWEA13
What Is the Design Vocabulary?
Connectors and Components
What Are Allowable Structural Patterns?
Constraints on Combining Components &
Connectors
What Is the Underlying Conceptual Model?
Von Neumann, Parallel, Agent, Message-Passing…
Are there New Emerging Models?
Collaborative Environments/Shareware?
What Are Essential Invariants of a Style?
Limits on Allowable Components & Connectors
Common Examples of Usage
Advantages and Disadvantages of a Style
Common Specializations of a Style
SWEA14
Components are Independent Entities: No Shared State!
(Figure: pipe-and-filter network including Sort and Sort Merge filters)
Components with Input and Output
Connectors for Flow Streams of I/O
Filters:
Invariant: Unaware of Upstream and Downstream Behavior
Streamed Behavior: Output Could Go From One Filter to the Next, Allowing Multiple Filters to Run in Parallel.
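The following is a minimal sketch of the pipe-and-filter style. It assumes Python generators stand in for filters and the hand-off between generators stands in for pipes. The function names are hypothetical. Each filter sees only its own input stream, which matches the invariant above, and items flow through one at a time. Generators interleave lazily on one thread, so genuine parallelism would need threads or processes.

```python
from typing import Iterable, Iterator


def read_lines(text: str) -> Iterator[str]:
    """Source: emit the input one line at a time."""
    yield from text.splitlines()


def to_words(lines: Iterable[str]) -> Iterator[str]:
    """Filter: split each line into lowercase words."""
    for line in lines:
        for word in line.split():
            yield word.lower()


def drop_short(words: Iterable[str], min_len: int = 4) -> Iterator[str]:
    """Filter: pass through only words with at least min_len characters."""
    for word in words:
        if len(word) >= min_len:
            yield word


def run_pipeline(text: str) -> list[str]:
    # The "pipes" are the generator hand-offs; the sink collects distinct terms.
    return sorted(set(drop_short(to_words(read_lines(text)))))


print(run_pipeline("Fever and chills\nHigh fever"))  # ['chills', 'fever', 'high']
```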
SWEA15
Possible Specializations:
Pipelines - Linear Sequence
Bounded - Limits on Data Amounts
Typed Pipes - Known Data Format
What is a Classic Example?
Other Examples:
Compilers
Sequential Processes
Parallel Processes
SWEA16
Text Information Retrieval Systems
Scanning Newspapers for Key Words, Etc.
Also, Boolean Search Expressions
Where is Such an Architecture Utilized Today?
What is Potential Usage in BMI?
(Figure: text retrieval system components: User, Search Controller, Disk Controller, Query Resolver, Term Comparator, and Search DB, connected by command, control/programming, query, data, and result flows)
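The following is a rough sketch of the Boolean-search core that such a system needs. It assumes a simple in-memory inverted index in place of the Disk Controller and Search DB. The names `build_index`, `search_all`, and `search_any` are hypothetical, and tokenization is naive whitespace splitting.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> set of ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Boolean AND: documents containing every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_any(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Boolean OR: documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


DOCS = {"d1": "Flu outbreak in Hartford", "d2": "Flu vaccine clinic", "d3": "Hartford marathon"}
index = build_index(DOCS)
print(sorted(search_all(index, ["flu", "hartford"])))      # ['d1']
print(sorted(search_any(index, ["vaccine", "marathon"])))  # ['d2', 'd3']
```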
SWEA17
Can be Structured to Model Medical Workflows
Series of Actions taken by Stakeholders on Patient
SWEA18
Extension of Rishi’s work …
Linear Ontology Architectural Pattern (LOAP)
Model Knowledge in a Process
Continue with Examples from Prior PPT http://www.engr.uconn.edu/~steve/Cse5810/Attaining-Semantic-Enterprise-Interoperability-through-Ontology-Architectural-Patterns.pdf
SWEA19
Linear Ontology Architectural Pattern (LOAP)
Diagnosis, Test, and Anatomy Ontologies
SWEA20
SWEA21
What has Classic OO Solution Evolved into Today?
Client (Browser + Struts)
Server (Many Variants of OO Languages)
Database Server (typically Relational)
Different Style (e.g., Design Pattern)
Does Pattern Capture All Aspects of Style?
Do we Need to Couple Technology with Pattern?
Dr. D, Jan 01, 08
Fever, Flu, Bed Rest
No Scripts
No Tests
Item(Phy_Name*, Date*,
Visit_Flag, Symptom, Diagnosis, Treatment,
Presc_Flag, Pre_No, Pharm_Name, Medication,
Test_Flag, Test_Code, Spec_No, Status, Tech)
SWEA22
Emerged as the Recognition that in Object-Oriented
Systems Repetitions in Design Occurred
Gained Prominence in 1995 with Publication of
“Design Patterns: Elements of Reusable Object-
Oriented Software”, Addison-Wesley
“… descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context…”
Akin to Complicated Generic
Usage of Patterns Requires
Consistent Format and Abstraction
Common Vocabulary and Descriptions
Simple to Complex Patterns – Wide Range
SWEA23
Utilized to Define a One-to-Many Relationship
Between Objects
When Object Changes State – all Dependents are
Notified and Automatically Updated
Loosely Coupled Objects
When one Object (Subject – an Active Object)
Changes State, then Multiple Objects (Observers –
Passive Objects) are Notified
Observer Object Implements Interface to Specify the Way that Changes are to Occur
Two Interfaces and Two Concrete Classes
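The following is a minimal Python sketch of the two interfaces and two concrete classes. It assumes a hypothetical heart-rate monitor as the subject and a chart display as the observer. The subject knows its observers only through the `Observer` interface, which keeps the coupling loose.

```python
from abc import ABC, abstractmethod


class Observer(ABC):
    """Observer interface: told whenever the subject changes state."""

    @abstractmethod
    def update(self, subject: "Subject") -> None: ...


class Subject(ABC):
    """Subject interface: keeps its dependents and notifies them."""

    def __init__(self) -> None:
        self._observers: list[Observer] = []

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        self._observers.remove(observer)

    def notify(self) -> None:
        for observer in list(self._observers):
            observer.update(self)


class HeartRateMonitor(Subject):
    """Concrete subject (hypothetical): holds the current heart rate."""

    def __init__(self) -> None:
        super().__init__()
        self.heart_rate = 0

    def set_heart_rate(self, bpm: int) -> None:
        self.heart_rate = bpm
        self.notify()  # state changed, so every dependent is updated


class ChartDisplay(Observer):
    """Concrete observer (hypothetical): records each reading it is sent."""

    def __init__(self) -> None:
        self.readings: list[int] = []

    def update(self, subject: Subject) -> None:
        if isinstance(subject, HeartRateMonitor):
            self.readings.append(subject.heart_rate)


monitor = HeartRateMonitor()
chart, pager = ChartDisplay(), ChartDisplay()
monitor.attach(chart)
monitor.attach(pager)
monitor.set_heart_rate(72)
monitor.detach(pager)
monitor.set_heart_rate(80)
print(chart.readings, pager.readings)  # [72, 80] [72]
```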
SWEA24
SWEA25
http://java.sun.com/blueprints/patterns/MVC-detailed.html
SWEA26
Three Parts of the Pattern:
Model
Enterprise Data and Business Rules for Accessing and
Updating Data
View
Renders the Contents (or Portion) of Model
Deals with Presentation of Stored Data
Pull or Push Model Possible
Controller
Translates Interactions with View into Actions on
Model
Actions could be Button Clicks (GUI), Get/Post http
(Web), etc.
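The following is a minimal sketch of the three MVC roles. It assumes a push model, in which the model tells registered views to re-render. The controller receives form data as a plain dictionary in place of a real HTTP POST. All class names and the temperature-range rule are hypothetical.

```python
class PatientModel:
    """Model: enterprise data plus the business rule for updating it."""

    def __init__(self) -> None:
        self._temps_c: list[float] = []
        self._views: list["TemperatureView"] = []

    def register(self, view: "TemperatureView") -> None:
        self._views.append(view)

    @property
    def temperatures(self) -> tuple[float, ...]:
        return tuple(self._temps_c)

    def add_temperature(self, celsius: float) -> None:
        # Hypothetical business rule: reject implausible readings.
        if not 30.0 <= celsius <= 45.0:
            raise ValueError(f"implausible temperature: {celsius}")
        self._temps_c.append(celsius)
        for view in self._views:  # push model: views are told to re-render
            view.refresh(self)


class TemperatureView:
    """View: renders (a portion of) the model for presentation."""

    def __init__(self) -> None:
        self.rendered = ""

    def refresh(self, model: PatientModel) -> None:
        self.rendered = ", ".join(f"{t:.1f} C" for t in model.temperatures)


class TemperatureController:
    """Controller: translates a view interaction (a form post) into a model action."""

    def __init__(self, model: PatientModel) -> None:
        self.model = model

    def handle_post(self, form: dict[str, str]) -> None:
        self.model.add_temperature(float(form["temp"]))


model = PatientModel()
view = TemperatureView()
model.register(view)
controller = TemperatureController(model)
controller.handle_post({"temp": "38.5"})
print(view.rendered)  # 38.5 C
```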
SWEA27
http://java.sun.com/blueprints/patterns/MVC-detailed.html
SWEA28
Unified, Higher-Level Global Interface/System Developed from a Set of Complex, Heterogeneous Source Interfaces/Subsystems
Makes Local Sources Easier for Clients to Utilize
Composition of Pattern
Subsystems
System Composed of Subsystems
Clients
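The following is a minimal sketch of a façade over two heterogeneous subsystems. It assumes hypothetical lab and pharmacy systems that key patients differently: one uses string MRNs and the other uses integer ids. The façade reconciles the identifiers, so clients call a single higher-level method.

```python
class LabSystem:
    """Subsystem 1 (hypothetical): keyed by medical record number strings."""

    _results = {"MRN-001": ["CBC: normal", "Lipid panel: high LDL"]}

    def fetch_results(self, mrn: str) -> list[str]:
        return list(self._results.get(mrn, []))


class PharmacySystem:
    """Subsystem 2 (hypothetical): keyed by an internal integer id."""

    _meds = {42: ["atorvastatin"]}

    def active_prescriptions(self, patient_no: int) -> list[str]:
        return list(self._meds.get(patient_no, []))


class PatientRecordFacade:
    """Facade: one unified, higher-level interface over both subsystems."""

    def __init__(self, lab: LabSystem, pharmacy: PharmacySystem,
                 id_map: dict[str, int]) -> None:
        self._lab = lab
        self._pharmacy = pharmacy
        self._id_map = id_map  # reconciles the two identifier schemes

    def summary(self, mrn: str) -> dict[str, list[str]]:
        patient_no = self._id_map.get(mrn)
        meds = [] if patient_no is None else self._pharmacy.active_prescriptions(patient_no)
        return {"labs": self._lab.fetch_results(mrn), "medications": meds}


facade = PatientRecordFacade(LabSystem(), PharmacySystem(), {"MRN-001": 42})
print(facade.summary("MRN-001"))
```

Clients depend only on `summary`. Each subsystem can keep its own interface and identifier scheme.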
SWEA29
SWEA30
Leverage Façade Pattern for
Local As View (LAV) Methodology
MApping FRAmework (MAFRA) provides a conceptual framework for building semantic mappings between heterogeneous ontology models using semantics bridges
High Level Centralized Ontology Architectural
Patterns (COAP)
Extend Façade Concept
Subsystems are Local Schemas
System is Global Schema
SWEA31
SWEA32
SWEA33
COAP Allows us to Define and Integrate Ontologies at a Much Higher Level
Integrating Multiple Ontologies
(Figure: a Global Ontology Model (O_G) connected through ontology mappings OM_1, OM_2, OM_3, …, OM_N to Local Ontology Models LO_1, LO_2, LO_3, …, LO_N)
SWEA34
Example Unifies ICD, DSM, SNOMED etc.
(Figure: UMLS linking SNOMED-CT (symptoms, procedures, findings, etc.), ICD, OMIM, DSM, and the Gene Ontology, covering diseases, mental disorders, and genes)
SWEA35
Example Unifies ICD, DSM, SNOMED etc.
(Figure: the UMLS Metathesaurus with mappings (M) to ICD (ICD codes), SNOMED, …, NCBI, and LOINC)
SWEA36
(Figure: concentric layers: Core Level, Base Utility, and Useful Systems, with Users on the outside)
Components - Virtual Machine at Each Layer
Connectors - Protocols That Specify How Layers
Interact
Interaction Is Restricted to Adjacent Layers
SWEA37
Advantages:
Increasing Levels of Abstraction
Support Enhancement - New Layers
Support for Reuse
Drawbacks:
Not Feasible for All Systems
Performance Issues With Multiple Layers
Defining Abstractions Is Difficult.
SWEA38
One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice
Construct Layered Data Repositories as Below
Each Layer Targets Different User Group
Need to Fine Tune Access Even within Layers
(Figure: layered repositories of Patient Data, De-identified data, and Aggregated data, serving Providers, Clinical Researchers, and Public Health Researchers)
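The following sketch assumes three layers, each computed only from the layer directly below it. A hypothetical role-to-layer table controls access. The de-identification step (drop the name, keep the first three ZIP digits) is a simplification for illustration, not a complete de-identification method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PatientRecord:
    name: str
    zip_code: str
    age: int
    diagnosis: str


def deidentify(records: list[PatientRecord]) -> list[dict[str, Any]]:
    """De-identified layer: drop the name and coarsen the ZIP code (simplified)."""
    return [{"zip3": r.zip_code[:3], "age": r.age, "diagnosis": r.diagnosis}
            for r in records]


def aggregate(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Aggregated layer: counts per diagnosis, built only from the layer below."""
    return dict(Counter(row["diagnosis"] for row in rows))


# Hypothetical role-to-layer assignment.
LAYER_FOR_ROLE = {
    "provider": "patient",
    "clinical_researcher": "deidentified",
    "public_health_researcher": "aggregated",
}


def query(role: str, records: list[PatientRecord]) -> Any:
    layer = LAYER_FOR_ROLE.get(role)
    if layer is None:
        raise PermissionError(f"no layer defined for role {role!r}")
    if layer == "patient":
        return list(records)
    rows = deidentify(records)
    return rows if layer == "deidentified" else aggregate(rows)


RECORDS = [PatientRecord("Ann Lee", "06269", 54, "flu"),
           PatientRecord("Bo Chen", "06268", 61, "flu"),
           PatientRecord("Cy Diaz", "06030", 40, "asthma")]
print(query("public_health_researcher", RECORDS))  # {'flu': 2, 'asthma': 1}
```

Finer-grained control within a layer, as the slide notes, would need more than one role per layer.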
SWEA39
ISO Open Systems Interconnection (OSI) Model
Now Widely Used as a Reference Architecture
7-Layer Model
Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, …)
(Figure: two communicating hosts, each with the stack Application, Presentation, Session, Transport, Network, Data Link, Physical)
SWEA40
(Figure: two communicating 7-layer OSI stacks)
Physical (Hardware)/Data Link Layer Networks: Ethernet, Token Ring, ATM
Network Layer Net: The Internet
Transport Layer Net: TCP-based Network
Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI
Applications, e.g., WWW, Window System, Algorithm
SWEA41
Consider a Set of Domain Models
(Figure, clinical domain: (a) Clinical ERD Model, (b) Clinical UML Model, and (c) Clinical OWL representation of Disease (Id: Integer, Name: String) hasSymptom Symptom (Id: Integer, Name: String), 0…* to 0…*, plus owl:Class Disease ∩ owl:Class Laboratory Tests)
(Figure, business domain: (d) Business ERD Model, (e) Business UML Model, and (f) Business OWL representation of Customer (cId: Integer, cEmail: String) hasCloudSpace Cloud Space (Space: Integer, Location: String), 0…* to 0…*, Cloud Space cloudAllows Content Allowed (types: Enum), plus owl:Class Customer ∩ owl:Class CloudSpace)
SWEA42
(a) Layered Ontology Architectural Pattern (LaOAP), Top to Bottom:
Query and Web Service
Terminology
Mapping
Axioms & Rules
Ontology Conceptual Model
(b) Instance of LaOAP:
Query and Web Service: Disease Queries
Terminology: Heart Attack, Fever, Cold
Mapping: Disease(id) ~ Disease(uid)
Axiom: Disease ∩ Symptom
Conceptual Model: Disease Ontology Model
SWEA43
Query and Web Service Layer
PREFIX laoap: <http://xmlns.com/Laoap/>
SELECT ?disease ?symp WHERE { ?disease laoap:hasSymptom ?symp }
Terminology Layer
High Fever, Asthma, Heart Attack, John Smith, 50GB
Mapping Layer
(Figure: Illness (id, commonName, severity) mapped to Disease (uid, name, severity))
Axiom & Rules Layer
(Figure: owl:Class Disease ∩ owl:Class Symptom; owl:Class Customer ∩ owl:Class CloudSpace)
Conceptual Model Layer
(Figure: Disease hasSymptom Symptom; CloudSpace cloudAllows ContentAllowed; CloudSpace hasSpace Space)
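The following plain-Python stand-in covers the Mapping and Query layers. It assumes triples held in lists rather than an RDF store and SPARQL engine. The source vocabularies, `PREDICATE_MAP`, and the function names are hypothetical. The mapping layer rewrites a local predicate into the conceptual model's `hasSymptom`, so a single pattern query covers both sources.

```python
Triple = tuple[str, str, str]

# Two hypothetical sources that express the same idea with different vocabularies.
CLINIC_A: list[Triple] = [("HeartAttack", "hasSymptom", "ChestPain")]
CLINIC_B: list[Triple] = [("Flu", "exhibits", "HighFever"), ("Flu", "severity", "mild")]

# Mapping layer (hypothetical): local predicate -> conceptual-model predicate.
PREDICATE_MAP = {"exhibits": "hasSymptom"}


def map_to_conceptual_model(triples: list[Triple]) -> list[Triple]:
    return [(s, PREDICATE_MAP.get(p, p), o) for s, p, o in triples]


def select_pairs(triples: list[Triple], predicate: str) -> list[tuple[str, str]]:
    """Rough analogue of SELECT ?s ?o WHERE { ?s <predicate> ?o }."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


unified = map_to_conceptual_model(CLINIC_A + CLINIC_B)
print(select_pairs(unified, "hasSymptom"))
# [('Flu', 'HighFever'), ('HeartAttack', 'ChestPain')]
```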
SWEA44
SWEA45
SWEA46
(Figure: taxonomy of Ontology Patterns (OP), including Content, Structural, Architectural, Logical, Lexico-Syntactic, Naming, Reasoning, Annotation, Presentation, Correspondence, Reengineering, Mapping, Logical Macro, Transformation, Schema Reengineering, and Refactoring OPs)
Gangemi, A., & Presutti, V. (2009). Ontology Design Patterns. In Handbook on Ontologies: International Handbooks on Information Systems (pp. 221-243). Springer.
SWEA47
(Figure: (a) CODeP Time Indexed Participation Pattern, (b) CODeP Task Role Pattern, and (c) CODeP Participation Pattern, relating Object, Event, Time-Interval, Role, Task, Description, Situation, and Space-Region through relations such as setting-for, temporal-location, participant-in, defines, classifies, and satisfies)
Gangemi, A. (2005). Ontology Design Patterns for Semantic Web Content. Proceedings of the 4th International Semantic Web Conference (pp. 262-276).
SWEA48
(Figure: a Blackboard (shared data) surrounded by knowledge sources ks1 through ks8)
Knowledge Sources Interact With the Blackboard.
Blackboard Contains the Problem Solving State Data.
Control Is Driven by the State of the Blackboard.
DB Systems Are a Form of Repository With a Layer
Between the BB and the KSs - Supports
Concurrent Access, Security, Integrity, Recovery
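The following is a minimal sketch of blackboard control. It assumes knowledge sources are (precondition, action) pairs. The control loop fires the first source whose precondition matches the current board. The sources are deliberately listed out of order: execution order is driven by the state of the blackboard, not by the list. All names and the 38.0 C threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Blackboard = dict[str, Any]


@dataclass
class KnowledgeSource:
    name: str
    can_contribute: Callable[[Blackboard], bool]  # precondition on the board state
    contribute: Callable[[Blackboard], None]      # writes new partial results


def run(board: Blackboard, sources: list[KnowledgeSource], max_steps: int = 100) -> list[str]:
    """Control loop: repeatedly fire the first knowledge source that is ready."""
    trace: list[str] = []
    for _ in range(max_steps):
        ready = [ks for ks in sources if ks.can_contribute(board)]
        if not ready:
            break  # nothing more can be inferred from the current state
        ready[0].contribute(board)
        trace.append(ready[0].name)
    return trace


# Hypothetical knowledge sources, listed out of execution order on purpose.
SOURCES = [
    KnowledgeSource("classify",
                    lambda b: "temp_c" in b and "fever" not in b,
                    lambda b: b.update(fever=b["temp_c"] >= 38.0)),
    KnowledgeSource("flag",
                    lambda b: b.get("fever") is True and "alert" not in b,
                    lambda b: b.update(alert="notify provider")),
    KnowledgeSource("parse",
                    lambda b: "raw" in b and "temp_c" not in b,
                    lambda b: b.update(temp_c=float(b["raw"]))),
]

board: Blackboard = {"raw": "38.6"}
print(run(board, SOURCES), board["alert"])  # ['parse', 'classify', 'flag'] notify provider
```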
SWEA49
(Figure: a Database (shared data) surrounded by clients c1 through c8)
Clients Interact With the DBMS
Database Contains the Problem Solving State Data
Control is Driven by the State of the Database
Concurrent Access, Security, Integrity, Recovery
Single Layer System: Clients have Direct Access
Control of Access to Information must be
Carefully Defined within DB Security/Integrity
SWEA50
(Figure: a Web Portal over shared data, surrounded by clients c1 through c8)
Clients are Providers, Patients, Clinical Researchers
Database Underlies Web Portal
Simply a Portion of Architecture
Interactions with PHR (Patients)
Interactions with EMR (Providers)
Interactions with Database/Warehouse (Researchers)
SWEA51
(Figure: a Virtual Chart surrounded by clients c1 through c8)
Clients are Providers, Patients, Clinical Researchers
SWEA52
(Figure: an interpreter takes Inputs and produces Outputs; its memory holds the Program being interpreted, its Data (program state), and the Internal interpreter state, while a Simulated interpretation engine works on the Selected instruction and Selected data)
What Are Components and Connectors?
Where Have Interpreters Been Used in CS&E?
LISP, ML, Java, Other Languages, OS Command Line
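The following tiny stack-machine interpreter maps onto the figure's parts. It assumes a hypothetical six-instruction set. The program being interpreted is a list of tuples. The data (program state) is a dictionary plus a stack. The program counter is the internal interpreter state.

```python
from typing import Any

Instruction = tuple[Any, ...]


def interpret(program: list[Instruction], inputs: dict[str, int]) -> list[int]:
    """Tiny stack-machine interpreter (hypothetical instruction set)."""
    data = dict(inputs)        # data: the interpreted program's variables
    stack: list[int] = []      # data: operand stack
    outputs: list[int] = []
    pc = 0                     # internal interpreter state: program counter
    while pc < len(program):
        op, *args = program[pc]          # select the next instruction
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(data[args[0]])
        elif op == "store":
            data[args[0]] = stack.pop()
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
        elif op == "print":
            outputs.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op!r}")
        pc += 1
    return outputs


# (x + y) * 2
PROGRAM = [("load", "x"), ("load", "y"), ("add",), ("push", 2), ("mul",), ("print",)]
print(interpret(PROGRAM, {"x": 3, "y": 4}))  # [14]
```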
SWEA53
SWEA54
(Figure, with feedback: a Controller reads the Set point and the Controlled variable and sends Δs to the manipulated variables of a Process that takes Input variables)
(Figure, without feedback: a Controller reads the Set point and the Input variables and sends Δs to the manipulated variables of the Process)
Also:
Open vs. Closed Loop Systems
Well-Defined Control and Computational Characteristics
Heavily Used in Engineering Fields.
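The following toy simulation contrasts the two loops. It assumes a process whose controlled variable drifts by a constant disturbance each step, and a proportional controller. The gain, disturbance, and starting value are hypothetical. The closed-loop controller settles at a constant offset of |disturbance|/gain below the set point, a known property of purely proportional control. The open-loop plan ignores the drift and ends far from the set point.

```python
def run_process(set_point: float, steps: int, feedback: bool,
                start: float = 20.0, gain: float = 0.5,
                disturbance: float = -1.0) -> float:
    """Toy process: each step the controller sends a delta u to the manipulated
    variable, and the controlled variable also drifts by `disturbance`."""
    value = start
    open_loop_u = (set_point - start) / steps  # planned from the set point only
    for _ in range(steps):
        if feedback:
            u = gain * (set_point - value)  # closed loop: uses the measured value
        else:
            u = open_loop_u                 # open loop: never observes the process
        value = value + u + disturbance
    return value


closed = run_process(37.0, 50, feedback=True)
opened = run_process(37.0, 50, feedback=False)
print(round(closed, 3), round(opened, 3))
```

With these numbers the closed loop converges to 35 (an offset of 1.0/0.5 = 2), while the open loop drifts to about -13. Without the disturbance, the open-loop plan would have reached 37.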
SWEA55
SWEA56
Clear Applicability to Medical Processes that have
Underlying BMI – Low Level Processes
(Figure: monitoring state machine with states Waiting for Heart Signal and Waiting for Resp. Signal; events Heartbeat and Breath; an irregular beat triggers a Local Alarm, a timeout triggers a Remote Alarm, and an Alarm Reset clears them)
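The following transition-table sketch encodes one plausible reading of the figure. It assumes these transitions:

- A heartbeat moves the monitor to waiting for a respiration signal.
- A breath returns it to waiting for a heartbeat.
- An irregular beat raises a local alarm.
- A timeout raises a remote alarm.
- An alarm reset returns to the start.

The state and event names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_HEART_SIGNAL = auto()
    WAITING_FOR_RESP_SIGNAL = auto()
    LOCAL_ALARM = auto()
    REMOTE_ALARM = auto()


# (current state, event) -> next state; an assumed reading of the figure.
TRANSITIONS = {
    (State.WAITING_FOR_HEART_SIGNAL, "heartbeat"): State.WAITING_FOR_RESP_SIGNAL,
    (State.WAITING_FOR_HEART_SIGNAL, "irregular_beat"): State.LOCAL_ALARM,
    (State.WAITING_FOR_HEART_SIGNAL, "timeout"): State.REMOTE_ALARM,
    (State.WAITING_FOR_RESP_SIGNAL, "breath"): State.WAITING_FOR_HEART_SIGNAL,
    (State.WAITING_FOR_RESP_SIGNAL, "timeout"): State.REMOTE_ALARM,
    (State.LOCAL_ALARM, "alarm_reset"): State.WAITING_FOR_HEART_SIGNAL,
    (State.REMOTE_ALARM, "alarm_reset"): State.WAITING_FOR_HEART_SIGNAL,
}


def run_monitor(events: list[str],
                state: State = State.WAITING_FOR_HEART_SIGNAL) -> list[State]:
    """Feed events to the monitor and return the sequence of states visited."""
    history = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)  # unlisted events are ignored
        history.append(state)
    return history


print(run_monitor(["heartbeat", "breath", "heartbeat", "timeout"])[-1])  # State.REMOTE_ALARM
```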
SWEA57
Widespread use in Practice for All Types of
Distributed Systems and Applications
Two Kinds of Components
Servers: Provide Services - May be Unaware of
Clients
Web Servers (unaware?)
Database Servers and Functional Servers (aware?)
Clients: Request Services from Servers
Must Identify Servers
May Need to Identify Self
A Server Can be Client of Another Server
Expanding from Micro-Architectures (Single
Computer/One Application) to Macro-Architecture
SWEA58
Normally, Clients and Servers are Independent
Processes Running in Parallel
Connectors Provide Means for Service Requests and
Answers to be Passed Among Clients/Servers
Connectors May be RPC, RMI, etc.
Advantages
Parallelism, Independence
Separation of Concerns, Abstraction
Others?
Disadvantages
Complex Implementation Mechanisms
Scalability, Correctness, Real-Time Limits
Others?
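The following is a minimal in-process sketch of the client/server style. It assumes a `queue.Queue` plays the role of the connector (in place of RPC or RMI) and a thread plays the server process. Each request carries its own reply queue, so the server answers without knowing who its clients are. `DIRECTORY` and the operation names are hypothetical.

```python
import queue
import threading

# Hypothetical server-side data.
DIRECTORY = {"MRN-001": "Ann Lee", "MRN-002": "Bo Chen"}


def server(requests: queue.Queue) -> None:
    """Server: answers whoever asks; it never needs to know its clients."""
    while True:
        item = requests.get()
        if item is None:            # shutdown sentinel
            break
        op, arg, reply_to = item
        if op == "lookup":
            reply_to.put(DIRECTORY.get(arg, "unknown"))
        else:
            reply_to.put(f"error: unsupported operation {op!r}")


def call(requests: queue.Queue, op: str, arg: str, timeout: float = 5.0) -> str:
    """Client stub: send a request through the connector and wait for the answer."""
    reply_to: queue.Queue = queue.Queue(maxsize=1)
    requests.put((op, arg, reply_to))
    return reply_to.get(timeout=timeout)


def demo() -> list[str]:
    requests: queue.Queue = queue.Queue()
    worker = threading.Thread(target=server, args=(requests,), daemon=True)
    worker.start()
    answers = [call(requests, "lookup", "MRN-001"),
               call(requests, "lookup", "MRN-999"),
               call(requests, "delete", "MRN-001")]
    requests.put(None)
    worker.join(timeout=5.0)
    return answers


print(demo())
```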
SWEA59
(Figure: Initial Data Entry Operator (Scanning & Posting), Advanced Data Entry Operators, Analyst, and Manager workstations on a 10-100MB network with a Document Server (stored images/CD), a Database Server running Oracle, and a Functional Server hosting an RMI Registry and RMI Act. Obj/Servers)
SWEA60
(Figure: Licensing Division workflow linking a Scanning Operator and Scanner, a Data Entry Operator, and Supervisor Review with databases (Historical Records, Completed Applications, Supervisor Review, Stored Images, Basic Information Entered) and a Printer producing New Licenses, New Appointments, FOI, and Letters (Request Information, etc.))
SWEA61
Small Manufacturer Previously on C++
New Order Entry, Inventory, and Invoicing
Applications in Java Programming Language
Existing Customer and Order Database
Most of Business Logic in Stored Procedures
Tool-generated GUI Forms for Java Objects
SWEA62
Passenger Check-in for Regional Airline
Local Database for Seating on Today's Flights
Clients Invoke EJBs at Local Site Through RMI
EJBs Update Database and Queue Updates
JMS Queues Updates to Legacy System
JDBC API Used to Access Local Database
SWEA63
Web Access to Brokerage Accounts
Only HTML Browser Required on Front End
"Brokerbean" EJB Provides Business Logic
Login, Query, Trade Servlets Call Brokerbean
Use JNDI to Find EJBs, RMI to Invoke Them
SWEA64
Two-tier Through JDBC API is Simplest
Multi-tier: Separate Business Logic, Protect Database
Integrity, More Scalable
JMS Queues vs. Synchronous (RMI or IDL):
Availability, Response Time, Decoupling
JMS Publish & Subscribe: Off-line Notification
RMI-IIOP vs. JRMP vs. Java IDL:
Standard Cross-language Calls or Full Java
Functionality
JTS: Distributed Integrity, Lockstep Actions
SWEA65
Architectural Styles Provide Patterns
Suppose Designing a New System
During Requirements Discovery, Behavior and
Structure of System Will Emerge
Attempt to Match to Architectural Style
Modify, Extend Style as Needed
By Choosing Existing Architectural Style
Know Advantages and Disadvantages
Ability to Focus in on Problem Areas and
Bottlenecks
Can Adjust Architecture Accordingly
Architectures Range from Large Scale to Small Scale in their Applicability
We’ll see Examples for BMI Shortly …
SWEA66
Macro-Architectures
System of Systems
Application of Applications
Particularly for HIT and HIE!
Involves Two Key Issues
Interoperability
Heterogeneous Distributed Databases
Heterogeneous Distributed Systems
Autonomous Applications
Scalability
Rapid and Continuous Growth
Amount of Data
Variety of Data Types
Different Privacy Levels or Ownerships of Data
SWEA67
(Figure: Simple Federation: Local Schemas combined by Federated Integration into an FDB Global Schema. Multiple Nested Federation: federations such as FDB 1 and FDB3, each built over Local Schemas, combined by Federated Integration into FDB Global Schema 4)
SWEA68
Technology
Web/HTTP, JDBC/ODBC, CORBA (ORBs +
IIOP), XML
Architecture
Information Broker
Mediator-Based Systems
Agent-Based Systems
SWEA69
(Figure: Browser → Internet → Web Server → CGI Script Invocation or JDBC Invocation → DBMS)
Web Servers are Stateless
DB Interactions Tend to be Stateful
Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open
SWEA70
(Figure: Browser → Internet → Web Server → CGI Script or JDBC Invocation → Helper Processes → DBMS)
To Avoid the Cost of Opening the Database, One can Use Helper Processes that Always Keep the Database Open and Outlive the Web Connection
Newly Invoked CGI Scripts Connect to a Preexisting Helper Process
System is Still Stateless
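The following sketch shows why helper processes pay off, assuming that opening the database is the expensive step. Here an "open" is simulated by connecting to an in-memory SQLite database and seeding a tiny table. The helper is an ordinary object rather than a separate OS process. `STATS`, `open_database`, and the table contents are hypothetical. The counter shows one open per CGI-style request versus one open for the helper's whole lifetime.

```python
import sqlite3
from typing import Optional

STATS = {"opens": 0}


def open_database() -> sqlite3.Connection:
    """Stand-in for an expensive DB open: connect and seed a tiny table."""
    STATS["opens"] += 1
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient (mrn TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO patient VALUES (?, ?)",
                     [("MRN-001", "Ann Lee"), ("MRN-002", "Bo Chen")])
    return conn


def lookup(conn: sqlite3.Connection, mrn: str) -> Optional[str]:
    row = conn.execute("SELECT name FROM patient WHERE mrn = ?", (mrn,)).fetchone()
    return row[0] if row else None


def cgi_style_request(mrn: str) -> Optional[str]:
    """One short-lived script per request: pays for an open every time."""
    conn = open_database()
    try:
        return lookup(conn, mrn)
    finally:
        conn.close()


class HelperProcess:
    """Long-lived helper: opens once and outlives individual web requests."""

    def __init__(self) -> None:
        self._conn = open_database()

    def request(self, mrn: str) -> Optional[str]:
        return lookup(self._conn, mrn)  # no per-user session state is kept


before = STATS["opens"]
print([cgi_style_request(m) for m in ("MRN-001", "MRN-002", "MRN-001")],
      STATS["opens"] - before)  # ['Ann Lee', 'Bo Chen', 'Ann Lee'] 3
```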
SWEA71
(Figure: WWW clients (Netscape, Internet Explorer, HotJava) reach an HTTP Server over the Internet; a DBWeb Dispatcher routes requests to multiple DBWeb Gateways)
SWEA72
Transcend Normal Two, Three, and Four Tier Solutions –
Macro-Architecture
Emerging Standards
FHIR, SMART, Open mHealth
An Architecture of Architectures!
Need to Integrate Systems that are Themselves Multi-Tier and Distributed
Need to Resolve Data Ownership Issues
State of Connecticut Agencies Don’t Share
Competing Hospitals Seek to Protect Market Share
T1, T2, and Clinical Research Requires
Interoperating Genomic Databases/Supercomputers
Integration of De-identified Patient Data from Multiple Sources to
Allow Sufficient Study Samples
De-identified Data Repositories or Data Marts
Dealing with Ownership Issues (DNA Research)
SWEA73
A Major Opportunity for Business
A Global Marketplace
Business Across State and Country Boundaries
A Way of Extending Services
Online Payment vs. VISA, Mastercard
A Medium for Creation of New Services
Publishers, Travel Agents, Teller, Virtual Yellow
Pages, Online Auctions …
A Boon for Academia
Research Interactions and Collaborations
Free Software for Classroom/Research Usage
Opportunities for Exploration of Technologies in
Student Projects
What are Implications for BMI, HIE?
SWEA74
(Figure: corporate network servers supporting an Intranet (decision support, manufacturing system monitoring, corporate repositories, workgroups), Business-to-Business links to a Provider Network (information sharing, ordering info./status, targeted electronic commerce), and Internet exposure to the outside (sales, marketing, information services))
SWEA75
Everyone can Publish Information on the Web
Independently at Any Time
Consequently, there is an Information Explosion
Identifying Information Content More Difficult
There are too Many Search Engines but too Few
Capable of Returning High Quality Data
Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes
What are Information Delivery Issues for BMI?
Publishing of Patient Education Materials
Publishing of Provider Education Materials
How Can Patients/Providers Find What they Need?
How do they Know if it's Relevant? Reputable?
SWEA76
Scenario 1: World Wide Wait
A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
You Want to Monitor the Results for this
Important Event, so you Fire up your Trusty Web
Browser, Pointing at the Result Posting Site, and
Wait, and Wait, and Wait …
What is the Problem?
The Scalability Problems are the Result of a
Mismatch Between the Data Access Characteristics of the Application and the Technology Used to
Implement the Application
May not be Relevant to BMI: Hard to Apply Scenario
SWEA77
Scenario 2:
Many Applications Today have the Need for
Tracking Changes in Local and Remote Data
Sources and Notifying Changes If Some Condition
Over the Data Source(s) is Met
To Monitor Changes on the Web, You Need to Fire up
Your Trusty Web Browser from Time to Time,
Cache the Most Recent Result, and Difference
Manually Each Time You Poll the Data Source(s)
Issue: Pure Pull is Not the Answer to All Problems
BMI: If a Patient Enters Data that Sets off a Chain
Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)
SWEA78
Applications are Asymmetric but the Web is Not
Computation Centric vs. Information Flow Centric
Type of Asymmetry
Network Asymmetry
Satellite, CATV, Mobile Clients, Etc.
Client to Server Ratio
Too Many Clients can Swamp Servers
Data Volume
Mouse and Key Click vs. Content Delivery
Update and Information Creation
Clients Need to be Informed or Must Poll
Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification
FHIR and Moving to a Mobile-Dominated World
SWEA79
Pull-Based System
Transfer of Data from Server to Client is Initiated by a Client Pull
Clients Determine when to Get Information
Potential for Information to be Old Unless Client
Periodically Pulls
Push-Based System
Transfer of Data from Server to Client is Initiated by a Server Push
Clients may get Overloaded if Push is Too
Frequent
Hybrid
Pull and Push Combined
Pull First and then Push Continually
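The following is a minimal sketch of the hybrid approach. It assumes a hypothetical server that both answers pulls and pushes each new version to its listeners. The client subscribes before it pulls, so no update can slip between the two steps, and it ignores versions it has already seen.

```python
from typing import Any, Callable

Listener = Callable[[int, Any], None]


class Server:
    """Holds the latest value; supports both pull and push (hypothetical API)."""

    def __init__(self) -> None:
        self.version = 0
        self.value: Any = None
        self._listeners: list[Listener] = []

    def pull(self) -> tuple[int, Any]:
        return self.version, self.value

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, value: Any) -> None:
        self.version += 1
        self.value = value
        for listener in list(self._listeners):  # server-initiated push
            listener(self.version, value)


class HybridClient:
    """Pull first, then receive pushes; stale or duplicate versions are ignored."""

    def __init__(self, server: Server) -> None:
        self.version, self.value = 0, None
        server.subscribe(self.on_update)  # register before pulling so nothing is missed
        self.on_update(*server.pull())

    def on_update(self, version: int, value: Any) -> None:
        if version > self.version:
            self.version, self.value = version, value


server = Server()
server.publish("BP 120/80")
pull_only = server.pull()        # a pure-pull client's snapshot goes stale
hybrid = HybridClient(server)
server.publish("BP 150/95")
print(pull_only, hybrid.value)  # (1, 'BP 120/80') BP 150/95
```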
SWEA80
Semantics: Servers Publish/Clients Subscribe
Servers Publish Information Online
Clients Subscribe to the Information of Interest
(Subscription-based Information Delivery)
Data Flow is Initiated by the Data Sources
(Servers) and is Aperiodic
Danger: Subscriptions can Lead to Other
Unwanted Subscriptions
Applications
Unicast: Database Triggers and Active Databases
1-to-n: Online News Groups
May work for Clinical Researcher to Provider Push
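The following is a minimal topic-based publish/subscribe sketch. It assumes in-process callbacks instead of a network transport, and the class, method, and topic names are hypothetical. Delivery is started by the publisher and happens only when something is published. This matches the aperiodic, source-initiated flow described above.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str], None]


class PubSub:
    """Topic-based publish/subscribe: delivery is initiated by the publisher."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].remove(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to the topic's current subscribers; return how many."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = PubSub()
provider_inbox: list[str] = []
bus.subscribe("trials/asthma", provider_inbox.append)
delivered = bus.publish("trials/asthma", "New asthma study is recruiting")
print(delivered, provider_inbox)  # 1 ['New asthma study is recruiting']
```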
SWEA81
Three Types of Nodes:
Data Sources
Provide Base Data which is to be Disseminated
Clients
Who are the Net Consumers of the Information
Information Brokers
Acquire Information from Other Data Sources, Add
Value to that Information and then Distribute this
Information to Other Consumers
By Creating a Hierarchy of Brokers, Information
Delivery can be Tailored to the Need of Many Users
Brokers may be Ideal Intermediaries for BMI!
Act on Behalf of Patients, Providers
Incorporate Secure Access
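The following sketches a two-level broker hierarchy, assuming in-process callbacks. Every name is hypothetical, and the 0.04 cut-off is illustrative, not clinical guidance. Each broker filters what it receives, adds value (a flag or a patient-friendly message), and redistributes the result to its own subscribers.

```python
from typing import Any, Callable

Item = dict[str, Any]
Listener = Callable[[Item], None]


class Node:
    """Anything that can emit items to subscribed listeners."""

    def __init__(self) -> None:
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def emit(self, item: Item) -> None:
        for listener in list(self._listeners):
            listener(item)


class DataSource(Node):
    """Provides base data to be disseminated."""

    def publish(self, item: Item) -> None:
        self.emit(item)


class Broker(Node):
    """Acquires items upstream, filters and adds value, then redistributes."""

    def __init__(self, upstream: Node, accept: Callable[[Item], bool],
                 enrich: Callable[[Item], Item]) -> None:
        super().__init__()
        self._accept = accept
        self._enrich = enrich
        upstream.subscribe(self._on_item)

    def _on_item(self, item: Item) -> None:
        if self._accept(item):
            self.emit(self._enrich(item))


labs = DataSource()
cardiology = Broker(labs, accept=lambda r: r["test"] == "troponin",
                    enrich=lambda r: {**r, "flag": r["value"] > 0.04})
patient_portal = Broker(cardiology, accept=lambda r: r["flag"],
                        enrich=lambda r: {"patient": r["patient"],
                                          "message": "Please contact your provider"})
provider_inbox: list[Item] = []
patient_inbox: list[Item] = []
cardiology.subscribe(provider_inbox.append)
patient_portal.subscribe(patient_inbox.append)

labs.publish({"patient": "P1", "test": "troponin", "value": 0.10})
labs.publish({"patient": "P2", "test": "cbc", "value": 5.0})
labs.publish({"patient": "P3", "test": "troponin", "value": 0.01})
print(len(provider_inbox), patient_inbox)
```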
SWEA82
Ubiquitous/Pervasive
Many computers and information appliances everywhere, networked together
Inherent Complexity:
Coping with Latency (Sometimes
Unpredictable)
Failure Detection and Recovery
(Partial Failure)
Concurrency, Load Balancing,
Availability, Scale
Service Partitioning
Ordering of Distributed Events
“Accidental” Complexity:
Heterogeneity: Beyond the Local
Case: Platform, Protocol, Plus All
Local Heterogeneity in Spades.
Autonomy: Change and Evolve
Autonomously
Tool Deficiencies: Language
Support (Sockets, RPC),
Debugging, Etc.
SWEA83
Problem: Too Many Sources, Too Much Information
(Figure: Infopipes turn the Internet "Information Jungle" (including Digital Earth and Sensors) into Clean, Reliable, Timely Information, Anywhere, through Personalized Filtering & Info. Delivery)
SWEA84
(Figure: Thin Client, Web Server, Mainframe, and Database Server)
SWEA85
(Figure: Infotaps & Fat Clients, Sensors, a Variety of Servers, Many Sources, and a Database Server)
SWEA86
Heterogeneity:
How Much can we Really Integrate?
Syntactic Integration
Different Formats and Models
Web/SQL Query Languages
Semantic Interoperability
Basic Research on Ontology, Etc
Autonomy
No Central DBA on the Net
Independent Evolution of Schema and Content
Interoperation is Voluntary
Interface Technology (Support for ISVs)
DCOM: Microsoft Standard
CORBA, Etc...
SWEA87
Security
System Security in the Broad Sense
Attacks: Penetrations, Denial of Service
System (and Information) Survivability
Security Fault Tolerance
Replication for Performance, Availability, and
Survivability
Data Quality
Web Data Quality Problems
Local Updates with Global Effects
Unchecked Redundancy (Mutual Copying)
Registration of Unchecked Information
Spam on the Rise
SWEA88
Data Warehousing
Provide Access to Data for Complex Analysis,
Knowledge Discovery, and Decision Making
Underlying Infrastructure in Support of Mining
Provides Means to Interact with Multiple DBs
OLAP (on-Line Analytical Processing) vs. OLTP
Data Mining – Role in BMI and Healthcare?
Discovery of Information in Vast Data Sets
Search for Patterns and Common Features based
Discover Information not Previously Known
Medical Records Accessible Nationwide
Research/Discover Cures for Rare Diseases
Relies on Knowledge Discovery in DBs (KDD)
SWEA89
A Data Warehouse
Database is Maintained Separately from an
Operational Database
“A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of
Management’s Decision Making Process”
[W.H. Inmon]
OLAP (on-Line Analytical Processing)
Analysis of Complex Data in the Warehouse
Attempt to Attain “Value” through Analysis
Relies on Trained, Skilled Knowledge
Workers who Discover Information
Data Mart
Organized Data for a Subset of an Organization
Establish De-Identified Marts for BMI Research
SWEA90
Option 1
Leverage Existing
Repositories
Collate and Collect
May Not Capture All
Relevant Data
Option 2
Start from Scratch
Utilize Underlying
Corporate Data
(Figure: Option 1 consolidates several Data Marts into a Corporate Data Warehouse; Option 2 builds the warehouse from scratch out of Corporate Data)
SWEA91
Clinical and Epidemiological Research (and for T2 and T1)
Each Study Submitted to Institutional Review Board (IRB)
For Human Subjects (Assess Risks, Protect Privacy)
See: http://resadm.uchc.edu/hspo/irb/
To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to
Create a Data Mart for each Approved Study
Export/Excerpt Study Data from Warehouse
May be Single or Multiple Sources
(Figure: a BMI Data Warehouse feeding multiple study-specific Data Marts)
SWEA92
Utilizes a “Multi-Dimensional” Data Model
Warehouse Comprised of
Store of Integrated Data from Multiple Sources
Processed into Multi-Dimensional Model
Warehouse Supports
Time Series and Trend Analysis
“Super-Excel” Integrated with DB Technologies
Data is Less Volatile than Regular DB
Doesn’t Dramatically Change Over Time
Updates at Regular Intervals
Specific Refresh Policy Regarding Some Data
SWEA93
(Figure: External Data Sources and Operational Databases feed Extract, Transform, Load, and Refresh steps (with Metadata, Monitor, and Integrator) into a Data Warehouse Server and Data Marts; an OLAP Server then supports Query/Report, Summarization Reports, and Data Mining)
SWEA94
Most Data Warehouses use a Star Schema to
Represent the Multi-Dimensional Data Model
Each Dimension is Represented by a Dimension
Table that Describes it
A Fact Table Connects All Dimension Tables with a
Multiple Join
Each Tuple in the Fact Table Represents One Fact (e.g., One Sale), Locates it by its Multidimensional Coordinates, and Stores the Measures for those Coordinates
Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimension Tables
Links Between the Fact Table and the Dimension
Tables Form a Shape Like a Star
SWEA95
Representation of Information in Two or More Dimensions
Typical Two-Dimensional - Spreadsheet
In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful
For BMI – Axes for Diagnosis, Drug, Subject Age
SWEA96
CSE
5810
Supporting Multi-Dimensional Schemas Requires Two Types of Tables:
Dimension Table: Tuples of Attributes for Each Dimension
Fact Table: Measured/Observed Variables with Pointers into the Dimension Tables
Star Schema
Characterizes Data Cubes by having a Single Fact Table with a Single Table for Each Dimension
Snowflake Schema
Dimension Tables from the Star Schema are Organized into a Hierarchy via Normalization
Both Represent Storage Structures for Cubes
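The snowflake variant can be sketched by normalizing a Store dimension into a City and State hierarchy. This is an illustrative SQLite schema under assumptions: the table names and the state-to-region mapping (MO as Central, CA as West) are hypothetical; the three sales amounts reuse the store rows from the Data Cube example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the Store dimension is normalized into a City -> State hierarchy
CREATE TABLE state_dim (state_id INTEGER PRIMARY KEY, state TEXT, region TEXT);
CREATE TABLE city_dim  (city_id INTEGER PRIMARY KEY, city TEXT,
                        state_id INTEGER REFERENCES state_dim);
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY,
                        city_id INTEGER REFERENCES city_dim);
CREATE TABLE sale_fact (store_id INTEGER REFERENCES store_dim, dollar_sales REAL);

INSERT INTO state_dim VALUES (1, 'MO', 'Central'), (2, 'CA', 'West');
INSERT INTO city_dim  VALUES (1, 'Rolla', 1), (2, 'Cuba', 1), (3, 'Los Angeles', 2);
INSERT INTO store_dim VALUES (10, 1), (11, 2), (12, 3);
INSERT INTO sale_fact VALUES (10, 325.24), (11, 81.99), (12, 833.92);
""")

# Reaching a higher level (region) now takes a chain of joins through the
# hierarchy instead of one join to a single denormalized Store table.
SALES_BY_REGION = """
SELECT st.region, ROUND(SUM(f.dollar_sales), 2)
FROM sale_fact f
JOIN store_dim s  ON f.store_id = s.store_id
JOIN city_dim c   ON s.city_id = c.city_id
JOIN state_dim st ON c.state_id = st.state_id
GROUP BY st.region
ORDER BY st.region
"""
rows = conn.execute(SALES_BY_REGION).fetchall()
print(rows)
```

The trade-off shown is the usual one: less redundancy in the dimension tables, at the cost of extra joins per query.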
SWEA97
CSE
5810
Figure: Star Schema for Sales
Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
Date Dimension: Date, Month, Year
Store Dimension: StoreID, City, State, Country, Region
Product Dimension: ProductNo, ProdName, ProdDesc, Category
Customer Dimension: CustID, CustName, CustCity, CustCountry
SWEA98
CSE
5810
Figure: Star Schema for Patient Visits
Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
Date Dimension: Date, Month, Year
Symptoms Dimension: Pulmonary, Heart, Mus-Skel, Skin, Digestive
Vitals Dimension: BP, Temp, Resp, HR (Pulse)
Patient Dimension: PatientID, PatientName, PatientCity, PatientCountry
Medications: Reference another Star Schema for all Meds
SWEA99
CSE
5810
SWEA100
CSE
5810
SWEA101
CSE
5810
Data Acquisition
Extraction from Heterogeneous Sources
Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent
Data Cleaning for Validity and Quality: Is the Data as Expected w.r.t. Content? Value?
Transition of Data into Data Model of Warehouse
Loading of Data into the Warehouse
Other Issues Include:
How Current is the Data? Frequency of Update?
Availability of Warehouse? Dependencies of Data?
Distribution, Replication, and Partitioning Needs?
Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
For CTSA – Data Ownership (Competing Hospitals).
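The acquisition steps above (extract from heterogeneous sources, reformat names and domains consistently, clean, load) can be sketched in a few lines of Python. Everything here is hypothetical: the two source layouts, their field names (`pt_id`, `temp_f`, `PatientID`, `TempC`, `VisitDate`), and the cleaning rule that a body temperature must fall between 30 and 45 degrees Celsius.

```python
import sqlite3
from datetime import datetime

# Two hypothetical source extracts that name and format the same facts differently
source_a = [{"pt_id": "17", "temp_f": "98.6", "visit": "2008-01-16"}]
source_b = [{"PatientID": "42", "TempC": "37.5", "VisitDate": "01/17/2008"},
            {"PatientID": "43", "TempC": "75.0", "VisitDate": "01/18/2008"}]

def reformat_a(r):
    # Map source A's names/units onto the warehouse's (Celsius, ISO dates)
    return {"patient_id": int(r["pt_id"]),
            "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1),
            "visit_date": r["visit"]}

def reformat_b(r):
    return {"patient_id": int(r["PatientID"]),
            "temp_c": float(r["TempC"]),
            "visit_date": datetime.strptime(r["VisitDate"], "%m/%d/%Y").date().isoformat()}

def is_valid(rec):
    # Assumed cleaning rule: temperature must be physiologically plausible
    return 30.0 <= rec["temp_c"] <= 45.0

rows = [reformat_a(r) for r in source_a] + [reformat_b(r) for r in source_b]
clean = [r for r in rows if is_valid(r)]
rejected = [r for r in rows if not is_valid(r)]  # kept for review, not loaded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vitals (patient_id INTEGER, temp_c REAL, visit_date TEXT)")
conn.executemany("INSERT INTO vitals VALUES (:patient_id, :temp_c, :visit_date)", clean)
# Index creation is part of the loading time listed above
conn.execute("CREATE INDEX idx_vitals_patient ON vitals(patient_id)")
```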
SWEA102
CSE
5810
Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully
Knowledge Discovery
Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set
Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from a Large Collection of Data
Data Mining
A Critical Step in the Knowledge Discovery Process
Extracts Implicit Information from a Large Data Set
SWEA103
CSE
5810
Learning the Application Domain (goals)
Gathering and Integrating Data
Data Cleaning
Data Integration
Data Transformation/Consolidation
Data Mining
Choosing the Mining Method(s) and Algorithm(s)
Mining: Search for Patterns or Rules of Interest
Analysis and Evaluation of the Mining Results
Use of Discovered Knowledge in Decision Making
Important Caveats
This is Not an Automated Process!
Requires Significant Human Interaction!
SWEA104
CSE
5810
OLAP Strategies
Roll-Up: Summarization of Data
Drill-Down: from the General to Specific (Details)
Pivot: Cross Tabulate the Data Cubes
Slice and Dice: Projection Operations Across Dimensions
Sorting: Ordering Result Sets
Selection: Access by Value or Value Range
Implementation Issues
Persistent with Infrequent Updates (Loading)
Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes
Recovery Less Critical - Mostly Read Only
Temporal Aspects of Data (Versions) Important
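Of the OLAP strategies listed, Pivot (cross-tabulation) can be sketched in plain Python as regrouping flat facts into a rows-by-columns grid. The `(product, region, sales)` facts and the `pivot` helper are hypothetical illustrations, not a specific OLAP server's API.

```python
from collections import defaultdict

# Hypothetical (product, region, sales) facts, already summarized over time
facts = [("Beer", "West", 120), ("Beer", "East", 80), ("Nuts", "West", 30),
         ("Nuts", "East", 45), ("Beer", "West", 10)]

def pivot(facts):
    """Cross-tabulate: products become rows, regions become columns."""
    table = defaultdict(lambda: defaultdict(int))
    for product, region, sales in facts:
        table[product][region] += sales  # cells with repeated coordinates are summed
    columns = sorted({region for _, region, _ in facts})
    return columns, {p: [table[p][c] for c in columns] for p in sorted(table)}

columns, rows = pivot(facts)
for product, cells in rows.items():
    print(product, dict(zip(columns, cells)))
```

Swapping which coordinate becomes the row key and which becomes the column key is exactly the "rotation" that a pivot performs.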
SWEA105
CSE
5810
Data Cube
A Multidimensional Array
Each Attribute is a Dimension
In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date
Product      Store      Date     Sale
acron        Rolla,MO   7/3/99   325.24
budwiser     LA,CA      5/22/99  833.92
large pants  NY,NY      2/12/99  771.24
3’ diaper    Cuba,MO    7/30/99  81.99
Figure: Data Cube with Axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April)
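"Interpreting" the raw rows so they can be aggregated by Region/Product/Date amounts to mapping each raw value onto a cube coordinate. A minimal sketch, assuming hypothetical lookup tables: the product-to-category mapping (e.g. acron as Nuts) and the state-to-region mapping are guesses for illustration only, and the cube is a plain dict keyed by `(region, category, month)`.

```python
from collections import defaultdict
from datetime import datetime

raw = [("acron", "Rolla,MO", "7/3/99", 325.24),
       ("budwiser", "LA,CA", "5/22/99", 833.92),
       ("large pants", "NY,NY", "2/12/99", 771.24),
       ("3’ diaper", "Cuba,MO", "7/30/99", 81.99)]

# Hypothetical lookup tables that interpret raw values as cube coordinates
CATEGORY = {"acron": "Nuts", "budwiser": "Beer", "large pants": "Pants", "3’ diaper": "Diapers"}
REGION = {"MO": "Central", "CA": "West", "NY": "East"}

cube = defaultdict(float)  # (region, product category, month) -> total sales
for product, store, date, sale in raw:
    state = store.split(",")[1]
    month = datetime.strptime(date, "%m/%d/%y").strftime("%b")
    cube[(REGION[state], CATEGORY[product], month)] += sale

for coords, total in sorted(cube.items()):
    print(coords, total)
```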
SWEA106
CSE
5810
For BMI – Imagine a Data Table with Patient Data
Define Axes
Summarize Data
Create Perspective to Match Research Goal
Essentially De-identified Data Mart
Patient  Med      BirthDate  Dosage
Steve    Lipitor  1/1/45     10mg
John     Zocor    2/2/55     80mg
Harry    Crestor  3/3/65     5mg
Lois     Lipitor  4/4/66     20mg
Charles  Crestor  7/1/59     10mg
Figure: Data Cube with Axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s)
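Turning the patient table into the Medication/Dosage/Decade cube can be sketched by dropping the name, deriving a decade from the birth date, and counting patients per cell. This assumes two-digit birth years belong to the 1900s, as in the table; the `decade` helper is hypothetical.

```python
from collections import Counter

patients = [("Steve", "Lipitor", "1/1/45", "10mg"),
            ("John", "Zocor", "2/2/55", "80mg"),
            ("Harry", "Crestor", "3/3/65", "5mg"),
            ("Lois", "Lipitor", "4/4/66", "20mg"),
            ("Charles", "Crestor", "7/1/59", "10mg")]

def decade(birth_date):
    # Assumes two-digit years in the 1900s, as in the table above
    year = 1900 + int(birth_date.split("/")[2])
    return f"{year // 10 * 10}s"

# The name is dropped: each cell of the cube holds only a patient count
cube = Counter((med, int(dose.replace("mg", "")), decade(dob))
               for _name, med, dob, dose in patients)

# Summarize along one axis: total patients per medication
by_medication = Counter()
for (med, _dose, _decade), n in cube.items():
    by_medication[med] += n
print(by_medication)
```

Note that cells holding a count of 1 may still be re-identifying in a real study; this sketch does not address that.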
SWEA107
CSE
5810
The Slicing Action
A Vertical or Horizontal Slice Across Entire Cube
Figure: Multi-Dimensional Data Cube (Months Axis) with a Slice on City Atlanta
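A slice can be sketched on a cube stored as a dict keyed by `(city, product, month)`: fix one dimension to a single value and keep the remaining coordinates. The cities, products, and sales figures below, and the `slice_cube` helper, are hypothetical.

```python
# A tiny cube keyed by (city, product, month); values are hypothetical sales
cube = {
    ("Atlanta", "Electronics", "Jan"): 500, ("Atlanta", "Toys", "Jan"): 120,
    ("Atlanta", "Electronics", "Mar"): 650, ("Boston", "Electronics", "Jan"): 300,
    ("Boston", "Toys", "Mar"): 90,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value; the result has one fewer dimension."""
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}

atlanta = slice_cube(cube, axis=0, value="Atlanta")  # slice on city Atlanta
print(atlanta)
```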
SWEA108
CSE
5810
The Dicing Action
A Slice First Identifies One Dimension
A Selection of Any Cube within the Slice then Essentially Constrains All Three Dimensions
Figure: Dice on Electronics and Atlanta (Months Axis, March 2000)
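A dice can be sketched as a selection that constrains every dimension at once, keeping only the sub-cube whose coordinates fall in a chosen set per dimension. The cube contents and the `dice` helper are hypothetical; `None` here means "no constraint on this dimension".

```python
cube = {
    ("Atlanta", "Electronics", "Jan"): 500, ("Atlanta", "Toys", "Jan"): 120,
    ("Atlanta", "Electronics", "Mar"): 650, ("Boston", "Electronics", "Jan"): 300,
    ("Boston", "Toys", "Mar"): 90,
}

def dice(cube, allowed):
    """Keep cells whose coordinate on every dimension is in that dimension's set.
    `allowed` holds one set per dimension; None leaves a dimension unconstrained."""
    return {key: v for key, v in cube.items()
            if all(s is None or k in s for k, s in zip(key, allowed))}

# Dice on Electronics and Atlanta, limited to the first quarter's months
sub = dice(cube, [{"Atlanta"}, {"Electronics"}, {"Jan", "Feb", "Mar"}])
print(sub)
```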
SWEA109
CSE
5810
Drill Down - Takes a Facet (e.g., Q1) and Decomposes it into Finer Detail
Figure: Drill Down on Q1 (Q1, Q2, Q3, Q4 into Jan, Feb, March); Roll Up on Location (to State, USA)
Roll Up: Aggregates Along a Dimension Hierarchy
From Individual Cities to State
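Roll up and drill down can be sketched with concept hierarchies mapping each value to its parent (city to state, month to quarter). The sales figures, the `STATE` and `QUARTER` hierarchies, and both helpers are hypothetical. Drilling down only works because the finer-grained data is still available; it cannot be recovered from the rolled-up totals alone.

```python
from collections import defaultdict

# Monthly sales per city (hypothetical figures)
sales = {("Hartford", "Jan"): 10, ("Hartford", "Feb"): 12, ("Hartford", "Apr"): 7,
         ("Storrs", "Jan"): 4, ("Storrs", "Mar"): 6, ("Boston", "Feb"): 9}

# Assumed concept hierarchies: city -> state, month -> quarter
STATE = {"Hartford": "CT", "Storrs": "CT", "Boston": "MA"}
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

def roll_up(data, parent_of, axis):
    """Replace each coordinate on `axis` by its parent in the hierarchy and sum."""
    out = defaultdict(int)
    for key, value in data.items():
        new_key = list(key)
        new_key[axis] = parent_of[key[axis]]
        out[tuple(new_key)] += value
    return dict(out)

def drill_down(data, parent_of, axis, parent):
    """Show the finer-grained cells that make up one parent value."""
    return {key: v for key, v in data.items() if parent_of[key[axis]] == parent}

by_state = roll_up(sales, STATE, axis=0)                # cities -> states
by_state_quarter = roll_up(by_state, QUARTER, axis=1)   # months -> quarters
q1_detail = drill_down(by_state, QUARTER, axis=1, parent="Q1")
print(by_state_quarter)
print(q1_detail)
```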
SWEA110
CSE
5810
Analysis and Access Dramatically More Complicated!
Time Series Data for Glucose, BP, Peak Flow, etc.
Spatial databases
Multimedia databases
World Wide Web
Time series data
Geographical and Satellite Data
SWEA111
CSE
5810
Descriptive Mining
Discover and Describe General Properties
60% of People who Buy Beer on Friday also Bought Nuts or Chips in the Past Three Months
Predictive Mining
Infer Interesting Properties based on Available Data
People who Buy Beer on Friday Usually also Buy Nuts or Chips
Result of Mining
Order from Chaos
Mining Large Data Sets in Multiple Dimensions
Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.
Impact on Marketing Strategy
SWEA112
CSE
5810
Association
Discover the Frequency of Items Occurring Together in a Transaction or an Event
Example
80% of Customers who Buy Milk also Buy Bread
Hence - Bread and Milk Adjacent in Supermarket
50% of Customers Forget to Buy Milk/Soda/Drinks
Hence - Available at Register
Prediction
Predicts Some Unknown or Missing Information based on Available Data
Example
Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters
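The forecasting example can be sketched with the simplest possible predictor: fit a straight-line trend by least squares to the past quarters and extrapolate one quarter ahead. A linear trend is a swapped-in illustration, not the method the slide prescribes, and the three quarterly figures are hypothetical.

```python
def forecast_next(values):
    """Fit y = a + b*t by least squares to t = 1..n and extrapolate to t = n + 1."""
    n = len(values)
    ts = range(1, n + 1)
    t_mean = sum(ts) / n
    y_mean = sum(values) / n
    # Slope b = S_ty / S_tt; intercept a makes the line pass through the means
    b = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
         / sum((t - t_mean) ** 2 for t in ts))
    a = y_mean - b * t_mean
    return a + b * (n + 1)

# Hypothetical electronics sales (in $K) for the past three quarters
next_quarter = forecast_next([200.0, 220.0, 250.0])
print(round(next_quarter, 2))
```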
SWEA113
CSE
5810
Motivated by Market Analysis
Rules of the Form
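$\mathit{Item}_1 \wedge \mathit{Item}_2 \wedge \cdots \wedge \mathit{Item}_k \Rightarrow \mathit{Item}_{k+1} \wedge \cdots \wedge \mathit{Item}_n$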
Example
$\text{Beer} \wedge \text{Soft Drink} \Rightarrow \text{Pop Corn}$
Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
Issues
Interestingness
Completeness
Efficiency
Basic Measurement for Association Rules
Support of the Rule
Confidence of the Rule
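The two basic measurements can be computed directly. Using the standard definitions, the support of $X \Rightarrow Y$ is the fraction of transactions containing every item of $X \cup Y$, and its confidence is $\mathrm{support}(X \cup Y) / \mathrm{support}(X)$. The market baskets below are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Among transactions containing lhs, the fraction that also contain rhs.
    Raises ZeroDivisionError if lhs never occurs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Hypothetical market baskets
baskets = [{"beer", "soft drink", "pop corn"},
           {"beer", "soft drink"},
           {"beer", "soft drink", "pop corn", "chips"},
           {"milk", "bread"}]

# Rule: Beer ^ Soft Drink => Pop Corn
s = support({"beer", "soft drink", "pop corn"}, baskets)       # 2 of 4 baskets
c = confidence({"beer", "soft drink"}, {"pop corn"}, baskets)  # 2 of the 3 with the lhs
print(s, c)
```

Mining algorithms typically keep only the rules whose support and confidence both exceed user-chosen thresholds.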
SWEA114
CSE
5810
Classification
Determine the Class or Category of an Object based on its Properties
Example
Classify Companies based on the Final Sale Results in the Past Quarter
Clustering
Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity
Example
Group Crime Locations to Find Distribution Patterns
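Clustering can be sketched with k-means (Lloyd's algorithm), one common choice among many: alternately assign each point to its nearest center and move each center to the mean of its group. The crime coordinates, the choice of `k=2`, and the fixed iteration count are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns final centers and groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to its group's mean (keep it if the group is empty)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Hypothetical crime locations (x, y) with two obvious hot spots
crimes = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centers, groups = kmeans(crimes, k=2)
print(groups)
```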
SWEA115
CSE
5810
Two Stages
Learning Stage: Construction of a Classification Function or Model
Classification Stage: Prediction of Classes of Objects Using the Function or Model
Tools for Classification
Decision Tree
Bayesian Network
Neural Network
Regression
Problem
Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects
SWEA116
CSE
5810
Attributes
Attribute     Possible Values
outlook       sunny, overcast, rain
temperature   continuous
humidity      continuous
windy         true, false
Class Attribute - Play/Don’t Play the Game
Training Set
Values that Set the Condition for the Classification
What are the Patterns Below?
Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  No
overcast  83           78        false  Yes
sunny     80           90        true   No
sunny     72           95        false  No
sunny     72           70        false  Yes
…         …            …         …      …
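The two stages can be sketched on the five training rows shown (the elided "…" rows are not included). The model is a decision stump, a one-level decision tree chosen only as the simplest instance of the Decision Tree tool, not ID3/C4.5; it searches the numeric attributes for the threshold and label assignment that classify the most training rows correctly. `learn_stump` and `classify` are hypothetical helpers.

```python
# Training set from the table above: (outlook, temperature, humidity, windy, play)
training = [("sunny", 85, 85, False, "No"),
            ("overcast", 83, 78, False, "Yes"),
            ("sunny", 80, 90, True, "No"),
            ("sunny", 72, 95, False, "No"),
            ("sunny", 72, 70, False, "Yes")]
NUMERIC = {"temperature": 1, "humidity": 2}  # attribute name -> column index

def learn_stump(rows):
    """Learning stage: try every numeric attribute, threshold, and label assignment;
    keep the rule that classifies the most training rows correctly."""
    best = None
    for name, col in NUMERIC.items():
        for threshold in sorted({r[col] for r in rows}):
            for low, high in (("Yes", "No"), ("No", "Yes")):
                correct = sum((low if r[col] <= threshold else high) == r[4] for r in rows)
                if best is None or correct > best[0]:
                    best = (correct, name, col, threshold, low, high)
    return best

def classify(stump, row):
    """Classification stage: apply the learned rule to a new object."""
    _, _, col, threshold, low, high = stump
    return low if row[col] <= threshold else high

stump = learn_stump(training)
print(stump)                                             # learned rule on humidity
print(classify(stump, ("rain", 70, 96, False, None)))    # a new, unlabeled day
```

With only five rows the learned rule fits the training set perfectly, which says little about how well it would classify future objects.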
SWEA117
CSE
5810
Summarization
Characterization (Summarization) of General Features of Objects in the Target Class
Example
Characterize People’s Buying Patterns on the Weekend
Potential Impact on “Sale Items” & “When Sales Start”
Department Stores with Bonus Coupons
Discrimination
Comparison of General Features of Objects Between a Target Class and a Contrasting Class
Example
Comparing Students in Engineering and in Art
Attempt to Arrive at Commonalities/Differences
SWEA118
CSE
5810
Attribute-Oriented Induction
Generalization using a Concept Hierarchy (Taxonomy)
Source Relation:
barcode  category    brand       content    size
14998    milk        Dairyland   Skim       2L
12998    mechanical  MotorCraft  valve 23a  12in
…        …           …           …          …
Generalized Relation:
category  content  count
milk      skim     280
milk      2%       98
…         …        …
Concept Hierarchy (Figure):
food: milk … bread
milk: skim milk … 2% milk
bread: white whole bread … wheat
Lowest Level (Brands): Lucern … Dairyland; Wonder … Safeway
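Attribute-oriented induction can be sketched in two moves: drop attributes that are too specific (barcode, brand), climb the concept hierarchy for the rest, and merge identical generalized tuples while accumulating a count. The `PARENT` hierarchy and the four transactions are hypothetical, loosely modeled on the taxonomy above.

```python
from collections import Counter

# Hypothetical concept hierarchy (child -> parent), in the spirit of the taxonomy above
PARENT = {"skim milk": "milk", "2% milk": "milk",
          "white bread": "bread", "wheat bread": "bread",
          "milk": "food", "bread": "food"}

def generalize(value, levels=1):
    """Climb the concept hierarchy `levels` steps, stopping at the root."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value

# Hypothetical transactions: (barcode, product, brand)
sales = [(14998, "skim milk", "Dairyland"), (14999, "skim milk", "Lucern"),
         (15001, "2% milk", "Dairyland"), (20010, "wheat bread", "Wonder")]

# Barcode and brand are dropped; product is generalized; duplicates merge into counts
generalized = Counter(generalize(product) for _barcode, product, _brand in sales)
top_level = Counter(generalize(product, levels=2) for _barcode, product, _brand in sales)
print(generalized, top_level)
```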
SWEA119
CSE
5810
Technology Push
Technology for Collecting Large Quantities of Data
Bar Code, Scanners, Satellites, Cameras
Technology for Storing Large Collections of Data
Databases, Data Warehouses
Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web
Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances
Targeted Marketing by Age, Region, Income, etc.
Exploiting User Preferences/Customized Shopping
What is Potential for BMI?
How do you see Data Mining Utilized?
What are Key Issues to Worry About?
SWEA120
CSE
5810
Security and Social
What Information is Available to Mine?
Preferences via Store Cards/Web Purchases
What is Your Comfort Level with Trends?
User Interfaces and Visualization
What Tools Must be Provided for End Users of Data Mining Systems?
How are Results for Multi-Dimensional Data Displayed?
Performance Guarantees
Range from Real-Time for Some Queries to Long-Term for Other Queries
Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets
SWEA121
CSE
5810
An Initiative of the University of Connecticut
Center for Public Health and Health Policy
Robert H. Aseltine, Jr., Ph.D.
Cal Collins
January 16, 2008
SWEA122
CSE
5810
State of Connecticut Agencies Collect and Maintain Data in Separate Databases such as:
Vital Statistics: Birth, Death (DPH)
Surveillance data: Lead Screening and Immunization Registries (DPH)
Administrative services: LINK system (DCF), CAMRIS (DMR)
Benefit programs: WIC (DPH), Medicaid (DSS)
Educational achievement: (PSIS)
Such Data is Un-Integrated
Impossible to Track and Assess Target Populations
Difficult to Develop Evidence-Based Practices
Limits Meaningful Interactions Among State Agencies
SWEA123
What Do We Mean by “Integration”?
Figure: Three Agency Sources Holding Records on the Same Children
UConn Health Center, Low Birth Weight Infant Registry: Last Name, First Name, DOB, SSN, Birth Wt. (kg)
Dept. of Mental Retardation, Birth to Three System: Last Name, First Name, DOB, Street
CT Dept. of Education, PSIS System: Last Name, First Name, Town, CMT Math, Polio Vac Date, Days in Attendance
Ernst and Martinez Appear in All Three Sources; their Integrated Records:
Last Name  First Name  DOB         SSN           Birth Wt. (kg)  Street  Town      CMT Math Grade 3  Polio Vaccination Date  Days in Attendance
Ernst      Max         04/04/1994  116-000-3456  2.7             Elm     Enfield   152               01/09/1999              145
Martinez   Pedro       08/08/1998  018-000-9886  3.0             High    Hartford  248               12/01/2003              180
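The integration shown above can be sketched as deterministic record linkage: normalize each record's last name, first name, and date of birth into a match key, then merge records whose key appears in every source. This is a deliberately simple swapped-in technique; the source does not describe how CHIN's case matching works, and real agency data often needs fuzzier (e.g. probabilistic) matching. The field names, date formats, and record values below are illustrative.

```python
from datetime import datetime

# Hypothetical extracts shaped like the three sources above; values are illustrative
registry = [
    {"last": "Ernst", "first": "Max", "dob": "04/04/1994", "ssn": "116-000-3456", "birth_wt_kg": 2.7},
    {"last": "Appel", "first": "April", "dob": "05/05/1995", "ssn": "016-000-9876", "birth_wt_kg": 2.8},
]
birth_to_three = [{"last": "ERNST", "first": "max ", "dob": "1994-04-04", "street": "Elm"}]
psis = [{"last": "Ernst", "first": "Max", "dob": "4/4/1994", "town": "Enfield", "cmt_math": 152}]

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # assumed formats used by the agencies

def match_key(rec):
    """Deterministic key: normalized last name, first name, and date of birth."""
    for fmt in DATE_FORMATS:
        try:
            dob = datetime.strptime(rec["dob"], fmt).date()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {rec['dob']}")
    return (rec["last"].strip().lower(), rec["first"].strip().lower(), dob)

def link(*sources):
    """Return one merged record per person whose key appears in every source."""
    indexed = [{match_key(r): r for r in src} for src in sources]
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(common):
        row = {}
        for idx in indexed:
            row.update(idx[key])  # later sources overwrite shared name/DOB fields
        row["dob"] = key[2].strftime("%m/%d/%Y")  # one consistent DOB format
        merged.append(row)
    return merged

integrated = link(registry, birth_to_three, psis)
print(integrated)
```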
SWEA124
CSE
5810
Security and Privacy
HIPAA
FERPA
WIC, Social Security (Medicaid/Medicare) regulations
State statutes
Alteration/disruption of business practices
Unique identification of individuals/cases
Accuracy and reliability of data
Disparate hardware/software platforms
SWEA125
CSE
5810
Connecticut Health Information Network
A Federated Network That:
Allows Shared Access to “Health”-related Data From Heterogeneous Databases
Allows Agencies to Retain Complete Control Over Access to Data
Has Minimal Impact on Business Practices
Complies with Security and Privacy Statutes
Incorporates Cutting-edge Approaches to Case Matching
Partnership of:
Early Partners: DPH, DCF, DDS, DoE, DOIT, UConn, Akaza Research
SWEA127
CSE
5810
SWEA128
CSE
5810
Produce relational, record-level datasets by merging data from multiple agencies to support research into health, education, social services, and licensing
De-identify or anonymize that data to the level necessary for a particular application
Can be utilized internally within an agency to integrate data that does not need to be anonymized
Supports integration with legacy systems that hold data in incompatible formats
See: http://www.publichealth.uconn.edu/pathproduct.html
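De-identifying an integrated record "to the level necessary" can be sketched as dropping direct identifiers, coarsening the date of birth to a year, and replacing the SSN with a keyed hash (HMAC-SHA256), so the same child gets the same study ID across datasets without exposing the SSN. The key, the fields kept, and the 12-character ID length are assumptions; this is not a complete HIPAA/FERPA de-identification procedure and not the product's own method.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data steward; without it, IDs cannot be recomputed
SECRET_KEY = b"replace-with-a-key-held-only-by-the-data-steward"

def study_id(ssn):
    """Keyed hash of an identifier: stable across datasets, but not
    linkable back to the SSN by anyone lacking the key."""
    return hmac.new(SECRET_KEY, ssn.encode("utf-8"), hashlib.sha256).hexdigest()[:12]

def deidentify(rec):
    """Drop direct identifiers (name, SSN, street, town) and coarsen DOB to a year."""
    return {
        "study_id": study_id(rec["ssn"]),
        "birth_year": rec["dob"][-4:],  # assumes MM/DD/YYYY
        "birth_wt_kg": rec["birth_wt_kg"],
        "cmt_math": rec["cmt_math"],
    }

record = {"last": "Ernst", "first": "Max", "dob": "04/04/1994", "ssn": "116-000-3456",
          "birth_wt_kg": 2.7, "street": "Elm", "town": "Enfield", "cmt_math": 152}
print(deidentify(record))
```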
SWEA129
CSE
5810
Integrates data from diverse sources that may or may not share a universal record identifier
Handles data in a HIPAA and FERPA compliant manner
Utilizes a highly secure architecture
Maintains the autonomy of agency data - exposure, location, and schema
Provides an extremely easy to learn and flexible user interface
Requires no changes to agency database schemas
Needs minimal upgrade to departmental computer hardware and software
Once installed, it can quickly and efficiently produce integrated datasets
SWEA130
CSE
5810
Only Scratched Surface on Architectures
Micro Architectures
Macro Architectures
Super-Macro Architectures (We’ll see …)
What are Key Facets in the Discussion?
Role and Impact of Standards
Open Solutions
Architectural Variants – Reuse “Architecture”
Can we Reuse CHIN for Clinical Practice?
Are All Contributors Simply Each Hospital and EHR?
How do we Connect all of the Pieces?
What are Next Steps?
Let’s Review Some other Work
Source: Wide Range of Presentations on Web
SWEA131