Software and Enterprise Architectures

CSE

5810

Software and Enterprise Architectures

Prof. Steven A. Demurjian, Sr.

Computer Science & Engineering Department

The University of Connecticut

371 Fairfield Road, Box U-255

Storrs, CT 06269-2155 steve@engr.uconn.edu

http://www.engr.uconn.edu/~steve

(860) 486 - 4818

Copyright © 2008 by S. Demurjian, Storrs, CT.

SWEA1

Software Architectures

Emerging Discipline in Mid-1990s

Software as Collection of Interacting Components

What are Local Interactions (within Component)?

What are Global Interactions (between Components)?

Advantages of SW Architectural Design

Understand Communication/Synchronization

Definition of Database Requirements

Identification of Performance/Scaling Issues

Detailing of Security Needs and Constraints

Towards Large-Scale Software Development

For Biomedical Informatics:

What are Architectures for Data Sharing?

How is Interoperability Facilitated?

SWEA2

Concepts of Software Architectures

Exceed Traditional Algorithm/Data Structure

Perspective

Emphasize Componentwise Organization and System

Functionality

Focus on Global and Local Interactions

Identify Communication/Synchronization

Requirements

Define Database Needs and Dependencies

Consider Performance/Scaling Issues

Understand Potential Evolution Dimensions

SWEA3

Software Design Levels

Architecturally:

 Modules

Interconnections Among Modules

Decomposition into Subsystems

Code:

 Algorithms/Data Structures

 Tasking/Control Threads

Executable:

 Memory Management

 Runtime Environment

Is this a Realistic/Accurate View?

Yes for a Single “Application”

What about Application of Applications?

System of Systems?

SWEA4

Software Engineering - an Oxymoron?

Is there any Engineering?

Is there any Science?

Collection of Disparate Techniques:

 Data-Flow Diagrams

 E-R Diagrams

Finite State Machines

Petri Nets

UML Class, Object, Sequence, Etc.

Design Patterns

 Model-Driven Architectures

What is being “Engineered”?

How do we Know we are Done?

 E.g. Does Artifact Match Specification?

SWEA5

What's Available for Engineering Software?

Specification (Abstract Models, Algebraic Semantics)

Software Structure (Bundling Representation with

Algorithms)

Language Issues (Models, Scope, User-Defined Types)

Information Hiding (Protect Integrity of Information)

Integrity Constraints (Invariants of Data Structures)

Is this up to date?

What else can be Added to List?

 Design Patterns

Model Driven Architectures

XML – Data Modeling and Dependencies

Others?

SWEA6

Engineering Success in Computing

Compilers Have Had Great Success

 Originally by Hand

Then Compiler Compilers

Parser Generators - Lex/Yacc

Solid Science Behind Compilers

Regular, Context Free, Context Sensitive

Languages

FSAs, PDAs, CFGs, etc.

Science has Provided Engineering Success re. Ease and Accuracy of Modern Compiler Writing

SWEA7

History of Programming

C - Still Remains Industry Workhorse

 Separate Compilation

Decomposition of System into Subsystems, etc.

Shared Declarations

 ADTs in C, But Compiler won't Enforce Them

Modula-2 and Ada 83 Had

Information Hiding

Public/Private Paradigm

Module/Package Concepts

Import/Export Paradigm

Rigor Enforced by Compiler – but Can’t

 Bind/Group Modules into Subsystems

Precisely Specify Interconnections and Interactions

Among Subsystems and Components

SWEA8

‘Recent-Past’ Generation?

C++ and Ada95

Considered “Legacy” Languages - Old

Java, C# - Are they Headed Toward Legacy?

How do they Rate?

What Do they Offer that Hasn't been Offered

Before?

 What are Unique Benefits and Potential of Java?

What about new Web Technologies?

JavaScript, Perl, PHP, Python, Ruby

XML and SOAP

Mobile Computing

How do all of these fit into this process?

Particularly in Regards to C/S Solutions!

SWEA9

What's Next Step?

Architectural Description Languages

 Provide Tools to Describe Architectures

 Definition and Communication

Codification of Architectural Expertise

Frameworks for Specific Domains

DB vs. GUI vs. Embedded vs. C/S

Formal Underpinning for Engineering Rigor

What has Appeared for Each of these?

Struts for GUI

Open Source Frameworks (mediawiki)

Wide-Ranging Standards (XML)

Model-Driven Architectures

What Else???

SWEA10

Architectural Styles

What are Popular Architectural Styles?

 How are they Characterized?

 Example in Practice

Explore a Taxonomy of Styles

Focus on “Micro-Architectures”

 Components

Flow Among Components

Represents “Single” Application

Forms Basis for “Macro-Architectures”

System of Systems

Application of Applications

 Significantly Scaling Up

SWEA11

Taxonomy of Architectural Styles

Data Flow Systems

 Batch Sequential

 Pipes and Filters

Call & Return Systems

 Main/Subroutines (C, Pascal)

Object Oriented

Implicit Invocation

 Hierarchical Systems

Virtual Machines

Interpreters

Rule Based Systems

Data Centered Systems

 DBS

Hypertext

Blackboards

Independent Components

 Communicating Processes/Event Systems

Client/Server

 Two-Tier

 Multi-Tier

SWEA12

Taxonomy of Architectural Styles

Establish Framework of …

 Components

 Building Blocks for Constructing Systems

 A Major Unit of Functionality

Examples Include: Client, Server, Filter, Layer, DB

 Connectors

 Defining the Ways that Components Interact

 What are the Protocols that Mandate the Allowable

Interactions Among Components?

 How are Protocols Enforced at Run/Design Time?

Examples Include: Procedure Call, Event Broadcast,

DB Protocol, Pipe

SWEA13

Overall Framework

What Is the Design Vocabulary?

 Connectors and Components

What Are Allowable Structural Patterns?

 Constraints on Combining Components &

Connectors

What Is the Underlying Conceptual Model?

 Von Neumann, Parallel, Agent, Message-Passing…

Are there New Emerging Models?

Collaborative Environments/Shareware?

What Are Essential Invariants of a Style?

 Limits on Allowable Components & Connectors

Common Examples of Usage

Advantages and Disadvantages of a Style

Common Specializations of a Style

SWEA14

Pipes and Filters

Components are Independent

Entities. No Shared State!

Sort

Sort Merge

Components with

Input and Output

Connectors for Flow Streams of I/O

 Filters:

Invariant: Unaware of up and Down Stream

Behavior

Streamed Behavior: Output Could Go From

One Filter to the Next One Allowing Multiple

Filters to Run in Parallel.
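The style above can be sketched with Python generators. This is only an illustration under stated assumptions: the filter names (`read_lines`, `keep_matching`, `to_upper`) and the sample text are hypothetical, and generators interleave work lazily rather than running truly in parallel. What the sketch does preserve is the invariant that no filter shares state with, or knows anything about, its upstream or downstream neighbours.

```python
from typing import Iterable, Iterator, List


def read_lines(text: str) -> Iterator[str]:
    """Source filter: emit the input one line at a time."""
    for line in text.splitlines():
        yield line


def keep_matching(lines: Iterable[str], keyword: str) -> Iterator[str]:
    """Filter: pass through only the lines that contain the keyword."""
    for line in lines:
        if keyword in line:
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Filter: transform each line without knowing who produced it."""
    for line in lines:
        yield line.upper()


def run_pipeline(text: str, keyword: str) -> List[str]:
    # Chaining the generators plays the role of the pipes: each item
    # streams from one filter to the next as soon as it is produced.
    return list(to_upper(keep_matching(read_lines(text), keyword)))
```

Because each filter only consumes an iterable and yields items, filters can be reordered or replaced without touching the others.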

SWEA15

Pipes and Filters

Possible Specializations:

 Pipelines - Linear Sequence

 Bounded - Limits on Data Amounts

 Typed Pipes - Known Data Format

What is a Classic Example?

Other Examples:

 Compilers

 Sequential Processes

 Parallel Processes

SWEA16

Pipes and Filters - Another Example

Text Information Retrieval Systems

 Scanning Newspapers for Key Words, Etc.

 Also, Boolean Search Expressions

Where is Such an Architecture Utilized Today?

What is Potential Usage in BMI?

[Figure: text retrieval pipeline with components User, Search Controller, Disk Controller, Query Resolver, Term Comparator, and Search DB; flows labeled Commands, Control, Programming, Data, and Result]

SWEA17

Pipes and Filters – In BMI

Can be Structured to Model Medical Workflows

Series of Actions taken by Stakeholders on Patient

SWEA18

Patterns for Ontologies

Extension of Rishi’s work …

Linear Ontology Architectural Pattern (LOAP)

 Model Knowledge in a Process

 Continue with Examples from Prior PPT

http://www.engr.uconn.edu/~steve/Cse5810/Attaining-Semantic-Enterprise-Interoperability-through-Ontology-Architectural-Patterns.pdf

SWEA19

Patterns for Ontologies

 Linear Ontology Architectural Pattern (LOAP)

 Diagnosis, Test, and Anatomy Ontologies

SWEA20

Extending the Example

SWEA21

What has OO Evolved Into?

What has Classic OO Solution Evolved into Today?

 Client (Browser + Struts)

Server (Many Variants of OO Languages)

Database Server (typically Relational)

Different Style (e.g., Design Pattern)

 Does Pattern Capture All Aspects of Style?

 Do we Need to Couple Technology with Pattern?

Dr. D, Jan 01, 08

Fever, Flu, Bed Rest

No Scripts

No Tests

Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)

SWEA22

Design Patterns as Software Architectures

Emerged as the Recognition that in Object-Oriented

Systems Repetitions in Design Occurred

Gained Prominence in 1995 with Publication of “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley

“… descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context…”

 Akin to Complicated Generic

Usage of Patterns Requires

 Consistent Format and Abstraction

 Common Vocabulary and Descriptions

Simple to Complex Patterns – Wide Range

SWEA23

The Observer Pattern

Utilized to Define a One-to-Many Relationship

Between Objects

When Object Changes State – all Dependents are

Notified and Automatically Updated

Loosely Coupled Objects

 When one Object (Subject – an Active Object) Changes State then Multiple Objects (Observers – Passive Objects) are Notified

Observer Object Implements Interface to Specify the Way that Changes are to Occur

Two Interfaces and Two Concrete Classes
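A minimal sketch of the "two interfaces and two concrete classes" structure, assuming Python abstract base classes stand in for the interfaces. The concrete names `PatientRecord` and `ProviderInbox` are hypothetical and chosen only to fit the BMI theme of the course.

```python
from abc import ABC, abstractmethod
from typing import List


class Observer(ABC):
    """Interface 1: how a dependent is told about a change."""

    @abstractmethod
    def update(self, state: str) -> None: ...


class Subject(ABC):
    """Interface 2: how dependents register and get notified."""

    @abstractmethod
    def attach(self, observer: Observer) -> None: ...

    @abstractmethod
    def notify(self) -> None: ...


class PatientRecord(Subject):
    """Concrete subject (the active object)."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []
        self._state = ""

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def set_state(self, state: str) -> None:
        self._state = state
        self.notify()  # every state change is pushed to all dependents

    def notify(self) -> None:
        for observer in self._observers:
            observer.update(self._state)


class ProviderInbox(Observer):
    """Concrete observer (a passive object) that records what it is told."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def update(self, state: str) -> None:
        self.messages.append(state)
```

The subject only depends on the `Observer` interface, which is what keeps the objects loosely coupled.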

SWEA24

The Observer Pattern

SWEA25

Model View Controller

http://java.sun.com/blueprints/patterns/MVC-detailed.html

SWEA26

Model View Controller

 Three Parts of the Pattern:

 Model

 Enterprise Data and Business Rules for Accessing and

Updating Data

View

Renders the Contents (or Portion) of Model

Deals with Presentation of Stored Data

Pull or Push Model Possible

Controller

Translates Interactions with View into Actions on

Model

Actions could be Button Clicks (GUI), Get/Post http

(Web), etc.
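The three parts can be sketched as three small classes. This is an illustrative sketch, not the blueprint's implementation: the class and method names are assumptions, the "business rule" is deliberately trivial, and the view uses the pull variant (it reads the model when asked to render).

```python
from typing import List


class Model:
    """Holds the data and the rules for updating it."""

    def __init__(self) -> None:
        self._symptoms: List[str] = []

    def add_symptom(self, symptom: str) -> None:
        if symptom:  # trivial business rule: ignore empty entries
            self._symptoms.append(symptom)

    def symptoms(self) -> List[str]:
        return list(self._symptoms)


class View:
    """Renders the contents of the model (pull variant)."""

    def render(self, model: Model) -> str:
        return ", ".join(model.symptoms()) or "(none)"


class Controller:
    """Translates user interactions into actions on the model."""

    def __init__(self, model: Model, view: View) -> None:
        self.model = model
        self.view = view

    def handle(self, action: str, value: str = "") -> str:
        # An action could originate from a button click or an HTTP request.
        if action == "add":
            self.model.add_symptom(value)
        return self.view.render(self.model)
```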

SWEA27

The Façade Design Pattern

Unified higher-level global interface/system developed from

 a set of complex heterogeneous source interfaces/subsystems

 makes local sources easier to utilize for the clients

Composition of Pattern

 Subsystems

System Composed of Subsystems

Clients
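A small sketch of the composition above, assuming two hypothetical subsystems (`PharmacySystem`, `LabSystem`) with different interfaces and a facade (`PatientFacade`) that gives clients one unified call. All names and the canned data are illustrative only.

```python
from typing import Dict, List


class PharmacySystem:
    """Subsystem 1, with its own local interface."""

    def prescriptions_for(self, patient_id: str) -> List[str]:
        return ["amoxicillin"] if patient_id == "p1" else []


class LabSystem:
    """Subsystem 2, with a different local interface."""

    def tests_for(self, patient_id: str) -> List[str]:
        return ["CBC"] if patient_id == "p1" else []


class PatientFacade:
    """Unified higher-level interface that hides both subsystems."""

    def __init__(self, pharmacy: PharmacySystem, lab: LabSystem) -> None:
        self._pharmacy = pharmacy
        self._lab = lab

    def summary(self, patient_id: str) -> Dict[str, List[str]]:
        # Clients make one call; the facade fans out to the subsystems.
        return {
            "prescriptions": self._pharmacy.prescriptions_for(patient_id),
            "tests": self._lab.tests_for(patient_id),
        }
```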

SWEA29

Facade

SWEA30

Other Ontology Architectural Patterns

Leverage Façade Pattern for

 Local As View (LAV) Methodology

 MApping FRAmework (MAFRA) provides a conceptual framework for building semantic mappings between heterogeneous ontology models using semantics bridges

 High Level Centralized Ontology Architectural

Patterns (COAP)

Extend Façade Concept

Subsystems are Local Schemas

System is Global Schema

SWEA31

LAV Ontology and Example

SWEA32

MAFRA Ontology and Example

SWEA33

COAP Ontology and Example

COAP Allows us to Define and Integrate Ontologies at a Much Higher Level

Integrating Multiple Ontologies

[Figure: ontology models OM1, OM2, OM3, …, OMN, each with a Local Ontology Model (LO1, LO2, LO3, …, LON), integrated into a Global Ontology Model (OG)]

SWEA34

COAP Ontology and Example

 Example Unifies ICD, DSM, SNOMED etc.

[Figure: UMLS as the global ontology over SNOMED-CT, ICD, DSM, OMIM, and Gene Ontology; concept labels: Symptoms, Procedure, Findings, etc.; Disease; Mental Disorders; Gene]

SWEA35

COAP Ontology and Example

 Example Unifies ICD, DSM, SNOMED etc.

[Figure: UMLS Metathesaurus linked through the mappings M_ICD, M_SNOMED, …, M_NCBI, and M_LOINC to ICD Codes, SNOMED, …, NCBI, and LOINC]

SWEA36

Layered Systems

Useful Systems

Base Utility

Core level

Users

Components - Virtual Machine at Each Layer

Connectors - Protocols That Specify How Layers

Interact

Interaction Is Restricted to Adjacent Layers
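The adjacency restriction can be sketched as a chain in which each layer holds a reference only to the layer directly below it. The `Layer` class, the `build_stack` helper, and the layer names used in the usage line are hypothetical; the wrapping string just makes the path of a request visible.

```python
from typing import List, Optional


class Layer:
    """One virtual machine in the stack; it can see only the layer below."""

    def __init__(self, name: str, below: Optional["Layer"] = None) -> None:
        self.name = name
        self.below = below

    def send(self, payload: str) -> str:
        # The "protocol" here is trivial: wrap the payload and hand it down.
        wrapped = f"{self.name}({payload})"
        return self.below.send(wrapped) if self.below else wrapped


def build_stack(names_top_down: List[str]) -> Layer:
    """Build the layers bottom-up and return the topmost one."""
    below: Optional[Layer] = None
    for name in reversed(names_top_down):
        below = Layer(name, below)
    assert below is not None, "at least one layer is required"
    return below
```

For example, `build_stack(["App", "Utility", "Core"]).send("x")` passes through every layer in order; no layer can skip its neighbour because it has no reference to anything else.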

SWEA37

Layered Systems

Advantages:

 Increasing Levels of Abstraction

Support Enhancement - New Layers

Support for Reuse

Drawbacks:

 Not Feasible for All Systems

Performance Issues With Multiple Layers

Defining Abstractions Is Difficult.

SWEA38

Layered Systems in BMI

One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice

Construct Layered Data Repositories as Below

 Each Layer Targets Different User Group

 Need to Fine Tune Access Even within Layers

[Figure: layered repositories labeled Aggregated, De-identified, and Patient Data, with the user groups Provider, Cl. Researchers, and Public Health Researchers]

SWEA39

ISO as Layered Architecture

 ISO Open Systems Interconnect (OSI) Model

 Now Widely Used as a Reference Architecture

7-layer Model

Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, …)

[Figure: two peer protocol stacks, each listing the seven layers from top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical]

SWEA40

ISO OSI Model

[Figure: two peer protocol stacks, each listing the seven layers from top to bottom: Application, Presentation, Session, Transport, Network, Data Link, Physical]

Physical (Hardware)/Data Link Layer Networks:

Ethernet, Token Ring, ATM

Network Layer Net: The Internet

Transport Layer Net: TCP-based Network

Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI

Applications, E.g., WWW, Window System,

Algorithm

SWEA41

Layered Ontology Architectural Pattern (LaOAP)

 Consider a set of Domain Models

[Figure: (a) Clinical ERD Model, (b) Clinical UML Model, and (c) Clinical OWL representation, covering Disease (Id: Integer, Name: String) and Symptom (Id: Integer, Name: String) related 0…* to 0…* by hasSymptom, with owl:Class entries for Disease and Laboratory Tests]

[Figure: (d) Business ERD Model, (e) Business UML Model, and (f) Business OWL representation, covering Customer (cId: Integer, cEmail: String), Cloud Space (Space: Integer, Location: String), and Content Allowed (types: Enum), related by hasCloudSpace and cloudAllows, with owl:Class entries for Customer and CloudSpace]

SWEA42

LaOAP and Example

[Figure (a): Layered Ontology Architectural Pattern (LaOAP), with the layers Query and Web Service, Terminology, Mapping, Axioms & Rules, and Ontology Conceptual Model]

[Figure (b): Instance of LaOAP, with Query and Web Service (Disease Queries), Terminology (Heart Attack, Fever, Cold), Mapping (Disease(id) ~ Disease(uid)), Axiom (Disease ∩ Symptom), and Disease Ontology Model]

SWEA43

Implementation from Model to Code

Query and Web Service Layer

PREFIX laoap: <http://xmlns.com/Laoap/>

Select ?disease ?symp {?disease laoap:hasSymptom ?symp}

Terminology Layer

High Fever, Asthma, Heart Attack, John Smith, 50GB,

Mapping Layer

[Figure: attribute mapping between Disease and Illness over id, commonName, severity and uid, name, severity]

Axiom & Rules Layer

[Figure: owl:Class entries for Disease, Symptom, CloudSpace, and Customer combined with ∩]

Conceptual Model Layer

[Figure: Disease, Symptom, CloudSpace, ContentAllowed, and Space related by hasSymptom, cloudAllows, and hasSpace]

SWEA44

Other Ontology Patterns

[Figure: taxonomy of Ontology Patterns (OP): Content OP, Structural OP, Architectural OP, Logical OP, Logical Macro OP, Transformation OP, Lexico-Syntactic OP, Naming OP, Reasoning OP, Annotation OP, Presentation OP, Correspondence OP, Reengineering OP, Schema Reengineering OP, Mapping OP, Refactoring OP]

Gangemi, A., & Presutti, V. (2009). Ontology Design Patterns. In Handbook on Ontologies: International Handbooks on Information Systems (pp. 221-243). IOS Press.

SWEA47

Other Ontology Patterns

[Figure (a): CODeP Time Indexed Participation Pattern; labels include Time-Indexed-Participation, Object, Event, Time-Interval, Setting-for, and temporal-location]

[Figure (b): CODeP Task Role Pattern; labels include Description, Role, Task, Object, Event, Situation, defines, Modal Target, classifies, satisfies, participant, and Setting-for]

[Figure (c): CODeP Participation Pattern; labels include Object, Event, Space-Region, Time-Interval, Space-Location, Participant-in, Constant-Participant-in, Part-of, temporal-part-of, and Temporal-location]

Gangemi, A. (2006). Ontology Patterns for Semantic Web Content. Proceeding of 4th International Semantic Web Conference, (pp. 262-276).

SWEA48

Repositories

[Figure: Blackboard (shared data) surrounded by knowledge sources ks1 through ks8]

Knowledge Sources Interact With the Blackboard.

Blackboard Contains the Problem Solving State Data.

Control Is Driven by the State of the Blackboard.

DB Systems Are a Form of Repository With a Layer

Between the BB and the KSs - Supports

 Concurrent Access, Security, Integrity, Recovery
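A minimal sketch of blackboard control, assuming the blackboard is a plain dictionary and each knowledge source is a function that inspects it and contributes only when its precondition holds. The two knowledge sources (`ks_tokenize`, `ks_count`) and the keys they use are hypothetical.

```python
from typing import Callable, Dict, List

Blackboard = Dict[str, object]
KnowledgeSource = Callable[[Blackboard], bool]


def ks_tokenize(bb: Blackboard) -> bool:
    """Fires once raw text is present and tokens are still missing."""
    if "raw" in bb and "tokens" not in bb:
        bb["tokens"] = str(bb["raw"]).split()
        return True
    return False


def ks_count(bb: Blackboard) -> bool:
    """Fires once tokens are present and the count is still missing."""
    if "tokens" in bb and "count" not in bb:
        bb["count"] = len(bb["tokens"])  # type: ignore[arg-type]
        return True
    return False


def run(bb: Blackboard, sources: List[KnowledgeSource]) -> Blackboard:
    # Control is driven purely by blackboard state: keep offering the
    # blackboard to the knowledge sources until none of them can act.
    progress = True
    while progress:
        progress = any(ks(bb) for ks in sources)
    return bb
```

Note that the order in which the knowledge sources are listed does not matter; each one decides for itself, from the shared state, whether it can contribute.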

SWEA49

Database System as a Repository

[Figure: Database (shared data) accessed by clients c1 through c8]

Clients Interact With the DBMS

Database Contains the Problem Solving State Data

Control is Driven by the State of the Database

 Concurrent Access, Security, Integrity, Recovery

Single Layer System: Clients have Direct Access

Control of Access to Information must be

Carefully Defined within DB Security/Integrity

SWEA50

Team Project as a Repository

[Figure: Shared Web Portal accessed by clients c1 through c8]

Clients are Providers, Patients, Clinical Researchers

Database Underlies Web Portal

Simply a Portion of Architecture

 Interactions with PHR (Patients)

Interactions with EMR (Providers)

Interactions with Database/Warehouse (Researchers)

SWEA51

Virtual Chart as a Repository

[Figure: Virtual Chart accessed by clients c1 through c8]

Clients are Providers, Patients, Clinical Researchers

SWEA52

Interpreters

[Figure: interpreter structure with Inputs, Outputs, Data (program state), Program being interpreted, Simulated interpretation engine, Internal interpreter state, Selected instruction, and Selected data]

What Are Components and Connectors?

Where Have Interpreters Been Used in CS&E?

 LISP, ML, Java, Other Languages, OS

Command Line
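To make the components concrete, here is a toy interpreter for a hypothetical four-instruction stack language (the instruction set is an assumption made for this sketch). The program being interpreted, the program state, the internal interpreter state, and the interpretation engine each appear as a separate piece.

```python
from typing import Dict, List, Tuple

Instruction = Tuple  # e.g. ("push", 2), ("load", "x"), ("add",), ("print",)


def interpret(program: List[Instruction], inputs: Dict[str, int]) -> List[int]:
    data = dict(inputs)      # program state
    stack: List[int] = []    # internal interpreter state
    pc = 0                   # internal interpreter state: program counter
    outputs: List[int] = []

    # Interpretation engine: select an instruction, select data, execute.
    while pc < len(program):
        op, *args = program[pc]
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(data[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "print":
            outputs.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return outputs
```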

SWEA53

Java as Interpreter

SWEA54

Process Control Paradigms

[Figure: two process control loops, With Feedback and Without Feedback; in each, a Controller takes the Set point and Input variables and sends Δs to manipulated variables to a Process, which produces the Controlled variable]

 Also:

 Open vs. Closed Loop Systems

Well Defined Control and Computational Characteristics

Heavily Used in Engineering Fields.
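A closed loop can be sketched with a simple proportional controller. This is an assumption made for illustration (the slides do not name a control law), and the process model is deliberately trivial: the controlled variable changes by exactly the amount the controller requests.

```python
from typing import List


def closed_loop(set_point: float, initial: float, gain: float, steps: int) -> List[float]:
    """Simulate a feedback loop and return the controlled variable over time."""
    value = initial
    history: List[float] = []
    for _ in range(steps):
        error = set_point - value   # feedback: compare output with the set point
        change = gain * error       # controller computes the change to apply
        value += change             # process responds to the manipulated variable
        history.append(value)
    return history
```

With `gain=0.5` the error halves on every step, so the controlled variable approaches the set point. An open loop version would compute `change` without ever reading `value`, and so could not correct for drift.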

SWEA55

Process Architecture: Statechart Diagram?

SWEA56

Process Architecture: Activity Diagram?

 Clear Applicability to Medical Processes that have

Underlying BMI – Low Level Processes

[Figure: monitoring states Waiting for Heart Signal and Waiting for Resp. Signal; events Heartbeat (Heart Signal), Breath (Resp Signal), irregular beat, and timeout; actions Trigger Local Alarm, Trigger Remote Alarm, and Alarm Reset]

SWEA57

Single and Multi-Tier Architectures

Widespread use in Practice for All Types of

Distributed Systems and Applications

Two Kinds of Components

 Servers: Provide Services - May be Unaware of

Clients

Web Servers (unaware?)

Database Servers and Functional Servers (aware?)

Clients: Request Services from Servers

Must Identify Servers

May Need to Identify Self

A Server Can be Client of Another Server

Expanding from Micro-Architectures (Single

Computer/One Application) to Macro-Architecture

SWEA58

Single and Multi-Tier Architectures

Normally, Clients and Servers are Independent

Processes Running in Parallel

Connectors Provide Means for Service Requests and Answers to be Passed Among Clients/Servers

Connectors May be RPC, RMI, etc.

Advantages

Parallelism, Independence

Separation of Concerns, Abstraction

 Others?

Disadvantages

Complex Implementation Mechanisms

Scalability, Correctness, Real-Time Limits

Others?

SWEA59

Example: Software Architectural Structure

[Figure: clients (Initial Data Entry Operator (Scanning & Posting), Advanced Data Entry Operators, Analyst, Manager) connected over a 10-100MB Network to a Document Server (Stored Images/CD), a Database Server Running Oracle, and a Functional Server with an RMI Registry and two RMI Act. Obj/Servers]

SWEA60

Business Process Model

[Figure: licensing workflow with a Licensing Division Scanning Operator (Scanner) and a Licensing Division Data Entry Operator (Printer); DBs for Historical Records, Completed Applications, Supervisor Review, Stored Images, and Basic Information Entered; items handled include New Licenses, New Appointments, FOI, and Letters (Request Information, etc.)]

SWEA61

Two-Tier Architecture

Small Manufacturer Previously on C++

New Order Entry, Inventory, and Invoicing

Applications in Java Programming Language

Existing Customer and Order Database

Most of Business Logic in Stored Procedures

Tool-generated GUI Forms for Java Objects

SWEA62

Three-Tier Architecture

Passenger Check-in for Regional Airline

Local Database for Seating on Today's Flights

Clients Invoke EJBs at Local Site Through RMI

EJBs Update Database and Queue Updates

JMS Queues Updates to Legacy System

JDBC API Used to Access Local Database

SWEA63

Four-Tier Architecture

Web Access to Brokerage Accounts

Only HTML Browser Required on Front End

"Brokerbean" EJB Provides Business Logic

Login, Query, Trade Servlets Call Brokerbean

Use JNDI to Find EJBs, RMI to Invoke Them

SWEA64

Architecture Comparisons

Two-tier Through JDBC API is Simplest

Multi-tier: Separate Business Logic, Protect Database Integrity, More Scalable

JMS Queues vs. Synchronous (RMI or IDL):

 Availability, Response Time, Decoupling

JMS Publish & Subscribe: Off-line Notification

RMI IIOP vs. JRMP vs. Java IDL:

 Standard Cross-language Calls or Full Java Functionality

JTS: Distributed Integrity, Lockstep Actions

SWEA65

Comments on Architectural Styles

Architectural Styles Provide Patterns

 Suppose Designing a New System

During Requirements Discovery, Behavior and

Structure of System Will Emerge

Attempt to Match to Architectural Style

 Modify, Extend Style as Needed

By Choosing Existing Architectural Style

 Know Advantages and Disadvantages

Ability to Focus in on Problem Areas and

Bottlenecks

Can Adjust Architecture Accordingly

Architectures Range from Large Scale to Small Scale in their Applicability

We’ll see Examples for BMI Shortly …

SWEA66

The Next Big Challenge

Macro-Architectures

 System of Systems

Application of Applications

Particularly for HIT and HIE!

Involves Two Key Issues

 Interoperability

Heterogeneous Distributed Databases

Heterogeneous Distributed Systems

 Autonomous Applications

Scalability

Rapid and Continuous Growth

Amount of Data

 Variety of Data Types

 Different Privacy Levels or Ownerships of Data

SWEA67

Interoperability: A Classic View

[Figure: Simple Federation, in which Local Schemas are combined by Federated Integration into an FDB Global Schema; Multiple Nested Federation, in which Local Schemas and the federations FDB 1 and FDB3 are combined by Federated Integration into FDB Global Schema 4]

SWEA68

Database Interoperability in the Internet

Technology

 Web/HTTP, JDBC/ODBC, CORBA (ORBs +

IIOP), XML

Architecture

Information Broker

• Mediator-Based Systems

• Agent-Based Systems

SWEA69

Connecting a DB to the Web

DBMS

Web Server

CGI Script Invocation or JDBC Invocation

Web Servers are Stateless

DB Interactions Tend to be Stateful

Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open

Internet

Browser

SWEA70

Connecting More Efficiently

DBMS

Web Server

Internet

Helper

Processes

CGI Script or JDBC

Invocation

To Avoid Cost of Opening Database, One can Use Helper Processes that Always Keep Database Open and Outlive Web Connection

Newly Invoked CGI Scripts Connect to a Preexisting Helper Process

System is Still Stateless

Browser
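The helper-process idea can be sketched in a single process, assuming an in-memory SQLite database stands in for the DBMS and a long-lived object stands in for the helper process. The `DbHelper` class, its `opens` counter, and the `visit` table are hypothetical names used only for this illustration.

```python
import sqlite3
from typing import List, Sequence, Tuple


class DbHelper:
    """Stands in for a helper process that opens the database once and
    then serves many short-lived requests over the same connection."""

    def __init__(self, path: str = ":memory:") -> None:
        self.opens = 1  # the expensive open happens exactly once
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS visit (patient TEXT, symptom TEXT)"
        )

    def handle_request(self, sql: str, params: Sequence = ()) -> List[Tuple]:
        # Each web request reuses the already-open connection.
        rows = self._conn.execute(sql, params).fetchall()
        self._conn.commit()
        return rows
```

Each request is still independent of the others, which matches the slide's point that the system remains stateless even though the connection outlives any single web interaction.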

SWEA71

DB-Internet Architecture

[Figure: WWW Clients (Netscape, Info. Explore, HotJava) connect over the Internet to an HTTP Server, which reaches several DBWeb Gateways through a DBWeb Dispatcher]

SWEA72

Biomedical Architectures

Transcend Normal Two, Three, and Four Tier Solutions –

Macro-Architecture

Emerging Standards

 FHIR, SMART, open mHealth

An Architecture of Architectures!

 Need to Integrate Systems that are Themselves Multi-Tier and Distributed

Need to Resolve Data Ownership Issues

 State of Connecticut Agencies Don’t Share

Competing Hospitals Seek to Protect Market Share

T1, T2, and Clinical Research Requires

 Interoperating Genomic Databases/Supercomputers

Integration of De-identified Patient Data from Multiple Sources to

Allow Sufficient Study Samples

De-identified Data Repositories or Data Marts

Dealing with Ownership Issues (DNA Research)

SWEA73

Internet and the Web

A Major Opportunity for Business

 A Global Marketplace

 Business Across State and Country Boundaries

 A Way of Extending Services

Online Payment vs. VISA, Mastercard

 A Medium for Creation of New Services

Publishers, Travel Agents, Teller, Virtual Yellow

Pages, Online Auctions …

A Boon for Academia

 Research Interactions and Collaborations

Free Software for Classroom/Research Usage

Opportunities for Exploration of Technologies in

Student Projects

What are Implications for BMI, HIE?

SWEA74

WWW: Three Market Segments

Business to Business

 Information sharing

 Ordering info./status

 Targeted electronic commerce

Intranet

 Decision support

 Mfg. system monitoring

 Corporate repositories

 Workgroups

Internet

 Sales

 Marketing

 Information Services

[Figure: Servers on Corporate Networks and Provider Networks connected through the Internet, with Exposure to Outside]

SWEA75

Information Delivery Problems on the Net

Everyone can Publish Information on the Web

Independently at Any Time

 Consequently, there is an Information Explosion

 Identifying Information Content More Difficult

There are too Many Search Engines but too Few

Capable of Returning High Quality Data

Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes

What are Information Delivery Issues for BMI?

 Publishing of Patient Education Materials

Publishing of Provider Education Materials

How Can Patients/Providers Find what they Need?

How do they Know if it's Relevant? Reputable?

SWEA76

Example Web Applications

Scenario 1: World Wide Wait

 A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web

 You Want to Monitor the Results for this

Important Event, so you Fire up your Trusty Web

Browser, Pointing at the Result Posting Site, and

Wait, and Wait, and Wait …

What is the Problem?

 The Scalability Problems are the Result of a

Mismatch Between the Data Access Characteristics of the Application and the Technology Used to

Implement the Application

May not be Relevant to BMI: Hard to Apply Scenario

SWEA77

Example Web Applications

Scenario 2:

 Many Applications Today have the Need for

Tracking Changes in Local and Remote Data

Sources and Notifying Changes If Some Condition

Over the Data Source(s) is Met

 To Monitor Changes on Web, You Need to Fire up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s)

Issue: Pure Pull is Not the Answer to All Problems

BMI: If a Patient Enters Data that Sets off a Chain

Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)

SWEA78

What is the Problem?

Applications are Asymmetric but the Web is Not

 Computation Centric vs. Information Flow Centric

Type of Asymmetry

Network Asymmetry

Satellite, CATV, Mobile Clients, Etc.

Client to Server Ratio

Too Many Clients can Swamp Servers

Data Volume

 Mouse and Key Click vs. Content Delivery

Update and Information Creation

Clients Need to be Informed or Must Poll

Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification

FHIR and moving to Mobile Dominated World

SWEA79

What are Information Delivery Styles?

Pull-Based System

Transfer of Data from Server to Client is Initiated by a Client Pull

Clients Determine when to Get Information

 Potential for Information to be Old Unless Client

Periodically Pulls

Push-Based System

 Transfer of Data from Server to Client is Initiated by a Server Push

 Clients may get Overloaded if Push is Too

Frequent

Hybrid

 Pull and Push Combined

 Pull First and then Push Continually
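The three delivery styles can be contrasted in a few lines. In this sketch the `Server` and `HybridClient` classes are hypothetical, and delivery is a direct in-process callback rather than a network message.

```python
from typing import Callable, List, Optional


class Server:
    def __init__(self) -> None:
        self.value: Optional[str] = None
        self._callbacks: List[Callable[[str], None]] = []

    def pull(self) -> Optional[str]:
        """Pull: the client decides when to ask for the current value."""
        return self.value

    def register(self, callback: Callable[[str], None]) -> None:
        self._callbacks.append(callback)

    def update(self, value: str) -> None:
        """Push: the server initiates the transfer on every change."""
        self.value = value
        for callback in self._callbacks:
            callback(value)


class HybridClient:
    """Hybrid: pull the current value first, then receive pushes."""

    def __init__(self, server: Server) -> None:
        self.seen: List[Optional[str]] = [server.pull()]
        server.register(self.seen.append)
```

A pure-pull client would see a later update only if it happened to call `pull()` again, which is the staleness risk noted above.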

SWEA80

Publish/Subscribe

Semantics: Servers Publish/Clients Subscribe

 Servers Publish Information Online

 Clients Subscribe to the Information of Interest

(Subscription-based Information Delivery)

 Data Flow is Initiated by the Data Sources

(Servers) and is Aperiodic

 Danger: Subscriptions can Lead to Other

Unwanted Subscriptions

Applications

 Unicast: Database Triggers and Active Databases

 1-to-n: Online News Groups

May work for Clinical Researcher to Provider Push
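A minimal topic-based sketch of these semantics, assuming in-process callbacks in place of a real messaging system. The `Publisher` class and the topic name used in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Publisher:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Clients register interest in one topic only."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Data flow is initiated by the source; returns how many were notified."""
        receivers = self._subscribers[topic]
        for callback in receivers:
            callback(message)
        return len(receivers)
```

A clinical researcher could, for instance, publish to a topic such as `"trial-results"` and only the providers subscribed to that topic would be notified; messages on other topics never reach them.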

SWEA81

Design Options for Nodes

Three Types of Nodes:

 Data Sources

 Provide Base Data which is to be Disseminated

Clients

Who are the Net Consumers of the Information

Information Brokers

Acquire Information from Other Data Sources, Add

Value to that Information and then Distribute this

Information to Other Consumers

 By Creating a Hierarchy of Brokers, Information

Delivery can be Tailored to the Need of Many Users

Brokers may be Ideal Intermediaries for BMI!

 Act on Behalf of Patients, Providers

 Incorporate Secure Access

SWEA82

Research Challenges

Ubiquitous/Pervasive

Many computers and information appliances everywhere, networked together

Inherent Complexity:

 Coping with Latency (Sometimes

Unpredictable)

 Failure Detection and Recovery

(Partial Failure)

Concurrency, Load Balancing,

Availability, Scale

Service Partitioning

 Ordering of Distributed Events

“Accidental” Complexity:

Heterogeneity: Beyond the Local

Case: Platform, Protocol, Plus All

Local Heterogeneity in Spades.

Autonomy: Change and Evolve

Autonomously

Tool Deficiencies: Language Support (Sockets, RPC), Debugging, Etc.

SWEA83

Infosphere

Problem: too many sources, too much information

[Figure: the Internet as an Information Jungle; Infopipes deliver Clean, Reliable, Timely Information, Anywhere; other labels: Digital Earth, Personalized Filtering & Info. Delivery, Sensors]

SWEA84

Current State-of-Art

[Figure: Thin Client, Web Server, Mainframe, Database Server]

SWEA85

Infosphere Scenario – for BMI

[Figure: Infotaps & Fat Clients, Sensors, Variety of Servers, Many sources, Database Server]

SWEA86

Heterogeneity and Autonomy


Heterogeneity:

 How Much can we Really Integrate?

Syntactic Integration

Different Formats and Models

Web/SQL Query Languages

Semantic Interoperability

Basic Research on Ontology, Etc

Autonomy

 No Central DBA on the Net

 Independent Evolution of Schema and Content

Interoperation is Voluntary

Interface Technology (Support for ISVs)

DCOM: Microsoft Standard

 CORBA, Etc...

SWEA87

Security and Data Quality


Security

 System Security in the Broad Sense

Attacks: Penetrations, Denial of Service

System (and Information) Survivability

 Security Fault Tolerance

Replication for Performance, Availability, and

Survivability

Data Quality

 Web Data Quality Problems

Local Updates with Global Effects

Unchecked Redundancy (Mutual Copying)

 Registration of Unchecked Information

Spam on the Rise

SWEA88

Data Warehousing and Data Mining


Data Warehousing

 Provide Access to Data for Complex Analysis,

Knowledge Discovery, and Decision Making

Underlying Infrastructure in Support of Mining

Provides Means to Interact with Multiple DBs

 OLAP (on-Line Analytical Processing) vs. OLTP

Data Mining – Role in BMI and Healthcare?

 Discovery of Information in Vast Data Sets

Search for Patterns and Common Features based

Discover Information not Previously Known

 Medical Records Accessible Nationwide

Research/Discover Cures for Rare Diseases

Relies on Knowledge Discovery in DBs (KDD)

SWEA89

Data Warehousing and OLAP


A Data Warehouse

Database is Maintained Separately from an

Operational Database

“A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of

Management’s Decision-Making Process”

[W.H. Inmon]

OLAP (on-Line Analytical Processing)

Analysis of Complex Data in the Warehouse

Attempt to Attain “Value” through Analysis

 Relies on Trained and Adept Skilled Knowledge

Workers who Discover Information

Data Mart

Organized Data for a Subset of an Organization

Establish De-Identified Marts for BMI Research

SWEA90


Building a Data Warehouse

 Option 1

Leverage Existing

Repositories

Collate and Collect

 May Not Capture All

Relevant Data

Option 2

 Start from Scratch

 Utilize Underlying

Corporate Data

[Figure: Option 1: Consolidate Data Marts (Data Mart, Data Mart, ...) into the Corporate Data Warehouse; Option 2: Build from Scratch from Corporate Data]

SWEA91


BMI – Partition/Excerpt Data Warehouse

Clinical and Epidemiological Research (and for T2 and T1)

Each Study Submitted to Institutional Review Board (IRB)

 For Human Subjects (Assess Risks, Protect Privacy)

 See: http://resadm.uchc.edu/hspo/irb/

To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to

Create a Data Mart for each Approved Study

Export/Excerpt Study Data from Warehouse

May be Single or Multiple Sources

[Figure: BMI Data Warehouse excerpted into multiple Data Marts (Data Mart, Data Mart, ...)]

SWEA92


Data Warehouse Characteristics

Utilizes a “Multi-Dimensional” Data Model

Warehouse Comprised of

 Store of Integrated Data from Multiple Sources

 Processed into Multi-Dimensional Model

Warehouse Supports

Time Series and Trend Analysis

“Super-Excel” Integrated with DB Technologies

Data is Less Volatile than Regular DB

Doesn’t Dramatically Change Over Time

Updates at Regular Intervals

Specific Refresh Policy Regarding Some Data

SWEA93

Three Tier Architecture

[Figure: External Data Sources and Operational Databases -> Extract, Transform, Load, Refresh (with Metadata, Monitor, Integrator) -> Data Warehouse Server and Data Marts -> OLAP Server -> Summarization Reports, Query Reports, Data Mining]
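The Extract, Transform, and Load steps feeding the warehouse can be sketched as follows. This is only an illustration under stated assumptions: the two clinic sources, their field names, and the target schema are invented for the example and do not come from the slides.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical rows from two operational sources with different conventions.
CLINIC_A = [{"patient": "p1", "visit": "2008-01-16", "temp_f": "98.6"}]
CLINIC_B = [{"pid": "p2", "date": "01/17/2008", "temp_c": "37.5"}]


def extract() -> List[Dict[str, str]]:
    """Extract: pull raw rows and tag each with its source."""
    rows = [dict(r, source="A") for r in CLINIC_A]
    rows += [dict(r, source="B") for r in CLINIC_B]
    return rows


def transform(row: Dict[str, str]) -> Dict[str, object]:
    """Transform: map source-specific names, formats, and units to one schema."""
    if row["source"] == "A":
        visit = datetime.strptime(row["visit"], "%Y-%m-%d").date()
        temp_c = (float(row["temp_f"]) - 32.0) * 5.0 / 9.0  # Fahrenheit to Celsius
        patient = row["patient"]
    else:
        visit = datetime.strptime(row["date"], "%m/%d/%Y").date()
        temp_c = float(row["temp_c"])
        patient = row["pid"]
    return {"patient_id": patient, "visit_date": visit.isoformat(), "temp_c": round(temp_c, 1)}


def load(warehouse: List[Dict[str, object]], rows: List[Dict[str, object]]) -> None:
    """Load: append the cleaned rows to the warehouse store."""
    warehouse.extend(rows)


warehouse: List[Dict[str, object]] = []
load(warehouse, [transform(r) for r in extract()])
```

A real refresh step would rerun this pipeline on a schedule and track metadata about each load; that part is omitted here.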

SWEA94

Data Warehouse Design


Most Data Warehouses use a Star Schema to

Represent the Multi-Dimensional Data Model

Each Dimension is Represented by a Dimension

Table whose Columns are the Attributes of that Dimension

A Fact Table Connects All Dimension Tables with a

Multiple Join

Each Tuple in the Fact Table Represents One Recorded Fact

Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimension Tables, which Provide its Multidimensional Coordinates, and Stores the Measures for those Coordinates

Links Between the Fact Table and the Dimension

Tables Form a Shape Like a Star

SWEA95

What is a Multi-Dimensional Data Cube?


Representation of Information in Two or More

Dimensions

Typical Two-Dimensional - Spreadsheet

In Practice, to Track Trends or Conduct Analysis,

Three or More Dimensions are Useful

For BMI – Axes for Diagnosis, Drug, Subject Age

SWEA96

Multi-Dimensional Schemas


Supporting Multi-Dimensional Schemas Requires

Two Types of Tables:

 Dimension Table: Tuples of Attributes for Each

Dimension

 Fact Table: Measured/Observed Variables with

Pointers into Dimension Table

Star Schema

 Characterizes Data Cubes by having a Fact Table with a

Single Table for Each Dimension

Snowflake Schema

 Dimension Tables from Star Schema are

Organized into Hierarchy via Normalization

Both Represent Storage Structures for Cubes

SWEA97

Example of Star Schema

Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales

Date Dimension: Date, Month, Year

Store Dimension: StoreID, City, State, Country, Region

Product Dimension: ProductNo, ProdName, ProdDesc, Category

Customer Dimension: CustID, CustName, CustCity, CustCountry
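As a hedged sketch, the retail star schema in this example can be expressed with Python's built-in `sqlite3` module. The snake_case table and column names are adaptations of the slide's names, the Customer dimension is left out for brevity, and every data row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
                          country TEXT, region TEXT);
CREATE TABLE product_dim (product_no INTEGER PRIMARY KEY, prod_name TEXT,
                          prod_desc TEXT, category TEXT);
-- The fact table holds one pointer per dimension plus the measures.
CREATE TABLE sale_fact (
    date_id      INTEGER REFERENCES date_dim(date_id),
    product_no   INTEGER REFERENCES product_dim(product_no),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")

# Illustrative rows only.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?)",
                 [(1, "1999-07-03", "Jul", 1999), (2, "1999-05-22", "May", 1999)])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, "Rolla", "MO", "USA", "Central"), (2, "LA", "CA", "USA", "West")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(1, "acorns", "", "Nuts"), (2, "lager", "", "Beer")])
conn.executemany("INSERT INTO sale_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 100.0), (2, 2, 2, 5, 50.0), (1, 2, 1, 2, 20.0)])

# The "multiple join": the fact table joined to the dimensions it points into.
sales_by_region_category = conn.execute("""
    SELECT s.region, p.category, SUM(f.dollar_sales)
    FROM sale_fact f
    JOIN store_dim s   ON f.store_id = s.store_id
    JOIN product_dim p ON f.product_no = p.product_no
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
```

The query aggregates a measure from the fact table by attributes that live only in the dimension tables, which is the access pattern the star layout is meant to serve.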

SWEA98

Example of Star Schema for BMI

Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.

Date Dimension: Date, Month, Year

Symptoms Dimension: Pulmonary, Heart, Mus-Skel, Skin, Digestive

Vitals Dimension: BP, Temp, Resp, HR (Pulse)

Patient Dimension: PatientID, PatientName, PatientCity, PatientCountry

Medications: Reference another Star Schema for all Meds

SWEA99


A Second Example of Star Schema …

SWEA100


and Corresponding Snowflake Schema

SWEA101

Data Warehouse Issues


Data Acquisition

 Extraction from Heterogeneous Sources

Reformatted into Warehouse Context - Names,

Meanings, Data Domains Must be Consistent

Data Cleaning for Validity and Quality: Is the Data as Expected w.r.t. Content? Value?

Transition of Data into Data Model of Warehouse

 Loading of Data into the Warehouse

Other Issues Include:

How Current is the Data? Frequency of Update?

Availability of Warehouse? Dependencies of Data?

Distribution, Replication, and Partitioning Needs?

Loading Time (Clean, Format, Copy, Transmit,

Index Creation, etc.)?

For CTSA – Data Ownership (Competing Hospitals)

SWEA102

Knowledge Discovery


Data Warehousing Requires Knowledge Discovery to

Organize/Extract Information Meaningfully

Knowledge Discovery

 Technology to Extract Interesting Knowledge

(Rules, Patterns, Regularities, Constraints) from a

Vast Data Set

 Process of Non-trivial Extraction of Implicit,

Previously Unknown, and Potentially Useful

Information from Large Collection of Data

Data Mining

 A Critical Step in the Knowledge Discovery

Process

 Extracts Implicit Information from Large Data Set

SWEA103

Steps in a KDD Process


Learning the Application Domain (goals)

Gathering and Integrating Data

Data Cleaning

Data Integration

Data Transformation/Consolidation

Data Mining

Choosing the Mining Method(s) and Algorithm(s)

Mining: Search for Patterns or Rules of Interest

Analysis and Evaluation of the Mining Results

Use of Discovered Knowledge in Decision Making

Important Caveats

 This is Not an Automated Process!

 Requires Significant Human Interaction!

SWEA104

OLAP Strategies


OLAP Strategies

 Roll-Up: Summarization of Data

Drill-Down: from the General to Specific (Details)

Pivot: Cross Tabulate the Data Cubes

Slice and Dice: Projection Operations Across

Dimensions

Sorting: Ordering Result Sets

 Selection: Access by Value or Value Range

Implementation Issues

Persistent with Infrequent Updates (Loading)

Optimization for Performance on Queries is More

Complex - Across Multi-Dimensional Cubes

Recovery Less Critical - Mostly Read Only

Temporal Aspects of Data (Versions) Important
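The OLAP strategies listed above can be sketched over a tiny in-memory cube. The cell values, the dimension layout, and the helper names `aggregate` and `select` are all assumptions made for this illustration; a real OLAP server would work against the warehouse rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical cube cells: (city, state, month, quarter, product) -> sales.
Cell = Tuple[str, str, str, str, str]
CUBE: Dict[Cell, float] = {
    ("Atlanta", "GA", "Jan", "Q1", "Electronics"): 100.0,
    ("Atlanta", "GA", "Feb", "Q1", "Electronics"): 150.0,
    ("Atlanta", "GA", "Jan", "Q1", "Clothing"): 40.0,
    ("Savannah", "GA", "Jan", "Q1", "Electronics"): 60.0,
    ("Miami", "FL", "Apr", "Q2", "Electronics"): 90.0,
}
CITY, STATE, MONTH, QUARTER, PRODUCT = range(5)


def aggregate(cube: Dict[Cell, float], *dims: int) -> Dict[tuple, float]:
    """Sum sales over every dimension except the ones listed in dims."""
    totals: Dict[tuple, float] = defaultdict(float)
    for cell, sales in cube.items():
        totals[tuple(cell[d] for d in dims)] += sales
    return dict(totals)


def select(cube: Dict[Cell, float], dim: int, value: str) -> Dict[Cell, float]:
    """Keep only the cells whose coordinate on one dimension equals value."""
    return {cell: sales for cell, sales in cube.items() if cell[dim] == value}


roll_up = aggregate(CUBE, STATE)                             # cities summarized per state
drill_down = aggregate(select(CUBE, QUARTER, "Q1"), MONTH)   # Q1 broken into months
atlanta_slice = select(CUBE, CITY, "Atlanta")                # slice: fix one dimension
dice = select(atlanta_slice, PRODUCT, "Electronics")         # dice: constrain a second one
pivot = aggregate(CUBE, PRODUCT, STATE)                      # cross-tabulate two dimensions
```

Selection by value is the `select` helper itself; sorting would be an ordinary `sorted()` over any of the result dictionaries.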

SWEA105

On-Line Analytical Processing


Data Cube

 A Multidimensional Array

 Each Attribute is a Dimension

In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date

Product       Store      Date      Sale
acron         Rolla,MO   7/3/99    325.24
budwiser      LA,CA      5/22/99   833.92
large pants   NY,NY      2/12/99   771.24
3’ diaper     Cuba,MO    7/30/99   81.99

[Figure: data cube with axes Product (Pants, Diapers, Beer, Nuts), Region (West, East, Central, Mountain, South), and Date (Jan, Feb, March, April)]

SWEA106

On-Line Analytical Processing


 For BMI – Imagine a Data Table with Patient Data

 Define Axis

Summarize Data

Create Perspective to Match Research Goal

Essentially De-identified Data Mart

Patient   Med       BirthDate   Dosage
Steve     Lipitor   1/1/45      10mg
John      Zocor     2/2/55      80mg
Harry     Crestor   3/3/65      5mg
Lois      Lipitor   4/4/66      20mg
Charles   Crestor   7/1/59      10mg

[Figure: data cube with axes Medication (Lescol, Crestor, Zocor, Lipitor), Dosage (5, 10, 20, 40, 80), and Decade (1940s, 1950s, 1960s, 1970s)]
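A small sketch of how the patient table could be summarized along the three axes (medication, dosage, decade of birth). The rows are the ones shown on the slide; the helper names and the assumption that two-digit years mean 19yy are mine. Dropping the name and coarsening the birth date is only a first step toward de-identification, not a guarantee of it.

```python
from collections import Counter
from typing import Tuple

# Rows as shown on the slide: (patient, medication, birth date, dosage).
ROWS = [
    ("Steve", "Lipitor", "1/1/45", "10mg"),
    ("John", "Zocor", "2/2/55", "80mg"),
    ("Harry", "Crestor", "3/3/65", "5mg"),
    ("Lois", "Lipitor", "4/4/66", "20mg"),
    ("Charles", "Crestor", "7/1/59", "10mg"),
]


def decade(birth_date: str) -> str:
    """Generalize an m/d/yy birth date to its decade (assumes years are 19yy)."""
    year = 1900 + int(birth_date.split("/")[2])
    return f"{year // 10 * 10}s"


def to_cell(row: Tuple[str, str, str, str]) -> Tuple[str, int, str]:
    """Drop the patient name and map the row to (medication, dosage in mg, decade)."""
    _name, med, birth, dosage = row
    return (med, int(dosage.rstrip("mg")), decade(birth))


# Each cube cell counts the patients that share the same coordinates.
cube = Counter(to_cell(r) for r in ROWS)

# Summarize along one axis: patients per medication.
by_medication: Counter = Counter()
for (med, _dose, _decade), n in cube.items():
    by_medication[med] += n
```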

SWEA107

Examples of Data Mining


 The Slicing Action

 A Vertical or Horizontal Slice Across Entire Cube

[Figure: Multi-Dimensional Data Cube with a Months axis; Slice on City Atlanta]

SWEA108

Examples of Data Mining

 The Dicing Action

 A Slice First Identifies One Dimension

 A Selection of Any Cube within the Slice which

Essentially Constrains All Three Dimensions

[Figure: Dice on Electronics and Atlanta; labels: Months, Electronics, March 2000, Atlanta]

SWEA109

Examples of Data Mining

Drill Down: Takes a Facet (e.g., Q1) and Decomposes it into Finer Detail

[Figure: Drill Down on Q1; Q1 Q2 Q3 Q4, with Q1 expanded into Jan, Feb, March]

Roll Up: Combines Multiple Values of a Dimension, e.g., From Individual Cities to State

[Figure: Roll Up on Location (State, USA)]

SWEA110

Mining Other Types of Data


Analysis and Access Dramatically More Complicated!

Time Series Data for Glucose, BP, Peak Flow, etc.

Spatial databases

Multimedia databases

World Wide Web

Time series data

Geographical and Satellite Data

SWEA111

Advantages/Objectives of Data Mining


Descriptive Mining

 Discover and Describe General Properties

 60% of People who Buy Beer on Friday also have

Bought Nuts or Chips in the Past Three Months

Predictive Mining

 Infer Interesting Properties based on Available

Data

 People who Buy Beer on Friday usually also Buy

Nuts or Chips

Result of Mining

Order from Chaos

Mining Large Data Sets in Multiple Dimensions

Allows Businesses, Individuals, etc. to Learn about

Trends, Behavior, etc.

Impact on Marketing Strategy

SWEA112

Data Mining Methods (1)


Association

 Discover the Frequency of Items Occurring

Together in a Transaction or an Event

 Example

80% of Customers who Buy Milk also Buy Bread

Hence - Bread and Milk Adjacent in Supermarket

 50% of Customers Forget to Buy Milk/Soda/Drinks

Hence - Available at Register

Prediction

Predicts Some Unknown or Missing Information based on Available Data

Example

Forecast Sale Value of Electronic Products for Next

Quarter via Available Data from Past Three Quarters

SWEA113

Association Rules


Motivated by Market Analysis

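Rules of the Form

 $\mathrm{Item}_1 \wedge \mathrm{Item}_2 \wedge \dots \wedge \mathrm{Item}_k \Rightarrow \mathrm{Item}_{k+1} \wedge \dots \wedge \mathrm{Item}_n$

Example

“Beer $\wedge$ Soft Drink $\Rightarrow$ Pop Corn”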

Problem: Discovering All Interesting Association

Rules in a Large Database is Difficult!

Issues

Interestingness

Completeness

Efficiency

Basic Measurement for Association Rules

Support of the Rule

Confidence of the Rule
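The two measures can be made concrete with a short sketch. It uses the standard definitions: the support of a rule X ⇒ Y is the fraction of transactions containing every item of X and Y, and its confidence is that support divided by the support of X alone. The transactions below are invented for illustration.

```python
from typing import FrozenSet, List

# Hypothetical market-basket transactions.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"beer", "soft drink", "pop corn"}),
    frozenset({"beer", "soft drink"}),
    frozenset({"beer", "soft drink", "pop corn", "milk"}),
    frozenset({"milk", "bread"}),
]


def support(itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


lhs = frozenset({"beer", "soft drink"})
rhs = frozenset({"pop corn"})
rule_support = support(lhs | rhs)         # 2 of 4 transactions
rule_confidence = confidence(lhs, rhs)    # 2 of the 3 transactions with beer and soft drink
```

Mining all interesting rules amounts to searching for itemsets whose support and confidence exceed chosen thresholds, which is where the efficiency issue arises.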

SWEA114

Data Mining Methods (2)


Classification

 Determine the Class or Category of an Object based on its Properties

 Example

Classify Companies based on the Final Sale Results in the Past Quarter

Clustering

 Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity

 Example

Group Crime Locations to Find Distribution Patterns

SWEA115

Classification


Two Stages

 Learning Stage: Construction of a Classification

Function or Model

 Classification Stage: Prediction of Classes of

Objects Using the Function or Model

Tools for Classification

Decision Tree

Bayesian Network

Neural Network

Regression

Problem

 Given a Set of Objects whose Classes are Known

(Training Set), Derive a Classification Model which can Correctly Classify Future Objects

SWEA116

An Example


Attributes

Attribute     Possible Values
outlook       sunny, overcast, rain
temperature   continuous
humidity      continuous
windy         true, false

Class Attribute - Play/Don’t Play the Game

Training Set

 Values that Set the Condition for the Classification

 What are the Patterns Below?

Outlook    Temperature   Humidity   Windy   Play
sunny      85            85         false   No
overcast   83            78         false   Yes
sunny      80            90         true    No
sunny      72            95         false   No
sunny      72            70         false   Yes

… … … … ...
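One way to look for patterns in the training set is to ask which attribute best separates Play from Don't Play. The sketch below uses information gain, the criterion behind ID3-style decision tree learners; that choice is an assumption of this example, since the slides name decision trees only as one tool. It uses just the five rows shown and only the two categorical attributes.

```python
import math
from collections import Counter
from typing import Dict, List

# The five training rows shown above.
ROWS: List[Dict[str, object]] = [
    {"outlook": "sunny", "temperature": 85, "humidity": 85, "windy": False, "play": "No"},
    {"outlook": "overcast", "temperature": 83, "humidity": 78, "windy": False, "play": "Yes"},
    {"outlook": "sunny", "temperature": 80, "humidity": 90, "windy": True, "play": "No"},
    {"outlook": "sunny", "temperature": 72, "humidity": 95, "windy": False, "play": "No"},
    {"outlook": "sunny", "temperature": 72, "humidity": 70, "windy": False, "play": "Yes"},
]


def entropy(rows: List[Dict[str, object]]) -> float:
    """Shannon entropy (in bits) of the class attribute over the given rows."""
    counts = Counter(r["play"] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def information_gain(rows: List[Dict[str, object]], attribute: str) -> float:
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups: Dict[object, List[Dict[str, object]]] = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


gains = {a: information_gain(ROWS, a) for a in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
```

A tree learner would split on `best_attribute` first and then repeat the computation inside each branch; continuous attributes such as humidity would additionally need a threshold search, which is omitted here.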

SWEA117

Data Mining Methods (3)


Summarization

 Characterization (Summarization) of General

Features of Objects in the Target Class

 Example

 Characterize People’s Buying Patterns on the Weekend

 Potential Impact on “Sale Items” & “When Sales Start”

Department Stores with Bonus Coupons

Discrimination

 Comparison of General Features of Objects

Between a Target Class and a Contrasting Class

 Example

Comparing Students in Engineering and in Art

Attempt to Arrive at Commonalities/Differences

SWEA118

Summarization Technique


Attribute-Oriented Induction

Generalization using Concept Hierarchy (Taxonomy)

barcode   category     brand        content     size
14998     milk         Dairyland    Skim        2L
12998     mechanical   MotorCraft   valve 23a   12in
…         …            …            …           ...

Category   Content   Count
milk       skim      280
milk       2%        98
…          …         ...

[Figure: concept hierarchy, top to bottom: food; Milk … bread; Skim milk … 2% milk, White bread … whole wheat; Lucern … Dairyland, Wonder … Safeway]
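Attribute-oriented induction can be sketched as two generalization passes: first drop the low-level attributes and count the identical generalized tuples, then climb one level of the concept hierarchy. The item rows below are hypothetical (only barcode 14998 appears on the slide), and the `PARENT` mapping encodes just the milk/bread-to-food level of the taxonomy.

```python
from collections import Counter
from typing import Tuple

# Hypothetical item-level rows in the slide's layout:
# (barcode, category, brand, content, size)
ITEMS = [
    (14998, "milk", "Dairyland", "skim", "2L"),
    (14999, "milk", "Lucern", "skim", "1L"),
    (15000, "milk", "Dairyland", "2%", "2L"),
    (20001, "bread", "Wonder", "white", "1 loaf"),
]

# One level of the concept hierarchy: category -> parent concept.
PARENT = {"milk": "food", "bread": "food"}


def generalize(item: Tuple[int, str, str, str, str]) -> Tuple[str, str]:
    """Drop barcode, brand, and size; keep only (category, content)."""
    _barcode, category, _brand, content, _size = item
    return (category, content)


# Pass 1: identical generalized tuples are merged and counted.
by_category_content = Counter(generalize(i) for i in ITEMS)

# Pass 2: climb the hierarchy from category to its parent concept.
by_parent: Counter = Counter()
for (category, _content), n in by_category_content.items():
    by_parent[PARENT[category]] += n
```

The first counter has the same shape as the Category/Content/Count summary table, only with much smaller counts.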

SWEA119

Why is Data Mining Popular?


Technology Push

 Technology for Collecting Large Quantity of Data

 Bar Code, Scanners, Satellites, Cameras

 Technology for Storing Large Collection of Data

Databases, Data Warehouses

Variety of Data Repositories, such as Virtual Worlds,

Digital Media, World Wide Web

Corporations want to Improve Direct Marketing and

Promotions - Driving Technology Advances

Targeted Marketing by Age, Region, Income, etc.

Exploiting User Preferences/Customized Shopping

What is Potential for BMI?

 How do you see Data Mining Utilized?

 What are Key Issues to Worry About?

SWEA120

Requirements & Challenges in Data Mining


Security and Social

 What Information is Available to Mine?

Preferences via Store Cards/Web Purchases

What is Your Comfort Level with Trends?

User Interfaces and Visualization

 What Tools Must be Provided for End Users of

Data Mining Systems?

 How are Results for Multi-Dimensional Data

Displayed?

Performance Guarantees

 Range from Real-Time for Some Queries to Long-

Term for Other Queries

Data Sources of Complex Data Types or Unstructured

Data - Ability to Format, Clean, and Load Data Sets

SWEA121


An Initiative of the University of Connecticut

Center for Public Health and Health Policy

Robert H. Aseltine, Jr., Ph.D.

Cal Collins

January 16, 2008

SWEA122

What is CHIN?


State of Connecticut Agencies Collect and Maintain

Data in Separate Databases such as:

 Vital Statistics: Birth, Death (DPH)

Surveillance data: Lead Screening and

Immunization Registries (DPH)

Administrative services: LINK system (DCF),

CAMRIS (DMR)

Benefit programs: WIC (DPH), Medicaid (DSS)

Educational achievement: (PSIS)

Such Data is Un-Integrated

Impossible to Track/Assess Target Populations

Difficult to Develop Evidence-Based Practices

Limits Meaningful Interactions Among State

Agencies

SWEA123

What Do We Mean by “Integration?”

UCONN Health Center: Low Birth Weight Infant Registry

Last Name: Appel, Berry, Carat, Ernst, Gomez, Hurst, Keller, Martinez, Rodriguez, Smith

First Name: April, John, Colleen, Max, Gloria, William, Helene, Pedro, Felix, Peggy

DOB: 05/05/1995, 06/06/1996, 07/07/1997, 08/08/1998, 09/09/1999, 10/10/2000, 01/01/1999, 02/02/1997, 03/03/1993, 04/04/1994

SSN: 016-000-9876, 216-000-4576, 119-000-1234, 116-000-3456, 036-000-9999, 016-000-5599, 017-000-2340, 018-000-9886, 029-000-9111, 016-000-8787

Birth Wt. (kg): 2.8, 2.9, 1.9, 2.7, 2.6, 3.1, 2.5, 3.0, 2.8, 2.5

Dept. of Mental Retardation: Birth to Three System

Last Name: Allen, Buck, Cleary, Dory, Ernst, Friday, Glenn, Martinez, Riley, Sanchez

First Name: Gwen, Jerome, Jane, Daniel, Max, Joe, Valerie, Pedro, Lily, Ramon

DOB: 04/04/1994, 11/03/1999, 03/23/1998, 08/08/1998, 03/03/1996, 03/03/1993, 01/01/1999, 07/01/1999, 03/03/1993, 03/03/1993

Street: Apple, Burbank, Cedar, Dogfish, Elm, Fruit, Glen, High, Ipswich, Juniper

Town: Enfie, West, Tolla, Hartf, Enfie, Wind, Branf, Hartf, Bridg, New

CT Dept. of Education: PSIS System

Last Name: Appel, Carat, Cleary, Ernst, Gomez, Friday, Keller, Martinez, Riley, Sanchez

First Name: April, Colleen, Jane, Max, Gloria, Joe, Helene, Pedro, Lily, Ramon

CMT Math: 134, 256, 268, 152, 289, 265, 309, 248, 201, 249

Polio Vac Date: 01/05/1999, 05/01/1998, 01/28/2000, 01/09/1999, 01/01/1999, 10/01/1999, 11/01/2001, 12/01/2003, 01/01/1999, 01/01/1999

Days in Attendance: 179, 122, 178, 145, 168, 170, 180, 180, 122, 159

Integrated Records (Individuals Found in All Three Sources)

Last Name   First Name   DOB          SSN            Birth Wt.   Street   Town       CMT Math Grade 3   Polio Vaccination Date   Days in Attendance
Ernst       Max          04/04/1994   116-000-3456   2.7         Elm      Enfield    152                01/09/1999               145
Martinez    Pedro        08/08/1998   018-000-9886   3.0         High     Hartford   248                12/01/2003               180
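Integration in this sense amounts to matching individuals across sources that share no common identifier and merging their attributes. The sketch below joins three small dictionaries on exact (last name, first name) pairs, which is a deliberate simplification: real case matching has to cope with misspellings, name changes, and conflicting dates. The Ernst and Martinez values follow the integrated records on the slide; the other entries and all field names are illustrative.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (last name, first name)

# Three sources with different attributes and no shared record identifier.
REGISTRY: Dict[Key, dict] = {
    ("Appel", "April"): {"ssn": "016-000-9876", "birth_wt_kg": 2.8},
    ("Ernst", "Max"): {"ssn": "116-000-3456", "birth_wt_kg": 2.7},
    ("Martinez", "Pedro"): {"ssn": "018-000-9886", "birth_wt_kg": 3.0},
}
BIRTH_TO_THREE: Dict[Key, dict] = {
    ("Allen", "Gwen"): {"street": "Apple", "town": "Enfield"},
    ("Ernst", "Max"): {"street": "Elm", "town": "Enfield"},
    ("Martinez", "Pedro"): {"street": "High", "town": "Hartford"},
}
PSIS: Dict[Key, dict] = {
    ("Ernst", "Max"): {"cmt_math": 152, "days_in_attendance": 145},
    ("Friday", "Joe"): {"cmt_math": 265, "days_in_attendance": 170},
    ("Martinez", "Pedro"): {"cmt_math": 248, "days_in_attendance": 180},
}


def integrate(*sources: Dict[Key, dict]) -> Dict[Key, dict]:
    """Inner-join the sources on the key, keeping people present in every source."""
    shared = set(sources[0])
    for src in sources[1:]:
        shared &= set(src)
    merged: Dict[Key, dict] = {}
    for key in sorted(shared):
        record: dict = {}
        for src in sources:
            record.update(src[key])
        merged[key] = record
    return merged


integrated = integrate(REGISTRY, BIRTH_TO_THREE, PSIS)
```

Only the two individuals present in all three sources survive the join, and each merged record carries attributes that no single agency holds.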

SWEA124

Key Challenges to Integrating Data


Security and Privacy

 HIPAA

FERPA

WIC, Social Security (Medicaid/Medicare) regulations

 State statutes

Alteration/disruption of business practices

Unique identification of individuals/cases

Accuracy and reliability of data

Disparate hardware/software platforms

SWEA125


The Solution: CHIN


Connecticut Health Information Network

A Federated Network That:

Allows Shared Access to “Health”-related Data

From Heterogeneous Databases

 Allows Agencies to Retain Complete Control Over

Access to Data

Has Minimal Impact on Business Practices

Complies with Security and Privacy Statutes

 Incorporates Cutting-edge Approaches to Case

Matching

Partnership of:

 Early Partners: DPH, DCF, DDS, DoE, DOIT,

UConn, Akaza Research

SWEA127


Current CHIN Architecture

SWEA128

Path – Modular Data Integration


Produce relational, record-level datasets by merging data from multiple agencies to support research into health, education, and social services, licensing

De-identify or anonymize that data to the level necessary for a particular application

Utilized internally within an agency to integrate data that does not need to be anonymized.

Supports integration with legacy systems that hold data in incompatible formats

See: http://www.publichealth.uconn.edu/pathproduct.html

SWEA129

Path – Capabilities

 Integrates data from diverse sources that may or may not share a universal record identifier

 Handles data in a HIPAA- and FERPA-compliant manner

 Utilizes a highly secure architecture

 Maintains the autonomy of agency data: exposure, location, and schema

 Provides an extremely easy-to-learn and flexible user interface

 Requires no changes to agency database schemas

 Needs minimal upgrade to departmental computer hardware and software

 Once installed, can quickly and efficiently produce integrated datasets

SWEA130

Concluding Remarks


Only Scratched Surface on Architectures

 Micro Architectures

 Macro Architectures

 Super-Macro Architectures (We’ll see …)

What are Key Facets in the Discussion?

 Role and Impact of Standards

Open Solutions

Architectural Variants – Reuse “Architecture”

Can we Reuse CHIN for Clinical Practice?

 Are All Contributors Simply Each Hospital and EHR?

 How do we Connect all of the Pieces?

What are Next Steps?

Let’s Review Some other Work

 Source: Wide Range of Presentations on Web

SWEA131
