finswarch - University of Connecticut

Software and Enterprise Architectures
CSE
5095
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Copyright © 2008 by S. Demurjian, Storrs, CT.
SWEA1
Software Architectures






Emerging Discipline in Mid-1990s
Software as Collection of Interacting Components
What are Local Interactions (within Component)?
What are Global Interactions (between Components)?
Advantages of SW Architectural Design
 Understand Communication/Synchronization
 Definition of Database Requirements
 Identification of Performance/Scaling Issues
 Detailing of Security Needs and Constraints
Towards Large-Scale Software Development
For Biomedical Informatics:
 What are Architectures for Data Sharing?
 How is Interoperability Facilitated?
SWEA2
Concepts of Software Architectures







Exceed Traditional Algorithm/Data Structure
Perspective
Emphasize Componentwise Organization and System
Functionality
Focus on Global and Local Interactions
Identify Communication/Synchronization
Requirements
Define Database Needs and Dependencies
Consider Performance/Scaling Issues
Understand Potential Evolution Dimensions
SWEA3
The HTSS Software Architecture
IL
IL
IL
SDO
EDO
SDO
EDO
Payment
CR
CR
CR
CR
IL: Item Locator
CR: Cash Register
IC: Invent. Control
DO: Deli Orderer for Shopper/Employee
Item
IC
Order
IC
Non-Local
Client Int.
CreditCardDB
Inventory
Control
ItemDB
Global
Server
ItemDB
Local
Server
ATM-BankDB
OrderDB
SupplierDB
SWEA4
Multiple Backend Database System (MBDS)
Backend
Database
Processor
Database
Controller
Backend
Database
Processor
Host/User
Backend
Database
Processor
SWEA5
The MBDS Processes
Database
Controller
Request
Preparation
Post
Processing
Put Msg.
Get Msg.
Get Msg.
Put Msg.
Directory
Management
Record
Processing
Concurrency
Control
Disk I/O
Backend
Database
Processor
SWEA6
Multiple Processes in MBDS
No.  Type                                  SRC   DST
1    New Request                           Host  ReqP
2    Results of Request                    PoPr  Host
3    Number of Reqs in Transaction         ReqP  PoPr
4    Aggregate Operators (Sum, etc.)       ReqP  PoPr
6    Parsed Request to Backends            ReqP  DM
12   Backend Aggregate Operator Results    RecP  PoPr
15   Ids for Accessing Database Indexes    DM    DMs
16   Request and Disk Addresses            DM    RecP
21   Ids for Accessing Database Records    DM    CC
22   Locks Obtained: Okay to Execute       CC    RecP
23   Request ID of Finished Request        RecP  CC
SWEA7
Message Passing in MBDS
F15 From
Other
Backend
A1
Request
Preparation
D6
Put Msg.
B3
C4
K12
Post
Processing
K12
Get Msg.
E15 To Backend(s)
Get Msg.
Put Msg.
D6,F15
E15
Directory
Management
G21
K12
H22
Record
Processing
I16
Concurrency
Control
J23
Disk I/O
SWEA8
Software Design Levels




Architecturally:
 Modules
 Interconnections Among Modules
 Decomposition into Subsystems
Code:
 Algorithms/Data Structures
 Tasking/Control Threads
Executable:
 Memory Management
 Runtime Environment
Is this a Realistic/Accurate View?
 Yes for a Single “Application”
 What about Application of Applications?
 System of Systems?
SWEA9
Software Engineering - an Oxymoron?




Is there any Engineering?
Is there any Science?
Collection of Disparate Techniques:
 Data-Flow Diagrams
 E-R Diagrams
 Finite State Machines
 Petri Nets
 UML Class, Object, Sequence, Etc.
 Design Patterns
 Model Driven Architectures
What is being “Engineered”?
How do we Know we are Done?
 E.g. Does Artifact Match Specification?
SWEA10
What's Available for Engineering Software?






Specification (Abstract Models, Algebraic Semantics)
Software Structure (Bundling Representation with
Algorithms)
Language Issues (Models, Scope, User-Defined
Types)
Information Hiding (Protect Integrity of Information)
Integrity Constraints (Invariants of Data Structures)
Is this up to date?
What else can be Added to List?
 Design Patterns
 Model Driven Architectures
 XML - Data Modeling and Dependencies
 Others?
SWEA11
Engineering Success in Computing



Compilers Have Had Great Success
 Originally by Hand
 Then Compiler Compilers
 Parser Generators - Lex/Yacc
Solid Science Behind Compilers
 Regular, Context Free, Context Sensitive
Languages
 FSAs, PDAs, CFGs, etc.
Science has Provided Engineering Success re. Ease
and Accuracy of Modern Compiler Writing
SWEA12
History of Programming



C - Still the Industry Workhorse
 Separate Compilation
 Decomposition of System into Subsystems, etc.
 Shared Declarations
 ADTs in C, But Compiler won't Enforce Them
Modula-2 and Ada 83 Had
 Information Hiding
 Public/Private Paradigm
 Module/Package Concepts
 Import/Export Paradigm
Rigor Enforced by Compiler – but Can’t
 Bind/Group Modules into Subsystems
 Precisely Specify Interconnections and Interactions
Among Subsystems and Components
SWEA13
‘Recent-Past’ Generation?



C++ and Ada95
 Considered “Legacy” Languages - Old
Java, C# - Are they Headed Toward Legacy?
 How do they Rate?
 What Do they Offer that Hasn't been Offered
Before?
 What are Unique Benefits and Potential of Java?
What about new Web Technologies?
 JavaScript, Perl, PHP, Python, Ruby
 XML and SOAP
 How do all of these fit into this process?
 Particularly in Regards to C/S Solutions!
SWEA14
What's Next Step?






Architectural Description Languages
 Provide Tools to Describe Architectures
 Definition and Communication
Codification of Architectural Expertise
Frameworks for Specific Domains
DB vs. GUI vs. Embedded vs. C/S
Formal Underpinning for Engineering Rigor
What has Appeared for Each of these?
 Struts for GUI
 Open Source Frameworks (mediawiki)
 Wide-Ranging Standards (XML)
 Model-Driven Architectures
 What Else???
SWEA15
Architectural Styles




What are Popular Architectural Styles?
 How are they Characterized?
 Example in Practice
Explore a Taxonomy of Styles
Focus on “Micro-Architectures”
 Components
 Flow Among Components
 Represents “Single” Application
Forms Basis for “Macro-Architectures”
 System of Systems
 Application of Applications
 Significantly Scaling Up
SWEA16
Taxonomy of Architectural Styles




Data Flow Systems
 Batch Sequential
 Pipes and Filters
Call & Return Systems

 Main/Subroutines
(C, Pascal)
 Object Oriented
 Implicit Invocation
 Hierarchical Systems

Virtual Machines
 Interpreters
 Rule Based Systems
Data Centered Systems
 DBS
 Hypertext
 Blackboards
Independent
Components
 Communicating
Processes/Event
Systems
Client/Server
 Two-Tier
 Multi-Tier
SWEA17
Taxonomy of Architectural Styles

Establish Framework of …
 Components
 Building Blocks for Constructing Systems
 A Major Unit of Functionality
 Examples Include: Client, Server, Filter, Layer, DB

Connectors
 Defining the Ways that Components Interact
 What are the Protocols that Mandate the Allowable
Interactions Among Components?
 How are Protocols Enforced at Run/Design Time?
 Examples Include: Procedure Call, Event Broadcast,
DB Protocol, Pipe
SWEA18
Overall Framework







What Is the Design Vocabulary?
 Connectors and Components
What Are Allowable Structural Patterns?
 Constraints on Combining Components &
Connectors
What Is the Underlying Conceptual Model?
 Von Neumann, Parallel, Agent, Message-Passing…
 Are there New Emerging Models?
 Collaborative Environments/Shareware?
What Are Essential Invariants of a Style?
 Limits on Allowable Components & Connectors
Common Examples of Usage
Advantages and Disadvantages of a Style
Common Specializations of a Style
SWEA19
Pipes and Filters
Components are Independent
Entities. No Shared State!
Components with
Input and Output
Sort
Sort
Merge
Connectors for Flow Streams of I/O

Filters:
 Invariant: Unaware of Upstream and Downstream
Behavior
 Streamed Behavior: Output Could Go From
One Filter to the Next One Allowing Multiple
Filters to Run in Parallel.
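A minimal Python sketch of the Sort/Sort/Merge arrangement on this slide, assuming generators stand in for filters and plain iterators stand in for pipes. The names `source`, `sort_filter`, `merge_filter`, and `run_pipeline` are illustrative and do not come from the slides.

```python
import heapq
from typing import Iterable, Iterator, List


def source(values: Iterable[int]) -> Iterator[int]:
    """Pump: emits a stream of values into a pipe."""
    for value in values:
        yield value


def sort_filter(stream: Iterable[int]) -> Iterator[int]:
    """Filter: consumes its input stream and emits it in sorted order."""
    yield from sorted(stream)


def merge_filter(left: Iterable[int], right: Iterable[int]) -> Iterator[int]:
    """Filter: lazily merges two already-sorted streams into one."""
    yield from heapq.merge(left, right)


def run_pipeline(a: Iterable[int], b: Iterable[int]) -> List[int]:
    # Each filter sees only its own input and output; no state is shared.
    return list(merge_filter(sort_filter(source(a)), sort_filter(source(b))))
```

A sort filter has to read its whole input before it can emit anything, so in this sketch only the merge stage processes data incrementally.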
SWEA20
Pipes and Filters



Possible Specializations:
 Pipelines - Linear Sequence
 Bounded - Limits on Data Amounts
 Typed Pipes - Known Data Format
What is a Classic Example?
Other Examples:
 Compilers
 Sequential Processes
 Parallel Processes
SWEA21
Pipes and Filters - Another Example



Text Information Retrieval Systems
 Scanning Newspapers for Key Words, Etc.
 Also, Boolean Search Expressions
Where is Such an Architecture Utilized Today?
What is Potential Usage in BMI?
User
Commands
Search
Disk
Controller
Controller
Programming
Result
Query
Resolver
Control
Term
Search
Comparator Data DB
SWEA22
ADTs and OO Architectures

Widespread Usage in the 1990’s
Advantages Are Well Known
Components: Objects (obj)
Connectors: Operation Invocations (op)
[Figure: Objects (obj) Linked Through Their Operations (op)]

Disadvantages:
 Interaction Requires Object Identity
 If Identity Changes, It Is Difficult to Track All
Affected Objects.
SWEA23
Implicit Invocation



Similar to OO in the Sense that Components Can Call
Services on Other Components
How Does this Work?
 Components Have List of Events they can Raise
and List of Procedures to Handle Events
 When Event is Raised, it is Broadcast
 All Components that Have Procedure to Handle
Broadcast Event will Act Upon it
 The Component That Raised the Event has no
Knowledge of Which Component(s) will Handle
Event
What are Some Examples?
SWEA24
Implicit Invocation


Advantages
 No Need to Know the Targeted Components
 Single Event can Impact Multiple Components
 New Event Handlers can Easily be Added
 New Events Can then be Raised
Disadvantages
 No Control Over the Order of Processing When an
Event is Raised
 No Control Over “Who” and “How Many” Process
Events
 Very Non-Deterministic System Behavior
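One way to picture implicit invocation is a small event bus, sketched here in Python. The class `EventBus`, the event name `lab_result`, and the handlers are hypothetical and serve only to illustrate the raise/broadcast/handle cycle described on these slides.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class EventBus:
    """Broadcasts raised events to every handler registered for them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def raise_event(self, event: str, payload: str) -> int:
        # The raiser never names a target; it only learns how many handlers ran.
        handlers = list(self._handlers.get(event, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


def demo() -> List[str]:
    log: List[str] = []
    bus = EventBus()
    bus.register("lab_result", lambda p: log.append("chart updated: " + p))
    bus.register("lab_result", lambda p: log.append("provider paged: " + p))
    bus.raise_event("lab_result", "glucose high")
    bus.raise_event("unhandled", "ignored")  # no handlers registered
    return log
```

This sketch happens to run handlers in registration order. A real broadcast mechanism may give no such guarantee, which is the ordering concern listed under Disadvantages.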
SWEA25
What has OO Evolved Into?


What has Classic OO Solution Evolved into Today?
 Client (Browser + Struts)
 Server (Many Variants of OO Languages)
 Database Server (typically Relational)
Different Style (e.g., Design Pattern)
 Does Pattern Capture All Aspects of Style?
 Do we Need to Couple Technology with Pattern?
Dr. D, Jan 01, 08
Fever, Flu, Bed Rest
No Scripts
No Tests
Item(Phy_Name*, Date*,
Visit_Flag, Symptom, Diagnosis, Treatment,
Presc_Flag, Pre_No, Pharm_Name, Medication,
Test_Flag, Test_Code, Spec_No, Status, Tech)
SWEA26
Layered Systems
Useful Systems
Base Utility
Core
level
Users



Components - Virtual Machine at Each Layer
Connectors - Protocols That Specify How Layers
Interact
Interaction Is Restricted to Adjacent Layers
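The adjacency restriction can be sketched in Python as follows, assuming each layer holds a reference only to the layer directly beneath it. The `Layer` class and `build_stack` helper are illustrative names; the layer names are taken from the figure labels.

```python
from typing import List, Optional


class Layer:
    """A layer that may only call the layer directly beneath it."""

    def __init__(self, name: str, below: Optional["Layer"] = None) -> None:
        self.name = name
        self._below = below

    def request(self, message: str) -> List[str]:
        # Record this layer, then delegate to the adjacent lower layer only.
        trace = [self.name + " handles " + message]
        if self._below is not None:
            trace.extend(self._below.request(message))
        return trace


def build_stack(names_top_down: List[str]) -> Layer:
    """Builds the stack bottom-up and returns the top layer."""
    below: Optional[Layer] = None
    for name in reversed(names_top_down):
        below = Layer(name, below)
    if below is None:
        raise ValueError("at least one layer is required")
    return below
```

Because a layer cannot reach past its neighbor, every request passes through each intermediate layer in turn.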
SWEA27
Layered Systems


Advantages:
 Increasing Levels of Abstraction
 Support Enhancement - New Layers
 Support for Reuse
Drawbacks:
 Not Feasible for All Systems
 Performance Issues With Multiple Layers
 Defining Abstractions Is Difficult.
SWEA28
Layered Systems in BMI


One Approach to Constructing Access to Patient Data
for Clinical Research and Clinical Practice
Construct Layered Data Repositories as Below
 Each Layer Targets Different User Group
 Need to Fine Tune Access Even within Layers
Aggregated
De-identified
Patient
Data
Provider
Cl. Researchers
Public Health Researchers
SWEA29
ISO as Layered Architecture

ISO Open Systems Interconnection (OSI) Model
 Now Widely Used as a Reference Architecture
 7-layer Model
 Provides Framework for Specific Protocols (Such
as IP, TCP, FTP, RPC, UDP, RSVP, …)
[Figure: Two Peer Protocol Stacks, Each With the Seven Layers Below]
Application
Presentation
Session
Transport
Network
Data Link
Physical
SWEA30
ISO OSI Model
Application
Presentation
Session
Transport
Network
Data Link
Physical

Physical (Hardware)/Data Link Layer Networks:
Ethernet, Token Ring, ATM
Network Layer Net: The Internet

Transport Layer Net: TCP-based Network

Presentation/Session Layer Net: HTTP/HTML, RPC,
PVM, MPI
Applications, E.g., WWW, Window System,
Algorithm


SWEA31
Repositories
ks8
ks1
Blackboard
(shared data)
ks2
ks3
ks6
ks4




ks7
ks5
Knowledge Sources Interact With the Blackboard.
Blackboard Contains the Problem Solving State Data.
Control Is Driven by the State of the Blackboard.
DB Systems Are a Form of Repository With a Layer
Between the BB and the KSs - Supports
 Concurrent Access, Security, Integrity, Recovery
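A small Python sketch of the blackboard idea, assuming each knowledge source is a triple of name, readiness test, and contribution. The two knowledge sources (`tokenize`, `count`) and the `run` control loop are hypothetical and were chosen only to show that control follows the blackboard state.

```python
from typing import Callable, Dict, List, Tuple

Blackboard = Dict[str, object]
# (name, is_ready, contribute): a knowledge source watches and updates the blackboard.
KnowledgeSource = Tuple[str, Callable[[Blackboard], bool], Callable[[Blackboard], None]]


def tokenize_ready(bb: Blackboard) -> bool:
    return "raw" in bb and "tokens" not in bb


def tokenize(bb: Blackboard) -> None:
    bb["tokens"] = str(bb["raw"]).split()


def count_ready(bb: Blackboard) -> bool:
    return "tokens" in bb and "count" not in bb


def count(bb: Blackboard) -> None:
    bb["count"] = len(bb["tokens"])


# Listed in the "wrong" order on purpose: the blackboard state decides who runs.
SOURCES: List[KnowledgeSource] = [
    ("count", count_ready, count),
    ("tokenize", tokenize_ready, tokenize),
]


def run(blackboard: Blackboard, sources: List[KnowledgeSource], max_steps: int = 100) -> List[str]:
    fired: List[str] = []
    for _step in range(max_steps):
        ready = [ks for ks in sources if ks[1](blackboard)]
        if not ready:
            break  # no knowledge source can contribute any more
        name, _is_ready, contribute = ready[0]
        contribute(blackboard)
        fired.append(name)
    return fired
```

The knowledge sources never call each other; they interact only through the shared data.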
SWEA32
Database System as a Repository
c8
c1
Database
(shared data)
c2
c3
c6
c4



c7
c5
Clients Interact With the DBMS
Database Contains the Problem Solving State Data
Control is Driven by the State of the Database
 Concurrent Access, Security, Integrity, Recovery
 Single Layer System: Clients have Direct Access
 Control of Access to Information must be
Carefully Defined within DB Security/Integrity
SWEA33
Team Project as a Repository
c8
c1
Web Portal
Shared
c2
c3
c6
c4



c7
c5
Clients are Providers, Patients, Clinical Researchers
Database Underlies Web Portal
Simply a Portion of Architecture
 Interactions with PHR (Patients)
 Interactions with EMR (Providers)
 Interactions with Database/Warehouse (Researchers)
SWEA34
Interpreters
Inputs
Outputs


Program being
interpreted
Data
(program state)
Simulated
interpretation
engine
Selected
instruction
Selected
data
Internal
interpreter
state
What Are Components and Connectors?
Where Have Interpreters Been Used in CS&E?
 LISP, ML, Java, Other Languages, OS
Command Line
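The parts named in the figure can be sketched with a tiny stack-machine interpreter in Python. The instruction set (`read`, `push`, `add`, `mul`, `print`) is invented for illustration and is not taken from the slides.

```python
from typing import List, Sequence, Tuple

Instruction = Tuple  # e.g. ("push", 2) or ("add",)


def interpret(program: Sequence[Instruction], inputs: Sequence[int]) -> List[int]:
    stack: List[int] = []      # data (program state)
    pending = list(inputs)     # inputs not yet consumed
    outputs: List[int] = []
    pc = 0                     # internal interpreter state
    while pc < len(program):
        op, *args = program[pc]            # selected instruction
        if op == "push":
            stack.append(args[0])
        elif op == "read":
            stack.append(pending.pop(0))
        elif op == "add":
            right, left = stack.pop(), stack.pop()   # selected data
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "print":
            outputs.append(stack.pop())
        else:
            raise ValueError("unknown instruction: " + str(op))
        pc += 1
    return outputs
```

Here the program and the data are components, and the fetch of the selected instruction and selected data acts as the connector to the interpretation engine.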
SWEA35
Java as Interpreter
SWEA36
Process Control Paradigms
Input variables
Set
point
Δs to
manipulated
variables
Controller
Input variables
Set
point

Controller
Δs to
manipulated
variables
With Feedback
Process
Controlled
variable
Without Feedback
Process
Controlled
variable
Also:
 Open vs. Closed Loop Systems
 Well Defined Control and Computational
Characteristics
 Heavily Used in Engineering Fields.
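The difference between the two paradigms can be sketched numerically in Python. This assumes a simple proportional controller and a constant disturbance on the process; the gain, step count, and function names are illustrative choices, not values from the slides.

```python
def closed_loop(set_point, initial, gain=0.5, steps=30, disturbance=0.0):
    """With feedback: each step measures the controlled variable and corrects it."""
    value = initial
    for _ in range(steps):
        error = set_point - value          # compare measurement to set point
        delta = gain * error               # change to the manipulated variable
        value = value + delta + disturbance
    return value


def open_loop(set_point, initial, steps=30, disturbance=0.0):
    """Without feedback: the changes are planned up front and never corrected."""
    planned_delta = (set_point - initial) / steps
    value = initial
    for _ in range(steps):
        value = value + planned_delta + disturbance
    return value
```

With no disturbance both versions reach the set point. Once a disturbance is present, the open-loop version drifts by the accumulated disturbance, while the feedback version settles near the set point with a bounded offset.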
SWEA37
Process Architecture: Statechart Diagram?
SWEA38
Process Architecture: Activity Diagram?

Clear Applicability to Medical Processes that have
Underlying BMI – Low Level Processes
Waiting for
Heart Signal
timeout
irregular beat
Heartbeat
Heart Signal
Waiting for
Resp. Signal
Breath
Trigger
Local
Alarm
Trigger
Remote
Alarm
Resp Signal
Alarm Reset
SWEA39
Design Patterns as Software Architectures




Emerged as the Recognition that in Object-Oriented
Systems Repetitions in Design Occurred
Gained Prominence in 1995 with Publication of
“Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley
 “… descriptions of communicating objects and
classes that are customized to solve a general
design problem in a particular context…”
 Akin to Complicated Generic
Usage of Patterns Requires
 Consistent Format and Abstraction
 Common Vocabulary and Descriptions
Simple to Complex Patterns – Wide Range
SWEA40
The Observer Pattern



Utilized to Define a One-to-Many Relationship
Between Objects
When Object Changes State – all Dependents are
Notified and Automatically Updated
Loosely Coupled Objects
 When one Object (Subject – an Active Object)
Changes State then Multiple Objects (Observers –
Passive Objects) are Notified
 Observer Object Implements Interface to Specify
the Way that Changes are to Occur
 Two Interfaces and Two Concrete Classes
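A compact Python sketch of the "two interfaces and two concrete classes" structure, assuming abstract base classes play the role of interfaces. `PatientRecord` and `Display` are hypothetical concrete classes picked for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class Observer(ABC):
    """Interface: how a dependent is told about a change."""

    @abstractmethod
    def update(self, state: str) -> None: ...


class Subject(ABC):
    """Interface: how dependents register and get notified."""

    @abstractmethod
    def attach(self, observer: Observer) -> None: ...

    @abstractmethod
    def notify(self) -> None: ...


class PatientRecord(Subject):
    """Concrete subject: notifies all observers whenever its status changes."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []
        self._status = "stable"

    def attach(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self) -> None:
        for observer in self._observers:
            observer.update(self._status)

    def set_status(self, status: str) -> None:
        self._status = status
        self.notify()


class Display(Observer):
    """Concrete observer: remembers every state it was notified about."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def update(self, state: str) -> None:
        self.seen.append(state)
```

The subject knows its observers only through the `Observer` interface, which is what keeps the objects loosely coupled.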
SWEA41
The Observer Pattern
SWEA42
Model View Controller

http://java.sun.com/blueprints/patterns/MVC-detailed.html
SWEA43
Model View Controller

Three Parts of the Pattern:
 Model
 Enterprise Data and Business Rules for Accessing and
Updating Data

View
 Renders the Contents (or Portion) of Model
 Deals with Presentation of Stored Data
 Pull or Push Model Possible

Controller
 Translates Interactions with View into Actions on
Model
 Actions could be Button Clicks (GUI), Get/Post http
(Web), etc.
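The three parts can be sketched in a few lines of Python, assuming a pull model in which the view reads from the model when asked to render. The counter example and the action names `click` and `reset` are hypothetical.

```python
class CounterModel:
    """Model: holds the data and the rules for updating it."""

    def __init__(self) -> None:
        self._count = 0

    @property
    def count(self) -> int:
        return self._count

    def increment(self) -> None:
        self._count += 1

    def reset(self) -> None:
        self._count = 0


class TextView:
    """View: renders the model's contents (pull model: it reads the model)."""

    def render(self, model: CounterModel) -> str:
        return "Count: " + str(model.count)


class Controller:
    """Controller: translates user interactions into actions on the model."""

    def __init__(self, model: CounterModel, view: TextView) -> None:
        self._model = model
        self._view = view

    def handle(self, action: str) -> str:
        if action == "click":
            self._model.increment()
        elif action == "reset":
            self._model.reset()
        else:
            raise ValueError("unknown action: " + action)
        return self._view.render(self._model)
```

In a push variant the model would notify the view of changes instead of waiting to be read.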
SWEA44
Model View Controller

http://java.sun.com/blueprints/patterns/MVC-detailed.html
SWEA45
UML for System Modeling




UML is a Language for Specifying, Visualizing,
Constructing, and Documenting Software Artifacts
What Does a Modeling Language Provide?
 Model Elements: Concepts and Semantics
 Notation: Visual Rendering of Model Elements
 Guidelines: Hints and Suggestions for Using
Elements in Notation
References and Resources
 Web: http://www.uml.org/
Is UML Sufficient for Complexity of BMI?
 Able to Model Information Needs for BMI?
 Able to Represent Required Architectures?
SWEA46
UML Diagrammatic Representations





Component Diagram: Captures the Physical Structure
of the Implementation
Deployment Diagram: Captures the Topology of a
System’s Hardware
Collaboration Diagram: Captures Dynamic Behavior
(Message-Oriented)
What About Other Diagrams?
 State Chart Diagram: Captures Dynamic Behavior
(Event-Oriented)
 Activity Diagram: Captures Dynamic Behavior
(Activity-Oriented)
These and Others Seem too Low Level …
What is Role of UML for BMI?
 Yet Another Design Artifact
 Can it be More?
SWEA47
Component Diagram

Captures the Physical Structure of the Implementation
SWEA48
Deployment Diagram

Captures the Topology of a System’s Hardware
SWEA49
Collaboration Diagram
SWEA50
Single and Multi-Tier Architectures


Widespread use in Practice for All Types of
Distributed Systems and Applications
Two Kinds of Components
 Servers: Provide Services - May be Unaware of
Clients
 Web Servers (unaware?)
 Database Servers and Functional Servers (aware?)

Clients: Request Services from Servers
 Must Identify Servers
 May Need to Identify Self
 A Server Can be Client of Another Server

Expanding from Micro-Architectures (Single
Computer/One Application) to Macro-Architecture
SWEA51
Single and Multi-Tier Architectures





Normally, Clients and Servers are Independent
Processes Running in Parallel
Connectors Provide Means for Service Requests and
Answers to be Passed Among Clients/Servers
Connectors May be RPC, RMI, etc.
Advantages
 Parallelism, Independence
 Separation of Concerns, Abstraction
 Others?
Disadvantages
 Complex Implementation Mechanisms
 Scalability, Correctness, Real-Time Limits
 Others?
SWEA52
Example: Software Architectural Structure
Initial Data
Entry Operator
(Scanning &
Posting)
Advanced Data
Entry
Operators
Analyst
Manager
10-100MB Network
Document
Server
Stored
Images/CD
Database
Server
Running
Oracle
RMI Registry
RMI Act.
Obj/Server
RMI Act.
Obj/Server
Functional Server
SWEA53
Business Process Model
DB
DB
Historical Completed
Records Applications
Licensing
DB
Supervisor
Review
Scanner
DB
Licensing
Division
Scanning
Operator
Stored
Images
Licensing Division Printer
Data Entry Operator
DB
Basic
Information
Entered
New Licenses
New Appointments
FOI
Letters (Request
Information, etc.)
SWEA54
Two-Tier Architecture




Small Manufacturer Previously on C++
New Order Entry, Inventory, and Invoicing
Applications in Java Programming Language
Existing Customer and Order Database
Most of Business Logic in Stored Procedures
Tool-generated GUI Forms for Java Objects
SWEA55
Three-Tier Architecture





Passenger Check-in for Regional Airline
Local Database for Seating on Today's Flights
Clients Invoke EJBs at Local Site Through RMI
EJBs Update Database and Queue Updates
JMS Queues Updates to Legacy System
JDBC API Used to Access Local Database
SWEA56
Four-Tier Architecture




Web Access to Brokerage Accounts
Only HTML Browser Required on Front End
"Brokerbean" EJB Provides Business Logic
Login, Query, Trade Servlets Call Brokerbean
Use JNDI to Find EJBs, RMI to Invoke Them
SWEA57
Architecture Comparisons




Two-tier Through JDBC API is Simplest
Multi-tier: Separate Business Logic, Protect Database
Integrity, More Scalable
JMS Queues vs. Synchronous (RMI or IDL):
 Availability, Response Time, Decoupling
JMS Publish & Subscribe: Off-line Notification
RMI-IIOP vs. JRMP vs. Java IDL:
 Standard Cross-language Calls or Full Java
Functionality
JTS: Distributed Integrity, Lockstep Actions
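The availability and decoupling argument for queues over synchronous calls can be sketched in Python. This uses the standard-library `queue` module as a stand-in for a message queue and is not JMS; `LegacySystem` and the update strings are hypothetical.

```python
import queue


class LegacySystem:
    """Stand-in for a back-end that is sometimes offline."""

    def __init__(self) -> None:
        self.available = False
        self.applied = []

    def apply(self, update: str) -> None:
        if not self.available:
            raise ConnectionError("legacy system is offline")
        self.applied.append(update)


def synchronous_update(legacy: LegacySystem, update: str) -> bool:
    """Synchronous call: the caller succeeds only if the callee is up right now."""
    try:
        legacy.apply(update)
        return True
    except ConnectionError:
        return False


def queued_update(pending: "queue.Queue", update: str) -> bool:
    """Queued call: the caller only needs the queue, not the callee."""
    pending.put(update)
    return True


def drain(pending: "queue.Queue", legacy: LegacySystem) -> int:
    """Delivers queued updates once the legacy system is reachable."""
    delivered = 0
    while legacy.available and not pending.empty():
        legacy.apply(pending.get())
        delivered += 1
    return delivered
```

The trade-off is that the queued caller gets no immediate answer, so it suits updates and notifications better than queries.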
SWEA58
Comments on Architectural Styles




Architectural Styles Provide Patterns
 Suppose Designing a New System
 During Requirements Discovery, Behavior and
Structure of System Will Emerge
 Attempt to Match to Architectural Style
 Modify, Extend Style as Needed
By Choosing Existing Architectural Style
 Know Advantages and Disadvantages
 Ability to Focus in on Problem Areas and
Bottlenecks
 Can Adjust Architecture Accordingly
Architectures Range from Large Scale to Small Scale
in their Applicability
We’ll see Examples for BMI Shortly …
SWEA59
Other Issues in Software Architectures







Consider a Set of Applications
 New Software
 Legacy, COTS, Databases, etc.
A Distributed Application is a Set of Applications
Deployed Over a Network that Communicate
Relationship Between Applications
Different Implementations of “Same” Application on
Different Hardware Platforms
Configuration of Various Hardware Nodes
Different Node Types in the Network
Issue:
 What is the ‘Best’ Way to Deploy Applications
Across the Network of Available Resources?
SWEA60
Distributed Application & Hardware Nodes

Computers & Connections May have
Different Characteristics that Affect
their Usage
 Speed
 Storage
 Bandwidth
SWEA61
Objective: ‘Best’ Deployment



A Distributed System is Optimally
Deployed if it Yields the Best
Performance
Performance: Efficient Use of
Resources via Throughput,
Response Time, or Number of
Messages
What are Implications in BMI?
 Need to Bring Together
Multiple Assets
 Work Efficiently Across
Network
 Unifying Clinical Research
Repositories
SWEA62
Distr. Systems: Combo of Requirements
interaction
patterns
software
elements
hardware
elements
Specification
interfaces
connections
protocols
SWEA63
Deployment Influenced by Many Factors
algorithms
software
architecture
underlying
network
replication
degree
Performance
processing
nodes
usage
patterns
middleware
deployment
SWEA64
Framework for Design and Deployment
SOFTWARE
HARDWARE
Dependencies
Deployment
PERFORMANCE
SWEA65
What is I5?



Five Definition Languages
 Interface
 Inheritance
 Implementation
 Instantiation
 Installation
Five Formal Integrated Graphical Languages Based on
UML’s Implementation Diagrams
The Application, Network, Dependencies and the
Deployment are Part of an Integrated Framework
SWEA66
The Five Levels of I5
Abstraction

Interface (I1) - Types of Components, Nodes
and Connectors

Implementation (I2) - Classes of
Components, Nodes and Connectors

Integration (I3) - Dependencies Between
Component and Node Classes

Instantiation (I4) - Instances of Each Class
Definition

Installation (I5) - Deployment of Each
Instance (Requirements and Complete
Deployment)
Detail
SWEA67
Levels of Specification in I5
 Types
- Generic Definition of Components, Nodes, and
Connectors According to Their Role
 Defined in I1
 Used in I2 to Define Classes
 Classes - Different Implementations of the Types
 Defined in I2
 Used in I3 to Associate Software Components and
Hardware Artifacts and I4 to Define Instances
 Instances - Identical Copies of the Different Classes
 Defined in I4
 Used in I5 to Deploy Instances Across Nodes
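The type/class/instance chain can be sketched with Python dataclasses, together with a check that each level refers only to definitions from the level before it. The record layout and `check_levels` are illustrative and are not the I5 notation; the sample names in the usage note follow the deck's diagrams, and the pairing of classes to types is an assumption read off those diagrams.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ComponentType:        # I1: generic definition by role
    name: str


@dataclass(frozen=True)
class ComponentClass:       # I2: one implementation of a type
    name: str
    realizes: str


@dataclass(frozen=True)
class ComponentInstance:    # I4: one copy of a class
    ident: str
    of_class: str


def check_levels(
    types: Sequence[ComponentType],
    classes: Sequence[ComponentClass],
    instances: Sequence[ComponentInstance],
) -> List[str]:
    """Returns a message for every reference to an undefined type or class."""
    errors: List[str] = []
    type_names = {t.name for t in types}
    class_names = {c.name for c in classes}
    for c in classes:
        if c.realizes not in type_names:
            errors.append("class " + c.name + " realizes undefined type " + c.realizes)
    for i in instances:
        if i.of_class not in class_names:
            errors.append("instance " + i.ident + " uses undefined class " + i.of_class)
    return errors
```

For example, a `Client` type realized by `PCCtrCl` with instance `c1` passes the check, while an instance of a class that was never defined at I2 is reported.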
SWEA68
UML


UML is a Set of Graphical Specification Languages
(OMG’s Standard Design Language Since November,
1997)
Implementation Diagrams
 Component Diagrams:
 Show the Physical Structure of the Code in Terms of
Code Components and Their Dependencies

Deployment Diagrams:
 Show the Physical Architecture of the Hardware and
Software in the System.
 They Have a Type and an Instance Version.
SWEA69
UML

When to Use Deployment Diagrams
“… In practice, I haven’t seen this kind of diagram
used much. Most people do draw diagrams to show
this kind of information but they are informal
cartoons. On the whole, I don’t have a problem with
that since each system has its own physical
characteristics that you want to emphasize. As we
wrestle more and more with distributed systems,
however, I’m sure we will require more formality as
we understand better which issues need to be
highlighted in deployment diagrams.”
 From “UML Distilled. Applying the Standard
Object Modeling Language”, by Martin Fowler.
Addison-Wesley, Object Technology Series, 7th.
Reprint June, 1998.
SWEA70
Pros and Cons of Graphical Modeling

Advantages:




Clear to Show
Structure
Excellent
Communication
Vehicle
Addresses Different
Aspects of
Modeling in an
Integrated Fashion

Disadvantages:



Shows Little (or No)
Details
There is a Big Gap
Between Specification
and Implementation
Limited by Screen
Size & Printable Page
Solution: Associate a Complete Textual
Specification to Graphical Model that Contains
the Necessary Details for Each Element
SWEA71
Design Concepts







Interface: Interaction With the Outer World,
Signature + Requested Services
Type: Abstract Entity - Interface + Semantics
Subtype: Inherits the Supertype Definition
Class: Implementation of a Type
Realization: Relation Between a Type and a Class That
Implements It
Subclass: Inherits the Superclass Implementation
Instance: Element of a Class
SWEA72
The I5 Framework




An Integrated Specification Framework for
Distributed Systems
 Support for the Architectural Specification of OO
and Component Based Distributed Systems
 Heterogeneous Network - Platforms
A Five Level Framework for Defining Software and
Hardware (Platforms) With a Uniform Notation and
With Different Levels of Abstraction
Specified Textually in Z or Graphically in UML
 Emphasis on Implementation Diagrams
Please See http://www.engr.uconn.edu/~cecilia
SWEA73
Dependencies Between Levels
Component Types
Node Types
INTERFACE
Component Classes
Node Classes
IMPLEMENTATION
Implementation
Dependencies
Inst. Components
INTEGRATION
Inst. Nodes
System
Instantiation
Installation Req.
(together,separated)
INSTANTIATION
Installation Req.
(fix location)
Complete Installation
INSTALLATION
SWEA74
Interface - Software: I1S

Components Types




Type
Supertypes
Associated
Interfaces
Calls

Properties



Types are Unique
Supertypes Must Be
Part of I1S
Calls Must Be
Satisfied in I1S
SWEA75
Interface - Software: I1S
response
Client
<<call>>
<<call>>
request
receive
FrontEnd
<<call>>
<<call>>
Replica
receive
gossip
<<call>>
SWEA76
Interface - Hardware: I1H


Node Types
Connector Types
Connections

Properties
 All Node Types Must Be
Connected
 Only Node and Connector
Types Defined Take Part in
the Connections
MPI
Sockets
SUN
Intel
Pentium
SWEA77
Implementation - Software: I2S

Component Classes
 Component Type
 Class
 Superclasses
 Calls to Classes
Interfaces

Properties:
 Only Types in I1S are
Allowed
 Superclasses Are
Realizations of the
Supertypes
 Calls & Inheritance are
Satisfied Within I2S
SWEA78
Implementation - Software: I2S
response
response
PCCtrCl
XCtrCl
<<call>>
<<call>>
request
receive
XFrontEnd
<<call>>
Counter
receive
gossip
<<call>>
SWEA79
Implementation - Hardware: I2H



Node Classes
 Node Type
 Class
Connector Classes
 Type
 Class
Connections Between
Node Classes

Properties
 Node and Connector
Classes Refine the
Types in I1H
 Connections are With
Connector Classes That
Refine Connector Types
in I1H
SWEA80
Implementation - Hardware: I2H
MPI
Sockets
SUN
<<realizes>>
Intel
Pentium
<<realizes>>
MPI_Impl
CSockets
SUN OS 4.1.4
Win95
SWEA81
Software and Hardware Integration: I3


Relation <<supports>>
 Instances of the Component Class May Run on
Instances of the Node Class
 Important Step Since it Constrains
Deployment Options
Properties
 Only Node and Component Classes Defined in
I2 Can Participate in the <<supports>>
Relation
SWEA82
Software and Hardware Integration: I3
response
response
PCCtrCl
XCtrCl
<<supports>>
<<supports>>
MPI_Impl
request
XFrontEnd
CSockets
<<supports>>
Win95
SUN OS 4.1.4
receive
<<supports>>
Counter
receive
gossip
SWEA83
Instantiation - Software: I4S

Component Instances
 Class
 Identification
 Calls

Properties
 Instance Calls Refine
Class Calls
 Only Classes in I2S May
Be Instantiated
SWEA84
Instantiation - Software: I4S
request
c1:PCCtrCl
response
fe1:XFrontEnd
response
receive
request
c3:PCCtrCl
c4:XCtrCl
fe2:XFrontEnd
response
ct1:Counter
receive
gossip
ct2:Counter
receive
gossip
c2:PCCtrCl
response
receive
gossip
receive
ct3:Counter
receive
gossip
ct4:Counter
receive
gossip
ct5:Counter
receive
gossip
ct6:Counter
SWEA85
Instantiation - Hardware: I4H


Node Instances
 Class
 Identification
Connector Instances
 Class
 Identification
 Set of Connected
Nodes

Properties
 There are Only
Instances of the Node
& Connector Classes
Defined in I2H
 Connectors Refine
I2H Connections
SWEA86
Instantiation - Hardware: I4H
pc1:Win95
pc2:Win95
pc3:Win95
pc4:Win95
sock1
sock2
sock3
sock4
sun1:
SunOS4.1.4
sun2:
SunOS4.1.4
sun3:
SunOS4.1.4
sun4:
SunOS4.1.4
sun5:
SunOS4.1.4
sun9:
SunOS4.1.4
sun10:
SunOS4.1.4
mpi1
sun6:
SunOS4.1.4
sun7:
SunOS4.1.4
sun8:
SunOS4.1.4
SWEA87
Installation Requirements




A Set of Component Instances Must Be Deployed
Together or Separated
Fix the Location of Some Component Instances
All Installation Requirements Must Be Consistent
With the Requirements Imposed by All the Previous
Specification Levels
Requirements
 Together
 Separated
 Fix
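The three requirement kinds can be checked mechanically. The Python sketch below assumes a placement is a mapping from component instance to node instance, and that the nodes able to support each instance (the result of the earlier levels) are given as sets. The function name and data layout are illustrative, not part of I5.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def installation_violations(
    placement: Dict[str, str],
    supported_nodes: Dict[str, Set[str]],
    together: Iterable[Tuple[str, ...]] = (),
    separated: Iterable[Tuple[str, ...]] = (),
    fixed: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Lists every requirement that a proposed placement breaks."""
    problems: List[str] = []
    for instance, node in placement.items():
        # Consistency with previous levels: the node must support the instance.
        if node not in supported_nodes.get(instance, set()):
            problems.append(instance + " is not supported on " + node)
    for group in together:
        if len({placement[i] for i in group}) > 1:
            problems.append("together violated: " + ", ".join(sorted(group)))
    for group in separated:
        if len({placement[i] for i in group}) < len(group):
            problems.append("separated violated: " + ", ".join(sorted(group)))
    for instance, node in (fixed or {}).items():
        if placement.get(instance) != node:
            problems.append(instance + " must be fixed on " + node)
    return problems
```

An empty result means the placement is consistent with all stated requirements; it says nothing yet about how well the placement performs.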
SWEA88
Installation - Requirements: Ifix, Iseparated
receive
receive
fe2:XFrontEnd
fe1:XFrontEnd
request
sun2:SunOS4.1.4
request
sun3:SunOS4.1.4
separated = {ct1:Counter, ct2:Counter, ct3:Counter,
ct4:Counter, ct5:Counter, ct6:Counter}
SWEA89
Mapping Applications to Hardware

Applications (Left) and Hardware (Right) Instances
Restrictions on
 Which Applications can be Deployed on Which
Hardware?
 Which Applications Deployed Together?
 Which Applications Must be Separate?
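For small systems, the mapping question can be explored by brute force. The Python sketch below enumerates every assignment of applications to nodes, discards those that break the restrictions, and keeps the one with the fewest messages crossing between nodes. The cost measure, the sample applications, and the traffic figures are assumptions made for illustration; the deck does not prescribe a search method.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional, Set, Tuple


def best_deployment(
    apps: List[str],
    nodes: List[str],
    allowed: Dict[str, Set[str]],
    traffic: Dict[Tuple[str, str], int],
    together: Iterable[Tuple[str, ...]] = (),
    separated: Iterable[Tuple[str, ...]] = (),
) -> Tuple[Optional[Dict[str, str]], Optional[int]]:
    together, separated = list(together), list(separated)
    best: Optional[Dict[str, str]] = None
    best_cost: Optional[int] = None
    for choice in product(nodes, repeat=len(apps)):
        placement = dict(zip(apps, choice))
        if any(placement[a] not in allowed[a] for a in apps):
            continue  # application cannot run on that hardware
        if any(len({placement[a] for a in g}) != 1 for g in together):
            continue
        if any(len({placement[a] for a in g}) != len(g) for g in separated):
            continue
        # Cost: messages exchanged between applications on different nodes.
        cost = sum(n for (a, b), n in traffic.items() if placement[a] != placement[b])
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost
```

The search space grows as the number of nodes raised to the number of applications, so this approach only suits small configurations.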
SWEA90
Objective: ‘Best’ Optimal Deployment
SWEA91
Using I5 for BMI

Focus at Architectural Level
 Multiple Assets to Bring Together
 Hospital EMRs, Provider EMRs, Other Systems


Multiple and Disparate Hardware
Different Contexts and Needs
 Clinical Practice – (Near) Real-Time Integration/Access
 Clinical Research – De-Identified Integrated Repository

Performance will be Key Issue
 Clinical Practice – Time of Access
 Clinical Research – Volume of Information
 Some Genomic Data Requires Terabytes of Data!
 Information Overload Possible
SWEA92
The Next Big Challenge


Macro-Architectures
 System of Systems
 Application of Applications
Involves Two Key Issues
 Interoperability
 Heterogeneous Distributed Databases
 Heterogeneous Distributed Systems
 Autonomous Applications

Scalability




Rapid and Continuous Growth
Amount of Data
Variety of Data Types
Different Privacy Levels or Ownerships of Data
SWEA93
Interoperability: A Classic View
Local
Schema
Simple Federation
Multiple Nested Federation
FDB Global
Schema
FDB Global
Schema 4
Federated
Integration
Federated
Integration
Local
Schema
Local
Schema
FDB 1
Local
Schema
Federation
FDB3
Federation
SWEA94
What is CORBA?

Differs from Typical Programming Languages
Objects can be …
 Located Throughout Network
 Interoperate with Objects on other Platforms
 Written in Any PL for which there is a Mapping
from IDL to that Language
Application
Interfaces
Domain Interfaces
Object Request Broker
Object Services
SWEA95
What is CORBA?

Allow Interactions from Client to Server
CORBA Installed on All Participating Machines
Client Application
Static
Stub
DII
Server Application
ORB
Interface
ORB
Interface
Skeleton
DSI
Object Adapter
Client ORB Core
Network
IDL - Independent
Same for all
applications
Server ORB Core
There may be multiple
object adapters
SWEA96
CORBA-Based Development
CSE
5095
IDL file
Client
Application
IDL Compiler
Stub
ORB/IIOP
Object
Implementation
IDL Compiler
Skeleton
ORB/IIOP
SWEA97
Database Interoperability in the Internet

CSE
5095

Technology
 Web/HTTP, JDBC/ODBC, CORBA (ORBs +
IIOP), XML
Architecture
Information Broker
•Mediator-Based Systems
•Agent-Based Systems
SWEA98
ORB Integration: Java Client + Legacy Application
CSE
5095
Java Client
Legacy
Application
Java
Wrapper
Object Request Broker (ORB)
CORBA is the Medium of Info. Exchange
Requires Java/CORBA Capabilities
SWEA99
Java Client with Wrapper to Legacy Application
CSE
5095
Java Client
Java Application Code
WRAPPER
Mapping Classes
JAVA LAYER
Interactions Between Java Client
and Legacy Appl. via C and RPC
C is the Medium of Info. Exchange
Java Client with C++/C Wrapper
NATIVE LAYER
Native Functions (C++)
RPC Client Stubs (C)
Legacy
Application
Network
SWEA100
COTS and Legacy Appls. to Java Clients
CSE
5095
COTS Application
Legacy Application
Java Application Code
Java Application Code
Native Functions that
Map to COTS Appl
NATIVE LAYER
Native Functions that
Map to Legacy Appl
NATIVE LAYER
JAVA LAYER
JAVA LAYER
Mapping Classes
JAVA NETWORK WRAPPER
Mapping Classes
JAVA NETWORK WRAPPER
Network
Java Client
Java Client
Java is Medium of Info. Exchange - C/C++ Appls with Java Wrappers
SWEA101
Java Client to Legacy App via RDBS
CSE
5095
Transformed
Legacy Data
Java Client
Updated Data
Relational
Database
System (RDBS)
Extract and
Generate Data
Transform and
Store Data
Legacy
Application
SWEA102
JDBC

CSE
5095

JDBC API Provides DB Access Protocols for Open,
Query, Close, etc.
Different Drivers for Different DB Platforms
JDBC API
Java
Application
Driver Manager
Driver
Oracle
Driver
Access
Driver
Driver
Sybase
SWEA103
Connecting a DB to the Web

CSE
5095
DBMS

CGI Script Invocation
or JDBC Invocation
Web Server

Web Servers are
Stateless
DB Interactions Tend
to be Stateful
Invoking a CGI
Script on Each DB
Interaction is Very
Expensive, Mainly
Due to the Cost of
DB Open
Internet
Browser
SWEA104
Connecting More Efficiently

CSE
5095
DBMS
Helper
Processes
CGI Script
or JDBC
Invocation

Web Server
Internet

To Avoid Cost of
Opening Database, One
can Use Helper
Processes that Always
Keep Database Open
and Outlive Web
Connection
Newly Invoked CGI
Scripts Connect to a
Preexisting Helper
Process
System is Still Stateless
Browser
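The helper-process idea above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the DBMS, and the class name DbHelper, the visits table, and the request paths are hypothetical, not part of the original slides.

```python
import sqlite3


class DbHelper:
    """Hypothetical helper that opens the database once and keeps it open."""

    def __init__(self, path=":memory:"):
        # The expensive "DB open" happens once, when the helper starts.
        self.conn = sqlite3.connect(path)
        self.opens = 1
        self.conn.execute("CREATE TABLE IF NOT EXISTS visits (page TEXT)")

    def handle(self, page):
        # Each short-lived request reuses the already-open connection.
        self.conn.execute("INSERT INTO visits (page) VALUES (?)", (page,))
        self.conn.commit()
        (count,) = self.conn.execute("SELECT COUNT(*) FROM visits").fetchone()
        return count


helper = DbHelper()
# Three independent "CGI-style" requests; none of them reopens the database.
results = [helper.handle(p) for p in ("/a", "/b", "/c")]
print(results, "opens:", helper.opens)
```

Each request still carries everything it needs, so the Web side stays stateless; only the cost of opening the database is paid once instead of per request.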
SWEA105
DB-Internet Architecture
CSE
5095
WWW Client
(Netscape)
WWW client
(Info. Explore)
WWW Client
(HotJava)
Internet
HTTP Server
DBWeb Gateway
DBWeb Gateway
DBWeb Gateway
DBWeb
Dispatcher
DBWeb Gateway
SWEA106
Biomedical Architectures

CSE
5095

Transcend Normal Two, Three, and Four Tier
Solutions – Macro-Architecture
An Architecture of Architectures!
 Need to Integrate Systems that are Themselves
Multi-Tier and Distributed
 Need to Resolve Data Ownership Issues
 State of Connecticut Agencies Don’t Share
 Competing Hospitals Seek to Protect Market Share

T1, T2, and Clinical Research Require
 Interoperating Genomic Databases/Supercomputers
 Integration of De-identified Patient Data from Multiple
Sources to Allow Sufficient Study Samples
 De-identified Data Repositories or Data Marts

Dealing with Ownership Issues (DNA Research)
SWEA107
Consider Team Project Architecture
Providers
Patients
CSE
5095
PHR
EMR
Web-Based
Portal(XML + HL7)
Open Source DB
(XML or MySQL)
Feedback
Repository
Clinical Researchers
Education
Materials
SWEA108
Internet and the Web

CSE
5095
A Major Opportunity for Business
 A Global Marketplace
 Business Across State and Country Boundaries

A Way of Extending Services
 Online Payment vs. VISA, Mastercard

A Medium for Creation of New Services
 Publishers, Travel Agents, Teller, Virtual Yellow Pages,
Online Auctions …


A Boon for Academia
 Research Interactions and Collaborations
 Free Software for Classroom/Research Usage
 Opportunities for Exploration of Technologies in
Student Projects
What are Implications for BMI? Where is the Adv?
SWEA109
WWW: Three Market Segments
Server
CSE
5095
Business to Business
Corporate
Network



Server
Intranet




Decision
support
Mfg. System
monitoring
corporate
repositories
Workgroups
Information sharing
Ordering info./status
Targeted electronic
commerce
Internet
Corporate
Server Network
Internet




Sales
Marketing
Information
Services
Provider Network
Server
Provider Network
Exposure to Outside
SWEA110
Information Delivery Problems on the Net

CSE
5095



Everyone can Publish Information on the Web
Independently at Any Time
 Consequently, there is an Information Explosion
 Identifying Information Content More Difficult
There are too Many Search Engines but too Few
Capable of Returning High Quality Data
Most Search Engines are Useful for Ad-hoc Searches
but Awkward for Tracking Changes
What are Information Delivery Issues for BMI?
 Publishing of Patient Education Materials
 Publishing of Provider Education Materials
 How Can Patients/Providers Find what they Need?
 How do they Know if it's Relevant? Reputable?
SWEA111
Example Web Applications

CSE
5095


Scenario 1: World Wide Wait
 A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web
 You Want to Monitor the Results for this Important
Event, so you Fire up your Trusty Web Browser,
Pointing at the Result Posting Site, and Wait, and
Wait, and Wait …
What is the Problem?
 The Scalability Problems are the Result of a
Mismatch Between the Data Access Characteristics
of the Application and the Technology Used to
Implement the Application
May not be Relevant to BMI: Hard to Apply Scenario
SWEA112
Example Web Applications

CSE
5095


Scenario 2:
 Many Applications Today have the Need for
Tracking Changes in Local and Remote Data
Sources and Sending Notifications if Some Condition
Over the Data Source(s) is Met
 To Monitor Changes on the Web, You Need to Fire up
Your Trusty Web Browser from Time to Time,
Cache the Most Recent Result, and Compute the Difference
Manually Each Time You Poll the Data Source(s)
Issue: Pure Pull is Not the Answer to All Problems
BMI: If a Patient Enters Data that Sets off a Chain
Reaction, how Can Provider be Notified and in Turn
the Provider Notify the Patient (Bad Health Event)
SWEA113
What is the Problem?

CSE
5095

Applications are Asymmetric but the Web is Not
 Computation Centric vs. Information Flow Centric
Type of Asymmetry
 Network Asymmetry
 Satellite, CATV, Mobile Clients, Etc.

Client to Server Ratio
 Too Many Clients can Swamp Servers

Data Volume
 Mouse and Key Click vs. Content Delivery

Update and Information Creation
 Clients Need to be Informed or Must Poll

Clearly, for BMI, Simple Web Environment/Browser
is Not Sufficient – No Auto-Notification
SWEA114
What are Information Delivery Styles?

CSE
5095


Pull-Based System
 Transfer of Data from Server to Client is Initiated
by a Client Pull
 Clients Determine when to Get Information
 Potential for Information to be Old Unless Client
Periodically Pulls
Push-Based System
 Transfer of Data from Server to Client is Initiated
by a Server Push
 Clients may get Overloaded if Push is Too
Frequent
Hybrid
 Pull and Push Combined
 Pull First and then Push Continually
SWEA115
Publish/Subscribe

CSE
5095


Semantics: Servers Publish/Clients Subscribe
 Servers Publish Information Online
 Clients Subscribe to the Information of Interest
(Subscription-based Information Delivery)
 Data Flow is Initiated by the Data Sources
(Servers) and is Aperiodic
 Danger: Subscriptions can Lead to Other
Unwanted Subscriptions
Applications
 Unicast: Database Triggers and Active Databases
 1-to-n: Online News Groups
May work for Clinical Researcher to Provider Push
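A minimal sketch of the publish/subscribe semantics described above, assuming an in-memory broker. The class name Broker, the topic names, and the messages are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory publish/subscribe broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        # A client registers interest in one topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Data flow is initiated by the source: every subscriber is pushed to.
        for callback in self.subscribers[topic]:
            callback(message)
        return len(self.subscribers[topic])


broker = Broker()
provider_inbox = []
broker.subscribe("study-results", provider_inbox.append)

delivered = broker.publish("study-results", "new guideline available")
ignored = broker.publish("unrelated-topic", "nobody is listening")
print(provider_inbox, delivered, ignored)
```

The subscriber never polls; it receives a message only when the source publishes on a topic it subscribed to, which is the aperiodic, source-initiated flow the slide describes.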
SWEA116
Design Options for Nodes

CSE
5095
Three Types of Nodes:
 Data Sources
 Provide Base Data which is to be Disseminated

Clients
 Who are the Net Consumers of the Information

Information Brokers
 Acquire Information from Other Data Sources, Add
Value to that Information and then Distribute this
Information to Other Consumers
 By Creating a Hierarchy of Brokers, Information
Delivery can be Tailored to the Need of Many Users

Brokers may be Ideal Intermediaries for BMI!
 Act on Behalf of Patients, Providers
 Incorporate Secure Access
SWEA117
Research Challenges

CSE
5095
Ubiquitous/Pervasive
Many computers and information
appliances everywhere,
networked together

Inherent Complexity:
 Coping with Latency (Sometimes
Unpredictable)
 Failure Detection and Recovery
(Partial Failure)
 Concurrency, Load Balancing,
Availability, Scale
 Service Partitioning
 Ordering of Distributed Events
“Accidental” Complexity:
 Heterogeneity: Beyond the Local
Case: Platform, Protocol, Plus All
Local Heterogeneity in Spades.
 Autonomy: Change and Evolve
Autonomously
 Tool Deficiencies: Language
Support (Sockets, RPC),
Debugging, Etc.
SWEA118
Infosphere
Problem: Too Many Sources, Too Much Information
CSE
5095
Internet:
Information Jungle
Infopipes
Clean, Reliable,
Timely Information,
Anywhere
Digital
Earth
Personalized
Filtering &
Info. Delivery
Sensors
SWEA119
Current State-of-Art
CSE
5095
Web
Server
Mainframe
Database
Server
Thin
Client
SWEA120
Infosphere Scenario – for BMI
CSE
5095
Infotaps &
Fat Clients
Sensors
Variety
of Servers
Many sources
Database
Server
SWEA121
Heterogeneity and Autonomy

CSE
5095
Heterogeneity:
 How Much can we Really Integrate?
 Syntactic Integration
 Different Formats and Models
 Web/SQL Query Languages

Semantic Interoperability
 Basic Research on Ontology, Etc

Autonomy
 No Central DBA on the Net
 Independent Evolution of Schema and Content
 Interoperation is Voluntary
 Interface Technology (Support for ISVs)
 DCOM: Microsoft Standard
 CORBA, Etc...
SWEA122
Security and Data Quality

CSE
5095
Security
 System Security in the Broad Sense
 Attacks: Penetrations, Denial of Service
 System (and Information) Survivability
 Security Fault Tolerance
 Replication for Performance, Availability, and
Survivability

Data Quality
 Web Data Quality Problems




Local Updates with Global Effects
Unchecked Redundancy (Mutual Copying)
Registration of Unchecked Information
Spam on the Rise
SWEA123
Legacy Data Challenge

CSE
5095

Legacy Applications and Data
 Definition: Important and Difficult to Replace
 Typically, Mainframe Mission Critical Code
 Most are OLTP and Database Applications
Evolution of Legacy Databases
 Client-server Architectures
 Wrappers
 Expensive and Gradual in Any Case
SWEA124
Potential Value Added/Jumping on Bandwagon

CSE
5095




Sophisticated Query Capability
 Combining SQL with Keyword Queries
Consistent Updates
 Atomic Transactions and Beyond
But Everything has to be in a Database!
 Only If we Stick with Classic DB Assumptions
Relaxing DB Assumptions
 Interoperable Query Processing
 Extended Transaction Updates
Commodities DB Software
 A Little Help is Still Good If it is Cheap
 Internet Facilitates Software Distribution
 Databases as Middleware
SWEA125
Data Warehousing and Data Mining

CSE
5095

Data Warehousing
 Provide Access to Data for Complex Analysis,
Knowledge Discovery, and Decision Making
 Underlying Infrastructure in Support of Mining
 Provides Means to Interact with Multiple DBs
 OLAP (on-Line Analytical Processing) vs. OLTP
Data Mining
 Discovery of Information in Vast Data Sets
 Search for Patterns and Common Features based
 Discover Information not Previously Known
 Medical Records Accessible Nationwide
 Research/Discover Cures for Rare Diseases

Relies on Knowledge Discovery in DBs (KDD)
SWEA126
Data Warehousing and OLAP

CSE
5095


A Data Warehouse
 Database is Maintained Separately from an
Operational Database
 “A Subject-Oriented, Integrated, Time-Variant, and
Non-Volatile Collection of Data in Support of
Management’s Decision-Making Process”
[W. H. Inmon]
OLAP (on-Line Analytical Processing)
 Analysis of Complex Data in the Warehouse
 Attempt to Attain “Value” through Analysis
 Relies on Trained, Skilled Knowledge
Workers who Discover Information
Data Mart
 Organized Data for a Subset of an Organization
 Establish De-Identified Marts for BMI Research
SWEA127
Building a Data Warehouse

CSE
5095
Option 1
 Leverage Existing
Repositories
 Collate and Collect
 May Not Capture All
Relevant Data

Option 2
 Start from Scratch
 Utilize Underlying
Corporate Data
Corporate
data warehouse
Option 1:
Consolidate Data Marts
Option 2:
Build from
scratch
Data Mart
...
Data Mart
Data Mart
Data Mart
Corporate data
SWEA128
BMI – Partition/Excerpt Data Warehouse

CSE
5095

Clinical and Epidemiological Research (and for T2 and T1)
Each Study Submitted to Institutional Review Board (IRB)
 For Human Subjects (Assess Risks, Protect Privacy)
 See: http://resadm.uchc.edu/hspo/irb/
To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to
Create a Data Mart for each Approved Study
 Export/Excerpt Study Data from Warehouse
 May be Single or Multiple Sources
BMI
data warehouse
Data Mart
...
Data Mart
Data Mart
Data Mart
SWEA129
Data Warehouse Characteristics

CSE

5095


Utilizes a “Multi-Dimensional” Data Model
Warehouse Comprised of
 Store of Integrated Data from Multiple Sources
 Processed into Multi-Dimensional Model
Warehouse Supports
 Time Series and Trend Analysis
 “Super-Excel” Integrated with DB Technologies
Data is Less Volatile than Regular DB
 Doesn’t Dramatically Change Over Time
 Updates at Regular Intervals
 Specific Refresh Policy Regarding Some Data
SWEA130
Three Tier Architecture
CSE
5095
monitor
External data sources
OLAP Server
integrator
Summarization
report
Operational databases
Extract
Transform
Load
Refresh
serve
Data Warehouse
Query report
Data mining
metadata
Data marts
SWEA131
Data Warehouse Design

CSE
5095


Most Data Warehouses use a Star Schema to
Represent the Multi-Dimensional Data Model
Each Dimension is Represented by a Dimension
Table that Holds the Attributes Describing that Dimension
A Fact Table Connects All Dimension Tables with a
Multiple Join and Stores the Measures
 Each Tuple in the Fact Table Records the Measures
for One Combination of Dimension Values
 Each Tuple in the Fact Table Consists of a Pointer
to Each of the Dimensional Tables
 Links Between the Fact Table and the Dimensional
Tables Form a Shape Like a Star
SWEA132
What is a Multi-Dimensional Data Cube?

CSE
5095



Representation of Information in Two or More
Dimensions
Typical Two-Dimensional - Spreadsheet
In Practice, to Track Trends or Conduct Analysis,
Three or More Dimensions are Useful
For BMI – Axes for Diagnosis, Drug, Subject Age
SWEA133
Multi-Dimensional Schemas

CSE
5095



Supporting Multi-Dimensional Schemas Requires Two
Types of Tables:
 Dimension Table: Tuples of Attributes for Each
Dimension
 Fact Table: Measured/Observed Variables with
Pointers into Dimension Table
Star Schema
 Characterizes Data Cubes by having a Fact Table
with a Single Table for Each Dimension
Snowflake Schema
 Dimension Tables from Star Schema are Organized
into Hierarchy via Normalization
Both Represent Storage Structures for Cubes
SWEA134
Example of Star Schema
CSE
5095
Product
Date
Date
Month
Year
Sale Fact Table
Date
ProductNo
ProdName
ProdDesc
Category
Product
Store
Customer
Unit_Sales
Store
StoreID
City
State
Country
Region
Dollar_Sales
Customer
CustID
CustName
CustCity
CustCountry
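The sales star schema in the diagram above can be sketched in SQL. This is an illustration only: it assumes Python's built-in sqlite3 module as the database, omits the Customer dimension and some columns for brevity, and uses made-up rows; the table name date_dim and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE product  (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store    (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
-- Fact table: one pointer (foreign key) per dimension plus the measures.
CREATE TABLE sale (
    date         TEXT    REFERENCES date_dim(date),
    product_no   INTEGER REFERENCES product(product_no),
    store_id     INTEGER REFERENCES store(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                 [("2000-01-15", "Jan", 2000), ("2000-02-10", "Feb", 2000)])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Pants", "Clothing"), (2, "Beer", "Beverage")])
conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)",
                 [(1, "Atlanta", "GA", "South"), (2, "Hartford", "CT", "East")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?)", [
    ("2000-01-15", 1, 1, 2, 40.0),
    ("2000-01-15", 2, 1, 6, 12.0),
    ("2000-02-10", 2, 2, 12, 24.0),
    ("2000-02-10", 2, 1, 3, 6.0),
])

# The "multiple join": the fact table is joined to each dimension it needs.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.dollar_sales)
    FROM sale AS f
    JOIN product AS p ON p.product_no = f.product_no
    JOIN store   AS s ON s.store_id   = f.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
print(rows)
```

The measures (unit_sales, dollar_sales) live only in the fact table, while descriptive attributes such as region and category live in the dimension tables.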
SWEA135
Example of Star Schema for BMI
CSE
5095
Vitals
Date
Date
Month
Year
Patient Fact Table
Visit Date
BP
Temp
Resp
HR (Pulse)
Vitals
Symptoms
Patient
Medications
Symptoms
Pulmonary
Heart
Mus-Skel
Skin
Digestive
Etc.
Patient
PatientID
PatientName
PatientCity
PatientCountry
Reference another Star
Schema for all Meds
SWEA136
A Second Example of Star Schema …
CSE
5095
SWEA137
and Corresponding Snowflake Schema
CSE
5095
SWEA138
Data Warehouse Issues

CSE
5095

Data Acquisition
 Extraction from Heterogeneous Sources
 Reformatted into Warehouse Context - Names,
Meanings, Data Domains Must be Consistent
 Data Cleaning for Validity and Quality
is the Data as Expected w.r.t. Content? Value?
 Transition of Data into Data Model of Warehouse
 Loading of Data into the Warehouse
Other Issues Include:
 How Current is the Data? Frequency of Update?
 Availability of Warehouse? Dependencies of Data?
 Distribution, Replication, and Partitioning Needs?
 Loading Time (Clean, Format, Copy, Transmit,
Index Creation, etc.)?
 For CTSA – Data Ownership (Competing Hosps).
SWEA139
Knowledge Discovery

CSE
5095


Data Warehousing Requires Knowledge Discovery to
Organize/Extract Information Meaningfully
Knowledge Discovery
 Technology to Extract Interesting Knowledge
(Rules, Patterns, Regularities, Constraints) from a
Vast Data Set
 Process of Non-trivial Extraction of Implicit,
Previously Unknown, and Potentially Useful
Information from Large Collection of Data
Data Mining
 A Critical Step in the Knowledge Discovery
Process
 Extracts Implicit Information from Large Data Set
SWEA140
Steps in a KDD Process

CSE

5095







Learning the Application Domain (goals)
Gathering and Integrating Data
Data Cleaning
Data Integration
Data Transformation/Consolidation
Data Mining
 Choosing the Mining Method(s) and Algorithm(s)
 Mining: Search for Patterns or Rules of Interest
Analysis and Evaluation of the Mining Results
Use of Discovered Knowledge in Decision Making
Important Caveats
 This is Not an Automated Process!
 Requires Significant Human Interaction!
SWEA141
OLAP Strategies

CSE
5095

OLAP Strategies
 Roll-Up: Summarization of Data
 Drill-Down: from the General to Specific (Details)
 Pivot: Cross Tabulate the Data Cubes
 Slice and Dice: Projection Operations Across
Dimensions
 Sorting: Ordering Result Sets
 Selection: Access by Value or Value Range
Implementation Issues
 Persistent with Infrequent Updates (Loading)
 Optimization for Performance on Queries is More
Complex - Across Multi-Dimensional Cubes
 Recovery Less Critical - Mostly Read Only
 Temporal Aspects of Data (Versions) Important
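The roll-up, slice, and dice operations listed above can be sketched on a tiny cube. This is a simplified illustration, not an OLAP engine: the cube is a plain list of cells, and the products, cities, months, and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells: one sales value per (product, city, month).
cells = [
    {"product": "Electronics", "city": "Atlanta", "month": "Jan", "sales": 100},
    {"product": "Electronics", "city": "Atlanta", "month": "Feb", "sales": 150},
    {"product": "Electronics", "city": "Boston", "month": "Jan", "sales": 80},
    {"product": "Clothing", "city": "Atlanta", "month": "Jan", "sales": 60},
    {"product": "Clothing", "city": "Boston", "month": "Feb", "sales": 40},
]


def roll_up(rows, keep):
    """Summarize: aggregate sales over every dimension not listed in keep."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in keep)] += row["sales"]
    return dict(totals)


def select(rows, **fixed):
    """Slice (one fixed dimension) or dice (several fixed dimensions)."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]


by_product = roll_up(cells, keep=["product"])                 # roll-up
atlanta_slice = select(cells, city="Atlanta")                 # slice on city
diced = select(cells, city="Atlanta", product="Electronics")  # dice
by_month_in_slice = roll_up(atlanta_slice, keep=["month"])    # finer view of the slice
print(by_product, by_month_in_slice, len(diced))
```

Going from by_product to the per-month totals inside the Atlanta slice moves from the general to the specific, which is the direction the slide calls drill-down.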
SWEA142
On-Line Analytical Processing

CSE
5095

Data Cube
 A Multidimensional Array
 Each Attribute is a Dimension
In Example Below, the Data Must be Interpreted so
that it Can be Aggregated by Region/Product/Date
Product
Product
Store
Date
Sale
acron
Rolla,MO 7/3/99 325.24
budwiser LA,CA
5/22/99 833.92
large pants NY,NY
2/12/99 771.24
Pants
Diapers
Beer
Nuts
West
East
3’ diaper Cuba,MO 7/30/99 81.99
Region
Central
Mountain
South
Jan
Feb March April
Date
SWEA143
On-Line Analytical Processing

CSE
5095
For BMI – Imagine a Data Table with Patient Data
 Define Axis
 Summarize Data
 Create Perspective to Match Research Goal
 Essentially De-identified Data Mart
Medication
Patient
Med
BirthDat Dosage
Steve
Lipitor
1/1/45 10mg
John
Zocor
2/2/55
Harry
Crestor
3/3/65 5mg
Lois
Lipitor
4/4/66 20mg
Charles Crestor
7/1/59
Lescol
Crestor
Zocor
Lipitor
80mg
10mg
5
10
Dosage
20
40
80
1940s 1950s 1960s 1970s
Decade
SWEA144
Examples of Data Mining

CSE
5095
The Slicing Action
 A Vertical or Horizontal Slice Across Entire Cube
Months
Slice
on city Atlanta
Products Sales
Products Sales
Months
Multi-Dimensional Data Cube
SWEA145
Examples of Data Mining

CSE
5095
The Dicing Action
 A Slice First Identifies One Dimension
 A Selection of Any Cube within the Slice which
Essentially Constrains All Three Dimensions
Months
Products Sales
Products Sales
Months
March 2000
Electronics
Atlanta
Dice on Electronics and Atlanta
SWEA146
Examples of Data Mining
Drill Down - Takes a Facet (e.g., Q1) and Decomposes into Finer Detail
Jan Feb March
Products Sales
CSE
5095
Drill down
on Q1
Roll Up
on Location
(State, USA)
Roll Up: Combines Multiple Dimensions
From Individual Cities to State
Q1 Q2 Q3 Q4
Products Sales
Products Sales
Q1 Q2 Q3 Q4
SWEA147
Mining Other Types of Data

CSE

5095
Analysis and Access Dramatically More Complicated!
Time Series Data for Glucose, BP, Peak Flow, etc.
Spatial databases
Multimedia databases
World Wide Web
Time series data
Geographical and Satellite Data
SWEA148
Advantages/Objectives of Data Mining

CSE
5095


Descriptive Mining
 Discover and Describe General Properties
 60% of People who Buy Beer on Friday also have
Bought Nuts or Chips in the Past Three Months
Predictive Mining
 Infer Interesting Properties based on Available
Data
 People who Buy Beer on Friday usually also Buy
Nuts or Chips
Result of Mining
 Order from Chaos
 Mining Large Data Sets in Multiple Dimensions
Allows Businesses, Individuals, etc. to Learn about
Trends, Behavior, etc.
 Impact on Marketing Strategy
SWEA149
Data Mining Methods (1)

CSE
5095
Association
 Discover the Frequency of Items Occurring
Together in a Transaction or an Event
 Example
 80% of Customers who Buy Milk also Buy Bread
Hence - Bread and Milk Adjacent in Supermarket
 50% of Customers Forget to Buy Milk/Soda/Drinks
Hence - Available at Register

Prediction
 Predicts Some Unknown or Missing Information
based on Available Data
 Example
 Forecast Sale Value of Electronic Products for Next
Quarter via Available Data from Past Three Quarters
SWEA150
Association Rules

CSE

5095


Motivated by Market Analysis
Rules of the Form
 Item_1 ^ Item_2 ^ … ^ Item_k ⇒ Item_{k+1} ^ … ^ Item_n
Example
 “Beer ^ Soft Drink ⇒ Pop Corn”
Problem: Discovering All Interesting Association
Rules in a Large Database is Difficult!
 Issues
 Interestingness
 Completeness
 Efficiency

Basic Measurement for Association Rules
 Support of the Rule
 Confidence of the Rule
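The two basic measurements can be made concrete with a small sketch. It uses the common definitions: the support of a rule is the fraction of transactions containing every item in the rule, and its confidence is that support divided by the support of the left-hand side. The transactions below are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


lhs, rhs = {"beer", "soft drink"}, {"popcorn"}
rule_support = support(lhs | rhs)
rule_confidence = confidence(lhs, rhs)
print(rule_support, rule_confidence)
```

Here the rule "Beer ^ Soft Drink ⇒ Pop Corn" holds in 2 of 5 transactions (support 0.4), and in 2 of the 3 transactions that contain the left-hand side (confidence about 0.67).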
SWEA151
Data Mining Methods (2)

CSE
5095
Classification
 Determine the Class or Category of an Object
based on its Properties
 Example
 Classify Companies based on the Final Sale Results in
the Past Quarter

Clustering
 Organize a Set of Multi-dimensional Data Objects
in Groups to Minimize Inter-group Similarity
and Maximize Intra-group Similarity
 Example
 Group Crime Locations to Find Distribution Patterns
SWEA152
Classification

CSE
5095


Two Stages
 Learning Stage: Construction of a Classification
Function or Model
 Classification Stage: Prediction of Classes of
Objects Using the Function or Model
Tools for Classification
 Decision Tree
 Bayesian Network
 Neural Network
 Regression
Problem
 Given a Set of Objects whose Classes are Known
(Training Set), Derive a Classification Model
which can Correctly Classify Future Objects
SWEA153
An Example

CSE
5095


Attributes
Attribute     Possible Values
outlook       sunny, overcast, rain
temperature   continuous
humidity      continuous
windy         true, false
Class Attribute - Play/Don’t Play the Game
Training Set
 Values that Set the Condition for the Classification
 What are the Patterns Below?
Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  No
overcast  83           78        false  Yes
sunny     80           90        true   No
sunny     72           95        false  No
sunny     72           70        false  Yes
…         …            …         …      ...
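The two stages of classification can be illustrated on the five training rows visible in the table above. This is a sketch under stated assumptions: it learns a decision stump (a one-level decision tree) over the numeric attributes only, the function names learn_stump and classify are hypothetical, and the rule it finds reflects only these five rows, not the elided remainder of the training set.

```python
# The five fully visible training rows:
# (outlook, temperature, humidity, windy, play)
training = [
    ("sunny", 85, 85, False, "No"),
    ("overcast", 83, 78, False, "Yes"),
    ("sunny", 80, 90, True, "No"),
    ("sunny", 72, 95, False, "No"),
    ("sunny", 72, 70, False, "Yes"),
]
NUMERIC = {"temperature": 1, "humidity": 2}  # attribute name -> column index


def learn_stump(rows):
    """Learning stage: find the rule 'attr <= t -> Yes, else No' with fewest errors."""
    best = None
    for name, col in NUMERIC.items():
        for threshold in sorted({r[col] for r in rows}):
            errors = sum(
                ("Yes" if r[col] <= threshold else "No") != r[4] for r in rows
            )
            if best is None or errors < best[0]:
                best = (errors, name, threshold)
    return best


def classify(stump, row):
    """Classification stage: apply the learned rule to a new object."""
    _, name, threshold = stump
    return "Yes" if row[NUMERIC[name]] <= threshold else "No"


stump = learn_stump(training)
print(stump)
print(classify(stump, ("rain", 68, 65, False, None)))
```

On these rows the stump separates the classes with a single humidity threshold; a full decision tree would apply the same idea recursively and also split on the categorical attributes.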
SWEA154
Data Mining Methods (3)

CSE
5095
Summarization
 Characterization (Summarization) of General
Features of Objects in the Target Class
 Example
 Characterize People’s Buying Patterns on the Weekend
 Potential Impact on “Sale Items” & “When Sales Start”
 Department Stores with Bonus Coupons

Discrimination
 Comparison of General Features of Objects
Between a Target Class and a Contrasting Class
 Example
 Comparing Students in Engineering and in Art
 Attempt to Arrive at Commonalities/Differences
SWEA155
Summarization Technique

CSE

5095
Attribute-Oriented Induction
Generalization using Concept Hierarchy (Taxonomy)
barcode category
14998
milk
brand
dairyland
content
size
Skim
2L
food
12998 mechanical MotorCraft valve 23a 12in
…
…
…
…
...
Milk
…
Skim milk … 2% milk
Category
milk
milk
…
Content Count
skim
2%
…
280
98
...
bread
White
whole
bread … wheat
Lucern … Dairyland
Wonder … Safeway
SWEA156
Why is Data Mining Popular?

CSE
5095
Technology Push
 Technology for Collecting Large Quantity of Data
 Bar Code, Scanners, Satellites, Cameras

Technology for Storing Large Collection of Data
 Databases, Data Warehouses
 Variety of Data Repositories, such as Virtual Worlds,
Digital Media, World Wide Web


Corporations want to Improve Direct Marketing and
Promotions - Driving Technology Advances
 Targeted Marketing by Age, Region, Income, etc.
 Exploiting User Preferences/Customized Shopping
What is Potential for BMI?
 How do you see Data Mining Utilized?
 What are Key Issues to Worry About?
SWEA157
Requirements & Challenges in Data Mining

CSE
5095



Security and Social
 What Information is Available to Mine?
 Preferences via Store Cards/Web Purchases
 What is Your Comfort Level with Trends?
User Interfaces and Visualization
 What Tools Must be Provided for End Users of
Data Mining Systems?
 How are Results for Multi-Dimensional Data
Displayed?
Performance Guarantees
 Range from Real-Time for Some Queries to Long-Term for Other Queries
Data Sources of Complex Data Types or Unstructured
Data - Ability to Format, Clean, and Load Data Sets
SWEA158
CSE
5095
An Initiative of the University of Connecticut
Center for Public Health and Health Policy
Robert H. Aseltine, Jr., Ph.D.
Cal Collins
January 16, 2008
SWEA159
What is CHIN?

CSE
5095

State of Connecticut Agencies Collect and Maintain
Data in Separate Databases such as:
 Vital Statistics: Birth, Death (DPH)
 Surveillance data: Lead Screening and
Immunization Registries (DPH)
 Administrative services: LINK system (DCF),
CAMRIS (DMR)
 Benefit programs: WIC (DPH), Medicaid (DSS)
 Educational achievement: (PSIS)
Such Data is Un-Integrated
 Impossible to Track and Assess Target Populations
 Difficult to Develop Evidence-Based Practices
 Limits Meaningful Interactions Among State
Agencies
SWEA160
What Do We Mean by “Integration?”
UCONN Health Center
Low Birth Weight Infant Registry
Dept. of Mental Retardation
Birth to Three System
CT Dept. of Education
PSIS System
CSE
5095
Last Name
First Name
DOB
SSN
Birth Wt.
(kg)
Last Name
First Name
DOB
Street
Town
Appel
April
01/01/1
999
016-000-9876
2.8
Allen
Gwen
01/01/19
99
Apple
Enfie
Berry
John
02/02/1
997
216-000-4576
2.9
Buck
Jerome
07/01/19
99
Burbank
West
Carat
Colleen
03/03/1
993
119-000-1234
1.9
Cleary
Jane
03/03/19
93
Cedar
Tolla
Ernst
Max
04/04/1
994
116-000-3456
2.7
Dory
Daniel
03/03/19
93
Dogfish
Hartf
Gomez
Gloria
05/05/1
995
036-000-9999
2.6
Ernst
Max
04/04/19
94
Elm
Enfie
Hurst
William
06/06/1
996
016-000-5599
3.1
Friday
Joe
11/03/19
99
Fruit
Wind
Keller
Helene
07/07/1
997
017-000-2340
2.5
Glenn
Valerie
03/23/19
98
Glen
Branf
Pedro
08/08/1
998
018-000-9886
Martinez
Pedro
08/08/19
98
High
Hartf
Felix
09/09/1
999
029-000-9111
Riley
Lily
03/03/19
96
Ipswich
Bridg
Sanchez
Ramon
New
Peggy
016-000-8787
03/03/19
93
Juniper
10/10/2
000
Martinez
Rodriguez
Smith
3.0
2.8
2.5
Last Name
First Name
CMT
Math
Polio Vac
Date
Days in
Attendance
Appel
April
134
01/05/
1999
179
Carat
Colleen
256
05/01/
1998
122
Cleary
Jane
268
01/28/
2000
178
Ernst
Max
152
01/09/
1999
145
Gomez
Gloria
289
01/01/
1999
168
Friday
Joe
265
10/01/
1999
170
Keller
Helene
309
11/01/
2001
180
Martinez
Pedro
248
12/01/
2003
180
Riley
Lily
201
01/01/
1999
122
Sanchez
Ramon
249
01/01/
1999
159
Last Name
First Name
DOB
SSN
Birth Wt.
Street
Town
CMT Math
Grade 3
Polio
Vaccination
Date
Days in
Attendance
Ernst
Max
04/04/1994
116-000-3456
2.7
Elm
Enfield
152
01/09/1999
145
Martinez
Pedro
08/08/1998
018-000-9886
3.0
High
Hartford
248
12/01/2003
180
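The integration shown above can be sketched as a join across the three sources. The rows below are copied from the slide, but the matching rule is an assumption made for illustration: records are linked when last name and first name agree in all three sources and the date of birth agrees where both sources record it. The function name integrate is hypothetical.

```python
# UCONN Health Center registry: (last, first) -> (DOB, SSN, birth weight in kg)
birth_registry = {
    ("Appel", "April"): ("01/01/1999", "016-000-9876", 2.8),
    ("Ernst", "Max"): ("04/04/1994", "116-000-3456", 2.7),
    ("Martinez", "Pedro"): ("08/08/1998", "018-000-9886", 3.0),
}
# Birth to Three System: (last, first) -> (DOB, street, town)
birth_to_three = {
    ("Ernst", "Max"): ("04/04/1994", "Elm", "Enfield"),
    ("Martinez", "Pedro"): ("08/08/1998", "High", "Hartford"),
}
# PSIS System: (last, first) -> (CMT math score, polio vaccination date, days in attendance)
education = {
    ("Appel", "April"): (134, "01/05/1999", 179),
    ("Ernst", "Max"): (152, "01/09/1999", 145),
    ("Martinez", "Pedro"): (248, "12/01/2003", 180),
}


def integrate():
    merged = []
    # Keep only individuals present in all three sources.
    shared = birth_registry.keys() & birth_to_three.keys() & education.keys()
    for key in sorted(shared):
        dob, ssn, weight = birth_registry[key]
        dob2, street, town = birth_to_three[key]
        if dob != dob2:
            # Same name but conflicting DOB: not treated as the same person.
            continue
        cmt, polio_date, days = education[key]
        merged.append(key + (dob, ssn, weight, street, town, cmt, polio_date, days))
    return merged


integrated = integrate()
for row in integrated:
    print(row)
```

April Appel appears in two sources but not in Birth to Three, so she is excluded; the two remaining rows correspond to the integrated table on the slide. Exact name matching is fragile in practice, which motivates the identifier-matching approaches discussed later.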
SWEA161
Key Challenges to Integrating Data

CSE
5095




Security and Privacy
 HIPAA
 FERPA
 WIC, Social Security (Medicaid/Medicare)
regulations
 State statutes
Alteration/disruption of business practices
Unique identification of individuals/cases
Accuracy and reliability of data
Disparate hardware/software platforms
SWEA162
The Solution: CHIN

CSE

5095

Connecticut Health Information Network
A Federated Network That:
 Allows Shared Access to “Health”-related Data
From Heterogeneous Databases
 Allows Agencies to Retain Complete Control Over
Access to Data
 Has Minimal Impact on Business Practices
 Complies with Security and Privacy Statutes
 Incorporates Cutting-edge Approaches to Case
Matching
Partnership of:
 Early Partners: DPH, DCF, DDS, DoE, DOIT,
UConn, Akaza Research
SWEA164
CHIN Processes and Components
CSE
5095
Define data
elements
in CHIN
Map data
elements to
source database
Publish “metadata”
to CHIN with security
and privacy rules
CHIN Metadata
Registry
CHIN
Contributor
CHIN Metadata Registry
and CHIN Trusted
Broker
Query Execution:
Identifier Matching and
Data Merge
CHIN GRID and
Trusted Broker
Review Committee Approval
Build Query
CHIN Enterprise
Administration
CHIN Metadata Registry
and CHIN Query Builder
De-identify Data
CHIN Trusted Broker and
De-Identification Engine
Integrated,
De-identified
Data
SWEA165
Original CHIN Architecture
CSE
5095
http://publichealth.uconn.edu/CHIN.php
SWEA166
Second CHIN Architecture: User Side
CSE
5095
A
&
A
Contributor
Contributor
SWEA167
Second CHIN Architecture: Contributor Side
CSE
5095
A
&
A
Front End
Trusted
Broker
SWEA168
Current CHIN Architecture
CSE
5095
SWEA169
CHIN Architecture: Standards-based

CSE
5095

All data is mapped to Health Level Seven’s Clinical
Document Architecture (CDA) in XML
 Health Level Seven (HL7), is an ANSI-approved
Standards Developing Organization
 HL7 has its own XML Special Interest Group,
responsible for developing XML implementations
of its standards
 HL7 is also an active participant in W3C, the
organization responsible for the development of
XML
 CDA was approved as an ANSI standard in
November of 2000.
Component Architecture communicates via Web
Services and OGSA Grid standards
SWEA170
CHIN Arch.: Proven, Open Components

CSE
5095
Components are based on open-source libraries
 The grid-based servers Mako and Virtual Mako are
part of the Mobius Project from Ohio State
University’s Dept. of BioInformatics
 The translation tools to get data into XML are
provided by the XQuare and XBridge projects,
hosted on the ObjectWeb website, an open source
middleware community
 The algorithm and code for identity management is
FEBRL, Freely Extensible Biomedical Record
Linkage, which was developed at Australian
National University
 NuSOAP Web Services Engine for component
integration
SWEA171
FEBRL

CSE 
5095



Identifier matching in FEBRL proceeds in four steps:
Data cleansing and standardization
 Removes, to the degree possible, string discrepancies based
on common misspellings, extra white space, or misplaced
name or address components.
Indexing
 Reduces the number of record comparisons
which must be performed for scalability; blocking, sorting,
and bigram indexing methods are all supported.
Record comparison
 Conducted using an arbitrary composition of exact or
inexact string comparison methods over any combination of
fields
Classification.
 Follows the Fellegi-Sunter [34] model, with record pairs
assigned a weight based on a palette of probabilities and
matches determined based on the record pair weights
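The four steps above can be sketched in miniature. This is not FEBRL's API or code: it is a hand-written illustration of standardization, bigram-based inexact comparison, and Fellegi-Sunter style weights (log2 of m/u on agreement, log2 of (1-m)/(1-u) on disagreement). The m and u probabilities, the 0.8 agreement cutoff, and the single decision threshold of 0 are all hypothetical, and indexing is skipped because only a few pairs are compared.

```python
import math


def standardize(name):
    """Cleansing step: lowercase and collapse extra white space."""
    return " ".join(name.lower().split())


def bigrams(text):
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, between 0.0 and 1.0."""
    ga, gb = bigrams(standardize(a)), bigrams(standardize(b))
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical (m, u) per field: probability that the field agrees for
# true matches (m) and for non-matches (u).
PROBS = {"first": (0.95, 0.05), "last": (0.95, 0.01)}


def pair_weight(rec_a, rec_b, agree_at=0.8):
    """Sum of per-field agreement or disagreement weights for one record pair."""
    weight = 0.0
    for field, (m, u) in PROBS.items():
        if bigram_similarity(rec_a[field], rec_b[field]) >= agree_at:
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


a = {"first": "Pedro", "last": "Martinez"}
b = {"first": " pedro ", "last": "Martines"}  # extra spaces and a misspelling
c = {"first": "Max", "last": "Ernst"}
print(pair_weight(a, b) > 0, pair_weight(a, c) > 0)
```

The pair (a, b) agrees on both fields despite the misspelling and gets a positive weight, while (a, c) disagrees on both and gets a negative one. The Fellegi-Sunter model proper uses two thresholds, leaving a middle band of possible matches for clerical review.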
SWEA172
FEBRL

CSE
5095

The current prototype uses FEBRL to implement a simplistic
method of linkage whereby record pairs are declared a match if
the first and last name are exactly equal.
Next Steps
 Evaluate the accuracy of linking records over a rubric of
five data fields - first name, last name, date of birth, social
security number, and gender.
 Exact and inexact matching (i.e., misspellings and slight
discrepancies), including experimental variations of the
service based on the blinded bigram matching algorithm.
 Assess false positives and false negatives produced by each
palette of field comparison algorithms.
 Evaluate the accuracy of linking records using fabricated
data sets with characteristics similar to real datasets
 Experiment with variations of canopy cluster matching
algorithm.
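To make the comparison between the prototype's exact-name rule and an inexact alternative concrete, here is a small sketch that counts false positives and false negatives on fabricated, labelled pairs. It uses a plain (unblinded) bigram Dice coefficient; the 0.7 threshold, the function names, and the sample data are assumptions for illustration only.

```python
def bigrams(s):
    """Set of adjacent character pairs; repeated bigrams collapse to one."""
    s = s.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a, b):
    """Dice coefficient over bigram sets, from 0.0 to 1.0."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Strings too short to have bigrams: fall back to equality.
        return 1.0 if a.lower().strip() == b.lower().strip() else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def exact_rule(a, b):
    """The prototype's rule: first and last name exactly equal."""
    return a["first"] == b["first"] and a["last"] == b["last"]

def bigram_rule(a, b, threshold=0.7):
    """Inexact rule: both name fields are similar enough."""
    return (dice(a["first"], b["first"]) >= threshold
            and dice(a["last"], b["last"]) >= threshold)

def evaluate(rule, labelled_pairs):
    """Return (false_positives, false_negatives) for a matching rule."""
    fp = fn = 0
    for a, b, is_match in labelled_pairs:
        predicted = rule(a, b)
        if predicted and not is_match:
            fp += 1
        elif is_match and not predicted:
            fn += 1
    return fp, fn

# Fabricated pairs: (record, record, truly the same person?)
PAIRS = [
    ({"first": "katherine", "last": "johnson"},
     {"first": "katharine", "last": "johnson"}, True),
    ({"first": "mary", "last": "jones"},
     {"first": "mary", "last": "jones"}, True),
    ({"first": "mary", "last": "jones"},
     {"first": "mark", "last": "james"}, False),
]
```

On this toy data `evaluate(exact_rule, PAIRS)` misses the misspelled first name, while `evaluate(bigram_rule, PAIRS)` recovers it. Bigram similarity is harsh on very short names, so the threshold would need tuning against realistic data sets.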
SWEA173
Other CHIN Issues

Why Choose an Open Architecture?
 Increased Accountability
 Plenty of Documentation and Research
 Greater Transparency
 Ease of Installation, Maintenance, Dissemination
How is Data Ported into CHIN?
 CHIN is based on a Grid, with each organization
supporting its own data through a Contributor
server
 Agency staff has complete control over access to
data on CHIN by other users
 Only one server faces the outside network
SWEA174
Creating a Contributor Server
External IP Address
Connection to
CHIN Trusted
Broker
Data Elements
Firewall
Contributor Server
Contains:
XML generated files
Mako service
Java files


*.xqy files
XML files to
generate CDA
compliant files
Datasource
SWEA175
Connecting to rest of Network
External IP Address
Connection to CHIN Trusted Broker
•Metadata Registry takes information
•About data elements
•About data security
•Datasource information
•Contributor profile is registered with CHIN Network Admin
Data Elements
Firewall
Contributor Server
Contains:
XML generated files
Mako service
Java files


*.xqy files
XML files to
generate CDA
compliant files
Datasource
SWEA176
How do we get data out?

The Trusted Broker component:
 Pulls XML from the Virtual Mako which reaches
out to all Contributors
 Compares records from different Contributors
using FEBRL
 De-identifies data sets to generate a final data set
for Investigators
The Front End component:
 Provides a central place for users to connect to the
system
 Connects to the Metadata Registry and the Trusted
Broker via Web Services calls
 Allows different users of the system to perform
different actions
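The Trusted Broker's final step, turning linked records into a de-identified data set, might look roughly like the sketch below. The field names, the `P0001`-style study identifiers, and the list of identifying fields are hypothetical; the slides do not specify how CHIN performs de-identification.

```python
# Hypothetical set of directly identifying fields to strip.
IDENTIFYING_FIELDS = {"first", "last", "ssn", "dob"}

def deidentify(linked_groups, records):
    """Replace identifiers with a per-person study id.

    linked_groups: lists of record ids judged (e.g. by record linkage)
                   to belong to the same person.
    records:       mapping of record id -> field dictionary.
    """
    result = []
    for n, group in enumerate(linked_groups, start=1):
        study_id = f"P{n:04d}"  # same id for every record of one person
        for rid in group:
            row = {k: v for k, v in records[rid].items()
                   if k not in IDENTIFYING_FIELDS}
            row["study_id"] = study_id
            result.append(row)
    return result
```

Records from different Contributors that were linked to one person share a study id, so an Investigator can still follow a person across sources without seeing names or numbers. Dropping direct identifiers alone does not address quasi-identifiers such as rare diagnoses or small geographic areas.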
SWEA177
Getting Data from CHIN
SWEA178
Getting Data From CHIN
XML Files
•CHIN also contains:
•A Front-end server to take queries
•A Trusted Broker to compare data,
perform record linkage,
and de-identify results
FEBRL
Result Set
Deidentify
Final Result Set
SWEA179
Progress to Date

Needs assessment completed
Technical and functional specifications identified
MOUs with state agencies
Expanding list of partners
Prototype developed
Funding for Model Network
Development/Deployment/Evaluation 2008
SWEA180
Demo
SWEA181
EMR Architectures

Provider-Based Systems have Two Variants
 All Data In House
   Larger Providers (Clinics)
   Control All Own Data
   Sizeable IT Staff for 24-7 Operations
   Control of Own Backups
 Limited In House – Off Site Storage (Larger,
Multi-Site Practices)
   Smaller Providers – Limited IT Staff
   Desire Out-of-Box Solution
   Local Data for Ease of Access
   Remote Storage – Promotes Off-Hours Access
   Even 1st Variant – Service for “Backups”
SWEA182
EMR for Large Providers - AllScript
SWEA183
EMR for Smaller Providers
Provider’s Office
Vendor’s Location
Server/Data Farm
Local
EMR
Patient
Data
Remote
EMR
Remote Access
SWEA184
Integrating Clinical Repositories

Provider/Hospital Relationship
 Provider has Privileges at Hospital
 Provider Chooses Office-Based EMR
 More Easily Integrated with Hospital EMR
 Emerging at Community Hospital Level
Example:
 Milford Hospital, MA
 All Area Providers with Privileges Linked in
 Ability to See Patient Records, Tests, at Hospital
 Unclear on Uploads from Providers to Hospital
 However, No Link to UMass Medical Center (with
which Milford Hospital is Affiliated)
SWEA185
Integrating Clinical Repositories

CTSA – Region Wide Clinical/Translational Research
Target Area Hospitals
 St. Francis, Hartford, Hosp. Central CT, CCMC
 Each Hospital has Own Clinical Repository (EMR)
For Wider-Scoped T1, T2, and Clinical Research
 Need to Integrate these Repositories at Some
Level
 What is Most Practical?
 Setting up Centralized De-Identified Repository?
 Creating Data Marts as you go?
 What are Pros and Cons of Each?

Researcher Seeking CHF Patient Data Needs to
have De-Identified Data Mart
SWEA186
Integrating Clinical Repositories
SWEA187
Integrating Clinical Repositories
SWEA188
Integrating Clinical Repositories
SWEA189
Integrating Clinical Repositories
NHIN Prototype Phase I
SWEA190
Integrating Clinical Repositories
NHIN Prototype Phase II
SWEA191
SWEA192
Personal Health Record Integration
SWEA193
Concluding Remarks

Only Scratched Surface on Architectures
 Micro Architectures
 Macro Architectures
 Super-Macro Architectures (We’ll see …)
What are Key Facets in the Discussion?
 Role and Impact of Standards
 Open Solutions
 Architectural Variants – Reuse “Architecture”
 Can we Reuse CHIN for Clinical Practice?
 Are All Contributors Simply Each Hospital and EHR?
 How do we Connect all of the Pieces?

What are Next Steps?
 Let’s Review Some other Work
 Source: Wide Range of Presentations on Web
SWEA194