Considering and assuring quality dimensions in architecture design
"Drowning in data, yet starved of information"
(Ruth Stanat, 1990, in ‘The Intelligent Corporation’)
Ir. Nitesh Bharosa | n.bharosa@tudelft.nl
11-02-2010
Nitesh Bharosa
• PhD candidate at the ICT Section (finishing in January 2011)
• M.Sc. in Systems Engineering, Policy Analysis and Management
Thesis: Enterprise Architecture at Siemens
Research interests
• information & system quality
• orchestration & coordination
• enterprise architecture, SOA, SaaS
• public safety and disaster management
• Courses:
• SPM3410 Web Information Systems and Management
• SPM4341 Design of Innovative ICT-infrastructures and services
• guest lectures on e-business and management of technology
• Understand the concepts of information and system quality in multi-actor environments
• Be able to distinguish multiple information quality dimensions
• Be able to distinguish multiple systems quality dimensions
• Understand principles for assuring information and system quality
• Introduction to “Master of Disaster Game”
• Strong, Lee & Wang (1997). Data quality in context.
• Nelson et al. (2002). Antecedents of information and system quality.
• Bharosa et al. (2009). Identifying and confirming information and system quality requirements for multiagency disaster management.
1. Background and relevance
2. Concepts and definitions
3. Hurdles for IQ and SQ in practice
4. Complex multi actor case: Disaster management
5. How do we assure information and system quality in the architecture?
6. Summary and conclusions
Figure: Information Quality and System Quality in the DeLone & McLean (1992) model of information systems success
*DeLone & McLean (1992). Information Systems Success: the quest for the dependent variable. Information Systems Research, 3(1), pp. 60-95
• Operational Impacts:
• Lowered customer satisfaction
• Increased cost: 8–12% of revenue in the few, carefully studied cases
• For service organizations, 40–60% of expense
• Lowered employee satisfaction
• Tactical Impacts:
• Poorer decision making: Poorer decisions that take longer to make
• More difficult to implement data warehouses
• More difficult to reengineer
• Increased organizational mistrust
• Strategic Impacts:
• More difficult to set strategy
• More difficult to execute strategy
• Contribute to issues of data ownership
• Compromise ability to align organizations
*based on Redman (2002)
• Quality is not a new concept in information systems management and research
• What is ‘new’ is the explosion in the quantity of information and the increasing reliance of most segments of society on that information
• Challenges: defining and improving quality for a specific context
• Information systems researchers have attempted to define data quality, information quality, software quality, system quality, documentation quality, service quality, web quality and global information systems quality
• Quality information is information that meets specifications or requirements (Kahn & Strong, 1999)
• IQ is the characteristic of information to meet the functional, technical, cognitive, and aesthetic requirements of information producers, administrators, consumers, and experts (Eppler, 2003)
• Information of high IQ is fit for use by information consumers (Huang, Lee, & Wang, 1999, p. 43)
• IQ as a set of dimensions describing the quality of the information produced by the information system (DeLone & McLean, 1992)
• Quality of information can be defined as the difference between the required information (determined by a goal) and the obtained information (Gerkes, 1997)
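Gerkes' definition can be made concrete by listing the information items a goal requires and comparing them with what was actually received. A minimal sketch, assuming information needs can be enumerated as named items; the function `information_gap` and the example items are hypothetical:

```python
def information_gap(required: set[str], obtained: set[str]) -> dict[str, set[str]]:
    """Compare the information a goal requires with the information obtained."""
    return {
        # required by the goal but never obtained
        "missing": required - obtained,
        # obtained but not required by the goal (potential overload)
        "surplus": obtained - required,
        # overlap between what was needed and what arrived
        "matched": required & obtained,
    }


# Hypothetical information needs of a fire commander
required = {"incident location", "number of victims", "hazardous materials", "access route"}
obtained = {"incident location", "number of victims", "weather forecast"}

gap = information_gap(required, obtained)
print(sorted(gap["missing"]))  # ['access route', 'hazardous materials']
print(sorted(gap["surplus"]))  # ['weather forecast']
```

Treating needs as sets only captures presence or absence; it says nothing about whether the matched items are correct or timely, which the dimension lists that follow address.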
Information as a product
· usefulness
· comprehensibility
· relevancy
· completeness
· adequate representation
· coherence
· clarity
Information as a process
· trustworthiness
· accessibility
· objectivity
· credibility
· interactivity (feedback)
*Lesca & Lesca (1995)
Perspective: conceptual view
• Content: relevance, obtainability, clarity of definition
• Scope: comprehensiveness, essentialness
• Level of detail: attribute granularity, precision of domains
• Composition: naturalness, identifiability, homogeneity, minimum unnecessary redundancy
• View consistency: semantic consistency, structural consistency
• Reaction to change: robustness, flexibility
Perspective: values
• Accuracy, completeness, consistency, currency/cycle time
Category: intrinsic data quality
• Accuracy: The extent to which information represents the underlying reality.
• Objectivity: The extent to which information is unbiased, unprejudiced and impartial.
• Believability: The extent to which information is regarded as true and credible.
• Reputation: The extent to which information is highly regarded in terms of its source or content.
Category: accessibility data quality
• Accessibility: The extent to which information is available, or easily and quickly retrievable.
• Access security: The extent to which access to information is restricted appropriately to maintain its security.
*Strong, D. M., Lee, Y. W., & Wang, R. Y. (1997). Data Quality in Context. Communications of the ACM, 40(5), pp. 103-110.
Category: contextual data quality
• Relevancy: The extent to which information is applicable and helpful for the task at hand.
• Value added: The extent to which information is beneficial and provides advantages from its use.
• Timeliness: The extent to which information is sufficiently up to date for the task at hand.
• Completeness: The extent to which information is not missing and is of sufficient breadth and depth for the task at hand.
• Appropriate amount of data: The extent to which the volume of information is appropriate for the task at hand.
Category: representational data quality
• Interpretability: The extent to which information is in appropriate languages, symbols and units, and the definitions are clear.
• Concise representation: The extent to which information is compactly represented.
• Consistent representation: The extent to which information is presented in the same format.
• Understandability: The extent to which information is easily comprehended.
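Several of these dimensions can be approximated with simple ratios. The sketch below is one hypothetical operationalization, not Strong et al.'s own: completeness as the share of required fields that are filled in, and timeliness as a score that decays linearly from 1 to 0 as information ages towards an assumed maximum acceptable age for the task. Field names and times are invented.

```python
from datetime import datetime, timedelta


def completeness(record: dict, required_fields: list[str]) -> float:
    """Share of required fields that are present and non-empty (0.0 to 1.0)."""
    filled = sum(1 for field in required_fields if record.get(field) not in (None, ""))
    return filled / len(required_fields)


def timeliness(observed_at: datetime, now: datetime, max_age: timedelta) -> float:
    """1.0 for brand-new information, decaying linearly to 0.0 at max_age."""
    age = now - observed_at
    # timedelta / timedelta yields a float; clamp the score to [0, 1]
    return max(0.0, min(1.0, 1.0 - age / max_age))


# Hypothetical situation report on a fictitious incident
now = datetime(2010, 2, 11, 12, 0)
report = {"location": "Delft", "victims": 3, "hazmat": None}
observed_at = datetime(2010, 2, 11, 11, 45)

print(completeness(report, ["location", "victims", "hazmat"]))  # 2 of 3 fields filled
print(timeliness(observed_at, now, timedelta(hours=1)))         # 0.75
```

The choice of `max_age` is itself a contextual judgment: the same 15-minute-old report may be timely for a press officer and stale for a field commander.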
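Figure: pyramid from (raw) data via information and knowledge to intelligence. Volume is largest at the data level; the complexity of quality management increases towards intelligence.
• Intelligence: based on level of understanding & experience
• Knowledge: internalization over time (human processing, can be tacit)
• Information: processing (use of information systems)
• (Raw) data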
• Data is a discrete, unitary and indivisible element that conveys a single value. Data serves as the basis on which computation and reasoning are executed
• Information is an aggregate of one or more data elements with certain established relationships, and it has the ability to convey a single, meaningful message
• Knowledge is a large-scale selective combination or union of related pieces of information accumulated over a prolonged period of time, and it can be viewed as a discipline area
• Wisdom is the new subset of knowledge created when a person, having attained a sufficient level of understanding of a knowledge area, applies the deductive ability thus acquired
*Adapted from Liang (1994)
• “Perfect” IQ is difficult, if not impossible, to achieve
• but neither is it necessary!
• If users of the data feel that its quality, which can be described by such attributes as accuracy, completeness and timeliness, is sufficient for their needs, then, from their perspective at least, the quality of the information available to them is fine
• Hence we need a clear understanding of user processes and their information needs in a specific context
• Defined as: the quality of the information system (as producing system) and not of the information (as product) (DeLone & McLean, 1992)
• Also not a ‘new’ concept in information systems
• However, this concept has received less formal and coherent treatment than information quality
• Trend: information systems are becoming more than just single software applications
• SQ is also an antecedent of information system success
SQ dimension: example
• Accessibility: The 9/11 case shows that access to data across agency lines also needs to be improved to support interagency coordination (Comfort & Kapucu, 2006). “In some cases, needed information existed but was not accessible” (Dawes et al., 2004).
• Response time: As much of the information needed during the response is time-sensitive, a low response time is necessary (Board on Natural Disasters, 1999). In emergencies, time is of the essence: every moment of delay can significantly reduce an accident victim’s chances of survival (Horan & Schooley, 2007), underlining the need for low response times.
• Reliability: “…responding to disaster situations, where every second counts, requires reliable, dedicated equipment. Experience has shown that these systems are often the most unreliable during critical incidents when public demand overwhelms the systems” (National Research Council, 2007).
SQ dimension: example
• Interoperability: “…given the number of organizations that must come together to cope with a major disaster, the interoperability of communications and other IT systems is often cited as a major concern” (National Research Council, 2007).
• Integration: The need for integration intensifies as the number of organizations engaged in response operations increases and the range of problems they confront widens (Comfort & Kapucu, 2006).
• Flexibility: “A catastrophic incident has unique dimensions/characteristics requiring that response plans/strategies be flexible enough to effectively address emerging needs and requirements” (National Research Council, 2007).
• Examples of multi-actor environments include supply chains, value networks, traffic systems and crisis management networks
• In such systems, intra- and inter-organizational information flows need to be coordinated in order to achieve goals: high interdependency
• Information systems play a critical role in the coordination process
• Multiple echelons of coordination: strategic, tactical and operational
• Actors operate in a complex, dynamic and unpredictable task environment
• Chernobyl (1986)
• Hercules (1996)
• Enschede (2000)
• New York (2001)
• Singapore (2003)
• Tsunami (2004)
• Schiphol (2005)
• Delft (2008)
• …
Figure: strategic, tactical and operational echelons of coordination
(source: Public Administration Review 62, Special Issue (September), 98–107)
Figure: manual situation report generation
IQ dimension: example
• Completeness: In the response to the 2004 Tsunami, “mostly, the information is incomplete, yet conclusions must be drawn immediately” (Samarajiva, 2005). “During Katrina, the federal government lacked the timely, accurate, and relevant ground-truth information necessary to evaluate which critical infrastructures were damaged, inoperative, or both” (Townsend et al., 2006).
• Correctness: Firefighters rushing to the Schiphol Detention Complex received incorrect information about the open gates to the area and were delayed in finding the right gate (Van Vollenhoven et al., 2006).
• Relevancy: When police helicopters observed that one of the Twin Towers was going to collapse, they immediately requested that all police officers leave the building. Although this information was also relevant for firefighters and ambulance services, they never received it and, as a result, almost 400 of them died.
SQ dimension: example
• Accessibility: The 9/11 case shows that access to data across agency lines also needs to be improved to support interagency coordination (Comfort & Kapucu, 2006). “In some cases, needed information existed but was not accessible” (Dawes et al., 2004).
• Response time: If there had been a comprehensive plan to quickly communicate critical information to the emergency responders and area residents who needed it, the mixed messages from Federal, State, and local officials on the reentry into New Orleans could have been avoided (Townsend et al., 2006).
• Flexibility: “A catastrophic incident has unique characteristics requiring that response systems be flexible enough to effectively address emerging needs and requirements” (National Research Council, 2007). “The lack of such capacity at the regional level (incl. municipalities, counties, districts, nonprofit and private institutions), was evident in the effort to mobilize response to the 9/11 events” (Comfort & Kapucu, 2006).
1. Understand the stakeholders’ goals and information needs
2. Model the processes and information flows
3. Define clear IQ and SQ measurement instruments
4. Analyze hurdles for IQ and SQ (symptoms) on the various architectural layers (e.g., via observations and interviews)
5. Synthesize principles for assuring IQ and SQ
6. Implement and evaluate the principles (e.g., prototyping, gaming simulation)
7. Raise awareness through training: treat information as a product
8. Capture feedback and start over again (continuous process)
• Consumers/clients
• Process architects
• Database architects
• Data suppliers
• Application architects
• Communication trainers
• Programmers
• Managers (CIO, CTO, etc.)
• Auditors, etc.
Figure: process flow of the Master of Disaster Game for four roles: the Emergency Control Room (ECR), the Field Workers, the Commando Place Incident (CoPI) and the Municipal Crisis Center (MCC). All roles start by getting acquainted with and reading the material. Activities in the flow include going to stations, sending emergency messages (via PDA), reading messages and broadcasting them through DIOS, processing, receiving and exchanging information requests, replying by mail and storing information, going to the info point with information requests, reading emergency messages on a beamer or laptop, completing the situation report (SITRAP) in DIOS, information managers (IM) interpreting and reacting to DIOS output, and giving a press conference.
• Context dependent
• Multidimensional constructs
• Subjective: dependent on user judgment
• So, how do we measure IQ and SQ?
• Need for multiple instruments
• Questionnaires (paper or online)
• Observations
• Interviews
• Focus groups
• Gaming
IQ questionnaire items, rated on a 7-point scale (1 = strongly disagree, 4 = neutral, 7 = strongly agree):
• The information I received from others was timely (up-to-date).
• The information I received from others was correct (free of error).
• The information I received from others was complete (no missing pieces of information).
• Others provided me with too much information.
• The information I received from others was relevant (directly applicable to my decisions or actions).
• The information I received from others was consistent (did not contradict other information).
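Scoring these items requires care with the overload item: agreeing that others provided too much information signals poorer IQ, unlike the other five items. A minimal sketch, assuming that item is reverse-coded (8 minus the answer on the 1-7 scale) and that an unweighted mean is an acceptable summary; the item keys and answers are hypothetical:

```python
# Hypothetical keys for the six IQ items, in questionnaire order
IQ_ITEMS = ["timely", "correct", "complete", "overload", "relevant", "consistent"]
REVERSED = {"overload"}  # negatively worded: agreement means poorer IQ
SCALE_MAX = 7            # answers run from 1 (strongly disagree) to 7 (strongly agree)


def score_iq(answers: dict[str, int]) -> float:
    """Unweighted mean IQ score for one respondent; higher means better IQ."""
    scores = []
    for item in IQ_ITEMS:
        value = answers[item]
        if not 1 <= value <= SCALE_MAX:
            raise ValueError(f"{item}: {value} is outside the 1-{SCALE_MAX} scale")
        # Reverse-code negatively worded items so that 7 becomes 1, 6 becomes 2, ...
        scores.append(SCALE_MAX + 1 - value if item in REVERSED else value)
    return sum(scores) / len(scores)


answers = {"timely": 5, "correct": 6, "complete": 2,
           "overload": 6, "relevant": 5, "consistent": 4}
print(score_iq(answers))  # 4.0
```

Reporting the per-item scores alongside the mean is usually more informative, since a single average hides which dimension is the problem.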
SQ questionnaire items, rated on a 7-point scale (1 = strongly disagree, 4 = neutral, 7 = strongly agree):
• The information system immediately provided the information I requested.
• I was able to obtain all the information I needed using the information system.
• The information system provided me with relevant information.
• The information system provided me with contradicting information.
• The response time of the information system was too high (I had to wait too long for the information I requested).
• The information provided by the information system was in an easily understandable (uncomplicated) format.
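Because both questionnaires measure each construct with several items, a common check on such an instrument is internal consistency. The sketch below computes Cronbach's alpha, a standard psychometric statistic that the slides themselves do not mention; it assumes the negatively worded items (contradicting information, response time too high) have already been reverse-coded, and the responses are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(responses[0])                                  # number of items
    items = list(zip(*responses))                          # one column per item
    item_var = sum(variance(column) for column in items)   # sum of item variances
    total_var = variance([sum(row) for row in responses])  # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)


# Made-up answers of four respondents to three SQ items (already reverse-coded)
rows = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [2, 3, 2],
]
print(round(cronbach_alpha(rows), 3))  # 0.975
```

Values of roughly 0.7 or higher are conventionally read as acceptable consistency, although that threshold is a rule of thumb rather than a strict criterion.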
Typical hurdles per architecture layer
• Stakeholder: ownership, isolation from processes, individual processing capability (overload), context and subjective interpretation
• Network: fragmentation, the politics of information, incentives to share, security and privacy requirements
• Process: event uncertainty, ad-hoc and unprecedented process flows, changing tasks and information needs
• Data: multiple databases, large volumes, aggregation, integrating external and internal data, refining data into classified actionable ‘chunks’
• Technology: heterogeneity, silos, incompatible standards, user accessibility, interface to sources, retrieval, reliability (up-time)
• Sender or source based strategies
• e.g., rules and policies, data cleansing
• Receiver or destination based strategies
• e.g., filters, aggregation algorithms
• Mediation or network based strategies
• e.g., stewardship and “Information Orchestration”
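The receiver-based strategies (filters, aggregation algorithms) can be illustrated with a small sketch. It assumes a hypothetical `Message` record tagged with a topic and the information object it concerns; the receiver drops topics outside its role and keeps only the newest message per object. This is one simple aggregation rule chosen for illustration, not a method prescribed by the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str      # e.g. "hazmat", "victims" (hypothetical topic labels)
    subject: str    # the information object the message is about
    timestamp: int  # seconds since the start of the incident
    content: str


def receive(messages: list[Message], relevant_topics: set[str]) -> list[Message]:
    """Receiver-based strategy: filter on relevance, then aggregate per subject."""
    latest: dict[str, Message] = {}
    for msg in messages:
        if msg.topic not in relevant_topics:
            continue  # filter: drop information this receiver does not need
        current = latest.get(msg.subject)
        if current is None or msg.timestamp > current.timestamp:
            latest[msg.subject] = msg  # aggregate: keep only the newest per subject
    return sorted(latest.values(), key=lambda m: m.timestamp)


inbox = [
    Message("victims", "building A", 60, "3 victims"),
    Message("press", "statement", 90, "press conference at 14:00"),
    Message("victims", "building A", 120, "5 victims"),
    Message("hazmat", "tank 2", 100, "chlorine leak"),
]
for m in receive(inbox, {"victims", "hazmat"}):
    print(m.timestamp, m.content)
# 100 chlorine leak
# 120 5 victims
```

Keeping only the newest report favors timeliness over completeness; a receiver that needs the history of an object would aggregate differently.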
• data cleansing & normalization (Hernández & Stolfo, 1998)
• data tracking & statistical process control (Redman, 1996)
• data source calculus & algebra (Lee, Bressan, & Madnick, 1998)
• data stewardship (English, 1999)
• dimensional gap analysis (Kahn, Strong, & Wang, 2002)
• These approaches usually involve four steps:
1. Profiling and identification of DQ problems
2. Reviewing and characterizing expectations (business rules)
3. Instrument development and measurement
4. Solution proposition and implementation
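The first three of these steps lend themselves to automation. A minimal sketch, assuming records are plain dictionaries and business rules are Python predicates; the field names, rule names and records are hypothetical. Step 4 (proposing and implementing solutions) remains an organizational activity and is not shown.

```python
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], bool]


def profile(records: list[Record]) -> dict[str, int]:
    """Step 1: count missing or empty values per field to spot DQ problems."""
    fields = {field for record in records for field in record}
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in sorted(fields)}


def measure(records: list[Record], rules: dict[str, Rule]) -> dict[str, float]:
    """Steps 2-3: share of records that satisfy each business rule."""
    return {name: sum(1 for r in records if rule(r)) / len(records)
            for name, rule in rules.items()}


# Step 2: expectations expressed as (hypothetical) business rules
rules: dict[str, Rule] = {
    "valid_victims": lambda r: isinstance(r.get("victims"), int) and r["victims"] >= 0,
    "has_postcode": lambda r: bool(r.get("postcode")),
}

records = [
    {"location": "Delft", "victims": 2, "postcode": "2628"},
    {"location": "", "victims": -1, "postcode": "2611"},
    {"location": "Den Haag", "victims": None, "postcode": ""},
    {"location": "Rotterdam", "victims": 0, "postcode": "3011"},
]
print(profile(records))         # {'location': 1, 'postcode': 1, 'victims': 1}
print(measure(records, rules))  # {'valid_victims': 0.5, 'has_postcode': 0.75}
```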
• More/better hardware
• More/better software
• Reduce the number of nodes in the information flow
• Redundancy (reliability and robustness)
• Fewer forms and procedures in the information exchange process
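The reliability effect of reducing nodes and adding redundancy can be approximated with textbook availability arithmetic, under the simplifying (and in a disaster often optimistic) assumption that nodes fail independently. A serial information flow is only available when every node is, so availabilities multiply; a node with n redundant copies fails only when all copies fail, giving 1 - (1 - a)^n. The 0.99 node availability below is an assumed figure.

```python
from math import prod


def chain_availability(node_availabilities: list[float]) -> float:
    """All nodes in a serial information flow must be up: multiply availabilities."""
    return prod(node_availabilities)


def redundant_availability(a: float, copies: int) -> float:
    """A node replicated `copies` times is down only if every copy is down."""
    return 1 - (1 - a) ** copies


print(round(chain_availability([0.99] * 5), 4))   # 0.951
print(round(chain_availability([0.99] * 3), 4))   # 0.9703
print(round(redundant_availability(0.99, 2), 4))  # 0.9999
```

Under these assumptions, cutting the flow from five to three nodes raises availability from about 0.951 to 0.970, while duplicating a single 0.99 node yields 0.9999. Correlated failures, such as overload during a critical incident, break the independence assumption.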
• More databases and technologies incur higher costs and do not solve IQ and SQ problems coherently
• Assume a “static” data layer
• Do not address task environment dynamics and uncertainty
• Reactive: do not include strategies for sensing and adapting
• Need for proactive mechanisms to deal with dynamic information needs
Figure: information orchestration strategies and principles
• Before a disaster, advance structuring strategy: preemptive principles (e.g., IQ auditing) and protective principles (e.g., dependency diversification)
• During a disaster, dynamic adjustment strategy: exploitative principles (e.g., proactive sensing) and corrective principles (e.g., IQ rating)
• The framework further distinguishes offensive and defensive principles.
• Examples of preemptive principles
• Treat information as product not by-product
• Organize IQ audits on a regular basis
• Assign IQ roles and responsibilities across organizational units
• Examples of protective principles
• Maximize the number of sources for each information object
• Define several information access and manipulation levels
• Strive for loosely coupled application components
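Maximizing the number of sources for each information object only pays off if conflicting reports are reconciled. A simple majority vote is one possible way to do so (an assumption for illustration; the slides do not prescribe a reconciliation rule). The source names and values below are hypothetical.

```python
from collections import Counter


def corroborate(reports: dict[str, str]) -> tuple[str, float]:
    """Return the value most sources agree on and the share of sources agreeing.

    `reports` maps a (hypothetical) source name to the value it reports
    for one information object, e.g. the number of casualties.
    """
    if not reports:
        raise ValueError("no sources reported on this information object")
    counts = Counter(reports.values())
    value, votes = counts.most_common(1)[0]
    return value, votes / len(reports)


reports = {"police": "12", "fire brigade": "12", "ambulance": "9", "citizen tweet": "12"}
print(corroborate(reports))  # ('12', 0.75)
```

The agreement share can serve as a rough confidence indicator. With `Counter.most_common`, ties are broken in favor of the value encountered first, which a real design would need to handle explicitly, for instance by weighting sources by reputation.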
• Examples of exploitative principles
• Anticipate information needs prior to the occurrence of events
• Exploit multi-channel and technology convergence
• Scan the environment for complementary information
• Examples of corrective principles
• Maximize the number of feedback opportunities across the network
• Develop policies for ascertaining information needs, acquiring and managing information throughout its life cycle
• Encourage a sharing culture (data to information transformation by collective interpretation, discussion & expert analysis)
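Feedback and IQ rating can be combined: receivers rate the quality of the information they receive, and the ratings are aggregated per source so that poorly rated sources can be corrected. A minimal sketch, assuming a 1-5 rating scale and a threshold of 3.0 (both hypothetical choices), with role names from the gaming exercise used as example sources.

```python
from collections import defaultdict


def rate_sources(ratings: list[tuple[str, int]],
                 threshold: float = 3.0) -> dict[str, tuple[float, bool]]:
    """Aggregate receiver IQ ratings per source.

    `ratings` holds (source, rating) pairs given by receivers on a 1-5 scale.
    Returns, per source, the mean rating and whether it falls below `threshold`.
    """
    per_source: dict[str, list[int]] = defaultdict(list)
    for source, rating in ratings:
        per_source[source].append(rating)

    summary = {}
    for source, values in per_source.items():
        avg = sum(values) / len(values)
        summary[source] = (avg, avg < threshold)  # True = candidate for corrective action
    return summary


ratings = [("CoPI", 4), ("ECR", 2), ("CoPI", 5), ("ECR", 3), ("MCC", 4)]
print(rate_sources(ratings))
# {'CoPI': (4.5, False), 'ECR': (2.5, True), 'MCC': (4.0, False)}
```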
*Source: Lee et al. (2006). Journey to Data Quality
• security & accessibility: the more secure an information system is, the less convenient is its access
• timeliness & accuracy: the more current a piece of information has to be, the less time is available to check on its accuracy
• correctness or reliability and timeliness: the faster information has to be delivered to the end-user, the less time is available to check its reliability or correctness
• right amount of information (or scope) and comprehensibility: more detailed information can prevent a fast comprehension, because it becomes difficult “to see the big picture”
• conciseness & right amount (scope) of information: the more detail that is provided, the less concise a piece of information or document is going to be
*based on Eppler (2003)
• Flexibility versus robustness
• Accessibility versus security
• Security versus interoperability
• Reliability versus flexibility
• Availability versus cost
• Adaptability versus accountability
• Assuring high IQ and SQ is becoming more important and more problematic
• The hurdles for IQ and SQ are abundant and multi-level
• There is no single best (technical) solution for IQ problems: the solution space covers multiple architecture layers (e.g., organizational, process and technical layers)
• Assuring IQ and SQ is a continuous process and needs to be institutionalized/embodied in the organizational culture
• There are many information quality dimensions and not all are relevant: some tradeoffs need to be made