Data and information quality - Computing and Information Systems

Data and Information Quality: an Information-theoretic Perspective

Wei Hu and Junkang Feng

Decision making and the efficiency of business flow depend heavily on the quality of the information systems implemented. The evaluation of data quality (DQ) and information quality (IQ) has been treated as a challenging issue in the field of information systems management for the last twenty years. However, we observe that the definitions of DQ and IQ in the literature are not necessarily convincing, which seems to have hampered the development of a deep and sound understanding of the issue, and of practically applicable and effective measures and techniques for their evaluation. This might be caused by the seeming lack of research on the definitions of data quality and information quality from an information-theoretic perspective. Through our review of relevant information systems literature, we believe that a rigorous and theoretically sound foundation is highly desirable to provide an insight into specifying and distinguishing the terms ‘data quality’ and ‘information quality’. This paper presents a data-info quality model under an Information Source (S) – Information Bearer (B) – Information Receiver (R) framework based upon theories of semantic information, including Dretske’s semantic theory of information, Devlin’s ‘infon’ theory, Stamper’s Organizational Semiotics, and Floridi’s revised standard definition of information. We present a set of definitions, compare data quality with information quality, and outline the objective and subjective aspects involved in addressing this problem. This model forms a basis for our further research in data and information quality assessment.

1. Introduction

Information systems (IS) have played a key role in organizations for decision-making and efficient business flow for years.
Issues regarding the evaluation of data quality (DQ) and information quality (IQ) have been increasingly noticed and identified within the field of information systems management in recent years. Numerous research efforts have been made in this area from different disciplines and using different research approaches for the purpose of developing data and information quality concepts and methods (Ballou & Pazer, 85; Burgess et al., 04; Dedeke, 00; English, 99; Eppler, 01; Hill, 04; Lee et al., 02; Liu & Chi, 02; Price & Shanks, 04; Redman, 01; Wand & Wang, 96; Wang & Strong, 96; etc.). Hundreds of tools have been produced for evaluating quality in practice since 1996 (English, 99). Research and practice indicate that data or information quality should be defined accurately and that it encompasses multiple dimensions. Many data or information quality frameworks have been presented in the literature. They contain quality dimensions or categories, normally derived by some research method in a specific domain, with a set of quality metrics, criteria, components, items, or attributes. Eppler (01) gives five future directions for information quality research. The quest for a more generic framework and the development of frameworks that show interdependencies between different quality criteria are emphasised. It appears, however, that there is still a lack of theoretical underpinning for the exploration of interdependencies or inter-relationships among the quality indicators proposed. This makes it difficult for the professional who needs to decide on an appropriate framework, given a large set of criteria, for a task in hand within an organization. When investigating further, we observe that the terms ‘data quality’ and ‘information quality’ are considered synonyms by many if not all. They are usually used interchangeably in the relevant quality literature. The very concept of IQ is somewhat nebulous (Ballou et al., 03-04).
This makes the discussion of the aforementioned question difficult and ambiguous. It seems to us that the notions of DQ and IQ are yet to be defined adequately in a grounded way. In this paper, we wish to argue that a set of well-established theories, including Dretske’s semantic theory of information (Dretske, 81), Devlin’s ‘infon’ theory (Devlin, 91), Stamper’s Organizational Semiotics (Stamper, 97) and Floridi’s revised standard definition of information (Floridi, 05), would provide a novel insight into investigating DQ and IQ and shed light on the interdependencies among different quality indicators. The specific aim of this paper is to explore how this problem might be approached from another perspective, namely an information-theoretic perspective, so that further research might then be pursued to develop a quality framework with which to analyze and rearrange existing quality categories, or derive new ones, for quality assessment in practice.

This paper is organized as follows. In Section 2, we review existing studies of DQ and IQ and the limitations that we particularly notice in them. In Section 3, we present the basic notions of the theories referenced in this paper, and then introduce an information-centric framework for information systems and information flow. In Section 4, we propose a data-info quality model for understanding ‘data’, ‘data quality’, ‘information quality’ and their inter-relationships. We then use this model to analyze quality categories from some existing approaches. Finally, in Section 5 we give conclusions and indicate future work.

2. Literature Review

Existing studies have reached a consensus that DQ and IQ are multi-dimensional concepts. Research efforts have been made to derive quality indicators for the development of different quality frameworks.
Wang and some other researchers (Lee et al., 02; Wang & Strong, 96; etc.), following the methods developed in marketing research for determining the quality characteristics of products, present a framework of information quality (IQ) from the information consumer’s perspective. They group all of their IQ dimensions into four IQ categories: Intrinsic IQ, Contextual IQ, Representational IQ, and Accessibility IQ. English gives three reasons for measuring information quality and two definitions of IQ (English, 99). One is its inherent quality, and the other is its pragmatic quality. His approach to quality includes three components, namely data definition quality, data content quality, and data presentation quality. DeLone and McLean’s review of the MIS literature during the 1980s reports twenty-three IQ measures from nine previous studies (DeLone & McLean, 92; DeLone & McLean, 03). DeLone and McLean say: “understandably, most measures of information quality are from the perspective of the user of this information and are thus fairly subjective in character” (DeLone & McLean, 92). Furthermore, we find that existing approaches to DQ and IQ are classified differently from different perspectives. We illustrate them in Table 1. In addition, Eppler (01) reviews twenty information quality frameworks appearing in the literature from 1989 to 1999 in sixteen different application contexts. Many of the approaches to quality problems in his findings, however, are proposed from a management, manufacturing, or technology perspective. He claims that the majority of the frameworks he studied are context-specific rather than generic and widely applicable. He evaluates the frameworks along two dimensions: analytic criteria and pragmatic criteria.

Perspective | Classifications
Research Approaches (Price & Shanks, 04) | Empirical research; Practitioner-based approach; Theoretical approach; Literature-based approach; Integrated approach
Communities (Lee et al., 02) | Academics’ view; Practitioners’ view
Subject Domains (Burgess et al., 04) | Software Quality; Data Quality; Information Quality; Web Quality

Table 1: Classifications of existing approaches to DQ/IQ

Through reviewing the literature, it seems to us that there is a lack of overarching theoretical perspectives or approaches for classifying existing quality frameworks with respect to the quality indicators they deliver. Fundamental questions still remain as to how quality should be defined and which specific criteria should be used to evaluate information quality (Price & Shanks, 04). As mentioned in Section 1, we argue that defining and distinguishing DQ and IQ should be addressed as a priority. However, Price and Shanks (04) indicate that “due to the lack of agreement on the precise definition of information in the literature, we choose to restrict our usage of the term information to informal discussion and avoid its use in formal definitions”. It is difficult to achieve agreement on the definitions of the terms Data and Information. To this end, we attempt to use an information-theoretic perspective to seek a solution and provide a fresh insight, as it would seem necessary to construct a formal and theoretically sound quality framework under which quality criteria and categories can be derived.

Existing studies normally consider data or information as a type of product or output of an information system and use the analogy between data and products to develop measurement models of DQ and IQ (Kahn, 97; Lee et al., 02; Price & Shanks, 04; etc.). In the literature, the definitions of data quality and information quality are distinguished depending on whether information is considered to be a product or a service.
However, the analogical approach is still limited because data are after all different from products (Liu & Chi, 02).

Theoretical approaches do appear in the literature. Wand and Wang derive quality definitions by anchoring them in ontological foundations, based on the notion that the role of an information system is to provide a representation of an application domain as perceived by the user. For the information system to function properly, both the representation and interpretation transformations, involved in the development and use of an information system, need to be performed flawlessly (Wand & Wang, 96). This results in a set of four intrinsic data quality dimensions: complete, unambiguous, meaningful, and correct. A semiotic information quality framework (Price & Shanks, 04) defines information quality and corresponding quality categories in terms of three semiotic levels, namely syntactic, semantic, and pragmatic, defined by Morris (38), and in terms of the definitions of data, information and meaning given by Mingers (95). Hill (04) proposes an information-theoretic model based upon Shannon & Weaver’s information theory for the purpose of considering customer information quality in an organization. It provides a quantitative assessment of proposed information quality improvements.

However, there seems to be a lack of knowledge of, and of attempts at, using an information-theoretic perspective to investigate both the terms ‘data’ and ‘information’ and DQ and IQ. For example, Wand and Wang (96) derived four DQ attributes, which are only a small sample of the attributes used in assessing intrinsic DQ. This might be due to the lack of an understanding of the subjective and objective nature of the domain.

3. An Information-theoretic approach to quality

Through reviewing the literature, we believe that an information-theoretic underpinning for the terms ‘data’, ‘information’, DQ, and IQ should shed light on the quest for a generic quality model for the purpose of exploring interdependencies or interrelationships among the quality indicators proposed in various quality frameworks. In this paper, we present an overall model for such a purpose that is based upon a set of well-established theories.

Theories of Semantic Information

Information is still an ‘explicandum’ (Floridi, 05) in the academic community today. Numerous attempts have been made to define it, but many of them are ‘merry-go-round’ definitions (Stamper, 97). Shannon and Weaver’s paper (49) over half a century ago gives a mathematical model of communication, in which they use probability to define the amount of information that is caused by ‘reduction in uncertainty’. This covers only the engineering aspect of information creation and transmission. Dretske (81) makes a profound paradigm shift from the engineering aspect to the semantic aspect of information. We take Dretske’s account of the relationship between information and knowledge to be an important insight, which we intend to use as a way of incorporating epistemological considerations into the theory of information. Following Dretske, information will be taken as created by or associated with a state of affairs among a set of possibilities of a situation, the occurrence or realization of which reduces the uncertainty of the situation. We focus on claims of the form ‘a’s being F carries the information that b is G’. From the point of view of semiotics, which has been used in developing a science for information systems, we say that one signal, a’s being F, carries information about a state of affairs, b is G.
Relevant to this, Dretske establishes the following definition of information content: Let k be prior knowledge about a specific information source. Then r being F carries the information that s is G if and only if the conditional probability of s being G, given that r is F (and k), is 1 (and less than 1 given k alone). Following the above definition, we proposed our first basic notion, called ‘data bears information’ (Hu & Feng, 02; Xu & Feng, 02), which is re-illustrated in Figure 1.

[Figure 1: Simplification on information level and data level. Labels recovered from the figure: Information Level (X, Y), Data Level (A, B), ‘bears’, ‘being simplified to’.]

The main point relevant to this paper that this diagram illustrates is that a representation/signal is considered to represent/carry part of the information existing in the real world. When the source of information, namely that part of the real world, is changed or simplified, a new representation/signal could be used to replace the old one. For example, in the database area, we could use entity-relationship (ER) schemas to design a conceptual representation for a university (a part of the real world). When the information requirements of the university’s information systems are modified, the representation used to bear the information source, namely the ER schemas in this case, would be rearranged accordingly.

Information can be transmitted. A state of affairs, say r1, is a particular case or an instantiation of a general situation, say r. The reduction in uncertainty at r due to the occurrence of r1 may be accounted for by one or more events, say s1, s2,...,sn, that occur at another general situation, say s. This gives rise to a special kind of relationship, an ‘informational relationship’ (Dretske, 81), between these two general situations r and s. An informational relationship captures a certain degree of dependency between a state of affairs r1 of a general situation r and what takes place in another general situation s.
This dependency can be demonstrated by the fact that r1’s appearance alters the distribution of probabilities of the various possibilities at s. The dependency is a type of regularity concerning different general situations, based upon nomic dependencies (Dretske, 81), logic, or norms, etc. in a social setting. Due to this relationship, information created at s is transmitted to r. We will call s the ‘information source’, and r ‘the bearer of information about s’. Moreover, a state of affairs r1 at r can be seen as a signal that carries information about s. A sign/signal carries information about states of affairs in the world, that is, what it signifies, even though the sign/signal may never actually be observed by anyone. Besides, if it is recorded, r1 becomes a piece of data. Thus data carry information. In general, data in a database system are a collection of recorded signals or events, which bear certain information about the source within a process of information transmission. Information is carried by a sign and is objective and in analogue form. Therefore we believe that it would be beneficial to look into problems regarding data and information from the perspectives adopted by various semantic information theories, which might help reach the root and reveal the essence of the problems.

In order to introduce a theoretically sound foundation for the notions of information and our data-info quality model, we start with the ontological assumption that information is objective. In the beginning there was information. The word came later (Dretske, 81). The existence of information is independent of its interpreters or receivers (agents). We notice that Floridi defines four types of data: primary data, metadata, operational data, and derivative data (Floridi, 05). He revises the ‘standard definition of information’ and adds a fourth condition to it. His work will be discussed further in Section 4.
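
Dretske’s definition of information content, stated earlier in this section, can be made concrete with a small numerical sketch. The code below is an illustration only and is not part of the model proposed in this paper: the joint distribution, the signal and state names, and the helper functions are all hypothetical, and the prior knowledge k is assumed to be already folded into the distribution.

```python
from fractions import Fraction

# Hypothetical joint distribution P(signal at r, state at s).
# Assumption: the receiver's prior knowledge k is already folded in.
JOINT = {
    ("r_is_F", "s_is_G"): Fraction(1, 4),
    ("r_is_H", "s_is_not_G"): Fraction(1, 4),
    ("r_is_unclear", "s_is_G"): Fraction(1, 4),
    ("r_is_unclear", "s_is_not_G"): Fraction(1, 4),
}


def prob_state(joint, state):
    """P(state | k): marginal probability of a state at the source s."""
    return sum((p for (_, s), p in joint.items() if s == state), Fraction(0))


def prob_state_given_signal(joint, state, signal):
    """P(state | signal, k): conditional probability given a signal at r."""
    p_signal = sum((p for (r, _), p in joint.items() if r == signal), Fraction(0))
    if p_signal == 0:
        raise ValueError(f"signal {signal!r} has zero probability")
    return joint.get((signal, state), Fraction(0)) / p_signal


def carries_information(joint, signal, state):
    """Dretske's condition: P(state | signal, k) = 1 and P(state | k) < 1."""
    return (prob_state_given_signal(joint, state, signal) == 1
            and prob_state(joint, state) < 1)


if __name__ == "__main__":
    # The clear signal settles the state at s; the unclear one does not.
    print(carries_information(JOINT, "r_is_F", "s_is_G"))        # True
    print(carries_information(JOINT, "r_is_unclear", "s_is_G"))  # False
```

In this sketch, the signal `r_is_F` raises the probability of `s_is_G` from 1/2 to 1 and so carries that information in Dretske’s sense, whereas `r_is_unclear` leaves the probability at 1/2 and carries no such information, although it is still a recordable piece of data.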

The S-B-R Framework

To facilitate further studies of information within the context of information systems, that is, to gain insight into and be able to explain various phenomena in human communication, information creation and transformation, and the development of information systems, an overarching framework seems highly desirable, even necessary. The various theories mentioned above, including semiotics, can be seen, among other things, to address the issue of information and information flow in different ways and to emphasize different aspects of it. We find that all of these may be incorporated within a single framework, which would help make sense of them and make good use of them in understanding information and information flow. We believe that such a framework should be formulated from the point of view of how information is created, carried and finally received. Therefore we have created a framework consisting of Information Source, Information Bearer and Information Receiver, and the links between them. We call this abstract model the ‘S-B-R Framework’ (illustrated in Figure 2).

We use a simple example to show how this framework might work. As illustrated in Figure 2, some information is created at an information source due to reduction in uncertainty: for example, the tree is 80 years old, rather than 40 years old or any of the many other possible ages. This information can be carried by an information bearer due to an informational relationship between the source and the bearer, which may be based upon some ‘nomic dependencies’ (Dretske, 81). An information bearer provides an opportunity for an information receiver, for example a human agent, to receive information about the information source.
By consulting an information bearer, an information receiver can acquire information (illustrated by the dotted line in Figure 2) if the receiver is aware of and attuned to some constraints (Devlin, 91), which formulate the dependency and therefore the informational relationship between the bearer and the source.

[Figure 2: S-B-R Framework. The Information Source S (e.g., the age of the tree when it was felled, the type of the tree, the animals that live in the vicinity, ...) provides information. The Information Bearer B (e.g., a tree stump) carries information and provides an opportunity for R to receive information about S. The Information Receiver R (e.g., a human being) accesses/interprets B to receive information about S, e.g., ‘Tree is 80 years old’.]

Information Source

Information must be created in the first place. Following Dretske, any situation may be regarded as a source of information as long as reduction in uncertainty takes place. It could be a Universe of Discourse, a particular situation (Devlin, 91), a relation, an event with uncertain outcomes, and so on. For example, the situation ‘choosing one from eight employees to do an unpleasant job’ can be an information source S. From the point of view of semiotics, an information source S can be seen as the ‘sign object’ (Falkenberg, 98) that conforms to the definition of ‘sign’ given by Charles Sanders Peirce. It is the thing that the sign alludes or refers to.

Information Bearer

Information flow requires, as a necessity, some representation of information, which we call the bearer. An Information Bearer can be a traffic light or signal, a physical sign or an IT system. Following Stamper (97), anything, say x, can function as a sign if it can stand for something else, say y, for the people in some community. Here, x is an information bearer for y. In addition, we maintain that the literal meaning, if any, of a bearer is independent of the information that it bears. It is only accidental that the former is (part) of the latter.
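
The employee-selection example given under Information Source can be quantified in the Shannon-style sense of reduction in uncertainty. As a worked illustration, and under the assumption (ours, not stated in the example) that each of the eight employees is equally likely to be chosen, the selection of one employee generates the following amount of information at S:

```latex
I(S) = \log_2 \frac{1}{p} = \log_2 8 = 3 \text{ bits}, \qquad p = \tfrac{1}{8}.
```

This figure measures only how much uncertainty is removed at the source; it says nothing about which employee was chosen, which is the semantic content that a bearer must carry for a receiver to obtain it.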
With our S-B-R framework, our ontological assumptions are that information may or may not be carried by a bearer; information can be conveyed only through a bearer; and information is independent of whether anyone receives it or can receive it. For example, if a book were written in ancient Chinese, we would consider that it carries certain information no matter whether we can read it or not. Considering the structure of a sign given by Peirce, we agree that the ‘representamen’, which is a thing serving as the ‘carrier’ of the sign, is independent of its meaning (Falkenberg, 98). For example, an entity in an Entity Relationship data schema might refer to something that has no semantic correspondence with the meaning of the name given to that entity.

Information Receiver

To be able to receive information carried by a bearer, following Devlin, we maintain that an information receiver must be aware of and actually invoke some relevant ‘constraints’ (Devlin, 91). Different receivers may receive different information from the same bearer. The users of an information system are information receivers. In a system integration environment, an agent or a mediator can be an information receiver, which may process information further.

4. A Data-info quality model

Information Quality is critical in organizations (Ballou et al., 03-04; DeLone & McLean, 03). Early research efforts in Data Quality at MIT led to the development of the Total Data Quality Management (TDQM) cycle: Define, Measure, Analyze, and Improve (Wang, 04). Tu & Wang worked on ER extensions at the attribute level via modeling the data quality of the original schemas (Tu & Wang, 93). Brodie (80) places the role of data quality within the life-cycle framework with an emphasis on database constraints.
We believe that data quality has a close relationship with the tasks of information systems design, and that information quality is interrelated with the data quality of an information system. In this section we put forward an observation, namely, that it might be helpful to go back to the basics of information systems development. A similar perspective has been utilized by Wand and Wang (96). In this paper, we use another perspective, namely an information-theoretic perspective, to look at information systems from the point of view of information flow from the source of information to the receiver of the information via some information bearer, for the purpose of forming a data-info quality model. This idea comes from a seemingly widely accepted opinion that an information system is designed to store data (including multi-media data) and provide information to information consumers. It is an ‘information-bearing’ medium for the purpose of serving business processes and performance within an organization. Furthermore, it appears that the literature lacks a practical, theoretically grounded, information-centric model with which to explore and analyze an inevitable phenomenon, namely information flow, in IS development and IS evaluation, in particular DQ evaluation and IQ evaluation. The motivation of our work is to make a contribution at the theoretical level through our model and to address the relevant issues mentioned in Section 1.

Definitions

Many definitions of the terms ‘data quality’ and ‘information quality’ have been proposed in the literature. Eppler lists seven definitions of information quality from reviewing existing literature on information quality published from 1989 to 1999 (Eppler, 01). It seems that many of them are defined from a management, manufacturing, or technology perspective. Some definitions of both terms are ambiguous and overlap. We wish to argue that this might be caused by the lack of a sound theoretical foundation.
The S-B-R framework described above might fill this gap by providing a fresh insight into the problem and helping to define ‘data’ and ‘information’ for studying ‘data quality’ and ‘information quality’. Drawing on the relevant literature regarding data quality and information quality, and under the S-B-R framework, we generalize a conceptual model for considering these two terms, as illustrated in Figure 3. We call it the ‘data-info quality model’. In the diagram, S normally contains three parts in the context of Information Systems Development: ‘original user requirements’, ‘user expectations’, and ‘organizational needs’. The latter two change due to the dynamic nature of organizational goals, business strategies and performance. In the middle of the diagram, B is an information system that is a carrier or a mediator of information source S. It can be an ERP system, a CRM system, and so on, at the core of which lies a data engine, such as a database or a data warehouse. R, the information receiver, receives information, which is part of S, by accessing and interpreting B.

Following the notion of ‘data bears information’ discussed in Section 3 and the objectives of data quality and information quality evaluation appearing in context (Wang et al., 95), we look at the Information Bearer (B) for assessing ‘data quality’. In other words, the assessment of data quality is a task of defining the quality of an information bearer. For assessing the ‘information quality’ of an information system, we examine the linkage between Information Source (S) and Information Bearer (B), and the linkage between Information Receiver (R) and Information Bearer (B). In other words, to assess information quality, we have to take the whole chain from S through B to R into consideration. We examine how well the information bearer represents the information source, and how well the information bearer supports the information receiver.
That is to say, we look at how good the bearer is at conveying information to the receiver, who would use perception and other cognitive means for this purpose. To enable such assessment, we present the information-theoretic definitions of data, information, data quality and information quality below.

[Figure 3: A Data-info Quality Model. S (original user requirements, user expectations, new organizational needs) is represented by B (database/data warehouse, including data values, structure, constraints, etc.), which is accessed by R (information consumer or machine). DQ is marked on B; IQ spans the links between S, B and R.]

Definition 1. Data is a set of values recorded in an information system, which are collected from the real world, generated from some pre-defined procedures, indicating the nature of stored values, or regarding usage of stored values themselves; or, a model for the purpose of organizing, constraining, and representing those values in an information system for its consumers.

We define data here in a broad sense to cover values and structures existing in an information system. Following Floridi, data of the first kind above can be of four types (namely primary data, metadata, operational data, and derivative data) according to their sources and purposes. The second kind has a direct impact on the organization of data of the first kind in terms of requirements.

Definition 2. Information, carried by non-empty, well-formed, meaningful, and truthful data (Floridi, 05), is a set of states of affairs, which are part of the real world and independent of its receivers.

We define information in an objective way following Dretske and Floridi. Floridi (05) revises the standard definition of information by adding a fourth condition, that information must be truthful. As explained by Floridi, ‘truthful’ is used here as synonymous with ‘true’, to mean ‘representing or conveying true contents about the referred situation or topic’.

Definition 3. Data Quality is the intrinsic quality of data (a type of information bearer) itself.
This definition reveals the objective characteristics of the task of evaluating the quality of data, such as representation, precision, etc. It is in conformity with the ‘syntactic quality criteria’ discussed by Price and Shanks (04), the ‘inherent information quality characteristics’ defined by English (99), and the ‘intrinsic’ and ‘contextual’ data quality categories proposed by Wang and Strong (96).

Definition 4. Information Quality is the degree to which the information is represented and to which the information can be perceived and accessed.

The term ‘information quality’ is defined from two directions in our data-info quality model. It is not a one-array concept; rather it is the degree of some relevant correspondence between the information source and the information bearer, and between the information bearer and the information receiver respectively. From a semiotic perspective, our work on this level is also in conformity with the ‘semantic quality criteria’ and the ‘pragmatic quality criteria’ reported by Price and Shanks (04), the ‘pragmatic information quality characteristics’ defined by English (99), and the ‘representational’ and ‘accessibility’ data quality categories proposed by Wang and Strong (96).

5. Data Quality vs. Information Quality

According to Floridi (05), non-empty, well-formed and meaningful data may be of poor quality. Data that are incorrect, imprecise or inaccurate are still data and they are often recoverable, but, if they are not truthful, they can only constitute misinformation, which is not information at all. Following Floridi and considering our data-info quality model, we believe that high data quality is a necessary condition for seeking high information quality within an information system. It is not, however, a sufficient condition.
For example, a well-organized database using Chinese characters that has recorded accurate and timely stock information does not have high information quality if its users include non-Chinese speakers, even though the system has high data quality. As another example, suppose a decision-maker is provided with a stock report containing a set of complete, readable, and well-formatted data. He or she will not obtain any information if the data are untrue or too inaccurate to reflect the real situation. Therefore, high information quality should be based upon high data quality, and the data must be appropriately presented and accessible to the information consumer.

Based upon the above reasoning and our definitions of data quality and information quality, we can rearrange existing quality dimensions and criteria in the literature into a new framework, as shown in Table 2 and Table 3 respectively. The arrangement is intuitive, based upon our experience and the corresponding descriptions of the selected dimensions in the literature.

Source                | Data Quality             | Information Quality
Price and Shanks (04) | Syntactic quality        | Semantic quality, pragmatic quality
English (99)          | Inherent characteristics | Pragmatic characteristics
Wang et al. (96)      | Intrinsic, contextual    | Representational, accessibility
Dedeke (00)           | Ergonomic, accessibility | Representation

Table 2: Some existing quality dimensions rearranged within a data-info quality framework

Data Quality                                                  | Information Quality
Accuracy, format, timeliness, precision, amount of data, etc. | Relevancy, accessibility, usefulness, readability, completeness, consistency, reliability, importance, truthfulness, etc.

Table 3: Some existing quality criteria rearranged within a data-info quality framework

Interdependencies among quality dimensions and criteria can be further explored and studied from the point of view of the inter-relationship between data quality and information quality. The quality criteria for the former will clearly have an impact on the latter. Distinguishing information quality from data quality will help IS professionals and organizations derive required and appropriate quality criteria for the task in hand. Further analysis and validation of the aforementioned issues will be reported in our future publications.

Objectivity vs. Subjectivity

In the relevant literature, the notion of data or information quality depends on the actual use of data. Both are normally investigated from the viewpoint of information consumers. Wand and Wang (96) propose a design-oriented approach that defines data quality based upon a concept called ‘possible data deficiencies’ in a system context. Ballou and Pazer’s study focuses primarily on intrinsic dimensions that can be measured objectively (Ballou & Pazer, 85; Ballou & Pazer, 95). However, it would appear that the issue of subjectivity versus objectivity involved in data and information quality evaluation in information systems has hardly been addressed adequately. We believe that addressing this issue is important: not only can insight into the problem be gained, but it should also benefit the selection of research methods for developing a methodology for assessing data quality and information quality.

Our preliminary thinking about this philosophical issue is that it can be looked at from the ‘S-B-R’ perspective. In Figure 3, we have shown that information quality is concerned with two linkages: between S and B, and between B and R. The first linkage embodies the objective aspect of the problem, following our ontological assumption on information. It depends on design-oriented or system-oriented considerations. Therefore, theoretical techniques (e.g., SQL query design and schema transformation) and quantitative research methods will contribute to detecting and providing solutions to the problems.
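As a minimal illustration of the kind of quantitative check that could apply to the first (S-B) linkage, the following Python sketch compares what an information bearer records against the state of the information source. The function name representation_accuracy, the ticker symbols, and the price figures are all hypothetical and are not part of the model; the sketch also assumes that the true state of the source is available for comparison, which in practice may not be the case.

```python
def representation_accuracy(source_state: dict, bearer_records: dict) -> float:
    """Return the fraction of source facts that the bearer records correctly.

    A fact missing from the bearer, or recorded with a different value,
    counts as incorrectly represented.
    """
    if not source_state:
        raise ValueError("source_state must not be empty")
    correct = sum(
        1
        for key, value in source_state.items()
        if key in bearer_records and bearer_records[key] == value
    )
    return correct / len(source_state)


# Hypothetical closing prices in the real world (the information source S).
source = {"ACME": 12.5, "GLOBEX": 40.0, "INITECH": 7.25, "HOOLI": 99.0}

# What a database (the information bearer B) actually stores:
# one wrong value (GLOBEX) and one missing record (HOOLI).
bearer = {"ACME": 12.5, "GLOBEX": 41.0, "INITECH": 7.25}

score = representation_accuracy(source, bearer)
print(f"S-B representation accuracy: {score:.2f}")
```

A score of this kind says nothing about whether a given receiver can perceive or use the recorded data; it addresses only the source-bearer correspondence.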
The second linkage should be looked at within a social setting, and is therefore predominantly inter-subjective or subjective (Mingers, 95). For example, different groups of information consumers may have different qualifications and different knowledge backgrounds, and therefore may receive different information from accessing the same information bearer. Qualitative research methods may contribute to identifying problems, reaching conclusions and obtaining solutions. Much more work should be carried out along this avenue.

6. Summary and Future Work

In this paper, we have examined some fundamental issues concerning data and information quality evaluation from an information-theoretic perspective that is informed by a set of well-established theories. We have proposed a data-info quality model based upon an information-centric framework to provide a rigorous theoretical foundation for (1) defining and distinguishing the terms ‘data quality’ and ‘information quality’; (2) discussing the interrelationships between the two terms; and (3) studying the subjective and objective characteristics of data quality and information quality. A more generic framework for data and information quality and a set of quality categories and criteria with their interdependencies articulated will be reported in future publications.

This model is being validated through a two-stage survey. First, a series of interviews will be organized with selected organizations, enterprises, and institutions in the UK and China. The goal is to elaborate the model using a qualitative research method and to generate a data-info quality framework. Then, a questionnaire will be used to test the proposed quality framework in the real world and to categorize quality criteria.

References

Ballou, D. P. and Pazer, H. L., “Modeling Data and Process Quality in Multi-input, Multi-output Information Systems”, Management Science, 31(2), 1985, pp. 150-162.
Ballou, D. P. and Pazer, H. L., “Designing Information Systems to Optimize the Accuracy-Timeliness Tradeoff”, Information Systems Research, 6(1), 1995, pp. 51-72.
Ballou, D., Madnick, S., and Wang, R. Y., “Special Section: Assuring Information Quality”, Journal of Management Information Systems, Vol. 20, No. 3, Winter 2003-4, pp. 9-11.
Brodie, M. L., “Data quality in information systems”, Information and Management, vol. 3, 1980, pp. 245-258.
Burgess, M., Fiddian, N. J., and Gray, W., “Quality measures and the information consumer”, ICIQ 2004.
Dedeke, A., “A Conceptual Framework for Developing Quality Measures for Information Systems”, Proceedings of the 2000 Conference on Information Quality (IQ-2000), Cambridge, MA, USA, 2000, pp. 126-128.
DeLone, W. H. and McLean, E. R., “Information Systems Success: The Quest for the Dependent Variable”, Information Systems Research, Vol. 3, No. 1, March 1992, pp. 60-95.
DeLone, W. H. and McLean, E. R., “The DeLone and McLean model of information systems success: A ten-year update”, Journal of Management Information Systems, 19(4), 2003, pp. 9-30.
Devlin, K., Logic and Information, Cambridge University Press, Cambridge, 1991.
Dretske, F. I., Knowledge and the Flow of Information, Basil Blackwell, Oxford, 1981.
English, L. P., Improving Data Warehouse and Business Information Quality, Wiley & Sons, New York, 1999.
Eppler, M. J., “The concept of information quality: an interdisciplinary evaluation of recent information quality frameworks”, Studies in Communication Sciences, 1, 2001, pp. 167-182.
Falkenberg, D. E., Hesse, W., Stamper, R., et al., A Framework of Information Systems Concepts: The FRISCO Report (web edition), IFIP, 1998.
Floridi, L., “Is Semantic Information Meaningful Data?”, Philosophy and Phenomenological Research, Vol. LXX, No. 2, March 2005.
Hill, G., “An information-theoretic model of customer information quality”, Proc. IFIP Int’l Conf. on Decision Support Systems, Italy, 2004.
Hu, W. and Feng, J., “Some considerations for a semantic analysis of conceptual data schemata”, in Systems Theory and Practice in the Knowledge Age (E. Ragsdell et al.), Kluwer Academic/Plenum Publishers, New York, 2002. ISBN 0-306-47247-3.
Kahn, B. K., Strong, D. M., and Wang, R. Y., “A Model for Delivering Quality Information as Product and Service”, in Conference on Information Quality, Cambridge, MA, 1997, pp. 80-94.
Lee, Y. W., Strong, D., Kahn, B., and Wang, R., “AIMQ: a methodology for information quality assessment”, Information and Management, 40(2), 2002, pp. 133-146.
Liu, L. and Chi, L., “Evolutional Data Quality: A Theory-specific View”, ICIQ 2002.
Mingers, J., “Information and meaning: foundations for an intersubjective account”, Information Systems Journal, 5, 1995, pp. 285-306.
Morris, C., “Foundations of the Theory of Signs”, in International Encyclopedia of Unified Science, vol. 1, University of Chicago Press, London, 1938.
Price, R. J. and Shanks, G. A., “A semiotic information quality framework”, in R. Meredith, G. Shanks, D. Arnott and S. Carlsson (eds.), Proceedings of the 2004 IFIP International Conference on Decision Support Systems (DSS2004): Decision Support in an Uncertain and Complex World, Prato, Italy, 1-3 July 2004, pp. 658-672.
Redman, T., Data Quality: The Field Guide, Digital Press, New Jersey, 2001.
Shannon, C. E. and Weaver, W., The Mathematical Theory of Communication, University of Illinois, Urbana, 1949.
Stamper, R., “Organisational Semiotics”, in Information Systems: An Emerging Discipline?, Mingers, J. and Stowell, F. (eds.), The McGraw-Hill Companies, London, 1997.
Tu, S. Y. and Wang, R. Y., “Modeling Data Quality and Context Through Extension of the ER Model”, Massachusetts Institute of Technology (MIT) Sloan School of Management, Cambridge, MA, TDQM93-13, 1993.
Wand, Y. and Wang, R. Y., “Anchoring Data Quality Dimensions in Ontological Foundations”, Communications of the ACM, 39(11), 1996, pp. 86-95.
Wang, R. Y., Kon, H. B., and Madnick, S. E., “Data quality requirements analysis and modeling”, Proc. Ninth Int’l Conf. on Data Engineering, Vienna, 1993, pp. 670-677.
Wang, R. Y., Storey, V. C., and Firth, C. P., “A Framework for Analysis of Data Quality Research”, IEEE Transactions on Knowledge and Data Engineering, Vol. 7, No. 4, 1995.
Wang, R. Y. and Strong, D. M., “Beyond Accuracy: What Data Quality Means to Data Consumers”, Journal of Management Information Systems, 12(4), 1996, pp. 5-34.
Wang, R. Y., “Data Quality: Theory in Practice”, EPA 23rd Annual Conference, April 2004.
Xu, H. and Feng, J., “Towards a Definition of the ‘Information Bearing Capability’ of a Conceptual Data Schema”, in Systems Theory and Practice in the Knowledge Age (E. Ragsdell et al.), Kluwer Academic/Plenum Publishers, New York, 2002. ISBN 0-306-47247-3.
