Management Science, Vol. 44, No. 4, April 1998, pp. 462–484
Publisher: Institute for Operations Research and the Management Sciences (INFORMS)

To cite this article: Ballou, D., R. Wang, H. Pazer, G. K. Tayi (1998) Modeling Information Manufacturing Systems to Determine Information Product Quality. Management Science 44(4):462–484. http://dx.doi.org/10.1287/mnsc.44.4.462
© 1998 INFORMS

Modeling Information Manufacturing Systems to Determine Information Product Quality
Donald Ballou • Richard Wang • Harold Pazer • Giri Kumar Tayi
Management Science and Information Systems, State University of New York at Albany, Albany, New York 12222
Total Data Quality Management (TDQM) Research Program, Room E53-320, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Management Science and Information Systems, State University of New York at Albany, Albany, New York 12222
Management Science and Information Systems, State University of New York at Albany, Albany, New York 12222

Many of the concepts and procedures of product quality control can be applied to the problem of producing better quality information outputs. From this perspective, information outputs can be viewed as information products, and many information systems can be modeled as information manufacturing systems. The use of information products is becoming increasingly prevalent both within and across organizational boundaries. This paper presents a set of ideas, concepts, models, and procedures appropriate to information manufacturing systems that can be used to determine the quality of information products delivered, or transferred, to information customers. These systems produce information products on a regular or as-requested basis. The model systematically tracks relevant attributes of the information product such as timeliness, accuracy, and cost. This is facilitated through an information manufacturing analysis matrix that relates data units and various system components. Measures of these attributes can then be used to analyze potential improvements to the information manufacturing system under consideration. An illustrative example is given to demonstrate the various features of the information manufacturing system and show how it can be used to analyze and improve the system.
Following that is an actual application, which, although not as involved as the illustrative example, does demonstrate the applicability of the model and its associated concepts and procedures.

(Data Quality; Timeliness of Information; Information Product; Information Systems; Critical Path)

1. Introduction
Product quality in manufacturing systems has become increasingly important. The current emphasis on Total Quality Management (TQM) is a manifestation of this trend. Although increasing competition has heightened attention to quality, quality control in manufacturing systems has a long tradition (Shewhart 1931, Deming 1986, Feigenbaum 1991). Quality-driven organizations continually strive to improve their products in a variety of ways. Some changes are major, others minor, but taken together over an extended period of time such changes can yield profound improvements in the product's overall quality. As in manufacturing systems, information quality in computer-based systems is becoming increasingly critical to many organizations. The current efforts toward information highways and networked organizations underscore the importance of information quality. Organizations are relying more on the quality of the raw data and the correctness of processing activities that ultimately determine the information outputs. They would obviously prefer that their information outputs be of the highest possible quality. As with product manufacturing, however, cost must be taken into consideration.
A workable goal, then, is to achieve the highest possible information quality at a reasonable cost.

1.1. Information Manufacturing Systems
Many of the concepts and procedures of product quality control can be applied to the problem of producing better quality information outputs. Use of the term information manufacturing encourages researchers and practitioners alike to seek cross-disciplinary analogies that can facilitate the transfer of knowledge from the field of product quality to the less well-developed field of information quality. We use the term information manufacturing advisedly. For the purposes of this research, we refer to information manufacturing as the process that transforms a set of data units into information products. In addition, we refer to information manufacturing systems as information systems that produce predefined information products. We use the term information product to emphasize the fact that the information output has value and is transferred to the customer, who can be external or internal. The systems we model have an analogy in manufacturing known as made-to-stock. Made-to-stock items are typically inventoried or can be assembled upon demand. Requests for such products can be readily satisfied because the materials, procedures, and processes needed for their manufacture are known in advance. In the realm of information systems, an example would be a request by a client to his or her financial advisor for a portfolio risk analysis. Although this would be requested on an ad hoc basis, the data and programs needed to perform the analysis would be in place, ready to be used. In our context, a predefined data unit could be, for example, a number, a record, a file, a spreadsheet, or a report. A predefined processing activity could be an arithmetical operation over a set of primitive data units or an operation such as sorting a file. An information product could be a sorted file or a corrected mailing list.
This information product, in turn, can be a predefined data unit in another information manufacturing system. Viewing information systems in the light of what is known about producing high quality manufactured goods can be very useful. An example of the potentially fruitful cross-pollination between manufacturing and information systems is the concept of critical path. Identifying the critical path is, of course, a standard activity in manufacturing. As will be shown in the illustrative example, if one wishes to produce a certain information output sooner, one should first concentrate on those activities on the critical path. Although much can be gained by incorporating concepts and techniques from product manufacturing into the realm of information manufacturing, the analogies between the two fields have important limitations. These limitations arise from the nature of the raw material used in information manufacturing, namely the original or raw input data. A significant feature of data is that, although it is used, it does not get consumed. One might think of a file or a data base as analogous to in-process inventory. Yet such inventory gets depleted, whereas stored data can be reused indefinitely. In a sense, a database is more analogous to a tool crib than to inventory. With a tool crib, tools are used and then returned; they are not consumed. However, even this analogy heightens the differences between the two kinds of manufacturing. Tools are used to produce the manufactured product and are not incorporated into the product as is the case with data from a data base. A related issue is that producing multiple copies of an information product is inexpensive, almost trivial when compared to manufactured products. We consider four attributes of information products in this paper: timeliness, data quality, cost, and value.
The term "data quality" is used in a generic sense; that is, "data quality" is a placeholder for whatever dimensions of data quality are relevant. If one is interested solely in the data's completeness, then one would replace the term "data quality" wherever it appears in this work with the word completeness. It should be noted that we use the term data quality for intermediate data products (those that experience additional processing) and reserve the terms information quality and information product for the final product that is delivered to the customer. Timeliness is usually considered a dimension of data quality; see, for example, Ballou and Pazer (1985). The need to treat timeliness separately can be best understood by considering one of the ultimate goals of this research: to permit changes to the information manufacturing system, ranging from fine tuning to reengineering, in the context of customer concerns regarding the information products. Many information products are time-sensitive, and thus any efforts directed toward improving these products must explicitly factor timeliness into the analysis.

1.2. Purpose and Scope of Paper
In this paper, we present a set of ideas, concepts, models, and procedures that form some of the building blocks upon which a theoretical foundation for information manufacturing systems can be established. Based on these building blocks, we present a methodology for determining information product attribute values. In addition, we illustrate how these building blocks can be used to study options in improving the information manufacturing system. In our context, information products can be produced on a regular basis, for example standardized regular billing such as monthly credit card statements.
In some cases there are few quality problems with an information product. However, timeliness and quality may be conflicting goals. For example, the sooner credit card bills are delivered to customers, the sooner the issuing company would be reimbursed. Also, the card issuer would be able to identify nonpayment problems sooner. Speeding up the production of the monthly statement, however, could compromise quality. Our work is designed to provide tools for analyzing how changing the information manufacturing system would affect tradeoffs such as this one. As is also the case with traditional product manufacturing systems, there is a partial separation between the strategic issues of product mix and pricing and the managerial issues related to the efficient manufacture of the desired products. In both cases, those designing the requisite manufacturing systems can make important contributions to these strategic decisions by determining the economic and technical feasibility of possible variants of the product mix. While many of the strategic issues relating to product mix and pricing extend well beyond the domain of production planning, an important contribution of the production sector is in transforming product specifications into the desired components of the product mix. The concepts, techniques, and procedures presented in this paper permit the designers to assess the impact of various information manufacturing scenarios on timeliness, quality, and cost attributes of the information product. If necessary, modifications to individual products and/or the product mix can be made in light of such an assessment. Thus, this paper does not explicitly address the significant issue as to whether the information products are appropriate although it does facilitate analysis of revamped systems that produce different, presumably more appropriate, information products.
It is not explicitly concerned with issues such as what kinds of data to use or what kinds of processing are required, but it does allow the designer to test out various alternatives. Also excluded in our present model are ad hoc queries. If such queries are requested frequently enough, they could be included in the analysis. However, if they are that well-defined, we have in some sense a made-to-stock situation.

1.3. Background and Related Research
Organizations are now better equipped than ever to develop systems that use raw data originating from a variety of sources. Unfortunately, most databases are not error free, and some contain a surprisingly large number of errors.¹ It has long been recognized that data problems can cause computer-based systems to perform poorly. The need to ensure data quality in computer systems has been addressed by both researchers and practitioners for some time. A growing body of literature has focused on data quality: what it is, how to achieve it, and the consequences arising when it is inadequate (Wang et al. 1995). The dimensions of data quality have been studied (Ballou and Pazer 1995, Wang and Strong 1996). A model for tracking errors through a system to determine their impact on the information outputs has been developed by Ballou and Pazer (1985). Procedures for achieving data quality have also been presented (Morey 1982, Ballou and Tayi 1989). Deficiencies in data that affect individuals' lives have also been formally examined by various researchers. Laudon (1986) determined that the records of many of those involved with the criminal justice system contain potentially damaging errors. The impact of errors in information on the likelihood of making correct decisions was analyzed by Ballou and Pazer (1990). Research efforts on data quality presented in the existing literature have addressed issues from the information systems perspective, but no general mechanism has been proposed to systematically track attributes of data. Our methodology allows for the systematic tracking of timeliness, quality, and cost. This capability can be used to analyze an information manufacturing system and, based on the analysis, to experiment with various options. The ideas, concepts, model, and procedures proposed in this paper would be useful in providing a common set of terms and thus supporting the building of a cumulative body of research in this domain. A major outcome of the work described in this paper is a model-based approach for studying information manufacturing systems. In the following section, we introduce the foundation of our model. This model incorporates the various components of the information manufacturing system and key system parameters, including timeliness and data quality, as well as value to the customer and cost of information products. In §3, we use the model to provide a methodology for analyzing the impact of system modifications on information product attributes. In §4, this methodology is exemplified through an illustrative example. Specifically, we focus on explaining the mechanics of the proposed methodology. Next, in §5, we present a real-life application, the Optiserve case, with the goal of demonstrating the methodology's usefulness and ease of implementation in improving an actual information manufacturing system. Toward this end, we highlight the modeling nuances needed to accommodate the realistic aspects of this case, and outline the methods for acquiring appropriate data. Concluding remarks are found in §6.

¹ "Databases are Plagued by Reign of Error," The Wall Street Journal, May 26, 1992.
2. Foundation of the Information Manufacturing Model
As previously stated, the term information manufacturing refers to a predefined set of data units which undergo predefined processing activities to produce information products for internal or external customers, or both. We postulate that each information product has an intrinsic value for a given customer, and we assume that the product's potential value to the customer may be diminished if it is untimely or of poor quality. The value of the information products can be improved by making appropriate changes to the information manufacturing system. The importance of doing this is attested by Hammer (1990). We seek to determine the key parameter values that will help to identify those changes to the system.

2.1. Modeling of Information Manufacturing Systems
To evaluate various system configurations, data units must be tracked through the various stages or steps of the information manufacturing process. Any of these steps has the potential to affect timeliness and data quality for better or worse. For example, introduction of additional quality control would enhance the quality of the data units but with a concomitant degradation in the timeliness measure. Also, improving a processing activity could result in higher levels of data quality and improved timeliness but increase the cost. The various components of the information manufacturing system are displayed in Figure 1.

Figure 1. Components of the Information Manufacturing System

The data vendor block represents the various sources of input raw data. Each vendor block can be thought of as a specialized processing block, one that does not have a predecessor block.
Thus, one vendor (internal or external) can potentially supply several different types of raw data. The role of the processing block is to add value by manipulating or combining appropriate data units. The data storage block models the placement of data units in files or data bases where they are available as needed for additional processing. The quality block enhances data quality so that the output stream has a higher quality level than the input stream. The customer block represents the output, or information product, of the information manufacturing system. It is used to explicitly model the customer, which is appropriate, as the ultimate judgment regarding the information products’ quality and timeliness is customer-based. We envision that the modeling of the information manufacturing system would take place at an appropriate level of detail. For example, the effect of a quality block could be modeled by specifying the fraction of apparently defective units entering and the fraction leaving. At a more detailed level, the quality block splits the incoming stream into apparently good and defective subsets. The apparently defective subset is examined and undergoes corrective action as appropriate. Depending on the nature of the defects identified, the apparently defective items could be split into additional subsets, each of which would undergo different, appropriate corrective action. Associated with each of these subsets are probability values giving the likelihood of Type I and Type II errors, which, together with knowledge of the original fraction of defectives, yields the fraction of apparently correct and defective units arriving at the next block. Information regarding how this applies in other contexts can be found in Ballou and Pazer (1982) and Morey (1982). For this paper we have chosen not to model at this level of detail. The nature of the activities performed by the quality control blocks is context-dependent. 
This is true even for the same data quality dimension. For example, suppose that an information product is dependent upon a form with blanks filled in by various parties. A quality control check in this case could be a scan of the form by a knowledgeable individual to identify missing information. Another type of completeness quality control could be a verification that all stores have reported their sales for the most recent period. An accuracy check could be a comparison of this period's and last period's results, with outliers flagged for verification. Figure 2 displays a simple information manufacturing system, but one which captures many of the potential components and interactions. This system will be used throughout the paper to illustrate concepts, components, and procedures developed for the information manufacturing model. In this system there are five primitive data units (DU1–DU5) supplied by three different vendors (VB1, VB2, VB3). There are three data units (DU6, DU8, DU10) that are formed by having passed through one of the three quality blocks (QB1–QB3). For example, DU6 represents the impact of QB1 on DU2. There are six processing blocks (PB1–PB6) and accordingly six data units that are the result or output of these processing blocks (DU7, DU9, DU11, DU12, DU13, DU14). There is one storage block (SB1) in Figure 2. The storage block is used both as a pass-through block (DU6 enters and exits SB1 intact and is passed on to PB3) and as the source for database processing (DU1 and DU8 are jointly processed by PB4). Note that the autonomy of the data units need not be preserved. A new data unit, DU11, that involves DU1 and DU8 is formed. The system has three customers (CB1–CB3), each of whom receives some subset of the information products. Also note that multiple copies of data can be produced.
For example, two copies of DU6 are produced and used subsequently by PB1 and PB3. Note that the placement of a quality block following a vendor block (similar to acceptance sampling) indicates that the data supplied by vendors is in general deficient with regard to data quality. For our illustrative example, the data unit DU2 has historically exhibited quality deficiencies, thus necessitating the quality block QB1.

Figure 2. An Illustrative Information Manufacturing System

This modeling is similar to the use of data flow diagrams (DFD). "Vendor" and "customer" blocks are analogous to "external entities," the "Process" block to "function," and the "Data Storage" block to "data store." We have deliberately chosen not to use this terminology and notation primarily to emphasize in our exposition the analogy with product manufacturing, the theme of this paper. Also, the concept of quality block does not have a direct analogue in the DFD technique. However, those wishing to use DFD techniques to model an information manufacturing process certainly could do so. This would take advantage of the knowledge of CASE tools held by many information systems professionals. As will be explained in greater depth in §3, the data units have associated with them vectors of characteristics or parameters whose components change as a result of passing through the various stages of the information manufacturing process. What constitutes a data unit is context-dependent. For example, if all fields for all records of a certain file possess the same timeliness and data quality characteristics, and if the entire contents of the file are processed in the same manner, then that file could be treated as a single data unit.
In contrast, if the fields within a record differ markedly in terms of their timeliness and data quality attributes, then it would be necessary to model them individually. By this we mean that each field of each record would be treated as a different data unit. Clearly in practice compromises would have to be made to avoid an inordinate quantity of data units, but in theory there is no limit regarding their number. The Optiserve case described in §5 illustrates how to convert an actual situation into an information manufacturing system model of the type displayed in Figure 2. That case examines the current system and provides a basis for reengineering of the system.

2.2. Measurement of Key System Parameters
In this section we present various formulas to measure timeliness, data quality, and value. To use the information manufacturing model, these factors must be quantified. A discussion preceding each formula identifies properties that any measure of the quality in question must possess. Some of these measures can be justified on the basis of previous research. Thus these formulas build upon the accumulated knowledge and can be applied in a wide range of situations. That being said, the precise expression for these measures is not critical for the information manufacturing model. If in a certain case or situation those responsible for implementation of the information manufacturing system feel that a different set of formulas would be more appropriate, then the analysis would proceed using their formulas instead of the ones used here.

2.2.1. Timeliness. The timeliness of a raw or primitive data unit is governed by two factors. The first,
currency, refers to the age of the primitive data units used to produce the information products. The second, volatility, refers to how long the item remains valid. The age of some data, that is, its currency, does not matter. The fact that George Washington was the first president of the United States remains true no matter when that fact entered the system. In contrast, currency matters in the case of a market free fall, when yesterday's stock quotes may be woefully out of date. The currency dimension is solely a characteristic of the capture of the data; in no sense is it an intrinsic property. The volatility of the data is, however, an intrinsic property unrelated to the data management process. (We may choose to manage volatile data in such a way that it is reasonably current, but such activities do not affect in any way the underlying volatility.)

2.2.1.1. Timeliness Measure for Primitive Data Units. The first step in developing a measure for timeliness of a primitive data unit is to quantify the currency and volatility aspects of timeliness. Both currency and volatility need to be measured in the same time units. It is natural to use time tags to indicate when the data item was obtained; see, for example, Wang, Kon, and Madnick (1993). This information is used to determine an appropriate currency measure. The currency measure is a function of several factors: when the information product is delivered to the customer (Delivery Time); when the data unit is obtained (Input Time); and how old the data unit is when received (Age). These factors can be combined to yield the following definition of currency.

Currency = (Delivery Time − Input Time) + Age.  (1)

Note that the term in parentheses represents how long the data have been in the system and the last factor represents the time difference between when the real-world event occurred and when the data was entered (Wand and Wang 1996).
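As a concrete illustration (our sketch, not part of the paper's exposition), Equation (1) can be computed directly once the three quantities are expressed in a common time unit; the helper below assumes hours as that unit.

```python
from datetime import datetime

def currency(delivery_time: datetime, input_time: datetime, age_hours: float) -> float:
    """Currency of a primitive data unit, per Equation (1):
    time spent in the system plus the age of the data when it was captured.
    All quantities are expressed in hours (any consistent unit would do)."""
    in_system = (delivery_time - input_time).total_seconds() / 3600.0
    return in_system + age_hours

# A data unit that was already 2 hours old at capture, entered at 09:00,
# and delivered in an information product at 17:00 has a currency of 10 hours.
c = currency(datetime(2024, 1, 5, 17, 0), datetime(2024, 1, 5, 9, 0), 2.0)
```

Note that currency depends on Delivery Time, so it cannot be evaluated until the information product is actually delivered.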
As will be shown in the Illustrative Example of §4, volatility is captured in a way analogous to the shelf life of a product. Perishable commodities such as food products are sold at the regular, full price only during specified periods of time. Degradation of the product during that time is not deemed to be serious. Similarly, suppliers of primitive or raw data units and/or data managers would determine the length of time during which the data in question remain valid. This number, which we refer to as shelf life, is our measure of volatility. The shelf life of highly volatile data such as stock quotes or currency conversion tables would be very short. On the other hand, the shelf life of data such as the name of the first president of the United States would be infinite. The shelf life would be determined by the data quality manager in consultation with the information product consumers and of necessity is product-dependent. If the information product is designed for customers who are long-term investors in the stock market, then quotes in today's paper regarding yesterday's close are more than adequate. If the product is for customers who are "in and out" traders, then the most recent trading price is appropriate. In the former case shelf life is in terms of one or more days. In the latter case it is minutes or even seconds. Our approach postulates that the timeliness of an information product is dependent upon when the information product is delivered to the customer. Thus timeliness cannot be known until delivery. The purpose of producing a timeliness measure is to have a metric that can be used to gauge the effectiveness of improving the information manufacturing system. For comparison purposes, it is important to have an absolute rather than a relative scale for timeliness. With this in mind we measure timeliness on a continuous scale from 0 to 1.
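Such a 0-to-1 measure can be sketched in a few lines (again our illustration, not the paper's). The helper below computes the currency-to-shelf-life ratio and raises the result to a sensitivity exponent s, which is the measure formalized below as Equation (2a); an infinite shelf life yields a timeliness of 1.

```python
def timeliness(currency: float, shelf_life: float, s: float = 1.0) -> float:
    """Timeliness of a primitive data unit on a 0-to-1 scale:
    {max(1 - currency/shelf_life, 0)}**s.
    currency and shelf_life must be in the same time units."""
    if shelf_life == float("inf"):
        return 1.0  # non-volatile data (e.g., the name of the first president)
    return max(1.0 - currency / shelf_life, 0.0) ** s

# Data 10 hours old with a 40-hour shelf life:
timeliness(10, 40)        # 0.75 with the neutral exponent s = 1
timeliness(10, 40, s=2)   # 0.5625: squaring penalizes aging more heavily
```

Data whose currency exceeds its shelf life receives a timeliness of 0 regardless of s.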
A value of 1 is appropriate for data that meet the most strict timeliness standard; a value of 0 for data that are unacceptable from the timeliness viewpoint. The currency or overall age of a primitive data unit is good or bad depending on the data unit's volatility (shelf life). A large value for currency is unimportant if the shelf life is infinite. On the other hand, a small value for currency can be deleterious to quality if the shelf life is very short. This suggests that timeliness is a function of the ratio of currency and volatility. This consideration in turn motivates the following timeliness measure for primitive data units.

Timeliness = {max[(1 − currency/volatility), 0]}^s  (2)

In §§4 and 5, volatility is measured in terms of shelf-life, thus

Timeliness = {max[(1 − currency/shelf-life), 0]}^s  (2a)

The exponent s is a parameter that allows us to control the sensitivity of timeliness to the currency-volatility ratio. Note that for high volatility (i.e., short shelf life) the ratio is large, whereas for low volatility (i.e., long shelf life) the ratio is small. Clearly having that ratio equal to or close to zero is desirable. As that ratio increases, is the timeliness affected relatively little (s = 0.5, say), a lot (s = 2, say), or neither (s = 1)? The appropriate value for s is context-dependent and of necessity involves judgment. In Equation (2), volatility and the exponent s are given as inputs to the model, while currency is computed, as will be presented in §3 and illustrated in §4.

2.2.1.2. Timeliness Measure for Output of Processing Blocks. Our goal is to attach a timeliness measure to each information output. Each such output is the result of certain processing and various inputs.
Each input in turn can be the result of other processing and inputs. Potentially each information output is dependent upon several stages of processing and a host of primitive data units. This convolution is addressed by considering one block at a time. First, we focus on those blocks that involve processing activities, both arithmetical and nonarithmetical. Quality and storage blocks are treated next. It is important to keep in mind that a timeliness value is computed and attached to each process output. Timeliness is actually measured only for primitive data units.

Arithmetical Operations. Even simple cases present problems. Suppose, for example, that output value y is the difference of input values x1 and x2, i.e., y = x1 − x2. Assume further that x1 has a very good measure for timeliness whereas x2 has a poor measure for timeliness. If x1 = 1000 and x2 = 10, then the timeliness value for y is very good. Conversely, should x1 = 10 and x2 = 1000, the timeliness value for y is poor. Clearly any composite timeliness value must take magnitudes into account. How the variables interact must also be accounted for. If, for example, x1 and x2 have the timeliness measures described above, and are of roughly equal magnitudes, then the outputs y1 = x1 + x2 and y2 = x1 ∗ x2 clearly differ in how the poor level of timeliness of x2 impacts the timeliness of the outputs. From the calculus we know that given a function y = f(x1, x2, . . . , xn), xi = xi(t), then

dy/dt = Σ_{i=1}^{n} (∂f/∂xi)(dxi/dt).

This expression captures how the dependent variable is affected by changes in time t. More importantly from our perspective, it accounts for the interaction among the independent variables. We, of course, are not concerned with rates of change of the variables with respect to time. Still the above can provide guidance regarding a timeliness measure for a computed output.
Ordinarily one would expect that if the timeliness value for each of the inputs were 1, then the timeliness value for the output would be excellent, undoubtedly equal to 1 also. Conversely, if all primitive data items possess a timeliness value of 0, one would expect the timeliness value for any resulting output of the processing blocks to be 0 as well. Considerations such as these motivate our definition for timeliness of the output of a processing block that involves arithmetical computations. Let T(xi) denote the timeliness measure for xi and let y = f(x1, x2, . . . , xn) be an information output. Then we propose the following to represent or measure the timeliness of y:

T(y) = Σ_{i=1}^{n} wi ∗ T(xi) / Σ_{i=1}^{n} wi, where wi = |∂f/∂xi| ∗ |xi|.  (3)

Equation (3) is a weighted average of the T(xi). It is assumed that each of the terms above is evaluated using those values that determine the output value for y. (If y = x1 − x2 and x1 = 1000, x2 = 10, then these values will be used as appropriate in Equation (3).) Note that if T(xi) = 0 for all i, then T(y) = 0, and if T(xi) = 1 for all i, then T(y) = 1. The dependence of the timeliness of y on the interactions of the xi is captured in a manner analogous to the chain rule of the calculus. Finally, the need to involve the magnitudes of the values is explicitly modeled. The absolute values ensure that the range 0 to 1 is preserved and that positive and negative values do not cancel each other. As indicated, if a different formula would be more appropriate, then the analysis would proceed in the same manner using that formula. This is demonstrated in the Illustrative Example. It is important to note that because of the currency component, timeliness measures cannot be stored. Rather they must be determined at the time the information product is delivered to the customer. Delivering the same information product to different customers at different times would result in different timeliness values for these customers.
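To make Equations (2a) and (3) concrete, the following is a minimal sketch in Python. The function and variable names are ours, and the partial derivatives are assumed to be supplied analytically by the caller; this is an illustration of the formulas, not an implementation from the paper.

```python
def timeliness(currency, shelf_life, s=1.0):
    """Equation (2a): {max[(1 - currency/shelf-life), 0]}**s."""
    return max(1.0 - currency / shelf_life, 0.0) ** s

def composite_timeliness(values, t_vals, partials):
    """Equation (3): T(y) = sum(w_i * T(x_i)) / sum(w_i), with
    weights w_i = |df/dx_i| * |x_i| evaluated at the actual inputs."""
    weights = [abs(p) * abs(x) for p, x in zip(partials, values)]
    return sum(w * t for w, t in zip(weights, t_vals)) / sum(weights)

# y = x1 - x2, so df/dx1 = 1 and df/dx2 = -1. With x1 = 1000 carrying
# perfect timeliness and x2 = 10 carrying none, the large, timely
# input dominates the composite measure, as the text argues.
t_y = composite_timeliness([1000, 10], [1.0, 0.0], [1, -1])
print(round(t_y, 3))  # 0.99
```

Note that swapping the magnitudes (x1 = 10, x2 = 1000) drives the composite timeliness toward the poor input's value, matching the discussion of Arithmetical Operations above.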
Nonarithmetical Operations. Data units can undergo processing that does not involve any arithmetical operations. For some types of data the processing does not change the timeliness value. For example, if the data unit is a file and the process is to sort the file, then the output data unit would have the same timeliness measure as the input data unit. Recall that the timeliness measure ultimately depends upon the volatility of the raw data and the time the customer receives the information product. Built into the latter value is the time for, say, sorting the file. Also, if the activity should be to extract a subset from a data unit, the resulting (subset) data unit would inherit the timeliness measure from the (superset) data unit. Another situation would be combining all or a portion of two or more data units. For example, suppose two data units (files) are merged. Then a natural timeliness measure for the resulting data unit would be some weighted average of the timeliness values for the original data units. This is consistent with the timeliness value for computed outputs. (Recall that Equation (3) is essentially a weighted average of the timeliness measures of the input data units.) The weights could reflect the size of the data units that are merged, their importance, or some combination of attributes. The example in §4 illustrates these concepts and methodology using equal weights for the inputs to processing blocks.

2.2.1.3. Quality and Storage Blocks. The timeliness measure for the output of a quality block is the same as that for the input data unit. Again this is so even though quality control consumes time, the justification being that all timeliness measures ultimately depend upon that point in time when the customer receives the product.
Thus time for one specific activity is already incorporated. If only some of the units pass through a quality control block, then the subsets would have to be modeled separately and different timeliness measures could result. Analogously for storage activity, the timeliness of a retrieved data unit is that of the stored data unit, and for combinations of data units weighting is appropriate.

2.2.2. Data Quality. It is also important to be able to assess, in an overall sense, the quality of the information products. These products, as discussed before, are manufactured in multiple stages of processing and are based on data that have various levels of quality. For our model we need to determine how each type of block affects the quality of the input stream. Some cases are straightforward. For example, the storage of data does not affect its quality. (This assumes there is no external event such as accidental erasure of data.) If the incoming data to a storage block has a certain level of incompleteness, then the outgoing data has the same level. For the vendor block it is necessary to know the quality of the primitive data units. Determining this precisely can be difficult and may require, for example, a statistical analysis similar to that used by Morey (1982). Alternatively, these values can be estimated by using some combination of judgment based on past experience and quality control procedures such as information audits. In any case, the quality estimations for the primitive data units are exogenous to the system being modeled. For the quality block typically the output data quality is better than the input data quality. The magnitude of the improvement must be determined by the analyst or furnished to that individual. As with timeliness, weighting or inheritance is appropriate for certain types of processing.
(Should the processing block function be to sort a file, then the value for quality out would be inherited from the value of quality in.) The least straightforward case is a processing block that involves arithmetical operations, which we now discuss. Let DQ(xi) denote a measure of the data quality of data unit xi. As stated above, estimating the values for the DQ(xi)s is an issue of concern only for the primitive data units. Suppose all the inputs to some stage are themselves outputs of other stages. Then the appropriate data quality measures have already been determined by applying at those previous stages the expression given below. As before, we use a scale from 0 to 1 as the domain for DQ(xi), with 1 representing data with no quality problems and 0 those with intolerable quality. If all data items should have a data quality measure equal to 1 and if all processing is correct, then the output quality measure should be 1 as well. Conversely, if the quality of all inputs is 0, then the quality of the output should be 0 as well. Given this reasoning, we form a weighted average of the DQ(xi) values for the data quality of the output. Let y be determined by data items x1, x2, . . . , xn, i.e., let y = f(x1, . . . , xn). Then the Data Component (DC), an estimate for the data quality of output y resulting solely from deficiencies in the input units, can be obtained from

DC = Σ_{i=1}^{n} wi ∗ DQ(xi) / Σ_{i=1}^{n} wi, where wi = |∂f/∂xi| ∗ |xi|.  (4)

Note that DC satisfies 0 ≤ DC ≤ 1; DC = 0 if, and only if, DQ(xi) = 0 for all i; DC = 1 if, and only if, DQ(xi) = 1 for all i. Once again, DC involves the magnitude of the input values and the interactions among the data. Formulas analogous to (4) were used by Ballou and Pazer (1985).
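A sketch of Equation (4) in Python, together with the square-root combination with processing effectiveness PE that the text introduces below as Equation (6). Names are ours, and the weight structure deliberately mirrors Equation (3).

```python
import math

def data_component(values, dq_vals, partials):
    """Equation (4): DC = sum(w_i * DQ(x_i)) / sum(w_i), with
    w_i = |df/dx_i| * |x_i| evaluated at the actual input values."""
    weights = [abs(p) * abs(x) for p, x in zip(partials, values)]
    return sum(w * q for w, q in zip(weights, dq_vals)) / sum(weights)

def output_quality(dc, pe):
    """One candidate form of Equation (5), namely Equation (6):
    DQ(y) = sqrt(DC * PE)."""
    return math.sqrt(dc * pe)

# y = x1 - x2 with x1 = 1000 (DQ 0.9) and x2 = 10 (DQ 0.5): the
# larger input dominates the data component, and imperfect
# processing (PE = 0.95) degrades the result further.
dc = data_component([1000, 10], [0.9, 0.5], [1, -1])
dq_y = output_quality(dc, pe=0.95)
```

The square-root form preserves the boundary properties discussed in the text: DQ(y) = 1 only when DC = PE = 1, DQ(y) = 0 when either factor is 0, and DQ(y) = DC whenever DC = PE.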
Although it has been implicitly assumed that the processing activities are computerized, this need not be the case. In most systems some of the processing activities, such as data entry, have manual components. Especially in this situation, and to a lesser degree with fully computerized systems, the processing itself can introduce errors. Let PE be a measure of processing effectiveness. If PE = 1, then the processing never introduces errors. If PE = 0, then the processing corrupts the output to such a degree that the data quality measure for that output should be 0. Thus, the output quality of y, DQ(y), is determined by both input data quality and processing effectiveness, i.e.,

DQ(y) = f(DC, PE).  (5)

There are various possibilities for this relationship. For example, one such functional relationship is

DQ(y) = √(DC ∗ PE).  (6)

Note that DQ(y) = 1 if, and only if, both DC and PE equal 1. Also DQ(y) = 0 if either DC or PE is 0. Also, if DC = PE should hold, then DQ(y) has the same value as DC and PE. The data quality of the data items changes, of course, as these values undergo a series of processing and quality control activities. The inputs for a given process may well be outputs from other processes. Thus whenever a data value undergoes processing or quality control, the resulting quality measure of the output needs to be recorded so that the information is available for determining the quality of any subsequent outputs. If the processing is complex, it may be necessary to substitute subjectively derived quality response functions for the calculus-based analysis. For example, this would be necessary if the processing block involves a linear program. In such cases, one could specify that the output quality is some function of the average input quality. This function could be determined by simulation.

2.2.3. Cost of Information Product.
To evaluate the effectiveness of improving the system, it is necessary to compare changes in value to the customer with changes in cost. However, costing in a multi-input, multi-output environment such as the one we model is difficult to do, and the approaches available are often controversial. Implications of costing for multiple-use systems have been recognized, and difficulties encountered in trying to predict and track costs have been considered in Kraemer, Dutton, and Northrop (1981). Nevertheless, because of its importance, there is a substantial body of literature dealing with the pricing of computer services; see, for example, Kriebel and Mikhail (1980). In our methodology we adopt a cost accumulation and pro rata allocation approach which, although ad hoc, facilitates the estimation of the information product's cost in a straightforward manner. As long as this costing approach is used consistently in evaluating all the possible options, it would not lead to erroneous decisions.

2.2.4. Value to the Customer. Ultimately, of course, the measure that counts is the value of the product to the consumer. This has been emphasized in both manufacturing and information systems environments. Our approach is to hypothesize an ideal product, one with 100% customer satisfaction. Any actual product would deviate from the ideal on several dimensions. Since our concern is with evaluating alternative system designs so as to improve either timeliness or data quality or both, it is natural in this context to limit consideration of the determinants of value to these dimensions. Thus for each customer C, the actual value VA is a function of the intrinsic value VI, the timeliness T, and the data quality DQ, i.e.,

VA = fc(VI, T, DQ).  (7)

Given the above mechanism for measuring an information product's timeliness and data quality, a functional form for VA could be

VA = VI ∗ (w ∗ DQ^a + (1 − w) ∗ T^b).  (8)
Here VI, w, a, and b are customer-dependent. The weight w is a number between 0 and 1 (inclusive) and captures the relative importance to the customer of product quality and product timeliness. For example, w = 1 implies timeliness is of no concern to the customer, whereas w = .5 implies that quality and timeliness are equally important. The exponents a and b reflect the customer's sensitivity to changes in DQ and T. Variants of Equation (8) have been utilized in several previous research efforts; see, for example, Ahituv (1980).

3. A Methodology for Determining Information Product Attributes

The concepts and methods described in the previous section form the basis for the methodology introduced in this section. The purpose of this methodology is to determine information product attribute values that can be used to suggest ways to improve the system. Timeliness, quality, and cost attribute values provide the producer with a means to assess the potential value of the information product for customers. As with product manufacturing, however, the producer should analyze the information manufacturing process with the goal of improving one or more of these values. Doing this may well result in degradation of the other attribute values but should in an overall sense enhance the value of the information product for the customer. The producer would have to determine whether the tradeoff is beneficial. To improve the timeliness value, for example, one needs to change the system. There are two ways to accomplish this. Modifying data gathering procedures so as to obtain data which are more current is one approach. The other is to modify the system so as to process the data more rapidly. In both cases a mechanism is needed that can be used to determine which approach would produce the largest improvement in timeliness in a cost-effective manner.
For this we present the Information Manufacturing Analysis Matrix. The matrix has one row for each of the data units, primitive and computed. With the exception of those blocks representing the data vendors, the matrix has one column for every block. Should a particular data unit pass through an activity block, then associated with the cell determined by the appropriate row and column is a five-component vector, the components of which are described below. The Information Manufacturing Analysis Matrix for the system displayed in Figure 2 is presented in Figure 4 as part of the Illustrative Example found in §4. The entries found in certain cells of the matrix shown in Figure 4 indicate that the data unit passes through that activity block. Note that there is a five-component vector of parameters associated with that cell. Recall that SB1 is used as a pass-through for data units DU1, DU6, and DU8. In order to determine appropriate modifications to the system, it is necessary to track time, data quality, and cost. The information needed for this is first described in general terms and then followed by a discussion of these parameters for each of the different types of activity blocks. Listed below are the five components of the vector of parameters.

p: This specifies the predecessor or originating block. For example, the predecessors of PB2 are QB2, the origin of DU8, and VB3, the origin of DU5.

t1: This represents the time when the data unit is available for the activity. For example, the value of t1 for the vector associated with (DU10, PB6) is that time when DU10 is ready for the processing block PB6. In the special case when p is a vendor block (VBi), then t1 = Input Time as given in Equation (1), the expression for currency.

t2: This is the time when the processing begins.
Processing cannot start until all data units needed for that block are available. Also, processing may begin at a scheduled time tsch. Thus t2 is the larger of max{t1's} and tsch.

DQI: This is the quality of the incoming data unit for a particular activity. It is, of course, the same as the data quality of the output of the predecessor block. As mentioned above, we use the term "data quality" as a placeholder for whatever dimension or dimensions of data quality are of concern to management (with the exception of timeliness).

CostI: This represents the cost of the incoming data unit. In essence, CostI is the pro-rated accumulated cost of all the previous activities that this data unit has undergone. (We assume that if a data unit is passed on to more than one activity, then CostI for each is determined in some pro rata fashion. This implies that total cost is preserved.)

We now examine in some detail the implications of the parameters for each of the activities. For this it is useful to use the notation DQO to refer to the output quality resulting from some process and CostO for the cost of the output. DQO is computed using the concepts and expressions given in §2, and set equal to DQI of successor blocks. Also, CostO is the sum of all input CostI plus the cost of the block.

Processing Block. For a processing block the interpretation of each of the parameters is straightforward. Information regarding the cost, processing time, and impact on quality for each processing block needs to be furnished to the analyst. Assuming arithmetical operations are involved, the quality of the output is computed using Equation (5). This is then the value of DQI for all blocks that receive that output.
The time when the processing is complete is used as the t1 value for all blocks that receive that output. If there is a delay in getting the data unit to the next activities, it may be necessary to include a delay parameter as a descriptor of a processing block. This concept also applies to quality and storage blocks.

Quality Block. A quality block can operate on only one data unit at a time. It may, however, process different data units at different times. It is necessary that the ts reflect this. DQI is simply, of course, the DQO of the previous process. Determination of the DQO of the quality block would require something akin to the statistical analysis described by Morey (1982). If t2 > t1 should hold, then time may be needlessly wasted at this step. A positive value for (t2 − t1) would reflect a scheduling delay or a queuing delay. If an entire file is being checked, we assume that the file is not ready to be passed on until all corrections and changes have been made.

Storage Block. The value for t2 is that time at which storing of the data unit commences. Assuming storage of data cannot affect quality, DQI = DQO. If a certain subsequent process should require a subset of some data unit (part of a relational table, for example), then that subset inherits the data quality properties of the data unit. A data unit is modeled for each subset, even if they should come from the same original data unit. Information regarding the cost and storage time also needs to be furnished to the analyst. Note that storage time is how long it takes to store the data unit. The data unit is available for subsequent processing any time after it is stored.
The amount of time between when the data unit is stored and when it is used does not affect the overall timeliness value of information products delivered to customers unless the storage block should lie on the critical path. In this case, timeliness is affected by the Delivery Time component of Equation (1).

Customer Block. For the customer block, t1 represents the time the product is finished, t2 the time it is received. For on-line delivery systems t1 = t2 could hold. For this activity DQI has the value of DQO of the final process that generated the product. CostO = CostI, assuming the cost of delivery is negligible. If delivery affects cost or quality, then the impact can be modeled as an additional processing block.

3.1. Timeliness, Quality and Cost of Information Products: Customer Perspective

The above structure provides the basis for making changes to the information manufacturing system by allowing one to quantify the parameters of timeliness, quality, and cost. We now discuss issues related to this in the context of the customer's needs. Determination of customer needs regarding these parameters, especially for external customers, can be made using market research techniques such as focus groups.

Timeliness. A value for the timeliness of an information product for a particular customer cannot be determined until the customer has received the information product. This value can be determined by first computing the timeliness values T(xi) for each of the primitive data units provided by the vendors using Equation (2). These values are then available as input to subsequent activities. As previously discussed, sometimes the timeliness value for an output from an activity block differs from the input values, sometimes they are the same. Whenever arithmetical processing is involved, Equation (3) would be invoked. Activities such as quality control affect timeliness via the Delivery Time component in Equation (1).
Note that the need to wait until the time when the information product is delivered to the customer necessitates a "second pass" through the system in order to compute the timeliness values (to be illustrated in §4). If timeliness is specified and measured in terms of a contracted delivery date, then the second pass through the system is not necessary, as the "delivery time" is prespecified. However, the timeliness analysis is still important as it may be possible to deliver the product sooner and hence gain a competitive advantage. The information product's timeliness measure is, of course, a number between 0 and 1. Whatever the number happens to be, its significance or interpretation is customer-dependent. Some customers might find the information product's timeliness to be satisfactory, others not. If it is determined that the product is not sufficiently timely, then this timeliness value serves as a benchmark to determine the impact on timeliness of various changes that could be made to the information manufacturing system. Using the framework described in this section, one can reengineer the system and recompute the timeliness, quality, and cost values. For example, one possible way to improve timeliness might be to eliminate a specific quality block, which could enhance the ultimate timeliness at the expense of quality.

Quality. The quality parameter is simpler in that the quality measure of the product delivered to the customer is that associated with the output of the final activity block. Again though, its meaning or significance is customer-dependent. Whatever the number happens to be, the customer might feel that the quality is totally satisfactory, completely unsatisfactory, or anything in between.
Again, the importance of the number is that it serves as a benchmark to gauge the magnitude of quality improvement resulting from making changes to the system. Note that different customers will perceive the quality differently. Some may feel the quality is fine whereas others may demand enhanced quality. In some sense the information producer needs to optimize total value across all products and all customers.

3.2. Cost and Value

The cost of the information product is of interest to the producer. The customer is concerned with value received and price paid, the latter being an issue beyond the scope of this paper. As discussed, both quality and timeliness influence value, as do many other factors which we deliberately have not modeled. To perform the analysis, some functional expression relating these quantities is required. Solely for the purposes of discussion we use Equation (8). To maximize the net value (total value minus total cost) received by all customers for all information products, one must obtain information from the customers regarding each product's intrinsic and actual value together with the customer's perception regarding timeliness and quality. Suppose there are M customers and N information products. Then for each customer i and product j, an expression of the form of Equation (8) applies, namely

VA(i, j) = VI(i, j) ∗ [w(i, j) ∗ (DQ(j))^a(i,j) + (1 − w(i, j)) ∗ (T(i, j))^b(i,j)].  (9)

Many of the VI(i, j) values would be zero. The double subscript on T is necessary since the same product could be delivered to different customers at different times. Only a single subscript for product quality is required, as the computed quality measure of a particular product is the same for all customers. The customer's sensitivity to improvements in data quality and timeliness can be handled via the exponents a(i, j) and b(i, j), respectively. The producer wishes to optimize net value.
Assuming appropriate and consistent units, the problem is given by

Maximize Σ_{i=1}^{M} Σ_{j=1}^{N} [VA(i, j) − C(i, j)]  (10)

subject to 0 ≤ T(i, j) ≤ 1 for 1 ≤ i ≤ M, and 0 ≤ DQ(j) ≤ 1 for 1 ≤ j ≤ N.

It should be kept in mind that this is a nonlinear optimization problem given the structure of VA as shown in Equation (9). Here C(i, j) represents the portion of the cost of product j assigned to customer i. It should be noted that this formulation is used in the context of evaluating a small set of possible alternative information manufacturing system configurations rather than for traditional optimization. We have presented a methodology for determining the timeliness, quality, and cost of information products. In the next section, an example is presented to illustrate some of the conceptual and computational issues that will be encountered when the information manufacturing model is applied to real-world scenarios. It also illustrates the business impact of possible changes to the system.

4. Illustrative Example

For continuity, the system depicted in Figure 2 and described in the previous section will be used.

[Figure 3: Data for Illustrative Example. (a) Descriptive Inputs Required for the Primitive Data Units; (b) Descriptive Inputs Required for the Processing Blocks; (c) Descriptive Inputs Required for the Quality Blocks]

Figure 3 presents the descriptive inputs required to compute the timeliness, quality, cost, and value characteristics of information products to be delivered to an array of customers. Figure 3(a) identifies the seven descriptive inputs required for each of the five primitive data units.
For example, DU2 is obtained from the first vendor at a cost of 10, is of intermediate quality, and is already 2 time units old when it enters the system at the beginning of the information manufacturing process. It is highly volatile with a shelf life of only 30 time units and a second-degree timeliness function (i.e., s = 2 in Equation (2a)). By contrast, only four descriptive inputs are required for each of the six processing blocks, as shown in Figure 3(b). For example, PB2 has a cost of 30 and requires 4 time units to complete. As noted in the previous section, when processing is complex, it may be necessary to substitute subjectively derived quality response functions for the calculus-based analysis. This is the process followed in this example. For PB2, output quality is equal to the square of the average (unweighted) quality of the two input data units (DU5 and DU8). The output of this block is available to the next block without delay. Each quality block also requires only four descriptive inputs, shown in Figure 3(c). QB2 has a cost of 40 and requires 8 time units to complete. Once again a subjectively derived quality output function is employed. The effect of this quality block is to eliminate 75% of the difference between the quality of the input flow and the best achievable quality (i.e., Qout = 1.0). This output is also available without delay.

[Figure 3(d): Descriptive Inputs Required for the Storage Block]

Figure 3(d) presents the three descriptive inputs required for the storage block. For simplicity only a fixed cost of 5 units per input is assigned. The storage time for SB1, 1 time unit, is the time spent to store a data unit and, as mentioned earlier, is unrelated to the time the data unit actually spends in storage. No additional delay is encountered.
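The two subjectively derived quality response functions just described can be sketched directly. This is a minimal illustration of the example's assumptions; the function names are ours.

```python
def pb2_output_quality(dq_inputs):
    """PB2's quality response: the square of the unweighted average
    quality of its two input data units (DU5 and DU8)."""
    return (sum(dq_inputs) / len(dq_inputs)) ** 2

def qb2_output_quality(dq_in, fraction=0.75, best=1.0):
    """QB2's effect: eliminate 75% of the difference between the
    input quality and the best achievable quality (Qout = 1.0)."""
    return dq_in + fraction * (best - dq_in)

print(round(pb2_output_quality([0.9, 0.9]), 2))  # 0.81
print(round(qb2_output_quality(0.8), 2))         # 0.95
```

Note that PB2's squaring degrades quality (any average below 1 shrinks when squared), while QB2 always moves quality toward 1; this is the quality-improvement role of quality blocks discussed in §2.2.2.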
The only additional requirement for analyzing the system is the specification of a cost accounting assumption relating to the allocation of input and processing costs across multiple outputs. The simplifying assumption of equal allocation is made for this example. We now proceed to explain the mechanics of applying the methodology, using the data found in Figure 3 to generate the Information Manufacturing Analysis Matrix presented in Figure 4. It may be instructive to view the column corresponding to PB6. We can observe that this block requires 10 time units and incurs a cost of 100. Its quality output function is of the second degree, and there is a delay of 1 time unit to deliver its information product to the final customers. This processing block requires three inputs, DU4, DU10, and DU11, which arrive from VB3, QB3, and PB4, respectively, at times 10, 31, and 20. Since processing will not begin until all inputs are available, processing starts at time = 31. The qualities and costs of the three inputs are (0.9, 30), (0.9373, 73.75), and (0.9153, 77.6). The output of this processing block is DU13 and is represented by a row in the Information Manufacturing Analysis Matrix. It can be seen from this matrix, as well as from Figure 2, that DU13 is a final information product which is provided to three customers. It is created at time = 41 by PB6 (31 + 10), but since there is a one-unit delay it is not received by the customers until time = 42. Since the average quality of the three inputs described above is 0.9176, and since the quality output function is of the second degree, the data quality of DU13 = 0.842. The cost of the three inputs, when added to the processing cost, yields a sum of 281.35. Since this is equally distributed over the three outputs, Ci = 93.78 for DU13. Figure 5 provides the information required to evaluate the products received by the customers.
The relevant cost and quality are obtained from the Information Manufacturing Analysis Matrix. As discussed in the previous section, determining the timeliness value requires a "second pass" through the system. For example, once it is determined that DU12 is delivered to the customer at time = 13, the currency value of its primary data input (DU2) can be determined by Equation (1) as 13 - 0 + 2 = 15. Therefore, the timeliness value of DU2 can be determined by Equation (2a) as (1 - 15/30)**2 = 0.25. Since there is only one input, this is also the timeliness value of the information product, DU12. In a similar manner, once the delivery times for DU13 and DU14 are determined to be 42 and 37, the timeliness values of the various primary inputs can be determined by Equation (2a). Starting with these and employing equal weights (for simplicity) at each point of convergence, the timeliness values for DU13 and DU14 are determined to be 0.35 and 0.46.

Figure 4 The Information Manufacturing Analysis Matrix for the Illustrative Example

Figure 5 Information Required to Evaluate the Information Products

The right-hand portion of Figure 5 presents customer-specific descriptive inputs required to determine the value of the information products to the three customers. Also, for simplicity, the linear version of Equation (8) was utilized for all customers.
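The "second pass" for DU12 is easily mirrored in code. This sketch assumes, consistently with the arithmetic above, that Equation (1) gives currency = delivery time - input time + age and Equation (2a) gives timeliness = (1 - currency/shelf life)^s; the function names are ours.

```python
def currency(delivery_time, input_time, age):
    # Equation (1): how stale the data are at the moment of delivery.
    return delivery_time - input_time + age

def timeliness(cur, shelf_life, s):
    # Equation (2a): 1.0 when perfectly current, 0.0 at the shelf life.
    return max(1.0 - cur / shelf_life, 0.0) ** s

# DU2 (input time 0, age 2, shelf life 30, s = 2), delivered in DU12 at 13.
c = currency(delivery_time=13, input_time=0, age=2)
print(c, timeliness(c, shelf_life=30, s=2))   # 15 0.25
```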
In this example, marketing research has determined that Customer 1 finds data quality and timeliness to be equally important. Timeliness is twice as important as data quality for Customer 2, while the reverse is true for Customer 3. This example also shows that the intrinsic value of the same information product can vary among customers.

Figure 6 presents the results of the Illustrative Example and highlights the "bottom line" of the entire analysis. The numbers in parentheses relate to a modified version and are discussed later in this section. Before the modification, in the aggregate, this information manufacturing process generates 64.69 in net value (the difference between the total value to the customers and the total cost to the firm). It should be noted that this is not the same as net profit, but it appears that, in the aggregate, it should be possible to negotiate prices that provide both a profit to the manufacturer and a net value to the customers. Another picture emerges, however, when net value is viewed in the disaggregate, by looking at individual customers. The value of DU13 to Customers 2 and 3 is less than the cost of production. If either of these customers should discontinue buying DU13, the consequences would be substantial, since revenues would decline but costs would remain unchanged (unless production of DU13 were terminated).

Since the purpose of this framework is to provide a vehicle for improving information manufacturing systems, the example will be extended in that direction. By inspecting Figure 6, it can be determined that quality is near the top of the scale (recall that 0.842 is a point on a zero-one quality scale and does not imply that only 84.2% of the output is correct). On the other hand, timeliness is rather poor for all information products. This suggests a quality-timeliness tradeoff, which could be achieved by the elimination of a time-consuming quality block.
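The customer weights above can be plugged into a linear value function. Equation (8) itself is not reproduced in this excerpt, so the form below, an intrinsic value VI scaled by a weighted sum of quality and timeliness, is our assumption for illustration only, as is the VI figure.

```python
def linear_value(vi, w_quality, quality, timeliness):
    """Hypothetical linear reading of Equation (8): w_quality weighs data
    quality, (1 - w_quality) weighs timeliness, scaled by the intrinsic
    value vi.  Assumed form, not the paper's stated equation."""
    return vi * (w_quality * quality + (1.0 - w_quality) * timeliness)

# Customer 1: equal weights; Customer 2: timeliness twice as important;
# Customer 3: quality twice as important.  VI = 100 is illustrative.
for w in (0.5, 1/3, 2/3):
    print(round(linear_value(100.0, w, quality=0.842, timeliness=0.35), 2))
```

The same product (quality 0.842, timeliness 0.35 for DU13) thus carries a different value for each customer, which is the disaggregate effect discussed above.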
Since DU13 is in a deficit position, it should also be a quality block that affects this output. QB3 is such a block; the justification for this is found in Figures 2, 3, 4, and 6. It is further observed from Figure 4 that QB3 is on the critical path for all information products except DU12. A side benefit of this quality-timeliness tradeoff is the avoidance of the 50-unit cost of QB3.

Figure 6 Evaluation of Information Products Before and (After) Improvement

The analysis is repeated for the modified version, and the results are shown in parentheses in Figure 6. Not only has the aggregate net total value more than doubled, but each information product now has a positive net value for each customer. Given the parameters of this example, the avoidance of QB3 seems an excellent first step in improving this system. For a note of caution, see Chengalur-Smith, Ballou, and Pazer (1992), where it was shown that such a step may be detrimental if there is considerable variability in the input material.

5. Application of the Information Manufacturing Model: The Case of Optiserve

In this section we apply the model described in this paper to a mission-critical information manufacturing system found in a major optical products company, which we will refer to as Optiserve (Lai 1993, Wang and Firth 1993). For expository purposes, we present the relevant portion of the case with a simplified system topology.
The current system is discussed in some detail, and a reengineered system designed to address some of the current deficiencies is briefly presented. Since the most difficult aspect of implementing the model is obtaining estimates for the various parameters needed by the model, we concentrate on this. The analysis, once the numbers are available, proceeds as demonstrated in the Illustrative Example.

5.1. Current System

Optiserve is a large optical products chain with 750 stores located throughout the country. It provides one-stop eye care in that each store provides eye exams, optical products, and fitting of the products. However, grinding of the lenses is carried out at four manufacturing laboratories. Our analysis focuses on the express orders, which are handled by the main laboratory. Optiserve strives to differentiate itself in several ways, one of the most important being customer service. This is a key factor in Optiserve's mission, which is to "create patients for life." At present, however, problems with data quality not only are costing the company over two million dollars annually but also are resulting in the loss of customers (Wang and Firth 1993).

Optiserve's current information manufacturing system is displayed in Figure 7. Several types of data, modeled as DU1, are obtained and entered onto the Spectacle Rx form, the output of interest in this case. At this stage, the Rx form has patient information (e.g., patient name and telephone number), the prescription (provided by the optometrist), information on the glasses themselves (e.g., frame number and cost, provided by the optician), and additional characteristics, such as the distance between the patient's pupils, provided by the optician. These forms are batched and entered by the optician (represented by PB1) into the store's PC whenever the optician has free time, often at the end of the day.
Roughly 80% of the data quality problems arise as a consequence of the above process. Normally twice a day the optician forwards the day's orders (DU2) to an IBM mainframe based at corporate headquarters. In a process represented by PB2, the IBM queries the item master file (SB1) to determine if the frame ordered is available (DU3). Updating of this file is represented by VB2. The Rx ticket (DU4) is then checked by the IBM for completeness and correctness (QB1). Assuming no problems, it then forwards the Rx ticket (DU5) to an HP computer, also based at headquarters. That computer accesses the frame characteristic file (SB2) to obtain information (DU6) regarding the physical characteristics of the frames ordered (size, shape, etc.) and uses that information to generate the specifications the laboratory will use to actually grind the lenses (PB3). In some cases this cannot be physically done. For example, if the lens is large, the blanks may not be thick enough to accommodate a high curvature. (The process of checking the output of PB3, namely DU7, is modeled by QB2.) Assuming no problems, the Spectacle Rx ticket (DU8), now complete with grinding instructions, is returned to the IBM (PB4), which routes the Rx ticket (DU9) to the main laboratory (CB1).

Figure 7 Current Information Manufacturing System

It is important to keep in mind that the above scenario captures an information system that supports a manufacturing system. The information product, which is manufactured on a regular, repetitive basis, is the completed Spectacle Rx ticket (DU9), which is delivered to one of the laboratories.
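The ticket's path through the current system can be encoded compactly. The sketch below is our representation (block and data unit labels are taken from the text; vendor, storage, and customer blocks are omitted, and the side inputs DU3 and DU6 appear only as consumed labels); walking it recovers the end-to-end flow of the Spectacle Rx data.

```python
# The current Optiserve flow: each block maps its consumed data units
# to the data unit it produces (labels from the text; encoding ours).
flow = {
    "PB1": {"in": ["DU1"], "out": "DU2"},         # optician batch-enters forms
    "PB2": {"in": ["DU2", "DU3"], "out": "DU4"},  # IBM checks frame availability
    "QB1": {"in": ["DU4"], "out": "DU5"},         # completeness/correctness check
    "PB3": {"in": ["DU5", "DU6"], "out": "DU7"},  # HP computes grinding specs
    "QB2": {"in": ["DU7"], "out": "DU8"},         # feasibility check of the specs
    "PB4": {"in": ["DU8"], "out": "DU9"},         # IBM routes ticket to the lab
}

# Follow the primary data unit from DU1 to the delivered ticket DU9.
unit = "DU1"
path = [unit]
while True:
    nxt = next((b["out"] for b in flow.values() if unit in b["in"]), None)
    if nxt is None:
        break
    path.append(nxt)
    unit = nxt
print(" -> ".join(path))   # DU1 -> DU2 -> DU4 -> DU5 -> DU7 -> DU8 -> DU9
```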
The customer for the information product is the laboratory, an internal customer (CB1). The fraction of Rx tickets that exhibit some data quality problem at some stage is not excessive, as 95% of the tickets are completely free of data quality problems. Of those with deficiencies, two-fifths relate to initial data gathering and entry, another two-fifths to frame-related errors, and one-fifth to the inability to match the frame chosen with the prescribed physical requirements for grinding. Errors in 2% of all Rx tickets are never detected until the patient receives the glasses. For the 3% of all Rx tickets where a problem is detected, the optician has to contact the customer, who usually has to come back into the store. As mentioned, this results in a nontrivial financial loss to the company. More importantly, it violates Optiserve's desire to differentiate itself by service and results in the permanent loss of customers.

5.2. Reengineered System

As indicated, the purpose of our model is to provide information regarding the impact of proposed changes to the information manufacturing system. The Optiserve case illustrates substantial problems with all three dimensions tracked, i.e., quality, timeliness, and cost. A design alternative is proposed and discussed. It involves a reengineering of the system, which substantially improves the timeliness and quality dimensions, and hence the company's goal of superior customer service, but at an increased cost in terms of hardware and software. The reengineering option is a decentralized version of the current system. In each store the PC has been upgraded to include features such as a math coprocessor, and it now has responsibility for most of the computations and processing. The optician still enters the data directly into the PC.
In addition, the optician checks to determine if there have been any significant changes in the prescription. (Such changes are verified with the prescribing optometrist.) Most importantly, the computations of the physical characteristics of the lenses, formerly performed by the HP computer at headquarters, are now performed by the PC in the store while the customer waits. A copy of the item master file and the frame characteristic file resides on the PC and is updated periodically, as appropriate, by a server located at headquarters. In the reengineered system a new processing block, which can be labeled PB1*, essentially incorporates the activities of PB1, PB2, and PB3 of the current system (see Figure 7). If any problems are identified, they can be resolved immediately, as the patient has been asked to wait a moment while the computations are performed. A quality control block, say QB1*, would essentially combine the functions of QB1 and QB2. If there is no problem, the PC forwards the Rx ticket to the server at headquarters (which replaces the IBM mainframe), which in turn forwards the ticket to the appropriate laboratory. This results in a major improvement in patient service and would serve to further differentiate Optiserve on the service dimension.

In the reengineered system, the server performs the following functions: (1) It maintains the most current version of the item master file and the frame characteristic file; periodically, it updates the 750 PCs at the store level with the most current version. (2) It keeps track of the inventory levels of blank lenses, frames, etc. in the laboratories; each laboratory periodically reports its actual inventory level to the server for reconciliation purposes, which accounts for breakage and defective items. (3) It routes the Spectacle Rx ticket to the appropriate laboratory.
5.3. Data Requirements for the Optiserve Case: Current System

In this subsection we describe how to obtain the data required to implement the information manufacturing model in the case of Optiserve. Specifically, we present one set of input data values needed to compute the timeliness, quality, cost, and value characteristics of the Rx ticket, the information product. Although the other sets are not necessarily handled in the same manner, similar procedures can be applied to obtain the corresponding input values. Those desiring the full set of input data should contact the authors.

5.3.1. Primitive Data Units. Eight descriptors are required for each primitive data unit. As an example, we will consider DU1.

1. Vendor— The vendor is the patient, who is the source of the information collected by both the optician and the optometrist.

2. Cost— The cost of securing the patient information has two major components. The cost of the optician's time was approximately $10, while the cost of the optometrist's time was approximately $20, yielding an estimate of $30.

3. Quality— The estimate of the quality of the input is constructed from three sources: the proportion of remakes due to data quality (2%), the proportion of erroneous orders detected at QB2 (1%), and the proportion of erroneous orders detected at QB1 that were attributable to vendor input (1%). Thus, the proportion of "defects" originating from the vendor is closely approximated by 4%, the sum of these three error rates. As noted in §3, the modeling process can incorporate either relative or actual quality measures. In this case, since actual measures are available, the quality of DU1 is 0.96.

4. Input Time— The completion of the collection of patient information is used as the point of reference for this analysis and is consequently set equal to zero.

5. Age— The information is collected by the optician and optometrist over the period of an hour. On average, this information is one-half hour old at the completion of the collection process (t = 0). Since the analysis is based on a ten-hour day, one-half hour is represented as 0.05 day.

6. Volatility— At first it would appear that much of the information concerning the patient would be of rather low volatility. However, the volatility of the information in this analysis relates to how long the patient will wait before canceling the order (shelf life); after this point the information is useless. Most of the orders are express orders, which implies that receiving the glasses promptly is critical.

7. Shelf Life— It was determined that unless the laboratory received the patient information within five days, it would become useless due to cancellation of the order.

8. Timeliness Function— Since cancellations accelerate near the end of the five-day shelf life, an exponent of less than one for the timeliness function is required. An exponent of 1/4, indicating that approximately 67% of the cancellations occur during the fifth day, proved satisfactory.

5.3.2. Processing Blocks. Four descriptors are required for each processing block. As an example we will discuss PB2.

1. Cost— The cost of the IBM query of the item master file to determine frame availability was estimated to be $4.00. This includes items such as personnel costs.

2. Time— The expected time required by this query is very small compared to the various delays in the system. Such times are represented in the model as 0.001 day, an upper limit on the time required.

3. Quality Function— Unlike the previous example, where the processing was based on aggregation and the corresponding quality function on averaging of inputs, this process is based on a comparison.
Consequently, the output is of acceptable quality if and only if both the information provided by the vendor and the frame availability information are correct. The corresponding quality function is the product of the qualities of DU2 and DU3.

4. Delay— This processing by the IBM is done at one-hour intervals; consequently, the average delay is one-half hour, or 0.05 day based on a ten-hour day.

5.3.3. Storage Block. For each storage block only three descriptors are required. SB1 will be used as an example.

1. Cost— The cost of storing master file data on the IBM is estimated to be $0.10, which includes the time to retrieve the data and the disk storage cost.

2. Time— This is primarily the time to retrieve the record associated with a patient's frame, which is relatively short and is estimated to be 0.0002 work day.

3. Delay— The frame availability information is available without delay once the query is received, since the database is on-line.

5.3.4. Quality Blocks. Each quality block requires four descriptors. QB1 is used as an example.

1. Cost— When the rework operation is included as part of the quality block, the cost is the sum of the actual cost of checking the information quality plus a prorated cost to cover the rework operation for the proportion of the flow that is rejected. The check is performed by the IBM at a cost of $0.10.

2. Time— A similar argument holds for the time estimate. The time is quite small, and 0.0002 day is used as an upper limit.

3. Quality Function— At this point approximately 5% of the flow is in error (4% from the flow provided by the patient and 1% from errors in the item master file concerning frame availability).
The quality check at QB1 focuses only on frame-related data; consequently, only about 2% of the flow (1% relating to errors from the patient flow and 1% from the master file) is detected to be in error and rejected. The remaining 3% out of the initial 5% is used in modeling the output quality, Qout, for this block via the parameter 0.60 (i.e., 0.03/0.05) in the quality function. The specific structure of the quality function is necessitated by the fact that Qout is the proportion that is good. Note that this structure is similar to that used in Figure 3(c).

4. Delay— Since the quality check is made by the IBM as the last step in processing, the delay = 0 for this quality block.

5.3.5. Information Required to Evaluate Information Products. Of the eight descriptors required to evaluate an information product, three are computed, while the other five must be specified.

1. Customer— The customer for the information product is the laboratory.

2. Intrinsic Value— VI was set to 1.00 so that it could be scaled up or down as a function of spectacle value. This permits flexibility in the analysis.

3. Weighting Factor— Since it was estimated that for laboratory operations quality was approximately twice as important as timeliness, the weight, w, was set equal to 0.67.

4. Data Quality Exponent— Since the value of the information product declines substantially for relatively small data quality problems, an exponent greater than one is indicated. For example, moving from 0.95 quality to, say, 0.98 quality, although a small quality increase, would be a substantial improvement in the value of the information product. It was estimated that a probability of error of 0.10 would reduce the value of the information product by half, while if the error probability were 0.50 the information product would have no appreciable value.
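Three of the subjectively set parameters above can be checked with quick arithmetic. The functional forms below are our reading of the text (a QB1 quality function mirroring the structure of Figure 3(c), a timeliness function of the form (1 - t/shelf)^s, and a value term scaling as quality^a); they are not equations reproduced from the paper.

```python
# QB1: 60% of incoming defects survive the check (0.03/0.05), so, in the
# structure of Figure 3(c), Qout = 1 - 0.60 * (1 - Qin).  (Our reading.)
q_in = 0.95                       # 5% of the flow in error entering QB1
q_out = 1.0 - 0.60 * (1.0 - q_in)
print(round(q_out, 2))            # 0.97, i.e., the remaining 3% error

# Timeliness exponent for DU1: with timeliness = (1 - t/shelf)**s and
# s = 1/4, the share of cancellations falling in the fifth day is
fifth_day = (1.0 - 4.0 / 5.0) ** 0.25
print(round(fifth_day, 2))        # roughly 0.67

# Data quality exponent: assuming the value term scales as quality**a,
# a = 7 matches the two stated conditions.
a = 7.0
print(round(0.90 ** a, 3))        # error prob. 0.10: value roughly halved
print(round(0.50 ** a, 3))        # error prob. 0.50: no appreciable value
```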
An exponent of a = 7.0 provides a good approximation to these conditions.

5. Timeliness Exponent— Since for this case the shelf life used in the timeliness function was based on the patient's tolerance for delays, a linear function is used to convert timeliness to value, to avoid "double-counting" this factor in the model.

6. Conclusions

We have presented an information manufacturing model that can be used to determine the timeliness, quality, cost, and value of information products. The systems we model have a predefined set of data units that undergo predefined processing activities. Our work is customer driven in that the value of the information products manufactured by the system is determined by the customers of those information products.

The Optiserve case focused on a relatively small-scale information manufacturing system. For large-scale systems, which may contain hundreds of processes, data units, and so forth, a hierarchical modeling approach is required. Under this approach an analyst would model, initially, at a higher (macro) level, with each block possibly representing a large number of related activities. That macro model, which would contain a relatively small number of blocks, is then analyzed. Those blocks that for whatever reason require more specific analysis are then replaced with a detailed (micro) model.

One of the benefits of the Information Manufacturing Model is the ability to use it to study the impact on an information system of a changed environment and the efficacy of various options for addressing those changes. For example, suppose that governmental regulations alter the frequency with which information products are required.
Proposed changes to the current system can be simulated to verify whether these alterations can, in fact, enable the information product to be delivered when required. The model also provides insights regarding the information quality that would result, together with the associated costs. This research is particularly timely in light of the industrial trend toward total quality management and business process reengineering. At the intersection of these driving forces is information quality. Ultimately, we need to deliver high-quality information products to the customer in a timely and cost-effective manner.2

2 Work reported herein has been supported, in part, by MIT's Total Data Quality Management (TDQM) Research Program, MIT's International Financial Services Research Center (IFSRC), Fujitsu Personal Systems, Inc., Bull-HN, and the Advanced Research Projects Agency and US Naval Command, Control and Ocean Surveillance Center.

References

Ahituv, N., "A Systematic Approach Toward Assessing the Value of an Information System," MIS Quarterly, 4, 4 (1980), 61–75.
Ballou, D. P. and H. L. Pazer, "The Impact of Inspector Fallibility on the Inspection Policy in Serial Production Systems," Management Sci., 28, 4 (1982), 387–399.
Ballou, D. P. and H. L. Pazer, "Modeling Data and Process Quality in Multi-input, Multi-output Information Systems," Management Sci., 31, 2 (1985), 150–162.
Ballou, D. P. and H. L. Pazer, "A Framework for the Analysis of Error in Conjunctive, Multi-Criteria, Satisficing Decision Processes," J. of Decision Sciences Inst., 21, 4 (1990), 752–770.
Ballou, D. P. and H. L. Pazer, "Designing Information Systems to Optimize the Accuracy-Timeliness Tradeoff," Information Systems Res., 6, 1 (1995), 51–72.
Ballou, D. P. and G. K. Tayi, "Methodology for Allocating Resources for Data Quality Enhancement," Comm. ACM, 32, 3 (1989), 320–329.
Chengalur-Smith, I., D. Ballou, and H. Pazer, "Dynamically Determined Optimal Inspection Strategies for Serial Production Processes," International J. Production Res., 30, 1 (1992), 169–187.
Deming, E. W., Out of the Crisis, Center for Advanced Engineering Study, Massachusetts Institute of Technology, Cambridge, MA, 1986.
Feigenbaum, A. V., Total Quality Control, McGraw-Hill, New York, 1991.
Hammer, M., "Reengineering Work: Don't Automate, Obliterate," Harvard Business Rev., 90, 4 (1990), 104–112.
Kraemer, K. L., W. H. Dutton, and A. Northrop, The Management of Information Systems, Columbia University Press, New York, 1981.
Kriebel, C. H. and O. Mikhail, "Dynamic Pricing of Resources in Computer Networks," Logistics, 1980.
Lai, S. G., Data Quality Case Study: "Optiserv Limited," Master's Thesis, MIT Sloan School of Management, Cambridge, MA, 1993.
Laudon, K. C., "Data Quality and Due Process in Large Interorganizational Record Systems," Comm. ACM, 29, 1 (1986), 4–11.
Morey, R. C., "Estimating and Improving the Quality of Information in the MIS," Comm. ACM, 25, 5 (1982), 337–342.
Shewhart, W. A., Economic Control of Quality of Manufactured Product, Van Nostrand, New York, 1931.
Wand, Y. and R. Y. Wang, "Anchoring Data Quality Dimensions in Ontological Foundations," Comm. ACM, November (1996).
Wang, R. Y. and C. Firth, Using a Flow Model to Analyze the Business Impact of Data Quality, No. TDQM-93-08, Total Data Quality Management (TDQM) Research Program, MIT Sloan School of Management, Cambridge, MA, 1993.
Wang, R. Y., H. B. Kon, and S. E. Madnick, "Data Quality Requirements Analysis and Modeling," in Proc. 9th International Conf. on Data Engineering, IEEE Computer Society Press, Vienna, 1993, 670–677.
Wang, R. Y., V. Storey, and C. P. Firth, "A Framework for Analysis of Data Quality Research," IEEE Trans. on Knowledge and Data Engineering, 7, 4 (1995), 623–640.
Wang, R. Y. and D. Strong, "Beyond Accuracy: What Data Quality Means to Data Consumers," J. Management Information Systems, 12, 4 (Spring 1996), 5–34.

Accepted by Abraham Seidmann; received June 1993. This paper has been with the authors 11 months for 3 revisions.