DATA WAREHOUSING IN COMPUTER INTEGRATED MANUFACTURING (CIM)

Steve S. Daino
Master of Science Graduate Student

Submitted in Partial Completion of the Requirements of IEM 5303 Computer Integrated Manufacturing

This paper was developed to assist students in partial fulfillment of course requirements. No warranty of any kind is expressed or implied. Readers of this document bear sole responsibility for verification of its contents and assume any/all liability for any/all damage or loss resulting from its use.

Table of Contents

ABSTRACT
INTRODUCTION TO DATA WAREHOUSING
    Definitions and Concepts
    Considerations
FUNDAMENTALS OF DATA WAREHOUSING
    Types of Data Warehouses
    Technological Building Blocks
    Data Warehouse Framework and Architecture
    Data Warehouse Logical Model
BENEFITS OF DATA WAREHOUSING IN CIM
CASE STUDIES
    Toyota Motor Company's Toyota Logistics Services (TLS) [4]
    General Motors [5]
    MCI WorldCom and Industry Data Exchange (IDE) [6]
REFERENCES

List of Figures

Figure 1. Reasons for moving data outside the operations systems
Figure 2. Data Warehouse Architecture
Figure 3. Data warehouse entities align with the business structure
Figure 4. Data De-normalization and Transformation
Figure 5. Data Scrubbing and Staging

ABSTRACT

The incorporation of data warehousing in computer-integrated manufacturing provides a means by which organizations can become more successful and globally competitive in today's fierce economic market. "Business as usual" is no longer enough. Since the onset of the information superhighway, a new avenue has opened enabling organizations to include state-of-the-art technologies for the collection, processing, and analysis of information. Data collection technologies, together with the tools used to analyze the collected data and form decisions from it, make up the concept of data warehousing. This technology is becoming an elemental tool by which corporations are able to gain a vital competitive decision-making edge throughout the product development lifecycle.

Data warehousing is driven by the concept that data stored for business analysis is most effectively accessed by separating it from the data in operational systems. “It is unacceptable to have business analysis interfere with and degrade performance of operational systems.” [8] Analytical tools and procedures complement data warehouses by allowing users to actively manipulate data retrieved from computer servers in order to get the information needed for decision-making.
This paper will discuss the scope of data warehousing in computer integrated manufacturing (CIM), explain the advantages of data warehousing in CIM, and illustrate the use of data warehousing in manufacturing enterprises through selected case studies.

INTRODUCTION TO DATA WAREHOUSING

Computers began to play a vital role in the manufacturing process with the Automation era. However, these older processes ran on independent or standalone systems that did not allow the transmission and storage of information onto other computer systems. The users' need for access to this data eventually led to the creation of distributed database management technology. This software allowed specific information to be pulled from databases located throughout the organization, collected and stored on a computer in a central location, and then consolidated and analyzed by the users.

Although the concept of the distributed database management system was good in theory, it still did not resolve the inability to share information among the various relational databases or incompatible computer systems. “Despite all of the changes in platforms, architectures, tools, and technologies over the past decades, a remarkably large number of business applications continue to run in the mainframe environment of the 1970’s.” [8] Additionally, the speed of the entire distributed process was fairly slow. The solution to this problem was the onset of data warehousing. Through the concept of client/server technology, data would be copied regardless of computer types and platforms and placed onto a common server.

Definitions and Concepts

By definition, a data warehouse is a centralized, integrated repository of information that must support complex decision support queries without performance degradation. A data warehouse simply provides a means to manage data outside of the operational systems where the data originates.
Reasons for moving data outside of operational systems are illustrated below in Figure 1. This process is achieved through an integrated system of hardware, software and network technologies designed to convert operational data into accessible business information. The main purpose of the data warehouse is to provide historical data for analyzing past performance in order to aid in future decisions. It is also important to remember that once operational data is brought to the data warehouse, it is no longer dynamic data and should not be further modified.

Order processing (2 second response time; last 6 months orders) >> daily closed orders >> Data Warehouse
Product Price/inventory (10 second response time; last 10 price changes; last 20 inventory transactions) >> weekly product price/inventory >> Data Warehouse
Marketing (30 second response time; last 2 years programs) >> weekly marketing programs >> Data Warehouse
Data Warehouse: last 5 years data; response time 2 seconds to 60 minutes; data is not modified

• Different performance requirements
• Combine data from multiple applications
• Data is mostly non-volatile
• Data saved for a long time period

Figure 1. Reasons for moving data outside the operations systems [8]

Data warehouses differ greatly from production databases, also known as online transaction-processing (OLTP) systems. As displayed in Table 1, the differences between these two concepts affect both purpose and design. [1]

The type of data warehouse an organization decides to incorporate should be based on two factors: the data needed to make informed decisions and the methods used to conduct its business. Since organizations are finding an increasing need to analyze historical data from all aspects of their manufacturing processes in order to improve their efficiency and increase their business performance, data warehousing solutions are becoming the foundation for future opportunities.
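The contrast between an operational table and a warehouse table can be made concrete with a small sketch. The Python example below uses the standard sqlite3 module with an in-memory database. The table names (oltp_price, dw_price_history), the product ID, the dates, and the prices are all hypothetical values chosen for illustration and do not come from the cited sources. It assumes a weekly load, in line with the "weekly product price/inventory" feed shown in Figure 1.

```python
import sqlite3


def run_weekly_loads():
    """Apply three price changes to an OLTP table and snapshot each one."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Operational (OLTP) table: one row per product, overwritten on update.
    cur.execute("CREATE TABLE oltp_price (product_id TEXT PRIMARY KEY, price REAL)")
    # Warehouse table: one row per product per load, inserted and never updated.
    cur.execute(
        "CREATE TABLE dw_price_history (product_id TEXT, load_date TEXT, price REAL)"
    )
    cur.execute("INSERT INTO oltp_price VALUES ('P-100', 10.0)")

    # Hypothetical weekly price changes.
    weekly_prices = [("2000-01-07", 10.0), ("2000-01-14", 12.5), ("2000-01-21", 11.0)]
    for load_date, price in weekly_prices:
        # The operational system keeps only the current price.
        cur.execute(
            "UPDATE oltp_price SET price = ? WHERE product_id = 'P-100'", (price,)
        )
        # The weekly load copies the current state into the warehouse.
        cur.execute(
            "INSERT INTO dw_price_history SELECT product_id, ?, price FROM oltp_price",
            (load_date,),
        )
    conn.commit()

    current = cur.execute("SELECT price FROM oltp_price").fetchall()
    history = cur.execute(
        "SELECT load_date, price FROM dw_price_history ORDER BY load_date"
    ).fetchall()
    conn.close()
    return current, history


if __name__ == "__main__":
    current, history = run_weekly_loads()
    print("OLTP current price:", current)
    print("Warehouse history:", history)
```

After the three loads, the operational table holds a single current price (11.0), while the warehouse table retains all three weekly prices. This is the kind of historical record that trend analysis depends on, and it is why warehouse rows are treated as read-only once loaded.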
Data Warehouses                              OLTP Systems
-------------------------------------------  -------------------------------------------
Optimized for data retrieval and reporting   Optimized for data entry and updates
Read-only system                             Contains data needed for running the
                                             day-to-day operations of a business
Contains data used for analyzing the         Contains current and highly volatile data
business                                     with some elements being unknown or
                                             incomplete at data entry time
Incorporates redundant data                  Does not incorporate redundancy in data
                                             storage

Table 1. Data Warehouses versus OLTP Systems

Considerations

One of the greatest advantages of data warehousing technology is in the area of accessibility to information. Organizations are finding it advantageous to open up the data not only to users within the company, but also to customers, suppliers, and business partners. The benefits of information sharing include improved operations and products, more satisfied customers, and increased revenue. However, sharing technology also requires a considerable amount of planning and coordination, a reliable process to ensure data integrity, increased security, and a system that scales with the number of users accessing the server.

The decision for an organization to incorporate data warehousing into its business depends on several factors. The practical considerations listed in Table 2 determine whether the data warehousing technology in an organization becomes a valuable tool or just another expensive technology fad. [1]

• Time and Money
• Space
• Consolidation
• Security
• User-Friendliness
• Project Planning

Table 2. Practical Considerations for Data Warehousing

Data warehouses are expensive in both time and money. In a 1996 study published by IDC, the average cost of building a data warehouse was $2.2 million, with an average time of 2.3 years to break even. Ninety percent of the companies in the study achieved greater than 40 percent return on investment, and 50 percent achieved over 160 percent return on investment.
The average return on investment over three years, cumulative, was about 400 percent, with a higher return on investment for data marts. [1] Building a data warehouse can be profitable; however, a return on investment will not be realized for quite some time. The organization needs to be fully aware of the amount of investment and time required before any payback can be expected.

Data warehouses also require an enormous amount of disk space. It is common to store a year's worth of data as the minimum, or several years if trending analysis is used. Data warehouses measured in terabytes or petabytes are not uncommon. Disk space issues will also affect the cost of the project in both implementation and maintenance.

Consolidation of data should be considered in the design of the warehouse, especially if combining data from multiple sources may uncover incompatibility issues. Consistency in the data is the key to being able to extract useful data for decision-making applications.

Security affects the organization both physically and philosophically. Since one of the great advantages of a data warehouse is the ability to share information among multiple users, security measures that allow this type of access are a necessity. This contradicts the old security standard of limiting access to only those who absolutely need it. However, the evolution of technology is forcing organizations to change from a 'need to know' mindset to a 'right to know' philosophy. Permitting users access to the data allows them to maximize their effectiveness and contributions to the organization.

Data warehouses and the tools used for analysis should be user-friendly and not cumbersome to the point of frustration. Encouraging the users to search for answers to their own questions will promote creativity and reduce the costs associated with servicing users' requests.

Finally, proper planning is a necessity if the project is to be successful.
An organization needs to have a full understanding of its business objectives, the potential cost versus the benefits, the resources required and the organizational commitment needed to implement a successful data warehouse project. If these areas are not fully researched before the implementation of the project, the organization may waste time and money on a project destined to fail.

According to Hal Lorin of the Manticore Consultancy in New York, most of the failed data warehousing projects he has had a chance to examine closely have demonstrated a set of common patterns: [10]

• The database product was driving the project, not being driven by it.
• No close and effective link existed between business process analysis and data normalization.
• No multi-dimensional views were available within the 'warehouse' context.
• No serious data normalization had been done and there was no architecture for an integrated repository of browsable metadata.
• There was a totally inadequate study of scale and capacity issues.
• There was no co-ordination with legacy application portfolio management.
• The 'decision tools' fronting the warehouse were trivial or not at all integrated, providing two-dimensional representations of fixed queries based on simple extraction to a desktop PC.

FUNDAMENTALS OF DATA WAREHOUSING

Types of Data Warehouses

Organizations need to consider the different variations of data warehousing and the benefits of each before deciding which would be best for their project. The operational data store, the data mart, and the enterprise data warehouse are all examples of the different types of data warehouses.

The operational data store, one of the simplest types of data warehouses, is basically a replicated production database that has been adjusted for errors. Its primary purpose is to generate standard operations reports and to provide transaction detail information for summary level analysis.
The data mart differs from an operational data store in that it usually contains limited information regarding a specific department or business process. An example of the use of a data mart would be analyzing sales information for a specific region or product line. Since data marts contain only summary information, they can be linked to data stores for more detailed transactional data.

The enterprise data warehouse holds information taken from throughout an organization. This type of warehouse is the most complex to establish and maintain, since information must be collected from multiple systems into a common database. The most common problem with the enterprise data warehouse is incompatible or inconsistent data. The majority of the time spent on building such a data warehouse is directed to the extracting, cleaning, and loading of data.

Technological Building Blocks

The technological building blocks of any data warehouse include the relational database management system (RDBMS), Structured Query Language (SQL), and on-line analytical processing (OLAP) software. [2] The RDBMS stores the information in a database. This element is critical to the efficiency of the warehouse because it governs the integrity of the stored data and the relationships among the data within the tables. SQL is the language used to create, maintain, and view data. It is critical to the warehouse because it is the point of entry through which the users access the information. OLAP software allows the users to view the summarized data. This tool empowers users to meet their own needs by letting them search and analyze the data themselves.

Data Warehouse Framework and Architecture

The structure of data, communication, processing and presentation that allows users to access enterprise data consists of several interconnected parts. These parts, referred to as layers, can be graphically viewed in Figure 2 shown below. [3]

Figure 2.
Data Warehouse Architecture

The operational and external database layers consist of the "front-line" operational systems. These systems can be related to the different computer manufacturing processes located throughout the organization. Since these systems directly control the manufacturing process, their business transactions are limited in focus. Within an enterprise, there may be hundreds of "front-line" systems, each performing different functions as well as collecting process information to be stored.

The information access layer provides the user of a data warehouse with both hardware and software tools that enable them to process and display data in a desired format for analysis. These tools allow users to take control of their searches and give them the data needed to make informed decisions.

The data access layer is the communication network by which the information access layer communicates with the operational database. An important function of the data access layer is to act as the front-end for providing users with universal and transparent access to the data, regardless of the different DBMSs, file systems, manufacturers and network protocols. Structured Query Language (SQL) has become the standard for polling information at this level. Data access filters allow SQL to access both relational and non-relational DBMSs in addition to information from dated DBMSs within an enterprise.

The data directory layer is the repository of metadata and can be thought of as the "Yellow Pages" for information about the data. Some of the aspects described in metadata include the data's source, how it is mapped, and its reliability ranking. "This information allows end-users to access data from the data warehouse or operational database without having to know where the data resides or the form in which it is stored." [3]

The process management layer serves as the process "scheduler" which updates the data warehouse.
Since data integrity is an important aspect, the organization needs to consider the optimum time to schedule jobs that pull data from the production databases to be stored in the warehouse database. Failures due to bad or unknown data are unacceptable and procedures should be in place to handle these situations.

The application-messaging layer is also referred to as "middleware." Its function is to transport information throughout the enterprise computing network. It includes tools for filtering and merging data, managing the metadata, and handling changes to the data warehouse.

The data warehouse layer is the core of the data warehouse. In a physical data warehouse, data is extracted from operational and/or external systems and stored in a database on a centrally located server. However, with the onset of client/server technology, not all computers have the capacity to store data. For these systems, the data warehouse may provide a virtual view. In a virtual or logical system, the data warehouse does not actually store data but displays a copy of the database stored on the server.

The data-staging layer is also referred to as replication management. "Data staging involves data quality analysis and filters to identify patterns and data structures within existing operational data." [3]

Data Warehouse Logical Model

In order to bring data from operational systems into a data warehouse, it must be logically transformed. Operational systems generally contain overlapping reference data, or data in varying forms, not all of which is necessary for analysis in a data warehouse. The data entities incorporated into a data warehouse should align with the business structure and are built by collecting data from multiple source applications. An example of this is illustrated below in Figure 3.
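The idea of building one business-aligned entity from several source applications that hold overlapping data in varying forms can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a method taken from the cited sources. The attribute aliases (cust, cust_id, borrower) and the code conversions ("1" to "M", "2" to "F") are taken from the paper's Figure 5. The attribute name gender, the UNKNOWN placeholder, the segment field, the sample records, and the rule that the first known value wins are all assumptions made for the example. The sketch also assumes every source record carries one of the customer ID aliases.

```python
# Source-specific names for the same attribute (from the paper's Figure 5).
CUSTOMER_ID_ALIASES = ("cust", "cust_id", "borrower")
# Coded values mapped to a single warehouse representation (from Figure 5).
GENDER_CODES = {"1": "M", "2": "F"}
# Assumption: placeholder stored when a source supplies no value.
MISSING = "UNKNOWN"


def scrub(record):
    """Return a copy of one source record using warehouse names and codes."""
    clean = {}
    for name, value in record.items():
        if name in CUSTOMER_ID_ALIASES:
            clean["customer_id"] = value
        elif name == "gender":
            # Convert known codes, pass through other values, default if empty.
            clean["gender"] = GENDER_CODES.get(value, value) if value else MISSING
        else:
            clean[name] = value
    clean.setdefault("gender", MISSING)
    return clean


def build_customer_entity(*sources):
    """Merge scrubbed records from several applications, one entity per customer."""
    customers = {}
    for source in sources:
        for record in source:
            clean = scrub(record)
            entity = customers.setdefault(clean["customer_id"], {"gender": MISSING})
            for name, value in clean.items():
                # Keep the first known value; only fill attributes still unknown.
                if entity.get(name, MISSING) == MISSING:
                    entity[name] = value
    return customers


if __name__ == "__main__":
    # Hypothetical extracts from two source applications.
    order_processing = [{"cust": "C-1", "gender": "1"}, {"cust": "C-2", "gender": ""}]
    marketing = [
        {"borrower": "C-2", "gender": "2", "segment": "fleet"},
        {"cust_id": "C-3", "segment": "retail"},
    ]
    for customer_id, entity in build_customer_entity(order_processing, marketing).items():
        print(customer_id, entity)
```

In this sample, customer C-2 arrives from order processing with no gender value and is completed from the marketing record, while C-3 has no value in any source and keeps the placeholder. A real load would also need a rule for conflicting values between sources, which this sketch sidesteps by keeping the first known value.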
Source applications:
Order processing: Customer orders, Product price, Available Inventory
Product Price/inventory: Product price, Product Inventory, Product Price changes
Marketing: Customer Profile, Product price, Marketing programs

Data Warehouse entities: Customers, Products, Orders, Product Inventory, Product Price

• No data model restrictions of the source application
• Data warehouse model has business entities

Figure 3. Data warehouse entities align with the business structure [8]

Before data can be incorporated into a data warehouse it must first be “denormalized” and transformed (Figure 4), and scrubbed or staged (Figure 5). “Normalization of data occurs when relations or tables are progressively decomposed into smaller relations to a point where all attributes in a relation are tightly coupled.” [8] Denormalization reduces the number of table joins that a normalized design requires. This improves performance, because the time required to perform a “join” increases as the size of the data tables increases. Additionally, because data in a warehouse is static, all of the “joins” in an operational system may no longer be necessary.

[Figure 4 shows the source applications and warehouse entities of Figure 3, annotated "De-normalized data", "Transform State", and "Extensible data warehouse".]

• Structured extensible data model
• Data warehouse model aligns with the business structure
• Transformation of the state information
• Data is de-normalized because the relationships are static

Figure 4. Data Denormalization and Transformation [8]

“Different source applications invariably use different attribute values to represent the same meaning.
These different values need to be converted into a single value as data is loaded into the data warehouse.” [8]

[Figure 5 shows Operational System A and Operational System B passing through a transformation step into the Data Warehouse System, which holds summarized and detailed data.]

Example transformations:
cust, cust_id, borrower >> customer ID
“1” >> “M”
“2” >> “F”
Data missing >> “……..”

• Uniform business terms
• Single physical definition of an attribute
• Consistent use of entity attributes
• Default and missing values

Figure 5. Data Scrubbing and Staging [8]

BENEFITS OF DATA WAREHOUSING IN CIM

Computer-Integrated Manufacturing (CIM) is defined as “Systems which enable the integrated, rationalized design, development, implementation, operation and improvement of production facilities and their output over the life cycle of the product. These systems identify and use appropriate technology to achieve their goals at minimum cost and effort.” [9] The ability to establish and understand the correlation between activities of different organizational groups within a company is often cited as the most significant feature of data warehousing systems. [8]

“Concurrent Engineering is a systematic approach to the integrated, concurrent design of products and their related processes, including manufacture and support. This approach is intended to cause the developers, from the outset, to consider all elements of the product life cycle from conception to disposal, including quality, cost, schedule, and user requirements.” (Pennell and Winner, 1989) Data warehousing can provide information for analysis that can aid in reducing design and production costs, ensuring product quality, and reducing the time required to go from product concept to production.

The advantages of implementing data warehouse technology in today's organizations are sometimes hard to quantify since some elements are of intangible value.
However, the positive impacts of data warehousing investments include more cost-effective decision making, enhanced customer service, better business intelligence, enhanced asset and liability management, support of business process re-engineering, and return on investment. [11]

Organizations are beginning to understand the value of information. By implementing data warehouse technology, they are better able to achieve strategic advantages by obtaining better and more timely access to information about their business, products, customers, and competition.

CASE STUDIES

Toyota Motor Company's Toyota Logistics Services (TLS) [4]

Toyota Logistics Services is responsible for the shipment and maintenance information for all imported and domestically produced Toyota automobiles. Their system for tracking vehicles imported into the various U.S. ports and those manufactured in the U.S. factories proved tedious and overwhelming. TLS wanted a system that would allow them to reduce the factory-to-dealer shipment time to seven days through the use of inland transportation.

In order to do this, TLS decided to implement a data warehouse. They incorporated Red Brick Systems' data warehousing software along with Brio Technologies' BrioQuery SQL software to run on a Unix-based IBM RS6000 computer. This system had to be able to manage the fast-paced changes in the logistics areas in addition to providing quick results for user queries.

Before installing the new system, annual reports on the cost of freight transportation took up to three months to compile. In addition, TLS had to hire temporary personnel to assist in the preparation of this data. After implementing this new technology, the same report now takes only three weeks without the use of additional personnel. Toyota, realizing the success of this project, is committed to expanding it to include another ten subject areas.
This will allow the users to further analyze lead times and costs for getting vehicles accessorized and delivered to the dealerships by either truck or rail.

General Motors [5]

General Motors, recognizing that it has a wealth of information stored in many different places, has decided to develop a massive data warehouse that will include detailed consumer information to be used for marketing purposes. This project will connect the standalone customer databases from its car and truck divisions, car leasing, and home mortgage and credit card units to a central repository with a link to the company's international operations.

General Motors intends to promote its BuyPower online car shopping service and cross-sell customers on General Motors' financial services. In order to accomplish this task successfully, the company needs to better understand who its customers are and what their needs are. General Motors has not yet decided which hardware and software technologies to use or which marketing strategy to adopt. However, it is currently researching and testing various systems around the world in an attempt to determine which type of system is best for its needs.

MCI WorldCom and Industry Data Exchange (IDE) [6]

MCI WorldCom and the Industry Data Exchange Association decided to join forces and create several industry-specific, community-shared data warehouses that would serve as a place for manufacturers to distribute product information using electronic data interchange. Over 200 manufacturers are expected to participate in this project. In this system, distributors who are authorized by the manufacturers would pay a monthly fee to access the data warehouse and pull information into their own operational systems.

MCI WorldCom and IDE decided that their project would use a Sun Microsystems ES5500 computer running Oracle's Oracle8 database.
The disk capacity of their data warehouse would be 160 GB and would hold information on over three million products. The main objective of this project would be to serve as a central repository for suppliers and distributors to maintain product specs, part numbers, pricing, and packaging quantities. By incorporating a standard format, confusion and misinformation are minimized while efficiency and productivity increase for all users.

REFERENCES

[1] Gagnon, G. (1999), “Data Warehousing, an Overview,” PC Magazine, March 9, pp. 245.
[2] Ewing, J. and Lais, S. (1999), “Data Warehousing for Information Retrieval,” Government Computer News, 18, 4, 47-50.
[3] Orr, K. (1996), “Data Warehousing Technology,” White Paper, The Ken Orr Institute.
[4] Shein, E. (1997), “Toyota Test Drives Warehouse,” PC Week, 14, 9, 58-60.
[5] Wallace, B. (1999), “Data Warehouse to Drive Online Marketing at GM,” Computerworld, July 5, pp. 6.
[6] Davis, B. (1999), “Data Warehouses Open Up,” Information Week, June 28, pp. 42.
[7] Schroeck, M. (1998), “Data Warehousing is Worth the Investment,” June 8, pp. 47.
[8] Gupta, V. (1997), “An Introduction to Data Warehousing,” White Paper, System Services Corporation, Chicago, Illinois.
[9] Nazemetz, W. John, “Lecture Slides,” Lecture No. 1, Fall 2000.
[10] “Data Warehouse Economics: ROI doubts?,” Data Warehousing Tools Bulletin, 11/01/96, pp. 2329.
[11] Wentz, Dave, “Data Warehousing, Better Access=Better Decisions=Better Business,” White Paper, Showcase Corporation.