01-0 Why do we need Data Warehousing

TOPICS
- Introduction to why
- What is strategic information?
- Types of questions management asks
- Who needs this type of information?
- What does strategic information mean?
- Characteristics of strategic information
- How do we get answers to business objectives?
- Failure of past systems
- What do operational systems do best?
- How are they different from informational systems?
- What is the solution?
- Visual of the process

Document1 by rt 6 February 2016 Page 1 of 16

This note describes a business need and explains why data warehousing is used to meet it. Later notes cover what a data warehouse is, related concepts, how to design one, and how to implement the design.

Introduction to why

Ten or more years ago, asking why we need data warehousing was an important question. Today almost any reasonably sized operation uses, or would benefit from, data warehousing. Growth in storage capacity, faster processing and better software have made data warehousing a viable option even for smaller operations. The next concept that follows from this is BIG DATA. Both are needed and each has its place.

Up until now, your subjects have mostly dealt with application software, through database, Java and other lines of study. In the systems courses you mostly dealt with business designs and solutions. These applications are important for running the business. They process orders and keep track of company stock or inventory, and in environments like Seneca College they handle registration, tuition billing, payroll, transcripts, employee health insurance benefits and many, many other day-to-day functions. All businesses have many of these application processes in common, particularly those related to bookkeeping and accounting. Without these systems a modern business cannot survive. The big growth in computerized systems began in the 1960s.
Today, even very small companies depend on computerized operations. These operational systems, many built and modified over a 50-year period, are very effective at what they were meant to do: run the day-to-day activity of the company. They collect, save and modify information, and produce a myriad of reports (originally on paper, now online) showing what transactions have occurred and how the business is functioning.

Through the 1970s, 1980s and 1990s, large businesses grew more complex, amalgamated with others and spread across the country and into other countries; the largest corporations went global (think of Walmart, Target, Bank of America, General Motors, McDonald's). Competition increased because competitors were doing the same thing: expanding. To stay competitive, the decision makers in the company, upper management, could no longer rely solely on traditional operational data. They needed what is called strategic data.

This data has always been available, but it has been hard to extract. The primary objective was to keep the business running successfully, so the systems in place were built to carry out operational activities very well. Getting strategic information out of them required IT personnel, because the systems were not optimized for the ad hoc extraction that upper management needs for strategic decision making.

In larger operations the various levels of management have a general idea of how the business is doing and how it functions. Compare their knowledge to that of the owner of a small fruit stand, who can see exactly what the business does and should know it intimately. Larger operations have the same need: to know the business better. Where must this "new" type of information come from? The same place it always came from: the operational data.
The difference now is that the need is greater and results must be available faster. If you don't do it, the competition will. Note again that day-to-day operational systems are good at providing data about how the company is running at the moment. Management still wants that information, but it also wants strategic information. In the late 1990s companies began looking at data warehousing to provide this kind of data, and those that adopted it began to see competitive advantages.

What is strategic information?

Strategic information might answer the question of where to build another Seneca campus, or which campus should house a particular school. Suppose the question arises because overall enrolment has increased and enrolment in the different schools within Seneca is changing. For example, should the School of Information Communications and Technology continue to be located at the Seneca@York site, or should it move to the Markham campus? The argument for Markham is that the surrounding business area is considered the "Silicon Valley" of the Toronto region. The argument for Seneca@York is that the campus was originally designed as a technical campus and houses other technical areas. Then again, maybe the school should be in both places, or some parts should be located at one campus and other parts at any number of other campuses.

What kind of information would management need to determine what is best? Does knowing that Seneca has over 25,000 full-time students, or other information such as which subjects students are enrolled in, help you decide where to locate the School or where to build a new campus?

In a business operation the question might be where to build the next warehouse in Canada. Marketing may want to know which product lines to expand and which ones to shrink. Which product lines are affected by political decisions, by weather, by demographics? Marketing managers can't operate without information.
Imagine deciding to bring a line of detergents the company already sells into Canada. Apart from all the competitive considerations, there are environmental concerns and laws that are not part of the day-to-day operational data. This is the strategic information upper management looks for. The purpose of strategic information is to gain a better understanding of how the business operates and, through that, a competitive advantage. Of course, the desire for this information has always been there. The problem over the years is that it has been difficult to retrieve strategic information from operational systems.

ASIDE: This kind of data comes from a number of sources. Every year there are hurricanes in the southern USA. Everyone would understand that sales will likely drop on the day a hurricane hits a specific area, as people are either evacuated or hunkering down and don't venture out as much to buy groceries and other goods. It may also be obvious that before a storm hits, purchases of emergency supplies such as water, batteries, propane, generators and plywood to board up windows may increase. It makes good sense, then, for large companies to watch weather patterns and move more stock of those items to stores that may be in the path of the storm. Did you know that sales of Pop-Tarts also increase? They don't require refrigeration or cooking. This is apparently one of the things Walmart discovered: in Florida, it is beer and Pop-Tarts that sell before hurricanes. Examples like these are among the more unusual discoveries made by using data warehousing or data mining to uncover characteristics of how the business operates, and this is the kind of strategic information that data warehousing and data mining try to provide.

Types of questions management needs answers for
OR
What questions, if answered, achieve a competitive advantage?
Example: Retail business

Here are some generalized questions.
- What information would help us do __________ (something; see below) if we owned the business?
- What information would give us an advantage over our competitors?
- What information can we find to increase sales?
- Why is one of our stores selling so much more of product X? Can whatever that store is doing be passed on to our other stores?
- Can the business take action once it knows the information?

Examples:

Lower prices? It is important to know what effect lower prices have on profit. Can you sell enough extra to offset the lower prices? Are lower prices necessary in all regions across Canada, or in just one region?

High turnover? Increase the rate of turnover. What advantage does this have?

Lower stock levels? Lower stock levels mean less money tied up in inventory. The company may then have to borrow less, which means less interest, lower costs and presumably more profit. On the other hand, lower inventory carries the risk of running out of items, particularly a hot seasonal item. Running out means less business, fewer sales and therefore less profit. A balance needs to be found; the question is what that balance is.

Noticing trends? Where things sell, and at what price they sell in different regions, is important. What has been the history of a product in those regions? Are there ethnic groups in one area that could be a source of more business? Is there a growing ethnic community that has reached a critical mass where it is time to meet its needs?

EXAMPLES: V8 example; Perogies; Walmart example

Think about what would be required for other industries such as social services, financial industries like banking, transportation (trucking), or manufacturing. Ask yourself what questions would need to be asked to make good decisions about running these different types of business entities.
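The "lower prices" question above is a piece of arithmetic that warehouse data can feed. The minimal Python sketch below assumes a single product whose unit cost does not change and uses hypothetical figures; it computes how many units must be sold at the lower price just to keep the same total gross margin. The function name `breakeven_units` is illustrative only, not from any library.

```python
def breakeven_units(price, unit_cost, units_sold, new_price):
    """Return how many units must sell at new_price to earn the same total
    gross margin as units_sold at price (unit_cost assumed unchanged)."""
    old_margin = price - unit_cost        # margin per unit today
    new_margin = new_price - unit_cost    # margin per unit after the cut
    if new_margin <= 0:
        raise ValueError("new price does not cover the unit cost")
    return old_margin * units_sold / new_margin


# Hypothetical figures: a $10 item that costs $6, selling 1,200 a month,
# with a proposed price cut to $9.
needed = breakeven_units(price=10.0, unit_cost=6.0, units_sold=1200, new_price=9.0)
print(f"Units needed at the lower price: {needed:.0f} (currently 1200)")
```

Here a 10% price cut needs a one-third increase in volume (1,200 to 1,600 units) just to stand still. Whether that increase actually happens, and whether it happens in every region or only one, is the kind of trend question the warehouse data is meant to answer.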
EXAMPLES of how organizations can achieve the advantages they are looking for:

FINANCIAL INDUSTRY
- More service: what is the effect on profit, customer satisfaction and market penetration?
- More services, such as RRSPs and investment advice at the local level
- Ability to contact customers with promotional material geared specifically to their demographic
- Faster service
- Getting customers to pay for services
- Meeting local needs
- Giving free service, as with President's Choice

AIRLINE INDUSTRY
- More passengers per flight: how can the passenger load be increased (fixed costs are the same)?
  - What is the trend in passenger loads by route?
  - Are there specific time periods (seasonal, Mondays, etc.)?
  - Purpose: to have the right flight at the right time, and no more
- Getting the right balance of first, business and economy class
  - What balance, based on sales, would generate the right mix?
  - Do empty first-class seats and a full economy cabin mean lost business?
  - Is there a flexible, low-cost way of changing the size of each seat group?
- Knowing who travels when

MANUFACTURING
- Cost reductions
- Just-in-time supplies
- Quality production
- Fewer defects

Who needs this type of information?

Again, this is normally the people in the upper levels of the company: the board of directors, executives, managers at all decision-making levels, and marketers. These are the decision makers in the company. This information isn't for the person who loads and unloads products at the back of the warehouse, or for the cashier in a grocery store. At Seneca, it isn't the type of information needed by faculty and office staff. Quite often the managers of these people need operational information to help them manage.
[Diagram: Who needs strategic information? Executives, managers at all decision-making levels, and marketers: the decision makers.]

The decision makers tend to be focused on longer-term strategic decisions. The information they need tends to come from analysis of trends, and trends occur over time. Management needs in-depth knowledge of what affects the operation of the business and how these key business factors affect one another. What changes over time, and how does that compare with similar companies? The big focus for executives is the customer's needs with regard to products or services.

NOTE: strategic information does not run the day-to-day operations of the business.

What does strategic information mean?

Information used:
- To create business strategies
  - Will we need more advertising in the Niagara Peninsula?
- To establish goals
  - Currently averaging 250 units per month
  - Will need to sell 350 units per month this year and 400 next year
- To monitor results

Information that drives the strategies employed in a business operation is strategic information.

NOTE: The information isn't just "What did we sell today?" or "How many frozen pizzas do we buy for the weekend?"

NOT WHAT WE SOLD TODAY BUT … MORE ABOUT
- WHAT WILL WE PROMOTE
- WHAT WILL WE SELL
- WHAT DID WE NOTICE THAT WE SHOULD DO DIFFERENTLY

This leads to an increased need for strategic information.

Characteristics of Strategic Information

WHOLE ENTERPRISE VIEW
The data must provide a whole-enterprise view. To make good decisions you need a view of all the information; sales information alone can be in 10 different places. The information needed for strategic planning often blends a variety of areas together. (Aside: This is the intent, but the data warehouse may be built in stages, incorporating one major area at a time and then melding them together when fully implemented.)
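The point that sales information can be in 10 different places is easier to see with a small example. This Python sketch assumes three hypothetical operational systems (`store_pos`, `web_orders`, `wholesale`) that record the same kind of sale under different field names. It maps each one onto a common product/quantity pair and produces one enterprise-wide total per product; the function name `enterprise_sales` is illustrative.

```python
from collections import defaultdict

# Hypothetical extracts from three separate operational systems.
# Each system names the same facts (product, quantity) differently.
store_pos = [{"sku": "A100", "qty": 5}, {"sku": "B200", "qty": 2}]
web_orders = [{"product_code": "A100", "quantity": 3}]
wholesale = [{"item": "B200", "units": 40}, {"item": "A100", "units": 10}]

# For each source: its rows, the product field name, the quantity field name.
SOURCES = [
    (store_pos, "sku", "qty"),
    (web_orders, "product_code", "quantity"),
    (wholesale, "item", "units"),
]


def enterprise_sales(sources):
    """Combine every source into a single total per product."""
    totals = defaultdict(int)
    for rows, product_field, qty_field in sources:
        for row in rows:
            totals[row[product_field]] += row[qty_field]
    return dict(totals)


print(enterprise_sales(SOURCES))  # {'A100': 18, 'B200': 42}
```

Looking at `store_pos` alone would report 5 units of A100 against an enterprise total of 18, which is exactly the partial view that strategic decisions should not rest on.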
DATA INTEGRITY
The information being presented must be accurate. Just as accuracy must apply to the day-to-day operational systems (OLTP, Online Transaction Processing systems), the data warehousing system must maintain data integrity, because major decisions are made based on this information. Also, since data comes from different systems, one system may use the values M, S or W for a person's marital status while another system within the organization uses the full words Married, Single or Widowed. A decision needs to be made about how to represent data that has the same meaning, so that the data in the data warehouse has a consistent look.

ACCESSIBLE
The OLTP system is designed to optimize processing. For management, the diagrams used to describe the system and the SQL needed to access its data are not easily understood. The information decision makers need must be easy to obtain, flexible enough to answer questions that arise from the data, and intuitive to management. By intuitive we mean that getting the information does not require a programmer: business people can access the available information themselves. To be responsive, the data must be in a format that allows analysis.

CREDIBLE
Every displayed fact must have one value. That value should not change over time: if a similar report is requested next year, it should show the same figure. (More about this major topic later, under slowly changing dimensions.)

TIMELY
Information must be available within a short time frame. Information that arrives too late is … useless. VERY IMPORTANT

How do we get the answers to business objectives?

Thinking back on your own experience with applications in a company, you will know that there are many databases and large quantities of data supporting the operations of the business. Companies retain several years' worth of customer data, particularly financial data. However, not all of this data is in the current operational databases. Operational databases hold the "now" data.
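To make the "now" data point concrete, along with the CREDIBLE requirement above, here is a minimal Python sketch. The record layout, the customer ID `C1` and the helper names `load_snapshot` and `status_as_of` are hypothetical. The `operational` dictionary is updated in place, as an OLTP system would do, while the `warehouse` list receives a dated snapshot on each periodic load, so a question about an earlier date keeps returning the same answer.

```python
from datetime import date

# Operational (OLTP) style: one current row per customer, updated in place.
operational = {"C1": {"status": "Single"}}

# Warehouse style: each periodic load appends dated rows; nothing is overwritten.
warehouse = []


def load_snapshot(as_of, source):
    """Copy every current operational row into the warehouse, stamped with as_of."""
    for cust_id, row in source.items():
        warehouse.append({"as_of": as_of, "customer": cust_id, **row})


def status_as_of(cust_id, when):
    """Status from the latest snapshot on or before `when` (None if none exists)."""
    rows = [r for r in warehouse if r["customer"] == cust_id and r["as_of"] <= when]
    return max(rows, key=lambda r: r["as_of"])["status"] if rows else None


load_snapshot(date(2015, 1, 31), operational)
operational["C1"]["status"] = "Married"   # the OLTP system simply overwrites
load_snapshot(date(2016, 1, 31), operational)

print(operational["C1"]["status"])             # Married (the old value is gone)
print(status_as_of("C1", date(2015, 6, 30)))   # Single
print(status_as_of("C1", date(2016, 6, 30)))   # Married
```

A real warehouse handles this with more care (the slowly changing dimensions mentioned above), but the contrast is the same: the operational system only knows "now", while the warehouse keeps history.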
Anything other than "now" data, in other words historical data, has been archived. Operational systems have also evolved over time on many different platforms, particularly in companies that have merged or that have a wide range of businesses under one umbrella company. There are also still a lot of legacy systems (not only from the 1960s, but systems 5, 10 and 20 years old that work perfectly well and would cost a lot to change). These day-to-day operational systems were certainly not optimized for obtaining strategic information. Their job is to run the daily operation of the business efficiently, so they were optimized for cost effectiveness and efficiency. (One example is the normalization of data into many tables.)

Looking at the above, we can see two problems:
1. Organizations have lots of data.
2. The day-to-day operational systems are effective for their purpose, but not for producing strategic information.

Information we have … by the ton

We already know that data in general keeps growing exponentially; just think of Google and Facebook. The same growth of data occurs in business each year. Historically, the kinds of information management needed took too long to obtain, and the longer information takes to get, the less useful it becomes. That slowness kept management from acting as quickly as it could.

SENECA EXAMPLE of the amount of data

Seneca has been operating since 1967. In 2010 its full-time population reached 20,000 students, and it is now about 25,000. However, if you include part-time students, special interest groups such as summer sports camps, post-diploma programs and degree programs, Seneca has far more students than 25,000. What does all that mean from the data point of view? Seneca has tons of data, and a lot of it is archived for occasional to rare retrieval. The archived data is often kept on a different platform.
There is no lack of data available. It just isn't easy to access and manipulate when looking at five-year periods to analyze trends.

Failures of past Decision Support Systems: another example (source unknown)

Decision support data was produced in the past, based on established requirements. Here is a scenario.

SCENARIO
The VP has noticed that the government KPI results show a large jump in student satisfaction for Computer Studies in the upper semesters compared with previous years. The VP calls the Computer Studies department and the IT department for data covering the last 2 years. She wants to compare, semester by semester, such things as enrolment trends, pass/fail rates, the number of times a subject has been taught before by the same faculty member, and job market or placement statistics.

PROBLEM
There is no such data at present, in a format the VP can understand (report, spreadsheet), that compares these factors over different time periods. The data exists, but it is spread across various systems, and there is no existing program to retrieve and present the requested data.

Suppose you were assigned the task of gathering that data from multiple applications on multiple systems from scratch. Is this going to be easy? Maybe. How long will it take? Will it affect the other work you are doing? And AFTER you have done it? Now the VP likes the information and asks for more, in a different format. You have to start again. These ad hoc reports are a pain. They require:
- Extracting data
- Addressing cleaning and compatibility issues
- Getting data for the same time units so it can be compared
- Large files to store the extracted and reformatted data
- Time from IT personnel, which takes them away from their other tasks

Why the failure in the past

The major reason for failure in the past was that strategic information was being provided from operational systems. We can see above what the problems were.

NOTE: none of the above is intended to imply that operational systems are not good.
They were designed for a different purpose: to keep the day-to-day operations going. Everything has been optimized for that purpose. Without them there wouldn't be a business.

What do operational systems do best?
- Take an order from a customer
- Process the order through the warehouse
- Make a shipment to the customer
- Generate an invoice
- Receive payment for the shipment
- Process the payment through the banking system
- Keep all financial records up to date
- Pay bills for operating costs and for products
- Pay employees
- … etc.

In a college, they would:
- Process an application for admission
- Invoice the student
- Accept payment and load registration data
- Produce timetables
- … etc.

How are operational systems different from informational systems?

OPERATIONAL vs INFORMATIONAL

Data content
- Operational: current values (6 pairs of blue socks)
- Informational: archived, derived (calculated, massaged), summarized
Data structure
- Operational: optimized for transaction processing
- Informational: optimized to handle complex queries
Access frequency
- Operational: high (such as every food item swiped in a grocery store)
- Informational: medium to low
Access type
- Operational: read, update, delete
- Informational: read (the data has been gathered and formatted for extracting trends or other information reporting)
Usage
- Operational: predictable, repetitive (orders are processed in a predictable fashion)
- Informational: ad hoc, random (as in the previous example, questions arise that need answering)
Response time
- Operational: sub-second (you don't want slow response when processing groceries)
- Informational: several seconds to minutes
Users
- Operational: large number
- Informational: relatively small number

Summary

OLTP systems typically:
- Support large numbers of concurrent users who are actively adding and modifying data.
- Represent the constantly changing state of an organization but don't save its history.
- Contain large amounts of data, including extensive data used to verify transactions.
- Have complex structures.
- Are tuned to be responsive to transaction activity.
OLTP systems provide the technology infrastructure to support the day-to-day operations of an organization.

Difficulties often encountered when OLTP databases are used for online analysis include the following:
- Analysts do not have the technical expertise required to create ad hoc queries against the complex data structure. (For example, business analysts do not usually write SQL.)
- Analytical queries that summarize large volumes of data adversely affect the system's ability to respond to online transactions. (Processing billions of rows of data slows down the system.)
- System performance when responding to complex analysis queries can be slow or unpredictable, providing inadequate support to online analytical users.
- Constantly changing data interferes with the consistency of analytical information.
- Security becomes more complicated when online analysis is combined with online transaction processing.

WHAT IS THE SOLUTION?

What is needed is a separate system, apart from the operational systems, that can provide business intelligence. This leads to data warehousing, which provides one of the keys to solving these problems by organizing data differently for the purposes of analysis.

Data warehouses: what does a DW do?
- Combines data from heterogeneous data sources into a single homogeneous structure.
- Organizes data in simplified structures designed for efficient analytical queries rather than for transaction processing.
- Contains transformed data that is accurate, consistent, grouped, and formatted for analysis.
- Provides stable data that represents business history.
- Is updated periodically (based on time periods) with additional data, rather than with frequent transactions.
- Simplifies security requirements.
- Provides a database organized for OLAP (Online Analytical Processing) rather than OLTP.

The concept behind a data warehouse is not to provide new or fresh data; there is enough data already.
It is to make use of that huge amount of data and transform it into a more usable form that meets management's need for strategic information. Note that operational systems are organized around applications, whereas the data warehouse is organized by business subjects. Business subjects (Sales, Products, Customers, Policy) are what management understands.

VISUAL of the PROCESS
[Diagram: several OPERATIONAL SYSTEMS → DATA EXTRACTION → DATA TRANSFORMATION (data staging area for extraction, cleansing, aggregating and loading to the DW) → DATA WAREHOUSE. The process is known as ETL: Extraction, Transformation, Loading.]

The next short file is 01-1 WHAT IS A DATA WAREHOUSE.
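The ETL flow in the process visual can be sketched in a few lines of Python using the standard-library `sqlite3` module with an in-memory database. Everything here is hypothetical: the two source extracts, the table names `staging` and `dw_status_by_year`, and the code mapping. The sketch extracts rows from two systems, cleanses them so that M, S and W become Married, Single and Widowed (the DATA INTEGRITY example), aggregates them in a staging table, and loads the summary into a warehouse table. A real warehouse load is far larger and runs on a schedule; this only shows the shape of the steps.

```python
import sqlite3

# Extract: hypothetical rows from two operational systems (employee id, marital status, year).
payroll_rows = [("E1", "M", 2015), ("E2", "S", 2015)]
benefits_rows = [("E3", "Married", 2015), ("E4", "Widowed", 2016)]

# Transform/cleanse: one agreed representation for values with the same meaning.
STATUS_CODES = {"M": "Married", "S": "Single", "W": "Widowed"}


def cleanse(rows):
    """Replace single-letter codes with full words; full words pass through."""
    for emp_id, status, year in rows:
        yield emp_id, STATUS_CODES.get(status, status), year


def run_etl(conn):
    # Staging area: cleansed rows from every source land here first.
    conn.execute("CREATE TABLE staging (emp_id TEXT, status TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?, ?)",
        list(cleanse(payroll_rows)) + list(cleanse(benefits_rows)),
    )
    # Aggregate in staging, then load the summary into the warehouse table.
    conn.execute(
        "CREATE TABLE dw_status_by_year (year INTEGER, status TEXT, headcount INTEGER)"
    )
    conn.execute(
        "INSERT INTO dw_status_by_year "
        "SELECT year, status, COUNT(*) FROM staging GROUP BY year, status"
    )
    conn.execute("DROP TABLE staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn)
for row in conn.execute("SELECT * FROM dw_status_by_year ORDER BY year, status"):
    print(row)
```

Running it prints (2015, 'Married', 2), (2015, 'Single', 1) and (2016, 'Widowed', 1). Without the cleansing step, "M" and "Married" would have been counted as two different statuses.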