01-0 Why do we need Data Warehousing
TOPICS
Introduction to why
What is strategic information?
Types of questions management asks
Who needs this type of information?
What does strategic information mean – more?
Characteristics of strategic information
How do we get answers to business objectives?
Failure of past systems
What operational systems do best?
How are they different from informational systems?
What is the solution?
Visual of the process
Document1 by rt
6 February 2016
Page 1 of 16
This note describes a business need and explains why data warehousing is used to meet it. Later notes cover what a data warehouse is, related concepts, how to design one, and how to implement the design.
Introduction to why
Ten or more years ago, asking why we need data warehousing was an important question. Today almost any reasonably sized operation uses or benefits from data warehousing. Greater storage capacity, faster processing and better software have made data warehousing a viable option even for smaller operations. The next concept that follows is BIG DATA. Both are needed, and each has its place.
Up until now in your courses you have mostly been dealing with application software, along the database, Java and other lines of study. In the systems courses you mostly dealt with business designs and solutions. These applications are important for running the business. They process orders, keep track of the company's stock or inventory and, in environments like Seneca College, handle the registration process, tuition billing, payroll, transcripts, employee health insurance benefits and many, many other day-to-day functions. All businesses have many of these application processes in common, particularly those related to bookkeeping or accounting systems. Without these systems a modern business cannot survive.
The big growth in computerizing systems began in the 1960s. Today, even very small companies have become dependent upon computerized operations. These operational systems, many built and modified over a 50-year period, are very effective at what they were designed to do: run the day-to-day activity of the company. They collect, save and modify information and produce a myriad of reports, originally on paper but now online, that show what transactions have occurred and how the business is functioning.
During the 1970s, 1980s and 1990s, large businesses grew more complex, amalgamated with others and spread across the country and into other countries, and the larger corporations spread globally (think of Walmart, Target, Bank of America, General Motors, McDonald's, etc.). Competition increased because competitors were doing the same thing: expanding. To stay competitive, the company's decision makers, upper management, could not rely solely on traditional operational data. They needed what is called strategic data. This data has always been available, but it has been hard to extract. Because the primary objective is to run the business successfully, the systems in place were built to perform operational activities very well. Getting strategic information required IT personnel, because the systems were not optimized for the ad hoc extraction that upper management needs for strategic decision making.
In larger operations, the various levels of management have a general idea of how the business is doing and how it functions. Compare their knowledge to that of the owner of a small fruit stand, who can see exactly what the business does and should know it intimately. Larger operations have the same need: to know the business better.
Where must this "new" type of information come from? The same place it always came from: the operational data. The difference is that the need is greater and the results need to be available faster. If not, the competition will get there first.
Note again that the day-to-day operational systems are good at providing data about how the company is running at the moment. Management still wants that information, but it also wants strategic information. In the late 1990s companies began looking at data warehousing to provide this kind of data, and those that adopted it began to see competitive advantages.

What is strategic information?
Strategic information might address the question of where to build another Seneca campus or which campus location should hold a particular school. Suppose the question arises because of increased overall enrolment and changing enrolment in different schools within Seneca. For example, should the School of Information Communications and Technology continue to be located at the Seneca@York site, or should it be located at the Markham campus? The argument for Markham is that the surrounding business area is considered the "Silicon Valley" of the Toronto region. The argument for Seneca@York is that this campus was originally designed to be a technical campus and houses other technical areas. Then again, maybe it should be in both, or some parts should be located in one campus and other parts in any number of other campuses.
What kind of information would management need to determine what is best? Does knowing that Seneca has over 25,000 full-time students, or other information such as which subjects the students are enrolled in, help you decide where to locate the School or where to build a new campus?
In the case of a business operation, the question might be where to build the next warehouse in Canada. Marketing may want to know which product lines to expand and which to shrink. Which product lines are affected by political decisions, by weather, by demographics? Marketing managers can't operate without information. Imagine deciding to bring a line of detergents the company is selling into Canada. Apart from all the competitive reasons, there are also environmental concerns and laws that are not part of the day-to-day operational data.
It is strategic information that upper management looks for.
The purpose of strategic information is to gain a better understanding of how the business operates and therefore to gain competitive advantage. Of course, the desire for this information has always been there. The problem over the years has been that it is difficult to retrieve strategic information from operational systems.
ASIDE: This data is from a number of sources.
Every year there are hurricanes in the southern USA. Everyone would understand that sales will likely drop on the day a hurricane hits a specific area, as people are either evacuated or hunkering down and don't venture out as much to buy groceries. It may also seem obvious that before a storm hits an area, purchases increase for water, batteries, propane, generators and similar emergency products, such as plywood to board up windows. It makes good sense, then, for large companies to watch weather patterns and move more stock of some items to their stores that are potentially in the storm's path.
Did you know that sales of Pop-Tarts also increase? They don't require refrigeration or cooking. Examples like this are among the more unusual discoveries made by using data warehousing or data mining to uncover characteristics of how the business operates. This is the strategic information that data warehousing and data mining try to provide.
This apparently is one of the things Walmart discovered: in Florida, it is beer and Pop-Tarts that sell before hurricanes.
Types of questions management needs answers for
OR
What questions, if answered, would achieve a competitive advantage?
Example: Retail business
Here are some generalized questions.
- What information would help us do __________ (something – see below) if we owned the business?
- What information would we need to see that will provide advantage over our competitors?
- What information can we find to increase sales?
- Why is one of our stores selling so much more of product X? Can whatever that store is doing be
passed on to any of our other stores?
- If the business has this information, can it take action on it?
Examples:
Lower prices?
It is important to know what effect lower prices have on profit. Can you sell enough extra to offset the lower prices? Are lower prices necessary in all regions across Canada, or in just one region?
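The "sell enough extra" question can be made concrete with a little arithmetic. Under a simplified model assumed here for illustration (a constant cost per unit, with fixed costs unchanged by volume), the units needed at the new price to keep the same gross margin are: current units x (old price - unit cost) / (new price - unit cost). The function name `break_even_units` and all the figures below are hypothetical.

```python
def break_even_units(units_now: float, price_now: float,
                     price_new: float, unit_cost: float) -> float:
    """Units needed at price_new to earn the same gross margin as units_now
    at price_now. Simplified model: constant unit cost, fixed costs ignored."""
    margin_now = price_now - unit_cost
    margin_new = price_new - unit_cost
    if margin_new <= 0:
        raise ValueError("the new price does not cover the unit cost")
    return units_now * margin_now / margin_new


# Hypothetical figures: 1,000 units a month at $10.00, unit cost $6.00,
# price cut to $9.00.
needed = break_even_units(1000, 10.00, 9.00, 6.00)
print(f"Need about {needed:.0f} units, {needed / 1000 - 1:.0%} more")
# Need about 1333 units, 33% more
```

In this example a 10% price cut needs about a third more volume just to keep the same margin; whether a given region can deliver that extra volume is the kind of question historical sales data can help answer.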
High turnover?
If we increase the rate of turnover, what advantage does this give us?
Lower stock levels?
Lower stock levels mean less money tied up in inventory. This may mean that the company has to borrow less and therefore pays less interest, which means lower costs and presumably more profit. On the other hand, lower inventory carries the risk of running out of items, particularly a hot seasonal item. Running out of inventory means less business, fewer sales and therefore less profit. A balance needs to be found; the question is what that balance is.
Noticing trends?
It is important to know where things sell, and at what price, in different regions. What has been the history of a product in those regions? Are there ethnic groups in one area that can be a source of more business? Is there a growing ethnic community that has reached a critical mass where it is time to meet its needs?
EXAMPLES
- V8 example
- Perogies
- Walmart example
Think about what would be required for other industries, such as social services, financial industries like banking, transportation (trucking) or manufacturing. Ask yourself what questions would need to be answered to make good decisions about running these different types of business entities.
EXAMPLES of how organizations can achieve the advantages they are looking for:
FINANCIAL INDUSTRY
- More service: what is the effect on profit, customer satisfaction and market penetration?
- More services, such as RRSPs and investment advice, at the local level
- Ability to contact customers with promotional material geared specifically to their demographic
- Faster services
- Getting customers to pay for services
- Meeting local needs
- Giving free service, as with President's Choice
AIRLINE INDUSTRY
- More passengers per flight: how to increase passenger load (fixed costs are the same)
  - What is the trend in passenger loads by route?
  - Are there specific time periods (seasonal, Mondays, etc.)?
  - Purpose: to have the right flight at the right time, and no more
- Getting the right balance of first, business and economy class
  - What balance, based on sales, would generate the right mix?
  - Do empty first-class seats and a full economy cabin mean lost business?
  - Is there a flexible, low-cost way of changing the size of each seat group?
- Knowing who travels when
MANUFACTURING
- Cost reductions
- Just-in-time supplies
- Quality production
- Fewer defects
Who needs this type of information?
Who needs this type of information? Again, it is normally the people in the upper levels of the company: the board of directors, executives, managers at all decision-making levels and marketers. These are the decision makers in the company. This information isn't for the person who loads and unloads products at the back of the warehouse or for the cashier in a grocery store. In the case of Seneca, this isn't the type of information needed by faculty and office staff. Quite often, the managers of these people need operational information to help them manage.
Who needs strategic information? The decision makers:
- Executives
- Managers at all decision-making levels
- Marketers
The decision makers tend to be focused on longer-term strategic decisions. The information they need tends to come from analysis of trends, and trends occur over time. Management needs an in-depth knowledge of what affects the operation of the business and how these key business factors affect one another. What changes over time, and how does that compare with similar companies?
The big focus for executives is attention to the customer's needs with regard to products or services.
NOTE: strategic information does not run the day-to-day operations of the business.
What does strategic information mean?
Information used:
- To create business strategies
  - Will we need more advertising in the Niagara Peninsula?
- To establish goals
  - Currently averaging 250 units per month
  - Will need 350 units per month to be sold this year and 400 the next year
- To monitor results
Information that drives the strategies employed in a business operation is strategic information.
NOTE:
The information isn't just:
- What did we sell today?
or
- How many frozen pizzas do we buy for the weekend?
NOT WHAT WE SOLD TODAY
BUT …
MORE ABOUT
- WHAT WILL WE PROMOTE
- WHAT WILL WE SELL
- WHAT DID WE NOTICE THAT WE SHOULD DO DIFFERENTLY
This leads to an increased need for strategic information.
Characteristics of Strategic Information
WHOLE ENTERPRISE VIEW
The data must have a whole enterprise view.
To make good decisions you need a view of all the information. Sales information can be in 10 different places. The information needed for strategic planning often blends a variety of areas together. (Aside: This is the intent, but the data warehouse may be built in stages, incorporating one major area at a time and then melding them together when fully implemented.)
DATA INTEGRITY
The information being presented must be accurate.
Just as accuracy must apply to the day-to-day operational systems (OLTP: Online Transaction Processing systems), the data warehousing system must maintain data integrity, because major decisions are being made based on this information. Also, since data comes from different systems, one system may use the values M, S or W for a person's marital status while another system within the organization uses the full words Married, Single or Widowed. A decision needs to be made about how to represent data that has the same meaning, so that the data in the data warehouse has a consistent look.
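One way to picture that decision: agree on a single set of values (here, assumed to be the full words) and translate every source into it while loading the warehouse. This is a minimal illustration, not a prescribed method; the code table and the function name `standardize_marital_status` are hypothetical, and mapping unrecognized values to "Unknown" is an assumed design choice.

```python
# Hypothetical code table: single-letter codes used by one source system,
# mapped to the full words used by another. These are assumed values.
STATUS_CODES = {"M": "Married", "S": "Single", "W": "Widowed"}

# Reverse lookup so full words in any letter case map to one spelling.
FULL_WORDS = {word.lower(): word for word in STATUS_CODES.values()}


def standardize_marital_status(value: str) -> str:
    """Return the one agreed-upon warehouse value for a source value."""
    cleaned = value.strip()
    if cleaned.upper() in STATUS_CODES:      # e.g. "m" or "M"
        return STATUS_CODES[cleaned.upper()]
    if cleaned.lower() in FULL_WORDS:        # e.g. "married" or "Married"
        return FULL_WORDS[cleaned.lower()]
    return "Unknown"                         # flag it; don't guess


system_a = ["M", "S", "W"]
system_b = ["Married", "single ", "Widowed"]
print([standardize_marital_status(v) for v in system_a + system_b])
# ['Married', 'Single', 'Widowed', 'Married', 'Single', 'Widowed']
```

Flagging unrecognized values instead of guessing keeps a bad source value visible, so it can be corrected rather than quietly distorting a report.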
ACCESSIBLE
OLTP systems are designed to optimize transaction processing.
For management, the diagrams used to describe the system and the SQL needed to access the data are not easily understood. The information decision makers need must be easy to obtain, flexible enough to answer questions that arise from the data, and intuitive to management. By intuitive we mean that getting the information does not require a programmer: business people can access the available information themselves. To be responsive, the data must be in a format that allows analysis.
CREDIBLE
Every displayed fact must have one value.
The value doesn't change over time: if a similar report is requested next year, it shows the same value. (This major topic is covered later under slowly changing dimensions.)
TIMELY
Information must be available within a short time frame.
Information too late is … useless
VERY IMPORTANT
How do we get the answers to business objectives
Thinking back on your own experiences dealing with applications in a company, you will know that there are many databases and large quantities of data supporting the operations of the business. Companies retain several years' worth of customer data, particularly financial data. However, not all of this data is in the current operational databases. Operational databases hold the "now" data. Anything other than "now" data, in other words historical data, has been archived. Operational systems have also evolved over time on many different platforms, particularly in companies that have merged or that have a wide range of different businesses under one umbrella company. Also, there are still many legacy systems (not just from the 1960s, but systems 5, 10 and 20 years old that work perfectly well and would cost a lot to change).
These day-to-day operational systems certainly were not optimized for obtaining strategic information. Their job is to run the daily operation of the business efficiently, so they were optimized for cost effectiveness or efficiency. (One example is the normalization of data into many tables.)
Looking at the above information, we can see that there might be two problems:
1. Organizations have lots of data.
2. The day-to-day operational systems are effective for their purpose, but they do not provide data in a format suited to strategic information.
Information we have … by the ton
We already know that data in general keeps growing exponentially; just think of Google, Facebook, etc. The same growth of data occurs in business each year. Historically, the kinds of information management needed took too long to obtain. The longer it takes to get, the less effective the information becomes, and that slowness kept management from acting as quickly as it could.
SENECA EXAMPLE of the amount of data
Seneca has been operating since 1967. In 2010 Seneca's full-time population reached 20,000 students, and it is now about 25,000. However, if you include part-time students, special interest groups such as summer sports camps, post-diploma programs and degree programs, Seneca has far more than 25,000 students.
What does that all mean from the data point of view? Seneca has tons of data, and a lot of it is archived for occasional to rare retrieval. The archived data is often kept on a different platform.
There is no lack of data available. It just isn't easy to access and manipulate when looking at five-year periods to analyze trends.
Failures of past Decision Support Systems: another example (source unknown)
Decision support data was produced in the past, based on established requirements.
Here is a scenario.
SCENARIO
The VP has noticed that the government KPI shows a large jump in student satisfaction for Computer Studies in the upper semesters compared with previous years. The VP calls the Computer Studies department and the IT department for data covering the last 2 years. She wants to compare, semester by semester, such things as enrolment trends, pass/fail rates, the number of times a subject has been taught before by the same faculty member, and job market or placement statistics.
PROBLEM
At present there is no data in a format the VP can understand (a report or spreadsheet) that compares these factors over different time periods. The data exists, but it is spread across various systems, and there isn't an existing program to retrieve and present the requested data.
If you were assigned the task of gathering that data from scratch, from multiple applications on multiple systems, would it be easy? Maybe. How long will it take? Will it impact any other job you are doing?
AFTER you have done it?
The VP likes the information and asks for more, in a different format.
You have to start again.
These ad hoc reports are a pain. They require:
- Extracting data
- Addressing cleaning and compatibility issues
- Aligning data to the same time units for comparison
- Large files to store the extracted and reformatted data
- Time from IT personnel, which takes them away from their other tasks
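The "same time units" step is easy to underestimate. As a minimal sketch, suppose one hypothetical system stores enrolment dates as ISO dates (YYYY-MM-DD) and another as DD/MM/YYYY, and assume terms run Winter (January to April), Summer (May to August) and Fall (September to December). Both sources must be parsed into one date type and mapped to a common semester key before counts can be compared. The function `semester_of` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, datetime

# Assumed term calendar: Winter = Jan-Apr, Summer = May-Aug, Fall = Sep-Dec.
TERMS = ("Winter", "Summer", "Fall")


def semester_of(d: date) -> tuple[int, int]:
    """Map a date to a sortable (year, term index) key."""
    return (d.year, (d.month - 1) // 4)


# Hypothetical extracts: system A uses ISO dates, system B uses DD/MM/YYYY.
system_a_enrolments = ["2015-09-14", "2015-09-15", "2016-01-11"]
system_b_enrolments = ["14/01/2016", "03/05/2016"]

dates = [datetime.strptime(s, "%Y-%m-%d").date() for s in system_a_enrolments]
dates += [datetime.strptime(s, "%d/%m/%Y").date() for s in system_b_enrolments]

counts = Counter(semester_of(d) for d in dates)
for (year, term), n in sorted(counts.items()):
    print(f"{year} {TERMS[term]}: {n}")
# 2015 Fall: 2
# 2016 Winter: 2
# 2016 Summer: 1
```

Using a (year, term index) tuple as the key, rather than a label such as "2016 Winter", keeps the semesters in chronological order when sorted.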
Why the failure in the past
The major reason for failure in the past was that strategic information was being provided from operational systems. We can see above what problems this entailed.
NOTE: none of the above is intended to imply that operational systems are not good. They were designed for a different purpose: to keep the day-to-day operations going. Everything has been optimized for that purpose. Without them there wouldn't be a business.
What do operational systems do best
Take an order from a customer
Process that order through the warehouse
Make a shipment to the customer
Generate an invoice
Receive payment for the shipment
Process the payment through the banking system
Keep all financial records up to date
Pay bills for costs of operations and for products
Pay employees
… Etc
In a college it would:
Process an application for admission
Invoice the student
Accept payment and load registration data
Produce timetables
… Etc.
How are operational systems different from informational systems?
OPERATIONAL vs INFORMATIONAL
Data content
  Operational: current values (6 pairs of blue socks)
  Informational: archived; derived (calculated, massaged); summarized
Data structure
  Operational: optimized for transaction processing
  Informational: optimized to handle complex queries
Access frequency
  Operational: high (such as every food item swiped in a grocery store)
  Informational: medium to low
Access type
  Operational: read, update, delete
  Informational: read (the data has been gathered and formatted for extracting trends or other information reporting)
Usage
  Operational: predictable, repetitive (orders are processed in a predictable fashion)
  Informational: ad hoc, random (as in the previous example, questions arise that need answering)
Response time
  Operational: sub-second (you don't want slow response when processing groceries)
  Informational: several seconds to minutes
Users
  Operational: large number
  Informational: relatively small number
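The "current values" versus "archived" contrast can be sketched in a few lines of Python. In this minimal sketch, an illustration under assumptions rather than how any particular system works, a dictionary stands in for an operational table that is overwritten by each sale, and a list of periodic snapshots stands in for the informational side. The item, quantities, dates and the names `sell` and `take_snapshot` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Operational view (stand-in for an OLTP table): one current value per item,
# overwritten by every transaction. Item and quantities are hypothetical.
stock_on_hand = {"blue socks": 6}


# Informational view: periodic snapshots are appended, never overwritten.
@dataclass(frozen=True)
class StockSnapshot:
    snapshot_date: date
    item: str
    quantity: int


history: list[StockSnapshot] = []


def sell(item: str, qty: int) -> None:
    """Operational update: the previous quantity is simply replaced."""
    stock_on_hand[item] -= qty


def take_snapshot(as_of: date) -> None:
    """Copy the current values into the history (e.g. once a week)."""
    for item, qty in stock_on_hand.items():
        history.append(StockSnapshot(as_of, item, qty))


take_snapshot(date(2016, 2, 1))
sell("blue socks", 2)
take_snapshot(date(2016, 2, 8))
sell("blue socks", 3)
take_snapshot(date(2016, 2, 15))

print(stock_on_hand)  # {'blue socks': 1}: only "now" is kept
print([(s.snapshot_date.isoformat(), s.quantity) for s in history])
# [('2016-02-01', 6), ('2016-02-08', 4), ('2016-02-15', 1)]
```

Only the history list can answer a trend question such as "how fast are blue socks selling?"; the operational dictionary can only say how many are on hand right now.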
Summary
OLTP systems typically:
- Support large numbers of concurrent users who are actively adding and modifying data.
- Represent the constantly changing state of an organization but don't save its history.
- Contain large amounts of data, including extensive data used to verify transactions.
- Have complex structures.
- Are tuned to be responsive to transaction activity.
OLTP systems provide the technology infrastructure to support the day-to-day operations of an organization.
Difficulties often encountered when OLTP databases are used for online analysis include the following:
- Analysts do not have the technical expertise required to create ad hoc queries against the complex data structure. (For example, business analysts do not write SQL.)
- Analytical queries that summarize large volumes of data adversely affect the system's ability to respond to online transactions. (Processing billions of rows of data slows down the system.)
- System performance when responding to complex analysis queries can be slow or unpredictable, providing inadequate support to online analytical users.
- Constantly changing data interferes with the consistency of analytical information.
- Security becomes more complicated when online analysis is combined with online transaction processing.
WHAT IS THE SOLUTION
What is needed is a separate system, distinct from the operational systems, that can provide business intelligence.
This, of course, leads to data warehousing, which provides one of the keys to solving these problems by organizing data differently for the purposes of analysis.
Data warehouses: what does a DW do?
Data warehouses:
- Can combine the data from heterogeneous data sources into a single homogeneous structure.
- Organize data in simplified structures for efficient analytical queries rather than for transaction processing.
- Contain transformed data that is accurate, consistent, grouped, and displayed/formatted for analysis.
- Provide stable data that represents business history.
- Are updated periodically (based on time periods) with additional data rather than with frequent transactions.
- Simplify security requirements.
- Provide a database organized for OLAP (Online Analytical Processing) rather than OLTP.
The concept behind a data warehouse is not to provide new or fresh data; there is enough data already. The aim is to make use of that huge amount of data and transform it into a more usable form that meets management's need for strategic information.
Note that operational systems are organized around applications, whereas the data warehouse is organized by business subject. Business subjects (Sales, Products, Customers, Policy) are what management understands.
VISUAL of the PROCESS
OPERATIONAL SYSTEMS (several sources)
  -> DATA EXTRACTION
  -> DATA TRANSFORMATION in a data staging area (extraction, cleansing, aggregating and loading to the DW)
  -> DATA WAREHOUSE
The process is known as ETL:
- Extraction
- Transformation
- Loading
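The three ETL stages can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not a production design or any specific ETL tool: two hypothetical source systems deliver sales of the same product with different date formats, product-code spellings and money units; the transform step cleans them into one consistent shape and aggregates by month; and an in-memory SQLite database (Python's standard `sqlite3` module) stands in for the data warehouse. All names, such as `transform`, `load` and `monthly_product_sales`, are hypothetical.

```python
import sqlite3
from collections import defaultdict
from decimal import Decimal

# EXTRACT: rows as two hypothetical operational systems might deliver them.
# Store system: (store, ISO date, product code, amount in dollars as text)
store_sales = [
    ("Markham", "2016-01-05", "p-100", "19.99"),
    ("Markham", "2016-01-20", "P-100", "5.01"),
]
# Web system: (DD/MM/YYYY date, product code, amount in cents)
web_sales = [
    ("07/01/2016", "P100", 1500),
    ("02/02/2016", "P100", 2500),
]


def transform(store_rows, web_rows):
    """Clean both sources into one shape and aggregate by month and product.

    Returns sorted (month "YYYY-MM", product code, total cents) tuples.
    """
    totals = defaultdict(int)
    for _store, iso_date, code, dollars in store_rows:
        month = iso_date[:7]                     # "2016-01-05" -> "2016-01"
        product = code.upper().replace("-", "")  # "p-100" -> "P100"
        totals[(month, product)] += int(Decimal(dollars) * 100)
    for dmy, code, cents in web_rows:
        _day, month_num, year = dmy.split("/")
        totals[(f"{year}-{month_num}", code.upper())] += cents
    return sorted((m, p, c) for (m, p), c in totals.items())


def load(conn, rows):
    """LOAD: write the cleaned, aggregated rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monthly_product_sales "
        "(month TEXT, product TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO monthly_product_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(store_sales, web_sales))
for row in conn.execute("SELECT * FROM monthly_product_sales ORDER BY month"):
    print(row)
# ('2016-01', 'P100', 4000)
# ('2016-02', 'P100', 2500)
```

Money is converted to integer cents through `Decimal` rather than floating point so that totals add up exactly. In a real warehouse the load would normally run periodically rather than once.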
The next short file is
01-1 WHAT IS A DATA WAREHOUSE