UNIVERSITY OF HOUSTON-CLEAR LAKE
Ford Motor Company
A Case Study on Building a Data Warehouse
Prepared by:
BalaNavya Karuturi
Vineela Tatineni
Jeff Rix
Prepared for:
Dr. Rob
ISAM 5332
12/4/2011
Abstract
This paper discusses the process of building a
data warehouse for Ford Motor Company, a
global automotive company which is the second
largest automaker in the U.S.
Introduction
Ford operates in every region and state of the
United States, where it manufactures and sells
a wide range of vehicles. The company’s sales
and production volumes have increased
significantly over the past 10 years. In recent
years, however, sales have been inconsistent
across regions. Ford plans to reduce overhead
costs by identifying the dealerships that are
underperforming. It wants to classify them by
region, so as not to eliminate or adversely
affect the customer base of an entire
geographical region. Ford currently maintains
multiple transactional systems at its
dealerships. These systems run the
transactional and maintenance operations of
the business, such as vehicle sales, service,
and accessory/parts sales for the various
vehicle types. They do not, however, provide
the information Ford needs to answer certain
business questions, such as which dealership in
a particular state has the lowest sales, so that
Ford can reduce storage space at that
dealership and in turn reduce its overhead
cost. Many more such strategic questions must
be answered before Ford’s management can make
decisions about improving the business over
time. For this purpose, Ford needs a data
warehouse: a store that holds information in a
format suited to answering these strategic
questions. The purpose of a data warehouse is to
collect desired data from various transactional
systems, convert it into a common format,
cleanse it, and allow it to be queried for useful
strategic information in the future. Data must
be continually collected, transformed, and
cleansed from each of the different
transactional systems in order to keep the
warehouse properly maintained.
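As a rough illustration of the collect, convert, and cleanse steps described above, the following Python sketch merges records from two hypothetical dealership systems into one common format. The record layouts, field names, and sample values are assumptions made for this example and are not taken from Ford's actual systems. Treating exact duplicates as errors is also an assumption.

```python
from datetime import datetime

# Hypothetical extracts from two dealership systems with different layouts.
SYSTEM_A = [
    {"dealer": "D001", "state": "tx", "sold_on": "2011-03-15", "amount": "25000.00"},
    {"dealer": "D001", "state": "tx", "sold_on": "2011-03-15", "amount": "25000.00"},  # duplicate row
]
SYSTEM_B = [
    {"DealerID": "D002", "ST": "TX", "SaleDate": "04/02/2011", "Price": 31500},
    {"DealerID": "D003", "ST": "OK", "SaleDate": "", "Price": 18000},  # missing date
]


def transform_a(row):
    """Convert a system A row into the common warehouse format."""
    return {
        "dealership_id": row["dealer"],
        "state": row["state"].upper(),
        "sale_date": datetime.strptime(row["sold_on"], "%Y-%m-%d").date(),
        "amount": float(row["amount"]),
    }


def transform_b(row):
    """Convert a system B row into the common warehouse format."""
    return {
        "dealership_id": row["DealerID"],
        "state": row["ST"].upper(),
        "sale_date": datetime.strptime(row["SaleDate"], "%m/%d/%Y").date(),
        "amount": float(row["Price"]),
    }


def extract_transform_cleanse(sources):
    """Collect rows from every source, dropping unusable rows and exact duplicates."""
    seen = set()
    clean = []
    for rows, transform in sources:
        for row in rows:
            try:
                record = transform(row)
            except (KeyError, ValueError):
                continue  # cleanse: skip rows that cannot be converted
            key = tuple(record.values())
            if key in seen:
                continue  # cleanse: skip exact duplicates
            seen.add(key)
            clean.append(record)
    return clean


warehouse_rows = extract_transform_cleanse(
    [(SYSTEM_A, transform_a), (SYSTEM_B, transform_b)]
)
for record in warehouse_rows:
    print(record)
```

In a real warehouse this step would run on a schedule against each transactional system, which is what keeping the warehouse "properly maintained" amounts to in practice.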
Case Study: Ford Motor Company
1. Business Scenario
Ford operates in every region and state of the
United States, where it manufactures and sells
a wide range of vehicles. The company’s sales
and production volumes have increased
significantly over the past 10 years, and Ford
is growing rapidly in both volume and number of
dealerships. An enterprise as large as Ford
poses a serious problem for its owners and
management: given the number of locations and
dealerships, and the rate at which new
dealerships are allocated and service and sales
centers are expanded, it is difficult to visit
each location often enough to monitor
dealership sales, employee performance, and
branch revenue, and to ensure that customers
receive the best after-sales service. It is
thus imperative for an enterprise of this
magnitude to find a way to extract the data
that management needs from each of these
systems, so that management can maintain
greater control over the business. We will
define sales of new vehicles by dealership,
region, and state for each product and vehicle
type, for each month and quarter in one year.
2. Why a Data Warehouse?
Being an automotive company, Ford depends on
inventory, sales, and location, and this case
study helps management keep track of sales at
different locations. The data should help in
decisions regarding inventory management,
process management, and cutting down on
less efficient dealerships. This study also
helps management concentrate on the dealerships
with good success rates. The usual initial
purpose of a data warehouse is simply to give
end users more flexible access to data they
already have. The data warehouse grows from
there. Eventually it will fulfill upper
management’s needs in data mining: finding
profitable customer demographics that may have
been overlooked, finding vehicle models that
have not yet been tried in a particular
geographical location, and so on.
Some points to look at to make key decisions:

• What dealership has the lowest sales in
Texas? We could determine which
dealerships to close first to reduce
overhead.
• What vehicle types sell the most in each
region? We would know what types of
vehicles to provide in each region.
• What vehicle name sells the most in
each state? We would know to order
more of that product in that state.
• What quarter has the highest sales? We
would know when to order more
vehicles for the dealerships.
• What region sells the most in the
busiest quarter? We could use this data
to find out where we would like to
create new dealerships.
• What vehicle type is selling the least in
each state? We would know to order
less of that product in that state.
• What month is least successful by city? We
could enhance marketing in that month.
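Each of the questions above reduces to grouping sales by one or more attributes and comparing the totals. The following Python sketch shows this for two of them. The dealership names and unit counts are invented for illustration, and the helper functions are hypothetical rather than part of any tool used in this study.

```python
from collections import defaultdict

# Hypothetical sales records: (dealership, state, quarter, units sold).
SALES = [
    ("Houston North", "TX", "Q1", 120),
    ("Houston North", "TX", "Q4", 150),
    ("Dallas East", "TX", "Q1", 80),
    ("Dallas East", "TX", "Q4", 95),
    ("Austin Central", "TX", "Q1", 60),
    ("Austin Central", "TX", "Q4", 70),
    ("Tulsa West", "OK", "Q1", 40),
    ("Tulsa West", "OK", "Q4", 55),
]


def total_by(records, key_index, state=None):
    """Sum units sold, grouped by the chosen column, optionally for one state."""
    totals = defaultdict(int)
    for record in records:
        if state is not None and record[1] != state:
            continue
        totals[record[key_index]] += record[3]
    return dict(totals)


def lowest_selling_dealership(records, state):
    """Return the dealership with the lowest total sales in a state."""
    totals = total_by(records, key_index=0, state=state)
    return min(totals, key=totals.get)


def busiest_quarter(records):
    """Return the quarter with the highest total sales."""
    totals = total_by(records, key_index=2)
    return max(totals, key=totals.get)


print(lowest_selling_dealership(SALES, "TX"))
print(busiest_quarter(SALES))
```

The remaining questions follow the same pattern with a different grouping column, which is why a single warehouse design can serve all of them.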
Management needs a system in which data is
stored with time as an additional dimension,
and in which queries against this data produce
reports that provide the strategic information
needed to answer questions like those listed
above. These can include reports such as
revenue for each dealer by state, revenue by
vehicle category, customers who have bought
high-end models, and the salespeople who
generate the most sales at each dealership. The
ability to query this kind of system online
from the office is the ideal solution to
management’s problem.
3. Methodology
Aspects of building a data warehouse that make
it advantageous:
• With a data warehouse, queries can be run
without putting load on the production
databases.
• Data on different vehicles is extracted by
region from external websites, then
transformed and sorted accordingly.
• The data is then loaded into a data source
and connected to Analysis Services.
• Multiple data sources can be mapped to the
data warehouse, which eliminates
single-source query issues.
• Multiple levels of filtering can be avoided
by using a data warehouse, which is a more
streamlined approach.
• The database does not depend on the
operational infrastructure, which reduces the
risk of downtime from a "single point of
failure". This also decreases the chances
of slowing down the IT infrastructure
due to higher loads on the system.
• From a business perspective, the
operational and customer relationship
applications are easier to use.
4. Dimensional Modeling and defining data
structure
Data Management - Dimensional Modeling
Definition
Dimensional modeling is the design concept
used by many data warehouse designers to
build their data warehouse. The dimensional
model is the underlying data model used by many
of the commercial OLAP products available
today. In this model, all data is contained in
two types of tables, called the Fact Table and
the Dimension Table.
Dimensional Modeling - Dimension Table
In a dimensional model, the context of the
measurements is represented in dimension
tables. You can also think of the context of a
measurement as its characteristics: the who,
what, where, when, and how of a measurement
(subject). In the Sales business process, the
characteristics of the 'monthly sales number'
measurement can be a Location (Where), Time
(When), and Product Sold (What). The dimension
attributes are the various columns in a
dimension table. In the Location dimension, the
attributes can be Location Code, State,
Country, and Zip code. Generally, dimension
attributes are used in report labels and in
query constraints such as where Country='USA'.
The dimension attributes also contain one or
more hierarchical relationships. Before
designing your data warehouse, you need to
decide what it will contain. If, say, you want to
build a data warehouse containing monthly sales
numbers across multiple store locations, across
time, and across products, then your dimensions
are:
• Location
• Time
• Product
Query:
Dimensional Modeling - Fact Table
In a dimensional model, the fact table contains
the measurements, metrics, or facts of business
processes. If your business process is Sales,
then a measurement of this business process
such as "monthly sales number" is captured in
the fact table. In addition to the measurements,
the only other things a fact table contains are
foreign keys for the dimension tables.
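To make the relationship between the two table types concrete, here is a minimal sketch using Python's built-in sqlite3 module. It uses SQLite only for convenience, not the SQL Server setup used later in this paper. The dimension attributes follow the Location, Time, and Product examples in the text; the table names, key columns, and all sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        location_code TEXT, state TEXT, country TEXT, zip_code TEXT
    );
    CREATE TABLE time_period (
        time_id INTEGER PRIMARY KEY,
        month INTEGER, quarter INTEGER, year INTEGER
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        vehicle_name TEXT, vehicle_type TEXT
    );
    -- The fact table holds only foreign keys and the measurement.
    CREATE TABLE sales_fact (
        location_id INTEGER REFERENCES location(location_id),
        time_id INTEGER REFERENCES time_period(time_id),
        product_id INTEGER REFERENCES product(product_id),
        monthly_sales INTEGER
    );
    """
)
# Illustrative rows only; none of these figures come from real sales data.
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
    (1, "HOU01", "TX", "USA", "77058"),
    (2, "TOR01", "ON", "Canada", "M5V"),
])
conn.executemany("INSERT INTO time_period VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 2011), (2, 2, 1, 2011),
])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "F-150", "Truck"), (2, "Focus", "Car"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 40), (1, 2, 1, 35), (1, 1, 2, 20), (2, 1, 2, 15),
])

# Dimension attributes supply both the report labels and the constraint.
rows = conn.execute(
    """
    SELECT l.state, p.vehicle_type, SUM(f.monthly_sales)
    FROM sales_fact AS f
    JOIN location AS l ON l.location_id = f.location_id
    JOIN product AS p ON p.product_id = f.product_id
    WHERE l.country = 'USA'
    GROUP BY l.state, p.vehicle_type
    ORDER BY p.vehicle_type
    """
).fetchall()
print(rows)
```

The query touches the fact table only for the number being summed. Everything a reader of the report sees as a label comes from the dimension tables.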
Each dimension table contains data for one
dimension. In the above example, you gather all
of your vehicle location information and put it
into one single table called Location. Your store
location data may span multiple tables in your
OLTP system (unlike OLAP), but you need to
de-normalize all that data into one single table.
Dimensional attributes:
Within each of the dimensions there are
dimensional hierarchies. These are the various
levels of detail contained within a business
dimension. Managers can use the dimensional
hierarchies as the paths for drilling down or
rolling up in analysis. Dimensional modeling is
the technique that is used in designing a data
warehouse. Many software vendors have expanded
their modeling CASE tools to include dimensional
modeling. Modern software is very useful when
designing fact tables and dimension tables and
establishing the relationships between them.
When you have finished modeling the
dimensions and establishing the relationships,
you end up with a database schema. The two
types of schemas that are generally used in a
data warehouse are the STAR schema and the
snowflake schema. The STAR schema is a simple
database schema for data design using a
dimensional model. This schema consists of a
fact table in the center that is directly related to
the dimension tables that surround it. Although
the STAR schema is a relational model, it is not
a normalized model. The snowflake method
normalizes the dimension tables in a STAR
schema.
As mentioned earlier, it is very
important to employ the correct approach
when designing the schema for a data
warehouse. This is where we made our first
mistake. As a result of this, we have two
different database schemas. The first schema
that we designed was a complex snowflake
schema which is depicted in figure 1.1.
Figure 1.1
When the dimensions in a STAR schema are
completely normalized, the resulting structure
resembles a snowflake with the fact table in the
middle. In the case of Benchmark, the most
important fact to analyze is the sales
commission, which is the revenue for the
company. The transaction table is the fact table,
which contains commission as an attribute.
Transactions are analyzed based on dimensions
such as Customer, Employee, Investment, Time,
Date, and the Commission collected.
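The difference between a star dimension and its snowflaked form can be sketched in a few lines of Python. This example normalizes a flat Location dimension into separate location, state, and region tables and then joins them back together. The attribute names and rows are hypothetical and do not reproduce the schema in figure 1.1.

```python
# A denormalized Location dimension, as it would appear in a star schema.
STAR_LOCATION = [
    {"location_id": 1, "city": "Houston", "state": "TX", "region": "South"},
    {"location_id": 2, "city": "Dallas", "state": "TX", "region": "South"},
    {"location_id": 3, "city": "Detroit", "state": "MI", "region": "Midwest"},
]


def snowflake(dimension):
    """Split state and region attributes out into their own lookup tables."""
    regions, states, locations = {}, {}, []
    for row in dimension:
        # Assign the next id only if this region has not been seen before.
        region_id = regions.setdefault(row["region"], len(regions) + 1)
        if row["state"] not in states:
            states[row["state"]] = {
                "state_id": len(states) + 1,
                "region_id": region_id,
            }
        locations.append({
            "location_id": row["location_id"],
            "city": row["city"],
            "state_id": states[row["state"]]["state_id"],
        })
    return locations, states, regions


def rebuild_star(locations, states, regions):
    """Join the normalized tables back into the flat star-schema rows."""
    state_by_id = {v["state_id"]: (name, v["region_id"]) for name, v in states.items()}
    region_by_id = {rid: name for name, rid in regions.items()}
    flat = []
    for loc in locations:
        state_name, region_id = state_by_id[loc["state_id"]]
        flat.append({
            "location_id": loc["location_id"],
            "city": loc["city"],
            "state": state_name,
            "region": region_by_id[region_id],
        })
    return flat


locations, states, regions = snowflake(STAR_LOCATION)
print(locations)
print(states)
print(regions)
```

The snowflaked form stores each state and region once, at the cost of extra joins whenever a report needs those labels. For a reporting system that trade usually favors the flat star form.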
At first, we approached this project from the
wrong direction, and because of that we
normalized the tables in our schema. This was
done because we were thinking in terms of a
transactional system where we would need to
track changes in the price of stocks, bonds, and
other marketable securities. We stored detail
information about employees such as position,
and salary. After we realized that we weren’t
looking at the project correctly, we recognized
that this information was not necessary for the
data warehouse. Although a data warehouse is
based on a relational database like one that
would be used in an operational system, what
we were building was a decision-support
system. The important component to the entire
project is the ability to track and analyze the
commission. We do not need to worry about
the price of investments or employee salary. In
order to correct our error, we redesigned our
database schema and arrived at a Star schema
which is depicted in figure 1.2. In this schema,
we have the Transaction as the fact table with
four dimension tables: Employee, Customer,
Investment and Date_Time. We kept these
dimensions because we needed to analyze
commission by state, by employee, by
customer, by investment type, and by
investment risk class.
Figure 1.2
From our new Star schema, we were able to
define the data format that we need for the
warehouse. We defined table name, attribute
name, data type of each attribute in each table,
and relationship among tables. From
Benchmark’s operational system, we extracted
data into an Excel file that is depicted in figure
1.3. Many of the fields that were extracted are
necessary for an operational system, but were
not needed in the data warehouse. This is
where cleansing data becomes important. In
order to keep the warehouse efficient, we used
Excel to remove the extraneous data before it
was imported. After cleansing the data, we had
only the attributes that were important to the
structure of our system. An example of the
cleansed data is found in figure 1.4. When we
were satisfied that our data was formatted and
cleansed correctly, we moved to our next step, which was
to implement our database schema in Microsoft
Access. This schema is displayed in figure 1.5.
We used Access to define our relationships and
make sure that the system functioned before
importing the database into SQL Server 2008.
Figure 1.3
Figure 1.4
Figure 1.5
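The cleansing step described in this section was done by hand in Excel. The same idea can be sketched in Python: define the target format as attribute names and data types, keep only those attributes from the operational extract, and set aside rows that do not fit. The column names and sample rows below are hypothetical and are not the contents of figures 1.3 or 1.4. Skipping incomplete rows is an assumption about how such rows should be handled.

```python
import csv
import io

# Hypothetical operational extract; column names and values are illustrative.
RAW_EXTRACT = """transaction_id,customer_name,customer_phone,state,investment_type,commission
1001,Ann Lee,555-0100,TX,Bond,125.50
1002,Raj Patel,555-0101,LA,Stock,310.00
1003,Mia Chen,555-0102,TX,Stock,
"""

# Target format for the warehouse: attribute name -> data type.
WAREHOUSE_FORMAT = {
    "transaction_id": int,
    "state": str,
    "investment_type": str,
    "commission": float,
}


def cleanse(raw_text, target_format):
    """Keep only warehouse attributes, cast them, and skip incomplete rows."""
    cleansed = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            cleansed.append(
                {name: cast(row[name]) for name, cast in target_format.items()}
            )
        except ValueError:
            continue  # a required value is missing or malformed
    return cleansed


cleansed_rows = cleanse(RAW_EXTRACT, WAREHOUSE_FORMAT)
for record in cleansed_rows:
    print(record)
```

Fields such as the customer phone number matter to the operational system but never reach the warehouse, which keeps the fact and dimension tables small.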
5. Implementation in SQL Server 2008
In order to browse the data that is contained
within the data warehouse, you must design a
data structure called a cube. Constructing the
cube is done in SQL Server Analysis Services. A
cube is comprised of the fact table and all of the
data that is directly related to it. The cube
organizes the data into a format that can be
easily queried, rolled up, drilled down, and
sliced and diced based on the measures and
hierarchies that are applicable to your particular
data set. Importing a database from Access to
SQL Server is supposed to be an easy process,
but trust us, it is anything but. Trying to figure
out how to get the program to accept your data
and process it turned out to be one of the
biggest challenges of the whole project.
According to Scott Cameron’s Microsoft SQL
Server 2008 Analysis Services Step by Step, if
you have an existing relational database such as
Access, Teradata, Oracle, or IBM DB2, as well as
some others, you should be able to select the
appropriate driver and connect to your data
source without any difficulties (Cameron 39). If
only this were true. Because we did not have
sufficient security clearance to upload our
database onto the University of Houston-Clear
Lake (UHCL) server, we decided to use a personal
laptop to run the software. Operating system
compatibility was one of the first problems that
we encountered. The solution to this problem
was to download the applicable service pack
from Microsoft Update. Once the software was
installed we attempted to import the database
from Access. The next attempt to import the
database resulted in being able to import the
data, but this time we could not build or deploy
our cube. Do not get frustrated when you
encounter this problem. We have chosen not to
outline the steps that we took to get the
program to function correctly as they will be
different for each application. Once we had the
database imported and functioning properly,
we commenced building our cube. The ability to
roll up and drill down into your data is based on the
hierarchies that are defined within your
dimensions. This very important step is
depicted in figure 1.6.
Figure 1.6
Without defining the hierarchies in each of your
dimensions, you will not have access to all of
the data. When a manager is looking for
information, he may want a very high level of
granularity, or a very low level of granularity.
These types of details are very important when
deciding how to define the dimensions that are
contained within your data warehouse. When
all of the hierarchies are defined, you must set
up the relationships that are contained within
the dimensions. An example of these
relationships can be seen in figure 1.7.
Figure 1.7
When all of the hierarchies and relationships
are set up, the cube can be launched. A fully
implemented cube will look something like
figure 1.8.
Figure 1.8
6. Browsing the Cube
Keeping in mind that the ultimate goal of the
data warehouse is to provide strategic
information to managers and business owners,
it is now time to browse the cube that you have
created. This is the process where you are
actually designing the queries that will provide
the reports the end user is looking for. The
Benchmark project is concerned primarily with
commission that is collected from each
transaction. In order to get a picture of the
business as a whole it is more reasonable to
query the data for commission from a particular
region, or in our case, by each state. In figure
1.9, we have shown commission by state as it is
presented in the cube browser.
Figure 1.9
This is a very high-level view of the data. If
you were to add all of the hierarchies that are
available to this query, you could drill the data
down to provide commission for each employee
in each zip code, as it relates to each type and
name of investment. This is shown in figure
1.10.
Figure 1.10
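The roll-up and drill-down behavior described in this section can be imitated outside Analysis Services with a small Python sketch. It aggregates the same fact rows at different levels of a state, zip code, and employee hierarchy. The rows, names, and helper functions are invented for illustration, and this is not how the cube browser is implemented.

```python
from collections import defaultdict

# Hypothetical fact rows: (state, zip_code, employee, commission).
FACTS = [
    ("TX", "77058", "Adams", 500.0),
    ("TX", "77058", "Baker", 250.0),
    ("TX", "75201", "Adams", 300.0),
    ("LA", "70112", "Clark", 400.0),
]
COLUMNS = ("state", "zip_code", "employee")


def aggregate(facts, levels):
    """Sum commission grouped by the given hierarchy levels."""
    indexes = [COLUMNS.index(level) for level in levels]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[3]
    return dict(totals)


def slice_facts(facts, level, value):
    """Keep only the rows where one dimension level has a fixed value."""
    index = COLUMNS.index(level)
    return [row for row in facts if row[index] == value]


# Roll up: one total per state, as in the commission-by-state view.
rolled_up = aggregate(FACTS, ["state"])
# Drill down: one total per employee within each zip code and state.
drilled_down = aggregate(FACTS, ["state", "zip_code", "employee"])
# Slice: fix the state to TX, then total by employee.
texas_only = aggregate(slice_facts(FACTS, "state", "TX"), ["employee"])

print(rolled_up)
print(drilled_down)
print(texas_only)
```

Adding a level to the grouping key is the drill-down, and removing one is the roll-up. This is why the hierarchies defined in each dimension determine how far a manager can go in either direction.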
7. Generating useful reports
Being able to browse the cube and design
queries is a very powerful and useful tool.
Unfortunately, to the end user of the system,
some of these queries are almost unreadable
within the cube browser. Remember that the
final result of this project is to provide strategic
information that will be useful to management
in making decisions that will affect the future
health of the organization. These reports are
not going to be provided to a member of the IT
staff who would be comfortable viewing the
format in the browser. Management will want a
report that can be read and interpreted easily.
Providing these kinds of reports is easily done
once you have a functional cube. The cube that
was initially created within Analysis Services can
also be accessed with Reporting Services, which
is another very powerful tool that is included in
SQL Server 2008. By creating a Reporting
Services project, we were able to generate
reports from the Benchmark warehouse that will
be useful to the owner and management. The
same information that is depicted in figure 1.9
is again displayed in figure 1.11 in a much
easier to read format.
Figure 1.11
Another report that the Benchmark
management wanted was Customers with High
Risk Investments, figure 1.12, which would
allow them to find customers who have money
in an investment that is now considered to be
high risk.
Figure 1.12
Even though this particular report does not
directly track commission, it is directly related
to the amount of commission that the company
collects. The goal of Benchmark is to help grow
the retirement funds of their customers, and if
they were to ignore these risky investments,
they would lose money, ruin the reputation they
have strived to build, and drive new and
existing customers away. When there are no
customers, there is no commission to keep
track of.
Sales by State by Vehicle Name
Lowest sale by dealers in a state
Sales by Quarter
Highest Sale by vehicle type for fourth quarter
Sales by Region
Sales by Month
Conclusion
Entering into the process of constructing a data
warehouse with no prior knowledge of the
subject proved to be quite a challenge. It also
turned into an exceptional learning experience.
We learned to carefully analyze the project that
has been presented before diving into it head
first. It is essential to do this so that you can be
sure that the correct approach is being taken
with regard to the end result. Starting with the
desired result and working backwards turned
out to be the direction that we ultimately took
with this project, and is probably a viable
approach to take when designing a data
warehouse. A data warehouse is a
report-centric system, so beginning with an
understanding of the desired output will lead to
a much more efficient design plan. We believe
that we have constructed a system that
Benchmark will be able to rely on for their
reporting needs for the foreseeable future.
Works Cited
Cameron, Scott. (2009) Microsoft SQL Server 2008 Analysis Services Step by Step. Redmond, Washington:
Microsoft Press.
Ponniah, Paulraj. (2001) Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals.
New York, New York: John Wiley & Sons.