Final Report - University of Houston

ISAM 5332: Data Warehouse and Data Mining
Houston E-Retailers Final Report
HOUSTON E-RETAILERS
ISAM 5332
December 10, 2015
By
Bala Anudeep Guduri
Kavya Hegde
Divya Gangwani
Suhas Malavalli
Contents
Abstract
Business Scenario
Business Need
Why a Data Warehouse?
Dimension Model
Fact Table
Dimensional Model- Star Schema
Hierarchies and Dimensions
Database Structure
Dimensions:
Fact_Sales Table
Dimensions and Fact Table Relationship
Cube Implementation
Steps for building a cube
Data Source
Data Source View
Cube Creation
Adding Dimensions
Building and Deploying the Cube:
Browsing the Deployed Cube
Reports generated in Data Warehouse
Report 1: Yearly Sales Report based on Product Category
Report 2: Sales Report based on Gender
Report 3: Sales Report based on Gender and Customer Age
Report 4: Monthly Sales Report based on Customer Age and Gender
Report 5: Sales Report based on Product Category, Customer Age and Gender
Conclusion
References:
Abstract
This report covers the overall data warehouse process, from collecting the transactional data to
generating reports for analysis. It also describes the OLAP tools used in the process.
The report describes how the raw data collected over the years is converted into useful
information, along with Houston E-Retailers' needs and requirements for these tools.
Business Scenario
HOUSTON E-RETAILERS, INC is a startup E-Retail store based in Houston, Texas,
started in early 2000. The company carries products from well-known franchisees of
internationally recognized brands, such as Dr. Pepper under Beverages and Procter & Gamble,
Dove and Colgate under Toys and Cosmetics, among many others. It aims to deliver quality
products to customers under one roof, thereby increasing the customer base and the overall
profit of the business.
The company is mainly an E-Retail business looking to expand its customer base, grow,
and compete in the E-Retail market. The business has expanded across the country and has been
successful in the E-Retail market. One of its biggest concerns was to handle heavy demand and
deliver products to various cities in a timely manner, while also providing after-sale service
for the products customers purchased. The company also had trouble analyzing its data, because
the data was not consolidated and there were no standard tools to generate reports. Over the
years, the company's successful growth has made it possible to overcome these problems.
We have defined reports for sales by Customer Gender, by Product and Gender, by quarterly
sales for Product and Customer, and more. These reports help executives and managers
analyze the business and make strategic decisions to increase revenue in the upcoming years.
Business Need
The business needs of the company are as follows:

To build a long-term relationship with the customer.

To run a profitable operation, which typically means increasing revenue while limiting expenses.

To analyze the transactional data and drive the company towards success by making effective
strategic decisions.

To use effective tools and techniques that help analyze the data and produce reports showing
trends for past and upcoming years.
Why a Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection
of data in support of management's decision-making process. A data warehouse, also known as an
enterprise data warehouse, is a system used for reporting and data analysis. It helps Houston
E-Retailers generate reports from the data the business produces. It also stores historical
data, which supports trend reports by gender, by age and by product. This
information helps analyze current market trends and improve strategic decisions, so that the
company can increase profit by selling the products that are in high demand. A data warehouse
addresses the previously mentioned problems of unconsolidated data and missing reporting tools,
and is therefore the solution we chose to help executives make strategic decisions. The data
warehouse is used for our E-commerce website to address these specific issues, using the data
captured through the website.
Expected Value:
In this project we used various tools to study the company's database, operations and
functions, and analyzed them to help executives and managers keep track of products and sales
by various dealers at a particular time. The OLAP tools used for the analysis support
techniques such as rolling up, drilling down, and slicing and dicing through a report. The
reports cover which customers to target, by gender or age group, for a particular category or
product; which products sell the most at a particular location, and in what quantity; sales and
product reports by time (monthly, quarterly, yearly); and information based on vendors.
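The roll-up, drill-down, slice and dice techniques can be illustrated with a small Python sketch. This is not the OLAP tooling used in the project; the records, the field layout and the function names are hypothetical and exist only to show what each operation does to the data.

```python
from collections import defaultdict

# Hypothetical sales records: (year, quarter, category, gender, amount).
# The values are invented for illustration and are not taken from the report.
SALES = [
    (2006, "Q1", "Cosmetics", "F", 120.0),
    (2006, "Q1", "Toys", "M", 80.0),
    (2006, "Q2", "Cosmetics", "F", 150.0),
    (2006, "Q2", "Beverages", "M", 60.0),
    (2007, "Q1", "Cosmetics", "M", 90.0),
]


def roll_up(records, key_index):
    """Aggregate the amount (field 4) by the field at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[4]
    return dict(totals)


def slice_year(records, year):
    """Slice: fix the year dimension to a single value."""
    return [r for r in records if r[0] == year]


def dice(records, years, categories):
    """Dice: restrict several dimensions to subsets of values at once."""
    return [r for r in records if r[0] in years and r[2] in categories]


# Roll up to the year level.
sales_by_year = roll_up(SALES, 0)
# Drill down: look inside 2006 at the quarter level.
sales_2006_by_quarter = roll_up(slice_year(SALES, 2006), 1)
# Dice: only cosmetics sold in 2006.
cosmetics_2006 = dice(SALES, {2006}, {"Cosmetics"})
```

Rolling up and drilling down are the same aggregation applied at a coarser or finer level, while slicing and dicing only filter the records before they are aggregated.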
Dimension Model
Dimensional modeling is a set of techniques and concepts used in data warehouse
design. A dimensional model is based on dimensions, facts, cubes and schemas such
as star and snowflake. In developing a data warehouse, managers think of the business in terms
of business dimensions. Houston E-Retailers is an online business that focuses on delivering
quality products and providing after-sale service to its customers. Its model is a star schema
in which Product Category is kept as a separate table related to Product, as in a snowflake
schema. The business dimensions used are Customer, Vendor, Product, Product Category and
Date. Fact_Sales is the fact table, which holds the measures.
When a business dimension is extracted and represented as a database table, it is called a
dimension table. A dimension table provides the textual descriptions of a business dimension
through its attributes. The diagrams below describe all the business dimensions.
Customer table:
Vendor table:
Date table
Product table and Product Category:
Fact Table
A fact is a measurement of a business event, such as the quantity or price of a sale. A
fact table is used to store these measurements. Each row in a fact table
represents data that may relate to a particular customer, a particular product, or sales in a
particular region. In this database the fact table carries the key attributes of the dimension
tables, namely Product_ID, Brand_ID, Customer_ID and Date_ID. Each of these is a primary key in
its dimension table and is linked to the fact table accordingly. The fact table has its own
primary key, Sales_ID, and the dimension keys together identify the context of each sale in the
way a composite key would.
Fact Sales Table:
Dimensional Model- Star Schema
The star schema shown below has a dimension table positioned at each point and the fact
table positioned at the center, forming a star.
The dimensions are all connected to Fact Sales at the center, which contains the measures
such as Quantity and Price.
Product and Product Category are linked to each other to show the relationship between the two.
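As a rough illustration of this layout, the sketch below creates the tables in an in-memory SQLite database from Python. The project itself used Access and SQL Server; the key names Sales_ID, Product_ID, Brand_ID, Customer_ID, Date_ID and the measures Quantity and Price come from the report, while the table name Date_Dim, the Category_ID key, the other columns and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Non-key columns, Category_ID and the Date_Dim name are hypothetical.
conn.executescript("""
CREATE TABLE Product_Category (
    Category_ID   INTEGER PRIMARY KEY,
    Category_Name TEXT
);
CREATE TABLE Product (
    Product_ID   INTEGER PRIMARY KEY,
    Product_Name TEXT,
    Category_ID  INTEGER REFERENCES Product_Category(Category_ID)
);
CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Gender TEXT);
CREATE TABLE Vendor   (Brand_ID INTEGER PRIMARY KEY, Brand_Name TEXT);
CREATE TABLE Date_Dim (
    Date_ID INTEGER PRIMARY KEY,
    Year INTEGER, Quarter INTEGER, Month INTEGER, Day INTEGER
);
CREATE TABLE Fact_Sales (
    Sales_ID    INTEGER PRIMARY KEY,
    Product_ID  INTEGER NOT NULL REFERENCES Product(Product_ID),
    Brand_ID    INTEGER NOT NULL REFERENCES Vendor(Brand_ID),
    Customer_ID INTEGER NOT NULL REFERENCES Customer(Customer_ID),
    Date_ID     INTEGER NOT NULL REFERENCES Date_Dim(Date_ID),
    Quantity    INTEGER,
    Price       REAL
);
""")

# One made-up row per dimension, then one fact row that references them.
conn.execute("INSERT INTO Product_Category VALUES (1, 'Cosmetics')")
conn.execute("INSERT INTO Product VALUES (10, 'Soap', 1)")
conn.execute("INSERT INTO Customer VALUES (100, 'F')")
conn.execute("INSERT INTO Vendor VALUES (5, 'Dove')")
conn.execute("INSERT INTO Date_Dim VALUES (20060115, 2006, 1, 1, 15)")
conn.execute("INSERT INTO Fact_Sales VALUES (1, 10, 5, 100, 20060115, 2, 3.50)")


def insert_orphan_fact(connection):
    """Try to insert a fact row whose Customer_ID has no matching dimension row."""
    try:
        connection.execute(
            "INSERT INTO Fact_Sales VALUES (2, 10, 5, 999, 20060115, 1, 3.50)"
        )
    except sqlite3.IntegrityError:
        return False
    return True


orphan_accepted = insert_orphan_fact(conn)
fact_rows = conn.execute("SELECT COUNT(*) FROM Fact_Sales").fetchone()[0]
```

The fact table sits at the center and references every dimension, and Product in turn references Product_Category. With foreign keys enforced, a fact row that points at a missing dimension member is rejected.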
Hierarchies and Dimensions
A dimensional hierarchy defines a sequence of mappings from a set of low-level concepts
to higher-level, more general concepts. The levels in a hierarchy form a tree-like
structure. Members at the lowest level are called leaf members, and each rolls up to a single
member at the next higher level. A hierarchy lets the user view information at the level of
detail required. For example, a user who wants to see the data by year can select the
hierarchy on the Date dimension and drill down through Year > Quarter > Month > Day.
Day is the lowest level of data, from which the higher levels are summarized.
The following defines the hierarchies for Houston E-Retailers.
The figure shows the hierarchies of the dimensions:
Hierarchies are defined for two dimensions: Date and Product.
Product Hierarchy and Date Hierarchy
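The Year > Quarter > Month > Day hierarchy can be sketched in Python as a path from each day up to its year, with sales summarized at whichever level is requested. The daily amounts and the function names below are hypothetical; only the level names come from the report.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales amounts (leaf level of the Date hierarchy).
DAILY_SALES = {
    date(2006, 1, 15): 100.0,
    date(2006, 2, 3): 50.0,
    date(2006, 4, 20): 75.0,
    date(2007, 7, 1): 25.0,
}

LEVELS = ("Year", "Quarter", "Month", "Day")


def hierarchy_path(d):
    """Return the (year, quarter, month, day) path of one leaf member."""
    quarter = (d.month - 1) // 3 + 1
    return (d.year, quarter, d.month, d.day)


def summarize(daily_sales, level):
    """Total the daily amounts at the given level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for day, amount in daily_sales.items():
        # Truncating the path maps the leaf to its ancestor at this level.
        totals[hierarchy_path(day)[:depth]] += amount
    return dict(totals)


by_year = summarize(DAILY_SALES, "Year")
by_quarter = summarize(DAILY_SALES, "Quarter")
by_day = summarize(DAILY_SALES, "Day")
```

Each day belongs to exactly one month, quarter and year, so truncating its path is enough to find the member it rolls up to. Drilling down is the same call with a deeper level name.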
Database Structure
Dimensions:
Customer Table Design:
Primary Key: Customer_ID. The rest of the attributes are shown in the structure below.
Customer table screenshot in the Access database:
Customer Table Data
Screenshot of the table after the data is inserted.
Vendor Table Design:
Primary Key: Brand_ID
Vendor Table Data
Screenshot of the table after the data is inserted.
Date Table Design:
Primary Key: Date Key
Date Table Data
Product Table Design:
Primary Key: Product_ID
Product Table Data
Product Category Table Design
Product Category Table Data
Fact_Sales Table
Fact Sales Table Design:
Primary Key: Sales_ID
Fact_Sales Table Data:
Dimensions and Fact Table Relationship
The relationship between Fact Sales and the dimension tables is established so that the Fact
Sales table holds the primary key of each dimension table as a foreign key. For example, the
Customer table's primary key “Customer_ID” is linked to “Customer_ID” in Fact Sales. Taken
together, these keys act as a composite key for Fact Sales, alongside its own primary key
Sales_ID.
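The link through Customer_ID is what lets a measure in Fact Sales be reported by a customer attribute. A minimal Python sketch of that lookup follows. The rows are invented, and computing the sales amount as Quantity times Price is an assumption, since the report does not state how its Sales Amount measure is derived.

```python
from collections import defaultdict

# Hypothetical dimension rows keyed by their primary key, Customer_ID.
CUSTOMER = {
    1: {"Gender": "F"},
    2: {"Gender": "M"},
}

# Hypothetical fact rows; Customer_ID refers to a row in CUSTOMER.
FACT_SALES = [
    {"Sales_ID": 1, "Customer_ID": 1, "Quantity": 2, "Price": 5.0},
    {"Sales_ID": 2, "Customer_ID": 2, "Quantity": 1, "Price": 8.0},
    {"Sales_ID": 3, "Customer_ID": 1, "Quantity": 3, "Price": 4.0},
]


def sales_by_gender(fact_rows, customers):
    """Join each fact row to its customer and total the sales per gender."""
    totals = defaultdict(float)
    for row in fact_rows:
        customer = customers[row["Customer_ID"]]  # follow the key into the dimension
        # Assumption: sales amount = Quantity * Price.
        totals[customer["Gender"]] += row["Quantity"] * row["Price"]
    return dict(totals)


gender_totals = sales_by_gender(FACT_SALES, CUSTOMER)
```

The same pattern applies to the Product, Vendor and Date keys: the fact row supplies the numbers and each key supplies the descriptive attributes to group them by.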
Cube Implementation
Using SQL Server Analysis Services on the Houston E-Retailers database, we
performed online analytical processing (OLAP) and data mining. SQL Server Analysis Services
retrieves data from an existing database and then forms the schema and cubes.
We followed these basic steps:
• Created an OLAP database in SQL Server that uses the E-Commerce database in Access as its data
source, and made a connection to this database.
• Built the data cube schema, named Fact_Sales, using the existing linked tables.
• Built the Fact Sales table and the Product, Customer, Vendor and Date dimension tables.
• Processed the E-CommerceRetailers_Cube to populate data in the various hierarchies.
Steps for building a cube
Initial steps for building the cube:
Data Source
1) Creating the cube in Visual Studio BI
2) Connecting the database to the server (SBUS-DB)
3) Creating data source (From the database in the server)
Data Source View
4) Creating the Data Source View using the existing connection
5) Adding the required tables to the cube. The right-hand side shows the finalized tables that
will be used in the cube implementation.
Completing the data source view step
The relationship diagram is populated as shown below once the wizard is completed.
Cube Creation
6) Finally, building the cube by right-clicking the Cubes node in Solution Explorer.
Adding Dimensions
7) Creating dimensions
Customer Dimension
We added a calculated column called Age using a CASE statement.
Named Calculation:
Calculated columns in Customer Dimensions
Exploring Customer Dimension and checking the named columns:
The last column in the Customer table is “Age”, which results from the named calculation above.
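The actual CASE expression appears only in the screenshot, so the following Python sketch is a guess at the kind of logic such a calculated column could hold: derive an age from a birth date and map it to a group. The 15-25 and 40-60 groups are the ones named in the reports; the birth-date input, the remaining group boundaries and the function names are assumptions.

```python
from datetime import date


def age_on(birth_date, as_of):
    """Age in whole years on the date as_of."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in as_of's year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age):
    """Map an age to a group, written as a chain of cases like a SQL CASE."""
    if age < 15:
        return "Under 15"   # hypothetical group
    if age <= 25:
        return "15-25"      # group named in the reports
    if age < 40:
        return "26-39"      # hypothetical group
    if age <= 60:
        return "40-60"      # group named in the reports
    return "Over 60"        # hypothetical group


example_age = age_on(date(1990, 6, 15), date(2015, 12, 10))
example_group = age_group(example_age)
```

Grouping ages in the dimension, rather than in each report, means every report that uses the Customer dimension sees the same age groups.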
Product Dimensions
Added a new dimension, Product, and its hierarchy
Hierarchies in Product Dimension
Attribute Relationship: the levels run from Product Category down to Brand Name, from Brand
Name to Product Name, and so on.
Date Dimension
Because the date dimension table was large, we wanted only a specific set of attributes for our
implementation, so we pulled the required attributes into the attribute column. The result is
shown in the screenshot of the Date dimension below. The hierarchies have also been included in
the Hierarchy column.
Date Attribute Relationship:
Final look of the Fact Sales table on the Measures column:
Building and Deploying the Cube:
Navigate to the Build menu and build the project.
Deploy the cube
Message confirming successful deployment of the cube:
Browsing the Deployed Cube
1. To switch to Cube Designer in SQL Server Data Tools, double-click the Analysis Services
Tutorial cube in the Cubes folder of the Solution Explorer.
2. Open the Browser tab, and then click the Reconnect button on the toolbar of the designer.
3. Click the Excel icon to launch Excel using the workspace database as the data source.
When prompted to enable connections, click Enable.
4. In the PivotTable Field List, expand Fact Sales, and then drag the Sales Amount measure
to the Values area.
5. In the PivotTable Field List, expand Product.
6. Drag the Product Hierarchy user hierarchy to the Columns area.
7. In the PivotTable Field List, expand Date, and then drag the Date_H hierarchy to the
Columns area.
Reports generated in Data Warehouse
Report 1: Yearly Sales Report based on Product Category
Sales Report in Excel:
Graphical representation:
Conclusion:
From the above report we can see that cosmetics sales are high for the fiscal year, at 7,749.88,
which is 21% of the overall sales of Houston E-Retailers.
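The share quoted here is the category's sales divided by total sales. The report gives the cosmetics figure and the percentage but not the total, so the Python sketch below works backwards to an approximate total. Because 21% is a rounded figure, the result is only an estimate, and the bounds assume rounding to the nearest whole percent.

```python
def share_of_total(part, total):
    """Percentage that part represents of total."""
    return 100.0 * part / total


cosmetics_sales = 7749.88   # figure quoted in the report
reported_share = 21.0       # percent, as quoted (assumed rounded to a whole percent)

# Total sales implied by the two quoted figures (an estimate, not a reported value).
implied_total = cosmetics_sales / (reported_share / 100.0)

# Range of totals consistent with a share that rounds to 21%.
lowest_total = cosmetics_sales / 0.215
highest_total = cosmetics_sales / 0.205

recomputed_share = share_of_total(cosmetics_sales, implied_total)
```

With the actual total from the Excel report, `share_of_total` gives the exact percentage for any category.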
Report 2: Sales Report based on Gender
Sales report by customer gender
Conclusion:
From the above gender-based report we can see that female customers are making more
purchases than male customers. Comparing the two halves of the fiscal year, purchases by
female customers increased while purchases by male customers decreased. Hence we can
conclude that the purchasing trend of female customers is increasing.
Report 3: Sales Report based on Gender and Customer Age
Sales report considering two factors, gender and age, with ages grouped to give a clearer result
Graphical Representation:
Conclusion:
From the report we can infer that female customers in the 15-25 age group generate more sales
than the other groups shown in the graph.
Report 4: Monthly Sales Report based on Customer Age and Gender
This report covers each semester of the year 2006.
Graphical Representation:
Conclusion:
From this we can infer that customers in the 40-60 age group, of both genders, need attention
in order to retain them in future years.
Report 5: Sales Report based on Product Category, Customer Age and Gender
In this report we have considered three factors: Product Category, Customer Age and Gender.
Graphical Representation:
Conclusion:
For male and female customers between ages 40 and 60, sales are about the same.
We can see a major difference in the 15-20 age group for every product under the category.
The 40-60 age group is a good area to concentrate on in order to retain those customers.
Conclusion
We have successfully built a data warehouse for our company, Houston E-Retailers, from
the data that we collected. We developed a cube that gives organized and summarized data,
whether current or collected over the years. We also learned to generate different reports
according to our requirements and to analyze the measures for given criteria. We used
techniques such as drill down, roll up, slicing and dicing, and data mining on our database to
generate various reports.
The process of constructing a data warehouse gave us a lot of insight into data
warehousing concepts and into the cube and its implementation. We ran into a number of issues
while implementing the data warehouse, and solving them gave us a better understanding of how
a data warehouse is implemented.
Overall, the project was an interesting learning experience from start to finish.
References:
1. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals by
Paulraj Ponniah.
2. https://mis.uhcl.edu/rob/Course/DW/Resources/SQL%20Server%202012%20Analysis%20Services%20Multidimensional%20Modeling.pdf
3. http://www.generatedata.com/