ISAM 5332: Data Warehouse and Data Mining
Houston E-Retailers Final Report

HOUSTON E-RETAILERS
ISAM 5332
December 10, 2015

By Bala Anudeep Guduri Kavya Hegde Divya Gangwani Suhas Malavalli

Contents

Abstract
Business Scenario
Business Need
Why a Data Warehouse?
Dimension Model
  Fact Table
  Dimensional Model - Star Schema
Hierarchies and Dimensions
  Database Structure
    Dimensions
    Fact_Sales Table
  Dimensions and Fact Table Relationship
Cube Implementation
  Steps for building a cube
    Data Source
    Data Source View
    Cube Creation
    Adding Dimensions
    Building and Deploying the Cube
Browsing the Deployed Cube
Reports generated in Data Warehouse
  Report 1: Yearly Sales Report based on Product Category
  Report 2: Sales Report based on Gender
  Report 3: Sales Report based on Gender and Customer Age
  Report 4: Monthly Sales Report based on Customer Age and Gender
  Report 5: Sales Report based on Product Category, Customer Age and Gender
Conclusion
References

Abstract

This report describes the overall data warehouse process, from collecting the transactional data to generating reports for analysis. It also describes the OLAP tools used in the process, shows how the raw data collected over the years is converted into useful information, and states what Houston E-Retailers needs and requires from these tools.

Business Scenario

HOUSTON E-RETAILERS, INC is an E-Retail store based in Houston, Texas, started as a startup in early 2000. The company focuses on carrying products from well-known franchisees for a number of internationally recognized brands, such as Dr. Pepper under Beverages, and Procter & Gamble, Dove and Colgate under Toys and Cosmetics, among many others. It aims to deliver quality products to customers under one roof, thereby increasing its customer base and the overall profit of the business. The company is mainly an E-Retail business that is looking to expand its customer base, grow its business and compete in the E-Retail market. The business has expanded across the country and has been successful in the E-Retail market.
One of the biggest concerns was to cope with the heavy rush of orders, deliver products to various cities in a timely manner, and provide after-sale service to customers for the products they purchased. The company also faced problems in analyzing its data, because the data was not consolidated and there were no standard tools to generate reports. Over the years, the successful growth of the company has made it possible to overcome these problems.

We have defined reports of sales by Customer Gender, by Product and Gender, by quarter for Product and Customer, and more. These reports help executives and managers analyze the business and make strategic decisions to increase revenue in the upcoming years.

Business Need

The business needs of the company are as follows:
- To build a long-term relationship with the customer.
- To run a profitable operation, which typically means increasing revenue while limiting expenses.
- To analyze the transactional data and drive the company towards success by making effective strategic decisions.
- To use effective tools and techniques that help analyze the data and produce reports showing trends for past and upcoming years.

Why a Data Warehouse?

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. A data warehouse, also known as an enterprise data warehouse, is a system used for reporting and data analysis; it helps Houston E-Retailers generate reports from the data it produces. It also stores historical data, which supports trend reports by gender, by age and by product. This information helps to analyze current market trends and improve strategic decisions, so that the company can increase profit by selling the products that are in high demand.
A data warehouse is an approach that can solve the previously mentioned problems and benefit our company. It is therefore the solution we chose to help executives make strategic decisions. The data warehouse is used with our E-commerce website to address specific issues and to capture data through the website.

Expected Value: This project uses various tools to study the company's database, operations and functions, and analyzes them to help executives and managers keep track of products and sales by various dealers at a particular time. The OLAP tools used for analysis support techniques such as rolling up, drilling down, and slicing and dicing through the reports. The reports cover which types of customers to target, by gender or age group, for a particular category or product; which products sell the most at a particular location, and in what quantity; sales and product reports by time (monthly, quarterly, yearly); and information by vendor.

Dimension Model

Dimensional modeling is a set of techniques and concepts used in data warehouse design. A dimensional model is based on dimensions, facts, cubes and schemas such as star and snowflake. In developing a data warehouse, managers think of the business in terms of business dimensions. Houston E-Retailers is an online business that focuses on delivering quality products and providing after-sale service to customers. Its model is a star schema with one snowflake-style extension: Product is linked to a separate Product Category table. The business dimensions used are Customer, Vendor, Product, Product Category and Date. Fact_Sales is the fact table; apart from its keys it contains only measures. When a business dimension is extracted and represented as a database table, it is called a dimension table. A dimension table provides the textual descriptions of a business dimension through its attributes.
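To make the idea of dimension tables and a fact table concrete, the following is a minimal sketch using Python's built-in sqlite3 module rather than the Access and SQL Server tools used in this project. Only the table names and the Customer_ID, Product_ID and Sales_ID keys, and the Quantity and Price measures, come from this report; every other column name and all of the sample rows are illustrative assumptions. The Vendor and Date dimensions are omitted for brevity and would follow the same pattern.

```python
import sqlite3

# Assumed schema: descriptive columns and Category_ID are hypothetical.
SCHEMA = """
CREATE TABLE Customer (
    Customer_ID   INTEGER PRIMARY KEY,
    Customer_Name TEXT,   -- assumed descriptive attribute
    Gender        TEXT    -- assumed descriptive attribute
);
CREATE TABLE Product_Category (
    Category_ID   INTEGER PRIMARY KEY,   -- assumed key name
    Category_Name TEXT
);
CREATE TABLE Product (
    Product_ID   INTEGER PRIMARY KEY,
    Product_Name TEXT,
    Category_ID  INTEGER REFERENCES Product_Category(Category_ID)
);
CREATE TABLE Fact_Sales (
    Sales_ID    INTEGER PRIMARY KEY,
    Customer_ID INTEGER REFERENCES Customer(Customer_ID),
    Product_ID  INTEGER REFERENCES Product(Product_ID),
    Quantity    INTEGER,
    Price       REAL
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up sale."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Customer VALUES (1, 'Ann', 'F')")
    conn.execute("INSERT INTO Product_Category VALUES (10, 'Cosmetics')")
    conn.execute("INSERT INTO Product VALUES (100, 'Soap', 10)")
    conn.execute("INSERT INTO Fact_Sales VALUES (1000, 1, 100, 2, 3.50)")
    return conn


def describe_sales(conn):
    """Join fact rows to dimension rows to attach textual descriptions."""
    query = """
        SELECT c.Customer_Name, c.Gender, p.Product_Name, pc.Category_Name,
               f.Quantity * f.Price AS Sales_Amount
        FROM Fact_Sales f
        JOIN Customer c ON c.Customer_ID = f.Customer_ID
        JOIN Product p ON p.Product_ID = f.Product_ID
        JOIN Product_Category pc ON pc.Category_ID = p.Category_ID
        ORDER BY f.Sales_ID
    """
    return conn.execute(query).fetchall()
```

Calling describe_sales(build_demo_db()) returns each sale together with the customer, product and category text drawn from the dimension tables, which is the role the dimension attributes play in the reports.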
The diagrams below describe all the business dimensions.

Customer table:

Vendor table:

Date table:

Product table and Product Category:

Fact Table

A fact is a measurement taken from business activity in the market; facts are the raw material from which knowledge is derived. A fact table is used to store these measurements. Each row in a fact table represents data that may relate to a particular customer, a particular product, or sales in a particular region. In this database, the fact table combines the key attributes of the dimension tables, such as Product_ID, Brand_ID, Customer_ID and Date_ID; each is a primary key in its dimension table and is linked to the fact table accordingly. The fact table has its own primary key, Sales_ID, and holds the dimension keys listed above as foreign keys, which together identify the context of each sale.

Fact Sales Table:

Dimensional Model - Star Schema

The star schema shown below has the dimension tables positioned at the points and the fact table at the center, forming a star. The dimensions are all connected to Fact Sales in the center, which contains measures such as Quantity and Price. Product and Product Category are linked to each other to show the relation between the two.

Hierarchies and Dimensions

A dimensional hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. The levels in a hierarchy form a tree-like structure. Members at the lowest level are called leaf members, and each rolls up through its parents to a single member at the highest level. A hierarchy enables the user to view information at the level of detail required.
Thus, if a user wants to see the data by year, the user can select the hierarchy on the Date dimension and drill down through Year > Quarter > Month > Day. Day is the lowest level of data, from which the data is summarized for users. The following defines the hierarchies for Houston E-Retailers. The figure shows the hierarchies of the dimensions. Hierarchies are defined for two dimensions: Date and Product.

Product Hierarchy and Date Hierarchy

Database Structure

Dimensions:

Customer Table Design: Customer_ID is the primary key; the rest of the attributes are shown in the structure below.

Customer table screenshot in the Access database:

Customer Table Data: screenshot of the table after the data is inserted.

Vendor Table Design: Primary Key: Brand_ID

Vendor Table Data: table data after insertion.

Date Table Design: Primary Key: Date Key

Date Table Data

Product Table Design: Primary Key: Product_ID

Product Table Data

Product Category Table Design

Product Category Table Data

Fact_Sales Table

Fact Sales Table Design: Primary Key: Sales_ID

Fact_Sales Table Data:

Dimensions and Fact Table Relationship

The relationship between Fact Sales and the dimension tables is established so that the Fact Sales table holds the primary key of each dimension table.
For example, the Customer table's primary key "Customer_ID" is linked to "Customer_ID" in Fact Sales. Hence Fact Sales carries a composite set of dimension keys alongside its own primary key, Sales_ID.

Cube Implementation

Using the Houston E-Retailers database and SQL Server Analysis Services, we performed online analytical processing (OLAP) and data mining. SQL Server Analysis Services retrieves data from an existing database and then forms the schema and cubes from it. We followed these basic steps:
- Create an OLAP database in SQL Server that uses the E-Commerce database in Access as its data source, and make a connection to this database.
- Build the data cube schema, named Fact_Sales, using the existing linked tables.
- Build the Fact Sales table and the Product, Customer, Vendor and Date dimension tables.
- Process the E-CommerceRetailers_Cube to populate data in the various hierarchies.

Steps for building a cube

Initial steps for building the cube:

Data Source

1) Creating the cube in Visual Studio BI
2) Connecting the database to the server (SBUS-DB)
3) Creating the data source (from the database on the server)

Data Source View

4) Creating the Data Source View using the existing connection
5) Adding the required tables to the cube. The tables on the right-hand side are the finalized tables that will be used in the cube implementation.

Completing the data source view step

The relationship diagram is populated as shown below once the wizard is completed.

Cube Creation

6) Finally, building the cube by right-clicking the Cube node in Solution Explorer.
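As a conceptual aside, the sketch below illustrates in plain Python what a cube holds once it has been built and processed: a measure summed for every combination of dimension members, including rolled-up totals. It is not how SQL Server Analysis Services is implemented, and the function name build_cube, the ALL marker and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "All"  # marker meaning "this dimension is rolled up"


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every combination of dimension members and ALL."""
    cube = defaultdict(float)
    for row in rows:
        # Each dimension contributes either its own member or the ALL marker.
        choices = [(row[d], ALL) for d in dimensions]
        for key in product(*choices):
            cube[key] += row[measure]
    return dict(cube)


# Hypothetical fact rows, already joined to their dimension attributes.
SALES = [
    {"Year": 2006, "Category": "Cosmetics", "Gender": "F", "Sales_Amount": 40.0},
    {"Year": 2006, "Category": "Toys", "Gender": "M", "Sales_Amount": 25.0},
    {"Year": 2007, "Category": "Cosmetics", "Gender": "M", "Sales_Amount": 10.0},
]

cube = build_cube(SALES, ["Year", "Category", "Gender"], "Sales_Amount")

# Roll-up: total sales for 2006 across all categories and genders.
sales_2006 = cube[(2006, ALL, ALL)]
# Slice: fix Category to "Cosmetics" and read the total across the rest.
cosmetics_total = cube[(ALL, "Cosmetics", ALL)]
```

Looking up a key with ALL in a position corresponds to rolling that dimension up, while replacing ALL with a specific member corresponds to drilling down or slicing on it.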
Adding Dimensions

7) Creating dimensions

Customer Dimension

We added a calculated column called Age using a CASE statement.

Named Calculation: calculated columns in the Customer dimension

Exploring the Customer dimension and checking the named columns: the last column in the Customer table is "Age", which results from the calculation above.

Product Dimension

Added a new dimension, Product, and its hierarchy.

Hierarchies in Product Dimension

Attribute Relationship: Product Category drills down to Brand Name, Brand Name to Product Name, and so on.

Date Dimension

As the Date dimension table was large, we wanted only a specific set of attributes for our implementation, so we pulled the required attributes into the attribute column. The result is the screenshot of the Date dimension below. The hierarchies have also been included in the Hierarchy column.

Date Attribute Relationship:

Final look of the Fact Sales table in the Measures column:

Building and Deploying the Cube:

Navigate to Build > Build the project to build the project.
Deploy the cube.

Message confirming successful deployment of the cube.

Browsing the Deployed Cube

1. To switch to Cube Designer in SQL Server Data Tools, double-click the Analysis Services Tutorial cube in the Cubes folder of Solution Explorer.
2. Open the Browser tab, and then click the Reconnect button on the toolbar of the designer.
3.
Click the Excel icon to launch Excel using the workspace database as the data source. When prompted to enable connections, click Enable.
4. In the PivotTable Field List, expand Fact Sales, and then drag the Sales Amount measure to the Values area.
5. In the PivotTable Field List, expand Product.
6. Drag the Product Hierarchy user hierarchy to the Columns area.
7. In the PivotTable Field List, expand Date, and then drag the Date_H hierarchy to the Columns area.

Reports generated in Data Warehouse

Report 1: Yearly Sales Report based on Product Category

Sales Report in Excel:

Graphical Representation:

Conclusion: From the above report we can see that cosmetics sales are high in the fiscal year, at 7,749.88, which is 21% of the overall sales of "Houston E-Retailers".

Report 2: Sales Report based on Gender

Sales report considering gender

Conclusion: From the above customer gender report, we can see that female customers make more purchases than male customers. Comparing the two halves of the fiscal year, purchases by female customers increased while purchases by male customers decreased. Hence we can conclude that the purchasing trend of female customers is increasing.

Report 3: Sales Report based on Gender and Customer Age

Sales report considering two factors, gender and age, with ages grouped to get a clearer result

Graphical Representation:

Conclusion: From the report we can infer that female customers in the 15-25 age group generate more sales than the other groups shown in the graph.
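The grouping behind a report of this kind can be sketched as a CASE expression that maps each age to a group, followed by a sum per gender and group. The example below is an illustration only, run through Python's built-in sqlite3 module instead of the cube. The 15-25 and 40-60 groups are the ones named in this report; the 26-39 group, the 'Other' fallback, the Sales table layout and the sample rows are assumptions.

```python
import sqlite3

# Assumed boundaries: only "15-25" and "40-60" are named in the report.
AGE_GROUP_CASE = """
    CASE
        WHEN Age BETWEEN 15 AND 25 THEN '15-25'
        WHEN Age BETWEEN 26 AND 39 THEN '26-39'
        WHEN Age BETWEEN 40 AND 60 THEN '40-60'
        ELSE 'Other'
    END
"""


def sales_by_gender_and_age(rows):
    """Total sales per (gender, age group).

    rows: iterable of (gender, age, sales_amount) tuples (hypothetical data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales (Gender TEXT, Age INTEGER, Sales_Amount REAL)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)
    query = f"""
        SELECT Gender, {AGE_GROUP_CASE} AS Age_Group, SUM(Sales_Amount)
        FROM Sales
        GROUP BY Gender, Age_Group
        ORDER BY Gender, Age_Group
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up sample rows, not data from the project.
sample = [("F", 20, 10.0), ("F", 24, 5.0), ("M", 45, 7.5), ("F", 50, 2.5)]
totals = sales_by_gender_and_age(sample)
```

Each returned tuple holds a gender, an age group and the summed sales amount, which matches the shape of the pivot table used for the chart.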
Report 4: Monthly Sales Report based on Customer Age and Gender

This report covers each semester of the year 2006.

Graphical Representation:

Conclusion: From this we can infer that customers in the 40-60 age group, of both genders, need attention in order to retain them in future years.

Report 5: Sales Report based on Product Category, Customer Age and Gender

In this report we have considered three factors: Product Category, Customer Age and Gender.

Graphical Representation:

Conclusion: For male and female customers between the ages of 40 and 60, sales are about the same. We can see a major difference in the 15-20 age group for every product under the category. The 40-60 age group would be a good area to concentrate on in order to retain those customers.

Conclusion

We have successfully built a data warehouse for our company, Houston E-Retailers, from the data that we collected. We have developed a cube that provides organized, summarized data, whether current or collected over the years. We have also learned to generate different reports according to our requirements and to analyze the measures for given criteria. We used techniques such as drill down, roll up, slicing and dicing, and data mining on our database to generate various reports. The process of constructing a data warehouse has given us a lot of insight into data warehousing concepts and into cubes and their implementation. We learned many lessons while implementing the data warehouse, and solving the issues we met gave us a better understanding of how a data warehouse is implemented.
Overall, the project was an interesting learning experience from start to finish.

References:
1. Paulraj Ponniah, Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals.
2. https://mis.uhcl.edu/rob/Course/DW/Resources/SQL%20Server%202012%20Analysis%20Services%20Multidimensional%20Modeling.pdf
3. http://www.generatedata.com/