Improving Automotive Battery Sales Forecast

by

Vinod Bulusu
Master of Business Administration, IE Business School, 2015
Master of Science, Chemical Engineering, University of New Hampshire, 2006

and

Haekyun Kim
Bachelor of Science, Mechanical Engineering, SungKyunKwan University, 2008

SUBMITTED TO THE ENGINEERING SYSTEMS DIVISION IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ENGINEERING IN LOGISTICS

AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2015

© 2015 Vinod Bulusu and Haekyun Kim. All rights reserved.

The authors hereby grant to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author:
Master of Engineering in Logistics Program, Engineering Systems Division
May 8, 2015

Signature of Author: [signature redacted]
Master of Engineering in Logistics Program, Engineering Systems Division
May 8, 2015

Certified by: [signature redacted]
Dr. Roberto Perez-Franco
Research Associate, Center for Transportation and Logistics

Accepted by: [signature redacted]
Dr. Yossi Sheffi
Director, Center for Transportation and Logistics
Elisha Gray II Professor of Engineering Systems
Professor, Civil and Environmental Engineering

Improving Automotive Battery Sales Forecast
by Vinod Bulusu and Haekyun Kim
Submitted to the Engineering Systems Division on May 8, 2015 in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Logistics

Abstract

Improvement in sales forecasting allows firms not only to respond quickly to customers' needs but also to reduce inventory costs, ultimately increasing their profits.
Sales forecasts have been studied extensively to improve their accuracy in many different fields. However, for automotive batteries, it is very difficult to develop a highly accurate forecast model because many variables need to be considered and their correlations are complex. Additionally, current sales forecasts are derived from historical data and thus do not include any causal factor analysis. In this study we applied causal factor analysis to determine how the forecast accuracy could be improved. We focused on understanding the relationship between temperature and sales. Using regression modelling, we found that there is a quadratic relationship between temperature and battery sales. We validated the model by comparing the actual and predicted sales for various geographies and times. We concluded that the model is more robust for predicting sales across time periods than across geographies.

Thesis Supervisor: Dr. Roberto Perez-Franco
Title: Research Associate, Center for Transportation and Logistics

Acknowledgements

This effort is dedicated to my wife, Madhuri. Thanks for being there always.

I gratefully acknowledge the Office of the Dean for Graduate Education and the O'Biren family for the generous fellowship throughout the program. I would also like to thank our thesis sponsor for letting us tap into their knowledge to enable us to complete our thesis. I express my gratitude to Dr. Roberto Perez-Franco for his continuous support and encouragement even during challenging times. I would also like to thank Haekyun Kim, who spent numerous nights working on this thesis, for his positive energy and flexibility. I owe a huge debt to my wife, Madhuri, and sons Advaita and Atharva for the time not spent with them.

I wish to express my sincere thanks to my thesis sponsor for providing this great opportunity. I am also grateful to Dr. Roberto Perez-Franco.
I am extremely thankful and indebted to him for sharing his expertise and for the sincere and valuable guidance and encouragement he extended to me. I take this opportunity to express gratitude to all of the Department faculty members for their help and support. I also thank my wife Eunjung for her unceasing encouragement, support and attention. I am also grateful to my thesis partner Vinod Bulusu, who supported me through this venture.

Table of Contents

Abstract
Acknowledgement
Figures
Tables
1. Introduction
2. Literature Review
2.1 Models to predict the age of lead-acid battery
2.2 Connecting Point of Sale (POS) to Forecasting
2.3 Conclusion: The need for a multivariate model of POS data
3. Methodology, Data and Analysis
3.1 Overall Methodology
3.2 Data Collection
3.3 Modeling (data from 2010-2014)
3.4 Modeling validation
4. Validation of the approach
4.1 Model G diagnostics
4.2 Model T diagnostics
4.3 Insights on the validity of the approach
5. Conclusion and Future Work
References

Figures

Figure 3-1: Process flow
Figure 3-2: POS data composition
Figure 3-3: SKU sales by time
Figure 3-4: Zip code of sales
Figure 3-5: Top 10 sales by SKU
Figure 3-6: Geographical sales of SKU 65
Figure 3-7: Aggregated sales in Boston area
Figure 3-8: Sales of 5 cities
Figure 3-9: Temperature profiles of 25 stations in Boston area
Figure 3-10: Average weekly temperature of the entire Boston area
Figure 3-11: Location of regions removed
Figure 3-12: 5 Stations not applicable to temperature aggregation
Figure 3-13: Average weekly temperature of the entire LA area
Figure 3-14: Temperature profiles of 10 stations in Houston area
Figure 3-15: Average weekly temperature of the entire Houston area
Figure 3-16: Temperature profiles of 15 stations in DC area
Figure 3-17: Average weekly temperature of the entire DC area
Figure 3-18: Temperature profiles of 10 stations in Chicago area
Figure 3-19: Average weekly temperature of the entire Chicago area
Figure 3-20: Diagnostics of Model 1
Figure 3-21: Diagnostics of Model 2
Figure 3-22: Diagnostics of Model 3
Figure 3-23: Diagnostics of Model 4
Figure 3-24: Diagnostics of Model 5
Figure 3-25: Diagnostics of Model 6
Figure 4-1: Model diagnostics for Model G: R²
Figure 4-2: Pareto Plot for Model G
Figure 4-3: Parameter Estimates for Model G
Figure 4-4: Prediction Expression for Model G
Figure 4-5: Model diagnostics for Model T: R²
Figure 4-6: Pareto Plot for Model T
Figure 4-7: Prediction Expression for Model T
Figure 4-8: Model validation for Boston (Model G)
Figure 4-9: Model validation for Washington D.C. (Model G)
Figure 4-10: Model validation for Year 2012 (Model T)
Figure 4-11: Model validation for Boston (Model G) based on change
Figure 4-12: Model validation for Washington D.C. (Model G) based on change
Tables

Table 3-1: Sell-in & Sell-out data features
Table 3-2: Normalized sales
Table 4-1: Validation of models
Table 4-2: Minimum and Maximum Temperatures across the cities
Table 4-3: Minimum and Maximum Temperatures across the years

1. Introduction

Our thesis sponsor is a global diversified technology and industrial leader serving customers in more than 150 countries. In particular, it is the global leader in lead-acid automotive batteries and advanced batteries for start-stop, hybrid and electric vehicles. Its US market share reached almost 40% in 2013. Our thesis sponsor sells batteries through major automotive service retailers such as AutoZone and traditional supermarkets such as Walmart.

In 2013, our sponsor company saw a phenomenal increase in automotive battery sales. Because ramping up production is a long process, the company could not meet demand. As a result, the company lost sales and also lost the opportunity to increase market share. Because of this experience, our sponsor company is aware of the risks brought by the variability in demand and of the importance of forecasts. Its current forecasting model is based solely on historical sales data and does not include other variables that could influence battery failure and thus sales of new batteries. Since our thesis sponsor deals mainly in aftermarket replacement battery sales, most of these sales occur because of a battery failure. Hence, battery failures correspond to battery sales.
To prevent such lost sales and to meet unexpected market demand in a timely manner, a good forecast is highly desirable. In this thesis, we propose a methodology to improve the sales forecast for our sponsor. Several previous studies have suggested that temperature has an impact on the failure rate of batteries. Therefore, in this thesis we explore the link between temperature and sales in the aftermarket battery market. In the following chapters, we present our literature review, methodology, results, discussion, and conclusion.

2. Literature Review

Many researchers, such as Ruetschi (2004), Doerffel and Sharkh (2006), and Sauer and Wenzl (2007), have performed experimental and computational studies of the factors determining battery life. More recently, Waldman et al. (2014) identified aging mechanisms for lithium-ion batteries. Most of these factors are internal to the battery, such as the chemical reaction kinetics, corrosion and loss of water. However, measuring these factors in daily use is cumbersome, and thus the failure rate of the batteries in a market cannot be predicted accurately in a practical manner.

To create a reliable model for forecasting, Geurts et al. (1996) highlighted that the quality of data is of paramount importance. There is a substantial literature on using point-of-sale (POS) information to forecast demand, but as Keifer (2010) points out, forecasting based on POS suffers from a retrospective analysis bias. Furthermore, this approach is not applicable to new product introductions, as there is no historical data. Keifer (2010) also introduces new approaches to forecasting new product introductions and web-based services. However, there is no discussion of using a multivariate approach or of identifying correlations between multiple physical variables and demand for physical products such as replacement batteries.

Multiple studies have been conducted to determine the age of batteries.
These studies can be divided into three major categories:
* Experimental studies
* Computational studies
* Combination of experimental and computational studies

These categories were determined based on the tools used for these studies, as the tools impact the results. In the subsequent sections of this literature review, these three approaches are discussed. Additionally, the approaches to handling data for forecasting are reviewed and discussed.

2.1 Models to predict the age of lead-acid battery

Various Mechanisms of Aging

Ruetschi (2004) provides a summary of the aging mechanisms and their impact on battery life. Additionally, the significance of each aging mechanism and its impact on the various types of lead-acid batteries is determined:
* Anodic corrosion: This is the natural aging mechanism of positive plates. This mechanism is most common in automotive batteries and stand-by batteries. Additionally, this mechanism is accelerated by battery misuse.
* Positive mass degradation: Batteries subjected to cycling, such as those in city buses that make frequent stops and short trips, undergo shallow discharge cycles that degrade the positive mass. The positive mass becomes softer and sheds.
* Irreversible formation of lead sulfate: This mechanism can occur in batteries subjected to higher temperature and/or in batteries that have a slow discharge rate for a lengthy duration.
* Short circuit: This mechanism is common in automotive batteries and in train-lighting batteries, where the usage conditions can be harsh.
* Loss of water: This mechanism is common in batteries exposed to higher temperature.

Although Ruetschi (2004) studies several aging mechanisms in detail, a quantitative understanding of the impact of temperature or temperature exposure is not established.

Experimental Studies

In this section the experimental studies from three different researchers are discussed in detail.
These researchers have compared and predicted battery life and studied aging behavior in Li-ion and lead-acid batteries.

Doerffel and Sharkh (2006) performed experimental studies to predict the remaining battery life. They also compared the results from the experimental studies with the existing standard of determining battery capacity empirically by Peukert's equation, which relates the battery capacity to the discharge current. Based on the results, it was determined that Peukert's equation is applicable only for constant discharge rates; if the discharge rate is variable, Peukert's equation underestimates the capacity.

In a research article by Thomas et al. (2014), the effects of temperature on the aging behavior of cycled lithium-ion batteries are investigated quantitatively by electrochemical methods and post-mortem analysis. Temperature-dependent aging mechanisms were identified from Arrhenius plots, the different aging mechanisms were confirmed by post-mortem analysis, and the reason for the different mechanisms was found by testing with reference electrodes. All of these results combined confirm that temperature plays an integral role in a battery's life cycle (Kouba, 2014). One limitation of the Thomas et al. study is that it is focused on lithium-ion batteries, so it is difficult to apply the results to all automotive batteries. Another limitation is that the sample consisted of a small number of batteries. The correlations may have been different had the batteries been in different conditions at the time of testing.

Lu et al. (2014) identified the factors influencing the life cycle of lead-acid batteries in small electric vehicles. The result was that the battery performance and the cycle life improved when the following four methods were used: the combination of grid alloys, mixing paste and curing process parameters control, the selection of the negative organic additives and the sets mode of the positive and negative plates.
These results explicitly show that there are many variables to consider when predicting the life cycle of batteries.

Computational Models

Computational models are needed because battery aging is irregular and complicated, so the aging mechanism cannot readily be replicated experimentally. In this section the heuristic model to determine the battery life and some improvements to the basic heuristic model are discussed.

Schiffer et al. (2006) argued that determining the lifetime of a lead-acid battery is complicated because of the irregular operating conditions and the complexity of replicating those conditions experimentally. Hence, they developed a heuristic model that takes into account the impact of various aging mechanisms. The results of the model were verified against existing results to validate the model. This model can be used as a systems model for various battery types and operating conditions. The input parameters of the model include battery temperature (which is assumed to be ambient temperature for specific conditions), aging mechanism (such as corrosion model and degradation) and state of charge current. Based on the results and a comparison with existing data, this model can be used to determine the lifetime of different battery types; however, it can be further refined for conditions where the operating current is higher than 10 Ah (ampere-hour).

Esfahanian, Torabi and Mosahebi (2008) refined the model by using computational fluid dynamics (CFD) and Equivalent Circuit Model (ECM) techniques. This model improves on previous models through faster computation and greater accuracy.

Combination modelling approach

In addition to the heuristic model, modelling approaches can simulate physicochemical mechanisms and consider the incremental decrease in life due to each aging mechanism to predict the battery life. These approaches are discussed in this section.
Sauer and Wenzl (2008) further studied different modelling approaches and provide the pros and cons of each. Three different modelling approaches are described:
* Physicochemical aging model: This model includes the aging mechanisms of the battery to simulate the battery life. Each mechanism is simulated and the battery life is predicted. This modelling approach is the most complex because of the large number of input conditions needed. However, the benefit of this approach is that it could be translated very easily across the various battery types.
* Weighted Ah aging model: This is a heuristic model based on the systems design as performed by Schiffer et al. (2006). This model does not provide any avenues for continuous improvement to battery manufacturers. However, it is a very powerful model in terms of speed of results.
* Event-oriented aging model: This model is based on the understanding of incremental loss due to each failure mode. This is challenging, as the model is expected to quantify each failure mode.

2.2 Connecting Point of Sale (POS) to Forecasting

In the previous section various approaches to determine the battery life were discussed. However, these approaches are not based on any easily measurable physical characteristics and are difficult to apply. Hence, forecasting approaches are needed to determine the battery life. In this section various forecasting approaches employed to determine demand are discussed.

Michael et al. (1996) raised five specific questions to guide any study. First, who collected the data? Second, why were the data collected? Third, are the sales time series reasonable, consistent, and logical? Fourth, how were the data gathered? Fifth, are the sales figures based on a sample or a census? The important issue for forecasters is to know the limitations of the data and any biases that might exist in the data.
They suggested that company-generated data contain distortions caused by company politics, such as sales quotas, tax handling, and accounting methods. As a result, adjustments to the data are required to improve the sales forecast. This research concludes that we have to consider the quality of the data used to forecast as well as the models used to make forecasts.

Keifer (2010) identified a weakness in using POS data for forecasting, namely the historical nature of these forecast models. Another weakness identified is that they do not work for new products. Demand signals, pre-order sites, prediction markets, gift registries, wish lists, search engines and Web-site usage analysis are suggested as methods to determine the demand for new products. However, these methodologies are applicable to internet-based products and/or new products and are not transferable to products sold in brick-and-mortar stores.

William et al. (2014) determined that using POS data improves forecast accuracy. In their study they evaluated the demand for a consumable product. The orders from retailer to suppliers and the retailer's POS data were analyzed, and they concluded that the POS data are more closely related to actual consumer demand than retailer orders are, showing that the retailer's orders were not actual responses to market demand. Forecasting with POS data was shown to outperform other approaches by up to 125%. One critical gap of this approach is that POS data include little information other than the number of units sold. However, POS data could be very useful if they are combined with other important variables.

2.3 Conclusion: The need for a multivariate model of POS data

Based on the literature review, several models exist to predict the lifetime of an individual battery. They can be broadly classified into heuristic, physicochemical and event-based.
However, they are difficult to apply to an entire market of batteries in real life, as some of the input parameters (such as corrosion, water loss or short circuit) are difficult to measure on a daily basis. Additionally, there is no clear connection between these parameters and the external environment, such as temperature, which is easier to measure and monitor. Also, although there are several approaches to predict the demand for web-based services and for new products, there is little information on approaches for predicting sales of products sold in brick-and-mortar stores. Thus, there is a clear need to create a multivariate model to understand the relationship between external conditions such as temperature and battery life.

3. Methodology, Data and Analysis

3.1 Overall Methodology

To determine the impact of temperature on sales, we followed the three steps illustrated in Figure 3-1. The first step, Data Collection, entails identifying the appropriate level of detail for sales data, i.e. whether we should consider sell-in or sell-out data (defined in Table 3-1 below). This step also involves gathering temperature information. The second step, Data Analysis, involves visualizing the sales data and identifying the most important Stock Keeping Unit (SKU) (sub-group) for further analysis. Finally, in the third step, Data Modeling, the SKU identified in the second step is studied with the temperature information collected in the first step. Thus, the impact of temperature on sales can be quantitatively studied.

[Figure 3-1: Process flow]

3.2 Data collection

Many companies use a variety of sales data to forecast their sales. As more supply chains are connected, there are several sales processes even within one chain. Sales data can be divided into two major categories depending on the type of sales information: Sell-in and Sell-out. Sell-in data represents sales orders from a manufacturer to a retailer.
Sell-out data represents sales orders from a retailer to an end customer. Both kinds of data are meaningful for understanding the current business status and setting future strategies. Table 3-1 summarizes the benefits of both approaches.

Table 3-1. Sell-in & Sell-out data features

              Sell-in                                           Sell-out
Data source   Retailer → Manufacturer                           End customer → Manufacturer
Benefits      Identify volume of the first article production   Able to see the response of end consumers

Because the aim of this research is to improve the sales forecast accuracy, it is more closely related to the behavior of end consumers. The best way to understand the behavior of end consumers is POS data analysis. POS data is considered the most useful Sell-out data.

Sales information

POS information captures the sales information at the retailer and customer end. Many companies use POS to manage sales, optimize inventory, and maintain customer relationships. Most importantly, real-time POS data allows us to understand sales patterns and popular items in different regions and times. However, such data can be diverse and can have multiple dimensions such as locations, SKUs and retailer relationships. These dimensions make it difficult to identify patterns appropriately unless specific data has been identified. SKUs are based on their usage in particular automobiles and thus can have different sales patterns based either on geography or on climatic conditions. Thus, a particular SKU needs to be identified in order to understand the relationship between sales and temperature without confounding it with other variables and the patterns of other SKUs.

Various dimensions of point of sales (POS) data: Current POS data includes different components such as vendors, date of sales, zip code, SKU and units sold, as indicated in Figure 3-2.
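As a rough illustration of how POS dimensions such as date, SKU, zip code and units sold could be summarized per SKU, the following minimal Python sketch totals units and counts distinct zip codes for each SKU. The record layout, the field names, the `summarize_skus` helper and the sample rows are all assumptions made for illustration; they are not the sponsor's schema or data, and the thesis itself does not describe this code.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class PosRecord(NamedTuple):
    """One POS row; field names are illustrative, not the sponsor's schema."""
    week: str        # date identifying the sales week
    group_size: str  # battery group size, used here as the SKU
    zip_code: str    # zip code of the store
    units: int       # gross unit sales


def summarize_skus(records: Iterable[PosRecord]) -> List[Tuple[str, int, int]]:
    """Return (SKU, total units, distinct zip codes), highest sales first."""
    units = defaultdict(int)
    zips = defaultdict(set)
    for rec in records:
        units[rec.group_size] += rec.units
        zips[rec.group_size].add(rec.zip_code)
    ranked = sorted(units, key=lambda sku: units[sku], reverse=True)
    return [(sku, units[sku], len(zips[sku])) for sku in ranked]


# Made-up rows for illustration only; they are not taken from the thesis data.
sample = [
    PosRecord("2014-06-22", "65", "02134", 12),
    PosRecord("2014-06-22", "65", "77002", 20),
    PosRecord("2014-06-29", "65", "90001", 15),
    PosRecord("2014-06-22", "24F", "02134", 18),
    PosRecord("2014-06-29", "24F", "02134", 9),
    PosRecord("2014-06-22", "H6", "60601", 5),
]
summary = summarize_skus(sample)
```

In this toy sample the SKU with the highest total also appears in the most zip codes, which is the kind of candidate that both high sales volume and wide geographic coverage would point to.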
[Figure 3-2. POS data composition: spreadsheet excerpt with columns JCI Fiscal Week Date, Segment Description, Group Size, Zip Code, and Gross Unit Sales; sample rows dated 6/22/2014, segment PASSENGER LIGHT TRUCK/SUV]

The various components illustrated in Figure 3-2 provide information about various parameters. For example, date of sales shows consumers' buying patterns on a temporal basis. Zip code shows different buying patterns geographically. Figure 3-3 and Figure 3-4 show the sales of various SKUs over time and geography. From these, our intention is to select one SKU with high sales as well as geographical prevalence.

[Figure 3-3. SKU sales by time: weekly Store Gross Unit Sales by Store Group Size, Week of Date (2014)]

[Figure 3-4. Zip code of sales: map of Store Gross Unit Sales by zip code]

In this thesis POS data from three different vendors is included, thus increasing the data set and also covering the entire geographical US. However, analyzing all the combined data is not only cumbersome but also less impactful, as various SKUs may behave differently. Additionally, some SKUs may not be geographically prevalent, and thus information from these SKUs may not be applicable to understanding the impact of temperature.
Thus, the most meaningful SKU needs to be identified to perform further analysis.

SKU identification: top 10 sales, widespread location

To identify the relevant SKU, it is desirable to select the most useful and representative data among all the SKUs. Our main criterion in selecting the SKU was that it should have a large enough dataset to ensure that the model was reliable. Another criterion was that the selected SKU should be widespread enough geographically to incorporate temperature diversity and thus reveal the impact of temperature. Our rationale was that geographic spread would indicate temperature diversity and create a robust model.

The dataset includes more than 30 different group sizes, and we assumed that very few people buy more than two types of batteries (SKUs). We assumed that each household uses batteries of the same type, as few households own both a car and a much larger vehicle such as a bus. Therefore, sales volume is indicative of the most widely used SKU, and identifying the SKU with the highest sales is step one of the data analysis. With this background, the chart below shows the top 10 SKUs by sales. Based on this information, we selected SKU 65, as illustrated in Figure 3-5, for further analysis and to identify geographical prevalence.

[Figure 3-5. Top 10 sales by SKU: bar chart of Store Gross Unit Sales by group size, with SKU 65 the highest]

Next, in order to visualize the geographical prevalence of SKU 65, we used the visualization software Tableau. Tableau is a visualization and business intelligence software developed by Tableau Software. Tableau enables visualization of huge data sets, and meaningful insights can be derived from this analysis.

We visualized sales information for all the SKUs across all retailers. This enabled us to identify the SKU with the highest sales and helped us determine whether that particular SKU was prevalent across the US. Figure 3-6 shows the sales of SKU 65 geographically, and based on Figure 3-6 we can conclude that SKU 65 is sold throughout the US.

[Figure 3-6. Geographical sales of SKU 65: map by zip code; color shows sum of Total Units]

Finally, the POS data of SKU 65 for a particular region was aggregated. For example, the POS data for the Boston region consisted of the several zip codes shown in the map below. This information was aggregated because the climatic conditions within a particular metropolitan area are similar.

[Figure 3-7. Aggregated sales in Boston area: map by zip code; color shows sum of Total Units]

City selection: 5 cities based on sales and temperature profile

Based on the empirical information on temperature, 5 metropolitan areas were selected. The selection of the cities was based on the following criteria:
* Mix of cities with and without temperature variation
* Cities where batteries from SKU 65 are sold

The following cities were selected, as shown in Figure 3-8:
* Los Angeles
* Boston
* Washington D.C.
* Chicago
* Houston

[Figure 3-8. Sales of 5 cities]

Normalizing sales

Finally, the aggregated sales information from each metropolitan area needs to be normalized, as the sales are dependent on the total vehicles in operation (VIO) in a particular metropolitan area. The fraction of VIO in the specific metropolitan area is determined by the following equation.
$$\text{Normalization factor} = \frac{\text{Total VIO in USA}}{\text{Total drivers in USA}} \times \frac{\text{Total drivers in USA}}{\text{Total US population}} \times \text{Population of specific metro area}$$

$$\text{Normalized sales} = \frac{\text{Unit sales in specific metro area}}{\text{Normalization factor}}$$

The first two ratios multiply out to vehicles in operation per capita, so the normalization factor estimates the VIO (in millions) of the metropolitan area. Table 3-2 is consistent with unit sales being divided by this factor: for Boston, 8,073 / 3.57 ≈ 2,261, which matches the tabulated 2,264 up to rounding of the factor.

Table 3-2. Normalized sales
City      Total Population (M)   Normalized factor   Total Units   Total Units (Normalized)
Boston    4.5                    3.57                8,073         2,264
Chicago   9.52                   7.54                34,211        4,535
DC        5.86                   4.64                20,813        4,482
Houston   6.18                   4.90                68,855        14,060
LA        18.2                   14.42               89,224        6,187
Total     44.26                                      221,176       31,528

Temperature

Temperature data from NOAA for last 5 years (2010 - 2014)

Temperature data for each of these cities was obtained for 2010 to 2014. The data consisted of maximum and minimum temperatures, since battery failure occurs at temperature extremes. Two levels of aggregation were performed on the temperature data. The first was aggregation from daily to weekly temperatures, to match the weekly POS data provided by our thesis company. The second was aggregation across the weather stations in a metropolitan region, as the temperature data consisted of readings from many weather stations. For example, the temperature data for the Boston region consisted of daily temperatures at Foxboro, Logan Airport, and 23 other stations. Before aggregating, the temperature patterns of these stations were evaluated, and they were found to be similar, as illustrated in Figure 3-9. Hence, the weekly maximum and minimum temperatures of these stations were averaged to represent the entire Boston area, as shown in Figure 3-10.

Figure 3-9.
Temperature profiles of 25 stations in Boston area (daily temperatures by station, January 2012)

Figure 3-10. Average weekly temperature of the entire Boston area (weekly average of TMAX and TMIN)

This procedure was performed for all the other metropolitan areas identified.

Figure 3-11. Location of regions removed

For the Los Angeles region, data from the Mount Wilson, Chilao, Mill Creek and Clear Creek stations, shown in Figure 3-11, were removed from the calculation of average temperature. The temperature patterns at these stations were different from those at the other stations, as shown in Figure 3-12. These stations lie in national forest and thus do not have a representative number of automobiles that may require battery sales. Hence, this temperature information can be safely removed without impacting the aggregated sales for this metropolitan area. Figure 3-13 shows the average aggregated maximum and minimum temperature for the LA area.

Figure 3-12. 5 Stations not applicable to temperature aggregation (daily temperatures, February to March 2012)

Figure 3-13. Average weekly temperature of the entire LA area (weekly average of TMAX and TMIN)

For Houston, DC and Chicago, temperature showed very similar patterns across all stations and did not show any anomalies, as shown in Figures 3-14, 3-16 and 3-18. The average maximum and minimum temperatures are shown in Figures 3-15, 3-17 and 3-19.
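The two aggregation levels described above (daily to weekly, and across the stations of one metropolitan area, with unrepresentative stations left out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the `(station, day, tmax, tmin)` tuple layout, the function names, and the station names are hypothetical; weeks are assumed to start on Sunday; and the weekly value is a pooled mean over all station-days rather than a mean of station means.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Sunday that starts the week containing `day` (assumed convention)."""
    # date.weekday(): Monday = 0 ... Sunday = 6
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_metro_average(readings, excluded_stations=()):
    """Pool daily station readings into one weekly (mean TMAX, mean TMIN) pair.

    readings: iterable of (station, day, tmax, tmin) tuples (hypothetical layout).
    excluded_stations: stations to leave out, e.g. unrepresentative mountain sites.
    Returns a dict {week_start_date: (mean_tmax, mean_tmin)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of tmax, sum of tmin, count]
    for station, day, tmax, tmin in readings:
        if station in excluded_stations:
            continue
        acc = sums[week_start(day)]
        acc[0] += tmax
        acc[1] += tmin
        acc[2] += 1
    return {wk: (s[0] / s[2], s[1] / s[2]) for wk, s in sums.items()}


# Made-up readings for illustration only (station names are placeholders).
readings = [
    ("STATION_A", date(2012, 1, 1), 5.0, -3.0),
    ("STATION_B", date(2012, 1, 1), 3.0, -5.0),
    ("STATION_A", date(2012, 1, 2), 7.0, -1.0),
    ("HILLTOP", date(2012, 1, 2), -10.0, -20.0),  # excluded below
    ("STATION_A", date(2012, 1, 8), 1.0, -7.0),
]
weekly = weekly_metro_average(readings, excluded_stations={"HILLTOP"})
for wk in sorted(weekly):
    print(wk, weekly[wk])
```

Keeping the exclusion list as a parameter lets the same function serve every metropolitan area, with an empty list for the areas where no station was removed.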
Figure 3-14. Temperature profiles of 10 stations in Houston area (daily temperatures by station, December 2011 to March 2012)

Figure 3-15. Average weekly temperature of the entire Houston area (weekly average of TMAX and TMIN)

Figure 3-16. Temperature profiles of 15 stations in DC area (daily temperatures by station, October 2012 to January 2013)

Figure 3-17. Average weekly temperature of the entire DC area (weekly average of TMAX and TMIN)

Figure 3-18.
Temperature profiles of 10 stations in Chicago area (daily temperatures by station, October 2012 to February 2013)

Figure 3-19. Average weekly temperature of the entire Chicago area (weekly average of TMAX and TMIN)

3.3 Modeling (data from 2010 - 2014)

JMP

JMP is statistical software from SAS that enables identification of quantitative relationships between variables. This is needed for our research, as our aim was to identify the relationship between sales and temperature.

Regression Analysis

The "Fit Model" function of JMP was used to create a regression model. We used the following parameters in the model:

1. Dependent variable (Y-parameter): Normalized sales, as a continuous parameter.
2. Independent variables (X-parameters):
* Minimum temperature, as a continuous parameter
* Maximum temperature, as a continuous parameter
* Year, as ordinal
* Quarter, as ordinal

The model was created iteratively by plugging in a combination of X-variables and then checking R². The adjusted R² was also checked to ensure that the model had an appropriate number of variables. The residuals (actual minus predicted) were plotted by row to ensure that there were no patterns. Finally, we used a significance level of 0.05: based on the p-values in the parameter estimates, all parameters with p-value > 0.05 were removed. Parameters were removed starting with the higher-order terms (e.g. second-order terms before first-order terms) and then those with the highest p-value.

The combinations of independent variables used in the model are shown below in Models 1 through 6.
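To make the fitting step concrete, the sketch below fits a regression of the same general form by ordinary least squares and reports R². It is a minimal illustration and does not reproduce JMP's Fit Model: it assumes effect coding for Quarter (the four quarter effects sum to zero), centers the squared term at the mean minimum temperature in the way JMP displays polynomial terms, and solves the normal equations directly. All function names and the synthetic data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(tmin, quarter, tmin_center):
    """Columns: intercept, effect-coded Quarter (4Q is the -1 row), Tmin, centered Tmin^2."""
    if quarter == 4:
        q = [-1.0, -1.0, -1.0]
    else:
        q = [1.0 if quarter == k else 0.0 for k in (1, 2, 3)]
    return [1.0] + q + [float(tmin), (tmin - tmin_center) ** 2]


def fit_model(tmins, quarters, sales):
    """Least-squares fit via the normal equations; returns (coefficients, center, R^2)."""
    center = sum(tmins) / len(tmins)
    x = [design_row(t, q, center) for t, q in zip(tmins, quarters)]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, sales)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    mean_y = sum(sales) / len(sales)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(sales, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in sales)
    return beta, center, 1.0 - ss_res / ss_tot


# Synthetic, noise-free data generated from made-up coefficients (illustration only).
tmins = [-10, -5, 0, 5, 10, 15, 20, 25, -8, 3, 12, 22]
quarters = [1, 2, 3, 4] * 3
true_beta = [1.0, 15.0, -7.0, -20.0, 2.4, 0.09]
center = sum(tmins) / len(tmins)
sales = [sum(b * v for b, v in zip(true_beta, design_row(t, q, center)))
         for t, q in zip(tmins, quarters)]
beta, tmin_center, r2 = fit_model(tmins, quarters, sales)
print([round(b, 4) for b in beta], round(r2, 6))
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients; with real POS data the residuals and p-values would then drive the term-removal rule described above.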
Model 1:
Predictor parameters included: TMIN, Quarter, Year, interaction of TMIN and Quarter, quadratic and cubic effect of TMIN. Also, data from the first quarter of all years was removed to check if there was any variation due to first quarter sales.
Predicted parameter: Normalized sales

Summary of Fit
RSquare                     0.643513
RSquare Adj                 0.632911
Root Mean Square Error      13.04762
Mean of Response            22.76971
Observations (or Sum Wgts)  278

Parameter Estimates
Term                                                                           Estimate    Std Error  t Ratio  Prob>|t|
Intercept                                                                      -6642.507   1512.002   -4.39    <.0001*
Quarter[2Q]                                                                    -3.996451   1.409585   -2.84    0.0049*
Quarter[3Q]                                                                    -8.953911   1.746973   -5.13    <.0001*
Year                                                                           3.2984988   0.751347   4.39     <.0001*
Average of TMIN                                                                1.6265907   0.260338   6.25     <.0001*
(Average of TMIN-11.8011)*Quarter[2Q]                                          0.0513622   0.209286   0.25     0.8063
(Average of TMIN-11.8011)*Quarter[3Q]                                          -1.536263   0.333458   -4.61    <.0001*
(Average of TMIN-11.8011)*(Average of TMIN-11.8011)                            0.282693    0.022627   12.49    <.0001*
(Average of TMIN-11.8011)*(Average of TMIN-11.8011)*(Average of TMIN-11.8011)  0.0090016   0.001619   5.56     <.0001*

Figure 3-20. Diagnostics of Model 1
Model 2:
Predictor parameters included: TMIN, Quarter, Year, interaction of TMIN and Quarter, quadratic and cubic effect of TMIN
Predicted parameter: Normalized sales

Summary of Fit
RSquare                     0.534081
RSquare Adj                 0.521523
Root Mean Square Error      14.65589
Mean of Response            23.20997
Observations (or Sum Wgts)  382

Parameter Estimates
Term                                          Estimate    Std Error  t Ratio  Prob>|t|
Intercept                                     -7422.553   1458.841   -5.09    <.0001*
Quarter[1Q]                                   13.580965   2.256572   6.02     <.0001*
Quarter[2Q]                                   -7.279265   1.775078   -4.10    <.0001*
Quarter[3Q]                                   -9.702471   3.31191    -2.93    0.0036*
Year                                          3.6911282   0.724865   5.09     <.0001*
Tmin                                          1.0795884   0.219283   4.92     <.0001*
(Tmin-7.84418)*Quarter[1Q]                    1.9073062   0.343222   5.56     <.0001*
(Tmin-7.84418)*Quarter[2Q]                    -0.822085   0.235004   -3.50    0.0005*
(Tmin-7.84418)*Quarter[3Q]                    -1.972197   0.444074   -4.44    <.0001*
(Tmin-7.84418)*(Tmin-7.84418)                 0.2077185   0.018722   11.09    <.0001*
(Tmin-7.84418)*(Tmin-7.84418)*(Tmin-7.84418)  0.0037157   0.000758   4.90     <.0001*

Figure 3-21. Diagnostics of Model 2

Model 3:
Predictor parameters included: TMIN, Tmax, Quarter, Year, interaction of TMIN and Quarter, and quadratic effect of TMIN
Predicted parameter: Normalized sales

Summary of Fit
RSquare                     0.505329
RSquare Adj                 0.491996
Root Mean Square Error      15.10133
Mean of Response            23.20997
Observations (or Sum Wgts)  382

Parameter Estimates
Term                           Estimate    Std Error  t Ratio  Prob>|t|
Intercept                      -7278.905   1509.524   -4.82    <.0001*
Quarter[1Q]                    16.941799   2.210039   7.67     <.0001*
Quarter[2Q]                    -5.665866   1.796066   -3.15    0.0017*
Quarter[3Q]                    -16.4209    3.09545    -5.30    <.0001*
Year                           3.6191208   0.750281   4.82     <.0001*
Tmax                           -0.450572   0.432796   -1.04    0.2985
Tmin                           2.3367081   0.443009   5.27     <.0001*
Quarter[1Q]*(Tmin-7.84418)     1.6647715   0.351325   4.74     <.0001*
Quarter[2Q]*(Tmin-7.84418)     -0.960886   0.242112   -3.97    <.0001*
Quarter[3Q]*(Tmin-7.84418)     -1.032831   0.411295   -2.51    0.0125*
(Tmin-7.84418)*(Tmin-7.84418)  0.1584872   0.016245   9.76     <.0001*

Figure 3-22.
Diagnostics of Model 3

Model 4:
Predictor parameters included: TMIN, Tmax, Quarter, Year, interaction of TMIN and Quarter, and quadratic effect of TMIN.
Predicted parameter: the square of normalized sales was predicted, instead of just the normalized sales.

Summary of Fit
RSquare                     0.373686
RSquare Adj                 0.361964
Root Mean Square Error      1623.774
Mean of Response            986.4411
Observations (or Sum Wgts)  382

Parameter Estimates
Term                           Estimate    Std Error  t Ratio  Prob>|t|
Intercept                      -491795.6   161333.9   -3.05    0.0025*
Quarter[1Q]                    947.31326   203.2903   4.66     <.0001*
Quarter[2Q]                    -258.2701   149.8081   -1.72    0.0855
Quarter[3Q]                    -1531.704   198.3015   -7.72    <.0001*
Year                           243.91716   80.19875   3.04     0.0025*
Tmax                           -34.96315   45.37297   -0.77    0.4414
Tmin                           202.7532    46.62874   4.35     <.0001*
(Tmin-7.84418)*(Tmin-7.84418)  9.2462785   0.941371   9.82     <.0001*

Figure 3-23. Diagnostics of Model 4

Model 5:
Predictor parameters included: TMIN, Quarter, and quadratic effect of TMIN
Predicted parameter: Normalized sales

Summary of Fit
RSquare                     0.409177
RSquare Adj                 0.406234
Root Mean Square Error      17.37928
Mean of Response            31.21604
Observations (or Sum Wgts)  1010

Parameter Estimates
Term                           Estimate    Std Error  t Ratio  Prob>|t|
Intercept                      0.9253695   1.320921   0.70     0.4837
Quarter[1Q]                    15.180511   1.208659   12.56    <.0001*
Quarter[2Q]                    -7.41504    0.979943   -7.57    <.0001*
Quarter[3Q]                    -19.86524   1.264733   -15.71   <.0001*
Tmin                           2.4464393   0.095312   25.67    <.0001*
(Tmin-9.62627)*(Tmin-9.62627)  0.0869418   0.006103   14.25    <.0001*

Figure 3-24.
Diagnostics of Model 5

Model 6:
Predictor parameters included: TMIN, Quarter
Predicted parameter: Normalized sales

Summary of Fit
RSquare                     0.289736
RSquare Adj                 0.286909
Root Mean Square Error      19.04569
Mean of Response            31.21604
Observations (or Sum Wgts)  1010

Parameter Estimates
Term         Estimate    Std Error  t Ratio  Prob>|t|
Intercept    13.399226   1.08389    12.36    <.0001*
Quarter[1Q]  15.59257    1.324172   11.78    <.0001*
Quarter[2Q]  -9.134888   1.065725   -8.57    <.0001*
Quarter[3Q]  -14.26195   1.317281   -10.83   <.0001*
Tmin         1.8862826   0.09515    19.82    <.0001*

Figure 3-25. Diagnostics of Model 6

Model Discussion

Let us discuss each of the six models we have created.

Model 1: This model, shown in Figure 3-20, provides the best R² value, but it uses a cubic function of minimum temperature. Additionally, the data from the 1st quarter of all years was removed to check whether quarter would impact the sales. Based on these results, we observe that quarter does impact the results, but removing the 1st-quarter sales data and including a cubic expression of Tmin may not be warranted.

Model 2: In this model, shown in Figure 3-21, the sales data from the 1st quarter of all years is included, but the model expression is kept the same as in model 1. As the R² is lower than in model 1, this implies that the 1st quarter induces more variability in the overall data. Additionally, we still have a cubic expression and an interaction of temperature and quarter, both of which may be unwarranted.

Model 3: In this model, shown in Figure 3-22, the cubic expression from model 2 is dropped, but the interaction is still kept in the model. Additionally, the expression includes the maximum temperature. Although the R² is decent, the inclusion of the interaction and the maximum temperature may not be warranted.
Model 4: In this model, shown in Figure 3-23, the expression for the independent variables is kept the same, but the dependent variable, normalized sales, is transformed (squared) to check whether this yields a better fit. The fit does not improve; in fact, the R² is reduced, and hence the transformation of normalized sales is not justifiable.

Model 5: This model, shown in Figure 3-24, includes the quadratic effect of minimum temperature and does not include either the interaction or the cubic effect. We observe that this model achieves a more modest R² value but has the advantage of being more parsimonious, i.e. using fewer variables. The use of a quadratic term for temperature may be warranted by the fact that the relationship between temperature and battery life (and sales) is not linear.

Model 6: This model, shown in Figure 3-25, is a further simplification of model 5 and does not include the quadratic effect. We observe that the R² is further reduced. Also, as discussed for model 5, the relationship between battery life (sales) and temperature may not be linear, which is supported by the poorer fit of this model.

Of these 6 models, models 1 and 2 include the quadratic and cubic effects of Tmin as well as the interaction of Tmin and quarter. Hence, even though the R² for these models is higher than 50%, we did not select them, as they may be overfitting due to the inclusion of additional variables. Model 3 includes the interaction, and model 4 adds further complication by transforming the sales. Thus we did not select these models either. Model 6, on the other hand, oversimplifies: it uses only the linear relationship between temperature and sales and thus has a lower R² and less predictive power.

From model 5 we can see that temperature is indeed a predictor of sales. Notice that it is the minimum and not the maximum temperature that is the best predictor.
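The comparison between fit and parsimony can be checked numerically from the reported summaries. The sketch below recomputes adjusted R² from each model's R², number of observations, and number of estimated terms (excluding the intercept), using the standard adjusted R² formula. The term counts are read off the parameter-estimate tables in Figures 3-20 to 3-25; the function name and the data structure are illustrative assumptions.

```python
def adjusted_r2(r2, n_obs, n_terms):
    """Standard adjusted R^2 for a model with n_terms estimated terms besides the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)


# (R^2, observations, terms excluding intercept), as reported in Figures 3-20 to 3-25.
MODELS = {
    "Model 1": (0.643513, 278, 8),
    "Model 2": (0.534081, 382, 10),
    "Model 3": (0.505329, 382, 10),
    "Model 4": (0.373686, 382, 7),
    "Model 5": (0.409177, 1010, 5),
    "Model 6": (0.289736, 1010, 4),
}

recomputed = {name: adjusted_r2(r2, n, k) for name, (r2, n, k) in MODELS.items()}
for name, (r2, n, k) in MODELS.items():
    print(f"{name}: terms={k:2d}  R2={r2:.4f}  adjusted R2={recomputed[name]:.4f}")
```

The recomputed values agree with the RSquare Adj figures reported by JMP to about six decimal places, which also serves as a consistency check on the term counts of each model.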
A model that uses the minimum temperature, both linear and squared, along with the quarter, like model 5 above, seems to offer a good compromise between predictive power and parsimony. Hence model 5 was selected from the above 6 models.

4. Validation of the approach

Is there a way to validate the approach used to generate model 5 as a predictor of sales based on temperature? There is one way: to use it with new data. The sponsor company can apply it with new data. This, however, takes time. Is there a way to validate the approach of model 5 now? There may be a way: to use only part of the data, instead of all the data, to generate a model, which is then applied to predict the values in the rest of the data. Data can be segregated for this exercise either in time or in geography.

Thus, two models were created, as indicated in Table 4-1: one with three cities (Chicago, LA and Houston) with data from 2011 to 2014, and another with data from 2011, 2013 and 2014 for all the cities. This was done because we wanted not only to create the model but also to validate it. One option was to use all five cities to develop the model, but then we would have no way to validate it unless we obtained additional data. Instead, we decided to use the data from three cities to create a model, and then use the data from the other two cities to validate the model. Additionally, to validate the model across time, we decided to use data from three years to create a model and then use the data from the remaining year to validate it.

Table 4-1. Validation of models
Model  City                                              Year                       Purpose
G      Chicago, LA and Houston                           2011, 2012, 2013 and 2014  Validate the model for Boston and Washington D.C.
T      Boston, Washington D.C., Chicago, LA and Houston  2011, 2013 and 2014        Validate the model for 2012

We chose Chicago, LA and Houston because these cities encompassed the range of minimum and maximum temperatures seen across the five cities, as shown in Table 4-2, and thus the model could be used to predict the sales in Boston and Washington D.C.

Table 4-2: Range of minimum Temperatures across the cities
City     Higher end of minimum Temperature (°C)  Lower end of minimum Temperature (°C)
Boston   21.7                                    -14.0
LA       21.5                                    4.8
DC       24.2                                    -10.5
Houston  26.1                                    -1.3
Chicago  24.0                                    -17.2

Similarly, the years 2011, 2013 and 2014 were chosen for Model T, as these years encompassed the range of minimum and maximum temperatures across the four years, as shown in Table 4-3. Hence the data from 2011, 2013 and 2014 could be used to predict the sales in 2012.

Table 4-3: Minimum and Maximum Temperatures across the years
Year  Maximum Temperature (°C)  Minimum Temperature (°C)
2011  26.1                      -14.0
2012  25.0                      -10.1
2013  24.9                      -14.3
2014  24.6                      -17.2

The first model (Model G) provides an understanding of fit in terms of geography, as this model was developed with data for Houston, Los Angeles and Chicago. The second model (Model T) provides an understanding in terms of time, as this model was developed with data from 2011, 2013 and 2014.

4.1 Model G diagnostics

The model diagnostics for Model G are shown in Figure 4-1.

Summary of Fit
RSquare                     0.405144
RSquare Adj                 0.400187
Root Mean Square Error      18.45579
Mean of Response            40.89451
Observations (or Sum Wgts)  606

Figure 4-1: Model diagnostics for Model G: R² (regression plot of normalized sales against Average of TMIN, by quarter)

As illustrated in Figure 4-1, the R² and adjusted R² for Model G are both approximately 40%.
This implies that about 40% of the variability in sales is explained by the variables in this model. The Pareto chart in Figure 4-2 illustrates the relative significance of each parameter in the model. Figure 4-2 shows that the minimum temperature (Tmin) and the quadratic effect of Tmin are the most important variables in the model.

Pareto Plot of Transformed Estimates
Term                                                 Orthog Estimate
Average of TMIN                                      11.61524
(Average of TMIN-11.4528)*(Average of TMIN-11.4528)  9.26380
Quarter[3Q]                                          1.93990
Quarter[2Q]                                          -1.93182
Quarter[1Q]                                          -1.20946

Figure 4-2: Pareto Plot for Model G

Additionally, from Figure 4-3, which lists the parameter estimates, it can be observed that all the predictors are statistically significant; only the intercept is not (p = 0.2315).

Parameter Estimates
Term                                                 Estimate    Std Error  t Ratio  Prob>|t|
Intercept                                            2.5165218   2.100989   1.20     0.2315
Quarter[1Q]                                          14.401557   1.575935   9.14     <.0001*
Quarter[2Q]                                          -8.036456   1.324293   -6.07    <.0001*
Quarter[3Q]                                          -18.52613   1.690849   -10.96   <.0001*
Average of TMIN                                      2.7718893   0.140517   19.73    <.0001*
(Average of TMIN-11.4528)*(Average of TMIN-11.4528)  0.0952339   0.007707   12.36    <.0001*

Figure 4-3: Parameter Estimates for Model G

Finally, the expression in Figure 4-4 describes the quantitative relationship between sales, temperature and quarter. The significance of quarter implies that even though sales are impacted by temperature, the impact also depends on the quarter. Therefore, for the same minimum temperature, sales could vary by quarter. This may indicate that customer behavior differs between quarters or that the mechanism of failure, i.e. the physicochemical mechanism, differs between quarters. It further suggests that other climatic factors, such as humidity, or the age of the battery may additionally influence the failure rate. Another inference is that the third quarter would have the lowest sales and the first and fourth quarters would have the highest sales.
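The Model G prediction expression reported in Figure 4-4 can be evaluated directly. The sketch below transcribes its coefficients into a small function and also locates the temperature at which the fitted quadratic bottoms out. The function and constant names are hypothetical, and the vertex calculation is an illustration derived from the reported coefficients rather than an analysis performed in the thesis.

```python
# Coefficients transcribed from the Model G prediction expression (Figure 4-4).
INTERCEPT_G = 2.51652175403159
QUARTER_EFFECT_G = {
    "1Q": 14.4015568297034,
    "2Q": -8.0364561076666,
    "3Q": -18.526133413611,
    "4Q": 12.1610326915738,
}
LINEAR_G = 2.77188931917744      # coefficient of Average of TMIN
QUADRATIC_G = 0.09523385467006   # coefficient of the centered squared term
TMIN_CENTER_G = 11.452794878231  # centering value used in the squared term


def predict_normalized_sales(tmin_c, quarter):
    """Predicted weekly normalized sales for a weekly average TMIN (deg C) and a quarter."""
    centered = tmin_c - TMIN_CENTER_G
    return (INTERCEPT_G + QUARTER_EFFECT_G[quarter]
            + LINEAR_G * tmin_c + QUADRATIC_G * centered ** 2)


def bottom_out_temperature():
    """TMIN minimizing the fitted curve: set the derivative b + 2c(T - center) to zero."""
    # The quarter term is additive, so it shifts the level of the minimum, not its location.
    return TMIN_CENTER_G - LINEAR_G / (2.0 * QUADRATIC_G)


t_star = bottom_out_temperature()
for quarter in ("1Q", "2Q", "3Q", "4Q"):
    print(quarter, round(predict_normalized_sales(t_star, quarter), 2))
print("bottom-out TMIN (deg C):", round(t_star, 2))
```

Under these coefficients the curve bottoms out at roughly -3.1 °C, and predicted sales rise on either side of that temperature within each quarter.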
However, this could just be a manifestation of the temperature, as the low minimum temperatures during the 1st and 4th quarters may trigger the higher sales. Furthermore, the quadratic effect implies that sales bottom out at a certain temperature and increase again toward the other temperature extreme. However, because quarter enters the model as an additive term, the level at which sales bottom out will be different for each quarter.

$$\text{Normalized sales} = 2.51652175403159 + q(\text{Quarter}) + 2.77188931917744\,\text{Tmin} + 0.09523385467006\,(\text{Tmin} - 11.452794878231)^2$$

$$q(\text{Quarter}) = \begin{cases} 14.4015568297034 & \text{Quarter} = \text{1Q} \\ -8.0364561076666 & \text{Quarter} = \text{2Q} \\ -18.526133413611 & \text{Quarter} = \text{3Q} \\ 12.1610326915738 & \text{Quarter} = \text{4Q} \end{cases}$$

Here Tmin denotes the weekly Average of TMIN and q(Quarter) is the Match[Quarter] term of the JMP expression, which is missing for any other value.

Figure 4-4: Prediction Expression for Model G

4.2 Model T diagnostics

As discussed previously, comparing the R² and adjusted R² provides a measure of the variability explained and of whether the model is overfitted, as shown in Figure 4-5.

Summary of Fit
RSquare                     0.39803
RSquare Adj                 0.393957
Root Mean Square Error      17.79322
Mean of Response            31.22096
Observations (or Sum Wgts)  745

Figure 4-5: Model diagnostics for Model T: R² (regression plot of normalized sales against Average of TMIN, by quarter)

The Pareto chart and the prediction expression are shown below in Figures 4-6 and 4-7.

Pareto Plot of Transformed Estimates
Term                                                 Orthog Estimate
Average of TMIN                                      11.61180
(Average of TMIN-9.41304)*(Average of TMIN-9.41304)  8.10596
Quarter[2Q]                                          -1.97868
Quarter[1Q]                                          -1.52138
Quarter[3Q]                                          0.93953

Figure 4-6: Pareto Plot for Model T

Based on this, it can be concluded that both minimum temperature and the quadratic effect of minimum temperature are more important than quarter. Otherwise, the conclusions that can be derived from Model T are very similar to those from Model G.

$$\text{Normalized sales} = 1.52016479823857 + q(\text{Quarter}) + 2.40942411949481\,\text{Tmin} + 0.08600801465012\,(\text{Tmin} - 9.41303712902415)^2$$

$$q(\text{Quarter}) = \begin{cases} 15.3359960316323 & \text{Quarter} = \text{1Q} \\ -7.5992214579832 & \text{Quarter} = \text{2Q} \\ -20.200397225495 & \text{Quarter} = \text{3Q} \\ 12.4636226518457 & \text{Quarter} = \text{4Q} \end{cases}$$

Figure 4-7: Prediction Expression for Model T

4.3 Insights on the validity of the approach

Based on Figures 4-8 to 4-10, the approach of forecasting sales with a temperature-based model is more robust across time than across geography. We also notice that, for both geography and time, the trends of actual and predicted sales are the same. However, the difference between actual and predicted values is much larger for Model G, for both Boston and Washington D.C., than for Model T in 2012. This information can be used to prioritize what data to use when refining the model further. For example, if equal amounts of additional data are available for geography and for time, then the data from additional geographical locations should be used.

Figure 4-8: Model validation for Boston (Model G) (actual and predicted sales by quarter, 2011 to 2014)

Figure 4-9: Model validation for Washington D.C. (Model G) (actual and predicted sales by quarter, 2011 to 2014)

Figure 4-10: Model validation for Year 2012 (Model T) (actual and predicted sales by quarter, 2012)

Additionally, it also implies that there is more variation across geographies, and thus a model generated from data for one region cannot be extrapolated to another region. In this case, data from the West Coast, South and Midwest regions was used for the model and validated against the East Coast cities, Boston and Washington D.C. Based on the actual vs.
predicted plots, this implies that the model needs to be built by region to increase predictability.

Based on Figures 4-8 and 4-9, we see that the patterns of predicted and actual sales in both Boston and Washington D.C. are similar, but the absolute values are different. Hence, we normalized both the predicted and actual sales based on the overall averages of predicted and actual sales for the duration. This helped us understand how well the model predicts change over time. Based on Figures 4-11 and 4-12, we observe that the % change from average is similar for predicted and actual sales. This illustrates that the direction and magnitude of change can be predicted by the model, but the absolute value of sales cannot. It also illustrates that, even though the sales were normalized by vehicles in operation, a difference remains due to factors such as demographics, public transportation and other local preferences.

Figure 4-11: Model validation for Boston (Model G) based on change (average actual and predicted sales, normalized, by quarter, 2011 Q1 to 2014 Q4)

Figure 4-12: Model validation for Washington D.C. (Model G) based on change (average actual and predicted sales, normalized, by quarter, 2011 Q1 to 2014 Q4)

5. Conclusion and Future Work

In this study, we established a correlation between sales and temperature to explain the variability in battery sales. Based on the results from the model, we found that there is a linear and quadratic relationship between the minimum temperature and battery sales.
Additionally, based on the model validation for geography and time, we determined that the model is more robust across time than across geography. This helps prioritize resources when refining the model with additional data.

What this means for our sponsor company is that they will be able to use temperature data to improve their sales forecast. This can be done by developing models that use historical data on the minimum temperature in a region and the point-of-sale data of a given SKU in that region to predict future sales of that SKU as a function of future minimum temperature. The model can be developed using multiple regression, with the quarter and the minimum temperature as predictors. The minimum temperature in the model is related to sales both linearly and quadratically.

Based on these results, our thesis sponsor can further refine the model by adding sales and temperature information from additional geographies. Another factor, such as the age of the battery, can also be added to further refine the model. The age of the battery can be estimated from the results of a small customer survey in a representative metropolitan area.

This additional understanding of the impact of temperature on the sales forecast allows firms not only to respond quickly to customer needs but also to reduce inventory costs, ultimately increasing their profits. Furthermore, this understanding of battery failure, and thus of sales, represents a causal factor analysis for improving sales forecasts of automotive batteries.