Sample Project 2

This paper analyzes sales of autos and light trucks in the United States using data obtained from USDATA on the website, covering the period from 1992 to 2003. Vehicle sales serve as the dependent variable, and we attempt to explain them with three explainer variables: personal consumption expenditures, disposable personal income, and the civilian unemployment rate. We begin with a literature review that gives some general information on vehicle sales in the United States and then describe each of the four variables. After describing each variable, we examine any linkages that may exist between them.

In October 1992, Thompson and Noordewier published a study in the Journal of Business and Economic Statistics titled Estimating the Effects of Consumer Incentive Programs on Domestic Automobile Sales. Although the study dates from the early nineties and conditions have shifted over the past fifteen years, it still has merit for this paper. The authors look for connections between “patterns of sales gained during the promotions and sales lost during any post promotion troughs” (410). Over a period of four years they examined sales patterns for Ford Motor Company, GM, and Chrysler Corporation. Thompson and Noordewier make an important point that ties into this study: “industry observers…and other researchers…suggest that the mean may be predicted by sales of import autos and by macroeconomic variables such as gross national product, interest rates, disposable income, and unemployment rates” (411). The authors test their data in a number of ways, including intervention analysis, and ultimately conclude that there was a significant change in consumer response to promotions run during the time periods analyzed.
In addition, they found that because automobiles are a durable good, frequent promotion of such a good can produce a pattern that is less predictable than that of a nondurable good (like toothpaste and cookies), and that ultimately more research is required (417).

Having looked at this study, we now turn to the variables analyzed in this paper. The first key variable is our dependent variable, vehicle sales within the United States. The data span 1992 to 2003 and are reported in thousands of units at a seasonally adjusted annual rate (SAAR). The frequency distribution below gives an overview of the vehicle sales data.

Frequency Table
Vehicle Sales: Autos and light trucks (Thousands of units, SAAR)

Vehicle Sales        Frequency   Percentage   Cumulative %
> 11000 to 12000          0         0.00%         0.00%
> 12000 to 13000         10         7.52%         7.52%
> 13000 to 14000          8         6.02%        13.53%
> 14000 to 15000         34        25.56%        39.10%
> 15000 to 16000         33        24.81%        63.91%
> 16000 to 17000         29        21.80%        85.71%
> 17000 to 18000         12         9.02%        94.74%
> 18000 to 19000          6         4.51%        99.25%
> 19000 to 20000          0         0.00%        99.25%
> 20000 to 21000          0         0.00%        99.25%
> 21000 to 22000          1         0.75%       100.00%

Generally speaking, a frequency distribution is a summary table that arranges data into numerically ordered class groupings or categories. It is a way of organizing larger sets of numbers. Another way of looking at the data is a histogram, which describes data grouped into a frequency distribution using rectangular bars constructed at the boundaries of each class. A histogram for vehicle sales appears below.

[Figure: Frequency Histogram - Vehicle Sales. Series: Frequency, Cumulative %. Horizontal axis: Vehicle Sales.]

Finally, it is important to discuss descriptive statistics for this data: where the numbers are centered, how much they vary or spread, and what shape or symmetry they have.
By running summary statistics in Excel and a five-number summary in PHStat, we can see that vehicle sales are centered around the mean, which in this case is 15,439.11. The mean is the average of the numbers. The numbers are also centered around the median, the middle number that separates the lowest fifty percent of the numbers from the top fifty percent. In this case the median is 15,332.4, which is relatively close to the mean. The range, or the size of the interval that contains all the numbers, is calculated by subtracting the minimum from the maximum (Max - Min). The range here is 8,803.1. The standard deviation is also important to consider: it measures how much the vehicle sales numbers vary around the mean. The standard deviation here is 1,502.98. It is also beneficial to calculate the interquartile range, which gives the width of the middle fifty percent of the numbers, by subtracting the first quartile (25th percentile) from the third quartile (75th percentile) (3rd quartile - 1st quartile). In this case the interquartile range is 1,879.74. The coefficient of variation measures how big the standard deviation is as a fraction or percentage of the mean (SD/Mean). The coefficient of variation here is 0.097, or 9.7%.

Finally, it is important to assess the shape of the data to determine whether they are symmetrical or not. A simple way to do this is the Pearson Measure of Skewness, which is calculated by taking the mean minus the median and dividing by the standard deviation ((Mean - Median)/SD). When the absolute value of the Pearson Measure is less than 0.1, the numbers are considered symmetrical. When it is greater than or equal to 0.1, the numbers are considered skewed. In this case the Pearson Measure is 0.071, which means that these numbers can be considered symmetrical.
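The derived measures above follow directly from the summary values, so they can be cross-checked with a few lines of code. The sketch below is only an illustration, not the Excel or PHStat procedure used in this paper; the function name `describe` is hypothetical, and the minimum and maximum are taken from the summary statistics reported later in the paper.

```python
# Summary values for vehicle sales as reported in this paper
# (thousands of units, SAAR).
MEAN = 15439.111
MEDIAN = 15332.400
STD_DEV = 1502.980
MINIMUM = 12294.00
MAXIMUM = 21097.10


def describe(mean, median, std_dev, minimum, maximum):
    """Return (range, coefficient of variation, Pearson measure of skewness)."""
    value_range = maximum - minimum          # Max - Min
    coeff_variation = std_dev / mean         # SD / Mean
    pearson = (mean - median) / std_dev      # (Mean - Median) / SD
    return value_range, coeff_variation, pearson


value_range, coeff_variation, pearson = describe(
    MEAN, MEDIAN, STD_DEV, MINIMUM, MAXIMUM
)

# The paper's rule of thumb: |Pearson| < 0.1 is treated as symmetrical.
shape = "symmetrical" if abs(pearson) < 0.1 else "skewed"

print(f"Range: {value_range:.1f}")
print(f"Coefficient of variation: {coeff_variation:.3f}")
print(f"Pearson measure: {pearson:.3f} ({shape})")
```

The same function can be applied to the other three variables by substituting their reported means, medians, standard deviations, minimums, and maximums.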
The chart below lists all of the numbers described above.

Descriptive Statistics - Vehicle Sales
Autos and Light Trucks (Thousands of units, SAAR)

Mean                        15439.111
Median                      15332.400
Range                        8803.100
Standard Deviation           1502.980
Interquartile Range          1879.735
Coefficient of Variation        0.097
Pearson Measure                 0.071

After this descriptive analysis of the dependent variable, we now look at the three explainer variables that we hope will have a link to it: personal consumption expenditures, disposable personal income, and the civilian unemployment rate. We begin with the first explainer variable, personal consumption expenditures for the United States, which is measured in billions of dollars over the same time span of 1992 to 2003. The frequency distribution and histogram for this data appear below.

Frequency Table
Personal Consumption Expenditures ($ Billions)

Personal Consumption
Expenditures         Frequency   Percentage   Cumulative %
> 3500 to 4000            0         0.00%         0.00%
> 4000 to 4500           20        15.04%        15.04%
> 4500 to 5000           23        17.29%        32.33%
> 5000 to 5500           23        17.29%        49.62%
> 5500 to 6000           17        12.78%        62.41%
> 6000 to 6500           14        10.53%        72.93%
> 6500 to 7000           19        14.29%        87.22%
> 7000 to 7500           17        12.78%       100.00%

[Figure: Frequency Histogram - Personal Consumption Expenditures. Series: Frequency, Cumulative %. Horizontal axis: Personal Consumption Expenditures.]

For the descriptive analysis of personal consumption expenditures, we first display the chart of numbers and then briefly explain what they mean.

Descriptive Statistics - Personal Consumption Expenditures ($ Billions)

Mean                         5667.441
Median                       5560.800
Range                        3369.000
Standard Deviation           1016.690
Interquartile Range          1830.700
Coefficient of Variation        0.179
Pearson Measure                 0.105

Here we see that the numbers are centered around the mean of 5,667.44 as well as the median (5,560.80), which is again close to the mean.
The range shows a span of 3,369, and the standard deviation around the mean is 1,016.69. The interquartile range shows that the middle fifty percent of the numbers span 1,830.70, and the coefficient of variation of 0.179 indicates that the standard deviation is 17.9% of the mean. Finally, the Pearson Measure of 0.105 shows that the numbers are just slightly skewed, as its absolute value is slightly greater than 0.1.

The next explainer variable is disposable personal income within the United States for the period from 1992 to 2003. The frequency distribution and histogram appear below.

Frequency Table
Disposable Personal Income ($ Billions)

Disposable Personal
Income               Frequency   Percentage   Cumulative %
> 3500 to 4000            0         0.00%         0.00%
> 4000 to 4500            0         0.00%         0.00%
> 4500 to 5000           23        17.29%        17.29%
> 5000 to 5500           23        17.29%        34.59%
> 5500 to 6000           21        15.79%        50.38%
> 6000 to 6500           17        12.78%        63.16%
> 6500 to 7000           14        10.53%        73.68%
> 7000 to 7500           20        15.04%        88.72%
> 7500 to 8000           15        11.28%       100.00%

[Figure: Frequency Histogram - Disposable Personal Income. Series: Frequency, Cumulative %. Horizontal axis: Disposable Personal Income.]

Next is a chart with the numbers critical to a descriptive analysis of disposable personal income.

Descriptive Statistics - Disposable Personal Income ($ Billions)

Mean                         6127.944
Median                       5967.800
Range                        3391.000
Standard Deviation           1006.350
Interquartile Range          1792.050
Coefficient of Variation        0.164
Pearson Measure                 0.159

Here we see a larger difference between the mean and the median. In this case it is probably safer to say that the numbers are centered around the median of 5,967.80 rather than the mean, as the mean appears to be pulled upward by certain data in the set. The range shows a span of 3,391, and the standard deviation around the mean is 1,006.35.
The interquartile range shows that the middle fifty percent of the numbers span 1,792.05, and the coefficient of variation of 0.164 indicates that the standard deviation is 16.4% of the mean. Finally, the Pearson Measure of 0.159 shows that the numbers are slightly skewed, as its absolute value is greater than 0.1.

The final explainer variable is the civilian unemployment rate within the United States from 1992 to 2003. The frequency distribution and histogram for this sample data appear below.

Frequency Table
Civilian Unemployment Rate (%)

Civilian
Unemployment Rate    Frequency   Percentage   Cumulative %
> 2 to 3                  0         0.00%         0.00%
> 3 to 4                  3         2.26%         2.26%
> 4 to 5                 48        36.09%        38.35%
> 5 to 6                 49        36.84%        75.19%
> 6 to 7                 15        11.28%        86.47%
> 7 to 8                 18        13.53%       100.00%
> 8 to 9                  0         0.00%       100.00%

[Figure: Frequency Histogram - Civilian Unemployment Rate. Series: Frequency, Cumulative %. Horizontal axis: Civilian Unemployment Rate.]

Next is a chart with the descriptive statistics for the civilian unemployment rate.

Descriptive Statistics - Civilian Unemployment Rate (%)

Mean                            5.430
Median                          5.500
Range                           4.000
Standard Deviation              1.076
Interquartile Range             1.500
Coefficient of Variation        0.198
Pearson Measure                -0.065

Here we see that the numbers are centered around the mean of 5.430 as well as the median (5.500), which is again close to the mean. The range shows a span of 4, and the standard deviation around the mean is 1.076. The interquartile range shows that the middle fifty percent of the numbers span 1.5, and the coefficient of variation of 0.198 indicates that the standard deviation is 19.8% of the mean. Finally, the Pearson Measure of -0.065 shows that the numbers are approximately symmetrical, as the absolute value of the measure is less than 0.1.
We also see, because the Pearson Measure is negative, that the median is greater than the mean, which the data above confirm.

Now that we have worked through the descriptive analysis of the four variables in this study, we will use multiple regression analysis to assess any linkages that may exist between them. We will test the relationship between the dependent and explainer variables in the hope of showing that there is a relationship between vehicle sales and personal consumption expenditures, disposable personal income, and the civilian unemployment rate.

We start by calculating the means and standard deviations of the data. This can be done by running simple summary statistics in Excel. We will hold onto this information for the moment, as we will need it in a future step.

Variable                             Mean       Median     Standard    Sample        Minimum    Maximum
                                                           Deviation   Variance
Vehicle Sales                        15439.11   15332.40   1502.98     2258949.51    12294.00   21097.10
Disposable Personal Income            6127.94    5967.80   1006.35     1012740.13     4633.30    8024.30
Personal Consumption Expenditures     5667.44    5560.80   1016.69     1033658.41     4108.50    7477.50
Civilian Unemployment Rate               5.43       5.50      1.08           1.16        3.80       7.80

Next, we run a multiple regression analysis of the data using PHStat in Excel, which produces the following chart.

Regression Statistics
Multiple R            0.892483471
R Square              0.796526745
Adjusted R Square     0.791794809
Standard Error        685.8024592
Observations          133

Here we see that the R Square value is 0.797. This tells us that 79.7% of the variation in monthly vehicle sales in the United States is explained by variation in personal consumption expenditures, disposable personal income, and the civilian unemployment rate.
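The entries in the Regression Statistics chart are related to one another, which gives a quick consistency check on the output. The sketch below assumes the standard textbook adjustment formula, 1 - (1 - R Square)(n - 1)/(n - k - 1), rather than anything specific to Excel or PHStat; the function name `adjusted_r_square` is hypothetical.

```python
import math

# Values reported in the Regression Statistics chart of this paper.
R_SQUARE = 0.796526745
N_OBS = 133      # observations (n)
K = 3            # explainer variables (k)


def adjusted_r_square(r_square, n, k):
    """Penalize R Square for the number of explainer variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


# Multiple R is the square root of R Square.
multiple_r = math.sqrt(R_SQUARE)
adj_r_square = adjusted_r_square(R_SQUARE, N_OBS, K)

print(f"Multiple R: {multiple_r:.6f}")
print(f"Adjusted R Square: {adj_r_square:.6f}")
```

Because n is large relative to k, the penalty factor (n - 1)/(n - k - 1) = 132/129 is close to one, so the adjustment changes R Square only slightly.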
In addition, we see that the adjusted R Square is about the same as the R Square value, because the sample size (n) is large at 133 and the number of explainers is small (3).

We now return to the data computed in our first step to calculate the percentage standard error of the regression, which is the standard error of regression (685.80) divided by the mean of the dependent variable (15,439.11). In this case we get 0.044, or 4.4%. This tells us that the standard deviation of the error terms is 4.4% of average vehicle sales. Generally, a model with a percentage standard error of regression less than or equal to 0.1 is considered a good forecaster, so in this case the model qualifies.

Next, we use the F-test for the overall regression model to determine whether personal consumption expenditures, disposable personal income, and the civilian unemployment rate have any influence on vehicle sales. Essentially, the F-test determines whether the coefficients on the explainer variables (B1, B2, and B3) are all zero. If they are all zero, then none of the explainer variables influences the dependent variable, making the regression analysis worthless. The steps for the F-test are detailed below.

Hypotheses:
Ho: B1 = B2 = B3 = 0
Ha: Ho is false

Based on the ANOVA output below, produced when we ran the multiple regression analysis in Excel, we can next determine the sample and critical F scores.
ANOVA
              df     SS             MS             F              Significance F
Regression      3    237509408.1    79169802.72    168.3299857    2.04134E-44
Residual      129    60671926.69    470325.0131
Total         132    298181334.8

Sample & Critical F Values:
Sample F = 168
Critical F = 2.67 at the 5% significance level and 3.94 at the 1% significance level (using the calculator shown below)

Section II: critical F values
Level of Significance   Degrees of Freedom #1   Degrees of Freedom #2   Critical F Value
0.05                    3                       129                     2.674832
0.01                    3                       129                     3.937119444

Please note that degrees of freedom #1 is the same as k (the number of explainer variables) and degrees of freedom #2 is the sample size (n) minus the number of explainer variables (k) minus 1 (n-k-1).

Decision: Since the sample F is greater than the critical F at both significance levels, reject Ho. At least one of the explainer variables influences the dependent variable. The Significance F value in the ANOVA table is the p-value for this test and shows that the chance of drawing samples like this one when the null is true is extremely low.

Now that we have run the F-test for the overall regression and determined that we can reject the null (Ho), we know that at least one of the three explainer variables influences our dependent variable, vehicle sales. We must now do a T-test on each of the regression coefficients to test whether the true (population) coefficients are greater or less than zero and thus have a positive or negative effect on vehicle sales. To do this we first need the remainder of the Excel calculations from our multiple regression analysis.
                                     Coefficients    Standard Error   t Stat          P-value
Intercept                            13691.83962     817.7793504      16.74270647     3.81586E-34
Personal Consumption Expenditures    8.464336426     1.089307822      7.770380651     2.14966E-12
Disposable Personal Income           -7.427437282    1.07352741       -6.918721601    1.90822E-10
Civilian Unemployment Rate           -130.5572336    83.28920621      -1.567516843    0.119443699

Hypotheses:
B1: Ho: B1 ≤ 0    Ha: B1 > 0
B2: Ho: B2 ≤ 0    Ha: B2 > 0
B3: Ho: B3 ≤ 0    Ha: B3 > 0

Sample & Critical T Values:
B1: Sample T = 7.77 (from the Excel output above); Critical T = 1.66 (from the one-tailed critical T calculator shown below, at the 5% significance level)
B2: Sample T = -6.92; Critical T = 1.66
B3: Sample T = -1.57; Critical T = 1.66

Section I: critical t values
Level of Significance   Degrees of Freedom   Absolute Critical T Value
0.05                    130                  1.656659 (1 tailed value)

Please note that the degrees of freedom value was calculated by subtracting the number of explainer variables (k) from the sample size (n) (n-k). Strictly, the residual degrees of freedom for these tests are n-k-1 = 129, as in the ANOVA table; the critical T still rounds to 1.66.

Decisions:
B1: Since the absolute value of the sample T is greater than the absolute value of the critical T, and the sample T is greater than zero (it must agree with Ha, as this is a one-tailed test), you can reject the null (Ho) and conclude that higher personal consumption expenditures yield higher vehicle sales.
B2: The sample T (-6.92) is negative, so it does not agree with Ha (B2 > 0), and this one-tailed test cannot reject the null (Ho) in favor of a positive effect. Its absolute value is well above the critical T, however, and the p-value of 1.91E-10 indicates that the coefficient differs from zero: holding the other explainers constant, higher disposable personal income is associated with lower vehicle sales.
B3: Since the absolute value of the sample T is less than the absolute value of the critical T, you cannot reject the null (Ho). As a result, you can only conclude that there is no evidence that vehicle sales change as the civilian unemployment rate changes.

Now that we have drawn conclusions from the F-test and T-tests, it is useful to look at the estimated regression coefficients (calculated by Excel) in the chart below.
                                     Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept                            13691.83962     817.7793504      16.74270647     3.81586E-34    12073.84317     15309.83607
Personal Consumption Expenditures    8.464336426     1.089307822      7.770380651     2.14966E-12    6.30911425      10.6195586
Disposable Personal Income           -7.427437282    1.07352741       -6.918721601    1.90822E-10    -9.551437526    -5.30343703
Civilian Unemployment Rate           -130.5572336    83.28920621      -1.567516843    0.119443699    -295.3469657    34.23249855

The intercept means that vehicle sales would be 13,691.84 (thousands of units) if personal consumption expenditures, disposable personal income, and the civilian unemployment rate were all zero. The coefficient on an explanatory variable measures how many units the dependent variable will change if that explainer variable changes by one unit while the others are held constant. Based on the coefficients above, we can conclude the following:

If personal consumption expenditures increase by $1 billion, vehicle sales increase by 8.46 thousand units.
If disposable personal income increases by $1 billion, vehicle sales decrease by 7.43 thousand units.
If the civilian unemployment rate increases by one percentage point, vehicle sales decrease by 130.56 thousand units, although this coefficient is not statistically significant.

Next we calculate the standardized coefficients for the statistically significant coefficients. Standardized coefficients are a means of finding the most influential explanatory variables. Each measures how many standard deviations the dependent variable will change if the explainer changes by one standard deviation. To calculate a standardized coefficient, multiply the estimated regression coefficient (detailed in the paragraph above) by the standard deviation of the explainer and divide by the standard deviation of the dependent variable. The chart below shows the standardized coefficient calculations.
                                     Standardized    Estimated
                                     Coefficient     Coefficient     SD of X    SD of Y
Personal Consumption Expenditures    5.725694556     8.464336426     1016.69    1502.98
Disposable Personal Income           -4.973186452    -7.427437282    1006.35    1502.98
Civilian Unemployment Rate           -0.093471411    -130.5572336    1.08       1502.98

From these numbers we can conclude the following:

A one standard deviation (SD) increase in personal consumption expenditures leads to a 5.73 SD increase in vehicle sales.
A one SD increase in disposable personal income leads to a 4.97 SD decrease in vehicle sales.
A one SD increase in the civilian unemployment rate leads to a 0.09 SD decrease in vehicle sales.

Thus we see that personal consumption expenditures is the most important factor determining vehicle sales and disposable personal income is the next most important. With this data it is possible to forecast what might happen to vehicle sales (the dependent variable) in the future, based on how we think the explainer variables are going to shift. This could have been a useful tool for forecasting what might have happened to vehicle sales in 2004 based on the results of our regression analysis.

The confidence intervals for the regression coefficients show how large the population coefficients are likely to be. From the Excel data listed below, we are 95% confident that the “true” marginal effects on vehicle sales of changes in personal consumption expenditures, disposable personal income, and the civilian unemployment rate lie between 6.31 and 10.62, -9.55 and -5.30, and -295.35 and 34.23, respectively. We can also note that zero lies within the civilian unemployment rate interval, so the population regression coefficient could be zero; that is, the civilian unemployment rate may have no effect on vehicle sales.
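The standardized coefficients and the check for zero inside each confidence interval can be reproduced from the reported figures. The sketch below is an illustration only; the function names `standardized` and `contains_zero` are hypothetical, and the standard deviation of the unemployment rate is taken as 1.076 from the descriptive statistics (the chart shows it rounded to 1.08).

```python
# Standard deviation of vehicle sales (SD of Y) as reported in this paper.
SD_Y = 1502.98

# name -> (estimated coefficient, SD of X, lower 95% bound, upper 95% bound)
explainers = {
    "Personal Consumption Expenditures": (8.464336426, 1016.69, 6.30911425, 10.6195586),
    "Disposable Personal Income": (-7.427437282, 1006.35, -9.551437526, -5.30343703),
    "Civilian Unemployment Rate": (-130.5572336, 1.076, -295.3469657, 34.23249855),
}


def standardized(coefficient, sd_x, sd_y=SD_Y):
    """SDs of change in Y for a one-SD change in X."""
    return coefficient * sd_x / sd_y


def contains_zero(lower, upper):
    """True when zero lies inside the confidence interval."""
    return lower <= 0 <= upper


results = {}
for name, (coef, sd_x, lower, upper) in explainers.items():
    results[name] = (standardized(coef, sd_x), contains_zero(lower, upper))
    print(f"{name}: standardized = {results[name][0]:.2f}, "
          f"zero in 95% interval = {results[name][1]}")
```

Ranking the explainers by the absolute value of the standardized coefficient gives the same ordering of importance described above.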
                                     Coefficients    Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept                            13691.83962     817.7793504      16.74270647     3.81586E-34    12073.84317     15309.83607
Personal Consumption Expenditures    8.464336426     1.089307822      7.770380651     2.14966E-12    6.30911425      10.6195586
Disposable Personal Income           -7.427437282    1.07352741       -6.918721601    1.90822E-10    -9.551437526    -5.30343703
Civilian Unemployment Rate           -130.5572336    83.28920621      -1.567516843    0.119443699    -295.3469657    34.23249855

In conclusion, by completing both a descriptive analysis and a multiple regression analysis of vehicle sales as they relate to personal consumption expenditures, disposable personal income, and the civilian unemployment rate, we have shown a linkage between vehicle sales in the United States from 1992 to 2003 and both personal consumption expenditures and disposable personal income over the same period. We have also found no evidence of a relationship between vehicle sales and the civilian unemployment rate. This does not necessarily mean that no such relationship exists; it simply means that the sample data did not give us enough evidence to draw a conclusion. However, because two of the three regression coefficients were statistically significant, we were able to learn something from our multiple regression analysis of these variables.
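As a cross-check on the test statistics reported in this paper, the sample F and sample T values can be recomputed from the ANOVA sums of squares and from the coefficients and standard errors. This is a minimal sketch rather than the Excel procedure itself; the function names are hypothetical, and the critical values are simply the ones reported by the calculator in the paper, not computed here.

```python
# Figures reported in the ANOVA and coefficient charts of this paper.
N_OBS = 133                  # observations (n)
K = 3                        # explainer variables (k)
SS_REGRESSION = 237509408.1
SS_RESIDUAL = 60671926.69
CRITICAL_F_5PCT = 2.674832   # reported critical F at the 5% level
CRITICAL_T_5PCT = 1.656659   # reported one-tailed critical T at the 5% level

# name -> (coefficient, standard error)
coefficients = {
    "B1": (8.464336426, 1.089307822),    # personal consumption expenditures
    "B2": (-7.427437282, 1.07352741),    # disposable personal income
    "B3": (-130.5572336, 83.28920621),   # civilian unemployment rate
}


def f_statistic(ss_regression, ss_residual, n, k):
    """Mean square regression divided by mean square residual."""
    ms_regression = ss_regression / k
    ms_residual = ss_residual / (n - k - 1)
    return ms_regression / ms_residual


def t_statistic(coefficient, standard_error):
    """Coefficient divided by its standard error."""
    return coefficient / standard_error


sample_f = f_statistic(SS_REGRESSION, SS_RESIDUAL, N_OBS, K)
t_stats = {name: t_statistic(*pair) for name, pair in coefficients.items()}

print(f"Sample F = {sample_f:.2f} (critical F = {CRITICAL_F_5PCT:.2f})")
for name, t in t_stats.items():
    exceeds = abs(t) > CRITICAL_T_5PCT
    sign = "positive" if t > 0 else "negative"
    print(f"{name}: sample T = {t:.2f}, |T| > critical T: {exceeds}, sign: {sign}")
```

Reporting the sign separately from the magnitude matters for a one-tailed test, since a large sample T only supports Ha when its sign agrees with the direction stated in Ha.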