Sample Project
This paper analyzes sales of autos and light trucks in the United States from 1992 to
2003, using data obtained from the USDATA website. Vehicle sales serve as the dependent
variable in this study, and we attempt to explain them using the following three
explainer variables: personal consumption expenditures, disposable personal income, and
the civilian unemployment rate. We begin with a literature review that gives some general
information on vehicle sales in the United States and then describe each of the four
variables. After describing each variable, we move on to show any linkages that may exist
between these variables.
A study was published in the Journal of Business and Economic Statistics in
October 1992 by Thompson and Noordewier titled Estimating the Effects of Consumer
Incentive Programs on Domestic Automobile Sales. Though this study was done in the
early nineties and things have shifted over the past fifteen years, there is still some merit
to be found in looking at this paper. In this study the authors are looking to find
connections between “patterns of sales gained during the promotions and sales lost during
any post promotion troughs (410).” Over a period of four years they looked at sales
patterns for Ford Motor Company, GM, and Chrysler Corporation. Thompson and
Noordewier make the important point, which ties into this study that “industry
observers…and other researchers…suggest that the mean may be predicted by sales of
import autos and by macroeconomic variables such as gross national product, interest
rates, disposable income, and unemployment rates (411).” The authors test their data in a
number of different ways including intervention analysis and ultimately conclude that
there was a significant change in consumer response to promotions run during the time
periods analyzed. In addition, they found that as automobiles are a durable good,
frequent promotion of such a good can produce a pattern that is less predictable than that
of a nondurable good (like toothpaste and cookies) and ultimately more research is
required (417).
Having taken a look at this study, we will now delve into the variables to be
analyzed in this paper. The first key variable to be discussed is our dependent variable,
vehicle sales within the United States. The data span 1992 to 2003 and are reported in
thousands of units at a seasonally adjusted annual rate (SAAR). Below, please see a
frequency distribution that gives an overview of this vehicle sales data.
Frequency Table
Vehicle Sales: Autos and light trucks (Thousands of units, SAAR)

Vehicle Sales        Frequency   Percentage   Cumulative %
> 11000 to 12000          0         0.00%         0.00%
> 12000 to 13000         10         7.52%         7.52%
> 13000 to 14000          8         6.02%        13.53%
> 14000 to 15000         34        25.56%        39.10%
> 15000 to 16000         33        24.81%        63.91%
> 16000 to 17000         29        21.80%        85.71%
> 17000 to 18000         12         9.02%        94.74%
> 18000 to 19000          6         4.51%        99.25%
> 19000 to 20000          0         0.00%        99.25%
> 20000 to 21000          0         0.00%        99.25%
> 21000 to 22000          1         0.75%       100.00%
Generally speaking, a frequency distribution is a summary table that arranges data into
numerically ordered class groupings or categories. It is used as a way of organizing
larger sets of numbers.
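As a rough illustration of how such a table can be built, the sketch below counts values into classes of the form "> a to b" and adds the percentage and cumulative percentage columns. The function name `frequency_table` and the toy numbers are assumptions made for this example; they are not the paper's data, and the paper itself used Excel and PHStat.

```python
from bisect import bisect_left

def frequency_table(values, edges):
    """Count values into classes (edges[i], edges[i+1]] and add percentage columns."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_left finds the first edge >= v, so the lower edge is
        # exclusive and the upper edge inclusive ("> a to b").
        idx = bisect_left(edges, v) - 1
        if 0 <= idx < len(counts):
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the class boundaries")
    rows, running = [], 0
    for i, count in enumerate(counts):
        running += count
        rows.append({
            "class": f"> {edges[i]} to {edges[i + 1]}",
            "frequency": count,
            "percentage": 100 * count / total,
            "cumulative": 100 * running / total,
        })
    return rows

# Hypothetical toy values (NOT the paper's vehicle sales series).
toy_sales = [12500, 13200, 14100, 14800, 15000, 15600, 16900, 17050]
toy_edges = [12000, 14000, 16000, 18000]
table = frequency_table(toy_sales, toy_edges)

for row in table:
    print(f"{row['class']:<18} {row['frequency']:>3} "
          f"{row['percentage']:6.2f}% {row['cumulative']:7.2f}%")
```

Each percentage is the class frequency divided by the total count, and the cumulative column is the running sum of those percentages, so the last class always reaches 100 %.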
Another way of looking at this data is by creating a histogram which also
describes data that has been grouped into frequency distributions using rectangular bars
that are constructed at the boundaries of each class. Below please see a histogram for
vehicle sales.
[Figure: Histogram - Vehicle Sales. Series: Frequency, Cumulative %. X-axis: Vehicle
Sales. Y-axis: Frequency.]
Finally, it is important to talk about descriptive statistics for this data: where the
numbers are centered, how much they vary or spread, and how symmetrical their shape is.
By running summary statistics in Excel and a five number summary in PHStat we can see
that vehicle sales are centered around the mean, which in this case is 15,439.11. The
mean is the average of the numbers. The numbers are also centered around the median, the
middle number that separates the lowest fifty percent of the numbers from the top fifty
percent. In this case the median is 15,332.4, which is relatively close to the mean. You
can also calculate the range of the numbers, the size of the interval that contains all
numbers, by subtracting the Minimum from the Maximum (Max – Min). The range here is
8,803.1. The standard deviation is also important to consider. It measures how much the
vehicle sales numbers differ from the mean or average. The standard deviation here is
1,502.98. It is also beneficial to calculate the interquartile range, which gives the
width of the middle fifty percent of the numbers, by subtracting the first quartile or
25th percentile from the third quartile or 75th percentile (3rd quartile – 1st quartile).
In this case the interquartile range is 1,879.74. You can also calculate the coefficient
of variation, which measures how big the standard deviation is as a fraction or
percentage of the mean (SD/mean). The coefficient of variation here is 0.097 or 9.7 %.
Finally, it is important to assess the shape of the data to determine whether they are
symmetrical or not. One way to do this is to calculate the Pearson Measure of Skewness,
which is the mean minus the median divided by the standard deviation
((Mean – Median)/SD). When the absolute value of the Pearson Measure is less than 0.1 the
numbers are considered to be symmetrical. When it is greater than or equal to 0.1, the
numbers are considered to be skewed. In this case the Pearson Measure is 0.071, which
means that these numbers can be considered symmetrical. Please see the table below
detailing all numbers described in the above paragraph.
Descriptive Statistics - Vehicle Sales
Autos and Light Trucks (Thousands of units, SAAR)

Mean                        15439.111
Median                      15332.400
Range                        8803.100
Standard Deviation           1502.980
Interquartile Range          1879.735
Coefficient of Variation        0.097
Pearson Measure                 0.071
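The measures described above can be sketched in a few lines of Python. This is only an illustration: the function names are our own, the quartile method ("inclusive") is an assumption because quartile conventions differ between tools, and the toy series is hypothetical. The first check uses the mean, median, and standard deviation reported for vehicle sales.

```python
import statistics

def coefficient_of_variation(mean, sd):
    """Standard deviation as a fraction of the mean (SD / mean)."""
    return sd / mean

def pearson_measure(mean, median, sd):
    """Pearson Measure of Skewness as defined in the text: (mean - median) / SD."""
    return (mean - median) / sd

def describe(values):
    """Summary statistics for a raw list of numbers, using the sample SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    # Quartile conventions differ between tools; "inclusive" is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    skew = pearson_measure(mean, median, sd)
    return {
        "mean": mean,
        "median": median,
        "range": max(values) - min(values),
        "sd": sd,
        "iqr": q3 - q1,
        "cv": coefficient_of_variation(mean, sd),
        "pearson": skew,
        "symmetrical": abs(skew) < 0.1,  # rule of thumb used in the text
    }

# Check against the summary values reported for vehicle sales.
cv = coefficient_of_variation(15439.111, 1502.980)
skew = pearson_measure(15439.111, 15332.400, 1502.980)
print(f"CV = {cv:.3f}, Pearson Measure = {skew:.3f}")

# Hypothetical toy series, only to show describe() on raw data.
toy = [2, 4, 4, 5, 7, 9, 11]
summary = describe(toy)
print(summary)
```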
After looking at this descriptive analysis of the dependent variable we must now
take a look at the three explainer variables that we hope will have a link to our dependent
variable: personal consumption expenditures, disposable personal income, and civilian
unemployment rate. Let us begin with our first explainer variable, personal consumption
expenditures for the United States, which is measured in billions of dollars over the
same time span of 1992 to 2003. First, please see a frequency distribution and histogram
for this data.
Frequency Table
Personal Consumption Expenditures ($ Billions)

Personal Consumption
Expenditures          Frequency   Percentage   Cumulative %
> 3500 to 4000             0         0.00%         0.00%
> 4000 to 4500            20        15.04%        15.04%
> 4500 to 5000            23        17.29%        32.33%
> 5000 to 5500            23        17.29%        49.62%
> 5500 to 6000            17        12.78%        62.41%
> 6000 to 6500            14        10.53%        72.93%
> 6500 to 7000            19        14.29%        87.22%
> 7000 to 7500            17        12.78%       100.00%
[Figure: Histogram - Personal Consumption Expenditures. Series: Frequency, Cumulative %.
X-axis: Personal Consumption Expenditures. Y-axis: Frequency.]
In terms of a descriptive analysis of these numbers for personal consumption, we
will first display the chart of numbers and then run through a brief explanation of what
they mean.
Descriptive Statistics - Personal Consumption Expenditures
($ Billions)

Mean                         5667.441
Median                       5560.800
Range                        3369.000
Standard Deviation           1016.690
Interquartile Range          1830.700
Coefficient of Variation        0.179
Pearson Measure                 0.105
Here we see that the numbers are centered around the mean of 5,667.44 as well as the
median (5,560.80), which is again close to the mean. The data span a range of 3,369 and
have a standard deviation of 1,016.69 around the mean. The interquartile range shows us
that the middle fifty percent of the numbers span a width of 1,830.70, and the
coefficient of variation of 0.179 indicates that the standard deviation is 17.9 % of the
mean. Finally, the Pearson Measure of 0.105 shows that the numbers are just slightly
skewed, as its absolute value is slightly greater than 0.1.
The next explainer variable to look at is the disposable personal income within the
United States from the period of 1992 to 2003. Please see the frequency distribution and
histogram chart below.
Frequency Table
Disposable Personal Income ($ Billions)

Disposable Personal
Income                Frequency   Percentage   Cumulative %
> 3500 to 4000             0         0.00%         0.00%
> 4000 to 4500             0         0.00%         0.00%
> 4500 to 5000            23        17.29%        17.29%
> 5000 to 5500            23        17.29%        34.59%
> 5500 to 6000            21        15.79%        50.38%
> 6000 to 6500            17        12.78%        63.16%
> 6500 to 7000            14        10.53%        73.68%
> 7000 to 7500            20        15.04%        88.72%
> 7500 to 8000            15        11.28%       100.00%
[Figure: Histogram - Disposable Personal Income. Series: Frequency, Cumulative %.
X-axis: Disposable Personal Income. Y-axis: Frequency.]
Next, please see a chart with numbers critical to a descriptive analysis of
disposable personal income.
Descriptive Statistics - Disposable Personal Income
($ Billions)

Mean                         6127.944
Median                       5967.800
Range                        3391.000
Standard Deviation           1006.350
Interquartile Range          1792.050
Coefficient of Variation        0.164
Pearson Measure                 0.159
Here we see that there is a larger difference between the mean and median. In this case
it is probably safer to say that the numbers are centered around the median of 5,967.80
rather than the mean, as the mean appears to be pulled upward by the higher values in the
set. The data span a range of 3,391 and have a standard deviation of 1,006.35 around the
mean. The interquartile range shows us that the middle fifty percent of the numbers span
a width of 1,792.05, and the coefficient of variation of 0.164 indicates that the
standard deviation is 16.4 % of the mean. Finally, the Pearson Measure of 0.159 shows
that the numbers are slightly skewed, as its absolute value is greater than 0.1.
The final explainer variable to look at is the civilian unemployment rate within
the United States from 1992 to 2003. See the frequency distribution and histogram below
for this sample data.
Frequency Table
Civilian Unemployment Rate (%)

Civilian
Unemployment Rate     Frequency   Percentage   Cumulative %
> 2 to 3                   0         0.00%         0.00%
> 3 to 4                   3         2.26%         2.26%
> 4 to 5                  48        36.09%        38.35%
> 5 to 6                  49        36.84%        75.19%
> 6 to 7                  15        11.28%        86.47%
> 7 to 8                  18        13.53%       100.00%
> 8 to 9                   0         0.00%       100.00%
[Figure: Histogram - Civilian Unemployment Rate. Series: Frequency, Cumulative %.
X-axis: Civilian Unemployment Rate. Y-axis: Frequency.]
Next, please see a table with the descriptive statistics for the civilian unemployment
rate.
Descriptive Statistics - Civilian Unemployment Rate
(%)

Mean                            5.430
Median                          5.500
Range                           4.000
Standard Deviation              1.076
Interquartile Range             1.500
Coefficient of Variation        0.198
Pearson Measure                -0.065
Here we see that the numbers are centered around the mean of 5.430 as well as the
median (5.500), which is again close to the mean. The data span a range of 4 and have a
standard deviation of 1.076 around the mean. The interquartile range shows us that the
middle fifty percent of the numbers span a width of 1.5, and the coefficient of variation
of 0.198 indicates that the standard deviation is 19.8 % of the mean. Finally, the
Pearson Measure of -0.065 shows that the numbers are approximately symmetrical, as the
absolute value of the measure is less than 0.1. Because the Pearson Measure is negative,
the median must be greater than the mean, which the data above confirm.
Now that we have worked through the descriptive analysis of the four variables in this
study, we will use multiple regression analysis to assess any linkages that may exist
between the variables. We will thus test the relationship between the dependent and
explainer variables in the hopes of showing that there is a relationship between vehicle
sales and personal consumption expenditures, disposable personal income, and civilian
unemployment rate. We will first start by calculating the means and standard deviations
of the data. This can be done by running summary statistics in Excel. We will hold onto
this information for the moment as we will need it in a future step.
                      Vehicle       Disposable    Personal        Civilian
                      Sales         Personal      Consumption     Unemployment
                                    Income        Expenditures    Rate
Mean                  15439.11      6127.94       5667.44         5.43
Median                15332.40      5967.80       5560.80         5.50
Standard Deviation    1502.98       1006.35       1016.69         1.08
Sample Variance       2258949.51    1012740.13    1033658.41      1.16
Minimum               12294.00      4633.30       4108.50         3.80
Maximum               21097.10      8024.30       7477.50         7.80
Next, we will run a multiple regression analysis of the data using PHStat. Running the
regression in Excel produces the following output.
Regression Statistics
Multiple R            0.892483471
R Square              0.796526745
Adjusted R Square     0.791794809
Standard Error        685.8024592
Observations          133
Here we see that the R Square value is 0.797. This tells us that 79.7 % of the variation
in monthly vehicle sales in the United States is explained by changes in personal
consumption expenditures, disposable personal income, and civilian unemployment rate.
In addition, we see that the adjusted R Square (0.792) is about the same as the R Square
value because the sample size (n) is large at 133 and the number of explainers is small
(3).
We will now need to look back at the data computed in our first step as we
calculate the percentage standard error of the regression by taking the standard error of
regression (685.80) and dividing it by the mean of the dependent variable (15,439.11). In
this case we get 0.044 or 4.4 %. This means that the standard deviation of the error
terms is 4.4 % of average vehicle sales. Generally, a regression whose percentage
standard error is less than or equal to 0.1 is considered a good forecaster, so in this
case we confirm that our regression forecasts well.
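The goodness-of-fit figures can be reproduced from the sums of squares in the regression's ANOVA output. The sketch below is one way to do so, assuming the standard formulas for R Square, adjusted R Square, and the standard error of regression; the function name `fit_statistics` is our own.

```python
import math

def fit_statistics(ss_regression, ss_residual, n, k, mean_y):
    """Goodness-of-fit measures from the regression sums of squares.

    n is the sample size, k the number of explainer variables,
    mean_y the mean of the dependent variable.
    """
    ss_total = ss_regression + ss_residual
    r_square = ss_regression / ss_total
    # Adjusted R Square penalizes for the number of explainers.
    adjusted = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    # Standard error of regression is the square root of the residual mean square.
    standard_error = math.sqrt(ss_residual / (n - k - 1))
    return {
        "r_square": r_square,
        "adjusted_r_square": adjusted,
        "standard_error": standard_error,
        "pct_standard_error": standard_error / mean_y,
    }

# Values taken from the Excel regression output and summary statistics.
fit = fit_statistics(
    ss_regression=237509408.1,
    ss_residual=60671926.69,
    n=133,
    k=3,
    mean_y=15439.11,
)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

With 133 observations and only three explainers, the ratio (n - 1)/(n - k - 1) is close to one, which is why the adjusted R Square stays so close to R Square.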
Next, we use the F-test for the overall regression model to determine whether
personal consumption expenditures, disposable personal income, and civilian
unemployment rate have any influence on vehicle sales. Essentially, the F-test
determines whether the coefficients on the explainer variables (B1, B2, and B3) are all
zero. If they are all zero then none of the explainer variables influence the dependent
variable, making the regression analysis worthless. Please see the steps for the F-test
detailed below.
Hypotheses:
Ho: B1 = B2 = B3 = 0
Ha: Ho is False
Based on the calculation done below when we ran the multiple regression analysis in
Excel, we can next determine the sample and critical F scores.
ANOVA
              df     SS             MS             F              Significance F
Regression    3      237509408.1    79169802.72    168.3299857    2.04134E-44
Residual      129    60671926.69    470325.0131
Total         132    298181334.8
Sample & Critical F Values:
Sample F = 168
Critical F = 2.67 at 5 % significance level and 3.94 at 1 % significance level
(using the calculator shown below)

Section II: critical F values
Level of Significance      0.05        0.01
Degrees of Freedom #1      3           3
Degrees of Freedom #2      129         129
critical F value           2.674832    3.937119444
Please note that degrees of freedom # 1 is the same as k (number of explainer variables)
and degrees of freedom # 2 is calculated by taking sample size (n) minus number of
explainer variables (k) minus 1 (n-k-1).
Decision:
Since sample F is greater than critical F at both significance levels, reject Ho. At
least one of the explainer variables influences the dependent variable. The Significance
F value in the ANOVA output (2.04E-44) is the p value for this test and shows that the
chance of drawing a sample like this one when the null is true is extremely low.
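The F-test steps above can be sketched as follows. The sample F is the regression mean square divided by the residual mean square; the critical values are simply the ones reported by the calculator above, passed in as inputs rather than computed. The function name `overall_f_test` is an assumption for this example.

```python
def overall_f_test(ss_regression, ss_residual, n, k, critical_f):
    """Return (sample F, reject Ho?) for Ho: all slope coefficients are zero."""
    ms_regression = ss_regression / k              # df #1 = k
    ms_residual = ss_residual / (n - k - 1)        # df #2 = n - k - 1
    sample_f = ms_regression / ms_residual
    return sample_f, sample_f > critical_f

# Sums of squares from the ANOVA table; critical values from the calculator output.
CRITICAL_F = {"5 %": 2.674832, "1 %": 3.937119444}

f_results = {}
for level, critical in CRITICAL_F.items():
    sample_f, reject = overall_f_test(237509408.1, 60671926.69, n=133, k=3,
                                      critical_f=critical)
    f_results[level] = (sample_f, reject)
    verdict = "reject Ho" if reject else "cannot reject Ho"
    print(f"{level} level: sample F = {sample_f:.2f}, critical F = {critical:.2f}, {verdict}")
```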
Now that we have calculated the F-test for overall regression and determined that
we can reject the null (Ho) we know that at least one of the three explainer variables
influences our dependent variable of vehicle sales. We must now do a T-test on each of
the regression coefficients to test whether each true (population) coefficient is greater
or less than zero and thus whether the explainer has a positive or negative effect on
vehicle sales. In order to do this we will first need the remainder of the Excel
calculations from our multiple regression analysis.
                                     Coefficients     Standard Error    t Stat          P-value
Intercept                            13691.83962      817.7793504       16.74270647     3.81586E-34
Personal Consumption Expenditures    8.464336426      1.089307822       7.770380651     2.14966E-12
Disposable Personal Income           -7.427437282     1.07352741        -6.918721601    1.90822E-10
Civilian Unemployment Rate           -130.5572336     83.28920621       -1.567516843    0.119443699
Hypotheses:
B1:   Ho: B1 ≤ 0     Ha: B1 > 0
B2:   Ho: B2 ≤ 0     Ha: B2 > 0
B3:   Ho: B3 ≤ 0     Ha: B3 > 0
Sample & Critical T Values:
B1:   Sample T = 7.77 (from the t Stat column of the Excel output above)
      Critical T = 1.66 (based on one-tailed test critical T calculator shown below at
      5 % significance level)
B2:   Sample T = -6.92
      Critical T = 1.66
B3:   Sample T = -1.57
      Critical T = 1.66

Section I: critical t values
Level of Significance                       0.05
Degrees of Freedom                          130
Absolute Critical T Value (1 tailed value)  1.656659
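Please note that the degrees of freedom value used here is calculated by subtracting the
number of explainer variables (k) from the sample size (n) (n-k = 130). Strictly, the
residual degrees of freedom for a regression with an intercept are n-k-1 = 129, as in the
ANOVA table; the critical T still rounds to 1.66.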
Decisions:
B1:
Since sample T (7.77) is greater than critical T (1.66) and greater than zero (it must
agree with Ha as this is a one-tailed test) you can reject the null (Ho) and conclude
that higher personal consumption expenditures yield higher vehicle sales.
B2:
The absolute value of sample T (6.92) is greater than critical T, but sample T is
negative and so does not agree with Ha (B2 > 0). You therefore cannot reject the null
(Ho) as stated and cannot conclude that higher disposable personal income yields higher
vehicle sales. Instead, the negative coefficient and its very small two-tailed P-value
(1.91E-10) indicate that, holding the other explainers constant, higher disposable
personal income is associated with lower vehicle sales.
B3:
Since the absolute value of sample T (1.57) is less than critical T you cannot reject
the null (Ho). As a result, you can only conclude that there is no evidence that vehicle
sales change as the civilian unemployment rate changes.
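The decision rule for these one-tailed tests can be sketched as below, under the hypotheses as written (Ha: B > 0 for every coefficient). The rule rejects Ho only when sample T is positive and larger than critical T, so a large negative sample T does not count as support for Ha. The function name `upper_tailed_t_test` is our own, and the critical value is taken from the calculator output rather than computed.

```python
def upper_tailed_t_test(coefficient, standard_error, critical_t):
    """Test Ho: B <= 0 against Ha: B > 0; return (sample t, reject Ho?)."""
    sample_t = coefficient / standard_error
    # Reject only when sample t is positive AND beyond the critical value.
    return sample_t, sample_t > critical_t

CRITICAL_T = 1.656659  # one-tailed, 5 % level, from the critical T calculator above

# (coefficient, standard error) pairs from the Excel regression output.
estimates = {
    "B1 Personal Consumption Expenditures": (8.464336426, 1.089307822),
    "B2 Disposable Personal Income": (-7.427437282, 1.07352741),
    "B3 Civilian Unemployment Rate": (-130.5572336, 83.28920621),
}

decisions = {}
for name, (coefficient, standard_error) in estimates.items():
    sample_t, reject = upper_tailed_t_test(coefficient, standard_error, CRITICAL_T)
    decisions[name] = (sample_t, reject)
    verdict = "reject Ho" if reject else "cannot reject Ho"
    print(f"{name}: sample T = {sample_t:.2f}, {verdict}")
```

A two-tailed test, or a lower-tailed test with Ha: B < 0, would treat the negative sample T values differently; which alternative is appropriate depends on the direction of effect expected before looking at the data.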
Now that we have drawn conclusions by running the F and T-tests it would be
beneficial to take a look at the estimated regression coefficients (calculated by Excel) in
the chart below.
                                     Coefficients     Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept                            13691.83962      817.7793504       16.74270647     3.81586E-34    12073.84317     15309.83607
Personal Consumption Expenditures    8.464336426      1.089307822       7.770380651     2.14966E-12    6.30911425      10.6195586
Disposable Personal Income           -7.427437282     1.07352741        -6.918721601    1.90822E-10    -9.551437526    -5.30343703
Civilian Unemployment Rate           -130.5572336     83.28920621       -1.567516843    0.119443699    -295.3469657    34.23249855
The intercept means that vehicle sales would be 13,691.84 (thousands of units) if
personal consumption expenditures, disposable personal income, and the civilian
unemployment rate were all zero. Any coefficient on an explanatory variable measures
how many units the dependent variable will change if the explainer variable changes by
one unit, holding the other explainers constant. Based on the coefficients above, we can
conclude the following:
• If personal consumption expenditures increase by $1 billion, vehicle sales increase
  by 8.46 thousand units.
• If disposable personal income increases by $1 billion, vehicle sales decrease by
  7.43 thousand units.
• If the civilian unemployment rate increases by one percentage point, vehicle sales
  decrease by 130.56 thousand units, although this coefficient is not statistically
  significant.
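To make the unit interpretation concrete, the sketch below plugs values into the estimated regression equation. The function name `predict_vehicle_sales` and the dictionary keys are our own labels. Because a least squares regression with an intercept passes through the sample means, evaluating the equation at the mean of each explainer should land close to mean vehicle sales, which gives a quick sanity check on the coefficients.

```python
# Estimated coefficients from the Excel regression output.
COEFFICIENTS = {
    "intercept": 13691.83962,
    "pce": 8.464336426,            # per $1 billion of personal consumption expenditures
    "dpi": -7.427437282,           # per $1 billion of disposable personal income
    "unemployment": -130.5572336,  # per percentage point of unemployment
}

def predict_vehicle_sales(pce, dpi, unemployment, coef=COEFFICIENTS):
    """Predicted vehicle sales in thousands of units (SAAR)."""
    return (coef["intercept"]
            + coef["pce"] * pce
            + coef["dpi"] * dpi
            + coef["unemployment"] * unemployment)

# Evaluate at the sample means reported in the descriptive statistics.
at_means = predict_vehicle_sales(pce=5667.441, dpi=6127.944, unemployment=5.430)

# Raise personal consumption expenditures by $1 billion, holding the rest fixed.
one_more_billion = predict_vehicle_sales(pce=5668.441, dpi=6127.944, unemployment=5.430)

print(f"Prediction at the means: {at_means:.2f}")
print(f"Change for +$1 billion PCE: {one_more_billion - at_means:.2f}")
```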
Next we must calculate the standardized coefficients for the statistically significant
coefficients. Standardized coefficients are a means for finding the most influential
explanatory variables. Each measures how many standard deviations the dependent
variable will change if the explainer changes by one standard deviation. To calculate
this you must multiply the estimated regression coefficient (detailed in the paragraph
above) by the standard deviation of the explainer and divide by the standard deviation
of the dependent variable. Please see the chart below for the standardized coefficient
calculations.
                                     Standardized     Estimated        SD of X    SD of Y
                                     Coefficient      Coefficient
Personal Consumption Expenditures    5.725694556      8.464336426      1016.69    1502.98
Disposable Personal Income           -4.973186452     -7.427437282     1006.35    1502.98
Civilian Unemployment Rate           -0.093471411     -130.5572336     1.08       1502.98
By looking at these numbers we can conclude the following:
• A one standard deviation (SD) increase in personal consumption expenditures
  leads to a 5.73 standard deviation increase in vehicle sales.
• A one SD increase in disposable personal income leads to a 4.97 SD decrease
  in vehicle sales.
• A one SD increase in the civilian unemployment rate leads to a 0.09 SD
  decrease in vehicle sales, although this coefficient is not statistically significant.
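The standardized coefficient calculation can be sketched as follows, using the estimated coefficients and standard deviations from the table above. The function name `standardized_coefficient` is our own. Because the table rounds the standard deviations, the result for the unemployment rate differs slightly in the later decimal places.

```python
def standardized_coefficient(coefficient, sd_x, sd_y):
    """Estimated coefficient times SD of the explainer, divided by SD of Y."""
    return coefficient * sd_x / sd_y

SD_Y = 1502.98  # standard deviation of vehicle sales

# (estimated coefficient, SD of X) from the standardized coefficient table.
explainers = {
    "Personal Consumption Expenditures": (8.464336426, 1016.69),
    "Disposable Personal Income": (-7.427437282, 1006.35),
    "Civilian Unemployment Rate": (-130.5572336, 1.08),
}

standardized = {
    name: standardized_coefficient(coefficient, sd_x, SD_Y)
    for name, (coefficient, sd_x) in explainers.items()
}

# Rank the explainers by the absolute size of their standardized coefficient.
ranked = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)
for name in ranked:
    print(f"{name}: {standardized[name]:.3f}")
```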
Thus we see that personal consumption expenditures is the most important factor
determining vehicle sales and disposable personal income is the next most important.
Using these results, it is possible to forecast what might happen to vehicle sales (the
dependent variable) based on how we expect the explainer variables to shift. For
example, the regression could have been used to forecast what might have happened to
vehicle sales in 2004.
The confidence intervals for the regression coefficients show how large the
population coefficients are likely to be. By looking at the Excel data listed below, we
are 95 % confident that the “true” marginal effects on vehicle sales of changes in
personal consumption expenditures, disposable personal income, and the civilian
unemployment rate lie between 6.31 and 10.62, -9.55 and -5.30, and -295.35 and 34.23
respectively. We can also note that zero lies within the civilian unemployment rate
interval, so the population regression coefficient could be zero; we cannot rule out
that the civilian unemployment rate has no effect on vehicle sales.
                                     Coefficients     Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept                            13691.83962      817.7793504       16.74270647     3.81586E-34    12073.84317     15309.83607
Personal Consumption Expenditures    8.464336426      1.089307822       7.770380651     2.14966E-12    6.30911425      10.6195586
Disposable Personal Income           -7.427437282     1.07352741        -6.918721601    1.90822E-10    -9.551437526    -5.30343703
Civilian Unemployment Rate           -130.5572336     83.28920621       -1.567516843    0.119443699    -295.3469657    34.23249855
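The Lower 95% and Upper 95% columns can be approximated as the coefficient plus or minus a critical t value times the standard error. In the sketch below the critical value of 1.9785 (two-tailed, 5 % level, 129 degrees of freedom) is an assumption that we supply by hand; it is not printed in the Excel output. The function name `confidence_interval` is also our own.

```python
def confidence_interval(coefficient, standard_error, critical_t):
    """Return (lower, upper) bounds: coefficient -/+ critical t * standard error."""
    margin = critical_t * standard_error
    return coefficient - margin, coefficient + margin

# Assumed two-tailed 5 % critical t for 129 degrees of freedom (not from the output).
CRITICAL_T_95 = 1.9785

# (coefficient, standard error) pairs from the Excel regression output.
estimates = {
    "Personal Consumption Expenditures": (8.464336426, 1.089307822),
    "Disposable Personal Income": (-7.427437282, 1.07352741),
    "Civilian Unemployment Rate": (-130.5572336, 83.28920621),
}

intervals = {}
for name, (coefficient, standard_error) in estimates.items():
    lower, upper = confidence_interval(coefficient, standard_error, CRITICAL_T_95)
    contains_zero = lower <= 0 <= upper
    intervals[name] = (lower, upper, contains_zero)
    print(f"{name}: {lower:.2f} to {upper:.2f} (contains zero: {contains_zero})")
```

An interval that contains zero corresponds to a coefficient that is not significantly different from zero at the 5 % level in a two-tailed test.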
In conclusion, by completing both a descriptive and multiple regression analysis
of vehicle sales as it relates to personal consumption expenditures, disposable personal
income, and civilian unemployment rate, we have shown there to be a linkage between
vehicle sales in the United States from 1992 to 2003 and personal consumption
expenditures and disposable personal income over the same period. We have also learned
that there is no evidence of a relationship between vehicle sales and the civilian
unemployment rate. Although we were not able to show such a relationship, this does not
necessarily mean that one does not exist; it simply means that the sample data did not
provide enough evidence to draw a conclusion. However, as two of our three regression
coefficients were statistically significant, we were able to learn something from our
multiple regression analysis of these variables.