IE 322 CASE STUDY
Group #:5
Name: ALRESAINI, MOHAMMED
Name: ALSHEHRI, ABDULKARIEM
Part I: Correlation and Covariance
Task 1:
Scatter Plot #1: Y (Market Share) vs. X1 (Absolute Unit Price) (5 Points)
[Figure: Market Share (Y) vs Absolute Unit Price (X1). x-axis: Absolute Unit Price ($36.00 to $96.00); y-axis: Market Share (15.00% to 30.00%). Linear trendline: y = -0.0012x + 0.2876, R² = 0.4517]
Scatter Plot #2: Y(Market Share) vs. X2 (Relative Unit Price) (5 Points)
[Figure: Market Share (Y) vs Relative Unit Price (X2). x-axis: Relative Unit Price (0.3 to 0.75); y-axis: Market Share (15.00% to 30.00%). Linear trendline: y = 0.2223x + 0.1212, R² = 0.6318]
Task 2: (16 Points)
COV[X1, Y] = - 0.137383064
COV[X2,Y] = 0.001002051
CORR COEFF[X1, Y] = -0.672093319
CORR COEFF[X2, Y] = 0.794832546
Show Sample Calculations Below for COV[X1, Y]; COV[X2, Y]; CORR COEFF[X1, Y]; CORR COEFF[X2, Y]:
COV[X1, Y]; COV[X2, Y]; CORR COEFF[X1, Y]; CORR COEFF[X2, Y]: Please see appendix.
Correlations: x1, y
Pearson correlation of x1 and y = -0.672
P-Value = 0.000
Correlations: x2, y
Pearson correlation of x2 and y = 0.795
P-Value = 0.000
Covariances: x1, y
           x1         y
x1  119.50363
y    -0.13854   0.00036

Covariances: x2, y
            x2           y
x2  0.00454571
y   0.00101047  0.00035555
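As a cross-check on the values above, the correlation coefficient can be recovered from the Minitab covariance matrices as cov(X, Y) / (s_X · s_Y). The sketch below is illustrative Python, not part of the original Minitab work. The function names are hypothetical, and n = 120 is inferred from the regression output (Total DF = 119). The idea that the Task 2 covariances use divisor n while Minitab uses n - 1 is an assumption that the numbers appear to support, not something the report states.

```python
import math

# Sample variances and covariances copied from the Minitab covariance output.
VAR_X1, VAR_X2, VAR_Y = 119.50363, 0.00454571, 0.00035555
COV_X1_Y, COV_X2_Y = -0.13854, 0.00101047
N = 120  # assumption: inferred from Total DF = 119 in the regression output


def correlation(cov_xy, var_x, var_y):
    """Pearson correlation from a covariance and the two variances."""
    return cov_xy / math.sqrt(var_x * var_y)


def sample_to_population(cov_xy, n):
    """Rescale a sample covariance (divisor n - 1) to divisor n."""
    return cov_xy * (n - 1) / n


r_x1_y = correlation(COV_X1_Y, VAR_X1, VAR_Y)
r_x2_y = correlation(COV_X2_Y, VAR_X2, VAR_Y)
pop_cov_x1_y = sample_to_population(COV_X1_Y, N)
pop_cov_x2_y = sample_to_population(COV_X2_Y, N)

print(f"r(X1, Y) = {r_x1_y:.3f}")
print(f"r(X2, Y) = {r_x2_y:.3f}")
print(f"COV[X1, Y] with divisor n = {pop_cov_x1_y:.6f}")
print(f"COV[X2, Y] with divisor n = {pop_cov_x2_y:.9f}")
```

Under these assumptions the recomputed correlations agree with the reported Pearson values to three decimals, and the rescaled covariances land very close to the Task 2 figures.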
Task 3: Answer the Following Questions (12 Points)
(1) Comment on the scatter plots from Task #1. Explain what you can tell about
the relationship between Y and X1 and Y and X2 based on the scatter plots. You
may want to use the linear trendline and resulting R2 value to help you with this.
(4 Points)
The R² value for the Y vs. X1 plot is 0.4517, which means that X1 explains 45.17% of the variation in Y. The plot shows a loose fit: the data points scatter widely above and below the trendline y = -0.0012x + 0.2876.
Moreover, the data points are much more concentrated around $42 than around $90, which suggests that X1 is not the best variable for projecting future market share (Y).
Based on the scatter plots, the relationship between X2 and Y is more consistent and more accurate than the relationship between X1 and Y. The points lie mostly along the trendline y = 0.2223x + 0.1212.
The R² value is 0.6318, which means that X2 explains 63.18% of the variation in Y.
(2) Comment on the Correlation Coefficients calculated and what they tell you
about the relationship between Y and X1 and Y and X2. Is there a relationship?
IF yes, is it positive or negative? Is it slight or strong? (4 Points)
CORR COEFF[X1, Y] = -0.672093319
The correlation coefficient between X1 and Y is negative, so the two variables are inversely related: as X1 increases, Y tends to decrease, and vice versa. The relationship is relatively strong.
CORR COEFF[X2, Y] = 0.794832546
The correlation coefficient between X2 and Y is positive, and the relationship is strong: as X2 increases, Y tends to increase as well.
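For a simple linear trendline with one predictor, the R² value is the square of the correlation coefficient, so the two sets of results can be checked against each other. The short Python sketch below is illustrative only, and the variable names are hypothetical.

```python
# Correlation coefficients reported in Task 2.
R_X1_Y = -0.672093319
R_X2_Y = 0.794832546

# With a single predictor, R-squared equals the squared correlation.
r_squared_x1 = R_X1_Y ** 2
r_squared_x2 = R_X2_Y ** 2

print(f"R^2 for Y vs X1: {r_squared_x1:.4f}")
print(f"R^2 for Y vs X2: {r_squared_x2:.4f}")
```

The squared values round to the R² figures shown on the two trendlines (0.4517 and 0.6318).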
(3) Do you feel as though the relationships you discussed in (1) and (2) can be
used to accurately predict the market share (i.e. the dependent variable). In
other words, do you think the changes in the independent variables (X1, X2)
can accurately predict the dependent variable (Y) ? (4 Points)
As discussed in the previous questions, the correlations between X1 and Y and between X2 and Y are strong, and the plots give useful insight into market share estimates.
Changes in the independent variables (X1, X2) can accurately predict the dependent variable (Y), particularly when both are used together.
Each variable alone explains only part of the variation in Y (R² of 0.4517 for X1 and 0.6318 for X2), so using the two variables together rather than one at a time would give more accurate results. In other words, a change in either X1 or X2 changes the predicted value of Y.
Part II: Regression Analysis
Task 1: (10 Points)
Paste the Minitab Output below for Y vs. X1, X2
————— 11/23/2013 5:55:57 PM ————————————————————
Welcome to Minitab, press F1 for help.

Regression Analysis: y versus x1, x2

The regression equation is
y = 0.179 - 0.00105 x1 + 0.208 x2

Predictor         Coef     SE Coef        T      P
Constant      0.179350    0.000560   320.19  0.000
x1         -0.00104880  0.00000572  -183.41  0.000
x2            0.207619    0.000927   223.93  0.000

S = 0.000679380   R-Sq = 99.9%   R-Sq(adj) = 99.9%

Analysis of Variance
Source           DF        SS        MS         F      P
Regression        2  0.042256  0.021128  45775.34  0.000
Residual Error  117  0.000054  0.000000
Total           119  0.042310

Source  DF    Seq SS
x1       1  0.019112
x2       1  0.023144

Unusual Observations
Obs    x1         y       Fit    SE Fit   Residual  St Resid
 18  39.9  0.221000  0.219663  0.000125   0.001337     2.00R
 25  42.7  0.241000  0.242472  0.000073  -0.001472    -2.18R
 28  92.2  0.194000  0.193621  0.000263   0.000379     0.61 X
 48  56.8  0.212000  0.213454  0.000086  -0.001454    -2.16R
 61  46.0  0.282000  0.280931  0.000218   0.001069     1.66 X
 73  84.9  0.195000  0.195257  0.000218  -0.000257    -0.40 X
106  88.8  0.204000  0.203283  0.000252   0.000717     1.14 X
111  86.0  0.178000  0.177053  0.000228   0.000947     1.48 X
118  49.6  0.213000  0.211642  0.000103   0.001358     2.02R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
Task 2: Provide Answers to the Following Questions Based on the Minitab Output Above:
(24 Points)
(1) Looking at the R² value (R-sq) in the output for the regression analysis, please write a one sentence interpretation of this value in the context of this problem. (6 Points)
The R² value (R-sq) in the Minitab regression output is 99.9%, which means that 99.9% of the variation in market share (Y) is explained by the regression on absolute price (X1) and relative price (X2).
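The R-Sq figure can be reproduced from the Analysis of Variance table as the regression sum of squares divided by the total sum of squares. The Python sketch below is an illustration using the rounded values printed by Minitab, so small rounding differences are expected. The variable names are hypothetical.

```python
# Sums of squares and degrees of freedom from the Analysis of Variance table.
SS_REGRESSION = 0.042256
SS_ERROR = 0.000054
SS_TOTAL = 0.042310
DF_ERROR = 117
DF_TOTAL = 119

# R-squared: share of the total variation explained by the regression.
r_sq = SS_REGRESSION / SS_TOTAL

# Adjusted R-squared: penalizes for the number of predictors.
r_sq_adj = 1 - (SS_ERROR / DF_ERROR) / (SS_TOTAL / DF_TOTAL)

print(f"R-Sq      = {r_sq * 100:.1f}%")
print(f"R-Sq(adj) = {r_sq_adj * 100:.1f}%")
```

Both values round to the 99.9% reported in the output.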
(2) From the output above, would you say that the Absolute Price (X1) is a
good predictor of the Market Share (Y)? Why or Why Not? Please make
sure you try to answer this using the p value for the Absolute Price shown. (6
Points)
The hypotheses are H0: β1 = 0 and HA: β1 ≠ 0. The p-value for the X1 coefficient is 0.000, which is less than α = 0.05.
By the decision rule at the 0.05 significance level, we reject the null hypothesis because p < 0.05, so β1 is not 0 and X1 is a significant predictor. Thus we conclude that there is statistically significant evidence, at the 0.05 level, that Absolute Price (X1) is a good predictor of Market Share (Y).
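The p-values in the output come from t statistics, each equal to a coefficient divided by its standard error. The Python sketch below recomputes them from the printed (rounded) values and compares their magnitude with a two-sided 5% critical value. The critical value of about 1.98 for 117 degrees of freedom is an approximate table value assumed here, not something taken from the Minitab output, and the names are hypothetical.

```python
# (coefficient, standard error) pairs from the Minitab predictor table.
PREDICTORS = {
    "Constant": (0.179350, 0.000560),
    "x1": (-0.00104880, 0.00000572),
    "x2": (0.207619, 0.000927),
}

# Assumption: approximate two-sided 5% critical value of t with 117 df.
T_CRITICAL = 1.98

t_stats = {name: coef / se for name, (coef, se) in PREDICTORS.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "fail to reject H0"
    print(f"{name:8s} t = {t:9.2f} -> {verdict}")
```

The recomputed t values differ slightly from the printed T column because the inputs are rounded, but every one of them is far beyond the critical value, which is consistent with the reported p-values of 0.000.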
(3) From the output above, would you say that the Relative Price (X2) is a
good predictor of the Market Share (Y)? Why or Why Not? Please make
sure you try to answer this using the p value for the Relative Price shown. (6
Points)
The hypotheses are H0: β2 = 0 and HA: β2 ≠ 0. The p-value for the X2 coefficient is 0.000, which is less than α = 0.05.
By the decision rule at the 0.05 significance level, we reject the null hypothesis because p < 0.05, so β2 is not 0 and X2 is a significant predictor. Thus we conclude that there is statistically significant evidence, at the 0.05 level, that Relative Price (X2) is a good predictor of Market Share (Y).
(4) Do You feel as though the linear regression model (i.e. the regression equation)
does a good job of estimating (Y)? Explain. (6 Points)
R-Sq = 99.9%, which means that 99.9% of the variation in Y is explained by X1 and X2. As concluded previously, X1 and X2 are both significant variables: for each coefficient, the p-value led us to reject the null hypothesis in favor of the alternative hypothesis. Additionally, the correlation values also indicated strong relationships.
The regression equation is: y = 0.179 - 0.00105 x1 + 0.208 x2
When we plug X1 and X2 values into the equation, the calculated Y values are very close to the actual Y values, which supports the points above.
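One way to illustrate how close the fitted values are to the actual values is to look at the residuals Minitab lists under Unusual Observations, which are the rows where the fit is worst or the leverage is highest. The Python sketch below is illustrative only. It uses just the y and Fit columns from that table, and the names are hypothetical.

```python
# (observation, actual y, fitted y) from the Unusual Observations table.
UNUSUAL = [
    (18, 0.221000, 0.219663),
    (25, 0.241000, 0.242472),
    (28, 0.194000, 0.193621),
    (48, 0.212000, 0.213454),
    (61, 0.282000, 0.280931),
    (73, 0.195000, 0.195257),
    (106, 0.204000, 0.203283),
    (111, 0.178000, 0.177053),
    (118, 0.213000, 0.211642),
]

# Residual = actual - fitted; relative error = |residual| / actual.
residuals = {obs: y - fit for obs, y, fit in UNUSUAL}
relative_errors = {obs: abs(y - fit) / y for obs, y, fit in UNUSUAL}
worst_relative_error = max(relative_errors.values())

for obs, res in residuals.items():
    print(f"Obs {obs:3d}: residual = {res: .6f}")
print(f"Largest relative error: {worst_relative_error:.2%}")
```

Even for these flagged observations, the fitted market share is within one percent of the actual value.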
Part III: Matching Distributions
Task 1: Paste Your Minitab Histogram Below (Y- Times): (3 Points)
[Figure: Histogram of t. x-axis: t (6 to 36); y-axis: Frequency (0 to 50)]
Task 2: Paste Your (2) Empirical CDF Plots Below (Normal, Exponential): (6 Points)
[Figure: Empirical CDF of t with fitted Exponential distribution. Mean = 10.06, N = 250. x-axis: t (0 to 50); y-axis: Percent (0 to 100)]
Task 3: Provide Answers to the Following Questions: (6 Points)
(1) From your work in Task above, comment on the Histogram in 100 words or
less. From your work in Task 1, does time (t) seem to follow a normal
distribution? If not, from your knowledge gained in IE 322, what probability
distribution does it appear the data may fit? (3 Points)
The histogram does not follow a normal distribution. The most frequent values are concentrated on the left side of the x-axis, so the distribution is right-skewed, and the mean lies toward the left side of the histogram. The data appear to fit an exponential probability distribution, because the frequency decreases roughly exponentially as t increases.
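The descriptive statistics for t in the appendix give a quick numerical check of the same point. A mean larger than the median indicates right skew, and for an exponential distribution the standard deviation equals the mean and the median equals the mean times ln 2. The Python sketch below is illustrative, with hypothetical variable names.

```python
import math

# Descriptive statistics for t from the Minitab output in the appendix.
MEAN = 10.060
MEDIAN = 8.000
STDEV = 8.250

# Right skew: the long right tail pulls the mean above the median.
is_right_skewed = MEAN > MEDIAN

# Values an exponential distribution with this mean would have.
exponential_median = MEAN * math.log(2)
exponential_stdev = MEAN

print(f"Right-skewed (mean > median): {is_right_skewed}")
print(f"Median: observed {MEDIAN:.3f}, exponential {exponential_median:.3f}")
print(f"StDev:  observed {STDEV:.3f}, exponential {exponential_stdev:.3f}")
```

The observed median and standard deviation are in the same range as the exponential values but do not match them exactly, so this is a rough plausibility check rather than a formal test.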
(2) From your work in Task 2 above, Based on the CDF plots, discuss which
distribution appears to be a good fit for the data. From the plot of the distribution
that appears to be a good fit for the data, please estimate the parameter(s) for that
specific distribution. (3 Points)
Using Minitab, the team plotted the empirical cumulative distribution function of the data set against two fitted distributions: normal and exponential.
In the first graph of Task 2 (normal), there are two lines. The blue line is the cumulative distribution of the fitted normal distribution, and the red line is the empirical cumulative distribution of the data set. There is a visible gap between the red and blue lines, which indicates that the normal distribution is not an accurate fit.
In the exponential graph, on the other hand, the blue and red lines follow each other very closely. The exponential distribution is therefore a good fit for the data, and a much better fit than the normal distribution.
The mean of the fitted exponential distribution is 10.06 (equivalently, a rate of λ = 1/10.06 ≈ 0.0994 per minute), which means that the average waiting time for an airplane to take off is 10.06 minutes.
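The closeness of the exponential fit can also be checked numerically by comparing the exponential CDF, F(t) = 1 - exp(-t / mean), with the empirical F(t) values from the Task 4 table. The Python sketch below is illustrative only. It checks three values of t chosen for this example, and the function name is hypothetical.

```python
import math

MEAN = 10.06  # mean of the fitted exponential distribution (minutes)

# Empirical cumulative proportions F(t) taken from the Task 4 table.
EMPIRICAL_CDF = {5: 0.384, 10: 0.624, 20: 0.868}


def exponential_cdf(t, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-t / mean)


differences = {
    t: abs(exponential_cdf(t, MEAN) - observed)
    for t, observed in EMPIRICAL_CDF.items()
}

for t, observed in EMPIRICAL_CDF.items():
    fitted = exponential_cdf(t, MEAN)
    print(f"t = {t:2d}: empirical {observed:.3f}, exponential {fitted:.3f}")
```

At these three points the fitted and empirical values differ by less than 0.02.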
Task 4: Provide Answers to the Following Questions: (13 Points)
(1) Complete the Following Table and Explain your calculations (5 Points):
 t   # of occurrences   f(t)    F(t)    ln(1/(1-F(t)))   ln(1/(1-F(t)))/t
 1         23           0.092   0.092      0.096511          0.096511
 2         21           0.084   0.176      0.193585          0.096792
 3         19           0.076   0.252      0.290352          0.096784
 4         17           0.068   0.32       0.385662          0.096416
 5         16           0.064   0.384      0.484508          0.096902
 6         14           0.056   0.44       0.579818          0.096636
 7         13           0.052   0.492      0.677274          0.096753
 8         12           0.048   0.54       0.776529          0.097066
 9         11           0.044   0.584      0.87707           0.097452
10         10           0.04    0.624      0.978166          0.097817
11          9           0.036   0.66       1.07881           0.098074
12          8           0.032   0.692      1.177655          0.098138
13          7           0.028   0.72       1.272966          0.09792
14          7           0.028   0.748      1.378326          0.098452
15          6           0.024   0.772      1.47841           0.098561
16          6           0.024   0.796      1.589635          0.099352
17          5           0.02    0.816      1.69282           0.099578
18          5           0.02    0.836      1.807889          0.100438
19          4           0.016   0.852      1.910543          0.100555
20          4           0.016   0.868      2.024953          0.101248
21          4           0.016   0.884      2.154165          0.102579
22          3           0.012   0.896      2.263364          0.10288
23          3           0.012   0.908      2.385967          0.103738
24          3           0.012   0.92       2.525729          0.105239
25          3           0.012   0.932      2.688248          0.10753
26          2           0.008   0.94       2.813411          0.108208
27          2           0.008   0.948      2.956512          0.1095
28          2           0.008   0.956      3.123566          0.111556
29          2           0.008   0.964      3.324236          0.114629
30          2           0.008   0.972      3.575551          0.119185
31          2           0.008   0.98       3.912023          0.126194
32          2           0.008   0.988      4.422849          0.138214
33          1           0.004   0.992      4.828314          0.146313
34          1           0.004   0.996      5.521461          0.162396
35          1           0.004   1          #NUM!             #NUM!
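The table columns can be derived from the occurrence counts alone: f(t) is the count divided by the total, F(t) is the running sum of f(t), and the last two columns apply ln(1/(1-F(t))) and divide by t. The Python sketch below shows one way to reproduce them. It is an illustration rather than the spreadsheet actually used, and the function name is hypothetical. The last row has F(t) = 1, so the logarithm is undefined there (the #NUM! entries) and the sketch stores None instead.

```python
import math

# Number of occurrences for t = 1, 2, ..., 35 (from the table).
COUNTS = [23, 21, 19, 17, 16, 14, 13, 12, 11, 10, 9, 8, 7, 7, 6, 6, 5, 5,
          4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]


def build_table(counts):
    """Return rows of (t, count, f, F, ln(1/(1-F)), ln(1/(1-F))/t)."""
    total = sum(counts)
    rows = []
    cumulative = 0
    for t, count in enumerate(counts, start=1):
        cumulative += count
        f_t = count / total            # relative frequency
        big_f_t = cumulative / total   # cumulative relative frequency
        if cumulative < total:
            ln_term = math.log(1.0 / (1.0 - big_f_t))
            ratio = ln_term / t
        else:
            ln_term = ratio = None     # ln(1/0) is undefined (#NUM! in Excel)
        rows.append((t, count, f_t, big_f_t, ln_term, ratio))
    return rows


rows = build_table(COUNTS)
for t, count, f_t, big_f_t, ln_term, ratio in rows[:3]:
    print(f"{t:2d} {count:3d} {f_t:.3f} {big_f_t:.3f} {ln_term:.6f} {ratio:.6f}")
```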
(2) Please Paste in below your plot from Excel for the Distribution Fit with a
Linear Trendline (See Unit 13 Lecture Edited- Slide 21 for Help with the Plot) (4
Points)
[Figure: ln vs t. x-axis: t (0 to 40); y-axis: ln(1/(1-F(t))) (0 to 6). Linear trendline through the origin: y = 0.1181x, R² = 0.9164]
(3) From the Plot above, please estimate the parameter λ and then interpret the
parameter (Hint: Explain what 1/ λ means in the context of this problem). (4
Points)
For an exponential distribution, F(t) = 1 - e^(-λt), so ln(1/(1-F(t))) = λt. Plotting this quantity against t therefore gives a straight line through the origin with slope λ. The slope read from the trendline equation gives λ = 0.1181 per minute.
The mean is equal to 1/λ, so 1/0.1181 = 8.4674 min.
The mean time a plane waits for takeoff is therefore 8.4674 minutes, which is about 8 minutes and 28 seconds.
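The trendline slope can be reproduced with a least-squares fit through the origin, where the slope is Σ(t·y) / Σ(t²) with y = ln(1/(1-F(t))). The Python sketch below assumes the Excel trendline was fitted with the intercept set to zero and that the t = 35 row was left out because its logarithm is undefined. Both are inferences from the plot and table, not something the report states, and the function name is hypothetical.

```python
import math

# Number of occurrences for t = 1, 2, ..., 35 (from the Task 4 table).
COUNTS = [23, 21, 19, 17, 16, 14, 13, 12, 11, 10, 9, 8, 7, 7, 6, 6, 5, 5,
          4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]


def slope_through_origin(counts):
    """Least-squares slope of ln(1/(1-F(t))) on t, with zero intercept."""
    total = sum(counts)
    cumulative = 0
    sum_ty = 0.0
    sum_tt = 0.0
    for t, count in enumerate(counts, start=1):
        cumulative += count
        if cumulative == total:
            break  # F(t) = 1: the logarithm is undefined, so skip this point
        y = math.log(total / (total - cumulative))  # ln(1/(1-F(t)))
        sum_ty += t * y
        sum_tt += t * t
    return sum_ty / sum_tt


lam = slope_through_origin(COUNTS)
mean_wait = 1.0 / lam

print(f"lambda    = {lam:.4f} per minute")
print(f"1/lambda  = {mean_wait:.2f} minutes")
```

Under these assumptions the slope comes out at about 0.118, in line with the trendline equation, and 1/λ is about 8.5 minutes.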
GRADE: ________/ 100
————— 11/23/2013 8:16:17 PM ————————————————————
Welcome to Minitab, press F1 for help.

Histogram of t

Descriptive Statistics: t

Variable    N  N*    Mean  SE Mean  StDev  Variance  Minimum     Q1  Median
t         250   0  10.060    0.522  8.250    68.057    1.000  3.000   8.000

Variable      Q3  Maximum
t         15.000   35.000