IE 322 CASE STUDY
Group #: 5
Name: ALRESAINI, MOHAMMED
Name: ALSHEHRI, ABDULKARIEM

Part I: Correlation and Covariance

Task 1:
Scatter Plot #1: Y (Market Share) vs. X1 (Absolute Unit Price) (5 Points)
[Figure: Market Share (Y) vs Absolute Unit Price (X1). x-axis: Absolute Unit Price ($36.00 to $96.00); y-axis: Market Share (15.00% to 30.00%). Linear trendline: y = -0.0012x + 0.2876, R² = 0.4517]

Scatter Plot #2: Y (Market Share) vs. X2 (Relative Unit Price) (5 Points)
[Figure: Market Share (Y) vs Relative Unit Price (X2). x-axis: Relative Unit Price (0.3 to 0.75); y-axis: Market Share (15.00% to 30.00%). Linear trendline: y = 0.2223x + 0.1212, R² = 0.6318]

Task 2: (16 Points)
COV[X1, Y] = -0.137383064
COV[X2, Y] = 0.001002051
CORR COEFF[X1, Y] = -0.672093319
CORR COEFF[X2, Y] = 0.794832546

Show Sample Calculations Below for COV[X1, Y]; COV[X2, Y]; CORR COEFF[X1, Y]; CORR COEFF[X2, Y]:
Please see appendix.

Correlations: x1, y
Pearson correlation of x1 and y = -0.672
P-Value = 0.000

Correlations: x2, y
Pearson correlation of x2 and y = 0.795
P-Value = 0.000

Covariances: x1, y
             x1         y
x1    119.50363
y      -0.13854   0.00036

Covariances: x2, y
              x2           y
x2    0.00454571
y     0.00101047  0.00035555

Task 3: Answer the Following Questions (12 Points)

(1) Comment on the scatter plots from Task #1. Explain what you can tell about the relationship between Y and X1 and Y and X2 based on the scatter plots. You may want to use the linear trendline and resulting R² value to help you with this. (4 Points)

The R² value for the (X1, Y) graph is 0.4517, which means that X1 explains 45.17% of the variation in Y.
The (X1, Y) graph shows a poor fit: the data points scatter widely above and below the trendline y = -0.0012x + 0.2876. Moreover, the data points are far more concentrated around $42 than around $90, which suggests that X1 is not the best variable for projecting market share (Y) in the future.

Based on the scatter plots, the relationship between X2 and Y seems much more consistent and accurate than the relationship between X1 and Y. The points are mostly concentrated along the trendline y = 0.2223x + 0.1212. The R² value is 0.6318, which means 63.18% of the variation in Y can be explained by X2.

(2) Comment on the Correlation Coefficients calculated and what they tell you about the relationship between Y and X1 and Y and X2. Is there a relationship? If yes, is it positive or negative? Is it slight or strong? (4 Points)

CORR COEFF[X1, Y] = -0.672093319
The correlation coefficient between X1 and Y shows that they are negatively (inversely) related, and the relationship is relatively strong. As X1 increases, Y tends to decrease, and vice versa.

CORR COEFF[X2, Y] = 0.794832546
The correlation coefficient between X2 and Y shows that they are positively related, and the relationship is strong. As X2 increases, Y tends to increase as well.

(3) Do you feel as though the relationships you discussed in (1) and (2) can be used to accurately predict the market share (i.e. the dependent variable)? In other words, do you think the changes in the independent variables (X1, X2) can accurately predict the dependent variable (Y)? (4 Points)

As discussed in the previous questions, the correlations between (X1 and Y) and (X2 and Y) are strong, and the plots give important insights for estimating market share. Changes in the independent variables (X1, X2) can accurately predict the dependent variable (Y).
Using the two variables together, rather than each one at a time, would give more accurate results. In other words, changes in either X1 or X2 will affect the prediction of the dependent variable Y.

Part II: Regression Analysis

Task 1: (10 Points) Paste the Minitab Output below for Y vs. X1, X2

----- 11/23/2013 5:55:57 PM -----
Welcome to Minitab, press F1 for help.

Regression Analysis: y versus x1, x2

The regression equation is
y = 0.179 - 0.00105 x1 + 0.208 x2

Predictor         Coef     SE Coef        T      P
Constant      0.179350    0.000560   320.19  0.000
x1         -0.00104880  0.00000572  -183.41  0.000
x2            0.207619    0.000927   223.93  0.000

S = 0.000679380   R-Sq = 99.9%   R-Sq(adj) = 99.9%

Analysis of Variance
Source           DF        SS        MS         F      P
Regression        2  0.042256  0.021128  45775.34  0.000
Residual Error  117  0.000054  0.000000
Total           119  0.042310

Source  DF    Seq SS
x1       1  0.019112
x2       1  0.023144

Unusual Observations
Obs    x1         y       Fit    SE Fit   Residual  St Resid
 18  39.9  0.221000  0.219663  0.000125   0.001337    2.00R
 25  42.7  0.241000  0.242472  0.000073  -0.001472   -2.18R
 28  92.2  0.194000  0.193621  0.000263   0.000379    0.61 X
 48  56.8  0.212000  0.213454  0.000086  -0.001454   -2.16R
 61  46.0  0.282000  0.280931  0.000218   0.001069    1.66 X
 73  84.9  0.195000  0.195257  0.000218  -0.000257   -0.40 X
106  88.8  0.204000  0.203283  0.000252   0.000717    1.14 X
111  86.0  0.178000  0.177053  0.000228   0.000947    1.48 X
118  49.6  0.213000  0.211642  0.000103   0.001358    2.02R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Task 2: Provide Answers to the Following Questions Based on the Minitab Output Above: (24 Points)

(1) Looking at the R² value (R-Sq) in the output for the regression analysis, please write a one sentence interpretation of this value in the context of this problem. (6 Points)

The R² value (R-Sq) in the Minitab regression output is 99.9%, which means that X1 and X2 together explain 99.9% of the variation in market share (Y).
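The R-Sq value in answer (1) can be checked directly from the Analysis of Variance table, since R² is the regression sum of squares divided by the total sum of squares. The short Python sketch below uses only the SS values printed in the Minitab output; the variable names are illustrative and not part of the original analysis.

```python
# Sums of squares copied from the Minitab Analysis of Variance table.
ss_regression = 0.042256   # explained by x1 and x2 together
ss_total = 0.042310        # total variation in y
seq_ss_x1 = 0.019112       # sequential SS for x1 (entered first)


def r_squared(ss_explained: float, ss_total: float) -> float:
    """Share of the total variation in y that the model explains."""
    return ss_explained / ss_total


r2_full = r_squared(ss_regression, ss_total)
r2_x1_only = r_squared(seq_ss_x1, ss_total)

print(f"R-Sq, x1 and x2 together: {r2_full:.1%}")
print(f"R-Sq, x1 alone:           {r2_x1_only:.2%}")
```

Because x1 is entered first, its sequential SS gives the R² of a model with x1 alone, which agrees with the 0.4517 trendline value on Scatter Plot #1.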
(2) From the output above, would you say that the Absolute Price (X1) is a good predictor of the Market Share (Y)? Why or Why Not? Please make sure you try to answer this using the p value for the Absolute Price shown. (6 Points)

The hypotheses are H0: β1 = 0 and HA: β1 ≠ 0. The p-value for the X1 variable is 0.000, which is less than α = 0.05. The decision rule at the 0.05 significance level is to reject the null hypothesis when p < 0.05, so we reject H0: β1 is not 0, and X1 is a significant predictor. Thus we can conclude, at the 5% significance level, that there is statistically significant evidence that Absolute Price (X1) is a good predictor of the Market Share (Y).

(3) From the output above, would you say that the Relative Price (X2) is a good predictor of the Market Share (Y)? Why or Why Not? Please make sure you try to answer this using the p value for the Relative Price shown. (6 Points)

The hypotheses are H0: β2 = 0 and HA: β2 ≠ 0. The p-value for the X2 variable is 0.000, which is less than α = 0.05. The decision rule at the 0.05 significance level is to reject the null hypothesis when p < 0.05, so we reject H0. Thus we can conclude, at the 5% significance level, that there is statistically significant evidence that Relative Price (X2) is a good predictor of the Market Share (Y).

(4) Do you feel as though the linear regression model (i.e. the regression equation) does a good job of estimating (Y)? Explain. (6 Points)

The R-Sq value is 99.9%, which means that X1 and X2 together explain 99.9% of the variation in Y. As concluded previously, X1 and X2 are both significant predictors: in each p-value test the null hypothesis was rejected in favor of the alternative hypothesis. Additionally, the correlation values indicated strong relationships.
The regression equation is: y = 0.179 - 0.00105 x1 + 0.208 x2
When X1 and X2 values are plugged into the equation, the calculated Y values are very close to the actual Y values, which supports the conclusions above.

Part III: Matching Distributions

Task 1: Paste Your Minitab Histogram Below (Y- Times): (3 Points)
[Figure: Histogram of t. x-axis: t (6 to 36); y-axis: Frequency (0 to 50)]

Task 2: Paste Your (2) Empirical CDF Plots Below (Normal, Exponential): (6 Points)
[Figure: Empirical CDF of t, Exponential. Legend: Mean = 10.06, N = 250. x-axis: t (0 to 50); y-axis: Percent (0 to 100)]

Task 3: Provide Answers to the Following Questions: (6 Points)

(1) From your work in Task 1 above, comment on the Histogram in 100 words or less. From your work in Task 1, does time (t) seem to follow a normal distribution? If not, from your knowledge gained in IE 322, what probability distribution does it appear the data may fit? (3 Points)

The histogram does not follow a normal distribution. The most frequent values are concentrated on the left side of the x-axis, so the distribution is right skewed. The mean is on the left side of the histogram. The data appear to fit an exponential probability distribution, because the frequencies decrease roughly exponentially as t increases.

(2) From your work in Task 2 above, based on the CDF plots, discuss which distribution appears to be a good fit for the data. From the plot of the distribution that appears to be a good fit for the data, please estimate the parameter(s) for that specific distribution. (3 Points)

Using Minitab, the team members plotted the empirical cumulative distribution function of the data set against two candidate distributions, normal and exponential. As seen in the first graph of Task 2, the normal distribution graph has two lines: the blue line is the cumulative distribution function of a normal distribution, and the red line is the empirical cumulative distribution of the data set.
As observed from the graph, there is a gap between the red and blue lines, which indicates that the normal distribution is not an accurate fit. On the other hand, in the exponential cumulative distribution graph the blue and red lines follow each other very closely. Therefore the exponential graph shows that the exponential distribution is a good fit for the data and a much better fit than the normal distribution. The mean of the fitted exponential distribution is 10.06, which means the average time an airplane waits for takeoff is 10.06 minutes.

Task 4: Provide Answers to the Following Questions: (13 Points)

(1) Complete the Following Table and Explain your calculations (5 Points):

 t  # of occurrences   f(t)   F(t)  ln(1/(1-F(t)))  ln(1/(1-F(t)))/t
 1                23  0.092  0.092        0.096511          0.096511
 2                21  0.084  0.176        0.193585          0.096792
 3                19  0.076  0.252        0.290352          0.096784
 4                17  0.068  0.32         0.385662          0.096416
 5                16  0.064  0.384        0.484508          0.096902
 6                14  0.056  0.44         0.579818          0.096636
 7                13  0.052  0.492        0.677274          0.096753
 8                12  0.048  0.54         0.776529          0.097066
 9                11  0.044  0.584        0.87707           0.097452
10                10  0.04   0.624        0.978166          0.097817
11                 9  0.036  0.66         1.07881           0.098074
12                 8  0.032  0.692        1.177655          0.098138
13                 7  0.028  0.72         1.272966          0.09792
14                 7  0.028  0.748        1.378326          0.098452
15                 6  0.024  0.772        1.47841           0.098561
16                 6  0.024  0.796        1.589635          0.099352
17                 5  0.02   0.816        1.69282           0.099578
18                 5  0.02   0.836        1.807889          0.100438
19                 4  0.016  0.852        1.910543          0.100555
20                 4  0.016  0.868        2.024953          0.101248
21                 4  0.016  0.884        2.154165          0.102579
22                 3  0.012  0.896        2.263364          0.10288
23                 3  0.012  0.908        2.385967          0.103738
24                 3  0.012  0.92         2.525729          0.105239
25                 3  0.012  0.932        2.688248          0.10753
26                 2  0.008  0.94         2.813411          0.108208
27                 2  0.008  0.948        2.956512          0.1095
28                 2  0.008  0.956        3.123566          0.111556
29                 2  0.008  0.964        3.324236          0.114629
30                 2  0.008  0.972        3.575551          0.119185
31                 2  0.008  0.98         3.912023          0.126194
32                 2  0.008  0.988        4.422849          0.138214
33                 1  0.004  0.992        4.828314          0.146313
34                 1  0.004  0.996        5.521461          0.162396
35                 1  0.004  1            #NUM!             #NUM!
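The table columns can be reproduced from the occurrence counts alone: f(t) is the count divided by N = 250, F(t) is the running sum of f(t), and the last two columns apply ln(1/(1-F(t))) and divide by t. The Python sketch below is one way to do this; the names `counts`, `rows`, and `fmt` are illustrative assumptions, not part of the original spreadsheet. The final row is left undefined because F(35) = 1 makes the logarithm's argument infinite, which is why the spreadsheet shows #NUM! there.

```python
import math

# Occurrence counts for t = 1..35, copied from the table above.
counts = [23, 21, 19, 17, 16, 14, 13, 12, 11, 10, 9, 8, 7, 7, 6, 6, 5, 5,
          4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
n = sum(counts)  # total number of observations

rows = []
cumulative = 0
for t, count in enumerate(counts, start=1):
    cumulative += count
    f_t = count / n          # relative frequency f(t)
    F_t = cumulative / n     # empirical cumulative distribution F(t)
    if cumulative < n:
        # ln(1/(1-F(t))) written with integers to avoid rounding in 1 - F(t)
        log_term = math.log(n / (n - cumulative))
        ratio = log_term / t
    else:
        log_term = ratio = None  # undefined when F(t) = 1
    rows.append((t, count, f_t, F_t, log_term, ratio))


def fmt(value):
    """Format a table cell, marking undefined entries as the spreadsheet does."""
    return "#NUM!" if value is None else f"{value:.6f}"


for t, count, f_t, F_t, log_term, ratio in rows:
    print(f"{t:2d} {count:3d} {f_t:.3f} {F_t:.3f} {fmt(log_term)} {fmt(ratio)}")
```

If the data were exactly exponential, the last column would be constant and equal to λ; in the table it stays near 0.097 to 0.10 for most of the range and drifts upward only in the sparse right tail.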
(2) Please Paste in below your plot from Excel for the Distribution Fit with a Linear Trendline (See Unit 13 Lecture Edited- Slide 21 for Help with the Plot) (4 Points)
[Figure: ln vs t. x-axis: t (0 to 40); y-axis: ln(1/(1-F(t))) (0 to 6). Linear trendline: y = 0.1181x, R² = 0.9164]

(3) From the Plot above, please estimate the parameter λ and then interpret the parameter (Hint: Explain what 1/λ means in the context of this problem). (4 Points)

The data fit an exponential distribution, so the plot is linearized as a line through the origin: for an exponential distribution F(t) = 1 - e^(-λt), which gives ln(1/(1-F(t))) = λt. The slope read from the trendline equation is the value of λ = 0.1181. The mean is equal to 1/λ, therefore 1/0.1181 = 8.4674 min. The mean time a plane waits for takeoff is 8.4674 minutes, which is about 8 minutes and 28 seconds.

GRADE: ________/ 100

----- 11/23/2013 8:16:17 PM -----
Welcome to Minitab, press F1 for help.

Histogram of t

Descriptive Statistics: t

Variable    N  N*    Mean  SE Mean  StDev  Variance  Minimum     Q1  Median      Q3  Maximum
t         250   0  10.060    0.522  8.250    68.057    1.000  3.000   8.000  15.000   35.000
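The λ estimate in Task 4 (3) and the sample mean in the descriptive statistics can both be recomputed from the Task 4 occurrence counts. The Python sketch below assumes the Excel trendline was fitted with its intercept set to zero, in which case the least-squares slope is sum(t*y) / sum(t^2); that assumption and all variable names are illustrative, not taken from the original workbook.

```python
import math

# Occurrence counts for t = 1..35, copied from the Task 4 table.
counts = [23, 21, 19, 17, 16, 14, 13, 12, 11, 10, 9, 8, 7, 7, 6, 6, 5, 5,
          4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
n = sum(counts)

# Build the (t, ln(1/(1-F(t)))) points, skipping t = 35 where F(t) = 1.
points = []
cumulative = 0
for t, count in enumerate(counts, start=1):
    cumulative += count
    if cumulative < n:
        points.append((t, math.log(n / (n - cumulative))))

# Least-squares slope of a line forced through the origin (assumed fit type).
slope = sum(t * y for t, y in points) / sum(t * t for t, _ in points)
mean_from_slope = 1 / slope

# Sample mean of t, for comparison with the descriptive statistics.
sample_mean = sum(t * c for t, c in enumerate(counts, start=1)) / n

whole_minutes = int(mean_from_slope)
seconds = round((mean_from_slope - whole_minutes) * 60)

print(f"lambda from slope: {slope:.4f}")
print(f"1/lambda: {mean_from_slope:.4f} min ({whole_minutes} min {seconds} s)")
print(f"sample mean of t: {sample_mean:.3f} min")
```

The two mean estimates differ (about 8.5 minutes from the trendline versus 10.06 minutes from the sample) because a fit through the origin weights the large-t points most heavily, and those are the sparse tail points where ln(1/(1-F(t)))/t rises above the rest of the column.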