MGQ 301 – Spring 2015
Statistical Decisions in Management
Homework 1
Seungmin Lee 50093307

Q1. Use EDUSEV.xls from the class website under UBlearns. (Total 10 points)

a. How many observations are there? (1 point)
- 78 observations.

b. How many variables are there? Which variable is categorical? (2 points)
- 5 variables. Male is the categorical variable.

c. What percentage is female? (Hint: Calculate the mean of the “male” variable) (1 point)
Analysis Variable : Male (N = 78)
- There are 47 females out of 78 students: 47/78 = 0.602564 = 60.2564%

d. Make a suitable graph that describes the shape, center, and spread of the distribution of students’ IQ scores. (1 point)

e. In general, IQ scores are usually said to be centered at 100. Is this true for this data? Describe the distribution in a couple of sentences. Is the midpoint for these students close to 100, clearly above, or clearly below? (2 points)
Analysis Variable : IQ
Mean     Std Dev  Minimum  Maximum  Range  N   Lower Quartile  Median  Upper Quartile
108.923  13.170   72       136      64     78  103             110     118
- The mean is 108.923, which at first glance may seem close to 100. However, the mean (108.923), the lower quartile (103), and the median (110) are all above 100, so at least three quarters of the students score above 100. The claim that IQ scores are centered at 100 does not hold for this data; the midpoint is clearly above 100.

f. Make a suitable graph that describes the shape, center, and spread of the distribution of self-concept scores. (1 point)

g. Can you identify any suspected outliers? Why? (2 points)
Analysis Variable : SelfConcept
Mean        Std Dev     Minimum     Maximum     N   Lower Quartile  Median      Upper Quartile
56.9615385  12.4122293  20.0000000  80.0000000  78  51.0000000      59.5000000  66.0000000
- There are 3 suspected outliers by the 1.5 × IQR rule.
IQR = Q3 − Q1 = 66 − 51 = 15
1.5 × IQR = 1.5 × 15 = 22.5
Lower fence: 51 − 22.5 = 28.5
Upper fence: 66 + 22.5 = 88.5
The scores 28, 21, and 20 are below 28.5, so they are suspected outliers. No score is above 88.5 (the maximum is 80), so there are no high outliers.

Q2. The following table contains monthly housing expenditures for 10 families. (Total 5 points)

Family                                  1    2    3    4      5    6    7    8    9    10
Monthly Housing Expenditures (Dollars)  300  440  350  1,100  640  480  450  700  670  530

a. Find the mean monthly housing expenditure.
(1 point)
Analysis Variable : B
Mean         N
566.0000000  10

b. Find the median monthly housing expenditure. (1 point)
Analysis Variable : B
N   Median
10  505.0000000

c. If monthly housing expenditure were measured in hundreds of dollars, rather than in dollars, what would be the average and median expenditures? (1 point)
- Measuring in hundreds of dollars divides every value by 100, so the mean and the median are divided by 100 as well:
Mean = 566 / 100 = 5.66 (hundreds of dollars)
Median = 505 / 100 = 5.05 (hundreds of dollars)

d. Suppose that family number 8 increases its monthly housing expenditure to $900, but the expenditures of all other families remain the same. Compute the mean and median housing expenditures. (1 point)
Analysis Variable : B
Mean         N   Median
586.0000000  10  505.0000000

e. Refer back to the original data. Now, suppose that family number 4 decreases its monthly housing expenditure to $800, but the expenditures of all other families remain the same. Compute the mean and median housing expenditures. (1 point)
Analysis Variable : B
Mean         N   Median
536.0000000  10  505.0000000

Q3. Suppose the following equation describes the relationship between the average number of classes missed during a semester (missed) and the distance from school (distance, measured in miles) (Total 4 points):

missed = 3 + 0.2 distance

a. Sketch this line, being sure to label the axes. How do you interpret the intercept in this equation? (2 points)
- Y = 3 + 0.2x (Y = missed, x = distance). The intercept, 3, is the average number of classes missed by a student who lives 0 miles from school.

b. What is the average number of classes missed for someone who lives five miles away? (1 point)
- Y = 3 + 0.2x (Y = missed, x = distance)
Y = 3 + (0.2 × 5) = 3 + 1 = 4
The average number of classes missed for someone who lives five miles away is 4 classes.

c. What is the difference in the average number of classes missed for someone who lives 10 miles away and someone who lives 20 miles away? (1 point)
- Y = 3 + 0.2x (Y = missed, x = distance)
\( x_1 = 10 \): \( Y_1 = 3 + (0.2 \times 10) = 3 + 2 = 5 \)
\( x_2 = 20 \): \( Y_2 = 3 + (0.2 \times 20) = 3 + 4 = 7 \)
\( Y_2 - Y_1 = 7 - 5 = 2 \)
Someone who lives 20 miles away misses 2 more classes on average than someone who lives 10 miles away.

Q4. Use CORR.xls data to answer the following question that illustrates an important point about correlation. (Total 4 points)

a. Make a scatterplot of Y versus X. (1 point)

b.
Describe the relationship between Y and X. Is it weak or strong? Is it linear? (1 point)
- The relationship is strong, but the scatterplot is U-shaped, so it is not linear.

c. Find the correlation between Y and X. (1 point)
\[ r = \frac{1}{5-1} \sum_{i=1}^{5} \left[ \left( \frac{x_i - 45}{15.81139} \right) \times \left( \frac{y_i - 26}{16.7332} \right) \right] = 0 \]
Pearson Correlation Coefficients, N = 5
Prob > |r| under H0: Rho=0
     x
y    0.00000
     1.0000
- The correlation is r = 0.

d. What important point about correlation does this exercise illustrate? (1 point)
- The relationship between Y and X is strong, yet the correlation is 0. Correlation measures only the strength of a linear relationship. On the U-shaped curve, Y falls as X increases over one half of the range and rises over the other half, so the negative and positive contributions to r cancel. A correlation of 0 therefore does not mean that there is no relationship between the variables.

Q5. Use BEER.xls. Use any statistical software to do problem 2.51 on page 108. (Total 6 points)

a. Make a scatterplot of carbohydrates (g) versus alcohol (%) for 153 brands of beer. (2 points)

b. Compute the correlation for these data. (2 points)
1 With Variables: Carbohydrates
1      Variables: PercentAlcohol

Simple Statistics
Variable        N    Mean      Std Dev  Sum        Minimum  Maximum
Carbohydrates   153  11.95961  4.90578  1830       1.90000  32.10000
PercentAlcohol  153  5.22882   1.42874  800.01000  0.40000  11.50000

Pearson Correlation Coefficients, N = 153
Prob > |r| under H0: Rho=0
               PercentAlcohol
Carbohydrates  0.52097
               <.0001
- The correlation is r = 0.52097.

c. The data you used to compute the correlation in part (b) includes an outlier. Remove the outlier and recompute the correlation. (2 points)
1 With Variables: Carbohydrates
1      Variables: PercentAlcohol

Simple Statistics
Variable        N    Mean      Std Dev  Sum        Minimum  Maximum
Carbohydrates   152  11.95079  4.92078  1817       1.90000  32.10000
PercentAlcohol  152  5.26059   1.37818  799.61000  2.40000  11.50000

Pearson Correlation Coefficients, N = 152
Prob > |r| under H0: Rho=0
               PercentAlcohol
Carbohydrates  0.54837
               <.0001
- Without the outlier, the correlation is r = 0.54837.

Q6. Each of the following statements contains an error. Describe each error and explain why the statement is wrong. (4 points)

a.
A strong negative relationship implies that there is a causation between the explanatory variable and the response variable. (2 points)
- Correlation does not imply causation. A strong association, whether negative or positive, can be produced by lurking variables rather than by a causal effect of the explanatory variable on the response variable.

b. A lurking (confounding) variable is always something that can be measured. (2 points)
- A lurking variable is not always something that can be measured. It may be unobserved, or even unknown to the researcher.

Q7. In the language of government statistics, you are “in the labor force” if you are available for work and either working or actively seeking work. The unemployment rate is the proportion of the labor force (not of the entire population) who are unemployed. Here are data from the Current Population Survey (CPS) for the civilian population aged 25 years and over. The table entries are counts in thousands of people. You must show your work in answering the following questions. (4 points)

Highest Education                       Total Population  In Labor Force  Employed
Did not finish high school              28,021            12,623          11,552
High school but no college              59,844            38,210          36,249
Some college, but no bachelor’s degree  46,777            33,928          32,429
College graduate                        51,568            40,414          39,250

a. Find the unemployment rate for people with each level of education. How does the unemployment rate change with education? Explain carefully why your results show that level of education and being employed are not independent. (2 points)
Number unemployed = In Labor Force − Employed
Unemployment rate = Number unemployed / In Labor Force

Highest Education                       In Labor Force  Employed  Number Unemployed  Unemployment Rate
Did not finish high school              12,623          11,552    1,071              8.5%
High school but no college              38,210          36,249    1,961              5.1%
Some college, but no bachelor’s degree  33,928          32,429    1,499              4.4%
College graduate                        40,414          39,250    1,164              2.9%

- Level of education and being employed are not independent. The unemployment rate falls steadily as the level of education rises, from about 8.5% for people who did not finish high school to about 2.9% for college graduates, so people with more education are more likely to be employed. If the two were independent, the unemployment rate would be the same at every level of education.

b. What is the probability that a randomly chosen person 25 years of age or older is in the labor force?
(1 point)

Highest Education                       Total Population  In Labor Force
Did not finish high school              28,021            12,623
High school but no college              59,844            38,210
Some college, but no bachelor’s degree  46,777            33,928
College graduate                        51,568            40,414
Total                                   186,210           125,175

- Everyone in the table is 25 years of age or older, so
\[ P(\text{in labor force}) = \frac{125{,}175}{186{,}210} = 0.672225 \approx 67.22\% \]

c. If you know that the person chosen is a college graduate, what is the conditional probability that he or she is in the labor force? (1 point)
- A = college graduate, B = in the labor force
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{40{,}414 / 186{,}210}{51{,}568 / 186{,}210} = \frac{40{,}414}{51{,}568} = 0.783703 = 78.3703\% \]

Q8. Suppose 40% of adults get enough sleep, 46% get enough exercise, and 24% do both. You must show your work in answering the following questions. (4 points)

a. Draw a Venn diagram showing the probabilities for exercise and sleep. (1 point)
A = get enough sleep, B = get enough exercise
Get enough sleep only: 16%
Both, \( P(A \cap B) \): 24%
Get enough exercise only: 22%

b. Find the probabilities of the following events:
Sleep and exercise are not independent here, since \( P(A) \times P(B) = 0.40 \times 0.46 = 0.184 \neq 0.24 = P(A \cap B) \), so the probabilities cannot simply be multiplied. Each answer is read from the Venn diagram instead.

i. Enough sleep and not enough exercise (1 point)
A = enough sleep, \( B^c \) = not enough exercise
\[ P(A \cap B^c) = P(A) - P(A \cap B) = 40\% - 24\% = 16\% \]

ii. Not enough sleep and enough exercise (1 point)
\( A^c \) = not enough sleep, B = enough exercise
\[ P(A^c \cap B) = P(B) - P(A \cap B) = 46\% - 24\% = 22\% \]

iii. Not enough sleep and not enough exercise (1 point)
\( A^c \) = not enough sleep, \( B^c \) = not enough exercise
\[ P(A^c \cap B^c) = 1 - P(A \cup B) = 1 - (40\% + 46\% - 24\%) = 1 - 62\% = 38\% \]

Q9. Three different machines M1, M2, and M3 were used for producing a large batch of similar manufactured items. Suppose that 20 percent of the items were produced by machine M1, 30 percent by machine M2, and 50 percent by machine M3. Suppose further that 1 percent of the items produced by machine M1 are defective, that 2 percent of the items produced by machine M2 are defective, and that 3 percent of the items produced by machine M3 are defective.
Finally, suppose that one item is selected at random from the entire batch, and it is found to be defective. You must show your work in answering the following questions. (2 points)

Share of batch: M1 = 20%, M2 = 30%, M3 = 50%, entire batch = 100%
Defective rate: M1 = 1%, M2 = 2%, M3 = 3%

Machine  Share of batch  Defective           Nondefective
M1       0.2             0.2 × 0.01 = 0.002  0.2 − 0.002 = 0.198
M2       0.3             0.3 × 0.02 = 0.006  0.3 − 0.006 = 0.294
M3       0.5             0.5 × 0.03 = 0.015  0.5 − 0.015 = 0.485
Total    1.0             0.023               0.977

a. What is the probability that this item was produced by machine M2? (1 point)
- A = the selected item was produced by machine M2
B = the item selected from the entire batch is defective
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.006}{0.023} = 0.26087 = 26.087\% \]

b. In the context of this exercise, a probability of an event before the item is selected and before it is known whether the selected item is defective or nondefective is often called the prior probability. A probability of an event after it is known that the selected item is defective is often called the posterior probability. Suppose that the item selected at random from the entire lot is found to be nondefective. What is the posterior probability that it was produced by machine M2? (1 point)
- A = the selected item was produced by machine M2
- B = the item selected at random from the entire lot is nondefective
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.294}{0.977} = 0.300921 = 30.0921\% \]

Q10. Two boxes contain long bolts and short bolts. Suppose that one box contains 60 long bolts and 40 short bolts, and that the other box contains 10 long bolts and 20 short bolts. Suppose also that one box is selected at random and a bolt is then selected at random from that box. What is the probability that this bolt is long? You must show your work.
(2 points)

Box 1: long bolts = 60, short bolts = 40
Box 2: long bolts = 10, short bolts = 20

\[ P(\text{select a given box}) = \frac{1}{2} \]
\[ P(\text{long} \mid \text{box 1}) = \frac{60}{100}, \qquad P(\text{long} \mid \text{box 2}) = \frac{10}{30} \]
\[ P(\text{box 1 and long}) = \frac{1}{2} \times \frac{60}{100} = \frac{30}{100} = 30\% \]
\[ P(\text{box 2 and long}) = \frac{1}{2} \times \frac{10}{30} = \frac{5}{30} = 0.166667 = 16.6667\% \]
\[ P(\text{long}) = 30\% + 16.6667\% = 46.6667\% \]

Q11. A $1 bet in a state lottery’s Pick 3 game pays $500 if the three-digit number you choose exactly matches the winning number, which is drawn at random. Here is the distribution of the payoff X:

Payoff X     $0     $500
Probability  0.999  0.001

Each day’s drawing is independent of other drawings. You must show your work in answering the following questions. (3 points)

a. What are the mean and standard deviation of X? (1 point)
\[ \mu = E(X) = 0 \times 0.999 + 500 \times 0.001 = 0 + 0.5 = 0.5 \]
\[ \sigma^2 = (0 - 0.5)^2 (0.999) + (500 - 0.5)^2 (0.001) = 0.24975 + 249.50025 = 249.75 \]
\[ \sigma = \sqrt{249.75} \approx 15.8035 \]

b. Joe buys a Pick 3 ticket twice a week. What does the law of large numbers say about the average payoff Joe receives from his bets? (1 point)
- The law of large numbers says that, as Joe keeps buying tickets, his average payoff per ticket gets closer and closer to the mean payoff of $0.50. Since each ticket costs $1, in the long run he loses an average of 50 cents per bet.

c. Joe comes out ahead for the year if his average payoff is greater than $1 (the amount he spent each day on a ticket). What is the probability that Joe ends the year ahead? (1 point)
- Buying a ticket twice a week gives n = 2 × 52 = 104 tickets in a year. By the central limit theorem, the average payoff \( \bar{x} \) is approximately Normal with mean 0.5 and standard deviation
\[ \frac{\sigma}{\sqrt{n}} = \frac{15.8035}{\sqrt{104}} \approx 1.5497 \]
\[ z = \frac{1 - 0.5}{1.5497} \approx 0.32, \qquad P(\bar{x} > 1) \approx P(Z > 0.32) = 1 - 0.6255 = 0.3745 \approx 37\% \]

Q12. According to genetic theory, the blossom color in the second generation of a certain cross of sweet peas should be red or white in a 3:1 ratio. That is, each plant has probability ¾ of having red blossoms, and the blossom colors of separate plants are independent. Show your work. (2 points)

a. What is the probability that exactly 9 out of 12 of these plants have red blossoms?
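(1 point)
\[ P(R) = \frac{3}{4}, \qquad P(W) = \frac{1}{4} \]
\[ P(X = 9) = \frac{12!}{9!\,(12-9)!} \left( \frac{3}{4} \right)^{9} \left( 1 - \frac{3}{4} \right)^{12-9} = \frac{10 \times 11 \times 12}{1 \times 2 \times 3} \left( \frac{3}{4} \right)^{9} \left( \frac{1}{4} \right)^{3} = 220 \left( \frac{19683}{262144} \right) \left( \frac{1}{64} \right) = 0.258104 = 25.8104\% \]

b. What is the mean number of red-blossomed plants when 120 plants of this type are grown from seeds? (1 point)
\[ \mu = E(X) = np, \qquad n = 120, \qquad p = \frac{3}{4} \]
\[ \mu = E(X) = 120 \times \frac{3}{4} = 90 \]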
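
The two binomial results in Q12 can be double-checked with a few lines of Python. This is only a minimal sketch using the standard library; the helper name `binomial_pmf` and the variable names are illustrative choices, not part of the assignment or of any statistical package.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p). Hypothetical helper name."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_red = 3 / 4  # probability that a single plant has red blossoms

# Q12a: exactly 9 red-blossomed plants out of 12
prob_9_of_12 = binomial_pmf(9, 12, p_red)

# Q12b: mean number of red-blossomed plants among 120, mu = n * p
mean_red_120 = 120 * p_red

print(f"P(X = 9) = {prob_9_of_12:.6f}")  # 0.258104
print(f"mean     = {mean_red_120}")      # 90.0
```

Writing the probability as a function of k, n, and p makes it easy to reuse the same check for other counts, for example `binomial_pmf(10, 12, p_red)`.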