MGQ 301 – Spring 2015 Statistical Decisions in Management
Homework 1
Seungmin Lee
50093307
Q1. Use EDUSEV.xls from the class website under UBlearns. (Total 10 points)
a. How many observations are there? (1 point)
- 78 observations
b. How many variables are there? Which variable is categorical? (2 points)
- 5 variables. MALE is the categorical variable.
c. What percent of the students are female? (Hint: Calculate the mean of the “male” variable) (1 point)
Analysis Variable : Male
N = 78
- 47 of the 78 students are female: 47/78 = 0.602564 ≈ 60.26%. Equivalently, the mean of Male is 31/78 ≈ 0.3974, and 1 − 0.3974 = 0.6026.
d. Make a suitable graph that describes the shape, center, and spread of the distribution of students’ IQ
scores. (1 points)
e. In general, IQ scores are usually said to be centered at 100. Is this true for this data? Describe the
distribution in a couple of sentences. Is the midpoint for these students close to 100, clearly above, or
clearly below? (2 points)
Analysis Variable : IQ
Mean     Std Dev  Minimum  Maximum  Range  N   Lower Quartile  Median  Upper Quartile
108.923  13.170   72       136      64     78  103             110     118
- The mean is 108.923 and the median is 110, and even the lower quartile (103) is above 100. At least
three quarters of these students score above 100, so the midpoint is clearly above 100. The statement
‘IQ scores are usually said to be centered at 100’ is false for this data.
f. Make a suitable graph that describes the shape, center, and spread of the distribution of self-concept
scores. (1 point)
g. Can you identify any suspected outliers? Why? (2point)
Analysis Variable : SelfConcept
Mean        Std Dev     Minimum     Maximum     N   Lower Quartile  Median      Upper Quartile
56.9615385  12.4122293  20.0000000  80.0000000  78  51.0000000      59.5000000  66.0000000
- There are 3 suspected outliers.
IQR = Q3−Q1 = 66−51 = 15
1.5×IQR = 1.5×15 = 22.5
Lower fence: 51−22.5 = 28.5. Upper fence: 66+22.5 = 88.5.
The scores 28, 21, and 20 are below 28.5, so they are suspected outliers. The maximum, 80, is below
88.5, so there are no outliers on the high side.
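The 1.5×IQR rule used above can be sketched in a few lines of Python. This is only an illustration built from the quartiles and the scores quoted in the answer; the full SelfConcept column is in EDUSEV.xls and is not reproduced here, so the list `known_scores` is an assumption that holds just the three low scores and the reported maximum.

```python
# 1.5 x IQR rule, using the quartiles reported for SelfConcept.
q1, q3 = 51.0, 66.0  # lower and upper quartile from the summary output

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are suspected outliers
upper_fence = q3 + 1.5 * iqr   # values above this are suspected outliers

# Assumption: only the scores quoted in the answer are listed here
# (the three low scores and the reported maximum of 80).
known_scores = [20, 21, 28, 80]
outliers = [s for s in known_scores if s < lower_fence or s > upper_fence]

print("fences:", lower_fence, upper_fence)
print("suspected outliers:", outliers)
```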
Q2. The following table contains monthly housing expenditures for 10 families. (Total 5 points)
Family  Monthly Housing Expenditures (Dollars)
1       300
2       440
3       350
4       1,100
5       640
6       480
7       450
8       700
9       670
10      530
a. Find the mean monthly housing expenditure. (1 point)
Analysis Variable : B
Mean         N
566.0000000  10
b. Find the median monthly housing expenditure. (1 point)
Analysis Variable : B
N   Median
10  505.0000000
c. If monthly housing expenditure were measured in hundreds of dollars, rather than in dollars, what
would be the average and median expenditures? (1 point)
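Measuring in hundreds of dollars divides every expenditure by 100, so the mean and the median are
divided by 100 as well.
Analysis Variable : F2
Mean  N   Median
5.66  10  5.05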
d. Suppose that family number 8 increases its monthly housing expenditure to $900, but the
expenditures of all other families remain the same. Compute the mean and median housing
expenditures. (1 point)
Analysis Variable : B
Mean         N   Median
586.0000000  10  505.0000000
e. Refer back to the original data. Now, suppose that family number 4 decreases its monthly housing
expenditure to $800, but the expenditures of all other families remain the same. Compute the
mean and median housing expenditures. (1 point)
Analysis Variable : B
Mean         N   Median
536.0000000  10  505.0000000
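The means and medians for parts a through e can be checked with Python's standard `statistics` module. This is a small sketch, not the SAS run used for the homework; the helper name `summarize` is made up for the example.

```python
from statistics import mean, median

# Monthly housing expenditures for families 1..10, in dollars.
expenditures = [300, 440, 350, 1100, 640, 480, 450, 700, 670, 530]

def summarize(values):
    """Return (mean, median) of a list of numbers."""
    return mean(values), median(values)

original = summarize(expenditures)                         # parts a and b
in_hundreds = summarize([v / 100 for v in expenditures])   # part c

family8_900 = expenditures.copy()
family8_900[7] = 900          # part d: family 8 now spends $900
part_d = summarize(family8_900)

family4_800 = expenditures.copy()
family4_800[3] = 800          # part e: family 4 now spends $800
part_e = summarize(family4_800)

for label, result in [("a/b", original), ("c", in_hundreds),
                      ("d", part_d), ("e", part_e)]:
    print(label, result)
```

The median stays at 505 in parts d and e because neither change moves a value across the middle of the sorted list, while the mean shifts by the change divided by 10.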
Q3. Suppose the following equation describes the relationship between the average number of classes
missed during a semester (missed) and the distance from school (distance, measured in miles) (Total 4
points):
missed = 3 + 0.2 distance
a. Sketch this line, being sure to label the axes. How do you interpret the intercept in this equation? (2
points)
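Y=3+0.2x (Y=missed on the vertical axis, x=distance in miles on the horizontal axis)
The intercept, 3, is the average number of classes missed by a student who lives 0 miles from school.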
b. What is the average number of classes missed for someone who lives five miles away? (1 point)
- Y=3+0.2x (Y=missed, x=distance)
Y=3+(0.2×5)=3+1=4.
The average number of classes missed for someone who lives five miles away is 4 classes.
c. What is the difference in the average number of classes missed for someone who lives 10 miles away
and someone who lives 20 miles away? (1 points)
Y=3+0.2x (Y=missed, x=distance)
x₁=10
Y₁=3+(0.2×10)=3+2=5
x₂=20
Y₂=3+(0.2×20)=3+4=7
Y₂-Y₁=7-5=2
The student who lives 20 miles away misses 2 more classes on average.
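Parts b and c only plug distances into the line, which a short Python sketch makes explicit. The function name `predicted_missed` is made up for the example.

```python
def predicted_missed(distance_miles):
    """Average number of classes missed: missed = 3 + 0.2 * distance."""
    return 3 + 0.2 * distance_miles

five_miles = predicted_missed(5)                           # part b
difference = predicted_missed(20) - predicted_missed(10)   # part c

print("5 miles:", five_miles)
print("20 miles vs 10 miles:", difference)
```

The difference in part c equals the slope times the change in distance, 0.2 × 10, so the intercept plays no role in it.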
Q4. Use CORR.xls data to answer the following question that illustrates an important point about
correlation. (Total 4 points)
a. Make a scatterplot of Y versus X. (1 point)
b. Describe the relationship between Y and X. Is it weak or strong? Is it linear? (1 point)
- It is strong. The graph is U-shaped, so the relationship is not linear.
c. Find the correlation between Y and X. (1 point)
r = 1/(5−1) × ∑(i=1..5) {[(x_i − 45)/15.81139] × [(y_i − 26)/16.7332]} = 0 = correlation
Pearson Correlation Coefficients, N = 5
Prob > |r| under H0: Rho=0
     y
x    0.00000
     1.0000
d. What important point about correlation does this exercise illustrate? (1 point)
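- The scatterplot shows a strong relationship between Y and X, but the relationship is U-shaped, not
linear, and the correlation is 0. On one half of the U the slope is positive and on the other half it is
negative, so the two halves cancel. Correlation measures only the strength of a linear relationship, so
r = 0 does not mean that there is no relationship.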
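The cancellation can be seen numerically with a small Python sketch of the standardized-product formula for r. The five (x, y) pairs below are an assumption: they were not read from CORR.xls, only chosen so that their means (45 and 26) and standard deviations (15.81139 and 16.7332) match the values used in part c. The function name `pearson_r` is likewise made up for the example.

```python
from statistics import mean, stdev

# Illustrative U-shaped data (assumed, not taken from CORR.xls).
x = [25, 35, 45, 55, 65]
y = [42, 22, 2, 22, 42]

def pearson_r(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # sample standard deviations (n-1)
    n = len(xs)
    products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys)]
    return sum(products) / (n - 1)

r = pearson_r(x, y)
print("mean x, sd x:", mean(x), round(stdev(x), 5))
print("mean y, sd y:", mean(y), round(stdev(y), 4))
print("r is zero up to rounding:", abs(r) < 1e-12)
```

Each point on the falling side of the U has a mirror image on the rising side with the opposite sign of (x − 45), so the products cancel in pairs.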
Q5. Use BEER.xls. Use any statistical software to do problem 2.51 on page 108. (Total 6 points)
a. Make a scatterplot of carbohydrates (g) versus alcohol (%) for 153 brands of beer. (2 point)
b. Compute the correlation for these data. (2 point)
1 With Variables: Carbohydrates
1 Variables:      PercentAlcohol
Simple Statistics
Variable        N    Mean      Std Dev  Sum        Minimum  Maximum
Carbohydrates   153  11.95961  4.90578  1830       1.90000  32.10000
PercentAlcohol  153  5.22882   1.42874  800.01000  0.40000  11.50000
Pearson Correlation Coefficients, N = 153
Prob > |r| under H0: Rho=0
               PercentAlcohol
Carbohydrates  0.52097
               <.0001
c. The data you used to compute the correlation in part (b) includes an outlier. Remove the outlier
and recompute the correlation. (2 points)
1 With Variables: Carbohydrates
1 Variables:      PercentAlcohol
Simple Statistics
Variable        N    Mean      Std Dev  Sum        Minimum  Maximum
Carbohydrates   152  11.95079  4.92078  1817       1.90000  32.10000
PercentAlcohol  152  5.26059   1.37818  799.61000  2.40000  11.50000
Pearson Correlation Coefficients, N = 152
Prob > |r| under H0: Rho=0
               PercentAlcohol
Carbohydrates  0.54837
               <.0001
Q6. Each of the following statements contains an error. Describe each error and explain why the
statement is wrong. (4 points)
a. A strong negative relationship implies that there is a causation between the explanatory variable
and the response variable. (2 points)
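- Correlation does not imply causation. A strong negative relationship shows only that the two variables
are associated; the association could be produced by a lurking variable rather than by the explanatory
variable causing the response.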
b. A lurking (confounding) variable is always something that can be measured. (2 points)
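- A lurking (confounding) variable is not always something that can be measured. It is often a variable
that was not recorded in the study at all, and it may be hard or impossible to measure.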
Q7. In the language of government statistics, you are “in the labor force” if you are available for work
and either working or actively seeking work. The unemployment rate is the proportion of the labor force
(not of the entire population) who are unemployed. Here are data from the Current Population Survey
(CPS) for the civilian population aged 25 years and over. The table entries are counts in thousands of
people. You must show your work in answering the following questions. (4 points)
Highest Education                       Total Population  In Labor Force  Employed
Did not finish high school              28,021            12,623          11,552
High school but no college              59,844            38,210          36,249
Some college, but no bachelor’s degree  46,777            33,928          32,429
College graduate                        51,568            40,414          39,250
a. Find the unemployment rate for people with each level of education. How does the unemployment
rate change with education? Explain carefully why your results show that level of education and
being employed are not independent. (2 points)
Unemployed = In Labor Force − Employed; Unemployment Rate = Unemployed / In Labor Force
Highest Education                       In Labor Force  Employed  # of Unemployed  Unemployment Rate
Did not finish high school              12,623          11,552    1,071            8.5%
High school but no college              38,210          36,249    1,961            5.1%
Some college, but no bachelor’s degree  33,928          32,429    1,499            4.4%
College graduate                        40,414          39,250    1,164            2.9%
- Level of education and being employed are not independent. If they were independent, the
unemployment rate would be the same at every level of education. Instead, the unemployment rate
falls as the level of education rises, which means that people with more education have a better
chance of having a job.
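The unemployment rates can be recomputed directly from the counts in the CPS table. The following Python sketch uses only those counts (in thousands); the dictionary and variable names are made up for the example.

```python
# Counts in thousands, copied from the CPS table in the question.
labor_force = {
    "Did not finish high school": 12623,
    "High school but no college": 38210,
    "Some college, but no bachelor's degree": 33928,
    "College graduate": 40414,
}
employed = {
    "Did not finish high school": 11552,
    "High school but no college": 36249,
    "Some college, but no bachelor's degree": 32429,
    "College graduate": 39250,
}

# Unemployment rate = (labor force - employed) / labor force.
rates = {
    level: (labor_force[level] - employed[level]) / labor_force[level]
    for level in labor_force
}

for level, rate in rates.items():
    print(f"{level}: {rate:.1%}")
```

If education and employment were independent, all four printed rates would be (nearly) equal; here they fall steadily as education rises.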
b. What is the probability that a randomly chosen person 25 years of age or older is in the labor force?
(1 point)
Highest Education                       Total Population  In Labor Force
Did not finish high school              28,021            12,623
High school but no college              59,844            38,210
Some college, but no bachelor’s degree  46,777            33,928
College graduate                        51,568            40,414
Total                                   186,210           125,175
A = In Labor Force. Everyone in the table is 25 years of age or older, so the total population is the
whole sample space.
P(A) = 125,175/186,210 = 0.672225 = 67.2225%
c. If you know that the person chosen is a college graduate, what is the conditional probability that he
or she is in the labor force? (1 point)
A = a college graduate, B = in the labor force
The condition is “college graduate”, so the probability needed is P(B|A):
P(B|A) = P(A ∩ B)/P(A) = (40,414/186,210)/(51,568/186,210) = 40,414/51,568 = 0.783703 = 78.3703%
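Parts b and c reduce to two ratios of counts, sketched below in Python. The counts (in thousands) come from the table in the question; the variable names are made up for the example.

```python
# Counts in thousands, in table order: no high school, high school,
# some college, college graduate.
total_population = [28021, 59844, 46777, 51568]
in_labor_force = [12623, 38210, 33928, 40414]

# Part b: P(in labor force) for a randomly chosen person aged 25 or over.
p_labor = sum(in_labor_force) / sum(total_population)

# Part c: P(in labor force | college graduate). The common denominator
# 186,210 cancels, leaving graduates in the labor force over all graduates.
p_labor_given_grad = in_labor_force[3] / total_population[3]

print(f"P(labor force) = {p_labor:.6f}")
print(f"P(labor force | college graduate) = {p_labor_given_grad:.6f}")
```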
Q8. Suppose 40% of adults get enough sleep, 46% get enough exercise, and 24% do both. You must
show your work in answering the following questions. (4 points)
a. Draw a Venn diagram showing the probabilities for exercise and sleep. (1 points)
Venn diagram (two overlapping circles):
Get enough exercise only: 22%
Both exercise and sleep (overlap): 24%
Get enough sleep only: 16%
A = Get enough exercise,
B = Get enough sleep
P(A ∩ B) = 24%
b. Find the probabilities of the following events:
A = Enough exercise, B = Enough sleep, so P(A) = 46%, P(B) = 40%, and P(A ∩ B) = 24%.
Exercise and sleep are not independent here (46% × 40% = 18.4%, not 24%), so the probabilities
cannot simply be multiplied.
i. Enough sleep and not enough exercise (1 point)
P(A^c ∩ B) = P(B) − P(A ∩ B) = 40% − 24% = 16%
ii. Not enough sleep and enough exercise (1 point)
P(A ∩ B^c) = P(A) − P(A ∩ B) = 46% − 24% = 22%
iii. Not enough sleep and not enough exercise (1 point)
P(A^c ∩ B^c) = 1 − P(A ∪ B) = 1 − (46% + 40% − 24%) = 1 − 62% = 38%
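The four regions of the Venn diagram follow from the three given percentages by subtraction, not by multiplying marginal probabilities, because multiplication is valid only for independent events. A small Python sketch under that reading of the problem (variable names are made up for the example):

```python
# Given in the question.
p_sleep, p_exercise, p_both = 0.40, 0.46, 0.24

# Regions of the Venn diagram.
sleep_only = p_sleep - p_both          # enough sleep, not enough exercise
exercise_only = p_exercise - p_both    # enough exercise, not enough sleep
either = p_sleep + p_exercise - p_both # at least one of the two
neither = 1 - either                   # not enough sleep, not enough exercise

# Independence would require P(both) = P(sleep) * P(exercise).
independent = abs(p_both - p_sleep * p_exercise) < 1e-9

print(round(sleep_only, 2), round(exercise_only, 2), round(neither, 2))
print("independent:", independent)
```

The four regions (sleep only, exercise only, both, neither) add up to 1, which is a quick check on the arithmetic.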
Q9. Three different machines M1, M2, and M3 were used for producing a large batch of similar
manufactured items. Suppose that 20 percent of the items were produced by machine M1, 30 percent
by machine M2, and 50 percent by machine M3. Suppose further that 1 percent of the items produced
by machine M1 are defective, that 2 percent of the items produced by machine M2 are defective, and that 3
percent of the items produced by machine M3 are defective. Finally, suppose that one item is selected
at random from the entire batch, and it is found to be defective. You must show your work in answering
the following questions. (2 points)
M1 = 20%, M2 = 30%, M3 = 50%, Entire batch = 100%
Defective rate: M1 = 1%, M2 = 2%, M3 = 3%
Machine  Share of production  Defective           Nondefective
M1       0.2                  0.2 × 0.01 = 0.002  0.2 − 0.002 = 0.198
M2       0.3                  0.3 × 0.02 = 0.006  0.3 − 0.006 = 0.294
M3       0.5                  0.5 × 0.03 = 0.015  0.5 − 0.015 = 0.485
Total    1.0                  0.023               0.977
a. What is the probability that this item was produced by machine M2? (1 point)
- A = the selected item was produced by machine M2
B = the item selected from the entire batch is found to be defective
P(A|B) = P(A ∩ B)/P(B) = 0.006/0.023 = 0.26087 = 26.087%
b. In the context of this exercise, a probability of an event before the item is selected and before it is
known whether the selected item is defective or nondefective is often called the prior probability. A
probability of an event after it is known that the selected item is defective is often called posterior
probability. Suppose that the item selected at random from the entire lot is found to be
nondefective. What is the posterior probability that it was produced by machine M2? (1 points)
- A = the selected item was produced by machine M2
- B = the item selected at random from the entire lot is found to be nondefective
- P(A|B) = P(A ∩ B)/P(B) = 0.294/0.977 = 0.300921 = 30.0921%
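Both posterior probabilities are applications of Bayes' rule to the table above. The Python sketch below rebuilds the joint probabilities from the production shares and defect rates given in the question; the function name `posterior` and the dictionary names are made up for the example.

```python
# Prior probabilities (share of production) and defect rates per machine.
share = {"M1": 0.20, "M2": 0.30, "M3": 0.50}
defect_rate = {"M1": 0.01, "M2": 0.02, "M3": 0.03}

def posterior(machine, defective=True):
    """P(item came from `machine` | item is defective / nondefective)."""
    def joint(m):
        rate = defect_rate[m] if defective else 1 - defect_rate[m]
        return share[m] * rate          # P(machine m and observed outcome)
    evidence = sum(joint(m) for m in share)   # 0.023 or 0.977
    return joint(machine) / evidence

print(f"P(M2 | defective)    = {posterior('M2', True):.5f}")
print(f"P(M2 | nondefective) = {posterior('M2', False):.6f}")
```

A defective item raises the probability of the higher-defect machine M3 and lowers that of M2 below its prior of 0.30, while a nondefective item barely moves the prior because most items from every machine are nondefective.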
Q10. Two boxes contain long bolts and short bolts. Suppose that one box contains 60 long bolts and 40
short bolts, and that the other box contains 10 long bolts and 20 short bolts. Suppose also that one box
is selected at random and a bolt is then selected at random from that box. What is the probability that
this bolt is long? You must show your work. (2 points)
Box 1: long bolts = 60, short bolts = 40
Box 2: long bolts = 10, short bolts = 20
P(select one box) = 1/2
P(select long bolt from box 1) = 60/100
P(select long bolt from box 2) = 10/30
P(select box 1 and a long bolt from box 1) = 1/2 × 60/100 = 30/100 = 30%
P(select box 2 and a long bolt from box 2) = 1/2 × 10/30 = 5/30 = 0.166667 = 16.6667%
P(select long bolt) = 30% + 16.6667% = 46.6667%
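The same total-probability calculation can be done with exact fractions in Python, which avoids the rounding in 16.6667%. This is a minimal sketch; the dictionary `boxes` is made up for the example.

```python
from fractions import Fraction

# (long bolts, short bolts) in each box, from the question.
boxes = {"Box 1": (60, 40), "Box 2": (10, 20)}

# Each box is equally likely to be chosen.
p_box = Fraction(1, len(boxes))

# Law of total probability: sum over boxes of P(box) * P(long | box).
p_long = sum(p_box * Fraction(long, long + short)
             for long, short in boxes.values())

print(p_long, float(p_long))
```

Note that the answer is not 70/130, the overall share of long bolts, because the box is chosen first and the two boxes hold different numbers of bolts.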
Q11. A $1 bet in a state lottery’s Pick 3 game pays $500 if the three-digit number you choose exactly
matches the winning number, which is drawn at random. Here is the distribution of the payoff X:
Payoff X     $0     $500
Probability  0.999  0.001
Each day’s drawing is independent of other drawings. You must show your work in answering the
following questions. (3 points)
a. What are the mean and standard deviation of X? (1 point)
m = E(X) = 0 × 0.999 + 500 × 0.001 = 0 + 0.5 = 0.5
S² = (0 − 0.5)² × 0.999 + (500 − 0.5)² × 0.001 = 0.24975 + 249.50025 = 249.75
S = √249.75 ≈ 15.8035
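The mean and standard deviation of a discrete random variable are probability-weighted sums, which the following Python sketch spells out for the payoff distribution. The names `payoff_dist`, `mu`, `variance`, and `sigma` are made up for the example.

```python
from math import sqrt

# Payoff distribution of one $1 Pick 3 ticket: payoff -> probability.
payoff_dist = {0: 0.999, 500: 0.001}

# Mean: sum of payoff times probability.
mu = sum(x * p for x, p in payoff_dist.items())

# Variance: probability-weighted squared deviation from the mean.
variance = sum((x - mu) ** 2 * p for x, p in payoff_dist.items())
sigma = sqrt(variance)

print(f"mean = {mu:.2f}, variance = {variance:.2f}, sd = {sigma:.4f}")
```

The standard deviation is about 32 times the mean, which reflects how rare and how large the single winning payoff is.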
b. Joe buys a Pick 3 ticket twice a week. What does the law of large numbers say about the average
payoff Joe receives from his bets? (1 point)
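- The law of large numbers says that as Joe keeps buying tickets, his average payoff per ticket gets
closer and closer to the mean payoff, m = E(x) = 0 × 0.999 + 500 × 0.001 = $0.50. Each ticket costs
$1, so in the long run Joe loses about 50 cents per ticket.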
c. Joe comes out ahead for the year if his average payoff is greater than $1 (the amount he spent each
day on a ticket). What is the probability that Joe ends the year ahead? (1 point)
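Joe buys 2 tickets a week, so he buys about n = 2 × 52 = 104 tickets in a year. By the central limit
theorem, his average payoff for the year is approximately normal with mean 0.5 and standard deviation
S/√104 ≈ 15.80/10.198 ≈ 1.55.
z = (1 − 0.5)/1.55 ≈ 0.32
P(Z > 0.32) = 1 − 0.6255 = 0.3745 = 37.45%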
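For a probability about the average payoff over a year, the relevant spread is the standard deviation of the sample mean, σ/√n, not σ itself. The Python sketch below applies the normal approximation under the assumption that Joe buys n = 104 tickets (two a week for 52 weeks); that count is read from part b and is not stated in part c. With a payoff this skewed the normal approximation is rough, so the result should be taken as a textbook-style estimate.

```python
from math import sqrt
from statistics import NormalDist

mu = 0.5                  # mean payoff of one ticket, from part a
sigma = sqrt(249.75)      # sd of one ticket's payoff, from part a
n_tickets = 2 * 52        # assumption: two tickets a week for a year

# Standard deviation of the average payoff over n tickets.
se_mean = sigma / sqrt(n_tickets)

# Joe is ahead if the average payoff exceeds the $1 ticket price.
z = (1 - mu) / se_mean
p_ahead = 1 - NormalDist().cdf(z)

print(f"se = {se_mean:.4f}, z = {z:.4f}, P(ahead) = {p_ahead:.4f}")
```

The software value differs slightly from a table lookup because the table rounds z to two decimals.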
Q12. According to genetic theory, the blossom color in the second generation of a certain cross of sweet
peas should be red or white in a 3:1 ratio. That is, each plant has probability ¾ of having red blossoms,
and the blossom colors of separate plants are independent. Show your work. (2 points)
a. What is the probability that exactly 9 out of 12 of these plants have red blossoms? (1 point)
P(R) = 3/4, P(W) = 1/4
P(X = 9) = [12!/(9!(12−9)!)] × (3/4)^9 × (1 − 3/4)^(12−9)
= [12!/(9! × 1×2×3)] × (3/4)^9 × (1/4)^3
= [(10×11×12)/(1×2×3)] × (3/4)^9 × (1/4)^3
= 220 × (19683/262144) × (1/64) = 0.258104 = 25.8104%
b. What is the mean number of red-blossomed plants when 120 plants of this type are grown from
seeds? (1 point)
m = E(x) = np, n = 120, p = 3/4
m = E(x) = 120 × 3/4 = 90
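Both parts of Q12 use the binomial distribution with p = 3/4, and can be checked with Python's `math.comb`. The function name `binom_pmf` is made up for the example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) count: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_red = 3 / 4

# Part a: exactly 9 red-blossomed plants out of 12.
p_nine_of_twelve = binom_pmf(9, 12, p_red)

# Part b: mean number of red-blossomed plants among 120, E(X) = n * p.
mean_red = 120 * p_red

print(f"P(X = 9) = {p_nine_of_twelve:.6f}")
print(f"mean for n = 120: {mean_red}")
```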