Stat 101 Exam 2

Stat 101 Exam 2 - Embers
Important Formulas and Concepts [1]
1 Chapter 8
1.1 Definitions
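1. Extrapolation
Using a regression model to predict y for x-values outside the range of the data used
to fit the model. It is unsafe in any regression situation. Predictions from
extrapolation should not be trusted.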
2. Outlier
Any data point that stands away from the others.
3. Leverage
Data points whose x-values are far from the mean of x.
4. Influential Point
A point that, if omitted from the data, results in a very different regression model.
1.2 Residual Plots
A residual plot is a scatterplot of the residuals versus the x-values. Three separate
conditions must be met for a linear model to be appropriate. These conditions
are 1) Linearity Condition, 2) Outlier Condition, and 3) Equal Spread Condition. The
Linearity Condition is violated when there are bends in a residual plot. The Outlier Condition
is violated if any point is far from the rest of the points in the residual plot. The Equal
Spread Condition is violated when the spread changes from one part of the plot to another.
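To make the idea of a residual concrete, here is a minimal Python sketch. The data values are hypothetical (invented for illustration, not taken from the text). It fits a least-squares line and computes the residuals that a residual plot would show against x.

```python
# Hypothetical (x, y) data, invented only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = least_squares(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# A residual plot is a scatterplot of these (x, residual) pairs.
for x, r in zip(xs, residuals):
    print(f"x = {x:.1f}, residual = {r:+.3f}")
```

Bends, isolated points, or a changing vertical spread in these pairs are what the three conditions above ask you to look for.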
2 Chapter 10
2.1 Definitions
1. Population
The entire group of individuals or instances about whom we hope to learn.
2. Sample
A (representative) subset of a population, examined in the hope of learning about the
population.
[1] This version: October 20, 2015, by Jennifer Pajda-De La O. May not include everything that could
be tested. To be used as an additional reference alongside studying all of Chapters 8 - 15. All definitions,
formulas, and selected problems come from Intro Stats by De Veaux, Velleman and Bock, 4th edition,
published by Pearson.
3. Sample Survey
A study that asks questions of a sample drawn from some population in the hope of
learning something about the entire population.
4. Randomization
The best defense against bias is randomization, in which each individual is given a fair,
random chance of selection.
5. Census
A sample that consists of the entire population.
6. Population Parameter
A numerically valued attribute of a model for a population. Example: mean income
of all employed people in the USA
7. Sample statistic
Statistics or sample statistics are values that are calculated for sample data. Example:
mean income of employed people in a representative sample
8. Sampling Frame
A list of individuals from whom the sample is drawn. Individuals who may be in the
population of interest, but who are not in the sampling frame cannot be included in
any sample.
9. Simple Random Sample (SRS)
An SRS of sample size n is a sample in which each set of n elements in the population
has an equal chance of selection.
10. Stratified Random Sampling
A sampling design in which the population is divided into several subpopulations
(strata) and random samples are then drawn from each stratum. Try to make strata
as homogeneous as possible.
11. Cluster Sampling
Entire groups, or clusters, are chosen at random. Clusters are heterogeneous.
12. Multistage Sampling
Sampling schemes that combine several sampling methods.
13. Systematic sample
A sample drawn by selecting individuals systematically from a sampling frame.
14. Voluntary response bias
Bias introduced to a sample when individuals can choose on their own whether to
participate in the sample.
15. Undercoverage bias
Biases the sample in a way that gives a part of the population less representation in
the sample than it has in the population.
16. Nonresponse bias
Bias introduced when a large fraction of those sampled fails to respond.
17. Response bias
Anything in a survey design that influences responses.
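The sampling designs defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 ID numbers, the two strata, and the seed are all hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every set of n individuals in the frame is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate SRS of the same size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(frame, step, rng):
    """Pick a random starting point, then take every step-th individual."""
    start = rng.randrange(step)
    return frame[start::step]

rng = random.Random(101)          # seed chosen only for reproducibility
frame = list(range(1, 101))       # hypothetical sampling frame: IDs 1..100
strata = {"low": frame[:50], "high": frame[50:]}  # hypothetical strata

srs = simple_random_sample(frame, 10, rng)
strat = stratified_sample(strata, 5, rng)
syst = systematic_sample(frame, 10, rng)
```

Note that anyone missing from `frame` can never appear in any of these samples, which is how an incomplete sampling frame produces undercoverage.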
3 Chapter 11
1. Studies
(a) Observational Study
Study based on data in which no manipulation of factors has been employed.
(b) Retrospective Study
Observational study in which subjects are selected and then their previous conditions or behaviors are determined. Based on historical data and memories.
(c) Prospective Study
Observational study in which subjects are followed to observe future outcomes.
Because no treatments are deliberately applied, it is not an experiment.
2. Matching in Studies
In a retrospective or prospective study, participants who are similar in ways not under
study may be matched and then compared with each other on the variables of interest.
3. Experiments
(a) Factor
Variable whose levels are manipulated by the experimenter.
(b) Response Variable
Variable whose values are compared across different treatments.
(c) Experiment
Manipulates factor levels to create treatments, randomly assigns subjects to these
treatment levels, and then compares the responses of the subject groups across
treatment levels. Tries to assess effects of treatments.
(d) Levels
Specific values that the experimenter chooses for a factor.
(e) Treatment
Process, intervention, or other controlled circumstance applied to randomly assigned experimental units.
(f) Block
When groups of experimental units are similar in a way that is not a factor
under study, it is often a good idea to gather them together into blocks and then
randomize the assignment of treatments within each block.
4. Randomization through Random Assignment
An experiment must assign experimental units (individuals) to treatment groups using
some form of randomization.
5. Principles of Experimental Design
(a) Control
Control aspects of the experiment that we know may have an effect on the response, but that are not the factors being studied.
(b) Randomize
Randomize subjects to treatments to even out effects that we cannot control.
(c) Replicate
Replicate over as many subjects as possible.
(d) Block
Reduce the effects of identifiable attributes of the subjects that cannot be controlled.
6. Statistically Significant
When an observed difference is too large for us to believe that it is likely to have
occurred naturally, we consider the difference to be statistically significant.
7. Types of Experiments
(a) Completely randomized design (CRD)
All experimental units have an equal chance of receiving any treatment.
(b) Randomized Block Design (RBD)
Participants are randomly assigned to treatments within each block.
8. Control Treatment
Baseline treatment.
9. Control Group
Experimental units assigned to a baseline treatment level, typically either the default
treatment or a placebo treatment. Responses provide a basis for comparison.
10. Blinding
Any individual associated with an experiment who is not aware of how subjects have
been allocated to treatment groups is said to be blinded.
11. Single/Double Blind
There are two classes of individuals who can affect the outcome of an experiment:
• Those who could influence the results.
• Those who evaluate the results.
Single Blind: when everyone in either one of the two classes above is blinded. Double Blind: when
everyone in both classes is blinded.
12. Placebo
A treatment known to have no effect.
13. Placebo Effect
The tendency of human subjects to show a response even when administered a placebo.
14. Potential Problems
(a) Confounding
When the levels of one factor are associated with the levels of another factor in
such a way that their effects cannot be separated, we say that these two factors
are confounded.
(b) Lurking Variable
A variable associated with both y and x that makes it appear that x may be
causing y.
15. In summary, the best experiments are usually 1) Randomized, 2) Comparative, 3)
Double-blind, and 4) Placebo-controlled.
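Random assignment in a completely randomized design and in a randomized block design can be sketched as follows. The subject labels, treatment names, blocks, and seed are hypothetical, chosen only to make the example runnable.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    k = len(treatments)
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, rng):
    """Randomize separately within each block of similar units."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(2015)                     # seed only for reproducibility
subjects = [f"S{i}" for i in range(1, 13)]    # 12 hypothetical subjects
treatments = ["placebo", "drug"]              # hypothetical treatments

crd = completely_randomized(subjects, treatments, rng)

# Hypothetical blocks: units within a block are similar in some way
# that is not a factor under study.
blocks = {"block_1": subjects[:6], "block_2": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)
```

In the blocked version every block contributes the same number of units to each treatment, so block-to-block differences cannot be mistaken for a treatment effect.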
4 Chapter 12
4.1 Definitions
1. Random Phenomenon
A phenomenon is random if we know what outcomes could happen, but not which
particular values will happen.
2. Trial
A single attempt or realization of a random phenomenon.
3. Outcome
The value measured, observed, or reported for an individual instance of a trial.
4. Event
A collection of outcomes. Usually, we identify events so that we can attach probabilities
to them. Denote events with bold capital letters like A, B, etc.
5. Sample Space
The collection of all possible outcome values. The collection of values in the sample
space has a probability of 1. Denote by S or Ω.
6. Law of Large Numbers (LLN)
This law states that the long-run relative frequency of an event’s occurrence gets closer
and closer to the true relative frequency as the number of trials increases.
7. Independence (informal definition)
Two events are independent if learning that one event occurs does not change the probability that the other event occurs.
8. Probability
A number between 0 and 1 that reports the likelihood of that event’s occurrence. Write
P(A) for the probability of event A.
9. Empirical Probability
When the probability comes from the long-run relative frequency of the event’s occurrence.
10. Theoretical Probability
When the probability comes from a model (such as equally likely outcomes).
P(A) = (# outcomes in A) / (# all possible outcomes)
11. Personal (or subjective) Probability
When the probability is subjective and represents your personal degree of belief.
12. Legitimate Assignment of Probabilities
An assignment of probabilities to outcomes is legitimate if
• each probability is greater than or equal to 0 and less than or equal to 1
• the sum of the probabilities = 1
4.2 Rules on Probability
1. For all events A, 0 ≤ P (A) ≤ 1.
2. Probability Assignment Rule
• P(S) = P(Ω) = 1
• The set of all possible outcomes of a trial must have probability = 1.
3. Complement Rule
• The set of outcomes that are not in the event A is the complement A^C
• P(A^C) = 1 − P(A)
• The probability of an event not occurring is 1 minus the probability that it occurs
4. Addition Rule
• For 2 disjoint events A and B, the probability that one or the other occurs is the
sum of the probability of the two events.
• P (A or B) = P (A) + P (B) where A and B are disjoint
• disjoint means mutually exclusive; there are no outcomes in common
5. Multiplication Rule
• For two independent events A and B, the probability that both A and B occur
is the product of the probabilities of the two events.
• P (A and B) = P (A)P (B) where A and B are independent
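The rules above can be checked by brute-force counting on a small sample space. The sketch below assumes a fair six-sided die (an illustration that is not part of the original sheet) and uses exact fractions so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

die = {1, 2, 3, 4, 5, 6}  # sample space S for one roll of a fair die

def prob(event, space):
    """Theoretical probability with equally likely outcomes."""
    return Fraction(len(event & space), len(space))

even = {2, 4, 6}
one = {1}

# Probability Assignment Rule: P(S) = 1.
p_S = prob(die, die)

# Complement Rule: P(not even) = 1 - P(even).
p_not_even = prob(die - even, die)

# Addition Rule: "even" and "one" are disjoint, so probabilities add.
p_even_or_one = prob(even | one, die)

# Multiplication Rule: two rolls are independent, so
# P(even on roll 1 and a one on roll 2) = P(even) * P(one).
two_rolls = set(product(die, repeat=2))
even_then_one = {(a, b) for (a, b) in two_rolls if a in even and b in one}
p_even_then_one = prob(even_then_one, two_rolls)

print(p_S, p_not_even, p_even_or_one, p_even_then_one)
```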
5 Chapter 13
5.1 Definitions
1. General Addition Rule
For any two events A and B, the probability of A or B is
P (A or B) = P (A) + P (B) − P (A and B).
This rule does NOT require disjoint events.
2. Conditional Probability
The conditional probability of the event B given the event A has occurred is
P(B | A) = P(A and B) / P(A).
3. General Multiplication Rule
For any two events A and B, the probability of A and B is
P (A and B) = P (A)P (B | A).
This rule does NOT require independence.
4. Independent
Events A and B are independent when P (B | A) = P (B). Note: independent is not
the same as disjoint.
5. Tree Diagram
A display of conditional events or probabilities that is helpful in thinking through
conditioning.
6. Bayes Rule
P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | B^C)P(B^C)].
5.2 Tree Diagram Example and Interpretations of Every Node
Example Probabilities are Given
A (0.6)
    B (0.8): P(A and B) = (0.6)(0.8) = 0.48
    Not B (0.2): P(A and Not B) = (0.6)(0.2) = 0.12
Not A (0.4)
    B (0.2): P(Not A and B) = (0.4)(0.2) = 0.08
    Not B (0.8): P(Not A and Not B) = (0.4)(0.8) = 0.32
Here are the mathematical interpretations of the numbers in the tree diagram:
P (A) = 0.6
P (Not A) = 0.4
P (A and B) = 0.48
P (A and Not B) = 0.12
P (B | A) = 0.8
P (Not B|A) = 0.2
P (Not A and B) = 0.08
P (Not A and Not B) = 0.32
P (B |Not A) = 0.2
P (Not B|Not A) = 0.8
Calculate things like P(A | B) using Bayes Rule:
P(A | B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | A^C)P(A^C)]
= P(B | A)P(A) / [P(B | A)P(A) + P(B | Not A)P(Not A)]
= (0.8)(0.6) / [(0.8)(0.6) + (0.2)(0.4)]
= 0.48 / 0.56
= 0.8571.
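Calculate things like P(B) using the Multiplication Rule but rearranging it.
P(B and A) = P(B)P(A | B) ⇒ P(B) = P(B and A) / P(A | B).
Now,
P(B) = P(B and A) / P(A | B) = 0.48 / 0.8571 = 0.56.
This agrees with adding the two branches that end in B: P(B) = P(A and B) + P(Not A and B) = 0.48 + 0.08 = 0.56.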
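The tree diagram arithmetic can be reproduced with a short Python sketch. The function name `two_stage_tree` is hypothetical; the three input probabilities are the ones given in the example above.

```python
def two_stage_tree(p_a, p_b_given_a, p_b_given_not_a):
    """Multiply along each branch to get the four joint probabilities."""
    return {
        ("A", "B"): p_a * p_b_given_a,
        ("A", "Not B"): p_a * (1 - p_b_given_a),
        ("Not A", "B"): (1 - p_a) * p_b_given_not_a,
        ("Not A", "Not B"): (1 - p_a) * (1 - p_b_given_not_a),
    }

joint = two_stage_tree(p_a=0.6, p_b_given_a=0.8, p_b_given_not_a=0.2)

# P(B) is the sum of the two branches that end in B.
p_b = joint[("A", "B")] + joint[("Not A", "B")]

# Bayes Rule: reverse the conditioning.
p_a_given_b = joint[("A", "B")] / p_b

for branch, p in joint.items():
    print(branch, round(p, 4))
print("P(B) =", round(p_b, 4))
print("P(A | B) =", round(p_a_given_b, 4))
```

The four joint probabilities add to 1, which is a quick check that the tree was filled in correctly.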
6 Chapter 14
6.1 Definitions
1. Random Variable
Assumes any of several different values as a result of some random event. Denoted by
a capital letter, such as X.
2. Discrete Random Variable
A random variable that can take one of a finite number of distinct outcomes.
3. Continuous Random Variable
A random variable that can take on any of an (uncountably) infinite number of outcomes.
4. Probability Model
A function that associates a probability P with each value of a discrete random variable
X, denoted P(X=x), or with any interval of values of a continuous random variable.
5. Expected Value
The expected value of a random variable is its theoretical long-run average value, the
center of its model. Represented by µ or E(X), it is found by summing the products
of variable values and probabilities. µ = E(X) = Σ x p(x).
6. Variance
Expected value of the squared deviations from the mean.
σ^2 = Var(X) = Σ (x − µ)^2 p(x) = Σ (x − E(X))^2 p(x) = E(X^2) − [E(X)]^2
7. Standard Deviation of a Random Variable
Describes the spread in the model and is the square root of the variance.
σ = SD(X) = √Var(X).
8. Bernoulli Trials
A sequence of trials are Bernoulli Trials if
• Exactly 2 possible outcomes (success and failure)
• Probability of success is constant
• Trials are independent
9. 10% Condition
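Bernoulli trials must be independent. When sampling without replacement from a finite population, the trials are not exactly independent, but it is acceptable to proceed as long as the sample is smaller than 10% of the population. When you sample more than 10% of the population, you shouldn’t casually assume independence.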
10. Binomial Probability Distribution
Appropriate for a random variable that counts the number of successes in n Bernoulli
Trials.
11. Success/Failure Condition
A Binomial Model is approximately Normal if we expect at least 10 successes and 10
failures, i.e. np ≥ 10 and n(1 − p) ≥ 10.
6.2 Rules for Expected Value, Variances, and Standard Deviations
1. Changing a Random Variable by a constant number, say a or c.
E(X ± c) = E(X) ± c
Var(X ± c) = Var(X)
SD(X ± c) = SD(X)
E(aX) = aE(X)
Var(aX) = a^2 Var(X)
SD(aX) = |a| SD(X)
2. Addition Rule for Expected Value of a Random Variable (X and Y are both random
variables)
E(X ± Y) = E(X) ± E(Y)
3. Addition Rule for Variance of a Random Variable (X and Y are both random variables).
Use ONLY when X and Y are independent.
Var(X ± Y) = Var(X) + Var(Y)
SD(X ± Y) = √(Var(X) + Var(Y))
6.3 Binomial Model:
P(X = x) = nCx p^x (1 − p)^(n − x)
E(X) = np
Var(X) = np(1 − p)
SD(X) = √(np(1 − p))
where
nCx = n! / (x!(n − x)!)
n! = n(n − 1)(n − 2) · · · (1)
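The Binomial Model formulas translate directly into Python. As a sketch, the values n = 30 and p = 0.02 below are borrowed from the paper-ream problem later in this sheet. The function name `binomial_pmf` is hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 30, 0.02                      # 30 reams per box, 2% defective

p_exactly_4 = binomial_pmf(4, n, p)  # probability of exactly 4 defective reams
mean = n * p                         # E(X) = np
variance = n * p * (1 - p)           # Var(X) = np(1 - p)
sd = sqrt(variance)                  # SD(X) = sqrt(np(1 - p))

# Sanity check: the probabilities over x = 0..n must add to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(f"P(X = 4) = {p_exactly_4:.6f}")
print(f"E(X) = {mean:.3f}, Var(X) = {variance:.3f}, SD(X) = {sd:.3f}")
```

Here np = 0.6 is far below 10, so the Success/Failure Condition fails and a Normal approximation would not be appropriate; the exact formula is the right tool.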
7 Chapter 15
7.1 Definitions
1. Sampling Distribution
Different random samples give different values of a statistic. The distribution of the statistic over all possible samples is called the sampling distribution. The sampling distribution
model shows the behavior of the statistic over all possible samples of the same
size n.
2. Sampling Distribution Model
Because we can never see all possible samples, we often use a model as a practical way
of describing the theoretical sampling distribution.
3. Sampling Distribution Model for a Proportion
If assumptions of independence and random sampling are met, and we expect at least
10 successes and 10 failures, then the sampling distribution of a proportion is modeled
by a Normal model with mean equal to the true proportion p and standard
deviation equal to √(p(1 − p)/n).
p̂ ∼ N(p, √(p(1 − p)/n))
4. Sampling Error
Sample-to-sample variation
5. Central Limit Theorem (CLT)
The sampling distribution model of the sample mean (and proportion) is approximately
Normal for large n, regardless of the distribution of the population as long as the
observations are independent. The larger the sample, the better the approximation
will be.
6. Sampling Distribution Model for a Mean
If assumptions of independence and random sampling are met, and the sample size is
large enough, the sampling distribution of the sample mean is modeled by a Normal
model with mean equal to the population mean µ and standard deviation equal
to σ/√n.
X̄ ∼ N(µ, σ/√n)
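Both sampling distribution models can be evaluated with `statistics.NormalDist` from the Python standard library. As a sketch, the numbers below are taken from the battery problem (µ = 120, σ = 15, n = 50) and the ice cube tray problem (p = 0.30, n = 500) later in this sheet. The variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Sample mean: X-bar ~ N(mu, sigma / sqrt(n)).
mu, sigma, n = 120, 15, 50
mean_model = NormalDist(mu, sigma / sqrt(n))
p_mean_105_115 = mean_model.cdf(115) - mean_model.cdf(105)
p_mean_over_122 = 1 - mean_model.cdf(122)

# Sample proportion: p-hat ~ N(p, sqrt(p(1 - p) / n)).
p, m = 0.30, 500
# Success/Failure Condition: at least 10 expected successes and failures.
condition_met = m * p >= 10 and m * (1 - p) >= 10
prop_model = NormalDist(p, sqrt(p * (1 - p) / m))
p_phat_under_040 = prop_model.cdf(0.40)
p_phat_027_032 = prop_model.cdf(0.32) - prop_model.cdf(0.27)

print(f"SD of sample mean = {mean_model.stdev:.4f}")
print(f"P(105 < mean < 115) = {p_mean_105_115:.4f}")
print(f"P(mean > 122) = {p_mean_over_122:.4f}")
print(f"SD of p-hat = {prop_model.stdev:.4f}")
print(f"P(p-hat < 0.40) = {p_phat_under_040:.4f}")
print(f"P(0.27 < p-hat < 0.32) = {p_phat_027_032:.4f}")
```

Answers read from a printed Normal table may differ slightly in the last digit because z-scores are rounded to two decimals there.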
8 Example Problems
Q15 pg 309 For his Statistics class experiment, researcher J. Gilbert decided to study how parents’
income affects children’s performance on standardized tests like the SAT. He proposed
to collect information from a random sample of test takers and examine the relationship
between parental income and SAT score.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.
Q27 pg 311 In 2002, the journal Science reported that a study of women in Finland indicated that
having sons shortened the life spans of mothers by about 34 weeks per son, but that
daughters helped to lengthen the mothers’ lives. The data came from church records
from the period 1640 to 1870.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.
Q31 pg 311 Some people claim they can get relief from migraine headache pain by drinking a large
glass of ice water. Researchers plan to enlist several people who suffer from migraines
in a test. When a participant experiences a migraine headache, he or she will take a
pill that may be a standard pain reliever or a placebo. Half of each group will also
drink ice water. Participants will then report the level of pain relief they experience.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.
Q33 pg 311 Athletes who had suffered hamstring injuries were randomly assigned to one of two
exercise programs. Those who engaged in static stretching returned to sports activity
in a mean of 15.2 days faster than those assigned to a program of agility and trunk
stabilization exercises.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.
Q28 pg 336 In a large Introductory Statistics lecture hall, the professor reports that 55% of the
students enrolled have never taken a Calculus course, 32% have taken only one semester
of Calculus, and the rest have taken two or more semesters of Calculus. The professor
randomly assigns students to groups of three to work on a project of the course. What
is the probability that the first group-mate you meet has studied
(a) two or more semesters of Calculus?
(b) some Calculus?
(c) no more than one semester of Calculus?
Q30 pg 336 Continuation of Q28 pg 336. What is the probability that of your other two groupmates,
(a) neither has studied Calculus?
(b) both have studied at least one semester of Calculus?
(c) at least one has had more than one semester of Calculus?
Q45 pg 338 A certain bowler can bowl a strike 70% of the time. If the bowls are independent,
what’s the probability that she
(a) goes three consecutive frames without a strike?
(b) makes her first strike in the third frame?
(c) has at least one strike in the first three frames?
(d) bowls a perfect game (12 consecutive strikes)?
Q17 pg 357 A check of dorms revealed that 38% had refrigerators, 52% had TV’s and 21% had
both a TV and a refrigerator. What’s the probability that a randomly selected dorm
room has:
(a) a TV but no refrigerator
(b) a TV or refrigerator but not both
(c) neither a TV nor a refrigerator
Q19 pg 357 We are given information about the Education Level by Country in the table below:
         Post Grad  College  Some HS  Primary  No Answer  Total
China            7      315      671      506          3   1502
France          69      388      766      309          7   1539
India          161      514      622      227         11   1535
UK              58      207     1240       32         20   1557
US              84      486      896       87          4   1557
Total          379     1910     4195     1161         45   7690
Calculate the following probabilities:
(a) P(US)
(b) Probability that a person completed education before college? Do not include
those who did not answer.
(c) Probability that a person is from France or did post graduate study.
(d) Probability that a person is from France and finished primary school.
Q22 pg 357 An animal shelter states that it currently has 24 dogs and 18 cats available for adoption.
8 of the dogs and 6 of the cats are male. Find the conditional probability of:
(a) pet is male, given that it is a cat
(b) pet is a cat, given that it is female
(c) pet is female, given that it is a dog
Followup to Q22 The local animal shelter in Q22 reported that it currently has 24 dogs and 18 cats
available for adoption; 8 of the dogs and 6 of the cats are male. Are being male and
being a dog independent events? Briefly justify your answer.
Q55 pg 360 Police set up checkpoints to catch drunk drivers. Based on the initial stop, trained
officers can make the right decision 80% of the time. Suppose a checkpoint is set up at
a time when it is estimated that about 12% of people have been drinking. Questions
to answer:
(a) Suppose a person is stopped and is not drinking. What is the probability that he
is detained for further testing?
(b) What’s the probability that any given driver will be detained?
(c) What’s the probability that a driver who is detained has actually been drinking?
(d) What’s the probability that a driver who was released had actually been drinking?
Q51 pg 360 A company’s records indicate that on any given day about 1% of their day-shift employees and 2% of the night-shift employees will miss work. Sixty percent of the employees
work the day shift. What percent of employees are absent on any given day?
1. We are given the following distribution for X.
X           3     5     6     8     10
P(X = x)    0.2   0.1   0.3   0.3   ?
(a) What is the value of the missing probability in the table above?
(b) What is the expected value for X?
(c) What is the variance for X?
(d) What is the standard deviation for X?
2. We are given independent random variables with means and standard deviations as
shown.
        X    Y
Mean    5    8
SD      2    3
Find the mean and standard deviation of
(a) 2X
(b) 3Y
(c) X + Y
(d) X − Y
(e) X1 + X2
(f) 5X − 2
(g) 8Y + 3
(h) 2X + 3Y
(i) 9X − 4Y
(j) −6X
Q43 pg 390 A grocery supplier believes that in a dozen eggs, the mean number of broken ones is 0.6
with a standard deviation of 0.5 eggs. You buy 3 dozen eggs without checking them.
(a) How many broken eggs do you expect to get?
(b) What’s the standard deviation?
(c) What assumptions did you have to make about the eggs in order to answer this
question?
3. A printing company ships boxes of paper to office stores. In each box, there are 30
reams of paper. However, in every box, they estimate that 2% of the reams of paper
are defective in some way. What is the probability that in a box, there will be exactly
4 reams of paper that need to be shipped back to the printing company? What is the
expected value of the number of reams of paper that need to be shipped back? What
is the variance?
4. The life span of an alarm clock is normally distributed with mean of 3 years and a
standard deviation of 1.2 years. What is the probability that the alarm clock lasts
(a) more than 4 years?
(b) less than 2.5 years?
5. The life span of a battery is normally distributed with a mean of 120 hours and a
standard deviation of 15 hours. A random sample of 50 batteries is collected and the
sample mean will be computed.
(a) What is the expected value of the sample mean?
(b) What is the standard deviation of the sample mean?
(c) Write down your model.
(d) Estimate the probability that the sample mean is between 105 and 115 hours.
(e) Estimate the probability that the sample mean is more than 122 hours.
(f) Give an interval that will contain the sample mean for 99.7% of samples.
6. We know that 30% of people own ice cube trays. A random sample of 500 people is
collected.
(a) Are we going to calculate information on the sample mean, or sample proportion?
(b) Write down the statistic that you are going to find information about.
(c) Find the expected value of the statistic you selected.
(d) Find the standard deviation of the statistic you selected.
(e) Write down your model.
(f) Estimate the probability that your statistic is less than 0.40.
(g) Estimate the probability that your statistic is between 0.27 and 0.32.
9 Example Solutions
Q15 pg 309
i. An observational study because no treatments were imposed.
ii. It is a retrospective study.
iii. Explanatory variable: Parental income. Response variable: SAT score.
Q27 pg 311
i. Observational study.
ii. Retrospective. Records were obtained from 1640 to 1870.
iii. Explanatory Variable: Having a son or a daughter. Response variable: Average life span of mothers.
Q31 pg 311
i. Experiment
ii. There are 2 factors - pain reliever and water temp. The pain reliever has 2
levels - pain reliever or placebo. The water temperature has 2 levels - ice
water or regular water. In total, there are 4 treatments.
iii. Explanatory variable: pain reliever and water temp. Response variable: level
of pain relief.
Q33 pg 311
i. Experiment
ii. There is 1 factor - type of exercise. This factor has 2 levels - static stretching
and trunk stabilization exercises. In total, there are 2 treatments.
iii. Explanatory variable: type of exercise. Response variable: time before the
athletes were able to return to sports.
Q28 pg 336 We are given that
P(no calculus) = 0.55,
P(1 semester) = 0.32.
i. P(2 or more) = 1 - P(no calculus) - P(1 semester) = 1-0.55-0.32 = 0.13.
ii. P(some calculus) = P(1 semester or 2 or more) = P(1 semester) + P(2 or
more) = 0.32+0.13 = 0.45.
iii. P(no more than one semester) = P(no calculus or 1 semester) = P(no calculus)
+ P(1 semester) = 0.55+0.32 = 0.87.
Q30 pg 336 From Q28 pg 336, we have that
P (no calculus) = 0.55,
P (at least 1 semester) = P (some calculus) = 0.45.
i. P (neither) = P (person 1 no calculus and person 2 no calculus)
= P (no calculus) P (no calculus)
= (0.55)(0.55)
= 0.3025.
ii. P (both) = P (person 1 some calculus and person 2 some calculus)
= P (some calculus) P (some calculus)
= (0.45)(0.45)
= 0.2025.
iii. From Q28, P(two or more semesters) = 0.13, so P(fewer than two semesters) = 1 − 0.13 = 0.87.
Option 1:
P (at least one has had more than one semester)
= P (person 1 fewer than two and person 2 two or more OR person 1 two or more
and person 2 fewer than two OR person 1 two or more and person 2 two or more)
= P (fewer than two)P (two or more) + P (two or more)P (fewer than two) + P (two
or more)P (two or more)
= (0.87)(0.13) + (0.13)(0.87) + (0.13)(0.13)
= 0.2431.
Option 2:
P (at least one has had more than one semester)
= 1 − P (neither has had more than one semester)
= 1 − (0.87)(0.87)
= 1 − 0.7569
= 0.2431.
Q45 pg 338 Information given in the problem:
P (strike) = 0.7
P (no strike) = 0.3
i. goes three consecutive frames without a strike?
P (no strike and no strike and no strike) = P (no strike)P (no strike)P (no strike)
= (0.3)(0.3)(0.3)
= (0.3)^3
= 0.027
ii. makes her first strike in the third frame?
P (no strike and no strike and strike) = P (no strike)P (no strike)P (strike)
= (0.3)(0.3)(0.7)
= (0.3)^2 (0.7)
= 0.063
iii. has at least one strike in the first three frames?
P (at least 1 strike in first 3 frames) = 1 − P (no strikes in first 3 frames)
= 1 − 0.027
= 0.973
iv. bowls a perfect game (12 consecutive strikes)?
P (12 consecutive strikes) = P (strike)P (strike) · · · P (strike)
= (0.7)(0.7) · · · (0.7)
= (0.7)^12
= 0.0138
Q17 pg 357 What we know:
• P(TV) = 0.52
• P(Refrigerator) = 0.38
• P(both) = P(TV and Refrigerator) = 0.21
A Venn Diagram (not shown) may help with this problem.
What else we can calculate (may or may not relate to the above questions asked):
• P(TV only) = P(TV) - P(both) = 0.52-0.21 = 0.31
• P(Refrigerator only) = P(Refrigerator) - P(both) = 0.38- 0.21 = 0.17
• P(TV or Refrigerator) = P(TV) + P(Refrigerator) - P(TV and Refrigerator)
= 0.52 + 0.38 - 0.21 = 0.69
Answers to questions:
i. P(TV but no refrigerator) = P(TV only) = 0.31
ii. P(TV or Refrigerator but not both) = P(TV or Refrigerator) - P(both) =
0.69 - 0.21 = 0.48
OR
P(TV or Refrigerator but not both) = P(TV only) + P(Refrigerator only) =
0.31 + 0.17 = 0.48
iii. P(neither a TV nor a Refrigerator) = 1 - P( (neither a TV nor a Refrigerator)^C )
= 1 - P(TV or Refrigerator) = 1-0.69 = 0.31
OR
P(neither a TV nor a Refrigerator) = 1 - P(TV only) - P(Refrigerator only)
- P(both) = 1-0.31-0.17-0.21=0.31
Q19 pg 357
i. P(US) = 1557/7690 = 0.2025
ii. Probability that a person completed education before college? Do not include
those who did not answer.
P(Some HS) + P(Primary) = 4195/7690 + 1161/7690 = 0.6965.
iii. Probability that a person is from France or did post graduate study.
P(France or Post Grad) = P(France) + P(Post Grad) - P(both)
= 1539/7690 + 379/7690 − 69/7690 = 0.2404.
iv. Probability that a person is from France and finished primary school.
P(France and Primary) = 309/7690 = 0.0402.
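The table lookups in Q19 can be double-checked with a small Python sketch. The counts are copied from the Education Level by Country table in the problem statement. The helper names (`row_total`, `col_total`, `cell`) are hypothetical.

```python
from fractions import Fraction

columns = ["Post Grad", "College", "Some HS", "Primary", "No Answer"]
counts = {
    "China":  [7, 315, 671, 506, 3],
    "France": [69, 388, 766, 309, 7],
    "India":  [161, 514, 622, 227, 11],
    "UK":     [58, 207, 1240, 32, 20],
    "US":     [84, 486, 896, 87, 4],
}
grand_total = sum(sum(row) for row in counts.values())

def row_total(country):
    return sum(counts[country])

def col_total(level):
    j = columns.index(level)
    return sum(row[j] for row in counts.values())

def cell(country, level):
    return counts[country][columns.index(level)]

p_us = Fraction(row_total("US"), grand_total)
p_before_college = Fraction(col_total("Some HS") + col_total("Primary"),
                            grand_total)
# General Addition Rule: subtract the overlap so it is not counted twice.
p_france_or_pg = Fraction(
    row_total("France") + col_total("Post Grad") - cell("France", "Post Grad"),
    grand_total,
)
p_france_and_primary = Fraction(cell("France", "Primary"), grand_total)

for name, p in [("P(US)", p_us), ("P(before college)", p_before_college),
                ("P(France or Post Grad)", p_france_or_pg),
                ("P(France and Primary)", p_france_and_primary)]:
    print(name, "=", round(float(p), 4))
```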
Q22 pg 357 A chart may help solve this problem. The chart below shows the initial information given to us:
        Male  Female  Total
Cat        6             18
Dog        8             24
Total
We can then fill in the missing numbers:
        Male  Female  Total
Cat        6      12     18
Dog        8      16     24
Total     14      28     42
Then we can answer the questions that we’re interested in.
i. P(Male | Cat) = P(Male and Cat) / P(Cat) = (6/42) / (18/42) = 1/3 = 0.3333
ii. P(Cat | Female) = P(Cat and Female) / P(Female) = (12/42) / (28/42) = 0.4286
iii. P(Female | Dog) = P(Female and Dog) / P(Dog) = (16/42) / (24/42) = 0.6667
Followup to Q22 2 definitions for independence you could use:
• P(A)P(B) = P(A and B)
• P(A | B) = P(A)
Using each definition:
Def1:
P(Dog)P(M) = (24/42)(14/42) = 336/1764 = 0.1905
P(Dog and M) = 8/42 = 0.1905
Def2:
P(Dog | M) = 8/14 = 0.5714
P(Dog) = 24/42 = 0.5714
Since the two quantities are equal under either definition, yes, being male and being a dog are
independent.
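Both independence checks can be done with exact fractions, which avoids any doubt about rounding in the fourth decimal place. This is a sketch using the filled-in shelter counts from Q22. The `prob` helper is hypothetical.

```python
from fractions import Fraction

counts = {
    ("Cat", "Male"): 6, ("Cat", "Female"): 12,
    ("Dog", "Male"): 8, ("Dog", "Female"): 16,
}
total = sum(counts.values())

def prob(species=None, sex=None):
    """P(species and sex); an argument left as None means 'any'."""
    matching = sum(
        n for (sp, sx), n in counts.items()
        if species in (None, sp) and sex in (None, sx)
    )
    return Fraction(matching, total)

p_dog = prob(species="Dog")
p_male = prob(sex="Male")
p_dog_and_male = prob("Dog", "Male")
p_dog_given_male = p_dog_and_male / p_male

# Def1: P(Dog)P(Male) equals P(Dog and Male).
independent_def1 = (p_dog * p_male == p_dog_and_male)
# Def2: P(Dog | Male) equals P(Dog).
independent_def2 = (p_dog_given_male == p_dog)

print(p_dog * p_male, p_dog_and_male, independent_def1)
print(p_dog_given_male, p_dog, independent_def2)
```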
Q55 pg 360 Before these questions are answered, set up a tree diagram. Note that the probability of being detained depends on whether a “correct” decision has been made.
Because of this, detained and not detained will go on the second branch of the
tree.
Drink (0.12)
    Detain (0.8): P(Drink and Detain) = (0.12)(0.8) = 0.096
    Not Detain (0.2): P(Drink and Not Detain) = (0.12)(0.2) = 0.024
Not Drink (0.88)
    Detain (0.2): P(Not Drink and Detain) = (0.88)(0.2) = 0.176
    Not Detain (0.8): P(Not Drink and Not Detain) = (0.88)(0.8) = 0.704
Here are the interpretations of the numbers in the tree diagram:
P(Drink) = 0.12
P(Not Drink) = 0.88
P(Detain | Drink) = 0.8
P(Not Detain | Drink) = 0.2
P(Detain | Not Drink) = 0.2
P(Not Detain | Not Drink) = 0.8
P(Drink and Detain) = 0.096
P(Drink and Not Detain) =0.024
P(Not Drink and Detain) =0.176
P(Not Drink and Not Detain) =0.704
To answer the questions:
i. P(Detain | Not Drink) = 0.2.
ii. P(Detain) = P(Detain and Drink) + P(Detain and Not Drink) = 0.096+0.176
= 0.272.
iii. P(Drink | Detain) = P(Drink and Detain) / P(Detain) = 0.096 / 0.272 = 0.353.
iv. P(Drink | Not Detain)
= P(Not Detain | Drink)P(Drink) / [P(Not Detain | Drink)P(Drink) + P(Not Detain | Not Drink)P(Not Drink)]
= (0.2)(0.12) / [(0.2)(0.12) + (0.8)(0.88)]
= 0.033.
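As an empirical cross-check of the theoretical answers to Q55, the checkpoint can be simulated. By the Law of Large Numbers the long-run relative frequencies should settle near 0.272, 0.353, and 0.033. This is only a sketch: the function name, the number of simulated drivers, and the seed are arbitrary assumptions.

```python
import random

def simulate_checkpoint(n_drivers, rng, p_drink=0.12, p_correct=0.80):
    """Simulate drivers and officer decisions; return relative frequencies."""
    detained = 0
    detained_drinking = 0
    released = 0
    released_drinking = 0
    for _ in range(n_drivers):
        drinking = rng.random() < p_drink
        correct = rng.random() < p_correct
        # A correct decision detains drinkers and releases non-drinkers;
        # an incorrect decision does the opposite.
        detain = drinking if correct else not drinking
        if detain:
            detained += 1
            detained_drinking += drinking
        else:
            released += 1
            released_drinking += drinking
    return {
        "p_detain": detained / n_drivers,
        "p_drink_given_detain": detained_drinking / detained,
        "p_drink_given_released": released_drinking / released,
    }

rng = random.Random(360)  # seed only for reproducibility
result = simulate_checkpoint(200_000, rng)
for name, value in result.items():
    print(name, round(value, 3))
```

The simulated values will not match the exact answers digit for digit; they are empirical probabilities that approximate the theoretical ones.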
Q51 pg 360 Before we answer any questions, it may be useful to create a tree diagram.
Day (0.6)
    Absent (0.01): P(Day and Absent) = (0.6)(0.01) = 0.006
    Not Absent (0.99): P(Day and Not Absent) = (0.6)(0.99) = 0.594
Night (0.4)
    Absent (0.02): P(Night and Absent) = (0.4)(0.02) = 0.008
    Not Absent (0.98): P(Night and Not Absent) = (0.4)(0.98) = 0.392
Question to answer: What percent of employees are absent on any given day?
Need to calculate P(Absent). This is the denominator of Bayes Rule.
P (Absent) = P (Absent | Day) P (Day) + P (Absent | Night) P (Night)
= (0.01)(0.6) + (0.02)(0.4)
= 0.014
= 1.4%.
7. (Solution to Problem 1) (a) What is the value of the missing probability in the table above? The total probability must equal 1. Therefore, the missing value is 1 − 0.2 − 0.1 − 0.3 − 0.3 =
0.1.
(b) What is the expected value for X?
E(X) = 3(0.2) + 5(0.1) + 6(0.3) + 8(0.3) + 10(0.1) = 6.3.
(c) What is the variance for X? There are 2 ways to calculate variance.
Option 1:
Var(X) = Σ (x − E(X))^2 P(X = x)
= (3 − 6.3)^2 (0.2) + (5 − 6.3)^2 (0.1) + (6 − 6.3)^2 (0.3)
+ (8 − 6.3)^2 (0.3) + (10 − 6.3)^2 (0.1)
= 4.61.
Option 2:
E(X) = 6.3
E(X^2) = Σ x^2 P(X = x)
= (3^2)(0.2) + (5^2)(0.1) + (6^2)(0.3) + (8^2)(0.3) + (10^2)(0.1)
= 44.3
Var(X) = E(X^2) − [E(X)]^2
= 44.3 − [6.3]^2
= 44.3 − 39.69
= 4.61.
(d) What is the standard deviation for X?
SD(X) = √Var(X) = √4.61 = 2.147.
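The same four steps can be carried out in Python as a check on the hand arithmetic. This is a sketch using the distribution given in the problem; the variable names are hypothetical.

```python
from math import sqrt

values = [3, 5, 6, 8, 10]
known_probs = [0.2, 0.1, 0.3, 0.3]

# (a) Probabilities must add to 1, which fixes the missing value.
missing = 1 - sum(known_probs)
probs = known_probs + [missing]

# (b) E(X) = sum of x * p(x).
expected = sum(x * p for x, p in zip(values, probs))

# (c) Variance, computed both ways.
variance_definition = sum((x - expected) ** 2 * p
                          for x, p in zip(values, probs))
variance_shortcut = sum(x**2 * p for x, p in zip(values, probs)) - expected**2

# (d) Standard deviation is the square root of the variance.
sd = sqrt(variance_definition)

print(round(missing, 4), round(expected, 4))
print(round(variance_definition, 4), round(variance_shortcut, 4))
print(round(sd, 3))
```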
8. (Solution to Problem 2) Find the mean and standard deviation of
(a) 2X
E(2X) = 2E(X) = 2(5) = 10
SD(2X) = |2| SD(X) = 2(2) = 4
(b) 3Y
E(3Y) = 3E(Y) = 3(8) = 24
SD(3Y) = |3| SD(Y) = 3(3) = 9
(c) X + Y
E(X + Y) = E(X) + E(Y) = 5 + 8 = 13
SD(X + Y) = √(Var(X) + Var(Y)) = √((2)^2 + (3)^2) = √(4 + 9) = 3.606
(d) X − Y
E(X − Y) = E(X) − E(Y) = 5 − 8 = −3
SD(X − Y) = √(Var(X) + Var(Y)) = √((2)^2 + (3)^2) = √(4 + 9) = 3.606
(e) X1 + X2
(X1 and X2 are two independent observations of X, so this is not the same as 2X.)
E(X1 + X2) = E(X1) + E(X2) = 5 + 5 = 10
SD(X1 + X2) = √(Var(X1) + Var(X2)) = √((2)^2 + (2)^2) = √(4 + 4) = 2.828
(f) 5X − 2
E(5X − 2) =E(5X) − 2 = 5E(X) − 2 = 5(5) − 2 = 23
SD(5X − 2) = SD(5X) = |5|SD(X) = 5(2) = 10
(g) 8Y + 3
E(8Y + 3) = E(8Y ) + 3 = 8E(Y ) + 3 = 8(8) + 3 = 67
SD(8Y + 3) =SD(8Y ) = |8|SD(Y ) = 8(3) = 24
(h) 2X + 3Y
$E(2X + 3Y) = E(2X) + E(3Y) = 2E(X) + 3E(Y) = 2(5) + 3(8) = 34$
$SD(2X + 3Y) = \sqrt{Var(2X) + Var(3Y)}$
$= \sqrt{2^2\,Var(X) + 3^2\,Var(Y)}$
$= \sqrt{4(2)^2 + 9(3)^2}$
$= \sqrt{16 + 81}$
$= 9.849$
(i) 9X − 4Y
$E(9X - 4Y) = E(9X) - E(4Y) = 9E(X) - 4E(Y) = 9(5) - 4(8) = 13$
$SD(9X - 4Y) = \sqrt{Var(9X) + Var(4Y)}$
$= \sqrt{9^2\,Var(X) + 4^2\,Var(Y)}$
$= \sqrt{81(2)^2 + 16(3)^2}$
$= \sqrt{324 + 144}$
$= 21.633$
(j) −6X
E(−6X) = −6E(X) = −6(5) = −30
SD(−6X) = | − 6|SD(X) = 6(2) = 12
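The rules used throughout problem 8 can be sketched as one small Python helper. It assumes X and Y are independent with E(X) = 5, SD(X) = 2, E(Y) = 8, SD(Y) = 3, the values used in the answers above. The function name `linear_combination` is hypothetical.

```python
import math


def linear_combination(terms, constant=0.0):
    """Mean and SD of constant + sum(a_i * V_i) for independent V_i.

    terms is a list of (a_i, mean_i, sd_i) tuples.
    """
    mean = constant + sum(a * m for a, m, _ in terms)
    # Variances add and each coefficient is squared, so minus signs drop out.
    # An added constant shifts the mean but does not change the spread.
    variance = sum((a * s) ** 2 for a, _, s in terms)
    return mean, math.sqrt(variance)


X = (5, 2)  # (E(X), SD(X)) as used in the answers above
Y = (8, 3)  # (E(Y), SD(Y))

mean_h, sd_h = linear_combination([(2, *X), (3, *Y)])      # (h) 2X + 3Y
mean_i, sd_i = linear_combination([(9, *X), (-4, *Y)])     # (i) 9X - 4Y
mean_f, sd_f = linear_combination([(5, *X)], constant=-2)  # (f) 5X - 2

print(mean_h, round(sd_h, 3))
print(mean_i, round(sd_i, 3))
print(mean_f, sd_f)
```

Note that the standard deviations themselves are never added; only the variances are.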
Q43 pg 390 (a) How many broken eggs do you expect to get? Let X = the number of broken eggs in one carton of a dozen eggs. We know that E(X) = 0.6 and SD(X) = 0.5. Now, when we take 3 cartons of eggs, we are NOT taking 3× a single carton of eggs. This would be like cloning one carton 3 times. We are taking 3 separate cartons. Let the numbers of broken eggs in these 3 separate cartons be denoted by $X_1$, $X_2$, $X_3$; each has the expected value and standard deviation listed above.
$E(X_1 + X_2 + X_3) = E(X_1) + E(X_2) + E(X_3)$
$= 0.6 + 0.6 + 0.6$
$= 1.8.$
We expect to have 1.8 broken eggs.
(b) What's the standard deviation?
$SD(X_1 + X_2 + X_3) = \sqrt{Var(X_1) + Var(X_2) + Var(X_3)}$
$= \sqrt{(0.5)^2 + (0.5)^2 + (0.5)^2}$
$= \sqrt{0.25 + 0.25 + 0.25}$
$= \sqrt{0.75}$
$= 0.87.$
(c) What assumptions did you have to make about the eggs in order to answer this
question? We needed to assume that the cartons of eggs were independent of each
other in order to answer the standard deviation question.
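The difference between three separate cartons and one carton counted three times can be seen numerically. The sketch below assumes independent cartons, as in part (c); the variable names are illustrative.

```python
import math

mean_one, sd_one = 0.6, 0.5  # broken eggs per carton: E(X), SD(X)
n_cartons = 3

# Three independent cartons: means add, variances add.
mean_sum = n_cartons * mean_one
sd_sum = math.sqrt(n_cartons * sd_one ** 2)

# One carton "cloned" three times (3X): the SD is simply scaled by 3.
sd_cloned = n_cartons * sd_one

print(round(mean_sum, 1), round(sd_sum, 2), sd_cloned)
```

Both cases have the same mean, but the sum of independent cartons has the smaller standard deviation (about 0.87 versus 1.5).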
9. This is an example of a Binomial Model problem. We are given that p = 0.02 and n = 30. We define “success” to be a ream of paper that needs to be shipped back to the printing company. Our model is b(30, 0.02). The probability that there will be exactly 4 reams of paper that need to be shipped back is
$P(X = 4) = {}_{30}C_4 \,(0.02)^4 (1 - 0.02)^{30-4}$
$= 27405\,(0.02)^4 (0.98)^{26}$
$= 27405\,(1.6 \times 10^{-7})(0.5914)$
$= 0.0026.$
You can also calculate this probability on your calculator as binompdf(30, 0.02, 4) = 0.0026.
The expected value is E(X) = np = 30(0.02) = 0.6. The variance is Var(X) = np(1 − p) = 30(0.02)(0.98) = 0.588.
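For readers without a calculator that has binompdf, the same number can be obtained with a short Python sketch. The function name `binom_pmf` is illustrative; `math.comb` requires Python 3.8 or later.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 30, 0.02  # number of reams and probability a ream is shipped back

prob_exactly_4 = binom_pmf(4, n, p)
mean = n * p
variance = n * p * (1 - p)

print(round(prob_exactly_4, 4), round(mean, 3), round(variance, 3))
```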
10. The model for this problem is N (3, 1.2). Let X = lifespan (in years) of an alarm clock.
(a) The probability that it lasts more than 4 years is P(X > 4). First, we need to calculate the z-score.
$z = \frac{4 - 3}{1.2} = \frac{5}{6}.$
Now, P(X > 4) = P(Z > 5/6) = normalcdf(5/6, 999) = 0.2023.
(b) The probability that it lasts less than 2.5 years is P(X < 2.5). First, we need to calculate the z-score.
$z = \frac{2.5 - 3}{1.2} = -\frac{5}{12}.$
Now, P(X < 2.5) = P(Z < −5/12) = normalcdf(−999, −5/12) = 0.3385.
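The calculator's normalcdf can be imitated with Python's standard-library `statistics.NormalDist`. This is a sketch under the N(3, 1.2) model stated above; the variable names are illustrative.

```python
from statistics import NormalDist

# Lifespan model from the problem: mean 3 years, SD 1.2 years.
lifespan = NormalDist(mu=3, sigma=1.2)

# (a) Upper tail: P(X > 4) = 1 - P(X <= 4)
p_more_than_4 = 1 - lifespan.cdf(4)

# (b) Lower tail: P(X < 2.5)
p_less_than_2_5 = lifespan.cdf(2.5)

# The same answers via z-scores on the standard Normal model.
z_a = (4 - 3) / 1.2
z_b = (2.5 - 3) / 1.2
standard = NormalDist()
p_a_via_z = 1 - standard.cdf(z_a)
p_b_via_z = standard.cdf(z_b)

print(round(p_more_than_4, 4), round(p_less_than_2_5, 4))
```

Working with X directly or with its z-score gives the same probability, since standardizing only relabels the axis.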
11. (a) $E(\bar{x}) = 120.$
(b) $SD(\bar{x}) = \sigma/\sqrt{n} = 15/\sqrt{50} = 2.121.$
(c) $\bar{x} \sim N(120, 2.121).$
(d) We want to find $P(105 < \bar{x} < 115)$. Now, calculate the two z-score values that we need.
$z_1 = \frac{105 - 120}{2.121} = -7.07,$
$z_2 = \frac{115 - 120}{2.121} = -2.36.$
So we want to calculate P(−7.07 < Z < −2.36) = normalcdf(−7.07, −2.36) = 0.00914.
(e) We want to find $P(\bar{x} > 122)$. Now, calculate the z-score value that we need.
$z = \frac{122 - 120}{2.121} = 0.94.$
So we want to calculate P(Z > 0.94) = normalcdf(0.94, 999) = 0.174.
(f) By the 68-95-99.7 Rule, we know that between µ ± 3σ we have 99.7% of the total area. However, since we are working with the sample mean, we want to calculate $\mu \pm 3\sigma/\sqrt{n}$ instead. Thus, our interval will be
$\mu \pm 3\frac{\sigma}{\sqrt{n}} = 120 \pm 3(2.121)$
$= 120 \pm 6.363$
$= (113.637, 126.363).$
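A short Python sketch of the sampling distribution of the sample mean, using µ = 120, σ = 15 and n = 50 from the working above. Because the code does not round the standard deviation or the z-scores, its results differ slightly from the hand calculations in the third or fourth decimal place. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 120, 15, 50

# Standard deviation of the sample mean: sigma / sqrt(n)
sd_xbar = sigma / math.sqrt(n)

# Normal model for the sample mean.
xbar = NormalDist(mu=mu, sigma=sd_xbar)

# (d) P(105 < xbar < 115)
p_between = xbar.cdf(115) - xbar.cdf(105)

# (e) P(xbar > 122)
p_above_122 = 1 - xbar.cdf(122)

# (f) Interval covering about 99.7% of sample means: mu +/- 3 SD(xbar)
interval = (mu - 3 * sd_xbar, mu + 3 * sd_xbar)

print(round(sd_xbar, 3), round(p_between, 4), round(p_above_122, 3))
print(tuple(round(v, 2) for v in interval))
```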
12. We know that 30% of people own ice cube trays. A random sample of 500 people is
collected.
(a) Sample proportion because we are given information in percentages.
(b) Since we are looking at a sample proportion, the statistic is p̂ (the parameter it estimates is the population proportion p).
(c) E(p̂) = p = 0.3.
(d) $SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.3(1 - 0.3)}{500}} = \sqrt{\frac{0.21}{500}} = 0.0205.$
(e) p̂ ∼ N(0.3, 0.0205).
(f) We want to find P(p̂ < 0.4). First, calculate the z-score since we have a Normal Model.
$z = \frac{0.4 - 0.3}{0.0205} = 4.88.$
So we want to calculate P(Z < 4.88) = normalcdf(−999, 4.88) = 0.99999947.
(g) We want to find P(0.27 < p̂ < 0.32). First, calculate the z-scores since we have a Normal Model.
$z_1 = \frac{0.27 - 0.3}{0.0205} = -1.46,$
$z_2 = \frac{0.32 - 0.3}{0.0205} = 0.98.$
So we want to calculate P(−1.46 < Z < 0.98) = normalcdf(−1.46, 0.98) = 0.7643.
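The same steps for the sample proportion can be sketched in Python, assuming p = 0.3 and n = 500 as stated in the problem. The code keeps full precision, so the last digits may differ a little from the values obtained with rounded z-scores. Variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.3, 500

# Standard deviation of the sample proportion: sqrt(p(1 - p) / n)
sd_phat = math.sqrt(p * (1 - p) / n)

# Normal model for the sample proportion.
phat = NormalDist(mu=p, sigma=sd_phat)

# (f) P(phat < 0.4)
p_below_0_4 = phat.cdf(0.4)

# (g) P(0.27 < phat < 0.32)
p_between = phat.cdf(0.32) - phat.cdf(0.27)

print(round(sd_phat, 4), p_below_0_4, round(p_between, 3))
```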