Stat 101 Exam 2 - Embers
Important Formulas and Concepts

This version: October 20, 2015, by Jennifer Pajda-De La O. May not include everything that could possibly be tested. To be used as an additional reference when studying Chapters 8 - 15. All definitions, formulas, and selected problems come from Intro Stats by De Veaux, Velleman and Bock, 4th edition, published by Pearson.

1 Chapter 8

1.1 Definitions

1. Extrapolation: Predicting y for x-values outside the range of the data. It is unsafe in any regression situation; predictions from extrapolation should not be trusted.
2. Outlier: Any data point that stands away from the others.
3. Leverage: Data points whose x-values are far from the mean of x have high leverage.
4. Influential Point: A point that, if omitted from the data, results in a very different regression model.

1.2 Residual Plots

A residual plot is a scatterplot of the residuals versus the x-values. Three separate conditions must be met in order to have an appropriate linear model: 1) the Linearity Condition, 2) the Outlier Condition, and 3) the Equal Spread Condition. The Linearity Condition is violated when there are bends in the residual plot. The Outlier Condition is violated if any point is far from the rest of the points in the residual plot. The Equal Spread Condition is violated when the spread changes from one part of the plot to another.

2 Chapter 10

2.1 Definitions

1. Population: The entire group of individuals or instances about whom we hope to learn.
2. Sample: A (representative) subset of a population, examined in the hope of learning about the population.
3. Sample Survey: A study that asks questions of a sample drawn from some population in the hope of learning something about the entire population.
4. Randomization: The best defense against bias is randomization, in which each individual is given a fair, random chance of selection.
5. Census: A sample that consists of the entire population.
6. Population Parameter: A numerically valued attribute of a model for a population.
   Example: mean income of all employed people in the USA
7. Sample Statistic: Statistics, or sample statistics, are values that are calculated for sample data.
   Example: mean income of employed people in a representative sample
8. Sampling Frame: A list of individuals from whom the sample is drawn. Individuals who may be in the population of interest but who are not in the sampling frame cannot be included in any sample.
9. Simple Random Sample (SRS): An SRS of sample size n is a sample in which each set of n elements in the population has an equal chance of selection.
10. Stratified Random Sampling: A sampling design in which the population is divided into several subpopulations (strata) and random samples are then drawn from each stratum. Try to make strata as homogeneous as possible.
11. Cluster Sampling: Entire groups, or clusters, are chosen at random. Clusters are heterogeneous.
12. Multistage Sampling: Sampling schemes that combine several sampling methods.
13. Systematic Sample: A sample drawn by selecting individuals systematically from a sampling frame.
14. Voluntary Response Bias: Bias introduced to a sample when individuals can choose on their own whether to participate in the sample.
15. Undercoverage Bias: Biases the sample in a way that gives a part of the population less representation in the sample than it has in the population.
16. Nonresponse Bias: Bias introduced when a large fraction of those sampled fails to respond.
17. Response Bias: Anything in a survey design that influences responses.

3 Chapter 11

1. Studies
   (a) Observational Study: Study based on data in which no manipulation of factors has been employed.
   (b) Retrospective Study: Observational study in which subjects are selected and then their previous conditions or behaviors are determined. Based on historical data and memories.
   (c) Prospective Study: Observational study in which subjects are followed to observe future outcomes. Because no treatments are deliberately applied, it is not an experiment.
2.
Matching in Studies: In a retrospective or prospective study, participants who are similar in ways not under study may be matched and then compared with each other on the variables of interest.
3. Experiments
   (a) Factor: Variable whose levels are manipulated by the experimenter.
   (b) Response Variable: Variable whose values are compared across different treatments.
   (c) Experiment: Manipulates factor levels to create treatments, randomly assigns subjects to these treatment levels, and then compares the responses of the subject groups across treatment levels. Tries to assess the effects of the treatments.
   (d) Levels: Specific values that the experimenter chooses for a factor.
   (e) Treatment: Process, intervention, or other controlled circumstance applied to randomly assigned experimental units.
   (f) Block: When groups of experimental units are similar in a way that is not a factor under study, it is often a good idea to gather them together into blocks and then randomize the assignment of treatments within each block.
4. Randomization through Random Assignment: An experiment must assign experimental units (individuals) to treatment groups using some form of randomization.
5. Principles of Experimental Design
   (a) Control: Control aspects of the experiment that we know may have an effect on the response, but that are not the factors being studied.
   (b) Randomize: Randomize subjects to treatments to even out effects that we cannot control.
   (c) Replicate: Replicate over as many subjects as possible.
   (d) Block: Reduce the effects of identifiable attributes of the subjects that cannot be controlled.
6. Statistically Significant: When an observed difference is too large for us to believe that it is likely to have occurred naturally, we consider the difference to be statistically significant.
7. Types of Experiments
   (a) Completely Randomized Design (CRD): All experimental units have an equal chance of receiving any treatment.
   (b) Randomized Block Design (RBD): Participants are randomly assigned to treatments within each block.
8. Control Treatment: Baseline treatment.
9. Control Group: Experimental units assigned to a baseline treatment level, typically either the default treatment or a placebo treatment. Their responses provide a basis for comparison.
10. Blinding: Any individual associated with an experiment who is not aware of how subjects have been allocated to treatment groups is said to be blinded.
11. Single/Double Blind: Two groups of people can affect the outcome of an experiment:
    • Those who could influence the results.
    • Those who evaluate the results.
    Single Blind: when the individuals in either one of the two groups above are blinded.
    Double Blind: when the individuals in both of the groups above are blinded.
12. Placebo: A treatment known to have no effect.
13. Placebo Effect: The tendency of human subjects to show a response even when administered a placebo.
14. Potential Problems
    (a) Confounding: When the levels of one factor are associated with the levels of another factor in such a way that their effects cannot be separated, we say that these two factors are confounded.
    (b) Lurking Variable: A variable associated with both y and x that makes it appear that x may be causing y.
15. In summary, the best experiments are usually 1) Randomized, 2) Comparative, 3) Double-blind, and 4) Placebo-controlled.

4 Chapter 12

4.1 Definitions

1. Random Phenomenon: A phenomenon is random if we know what outcomes could happen, but not which particular values will happen.
2. Trial: A single attempt or realization of a random phenomenon.
3. Outcome: The value measured, observed, or reported for an individual instance of a trial.
4. Event: A collection of outcomes. Usually, we identify events so that we can attach probabilities to them. Denote events with bold capital letters like A, B, etc.
5. Sample Space: The collection of all possible outcome values. The collection of values in the sample space has a probability of 1. Denote by S or Ω.
6.
Law of Large Numbers (LLN): This law states that the long-run relative frequency of an event’s occurrence gets closer and closer to the true relative frequency as the number of trials increases.
7. Independence (informal definition): Two events are independent if learning that one event occurs does not change the probability that the other event occurs.
8. Probability: A number between 0 and 1 that reports the likelihood of an event’s occurrence. Write P(A) for the probability of event A.
9. Empirical Probability: When the probability comes from the long-run relative frequency of the event’s occurrence.
10. Theoretical Probability: When the probability comes from a model (such as equally likely outcomes).
    $P(A) = \frac{\#\text{ outcomes in } A}{\#\text{ of all possible outcomes}}$
11. Personal (or Subjective) Probability: When the probability is subjective and represents your personal degree of belief.
12. Legitimate Assignment of Probabilities: An assignment of probabilities to outcomes is legitimate if
    • each probability is greater than or equal to 0 and less than or equal to 1
    • the sum of the probabilities = 1

4.2 Rules on Probability

1. For all events A, 0 ≤ P(A) ≤ 1.
2. Probability Assignment Rule
   • P(S) = P(Ω) = 1
   • The set of all possible outcomes of a trial must have probability = 1.
3. Complement Rule
   • The set of outcomes that are not in the event A is the complement $A^C$
   • $P(A^C) = 1 - P(A)$
   • The probability of an event not occurring is 1 minus the probability that it occurs
4. Addition Rule
   • For 2 disjoint events A and B, the probability that one or the other occurs is the sum of the probabilities of the two events.
   • P(A or B) = P(A) + P(B) where A and B are disjoint
   • Disjoint means mutually exclusive; there are no outcomes in common
5. Multiplication Rule
   • For two independent events A and B, the probability that both A and B occur is the product of the probabilities of the two events.
   • P(A and B) = P(A)P(B) where A and B are independent

5 Chapter 13

5.1 Definitions

1.
General Addition Rule: For any two events A and B, the probability of A or B is P(A or B) = P(A) + P(B) − P(A and B). This rule does NOT require disjoint events.
2. Conditional Probability: The conditional probability of the event B given that the event A has occurred is
   $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$.
3. General Multiplication Rule: For any two events A and B, the probability of A and B is P(A and B) = P(A)P(B | A). This rule does NOT require independence.
4. Independent Events: A and B are independent when P(B | A) = P(B). Note: independent is not the same as disjoint.
5. Tree Diagram: A display of conditional events or probabilities that is helpful in thinking through conditioning.
6. Bayes Rule:
   $P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid B^C)P(B^C)}$

5.2 Tree Diagram Example and Interpretations of Every Node

Example probabilities are given on each branch:

A (0.6)
    B (0.8):      P(A and B) = (0.6)(0.8) = 0.48
    Not B (0.2):  P(A and Not B) = (0.6)(0.2) = 0.12
Not A (0.4)
    B (0.2):      P(Not A and B) = (0.4)(0.2) = 0.08
    Not B (0.8):  P(Not A and Not B) = (0.4)(0.8) = 0.32

Here are the mathematical interpretations of the numbers in the tree diagram:

P(A) = 0.6                    P(Not A) = 0.4
P(B | A) = 0.8                P(Not B | A) = 0.2
P(B | Not A) = 0.2            P(Not B | Not A) = 0.8
P(A and B) = 0.48             P(A and Not B) = 0.12
P(Not A and B) = 0.08         P(Not A and Not B) = 0.32

Calculate things like P(A | B) using Bayes Rule:

$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^C)P(A^C)} = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \text{Not } A)P(\text{Not } A)} = \frac{(0.8)(0.6)}{(0.8)(0.6) + (0.2)(0.4)} = \frac{0.48}{0.56} = 0.8571$.

Calculate things like P(B) by rearranging the Multiplication Rule:

$P(B \text{ and } A) = P(B)P(A \mid B) \Rightarrow P(B) = \frac{P(B \text{ and } A)}{P(A \mid B)}$.

Now,

$P(B) = \frac{P(B \text{ and } A)}{P(A \mid B)} = \frac{0.48}{0.8571} = 0.56$.

This agrees with adding the two branches that end in B: P(A and B) + P(Not A and B) = 0.48 + 0.08 = 0.56.

6 Chapter 14

6.1 Definitions

1. Random Variable: Assumes any of several different values as a result of some random event. Denoted by a capital letter, such as X.
2. Discrete Random Variable: A random variable that can take one of a finite number of distinct outcomes.
3.
Continuous Random Variable: A random variable that can take on any of an (uncountably) infinite number of outcomes.
4. Probability Model: A function that associates a probability P with each value of a discrete random variable X, denoted P(X = x), or with any interval of values of a continuous random variable.
5. Expected Value: The expected value of a random variable is its theoretical long-run average value, the center of its model. Represented by µ or E(X), it is found by summing the products of variable values and probabilities.
   $\mu = E(X) = \sum x\,p(x)$
6. Variance: Expected value of the squared deviations from the mean.
   $\sigma^2 = \mathrm{Var}(X) = \sum (x - \mu)^2 p(x) = \sum (x - E(X))^2 p(x) = E(X^2) - [E(X)]^2$
7. Standard Deviation of a Random Variable: Describes the spread in the model and is the square root of the variance.
   $\sigma = \mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$
8. Bernoulli Trials: A sequence of trials is a sequence of Bernoulli Trials if
   • There are exactly 2 possible outcomes (success and failure)
   • The probability of success is constant
   • The trials are independent
9. 10% Condition: Sampling without replacement from a finite population means the trials are not exactly independent. As long as the sample is smaller than 10% of the population, it is still reasonable to treat them as independent. When you sample more than 10% of the population, the trials can’t really be independent, so you shouldn’t casually assume independence.
10. Binomial Probability Distribution: Appropriate for a random variable that counts the number of successes in n Bernoulli Trials.
11. Success/Failure Condition: A Binomial Model is approximately Normal if we expect at least 10 successes and 10 failures, i.e. np ≥ 10 and n(1 − p) ≥ 10.

6.2 Rules for Expected Value, Variances, and Standard Deviations

1. Changing a Random Variable by a constant number, say a or c.
   E(X ± c) = E(X) ± c      Var(X ± c) = Var(X)          SD(X ± c) = SD(X)
   E(aX) = aE(X)            $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$    SD(aX) = |a| SD(X)
2. Addition Rule for Expected Value of a Random Variable (X and Y are both random variables)
   E(X ± Y) = E(X) ± E(Y)
3. Addition Rule for Variance of a Random Variable (X and Y are both random variables). Use ONLY when X and Y are independent.
   Var(X ± Y) = Var(X) + Var(Y)
   $\mathrm{SD}(X \pm Y) = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)}$

6.3 Binomial Model

$P(X = x) = {}_nC_x\, p^x (1 - p)^{n - x}$
$E(X) = np$
$\mathrm{Var}(X) = np(1 - p)$
$\mathrm{SD}(X) = \sqrt{np(1 - p)}$
where ${}_nC_x = \frac{n!}{x!(n - x)!}$ and $n! = n(n - 1)(n - 2) \cdots (1)$.

7 Chapter 15

7.1 Definitions

1. Sampling Distribution: Different random samples give different values of a statistic. The distribution of the statistic over all possible samples is called the sampling distribution. The sampling distribution model shows the behavior of the statistic over all the possible samples of the same size n.
2. Sampling Distribution Model: Because we can never see all possible samples, we often use a model as a practical way of describing the theoretical sampling distribution.
3. Sampling Distribution Model for a Proportion: If the assumptions of independence and random sampling are met, and we expect at least 10 successes and 10 failures, then the sampling distribution of a proportion is modeled by a Normal model with a mean equal to the true proportion value p and a standard deviation equal to $\sqrt{p(1 - p)/n}$.
   $\hat{p} \sim N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right)$
4. Sampling Error: Sample-to-sample variation.
5. Central Limit Theorem (CLT): The sampling distribution model of the sample mean (and proportion) is approximately Normal for large n, regardless of the distribution of the population, as long as the observations are independent. The larger the sample, the better the approximation will be.
6. Sampling Distribution Model for a Mean: If the assumptions of independence and random sampling are met, and the sample size is large enough, the sampling distribution of the sample mean is modeled by a Normal model with a mean equal to the population mean and a standard deviation equal to $\sigma/\sqrt{n}$.
   $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

8 Example Problems

Q15 pg 309
For his Statistics class experiment, researcher J. Gilbert decided to study how parents’ income affects children’s performance on standardized tests like the SAT.
He proposed to collect information from a random sample of test takers and examine the relationship between parental income and SAT score.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many factors are there?
(c) Identify the explanatory variable and response variable.

Q27 pg 311
In 2002, the journal Science reported that a study of women in Finland indicated that having sons shortened the life spans of mothers by about 34 weeks per son, but that daughters helped to lengthen the mothers’ lives. The data came from church records from the period 1640 to 1870.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many factors are there?
(c) Identify the explanatory variable and response variable.

Q31 pg 311
Some people claim they can get relief from migraine headache pain by drinking a large glass of ice water. Researchers plan to enlist several people who suffer from migraines in a test. When a participant experiences a migraine headache, he or she will take a pill that may be a standard pain reliever or a placebo. Half of each group will also drink ice water. Participants will then report the level of pain relief they experience.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many factors are there?
(c) Identify the explanatory variable and response variable.

Q33 pg 311
Athletes who had suffered hamstring injuries were randomly assigned to one of two exercise programs. Those who engaged in static stretching returned to sports activity a mean of 15.2 days faster than those assigned to a program of agility and trunk stabilization exercises.
(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many factors are there?
(c) Identify the explanatory variable and response variable.

Q28 pg 336
In a large Introductory Statistics lecture hall, the professor reports that 55% of the students enrolled have never taken a Calculus course, 32% have taken only one semester of Calculus, and the rest have taken two or more semesters of Calculus. The professor randomly assigns students to groups of three to work on a project for the course. What is the probability that the first group-mate you meet has studied
(a) two or more semesters of Calculus?
(b) some Calculus?
(c) no more than one semester of Calculus?

Q30 pg 336
Continuation of Q28 pg 336. What is the probability that of your other two groupmates,
(a) neither has studied Calculus?
(b) both have studied at least one semester of Calculus?
(c) at least one has had more than one semester of Calculus?

Q45 pg 338
A certain bowler can bowl a strike 70% of the time. If the bowls are independent, what’s the probability that she
(a) goes three consecutive frames without a strike?
(b) makes her first strike in the third frame?
(c) has at least one strike in the first three frames?
(d) bowls a perfect game (12 consecutive strikes)?

Q17 pg 357
A check of dorm rooms revealed that 38% had refrigerators, 52% had TVs, and 21% had both a TV and a refrigerator. What’s the probability that a randomly selected dorm room has:
(a) a TV but no refrigerator
(b) a TV or refrigerator but not both
(c) neither a TV nor a refrigerator

Q19 pg 357
We are given information about the Education Level by Country in the table below:

          Post Grad   College   Some HS   Primary   No Answer   Total
China             7       315       671       506           3    1502
France           69       388       766       309           7    1539
India           161       514       622       227          11    1535
UK               58       207      1240        32          20    1557
US               84       486       896        87           4    1557
Total           379      1910      4195      1161          45    7690

Calculate the following probabilities:
(a) P(US)
(b) Probability that a person completed education before college? Do not include those who did not answer.
(c) Probability that a person is from France or did post graduate study.
(d) Probability that a person is from France and finished primary school.

Q22 pg 357
An animal shelter states that it currently has 24 dogs and 18 cats available for adoption. 8 of the dogs and 6 of the cats are male. Find the conditional probability of:
(a) pet is male, given that it is a cat
(b) pet is a cat, given that it is female
(c) pet is female, given that it is a dog

Followup to Q22
The local animal shelter in Q22 reported that it currently has 24 dogs and 18 cats available for adoption; 8 of the dogs and 6 of the cats are male. Are being male and being a dog independent events? Briefly justify your answer.

Q55 pg 360
Police set up checkpoints to catch drunk drivers. Based on the initial stop, trained officers can make the right decision 80% of the time. Suppose a checkpoint is set up at a time when it is estimated that about 12% of people have been drinking. Questions to answer:
(a) Suppose a person is stopped and is not drinking. What is the probability that he is detained for further testing?
(b) What’s the probability that any given driver will be detained?
(c) What’s the probability that a driver who is detained has actually been drinking?
(d) What’s the probability that a driver who was released had actually been drinking?

Q51 pg 360
A company’s records indicate that on any given day about 1% of their day-shift employees and 2% of the night-shift employees will miss work. Sixty percent of the employees work the day shift. What percent of employees are absent on any given day?

1. We are given the following distribution for X.

   x           3     5     6     8     10
   P(X = x)    0.2   0.1   0.3   0.3   ?

   (a) What is the value of the missing probability in the table above?
   (b) What is the expected value for X?
   (c) What is the variance for X?
   (d) What is the standard deviation for X?
2. We are given independent random variables with means and standard deviations as shown.

           X    Y
   Mean    5    8
   SD      2    3

   Find the mean and standard deviation of
   (a) 2X
   (b) 3Y
   (c) X + Y
   (d) X − Y
   (e) X1 + X2
   (f) 5X − 2
   (g) 8Y + 3
   (h) 2X + 3Y
   (i) 9X − 4Y
   (j) −6X

Q43 pg 390
A grocery supplier believes that in a dozen eggs, the mean number of broken ones is 0.6 with a standard deviation of 0.5 eggs. You buy 3 dozen eggs without checking them.
(a) How many broken eggs do you expect to get?
(b) What’s the standard deviation?
(c) What assumptions did you have to make about the eggs in order to answer this question?

3. A printing company ships boxes of paper to office stores. In each box, there are 30 reams of paper. However, in every box, they estimate that 2% of the reams of paper are defective in some way. What is the probability that in a box, there will be exactly 4 reams of paper that need to be shipped back to the printing company? What is the expected value of the number of reams of paper that need to be shipped back? What is the variance?
4. The life span of an alarm clock is normally distributed with a mean of 3 years and a standard deviation of 1.2 years. What is the probability that the alarm clock lasts
   (a) more than 4 years?
   (b) less than 2.5 years?
5. The life span of a battery is normally distributed with a mean of 120 hours and a standard deviation of 15 hours. A random sample of 50 batteries is collected and the sample mean will be computed.
   (a) What is the expected value of the sample mean?
   (b) What is the standard deviation of the sample mean?
   (c) Write down your model.
   (d) Estimate the probability that the sample mean is between 105 and 115 hours.
   (e) Estimate the probability that the sample mean is more than 122 hours.
   (f) Give an interval that will contain the sample mean for 99.7% of samples.
6. We know that 30% of people own ice cube trays. A random sample of 500 people is collected.
   (a) Are we going to calculate information on the sample mean, or sample proportion?
   (b) Write down the statistic that you are going to find information about.
   (c) Find the expected value of the statistic you selected.
   (d) Find the standard deviation of the statistic you selected.
   (e) Write down your model.
   (f) Estimate the probability that your statistic is less than 0.40.
   (g) Estimate the probability that your statistic is between 0.27 and 0.32.

9 Example Solutions

Q15 pg 309
i. An observational study, because no treatments were imposed.
ii. It is a retrospective study.
iii. Explanatory variable: Parental income. Response variable: SAT score.

Q27 pg 311
i. Observational study.
ii. Retrospective. Records were obtained from 1640 to 1870.
iii. Explanatory variable: Having a son or a daughter. Response variable: Average life span of mothers.

Q31 pg 311
i. Experiment.
ii. There are 2 factors - pain reliever and water temperature. The pain reliever has 2 levels - pain reliever or placebo. The water temperature has 2 levels - ice water or regular water. In total, there are 4 treatments.
iii. Explanatory variables: pain reliever and water temperature. Response variable: level of pain relief.

Q33 pg 311
i. Experiment.
ii. There is 1 factor - type of exercise. This factor has 2 levels - static stretching and trunk stabilization exercises. In total, there are 2 treatments.
iii. Explanatory variable: type of exercise. Response variable: time before the athletes were able to return to sports.

Q28 pg 336
We are given that P(no calculus) = 0.55, P(1 semester) = 0.32.
i. P(2 or more) = 1 − P(no calculus) − P(1 semester) = 1 − 0.55 − 0.32 = 0.13.
ii. P(some calculus) = P(1 semester or 2 or more) = P(1 semester) + P(2 or more) = 0.32 + 0.13 = 0.45.
iii. P(no more than one semester) = P(no calculus or 1 semester) = P(no calculus) + P(1 semester) = 0.55 + 0.32 = 0.87.

Q30 pg 336
From Q28 pg 336, we have that P(no calculus) = 0.55, P(at least 1 semester) = P(some calculus) = 0.45.
i. P(neither) = P(person 1 no calculus and person 2 no calculus) = P(no calculus) P(no calculus) = (0.55)(0.55) = 0.3025.
ii.
P(both) = P(person 1 some calculus and person 2 some calculus) = P(some calculus) P(some calculus) = (0.45)(0.45) = 0.2025.
iii. From Q28, P(more than one semester) = P(2 or more) = 0.13, so P(at most one semester) = 1 − 0.13 = 0.87.
     Option 1: P(at least one has had more than one semester) = P(person 1 at most one semester and person 2 more than one semester OR person 1 more than one semester and person 2 at most one semester OR both more than one semester) = (0.87)(0.13) + (0.13)(0.87) + (0.13)(0.13) = 0.2431.
     Option 2: P(at least one has had more than one semester) = 1 − P(neither has had more than one semester) = 1 − (0.87)(0.87) = 1 − 0.7569 = 0.2431.

Q45 pg 338
Information given in the problem: P(strike) = 0.7, P(no strike) = 0.3.
i. goes three consecutive frames without a strike?
   P(no strike and no strike and no strike) = P(no strike)P(no strike)P(no strike) = (0.3)(0.3)(0.3) = $(0.3)^3$ = 0.027
ii. makes her first strike in the third frame?
    P(no strike and no strike and strike) = P(no strike)P(no strike)P(strike) = (0.3)(0.3)(0.7) = $(0.3)^2(0.7)$ = 0.063
iii. has at least one strike in the first three frames?
     P(at least 1 strike in first 3 frames) = 1 − P(no strikes in first 3 frames) = 1 − 0.027 = 0.973
iv. bowls a perfect game (12 consecutive strikes)?
    P(12 consecutive strikes) = P(strike)P(strike) · · · P(strike) = (0.7)(0.7) · · · (0.7) = $(0.7)^{12}$ = 0.0138

Q17 pg 357
What we know:
• P(TV) = 0.52
• P(Refrigerator) = 0.38
• P(both) = P(TV and Refrigerator) = 0.21
A Venn Diagram (not shown) may help with this problem.
What else we can calculate (may or may not relate to the questions asked above):
• P(TV only) = P(TV) − P(both) = 0.52 − 0.21 = 0.31
• P(Refrigerator only) = P(Refrigerator) − P(both) = 0.38 − 0.21 = 0.17
• P(TV or Refrigerator) = P(TV) + P(Refrigerator) − P(TV and Refrigerator) = 0.52 + 0.38 − 0.21 = 0.69
Answers to questions:
i. P(TV but no refrigerator) = P(TV only) = 0.31
ii.
P(TV or Refrigerator but not both) = P(TV or Refrigerator) − P(both) = 0.69 − 0.21 = 0.48
    OR P(TV or Refrigerator but not both) = P(TV only) + P(Refrigerator only) = 0.31 + 0.17 = 0.48
iii. P(neither a TV nor a Refrigerator) = 1 − P((neither a TV nor a Refrigerator)$^C$) = 1 − P(TV or Refrigerator) = 1 − 0.69 = 0.31
     OR P(neither a TV nor a Refrigerator) = 1 − P(TV only) − P(Refrigerator only) − P(both) = 1 − 0.31 − 0.17 − 0.21 = 0.31

Q19 pg 357
i. P(US) = 1557/7690 = 0.2025
ii. Probability that a person completed education before college? Do not include those who did not answer.
    P(Some HS) + P(Primary) = 4195/7690 + 1161/7690 = 0.6965.
iii. Probability that a person is from France or did post graduate study.
     P(France or Post Grad) = P(France) + P(Post Grad) − P(both) = 1539/7690 + 379/7690 − 69/7690 = 0.2404.
iv. Probability that a person is from France and finished primary school.
    P(France and Primary) = 309/7690 = 0.0402.

Q22 pg 357
A chart may help solve this problem. The chart below shows the initial information given to us:

          Male   Female   Total
Cat          6               18
Dog          8               24
Total

We can then fill in the missing numbers:

          Male   Female   Total
Cat          6       12      18
Dog          8       16      24
Total       14       28      42

Then we can answer the questions that we’re interested in.
i. P(Male | Cat) = P(Male and Cat)/P(Cat) = (6/42)/(18/42) = 1/3 = 0.3333
ii. P(Cat | Female) = P(Cat and Female)/P(Female) = (12/42)/(28/42) = 0.4286
iii. P(Female | Dog) = P(Female and Dog)/P(Dog) = (16/42)/(24/42) = 0.6667

Followup to Q22
2 definitions for independence you could use:
• P(A)P(B) = P(A and B)
• P(A | B) = P(A)
Using each definition:
Def1: P(Dog)P(M) = (24/42)(14/42) = 336/1764 = 0.1905
      P(Dog and M) = 8/42 = 0.1905
Def2: P(Dog | M) = 8/14 = 0.5714
      P(Dog) = 24/42 = 0.5714
Since the two sides are equal under either definition, yes, being male and being a dog are independent.

Q55 pg 360
Before these questions are answered, set up a tree diagram. Note that the probability of being detained depends on whether a “correct” decision has been made.
Because of this, detained and not detained will go on the second branch of the tree.

Drink (0.12)
    Detain (0.8):      P(Drink and Detain) = (0.12)(0.8) = 0.096
    Not Detain (0.2):  P(Drink and Not Detain) = (0.12)(0.2) = 0.024
Not Drink (0.88)
    Detain (0.2):      P(Not Drink and Detain) = (0.88)(0.2) = 0.176
    Not Detain (0.8):  P(Not Drink and Not Detain) = (0.88)(0.8) = 0.704

Here are the interpretations of the numbers in the tree diagram:

P(Drink) = 0.12                         P(Not Drink) = 0.88
P(Detain | Drink) = 0.8                 P(Not Detain | Drink) = 0.2
P(Detain | Not Drink) = 0.2             P(Not Detain | Not Drink) = 0.8
P(Drink and Detain) = 0.096             P(Drink and Not Detain) = 0.024
P(Not Drink and Detain) = 0.176         P(Not Drink and Not Detain) = 0.704

To answer the questions:
i. P(Detain | Not Drink) = 0.2.
ii. P(Detain) = P(Detain and Drink) + P(Detain and Not Drink) = 0.096 + 0.176 = 0.272.
iii. P(Drink | Detain) = P(Drink and Detain)/P(Detain) = 0.096/0.272 = 0.353.
iv. $P(\text{Drink} \mid \text{Not Detain}) = \frac{P(\text{Not Detain} \mid \text{Drink})P(\text{Drink})}{P(\text{Not Detain} \mid \text{Drink})P(\text{Drink}) + P(\text{Not Detain} \mid \text{Not Drink})P(\text{Not Drink})} = \frac{(0.2)(0.12)}{(0.2)(0.12) + (0.8)(0.88)} = 0.033$.

Q51 pg 360
Before we answer any questions, it may be useful to create a tree diagram.

Day (0.6)
    Absent (0.01):      P(Day and Absent) = (0.6)(0.01) = 0.006
    Not Absent (0.99):  P(Day and Not Absent) = (0.6)(0.99) = 0.594
Night (0.4)
    Absent (0.02):      P(Night and Absent) = (0.4)(0.02) = 0.008
    Not Absent (0.98):  P(Night and Not Absent) = (0.4)(0.98) = 0.392

Question to answer: What percent of employees are absent on any given day? We need to calculate P(Absent). This is the denominator of Bayes Rule.
P(Absent) = P(Absent | Day) P(Day) + P(Absent | Night) P(Night) = (0.01)(0.6) + (0.02)(0.4) = 0.014 = 1.4%.

7. Solution to problem 1.
   (a) What is the value of the missing probability in the table above?
       The total probability must equal 1. Therefore, the missing value is 1 − 0.2 − 0.1 − 0.3 − 0.3 = 0.1.
   (b) What is the expected value for X?
       E(X) = 3(0.2) + 5(0.1) + 6(0.3) + 8(0.3) + 10(0.1) = 6.3.
   (c) What is the variance for X?
       There are 2 ways to calculate variance.
       Option 1:
       $\mathrm{Var}(X) = \sum_x (x - E(X))^2\, p(X = x) = (3 - 6.3)^2(0.2) + (5 - 6.3)^2(0.1) + (6 - 6.3)^2(0.3) + (8 - 6.3)^2(0.3) + (10 - 6.3)^2(0.1) = 4.61$.
       Option 2:
       $E(X) = 6.3$
       $E(X^2) = \sum_x x^2\, p(X = x) = (3^2)(0.2) + (5^2)(0.1) + (6^2)(0.3) + (8^2)(0.3) + (10^2)(0.1) = 44.3$
       $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 44.3 - [6.3]^2 = 44.3 - 39.69 = 4.61$.
   (d) What is the standard deviation for X?
       $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{4.61} = 2.147$.

8. Solution to problem 2. Find the mean and standard deviation of
   (a) 2X
       E(2X) = 2E(X) = 2(5) = 10
       SD(2X) = |2| SD(X) = 2(2) = 4
   (b) 3Y
       E(3Y) = 3E(Y) = 3(8) = 24
       SD(3Y) = |3| SD(Y) = 3(3) = 9
   (c) X + Y
       E(X + Y) = E(X) + E(Y) = 5 + 8 = 13
       $\mathrm{SD}(X + Y) = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)} = \sqrt{(2)^2 + (3)^2} = \sqrt{4 + 9} = 3.606$
   (d) X − Y
       E(X − Y) = E(X) − E(Y) = 5 − 8 = −3
       $\mathrm{SD}(X - Y) = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)} = \sqrt{(2)^2 + (3)^2} = \sqrt{4 + 9} = 3.606$
   (e) X1 + X2
       E(X1 + X2) = E(X1) + E(X2) = 5 + 5 = 10
       $\mathrm{SD}(X_1 + X_2) = \sqrt{\mathrm{Var}(X_1) + \mathrm{Var}(X_2)} = \sqrt{(2)^2 + (2)^2} = \sqrt{4 + 4} = 2.828$
   (f) 5X − 2
       E(5X − 2) = E(5X) − 2 = 5E(X) − 2 = 5(5) − 2 = 23
       SD(5X − 2) = SD(5X) = |5| SD(X) = 5(2) = 10
   (g) 8Y + 3
       E(8Y + 3) = E(8Y) + 3 = 8E(Y) + 3 = 8(8) + 3 = 67
       SD(8Y + 3) = SD(8Y) = |8| SD(Y) = 8(3) = 24
   (h) 2X + 3Y
       E(2X + 3Y) = E(2X) + E(3Y) = 2E(X) + 3E(Y) = 2(5) + 3(8) = 34
       $\mathrm{SD}(2X + 3Y) = \sqrt{\mathrm{Var}(2X) + \mathrm{Var}(3Y)} = \sqrt{2^2\,\mathrm{Var}(X) + 3^2\,\mathrm{Var}(Y)} = \sqrt{4(2)^2 + 9(3)^2} = \sqrt{16 + 81} = 9.849$
   (i) 9X − 4Y
       E(9X − 4Y) = E(9X) − E(4Y) = 9E(X) − 4E(Y) = 9(5) − 4(8) = 13
       $\mathrm{SD}(9X - 4Y) = \sqrt{\mathrm{Var}(9X) + \mathrm{Var}(4Y)} = \sqrt{9^2\,\mathrm{Var}(X) + 4^2\,\mathrm{Var}(Y)} = \sqrt{81(2)^2 + 16(3)^2} = \sqrt{324 + 144} = 21.633$
   (j) −6X
       E(−6X) = −6E(X) = −6(5) = −30
       SD(−6X) = |−6| SD(X) = 6(2) = 12

Q43 pg 390
(a) How many broken eggs do you expect to get?
    Let X = the number of broken eggs in 1 carton of a dozen eggs. We know that E(X) = 0.6, SD(X) = 0.5. Now, when we take 3 cartons of eggs, we are NOT taking 3× a single carton of eggs. This would be like cloning one carton 3 times. We are taking 3 separate cartons.
Let these 3 separate cartons be denoted by X1, X2, X3, each with the expected value and standard deviation listed above.
E(X1 + X2 + X3) = E(X1) + E(X2) + E(X3) = 0.6 + 0.6 + 0.6 = 1.8.
We expect to have 1.8 broken eggs.

(b) What's the standard deviation?
$\mathrm{SD}(X_1 + X_2 + X_3) = \sqrt{\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)} = \sqrt{(0.5)^2 + (0.5)^2 + (0.5)^2} = \sqrt{0.25 + 0.25 + 0.25} = \sqrt{0.75} = 0.87$.

(c) What assumptions did you have to make about the eggs in order to answer this question?
We needed to assume that the cartons of eggs were independent of each other in order to answer the standard deviation question.

9. This is an example of a Binomial Model problem. We are given that p = 0.02 and n = 30. We define "success" to be a ream of paper that needs to be shipped back to the printing company. Our model is b(30, 0.02).

The probability that exactly 4 reams of paper need to be shipped back is
$p(X = 4) = {}_{30}C_{4}\,(0.02)^4 (1 - 0.02)^{30 - 4} = 27405\,(0.02^4)(0.98^{26}) = 27405\,(1.6 \times 10^{-7})(0.5914) = 0.0026$.

You can also calculate this probability on your calculator as binompdf(30, 0.02, 4) = 0.0026.

The expected value is E(X) = np = 30(0.02) = 0.6.
The variance is Var(X) = np(1 − p) = 30(0.02)(0.98) = 0.588.

10. The model for this problem is N(3, 1.2). Let X = lifespan (in years) of an alarm clock.

(a) Probability lasts more than 4 years = P(X > 4).
First, we need to calculate the z-score.
$z = \dfrac{4 - 3}{1.2} = \dfrac{5}{6}$.
Now, P(X > 4) = P(Z > 5/6) = normalcdf(5/6, 999) = 0.2023.

(b) Probability lasts less than 2.5 years = P(X < 2.5).
First, we need to calculate the z-score.
$z = \dfrac{2.5 - 3}{1.2} = -\dfrac{5}{12}$.
Now, P(X < 2.5) = P(Z < −5/12) = normalcdf(−999, −5/12) = 0.3385.

11. (a) $E(\bar{x}) = 120$.

(b) $\mathrm{SD}(\bar{x}) = \sigma/\sqrt{n} = 15/\sqrt{50} = 2.121$.

(c) $\bar{x} \sim N(120, 2.121)$.

(d) We want to find $P(105 < \bar{x} < 115)$. Now, calculate the two z-score values that we need.
$z_1 = \dfrac{105 - 120}{2.121} = -7.07$,
$z_2 = \dfrac{115 - 120}{2.121} = -2.36$.
So we want to calculate P(−7.07 < Z < −2.36) = normalcdf(−7.07, −2.36) = 0.00914.
(e) We want to find $P(\bar{x} > 122)$. Now, calculate the z-score value that we need.
$z = \dfrac{122 - 120}{2.121} = 0.94$.
So we want to calculate P(Z > 0.94) = normalcdf(0.94, 999) = 0.174.

(f) By the 68-95-99.7 Rule, we know that 99.7% of the total area lies within µ ± 3σ. However, since we are working with the sample mean, we use its standard deviation $\sigma/\sqrt{n}$ in place of σ. Thus, our interval will be
$\mu \pm 3\,\dfrac{\sigma}{\sqrt{n}} = 120 \pm 3(2.121) = 120 \pm 6.363 = (113.637,\ 126.363)$.

12. We know that 30% of people own ice cube trays. A random sample of 500 people is collected.

(a) Sample proportion, because we are given information in percentages.

(b) Since we are looking for a sample proportion, the statistic is $\hat{p}$, which estimates the population parameter p.

(c) $E(\hat{p}) = p = 0.3$.

(d) $\mathrm{SD}(\hat{p}) = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.3(1 - 0.3)}{500}} = \sqrt{\dfrac{0.21}{500}} = 0.0205$.

(e) $\hat{p} \sim N(0.3, 0.0205)$.

(f) We want to find $P(\hat{p} < 0.4)$. First, calculate the z-score since we have a Normal Model.
$z = \dfrac{0.4 - 0.3}{0.0205} = 4.88$.
So we want to calculate P(Z < 4.88) = normalcdf(−999, 4.88) = 0.99999947.

(g) We want to find $P(0.27 < \hat{p} < 0.32)$. First, calculate the z-scores since we have a Normal Model.
$z_1 = \dfrac{0.27 - 0.3}{0.0205} = -1.46$,
$z_2 = \dfrac{0.32 - 0.3}{0.0205} = 0.98$.
So we want to calculate P(−1.46 < Z < 0.98) = normalcdf(−1.46, 0.98) = 0.7643.
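
The sampling-distribution answers in problems 11 and 12 can be cross-checked in software. The sketch below is not part of the original solutions: it assumes Python 3.8 or later and uses the standard library's statistics.NormalDist in place of the calculator's normalcdf. The variable names are illustrative only. It does not round the z-scores before looking up the area, so its results may differ from the hand-worked values in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Problem 11: sampling distribution of the sample mean.
# Values taken from the solution: mu = 120, sigma = 15, n = 50.
mean_model = NormalDist(mu=120, sigma=15 / sqrt(50))
p_mean_above_122 = 1 - mean_model.cdf(122)             # part (e)
interval_3sd = (120 - 3 * mean_model.stdev,
                120 + 3 * mean_model.stdev)            # part (f)

# Problem 12: sampling distribution of the sample proportion.
# Values taken from the solution: p = 0.3, n = 500.
p, n = 0.3, 500
prop_model = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
p_below_040 = prop_model.cdf(0.40)                       # part (f)
p_between = prop_model.cdf(0.32) - prop_model.cdf(0.27)  # part (g)

print("SD of sample mean:", round(mean_model.stdev, 3))
print("P(mean > 122):", round(p_mean_above_122, 3))
print("mu +/- 3 SD interval:", interval_3sd)
print("SD of sample proportion:", round(prop_model.stdev, 4))
print("P(p-hat < 0.4):", p_below_040)
print("P(0.27 < p-hat < 0.32):", round(p_between, 4))
```

Working directly on the original scale (for example, mean_model.cdf(122)) skips the separate z-score step, because NormalDist standardizes internally using the mean and standard deviation it was given.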