Stat 109 Quiz 1 Prep    Name ____________

Given the data set, find the following. Use summation notation to express your answer. There are 10 problems here; expect 4 or 5 on Quiz 1.

Let $x_1 = 6$, $x_2 = -4$, $x_3 = 7$, $x_4 = 2$
Let $f_1 = 4$, $f_2 = 5$, $f_3 = -3$, $f_4 = 1$

1) $\sum_{i=1}^{n} x_i$
2) $\sum_{i=1}^{n} x_i^2$
3) $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
4) $\sum_{i=1}^{n} (x_i - \bar{x})$
5) $\sum_{i=1}^{n} (x_i - \bar{x})^2$
6) Variance: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
7) Variance: $\frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$
8) $\sum_{i=1}^{n} f_i x_i$
9) $\sum_{i=1}^{n} f_i x_i^2$
10) $\frac{1}{n}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$

Stat 109 Quiz 1 Prep Solution

Let $x_1 = 6$, $x_2 = -4$, $x_3 = 7$, $x_4 = 2$
Let $f_1 = 4$, $f_2 = 5$, $f_3 = -3$, $f_4 = 1$

1) $\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 6 + (-4) + 7 + 2 = 11$
ANSWER: $\sum_{i=1}^{4} x_i = 11$

2) $\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 6^2 + (-4)^2 + 7^2 + 2^2 = 36 + 16 + 49 + 4 = 105$
ANSWER: $\sum_{i=1}^{4} x_i^2 = 105$

3) $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{4}(x_1 + x_2 + x_3 + x_4) = \frac{1}{4}(6 + (-4) + 7 + 2) = \frac{11}{4}$
ANSWER: $\bar{x} = \frac{1}{4}\sum_{i=1}^{4} x_i = \frac{11}{4} = 2.75$

4) $\sum_{i=1}^{4} (x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x})$
$= (6 - \frac{11}{4}) + (-4 - \frac{11}{4}) + (7 - \frac{11}{4}) + (2 - \frac{11}{4})$
$= (\frac{24}{4} - \frac{11}{4}) + (\frac{-16}{4} - \frac{11}{4}) + (\frac{28}{4} - \frac{11}{4}) + (\frac{8}{4} - \frac{11}{4})$
$= \frac{13}{4} + (\frac{-27}{4}) + \frac{17}{4} + (\frac{-3}{4}) = \frac{0}{4} = 0$
ANSWER: $\sum_{i=1}^{4} (x_i - \bar{x}) = 0$

5) $\sum_{i=1}^{4} (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2$
$= (6 - 2.75)^2 + (-4 - 2.75)^2 + (7 - 2.75)^2 + (2 - 2.75)^2$
$= (3.25)^2 + (-6.75)^2 + (4.25)^2 + (-0.75)^2$
$= 10.5625 + 45.5625 + 18.0625 + 0.5625 = 74.75$
ANSWER: $\sum_{i=1}^{4} (x_i - \bar{x})^2 = 74.75$
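The hand calculations for the ten summation problems can be double-checked in R. The following is only a sketch: the variable names (x, f, x_bar, and so on) are chosen here for illustration and are not part of the quiz.

```r
# Quiz 1 data (variable names are illustrative, not from the handout)
x <- c(6, -4, 7, 2)
f <- c(4, 5, -3, 1)
n <- length(x)

sum_x   <- sum(x)                    # problem 1
sum_x2  <- sum(x^2)                  # problem 2
x_bar   <- sum_x / n                 # problem 3
sum_dev <- sum(x - x_bar)            # problem 4: deviations from the mean
ss      <- sum((x - x_bar)^2)        # problem 5: sum of squared deviations

var_def      <- ss / (n - 1)                     # problem 6: definition
var_shortcut <- (sum_x2 - sum_x^2 / n) / (n - 1) # problem 7: shortcut formula

sum_fx  <- sum(f * x)                # problem 8
sum_fx2 <- sum(f * x^2)              # problem 9
wmsd    <- sum(f * (x - x_bar)^2) / n  # problem 10

print(c(sum_x, sum_x2, x_bar, sum_dev, ss))
print(c(var_def, var_shortcut, var(x)))
print(c(sum_fx, sum_fx2, wmsd))
```

The built-in var(x) uses the n - 1 denominator, so it should agree with both variance formulas.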
6) Variance: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2}{n - 1}$
$= \frac{(6 - 2.75)^2 + (-4 - 2.75)^2 + (7 - 2.75)^2 + (2 - 2.75)^2}{4 - 1} = \frac{74.75}{3} = 24.91\overline{6}$
ANSWER: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = 24.91\overline{6}$

7) Variance: $\frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{3}\left[\left(6^2 + (-4)^2 + 7^2 + 2^2\right) - \frac{1}{4}\left(6 + (-4) + 7 + 2\right)^2\right]$
$= \frac{1}{3}\left[105 - \frac{1}{4}(11)^2\right] = \frac{1}{3}\left[\frac{420}{4} - \frac{121}{4}\right] = \frac{1}{3}\cdot\frac{299}{4} = \frac{299}{12} = 24.91\overline{6}$
ANSWER: $= 24.91667$

8) $\sum_{i=1}^{n} f_i x_i = f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4 = 4(6) + 5(-4) + (-3)(7) + 1(2) = 24 - 20 - 21 + 2 = -15$
ANSWER: $\sum_{i=1}^{4} f_i x_i = -15$

9) $\sum_{i=1}^{n} f_i x_i^2 = f_1 x_1^2 + f_2 x_2^2 + f_3 x_3^2 + f_4 x_4^2 = 4(6)^2 + 5(-4)^2 + (-3)(7)^2 + 1(2)^2$
$= 4(36) + 5(16) + (-3)(49) + 1(4) = 144 + 80 - 147 + 4 = 81$
ANSWER: $\sum_{i=1}^{4} f_i x_i^2 = 81$

10) $\frac{1}{n}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{n}\left[f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + f_3 (x_3 - \bar{x})^2 + f_4 (x_4 - \bar{x})^2\right]$
$= \frac{1}{4}\left[4(6 - 2.75)^2 + 5(-4 - 2.75)^2 + (-3)(7 - 2.75)^2 + 1(2 - 2.75)^2\right]$
$= \frac{1}{4}\left[4(3.25)^2 + 5(-6.75)^2 + (-3)(4.25)^2 + 1(-0.75)^2\right]$
$= \frac{1}{4}\left[4(10.5625) + 5(45.5625) + (-3)(18.0625) + 1(0.5625)\right]$
$= \frac{1}{4}\left[42.25 + 227.8125 - 54.1875 + 0.5625\right] = \frac{216.4375}{4} = 54.109375$
ANSWER: $\frac{1}{n}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = 54.109375$

Stat 109 Quiz 2 Prep    NAME_________

Problem 1.) Given randomly collected data reporting on the average weekly hours that 22 students spend in front of a computer, find the 5 key numbers for a boxplot and express them with correct notation. Then draw the boxplot on the given number line. Mark any outliers with small circles.

Data (sorted): 1 1 1 1 1 2 4 4 6 6 6 10 10 10 10 15 15 19 24 25 27 58

This data set is (circle one): 1.) Skewed left 2.) Symmetrical 3.) Skewed right

Problem 2.) Given randomly collected data reporting on the number of trials it took each of 28 sixth graders to shoot a basket from the free-throw line, find the 5 key numbers for a boxplot and express them with correct notation. Then draw the boxplot on the given number line. Mark any outliers with small circles.

Data (sorted): 1 1 1 2 3 3 4 5 5 6 8 9 11 12 12 15 16 18 19 20 22 25 26 26 42 45 54 58

This data set is (circle one): 1.) Skewed left 2.) Symmetrical 3.)
Skewed right

Stat 109 Quiz 2 Prep Solution

Problem 1.) Given randomly collected data reporting on the average weekly hours that 22 students spend in front of a computer, find the 5 key numbers for a boxplot and express them with correct notation. Then draw the boxplot on the given number line. Mark any outliers with small circles.

Data (sorted): 1 1 1 1 1 2 4 4 6 6 6 10 10 10 10 15 15 19 24 25 27 58

2.) Find the median:
$\tilde{x} = \left(\frac{n + 1}{2}\right)^{th} = \left(\frac{22 + 1}{2}\right)^{th} = \left(\frac{23}{2}\right)^{th} = 11.5^{th}$
$= 11^{th} + 0.5(12^{th} - 11^{th}) = 6 + 0.5(10 - 6) = 6 + 0.5(4) = 8$

3.) Determine the quartile criterion:
If N is divisible by 4, we average:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4} + 1\right]^{th}}{2}$ and $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4} + 1\right]^{th}}{2}$
If N is not divisible by 4:
$Q_1 = \left[\frac{n}{4} + 1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4} + 1\right]^{th}$
Here square brackets indicate that we round any decimal down to find the nth value in the data set.
N = 22 is not divisible by 4, so: $Q_1 = \left[\frac{n}{4} + 1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4} + 1\right]^{th}$

4.) Find the 1st and 3rd quartiles.
$Q_1 = \left[\frac{n}{4} + 1\right]^{th} = \left[\frac{22}{4} + 1\right]^{th} = [5.5 + 1]^{th} = [6.5]^{th} = 6^{th}$ value, so $Q_1 = 2$
$Q_3 = \left[\frac{3n}{4} + 1\right]^{th} = \left[\frac{66}{4} + 1\right]^{th} = [16.5 + 1]^{th} = [17.5]^{th} = 17^{th}$ value, so $Q_3 = 15$

5.) Find the IQR and Step.
IQR = interquartile range
IQR = Q3 - Q1 = 15 - 2 = 13
Step = 1.5 × IQR = 1.5 × 13 = 19.5

6.) Find the lower and upper outlier thresholds (LOT & UOT).
LOT = Q1 - Step = 2 - 19.5 = -17.5
UOT = Q3 + Step = 15 + 19.5 = 34.5

Note: The whiskers of the boxplot extend to the last data point that exists within the outlier thresholds. It is an error to extend the whiskers to the outlier thresholds.

[Boxplot drawn on the number line, with the outlier marked by a small circle]

This data set is (circle one): 1.) Skewed left 2.) Symmetrical 3.) Skewed right

Problem 2.) Given randomly collected data reporting on the number of trials it took each of 28 sixth graders to shoot a basket from the free-throw line, find the 5 key numbers for a boxplot and express them with correct notation. Then draw the boxplot on the given number line. Mark any outliers with small circles.

2.)
Find the median:
$\tilde{x} = \left(\frac{n + 1}{2}\right)^{th} = \left(\frac{28 + 1}{2}\right)^{th} = \left(\frac{29}{2}\right)^{th} = 14.5^{th}$
$= 14^{th} + 0.5(15^{th} - 14^{th}) = 12 + 0.5(12 - 12) = 12 + 0.5(0) = 12$

Data (sorted): 1 1 1 2 3 3 4 5 5 6 8 9 11 12 12 15 16 18 19 20 22 25 26 26 42 45 54 58

3.) Determine the quartile criterion:
If N is divisible by 4, we average:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4} + 1\right]^{th}}{2}$ and $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4} + 1\right]^{th}}{2}$
If N is not divisible by 4:
$Q_1 = \left[\frac{n}{4} + 1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4} + 1\right]^{th}$
Here square brackets indicate that we round any decimal down to find the nth value in the data set.
N = 28 is divisible by 4, so we use the averaging formulas for $Q_1$ and $Q_3$.

4.) Find the 1st and 3rd quartiles.
$Q_1 = \frac{\left[\frac{28}{4}\right]^{th} + \left[\frac{28}{4} + 1\right]^{th}}{2} = \frac{[7]^{th} + [8]^{th}}{2} = \frac{4 + 5}{2} = \frac{9}{2}$, so $Q_1 = 4.5$
$Q_3 = \frac{\left[\frac{3 \cdot 28}{4}\right]^{th} + \left[\frac{3 \cdot 28}{4} + 1\right]^{th}}{2} = \frac{\left[\frac{84}{4}\right]^{th} + \left[\frac{84}{4} + 1\right]^{th}}{2} = \frac{[21]^{st} + [22]^{nd}}{2} = \frac{22 + 25}{2} = \frac{47}{2}$, so $Q_3 = 23.5$

Problem 2 Continued:
$Q_1 = 4.5$, $Q_3 = 23.5$

5.) Find the IQR and Step.
IQR = interquartile range
IQR = Q3 - Q1 = 23.5 - 4.5 = 19
Step = 1.5 × IQR = 1.5 × 19 = 28.5

6.) Find the lower and upper outlier thresholds (LOT & UOT).
LOT = Q1 - Step = 4.5 - 28.5 = -24
UOT = Q3 + Step = 23.5 + 28.5 = 52

Note: The whiskers of the boxplot extend to the last data point that exists within the outlier thresholds. It is an error to extend the whiskers to the outlier thresholds.

[Boxplot drawn on the number line, with two outliers marked by small circles]

This data set is (circle one): 1.) Skewed left 2.) Symmetrical 3.) Skewed right

Stat 109 Quiz 3 Prep    Name_____________

In surveys related to colony collapse disorder (CCD), a federation of beekeepers surveyed the hives in the southern United States and found that 38% of the bee hives have varroa mites, 22% of the bee hives have IAPV (Israeli acute paralysis virus), and 7% of the hives have both ailments. Using the declared event variables, find the probability of drawing a bee hive described below.
Use complete probability notation to express your answer. Note that Quiz 3 will likely ask for 4 scenarios. Ten are given here for practice.

Declare Variables: (2 pts)
Let one letter signify one event:
  M = Hive has varroa mites.
  I = Hive has IAPV.
It is an error to couple event variables. Examples of this error:
  MI: Has both mites and IAPV
  INM: Has IAPV but not mites
Use words to describe the assigned variable. These are NOT declarations:
  M = 0.38    I = 0.22    B = 0.07

1.) The hive does not have varroa mites. (4.5 pts)
2.) The hive does not have IAPV. (4.5 pts)
3.) The hive has both varroa mites and IAPV. (4.5 pts)
4.) The hive has neither varroa mites nor IAPV. (4.5 pts)
5.) The hive has varroa mites but not IAPV. (4.5 pts)
6.) The hive either has varroa mites or does not have IAPV. (4.5 pts)
7.) The hive either does not have varroa mites or has IAPV. (4.5 pts)
8.) The hive either does not have varroa mites or does not have IAPV. (4.5 pts)
9.) The hive either has varroa mites or has IAPV. (4.5 pts)
10.) The hive does not have varroa mites but does have IAPV. (4.5 pts)

Stat 109 Quiz 3 Prep Solution

Use a sketch to represent all possible events in the event space. Find the notation associated with each event, then piece the specified events together with addition or subtraction.

Declare Event Variables: Assign one variable to one event. DO NOT assign variables without event descriptions.
  M = Hive has varroa mites.
  I = Hive has IAPV.
  M = 0.38 (No Credit)    I = 0.22 (No Credit)    B = Both (No Credit)

Find the probability of drawing a bee hive from the southern U.S. such that:

1.) The hive does not have varroa mites.
$P(\bar{M}) = 1 - P(M) = 1 - 0.38$
$P(\bar{M}) = 0.62$

2.) The hive does not have IAPV.
$P(\bar{I}) = 1 - P(I) = 1 - 0.22$
$P(\bar{I}) = 0.78$

3.) The hive has both varroa mites and IAPV.
$P(M \cap I) = 0.07$ (Given)

4.) The hive has neither varroa mites nor IAPV.
$P(\bar{M} \cap \bar{I}) = P(\bar{M}) - P(I) + P(M \cap I) = 0.62 - 0.22 + 0.07 = 0.47$

5.) The hive has varroa mites but not IAPV.
$P(M \cap \bar{I}) = P(M) - P(M \cap I) = 0.38 - 0.07 = 0.31$

6.) The hive either has varroa mites or does not have IAPV.
$P(M \cup \bar{I}) = P(\bar{I}) + P(M \cap I) = 0.78 + 0.07 = 0.85$

7.) The hive either does not have varroa mites or has IAPV.
$P(\bar{M} \cup I) = P(\bar{M}) + P(M \cap I) = 0.62 + 0.07 = 0.69$

8.) The hive either does not have varroa mites or does not have IAPV.
$P(\bar{M} \cup \bar{I}) = 1 - P(M \cap I) = 1 - 0.07 = 0.93$

9.) The hive either has varroa mites or has IAPV.
$P(M \cup I) = P(M) + P(I) - P(M \cap I) = 0.38 + 0.22 - 0.07 = 0.53$

10.) The hive does not have varroa mites but does have IAPV.
$P(\bar{M} \cap I) = P(I) - P(M \cap I) = 0.22 - 0.07 = 0.15$

Stat 109 Quiz 3 Prep Solution

Let's use some sketches to support the use of probability notation for each answer. Recall that the probabilities over the entire event space must sum to 1, so let this segmented sketch have a probability area that equals 1.

Where:
  M = Event that a hive has varroa mites.
  I = Event that a hive has IAPV.

First we shade the given probabilities (as portions):
$P(M) = 0.38$    $P(I) = 0.22$    $P(M \cap I) = 0.07$

Now we can use these shaded portions to support the notation given in the calculation of the probability of each event.

1.) The hive does not have varroa mites.
$P(\bar{M}) = 1 - P(M) = 1 - 0.38$, so $P(\bar{M}) = 0.62$
[Sketch: whole event space minus the M region]

2.) The hive does not have IAPV.
$P(\bar{I}) = 1 - P(I) = 1 - 0.22$, so $P(\bar{I}) = 0.78$
[Sketch: whole event space minus the I region]

3.) The hive has both varroa mites and IAPV.
$P(M \cap I) = 0.07$ (Given)

4.) The hive has neither varroa mites nor IAPV.
$P(\bar{M} \cap \bar{I}) = P(\bar{M}) - P(I) + P(M \cap I) = 0.62 - 0.22 + 0.07$, so $P(\bar{M} \cap \bar{I}) = 0.47$
[Sketch]

5.) The hive has varroa mites but not IAPV.
$P(M \cap \bar{I}) = P(M) - P(M \cap I) = 0.38 - 0.07$, so $P(M \cap \bar{I}) = 0.31$
[Sketch]

6.) The hive either has varroa mites or does not have IAPV.
$P(M \cup \bar{I}) = P(\bar{I}) + P(M \cap I) = 0.78 + 0.07$, so $P(M \cup \bar{I}) = 0.85$
[Sketch]

7.) The hive either does not have varroa mites or has IAPV.
$P(\bar{M} \cup I) = P(\bar{M}) + P(M \cap I) = 0.62 + 0.07$, so $P(\bar{M} \cup I) = 0.69$
[Sketch]

8.)
The hive either does not have varroa mites or does not have IAPV.
$P(\bar{M} \cup \bar{I}) = 1 - P(M \cap I) = 1 - 0.07$, so $P(\bar{M} \cup \bar{I}) = 0.93$
[Sketch]

9.) The hive either has varroa mites or has IAPV.
$P(M \cup I) = P(M) + P(I) - P(M \cap I) = 0.38 + 0.22 - 0.07$, so $P(M \cup I) = 0.53$
[Sketch]

10.) The hive does not have varroa mites but does have IAPV.
$P(\bar{M} \cap I) = P(I) - P(M \cap I) = 0.22 - 0.07$, so $P(\bar{M} \cap I) = 0.15$
[Sketch]

Stat 109 Quiz 4 Prep    Name__________

1.) Given that 70% of pet owners in a college town have cats only while the rest have dogs only, and that 20% of cats have fleas while only 10% of dogs have fleas, find the probability that a randomly drawn pet will have fleas.
Declare event variables: 2 pts    Notation: 2 pts    Calculation: 4 pts

2.) Find the probability of drawing a dog given that the pet has fleas.
Notation: 2 pts    Calculation: 4 pts

3.) Are the events of the kind of pet drawn and whether or not it has fleas independent events? Use complete probability notation and associated values to support your answer.
Notation: 2 pts    Calculation: 4 pts

Stat 109 Quiz 4 Prep Solution

1.) Given that 70% of pet owners in a college town have cats only while the rest have dogs only, and that 20% of cats have fleas while only 10% of dogs have fleas, find the probability that a randomly drawn pet will have fleas.

First define the event space variables. Let one letter signify one event.
  C = event that a cat is drawn.
  D = event that a dog is drawn.
  F = event that a pet with fleas is drawn.

It is an error to couple event variables. Examples of this error:
  C = event that a cat with fleas is drawn.
  D = event that a dog without fleas is drawn.
These are not declarations!
  C = 0.70    D = 0.30

The probability of drawing a pet with fleas = (the probability of drawing a pet with fleas given the pet is a dog) × (the probability of drawing a dog) + (the probability of drawing a pet with fleas given the pet is a cat) × (the probability of drawing a cat).
$P(F) = P(F \mid D) \cdot P(D) + P(F \mid C) \cdot P(C)$

Answer with notation:
$P(F) = (0.10)(0.30) + (0.20)(0.70)$
$P(F) = 0.17$

Answering in English: "There is a 17% chance that a randomly drawn pet will have fleas."

2.) Find the probability of drawing a dog given that the pet has fleas.

The probability of drawing a dog given the pet has fleas = (the probability of drawing a dog that has fleas) / (the probability of drawing any pet with fleas)
= (the probability of drawing a pet with fleas given that it is a dog) × (the probability of drawing a dog) / (the probability of drawing any pet with fleas).

$P(D \mid F) = \frac{P(D \cap F)}{P(F)} = \frac{P(F \mid D) \cdot P(D)}{P(F)}$

Answer with notation:
$P(D \mid F) = \frac{(0.10)(0.30)}{0.17}$
$P(D \mid F) = 0.1765$

Answering in English: "There is a 17.7% chance that a randomly drawn pet with fleas will be a dog."

Stat 109 Quiz 4 Prep Solution

3.) Are the events of the kind of pet drawn and whether or not it has fleas independent events? Use complete probability notation and associated values to support your answer.

Explanation
From problems #1 and #2, along with the given information, we found the values of $P(D)$, $P(F)$, and $P(D \mid F)$ listed below.
An intersection is equal to a conditional probability times the probability of the given event. In this case the conditional probability is the probability of drawing a dog from all pets with fleas. We multiply this by the probability of drawing a pet with fleas, and the product gives us the intersection of dogs with fleas as a probability.
We can determine whether events are independent of each other by multiplying their probabilities and checking whether the product is equal to the intersection. If the product is equal to the intersection, we can claim that the two events are independent of each other.

#3) Notation and Answer
$P(D) = 0.30$    $P(F) = 0.17$    $P(D \mid F) = 0.1765$

$P(D \cap F) = P(D \mid F) \cdot P(F)$ (Always true)
$P(D \cap F) = 0.1765 \cdot (0.17)$
$P(D \cap F) = 0.030005$

$P(D \cap F) = P(D) \cdot P(F)$ (True only if D and F are independent)
$P(D) \cdot P(F) = 0.30 \cdot (0.17)$
$P(D) \cdot P(F) = 0.051$

Since $P(D \cap F) \neq P(D) \cdot P(F)$ (that is, $0.030005 \neq 0.051$), the event of a pet having fleas and the kind of pet drawn are not independent.
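The three Quiz 4 calculations can be reproduced in a few lines of R. This is a sketch only; the variable names are hypothetical and chosen to mirror the notation. Because it works from unrounded values, the intersection comes out as 0.03 rather than the 0.030005 obtained by multiplying the rounded 0.1765 by 0.17.

```r
# Given information (names are illustrative)
p_C <- 0.70           # P(C): a cat is drawn
p_D <- 0.30           # P(D): a dog is drawn
p_F_given_C <- 0.20   # P(F | C)
p_F_given_D <- 0.10   # P(F | D)

# Problem 1: law of total probability
p_F <- p_F_given_D * p_D + p_F_given_C * p_C

# Problem 2: conditional probability of a dog given fleas
p_D_and_F   <- p_F_given_D * p_D
p_D_given_F <- p_D_and_F / p_F

# Problem 3: D and F are independent only if P(D and F) equals P(D) * P(F)
independent <- isTRUE(all.equal(p_D_and_F, p_D * p_F))

print(round(c(p_F, p_D_given_F, p_D_and_F, p_D * p_F), 4))
print(independent)
```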
The rate of flea infection must depend upon the type of pet drawn.

Stat 109 Quiz 5 Prep    Name ____________

For all confidence intervals, present your answer with the percentage level of confidence, the parameter of interest (mean or median), the high and low values of the interval, and the units of measure, all in a brief descriptive sentence.
For example: "Find the 85% confidence interval for the mean cholesterol level for Eskimos."
Answer: "The 85% CI for mean Eskimo cholesterol is (220.2, 245.6) mg/dL."

1.) Entomologists were interested in the adult life span of a particular species of mayfly. A random sample of the adult life span in hours for 24 mayflies can be found in an MS Excel file at the site: users.humboldt.edu/tpayer > Stat 109 > Data sets > Mayfly.csv. Previous research has found that this species had a standard deviation of σ = 1.5 hours. Find a 90% confidence interval on the mean adult life span of this species of mayfly. Use hand calculation.

2.) Human birth weights in India are (approximately) normally distributed. Find a 95% confidence interval for the mean population Indian birth weight given a random sample of 17 birth weights with a sample mean and sd of $\bar{x} = 2900$ grams, $sd = 598$ grams. Use hand calculation.

3.) There is a third problem on the back page!

Stat 109 Quiz 5 Prep back page

Table 6: t-Table for Confidence Intervals.

        Level of Confidence (percent)
df      80        90        95        98        99        99.9
1       3.07768   6.31375   12.7062   31.8206   63.6570   636.607
2       1.88562   2.91999   4.3027    6.9646    9.9248    31.598
3       1.63778   2.35341   3.1825    4.5407    5.8410    12.924
4       1.53320   2.13184   2.7764    3.7470    4.6041    8.610
5       1.47589   2.01505   2.5706    3.3649    4.0321    6.869
6       1.43977   1.94317   2.4469    3.1427    3.7075    5.959
7       1.41493   1.89456   2.3646    2.9980    3.4995    5.408
8       1.39685   1.85953   2.3060    2.8965    3.3554    5.041
9       1.38303   1.83313   2.2622    2.8215    3.2498    4.781
10      1.37215   1.81244   2.2281    2.7638    3.1693    4.587
11      1.36342   1.79588   2.2010    2.7181    3.1058    4.437
12      1.35621   1.78228   2.1788    2.6810    3.0545    4.318
13      1.35019   1.77094   2.1604    2.6503    3.0123    4.221
14      1.34502   1.76133   2.1448    2.6245    2.9768    4.140
15      1.34060   1.75307   2.1315    2.6025    2.9467    4.073
16      1.33677   1.74587   2.1199    2.5835    2.9208    4.015
17      1.33338   1.73962   2.1098    2.5669    2.8982    3.965
18      1.33036   1.73407   2.1009    2.5524    2.8784    3.922
19      1.32775   1.72911   2.0930    2.5395    2.8610    3.883
20      1.32533   1.72474   2.0860    2.5280    2.8453    3.849
21      1.32320   1.72074   2.0796    2.5176    2.8314    3.819
22      1.32125   1.71715   2.0739    2.5083    2.8187    3.792
23      1.31944   1.71389   2.0687    2.4999    2.8074    3.768
24      1.31783   1.71087   2.0639    2.4921    2.7969    3.745
25      1.31636   1.70813   2.0595    2.4851    2.7874    3.725
26      1.31497   1.70563   2.0556    2.4786    2.7787    3.707
27      1.31369   1.70326   2.0519    2.4727    2.7707    3.690
28      1.31253   1.70112   2.0484    2.4671    2.7633    3.674
29      1.31142   1.69911   2.0452    2.4620    2.7564    3.659
30      1.31038   1.69724   2.0423    2.4573    2.7500    3.646
40      1.30308   1.68386   2.0211    2.4232    2.7045    3.551
50      1.29868   1.67589   2.0085    2.4033    2.6778    3.496
60      1.29581   1.67065   2.0003    2.3902    2.6604    3.460
70      1.29376   1.66692   1.9944    2.3808    2.6480    3.435
80      1.29222   1.66413   1.9901    2.3739    2.6387    3.416
90      1.29103   1.66196   1.9867    2.3685    2.6316    3.402
100     1.29007   1.66024   1.9840    2.3642    2.6259    3.391
1000    1.28200   1.64600   1.9620    2.3300    2.5810    3.300
∞       1.28155   1.64485   1.9600    2.3264    2.5758    3.291

3.) If we have a 95% confidence interval, this means 95% of what must be true? Answer in the context of the last problem.

Confidence (Percent)   80      90      95      98      99      99.8    99.9    99.99   99.999
z                      1.282   1.645   1.960   2.326   2.576   3.090   3.291   3.891   4.491

Stat 109 Quiz 5 Prep KEY

1a.)
First we load the data file into R:

a1) Open the Mayfly file.
a2) Then select the web address of this file in your browser and press Ctrl + C to copy it.
    users.humboldt.edu/tpayer > Stat 109 > Data sets > Mayfly.csv
a3) Open R Studio and load the data set:
    Open the Environment tab > Import Dataset > From Web URL.
    Click once inside the dialogue box.
    Press Ctrl + V to paste the web address.
    Click OK.
    Keep or rename the file.
    Click on Yes for Heading.
    Click on Import.

1.b) With the data loaded, we load the nortest package, test the data for normality, and find the summarized values of the sample size, mean, and sd.

R Editor: Open the Packages tab in the lower right panel and check that the nortest package is checked.

R Console code, with each step explained:

    > install.packages("nortest")
Install a package in R by referencing its name in quotes with the install.packages command.

    > library(nortest)
Add the package contents to R's library, this time without quotes on the package name.

    > attach(Mayfly)
Attaches the Mayfly data set to R's working memory. (Necessary for data access.)

    > ad.test(Mayfly[,1])
    > ad.test(Hours)
    Anderson-Darling normality test
    data: Hours
    A = 0.2322, p-value = 0.7756
Either one of these lines of code will run the Anderson-Darling test for normality on the Hours data set within the file called Mayfly. The first command asks for the first column in the file called Mayfly, while the second command uses the column header of that data set to reference the data. We conclude that the mayfly data set passes normality because, by Anderson-Darling, p > α (p-value = 0.7756).

    > NROW(Hours)
    [1] 24
    > mean(Hours)
    [1] 8.208333
    > sd(Hours)
    [1] 1.227552
The sample size, mean, and sd of the mayfly data set are found by applying the respective commands to the column header name of the data set.
$N = 24$, $\bar{x} = 8.208$, $s = sd = 1.228$

Common R Studio Error, Explained

    > ad.test(Mayfly)
    Error in `[.data.frame`(x, complete.cases(x)) : undefined columns selected

We cannot apply a command to a file name. Instead, we must apply the command to a data set within that file, using either the column header name (Hours) or a column number index, Mayfly[,1], for that file.

Stat 109 Quiz 5 Prep KEY

Problem #1, Continued
Use the summarized data taken from R's output to construct a confidence interval by hand.

Our goal is to build a confidence interval about the mean mayfly life span. But we must decide which of 2 confidence interval formulas to use, neither of which will be given on the quiz. Which one do we apply?

When we know the true standard deviation value, σ, we use σ with a Z-interval. This gives a better approximation, but σ is often unknowable.
$\bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

If we do not know the true standard deviation value, σ, then all we have is the sample standard deviation, s. In this case we use s with a t-interval.
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$

For a review of why we make this choice, revisit the Week 6 Day 1 Lecture Notes.

If σ is known, we use the z-interval for the mean mayfly life span. Since σ was given as σ = 1.5:
$\bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 8.208 \pm 1.645 \cdot \frac{1.5}{\sqrt{24}}$
"The 90% CI on the mean adult life span for this species of mayfly is (7.70, 8.71) hours." (6 pts)

2.) Human birth weights in India are (approximately) normally distributed. Find a 95% confidence interval for the mean population Indian birth weight given a random sample of 17 birth weights with a sample mean and sd of $\bar{x} = 2900$ grams, $sd = 598$ grams. Use hand calculation.

Answer: The data is assumed to be normally distributed, as we have summarized data. (1 pt)
Note! Even though we cannot verify that this data set is normally distributed (because summarized data can't be verified), we still have to acknowledge that we are making this assumption. Without this we are not justified in using mean values.
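Both interval formulas above can be evaluated in R from the summarized values alone. The sketch below uses qnorm and qt to obtain the critical values instead of reading them from the tables; the variable names are illustrative, and the inputs are the summary statistics stated in problems 1 and 2.

```r
# Problem 1: sigma is known, so use the z-interval
x_bar1 <- 8.208
sigma  <- 1.5
n1     <- 24
z_crit <- qnorm(1 - 0.10 / 2)             # two-sided 90% critical value, about 1.645
z_ci   <- x_bar1 + c(-1, 1) * z_crit * sigma / sqrt(n1)

# Problem 2: sigma is unknown, so use the t-interval with df = n - 1
x_bar2 <- 2900
s      <- 598
n2     <- 17
t_crit <- qt(1 - 0.05 / 2, df = n2 - 1)   # two-sided 95% critical value, df = 16
t_ci   <- x_bar2 + c(-1, 1) * t_crit * s / sqrt(n2)

print(round(z_ci, 2))   # lower and upper bounds in hours
print(round(t_ci, 1))   # lower and upper bounds in grams
```

Multiplying the margin of error by c(-1, 1) produces the lower bound first, which matches the reporting convention for confidence intervals.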
Problem #2 Continued:
Since σ is not known, we use the t-interval:
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 2900 \pm 2.1199 \cdot \frac{598}{\sqrt{17}}$
"The 95% CI on the mean birth weight in India is (2592.5, 3207.5) grams." (6 pts)

Stat 109 Quiz 5 Prep KEY

Point Key for problems #1 and #2 (2 x 7 pts each):
-3 pts for using the wrong CI formula.
-1.5 pts for using the sample standard deviation, s, instead of the true standard deviation, σ, when it is available.
-2 pts for using the wrong value within the correct table.
-2 pts for calculator errors, -0.5 pt for round-off errors.
-1 pt for each omitted normality check.
-0.5 pt for reporting the CI with switched bounds. The lower bound should come first, like this: (1.472, 2.376). Switched bounds put the upper bound first in error: (2.376, 1.472).
-0.5 pt for reporting the CI without units of measure.

Procedural Question: What happens if my df value lands in one of the gaps between the larger df values of the t-table?
[A truncated t-table accompanied this example.]
Suppose the confidence level is 95% and the sample size is n = 79. This means your df at n - 1 is 78, but there is no corresponding row for df = 78.
Do we round up to df = 80? No, because this overstates your sample size.
Do we round down to df = 70? Yes! Because df = 70 is the closest value in the table that does not overstate the sample size.
How about interpolating, using 80% of the distance between 70 and 80 to estimate where the df = 78 value will be? No, because the t-curve is not linear, and this detail work we save for R.

3.) Answer: 95% confidence means that 95% of all randomly drawn samples will form confidence intervals that bracket the true mean Indian birth weight. The interval of (2592.5, 3207.5) grams may be one of the 95% of confidence intervals that contain the true mean Indian birth weight, or one of the 5% that do not. The method we have used works 95% of the time. Thus we have 95% confidence.
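The claim that the method "works 95% of the time" can be illustrated with a small simulation in R. This is a sketch under loud assumptions: in practice the true mean is unknowable, so the population values below (a normal population with mean 2900 g and sd 598 g, borrowed from the sample statistics) are hypothetical, as are the seed and the number of repetitions.

```r
set.seed(109)                 # arbitrary seed so the run is repeatable

true_mean <- 2900             # hypothetical "true" mean, for simulation only
true_sd   <- 598              # hypothetical "true" sd, for simulation only
n         <- 17               # sample size from problem 2
conf      <- 0.95
reps      <- 10000            # number of simulated random samples

t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)

# For each simulated sample, build the t-interval and record
# whether it brackets the true mean.
covered <- replicate(reps, {
  draw   <- rnorm(n, mean = true_mean, sd = true_sd)
  margin <- t_crit * sd(draw) / sqrt(n)
  (mean(draw) - margin <= true_mean) && (true_mean <= mean(draw) + margin)
})

coverage <- mean(covered)     # proportion of intervals that caught the true mean
print(coverage)
```

The printed proportion should land close to 0.95. Any single interval in the simulation either contains the true mean or does not; the 95% describes the long-run success rate of the method.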
Also to the point: 95% confidence refers to the likelihood of bounding the true mean Indian birth weight within a confidence interval calculated from a random sample of the population. Once the confidence interval is calculated, we do not know whether the interval is one of the 95% that contain the true mean birth weight or one of the 5% that fail to catch the true mean. We cannot verify whether the interval of (2592.5, 3207.5) grams contains the true mean Indian birth weight, but the method we have used works 95% of the time. Thus we have 95% confidence.

NOTE to Student: Both statements above for problem 3 are solid interpretations of the meaning of confidence in regard to a specific confidence interval. The first sentence captures the key point. Unfortunately, this remains a troubling concept for many students. Consider the random selection of typical responses from previous quizzes and exams below. Here are some common errors in the interpretation of the meaning of confidence, with corrective responses.

a.) "95% confidence means that 95% of all Indian birth weights will be between (2592.5, 3207.5) g."
No, we are not answering whether Indian birth weights span a particular interval, but whether the true mean Indian birth weight will be caught within the confidence interval we have calculated from a random sample. We have a 95% chance of bracketing the true mean birth weight before the random draw of the data is made. The interval of (2592.5, 3207.5) may be one of the 95% that contain the true mean birth weight, or one of the 5% that miss it.

b.) "There is a 95% chance that the confidence interval of (2592.5, 3207.5) g contains the true mean Indian birth weight."
No, absolutely not! The interval either contains the true mean or it does not. There is no chance about it. Referencing a 95% chance was appropriate before the random sample was drawn.
Once we have drawn the sample data, our confidence interval is fixed, and it either brackets the true mean or it does not.

c.) "95% of the data will be true to the interval (2592.5, 3207.5) g."
What?? No credit here. Stringing together the terms data, interval, and "truth" in a vague sentence does not explain the meaning of confidence. Look, our goal is to get the best estimate we can of the true mean Indian birth weight. The problem is that while this value exists, it is unknowable, so we resort to statistical sampling to form a confidence interval about a randomly drawn sample mean. The method we use brackets the true mean value in 95% of all confidence intervals formed from simulated trials. It is this 95% success rate that is the basis of our confidence in bracketing the true mean Indian birth weight.

d.) "95% of all true mean Indian birth weights will be contained within the interval of (2592.5, 3207.5) g."
Oh no: First, there is only one true mean birth weight. Second, you have reversed the process: it is not whether 95% of birth weights (as true mean values or not) will be contained within the confidence interval, but whether the confidence interval brackets the one true mean. We could say that 95% of all confidence intervals formed from random samples bracket the true mean Indian birth weight.

e.) "95% of the time the sample mean will exist within the interval of (2592.5, 3207.5) g."
No! The sample mean will always exist within the confidence interval, assuming that we make no calculation errors. The sample mean is the dead center of the confidence interval; we add and subtract a margin of error from the sample mean to build the confidence interval according to the t-interval formula: $\bar{x} \pm t_{critical} \frac{s}{\sqrt{n}}$. The question is whether the constructed confidence interval contains the true mean Indian birth weight.
Because this is real data and the true mean is unknowable, we cannot verify whether this interval brackets the true mean. Instead we cite that in simulated trials 95% of all confidence intervals drawn from random samples did bracket the true mean value, and thus we have 95% confidence.

Recall the lecture of Week 6 Day 1: We calculated a 98% confidence interval (CI) on the true mean goliath frog weight in pounds and found (4.32, 5.42) lbs. Here we are assuming that we do not know that the true mean weight of the frogs is μ = 4.62 lbs. Once we have the confidence interval, it is important that we do not talk about probability or chance. Probability and chance refer to events that have yet to occur. Once we have rolled the dice, drawn the card from the deck, or, for a biologist, taken a random sample of goliath frog weights, all talk of chance is over. This confidence interval either caught the true mean weight of the frog or it did not. The fact that we do not know the true mean value in most situations does not change this. We have only one random sample and its resulting confidence interval from which to approximate the true mean frog weight.

This particular interval that we were working with, (4.32, 5.43) lbs, is either one of the 98% of all confidence intervals calculated from random samples that do contain the true mean frog weight, or it is one of the 2% that do not. Our 98% confidence level comes from the fact that if we were to repeat our random sampling millions of times over and construct millions of confidence intervals from these samples, about 98% of these confidence intervals would bracket the true mean frog weight. This 98% probability of success with this method can be verified in computer simulations and is the basis of our 98% confidence level.

Stat 109 Quiz 6 PREP

Table 8: Test of Hypothesis z-Table.
Level of Significance (α)
One-tailed   Two-tailed   z Critical
.10          .20          1.282
.05          .10          1.645
.025         .05          1.960
.01          .02          2.326
.005         .01          2.576
.001         .002         3.090
.0005        .001         3.291
.00005       .0001        3.891
.000005      .00001       4.491

Name________

Draw a decision line for the reject $H_0$ and do not reject $H_0$ regions on either side of the $z_{Critical}$ value(s). Find the bracketed p-value and compare it against α (LOS). For the following scenarios, state a full statistical conclusion of the hypothesis, comparing $z_{Sample}$ with $z_{Critical}$ values and bracketed p-values with α values.

1.) $H_0: p = 0.15$, $H_A: p > 0.15$, $z_{Sample} = -1.89$, $\alpha = 0.05$
2.) $H_0: p = 0.48$, $H_A: p < 0.48$, $z_{Sample} = 2.79$, $\alpha = 0.01$
3.) $H_0: p = 0.007$, $H_A: p \neq 0.007$, $z_{Sample} = -1.35$, $\alpha = 0.10$

Stat 109 Quiz 6 PREP Solution

1.) $H_0: p = 0.15$, $H_A: p > 0.15$, $z_{Sample} = -1.89$, $\alpha = 0.05$
[Decision line: Do Not Reject $H_0$ | Reject $H_0$, divided at $z_{Critical} = 1.645$]
At the 5% LOS we do not reject $H_0$ because:
$z_{Sample} < z_{Critical}$: (-1.89 < 1.645)
$p > \alpha$: (p > 0.10) > 0.05

2.) $H_0: p = 0.48$, $H_A: p < 0.48$, $z_{Sample} = 2.79$, $\alpha = 0.01$
[Decision line: Reject $H_0$ | Do Not Reject $H_0$, divided at $z_{Critical} = -2.326$]
At the 1% LOS we do not reject $H_0$ because:
$z_{Sample} > z_{Critical}$: (2.79 > -2.326)
$p > \alpha$: (p > 0.10) > 0.01

3.) $H_0: p = 0.007$, $H_A: p \neq 0.007$, $z_{Sample} = -1.35$, $\alpha = 0.10$
[Decision line: Reject $H_0$ | Do Not Reject $H_0$ | Reject $H_0$, divided at $z_{Critical} = -1.645$ and $z_{Critical} = +1.645$]
At the 10% LOS we do not reject $H_0$ because:
$-z_{Critical} < z_{Sample} < z_{Critical}$: (-1.645 < -1.35 < 1.645)
$p > \alpha$: (0.10 < p < 0.20) > 0.10

Stat 109 Quiz 7 Prep    Name____________

Given a scenario that requires a test of hypothesis and the resulting p-value, find the following:
Declare the parameter with units of measure.
State the hypothesis.
Describe what the p-value means in the context of the problem. Use an English sentence that uses the given parameter and its units of measure.

A plant physiologist conducted an experiment to determine whether mechanical stress can retard the growth of soybean plants. Young plants were randomly allocated to two groups of 13 plants each. Plants in one group were mechanically agitated by shaking for 20 minutes twice daily, while plants in the other group were not agitated. After 16 days of growth, the mean stem length in cm of each plant was measured, with the results given in the table below. Assume normality and use an appropriate t-test to test the hypothesis that stress tends to retard plant growth. Assume that the test yields a p-value of p = 0.02.

Stressed: 24.7 25.7 26.5 27.0 27.1 27.2 27.3 27.7 28.7 28.9 29.7 30.0 30.6    ($\bar{x} = 27.78$, sd = 1.726)
Control:  25.2 29.5 30.1 30.1 30.2 30.2 30.3 30.6 31.1 31.2 31.4 33.5 34.3    ($\bar{x} = 30.59$, sd = 2.134)

Solution:
Declare parameter:
$\mu_1$ = Mean stem length in cm of plant after 16 days for stressed plants.
$\mu_2$ = Mean stem length in cm of plant after 16 days for non-stressed plants.
State the hypothesis:
H0: μ1 = μ2
HA: μ1 < μ2

Describe what the p-value means in the context of the problem:
If it’s true that there is no difference between the mean stem lengths (in cm) of stressed versus non-stressed soybean plants, then only 2% of all random draws will show that shaking the plants retards growth to an even greater extent than our data set does.

Stat 109 Quiz 7 Prep Solution

General form: If the null hypothesis is true, then p% of all random draws will contradict the null hypothesis to an even greater extent than our data set does.

The sentence is built from these parts:
"If it’s true that" + [the null hypothesis expressed with its parameter and units of measure] + "then p% of all random draws will show that" + [the key point of the alternate hypothesis] + "to an even greater extent than our data set does."

For the soybean example:
"If it’s true that" + [there is no difference between the mean stem lengths (in cm) of stressed versus non-stressed soybean plants,] + "then 2% of all random draws will show that" + [shaking plants retards growth] + "to an even greater extent than our data set does."

Stat 109 Quiz 7 Prep
Name_______________

Consider these examples. Try to complete each of them for practice without looking at the answers on the next page.

For each example:
Declare the parameter.
State the hypothesis.
Interpret the p-value.

1.) A pediatrician wants to determine how effective aspirin is in decreasing the body temperature of 5-year-olds with the flu. She records the body temperature in Fahrenheit of 14 of her patients before and one hour after administering aspirin. She runs a paired t-test. Her hypothesis test yields a p-value of p = 0.03. Interpret the meaning of the p-value. Given that the p-value is a probability and is read as a percentage, then 3% of what must be true?

2.)
A kinesiologist suspects that the proportion of national baseball league pitchers that are left-handed will be greater than the 10% reported in the general U.S. population. He runs a one-proportion hypothesis test on a random selection of 48 pitchers from the national baseball league and finds that 13 of the pitchers are left-handed. The hypothesis test reports a p-value of p = 0.23. Interpret the meaning of the p-value. Given that the p-value is a probability and is read as a percentage, then 23% of what must be true?

3.) A botanist suspects that the rate at which trees absorb carbon dioxide (measured in units of kg C per square meter of ground area) will increase if the trees are fertilized. She runs a 2-sample t-test comparing similar stands of neighboring trees where one stand receives fertilizer and the control stand does not. Her hypothesis test yields a p-value of p = 0.04. Interpret the meaning of the p-value. Given that the p-value is a probability and is read as a percentage, then 4% of what must be true?

4.) A doctor suspects that allergic reactions for adults will be less frequent in populations that grew up as children in homes where they regularly made contact with household pets. She samples from 2 populations: 2300 adults that had pets as youngsters, and 1600 adults that never had a pet as a child. The adults that had pets as children had an incidence of allergic reactions of 2%. Among the adults that never had a pet as a child, the incidence of allergic reactions was 3.5%. Does the evidence support the doctor’s suspicion? A 2-sample proportion test yielded a p-value of 0.016.

Stat 109 Quiz 7 Prep Key

1.) Declare the parameter.
μd = Mean difference in temperature in °F of 5-year-old flu patients between before and after aspirin ingestion.
State the hypothesis.
H0: μd = 0
HA: μd > 0
We use “>” because Before - After > 0, that is, (Large - small) > 0.
Interpret the p-value.
If it’s true that aspirin does not lower the temperature of 5-year-olds with the flu, then 3% of all random samples of 5-year-olds with the flu that have taken aspirin will have lowered temperatures to an even greater extent than was shown in this sample.

2.) Declare the parameter.
p = Proportion of left-handers in the national baseball league.
State the hypothesis.
H0: p = 0.10
HA: p > 0.10
Interpret the p-value.
If it’s true that the proportion of left-handers in the national baseball league does not exceed 10%, then 23% of all random samples of national league pitchers will show an even greater proportion of left-handers than our sample of 13/48.

3.) Declare the parameter.
μ1 = Mean carbon absorbed in kg C per square meter of ground area for fertilized trees.
μ2 = Mean carbon absorbed in kg C per square meter of ground area for unfertilized trees.
State the hypothesis.
H0: μ1 = μ2
HA: μ1 > μ2
Interpret the p-value.
If it’s true that fertilized trees do not absorb any more carbon than non-fertilized trees, then 4% of all random samples of fertilized trees will show rates of carbon absorption that exceed what was found in this sample.

4.) Declare the parameter.
p1 = proportion of adults that have allergies, and had pets as children.
p2 = proportion of adults that have allergies, but had no pets as children.
State the hypothesis.
H0: p1 = p2
HA: p1 < p2
Interpret the p-value.
If it’s true that having pets as children does not reduce one’s chances of having allergies as an adult, then 1.6% of all random draws comparing the adult allergy rates between those that had pets as children and those that did not will show results where adults that did have pets as children had an even lower comparative rate of allergies than was seen in our sample data.

Stat 109 Quiz 8 Prep
NAME______________

For each of the 5 problems (only 4 are shown here for the prep) determine whether the scenario describes an independent comparison of means or a dependent comparison of means.
Declare the parameter with units of measure.
Make the hypothesis statement.

1.) In a study of kidney function, 40 adult male frogs, Rana pipiens, had their oxygen (O2) consumption measured. 20 of the frogs had renal (kidney) damage while the other 20 frogs were used as controls. The researchers suspect that the frogs with renal damage will consume more oxygen in ml/g/hour than the controls. The mean O2 consumed by both groups of frogs was recorded and compared.

2.) Beta wave, or beta rhythm, is the term used to designate the frequency range of brain activity above 12 Hz (12 transitions or cycles per second). Beta states are the states associated with normal waking consciousness. Low amplitude beta waves with multiple and varying frequencies are often associated with active, busy, or anxious thinking and active concentration. Researchers suspect that sedative-hypnotic drugs such as benzodiazepines or barbiturates will reduce the mean amplitude of a rat’s beta waves. 24 rats had their beta waves recorded as they engaged in solving maze pathways. The beta waves of the same rats were then recorded when the rats were subjected to a small dose of barbiturates and introduced to a new maze puzzle. The mean amplitudes of the beta waves for the rats in both experiments were compared.

3.) The hygiene hypothesis proposes that our immune systems are fortified if we are exposed to animals when we are young. A nurse tests the hypothesis on a group of adult volunteers that had been separated into two groups: Group 1 had either a cat, a dog, or a rodent as a pet when they were children, while Group 2 did not have any pets when they were children. Each group was given a skin scratch test for pet dander and the resulting skin rash that developed was recorded for each individual in mm2. The mean area of skin reactivity in mm2 for each group was recorded and compared.

4.)
A nurse tests the hypothesis that comfrey compresses can reduce skin reactivity to irritants on a group of adult volunteers. Each volunteer had both of their forearms scratch tested with pet dander. Immediately after the scratch test each volunteer had their left forearm prepared with a comfrey compress while their right forearm was treated with a placebo compress. The mean area of skin reactivity in mm2 for each arm was recorded and compared.

Stat 109 Quiz 8 Prep Solution

For each of the 5 problems (only 4 are shown here for the prep) determine whether the scenario describes an independent comparison of means or a dependent comparison of means.
Declare the parameter with units of measure.
Make the hypothesis statement.

1) In a study of kidney function, 40 adult male frogs, Rana pipiens, had their oxygen (O2) consumption measured. 20 of the frogs had renal (kidney) damage while the other 20 frogs were used as controls. The researchers suspect that the frogs with renal damage will consume more oxygen in ml/g/hour than the controls. The mean O2 consumed by both groups of frogs was recorded and compared.

Declare parameters:
μ1 = mean O2 consumption of renal damaged frogs in ml/g/hour.
μ2 = mean O2 consumption of control frogs in ml/g/hour.
State hypothesis:
H0: μ1 = μ2
HA: μ1 > μ2

2) Beta wave, or beta rhythm, is the term used to designate the frequency range of brain activity above 12 Hz (12 transitions or cycles per second). Beta states are the states associated with normal waking consciousness. Low amplitude beta waves with multiple and varying frequencies are often associated with active, busy, or anxious thinking and active concentration. Researchers suspect that sedative-hypnotic drugs such as benzodiazepines or barbiturates will reduce the mean amplitude of a rat’s beta waves. 24 rats had their beta waves recorded as they engaged in solving maze pathways.
The beta waves of the same rats were then recorded when the rats were subjected to a small dose of barbiturates and introduced to a new maze puzzle. The mean amplitudes of the beta waves for the rats in both experiments were compared.

Declare parameters:
μd = mean difference in beta-wave amplitude between the rats as controls and the same rats dosed with barbiturates (control - barbiturate).
State hypothesis:
H0: μd = 0
HA: μd > 0

3.) The hygiene hypothesis proposes that our immune systems are fortified if we are exposed to animals when we are young. A nurse tests the hypothesis on a group of adult volunteers that had been separated into two groups: Group 1 had either a cat, a dog, or a rodent as a pet when they were children, while Group 2 did not have any pets when they were children. Each group was given a skin scratch test for pet dander and the resulting skin rash that developed was recorded for each individual in mm2. The mean area of skin reactivity in mm2 for each group was recorded and compared.

Declare parameters:
μ1 = mean skin reactivity in mm2 for adults who had pets as children.
μ2 = mean skin reactivity in mm2 for adults who did not have pets as children.
State hypothesis:
H0: μ1 = μ2
HA: μ1 < μ2

4.) A nurse tests the hypothesis that comfrey compresses can reduce skin reactivity to irritants on a group of adult volunteers. Each volunteer had both of their forearms scratch tested with pet dander. Immediately after the scratch test each volunteer had their left forearm prepared with a comfrey compress while their right forearm was treated with a placebo compress. The mean area of skin reactivity in mm2 for each arm was recorded and compared.

Declare parameters:
μd = mean difference in skin reactivity in mm2 for adults between their comfrey treated left forearm and their placebo treated right forearm.
State hypothesis (taking the difference as comfrey - placebo):
H0: μd = 0
HA: μd < 0

Stat 109 Quiz 9 Prep
Name____________

Given a brief description of the categorical data displayed in a 2x2 table, determine which of the 3 Chi-square hypothesis tests to apply:
1) Goodness of Fit Test
2) Independence of Attributes
3) McNemar’s Test

1) Whitehall Laboratories, makers of Advil, published a study comparing reports of upset stomach as a side effect among ibuprofen users versus placebo users. Is the rate of upset stomach in the ibuprofen group significantly different from that experienced in the control group?

Dosage Group:        Control   Ibuprofen
Upset stomach           8          6
No upset stomach      664        645

Circle the correct Chi-Square test to apply: 1) Goodness of Fit Test  2) Independence of Attributes  3) McNemar’s Test

2) In an ecological study of the Carolina Junco, 54 birds were captured from a certain population; of these, 40 were male. Is this evidence that males outnumber females in the population?

Junco Counts:   Observed   Expected
Male               40         27
Female             14         27

Circle the correct Chi-Square test to apply: 1) Goodness of Fit Test  2) Independence of Attributes  3) McNemar’s Test

3) The shrub Xerospirea hartwegiana inhabits dry climates of Mexico. Botanists were interested in the shrub’s ability to regenerate after trauma. 35 Xerospirea shrubs were randomly selected and all shrubs were topped off 2 inches above ground level. 15 of the shrub stumps were subjected to a propane torch to simulate a fire. Does the presence of fire significantly reduce the shrub’s ability to regenerate?

Shrub Results:   Cut Only   Cut and Burned
Died                 1            4
Regenerated         19           11

Circle the correct Chi-Square test to apply: 1) Goodness of Fit Test  2) Independence of Attributes  3) McNemar’s Test

4) ACME Laboratories, makers of a new sunscreen ointment, tested the ointment for the side effect of itchy skin.
Concerned that itchy skin might be sex dependent, researchers applied the ointment to 40 women and their brothers in an attempt to match genetic skin type but still control for differences of sex. Is the reaction to the ointment dependent on one’s sex?

                          Brother has itchy skin   Brother has no reaction
Sister has itchy skin               5                        8
Sister has no reaction              1                       27

Circle the correct Chi-Square test to apply: 1.) Goodness of Fit Test  2.) Independence of Attributes  3.) McNemar’s Test

5) In an attempt to determine if caffeine quickens one’s reaction time, researchers tested the reaction time of 64 college students before and after taking caffeine. Using a threshold of 0.35 seconds to push a button in response to a bell, each student’s success or failure was recorded before and after drinking 2 cups of coffee.

                        Success with Caffeine   Failure with Caffeine
No Caffeine Success              14                      3
No Caffeine Failure              18                     27

Circle the correct Chi-Square test to apply: 1.) Goodness of Fit Test  2.) Independence of Attributes  3.) McNemar’s Test

6) The shrub Xerospirea hartwegiana inhabits dry climates of Mexico. Botanists were interested in the role fire plays in the colonization of the shrub. 35 random transects of a high mesa plain were taken the year before, and then visited again a year after, the area was overrun by wildfire. The presence or absence of Xerospirea hartwegiana shrubs in each transect was recorded before and after the wildfire. Does the presence of the shrub before fire significantly reduce the shrub’s presence after the fire?

Before Wildfire   After Fire Present   After Fire Absent
Present                    1                   19
Absent                     4                   11

Circle the correct Chi-Square test to apply: 1.) Goodness of Fit Test  2.) Independence of Attributes  3.) McNemar’s Test

Stat 109 Quiz 9 Prep Solution

1.) Independence of Attributes.
The researchers of the experiment compare 2 populations of treatments (ibuprofen vs control) for their effect on upset stomach.
In general, an independence of attributes test will count one set of attributes (upset stomach or not) among 2 or more populations (ibuprofen vs control).

2.) Goodness of Fit Test.
Here the comparison is between an observed count and an expected count. If the sexes of the Carolina Junco are evenly distributed, we would expect to see an even count of both sexes. These expected counts are then compared against the observed counts of each sex. The even “expected count” was implied in this example, but a Goodness of Fit test can also specify that the expected count should be based on percentages or ratios. For example, “The ornithologist suspects that 70% of the Junco population will be male.” Or: “The ornithologist suspects that the male Junco population will outnumber the females by a ratio of 2:1.”

3.) Independence of Attributes.
The researchers of the experiment compare 2 populations of treatments (cut vs cut & burn) for their effect upon the regeneration of the shrub. In general, an independence of attributes test will count one set of attributes (regenerate or not) among 2 or more populations (cut vs cut & burn).

4.) McNemar’s Test.
The researchers of the experiment compare the responses of itchy skin among the paired populations of brother and sister. The point of doing this is to have skin types that are as similar as possible, to enable a good comparison of male and female reactions to the ointment without having dissimilar genetic skin types confound the results. Note how both the column headers and the row descriptors of the 2x2 table describe an attribute (brother vs sister) and a result (itchy skin or not) simultaneously.
We use McNemar’s test whenever we want to discern a difference between categorically paired data sets. Is itchy skin more likely for one sex than the other?

5.) McNemar’s Test.
The researchers of the experiment compare the reaction times among the paired populations of students before and after ingesting caffeine. The point of doing this is to have metabolisms that are as similar as possible, to enable a good comparison of before and after reaction times without having dissimilar metabolisms confound the results. Note how both the column headers and the row descriptors of the 2x2 table describe an attribute (with vs without caffeine) and a result (success vs failure) simultaneously. We use McNemar’s test whenever we want to discern a difference between categorically paired data sets. Is reaction time quickened by caffeine? We pair before and after reaction times with McNemar’s test to find out.

6.) McNemar’s Test.
The researchers of the experiment compare the presence of the shrub Xerospirea hartwegiana among the paired populations of transects taken before and after a wildfire. The point of doing this is to have transects that are as similar as possible, to enable a good comparison of before and after transect lines without having dissimilar soil types or micro-climates confound the results. Note how both the column headers and the row descriptors of the 2x2 table describe an attribute (before vs after wildfire) and a result (present vs absent) simultaneously. We use McNemar’s test whenever we want to discern a difference between categorically paired data sets. Is the presence of the shrub significantly different after a fire? We pair before and after plant presence transects with McNemar’s test to find out.

7.)
Consider this example: The strength of anti-bacterial soap was challenged by a company that sells bar soap. The company claims that bar soaps work just as well as the anti-bacterial soap in minimizing the bacteria left on one’s hands after washing. A comparison with 20 volunteers was made in which each person washed their hands, dug in a compost heap with bare hands for 5 minutes, and then washed their hands with the anti-bacterial soap. Each person set their washed hand in an agar dish and the resulting bacteria count was made 24 hours later. The same 20 people repeated the process with bar soap hand washing. Each went back to the compost heap and dug bare handed for another 5 minutes, this time washed their hands with the company’s bar soap, and again placed the washed hand in an agar dish, and a bacteria count was made 24 hours later. A count below the threshold of 300 CFU (colony forming units) was used as the standard of a “clean” hand. Do the resulting bacteria counts support the bar-soap company’s claim? Is this paired data?

                      “Clean Counts”   “Dirty Counts”
                        < 300 CFU        > 300 CFU
Anti-Bacterial Soap        15                5
Bar Soap                   12                8

Revisited: Distinguishing Paired vs Unpaired Categorical Data

Paired Categorical Data will have the 2 treatments (Anti-Bacterial Soap vs Bar Soap) separated into row and column responses of “Clean” and “Dirty” counts. Each cell represents the particular response of a treatment from both categories of treatments. 20 people placed their hands in the compost pile while using anti-bacterial soap, and then repeated the experience while using bar soap. If the two resulting bacteria counts are compared for each individual, then we have paired categorical data.
Repeating the point for emphasis: Susan’s bacteria count with anti-bacterial soap is compared with Susan’s bacteria count with bar soap hand washing, John’s bacteria count with anti-bacterial soap is compared with John’s bacteria count with bar soap hand washing, and so on. Each person’s 2 bacteria counts are compared against each other. For example, in the first cell n1,1 we see that 10 people had “Clean” bacteria counts while using Anti-Bacterial Soap, and these same 10 people had “Clean” bacteria counts while using Bar Soap, and so on for the other cells.

                                               Bar Soap          Bar Soap
                                               “Clean Counts”    “Dirty Counts”
                                               < 300 CFU         > 300 CFU
Anti-Bacterial Soap “Clean Counts” < 300 CFU        10                3
Anti-Bacterial Soap “Dirty Counts” > 300 CFU         6                1

Circle the correct Chi-Square test to apply: 1) Goodness of Fit Test  2) Independence of Attributes  3) McNemar’s Test
Paired categorical data calls for 3) McNemar’s Test.

Unpaired Categorical Data will have rows relegated to treatments, while columns will hold responses. Each cell represents the particular response of a treatment from only one category of treatment. For example, in the first cell n1,1 we see that 15 people had “Clean” bacteria counts while using Anti-Bacterial Soap. Those 15 people are not referenced again under a second treatment nor a second, paired response.

                      “Clean Counts”   “Dirty Counts”
                        < 300 CFU        > 300 CFU
Anti-Bacterial Soap        15                5
Bar Soap                   12                8

Circle the correct Chi-Square test to apply: 1) Goodness of Fit Test  2) Independence of Attributes  3) McNemar’s Test
Unpaired categorical data calls for 2) Independence of Attributes.
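
The paired versus unpaired distinction also changes which cells enter the chi-square statistic. The sketch below is only an illustration, not the course's prescribed procedure. It assumes the standard McNemar statistic (b - c)^2 / (b + c), computed from the two discordant cells of the paired table (the 3 and the 6), and the standard Pearson chi-square statistic for the unpaired table, both without a continuity correction. The function names are hypothetical and introduced only for this example.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # With 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def mcnemar_statistic(b, c):
    """McNemar statistic from the two discordant cells (assumes no continuity correction)."""
    return (b - c) ** 2 / (b + c)


def independence_statistic(table):
    """Pearson chi-square statistic for a table of unpaired counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Paired soap table: only the people whose two results disagree matter (3 and 6).
paired_stat = mcnemar_statistic(3, 6)

# Unpaired soap table: rows are treatments, columns are responses.
unpaired_stat = independence_statistic([[15, 5], [12, 8]])

print("McNemar statistic:", paired_stat, "p-value:", chi_square_p_value_1df(paired_stat))
print("Independence statistic:", unpaired_stat, "p-value:", chi_square_p_value_1df(unpaired_stat))
```

Under these assumptions the McNemar statistic for the paired table is 1.0 and the independence statistic for the unpaired table is about 1.026. The 10 and the 1 on the diagonal of the paired table do not affect the McNemar statistic at all, which is one way to see why the two table layouts cannot be analyzed with the same test.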