Formulas for Stat 109 Exam 1

In a density histogram each rectangular area = percentage (or relative frequency).

Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{k} f_i\,y_i$ and, for a probability distribution, $\mu = \sum y_i\,p(y_i)$

SD: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{k} f_i\,(y_i-\bar{y})^2}$ and, for a probability distribution, $\sigma = \sqrt{\sum (y_i-\mu)^2\,p(y_i)} = \sqrt{\sum y_i^2\,p(y_i) - \mu^2}$

Median: $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value

If N is divisible by 4, we average:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4}+1\right]^{th}}{2}$, $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4}+1\right]^{th}}{2}$
If N is not divisible by 4:
$Q_1 = \left[\frac{n}{4}+1\right]^{th}$, $Q_3 = \left[\frac{3n}{4}+1\right]^{th}$

IQR = $Q_3 - Q_1$
step = 1.5 × IQR

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = P(A) + P(B)$. True only when A and B are mutually exclusive events.
$P(A \cap B) = P(A)\,P(B)$. True only when A and B are independent events.
$P(A \cap B) = P(A)\,P(B|A)$
$P(A|B) = \frac{P(A \cap B)}{P(B)}$, $P(B|A) = \frac{P(A \cap B)}{P(A)}$

$\bar{y} \pm 1\,sd$ contains 68% of the data
$\bar{y} \pm 2\,sd$ contains 95% of the data
$\bar{y} \pm 3\,sd$ contains 99.7% of the data

Y ~ binomial(n, p): $P(Y = j) = \binom{n}{j}\,p^{j}(1-p)^{n-j}$, j = 0, 1, 2, …, n
Y ~ binomial(n, p): $E(X) = \mu_x = np$, $SD(X) = \sigma_x = \sqrt{np(1-p)}$

Continuity correction:
$P(Y \le k) \approx P(Y \le k + 0.5) = P\left(Z \le \frac{k + 0.5 - \mu}{\sigma}\right)$
$P(Y \ge k) \approx P(Y \ge k - 0.5) = P\left(Z \ge \frac{k - 0.5 - \mu}{\sigma}\right)$
$P(Y = k) \approx P(k - 0.5 \le Y \le k + 0.5) = P\left(\frac{k - 0.5 - \mu}{\sigma} \le Z \le \frac{k + 0.5 - \mu}{\sigma}\right)$

$Z = \frac{Y - \mu}{\sigma}$, $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$, $Z = \frac{Y - np}{\sqrt{np(1-p)}}$
$Z = \frac{\frac{Y \pm 0.5}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{p} \pm \frac{0.5}{n} - p}{\sqrt{\frac{p(1-p)}{n}}}$

Table 3A: CDF for the Standard Normal Distribution (left tail areas).
area = Prob[ Z < a ]

 -z    .00       .01       .02       .03       .04       .05       .06       .07       .08       .09
-3.4  0.000337  0.000325  0.000313  0.000302  0.000291  0.000280  0.000270  0.000260  0.000251  0.000242
-3.3  0.000483  0.000466  0.000450  0.000434  0.000419  0.000404  0.000390  0.000376  0.000362  0.000349
-3.2  0.000687  0.000664  0.000641  0.000619  0.000598  0.000577  0.000557  0.000538  0.000519  0.000501
-3.1  0.000968  0.000935  0.000904  0.000874  0.000845  0.000816  0.000789  0.000762  0.000736  0.000711
-3.0  0.001350  0.001306  0.001264  0.001223  0.001183  0.001144  0.001107  0.001070  0.001035  0.001001
-2.9  0.001866  0.001807  0.001750  0.001695  0.001641  0.001589  0.001538  0.001489  0.001441  0.001395
-2.8  0.002555  0.002477  0.002401  0.002327  0.002256  0.002186  0.002118  0.002052  0.001988  0.001926
-2.7  0.003467  0.003364  0.003264  0.003167  0.003072  0.002980  0.002890  0.002803  0.002718  0.002635
-2.6  0.004660  0.004527  0.004396  0.004269  0.004145  0.004025  0.003907  0.003793  0.003681  0.003573
-2.5  0.006210  0.006037  0.005868  0.005703  0.005543  0.005386  0.005234  0.005085  0.004940  0.004799
-2.4  0.008198  0.007976  0.007760  0.007549  0.007344  0.007143  0.006947  0.006756  0.006569  0.006387
-2.3  0.010724  0.010444  0.010170  0.009903  0.009642  0.009387  0.009137  0.008894  0.008656  0.008424
-2.2  0.013903  0.013553  0.013209  0.012874  0.012545  0.012224  0.011911  0.011604  0.011304  0.011011
-2.1  0.017864  0.017429  0.017003  0.016586  0.016177  0.015778  0.015386  0.015003  0.014629  0.014262
-2.0  0.022750  0.022216  0.021692  0.021178  0.020675  0.020182  0.019699  0.019226  0.018763  0.018309
-1.9  0.028717  0.028067  0.027429  0.026803  0.026190  0.025588  0.024998  0.024419  0.023852  0.023295
-1.8  0.035930  0.035148  0.034380  0.033625  0.032884  0.032157  0.031443  0.030742  0.030054  0.029379
-1.7  0.044565  0.043633  0.042716  0.041815  0.040930  0.040059  0.039204  0.038364  0.037538  0.036727
-1.6  0.054799  0.053699  0.052616  0.051551  0.050503  0.049471  0.048457  0.047460  0.046479  0.045514
-1.5  0.066807  0.065522  0.064255  0.063008  0.061780  0.060571  0.059380  0.058208  0.057053  0.055917
-1.4  0.080757  0.079270  0.077804  0.076359  0.074934  0.073529  0.072145  0.070781  0.069437  0.068112
-1.3  0.096800  0.095098  0.093418  0.091759  0.090123  0.088508  0.086915  0.085343  0.083793  0.082264
-1.2  0.115070  0.113139  0.111232  0.109349  0.107488  0.105650  0.103835  0.102042  0.100273  0.098525
-1.1  0.135666  0.133500  0.131357  0.129238  0.127143  0.125072  0.123024  0.121000  0.119000  0.117023
-1.0  0.158655  0.156248  0.153864  0.151505  0.149170  0.146859  0.144572  0.142310  0.140071  0.137857
-0.9  0.184060  0.181411  0.178786  0.176186  0.173609  0.171056  0.168528  0.166023  0.163543  0.161087
-0.8  0.211855  0.208970  0.206108  0.203269  0.200454  0.197663  0.194895  0.192150  0.189430  0.186733
-0.7  0.241964  0.238852  0.235763  0.232695  0.229650  0.226627  0.223627  0.220650  0.217695  0.214764
-0.6  0.274253  0.270931  0.267629  0.264347  0.261086  0.257846  0.254627  0.251429  0.248252  0.245097
-0.5  0.308538  0.305026  0.301532  0.298056  0.294599  0.291160  0.287740  0.284339  0.280957  0.277595
-0.4  0.344578  0.340903  0.337243  0.333598  0.329969  0.326355  0.322758  0.319178  0.315614  0.312067
-0.3  0.382089  0.378280  0.374484  0.370700  0.366928  0.363169  0.359424  0.355691  0.351973  0.348268
-0.2  0.420740  0.416834  0.412936  0.409046  0.405165  0.401294  0.397432  0.393580  0.389739  0.385908
-0.1  0.460172  0.456205  0.452242  0.448283  0.444330  0.440382  0.436441  0.432505  0.428576  0.424655
-0.0  0.500000  0.496011  0.492022  0.488034  0.484047  0.480061  0.476078  0.472097  0.468119  0.464144

Table 3B: CDF for the Standard Normal Distribution (left-tail areas to the right of the mean).
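The areas in Tables 3A and 3B are values of the standard normal CDF, Prob[ Z < a ]. As a cross-check on individual entries, here is a minimal Python sketch. It assumes only the standard-library `math.erf`; the function name `phi` and the list of spot-check values are illustrative choices, not part of the course materials.

```python
import math


def phi(a: float) -> float:
    """Left-tail area Prob[Z < a] for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


# Spot-check a few entries against the printed tables (six decimal places).
# Each pair is (z value, area read from Table 3A or 3B).
table_entries = [(-1.83, 0.033625), (-0.83, 0.203269), (0.50, 0.691462), (1.67, 0.952540)]
for z, printed in table_entries:
    print(f"z = {z:5.2f}  computed = {phi(z):.6f}  table = {printed:.6f}")
```

An upper-tail area follows from the complement, `1 - phi(a)`, which mirrors how the tables are used for "greater than" questions.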
area = Prob[ Z < a ]

 +z    .00       .01       .02       .03       .04       .05       .06       .07       .08       .09
 0.0  0.500000  0.503989  0.507978  0.511966  0.515953  0.519939  0.523922  0.527903  0.531881  0.535856
 0.1  0.539828  0.543795  0.547758  0.551717  0.555670  0.559618  0.563559  0.567495  0.571424  0.575345
 0.2  0.579260  0.583166  0.587064  0.590954  0.594835  0.598706  0.602568  0.606420  0.610261  0.614092
 0.3  0.617911  0.621720  0.625516  0.629300  0.633072  0.636831  0.640576  0.644309  0.648027  0.651732
 0.4  0.655422  0.659097  0.662757  0.666402  0.670031  0.673645  0.677242  0.680822  0.684386  0.687933
 0.5  0.691462  0.694974  0.698468  0.701944  0.705401  0.708840  0.712260  0.715661  0.719043  0.722405
 0.6  0.725747  0.729069  0.732371  0.735653  0.738914  0.742154  0.745373  0.748571  0.751748  0.754903
 0.7  0.758036  0.761148  0.764237  0.767305  0.770350  0.773373  0.776373  0.779350  0.782305  0.785236
 0.8  0.788145  0.791030  0.793892  0.796731  0.799546  0.802337  0.805105  0.807850  0.810570  0.813267
 0.9  0.815940  0.818589  0.821214  0.823814  0.826391  0.828944  0.831472  0.833977  0.836457  0.838913
 1.0  0.841345  0.843752  0.846136  0.848495  0.850830  0.853141  0.855428  0.857690  0.859929  0.862143
 1.1  0.864334  0.866500  0.868643  0.870762  0.872857  0.874928  0.876976  0.879000  0.881000  0.882977
 1.2  0.884930  0.886861  0.888768  0.890651  0.892512  0.894350  0.896165  0.897958  0.899727  0.901475
 1.3  0.903200  0.904902  0.906582  0.908241  0.909877  0.911492  0.913085  0.914657  0.916207  0.917736
 1.4  0.919243  0.920730  0.922196  0.923641  0.925066  0.926471  0.927855  0.929219  0.930563  0.931888
 1.5  0.933193  0.934478  0.935745  0.936992  0.938220  0.939429  0.940620  0.941792  0.942947  0.944083
 1.6  0.945201  0.946301  0.947384  0.948449  0.949497  0.950529  0.951543  0.952540  0.953521  0.954486
 1.7  0.955435  0.956367  0.957284  0.958185  0.959070  0.959941  0.960796  0.961636  0.962462  0.963273
 1.8  0.964070  0.964852  0.965620  0.966375  0.967116  0.967843  0.968557  0.969258  0.969946  0.970621
 1.9  0.971283  0.971933  0.972571  0.973197  0.973810  0.974412  0.975002  0.975581  0.976148  0.976705
 2.0  0.977250  0.977784  0.978308  0.978822  0.979325  0.979818  0.980301  0.980774  0.981237  0.981691
 2.1  0.982136  0.982571  0.982997  0.983414  0.983823  0.984222  0.984614  0.984997  0.985371  0.985738
 2.2  0.986097  0.986447  0.986791  0.987126  0.987455  0.987776  0.988089  0.988396  0.988696  0.988989
 2.3  0.989276  0.989556  0.989830  0.990097  0.990358  0.990613  0.990863  0.991106  0.991344  0.991576
 2.4  0.991802  0.992024  0.992240  0.992451  0.992656  0.992857  0.993053  0.993244  0.993431  0.993613
 2.5  0.993790  0.993963  0.994132  0.994297  0.994457  0.994614  0.994766  0.994915  0.995060  0.995201
 2.6  0.995340  0.995473  0.995604  0.995731  0.995855  0.995975  0.996093  0.996207  0.996319  0.996427
 2.7  0.996533  0.996636  0.996736  0.996833  0.996928  0.997020  0.997110  0.997197  0.997282  0.997365
 2.8  0.997445  0.997523  0.997599  0.997673  0.997744  0.997814  0.997882  0.997948  0.998012  0.998074
 2.9  0.998134  0.998193  0.998250  0.998305  0.998359  0.998411  0.998462  0.998511  0.998559  0.998605
 3.0  0.998650  0.998694  0.998736  0.998777  0.998817  0.998856  0.998893  0.998930  0.998965  0.998999
 3.1  0.999032  0.999065  0.999096  0.999126  0.999155  0.999184  0.999211  0.999238  0.999264  0.999289
 3.2  0.999313  0.999336  0.999359  0.999381  0.999402  0.999423  0.999443  0.999462  0.999481  0.999499
 3.3  0.999517  0.999534  0.999550  0.999566  0.999581  0.999596  0.999610  0.999624  0.999638  0.999651
 3.4  0.999663  0.999675  0.999687  0.999698  0.999709  0.999720  0.999730  0.999740  0.999749  0.999758

Stat 109 Sample Exam 1A    Name_____________

1.) Given a scenario where 30% of the salmon in a river are farm fish and the rest are wild, and we know that 12% of the wild salmon are tagged while 85% of the farm fish are tagged, find the following probabilities:
Declare all the event space variables: 2pts
Answer all questions first with complete probability notation. Then answer each question with an English sentence.
a) Find the probability that a randomly drawn salmon is tagged. 7pts
b) Find the probability of drawing a wild salmon given that the fish is tagged.
7pts
c) Are the type of salmon drawn and whether it is tagged or not independent events? 5pts (Again use probability notation and numerical values to support your answer.)

2.) John checks his chicken coop each morning and finds the following number of eggs according to the probability distribution table.

x, eggs   2     3     4
p(x)      0.3   0.5   0.2

2a.) Find the expected number of eggs (the mean of x). 5 pts
2b.) Find the typical range of eggs collected each morning expressed as the first standard deviation of x. (Show your work.) 8pts

3.) Data display for year 2000: widowed females per 1000 for all 58 California counties.

Rank  Per 1000  County            Rank  Per 1000  County            Rank  Per 1000  County
  1      57     MONO               21      91     FRESNO             41     104     HUMBOLDT
  2      59     SAN BENITO         22      91     MERCED             42     104     NEVADA
  3      66     ALPINE             23      91     TULARE             43     105     SUTTER
  4      76     YOLO               24      92     CONTRA COSTA       44     106     DEL NORTE
  5      78     SANTA CLARA        25      93     MARIN              45     109     SIERRA
  6      80     EL DORADO          26      93     PLUMAS             46     111     CALAVERAS
  7      80     MADERA             27      93     SANTA BARBARA      47     112     SHASTA
  8      81     SAN BERNARDINO     28      93     STANISLAUS         48     113     TUOLUMNE
  9      82     VENTURA            29      94     SACRAMENTO         49     116     MARIPOSA
 10      83     ORANGE             30      97     SAN JOAQUIN        50     116     SISKIYOU
 11      84     KINGS              31      97     SAN MATEO          51     119     BUTTE
 12      84     SANTA CRUZ         32      97     YUBA               52     120     TEHAMA
 13      85     SAN DIEGO          33      98     MENDOCINO          53     121     NAPA
 14      86     SOLANO             34      98     RIVERSIDE          54     123     TRINITY
 15      87     LOS ANGELES        35      99     SONOMA             55     125     AMADOR
 16      88     ALAMEDA            36     100     GLENN              56     135     INYO
 17      89     MONTEREY           37     102     IMPERIAL           57     138     LAKE
 18      90     KERN               38     102     SAN LUIS OBISPO    58     146     MODOC
 19      90     LASSEN             39     103     SAN FRANCISCO
 20      90     PLACER             40     104     COLUSA

3a.) Is the 2000 California county widowed data symmetrical, or is it skewed to the left or right? 2pts

[Histogram: "2000 CA Widowed Females per 1000 by County"; vertical axis: Frequency (0 to 18); horizontal axis: 60 to 140.]

3b.) Use correct notation to express the 5 key values for a boxplot for the incidence of widowed females per 1000 for all 58 California counties:
3c.) Draw a boxplot of the CA county widow data set. Properly denote any outliers in the plot. 11pts

4.)
The lengths of adult Koi in a large pond are normally distributed with a mean length of 25 inches and a standard deviation of 3 inches. Answer the following questions using the event variable with probability notation, and show the equivalent Z score within probability notation.
a) Declare the event variable. 1pt
4a.) What probability corresponds to a Koi that is at most 30 inches in length? 8pts
4b.) The top 29% of Koi lengths must be above what length threshold? 8pts
4c.) How long will a Koi be if its length is at the 37th percentile of length? 8pts
4d.) The Koi population of this particular pond is the progeny of a metallic “Ogon” variety such that each of the Koi presents a number of metallic scales on its body interspersed among the other scales of the fish body. Assuming that the number of metallic scales on any given Koi is normally distributed throughout the population, with a mean of 18 metallic scales and a standard deviation of 3 metallic scales, determine the following probabilities.
a) Declare the event variable. 1pt
4e.) Find the probability that a randomly drawn Koi has less than 16 metallic scales upon its body. 8pt
4f.) Find the probability that a randomly drawn Koi has more than 17 metallic scales upon its body. 8pt
4g.) Find the probability that a randomly drawn Koi has between 13 and 19 (inclusive) metallic scales upon its body. 8pt

5a.) In 2005, 57% of the incoming Freshman class for the California State University required either English or Mathematics remediation before taking college level course work. 47% required remediation in English while 37% required remediation in Math. What portion of the 2005 Freshman class required remediation in both English and Math? Use probability notation to express your answer. 8pts
5b.) What is the probability that a 2005 Freshman needs remediation in Math but not English? Use probability notation to express your answer. 8pts
5c.)
Find the probability that either 6 or 7 out of 10 randomly drawn CSU Freshmen will not require remediation class work. Declare the variable. Use probability notation to express your answer. 8pts
5d.) Find the first standard deviation window to express the typical range of 10 randomly chosen Freshmen that will not require remediation. 8pts

6.) Given that one out of four salamanders is consumed as prey before reaching sexual maturity, find the probability that between 10% and 30% of 50 juvenile salamanders will be consumed before reaching maturity. Show all work and use proper notation for full credit. Finish with an English sentence. 12pts

Stat 109 Sample Exam 1A Solutions

1.) Given a scenario where 30% of the salmon in a river are farm fish and the rest are wild, and we know that 12% of the wild salmon are tagged while 85% of the farm fish are tagged, find the following probabilities:
Answer all questions first with complete probability notation. Then answer each question with an English sentence.

Declare all the event space variables: 2pts
T = Tagged Salmon
W = Wild Salmon
F = Farm Salmon

a) Find the probability that a randomly drawn salmon is tagged. 7pts (Notation: 2pts, Calculation: 5pts)
$P(T) = P(T|W)\,P(W) + P(T|F)\,P(F)$
$P(T) = 0.12 \times 0.7 + 0.85 \times 0.3$
$P(T) = 0.339$
“About 34% of random draws will be tagged.”

b) Find the probability of drawing a wild salmon given that the fish is tagged. 7pts (Notation: 2pts, Calculation: 5pts)
$P(W|T) = \frac{P(W \cap T)}{P(T)} = \frac{P(T|W)\,P(W)}{P(T)} = \frac{0.12 \times 0.70}{0.339} = 0.2478$
“About 25% of tagged fish will be wild.”

c) Are the type of salmon drawn and whether it is tagged or not independent events? 5pts (Again use probability notation and numerical values to support your answer.)
For independent events the following must be true: $P(A \cap B) = P(A)\,P(B)$
We will use the tagged and wild salmon data since it is at hand. Is $P(W \cap T) = P(W)\,P(T)$ true?
$P(W \cap T) = P(T|W)\,P(W)$ from above
$P(W \cap T) = 0.12 \times 0.7$
$P(W \cap T) = 0.084$
Since $0.084 \neq 0.2373$, $P(W \cap T) \neq P(W)\,P(T)$. Therefore we do not have independent events.
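The three salmon calculations (law of total probability, Bayes' rule, and the independence check) can be reproduced with a few lines of Python. This is only an illustrative sketch using the percentages stated in Problem 1; the variable names are made up for this example.

```python
# Given values from Problem 1 (variable names are illustrative).
p_wild = 0.70            # P(W)
p_farm = 0.30            # P(F)
p_tag_given_wild = 0.12  # P(T | W)
p_tag_given_farm = 0.85  # P(T | F)

# a) Law of total probability: P(T) = P(T|W)P(W) + P(T|F)P(F)
p_tag = p_tag_given_wild * p_wild + p_tag_given_farm * p_farm

# b) Bayes' rule: P(W|T) = P(T|W)P(W) / P(T)
p_wild_given_tag = p_tag_given_wild * p_wild / p_tag

# c) Independence would require P(W and T) = P(W) * P(T).
p_wild_and_tag = p_tag_given_wild * p_wild
product_of_marginals = p_wild * p_tag
independent = abs(p_wild_and_tag - product_of_marginals) < 1e-9

print(f"P(T)       = {p_tag:.3f}")
print(f"P(W|T)     = {p_wild_given_tag:.4f}")
print(f"P(W and T) = {p_wild_and_tag:.3f}, P(W)P(T) = {product_of_marginals:.4f}")
print("independent?", independent)
```

The comparison uses a small tolerance instead of `==` because the products are floating-point values.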
Supporting calculation: $P(W)\,P(T) = 0.7 \times 0.339 = 0.2373$
In English: “The probability of whether a salmon is tagged or not depends on whether it is a wild or a farmed salmon.”

2.) John checks his chicken coop each morning and finds the following number of eggs according to the probability distribution table. Express answer with correct notation.

x, eggs   2     3     4
p(x)      0.3   0.5   0.2

2a.) Find the expected number of eggs (the mean of x). 5 pts (1 pt for notation)
$\mu = \sum x_i\,p(x_i) = 2(0.3) + 3(0.5) + 4(0.2) = 2.9$

2b.) Find the typical range of eggs collected each morning expressed as the first standard deviation of x. (Show your work.) 8pts (1 pt for notation)
$\sigma = \sqrt{\sum x_i^2\,p(x_i) - \mu^2} = \sqrt{2^2(0.3) + 3^2(0.5) + 4^2(0.2) - 2.9^2} = \sqrt{0.49} = 0.7$
$\mu \pm \sigma = 2.9 \pm 0.7 = (2.2, 3.6)$

3.) Data display for year 2000: widowed females per 1000 for all 58 California counties.

Rank  Per 1000  County            Rank  Per 1000  County            Rank  Per 1000  County
  1      57     MONO               21      91     FRESNO             41     104     HUMBOLDT
  2      59     SAN BENITO         22      91     MERCED             42     104     NEVADA
  3      66     ALPINE             23      91     TULARE             43     105     SUTTER
  4      76     YOLO               24      92     CONTRA COSTA       44     106     DEL NORTE
  5      78     SANTA CLARA        25      93     MARIN              45     109     SIERRA
  6      80     EL DORADO          26      93     PLUMAS             46     111     CALAVERAS
  7      80     MADERA             27      93     SANTA BARBARA      47     112     SHASTA
  8      81     SAN BERNARDINO     28      93     STANISLAUS         48     113     TUOLUMNE
  9      82     VENTURA            29      94     SACRAMENTO         49     116     MARIPOSA
 10      83     ORANGE             30      97     SAN JOAQUIN        50     116     SISKIYOU
 11      84     KINGS              31      97     SAN MATEO          51     119     BUTTE
 12      84     SANTA CRUZ         32      97     YUBA               52     120     TEHAMA
 13      85     SAN DIEGO          33      98     MENDOCINO          53     121     NAPA
 14      86     SOLANO             34      98     RIVERSIDE          54     123     TRINITY
 15      87     LOS ANGELES        35      99     SONOMA             55     125     AMADOR
 16      88     ALAMEDA            36     100     GLENN              56     135     INYO
 17      89     MONTEREY           37     102     IMPERIAL           57     138     LAKE
 18      90     KERN               38     102     SAN LUIS OBISPO    58     146     MODOC
 19      90     LASSEN             39     103     SAN FRANCISCO
 20      90     PLACER             40     104     COLUSA

3a.) Is the 2000 California county widowed data symmetrical, or is it skewed to the left or right? 2pts
Skewed slightly right.

[Histogram: "2000 CA Widowed Females per 1000 by County"; vertical axis: Frequency (0 to 18); horizontal axis: 60 to 140.]

3b.)
Use correct notation to express the 5 key values for a boxplot for the incidence of widowed females per 1000 for all 58 California counties: 11 pts

2.) Find the median:
$\tilde{x} = \left(\frac{n+1}{2}\right)^{th} = \left(\frac{58+1}{2}\right)^{th} = \left(\frac{59}{2}\right)^{th} = 29.5^{th}$ value
$\tilde{x} = 29^{th} + 0.5\,(30^{th} - 29^{th}) = 94 + 0.5\,(97 - 94) = 94 + 0.5(3)$
$\tilde{x} = 95.5$

3.) Determine the Quartile Criterion:
If N is divisible by 4, we average: $Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4}+1\right]^{th}}{2}$, $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4}+1\right]^{th}}{2}$
If N is not divisible by 4: $Q_1 = \left[\frac{n}{4}+1\right]^{th}$, $Q_3 = \left[\frac{3n}{4}+1\right]^{th}$
Where square brackets indicate that we round any decimal down to find the nth value in the data set.
N = 58 is not divisible by 4, so: $Q_1 = \left[\frac{n}{4}+1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4}+1\right]^{th}$

4.) Find the 1st and 3rd Quartiles.
$Q_1 = \left[\frac{n}{4}+1\right]^{th} = \left[\frac{58}{4}+1\right]^{th} = [14.5 + 1]^{th} = [15.5]^{th} = 15^{th}$ value, so $Q_1 = 87$
$Q_3 = \left[\frac{3n}{4}+1\right]^{th} = \left[\frac{3 \cdot 58}{4}+1\right]^{th} = [43.5 + 1]^{th} = [44.5]^{th} = 44^{th}$ value, so $Q_3 = 106$

5.) Find the IQR and Step.
IQR = Inter-quartile range
IQR = Q3 – Q1 = 106 – 87 = 19
Step = 1.5 × IQR = 1.5 × 19 = 28.5

Outliers:
LOT = Q1 − Step = 87 − 28.5 = 58.5
UOT = Q3 + Step = 106 + 28.5 = 134.5

3c.) Draw a boxplot of the CA county widow data set. Properly denote any outliers in the plot.

[Boxplot figure]

Note! All data points that lie outside the outlier thresholds are expressed with asterisks. The boxplot whiskers terminate at the last data point to lie within the outlier thresholds. It is an error to extend the whiskers to the outlier thresholds.

4.) The lengths of adult Koi in a large pond are normally distributed with a mean length of 25 inches and a standard deviation of 3 inches. Answer the following questions using the event variable with probability notation, and show the equivalent Z score within probability notation.

Declare the event variable. 1pt
X = Koi length in inches.
X = Fish. -1pt: For a vague declaration without units of measure.

4a.) What probability corresponds to a Koi at most 30 inches in length? 8pts
Note!
Here is a number with a circle drawn around it: 0.95254 (-1pt). This number has no meaning without its supporting notation.
Answer like this with complete notation:
$Z = \frac{X - \mu}{\sigma} = \frac{30 - 25}{3}$, Z = 1.67, then
$P(X \le 30) = P(Z \le 1.67) = 0.95254$
Not like this: 0.95254

4b.) The top 29% of Koi lengths must be above what length threshold? 8pts
Note: This is an upper tail probability and we must know to use the complement value because the normal distribution tables only provide for lower tail probabilities.
a) $P(X > ???) = P(Z > ???) = 0.29$
b) Search the body of the table for the Z score associated with 0.71. Then $P(Z > 0.55) = 0.29$
c) Find the X associated with this Z score: $Z = \frac{X - \mu}{\sigma}$, so $0.55 = \frac{X - 25}{3}$ and X = 26.65
d) Report the answer using probability notation. Show the transition from the real world X values to the associated Z scores.
$P(X > 26.65) = P(Z > 0.55) = 0.29$

4c.) How long will a Koi be if its length is at the 37th percentile of length? 8pts
Note that the percentiles on the normal curve run from zero to 100% and accumulate from left to right. The corresponding Koi lengths for each of these percentiles are shown below.

[Figure: normal curve with a percentile axis from 0% to 100% and the corresponding Koi lengths from 22" to 28".]

a) Search the body of the table for the Z score associated with 0.37.
b) $P(X < ???) = P(Z < ???) = 0.37$
c) We find: $P(Z < -0.33) = 0.37$
d) Using standardization we find x: $Z = \frac{x - \mu}{\sigma}$, so $-0.33 = \frac{x - 25}{3}$ and x = 24.01 ≈ 24
e) Report your answer with full probability notation: $P(X < 24) = P(Z < -0.33) = 0.37$

4d.) The Koi population of this particular pond is the progeny of a metallic “Ogon” variety such that each of the Koi presents a number of metallic scales on its body interspersed among the other scales of the fish body. Assuming that the number of metallic scales on any given Koi is normally distributed throughout the population, with a mean of 18 metallic scales and a standard deviation of 3 metallic scales, determine the following probabilities.
a) Declare the event variable. 1pt
N = The number of metallic scales found on a given Koi from the pond.
N = Scales. -1pt: For a vague declaration.

4e.) Find the probability that a randomly drawn Koi has less than 16 metallic scales upon its body. 8pt
$P(N < 16) = P(N \le 15)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Strict inequalities of “less than” must be converted numerically to “less than or equal to” if using a continuous distribution.
$P(N < 16) = P(N \le 15) \approx P(N \le 15.5)$
A continuity correction is necessary here because a discrete count of fish scales must be mapped to a continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve.
$Z = \frac{x - \mu}{\sigma} = \frac{15.5 - 18}{3}$, Z = −0.83
$P(N < 16) \approx P(Z \le -0.83) = 0.203269$
In English: “About 20% of the time a randomly drawn Koi will have less than 16 metallic scales upon its body.”
Answer with complete notation like this, and not this: 0.203269

4f.) Find the probability that a randomly drawn Koi has more than 17 metallic scales upon its body. 8pt
$P(N > 17) = P(N \ge 18)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Strict inequalities of “greater than” must be converted numerically to “greater than or equal to” if using a continuous distribution.
$P(N > 17) = P(N \ge 18) \approx P(N \ge 17.5)$
A continuity correction is necessary here because a discrete count of fish scales must be mapped to a continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve. Also note that a complement of one minus the Z-table probability must be accounted for here.
$Z = \frac{x - \mu}{\sigma} = \frac{17.5 - 18}{3} = -0.1667$, which rounds to Z = −0.17
$P(Z \ge -0.17) = 1 - P(Z < -0.17) = 1 - 0.432505$
$P(N > 17) \approx P(Z \ge -0.17) = 0.567495$
In English: “About 57% of the time a randomly drawn Koi will have more than 17 metallic scales upon its body.”
Answer with complete notation like this, and not this: 0.567495

4g.) Find the probability that a randomly drawn Koi has between 13 and 19 (inclusive) metallic scales upon its body. 8pt
$P(13 \le N \le 19)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Here the inclusive counts (of 13 and 19) are already specified in the problem.
$P(13 \le N \le 19) \approx P(12.5 \le N \le 19.5)$
A continuity correction is still necessary because a discrete count of fish scales must be mapped to a continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve.
$Z = \frac{x - \mu}{\sigma} = \frac{12.5 - 18}{3}$, Z = −1.83
$Z = \frac{x - \mu}{\sigma} = \frac{19.5 - 18}{3}$, Z = 0.5
$P(13 \le N \le 19) \approx P(-1.83 \le Z \le 0.5)$
$P(13 \le N \le 19) \approx P(Z \le 0.5) - P(Z \le -1.83)$
$P(13 \le N \le 19) \approx 0.691462 - 0.033625$
$P(13 \le N \le 19) \approx 0.657837$
In English: “About 66% of the time a randomly drawn Koi will have between 13 and 19 metallic scales upon its body.”
Answer with complete notation like this: $P(13 \le N \le 19) \approx P(-1.83 \le Z \le 0.5) = 0.657837$, and not this: 0.657837

5a.) In 2005, 57% of the incoming Freshman class for the California State University required either English or Mathematics remediation before taking college level course work. 47% required remediation in English while 37% required remediation in Math. What portion of the 2005 Freshman class required remediation in both English and Math? Use probability notation to express your answer. 8pts
Declaration and notation: 2pts
E = Student needs English Remediation.
M = Student needs Math Remediation.
$P(E \cap M) = P(E) + P(M) - P(E \cup M)$
$P(E \cap M) = .47 + .37 - .57$
$P(E \cap M) = 0.27$

5b.) What is the probability that a 2005 Freshman needs remediation in Math but not English?
Use probability notation to express your answer. 8pts 𝑃(𝑀 ∩ 𝐸̅ ) = 𝑃(𝑀) − 𝑃(𝑀 ∩ 𝐸) 𝑃(𝑀 ∩ 𝐸̅ ) = 0.37 − 0.27 𝑃(𝑀 ∩ 𝐸̅ ) = 0.10 5c.) Find the probability that either 6 or 7 out of 10 CSU Freshman will not require remediation class work. Declare the variable. Use probability notation to express your answer. 8pts X = The number of 10 CSU freshman that will not require remediation. 1pt 10 10 P6 X 7 0.436 0.57 4 0.437 0.57 3 6 7 P6 X 7 0.140129 + 0.060407 = 0.200536 Stat 109 Sample Exam 1A Solutions 601 5d.) Find the first standard deviation window to express the typical range of 10 randomly chosen Freshman that will not require remediation? 8pts np np1 p = 10 0.43 10 0.431 0.43 4.3 1.56556 2.734, 5.866 Notation 1pt “About 2.7 to 5.9 of 10 Freshman will typically not require remediation.” 6.) Given that one out of four salamanders are consumed as prey before reaching sexual maturity, Find the probability that between 10% and 30% of 50 salamanders will be consumed before reaching maturity. Show all work and use proper notation for full credit. Finish with an English sentence. 14pts Y pˆ pˆ 0.5 Method 1: Declare proportions with the continuity correction. n n Declare the parameter: 𝑝̂ = proportion of 50 drawn salamanders that are juveniles. 2pt x = Salamanders x = consumed salamanders x = Proportion of salamanders that are consumed. P = portion consumed. -2pt for all of these inadequate declarations. If you are going to use calculations that determine the probability that a sample proportion will range between some specified ̂, that we must declare. Do thresholds, then this is this variable, 𝒑 not use “x” or “y” unless you are declaring for counts and plan to calculate accordingly (see the next page.) Be sure to include the sample size as this will affect probability. P0.1 pˆ 0.3 P 0.10 0.5 0.5 pˆ 0.30 50 50 P0.1 pˆ 0.3 P0.09 pˆ 0.31 Convert proportions to Z scores. Use: P0.1 pˆ 0.3 P???? Z ??? 
$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$

$Z_{Low} = \frac{0.09 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{50}}} = -2.613$

$Z_{High} = \frac{0.31 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{50}}} = 0.9798$

$P(0.1 \le \hat{p} \le 0.3) \approx P(-2.613 \le Z \le 0.9798)$

$P(0.1 \le \hat{p} \le 0.3) \approx P(Z \le 0.98) - P(Z \le -2.61)$

$P(0.1 \le \hat{p} \le 0.3) \approx 0.836457 - 0.004527$

$P(0.1 \le \hat{p} \le 0.3) \approx 0.83193$

Answer in English: “There is about an 83% chance that between 10% and 30% of the 50 salamanders will be consumed before reaching maturity.”

Problem #6) Method 2: Convert proportions to counts, then use the continuity correction.

$\hat{p} = \frac{Y}{n} \rightarrow Y = \hat{p} \cdot n$

Declare the parameter: Y = number of 50 drawn salamanders that are consumed. 2pt

Inadequate declarations (-2pt for all of these):
x = Salamanders
x = consumed salamanders
x = Proportion of salamanders that are consumed.
P = juveniles.

If you are going to use calculations that determine the probability that a sample count will range between some specified thresholds, then this is the variable that we must declare. Be sure to include the sample size as this will affect probability.

$P(0.1 \le \hat{p} \le 0.3) \rightarrow P(0.1 \cdot 50 \le \hat{p} \cdot n \le 0.3 \cdot 50) \rightarrow P(5 \le Y \le 15)$

$P(5 \le Y \le 15) \approx P(5 - 0.5 \le Y \le 15 + 0.5) = P(4.5 \le Y \le 15.5)$

After applying the continuity correction we convert the counts of consumed salamanders to Z-scores using the binomial expressions for the mean and standard deviation: $\mu = n \cdot p$ and $\sigma = \sqrt{n \cdot p(1 - p)}$. Note that “one out of four” implies that p = 0.25.

$Z = \frac{X - \mu}{\sigma} = \frac{X - n \cdot p}{\sqrt{n \cdot p(1 - p)}}$

$P(4.5 \le Y \le 15.5) = P\left(\frac{4.5 - 0.25(50)}{\sqrt{50(0.25)(1 - 0.25)}} \le Z \le \frac{15.5 - 0.25(50)}{\sqrt{50(0.25)(1 - 0.25)}}\right)$

$P(4.5 \le Y \le 15.5) = P(-2.61279 \le Z \le 0.979796) \approx P(-2.61 \le Z \le 0.98)$

We round to the nearest hundredth for the Z-table probability values.

$P(4.5 \le Y \le 15.5) \approx P(Z \le 0.98) - P(Z \le -2.61)$

$P(4.5 \le Y \le 15.5) \approx 0.836457 - 0.004527$

$P(4.5 \le Y \le 15.5) \approx 0.83193$

Answer in English: “There is about an 83% chance that between 10% and 30% of the 50 salamanders will be consumed before reaching maturity.”

Stat 109 Sample Exam 1B Name__________

1.)
Phenotypic risk for type 2 diabetes was assigned to 2377 people of European ancestry in the Framingham Offspring Study (N Engl J Med. Nov 20, 2008; 359(21): 2208–2219). Participants were genotyped for 18 SNPs (single nucleotide polymorphisms) associated with type 2 diabetes. Based upon the frequency of these 18 SNPs, a phenotypic risk for type 2 diabetes was given to each participant on a scale that ran from a low of 7 to a high of 27. Three categories of phenotypic risk for type 2 diabetes were based upon one’s phenotypic score: Low at less than 15, Medium for scores between 16 and 20 inclusive, and High for scores greater than 20. Europeans over 50 years of age with a low phenotypic score had a 7% incidence of type 2 diabetes, while Europeans over 50 with a high phenotypic score had a 17% incidence of type 2 diabetes. Given that a sub-population of people with European ancestry over 50 years of age contains only people with high and low phenotypic scores, 30% of which were high, with the rest having low scores, and that environmental factors such as diet and exercise have been controlled, determine the following using correct probability notation.

Declare variables (2pts)

a.) Find the probability that an over 50 year old with European ancestry will have Type 2 diabetes. 7pts

b) Find the probability that an over 50 year old with European ancestry will have a low phenotypic score given that the person has Type 2 diabetes. 7pts

c) Is the incidence of type 2 diabetes independent of the phenotypic score for people of European ancestry? (Again use probability notation and numerical values to support your answer.) 5pts

2.) Instrumental conditioning is a term developed by Edward Thorndike (1898) in which an animal gradually learns a task through repetition. A Thorndike box was first used on house cats. The cat must learn how to paw at a series of levers and latches with several false starts before she can escape from the box.
Given that a large number of individual trials were observed for cats placed in a Thorndike box, suppose that each cat was repeatedly subjected to a Thorndike box until her escape from the box could be performed in under a minute. A pdf of the number of trials required for each cat to reach an escape time of less than a minute was recorded and is posted below.

x, trials   4     5     6     7     8     9     10
p(x)        0.1   0.2   0.3   0.0   0.2   0.0   0.2

2a.) Find the expected number of trials required before a house cat can escape from the box in less than a minute (the mean of x). 5 pts

2b.) Find the standard deviation of the number of trials required before a house cat can escape from the box in less than a minute. 8pts

2c.) Find the typical range of the number of trials a cat needs to escape the Thorndike box in less than a minute as the first standard deviation of x. (Show your work.) 4pts (1 pt for notation)

3.) Kiama Blowhole Eruption Intervals: An ocean swell produces spectacular eruptions of water through a hole in the cliff at Kiama, about 120 km south of Sydney, Australia, known as the Blowhole. The times in seconds between each of 64 successive eruptions from 1340 hours on 12 July 1998 were observed using a digital watch. The sorted data read down the columns:

7    10   15   18   28   40   60   83
8    10   16   21   29   42   61   83
8    10   17   25   29   47   61   87
8    10   17   25   34   51   68   89
8    11   17   26   35   54   69   91
8    11   18   27   36   55   73   95
9    12   18   27   36   56   77   146
9    14   18   28   37   60   82   169

3a.) Circle the distribution that best describes the interval data for the blowhole eruptions:

Skewed Left     Symmetrical     Skewed Right

3b.) Use correct notation to express the 5 key values for a boxplot for the interval of time in seconds between eruptions at the Kiama Blowhole. 11 pts

3c.) Draw a boxplot of the Kiama Blowhole data. Properly denote any outliers in the plot.

4.) A population of 15 ruby throated hummingbirds, Archilochus colubris, was observed in a controlled environment as the subject of a student’s senior thesis. Of interest was the quantitative feeding behavior of humming birds when a food source is plentiful at numerous sites. Eight humming bird feeders were filled with a sugar-nectar solution and suspended from scales that would digitally record the mass of each humming bird feeder after each successive feeding. The difference in the feeder mass between each successive recording provided a record of the amount of fluid consumed by a humming bird at each feeding. The mass consumed at each feeding was normally distributed with a mean of 0.25 g and a standard deviation of 0.06 g. Answer the following questions using the event variable with probability notation and show the equivalent Z score within probability notation.

Declare the event variable.

4a.) At what percentile is a feeding of 0.2 grams? 8pts

4b.) The upper tenth percentile of feeding mass corresponds to what mass? 8pts

4c.) How much mass is consumed if the feeding occurred at the 72nd percentile of feeding masses? 8pts

4d.) The same senior thesis on the behavior of feeding for the ruby throated humming bird tracked the number of feedings made by each of the 15 hummingbirds in an hour. The number of feedings taken in an hour by each bird was normally distributed with a mean of 11.2 and a standard deviation of 1.8. Determine the following probabilities using correct notation.

Declare the event variable. 1pt

4e.) Find the probability that a randomly drawn hummingbird makes more than 12 feedings in an hour. 8pt

4f.) Find the probability that a randomly drawn humming bird feeds less than 10 times in an hour. 8pt

4g.) Find the probability that a randomly drawn humming bird feeds anywhere between 9 and 13 times (inclusive) in a given hour. 8pt

4h.) Given that a randomly drawn humming bird feeds at the upper 4th percentile of feeding frequency, how many times does the bird feed in an hour on average?
8pt

5a.) The Colorado pikeminnow (AKA white or Colorado river salmon, Ptychocheilus lucius) is the largest minnow native to North America, and it is well known for its spectacular fresh water spawning migrations and homing ability. Despite a massive recovery effort, its numbers decline. Hampered by a loss of habitat, the young of this once abundant fish are overwhelmed in their nursery habitat by invasive small fishes (such as red shiner and fathead minnow). Sites sampled from over 75 tributaries of the Colorado river found that both invasive species, red shiner and fathead minnow, were present in 38% of sampled sites. Given that 55% of the river sites had red shiners present and that 47% of the sampled sites had fathead minnows present, what portion of the sampled sites had either invasive species present? Use probability notation to express your answer. 6pts

5b.) What is the probability that a tributary site of the Colorado has fathead minnows but not red shiners? Use probability notation to express your answer. 6pts

5c.) What is the probability that a tributary site of the Colorado has either both invasive species of fathead minnows and red shiners present or has neither invasive species present? Use probability notation to express your answer. 8pts

5d.) Find the probability that more than 6 out of 9 Colorado tributaries have either red shiner or fathead minnow presence. Declare the variable. Use probability notation to express your answer. 8pts

5e.) Find the probability that at least 8 out of 9 Colorado tributaries have fathead minnow presence. 8pts

5e.) Find the probability that a majority of 9 Colorado tributaries have red shiner presence. 8pts

5f.) Find the first standard deviation window to express the typical range of 9 Colorado tributaries that have red shiner presence. 8pts

5g.) Describe the difference in meaning between these symbols:
a.) $\bar{x}$ and $\mu$?
b.) $\hat{p}$ and $p$?
c.) $\tilde{x}$ and $\eta$?
6.) The 4 basic categories of human blood types (O, A, B, and AB) are coupled with an Rh factor that is denoted with a plus (+) for its presence or minus sign (−) for its absence. This Rh factor is found predominantly in rhesus monkeys, and to varying degree in human populations. For the US population it is present in 83.3% of the population. Given that 40 people from the United States are randomly drawn, what is the probability that between 80% and 90% of the sample has an Rh+ factor for their blood type? Show all work and use proper notation for full credit. Finish with an English sentence. 14pts

Stat 109 Sample Exam 1B Solution

1.) Phenotypic risk for type 2 diabetes was assigned to 2377 people of European ancestry in the Framingham Offspring Study (N Engl J Med. Nov 20, 2008; 359(21): 2208–2219). Participants were genotyped for 18 SNPs (single nucleotide polymorphisms) associated with type 2 diabetes. Based upon the frequency of these 18 SNPs, a phenotypic risk for type 2 diabetes was given to each participant on a scale that ran from a low of 7 to a high of 27. Three categories of phenotypic risk for type 2 diabetes were based upon one’s phenotypic score: Low at less than 15, Medium for scores between 16 and 20 inclusive, and High for scores greater than 20. Europeans over 50 years of age with a low phenotypic score had a 7% incidence of type 2 diabetes, while Europeans over 50 with a high phenotypic score had a 17% incidence of type 2 diabetes. Given that a sub-population of people with European ancestry over 50 years of age contains only people with high and low phenotypic scores, 30% of which were high, with the rest having low scores, and that environmental factors such as diet and exercise have been controlled, determine the following using correct probability notation.

Declare variables (2pts)
D = Event that a European has Type 2 diabetes.
L = Event that a European has a low SNP score.
H = Event that a European has a high SNP score.

a.) Find the probability that an over 50 year old with European ancestry will have Type 2 diabetes.

$P(D) = P(D|L) \cdot P(L) + P(D|H) \cdot P(H)$

$P(D) = 0.07(0.7) + 0.17(0.3) = 0.049 + 0.051$

$P(D) = 0.10$

Notation: 2pts Calculation: 5pts

“About 10% of Europeans over 50 will have Type 2 diabetes.”

b) Find the probability that an over 50 year old with European ancestry will have a low phenotypic score given that the person has Type 2 diabetes. 7pts

$P(L|D) = \frac{P(L \cap D)}{P(D)} = \frac{P(D|L) \cdot P(L)}{P(D)}$

$P(L|D) = \frac{0.07(0.70)}{0.10} = 0.49$

Notation: 2pts Calculation: 5pts

“About 49% of over 50 year olds with European ancestry with Type 2 diabetes will have a low phenotypic score.”

c) Is the incidence of type 2 diabetes independent of the phenotypic score for people of European ancestry? (Again use probability notation and numerical values to support your answer.) 5pts

For independent events the following must be true: $P(A \cap B) = P(A) \cdot P(B)$

We will use Type 2 diabetes and low phenotypic scores since they are at hand. Is $P(L \cap D) = P(L) \cdot P(D)$ true?

$P(L \cap D) = P(D|L) \cdot P(L)$ from above

$P(L \cap D) = 0.07(0.7) = 0.049$

$P(D) \cdot P(L) = 0.1(0.70) = 0.07$

Since $0.049 \ne 0.07$, then $P(L \cap D) \ne P(L) \cdot P(D)$. Therefore we do not have independent events.

In English: “The probability of whether an over 50 year old of European ancestry develops Type 2 diabetes depends on their phenotypic score.”

2.) Instrumental conditioning is a term developed by Edward Thorndike (1898) in which an animal gradually learns a task through repetition. A Thorndike box was first used on house cats. The cat must learn how to paw at a series of levers and latches with several false starts before she can escape from the box. Given that a large number of individual trials were observed for cats placed in a Thorndike box, suppose that each cat was repeatedly subjected to a Thorndike box until her escape from the box could be performed in under a minute.
A pdf of the number of trials required for each cat to reach an escape time of less than a minute was recorded and is posted below.

x, trials   4     5     6     7     8     9     10
p(x)        0.1   0.2   0.3   0.0   0.2   0.0   0.2

2a.) Find the expected number of trials required before a house cat can escape from the box in less than a minute (the mean of x). 5 pts

$\mu = \sum x_i \, p(x_i) = 4(0.1) + 5(0.2) + 6(0.3) + 7(0.0) + 8(0.2) + 9(0.0) + 10(0.2) = 6.8$

1 pt for notation: $\mu = 6.8$

2b.) Find the standard deviation of the number of trials required before a house cat can escape from the box in less than a minute. 8pts

$\sigma = \sqrt{\sum x_i^2 \, p(x_i) - \mu^2} = \sqrt{4^2(0.1) + 5^2(0.2) + 6^2(0.3) + 7^2(0) + 8^2(0.2) + 9^2(0) + 10^2(0.2) - 6.8^2} = \sqrt{3.96} = 1.99$

$\sigma = 1.99$

2c.) Find the typical range of the number of trials a cat needs to escape the Thorndike box in less than a minute as the first standard deviation of x. (Show your work.) 4pts (1 pt for notation)

$\mu \pm \sigma = 6.8 \pm 1.99 = (4.81, 8.79)$

3.) Kiama Blowhole Eruption Intervals: An ocean swell produces spectacular eruptions of water through a hole in the cliff at Kiama, about 120 km south of Sydney, Australia, known as the Blowhole. The times in seconds between each of 64 successive eruptions from 1340 hours on 12 July 1998 were observed using a digital watch. The sorted data read down the columns:

7    10   15   18   28   40   60   83
8    10   16   21   29   42   61   83
8    10   17   25   29   47   61   87
8    10   17   25   34   51   68   89
8    11   17   26   35   54   69   91
8    11   18   27   36   55   73   95
9    12   18   27   36   56   77   146
9    14   18   28   37   60   82   169

3a.) Circle the distribution that best describes the interval data for the blowhole eruptions:

Skewed Left     Symmetrical     Skewed Right

3b.) Use correct notation to express the 5 key values for a boxplot for the interval of time in seconds between eruptions at the Kiama Blowhole. 11 pts

3b1.) Find the median:

$\tilde{x} = \left(\frac{n+1}{2}\right)^{th} = \left(\frac{64+1}{2}\right)^{th} = \left(\frac{65}{2}\right)^{th} = 32.5^{th}$

$\tilde{x} = 32^{nd} + 0.5(33^{rd} - 32^{nd}) = 28 + 0.5(28 - 28) = 28 + 0.5(0)$

$\tilde{x} = 28$

3b2.) Determine the Quartile Criterion

If N is divisible by 4, we average:

$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4} + 1\right]^{th}}{2}$ and $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4} + 1\right]^{th}}{2}$

If N is not divisible by 4:

$Q_1 = \left[\frac{n}{4} + 1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4} + 1\right]^{th}$

Here square brackets indicate that we round any decimal down to find the nth value in the data set. N = 64 is divisible by 4, so we use the averaging form for both quartiles.

3b3.) Find the 1st and 3rd Quartiles.

$Q_1 = \frac{\left[\frac{64}{4}\right]^{th} + \left[\frac{64}{4} + 1\right]^{th}}{2} = \frac{[16]^{th} + [17]^{th}}{2} = \frac{14 + 15}{2}$

$Q_1 = 14.5$

$Q_3 = \frac{\left[\frac{3 \cdot 64}{4}\right]^{th} + \left[\frac{3 \cdot 64}{4} + 1\right]^{th}}{2} = \frac{[48]^{th} + [49]^{th}}{2} = \frac{60 + 60}{2}$

$Q_3 = 60$

3b4.) Find the IQR and Step.

IQR = Inter-quartile range
IQR = Q3 – Q1
IQR = 60 – 14.5
IQR = 45.5

Step = 1.5 × IQR
Step = 1.5 × 45.5
Step = 68.25

3b5.) Find the Outlier Thresholds

LOT = Q1 − Step
LOT = 14.5 − 68.25
LOT = −53.75

UOT = Q3 + Step
UOT = 60 + 68.25
UOT = 128.25

3c.) Draw a boxplot of the Kiama Blowhole data. Properly denote any outliers in the plot.

Note! All data points that lie outside the outlier thresholds are expressed with asterisks. The boxplot whiskers terminate at the last data point to lie within the outlier thresholds. It is an error to extend the whiskers to the outlier thresholds.

[Boxplot figure: two outliers marked with asterisks]

4.) A population of 15 ruby throated hummingbirds, Archilochus colubris, was observed in a controlled environment as the subject of a student’s senior thesis. Of interest was the quantitative feeding behavior of humming birds when a food source is plentiful at numerous sites. Eight humming bird feeders were filled with a sugar solution and suspended from scales that would digitally record the mass of each humming bird feeder after each successive feeding.
The difference in the feeder mass between each successive recording provided a record of the amount of fluid consumed by a humming bird at each feeding. The mass consumed at each feeding was normally distributed with a mean of 0.25 g and a standard deviation of 0.06 g. Answer the following questions using the event variable with probability notation and show the equivalent Z score within probability notation.

Declare the event variable.
X = grams of sugar consumed by a humming bird at a single feeding. 1pt

For a vague declaration without units of measure:
X = food. -1pt
X = bird. -1pt
X = sugar. -1pt

4a.) At what percentile is a feeding of 0.2 grams? 8pts

$Z = \frac{X - \mu}{\sigma} = \frac{0.2 - 0.25}{0.06} = -0.8333$, then $Z \approx -0.83$

$P(X \le 0.2) = P(Z \le -0.83) = 0.203269$

Note! Here is a number with a circle drawn around it: 0.203269. -1pt. This number has no meaning without its supporting notation. Answer with complete notation, as in the line above, and not with the bare number. Note that even if the prompt is in percentiles, we answer with probability notation.

4b.) The upper tenth percentile of feeding mass corresponds to what mass? 8pts

Note: This is an upper tail probability and we must know to use the complement value because the normal distribution tables only provide lower tail probabilities.

$P(X \ge ???) = P(Z \ge ???) = 0.10$

a) Search the body of the table for the Z score associated with 0.90.

b) Note that 0.899727 is closer to 0.90 than 0.901475. If you do not see this, try subtracting 0.90 from both values. Then we take the associated Z-score of 1.28 for 0.899727 as our closest estimate for 0.90.

c) Then $P(Z \ge 1.28) \approx 0.10$

d) Find the X associated with this Z score:

$Z = \frac{X - \mu}{\sigma} \rightarrow 1.28 = \frac{X - 0.25}{0.06} \rightarrow X = 0.3268$

e) Report the answer using probability notation. Show the transition from the real world X values to the associated Z scores.

$P(X \ge 0.3268) = P(Z \ge 1.28) \approx 0.10$

4c.) How much mass is consumed if the feeding occurred at the 72nd percentile of feeding masses?
8pts

Note that the percentiles on the normal curve run from zero to 100% and accumulate from left to right. The corresponding masses for these percentiles of feedings are shown below.

[Figure: normal curve with percentiles and corresponding feeding masses]

a) Search the body of the table for the Z score associated with 0.72.

b) $P(X \le ???) = P(Z \le ???) = 0.72$

c) We find: $P(Z \le 0.58) \approx 0.72$

d) Using standardization we find x:

$Z = \frac{x - \mu}{\sigma} \rightarrow 0.58 = \frac{x - 0.25}{0.06} \rightarrow x = 0.2848$

e) Report your answer with full probability notation:

$P(X \le 0.2848) = P(Z \le 0.58) \approx 0.72$

4d.) The same senior thesis on the behavior of feeding for the ruby throated humming bird tracked the number of feedings made by each of the 15 hummingbirds in an hour. The number of feedings taken in an hour by each bird was normally distributed with a mean of 11.2 and a standard deviation of 1.8. Determine the following probabilities using correct notation.

Declare the event variable. 1pt
N = The number of feedings made by a humming bird in an hour.
N = feedings. -1pt: For a vague declaration

4e.) Find the probability that a randomly drawn hummingbird makes more than 12 feedings in an hour. 8pt

Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Strict inequalities of “more than” must be converted numerically to “more than or equal to” if using a continuous distribution.

$P(N > 12) = P(N \ge 13) \approx P(N \ge 12.5)$

A continuity correction is necessary here because a discrete count of visits to a feeder must be mapped to the continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve. Also note that the complement, one minus the Z-table probability, must be used here.

$Z = \frac{x - \mu}{\sigma} = \frac{12.5 - 11.2}{1.8} \approx 0.72$

$P(Z \le 0.72) = 0.764237$

$P(N > 12) = 1 - P(Z \le 0.72) = 0.235763$

In English: “About 23.6% of the time a randomly drawn humming bird will make more than 12 feedings in an hour.”

Answer with complete notation like this, and not with a bare number like this: 0.235763

4f.)
Find the probability that a randomly drawn humming bird feeds less than 10 times in an hour. 8pt

Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Strict inequalities of “less than” must be converted numerically to “less than or equal to” if using a continuous distribution.

$P(N < 10) = P(N \le 9) \approx P(N \le 9.5)$

A continuity correction is necessary here because a discrete count of feedings must be mapped to the continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve.

$Z = \frac{x - \mu}{\sigma} = \frac{9.5 - 11.2}{1.8} = -0.9444$

$P(Z \le -0.94) = 0.173609$

In English: “About 17.4% of the time a randomly drawn humming bird will feed less than 10 times in an hour.”

Answer with complete notation like this: $P(N < 10) = P(Z \le -0.94) = 0.173609$, and not like this: 0.173609

4g.) Find the probability that a randomly drawn humming bird feeds anywhere between 9 and 13 times (inclusive) in a given hour. 8pt

Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Here the inclusive counts (9 and 13) are already specified in the problem.

$P(9 \le N \le 13) \approx P(8.5 \le N \le 13.5)$

A continuity correction is still necessary because a discrete count of visits to the feeder must be mapped to the continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve.

$Z_{Low} = \frac{x - \mu}{\sigma} = \frac{8.5 - 11.2}{1.8} = -1.5$

$Z_{High} = \frac{x - \mu}{\sigma} = \frac{13.5 - 11.2}{1.8} = 1.2778$

$P(9 \le N \le 13) = P(-1.50 \le Z \le 1.28)$

In English: “About 83.3% of the time a randomly drawn humming bird will feed between 9 and 13 times in an hour.”
$P(9 \le N \le 13) = P(Z \le 1.28) - P(Z \le -1.50)$

$P(9 \le N \le 13) = 0.899727 - 0.066807$

$P(9 \le N \le 13) = 0.83292$

Answer with complete notation like this: $P(9 \le N \le 13) = P(-1.50 \le Z \le 1.28) = 0.83292$, and not like this: 0.83292

4h.) Given that a randomly drawn humming bird feeds at the upper 4th percentile of feeding frequency, how many times does the bird feed in an hour on average? 8pt

$P(N \ge ??) = P(Z \ge ??) = 0.04$

The prompt for an upper 4th percentile means that we must search the body of the table for the “lower” 96th percentile, as the Z-tables only give lower tail probabilities.

a) Search the body of the table for the Z score associated with 0.96.

b) $P(N \le ???) = P(Z \le ???) = 0.96$

c) We find: $P(Z \le 1.75) \approx 0.96$

d) Using standardization we find x. But careful here! We must provide for the continuity correction, and for an upper tail this means that we must subtract 0.5 from the lower threshold to expand the probability space by half a visit to a humming bird feeder.

$Z = \frac{(x - 0.5) - \mu}{\sigma} \rightarrow 1.75 = \frac{x - 0.5 - 11.2}{1.8} = \frac{x - 11.7}{1.8} \rightarrow x = 14.85$

e) Solve for x and report your answer with full probability notation:

$P(N \ge 14.85) = P(Z \ge 1.75) \approx 0.04$

A humming bird at the upper 4th percentile will feed an average of 14.85 times per hour.

5a.) The Colorado pikeminnow (AKA white or Colorado river salmon, Ptychocheilus lucius) is the largest minnow native to North America, and it is well known for its spectacular fresh water spawning migrations and homing ability. Despite a massive recovery effort, its numbers decline. Hampered by a loss of habitat, the young of this once abundant fish are overwhelmed in their nursery habitat by invasive small fishes (such as red shiner and fathead minnow). Sites sampled from over 75 tributaries of the Colorado river found that both invasive species, red shiner and fathead minnow, were present in 38% of sampled sites. Given that 55% of the river sites had red shiners present and that 47% of the sampled sites had fathead minnows present, what portion of the sampled sites had either invasive species present?
Use probability notation to express your answer. 6pts

Declaration and notation: 2pts
R = Red shiners are present in sample.
F = Fatheads are present in sample.

$P(R \cup F) = P(R) + P(F) - P(R \cap F)$

$P(R \cup F) = 0.55 + 0.47 - 0.38$

$P(R \cup F) = 0.64$

5b.) What is the probability that a tributary site of the Colorado has fathead minnows but not red shiners? Use probability notation to express your answer. 6pts

$P(F \cap \bar{R}) = P(F) - P(F \cap R)$

$P(F \cap \bar{R}) = 0.47 - 0.38$

$P(F \cap \bar{R}) = 0.09$

5c.) What is the probability that a tributary site of the Colorado has either both invasive species of fathead minnows and red shiners present or has neither invasive species present? Use probability notation to express your answer. 8pts

$P((F \cap R) \cup (\bar{F} \cap \bar{R})) = 1 - P(F) - P(R) + 2P(F \cap R)$

$P((F \cap R) \cup (\bar{F} \cap \bar{R})) = 1 - 0.47 - 0.55 + 2 \cdot 0.38$

$P((F \cap R) \cup (\bar{F} \cap \bar{R})) = 0.74$

5d.) Find the probability that more than 6 of 9 Colorado tributaries have either red shiner or fathead minnow presence. Declare the variable. Use probability notation to express your answer. 8pts

X = The number of 9 Colorado tributary sites that have either invasive species. 1pt

$P(X > 6) = P(X \ge 7) = \binom{9}{7}(0.64)^7(0.36)^2 + \binom{9}{8}(0.64)^8(0.36)^1 + \binom{9}{9}(0.64)^9(0.36)^0$

$P(X > 6) = 0.205195 + 0.091198 + 0.018014 = 0.31441$

5e.) Find the probability that at least 8 of 9 Colorado tributaries have fathead minnow presence. 8pts

X = The number of 9 Colorado tributary sites that have fathead minnows. 1pt

$P(X \ge 8) = \binom{9}{8}(0.47)^8(0.53)^1 + \binom{9}{9}(0.47)^9(0.53)^0$

$P(X \ge 8) = 0.011358 + 0.001119 = 0.012477$

5e.) Find the probability that a majority of 9 Colorado tributaries have red shiner presence. 8pts

X = The number of 9 Colorado tributary sites that have red shiners. 1pt

$P(X \ge 5) = \binom{9}{5}(0.55)^5(0.45)^4 + \binom{9}{6}(0.55)^6(0.45)^3 + \binom{9}{7}(0.55)^7(0.45)^2 + \binom{9}{8}(0.55)^8(0.45)^1 + \binom{9}{9}(0.55)^9(0.45)^0$

$P(X \ge 5) = 0.260036 + 0.211881 + 0.110986 + 0.033912 + 0.004605 = 0.62142$

5f.) Find the first standard deviation window to express the typical range of 9 Colorado tributaries that have red shiner presence.
8pts

$\mu \pm \sigma = np \pm \sqrt{np(1-p)} = 9(0.55) \pm \sqrt{9(0.55)(1 - 0.55)} = 4.95 \pm 1.492481 = (3.458, 6.442)$

Notation 1pt

“About 3.5 to 6.4 of 9 Colorado tributaries will have red shiner presence.”

5g.) Describe the difference in meaning between these symbols:
a.) $\bar{x}$ and $\mu$?
b.) $\hat{p}$ and $p$?
c.) $\tilde{x}$ and $\eta$?

a.) $\bar{x}$ denotes the sample mean while $\mu$ denotes the population mean (or true mean).
b.) $\hat{p}$ denotes the sample proportion while $p$ denotes the population proportion (or true proportion).
c.) $\tilde{x}$ denotes the sample median while $\eta$ denotes the population median (or true median).

6.) The 4 basic categories of human blood types (O, A, B, and AB) are coupled with an Rh factor that is denoted with a plus (+) for its presence or minus sign (−) for its absence. This Rh factor is found predominantly in rhesus monkeys, and to varying degree in human populations. For the US population it is present in 83.3% of the population. Given that 40 people from the United States are randomly drawn, what is the probability that between 80% and 90% of the sample has an Rh+ factor for their blood type? Show all work and use proper notation for full credit. Finish with an English sentence. 14pts

Method 1: Declare proportions with the continuity correction.

$\hat{p} = \frac{Y}{n}$, with continuity correction $\hat{p} \pm \frac{0.5}{n}$

Declare the parameter: $\hat{p}$ = proportion of 40 randomly drawn Americans with Rh+ blood types. 2pt

Inadequate declarations (-2pt for all of these):
x = USA Rh+ blood
x = Rh+ blood types
x = Proportion of USA with Rh+
p = proportion of USA with Rh+

If you are going to use calculations that determine the probability that a sample proportion will range between some specified thresholds, then this is the variable, $\hat{p}$, that we must declare. Do not use “x” or “y” unless you are declaring for counts and plan to calculate accordingly (see the next page). Be sure to include the sample size as this will affect probability.

$P(0.8 \le \hat{p} \le 0.9) \approx P\left(0.80 - \frac{0.5}{40} \le \hat{p} \le 0.90 + \frac{0.5}{40}\right)$

$P(0.8 \le \hat{p} \le 0.9) \approx P(0.7875 \le \hat{p} \le 0.9125)$

Convert proportions to Z scores.
Use: $Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$ to find $P(0.8 \le \hat{p} \le 0.9) = P(?? \le Z \le ??)$

$Z_{Low} = \frac{0.7875 - 0.833}{\sqrt{\frac{0.833(1 - 0.833)}{40}}} = -0.7715$

$Z_{High} = \frac{0.9125 - 0.833}{\sqrt{\frac{0.833(1 - 0.833)}{40}}} = 1.3481$

$P(0.8 \le \hat{p} \le 0.9) \approx P(-0.7715 \le Z \le 1.3481)$

$P(0.8 \le \hat{p} \le 0.9) \approx P(Z \le 1.35) - P(Z \le -0.77)$

$P(0.8 \le \hat{p} \le 0.9) \approx 0.911492 - 0.220650$

$P(0.8 \le \hat{p} \le 0.9) \approx 0.690842$

Answer in English: “There is about a 69.1% chance that between 80% and 90% of 40 randomly drawn Americans will have Rh+ blood types.”

Problem #6) Method 2: Convert proportions to counts, then use the continuity correction.

$\hat{p} = \frac{Y}{n} \rightarrow Y = \hat{p} \cdot n$

Declare the parameter: Y = number of 40 randomly drawn Americans with Rh+ blood types. 2pt

Inadequate declarations (-2pt for all of these):
x = Rh+ blood
x = American Rh+ blood
x = Number of Americans with Rh+ blood.
P = proportion of Rh+ blood.

If you are going to use calculations that determine the probability that a sample count will range between some specified thresholds, then this is the variable that we must declare. Be sure to include the sample size as this will affect probability.

$P(0.8 \le \hat{p} \le 0.9) = P(0.8 \cdot 40 \le \hat{p} \cdot n \le 0.9 \cdot 40) = P(32 \le Y \le 36)$

$P(32 \le Y \le 36) \approx P(32 - 0.5 \le Y \le 36 + 0.5) = P(31.5 \le Y \le 36.5)$

After applying the continuity correction we convert the counts of Americans with Rh+ blood types to Z-scores using the binomial expressions for the mean and standard deviation: $\mu = n \cdot p$ and $\sigma = \sqrt{n \cdot p(1 - p)}$. Note that p = 0.833.

$Z = \frac{X - \mu}{\sigma} = \frac{X - n \cdot p}{\sqrt{n \cdot p(1 - p)}}$

$P(31.5 \le Y \le 36.5) = P\left(\frac{31.5 - 40(0.833)}{\sqrt{40(0.833)(1 - 0.833)}} \le Z \le \frac{36.5 - 40(0.833)}{\sqrt{40(0.833)(1 - 0.833)}}\right)$

$P(31.5 \le Y \le 36.5) = P(-0.7715 \le Z \le 1.3481) \approx P(-0.77 \le Z \le 1.35)$

We round to the nearest hundredth for the Z-table probability values.

$P(31.5 \le Y \le 36.5) \approx P(Z \le 1.35) - P(Z \le -0.77)$

$P(31.5 \le Y \le 36.5) \approx 0.911492 - 0.220650$

$P(31.5 \le Y \le 36.5) \approx 0.690842$

Answer in English: “There is about a 69.1% chance that between 80% and 90% of 40 randomly drawn Americans will have Rh+ blood types.”
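As a cross-check on Problem 6, the following Python sketch recomputes the continuity-corrected normal approximation from Method 2 and compares it with the exact binomial sum over the same counts. It is only an illustration under stated assumptions: the function names (`normal_cdf`, `normal_approx_count`, `exact_binomial`) are hypothetical and not part of the course materials, and the normal CDF is evaluated with `math.erf` instead of being read from the Z-table, so the approximation differs slightly from the table-rounded value of 0.690842.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Lower-tail standard normal probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_approx_count(lo, hi, n, p):
    """Normal approximation to P(lo <= Y <= hi) for Y ~ binomial(n, p).

    A half count is added on each side (the continuity correction), then the
    corrected counts are standardized with mu = n*p and sigma = sqrt(n*p*(1-p)).
    Returns the two Z scores and the approximate probability.
    """
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))
    z_lo = (lo - 0.5 - mu) / sigma
    z_hi = (hi + 0.5 - mu) / sigma
    return z_lo, z_hi, normal_cdf(z_hi) - normal_cdf(z_lo)


def exact_binomial(lo, hi, n, p):
    """Exact P(lo <= Y <= hi) from the binomial formula, summed over j."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(lo, hi + 1))


# Problem 6 of Sample Exam 1B: n = 40, p = 0.833, counts 32 through 36.
z_lo, z_hi, approx = normal_approx_count(32, 36, 40, 0.833)
exact = exact_binomial(32, 36, 40, 0.833)

print(f"Z_Low = {z_lo:.4f}, Z_High = {z_hi:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

The two Z scores should agree with the hand calculation (−0.7715 and 1.3481). The same function can be pointed at Problem 6 of Sample Exam 1A by calling `normal_approx_count(5, 15, 50, 0.25)`.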
Sample Exam 1B Solution 624 Stat 109 Formula Sheet For Exam 2 Name____________ 625 List of Formulas P(Type 1 error) = P(reject H 0 when H 0 is true) Binomial Distribution: n n j PY j p j 1 p j P(Type 2 error) = P(do not reject H 0 when H A is true) Power = 1-P(Type 2 error) = P(reject H 0 when H A is true) CI on proportion pˆ 1 pˆ Y where pˆ pˆ Z 2 n n ~ p Z 2 Optimal N: 2 𝑛= (𝑍𝛼⁄2 ) 4𝐸 2 ~ p 1 ~ p Y 2 where ~ p n4 n4 One Sample Proportion test: pˆ p 0 Ample Sample: npo(1 – po) > 10 Z Sample p 0 (1 p 0 ) n Independent Dependent One Sample t-test and Paired t-test: Two-Sample t-test: df = s12 s2 n 2n 1 2 2 2 s s n n 1 2 n1 1 n2 1 2 1 2 2 2 y1 y 2 t Sample 2 1 s12 s 22 n1 n2 t Sample y 0 sd n CI = y t n 2 When is known: CI = y Z 2 2 s s CI = y1 y 2 t 2 n1 n2 sd Optimal N: 𝑛=( 𝑍𝛼⁄ 𝜎 2 𝐸 Wilcoxon-Mann-Whitney test: U Sample max K1 , K 2 Sign test: BSample max N , N for 2-tailed K1 Count of values in sample 2 < sample 1. K 2 Count of values in sample 1 < sample 2. BSample N for upper tailed test Wilcoxon’s- Rank Sum test: n = size of the group with the larger rank-sum n’ = size of the group with the smaller rank-sum n ( n 1) U Sample = larger rank-sum 2 n 2 2 ) BSample N for lower tailed test Wilcoxon’s Signed-Rank test: 2-tailed: WSample max sum of - ranks , sum of ranks WSample sum of - ranks Lower tailed test WSample sum of ranks Upper tailed test Table 4: Test of hypothesis t-Table. 
Assume: Normal Distribution

Level of Significance for one-tailed test (first header row) and two-tailed test (second header row):

one-tailed:  0.20      .10      .05      .025     .01      .005     .0005
two-tailed:  0.40      .20      .10      .05      .02      .01      .001
df
1        1.37638   3.07768  6.31375  12.7062  31.8206  63.6570  636.607
2        1.06066   1.88562  2.91999  4.3027   6.9646   9.9248   31.598
3        0.97847   1.63778  2.35341  3.1825   4.5407   5.8410   12.924
4        0.94097   1.53320  2.13184  2.7764   3.7470   4.6041   8.610
5        0.91954   1.47589  2.01505  2.5706   3.3649   4.0321   6.869
6        0.90570   1.43977  1.94317  2.4469   3.1427   3.7075   5.959
7        0.89603   1.41493  1.89456  2.3646   2.9980   3.4995   5.408
8        0.88889   1.39685  1.85953  2.3060   2.8965   3.3554   5.041
9        0.88340   1.38303  1.83313  2.2622   2.8215   3.2498   4.781
10       0.87906   1.37215  1.81244  2.2281   2.7638   3.1693   4.587
11       0.87553   1.36342  1.79588  2.2010   2.7181   3.1058   4.437
12       0.87261   1.35621  1.78228  2.1788   2.6810   3.0545   4.318
13       0.87015   1.35019  1.77094  2.1604   2.6503   3.0123   4.221
14       0.86806   1.34502  1.76133  2.1448   2.6245   2.9768   4.140
15       0.86625   1.34060  1.75307  2.1315   2.6025   2.9467   4.073
16       0.86467   1.33677  1.74587  2.1199   2.5835   2.9208   4.015
17       0.86328   1.33338  1.73962  2.1098   2.5669   2.8982   3.965
18       0.86205   1.33036  1.73407  2.1009   2.5524   2.8784   3.922
19       0.86095   1.32775  1.72911  2.0930   2.5395   2.8610   3.883
20       0.85996   1.32533  1.72474  2.0860   2.5280   2.8453   3.849
21       0.85907   1.32320  1.72074  2.0796   2.5176   2.8314   3.819
22       0.85827   1.32125  1.71715  2.0739   2.5083   2.8187   3.792
23       0.85753   1.31944  1.71389  2.0687   2.4999   2.8074   3.768
24       0.85686   1.31783  1.71087  2.0639   2.4921   2.7969   3.745
25       0.85624   1.31636  1.70813  2.0595   2.4851   2.7874   3.725
26       0.85567   1.31497  1.70563  2.0556   2.4786   2.7787   3.707
27       0.85514   1.31369  1.70326  2.0519   2.4727   2.7707   3.690
28       0.85465   1.31253  1.70112  2.0484   2.4671   2.7633   3.674
29       0.85419   1.31142  1.69911  2.0452   2.4620   2.7564   3.659
30       0.85377   1.31038  1.69724  2.0423   2.4573   2.7500   3.646
40       0.85070   1.30308  1.68386  2.0211   2.4232   2.7045   3.551
50       0.84887   1.29868  1.67589  2.0085   2.4033   2.6778   3.496
60       0.84765   1.29581  1.67065  2.0003   2.3902   2.6604   3.460
70       0.84679   1.29376  1.66692  1.9944   2.3808   2.6480   3.435
80       0.84614   1.29222  1.66413  1.9901   2.3739   2.6387   3.416
90       0.84563   1.29103  1.66196  1.9867   2.3685   2.6316   3.402
100      0.84523   1.29007  1.66024  1.9840   2.3642   2.6259   3.391
1000     0.841981  1.28200  1.64600  1.9620   2.3300   2.5810   3.300
∞        0.841621  1.28155  1.64485  1.9600   2.3264   2.5758   3.291

Table 3B: Test of Hypothesis z-Table.

One-tailed:  .10    .05    .025   .01    .005   .001   .0005  .00005  .000005
Two-tailed:  .20    .10    .05    .02    .01    .002   .001   .0001   .00001
z:           1.282  1.645  1.960  2.326  2.576  3.090  3.291  3.891   4.491

Table 9A: Critical Chi-Squared values

Level of Significance:
df     0.20    0.10    0.05    0.02    0.01    0.001   0.0001
1      1.64    2.71    3.84    5.41    6.63    10.83   15.14
2      3.22    4.61    5.99    7.82    9.21    13.82   18.42
3      4.64    6.25    7.81    9.84    11.34   16.27   21.11
4      5.99    7.78    9.49    11.67   13.28   18.47   23.51
5      7.29    9.24    11.07   13.39   15.09   20.51   25.74
6      8.56    10.64   12.59   15.03   16.81   22.46   27.86
7      9.80    12.02   14.07   16.62   18.48   24.32   29.88
8      11.03   13.36   15.51   18.17   20.09   26.12   31.83
9      12.24   14.68   16.92   19.68   21.67   27.88   33.72
10     13.44   15.99   18.31   21.16   23.21   29.59   35.56

Stat 109: Nonparametric Critical values for paired medians

Table 7: B-Critical Values for the Sign Test: Note that because the null distribution is discrete, the actual tail probability corresponding to a given critical value is typically somewhat less than the column heading. Accordingly, if BSample = BCritical, bracket the p-value to the right of any matching sample and critical values.

Nominal Tail Probability
2 tail:  .20   .10   .05   .02   .01   .002  .001
1 tail:  .10   .05   .025  .01   .005  .001  .0005
nd
1
2
3
4
5        5     5
6        6     6     6
7        6     7     7     7
8        7     7     8     8     8
9        7     8     8     9     9
10       8     9     9     10    10    10
11       9     9     10    10    11    11    11
12       9     10    10    11    11    12    12
13       10    10    11    12    12    13    13
14       10    11    12    12    13    13    14
15       11    12    12    13    13    14    14

Table 8: W-Critical Values for the Wilcoxon Signed-rank Test: Note that because the null distribution is discrete, the actual tail probability corresponding to a given critical value is typically somewhat less than the column heading.
Accordingly, if WSample = WCritical, bracket the p-value to the right of any matching sample and critical values.

Nominal Tail Probability
2 tail:  .20   .10   .05   .02   .01   .002  .001
1 tail:  .10   .05   .025  .01   .005  .001  .0005
nd
1
2
3
4        10
5        13    15
6        18    19    21
7        23    25    26    28
8        28    31    33    35    36
9        35    37    40    42    44
10       41    45    47    50    52    55
11       49    53    56    59    61    65    66
12       57    61    65    69    71    76    77
13       65    70    74    79    82    87    89
14       75    80    84    90    93    99    101
15       84    90    95    101   105   112   114

Stat 109: Nonparametric Critical values for independent medians

Stat 109 Chi-square Critical Values for Tail End Curve Areas

Stat 109 Exam 2A Sample Exam

1.) In a study of the development of the thymus gland, researchers weighed the glands of ten chick embryos. Five of the embryos had been incubated for 14 days and another five had been incubated for 15 days. The thymus gland weights in mg for both groups are shown in the table.

        14 days   15 days   Difference
        29.6      32.7      -3.10
        21.5      40.3      -18.8
        28.0      23.7      4.3
        34.6      25.2      9.4
        44.9      24.2      20.7
n       5         5         5
Mean    31.72     29.22     2.5
SD      8.73      7.19      14.72

1a.) Run an appropriate 7 step hypothesis test to determine if the mean thymus gland weight at day 14 is significantly different than the weight at day 15.

1b.) Note that the chicks that were incubated longer had a smaller mean thymus gland weight. Is this “reverse” result surprising, or could it easily be attributed to chance? Explain why or why not.

2.) Suppose we have conducted a t test, with α = 0.05, and p-value = 0.08. Determine whether the following statements are true or false.

2a) We reject H0 when α = 0.05.
2b) We would reject H0 if α = 0.10.
2c) If H0 is true, then the probability of getting a sample-t at least as extreme as the one generated by our sample data is 8%.
2d) Suppose the null hypothesis was rejected. Then a type 2 error might have been made in the hypothesis conclusion.

3.)
In a pharmacological study, researchers measured the concentration of the brain chemical dopamine in ng/g in six rats exposed to toluene and six control rats. The concentrations in the striatum region of the brain for both groups are shown in the table. Suppose that at least one of the data sets is not normally distributed. Run an appropriate 7 step hypothesis test to determine if the dopamine level found in rats exposed to toluene is greater than that found in the controls.

Toluene   Control   Difference
3420      1820      1600
2314      1843      471
1911      1397      541
2464      1803      661
2781      2593      188
2803      1990      813

4.) In an experiment to compare two diets for fattening beef steers, nine pairs of steers were chosen from the herd; members of each pair were matched as closely as possible with respect to age and hereditary factors. The members of each pair were randomly allocated, one to each diet. The following table shows the weight gain in pounds of the animals over a 140 day test period on diet 1 and diet 2.

        Diet 1   Diet 2   Difference
        596      498      98
        422      460      -38
        524      468      56
        454      458      -4
        538      530      8
        552      482      70
        478      528      -50
        564      598      -34
        556      456      100
Mean    520.4    497.6    22.9
SD      57.1     47.3     59.3

4a.) Assume that all three columns are normally distributed. Run an appropriate 7 step hypothesis test at α = 0.10 to determine whether there is a significant difference in the weight gained between the two diets.

4b.) Assume that the column data of the steer diet comparisons are not normally distributed. Run an appropriate 7 step hypothesis test at α = 0.01 to determine whether there is a significant difference in the weight gained between the two diets.

4c.) Calculate the exact p-value of the data as determined by the sign test.

Stat 109 Exam 2A Solutions Sample Exam

1.) In a study of the development of the thymus gland, researchers weighed the glands of ten chick embryos. Five of the embryos had been incubated for 14 days and another five had been incubated for 15 days.
The thymus gland weights in mg for both groups are shown in the table.

        14 days   15 days   Difference
        29.6      32.7      -3.10
        21.5      40.3      -18.8
        28.0      23.7      4.3
        34.6      25.2      9.4
        44.9      24.2      20.7
n       5         5         5
Mean    31.72     29.22     2.5
SD      8.73      7.19      14.72

1a.) Run an appropriate 7 step hypothesis test to determine if the mean thymus gland weight at day 14 is significantly different than the weight at day 15.

Total 24 pts:

Step 1: Declare normality: “We assume normality.” 1pt (For all summarized data, plus a t-test is asked for.)

Step 2: Declare parameter:
2pts: The data sets are independent: the chick embryos were from different data sets and they are not paired, even if the differences between them were provided. We declare 2 means.
3pt: $\mu_1$ = Mean thymus gland weight in mg of chicks at 14 days of incubation.
$\mu_2$ = Mean thymus gland weight in mg of chicks at 15 days of incubation.

Step 3: Declare the Hypothesis and LOS: $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \ne \mu_2$, $\alpha = 0.05$

3pt: Step 4: Calculate the df.
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \dfrac{\left(\dfrac{8.73^2}{5} + \dfrac{7.19^2}{5}\right)^2}{\dfrac{(8.73^2/5)^2}{5 - 1} + \dfrac{(7.19^2/5)^2}{5 - 1}}$
df = 7.7165, but we round down to df = 7.

2pts: Step 5: Find the Critical value and draw the decision line.
Reject Ho | Do Not Reject Ho | Reject Ho
tCritical = -2.3646 (left boundary), tCritical = +2.3646 (right boundary)

3pts: Step 6: Calculate the T.S.
$t_{Sample} = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{31.72 - 29.22}{\sqrt{\dfrac{8.73^2}{5} + \dfrac{7.19^2}{5}}}$
tSample = 0.49428

10pts: Step 7:
Stats conclusion: 2pts: At the 5% LOS we do not reject $H_0: \mu_1 = \mu_2$, because:
3pts: $-t_{Critical} < t_{Sample} < t_{Critical}$, which yields: -2.3646 < 0.49428 < 2.3646
3pts: $p > \alpha$: (p > 0.40) > 0.05
2pts: English conclusion: There is not a significant difference in the weight of the chick thymus gland between day 14 and day 15.

1b.) Note that the chicks that were incubated longer had a smaller mean thymus gland weight. Is this “reverse” result surprising, or could it easily be attributed to chance? Explain why or why not.
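The df and test statistic in Problem 1a can be reproduced from the summary statistics alone. The following is a minimal sketch, not a replacement for the 7 step write-up: the function name `welch` is a hypothetical helper, and the critical value is typed in from Table 4 (df = 7, two-tailed 0.05) rather than computed.

```python
from math import floor, sqrt

def welch(y1, s1, n1, y2, s2, n2):
    """Return (t_sample, df) for the two-sample t-test with unpooled variances."""
    v1 = s1**2 / n1                   # s1^2 / n1
    v2 = s2**2 / n2                   # s2^2 / n2
    t = (y1 - y2) / sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics for the 14-day and 15-day embryos
t_sample, df = welch(31.72, 8.73, 5, 29.22, 7.19, 5)
df_table = floor(df)                  # round down to the next table row

t_critical = 2.3646                   # Table 4: df = 7, two-tailed alpha = 0.05
reject = abs(t_sample) > t_critical   # two-tailed decision

print(f"t = {t_sample:.5f}, df = {df:.4f} -> use df = {df_table}, reject H0: {reject}")
```

Keeping the whole df expression in one function mirrors the advice to enter the formula in a single calculator pass instead of rounding intermediate values.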
2pts: The reverse result could easily be attributed to chance: a p-value > 0.40 tells us that there is no significant difference between the mean weights of the two samples.

2.) Suppose we have conducted a t test, with α = 0.05, and p-value = 0.08. Determine whether the following statements are true or false.

2a) False (1pt): We reject H0 when α = 0.05.
2b) True (1pt): We would reject H0 if α = 0.10.
(We reject H0 whenever p < α.)
2c) True (1pt): If H0 is true, then the probability of getting a sample-t at least as extreme as the one generated by our sample data is 8%.
(This is the basic description of the p-value.)
2d) False (1pt): Suppose the null hypothesis was rejected. Then a type 2 error might have been made in the hypothesis conclusion.
(A type 1 error occurs when we reject H0 incorrectly.)
(A type 2 error occurs when we fail to reject H0 incorrectly.)

3.) In a pharmacological study, researchers measured the concentration of the brain chemical dopamine in ng/g in six rats exposed to toluene and six control rats. The concentrations in the striatum region of the brain for both groups are shown in the table. Suppose that at least one of the data sets is not normally distributed. Run an appropriate 7 step hypothesis test to determine if the dopamine level found in rats exposed to toluene is greater than that found in the controls.

Toluene   Control   Difference
3420      1820      1600
2314      1843      471
1911      1397      541
2464      1803      661
2781      2593      188
2803      1990      813

Solution: Since normality fails we are performing a hypothesis test comparing the median dopamine levels in ng/g between toluene exposed rats and controls. We apply the Wilcoxon-Mann-Whitney test with ordered counts:

Total 23 pts:

Step 1: Declare normality: We assume the data set is not normally distributed.

Step 2: 2pts: Declare Parameters:
$\eta_1$ = Median dopamine level in ng/g for rats exposed to toluene.
$\eta_2$ = Median dopamine level in ng/g for the control rats.

Step 3: 3pts: Declare the Hypothesis and the LOS: $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 > \eta_2$, $\alpha = 0.05$

Step 4: 3pts: Draw the decision line and find UCritical:
Do Not Reject Ho | Reject Ho
UCritical = 29

#3, Step 5: 5pts: Calculate the T.S. Make an ordered count of the grouped data:
Toluene:  1911  2314  2464  2781  2803  3420
Control:  1397  1803  1820  1843  1990  2593
Sum both Counts:
$K_1$ = 4 + 5 + 5 + 6 + 6 + 6 = 32
$K_2$ = 0 + 0 + 0 + 0 + 1 + 3 = 4
Take the larger count as the USample: t.s. = $U_{Sample} = \max(K_1, K_2)$ = 32

#3, Step 6: Bracket the p-value: Go to Table 6 (Wilcoxon-Mann-Whitney Statistic) to find the UCritical and p-values, using n = larger sample size and n' = smaller sample size. Then n = 6 and n’ = 6. A 1-tailed α = 0.05 yields: USample = 32, UCritical = 29, 0.01 < p < 0.025

8pts: Step 7:
Stats conclusion: 2pts: At the 5% LOS we reject $H_0: \eta_1 = \eta_2$, because:
3pts: $U_{Sample} > U_{Critical}$, which yields: 32 > 29
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.05
2pts: English conclusion: The median dopamine level in ng/g of toluene exposed rats is significantly greater than that found in control rats.

4.) In an experiment to compare two diets for fattening beef steers, nine pairs of steers were chosen from the herd; members of each pair were matched as closely as possible with respect to age and hereditary factors. The members of each pair were randomly allocated, one to each diet. The following table shows the weight gain in pounds of the animals over a 140 day test period on diet 1 and diet 2.

        Diet 1   Diet 2   Difference
        596      498      98
        422      460      -38
        524      468      56
        454      458      -4
        538      530      8
        552      482      70
        478      528      -50
        564      598      -34
        556      456      100
Mean    520.4    497.6    22.9
SD      57.1     47.3     59.3

4a.) Assume that all three columns are normally distributed. Run an appropriate 7 step hypothesis test at α = 0.10 to determine whether there is a significant difference in the weight gained between the two diets.

Solution to #4a) Total 21 pts:

Step 1: Declare normality: “We assume normality.”

Step 2: Declare parameter: The data sets are paired for comparison and are therefore dependent. We declare a mean difference.
2pts: $\mu_d$ = Mean difference of the weight gained in pounds between 2 diets: Diet 1 - Diet 2.

3pt: Step 3: Declare the LOS and Hypothesis: $H_0: \mu_d = 0$, $H_A: \mu_d \ne 0$, $\alpha = 0.10$

3pts: Step 4: Draw the decision line and find tCritical:
Reject Ho | Do Not Reject Ho | Reject Ho
tCritical = -1.860 (left boundary), tCritical = +1.860 (right boundary)

4pts: Step 5: Calculate the T.S.
$t_{Sample} = \dfrac{\bar{x} - \mu_0}{s_d/\sqrt{n}} = \dfrac{22.9 - 0}{59.3/\sqrt{9}} = 1.1585$

Step 6: Bracket the p-value. From a 2-tailed α = 0.10 with df = $n_d - 1$ = 8 we find: p-value: (0.20 < p < 0.40)

7pts: Step 7:
Stats conclusion: 1pts: At the 10% LOS we do not reject $H_0: \mu_d = 0$, because:
3pts: $-t_{Critical} < t_{Sample} < t_{Critical}$, which yields: -1.860 < 1.1585 < 1.860
3pts: $p > \alpha$: (0.20 < p < 0.40) > 0.10
2pts: English conclusion: There is not a significant difference in the weight gained in pounds between the steers on diet 1 and diet 2.

4b.) Assume that the column data of the steer diet comparisons are not normally distributed. Run an appropriate 7 step hypothesis test at α = 0.01 to determine whether there is a significant difference in the weight gained between the two diets.

Solution to #4b) Total 21 pts:

Step 1: Declare normality: “We assume the data is not normally distributed.”

Step 2: Declare parameter: The data sets are paired for comparison and are therefore dependent. We declare a median difference.
2pts: $\eta_d$ = Median difference of the weight gained in pounds between 2 diets: Diet 1 - Diet 2.

3pt: Step 3: Declare hypothesis and the LOS: $H_0: \eta_d = 0$, $H_A: \eta_d \ne 0$, $\alpha = 0.01$

3pts: Step 5: Calculate the T.S.: $N_+ = 5$, $N_- = 4$, $B_{Sample} = \max(N_+, N_-) = 5$
We use a sign test (Table 7 page 684) for the differences of medians.
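The sign-test counts for the steer data, and the exact binomial tail probability they imply, can be sketched as follows. The variable names are illustrative; the script assumes, as the sign test does under the null hypothesis, that each nonzero difference is positive with probability 0.5, and it drops any zero differences before counting.

```python
from math import comb

# Diet 1 - Diet 2 weight-gain differences for the nine pairs of steers
diffs = [98, -38, 56, -4, 8, 70, -50, -34, 100]

nonzero = [d for d in diffs if d != 0]      # zero differences are discarded
n_d = len(nonzero)
n_plus = sum(d > 0 for d in nonzero)        # count of + signs
n_minus = sum(d < 0 for d in nonzero)       # count of - signs
b_sample = max(n_plus, n_minus)             # 2-tailed sign-test statistic

# P(Y >= b_sample) for Y ~ binomial(n_d, 0.5); every term shares the factor 0.5**n_d
upper_tail = sum(comb(n_d, k) for k in range(b_sample, n_d + 1)) / 2**n_d
p_two_tailed = min(1.0, 2 * upper_tail)     # double the tail for a 2-tailed test

print(n_plus, n_minus, b_sample, upper_tail, p_two_tailed)
```

Because success and failure are equally likely, the binomial sum reduces to a sum of combinations divided by 2 raised to the number of nonzero differences.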
Differences: 98  -38  56  -4  8  70  -50  -34  100

3pts: Step 4: Draw the decision line and find BCritical:
Do Not Reject Ho | Reject Ho
BCritical = 9

Step 6: Bracket the p-value. From Table 7, a 2-tailed α = 0.01 with $n_d = 9$ gives: p-value: (p > 0.20)

8pts: Step 7:
Stats conclusion: 2pts: At the 1% LOS we do not reject $H_0: \eta_d = 0$, because:
3pts: $B_{Sample} < B_{Critical}$, which yields: 5 < 9
3pts: $p > \alpha$: (p > 0.20) > 0.01
2pts: English conclusion: There is not a significant difference in the weight gained in pounds between the steers on diet 1 and diet 2.

4c.) 5pts: Calculate the exact p-value of the data as determined by the sign test.

We calculate the p-value under the assumption that the null hypothesis is correct, that is, that positive and negative signs are equally likely in the median difference data set, so that either sign has a 50% chance of occurrence. We take the larger count of signs as a threshold value (there were 5 positive values) and sum the binomial probabilities upward to the most extreme case (n = 9). Note that if there were a zero difference, the zero would be thrown out and n = 8. Also, all 2-tailed hypotheses must have these probabilities doubled. Then for a one tailed hypothesis the calculation is:

$P(Y \ge 5) = \binom{9}{5}0.5^5\,0.5^4 + \binom{9}{6}0.5^6\,0.5^3 + \binom{9}{7}0.5^7\,0.5^2 + \binom{9}{8}0.5^8\,0.5^1 + \binom{9}{9}0.5^9\,0.5^0 = 0.5$

Recognize that since success and failure both equal 50%, the binomial calculation can be reduced to:

$P(Y \ge 5) = 0.5^9\left[\binom{9}{5} + \binom{9}{6} + \binom{9}{7} + \binom{9}{8} + \binom{9}{9}\right] = 0.5$

See the last page of Week 9 Day 2 lecture notes for more examples of the calculation of the exact p-value.

Because this is a 2-tailed hypothesis: p-value = $2 \cdot P(Y \ge 5) = 2(0.5) = 1$, so p = 1 (p > 0.999).

Stat 109 Exam 2 Prep
A Brief Note to Students who are Struggling to Pass:

A key concept from Day 1: Notation
We denote the true population parameter (mean, median, sd, or proportion, etc.) with Greek letters: $\mu$, $\eta$, $\sigma$, $p$, etc. The random samples that we use as our best estimate for these parameters are denoted with accented notations: $\bar{x}$, $\tilde{x}$, $s$, $\hat{p}$, etc.
Confusing the notation of parameters and sample data means that you are unable to express the difference between the true population value and the sample value used to estimate it (even if you do understand the difference, your choice of notation says otherwise). Worse, switching the notation for the proportion with that of the mean, or that of the median with that of the proportion, shows a lack of understanding of what each symbol means. Previous exams from struggling students have had an accumulated loss of 10 to 25 points for a chronic misuse of notation. Refer to the lecture notes from Day 1 to review the key concept of notation and its use.

Dependent vs Independent Data:
If we compare the difference between two paired data sets, for example (before – after) or (Method 1 – Method 2), the data set is dependent. While most students understand this, the gap in understanding for the struggling student comes in what to do with dependent data. Keep in mind that a confidence interval or hypothesis test will be applied to the column of differences of the dependent data. Taken as one data set, the sample of differences for the mean will be treated the same as a t-interval or a one sample t-test. The sample of differences for the median will be treated the same as an s-interval, a sign test, or a Wilcoxon signed-rank test. The notation, however, will be $\mu_d$ (mean difference) or $\eta_d$ (median difference). It is essential that in interpreting the result of the confidence interval or hypothesis test you match the column of differences (first – second) with the correct scenario: it is either a (small – large) or a (large – small) scenario.

Independent data will also be presented in a two column format, but here the columns will remain distinct because we are making a comparison of two distinct populations. Two means are compared in a two-sample t-test and denoted as $\mu_1$ and $\mu_2$. Two medians are compared in a Wilcoxon-Mann-Whitney test or Wilcoxon’s Rank-Sum test and denoted as $\eta_1$ and $\eta_2$.
Again, it is essential that in interpreting the result of the confidence interval or hypothesis test you match the column of differences (first – second) with the correct (small – large) or (large – small) scenario.

T and Z Lower tail alternate hypotheses: $H_A: \mu < \mu_0$, $H_A: p < p_0$
Because t and z distributions are symmetrical about zero, a less-than inequality in the alternate hypothesis should serve as a reminder that the critical value will be negative. Neglect this and you will have a false conclusion.

Forming the hypothesis statement:
You must know by now that when you run a hypothesis test, it is the null hypothesis that takes the case of equality (even if this equality statement for H0 implies the opposite of HA). The prompt to run the hypothesis (or suspicion) is always translated to the alternate hypothesis, not the null. The null hypothesis will use either =, ≤, or ≥, while the alternate hypothesis will use one of 3 inequalities: ≠, <, or >. If rejecting H0 it must be that p < α, and vice-versa.

The helpful hint of using the inequality of < or > within the alternate hypothesis, HA, as an arrow to determine the region of rejection on the decision line applies ONLY to Z-tables and t-tables. It is an error to extend this idea to the decision lines of other hypothesis tests. The decision lines for ALL other hypothesis tests (medians, Chi-square, ANOVA, etc.) will ALWAYS have the rejection region at the far right.
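The two decision-line conventions described above can be written as a pair of small functions. This is a hedged sketch, not part of the course materials: the function names and the `tail` labels are hypothetical, and treating a sample value equal to the critical value as a rejection for the rank and sign statistics is an assumption based on the note under Table 7. The calls reuse statistics and critical values from the Exam 2A solutions.

```python
def reject_t_or_z(stat, crit, tail):
    """Decision for t and z tests; crit is the positive table value.

    tail is 'two', 'lower', or 'upper', matching the inequality in HA.
    """
    if tail == "two":
        return abs(stat) > crit       # rejection regions in both tails
    if tail == "lower":
        return stat < -crit           # HA uses <, so the critical value is negative
    if tail == "upper":
        return stat > crit            # HA uses >
    raise ValueError("tail must be 'two', 'lower', or 'upper'")

def reject_rank_or_sign(stat, crit):
    """Decision for B, U, and W statistics: rejection region is always at the far right."""
    return stat >= crit               # assumption: equality counts as reaching the column

print(reject_t_or_z(0.49428, 2.3646, "two"))   # thymus data, two-sample t-test
print(reject_t_or_z(1.1585, 1.860, "two"))     # steer data, paired t-test
print(reject_rank_or_sign(32, 29))             # dopamine data, Wilcoxon-Mann-Whitney
print(reject_rank_or_sign(5, 9))               # steer data, sign test
```

Note that the direction of HA never enters `reject_rank_or_sign`; for those tests the direction only determines which count or rank sum is used as the sample statistic.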
Look, for t-tests (and z-tests too) this is valid because:

2-tailed decision for independent and dependent means:
$H_0: \mu_d = 0$, $H_A: \mu_d \ne 0$;  $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \ne \mu_2$
Reject Ho | Do Not Reject Ho | Reject Ho   (boundaries at -tCritical and +tCritical)

Lower decision for independent and dependent means:
$H_0: \mu_d = 0$, $H_A: \mu_d < 0$;  $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 < \mu_2$
Reject Ho | Do Not Reject Ho   (boundary at -tCritical)

Upper decision for independent and dependent means:
$H_0: \mu_d = 0$, $H_A: \mu_d > 0$;  $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 > \mu_2$
Do Not Reject Ho | Reject Ho   (boundary at +tCritical)

But when we look at hypothesis tests for Medians we have only ONE decision line and it ALWAYS has the rejection region at the far right. Look at the table data to see why.

$H_0: \eta_d = 0$, $H_A: \eta_d \ne 0$;  $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 \ne \eta_2$
$H_0: \eta_d = 0$, $H_A: \eta_d < 0$;  $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 < \eta_2$
$H_0: \eta_d = 0$, $H_A: \eta_d > 0$;  $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 > \eta_2$
Do Not Reject Ho | Reject Ho   (boundary at BCritical, UCritical, or WCritical)

Do NOT apply the concept of Degrees of Freedom to tables of critical values for a median hypothesis. For example, the notation $n_d$ stands for the number of differences. If there are 6 differences being compared, DO NOT subtract one and jump up to row 5 to find the critical value. The concept of degrees of freedom is applied to t-tables, chi-square tables, and F-tables, but NOT to the hypothesis tests of median values.

All Confidence Intervals will take their critical values from the columns of 2-tailed tests, NOT one tailed tests. Look at why this would be so: (Lower bound, Upper bound) has 2 tails.

Reading the values of the confidence interval from left to right should follow that of a number line from smallest to largest:
Like this: (Smallest, Largest)
Not like this: (Largest, Smallest)

Calculator entry.
About one out of 4 science majors do not know how to use their calculators correctly. Does the first row of calculations look familiar? This method is time consuming, inefficient, and often introduces round-off error.
Learn to follow the approach of the second line of calculations. If you make a mistake with the second method, at least you can trace your error within the calculator screen; this is not so easily done with the herky-jerky processing of the first method.

First method (this takes 3 times as many key-strokes and often introduces round-off error):
$df = \dfrac{\left(\dfrac{3.45^2}{32} + \dfrac{2.68^2}{25}\right)^2}{\dfrac{(3.45^2/32)^2}{31} + \dfrac{(2.68^2/25)^2}{24}} = \dfrac{(0.371953 + 0.287292)^2}{0.004462 + 0.003439} = \dfrac{0.434601}{0.007901} = 55.0058$

Second method (pick up the calculator ONE TIME only!):
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \dfrac{\left(\dfrac{3.45^2}{32} + \dfrac{2.68^2}{25}\right)^2}{\dfrac{(3.45^2/32)^2}{31} + \dfrac{(2.68^2/25)^2}{24}} = 55.0058$

The most common mistakes on Exam 2 are listed below:

A Crucial Point: Not recognizing the difference in the wording between problems using independent versus dependent data sets is the key indicator of a low score on Exam 2. Choosing the wrong test results in a loss of half credit for a given problem. Partial credit can be earned if no further mistakes are made with the application of the wrong hypothesis test.

Not recognizing that confidence intervals are two tailed! We should use two tailed critical values in constructing confidence intervals, not one tailed critical values.

Neglecting to learn the calculation of the binomial formula with your calculator. You know that it is very likely that binomial calculations will be required on the 2nd exam. They were also required on the 1st exam. Why not just admit that you need to come in to office hours and get help with using your calculator? Once you take the time to learn it, the binomial formula is pretty straightforward. See the last pages of Week 9 Day 2 Lecture for calculating exact p-values.

Not recognizing that there are 2 optimal N formulas: one for sample means, the other for sample proportions.

Stat 109 Exam 2B Prep Sample Exam

1.) In a pediatric clinic a study is carried out to see how effective an over the counter medication is in reducing temperature.
Ten 5-year-old children suffering from influenza had their temperature (°F) taken immediately before and 1 hour after administration of the medication. The results are given in the table below. Does the evidence support the case that the medication reduces fever? Assume normality passes for all three data columns. Test the hypothesis at the 10% LOS. Total 20 pts:

Patient   Before Medication   After Medication   Difference
1         102.4               99.6               2.8
2         101.9               100.2              1.7
3         103.0               101.4              1.6
4         101.2               99.8               1.4
5         100.7               100.7              0
6         102.5               101.2              1.3
7         102.8               100.7              2.1
8         101.1               102.3              -1.2
9         101.9               102.3              -0.4
10        101.4               100.2              1.2
x̄ =       101.89              100.86             1.05
sd =      0.778               0.952              1.219

2.) Repeat the previous problem with a full 7 step hypothesis under the assumption that normality does not pass. Use Wilcoxon’s Signed Rank Test. Does the evidence support the case that the medication reduces fever? Test the hypothesis at the 10% LOS. Total 25 pts:

Diff.:  2.8  1.7  1.6  1.4  0.0  1.3  2.1  -1.2  -0.4  1.2

3.) Regardless of the conclusion you found in the first problem, suppose your hypothesis found a p-value of 0.06. Interpret the meaning of the p-value in terms of the context of the problem (temperature, medication, etc.). If p = 0.06 can be read as a percentage value, then 6% of what must be true? 8pts.

4.) Regardless of the conclusion you found in the hypothesis test for 5-year-olds’ temperature reduction under flu medication, explain the consequences of a rejection error. What does the data lead us to conclude, and what is actually true? 6pt

5.) Find a 90% confidence interval on the mean body mass index (BMI = kg/m²) of high school boys given that a random sample of 134 high school boys yields a mean BMI of 21.8 with a standard deviation of s = 3.4. Answer with an English sentence that uses the bounds of the CI. 6pts

6.) Suppose we want to estimate the true mean BMI for high school boys to within 0.01 BMI units.
How many high school boys must be sampled for the margin of error of the 90% confidence interval to be within 0.01 BMI units of the true mean BMI value? Assume a reliable standard deviation of σ = 3.51. 6pts

7.) Interpret the meaning of confidence in the context of the 90% confidence interval you constructed on the mean BMI of high school boys. Use the specific interval you constructed to help answer the question: 90% of what must be true? 6pt

8.) A student sought to demonstrate that soybeans inoculated with nitrogen-fixing bacteria yield more and grow more adequately without the use of expensive and environmentally deleterious synthesized fertilizers. The trial was conducted under controlled conditions with uniform amounts of soil. There were 8 inoculated plants compared against 8 uninoculated plants. The soybean pod weight (in grams) was recorded for each plant. Does the evidence support the student’s suspicion? Assume that the data set does NOT pass normality. Test the hypothesis at the 10% LOS. Total 23 pts:

Plot   Inoculated   Uninoculated   Difference
1      1.76         0.49           1.27
2      1.45         0.85           0.60
3      1.03         1.00           0.03
4      1.53         1.53           0.00
5      2.34         1.01           1.33
6      1.96         0.75           1.21
7      1.79         2.11           -0.32
8      1.21         0.92           0.29
x̄ =    1.634        1.083          1.551
sd =   0.420        0.509          0.651

Stat 109 Exam 2B Prep Solution

1.) In a pediatric clinic a study is carried out to see how effective an over the counter medication is in reducing temperature. Ten 5-year-old children suffering from influenza had their temperature (°F) taken immediately before and 1 hour after administration of the medication. The results are given in the table below. Does the evidence support the case that the medication reduces fever? Assume normality passes for all three data columns. Test the hypothesis at the 10% LOS.
Name_____________

Patient   Before Medication   After Medication   Difference
1         102.4               99.6               2.8
2         101.9               100.2              1.7
3         103.0               101.4              1.6
4         101.2               99.8               1.4
5         100.7               100.7              0
6         102.5               101.2              1.3
7         102.8               100.7              2.1
8         101.1               102.3              -1.2
9         101.9               102.3              -0.4
10        101.4               100.2              1.2
x̄ =       101.89              100.86             1.05
sd =      0.778               0.952              1.219

Total 20 pts:

Step 1: Declare normality: “We assume normality.”

Step 2: Declare parameter: The data sets are paired for comparison and are therefore dependent. We declare a mean difference.
2pts: $\mu_d$ = Mean difference between before and after temperatures (Fahrenheit, °F) for 5 year olds taking medication for flu symptoms.

1pt: Step 3: Declare the LOS: $\alpha = 0.10$

4pts: Step 4: Declare hypothesis: $H_0: \mu_d = 0$, $H_A: \mu_d > 0$

3pts: Step 5: Calculate the t-sample value:
$t_{Sample} = \dfrac{\bar{x} - \mu_0}{s_d/\sqrt{n}} = \dfrac{1.05 - 0}{1.219/\sqrt{10}} = 2.724$

Step 6: Find the critical value and draw the decision line. For a 1-tailed α = 0.10 with df = $n_d - 1$ = 9 we find:
Do Not Reject Ho | Reject Ho
tCritical = 1.38303

8pts: Step 7:
Stats conclusion: 2pts: At the 10% LOS we reject $H_0: \mu_d = 0$, because:
3pts: $t_{Sample} > t_{Critical}$, which yields: 2.724 > 1.38303
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.10
2pts: English conclusion: The medication significantly lowers the temperature of a 5 year old child with flu.

2.) Repeat the previous problem with a full 7 step hypothesis under the assumption that normality does not pass. Use Wilcoxon’s Signed Rank Test. Does the evidence support the case that the medication reduces fever? Test the hypothesis at the 10% LOS.

Total 25 pts:

Step 1: Declare normality: “We assume normality fails.”

Step 2: Declare parameter: The data sets are paired for comparison and are therefore dependent. We declare a median difference.
2pts: $\eta_d$ = Median difference between before and after temperatures (Fahrenheit, °F) for 5 year olds taking medication for flu symptoms.

1pt: Step 3: Declare the LOS: $\alpha = 0.10$

4pts: Step 4: Declare hypothesis: $H_0: \eta_d = 0$, $H_A: \eta_d > 0$

5pts: Signed ranks:

Diff.   ABS rank   - Ranks   + Ranks
2.8     9                    9
1.7     7                    7
1.6     6                    6
1.4     5                    5
0       *          *         *
1.3     4                    4
2.1     8                    8
-1.2    2.5        2.5
-0.4    1          1
1.2     2.5                  2.5
Sum                3.5       41.5

* Note that a zero difference is omitted from the data set!

Step 5: Calculate the W-sample value: Because $H_A: \eta_d > 0$ we take the sum of the positive signed ranks: WSample = 41.5

3pts: Step 6: Find the critical value and draw the decision line. For a 1-tailed α = 0.10 with 9 non-zero differences, $n_d = 9$, we find:
Do Not Reject Ho | Reject Ho
WCritical = 35

8pts: Step 7:
Stats conclusion: 2pts: At the 10% LOS we reject $H_0: \eta_d = 0$, because:
3pts: $W_{Sample} > W_{Critical}$, which yields: 41.5 > 35
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.10
2pts: English conclusion: The medication significantly lowers the temperature of a child with flu.

3.) Regardless of the conclusion you found in the first problem, suppose your hypothesis found a p-value of 0.06. Interpret the meaning of the p-value in terms of the context of the problem (temperature, medication, etc.). If p = 0.06 can be read as a percentage value, then 6% of what must be true? 8pts.

In general: If the null hypothesis is true, then 6% of all random draws will be at least as extreme in their contradiction of the null hypothesis as the drawn data is. (Only 2 pts credit for a general answer like this one, which is memorized like a rubber stamp and lacks any context with the problem.)

In Context: If it is true that there is no significant drop in temperature for children with flu who have taken the medication as opposed to their temperatures before the medication was taken, then 6% of random samples will show that the medication lowers the temperature for children with flu to an even greater extent than was shown in this study. (Full credit for a description with context.) See Quiz 7 Prep for a review.

4.)
Regardless of the conclusion you found in the hypothesis test for the temperature reduction of 5-year-olds under flu medication, explain the consequences of a rejection error. What does the data lead us to conclude, and what is actually true? 6pts

We claim that the medication is effective in lowering the temperature of 5-year-olds with the flu, when in fact the medication does not lower the temperature of 5-year-olds with the flu.

5.) Find a 90% confidence interval on the mean body mass index (BMI, in kg/m²) of high school boys given that a random sample of 134 high school boys yields a mean BMI of 21.8 with a standard deviation of s = 3.4. Answer with an English sentence that uses the bounds of the CI. 6pts

With df = n − 1 = 134 − 1 = 133, and rounding down to df = 100 as the next available value in the t-table, we take the 2-tailed column at $\alpha = 0.10$ (the complement of 90% confidence), giving a t-critical value of $t_{\alpha/2} = t_{0.10/2} = 1.66024$.
$$\bar{y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 21.8 \pm 1.66024 \cdot \frac{3.4}{\sqrt{134}} \;\Rightarrow\; (21.31,\ 22.29)$$
English: The 90% CI on the mean BMI for high school boys is (21.31, 22.29) kg/m².

6.) Suppose we want to estimate the true mean BMI for high school boys to within 0.01 BMI units. How many high school boys must be sampled for the margin of error of the 90% confidence interval to be within 0.01 BMI units of the true mean BMI value? Assume a reliable standard deviation of $\sigma = 3.51$. 6pts

We must solve for the required sample size n:
$$n = \frac{(Z_{\alpha/2}\,\sigma)^2}{E^2} = \frac{(1.645 \times 3.51)^2}{(0.01)^2} = 333{,}384.986$$
Always rounding up, we must sample the BMI of 333,385 high school boys to find a 90% CI on the mean with a margin of error of 0.01 BMI units.

6.) Interpret the meaning of confidence in the context of the 90% confidence interval you constructed on the mean BMI of high school boys. Use the specific interval you constructed to help answer the question: 90% of what must be true? 6pts

90% of all randomly drawn samples will form confidence intervals that contain the true mean BMI of high school boys.
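The interval in problem 5 and the sample size in problem 6 can be checked numerically. The following is only a minimal sketch: the critical values 1.66024 and 1.645 are copied from the worked solutions (table look-ups, not computed here), and the variable names are illustrative.

```python
import math

# Problem 5: 90% CI on the mean BMI (values from the problem statement).
n, y_bar, s = 134, 21.8, 3.4
t_crit = 1.66024  # t-table value used above (df = 100, two-tailed alpha = 0.10)

margin = t_crit * s / math.sqrt(n)
ci = (round(y_bar - margin, 2), round(y_bar + margin, 2))

# Problem 6: sample size for a margin of error E at 90% confidence.
z_crit, sigma, E = 1.645, 3.51, 0.01
n_required = math.ceil((z_crit * sigma / E) ** 2)  # always round up

print("90% CI:", ci)
print("Required n:", n_required)
```

Rounding up with `math.ceil` mirrors the "always rounding up" rule, since any smaller n would leave the margin of error above the target.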
The specific interval that we found, (21.31, 22.29), may be one of the 90% of all confidence intervals that contain the true mean BMI, or one of the 10% that fail to contain the true mean BMI for high school boys. We cannot say for sure; we only know that this method works 90% of the time.

7.) A student sought to demonstrate that soybeans inoculated with nitrogen-fixing bacteria yield more and grow more adequately without the use of expensive and environmentally deleterious synthesized fertilizers. The trial was conducted under controlled conditions with uniform amounts of soil. There were 8 inoculated plants compared against 8 uninoculated plants. The soybean pod weight (in grams) was recorded for each plant. Does the evidence support the student's suspicion? Assume that the data set does NOT pass normality. Test the hypothesis at the 10% LOS.

Total 23 pts

Plot        Inoculated   Uninoculated   Difference
1           1.76         0.49           1.27
2           1.45         0.85           0.60
3           1.03         1.00           0.03
4           1.53         1.53           0.00
5           2.34         1.01           1.33
6           1.96         0.75           1.21
7           1.79         2.11           -0.32
8           1.21         0.92           0.29
$\bar{x}$ = 1.634        1.083          0.551
sd =        0.420        0.509          0.651

Step 1: Declare normality: We assume the data set is not normally distributed.

Step 2: 2pts: Declare parameters:
$\tilde{\mu}_1$ = Median soybean pod weight (in grams) for the inoculated plants.
$\tilde{\mu}_2$ = Median soybean pod weight (in grams) for the uninoculated plants.

Step 3: 4pts: Declare the hypothesis and the LOS: $H_0: \tilde{\mu}_1 = \tilde{\mu}_2$, $H_A: \tilde{\mu}_1 > \tilde{\mu}_2$, $\alpha = 0.10$

Step 4: 3pts: Find the critical value and draw the decision line. For a 1-tailed $\alpha = 0.10$ with n = 8, n' = 8, we find $U_{Critical} = 45$.
Decision line: Do Not Reject $H_0$ | $U_{Critical} = 45$ | Reject $H_0$

Step 5: 5pts: Calculate the T.S. Make an ordered count of the grouped data:
Inoculated:   1.03  1.21  1.45  1.53  1.76  1.79  1.96  2.34
Uninoculated: 0.49  0.75  0.85  0.92  1.00  1.01  1.53  2.11
Sum both counts:
$K_1$ = 6 + 6 + 6 + 6.5 + 7 + 7 + 7 + 8 = 53.5
$K_2$ = 0 + 0 + 0 + 0 + 0 + 0 + 3.5 + 7 = 10.5
Take the larger count as the U sample: t.s. = $U_{Sample} = \max(K_1, K_2) = 53.5$
Check your work: is this equation true? $K_1 + K_2 = n_1 \times n_2$: 10.5 + 53.5 = 8 × 8, so 64 = 64.

9pts: Step 7: Stats conclusion:
1pt: At the 10% LOS we reject $H_0: \tilde{\mu}_1 = \tilde{\mu}_2$, because:
3pts: $U_{Sample} > U_{Critical}$, which yields: 53.5 > 45
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.10
2pts: English conclusion: The median soybean pod weight (in grams) for the inoculated plants is significantly greater than that found in uninoculated plants.

Stat 109 Final Exam Formula Page

Chi-square goodness of fit:
$$\chi^2_s = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}, \qquad df = k - 1$$
Chi-square for a contingency table:
$$\chi^2_s = \sum_{i}\sum_{j} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (\text{rows} - 1)(\text{columns} - 1), \qquad E_{ij} = \frac{(\text{Row}_i\ \text{sum}) \times (\text{Column}_j\ \text{sum})}{\text{Grand Total}}$$
McNemar's test:
$$\chi^2_s = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}, \qquad df = 1$$
Fisher's Exact p-value:
$$p = \frac{\binom{n_1}{a}\binom{n_2}{b}}{\binom{n_1+n_2}{a+b}} + \frac{\binom{n_1}{a-1}\binom{n_2}{b+1}}{\binom{n_1+n_2}{a+b}} + \frac{\binom{n_1}{a-2}\binom{n_2}{b+2}}{\binom{n_1+n_2}{a+b}} + \dots + \frac{\binom{n_1}{0}\binom{n_2}{a+b}}{\binom{n_1+n_2}{a+b}}$$
Odds ratio:
$$\hat{\theta} = \frac{\hat{p}_1/(1-\hat{p}_1)}{\hat{p}_2/(1-\hat{p}_2)} \quad \text{or:} \quad \hat{\theta} = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$$
$$SE \text{ of } \ln\hat{\theta} = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}, \qquad 95\%\ \text{CI for } \theta:\ e^{\ln\hat{\theta}\, \pm\, 1.96 \cdot SE}$$
Harmonic mean:
$$n' = \frac{k}{\frac{1}{n_1} + \frac{1}{n_2} + \dots + \frac{1}{n_k}}$$
Sums of squares:
$$SS_{Between} = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{\bar{x}})^2, \qquad SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \qquad SS_{Total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2$$

1-way ANOVA
Source     df                    SS              MS   F   p-value
Between    k - 1                 SS Between
Within     $\sum (n_i - 1)$      SS Within
Total      n - 1                 SS Total

2-way ANOVA
Source        df              SS               MS   F   p-value
Between A     a - 1           SS Between A
Between B     b - 1           SS Between B
Interaction   df A × df B     SS Interaction
Within                        SS Within
Total         n - 1           SS Total

SNK critical value = (SNK table) $\times \sqrt{\dfrac{MS_{Within}}{n'}}$

Bonferroni 95% CI: $(\bar{x}_1 - \bar{x}_2) \pm B_{\#Pairs,\ df\ within} \sqrt{MS_{within}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

Attached tables: F-Tables, Bonferroni Table, Newman-Keuls Table.

Stat 109 Final Exam Prep Sheet

1A.) In a breeding experiment, white chickens with small combs were mated and produced 190 offspring, of the types shown in the accompanying table. Are these data consistent with the Mendelian expected ratios of 9:3:3:1 for the four types? Use a chi-square test at $\alpha$ = .10.

Type of offspring             Number of Offspring
White feathers, small comb    111
White feathers, large comb    37
Dark feathers, small comb     34
Dark feathers, large comb     8
Total                         190

1B.) The distribution of blood types in the Armenian population is as given in the following table. Is the distribution of a random sample of 200 blood types among Portuguese statistically similar to the reported values of the Armenian population? Use a chi-square test at $\alpha$ = 0.05.

Blood type dist., Armenian pop.   Blood dist. of 200 random Portuguese
O = 0.31                          O = 70
A = 0.50                          A = 106
B = 0.13                          B = 16
AB = 0.06                         AB = 8

2.) The accompanying partially complete contingency table shows the responses to two treatments. Invent a fictitious data set that agrees with the table and for which $\chi^2_s = 0$.

           Treatment 1   Treatment 2
Success    70
Failure
Total      100           200

3.) A random sample of 99 students in a Conservatory of Music found that 9 of the 48 women sampled had "perfect pitch" (the ability to identify, without error, the pitch of a musical note), but only 1 of the 51 men sampled had perfect pitch. Conduct Fisher's exact test to determine if the evidence supports the case that women are more likely than men to have perfect pitch. Let $\alpha$ = .05.

4.) As part of the National Health Interview Survey, occupational injury data were collected on thousands of American workers. The following table summarizes part of these data.

Injured?   Self-employed   Employed by Others
Yes        210             4391
No         33724           421502
Total      33934           425893

(a) Calculate the relative risk for the self-employed.
(b) Calculate the sample value of the odds ratio.
(c) According to the odds ratio, are self-employed workers less likely to be injured than persons who work for others? Use the odds ratio to express by how much one group is likely to be injured when compared to the other group.
(d) Construct a 95% confidence interval for the population value of the odds ratio and use the bounds of the odds ratio to interpret the interval in an English sentence.
(e) Run an independence of attributes test to determine if self-employment results in lower injury rates.

5.)
The habitat selection behavior of the fruitfly Drosophila subobscura was studied by capturing flies from two different habitat sites. The flies were marked with colored fluorescent dust to indicate the site of capture and then released at a point mid-way between the original sites. On the following two days, flies were recaptured at the two sites. The results are summarized in the table. Perform a complete chi-square hypothesis test for this contingency table. Test the null hypothesis of independence against the (directional) alternative that the flies preferentially tend to return to their site of capture. Let $\alpha$ = .01.

Site of Original Capture   Site of Recapture 1   Site of Recapture 2
I                          78                    56
II                         33                    58

6.) The accompanying table shows fictitious data for 3 samples. Complete the following ANOVA table from the given data. Given the Grand mean = $\bar{\bar{x}} = \frac{440}{11} = 40$.

Sample:   1     2     3
          48    40    39
          39    48    30
          42    44    32
          43          35
Mean:     43    44    34

7.) Hospitals across the U.S. report that women are more likely to give birth during weekdays versus the weekend. An obstetrician wants to know if the likelihood of birth is uniform for all 5 weekdays. She randomly selects 34 calendar weekdays from her local maternity ward and records the mean number of births for each weekday as shown in the table. Given: MSW = 6.03, $\bar{\bar{x}} = 58.0$, and $\alpha = 0.05$.

Treatment   Mon.   Tues.   Wed.   Thu.   Fri.
Mean        53.5   60.1    57.3   59.0   60.0
Sd          5.3    8.4     4.3    6.2    7.1
n           8      5       4      7      10

8.) A researcher has determined with an ANOVA hypothesis test that smoking has an impact on resting heart rate. Given that the ANOVA test yielded an MSW = 43.9, use the Newman-Keuls method to compare all pairs of means at $\alpha$ = .05.

Cigarettes per day   Heavy smoker (more than 5)   Light (less than 5)   Nonsmoker (0)
Mean                 78                           70                    58
n                    5                            9                     14

9.) A plant physiologist investigated the effect of flooding on root metabolism in two tree species: flood-tolerant river birch and the intolerant European birch. Four seedlings of each species were flooded for one day and four were used as controls. The concentration of adenosine triphosphate (ATP) in the roots of each plant was measured. The data (nmol ATP per mg tissue) are shown in the table. For these data: SS(species of birch) = 2.19781, SS(flooding) = 2.25751, SS(interaction) = 0.097656, and SS(within) = .47438.

            River Birch           European Birch
            Flooded   Control     Flooded   Control
            1.45      1.70        0.21      1.34
            1.19      2.04        0.58      0.99
            1.05      1.49        0.11      1.17
            1.07      1.91        0.27      1.30
$\bar{x}$   1.19      1.785       0.2925    1.20

9a.) Draw an interaction graph.
9b.) Construct the ANOVA table.
9c.) Based on the ANOVA table, write a statistical and English conclusion answering whether the factors of species & flooding interact. Report the respective F-stat and p-value at $\alpha$ = .05.
9d.) Based on the ANOVA table, write a statistical and English conclusion that tests the null hypothesis that species has no effect on ATP concentration. Use $\alpha$ = .01.
9e.) Assuming that each of the four populations has the same standard deviation, use the data to calculate an estimate of that standard deviation.
9f.) Suppose a two-sample t-test was applied to the alternate hypothesis that the mean ATP level was significantly different in the two species of Birch. What would be the resulting test statistic?

10.) Suppose that ATP in nmol per mg of tissue for the River Birch was correlated with annual inches of rain received over several regions, resulting in the linear regression line: ATP = 3.24 – 1.25 rain. Interpret the slope of this regression line with an English sentence using the units of measure.

Stat 109 Final Exam Prep Sheet Solutions

1A.) In a breeding experiment, white chickens with small combs were mated and produced 190 offspring, of the types shown in the accompanying table.
Are these data consistent with the Mendelian expected ratios of 9:3:3:1 for the four types? Use a chi-square test at $\alpha$ = 0.10. 23pts

Type of offspring             Number of Offspring
White feathers, small comb    111
White feathers, large comb    37
Dark feathers, small comb     34
Dark feathers, large comb     8
Total                         190

Solution:
Declare the parameter, 2pts: $p_i$ = Proportion of chickens with the "ith" color and comb combination.

Declare the hypothesis, 3pts:
$H_0$: $p_{White/small} = \frac{9}{16}$, $p_{White/Large} = \frac{3}{16}$, $p_{Dark/small} = \frac{3}{16}$, $p_{Dark/Large} = \frac{1}{16}$
OR: $H_0$: The color and comb ratio of the chickens fits the Mendelian ratio of 9:3:3:1.
$H_A$: The color and comb ratio of the chickens is otherwise.
LOS = 0.10

Perform all preliminary checks, 3pts: Show that all 4 expected values are greater than 5.
$E_1 = \frac{9}{16} \cdot 190 = 106.875 > 5$, $E_2 = \frac{3}{16} \cdot 190 = 35.625 > 5$, $E_3 = \frac{3}{16} \cdot 190 = 35.625 > 5$, $E_4 = \frac{1}{16} \cdot 190 = 11.875 > 5$
Then acknowledge what this check indicates: "All expected values are greater than 5, therefore we have a large enough sample to apply a chi-square test." OR: "We have an ample sample."

Draw the decision line, 2pts: at df = 3: Do Not Reject $H_0$ | $\chi^2_{Critical} = 6.25$ | Reject $H_0$

Calculate the test statistic (the sample value), 4pts:
$$\text{t.s.} = \chi^2_s = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(111 - 106.875)^2}{106.875} + \frac{(37 - 35.625)^2}{35.625} + \frac{(34 - 35.625)^2}{35.625} + \frac{(8 - 11.875)^2}{11.875} = 1.55$$

Statistical conclusion, 7pts: At the 10% LOS we do not reject $H_0$, because $\chi^2_{Sample} < \chi^2_{Critical}$ (1.55 < 6.25), and df = 3 yields $p > \alpha$: (p > 0.20) > 0.10.
English conclusion, 2pts: The distribution of the chicken comb and color ratio fits the Mendelian ratio of 9:3:3:1.

1B.) The distribution of blood types in the Armenian population is as given in the following table. Is the distribution of a random sample of 200 blood types among Portuguese statistically similar to the reported values of the Armenian population? Run a chi-square test at an LOS of 5% to find out. 15pts

Blood type dist., Armenian pop.   Blood dist. of 200 random Portuguese
O = 0.31                          O = 70
A = 0.50                          A = 106
B = 0.13                          B = 16
AB = 0.06                         AB = 8

1.) Declare the parameter, 2pts: $p_i$ = Proportion of Portuguese with blood type "i".

2.) State the hypothesis, 2pts, and the LOS: $\alpha = 0.05$
$H_0$: $p_O = 0.31$, $p_A = 0.50$, $p_B = 0.13$, $p_{AB} = 0.06$
$H_A$: The proportions are otherwise.

3.) Verify that all the expected values are greater than 5. 2pts
$E_O = 0.31 \cdot 200 = 62$, $E_A = 0.50 \cdot 200 = 100$, $E_B = 0.13 \cdot 200 = 26$, $E_{AB} = 0.06 \cdot 200 = 12$
62 > 5, 100 > 5, 26 > 5, 12 > 5. "Since all expected values are greater than 5, we have an ample sample."

4.) Calculate the sample value, expressed with correct notation. 4pts
$$\chi^2_{Sample} = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(70 - 62)^2}{62} + \frac{(106 - 100)^2}{100} + \frac{(16 - 26)^2}{26} + \frac{(8 - 12)^2}{12}$$
$$\chi^2_{Sample} = \frac{8^2}{62} + \frac{6^2}{100} + \frac{10^2}{26} + \frac{4^2}{12} = 1.0323 + 0.36 + 3.846 + 1.333 = 6.57$$
Decision line: Do Not Reject $H_0$ | $\chi^2_{Critical} = 7.81$ | Reject $H_0$

5pts: "At the 5% LOS we do not reject $H_0$ because $\chi^2_{Sample} < \chi^2_{Critical}$ (6.57 < 7.81), which yields $p > \alpha$: (0.05 < p < 0.10) > 0.05."
English: The blood type distribution of the Portuguese is not significantly different from that of the Armenians.

2.) The accompanying partially complete contingency table shows the responses to two treatments. Invent a fictitious data set that agrees with the table and for which $\chi^2_s = 0$. 5pts

           Treatment 1   Treatment 2
Success    70
Failure
Total      100           200

Solution: $\chi^2_s = 0$ means the two variables are completely independent. This can happen only when exactly the same percentage shows for "success" in each of the treatments:
$$\hat{p}_1 = \frac{70}{100} = 0.7 = \frac{140}{200} = \hat{p}_2$$

           Treatment 1   Treatment 2
Success    70            140
Failure    30            60
Total      100           200

Consider that for independence of attributes, Expected value = (Row sum × Column sum) / Grand total for each cell. Then:
$E_{1,1} = \frac{(70 + 140)(70 + 30)}{300} = \frac{21{,}000}{300} = 70$, $E_{1,2} = \frac{(70 + 140)(140 + 60)}{300} = \frac{42{,}000}{300} = 140$
$E_{2,1} = \frac{(30 + 60)(70 + 30)}{300} = \frac{9{,}000}{300} = 30$, $E_{2,2} = \frac{(30 + 60)(140 + 60)}{300} = \frac{18{,}000}{300} = 60$
$$\chi^2_s = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(70 - 70)^2}{70} + \frac{(140 - 140)^2}{140} + \frac{(30 - 30)^2}{30} + \frac{(60 - 60)^2}{60} = 0 + 0 + 0 + 0 = 0$$

3.)
A random sample of 99 students in a Conservatory of Music found that 9 of the 48 women sampled had "perfect pitch" (the ability to identify, without error, the pitch of a musical note), but only 1 of the 51 men sampled had perfect pitch. Conduct Fisher's exact test to determine if the evidence supports the case that women are more likely than men to have perfect pitch. Let $\alpha$ = .05. 14pts

Solution:
Declare hypothesis, 3pts:
$H_0$: Perfect pitch is independent of sex.
$H_1$: Women are more likely than men to have perfect pitch.

Generate all tabled scenarios from the given table to the most extreme case of zero, 3pts:

Given:     Y     N     Total
Women      9     39    48
Men        1     50    51
Total      10    89    99

Extreme:   Y     N     Total
Women      10    38    48
Men        0     51    51
Total      10    89    99

Use combinations to calculate the p-value, 4pts:
$$p = \frac{\binom{10}{1}\binom{89}{50}}{\binom{99}{51}} + \frac{\binom{10}{0}\binom{89}{51}}{\binom{99}{51}} = 0.00591$$

Stat. conclusion, 2pts: "At the 5% LOS we reject $H_0$ because $p < \alpha$, 0.00591 < 0.05."
English conclusion, 2pts: Women are significantly more likely than men to have perfect pitch.

4.) As part of the National Health Interview Survey, occupational injury data were collected on thousands of American workers. The following table summarizes part of these data.

Injured?   Self-employed   Employed by Others
Yes        210             4391
No         33724           421502
Total      33934           425893

(a) Calculate the relative risk for the self-employed.
(b) Calculate the sample value of the odds ratio.
(c) According to the odds ratio, are self-employed workers more likely, or less likely, to be injured than persons who work for others? If so, then use the odds ratio to express by how much one group is likely to be injured when compared to the other group.
(d) Construct a 95% confidence interval for the population value of the odds ratio and use the bounds of the odds ratio to interpret the interval in an English sentence.
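The quantities requested in parts (a), (b), and (d) depend only on the four cell counts of the injury table, so they can be checked with a few lines of Python. This is a minimal sketch using the formula-page definitions; the function name `odds_ratio_summary` and its argument order are illustrative assumptions, not part of the course material.

```python
import math

def odds_ratio_summary(n11, n12, n21, n22, z=1.96):
    """Relative risk, odds ratio, SE of ln(odds ratio), and a CI for a 2x2 table.

    Layout assumed here: rows = injured (yes, no),
    columns = (self-employed, employed by others).
    """
    p1 = n11 / (n11 + n21)            # injury proportion, column 1
    p2 = n12 / (n12 + n22)            # injury proportion, column 2
    rr = p1 / p2                      # relative risk
    theta = (n11 * n22) / (n12 * n21)  # sample odds ratio
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lower = math.exp(math.log(theta) - z * se)
    upper = math.exp(math.log(theta) + z * se)
    return rr, theta, se, (lower, upper)

rr, theta, se, ci = odds_ratio_summary(210, 4391, 33724, 421502)
print(f"RR = {rr:.4f}, OR = {theta:.4f}, SE = {se:.4f}")
print(f"95% CI for OR: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Small differences in the last digits relative to a hand calculation are expected, because the hand calculation rounds the log odds ratio and the SE before exponentiating.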
14pts

Solution:
a) Find the relative risk, 2pts:
$$RR = \frac{\hat{p}_1}{\hat{p}_2} = \frac{210/33934}{4391/425893} = 0.600235$$
b) Find the odds ratio, 3pts:
$$\hat{\theta} = \frac{(210)(421502)}{(4391)(33724)} = 0.5977$$
c) 1pt: The odds of on-the-job injury for self-employed workers are only 0.5977 times the odds for workers employed by others.
d) Calculate a 95% CI for the odds ratio, 6pts:
$$\ln\hat{\theta} = -0.5146, \qquad SE \text{ of } \ln\hat{\theta} = \sqrt{\frac{1}{210} + \frac{1}{4391} + \frac{1}{33724} + \frac{1}{421502}} = 0.0709$$
95% c.i. for $\ln(\theta)$ = $\ln\hat{\theta} \pm Z_{\alpha/2} \cdot (SE \text{ of } \ln\hat{\theta})$ = $-0.5146 \pm (1.96 \times 0.0709)$ = $(-0.65356,\ -0.37564)$
95% c.i. for $\theta$ = $(e^{-0.65356},\ e^{-0.37564})$ = $(0.52019,\ 0.68685)$
2pts: The odds of on-the-job injury for self-employed workers are anywhere from 0.52 to 0.69 times the odds for workers employed by others.

Recall from lecture week 10:
($\theta_{lower} < 1$, $\theta_{upper} < 1$): Both bounds are less than 1. Column 1 descriptors are less likely than column 2.
($\theta_{lower} < 1$, $\theta_{upper} > 1$): Lower bound is less than 1, upper bound is greater than 1. We cannot rule out even odds for either descriptor.
($\theta_{lower} > 1$, $\theta_{upper} > 1$): Both bounds are greater than 1. Column 1 descriptors are more likely than column 2.

4e.) Run an independence of attributes test to determine if self-employment results in lower injury rates.

Step 1: Declare the hypothesis and LOS, 3pts:
$H_0$: Injury rates are independent of whether one is self-employed or working for a boss.
$H_A$: Injury rates are dependent upon employment status. Self-employed workers have lower rates of injury on the job.
Assume $\alpha$ = 0.05.

Observed   Self-employed   Employed by Others   Total
Yes        210             4391                 4601
No         33724           421502               455226
Total      33934           425893               459827

Step 2: Calculate the expected values for the null hypothesis:
$$Exp_{i,j} = \frac{(\text{Row}_i\ \text{sum}) \times (\text{Column}_j\ \text{sum})}{\text{Grand Total}}$$

Expected      Self-employed                                    Employed by Others                                   Total
Injured       (4601 × 33934) / 459827 = 339.54                 (4601 × 425893) / 459827 = 4261.46                   4601
Not Injured   (455226 × 33934) / 459827 = 33594.46             (455226 × 425893) / 459827 = 421631.54               455226
Total         33934                                            425893                                               459827

Step 3: Check for ample sample: Each expected value is greater than 5, therefore we have an ample sample.
$E_{1,1} = 339.54 > 5$, $E_{1,2} = 4261.46 > 5$, $E_{2,1} = 33594.46 > 5$, $E_{2,2} = 421631.54 > 5$

Step 4: Draw a decision line: df = (number of rows − 1) × (number of columns − 1) = 1 × 1 = 1
Note that for a one-tailed test we must double the $\alpha$ given in the two-tailed chi-square table to find the correct $\chi^2_{Critical}$ value.
Decision line: Do Not Reject $H_0$ | $\chi^2_{Critical} = 2.71$ | Reject $H_0$

Step 5: Calculate the chi-square sample value:
$$\chi^2_{Sample} = \sum_{i}\sum_{j} \frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(210 - 339.54)^2}{339.54} + \frac{(4391 - 4261.46)^2}{4261.46} + \frac{(33724 - 33594.46)^2}{33594.46} + \frac{(421502 - 421631.54)^2}{421631.54}$$
$$\chi^2_{Sample} = 49.423 + 3.938 + 0.5 + 0.04 = 53.9$$

Step 6: Write a complete statistical conclusion with an English sentence summary.
At the 5% LOS we reject $H_0$ because, at df = 1, $\chi^2_{Sample} > \chi^2_{Critical}$ (53.9 > 2.71), which yields $p < \alpha$: (p < 0.00005) < 0.05.
Injury rates are dependent upon employment status. Self-employed workers have lower rates of injury on the job.

5.)
The habitat selection behavior of the fruitfly Drosophila subobscura was studied by capturing flies from two different habitat sites. Fruit flies were captured at one of two sites, marked, and then released at a point mid-way between the original sites. On the following two days, flies were recaptured at the two sites. Each fruit fly was marked with its own micro-bar code to compare the fruit fly's original site of capture with its site of recapture. The results are summarized in the table. Perform a complete chi-square hypothesis test for this contingency table. Test the null hypothesis of independence against the (directional) alternative that the flies preferentially tend to return to their site of capture. Let $\alpha$ = 0.01. 18pts

Site of Original Capture   Site of Recapture 1   Site of Recapture 2
I                          78                    56
II                         33                    58

Rubric: Declare the hypothesis: 3pts. State the LOS: 1pt. Draw the decision line: 3pts. Calculate the sample value: 4pts. Conclude with statistics: 5pts. English: 2pts.

Solution:
$H_0$: Site of fruit flies' recapture is independent of the original site of capture.
$H_A$: Site of fruit flies' recapture is dependent on the original site of capture; flies preferentially tend to return to their site of capture. (This last segment is the one-tail portion of the hypothesis.)

$\alpha = 0.01$
df = (R − 1) × (C − 1) = (2 − 1) × (2 − 1) = 1
Table $\alpha$ = 0.01 × 2 = 0.02 for a 1-tailed test
Decision line: Do Not Reject $H_0$ | $\chi^2_{Critical} = 5.41$ | Reject $H_0$

McNemar's:
$$\chi^2_{Sample} = \frac{(n_{1,2} - n_{2,1})^2}{n_{1,2} + n_{2,1}} = \frac{(56 - 33)^2}{56 + 33} = 5.94382$$

At the 1% LOS we reject $H_0$ because $\chi^2_{Sample} > \chi^2_{Critical}$ (5.94382 > 5.41), which yields $p < \alpha$: (0.005 < p < 0.01) < 0.01.
English, 2pts: Fruit flies prefer to return to their site of original capture.
-1pt for the 2-tailed conclusion of: "Site of recapture dependent on site of original capture."

Note! If this were a 2-tailed test the bracketed p-value would be (0.01 < p < 0.02). To accommodate a one-tailed test, these bracketing values must be halved.
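The McNemar sample value above uses only the two off-diagonal counts, which makes it easy to verify. A minimal Python sketch follows; the critical value 5.41 is copied from the worked solution (a table look-up), and the function name is illustrative.

```python
def mcnemar_chi_square(n12, n21):
    """McNemar's chi-square statistic from the two discordant cell counts."""
    return (n12 - n21) ** 2 / (n12 + n21)

# Discordant cells from the fruit fly table: captured at I and recaptured
# at site 2 (56), captured at II and recaptured at site 1 (33).
chi2_sample = mcnemar_chi_square(56, 33)
chi2_critical = 5.41  # df = 1, table alpha = 0.02 (one-tailed 0.01)

print(f"chi-square sample = {chi2_sample:.5f}")
print("Reject H0" if chi2_sample > chi2_critical else "Do not reject H0")
```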
-8pts for applying the incorrect method of Independence of Attributes for paired categorical data. That method would proceed as follows:

Expected values: $E_{ij} = \frac{(\text{row}_i\ \text{sum})(\text{column}_j\ \text{sum})}{\text{total sum}}$
$E_{1,1} = \frac{134 \cdot 111}{225} = 66.11$, $E_{1,2} = \frac{134 \cdot 114}{225} = 67.89$, $E_{2,1} = \frac{91 \cdot 111}{225} = 44.89$, $E_{2,2} = \frac{91 \cdot 114}{225} = 46.11$
All expected values exceed 5, therefore we have an ample sample.
$$\chi^2_s = \sum_{i,j} \frac{(Obs_{i,j} - E_{i,j})^2}{E_{i,j}} = \frac{(78 - 66.11)^2}{66.11} + \frac{(56 - 67.89)^2}{67.89} + \frac{(33 - 44.89)^2}{44.89} + \frac{(58 - 46.11)^2}{46.11}$$
$$\chi^2_{Sample} = 2.1384 + 2.0823 + 3.15 + 3.06 = 10.44$$
t.s. $\chi^2_{Sample} = 10.44$ (df = 1) yields a table p-value between 0.001 < p-value < 0.01. However, we must recognize that this is a one-tailed test, which means that these p-values must be halved: 0.0005 < p-value < 0.005.
Stat: "At the 1% LOS we reject $H_0$ because $\chi^2_{Sample} > \chi^2_{Critical}$ (10.44 > 5.41), which yields $p < \alpha$: (0.0005 < p < 0.005) < 0.01."
English: We conclude that flies preferentially return to their site of capture.

6.) The accompanying table shows fictitious data for 3 samples. Complete the following ANOVA table from the given data. 14pts

Sample:   1     2     3
          48    40    39
          39    48    30
          42    44    32
          43          35
Mean:     43    44    34

Solution:
Grand mean = $\bar{\bar{x}}$ = (total sum)/(number of data points) = $\frac{440}{11} = 40$
$$SS_{Between} = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{\bar{x}})^2, \qquad SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
2pts: $SSB = 4(43 - 40)^2 + 3(44 - 40)^2 + 4(34 - 40)^2 = 228$
3pts: $SSW = (48 - 43)^2 + (39 - 43)^2 + (42 - 43)^2 + (43 - 43)^2 + (40 - 44)^2 + (48 - 44)^2 + (44 - 44)^2 + (39 - 34)^2 + (30 - 34)^2 + (32 - 34)^2 + (35 - 34)^2 = 120$
2pts: $df_B = k - 1 = 2$, $df_W = \sum (n_i - 1) = 8$
3pts: $MSB = \frac{SSB}{df_B} = \frac{228}{2} = 114$, $MSW = \frac{SSW}{df_W} = \frac{120}{8} = 15$, $F_{Stat} = \frac{MSB}{MSW} = \frac{114}{15} = 7.6$

4pts:
Source           df    SS     MS     Fstat   p-value
Between groups   2     228    114    7.6     0.01 < p < 0.02
Within groups    8     120    15
Total            10    348

7.) Hospitals across the U.S. report that women are more likely to give birth during weekdays versus the weekend. An obstetrician wants to know if the likelihood of birth is uniform for all 5 weekdays. She randomly selects 34 calendar weekdays from her local maternity ward and records the mean number of births for each weekday as shown in the table. Given: MSW = 6.03, $\bar{\bar{x}} = 58.0$, and $\alpha = 0.05$. Construct the ANOVA table and run a complete ANOVA hypothesis test. 29pts

Treatment   Mon.   Tues.   Wed.   Thu.   Fri.
Mean        53.5   60.1    57.3   59.0   60.0
Sd          5.3    8.4     4.3    6.2    7.1
n           8      5       4      7      10

Solution:
Note preliminary checks, 2pts: We assume independent and random samples. We also assume normality.

Check for equal variances, 5pts: Find $F_{Critical} = F_{k,\ n'-1}$ with k = 5 groups and
$$n' = \frac{k}{\frac{1}{n_1} + \frac{1}{n_2} + \frac{1}{n_3} + \frac{1}{n_4} + \frac{1}{n_5}} = \frac{5}{\frac{1}{8} + \frac{1}{5} + \frac{1}{4} + \frac{1}{7} + \frac{1}{10}} = 6.1135$$
Rounding up, n' = 7 and df = n' − 1 = 6; then at $\alpha = 0.05$, $F_{Critical} = F_{5,6} = 4.39$.
$$F_{Sample} = \left(\frac{sd_{max}}{sd_{min}}\right)^2 = \left(\frac{8.4}{4.3}\right)^2 = 3.816$$
Decision line: Do Not Reject $H_0$ | 4.39 | Reject $H_0$
At the 5% LOS we assume equal variances because $F_{Sample} < F_{5,6\ Critical}$ (3.816 < 4.39), which yields $p > \alpha$: (0.05 < p < 0.10) > 0.05.
Note: In Hartley's test we ALWAYS use $\alpha = 0.05$, even if $\alpha$ equals another value in the ANOVA test.

Declare parameter, 2pts: $\mu_i$ = mean number of births on the "ith" day of the week.
State hypothesis, 2pts: $H_0$: $\mu_{Mon} = \mu_{Tue} = \mu_{Wed} = \mu_{Thu} = \mu_{Fri}$. $H_a$: At least one weekday has a mean number of births that is not equal to the others.
Find the correct df, 3pts: df Between: $df_B = k - 1 = 4$, and df Within: $df_W = \sum (n_i - 1) = 29$

Calculate the SSB, 3pts:
$$SSB = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{\bar{x}})^2 = 8(53.5 - 58)^2 + 5(60.1 - 58)^2 + 4(57.3 - 58)^2 + 7(59 - 58)^2 + 10(60 - 58)^2 = 233.01$$
Calculate the SSW, 2pts: Since $MSW = \frac{SSW}{df_W}$, we have $SSW = MSW \times df_W$. Given MSW = 6.03 and $df_W$ = 29: SSW = 6.03 × 29 = 174.87
2pts: $MSB = \frac{SSB}{df_B} = \frac{233.01}{4} = 58.25$
2pts: $F_{Stat} = \frac{MSB}{MSW} = \frac{58.25}{6.03} = 9.66$

Source           df    SS       MS      Fstat   p-value
Between groups   4     233.01   58.25   9.66    p < 0.0001
Within groups    29    174.87   6.03
Total            33    407.88

State conclusion with Statistics and English.
Stat, 4pts: "At the 5% LOS we reject $H_0$ because $F_{Sample} > F_{4,29\ Critical}$ (9.66 > 2.70), which yields $p < \alpha$: (p < 0.0001) < 0.05."
English, 2pts: The mean number of births is significantly different for at least one weekday.

8.) A researcher has determined with an ANOVA hypothesis test that smoking has an impact on resting heart rate. Given that the ANOVA test yielded an MSW = 43.9, use the Newman-Keuls method to compare all pairs of means at $\alpha$ = .05. 20pts

Cigarettes per day   Heavy smoker (more than 5)   Light (less than 5)   Nonsmoker (0)
Mean                 78                           70                    58
n                    5                            9                     14

8.) Solution
Order the means, 2pts: $\bar{x}_{Non} = 58$, $\bar{x}_{Light} = 70$, $\bar{x}_{Heavy} = 78$
Determine $df_W$, 2pts: NOTE! Common error! Do not conclude that df = 2. $df_W = \sum (n_i - 1) = 25$
For unbalanced data sets we calculate the harmonic mean, 3pts:
$$n' = \frac{k}{\frac{1}{n_1} + \frac{1}{n_2} + \frac{1}{n_3}} = \frac{3}{\frac{1}{14} + \frac{1}{9} + \frac{1}{5}} = 7.84$$
Build the SNK table for critical values at $\alpha = 0.05$ with df(within) = 25 (the table has no row for df = 25, so we must use df = 24).

Steps apart (2pts)   SNK table value (2pts)   (SNK table) × $\sqrt{MS(within)/n'}$ (3pts)
2                    2.92                     6.91
3                    3.53                     8.35

Table work, 3pts:
Differences of mean pairs                         Absolute difference   NK diff. vs. critical value   Conclusion
$\bar{x}_{Heavy} - \bar{x}_{Non}$ = 78 − 58       20                    At 3 steps: 20 > 8.35         $\mu_{Heavy} \neq \mu_{Non}$
$\bar{x}_{Heavy} - \bar{x}_{Light}$ = 78 − 70     8                     At 2 steps: 8 > 6.91          $\mu_{Heavy} \neq \mu_{Light}$
$\bar{x}_{Light} - \bar{x}_{Non}$ = 70 − 58       12                    At 2 steps: 12 > 6.91         $\mu_{Light} \neq \mu_{Non}$

Line notation summary, 3pts: Non   Light   Heavy (no two means share an underline, because every pair differs significantly).

9.) A plant physiologist investigated the effect of flooding on root metabolism in two tree species: flood-tolerant river birch and the intolerant European birch. Four seedlings of each species were flooded for one day and four were used as controls. The concentration of adenosine triphosphate (ATP) in the roots of each plant was measured. The data (nmol ATP per mg tissue) are shown in the table. For these data: SS(species of birch) = 2.19781, SS(flooding) = 2.25751, SS(interaction) = 0.097656, and SS(within) = .47438. 20pts

            River Birch           European Birch
            Flooded   Control     Flooded   Control
            1.45      1.70        0.21      1.34
            1.19      2.04        0.58      0.99
            1.05      1.49        0.11      1.17
            1.07      1.91        0.27      1.30
$\bar{x}$   1.19      1.785       0.2925    1.20

9a.) Draw an interaction graph. 3pts, -1pt for missing labels or units.
Solution: [Figure: Interaction Graph of Birch Species vs Flooding Level. Vertical axis: nmol ATP per mg tissue (0.5 to 2.0). Horizontal axis: Flooded, Control. One line each for River Birch and European Birch.]

9b.) Construct the ANOVA table.
Solution:
$df_{Species}$ = species − 1 = 1
$df_{flooding\ levels}$ = flooding levels − 1 = 1
$df_{Interaction} = df_{Species} \times df_{flooding\ levels} = 1$
$df_{Within} = n - k = 16 - 4 = 12$
$MS(Species) = \frac{SS_{Species}}{df_{Species}} = \frac{2.19781}{1}$, $MS(Floods) = \frac{SS_{Floods}}{df_{Floods}} = \frac{2.25751}{1}$, $MS(Interaction) = \frac{SS_{Interaction}}{df_{Interaction}} = \frac{0.097656}{1}$, $MS_{Within} = \frac{SS_{Within}}{df_{Within}} = \frac{0.47438}{12}$
$F_{Species} = \frac{MS_{Species}}{MSW} = \frac{2.19781}{0.039532}$, $F_{Floods} = \frac{MS_{Floods}}{MSW} = \frac{2.25751}{0.039532}$, $F_{Interaction} = \frac{MS_{Interaction}}{MSW} = \frac{0.097656}{0.039532}$

5pts:
Source                   df    SS         MS         F       p-value
Bet'n species            1     2.19781    2.19781    55.60   p < 0.0001
Bet'n flooding levels    1     2.25751    2.25751    57.11   p < 0.0001
Interaction              1     0.097656   0.097656   2.47    0.1 < p < 0.2
Within groups            12    0.47438    0.039532
Total                    15    5.027356

9c.) Based on the ANOVA table, write a statistical and English conclusion answering whether the factors of species & flooding interact. Report the respective F-stat and p-value at $\alpha$ = .05.
Solution:
2pts: Stat: At the 5% LOS, we do not reject $H_0$ because $F_{Sample} < F_{1,12\ Critical}$ (2.47 < 4.75), which yields $p > \alpha$: (0.1 < p < 0.2) > 0.05.
2pts: English: "There is no significant interaction between species and flooding."

9d.) Based on the ANOVA table, write a statistical and English conclusion that tests the null hypothesis that species has no effect on ATP concentration. Use $\alpha$ = .01.
Solution:
2pts: Stat: At the 1% LOS, we reject $H_0$ because $F_{Sample} > F_{1,12\ Critical}$ (55.60 > 9.33), which yields $p < \alpha$: (p < 0.0001) < 0.01.
2pts: English: "There is a significant species effect upon the ATP concentration of the tree."

9e.) Assuming that each of the four populations has the same standard deviation, use the data to calculate an estimate of that standard deviation.
Solution, 2pts: $s_{pooled} = \sqrt{MS(within)} = \sqrt{0.039532} = 0.1988$

9f.) Suppose a two-sample t-test was applied to the alternate hypothesis that the mean ATP level was significantly different in the two species of Birch. What would be the resulting test statistic?
Solution: The link between the sample values of the ANOVA and the 2-sample t-test is $\sqrt{F_{Sample}} = t_{Sample}$. Then $\sqrt{55.6} = 7.46$ and $t_{Sample} = 7.46$.

10.) Suppose that ATP in nmol per mg of tissue for the River Birch was correlated with annual inches of rain received over several regions, resulting in the linear regression line: ATP = 3.24 – 1.25 rain. Interpret the slope of this regression line with an English sentence using the units of measure.
Answer: "For each additional inch of annual rainfall we should see a decrease of 1.25 nmol per mg of ATP in the tissue of the River Birch."

A note to students struggling to pass:
Warning! 1 out of 3 students have made this error in past exams: You must know by now that when you run a hypothesis test, it is the null hypothesis that takes the case of equality (even if this equality statement for $H_0$ implies the opposite of $H_A$). The prompt to run the hypothesis test (or the suspicion) is translated to the alternate hypothesis, not the null.
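As a numerical check on problems 9b, 9e, and 9f, the two-way ANOVA quantities can be recomputed from the four given sums of squares. This is a minimal sketch only; the dictionary keys and variable names are illustrative, and p-values are not computed because they come from the F-tables.

```python
import math

# Sums of squares and degrees of freedom given in problem 9.
ss = {"species": 2.19781, "flooding": 2.25751,
      "interaction": 0.097656, "within": 0.47438}
df = {"species": 1, "flooding": 1, "interaction": 1, "within": 16 - 4}

# Mean squares: MS = SS / df for every source.
ms = {source: ss[source] / df[source] for source in ss}

# F statistics: each effect's MS divided by MS(within).
f_stat = {source: ms[source] / ms["within"]
          for source in ("species", "flooding", "interaction")}

s_pooled = math.sqrt(ms["within"])          # 9e: pooled standard deviation
t_from_f = math.sqrt(f_stat["species"])     # 9f: sqrt(F) as used in the solution

for source, f in f_stat.items():
    print(f"F({source}) = {f:.2f}")
print(f"s_pooled = {s_pooled:.4f}, sqrt(F_species) = {t_from_f:.2f}")
```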