All_Exam_Preps

Formulas for Stat 109 Exam 1


In a density histogram each rectangular area = percentage (or relative frequency)
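Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{k} f_i\, y_i = \sum y_i\, p(y_i)$

SD (data): $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{k} f_i\,(y_i - \bar{y})^2}$

SD (probability distribution): $\sigma = \sqrt{\sum (y_i - \mu)^2\, p(y_i)} = \sqrt{\sum x_i^2\, p(x_i) - \mu^2}$

Median: $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value

Quartiles (square brackets mean: round any decimal down to find the nth value in the data set).
If n is divisible by 4, we average:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4}+1\right]^{th}}{2}$, $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4}+1\right]^{th}}{2}$
If n is not divisible by 4:
$Q_1 = \left[\frac{n}{4}+1\right]^{th}$ value, $Q_3 = \left[\frac{3n}{4}+1\right]^{th}$ value

IQR = $Q_3 - Q_1$
step = 1.5 IQR

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = P(A) + P(B)$ True only when A and B are mutually exclusive events.
$P(A \cap B) = P(A)\,P(B)$ True only when A and B are independent events.
$P(A \cap B) = P(A)\,P(B \mid A)$
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$

$(\bar{y} \pm 1\,sd)$ contains 68% of the data
$(\bar{y} \pm 2\,sd)$ contains 95% of the data
$(\bar{y} \pm 3\,sd)$ contains 99.7% of the data

Y ~ binomial (n, p): $P(Y = j) = \binom{n}{j}\, p^{j} (1-p)^{n-j}$, j = 0, 1, 2, …, n
Y ~ binomial (n, p): $E(X) = \mu_x = np$, $SD(X) = \sigma_x = \sqrt{np(1-p)}$

Normal approximation with the continuity correction:
$P(Y \le k) \approx P(Y \le k + 0.5) = P\left(Z \le \frac{k + 0.5 - \mu}{\sigma}\right)$
$P(Y \ge k) \approx P(Y \ge k - 0.5) = P\left(Z \ge \frac{k - 0.5 - \mu}{\sigma}\right)$
$P(Y = k) \approx P(k - 0.5 \le Y \le k + 0.5) = P\left(\frac{k - 0.5 - \mu}{\sigma} \le Z \le \frac{k + 0.5 - \mu}{\sigma}\right)$

Z scores:
$Z = \frac{Y - \mu}{\sigma}$
$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$
$Z = \frac{Y - np}{\sqrt{np(1-p)}}$
$\hat{p} = \frac{Y}{n}$, with continuity correction $\hat{p} \pm \frac{0.5}{n}$
$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$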
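As a quick check of the 68%, 95% and 99.7% figures on the formula sheet, the sketch below computes the area within k standard deviations of the mean for a normal distribution. It uses Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an illustrative assumption and not part of the formula sheet. The percentages hold for data that are approximately normal.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```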
Table 3A: CDF for the Standard Normal Distribution (left tail areas).
area = Prob[ Z < a ]

 -z      .00       .01       .02       .03       .04       .05       .06       .07       .08       .09
-3.4  0.000337  0.000325  0.000313  0.000302  0.000291  0.000280  0.000270  0.000260  0.000251  0.000242
-3.3  0.000483  0.000466  0.000450  0.000434  0.000419  0.000404  0.000390  0.000376  0.000362  0.000349
-3.2  0.000687  0.000664  0.000641  0.000619  0.000598  0.000577  0.000557  0.000538  0.000519  0.000501
-3.1  0.000968  0.000935  0.000904  0.000874  0.000845  0.000816  0.000789  0.000762  0.000736  0.000711
-3.0  0.001350  0.001306  0.001264  0.001223  0.001183  0.001144  0.001107  0.001070  0.001035  0.001001
-2.9  0.001866  0.001807  0.001750  0.001695  0.001641  0.001589  0.001538  0.001489  0.001441  0.001395
-2.8  0.002555  0.002477  0.002401  0.002327  0.002256  0.002186  0.002118  0.002052  0.001988  0.001926
-2.7  0.003467  0.003364  0.003264  0.003167  0.003072  0.002980  0.002890  0.002803  0.002718  0.002635
-2.6  0.004660  0.004527  0.004396  0.004269  0.004145  0.004025  0.003907  0.003793  0.003681  0.003573
-2.5  0.006210  0.006037  0.005868  0.005703  0.005543  0.005386  0.005234  0.005085  0.004940  0.004799
-2.4  0.008198  0.007976  0.007760  0.007549  0.007344  0.007143  0.006947  0.006756  0.006569  0.006387
-2.3  0.010724  0.010444  0.010170  0.009903  0.009642  0.009387  0.009137  0.008894  0.008656  0.008424
-2.2  0.013903  0.013553  0.013209  0.012874  0.012545  0.012224  0.011911  0.011604  0.011304  0.011011
-2.1  0.017864  0.017429  0.017003  0.016586  0.016177  0.015778  0.015386  0.015003  0.014629  0.014262
-2.0  0.022750  0.022216  0.021692  0.021178  0.020675  0.020182  0.019699  0.019226  0.018763  0.018309
-1.9  0.028717  0.028067  0.027429  0.026803  0.026190  0.025588  0.024998  0.024419  0.023852  0.023295
-1.8  0.035930  0.035148  0.034380  0.033625  0.032884  0.032157  0.031443  0.030742  0.030054  0.029379
-1.7  0.044565  0.043633  0.042716  0.041815  0.040930  0.040059  0.039204  0.038364  0.037538  0.036727
-1.6  0.054799  0.053699  0.052616  0.051551  0.050503  0.049471  0.048457  0.047460  0.046479  0.045514
-1.5  0.066807  0.065522  0.064255  0.063008  0.061780  0.060571  0.059380  0.058208  0.057053  0.055917
-1.4  0.080757  0.079270  0.077804  0.076359  0.074934  0.073529  0.072145  0.070781  0.069437  0.068112
-1.3  0.096800  0.095098  0.093418  0.091759  0.090123  0.088508  0.086915  0.085343  0.083793  0.082264
-1.2  0.115070  0.113139  0.111232  0.109349  0.107488  0.105650  0.103835  0.102042  0.100273  0.098525
-1.1  0.135666  0.133500  0.131357  0.129238  0.127143  0.125072  0.123024  0.121000  0.119000  0.117023
-1.0  0.158655  0.156248  0.153864  0.151505  0.149170  0.146859  0.144572  0.142310  0.140071  0.137857
-0.9  0.184060  0.181411  0.178786  0.176186  0.173609  0.171056  0.168528  0.166023  0.163543  0.161087
-0.8  0.211855  0.208970  0.206108  0.203269  0.200454  0.197663  0.194895  0.192150  0.189430  0.186733
-0.7  0.241964  0.238852  0.235763  0.232695  0.229650  0.226627  0.223627  0.220650  0.217695  0.214764
-0.6  0.274253  0.270931  0.267629  0.264347  0.261086  0.257846  0.254627  0.251429  0.248252  0.245097
-0.5  0.308538  0.305026  0.301532  0.298056  0.294599  0.291160  0.287740  0.284339  0.280957  0.277595
-0.4  0.344578  0.340903  0.337243  0.333598  0.329969  0.326355  0.322758  0.319178  0.315614  0.312067
-0.3  0.382089  0.378280  0.374484  0.370700  0.366928  0.363169  0.359424  0.355691  0.351973  0.348268
-0.2  0.420740  0.416834  0.412936  0.409046  0.405165  0.401294  0.397432  0.393580  0.389739  0.385908
-0.1  0.460172  0.456205  0.452242  0.448283  0.444330  0.440382  0.436441  0.432505  0.428576  0.424655
-0.0  0.500000  0.496011  0.492022  0.488034  0.484047  0.480061  0.476078  0.472097  0.468119  0.464144
Table 3B: CDF for the Standard Normal Distribution (left-tail areas to the right of the mean).
area = Prob[ Z < a ]

 +z      .00       .01       .02       .03       .04       .05       .06       .07       .08       .09
 0.0  0.500000  0.503989  0.507978  0.511966  0.515953  0.519939  0.523922  0.527903  0.531881  0.535856
 0.1  0.539828  0.543795  0.547758  0.551717  0.555670  0.559618  0.563559  0.567495  0.571424  0.575345
 0.2  0.579260  0.583166  0.587064  0.590954  0.594835  0.598706  0.602568  0.606420  0.610261  0.614092
 0.3  0.617911  0.621720  0.625516  0.629300  0.633072  0.636831  0.640576  0.644309  0.648027  0.651732
 0.4  0.655422  0.659097  0.662757  0.666402  0.670031  0.673645  0.677242  0.680822  0.684386  0.687933
 0.5  0.691462  0.694974  0.698468  0.701944  0.705401  0.708840  0.712260  0.715661  0.719043  0.722405
 0.6  0.725747  0.729069  0.732371  0.735653  0.738914  0.742154  0.745373  0.748571  0.751748  0.754903
 0.7  0.758036  0.761148  0.764237  0.767305  0.770350  0.773373  0.776373  0.779350  0.782305  0.785236
 0.8  0.788145  0.791030  0.793892  0.796731  0.799546  0.802337  0.805105  0.807850  0.810570  0.813267
 0.9  0.815940  0.818589  0.821214  0.823814  0.826391  0.828944  0.831472  0.833977  0.836457  0.838913
 1.0  0.841345  0.843752  0.846136  0.848495  0.850830  0.853141  0.855428  0.857690  0.859929  0.862143
 1.1  0.864334  0.866500  0.868643  0.870762  0.872857  0.874928  0.876976  0.879000  0.881000  0.882977
 1.2  0.884930  0.886861  0.888768  0.890651  0.892512  0.894350  0.896165  0.897958  0.899727  0.901475
 1.3  0.903200  0.904902  0.906582  0.908241  0.909877  0.911492  0.913085  0.914657  0.916207  0.917736
 1.4  0.919243  0.920730  0.922196  0.923641  0.925066  0.926471  0.927855  0.929219  0.930563  0.931888
 1.5  0.933193  0.934478  0.935745  0.936992  0.938220  0.939429  0.940620  0.941792  0.942947  0.944083
 1.6  0.945201  0.946301  0.947384  0.948449  0.949497  0.950529  0.951543  0.952540  0.953521  0.954486
 1.7  0.955435  0.956367  0.957284  0.958185  0.959070  0.959941  0.960796  0.961636  0.962462  0.963273
 1.8  0.964070  0.964852  0.965620  0.966375  0.967116  0.967843  0.968557  0.969258  0.969946  0.970621
 1.9  0.971283  0.971933  0.972571  0.973197  0.973810  0.974412  0.975002  0.975581  0.976148  0.976705
 2.0  0.977250  0.977784  0.978308  0.978822  0.979325  0.979818  0.980301  0.980774  0.981237  0.981691
 2.1  0.982136  0.982571  0.982997  0.983414  0.983823  0.984222  0.984614  0.984997  0.985371  0.985738
 2.2  0.986097  0.986447  0.986791  0.987126  0.987455  0.987776  0.988089  0.988396  0.988696  0.988989
 2.3  0.989276  0.989556  0.989830  0.990097  0.990358  0.990613  0.990863  0.991106  0.991344  0.991576
 2.4  0.991802  0.992024  0.992240  0.992451  0.992656  0.992857  0.993053  0.993244  0.993431  0.993613
 2.5  0.993790  0.993963  0.994132  0.994297  0.994457  0.994614  0.994766  0.994915  0.995060  0.995201
 2.6  0.995340  0.995473  0.995604  0.995731  0.995855  0.995975  0.996093  0.996207  0.996319  0.996427
 2.7  0.996533  0.996636  0.996736  0.996833  0.996928  0.997020  0.997110  0.997197  0.997282  0.997365
 2.8  0.997445  0.997523  0.997599  0.997673  0.997744  0.997814  0.997882  0.997948  0.998012  0.998074
 2.9  0.998134  0.998193  0.998250  0.998305  0.998359  0.998411  0.998462  0.998511  0.998559  0.998605
 3.0  0.998650  0.998694  0.998736  0.998777  0.998817  0.998856  0.998893  0.998930  0.998965  0.998999
 3.1  0.999032  0.999065  0.999096  0.999126  0.999155  0.999184  0.999211  0.999238  0.999264  0.999289
 3.2  0.999313  0.999336  0.999359  0.999381  0.999402  0.999423  0.999443  0.999462  0.999481  0.999499
 3.3  0.999517  0.999534  0.999550  0.999566  0.999581  0.999596  0.999610  0.999624  0.999638  0.999651
 3.4  0.999663  0.999675  0.999687  0.999698  0.999709  0.999720  0.999730  0.999740  0.999749  0.999758
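The left-tail areas in Table 3 can be reproduced numerically, which is handy for checking a lookup. The sketch below is one way to do that with Python's standard-library `statistics.NormalDist`; the helper names `left_tail` and `table_row` are illustrative assumptions and not part of the course materials.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def left_tail(z):
    """Return Prob[ Z < z ], the left-tail area tabulated in Table 3."""
    return std_normal.cdf(z)

def table_row(z_tenths):
    """Rebuild one table row: hundredths columns .00 through .09.

    Rows labeled with a minus sign (including -0.0) step further left,
    rows without one step to the right.
    """
    sign = -1.0 if str(float(z_tenths)).startswith("-") else 1.0
    return [round(left_tail(z_tenths + sign * h / 100), 6) for h in range(10)]

print(table_row(-1.2)[:3])   # first three entries of the -1.2 row
print(table_row(1.6)[7])     # entry for z = 1.67
```

Entries computed this way may differ from the printed table in the last decimal place because of rounding.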
Stat 109
Sample Exam 1A
Name_____________
1.) Given a scenario where 30% of the salmon in a river are farm fish, and the rest are wild, and we know that 12% of the wild salmon are tagged while 85% of the farm fish are tagged, find the following probabilities:
Answer all questions first with complete probability notation.
Then answer each question with an English sentence.
Declare all the event space variables: 2pts
a) Find the probability that a randomly drawn salmon is tagged. 7pts
b) Find the probability of drawing a wild salmon given that the fish is tagged. 7pts
c) Are the type of salmon drawn and whether it is tagged or not independent events? 5pts
(Again use probability notation and numerical values to support your answer.)
2.) John checks his chicken coop each morning and finds the following number of eggs according to the probability distribution table.

x, eggs   2     3     4
p(x)      0.3   0.5   0.2

2a.) Find the expected number of eggs (the mean of x). 5 pts
2b.) Find the typical range of eggs collected each morning, expressed as the first standard deviation of x. (Show your work.) 8pts
3.) Data display for year 2000: widowed females per 1000 for all 58 California counties.

Rank  Widowed per 1000  County
1     57                MONO
2     59                SAN BENITO
3     66                ALPINE
4     76                YOLO
5     78                SANTA CLARA
6     80                EL DORADO
7     80                MADERA
8     81                SAN BERNARDINO
9     82                VENTURA
10    83                ORANGE
11    84                KINGS
12    84                SANTA CRUZ
13    85                SAN DIEGO
14    86                SOLANO
15    87                LOS ANGELES
16    88                ALAMEDA
17    89                MONTEREY
18    90                KERN
19    90                LASSEN
20    90                PLACER
21    91                FRESNO
22    91                MERCED
23    91                TULARE
24    92                CONTRA COSTA
25    93                MARIN
26    93                PLUMAS
27    93                SANTA BARBARA
28    93                STANISLAUS
29    94                SACRAMENTO
30    97                SAN JOAQUIN
31    97                SAN MATEO
32    97                YUBA
33    98                MENDOCINO
34    98                RIVERSIDE
35    99                SONOMA
36    100               GLENN
37    102               IMPERIAL
38    102               SAN LUIS OBISPO
39    103               SAN FRANCISCO
40    104               COLUSA
41    104               HUMBOLDT
42    104               NEVADA
43    105               SUTTER
44    106               DEL NORTE
45    109               SIERRA
46    111               CALAVERAS
47    112               SHASTA
48    113               TUOLUMNE
49    116               MARIPOSA
50    116               SISKIYOU
51    119               BUTTE
52    120               TEHAMA
53    121               NAPA
54    123               TRINITY
55    125               AMADOR
56    135               INYO
57    138               LAKE
58    146               MODOC

3a.) Is the year 2000 California county widowed data symmetrical or is it skewed to left or right? 2pts

[Histogram: 2000 CA Widowed Females per 1000 by County. Horizontal axis: widowed females per 1000 (60 to 140); vertical axis: Frequency (0 to 18).]
3b.) Use correct notation to express the 5 key values for a boxplot for the incidence of widowed females per 1000 for all 58 California counties: 11pts
3c.) Draw a boxplot of the CA county widow data set. Properly denote any outliers in the plot.











4.) The lengths of adult Koi in a large pond are normally distributed with a mean length of 25 inches and a standard deviation of 3 inches. Answer the following questions using the event variable with probability notation, and show the equivalent Z score within probability notation.
a) Declare the event variable. 1pt
4a.) What probability corresponds to a Koi that is at most 30 inches in length? 8pts
4b.) The top 29% of Koi lengths must be above what length threshold? 8pts
4c.) How long will a Koi be if its length is at the 37th percentile of length? 8pts
4d.) The Koi population of this particular pond is the progeny of a metallic “Ogon” variety such that each of the Koi presents a number of metallic scales on its body interspersed among the other scales of the fish body. Assuming that the number of metallic scales on any given Koi is normally distributed throughout the population, with a mean of 18 metallic scales and a standard deviation of 3 metallic scales, determine the following probabilities.
a) Declare the event variable. 1pt
4e.) Find the probability that a randomly drawn Koi has less than 16 metallic scales upon its body. 8pt
4f.) Find the probability that a randomly drawn Koi has more than 17 metallic scales upon its body. 8pt
4g.) Find the probability that a randomly drawn Koi has between 13 and 19 (inclusive) metallic scales upon its body. 8pt
5a.) In 2005, 57% of the incoming Freshman class for the California State University required either English or Mathematics remediation before taking college level course work. 47% required remediation in English while 37% required remediation in Math. What portion of the 2005 Freshman class required remediation in both English and Math? Use probability notation to express your answer. 8pts
5b.) What is the probability that a 2005 Freshman needs remediation in Math but not English? Use probability notation to express your answer. 8pts
5c.) Find the probability that either 6 or 7 out of 10 randomly drawn CSU Freshmen will not require remediation class work. Declare the variable. Use probability notation to express your answer. 8pts
5d.) Find the first standard deviation window to express the typical range of 10 randomly chosen Freshmen that will not require remediation. 8pts
6.) Given that one out of four salamanders is consumed as prey before reaching sexual maturity, find the probability that between 10% and 30% of 50 juvenile salamanders will be consumed before reaching maturity. Show all work and use proper notation for full credit. Finish with an English sentence. 12pts
Stat 109
Sample Exam 1A Solutions
1.) Given a scenario where 30% of the salmon in a river are farm fish, and the rest are wild, and we know that 12% of the wild salmon are tagged while 85% of the farm fish are tagged, find the following probabilities:
Answer all questions first with complete probability notation.
Then answer each question with an English sentence.

Declare all the event space variables: 2pts
T = Tagged Salmon
W = Wild Salmon
F = Farm Salmon

a) Find the probability that a randomly drawn salmon is tagged. 7pts (Notation: 2pts, Calculation: 5pts)
$P(T) = P(T \mid W)\,P(W) + P(T \mid F)\,P(F)$
$P(T) = 0.12 \times 0.7 + 0.85 \times 0.3$
$P(T) = 0.339$
“About 34% of random draws will be tagged.”

b) Find the probability of drawing a wild salmon given that the fish is tagged. 7pts (Notation: 2pts, Calculation: 5pts)
$P(W \mid T) = \frac{P(W \cap T)}{P(T)} = \frac{P(T \mid W)\,P(W)}{P(T)}$
$P(W \mid T) = \frac{0.12 \times 0.70}{0.339} = 0.2478$
“About 25% of tagged fish will be wild.”

c) Are the type of salmon drawn and whether it is tagged or not independent events? 5pts
(Again use probability notation and numerical values to support your answer.)
For independent events the following must be true: $P(A \cap B) = P(A)\,P(B)$
We will use the tagged and wild salmon data since it is at hand:
Is $P(W \cap T) = P(W)\,P(T)$ true?
$P(W \cap T) = P(T \mid W)\,P(W)$ from above
$P(W \cap T) = 0.12 \times 0.7$
$P(W \cap T) = 0.084$
$P(W)\,P(T) = 0.7 \times 0.339$
$P(W)\,P(T) = 0.2373$
Since $0.084 \neq 0.2373$, then $P(W \cap T) \neq P(W)\,P(T)$.
Therefore we do not have independent events.
In English: “The probability of whether a salmon is tagged or not depends on whether it is a wild or a farmed salmon.”
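The three salmon answers can be checked with a few lines of arithmetic. This is a minimal sketch using only the percentages given in the problem; the variable names are illustrative assumptions.

```python
# Given values from the problem statement.
p_farm = 0.30
p_wild = 1 - p_farm               # the rest are wild
p_tag_given_wild = 0.12
p_tag_given_farm = 0.85

# a) Total probability: P(T) = P(T|W)P(W) + P(T|F)P(F)
p_tag = p_tag_given_wild * p_wild + p_tag_given_farm * p_farm

# b) Conditional probability: P(W|T) = P(W and T) / P(T)
p_wild_and_tag = p_tag_given_wild * p_wild
p_wild_given_tag = p_wild_and_tag / p_tag

# c) Independence would require P(W and T) = P(W)P(T).
independent = abs(p_wild_and_tag - p_wild * p_tag) < 1e-9

print(round(p_tag, 3), round(p_wild_given_tag, 4), independent)
```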
2.) John checks his chicken coop each morning and finds the following number of eggs according to the probability distribution table. Express answer with correct notation.

x, eggs   2     3     4
p(x)      0.3   0.5   0.2

2a.) Find the expected number of eggs (the mean of x). 5 pts (1/2 pt for notation)
$\mu = \sum x_i\, p(x_i) = 2 \times 0.3 + 3 \times 0.5 + 4 \times 0.2 = 2.9$
$\mu = 2.9$

2b.) Find the typical range of eggs collected each morning, expressed as the first standard deviation of x. (Show your work.) 8pts (1 pt for notation)
$\sigma = \sqrt{\sum x_i^2\, p(x_i) - \mu^2} = \sqrt{2^2(0.3) + 3^2(0.5) + 4^2(0.2) - 2.9^2} = \sqrt{0.49} = 0.7$
$\sigma = 0.7$
$\mu \pm \sigma = 2.9 \pm 0.7 = (2.2,\ 3.6)$
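The mean and standard deviation of the egg distribution can be verified directly from the table. The sketch below assumes the distribution is stored as a dictionary from x to p(x); the names are illustrative.

```python
from math import sqrt

# Probability distribution table: number of eggs -> probability.
eggs = {2: 0.3, 3: 0.5, 4: 0.2}

# mu = sum of x * p(x)
mu = sum(x * p for x, p in eggs.items())

# sigma = sqrt( sum of x^2 * p(x) - mu^2 )
sigma = sqrt(sum(x**2 * p for x, p in eggs.items()) - mu**2)

# First standard deviation window: mu - sigma to mu + sigma
window = (mu - sigma, mu + sigma)

print(round(mu, 2), round(sigma, 2), tuple(round(w, 2) for w in window))
```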
3.) Data display for year 2000: widowed females per 1000 for all 58 California counties.

Rank  Widowed per 1000  County
1     57                MONO
2     59                SAN BENITO
3     66                ALPINE
4     76                YOLO
5     78                SANTA CLARA
6     80                EL DORADO
7     80                MADERA
8     81                SAN BERNARDINO
9     82                VENTURA
10    83                ORANGE
11    84                KINGS
12    84                SANTA CRUZ
13    85                SAN DIEGO
14    86                SOLANO
15    87                LOS ANGELES
16    88                ALAMEDA
17    89                MONTEREY
18    90                KERN
19    90                LASSEN
20    90                PLACER
21    91                FRESNO
22    91                MERCED
23    91                TULARE
24    92                CONTRA COSTA
25    93                MARIN
26    93                PLUMAS
27    93                SANTA BARBARA
28    93                STANISLAUS
29    94                SACRAMENTO
30    97                SAN JOAQUIN
31    97                SAN MATEO
32    97                YUBA
33    98                MENDOCINO
34    98                RIVERSIDE
35    99                SONOMA
36    100               GLENN
37    102               IMPERIAL
38    102               SAN LUIS OBISPO
39    103               SAN FRANCISCO
40    104               COLUSA
41    104               HUMBOLDT
42    104               NEVADA
43    105               SUTTER
44    106               DEL NORTE
45    109               SIERRA
46    111               CALAVERAS
47    112               SHASTA
48    113               TUOLUMNE
49    116               MARIPOSA
50    116               SISKIYOU
51    119               BUTTE
52    120               TEHAMA
53    121               NAPA
54    123               TRINITY
55    125               AMADOR
56    135               INYO
57    138               LAKE
58    146               MODOC

3a.) Is the year 2000 California county widowed data symmetrical or is it skewed to left or right?
Skewed slightly right 2pts

[Histogram: 2000 CA Widowed Females per 1000 by County. Horizontal axis: widowed females per 1000 (60 to 140); vertical axis: Frequency (0 to 18).]
3b.) Use correct notation to express the 5 key values for a boxplot for the incidence of widowed females per 1000 for all 58 California counties: 11 pts

2.) Find the median:
$\tilde{x} = \left(\frac{n+1}{2}\right)^{th} = \left(\frac{58+1}{2}\right)^{th} = \left(\frac{59}{2}\right)^{th} = 29.5^{th}$
$= 29^{th} + 0.5\,(30^{th} - 29^{th})$
$= 94 + 0.5\,(97 - 94)$
$= 94 + 0.5\,(3)$
$\tilde{x} = 95.5$

3.) Determine the Quartile Criterion
If N is divisible by 4, we average:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4}+1\right]^{th}}{2}$, $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4}+1\right]^{th}}{2}$
If N is not divisible by 4:
$Q_1 = \left[\frac{n}{4}+1\right]^{th}$ value, $Q_3 = \left[\frac{3n}{4}+1\right]^{th}$ value
Where square brackets indicate that we round any decimal down to find the nth value in the data set.
N = 58 is not divisible by 4, so: $Q_1 = \left[\frac{n}{4}+1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4}+1\right]^{th}$

4.) Find the 1st and 3rd Quartiles.
$Q_1 = \left[\frac{n}{4}+1\right]^{th} = \left[\frac{58}{4}+1\right]^{th} = [14.5 + 1]^{th} = [15.5]^{th} = 15^{th}$ value
$Q_1 = 87$
$Q_3 = \left[\frac{3n}{4}+1\right]^{th} = \left[\frac{3 \cdot 58}{4}+1\right]^{th} = [43.5 + 1]^{th} = [44.5]^{th} = 44^{th}$ value
$Q_3 = 106$

5.) Find the IQR and Step.
IQR = Inter-quartile range
IQR = Q3 – Q1
IQR = 106 – 87
IQR = 19
Step = 1.5 × IQR
Step = 1.5 × 19
Step = 28.5

Outliers:
LOT = Q1 − Step = 87 − 28.5 = 58.5
UOT = Q3 + Step = 106 + 28.5 = 134.5
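The boxplot values above can be recomputed with a short script that follows the same rules: the median at the ((n + 1)/2)th position, the course's quartile criterion with square brackets meaning "round down", and outlier thresholds one step beyond each quartile. This is a sketch under those stated rules; the function names are illustrative, and other software may use a different quartile convention and give slightly different quartiles.

```python
from math import floor

# The 58 county values from the data display, already in ascending order.
widowed = [
    57, 59, 66, 76, 78, 80, 80, 81, 82, 83, 84, 84, 85, 86, 87, 88, 89, 90, 90, 90,
    91, 91, 91, 92, 93, 93, 93, 93, 94, 97, 97, 97, 98, 98, 99, 100, 102, 102, 103, 104,
    104, 104, 105, 106, 109, 111, 112, 113, 116, 116, 119, 120, 121, 123, 125, 135, 138, 146,
]

def median(data):
    """Median at the ((n + 1) / 2)th position, interpolating a .5 position."""
    s = sorted(data)
    pos = (len(s) + 1) / 2
    lo = floor(pos)
    frac = pos - lo
    if frac == 0:
        return s[lo - 1]
    return s[lo - 1] + frac * (s[lo] - s[lo - 1])

def quartiles(data):
    """Q1 and Q3 using the quartile criterion from the formula sheet."""
    s = sorted(data)
    n = len(s)
    if n % 4 == 0:
        # Average the [n/4]th and [n/4 + 1]th values (1-based positions).
        q1 = (s[n // 4 - 1] + s[n // 4]) / 2
        q3 = (s[3 * n // 4 - 1] + s[3 * n // 4]) / 2
    else:
        # Take the [n/4 + 1]th value, rounding the position down.
        q1 = s[floor(n / 4 + 1) - 1]
        q3 = s[floor(3 * n / 4 + 1) - 1]
    return q1, q3

q1, q3 = quartiles(widowed)
step = 1.5 * (q3 - q1)
lot, uot = q1 - step, q3 + step          # lower and upper outlier thresholds
outliers = [v for v in widowed if v < lot or v > uot]
inside = [v for v in widowed if lot <= v <= uot]
whiskers = (min(inside), max(inside))    # whiskers stop at the last inside points

print(median(widowed), q1, q3, step, lot, uot, outliers, whiskers)
```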
3c.) Draw a boxplot of the CA county widow data set. Properly denote any outliers in the plot.
Note! All data points that lie outside the outlier thresholds are expressed with asterisks. The boxplot whiskers terminate at the last data point to lie within the outlier thresholds. It is an error to extend the whiskers to the outlier thresholds.
[Boxplot of the CA county widow data set, with outliers marked individually beyond the whiskers.]
4.) The lengths of adult Koi in a large pond are normally distributed with a mean length of 25 inches and a standard deviation of 3 inches. Answer the following questions using the event variable with probability notation, and show the equivalent Z score within probability notation.

Declare the event variable.
X = Koi length in inches. 1pt
X = Fish. -1pt: For a vague declaration without units of measure.

4a.) What probability corresponds to a Koi at most 30 inches in length? 8pts
$Z = \frac{X - \mu}{\sigma} \rightarrow Z = \frac{30 - 25}{3} \rightarrow Z = 1.67$, then
$P(X \le 30) = P(Z \le 1.67) = 0.95254$
Answer like this with complete notation, not like this: 0.95254 (-1pt)
Note! Here is a number with a circle drawn around it. This number has no meaning without its supporting notation.

4b.) The top 29% of Koi lengths must be above what length threshold? 8pts
Note: This is an upper tail probability, and we must know to use the complement value because the normal distribution tables only provide lower tail probabilities.
a) $P(X \ge ???) = P(Z \ge ???) = 0.29$
b) Search the body of the table for the Z score associated with 0.71. Then $P(Z \ge 0.55) = 0.29$
c) Find the X associated with this Z score: $Z = \frac{X - \mu}{\sigma} \rightarrow 0.55 = \frac{X - 25}{3} \rightarrow X = 26.65$
d) Report the answer using probability notation. Show the transition from the real world X values to the associated Z scores.
$P(X \ge 26.65) = P(Z \ge 0.55) = 0.29$
4c.) How long will a Koi be if its length is at the 37th percentile of length? 8pts
Note that the percentiles on the normal curve run from zero to 100% and accumulate from left to right. The corresponding Koi lengths for each of these percentiles are shown below.
[Figure: normal curve with a percentile axis from 0% to 100% in steps of 10% and a Koi length axis from 22" to 28".]
a) Search the body of the table for the Z score associated with 0.37.
b) $P(X \le ???) = P(Z \le ???) = 0.37$
c) We find: $P(Z \le -0.33) = 0.37$
d) Using standardization we find x: $Z = \frac{x - \mu}{\sigma} \rightarrow -0.33 = \frac{x - 25}{3} \rightarrow x = 24.01 \approx 24$
e) Report your answer with full probability notation:
$P(X \le 24) = P(Z \le -0.33) = 0.37$
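The table lookups in 4a through 4c can be mirrored in code: standardize, round Z to two decimals as the table requires, then read a left-tail area, or go the other way from an area back to a length. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `to_z` and `to_x` are illustrative assumptions.

```python
from statistics import NormalDist

MU, SIGMA = 25, 3                  # Koi length in inches
std_normal = NormalDist()          # Z with mean 0 and standard deviation 1

def to_z(x):
    """Standardize a length and round to two decimals, as for a table lookup."""
    return round((x - MU) / SIGMA, 2)

def to_x(z):
    """Convert a Z score back to a Koi length in inches."""
    return MU + z * SIGMA

# 4a: P(X <= 30) = P(Z <= 1.67)
p_at_most_30 = std_normal.cdf(to_z(30))

# 4b: the top 29% lies above the point with left-tail area 1 - 0.29 = 0.71
z_top_29 = round(std_normal.inv_cdf(1 - 0.29), 2)
x_top_29 = to_x(z_top_29)

# 4c: the 37th percentile has left-tail area 0.37
z_p37 = round(std_normal.inv_cdf(0.37), 2)
x_p37 = to_x(z_p37)

print(round(p_at_most_30, 5), z_top_29, round(x_top_29, 2), z_p37, round(x_p37, 2))
```

Rounding Z to two decimals reproduces the table-based answers; skipping the rounding gives slightly different values, for example a 4b threshold closer to 26.66 inches.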
4d.) The Koi population of this particular pond is the progeny of a metallic “Ogon” variety such that each of the Koi presents a number of metallic scales on its body interspersed among the other scales of the fish body. Assuming that the number of metallic scales on any given Koi is normally distributed throughout the population, with a mean of 18 metallic scales and a standard deviation of 3 metallic scales, determine the following probabilities.
a) Declare the event variable. 1pt
N = The number of metallic scales found on a given Koi from the pond.
N = Scales -1pt: For a vague declaration.
4e.) Find the probability that a randomly drawn Koi has less than 16 metallic scales upon its body. 8pt
$P(N < 16) = P(N \le 15)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Strict inequalities of “less than” must be converted numerically to “less than or equal to” if using a continuous distribution.
$P(N < 16) = P(N \le 15) \approx P(N \le 15.5)$
A continuity correction is necessary here because a discrete count of fish scales must be mapped to a continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve.
$Z = \frac{x - \mu}{\sigma} = \frac{15.5 - 18}{3}$
$Z = -0.83$
$P(N < 16) = P(Z \le -0.83) = 0.203269$
In English: “About 20% of the time a randomly drawn Koi will have less than 16 metallic scales upon its body.”
Answer with complete notation like this, and not this: 0.203269
4f.) Find the probability that a randomly drawn Koi has more than 17 metallic scales upon its body. 8pt
$P(N > 17) = P(N \ge 18)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Strict inequalities of “greater than” must be converted numerically to “greater than or equal to” if using a continuous distribution.
$P(N > 17) = P(N \ge 18) \approx P(N \ge 17.5)$
A continuity correction is necessary here because a discrete count of fish scales must be mapped to a continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve. Also note that a complement of one minus the Z-table probability must be accounted for here.
$Z = \frac{x - \mu}{\sigma} = \frac{17.5 - 18}{3}$
$Z = -0.17$
$P(Z \ge -0.17) = 1 - P(Z \le -0.17) = 1 - 0.432505$
$P(N > 17) = P(Z \ge -0.17) = 0.567495$
In English: “About 57% of the time a randomly drawn Koi will have more than 17 metallic scales upon its body.”
Answer with complete notation like this, and not this: 0.567495
4g.) Find the probability that a randomly drawn Koi has between 13 and 19 (inclusive) metallic scales upon its body. 8pt
$P(13 \le N \le 19)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve. Here the inclusive counts (of 13 and 19) are already specified in the problem.
$P(13 \le N \le 19) \approx P(12.5 \le N \le 19.5)$
A continuity correction is still necessary because a discrete count of fish scales must be mapped to a continuous curve of the normal distribution. Note that we always choose a half count in the direction that enlarges the shaded region beneath the normal curve.
$Z = \frac{x - \mu}{\sigma} = \frac{12.5 - 18}{3}$, so $Z = -1.83$
$Z = \frac{x - \mu}{\sigma} = \frac{19.5 - 18}{3}$, so $Z = 0.5$
$P(13 \le N \le 19) = P(-1.83 \le Z \le 0.5)$
$P(13 \le N \le 19) = P(Z \le 0.5) - P(Z \le -1.83)$
$P(13 \le N \le 19) = 0.691462 - 0.033625$
$P(13 \le N \le 19) = 0.657837$
In English: “About 66% of the time a randomly drawn Koi will have between 13 and 19 metallic scales upon its body.”
Answer with complete notation like this: $P(13 \le N \le 19) = P(-1.83 \le Z \le 0.5) = 0.657837$, and not this: 0.657837
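The continuity-corrected answers for 4e through 4g can be checked the same way as a table lookup: move each count boundary by half a count in the direction that enlarges the region, standardize, round Z to two decimals, and read the left-tail area. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_score` is an illustrative assumption.

```python
from statistics import NormalDist

MU, SIGMA = 18, 3                  # metallic scales per Koi
std_normal = NormalDist()          # Z with mean 0 and standard deviation 1

def z_score(x):
    """Standardize a corrected boundary, rounded to two decimals like the table."""
    return round((x - MU) / SIGMA, 2)

# 4e: P(N < 16) = P(N <= 15), corrected to P(N <= 15.5)
p_less_than_16 = std_normal.cdf(z_score(15.5))

# 4f: P(N > 17) = P(N >= 18), corrected to P(N >= 17.5), an upper tail
p_more_than_17 = 1 - std_normal.cdf(z_score(17.5))

# 4g: P(13 <= N <= 19), corrected to P(12.5 <= N <= 19.5)
p_13_to_19 = std_normal.cdf(z_score(19.5)) - std_normal.cdf(z_score(12.5))

print(round(p_less_than_16, 6), round(p_more_than_17, 6), round(p_13_to_19, 6))
```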
5a.) In 2005, 57% of the incoming Freshman class for the California State University required either English or Mathematics remediation before taking college level course work. 47% required remediation in English while 37% required remediation in Math. What portion of the 2005 Freshman class required remediation in both English and Math? Use probability notation to express your answer. 8pts
Declaration and notation: 2pts
E = Student needs English Remediation.
M = Student needs Math Remediation.
$P(E \cap M) = P(E) + P(M) - P(E \cup M)$
$P(E \cap M) = 0.47 + 0.37 - 0.57$
$P(E \cap M) = 0.27$
5b.) What is the probability that a 2005 Freshman needs remediation in Math but not English? Use probability notation to express your answer. 8pts
$P(M \cap \bar{E}) = P(M) - P(M \cap E)$
$P(M \cap \bar{E}) = 0.37 - 0.27$
$P(M \cap \bar{E}) = 0.10$
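The addition rule used in 5a and 5b is simple enough to check by direct arithmetic. In this sketch the variable names are illustrative assumptions; the last line also computes the share needing no remediation, which is the success probability of 0.43 that appears in the binomial questions.

```python
# Given percentages for the 2005 Freshman class.
p_either = 0.57     # P(E or M)
p_english = 0.47    # P(E)
p_math = 0.37       # P(M)

# 5a: rearranged addition rule, P(E and M) = P(E) + P(M) - P(E or M)
p_both = p_english + p_math - p_either

# 5b: Math but not English, P(M and not E) = P(M) - P(M and E)
p_math_only = p_math - p_both

# Complement: no remediation needed at all
p_no_remediation = 1 - p_either

print(round(p_both, 2), round(p_math_only, 2), round(p_no_remediation, 2))
```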
5c.) Find the probability that either 6 or 7 out of 10 CSU Freshmen will not require remediation class work. Declare the variable. Use probability notation to express your answer. 8pts
X = The number of 10 CSU freshmen that will not require remediation. 1pt
$P(6 \le X \le 7) = \binom{10}{6}\, 0.43^{6}\, 0.57^{4} + \binom{10}{7}\, 0.43^{7}\, 0.57^{3}$
$P(6 \le X \le 7) = 0.140129 + 0.060407 = 0.200536$
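The binomial probabilities in 5c can be reproduced directly from the binomial formula. The sketch below assumes p = 0.43 (the share not needing remediation, 1 − 0.57); `binom_pmf` is a helper name chosen here for illustration.

```python
from math import comb

def binom_pmf(j, n, p):
    """P(Y = j) for Y ~ binomial(n, p)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

n, p = 10, 0.43
p6 = binom_pmf(6, n, p)
p7 = binom_pmf(7, n, p)
print(round(p6, 6), round(p7, 6), round(p6 + p7, 6))
```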
5d.) Find the first standard deviation window to express the typical range of 10 randomly chosen
freshmen that will not require remediation. 8pts
$np \pm \sqrt{np(1-p)} = 10 \cdot 0.43 \pm \sqrt{10 \cdot 0.43\,(1 - 0.43)} = 4.3 \pm 1.56556 = (2.734,\ 5.866)$ Notation 1pt
“About 2.7 to 5.9 of 10 freshmen will typically not require remediation.”
6.) Given that one out of four salamanders is consumed as prey before reaching sexual maturity, find the
probability that between 10% and 30% of 50 salamanders will be consumed before reaching maturity. Show
all work and use proper notation for full credit. Finish with an English sentence. 14pts
Method 1: Declare proportions with the continuity correction.
$\frac{Y}{n} = \hat{p} \;\rightarrow\; \hat{p} \pm \frac{0.5}{n}$
Declare the parameter: $\hat{p}$ = proportion of 50 drawn salamanders that are consumed. 2pt
-2pt for all of these inadequate declarations:
x = Salamanders
x = consumed salamanders
x = Proportion of salamanders that are consumed.
P = portion consumed.
If you are going to use calculations that determine the probability
that a sample proportion will range between some specified
thresholds, then this is the variable, $\hat{p}$, that we must declare. Do
not use “x” or “y” unless you are declaring for counts and plan to
calculate accordingly (see Method 2). Be sure to include the
sample size as this will affect probability.
$P(0.1 \le \hat{p} \le 0.3) \approx P\left(0.10 - \frac{0.5}{50} \le \hat{p} \le 0.30 + \frac{0.5}{50}\right)$
$P(0.1 \le \hat{p} \le 0.3) \approx P(0.09 \le \hat{p} \le 0.31)$
Convert proportions to Z scores. Use:
$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
$P(0.1 \le \hat{p} \le 0.3) \approx P\left(\frac{0.09 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{50}}} \le Z \le \frac{0.31 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{50}}}\right)$
$P(0.1 \le \hat{p} \le 0.3) \approx P(-2.613 \le Z \le 0.9798)$
$P(0.1 \le \hat{p} \le 0.3) \approx P(Z \le 0.98) - P(Z \le -2.61)$
$P(0.1 \le \hat{p} \le 0.3) \approx 0.836457 - 0.004527$
$P(0.1 \le \hat{p} \le 0.3) \approx 0.83193$
Answer in English: “There is about
an 83% chance that between 10% and
30% of the 50 salamanders will be
consumed before reaching maturity.”
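As a check on Method 1, the sketch below applies the ±0.5/n continuity correction to the proportion thresholds and standardizes with the standard error of a sample proportion. The helper `phi` and the variable names are illustrative assumptions, and Z is rounded to two decimals only to mirror the table lookup.

```python
import math

def phi(z):
    """Standard normal lower-tail probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 50, 0.25                     # sample size and "one out of four"
lo = 0.10 - 0.5 / n                 # corrected lower threshold, 0.09
hi = 0.30 + 0.5 / n                 # corrected upper threshold, 0.31
se = math.sqrt(p * (1 - p) / n)     # standard error of p-hat

z_lo = (lo - p) / se
z_hi = (hi - p) / se
prob = phi(round(z_hi, 2)) - phi(round(z_lo, 2))
print(round(z_lo, 4), round(z_hi, 4), round(prob, 4))
```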
Problem #6) Method 2: Convert proportions to counts, then use the continuity correction.
$\hat{p} = \frac{Y}{n} \;\rightarrow\; Y = \hat{p} \cdot n$
Declare the parameter: Y = number of 50 drawn salamanders that are consumed. 2pt
-2pt for all of these inadequate declarations:
x = Salamanders
x = consumed salamanders
x = Proportion of salamanders that are consumed.
P = juveniles.
If you are going to use calculations that determine the probability
that a sample count will range between some specified thresholds,
then this is the variable that we must declare. Be sure to include
the sample size as this will affect probability.
$P(0.1 \le \hat{p} \le 0.3) \rightarrow P(0.1 \cdot 50 \le \hat{p} \cdot n \le 0.3 \cdot 50) \rightarrow P(5 \le Y \le 15)$
$P(5 \le Y \le 15) \approx P(5 - 0.5 \le Y \le 15 + 0.5) = P(4.5 \le Y \le 15.5)$
After applying the continuity correction we convert the counts of consumed
salamanders to Z-scores using the binomial expressions for the mean and
standard deviation: $\mu = n \cdot p$ and $\sigma = \sqrt{n \cdot p(1 - p)}$. Note that “one out of
four” implies that p = 0.25.
$Z = \frac{X - \mu}{\sigma} = \frac{X - n \cdot p}{\sqrt{n \cdot p(1 - p)}}$
$P(4.5 \le Y \le 15.5) = P\left(\frac{4.5 - 0.25(50)}{\sqrt{50(0.25)(1 - 0.25)}} \le Z \le \frac{15.5 - 0.25(50)}{\sqrt{50(0.25)(1 - 0.25)}}\right)$
$P(4.5 \le Y \le 15.5) = P(-2.61279 \le Z \le 0.979796) \approx P(-2.61 \le Z \le 0.98)$
We round to the nearest hundredth for the Z-table probability values.
$P(4.5 \le Y \le 15.5) \approx P(Z \le 0.98) - P(Z \le -2.61)$
$P(4.5 \le Y \le 15.5) \approx 0.836457 - 0.004527$
$P(4.5 \le Y \le 15.5) \approx 0.83193$
Answer in English: “There is about an 83% chance that between 10% and 30% of the 50
salamanders will be consumed before reaching maturity.”
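Working in proportions and working in counts are the same calculation on two scales, so the Z scores must agree. The short sketch below illustrates this for the salamander problem; the variable names are assumptions made for the example.

```python
import math

n, p = 50, 0.25

# Proportion scale: thresholds 0.10 and 0.30, corrected by 0.5/n
se_prop = math.sqrt(p * (1 - p) / n)
z_lo_prop = ((0.10 - 0.5 / n) - p) / se_prop
z_hi_prop = ((0.30 + 0.5 / n) - p) / se_prop

# Count scale: thresholds 5 and 15, corrected by half a count
y_lo, y_hi = round(0.10 * n), round(0.30 * n)
sd_count = math.sqrt(n * p * (1 - p))
z_lo_count = ((y_lo - 0.5) - n * p) / sd_count
z_hi_count = ((y_hi + 0.5) - n * p) / sd_count

print(round(z_lo_prop, 5), round(z_lo_count, 5))
print(round(z_hi_prop, 5), round(z_hi_count, 5))
```

Dividing the count thresholds, mean, and standard deviation by n gives exactly the proportion version, which is why both routes end at the same table lookup.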
Stat 109
Sample Exam 1B
Name__________
603
1.) Phenotypic risk for type 2 diabetes was assigned to 2377 people of European ancestry in the Framingham
Offspring Study. N Engl J Med. Nov 20, 2008; 359(21): 2208–2219. Participants were genotyped for 18 SNPs
(single nucleotide polymorphisms) associated with type 2 diabetes. Based upon the frequency of these 18 SNPs
a phenotypic risk for type 2 diabetes was given to each participant on a scale that ran from a low of 7 to a high
of 27. Three categories of phenotypic risk for type 2 diabetes were based upon one’s phenotypic score: Low at
less than 15, Medium for scores between 16 and 20 inclusive, and High for scores greater than 20. Europeans
over 50 years of age with a low phenotypic score had a 7% incidence of type 2 diabetes. While Europeans over
50 with a high phenotypic score had a 17% incidence of type 2 diabetes. Given that a sub-population of folks
with European ancestry over 50 years of age contains people with only high and low phenotypic scores, 30% of
which were high, with the rest having low scores, and that environmental factors such as diet and exercise have
been controlled, determine the following using correct probability notation.
Declare variables (2pts)
a.)
Find the probability that an over 50 year old with European ancestry will have Type 2 diabetes. 7pts
b)
Find the probability that an over 50 year old with European ancestry will have a low phenotypic
score given that the person has Type 2 diabetes. 7pts
c) Is the incidence of type 2 diabetes independent of the phenotypic score for people of European
ancestry? (Again use probability notation and numerical values to support your answer.) 5pts
2.) Instrumental conditioning is a term developed by Edward Thorndike (1898) in
which an animal gradually learns a task through repetition. A Thorndike box was
first used on house cats. The cat must learn how to paw at a series of levers and
latches with several false starts before she can escape from the box. Given that a
large number of individual trials were observed for cats placed in a Thorndike box,
suppose that each cat was repeatedly subjected to a Thorndike box until her escape
from the box could be performed in under a minute. A pdf of the number of trials
required for each cat to reach an escape time of less than a minute was recorded and
is shown below.
x, trials:   4     5     6     7     8     9     10
p(x):        0.1   0.2   0.3   0.0   0.2   0.0   0.2
2a.) Find the expected number of trials required before a house cat can escape from the box in
less than a minute.(the mean of x.)
5 pts
2b.) Find the standard deviation of the number of trials required before a house cat can escape
from the box in less than a minute. 8pts
2c.) Find the typical range of the number of trials a cat needs to escape the Thorndike box in less
than a minute as the first standard deviation of x. (show your work.) 4pts (1 pt for notation)
3.) Kiama Blowhole Eruption Intervals:
An ocean swell produces spectacular eruptions of water through a hole in
the cliff at Kiama, about 120km south of Sydney, Australia, known as the
Blowhole. The times in seconds between each of 64 successive eruptions
from 1340 hours on 12 July 1998 were observed using a digital watch.
3a.) Circle the distribution that best describes the interval data for the
blowhole eruptions:
Skewed Left     Symmetrical     Skewed Right
Times between eruptions in seconds (n = 64; values increase down each column):
7    10   15   18   28   40   60   83
8    10   16   21   29   42   61   83
8    10   17   25   29   47   61   87
8    10   17   25   34   51   68   89
8    11   17   26   35   54   69   91
8    11   18   27   36   55   73   95
9    12   18   27   36   56   77   146
9    14   18   28   37   60   82   169
3b.) Use correct notation to express the
5 key values for a boxplot for the
interval of time in seconds between
eruptions at the Kiama Blowhole.
11 pts
3c.) Draw a boxplot of the Kiama Blowhole data. Properly denote any outliers in the plot.
4.) A population of 15 ruby-throated hummingbirds, Archilochus
colubris, was observed in a controlled environment as the subject of a
senior student’s thesis. Of interest was the quantitative feeding behavior of
hummingbirds when a food source is plentiful at numerous sites. Eight hummingbird
feeders were filled with a sugar-nectar solution and suspended from scales that
would digitally record the mass of each feeder after each
successive feeding. The difference in the feeder mass between each successive
recording provided a record of the amount of fluid consumed by a
hummingbird at each feeding. The mass consumed at each feeding was
normally distributed with a mean of 0.25 g and a standard deviation of 0.06 g.
Declare the event variable.
Answer the following questions using the event variable with probability
notation and show the equivalent Z score within probability notation.
4a.) At what percentile is a feeding of 0.2 grams? 8pts
4b.) The upper tenth percentile of feeding mass corresponds to what mass? 8pts
4c.) How much mass is consumed if the feeding
occurred at the 72nd percentile of feeding masses? 8pts
4d.) The same senior thesis on the behavior of feeding for the ruby
throated humming bird tracked the number of feedings made by each
of the 15 hummingbirds in an hour. The feedings taken in an hour by
each bird was normally distributed with a mean of 11.2 and a standard
deviation of 1.8. Determine the following probabilities using correct
notation.
Declare the event variable. 1pt
4e.) Find the probability that a randomly drawn hummingbird makes more than 12 feedings in an hour. 8pt
4f.) Find the probability that a randomly drawn hummingbird feeds less than 10 times in an hour. 8pt
4g.) Find the probability that a randomly drawn humming bird feeds anywhere between 9 and 13 times
(inclusive) in a given hour. 8pt
4h.) Given that a randomly drawn hummingbird feeds at the upper 4th percentile of feeding frequency,
how many times does the bird feed in an hour on average? 8pt
5a.) The Colorado pikeminnow (AKA white or Colorado river salmon, Ptychocheilus lucius) is the
largest minnow native to North America, and it is well known for its spectacular fresh water spawning
migrations and homing ability. Despite a massive recovery effort, its numbers decline. Hampered by a loss of
habitat, the young of this once abundant fish are overwhelmed in their nursery habitat by invasive small fishes
(such as red shiner and fathead minnow). Sites sampled from over 75 tributaries of the Colorado river found
that both invasive species, red shiner and fathead minnow, were present in 38% of sampled sites. Given that
55% of the river sites had red shiners present and that 47% of the sampled sites had fathead minnows present,
what portion of the sampled sites had either invasive species present? Use probability notation to express
your answer. 6pts
5b.) What is the probability that a tributary site of the Colorado has fathead minnows but not red shiners?
Use probability notation to express your answer. 6pts
5c.) What is the probability that a tributary site of the Colorado has either both invasive species of fathead
minnows and red shiners present or has neither invasive species present?
Use probability notation to express your answer. 8pts
5d.) Find the probability that more than 6 out of 9 Colorado tributaries have either red shiner or fat head
minnow presence. Declare the variable. Use probability notation to express your answer. 8pts
5e.) Find the probability that at least 8 out of 9 Colorado tributaries have fat head minnow presence. 8pts
5e.) Find the probability that a majority of 9 Colorado tributaries have red shiner presence. 8pts
5f.) Find the first standard deviation window to express the typical range of 9 Colorado tributaries that
have red shiner presence. 8pts
5g.) Describe the difference in meaning between these symbols:
a.) $\bar{x}$ and $\mu$?
b.) $\hat{p}$ and $p$?
c.) $\tilde{x}$ and $\eta$?
6.) The 4 basic categories of human blood types (O, A, B, and AB) are coupled with an Rh factor that is
denoted with a plus (+) for its presence or minus sign (−) for its absence. This Rh factor is found
predominantly in rhesus monkeys, and to varying degree in human populations. For the US population it is
present in 83.3% of the population. Given that 40 people from the United States are randomly drawn, what
is the probability the sample proportion has between 80% to 90% of folks with an Rh + factor for their
blood type? Show all work and use proper notation for full credit. Finish with an English sentence.
14pts
Stat 109
Sample Exam 1B
Solution
612
1.) Phenotypic risk for type 2 diabetes was assigned to 2377 people of European ancestry in the Framingham
Offspring Study. N Engl J Med. Nov 20, 2008; 359(21): 2208–2219. Participants were genotyped for 18 SNPs
(single nucleotide polymorphisms) associated with type 2 diabetes. Based upon the frequency of these 18 SNPs
a phenotypic risk for type 2 diabetes was given to each participant on a scale that ran from a low of 7 to a high
of 27. Three categories of phenotypic risk for type 2 diabetes were based upon one’s phenotypic score: Low at
less than 15, Medium for scores between 16 and 20 inclusive, and High for scores greater than 20. Europeans
over 50 years of age with a low phenotypic score had a 7% incidence of type 2 diabetes. While Europeans over
50 with a high phenotypic score had a 17% incidence of type 2 diabetes. Given that a sub-population of folks
with European ancestry over 50 years of age contains people with only high and low phenotypic scores, 30% of
which were high, with the rest having low scores, and that environmental factors such as diet and exercise have
been controlled, determine the following using correct probability notation.
Declare variables (2pts)
D = Event that a European has Type 2 diabetes.
L = Event that a European has a low SNP score.
H = Event that a European has a high SNP score.
a.) Find the probability that an over 50 year old with European ancestry will have Type 2 diabetes.
$P(D) = P(D \mid L)\,P(L) + P(D \mid H)\,P(H)$
$P(D) = 0.07 \cdot 0.7 + 0.17 \cdot 0.3$
$P(D) = 0.10$
Notation: 2pts
Calculation: 5pts
“About 10% of Europeans over 50 will have Type 2 diabetes.”
b) Find the probability that an over 50 year old with European ancestry will have a low phenotypic
score given that the person has Type 2 diabetes. 7pts
$P(L \mid D) = \frac{P(L \cap D)}{P(D)} = \frac{P(D \mid L)\,P(L)}{P(D)}$
$P(L \mid D) = \frac{0.07 \cdot 0.70}{0.10} = 0.49$
Notation: 2pts
Calculation: 5pts
“About 49% of over 50 year olds with European ancestry with Type 2 diabetes will have a
low phenotypic score.”
c) Is the incidence of type 2 diabetes independent of the phenotypic score for people of European
ancestry? (Again use probability notation and numerical values to support your answer.) 5pts
For independent events the following must be true: $P(A \cap B) = P(A) \cdot P(B)$
We will use the Type 2 diabetes and low phenotypic score events since they are at hand:
Is $P(L \cap D) = P(L) \cdot P(D)$ true?
$P(L \cap D) = P(D \mid L)\,P(L)$ from above
$P(L \cap D) = 0.07 \cdot 0.7$
$P(L \cap D) = 0.049$
$P(D) \cdot P(L) = 0.1 \cdot 0.70$
$P(D) \cdot P(L) = 0.07$
Since $0.049 \ne 0.07$, then $P(L \cap D) \ne P(L) \cdot P(D)$.
Therefore we do not have independent events.
In English:
“The probability of whether an over 50 year old
of European ancestry develops Type 2 diabetes
depends on their phenotypic score.”
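The three parts of Problem 1 (law of total probability, Bayes' rule, and the independence check) can be traced in a few lines. This is a sketch with illustrative variable names, using only the percentages stated in the problem.

```python
import math

p_h = 0.30                 # P(H): high phenotypic score
p_l = 1 - p_h              # P(L): low phenotypic score
p_d_given_l = 0.07         # P(D | L)
p_d_given_h = 0.17         # P(D | H)

# a) Law of total probability
p_d = p_d_given_l * p_l + p_d_given_h * p_h

# b) Bayes' rule: P(L | D) = P(D | L) P(L) / P(D)
p_l_and_d = p_d_given_l * p_l
p_l_given_d = p_l_and_d / p_d

# c) Independence would require P(L and D) == P(L) * P(D)
independent = math.isclose(p_l_and_d, p_l * p_d)

print(round(p_d, 2), round(p_l_given_d, 2), independent)
```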
2.) Instrumental conditioning is a term developed by Edward Thorndike (1898)
in which an animal gradually learns a task through repetition. A Thorndike box
was first used on house cats. The cat must learn how to paw at a series of levers
and latches with several false starts before she can escape from the box. Given
that a large number of individual trials were observed for cats placed in a
Thorndike box, suppose that each cat was repeatedly subjected to a Thorndike box
until her escape from the box could be performed in under a minute. A pdf of the
number of trials required for each cat to reach an escape time of less than a minute
was recorded and is shown below.
x, trials:   4     5     6     7     8     9     10
p(x):        0.1   0.2   0.3   0.0   0.2   0.0   0.2
2a.) Find the expected number of trials required before a house cat can escape from the box in
less than a minute (the mean of x). 5 pts
$\mu = \sum x_i\, p(x_i) = 4(0.1) + 5(0.2) + 6(0.3) + 7(0.0) + 8(0.2) + 9(0.0) + 10(0.2) = 6.8$
1/2 pt for notation
$\mu = 6.8$
2b.) Find the standard deviation of the number of trials required before a house cat can escape
from the box in less than a minute. 8pts
$\sigma = \sqrt{\sum x_i^2\, p(x_i) - \mu^2}$
$\sigma = \sqrt{4^2(0.1) + 5^2(0.2) + 6^2(0.3) + 7^2(0) + 8^2(0.2) + 9^2(0) + 10^2(0.2) - 6.8^2} = \sqrt{3.96} = 1.99$
$\sigma = 1.99$
2c.) Find the typical range of the number of trials a cat needs to escape the Thorndike box in less than
a minute as the first standard deviation of x. (show your work.) 4pts (1 pt for notation)
$\mu \pm \sigma = 6.8 \pm 1.99 = (4.81,\ 8.79)$
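The mean, standard deviation, and one-standard-deviation window for the Thorndike box distribution can be verified with the shortcut formula for the variance. This is a sketch for checking the arithmetic; the list names are illustrative.

```python
import math

trials = [4, 5, 6, 7, 8, 9, 10]
probs = [0.1, 0.2, 0.3, 0.0, 0.2, 0.0, 0.2]

# Mean: sum of x * p(x)
mu = sum(x * p for x, p in zip(trials, probs))

# Variance by the shortcut formula: sum of x^2 * p(x), minus mu^2
variance = sum(x * x * p for x, p in zip(trials, probs)) - mu**2
sigma = math.sqrt(variance)

window = (mu - sigma, mu + sigma)
print(round(mu, 2), round(sigma, 2), tuple(round(v, 2) for v in window))
```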
3.) Kiama Blowhole Eruption Intervals:
An ocean swell produces spectacular eruptions of water through a hole in
the cliff at Kiama, about 120km south of Sydney, Australia, known as the
Blowhole. The times in seconds between each of 64 successive eruptions
from 1340 hours on 12 July 1998 were observed using a digital watch.
3a.) Circle the distribution that best describes the interval data for the
blowhole eruptions:
Skewed Left     Symmetrical     Skewed Right
Times between eruptions in seconds (n = 64; values increase down each column):
7    10   15   18   28   40   60   83
8    10   16   21   29   42   61   83
8    10   17   25   29   47   61   87
8    10   17   25   34   51   68   89
8    11   17   26   35   54   69   91
8    11   18   27   36   55   73   95
9    12   18   27   36   56   77   146
9    14   18   28   37   60   82   169
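3b.) Use correct notation to express the
5 key values for a boxplot for the
interval of time in seconds between
eruptions at the Kiama Blowhole.
11 pts
3b1.) Find the median:
$\tilde{x} = \left(\frac{n+1}{2}\right)^{th} = \left(\frac{64+1}{2}\right)^{th} = \left(\frac{65}{2}\right)^{th} = 32.5^{th}$
$\tilde{x} = 32^{nd} + 0.5\,(33^{rd} - 32^{nd})$
$\tilde{x} = 28 + 0.5\,(28 - 28)$
$\tilde{x} = 28 + 0.5\,(0)$
$\tilde{x} = 28$
3b2.) Determine the Quartile Criterion
If N is divisible by 4, we average:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4} + 1\right]^{th}}{2}$ and $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4} + 1\right]^{th}}{2}$
If N is not divisible by 4:
$Q_1 = \left[\frac{n}{4} + 1\right]^{th}$ and $Q_3 = \left[\frac{3n}{4} + 1\right]^{th}$
Where square brackets indicate that we round any
decimal down to find the nth value in the data set.
N = 64 is divisible by 4 so:
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4} + 1\right]^{th}}{2}$ and $Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4} + 1\right]^{th}}{2}$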
3b3.) Find the 1st and 3rd Quartiles.
$Q_1 = \frac{\left[\frac{n}{4}\right]^{th} + \left[\frac{n}{4} + 1\right]^{th}}{2} = \frac{\left[\frac{64}{4}\right]^{th} + \left[\frac{64}{4} + 1\right]^{th}}{2} = \frac{[16]^{th} + [17]^{th}}{2}$
$Q_1 = \frac{14 + 15}{2}$
$Q_1 = 14.5$
$Q_3 = \frac{\left[\frac{3n}{4}\right]^{th} + \left[\frac{3n}{4} + 1\right]^{th}}{2} = \frac{\left[\frac{3 \cdot 64}{4}\right]^{th} + \left[\frac{3 \cdot 64}{4} + 1\right]^{th}}{2} = \frac{[48]^{th} + [49]^{th}}{2}$
$Q_3 = \frac{60 + 60}{2}$
$Q_3 = 60$
3b4.) Find the IQR and Step.
IQR = Inter-quartile range
IQR = Q3 − Q1
IQR = 60 − 14.5
IQR = 45.5
Step = 1.5 × IQR
Step = 1.5 × 45.5
Step = 68.25
3b5.) Find the Outlier Thresholds
LOT = Q1 − Step
LOT = 14.5 − 68.25
LOT = −53.75
UOT = Q3 + Step
UOT = 60 + 68.25
UOT = 128.25
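The quartile rule used in this course differs from the defaults in most software, so the sketch below implements the rule exactly as stated (average two neighbours when n is divisible by 4, otherwise round down and take one value). The function names are illustrative, and the data are the 64 Kiama Blowhole intervals from the problem.

```python
data = sorted([
    7, 10, 15, 18, 28, 40, 60, 83,
    8, 10, 16, 21, 29, 42, 61, 83,
    8, 10, 17, 25, 29, 47, 61, 87,
    8, 10, 17, 25, 34, 51, 68, 89,
    8, 11, 17, 26, 35, 54, 69, 91,
    8, 11, 18, 27, 36, 55, 73, 95,
    9, 12, 18, 27, 36, 56, 77, 146,
    9, 14, 18, 28, 37, 60, 82, 169,
])

def kth(values, k):
    """k-th smallest value, counting from 1 as in the formula sheet."""
    return values[k - 1]

def course_quartiles(values):
    """Q1 and Q3 by the course rule; `values` must already be sorted."""
    n = len(values)
    if n % 4 == 0:
        q1 = (kth(values, n // 4) + kth(values, n // 4 + 1)) / 2
        q3 = (kth(values, 3 * n // 4) + kth(values, 3 * n // 4 + 1)) / 2
    else:
        q1 = kth(values, n // 4 + 1)          # brackets round down
        q3 = kth(values, (3 * n) // 4 + 1)
    return q1, q3

q1, q3 = course_quartiles(data)
step = 1.5 * (q3 - q1)
lot, uot = q1 - step, q3 + step
outliers = [x for x in data if x < lot or x > uot]
whisker_high = max(x for x in data if x <= uot)
print(q1, q3, step, lot, uot, outliers, whisker_high)
```

The last two values show which points would be drawn as asterisks and where the upper whisker would stop.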
3c.) Draw a boxplot of the Kiama Blowhole data. Properly denote any outliers in the plot.
Note! All data points that lie outside the outlier thresholds are expressed with
asterisks. The boxplot whiskers terminate at the last data point to lie within the
outlier thresholds. It is an error to extend the whiskers to the outlier thresholds.
[Boxplot figure: two outliers, each marked with an asterisk.]
4.) A population of 15 ruby-throated hummingbirds, Archilochus
colubris, was observed in a controlled environment as the subject of a
senior student’s thesis. Of interest was the quantitative feeding behavior of
hummingbirds when a food source is plentiful at numerous sites. Eight
hummingbird feeders were filled with a sugar solution and suspended from
scales that would digitally record the mass of each feeder
after each successive feeding. The difference in the feeder mass between
each successive recording provided a record of the amount of fluid
consumed by a hummingbird at each feeding. The mass
consumed at each feeding was normally distributed with a mean of 0.25 g
and a standard deviation of 0.06 g.
Declare the event variable.
X = grams of sugar consumed by a hummingbird at a single feeding. 1pt
For a vague declaration without units of measure:
X = food. -1pt
X = bird. -1pt
X = sugar. -1pt
Answer the following questions using the event variable with probability
notation and show the equivalent Z score within probability notation.
4a.) At what percentile is a feeding of 0.2 grams? 8pts
$Z = \frac{X - \mu}{\sigma} = \frac{0.2 - 0.25}{0.06} = -0.8333$, then
$P(X \le 0.2) = P(Z \le -0.83) = 0.203269$
Answer like this, with complete notation. Do not answer with a bare number
with a circle drawn around it, such as 0.203269 (-1pt). This number has no meaning
without its supporting notation.
Note that even if the prompt was in percentiles, we answer with probability notation.
4b.) The upper tenth percentile of feeding mass corresponds to what mass? 8pts
Note: This is an upper tail probability and we must know
to use the complement value because the normal distribution
tables only provide for lower tail probabilities.
$P(X \ge ???) = P(Z \ge ???) = 0.10$
a) Search the body of the table for the Z score associated with 0.90.
b) Note that 0.899727 is closer to 0.90 than 0.901475. If you do not see this, try subtracting 0.90 from both
values. Then we take the associated Z-score of 1.28 for 0.899727 as our closest estimate for 0.90.
c) Then $P(Z \ge 1.28) \approx 0.10$
d) Find the X associated with this Z score: $Z = \frac{X - \mu}{\sigma} \;\Rightarrow\; 1.28 = \frac{X - 0.25}{0.06} \;\Rightarrow\; X = 0.3268$
e) Report the answer using probability notation.
Show the transition from the real world X values to the associated Z scores.
$P(X \ge 0.3268) = P(Z \ge 1.28) \approx 0.10$
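Software can go from a percentile back to a feeding mass without a table search. The sketch below uses Python's standard-library `statistics.NormalDist` with the stated mean of 0.25 g and standard deviation of 0.06 g; because it does not round Z to two decimals, its results differ from the table answers in the third or fourth decimal place.

```python
from statistics import NormalDist

feeding = NormalDist(mu=0.25, sigma=0.06)

# 4a) Percentile of a 0.2 g feeding: lower-tail probability P(X <= 0.2)
pct_at_02 = feeding.cdf(0.2)

# 4b) Upper tenth percentile: the mass with 90% of feedings below it
mass_upper_tenth = feeding.inv_cdf(0.90)

print(round(pct_at_02, 4), round(mass_upper_tenth, 4))
```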
4c.) How much mass is consumed if the feeding
occurred at the 72nd percentile of feeding masses? 8pts
Note that the percentiles on the normal curve run from zero to
100% and accumulate from left to right.
[Figure: normal curve showing the percentiles of feedings and their corresponding masses.]
a) Search the body of the table for the Z score associated with 0.72.
b) $P(X \le ???) = P(Z \le ???) = 0.72$
c) We find: $P(Z \le 0.58) \approx 0.72$
d) Using standardization we find x: $Z = \frac{x - \mu}{\sigma} \;\Rightarrow\; 0.58 = \frac{x - 0.25}{0.06} \;\Rightarrow\; x = 0.2848$
e) Report your answer with full probability notation:
$P(X \le 0.2848) = P(Z \le 0.58) \approx 0.72$
4d.) The same senior thesis on the behavior of feeding for the ruby-throated
hummingbird tracked the number of feedings made by each
of the 15 hummingbirds in an hour. The feedings taken in an hour by
each bird were normally distributed with a mean of 11.2 and a standard
deviation of 1.8. Determine the following probabilities using correct
notation.
Declare the event variable. 1pt
N = The number of feedings made by a hummingbird in an hour.
For a vague declaration, -1pt: N = feedings
4e.) Find the probability that a randomly drawn hummingbird makes more than 12 feedings in an hour. 8pt
$P(N > 12) = P(N \ge 13)$
Note that discrete counts must be adjusted for a mapping to the continuous
normal curve. Strict inequalities of “more than” must be converted
numerically to “more than or equal to” if using a continuous distribution.
$P(N > 12) = P(N \ge 13) \approx P(N \ge 12.5)$
$Z = \frac{x - \mu}{\sigma} = \frac{12.5 - 11.2}{1.8} = 0.72$
$P(Z \le 0.72) = 0.764237$
$P(N > 12) \approx 1 - P(Z \le 0.72) = 0.235763$
A continuity correction is necessary here because a
discrete count of visits to a feeder must be mapped to a
continuous curve of the normal distribution. Note that we
always choose a half count in the direction that enlarges
the shaded region beneath the normal curve. Also note
that a complement of one minus the Z-table probability
must be accounted for here.
In English: “About 23.6% of the time a
randomly drawn hummingbird will make more
than 12 feedings in an hour.”
Answer with complete notation like this, and not this:
0.235763
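The two adjustments in 4e (the half-count correction and the complement for an upper tail) are easy to mix up, so here is a small sketch that performs them in order. The helper `phi` is an illustrative name, and Z is rounded to two decimals to imitate the table.

```python
import math

def phi(z):
    """Standard normal lower-tail probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 11.2, 1.8

# "More than 12" means 13 or more; the corrected boundary is 13 - 0.5
boundary = 13 - 0.5
z = round((boundary - mu) / sigma, 2)

# Upper tail = complement of the lower-tail table value
p_more_than_12 = 1.0 - phi(z)
print(z, round(p_more_than_12, 4))
```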
4f.) Find the probability that a randomly drawn hummingbird feeds less than 10 times in an hour. 8pt
$P(N < 10) = P(N \le 9)$
Note that discrete counts must be adjusted for a mapping to the continuous
normal curve. Strict inequalities of “less than” must be converted
numerically to “less than or equal to” if using a continuous distribution.
$P(N < 10) = P(N \le 9) \approx P(N \le 9.5)$
A continuity correction is necessary here because a
discrete count of feedings must be mapped to a
continuous curve of the normal distribution. Note that we
always choose a half count in the direction that enlarges
the shaded region beneath the normal curve.
$Z = \frac{x - \mu}{\sigma} = \frac{9.5 - 11.2}{1.8} = -0.9444$
$P(Z \le -0.94) = 0.173609$
In English: “About 17.4% of the time a
randomly drawn hummingbird will feed less
than 10 times in an hour.”
Answer with complete notation like this:
$P(N < 10) \approx P(Z \le -0.94) = 0.173609$
and not like this:
0.173609
4g.) Find the probability that a randomly drawn hummingbird feeds anywhere between 9 and 13 times
(inclusive) in a given hour. 8pt
$P(9 \le N \le 13)$
Note that discrete counts must be adjusted for a mapping to the continuous normal curve.
Here the inclusive counts (of 9 and 13) are already specified in the problem.
$P(9 \le N \le 13) \approx P(8.5 \le N \le 13.5)$
A continuity correction is still necessary because a
discrete count of visits to the feeder must be mapped to
a continuous curve of the normal distribution. Note
that we always choose a half count in the direction that
enlarges the shaded region beneath the normal curve.
$Z_{Low} = \frac{x - \mu}{\sigma} = \frac{8.5 - 11.2}{1.8} = -1.5$
$Z_{High} = \frac{x - \mu}{\sigma} = \frac{13.5 - 11.2}{1.8} = 1.2778$
$P(9 \le N \le 13) \approx P(-1.50 \le Z \le 1.28)$
$P(9 \le N \le 13) \approx P(Z \le 1.28) - P(Z \le -1.50)$
$P(9 \le N \le 13) \approx 0.899727 - 0.066807$
$P(9 \le N \le 13) \approx 0.83292$
In English: “About 83.3% of the time a
randomly drawn hummingbird will feed
between 9 and 13 times in an hour.”
Answer with complete notation like this:
$P(9 \le N \le 13) \approx P(-1.50 \le Z \le 1.28) = 0.83292$
and not like this:
0.83292
4h.) Given that a randomly drawn hummingbird feeds at the upper 4th percentile of feeding frequency,
how many times does the bird feed in an hour on average? 8pt
$P(N \ge ??) = P(Z \ge ??) = 0.04$
The prompt for an upper 4th percentile
means that we must search the body of
the table for the “lower” 96th percentile as
the Z-tables will only give lower tailed
probabilities.
a) Search the body of the table for the Z
score associated with 0.96.
b) $P(N \le ???) = P(Z \le ???) = 0.96$
c) We find: $P(Z \le 1.75) \approx 0.96$
d) Using standardization we find x. But careful here! We
must provide for the continuity correction, and for an
upper tail this means that we must subtract 0.5 from the
lower threshold to expand the probability space by half a
visit to a hummingbird feeder.
$Z = \frac{x - 0.5 - \mu}{\sigma} \;\Rightarrow\; 1.75 = \frac{x - 0.5 - 11.2}{1.8} = \frac{x - 11.7}{1.8}$
$x = 14.85$
e) Solve for x and report your answer
with full probability notation:
$P(N \ge 14.85) = P(Z \ge 1.75) \approx 0.04$
A hummingbird at the upper 4th percentile will feed an
average of 14.85 times per hour.
5a.) The Colorado pikeminnow (AKA white or Colorado river salmon, Ptychocheilus lucius) is the
largest minnow native to North America, and it is well known for its spectacular fresh water spawning
migrations and homing ability. Despite a massive recovery effort, its numbers decline. Hampered by a loss of
habitat, the young of this once abundant fish are overwhelmed in their nursery habitat by invasive small fishes
(such as red shiner and fathead minnow). Sites sampled from over 75 tributaries of the Colorado river found
that both invasive species, red shiner and fathead minnow, were present in 38% of sampled sites. Given that
55% of the river sites had red shiners present and that 47% of the sampled sites had fathead minnows present,
what portion of the sampled sites had either invasive species present? Use probability notation to express
your answer. 6pts
Declaration and notation: 2pts
R = Red shiners are present in sample.
F = Fatheads are present in sample.
$P(R \cup F) = P(R) + P(F) - P(R \cap F)$
$P(R \cup F) = 0.55 + 0.47 - 0.38$
$P(R \cup F) = 0.64$
5b.) What is the probability that a tributary site of the Colorado has fathead minnows but not red shiners?
Use probability notation to express your answer. 6pts
𝑃(𝐹 ∩ 𝑅̅ ) = 𝑃(𝐹) − 𝑃(𝐹 ∩ 𝑅)
𝑃(𝐹 ∩ 𝑅̅ ) = 0.47 − 0.38
𝑃(𝐹 ∩ 𝑅̅ ) = 0.09
5c.) What is the probability that a tributary site of the Colorado has either both invasive species of fathead
minnows and red shiners present or has neither invasive species present?
Use probability notation to express your answer. 8pts
𝑃((𝐹 ∩ 𝑅) ∪ (𝐹̅ ∩ 𝑅̅ )) = 1 − 𝑃(𝐹) − 𝑃(𝑅) + 2𝑃(𝐹 ∩ 𝑅)
𝑃((𝐹 ∩ 𝑅) ∪ (𝐹̅ ∩ 𝑅̅ )) = 1 − 0.47 − 0.55 + 2 ∙ 0.38
𝑃((𝐹 ∩ 𝑅) ∪ (𝐹̅ ∩ 𝑅̅ )) = 0.74
5d.) Find the probability that more than 6 of 9 Colorado tributaries have either red shiner or fat head
minnow presence.
Declare the variable. Use probability notation to express your answer. 8pts
X = The number of 9 Colorado tributary sites that have either invasive species. 1pt
$P(X > 6) = P(X \ge 7) = \binom{9}{7} 0.64^{7}\, 0.36^{2} + \binom{9}{8} 0.64^{8}\, 0.36^{1} + \binom{9}{9} 0.64^{9}\, 0.36^{0}$
$P(X > 6) = 0.205195 + 0.091198 + 0.018014 = 0.31441$
5e.) Find the probability that at least 8 of 9 Colorado tributaries have fat head minnow presence. 8pts
X = The number of 9 Colorado tributary sites that have fat head minnows. 1pt
$P(X \ge 8) = \binom{9}{8} 0.47^{8}\, 0.53^{1} + \binom{9}{9} 0.47^{9}\, 0.53^{0}$
$P(X \ge 8) = 0.011358 + 0.001119 = 0.012477$
5e.) Find the probability that a majority of 9 Colorado tributaries have red shiner presence. 8pts
X = The number of 9 Colorado tributary sites that have red shiners. 1pt
$P(X \ge 5) = \binom{9}{5} 0.55^{5}\, 0.45^{4} + \binom{9}{6} 0.55^{6}\, 0.45^{3} + \binom{9}{7} 0.55^{7}\, 0.45^{2} + \binom{9}{8} 0.55^{8}\, 0.45^{1} + \binom{9}{9} 0.55^{9}\, 0.45^{0}$
$P(X \ge 5) = 0.260036 + 0.211881 + 0.110986 + 0.033912 + 0.004605 = 0.62142$
5f.) Find the first standard deviation window to express the typical range of 9 Colorado tributaries that
have red shiner presence. 8pts
$np \pm \sqrt{np(1-p)} = 9 \cdot 0.55 \pm \sqrt{9 \cdot 0.55\,(1 - 0.55)} = 4.95 \pm 1.492481 = (3.458,\ 6.442)$ Notation 1pt
“About 3.5 to 6.4 of 9 Colorado tributaries will have red shiner presence.”
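The upper-tail sums and the standard deviation window for the tributary problems can be confirmed numerically. In this sketch `binom_at_least` is an illustrative helper that adds the binomial terms from k up to n; note that the window's upper end works out to 4.95 + 1.4925 ≈ 6.44.

```python
from math import comb, sqrt

def binom_at_least(k, n, p):
    """P(X >= k) for X ~ binomial(n, p), summing the individual terms."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n = 9
p_fathead_8_plus = binom_at_least(8, n, 0.47)    # at least 8 of 9 sites
p_shiner_majority = binom_at_least(5, n, 0.55)   # majority of 9 is 5 or more

# First standard deviation window for red shiner presence
mean = n * 0.55
sd = sqrt(n * 0.55 * (1 - 0.55))
window = (mean - sd, mean + sd)

print(round(p_fathead_8_plus, 6), round(p_shiner_majority, 5))
print(tuple(round(v, 3) for v in window))
```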
5g.) Describe the difference in meaning between these symbols:
a.) $\bar{x}$ and $\mu$?
b.) $\hat{p}$ and $p$?
c.) $\tilde{x}$ and $\eta$?
a.) $\bar{x}$ denotes the sample mean while $\mu$ denotes the population mean (or true mean).
b.) $\hat{p}$ denotes the sample proportion while $p$ denotes the population proportion (or true proportion).
c.) $\tilde{x}$ denotes the sample median while $\eta$ denotes the population median (or true median).
6.) The 4 basic categories of human blood types (O, A, B, and AB) are coupled with an Rh factor that is
denoted with a plus (+) for its presence or minus sign (−) for its absence. This Rh factor is found
predominantly in rhesus monkeys, and to varying degree in human populations. For the US population it is
present in 83.3% of the population. Given that 40 people from the United States are randomly drawn, what
is the probability the sample proportion has between 80% to 90% of folks with an Rh + factor for their
blood type? Show all work and use proper notation for full credit. Finish with an English sentence.
14pts
Method 1: Declare proportions with the continuity correction.
$\frac{Y}{n} = \hat{p} \;\rightarrow\; \hat{p} \pm \frac{0.5}{n}$
Declare the parameter:
$\hat{p}$ = proportion of 40 randomly drawn Americans with Rh+ blood types. 2pt
-2pt for all of these inadequate declarations:
x = USA Rh+ blood
x = Rh+ blood types
x = Proportion of USA with Rh+
p = proportion of USA with Rh+
If you are going to use calculations that determine the probability
that a sample proportion will range between some specified
thresholds, then this is the variable, $\hat{p}$, that we must declare. Do
not use “x” or “y” unless you are declaring for counts and plan
to calculate accordingly (see Method 2). Be sure to include
the sample size as this will affect probability.
$P(0.8 \le \hat{p} \le 0.9) \approx P\left(0.80 - \frac{0.5}{40} \le \hat{p} \le 0.90 + \frac{0.5}{40}\right)$
$P(0.8 \le \hat{p} \le 0.9) \approx P(0.7875 \le \hat{p} \le 0.9125)$
Convert proportions to Z scores. Use:
$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
$P(0.8 \le \hat{p} \le 0.9) \approx P\left(\frac{0.7875 - 0.833}{\sqrt{\frac{0.833(1 - 0.833)}{40}}} \le Z \le \frac{0.9125 - 0.833}{\sqrt{\frac{0.833(1 - 0.833)}{40}}}\right)$
$P(0.8 \le \hat{p} \le 0.9) \approx P(-0.7715 \le Z \le 1.3481)$
$P(0.8 \le \hat{p} \le 0.9) \approx P(Z \le 1.35) - P(Z \le -0.77)$
$P(0.8 \le \hat{p} \le 0.9) \approx 0.911492 - 0.220650$
$P(0.8 \le \hat{p} \le 0.9) \approx 0.690842$
Answer in English: “There is about a
69.1% chance that between 80% and
90% of 40 randomly drawn Americans
will have Rh+ blood types.”
Problem #6) Method 2: Convert proportions to counts, then use the continuity correction.
$\hat{p} = \frac{Y}{n} \;\rightarrow\; Y = \hat{p} \cdot n$
Declare the parameter: Y = number of 40 randomly drawn Americans with Rh+ blood types. 2pt
x = Rh+ blood
x = American Rh+ blood
x = Number of Americans
with Rh+ blood.
P = proportion of Rh+ blood.
-2pt for all of these inadequate declarations.
If you are going to use calculations that determine the probability
that a sample count will range between some specified thresholds,
then this is the variable that we must declare. Be sure to include
the sample size as this will affect probability.
𝑃(0.8 ≤ 𝑝̂ ≤ 0.9) = 𝑃(0.8 ∙ 40 ≤ 𝑝̂ ∙ 𝑛 ≤ 0.9 ∙ 40) = 𝑃(32 ≤ 𝑌 ≤ 36)
𝑃(32 ≤ 𝑌 ≤ 36) ≈ 𝑃(32 − 0.5 ≤ 𝑌 ≤ 36 + 0.5) = 𝑃(31.5 ≤ 𝑌 ≤ 36.5)
After applying the continuity correction we convert the counts of
Americans with Rh+ blood types to Z-scores using the binomial
expressions for the mean and standard deviation: 𝜇 = 𝑛 ∙ 𝑝 and 𝜎 =
√𝑛 ∙ 𝑝(1 − 𝑝). Note that p = 0.833.
$Z = \frac{Y - \mu}{\sigma} = \frac{Y - n \cdot p}{\sqrt{n \cdot p(1 - p)}}$
$P(31.5 \le Y \le 36.5) = P\left(\frac{31.5 - 40(0.833)}{\sqrt{40(0.833)(1 - 0.833)}} \le Z \le \frac{36.5 - 40(0.833)}{\sqrt{40(0.833)(1 - 0.833)}}\right)$
$P(31.5 \le Y \le 36.5) = P(-0.7715 \le Z \le 1.3481) \approx P(-0.77 \le Z \le 1.35)$
We round to the nearest hundredth for the Z-table probability values.
$P(31.5 \le Y \le 36.5) \approx P(Z \le 1.35) - P(Z \le -0.77)$
$P(31.5 \le Y \le 36.5) \approx 0.911492 - 0.220650$
$P(31.5 \le Y \le 36.5) \approx 0.690842$
Answer in English: “There is about a 69.1% chance that between 80% and 90% of 40 randomly drawn Americans
will have Rh+ blood types.”
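The count-based calculation can be checked with a short script. This is only a sketch: the helper names phi, normal_approx, and exact_binomial are made up for illustration, and the exact binomial sum is included purely as a comparison, since the exam itself only asks for the normal approximation.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_approx(n, p, lo, hi):
    """P(lo <= Y <= hi) for Y ~ binomial(n, p), using the normal curve
    with the 0.5 continuity correction applied to both bounds."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return phi((hi + 0.5 - mu) / sigma) - phi((lo - 0.5 - mu) / sigma)

def exact_binomial(n, p, lo, hi):
    """P(lo <= Y <= hi) summed directly from the binomial formula."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(lo, hi + 1))

# Problem 6: n = 40 draws, p = 0.833, counts from 32 to 36
approx = normal_approx(40, 0.833, 32, 36)
exact = exact_binomial(40, 0.833, 32, 36)
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial sum:   {exact:.4f}")
```

The approximation printed here uses unrounded Z-scores, so it can differ slightly in the later decimals from the table-based value, which rounds Z to the nearest hundredth first.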
Stat 109
Formula Sheet For Exam 2
Name____________ 625
List of Formulas
$\alpha$ = P(Type 1 error) = P(reject $H_0$ when $H_0$ is true)
$\beta$ = P(Type 2 error) = P(do not reject $H_0$ when $H_A$ is true)
Power = 1 - P(Type 2 error) = P(reject $H_0$ when $H_A$ is true)

Binomial Distribution:
$P(Y = j) = \binom{n}{j} p^j (1 - p)^{n - j}$

CI on proportion:
$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ where $\hat{p} = \frac{Y}{n}$
$\tilde{p} \pm Z_{\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$ where $\tilde{p} = \frac{Y + 2}{n + 4}$
Optimal N: $n = \frac{(Z_{\alpha/2})^2}{4E^2}$

One Sample Proportion test:
$Z_{Sample} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
Ample Sample: $n p_0 (1 - p_0) > 10$

Independent: Two-Sample t-test:
$t_{Sample} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$
$CI = (\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Dependent: One Sample t-test and Paired t-test:
$t_{Sample} = \frac{\bar{y} - \mu_0}{s_d / \sqrt{n}}$
$CI = \bar{y} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}}$
When $\sigma$ is known: $CI = \bar{y} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Optimal N: $n = \left(\frac{Z_{\alpha/2}\, \sigma}{E}\right)^2$

Wilcoxon-Mann-Whitney test:
$U_{Sample} = \max(K_1, K_2)$
$K_1$ = Count of values in sample 2 < sample 1.
$K_2$ = Count of values in sample 1 < sample 2.

Wilcoxon’s Rank-Sum test:
n = size of the group with the larger rank-sum
n’ = size of the group with the smaller rank-sum
$U_{Sample} = \text{larger rank-sum} - \frac{n(n + 1)}{2}$

Sign test:
$B_{Sample} = \max(N_+, N_-)$ for 2-tailed
$B_{Sample} = N_+$ for upper tailed test
$B_{Sample} = N_-$ for lower tailed test

Wilcoxon’s Signed-Rank test:
2-tailed: $W_{Sample} = \max(\text{sum of - ranks}, \text{sum of + ranks})$
Lower tailed test: $W_{Sample}$ = sum of - ranks
Upper tailed test: $W_{Sample}$ = sum of + ranks
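As an illustration of the two confidence intervals on a proportion listed above, here is a minimal sketch. The counts Y = 33 and n = 40 are hypothetical values chosen only for the example, the function names are made up, and the critical value 1.960 is the two-tailed .05 entry of Table 3B.

```python
from math import sqrt

def wald_ci(y, n, z):
    """p-hat +/- z * sqrt(p-hat(1 - p-hat)/n), with p-hat = Y/n."""
    p_hat = y / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def plus_four_ci(y, n, z):
    """p-tilde +/- z * sqrt(p-tilde(1 - p-tilde)/(n + 4)), with p-tilde = (Y + 2)/(n + 4)."""
    p_tilde = (y + 2) / (n + 4)
    margin = z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin, p_tilde + margin

# Hypothetical data: 33 successes in 40 trials, 95% confidence
y, n, z = 33, 40, 1.960
wald = wald_ci(y, n, z)
plus4 = plus_four_ci(y, n, z)
print(f"p-hat interval:   ({wald[0]:.4f}, {wald[1]:.4f})")
print(f"p-tilde interval: ({plus4[0]:.4f}, {plus4[1]:.4f})")
```

Both intervals are reported as (lower bound, upper bound), and both take a two-tailed critical value.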
Table 4: Test of hypothesis t-Table. Assume: Normal Distribution
Level of Significance for one-tailed test
df      0.20      .10       .05       .025      .01       .005      .0005
Level of Significance for two-tailed test
        0.40      .20       .10       .05       .02       .01       .001
1       1.37638   3.07768   6.31375   12.7062   31.8206   63.6570   636.607
2       1.06066   1.88562   2.91999   4.3027    6.9646    9.9248    31.598
3       0.97847   1.63778   2.35341   3.1825    4.5407    5.8410    12.924
4       0.94097   1.53320   2.13184   2.7764    3.7470    4.6041    8.610
5       0.91954   1.47589   2.01505   2.5706    3.3649    4.0321    6.869
6       0.90570   1.43977   1.94317   2.4469    3.1427    3.7075    5.959
7       0.89603   1.41493   1.89456   2.3646    2.9980    3.4995    5.408
8       0.88889   1.39685   1.85953   2.3060    2.8965    3.3554    5.041
9       0.88340   1.38303   1.83313   2.2622    2.8215    3.2498    4.781
10      0.87906   1.37215   1.81244   2.2281    2.7638    3.1693    4.587
11      0.87553   1.36342   1.79588   2.2010    2.7181    3.1058    4.437
12      0.87261   1.35621   1.78228   2.1788    2.6810    3.0545    4.318
13      0.87015   1.35019   1.77094   2.1604    2.6503    3.0123    4.221
14      0.86806   1.34502   1.76133   2.1448    2.6245    2.9768    4.140
15      0.86625   1.34060   1.75307   2.1315    2.6025    2.9467    4.073
16      0.86467   1.33677   1.74587   2.1199    2.5835    2.9208    4.015
17      0.86328   1.33338   1.73962   2.1098    2.5669    2.8982    3.965
18      0.86205   1.33036   1.73407   2.1009    2.5524    2.8784    3.922
19      0.86095   1.32775   1.72911   2.0930    2.5395    2.8610    3.883
20      0.85996   1.32533   1.72474   2.0860    2.5280    2.8453    3.849
21      0.85907   1.32320   1.72074   2.0796    2.5176    2.8314    3.819
22      0.85827   1.32125   1.71715   2.0739    2.5083    2.8187    3.792
23      0.85753   1.31944   1.71389   2.0687    2.4999    2.8074    3.768
24      0.85686   1.31783   1.71087   2.0639    2.4921    2.7969    3.745
25      0.85624   1.31636   1.70813   2.0595    2.4851    2.7874    3.725
26      0.85567   1.31497   1.70563   2.0556    2.4786    2.7787    3.707
27      0.85514   1.31369   1.70326   2.0519    2.4727    2.7707    3.690
28      0.85465   1.31253   1.70112   2.0484    2.4671    2.7633    3.674
29      0.85419   1.31142   1.69911   2.0452    2.4620    2.7564    3.659
30      0.85377   1.31038   1.69724   2.0423    2.4573    2.7500    3.646
40      0.85070   1.30308   1.68386   2.0211    2.4232    2.7045    3.551
50      0.84887   1.29868   1.67589   2.0085    2.4033    2.6778    3.496
60      0.84765   1.29581   1.67065   2.0003    2.3902    2.6604    3.460
70      0.84679   1.29376   1.66692   1.9944    2.3808    2.6480    3.435
80      0.84614   1.29222   1.66413   1.9901    2.3739    2.6387    3.416
90      0.84563   1.29103   1.66196   1.9867    2.3685    2.6316    3.402
100     0.84523   1.29007   1.66024   1.9840    2.3642    2.6259    3.391
1000    0.841981  1.28200   1.64600   1.9620    2.3300    2.5810    3.300
∞       0.841621  1.28155   1.64485   1.9600    2.3264    2.5758    3.291

Table 3B: Test of Hypothesis z-Table.
One-tailed   .10     .05     .025    .01     .005    .001    .0005   .00005  .000005
Two-tailed   .20     .10     .05     .02     .01     .002    .001    .0001   .00001
z            1.282   1.645   1.960   2.326   2.576   3.090   3.291   3.891   4.491

Table 9A Critical Chi-Squared values
Level of Significance
df    0.20    0.10    0.05    0.02    0.01    0.001   0.0001
1     1.64    2.71    3.84    5.41    6.63    10.83   15.14
2     3.22    4.61    5.99    7.82    9.21    13.82   18.42
3     4.64    6.25    7.81    9.84    11.34   16.27   21.11
4     5.99    7.78    9.49    11.67   13.28   18.47   23.51
5     7.29    9.24    11.07   13.39   15.09   20.51   25.74
6     8.56    10.64   12.59   15.03   16.81   22.46   27.86
7     9.80    12.02   14.07   16.62   18.48   24.32   29.88
8     11.03   13.36   15.51   18.17   20.09   26.12   31.83
9     12.24   14.68   16.92   19.68   21.67   27.88   33.72
10    13.44   15.99   18.31   21.16   23.21   29.59   35.56
Stat 109:
Nonparametric Critical values for paired medians
627
Table 7: B-Critical Values for the Sign Test: Note that because the null distribution is discrete, the
actual tail probability corresponding to a given critical value is typically somewhat less than the
column heading. Accordingly, if BSample = BCritical, bracket the p-value to the right of any matching
sample and critical values.
Nominal Tail Probability
2 tail:   .20    .10    .05    .02    .01    .002   .001
1 tail:   .10    .05    .025   .01    .005   .001   .0005
nd
1
2
3
4
5         5      5
6         6      6      6
7         6      7      7      7
8         7      7      8      8      8
9         7      8      8      9      9
10        8      9      9      10     10     10
11        9      9      10     10     11     11     11
12        9      10     10     11     11     12     12
13        10     10     11     12     12     13     13
14        10     11     12     12     13     13     14
15        11     12     12     13     13     14     14

Table 8: W-Critical Values for the Wilcoxon Signed-rank Test: Note that because the null
distribution is discrete, the actual tail probability corresponding to a given critical value is typically
somewhat less than the column heading. Accordingly, if WSample = WCritical, bracket the p-value to the
right of any matching sample and critical values.
Nominal Tail Probability
2 tail:   .20    .10    .05    .02    .01    .002   .001
1 tail:   .10    .05    .025   .01    .005   .001   .0005
nd
1
2
3
4         10
5         13     15
6         18     19     21
7         23     25     26     28
8         28     31     33     35     36
9         35     37     40     42     44
10        41     45     47     50     52     55
11        49     53     56     59     61     65     66
12        57     61     65     69     71     76     77
13        65     70     74     79     82     87     89
14        75     80     84     90     93     99     101
15        84     90     95     101    105    112    114
Stat 109:
Nonparametric Critical values for independent medians
628
Stat 109
Chi-square Critical Values for Tail End Curve Areas
630
Stat 109
Exam 2A
Sample Exam
631
1.) In a study of the development of the thymus gland, researchers weighed the
glands of ten chick embryos. Five of the embryos had been incubated for 14 days and
another five had been incubated for 15 days. The thymus gland weights in mg for both
groups are shown in the table.
14 days       15 days       Difference
29.6          32.7          -3.10
21.5          40.3          -18.8
28.0          23.7          4.3
34.6          25.2          9.4
44.9          24.2          20.7
n1 = 5        n2 = 5        nd = 5
ȳ1 = 31.72    ȳ2 = 29.22    ȳd = 2.5
s1 = 8.73     s2 = 7.19     sd = 14.72
1a.) Run an appropriate 7 step hypothesis test to determine if the mean thymus gland weight at day 14 is
significantly different than the weight at day 15.
1b.) Note that the chicks that were incubated longer had a smaller mean thymus gland weight.
Is this “reverse” result surprising, or could it easily be attributed to chance?
Explain why or why not.
2.) Suppose we have conducted a t test, with α = 0.05, and p-value = 0.08.
Determine whether the following statements are true or false.
2a) We reject H0 when α = 0.05
2b) We would reject H0 if α = 0.10
2c) If H0 is true, then the probability of getting a sample-t at least
as extreme as the one generated by our sample data is 8%.
2d) Suppose the null hypothesis was rejected. Then a type 2 error
might have been made in the hypothesis conclusion.
3.) In a pharmacological study, researchers measured the
concentration of the brain chemical dopamine in ng/g in six rats
exposed to toluene and six control rats. The concentrations in the
striatum region of the brain for both groups are shown in the table.
Suppose that at least one of the data sets is not normally
distributed. Run an appropriate 7 step hypothesis test to
determine if the dopamine level found in rats exposed to toluene is
greater than that found in the controls.
Toluene    Control    Difference
3420       1820       1600
2314       1843       471
1911       1397       514
2464       1803       661
2781       2593       188
2803       1990       813
4.) In an experiment to compare two diets for fattening
beef steers, nine pairs of steers were chosen from the herd;
members of each pair were matched as closely as possible with
respect to age and hereditary factors. The members of each pair
were randomly allocated, one to each diet. The following table
shows the weight gain in pounds of the animals over a 140 day
test period on diet 1 and diet 2.
        Diet 1    Diet 2    Difference
        596       498       98
        422       460       -38
        524       468       56
        454       458       -4
        538       530       8
        552       482       70
        478       528       -50
        564       598       -34
        556       456       100
Mean    520.4     497.6     22.9
SD      57.1      47.3      59.3
4a.) Assume that all three columns are normally distributed.
Run an appropriate 7 step hypothesis test at α = 0.10 to
determine whether there is a significant difference in the
weight gained between the two diets.
4b.) Assume that the column data of the steer diet comparisons are not normally distributed.
Run an appropriate 7 step hypothesis test at α = 0.01 to determine whether there is a
significant difference in the weight gained between the two diets.
4c.) Calculate the exact p-value of the data as determined by the sign test.
Stat 109
Exam 2A Solutions
Sample Exam
634
1.) In a study of the development of the thymus gland, researchers weighed the glands of ten
chick embryos. Five of the embryos had been incubated for 14 days and another five had been
incubated for 15 days. The thymus gland weights in mg for both groups are shown in the table.
14 days       15 days       Difference
29.6          32.7          -3.10
21.5          40.3          -18.8
28.0          23.7          4.3
34.6          25.2          9.4
44.9          24.2          20.7
n1 = 5        n2 = 5        nd = 5
ȳ1 = 31.72    ȳ2 = 29.22    ȳd = 2.5
s1 = 8.73     s2 = 7.19     sd = 14.72
1a.) Run an appropriate 7 step hypothesis test to determine if the mean thymus gland weight at day 14 is
significantly different than the weight at day 15.
Total 24 pts:
1pt: Step 1: Declare normality: “We assume normality.”
(For all summarized data, plus a t-test is asked for.)
2pts: Step 2: Declare parameter:
The data sets are independent, the chick embryos were from different data sets and they
are not paired even if the differences between them were provided. We declare 2 means.
$\mu_1$ = Mean thymus gland weight in mg of chicks at 14 days of incubation.
$\mu_2$ = Mean thymus gland weight in mg of chicks at 15 days of incubation.
3pt: Step 3: Declare the Hypothesis and LOS:
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \ne \mu_2$
$\alpha = 0.05$
3pt: Step 4: Calculate the df.
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}} = \frac{\left(\frac{8.73^2}{5} + \frac{7.19^2}{5}\right)^2}{\frac{\left(8.73^2 / 5\right)^2}{5 - 1} + \frac{\left(7.19^2 / 5\right)^2}{5 - 1}}$
df = 7.7165, but we round down to df = 7.
2pts: Step 5: Find the Critical value and draw the decision line.
Reject Ho | Do Not Reject Ho | Reject Ho
tCritical = -2.3646 and tCritical = +2.3646
3pts: Step 6: Calculate the T.S.
$t_{Sample} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{31.72 - 29.22}{\sqrt{\frac{8.73^2}{5} + \frac{7.19^2}{5}}}$
tSample = 0.49428
10pts: Step 7: Stats conclusion:
2pts: At the 5% LOS we do not reject $H_0: \mu_1 = \mu_2$, because:
3pts: $-t_{Critical} < t_{Sample} < t_{Critical}$ which yields: -2.3646 < 0.49428 < 2.3646
3pts: $p > \alpha$: (p > 0.40) > 0.05
2pts: English conclusion:
There is not a significant difference in the weight of the
chick thymus gland between day 14 and day 15.
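The df and test statistic from Steps 4 and 6 can be reproduced from the summary values in the table. The sketch below is illustrative only; the function name welch_t is an assumption, not course notation.

```python
from math import floor, sqrt

def welch_t(y1, s1, n1, y2, s2, n2):
    """Two-sample t statistic and its df, following the formula sheet."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    t = (y1 - y2) / sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary values from the thymus gland table
t_sample, df = welch_t(31.72, 8.73, 5, 29.22, 7.19, 5)
print(f"t_sample = {t_sample:.5f}")
print(f"df = {df:.4f}, rounded down to {floor(df)}")
```

Typing the whole df expression in one pass like this avoids the round-off that creeps in when intermediate results are copied by hand.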
1b.) Note that the chicks that were incubated longer had a smaller mean thymus gland weight.
Is this “reverse” result surprising, or could it easily be attributed to chance?
Explain why or why not.
2pts: The reverse result can easily be attributed to chance; a p-value > 0.40 tells us that there is no
significant difference between the mean weights of the two samples.
2.) Suppose we have conducted a t test, with α = 0.05, and p-value = 0.08.
Determine whether the following statements are true or false.
2a) We reject H0 when α = 0.05. False 1pt
2b) We would reject H0 if α = 0.10. True 1pt
(We reject H0 whenever p < α)
2c) If H0 is true, then the probability of getting a sample-t at least
as extreme as the one generated by our sample data is 8%. True 1pt
(This is the basic description of the p-value)
2d) Suppose the null hypothesis was rejected. Then a type 2 error
might have been made in the hypothesis conclusion. False 1pt
(A type 1 error occurs when we incorrectly reject H0)
(A type 2 error occurs when we incorrectly fail to reject H0)
3.) In a pharmacological study, researchers measured the
concentration of the brain chemical dopamine in ng/g in six rats
exposed to toluene and six control rats. The concentrations in the
striatum region of the brain for both groups are shown in the table.
Suppose that at least one of the data sets is not normally
distributed. Run an appropriate 7 step hypothesis test to
determine if the dopamine level found in rats exposed to toluene is
greater than that found in the controls.
Toluene    Control    Difference
3420       1820       1600
2314       1843       471
1911       1397       514
2464       1803       661
2781       2593       188
2803       1990       813
Solution: Since normality fails we are performing a hypothesis test comparing the median
dopamine levels in ng/g between toluene exposed rats and controls.
We apply the Wilcoxon-Mann-Whitney test with ordered counts:
Total 23 pts:
Step 1: Declare normality: We assume the data set is not normally distributed.
2pts: Step 2: Declare Parameters:
$\eta_1$ = Median dopamine level in ng/g for rats exposed to toluene.
$\eta_2$ = Median dopamine level in ng/g for the control rats.
3pts: Step 3: Declare the Hypothesis and the LOS.
$H_0: \eta_1 = \eta_2$
$H_A: \eta_1 > \eta_2$
$\alpha = 0.05$
#3 Continued..
3pts: Step 4: Draw the decision line and find UCritical:
Do Not Reject Ho | Reject Ho
$U_{Critical} = 29$
5pts: Step 5: Calculate the T.S.
Make an ordered count of the grouped data:
Toluene: 1911 2314 2464 2781 2803 3420
Control: 1397 1803 1820 1843 1990 2593
Sum both counts:
$K_1 = 4 + 5 + 5 + 6 + 6 + 6 = 32$ (for each toluene value, the number of control values below it)
$K_2 = 0 + 0 + 0 + 0 + 1 + 3 = 4$ (for each control value, the number of toluene values below it)
Take the larger count as the $U_{Sample}$:
t.s. = $U_{Sample} = \max(K_1, K_2) = 32$
Step 6: Bracket the p-value:
Go to Table 6 (Wilcoxon-Mann-Whitney Statistic) to find the $U_{Critical}$ and p-values.
Using n = larger sample size and n’ = smaller sample size, we have n = 6 and n’ = 6.
A 1-tailed α = 0.05 yields:
$U_{Sample} = 32$
$U_{Critical} = 29$
0.01 < p < 0.025
8pts: Step 7: Stats conclusion:
2pts: At the 5% LOS we reject $H_0: \eta_1 = \eta_2$, because:
3pts: $U_{Sample} > U_{Critical}$ which yields: 32 > 29
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.05
2pts: English conclusion:
The median dopamine level in ng/g of toluene exposed
rats is significantly greater than that found in control rats.
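The ordered counts in Step 5 are easy to miscount by hand, so a small check can help. This sketch assumes no tied values (true for this data set), and the helper name count_below is hypothetical.

```python
def count_below(sample, other):
    """For each value in `sample`, count the values in `other` that are
    smaller, then total the counts (assumes no ties between groups)."""
    return sum(sum(1 for o in other if o < v) for v in sample)

toluene = [3420, 2314, 1911, 2464, 2781, 2803]
control = [1820, 1843, 1397, 1803, 2593, 1990]

k1 = count_below(toluene, control)  # control values below each toluene value
k2 = count_below(control, toluene)  # toluene values below each control value
u_sample = max(k1, k2)
print(f"K1 = {k1}, K2 = {k2}, U_sample = {u_sample}")
```

A quick sanity check is that K1 + K2 equals n times n’ (here 6 x 6 = 36) whenever there are no ties.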
4.) In an experiment to compare two diets for fattening
beef steers, nine pairs of steers were chosen from the herd;
members of each pair were matched as closely as possible with
respect to age and hereditary factors. The members of each pair
were randomly allocated, one to each diet. The following table
shows the weight gain in pounds of the animals over a 140 day
test period on diet 1 and diet 2.
        Diet 1    Diet 2    Difference
        596       498       98
        422       460       -38
        524       468       56
        454       458       -4
        538       530       8
        552       482       70
        478       528       -50
        564       598       -34
        556       456       100
Mean    520.4     497.6     22.9
SD      57.1      47.3      59.3
4a.) Assume that all three columns are normally distributed.
Run an appropriate 7 step hypothesis test at α = 0.10 to
determine whether there is a significant difference in the
weight gained between the two diets.
Solution to #4a) Total 21 pts:
Step 1: Declare normality:
“We assume normality.”
Step 2: Declare parameter:
The data sets are paired for comparison and are therefore dependent.
We declare a mean difference.
2pts: $\mu_d$ = Mean difference of the weight gained in pounds between 2 diets: Diet 1 - Diet 2.
3pt: Step 3: Declare the LOS and Hypothesis:
$H_0: \mu_d = 0$
$H_A: \mu_d \ne 0$
$\alpha = 0.10$
3pts: Step 4: Draw the decision line and find tCritical:
Reject Ho | Do Not Reject Ho | Reject Ho
$t_{Critical} = -1.860$ and $t_{Critical} = +1.860$
4pts: Step 5: Calculate the T.S.
$t_{Sample} = \frac{\bar{x} - \mu_0}{s_d / \sqrt{n}} = \frac{22.9 - 0}{59.3 / \sqrt{9}} = 1.1585$
Step 6: Bracket the p-value.
From a 2-tailed α = 0.10 with df = $n_d - 1 = 8$ we find:
p-value: (0.20 < p < 0.40)
7pts: Step 7: Stats conclusion:
1pts: At the 10% LOS we do not reject $H_0: \mu_d = 0$, because:
3pts: $-t_{Critical} < t_{Sample} < t_{Critical}$ which yields: -1.860 < 1.1585 < 1.860
3pts: $p > \alpha$: (0.20 < p < 0.40) > 0.10
2pts: English conclusion:
There is not a significant difference in the weight gained
in pounds between the steers on diet 1 and diet 2.
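For dependent data the whole test runs on the single column of differences. As a rough check of Step 5, the sketch below recomputes the mean, standard deviation, and t statistic straight from the Diet 1 - Diet 2 column using Python's statistics module; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Column of differences (Diet 1 - Diet 2) from the table
diffs = [98, -38, 56, -4, 8, 70, -50, -34, 100]

n_d = len(diffs)
d_bar = mean(diffs)      # sample mean of the differences
s_d = stdev(diffs)       # sample SD (n - 1 in the denominator)
t_sample = (d_bar - 0) / (s_d / sqrt(n_d))

print(f"n_d = {n_d}, mean = {d_bar:.1f}, s_d = {s_d:.1f}")
print(f"t_sample = {t_sample:.4f}, df = {n_d - 1}")
```

The t-table lookup in Step 6 then uses df = n_d - 1 = 8.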
4b.) Assume that the column data of the steer diet comparisons are not normally distributed.
Run an appropriate 7 step hypothesis test at α = 0.01 to determine whether there is a
significant difference in the weight gained between the two diets.
Solution to #4b) Total 21 pts:
Step 1: Declare normality:
“We assume the data is not normally distributed.”
Step 2: Declare parameter:
The data sets are paired for comparison and are therefore dependent.
We declare a median difference.
2pts: $\eta_d$ = Median difference of the weight gained in pounds between 2 diets: Diet 1 - Diet 2.
3pt: Step 3: Declare hypothesis and the LOS:
$H_0: \eta_d = 0$
$H_A: \eta_d \ne 0$
$\alpha = 0.01$
We use a sign test (Table 7 page 684) for the differences of medians.
3pts: Step 4: Draw the decision line and find BCritical:
Do Not Reject Ho | Reject Ho
$B_{Critical} = 9$
3pts: Step 5: Calculate the T.S.
Differences: 98, -38, 56, -4, 8, 70, -50, -34, 100
$N_+ = 5$, $N_- = 4$
$B_{Sample} = \max(N_+, N_-)$
$B_{Sample} = 5$
Step 6: Bracket the p-value.
From Table 7, a 2-tailed α = 0.01 with $n_d = 9$:
p-value: (p > 0.20)
8pts: Step 7: Stats conclusion:
2pts: At the 1% LOS we do not reject $H_0: \eta_d = 0$, because:
3pts: $B_{Sample} < B_{Critical}$ which yields: 5 < 9
3pts: $p > \alpha$: (p > 0.20) > 0.01
2pts: English conclusion:
There is not a significant difference in the weight gained
in pounds between the steers on diet 1 and diet 2.
4c.) 5pts: Calculate the exact p-value of the data as determined by the sign test.
We calculate the p-value under the assumption that the null hypothesis is correct, that is, that
positive and negative signs are equally likely in the median difference data set, which means that
either sign has a 50% chance of occurrence. We take the larger count of signs as a threshold value (there
were 5 positive values) and calculate upward to the most extreme case (n = 9) within a sum of binomial
probabilities. Note that if there were a zero difference, the zero would be thrown out and n = 8. Also, all
2-tailed hypotheses must have these probabilities doubled.
Then for a one-tailed hypothesis the calculation is:
$P(Y \ge 5) = \binom{9}{5} 0.5^5\, 0.5^4 + \binom{9}{6} 0.5^6\, 0.5^3 + \binom{9}{7} 0.5^7\, 0.5^2 + \binom{9}{8} 0.5^8\, 0.5^1 + \binom{9}{9} 0.5^9\, 0.5^0 = 0.5$
Recognize that since success and failure both equal 50%,
the binomial calculation can be reduced to:
$P(Y \ge 5) = \left[\binom{9}{5} + \binom{9}{6} + \binom{9}{7} + \binom{9}{8} + \binom{9}{9}\right] 0.5^9 = 0.5$
See the last page of Week 9 Day 2 lecture notes for more examples of the
calculation of the exact p-value.
Because this is a 2-tailed hypothesis: p-value = $2 \cdot P(Y \ge 5) = 2(0.5) = 1$
p = 1 (p > 0.999)
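The exact sign-test p-value above can be sketched in a few lines. The function name sign_test_p is hypothetical, and the cap at 1 for the doubled tail is an added safeguard rather than something stated in the solution.

```python
from math import comb

def sign_test_p(diffs, two_tailed=True):
    """Exact sign-test p-value: drop zero differences, take the larger
    sign count B, and sum binomial(n, 0.5) probabilities from B up to n."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    n_plus = sum(1 for d in nonzero if d > 0)
    b = max(n_plus, n - n_plus)
    one_tail = sum(comb(n, k) for k in range(b, n + 1)) * 0.5**n
    # Double for a 2-tailed test; a probability cannot exceed 1
    return min(1.0, 2 * one_tail) if two_tailed else one_tail

diffs = [98, -38, 56, -4, 8, 70, -50, -34, 100]
p_one = sign_test_p(diffs, two_tailed=False)
p_two = sign_test_p(diffs)
print(f"one-tailed P(Y >= 5) = {p_one}")
print(f"two-tailed p-value   = {p_two}")
```

Because p = 0.5 for both signs, every term shares the factor 0.5^n, which is why the sum of binomial coefficients can be computed first and multiplied once.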
Stat 109
Exam 2 Prep
A Brief Note to Students who are Struggling to Pass:
• A key concept from Day 1: Notation
We denote the true population parameter (mean, median, sd, or proportion, etc.) with Greek
letters: μ, η, σ, p, etc. The random samples that we use as our best estimate for these
parameters are denoted with accented notations: 𝑥̅, 𝑥̃, 𝑠, 𝑝̂, etc.
Confusing the notation of parameters and sample data means that you are unable to express the
difference between the true population value and the sample value used to estimate it (even if
you do understand the difference, your choice of notation says otherwise). Worse, switching
the notations for the proportion with the mean, or that of the median for the proportion, shows a
lack of understanding of what each symbol means. Previous exams from struggling students
have had an accumulated loss of 10 to 25 points for a chronic misuse of notation. Refer to the
lecture notes from Day 1 to review the key concept of notation and its use.
• Dependent vs Independent Data:
If we compare the difference between two paired data sets, for example (before - after) or
(Method 1 - Method 2), the data set is dependent. While most students understand this, the gap
in understanding for the struggling student comes in what to do with dependent data. Keep in
mind that a confidence interval or hypothesis test will be applied to the column of differences
of the dependent data. Taken as one data set, the sample of differences for the mean will be
treated the same as a t-interval or a one sample t-test. The sample of differences for the median
will be treated the same as an s-interval or a sign test, or Wilcoxon signed-rank test. The
notation, however, will be denoted as μd (mean difference) or ηd (median difference). It is
essential that in interpreting the result of the confidence interval or hypothesis test you
match the column of differences (first - second) with the correct scenario: it is either a (small -
large) or a (large - small) scenario.
Independent data will also be presented in a two column format, but here the columns will
remain distinct because we are making a comparison of two distinct populations. Two means
are compared in a two-sample t-test and denoted as μ1 and μ2. Two medians are compared in a
Wilcoxon-Mann-Whitney test or Wilcoxon’s Rank-Sum test and denoted as η1 and η2. Again,
it is essential that in interpreting the result of the confidence interval or hypothesis test you
match the column of differences (first - second) with the correct (small - large) or (large -
small) scenarios.
• T and Z lower-tail alternate hypotheses: HA: μ < μ0, HA: p < p0
Because t and z distributions are symmetrical about zero, a less-than inequality
in the alternate hypothesis should serve as a reminder that the critical value will
be negative. Neglect this and you will have a false conclusion.
• Forming the hypothesis statement:
You must know by now that when you run a hypothesis test, it is the null hypothesis that
takes the case of equality (even if this equality statement for H0 implies the opposite of
HA). The prompt to run the hypothesis (or suspicion) is always translated to the alternate
hypothesis, not the null. The null hypothesis will use either =, ≤, or ≥, while the
alternate hypothesis will use one of 3 inequalities: ≠, <, or >.
• If rejecting H0 it must be that p < α, and vice-versa.
• The helpful hint of using the inequality of < or > within the alternate hypothesis, HA,
as an arrow to determine the region of rejection on the decision line applies ONLY to Z-tables
and t-tables. It is an error to extend this idea to the decision lines of other
hypothesis tests. The decision lines for ALL other hypothesis tests (medians, Chi-square,
ANOVA, etc.) will ALWAYS have the rejection region at the far right.
Look, for t-tests (and z-tests too) this is valid because:
2-tailed decision for independent and dependent means:
$H_0: \mu_d = 0$, $H_A: \mu_d \ne 0$ or $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \ne \mu_2$
Reject Ho | Do Not Reject Ho | Reject Ho
$-t_{Critical}$ and $+t_{Critical}$
Lower decision for independent and dependent means:
$H_0: \mu_d = 0$, $H_A: \mu_d < 0$ or $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 < \mu_2$
Reject Ho | Do Not Reject Ho
$-t_{Critical}$
Upper decision for independent and dependent means:
$H_0: \mu_d = 0$, $H_A: \mu_d > 0$ or $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 > \mu_2$
Do Not Reject Ho | Reject Ho
$+t_{Critical}$
But when we look at hypothesis tests for Medians we have only ONE decision line and it
ALWAYS has the rejection region at the far right. Look at the table data to see why.
$H_0: \eta_d = 0$, $H_A: \eta_d \ne 0$ or $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 \ne \eta_2$
$H_0: \eta_d = 0$, $H_A: \eta_d < 0$ or $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 < \eta_2$
$H_0: \eta_d = 0$, $H_A: \eta_d > 0$ or $H_0: \eta_1 = \eta_2$, $H_A: \eta_1 > \eta_2$
Do Not Reject Ho | Reject Ho
$+B_{Critical}$, $+U_{Critical}$, $+W_{Critical}$
• Do NOT apply the concept of Degrees of Freedom to tables
of critical values for a median hypothesis.
For example, the notation nd stands for the number of differences. If there are 6 differences
being compared, DO NOT subtract one and jump up
to row 5 to find the critical value.
The concept of degrees of freedom is applied to t-tables,
chi-square tables, and F-tables, but NOT to the
hypothesis tests of median values.
• All Confidence Intervals will take their critical values from the
columns of 2-tailed tests, NOT one-tailed tests.
Look at why this would be so:
(Lower bound, Upper bound) has 2 tails.
• Reading the values of the confidence interval from left to right
should follow that of a number line from smallest to largest:
Like this: (Smallest, Largest). Not like this: (Largest, Smallest).
• Calculator entry.
About one out of 4 science majors does not know how to use a
calculator correctly. Does the first row of calculations look familiar? This
method is time consuming, inefficient, and often introduces error due to
round-off. Learn to follow the approach of the second line of
calculations. If you make a mistake with the second method, at least you
can trace your error within the calculator screen; this is not so easily done
with the herky-jerky processing of the first method.
First method: This method takes 3 times as many key-strokes and often introduces round-off error.
$df = \frac{\left(\frac{3.45^2}{32} + \frac{2.68^2}{25}\right)^2}{\frac{\left(3.45^2 / 32\right)^2}{31} + \frac{\left(2.68^2 / 25\right)^2}{24}} = \frac{(0.371953 + 0.287292)^2}{0.004462 + 0.003439} = \frac{0.434601}{0.007901} = 55.0058$
Second method: Pick up the calculator ONE TIME only!
$df = \frac{\left(\frac{3.45^2}{32} + \frac{2.68^2}{25}\right)^2}{\frac{\left(3.45^2 / 32\right)^2}{31} + \frac{\left(2.68^2 / 25\right)^2}{24}} = 55.0058$
Both follow the general formula:
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$
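The same point about single-pass entry can be illustrated in code. The sketch below evaluates the df expression once at full precision and once with every intermediate rounded to four decimal places, an arbitrary choice meant to mimic copying numbers off a calculator screen. The function names are hypothetical, and the later decimals it prints may differ from hand-rounded work.

```python
def df_one_pass(s1, n1, s2, n2):
    """Evaluate the df formula in a single expression, with no rounding."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def df_stepwise(s1, n1, s2, n2, places=4):
    """Same formula, but every intermediate is rounded before reuse."""
    v1 = round(s1**2 / n1, places)
    v2 = round(s2**2 / n2, places)
    top = round((v1 + v2)**2, places)
    bottom = round(round(v1**2 / (n1 - 1), places)
                   + round(v2**2 / (n2 - 1), places), places)
    return top / bottom

full = df_one_pass(3.45, 32, 2.68, 25)
rounded = df_stepwise(3.45, 32, 2.68, 25)
print(f"one pass:  df = {full:.4f}")
print(f"stepwise:  df = {rounded:.4f}")
```

Either way the result is rounded down to a whole number before looking up the t-table, so small discrepancies only matter when df sits very close to an integer, as it does here.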
The most common mistakes on Exam 2 are listed below:
• A Crucial Point: Not recognizing the difference in the wording between problems
using independent versus dependent data sets is the key indicator of a low score on
Exam 2. Choosing the wrong test results in a loss of half credit for a given
problem. Partial credit can be earned if no further mistakes are made in the
application of the wrong hypothesis test.
• Not recognizing that confidence intervals
are two tailed! We should use two-tailed
critical values in constructing confidence
intervals, not one-tailed critical values.
• Neglecting to learn the calculation of the binomial
formula with your calculator. You know that it is very
likely that binomial calculations will be required on the 2nd
exam. It was also required on the 1st exam. Why not just
admit that you need to come in to office hours and get help
with using your calculator? Once you take the time to learn
it, the binomial formula is pretty straightforward. See the
last pages of Week 9 Day 2 Lecture for calculating exact p-values.
• Not recognizing that there are 2 optimal N
formulas: one for sample means,
the other for sample proportions.
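To make the distinction between the two optimal N formulas concrete, here is a small sketch. All input values (σ = 3.51, the margins of error, and the critical values) are chosen only for illustration, the function names are made up, and rounding up to the next whole subject is the usual convention assumed here.

```python
from math import ceil

def optimal_n_mean(z, sigma, e):
    """Sample size for estimating a mean: n = (z * sigma / E)^2, rounded up."""
    return ceil((z * sigma / e) ** 2)

def optimal_n_proportion(z, e):
    """Sample size for estimating a proportion: n = z^2 / (4 E^2), rounded up."""
    return ceil(z**2 / (4 * e**2))

# Illustrative inputs only
n_mean = optimal_n_mean(z=1.645, sigma=3.51, e=0.5)   # 90% confidence
n_prop = optimal_n_proportion(z=1.960, e=0.03)        # 95% confidence
print(f"n for a mean:       {n_mean}")
print(f"n for a proportion: {n_prop}")
```

Note that the formula for means needs a value for σ, while the formula for proportions does not; that difference is the quickest way to tell which one a problem is asking for.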
Stat 109
Exam 2B Prep
1.) In a pediatric clinic a study is carried out to see how
effective an over-the-counter medication is in
reducing temperature. Ten 5-year-old children
suffering from influenza had their temperature (°F)
taken immediately before and 1 hour after
administration of the medication. The results are
given in the table below. Does the evidence
support the case that the medication reduces fever?
Assume normality passes for all three data columns.
Test the hypothesis at the 10% LOS.
Total 20 pts:
Sample Exam
Patient    Before Medication    After Medication    Difference
1          102.4                99.6                2.8
2          101.9                100.2               1.7
3          103.0                101.4               1.6
4          101.2                99.8                1.4
5          100.7                100.7               0
6          102.5                101.2               1.3
7          102.8                100.7               2.1
8          101.1                102.3               -1.2
9          101.9                102.3               -0.4
10         101.4                100.2               1.2
𝑥̅ =        101.89               100.84              1.05
sd =       0.778                0.952               1.219
2.) Repeat the previous problem with a full 7 step
hypothesis test under the assumption that normality does not
pass. Use Wilcoxon’s Signed Rank Test. Does the
evidence support the case that the medication reduces
fever? Test the hypothesis at the 10% LOS.
Total 25 pts:
Diff.: 2.8, 1.7, 1.6, 1.4, 0.0, 1.3, 2.1, -1.2, -0.4, 1.2
3.) Regardless of the conclusion you found in the first problem, suppose your hypothesis test found a
p-value of 0.06. Interpret the meaning of the p-value in terms of the context of the problem (temperature,
medication, etc.). If p = 0.06 can be read as a percentage value, then 6% of what must be true? 8pts.
4.) Regardless of the conclusion you found in the hypothesis test for the temperature
reduction of 5-year-olds under flu medication, explain the consequences of a rejection error. What does the
data lead us to conclude, and what is actually true? 6pt
5.) Find a 90% confidence interval on the mean body mass index (BMI = kg/m²) of high school boys
given that a random sample of 134 high school boys yields a mean BMI of 21.8 with a standard
deviation of s = 3.4. Answer with an English sentence that uses the bounds of the CI. 6pts
6.) Suppose we want to estimate the true mean BMI for high school boys to within 0.01 BMI units.
How many high school boys must be sampled for the margin of error of the 90% confidence
interval to be within 0.01 BMI units of the true mean BMI value? Assume a reliable standard
deviation of 𝜎 = 3.51. 6pts
7.) Interpret the meaning of confidence in the context of the 90% confidence interval you constructed
on the mean BMI of high school boys. Use the specific interval you constructed to help answer
the question: 90% of what must be true? 6pt
Stat 109
Exam 2B Prep
8.)
A student sought to demonstrate that
soybeans inoculated with nitrogen-fixing bacteria
yield more and grow more adequately without the
use of expensive and environmentally deleterious
synthesized fertilizers. The trial was conducted
under controlled conditions with uniform amounts
of soil. There were 8 inoculated plants compared
against 8 uninoculated plants. The soybean pod
weight (in grams) was recorded for each plant. Does
the evidence support the student’s suspicion?
Assume that the data set does NOT pass normality.
Test the hypothesis at the 10% LOS.
Total 23 pts:
Sample Exam
646
Plot    Inoculated   Uninoculated   Difference
1       1.76         0.49            1.27
2       1.45         0.85            0.60
3       1.03         1.00            0.03
4       1.53         1.53            0.00
5       2.34         1.01            1.33
6       1.96         0.75            1.21
7       1.79         2.11           -0.32
8       1.21         0.92            0.29
x̄ =    1.634        1.083           0.551
sd =    0.420        0.509           0.651
Stat 109
Exam 2B Prep
Solution
1.) In a pediatric clinic a study is carried out to see how effective an over-the-counter medication is in reducing temperature. Ten 5-year-old children suffering from influenza had their temperature (°F) taken immediately before and 1 hour after administration of the medication. The results are given in the table below. Does the evidence support the case that the medication reduces fever? Assume normality passes for all three data columns. Test the hypothesis at the 10% LOS.
Total 20 pts:

Patient   Before Medication   After Medication   Difference
1         102.4               99.6                2.8
2         101.9               100.2               1.7
3         103.0               101.4               1.6
4         101.2               99.8                1.4
5         100.7               100.7               0
6         102.5               101.2               1.3
7         102.8               100.7               2.1
8         101.1               102.3              -1.2
9         101.9               102.3              -0.4
10        101.4               100.2               1.2
x̄ =      101.89              100.84              1.05
sd =      0.778               0.952               1.219

Step 1 Declare normality:
“We assume normality.”
Step 2 Declare parameter:
The data sets are paired for comparison and are therefore dependent.
We declare a mean difference.
2pts: $\mu_d$ = Mean difference (before minus after) in temperature (°F) for 5-year-olds taking medication for flu symptoms.
1pt: Step 3 Declare the LOS: $\alpha = 0.10$
4pts: Step 4: Declare hypothesis:
$H_0: \mu_d = 0$
$H_A: \mu_d > 0$
3pts: Step 5: Calculate the t-sample value:
$t_{Sample} = \dfrac{\bar{x}_d - 0}{s_d / \sqrt{n}} = \dfrac{1.05 - 0}{1.219 / \sqrt{10}} = 2.724$
Step 6: Find the critical value and draw the decision line.
For a 1-tailed $\alpha = 0.10$ with df = $n_d - 1 = 9$ we find: $t_{Critical} = 1.38303$
Do Not Reject Ho | Reject Ho (decision line at $t_{Critical} = 1.38303$)
8pts: Step 7: Stats conclusion:
2pts: At the 10% LOS we reject $H_0: \mu_d = 0$, because:
3pts: $t_{Sample} > t_{Critical}$, which yields: 2.724 > 1.38303
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.10
2pts: English conclusion:
The medication significantly lowers the temperature of a 5-year-old child with flu.
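As a check on the arithmetic in Steps 5 to 7, the paired t statistic can be recomputed directly from the table. This is a minimal sketch using only the Python standard library; the function name `paired_t` is illustrative and not part of the course materials, and the critical value still comes from the t-table.

```python
import math
import statistics

# Before/after temperatures (°F) copied from the table in problem 1.
before = [102.4, 101.9, 103.0, 101.2, 100.7, 102.5, 102.8, 101.1, 101.9, 101.4]
after = [99.6, 100.2, 101.4, 99.8, 100.7, 101.2, 100.7, 102.3, 102.3, 100.2]


def paired_t(x, y):
    """Return (mean difference, sd of differences, t statistic) for paired data."""
    d = [a - b for a, b in zip(x, y)]
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample sd, divides by n - 1
    t = mean_d / (sd_d / math.sqrt(len(d)))
    return mean_d, sd_d, t


mean_d, sd_d, t_sample = paired_t(before, after)
t_critical = 1.38303  # 1-tailed, alpha = 0.10, df = 9 (from the t-table)
print(round(mean_d, 2), round(sd_d, 3), round(t_sample, 3), t_sample > t_critical)
```

Small differences in the third decimal of the t statistic can appear because the hand calculation rounds the standard deviation before dividing.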
Stat 109
Exam 2B Prep
Solution
2.) Repeat the previous problem with a full 7 step hypothesis
under the assumption that normality does not pass. Use
Wilcoxon’s Signed Rank Test. Does the evidence support the
case that the medication reduces fever? Test the hypothesis at
the 10% LOS.
Total 25 pts:
Step1 Declare normality:
“We assume normality fails.”
Step 2 Declare parameter:
The data sets are paired for comparison and are
therefore dependent. We declare a median difference.
2pts: Parameter = Median difference (before minus after) in temperature (°F) for 5-year-olds
taking medication for flu symptoms.
1pt:
Step 3: Declare the LOS: $\alpha = 0.10$
4pts: Step 4: Declare hypothesis:
$H_0$: median difference = 0
$H_A$: median difference > 0
Step 5: Calculate the W-sample value:
5pts: Signed ranks =

Diff.    ABS rank    - Ranks    + Ranks
 2.8     9                      9
 1.7     7                      7
 1.6     6                      6
 1.4     5                      5
 0       *           *          *
 1.3     4                      4
 2.1     8                      8
-1.2     2.5         2.5
-0.4     1           1
 1.2     2.5                    2.5
Sum                  3.5        41.5

* Note that a zero difference is omitted from the data set! The two differences with absolute value 1.2 tie for ranks 2 and 3, so each receives rank 2.5.
Because $H_A$: median difference > 0, we take the sum of the positive signed ranks:
$W_{Sample}$ = 41.5
Step 6: Find the critical value and draw the decision line.
For a 1-tailed $\alpha = 0.10$ with 9 non-zero differences, $n_d = 9$, we find: $W_{Critical} = 35$
Do Not Reject Ho | Reject Ho (decision line at $W_{Critical} = 35$)
3pts
8pts: Step 7: Stats conclusion:
2pts: At the 10% LOS we reject $H_0$ (median difference = 0), because:
3pts: $W_{Sample} > W_{Critical}$, which yields: 41.5 > 35
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.10
2pts: English conclusion:
The medication significantly lowers the temperature of a child with flu.
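The signed-rank bookkeeping is easy to get wrong by hand, especially with ties and zero differences. The following is a small illustrative sketch (the helper name `signed_rank_sums` is hypothetical) that drops zeros, assigns average ranks to tied absolute differences, and sums the positive and negative ranks.

```python
def signed_rank_sums(diffs):
    """Return (W_plus, W_minus) using average ranks for ties; zeros are dropped."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every value tied with ordered[i] in absolute value.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks[abs(ordered[i])] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w_plus = sum(ranks[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(ranks[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus


# Differences (before minus after) from problem 1.
diffs = [2.8, 1.7, 1.6, 1.4, 0, 1.3, 2.1, -1.2, -0.4, 1.2]
w_plus, w_minus = signed_rank_sums(diffs)
print(w_plus, w_minus)
```

A useful self-check is that the two sums add to n(n + 1)/2 for the n non-zero differences, here 9 × 10 / 2 = 45.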
3.)
Regardless of the conclusion you found in the first problem, suppose your hypothesis test found a p-value of
0.06. Interpret the meaning of the p-value in terms of the context of the problem (temperature, medication, etc.). If
p = 0.06 can be read as a percentage value, then 6% of what must be true? 8pts.
In general: If the null hypothesis is true, then 6% of all random samples will contradict the null hypothesis
at least as strongly as the drawn data does. (Only 2 pts credit for a general answer like this one,
which is memorized like a rubber stamp and lacks any context with the problem.)
In Context: If it is true that the medication produces no drop in temperature for children with flu, compared
with their temperatures before the medication was taken, then 6% of random samples will still
show a drop in temperature at least as large as the one shown in
this study. (Full credit for a description with context.) See Quiz 7 Prep for a review.
Stat 109
Exam 2B Prep
Solution
4.) Regardless of the conclusion you found in the hypothesis test for temperature reduction in 5-year-olds
taking flu medication, explain the consequences of a rejection error. What does the data lead us to
conclude, and what is actually true? 6pt
A rejection error means we rejected a null hypothesis that is actually true. We claim that the medication is effective in lowering the temperature of 5-year-olds with the flu, when in
fact the medication does not lower the temperature of 5-year-olds with the flu.
5.) Find a 90% confidence interval on the mean body mass index (BMI = kg/m2) of high school boys
given that a random sample of 134 high school boys yields mean BMI of 21.8 with a standard deviation of
s = 3.4. Answer with an English sentence that uses the bounds of the CI. 6pts
With df = n - 1 = 134 - 1 = 133, and rounding down to df = 100 as the next available value in the t-table,
we take the 2-tailed column at $\alpha = 0.10$ (as the complement to 90% confidence), giving a t-critical
value of $t_{\alpha/2} = t_{0.10/2} = 1.66024$.
$\bar{y} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}} = 21.8 \pm 1.66024 \cdot \dfrac{3.4}{\sqrt{134}} = (21.31, 22.29)$
English: The 90% CI on the mean BMI for high school boys is (21.31, 22.29) kg/m².
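The interval can be verified with a few lines of code. This sketch assumes the table value t = 1.66024 (df = 100) quoted above; `t_interval` is an illustrative helper name.

```python
import math


def t_interval(mean, sd, n, t_crit):
    """Confidence interval: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Values from problem 5; t_crit is the table value used in the solution.
lower, upper = t_interval(mean=21.8, sd=3.4, n=134, t_crit=1.66024)
print(round(lower, 2), round(upper, 2))
```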
6.) Suppose we want to estimate the true mean BMI for high school boys to within 0.01 BMI units. How
many high school boys must be sampled for the margin of error of the 90% confidence interval to be within
0.01 BMI units of the true mean BMI value? Assume a reliable standard deviation of 𝜎 = 3.51 6pts
We must solve for the sample size n that gives this margin of error.
$n = \dfrac{(Z_{\alpha/2}\,\sigma)^2}{E^2} = \dfrac{(1.645 \times 3.51)^2}{(0.01)^2} = 333{,}384.986$, always rounding up…
We must sample the BMI from 333,385 high school boys to find a 90% CI on the mean with a margin
of error of 0.01 BMI units.
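The same sample-size calculation, sketched in code. The function name `required_n` is illustrative; the rounding-up rule is applied with `math.ceil`.

```python
import math


def required_n(z, sigma, margin):
    """Smallest n for which z * sigma / sqrt(n) is at most the margin of error."""
    return math.ceil((z * sigma / margin) ** 2)


# z = 1.645 for 90% confidence, sigma = 3.51, margin of error E = 0.01.
n_needed = required_n(z=1.645, sigma=3.51, margin=0.01)
print(n_needed)
```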
7.)
Interpret the meaning of confidence in the context of the 90% confidence interval you constructed on
the mean BMI of high school boys. Use the specific interval you constructed to help answer the question
that 90% of what must be true? 6pt
90% of all randomly drawn samples will form confidence intervals that contain the true mean BMI of
high school boys. The specific interval that we found, (21.31, 22.29) may be one of the 90% of all
confidence intervals that contains the true mean BMI or one of the 10% that fails to contain the true mean
BMI for high school boys. We cannot say for sure, we only know that this method works 90% of the time.
649
Stat 109
Exam 2B Prep
Solution
8.) A student sought to demonstrate that soybeans inoculated with
nitrogen-fixing bacteria yield more and grow more adequately
without the use of expensive and environmentally deleterious
synthesized fertilizers. The trial was conducted under controlled
conditions with uniform amounts of soil. There were 8 inoculated
plants compared against 8 uninoculated plants. The soybean pod
weight (in grams) was recorded for each plant. Does the evidence
support the student’s suspicion? Assume that the data set does NOT
pass normality. Test the hypothesis at the 10% LOS.
Total 23 pts:
Step 1: Declare normality: We assume the data set is not normally
distributed.
Step 2: 2pts: Declare Parameters:
$\text{Median}_1$ = Median soybean pod weight (in grams) for the inoculated plants.
$\text{Median}_2$ = Median soybean pod weight (in grams) for the uninoculated plants.

Plot    Inoculated   Uninoculated   Difference
1       1.76         0.49            1.27
2       1.45         0.85            0.60
3       1.03         1.00            0.03
4       1.53         1.53            0.00
5       2.34         1.01            1.33
6       1.96         0.75            1.21
7       1.79         2.11           -0.32
8       1.21         0.92            0.29
x̄ =    1.634        1.083           0.551
sd =    0.420        0.509           0.651

Step 3: 4pts: Declare the hypothesis and the LOS:
$H_0: \text{Median}_1 = \text{Median}_2$
$H_A: \text{Median}_1 > \text{Median}_2$
$\alpha = 0.10$
Step 4: 3pts: Find the critical value and draw the decision line.
For a 1-tailed $\alpha = 0.10$ with n = 8, n’ = 8, we find: $U_{Critical} = 45$
Do Not Reject Ho | Reject Ho (decision line at $U_{Critical} = 45$)
Step 5: 5pts: Calculate the T.S.
Make an ordered count of the grouped data:
Inoculated:   1.03 1.21 1.45 1.53 1.76 1.79 1.96 2.34
Uninoculated: 0.49 0.75 0.85 0.92 1.00 1.01 1.53 2.11
Sum both counts (a tie counts as 0.5):
$K_1$ = 6 + 6 + 6 + 6.5 + 7 + 7 + 7 + 8 = 53.5
$K_2$ = 0 + 0 + 0 + 0 + 0 + 0 + 3.5 + 7 = 10.5
Take the larger count as the $U_{Sample}$:
t.s. = $U_{Sample} = \max(K_1, K_2)$ = 53.5
Check your work: is this equation true?
$K_1 + K_2 = n_1 \times n_2$
10.5 + 53.5 = 8 × 8 = 64
9pts: Step 7: Stats conclusion:
1pts: At the 10% LOS we reject $H_0$, because:
3pts: $U_{Sample} > U_{Critical}$, which yields: 53.5 > 45
3pts: $p < \alpha$: (0.01 < p < 0.025) < 0.10
2pts: English conclusion:
The median soybean pod weight (in grams) for the
inoculated plants is significantly greater than that found
in uninoculated plants.
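The ordered counts $K_1$ and $K_2$ can be reproduced mechanically. This is an illustrative sketch (the helper `u_counts` is a made-up name) that counts, for each value in one group, how many values in the other group are smaller, scoring ties as 0.5.

```python
def u_counts(x, y):
    """Return (K1, K2): K1 counts y-values below each x, K2 counts x-values below each y."""
    def count_below(a, b):
        return sum(1.0 if bj < ai else 0.5 if bj == ai else 0.0 for ai in a for bj in b)
    return count_below(x, y), count_below(y, x)


inoculated = [1.76, 1.45, 1.03, 1.53, 2.34, 1.96, 1.79, 1.21]
uninoculated = [0.49, 0.85, 1.00, 1.53, 1.01, 0.75, 2.11, 0.92]

k1, k2 = u_counts(inoculated, uninoculated)
u_sample = max(k1, k2)
print(k1, k2, u_sample)
```

The built-in check is that the two counts must add to the product of the sample sizes, here 8 × 8 = 64.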
Stat 109
Final Exam Formula Page
$\chi^2_s = \sum_{i=1}^{k} \dfrac{(n_i - E_i)^2}{E_i}$, with $df = k - 1$

$\chi^2_s = \sum_{i=1}^{n_i} \sum_{j=1}^{n_j} \dfrac{(n_{ij} - E_{ij})^2}{E_{ij}}$, with $df = (\text{rows} - 1)(\text{columns} - 1)$

$\chi^2_s = \dfrac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$, with $df = 1$

$Exp._{i,j} = \dfrac{(\text{Row } i \text{ sum}) \times (\text{Column } j \text{ sum})}{\text{Grand Total}}$

Fisher’s Exact p-value = $\dfrac{\binom{n_1}{a}\binom{n_2}{b}}{\binom{n_1+n_2}{a+b}} + \dfrac{\binom{n_1}{a-1}\binom{n_2}{b+1}}{\binom{n_1+n_2}{a+b}} + \dfrac{\binom{n_1}{a-2}\binom{n_2}{b+2}}{\binom{n_1+n_2}{a+b}} + \dots + \dfrac{\binom{n_1}{0}\binom{n_2}{a+b}}{\binom{n_1+n_2}{a+b}}$

Odds Ratio: $\theta = \dfrac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}$   Or: $\hat{\theta} = \dfrac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$

SE of $\ln\hat{\theta} = \sqrt{\dfrac{1}{n_{11}} + \dfrac{1}{n_{12}} + \dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}}$

95% CI for $\theta = e^{\ln\hat{\theta} \pm 1.96 \times SE}$

Harmonic mean: $n' = \dfrac{k}{\dfrac{1}{n_1} + \dfrac{1}{n_2} + \dots + \dfrac{1}{n_k}}$

$SS_{Between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$

$SS_{Within} = \sum_{i=1}^{k} \sum_{j=1}^{n_j} (x_{ij} - \bar{x}_i)^2$

$SS_{Total} = \sum_{i=1}^{k} \sum_{j=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2$

SNK critical value = (SNK table) $\times \sqrt{\dfrac{MS_{Within}}{n'}}$

Bonferroni 95% CI = $(\bar{x}_1 - \bar{x}_2) \pm B_{\#Pairs,\ df\ within} \sqrt{MS_{within}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

1-way ANOVA
Source     df             SS            MS    F    p-value
Between    k - 1          SS Between
Within     Σ(n_i - 1)     SS Within
Total      n - 1          SS Total

2-way ANOVA
Source        df             SS               MS    F    p-value
Between A     a - 1          SS Between A
Between B     b - 1          SS Between B
Interaction   df_A × df_B    SS Interaction
Within                       SS Within
Total         n - 1          SS Total
F-Tables
Bonferroni Table
Newman Keuls Table
Stat 109 Final Exam Prep Sheet
1A.) In a breeding experiment, white chickens with small combs were mated and produced 190 offspring, of the types shown in the accompanying table. Are these data consistent with the Mendelian expected ratios of 9:3:3:1 for the four types? Use a chi-square test at $\alpha$ = .10.

Type of offspring             Number of Offspring
White feathers, small comb    111
White feathers, large comb    37
Dark feathers, small comb     34
Dark feathers, large comb     8
Total                         190

1B.) The distribution of blood types in the Armenian population is as given in the following table. Is the distribution of a random sample of 200 blood types among Portuguese statistically similar to the reported values of the Armenian population? Use a chi-square test at $\alpha$ = 0.05.

Blood type   Dist. in Armenian pop.   Count among 200 random Portuguese
O            0.31                     70
A            0.50                     106
B            0.13                     16
AB           0.06                     8

2.) The accompanying partially complete contingency table shows the responses to two treatments. Invent a fictitious data set that agrees with the table and for which $\chi^2_s = 0$.

            Treatment:
            1       2
Success     70
Failure
Total       100     200
3.) A random sample of 99 students in a Conservatory of Music found that 9 of the 48 women sampled had "perfect pitch" (the ability to identify, without error, the pitch of a musical note), but only 1 of the 51 men sampled had perfect pitch. Conduct Fisher's exact test to determine if the evidence supports the case that women are more likely than men to have perfect pitch. Let $\alpha$ = .05.
4.) As part of the National Health Interview Survey, occupational injury data were collected on thousands of American workers. The following table summarizes part of these data.

Injured?   Self-employed   Employed by Others
Yes        210             4391
No         33724           421502
Total      33934           425893

(a) Calculate the relative risk for the self employed.
(b) Calculate the sample value of the odds ratio.
(c) According to the odds ratio, are self-employed workers less likely to be injured than persons who work for others? Use the odds ratio to express by how much one group is likely to be injured when compared to the other group.
(d) Construct a 95% confidence interval for the population value of the odds ratio and use the bounds of the odds ratio to interpret the interval in an English sentence.
(e) Run an independence of attributes test to determine if self employment results in lower injury rates.
5.) The habitat selection behavior of the fruitfly Drosophila subobscura was studied by capturing flies from two different habitat sites. The flies were marked with colored fluorescent dust to indicate the site of capture and then released at a point mid-way between the original sites. On the following two days, flies were recaptured at the two sites. The results are summarized in the table. Perform a complete chi-square hypothesis test for this contingency table. Test the null hypothesis of independence against the (directional) alternative that the flies preferentially tend to return to their site of capture. Let $\alpha$ = .01.

Site of Original Capture   Site of Recapture 1   Site of Recapture 2
I                          78                    56
II                         33                    58
6.) The accompanying table shows fictitious data for 3 samples. Complete the following ANOVA table from the given data.
Given the Grand mean = $\bar{\bar{x}} = \dfrac{440}{11} = 40$

Sample:   1     2     3
          48    40    39
          39    48    30
          42    44    32
          43          35
Mean:     43    44    34

7.) Hospitals across the U.S. report that women are more likely to give birth during weekdays versus the weekend. An obstetrician wants to know if the likelihood of birth is uniform for all 5 weekdays. She randomly selects 34 calendar weekdays from her local maternity ward and records the mean number of births for each weekday as shown in the table.
Given: MSW = 6.03, and $\bar{\bar{x}}$ = 58.0, $\alpha$ = 0.05.

Treatment   Mon.   Tues.   Wed.   Thu.   Fri.
Mean        53.5   60.1    57.3   59.0   60.0
Sd          5.3    8.4     4.3    6.2    7.1
n           8      5       4      7      10

8.) A researcher has determined with an ANOVA hypothesis that smoking has an impact on resting heart rate. Given that the ANOVA test yielded an MSW = 43.9, use the Newman-Keuls method to compare all pairs of means at $\alpha$ = .05.

Cigarettes per day   Heavy smoker (more than 5)   Light (less than 5)   Nonsmoker (0)
Mean                 78                           70                    58
n                    5                            9                     14
9.) A plant physiologist investigated the effect of flooding on root metabolism in two tree species: flood-tolerant river birch and the intolerant European birch. Four seedlings of each species were flooded for one day and four were used as controls. The concentration of adenosine triphosphate (ATP) in the roots of each plant was measured. The data (nmol ATP per mg tissue) are shown in the table. For these data: SS(species of birch) = 2.19781, SS(flooding) = 2.25751, SS(interaction) = 0.097656, and SS(within) = .47438.

         River Birch           European Birch
         Flooded    Control    Flooded    Control
         1.45       1.70       0.21       1.34
         1.19       2.04       0.58       0.99
         1.05       1.49       0.11       1.17
         1.07       1.91       0.27       1.30
x̄ =     1.19       1.785      0.2925     1.20

9a.) Draw an interaction graph.
9b.) Construct the ANOVA table.
9c.) Based on the ANOVA table, write a statistical and English conclusion answering whether the factors of species & flooding interact. Report the respective F-stat and p-value at $\alpha$ = .05.
9d.) Based on the ANOVA table, write a statistical and English conclusion that tests the null hypothesis that species has no effect on ATP concentration. Use $\alpha$ = .01.
9e.) Assuming that each of the four populations has the same standard deviation, use the data to calculate an estimate of that standard deviation.
9f.) Suppose a two-sample t-test was applied to the alternate hypothesis that the mean ATP level was significantly different in the two species of Birch. What would be the resulting test statistic?
10.) Suppose that ATP in nmol per mg of tissue for the River Birch was regressed on annual inches of rain received over several regions, resulting in the linear regression line: ATP = 3.24 - 1.25 × rain. Interpret the slope of this regression line with an English sentence using the units of measure.
Stat 109 Final Exam Prep Sheet Solutions
1A.) In a breeding experiment, white chickens with small combs were mated and produced 190 offspring, of the types shown in the accompanying table. Are these data consistent with the Mendelian expected ratios of 9:3:3:1 for the four types? Use a chi-square test at $\alpha$ = 0.10. 23pts

Type of offspring             Number of Offspring
White feathers, small comb    111
White feathers, large comb    37
Dark feathers, small comb     34
Dark feathers, large comb     8
Total                         190

Solution:
Declare the parameter, 2pts:
$p_i$ = Proportion of chickens with the “ith” color and comb combination.
Declare the hypothesis, 3pts:
Ho: $p_{White/small} = \frac{9}{16}$, $p_{White/Large} = \frac{3}{16}$, $p_{Dark/small} = \frac{3}{16}$, $p_{Dark/Large} = \frac{1}{16}$
OR: Ho: The color and comb ratio of the chickens fits the Mendelian ratio of 9:3:3:1
HA: The color and comb ratio of the chickens is otherwise.
LOS = 0.10
Perform all preliminary checks, 3pts:
Preliminary check: Show that all 4 expected values are greater than 5.
$E_1 = \dfrac{9 \times 190}{16} = 106.875 > 5$
$E_2 = \dfrac{3 \times 190}{16} = 35.625 > 5$
$E_3 = \dfrac{3 \times 190}{16} = 35.625 > 5$
$E_4 = \dfrac{1 \times 190}{16} = 11.875 > 5$
Then acknowledge what this check indicates: “All expected values are greater than 5,
therefore we have a large enough sample to apply a chi-square test.”
OR: “We have an ample sample.”
Draw the decision line, 2pts:
At df = 3: Do Not Reject Ho | Reject Ho (decision line at $\chi^2_{Critical} = 6.25$)
Calculate the test statistic (the sample value), 4pts:
t.s. = $\chi^2_s = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
$\chi^2_s = \dfrac{(111 - 106.875)^2}{106.875} + \dfrac{(37 - 35.625)^2}{35.625} + \dfrac{(34 - 35.625)^2}{35.625} + \dfrac{(8 - 11.875)^2}{11.875} = 1.55$
Statistical conclusion, 7pts:
At the 10% LOS we do not reject $H_0$, because $\chi^2_{Sample} < \chi^2_{Critical}$ (1.55 < 6.25); df = 3 yields $p > \alpha$: (p > 0.20) > 0.10
English conclusion, 2pts:
The distribution of the chicken comb and color ratio is consistent with the Mendelian ratio of 9:3:3:1.
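The expected counts and the chi-square sample value can be recomputed with a short script. This is a sketch using only plain Python; `chi_square_gof` is an illustrative helper, and the critical value 6.25 is the table value quoted in the solution.

```python
def chi_square_gof(observed, proportions):
    """Return (expected counts, chi-square statistic) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat


observed = [111, 37, 34, 8]                 # offspring counts from problem 1A
mendelian = [9 / 16, 3 / 16, 3 / 16, 1 / 16]  # 9:3:3:1 ratio

expected, chi2 = chi_square_gof(observed, mendelian)
ample_sample = all(e > 5 for e in expected)
print(expected, round(chi2, 2), ample_sample, chi2 > 6.25)
```

The same helper applies to problem 1B by passing the Portuguese counts and the Armenian proportions.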
1B.) The distribution of blood types in the Armenian population is as given in the following table. Is the distribution of a random sample of 200 blood types among Portuguese statistically similar to the reported values of the Armenian population? Run a chi-square test at an LOS of 5% to find out. 15pts

Blood type   Dist. in Armenian pop.   Count among 200 random Portuguese
O            0.31                     70
A            0.50                     106
B            0.13                     16
AB           0.06                     8

1.) Declare the parameter: $p_i$ = Proportion of Portuguese with blood type “i”. 2pts
2.) State the hypothesis. 2pts
$H_0: p_O = 0.31,\ p_A = 0.50,\ p_B = 0.13,\ p_{AB} = 0.06$
$H_A$: The proportions are otherwise.
State the LOS: $\alpha = 0.05$
3.) Verify that all the expected values are greater than 5. 2pt
$E_O = 0.31 \times 200 = 62$
$E_A = 0.50 \times 200 = 100$
$E_B = 0.13 \times 200 = 26$
$E_{AB} = 0.06 \times 200 = 12$
62 > 5, 100 > 5, 26 > 5, 12 > 5.
“Since all expected values are greater than 5, we have ample sample.”
4.) Calculate the sample value, expressed with correct notation. 4pts
$\chi^2_{Sample} = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
$\chi^2_{Sample} = \dfrac{(70 - 62)^2}{62} + \dfrac{(106 - 100)^2}{100} + \dfrac{(16 - 26)^2}{26} + \dfrac{(8 - 12)^2}{12}$
$\chi^2_{Sample} = \dfrac{8^2}{62} + \dfrac{6^2}{100} + \dfrac{10^2}{26} + \dfrac{4^2}{12}$
$\chi^2_{Sample} = 1.0323 + 0.36 + 3.846 + 1.333$
$\chi^2_{Sample} = 6.57$
Do Not Reject Ho | Reject Ho (decision line at $\chi^2_{Critical} = 7.81$)
5pts: “At the 5% LOS we do not reject $H_0$ because $\chi^2_{Sample} < \chi^2_{Critical}$ (6.57 < 7.81), which yields $p > \alpha$: (0.05 < p < 0.10) > 0.05”
The blood type distribution of the Portuguese sample is not significantly different from that of the Armenians.
2.) The accompanying partially complete contingency table shows the responses to two treatments. Invent a fictitious data set that agrees with the table and for which $\chi^2_s = 0$. 5pts

            Treatment:
            1       2
Success     70
Failure
Total       100     200

Solution:
$\chi^2_s = 0$ means the sample shows no association at all between treatment and response. This can happen only when exactly the same percentage shows for “success” in each of the treatments: $\hat{p}_1 = \dfrac{70}{100} = 0.7 = \dfrac{140}{200} = \hat{p}_2$

            Treatment:
            1       2
Success     70      140
Failure     30      60
Total       100     200

Consider that for independence of attributes an Expected value = $\dfrac{\text{Row sum} \times \text{Column sum}}{\text{Grand total}}$ for each cell.
Then:
$E_{1,1} = \dfrac{(70 + 140)(70 + 30)}{300} = \dfrac{21{,}000}{300} = 70$
$E_{1,2} = \dfrac{(70 + 140)(140 + 60)}{300} = \dfrac{42{,}000}{300} = 140$
$E_{2,1} = \dfrac{(30 + 60)(70 + 30)}{300} = \dfrac{9{,}000}{300} = 30$
$E_{2,2} = \dfrac{(30 + 60)(140 + 60)}{300} = \dfrac{18{,}000}{300} = 60$
$\chi^2_s = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
$\chi^2_s = \dfrac{(70 - 70)^2}{70} + \dfrac{(140 - 140)^2}{140} + \dfrac{(30 - 30)^2}{30} + \dfrac{(60 - 60)^2}{60}$
$\chi^2_s = 0 + 0 + 0 + 0 = 0$
3.) A random sample of 99 students in a Conservatory of Music found that 9 of the 48 women sampled had "perfect pitch" (the ability to identify, without error, the pitch of a musical note), but only 1 of the 51 men sampled had perfect pitch. Conduct Fisher's exact test to determine if the evidence supports the case that women are more likely than men to have perfect pitch. Let $\alpha$ = .05. 14pts
Solution:
Declare hypothesis: 3pts
$H_0$: Perfect pitch is independent of sex.
$H_1$: Women are more likely than men to have perfect pitch.
Generate all tabled scenarios from the given to the most extreme case of zero: 3pts

Given table:
         Y     N     Total
Women    9     39    48
Men      1     50    51
Total    10    89    99

Most extreme table:
         Y     N     Total
Women    10    38    48
Men      0     51    51
Total    10    89    99

Use combinations to calculate the p-value: 4pts
$p = \dfrac{\binom{10}{1}\binom{89}{50}}{\binom{99}{51}} + \dfrac{\binom{10}{0}\binom{89}{51}}{\binom{99}{51}} = 0.00591$
Stat. Conclusion: “At the 5% LOS we reject $H_0$ because $p < \alpha$, 0.00591 < 0.05” 2pts
English Conclusion: Women are significantly more likely than men to have perfect pitch. 2pts
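The combinations in Fisher's exact test are tedious by hand but direct with `math.comb`. This sketch sums the probability of the observed table and the more extreme table (1 or 0 men with perfect pitch); the helper name `table_probability` is illustrative.

```python
from math import comb


def table_probability(k, with_pitch, without_pitch, men):
    """Probability that exactly k of the men have perfect pitch, margins fixed."""
    total = with_pitch + without_pitch
    return comb(with_pitch, k) * comb(without_pitch, men - k) / comb(total, men)


# Margins from problem 3: 10 students with perfect pitch, 89 without, 51 men.
p_value = sum(table_probability(k, 10, 89, 51) for k in (1, 0))
print(round(p_value, 5), p_value < 0.05)
```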
4.) As part of the National Health Interview Survey, occupational injury data were collected on thousands of American workers. The following table summarizes part of these data.

Injured?   Self-employed   Employed by Others
Yes        210             4391
No         33724           421502
Total      33934           425893

(a) Calculate the relative risk for the self employed.
(b) Calculate the sample value of the odds ratio.
(c) According to the odds ratio, are self-employed workers more likely, or less likely, to be injured than persons who work for others? If so then use the odds ratio to express by how much one group is likely to be injured when compared to the other group.
(d) Construct a 95% confidence interval for the population value of the odds ratio and use the bounds of the odds ratio to interpret the interval in an English sentence. 14pts
Solution:
a) Find the relative risk: 2pts
$RR = \dfrac{\hat{p}_1}{\hat{p}_2} = \dfrac{210 / 33934}{4391 / 425893} = 0.600235$
b) Find the odds ratio: 3pts
$\hat{\theta} = \dfrac{(210)(421502)}{(4391)(33724)} = 0.5977$
c) 1pt: Self-employed workers are less likely to be injured: their odds of injury on the job are only 0.5977 times the odds for workers employed by others.
d) Calculate a 95% CI for the odds ratio: 6pts
$\ln\hat{\theta} = -0.5146$
SE of $\ln\hat{\theta} = \sqrt{\dfrac{1}{210} + \dfrac{1}{4391} + \dfrac{1}{33724} + \dfrac{1}{421502}} = 0.0709$
95% c.i. for $\ln(\theta) = \ln\hat{\theta} \pm Z_{\alpha/2} \times$ SE of $\ln\hat{\theta} = -0.5146 \pm (1.96 \times 0.0709) = (-0.65356, -0.37564)$
95% c.i. for $\theta = (e^{-0.65356}, e^{-0.37564}) = (0.52019, 0.68685)$.
2pts: The odds of a self-employed worker being injured on the job are anywhere from 0.52 to 0.69 times the odds for workers employed by others.
Recall from lecture week 10:
($\theta_{lower} < 1$, $\theta_{upper} < 1$): Both bounds are less than 1. Column 1 descriptors are less likely than column 2.
($\theta_{lower} < 1$, $\theta_{upper} > 1$): Lower bound is less than 1, upper bound is greater than 1. We cannot rule out even odds for either descriptor.
($\theta_{lower} > 1$, $\theta_{upper} > 1$): Both bounds are greater than 1. Column 1 descriptors are more likely than column 2.
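The relative risk, odds ratio, and confidence interval can be recomputed from the four cell counts. This is a sketch with illustrative function names; it keeps full precision throughout, so the bounds may differ from the hand calculation in the fourth decimal place.

```python
import math


def relative_risk(n11, n12, n21, n22):
    """Risk in column 1 divided by risk in column 2 (row 1 = 'Yes')."""
    return (n11 / (n11 + n21)) / (n12 / (n12 + n22))


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Return (odds ratio, SE of its log, (lower, upper)) for a 2x2 table."""
    theta = (n11 * n22) / (n12 * n21)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta)
    return theta, se, (math.exp(log_theta - z * se), math.exp(log_theta + z * se))


# Cell counts from problem 4: injured yes/no by self-employed / employed by others.
rr = relative_risk(210, 4391, 33724, 421502)
theta, se, (ci_low, ci_high) = odds_ratio_ci(210, 4391, 33724, 421502)
print(round(rr, 4), round(theta, 4), round(se, 4), round(ci_low, 2), round(ci_high, 2))
```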
4e.) Run an independence of attributes test to determine if self employment results in lower injury rates.
Step 1: Declare the hypothesis and LOS: 3pts
$H_0$: Injury rates are independent of whether one is self employed or working for a boss.
$H_A$: Injury rates are dependent upon employment status. Self-employed workers have lower rates of injury on the job.
Assume $\alpha$ = 0.05

Observed:
Injured?   Self-employed   Employed by Others   Total
Yes        210             4391                 4601
No         33724           421502               455226
Total      33934           425893               459827

Step 2: Calculate the expected values for the null hypothesis:
$Exp._{i,j} = \dfrac{(\text{Row } i \text{ sum}) \times (\text{Column } j \text{ sum})}{\text{Grand Total}}$

Expected       Self-employed                           Employed by Others                        Total
Yes Injured    (4601)(33934)/459827 = 339.54           (4601)(425893)/459827 = 4261.46           4601
Not Injured    (455226)(33934)/459827 = 33594.46       (455226)(425893)/459827 = 421631.54       455226
Total          33934                                   425893                                    459827

Step 3: Check for ample sample:
$E_{1,1}$ = 339.54 > 5
$E_{1,2}$ = 4261.46 > 5
$E_{2,1}$ = 33594.46 > 5
$E_{2,2}$ = 421631.54 > 5
Each expected value is greater than 5, therefore we have an ample sample.
Step 4: Draw a decision line:
df = (number of rows - 1) × (number of columns - 1) = 1 × 1 = 1
Note that for a one tailed test we must double the given $\alpha$ in the two-tailed chi-square table to find the correct $\chi^2_{Critical}$ value.
Do Not Reject Ho | Reject Ho (decision line at $\chi^2_{Critical} = 2.71$)
Step 5: Calculate the Chi-square sample value:
$\chi^2_{Sample} = \sum_{i}\sum_{j} \dfrac{(n_{ij} - E_{ij})^2}{E_{ij}}$
$\chi^2_{Sample} = \dfrac{(210 - 339.54)^2}{339.54} + \dfrac{(4391 - 4261.46)^2}{4261.46} + \dfrac{(33724 - 33594.46)^2}{33594.46} + \dfrac{(421502 - 421631.54)^2}{421631.54}$
$\chi^2_{Sample} = 49.423 + 3.938 + 0.5 + 0.04$
$\chi^2_{Sample} = 53.9$
Step 6: Write a complete statistical conclusion with an English sentence summary.
At the 5% LOS we reject Ho because: df = 1, $\chi^2_{Sample} > \chi^2_{Critical}$ (53.9 > 2.71), $p < \alpha$: (p < 0.00005) < 0.05
Injury rates are dependent upon employment status. Self-employed workers have
lower rates of injury on the job.
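The expected counts and chi-square sample value for an independence-of-attributes test follow the same recipe for any table size. A minimal sketch, with an illustrative helper name and the observed counts from problem 4:

```python
def chi_square_independence(table):
    """Return (expected table, chi-square statistic) for a table of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    expected = [[r * c / grand_total for c in col_sums] for r in row_sums]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return expected, stat


observed = [[210, 4391], [33724, 421502]]  # rows: injured yes / no
expected, chi2 = chi_square_independence(observed)
print([[round(e, 2) for e in row] for row in expected], round(chi2, 1))
```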
5.) The habitat selection behavior of the fruitfly Drosophila subobscura was studied by capturing flies from two different habitat sites. Fruit flies were captured at one of two sites and marked and then released at a point mid-way between the original sites. On the following two days, flies were recaptured at the two sites. Each fruit fly was marked with its own micro-bar code to compare the fruit fly’s original site of capture with its site of recapture. The results are summarized in the table. Perform a complete chi-square hypothesis test for this contingency table. Test the null hypothesis of independence against the (directional) alternative that the flies preferentially tend to return to their site of capture. Let $\alpha$ = 0.01. 18pts

Site of Original Capture   Site of Recapture 1   Site of Recapture 2
I                          78                    56
II                         33                    58

Solution:
Declare the hypothesis: 3pts
$H_0$: Site of fruit flies’ recapture is independent of the original site of capture.
$H_A$: Site of fruit flies’ recapture is dependent on the original site of capture; flies preferentially tend to return to their site of capture.
(This last segment is the one-tail portion of the hypothesis.)
State the LOS: 1pt
$\alpha = 0.01$
Draw the decision line: 3pts
df = (R-1) x (C-1) = (2 - 1) x (2 - 1) = 1
$\alpha$ = 0.01 x 2 = 0.02 for a 1-tailed test
Do Not Reject H0 | Reject H0 (decision line at $\chi^2_{Critical} = 5.41$)
Calculate the Sample Value: 4pts
McNemar's: $\chi^2_{Sample} = \dfrac{(n_{1,2} - n_{2,1})^2}{n_{1,2} + n_{2,1}} = \dfrac{(56 - 33)^2}{56 + 33} = 5.94382$
Conclude with Statistics: 5pts
At the 1% LOS we reject H0 because: $\chi^2_{Sample} > \chi^2_{Critical}$ (5.94382 > 5.41), $p < \alpha$: (0.005 < p < 0.01) < 0.01
English: 2pts
Fruit-flies prefer to return to their site of original capture.
-1pt for the 2-tailed conclusion of:
“Site of recapture dependent on site of original capture.”
Note! If this were a 2-tailed test the bracketed p-value would be (0.01 < p < 0.02).
But to accommodate a one-tailed test these bracketed values must be halved.
-8pts for applying the incorrect method of Independence of Attributes for paired categorical data.
$E_{1,1} = \dfrac{134 \times 111}{225} = 66.11$
$E_{1,2} = \dfrac{134 \times 114}{225} = 67.89$
$E_{2,1} = \dfrac{91 \times 111}{225} = 44.89$
$E_{2,2} = \dfrac{91 \times 114}{225} = 46.11$
All expected values exceed 5, therefore we have an Ample Sample.
Expected Values = $E_{ij} = \dfrac{(\text{row } i \text{ sum})(\text{column } j \text{ sum})}{\text{total sum}}$
$\chi^2_s = \sum \dfrac{(Obs_{i,j} - E_{i,j})^2}{E_{i,j}}$
$\chi^2_s = \dfrac{(78 - 66.11)^2}{66.11} + \dfrac{(56 - 67.89)^2}{67.89} + \dfrac{(33 - 44.89)^2}{44.89} + \dfrac{(58 - 46.11)^2}{46.11}$
$\chi^2_{Sample} = 2.1384 + 2.0823 + 3.15 + 3.06 = 10.44$
t.s. $\chi^2_{Sample}$ = 10.44 (df = 1) yields a table p-value between 0.001 < p-value < 0.01.
However, we must recognize that this is a one-tailed test, which means
that these p-values must be halved: 0.0005 < p-value < 0.005
Stat: “At the 1% LOS we reject H0 because $\chi^2_{Sample} > \chi^2_{Critical}$ (10.44 > 5.41),
which yields $p < \alpha$, (0.0005 < p < 0.005) < 0.01.”
English: We conclude that flies preferentially return to their site of capture.
6.) The accompanying table shows fictitious data for 3 samples. Complete the following ANOVA table from the given data. 14pts

Sample:   1     2     3
          48    40    39
          39    48    30
          42    44    32
          43          35
Mean:     43    44    34

Solution:
Grand mean = $\bar{\bar{x}}$ = (total sum)/(number of data points) = $\dfrac{440}{11} = 40$
$df_B = k - 1 = 2$ and $df_W = \sum (n_i - 1) = 8$ 2pts
$SS_{Between} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$
SSB = $4(43 - 40)^2 + 3(44 - 40)^2 + 4(34 - 40)^2 = 228$ 3pts
$SS_{Within} = \sum_{i=1}^{k} \sum_{j=1}^{n_j} (x_{ij} - \bar{x}_i)^2$
SSW = $(48 - 43)^2 + (39 - 43)^2 + (42 - 43)^2 + (43 - 43)^2$
  $+ (40 - 44)^2 + (48 - 44)^2 + (44 - 44)^2$
  $+ (39 - 34)^2 + (30 - 34)^2 + (32 - 34)^2 + (35 - 34)^2 = 120$ 3pts
MSB = $\dfrac{SSB}{df_B} = \dfrac{228}{2} = 114$
MSW = $\dfrac{SSW}{df_W} = \dfrac{120}{8} = 15$ 2pts
$F_{Stat} = \dfrac{MSB}{MSW} = \dfrac{114}{15} = 7.6$

Source           df    SS     MS     Fstat   p-value
Between groups   2     228    114    7.6     0.01 < p < 0.02
Within groups    8     120    15
Total            10    348
4pts
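The sums of squares in a one-way ANOVA table can be rebuilt from raw data in a few lines. This is a sketch with an illustrative function name; the p-value bracket still has to be read from the F-table.

```python
def one_way_anova(groups):
    """Return a dict with df, SS, MS and F for a one-way ANOVA on raw data."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    msb, msw = ssb / df_b, ssw / df_w
    return {"df_b": df_b, "df_w": df_w, "ssb": ssb, "ssw": ssw,
            "msb": msb, "msw": msw, "f": msb / msw}


# The three fictitious samples from problem 6.
samples = [[48, 39, 42, 43], [40, 48, 44], [39, 30, 32, 35]]
anova = one_way_anova(samples)
print(anova)
```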
Stat 109 Final Exam Prep Sheet Solutions
670
7.) Hospitals across the U.S. report that women
Treatment Mon. Tues. Wed. Thu. Fri.
are more likely to give birth during weekdays
Mean
53.5 60.1 57.3 59.0 60.0
versus the weekend. An obstetrician wants to
Sd
5.3
8.4
4.3
6.2
7.1
know if the likelihood of birth is uniform for all 5
n
8
5
4
7
10
weekdays. She randomly selects 34 calendar
weekdays from her local maternity ward and records the mean number of births for each weekday as
shown in the table.
Given: MSW = 6.03, and 𝑥̿ = 58.0, 𝛼 = 0.05.
Construct the ANOVA table and run a complete ANOVA hypothesis.
Solution:
Note preliminary
Checks, 2pts
We assume independent and random samples.
We also assume normality.
𝑘
Find FCritical = Fk,n’-1: k = 5 groups, 𝑛′ = 1 1 1 1
1
+ + + +
𝑛1 𝑛2 𝑛3 𝑛4 𝑛5
Check for equal
variances:
Rounding up n’ = 7, df = n – 1 = 6, then at 𝛼 = 0.05, FCritical = F5,6 = 4.39
2
𝑠𝑑
29pts
=1
5
1 1 1 1
+ + + +
8 5 4 7 10
= 6.1135
5pts
8.4 2
FSample = ( 𝑠𝑑𝑚𝑎𝑥 ) = (4.3) = 3.816
𝑚𝑖𝑛
Do Not Reject Ho
At the 5% LOS we assume equal variances because:
Reject Ho
4.39
p >  (0.05 < p < 0.10) > 0.05
FSample < F5,6 Critical (3.816 < 4.39)
Note: In Hartley’s test we ALWAYS use 𝜶 = 𝟎. 𝟎𝟓 even if 𝜶 equals another value in the ANOVA test.
Declare parameter: 2pts
 i  mean number of births on the “ith” day of the week.
State Hypothesis:
2pts
Ho: Mon = Tue =Wed =Thu =Fri
Ha: At least one weekday has a mean number of births that is not equal to the others.
Find the correct Df (3pts):
Df Between: $df_B = k - 1 = 4$ and Df Within: $df_W = \sum (n_i - 1) = 29$
Calculate the SSB (3pts):
$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$
$SSB = 8(53.5-58)^2 + 5(60.1-58)^2 + 4(57.3-58)^2 + 7(59-58)^2 + 10(60-58)^2$
SSB = 233.01
Calculate the SSW (2pts):
Since $MSW = \dfrac{SSW}{df_W}$, we have $SSW = MSW \times df_W$.
Given: MSW = 6.03, dfW = 29
SSW = 6.03 × 29
SSW = 174.87
$MSB = \dfrac{SSB}{df_B} = \dfrac{233.01}{4} = 58.25$ (2pts)
$F_{Stat} = \dfrac{MSB}{MSW} = \dfrac{58.25}{6.03} = 9.66$ (2pts)
Source            df    SS       MS      Fstat   p-value
Between groups    4     233.01   58.25   9.66    p < 0.0001
Within groups     29    174.87   6.03
Total             33    407.88
State conclusion with Statistics and English.
Stat: "At the 5% LOS we reject H0 because
FSample > F4,29 Critical (9.66 > 2.70), which yields p < α: (p < 0.0001) < 0.05" (4pts)
English: The mean number of births is significantly different for at least one weekday. (2pts)
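As a check on the ANOVA table for problem 7, the sums of squares and the F statistic can be rebuilt from the summary statistics. The sketch below uses the given grand mean (58.0) and the given MSW (6.03) exactly as the solution does; the variable names are illustrative, and the p-value is not computed.

```python
# Group means and sample sizes from the problem 7 table.
means = [53.5, 60.1, 57.3, 59.0, 60.0]
ns = [8, 5, 4, 7, 10]
grand_mean = 58.0  # given in the problem
msw = 6.03         # given in the problem

k = len(ns)
df_between = k - 1
df_within = sum(n - 1 for n in ns)

# SSB = sum of n_i * (group mean - grand mean)^2.
ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))

# SSW is recovered from the given MSW, since MSW = SSW / df_within.
ssw = msw * df_within

msb = ssb / df_between
f_stat = msb / msw

print(f"df: between = {df_between}, within = {df_within}")
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, SST = {ssb + ssw:.2f}")
print(f"MSB = {msb:.2f}, F = {f_stat:.2f}")
```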
8.) A researcher has determined with an ANOVA hypothesis that smoking has an impact on resting
heart rate. Given that the ANOVA test yielded an MSW = 43.9, use the Newman-Keuls method to
compare all pairs of means at α = .05. (20pts)

                Cigarettes per day   Mean   n
Nonsmoker       0                    58     14
Light           less than 5          70     9
Heavy smoker    more than 5          78     5

8.) Solution
Order the Means (2pts):
$\bar{x}_{Non} = 58$, $\bar{x}_{Light} = 70$, $\bar{x}_{Heavy} = 78$
Determine Dfw (2pts):
$df_W = \sum (n_i - 1) = 25$
NOTE! Common error! Do not conclude that df = 2.
For unbalanced data sets we calculate the harmonic mean (3pts):
$n' = \dfrac{k}{\frac{1}{n_1}+\frac{1}{n_2}+\frac{1}{n_3}} = \dfrac{3}{\frac{1}{14}+\frac{1}{9}+\frac{1}{5}} = 7.84$
Build the SNK table for critical values at α = 0.05 with df(within) = 25.

Note the steps apart                                      2      3      (2pts)
SNK table value at Df = 25 (but we must use df = 24)      2.92   3.53   (2pts)
(SNK table value) × $\sqrt{MS(within)/n'}$                6.91   8.35   (3pts)
Table Work (3pts):

Differences of mean pairs                        Absolute differences   Diff. vs. corresponding NK critical values   Conclusion
$\bar{x}_{Heavy} - \bar{x}_{Non} = 78 - 58$      Heavy - Non = 20       At 3 steps: 20 > 8.35                        $\mu_{Heavy} \neq \mu_{Non}$
$\bar{x}_{Heavy} - \bar{x}_{Light} = 78 - 70$    Heavy - Light = 8      At 2 steps: 8 > 6.91                         $\mu_{Heavy} \neq \mu_{Light}$
$\bar{x}_{Light} - \bar{x}_{Non} = 70 - 58$      Light - Non = 12       At 2 steps: 12 > 6.91                        $\mu_{Light} \neq \mu_{Non}$

Line notation summary (3pts):
$\mu_{Non}$   $\mu_{Light}$   $\mu_{Heavy}$   (no two means are joined by a common line, since every pair differs significantly)
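The Newman-Keuls arithmetic for problem 8 can be sketched in a few lines. The studentized-range values 2.92 and 3.53 are copied from the solution's table lookup at df = 24, not computed. A full Newman-Keuls procedure works through the ranges sequentially; this sketch simply checks every pair, which gives the same conclusions here because every difference exceeds its critical value.

```python
from itertools import combinations
from math import sqrt

# (mean, n) for each group, from the problem 8 table.
groups = {"Non": (58, 14), "Light": (70, 9), "Heavy": (78, 5)}
msw = 43.9  # given in the problem

# Assumption: SNK table values at alpha = 0.05, df = 24, copied from the solution.
q_table = {2: 2.92, 3: 3.53}

k = len(groups)
n_harmonic = k / sum(1 / n for _, n in groups.values())

# Critical difference for means that are `steps` apart in the ordered list.
scale = sqrt(msw / n_harmonic)
critical = {steps: q * scale for steps, q in q_table.items()}

# Order the groups by mean, then compare each pair with its critical value.
ordered = sorted(groups, key=lambda g: groups[g][0])
results = {}
for i, j in combinations(range(k), 2):
    low, high = ordered[i], ordered[j]
    steps = j - i + 1
    diff = groups[high][0] - groups[low][0]
    results[(high, low)] = (diff, steps, diff > critical[steps])

for (high, low), (diff, steps, significant) in results.items():
    print(f"{high} - {low}: {diff} at {steps} steps, significant = {significant}")
```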
9.) A plant physiologist investigated the effect of flooding on root metabolism in two tree
species: flood-tolerant river birch and the intolerant European birch. Four seedlings of each
species were flooded for one day and four were used as controls. The concentration of adenosine
triphosphate (ATP) in the roots of each plant was measured. The data (nmol ATP per mg tissue)
are shown in the table. For these data: SS(species of birch) = 2.19781, SS(flooding) = 2.25751,
SS(interaction) = 0.097656, and SS(within) = 0.47438. (20pts)

              River Birch           European Birch
              Flooded   Control     Flooded   Control
              1.45      1.70        0.21      1.34
              1.19      2.04        0.58      0.99
              1.05      1.49        0.11      1.17
              1.07      1.91        0.27      1.30
$\bar{x}$     1.19      1.785       0.2925    1.20

9a.) Draw an interaction graph.
Solution (3pts; -1pt for missing labels or units):
[Figure: Interaction Graph of Birch Species vs Flooding Level. Vertical axis: nmol ATP per mg tissue (ticks from 0.5 to 2.0); horizontal axis: Flooded, Control; one line each for River Birch and European Birch.]
9b.) Construct the ANOVA table. Solution:
$df_{Species}$ = species - 1 = 1
$df_{flooding\ levels}$ = flooding levels - 1 = 1
$df_{Interaction} = df_{Species} \times df_{flooding\ levels} = 1$
$df_{Within} = n - k = 16 - 4 = 12$
$MS(Species) = \dfrac{SS_{Species}}{df_{Species}} = \dfrac{2.19781}{1} = 2.19781$
$MS(Floods) = \dfrac{SS_{Floods}}{df_{Floods}} = \dfrac{2.25751}{1} = 2.25751$
$MS(Interaction) = \dfrac{SS_{Interaction}}{df_{Interaction}} = \dfrac{0.097656}{1} = 0.097656$
$MS(Within) = \dfrac{SS_{Within}}{df_{Within}} = \dfrac{0.47438}{12} = 0.039532$
$F_{Species} = \dfrac{MS_{Species}}{MSW} = \dfrac{2.19781}{0.039532} = 55.60$
$F_{Floods} = \dfrac{MS_{Floods}}{MSW} = \dfrac{2.25751}{0.039532} = 57.11$
$F_{Interaction} = \dfrac{MS_{Interaction}}{MSW} = \dfrac{0.097656}{0.039532} = 2.47$
Source                  df   SS         MS         F       p-value
Bet'n species           1    2.19781    2.19781    55.60   p < 0.0001
Bet'n flooding levels   1    2.25751    2.25751    57.11   p < 0.0001
Interaction             1    0.097656   0.097656   2.47    0.1 < p < 0.2
Within groups           12   0.47438    0.039532
Total                   15   5.027356
(5pts)
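The sums of squares given in problem 9 can be verified directly from the raw data. The sketch below uses the standard balanced two-way ANOVA formulas (equal cell sizes); the variable names are illustrative, and p-values are not computed.

```python
from statistics import mean

# ATP measurements (nmol ATP per mg tissue) from the problem 9 table.
data = {
    ("River", "Flooded"): [1.45, 1.19, 1.05, 1.07],
    ("River", "Control"): [1.70, 2.04, 1.49, 1.91],
    ("European", "Flooded"): [0.21, 0.58, 0.11, 0.27],
    ("European", "Control"): [1.34, 0.99, 1.17, 1.30],
}
species = ["River", "European"]
levels = ["Flooded", "Control"]
n_cell = 4  # seedlings per cell (balanced design)

grand = mean(v for cell in data.values() for v in cell)
cell_mean = {key: mean(vals) for key, vals in data.items()}
sp_mean = {s: mean(cell_mean[(s, f)] for f in levels) for s in species}
fl_mean = {f: mean(cell_mean[(s, f)] for s in species) for f in levels}

# Main-effect, interaction, and within-cell sums of squares.
ss_species = n_cell * len(levels) * sum((sp_mean[s] - grand) ** 2 for s in species)
ss_flood = n_cell * len(species) * sum((fl_mean[f] - grand) ** 2 for f in levels)
ss_inter = n_cell * sum(
    (cell_mean[(s, f)] - sp_mean[s] - fl_mean[f] + grand) ** 2
    for s in species
    for f in levels
)
ss_within = sum((v - cell_mean[key]) ** 2 for key, vals in data.items() for v in vals)

df_within = n_cell * len(data) - len(data)  # n - k = 16 - 4
ms_within = ss_within / df_within

# Each effect has 1 degree of freedom, so MS equals SS for the effects.
f_species = ss_species / ms_within
f_flood = ss_flood / ms_within
f_inter = ss_inter / ms_within

print(f"SS: species={ss_species:.5f}, flooding={ss_flood:.5f}, "
      f"interaction={ss_inter:.6f}, within={ss_within:.5f}")
print(f"F: species={f_species:.2f}, flooding={f_flood:.2f}, interaction={f_inter:.2f}")
```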
9c.) Based on the ANOVA table, write a statistical and English conclusion answering
whether the factors of species & flooding interact. Report the respective F-stat and p-value at α = .05.
Solution:
Stat: At the 5% LOS, we do not reject H0 because
FSample < F1,12 Critical (2.47 < 4.75), which yields p > α: (0.1 < p < 0.2) > 0.05 (2pts)
English: "There is no significant interaction between species and flooding." (2pts)
9d.) Based on the ANOVA table, write a statistical and English conclusion that tests the null
hypothesis that species has no effect on ATP concentration. Use α = .01.
Solution:
Stat: At the 1% LOS, we reject H0 because
FSample > F1,12 Critical (55.60 > 9.33), which yields p < α: (p < 0.0001) < 0.01 (2pts)
English: "There is a significant species effect upon the ATP concentration of the tree." (2pts)
9e.) Assuming that each of the four populations
has the same standard deviation, use the data to
calculate an estimate of that standard deviation.
Solution:
$s_{pooled} = \sqrt{MS(within)} = \sqrt{0.039532}$
$s_{pooled}$ = 0.1988 (2pts)
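One way to see why the square root of MS(within) estimates the common standard deviation is to pool the four cell variances directly. The sketch below does this with the raw data from the problem 9 table; the variable names are illustrative.

```python
from math import sqrt
from statistics import variance

# The four cells of the design (nmol ATP per mg tissue).
cells = [
    [1.45, 1.19, 1.05, 1.07],  # River Birch, flooded
    [1.70, 2.04, 1.49, 1.91],  # River Birch, control
    [0.21, 0.58, 0.11, 0.27],  # European Birch, flooded
    [1.34, 0.99, 1.17, 1.30],  # European Birch, control
]

# Pooled variance: each sample variance weighted by its degrees of freedom.
pooled_df = sum(len(c) - 1 for c in cells)
pooled_var = sum((len(c) - 1) * variance(c) for c in cells) / pooled_df
s_pooled = sqrt(pooled_var)

print(f"pooled df = {pooled_df}")
print(f"pooled variance = {pooled_var:.6f}")
print(f"s_pooled = {s_pooled:.4f}")
```

The pooled variance matches MS(within) from the ANOVA table, so both routes lead to the same estimate.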
9f.) Suppose a two-sample t-test was applied to the alternative hypothesis that the mean ATP level was
significantly different in the two species of birch. What would be the resulting test statistic?
Solution:
The link between the sample values of the ANOVA and the 2-sample t-test is:
$\sqrt{F_{Sample}} = t_{Sample}$. Then $\sqrt{55.6} = 7.46$, so $t_{Sample} = 7.46$.
10.)
Suppose that ATP in nmol per mg of tissue for the River Birch, when correlated with
annual inches of rain received over several regions, resulted in the linear regression line:
ATP = 3.24 - 1.25 × rain.
Interpret the slope of this regression line with an English sentence using
the units of measure.
Answer: “For each additional inch of annual rainfall we should see a decrease of 1.25
nmol per mg of ATP in the tissue of the River Birch.”
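The slope interpretation can be illustrated numerically: a one-inch increase in rainfall changes the predicted ATP by exactly the slope. In this sketch the function name and the rainfall values 1.0 and 2.0 are hypothetical, chosen only for illustration.

```python
def predicted_atp(rain_inches: float) -> float:
    """Predicted ATP (nmol per mg tissue) from the regression line in problem 10."""
    return 3.24 - 1.25 * rain_inches


# Hypothetical rainfall values, one inch apart.
change = predicted_atp(2.0) - predicted_atp(1.0)
print(f"change in predicted ATP per additional inch of rain: {change:.2f}")
```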
A note to students struggling to pass: Warning! 1 out of 3 students have made this error in past exams:
You must know by now that when you run a hypothesis test, it is the null hypothesis that takes
the case of equality (even if this equality statement for H0 implies the opposite of HA). The
prompt to run the hypothesis test (the suspicion) is translated into the alternate hypothesis, not the null.