Applications of Biostatistics in Clinical Data
1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the
incubation period. The incubation periods of a random sample of 7 HIV-infected individuals are
given below (in years):
12.0
9.5
13.5
7.2
10.5
6.3
12.5
a) Calculate the sample mean.
Mean = (12.0 + 9.5 + 13.5 + 7.2 + 10.5 + 6.3 + 12.5) / 7 = 10.21
b) Calculate the sample median
Sort:
6.3, 7.2, 9.5, 10.5, 12.0, 12.5, 13.5
Median position = (n + 1) / 2 = 8/2 = 4, i.e. the 4th value
Median = 10.5
c) Calculate the sample standard deviation.
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}, \qquad \bar{X} = 10.21$$
$$SD = \sqrt{\frac{(12.0-10.21)^2 + (9.5-10.21)^2 + (13.5-10.21)^2 + (7.2-10.21)^2 + (10.5-10.21)^2 + (6.3-10.21)^2 + (12.5-10.21)^2}{7-1}}$$
SD = 2.71
d) If the number 6.3 above were changed to 1.5, what would happen to the sample mean,
median and standard deviation? State whether each would increase, decrease, or
remain the same.
Mean = (12.0 + 9.5 + 13.5 + 7.2 + 10.5 + 1.5 + 12.5) / 7 = 66.7 / 7 = 9.53
Sort:
1.5, 7.2, 9.5, 10.5, 12.0, 12.5, 13.5
Median position = (n + 1) / 2 = 8/2 = 4, i.e. the 4th value
Median = 10.5
The deviations must be taken from the new mean, Xbar = 9.53:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
$$SD = \sqrt{\frac{(12.0-9.53)^2 + (9.5-9.53)^2 + (13.5-9.53)^2 + (7.2-9.53)^2 + (10.5-9.53)^2 + (1.5-9.53)^2 + (12.5-9.53)^2}{7-1}}$$
SD = 4.11
By changing 6.3 to 1.5, the mean decreases (10.21 to 9.53), the median remains the same (10.5),
and the SD increases (2.71 to 4.11).
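The calculations in parts (a) to (d) can be checked with a short script. This is a minimal sketch using only Python's standard `statistics` module; the names `original`, `modified` and `summarize` are illustrative and not part of the exercise. Each dataset's deviations are taken from its own mean.

```python
import statistics

# Incubation periods (years) from the problem statement
original = [12.0, 9.5, 13.5, 7.2, 10.5, 6.3, 12.5]
# Part (d): the value 6.3 is replaced by 1.5
modified = [1.5 if x == 6.3 else x for x in original]


def summarize(data):
    """Return (mean, median, sample SD), each rounded to 2 decimals."""
    return (
        round(statistics.mean(data), 2),
        round(statistics.median(data), 2),
        round(statistics.stdev(data), 2),  # stdev divides by n - 1
    )


print(summarize(original))  # (10.21, 10.5, 2.71)
print(summarize(modified))  # (9.53, 10.5, 4.11)
```

The median is unaffected because 1.5 is still the smallest value, so the 4th ordered value does not change.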
e) Suppose instead of 7 individuals, we had 14 individuals. (we added 7 more
observations).
12.0
9.5
13.5
7.2
8.1
10.5
6.3
12.5
14.9
7.9
5.2
13.1
10.7
6.5
Make a guess of whether the sample mean and sample standard deviation for the 14
observations would increase, decrease or remain the same compared to answer in part
(d).
Mean = (5.2+6.3+6.5+7.2+7.9+8.1+9.5+10.5+10.7+12.0+12.5+13.1+13.5+14.9)/14 = 137.9/14 = 9.85
Sort:
5.2, 6.3, 6.5, 7.2, 7.9, 8.1, 9.5, 10.5, 10.7, 12.0, 12.5, 13.1, 13.5, 14.9
Median position = (n + 1) / 2 = 15/2 = 7.5, i.e. midway between the 7th and 8th values
Median = (9.5 + 10.5)/2 = 10.0
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{121.64}{14 - 1}} = 3.06$$
With 14 observations, the mean is 9.85, the median is 10.0 and the SD is 3.06. Compared with
part (d), the mean increases, the median decreases (10.5 to 10.0) and the SD decreases.
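To make the SD formula concrete, the sketch below applies it step by step to the 14 observations rather than calling a library routine. The variable names are illustrative only.

```python
import math

# The 14 incubation periods (years) listed in part (e)
obs = [12.0, 9.5, 13.5, 7.2, 8.1, 10.5, 6.3, 12.5, 14.9, 7.9, 5.2, 13.1, 10.7, 6.5]

n = len(obs)
mean = sum(obs) / n

ordered = sorted(obs)
# n is even, so the median is the average of the 7th and 8th ordered values
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Sum of squared deviations from the mean, divided by n - 1, then square root
sum_sq = sum((x - mean) ** 2 for x in obs)
sd = math.sqrt(sum_sq / (n - 1))

print(round(mean, 2), round(median, 2), round(sd, 2))  # 9.85 10.0 3.06
```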
2. A study is conducted concerning the blood pressure of 60 year old women with glaucoma. In
the study, 200 women are randomly selected and the sample mean systolic blood pressure
is 140 mm Hg and the sample standard deviation is 25 mm Hg.
a. Calculate a 95% confidence interval for the true mean systolic blood pressure among the
population of 60 year old women with glaucoma.
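Answer:
Confidence level = 95%
n = 200
Standard error of the mean (using the sample standard deviation, 25, as the estimate of σ):
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{200}} = 1.768$$
α = 100 – CI = 100 – 95
α = 5% = 0.05
Therefore, 0.025 on each side (0.05÷2 = 0.025)
P(Z) = 1 – 0.025 = 0.975
From the standard normal distribution (SND) table, the Z-value is 1.96
Zα/2 = 1.96
$$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 140 \pm 1.96 \times 1.768 = 140 \pm 3.465$$
Therefore, the limits are 140 – 3.465 and 140 + 3.465
Therefore, the 95% CI for the true mean systolic blood pressure is 136.535 to
143.465 mm Hg.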
b. Suppose the study above was based on 100 women instead of 200 but the sample mean
(140) and standard deviation (25) are the same. Recalculate the 95% confidence interval.
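Answer:
Confidence level = 95%
n = 100
Standard error of the mean (using the sample standard deviation, 25, as the estimate of σ):
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{100}} = 2.5$$
α = 100 – CI = 100 – 95
α = 5% = 0.05
Therefore, 0.025 on each side (0.05÷2 = 0.025)
P(Z) = 1 – 0.025 = 0.975
From the standard normal distribution (SND) table, the Z-value is 1.96
Zα/2 = 1.96
$$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 140 \pm 1.96 \times 2.5 = 140 \pm 4.9$$
Therefore, the limits are 140 – 4.9 and 140 + 4.9
Therefore, the 95% CI for the true mean systolic blood pressure is 135.1 to
144.9 mm Hg.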
c. Does the interval get wider or narrower? Why?
Answer:
The interval gets wider (± 4.9 instead of ± 3.465). The standard error σ/√n increases as the
sample size n decreases, so the mean is estimated less precisely and the interval widens.
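The effect of sample size on the interval width can be reproduced with a small helper. This is a sketch, assuming a large-sample z interval as in the worked answers; `z_confidence_interval` is a hypothetical name, and `NormalDist` comes from Python's standard library.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, level=0.95):
    """Large-sample CI for a mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


for n in (200, 100):
    low, high = z_confidence_interval(140, 25, n)
    print(n, round(low, 2), round(high, 2))
# 200 136.54 143.46
# 100 135.1 144.9
```

Halving the sample size multiplies the margin of error by √2, which is why the interval for n = 100 is about 1.4 times as wide.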
3. A random sample of 300 diastolic blood pressure measurements is taken. Suppose a 99%
confidence interval for the population mean diastolic blood pressure is 68 to 73 mm Hg. If a
95% confidence interval is calculated, then
a) The 95% confidence interval will be wider than 99%
b) The 95% confidence interval will be narrower than 99%
c) 95% and 99% confidence interval will be the same.
Answer: (b)
The higher the confidence level, the wider the interval. Therefore, the lower the confidence
level, the narrower the interval.
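As an illustration, the 95% interval can be recovered from the reported 99% interval. The sketch below assumes the 99% interval is a z-based interval centred on the sample mean (reasonable for n = 300, but an assumption, since the problem does not say how it was computed). The variable names are illustrative.

```python
from statistics import NormalDist

low99, high99 = 68.0, 73.0
center = (low99 + high99) / 2          # implied sample mean
z99 = NormalDist().inv_cdf(0.995)      # about 2.576
z95 = NormalDist().inv_cdf(0.975)      # about 1.960

# Standard error implied by the half-width of the 99% interval
se = (high99 - center) / z99
low95, high95 = center - z95 * se, center + z95 * se

print(round(low95, 1), round(high95, 1))  # 68.6 72.4
```

The 95% interval lies inside the 99% interval, consistent with answer (b).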
4. In a health care utilization journal, results are reported from a study performed on a
random sample of 100 deliveries at a large teaching hospital. The sample mean birth weight
is reported as 120 ounces, and the sample standard deviation is 25 ounces. What will be the
confidence interval at 95% confidence level for the population birth weight?
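Answer:
Confidence level = 95%
n = 100
Standard error of the mean (using the sample standard deviation, 25, as the estimate of σ):
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{100}} = 2.5$$
α = 100 – CI = 100 – 95
α = 5% = 0.05
Therefore, 0.025 on each side (0.05÷2 = 0.025)
P(Z) = 1 – 0.025 = 0.975
From the standard normal distribution (SND) table, the Z-value is 1.96
Zα/2 = 1.96
$$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 120 \pm 1.96 \times 2.5 = 120 \pm 4.9$$
Therefore, the limits are 120 – 4.9 and 120 + 4.9
Therefore, the 95% CI for the true mean birth weight is 115.1 to 124.9 ounces.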
5. A study was undertaken to evaluate the effect of percutaneous transluminal coronary
angioplasty (PTCA) in patients with one-vessel coronary artery disease. A random sample of
107 patients with coronary artery disease were given PTCA. Patients were given exercise
tests at baseline and after 6 months of follow-up. Exercise tests were performed up to
maximal effort until symptoms (such as angina) were present. The “change” in duration of
exercise was calculated. “Change” is defined as the 6 month test minus the baseline test.
The mean change was 2.1 minutes and the standard deviation of the changes was 3.1 minutes.
a) What statistical test can be performed to see if there has been a statistically significant
change in duration of exercise for this group of patients given PTCA?
Answer:
A paired t-test will be used to assess whether there was a significant change in duration of
exercise 6 months after PTCA treatment.
b) Compute a 95% confidence interval for the mean change in exercise duration.
Answer:
Confidence level = 95%
n = 107
Standard error of the mean change:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{3.1}{\sqrt{107}} = 0.3$$
α = 100 – CI = 100 – 95
α = 5% = 0.05
We will use the two-tailed t-table with df = n – 1 = 106
t (α,n-1) = t (0.05,106) = 1.98
$$\bar{X} \pm t_{(\alpha,\,n-1)} \frac{s}{\sqrt{n}} = 2.1 \pm 1.98 \times 0.3 = 2.1 \pm 0.594$$
Therefore, the limits are 2.1 – 0.594 and 2.1 + 0.594
Therefore, the 95% CI for the true mean change in exercise duration is 1.506 to
2.694 minutes.
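The interval for the mean change can be checked numerically. This sketch uses only the standard library, so the critical value 1.98 is entered by hand from the t-table, as in the text, rather than computed. The variable names are illustrative.

```python
import math

n, mean_change, sd_change = 107, 2.1, 3.1
# Two-tailed 5% critical value for 106 df, read from a t-table (as in the text)
t_crit = 1.98

se = sd_change / math.sqrt(n)   # standard error of the mean change
margin = t_crit * se

print(round(se, 3), round(mean_change - margin, 2), round(mean_change + margin, 2))
# 0.3 1.51 2.69
```

Because the whole interval lies above 0, it already suggests that the mean change is positive.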
c) Can we conclude from this study that PTCA is effective in increasing exercise duration?
Are there any limitations or weaknesses in this study for answering that question?
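Answer:
Step 1:
n = 107, Xbar (mean change) = 2.1, s = 3.1
Step 2: State H0 and H1.
H0: PTCA was not effective in increasing exercise duration (mean change μ = 0).
H1: PTCA was effective in increasing exercise duration (mean change μ > 0).
Step 3: Is it z or t? One tail or two tail? α?
t-test (paired data), α = 5%, one-tail test
Step 4: Calculate t
$$t_{(n-1)} = \frac{|\bar{X} - \mu|}{s / \sqrt{n}} = \frac{|2.1 - 0|}{3.1 / \sqrt{107}} = 7.0$$
Step 5: Find t-critical from the t-table
df = 107-1 = 106
t-critical (106,0.05)1-tail = 1.66
If Calc-t > t-critical, reject the null hypothesis
Here 7.0 > 1.66, i.e. Calc-t > t-critical.
Therefore, reject the null hypothesis.
Exercise duration increased significantly 6 months after PTCA.
Limitation: the study has no control group, so the increase cannot be attributed to PTCA
alone; for example, patients may simply perform better on a repeated exercise test.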
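The test statistic in Step 4 can be computed directly from the summary statistics. This is a minimal sketch; `one_sample_t` is a hypothetical helper name, and the critical value 1.66 is the table value quoted in Step 5.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic for H0: population mean equals mu0 (here, zero mean change)."""
    return (mean - mu0) / (sd / math.sqrt(n))


t_stat = one_sample_t(2.1, 3.1, 107)
t_crit_one_tail = 1.66  # one-tailed 5% critical value for 106 df, from the t-table

print(round(t_stat, 2), t_stat > t_crit_one_tail)  # 7.01 True
```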
6. To test the dosage of a pain-relief tablet, 64 tablets were measured and the average
paracetamol content was 491 mg. The population standard deviation is known to be 55 mg.
a) What is the confidence level of the claimed dose of 500 mg?
µ = 500
Xbar = 491
σ = 55
n = 64
$$\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
500 = 491 ± Zα/2 (55/8)
Zα/2 = (9 X 8) / 55
Zα/2 = 1.309 ≈ 1.31
Therefore, P(Z ≤ 1.31) = 0.9049
Therefore, 1 – 0.9049 = 0.0951
Therefore, 0.0951 on each side, 0.0951 + 0.0951 = 0.1902
1 – 0.1902 = 0.8098
0.8098 X 100 = 80.98%
Answer: Confidence level is 80.98% (about 81%)
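The same calculation can be sketched in code: find the z-value at which the margin of error just reaches the claimed dose (a difference of 500 − 491 = 9 mg), then convert it to a two-sided confidence level. The variable names are illustrative, and the normal CDF comes from the standard library instead of a table, so the result can differ from a table lookup in the last digits.

```python
import math
from statistics import NormalDist

claimed, sample_mean, sigma, n = 500, 491, 55, 64

# z such that sample_mean + z * sigma / sqrt(n) equals the claimed dose
z = (claimed - sample_mean) / (sigma / math.sqrt(n))

# Two-sided confidence level corresponding to that z
level = 2 * NormalDist().cdf(z) - 1

print(round(z, 3), round(level, 2))  # 1.309 0.81
```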
b) What percentage of tablets contains at least 500 mg of paracetamol?
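Z = (X – Mean) / σ
Z = (500 – 491) / 55
Z = 0.1636 ≈ 0.16
"At least 500 mg" is the upper tail: P(X ≥ 500) = P(Z ≥ 0.16) = 1 – P(Z ≤ 0.16)
From the SND table, the value corresponding to 0.16 is 0.5636.
1 – 0.5636 = 0.4364
0.4364 X 100 = 43.64%
Answer: About 43.6% of tablets contain at least 500 mg, assuming the paracetamol content is
normally distributed with mean 491 mg and standard deviation 55 mg.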
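A short sketch of the upper-tail calculation follows. It assumes the tablet content is normally distributed with the sample mean 491 mg as its centre and σ = 55 mg; the names `content` and `share_at_least_500` are illustrative. Because z is not rounded to two decimals here, the result can differ slightly from a table lookup.

```python
from statistics import NormalDist

# Assumption: content ~ Normal(mean = 491 mg, sd = 55 mg)
content = NormalDist(mu=491, sigma=55)

# "At least 500 mg" is the upper tail, so subtract the CDF from 1
share_at_least_500 = 1 - content.cdf(500)

print(round(100 * share_at_least_500, 1))  # 43.5
```

Since 500 mg lies above the mean, fewer than half of the tablets are expected to reach it.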
7. An investigator thinks that people under the age of forty have a higher risk of diabetes than
people over sixty years of age. The investigator administers a
dose of insulin to a group of 31 younger subjects and to a group of 31 older subjects. A higher
dose reflects better performance. The mean dose for younger subjects was 14.0 and the
standard deviation of the younger subjects' doses was 5.0. The mean dose for older subjects
was 20.0 and the standard deviation of the older subjects' doses was 6.0. Does this experiment
provide evidence for the investigator's theory?
a) Is this data paired or independent?
The data are independent because they were obtained from two separate groups of
subjects.
b) Was this experiment useful? Work at 95% confidence level.
Step 1:
n1 = 31, n2 = 31
X1bar (Mean) = 14
X2bar (Mean) = 20
SD1 = 5
SD2 = 6
Variance 1 = (SD1)² = 25 = s1²
Variance 2 = (SD2)² = 36 = s2²
Step 2:
State H0 and H1.
H0: People under the age of 40 have the same risk of diabetes as people over 60
years of age.
i.e. Risk of diabetes (People under 40) = Risk of diabetes (People over 60)
H1: People under the age of 40 have a higher risk of diabetes than people over 60
years of age.
i.e. Risk of diabetes (People under 40) > Risk of diabetes (People over 60)
Step 3: Is it z or t? one tail or two tail ? α ?
α = 5%, one-tail test
n1 (Young) = 31
n2 (Old) = 31
df = n1 + n2 -2
df = 31 + 31 – 2
df = 60
Step 4: Calculate t
$$t_{(n_1 + n_2 - 2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2_{pooled} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
$$s^2_{pooled} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(31-1) \times 25 + (31-1) \times 36}{31 + 31 - 2} = \frac{750 + 1080}{60} = 30.5$$
$$t_{(60)} = \frac{14 - 20}{\sqrt{30.5 \left( \frac{1}{31} + \frac{1}{31} \right)}} = \frac{-6}{\sqrt{30.5 \times 0.0645}} = \frac{-6}{1.403} = -4.28$$
Step 5: Find t-critical from the t-table
t-critical (n1+n2-2, 0.05)1-tail = t-critical (60, 0.05)1-tail = 1.671
Calc-t = -4.28
t-critical = 1.671
If Calc-t > t-critical, reject the null hypothesis
But Calc-t < t-critical, i.e. -4.28 < 1.671
Therefore, fail to reject the null hypothesis.
Answer: The experiment does not provide evidence for the investigator's theory. The
observed difference is in the opposite direction: the mean for the older subjects (20.0)
is higher than the mean for the younger subjects (14.0).
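The pooled two-sample t statistic can be computed from the summary statistics alone. The sketch below keeps full precision for 1/n1 + 1/n2 instead of rounding each term. `pooled_t` is a hypothetical helper name, and the critical value 1.671 is the table value from Step 5.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, t_stat, df


# Group 1 = younger subjects, group 2 = older subjects
pooled_var, t_stat, df = pooled_t(14.0, 5.0, 31, 20.0, 6.0, 31)
t_crit_one_tail = 1.671  # one-tailed 5% critical value for 60 df, from the t-table

print(pooled_var, round(t_stat, 2), df)  # 30.5 -4.28 60
print(t_stat > t_crit_one_tail)          # False: H0 is not rejected
```

The same function applies to any two independent groups for which means, standard deviations and sample sizes are reported.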
8. An investigator theorizes that people who participate in a regular program of exercise will
have levels of systolic blood pressure that are significantly different from that of people who
do not participate in a regular program of exercise. To test this idea the investigator
randomly assigns 21 subjects to an exercise program for 10 weeks and 21 subjects to a
non-exercise comparison group. After ten weeks the mean systolic blood pressure of subjects in
the exercise group is 137 and the standard deviation of blood pressure values in the exercise
group is 10. After ten weeks, the mean systolic blood pressure of subjects in the
non-exercise group is 127 and the standard deviation of subjects in the non-exercise group is
9.0.
a) Is this data paired or independent?
The data are independent because they were obtained from two separate groups of
subjects.
b) Was this experiment useful? Work at 95% confidence level.
Step 1:
n1 = 21, n2 = 21
X1bar (Mean) = 137
X2bar (Mean) = 127
SD1 = 10
SD2 = 9
Variance 1 = (SD1)² = 100 = s1²
Variance 2 = (SD2)² = 81 = s2²
Step 2:
State H0 and H1.
H0: People who participate in a regular program of exercise have the same level
of systolic blood pressure as people who do not participate in a regular
program of exercise.
i.e. Level of systolic blood pressure (Participate in exercise) = Level of systolic
blood pressure (Do not participate in exercise)
H1: People who participate in a regular program of exercise have a level of
systolic blood pressure different from that of people who do not participate in a
regular program of exercise.
i.e. Level of systolic blood pressure (Participate in exercise) ≠ Level of systolic
blood pressure (Do not participate in exercise)
Step 3: Is it z or t? One tail or two tail? α?
t-test, α = 5%, two-tail test (the investigator's theory is that the levels are different,
without stating a direction)
n1 (Participate) = 21
n2 (Do not participate) = 21
df = n1 + n2 - 2
df = 21 + 21 – 2
df = 40
Step 4: Calculate t
$$t_{(n_1 + n_2 - 2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2_{pooled} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
$$s^2_{pooled} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(21-1) \times 100 + (21-1) \times 81}{21 + 21 - 2} = \frac{2000 + 1620}{40} = 90.5$$
$$t_{(40)} = \frac{137 - 127}{\sqrt{90.5 \left( \frac{1}{21} + \frac{1}{21} \right)}} = \frac{10}{\sqrt{90.5 \times 0.0952}} = \frac{10}{2.936} = 3.41$$
Step 5: Find t-critical from the t-table
t-critical (n1+n2-2, 0.05)2-tail = t-critical (40, 0.05)2-tail = 2.021
Calc-t = 3.41
t-critical = 2.021
If |Calc-t| > t-critical, reject the null hypothesis
Here |Calc-t| > t-critical, i.e. 3.41 > 2.021
Therefore, reject the null hypothesis.
Answer: Level of systolic blood pressure (Participate in exercise) ≠ Level of
systolic blood pressure (Do not participate in exercise). In this sample the exercise
group had the higher mean (137 vs. 127).