Applications of Biostatistics in Clinical Data

1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period. The incubation periods of a random sample of 7 HIV-infected individuals are given below (in years):

12.0 9.5 13.5 7.2 10.5 6.3 12.5

a) Calculate the sample mean.
Mean = (12.0 + 9.5 + 13.5 + 7.2 + 10.5 + 6.3 + 12.5) / 7 = 71.5 / 7 = 10.21 years

b) Calculate the sample median.
Sort: 6.3, 7.2, 9.5, 10.5, 12.0, 12.5, 13.5
Position of the median = (n + 1) / 2 = 8 / 2 = 4th value
Median = 10.5 years

c) Calculate the sample standard deviation.
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$, with $\bar{X} = 10.21$
$SD = \sqrt{\frac{(12.0-10.21)^2 + (9.5-10.21)^2 + (13.5-10.21)^2 + (7.2-10.21)^2 + (10.5-10.21)^2 + (6.3-10.21)^2 + (12.5-10.21)^2}{7-1}}$
$SD = \sqrt{44.21 / 6}$ = 2.71 years

d) If the number 6.3 above were changed to 1.5, what would happen to the sample mean, median and standard deviation? State whether each would increase, decrease, or remain the same.
Mean = (12.0 + 9.5 + 13.5 + 7.2 + 10.5 + 1.5 + 12.5) / 7 = 66.7 / 7 = 9.53
Sort: 1.5, 7.2, 9.5, 10.5, 12.0, 12.5, 13.5
Position of the median = (n + 1) / 2 = 8 / 2 = 4th value
Median = 10.5
The standard deviation must be computed around the new mean, $\bar{X} = 9.53$:
$SD = \sqrt{\frac{(12.0-9.53)^2 + (9.5-9.53)^2 + (13.5-9.53)^2 + (7.2-9.53)^2 + (10.5-9.53)^2 + (1.5-9.53)^2 + (12.5-9.53)^2}{7-1}}$
$SD = \sqrt{101.53 / 6}$ = 4.11
By changing 6.3 to 1.5, the mean decreases (10.21 to 9.53), the median remains the same (10.5) and the SD increases (2.71 to 4.11).

e) Suppose instead of 7 individuals, we had 14 individuals (we added 7 more observations):

12.0 9.5 13.5 7.2 8.1 10.5 6.3 12.5 14.9 7.9 5.2 13.1 10.7 6.5

Make a guess of whether the sample mean and sample standard deviation for the 14 observations would increase, decrease or remain the same compared to the answer in part (d).
Mean = (5.2 + 6.3 + 6.5 + 7.2 + 7.9 + 8.1 + 9.5 + 10.5 + 10.7 + 12.0 + 12.5 + 13.1 + 13.5 + 14.9) / 14 = 137.9 / 14 = 9.85
Sort: 5.2, 6.3, 6.5, 7.2, 7.9, 8.1, 9.5, 10.5, 10.7, 12.0, 12.5, 13.1, 13.5, 14.9
Position of the median = (n + 1) / 2 = 15 / 2 = 7.5th value (i.e. between the 7th and 8th values)
Median = (9.5 + 10.5) / 2 = 10.0
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{121.64 / 13}$ = 3.06
With 14 observations, compared with part (d), the mean increases (9.53 to 9.85), the median decreases (10.5 to 10.0) and the SD decreases (4.11 to 3.06).

2. A study is conducted concerning the blood pressure of 60-year-old women with glaucoma. In the study, 200 women are randomly selected; the sample mean systolic blood pressure is 140 mm Hg and the sample standard deviation is 25 mm Hg.

a. Calculate a 95% confidence interval for the true mean systolic blood pressure among the population of 60-year-old women with glaucoma.
Answer:
Confidence level = 95%, n = 200
Standard error: $\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{25}{\sqrt{200}} = 1.768$
α = 100% - 95% = 5% = 0.05, so 0.025 in each tail (0.05 ÷ 2 = 0.025)
P(Z) = 1 - 0.025 = 0.975; from the standard normal distribution (SND) table, $Z_{\alpha/2} = 1.96$
$\bar{X} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}$ = 140 ± 1.96 × 1.768 = 140 ± 3.465
Therefore, the 95% CI for the true mean systolic blood pressure is 136.535 to 143.465 mm Hg.

b. Suppose the study above was based on 100 women instead of 200 but the sample mean (140) and standard deviation (25) are the same. Recalculate the 95% confidence interval.
Answer:
Confidence level = 95%, n = 100
Standard error: $\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{25}{\sqrt{100}} = 2.5$
α = 0.05, so 0.025 in each tail and $Z_{\alpha/2} = 1.96$ as before
$\bar{X} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}$ = 140 ± 1.96 × 2.5 = 140 ± 4.9
Therefore, the 95% CI for the true mean systolic blood pressure is 135.1 to 144.9 mm Hg.

c. Does the interval get wider or narrower? Why?
Answer: The interval gets wider. With a smaller sample the standard error $s/\sqrt{n}$ is larger (2.5 versus 1.768), so the estimate is less precise and the margin of error grows from 3.465 to 4.9.

3. A random sample of 300 diastolic blood pressure measurements is taken. Suppose a 99% confidence interval for the population mean diastolic blood pressure is 68 to 73 mm Hg.
If a 95% confidence interval is calculated, then
a) The 95% confidence interval will be wider than the 99% interval.
b) The 95% confidence interval will be narrower than the 99% interval.
c) The 95% and 99% confidence intervals will be the same.
Answer: (b). The higher the confidence level, the wider the interval; therefore, the lower the confidence level, the narrower the interval.

4. In a health care utilization journal, results are reported from a study performed on a random sample of 100 deliveries at a large teaching hospital. The sample mean birth weight is reported as 120 ounces, and the sample standard deviation is 25 ounces. What will be the confidence interval at the 95% confidence level for the population mean birth weight?
Answer:
Confidence level = 95%, n = 100
Standard error: $\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{25}{\sqrt{100}} = 2.5$
α = 100% - 95% = 5% = 0.05, so 0.025 in each tail (0.05 ÷ 2 = 0.025)
P(Z) = 1 - 0.025 = 0.975; from the SND table, $Z_{\alpha/2} = 1.96$
$\bar{X} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}$ = 120 ± 1.96 × 2.5 = 120 ± 4.9
Therefore, the 95% CI for the true mean birth weight is 115.1 to 124.9 ounces.

5. A study was undertaken to evaluate the effect of percutaneous transluminal coronary angioplasty (PTCA) in patients with one-vessel coronary artery disease. A random sample of 107 patients with coronary artery disease were given PTCA. Patients were given exercise tests at baseline and after 6 months of follow-up. Exercise tests were performed up to maximal effort until symptoms (such as angina) were present. The "change" in duration of exercise was calculated. "Change" is defined as the 6-month test minus the baseline test. The mean change was 2.1 minutes and the standard deviation of the changes was 3.1 minutes.

a) What statistical test can be performed to see if there has been a statistically significant change in duration of exercise for this group of patients given PTCA?
Answer: A paired t-test (a one-sample t-test on the within-patient changes) will be used to assess whether there was a significant change in duration of exercise 6 months after PTCA treatment.

b) Compute a 95% confidence interval for the mean change in exercise duration.
Answer:
Confidence level = 95%, n = 107
Standard error: $\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{3.1}{\sqrt{107}} = 0.3$
α = 100% - 95% = 5% = 0.05
Using the two-tailed t-table: $t_{(\alpha,\, n-1)} = t_{(0.05,\, 106)} = 1.98$
$\bar{X} \pm t_{(\alpha,\, n-1)} \frac{s}{\sqrt{n}}$ = 2.1 ± 1.98 × 0.3 = 2.1 ± 0.594
Therefore, the 95% CI for the true mean change in exercise duration is 1.506 to 2.694 minutes.

c) Can we conclude from this study that PTCA is effective in increasing exercise duration? Are there any limitations or weaknesses in this study for answering that question?
Step 1: n = 107, mean change = 2.1 minutes, s = 3.1 minutes
Step 2: State H0 and H1.
H0: PTCA was not effective in increasing exercise duration (mean change = 0).
H1: PTCA was effective in increasing exercise duration (mean change > 0).
Step 3: Is it z or t? One tail or two tail? α?
t-test, α = 5%, one-tailed
Step 4: Calculate t
$t_{n-1} = \frac{|\bar{X} - \mu|}{s / \sqrt{n}} = \frac{|2.1 - 0|}{3.1 / \sqrt{107}} = 7.01$
Step 5: Find t-critical from the t-table
df = 107 - 1 = 106
t-critical (106, 0.05) one-tail = 1.66
If calculated t > t-critical, reject the null hypothesis.
7.01 > 1.66, i.e. calculated t > t-critical. Therefore, reject the null hypothesis.
Answer: Exercise duration increased significantly after PTCA. A limitation is that the study has no control group, so the improvement cannot be attributed to PTCA alone.

6. To test the dosage of this pain-relief medication, 64 tablets were measured and the average paracetamol content was 491 mg. The population standard deviation is known to be 55 mg.

a) What is the confidence level of the claimed dose of 500 mg?
µ = 500, $\bar{X}$ = 491, σ = 55, n = 64
$\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
500 = 491 + $Z_{\alpha/2}$ (55 / 8)
$Z_{\alpha/2}$ = (9 × 8) / 55 = 1.309 ≈ 1.31
From the SND table, P(Z ≤ 1.31) = 0.9049
1 - 0.9049 = 0.0951 in each tail, so 0.0951 + 0.0951 = 0.1902 in both tails
1 - 0.1902 = 0.8098
0.8098 × 100 = 80.98%
Answer: The confidence level is 80.98%.

b) What percentage of tablets contains at least 500 mg of paracetamol?
Z = (X - Mean) / σ = (500 - 491) / 55 = 0.1636 ≈ 0.16
From the SND table, P(Z ≤ 0.16) = 0.5636, which is P(X ≤ 500).
The question asks for at least 500 mg, so P(X ≥ 500) = 1 - 0.5636 = 0.4364
0.4364 × 100 = 43.64%
Answer: About 43.64% of tablets contain at least 500 mg (treating the sample mean of 491 mg as the population mean and assuming normally distributed content).

7. An investigator thinks that people under the age of forty have a higher risk of diabetes than people over sixty years of age. The investigator administers a dose of insulin to a group of 31 younger subjects and to a group of 31 older subjects. A higher dose reflects better performance. The mean dose for younger subjects was 14.0 and the standard deviation of the younger subjects' scores was 5.0. The mean dose for older subjects was 20.0 and the standard deviation of the older subjects' scores was 6.0. Does this experiment provide evidence for the investigator's theory?

a) Is this data paired or independent?
The data are independent because they were obtained from two different groups of subjects.

b) Was this experiment useful? Work at the 95% confidence level.
Step 1:
n1 = 31, n2 = 31
X1bar (mean) = 14, X2bar (mean) = 20
SD1 = 5, SD2 = 6
Variance 1 = (SD1)^2 = 25 = $s_1^2$
Variance 2 = (SD2)^2 = 36 = $s_2^2$
Step 2: State H0 and H1.
H0: People under the age of 40 have the same risk of diabetes as people over 60 years of age,
i.e. Risk of diabetes (people under 40) = Risk of diabetes (people over 60)
H1: People under the age of 40 have a higher risk of diabetes than people over 60 years of age,
i.e. Risk of diabetes (people under 40) > Risk of diabetes (people over 60)
Step 3: Is it z or t? One tail or two tail? α?
t-test, α = 5%, one-tailed
n1 (young) = 31, n2 (old) = 31
df = n1 + n2 - 2 = 31 + 31 - 2 = 60
Step 4: Calculate t
$t_{n_1+n_2-2} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2_{pooled} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
$s^2_{pooled} = \frac{(n_1-1) s_1^2 + (n_2-1) s_2^2}{n_1+n_2-2} = \frac{(31-1)(25) + (31-1)(36)}{31+31-2} = \frac{750 + 1080}{60} = 30.5$
$t_{60} = \frac{14 - 20}{\sqrt{30.5 \left( \frac{1}{31} + \frac{1}{31} \right)}} = \frac{-6}{\sqrt{30.5 \times 2/31}} = \frac{-6}{1.403} = -4.28$
Step 5: Find t-critical from the t-table
t-critical (60, 0.05) one-tail = 1.671
Calculated t = -4.28, t-critical = 1.671
If calculated t > t-critical, reject the null hypothesis.
But calculated t < t-critical, i.e. -4.28 < 1.671. Therefore, we fail to reject the null hypothesis.
Answer: The experiment does not provide evidence for the investigator's theory. The observed difference is in the opposite direction (the older group had the higher mean), so we cannot conclude that Risk of diabetes (people under 40) > Risk of diabetes (people over 60).

8. An investigator theorizes that people who participate in a regular program of exercise will have levels of systolic blood pressure that are significantly different from those of people who do not participate in a regular program of exercise. To test this idea the investigator randomly assigns 21 subjects to an exercise program for 10 weeks and 21 subjects to a non-exercise comparison group. After ten weeks the mean systolic blood pressure of subjects in the exercise group is 137 and the standard deviation of blood pressure values in the exercise group is 10. After ten weeks, the mean systolic blood pressure of subjects in the non-exercise group is 127 and the standard deviation in the non-exercise group is 9.0.

a) Is this data paired or independent?
The data are independent because they were obtained from two different groups of subjects.

b) Was this experiment useful? Work at the 95% confidence level.
Step 1:
n1 = 21, n2 = 21
X1bar (mean) = 137, X2bar (mean) = 127
SD1 = 10, SD2 = 9
Variance 1 = (SD1)^2 = 100 = $s_1^2$
Variance 2 = (SD2)^2 = 81 = $s_2^2$
Step 2: State H0 and H1.
H0: People who participate in a regular program of exercise have the same level of systolic blood pressure as people who do not participate in a regular program of exercise,
i.e. Level of systolic blood pressure (participate in exercise) = Level of systolic blood pressure (do not participate in exercise)
H1: People who participate in a regular program of exercise have a different level of systolic blood pressure from people who do not participate in a regular program of exercise,
i.e. Level of systolic blood pressure (participate in exercise) ≠ Level of systolic blood pressure (do not participate in exercise)
Step 3: Is it z or t? One tail or two tail? α?
t-test, α = 5%, two-tailed (the investigator's theory is that the levels differ, without specifying a direction)
n1 (participate) = 21, n2 (do not participate) = 21
df = n1 + n2 - 2 = 21 + 21 - 2 = 40
Step 4: Calculate t
$t_{n_1+n_2-2} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2_{pooled} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
$s^2_{pooled} = \frac{(n_1-1) s_1^2 + (n_2-1) s_2^2}{n_1+n_2-2} = \frac{(21-1)(100) + (21-1)(81)}{21+21-2} = \frac{2000 + 1620}{40} = 90.5$
$t_{40} = \frac{137 - 127}{\sqrt{90.5 \left( \frac{1}{21} + \frac{1}{21} \right)}} = \frac{10}{\sqrt{90.5 \times 2/21}} = \frac{10}{2.936} = 3.41$
Step 5: Find t-critical from the t-table
t-critical (40, 0.05) two-tail = 2.021
Calculated t = 3.41, t-critical = 2.021
If |calculated t| > t-critical, reject the null hypothesis.
3.41 > 2.021, i.e. calculated t > t-critical. Therefore, reject the null hypothesis.
Answer: The level of systolic blood pressure differs significantly between the two groups; the exercise group had the higher mean (137 versus 127).
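
The pooled two-sample t calculations in Problems 7 and 8 can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function name `pooled_t_from_summary` is a hypothetical helper introduced here for illustration, and it assumes equal variances in the two groups, as the pooled formula does. Because it keeps 1/n1 + 1/n2 unrounded, its output can differ slightly from a hand calculation that rounds intermediate values.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics.

    Hypothetical helper for illustration; returns (t, df, pooled_variance).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, pooled_var


# Problem 7: younger (14, SD 5, n 31) versus older (20, SD 6, n 31)
t7, df7, var7 = pooled_t_from_summary(14, 5, 31, 20, 6, 31)
# Problem 8: exercise (137, SD 10, n 21) versus non-exercise (127, SD 9, n 21)
t8, df8, var8 = pooled_t_from_summary(137, 10, 21, 127, 9, 21)

print(f"Problem 7: pooled variance = {var7:.1f}, t({df7}) = {t7:.2f}")
print(f"Problem 8: pooled variance = {var8:.1f}, t({df8}) = {t8:.2f}")
```

The calculated t is then compared with the critical value from the t-table for the same degrees of freedom, exactly as in Step 5 of each problem.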