Examples

Basic Statistics
Graphs
In a study of long-term limiting illness and other dimensions of self-reported health, 6212
men and women completed the SF36 questionnaire to assess their health. For each of 8
sub-scales of the SF36, mean scores were compared for those reporting long-term
limiting illness and those not. The following graph was presented.
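[Figure: line graph of mean SF36 score (vertical axis, 0 to 100) for each of the 8 SF36 sub-scales (Phys fun, Role phys, Soc fun, Role emo, Mental, Pain, Vitality, General), with one line for those not ill and one for those with long-term illness.]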
1. What kind of graph is this?
2. This graph plots mean values on the 8 SF36 sub-scales and joins up points for
those with and without long-term illness. What is wrong with this approach?
Suggest an alternative graph.
Summarizing Data I
In a study to compare immediate vs. delayed radiotherapy in patients with lung cancer,
data were obtained on activity level scores using the Rotterdam symptom checklist.
% with symptom score > 10

Month   Immediate treatment   Delayed treatment
0       45% (49/109)          43% (47/110)
1       47% (28/59)           48% (39/81)
2       43% (22/51)           51% (26/51)
4       49% (24/49)           57% (33/58)
6       48% (17/35)           64% (31/49)

What is the interpretation of the results shown in the above table?
Summarizing Data II:
The table below shows data from a study of semen analysis and donor success in a
programme of artificial insemination by donor (AID). From the data presented, do you
think it likely that semen count and % motility have Normal distributions?
Semen indices in most and least fertile donors

                        Successful donors      Unsuccessful donors
                        No.   Mean (SD)        No.   Mean (SD)
Semen count (10^6/ml)   18    146.4 (95.7)     19    124.8 (81.8)
% motility              17    60.7 (9.7)       19    58.5 (12.8)

Confidence Intervals:
In a study of bone density and falls in older women, 118 volunteers were randomized to
receive either calcium supplements plus a program of exercise classes or to calcium alone
for 2 years. Twelve subjects dropped out from the calcium group and 14 from the calcium
group taking exercise, leaving 92 subjects who completed the two-year project. The
percentage change in bone mineral content and bone mineral density in two years was
calculated for each individual. The authors reported that for the ultradistal forearm the
change in bone mineral content was –2.6 (95% CI –4.6 to –0.6) in the calcium only group
and 1.14 (95% CI –0.8 to 3.1) in the calcium group taking exercise.
1. What do the confidence intervals for the change in bone mineral content mean?
2. To what population do they refer?
3. Confidence intervals are presented for each group separately. Suggest a more
informative confidence interval.
Significance Tests:
Shortly after the grounding of the Braer oil tanker off the Shetland Isles, a study was
conducted to ascertain whether the respiratory tracts of children were being affected by
the crude oil vapour and droplet spray. Peak expiratory flow rate (PEFR) of 44 children
aged 5-12 years was measured twice: 3 days after the shipwreck and 9-12 days after
when the strong smell of oil had abated. Statistical analysis of the paired PEFRs (by
Student’s paired t-test) showed no significant difference between the two sets of values
(P=0.502).
1. What is meant by no significant difference?
2. What is meant by a paired t-test? What assumptions are involved and are they
likely to be justified here?
3. What can we conclude about the effect of spillage on the respiratory function of
children?
Correlation
In a randomized placebo-controlled trial of human recombinant growth hormone (rhGH)
in patients with chronic heart failure, 50 patients were allocated to treatment or placebo.
The figure below shows the relationship between changes in serum IGF-1 (a growth
factor) and changes in left-ventricular mass by treatment group. The authors reported
that, “..a significant relation was found between changes in serum IGF-1 concentrations
and left-ventricular mass for all patients (r = 0.55, p=0.0001) but this relation was less
evident when the two groups were analysed separately (rhGH: r=0.28, p=0.19; placebo:
r=0.36, p=0.08)”
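[Figure: scatter plot of change in left-ventricular mass (vertical axis, -40 to 120) against change in IGF-1 (ng/ml) (horizontal axis, -200 to 200), with separate symbols for the placebo and rhGH groups.]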
Suggest why the relation between changes in IGF-1 and changes in left-ventricular
mass as shown in the graph was, “..less evident when the two groups were analysed
separately”.
Solutions:
Graphs
1. This is a line graph. Such graphs are usually used to show change in a quantity
over time, which is not the case here.
2. The 8 SF36 sub-scales are discrete scales, not a continuous scale, and so to join
them up is meaningless. It suggests an ordering which is not present. A better
graph is one where the sub-scales are displayed separately, like the bar graph
below. This allows us to compare mean values for those with and without long-term
illness on each sub-scale separately.
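[Figure: bar graph of mean SF36 score (vertical axis, 0 to 100) for each of the 8 SF36 sub-scales (Phys fun, Role phys, Soc fun, Mental, Role emo, Pain, Vitality, General), with separate bars for those not ill and those with long-term illness.]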
Summarizing Data I
Looking at the simple percentages alone, it appears that the symptom scores for the
delayed treatment group are increasing over time, whereas those for the immediate
treatment group are static. However, it is clear that the attrition rate in the study is very
high. At six months, data on activity level are presented for less than half of the original
sample. The attrition rate is much higher in the immediate treatment group. The reasons
for dropping out of the study may be different for the two groups. Hence, a simple
comparison of the percentages cannot be made, and more information on the drop-outs
needs to be obtained before the results can be interpreted.
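The attrition can be read from the denominators in the table. The following is a minimal Python sketch using only the counts given in the exercise; the variable and function names are illustrative and not part of the original study.

```python
# Numbers assessed at each month, taken from the denominators in the table.
assessed = {
    "immediate": {0: 109, 1: 59, 2: 51, 4: 49, 6: 35},
    "delayed": {0: 110, 1: 81, 2: 51, 4: 58, 6: 49},
}


def retention(counts, month):
    """Fraction of the month-0 sample with a score reported at `month`."""
    return counts[month] / counts[0]


for group, counts in assessed.items():
    print(f"{group}: {retention(counts, 6):.0%} of baseline assessed at 6 months")

# Both groups combined at 6 months, relative to both groups at month 0.
overall = sum(c[6] for c in assessed.values()) / sum(c[0] for c in assessed.values())
print(f"overall: {overall:.0%} of baseline assessed at 6 months")
```

The sketch only quantifies how many patients contribute to each percentage; it says nothing about why the others are missing, which is the information the solution asks for.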
Summarizing Data II
For data that follow a Normal distribution, about 2.5% of the observations will lie below the
mean minus 2 standard deviations. Here this would imply negative values for semen
count, which are impossible. Hence we conclude that it is unlikely that semen count
follows a Normal distribution. With % motility, the standard deviation is much less than
half the mean and so the mean minus 2 standard deviations is a possible value. Hence the
data could follow a Normal distribution but without seeing all the observations we cannot
be sure.
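The check described above amounts to computing the mean minus 2 standard deviations for each row of the table. A minimal Python sketch, using the published means and SDs (the dictionary layout and function name are illustrative):

```python
# (mean, SD) pairs copied from the semen indices table.
summary = {
    ("semen count", "successful"): (146.4, 95.7),
    ("semen count", "unsuccessful"): (124.8, 81.8),
    ("% motility", "successful"): (60.7, 9.7),
    ("% motility", "unsuccessful"): (58.5, 12.8),
}


def lower_two_sd(mean, sd):
    """Value below which roughly 2.5% of a Normal distribution would lie."""
    return mean - 2 * sd


for (index, group), (mean, sd) in summary.items():
    limit = lower_two_sd(mean, sd)
    verdict = "impossible (negative)" if limit < 0 else "possible"
    print(f"{index}, {group}: mean - 2 SD = {limit:.1f} -> {verdict}")
```

A negative limit for a quantity that cannot be negative is evidence against Normality; a positive limit is merely compatible with it and does not prove it.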
Confidence Intervals
1. The two confidence intervals give us a range of values within which the mean
change is estimated to lie in the whole population of women who would volunteer
if they were to receive the treatment for that group. The confidence interval for
the calcium only group tells us that the mean percentage change is a reduction in
bone mineral content somewhere between 0.6 and 4.6 percentage points. For the
calcium plus exercise group the mean percentage change could be a reduction of
0.8 percentage points or an increase of 3.1 percentage points or any value between
these limits.
2. These women were volunteers and it is therefore difficult to extrapolate the
findings to a general population.
3. The confidence intervals provide estimates of the mean change in bone density
over 2 years. Since we are here interested in any effect of exercise on this, a
confidence interval for the difference between the two groups would be more
useful than the two separate intervals provided. Using the data given, the
difference in mean change is 3.7 with an approximate 95% CI 0.9 to 6.5. This CI
does not contain the value zero, hence we can infer that the calcium only group
had a significantly greater reduction in bone mineral content compared to the
calcium plus exercise group.
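One way to arrive at an approximate interval of this kind is to recover each group's standard error from the width of its reported 95% CI and combine them. The Python sketch below assumes a Normal approximation (multiplier 1.96) and independent groups; these are assumptions made here for illustration, and the function names are hypothetical rather than the authors' method.

```python
import math

Z95 = 1.96  # Normal multiplier for a 95% interval (an assumption of this sketch)


def se_from_ci(lower, upper, z=Z95):
    """Standard error implied by a symmetric CI of the form estimate +/- z * SE."""
    return (upper - lower) / (2 * z)


def difference_ci(est_a, ci_a, est_b, ci_b, z=Z95):
    """Estimate and approximate CI for (est_b - est_a), assuming independent groups."""
    se_diff = math.hypot(se_from_ci(*ci_a, z), se_from_ci(*ci_b, z))
    diff = est_b - est_a
    return diff, diff - z * se_diff, diff + z * se_diff


# Reported values: calcium only, then calcium plus exercise.
diff, low, high = difference_ci(-2.6, (-4.6, -0.6), 1.14, (-0.8, 3.1))
print(f"difference = {diff:.1f}, approximate 95% CI {low:.1f} to {high:.1f}")
```

With the reported figures this gives values that round to the 3.7 (0.9 to 6.5) quoted in the solution.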
Significance tests
1. No significant difference is the result of a significance test comparing PEFR in
children at two points in time after exposure. The null hypothesis is that there is
no change in PEFR in the population from which the sample of children was
drawn. The data were used to give a test statistic that has a known distribution if
the null hypothesis is true. We then find the probability of data as extreme as, or
more extreme than, those observed if the null hypothesis were true. If the probability is
small, then we have good evidence against the null hypothesis. Conventionally,
we use P=0.05 as the cut-off. Here the probability (P=0.502) is > 0.05, therefore we
conclude that the study failed to detect a difference. This does not necessarily
mean that no difference exists since we cannot prove that the null hypothesis is
true on the basis of a significance test.
2. The paired t-test is used to test the null hypothesis of no change. It is used here
because we have two measurements on each subject at two points in time and we
are interested in the differences within individuals. It requires the assumption
that the differences follow a Normal distribution. This is likely to be true because
PEFR follows a Normal distribution and the difference of two variables from
Normal distributions will also follow a Normal distribution.
3. There is no evidence for a reduction in lung function in the children between 3
and 12 days after the spillage. However, we should not conclude that the spillage
had no effect on lung function. There may be an effect that is too small to be
statistically significant in a sample of this size. Alternatively, lung function may
have already been reduced before the initial reading was taken and remained
reduced.
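The mechanics of the paired t-test can be sketched in a few lines of Python. The PEFR readings below are hypothetical values invented for illustration; they are not the study data, which the exercise does not give, and the function name is likewise illustrative.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired measurements.

    The test works on the within-subject differences only:
    t = mean(differences) / (SD(differences) / sqrt(n)).
    """
    if len(first) != len(second):
        raise ValueError("paired data need one value per subject at each time")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical PEFR readings (l/min) for six children; NOT the study data.
pefr_day_3 = [250, 300, 280, 310, 265, 290]
pefr_day_10 = [255, 295, 290, 305, 270, 288]

t_stat, df = paired_t_statistic(pefr_day_3, pefr_day_10)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The statistic would then be referred to a t distribution with n - 1 degrees of freedom to obtain the P value. Because only the differences enter the calculation, the Normality assumption applies to the differences, not to the raw readings.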
Correlation
In this study we have two separate groups, rhGH and placebo, in which the
distributions of IGF-1 hardly overlap. It appears that the rhGH treatment has
increased both IGF-1 and left-ventricular mass. Putting the two groups together in
this way increases the range of the data and so increases the size of the correlation. It
also produces a mixture of two populations, treated and untreated. The relationship
between IGF-1 and mass at the individual level is confused with that at the group level,
an effect similar to that produced by combining multiple observations from each
subject. It is, therefore, misleading to calculate a correlation coefficient for the
combined data.
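The effect of pooling two groups can be shown with a small synthetic example. The values below are constructed for illustration only and are not the trial data: within each group the correlation is exactly zero, yet the pooled correlation is large because the groups differ in both variables.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic values (NOT the trial data), chosen so that x and y are
# uncorrelated within each group.
placebo_x = [-2, -1, 0, 1, 2]
placebo_y = [1, -2, 2, -2, 1]
# The "treated" group is the same pattern shifted upwards in both variables.
treated_x = [x + 10 for x in placebo_x]
treated_y = [y + 10 for y in placebo_y]

r_placebo = pearson_r(placebo_x, placebo_y)
r_treated = pearson_r(treated_x, treated_y)
r_pooled = pearson_r(placebo_x + treated_x, placebo_y + treated_y)

print(f"placebo r = {r_placebo:.2f}, treated r = {r_treated:.2f}, pooled r = {r_pooled:.2f}")
```

The pooled coefficient here reflects only the difference between the group means, which is the group-level effect the solution warns against confusing with an individual-level relationship.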