Chapter 3-12. Standardization

advertisement
Chapter 3-12. Standardization
<< This chapter uses too many examples out of Rothman (2002), done that way to
quickly prepare a lecture while teaching out of the Rothman text. It needs to be
updated with more of my own examples. >>
In Chapter 3-11, we used pooling to combine stratum-specific estimates of effect measures (such
as risk ratio) into a single summary effect measure. The summary effect measure was basically a
weighted average of the stratum-specific estimates.
Another approach is standardization, which is a method of combining stratum-specific risks
(cases/N) or rates (cases/PT) into a single summary value by taking a weighted average of them.
It weights the stratum-specific rates using weights that come from a standard population, in
contrast to the pooling approach which weighted by how much information is contained in each
stratum.
Suppose we choose to use the U.S. population in the year 2000 as our standard. We would then
weight our age-specific rates with weights that reflect the age distribution of the U.S. population
in the year 2000. Our summary rate would then be the rate that we would expect if our
population had the same age distribution as the U.S. population in year 2000.
_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 3-12 (revised 16 May 2010)
p. 1
Example Sweden and Panama (Rothman, 2002, pp.1-2)
It would seem residents of Sweden, where the standard of living is generally high, should have
lower death rates than residents of Panama, where poverty and more limited health care take their
toll. However, a greater proportion of Swedish residents than Panamanian residents die each
year.
The reason for this unexpected result is confounding due to differing age distributions of the
populations of these two countries, with Panama having a younger population.
Population Pyramid
Panama: 2000
MALE
FEMALE
60+
30-59
0-29
150 100 50 0
0 50 100 150
Population (in thousands)
Sweden: 2000
MALE
FEMALE
60+
30-59
0-29
300 200 100 0
0 100 200 300
Population (in thousands)
In both countries, older people die at a greater rate than younger people.
However, because Sweden has a population that is on the average older than that of Panama, a
greater proportion of all Swedes die in a given year, despite the lower death rates within specific
age categories.
If we standardized the Panama rate and the Sweden rate to match a single population (such as the
U.S. population), our rate estimates would then be directly comparable (not confounded by age),
since the rates would be based on the same age distribution.
Two advantages to this approach over pooling are:
1) It might be interesting to know the standardized rate, itself, for each country rather than just
the rate ratio. For example, it might be interesting to see a graph of 10 different countries, all
standardized to the same age distribution.
Chapter 3-12 (revised 16 May 2010)
p. 2
2) Pooling requires approximatley equal stratum-specific rate ratios (homogeneous effect),
whereas standardization does not. For example, the relative effects might be very different in
newborns, young adult, and old adult age groups for these two countries so that the pooled
estimate may not be appropriate.
We will return to this example below and standardize the rates using Stata.
Simple Examples of Standardization
Using Rothman’s example (2002, p. 158) suppose we have rates of:
Males: 10/1000 person years
Females: 5/1000 persons years
We can standardize these sex-specific rates to any standard that we wish.
For example, we might simply choose to weight males and females equally. We would then
obtained a weighted average of the two rates that would equal their simple average
standardized rate = weightmales  Ratemales + weightfemales  Ratefemales
= 1  10/1000py + 1  5/1000py
= (10+5)/1000py
= 7.5/1000 person years
Suppose the rates reflected the disease experience of nurses, 95% of whom are female. In that
case, we might wish to use as our standard a weight of 5% for males and 95% for females:
standardized rate = 0.05  10/1000py + 0.95  5/1000py
= (0.5 + 4.75)/1000py
= 5.25/1000 person years
To compare rates for exposed and unexposed people, we standardize both to the same standard
and then compare them.
An advantage to standardization is that it uses a defined set of weights (which are independent of
the data). Thus, other investigators can standardize using the same weights and then directly
compare their stratified results to yours.
Chapter 3-12 (revised 16 May 2010)
p. 3
Example
Mortality rate for current and past clozapine users by age category
Age 10-54 years
Age 55-94 years
Current
Past
Current
Past
Deaths
196
111
167
157
Person-years
62,119
15,763
6,085
2,780
5
315.5
704.2
2744
5647
Rate ( 10 years)
Rate difference
-388.7
-2903
5
( 10 years)
Rate ratio
0.45
0.49
Using the Mantel-Haenszel pooling approach, the mortality rate difference is -720/100,000
person years and the mortality rate ratio is 0.47.
Let’s now standardize the rates for age over the two age strata. We will standardize to the age
distribution of current clozapine use in the study, since that is the age distribution of those who
use the drug.
current clozapine use
Age 10-54 years: 62,119 py ( 91.1%)
Age 55-94 years: 6,085 py ( 8.9%)
Total:
68,204 py (100.0%)
Standardizing,
current clozapine use:
standardized mortality rate =
=
past clozapine use:
standardized mortality rate =
=
0.911  315.5/100,000 py + 0.089  2744/100,000py
532.2/100,000py
0.911  704.2/100,000 py + 0.089  5647/100,000py
1144/100,000py
Combining into standardize effect measures;
standardized rate difference = (532.2 – 1144)/100,000py = -612/100,000py
slightly smaller than the pooled estimate
standardize rate ratio = 532.2/1144 = 0.47
identical to the pooled estimate to two decimal places.
The stratum-specific rate ratios were very similar, so any weighting, whether pooled or
standardized, would give a result close to this value.
Chapter 3-12 (revised 16 May 2010)
p. 4
Standardized Mortality Ratio (SMR)
When the standardized rate ratio is calculated using the exposed group as the standard, the result
is usually referred to as a standardized mortality ratio, or standardized morbidity ratio
(Rothman, 2002, p.161).
Thus, we computed an SMR in the preceding example.
Direct Standardization
Rothman and Greenland (1998, pp.45-46) give the following formula for direct standardization.
Let T1, T1, … Tk be the person-years in k strata (e.g., age-sex categories) in some selected
standard population. Thus, the T’s are called the standard distribution for which the standardize
rate is based.
Let I1, I1, … Ik be the stratum-specific incidence rates computed from your data. Then the
standardized rate is given by
k
I T  ...  I k Tk
standardized rate  I s  1 1

T1  ...  Tk
IT
i i
i 1
k
T
i
i 1
The numerator is the number of cases one would see in a population that had the person-time
distribution T1, T1, … Tk and the stratum-specific rates I1, I1, … Ik. The denominator is the total
person-time in such a population. Therefore, the standardized rate, Is , is the rate one would see
in a population with person-time distribution T1, T1, … Tk and stratum-specific rates I1, I1, … Ik.
The standardization process can be conducted with incidence proportions or prevalence
proportions, as well.
Let N1, N1, … Nk be the number of persons in k strata. Let R1, R1, … Rk be the stratum-specific
incidence proportions (or prevalence proportions). Then the standardized risk, or standardized
prevalence, is given by
k
R N  ...  Rk N k
standardized risk  I s  1 1

N1  ...  N k
R N
i 1
k
i
N
i 1
i
i
These are the formulas used by the Stata’s direct standardization command dstdize.
Chapter 3-12 (revised 16 May 2010)
p. 5
Exercise (direct standardization)
Returning to the Sweden and Panama example, reading the data in,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on panswedmortality.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
panswedmortality.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\”
cd “Biostats & Epi With Stata\datasets & do-files"
use panswedmortality.dta, clear
Listing the data,
Data
Describe data
List data
Main tab: Variables: (leave empty for all variables): < leave empty >
Override minimum abbreviation of variable names: 15
Options tab: Table options: Draw divider lines between columns
Separators: When these variables change: nation
OK
list, abbreviate(15) divider sepby(nation)
1.
2.
3.
4.
5.
6.
+---------------------------------------------+
| nation | age_category | population | deaths |
|--------+--------------+------------+--------|
| Sweden |
0 - 29 |
3145000 | 3,523 |
| Sweden |
30 - 59 |
3057000 | 10,928 |
| Sweden |
60+ |
1294000 | 59,104 |
|--------+--------------+------------+--------|
| Panama |
0 - 29 |
741,000 | 3,904 |
| Panama |
30 - 59 |
275,000 | 1,421 |
| Panama |
60+ |
59,000 | 2,456 |
+---------------------------------------------+
We see that this file contains the variables for computing the age-specific incidence proportions,
or mortality proportions.
Chapter 3-12 (revised 16 May 2010)
p. 6
We will use the following standard population:
File
Open
Find the directory where you copied the course CD:
Find the subdirectory datasets & do-files
Single click on panswedstdpop.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
panswedstdpop.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\”
cd “Biostats & Epi With Stata\datasets & do-files"
use panswedstdpop.dta, clear
Double clicking on the last list command in the Review Window, and changing it to:
list, abbreviate(15) divider
+---------------------------+
| age_category | population |
|--------------+------------|
1. |
0 - 29 |
.35 |
2. |
30 - 59 |
.35 |
3. |
60+ |
.3 |
+---------------------------+
we see that this is a file with the proportion of the population that will be used for each age
stratum (the same for both countries).
When you wish to use a reference population that is different from any group in your incidence
or mortality data file, Stata requires:
1) the standard population to be saved in a separate Stata-formatted data file (.dta file extension),
2) for this file to have the identical strata as the risk data, and
3) for the morbidity or mortality data to be the current file in Stata memory.
Chapter 3-12 (revised 16 May 2010)
p. 7
Bringing the mortality data back in:
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on panswedmortality.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
panswedmortality.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\”
cd “Biostats & Epi With Stata\datasets & do-files"
use panswedmortality.dta, clear
To obtain the direct standardized rates, we use
Statistics
Epidemiology and related
Other
Direct standardization
Main tab: Characteristic variable: deaths
Population variable: population
Strata variable: age_category
Group variables: nation
Use standard population from Stata dataset: panswedstdpop
OK
dstdize deaths population age_category, by(nation)
using(panswedstdpop)
Chapter 3-12 (revised 16 May 2010)
p. 8
----------------------------------------------------------> nation= Panama
-----Unadjusted----- Std.
Pop. Stratum Pop.
Stratum
Pop.
Cases Dist. Rate[s] Dst[P] s*P
---------------------------------------------------------0 - 29
741000
3904 0.689 0.0053 0.350 0.0018
30 - 59
275000
1421 0.256 0.0052 0.350 0.0018
60+
59000
2456 0.055 0.0416 0.300 0.0125
---------------------------------------------------------Totals:
1075000
7781
Adjusted Cases: 17351.2
Crude Rate:
0.0072
Adjusted Rate:
0.0161
95% Conf. Interval: [0.0156, 0.0166]
----------------------------------------------------------> nation= Sweden
-----Unadjusted----- Std.
Pop. Stratum Pop.
Stratum
Pop.
Cases Dist. Rate[s] Dst[P] s*P
---------------------------------------------------------0 - 29
3145000
3523 0.420 0.0011 0.350 0.0004
30 - 59
3057000
10928 0.408 0.0036 0.350 0.0013
60+
1294000
59104 0.173 0.0457 0.300 0.0137
---------------------------------------------------------Totals:
7496000
73555
Adjusted Cases: 115032.5
Crude Rate:
0.0098
Adjusted Rate:
0.0153
95% Conf. Interval: [0.0152, 0.0155]
Summary of Study Populations:
nation
N
Crude
Adj_Rate
Confidence Interval
-------------------------------------------------------------------------Panama
1075000
0.007238
0.016141
[ 0.015645,
0.016637]
Sweden
7496000
0.009813
0.015346
[ 0.015235,
0.015457]
Notice that the standardized risks are given for each country, but there is no standardized risk
ratio or standardized risk difference reported by Stata. These have to be computed manually, as
standardized risk ratio:
display 0.016141/0.015346
1.051805
standardized risk difference:
display 0.016141 - 0.015346
.000795
These two formulas are given in Rothman and Greenland (1998, p.63). Given two standardized
rates, I s and I s* , both computed using the same standard distribution, the standardized rate ratio
and standardized risk difference are given by
IRs 
Is
I s*
and
Chapter 3-12 (revised 16 May 2010)
IRDs  I s  I s*
p. 9
The same formulas apply for computing the standardized risk ratio and the standardized
prevalence ratio and for computing the standardized risk difference and the standardized
prevalence difference.
The confidence intervals for these standardized effect measures are not simply forming ratios and
differences with the limits of the individual standardized measures. Rothman and Greenland
(1998, p.263) present formulas for the confidence intervals. These are not available in Stata for
direct standardization with the dstdize command.
Example Look at the article by Van Den Eden et al (2003). This is a paper that reports results
using direct standardization.
1) Look at Statistical Methods section. You should now be able to understand it. Notice that
they cite the US Census website. The website provides US population data so that researchers
around the world can standardize to a common population distribution.
2) Notice how standardization allowed them to compare rates across race/ethnic groups in Table
3, and across studies/countries in Table 4.
Chapter 3-12 (revised 16 May 2010)
p. 10
Indirect Standardization
The Stata command for indirect standardization is istdize.
The following description and formula for indirect standardization was taken from the Stata
reference manual under the dstdize command (StataCorp, 2003, Reference A-F, p.295):
“Standardization of rates can be performed via the indirect method whenever the stratumspecific rates are either unknown or unreliable. If the stratum-specific rates are known,
the direct standardization method is preferred.
In order to apply the indirect method, the following must be available:
1. The observed number of cases in each population to be standardized, O. For example,
if death rates in two states are being standardized using the US data rate for the same time
period, then you must know the total number of deaths in each state.
2. The distribution across the various strata for the population being studied, n1,…,nk. If
you are standardizing the death rate in the two states adjusting for age, then you must
know the number of individuals in each of the k age groups.
3. The stratum-specific rates for the standard population, p1,…,pk. For the example, you
must have the US death rate for each stratum (age group).
4. The crude rate of the standard population, C. For the example, you must have the
mortality rate for all the US for the year.”
The calculation is then (StataCorp, 2003, Reference A-F, p.299):
“For indirect standardization, define O as the observed number of cases in each
population to be standardized; n1,…,nk, the distribution across the various strata for the
population being studied;
R1,…,Rk, the stratum-specifc rates for the standard population; and C, the crude rate of
the standard population. Then the expected number of cases (deaths), E, in each
population is obtained by applying the standard population stratum-specific rates,
R1,…,Rk, to the study populations:
k
E   ni Ri
i 1
The indirectly adjusted rate is then
Rindirect  C
O
E
and O/E is the study population’s standardized mortality ratio (SMR) if death is the event
of interest or the standardized incidence ratio (SIR) for studies of disease (or other)
incidences.”
Chapter 3-12 (revised 16 May 2010)
p. 11
Exercise (indirect standardization)
We will use data borrowed from Kahn and Sempos (1989, 95-105) that are available on the Stata
website, and in the datasets & do-files subdirectory. The problem is (StataCorp, 2003, Ref A-F,
p. 295), “We want to compare 1970 mortality rates in California and Maine, adjusting for age.
Although we have age-specific population counts for the two states, we lack age-specific death
rates. In this situation, direct standardization is not feasible. We can use the US population
census data for the same year to produce indirectly standardized rates for the these two states.”
The 1970 US population age stratum-specific rates are found in KahnStdPopRates.dta.
1.
2.
3.
4.
5.
6.
7.
8.
+---------------------------------------+
|
age
population
deaths
rate |
|---------------------------------------|
|
<15
57,900,000
103,062
.00178 |
| 15-24
35,441,000
45,261
.00128 |
| 25-34
24,907,000
39,193
.00157 |
| 35-44
23,088,000
72,617
.00315 |
|---------------------------------------|
| 45-54
23,220,000
169,517
.0073 |
| 55-64
18,590,000
308,373
.01659 |
| 65-74
12,436,000
445,531
.03583 |
|
75+
7,630,000
736,758
.09656 |
+---------------------------------------+
The observed number of cases and age stratum-specific population sizes for the study
populations are found in the file KahnStudyPopSizes.dta.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
+-------------------------------------------+
| state
age
population
death |
|-------------------------------------------|
| California
<15
5,524,000
166,285 |
| California
15-24
3,558,000
166,285 |
| California
25-34
2,677,000
166,285 |
| California
35-44
2,359,000
166,285 |
|-------------------------------------------|
| California
45-54
2,330,000
166,285 |
| California
55-64
1,704,000
166,285 |
| California
65-74
1,105,000
166,285 |
| California
75+
696,000
166,285 |
|-------------------------------------------|
| Maine
<15
286,000
11,051 |
| Maine
15-24
168,000
11,051 |
| Maine
25-34
110,000
11,051 |
| Maine
35-44
109,000
11,051 |
|-------------------------------------------|
| Maine
45-54
110,000
11,051 |
| Maine
55-64
94,000
11,051 |
| Maine
65-74
69,000
11,051 |
| Maine
75+
46,000
11,051 |
+-------------------------------------------+
Notice here that the death variable is simply the total number of deaths, not the age-specific
deaths.
Chapter 3-12 (revised 16 May 2010)
p. 12
Bringing the study data into Stata,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on KahnStudyPopSizes.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
KahnStudyPopSizes.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\”
cd “Biostats & Epi With Stata\datasets & do-files"
use KahnStudyPopSizes.dta, clear
Obtaining the indirect standardized rates,
Statistics
Epidemiology and related
Other
Indirect standardization
Main tab: # cases variable: death
Population variable: population
Strata variables: age
Use standard population from Stata dataset: KahnStdPopRates
Use population variables: Case variable: death
Population varible: population
Options tab: Group variables: state
Include table summary of standard population in output
OK
istdize death population age using KahnStdPopRates.dta,
popvars(death population) by(state) print
Chapter 3-12 (revised 16 May 2010)
p. 13
------Standard Population-----Stratum
Rate
------------------------------<15
0.00178
15-24
0.00128
25-34
0.00157
35-44
0.00315
45-54
0.00730
55-64
0.01659
65-74
0.03583
75+
0.09656
------------------------------Standard population's crude rate:
0.00945
----------------------------------------------------------> state= California
Indirect Standardization
Standard
Population
Observed
Cases
Stratum
Rate
Population
Expected
---------------------------------------------------------<15
0.0018
5524000
9832.72
15-24
0.0013
3558000
4543.85
25-34
0.0016
2677000
4212.46
35-44
0.0031
2359000
7419.59
45-54
0.0073
2330000
17010.10
55-64
0.0166
1704000
28266.14
65-74
0.0358
1105000
39587.63
75+
0.0966
696000
67206.23
---------------------------------------------------------Totals:
19953000
178078.73
Observed Cases:
SMR (Obs/Exp):
SMR exact 95% Conf. Interval: [0.9293,
Crude Rate:
Adjusted Rate:
95% Conf. Interval: [0.0088,
166285
0.93
0.9383]
0.0083
0.0088
0.0089]
----------------------------------------------------------> state= Maine
Indirect Standardization
Standard
Population
Observed
Cases
Stratum
Rate
Population
Expected
---------------------------------------------------------<15
0.0018
286000
509.08
15-24
0.0013
168000
214.55
25-34
0.0016
110000
173.09
35-44
0.0031
109000
342.83
45-54
0.0073
110000
803.05
55-64
0.0166
94000
1559.28
65-74
0.0358
69000
2471.99
75+
0.0966
46000
4441.79
---------------------------------------------------------Totals:
992000
10515.67
Observed Cases:
SMR (Obs/Exp):
SMR exact 95% Conf. Interval: [1.0314,
Crude Rate:
Adjusted Rate:
95% Conf. Interval: [0.0097,
Chapter 3-12 (revised 16 May 2010)
11051
1.05
1.0707]
0.0111
0.0099
0.0101]
p. 14
Summary of Study Populations (Rates):
Cases
state
Observed
Crude
Adj_Rate Confidence Interval
-------------------------------------------------------------------------California
166285 0.008334
0.008824 [0.008782, 0.008866]
Maine
11051 0.011140
0.009931 [0.009747, 0.010118]
Summary of Study Populations (SMR):
Cases
Cases
Exact
state
Observed
Expected
SMR
Confidence Interval
-------------------------------------------------------------------------California
166285
178078.73
0.934
[0.929290, 0.938271]
Maine
11051
10515.67
1.051
[1.031405, 1.070688]
Chapter 3-12 (revised 16 May 2010)
p. 15
Direct Standardized Rates Using Individual Level Data
In this example, we will use a dataset from the Stata website. The file contains individual-level
data on persons in four cities over a number of years. For the standard population, we will
simply use the total sample distribution.
Three of the cities (1,2, and 3) introduced a public health campaign in 1991 for reducing high
blood pressure. City 5 was the control city, which received no public health campaign.
The task is to obtain standardized high blood pressure rates for each city for each of the years
1990 and 1992, using, as the standard, the age, sex, and race distribution of the four cities and
two years combined.
Using these standardized rates, the goal is to judge whether or not the compaign was successful.
Bringing the data into Stata
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on highBP.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
highBP.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\”
cd “Biostats & Epi With Stata\datasets & do-files"
use highBP.dta, clear
Look at the data in the browser. You will notice n=1130 subjects, with age categorized into 5year age categories and a variable indicating whether the subject had high blood pressure or not.
The dstdize command is designed to work with aggregate data. It will work, however, with
individual level data if we create a variable recording the population size represented by each
observation. For individual-level data, that size is one.
Enter the following command to set up this variable:
gen pop = 1
In the examples above, we always used a separate standard population file. This time, the
problem is to use the study population’s four cities and two years as our standard population,
stratified by age, sex, and race.
Chapter 3-12 (revised 16 May 2010)
p. 16
Requesting the standardize rates,
Statistics
Epidemiology and related
Other
Direct standardization
Main tab: Characteristic variable: hbp
Population variable: pop
Strata variable: age_group race sex
Group variables: city year
Use standard population from data in memory
if/in tab: If (expression): year==1990 | year==1992
Options tab: Include table summary of standard population in output
OK
dstdize hbp pop age_group race year if year==1990 | year==1992,
by(city year) print
---------------Standard Population--------------Stratum
Pop.
Dist.
------------------------------------------------15 - 19
Black
1990
44
0.097
15 - 19
Black
1992
35
0.077
15 - 19 Hispanic
1990
4
0.009
15 - 19 Hispanic
1992
11
0.024
15 - 19
White
1990
5
0.011
15 - 19
White
1992
7
0.015
20 - 24
Black
1990
49
0.108
20 - 24
Black
1992
61
0.134
20 - 24 Hispanic
1990
11
0.024
20 - 24 Hispanic
1992
16
0.035
20 - 24
White
1990
12
0.026
20 - 24
White
1992
13
0.029
25 - 29
Black
1990
39
0.086
25 - 29
Black
1992
22
0.048
25 - 29 Hispanic
1990
9
0.020
25 - 29 Hispanic
1992
11
0.024
25 - 29
White
1990
13
0.029
25 - 29
White
1992
12
0.026
30 - 34
Black
1990
24
0.053
30 - 34
Black
1992
24
0.053
30 - 34 Hispanic
1990
2
0.004
30 - 34 Hispanic
1992
3
0.007
30 - 34
White
1990
14
0.031
30 - 34
White
1992
14
0.031
------------------------------------------------Total:
455
(6 observations excluded due to missing values)
Chapter 3-12 (revised 16 May 2010)
p. 17
Summary of Study Populations:
city
year
N
Crude
Adj_Rate
Confidence Interval
----------------------------------------------------------------------1
1990
47
0.063830
0.024689
[ 0.000000,
0.050569]
1
1992
56
0.017857
0.003719
[ 0.000000,
0.010723]
2
1990
64
0.046875
0.024762
[ 0.000000,
0.050174]
2
1992
67
0.029851
0.007033
[ 0.000000,
0.015751]
3
1990
69
0.159420
0.062777
[ 0.028154,
0.097400]
3
1992
37
0.189189
0.028587
[ 0.012874,
0.044300]
5
1990
46
0.043478
0.025000
[ 0.000000,
0.052890]
5
1992
69
0.014493
0.015385
[ 0.000000,
0.036706]
Does it appear that the campaign worked?
It is hard to say. This illustrates the limitation of standardization. It is a good way to get
descriptive estimates, which are adjusted for potential confounders, which is sometimes all that is
needed.
To test hypotheses, however, researchers must turn to stratification or regression models.
Fitting a regression model to high blood pressure data,
Chapter 3-12 (revised 16 May 2010)
p. 18
keep if year==1990 | year==1992
* post-intervention indicator
gen post = 1 if year == 1992
replace post = 0 if year == 1990
replace post = . if year ==.
tab year post
drop year
* male indicator
gen male = 1 if sex == "Male"
replace male = 0 if sex == "Female"
replace male = . if sex == ""
tab sex male
drop sex
* race/ethnicity indictors
gen black = 0
replace black = 1 if race ==
replace black = . if race ==
*
gen white = 0
replace white = 1 if race ==
replace white = . if race ==
*
gen hispanic = 0
replace hispanic = 1 if race
replace hispanic = . if race
*
tab race black
tab race white
tab race hispanic
drop race
"Black"
""
"White"
""
== "Hispanic"
== ""
* continuous age variable (equal 5-year intervals)
gen age=.
replace age = 1 if age_group == "15 - 19"
replace age = 2 if age_group == "20 - 24"
replace age = 3 if age_group == "25 - 29"
replace age = 4 if age_group == "30 - 34"
tab age_group age
* intervention indicator
gen intervention = .
replace intervention = 1 if city==5
replace intervention = 0 if city<5
tab city intervention
* post x intervention interaction
gen postxint = post*intervention
tab postxint
tab hbp // consider overfitting
logistic hbp post intervention postxint male black hispanic age
Chapter 3-12 (revised 16 May 2010)
p. 19
We see that we should limit the number of predictors to 3 or 6, using the 30/10 or 30/5 rules
discussed in K30 Intro Biostat course.
hbp |
Freq.
Percent
Cum.
------------+----------------------------------0 |
431
93.49
93.49
1 |
30
6.51
100.00
------------+----------------------------------Total |
461
100.00
We will ignore this for the sake of completing the exercise.
The results of the logistic regression model were,
Logistic regression
Log likelihood = -93.186581
Number of obs
LR chi2(7)
Prob > chi2
Pseudo R2
=
=
=
=
455
34.75
0.0000
0.1572
-----------------------------------------------------------------------------hbp | Odds Ratio
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------post |
.6630758
.290103
-0.94
0.348
.2812892
1.563052
intervention |
.7409793
.5953338
-0.37
0.709
.1534317
3.578467
postxint |
.409221
.5471474
-0.67
0.504
.0297757
5.624109
male |
2.165947
1.128395
1.48
0.138
.7801839
6.013105
black |
.2669251
.1136918
-3.10
0.002
.115834
.6150958
hispanic |
.2379172
.1882277
-1.81
0.070
.0504661
1.121637
age |
1.759359
.3734157
2.66
0.008
1.160622
2.666969
------------------------------------------------------------------------------
To test our hypothesis, we use the post × intervention interaction term (p = 0.504). We conclude
the intervention was not statistically significant.
Chapter 3-12 (revised 16 May 2010)
p. 20
Exercise Look at the article by Adams, et al (2006). Notice in the footnote of Table 2, 3rd line
from bottom, they state,
“Mortality rates are per 100,000 person-years, directly standardized to the age distribution
of the cohort (according to sex).”
They are using individual level data (one observation per subject), and creating their own
standard population which is age by sex population proportions. In their Statistical Methods they
report using 5-year age categories, which will give more stable estimates (since there are more
observations per category than there are per year of age).
Also, notice that they report standardized rates in their Table 2, but they also use multivariable
Cox regression to test for associations.
References
Kahn HA, Sempos CT. (1989). Statistical Methods in Epidemiology. New York, Oxford
University Press.
Rothman KJ. (2002). Epidemiology: An Introduction. New York, Oxford University Press.
Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA, LippincottRaven Publishers.
StataCorp. (2003). Stata Statistical Software: Release 8.0. College Station, Texas.
Chapter 3-12 (revised 16 May 2010)
p. 21
Download