cpslink

cpslink is a Stata program for matching adjacent-year records from the
Current Population Survey (CPS). The CPS has an in-4-out-8-in-4 rotation:
a household is interviewed in four successive months, rests for eight
months, and is then interviewed in four more successive months. With
notable exceptions, when household identifiers have been scrambled to
prevent matching, households can be linked across surveys. The cross-year
link follows from the observation that the second set of four interviews
falls in the same calendar months as the first four interviews. Thus, if
there is no attrition, in any given survey half of the respondents would
have been interviewed the previous year and the other half will be
interviewed the next year. For the March Surveys (which include the Annual
Demographic Supplement), a person identifier, the line number, was added
in 1979. In principle, the line number does not change once records for an
individual are added to the survey.

With attrition, about 25 percent of the households cannot be matched
across years. Household membership also varies, so, based on counts alone,
only about 95 percent of the individuals can be matched.

The matching program begins with data in long form, where matching months
for two successive years are stacked. For example, for 1980 and 1981,
observations from month-in-sample (mis) 1 in 1980 can be matched with the
1981 data for mis 5. The same is true of mis 2 with 6, 3 with 7, and 4
with 8, where the lower value of mis refers to the first year of the
adjacent pair. The input data are a set of separate data sets, each of
which includes the stacked records for one month of an adjacent-year pair.
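
The pairing of months-in-sample described above follows directly from the
rotation pattern. The short Python sketch below is only an illustration and
is not part of cpslink (which is written in Stata); the function names and
the month-index convention are assumptions made for the example. It lays
out the eight interviews of one household and pairs each first-year
interview with the interview held twelve months later.

```python
def interview_months(first_month_index):
    """Return (mis, month_index) pairs for one household.

    month_index counts months from an arbitrary origin; the convention
    is an assumption of this sketch, not something cpslink defines.
    """
    schedule = []
    for mis in range(1, 9):
        # mis 1-4 are consecutive months; mis 5-8 come after an
        # eight-month rest, so mis 5 falls 12 months after mis 1.
        offset = (mis - 1) if mis <= 4 else (mis - 5) + 12
        schedule.append((mis, first_month_index + offset))
    return schedule


def adjacent_year_pairs(first_month_index):
    """Pair each interview with the one held 12 months later, if any."""
    schedule = interview_months(first_month_index)
    mis_by_month = {month: mis for mis, month in schedule}
    return [(mis, mis_by_month[month + 12])
            for mis, month in schedule
            if month + 12 in mis_by_month]


print(adjacent_year_pairs(0))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

Whatever month the household enters the sample, the pairs are the same:
mis 1 with 5, 2 with 6, 3 with 7, and 4 with 8.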

The cpslink input data must include a variable named "year" (the final
two digits of the survey year) and another named "mis". mis for
second-year records should exceed mis for first-year records by 4.

The output data retain mis and year, but mis is reduced by 4 in the
second-year records. The output data are organized like the input data:
the records for the two years are stacked, and each data set refers to one
month-in-sample pair for adjacent years. The individual records carry two
new variables, level and match_id. level is -1 for unmatched households
and 0 for unmatched records in matched households. For matched records,
match_id, together with mis and year, provides a unique identifier.

The matching process consists of a user-specified series of steps. At each
step, the user specifies a list of variables for matches in that step. The
data are then sorted into cells distinguished by a user-supplied household
identifier and the variables in the step's match list.
Matches occur when a cell contains one or more first-year and one or more
second-year records. If a match is not unique, a tie-breaking phase of the
step is entered. The user specifies a secondary list of variables for
breaking ties. The tie-breaker first checks for matches at the most
detailed partition permitted by the variables given. (In this round,
secondary ties or multiple matches are separated randomly.) If a match is
not found, the list of variables is trimmed by dropping the leftmost
variable and the process is repeated. For matched records, level indicates
the step where the match occurred. For step s, level=2*s-1 for unique
matches and level=2*s for matches formed in the tie-breaking part of the
step. There is an option to retain unmatched records so that attributes
and behavior of the matched individuals can be compared with those of the
unmatched ones.
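
The step logic described above can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the cpslink
code itself: the record layout (dictionaries with a hypothetical "yr" flag
that is 1 for first-year and 2 for second-year records), the function
names, and the sequential numbering of match_id are all invented for the
example. The sketch also assumes that tie-breaking stops once the
secondary list has been trimmed to nothing, leaving any remaining records
for a later step; the text above does not say what cpslink does then.

```python
import random
from collections import defaultdict
from itertools import count

_next_id = count(1)  # hypothetical source of match_id values


def _mark(first, second, level):
    """Give a matched pair a shared match_id and the level of the match."""
    match_id = next(_next_id)
    for rec in (first, second):
        rec["match_id"] = match_id
        rec["level"] = level


def _cells(records, variables):
    """Group records into cells by `variables`, split by year flag."""
    cells = defaultdict(lambda: {1: [], 2: []})
    for rec in records:
        cells[tuple(rec[v] for v in variables)][rec["yr"]].append(rec)
    return list(cells.values())


def _break_ties(records, break_vars, level, rng):
    """Match inside a tied cell, trimming break_vars from the left."""
    pending = list(records)
    variables = list(break_vars)
    while variables and pending:
        for cell in _cells(pending, variables):
            rng.shuffle(cell[1])  # remaining ties are separated randomly
            rng.shuffle(cell[2])
            for first, second in zip(cell[1], cell[2]):
                _mark(first, second, level)
        pending = [r for r in pending if r["level"] == 0]
        variables = variables[1:]  # drop the leftmost variable


def run_step(records, family, match_vars, break_vars, step, rng):
    """Run step number `step` over the records that are still unmatched."""
    unmatched = [r for r in records if r["level"] == 0]
    for cell in _cells(unmatched, list(family) + list(match_vars)):
        if len(cell[1]) == 1 and len(cell[2]) == 1:
            _mark(cell[1][0], cell[2][0], 2 * step - 1)  # unique match
        elif cell[1] and cell[2]:
            _break_ties(cell[1] + cell[2], break_vars, 2 * step, rng)


people = [
    {"hh": 7, "yr": 1, "line": 1, "sex": 2, "level": 0, "match_id": None},
    {"hh": 7, "yr": 2, "line": 1, "sex": 2, "level": 0, "match_id": None},
]
run_step(people, ["hh"], ["line"], ["sex"], step=1, rng=random.Random(0))
print([p["level"] for p in people])  # [1, 1]
```

Records that a step leaves at level 0 are simply offered to the next step,
which is why the order of the match lists matters.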

There is a companion set of programs, makewide and utilities, with the
suffix .wid, that rearrange the matched records into wide form, with the
first- and second-year variables in the same record. (See makewide.doc.)

The cpslink programs carry the suffix .cps. Components are: cpslink.cps,
getyrmo.cps, testmacs.cps, first.cps, match.cps, resdups.cps, this.cps,
that.cps, other.cps, last.cps, cleanup.cps, and macclear.cps.

cpslink requires three user-supplied do-files: addvars, dropvars, and
setmacs. addvars and dropvars can be dummies containing the single
command <exit> if no action is to be taken. addvars is a program to
construct variables that are used in matching. (There is an example
included where "compage" is constructed; compage is age for first-year
records and age-1 for second-year records.) dropvars drops variables not
intended for the permanent file. setmacs.do is a list of controlling
macros. In addition to the do-files, there is a user-supplied list,
yearlist.raw, that lists the years for which data are to be matched and,
if only a subset of months-in-sample is used, the months as well.

setmacs.do includes:

    logname        (required)
    datain         (required)
    dataout        (required)
    temploc        (optional)
    bloat          (optional)
    compress       (optional)
    unix           (required only if $bloat or $compress)
    retain         (optional)
    family         (required)
    steps          (required)
    match1         (required)
      .            (required)
      .            (required)
    match#steps    (required)
    break1         (required)
      .            (required)
      .            (required)
    break#steps    (required)

logname   the location and name of the log file to be created. (The log
          will replace a pre-existing file of the same name.)
datain    the location & prefix for the input data, e.g., /cpsdata/cps.
dataout   the location & prefix for the output, e.g.,
          /cpsdata/matched/mth.
          NOTE: $datain==$dataout is not permitted.
temploc   the directory used for working space, e.g., /usr/temp. (If
          temploc is not specified, the current directory is assumed.)
bloat     1 (the input data are compressed) or 0 (the input data are not
          compressed).
compress  1 (compress output) or 0 (do not compress).
          The default for bloat and compress is 0, so they are required
          only if compressed input is used or compressed output is
          desired.
          NOTE: compress and bloat are unix options only.
unix      1 (the os is unix)
retain    Followed by one token in double quotes, retain defines the
          records to be retained in output. The default (retain not
          specified) is matched records only. "none" also retains only
          the matched records. Other options: "all" keeps all matched and
          unmatched records; "hhmatch" keeps all matched and unmatched
          records from the matched households; "year1" keeps all
          first-year records plus all second-year records from the
          matched households (whether matched or not); finally, "year2"
          adds the second-year records from the unmatched households.
          retain is cumulative. Matched records are always retained.
          "hhmatch" adds the unmatched records from the matched
          households, "year1" adds the first-year records from the
          unmatched households, and "year2" adds the second-year records
          from the unmatched households.
          NOTE: "year2" is equivalent to "all".
family    One or more variable names, in double quotes, to specify the
          variables that uniquely identify the household.
steps     An integer indicating the number of levels or steps where
          matches will be selected.

For each step there are two macros, match{n} & break{n}, where {n} is the
step.

match{n}  a list of variables, in double quotes. Matches occur within
          matched households when first-year records match second-year
          records for the variables specified in this list.
break{n}  a supplementary list of variables, in double quotes, for
          breaking ties. The tie-breaking process is sequential.
          Observations that match on the $match{n} variables are then
          matched on $break{n}. When second-tier ties occur, they are
          broken by random assignment. If second-tier matches do not
          occur, the macro break{n} is reduced by dropping one variable
          (from the left) and the process is repeated.
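
The cumulative behavior of the retain macro described above can be
summarized in a small Python sketch. It is an illustration only, not part
of cpslink: the function name keep_record and its arguments are
hypothetical, and the sketch relies on the level coding described earlier
(-1 for unmatched households, 0 for unmatched records in matched
households, positive for matched records).

```python
# Options in cumulative order; each one keeps everything the previous
# one keeps, plus one more group of records.
RETAIN_ORDER = ["none", "hhmatch", "year1", "year2"]


def keep_record(level, first_year, retain="none"):
    """Return True if a record would be written to the output.

    level      -- -1, 0, or a positive match level
    first_year -- True for first-year records, False for second-year
    retain     -- one of "none", "hhmatch", "year1", "year2", "all"
    """
    if retain == "all":
        retain = "year2"  # documented as equivalent
    rank = RETAIN_ORDER.index(retain)
    if level > 0:
        return True       # matched records are always retained
    if level == 0:
        return rank >= 1  # unmatched record, matched household
    if first_year:
        return rank >= 2  # first-year record, unmatched household
    return rank >= 3      # second-year record, unmatched household


print(keep_record(-1, True, "year1"))   # True
print(keep_record(-1, False, "year1"))  # False
```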

yearlist.raw includes a list of first years (one to a line) of the
adjacent-year matches. Although the default is to match all four pairs of
months-in-sample, there is an option to match only selected months. In
this case, on the line specifying a given year, the user adds the months
to be matched. Again, as with year, only the month in the first year is
specified.

Example:

    88
    85 3 2

The two-line file shown above, if used for yearlist.raw, requests that all
possible months be matched from the 1988 and 1989 Surveys, but that only
months-in-sample 2 and 3 from the 1985 Survey be matched with
months-in-sample 6 and 7, respectively, from the 1986 Survey.
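
To make the reading of yearlist.raw concrete, the Python sketch below
interprets the file as described above. It is an illustration, not the
parser cpslink uses: the function name parse_yearlist and the shape of its
return value are assumptions, and the sketch does not address how a
two-digit year rolls over at the end of a century.

```python
def parse_yearlist(text):
    """Return a list of (first_year, second_year, mis_pairs) requests.

    Each line holds a two-digit first year, optionally followed by the
    first-year months-in-sample to match. With no months given, all
    four pairs are requested.
    """
    requests = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        year = int(tokens[0])
        months = [int(t) for t in tokens[1:]] or [1, 2, 3, 4]
        # The second-year month-in-sample is the first-year value plus 4.
        pairs = [(m, m + 4) for m in months]
        requests.append((year, year + 1, pairs))
    return requests


for request in parse_yearlist("88\n85 3 2\n"):
    print(request)
```

For the example file, the first request covers all four pairs for 88 and
89, and the second covers only the pairs (3, 7) and (2, 6) for 85 and 86.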

Examples of setmacs.do, addvars.do, dropvars.do, and yearlist.raw are
included along with the *.cps programs.