
Soc 971: Fertility, families, and households

Week 4: Marriage markets, assortative mating

Sara Moorman, So Jung Lim

Data and Methods

Data

1. National Survey of Families and Households (NSFH)

A. Overview: The NSFH is a national multi-stage area probability sample of households in the 48 coterminous United States, with an oversample of blacks, Puerto Ricans, Mexican Americans, single-parent families, families with step-children, cohabiting couples, and recently married persons. In 1987-1988, 13,007 English or Spanish speakers ages 19 and older who were living in households completed 90-minute interviews. Participants were reinterviewed in 1992-1994 and 2001-2003. The NSFH also has interviews with the main respondent's baseline partner and (in 1992-1994 and 2001-2003) current partner, focal child(ren), and parents. The interviews are fairly comparable across waves (although 1987-1988 is a little different from 1992-1994 and 2001-2003). There is a fairly large problem with attrition between 1992-1994 and 2001-2003. The data are really rich with regard to relationship variables.

The data are relatively difficult (although not dreadful) to work with (NSFH doesn't have the same budget or staffing as, say, WLS), but although Larry Bumpass travels some, he is often available for meetings and is very helpful. Also, many of the grad students have worked with NSFH at one time or another, so socgradchat is a good place to get answers to your questions.

B. More information about NSFH:

Everything you need to work with it: http://www.ssc.wisc.edu/nsfh/home.htm

Sweet, J. A., Bumpass, L. L., & Call, V. (1988). The design and content of the National Survey of Families and Households (NSFH Working Paper No. 1). Madison, WI: University of Wisconsin-Madison, Center for Demography and Ecology.

2. Integrated Public Use Microdata Series (IPUMS)

A. Overview: The Integrated Public Use Microdata Series (IPUMS) consists of thirty-eight high-precision samples of the American population drawn from fifteen federal censuses and from the American Community Surveys of 2000-2004. Some of these samples have existed for years, and others were created specifically for this database. The thirty-eight samples, which draw on every surviving census from 1850-2000, are a very rich source of quantitative information on long-term changes in the American population. However, the wide variety of record layouts, coding schemes, and documentation makes it complicated to use them to study change over time. The IPUMS assigns uniform codes across all the samples and brings relevant documentation into a coherent form to facilitate analysis of social and economic change. The data series includes information on a broad range of population characteristics, including fertility, nuptiality, life-course transitions, immigration, internal migration, labor-force participation, occupational structure, education, ethnicity, and household composition.

IPUMS-International is a project dedicated to collecting and distributing census data from around the world. Access to the census data of various countries has been limited, and the documentation is often inadequate. Also, comparisons across countries or time periods are difficult because of inconsistencies in both data and documentation. IPUMS-International tries to address these issues by converting census microdata for multiple countries into a consistent format, supplying comprehensive documentation, and making the data and documentation available through a web-based data dissemination system.

B. More information about IPUMS:

http://www.ipums.umn.edu

3. Current Population Survey (CPS)

A. Overview: Every month, the U.S. Census Bureau interviews 50,000 households (not the same 50,000 households each month; the study is not longitudinal) for the Bureau of Labor Statistics. Respondents report on all members of the household who are 16 years old or older. The sample is nationally representative (excluding military and institutionalized persons). It was originally designed to measure unemployment, and the major variables are employment related (although supplements that ask additional questions are frequent, and there's a demographic survey conducted every March). The CPS has existed since ~1942, although not in its present form.

B. More information about CPS:

The website: http://www.bls.census.gov/cps/cpsmain.htm

More info, as a pdf: http://www.census.gov/prod/2002pubs/tp63rv.pdf

Analyses

1. Log-linear models

A. Overview: Log-linear methods are based on cross-tabulations (contingency tables) of categorical variables. The first model tests the hypothesis that cases appear in cells at random (the two variables are independent). The most common second model adds a parameter for the odds (probability) of falling on the diagonal (concordance on the two variables). The next step is to add parameters for the off-diagonal cells, modeling the odds (probability) of appearing in one of those cells rather than another. These parameters can be fixed in a variety of ways or allowed to vary. Alternatively, you can start with the saturated model and delete higher-order interaction terms until the fit of the model to the data becomes unacceptable.

Researchers use goodness-of-fit tests (usually the Bayesian Information Criterion, or BIC, but sometimes the likelihood-ratio statistic G²) and the rule of parsimony to select a model.

B. The model for the traditional two-way table:

ln(F_ij) = μ + λ_i^A + λ_j^B + λ_ij^AB

(This model corresponds to the chi-square test of whether an association exists between two variables, each with two levels (a 2×2 table).)

ln(F_ij): the log of the expected cell frequency for cell ij in the contingency table
i, j: categories within the variables A and B
μ: the overall mean effect
λ_i^A: the main effect for variable A
λ_j^B: the main effect for variable B
λ_ij^AB: the interaction effect for variables A and B
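Under the independence model (the version without the λ^AB interaction term), the expected frequency for cell ij is just (row total × column total) / n, and G² compares those expected counts to the observed ones. A minimal sketch in Python, using an invented 2×2 education cross-tabulation (the counts are hypothetical, for illustration only):

```python
import math

# Hypothetical 2x2 cross-tabulation, e.g. wife's x husband's education
# (low/high). These counts are invented for illustration.
observed = [[40, 10],
            [15, 35]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(observed[i][j] for i in range(2)) for j in range(2)]

# Under the independence model ln(F_ij) = mu + lambda_i^A + lambda_j^B,
# the expected frequency is F_ij = (row total * column total) / n.
expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
            for i in range(2)]

# Likelihood-ratio statistic G^2 = 2 * sum O_ij * ln(O_ij / F_ij);
# a large value means the independence model fits the table poorly.
g2 = 2 * sum(observed[i][j] * math.log(observed[i][j] / expected[i][j])
             for i in range(2) for j in range(2))
```

For a 2×2 table, G² is compared against a chi-square distribution with one degree of freedom, just like the Pearson statistic.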

C. Hierarchical approach to log-linear modeling: for a complex multivariate relationship, in this case, less complex models are nested within the higher-order model.

ln(F_ijk) = μ + λ_i^A + λ_j^B + λ_k^C + λ_ij^AB + λ_ik^AC + λ_jk^BC + λ_ijk^ABC
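Nested models in this hierarchy are commonly fitted by iterative proportional fitting (IPF), which rescales a starting table until its fitted margins match the observed ones. A sketch, using an invented 2×2×2 table, of fitting the model that keeps all two-way interactions but drops the three-way term λ_ijk^ABC:

```python
import itertools

# Hypothetical 2x2x2 table for variables A, B, C; counts are invented.
obs = {cell: v for cell, v in zip(
    itertools.product(range(2), repeat=3),
    [20, 10, 12, 18, 8, 15, 14, 9])}

fit = {cell: 1.0 for cell in obs}   # start from a flat table

def margin(table, axes):
    """Collapse the table over the axes NOT listed in `axes`."""
    out = {}
    for cell, v in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + v
    return out

# Iterative proportional fitting: repeatedly rescale the fitted table so
# each two-way margin (AB, AC, BC) matches the observed margin. The fixed
# point satisfies the no-three-way-interaction model.
for _ in range(200):
    for axes in [(0, 1), (0, 2), (1, 2)]:
        target, current = margin(obs, axes), margin(fit, axes)
        for cell in fit:
            key = tuple(cell[a] for a in axes)
            fit[cell] *= target[key] / current[key]
```

After convergence, all three two-way margins of the fitted table reproduce the data, while any three-way interaction present in the data has been smoothed away; G² between `obs` and `fit` then tests the dropped term.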

D. More on log-linear models:

See section 15.4 in Agresti & Finlay's Statistical Methods for the Social Sciences (used in Soc 361).

Take John Logan's class on categorical data analysis.

Knoke, D., & Burke, P. J. (1980). Log-linear models. Beverly Hills, CA: Sage Publications. (From the series "Quantitative Applications in the Social Sciences," also known as "those little green Sage books.")

Agresti, A. (1996). An introduction to categorical data analysis. New York, NY: John Wiley & Sons.

2. Log-multiplicative layer effect models (Raymo & Xie, 1998)

(1) F_ijkl = β_0 β_i^W β_j^H β_k^C β_l^P β_ik^WC β_jk^HC β_il^WP β_jl^HP β_ikl^WCP β_jkl^HCP β_ij^WH β_ijk^WHC β_ijl^WHP β_ijkl^WHCP

(2) F_ijkl = β_0 β_i^W β_j^H β_k^C β_l^P β_ik^WC β_jk^HC β_il^WP β_jl^HP β_ikl^WCP β_jkl^HCP exp(ψ_ij φ_kl)

(ψ_ij: common association pattern; φ_kl: table-specific deviations)

Model (2) (the log-multiplicative model) is used in order to get a parsimonious model (Xie 1992), because "model of equation (1) generates more parameters than can be easily interpreted." Log-multiplicative models are based on the assumption that we can capture the variation between variables with a pattern of association common to all tables and a table-specific parameter. In the models above, the last four parameters of model (1) are simply expressed as exp(ψ_ij φ_kl) in model (2).
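The key property of model (2) is that the association pattern ψ_ij is shared across tables, while φ only scales its strength. A small numerical sketch (collapsing the two layer variables C and P into a single layer index m for simplicity, with invented ψ and φ values) shows that the log odds ratio differs across layers only by the ratio of the φ parameters:

```python
import math

# Invented common association pattern psi_ij for a 2x2 wife x husband
# table, and invented layer-specific strengths phi_m (layers 0 and 1).
psi = [[0.6, -0.6], [-0.6, 0.6]]
phi = {0: 1.0, 1: 0.5}

def log_odds_ratio(m):
    # Under model (2), F_ijm contains exp(psi_ij * phi_m); the marginal
    # beta terms cancel out of an odds ratio, so only the exp term is
    # needed to compute the layer-m log odds ratio.
    f = [[math.exp(psi[i][j] * phi[m]) for j in range(2)] for i in range(2)]
    return math.log(f[0][0] * f[1][1] / (f[0][1] * f[1][0]))

# Same association pattern in both layers; only its strength differs,
# by exactly the ratio of the phi parameters.
ratio = log_odds_ratio(1) / log_odds_ratio(0)
```

This is why a single φ per table is enough to describe, say, a weakening of educational homogamy across cohorts without re-estimating the whole association pattern each time.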

3. Cox proportional hazard models

A. Overview: ln h(t) = a(t) + b_1x_1 + b_2x_2

a(t) can be any function of time. For any two individuals at any point in time, the ratio of the hazards (the probability that an event will occur at a particular time to a particular individual, given that the individual is at risk at that time) is a constant. Once there are time-varying independent variables, the hazards aren't proportional. You can add the time-varying variables: ln h(t) = a(t) + b_1x_1 + b_2x_2(t). You can add the time-varying variables with a lag, if you expect a lag between exposure and effect: ln h(t) = a(t) + b_1x_1 + b_2x_2(t-2). The hazards are also not proportional if an independent variable interacts with time. You can stratify, that is, divide the sample by the categories of the variable that interacts with time.
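The proportionality claim is easy to verify numerically: with a fixed covariate, the a(t) terms cancel from the hazard ratio. A minimal sketch with an invented baseline function and coefficient:

```python
import math

def hazard(t, x1, b1=0.8):
    # ln h(t) = a(t) + b1*x1, with an arbitrary (invented) baseline a(t)
    # and an invented coefficient b1.
    a_t = math.log(0.01) + 0.05 * t
    return math.exp(a_t + b1 * x1)

# Hazard ratio for x1 = 1 vs x1 = 0 at two different times: the a(t)
# terms cancel, so both ratios equal exp(b1) -- the hazards are
# proportional, whatever shape a(t) takes.
hr_early = hazard(1.0, 1) / hazard(1.0, 0)
hr_late = hazard(20.0, 1) / hazard(20.0, 0)
```

If x1 were replaced by a time-varying x1(t), the ratio would pick up a t-dependent term exp(b1 * (x1_a(t) - x1_b(t))) and proportionality would generally fail, which is exactly why the extensions above are needed.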

B. More on hazard models:

Allison, P. D. (1984). Event history analysis: Regression for longitudinal event data. Beverly Hills, CA: Sage Publications. (Another "little green Sage book.")

Take Alberto Palloni’s class on event history analysis.
