GESIS-Variable Reports 2014/09
Oversamples, Units of Analysis, and the Topic of Data Transformation
Michael Terwey
Survey samples should not be analyzed without some basic information about how they were drawn. Those
who intend to work with ALLBUS/GGSS data should be aware that since 1991 the ALLBUS samples contain a
disproportionately large share of respondents from the area of the former German Democratic Republic.
Since reunification, many analyses have shown substantial differences between the new federal states
and the old Federal Republic of Germany (FRG). It is therefore often statistically appropriate to
analyze these two areas of present-day Germany separately. In order to allow meaningful analyses of
social groups in the eastern sub-sample, ALLBUS uses a disproportionate sampling design which
oversamples respondents from the area of the new federal states, i.e. more people are interviewed in
East Germany than would correspond to its actual proportion of the total population.
For separate analyses of East and West Germany there is no need to correct for this oversample by
weighting the data. If your investigation aims at making a statement about Germany as a whole,
however, the data have to be weighted first. For this purpose two East-West weights are provided with
the data set (cf. V1739-V1744). As an enhancement, the new variable report introduces properly weighted
frequency counts for total Germany as a regular feature of most variable descriptions since 1991.
Because these counts have been adjusted for the oversample, the marginal distributions can be
interpreted as prima facie representative of Germany as a whole. Cross tabulations by East and West can
be found in a separate supplement to the ALLBUS variable reports. The percentages in these tables can
be analyzed without adjusting for the East German oversample. However, as mentioned above, if your own
analyses of this data set aim at making a statement about total Germany, the data need to be weighted
first.
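The logic of correcting an oversample can be illustrated with a minimal sketch. It assumes that a regional weight is the region's population share divided by its sample share. All sample sizes, population shares and percentages below are invented for illustration; for real analyses the East-West weights delivered with the data set should be used, and this sketch is not their official derivation.

```python
# Hypothetical illustration of an East-West design weight.
# All counts, shares and percentages are invented for this example.

def east_west_weights(n_sample, pop_share):
    """Return one weight per region: population share / sample share."""
    n_total = sum(n_sample.values())
    return {
        region: pop_share[region] / (n_sample[region] / n_total)
        for region in n_sample
    }

def weighted_percentage(n_sample, pct, weights):
    """Combine regional percentages into one figure for the whole country."""
    numerator = sum(n_sample[r] * weights[r] * pct[r] for r in n_sample)
    denominator = sum(n_sample[r] * weights[r] for r in n_sample)
    return numerator / denominator

# Assumed sample sizes (East deliberately oversampled) and population shares.
n_sample = {"west": 2000, "east": 1000}
pop_share = {"west": 0.80, "east": 0.20}
# Assumed percentage agreeing with some survey item in each region.
pct = {"west": 40.0, "east": 70.0}

weights = east_west_weights(n_sample, pop_share)
unweighted = sum(n_sample[r] * pct[r] for r in n_sample) / sum(n_sample.values())
weighted = weighted_percentage(n_sample, pct, weights)

print(weights)
print(round(unweighted, 1), round(weighted, 1))
```

In this invented example the unweighted pooled percentage is pulled towards the oversampled East, while the weighted figure corresponds to the population shares of the two regions.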
An additional topic to consider is the use of transformation weights to convert household-samples into
data on persons and vice versa. From 1980 to 1992 and again in 1998, the ALLBUS surveys used a
sampling method based on the ADM-design. In the first stage of this three-stage sampling procedure,
a sample of electoral districts was selected. In the second stage the interviewers selected a target
household in the electoral district, starting at a randomly chosen address and then proceeding through
the area following specified rules. In the third stage, random digits (Kish selection grid) were
used to determine the household-member to be interviewed (cf. Kirschner 1984; ADM and AG.MA
1999).
In the case of the ADM-sampling-design it is therefore the households within the universe sampled
which have the same probability of selection. However, for the individuals living at a specific address,
the probability of selection also depends on the respective number of household-members belonging
to the universe sampled. More specifically: The larger the number of household-members who are
eligible to be interviewed, the smaller the chance for a particular person in the household to be finally
chosen for the interview. In principle, all person-level analyses of data from ALLBUS/GGSS surveys
using a household-sample, i.e. ALLBUS 1980 to ALLBUS 1992 and ALLBUS 1998, therefore have to be
weighted proportionally to the number of household-members belonging to the universe sampled
(transformation weight, cf. V1739 and V1741).
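The household-to-person transformation described above can be sketched as follows. The records are invented, and rescaling the weights so that they sum to the number of cases is an assumed convention rather than something documented here; the transformation weights supplied with the data set remain the reference.

```python
# Hypothetical household-sample records:
# (number of eligible household members, respondent is married?).
# The data are invented and only illustrate the direction of the adjustment.
respondents = [
    (1, False), (1, False), (1, True), (1, False),
    (2, True), (2, True), (2, False),
    (3, True), (4, True), (2, True),
]

def person_level_weights(records):
    """Weight each respondent by the number of eligible household members,
    rescaled so that the weights sum to the number of cases (assumed convention)."""
    sizes = [size for size, _ in records]
    scale = len(sizes) / sum(sizes)
    return [size * scale for size in sizes]

def share(records, weights=None):
    """Percentage of records whose flag is True, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(records)
    hits = sum(w for (_, flag), w in zip(records, weights) if flag)
    return 100.0 * hits / sum(weights)

weights = person_level_weights(respondents)
unweighted_pct = share(respondents)
weighted_pct = share(respondents, weights)

print(round(unweighted_pct, 1), round(weighted_pct, 1))
```

Because married respondents in this toy data set tend to live in larger households, the weighted percentage is higher than the unweighted one.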
ALLBUS/GGSS 1980-2012: Variable Report
In contrast to earlier ALLBUS-surveys, the surveys in 1994, 1996 and from 2000 onwards all used a
two-stage random sampling design yielding a sample of persons rather than of households. In the first
stage, a random sample of municipalities was selected. In the second stage, addresses of individuals
were selected at random from the municipal registers of residents. Compared to the ADM-method of
sampling, this procedure promises some advantages regarding not only the sample plan and the fieldwork
but also the accuracy of some results (cf. ADM and AG.MA 1999, Blohm 2006, Holst 2006, Koch 1997,
Koch et al. 1994, and table 1). As this more sophisticated and more costly sampling method would
have exceeded the budget for the 1998 survey, ALLBUS 1998 was again conducted using the
ADM-design, with interviewers working through lists of contact addresses which were pre-selected by
random route (cf. ADM and AG.MA 1999, ADM et al. 1999, Holst 2006).
Similar to the early ALLBUS/GGSS-surveys, the General Social Survey (GSS) has used full-probability
sampling of households, giving each household an equal probability of being included. Consequently,
the GSS is also self-weighting for household-level variables. But again only one adult per household is
interviewed. Thus, respondents living in large households have a lower probability of being selected.
For person-level variables, weighting analyses proportionally to the number of persons over 18 in the
household may adjust for this (see footnote 7). Figure 1 includes an example taken from the GSS. For
various reasons church attachment can be expected to be positively correlated with household-size (cf.
amongst others Franzmann and Wagner 1999; Blume 2006). Therefore an analysis of household-samples
without transformation weighting (in GSS NADULTS) may underestimate church attendance (see crosses and
dotted trend line in figure 1). Doing the same tabulation with weighting yields slightly higher
percentages of weekly church attendance (red boxes, solid trend line). Note that in the GSS all units
sampled are of the same type, i.e. households, in all years.
[Figure 1: line chart; horizontal axis 1973 to 2010; vertical axis 25% to 45%; series "With NADULTS" and "Without NADULTS"]
Figure 1: Weekly church attendance in the USA weighted with COMPWT (transformation weight)
and unweighted (own calculations on General Social Survey (GSS) 1973-2010, ‘nearly every week’,
‘every week’, ‘more than once a week’ combined)
7 Actually, in this analysis the weighting is done by a composite of variables (COMPWT), which includes the number of adults in the factor WTSSALL (COMPWT=OVERSAMP*FORMWT*WTSSALL). In addition to the transformation of the household data, some special features of the GSS are adjusted by this composite (cf. SDA 2013, Stephenson 1978).
The following illustration shows ALLBUS/GGSS-data on married respondents living together in one
household. In contrast to the GSS we have two types of samples. As mentioned above, the surveys
from 1980 to 1992 and again in 1998 are household-samples, whereas the others are person-samples.
Being married and living together is to some degree directly correlated with household-size. As the
dotted line in figure 2 shows, we would find basically no change over time if no weighting were applied
in this straightforward case. Because we are living in a society with a growing share of singles, this
finding would probably be misleading (see footnote 8). After weighting the household-surveys to adjust
for the underrepresentation of persons in larger households, we find a trend of diminishing percentages
of married respondents living together, which is probably a more realistic finding. (A similar
difference in trends would occur for calculations on households after transforming the person-samples
into household-data; cf. transformation weighting of person-samples below.)
[Figure 2: line chart; horizontal axis 1980 to 2010; vertical axis 40% to 80%; series "Weighted" and "Unweighted"]
Figure 2: Percentages of married West German respondents living together in one household
(weighted with transformation weight and unweighted; own calculations on ALLBUS/GGSS 1980-2010)
8 For instance, the share of single person households in the old Federal States increased from 23% in 1980 to 38% in 2010, and in the new Federal States it increased from 15% in 1991 to 38% in 2010 (own calculations using V1199 in ALLBUS/GGSS 1980-2010 weighted by V1566). This remarkable change of communal culture seems to be even more striking in post-socialist society.
However, transformation weights are rarely used in the daily practice of research using
household-samples. This is usually unproblematic, because weighted and unweighted results often differ
only slightly or not at all. Generally speaking, the strength of the weighting effect depends on the
strength of the correlation between the object of interest in the analysis and the size of the
household. If the correlation is weak, using the transformation weight will only slightly affect the
analysis. If the correlation is strong, however, weighting will lead to a more substantial difference
between weighted and unweighted results. Thus, the safest procedure when performing person-level
analyses with data from household samples is to run both weighted and unweighted analyses and then to
compare the results carefully to determine the influence of weighting.
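The dependence of the weighting effect on the association with household size can be made concrete. For any set of weights w and values y, the weighted mean minus the unweighted mean equals the (population) covariance of w and y divided by the mean of w. The following sketch checks this identity on two invented variables, one strongly related to household size and one constructed to have zero covariance with it; all data and names are hypothetical.

```python
# Hypothetical check of how the weighting effect depends on the association
# between household size (used as the weight) and the variable of interest.
# All data are invented for illustration.

def mean(values):
    return sum(values) / len(values)

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def covariance(xs, ys):
    """Population covariance (divides by n, not n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def weighting_effect(values, weights):
    """Weighted mean minus unweighted mean."""
    return weighted_mean(values, weights) - mean(values)

household_size = [1, 1, 2, 2, 3, 3]
# 1 = respondent has the attribute, 0 = respondent does not.
strongly_related = [0, 0, 0, 1, 1, 1]  # more frequent in larger households
unrelated = [1, 0, 1, 0, 1, 0]         # zero covariance with size by construction

effect_strong = weighting_effect(strongly_related, household_size)
effect_none = weighting_effect(unrelated, household_size)
predicted = covariance(household_size, strongly_related) / mean(household_size)

print(round(effect_strong, 3), round(predicted, 3), round(effect_none, 3))
```

Running both the weighted and the unweighted analysis, as recommended above, amounts to inspecting this difference for each variable of interest.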
As shown in figure 2, if weighted and unweighted results differ, it is often advisable to use the
weighted results. It should be noted, however, that in other, less simple examples weighted data can
deviate more from official census data than unweighted data. The reason for this is that in surveys
using the ADM-design the observed size of households can itself be biased in the net sample, because
households differ in how easily they can be contacted for an interview. One pertinent example is the
underrepresentation, to some degree, of one-person households in ADM-design samples. In theory,
respondents living in one-person households should be overrepresented in samples using the ADM-design
because of the a priori higher selection probability of persons living in small households. In
practice, however, the difficulty of contacting respondents who live alone outweighs their theoretical
advantage in the selection process, leading to a de facto underrepresentation (cf. Hartmann and
Schimpl-Neimanns 1992).
Table 1 shows some examples of various weighting effects. For this purpose, ALLBUS/GGSS data from
1992 (household-sample) were compared to data from the 1991 Mikrozensus and from ALLBUS/GGSS
1994 (person-sample). While weighting has little effect on any of the listed attitudinal variables
(comparing ALLBUS 1992 and 1994), this is true for only some of the demographic variables (comparing
with the Mikrozensus). Weighting only slightly affects the estimated proportion of men, of respondents
with less than 10 years of schooling and of workers.
The figures for the oldest age group (70 and older), for married respondents, for respondents living in
their own house and especially for respondents living alone, however, show an effect of weighting
which may be desirable. This supports the assumption that all of these characteristics are statistically
associated with household size. On the other hand, further comparisons show that at least some
of the results deviate more from the Mikrozensus data after the transformation weight has been applied.
Both the ADM-design and the design using municipal registers of residents theoretically provide
representative samples of the target population. The two methods differ, however, in the selection
probability of the sampled units. If the targeted sampling unit is a known individual, respondents have
an a priori identical probability of selection. Therefore, the sample of persons drawn from the
municipal registers of residents needs no weighting for analyses targeting the person-level. In other
words, for analyses of person-level data from ALLBUS-surveys using the register-of-residents-sample,
applying a transformation weight is unnecessary (cf. Bens 2006; Blohm 2006: 40). Demonstrating the
accuracy of a person-sample, the 1994 percentages, which can be analyzed on the person-level without
transformation weighting, show only small deviations from the official data in table 1.
Finally, while household-samples are not necessarily representative on the person-level, it is, on
the other hand, mostly inappropriate to make statements about households from a person-sample
without applying an appropriate transformation weight. Whenever attributes of households – like
number of residents, household income, family typology or household classification – are concerned, it
is strongly recommended to apply the proper transformation weights (cf. Bens 2006 and V1739-V1744
in this variable report). Both types of transformation weights are currently available for all surveys in
the ALLBUS/GGSS-cumulation.
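The reverse transformation, from a person-sample to household-level statements, can be sketched in the same way. The sketch assumes that in a person-sample a household's chance of being represented grows in proportion to its number of eligible members, so that each respondent is weighted by the inverse of that number. The records, the income threshold and the rescaling convention are invented for illustration and do not reproduce the weights supplied with the data.

```python
# Hypothetical person-sample records:
# (number of eligible household members, net household income in Euro).
# The data are invented for illustration.
persons = [
    (1, 800), (1, 1200), (2, 1600), (2, 2700),
    (3, 2800), (4, 3000), (2, 1400), (4, 3200),
]

def household_level_weights(records):
    """Weight each respondent by 1 / household size,
    rescaled so that the weights sum to the number of cases (assumed convention)."""
    inverse = [1.0 / size for size, _ in records]
    scale = len(inverse) / sum(inverse)
    return [value * scale for value in inverse]

def share_at_least(records, threshold, weights=None):
    """Percentage of records with income >= threshold, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(records)
    hits = sum(w for (_, income), w in zip(records, weights) if income >= threshold)
    return 100.0 * hits / sum(weights)

hh_weights = household_level_weights(persons)
person_pct = share_at_least(persons, 2600)
household_pct = share_at_least(persons, 2600, hh_weights)

print(round(person_pct, 1), round(household_pct, 1))
```

In this toy example high incomes are concentrated in larger households, so the share of households in the top income category is lower than the share of persons living in such households.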
Table 1: Comparison of selected data from Mikrozensus 1991 (MZ 91), ALLBUS 1992 (A92) and ALLBUS 1994 (A94); calculation by Achim Koch (GESIS)
MZ91
West Germany %
A92
MZ91
East Germany %
A92
A94
A94
62.4
57.3
--
35.0
48.3
27.0
--
29.6
26.4
Attitudes:
Own economic situation:
Very good/good
--
Interested in politics:
Very strongly/strongly
--
Inglehart-Index:
Postmaterialists
--
Socialism is a good idea:
Agree completely/
tend to agree
--
61.6
33.3
33.2
23.3
29.6
22.1
--
23.3
42.5
10.8
10.8
9.8
43.6
--
42.8
72.8
81.2
73.3
Demography:
Sex: Male
47.5
49.0
51.6
46.6
47.2
Age: 70 years and older
12.3
8.8
9.5
10.9
11.8
Schooling: Less than
10 years
61.4
Occupational Position:
Worker*
34.3
Marital Status: Married
61.5
54.1
54.4
--
32.0
43.8
--
46.9
63.5
65.7
18.9
12.4
39.5
42.9
35.2
38.8
74.7
68.1
67.6
47.2
--
41.2
Lives alone
10.5
35.0
60.0
Lives in own house
6.3
40.7
28.8
68.0
48.1
8.4
53.3
29.7
49.0
46.3
36.0
34.8
32.5
18.1
14.9
24.3
8.1
13.6
16.0
MZ 91: Anonymized 70% subsample of the 1991 Mikrozensus (ZUMA-File); adult population living in private households
A92: Figures are weighted using the transformation weight (unweighted figures are in italics)
A94: Unweighted figures
* Based on the number of respondents currently full- or part-time employed.
Table 2 shows an exemplary comparison of categorized household income data from ALLBUS/GGSS
2004 and Mikrozensus 2004 (cf. Bens 2006). Comparing columns 1 and 2 with columns 3 and 4, it
becomes apparent that a routine analysis of household income would be misleading without
transformation weighting (transformation from person-data to household-data). For instance, about 25%
of the non-farm households in Mikrozensus 2004 and ALLBUS/GGSS 2004 have an income of 2600
Euro or more (see footnote 9). The same household income category applies more often to persons (over
30%). The share of large households (which often have higher incomes as such) decreases with this type
of transformation. On the other hand, a comparison of columns 1 and 3 with columns 2 and 4 shows a
high degree of similarity between Mikrozensus and ALLBUS/GGSS. Consequently, analyzing a sample of
persons on the household-level without weighting will often show rather misleading results.
Table 2: Comparison of categorized net household incomes (non-farm) in Mikrozensus 2004 and
ALLBUS/GGSS 2004. Income data from ALLBUS/GGSS as household-sample weighted with
transformation weight and East-West weight; income data from ALLBUS/GGSS as sample
of persons weighted with East-West weight only (calculations for Mikrozensus (70%
subsample of Mikrozensus 2004 (ZUMA file)) were provided by Michael Blohm, GESIS
Mannheim).

Net household income    Household Sample            Sample of Persons
                        Mikrozensus %   ALLBUS %    Mikrozensus %   ALLBUS %
Under 900 Euro               15.6         14.6           10.2           9.6
900 to 1499 Euro             25.9         25.1           20.8          21.1
1500 to 2599 Euro            33.3         35.6           35.7          37.8
2600 Euro or more            25.2         24.7           33.4          31.5
Total                       100.0        100.0          100.0         100.0

9 Income data of farmers are not available in Mikrozensus 2004.
Literature
ADM (Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute e.V.) and AG.MA
(Arbeitsgemeinschaft Media-Analyse e.V.) 1999: Stichproben-Verfahren in der Umfrageforschung. Eine
Darstellung für die Praxis. Opladen: Leske & Budrich.
ADM (Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute e.V.), ASI (Arbeitsgemeinschaft
Sozialwissenschaftlicher Institute e.V.) and BVM (Berufsverband Deutscher Markt- und Sozialforscher
e.V.) 1999: Standards zur Qualitätssicherung in der Markt- und Sozialforschung: Denkschrift Standards for Quality Assurance in Market and Social Research, Frankfurt a.M.: ADM.
Bens, Arno 2006: Zur Auswertung haushaltsbezogener Merkmale mit dem ALLBUS 2004, in: ZA-Information 59: 143 - 156.
Blohm, Michael 2006: Datenqualität durch Stichprobenverfahren bei der Allgemeinen
Bevölkerungsumfrage der Sozialwissenschaften - ALLBUS, in: Frank Faulbaum and Christof Wolf (eds.),
Stichprobenqualität in Bevölkerungsumfragen, Bonn: Informationszentrum Sozialwissenschaften 2006:
37 - 54.
Blume, Michael 2006: Religiosität als demographischer Faktor. Ein unterschätzter Zusammenhang? in:
Marburg Journal of Religion 11: 1 – 24,
http://www.uni-marburg.de/fb03/ivk/mjr/pdfs/2006/articles/blume_germ2006.pdf, retrieved 29 March
2012.
Franzmann, Gabriele and Michael Wagner 1999: Heterogenitätsindizes zur Messung der Pluralität von
Lebensformen und ihre Berechnung in SPSS, in: ZA-Information 44: 75 - 95.
Hartmann, Peter and Bernhard Schimpl-Neimanns 1992: Sind Sozialstrukturanalysen mit
Umfragedaten möglich? Analysen zur Repräsentativität einer Sozialforschungsumfrage, in: Kölner
Zeitschrift für Soziologie und Sozialpsychologie 44 (1992): 315 - 340.
Holst, Christian 2006: Der Ipsos SOWI-Bus und erste Untersuchungsergebnisse, in: Frank Faulbaum and
Christof Wolf (eds.), Stichprobenqualität in Bevölkerungsumfragen, Bonn: Informationszentrum
Sozialwissenschaften 2006: 85 - 109.
Kirschner, Hans-Peter 1984: ALLBUS 1980: Stichprobenplan und Gewichtung, in: Karl Ulrich Mayer and
Peter Schmidt (eds.), Allgemeine Bevölkerungsumfrage der Sozialwissenschaften. Beiträge zu
methodischen Problemen des ALLBUS 1980, Frankfurt a.M. and New York: Campus Verlag 1984: 114 - 182.
Koch, Achim 1997: ADM-Design und Einwohnermelderegister-Stichprobe. Stichprobenverfahren bei
mündlichen Bevölkerungsumfragen, in: Siegfried Gabler and Jürgen H.P. Hoffmeyer-Zlotnik (eds.),
Stichproben in der Umfragepraxis, Opladen: Westdeutscher Verlag: 99 - 116.
Koch, Achim, Siegfried Gabler and Michael Braun 1994: Konzeption und Durchführung der
"Allgemeinen Bevölkerungsumfrage der Sozialwissenschaften" (ALLBUS) 1994, Mannheim: ZUMA-Arbeitsbericht Nr. 94/11.
SDA 2013: Survey Documentation and Analysis: Archive GSS 1972-2012 Cumulative Datafile,
http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss12, retrieved 27 March 2013.
Smith, Tom W., Peter V. Marsden, Michael Hout and Jibum Kim 2013: General Social Surveys, 1972-2012
[machine-readable data file and codebook], principal investigator Tom W. Smith; co-principal
investigators Peter V. Marsden and Michael Hout, NORC ed. Chicago, National Opinion Research Center
producer; Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut, distributor.
Stephenson, Bruce C. 1978: Weighting the General Social Surveys for Bias Related to Household Size,
GSS Methodological Report No. 3, Chicago: NORC.