Increasing Survey Statistics Precision Using Split Questionnaire
Design: An Application of Small Area Estimation
Population Characteristics Estimation
• Introduction
• Small Area Models
• Nested Error Regression Model
• Steps of Procedure
• Measures of Comparisons
• Results
• Issue:
– A lengthy survey questionnaire:
• increases response burden
• lowers the response rate and the precision of survey statistics.
• A solution:
– Split the questionnaire into sub-questionnaires and assign each one to a group of sample units.
– Sub-samples are selected at random, so the resulting nonresponse is missing completely at random.
– The resulting nonresponse can then be imputed with common imputation methods.
• Our method:
– Design and analyze the split questionnaire using the small area estimation technique.
– The method is applicable where efficient survey estimates are required.
– The method does not produce a completed data set.
Design steps (to apply small area estimation):
i. The original questionnaire is divided into m sub-questionnaires. Some common items, used as covariates, are included in all sub-questionnaires, so all sample units respond to them.
ii. All sample units are classified with respect to a known auxiliary variable, which creates homogeneity within classes. Each class is treated as an area.
iii. The sample units in each area are randomly divided into m sub-samples. In each class, each sub-questionnaire is administered to one sub-sample.
iv. Step iii is repeated for all classes.
Note: In each class, the number of sub-questionnaires and the number of sub-samples should be equal.
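The design steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_subquestionnaires`, the toy class labels and the seed are hypothetical, and sub-samples are made as equal in size as possible by cycling through a shuffled list.

```python
import random
from collections import defaultdict

def assign_subquestionnaires(unit_classes, m, seed=0):
    """Randomly split the units of each class (area) into m sub-samples.

    unit_classes: dict mapping unit id -> class label (hypothetical input format).
    Returns a dict mapping unit id -> sub-questionnaire index in 0..m-1.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for unit, label in unit_classes.items():
        by_class[label].append(unit)

    assignment = {}
    for units in by_class.values():
        rng.shuffle(units)                     # random order within the class
        for position, unit in enumerate(units):
            assignment[unit] = position % m    # near-equal sub-sample sizes
    return assignment

# Hypothetical toy sample: 12 units in 2 classes, m = 3 sub-questionnaires.
toy_classes = {u: ("A" if u < 6 else "B") for u in range(12)}
toy_assignment = assign_subquestionnaires(toy_classes, m=3)
```

In this toy case every class ends up with all three sub-questionnaires, each answered by two units, which matches the note that the number of sub-questionnaires and sub-samples is equal in each class.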
Design steps (continued)
Figure: pattern of administering sub-questionnaires to sub-samples in the small area estimation approach.
Introduction
• Under the proposed design, the sample is not large enough to support direct estimates of adequate precision.
• Small area estimation addresses the insufficient sample size of the split questionnaire method and so improves the efficiency of survey statistics.
Small Area Models
• Area level models
– Fay-Herriot Model
– Model with Correlated Sampling Errors
– …
• Unit level models
– Nested Error Regression Model ✓
– Random Error Variance Linear Model
– …
Nested Error Regression model
(Rao 2003)
One of the common models used in small area estimation is the nested error regression model.
This model is a special case of the unit-level linear mixed model with a block-diagonal covariance structure:
$y_{ij} = x_{ij}'\beta + v_i + e_{ij}, \quad j = 1, \dots, n_i$
$x_{ij}$: a vector of auxiliary variables
$y_{ij}$: the response variable
$n_i$: the sample size in the i-th area
$\beta$: the vector of regression coefficients
$v_i$: an area-specific random effect, $v_i \sim N(0, \sigma_v^2)$
$e_{ij}$: error term, $e_{ij} \sim N(0, \sigma_e^2)$
$v_i$ and $e_{ij}$ are assumed to be independent
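To make the structure of the model concrete, here is a minimal simulation sketch using NumPy. The function name `simulate_ner` and its arguments are hypothetical. It only illustrates that one random effect is shared by all units of an area while each unit gets its own error.

```python
import numpy as np

def simulate_ner(x_by_area, beta, sigma_v2, sigma_e2, seed=0):
    """Draw responses from a nested error regression model.

    x_by_area: list of arrays, one per area, each of shape (n_i, p).
    Returns a list of response vectors, one per area.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    y_by_area = []
    for x_i in x_by_area:
        x_i = np.asarray(x_i, dtype=float)
        v_i = rng.normal(0.0, np.sqrt(sigma_v2))                     # area-specific effect
        e_i = rng.normal(0.0, np.sqrt(sigma_e2), size=x_i.shape[0])  # unit-level errors
        y_by_area.append(x_i @ beta + v_i + e_i)
    return y_by_area
```

Units in the same area are correlated through the shared effect, which is what gives the covariance matrix its block-diagonal form.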
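Nested Error Regression model (Continued)
(Rao 2003)
The empirical best linear unbiased predictor (EBLUP):
$\hat{\mu}_i = \bar{X}_i'\hat{\beta} + \hat{\gamma}_i(\bar{y}_i - \bar{x}_i'\hat{\beta})$
$\bar{X}_i$: the auxiliary population mean vector of the i-th area
$\bar{x}_i$: the auxiliary sample mean vector of the i-th area
$\bar{y}_i$: the sample-based mean of the i-th area
and $\hat{\gamma}_i = \hat{\sigma}_v^2 / (\hat{\sigma}_v^2 + \hat{\sigma}_e^2 / n_i)$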
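A small numeric sketch may help show how the EBLUP combines its parts. It assumes the standard nested error regression form, in which a shrinkage weight gamma_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i) is applied to the regression residual of the area sample mean. The function name `eblup_area_mean` is hypothetical, and the coefficient and variance estimates are taken as given rather than fitted.

```python
import numpy as np

def eblup_area_mean(y_bar, x_bar, X_bar, n_i, beta, sigma_v2, sigma_e2):
    """EBLUP of one area mean, given already-estimated model parameters."""
    x_bar = np.asarray(x_bar, dtype=float)
    X_bar = np.asarray(X_bar, dtype=float)
    beta = np.asarray(beta, dtype=float)
    gamma = sigma_v2 / (sigma_v2 + sigma_e2 / n_i)   # shrinkage weight in [0, 1)
    synthetic = X_bar @ beta                         # regression-synthetic part
    return synthetic + gamma * (y_bar - x_bar @ beta)

# Hypothetical inputs: gamma = 1 / (1 + 4/4) = 0.5
estimate = eblup_area_mean(
    y_bar=10.0, x_bar=[1.0, 2.0], X_bar=[1.0, 2.5],
    n_i=4, beta=[4.0, 2.0], sigma_v2=1.0, sigma_e2=4.0,
)
```

With a small area sample size the weight moves toward 0 and the predictor leans on the regression part. With a large one it moves toward 1 and the predictor approaches the survey regression estimate.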
Nested Error Regression model (Continued)
(Rao 2003)
The MSE of the EBLUP estimator:
Note: $V_v$ and $V_e$ are the asymptotic variances of the estimators $\hat{\sigma}_v^2$ and $\hat{\sigma}_e^2$, and $V_{ve}$ is the asymptotic covariance of $\hat{\sigma}_v^2$ and $\hat{\sigma}_e^2$.
Nested Error Regression Model (Continued)
Population mean estimation ($\bar{Y}_U$) using Nested Error Regression Model
i. The nested error regression model is used for each sub-questionnaire to compute the EBLUP of the population total for each area.
ii. Because the areas are independent of one another, the stratified sampling formula can be used for the population mean $\bar{Y}_U$ of a population of size $N$. Therefore, the estimate $\hat{\bar{Y}}_U$ takes the form:
The MSE of $\hat{\bar{Y}}_U$:
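As a hedged illustration of step ii, a stratified combination of area-level EBLUPs is usually written as below. The symbols $N_i$ (population size of area i) and $\hat{\mu}_i$ (EBLUP of the area mean, so that the area total is $N_i\hat{\mu}_i$) are notation assumed here, and the MSE line relies on the independence across areas stated in step ii.

```latex
\hat{\bar{Y}}_U = \sum_i \frac{N_i}{N}\,\hat{\mu}_i ,
\qquad
\mathrm{MSE}\bigl(\hat{\bar{Y}}_U\bigr) \approx \sum_i \Bigl(\frac{N_i}{N}\Bigr)^{2}\,\mathrm{MSE}\bigl(\hat{\mu}_i\bigr)
```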
Steps of Procedure
• Split questionnaire design:
i. Creating a questionnaire with 17 questions.
ii. Splitting the questionnaire into five different components (based on the split questionnaire design of Raghunathan and Grizzle 1995).
iii. Taking five items (which are highly correlated with the other twelve items) as the core part.
iv. Administering the core part to all sample units.
v. Assigning three items to each of the other four components in such a way that the within-component correlation is small, whereas items in different components are highly correlated.
vi. Creating 6 sub-questionnaires, each consisting of one pair of the four components plus the core part.
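The item counts in this design can be checked with a short sketch. The item and component names are hypothetical placeholders. The only inputs taken from the steps are 5 core items, 4 further components of 3 items each, and one sub-questionnaire per pair of components.

```python
from itertools import combinations

core = [f"core_{k}" for k in range(1, 6)]          # 5 core items (hypothetical names)
components = {
    name: [f"{name}_{k}" for k in range(1, 4)]     # 3 items per component
    for name in ("A", "B", "C", "D")
}

# One sub-questionnaire per unordered pair of components, plus the core part.
subquestionnaires = {
    pair: core + components[pair[0]] + components[pair[1]]
    for pair in combinations(components, 2)
}
```

Four components give C(4, 2) = 6 pairs, so there are 6 sub-questionnaires of 5 + 3 + 3 = 11 items each, and together they cover all 17 questions.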
Steps of Procedure (Continued)
• Data generation:
i. Generating 50,000 multivariate normal random vectors under the described correlation pattern.
ii. Producing a multinomial variable, strongly correlated with the other variables, as the stratification variable.
iii. Classifying the population units by the stratification variable.
iv. Selecting a simple random sample (without replacement) of fixed size n = 2000 from the population.
v. Assigning the sample units in each stratum to the 6 sub-questionnaires.
vi. Estimating the population mean of each item with two approaches: multiple imputation using the predictive mean matching method (Rubin 1987), and the small area estimation technique.
vii. Generating 1000 simulated bootstrap samples to compare the two approaches.
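Steps i to v of the data generation can be sketched with NumPy as follows. Several choices here are assumptions, because the slides do not give the actual values: an exchangeable correlation of 0.5 stands in for the described correlation pattern, the stratification variable is built from quartiles of a noisy row mean rather than a true multinomial draw, and the seed is arbitrary. Estimation and bootstrapping (steps vi and vii) are not shown.

```python
import numpy as np

rng = np.random.default_rng(2024)
N, n, p = 50_000, 2_000, 17

# Assumed placeholder correlation pattern: every pair of items correlated at 0.5.
corr = np.full((p, p), 0.5)
np.fill_diagonal(corr, 1.0)
population = rng.multivariate_normal(np.zeros(p), corr, size=N)

# Assumed stratifier: quartiles of a noisy row mean, so it correlates with all items.
score = population.mean(axis=1) + rng.normal(0.0, 0.1, size=N)
strata = np.digitize(score, np.quantile(score, [0.25, 0.5, 0.75]))  # labels 0..3

# Simple random sample without replacement.
sample_idx = rng.choice(N, size=n, replace=False)
sample_strata = strata[sample_idx]

# Within each stratum, spread the sampled units at random over the 6 sub-questionnaires.
subq = np.empty(n, dtype=int)
for s in np.unique(sample_strata):
    members = np.flatnonzero(sample_strata == s)
    subq[rng.permutation(members)] = np.arange(members.size) % 6
```

Cycling through a random permutation keeps the six sub-samples within each stratum equal in size up to one unit.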
Measures of Comparisons
• Estimated bias, where $\theta_i$ is the i-th bootstrap estimate
• Estimated absolute relative bias (EARB)
Measures of Comparisons (Continued)
• Estimated mean square error (EMSE)
• Estimated relative efficiency ($ERE_{21}$)
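The comparison measures can be computed from a set of bootstrap estimates roughly as follows. The formulas are the conventional simulation definitions and are assumed here, not taken from the slides. The direction of the efficiency ratio in particular (EMSE of method 1 divided by EMSE of method 2) is an assumption, and both function names are hypothetical.

```python
import numpy as np

def comparison_measures(estimates, true_value):
    """Bias, absolute relative bias and MSE of bootstrap estimates (assumed definitions)."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value              # mean estimate minus truth
    earb = abs(bias / true_value)                     # absolute relative bias
    emse = np.mean((estimates - true_value) ** 2)     # mean squared error
    return {"bias": bias, "earb": earb, "emse": emse}

def relative_efficiency(emse_1, emse_2):
    """Assumed convention: values above 1 mean method 2 has the smaller EMSE."""
    return emse_1 / emse_2

# Hypothetical bootstrap estimates of a quantity whose true value is 10.
toy = comparison_measures([9.0, 11.0, 10.0, 12.0], true_value=10.0)
```

For the toy input the mean estimate is 10.5, so the bias is 0.5, the absolute relative bias is 0.05, and the EMSE is (1 + 1 + 0 + 4) / 4 = 1.5.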
Results
• Estimated absolute relative bias, MSE and relative efficiency for 1000 bootstrap samples, using sample auxiliary information
(Continued)
• Absolute relative bias, MSE and relative efficiency for 1000 bootstrap samples using population auxiliary information
(Continued)
• Small area estimators mostly have lower ARB than multiple-imputation-based estimators.
• For no item did the multiple imputation approach give a smaller MSE than the small area method.
• Small area estimates are more efficient than multiple imputation estimates.
• The small area technique requires less computation than the multiple imputation method.
• The small area method does not require producing a completed data set. Hence it is more applicable where the goal is estimation of population characteristics and not the improvement of data quality.