DO FILE

clear all
version 10
set more off
capture log close
set memory 250m

* set up a global macro containing file path to data directory
*global dir1 "\\iserraid2\ConferenceData\final"
*cd M:\
cd "D:\Home\anandi\1-Courses\EC969\SL-AN\DataPrep"

* open log file
log using Week2Lecture1.log, replace

* open dataset
use Week2Lecture1, clear

// Let's see what these variables look like: What are the variable names, value
// labels, their mean, s.d., frequency distribution? Are there any missings?
describe
summ

// Are there any missing values?
tab1 sex white country memorig ivfio, missing

// What are the different samples?
tab memorig

// Look at distributions of cross-sectional respondent weights and see how they vary by sample.
tabstat xrwtuk1, stat(mean min max sd) by(memorig) longstub nototal

// Examine the variable of interest - wage
summ wage, de

// Why is this missing for some people?
tab employed if wage==., m
tab ivfio if employed==1 & wage==., m

// Compute unweighted mean, standard errors and confidence interval for wage.
summ wage
ci wage

// Compute weighted mean, standard deviation, standard error and confidence interval for wage.
summ wage [aweight=xrwtuk1]
ci wage [aweight=xrwtuk1]

// What happens if you use -pweight- instead?
capture noisily summ wage [pweight=xrwtuk1]
capture noisily ci wage [pweight=xrwtuk1]


// Compute weighted mean, standard errors, confidence interval and standard deviation for wage, correctly informing Stata that the weights are probability weights
// but again not correcting for the sample design, i.e. assuming that the sample is a simple random sample. This will produce correct UK population estimate of mean
// hourly wage but the standard error of the estimate will be incorrect as the BHPS is a stratified and clustered sample.

// First inform Stata about the design variables and then compute the weighted means etc.
svyset [pweight = xrwtuk1]
svy: mean wage
estat sd

// What happens if you use aweight instead?
capture noisily svyset [aweight = xrwtuk1]

// Compute weighted mean, standard errors and confidence interval for wage after informing Stata of the correct sample design.
// First inform Stata about the design variables (but before doing that, remember to clear Stata’s memory of any existing design information) and then compute the weighted means etc.
svyset, clear
svyset [pweight = xrwtuk1], psu(psu) strata(strata)
svy: mean wage

// This returns the mean wage, but not its standard error or confidence interval. Find out why.
svydes

// You will find that there is a stratum (-8) with just 1 unit (psu) within it. Which region or sample is that?
tab1 memorig if strata==-8

// Exclude that sample from the analysis
svy: mean wage if memorig ~= 7

// Compute the different weighted and unweighted mean wage for the different countries (England, Scotland, Wales and Northern Ireland)
tab country memorig, missing // (optional) How does country compare with memorig?

// Look at distributions of (cross-section) RESPONDENT weights and see how these vary by country of residence (not by sample):
tabstat xrwtuk1, stat(mean min max sd count) by(country) longstub nototal

// Compute the unweighted mean wage for each country.
bysort country: ci wage

// Drop Northern Ireland sub-sample
drop if memorig==7

// Drop missing country cases
drop if country==.

// Compute the weighted mean of wage for each country after telling Stata that the weights are probability weights and correcting for sample design.
svyset [pweight = xrwtuk1], psu (psu) strata (strata)

** Use the if option
svy: mean wage if country==1
svy: mean wage if country==2
svy: mean wage if country==3

** Use the subpop option
svy, subpop(if country==1): mean wage
svy, subpop(if country==2): mean wage
svy, subpop(if country==3): mean wage

** Use the over option
svy: mean wage, over(country)

// Compute the weighted mean of wage for men and women in the three countries
** Use the over option
svy: mean wage, over(country sex)

** Use the subpop option
svy, subpop(if country==1 & sex==1): mean wage
svy, subpop(if country==1 & sex==2): mean wage
svy, subpop(if country==2 & sex==1): mean wage
svy, subpop(if country==2 & sex==2): mean wage
svy, subpop(if country==3 & sex==1): mean wage
svy, subpop(if country==3 & sex==2): mean wage

// Test differences in pay across the different countries.
svy: mean wage, over(country)
test [wage]England = [wage]Scotland = [wage]Wales

// Test gender differences in pay across the different countries.
svy: mean wage, over(country sex)
test [wage]_subpop_1=[wage]_subpop_2
test [wage]_subpop_3=[wage]_subpop_4
test [wage]_subpop_5=[wage]_subpop_6

// Compute design effects and design factor
quietly svy: mean wage
estat effects, deff deft

// [Optional] Plot the weighted mean and the confidence interval using the user-written
// command -ciplot-. Use -findit ciplot- to locate and install it.
ciplot wage, by(country) saving(graph1, replace)
ciplot wage [aw=xrwtuk1], by(country) saving(graph2, replace)

** Including Northern Ireland
use Week2Lecture1, clear
replace psu=hid if memorig==7   // use each Northern Ireland household as its own PSU
svyset [pweight = xrwtuk1], psu (psu) strata (strata)
svy: mean wage, over(country)

log close
exit

LOG FILE
------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  D:\Home\anandi\1-Courses\EC969\SL-AN\DataPrep\Week2Lecture1.log
  log type:  text
 opened on:   1 Mar 2011, 15:23:40

. 
. 
. * open dataset
. use Week2Lecture1, clear

. 
. // Let's see what these variables look like: What are the variable names, value
. // labels, their mean, s.d., frequency distribution? Are there any missings?
. 
. describe

Contains data from Week2Lecture1.dta
  obs:        14,897
 vars:            56                          24 Feb 2011 15:53
 size:     1,564,185 (99.4% of memory free)
------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------------------------
pid             long   %12.0g                 cross-wave person identifier
sex             byte   %8.0g       sex        sex
dobm            byte   %8.0g       dobm       month of birth
doby            int    %8.0g       doby       year of birth
memorig         byte   %8.0g       memorig    sample origin
racel           byte   %8.0g       racel      ethnic group membership (long version)
hid             long   %12.0g                 household identification number
pno             byte   %8.0g                  person number
jbstat          byte   %8.0g       mjbstat    current economic activity
hlprb           byte   %8.0g       mhlprb     health problems: none
hlprba          byte   %8.0g       mhlprba    health problems: arms, legs, hands, etc
hlprbb          byte   %8.0g       mhlprbb    health problems: sight
hlprbc          byte   %8.0g       mhlprbc    health problems: hearing
hlprbd          byte   %8.0g       mhlprbd    health problems: skin conditions/allergy
hlprbe          byte   %8.0g       mhlprbe    health problems: chest/breathing
hlprbf          byte   %8.0g       mhlprbf    health problems: heart/blood pressure
hlprbg          byte   %8.0g       mhlprbg    health problems: stomach or digestion
hlprbh          byte   %8.0g       mhlprbh    health problems: diabetes
hlprbi          byte   %8.0g       mhlprbi    health problems: anxiety, depression, et
hlprbj          byte   %8.0g       mhlprbj    health problems: alcohol or drugs
hlprbk          byte   %8.0g       mhlprbk    health problems: epilepsy
hlprbl          byte   %8.0g       mhlprbl    health problems: migraine
hlprbn          byte   %8.0g       mhlprbn    health problems: cancer
hlprbo          byte   %8.0g       mhlprbo    health problems: stroke
hlprbm          byte   %8.0g       mhlprbm    health problems: other
jbhas           byte   %8.0g       mjbhas     did paid work last week
jboff           byte   %8.0g       mjboff     no work last week but has job
jbsemp          byte   %8.0g       mjbsemp    employee or self-employed: current job
jbhrs           byte   %8.0g       mjbhrs     no. of hours normally worked per week
ivfio           byte   %8.0g       mivfio     individual interview outcome
mastat          byte   %8.0g       mmastat    marital status
age             byte   %8.0g       mage       age at date of interview
nchild          byte   %8.0g       mnchild    number of own children in household
region          byte   %8.0g       mregion    region / metropolitan area
qfedhi          byte   %8.0g       mqfedhi    highest educational qualification
paygu           float  %9.0g       mpaygu     usual gross pay per month: current job
xrwght          float  %9.0g                  x-sectional respondent weight
xrwtuk1         float  %9.0g                  x-sect'l resp. weight inc new samples
xrwtuk2         float  %9.0g                  x-sect'l resp. weight within uk estimate
nch02           byte   %8.0g       mnch02     number children in household aged 0-2
nch34           byte   %8.0g       mnch34     number children in household aged 3-4
nch511          byte   %8.0g       mnch511    number children in household aged 5-11
nch1215         byte   %8.0g       mnch1215   number children in household aged 12-15
nch1618         byte   %8.0g       mnch1618   number dependent children in hh 16+
strata          int    %8.0g                  stratification class
psu             int    %8.0g                  primary sampling unit
_merge          byte   %8.0g
wage            float  %9.0g                  usual hourly wage
employed        float  %9.0g                  whether in paid employment last week
youngchildren   float  %9.0g                  If children <5 yrs in HH
england         float  %9.0g
wales           float  %9.0g
scotland        float  %9.0g
N_Ireland       float  %9.0g
country         byte   %16.0g      country    countries of UK
white           float  %9.0g                  Ethnicity: White
------------------------------------------------------------------------------------------------
Sorted by:

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         pid |     14897    6.44e+07    4.63e+07   1.00e+07   1.40e+08
         sex |     14897    1.558569    .4965745          1          2
        dobm |     14897    6.458884    3.420427         -9         12
        doby |     14897    1956.784    24.89484         -9       1988
     memorig |     14897    3.361952    2.586713          1          7
-------------+--------------------------------------------------------
       racel |     14897    1.575955    3.005079         -8         18
         hid |     14897    1.34e+07    294516.3   1.30e+07   1.39e+07
         pno |     14897    1.673021     .901808          1         11
      jbstat |     14897    3.455461    1.948806          1         10
       hlprb |     14897    -4.86749    3.904957         -9          0
-------------+--------------------------------------------------------
      hlprba |     14897     .206216    .9160706         -9          1
      hlprbb |     14897   -.0216151    .8083313         -9          1
      hlprbc |     14897    .0098678    .8278009         -9          1
      hlprbd |     14897    .0429617    .8465235         -9          1
      hlprbe |     14897    .0632342    .8571598         -9          1
-------------+--------------------------------------------------------
      hlprbf |     14897    .1041149    .8767922         -9          1
      hlprbg |     14897    .0072498    .8262452         -9          1
      hlprbh |     14897   -.0348392    .7996424         -9          1
      hlprbi |     14897     .012083    .8291086         -9          1
      hlprbj |     14897    -.068403     .776142         -9          1
-------------+--------------------------------------------------------
      hlprbk |     14897    -.066255    .7777107         -9          1
      hlprbl |     14897     .001074    .8225304         -9          1
      hlprbn |     14897   -.0598107    .7823625         -9          1
      hlprbo |     14897    -.062093    .7807242         -9          1
      hlprbm |     14897   -.0277908    .8043123         -9          1
-------------+--------------------------------------------------------
       jbhas |     14897    1.456333     .499721         -1          2
       jboff |     14897   -3.455729    4.955244         -8          3
      jbsemp |     14897   -2.849634    4.452904         -8          1
       jbhrs |     14897    15.73673    22.22558         -8         99
       ivfio |     14897    1.086326    .3776966          1          3
-------------+--------------------------------------------------------
      mastat |     14897    2.576626    2.029402         -1          6
         age |     14897     45.8305    18.98171         15         99
      nchild |     14897    .4921125    .9107354          0          7
      region |     14897    12.50064    6.846798         -9         19
      qfedhi |     14897    5.710143    5.060525         -9         13
-------------+--------------------------------------------------------
       paygu |     14897    781.0079    1095.092         -8   29794.92
      xrwght |     14897     .502307    .5843616          0        2.5
     xrwtuk1 |     14897    .9690967    .9195383          0   5.054481
     xrwtuk2 |     14897    .9369517    .5081068          0   16.24493
       nch02 |     14897    .0614218    .2486266          0          2
-------------+--------------------------------------------------------
       nch34 |     14897    .0659193    .2584848          0          2
      nch511 |     14897    .2438075    .5782308          0          5
     nch1215 |     14897    .1784252    .4657831          0          3
     nch1618 |     14897    .0649795    .2613045          0          2
      strata |     14897    54.52937    51.95504         -8        151
-------------+--------------------------------------------------------
         psu |     14897    210.3743     194.614         -8        575
      _merge |     14897           3           0          3          3
        wage |      8038    9.855252    7.044385    .251938   238.9328
    employed |     14897    .5722629    .4947671          0          1
youngchild~n |     14897    .1190173    .3790271          0          3
-------------+--------------------------------------------------------
     england |     14897    .4551252    .4979989          0          1
       wales |     14897    .1727865    .3780753          0          1
    scotland |     14897    .1805733    .3846771          0          1
   N_Ireland |     14897    .1753373    .3802681          0          1
     country |     14656    2.077374    1.163245          1          4
-------------+--------------------------------------------------------
       white |     14106    .9762512    .1522708          0          1

. 
. // Are there any missing values?
. tab1 sex white country memorig ivfio, missing

-> tabulation of sex

            sex |      Freq.     Percent        Cum.
----------------+-----------------------------------
           male |      6,576       44.14       44.14
         female |      8,321       55.86      100.00
----------------+-----------------------------------
          Total |     14,897      100.00

-> tabulation of white

 Ethnicity: |
      White |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        335        2.25        2.25
          1 |     13,771       92.44       94.69
          . |        791        5.31      100.00
------------+-----------------------------------
      Total |     14,897      100.00

-> tabulation of country

 countries of UK |      Freq.     Percent        Cum.
-----------------+-----------------------------------
         England |      6,780       45.51       45.51
           Wales |      2,574       17.28       62.79
        Scotland |      2,690       18.06       80.85
Northern Ireland |      2,612       17.53       98.38
               . |        241        1.62      100.00
-----------------+-----------------------------------
           Total |     14,897      100.00

-> tabulation of memorig

          sample origin |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        original sample |      7,941       53.31       53.31
       wales new sample |      2,206       14.81       68.11
    scotland new sample |      2,138       14.35       82.47
        n.i. new sample |      2,612       17.53      100.00
------------------------+-----------------------------------
                  Total |     14,897      100.00

-> tabulation of ivfio

   individual interview outcome |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                 full interview |     14,086       94.56       94.56
                proxy interview |        336        2.26       96.81
                telephone intvw |        475        3.19      100.00
--------------------------------+-----------------------------------
                          Total |     14,897      100.00

. 
. // What are the different samples?
. tab memorig

          sample origin |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        original sample |      7,941       53.31       53.31
       wales new sample |      2,206       14.81       68.11
    scotland new sample |      2,138       14.35       82.47
        n.i. new sample |      2,612       17.53      100.00
------------------------+-----------------------------------
                  Total |     14,897      100.00

. 
. // Look at distributions of cross-sectional respondent weights and see how they vary by sample.
. tabstat xrwtuk1, stat(mean min max sd) by(memorig) longstub nototal

memorig           variable |      mean       min       max        sd
---------------------------+----------------------------------------
original sample    xrwtuk1 |  1.570268         0  3.025512  .8695618
wales new sample   xrwtuk1 |  .2765412         0  5.054481  .2173516
scotland new sam   xrwtuk1 |  .4626941         0  3.753553  .3132991
n.i. new sample    xrwtuk1 |  .1408289         0  .7508973  .0602463
--------------------------------------------------------------------

. 
. // Examine the variable of interest - wage
. summ wage, de

                      usual hourly wage
-------------------------------------------------------------
      Percentiles      Smallest
 1%      2.11209        .251938
 5%     3.979806        .503876
10%     4.534883        .503876       Obs                8038
25%     5.818429       .5436955       Sum of Wgt.        8038

50%     8.169528                      Mean           9.855252
                        Largest       Std. Dev.      7.044385
75%     12.05271       116.3686
90%     17.00976        117.632       Variance       49.62336
95%     20.68929       152.9346       Skewness       8.247111
99%     31.97686       238.9328       Kurtosis       188.2287

. 
. // Why is this missing for some people?
. tab employed if wage==., m

 whether in |
       paid |
 employment |
  last week |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      6,372       92.90       92.90
          1 |        487        7.10      100.00
------------+-----------------------------------
      Total |      6,859      100.00

. tab ivfio if employed==1 & wage==., m

   individual interview outcome |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                proxy interview |        183       37.58       37.58
                telephone intvw |        304       62.42      100.00
--------------------------------+-----------------------------------
                          Total |        487      100.00

. 
. // Compute unweighted mean, standard errors and confidence interval for wage.
. summ wage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |      8038    9.855252    7.044385    .251938   238.9328

. ci wage

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       8038    9.855252    .0785722         9.70123    10.00927

. 
. // Compute weighted mean, standard deviation, standard error and confidence interval for wage.
. summ wage [aweight=xrwtuk1]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        wage |    7924  8293.86803    10.28872   7.357536    .251938   238.9328

. ci wage [aweight=xrwtuk1]

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       7924    10.28872    .0826533         10.1267    10.45074

. 
. // What happens if you use -pweight- instead?
. capture noisily summ wage [pweight=xrwtuk1]
pweight not allowed

. capture noisily ci wage [pweight=xrwtuk1]
pweight not allowed

. 
. 
. // Compute weighted mean, standard errors, confidence interval and standard deviation for wage, correctly informing Stata that the weights are
> probability weights
. // but again not correcting for the sample design, i.e. assuming that the sample is a simple random sample. This will produce correct UK popu
> lation estimate of mean
. // hourly wage but the standard error of the estimate will be incorrect as the BHPS is a stratified and clustered sample.
. 
. // First inform Stata about the design variables and then compute the weighted means etc.
. svyset [pweight = xrwtuk1]

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy: mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     8926
Number of PSUs   =    8926          Population size  =  8293.87
                                    Design df        =     8925

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.28872   .1063986      10.08015    10.49729
--------------------------------------------------------------

. estat sd

-------------------------------------
             |       Mean   Std. Dev.
-------------+-----------------------
        wage |   10.28872   7.357536
-------------------------------------

. 
. // What happens if you use aweight instead?
. capture noisily svyset [aweight = lxrwtuk1]
aweight not allowed

. 
. // Compute weighted mean, standard errors and confidence interval for wage after informing Stata of the correct sample design.
. // First inform Stata about the design variables (but before doing that, remember to clear Stata’s memory of any existing design information)
> and then compute the weighted means etc.
. svyset, clear

. svyset [pweight = xrwtuk1], psu(psu) strata(strata)

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>
. svy: mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     121          Number of obs    =     8926
Number of PSUs   =     399          Population size  =  8293.87
                                    Design df        =      278

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.28872          .             .           .
--------------------------------------------------------------
Note: missing standard error because of stratum with single sampling unit.

. 
. // This returns the mean wage, but not its standard error or confidence interval. Find out why.
. svydes

Survey: Describing stage 1 sampling units

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

                                          #Obs per Unit
                                ------------------------------
 Stratum    #Units      #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
      -8        1*      2612      2612    2612.0      2612
       1         3        63        14      21.0        29
       2         3        57        17      19.0        23
       3         4        68         6      17.0        30
       4         3        53         8      17.7        28
       5         3        60        14      20.0        24
       6         3       105        33      35.0        37
       7         3        73        20      24.3        30
       8         3       103        31      34.3        37
       9         3        49         7      16.3        21
      10         3        88        16      29.3        40
      11         8       225        15      28.1        39
      12         8       256        14      32.0        51
      13         8       293        24      36.6        46
      14         8       278        24      34.8        50
      15         8       204         8      25.5        35
      16         8       241        18      30.1        51
      17         3        89        22      29.7        34
      18         4       168        32      42.0        48
      19         3       110        25      36.7        44
      20         4       117        16      29.3        41
      21         3       110        35      36.7        40
      22         4       162        30      40.5        50
      23         2        78        38      39.0        40
      24         2        91        43      45.5        48
      25         3        94        24      31.3        38
      26         2        98        45      49.0        53
      27         3       104        32      34.7        36
      28         3        88        27      29.3        32
      29         3       116        21      38.7        53
      30         3       120        27      40.0        55
      31         3       126        20      42.0        60
      32         3       110        35      36.7        39
      33         3        72        22      24.0        27
      34         3        71        16      23.7        33
      35         3        74        19      24.7        28
      36         3        55        15      18.3        21
      37         2        72        36      36.0        36
      38         3        73        22      24.3        27
      39         3       149        39      49.7        60
      40         3       108        32      36.0        39
      41         3        97        28      32.3        37
      42         3        89        23      29.7        38
      43         3        83        20      27.7        37
      44         3        51        10      17.0        23
      45         3        80        26      26.7        27
      46         3       119        33      39.7        50
      47         3        87        20      29.0        36
      48         3       112        35      37.3        41
      49         2        76        37      38.0        39
      50         3        98        21      32.7        39
      51         3       106        20      35.3        55
      52         3       125        35      41.7        51
      53         2        73        27      36.5        46
      54         3        92        26      30.7        38
      55         2        57        28      28.5        29
      56         2        48        13      24.0        35
      57         2        60        23      30.0        37
      58         2        69        30      34.5        39
      59         3       132        34      44.0        52
      60         3       115        29      38.3        53
      61         3        80        21      26.7        35
      62         2        70        27      35.0        43
      63         2        80        33      40.0        47
      64         2        87        38      43.5        49
      65         3       126        38      42.0        47
      66         3        77        22      25.7        32
      67         3        74        15      24.7        34
      68         3       123        17      41.0        57
      69         3       130        41      43.3        45
      70         4       128        17      32.0        44
      71         4       105        20      26.3        33
      72         3       102        27      34.0        41
      73         4       107        14      26.8        42
      74         3        79         7      26.3        42
      75         4       133        19      33.3        50
     101         3        78        20      26.0        34
     102         5       116        19      23.2        33
     103         4        93         7      23.3        30
     104         4       133        28      33.3        40
     105         3       115        36      38.3        41
     106         4       118        26      29.5        35
     107         3        92        28      30.7        35
     108         4       112        19      28.0        35
     109         3        92        25      30.7        40
     110         3        84        23      28.0        32
     111         3        90        24      30.0        38
     112         4       130        20      32.5        45
     113         3        66        15      22.0        26
     114         3        83        20      27.7        34
     115         2        67        30      33.5        37
     116         3        54        16      18.0        21
     117         3        92        25      30.7        37
     118         2        66        26      33.0        40
     119         3        92        19      30.7        39
     120         3        94        19      31.3        39
     121         2        58        27      29.0        31
     122         5       168        21      33.6        40
     123         2        65        29      32.5        36
     124         3       105        31      35.0        38
     131         3        87        26      29.0        34
     132         2        56        28      28.0        28
     133         4        81         9      20.3        26
     134         3       122        22      40.7        63
     135         3        85        26      28.3        30
     136         4        86        18      21.5        26
     137         4       111         5      27.8        44
     138         3        60        18      20.0        23
     139         4        90        15      22.5        34
     140         4        99        16      24.8        39
     141         3       108        32      36.0        39
     142         4       125        24      31.3        37
     143         3        82        21      27.3        35
     144         4       105        21      26.3        35
     145         3       105        32      35.0        37
     146         4       124        27      31.0        37
     147         3        97        26      32.3        40
     148         4       119        20      29.8        43
     149         3       138        41      46.0        54
     150         4       116        22      29.0        43
     151         3        85        21      28.3        37
--------  --------  --------  --------  --------  --------
     121       400     14897         5      37.2      2612

. 
. // You will find that there is a stratum (-8) with just 1 unit (psu) within it. Which region or sample is that?
. tab1 memorig if strata==-8

-> tabulation of memorig if strata==-8

          sample origin |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        n.i. new sample |      2,612      100.00      100.00
------------------------+-----------------------------------
                  Total |      2,612      100.00

. 
. // Exclude that sample from the analysis
. svy: mean wage if memorig ~= 7
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =     7527
Number of PSUs   =     398          Population size  =  8101.44
                                    Design df        =      278

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.31203   .1224545      10.07097    10.55308
--------------------------------------------------------------

. 
. 
. 
. 
. // Compute the different weighted and unweighted mean wage for the different countries (England, Scotland, Wales and Northern Ireland)
. tab country memorig, missing // (optional) How does country compare with memorig?

                 |                  sample origin
 countries of UK |  original  wales new   scotland   n.i. new |     Total
-----------------+--------------------------------------------+----------
         England |     6,697         58         25          0 |     6,780
           Wales |       444      2,127          3          0 |     2,574
        Scotland |       634          0      2,056          0 |     2,690
Northern Ireland |         0          0          0      2,612 |     2,612
               . |       166         21         54          0 |       241
-----------------+--------------------------------------------+----------
           Total |     7,941      2,206      2,138      2,612 |    14,897

. 
. // Look at distributions of (cross-section) RESPONDENT weights and see how these vary by country of residence (not by sample):
. tabstat xrwtuk1, stat(mean min max sd count) by(country) longstub nototal

country           variable |      mean       min       max        sd         N
---------------------------+--------------------------------------------------
England            xrwtuk1 |  1.750575         0  5.054481  .7905738      6780
Wales              xrwtuk1 |  .2686751         0  2.653648   .156446      2574
Scotland           xrwtuk1 |   .447012         0  1.837523  .2589535      2690
Northern Ireland   xrwtuk1 |  .1408289         0  .7508973  .0602463      2612
------------------------------------------------------------------------------

. 
. // Compute the unweighted mean wage for each country.
. bysort country: ci wage

------------------------------------------------------------------------------------------------
-> country = England

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       3889    10.36125    .1262336        10.11376    10.60874

------------------------------------------------------------------------------------------------
-> country = Wales

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       1208    8.707197    .1525756        8.407855     9.00654

------------------------------------------------------------------------------------------------
-> country = Scotland

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       1451     9.79849    .1594425        9.485727    10.11125

------------------------------------------------------------------------------------------------
-> country = Northern Ireland

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |       1324    9.330372    .1879657        8.961628    9.699115

------------------------------------------------------------------------------------------------
-> country = .

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        wage |        166    11.03793    .4429368        10.16338    11.91249

. 
. // Drop Northern Ireland sub-sample
. drop if memorig==7
(2612 observations deleted)

. 
. // Drop missing country cases
. drop if country==.
(241 observations deleted)

. 
. 
. 
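Dropping the Northern Ireland sample is one way around a stratum with a single PSU. The do-file's final block takes another, giving each Northern Ireland household its own PSU with replace psu=hid if memorig==7. A third route is the -singleunit()- setting of -svyset-, which the log reports as "Single unit: missing". That is the default, and it is why the standard error came back missing. Below is a minimal sketch on simulated data; the seed, sample sizes and strata/PSU layout are hypothetical, not BHPS values. -singleunit(centered)- is only one of the documented choices (missing, certainty, scaled, centered), and whether any of them suits a particular survey is a substantive judgement, not a default to copy.

```stata
* Hypothetical simulated data: 3 strata of 200 obs; stratum 3 has a single PSU
clear
set seed 12345
set obs 600
generate strata = ceil(_n/200)          // strata 1, 2, 3
generate psu    = ceil(_n/50)           // PSUs 1-12, 50 obs each
replace  psu    = 99 if strata == 3     // collapse stratum 3 into one PSU
generate wt     = 1 + runiform()        // illustrative probability weights
generate wage   = 8 + 2*strata + rnormal(0, 3)

* Default singleunit(missing): the standard error is reported as missing
svyset psu [pweight = wt], strata(strata)
svy: mean wage

* singleunit(centered): the lone PSU is centred on the grand mean, so an SE is computed
svyset psu [pweight = wt], strata(strata) singleunit(centered)
svy: mean wage
display "Linearized SE with singleunit(centered): " _se[wage]
```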
. // Compute the weighted mean of wage for each country after telling Stata that the weights are probability weights and correcting for sample d
> esign.
. svyset [pweight = xrwtuk1], psu (psu) strata (strata)

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. 
. ** Use the if option
. svy: mean wage if country==1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      94          Number of obs    =     4255
Number of PSUs   =     257          Population size  =  6836.02
                                    Design df        =      163

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.40639          .             .           .
--------------------------------------------------------------
Note: 4 strata omitted because they contain no population members.
Note: missing standard error because of stratum with single sampling unit.

. svy: mean wage if country==2
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      49          Number of obs    =     1439
Number of PSUs   =     111          Population size  =  351.385
                                    Design df        =       62

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |    8.81579          .             .           .
--------------------------------------------------------------
Note: 1 stratum omitted because it contains no population members.
Note: missing standard error because of stratum with single sampling unit.

. svy: mean wage if country==3
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      33          Number of obs    =     1641
Number of PSUs   =     100          Population size  =  691.216
                                    Design df        =       67

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.738896          .             .           .
--------------------------------------------------------------
Note: 1 stratum omitted because it contains no population members.
Note: missing standard error because of stratum with single sampling unit.

. 
. ** Use the subpop option
. svy, subpop(if country==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      94          Number of obs    =     7136
Number of PSUs   =     314          Population size  =  7833.91
                                    Subpop. no. obs  =     3821
                                    Subpop. size     =  6836.02
                                    Design df        =      220

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.40639   .1409328      10.12864    10.68414
--------------------------------------------------------------
Note: 26 strata omitted because they contain no subpopulation members.

. svy, subpop(if country==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      49          Number of obs    =     3882
Number of PSUs   =     163          Population size  =  4076.64
                                    Subpop. no. obs  =     1182
                                    Subpop. size     =  351.385
                                    Design df        =      114

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |    8.81579   .1765602      8.466026    9.165555
--------------------------------------------------------------
Note: 71 strata omitted because they contain no subpopulation members.

. svy, subpop(if country==3): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      33          Number of obs    =     2266
Number of PSUs   =     112          Population size  =  1607.04
                                    Subpop. no. obs  =     1433
                                    Subpop. size     =  691.216
                                    Design df        =       79

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.738896    .222134      9.296749    10.18104
--------------------------------------------------------------
Note: 87 strata omitted because they contain no subpopulation members.

. 
. ** Use the over option
. svy: mean wage, over(country)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =     7346
Number of PSUs   =     398          Population size  =  7878.62
                                    Design df        =      278

  England: country = England
    Wales: country = Wales
 Scotland: country = Scotland

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
     England |   10.40639   .1409328      10.12896    10.68382
       Wales |    8.81579   .1765602      8.468226    9.163355
    Scotland |   9.738896    .222134      9.301618    10.17617
--------------------------------------------------------------

. 
. 
. // Compute the weighted mean of wage for men and women in the three countries
. ** Use the over option
. svy: mean wage, over(country sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120          Number of obs    =     7346
Number of PSUs   =     398          Population size  =  7878.62
                                    Design df        =      278

         Over: country sex
    _subpop_1: England male
    _subpop_2: England female
    _subpop_3: Wales male
    _subpop_4: Wales female
    _subpop_5: Scotland male
    _subpop_6: Scotland female

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
   _subpop_1 |   11.66497    .209374      11.25281    12.07713
   _subpop_2 |   9.133208   .1466152      8.844591    9.421825
   _subpop_3 |   9.775749    .253263      9.277192    10.27431
   _subpop_4 |   7.960065   .1906461      7.584771    8.335358
   _subpop_5 |   10.86205   .3361621       10.2003    11.52379
   _subpop_6 |   8.702004   .2262984      8.256528     9.14748
--------------------------------------------------------------

. 
. ** Use the subpop option
. svy, subpop(if country==1 & sex==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      86          Number of obs    =     7847
Number of PSUs   =     288          Population size  =  10641.1
                                    Subpop. no. obs  =     1855
                                    Subpop. size     =  3437.72
                                    Design df        =      202

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   11.66497    .209374      11.25213     12.0778
--------------------------------------------------------------
Note: 34 strata omitted because they contain no subpopulation members.

. svy, subpop(if country==1 & sex==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      87          Number of obs    =     7424
Number of PSUs   =     289          Population size  =   9552.9
                                    Subpop. no. obs  =     1966
                                    Subpop. size     =  3398.29
                                    Design df        =      202

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.133208   .1466152      8.844116    9.422301
--------------------------------------------------------------
Note: 33 strata omitted because they contain no subpopulation members.

. svy, subpop(if country==2 & sex==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      44          Number of obs    =     4046
Number of PSUs   =     148          Population size  =  3585.01
                                    Subpop. no. obs  =      539
                                    Subpop. size     =  165.607
                                    Design df        =      104

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   9.775749    .253263      9.273519    10.27798
--------------------------------------------------------------
Note: 76 strata omitted because they contain no subpopulation members.
. svy, subpop(if country==2 & sex==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      47        Number of obs    =      4195
Number of PSUs   =     158        Population size  =   3977.85
                                  Subpop. no. obs  =       643
                                  Subpop. size     =   185.779
                                  Design df        =       111

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   7.960065   .1906461      7.582287    8.337842
--------------------------------------------------------------
Note: 73 strata omitted because they contain no subpopulation members.

. svy, subpop(if country==3 & sex==1): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      32        Number of obs    =      2788
Number of PSUs   =     110        Population size  =    1735.1
                                  Subpop. no. obs  =       669
                                  Subpop. size     =   331.806
                                  Design df        =        78

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   10.86205   .3360138      10.19309      11.531
--------------------------------------------------------------
Note: 88 strata omitted because they contain no subpopulation members.

. svy, subpop(if country==3 & sex==2): mean wage
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      30        Number of obs    =      2419
Number of PSUs   =     105        Population size  =   1339.54
                                  Subpop. no. obs  =       764
                                  Subpop. size     =   359.409
                                  Design df        =        75

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        wage |   8.702004   .2261375      8.251515    9.152493
--------------------------------------------------------------
Note: 90 strata omitted because they contain no subpopulation members.

.
. // Test differences in pay across the different countries.
. svy: mean wage, over(country)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120        Number of obs    =      7346
Number of PSUs   =     398        Population size  =   7878.62
                                  Design df        =       278

     England: country = England
       Wales: country = Wales
    Scotland: country = Scotland

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
     England |   10.40639   .1409328      10.12896    10.68382
       Wales |    8.81579   .1765602      8.468226    9.163355
    Scotland |   9.738896    .222134      9.301618    10.17617
--------------------------------------------------------------

. test [wage]England = [wage]Scotland = [wage]Wales

Adjusted Wald test

 ( 1)  [wage]England - [wage]Scotland = 0
 ( 2)  [wage]England - [wage]Wales = 0

       F(  2,   277) =   24.07
            Prob > F =    0.0000

.
. // Test gender differences in pay within each country.
. svy: mean wage, over(country sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     120        Number of obs    =      7346
Number of PSUs   =     398        Population size  =   7878.62
                                  Design df        =       278

        Over: country sex
   _subpop_1: England male
   _subpop_2: England female
   _subpop_3: Wales male
   _subpop_4: Wales female
   _subpop_5: Scotland male
   _subpop_6: Scotland female

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
   _subpop_1 |   11.66497    .209374      11.25281    12.07713
   _subpop_2 |   9.133208   .1466152      8.844591    9.421825
   _subpop_3 |   9.775749    .253263      9.277192    10.27431
   _subpop_4 |   7.960065   .1906461      7.584771    8.335358
   _subpop_5 |   10.86205   .3361621       10.2003    11.52379
   _subpop_6 |   8.702004   .2262984      8.256528     9.14748
--------------------------------------------------------------

. test [wage]_subpop_1=[wage]_subpop_2

Adjusted Wald test

 ( 1)  [wage]_subpop_1 - [wage]_subpop_2 = 0

       F(  1,   278) =  122.84
            Prob > F =    0.0000
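The F statistics above come from Stata's adjusted Wald test. As a sketch, assuming the standard adjustment in which a Wald statistic with k restrictions is referred to an F distribution on k and d - k + 1 degrees of freedom (d = 278, the design df), the denominator df is 277 for the two-restriction country test and 278 for each one-restriction gender test, as logged. Ftail() recovers the exact p-values that the log rounds to 0.0000. The macro names are illustrative.

```stata
* Adjusted Wald F test: denominator df = d - k + 1
* (d = design df, k = number of restrictions tested)
local d = 278
foreach k in 2 1 {
    display "k = `k' restriction(s):  F(`k', " `d' - `k' + 1 ")"
}

* Exact p-values behind the "Prob > F = 0.0000" lines above
display Ftail(2, 277, 24.07)
display Ftail(1, 278, 122.84)
```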
. test [wage]_subpop_3=[wage]_subpop_4

Adjusted Wald test

 ( 1)  [wage]_subpop_3 - [wage]_subpop_4 = 0

       F(  1,   278) =   45.08
            Prob > F =    0.0000

. test [wage]_subpop_5=[wage]_subpop_6

Adjusted Wald test

 ( 1)  [wage]_subpop_5 - [wage]_subpop_6 = 0

       F(  1,   278) =   35.44
            Prob > F =    0.0000

.
.
. // Compute the design effect (DEFF) and design factor (DEFT)
. quietly svy: mean wage
. estat effects, deff deft

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       DEFF      DEFT
-------------+--------------------------------------------
        wage |   10.27689   .1249507     1.84038    1.35661
----------------------------------------------------------

.
.
. // [Optional] Plot the unweighted and weighted means with their confidence
. // intervals using the user-written command -ciplot-. Use -findit- to locate and install it
. ciplot wage, by(country) saving(graph1, replace)
(file graph1.gph saved)

. ciplot wage [aw=xrwtuk1], by(country) saving(graph2, replace)
(file graph2.gph saved)

.
. ** Including Northern Ireland
. use Week2Lecture1, clear

. replace psu=hid if memorig==7
psu was int now long
(2612 real changes made)

. svyset [pweight = xrwtuk1], psu (psu) strata (strata)

      pweight: xrwtuk1
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

. svy: mean wage, over(country)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =     121        Number of obs    =      8745
Number of PSUs   =    1301        Population size  =   8071.04
                                  Design df        =      1180

     England: country = England
       Wales: country = Wales
    Scotland: country = Scotland
   _subpop_4: country = Northern Ireland

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
wage         |
     England |   10.40639   .1409328      10.12988     10.6829
       Wales |    8.81579   .1765602      8.469383    9.162197
    Scotland |   9.738896    .222134      9.303074    10.17472
   _subpop_4 |   9.307484   .1992926      8.916477    9.698492
--------------------------------------------------------------

.
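Two arithmetic checks on the output above, with values copied from the log (the macro names are illustrative). First, the reported DEFT (1.35661) is, to rounding, the square root of DEFF (1.84038), and dividing the linearized standard error by DEFT gives approximately the standard error the overall mean would have under simple random sampling. Second, once the Northern Ireland PSUs are replaced by household identifiers, the run has 1,301 PSUs in 121 strata, so the design df (PSUs minus strata) rises from 278 to 1,180. That is why the England, Wales and Scotland confidence limits shift slightly even though their means and standard errors are unchanged.

```stata
* Values copied from the -estat effects, deff deft- output
local se   = .1249507
local deff = 1.84038
local deft = 1.35661

* DEFT should equal sqrt(DEFF) up to rounding
display "sqrt(DEFF) = " sqrt(`deff') "   reported DEFT = `deft'"

* Approximate SE under simple random sampling: design-based SE / DEFT
display "SRS-equivalent SE = " `se' / `deft'

* Design df = number of PSUs - number of strata
display "Design df, Northern Ireland excluded:     " 398 - 120
display "Design df, NI households treated as PSUs: " 1301 - 121
```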
. log close
      name:  <unnamed>
       log:  D:\Home\anandi\1-Courses\EC969\SL-AN\DataPrep\Week2Lecture1.log
  log type:  text
 closed on:   1 Mar 2011, 15:23:53