Introduction to STATA using NIHPS

Introduction to Stata using the Northern Ireland Household Panel Survey (NIHPS)
Katrina Lloyd (QUB)
Patricia McKee (UU)
Format
• 9:15 Intro to NIHPS
• 9:30 Intro to Stata
• 10:30 – 11:00 Coffee break
• 11:00 Stata files – log / do
• Advantages of Stata
• 12:30 Questions / examples
NIHPS
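• NIHPS began in 2001 and is an extension of the British Household Panel Survey (BHPS), which began in 1991
• ISER (Institute for Social and Economic Research) at the University of Essex has overall responsibility for the survey
• NISRA (Northern Ireland Statistics and Research Agency) carries out the fieldwork in NI
• 6 waves of NIHPS data (2001-2006) are available from the UK Data Archive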
NIHPS
• NIHPS follows a representative sample of individuals
• Household-based interviewing:
– All adults aged 16+
– From Wave 4, all children aged 11-15 (Youth Panel)
• Its unique value is that NIHPS measures change at the individual level
NIHPS
• Achieved sample (full interviews, all years)
– Wave 1: 3,458 individuals in 1,978 households
– Wave 2: 2,692 individuals
– Wave 3: 2,414 individuals
– By Wave 6: 2,151 individuals
• Attrition: the achieved sample shrinks from wave to wave
Content of the NIHPS
• NIHPS has 3 components:
• Core component asked every year
– Includes health, housing, finances
• Rotating core component – every 3 years
– Includes wealth, assets and debt, parenting
• Variable component – asked only once in the panel
– Includes race, place of birth, age left school
NIHPS datasets
• Cross-sectional files for each wave
• Longitudinal files for individuals
• Files linked by common variables
– PID (unique Personal Identification Number)
– wHID (Household ID – changes year on year)
– wPNO (person number – changes year on year)
• w stands for the wave prefix: k, l, m, n, o, p for the years 2001 to 2006 respectively (e.g. khid is the Wave 1 household ID)
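The wave-prefix convention can be sketched as a loop that builds each wave's names. This is only an illustrative sketch: it displays names rather than opening files, and it assumes the lower-case naming used later in these notes (khid, kindall.dta).

```stata
* Sketch: map each NIHPS wave prefix to its year and build wave-specific names
* (assumes lower-case names such as khid and kindall.dta; nothing is opened)
local year = 2001
foreach w in k l m n o p {
    display "`year': prefix `w', household ID `w'hid, file `w'indall.dta"
    local year = `year' + 1          // the next wave is the following year
}
```

The same pattern (a prefix held in a local and glued onto a stem) is what the merging example uses to open each wave's file.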
NIHPS
Record type    Description
wHHSAMP        household-level data for issued households
wHHRESP        household-level data for responding households
wINDSAMP       individual-level data for issued households
wINDALL        data on all enumerated individuals (including children and nonrespondents)
NIHPS
Record type    Description
wINDRESP       individual-level data for respondents
wEGOALT        relationship of each individual to every other individual in the household
wINCOME        income and payment data
wJOBHIST       information from the employment history
NIHPS additional files
wMARRIAG       one record for each reported legal marriage
wCOHABIT       one record for each cohabitation spell outside marriage
wCHILDAD       information about adopted and/or stepchildren
wCHILDNT       information about natural children
wCHILD         information on children and parenting styles
wYOUTH         responses to the young person's questionnaire
wLIFEMST       information about employment status spells
NIHPS additional files
Cross-wave files (covering ALL waves)
XWAVEID        information for matching individuals between waves
XWLSTEN        information on the latest known sample status of individuals
XWAVEDAT       central source of data on individuals that is fixed and measured only once in the panel, e.g. race
Files used today: wINDALL (one file per wave)
File           Obs      Vars
kindall.dta    5,188    52
lindall.dta    4,589    54
mindall.dta    4,210    55
nindall.dta    3,940    55
oindall.dta    3,809    55
pindall.dta    3,650    55
Stata windows
[Screenshot labelling the main windows: Review (previous commands), Results, Variables, Command]
Edit Preferences
• Click on the Edit menu, go down to Preferences and select General Preferences
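LOG files – record your session
• Start
– Either click the log icon or select File > Log > Begin
• Types
– .smcl = Stata formatted (Stata Markup and Control Language)
– .log = plain text (ASCII) file
• Choices, if the file already exists
– View the existing file
– Append new output to the old file
– Overwrite the old file with the new one
• Closing
– The log is closed when you exit Stata
– You can also suspend and resume logging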
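The menu steps above also have command equivalents, which is handy inside do-files. A minimal sketch: the file name mysession.log is a hypothetical example, and the text option asks for a plain .log file (smcl would give the Stata-formatted type instead).

```stata
* Command equivalents of File > Log (mysession.log is a hypothetical name)
capture log close                        // close any log that is already open
log using mysession.log, text replace    // start a plain-text log, overwriting
display "this line is recorded"
log off                                  // suspend logging
display "this line is not recorded"
log on                                   // resume logging
log close                                // close the log
log using mysession.log, text append     // reopen and add to the same file
display "appended line"
log close
```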
Log file
[Screenshot of the log dialog: choose folder, give filename, choose type .log]
Note: if a log file is open, its name appears below the Results window and above the Command window.
DO files
• A text file containing commands, run together rather than typed one at a time at the keyboard
• The contents of the Review window (previous commands) can be saved into a do-file
• Do-files may call other do-files, which may in turn call other do-files, nested up to 64 deep; a master.do can call up to 1,000 do-files one after the other
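Do file
[Screenshot of the Do-file Editor]
Note: comment lines (starting with *) are not executed.
To run part of a do-file, select the commands to run and click the do icon.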
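A sketch of one do-file calling another, also showing Stata's three comment styles. To keep it self-contained, the child file step1.do (a hypothetical name) is written on the fly with the file command; in practice you would create it in the Do-file Editor.

```stata
* Parent do-file. step1.do is a hypothetical child file written here so
* the example runs on its own.
clear
file open fh using "step1.do", write text replace
file write fh "* step1.do: build a tiny dataset" _n
file write fh "clear" _n
file write fh "set obs 3" _n
file write fh "generate x = _n" _n
file close fh

/* Block comment: everything between these delimiters is ignored */
do step1.do          // run the child do-file (one level of nesting)
summarize x          // x was created inside step1.do
```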
Built-in Variables
• _pi contains the value π to machine precision
• _n contains the number of the current observation
– e.g.  age   23   34   45   56
        _n     1    2    3    4
• _N contains the total number of observations
– e.g.  age   23   34   45   56
        _N     4    4    4    4
Note: Stata is case-sensitive, so myvar, Myvar and MYVAR are 3 distinct names.
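A toy sketch of _n and _N using the four ages from the slide rather than NIHPS data. It also shows a common use of _n as a subscript (age[_n-1] is the previous observation's value, missing for the first one) and the case-sensitivity note; the names obsno, totobs and prevage are made up for illustration.

```stata
* Toy data: the four ages from the slide (not NIHPS data)
clear
input age
23
34
45
56
end
generate obsno = _n              // 1 2 3 4: position of each observation
generate totobs = _N             // 4 4 4 4: total number of observations
generate prevage = age[_n-1]     // previous observation's age; missing for obs 1
list
* Stata is case-sensitive: these are three different variables
generate myvar = 1
generate Myvar = 2
generate MYVAR = 3
```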
Example of _n and _N
use kindall, clear
sort khid kpno                       // sort by household, then person number within household
gen totcases = _N                    // total number of observations in the file
* For each household, generate the number of people in the household
bysort khid: gen totninhh = _N
* For each household, number the people within it (1, 2, ...) in kpno order
bysort khid (kpno): gen nwithinhh = _n
list pid khid kpno totninhh nwithinhh in 1/20
tab totninhh nwithinhh, missing      // crosstab, including missing values
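. gen totcases = _N      // generate total number of obs
. tab totcases

   totcases |      Freq.     Percent        Cum.
------------+-----------------------------------
       5188 |      5,188      100.00      100.00
------------+-----------------------------------
      Total |      5,188      100.00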
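. bysort khid: gen totninhh = _N
. tab totninhh

   totninhh |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        518        9.98        9.98
          2 |      1,238       23.86       33.85   <- 2 persons
          3 |        915       17.64       51.48
          4 |      1,176       22.67       74.15   <- 4 persons
          5 |        830       16.00       90.15
          6 |        252        4.86       95.01
          7 |        175        3.37       98.38
          8 |         56        1.08       99.46
          9 |         18        0.35       99.81
         10 |         10        0.19      100.00
------------+-----------------------------------
      Total |      5,188      100.00
Note: the frequencies count individuals, not households; e.g. the 1,238 people in 2-person households make up 619 households.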
. list pid khid kpno totninhh nwithinhh in 1/20

  Case        pid       khid   kpno   totninhh   nwithinhh
    1.  118500023   11850027      1          3           1
    2.  118500058   11850027      2          3           2
    3.  118500074   11850027      3          3           3
    4.  118500317   11850043      1          1           1
    5.  118501135   11850116      1          1           1
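The same bysort logic can be tried on a toy dataset without the NIHPS files; the household and person IDs below are made up. Writing bysort hid (pno): sorts on pno within household while grouping only by hid, so _n follows person-number order without relying on an earlier sort. The last line keeps one row per household (nwithinhh == 1), which turns the person-level counts of tab totninhh into household counts.

```stata
* Toy data: three hypothetical households (hid) with person numbers (pno)
clear
input long hid byte pno
101 1
101 2
101 3
102 1
103 1
103 2
end
bysort hid: gen totninhh = _N           // household size on every member's row
bysort hid (pno): gen nwithinhh = _n    // 1, 2, ... within household, in pno order
list, sepby(hid)
* One row per household: tabulate household sizes rather than persons
tab totninhh if nwithinhh == 1
```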
Saved Results
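summarize produces summary statistics
. sum kage12

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      kage12 |      5188    35.46164    22.59792          0         97

summarize also saves its results in r( ); with the detail option it saves 19 scalars, including:
r(N) – number of observations
r(mean) – mean
r(sum) – sum of age
r(sd) – standard deviation
r(p1) – 1st percentile; r(p95) – 95th percentile
Some (the percentiles, skewness and kurtosis) are only available after sum kage12, detail.
To list the results stored in r( ), type return list.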
. sum kage12, detail

                      age at 1.12.2001
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            3              0
10%            6              0       Obs                5188
25%           16              0       Sum of Wgt.        5188

50%           34                      Mean           35.46164
                        Largest       Std. Dev.      22.59792
75%           53             92
90%           68             94       Variance       510.6658
95%           75             96       Skewness       .2723639
99%           83             97       Kurtosis       2.072386
After sum kage12, detail, type return list:

scalars:
                  r(N) =  5188
              r(sum_w) =  5188
               r(mean) =  35.46164225134927
                r(Var) =  510.66577343513
                 r(sd) =  22.59791524533026
           r(skewness) =  .2723638715033958
           r(kurtosis) =  2.072386222684342
                r(sum) =  183975
                r(min) =  0
                r(max) =  97
                 r(p1) =  0
                 r(p5) =  3
                r(p10) =  6
                r(p25) =  16
                r(p50) =  34
                r(p75) =  53
                r(p90) =  68
                r(p95) =  75
                r(p99) =  83
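Saved results can be used directly in later commands, for example to centre a variable on its mean. A toy sketch using four made-up ages rather than kage12; the names agec and medage are illustrative. Since r( ) is overwritten by the next command that saves r-class results, copy a value you need later into a scalar (or local) straight away.

```stata
* Toy data: four made-up ages (not kage12)
clear
input age
23
34
45
56
end
summarize age
display "mean age = " r(mean)
generate agec = age - r(mean)      // age centred on its mean (39.5 here)
summarize age, detail
scalar medage = r(p50)             // keep the median before r() is overwritten
display "median age = " medage
return list
```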
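LOCAL variables (local macros)
• A local macro named var is referred to as `var'
• The opening ` is the backquote key beside 1; the closing ' is the ordinary apostrophe key near L (a curly ’ will not work)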
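A small sketch of defining and using local macros. The variable sex is a made-up stand-in; the point is that `name' is replaced by the macro's contents before the command runs, and that = evaluates an expression whereas plain quotes store the text as typed.

```stata
* Toy data: sex is a made-up stand-in variable
clear
set obs 3
generate sex = mod(_n, 2) + 1     // toy values: 2, 1, 2
local myvar "sex"                 // local macro holding a variable name
tabulate `myvar'                  // Stata reads this as: tabulate sex
local total = 2 + 3               // with =, the expression is evaluated: 5
local text "2 + 3"                // without =, the characters are stored as typed
display `total'
display "`text'"
```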
Programming - loop over items/values
• foreach x in – loops over a list of items
– with of instead of in, the list can be a varlist, newlist, numlist or local
• forvalues x = – loops over a range of numbers, e.g. 1/10 or 0(5)100
– the loop is executed as long as `x' is within the range
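A toy sketch of both loop types on made-up variables v1 to v3: forvalues creates them from a numeric range, then foreach ... of varlist reports each mean using a saved result.

```stata
* Toy data: 5 observations, variables v1-v3 are made up for illustration
clear
set obs 5
* forvalues: i takes the values 1, 2, 3
forvalues i = 1/3 {
    generate v`i' = _n * `i'      // v1 = 1..5, v2 = 2..10, v3 = 3..15
}
* foreach over a varlist: report each mean
foreach v of varlist v1-v3 {
    summarize `v', meanonly
    display "`v': mean = " r(mean)
}
```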
Example
* Set up a local macro testvars holding three variable names
local testvars "khgr2r khgsex kage12"
* Start of loop: note the opening { and the closing }
* Could also use: foreach x in khgr2r khgsex kage12 {
foreach x of local testvars {
    display "the current variable is `x'"
    tab `x'          // displays frequencies
    sum `x'          // produces summary statistics
    ret list         // displays all the saved results
}
// end of loop
Merging data files
• Two kinds of merges
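– One-to-one (observations matched by their position in each file)
– Match-merge (observations matched on key variables, e.g. pid)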
• Result contained in new var _merge
– 1 = obs occurred ONLY in master dataset
– 2 = obs occurred ONLY in using dataset
– 3 = obs occurred in BOTH master and using datasets
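A toy sketch of a match-merge and the three _merge codes, using made-up pid values and the merge 1:1 syntax of Stata 11 and later. tempfile supplies a temporary file name, so nothing is left on disk.

```stata
* "Wave 1" file (using): hypothetical pids 1, 2, 3
clear
input long pid byte age01
1 30
2 45
3 60
end
tempfile wave1
save `wave1'
* "Wave 2" file (master): hypothetical pids 2, 3, 4
clear
input long pid byte age02
2 46
3 61
4 20
end
merge 1:1 pid using `wave1'
tab _merge    // pid 4 -> 1 (master only), pid 1 -> 2 (using only), pids 2, 3 -> 3
```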
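Example of merging
local dirdata "j:/nihps/nihps data/"    // forward slashes also work in Windows paths
foreach x in k l m n o p {
    use "`dirdata'`x'indall", clear
    keep pid `x'age12 `x'newhy
    sort pid
    save temp`x', replace
}
use tempk, clear
foreach x in l m n o p {
    * Stata 11+ syntax; older versions used: merge pid using temp`x', _merge(mer`x')
    merge 1:1 pid using temp`x', generate(mer`x')
    sort pid
}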
Command to check the number of observations: tab1 *newhy