R-programme for sampling

A workshop on using R to select a sample for EHES
Susie Cooper & Johan Heldal
Statistics Norway
Overview
• What is R and why use it?
• Practical Exercises
1. Installing and loading R and packages
2. Reading external files
3. Calculating sample sizes
4. Stage 1 - Selecting Primary Sampling Units (PSU)
5. Stage 2 - Selecting Secondary Sampling Units (SSU)
• Where to get more information
Why use R for EHES?
• Its use has been agreed with the EU because it is:
• Free, and therefore available to all countries involved
• Very flexible
• A very powerful and fast tool for sampling and analysis
However…
• The program can have a steep learning curve.
• It has no user-friendly interface.
What is EHESsampling?
• A tool for planning the sampling
design
• Can be used to find good stratifications
• Can calculate cost-variance optimal
sample sizes within PSUs.
• Can calculate costs and variances of
alternatives.
• A tool for taking a probability sample
from a sampling frame.
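As a rough illustration of what "cost-variance optimal" can mean, the sketch below uses the textbook two-stage result for the number of individuals to take per PSU. It assumes a simple cost model (a fixed cost per PSU plus a cost per individual) and a design effect of 1 + (m - 1) * rho. This is not necessarily the calculation EHESsampling performs; the helper name optimal_take and all input values are hypothetical.

```r
# Textbook optimum for the number of individuals per PSU, assuming
#   total cost    = c_psu * n_psu + c_unit * n_psu * m
#   design effect = 1 + (m - 1) * rho
# optimal_take() is a hypothetical helper, not an EHESsampling function.
optimal_take <- function(c_psu, c_unit, rho) {
  stopifnot(c_psu > 0, c_unit > 0, rho > 0, rho < 1)
  sqrt((c_psu / c_unit) * (1 - rho) / rho)
}

# Made-up inputs: adding a PSU costs 16 times as much as adding one
# individual, and the intra-cluster correlation is 0.2.
m_opt <- optimal_take(c_psu = 16, c_unit = 1, rho = 0.2)
m_opt
```

Under these assumptions, a higher PSU cost or a lower intra-cluster correlation pushes the optimum towards more individuals per PSU.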
Using EHESsampling
• The EHESsampling manual
• Before using EHESsampling, prepare the input datasets from the main sampling frame.
For sampling at stage 1 you need:
• A dataset describing the PSUs
• A dataset describing the strata
For stage 2 you need:
• The main sampling frame describing the individual units
1. Loading Packages
• Load the EHESsampling package and
other necessary packages each time
you re-open R:
library(EHESsampling)
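If the package is not installed, library() stops with an error. A minimal sketch of a safer load is shown below. It assumes EHESsampling has already been installed as part of the installation exercise and does not show the installation itself; the variable names pkg and loaded are only for illustration.

```r
# Check that a package is installed before trying to load it.
# Assumption: EHESsampling was installed beforehand; this only checks for it.
pkg <- "EHESsampling"
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
  loaded <- TRUE
} else {
  message("Package '", pkg, "' is not installed yet.")
  loaded <- FALSE
}
```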
2. Reading External Files
• Open a new script by selecting File, then New script
• Set the working directory where data files are stored by typing into the new script:
setwd("X:/120/EHES/R/Data")
(the path is the location on your computer where the data files are stored)
• Then press Ctrl + R to send the line to the console
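setwd() fails if the folder does not exist, so it can help to check first. The sketch below reuses the example path from the slide, which will differ on your computer.

```r
# The path is the example from the slide; replace it with your own folder.
data_dir <- "X:/120/EHES/R/Data"
if (dir.exists(data_dir)) {
  setwd(data_dir)
} else {
  message("Folder not found: ", data_dir)
}
getwd()        # shows the current working directory
list.files()   # lists the files R can see in that directory
```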
• Read in the chosen file and save it in
the working environment.
PSUs.df <- read.table("post1000.csv", sep = ";", dec = ",", header = TRUE)
• The file is now stored as PSUs.df for
this session.
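The arguments sep = ";" and dec = "," describe a semicolon-separated file that uses a decimal comma. The self-contained sketch below writes a small made-up file in that format and reads it back; the column names and values are hypothetical and are not taken from post1000.csv.

```r
# Write a small made-up file (semicolon-separated, decimal comma),
# then read it back. Column names and values are hypothetical.
tmp <- tempfile(fileext = ".csv")
writeLines(c("psu;stratum;size;rate",
             "1001;A;250;0,12",
             "1002;A;400;0,08",
             "1003;B;150;0,20"), tmp)

demo.df <- read.table(tmp, sep = ";", dec = ",", header = TRUE)
str(demo.df)   # 'rate' is read as numeric because dec = ","

# read.csv2() uses sep = ";" and dec = "," by default
demo2.df <- read.csv2(tmp)
identical(demo.df$rate, demo2.df$rate)
```

If dec = "," is left out, a column such as rate is read as text rather than as numbers.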
• To see the start of the data set type:
head(PSUs.df)
(prints the first 6 lines of the data set)
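A few other base R functions are useful for checking that a file was read correctly. The sketch below uses a made-up stand-in for PSUs.df so that it runs on its own; the real column names depend on your file.

```r
# Made-up stand-in for PSUs.df; real columns come from your own file.
PSUs.df <- data.frame(psu = 1001:1010,
                      stratum = rep(c("A", "B"), each = 5),
                      size = seq(100, 550, by = 50))

head(PSUs.df)          # first 6 rows (the default)
head(PSUs.df, 3)       # first 3 rows
dim(PSUs.df)           # number of rows and columns
str(PSUs.df)           # column names and types
summary(PSUs.df$size)  # quick summary of one numeric column
```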
Further Sampling Steps
• Read in the strata dataset
• Calculate the PSU sample sizes
• Take a sample of PSUs – stage 1
• Merge the selected PSUs with the
main sampling frame containing
individual units.
• Sample individual units – stage 2
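The steps above can be sketched in base R as follows. This is only an illustration of the two-stage idea and does not use EHESsampling functions; all datasets, column names, and sample sizes are made up. Stage 1 draws PSUs successively with probability proportional to size, which only approximates a strict PPS design, and stage 2 takes a simple random sample within each selected PSU.

```r
set.seed(1)  # make the example reproducible

# Made-up PSU dataset and sampling frame; all names are hypothetical.
psus <- data.frame(psu = 1:8,
                   stratum = rep(c("A", "B"), each = 4),
                   size = c(30, 50, 40, 60, 20, 80, 45, 35))
frame <- data.frame(id = seq_len(sum(psus$size)),
                    psu = rep(psus$psu, times = psus$size))

# Stage 1: within each stratum, draw 2 PSUs without replacement,
# with draw probabilities proportional to PSU size.
draw_psus <- function(d, n) d[sample(nrow(d), n, prob = d$size), ]
stage1 <- do.call(rbind, lapply(split(psus, psus$stratum), draw_psus, n = 2))

# Merge the selected PSUs with the frame of individual units.
selected_frame <- merge(frame, stage1, by = "psu")

# Stage 2: simple random sample of 10 individuals in each selected PSU.
draw_units <- function(d, m) d[sample(nrow(d), m), ]
stage2 <- do.call(rbind, lapply(split(selected_frame, selected_frame$psu),
                                draw_units, m = 10))
table(stage2$psu)  # number of sampled individuals per selected PSU
```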
Selected Individuals
Help!
• EHESsampling manual available at:
www.ehes.info
• EHES participant manual, Part 1: Chapter 5
• R websites:
• R official site: www.r-project.org
• Quick R: www.statmethods.net
• Us:
• Johan.Heldal@ssb.no
• Susie.Cooper@ssb.no