A workshop on using R to select a sample for EHES
Susie Cooper & Johan Heldal
Statistics Norway

Overview
• What is R and why use it?
• Practical exercises:
  1. Installing and loading R and packages
  2. Reading external files
  3. Calculating sample sizes
  4. Stage 1 - Selecting Primary Sampling Units (PSUs)
  5. Stage 2 - Selecting Secondary Sampling Units (SSUs)
• Where to get more information

Why use R for EHES?
• Its use has been agreed with the EU because:
  • It's free, and therefore available to all participating countries.
  • It is very flexible.
  • It is a very powerful and fast tool for sampling and analysis.
However…
• The learning curve can be steep.
• There is no user-friendly interface.

What is EHESsampling?
• A tool for planning the sampling design. It:
  • can be used to find good stratifications;
  • can calculate cost-variance optimal sample sizes within PSUs;
  • can calculate the costs and variances of alternative designs.
• A tool for taking a probability sample from a sampling frame.

Using EHESsampling
• See the EHESsampling manual.
• Before using EHESsampling, you have to prepare some input datasets from the main sampling frame.
For sampling at stage 1 you need:
• a dataset describing the PSUs
• a dataset describing the strata
For stage 2 you need:
• the main sampling frame describing the individual units

1. Loading Packages
• Load the EHESsampling package, and any other packages you need, each time you reopen R:
  library(EHESsampling)

2. Reading External Files
• Open a new script by selecting File > New script.
• In the new script, set the working directory to the location on your computer where the data files are stored:
  setwd("X:/120/EHES/R/Data")
• Then press Ctrl + R to send the line to the console.
• Read in the chosen file and save it in the working environment:
  # semicolon-separated fields, decimal commas, first row holds column names
  PSUs.df <- read.table("post1000.csv", sep = ";", dec = ",", header = TRUE)
• The file is now stored as PSUs.df for this session.
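The sep = ";" and dec = "," arguments matter because files exported with European regional settings typically use semicolons between fields and commas as decimal marks. The minimal sketch below reads a small inline excerpt instead of post1000.csv, so it runs without the workshop data. The column names and values are hypothetical, invented only to show the effect of the arguments, and colClasses is an extra, assumed choice that keeps PSU codes with leading zeros as text. The result is stored as psus_demo so that it does not overwrite PSUs.df.

```r
# Hypothetical excerpt in the same layout as the slide's CSV file
# (column names and values are invented for illustration).
csv_text <- "psu;stratum;pop_size;area_km2
0101;1;1520;12,5
0102;1;980;7,25
0201;2;2310;30,0"

# text = reads from a string instead of a file.
# dec = "," turns "12,5" into the number 12.5.
# colClasses (an assumed extra) keeps PSU codes such as "0101" as character.
psus_demo <- read.table(text = csv_text, sep = ";", dec = ",", header = TRUE,
                        colClasses = c("character", "integer", "integer", "numeric"))

str(psus_demo)   # area_km2 is numeric, psu is character
head(psus_demo)  # first (up to) 6 rows
```

Without dec = ",", the area_km2 column would not be read as numbers, which is a common reason a numeric column arrives as text.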
• To see the start of the data set, type:
  head(PSUs.df)   # prints the first 6 lines of the data set

Further Sampling Steps
• Read in the strata dataset.
• Calculate the PSU sample sizes.
• Take a sample of PSUs (stage 1).
• Merge the selected PSUs with the main sampling frame containing the individual units.
• Sample individual units (stage 2).

Selected Individuals

Help!
• EHESsampling manual available at www.ehes.info
• EHES participant manual – Part 1: Chapter 05
• R websites:
  • R official site: www.r-project.org
  • Quick-R: www.statmethods.net
• Contact us:
  • Johan.Heldal@ssb.no
  • Susie.Cooper@ssb.no
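The stage 1 and stage 2 steps listed under Further Sampling Steps can be illustrated in base R. This is a minimal sketch and not the EHESsampling API. The frame, the column names (psu, stratum), the total of 6 PSUs, proportional allocation across strata, systematic PPS selection of PSUs, and a fixed 10 individuals per PSU are all hypothetical assumptions. It also skips the cost-variance optimisation of within-PSU sample sizes that EHESsampling performs.

```r
set.seed(1)  # makes the random selections reproducible

# Hypothetical frame: 6000 individuals in 20 PSUs of 200 or 400 people,
# with PSUs P01-P10 in stratum 1 and P11-P20 in stratum 2.
frame <- data.frame(
  id      = 1:6000,
  psu     = rep(sprintf("P%02d", 1:20), times = rep(c(200, 400), 10)),
  stratum = rep(c(1, 2), each = 3000)
)

# PSU dataset: one row per PSU with its population size.
psus <- aggregate(id ~ psu + stratum, data = frame, FUN = length)
names(psus)[names(psus) == "id"] <- "size"

# Assumed allocation: 6 PSUs shared across strata in proportion to
# population (rounding may need adjusting so the total is exactly 6).
n_psu <- 6
stratum_pop <- sapply(split(psus$size, psus$stratum), sum)
alloc <- round(n_psu * stratum_pop / sum(stratum_pop))

# Systematic PPS: returns row indices of n units selected with probability
# proportional to size (assumes every size is below sum(size) / n).
pps_systematic <- function(size, n) {
  cum    <- cumsum(size)
  step   <- sum(size) / n
  points <- runif(1, 0, step) + step * (0:(n - 1))
  findInterval(points, cum) + 1
}

# Stage 1: select PSUs within each stratum and store their inclusion probability.
selected <- do.call(rbind, lapply(split(psus, psus$stratum), function(s) {
  n_h <- alloc[[as.character(s$stratum[1])]]
  s[pps_systematic(s$size, n_h), ]
}))
h <- as.character(selected$stratum)
selected$pi1 <- unname(alloc[h] * selected$size / stratum_pop[h])

# Merge the selected PSUs with the frame of individuals.
stage2 <- merge(frame, selected[, c("psu", "pi1")], by = "psu")

# Stage 2: simple random sample of m individuals in each selected PSU.
m <- 10
sample_ind <- do.call(rbind, lapply(split(stage2, stage2$psu), function(d) {
  N_i <- nrow(d)
  d <- d[sample.int(N_i, m), ]
  d$pi2 <- m / N_i
  d
}))

# Design weight = 1 / overall inclusion probability.
sample_ind$weight <- 1 / (sample_ind$pi1 * sample_ind$pi2)
```

With proportional allocation and a fixed m, each weight works out to N_h / (n_h m) = 3000 / (3 × 10) = 100. The design is therefore self-weighting, and the weights sum to the frame size of 6000.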