Module H2 Practical 3

Sampling Distributions and Standard Errors

Objectives: By the end of this practical you should be able to:
  • explain what is meant by an “estimate” of a population characteristic, calculated using sample data
  • explain what is meant by the sampling distribution of an estimate
  • calculate and interpret the standard error of a sample mean from data collected according to a simple random sampling scheme

1. Open the Excel file named H2_data.xls. Move to the sheet named cattle. As in practical 2, compute the mean and standard deviation of the two columns of (fictitious) data: the number of cattle in each district (in 1000’s), in variable cattle000, and the mean number of persons per sleeping room within households in each district, in variable pprm. Use the Excel functions AVERAGE and STDEV for this purpose.

(a) Note down your results in the table below, noting that these are population values (this is exactly what you did in practical 2, but is repeated here to make it easier to compare sample values with population values).

    District Number               Cattle Numbers      Persons per sleeping room
    Population Mean (μ) =
    Population Std.Dev. (σ) =

(b) Move now to the next worksheet, named 50 cattle samples. Here, 50 samples have been drawn, each of size 10, and the individual sample values have been recorded, giving 500 observations. Use Excel to find the means and standard deviations of both variables for the 1st sample, i.e. the observations in cells B2:B11 and C2:C11. Record your results below.

    District Number               Cattle Numbers      Persons per sleeping room
    Sample Mean (x̄) =
    Sample Standard dev. (s) =

SADC Course in Statistics Module H2 Practical 3 – Page 1

(c) Note down the algebraic relationship between the standard deviation and the standard error of the mean.

Use this formula to find the standard error of the mean for each of the variables cattle000 and pprm. Note down the results below.

    s.e.m. for cattle =
    s.e.m. for pprm =

Write down your interpretation of the standard error in each case.

(d) Move to the next worksheet, named sample means&sds, which includes the means and standard deviations for each sample (n=10 in each case) across the 50 repeated samples. Check that the first row of this worksheet matches your answers given in part (b) above.

Each column in this worksheet represents values from a sampling distribution. Discuss with the person sitting next to you what sampling distribution is being shown in each column, e.g. column catt10mn is the sampling distribution of what?

(e) Go to the bottom of the worksheet sample means&sds and, in row 52, compute the standard deviation of each of the four columns of data. Note down below the standard deviations of the two columns corresponding to the sample means, i.e. of the columns named catt10mn and pprm10mn.

    Std. deviation of cattle000 means =
    Std. deviation of pprm means =

How close are these empirically computed standard errors to the results from the single sample calculated in part (c) above?

(f) Note that the empirical values above, computed from 50 repeated samples, are estimates of the true standard error of the mean (of a sample of 10 observations) given by the formula

    s.e.m. (cattle000 mean) = population std.dev/√10 = 149.71/√10 = 47.3
    s.e.m. (pprm mean) = population std.dev/√10 = 2.76/√10 = 0.873

Of course, in practice, you will not have population values, nor will you be able to take repeat samples. So in practice, the precision of the sample mean has to be estimated using the standard error of the mean derived from the sample values, as was done in part (c) above.

2. Several computations have been done above, giving you different standard deviations. They correspond to:

(a) standard deviation of population values
(b) standard deviation of sample values
(c) standard error of the mean for a single sample of 10 observations
(d) standard deviation of 50 sample means = empirical estimate of the standard error of the mean for a single sample of 10 observations
(e) theoretical standard error of a sample mean if it is based on 10 observations drawn as a simple random sample from a population with known standard deviation

Discuss, in pairs or in small groups of 3-4 persons, what these different values mean. Ensure you are clear about how they relate to each other and about the interpretation of each. Finally, indicate which values would be used in practice and how they are useful.

Write down below the main lessons you learnt from this practical exercise.
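
The five quantities listed in question 2 can also be compared in a short simulation. The sketch below is only an illustration and is not part of the practical: it does not read H2_data.xls, the population is a made-up list of values, and the function name compare_standard_errors, the population size and the seeds are all assumptions. It uses Python's statistics.stdev, which divides by n - 1 as Excel's STDEV does, and it ignores any finite population correction.

```python
import math
import random
import statistics


def compare_standard_errors(population, n=10, n_samples=50, seed=1):
    """Return the five quantities (a)-(e) of question 2 for a given population."""
    rng = random.Random(seed)

    # (a) standard deviation of the population values (n - 1 divisor, like Excel STDEV)
    pop_sd = statistics.stdev(population)

    # Draw repeated simple random samples (without replacement), each of size n
    samples = [rng.sample(population, n) for _ in range(n_samples)]

    # (b) standard deviation of the values in the first sample
    sample_sd = statistics.stdev(samples[0])

    # (c) standard error of the mean, estimated from that single sample
    sem_single = sample_sd / math.sqrt(n)

    # (d) standard deviation of the sample means: the empirical standard error
    sample_means = [statistics.mean(s) for s in samples]
    sem_empirical = statistics.stdev(sample_means)

    # (e) theoretical standard error, using the known population standard deviation
    sem_theory = pop_sd / math.sqrt(n)

    return {
        "pop_sd": pop_sd,
        "sample_sd": sample_sd,
        "sem_single": sem_single,
        "sem_empirical": sem_empirical,
        "sem_theory": sem_theory,
    }


if __name__ == "__main__":
    # Hypothetical population of 2000 values; NOT the cattle data from the practical
    gen = random.Random(42)
    population = [gen.uniform(0, 500) for _ in range(2000)]
    for name, value in compare_standard_errors(population).items():
        print(f"{name}: {value:.2f}")
```

Calling the function again with a different seed changes (b), (c) and (d) but leaves (a) and (e) unchanged, which mirrors the point made in part (f): in practice only the single-sample estimate (c) is available.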