Introduction to Probability and Statistics
Sayantan Banerjee
Session 13

Sampling and Sampling distributions
• Statistics is the science of inference.
• It is the science of generalization from a part (the randomly chosen sample) to the whole (the population).
• A random sample of $n$ elements is a sample selected from the population in such a way that every set of $n$ elements is equally likely to be selected.

Sample statistics
• A population may be a large, sometimes infinite, collection of elements.
• A numerical measure of a population is called a population parameter, or simply a parameter.
• A numerical measure of the sample is called a sample statistic, or simply a statistic. Note that a statistic is free of any unknown population parameter.
• An estimator of a population parameter is a sample statistic used to estimate the parameter. An estimate of the parameter is a particular numerical value of the estimator based on the random sample.
• When a single value is used as an estimate, we get a point estimate of the parameter.

Consider a population of students in the age-group 5-9 years. The problem of interest is to obtain an estimate of the mean height of the student population in that age-group.
• We first draw a random sample of size $n = 100$ from the population.
• Suppose the true mean height is $\mu$, which we wish to estimate.
• Let $X_1, \dots, X_n$ be the heights of the students selected, with observed values $x_1, \dots, x_n$.
• A sample statistic to estimate $\mu$ may be given by the sample mean,
  $$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
• Suppose we observe the value of the sample mean as $\bar{x} = 4$. Then this value is our estimate of $\mu$ based on the sample.

• The population proportion $p$ is equal to the number of elements in the population belonging to a category of interest, divided by the total number of elements in the population.
• The sample proportion is given by $\hat{p} = x/n$, where $x$ is the number of elements in the sample belonging to the particular category, and $n$ is the sample size.

A market research worker interviewed a random sample of 18 people about their use of a certain product. The results, recorded as Y or N (Yes if the person is a user of the product, No otherwise), are as follows:
Y N N Y Y Y N Y N Y Y Y N Y N Y Y N.
Estimate the population proportion of users of the product.

Sampling techniques
• Simple Random Sampling (with/without replacement)
• Stratified Sampling
• Cluster Sampling

Sampling distributions
The sampling distribution of a statistic is the probability distribution of that statistic. For example, for a random sample $X_1, \dots, X_n$ from some probability distribution, the sampling distribution of the sample mean $\bar{X}$ is the probability distribution of all possible values the random variable $\bar{X}$ may take when a sample of size $n$ is taken.

Sampling distribution of $\bar{X}$ for Normal population
Let $X_1, \dots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution. Then $\bar{X} \sim N(\mu, \sigma^2/n)$.

The Central Limit Theorem
Suppose we draw a random sample from a population with mean $\mu$ and finite variance $\sigma^2$. When the sample size $n$ becomes large, the sampling distribution of $\bar{X}$ is approximately Normal with mean $\mu$ and variance $\sigma^2/n$.
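
The sample proportion and the Central Limit Theorem can both be checked numerically. The sketch below is illustrative only and is not part of the original slides: it computes the sample proportion for the Y/N responses of the market research example, then simulates the sampling distribution of the sample mean. The Exponential(1) population (mean 1, variance 1), the sample size 50, the 5000 repetitions, the seed, and the helper names `sample_proportion` and `simulate_sample_means` are all assumptions chosen for the demonstration.

```python
import random
import statistics

# Responses from the market research example (Y = user, N = non-user).
responses = "Y N N Y Y Y N Y N Y Y Y N Y N Y Y N".split()


def sample_proportion(values, category):
    """Return x / n: the share of sample elements equal to `category`."""
    return sum(1 for v in values if v == category) / len(values)


# 11 of the 18 responses are Y, so p_hat = 11/18 (about 0.611).
p_hat = sample_proportion(responses, "Y")


def simulate_sample_means(draw, n, reps, rng):
    """Draw `reps` random samples of size `n` via draw(rng); return their means."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


# Assumed population for illustration: Exponential with rate 1, which is
# skewed (clearly non-Normal) and has mu = 1 and sigma^2 = 1.
mu, sigma2 = 1.0, 1.0
n, reps = 50, 5000
rng = random.Random(0)  # fixed seed so the run is reproducible
means = simulate_sample_means(lambda r: r.expovariate(1.0), n, reps, rng)

# The CLT suggests the sample means are roughly N(mu, sigma^2 / n).
mean_of_means = statistics.fmean(means)     # compare with mu
var_of_means = statistics.pvariance(means)  # compare with sigma^2 / n

# Share of simulated means within 1.96 standard errors of mu;
# under a Normal approximation this should be near 0.95.
se = (sigma2 / n) ** 0.5
within = sum(1 for m in means if abs(m - mu) <= 1.96 * se) / reps

print(f"p_hat = {p_hat:.3f}")
print(f"mean of sample means = {mean_of_means:.4f} (mu = {mu})")
print(f"variance of sample means = {var_of_means:.4f} (sigma^2/n = {sigma2 / n})")
print(f"share within 1.96 standard errors = {within:.3f}")
```

Increasing `n` in this sketch should bring the simulated distribution closer to the Normal shape; replacing the `draw` function with another population that has finite variance lets the same comparison be repeated.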