A Sampling Distribution

Sample and population (ASW, 15)
A population is the collection of all the elements of interest. A sample is a subset of the population.
Samples can be good or bad, representative or non-representative. A researcher hopes to obtain a sample that represents the population, at least in the variables of interest for the issue being examined.
Probabilistic samples are samples selected using the principles of probability. This may allow a researcher to determine the sampling distribution of a sample statistic. If so, the researcher can determine the probability of any given sampling error and make statistical inferences about population characteristics.
Population inferences can be made by selecting a representative sample from the population.

Methods of sampling – probabilistic
Simple random sampling: each member of the population has an equal probability of being selected.
Systematic random sampling: select every kth case from a list. This is equivalent to random sampling if patterns in the list are unrelated to the issues of interest.
Stratified random sampling: sample from each stratum or subgroup of the population, e.g. region or size of firm.
Cluster sampling: sample only certain clusters of members of the population, e.g. city blocks or firms.
Multistage sampling: a combination of random, systematic, stratified, and cluster sampling. If probability is involved at each stage, the distribution of sample statistics can be obtained.

Methods of sampling – nonprobability
Convenience sampling: friends, family, neighbours, acquaintances, students in a class, or co-workers in a workplace. The willingness of a person to interact with you as a subject counts a lot in this method.
Volunteer sampling: the subjects who participate are the ones who volunteer to make up the sample, so there is no need for you to carry out any selection process.
Snowball sampling: subjects already in the study refer or recruit further subjects from among their acquaintances, so the sample grows like a snowball rolling downhill.
Quota sampling: you choose sample members who possess the characteristics of the target population, so that the sample matches the population on those characteristics without random selection.
The sampling distribution of a statistic cannot be determined for any of these nonprobability methods, so statistical inference is not possible.

Why sample?
Time of the researcher and of those being surveyed.
Cost to the group or agency commissioning the survey.
Confidentiality, anonymity, and other ethical issues.
Non-interference with the population: a large sample could alter the nature of the population, e.g. in opinion surveys.
Not destroying the population, e.g. crash-testing only a small sample of automobiles.
Cooperation of respondents: individuals, firms, administrative agencies.
Partial data is all that is available, e.g. fossils, historical records, climate change.

Examples:
1. Have a list of all members of the population, write each name on a card, and choose cards by pure chance: SIMPLE RANDOM
2. You want a sample of 150 from a list of 1,500 students. The sampling interval is 1,500 / 150 = 10, so choose a random starting number from 1 to 10, then take every 10th name on the list until you have the 150 respondents that make up your sample: SYSTEMATIC RANDOM
3. Dissimilarity of the sample with those in the sampling frame: STRATIFIED RANDOM
4. Group-by-group selection of the sample: CLUSTER
5. Checking every 10th student on the list: SYSTEMATIC RANDOM
6. Interviewing some persons you meet on campus: CONVENIENCE
7. Dividing 100 persons into groups: CLUSTER
8. Choosing subjects capable of helping you meet the aim of your study: SNOWBALL
9. Choosing samples by chance but through an organizational pattern: SYSTEMATIC RANDOM
10. Letting all members of the population join the selection process: SIMPLE RANDOM
11.
Matching people's traits with the population members' traits: STRATIFIED RANDOM

PARAMETER AND STATISTIC
Parameter: a numerical characteristic of a population (ASW, 259), e.g. a total (annual GDP or exports) or the proportion p of the population that votes Liberal in a federal election. The mean µ and standard deviation σ of a probability distribution are also termed parameters.
Statistic: a numerical characteristic of a sample, e.g. the monthly unemployment rate or a pre-election poll.

Measure              Parameter   Statistic (point estimator)
Mean                 μ           x̄
Standard deviation   σ           s
Proportion           p           p̂
No. of elements      N           n

Assessment
1. The teacher randomly selects 20 boys and 15 girls from a batch of learners to be members of a group that will go on a field trip.
2. A sample of 10 mice is selected at random from a set of 40 mice to test the effect of a certain medicine.
3. At a certain seminar, all the members of two of the five groups are asked what they think about the president.
4. A barangay health worker asks every fourth house in the village for the ages of the children living in those households.
5. A sales clerk for a brand of clothing asks people who come up to her whether they own an article of clothing from her brand.
6. A psychologist asks his patient, who suffers from depression, whether he knows other people with the same condition, so he can include them in his study.
7. A brand manager of a toothpaste asks the ten dentists whose clinics are closest to his office whether they use a particular brand of toothpaste.
8. The process of using sample statistics to draw conclusions about true population parameters is called
a) statistical inference b) the scientific method c) sampling d) descriptive statistics
9. The universe or "totality of items or things" under consideration is called
a) a sample b) a population c) a parameter d) a statistic
10. The portion of the universe that has been selected for analysis is called
a) a sample b) a frame c) a parameter d) a statistic
11.
A summary measure that is computed to describe a characteristic from only a sample of the population is called
a) a parameter b) a census c) a statistic d) the scientific method
12. A summary measure that is computed to describe a characteristic of an entire population is called
a) a parameter b) a census c) a statistic d) the scientific method
13. Which of the following is most likely a population as opposed to a sample?
a) respondents to a newspaper survey b) the first 5 learners completing an assignment c) every third person to arrive at the bank d) registered voters in a county
14. Which of the following is most likely a parameter as opposed to a statistic?
a) The average score of the first five learners completing an assignment b) The proportion of females registered to vote in a county c) The average height of people randomly selected from a database d) The proportion of trucks stopped yesterday that were cited for bad brakes
15. Which of the following is NOT a reason for the need for sampling?
a) It is usually too costly to study the whole population. b) It is usually too time-consuming to look at the whole population. c) It is sometimes destructive to observe the entire population. d) It is always more informative to investigate a sample than the entire population.

Assessment: answers
1. The teacher randomly selects 20 boys and 15 girls from a batch of learners to be members of a group that will go on a field trip. ANSWER: Stratified sampling
2. A sample of 10 mice is selected at random from a set of 40 mice to test the effect of a certain medicine. ANSWER: Simple random sampling
3. At a certain seminar, all the members of two of the five groups are asked what they think about the president. ANSWER: Cluster sampling
4. A barangay health worker asks every fourth house in the village for the ages of the children living in those households. ANSWER: Systematic sampling
5.
A sales clerk for a brand of clothing asks people who come up to her whether they own an article of clothing from her brand. ANSWER: Volunteer sampling
6. A psychologist asks his patient, who suffers from depression, whether he knows other people with the same condition, so he can include them in his study. ANSWER: Snowball sampling
7. A brand manager of a toothpaste asks the ten dentists whose clinics are closest to his office whether they use a particular brand of toothpaste. ANSWER: Convenience sampling
8. The process of using sample statistics to draw conclusions about true population parameters is called
a) statistical inference b) the scientific method c) sampling d) descriptive statistics
ANSWER: a
9. The universe or "totality of items or things" under consideration is called
a) a sample b) a population c) a parameter d) a statistic
ANSWER: b
10. The portion of the universe that has been selected for analysis is called
a) a sample b) a frame c) a parameter d) a statistic
ANSWER: a
11. A summary measure that is computed to describe a characteristic from only a sample of the population is called
a) a parameter b) a census c) a statistic d) the scientific method
ANSWER: c
12. A summary measure that is computed to describe a characteristic of an entire population is called
a) a parameter b) a census c) a statistic d) the scientific method
ANSWER: a
13. Which of the following is most likely a population as opposed to a sample?
a) respondents to a newspaper survey b) the first 5 learners completing an assignment c) every third person to arrive at the bank d) registered voters in a county
ANSWER: d
14. Which of the following is most likely a parameter as opposed to a statistic?
a) The average score of the first five learners completing an assignment b) The proportion of females registered to vote in a county c) The average height of people randomly selected from a database d) The proportion of trucks stopped yesterday that were cited for bad brakes
ANSWER: b
15. Which of the following is NOT a reason for the need for sampling?
a) It is usually too costly to study the whole population. b) It is usually too time-consuming to look at the whole population. c) It is sometimes destructive to observe the entire population. d) It is always more informative to investigate a sample than the entire population.
ANSWER: d

A Sampling Distribution
A sampling distribution is the way our means would be distributed if we collected a sample, recorded the mean and threw the sample back, then collected another, recorded its mean and threw it back, and did this again and again, ad nauseam!
It is a theoretical frequency distribution of the values of a statistic, such as a mean. Any statistic that can be computed for a sample has a sampling distribution. A sampling distribution is the distribution of a statistic that would be produced in repeated random sampling (with replacement) from the same population.
It consists of all possible values of a statistic and their probabilities of occurring for a sample of a particular size. Sampling distributions are used to calculate the probability that sample statistics could have occurred by chance, and thus to decide whether something that is true of a sample statistic is also likely to be true of a population parameter.

Let's create a sampling distribution of means. Take a sample of size 1,500 from the US and record the mean income. Our census said the mean is $30K.
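The repeated-sampling idea can be simulated. This is a minimal sketch under stated assumptions: the population is a hypothetical, right-skewed income distribution (exponential with mean $30K), not real US data, and the names `draw_income` and `sample_means` are made up for illustration. Each repetition draws a sample of 1,500, records its mean, and discards the sample.

```python
import random
import statistics

# Hypothetical population: incomes follow an exponential (right-skewed)
# distribution with mean $30K. This is an assumption for illustration only.
POP_MEAN = 30_000.0


def draw_income(rng: random.Random) -> float:
    """Draw one income from the assumed population."""
    return rng.expovariate(1.0 / POP_MEAN)


def sample_means(n: int, repetitions: int, seed: int = 1) -> list[float]:
    """Take `repetitions` random samples of size n and record each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [draw_income(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))  # record the mean, then "throw the sample back"
    return means


if __name__ == "__main__":
    means = sample_means(n=1500, repetitions=500)
    print(f"mean of sample means: {statistics.fmean(means):,.0f}")
    print(f"std. dev. of sample means: {statistics.stdev(means):,.0f}")
```

With this assumed population, the mean of the recorded sample means lands close to $30K, and their spread is close to σ/√n (about $775 here, because an exponential population with mean $30K has σ = $30K). A different assumed population would give a different spread.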
Take another sample of size 1,500 from the US and record its mean income. Repeat this again and again, recording the mean income of each sample of 1,500.
The sample means would stack up in a normal curve centred at the census mean of $30K: a normal sampling distribution.
(Figure: normal curve of sample means centred at $30K, horizontal axis marked from -3z to 3z.)
Say that the standard deviation of this distribution is $10K. Think back to the empirical rule: what are the odds you would get a sample mean that is more than $20K off?
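The empirical rule gives a rounded figure. As a sketch, the exact tail probability for a normal sampling distribution can be computed from the standard normal CDF, built here from Python's `math.erf`; the helper names `normal_cdf` and `prob_more_than` are hypothetical. The inputs assume the illustrative values used in this example: a centre of $30K, a standard deviation of $10K, and a distance of $20K.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_more_than(distance: float, sd: float) -> float:
    """P(|sample mean - centre| > distance) for a normal sampling distribution."""
    z = distance / sd
    return 2.0 * (1.0 - normal_cdf(z))  # both tails


p = prob_more_than(distance=20_000, sd=10_000)
print(f"{p:.4f}")  # prints 0.0455
```

The exact answer, about 4.55%, is slightly below the empirical rule's 5% because the rule rounds the 95.45% lying within two standard deviations down to 95%.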
A sample mean more than $20K off lies more than two standard deviations ($10K each) from $30K. By the empirical rule, about 95% of sample means fall within two standard deviations of the centre, so the odds are about 5%: roughly 2.5% in each tail.
(Figure: the same normal curve centred at $30K, axis marked from -3z to 3z, with 2.5% in each tail beyond ±2z.)

Central Limit Theorem
As the sample size increases, the sampling distribution of the sample mean (and of the sample proportion) tends more and more towards a normal distribution.

A Sampling Distribution
Some rules about the sampling distribution of the mean:
1. The Central Limit Theorem says that for random sampling, as the sample size n grows, the sampling distribution of Y-bar (the sample mean) approaches a normal distribution.
2. The sampling distribution will be approximately normal whatever the shape of the population distribution, as long as n is large; n > 30 is the usual rule of thumb.
3. If n < 30, the sampling distribution is likely to be normal only if the underlying population's distribution is normal.
4. As n increases, the standard error (the standard deviation of the sampling distribution) gets smaller, since it equals σ/√n.
5. The precision provided by any given sample increases as the sample size n increases.

The Central Limit Theorem: mean and standard deviation of the population versus the sampling distribution
For samples of size n from a population with mean μ and standard deviation σ:
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
When sampling without replacement from a finite population of size N, the standard error includes the finite population correction:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
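The rules for the mean and standard error of the sampling distribution can be checked by brute force on a small population. This sketch assumes the six values 14, 19, 26, 31, 48, and 53 (the population in the exercise that follows) and samples of size n = 4 drawn without replacement. It enumerates every possible sample with `itertools.combinations` and compares the spread of the sample means with σ/√n multiplied by the finite population correction √((N − n)/(N − 1)). The function names are illustrative, not from the source.

```python
import itertools
import math
import statistics
from typing import Optional


def sample_means_without_replacement(population: list[float], n: int) -> list[float]:
    """Mean of every possible sample of size n drawn without replacement."""
    return [statistics.fmean(s) for s in itertools.combinations(population, n)]


def standard_error(sigma: float, n: int, N: Optional[int] = None) -> float:
    """sigma / sqrt(n), times the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


population = [14, 19, 26, 31, 48, 53]
n = 4
means = sample_means_without_replacement(population, n)

mu = statistics.fmean(population)         # population mean
sigma = statistics.pstdev(population)     # population standard deviation (divides by N)
se_enumerated = statistics.pstdev(means)  # spread of all possible sample means

print(len(means))                                             # 15
print(round(statistics.fmean(means), 2), round(mu, 2))        # 31.83 31.83
print(round(se_enumerated, 2),
      round(standard_error(sigma, n, N=len(population)), 2))  # 4.52 4.52
print(round(sigma / math.sqrt(n), 2))                         # 7.15 (no correction)
```

The enumerated standard error matches the corrected formula, while σ/√n alone overstates it, because each sample here takes 4 of only 6 population values.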
Exercise: Random samples of size 4 are drawn from a population containing the values 14, 19, 26, 31, 48, and 53.
a. Determine the number of random samples with sample size n = 4.
b. Construct a sampling distribution of the sample means.
c. Find the mean of the sample means.
d. Compute the standard error of the sample means.

a. Choosing n = 4 values from N = 6 without replacement gives
$$\binom{6}{4} = \frac{6!}{4!\,2!} = 15$$
possible samples.

b. The 15 samples and their means:

Sample            Mean
14, 19, 26, 31    22.5
14, 19, 26, 48    26.75
14, 19, 26, 53    28
14, 19, 31, 48    28
14, 19, 31, 53    29.25
14, 19, 48, 53    33.5
14, 26, 31, 48    29.75
14, 26, 31, 53    31
14, 26, 48, 53    35.25
14, 31, 48, 53    36.5
19, 26, 31, 48    31
19, 26, 31, 53    32.25
19, 26, 48, 53    36.5
19, 31, 48, 53    37.75
26, 31, 48, 53    39.5

Frequency table of the sample means:

Sample mean (x̄)   Frequency (f)
22.5              1
26.75             1
28                2
29.25             1
29.75             1
31                2
32.25             1
33.5              1
35.25             1
36.5              2
37.75             1
39.5              1
Total             15

c. Mean of the sample means: Σf·x̄ / 15 = 477.5 / 15 ≈ 31.83, which equals the population mean μ = 191 / 6 ≈ 31.83.

d. Standard error of the sample means (μ = 31.83):

x̄       f    x̄ − μ    (x̄ − μ)²    f(x̄ − μ)²
22.5     1    -9.33     87.0489     87.0489
26.75    1    -5.08     25.8064     25.8064
28       2    -3.83     14.6689     29.3378
29.25    1    -2.58      6.6564      6.6564
29.75    1    -2.08      4.3264      4.3264
31       2    -0.83      0.6889      1.3778
32.25    1     0.42      0.1764      0.1764
33.5     1     1.67      2.7889      2.7889
35.25    1     3.42     11.6964     11.6964
36.5     2     4.67     21.8089     43.6178
37.75    1     5.92     35.0464     35.0464
39.5     1     7.67     58.8289     58.8289
Total   15                          306.7085

Variance of the sample means: 306.7085 / 15 ≈ 20.45. Standard error: √20.45 ≈ 4.52.

For comparison, the population containing the values 14, 19, 26, 31, 48, and 53:

x     x − μ     (x − μ)²
14    -17.83    317.9089
19    -12.83    164.6089
26     -5.83     33.9889
31     -0.83      0.6889
48     16.17    261.4689
53     21.17    448.1689
N = 6           1226.8334

Population variance: σ² = 1226.8334 / 6 ≈ 204.47, so σ ≈ 14.30.

Difference between population and sampling distribution
The mean of the sample means (31.83) equals the population mean, but the standard error of the sample means (4.52) is much smaller than the population standard deviation (14.30). Because the samples are drawn without replacement from a finite population, the standard error equals (σ/√n)·√((N − n)/(N − 1)) = (14.30 / 2)·√(2/5) ≈ 4.52.

Problem Solving:
1. A school has 900 senior high school students.
The average height of these students is 68 in, with a standard deviation of 6 in. Suppose you draw a random sample of 50 students. Find the mean, standard deviation, and variance of the distribution of all sample means that can be derived from the samples.
Answer: Mean = 68 in; standard error = 6/√50 ≈ 0.8485 in; variance = 36/50 = 0.72 in². (These values ignore the finite population correction for N = 900.)

2. The average monthly income of teachers working in a public school is 25,000, with a standard deviation of 800. If a random sample of 15 teachers is selected, what are the mean, variance, and standard error of the corresponding distribution of sample means?
Answer: Mean = 25,000; standard error = 800/√15 ≈ 206.56; variance = 800²/15 ≈ 42,666.67.

Distribution of the sample mean of a Normal variable
If the population is normally distributed with mean μ and standard deviation σ, the sample mean is normally distributed with mean μ and standard error σ/√n for every sample size n, so probabilities for x̄ can be found from z = (x̄ − μ)/(σ/√n).
Example
Solution: The probability that a randomly selected sample from the population will have a mean systolic blood pressure less than 122 is 1.22%.
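The problem-solving answers and the normal-probability step can be reproduced with a short sketch. It assumes μ_x̄ = μ and σ_x̄ = σ/√n; the optional finite population correction argument and the helper names are additions for illustration, not part of the source. Because the blood-pressure example's μ, σ, and n are not stated here, the sketch only checks that a left-tail probability of about 1.22% corresponds to z ≈ −2.25.

```python
import math
from typing import Optional


def sampling_distribution_of_mean(mu: float, sigma: float, n: int,
                                  N: Optional[int] = None) -> tuple[float, float, float]:
    """Return (mean, standard error, variance) of the distribution of sample means.

    With N given, the finite population correction is applied; otherwise the
    population is treated as very large.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return mu, se, se ** 2


def prob_sample_mean_below(x: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean < x) for a normal population, via z = (x - mu) / (sigma / sqrt(n))."""
    z = (x - mu) / (sigma / math.sqrt(n))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Problem 1: heights, mu = 68 in, sigma = 6 in, n = 50
mean1, se1, var1 = sampling_distribution_of_mean(68, 6, 50)
print(f"Problem 1: mean={mean1}, SE={se1:.4f}, variance={var1:.2f}")

# Same problem with the finite population correction for N = 900
_, se1_fpc, _ = sampling_distribution_of_mean(68, 6, 50, N=900)
print(f"Problem 1 with N=900: SE={se1_fpc:.4f}")

# Problem 2: incomes, mu = 25,000, sigma = 800, n = 15
mean2, se2, var2 = sampling_distribution_of_mean(25_000, 800, 15)
print(f"Problem 2: mean={mean2}, SE={se2:.2f}, variance={var2:.2f}")

# A left-tail probability of about 1.22% sits at z = -2.25 (standard normal, n = 1)
print(f"{prob_sample_mean_below(-2.25, 0.0, 1.0, 1):.4f}")
```

With the correction for N = 900, the standard error in Problem 1 drops from about 0.8485 to about 0.8251, since the sample is more than 5% of the population.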