Part one: Designing Survey Samples
1.1 Overview: sampling
2016/3/11 www.uic.edu.hk/~xlpeng

What is a population?
Population: the group of people, items or units under investigation.
Examples: all cars built by a particular company in the year 2001; all apples sold as Grade I by a particular supermarket; all students in a university; all smokers.

What is a sample?
Sampling frame: a list of the members of the population of interest.
Sample: a subset of a population.
Sampling unit: each member of the population.
When the sampling units are people, the sample is often referred to as a sample survey.

Sampling Frame
A sampling frame should be comprehensive, complete and up-to-date.
Examples of sampling frames: the Electoral Register; the telephone book of Zhuhai; the student name list of this course. Any others?

Why Sample the Population?
The cost of studying all the items in a population.
The time-consuming aspect of contacting the whole population.
The destructive nature of certain tests.
The adequacy of sample results in most cases.

Sampling design is very important!
Before analyzing data statistically, it is important to consider whether the data were collected appropriately. Many years of labor and even careers have been virtually wasted because of fundamental flaws in the data collection step. The statistical analysis will likely be only a minor part of the total expense of a properly conducted experiment, so time, effort, and money spent ensuring that the data are collected appropriately are certainly well spent.

About the sample size
In theory, the larger the sample, the better. However, observations cost money, time and talent. If the sample size is too small, we have bought inadequate information for the time and effort expended, which is also wasteful.

Probability and Non-Probability Sampling
A probability sample is one in which each member of the population has a known (nonzero) chance of being selected.
In a non-probability sample, some people have a greater, but unknown, chance of selection than others.

Types of Sampling
Non-probability samples: purposive, quota, snowball, convenience.
Probability samples: simple random, stratified, systematic, cluster.

Types of Probability Sampling
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling

Simple Random Sampling
If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sampling procedure is called simple random sampling.
For example, to randomly pick two different people from a group of 15: number the people from 1 to 15 and write their numbers on 15 different slips of paper. Thoroughly mix the slips and randomly pick two of them. The numbers on the slips identify the people for the sample.

Drawing the Random Sample
If the population is large, use a table of random numbers. See Table 1.1 for a table of random numbers.
In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
As a demonstration of the use of random numbers: randomly select 100 employees from a bank that has 2,136 employees.

The Purpose in Example 1.1
A bank bought a 500-minute plan for its 2,136 employees, and the bank wants to estimate its cellular phone cost. However, it is time-consuming to study the mobile phone minutes of all 2,136 employees. Instead, the bank plans to draw conclusions about phone cost by studying the number of minutes used last month by each of 100 randomly selected employees.

How to Draw a Random Sample in Example 1.1?
The bank makes a numbered list of the 2,136 employees (the list is called a frame).
Use a random number table (see Table 1.1a). Each number in the table has 5 digits.
Use the last digit if the population size < 10.
Use the last two digits if the population size < 100.
…….
In the example, the size of the population is 2136, so we need to use the last four digits.
Arbitrarily select any number in the table whose last four digits are not greater than 2136; say we choose 0511 from 90511. Employee 0511 is the first randomly selected employee.
Move in any direction from 0511 (up, down, right, or left). If the next four digits are greater than 2136, or repeat an employee already selected, ignore them; otherwise accept them as the next randomly selected employee.
Continue this procedure until we obtain the entire random sample of 100 employees (phone users).

Systematic Random Sampling
Every kth member of the population is sampled, with a random start.
The items or individuals of the population are arranged in some order. A random starting point is selected, and then every kth member of the population is selected for the sample.
That is, the units of the population are first arranged in order according to some characteristic, and then one unit is drawn at each fixed interval to form the sample.

Systematic Random Sampling
Population elements are an ordered sequence (at least, conceptually).
The first sample element is selected randomly from the first k population elements.
Thereafter, sample elements are selected at a constant interval, k, from the ordered sequence frame.
k = N/n
where: n = sample size, N = population size, k = size of the selection interval.

Example 1.2 Rating a New Bottle Design
A soft drink company has designed a new bottle and wants to know consumer reaction to the new design. The company shows the new design to some of the shoppers in a big shopping mall and asks them to rate the bottle image. The image will be measured by combining consumers' responses to five items on a survey form.

How to Draw a Random Sample in Example 1.2?
In this example, it is not possible to list and number each shopper at the mall while the study is being conducted. Consequently, we cannot use a random number table or computer-generated random numbers.
Instead, we can select every kth (say k = 100) shopper passing a specific location in the mall and invite him or her to participate in the survey. A sample obtained in this way is called a systematic sample.
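The two selection procedures described so far can also be carried out with computer-generated random numbers. The following is a minimal sketch, not the course's own code: the function names and the fixed seed are assumptions made for illustration, and the interval k is rounded down when N/n is not a whole number. It uses the bank figures from Example 1.1 (N = 2136, n = 100) and the figures from the purchase-order audit example in this section (N = 10,000, n = 50).

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct labels from 1..population_size.

    Every possible set of sample_size labels is equally likely,
    which is the defining property of simple random sampling.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start among the first k labels, then every k-th label."""
    rng = random.Random(seed)
    k = population_size // sample_size  # selection interval k = N/n, rounded down
    start = rng.randint(1, k)           # random start in 1..k (inclusive)
    return [start + i * k for i in range(sample_size)]


# Example 1.1: 100 of the bank's 2136 numbered employees (seed is arbitrary).
employees = simple_random_sample(2136, 100, seed=1)

# Audit example: 50 of 10,000 serialized purchase orders, so k = 200.
orders = systematic_sample(10_000, 50, seed=1)

print(employees[:5])
print(orders[:4])
```

Unlike the table method, `random.sample` never returns the same label twice, so no numbers have to be skipped by hand.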
Systematic Random Sampling: Example
Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).
A sample of fifty (n = 50) purchase orders is needed for an audit.
k = 10,000/50 = 200
The first sample element is randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected.
Subsequent sample elements: 245, 445, 645, . . .

Systematic Random Sampling
Systematic random sampling is convenient and relatively easy to administer, and hence produces fewer selection errors.
Another example: the historic event leading to the word decimate, where every 10th Roman soldier was killed, is a gruesome example of systematic sampling.

Stratified Random Sampling
A population is first divided into subgroups, called strata, and a sample is selected from each stratum.
That is, the population is first divided into several groups (classes) according to some characteristic, each group being called a stratum, and simple random sampling is then carried out within each stratum.

Example of stratified sampling
Suppose we want to ensure that a sample of 5 students from a group of 50 contains male and female students in the same proportions as in the full population (i.e. the group of 50).
We first divide the population into males and females (here, 22 males and 28 females).
To work out the number of males and females in the sample:
No. of males in sample = (5 / 50) × 22 = 2.2
No. of females in sample = (5 / 50) × 28 = 2.8
Round the numbers, and we choose 2 males and 3 females for the sample.
These would be selected using simple random or systematic sampling methods.
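The allocation arithmetic above can be sketched in a few lines. This is an illustration under stated assumptions: the function name, the student labels (M1..M22, F1..F28) and the seed are hypothetical, and plain rounding is used as in the example, although in general rounded allocations need not add up exactly to the intended sample size.

```python
import random


def proportional_allocation(stratum_sizes, sample_size):
    """Return exact and rounded per-stratum sample sizes.

    Each stratum receives (sample_size / N) * stratum_size units,
    where N is the total population size.
    """
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total
             for name, size in stratum_sizes.items()}
    # Plain rounding, as in the example; the rounded values are not
    # guaranteed to sum to sample_size for every population.
    rounded = {name: round(value) for name, value in exact.items()}
    return exact, rounded


# Hypothetical labels for the 22 male and 28 female students.
students = {
    "male": [f"M{i}" for i in range(1, 23)],
    "female": [f"F{i}" for i in range(1, 29)],
}
sizes = {name: len(members) for name, members in students.items()}

exact, rounded = proportional_allocation(sizes, 5)
print(rounded)  # {'male': 2, 'female': 3}

# Simple random sampling within each stratum (seed chosen arbitrarily).
rng = random.Random(0)
sample = {name: rng.sample(members, rounded[name])
          for name, members in students.items()}
print(sample)
```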
Stratified Random Sampling
Stratification variable 1: Gender (male; female)
Stratification variable 2: Age (< 20; 20-30; 31-40; 41-50; 51-60; > 60)
Stratification variable 3: Occupation (professional; clerical; blue collar; other)
We can acquire information about the total population, make inferences within a stratum, or make comparisons across strata.

Stratified Random Sampling
After the population has been stratified, we can use simple random sampling to generate the complete sample:
If we only have sufficient resources to sample 400 people total, we would draw 100 of them from the low income group…
…if we are sampling 1000 people, we'd draw 50 of them from the high income group.

Reasons for using stratified random sampling
Stratification may produce a smaller error of estimation.
Estimates of population parameters may be desired for subgroups and compared across them.

Cluster Sampling
A population is first divided into primary units, and then samples are selected from the primary units.
In cluster sampling, the population is first divided into several groups according to some characteristic, each group being called a cluster; a simple random sample of clusters is drawn, and every unit in each selected cluster is then surveyed.

Cluster Sampling
The population is divided into several "clusters," each representative of the population.
A simple random sample of clusters is selected.
All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
(Figure: a population divided into 16 clusters, with randomly selected clusters forming the sample.)

Other examples of cluster samples:
City blocks are frequently used as clusters of households or people.
An automobile forms a nice cluster of four tires for studies of tire safety.

Cluster sampling is an effective design under conditions such as the following:
A good frame listing population elements is unavailable or very costly to obtain, while a frame listing clusters is easily obtained.

Difference between stratified random sampling and cluster sampling
In stratified random sampling, we take a simple random sample within each group.
In cluster sampling, we take a simple random sample of groups and then all items within the selected groups (clusters).
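The contrast between the two designs can be made concrete with a short sketch. The 16 groups of 10 units, the group and unit labels, the function names and the seed are all hypothetical choices for illustration; the only point is which step is randomized in each design.

```python
import random


def stratified_sample(groups, per_group, seed=None):
    """Simple random sample of per_group units within EVERY group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group)
            for name, members in groups.items()}


def cluster_sample(groups, n_clusters, seed=None):
    """Simple random sample of groups; ALL units of chosen groups are kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_clusters)
    return {name: list(groups[name]) for name in chosen}


# Hypothetical population: 16 groups with 10 units each.
groups = {f"G{g}": [f"G{g}-unit{u}" for u in range(1, 11)]
          for g in range(1, 17)}

strat = stratified_sample(groups, per_group=2, seed=0)
clus = cluster_sample(groups, n_clusters=4, seed=0)

print(len(strat), "groups represented,", sum(map(len, strat.values())), "units")
print(len(clus), "groups represented,", sum(map(len, clus.values())), "units")
```

The stratified sample touches every group but only a few units in each; the cluster sample touches only a few groups but surveys them exhaustively, which is why it needs only a frame of clusters.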
Identify the type or types of sampling used
George went through the telephone book and called every 89th person listed.
Four people divided the telephone book evenly, and each randomly sampled from their portion.
All people with a 461 telephone exchange are called.
Every 5th block of 10 students leaving the Eau Claire High School cafeteria on June 31 is exhaustively sampled about their faith in random samples.

Purposive Sampling (Judgement Sampling)
Examples:
Hand-picking people at a shopping center who fit a particular description.
Soliciting participation from students in just one section of a course when many different sections are offered and of interest.
Only soliciting participation from UIC students when the population of interest is college students in all of China.

Quota Sampling
Selection procedures are similar to purposive sampling.
Essentially, the major difference is that sample representativeness is of utmost importance. Specifically, the proportion of various elements in the sample is supposed to closely match that in the population (or, at least, our best guess or most trusted estimate of those proportions in the population).
Examples:
Hand-picking people at a shopping center until we obtain n = 70 female shoppers and n = 30 male shoppers.
Soliciting participation from UIC students until we have 2.5 times as many DBM students as DST students for our study.

Examples:
People living with HIV/AIDS
People with rare genetic disorders
Celebrities
Seniors/elders
Prisoners