THEORY OF SAMPLING

Facilitator: Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
Director, Centre for Real Estate Studies
Faculty of Engineering and Geoinformation Science
Universiti Teknologi Malaysia
Skudai, Johor

Objectives
Overall: Reinforce your understanding from the main lecture
Specific:
* Concept of sampling
* Types of sampling techniques
* Some useful tips in sampling
What I will not do: teach every bit and piece of sampling techniques

Concept of sampling “Definition”
* A process of selecting units from a population
* A process of selecting a sample to determine certain characteristics of a population

Concept of sampling “Why sample”
* Economy
* Timeliness
* The large size of many populations
* Inaccessibility of some of the population
* Destructiveness of the observation
* Accuracy
In most cases, a census is unnecessary!

General Types of Sampling
* Probability sampling: utilizes some form of random selection
* Non-probability sampling: does not involve random selection
Random/non-random → issue of bias, sample validity, reliability of results, generalization

Probability Sampling
* Simple random
* Stratified random
* Systematic random
* Cluster/area random
* Multi-stage random
Non-probability Sampling
* Convenience
* Purposive

Simple random sampling
[Figure: elements picked at random from a population into a sample]
Probability selected = ni/N
When: population is rather uniform (e.g. school/college students, low-cost houses)
Simplest, fastest, cheapest
Could be unreliable, why?
* Population not uniform
* Wrong procedure
Random selection:
* Pick any “element”
* Use random table

Stratified random sampling
[Figure: a numbered population divided into Stratum 1 (odd numbers) and Stratum 2 (even numbers), with a random sample drawn from each stratum]
Break population into “meaningful” strata and take a random sample from each stratum
Can be proportionate or disproportionate within strata
When:
* population is not very uniform (e.g.
shoppers, houses)
* key sub-groups need to be represented → more precision
* variability within group affects research results
* sub-group inferences are needed

Stratified random sampling (contd.) “Disproportionate”
Let us say a sample of 250 companies is required for research on “strategic planning” practices among managers. The total company population is 550, but the sample frame obtained is 290. Sampling intensity = 250/550 = 45.5%

Type of company    Sample frame    Sample stratum    Sample
Sole Proprietor    150             150/290 x 250     129
Partnership        58              58/290 x 250      50
Private Limited    82              82/290 x 250      71

Stratified random sampling (contd.) “Proportionate”
Let us say a sample of 250 companies is required for research on “strategic planning” practices among managers. The total company population is 550, but the sample frame obtained is 290. The researcher decides to take 25% of cases from each stratum. Sampling intensity = 74/550 = 13.5%.

Type of company    Sample frame    Sample stratum    Sample
Sole Proprietor    150             25/100 x 150      38
Partnership        58              25/100 x 58       15
Private Limited    82              25/100 x 82       21

Systematic sampling
Simple or stratified in nature
Systematic in the “picking-up” of elements. E.g. every 5th visitor, every 10th house, every 15th minute
Steps:
* Number the population (1,…,N)
* Decide on the sample size, n
* Decide on the interval size, k = N/n
* Select an integer between 1 and k
* Take a case at every kth unit

Systematic sampling (contd.) “Example”
In a face-to-face consumer survey, a sample of 500 shoppers is planned for a 7-day (Mon. to Sun.) period at a shopping complex. The sampling is planned for 3 time blocks: 12-3 p.m., 3-6 p.m., and 6-9 p.m. Respondents are sub-divided into 4 ethnic groups: Malays (30%), Chinese (30%), Indian (30%), and Others (10%). Finally, they are categorized into “Family” and “Single”. Repeat persons are not allowed in the sampling. Determine your sampling plan and the timing of the respondent “pick-up” interval.
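The systematic sampling steps listed earlier (number the population, choose n, compute k = N/n, pick a random start between 1 and k, then take every kth unit) can be sketched in Python. This is only an illustrative sketch: the function name `systematic_sample`, the `seed` parameter, and the choice to use the integer part of N/n as the interval are assumptions made here, not part of the lecture.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick n units: a random start between 1 and k, then every k-th unit."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                     # interval size (integer part of N/n, an assumption)
    rng = random.Random(seed)      # seeded only so the sketch is repeatable
    start = rng.randint(1, k)      # random integer between 1 and k, inclusive
    # Units are numbered 1..N, so unit i sits at list index i - 1.
    picks = [population[i - 1] for i in range(start, N + 1, k)]
    return picks[:n]               # trim extras when N is not a multiple of n


# Hypothetical usage: 100 numbered visitors, sample of 20 -> every 5th visitor.
visitors = list(range(1, 101))
sample = systematic_sample(visitors, 20, seed=1)
```

With 100 visitors and n = 20 the interval is k = 5, so the sample consists of 20 visitors spaced 5 apart, starting from a randomly chosen visitor among the first 5.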
Systematic sampling (contd.) “Sampling plan”
* 500/7 ≈ 72 shoppers per day (rounded up)
* 72/3 = 24 shoppers per time block
* 24/3 = 8 shoppers per hour
* 8/4 = 2 shoppers per ethnic group per hour
* 60/8 = 7.5-minute “pick-up” interval

Cluster sampling
When:
* Research involves spatial issues (e.g. do prices vary according to the neighbourhood’s level of crime?)
* Sampling involves analysis of geographic units
* Sampling involves extensive travelling → try to minimise logistics and resources
Steps:
* Divide population into “clusters” (localities)
* Choose clusters randomly (simple random, stratified, etc.)
* Take all cases from each cluster
Efficient from an administrative perspective

Cluster sampling “Example”

Multi-stage sampling (contd.)
Among choices:
* Two-stage cluster (cluster first, then stratify within cluster).
[Diagram: clusters Tmn Daya, Tmn Perling and Tmn Tebrau, each divided into strata M, C and I]

Multi-stage sampling (contd.)
* Three-stage stratified (locality first, then ethnic group, then family status).
[Diagram: locality (Inner, Suburb, Outskirt) → ethnic group (M, C, I) → family status (MD, UD)]

Convenience sampling
* Naïve sampling
* Does not intend to represent the population
* Selection based on one’s “convenience”, by “accident”, or in a “haphazard” way
* Common in popular surveys, public “view” or “opinion” polls (e.g. roadside “interviews”)
* Serious bias: only one group included
* Must be avoided

Purposive sampling
* Sampling involves “pre-determined” criteria. E.g. house buyers (25-45 years old), low-cost house buyers (income ≤ RM 2,500)
* Proportionality is not critical
* Achieves sample size quickly
* More likely to get the required results about the target population. E.g. what causes tax defaults? → sample those who have not paid tax for, say, over 3 years.
* Can be useful if designed properly
Types of purposive sampling: modal instance, expert panel, quota, heterogeneity/diversity, snowball

Purposive sampling (contd.) “Modal instance”
“Typical”, “most frequent”, or “modal” cases. E.g.
* 60% of the Malaysian population earns ≤ RM 4,000 per month.
* 65% of residential properties comprise single- and double-storey terrace units.
* First-time house buyers have a mean age of 27 years.
* The modal home is a single-storey terrace house priced at RM 120,000 per unit.
Sample is taken to represent the population
Population’s normal distribution can be analysed

Purposive sampling (contd.) “Expert panel”
A sample of persons with known or demonstrable experience and expertise in some area. E.g.
* Economic growth in the next two years → ?
* Challenges in ICT in Malaysia → ?
* Best practices in corporate management → ?
Advantages:
* Best way to elicit the views of persons who have specific expertise.
* Helps validate other sampling approaches.
Disadvantages:
* Even experts can be, and often are, wrong.
* May be group-biased.

Purposive sampling (contd.) “Quota sampling”
Select cases non-randomly according to some fixed quota.
Proportional quota
* Represent major characteristics of the population by proportion. E.g. 40% women and 60% men.
* Have to decide the specific characteristics for the quota (e.g. gender, age, education, race, religion, etc.)
Non-proportional quota
* Specific minimum number of cases in each category.
* Not concerned with the upper limit of the quota, simply with having enough cases to assure enumeration.
* Smaller groups are adequately represented in the sample.

Purposive sampling (contd.) “Heterogeneity/diversity sampling”
* Almost the opposite of modal instance sampling
* Include all opinions or views
* Proportionate representation of the population is not important
* Broad spectrum of ideas, not identifying the “average” or “modal instance”. E.g. challenges in ICT: different user groups have or perceive different challenges.
* What is sampled is not people but, perhaps, ideas
* Ideas can be “outlier” or unusual ones.

Purposive sampling (contd.) “Snowball sampling”
Identify a case that meets the criteria for inclusion in the study.
Find another case that also meets the criteria, based on the first one.
Next, search for others based on the previous ones, and so on.
Rarely leads to a representative sample, but useful when the population is inaccessible or hard to find. E.g.
* the homeless
* forced-sale properties
* wound-up companies

Some tips “Determining sample size”
Rules of thumb:
* anything ≥ 30 cases
* smaller population needs greater sampling intensity
* type of sample
Statistical rules:
* level of accuracy required
* a priori population parameter
* type of sample
Why does sample size matter?
* Too large → waste of time, resources and money
* Too small → inaccurate results
* Generalizability of the study results
* Minimum sample size needed to estimate a population parameter.

Determining sample size “Example”
* Many ways
* One way → use statistical sample
* Different sample types have different formulae
Based on simple random sampling:
$$n = \left( \frac{Z_{\alpha/2}\,\sigma}{E} \right)^{2}$$
where
n = required sample size
Z_{α/2} = known critical value, based on level of confidence (1 − α)
σ = std. deviation of population (must be known)
E = maximum precision (margin of error) required between sample and population mean

Determining sample size “Numerical example”
Problem
A researcher would like to estimate the average weekly spending of households in a shopping complex for the client’s business plan and model. How many households must we randomly select to be 95% sure that the sample mean is within RM 25 of the population mean? Information on households shows that the std. deviation of average weekly spending per household = RM 160.
Tips for solution
* We are solving for the sample size n.
* A 95% degree of confidence corresponds to α = 0.05.
* Each of the shaded tails of the standard normal curve has an area of α/2 = 0.025.
* The region between Z = 0 and the critical value, on either side, has an area of 0.5 − 0.025 = 0.475.
* Table of the Standard Normal (Z) Distribution: area of 0.475 → “critical value” = 1.96.
* Margin of error E = 25, std. deviation σ = 160.
* Hence n = (1.96 × 160 / 25)² ≈ 157.4, which is rounded up to 158 households.

Test yourselves!
1. A hypothesis in a study states that “investment yield is insignificantly influenced by the risk attitude of the investor”.
How would you determine your sample to prove or disprove it?
2. Some issues are posed in social research, among other things, as follows:
* What constitutes “good governance”?
* What is “good leadership”?
* What is an “effective strategy”?
Suggest how you would design your sample to obtain wide-spectrum yet valid answers to these issues.

Thank you!
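
As a supplementary sketch for the sample-size numerical example in these notes, the calculation can be checked in Python. It assumes the usual simple-random-sampling formula n = (Z × σ / E)², with Z the two-sided critical value for the chosen confidence level; the function name `sample_size_for_mean` and its parameter names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Minimum n so the sample mean is within `margin` of the population mean."""
    alpha = 1.0 - confidence
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up: a fractional household cannot be sampled.
    return math.ceil((z * sigma / margin) ** 2)


# Figures from the numerical example: std. deviation RM 160, margin RM 25.
households = sample_size_for_mean(sigma=160, margin=25)
print(households)  # 158
```

Halving the margin of error to RM 12.50 roughly quadruples the required sample, since n grows with the square of 1/E.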