050 Sampling Theory

THEORY OF SAMPLING
Facilitator:
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
Director
Centre for Real Estate Studies
Faculty of Engineering and Geoinformation Science
Universiti Teknologi Malaysia
Skudai, Johor
Objectives

Overall: Reinforce your understanding from the main lecture

Specific:
* Concept of sampling
* Types of sampling techniques
* Some useful tips in sampling

What I will not do: teach every bit and piece of sampling techniques
Concept of sampling
“Definition”
* A process of selecting units from a population
* A process of selecting a sample to determine certain characteristics of a population
Concept of sampling
“Why sample”
* Economy
* Timeliness
* The large size of many populations
* Inaccessibility of some of the population
* Destructiveness of the observation
* Accuracy
In most cases, a census is unnecessary!

General Types of Sampling
* Probability sampling: utilizes some form of random selection
* Non-probability sampling: does not involve random selection
Random vs. non-random selection → issues of bias, sample validity, reliability of results, and generalization
Probability Sampling
Simple random
Stratified random
Systematic random
Cluster/area random
Multi-stage random
Non-probability Sampling
Convenience
Purposive
Simple random sampling
[Figure: a population of lettered elements (A, B, S, T, ...) from which a sample of elements is drawn at random]
Probability of selection = n/N (n = sample size, N = population size)
* When to use: the population is rather uniform (e.g. school/college students, low-cost houses)
* Advantages: simplest, fastest, cheapest
* If the population is not uniform, this is the wrong procedure: the results could be unreliable. Why?
Random selection:
* Pick any “element”
* Use a random number table
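As an illustration of the random selection step, here is a minimal Python sketch. It assumes the sampling frame is a plain list of element labels; the labels, the sample size of 5 and the seed are hypothetical, and a pseudo-random generator stands in for the random number table.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element has the same chance n/N."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(list(population), n)

# Hypothetical sampling frame of N = 20 lettered elements.
population = list("ABCDEFGHIJKLMNOPQRST")
sample = simple_random_sample(population, 5, seed=42)

print(sample)
print("Probability of selection:", 5 / len(population))
```

Sampling is without replacement, so no element can appear twice in the sample.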
Stratified random sampling
[Figure: a population of numbered units split into two strata, Stratum 1 = odd numbers and Stratum 2 = even numbers, with a random sample drawn from each stratum]
* Break the population into “meaningful” strata and take a random sample from each stratum
* Can be proportionate or disproportionate within strata
* When:
   * the population is not very uniform (e.g. shoppers, houses)
   * key sub-groups need to be represented → more precision
   * variability within groups affects research results
   * sub-group inferences are needed
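The idea of sampling randomly within each stratum can be sketched in Python as follows. This is only an illustration: the population of units numbered 1 to 20, the odd/even strata and the 20% sampling fraction are assumptions chosen to echo the odd/even strata above.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, units in strata.items():
        size = max(1, round(fraction * len(units)))  # at least one per stratum
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical population: units numbered 1..20, stratified by parity.
population = range(1, 21)
picked = stratified_sample(
    population, lambda u: "odd" if u % 2 else "even", fraction=0.2, seed=1
)
print(picked)
```

Because the same fraction is applied to every stratum, each stratum is represented in proportion to its size.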
Stratified random sampling (contd.)
“Disproportionate”
Let us say a sample of 250 companies is required for research on “strategic planning” practices among managers. The total company population is 550, but the sample frame obtained is 290. Sampling intensity = 250/550 = 45.5%.

Type of company     Sample frame    Sample stratum     Sample
Sole Proprietor     150             150/290 x 250      129
Partnership         58              58/290 x 250       50
Private Limited     82              82/290 x 250       71
Total               290                                250
Stratified random sampling (contd.)
“Proportionate”
Let us say a sample of 250 companies is required for research on “strategic planning” practices among managers. The total company population is 550, but the sample frame obtained is 290. The researcher decides to take 25% of the cases from each stratum. Sampling intensity = 74/550 = 13.5%.

Type of company     Sample frame    Sample stratum     Sample
Sole Proprietor     150             25/100 x 150       38
Partnership         58              25/100 x 58        15
Private Limited     82              25/100 x 82        21
Total               290                                74
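The arithmetic behind the two stratified examples can be checked with a short Python sketch. The stratum sizes and targets come from the slides; rounding half up is an assumption inferred from the slide figures (for example 37.5 → 38).

```python
import math

def round_half_up(x):
    """Round to the nearest integer, with .5 always rounded up."""
    return math.floor(x + 0.5)

frame = {"Sole Proprietor": 150, "Partnership": 58, "Private Limited": 82}
frame_total = sum(frame.values())   # 290 companies in the sample frame
population_total = 550              # total company population

# First example: share a fixed sample of 250 by each stratum's share of the frame.
by_share = {k: round_half_up(v * 250 / frame_total) for k, v in frame.items()}

# Second example: take 25% of the cases in each stratum.
by_fraction = {k: round_half_up(0.25 * v) for k, v in frame.items()}

print(by_share, sum(by_share.values()))
print(by_fraction, sum(by_fraction.values()))
print(round(100 * sum(by_share.values()) / population_total, 1))
print(round(100 * sum(by_fraction.values()) / population_total, 1))
```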
Systematic sampling
* Simple or stratified in nature
* Systematic in the “picking-up” of elements, e.g. every 5th visitor, every 10th house, every 15th minute
Steps:
* Number the population (1,…,N)
* Decide on the sample size, n
* Decide on the interval size, k = N/n
* Select a random integer between 1 and k as the starting point
* From the starting point, take every kth unit
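The steps above can be sketched in Python as follows. The population size of 100 and sample size of 20 are hypothetical, and the interval is taken as the integer part of N/n, which is an assumption for cases where N/n is not a whole number.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the unit numbers (1..N) picked at a fixed interval k = N // n."""
    rng = random.Random(seed)
    k = N // n                    # interval size
    start = rng.randint(1, k)     # random start between 1 and k, inclusive
    return [start + i * k for i in range(n)]

units = systematic_sample(N=100, n=20, seed=7)
print(units)
```

Only the starting point is random; once it is chosen, every other selected unit is determined by the interval.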
Systematic sampling (contd.)
“Example”
In a face-to-face consumer survey, a sample of 500
shoppers is planned for a 7-day (Mon. – Sun.)
period at a shopping complex. The sampling is
planned for 3 time blocks: 12-3 p.m., 3-6 p.m., and
6-9 p.m. Respondents are sub-divided into 4
ethnic groups: Malays (30%), Chinese (30%),
Indians (30%), and Others (10%). Finally, they are
categorized into “Family” and “Single”. No person
may be sampled more than once. Determine
your sampling plan and the time interval for
respondent “pick-up”.
Systematic sampling (contd.)
Sampling plan
* 500/7 ≈ 72 shoppers per day (rounded up)
* 72/3 = 24 shoppers per time block
* 24/3 = 8 shoppers per hour
* 8/4 = 2 shoppers per ethnic group per hour
* 60/8 = 7.5-minute “pick-up” interval
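The plan's arithmetic can be reproduced with a few lines of Python. All inputs come from the example; rounding the daily figure up is an assumption that matches the slide's 72. Note that the plan splits each hour equally across the four ethnic groups rather than by the 30/30/30/10 percentages given in the question.

```python
import math

target, days = 500, 7
blocks_per_day, hours_per_block = 3, 3
ethnic_groups = 4

per_day = math.ceil(target / days)           # 500/7 = 71.4..., rounded up
per_block = per_day // blocks_per_day        # shoppers per 3-hour block
per_hour = per_block // hours_per_block      # shoppers per hour
per_group_per_hour = per_hour // ethnic_groups
interval_minutes = 60 / per_hour             # minutes between "pick-ups"

print(per_day, per_block, per_hour, per_group_per_hour, interval_minutes)
```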
Cluster sampling
* Research involves spatial issues (e.g. do prices vary according to a neighbourhood’s level of crime?)
* Sampling involves analysis of geographic units
* Sampling involves extensive travelling → try to minimise logistics and resources
* Steps:
   * Divide the population into “clusters” (localities)
   * Choose clusters randomly (simple random, stratified, etc.)
   * Take all cases from each chosen cluster
* Efficient from an administrative perspective
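The cluster sampling steps can be sketched in Python as below. The locality names and house identifiers are hypothetical, and clusters are chosen by simple random selection, which is only one of the options mentioned.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Pick m clusters at random, then keep every case in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical localities, each listing the houses it contains.
clusters = {
    "Locality A": ["A1", "A2", "A3"],
    "Locality B": ["B1", "B2"],
    "Locality C": ["C1", "C2", "C3", "C4"],
    "Locality D": ["D1", "D2", "D3"],
}
selected = cluster_sample(clusters, m=2, seed=3)
print(selected)
```

Randomness enters only at the cluster level; inside a chosen cluster every case is taken, which is what keeps travel and administration low.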
Multi-stage sampling (contd.)
Among choices:
* Two-stage cluster (cluster first, then stratify within each cluster).
[Figure: three clusters (Tmn Daya, Tmn Perling, Tmn Tebrau), each divided into the strata M, C and I]
Multi-stage sampling (contd.)
* Three-stage stratified (locality first, then ethnic group, then family status).
[Figure: tree with three levels. Locality: Inner, Suburb, Outskirt. Ethnic: M, C, I. Family status: MD, UD]
Convenience sampling
Naïve sampling
Does not intend to represent the population
Selection based on one’s “convenience”, by
“accident”, or “haphazard” way
Common in popular surveys, public “view”
or “opinion” (e.g. by-the-road-side
“interviews”)
Serious bias – only one group included
Must be avoided
Purposive sampling
* Sampling involves “pre-determined” criteria, e.g. house buyers (25-45 years old), low-cost house buyers (income ≤ RM 2,500)
* Proportionality is not critical
* Achieves the sample size quickly
* More likely to get the required results about the target population. E.g. what causes tax defaults? → sample those who have not paid tax for, say, over 3 years.
* Can be useful if designed properly
* Types of purposive sampling: modal instance, expert panel, quota, heterogeneity/diversity, snowball
Purposive sampling (contd.)
“Modal instance”
* “Typical”, “most frequent”, or “modal” cases. E.g.
   * 60% of the Malaysian population earn ≤ RM 4,000 per month.
   * 65% of residential properties are single- and double-storey terrace units.
   * First-time house buyers have a mean age of 27 years.
   * The modal home is a single-storey terrace house priced at RM 120,000 per unit.
* Sample is taken to represent the population
* Population’s normal distribution can be analysed
Purposive sampling (contd.)
“Expert panel”
* A sample of persons with known or demonstrable experience and expertise in some area. E.g.
   * Economic growth in the next two years → ?
   * Challenges in ICT in Malaysia → ?
   * Best practices in corporate management → ?
* Advantages:
   * Best way to elicit the views of persons who have specific expertise.
   * Helps validate other sampling approaches.
* Disadvantages:
   * Even experts can be, and often are, wrong.
   * May be group-biased.
Purposive sampling (contd.)
“Quota sampling”
* Select cases non-randomly according to some fixed quota.
* Proportional quota
   * Represent major characteristics of the population by proportion, e.g. 40% women and 60% men.
   * Have to decide the specific characteristics for the quota (e.g. gender, age, education, race, religion, etc.)
* Non-proportional quota
   * Specify a minimum number of cases in each category.
   * Not concerned with the upper limit of the quota, simply with having enough cases to assure enumeration.
   * Smaller groups are adequately represented in the sample.
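A proportional quota can be sketched in Python as follows. The stream of respondents, the sample size of 10 and the resulting quotas (4 women, 6 men, from the 40%/60% split above) are hypothetical. Cases are taken in arrival order, so the selection is deliberately non-random.

```python
def quota_sample(stream, category_of, quotas):
    """Take cases in arrival order (non-random) until every quota is filled."""
    counts = {category: 0 for category in quotas}
    selected = []
    for case in stream:
        category = category_of(case)
        # Accept the case only if its category still has room in the quota.
        if counts.get(category, 0) < quotas.get(category, 0):
            selected.append(case)
            counts[category] += 1
        if counts == quotas:
            break
    return selected

# Hypothetical passers-by as (id, gender), alternating M and F.
passers_by = [(i, "M" if i % 2 else "F") for i in range(1, 31)]
quotas = {"F": 4, "M": 6}
sample = quota_sample(passers_by, lambda p: p[1], quotas)
print(sample)
```

For a non-proportional quota, the same function could be used with minimum counts per category instead of proportions.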
Purposive sampling (contd.)
“Heterogeneity/diversity sampling”
* Almost the opposite of modal instance sampling
* Include all opinions or views
* Proportionate representation of the population is not important
* Broad spectrum of ideas, not identifying the “average” or “modal instance”. E.g.
   * Challenges in ICT: different user groups have or perceive different challenges.
* What is sampled is not people but, perhaps, ideas
* Ideas can be “outlier” or unusual ones.
Purposive sampling (contd.)
“Snowball sampling”
* Identify a case that meets the criteria for inclusion in the study.
* Find another case, that also meets the criteria, based on the first one.
* Next, search for others based on the previous ones, and so on.
* Hardly leads to a representative sample, but useful when the population is inaccessible or hard to find. E.g.
   * the homeless
   * forced sales properties
   * wound-up companies
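The snowball procedure can be pictured as following referrals from one qualifying case to the next. The Python sketch below assumes referrals are available as a simple lookup table; the case labels, the referral links and the inclusion criterion are all hypothetical.

```python
from collections import deque

def snowball_sample(first_case, referrals, meets_criteria, max_size):
    """Start from one case and follow referrals from each included case."""
    selected = []
    seen = {first_case}
    queue = deque([first_case])
    while queue and len(selected) < max_size:
        case = queue.popleft()
        if not meets_criteria(case):
            continue  # excluded cases are not asked for referrals
        selected.append(case)
        for referred in referrals.get(case, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return selected

# Hypothetical referral links: each case names others it knows of.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["A"]}
result = snowball_sample("A", referrals, lambda c: c != "C", max_size=10)
print(result)
```

Case E is never reached because it is known only through C, which fails the criterion. This illustrates why a snowball sample is rarely representative.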
Some tips
“Determining sample size”
Rules of thumb:
* anything ≥ 30 cases
* smaller population needs greater
sampling intensity
* type of sample
Statistical rules:
* level of accuracy required
* a priori population parameter
* type of sample

Why does sample size matter?
* Too large → wastes time, resources and money
* Too small → inaccurate results
* Generalizability of the study results
* Minimum sample size needed to estimate a population parameter
Determining sample size
“Example”
* Many ways
* One way → use a statistical sample-size formula
* Different sample types have different formulas
* Based on simple random sampling:

$$ n = \left( \frac{Z_{\alpha/2}\,\sigma}{E} \right)^2 $$

n = required sample size
Z_{α/2} = known critical value, based on the level of confidence (1 – α)
σ = std. deviation of the population (must be known)
E = maximum difference allowed between the sample mean and the population mean (margin of error)
Determining sample size
“Numerical example”
Problem
A researcher would like to estimate the average weekly spending of households at a shopping complex for the client’s business plan and model. How many households must be randomly selected to be 95% sure that the sample mean is within RM 25 of the population mean? Information on households shows that the standard deviation of weekly spending per household is RM 160.
Tips for solution
* We are solving for the sample size n.
* A 95% degree of confidence corresponds to α = 0.05.
* Each of the two tails of the standard normal curve has an area of α/2 = 0.025.
* The region between Z = 0 and the critical value Z_{α/2} has an area of 0.5 - 0.025, or 0.475.
* Table of the Standard Normal (Z) Distribution: area of 0.475 → ‘critical value’ = 1.96.
* Margin of error = 25, std. deviation = 160
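Putting these values into the simple random sampling formula gives n = (1.96 × 160 / 25)² ≈ 157.35, which is rounded up to 158 households. The Python sketch below repeats the calculation. The function name is hypothetical, and the critical value is computed from the standard normal distribution instead of being read from a table, so it is 1.95996... rather than 1.96.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """n = (Z_{alpha/2} * sigma / margin)^2, rounded up to a whole unit."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 at 95%
    return math.ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=160, margin=25, confidence=0.95)
print(n)
```

The result is always rounded up, because rounding down would leave the margin of error slightly wider than required.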
Test yourselves!
1. A hypothesis in a research study says that “investment yield is not significantly
influenced by the risk attitude of the investor”. How would you determine
your sample to prove or disprove it?
2. Some issues are posed in a social research study, among other things, as
follows:
* What constitutes “good governance”?
* What is “good leadership”?
* What is an “effective strategy”?
Suggest how you would design your sample to obtain wide-spectrum
yet valid answers to these issues.
Thank you!