INTRODUCTION TO ENGINEERING ECONOMICS Chapter 1

CHAPTER 5: DATA COLLECTION AND SAMPLING
Outline
• Population and sample
• Sources of data
• Sampling
• Sampling plans
– Simple random sampling
– Stratified random sampling
– Cluster sampling
POPULATION AND SAMPLE
• Parameter: a summary measure of the population, usually
unknown or known only from published sources
• Statistic: a summary measure of the sample
SOURCES OF DATA
• Primary data:
– Data published (in printed form, on data tapes, on disks,
or on the internet) by the same organization that collected the data
– Some government agencies:
http://www.census.gov/
http://www.statcan.ca/
Sample data available from the Statistics Canada website
SOURCES OF DATA
• Secondary data:
– Data published by an organization different from the one that
originally collected and published the data
– A popular source of secondary data is the Statistical Abstract
of the United States
SOURCES OF DATA
• Observational and experimental studies:
– Observational study: data are collected and recorded
without controlling any factor, unlike in an
experimental study
– Experimental study: if more than one factor may cause
the same outcome, it may be desirable to vary one
factor at a time and control (keep unchanged) the other
factors, e.g.,
• aircraft primer paints are applied to improve finished
paint adhesion force, which depends on
– primer application method: dipping and spraying
– type of primer paint: type 1, 2, 3
SOURCES OF DATA
• an experiment was designed in which
– three specimens were painted with each primer
using each application method, a finish paint was
applied, and the adhesion force was measured.
The resulting data are shown below:
Adhesion Force Data
Primer Type    Dipping          Spraying
1              4.0, 4.5, 4.3    5.4, 4.9, 5.6
2              5.6, 4.9, 5.4    5.8, 6.1, 6.3
3              3.8, 3.7, 4.0    5.5, 5.0, 5.0
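One way to read the adhesion force data is to average the three readings in each primer/method cell and then compare the two application methods. The Python sketch below does only that; the variable and function names are illustrative, and it is a descriptive summary rather than a formal analysis of the experiment.

```python
from statistics import mean

# Adhesion force readings from the table, keyed by
# (primer type, application method).
adhesion = {
    (1, "dipping"): [4.0, 4.5, 4.3],
    (2, "dipping"): [5.6, 4.9, 5.4],
    (3, "dipping"): [3.8, 3.7, 4.0],
    (1, "spraying"): [5.4, 4.9, 5.6],
    (2, "spraying"): [5.8, 6.1, 6.3],
    (3, "spraying"): [5.5, 5.0, 5.0],
}

# Mean adhesion force for each primer/method combination.
cell_means = {cell: mean(values) for cell, values in adhesion.items()}


def method_mean(method):
    """Mean over all nine readings taken with one application method."""
    readings = [
        value
        for (_, cell_method), values in adhesion.items()
        if cell_method == method
        for value in values
    ]
    return mean(readings)


for (primer, method), cell_mean in sorted(cell_means.items()):
    print(f"primer {primer}, {method}: {cell_mean:.2f}")
print(f"dipping overall:  {method_mean('dipping'):.2f}")
print(f"spraying overall: {method_mean('spraying'):.2f}")
```

Because every cell holds the same number of specimens, the per-method averages weight each primer type equally.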
SOURCES OF DATA
• Surveys:
– Personal interview
– Telephone interview
– Questionnaire survey
SAMPLING
• Target population
– The population about which inference is desired
• Sampled population
– The actual population from which the sample has
been taken
• Self-selected samples
– Respondents choose to mail or call in their responses
– Such samples are usually biased
SAMPLING PLANS
• Simple random sampling
• Stratified random sampling
• Cluster sampling
SIMPLE RANDOM SAMPLING
• Suppose we have data about the annual incomes of 40
families in a spreadsheet file RANDSAMP.XLS.
• We want to choose a simple random sample of size 10
from this frame.
• How can this be done?
• And how do summary statistics of the chosen families
compare to the corresponding summary statistics of the
population?
SIMPLE RANDOM SAMPLING
The family income data are shown on the right
SIMPLE RANDOM SAMPLING
• A simple random sample is a sample in which the
sampling units are chosen from the population by means
of a random mechanism such as a random number table
so that every possible sample with the same number of
observations is equally likely to be chosen.
• For example, let sample 1 consist of families 1, 2, 3, 4, 5,
6, 7, 8, 9, 10 and sample 2 consist of families 1, 2, 3, 4, 5,
6, 7, 8, 9, 11. If a simple random sample is chosen, then
Samples 1 and 2 will be equally likely to be chosen.
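To give a sense of what "equally likely" means here, the short Python sketch below counts how many distinct samples of 10 families can be drawn from the 40 families in the frame. Under simple random sampling each of them, including Samples 1 and 2, has the same probability. The variable names are illustrative.

```python
from math import comb

population_size = 40  # families in the frame
sample_size = 10      # families to be chosen

# Number of distinct samples of 10 families that can be drawn from 40.
n_samples = comb(population_size, sample_size)

# Under simple random sampling every one of these samples is equally likely.
prob_each = 1 / n_samples

print(f"possible samples: {n_samples:,}")
print(f"probability of any one sample: {prob_each:.3e}")
```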
SIMPLE RANDOM SAMPLING
• Solution: The idea is very simple. We first generate a
column of random numbers in column C. Then we sort the
rows according to the random numbers and choose the
first 10 families in the sorted rows.
• The following procedure produces the results.
– Random numbers. Enter the formula =RAND() in cell
C10 and copy it down column C.
– Replace with values. To enable sorting we must
“freeze” the random numbers - that is, replace their
formulas with values. To do this, select the range
C10:C49, use Edit/Copy, and then use Edit/Paste
Special with the Values option.
SIMPLE RANDOM SAMPLING
– Copy to a new range. Copy the range A10:C49 to the
range E10:G49.
– Sort. Select the range E10:G49 and use the Data/Sort
menu item. Sort according to the Random # column in
ascending order. Then the 10 families with the 10
smallest random numbers are the ones in the sample.
– Means. Use the AVERAGE, MEDIAN and STDEV
functions in row 6 to calculate summary statistics of the
first 10 incomes in column F.
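The same random-number-and-sort idea can be sketched outside a spreadsheet. The Python below is a minimal illustration, not a reproduction of the worksheet: the income values are hypothetical stand-ins (the contents of RANDSAMP.XLS are not shown here), and the fixed seed and the function name `simple_random_sample` are assumptions made for the example.

```python
import random
from statistics import mean, median, stdev


def simple_random_sample(frame, n, rng):
    """Attach a random key to each unit, sort by key, keep the first n."""
    keyed = [(rng.random(), unit) for unit in frame]  # plays the role of =RAND()
    keyed.sort(key=lambda pair: pair[0])              # ascending sort on the key
    return [unit for _, unit in keyed[:n]]


rng = random.Random(0)  # fixed seed only so the run is repeatable

# Hypothetical annual incomes for families 1..40 (not the worksheet data).
incomes = {family: rng.randint(20_000, 90_000) for family in range(1, 41)}

chosen = simple_random_sample(list(incomes), 10, rng)
sample_incomes = [incomes[family] for family in chosen]
population_incomes = list(incomes.values())

# Compare sample summary statistics with the population's.
for label, data in (("population", population_incomes), ("sample", sample_incomes)):
    print(f"{label}: mean={mean(data):.0f} median={median(data):.0f} "
          f"stdev={stdev(data):.0f}")
```

Storing the random keys in a list before sorting corresponds to freezing the random numbers as values, so the order cannot change while the sort runs.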
SIMPLE RANDOM SAMPLING
The results of all the operations are shown on the right
STRATIFIED RANDOM SAMPLING
• Suppose we can identify various sub-populations within
the total population. We call these sub-populations strata.
• It makes sense to select a simple random sample from
each stratum instead of from the entire population. This is
called stratified sampling.
• This method is particularly useful when there is
considerable variation between the various strata but
relatively little variation within a given stratum.
STRATIFIED RANDOM SAMPLING
• To obtain a stratified random sample we must choose a
total sample size n, and we must choose a sample size n_i
for each stratum i.
• There are many ways to choose these numbers but the
most popular method is proportional sample sizes.
• The advantage of proportional sample sizes is that they
are very easy to determine. The disadvantage is that they
ignore differences in variability among the strata.
STRATIFIED RANDOM SAMPLING
• Sears has data on all 1000 people in the city of Smalltown
who have Sears credit cards.
• Sears is interested in estimating the average number of
other credit cards these people own, as well as other
information about their use of credit.
• The company decides to stratify these customers by age,
select a stratified sample of size 100 with proportional
sample sizes, and then contact these 100 people by
phone.
STRATIFIED RANDOM SAMPLING
• First, Sears must decide exactly how to stratify by age.
• The reasoning is that different age groups probably have
different attitudes and behavior regarding credit.
• After preliminary investigation they decide to have three
age categories: 18-30, 31-62, and 63-80.
• The numbers of customers in each category are as follows:
Category    Number of Customers
18 to 30    132
31 to 62    766
63 to 80    102
Total       1000
STRATIFIED RANDOM SAMPLING
• In a stratified random sampling with proportional sample
sizes, the total sample size of 100 is distributed in 3
categories as follows:
Category    Number of Customers    Sample Size
18 to 30    132                    132*100/1000 ≈ 13
31 to 62    766                    766*100/1000 ≈ 77
63 to 80    102                    102*100/1000 ≈ 10
Total       1000                   100
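The proportional allocation for the Sears example can be checked with a few lines of Python. This is a minimal sketch: the function name `proportional_allocation` is illustrative, and simple rounding is an assumption that happens to sum to 100 for these stratum sizes. With other stratum sizes the rounded values may need a manual adjustment to match the total.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Give each stratum a share of the sample proportional to its size."""
    population = sum(stratum_sizes.values())
    return {
        name: round(size * total_sample / population)
        for name, size in stratum_sizes.items()
    }


# Stratum sizes from the Sears example (1000 customers in total).
strata = {"18 to 30": 132, "31 to 62": 766, "63 to 80": 102}

allocation = proportional_allocation(strata, total_sample=100)

for name, n_i in allocation.items():
    print(f"{name}: {n_i}")
print("total:", sum(allocation.values()))
```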
CLUSTER SAMPLING
• Suppose a company is interested in various
characteristics of households in a particular city. The
sampling units are households.
• We could use the sampling methods discussed so far,
but another approach may be more convenient.
• We could divide the city into city blocks as sampling units
and then sample all the households in the chosen blocks.
• In this case the city blocks are called clusters and the
sampling is called cluster sampling.
CLUSTER SAMPLING
• The advantage of cluster sampling is sampling
convenience (and possibly less cost).
• It is straightforward to select a cluster sample. The key is
to define the sampling units as the clusters, then select a
simple random sample of clusters. Then sample all the
population members in each selected cluster.
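The procedure can be sketched in Python as follows. The city layout is hypothetical (20 blocks with made-up household labels), and the function name `cluster_sample`, the seed, and the choice of 4 blocks are assumptions made only for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Pick a simple random sample of clusters, then take every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


rng = random.Random(1)  # fixed seed only so the run is repeatable

# Hypothetical city: 20 blocks, each holding between 5 and 12 households.
city = {
    f"block-{b}": [f"block-{b}/house-{h}" for h in range(1, rng.randint(5, 12) + 1)]
    for b in range(1, 21)
}

chosen_blocks, sampled_households = cluster_sample(city, 4, rng)

print("chosen blocks:", chosen_blocks)
print("households sampled:", len(sampled_households))
```

Note that the number of households in the final sample is not fixed in advance; it depends on which blocks are drawn.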