Essentials of Marketing Research
Chapter 13: Determining Sample Size

WHAT DO STATISTICS MEAN?
• DESCRIPTIVE STATISTICS
  – NUMBER OF PEOPLE
  – TRENDS IN EMPLOYMENT
  – DATA
• INFERENTIAL STATISTICS
  – MAKE AN INFERENCE ABOUT A POPULATION FROM A SAMPLE

POPULATION PARAMETER VERSUS SAMPLE STATISTICS

POPULATION PARAMETER
• VARIABLES IN A POPULATION
• MEASURED CHARACTERISTICS OF A POPULATION
• GREEK LOWER-CASE LETTERS AS NOTATION, e.g., μ, σ, etc.

SAMPLE STATISTICS
• VARIABLES IN A SAMPLE
• MEASURES COMPUTED FROM SAMPLE DATA
• ENGLISH LETTERS FOR NOTATION, e.g., X̄ or S

MAKING DATA USABLE
• Data must be organized into:
  – FREQUENCY DISTRIBUTIONS
  – PROPORTIONS
  – CENTRAL TENDENCY
    • MEAN, MEDIAN, MODE
  – MEASURES OF DISPERSION
    • range, deviation, standard deviation, variance

Frequency Distribution of Deposits

Amount              Frequency   Percent   Probability
Under $3,000              499        16           .16
$3,000-$4,999             530        17           .17
$5,000-$9,999             562        18           .18
$10,000-$14,999           718        23           .23
$15,000 or more           811        26           .26
Total                   3,120       100          1.00

MEASURES OF CENTRAL TENDENCY
• MEAN: ARITHMETIC AVERAGE
• MEDIAN: MIDPOINT OF THE DISTRIBUTION
• MODE: THE VALUE THAT OCCURS MOST OFTEN

Number of Sales Calls Per Day by Salespersons

Salesperson    Number of Sales Calls
Mike           4
Patty          3
Billie         2
Bob            5
John           3
Frank          3
Chuck          1
Samantha       5
Total          26

Sales for Products A and B, Both Averaging About 200

Product A: 196 198 199 199 200 200 200 201 201 201 202 202
Product B: 150 160 176 181 192 200 201 202 213 224 240 261

MEASURES OF DISPERSION
• THE RANGE
• STANDARD DEVIATION

[Figure: Low Dispersion Versus High Dispersion. Two frequency plots (frequency 1 to 5) over Value on Variable (150 to 210), one showing low dispersion and one showing high dispersion.]

Standard Deviation
$S = \sqrt{S^2} = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

THE NORMAL DISTRIBUTION
• NORMAL CURVE
• BELL-SHAPED
• ALMOST ALL OF ITS VALUES ARE WITHIN PLUS OR MINUS 3 STANDARD DEVIATIONS
• I.Q. IS AN EXAMPLE OF A NORMAL DISTRIBUTION

[Figure: Normal Distribution around the MEAN. Area under the curve: 34.13% between the mean and 1 standard deviation on each side, 13.59% between 1 and 2 standard deviations, 2.14% between 2 and 3 standard deviations.]

[Figure: An example of the distribution of Intelligence Quotient (IQ) scores, with the same areas and the IQ axis marked at 70, 85, 100, 115, 130.]

STANDARDIZED NORMAL DISTRIBUTION
• SYMMETRICAL ABOUT ITS MEAN
• MEAN IDENTIFIES HIGHEST POINT
• INFINITE NUMBER OF CASES: A CONTINUOUS DISTRIBUTION
• TOTAL AREA UNDER THE CURVE (TOTAL PROBABILITY) = 1.0
• MEAN OF ZERO, STANDARD DEVIATION OF 1

[Figure: A Standardized Normal Curve, with standardized scores from −2 to 2 on the horizontal axis.]

• POPULATION DISTRIBUTION
• SAMPLE DISTRIBUTION
• SAMPLING DISTRIBUTION

[Figure: POPULATION DISTRIBUTION of x, centered on μ, with −σ and σ marked.]
[Figure: SAMPLE DISTRIBUTION, centered on X̄, with standard deviation S.]
[Figure: SAMPLING DISTRIBUTION, centered on $\mu_{\bar{X}}$, with standard deviation $S_{\bar{X}}$.]

STANDARD ERROR OF THE MEAN
• STANDARD DEVIATION OF THE SAMPLING DISTRIBUTION

CENTRAL LIMIT THEOREM

PARAMETER ESTIMATES
• POINT ESTIMATES
• CONFIDENCE INTERVAL ESTIMATES

RANDOM SAMPLING ERROR AND SAMPLE SIZE ARE RELATED

SAMPLE SIZE
• VARIANCE (STANDARD DEVIATION)
• MAGNITUDE OF ERROR
• CONFIDENCE LEVEL

Determining Sample Size

Recap: Sample Accuracy
• How close the sample's profile is to the true population's profile
• Sample size is not related to representativeness.
• Sample size is related to accuracy.

Methods of Determining Sample Size
• A compromise between what is theoretically perfect and what is practically feasible.
• Remember, the larger the sample size, the more costly the research.
• Why sample one more person than necessary?

Methods of Determining Sample Size
• Arbitrary
  – Rule of thumb (e.g., "a sample should be at least 5% of the population to be accurate")
  – Not efficient or economical
• Conventional
  – Assumes there is some "convention," or number believed to be the right size
  – Easy to apply, but can end up with a sample that is too small or too large

Methods of Determining Sample Size
• Cost Basis
  – Based on budgetary constraints
• Statistical Analysis
  – Certain statistical techniques require a certain number of respondents
• Confidence Interval
  – Theoretically the most correct method

Notion of Variability
[Figure: Two distributions centered on the same Mean, one showing little variability and one showing great variability.]

Notion of Variability
• Standard Deviation
  – Approximates the average distance away from the mean for all respondents to a specific question
  – Indicates the amount of variability in the sample
  – Example: compare a standard deviation of 500 with one of 1,000. Which exhibits more variability?

Measures of Variability
• Standard deviation: indicates the degree of variation or diversity in the values in such a way as to be translatable into a normal curve distribution
• Variance = $\dfrac{\sum (x - \bar{x})^2}{n - 1}$
• With a normal curve, the midpoint (apex) of the curve is also the mean, and exactly 50% of the distribution lies on either side of the mean.

Normal Curve and Standard Deviation

Number of standard     Percent of area     Percent of area
deviations from mean   under the curve     to the right or left
±1.00                  68%                 16%
±1.64                  90%                 5%
±1.96                  95%                 2.5%
±2.58                  99%                 0.5%

Notion of Sampling Distribution
• The sampling distribution refers to what would be found if the researcher could take many, many independent samples.
• The means for all of the samples should align themselves in a normal, bell-shaped curve.
• Therefore, there is a high probability that any given sample result will be close to, but not exactly equal to, the population mean.
[Figure: Normal, bell-shaped curve with its midpoint at the mean.]

Notion of Confidence Interval
• A confidence interval defines endpoints based on knowledge of the area under a bell-shaped curve.
• Normal curve
  – ±1.96 times the standard deviation theoretically defines 95% of the population
  – ±2.58 times the standard deviation theoretically defines 99% of the population

Notion of Confidence Interval
• Example
  – Mean = 12,000 miles
  – Standard deviation = 3,000 miles
• We are confident that 95% of the respondents' answers fall between 6,120 and 17,880 miles:
  – 12,000 + (1.96 × 3,000) = 17,880
  – 12,000 − (1.96 × 3,000) = 6,120

Notion of Standard Error of a Mean
• Standard error is an indication of how far away from the true population value a typical sample result is expected to fall.
• Formulas
  – $S_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
  – $S_p = \sqrt{\dfrac{p \times q}{n}}$
• where
  – $S_p$ is the standard error of the percentage
  – p = % found in the sample, and q = (100 − p)
  – $S_{\bar{X}}$ is the standard error of the mean
  – s = standard deviation of the sample
  – n = sample size

Computing Sample Size Using the Confidence Interval Approach
• To compute sample size, three factors need to be considered:
  – amount of variability believed to be in the population
  – desired accuracy
  – level of confidence required in your estimates of the population values

Determining Sample Size Using a Mean
• Formula (percentage): $n = \dfrac{pqz^2}{e^2}$
• Formula (mean): $n = \dfrac{s^2 z^2}{e^2}$
• Where
  – n = sample size
  – z = level of confidence (indicated by the number of standard errors associated with it)
  – s = variability indicated by an estimated standard deviation
  – p = estimated percentage in the population (p × q expresses its variability)
  – q = (100 − p)
  – e = acceptable error in the sample estimate of the population value

Determining Sample Size Using a Mean: An Example
• 95% level of confidence (z = 1.96)
• Standard deviation of 100 (from previous studies)
• Desired precision is ±10
• Therefore $n = \dfrac{100^2 \times 1.96^2}{10^2} = 384.16 \approx 384$

Practical Considerations in Sample Size Determination
• How to estimate variability in the population
  – prior research
  – experience
  – intuition
• How to determine the amount of precision desired
  – small samples are less accurate
  – how much error can you live with?
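The confidence interval, standard error, and sample size formulas above can be checked with a short script. This is a minimal Python sketch, not the textbook's method: the function names (`interval`, `standard_error_mean`, `standard_error_pct`, `sample_size_mean`, `sample_size_pct`) are hypothetical, percentages are assumed to be in percentage points (0 to 100), and results are rounded up with `math.ceil`, a common convention that gives 385 for the worked example rather than the slide's 384. The sample size formula follows from setting the acceptable error equal to z standard errors, $e = z \cdot s / \sqrt{n}$, and solving for n.

```python
import math


def interval(mean, sd, z=1.96):
    """Return (low, high) = mean -/+ z * sd, as in the 12,000-mile example."""
    return mean - z * sd, mean + z * sd


def standard_error_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def standard_error_pct(p, n):
    """Standard error of a percentage: sqrt(p * q / n), with q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)


def sample_size_mean(s, e, z=1.96):
    """n = s^2 * z^2 / e^2, rounded up to a whole respondent."""
    return math.ceil(s ** 2 * z ** 2 / e ** 2)


def sample_size_pct(p, e, z=1.96):
    """n = p * q * z^2 / e^2, rounded up; p and e are in percentage points."""
    q = 100 - p
    return math.ceil(p * q * z ** 2 / e ** 2)


if __name__ == "__main__":
    low, high = interval(12_000, 3_000)
    print(f"95% of answers fall between {low:,.0f} and {high:,.0f} miles")
    # Worked example from the slides: s = 100, e = 10, 95% confidence
    print(sample_size_mean(s=100, e=10))  # 385 (384.16 rounded up)
    # Hypothetical percentage case: p = 50%, e = +/-5 percentage points
    print(sample_size_pct(p=50, e=5))  # 385
```

Because p × q is largest when p = 50, using 50% is a conservative choice for the percentage formula when no prior estimate of p is available.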
Practical Considerations in Sample Size Determination
• How to determine the level of confidence desired
  – risk
  – normally use either 95% or 99%

Determining Sample Size
• Higher n (sample size) is needed when:
  – the standard error of the estimate is high (the population is more variable, so the sampling distribution of the test statistic is more spread out)
  – higher precision (a low degree of error) is needed (i.e., it is important to have a very precise estimate)
  – a higher level of confidence is required
• Constraints: cost and access

Notes About Sample Size
• Population size does not determine sample size.
• What most directly affects sample size is the variability of the characteristic in the population.
  – Example: if all population elements have the same value of a characteristic, then we only need a sample of one!
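The effect of each factor on the required sample size can be seen by changing one input at a time in the mean-based formula $n = s^2 z^2 / e^2$. The Python sketch below assumes the z values from the Normal Curve and Standard Deviation table (1.64, 1.96, 2.58), rounds up with `math.ceil`, and enforces a minimum of one respondent to mirror the "sample of one" note; the function name `required_n` and the scenario values are illustrative, not from the text. Population size is never an input, consistent with the first note; this holds when the sample is a small fraction of the population, and the sketch omits the finite population correction that is commonly applied otherwise.

```python
import math

# Two-sided z values from the Normal Curve and Standard Deviation table
Z_FOR_CONFIDENCE = {90: 1.64, 95: 1.96, 99: 2.58}


def required_n(s, e, confidence=95):
    """Sample size for estimating a mean: n = (z * s / e)^2, rounded up.

    s: estimated population standard deviation
    e: acceptable error (+/-), in the same units as s
    Population size is deliberately not a parameter.
    """
    z = Z_FOR_CONFIDENCE[confidence]
    # At least one respondent is needed even when s = 0 (no variability).
    return max(1, math.ceil((z * s / e) ** 2))


if __name__ == "__main__":
    scenarios = [
        ("baseline (s=100, e=10, 95%)", 100, 10, 95),
        ("more variability (s=200)", 200, 10, 95),
        ("more precision (e=5)", 100, 5, 95),
        ("more confidence (99%)", 100, 10, 99),
        ("no variability (s=0)", 0, 10, 95),
    ]
    for label, s, e, conf in scenarios:
        print(f"{label:32s} n = {required_n(s, e, conf)}")
```

Doubling the standard deviation and halving the acceptable error have the same effect (385 rises to 1,537), because n grows with the square of s / e.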