NUMERICALLY SUMMARIZING DATA

NOTATION
N = SIZE OF POPULATION
n = SIZE OF SAMPLE
µ = MEAN OF POPULATION
$\bar{X}$ = MEAN OF SAMPLE
Σ = SUM OF THE INDIVIDUAL VALUES
σ = POPULATION STANDARD DEVIATION
s = SAMPLE STANDARD DEVIATION

MEASURES OF CENTRAL TENDENCY (3.1)

MEAN (or average) of a POPULATION:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

MEAN (or average) of a SAMPLE:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

EXAMPLE (n = 20):
5.3 5.5 5.6 5.7 5.7 5.8 5.9 6.2 6.3 6.3 6.4 6.6 6.6 6.7 6.8 7.1 7.1 7.3 7.6 7.9
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{128.4}{20} = 6.42$

The Sample Mean is ONLY an estimate of the (real) Population Mean. To know the (real) Population Mean, all the individuals of the population must be used in the calculation.

LAW OF LARGE NUMBERS: AS $n \to N$ THEN $\bar{X} \to \mu$
In other words, as the sample size gets closer to the population size, the sample mean gets closer to the real population mean.

TRIM MEAN: Remove the minimum value and the maximum value and then find the sample mean of the remaining values. Used to remove possible outliers and find a more reasonable mean.

MEDIAN: Middle value of the sorted data (if n is odd) or the average of the two middle values (if n is even).
5.3 5.5 5.6 5.7 5.7 5.8 5.9 6.2 6.3 6.3 6.4 6.6 6.6 6.7 6.8 7.1 7.1 7.3 7.6 7.9
MEDIAN = Average of 6.3 & 6.4 = 6.35

MODE: The most frequent value(s). Could be none or several.
5.3 5.5 5.6 5.7 5.7 5.8 5.9 6.2 6.3 6.3 6.4 6.6 6.6 6.7 6.8 7.1 7.1 7.3 7.6 7.9
MODES: 5.7, 6.3, 6.6, 7.1

SKEWED RIGHT
[Histogram: FREQUENCY versus VALUES]
MODE = 0.2, MEDIAN = 0.3, MEAN = 0.39

MEASURES OF DISPERSION (3.2)

RANGE = MAXIMUM - MINIMUM
5.3 5.5 5.6 5.7 5.7 5.8 5.9 6.2 6.3 6.3 6.4 6.6 6.6 6.7 6.8 7.1 7.1 7.3 7.6 7.9
RANGE = 7.9 - 5.3 = 2.6

Need a better way. Two distributions can have the same range but significantly different dispersions.
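To make the point about range concrete, here is a minimal Python sketch. The two data sets are made-up values chosen for illustration (they are not from the slides), and spread is measured with the sample standard deviation, which the slides define shortly.

```python
from statistics import stdev

# Hypothetical data sets, chosen only for illustration.
clustered = [1, 5, 5, 5, 5, 5, 9]   # most values sit at the mean
spread_out = [1, 1, 1, 5, 9, 9, 9]  # most values sit at the extremes

# Range = maximum - minimum
range_clustered = max(clustered) - min(clustered)
range_spread_out = max(spread_out) - min(spread_out)

# Sample standard deviation (divides by n - 1)
sd_clustered = stdev(clustered)
sd_spread_out = stdev(spread_out)

print("ranges:", range_clustered, range_spread_out)
print("standard deviations:", round(sd_clustered, 2), round(sd_spread_out, 2))
```

Both sets have the same range, yet the second has a clearly larger standard deviation, which is why the range alone is a weak measure of dispersion.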
Example using Dot Plots: [figure]

Using the average distance from the mean for all data points would be better. Need to find the mean and then find each difference $x - \bar{x}$. Then add them all together and divide by the number of points. But there are a few problems. One of them: the positive and negative differences cancel, so their sum is always 0.

STANDARD DEVIATION can be thought of as the average distance of the values from the mean.

Population:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Sample:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

EXAMPLE (same 20 values, MEAN = 6.42):

X      X - MEAN                (X - MEAN)^2
5.3    5.3 - 6.42 = -1.12      1.25
5.5    5.5 - 6.42 = -0.92      0.85
*      *                       *
7.3    7.3 - 6.42 =  0.88      0.77
7.6    7.6 - 6.42 =  1.18      1.39
7.9    7.9 - 6.42 =  1.48      2.19

SUM(X - MEAN)^2 / (n - 1) = 0.53
SQRT[SUM(X - MEAN)^2 / (n - 1)] = 0.73

ALTERNATIVE FORMULAS:

Population:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N}}$

Sample:
$s = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}} = \sqrt{\frac{834.44 - \frac{(128.4)^2}{20}}{19}} \approx 0.73$

VARIANCE: The square of the Standard Deviation.
Population VARIANCE = $\sigma^2$
Sample VARIANCE = $s^2$

SYMBOL SUMMARY

ITEM                  POPULATION PARAMETER    SAMPLE STATISTIC
SIZE                  N                       n
MEAN                  µ                       $\bar{x}$
STANDARD DEVIATION    σ                       s
VARIANCE              σ^2                     s^2

USING THE CALCULATOR
STAT, EDIT: Enter data into L1
STAT, CALC, 1: 1-Var Stats, ENTER
2nd "1" (L1), ENTER

EMPIRICAL RULE
FOR A NORMALLY DISTRIBUTED POPULATION:
Approximately 68% of the population is within $\mu \pm 1\sigma$
Approximately 95% of the population is within $\mu \pm 2\sigma$
Approximately 99.7% of the population is within $\mu \pm 3\sigma$

EMPIRICAL RULE EXAMPLE: Let Mean = 25 and Std. Dev. = 5.
Find the % of the population:
Greater than 25
Between 20 and 30
Between 15 and 25
Between 20 and 35
Less than 15

CHEBYSHEV'S INEQUALITY
FOR ANY DISTRIBUTION: THE PERCENT OF THE POPULATION WITHIN +/- K STANDARD DEVIATIONS OF THE MEAN (K > 1) IS AT LEAST
$\left(1 - \frac{1}{K^2}\right) \times 100\%$
EXAMPLE: IF K = 2.5 THEN $\left(1 - \frac{1}{2.5^2}\right) \times 100\% = 84\%$, SO AT LEAST 84% OF THE POPULATION IS WITHIN +/- 2.5 STANDARD DEVIATIONS OF THE MEAN.

MEASURES OF CENTRAL TENDENCY AND DISPERSION OF GROUPED DATA (3.3)

MEAN AND STANDARD DEVIATION FROM FREQUENCY DISTRIBUTIONS

MEAN:
$\mu = \frac{\sum x_i f_i}{\sum f_i}$

STANDARD DEVIATION:
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{\sum x_i^2 f_i - \frac{\left(\sum x_i f_i\right)^2}{\sum f_i}}{\sum f_i}}$

IF THE DATA COME FROM A CONTINUOUS FREQUENCY DISTRIBUTION, USE THE MIDPOINT OF EACH CLASS AS $x_i$.

TO DO ON CALCULATOR, ENTER THE TABLE IN L1 (VALUES) & L2 (FREQUENCIES). THEN DO STAT, CALC, 1: 1-Var Stats, ENTER L1,L2

WEIGHTED AVERAGE
Calculated like the mean for a frequency distribution:
$\text{weighted average} = \frac{\sum x_i f_i}{\sum f_i}$

Example using Grade Point Average:

GRADE    GRADE VALUE (x)    CREDIT HOURS (f)    x*f
C        2                  3                   6
B        3                  4                   12
A        4                  3                   12
A        4                  2                   8
B        3                  4                   12
TOTAL                       16                  50

GPA = 50 / 16 = 3.125, or about 3.1

WEIGHTED AVERAGE
Calculating Grades in a Course:
Labs Worth 10%
Homework Worth 8%
Tests Worth 60%
Final Worth 22%

MEASURES OF RELATIVE POSITION (3.4)
Defined as where a data point is (on a number line) relative to the other data points in the distribution.

Z-SCORE: How far a data point is from the Mean, measured in Std. Dev.'s.
Population: $Z = \frac{X - \mu}{\sigma}$
Sample: $Z = \frac{X - \bar{X}}{s}$

Used to compare the relative position of data in two separate groups.
"A" has a score of 78 in a class with a mean of 84 and a std. dev. of 6.
"B" has a score of 86 in a class with a mean of 90 and a std. dev. of 3.
Who did better relative to their class?
How would you compare baseball pitchers?

PERCENTILE: The kth percentile, Pk, is the value for which k% of the data set is ≤ Pk.
For instance, if P18 = 7.6, then 18% of the sample or population is less than or equal to 7.6 and 82% is greater than 7.6.
If your MATH SAT score was in the 92nd percentile, then 92% of the population had a score less than OR equal to yours.
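The z-score comparison of "A" and "B" posed above can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative choices, not something from the slides.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Values taken from the slide: score, class mean, class standard deviation.
z_a = z_score(78, 84, 6)
z_b = z_score(86, 90, 3)

# The higher z-score is the better result relative to its own class.
better = "A" if z_a > z_b else "B"

print("z for A:", round(z_a, 2))
print("z for B:", round(z_b, 2))
print("better relative to class:", better)
```

Both scores are below their class means, but "A" is fewer standard deviations below, so "A" did better relative to the class even though "B" has the higher raw score.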
Three important percentiles (the quartiles):
P25 = Q1: 25% of the data ≤ Q1
P50 = Q2: 50% of the data ≤ Q2 (median)
P75 = Q3: 75% of the data ≤ Q3

5.3 5.5 5.6 5.7 5.7 5.8 5.9 6.2 6.3 6.3 6.4 6.6 6.6 6.7 6.8 7.1 7.1 7.3 7.6 7.9
Using Calculator:
STAT, EDIT: Enter data into L1
STAT, CALC, 1: 1-Var Stats, ENTER
2nd "1" (L1), ENTER

FIVE NUMBER SUMMARY & BOX PLOTS (3.5)

FIVE NUMBER SUMMARY
MIN, Q1, MEDIAN, Q3, MAX

BOX PLOT
Good for comparing distributions.

UNUSUAL VALUES
Interquartile Range: IQR = Q3 - Q1.
Any value less than Q1 - 1.5 IQR is considered unusual (called the lower fence).
Any value greater than Q3 + 1.5 IQR is considered unusual (called the upper fence).

OTHER DETERMINATIONS OF UNUSUAL VALUES
If the Z-Score is less than -2 or greater than +2, the value that corresponds to that Z-Score is unusual. Recall $Z = \frac{x - \bar{x}}{s}$.
Another way to say this is that a value outside the boundaries created by $\bar{x} \pm 2s$ or $\mu \pm 2\sigma$ is considered unusual.

QUOTES
"Facts are stubborn, but statistics are more pliable." Mark Twain
"Statistics are used much like a drunk uses a lamppost: for support, not illumination." Vin Scully
"In baseball, my theory is to strive for consistency, not to worry about the numbers. If you dwell on statistics you get shortsighted, if you aim for consistency, the numbers will be there at the end." Tom Seaver
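As a companion to the five-number summary and fence rules in section 3.5, the following Python sketch works through the 20-value data set used throughout. It assumes one common textbook convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves of the sorted data); calculators and software may use other rules and give slightly different quartiles. The function name `five_number_summary` is an illustrative choice.

```python
from statistics import median

data = [5.3, 5.5, 5.6, 5.7, 5.7, 5.8, 5.9, 6.2, 6.3, 6.3,
        6.4, 6.6, 6.6, 6.7, 6.8, 7.1, 7.1, 7.3, 7.6, 7.9]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n, the middle value is left out of both halves.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return min(xs), median(lower), median(xs), median(upper), max(xs)

minimum, q1, med, q3, maximum = five_number_summary(data)

# Fences: values beyond them are considered unusual.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
unusual = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", minimum, round(q1, 2), round(med, 2), round(q3, 2), maximum)
print("IQR:", round(iqr, 2))
print("fences:", round(lower_fence, 2), round(upper_fence, 2))
print("unusual values:", unusual)
```

Under this convention the quartiles are about 5.75 and 6.95, the fences are about 3.95 and 8.75, and no value in the data set falls outside them.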