HAWKES LEARNING SYSTEMS math courseware specialists
Copyright © 2010 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved.

Chapter 4 Describing Data from One Variable
Sections 4.1-4.3a Measures of Location

Objectives:
• To calculate the mean, median, and mode.
• To determine the most appropriate measure of center.

Section 4.1 Measures of Location

Measures of Location:
• If we think about a data set as a group of data values that cluster around some central value, then the central value provides a focal point for the set, a location of sorts.
• Unfortunately, the notion of central value is a vague concept, which is as much defined by the way it is measured as by the notion itself.
• There are several statistical measures that are used to define the notion of center: the arithmetic mean, trimmed mean, median, and mode.

The Arithmetic Mean:
• Suppose there are n observations in a data set, consisting of the observations $x_1, x_2, \ldots, x_n$; then the arithmetic mean is $\frac{1}{n}(x_1 + x_2 + \cdots + x_n)$.
• The mean is what we typically call the “average” of a data set.
• To calculate the mean, simply add all the values and divide by the total number of values in the data set.
• The mean should only be used for quantitative data.
• Outliers have a dramatic effect on the mean value.

The Arithmetic Mean:
• If we use mathematical notation, the formula can be simplified to $\bar{x} = \frac{\sum x_i}{n}$, where $x_i$ is the $i$th data value in the data set and $\Sigma$ (pronounced sigma) is a mathematical notation for adding values.
• There are two symbols that are associated with the mean:
  • $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$, the sample mean, and
  • $\mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N)$, the population mean.
• Here n refers to the size of the sample and N refers to the size of the population. Otherwise, the calculations are made in precisely the same way.

Example: Calculate the sample mean of the following heights in inches.
63, 68, 71, 67, 63, 72, 66, 67, 70
Solution: $\bar{x} = \frac{607}{9} \approx 67.4$
When calculating the mean, round to one more decimal place than what is given in the data.

Deviation:
• Given some point A and a data point x, x – A represents how far x deviates from A. This difference is also called a deviation.
• The table below shows the deviations from the mean for the following sample data values: 4, 10, 7, 15. The mean of this data set is $\bar{x} = \frac{1}{4}(4 + 10 + 7 + 15) = 9$.

Data Value x_i    Deviation from the mean (x_i – 9)
4                 –5
10                1
7                 –2
15                6
Sum               $\sum (x_i - 9) = 0$

Notice that the sum of the deviations is zero. This illustrates why the mean is a measure of central tendency. If we calculate the deviations about any other value, the sum of the deviations will not equal zero.

The Median:
• The median of a set of data values is the middle value in an ordered array. The same number of values is on either side of the median value.
To find the median:
• Arrange the data in ascending order.
• Count the number of values in the data.
• If the count is odd, the median is the middle value in the data.
• If the count is even, the median is the sum of the two middle values in the data divided by two.
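The mean and median procedures above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides; the function names `mean` and `median` are chosen here for convenience.

```python
def mean(values):
    """Arithmetic mean: add all the values and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Median via the ordered-array procedure."""
    ordered = sorted(values)      # arrange the data in ascending order
    n = len(ordered)              # count the number of values
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


heights = [63, 68, 71, 67, 63, 72, 66, 67, 70]
print(round(mean(heights), 1))    # sample mean, one more decimal than the data

sample = [4, 10, 7, 15]
deviations = [x - mean(sample) for x in sample]
print(deviations, sum(deviations))  # deviations about the mean sum to zero
```

Sorting a copy with `sorted` leaves the caller's data untouched, which keeps the two functions safe to use together on the same list.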
Example: Calculate the median of the following sets of data.
a. 15 16 11 22 19 10 17 22
Solution: 10 11 15 16 17 19 22 22
There are eight values, so the median is the average of the two middle values: $\frac{16 + 17}{2} = 16.5$
b. 2.6 3.3 5.0 1.8 0.7 2.2 4.1 6.1 6.7
Solution: 0.7 1.8 2.2 2.6 3.3 4.1 5.0 6.1 6.7
There are nine values, so the median is the middle (fifth) value, 3.3.

The Trimmed Mean:
• The trimmed mean ignores an equal percentage of the highest and lowest values in calculating the mean.
To calculate the 10% trimmed mean:
• Arrange the data in ascending order.
• Delete the lowest 10% of the values.
• Delete the highest 10% of the values.
• Calculate the arithmetic mean of the remaining 80% of the values.

Example: Consider the following data:
16 18 20 21 23 23 24 32 36 42
mean = 25.5, median = 23
Find the 10% trimmed mean.
Solution: Since there are 10 observations, removing the highest 10% and lowest 10% means removing only one observation from each end of the data.
10% trimmed mean $= \frac{18 + 20 + 21 + 23 + 23 + 24 + 32 + 36}{8} = 24.625$

Resistant Measures:
• Statistical measures which are not affected by outliers are said to be resistant.
• The mean is not a resistant measure.
• The trimmed mean is a resistant measure.

The Mode:
• The mode of a data set is the most frequently occurring value.
• The mode is the only measure of centralness that can be applied to nominal data.
• When a data set has two modes, it is said to be bimodal.
• When a data set has more than two modes, it is said to be multimodal.
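The trimmed mean and the mode described above can be sketched as follows. This is an illustrative sketch, not the slides' own method: the names `trimmed_mean` and `modes` are hypothetical, and rounding the trim count down when the percentage does not give a whole number of observations is an assumption made here.

```python
from collections import Counter


def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each end."""
    ordered = sorted(values)
    # Number of observations to drop from each end (rounded down: an assumption).
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def modes(values):
    """Return all most-frequent values, or an empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                 # each value appears only once: no mode
    return sorted(v for v, c in counts.items() if c == top)


data = [16, 18, 20, 21, 23, 23, 24, 32, 36, 42]
print(trimmed_mean(data, 10))
print(modes([63, 68, 71, 67, 63, 72, 66, 67, 70]))
```

Returning a list from `modes` lets one function cover the unimodal, bimodal, multimodal, and no-mode cases.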
Example: Calculate the mode of each data set.
a. 63 68 71 67 63 72 66 67 70
Solution: There are two modes: 63 and 67. The data set is bimodal.
b. 51 77 54 51 68 70 54 65 51
Solution: 51 occurs three times. The mode is 51.
c. 1 5 7 3 2 0 4 6
Solution: Each value appears only once. There is no mode.

The Relationship between the Mean and Median:
• The shape of the data determines how the mean, median, and mode are related.
• For a bell-shaped distribution, the mean, median, and mode are identical.

Skewed Distributions:
• Not all data produce distributions which follow a bell-shaped curve.
• If the distribution of the data has a long tail to the right, it is said to be skewed to the right, or positively skewed.
• Conversely, if the distribution has a long tail on the left, the data is said to be skewed to the left, or negatively skewed.
• If the data is positively skewed, the median will be smaller than the mean.
• If the data is negatively skewed, the median will be larger than the mean.

Section 4.2 Selecting a Measure of Location

Selecting a Measure of Location:
• The objective of using descriptive statistics is to provide measures which convey useful summary information about the data.
• When selecting a statistic to represent the central value of a data set, the first question involves what type of data is being analyzed.
• The arithmetic mean is frequently, but not always, the most reasonable measure of centralness.
Selecting a Measure of Location:
The table below defines the applicable levels of measurement for each measure of location.

Measure of Location    Applicable Level of Measurement
mean                   interval, ratio (quantitative)
median                 ordinal (qualitative); interval, ratio (quantitative)
mode                   nominal, ordinal (qualitative); interval, ratio (quantitative)
t-mean                 interval, ratio (quantitative)

The table below defines the sensitivity to outliers for each measure of location.

Measure of Location    Sensitivity to Outliers
mean                   sensitive
median                 not very sensitive
mode                   not very sensitive
t-mean                 not very sensitive

Selecting a Measure of Location:
• The mean and median are the same value when the data is symmetrical.
• When the data is nominal or ordinal, the mean should not be calculated.
• When the data is at least interval and there are no outliers, the mean is a reasonable choice.
• When the data is ordinal, the median is the best choice.
• The median is a good measure of central tendency since it is not sensitive to outliers.
• The median can be applied to all levels of measurement except nominal.
• The mode can be applied to all levels of data, but is not very useful for quantitative data.
• If the data is nominal, there is only one choice, the mode.

Time Series Data and Measures of Centralness:
• The graph below shows the average gas price over a number of years. In this non-stationary time series, the central value of the process is trending upward.
• One way to capture this movement is with a moving average.
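A k-period moving average replaces each observation with the mean of that observation and the k – 1 observations before it. The sketch below is one way to compute it, using the average US gas prices for 1991 to 2002 that are tabulated in the slides that follow; the function name `moving_average` and the use of `None` for periods without enough history are choices made here, not part of the original material.

```python
def moving_average(series, periods):
    """Trailing moving average; None where fewer than `periods` values exist."""
    result = []
    for i in range(len(series)):
        if i + 1 < periods:
            result.append(None)           # not enough history yet
        else:
            window = series[i + 1 - periods:i + 1]
            result.append(sum(window) / periods)
    return result


# Average US gas price, 1991 through 2002
years = list(range(1991, 2003))
gas_prices = [1.09, 1.10, 1.07, 1.08, 1.11, 1.21,
              1.18, 1.01, 1.14, 1.49, 1.38, 1.34]

two_period = moving_average(gas_prices, 2)
three_period = moving_average(gas_prices, 3)

for year, price, m2, m3 in zip(years, gas_prices, two_period, three_period):
    fmt = lambda v: "-" if v is None else f"{v:.3f}"
    print(year, f"{price:.2f}", fmt(m2), fmt(m3))
```

A longer window smooths the series more but reacts more slowly to a change in trend, which is the trade-off to keep in mind when choosing the number of periods.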
Moving Average:
• A moving average is obtained by adding consecutive observations for a number of periods and dividing the result by the number of periods included in the average.
• The table below shows the average US gas price from 1991 to 2002 along with the 2 and 3 period moving averages.

Year    Average US Gas Price    2 Period Moving Average    3 Period Moving Average
1991    1.09
1992    1.10                    1.095
1993    1.07                    1.085                      1.087
1994    1.08                    1.075                      1.083
1995    1.11                    1.095                      1.087
1996    1.21                    1.160                      1.133
1997    1.18                    1.195                      1.167
1998    1.01                    1.095                      1.133
1999    1.14                    1.075                      1.110
2000    1.49                    1.315                      1.213
2001    1.38                    1.435                      1.337
2002    1.34                    1.360                      1.403

• The two-period moving average for 1992 averages the time series in 1991 and 1992: $\frac{1.09 + 1.10}{2} = 1.095$.

Moving Average:
• The chart below displays the time series and the two- and three-period moving averages.
• Notice that both of the averages follow the time series quite closely.

Sections 4.1-4.3b Measures of Dispersion

Objective:
• To compute the range, variance, and standard deviation.

Section 4.3 Measures of Dispersion

Measuring Variation:
• Many of the good measures of variation use the concept of deviation from the mean.
• If the mean is a focal point or base, use it as a common basis from which to measure variation.
• The distance that a point is from its mean is called the deviation from the mean.
• The sum of the positive deviations equals the sum of the absolute values of the negative deviations.
• The deviations will always sum to zero.
• Many of the variability measures average the deviations in some form.

Example: A data set and its deviations from the mean are calculated in the table below. Notice that the sum of the deviations is zero.
Data set: 3, 12, 20, 15, 0
Mean = 10

Data Values    Deviations from the mean (data – mean = deviation)
3              3 – 10 = –7
12             12 – 10 = 2
20             20 – 10 = 10
15             15 – 10 = 5
0              0 – 10 = –10

Mean Absolute Deviation:
• The sample mean absolute deviation (MAD) is $\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$.
• It computes the average distance from the mean of a data set.
• If data set A has a larger mean absolute deviation than data set B, then it is reasonable to believe that data set A has more variability than data set B.
• It is an intuitive measure of variation.
• Theoretical development has been hampered due to the difficulty that absolute values pose to calculus.
• It is sensitive to outliers and is not a resistant measure.

Example: Suppose six people participated in a 1000 meter run. Their times, measured in minutes, are given below. The mean time is 8.333 minutes. Calculate the mean absolute deviation.

Time in min.    Deviation               Absolute Deviation    % of total
4               4 – 8.333 = –4.333      4.333                 38.23
10              10 – 8.333 = 1.667      1.667                 14.71
9               9 – 8.333 = 0.667       0.667                 5.88
11              11 – 8.333 = 2.667      2.667                 23.53
9               9 – 8.333 = 0.667       0.667                 5.88
7               7 – 8.333 = –1.333      1.333                 11.77
Total                                   11.334                100.00

Total = 4.333 + 1.667 + 0.667 + 2.667 + 0.667 + 1.333 = 11.334
% of total for the first time: $\frac{4.333}{11.334} \cdot 100 = 38.23$
Mean Absolute Deviation $= \frac{11.334}{6} = 1.889$ minutes.
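The mean absolute deviation calculation above can be checked with a short Python sketch. The function name `mean_absolute_deviation` is chosen here for illustration.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


times = [4, 10, 9, 11, 9, 7]      # 1000 meter run times in minutes
print(round(mean_absolute_deviation(times), 3))
```

Taking the absolute value before averaging is what keeps the positive and negative deviations from cancelling to zero.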
Variance and Standard Deviation:
• Standard deviation and variance are the most common measures of variability.
• The standard deviation and variance also provide numerical measures of how the data varies around the mean.
• If the data is tightly packed around the mean, the standard deviation and variance will be relatively small.
• If the data is widely dispersed about the mean, the standard deviation and variance will be relatively large.

Variance:
• The variance of a data set containing the complete set of population data is given by $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ and is called the population variance.
• The variance of a data set containing sample data is given by $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ and is called the sample variance.

Example: Given the following times in minutes of 6 persons running the 1000 meter course, compute the sample variance. The sample mean is 8.333.
4, 10, 9, 11, 9, 7

Data     Deviations              Squared Deviations    % of total
4        4 – 8.333 = –4.333      18.7749               59.93
10       10 – 8.333 = 1.667      2.7789                8.87
9        9 – 8.333 = 0.667       0.4449                1.42
11       11 – 8.333 = 2.667      7.1129                22.70
9        9 – 8.333 = 0.667       0.4449                1.42
7        7 – 8.333 = –1.333      1.7769                5.67
Total                            31.33                 100.00

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{31.33}{5} = 6.266$ squared minutes.

Standard Deviation:
• The standard deviation is the square root of the variance.
• There are two measures of variance, so there will be two standard deviations.
• The sample standard deviation is $s = \sqrt{s^2}$.
• The population standard deviation is $\sigma = \sqrt{\sigma^2}$.
• It is important to remember the symbols above since standard deviation is a fundamental statistical concept.

Standard Deviation:
• Standard deviation is the square root of the average squared deviation.
• It can also be used to measure how far the data values are from the mean.
• Relatively few data values will be more than two standard deviations from the mean.
• Like the variance, the standard deviation is sensitive to outliers.
• The presence of outliers tarnishes the interpretation of the standard deviation as a typical deviation.

Range:
• The range is the difference between the largest and smallest data values.
Example: Calculate the range of the following data set.
4, 6, 16, 9, 24, 8, 0, 12, 1
Solution: The largest value is 24 and the smallest value is 0.
Range = 24 – 0 = 24.

Section 4.4 Measures of Relative Position

Objectives:
• Determine the percentiles and locations of specific data points.
• Find the quartiles of the data.
• Determine the z-score as a measure of relative position.

Pth Percentile:
• Given a set of data $x_1, x_2, \ldots, x_n$, the Pth percentile is a value, say X, such that at least P percent of the data is less than or equal to X and at least (100 – P) percent of the data is greater than or equal to X.
• The most often used measure of relative position is the percentile.
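Before working with percentiles, the variance, standard deviation, and range from Section 4.3 can be sketched in Python. This is an illustrative sketch with function names chosen here. Because it carries full precision instead of rounding each deviation, the sample variance of the run times comes out near 6.267, whereas the slide's 6.266 results from dividing the rounded total 31.33 by 5.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


times = [4, 10, 9, 11, 9, 7]
print(sample_variance(times), sample_std(times))
print(data_range([4, 6, 16, 9, 24, 8, 0, 12, 1]))
```

Python's standard `statistics` module offers `variance`, `pvariance`, `stdev`, and `pstdev` for the same quantities; the hand-written versions are shown only to mirror the formulas.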
Pth Percentile:
To determine the Pth percentile:
• Form an ordered array by placing the data in order from smallest to largest.
• To find the location of the Pth percentile in the ordered array, let $\ell = \frac{P}{100} \cdot n$, where n is the number of observations in the ordered data.
• If $\ell$ is not an integer, then round $\ell$ up to the next integer.
• If $\ell$ is an integer, then average the data value in the $\ell$ location with the data value in the $\ell + 1$ location.
• Remember, $\ell$ is not the percentile; $\ell$ is the location of the percentile in the ordered array.

Determining the Pth Percentile Flow Chart:
• Arrange the data in ascending order.
• To find the Pth percentile in the ordered data, calculate $\ell = \frac{P}{100} \cdot n$, where n is the number of observations in the ordered data.
• Is $\ell$ an integer?
  • Yes: Average the data value in the $\ell$ location with the data value in the $\ell + 1$ location.
  • No: Round $\ell$ up to the next integer. Find the data value in the $\ell$th location.

Example: Find the 50th percentile for the following data set.
3, 5, 0, 1, 9, 2, 7
Solution: $\ell = \frac{50}{100} \cdot 7 = 3.5$
Since the location is not an integer, the value is rounded up to 4.
0, 1, 2, 3, 5, 7, 9
Thus, the fourth observation in the ordered array would be the median. The median value (which is the 50th percentile) equals 3.

Example: Find the 50th percentile for the following data set.
3, 5, 0, 1, 9, 2, 7, 6
Solution: $\ell = \frac{50}{100} \cdot 8 = 4$
Since the location is an integer, we average the 4th value and the 5th value of the ordered array.
0, 1, 2, 3, 5, 6, 7, 9
$\frac{3 + 5}{2} = \frac{8}{2} = 4$
The 50th percentile for this data set is 4.

Percentile:
• The percentile of some data value x is given by:
percentile of x $= \frac{\text{number of data values} \le x}{\text{total number of data values}} \cdot 100$

Example: Find the percentile of 45 for the following data set.
67, 45, 63, 58, 35, 54, 27, 66, 21, 48
Solution: In ascending order the data are 21, 27, 35, 45, 48, 54, 58, 63, 66, 67. The values less than or equal to 45 are 21, 27, 35, and 45, so the number of values less than or equal to 45 is 4.
percentile of 45 $= \frac{4}{10} \cdot 100 = 40$.

Quartiles:
• The 25th, 50th, and 75th percentiles are known as quartiles and are denoted as $Q_1$, $Q_2$, and $Q_3$.
• Quartiles serve as markers to divide the data.
• $Q_1$ separates the lowest 25 percent.
• $Q_2$ represents the median (50th percentile).
• $Q_3$ marks the beginning of the top 25 percent of the data.
• Since quartiles are nothing more than percentiles, we construct them in the same way as before.

Example: Find $Q_1$, $Q_2$, and $Q_3$ for the following data set of test scores.
50, 50, 62, 75, 77, 82, 86, 87, 88, 88
Solution:
$\ell = \frac{25}{100} \cdot 10 = 2.5$, $\ell = \frac{50}{100} \cdot 10 = 5$, $\ell = \frac{75}{100} \cdot 10 = 7.5$
$Q_1$ = 25th percentile = 3rd data value = 62.
$Q_2$ = 50th percentile $= \frac{77 + 82}{2} = 79.5$.
$Q_3$ = 75th percentile = 8th data value = 87.
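The percentile location rule and the percentile-of-a-value formula above can be sketched as follows. The function names `percentile` and `percentile_of` are hypothetical, and the sketch assumes P is a whole number strictly between 0 and 100 so that the location test can use exact integer arithmetic.

```python
def percentile(values, p):
    """Pth percentile using the location rule l = (P / 100) * n.

    Assumes p is an integer with 0 < p < 100.
    """
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # l = whole + remainder / 100
    if remainder == 0:
        # l is an integer: average the values in locations l and l + 1
        return (ordered[whole - 1] + ordered[whole]) / 2
    # l is not an integer: round up, i.e. take location whole + 1
    return ordered[whole]


def percentile_of(values, x):
    """Percent of the data values that are less than or equal to x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)


scores = [50, 50, 62, 75, 77, 82, 86, 87, 88, 88]
q1, q2, q3 = (percentile(scores, p) for p in (25, 50, 75))
print(q1, q2, q3)
print(percentile_of([67, 45, 63, 58, 35, 54, 27, 66, 21, 48], 45))
```

Other textbooks and software packages use different interpolation rules for percentiles, so results from library routines may differ slightly from this location rule.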
Interquartile Range:
• The interquartile range, which describes the range of the middle fifty percent of the data, is given by Interquartile range = $Q_3 - Q_1$.
• For the previous example the interquartile range is 87 – 62 = 25.
• A data point is considered an outlier if it is more than 1.5 times the interquartile range above the 75th percentile or more than 1.5 times the interquartile range below the 25th percentile.

Box Plots:
• An important use of quartiles is the construction of box plots.
• Box plots are graphical summaries of data that look like a box.
• They provide an alternative method to the histogram for displaying data.
• A box plot is a graphical summary of central tendency, the spread, the skewness, and the potential existence of outliers in the data.
• Below is a box plot of the test scores data set.
[Figure: box plot of the test scores on an axis from 0 to 130]
• The plot is constructed from five summary measures:
  • largest data value
  • smallest data value
  • 25th percentile
  • 75th percentile
  • median

Example: Find the outliers in this new data set of test scores.
12, 50, 62, 75, 77, 82, 86, 87, 88, 126
$Q_1 = 62$, $Q_2 = 79.5$, $Q_3 = 87$, and interquartile range = 25
Solution:
Larger than 75th percentile + 1.5 times the interquartile range: $87 + 1.5 \cdot 25 = 124.5$
Smaller than 25th percentile – 1.5 times the interquartile range: $62 - 1.5 \cdot 25 = 24.5$
The outliers of this data set are 12 and 126.
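The outlier rule in the example above can be sketched in Python. The function name `find_outliers` is chosen here, and the quartiles are passed in as arguments so the sketch does not depend on any particular percentile method.

```python
def find_outliers(values, q1, q3):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


scores = [12, 50, 62, 75, 77, 82, 86, 87, 88, 126]
print(find_outliers(scores, q1=62, q3=87))
```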
Z-Scores:
• The z-score transforms the data value into the number of standard deviations that value is from the mean.
$z = \frac{x - \mu}{\sigma}$
Remember: $\mu$ = mean, $\sigma$ = standard deviation
• Describing the number of standard deviations is a fundamental concept of statistics.
• It is used as a standardization technique.
• If the z-score is negative, the value is less than the mean.
• If the z-score is positive, the value is greater than the mean.
• The z-score is a unit-free measure.

Example: Suppose you scored an 86 on your biology test and a 94 on your psychology test. The mean and standard deviation of the two tests are given below. What are the z-scores for your two tests? On which test did you perform relatively better?

Course        Mean    Standard Deviation
Biology       74      10
Psychology    82      11

Solution:
The z-score for the biology test is: $z = \frac{86 - 74}{10} = 1.2$.
The z-score for the psychology test is: $z = \frac{94 - 82}{11} = 1.09$.
Even though the raw score on the psychology test is larger than the raw score on the biology test, the performance on the biology test was slightly better.

Sections 4.5-4.10 Applying the Standard Deviation

Objectives:
• To calculate the coefficient of variation and use it to compare the variation of different data sets.
• To calculate the mean, variance, and standard deviation of grouped data.
• To use the empirical rule and Chebyshev’s Theorem to describe the variability of data.
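Before moving on, the z-score comparison from Section 4.4 can be reproduced with a short sketch. The function name `z_score` and its parameter names are chosen here for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


biology = z_score(86, mean=74, std_dev=10)
psychology = z_score(94, mean=82, std_dev=11)
print(round(biology, 2), round(psychology, 2))
print("biology" if biology > psychology else "psychology", "was relatively better")
```

Because a z-score is unit free, the same comparison works for scores measured on different scales.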
Section 4.5 Using the Standard Deviation

Empirical Rule:
If the distribution is bell-shaped:
• One sigma rule: about 68% of the data should lie within one standard deviation of the mean. A deviation of more than one sigma is to be expected about once every three observations.
• Two sigma rule: about 95% of the data should lie within two standard deviations of the mean. A deviation of more than two sigma is to be expected about once every twenty observations.
• Three sigma rule: about 99.7% of the data should lie within three standard deviations of the mean. A deviation of more than three sigma is to be expected about once every 333 observations, slightly less than 0.3% of the time.

Chebyshev’s Theorem:
• The proportion of any data set lying within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, for $k > 1$.
• k = 2: At least $1 - \frac{1}{2^2} = \frac{3}{4}$ (or 75%) of the data values lie within 2 standard deviations of the mean, for any data set.
• k = 3: At least $1 - \frac{1}{3^2} = \frac{8}{9}$ (or 88.9%) of the data values lie within 3 standard deviations of the mean, for any data set.

Section 4.8 The Coefficient of Variation

Coefficient of Variation:
• The coefficient of variation compares the variation in data sets.
• For sample data: $CV = \frac{s}{\bar{x}} \cdot 100\%$
• For a population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
• The coefficient of variation standardizes the variation measure.

Section 4.9 Analyzing Grouped Data

Finding the Mean of Grouped Data:
• Finding the mean of grouped data involves finding the midpoint of each of the classes in the frequency distribution and then weighting each of these midpoints by the number of observations in the class.
• Let
  $f_i$ = number of observations in the $i$th group,
  $N = \sum f_i$ = the total number of observations in all classes,
  $M_i$ = midpoint of the $i$th class, and
  $n$ = the number of observations in the sample.
• For a population the mean of grouped data is given by $\mu = \frac{\sum f_i M_i}{N}$.
• If the grouped data represent sample observations the mean is given by $\bar{x} = \frac{\sum f_i M_i}{n}$.

Finding the Variance of Grouped Data:
• Let
  $f_i$ = number of observations in the $i$th group,
  $N = \sum f_i$ = the total number of observations in all classes,
  $M_i$ = midpoint of the $i$th class, and
  $n$ = the number of observations in the sample.
• The population variance of grouped data is given by the expression
  $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} = \frac{\sum f_i M_i^2 - \frac{\left(\sum f_i M_i\right)^2}{N}}{N}$.
• The sample variance is given by
  $s^2 = \frac{\sum f_i M_i^2 - \frac{\left(\sum f_i M_i\right)^2}{n}}{n - 1}$.

Section 4.10 Proportions

Proportions:
• A proportion measures the fraction of a group that possesses some characteristic.
• To calculate the proportion, simply count the number in the group that possess the characteristic and divide the count by the number in the group.
• Let
  X = number that possess the characteristic,
  N = number in the population, and
  n = number in the sample; then
  $p = \frac{X}{N}$ is the population proportion, and
  $\hat{p} = \frac{X}{n}$ is the sample proportion.
• The symbol $\hat{p}$ is pronounced p-hat.

Example: Suppose your statistics class is composed of 48 students of which 4 are left-handed. What proportion of the class is left-handed? What proportion of the class is right-handed?
Solution:
$p = \frac{X}{N} = \frac{4}{48} \approx 0.083$
Then 0.083 is the proportion of people in the class that are left-handed.
$p = \frac{X}{N} = \frac{44}{48} \approx 0.917$
Then 0.917 is the proportion of people in the class that are right-handed.
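Returning to the grouped-data formulas of Section 4.9, the sketch below shows one way they might be computed. The slides give no grouped data set, so the class midpoints and frequencies here are hypothetical values invented purely for illustration, and the function names are likewise chosen here.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of f_i * M_i divided by the total count."""
    total = sum(freqs)
    return sum(f * m for f, m in zip(freqs, midpoints)) / total


def grouped_variance(midpoints, freqs, sample=True):
    """Variance of grouped data using the computational formula."""
    total = sum(freqs)
    sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))
    numerator = sum_fm2 - sum_fm ** 2 / total
    # Divide by n - 1 for a sample, by N for a population.
    return numerator / (total - 1 if sample else total)


# Hypothetical frequency distribution (not from the slides):
# classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25
midpoints = [5, 15, 25]
freqs = [2, 5, 3]

print(grouped_mean(midpoints, freqs))
print(grouped_variance(midpoints, freqs, sample=True))
print(grouped_variance(midpoints, freqs, sample=False))
```

Because every observation in a class is treated as if it sat at the class midpoint, these results approximate the mean and variance of the raw data.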