Parameters and statistics

DEFINITION: The population is the entire group of objects or individuals under study, about which information is wanted. A unit is an individual object or person in the population. The units are often called subjects if the population consists of people. A sample is the part of the population that is actually used to get information. A variable is a characteristic of interest to be measured for each unit in the sample. The size of the population is denoted by the capital letter N. The size of the sample is denoted by the small letter n.

[Figure: a population of N = 16 units, with one unit labeled and a sample of n = 4 units marked.]

DEFINITION: A parameter is a numerical value that would be calculated using all of the values of the units in the population. A statistic is a numerical value that is calculated using all of the values of the units in a sample.

Tip: One way to remember this distinction: the letter p is for population and parameter, while the letter s is for statistic and sample.

Let's Do It! 1 (1 min.) Parameter or Statistic?
According to the Campus Housing Fact Sheet at a Big-Ten University, 60% of the students living in campus housing are in-state residents. In a sample of 200 students living in campus housing, 56.5% were found to be in-state residents. Circle your answer.
(a) In this particular situation, the value of 60% is a (parameter, statistic).
(b) In this particular situation, the value of 56.5% is a (parameter, statistic).

DEFINITIONS: A unit is the item or object we observe. When the object is a person, we refer to the unit as a subject. An observation is the information or characteristic recorded for each unit. A characteristic that can vary from unit to unit is called a variable. A collection of observations on one or more variables is called a data set.

A discrete variable can only take on a finite (or countable) number of possible values.
For example, the number of correct answers on a five-question, multiple-choice test is a discrete variable: its only possible values are 0, 1, 2, 3, 4, and 5.

A continuous variable can take on any value in an interval (or collection of intervals). For example, the amount of water poured into a 50-mL glass container can be any value from 0 mL to 50 mL.

DEFINITIONS: Qualitative variables are those which classify the units into categories. The categories may or may not have a natural ordering to them. Qualitative variables are also called categorical variables. Quantitative variables have numerical values that are measurements (length, weight, and so on) or counts (of how many). Arithmetic operations on such numerical values do have meaning. We further distinguish quantitative variables based on whether or not the values fall on a continuum. A discrete variable is one for which you can count the number of possible values. A continuous variable can take on any value within a given interval.

Variable
    Qualitative (examples: type of religion, zip code)
    Quantitative
        Continuous (example: length)
        Discrete (example: number of children)

Let's Do It! 2 What Type of Variable?
Hurricane Charley, in August 2004, has been blamed for at least 16 deaths. Listed below is information on other major storms and hurricanes that occurred from 1994 to 2003.

Storm Name                Date     Category   Estimated Damage/Cost*   Deaths
Tropical Storm Alberto    Jul-94   n/a        $1.2 billion             32
Hurricane Marilyn         Sep-95   2          $2.5 billion             13
Hurricane Opal            Oct-95   3          $3.6 billion             27
Hurricane Fran            Sep-96   3          $5.8 billion             37
Hurricane Bonnie          Aug-98   3          $1.1 billion             3
Hurricane Georges         Sep-98   2          $6.5 billion             16
Hurricane Floyd           Sep-99   2          $6.5 billion             77
Tropical Storm Allison    Jun-01   n/a        $5.1 billion             43
Hurricane Isabel          Sep-03   2          $4.0 billion             47

For each variable, determine whether it is qualitative or quantitative. If the variable is quantitative, state whether it is discrete or continuous.
(a) The name of the storm.
(b) The date the storm occurred.
(c) The category of the storm.
(d) The estimated amount of damage or cost of the storm.
(e) The number of deaths that occurred.

Think About It!
A number of packages are brought to a mailing center. The packages are weighed and the results are recorded as 9 pounds, 5 pounds, 4 pounds, 12 pounds, 20 pounds, and so on. These values are all whole numbers. Does this imply that the variable “weight” is discrete?

The variable “weight” is continuous. We have just measured weight to the nearest pound. A package having a value for weight of 12 pounds could actually weigh 12.2 pounds, or 11.9975 pounds, or any value in the interval from 11.5 to 12.5.

Key Point: Don’t let the appearance of the data after they are recorded mislead you as to their type.

Consider again the variable “weight.” Packages weighing under 5 pounds are classified as light and cost a fixed amount to ship. Packages weighing over 20 pounds are classified as heavy and cost a fixed amount to ship. Packages weighing between 5 and 20 pounds are classified as medium and cost a fixed amount to ship. We record the variable “weight,” which takes on the values light, medium, or heavy. Now the variable “weight” is qualitative.

Key Point: The type of variable depends mainly on the measuring process, not on the property being measured. It is important to ask many questions about the data and how they were obtained, as discussed in the next section.

Central tendency of a set

DATA SET 1
Suppose you had to give a single number that would represent the most typical age for the 20 subjects. What number would you choose?

Measures of center are numerical values that report, in some sense, the middle of a set of data. We will focus on the mean and the median. If the data are a sample, the mean and median would be called statistics. If the data form an entire population, then these measures of center would be called parameters.
Subject #   Gender   Age
1001        M        45
1002        M        41
1003        F        51
1004        F        46
1005        F        47
1006        F        42
1007        M        43
1008        F        50
1009        M        39
1010        M        32
1011        M        41
1012        F        44
1013        F        47
1014        F        49
1015        M        45
1016        F        42
1017        M        41
1018        F        40
1019        M        45
1020        M        37

Mean

DEFINITION: The mean of a set of n observations is simply the sum of the observations divided by the number of observations, n.

Mean age of the 20 subjects in the medical study: add the 20 ages up and divide by 20:

$\frac{45 + 41 + 51 + 46 + 47 + \cdots + 45 + 37}{20} = 43.35$ years

Special notation: If $x_1, x_2, \ldots, x_n$ denote a sample of n observations, then the mean of the sample is called "x-bar" and is denoted by:

$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

If you have all of the population values, the mean of the population is found the same way: add all of the values up and divide by how many there are. The mean of a population is denoted by the Greek letter μ.

Example Mean Number of Children per Household
Problem
Suppose that the number of children in a simple random sample of 10 households is as follows: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
(a) Calculate the sample mean number of children per household.
(b) Interpret your answer.
(c) Suppose that the observation for the last household in the above list was incorrectly recorded as 40 instead of 4. What would happen to the mean?

Solution
(a) The sample mean number of children per household is given by:

$\bar{x} = \frac{2 + 3 + 0 + 2 + 1 + 0 + 3 + 0 + 1 + 4}{10} = \frac{16}{10} = 1.6$

(b) We expect about 1.6 children per household, on average. We report 1.6 even though it is not possible to have 1.6 children in any one given household; that is, the 1.6 is not rounded up to say 2. We are reporting a value that we would expect on average, over many samples of 10 households.

(c) The sample mean would now be given by:

$\bar{x} = \frac{2 + 3 + 0 + 2 + 1 + 0 + 3 + 0 + 1 + 40}{10} = \frac{52}{10} = 5.2$

Note that 9 of the 10 observations are less than the mean. The mean is sensitive to extreme observations. Most graphical displays would have detected this outlying observation.

Think About It! Is the Mean Always the Center? Be Careful!
Suppose a sample of size n = 10 observations is obtained.
Can the mean, $\bar{x}$, be larger than the maximum value or less than the minimum value? If yes, give an example.
Can the mean, $\bar{x}$, be the minimum value? Give an example.
Can the mean, $\bar{x}$, be the maximum value? If yes, give an example.
Can the mean, $\bar{x}$, be exactly the midpoint between the minimum and maximum value (when the minimum does not equal the maximum)? If yes, give an example.
Can the mean, $\bar{x}$, be exactly the second smallest value (out of the 10 observations, not all equal, when they are ordered from smallest to largest)? If yes, give an example.
Can the mean, $\bar{x}$, be not equal to any value in the sample? If yes, give an example.

Let's Do It! 3 (1 min.) A Mean Is Not Always Representative
Kim's biology test scores are 7, 98, 25, 19, and 26. Calculate Kim's mean test score. Explain why the mean does not do a very good job at summarizing Kim's test scores.

Let's Do It! 4 (2 min.) Combining Means
We have seven students. The mean score for three of these students is 54 and the mean score for the four other students is 76. What is the mean score for all seven students?

The mean is the point of equilibrium, the point where the distribution would balance. If the distribution is symmetric, as in the first picture, the mean would be exactly at the center of the distribution. As the largest observation is moved further to the right, making this observation somewhat extreme, the mean shifts towards the extreme observation.

[Figure: three balance diagrams, with axis labels 1, 2, 3 (mean = 2); 1, 2, 5 (mean = 2.5); and 1, 2, 11 (mean = 4).]

If a distribution appears to be skewed, we may wish also to report a more resistant measure of center.

The Mean of Grouped Data / Frequency Tables
The procedure for finding the mean for grouped data uses the midpoints of the classes. This procedure is shown next.

Example
The data represent the number of miles run during one week for a sample of 20 runners.

Solution
The procedure for finding the mean for grouped data is given here.
Step 1 Make a table as shown.
Step 2 Find the midpoints of each class and enter them in column C.
Step 3 For each class, multiply the frequency by the midpoint, as shown, and place the product in column D: 1 · 8 = 8, 2 · 13 = 26, etc. The completed table is shown here.
Step 4 Find the sum of column D.
Step 5 Divide the sum by n to get the mean.

Let's Do It! 5
Eighty randomly selected light bulbs were tested to determine their lifetime in hours. The frequency table of the results is shown below. Find the average lifetime of a light bulb.

Life interval in hours   Frequency
53-63                    6
64-74                    12
75-85                    25
86-96                    18
97-107                   14
108-118                  5

Let's Do It! 6
The cost per load (in cents) of 35 laundry detergents tested by a consumer organization is given below.

Class limit   Frequency
13-19         2
20-26         7
27-33         12
34-40         5
41-47         6
48-54         1
55-61         0
62-68         2

A measure of center that is more resistant to extreme values is the median.

Median

DEFINITION: The median of a set of n observations, ordered from smallest to largest, is a value such that half of the observations are less than or equal to that value and half the observations are greater than or equal to that value.

If the number of observations is odd, the median is the middle observation. If the number of observations is even, the median is any number between the two middle observations, including either of the two middle observations. To be consistent, we will define the median as the mean or average of the two middle observations.

Location of the median: (n+1)/2, where n is the number of observations.

The ages of the n = 20 subjects: calculating (n+1)/2 we get (20+1)/2 = 10.5. So the two middle observations are the 10th and 11th observations, namely 43 and 44. The median is the mean of these two middle observations, (43+44)/2 = 43.5 years.

Ordered ages: 32 37 39 40 41 41 41 42 42 43 | 44 45 45 45 46 47 47 49 50 51
(10th obs = 43, 11th obs = 44, median = 43.5)

Let's Do It!
7 (2 min.) Median Number of Children per Household
Find the median number of children in a household from this sample of 10 households, that is, find the median of

Observation Number:   1  2  3  4  5  6  7  8  9  10
Number of Children:   2  3  0  1  4  0  3  0  1  2

(a) Order the observations from smallest to largest:
(b) Median = ______________
(c) What happens to the median if the fifth observation in the first list was incorrectly recorded as 40 instead of 4?
(d) What happens to the median if the third observation in the first list was incorrectly recorded as -20 instead of 0?

Note: The median is resistant; that is, it does not change, or changes very little, in response to extreme observations.

Another Measure: The Mode

DEFINITION: The mode of a set of observations is the most frequently occurring value; it is the value having the highest frequency among the observations.

The mode of the values {0, 0, 0, 0, 1, 1, 2, 2, 3, 4} is 0.
For {0, 0, 0, 1, 1, 2, 2, 2, 3, 4} there are two modes, 0 and 2 (bimodal).
What would be the mode for {0, 1, 2, 4, 5, 8}?
For {0, 0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4, 5}?

The mode is not often used as a measure of center for quantitative data. The mode can be computed for qualitative data.

[Figure: bar chart of Percent (0 to 80) by Race: American Indian, No Category, Hispanic, African-American, Asian, White.]

The modal race category is “white.” If the categories were coded as 1=White, 2=Asian, 3=African-American, 4=Hispanic, 5=American Indian, 6=No category listed, then the mode would be the value 1.

Example Different Measures Can Give Different Impressions
Problem: The famous trio (the mean, the median, and the mode) represent three different methods for finding a so-called “center” value. These three values may be the same but are more likely to be different. When they are different, they can lead to different interpretations of the data being summarized.
Consider the annual incomes of five families in a neighborhood:
$12,000  $12,000  $30,000  $90,000  $100,000
(a) Calculate the average income.
(b) Calculate the median income.
(c) Calculate the modal income.
(d) If you were trying to promote that this is an affluent neighborhood, which measure might you prefer to present?
(e) If you were trying to argue against a tax increase, which measure might you prefer to present?
(f) If you want to represent these values with the income that is in the middle, which measure might you prefer to present?

The mean income is:

$\bar{x} = \frac{100{,}000 + 90{,}000 + 30{,}000 + 12{,}000 + 12{,}000}{5} = \$48{,}800$

The median income is: $30,000
The modal income is: $12,000

If you were trying to promote that this is an affluent neighborhood, you might prefer to report the mean income.
If you were trying to argue against a tax increase, you might argue that income is too low to afford a tax increase and report the mode.

Effect of the Shape of the Distribution on the Mean, Median, Mode

[Figure: four distribution shapes. Bell-shaped, Symmetric: mean = median = mode. Bimodal: mean = median, two modes. Skewed Right and Skewed Left: each with the mode, median, and mean marked; the median splits the area 50%/50%.]

HW 1 page 33: 1, 4, 5, 8, 12, 15, 19

MEASURING VARIATION OR SPREAD

The two sets of data below have the same mean, median, and mode, but the values obviously differ in another respect: the variation or spread of the values. The values in List 1 are much more tightly clustered around the center value of 60. The values in List 2 are much more dispersed or spread out.

List 1: 55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65
mean = median = mode = 60

List 2: 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85
mean = median = mode = 60

[Figure: dot plots of List 1 and List 2 on a common scale from 35 to 85.]

Range

The range is the simplest measure of variability or spread. The range is just the difference between the largest value and the smallest value.
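As a quick check of List 1 and List 2 above, the following Python sketch confirms that the two lists agree on the mean, median, and mode but differ in range. It uses only the standard `statistics` module; the helper name `summarize` and the variable names are illustrative choices, not part of the course material.

```python
import statistics

# List 1 and List 2 from the text above.
list1 = [55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65]
list2 = [35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85]


def summarize(values):
    """Return the three measures of center and the range (hypothetical helper)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Range = largest value minus smallest value.
        "range": max(values) - min(values),
    }


summary1 = summarize(list1)
summary2 = summarize(list2)
print("List 1:", summary1)
print("List 2:", summary2)
```

Both summaries report 60 for the mean, median, and mode, while the range is 10 for List 1 and 50 for List 2.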
Range can give a distorted picture of the actual pattern of variation. Two distributions can have the same range but different patterns of variation. In the figure, the first distribution has most of its values far from the center, while the second distribution has most of its values closer to the center.

[Figure: dot plots of two distributions, each on a scale from 20 to 30.]

Interquartile Range

The interquartile range measures the spread of the middle 50% of the data. You first find the median (represented by Q2, the value that divides the data into two halves), and then find the median for each half. The three values that divide the data into four parts are called the quartiles, represented by Q1, Q2, and Q3. The difference between the third quartile and the first quartile is called the interquartile range, denoted by IQR = Q3 - Q1.

Finding the Quartiles
1. Find the median of all of the observations.
2. First Quartile = Q1 = median of observations that fall below the median.
3. Third Quartile = Q3 = median of observations that fall above the median.

Notes
When the number of observations is odd, the middle observation is the median. This observation is not included in either of the two halves when computing Q1 and Q3.
Although different books, calculators, and computers may use slightly different ways to compute the quartiles, they are all based on the same idea.
In a left-skewed distribution, the first quartile will be farther from the median than the third quartile is. If the distribution is symmetric, the quartiles should be the same distance from the median.

Example Quartiles for Age
The ages of the 20 subjects in the medical study are listed below in order.
32, 37, 39, 40, 41, 41, 41, 42, 42, 43, 44, 45, 45, 45, 46, 47, 47, 49, 50, 51
The histogram of the ages is also provided.
(a) Calculate the median age.
(b) Calculate the first quartile Q1 for this age data.
(c) Calculate the third quartile Q3 for this age data.
(d) Calculate the range for this age data.

[Figure: the ordered ages with median = 43.5, Q1 = 41, and Q3 = 46.5 marked.]

We see that the distribution of age is approximately symmetric and that the quartiles are about the same distance from the median.

[Figure: histogram of age, with Count (2 to 8) on the vertical axis and age (30 to 55) on the horizontal axis.]

The quartiles are actually the 25th, 50th, and 75th percentiles.

DEFINITION: The pth percentile is the value such that p% of the observations fall at or below that value and (100 - p)% of the observations fall at or above that value.

Five-number summary: Minimum, Q1, Median, Q3, Maximum

To Build a Basic Boxplot
- List the data values in order from smallest to largest.
- Find the five-number summary: minimum, Q1, median, Q3, and maximum.
- Locate the values for Q1, the median, and Q3 on the scale. These values determine the “box” part of the boxplot. The quartiles determine the ends of the box, and a line is drawn inside the box to mark the value of the median.
- Draw lines (called whiskers) from the midpoints of the ends of the box out to the minimum and maximum.

Example Five-Number Summary and Boxplot for Age
Problem
Consider the (ordered) ages of the 20 subjects in a medical study:
32, 37, 39, 40, 41, 41, 41, 42, 42, 43, 44, 45, 45, 45, 46, 47, 47, 49, 50, 51
The five-number summary for the age data is given by: min = 32, Q1 = 41, median = 43.5, Q3 = 46.5, and max = 51. Draw the basic boxplot.

Side-by-side boxplots are helpful for comparing two or more distributions with respect to the five-number summary. Although the median of the first process is closer to the target value of 20.000 cm, the second process produces a less variable distribution.

Using the 1.5 x IQR Rule to Identify Outliers and Build a Modified Boxplot
- List the data values in order from smallest to largest.
- Find the five-number summary: minimum, Q1, median, Q3, and maximum.
- Locate the values for Q1, the median, and Q3 on the scale.
These values determine the “box” part of the boxplot. The quartiles determine the ends of the box, and a line is drawn inside the box to mark the value of the median.
- Find the IQR = Q3 - Q1.
- Compute the quantity STEP = 1.5 x (IQR).
- Find the location of the inner fences by taking 1 step out from each of the quartiles: lower inner fence = Q1 - STEP; upper inner fence = Q3 + STEP.
- Draw the lines (whiskers) from the midpoints of the ends of the box out to the smallest and largest values WITHIN the inner fences.
- Observations that fall OUTSIDE the inner fences are considered potential outliers. If there are any outliers, plot them individually along the scale using a solid dot.

[Figure: modified boxplot for data with five-number summary Min = 1, Q1 = 21, Median = 32, Q3 = 66, Max = 325, annotated with the inner fences, the potential outliers (an outside value and a far outside value), and the farthest observations that are not potential outliers.]

Example Any Age Outlier?
Let’s apply the "rule of thumb" to our age data set to assess if there are any outliers.
(a) Construct the fences for the modified boxplot based on the 1.5 x IQR rule.
(b) Are there any outliers using the 1.5 x IQR rule?
(c) Construct the modified boxplot.

Let's Do It! 8 (3 min)

Let's Do It! 9 (5 min) Comparing Ages: Antibiotic Study
Variable = age for 23 children randomly assigned to one of two treatment groups.
(a) Give the five-number summary for each of the two treatment groups. Comment on your results.
Amoxicillin Group (n=11): 8 9 9 Five-number summary: 10 10
Cefadroxil Group (n=12): 7 8 Five-number summary: 9 9 9 11 10 11 12 14 14 17 10 11 12 13 14 16
(b) Make side-by-side boxplots for the antibiotic study data in part (a).
(c) Using our “rule of thumb,” are there any outliers for the Amoxicillin group? If so, modify your boxplot above.
(d) Using our “rule of thumb,” are there any outliers for the Cefadroxil group? If so, modify your boxplot above.

Standard Deviation
... a measure of the spread of the observations from the mean.
... think of the standard deviation as an “average (or standard) distance of the observations from the mean.”

Example Standard Deviation: What Is It?
Deviations: -4, 1, 3
Squared Deviations: 16, 1, 9

----------------------------------------------------------------
Observation     Deviation            Squared Deviation
$x$             $x - \bar{x}$        $(x - \bar{x})^2$
----------------------------------------------------------------
0               0 - 4 = -4           16
5               5 - 4 =  1            1
7               7 - 4 =  3            9
----------------------------------------------------------------
mean = 4        sum always = 0       sum = 26

sample variance $= \frac{(-4)^2 + 1^2 + 3^2}{3 - 1} = \frac{16 + 1 + 9}{2} = \frac{26}{2} = 13$

sample standard deviation $= \sqrt{13} \approx 3.6$

Interpretation of the Standard Deviation
Think of the standard deviation as roughly an average distance of the observations from their mean. If all of the observations are the same, then the standard deviation will be 0 (i.e., no spread). Otherwise the standard deviation is positive, and the more spread out the observations are about their mean, the larger the value of the standard deviation.

If $x_1, x_2, \ldots, x_n$ denote a sample of n observations, the sample variance is denoted by:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

The sample standard deviation, denoted by $s$, is the square root of the variance: $s = \sqrt{s^2}$.

The population standard deviation, denoted by the Greek letter $\sigma$ (sigma), is the square root of the population variance and is computed as:

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Remarks:
The variance is measured in squared units. By taking the square root of the variance we bring this measure of spread back into the original units.
Just as the mean is not a resistant measure of center, the standard deviation, which uses the mean in its definition, is not a resistant measure of spread. It is heavily influenced by extreme values.
There are statistical arguments that support why we divide by $n - 1$ instead of $n$ in the denominator of the sample standard deviation.
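The deviation table for the observations 0, 5, 7 can be reproduced step by step. The sketch below is a minimal illustration in Python, assuming only the standard library; the variable names are illustrative. It computes the sample variance with the n - 1 denominator and, for contrast, the population variance with the N denominator.

```python
import math
import statistics

observations = [0, 5, 7]
n = len(observations)

# Step 1: the mean of the observations.
mean = sum(observations) / n

# Step 2: deviations from the mean (these always sum to 0).
deviations = [x - mean for x in observations]

# Step 3: squared deviations and their sum.
squared_deviations = [d ** 2 for d in deviations]
sum_squared = sum(squared_deviations)

# Step 4: sample variance divides by n - 1; population variance divides by N.
sample_variance = sum_squared / (n - 1)
population_variance = sum_squared / n

# Step 5: the standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print("mean:", mean)
print("deviations:", deviations)
print("sum of squared deviations:", sum_squared)
print("sample variance:", sample_variance)
print("sample standard deviation:", round(sample_sd, 2))

# Cross-check against the standard library's sample formulas.
library_variance = statistics.variance(observations)
library_sd = statistics.stdev(observations)
```

The hand computation and the library functions `statistics.variance` and `statistics.stdev` should agree, since both use the n - 1 denominator.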
Shortcut formulas for computing the variance and standard deviation are presented next and will be used in the remainder of the chapter and in the exercises. These formulas are mathematically equivalent to the preceding formulas and do not involve using the mean. They save time when repeated subtracting and squaring occur in the original formulas. They are also more accurate when the mean has been rounded.

Example Consistency of Weight Loss Program
In a recent study of the effect of a certain diet on weight reduction, 11 subjects were put on the diet for two weeks and their weight loss/gain in lbs was measured (positive values indicate weight loss):
1, 1, 2, 2, 3, 2, 1, 1, 3, 2.5, -23
What is the standard deviation of the weight loss?

Solution

$\sum x = 1 + 1 + \cdots + 2.5 + (-23) = -4.5$, $\sum x^2 = 1^2 + 1^2 + \cdots + 2.5^2 + (-23)^2 = 569.25$

The standard deviation of this sample is

$s = \sqrt{\frac{569.25 - (-4.5)^2 / 11}{10}} = 7.5327$

Let's Do It! 10 Emergency Room Patients
The following are the ages of a sample of 20 patients seen in the emergency room of a hospital on a Friday night.
35 32 21 43 39 60 36 12 54 45 37 53 45 23 64 10 34 22 36 55
Find the standard deviation of the ages.

Variance and Standard Deviation for Grouped Data
The procedure for finding the variance and standard deviation for grouped data is similar to that for finding the mean for grouped data, and it uses the midpoints of each class.

Example
The data represent the number of miles that 20 runners ran during one week. Find the variance and the standard deviation for the frequency distribution of the data.

Solution
Step 1 Make a table as shown, and find the midpoint of each class.
Step 2 Multiply the frequency by the midpoint for each class, and place the products in column D: 1 · 8 = 8, 2 · 13 = 26, . . . , 2 · 38 = 76
Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E: 1 · 8² = 64, 2 · 13² = 338, . . . , 2 · 38² = 2888
Step 4 Find the sums of columns B, D, and E.
The sum of column B is $n$, the sum of column D is $\sum f \cdot X_m$, and the sum of column E is $\sum f \cdot X_m^2$. The completed table is shown.
Step 5 Substitute in the formula and solve for $s^2$ to get the variance.
Step 6 Take the square root to get the standard deviation.

Let's Do It! 11
The data show the distribution of the birth weight (in oz.) of 100 consecutive deliveries. Find the variance and the standard deviation.

Interval          Frequency
29.50-69.45       5
69.50-89.45       10
89.50-99.45       11
99.50-109.45      19
109.50-119.45     17
119.50-129.45     20
129.50-139.45     12
139.50-169.45     6

HW page 33: 2, 3, 9, 13, 35, 37

Let's Do It! 12 (8 min) A Transformation
Data on number of children for 10 households in a neighborhood: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
Mean = 1.6 and standard deviation = 1.43.

We wish to summarize the number of people in a household. Each household has two adults, so we can simply add the value 2 to each number in the list.
4, 5, 2, 4, 3, 2, 5, 2, 3, 6
(a) Find the mean and the standard deviation of this new set of observations and compare them to those for the original observations. How did the mean change? How did the standard deviation change?
(b) Summarize how adding the same constant to each observation affects the mean and standard deviation of the observations. Knowing how the standard deviation is computed, does this make sense?

Suppose each child receives a weekly allowance of $3. The total allowance expense in a household can be obtained by multiplying every number in the original list by 3.
6, 9, 0, 6, 3, 0, 9, 0, 3, 12
(c) Find the mean and the standard deviation of this new set of observations and compare them to those for the original observations. How did the mean change? How did the standard deviation change?
(d) Summarize how multiplying each observation by the same constant affects the mean and standard deviation of the observations. Knowing how the standard deviation is computed, does this make sense?
(e) Suppose that for a local recreation program, 3 credits are deducted for each child in the household. The adjustment in credit hours can be obtained by multiplying every number in the original list by -3. Note the multiplier is now negative. Determine the new values and find the sample mean and the sample standard deviation for these new values.
New values: ______________
Mean for the new values: ______________
Standard deviation of the new values: ______________
Note: Even though the multiplier was negative, the sign for the new standard deviation is positive.
(f) The cost is $5 for each child to enter a children's indoor play park. Adults are free. Each household also has a coupon to save $2. Without doing arithmetic, can you state what the mean and standard deviation would be if Y = 5(X) - 2?

Linear Transformation Rules
If X represents the original values, $\bar{x}$ is the average of the original values, $s_X$ is the standard deviation of the original values, and the new values, represented by Y, are a linear transformation of X, Y = aX + b, then:
the mean for Y is given by: $\bar{y} = a\bar{x} + b$
and the standard deviation for Y is given by: $s_Y = |a| \, s_X$

Example Temperature Transformation
In a recent letter from one of your cousins in Europe, he stated that this past summer had been very hot. In particular, the high temperature of each day for a week was:

X = Temp (Celsius):
Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
40       41        39          41         41       40         38

The mean and standard deviation are:
$\bar{x} = 40$ degrees Celsius
$s_X = 1.15$ degrees Celsius

Temperature in the Fahrenheit scale, Y = Temperature in Fahrenheit, is related to temperature in the Celsius scale, X, by the following linear transformation:

$F = \frac{9}{5}C + 32$, or in terms of Y and X: $Y = \frac{9}{5}X + 32$

So the mean and standard deviation for temperature in the Fahrenheit scale are:

$\bar{y} = \frac{9}{5}(40) + 32 = 72 + 32 = 104$ degrees Fahrenheit
$s_Y = \frac{9}{5}(1.15) = 2.07$ degrees Fahrenheit

Now you can understand just how hot it was! Better than that, you did not need to transform each value.
You just used what you learned in statistics.

Let's Do It! 13 Standardization: A Special Transformation
Let’s perform a special transformation of the original data on the number of children in a household: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
(a) The first step: subtract the mean $\bar{x}$ from each number in the list, giving $x - \bar{x}$.
(b) The second step: divide the difference $x - \bar{x}$ by the standard deviation $s_X$.
(c) Calculate the mean and standard deviation of the resulting values.
Mean = _________ Standard Deviation = _______

A variable X is said to be standardized if the variable has a mean of zero and a standard deviation of 1. Note that the standardized variable $\frac{x - \bar{x}}{s_X}$ can be expressed in the form of a linear transformation,

$\frac{x - \bar{x}}{s_X} = \frac{1}{s_X}x - \frac{\bar{x}}{s_X}$, with $a = \frac{1}{s_X}$ and $b = -\frac{\bar{x}}{s_X}$.
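The two standardization steps can be checked numerically on the children data. The following Python sketch is a minimal illustration using the standard `statistics` module; the variable names (`z`, `a`, `b`, and so on) are illustrative. It standardizes each value directly, then again as a linear transformation with a = 1/s_X and b = -x̄/s_X, and compares the results.

```python
import statistics

children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

x_bar = statistics.mean(children)   # sample mean of the original values
s_x = statistics.stdev(children)    # sample standard deviation (n - 1 denominator)

# Steps (a) and (b): subtract the mean, then divide by the standard deviation.
z = [(x - x_bar) / s_x for x in children]

# The same values written as a linear transformation Y = aX + b.
a = 1 / s_x
b = -x_bar / s_x
z_linear = [a * x + b for x in children]

# Step (c): mean and standard deviation of the standardized values.
z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)

print("original mean and sd:", round(x_bar, 2), round(s_x, 2))
print("standardized mean and sd:", round(z_mean, 10), round(z_sd, 10))
```

Up to floating-point rounding, the standardized values have mean 0 and standard deviation 1, which is what the linear transformation rules give: a·x̄ + b = 0 and |a|·s_X = 1.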