Midterm Review
AMS-UCSC
May 6th, 2015

Spring 2015. Session 1 (Midterm Review), AMS-5, May 6th, 2015

Topics
We will talk about...
1. Review

The Histogram: Drawing a Histogram
Once the distribution table is available, the next step is to draw a horizontal axis specifying the class intervals. Then we draw the blocks, remembering that:
In a histogram, the areas of the blocks represent percentages.
• When class intervals do not have the same length, it is a mistake to set the heights of the blocks equal to the percentages in the table.
• To figure out the height of a block, divide the percentage by the length of the interval.

The Histogram: Vertical Scale
The meaning of the vertical scale in a histogram:
• Remember that the area of each block is proportional to the percentage. A tall block means that a large chunk of area accumulates over a small portion of the horizontal scale.
• This implies that the density of the data is high in the intervals where the height is large. In other words, the data are more crowded in those intervals.

Average and Standard Deviation: Average and SD
Average
The average of a list of numbers equals their sum divided by how many there are.
The Standard Deviation (SD)
The SD of a list of numbers measures how far away they are from their average. Thus a large SD implies that many observations are far from the overall average.

Average and Standard Deviation: The Standard Deviation
We can quantify the statement above as follows:
• Roughly 68% of the observations are within one SD of the average.
• Roughly 95% of the observations are within two SDs of the average.
• Roughly 99.7% of the observations are within three SDs of the average.
These statements are more accurate when the distribution is symmetric.

The Normal Density
The Gaussian or normal curve corresponds to the following formula:
$y = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, where $e = 2.71828\ldots$
[Figure: graph of the normal curve]
The area below the curve is equal to one. We observe that the curve is symmetric around zero and that almost all of the area is concentrated between −4 and 4. The probability of an interval is the corresponding area under the curve.

The Normal Density: Standard Units
Doing calculations with the normal curve requires the use of a table. Tables are available for the standard normal curve, and they require that observations be transformed to standard units.
Standard Units
Given a list of numbers, we convert to standard units by subtracting the average and dividing by the SD.
Areas under the standard normal curve satisfy the following identities (with 0 < z < x):
• P((0, z)) = 1/2 × P((−z, z))
• P((−z, x)) = P((−z, 0)) + P((0, x))
• P(> z) = 1/2 × (P(< −z) + P(> z))
• P(< −z) + P(> z) = 1 − P((−z, z))
• P(< z) = P(< 0) + P((0, z))
• P((z, x)) = 1/2 × (P((−x, x)) − P((−z, z)))

Correlation: Correlation Coefficient
The correlation coefficient gives a measure of the linear association of two variables. It is usually denoted by r and takes values between −1 and 1.
• The correlation is not affected when the two variables are interchanged.
• The correlation is not changed if the same number is added to all the values of one of the variables.
• The correlation is not changed if all the values of one of the variables are multiplied by the same positive number. It will change sign if the number is negative.
• The correlation coefficient is 1 if the variables have perfect positive linear association and −1 if they have perfect negative linear association.
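The properties of r listed above can be checked numerically. The following is a minimal Python sketch, assuming r is computed as the average of the products of the two variables in standard units and that the SD divides by the number of observations. The function names and the small data set are made up for illustration and are not taken from the slides.

```python
from statistics import fmean, pstdev

def standard_units(values):
    """Subtract the average and divide by the SD (SD computed dividing by n)."""
    avg = fmean(values)
    sd = pstdev(values)
    return [(v - avg) / sd for v in values]

def correlation(xs, ys):
    """Average of the products of x and y, both in standard units."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    return fmean([a * b for a, b in zip(zx, zy)])

# Illustrative data (an assumption, not from the slides).
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]

r_original = correlation(x, y)
r_swapped = correlation(y, x)                       # interchange the variables
r_shifted = correlation([v + 10 for v in x], y)     # add the same number to x
r_scaled = correlation([3 * v for v in x], y)       # multiply x by a positive number
r_flipped = correlation(x, [-2 * v for v in y])     # multiply y by a negative number

for label, value in [("original", r_original), ("swapped", r_swapped),
                     ("shifted", r_shifted), ("scaled", r_scaled),
                     ("flipped", r_flipped)]:
    print(f"{label:>8}: r = {value:.2f}")
```

For this data the first four values agree (r = 0.40) and the last one has the opposite sign (r = −0.40), matching the bullet points.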
Correlation: Computing the correlation coefficient
The procedure to compute the correlation coefficient is the following:
1. Convert each variable to standard units.
2. Calculate the average of the products.
The result is the correlation coefficient. The formula is given by
r = average of (x in standard units × y in standard units)

Regression: The Regression Line
The regression line for y on x estimates the average value of y corresponding to each value of x.
Associated with an increase of one SD in x there is an increase of r SDs in y, on average.
error = actual value of y − predicted value of y
$\text{RMS error} = \sqrt{1 - r^2} \times \text{SD of } y$

Regression: Estimate Percentile Ranks
We can use the regression method and the normal curve to produce estimates of the percentile ranks.
Percentile Rank
A percentile is a score: for example, the 95th percentile is a score of 700. A percentile rank is the percent: if you score 700, you have a percentile rank of 95%.
• Given a percentile rank for the x variable, find the corresponding z score in the normal table.
• This score gives the number of SDs above the average of the x variable.
• Using the regression method, find the number of SDs above the average of the y variable.
• Use the normal table again to convert this number into a percentile rank for y.

Regression
• The average of the residuals is 0, and the regression line for the residual plot is horizontal.
• The formula for the slope of a regression line is
$\text{slope} = r \times \frac{\text{SD of } y}{\text{SD of } x}$

Regression
• The intercept of the regression line is the predicted value of y for x = 0.
The intercept formula is given by
intercept = average of y − slope × average of x
• Among all possible lines through a cloud, the regression line is the one that has the smallest RMS error in predicting y from x.

Problems: Problem 1
Among freshmen at a certain university, scores on the Math SAT followed the normal curve, with an average of 550 and an SD of 100.
• Find the percentile rank corresponding to a score of 400 on the Math SAT.
• Find the score corresponding to the 75th percentile of the distribution.

Problems: Problem 1 (Cont.)
a) First calculate the standard units for this score: (400 − 550)/100 = −1.5, so 400 is 1.5 SDs below average. The area to the left of −1.5 under the normal curve is about 7%, so this student is at the 7th percentile of the score distribution.
b) The 75th percentile of the standard normal curve is at about 0.7 standard units, so the student needs to be about 0.7 SDs above the average. This is about 550 + 0.7 × 100 = 620 on the Math SAT exam.

Problems: Problem 2
A statistical analysis is made of the midterm and final scores in a large class. The results are
average midterm score ≈ 60, SD ≈ 15
average final score ≈ 65, SD ≈ 20
r ≈ 0.50
1. Using the normal approximation, about what percentage of the students scored over 80 on the midterm?
80 points on the midterm corresponds to (80 − 60)/15 = 1.33 standard units. Using the normal table, we obtain that approximately 9% of the students scored over 80 on the midterm.
2. What is the R.M.S. error?
$\sqrt{1 - 0.5^2} \times 20 = 17.32$

Problems: Problem 2 (Cont.)
1. What is the slope of the regression line?
$0.5 \times \frac{20}{15} = 0.67$
2. What is the predicted final score for a student who scored 80 on the midterm?
80 points on the midterm is 1.33 SDs above average. This corresponds to 1.33 × 0.5 = 0.67 SDs above average on the final.
That corresponds to 0.67 × 20 = 13.4 points over average on the final, so the students who scored 80 on the midterm scored, on average, 65 + 13.4 = 78.4 on the final.

Problems: Problem 2 (Cont.)
1. Of the students who scored 80 on the midterm, about what percentage scored over 80 on the final?
For these students the final scores have an average of about 78.4 and an SD equal to the R.M.S. error, 17.32. In standard units we have
(80 − 78.4)/17.32 = 0.09
and there is an area of about 46% to the right of this value under the normal curve.

Problems: Problem 3 (Problem 1b, Chp. 10, Sect. C)
Average of midterm exam ≈ 60, SD of midterm exam ≈ 15
Average of final exam ≈ 60, SD of final exam ≈ 15
r = 0.5
Predict the final exam score for a student whose midterm score is 30.

Problems: Problem 3 (Problem 1b, Chp. 10, Sect. C) (Cont.)
1. Get standard units for x = 30 (x is the midterm score): z = (30 − 60)/15 = −2
2. Get standard units in y using the regression method (y is the final score): −2 × r = −2 × 0.5 = −1
3. Convert the standard units in y back to points: −1 × 15 = −15. The student's predicted score on the final is 15 points below the average.
4. Final score: 60 − 15 = 45

Problems: Problem 4 (Problem 2b, Chp. 10, Sect. C)
The correlation between SAT scores and first-year GPA is r ≈ 0.60. A student has a percentile rank of 30% on the SAT. Predict the corresponding percentile rank for first-year GPA.

Problems: Problem 4 (Problem 2b, Chp. 10, Sect. C) (Cont.)
1. You need the z score with an area of 30% to its left. Since 30% then lies in each tail, this is equivalent to looking up the z value for a central area of 40% in the normal table.
z = −0.53
2. Use the regression method to predict standard units: −0.53 × 0.60 = −0.318
3. The area to the left of this value will be the predicted percentile rank. The area between −0.318 and 0.318 is about 25%, so the area to the left is
(1 − 0.25)/2 = 0.75/2 = 0.375
This is about 38%.

Good luck on your midterm exam!
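
As a cross-check, the percentile-rank method of Problem 4 can be sketched in Python. This is only an illustration, assuming Python 3.8 or later, where `statistics.NormalDist` stands in for the normal table; the function name `predicted_percentile_rank` is hypothetical. Because it does not round z to table values, the result is slightly above 37.5%, which still rounds to 38%.

```python
from statistics import NormalDist

def predicted_percentile_rank(x_rank, r):
    """Predict the percentile rank of y from the percentile rank of x.

    x_rank is a fraction between 0 and 1 (0.30 means 30%); r is the correlation.
    Assumes both variables follow the normal curve.
    """
    standard_normal = NormalDist()           # average 0, SD 1
    z_x = standard_normal.inv_cdf(x_rank)    # standard units for x
    z_y = r * z_x                            # regression method: r SDs per SD of x
    return standard_normal.cdf(z_y)          # area to the left of z_y

rank = predicted_percentile_rank(0.30, 0.60)
print(f"Predicted percentile rank: {100 * rank:.0f}%")
```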