Review - Midterm Exam

Review
Midterm Exam
Midterm Review
AMS-UCSC
May 6th, 2015
Spring 2015. Session 1 (Midterm Review)
AMS-5
May 6th, 2015
Topics
We will talk about...
1. Review
Review
The histogram
Drawing a Histogram
Once the distribution table is available the next step is to draw a
horizontal axis specifying the class intervals. Then we draw the blocks
remembering that:
In a histogram, the areas of the blocks represent percentages
• When class intervals do not have the same length, it is a mistake to
set the heights of the blocks equal to the percentages in the table.
• To figure out the height of a block divide the percentage by the
length of the interval.
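The height rule above can be sketched in a few lines of Python. The distribution table here is made up for illustration (unequal class intervals, percentages adding to 100); it is not data from the course.

```python
# Hypothetical distribution table: (left endpoint, right endpoint, percentage).
table = [(0, 10, 20.0), (10, 20, 30.0), (20, 50, 30.0), (50, 100, 20.0)]

def block_heights(rows):
    """Height of each block = percentage / length of the class interval."""
    return [pct / (right - left) for left, right, pct in rows]

heights = block_heights(table)  # in percent per unit of the horizontal axis

# The areas (height x length) recover the percentages from the table.
areas = [h * (right - left) for h, (left, right, _) in zip(heights, table)]

for (left, right, pct), h in zip(table, heights):
    print(f"[{left}, {right}): {pct}% -> height {h:.2f}% per unit")
```

Note that the two intervals holding 30% each get different heights, because one is three times as long as the other.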
Vertical Scale
The meaning of the vertical scale in a histogram
• Remember that the areas of the blocks are proportional to the
percentages. A tall block means that a large share of the area
accumulates over a small portion of the horizontal scale.
• This implies that the density of the data is high in the intervals where
the height is large. In other words, the data are more crowded in
those intervals.
Average and Standard Deviation
Average and SD
Average
The average of a list of numbers equals their sum divided by how many
there are.
The Standard Deviation (SD)
The SD of a list of numbers measures how far away they are from their
average
Thus a large SD implies that many observations are far from the overall
average.
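A minimal sketch of both quantities follows. It assumes the SD is the root-mean-square of the deviations from the average (dividing by the number of entries, not by one less), which the slide does not spell out. The list is illustrative.

```python
import math

def average(values):
    """Sum of the entries divided by how many there are."""
    return sum(values) / len(values)

def sd(values):
    """Root-mean-square of the deviations from the average (divides by n)."""
    avg = average(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))

data = [1, 3, 4, 5, 7]  # illustrative list, not course data
print(average(data), sd(data))
```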
The Standard Deviation
We can quantify what is written above as
• Roughly 68% of the observations are within one SD of the average.
• Roughly 95% of the observations are within two SDs of the average.
• Roughly 99.7% of the observations are within three SDs of the average.
These statements are more accurate when the distribution is symmetric.
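For data that follow the normal curve exactly, these percentages can be computed rather than memorized. The sketch below uses the standard library error function; treating the data as exactly normal is an assumption, and real data will only match roughly.

```python
import math

def area_within(k):
    """Area under the normal curve within k SDs of the average."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD(s): {100 * area_within(k):.1f}%")
```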
The Normal Density
The Normal Density
The Gaussian or normal curve corresponds to the following formula
$$y = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad e = 2.71828\ldots$$
and corresponds to the graph
[Figure: graph of the normal curve]
The area below the curve is equal to one. We observe that the curve is
symmetric around zero and that most of the area is concentrated between
−4 and 4. The probability of an interval is the corresponding area
under the curve.
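The claims about the area can be checked numerically. This is a rough sketch that approximates the area with a midpoint rule; the function names and the number of steps are arbitrary choices.

```python
import math

def normal_density(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area(a, b, steps=10_000):
    """Midpoint-rule approximation of the area under the curve on (a, b)."""
    width = (b - a) / steps
    return sum(normal_density(a + (i + 0.5) * width) for i in range(steps)) * width

print(area(-4, 4))  # almost all of the total area
print(area(-1, 1))  # probability of the interval (-1, 1)
```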
Doing calculations with the normal curve requires the use of a table.
Tables are available for the standard normal curve and they require that
observations be transformed to standard units.
Standard Units
Given a list of numbers, we convert to standard units by subtracting the
average and dividing by the SD
• P((0, z)) = 1/2 × P((−z, z))
• P((−z, x)) = P((−z, 0)) + P((0, x))
• P(> z) = 1/2 × (P(< −z) + P(> z))
• P(< −z) + P(> z) = 1 − P((−z, z))
• P(< z) = P(< 0) + P((0, z))
• P((z, x)) = 1/2 × (P((−x, x)) − P((−z, z))), for 0 < z < x
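The conversion to standard units and the area identities above can be checked with Python's `statistics.NormalDist` in place of a printed table. This is only a sketch: the helper names and the values of z and x are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: average 0, SD 1

def standard_units(value, average, sd):
    """Subtract the average, then divide by the SD."""
    return (value - average) / sd

def between(a, b):
    return std_normal.cdf(b) - std_normal.cdf(a)

def below(z):
    return std_normal.cdf(z)

def above(z):
    return 1 - std_normal.cdf(z)

z, x = 0.5, 1.5  # illustrative values with 0 < z < x

# Each pair is (left-hand side, right-hand side) of one identity.
checks = [
    (between(0, z), 0.5 * between(-z, z)),
    (between(-z, x), between(-z, 0) + between(0, x)),
    (above(z), 0.5 * (below(-z) + above(z))),
    (below(-z) + above(z), 1 - between(-z, z)),
    (below(z), below(0) + between(0, z)),
    (between(z, x), 0.5 * (between(-x, x) - between(-z, z))),
]
all_hold = all(abs(left - right) < 1e-12 for left, right in checks)
print(standard_units(70, 60, 5), all_hold)
```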
Correlation
Correlation
Correlation Coefficient
The correlation coefficient gives a measure of the linear association of two
variables.
The correlation coefficient is usually denoted by r and takes values
between -1 and 1
• The correlation is not affected when the two variables are
interchanged.
• The correlation is not changed if the same number is added to all the
values of one of the variables.
• The correlation is not changed if all the values of one of the variables
are multiplied by the same positive number. It will change sign if the
number is negative.
• The correlation coefficient is 1 if the variables have perfect positive
linear association and -1 if they have perfect negative linear
association.
Computing the correlation coefficient
The procedure to compute the correlation coefficient is the following:
1. Convert each variable to standard units.
2. Calculate the average of the products.
The result is the correlation coefficient. The formula is given by
r = average of (x in standard units × y in standard units)
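The two steps can be sketched as follows, again assuming the SD divides by the number of entries. The data are illustrative. The last lines check the properties of r listed earlier: interchanging the variables, adding a constant, and multiplying by a positive or negative number.

```python
import math

def standard_units(values):
    """Convert a list to standard units: (value - average) / SD."""
    avg = sum(values) / len(values)
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))
    return [(v - avg) / sd for v in values]

def correlation(xs, ys):
    """Average of the products of x and y in standard units."""
    zx, zy = standard_units(xs), standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

x = [1, 3, 4, 5, 7]   # illustrative data
y = [5, 9, 7, 1, 13]
r = correlation(x, y)

r_swapped = correlation(y, x)                    # variables interchanged
r_shifted = correlation([v + 10 for v in x], y)  # same number added to x
r_scaled = correlation([3 * v for v in x], y)    # x times a positive number
r_flipped = correlation([-2 * v for v in x], y)  # x times a negative number
print(r, r_swapped, r_shifted, r_scaled, r_flipped)
```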
Regression
Regression
The Regression Line
The regression line for y on x estimates the average value of y
corresponding to each value of x
Associated with an increase of one SD in x there is an increase of
r SDs in y, on average.
error = actual value of y − predicted value of y
$\text{RMS error} = \sqrt{1 - r^2} \times \text{SD of } y$
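A small sketch of the regression method and the RMS error formula, working only from summary statistics. The function names and the numbers in the example call are assumptions chosen for illustration.

```python
import math

def regression_estimate(x, avg_x, sd_x, avg_y, sd_y, r):
    """Estimated average y for a given x, using the regression method."""
    z_x = (x - avg_x) / sd_x   # x in standard units
    z_y = r * z_x              # one SD in x goes with r SDs in y
    return avg_y + z_y * sd_y  # back to the units of y

def rms_error(r, sd_y):
    """RMS error of the regression line for predicting y from x."""
    return math.sqrt(1 - r ** 2) * sd_y

# Illustrative summary statistics (assumed, not a real data set).
estimate = regression_estimate(x=72, avg_x=68, sd_x=4, avg_y=150, sd_y=20, r=0.6)
error_size = rms_error(r=0.6, sd_y=20)
print(estimate, error_size)
```

Here x is one SD above average, so the estimate for y is 0.6 SDs above its average.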
Estimate Percentile Ranks
We can use the regression method and the normal curve to produce
estimates of the percentile ranks.
Percentile Rank
A percentile is a score: for example the 95th percentile is a score of 700.
A percentile rank is the percent: if you score 700, you have a percentile
rank of 95%.
• Given a percentile rank for the x variable, find the corresponding z
score in the normal table.
• This score gives the number of SDs above the average of the x
variable.
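• Using the regression method, find the number of SDs above the average
of the y variable (multiply by r).
• The area to the left of that value under the normal curve is the
predicted percentile rank for y.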
Regression
• The average of the residuals is 0 and the regression line for the
residual plot is horizontal.
• The formula for the slope of a regression line is
slope = (r × SD of y) / (SD of x)
• The intercept of the regression line is the predicted value of y for
x = 0.
The intercept formula is given by
average of y − slope × average of x
• Among all possible lines through a cloud, the regression line is the
one that has the smallest RMS error in predicting y from x.
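The slope and intercept formulas give the regression line in the form y = intercept + slope × x. A minimal sketch, with assumed summary statistics used only for illustration:

```python
def regression_line(avg_x, sd_x, avg_y, sd_y, r):
    """Return (slope, intercept) of the regression line for y on x."""
    slope = r * sd_y / sd_x
    intercept = avg_y - slope * avg_x
    return slope, intercept

# Illustrative summary statistics (assumed, not a real data set).
slope, intercept = regression_line(avg_x=68, sd_x=4, avg_y=150, sd_y=20, r=0.6)

predicted_at_72 = intercept + slope * 72
predicted_at_avg = intercept + slope * 68  # the line passes through the point of averages
print(slope, intercept, predicted_at_72, predicted_at_avg)
```

The intercept here is the predicted y at x = 0, which may lie far outside the range of the data and need not be meaningful on its own.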
Problems
Problem 1
Among freshmen at a certain university, scores of the Math SAT followed
the normal curve, with an average of 550 and a SD of 100.
• Find the percentile corresponding to a score of 400 on the Math SAT.
• Find the score corresponding to the 75th percentile of the distribution.
Problem 1 (Cont.)
a) First calculate the standard units for this score: (400 − 550)/100 = −1.5,
so 400 is 1.5 SDs below average. The area to the left of −1.5 under the
normal curve is about 7%, so this student is at the 7th percentile of the
score distribution.
b) The 75th percentile of the normal curve is at around 0.7 standard
units, so the student needs a score about 0.7 SDs above the average.
This is about 550 + 0.7 × 100 = 620 on the Math SAT exam.
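Both answers can be cross-checked with `statistics.NormalDist` instead of the table. This sketch takes the normal model from the problem statement at face value. Without rounding z to 0.7, part b) comes out a little lower, at about 617.

```python
from statistics import NormalDist

scores = NormalDist(mu=550, sigma=100)  # Math SAT model from Problem 1

pct_below_400 = 100 * scores.cdf(400)   # part a): area to the left of 400
score_75th = scores.inv_cdf(0.75)       # part b): score at the 75th percentile

print(round(pct_below_400, 1), round(score_75th))
```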
Problem 2
A statistical analysis is made of the midterm and final scores in a large
class. The results are
average midterm score ≈ 60, SD ≈ 15
average final score ≈ 65, SD ≈ 20, r ≈ 0.50
1. Using the normal approximation, about what percentage of the
students scored over 80 on the midterm?
80 points on the midterm corresponds to (80 − 60)/15 = 1.33
standard units. Using the normal curve we obtain that approximately 9% of
the students scored over 80 on the midterm.
2. What is the R.M.S. error?
$\sqrt{1 - 0.5^2} \times 20 = 17.32$
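As a quick check of both parts, the following sketch repeats the calculation with the summary statistics given in the problem. It assumes the midterm scores follow the normal curve, as the question itself does.

```python
import math
from statistics import NormalDist

midterm = NormalDist(mu=60, sigma=15)     # average and SD of the midterm

pct_over_80 = 100 * (1 - midterm.cdf(80))  # area to the right of 80
rms = math.sqrt(1 - 0.5 ** 2) * 20         # sqrt(1 - r^2) x SD of the final

print(round(pct_over_80, 1), round(rms, 2))
```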
Problem 2(Cont.)
1. What is the slope of the regression line?
(0.5 × 20)/15 = 0.67 final points per midterm point
2. What is the predicted final score for a student who scored 80 on the
midterm?
80 points on the midterm is 1.33 SDs above average. This
corresponds to 1.33 × 0.5 = 0.67 SDs above average on the final.
That corresponds to 0.67 × 20 = 13.4 points over average on the
final, so the students who scored 80 on the midterm scored, on
average, 65 + 13.4 = 78.4 on the final.
Problem 2(Cont.)
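1. Of the students who scored 80 on the midterm, about what
percentage scored over 80 on the final?
For these students the final scores have an average of about 78.4 (the
regression estimate) and an SD of about 17.32 (the RMS error).
In standard units we have
(80 − 78.4)/17.32 = 0.09
and there is an area of about 46% to the right of this value under the
normal curve.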
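The same calculation in code, as a sketch. It assumes the final scores of students with a midterm score of 80 follow a normal curve centered at the regression estimate, with the RMS error as the SD. Because intermediate values are not rounded, the estimate is about 78.3 rather than 78.4, and the answer is still about 46%.

```python
import math
from statistics import NormalDist

avg_mid, sd_mid = 60, 15
avg_fin, sd_fin = 65, 20
r = 0.5

predicted = avg_fin + r * (80 - avg_mid) / sd_mid * sd_fin  # regression estimate
rms = math.sqrt(1 - r ** 2) * sd_fin                        # spread around the line

finals_given_80 = NormalDist(mu=predicted, sigma=rms)
pct_over_80 = 100 * (1 - finals_given_80.cdf(80))

print(round(predicted, 1), round(pct_over_80))
```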
Problem 3 (Problem 1b, Chp. 10, Sect. C)
Average of Midterm exam ≈ 60
SD of Midterm exam ≈ 15
Average of Final exam ≈ 60
SD of Final exam ≈ 15
r = 0.5
Predict the final exam score for a student whose midterm score is 30.
Problem 3 (Problem 1b, Chp. 10, Sect. C) (Cont.)
1. Get standard units for x = 30 (x is the midterm score):
z = (30 − 60)/15 = −2
2. Get standard units in y using the regression method (y is the final
score):
−2 × r = −2 × 0.5 = −1
3. Convert the standard units in y back to points:
−1 × 15 = −15
The student's score on the final is predicted to be 15 points below the
average.
4. Final score: 60 − 15 = 45
Problem 4 (Problem 2b, Chp. 10, Sect. C)
The correlation between the SAT scores and the 1st year GPA scores is
r ≈ 0.60.
A student has a percentile rank of 30% on the SAT.
Predict the corresponding percentile rank on the 1st year GPA.
Problem 4 (Problem 2b, Chp. 10, Sect. C) (Cont.)
1. You need the z score with an area of 30% to the left of it. By
symmetry, 30% lies to the right of the opposite value as well, so this
is the negative of the z value with an area of 40% between −z and z in
the normal table: z = −0.53
2. Use the regression method to predict standard units:
−0.53 × 0.60 = −0.318
3. The area to the left of this value will be the predicted percentile
rank. The area between −0.318 and 0.318 in the normal table is about
25%, so
(1 − 0.25)/2 = 0.75/2 = 0.375
This is about 38%.
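The three steps can be sketched with `statistics.NormalDist` standing in for the normal table. The function name is hypothetical, and the sketch assumes both variables follow the normal curve, as the regression method for percentile ranks does.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve

def predicted_percentile_rank(rank_x, r):
    """Predict the percentile rank of y from the percentile rank of x."""
    z_x = std_normal.inv_cdf(rank_x)  # step 1: z score with that area to the left
    z_y = r * z_x                     # step 2: regression method in standard units
    return std_normal.cdf(z_y)        # step 3: area to the left of the new z score

rank = predicted_percentile_rank(0.30, 0.60)
print(round(100 * rank))
```

The predicted rank is closer to 50% than the original 30%, which is the regression effect at work.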
Good luck in your midterm exam!