day18 - University of South Carolina

STAT 110 - Section 5
Lecture 18
Professor Hao Wang
University of South Carolina
Spring 2012
Last time
• How to measure spread: variance and
standard deviation
• Density curve
• Normal density curve: Mean and SD
http://www.stat.tamu.edu/jhardin/applets/signed/Normal.html
The 68-95-99.7 Rule
For data that follow a normal distribution:
• 68% of the data falls within 1 standard deviation of the mean.
• 95% of the data falls within 2 standard deviations of the mean.
• 99.7% of the data falls within 3 standard deviations of the mean.
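The rule's percentages are rounded. A minimal sketch, assuming Python 3.8+ for the standard-library `statistics.NormalDist` class, computes the exact proportions within 1, 2 and 3 standard deviations of the mean; the helper name `within_k_sd` is ours, not from the lecture.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1


def within_k_sd(k):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

Because every normal distribution is a rescaled standard normal, these proportions hold for any mean and SD.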
Apply the 68-95-99.7 Rule
• The Health and Nutrition Examination Survey of 1976-1980 (HANES) studied the heights of adults (aged 18-24) and found that the heights follow normal distributions with the following parameters:
• Women: mean (μ) = 65.0 inches, standard deviation (σ) = 2.5 inches
• Men: mean (μ) = 70.0 inches, standard deviation (σ) = 2.8 inches
• Using the 68-95-99.7 rule, what can we say about the population of heights?
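Applying the rule amounts to computing mean ± k·SD for k = 1, 2, 3. A minimal sketch using the HANES means and SDs above; the names `HEIGHTS`, `RULE` and `interval` are illustrative.

```python
# Mean and SD (inches) for each group, as given in the HANES example
HEIGHTS = {"women": (65.0, 2.5), "men": (70.0, 2.8)}
RULE = ((1, "68%"), (2, "95%"), (3, "99.7%"))


def interval(mean, sd, k):
    """Range from k SDs below the mean to k SDs above it."""
    return mean - k * sd, mean + k * sd


for group, (mean, sd) in HEIGHTS.items():
    for k, share in RULE:
        low, high = interval(mean, sd, k)
        print(f"{group}: about {share} between {low:.1f} and {high:.1f} in")
```

For women this gives 62.5 to 67.5, 60.0 to 70.0 and 57.5 to 72.5 inches; for men, 67.2 to 72.8, 64.4 to 75.6 and 61.6 to 78.4 inches.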
The 68-95-99.7 Rule
Approximately what percent of a
standard normal distribution will fall
between -1 and 1?
A – 16%
B – 32%
C – 64%
D – 68%
E – 95%
Approximately what percent of a standard normal
distribution will fall between 0 and 1?
A – 16%
B – 32%
C – 34%
D – 68%
E – 95%
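By the symmetry of the normal curve, the 68% between -1 and 1 splits evenly on either side of 0:

```latex
P(0 < Z < 1) = \tfrac{1}{2}\,P(-1 < Z < 1) \approx \tfrac{1}{2}(68\%) = 34\%
```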
Example – IQ Scores for 12-Year-Olds
IQ scores for 12-year-olds follow a normal distribution
with a mean of 100 and a standard deviation of 16.
What percent of 12-year-olds will have an IQ
between 84 and 116?
A – 5%
B – 32%
C – 68%
D – 95%
E – 99.7%
IQ scores for 12-year-olds follow a normal distribution
with a mean of 100 and a standard deviation of 16.
What percent of 12 year olds will have an IQ higher
than 132?
A – 32%
B – 16%
C – 5%
D – 2.5%
E – 0.3%
IQ scores for 12-year-olds follow a normal distribution
with a mean of 100 and a standard deviation of 16.
What percent will have IQ scores lower than 116?
A – 16%
B – 32%
C – 50%
D – 68%
E – 84%
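The three IQ questions can be checked against exact normal areas. A minimal sketch, assuming `statistics.NormalDist` and treating the IQ distribution as exactly normal with mean 100 and SD 16; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)  # IQ scores of 12-year-olds

between_84_116 = iq.cdf(116) - iq.cdf(84)  # within 1 SD of the mean
above_132 = 1 - iq.cdf(132)                # more than 2 SDs above the mean
below_116 = iq.cdf(116)                    # less than 1 SD above the mean

print(f"{between_84_116:.1%}, {above_132:.1%}, {below_116:.1%}")
```

This prints 68.3%, 2.3% and 84.1%. The 68-95-99.7 rule gives the rounded answers 68%, 2.5% (half of the 5% outside 2 SDs) and 84% (50% below the mean plus 34%).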
The starting salaries in a field are approximately
normally distributed with a mean of $40,000 and a
standard deviation of $5,000.
What can we say about the percent of people who
make between $30,000 and $50,000?
A) Could be any percent
B) Is approximately 68%
C) Must be at least 75%
D) Must be at least 88.9%
E) Is approximately 95%
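Because the salaries are approximately normal and $30,000 to $50,000 spans the mean ± 2 SDs, the 68-95-99.7 rule applies. A minimal check, again assuming `statistics.NormalDist` and treating the distribution as exactly normal:

```python
from statistics import NormalDist

salary = NormalDist(mu=40_000, sigma=5_000)  # starting salaries in dollars

# $30,000 and $50,000 are each 2 SDs from the $40,000 mean
share = salary.cdf(50_000) - salary.cdf(30_000)
print(f"{share:.1%}")  # 95.4%
```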
• Observations expressed in terms of standard deviations
above or below the mean are called Standard Scores.
• The standard score is the number of standard deviations
above or below the mean at which an observation is located.
• If the observation is below the mean, the standard score will
be negative.
• If the observation is above the mean, the standard score will
be positive.
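The definition translates directly into the formula standard score = (observation - mean) / standard deviation. A minimal sketch; the function name `standard_score` is ours, and the example reuses the women's heights from the HANES example.

```python
def standard_score(x, mean, sd):
    """Number of SDs that x lies above (positive) or below (negative) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Heights of women in the HANES example: mean 65.0 in, SD 2.5 in
print(standard_score(70.0, 65.0, 2.5))  # 2.0
print(standard_score(60.0, 65.0, 2.5))  # -2.0
```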
Using Standard Scores
• Jennie scored 600 on the verbal part of the SAT.
Her friend Gerald took the ACT and scored a 21 on
the verbal part. SAT scores are normally distributed
with mean 500 and standard deviation 100. ACT
scores are normally distributed with mean 18 and
standard deviation 6. Assuming that both tests
measure the same kind of ability, who has the
higher score?
• Who performed better?
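Converting each score to a standard score puts both tests on the same scale:

```latex
z_{\text{Jennie}} = \frac{600 - 500}{100} = 1.0,
\qquad
z_{\text{Gerald}} = \frac{21 - 18}{6} = 0.5
```

Jennie is 1 SD above the SAT mean while Gerald is only half an SD above the ACT mean, so Jennie has the higher score.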
A woman is told her weight has a standard
score of 1. This means her weight is
A – 1 pound above the mean
B – 1 pound below the mean
C – 1 standard deviation above the mean
D – 1 standard deviation below the mean
• Math SAT scores follow a normal distribution with a
mean of 500 and standard deviation of 100.
• Calculate the standard score for a score of 630.
A. 1.3
B. 1.1
C. -1.3
D. -1.1
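With μ = 500 and σ = 100, the calculation is:

```latex
z = \frac{x - \mu}{\sigma} = \frac{630 - 500}{100} = 1.3
```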
• Two students get a 65 on two different tests.
Student A has a standard score of -1 while
Student B has a standard score of -2. Which
student had the better performance on the
test?
A. Student A
B. Student B
C. Both students gave equal performances.
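A higher standard score means a better performance relative to the other test takers. Assuming (beyond what the question states) that scores on each test are roughly normal, the standard normal `cdf` from `statistics.NormalDist` turns each standard score into a percentile:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal (mean 0, SD 1)

for student, z in (("A", -1), ("B", -2)):
    # proportion of test takers scoring below this standard score
    print(f"Student {student}: z = {z}, percentile about {Z.cdf(z):.1%}")
```

Student A sits near the 16th percentile and Student B near the 2nd, so Student A performed better even though both scored 65.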
Percentiles (revisited)
pth percentile – a value such that at least p% of the observations lie below it and at least (100-p)% lie above it.
Approximately what value must a standard normal variable reach to be at the 97.5th percentile?
A – -2
B – -1
C – 0
D – 1
E – 2
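By the rule, 95% of a standard normal lies between -2 and 2, leaving 2.5% above 2, so 2 is approximately the 97.5th percentile. A minimal check with the inverse CDF in `statistics.NormalDist` shows the exact value is slightly lower:

```python
from statistics import NormalDist

z_975 = NormalDist().inv_cdf(0.975)  # 97.5th percentile of the standard normal
print(round(z_975, 2))  # 1.96
```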
• Recall the distribution of SAT math scores
follows a normal distribution with a mean of
500 and a standard deviation of 100.
What score do you need so
that only 2.5% did better than you?
• Recall the distribution of SAT math scores
follows a normal distribution with a mean of
500 and a standard deviation of 100.
What is your score if 84% did
better than you?
• Recall the distribution of SAT math scores
follows a normal distribution with a mean of
500 and a standard deviation of 100.
What percentage of people did better
than you if you scored 780?
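A sketch of all three SAT calculations, assuming `statistics.NormalDist` and treating scores as exactly normal with mean 500 and SD 100. The rule gives 700 (2 SDs above the mean) and 400 (1 SD below) for the first two; 780 is 2.8 SDs above the mean, beyond the rule's 2 SD mark, so its tail area is computed directly.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # SAT math scores

top_2_5 = sat.inv_cdf(0.975)        # only 2.5% score higher than this
bottom_16 = sat.inv_cdf(0.16)       # 84% score higher than this
share_above_780 = 1 - sat.cdf(780)  # proportion scoring above 780

print(round(top_2_5), round(bottom_16), f"{share_above_780:.2%}")
```

The exact answers are about 696, about 401 and about 0.26%, close to the rule-of-thumb values.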
Chapter 14 – Describing Relationships
Most statistical studies examine data on more than one variable. The steps for studying two variables at once are the same as those we used earlier in the semester with just one variable:
• Plot the data.
• Look for overall patterns and deviations
from those patterns.
• Use numerical summaries.
Scatterplots
scatterplot – shows the relationship between two
quantitative variables measured on
the same individuals
• Values of one variable appear on the x-axis. This
is typically the one doing the explaining – the
explanatory, predictor, or independent variable.
• Values of the other variable appear on the y-axis.
This is typically the one being explained – called the
response or dependent variable.
Scatterplot
Example:
When water flows across farmland, some of the soil is
washed away, resulting in erosion. An experiment
was conducted to investigate the effect of the rate of
water flow on the amount of soil washed away. Flow
is measured in liters/second and the eroded soil is
measured in kilograms.
flow rate (liters/sec)   eroded soil (kg)
0.31                     0.82
0.85                     1.95
1.26                     2.18
2.47                     3.01
3.75                     6.07
Scatterplot
• Is there an explanatory variable?
• What’s the response variable?
• Which variable should be on the x-axis?
[Figure: scatterplot "Flow Rate vs Eroded Soil", with Flow Rate (liters/sec) on the x-axis and Eroded Soil (kg) on the y-axis]
Measuring Strength Through Correlation
A Linear Relationship
Correlation, represented by the letter r:
• Indicates how closely the values fall to a straight line.
• Measures linear relationships only; that is, it measures how close the individual points in a scatterplot are to a straight line.
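The slides do not give a formula for r. A minimal sketch using the standard textbook formula r = (1/(n - 1)) Σ z_x·z_y (the sum of products of standard scores divided by n - 1), applied to the flow-rate and eroded-soil data from the erosion example; the function name `correlation` is ours.

```python
import math


def correlation(xs, ys):
    """Pearson correlation: sum of products of standard scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # sample standard deviations (divisor n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    total = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


flow = [0.31, 0.85, 1.26, 2.47, 3.75]  # liters/second
soil = [0.82, 1.95, 2.18, 3.01, 6.07]  # kilograms
r = correlation(flow, soil)
print(round(r, 3))
```

The result is close to 0.97, a strong positive linear relationship: faster flow goes with more eroded soil.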
Correlation
Example: Verbal SAT and GPA
Scatterplot of GPA and verbal SAT score. The correlation is .485, indicating a moderate positive relationship.
Higher verbal SAT scores tend to indicate higher
GPAs as well, but the relationship is nowhere
close to being exact.
Example: Husbands’ and Wives’ Ages and Heights
Scatterplot of British husbands’
and wives’ ages; r = .94
Scatterplot of British husbands’ and
wives’ heights (in millimeters); r = .36
Husbands’ and wives’ ages are likely to be closely related,
whereas their heights are less likely to be so.
Source: Marsh (1988, p. 315) and Hand et al. (1994, pp. 179-183)
Occupational Prestige and Suicide Rates
Plot of suicide rate versus occupational prestige for 36 occupations. The correlation is .109, so there is not much of a relationship. If the outlier is removed, r drops to .018.
Source: Labovitz (1970, Table 1) and Hand et al. (1994, pp. 395-396)
Example: Professional Golfers’ Putting Success
Scatterplot of distance of putt and putting success rates. Correlation r = −.94. The negative sign indicates that as distance goes up, success rate goes down.
Source: Iman (1994, p. 507)
Which one has r = -0.86?
Which one has r = 0.52?
(Answer: plot A has r = -0.86.)
Summary: Features of Correlations
1. A correlation of +1 indicates a perfect linear relationship between the two variables; as one increases, so does the other. All individuals fall on the same straight line (a deterministic linear relationship).
2. A correlation of -1 also indicates a perfect linear relationship between the two variables; however, as one increases, the other decreases.
3. A correlation of zero could indicate no linear relationship between the two variables, or that the best straight line through the data on a scatterplot is exactly horizontal.
4. A positive correlation indicates that the variables increase together.
5. A negative correlation indicates that as one variable increases, the other decreases.
6. Correlations are unaffected if the units of measurement are changed. For example, the correlation between weight and height remains the same regardless of whether height is expressed in inches, feet or millimeters (as long as it isn’t rounded off).
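Several of these features can be checked numerically. A minimal sketch, using the standard formula for r (not given on the slides) with small illustrative lists `xs` and `sym` that are not lecture data, plus the erosion data converted from liters/second to liters/minute. The curved case also illustrates that r measures linear relationships only.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r (sum of z-score products divided by n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])  # feature 1: increasing line, r = +1
r_down = correlation(xs, [10 - x for x in xs])   # feature 2: decreasing line, r = -1

# A perfect curve (y = x**2 on symmetric x) still has r = 0: no *linear* trend
sym = [-2, -1, 0, 1, 2]
r_curve = correlation(sym, [x ** 2 for x in sym])

# Feature 6: changing units (liters/second to liters/minute) leaves r unchanged
flow = [0.31, 0.85, 1.26, 2.47, 3.75]
soil = [0.82, 1.95, 2.18, 3.01, 6.07]
r_sec = correlation(flow, soil)
r_min = correlation([60 * f for f in flow], soil)

# values are equal to +1, -1, 0 and r_sec up to floating-point rounding
print(round(r_up, 6), round(r_down, 6), round(r_curve, 6), round(r_sec, 6), round(r_min, 6))
```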