Review 2 - Jan.ucc.nau.edu

Review 2
Chapter 5 Summarizing Bivariate Data
1. Scatter plots
A scatter plot is a picture of bivariate numerical data in which each observation (x, y) is
represented as a point on a rectangular coordinate system. It can reveal the relationship
between x and y.
2. Pearson’s sample correlation coefficient and its properties

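Pearson’s sample correlation coefficient
$$ r = \frac{\sum z_x z_y}{n-1} = \frac{1}{n-1}\sum \left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right) = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\sum x^2 - (\sum x)^2/n}\,\sqrt{\sum y^2 - (\sum y)^2/n}}. $$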
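As a quick check of the formula, the sketch below computes r in Python two ways: from the z-scores and from the computational shortcut. The function names and the five data points are made up for illustration and are not from the notes.

```python
import math

def pearson_r(xs, ys):
    """r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

def pearson_r_shortcut(xs, ys):
    """r from the sums of x, y, xy, x^2 and y^2 only."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = (math.sqrt(sum_xx - sum_x ** 2 / n)
                   * math.sqrt(sum_yy - sum_y ** 2 / n))
    return numerator / denominator

# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r_z = pearson_r(xs, ys)
r_short = pearson_r_shortcut(xs, ys)
print(round(r_z, 4), round(r_short, 4))
```

Both forms should agree up to rounding error; the shortcut is the one usually used for hand calculation.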
Properties of r
(1) The value of r does not depend on the unit of measurement for either variable.
(2) The value of r does not depend on which of the two variables is labeled x.
(3) -1 ≤ r ≤ 1. An r near 1 indicates a substantial positive linear relationship, whereas
an r close to -1 suggests a prominent negative linear relationship. The strength of
linear relationship based on r can be summarized as follows.
    r from -1 to -0.8: Strong
    r from -0.8 to -0.5: Moderate
    r from -0.5 to 0.5: Weak
    r from 0.5 to 0.8: Moderate
    r from 0.8 to 1: Strong
Figure: The strength of linear relationship based on r
(4) r = 1 only when all the points in a scatter plot lie exactly on a straight line that
slopes upward. r = -1 only when all the points lie exactly on a downward-sloping
line.
(5) The value of r is a measure of the strength of linear relationship between x and y.
A value of r close to zero does not rule out any other strong relationship between
x and y.
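The properties of r can be checked numerically. The following is a minimal Python sketch; the helper `corr` and all data values are hypothetical and chosen only to illustrate the properties.

```python
import math

def corr(xs, ys):
    """Pearson's r from deviations about the means (the n - 1 factors cancel)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_original = corr(x, y)
# Unit change: x from inches to centimetres, y from Celsius to Fahrenheit.
r_rescaled = corr([2.54 * v for v in x], [1.8 * v + 32 for v in y])
# Swap which variable is labeled x.
r_swapped = corr(y, x)
# Points exactly on an upward-sloping line.
r_line = corr(x, [3 + 2 * v for v in x])
# y = x^2 on symmetric x values: a perfect curved relationship.
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_curve = corr(x_sym, [v ** 2 for v in x_sym])

print(r_original, r_rescaled, r_swapped, r_line, r_curve)
```

In the last case y is completely determined by x, yet r is zero, because r only measures the linear part of a relationship.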
3. The least squares line
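The least squares line (or sample regression line) is the line $\hat{y} = a + bx$ with
$$ b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\bar{x}, $$
which gives the best fit to the data in the sense that it minimizes the sum of squared
vertical deviations of the points from the line.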
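The slope and intercept can be computed directly from the definition. This is a small Python sketch with made-up data; the function name `least_squares` is an assumption for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = least_squares(xs, ys)
print(a, b)
```

For these points the sums are small enough to verify by hand: the sum of (x - x̄)(y - ȳ) is 6, the sum of (x - x̄)² is 10, so b = 0.6 and a = 4 - 0.6(3) = 2.2.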

Properties of the least squares line
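1. The least squares line passes through $(\bar{x}, \bar{y})$.
2. b and r have the same sign since $b = r\,\frac{s_y}{s_x}$ and both $s_x$ and $s_y$ are positive.
3. The least squares line can be rewritten as $\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}(x - \bar{x})$. When $x = \bar{x} + k s_x$,
$\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}(\bar{x} + k s_x - \bar{x}) = \bar{y} + r k s_y$.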
Chapter 6 Probability
4. Important concepts
The probability of an outcome, denoted by P(outcome), is interpreted as the long-run
relative frequency of the outcome when the experiment is performed repeatedly under
identical conditions.
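The long-run interpretation can be illustrated by simulation. The sketch below assumes a fair six-sided die and uses Python's pseudo-random generator; the function name, seed, and trial counts are arbitrary choices for illustration.

```python
import random

def relative_frequency(n_trials, seed=1):
    """Fraction of simulated fair-die rolls that show a six."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return sixes / n_trials

# The relative frequency should settle near 1/6 as the number of rolls grows.
for n_trials in (100, 10_000, 100_000):
    print(n_trials, relative_frequency(n_trials))
```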
Independent outcomes: Two outcomes are said to be independent if the probability that
one outcome occurs is not affected by knowledge of whether the other has occurred.
More than two outcomes are said to be independent if knowledge that some of the
outcomes have occurred does not change the probabilities that any of the other outcomes
occur.
Dependent outcomes: If the occurrence of one outcome changes the probability that the
other outcome occurs, the outcomes are dependent.
5. Basic properties of probability
1) 0 ≤ P(any outcome) ≤ 1.
2) (Addition rule) If two outcomes A1, A2 cannot occur simultaneously, then
P(A1 or A2) = P(A1)+P(A2).
More generally, if no two of the outcomes A1, A2, ..., Ak can occur simultaneously,
then
P(A1 or A2 or ... or Ak) = P(A1)+P(A2)+...+P(Ak)
3) (Complement rule) The probability that an outcome A will not occur is equal to 1
minus the probability that the outcome will occur, that is,
P(not A) = 1 – P(A)
4) (Multiplication rule) If two outcomes, A1 and A2, are independent, the probability that
both outcomes occur is the product of the individual outcome probabilities, that is,
P(A1 and A2 ) = P(A1)P(A2).
More generally, if k outcomes, A1, ..., Ak, are mutually independent, then
P(A1 and A2 and ... and Ak) = P(A1)P(A2)...P(Ak)
Chapter 7 Population Distribution
6. Basic concepts
A population distribution is the distribution of all the values of a numerical variable or
categories of a categorical variable.
A population distribution provides important information about the population. The
population distribution for a categorical variable or a discrete numerical variable can be
summarized by a relative frequency histogram or a relative frequency distribution,
whereas a density histogram is used to summarize the distribution of a continuous
numerical variable. Further, we represent a population distribution for a continuous
variable by using a simple smooth curve. Such a curve is called a continuous probability
distribution (or a density curve).
7. Properties of continuous probability distributions
(1) The total area under the curve is equal to 1.
(2) The area under the curve and above any particular interval is interpreted as the
probability of observing a value in the corresponding interval when an individual or
object is selected at random from the population.
(3) For continuous numerical variables and any particular numbers a, b and c,
P(x = c) = 0
P(x ≤ a) = P(x < a)
P(x ≥ a) = P(x > a)
P(a < x < b) = P(a ≤ x ≤ b).
8. Important discrete distributions
(i) Bernoulli distribution
x        Probability (proportion)
1        $\pi$
0        $1 - \pi$
Mean: $\mu = \pi$
Variance: $\sigma^2 = \pi(1-\pi)$
Standard deviation: $\sigma = \sqrt{\pi(1-\pi)}$
(ii) Binomial distribution
$$ P(X = x) = \binom{n}{x}\,\pi^x (1-\pi)^{n-x}, \quad x = 0, 1, \ldots, n, $$
where $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$, $m!$ (read m factorial) $= m(m-1)\cdots 2 \cdot 1$ and $0! = 1$.
Mean: $\mu = n\pi$
Variance: $\sigma^2 = n\pi(1-\pi)$
Standard deviation: $\sigma = \sqrt{n\pi(1-\pi)}$
9. Important continuous distributions
1) Uniform distributions
A continuous distribution is called the uniform distribution on [a, b], if its density curve
is determined by
$$ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases} $$
Mean: $\mu = (a+b)/2$
Variance: $\sigma^2 = (b-a)^2/12$
Standard deviation: $\sigma = (b-a)/\sqrt{12}$
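For a uniform distribution, every probability is the area of a rectangle under the density curve. The sketch below is a minimal Python illustration; the function names and the interval [2, 10] are assumptions chosen for the example.

```python
import math

def uniform_prob(lo, hi, a, b):
    """P(lo < x < hi) for the uniform distribution on [a, b]."""
    left = max(lo, a)                 # clip the interval to [a, b]
    right = min(hi, b)
    width = max(0.0, right - left)    # zero if the intervals do not overlap
    return width / (b - a)            # rectangle area: width times height 1/(b - a)

def uniform_summary(a, b):
    """Return (mean, variance, standard deviation) of the uniform distribution on [a, b]."""
    mean = (a + b) / 2
    variance = (b - a) ** 2 / 12
    return mean, variance, math.sqrt(variance)

a, b = 2.0, 10.0
p_interval = uniform_prob(3.0, 5.0, a, b)   # 2 / 8
p_point = uniform_prob(4.0, 4.0, a, b)      # a single value has zero width
p_total = uniform_prob(a, b, a, b)          # total area under the curve
mean, variance, sd = uniform_summary(a, b)
print(p_interval, p_point, p_total, mean, variance, sd)
```

The zero probability for a single point is why P(x ≤ a) and P(x < a) agree for continuous variables.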
2) Normal distributions
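The density curve of a normal distribution with mean $\mu$ and standard deviation $\sigma$ is
determined by
$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty, $$
where $\pi$ here is the constant 3.14159....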
Find probabilities
a) For the standard normal distribution, we can find various probabilities from Appendix
Table 2 on pages 706-707.
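b) If x has a normal distribution with mean $\mu$ and standard deviation $\sigma$, we can find
probabilities related to x by the following equalities, where $z = (x-\mu)/\sigma$ has the standard normal distribution.
$$ P(x < b) = P\left(\frac{x-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(z < \frac{b-\mu}{\sigma}\right) $$
$$ P(a < x) = P\left(\frac{a-\mu}{\sigma} < \frac{x-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < z\right) $$
$$ P(a < x < b) = P\left(\frac{a-\mu}{\sigma} < \frac{x-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < z < \frac{b-\mu}{\sigma}\right) $$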
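The standardization step can be sketched in Python, with the error function from the standard library playing the role of the z table. The function names and the example values (mean 100, standard deviation 15) are illustrative assumptions.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob_below(b, mu, sigma):
    """P(x < b): standardize b, then look up the z curve area."""
    return phi((b - mu) / sigma)

def normal_prob_above(a, mu, sigma):
    """P(a < x): one minus the area to the left of the standardized a."""
    return 1.0 - phi((a - mu) / sigma)

def normal_prob_between(a, b, mu, sigma):
    """P(a < x < b): difference of two left-tail areas."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

mu, sigma = 100.0, 15.0
p_below = normal_prob_below(130.0, mu, sigma)            # P(z < 2)
p_above = normal_prob_above(130.0, mu, sigma)            # P(2 < z)
p_between = normal_prob_between(85.0, 115.0, mu, sigma)  # P(-1 < z < 1)
print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
```

The results should match table lookups to the number of decimal places the table provides.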
Identify extreme values
i) For the standard normal distribution, we can identify the three types of extreme
values by Appendix Table 2.
ii) For a normal distribution with mean $\mu$ and standard deviation $\sigma$, we first solve the
corresponding problem for the standard normal distribution to find z* and then
translate our answer into one for the normal distribution of interest by
$x^* = \mu + z^*\sigma$.
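The two-step procedure can be sketched with Python's `statistics.NormalDist`, whose `inv_cdf` method stands in for a reverse table lookup. The 5% cutoff and the values mean 100 and standard deviation 15 are assumptions chosen for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1
mu, sigma = 100.0, 15.0

# Step 1: solve the problem on the z scale.
z_largest = standard_normal.inv_cdf(0.95)      # cutoff for the largest 5%
z_smallest = standard_normal.inv_cdf(0.05)     # cutoff for the smallest 5%
z_two_sided = standard_normal.inv_cdf(0.975)   # +/- cutoff for the most extreme 5%

# Step 2: translate back with x* = mu + z* * sigma.
x_largest = mu + z_largest * sigma
x_smallest = mu + z_smallest * sigma
x_extreme_low = mu - z_two_sided * sigma
x_extreme_high = mu + z_two_sided * sigma

print(round(z_largest, 3), round(x_largest, 2))
print(round(z_smallest, 3), round(x_smallest, 2))
print(round(z_two_sided, 3), round(x_extreme_low, 2), round(x_extreme_high, 2))
```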
3) t distributions
A continuous distribution is called the t distribution with d degrees of freedom, if its
density curve is determined by
$$ f(x) = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)}\,\frac{1}{\sqrt{d\pi}}\,\frac{1}{(1 + x^2/d)^{(d+1)/2}}, \quad -\infty < x < \infty. $$
Mean: $\mu = 0$, d > 1.
Variance: $\sigma^2 = \frac{d}{d-2}$, d > 2.
Standard deviation: $\sigma = \sqrt{\frac{d}{d-2}}$, d > 2.

Important properties of t distributions
1. The t curve corresponding to any fixed number of degrees of freedom is
continuous, bell-shaped, symmetric, and centered at zero (just like the standard
normal (z) curve).
2. Each t curve is more spread out than the z curve.
3. As the number of degrees of freedom increases, the spread of the corresponding t
curve decreases.
4. As the number of degrees of freedom increases, the corresponding sequence of t
curves approaches the z curve.
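These properties can be seen by evaluating the densities directly. The following Python sketch uses the standard gamma-function form of the t density; the function names and the particular degrees of freedom are illustrative choices.

```python
import math

def t_density(x, d):
    """Density of the t distribution with d degrees of freedom at x."""
    # Use log-gamma so that large d does not overflow math.gamma.
    log_ratio = math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
    constant = math.exp(log_ratio) / math.sqrt(d * math.pi)
    return constant * (1 + x * x / d) ** (-(d + 1) / 2)

def z_density(x):
    """Density of the standard normal (z) curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Symmetry about zero.
print(t_density(1.5, 7), t_density(-1.5, 7))

# Far from the center, t curves sit above the z curve (more spread out),
# and the gap shrinks as the degrees of freedom increase.
for d in (5, 30, 1000):
    print(d, t_density(3.0, d), z_density(3.0))
```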

Find probabilities related to a t distribution by Appendix Table 4 on pages 709-711.
10. Important examples in the notes
Examples: 5.1, 5.3, 6.6, 6.7, 7.2, 7.6, 7.7, 7.8, 7.9.