Review

• z-score: the z-score for an observation is the number of
standard deviations that it falls from the mean.
\[ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} \]
• An observation in a bell-shaped distribution is regarded as a
potential outlier if it falls more than three standard deviations
from the mean.
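As a small illustration of the z-score and the three-standard-deviation rule, here is a minimal Python sketch. The function names `z_scores` and `potential_outliers` and the sample data are hypothetical, and the sketch assumes the sample standard deviation (denominator n − 1).

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (observation - mean) / standard deviation for each observation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


def potential_outliers(values, threshold=3.0):
    """Return observations that fall more than `threshold` standard deviations from the mean."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical data: twenty observations of 10 and one observation of 100.
data = [10] * 20 + [100]
print(potential_outliers(data))
```

The rule is meant for bell-shaped distributions, so a flagged value is only a candidate for closer inspection.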
• association: an association exists between two variables if a
particular value for one variable is more likely to occur with
certain values of the other variable.
• response variable: the outcome variable on which
comparisons are made.
• explanatory variable: the variable that defines the groups to be
compared with respect to values on the response variable.
• contingency table:
a display for two categorical variables.
Its rows list the categories of one variable and its columns list
the categories of the other variable.
Each entry in the table (called a cell) is the number of
observations in the sample with certain outcomes on the two
variables.
• conditional proportions & marginal proportions
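The following Python sketch builds a contingency table from raw observations and computes proportions from it. It assumes the usual definitions: a conditional proportion is a cell count divided by its row total, and a marginal proportion is a column total divided by the overall sample size. The function names, the groups "A" and "B", and the counts are all hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Count the observations in each (row category, column category) cell."""
    return Counter(pairs)


def conditional_proportions(table, row):
    """Proportions of each column category among the observations in one row."""
    row_total = sum(c for (r, _), c in table.items() if r == row)
    return {col: c / row_total for (r, col), c in table.items() if r == row}


def marginal_proportions(table):
    """Proportions of each column category over the whole sample."""
    total = sum(table.values())
    col_counts = Counter()
    for (_, col), c in table.items():
        col_counts[col] += c
    return {col: c / total for col, c in col_counts.items()}


# Hypothetical sample: (explanatory category, response category) per observation.
sample = [("A", "yes")] * 3 + [("A", "no")] * 1 + [("B", "yes")] * 2 + [("B", "no")] * 4
table = contingency_table(sample)
print(conditional_proportions(table, "A"))
print(conditional_proportions(table, "B"))
print(marginal_proportions(table))
```

If the conditional proportions differ noticeably from one row to the next, that suggests an association between the two variables in the sample.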
• positive association & negative association
Two quantitative variables (say x and y) are said to have a
positive association when high values of x tend to pair with
high values of y, and low values of x tend to occur with low
values of y.
They are said to have a negative association when high
values of one variable tend to pair with low values of the other
variable, and low values of one pair with high values of the
other.
• correlation
The correlation summarizes the direction of the association
between two quantitative variables and the strength of its
straight-line trend. Denoted by r, it takes values between −1
and +1.
Interpretation of correlation:
• a positive value for r indicates a positive association and a
negative value for r indicates a negative association.
• the closer r is to ±1, the closer the data points fall to a
straight line, and the stronger is the linear association. The
closer r is to 0, the weaker is the linear association.
Calculating the Correlation r:
To obtain r, we first calculate the z-scores of the x value and the
y value of each observation, and then find a typical value (a kind
of average) of the products of the z-scores.
\[ r = \frac{1}{n-1} \sum z_x z_y = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) \]
where n is the number of points, x̄ and ȳ are the means, and s_x and
s_y are the standard deviations of x and y. The sum is taken over all
n observations.
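The calculation can be sketched in Python as follows. This is a minimal illustration: the function name `correlation` and the data are hypothetical, and it uses sample standard deviations so that the n − 1 divisor matches the formula.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sum of the products of paired z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical data: y rises with x, but not perfectly along a line.
xs = [1, 2, 3]
ys = [1, 3, 2]
print(correlation(xs, ys))
```

When the points lie exactly on an upward-sloping line the function returns a value of about +1, and on a downward-sloping line about −1 (up to floating-point rounding).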