Looking at Data

Looking at Data-Relationships
2.1 – Scatter plots
Definitions
• Scatter plot: shows the relationship between two quantitative variables
measured on the same individuals
• Explanatory variable: a variable that may explain or even cause changes in
another variable
• Response variable: a variable that changes in response to the explanatory variable
• Scatter plot axes: x-axis (explanatory variable), y-axis (response variable)
• Examining a scatter plot
– Look for the overall pattern (linear, nonlinear, quadratic, etc.) and for deviations from it
– Describe the overall pattern by the form (line, parabola), direction (positive, negative),
and strength (strong, weak) of the relationship
– An important kind of deviation is an outlier
– Positive association: high values of the two variables tend to occur together
– Negative association: high values of one variable tend to occur with low values of the
other variable
– Strength: determined by how closely the points in the scatter plot follow a simple form
such as a line
• Prep work
– Do problem 2.7 in the textbook. Store the second-test scores in list L3 and the
final-exam scores in list L4
– Do problem 2.11
2.2 – Correlation
Definitions
• Correlation r: measures the direction and strength of the linear (straight-line)
association between two quantitative variables x and y
• You can calculate a correlation for any scatter plot, but r measures only linear
relationships
• r > 0: positive association
• r < 0: negative association
• r is always between -1 and 1, endpoints included
• Perfect correlation, r = +1 or r = -1, occurs only when the points lie exactly on a straight
line
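• Formula for the correlation coefficient between x and y: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$, where $\bar{x}$ and $s_x$ are the mean and standard deviation of x, $\bar{y}$ and $s_y$ are the mean and standard deviation of y, and n is the number of individuals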
• Correlation ignores the distinction between explanatory and response variables
• Correlation is not resistant: outliers can greatly change the value of r
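The properties of r listed above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the data values are made up for illustration, and the helper name `correlation` is hypothetical. It computes r as the sum of products of standardized values divided by n - 1.

```python
from statistics import mean, stdev  # stdev is the sample standard deviation (divides by n - 1)

def correlation(x, y):
    """r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)                                # moderately strong positive association
r_swapped = correlation(y, x)                        # same value: r ignores which variable is explanatory
r_line = correlation(x, [3 + 2 * xi for xi in x])    # points exactly on a rising line
r_outlier = correlation(x + [6], y + [-10])          # the same data plus one outlier

print(round(r, 4), round(r_swapped, 4), round(r_line, 4), round(r_outlier, 4))
```

With these made-up numbers, a single added point is enough to turn a positive correlation into a negative one, which is what "not resistant" means in practice.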
• Prep work: Do problem 2.29. Store price in list L5 and
deforestation in list L6
2.3 – Least-Squares Regression
Definitions
• Regression linea straight line that describes the relationship between x & y
Requires an explanatory variable & a response variable
• Fitting a line to datay  b0  b1x
b0  y  int ercept
b1  slope(the amount by which y changes when x increases by one unit
• Extrapolation- Use of a regression line for prediction far outside
the range of values of the explanatory variable x used to obtain
the line
Definitions
• Least-squares regression line of y on x: the line that makes the sum of the squares of the vertical distances
of the data points from the line as small as possible
– Requires an explanatory variable and a response variable
• Equation of the least-squares regression line: $\hat{y} = b_0 + b_1 x$
– $b_0$ = y-intercept = $\bar{y} - b_1 \bar{x}$
– $b_1$ = slope = $r \, s_y / s_x$
• $r^2$ in regression: the fraction of the variation in the values of y that is
explained by the least-squares regression of y on x
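As a rough illustration of these formulas, the sketch below computes the slope as r times s_y / s_x, the intercept as y-bar minus slope times x-bar, and r squared. It is not from the original slides: the data are made up, and the function names `correlation` and `least_squares` are hypothetical.

```python
from statistics import mean, stdev  # stdev is the sample standard deviation

def correlation(x, y):
    """Correlation r from standardized values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

def least_squares(x, y):
    """Return (b0, b1, r_squared) for the least-squares line of y on x."""
    r = correlation(x, y)
    b1 = r * stdev(y) / stdev(x)      # slope = r * s_y / s_x
    b0 = mean(y) - b1 * mean(x)       # intercept = y-bar - b1 * x-bar
    return b0, b1, r ** 2

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1, r_squared = least_squares(x, y)
y_hat_at_mean = b0 + b1 * mean(x)     # the line always passes through (x-bar, y-bar)

print(round(b0, 4), round(b1, 4), round(r_squared, 4), round(y_hat_at_mean, 4))
```

Predicting at an x value inside the observed range (1 to 5 here) is ordinary use of the line; plugging in something like x = 50 would be extrapolation.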
2.4 – Cautions about Correlation and Regression
Definitions
• Residuals
– The difference between an observed value of the response variable and the
value predicted by the regression line
– Require an explanatory variable and a response variable
• Residual equation: $\text{residual} = \text{observed } y - \text{predicted } y = y - \hat{y}$
• Special property: the mean of the least-squares residuals is always
zero
• Residual plot: a scatter plot of the regression residuals against the
explanatory variable. Residual plots help us assess the fit of a regression line
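The residual definition and the zero-mean property can be verified with a short Python sketch. It is an illustration under stated assumptions: the data are made up, `fit_line` is a hypothetical helper, and the slope is computed from sums of deviations, which is algebraically the same as r times s_y / s_x.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx                  # equivalent to r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]   # observed y - predicted y

# The (x, residual) pairs are the points that would be drawn in a residual plot
residual_plot_points = list(zip(x, residuals))

print([round(e, 4) for e in residuals])
print(round(mean(residuals), 10))     # zero up to floating-point rounding
```

A residual plot with no visible pattern suggests the line fits reasonably well; a curved pattern suggests the relationship is not linear.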
Definitions
• Outlier: an observation that lies outside the overall pattern of the other observations
• Influential observation: an observation whose removal would markedly change the
result of the calculation
• Lurking variable: a variable that is not among the explanatory or
response variables but may nevertheless influence the interpretation of
relationships among those variables
• Association does not imply causation
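One way to check whether an observation is influential is to fit the regression line with and without it and compare the results. The Python sketch below illustrates the idea with made-up data; the helper `fit_line` and the choice of the last point as the suspect observation are assumptions for this example only.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical data: the last point lies far out in the x direction
x = [1, 2, 3, 4, 5, 15]
y = [2, 4, 5, 4, 5, 3]

b0_all, b1_all = fit_line(x, y)                     # fit using every observation
b0_without, b1_without = fit_line(x[:-1], y[:-1])   # refit after dropping the last point

print(round(b1_all, 4), round(b1_without, 4))
```

Here removing one point changes the slope from slightly negative to clearly positive, so that point is influential. Points that are extreme in x tend to have this kind of pull on the line.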
Prep work: Brain activity vs. empathy score example
Will women who are higher in empathy respond more strongly when their partner has
a painful experience?
1) Store the empathy scores in list L1 and brain activity in list L2
2) Use the TI-84 to find the equation of the least-squares regression line of brain activity
on empathy score (use 4 decimals for coefficients)
3) Use the equation to predict the brain activity for subject 1
4) Find the residual for subject 1
5) Subject 16 can be considered a possible outlier. Find the equation of the least-squares
regression line of brain activity on empathy score without this outlier
2.6 – The Question of Causation
Definitions
• Some observed associations between two variables are
due to a cause-and-effect relationship between these
variables, but others are explained by lurking variables
• The effect of lurking variables can operate through
common response: changes in both the explanatory and
response variables are caused by changes in lurking
variables
• Confounding of two variables (either explanatory or lurking
variables) means that we cannot distinguish their effects
on the response variable
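Common response can be illustrated with a small simulation. In the Python sketch below, which is not from the original slides, a lurking variable z drives both x and y, and x has no direct effect on y. All names, the sample size, the noise level, and the random seed are arbitrary assumptions for the example.

```python
import random
from statistics import mean, stdev

def correlation(x, y):
    """Correlation r from standardized values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

random.seed(0)   # arbitrary seed so the simulation is repeatable
n = 500

# Simulated lurking variable z; x and y each depend on z plus independent noise
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.3) for zi in z]
y = [zi + random.gauss(0, 0.3) for zi in z]

r_xy = correlation(x, y)   # strong association, although x does not cause y

# Remove the part of x and y that comes from z (possible here only because
# the simulation tells us exactly how z enters); what remains is unrelated noise
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print(round(r_xy, 3), round(r_rest, 3))
```

The strong correlation between x and y disappears once the lurking variable is accounted for. With real data the lurking variable is usually unmeasured, which is why association alone cannot establish causation.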