Simple Linear Regression

Deterministic Relationship
- The value of y (the dependent variable) is completely determined by the value of x (the independent variable), as in an equation of the form y = 2x + 10 or f(x) = 5x - 1.
- However, in most situations, the variables of interest are not deterministically related! For example, the value of y = 1st year college GPA is certainly not determined solely by x = high school GPA.

Probabilistic Model
- A description of the relation between 2 variables x and y that are not deterministically related.
- The general form allows y to be larger or smaller than f(x) by a random amount, e:

  y = deterministic function of x + random deviation
  $y = f(x) + e$

Let x* denote the value of x….

$$
\begin{cases}
y > f(x^*) & \text{if } e > 0 \\
y < f(x^*) & \text{if } e < 0 \\
y = f(x^*) & \text{if } e = 0
\end{cases}
$$

- Without the random deviation e, all observed (x, y) points would fall exactly on the population regression line. The inclusion of e in the model equation recognizes that points will deviate from the line.

Simple Linear Regression Model: $y = \alpha + \beta x + e$

Assumptions about the distribution of e
- Mean: $\mu_e = 0$
- St. Dev. $\sigma$ is the same for any value of x.
- Distribution of e at any x value is normal.
- Random deviations $e_1, e_2, e_3, \ldots, e_n$ associated with different observations are independent of one another.

Slope
Population Regression Line: $y = \alpha + \beta x$ (the model equation $y = \alpha + \beta x + e$ without the random deviation)
- The slope $\beta$ is the average change in y associated with a 1 unit increase in x.
- Its point estimate is the sample slope b. (Population value is $\beta$.)
- The y-intercept's point estimate is a. (Population value is $\alpha$.)

Summary
- $b$ = point estimate of $\beta$ = $\dfrac{S_{xy}}{S_{xx}}$
- $a$ = point estimate of $\alpha$ = $\bar{y} - b\bar{x}$

where

$$
S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}
\qquad \text{and} \qquad
S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}
$$

The estimated regression line is then just the least-squares line $\hat{y} = a + bx$.

x* denotes a specified value of the predictor variable x …. So $a + bx^*$ has 2 different interpretations:
- It is a point estimate of the true mean y value when x = x*.
- It is a point predictor of an individual y value that would be observed when x = x*.

Find the point estimate of the mean y-value for the following:
x = mother's age
y = birth weight

Age (x):      15    17    18    15    16    19    17    16    18    20
Weight (y):   2289  3393  3271  2648  2897  3327  2970  2535  3138  3573

So what's the point estimate for an 18-year-old mom?

Point estimate and point prediction are identical; only the interpretation is different.
- Prediction: weight of a single baby whose mom is 18
- Estimate: average weight of all babies born to 18-year-olds

Answer the following:
- Explain the slope in context of the problem.
- Explain the y-intercept in context of the problem.

Find SSResid.

$$
\text{SSResid} = \sum \left(y - \hat{y}\right)^2
$$

- On the calculator: every time you calculate a linear regression, it calculates the residuals. Put them in list 3, square them, and add the list.

The point estimate of $\sigma$ is $s_e$:

$$
s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}
$$

- It represents the typical deviation in the y-variable from the least-squares line.

Find the residual for a mother who is 19.
Find the probability that a 19-year-old mother has a baby that is more than 3000 g.

Coefficient of determination ($r^2$)

$$
\text{SSTot} = \sum \left(y - \bar{y}\right)^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n}
\qquad
r^2 = 1 - \frac{\text{SSResid}}{\text{SSTot}}
$$

- It's the proportion of variation in the y-variable that can be explained by the least-squares line.

Homework
- Worksheet
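As a way to check the hand or calculator work, the formulas above can be sketched in plain Python using the mother's age and birth weight data from the table. The names `fit_line`, `predict`, and the variables below are illustrative choices, not something given in the slides. The probability step treats the estimates a + b(19) and s_e as if they were the true mean and standard deviation of y at x = 19, which is an assumption made for this sketch and so gives only an approximate answer.

```python
from math import sqrt
from statistics import NormalDist

# Data copied from the table: mother's age (years) and birth weight (g).
ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]


def fit_line(x, y):
    """Return (a, b, s_xx, s_xy) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    b = s_xy / s_xx                    # point estimate of beta
    a = sum_y / n - b * (sum_x / n)    # point estimate of alpha
    return a, b, s_xx, s_xy


a, b, s_xx, s_xy = fit_line(ages, weights)
n = len(ages)


def predict(x_star):
    """Point estimate / point prediction a + b*x_star."""
    return a + b * x_star


# Residuals (observed minus predicted), SSResid, and s_e.
residuals = [yi - predict(xi) for xi, yi in zip(ages, weights)]
ss_resid = sum(r ** 2 for r in residuals)
s_e = sqrt(ss_resid / (n - 2))

# SSTot via the computational formula, then r^2.
ss_tot = sum(yi ** 2 for yi in weights) - sum(weights) ** 2 / n
r_squared = 1 - ss_resid / ss_tot

# Point estimate for an 18-year-old mother.
y_hat_18 = predict(18)

# Residual for the one observed 19-year-old mother in the data.
resid_19 = weights[ages.index(19)] - predict(19)

# Approximate P(y > 3000 g) at x = 19, assuming y ~ Normal(a + 19b, s_e).
p_over_3000 = 1 - NormalDist(mu=predict(19), sigma=s_e).cdf(3000)

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"estimate at x = 18: {y_hat_18:.1f} g")
print(f"SSResid = {ss_resid:.1f}, s_e = {s_e:.1f}, r^2 = {r_squared:.3f}")
print(f"residual at x = 19: {resid_19:.1f} g")
print(f"approx. P(y > 3000 | x = 19) = {p_over_3000:.3f}")
```

The same `predict(18)` value serves as both the point estimate and the point prediction; only the interpretation differs. The residuals list plays the role of list 3 on the calculator.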