Construction Engineering 221
Probability and Statistics

Problem 10
• Solution to problem 10 on page 82
– First, recognize that it is a binomial (counting) problem
– Second, recognize that the binomial calculation will be problematic because n is large (n = 100)
– Third, use the normal probability distribution as an approximation of the binomial distribution

Problem 10
• Normal probability approximation:
– µ = nπ, or µ = 100 × .3 = 30
– sd = √(nπ(1−π)), or √(100 × .3 × .7) = 4.58
– P(40): z = (40 − 30)/4.58 = 2.18
– A(x) at z = 2.18 is .48537
– Probability of 40 or more hits (the upper tail area) is 1 − (.5 + .48537)
– P(40) = .0146, or 1.46%
– I believe the book's answer is incorrect

Linear Regression
• Sometimes we need to make predictions about the likelihood of an event (flood, traffic accident, inflation, disease, etc.)
• We can use statistics to sort variance into recognizable patterns to help us interpret what is "random" variance and what is "sample" variance.
• Random variance is distributed throughout the population at random. Sample variance is created by membership in a sample (for example, people who smoke and get lung cancer).

Linear Regression
• Two variables in a sample can be correlated, with a correlation coefficient between -1 and +1. If a high score on one variable occurs frequently within the sample with a low score on the other, the correlation coefficient is negative. If a high score occurs frequently with a high score, the data are positively correlated.

Linear Regression
• What type of correlation would you expect between:
– IQ and salary?
– GPA and hours studying?
– GPA and hours drinking/partying?
– Price of tea in China and number of wins in a season by the Chicago Cubs?
– Socio-economic standing and crime rate?

Linear Regression
• Correlation coefficient:
$r = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}}$
• Alternate formula: eq. 9-2 on page 109
• Assumptions:
– relationship is linear
– both variables are random
– conditional variances are equal
– variables are bivariate normal

Linear Regression
• Example of correlation

Height (in)   Weight (lb)
65            185
67            200
69            215
62            140
71            220
77            250
75            245
79            235
70            220

Linear Regression
• Excel correlation output (Column 1 = height, Column 2 = weight)

            Column 1    Column 2
Column 1    1
Column 2    0.909022    1

Linear Regression
• Can be done with Excel spreadsheets
• Linear regression is a special form of correlation. It attempts to find the regression line, the line through the correlated data that best fits the data. The regression line can then be used to predict outcomes.
• Regression has the formula y = bx + a, where
– y is the dependent variable, x is the independent variable, b is the regression coefficient, and a is a constant

Linear Regression
• When one predictor (independent) variable is used, it is called a simple regression. When more than one predictor is used, it is called multiple regression.
• Restatement of regression formula in common terms:
– Expected value of the variable to be predicted = intercept + (slope × value of predictor variable), where the slope is the regression coefficient

Linear Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.909022
R Square             0.826322
Adjusted R Square    0.801511
Standard Error       15.15392
Observations         9

Linear Regression
ANOVA
              df    SS          MS          F           Significance F
Regression    1     7648.067    7648.067    33.30441    0.000684
Residual      7     1607.489    229.6413
Total         8     9255.556

                Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       -176.3          67.5124           -2.61137    0.034844    -335.941     -16.6582     -335.941       -16.6582
X Variable 1    5.506608        0.954187          5.770997    0.000684    3.250317     7.762899     3.250317       7.762899

• Formula is: weight = -176.3 + (5.51)(height)
• So if a new person joined the team and all we knew was that he was 6'-10" (82 inches), we would be able to guess his weight at w = -176.3 + (82)(5.51), or about 275 pounds
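
The height and weight fit can also be reproduced outside of Excel. The following is a minimal sketch in plain Python, assuming the nine height/weight pairs from the example are entered in the order shown; the function name `fit_simple_regression` and the variable names are illustrative and do not come from the course materials. It computes the slope b, the intercept a, and the correlation coefficient r from the sums of squared deviations, then predicts the weight of an 82-inch player.

```python
import math

# Height (inches) and weight (pounds) pairs from the example data.
heights = [65, 67, 69, 62, 71, 77, 75, 79, 70]
weights = [185, 200, 215, 140, 220, 250, 245, 235, 220]


def fit_simple_regression(x, y):
    """Return (slope b, intercept a, correlation r) for the line y = b*x + a.

    Hypothetical helper written for this illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                    # regression coefficient (slope)
    a = y_bar - b * x_bar            # constant (intercept)
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return b, a, r


b, a, r = fit_simple_regression(heights, weights)
predicted = a + b * 82  # a 6'-10" player is 82 inches tall

print(f"slope b     = {b:.4f}")
print(f"intercept a = {a:.2f}")
print(f"r           = {r:.4f}")
print(f"predicted weight at 82 in = {predicted:.0f} lb")
```

The slope, intercept, and r should agree with the Excel summary output to rounding. The prediction uses the unrounded slope, so it can differ slightly from a hand calculation that uses 5.51.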
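
Returning to Problem 10, the normal approximation to the binomial can be checked the same way. This is a sketch under stated assumptions: n = 100 and π = .3 as in the solution, no continuity correction (matching the hand calculation), and the upper tail beyond 40 hits as the quantity of interest. The helper `normal_cdf` is an illustrative name; it uses the standard identity Φ(z) = ½[1 + erf(z/√2)] in place of the printed table.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n = 100      # number of trials
p = 0.3      # probability of a hit on each trial (pi in the notes)
hits = 40    # threshold from the problem

mu = n * p                        # mean of the binomial
sd = math.sqrt(n * p * (1 - p))   # standard deviation (note the square root)
z = (hits - mu) / sd              # standardized score, no continuity correction
tail = 1.0 - normal_cdf(z)        # area in the upper tail beyond z

print(f"mu = {mu:.0f}, sd = {sd:.2f}, z = {z:.2f}")
print(f"upper-tail probability = {tail:.4f}")
```

The table lookup in the notes rounds z to 2.18 before reading the area, so the hand result of .0146 and the computed tail may differ in the last digit.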