Correlation

What is correlation?
• Correlation is a measure of whether, and how strongly, pairs of variables are related.

Types of correlation
• Positive / negative
• Strong / weak

Correlation coefficient r
• The correlation coefficient r is a number that indicates how well the data fit a straight line, that is, the strength and direction of the linear relationship between two variables.
• It ranges between –1 and 1.
• The closer to –1, the stronger the negative linear relationship.
• The closer to 1, the stronger the positive linear relationship.
• The closer to 0, the weaker the linear relationship, whether positive or negative.
• 1 is a perfect positive correlation.
• 0 is no linear correlation (the values don't seem linked at all).
• –1 is a perfect negative correlation.

Correlation and causation
• Causation is a relationship of cause and effect: a change in one variable brings about a change in the other.
• Correlation is NOT causation.

Examples
• Temperature & ice-cream sales
• Ice-cream sales & shark attacks
• Runny nose & headache

How to calculate r
• Step 1: Find the mean of x and the mean of y.
• Step 2: Subtract the mean of x from every x value (call the results "a"); do the same for y (call the results "b").
• Step 3: Calculate a × b, a² and b² for every pair of values.
• Step 4: Sum up a × b, sum up a² and sum up b².
• Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)].
• Formula for r: r = Sxy / √(Sxx × Syy)
• Sxx is the sum of the squared differences between each xi and the mean of x, for all i from 1 to n.
• Syy is the sum of the squared differences between each yi and the mean of y, for all i from 1 to n.
• Sxy is the sum of the products of the difference between xi and the mean of x and the difference between yi and the mean of y, for all i from 1 to n.

Example
• A local milkshake shop keeps track of the number of milkshakes it sells and the temperature on each day. Below are the sales and temperature figures for the last 12 days. Comment on the relationship.

Solution

Question
• Find the correlation coefficient of the following data.
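
The five steps for calculating r can be sketched in Python as below. This is a minimal illustration, not the worked solution from the slides: the function name `pearson_r` and the sample values are hypothetical, since the original data tables are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r, following Steps 1 to 5."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    n = len(xs)
    # Step 1: the mean of x and the mean of y
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: differences from the mean ("a" for x, "b" for y)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]

    # Steps 3 and 4: sum of a*b (Sxy), sum of a^2 (Sxx), sum of b^2 (Syy)
    s_xy = sum(ai * bi for ai, bi in zip(a, b))
    s_xx = sum(ai * ai for ai in a)
    s_yy = sum(bi * bi for bi in b)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y never varies")

    # Step 5: r = Sxy / sqrt(Sxx * Syy)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical sample values, NOT the figures from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 3))  # 0.775
```

Here Sxy = 6, Sxx = 10 and Syy = 6, so r = 6 / √60 ≈ 0.775, a fairly strong positive correlation.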
Linear Regression
• In regression, one variable is treated as the independent (predictor) variable X and the other as the dependent (outcome) variable Y.
• In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable Y and one or more explanatory (independent) variables, denoted X.
• The case of one explanatory variable is called simple linear regression.
• Remember this? Y = mX + B, where m is the slope and B is the Y-intercept.

What is a slope?
• A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Prediction
• If you know the value of X, you can use the model to estimate the value of Y.
• Extrapolation: attempting to use a regression equation to predict values outside of the observed range.

How to calculate linear regression

Exercise
• Find the linear regression of the following data.
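
The calculation method from the slides is not reproduced in this text, so the sketch below assumes the standard least-squares fit for simple linear regression: slope m = Sxy / Sxx and intercept B = (mean of y) − m × (mean of x). The function names `fit_line` and `predict` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept B for Y = mX + B (assumed method)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sxy and Sxx, defined as in the correlation section
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)

    if s_xx == 0:
        raise ValueError("slope is undefined when x never varies")

    m = s_xy / s_xx          # slope
    b = mean_y - m * mean_x  # intercept
    return m, b


def predict(m, b, x):
    """Estimate Y for a given X using the fitted line."""
    return m * x + b


# Hypothetical sample values, NOT the exercise data from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(round(m, 2), round(b, 2))  # 0.6 2.2

# x = 3 lies inside the observed range (1 to 5), so this is interpolation.
print(round(predict(m, b, 3), 2))  # 4.0

# x = 10 lies outside the observed range: this is extrapolation and the
# estimate should be treated with caution.
is_extrapolation = not (min(xs) <= 10 <= max(xs))
print(is_extrapolation)  # True
```

With these values the fitted line is Y = 0.6X + 2.2, so every 1-unit change in X yields a 0.6-unit change in Y.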