Uploaded by Esha Jain

Topic 7 Correlation and regression

COR-STAT1202
Topic 7: Correlation and regression
Statistics for Business and Economics, Chapter 12
Correlation
• Correlation is concerned with the strength of the linear relationship between two variables
The relationship between two variables can be seen visually using a scatter diagram.
[Scatter diagram: Results against Age]
There is a fairly strong positive linear relationship between age and results (maybe ρ is about 0.7).
[Scatter diagram: Results against Age]
There is a fairly strong negative linear relationship between age and results (maybe ρ is about -0.8).
[Scatter diagram: Results against Age]
There is a fairly weak negative linear relationship between age and results (maybe ρ is about -0.3).
[Scatter diagram: Results against Age]
There is no relationship between age and results (r would be close to zero).
Can the correlation coefficient, as a summary statistic, replace an individual examination of the data?
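The four y variables have the same mean (7.5), variance (4.12, i.e. a standard deviation of about 2.03) and correlation with x (0.81).
However, as the plots show, the distributions of the variables are very different, so a summary statistic such as r cannot replace an examination of the data.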
The correlation coefficient between two variables in a population (also called the product moment correlation coefficient or the Pearson correlation coefficient) is denoted ρ.
It is usually estimated from sample data by the sample correlation coefficient, r.
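The sample correlation coefficient is usually defined as $r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$. The minimal Python sketch below computes r directly from that definition; the function name `pearson_r` is a hypothetical name chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r, computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squared deviations from the means
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a rising straight line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a falling straight line
```

Points lying exactly on a rising line give r = 1, and points on a falling line give r = -1.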
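Some points about correlation
• It is independent of the scale of measurement: multiplying x or y by a positive constant (for example, changing units) does not change r.
• It is independent of the origin of measurement: adding a constant to x or y does not change r.
• It is symmetric: the correlation between x and y is the same as between y and x.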
Example:
A test has been designed to examine a prospective salesman’s ability to sell. Some experienced salesmen sit the test and their scores are compared with their actual productivity. Calculate the correlation between test score and productivity.
Score, x (mark out of 50): 41, 34, 35, 40, 33, 42, 37, 42, 40, 43, 38, 38, 46, 36, 32, 43, 42, 30, 41, 45
Productivity, y (number sold): 32, 35, 20, 24, 27, 28, 31, 33, 26, 41, 29, 33, 36, 23, 22, 38, 26, 20, 30, 30
r ≈ 0.60
The correlation between test score and productivity is positive and moderately strong.
Making sense of correlations
Spurious Correlation
• “Spurious correlation” describes a situation in which measures of two or more variables are statistically related but are not in fact causally linked, usually because the statistical relation is caused by a third variable.
Think!!!
Studies have repeatedly shown, for example, that children with longer arms reason better than those with shorter arms.
Yes, there is a correlation between the two, but common sense tells us there is no CAUSAL relationship between them.
Children with longer arms reason better because they’re older!
[Diagram: Age of children is linked to both Long arms and Reasoning ability. The correlation between long arms and reasoning ability is very strong, but there is no causal relation here.]
So what have we learnt:
Correlation does not imply causation.
Causation does suggest correlation.
Rank Correlation
The rank correlation coefficient (also known as Spearman’s rank correlation coefficient) is another way to measure the strength of correlation between two variables:
$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
where $d_i$ are the differences in the ranks between $x_i$ and $y_i$, and $n$ is the number of pairs (this form assumes there are no tied ranks).
It looks at ranks, not the actual values of the variables. Therefore it is less affected by extreme observations in the sample.
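The formula translates directly into code. This is a minimal sketch assuming there are no tied values (the simple formula above does not handle ties); the helper names `ranks` and `spearman_rs` are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(values)
    if len(set(order)) != len(order):
        raise ValueError("tied values: this simple formula assumes no ties")
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y))]
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
print(spearman_rs(x, [1, 4, 9, 16, 100]))  # rising, but not along a straight line
print(spearman_rs(x, [50, 40, 30, 20, 10]))  # falling
```

Because only the order matters, any data that always rise together give r_s = 1, even when the relationship is not linear and the extreme value 100 would pull Pearson's r below 1.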
Illustration:
The following figures give examination and project results (in %) for eight students.
Find Spearman’s rank correlation coefficient for the data.
Student’s examination and project marks
Student     1   2   3   4   5   6   7   8
Exam       95  80  70  40  30  73  85  50
Project    65  60  55  50  40  80  75  70
Student’s examination and project marks
Student     1   2   3   4   5   6   7   8
Exam       95  80  70  40  30  73  85  50
Rank (E)    8   6   4   2   1   5   7   3
Project    65  60  55  50  40  80  75  70
Rank (P)    5   4   3   2   1   8   7   6
d           3   2   1   0   0  -3   0  -3
d²          9   4   1   0   0   9   0   9
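Finishing the calculation: summing the last row gives Σd² = 32, so $r_s = 1 - \frac{6 \times 32}{8(8^2 - 1)} = 1 - \frac{192}{504} \approx 0.62$. The short Python sketch below reproduces the ranks, differences and result from the marks in the table (the marks contain no ties, so the simple formula applies).

```python
exam = [95, 80, 70, 40, 30, 73, 85, 50]
project = [65, 60, 55, 50, 40, 80, 75, 70]


def ranks(values):
    # Rank 1 = lowest mark; the marks in this example have no ties
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


rank_e, rank_p = ranks(exam), ranks(project)
d = [re - rp for re, rp in zip(rank_e, rank_p)]
sum_d2 = sum(di ** 2 for di in d)
n = len(exam)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("Rank (E):", rank_e)   # [8, 6, 4, 2, 1, 5, 7, 3]
print("Rank (P):", rank_p)   # [5, 4, 3, 2, 1, 8, 7, 6]
print("d:", d)               # [3, 2, 1, 0, 0, -3, 0, -3]
print("sum of d^2:", sum_d2)  # 32
print("r_s =", round(r_s, 3))  # 0.619
```

So there is a fairly strong positive rank correlation between examination and project marks.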
Demonstration and Practice
• Use the datafile ‘satisfaction_retention’ for this exercise.
• What is the strength of the linear relationship between employee satisfaction levels and employee engagement?
• What is the strength of the linear relationship between employee engagement and customer satisfaction?
• What conclusion can you draw from these findings?
Regression
In regression, we find an equation that represents the linear relationship between variables.
We need to identify the dependent variable, y, and the independent variable, x.
The relationship is given as $y = a + bx$, where a is the intercept and b is the slope (the regression weight).
The line of best fit through a set of data is represented as $y = a + bx$.
[Scatter diagram: satisfaction against service quality perception, with the fitted line]
The line so formed is known as the sample regression
line of y on x.
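The slides do not show how a and b are obtained. Assuming the usual least-squares method for the line of best fit, $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, so the line always passes through the point $(\bar{x}, \bar{y})$. A minimal sketch, with the hypothetical helper name `least_squares_line`:

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares regression line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx        # slope (regression weight)
    a = y_bar - b * x_bar  # intercept: line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a:.2f} + {b:.2f}x")
```

The example points lie exactly on y = 1 + 2x, so the fitted line recovers a = 1 and b = 2.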
[Two scatter diagrams: satisfaction against service quality perception]
These two graphs show datasets with different correlations.
[Two scatter diagrams: satisfaction against service quality perception]
These two graphs show datasets with different regression weights.
For datasets with a high r (close to +1 or -1), there is a strong linear relationship between x and y: the points lie close to the line of best fit.
In contrast, an r close to zero means the points are widely scattered.
CORRELATION r AND REGRESSION WEIGHT b MEASURE TWO DIFFERENT THINGS: r measures how closely the points cluster around the line, while b measures how steeply y changes as x changes.
Illustration:
A study was made by a retailer to determine the relation between weekly advertising expenditure and sales (in thousands of pounds).
Find the equation of a regression line to predict weekly sales from advertising costs. Estimate weekly sales when advertising costs are £35,000.
Adv. costs (£’000): 40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50
Sales (£’000): 385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510
So, sales = 343.71 + 3.22 × advertising costs (both in £’000).
With advertising costs of 35 (i.e. £35,000), predicted sales = 343.71 + 3.22 × 35 ≈ 456.4, i.e. about £456,400.
Demonstration and Practice
• Use the Excel file ‘satisfaction-retention’ for this exercise.
• What is the impact of employee engagement on customer satisfaction?
END OF COURSE
CONGRATULATIONS!