Uploaded by Nabil Maruf

Correlation Analysis (LEC-4)

Correlation Analysis
• Correlation Analysis is the study of the relationship
between variables. It is also defined as a group of
techniques to measure the association between two
variables.
• A Scatter Diagram is a chart that portrays the relationship
between the two variables. It is the usual first step in
correlation analysis.
– The Dependent Variable is the variable being predicted
or estimated.
– The Independent Variable provides the basis for
estimation. It is the predictor variable.
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength of
the relationship between two variables. It requires interval or
ratio-scaled data.
• It can range from -1.00 to 1.00.
• Values of -1.00 or 1.00 indicate perfect and strong correlation.
• Values close to 0.0 indicate weak correlation.
• Negative values indicate an inverse relationship and positive
values indicate a direct relationship.
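As a minimal sketch of how r is obtained, assuming the usual Pearson product-moment definition (the sum of cross-deviations of X and Y divided by the square root of the product of their sums of squared deviations), the following computes r for two paired lists. The function name `pearson_r` and the small data sets are illustrative only and do not come from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y moves exactly in step with x (direct, then inverse).
x = [1, 2, 3, 4, 5]
direct = pearson_r(x, [2, 4, 6, 8, 10])
inverse = pearson_r(x, [10, 8, 6, 4, 2])
print(round(direct, 2), round(inverse, 2))
```

The two calls land on the ends of the range: a perfectly direct relationship gives 1.00 and a perfectly inverse one gives -1.00.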
Correlation (Contd.)
Types of correlation:
1. Positive Correlation: Positive correlation occurs when an increase in one variable
goes with an increase in the value of another.
2. Negative Correlation: Negative correlation occurs when an increase in one
variable goes with a decrease in the value of another.
3. No Correlation: No correlation occurs when there is no linear dependency
between the variables.
4. Partial Correlation: It measures the strength of a relationship between two
variables, while controlling for the effect of one or more other variables.
5. Linear Correlation: Correlation is said to be linear if the ratio of change is
constant. For example, if doubling the number of workers in a factory doubles
the amount of output, the correlation is linear.
6. Non-Linear Correlation: Correlation is said to be non-linear if the ratio of change
is not constant. For example, doubling the rainfall will not double the crop harvested.
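For partial correlation with three variables, a commonly used first-order formula expresses the coefficient in terms of the three pairwise correlations. The symbols X, Y and Z (with Z the variable being controlled for) are assumed here for illustration and are not from the lecture:

```latex
r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}}
```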
Perfect Correlation
Correlation Coefficient - Interpretation
Correlation
Methods of coefficient of correlation:
I. Karl Pearson’s coefficient of correlation
II. Spearman’s rank coefficient of correlation
III. Standard error of coefficient of correlation
Karl Pearson’s coefficient of correlation:
Coefficient of Determination
• The coefficient of determination ($r^2$) is the
proportion of the total variation in the dependent
variable (Y) that is explained or accounted for by the
variation in the independent variable (X). It is the
square of the coefficient of correlation.
• It ranges from 0 to 1.
• It does not give any information on the direction of the
relationship between the variables.
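To make "proportion of the total variation explained" concrete, here is a small sketch that fits a least-squares line and compares 1 - SSE/SST with the square of r. It assumes X and Y are the monthly ads and juice figures from the advertising example later in these notes. The variable names and the SSE/SST notation (unexplained and total sums of squares) are illustrative.

```python
import math

ads = [3, 7, 4, 2, 0, 4, 1, 2]        # X: number of ads, Jan..Aug
juice = [11, 18, 9, 4, 7, 6, 3, 8]    # Y: juice purchased, Jan..Aug

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(juice) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, juice))
sxx = sum((x - mean_x) ** 2 for x in ads)
sst = sum((y - mean_y) ** 2 for y in juice)   # total variation in Y

# Least-squares line y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation in Y left unexplained by the fitted line
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ads, juice))

r = sxy / math.sqrt(sxx * sst)
explained_share = 1 - sse / sst

print(round(r ** 2, 4), round(explained_share, 4))
```

Both printed values agree, which is the sense in which the square of r measures the explained share of the variation in Y.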
Correlation Coefficient - Example
Using the Copier Sales of America data, compute the
correlation coefficient and the coefficient of determination.
Correlation Coefficient - Example
How do we interpret a correlation of 0.759?
First, it is positive, so we see there is a direct relationship between
the number of sales calls and the number of copiers sold. The value
of 0.759 is fairly close to 1.00, so we conclude that the association
is strong.
However, does this mean that more sales calls cause more sales?
No, we have not demonstrated cause and effect here, only that the
two variables—sales calls and copiers sold—are related.
Coefficient of Determination ($r^2$) - Example
• The coefficient of determination, $r^2$, is 0.576, found by $(0.759)^2$.
• This is a proportion or a percent; we can say that 57.6 percent of
the variation in the number of copiers sold is explained, or
accounted for, by the variation in the number of sales calls.
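Written out, the arithmetic behind these figures is as follows. The second line, the unexplained share, is simply the complement and is added here for illustration:

```latex
r^{2} = (0.759)^{2} = 0.576081 \approx 0.576
\qquad
1 - r^{2} \approx 1 - 0.576 = 0.424
```

So about 42.4 percent of the variation in copiers sold is not accounted for by the number of sales calls.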
Lag and Lead in correlation
In the correlation of time series, the investigator may find that there is a
gap before a cause-and-effect relationship is established. For
example, the supply of a commodity may increase today, but it
may not have an immediate effect on prices; it may take a few
days or even months for prices to adjust to the increased
supply. The difference in the period before a cause-and-effect
relationship is established is called the lag. Ignoring this time gap
produces fallacious conclusions, so the pairing of items is
adjusted according to the time lag.
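One way to picture "adjusting the pairing" is to shift the effect series by the lag before pairing it with the cause series. The sketch below does this for a hypothetical helper `lagged_pairs`. The supply and price figures are made up purely to show the mechanics.

```python
def lagged_pairs(cause, effect, lag):
    """Pair cause[t] with effect[t + lag], where lag is a whole number of periods."""
    if len(cause) != len(effect):
        raise ValueError("both series must cover the same periods")
    if lag < 0 or lag >= len(cause):
        raise ValueError("lag must be between 0 and len(series) - 1")
    # Drop the last `lag` causes and the first `lag` effects so the rest line up.
    return list(zip(cause[:len(cause) - lag], effect[lag:]))

# Made-up series: supply in period t is paired with the price one period later.
supply = [10, 12, 15, 11, 9]
price = [50, 49, 48, 46, 43]
pairs = lagged_pairs(supply, price, lag=1)
print(pairs)  # [(10, 49), (12, 48), (15, 46), (11, 43)]
```

Each unit of lag costs one pair of observations, so a long lag on a short series leaves little data for the correlation.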
Lag and Lead in correlation
• Pran juice is studying the effect of its latest advertising campaign. People
chosen at random are called and asked how many juices they had bought in
the past week and how many advertisements they had seen in the
past week.
Month:            Jan  Feb  Mar  Apr  May  Jun  Jul  Aug
Number of ads:      3    7    4    2    0    4    1    2
Juice purchased:   11   18    9    4    7    6    3    8
• Develop the estimating equation that best fits the data.
• Calculate the standard error of the estimate.
• Allowing a two-month time lag, calculate the coefficient of correlation.
• Calculate the sample coefficient of determination. Interpret the result.
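The four parts of this exercise can be sketched as below. This is one possible reading, not a model answer: the exercise does not say which parts should use the lagged pairing, so the code reports every quantity twice, once for same-month pairs and once with ads paired against juice purchased two months later. The helper name `fit_summary` is hypothetical, and the standard error is assumed to use n - 2 in the denominator.

```python
import math

ads = [3, 7, 4, 2, 0, 4, 1, 2]        # number of ads, Jan..Aug
juice = [11, 18, 9, 4, 7, 6, 3, 8]    # juice purchased, Jan..Aug

def fit_summary(x, y):
    """Least-squares line y = a + b*x, its standard error, r and r squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return {
        "intercept": intercept,
        "slope": slope,
        "std_error": math.sqrt(sse / (n - 2)),   # assumed n - 2 denominator
        "r": r,
        "r_squared": r * r,
    }

lag = 2  # months
same_month = fit_summary(ads, juice)
# Ads in month t paired with juice purchased in month t + 2
two_month_lag = fit_summary(ads[:len(ads) - lag], juice[lag:])

for label, result in (("no lag", same_month), ("2-month lag", two_month_lag)):
    print(label, {k: round(v, 3) for k, v in result.items()})
```

On these figures the two-month pairing leaves only six pairs and gives a much weaker correlation than the same-month pairing, which is worth bearing in mind when interpreting the result.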
Testing the Significance of the Correlation Coefficient
H0: $\rho = 0$ (the correlation in the population is 0)
H1: $\rho \neq 0$ (the correlation in the population is not 0)
Reject H0 if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
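The statistic compared with these critical values is, in its standard form, a t statistic with n - 2 degrees of freedom, where r is the sample correlation and n is the number of paired observations. It is reconstructed here from the standard definition rather than copied from the slide:

```latex
t = \frac{r\,\sqrt{n - 2}}{\sqrt{1 - r^{2}}}
```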
Testing the Significance of the Correlation Coefficient - Example
H0: $\rho = 0$ (the correlation in the population is 0)
H1: $\rho \neq 0$ (the correlation in the population is not 0)
Reject H0 if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
$t > t_{0.025,\,8}$ or $t < -t_{0.025,\,8}$
$t > 2.306$ or $t < -2.306$
Testing the Significance of the Correlation Coefficient - Example
The computed t (3.297) is within the rejection region, so we reject H0. This means the
correlation in the population is not zero. From a practical standpoint, it indicates to the sales
manager that there is a correlation between the number of sales calls made and the number
of copiers sold in the population of salespeople.
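The computed t can be checked with a few lines of code. This sketch assumes the standard statistic t = r√(n - 2) / √(1 - r²) and a sample size of n = 10, which is inferred from the 8 degrees of freedom rather than stated in the notes.

```python
import math

r = 0.759            # sample correlation from the copier example
n = 10               # assumed: n - 2 = 8 degrees of freedom
t_critical = 2.306   # t(0.025, 8), as quoted in the decision rule

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_stat) > t_critical   # two-tailed decision rule

print(round(t_stat, 3), reject_h0)
```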
Correlation (Contd.)
Spearman’s rank coefficient of correlation is suitable for qualitative data such as honesty,
efficiency and intelligence. It is not suitable for large or grouped data sets.
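In its standard textbook form, with n the number of ranked pairs (a symbol assumed here), the coefficient is:

```latex
r_{s} = 1 - \frac{6 \sum d^{2}}{n\left(n^{2} - 1\right)}
```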
Here, d is the difference between two ranks.
If there are tied ranks:
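A common textbook adjustment gives tied items the average of the ranks they would otherwise occupy and adds a correction term for every group of m tied items. The symbol m is assumed here, and this may not be the exact form shown on the original slide:

```latex
r_{s} = 1 - \frac{6\left[\sum d^{2} + \sum \dfrac{m^{3} - m}{12}\right]}{n\left(n^{2} - 1\right)}
```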
 Standard error of coefficient of correlation:
Rank correlation
• Compute rank correlation for the following data:
X:  15  20  28  12  40  60  20  80
Y:  40  30  50  30  20  10  30  60
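A possible worked solution is sketched below. It assumes ranks are assigned in ascending order (1 = smallest) and that tied values share the average of their ranks. It also assumes the tie-corrected formula r_s = 1 - 6[Σd² + Σ(m³ - m)/12] / (n(n² - 1)), where m is the size of each tied group. The function names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_rank(x, y):
    """Spearman's rank correlation with the tie correction added to sum(d^2)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

x = [15, 20, 28, 12, 40, 60, 20, 80]
y = [40, 30, 50, 30, 20, 10, 30, 60]

print(average_ranks(x))   # X ranks: 20 appears twice and shares rank 3.5
print(average_ranks(y))   # Y ranks: 30 appears three times and shares rank 4
print(spearman_rank(x, y))
```

Here Σd² = 81.5 and the tie corrections add 0.5 (for X) and 2 (for Y), so the bracket equals 84 and r_s = 1 - 504/504 = 0: under these assumptions the ranks show no association.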