Pearson’s Product Moment Correlation Coefficient (PPMCC)
Sanjay Singh, PhD
Sanjay Singh, PhD | sanjay.singh3210@gmail.com
Pearson Product Moment Correlation Coefficient (PPMCC)
• Used when X & Y are both metric variables
• Introduced by Karl Pearson in 1896 in a paper published in the journal ‘Philosophical Transactions of the Royal Society of London’
• Denoted by the symbol ‘r’
Why It Is Called Product Moment Correlation
• ‘Moment’ here refers to the average of a set of products, namely the products of paired deviations from the mean. Pearson's correlation is given by the following formula:
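$$ r = \frac{\sum xy}{n\,\sigma_X\,\sigma_Y} $$
where
x = X − X̄ = deviation of X scores from their mean
y = Y − Ȳ = deviation of Y scores from their mean
n = total number of pairs
σ_X, σ_Y = standard deviations of X and Y (computed with n in the denominator)

Student | Score in Maths (X) | Score in Stats (Y)
A       | 9                  | 8
B       | 7                  | 6
C       | 10                 | 11
D       | 6                  | 18
E       | 8                  | 7
F       | 14                 | 9
G       | 8                  | 3
H       | 8                  | 6
I       | 8                  | 8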
Moment: if X is a random variable, the expected values of its powers, E(X), E(X²), E(X³) and so on, are called its first, second, third (and so on) moments. The first moment is the mean. When the population mean (μ) is subtracted first, E[(X − μ)^k] is called the k-th central moment: the second central moment is the variance (whose square root is the standard deviation), and the third and fourth central moments, divided by σ³ and σ⁴, give skewness and kurtosis. The average of the products of paired deviations, Σxy / n, is the product (mixed central) moment of X and Y, i.e. their covariance. Dividing it by σ_X σ_Y gives Pearson's correlation coefficient, hence the name product moment correlation.
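The moment idea can be checked numerically. The sketch below is a minimal illustration, assuming the nine pairs of scores from the table above; the helper names `central_moment` and `product_moment` are hypothetical, chosen here for clarity. It computes every moment as a plain average (dividing by n) and then forms r as the product moment divided by the two standard deviations.

```python
import math

# Scores of students A-I from the table above
maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y


def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k, dividing by n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def product_moment(xs, ys):
    """Mixed central moment: the average of the products of paired deviations."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)


var_x = central_moment(maths, 2)                  # 2nd central moment = variance
var_y = central_moment(stats, 2)
skew_x = central_moment(maths, 3) / var_x ** 1.5  # standardized 3rd moment
kurt_x = central_moment(maths, 4) / var_x ** 2    # standardized 4th moment
cov_xy = product_moment(maths, stats)             # covariance
r = cov_xy / math.sqrt(var_x * var_y)             # Pearson's r

print(f"var(X) = {var_x:.3f}, skew(X) = {skew_x:.3f}, kurt(X) = {kurt_x:.3f}")
print(f"cov(X, Y) = {cov_xy:.3f}, r = {r:.3f}")
```

For these scores the covariance is about −0.963 and r is about −0.112, a very weak negative relationship.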
Assumptions of PPMCC
• Quantitative Measures: Both IV & DV should be quantitative (interval or ratio)
• Linearity: X and Y should be linearly related.
(It should not be applied to curvilinear relationships, where r can be close to zero even when the variables are strongly related.)
• Absence of outliers: There should be no significant outliers.
(A single extreme pair can inflate, deflate, or even reverse the sign of r.)
• Normality: The variables should be normally distributed in their respective populations. If one variable is treated as the dependent variable (DV), then at least the DV should be normally distributed.
The Pearson correlation is, however, reasonably robust to departures from normality (Sprinthall, 1987, cited in Martin et al., 1993; Havlicek and Peterson, 1977).
• Minimum 30 observations: A common rule of thumb is at least 30 observations, so that significance tests on r are less sensitive to departures from normality.
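The linearity and outlier assumptions can be seen with two small experiments. This is a minimal sketch; `pearson_r` is a hypothetical helper using the deviation-score form of r, the curved data set is made up purely for illustration, and the second part reuses the nine pairs of scores from the earlier table, with and without student D (X = 6, Y = 18).

```python
import math


def pearson_r(xs, ys):
    """Pearson's r in deviation-score form: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [a - mx for a in xs]
    dy = [b - my for b in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Linearity: Y = X^2 is a perfect but curved relationship, yet r comes out 0.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)

# Outliers: students A-I from the earlier table, then the same data without D (6, 18).
maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]
d = 3  # index of student D
r_all = pearson_r(maths, stats)
r_without_d = pearson_r(maths[:d] + maths[d + 1:], stats[:d] + stats[d + 1:])

print(f"curved data:  r = {r_curve:.3f}")
print(f"all students: r = {r_all:.3f}; without D: r = {r_without_d:.3f}")
```

The curved data give r = 0 even though Y is completely determined by X, and removing the single pair for student D moves r from about −0.11 to about +0.55.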
Formulas for PPMCC:
Formula 1: Deviation Score Formula
$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$
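A minimal sketch of the deviation score formula, assuming the nine pairs of scores from the earlier table; the function name `deviation_sums` is hypothetical. Because n·σ_X·σ_Y = n·√(Σx²/n)·√(Σy²/n) = √(Σx²·Σy²) when σ is computed with n in the denominator, this form gives the same r as Σxy / (n σ_X σ_Y).

```python
import math

maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X (students A-I)
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y


def deviation_sums(xs, ys):
    """Return (sum_xy, sum_x2, sum_y2), where x and y are deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return sum_xy, sum_x2, sum_y2


sum_xy, sum_x2, sum_y2 = deviation_sums(maths, stats)
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(f"sum xy = {sum_xy:.2f}, sum x^2 = {sum_x2:.2f}, sum y^2 = {sum_y2:.2f}, r = {r:.3f}")
```

For these data Σx² = 42, Σy² ≈ 142.22 and Σxy ≈ −8.67.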
Formula 2: Z-score Formula
$$ r = \frac{\sum z_X z_Y}{n}, \qquad z_X = \frac{x}{\sigma_X}, \quad z_Y = \frac{y}{\sigma_Y} $$
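A minimal sketch of the z-score form, assuming the same nine pairs of scores; `z_scores` is a hypothetical helper. It standardizes each variable with the standard deviation computed with n in the denominator, so the average of the products z_X·z_Y is r. If the sample standard deviation (with n − 1) were used instead, the sum of products would be divided by n − 1.

```python
import math

maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X (students A-I)
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y


def z_scores(values):
    """Standardize using the population SD (divide by n), matching r = sum(zx*zy) / n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


zx = z_scores(maths)
zy = z_scores(stats)
r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
print(f"r = {r:.3f}")
```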
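Formula 3: Raw Score Formula (Machine Formula)
$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$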
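A minimal sketch of the raw score formula, assuming the same nine pairs of scores. It needs only five running totals (ΣX, ΣY, ΣXY, ΣX², ΣY²) and no means, and everything before the final square root is integer arithmetic, which is presumably why it suited calculating machines.

```python
import math

maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X (students A-I)
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y

n = len(maths)
sum_x = sum(maths)
sum_y = sum(stats)
sum_xy = sum(x * y for x, y in zip(maths, stats))
sum_x2 = sum(x * x for x in maths)
sum_y2 = sum(y * y for y in stats)

# Numerator and both bracketed terms are exact integers
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 78 76 650 718 784
print(f"r = {r:.3f}")
```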
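Formula 4: Covariance Formula
$$ r = \frac{\operatorname{cov}(X,Y)}{\sigma_X\,\sigma_Y}, \qquad \operatorname{cov}(X,Y) = \frac{\sum xy}{n} $$
With n in the denominator, this covariance formula is for a population. For a sample, the denominator is (n − 1).
This mirrors the difference between the population and sample standard deviation formulas. Since the sample standard deviations also divide by (n − 1), the factors cancel and r is the same either way.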
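A minimal sketch showing that the n versus (n − 1) choice changes the covariance but not r, assuming the same nine pairs of scores. The function `r_from_covariance` and its `ddof` argument ("delta degrees of freedom", a name borrowed from NumPy's convention) are hypothetical and written in plain Python here.

```python
import math

maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X (students A-I)
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y


def r_from_covariance(xs, ys, ddof):
    """Return (cov, r) with r = cov / (sd_x * sd_y).

    ddof=0 divides by n (population formulas); ddof=1 divides by n - 1 (sample formulas).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    divisor = n - ddof
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / divisor
    sd_x = math.sqrt(sum((a - mx) ** 2 for a in xs) / divisor)
    sd_y = math.sqrt(sum((b - my) ** 2 for b in ys) / divisor)
    return cov, cov / (sd_x * sd_y)


cov_pop, r_pop = r_from_covariance(maths, stats, ddof=0)
cov_samp, r_samp = r_from_covariance(maths, stats, ddof=1)
print(f"population: cov = {cov_pop:.3f}, r = {r_pop:.3f}")
print(f"sample:     cov = {cov_samp:.3f}, r = {r_samp:.3f}")
```

The two covariances differ (about −0.963 versus −1.083), while the two values of r agree.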
Calculation of PPMCC
Student   | Score in Maths (X) | Score in Stats (Y) | x = X − X̄ | y = Y − Ȳ | xy | X² | Y²
A         | 9                  | 8                  |            |            |    |    |
B         | 7                  | 6                  |            |            |    |    |
C         | 10                 | 11                 |            |            |    |    |
D         | 6                  | 18                 |            |            |    |    |
E         | 8                  | 7                  |            |            |    |    |
F         | 14                 | 9                  |            |            |    |    |
G         | 8                  | 3                  |            |            |    |    |
H         | 8                  | 6                  |            |            |    |    |
I         | 8                  | 8                  |            |            |    |    |
Total (Σ) |                    |                    |            |            |    |    |
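The blank columns of this worksheet can be filled in mechanically. A minimal sketch, assuming the nine pairs of scores above; it uses Python's `fractions.Fraction` so that the deviation columns are exact (X̄ = 78/9 and Ȳ = 76/9 are not whole numbers), prints the rows rounded to two decimals, and then applies the deviation score formula to the column totals.

```python
import math
from fractions import Fraction

students = "ABCDEFGHI"
maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y
n = len(maths)

# Exact means, so the deviation columns add up to exactly zero
mean_x = Fraction(sum(maths), n)
mean_y = Fraction(sum(stats), n)

rows = []
for s, X, Y in zip(students, maths, stats):
    x, y = X - mean_x, Y - mean_y          # deviation scores
    rows.append((s, X, Y, x, y, x * y, X * X, Y * Y))

print("Student  X   Y      x      y     xy  X^2  Y^2")
for s, X, Y, x, y, xy, X2, Y2 in rows:
    print(f"{s:<7}{X:>3}{Y:>4}{float(x):>7.2f}{float(y):>7.2f}{float(xy):>7.2f}{X2:>5}{Y2:>5}")

# Column totals (the "Total" row of the worksheet)
columns = zip(*[row[1:] for row in rows])
sum_X, sum_Y, sum_x, sum_y, sum_xy, sum_X2, sum_Y2 = [sum(col) for col in columns]
sum_x2 = sum(row[3] ** 2 for row in rows)
sum_y2 = sum(row[4] ** 2 for row in rows)

r = float(sum_xy) / math.sqrt(float(sum_x2 * sum_y2))
print(f"Total: X={sum_X} Y={sum_Y} xy={float(sum_xy):.2f} X^2={sum_X2} Y^2={sum_Y2}")
print(f"r = {r:.3f}")
```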
Importance
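• Explains variability: r² (the coefficient of determination) is the proportion of the variance in one variable that is linearly accounted for by the other.
• Can point towards a causal relationship worth investigating, but does not establish causation.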
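A minimal sketch of the link between r² and explained variance, assuming the same nine pairs of scores: it squares r and compares it with 1 − SS_res/SS_tot from the least-squares line of Y on X. For a straight-line fit with an intercept the two quantities are equal.

```python
maths = [9, 7, 10, 6, 8, 14, 8, 8, 8]   # X (students A-I)
stats = [8, 6, 11, 18, 7, 9, 3, 6, 8]   # Y
n = len(maths)
mx, my = sum(maths) / n, sum(stats) / n

sxx = sum((x - mx) ** 2 for x in maths)                       # sum of x^2
syy = sum((y - my) ** 2 for y in stats)                       # sum of y^2 (SS_tot)
sxy = sum((x - mx) * (y - my) for x, y in zip(maths, stats))  # sum of xy

r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

# Least-squares line Y' = a + bX and the share of Y's variance it accounts for
b = sxy / sxx
a = my - b * mx
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(maths, stats))
explained = 1 - ss_res / syy

print(f"r = {r:.3f}, r^2 = {r_squared:.4f}, 1 - SS_res/SS_tot = {explained:.4f}")
```

Here r² ≈ 0.013, so maths scores linearly account for only about 1.3% of the variance in statistics scores.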