The Purpose of Correlational Studies

Correlational studies are used to look for relationships between variables. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation. The correlation coefficient is a measure of correlation strength and can range from –1.00 to +1.00.

- Positive Correlations: Both variables increase or decrease at the same time. A correlation coefficient close to +1.00 indicates a strong positive correlation.
- Negative Correlations: Indicates that as the amount of one variable increases, the other decreases (and vice versa). A correlation coefficient close to -1.00 indicates a strong negative correlation.
- No Correlation: Indicates no relationship between the two variables. A correlation coefficient of 0 indicates no correlation.
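These three outcomes are easy to illustrate with a short script. Below is a minimal sketch in plain Python; the data are made up, and the helper `pearson_r` simply implements the standard product-moment formula:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # perfect positive: 1.0
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # perfect negative: -1.0
print(round(pearson_r(xs, [3, 1, 4, 1, 5]), 3))  # weak: 0.354
```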
Limitations of Correlational Studies:
While correlational studies can suggest that there is a relationship between two variables, they
cannot prove that one variable causes a change in another variable. In other words,
correlation does not equal causation. For example, a correlational study might suggest that
there is a relationship between academic success and self-esteem, but it cannot show if
academic success increases or decreases self-esteem. Other variables might play a role,
including social relationships, cognitive abilities, personality, socio-economic status, and a
myriad of other factors.
Types of Correlational Studies:
1. Naturalistic Observation
Naturalistic observation involves observing and recording the variables of interest in the
natural environment without interference or manipulation by the experimenter.
Advantages of Naturalistic Observation:
- Gives the experimenter the opportunity to view the variable of interest in a natural setting.
- Can offer ideas for further research.
- May be the only option if lab experimentation is not possible.
Disadvantages of Naturalistic Observation:
- Can be time consuming and expensive.
- Does not allow for scientific control of variables.
- Experimenters cannot control extraneous variables.
- Subjects may be aware of the observer and may act differently as a result.
2. The Survey Method
Surveys and questionnaires are among the most common methods used in psychological
research. In this method, a random sample of participants completes a survey, test, or
questionnaire that relates to the variables of interest. Random sampling is a vital part of
ensuring the generalizability of the survey results.
Advantages of the Survey Method:
- It’s fast, cheap, and easy. Researchers can collect large amounts of data in a relatively short amount of time.
- More flexible than some other methods.
Disadvantages of the Survey Method:
- Can be affected by an unrepresentative sample or poor survey questions.
- Participants can affect the outcome. Some participants try to please the researcher, lie to make themselves look better, or have mistaken memories.
3. Archival Research
Archival research is performed by analyzing studies conducted by other researchers or by
looking at historical patient records. For example, researchers recently analyzed the records of
soldiers who served in the Civil War to learn more about PTSD ("The Irritable Heart").
Advantages of Archival Research:
- The experimenter cannot introduce changes in participant behavior.
- Enormous amounts of data provide a better view of trends, relationships, and outcomes.
- Often less expensive than other study methods. Researchers can often access data through free archives or records databases.
Disadvantages of Archival Research:
- The researchers have no control over how the data was collected.
- Important data may be missing from the records.
- Previous research may be unreliable.
Kendra Cherry
Psychology Guide
CORRELATIONAL RESEARCH DESIGNS
Yvonne L. LaMar
Correlation and Causality
Correlational Research refers to studies in which the purpose is to discover relationships between
variables through the use of correlational statistics (r). The square of a correlation coefficient
yields the explained variance (r-squared). A correlational relationship between two variables is
occasionally the result of an outside source, so we have to be careful and remember that
correlation does not necessarily tell us about cause and effect. If a strong relationship is found
between two variables, causality can be tested by using an experimental approach.
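The explained-variance idea is a one-line computation. A quick numeric illustration (the 0.73 here is an arbitrary example coefficient, not from a real study):

```python
r = 0.73            # an illustrative correlation coefficient
r_squared = r ** 2  # explained variance (r-squared)
print(round(r_squared, 4))  # 0.5329 -> about 53% of the variance is explained
```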
Advantages of the Correlational Method
The correlational method permits the researcher to analyze the relationships among a large
number of variables in a single study. The correlation coefficient provides a measure of degree
and direction of relationship. Correlations do not have to be positive to be important; we'll discuss that a little later.
Uses of the Correlational Method
- Explore relationships between variables.
- Predict scores on one variable from subjects' scores on other variables.
Planning A Relationship Study
Basic Research Design
The primary purpose is to identify the causes and effects of important phenomena.
*Defining the problem - identify specific variables that may be important determinants of the
characteristics or behavior patterns being studied.
*Review of existing literature is helpful in identifying variables.
*Selection of research participants - only those who can be measured by the variables being
investigated.
*Data collection - must be in quantifiable form.
*Data analysis - correlate scores of a measured variable (x) that represents the phenomenon of interest with scores of a measured variable (y) thought to be related to that phenomenon.
Interpretation
One problem with interpretation is the shotgun approach, in which a large number of variables are measured and analyzed without a justifiable rationale for their inclusion. This approach can lead to inconveniencing participants and higher expenses for the time and number of measuring tools or methods. One way to avoid this is to do preliminary research to establish that the variables you intend to use are the most relevant to your purpose.
Limitations of Relationship Studies
1) Correlations do not establish cause and effect relationships between variables.
2) Correlations break down complex relationships into simpler components.
3) Success in many complex activities can probably be achieved in different ways.
Planning a Prediction Study
Prediction studies provide three types of information:
- the extent to which a criterion behavior pattern can be predicted.
- data for developing a theory about the determinants of the criterion behavior pattern.
- evidence about the predictive validity of the test or tests that were correlated with the criterion
behavior pattern.
Basic Research Design
1) The problem - this will reflect the type of information that you are trying to predict.
2) Selection of research participants - draw from the specific population most pertinent to your
study.
3) Data collection - predictor variables must be measured before the criterion behavior pattern
occurs.
4) Data Analysis - the primary method is to correlate each predictor variable with the criterion.
Useful Definitions
bivariate correlational statistics - expresses the magnitude of relationship between two variables.
multiple regression - uses scores on two or more predictor variables to predict performance on the criterion variable.
Statistical Factors in Prediction Research
Group Prediction
Prediction research is useful for practical selection purposes.
selection ratio - proportion of the available candidates who must be selected.
base rate- percentage of candidates who would be successful if no selection procedures were
applied.
Taylor-Russell Tables
These tables combine three factors: predictive validity, selection ratio, and base rate.
Shrinkage
This is the tendency for predictive validity to decrease when a research study is repeated.
Bivariate Correlational Statistics
Product Moment Correlation, r
“r” is computed when the variables that we wish to correlate are expressed as continuous scores.
Correlation Ratio, eta
This computation is used when the relationship between two variables is non-linear.
Adjustments to Correlation Coefficients
Correction for Attenuation
Provides an estimate of what the correlation between the variables would be if measures had
perfect reliability.
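This correction is the standard (Spearman) attenuation formula: divide the observed correlation by the square root of the product of the two reliabilities. A minimal sketch with made-up reliability values:

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation if both measures had perfect reliability."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Made-up values: observed r = .40, test reliabilities .80 and .70
print(round(correct_for_attenuation(0.40, 0.80, 0.70), 3))  # 0.535
```

With perfect reliabilities (both 1.0) the correction leaves the coefficient unchanged, as expected.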
Correction for Restriction in Range
Applied when researcher knows that the range of scores for a sample is restricted on one or both
of the variables being correlated. This application requires the assumption that the two variables
are linearly related throughout the entire range.
Part and Partial Correlation
This application is employed to rule out the influence of one or more measured variables upon the
criterion in order to clarify the role of other variables.
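For a single controlled variable, the standard first-order partial correlation formula shows how this works; the numbers below are invented for illustration:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the influence of z partialed out."""
    return (r_xy - r_xz * r_yz) / math.sqrt(
        (1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented example: an x-y correlation of .50 shrinks once z is controlled
print(round(partial_r(0.50, 0.40, 0.60), 3))  # 0.355
```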
Multivariate Correlational Statistics
These are used when examining the interrelationship of three or more variables.
Multiple Regression
This method is used to determine the correlation between the criterion variable and a combination
of two or more predictor variables. It can be used to analyze data from any quantitative research
design.
Multiple correlation coefficient
measure of the magnitude of the relationship between a criterion variable and some combination
of predictor variables.
Coefficient of determination - r-squared
expresses the amount of variance that can be explained by a predictor variable or combination of
predictor variables.
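For the special case of two predictors, the multiple correlation can be computed directly from the three bivariate correlations. A sketch with invented values (the function name is my own):

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors,
    computed from the three bivariate correlations."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)

# Invented example: criterion correlates .50 and .40 with the predictors,
# which correlate .30 with each other
R = multiple_r_two_predictors(0.50, 0.40, 0.30)
print(round(R, 3), round(R ** 2, 3))  # R = 0.565, R-squared = 0.319
```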
Discriminant analysis
also involves two or more predictor variables and a single criterion variable, but is limited to the case where the criterion is a categorical (e.g., dichotomous) variable.
Canonical correlation
a combination of several predictor variables is used to predict a combination of several criterion
variables.
Path Analysis
Used to test the validity of theories about causal relationships between two or more variables that
have been studied in a correlational research design.
Step One: formulate a hypothesis that causally links the variables of interest.
Step Two: select or develop measures of the variables.
Step Three: compute statistics that show the strength of relationship between each pair of
variables that are causally linked in the hypothesis.
Step Four: interpret statistics to determine whether they support or refute the theory.
Correlation matrix
an arrangement of rows and columns that makes it easy to see how each measured variable in a set of such variables correlates with all the other variables in the set.
Recursive model
considers only unidirectional causal relationships.
Non-recursive
used to test hypotheses that involve reciprocal causation between pairs of variables.
Factor Analysis
Provides an empirical basis for reducing numerous variables that are moderately or highly
correlated with each other. A factor represents the variables that are most correlated.
Loading
the individual coefficients of each variable on the factor.
Factor score
a score given for subjects when each factor is treated like a variable
Orthogonal solution
when factor analysis yields factors that are not correlated with each other.
Oblique solution
when factor analysis yields factors that do correlate with each other.
Structural Equation Modeling, LISREL
Also known as latent variable causal modeling, tests theories of causal relationships between
variables and supplies more reliable and valid measures than path analysis.
Latent variables
theoretical constructs of interest in the model
Manifest variables
variables that were actually measured by the researchers.
Interpretation of Correlation Coefficients
Statistical Significance of Correlation Coefficients
Indicates whether the obtained coefficient is different from zero at a given level of confidence. If the coefficient is statistically significantly different from zero, the null hypothesis (that the population correlation is zero) can be rejected.
Interpreting the Magnitude of Correlation Coefficient
The closer the correlation coefficient is to one, the stronger the relationship between the two variables; the closer to zero, the weaker the relationship. If the correlation coefficient is negative, the magnitude is interpreted in the same way, but the relationship runs in the opposite direction.
Mistakes Sometimes Made in Doing Correlational Research
The researcher:
-assumes that correlation is proof of cause-effect
-relies on shotgun approach
-selects statistics that are inappropriate
-limits analyses to bivariate statistics when multivariate statistics would be more appropriate
-does not conduct cross-validation study
-uses path analysis or structural equation modeling without checking assumptions
-fails to specify an important causal variable in planning a path analysis
-misinterprets the practical or statistical significance in a study
Date published on the site: 28/03/2005
Correlation
The correlation is one of the most common and most useful statistics. A correlation is a
single number that describes the degree of relationship between two variables. Let's work
through an example to show you how this statistic is computed.
Correlation Example
Let's assume that we want to look at the relationship between two variables, height (in
inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your
self esteem (incidentally, I don't think we have to worry about the direction of causality
here -- it's not likely that self esteem causes your height!). Let's say we collect some
information on twenty individuals (all male -- we know that the average height differs for
males and females so, to keep this example simple we'll just use males). Height is
measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating
items (where higher scores mean higher self esteem). Here's the data for the 20 cases
(don't take this too seriously -- I made this data up to illustrate what a correlation is):
Person  Height  Self Esteem
1       68      4.1
2       71      4.6
3       62      3.8
4       75      4.4
5       58      3.2
6       60      3.1
7       67      3.8
8       68      4.1
9       71      4.3
10      69      3.7
11      68      3.5
12      67      3.2
13      63      3.7
14      62      3.3
15      60      3.4
16      63      4.0
17      65      4.1
18      67      3.8
19      63      3.4
20      61      3.6
Now, let's take a quick look at the histogram for each variable. [Histograms omitted.]
And, here are the descriptive statistics:
Variable     Mean   StDev     Variance  Sum    Minimum  Maximum  Range
Height       65.4   4.40574   19.4105   1308   58       75       17
Self Esteem  3.755  0.426090  0.181553  75.1   3.1      4.6      1.5
Finally, we'll look at the simple bivariate (i.e., two-variable) plot. [Scatterplot omitted.]
You should immediately see in the bivariate plot that the relationship between the
variables is a positive one (if you can't see that, review the section on types of
relationships) because if you were to fit a single straight line through the dots it would
have a positive slope or move up from left to right. Since the correlation is nothing more
than a quantitative estimate of the relationship, we would expect a positive correlation.
What does a "positive relationship" mean in this context? It means that, in general, higher
scores on one variable tend to be paired with higher scores on the other and that lower
scores on one variable tend to be paired with lower scores on the other. You should
confirm visually that this is generally true in the plot above.
Calculating the Correlation
Now we're ready to compute the correlation value. The formula for the correlation is:

r = (N·Σxy − Σx·Σy) / √([N·Σx² − (Σx)²] × [N·Σy² − (Σy)²])
We use the symbol r to stand for the correlation. Through the magic of mathematics it
turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we
have a negative relationship; if it's positive, the relationship is positive. You don't need to
know how we came up with this formula unless you want to be a statistician. But you
probably will need to know how the formula relates to real data -- how you can use the
formula to compute the correlation. Let's look at the data we need for the formula. Here's
the original data with the other necessary columns:
Person  Height (x)  Self Esteem (y)  x*y     x*x    y*y
1       68          4.1              278.8   4624   16.81
2       71          4.6              326.6   5041   21.16
3       62          3.8              235.6   3844   14.44
4       75          4.4              330     5625   19.36
5       58          3.2              185.6   3364   10.24
6       60          3.1              186     3600   9.61
7       67          3.8              254.6   4489   14.44
8       68          4.1              278.8   4624   16.81
9       71          4.3              305.3   5041   18.49
10      69          3.7              255.3   4761   13.69
11      68          3.5              238     4624   12.25
12      67          3.2              214.4   4489   10.24
13      63          3.7              233.1   3969   13.69
14      62          3.3              204.6   3844   10.89
15      60          3.4              204     3600   11.56
16      63          4                252     3969   16
17      65          4.1              266.5   4225   16.81
18      67          3.8              254.6   4489   14.44
19      63          3.4              214.2   3969   11.56
20      61          3.6              219.6   3721   12.96
Sum =   1308        75.1             4937.6  85912  285.45
The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self esteem data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:

N = 20, Σx = 1308, Σy = 75.1, Σxy = 4937.6, Σx² = 85912, Σy² = 285.45
Now, when we plug these values into the formula given above, we get the following (I show it here tediously, one step at a time):

r = (20 × 4937.6 − 1308 × 75.1) / √([20 × 85912 − 1308²] × [20 × 285.45 − 75.1²])
  = (98752 − 98230.8) / √(7376 × 68.99)
  = 521.2 / √508870.24
  = 521.2 / 713.35
  = .73
So, the correlation for our twenty cases is .73, which is a fairly strong positive
relationship. I guess there is a relationship between height and self esteem, at least in this
made up data!
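If you'd rather not push the numbers through by hand, the same computation can be scripted. A minimal sketch in Python using only the column sums from the table:

```python
import math

# Column sums from the worked example (N = 20 people)
N = 20
sum_x, sum_y = 1308, 75.1  # heights and self-esteem scores
sum_xy, sum_xx, sum_yy = 4937.6, 85912, 285.45

numerator = N * sum_xy - sum_x * sum_y  # 521.2
denominator = math.sqrt(
    (N * sum_xx - sum_x ** 2) * (N * sum_yy - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))  # 0.73
```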
Testing the Significance of a Correlation
Once you've computed a correlation, you can determine the probability that the observed
correlation occurred by chance. That is, you can conduct a significance test. Most often
you are interested in determining the probability that the correlation is a real one and not
a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:
Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0
The easiest way to test this hypothesis is to find a statistics book that has a table of
critical values of r. Most introductory statistics texts would have a table like this. As in all
hypothesis testing, you need to first determine the significance level. Here, I'll use the
common significance level of alpha = .05. This means that I am conducting a test where
the odds that the correlation is a chance occurrence are no more than 5 out of 100. Before I
look up the critical value in a table I also have to compute the degrees of freedom or df.
The df is simply equal to N-2 or, in this example, is 20-2 = 18. Finally, I have to decide
whether I am doing a one-tailed or two-tailed test. In this example, since I have no strong
prior theory to suggest whether the relationship between height and self esteem would be
positive or negative, I'll opt for the two-tailed test. With these three pieces of information
-- the significance level (alpha = .05), degrees of freedom (df = 18), and type of test
(two-tailed) -- I can now test the significance of the correlation I found. When I look up
this value in the handy little table at the back of my statistics book I find that the critical
value is .4438. This means that if my correlation is greater than .4438 or less than -.4438
(remember, this is a two-tailed test) I can conclude that the odds are less than 5 out of 100
that this is a chance occurrence. Since my correlation of .73 is actually quite a bit higher,
I conclude that it is not a chance finding and that the correlation is "statistically
significant" (given the parameters of the test). I can reject the null hypothesis and accept
the alternative.
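The tabled critical value of .4438 isn't magic; it can be recovered from the t distribution using the identity r = t / √(t² + df). A sketch (2.101 is the standard two-tailed critical t for alpha = .05 and df = 18, from any t table):

```python
import math

# Relate the tabled critical r to the t distribution (df = N - 2 = 18)
df = 18
t_crit = 2.101  # two-tailed critical t for alpha = .05, df = 18
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)
print(round(r_crit, 4))  # 0.4438 -- matches the tabled critical value of r

# Equivalently, convert the observed correlation to a t statistic:
r = 0.73
t_obs = r * math.sqrt(df / (1 - r ** 2))
print(round(t_obs, 2), t_obs > t_crit)  # 4.53 True -> statistically significant
```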
The Correlation Matrix
All I've shown you so far is how to compute a correlation between two variables. In most
studies we have considerably more than two variables. Let's say we have a study with 10
interval-level variables and we want to estimate the relationships among all of them (i.e.,
between all possible pairs of variables). In this instance, we have 45 unique correlations
to estimate (more later on how I knew that!). We could do the above computations 45
times to obtain the correlations. Or we could use just about any statistics program to
automatically compute all 45 with a simple click of the mouse.
I used a simple statistics program to generate random data for 10 variables with 20 cases
(i.e., persons) for each variable. Then, I told the program to compute the correlations
among these variables. Here's the result:
[Correlation matrix omitted: the program printed a 10 × 10 lower-triangular matrix for variables C1-C10, with 1.000 along the diagonal. For example, the correlations of C1 with C2 through C10 were 0.274, -0.134, 0.201, -0.129, -0.095, 0.171, 0.219, 0.518, and 0.299.]
This type of table is called a correlation matrix. It lists the variable names (C1-C10)
down the first column and across the first row. The diagonal of a correlation matrix (i.e.,
the numbers that go from the upper left corner to the lower right) always consists of ones.
That's because these are the correlations between each variable and itself (and a variable
is always perfectly correlated with itself). This statistical program only shows the lower
triangle of the correlation matrix. In every correlation matrix there are two triangles that
are the values below and to the left of the diagonal (lower triangle) and above and to the
right of the diagonal (upper triangle). There is no reason to print both triangles because
the two triangles of a correlation matrix are always mirror images of each other (the
correlation of variable x with variable y is always equal to the correlation of variable y
with variable x). When a matrix has this mirror-image quality above and below the
diagonal we refer to it as a symmetric matrix. A correlation matrix is always a symmetric
matrix.
To locate the correlation for any pair of variables, find the value in the table for the row
and column intersection for those two variables. For instance, to find the correlation
between variables C5 and C2, I look for where row C2 and column C5 is (in this case it's
blank because it falls in the upper triangle area) and where row C5 and column C2 is and,
in the second case, I find that the correlation is -.166.
OK, so how did I know that there are 45 unique correlations when we have 10 variables?
There's a handy simple little formula that tells how many pairs (e.g., correlations) there are for any number of variables:

number of pairs = N(N − 1) / 2
where N is the number of variables. In the example, I had 10 variables, so I know I have
(10 * 9)/2 = 90/2 = 45 pairs.
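The pair count is easy to sketch in code (the function name n_pairs is my own):

```python
def n_pairs(n_vars):
    """Unique correlations (pairs) among n_vars variables: N(N - 1) / 2."""
    return n_vars * (n_vars - 1) // 2

print(n_pairs(10))  # 45
print(n_pairs(3))   # 3 pairs: (1,2), (1,3), (2,3)
```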
Other Correlations
The specific type of correlation I've illustrated here is known as the Pearson Product Moment Correlation. It is appropriate when both variables are measured at an interval level. However, there are a wide variety of other types of correlations for other circumstances. For instance, if you have two ordinal variables, you could use the Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau). When one measure is a continuous interval-level one and the other is dichotomous (i.e., two-category), you can use the Point-Biserial Correlation. For other situations, consult the web-based statistics selection program, Selecting Statistics.