The Purpose of Correlational Studies

Correlational studies are used to look for relationships between variables. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation. The correlation coefficient is a measure of correlation strength and can range from -1.00 to +1.00.

Positive Correlations: Both variables increase or decrease at the same time. A correlation coefficient close to +1.00 indicates a strong positive correlation.

Negative Correlations: As the amount of one variable increases, the other decreases (and vice versa). A correlation coefficient close to -1.00 indicates a strong negative correlation.

No Correlation: There is no relationship between the two variables. A correlation coefficient of 0 indicates no correlation.

Limitations of Correlational Studies

While correlational studies can suggest that there is a relationship between two variables, they cannot prove that one variable causes a change in another variable. In other words, correlation does not equal causation. For example, a correlational study might suggest that there is a relationship between academic success and self-esteem, but it cannot show whether academic success increases or decreases self-esteem. Other variables might play a role, including social relationships, cognitive abilities, personality, socio-economic status, and a myriad of other factors.

Types of Correlational Studies

1. Naturalistic Observation
Naturalistic observation involves observing and recording the variables of interest in the natural environment without interference or manipulation by the experimenter.
Advantages of Naturalistic Observation:
- Gives the experimenter the opportunity to view the variable of interest in a natural setting.
- Can offer ideas for further research.
- May be the only option if lab experimentation is not possible.
Disadvantages of Naturalistic Observation:
- Can be time consuming and expensive.
- Does not allow for scientific control of variables; experimenters cannot control extraneous variables.
- Subjects may be aware of the observer and may act differently as a result.

2. The Survey Method
Surveys and questionnaires are among the most common methods used in psychological research. In this method, a random sample of participants completes a survey, test, or questionnaire that relates to the variables of interest. Random sampling is a vital part of ensuring the generalizability of the survey results.
Advantages of the Survey Method:
- It's fast, cheap, and easy. Researchers can collect large amounts of data in a relatively short amount of time.
- More flexible than some other methods.
Disadvantages of the Survey Method:
- Can be affected by an unrepresentative sample or poor survey questions.
- Participants can affect the outcome. Some participants try to please the researcher, lie to make themselves look better, or have mistaken memories.

3. Archival Research
Archival research is performed by analyzing studies conducted by other researchers or by looking at historical patient records. For example, researchers recently analyzed the records of soldiers who served in the Civil War to learn more about PTSD ("The Irritable Heart").
Advantages of Archival Research:
- The experimenter cannot introduce changes in participant behavior.
- Enormous amounts of data provide a better view of trends, relationships, and outcomes.
- Often less expensive than other study methods. Researchers can often access data through free archives or records databases.
Disadvantages of Archival Research:
- The researchers have no control over how the data was collected.
- Important data may be missing from the records.
- Previous research may be unreliable.

Kendra Cherry, Psychology Guide
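To make the three possible outcomes described above concrete, here is a minimal sketch in Python (an illustration only, not part of the original article; NumPy and the made-up data are assumptions). It fabricates data with a built-in positive, negative, and near-zero relationship and prints the resulting coefficients:

import numpy as np

# Made-up data: one variable rises with x, one falls with x,
# and one independent draw has no relationship with x at all.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
noise = rng.normal(size=200)

positive = x + 0.5 * noise          # r close to +1
negative = -x + 0.5 * noise         # r close to -1
unrelated = rng.normal(size=200)    # r close to 0

for label, y in [("positive", positive), ("negative", negative), ("none", unrelated)]:
    r = np.corrcoef(x, y)[0, 1]     # off-diagonal entry of the 2x2 matrix
    print(f"{label:8s} r = {r:+.2f}")

Running it prints a strong positive, a strong negative, and a near-zero coefficient, matching the three cases above.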
CORRELATIONAL RESEARCH DESIGNS
Yvonne L. LaMar

Correlation and Causality
Correlational research refers to studies in which the purpose is to discover relationships between variables through the use of correlational statistics (r). The square of a correlation coefficient yields the explained variance (r-squared). A correlational relationship between two variables is occasionally the result of an outside source, so we have to be careful and remember that correlation does not necessarily tell us about cause and effect. If a strong relationship is found between two variables, causality can be tested by using an experimental approach.

Advantages of the Correlational Method
The correlational method permits the researcher to analyze the relationships among a large number of variables in a single study. The correlation coefficient provides a measure of the degree and direction of a relationship. Correlations do not have to be positive to be important; we'll discuss that a little later.

Uses of the Correlational Method
- Explore relationships between variables.
- Predict scores on one variable from subjects' scores on other variables.

Planning a Relationship Study
Basic Research Design
The primary purpose is to identify the causes and effects of important phenomena.
* Defining the problem - identify specific variables that may be important determinants of the characteristics or behavior patterns being studied.
* A review of the existing literature is helpful in identifying variables.
* Selection of research participants - only those who can be measured on the variables being investigated.
* Data collection - must be in quantifiable form.
* Data analysis - correlate scores on a measured variable (x) that represents the phenomenon of interest with scores on a measured variable (y) thought to be related to that phenomenon.

Interpretation
One problem with interpretation is the shotgun approach, which is when a large number of variables are measured and analyzed without a justifiable rationale for their inclusion. This approach can lead to inconveniencing participants and higher expenses for the time and number of measuring tools or methods. One way to avoid this is to do preliminary research to establish that the variables you intend to use are the most relevant to your purpose.

Limitations of Relationship Studies
1) Correlations do not establish cause and effect relationships between variables.
2) Correlations break down complex relationships into simpler components.
3) Success in many complex activities can probably be achieved in different ways.

Planning a Prediction Study
Prediction studies provide three types of information:
- the extent to which a criterion behavior pattern can be predicted.
- data for developing a theory about the determinants of the criterion behavior pattern.
- evidence about the predictive validity of the test or tests that were correlated with the criterion behavior pattern.
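The core move in a prediction study -- using scores on predictor variables to predict a criterion -- can be sketched in a few lines of Python. This is a sketch under assumed, invented data; NumPy's least-squares routine stands in for the multiple regression defined just below under "Useful Definitions":

import numpy as np

# Hypothetical scores for six participants: two predictor measures (x1, x2)
# and one criterion measure (y). All values are invented.
X = np.array([[1.0, 52.0],
              [2.0, 55.0],
              [3.0, 61.0],
              [4.0, 58.0],
              [5.0, 67.0],
              [6.0, 70.0]])
y = np.array([2.1, 2.7, 3.1, 3.0, 3.8, 4.0])

# Fit y = b0 + b1*x1 + b2*x2 by least squares (an intercept column is added).
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# The correlation between observed and predicted criterion scores is the
# multiple correlation coefficient R; R squared is the explained variance.
predicted = A @ coef
R = np.corrcoef(y, predicted)[0, 1]
print("coefficients:", coef.round(3))
print("R =", round(R, 3), "R^2 =", round(R**2, 3))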
- evidence about the predictive validity of the test or tests that were correlated with the criterion behavior pattern.

Basic Research Design
1) The problem - this will reflect the type of information that you are trying to predict.
2) Selection of research participants - draw from the specific population most pertinent to your study.
3) Data collection - predictor variables must be measured before the criterion behavior pattern occurs.
4) Data analysis - the primary method is to correlate each predictor variable with the criterion.

Useful Definitions
bivariate correlational statistics - express the magnitude of the relationship between two variables.
multiple regression - uses scores on two or more predictor variables to predict performance on the criterion variable.

Statistical Factors in Prediction Research
Group Prediction
Prediction research is useful for practical selection purposes.
selection ratio - the proportion of the available candidates who must be selected.
base rate - the percentage of candidates who would be successful if no selection procedures were applied.
Taylor-Russell Tables combine three factors: predictive validity, selection ratio, and base rate.
Shrinkage
This is the tendency for predictive validity to decrease when a research study is repeated.

Bivariate Correlational Statistics
Product Moment Correlation, r
"r" is computed when the variables that we wish to correlate are expressed as continuous scores.
Correlation Ratio, eta
This computation is used when the relationship between two variables is non-linear.

Adjustments to Correlation Coefficients
Correction for Attenuation
Provides an estimate of what the correlation between the variables would be if the measures had perfect reliability.
Correction for Restriction in Range
Applied when the researcher knows that the range of scores for a sample is restricted on one or both of the variables being correlated. This application requires the assumption that the two variables are linearly related throughout the entire range.
Part and Partial Correlation
This application is employed to rule out the influence of one or more measured variables upon the criterion in order to clarify the role of other variables (a minimal sketch appears after the path analysis steps below).

Multivariate Correlational Statistics
These are used when examining the interrelationships of three or more variables.
Multiple Regression
This method is used to determine the correlation between the criterion variable and a combination of two or more predictor variables. It can be used to analyze data from any quantitative research design.
Multiple correlation coefficient - a measure of the magnitude of the relationship between a criterion variable and some combination of predictor variables.
Coefficient of determination - r-squared; expresses the amount of variance that can be explained by a predictor variable or combination of predictor variables.
Discriminant analysis - also involves two or more predictor variables and a single criterion variable, but is limited to the case where the criterion is a categorical (often dichotomous) variable.
Canonical correlation - a combination of several predictor variables is used to predict a combination of several criterion variables.

Path Analysis
Used to test the validity of theories about causal relationships among two or more variables that have been studied in a correlational research design.
Step One: formulate a hypothesis that causally links the variables of interest.
Step Two: select or develop measures of the variables.
Step Three: compute statistics that show the strength of relationship between each pair of variables that are causally linked in the hypothesis.
Step Four: interpret the statistics to determine whether they support or refute the theory.
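Here is the promised sketch of a part/partial correlation (a minimal illustration with made-up variable names; the formula is the standard first-order partial correlation, which these notes do not spell out):

import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y with the influence of z ruled out:
    r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Made-up example: x and y are correlated only because both depend on z,
# so the partial correlation controlling for z is close to zero.
rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)
print("raw r_xy       =", round(np.corrcoef(x, y)[0, 1], 2))
print("partial r_xy.z =", round(partial_corr(x, y, z), 2))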
Correlation matrix - an arrangement of rows and columns that makes it easy to see how each measured variable in a set of such variables correlates with all the other variables in the set.
Recursive model - considers only unidirectional causal relationships.
Non-recursive model - used to test hypotheses that involve reciprocal causation between pairs of variables.

Factor Analysis
Provides an empirical basis for reducing numerous variables that are moderately or highly correlated with each other. A factor represents the variables that are most correlated.
Loading - the coefficient of an individual variable on the factor.
Factor score - a score computed for each subject when each factor is treated like a variable.
Orthogonal solution - when factor analysis yields factors that are not correlated with each other.
Oblique solution - when factor analysis yields factors that do correlate with each other.

Structural Equation Modeling, LISREL
Also known as latent variable causal modeling; tests theories of causal relationships between variables and supplies more reliable and valid measures than path analysis.
Latent variables - the theoretical constructs of interest in the model.
Manifest variables - the variables that were actually measured by the researchers.

Interpretation of Correlation Coefficients
Statistical Significance of Correlation Coefficients
Indicates whether the obtained coefficient is different from zero at a given level of confidence. If the coefficient is significantly different from zero, the null hypothesis (that the population correlation is zero) can be rejected.

Interpreting the Magnitude of a Correlation Coefficient
The closer the correlation coefficient is to one, the stronger the relationship between the two variables; the closer to zero, the weaker the relationship. If the correlation coefficient is a negative number, the magnitude is interpreted the same way, only the direction of the relationship is reversed.

Mistakes Sometimes Made in Doing Correlational Research
The researcher:
- assumes that correlation is proof of cause and effect
- relies on the shotgun approach
- selects statistics that are inappropriate
- limits analyses to bivariate statistics when multivariate statistics would be more appropriate
- does not conduct a cross-validation study
- uses path analysis or structural equation modeling without checking assumptions
- fails to specify an important causal variable in planning a path analysis
- misinterprets the practical or statistical significance of a study

Correlation
The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Let's work through an example to show you how this statistic is computed.

Correlation Example
Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for males and females so, to keep this example simple, we'll just use males). Height is measured in inches.
Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously -- I made this data up to illustrate what a correlation is):

Person  Height  Self Esteem
1       68      4.1
2       71      4.6
3       62      3.8
4       75      4.4
5       58      3.2
6       60      3.1
7       67      3.8
8       68      4.1
9       71      4.3
10      69      3.7
11      68      3.5
12      67      3.2
13      63      3.7
14      62      3.3
15      60      3.4
16      63      4.0
17      65      4.1
18      67      3.8
19      63      3.4
20      61      3.6

Now, let's take a quick look at the histogram for each variable:

[Histograms of height and self esteem omitted from this copy.]

And here are the descriptive statistics:

Variable     Mean   StDev     Variance  Sum    Minimum  Maximum  Range
Height       65.4   4.40574   19.4105   1308   58       75       17
Self Esteem  3.755  0.426090  0.181553  75.1   3.1      4.6      1.5

Finally, we'll look at the simple bivariate (i.e., two-variable) plot:

[Scatterplot of height against self esteem omitted from this copy.]

You should immediately see in the bivariate plot that the relationship between the variables is a positive one (if you can't see that, review the section on types of relationships) because if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation.

What does a "positive relationship" mean in this context? It means that, in general, higher scores on one variable tend to be paired with higher scores on the other and that lower scores on one variable tend to be paired with lower scores on the other. You should confirm visually that this is generally true in the plot above.

Calculating the Correlation
Now we're ready to compute the correlation value. The formula for the correlation is:

r = (N*Σxy - Σx*Σy) / sqrt[(N*Σx² - (Σx)²) * (N*Σy² - (Σy)²)]

We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data -- how you can use the formula to compute the correlation. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:

Person  Height (x)  Self Esteem (y)  x*y     x*x    y*y
1       68          4.1              278.8   4624   16.81
2       71          4.6              326.6   5041   21.16
3       62          3.8              235.6   3844   14.44
4       75          4.4              330.0   5625   19.36
5       58          3.2              185.6   3364   10.24
6       60          3.1              186.0   3600   9.61
7       67          3.8              254.6   4489   14.44
8       68          4.1              278.8   4624   16.81
9       71          4.3              305.3   5041   18.49
10      69          3.7              255.3   4761   13.69
11      68          3.5              238.0   4624   12.25
12      67          3.2              214.4   4489   10.24
13      63          3.7              233.1   3969   13.69
14      62          3.3              204.6   3844   10.89
15      60          3.4              204.0   3600   11.56
16      63          4.0              252.0   3969   16.00
17      65          4.1              266.5   4225   16.81
18      67          3.8              254.6   4489   14.44
19      63          3.4              214.2   3969   11.56
20      61          3.6              219.6   3721   12.96
Sum =   1308        75.1             4937.6  85912  285.45

The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self esteem data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:

N = 20
Σxy = 4937.6
Σx = 1308
Σy = 75.1
Σx² = 85912
Σy² = 285.45

Now, when we plug these values into the formula given above, we get the following (I show it here tediously, one step at a time):

r = (20*4937.6 - 1308*75.1) / sqrt[(20*85912 - 1308²) * (20*285.45 - 75.1²)]
  = (98752 - 98230.8) / sqrt[(1718240 - 1710864) * (5709 - 5640.01)]
  = 521.2 / sqrt[7376 * 68.99]
  = 521.2 / sqrt[508870.24]
  = 521.2 / 713.35
  = .73

So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship.
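The same computation can be checked in a few lines of Python (a sketch assuming NumPy; the data are the made-up values from the table above):

import numpy as np

height = np.array([68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
                   68, 67, 63, 62, 60, 63, 65, 67, 63, 61])
esteem = np.array([4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
                   3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6])

# The raw-score formula, step for step as in the worked example above.
n = len(height)
numerator = n * (height * esteem).sum() - height.sum() * esteem.sum()
denominator = np.sqrt((n * (height**2).sum() - height.sum()**2) *
                      (n * (esteem**2).sum() - esteem.sum()**2))
print("r =", round(numerator / denominator, 2))          # 0.73

# NumPy's built-in routine agrees.
print("r =", round(np.corrcoef(height, esteem)[0, 1], 2))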
I guess there is a relationship between height and self esteem, at least in this made-up data!

Testing the Significance of a Correlation
Once you've computed a correlation, you can determine the probability that the observed correlation occurred by chance. That is, you can conduct a significance test. Most often you are interested in determining the probability that the correlation is a real one and not a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:

Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

The easiest way to test this hypothesis is to find a statistics book that has a table of critical values of r. Most introductory statistics texts would have a table like this. As in all hypothesis testing, you need to first determine the significance level. Here, I'll use the common significance level of alpha = .05. This means that I am conducting a test where the odds that the correlation is a chance occurrence are no more than 5 out of 100. Before I look up the critical value in a table I also have to compute the degrees of freedom, or df. The df is simply equal to N - 2 or, in this example, 20 - 2 = 18. Finally, I have to decide whether I am doing a one-tailed or two-tailed test. In this example, since I have no strong prior theory to suggest whether the relationship between height and self esteem would be positive or negative, I'll opt for the two-tailed test. With these three pieces of information -- the significance level (alpha = .05), the degrees of freedom (df = 18), and the type of test (two-tailed) -- I can now test the significance of the correlation I found. When I look up this value in the handy little table at the back of my statistics book I find that the critical value is .4438. This means that if my correlation is greater than .4438 or less than -.4438 (remember, this is a two-tailed test) I can conclude that the odds are less than 5 out of 100 that this is a chance occurrence. Since my correlation of .73 is actually quite a bit higher, I conclude that it is not a chance finding and that the correlation is "statistically significant" (given the parameters of the test). I can reject the null hypothesis and accept the alternative.

The Correlation Matrix
All I've shown you so far is how to compute a correlation between two variables. In most studies we have considerably more than two variables. Let's say we have a study with 10 interval-level variables and we want to estimate the relationships among all of them (i.e., between all possible pairs of variables). In this instance, we have 45 unique correlations to estimate (more later on how I knew that!). We could do the above computations 45 times to obtain the correlations. Or we could use just about any statistics program to automatically compute all 45 with a simple click of the mouse. I used a simple statistics program to generate random data for 10 variables with 20 cases (i.e., persons) for each variable. Then, I told the program to compute the correlations among these variables. Here's the result:

      C1      C2      C3      C4      C5      C6      C7      C8      C9      C10
C1    1.000
C2    0.274   1.000
C3   -0.134  -0.269   1.000
C4    0.201  -0.153   0.075   1.000
C5   -0.129  -0.166   0.278  -0.011   1.000
C6   -0.095   0.280  -0.348  -0.378  -0.009   1.000
C7    0.171  -0.122   0.288   0.086   0.193   0.002   1.000
C8    0.219   0.242  -0.380  -0.227  -0.551   0.324  -0.082   1.000
C9    0.518   0.238   0.002   0.082  -0.015   0.304   0.347   0.013   1.000
C10   0.299   0.568   0.165  -0.122  -0.106  -0.169   0.243   0.014   0.352   1.000

This type of table is called a correlation matrix.
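The same exercise can be done in Python (a sketch; NumPy stands in for the unnamed statistics program, and because the data are random the coefficients will not match the table above):

import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(20, 10))   # 20 cases (rows) x 10 variables (columns)

# With rowvar=False, columns are treated as variables, giving a 10 x 10 matrix.
R = np.corrcoef(data, rowvar=False)

# Print only the lower triangle, as the output above does.
names = [f"C{i + 1}" for i in range(10)]
for i, name in enumerate(names):
    cells = "  ".join(f"{R[i, j]:6.3f}" for j in range(i + 1))
    print(f"{name:>3}  {cells}")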
The matrix lists the variable names (C1-C10) down the first column and across the first row. The diagonal of a correlation matrix (i.e., the numbers that go from the upper left corner to the lower right) always consists of ones. That's because these are the correlations between each variable and itself (and a variable is always perfectly correlated with itself). This statistical program only shows the lower triangle of the correlation matrix. In every correlation matrix there are two triangles: the values below and to the left of the diagonal (lower triangle) and above and to the right of the diagonal (upper triangle). There is no reason to print both triangles because the two triangles of a correlation matrix are always mirror images of each other (the correlation of variable x with variable y is always equal to the correlation of variable y with variable x). When a matrix has this mirror-image quality above and below the diagonal we refer to it as a symmetric matrix. A correlation matrix is always a symmetric matrix.

To locate the correlation for any pair of variables, find the value in the table at the row and column intersection for those two variables. For instance, to find the correlation between variables C5 and C2, I look for where row C2 and column C5 is (in this case it's blank because it falls in the upper triangle area) and where row C5 and column C2 is and, in the second case, I find that the correlation is -.166.

OK, so how did I know that there are 45 unique correlations when we have 10 variables? There's a handy, simple little formula that tells how many pairs (e.g., correlations) there are for any number of variables:

N(N - 1)/2

where N is the number of variables. In the example, I had 10 variables, so I know I have (10 * 9)/2 = 90/2 = 45 pairs.

Other Correlations
The specific type of correlation I've illustrated here is known as the Pearson Product Moment Correlation. It is appropriate when both variables are measured at an interval level. However, there are a wide variety of other types of correlations for other circumstances. For instance, if you have two ordinal variables, you could use the Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau). When one measure is a continuous interval-level one and the other is dichotomous (i.e., two-category), you can use the Point-Biserial Correlation. For other situations, consult the web-based statistics selection program, Selecting Statistics.
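These alternatives are all available in SciPy; here is a brief sketch (the scipy.stats function names are real, but the data and everything else are assumed for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=30)
y = x + rng.normal(size=30)           # a related interval-level variable
group = rng.integers(0, 2, size=30)   # a dichotomous (two-category) variable

rho, p_rho = stats.spearmanr(x, y)            # Spearman rank order (rho)
tau, p_tau = stats.kendalltau(x, y)           # Kendall rank order (tau)
rpb, p_rpb = stats.pointbiserialr(group, x)   # point-biserial

print(f"Spearman rho   = {rho:+.3f} (p = {p_rho:.3f})")
print(f"Kendall tau    = {tau:+.3f} (p = {p_tau:.3f})")
print(f"Point-biserial = {rpb:+.3f} (p = {p_rpb:.3f})")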