Excel--Correlation Analysis

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Recall that Correlation Analysis measures the strength/degree of linear relationship (if any) between two
quantitative measurement variables. This handout will explain how to conduct correlation analysis in Microsoft
Excel 2013.
NOTE: This handout assumes that the Analysis ToolPak "Add-In" has been activated in Excel. An "Add-In" is an
extra feature of Excel that is not active by default, so you must activate it. (To activate the Analysis ToolPak
"Add-In", start Excel, go to the File tab at the top of the Excel window, select "Options" on the left, and then
select "Add-Ins" on the left. Next, at the bottom of the window, in the box to the right of "Manage", select
"Excel Add-ins" and click the "Go" button. Check the box beside "Analysis ToolPak" in the pop-up window, then
click the "OK" button. After doing this, you might need to restart Excel to activate the Add-In.)
The ProcCorrData.xls Dataset
This handout will use the ProcCorrData.xls dataset as an example. Go to the Handouts page of the ECN377
website and download the ProcCorrData.xls dataset to the ECN377 folder on the C: drive of your computer. The
ProcCorrData.xls dataset contains data on 9 variables for a random sample of 45 North Carolina counties (out of
the total population of 100 North Carolina counties), as described in the table below:
Variable Name    Variable Definition
CntyName         Name of county in North Carolina
PopCens          Population in county in year 2000
LandArea         Land area (square miles) in county in year 2000
PM10Area         Air pollution index (estimated emissions in tons of air pollution particles less than
                 10 micrometers in size) for county in year 2000
HousingUnits     Total of houses, apartments, mobile homes, etc., in county in year 2000
EmpManf2000      Manufacturing employment in county in year 2000
VehRegs          Number of cars and trucks registered (owned and located) in county in year 2000
PavedMiles       Number of miles of paved roads in county in year 2000
MeanFamInc       Average (mean) household income in county in year 2000
Open the ProcCorrData.xls Dataset in Excel
The first four rows should look like this:
Conducting a Correlation Analysis on Two Variables
With the ProcCorrData.xls dataset open in Excel:
• Select the Data tab at the top of the Excel window.
• Select "Data Analysis" (on the right). (Note: If you don't have a "Data Analysis" button, then you need
  to add the Analysis ToolPak Add-In to Excel. See above.)
• Select "Correlation" and click "OK." You will see the "Correlation" pop-up window.
• Click inside the "Input Range" box, and then select cells B1 to C46. This tells Excel which data to use
  for the correlation analysis. In this case, we are conducting a correlation analysis for variables PopCens
  and LandArea. NOTE: The two variables that you want to analyze need to be in adjacent columns.
  If they are not, move the columns so that they are.
• Check the box "Labels in First Row," because the labels for our variables are in the first row of the
  spreadsheet. (If the first row of the spreadsheet did not contain the variable names, but instead simply
  gave the first row of data, then we would not check the "Labels in First Row" box.)
• Select the button beside "Output Range," and then click inside the box to the right of "Output Range."
  Then, click on an empty cell in the spreadsheet, say, cell K2. The cell needs to have other blank cells
  below it and to the right of it. This is the area on the spreadsheet where Excel will put the results.
• Click the "OK" button.
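
For readers who want to cross-check Excel's output outside the spreadsheet, the calculation behind the Correlation tool can be sketched in Python. This is a minimal sketch and not part of the original handout: the function name pearson_r and the five data pairs are made up for illustration and are not taken from ProcCorrData.xls.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT from ProcCorrData.xls)
pop = [10.0, 20.0, 30.0, 40.0, 50.0]
area = [200.0, 180.0, 260.0, 240.0, 300.0]
r = pearson_r(pop, area)
print(round(r, 4))
```

The sketch assumes neither variable is constant; a column with no variation would make the denominator zero.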
Results



Excel puts the results of the correlation analysis on the spreadsheet, starting in cell K2, and then working
down and to the right of cell K2.
The results are presented in the form of a Correlation Matrix. (See the Correlation handout for a
description of a Correlation Matrix.) In this example, the correlation between variables PopCens and
LandArea is r = 0.292234 . Excel gives you the results for only the lower half of the Correlation Matrix,
because the numbers in the upper half of the matrix are the same as the numbers in the lower half.
Notice that, unlike SAS, Excel does not give you the p-value (or the t-test value) for the hypothesis test
H0: ρ=0 vs. H1: ρ≠0. So, if you accept the r value that Excel produces, you are assuming that H1: ρ≠0 is
true, without testing it. That can be risky. If you want, you can conduct the hypothesis test H0: ρ=0 vs.
H1: ρ≠0 on the side, on scratch paper, using the r value produced by Excel in the t-test formula

$$t_{test} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$

as described in the Correlation handout. You would need to compare the $t_{test}$ number with a $t_{critical}$
number from a t-table, using the significance level (α) you want, and using degrees of freedom d.f. = n – 2.
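
The scratch-paper calculation can also be sketched in Python. The snippet below plugs the r value reported in this handout (0.292234) and the sample size (n = 45 counties) into the t-test formula; the function name t_statistic is an assumption made for illustration.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: rho = 0, which has n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.292234   # correlation of PopCens and LandArea reported by Excel
n = 45         # number of counties in the sample
t_test = t_statistic(r, n)
df = n - 2
print(f"t = {t_test:.3f}, d.f. = {df}")   # prints: t = 2.004, d.f. = 43
```

The result still has to be compared with a critical value from a t-table. For a two-tailed test at α = 0.05 with 43 degrees of freedom, standard tables give a critical value of roughly 2.02, which would put this correlation just short of significance at that level; check your own t-table for the significance level you choose.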
Conducting a Correlation Analysis for More than Two Variables


Back in Excel, follow the same steps as described above for the analysis of two variables, except select
cells B1 to I46 instead of cells B1 to C46.
The results for all of the selected variables are presented in a Correlation Matrix. Again, only the results
for the lower half of the matrix are presented, because the results in the upper half of the matrix are the
same as the results in the lower half.
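
The lower-half layout of the Correlation Matrix can be illustrated with a short Python sketch. The function names and the three small variables (VarA, VarB, VarC) are hypothetical and are not drawn from ProcCorrData.xls; the point is only to show why each row needs entries up to the diagonal and no further.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lower_corr_matrix(columns):
    """columns: dict mapping variable name -> list of values.
    Returns (names, rows), where row i holds correlations with variables 0..i."""
    names = list(columns)
    rows = []
    for i, row_name in enumerate(names):
        # Fill only columns 0..i; the upper half would repeat these values
        rows.append([pearson_r(columns[row_name], columns[names[j]])
                     for j in range(i + 1)])
    return names, rows

# Hypothetical values for illustration only
data = {
    "VarA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "VarB": [2.0, 1.0, 4.0, 3.0, 5.0],
    "VarC": [5.0, 3.0, 4.0, 1.0, 2.0],
}
names, matrix = lower_corr_matrix(data)
for name, row in zip(names, matrix):
    print(name.ljust(6), "  ".join(f"{v:7.4f}" for v in row))
```

Each diagonal entry is 1 because every variable is perfectly correlated with itself, and the entry for (VarB, VarA) equals the one for (VarA, VarB), which is why only one half of the matrix is shown.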