EXCEL NOTES - Department of Statistics and Analytical Sciences

<This is a chapter from a larger reference manual. The dataset
referenced throughout this document (Widgeone.xls) can be found at
http://www.science.kennesaw.edu/~jpriestl/STAT3010/stat_3010_datasets.htm>
Notes for EXCEL
What is EXCEL?
Microsoft’s EXCEL has become the standard for basic data analysis. And, again,
individuals with a college education in the 21st century will be expected to have a
working knowledge of this foundational package. EXCEL is critical to understand not
only because it facilitates basic data analysis, but also because it is typically the
starting point for PC-based data which can then be analyzed using more
sophisticated packages like SPSS, Minitab or SAS.
When you open EXCEL, the interface includes rows and columns, with “cells” at the
intersections. You can input data or formulas into the individual cells.
Here is a screen shot of a blank EXCEL page:
The cursor in this page is in cell “F10”.
If you needed to enter data into a new spreadsheet, you would simply type the data
values into each cell, with labels in row 1. EXCEL will accept most characters – letters
and numbers – in the cells. However, only numbers (with a few exceptions) can be
subjected to the kinds of analysis outlined in Chapter 2.
At this point, we need to access the WidegeOne.xls dataset in EXCEL. To access the
dataset, click on “File” and then “Open”. At this point, a Microsoft explorer box will
pop up. Go to the folder or drive where you have saved the WidegeOne.xls file:
Note that this explorer box is looking for
EXCEL files. If you need to change the
file type, click on the drop down menu.
Once you have opened the WidegeOne.xls file in EXCEL, you should see this:
Recall from the initial description of the dataset, that there are three worksheets in
the file. The Plant_Survey sheet is currently open. We will be executing most of our
analysis in this sheet. However, if you click on one of the two other tabs – Employees
or Attendance – you will move to one of those two sheets.
Return to the Plant_Survey sheet and we will begin to execute the six statistical
concepts from Chapter 2.
3.1 Using EXCEL for: Measurements of Central Tendency
The three measurements of central tendency can be executed in EXCEL using preprogrammed formulas. Notice in the screen shot from the previous page there is a
circle around a button with an “fx”. Click on this button. You will see this box:
Functions in EXCEL are organized into categories, based upon
different specializations. We will be using functions in the
“Statistical” category.
Go ahead and click on the “Statistical” category. You will see this box:
As you scroll through this box, you will see a
wide variety of statistical functions.
Click on “Cancel” and go back to the dataset. Before we perform any analysis, let’s
insert an additional column, where we will insert the labels “Mean”, “Median” and
“Mode”.
To do this, first place your cursor on the “A” in the first column and click, so that the
entire column is highlighted. Now, click on Insert → Column.
At this point, the entire dataset should have shifted to the right, and the “new”
column A is blank:
Now, go to the bottom of the dataset to cell “A43”. In cells A43, A44 and A45, type
“Mean”, “Median” and “Mode”, respectively:
Not all variables will lend themselves to these calculations…remember that we only
execute mean and median calculations on quantitative variables. So, it would be
helpful if we could see the column headers to remind us what is in each column. This
can be done using a “split screen”. Go to cell A2 (the row just below the headings)
and then click on WindowSplit.
At this point you should see this:
Now, you can use the toggle bar on the right to scroll back to your labels, and still
see the column headers.
For which columns should we report the measurements of central tendency? The
quantitative values include JOBGRADE, SOCREL (social relations score), YRONJOB
(number of years on the job), PRDCTY (Productivity) and JOBSAT (job satisfaction).
The calculation of the mode for the qualitative variables (PLANT, GENDER and
POSITION) will be addressed below.
Move your cursor to position F43. This is where we will place the mean for the
JOBGRADE variable. With your cursor in this cell, click on the fx button. From the
dialogue box, select “Statistical”. From the list of function names, click on the
second entry “AVERAGE”.
You will see this:
This dialogue box is effectively asking “for what array of numbers do you want to
calculate an average”? EXCEL is pretty clever. You may already have the array
populated in the first field (Number 1). For the JOBGRADE variable, this will be cell F2
through cell F41. If it is not already populated for you, simply click on the little
spreadsheet button and highlight the cells F2 through F41. Note that cell F42 is
empty. If it is included, it will be ignored. However, if there was a “0” in cell F42, it
would be included…and a different mean would be calculated. It is always best to
only include the relevant cells in your calculations. After you have selected cells F2
through F41 as the array for the mean calculation, click OK. You should now see 6.6.
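The difference between an empty cell and a cell containing 0 can also be sketched outside EXCEL. The short Python example below is only an illustration under stated assumptions: the values are made up (they are not the JOBGRADE column), `None` stands in for an empty cell, and `average_like_excel` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical column values; None plays the role of an empty cell.
cells_with_blank = [6, 7, 8, None]
cells_with_zero = [6, 7, 8, 0]


def average_like_excel(cells):
    """Average the numeric cells, skipping empty ones."""
    numbers = [c for c in cells if c is not None]
    return mean(numbers)


mean_ignoring_blank = average_like_excel(cells_with_blank)  # (6 + 7 + 8) / 3
mean_including_zero = average_like_excel(cells_with_zero)   # (6 + 7 + 8 + 0) / 4
print(mean_ignoring_blank, mean_including_zero)
```

The empty cell leaves the mean untouched, while the 0 is treated as a real observation and pulls the mean down.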
Now, let’s copy this function across to column J. With your cursor in cell F43, go to
Edit → Copy, then highlight cells G43 through J43. Go to Edit → Paste.
You should now see this:
To populate the Median cells, we will use the same process. Place your cursor in cell
F44 and click on the function button. From the Statistical functions, select “MEDIAN”
and select the same array – F2:F41. Click OK. Copy and paste the function in cell
F44, across to cell J44. You should now see this:
Although it is not typically used as the best measurement of central tendency of
quantitative data, you can provide the mode for these variables using the same
process – MODE is a function listed in the Statistical category of functions.
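For readers who want to check the three measurements outside EXCEL, the following minimal Python sketch computes the same statistics with the standard library. The ten job grades are hypothetical values chosen for illustration; they are not taken from the WidegeOne.xls file.

```python
from statistics import mean, median, mode

# Hypothetical job grades for ten employees (not the WidegeOne data).
job_grades = [5, 6, 6, 7, 7, 7, 8, 8, 9, 4]

grade_mean = mean(job_grades)      # counterpart of the AVERAGE function
grade_median = median(job_grades)  # counterpart of the MEDIAN function
grade_mode = mode(job_grades)      # counterpart of the MODE function

print("Mean:  ", grade_mean)
print("Median:", grade_median)
print("Mode:  ", grade_mode)
```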
3.2 Using EXCEL for: Measurements of Dispersion
Recall from Chapter 2, the most commonly used measurements used to describe the
dispersion of a variable include the standard deviation and the frequency table.
The standard deviation will be calculated in EXCEL using the function button.
Returning to the WidegeOne.xls dataset, enter a label for the “standard deviation”
below the measurements of central tendency:
You probably noticed that the words “Standard Deviation” do not fit neatly into cell
A47 – they spilled over into B47 and C47. Remember that what you see in EXCEL is
not necessarily what EXCEL “sees”. In reality, cells B47 and C47 are still empty from
EXCEL’s perspective. But, this looks a little “untidy”. There are several ways to tidy
this. We can expand column A until the words are visually contained within the
column. This is accomplished by aligning the cursor between the “A” and the “B” at
the top of the spreadsheet until the cursor looks like this
and then double
clicking. Column A will widen enough to accommodate the longest string of
characters in the column…in this case “Standard Deviation”.
A second method of accommodating the text is by “wrapping” the text into the cell.
This is accomplished by selecting Format → Cells → Wrap text:
To wrap the text, ensure that this option is
checked.
After the text has been wrapped, you can then slightly widen the columns or narrow
the rows (using the same process as for the columns), as needed.
Once the label has been established, select the function button, then within the
Statistical category, select the “STDEV” option and the same range as before –
F2:F41 – and click OK. You should see “ 1.54919334”. This is the standard deviation of
the JOBGRADE variable. As before, copy this formula across to column J.
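EXCEL's STDEV function treats the range as a sample and divides by n - 1. The sketch below shows the same distinction in Python, where `stdev` is the sample version and `pstdev` is the population version. The eight values are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
from statistics import pstdev, stdev

# Hypothetical values; the mean is 5 and the squared deviations sum to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = stdev(values)       # divides by n - 1, like STDEV
population_sd = pstdev(values)  # divides by n

print(f"Sample standard deviation:     {sample_sd:.4f}")
print(f"Population standard deviation: {population_sd:.4f}")
```

Because we usually work with samples rather than whole populations, the n - 1 version is the one reported throughout this chapter.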
You should now see this:
We now have the basic descriptive statistics for the quantitative variables. You may
notice that some of the values have no decimal points, some have one decimal
point, some have many decimal points. We think this looks a little untidy (as
Statisticians, we like things to be “tidy”). To make this spreadsheet look a bit more
professional, let’s format all of the data to have a consistent number of decimal
points. To do this, click on the cell in the far upper left corner – as circled above.
This will highlight the entire spreadsheet. Then click on Format → Cells.
Then select the “Number” category as indicated:
Then click OK.
Now, your very tidy spreadsheet should look like this:
Isn’t that better?
In practice, if you need to provide multiple descriptive statistics on a variable, this is
not the process that you would go through. For multiple descriptive statistics, you
would do the following1ToolsData AnalysisDescriptive Statistics.
If you go to Tools and do not see the Data Analysis option…go to the “Add Ins” option under Tools
and select the “Analysis ToolPak”. Then go back into Tools. You should now see the Data Analysis
option. WARNING…if you have an “unauthorized” copy of EXCEL you will not have access to this very
important functionality.
1
This path will bring up the following:
Select the Descriptive Statistics option.
You will then see the following dialogue box:
For the input range, highlight all of the valu
variables, including the column headings.
Ensure that this option is checked.
Ensure that this option is checked.
Now click OK.
You should now see this:
Again, pretty untidy. Format the spreadsheet to have two decimal points for all
values and expand the columns to accommodate the labels.
Your “tidy” version should look like this:
Notice that we reproduced all of the measurements from before, and several more.
This is a more efficient way to produce the descriptive statistics of a variable(s).
In Chapter 2, we presented the concept of a frequency table as another method of
displaying the spread of a dataset. As discussed, frequency tables are one of the
most commonly used methods to display data – understanding how to create a
frequency table is a critical skill. The table created on page 17, was created in
EXCEL. We will reproduce it here.
The first step to creating a frequency table is to determine the categories that need
to be developed for the quantitative variable (this process will effectively transform a
quantitative ratio-scale variable into a qualitative categorical variable). Previously,
we determined that the job tenure variable (YRONJOB) should be categorized into
three levels – less than 5 years, 5-10 years and more than 10 years. Recall that the
categories must be mutually exclusive and collectively exhaustive.
To accommodate these categories in EXCEL, we will create “bins”, where the TOP of
each category identifies each bin.
In our WidegeOne.xls dataset, let’s create a bin range for YRONJOB in column L:
These are the “bins” for the Histogram for
Job Tenure. Category 1 is 0-4.99, Category
2 is 5-10.00 and Category 3 (which does not
need to be entered) is everything above
10.00.
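The idea of identifying each bin by the TOP of its category can be sketched in Python as follows. This is an illustration only: the tenure values are hypothetical, `category` is a made-up helper name, and the sketch assumes that a value equal to a bin top is counted in that bin.

```python
from bisect import bisect_left
from collections import Counter

# Bin tops as entered in the worksheet; the last category needs no entry.
bin_tops = [4.99, 10.00]
labels = ["Less than 5 years", "5-10 years", "More than 10 years"]

# Hypothetical job tenure values (not the YRONJOB column).
years_on_job = [1.5, 4.0, 5.0, 7.25, 10.0, 10.5, 12.0, 3.0]


def category(value):
    """Return the label of the first bin whose top is >= value."""
    return labels[bisect_left(bin_tops, value)]


frequency = Counter(category(v) for v in years_on_job)
for label in labels:
    print(f"{label:20s} {frequency[label]}")
```

Every value lands in exactly one category, which is what "mutually exclusive and collectively exhaustive" requires.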
Once these bins have been created, select Tools → Data Analysis → Histogram:
Click OK. This will bring up a dialogue box, asking for information regarding the
quantitative variable to be analyzed, and the associated Bin Range:
Highlight the range of the YRONJOB variable (including the label)
Highlight the Bin Range (including the label)
Ensure that the Labels option is checked
Now you should see this:
Again…a little untidy…but this is the base of what we need for the frequency table.
Let’s clean this up and add some columns to reproduce the table from Chapter 2.
First, replace the bin titles with the “real” category labels of “Less than 5 years”, “5-10
years” and “More than 10 years”. Second, expand the columns as needed. Third,
“total” the bottom of the frequency column using the “SUM” option – in cell B5, type
“=SUM(B2:B4)” (the SUM function can be found in the Math & Trig category of
functions). Next, create two additional column headers – “Relative Frequency” and
“Cumulative Frequency”. Your sheet should look like this:
The Relative Frequency column will display the percentage of observations in each
category – an important piece of information…particularly when comparing
populations of different sizes. This is done by simply taking each frequency and
dividing it by the total. For example, in cell C2, we would type “=B2/B5”. This would
result in .2250 (9/40). Rather than typing this same formula again and again to
capture the relative frequencies of the next two categories, we would like to copy
this formula into cells C3 and C4. Do this now. Did you get “#DIV/0!”? The problem is
that when the formula =B2/B5 is copied down one cell, it becomes =B3/B6. There is
nothing in cell B6. Since any number divided by 0 is undefined, we receive this error
message. If we want to copy the formula into the cells below, we need to “nail
down” the reference to the “Total” cell and prevent the reference from changing.
To do this, we place a “$” in front of the “B” and another “$” in front of the “5” –
“$B$5” instead of “B5”. This can also be accomplished by placing the cursor in
between the B and the 5 and hitting the F4 button on your computer. Once you
have “nailed down” the Total cell as a reference cell, you can copy the formula into
cells C3 and C4.
The Cumulative Frequency column will display the cumulative percentage of
observations from 0 to the top of the category in question. This is accomplished by
adding the relative frequency of a category to all of the relative frequencies before
it. In EXCEL, we would type “=C2” in cell D2 – the first entry in the Cumulative column
will always equal the first entry in the Relative Frequency column. In cell D3, we
would enter “=D2+C3”. This will add the cumulative value (D2) plus the Relative
Frequency for the category (C3). And…we can now copy this formula into cell D4.
Clearly, this is a lot of “manual” work in EXCEL for a relatively small table. However,
our focus is on helping to build the EXCEL skills necessary to execute this kind of
analysis for any size table or dataset.
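The arithmetic behind the Relative Frequency and Cumulative Frequency columns can be sketched in a few lines of Python. The first count (9 of 40) comes from the example above; the other two counts are assumed for illustration only. The fixed `total` variable plays the role of the “nailed down” reference $B$5.

```python
from itertools import accumulate

# Frequency column. Only the first count (9) is taken from the text;
# the remaining two are assumed for illustration.
frequencies = [9, 15, 16]
total = sum(frequencies)  # counterpart of =SUM(B2:B4)

# Every category is divided by the same fixed total (like $B$5).
relative = [f / total for f in frequencies]

# Running sum of the relative frequencies (like =D2+C3 copied down).
cumulative = list(accumulate(relative))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:3d}  {r:7.2%}  {c:7.2%}")
```

The last cumulative value should always come out at 100%, which is a quick check that no category has been missed.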
You should now have this:
You probably can guess what is next – let’s make it a bit more tidy and presentable.
First, let’s convert the decimals to percentages – since that is the way most people
would expect to see the data. Highlight cells C2 through D5. Then select
Format → Cells → Percentage:
Click OK.
Second, let’s format the text to ensure that it is all the same (right now some text is
italicized and may not be the same font). Click on the cell in the upper left-hand
corner – this will highlight the entire spreadsheet. Select Format → Cells → Font.
Select your preferred font and size (we are particularly fond of Century Gothic).
Finally, let’s get rid of the “border” line between rows 4 and 5. Highlight the entire
dataset again, and select Format → Cells → Border → None. Then go back to your
table, highlight JUST the table, and select Format → Cells → Border:
If you want gridlines in your table (helpful when the table
categories), click on “Outline” and “Inside”.
You can change the appearance of the line and the color of
If you created the gridlines, your table should look something like this:
For nascent users of EXCEL, we understand that this seems like a lot of work. To this
mild protest, we have two points. First - most recipients of your analysis will ONLY see
your tables and/or graphics (next section). So you need to spend as much time
making your analysis look clean and professional as you do ensuring that it is
mathematically and logically correct. Second – as you will see in the subsequent
chapters, some of these executions, which appear awkward in EXCEL are quite easy
in other software applications.
3.3 Using EXCEL for: Visualization/Organization of Univariate Data
In this section, we will provide the steps needed to create Histograms, Pie Charts and
Bar Charts. The Stem and Leaf Plot and the Box Plot as outlined in Chapter 2, while
important, are not easily executed in EXCEL. These visualization tools are however
easily executed in the other software applications and will be addressed in
subsequent chapters.
To reproduce the histogram on page 19, we will follow most of the same process,
which was used to create the frequency table in the previous section. Starting with
the Plant_Survey sheet open, go to the Bins that were developed from the previous
exercise. Let’s create five categories instead of just three:
Remember that when creating bins in EXCEL, we
identify the TOP of each category…and the highest
category does not need to be identified.
Now, just as before, select Tools → Data Analysis → Histogram:
Click OK.
Following the same process as was used to create the frequency table, identify the
Input Range and the Bin Range, and ensure that the Labels box is checked. This
time, also check the “Cumulative Percentage” and “Chart Output” options:
Selecting these two options will convert the
frequency table into a histogram.
Now click OK.
You should see this:
You guessed it…a little untidy. Let’s format a few things on our histogram.
First, as before, let’s change the Bin names to what we really want: Less than 3, 3-6, 7-10, 11-14 and 15+. These changes can be made in the frequency table – the
histogram will be automatically updated because the graphic is linked to the table.
Second, double click on the right axis and go to the “scale” tab. Indicate that the
“Maximum” should be 1.0.
Third, highlight the legend and delete it (it does not really communicate any
meaningful information).
Fourth, double click on the x-axis and format the font as needed (we prefer Century
Gothic). Do the same for the other two axes.
Finally, click on the area of the graphic and then right click. Select Chart Options –
and rename the labels as needed. Your final histogram should look something like
this:
[Figure: histogram titled “Job Tenure for Employees”. x-axis: Years on Job (<3, 3-6, 7-10, 11-14, 15+); left axis: Frequency (0 to 15); right axis: cumulative percentage (0.00% to 100.00%).]
To reproduce the pie chart on page 21, begin by bringing up the sheet, which
contains the frequency chart that you created in the previous section:
This is the Chart Wizard button, which is used to
create most graphics in EXCEL.
Click on the Chart Wizard button. The Chart Wizard will take you through four steps.
The first step is to select the graphic:
After you have selected the Pie chart type, click on Next. The second step is to
identify which data is to be charted.
Assuming your frequency chart looks like the chart on the previous page, you will
indicate cells A1 through B4 – although the actual data is in column B, we need to
include the data labels from column A:
After the data has been correctly identified, click on Next. In the third step, we are
including the appropriate labels for the Pie Chart:
After you have changed the Chart Title, go to the “Data Labels” tab and select
“Show Percent”. Ultimately, what kind of label you select for your Pie Chart is
personal preference…but typically Pie Charts are used to communicate the
percentages of each category.
After you have completed the third step, you can either click “Finish” or click on
“Next” to identify WHERE in the workbook you want to place your Pie Chart.
Your completed Pie Chart should look like the chart on page 21:
[Figure: pie chart titled “Job Tenure”, with slices for Less than 5 years, 5-10 years and More than 10 years, labeled with percentages (22.5%, 37.5% and 40.0%).]
Well done! As noted in Chapter 2, some recipients of your data may be colorblind.
Although EXCEL typically does not place colors such as green, red and brown
together, should you need to override the default colors provided in EXCEL (or
include patterns to accommodate printing in black and white), simply click on the
Pie. Then right click and select “Format Data Series”.
The following box will appear:
You can now go through each “slice” and change the color. If you need to change
the solid colors to patterns, simply select the “Fill Effects…” button.
Bar charts are created using a very similar process. We will create a bar chart of the
same information.
Click on the Chart Wizard button. Select the Bar chart type:
Select Next.
As before, identify the data range, which will include the category names:
Click Next.
In step 3, change the title, axis names and other formatting as needed:
Then click Finish to place the chart in the current worksheet:
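[Figure: bar chart titled “Job Tenure for Employees”, with the job tenure categories on the vertical axis and frequency (0 to 15) on the horizontal axis.]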
3.4 Using EXCEL for: Visualization/Organization of Multivariate Data
In this section, we will provide the steps necessary to create Contingency Tables
(referred to as “Pivot Tables” in EXCEL), Scatter plots and Stacked Bar Charts in
EXCEL.
As stated in Chapter 2, Contingency Tables are one of the most common and useful
methods of communicating the relationships between and among variables in a
dataset. In EXCEL, the tool used to create these tables is particularly useful and very
flexible (this is one of the few examples where EXCEL may rival or outperform the
more sophisticated applications).
To reproduce the Contingency Tables in Chapter 2, return to the Plant_Survey page
of the WidegeOne.xls dataset. Select Data → PivotTable and PivotChart Report. You
should see this:
This dialogue box is the first of three steps in a Wizard. In step one, simply click Next
(we will use the default selections). In step two, select the entire dataset (including
labels):
Now click Next. The third step requires a bit more thought. The first screen in step
three looks like this:
Click on Layout.
You will see this:
We have four possible positions in which to place any of the variables on the right.
Here are all of the variables in our dataset.
Let’s start by reproducing the first Contingency Table in Chapter 2, which provides the
number of individuals by Plant and by Gender.
To do this, drag the “Plant” variable to the Column position and then drag the
Gender variable to the Row position. Then, drag the Gender variable a second time
(notice that after you placed the Gender variable in the Row position, it was still
listed on the right) into the Data position. For now, we will leave the Page position
unpopulated.
You should now have this:
Now click OK and then Finish.
You should now see this:
This is referred to as an EXCEL pivot table. This is the easiest way to determine the
MODE for qualitative variables in EXCEL (e.g., “D” is the MODE for Plant). If we
wanted to convert the counts into percentages (which is typically more meaningful),
we would click on the “Pivot Table” drop down box – circled above. Select Field
Settings, which will result in the following:
The Options page will bring up a dialogue box.
Select the Show Data As drop down menu and select % of row. Then Click OK.
You should now see this:
This provides us with the breakdown of plants by gender. If we need to reverse this,
and report the breakdown of gender by plant, we simply go back to Pivot
Table → Field Settings → Show Data As → % of Columns. This set of executions will
provide us with:
If we want to incorporate an additional piece of information – like the average job
tenure by plant and by gender, we could do this by substituting the YRONJOB
variable in the Data position.
Select Pivot TableWizardLayout. This series of executions will bring us back to the
layout page:
Drag the Count of Gender back to the list on the right and then drag YRONJOB to
the Data position. When the YRONJOB variable is in the Data position, the box will
read “Sum of YRONJOB”. Let’s change the sum to an average.
Double click on the “Sum of YRONJOB” box and select Summarize by → Average.
Then select OK → OK → Finish.
You should now see this:
As before…EXCEL’s output is a little untidy. To format the decimals to be consistent,
highlight the data in the table and then select Format → Cells → Number → OK.
Now you should see this:
Much better! This table now provides information such as “in the Dallas plant,
women have an average of 8.85 years on the job”.
Now you can copy and paste this table into other documents or into another EXCEL
sheet.
As you can see, Pivot Tables are very useful and very flexible. However, because
they are so flexible, they do require a bit of manipulation. Mastering Pivot Tables in
EXCEL is a great differentiating skill, but will require practice (and patience).
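What a Pivot Table computes can be sketched in Python as three small aggregations: counts by row and column, row percentages, and averages by group. The six records below are hypothetical (they are not rows from the Plant_Survey sheet) and the variable names are chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (plant, gender, years on job).
records = [
    ("D", "F", 8.0), ("D", "M", 3.5), ("N", "F", 12.0),
    ("N", "M", 6.0), ("D", "F", 10.0), ("N", "M", 2.0),
]

# Count of Gender, with Gender as the row and Plant as the column.
counts = Counter((gender, plant) for plant, gender, _ in records)

# "% of row": each count divided by the total for its gender.
row_totals = Counter(gender for _, gender, _ in records)
row_percent = {key: n / row_totals[key[0]] for key, n in counts.items()}

# Average job tenure for each gender and plant combination.
tenure = defaultdict(list)
for plant, gender, years in records:
    tenure[(gender, plant)].append(years)
average_tenure = {key: sum(v) / len(v) for key, v in tenure.items()}

for key in sorted(counts):
    print(key, counts[key], f"{row_percent[key]:.1%}", average_tenure[key])
```

Swapping the order of the key from (gender, plant) to (plant, gender) corresponds to switching the variables between the Row and Column positions.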
Stacked bar charts are much easier to create and manipulate. To reproduce the
stacked bar chart on page 29, we will use the first Pivot Table created above that
indicated the frequency counts by gender and by plant. Most graphics in EXCEL are
created using the Chart Wizard – as we did with the pie chart and the bar chart.
Go to the table and copy these cells and paste them into another part of the
spreadsheet:
Count of Gender   Plant
Gender            D        N        Grand Total
F                 13.00    7.00     20.00
M                 10.00    10.00    20.00
Grand Total       23.00    17.00    40.00
This will “disengage” the data from the Pivot Table.
Click on the Chart Wizard button and select the Bar chart type:
Select the second Chart sub-type option. This is the stacked bar chart. Select Next.
Identify the data range – which will NOT include the totals, but WILL include the
labels. Then add titles.
You should see something like this:
[Figure: stacked bar chart titled “Employees in each Plant by Gender”. Bars: N and D; series: F and M; horizontal axis from 0.00 to 25.00.]
Let’s clean this up a bit. We need to change the labels from single letters to the
actual names of the plants and the genders and change the axis to have no
decimals (when dealing with discrete data like people, it’s best not to have any
decimals). First, place your cursor on one of the blue bars and right click. Select
“source data” and then “series”:
Highlight each series and then change the name to “Female” and “Male”.
Click on OK. To change the decimals, double click on the axis and select Number
and then change the number of decimals to 0. Finally, to change the N and the D
to Norcross and Dallas…go back to the original data in the spreadsheet and replace
the N with Norcross and the D with Dallas. See?! The data is integrated into the
graphic!
You should now see this:
[Figure: stacked bar chart titled “Employees in each Plant by Gender”. Bars: Norcross and Dallas; series: Female and Male; horizontal axis from 0 to 25.]
Well Done!
The final visualization in this section is the scatter plot. Scatter plots are typically used
to determine if there is a meaningful relationship between two variables. To
reproduce the scatter plot on page 30, let’s return to the Plant_Survey sheet.
To examine the relationship between Job Productivity and Job Tenure, we will plot
these two variables in a scatter plot. It is important to note that in a scatter plot, we
are NOT trying to establish any causation, only correlation.
We create scatter plots using the same process, which was used for all of the other
graphics – the Chart Wizard.
Click on the chart wizard button and select the Chart type “XY (Scatter)” :
Select Next. In the data range, select the variables PRDCTY and YRONJOB (including
labels). In the Chart Options, create titles and axis labels as appropriate (Productivity
is on the y-axis).
Now, you should have something that looks like this:
[Figure: scatter plot titled “Is Job Tenure related to Productivity?”. x-axis: Job Tenure (0.00 to 20.00); y-axis: Productivity (0.00 to 150.00); legend: PRDCTY.]
Again…pretty untidy. We will do three things to clean up the appearance of this
graph: delete the legend, rescale the y-axis and take away the decimals.
To delete the legend, simply click on it and then hit the delete button on your
keyboard. This not only deletes the legend, which was not meaningful, it also
creates space. The y-axis needs to be rescaled because the data does not actually
start until above 50; there is a great deal of wasted space. To rescale the y-axis,
double click on the y-axis and select scale and type 50 in the box for Minimum and
select OK. To resize the graphic, simply highlight the chart and drag one of the
corners until the graphic is the desired size. Finally, to delete the decimals, double
click on the x-axis. Select the Number tab. Set the number of decimals to 0 and
click OK. Do the same thing for the y-axis.
Your scatter plot should look something like this:
[Figure: scatter plot titled “Is Job Tenure Related to Productivity?”. x-axis: Job Tenure (0 to 20); y-axis: Productivity (50 to 100).]
As highlighted in Chapter 2, we can derive additional information from this graphic
by adding a trendline to the data. To add a trendline, click on the dots in the graph
and then right click. Select Add trendline.
You should see this:
Select the Linear trend and then click on the Options tab.
You should see this:
Ensure that the “Display equation on chart” and “Display R-squared value on chart”
options are selected. Then click OK.
You should see this:
[Figure: scatter plot titled “Is Job Tenure Related to Productivity?” with a linear trendline. x-axis: Job Tenure (0 to 20); y-axis: Productivity (50 to 100). Displayed on the chart: y = -0.5715x + 89.318, R-squared = 0.1124.]
As explained in Chapter 2, this information now provides us with the best linear
equation, which fits the relationship between Productivity (y) and Job Tenure (x). The
R-squared value of .1124 indicates that this is not a particularly strong relationship –
Job Tenure only explains 11.24% of the variation in Productivity. These concepts form
the basis of Regression Modeling.
For a more detailed explanation of Regression Modeling, we recommend Statistical
Methods and Data Analysis by Ott and Longnecker.
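To show where a linear trendline and its R-squared value come from, here is a minimal least-squares sketch in Python. It is an illustration under stated assumptions: the six data points are hypothetical (so the output will not match the equation on the chart), and `linear_trend` is a made-up helper name rather than anything provided by EXCEL.

```python
def linear_trend(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return slope, intercept, r_squared


# Hypothetical job tenure (x) and productivity (y) values.
tenure = [1, 3, 5, 8, 12, 15]
productivity = [90, 88, 85, 86, 80, 82]

slope, intercept, r_squared = linear_trend(tenure, productivity)
print(f"y = {slope:.4f}x + {intercept:.3f}")
print(f"R-squared = {r_squared:.4f}")
```

An R-squared close to 1 means the line accounts for most of the variation in y; a value close to 0 means it accounts for very little.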
3.5 Using EXCEL for: Random Number Generation and Simple Random
Sampling
Our WidegeOne.xls dataset is fairly small – only 40 observations. As a result, it would
be unusual that we would want to extract a sample from such a small dataset.
However, for the purposes of executing the application of random number
generation in EXCEL, let’s assume that we want to randomly select ten individuals with
whom we want to conduct in depth interviews.
Let’s begin by assigning random numbers to each individual. Go back to the
Plant_Survey sheet and create a new column label – RANDOM. Place your cursor in
the first cell under the column label (row 2). Click on the formula button. Ensure that
“ALL” is selected as the Function Category. Scroll down through the Function Names
until you see “RAND”. Select RAND and click OK. This will generate the following:
There are three pieces of information you need to understand from this box:
1. The function takes no arguments – which means that we do not need to
provide any information;
2. The function will return an evenly distributed (uniform distribution) random
number between 0 and 1;
3. The function is volatile – which means that the value returned will change
EVERY time the spreadsheet is manipulated.
Click OK. You should see some number between 0 and 1 in this cell (your result will
be different each time since the random number is generated using your computer’s
internal clock). Remember that EXCEL reads this cell as “=RAND()” – not as the
number that you see. Now copy the formula in this cell down to the bottom of the
dataset. Did you notice that your original number in row 2 changed? This is because
it is volatile. Sometimes we need to have volatile arguments in (not with) EXCEL.
Most of the time we do not. To convert the numbers you see from volatile to stable
(unchanging), highlight the entire column, then select Edit → Copy, then Edit → Paste
Special → Values. Now, you should have a column of unchanging random numbers
(your numbers will be different from ours):
Now, sort the entire dataset on the random numbers just created. Select
DATASORTRANDOM (since the numbers are completely random, it does not
matter if you select Ascending or Descending).
Click OK.
Then, select the first 10 individuals for the interviews. This is a fairly simple, but very
useful process.
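The same three steps (assign a random number to each individual, sort on it, keep the first ten) can be sketched in Python. This is an illustration only: the employee identifiers 1 through 40 and the seed value are assumptions, and the seed is there simply to make the example repeatable.

```python
import random

# A seeded generator so the example is repeatable (the seed is arbitrary).
rng = random.Random(2024)

# Hypothetical identifiers for the 40 individuals in the dataset.
employee_ids = list(range(1, 41))

# Step 1: assign each individual a uniform random number between 0 and 1.
# Stored in a dictionary, the numbers no longer change (no volatility).
random_keys = {emp: rng.random() for emp in employee_ids}

# Step 2: sort the individuals on their random numbers.
sorted_ids = sorted(employee_ids, key=random_keys.get)

# Step 3: the first 10 individuals form the simple random sample.
interviewees = sorted_ids[:10]
print(interviewees)
```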
3.6 Using EXCEL for: Confidence Intervals
The penultimate section in this chapter will aid in the calculation of Confidence
Intervals – one of the most commonly used techniques in Inferential Statistics. Recall
from Chapter 2 that the formula used to estimate a two-sided confidence interval for a
population mean is x ± Z * (s/SQRT(n)), where
x = the sample mean;
Z = the number of standard deviations, using the sampling distribution and the
Central Limit Theorem, associated with the established confidence level:
90% confidence = 1.645
95% confidence = 1.96
99% confidence = 2.575
s = the sample standard deviation;
n = the number of elements in the sample.
The portion of this formula which occurs after the ± is referred to as the “Margin of
Error”. EXCEL will calculate this portion of the formula for us. Then, we will add and
subtract this from the mean to report the complete interval.
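As a side note, the Z values listed above can be recovered from the standard normal distribution. The short Python sketch below is an illustration only; the helper name z_for_confidence is our own and does not come from EXCEL. For a two-sided interval, the Z value is the point that leaves alpha/2 in each tail.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Return the two-sided Z value for a confidence level such as 0.95."""
    alpha = 1 - level
    # Two-sided interval: alpha/2 sits in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.3f}")
```

The 99% value comes out slightly above the 2.575 shown in the list; the difference is only a matter of rounding and has no practical effect on the interval.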
From Chapter 2 - Let’s assume that the WidgeOne.xls dataset is a representative
sample of a larger manufacturing firm with hundreds of employees in Norcross, GA
and Dallas, TX (if we had access to the entire organization’s data, we would not
calculate confidence intervals of any population parameter – we would report the
descriptive statistics). Let’s also assume that the HR department at WidgeOne has
been charged with understanding the level of job satisfaction among employees.
For cost reasons, they were unable to survey the entire organization, so they
surveyed the 40 employees in our dataset. Report the job satisfaction for all
WidgeOne employees, using the sample of 40. Use a 95% level of confidence.
Go back to the Plant_Survey sheet. We previously calculated the mean job
satisfaction to be 6.85 and the standard deviation to be 1.02. Using this information,
we can use EXCEL to compute the confidence interval.
To execute this computation, go to a blank portion of the spreadsheet and click on
the function button. Ensure that the Statistical function category is selected and then
scroll through the function names until you get to the CONFIDENCE function. Click
OK.
After you click OK, you will see this dialogue box:
Alpha is 1 minus the confidence level.
The standard deviation would have been previously calculated.
Size is the sample size (in this case, 40).
If we are computing a 95% confidence interval, we would enter “.05” for the alpha
value (you can think of alpha as the probability you are willing to accept of being
wrong). The standard deviation, which was computed previously for job satisfaction,
was 1.022. The (sample) size is 40.
Once this information is entered, the resulting computation should be .32. This is the
margin of error for job satisfaction at a confidence level of 95%. You would then
subtract this from, and add it to, the mean (6.85) to create the full interval. The full
interval would then be reported as: “Based on a representative sample of 40 employees,
we are 95% confident that job satisfaction among all employees is estimated to be
between 6.53 and 7.17”.
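To check the arithmetic by hand, the computation can be sketched in Python. This is an illustration under stated assumptions, not EXCEL’s own code: the function name confidence_margin is ours, and it assumes the margin of error is the normal-distribution Z value times s/SQRT(n), as in the formula from Chapter 2. The inputs (alpha of .05, standard deviation of 1.022, size of 40 and mean of 6.85) are the values used in the text.

```python
from math import sqrt
from statistics import NormalDist


def confidence_margin(alpha, standard_dev, size):
    """Margin of error for a mean: Z * (s / SQRT(n)), using the normal distribution."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha is 0.05
    return z * standard_dev / sqrt(size)


mean = 6.85  # mean job satisfaction from the Plant_Survey sheet
margin = confidence_margin(alpha=0.05, standard_dev=1.022, size=40)
lower, upper = mean - margin, mean + margin

print(f"margin of error: {margin:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Rounded to two decimals, the margin and the interval endpoints agree with the .32, 6.53 and 7.17 reported above.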
An important note in spreadsheet development: you could enter “1.02” in this box or enter the cell reference – J47. You would generate the same
answer. However, you are almost ALWAYS better off entering the cell reference rather than “hard coding” a number. This makes the formula more
“portable”.
3.7 EXCEL Lagniappe
We could spend hundreds of pages providing EXCEL Lagniappe. Don’t worry…we
won’t. As you may have already discerned, EXCEL is an incredibly underutilized data
analysis package. Most EXCEL users don’t use the majority of the functionality that is
available. As you continue to further develop your EXCEL skills, you will find more and
more functionality not explored here.
In terms of extra interesting EXCEL shortcuts, we would like to share three:
1. Transposing Columns – Occasionally you may find that you need to have the
columns in a dataset rearranged. For example, from the WidgeOne.xls
dataset, you may want to create a scatter plot of Productivity (PRDCTY) and
Social Relations score (SOCREL). Recall that when we selected the chart
wizard, and selected the data range, we had to indicate a single range.
What if the variables are not next to each other? To move PRDCTY next to
SOCREL, first highlight the PRDCTY column by clicking on the column letter.
Then press Ctrl and x. Then highlight the SOCREL column by clicking on the
column letter. Now hold Ctrl and Shift down together then press +. Cool huh?
2. Highlighting data – In the WidgeOne.xls dataset, we only have 40
observations. So, highlighting all of the observations is not particularly difficult.
But, what if we had 4,000 or 400,000 observations? Highlighting all of the rows
might take some time. To highlight an entire column(s) of data, position your
cursor at the top of the dataset. Press down the Ctrl and Shift buttons together.
With both buttons pressed, hit the down arrow key. This will highlight
all of the data to the end (or to the first blank cell in the column).
3. Autofill – Often we need to create a pattern in EXCEL. Examples might include
days of the week, months of the year, a consecutive series of numbers or a
geometric series (2,4,8,16…). If you can provide EXCEL with the first few entries
in a common pattern, it will then “autofill” the remaining entries.
For example, go into a clean EXCEL spreadsheet. Type “Jan” in cell A1 and then
“Feb” in cell B1. Now, highlight these two cells. Place your cursor on the little square
“handle” on the bottom right corner:
Drag this handle 10 spaces to the right. You now have the months of the year!
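The idea behind autofill can be sketched in a few lines of Python. This is only an illustration of extending a known pattern; it is not how EXCEL implements the feature, and the function name autofill_months and the MONTHS list are our own.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def autofill_months(seed, extra):
    """Extend a run of consecutive month abbreviations by `extra` more entries."""
    start = MONTHS.index(seed[0])
    # Confirm that the seed cells really are consecutive months.
    for offset, name in enumerate(seed):
        if MONTHS[(start + offset) % 12] != name:
            raise ValueError("seed is not a run of consecutive months")
    total = len(seed) + extra
    # Wrap around after Dec, so the pattern can continue past one year.
    return [MONTHS[(start + i) % 12] for i in range(total)]


# "Jan" and "Feb" typed into A1 and B1, then the handle dragged 10 cells right.
row = autofill_months(["Jan", "Feb"], 10)
print(row)
```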