EXCEL NOTES - Department of Statistics and Analytical Sciences

<This is a chapter from a larger reference manual. The dataset referenced throughout this document (WidgeOne.xls) can be found at http://www.science.kennesaw.edu/~jpriestl/STAT3010/stat_3010 _datasets.htm>

Notes for EXCEL

What is EXCEL?

Microsoft’s EXCEL has become the standard for basic data analysis. And, again, individuals with a college education in the 21st century will be expected to have a working knowledge of this foundational package. EXCEL is critical to understand, not only because it facilitates basic data analysis, but also because it is typically the starting point for PC-based data, which can then be analyzed using more sophisticated packages like SPSS, Minitab or SAS.

When you open EXCEL, the interface consists of rows and columns, with “cells” at their intersections. You can enter data or formulas into the individual cells. Here is a screenshot of a blank EXCEL page:

The cursor in this page is in cell “F10”. If you needed to enter data into a new spreadsheet, you would simply type the data values into each cell, with labels in row 1. EXCEL will accept most characters, letters as well as numbers, in the cells. However, only numbers (with a few exceptions) can be subjected to the kinds of analysis outlined in Chapter 2.

At this point, we need to open the WidgeOne.xls dataset in EXCEL. Click on “File” and then “Open”. An Explorer box will pop up. Go to the folder or drive where you have saved the WidgeOne.xls file:

Note that this Explorer box is looking for EXCEL files. If you need to change the file type, click on the drop-down menu. Once you have opened the WidgeOne.xls file in EXCEL, you should see this:

Recall from the initial description of the dataset that there are three worksheets in the file. The Plant_Survey sheet is currently open. We will be executing most of our analysis in this sheet.
However, if you click on one of the two other tabs, Employees or Attendance, you will move to one of those two sheets. Return to the Plant_Survey sheet, and we will begin to execute the six statistical concepts from Chapter 2.

3.1 Using EXCEL for: Measurements of Central Tendency

The three measurements of central tendency can be executed in EXCEL using preprogrammed functions. Notice that in the screenshot on the previous page there is a circle around a button labelled “fx”. Click on this button. You will see this box:

Functions in EXCEL are organized into categories based upon different specializations. We will be using functions in the “Statistical” category. Go ahead and click on the “Statistical” category. You will see this box:

As you scroll through this box, you will see a wide variety of statistical functions. Click on “Cancel” and go back to the dataset.

Before we perform any analysis, let’s insert an additional column, where we will enter the labels “Mean”, “Median” and “Mode”. To do this, first click on the “A” at the top of the first column, so that the entire column is highlighted. Now, click on Insert → Columns. The entire dataset should have shifted to the right, and the “new” column A is blank:

Now, go to the bottom of the dataset, to cell A43. In cells A43, A44 and A45, type “Mean”, “Median” and “Mode”, respectively:

Not all variables lend themselves to these calculations…remember that we only calculate the mean and median for quantitative variables. So, it would be helpful to see the column headers to remind us what is in each column. This can be done using a “split screen”. Go to cell A2 (the row just below the headings) and then click on Window → Split. You should now see this:

Now, you can use the scroll bar on the right to scroll down to your labels and still see the column headers. For which columns should we report the measurements of central tendency?
The quantitative variables are JOBGRADE, SOCREL (social relations score), YRONJOB (number of years on the job), PRDCTY (productivity) and JOBSAT (job satisfaction). Calculating the mode for the qualitative variables (PLANT, GENDER and POSITION) will be addressed below.

Move your cursor to cell F43. This is where we will place the mean of the JOBGRADE variable. With your cursor in this cell, click on the fx button. From the dialogue box, select “Statistical”. From the list of function names, click on the second entry, “AVERAGE”. You will see this:

This dialogue box is effectively asking, “For what array of numbers do you want to calculate an average?” EXCEL is pretty clever: the array may already be populated in the first field (Number1). For the JOBGRADE variable, this will be cells F2 through F41. If it is not already populated for you, simply click on the small spreadsheet button and highlight cells F2 through F41. Note that cell F42 is empty; if it is included, it will be ignored. However, if there were a “0” in cell F42, it would be included…and a different mean would be calculated. It is always best to include only the relevant cells in your calculations.

After you have selected cells F2 through F41 as the array for the mean calculation, click OK. You should now see 6.6. Now, let’s copy this function across to column J. With your cursor in cell F43, go to Edit → Copy, then highlight cells G43 through J43 and go to Edit → Paste. You should now see this:

To populate the Median cells, we will use the same process. Place your cursor in cell F44 and click on the function button. From the Statistical functions, select “MEDIAN” and select the same array, F2:F41. Click OK. Copy the function in cell F44 and paste it across to cell J44.
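The point about an empty F42 versus a “0” in F42 is easy to check outside EXCEL. The Python sketch below is only an illustration, not part of the EXCEL workflow: it assumes a worksheet range can be modelled as a list in which None stands for an empty cell, and the helper names and job-grade values are hypothetical. It follows EXCEL’s documented behaviour for AVERAGE and MEDIAN, where empty cells, text and TRUE/FALSE values in a referenced range are skipped but zeros are kept.

```python
from statistics import mean, median


def numeric_cells(cells):
    """Keep only the cells AVERAGE/MEDIAN would use from a referenced range.

    None models an empty cell; text and booleans are skipped as well,
    but a zero is a real number and is kept.
    """
    return [c for c in cells
            if isinstance(c, (int, float)) and not isinstance(c, bool)]


def excel_style_average(cells):
    return mean(numeric_cells(cells))


def excel_style_median(cells):
    return median(numeric_cells(cells))


# Hypothetical job grades; the last cell plays the role of F42.
with_blank = [5, 6, 7, 8, None]   # F42 left empty
with_zero = [5, 6, 7, 8, 0]       # F42 contains a 0

print(excel_style_average(with_blank))  # 6.5
print(excel_style_average(with_zero))   # 5.2
print(excel_style_median(with_blank))   # 6.5
print(excel_style_median(with_zero))    # 6
```

A single stray zero pulls the mean from 6.5 down to 5.2 and shifts the median as well, which is why the text recommends selecting only the relevant cells.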
After the MEDIAN formula has been copied across, you should see this:

Although the mode is not typically the best measurement of central tendency for quantitative data, you can report it for these variables using the same process: MODE is listed in the Statistical category of functions.

3.2 Using EXCEL for: Measurements of Dispersion

Recall from Chapter 2 that the measurements most commonly used to describe the dispersion of a variable are the standard deviation and the frequency table. The standard deviation will be calculated in EXCEL using the function button. Returning to the WidgeOne.xls dataset, enter a label for “Standard Deviation” in cell A47, below the measurements of central tendency:

You probably noticed that the words “Standard Deviation” do not fit neatly into cell A47; they spill over into B47 and C47. Remember that what you see in EXCEL is not necessarily what EXCEL “sees”: in reality, cells B47 and C47 are still empty from EXCEL’s perspective. But this looks a little “untidy”, and there are several ways to tidy it.

First, we can widen column A until the words are visually contained within the column. To do this, position the cursor on the border between the “A” and the “B” at the top of the spreadsheet until it changes to a double-headed arrow, and then double-click. Column A will widen enough to accommodate the longest string of characters in the column…in this case, “Standard Deviation”.

A second method is to “wrap” the text within the cell. Select Format → Cells → Alignment and ensure that the Wrap text option is checked:

After the text has been wrapped, you can widen the columns or adjust the height of the rows (using the same process as for the columns), as needed.

Once the label has been established, click the function button, select “STDEV” within the Statistical category, choose the same range as before, F2:F41, and click OK. You should see 1.54919334.
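STDEV returns the sample standard deviation, which divides the sum of squared deviations by n - 1; its companion STDEVP divides by n instead. The Python sketch below makes the difference concrete. It is an illustration only: the function names are hypothetical and the four values are made up, not taken from the WidgeOne data.

```python
import math


def stdev_sample(values):
    """Sample standard deviation (divide by n - 1), the quantity STDEV reports."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


def stdev_population(values):
    """Population standard deviation (divide by n), as STDEVP reports."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)


grades = [4, 6, 6, 8]  # hypothetical values; mean 6, squared deviations sum to 8
print(round(stdev_sample(grades), 4))      # 1.633  (sqrt(8/3))
print(round(stdev_population(grades), 4))  # 1.4142 (sqrt(8/4))
```

With 40 observations the two versions differ only by a factor of sqrt(40/39), about 1.3%, but STDEV is the appropriate choice when the data are treated as a sample.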
The value 1.54919334 is the standard deviation of the JOBGRADE variable. As before, copy this formula across to column J. You should now see this:

We now have the basic descriptive statistics for the quantitative variables. You may notice that some of the values have no decimal places, some have one and some have many. We think this looks a little untidy (as statisticians, we like things to be “tidy”). To make this spreadsheet look a bit more professional, let’s format all of the data to have a consistent number of decimal places. To do this, click on the blank button in the far upper left corner of the sheet, where the row and column headings meet (circled above). This will highlight the entire spreadsheet. Then click on Format → Cells and select the “Number” category as indicated:

Then click OK. Now, your very tidy spreadsheet should look like this:

Isn’t that better?

In practice, if you need to provide multiple descriptive statistics for a variable, this is not the process you would go through. Instead, you would select Tools → Data Analysis → Descriptive Statistics.¹

¹ If you go to Tools and do not see the Data Analysis option, go to the “Add-Ins” option under Tools and select the “Analysis ToolPak”. Then go back into Tools; you should now see the Data Analysis option. WARNING…if you have an “unauthorized” copy of EXCEL, you will not have access to this very important functionality.

Selecting Tools → Data Analysis will bring up the following:

Select the Descriptive Statistics option. You will then see the following dialogue box:

For the Input Range, highlight all of the quantitative variables, including the column headings. Ensure that the “Labels in First Row” and “Summary statistics” options are checked. Now click OK. You should now see this:

Again, pretty untidy. Format the spreadsheet to show two decimal places for all values and widen the columns to accommodate the labels. Your “tidy” version should look like this:

Notice that we reproduced all of the measurements from before, and several more.
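To see where the extra rows in the Descriptive Statistics output come from, here is a Python sketch that computes a subset of them. It is an illustration under stated assumptions, not a description of how the Analysis ToolPak is implemented: the function name and the five values are hypothetical, and Mode, Kurtosis and Skewness are deliberately left out. Standard Error is the sample standard deviation divided by the square root of the count.

```python
import math
import statistics


def descriptive_summary(values):
    """A subset of the rows in the Descriptive Statistics output (Mode,
    Kurtosis and Skewness are left out of this sketch)."""
    n = len(values)
    sd = statistics.stdev(values)             # sample standard deviation, like STDEV
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),  # standard error of the mean
        "Median": statistics.median(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical values, printed to two decimal places like the "tidy" output.
for label, value in descriptive_summary([4, 6, 6, 8, 9]).items():
    print(f"{label:<20}{value:>8.2f}")
```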
The Descriptive Statistics tool is a more efficient way to produce the descriptive statistics of one or more variables.

In Chapter 2, we presented the frequency table as another method of displaying the spread of a dataset. As discussed, frequency tables are one of the most commonly used methods of displaying data, so understanding how to create one is a critical skill. The table on page 17 was created in EXCEL; we will reproduce it here.

The first step in creating a frequency table is to determine the categories that need to be developed for the quantitative variable (this process effectively transforms a quantitative ratio-scale variable into a qualitative categorical variable). Previously, we determined that the job tenure variable (YRONJOB) should be categorized into three levels: less than 5 years, 5-10 years and more than 10 years. Recall that the categories must be mutually exclusive and collectively exhaustive. To accommodate these categories in EXCEL, we will create “bins”, where the TOP of each category identifies each bin. In our WidgeOne.xls dataset, let’s create a bin range for YRONJOB in column L:

These are the “bins” for the Histogram for Job Tenure. Category 1 is 0-4.99, Category 2 is 5-10.00 and Category 3 (which does not need to be entered) is everything above 10.00. Once these bins have been created, select Tools → Data Analysis → Histogram:

Click OK. This will bring up a dialogue box asking for information about the quantitative variable to be analyzed and the associated Bin Range. For the Input Range, highlight the YRONJOB values (including the label); for the Bin Range, highlight the bins (including the label); and ensure that the Labels option is checked. Now you should see this:

Again…a little untidy…but this is the base of what we need for the frequency table. Let’s clean it up and add some columns to reproduce the table from Chapter 2. First, replace the bin titles with the “real” category labels: “Less than 5 years”, “5-10 years” and “More than 10 years”.
Second, widen the columns as needed. Third, total the frequency column: in cell B5, type “=SUM(B2:B4)” (the SUM function can be found in the Math & Trig category of functions). Next, create two additional column headers, “Relative Frequency” and “Cumulative Frequency”. Your sheet should look like this:

The Relative Frequency column displays the percentage of observations in each category, an important piece of information…particularly when comparing populations of different sizes. It is computed by simply dividing each frequency by the total. For example, in cell C2, we would type “=B2/B5”, which results in .2250 (9/40). Rather than typing the same formula again and again for the next two categories, we would like to copy this formula into cells C3 and C4. Do this now.

Did you get “#DIV/0!”? The problem is that when the formula =B2/B5 is copied down one cell, it becomes =B3/B6. There is nothing in cell B6, so EXCEL treats it as 0, and because division by 0 is undefined, we receive this error message. If we want to copy the formula into the cells below, we need to “nail down” the reference to the “Total” cell and prevent it from changing. To do this, we place a “$” in front of the “B” and another “$” in front of the “5”: “$B$5” instead of “B5”. This can also be accomplished by placing the cursor between the B and the 5 in the formula and pressing the F4 key. Once you have “nailed down” the Total cell as an absolute reference, you can copy the formula into cells C3 and C4.

The Cumulative Frequency column displays the cumulative percentage of observations from 0 up to the top of the category in question. It is computed by adding the relative frequency of a category to all of the relative frequencies before it. In EXCEL, we would type “=C2” in cell D2; the first entry in the Cumulative Frequency column always equals the first entry in the Relative Frequency column.
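The bin rule and the frequency-table arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Histogram tool itself: the function names and the tenure values are hypothetical, and the binning follows the rule described above (a value belongs to the first bin whose TOP is at least the value, and anything above the last top is counted in an extra “More” bin). Dividing every count by one fixed total is the Python counterpart of the $B$5 absolute reference.

```python
from bisect import bisect_left


def excel_histogram(values, bin_tops):
    """Count values the way the Histogram tool does.

    Each bin is named by its TOP value (bin_tops must be in ascending order):
    a value lands in the first bin whose top is >= the value, and anything
    above the last top goes into the extra "More" bin at the end.
    """
    counts = [0] * (len(bin_tops) + 1)
    for v in values:
        counts[bisect_left(bin_tops, v)] += 1
    return counts


def frequency_table(counts):
    """Relative and cumulative frequencies for a list of bin counts."""
    total = sum(counts)          # the "Total" cell (B5 in the text)
    rows, cumulative = [], 0.0
    for c in counts:
        relative = c / total     # always divide by the SAME total, like $B$5
        cumulative += relative   # previous cumulative + this category
        rows.append((c, relative, cumulative))
    return total, rows


# Hypothetical years-on-job values, NOT the WidgeOne data.
tenure = [1, 3, 4.5, 5, 7, 10, 12, 15]
labels = ["Less than 5 years", "5-10 years", "More than 10 years"]
counts = excel_histogram(tenure, [4.99, 10])
total, rows = frequency_table(counts)
for label, (c, rel, cum) in zip(labels, rows):
    print(f"{label:<20}{c:>4}{rel:>9.1%}{cum:>9.1%}")
print(f"{'Total':<20}{total:>4}")
```

Note that a value of exactly 10 falls in the 5-10 years bin, because 10 is that bin’s top.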
To complete the Cumulative Frequency column, enter “=D2+C3” in cell D3. This adds the previous cumulative value (D2) to the relative frequency for the category (C3). We can now copy this formula into cell D4. Clearly, this is a lot of “manual” work in EXCEL for a relatively small table. However, our focus is on building the EXCEL skills necessary to execute this kind of analysis for a table or dataset of any size. You should now have this:

You can probably guess what is next: let’s make the table a bit more tidy and presentable. First, let’s convert the decimals to percentages, since that is the way most people would expect to see the data. Highlight cells C2 through D5, select Format → Cells → Number → Percentage, and click OK.

Second, let’s format the text so that it is all the same (right now some text is italicized and may not be in the same font). Click on the cell in the far upper left corner; this will highlight the entire spreadsheet. Select Format → Cells → Font, and choose your preferred font and size (we are particularly fond of Century Gothic).

Finally, let’s get rid of the “border” line between rows 4 and 5. Highlight the entire dataset again and select Format → Cells → Border → None. Then go back to your table, highlight JUST the table, and select Format → Cells → Border:

If you want gridlines in your table (helpful when the table has several categories), click on “Outline” and “Inside”. You can also change the style and the color of the lines. If you created the gridlines, your table should look something like this:

For nascent users of EXCEL, we understand that this seems like a lot of work. To this mild protest, we have two responses. First, most recipients of your analysis will ONLY see your tables and/or graphics (next section), so you need to spend as much time making your analysis look clean and professional as you do ensuring that it is mathematically and logically correct.
Second, as you will see in the subsequent chapters, some of these executions, which appear awkward in EXCEL, are quite easy in other software applications.

3.3 Using EXCEL for: Visualization/Organization of Univariate Data

In this section, we will provide the steps needed to create Histograms, Pie Charts and Bar Charts. The Stem and Leaf Plot and the Box Plot outlined in Chapter 2, while important, are not easily executed in EXCEL. These visualization tools are, however, easily executed in the other software applications and will be addressed in subsequent chapters.

To reproduce the histogram on page 19, we will follow most of the process that was used to create the frequency table in the previous section. Starting with the Plant_Survey sheet open, go to the bins that were developed in the previous exercise. Let’s create five categories instead of just three:

Remember that when creating bins in EXCEL, we identify the TOP of each category…and the highest category does not need to be identified. Now, just as before, select Tools → Data Analysis → Histogram:

Click OK. Following the same process that was used to create the frequency table, identify the Input Range and the Bin Range, and ensure that the Labels box is checked. This time, also check the “Cumulative Percentage” and “Chart Output” options:

Selecting these two options adds a cumulative percentage column to the frequency table and produces a histogram from it. Now click OK. You should see this:

You guessed it…a little untidy. Let’s format a few things on our histogram. First, as before, let’s change the bin names to what we really want: Less than 3, 3-6, 7-10, 11-14 and 15+. These changes can be made in the frequency table; the histogram will be updated automatically because the graphic is linked to the table. Second, double-click on the right axis and go to the “Scale” tab. Set the “Maximum” to 1.0. Third, highlight the legend and delete it (it does not really communicate any meaningful information).
Fourth, double-click on the x-axis and format the font as needed (we prefer Century Gothic). Do the same for the other two axes. Finally, click on the chart area and then right-click. Select Chart Options, and rename the labels as needed. Your final histogram should look something like this:

[Figure: “Job Tenure for Employees” histogram; Frequency (left axis, 0-15) and cumulative percentage (right axis, 0%-100%) by Years on Job: <3, 3-6, 7-10, 11-14, 15+]

To reproduce the pie chart on page 21, begin by bringing up the sheet that contains the frequency table you created in the previous section:

This is the Chart Wizard button, which is used to create most graphics in EXCEL. Click on the Chart Wizard button. The Chart Wizard will take you through four steps. The first step is to select the type of graphic:

After you have selected the Pie chart type, click on Next. The second step is to identify which data is to be charted. Assuming your frequency table looks like the table on the previous page, you will select cells A1 through B4; although the actual data is in column B, we need to include the data labels from column A:

After the data has been correctly identified, click on Next. In the third step, we include the appropriate labels for the Pie Chart:

After you have changed the Chart Title, go to the “Data Labels” tab and select “Show Percent”. Ultimately, the kind of label you select for your Pie Chart is a matter of personal preference…but Pie Charts are typically used to communicate the percentage in each category. After you have completed the third step, you can either click “Finish” or click “Next” to identify WHERE in the workbook you want to place your Pie Chart. Your completed Pie Chart should look like the chart on page 21:

[Figure: “Job Tenure” pie chart; slices Less than 5 years, 5-10 years and More than 10 years; percentage labels 22.5%, 37.5% and 40.0%]

Well done!

As noted in Chapter 2, some recipients of your data may be colorblind.
Although EXCEL typically does not place colors such as green, red and brown together, should you need to override the default colors provided in EXCEL (or include patterns to accommodate printing in black and white), simply click on the Pie, then right-click and select “Format Data Series”. The following box will appear:

You can now go through each “slice” and change its color. If you need to change the solid colors to patterns, simply select the “Fill Effects…” button.

Bar charts are created using a very similar process. We will create a bar chart of the same information. Click on the Chart Wizard button and select the Bar chart type:

Select Next. As before, identify the data range, which will include the category names:

Click Next. In step 3, change the title, axis names and other formatting as needed:

Then click Finish to place the chart in the current worksheet:

[Figure: “Job Tenure for Employees” horizontal bar chart; categories from <3 to 15+ on the vertical axis, frequency (0-15) on the horizontal axis]

3.4 Using EXCEL for: Visualization/Organization of Multivariate Data

In this section, we will provide the steps necessary to create Contingency Tables (referred to as “Pivot Tables” in EXCEL), Scatter Plots and Stacked Bar Charts in EXCEL. As stated in Chapter 2, Contingency Tables are one of the most common and useful methods of communicating the relationships between and among variables in a dataset. The EXCEL tool used to create these tables is particularly useful and very flexible (this is one of the few examples where EXCEL may rival or outperform the more sophisticated applications).

To reproduce the Contingency Tables in Chapter 2, return to the Plant_Survey sheet of the WidgeOne.xls dataset. Select Data → PivotTable and PivotChart Report. You should see this:

This dialogue box is the first of three steps in a Wizard. In step one, simply click Next (we will use the default selections). In step two, select the entire dataset (including labels):

Now click Next. The third step requires a bit more thought.
The first screen in step three looks like this:

Click on Layout. You will see this:

The layout screen offers four possible positions (Page, Row, Column and Data) in which to place any of the variables listed on the right; those are all of the variables in our dataset.

Let’s start by reproducing the first Contingency Table in Chapter 2, which provides the number of individuals by Plant and by Gender. To do this, drag the Plant variable to the Column position and then drag the Gender variable to the Row position. Then drag the Gender variable a second time into the Data position (notice that after you placed the Gender variable in the Row position, it was still listed on the right). For now, we will leave the Page position unpopulated. You should now have this:

Now click OK and then Finish. You should now see this:

This is referred to as an EXCEL pivot table. It is the easiest way to determine the MODE for qualitative variables in EXCEL (e.g., “D” is the MODE for Plant). If we wanted to convert the counts into percentages (which are typically more meaningful), we would click on the “PivotTable” drop-down box, circled above. Select Field Settings, which will bring up the following:

Clicking Options expands this dialogue box. Select the Show Data As drop-down menu, select % of row, and then click OK. You should now see this:

This provides us with the breakdown of plants by gender. If we need to reverse this and report the breakdown of gender by plant, we simply go back to PivotTable → Field Settings → Show Data As → % of column. This set of executions will provide us with:

If we want to incorporate an additional piece of information, like the average job tenure by plant and by gender, we can do this by substituting the YRONJOB variable in the Data position. Select PivotTable → Wizard → Layout. This series of executions will bring us back to the layout page:

Drag the Count of Gender box back to the list on the right and then drag YRONJOB to the Data position.
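Underneath the drag-and-drop interface, a pivot table like this one is counting records per (row, column) cell and then dividing by row or column totals, or averaging a numeric field within each cell. The Python sketch below mirrors the Count, % of row, % of column and Average settings described above. It is an illustration only: the records are hypothetical (not the WidgeOne data), and the helper names are our own, not part of any EXCEL API.

```python
from collections import Counter, defaultdict

# Hypothetical (plant, gender, years on job) records; NOT the WidgeOne data.
records = [
    ("D", "F", 10), ("D", "F", 6), ("D", "M", 3),
    ("D", "M", 5), ("N", "F", 8), ("N", "M", 12),
]
plants = sorted({p for p, _, _ in records})    # column field
genders = sorted({g for _, g, _ in records})   # row field

# "Count of Gender": one count per (row, column) cell.
counts = Counter((g, p) for p, g, _ in records)


def pct_of_row(g, p):
    """Share of the row total (Show Data As -> % of row)."""
    return counts[(g, p)] / sum(counts[(g, q)] for q in plants)


def pct_of_column(g, p):
    """Share of the column total (Show Data As -> % of column)."""
    return counts[(g, p)] / sum(counts[(h, p)] for h in genders)


# "Average of YRONJOB": mean of the numeric field within each cell.
totals, sizes = defaultdict(float), Counter()
for p, g, years in records:
    totals[(g, p)] += years
    sizes[(g, p)] += 1
averages = {cell: totals[cell] / sizes[cell] for cell in sizes}

for g in genders:
    print(g,
          [counts[(g, p)] for p in plants],
          [f"{pct_of_row(g, p):.0%}" for p in plants],
          [averages.get((g, p)) for p in plants])

# The mode of a qualitative variable is just its most frequent category.
plant_counts = Counter(p for p, _, _ in records)
print("Mode of Plant:", plant_counts.most_common(1)[0][0])
```

The last two lines show why the pivot table is a convenient way to find the mode of a qualitative variable: once the counts exist, the mode is simply the category with the largest count.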
When the YRONJOB variable is in the Data position, the box will read “Sum of YRONJOB”. Let’s change the sum to an average. Double-click on the “Sum of YRONJOB” box and select Summarize by → Average. Then select OK → OK → Finish. You should now see this:

As before…EXCEL’s output is a little untidy. To make the decimals consistent, highlight the data in the table and select Format → Cells → Number → OK. Now you should see this:

Much better! This table now provides information such as “in the Dallas plant, women have an average of 8.85 years on the job”. You can now copy and paste this table into other documents or into another EXCEL sheet.

As you can see, Pivot Tables are very useful and very flexible. However, because they are so flexible, they do require a bit of manipulation. Mastering Pivot Tables in EXCEL is a great differentiating skill, but it will require practice (and patience).

Stacked bar charts are much easier to create and manipulate. To reproduce the stacked bar chart on page 29, we will use the first Pivot Table created above, which showed the frequency counts by gender and by plant. Most graphics in EXCEL are created using the Chart Wizard, as we did with the pie chart and the bar chart. Go to the table, copy these cells and paste them into another part of the spreadsheet:

Count of Gender   Plant
Gender            D        N        Grand Total
F                 13.00    7.00     20.00
M                 10.00    10.00    20.00
Grand Total       23.00    17.00    40.00

This will “disengage” the data from the Pivot Table. Click on the Chart Wizard button and select the Bar chart type:

Select the second Chart sub-type option; this is the stacked bar chart. Select Next. Identify the data range, which will NOT include the totals but WILL include the labels. Then add titles. You should see something like this:

[Figure: “Employees in each Plant by Gender” stacked bar chart; bars D and N, series F and M, horizontal axis 0.00-25.00]

Let’s clean this up a bit.
We need to change the labels from single letters to the actual names of the plants and genders, and change the axis to show no decimals (when dealing with discrete data like counts of people, it’s best not to show any decimals). First, place your cursor on one of the blue bars and right-click. Select “Source Data” and then the “Series” tab:

Highlight each series and change its name to “Female” or “Male”, respectively. Click OK. To remove the decimals, double-click on the axis, select Number and change the number of decimal places to 0. Finally, to change the N and the D to Norcross and Dallas…go back to the original data in the spreadsheet and replace N with Norcross and D with Dallas. See?! The data is integrated into the graphic! You should now see this:

[Figure: “Employees in each Plant by Gender” stacked bar chart; bars Dallas and Norcross, series Female and Male, horizontal axis 0-25]

Well done!

The final visualization in this section is the scatter plot. Scatter plots are typically used to determine whether there is a meaningful relationship between two variables. To reproduce the scatter plot on page 30, let’s return to the Plant_Survey sheet. To examine the relationship between Productivity and Job Tenure, we will plot these two variables in a scatter plot. It is important to note that in a scatter plot, we are NOT trying to establish causation, only correlation.

We create scatter plots using the same process that was used for all of the other graphics: the Chart Wizard. Click on the Chart Wizard button and select the Chart type “XY (Scatter)”:

Select Next. In the data range, select the variables PRDCTY and YRONJOB (including labels). In the Chart Options, create titles and axis labels as appropriate (Productivity is on the y-axis). Now you should have something that looks like this:

[Figure: scatter plot “Is Job Tenure related to Productivity?”; Productivity (0.00-150.00) against Job Tenure (0.00-20.00), with a PRDCTY legend]

Again…pretty untidy.
We will do three things to clean up the appearance of this graph: delete the legend, rescale the y-axis and remove the decimals. To delete the legend, simply click on it and then press the Delete key on your keyboard. This not only deletes the legend, which was not meaningful, it also creates space. The y-axis needs to be rescaled because the data does not actually start until above 50; there is a great deal of wasted space. To rescale the y-axis, double-click on it, select the Scale tab, type 50 in the Minimum box and click OK. To resize the graphic, simply select the chart and drag one of the corners until the graphic is the desired size. Finally, to remove the decimals, double-click on the x-axis, select the Number tab, set the number of decimal places to 0 and click OK. Do the same for the y-axis. Your scatter plot should look something like this:

[Figure: scatter plot “Is Job Tenure Related to Productivity?”; Productivity (50-100) against Job Tenure (0-20)]

As highlighted in Chapter 2, we can derive additional information from this graphic by adding a trendline to the data. To add a trendline, click on the dots in the graph and then right-click. Select Add Trendline. You should see this:

Select the Linear trend type and then click on the Options tab. You should see this:

Ensure that the “Display equation on chart” and “Display R-squared value on chart” options are selected. Then click OK. You should see this:

[Figure: the same scatter plot with a linear trendline labelled y = -0.5715x + 89.318, R² = 0.1124]

As explained in Chapter 2, this information provides us with the best-fitting (least-squares) linear equation for the relationship between Productivity (y) and Job Tenure (x). The R-squared value of .1124 indicates that this is not a particularly strong relationship: Job Tenure explains only 11.24% of the variation in Productivity. These concepts form the basis of Regression Modeling.
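The equation and R-squared on a linear trendline come from an ordinary least-squares fit. The Python sketch below computes the same three quantities (slope, intercept and R-squared) directly. It is an illustration only: the function name is hypothetical and the (tenure, productivity) pairs are made up, so the output will not match the -0.5715 and 89.318 reported for the WidgeOne data.

```python
def linear_trendline(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical (job tenure, productivity) pairs, NOT the WidgeOne data.
tenure = [1, 3, 5, 8, 12, 15, 18]
productivity = [92, 85, 90, 78, 88, 74, 80]
slope, intercept, r2 = linear_trendline(tenure, productivity)
print(f"y = {slope:.4f}x + {intercept:.3f}   R2 = {r2:.4f}")
```

Reading R-squared as “the share of the variation in y explained by the line” is exactly the 1 - ss_res / ss_tot ratio computed in the last line of the function.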
For a more detailed explanation of Regression Modeling, we recommend Statistical Methods and Data Analysis by Ott and Longnecker.

3.5 Using EXCEL for: Random Number Generation and Simple Random Sampling

Our WidgeOne.xls dataset is fairly small, only 40 observations. As a result, it would be unusual to want to extract a sample from it. However, for the purpose of demonstrating random number generation in EXCEL, let’s assume that we want to randomly select ten individuals with whom to conduct in-depth interviews.

Let’s begin by assigning a random number to each individual. Go back to the Plant_Survey sheet and create a new column label, RANDOM. Place your cursor in the first cell under the column label (row 2). Click on the function button. Ensure that “All” is selected as the Function Category. Scroll down through the Function Names until you see “RAND”. Select RAND and click OK. This will generate the following:

There are three pieces of information you need to understand from this box:
1. The function takes no arguments, which means that we do not need to provide any information;
2. The function returns an evenly distributed (uniform distribution) random number between 0 and 1;
3. The function is volatile, which means that the value returned will change EVERY time the spreadsheet is recalculated.

Click OK. You should see a number between 0 and 1 in this cell (your result will differ from ours and will change each time the sheet recalculates). Remember that EXCEL reads this cell as “=RAND()”, not as the number that you see. Now copy the formula in this cell down to the bottom of the dataset. Did you notice that your original number in row 2 changed? This is because the function is volatile. Sometimes we need to have volatile arguments in (not with) EXCEL. Most of the time we do not.
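The whole procedure in this section (attach a uniform random key to every row, freeze the keys, sort on them, keep the first ten rows) is one standard way to draw a simple random sample. The Python sketch below follows those same steps. It is an illustration only: the function name and the employee labels are hypothetical, and the seed exists only to make the demonstration repeatable.

```python
import random


def simple_random_sample(rows, k, seed=None):
    """Pick k rows with the RAND-then-sort method of this section."""
    rng = random.Random(seed)  # a seed only makes the demo repeatable
    # One uniform [0, 1) key per row, like a RANDOM column of =RAND().
    # Storing the keys in a list "freezes" them, like Paste Special -> Values.
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])   # Data -> Sort on RANDOM
    return [row for _, row in keyed[:k]]   # the first k rows are the sample


employees = [f"Employee {i}" for i in range(1, 41)]  # hypothetical 40 IDs
print(simple_random_sample(employees, 10, seed=2024))
```

Outside a spreadsheet, Python’s built-in random.sample(employees, 10) draws a simple random sample in a single call; the longer version above is written to mirror the spreadsheet steps.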
To convert the numbers you see from volatile to stable (unchanging), highlight the entire column, then select Edit → Copy followed by Edit → Paste Special → Values. Now you should have a column of unchanging random numbers (your numbers will be different from ours):

Now, sort the entire dataset on the random numbers just created. Select Data → Sort and sort by RANDOM (since the numbers are completely random, it does not matter whether you select Ascending or Descending). Click OK. Then select the first 10 individuals for the interviews. This is a fairly simple, but very useful, process.

3.6 Using EXCEL for: Confidence Intervals

The penultimate section in this chapter will aid in the calculation of Confidence Intervals, one of the most commonly used techniques in Inferential Statistics. Recall from Chapter 2 that the formula used to estimate a two-sided confidence interval for a population mean is

$\bar{x} \pm Z \cdot \frac{s}{\sqrt{n}}$

where
$\bar{x}$ = the sample mean;
Z = the number of standard deviations, using the sampling distribution and the Central Limit Theorem, associated with the established confidence level (90% confidence = 1.645; 95% confidence = 1.96; 99% confidence = 2.575);
s = the sample standard deviation;
n = the number of elements in the sample.

The portion of this formula that follows the ± is referred to as the “Margin of Error”. EXCEL will calculate this portion of the formula for us; we will then add it to and subtract it from the mean to report the complete interval.

From Chapter 2: let’s assume that the WidgeOne.xls dataset is a representative sample of a larger manufacturing firm with hundreds of employees in Norcross, GA and Dallas, TX (if we had access to the entire organization’s data, we would not calculate confidence intervals for any population parameter; we would report the descriptive statistics). Let’s also assume that the HR department at WidgeOne has been charged with understanding the level of job satisfaction among employees.
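The margin-of-error part of the formula above is easy to reproduce outside EXCEL as a cross-check. The Python sketch below (it needs Python 3.8 or later for statistics.NormalDist) computes Z from the standard normal distribution rather than using the rounded table value 1.96, which makes no visible difference at two decimal places. The function name is hypothetical, and the inputs are the job satisfaction figures used in the worked example in this section (mean 6.85, standard deviation 1.022, n = 40).

```python
import math
from statistics import NormalDist


def margin_of_error(alpha, std_dev, size):
    """z * s / sqrt(n), the quantity EXCEL's CONFIDENCE(alpha, standard_dev, size) returns."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * std_dev / math.sqrt(size)


# Job satisfaction figures from the worked example in this section.
mean_js, sd_js, n = 6.85, 1.022, 40
moe = margin_of_error(0.05, sd_js, n)
print(f"margin of error = {moe:.2f}")                         # margin of error = 0.32
print(f"95% CI: {mean_js - moe:.2f} to {mean_js + moe:.2f}")  # 95% CI: 6.53 to 7.17
```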
For cost reasons, the HR department was unable to survey the entire organization, so it surveyed the 40 employees in our dataset. Report the mean job satisfaction for all WidgeOne employees, using the sample of 40. Use a 95% level of confidence.

Go back to the Plant_Survey sheet. We previously calculated the mean job satisfaction to be 6.85 and the standard deviation to be 1.02. Using this information, we can use EXCEL to compute the confidence interval. To execute this computation, go to a blank portion of the spreadsheet and click on the function button. Ensure that the Statistical function category is selected, scroll through the function names until you get to the CONFIDENCE function, and click OK. You will then see this dialogue box:

   Alpha is 1 minus the Confidence Level.
   The STD (standard deviation) would have been calculated previously.
   Size is the sample size, in this case 40.

If we are computing a 95% confidence interval, we would enter “.05” for the alpha value (you can think of alpha as the probability of being wrong that you are willing to accept). The standard deviation, computed previously for job satisfaction, was 1.022. The (sample) size is 40. Once this information is entered, the resulting computation should be .32. This is the margin of error for job satisfaction at a confidence level of 95%. You would then add this to and subtract it from the mean (6.85) to create the full interval. The full interval would then be reported as: “Based on a representative sample of 40 employees, we are 95% confident that mean job satisfaction among all employees is between 6.53 and 7.17.”

An important note in spreadsheet development: you could enter “1.02” in this box or enter the cell reference, J47, and you would generate the same answer. However, you are almost ALWAYS better off entering the cell reference rather than “hard coding” a number. This makes the formula more “portable”.

3.7 EXCEL Lagniappe

We could spend hundreds of pages providing EXCEL Lagniappe.
Don’t worry…we won’t. As you may have already discerned, EXCEL is an incredibly underutilized data analysis package: most EXCEL users never touch the majority of the functionality that is available. As you continue to develop your EXCEL skills, you will find more and more functionality not explored here. In terms of extra interesting EXCEL shortcuts, we would like to share three:

1. Rearranging Columns – Occasionally you may find that you need to rearrange the columns in a dataset. For example, from the WidegeOne.xls dataset, you may want to create a scatter plot of Productivity (PRDCTY) and Social Relations score (SOCREL). Recall that when we used the chart wizard and selected the data range, we had to indicate a single range. What if the variables are not next to each other? To move PRDCTY next to SOCREL, first highlight the PRDCTY column by clicking on its column letter. Then press Ctrl+X (cut). Next, highlight the SOCREL column by clicking on its column letter. Now hold Ctrl and Shift down together and press + (this inserts the cut cells). PRDCTY now sits immediately to the left of SOCREL. Cool, huh?

2. Highlighting data – In the WidegeOne.xls dataset, we only have 40 observations, so highlighting all of the observations is not particularly difficult. But what if we had 4,000 or 400,000 observations? Highlighting all of the rows by dragging might take some time. To highlight an entire column (or columns) of data, position your cursor at the top of the dataset. Hold down Ctrl and Shift together and press the down arrow key. This will highlight all of the data down to the last filled cell.

3. Autofill – Often we need to create a pattern in EXCEL. Examples might include days of the week, months of the year, a consecutive series of numbers or a geometric series (2, 4, 8, 16…; a geometric series needs the Growth option in the Fill Series dialog rather than a plain drag). If you provide EXCEL with the first few entries in a common pattern, it will “autofill” the remaining entries. For example, open a clean EXCEL spreadsheet. Type “Jan” in cell A1 and then “Feb” in cell B1. Now, highlight these two cells.
Place your cursor on the small square “handle” at the bottom right corner of the selection. Drag this handle 10 cells to the right (through column L). You now have the months of the year!
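To make the autofill idea concrete, here is a small Python sketch. It is a simplified, hypothetical model of the behavior described above, not EXCEL's actual fill logic: it handles only the two cases from the walkthrough, month abbreviations continuing around the calendar and numbers continuing with a constant step (a linear series). Days of the week, dates and the Growth option are not modeled, and the function name `autofill` is our own.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def autofill(seed, extra):
    """Extend a two-item seed pattern by `extra` cells (simplified model).

    Month abbreviations continue around the calendar with the same spacing;
    numbers continue with the same step (a linear series)."""
    first, second = seed
    if first in MONTHS and second in MONTHS:
        step = (MONTHS.index(second) - MONTHS.index(first)) % 12
        start = MONTHS.index(second)
        return list(seed) + [MONTHS[(start + step * i) % 12] for i in range(1, extra + 1)]
    step = second - first
    return list(seed) + [second + step * i for i in range(1, extra + 1)]

print(autofill(("Jan", "Feb"), 10))  # Jan through Dec, as in the example above
print(autofill((5, 10), 3))          # [5, 10, 15, 20, 25]
```

Two seed cells are enough here because the second cell fixes the step; that is also why the walkthrough highlights both “Jan” and “Feb” before dragging.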
