First steps with Excel
Dr. M. Lawrence Clevenson

Starting Excel

To start Excel, look for an Excel icon on your screen; if there is one, double-click it. Alternatively, click on Start in the lower left corner and see if an icon for New Office Document appears. After clicking on that, see if one of the choices is Blank Workbook with a green X. If so, double-click on that choice. The choice that will surely be available for everyone is to first click on Start, then Programs, then either Microsoft Excel or MS Office 2000 and then Microsoft Excel.

Moving Around

Once “in” Excel, you will see a rectangular array of cells, above which are rows of menu choices and icons. The cell in row 1 and column A has a darker border than the others; it is the active cell, which means that anything you type will go in that cell, called A1. You can change the active cell with the mouse, by clicking, or with the navigation keys: the arrow keys and the Page Up and Page Down keys. The End key in combination with an arrow key moves the active cell to the last nonblank cell if you start in a nonblank cell, and to the first nonblank cell if you start in an empty cell. All of your cells should start empty (blank), so End DownArrow and End RightArrow (press End once and release; do not hold it down) should bring you to cell IV65536, which informs you that Excel allows 65,536 rows and 256 columns, labeled A, B, C, …, Z, AA, AB, …, AZ, …, IU, IV. To return to A1, try End UpArrow and End LeftArrow, or Ctrl+Home, a command that (usually) returns to A1.

Entering Data

The cells in Excel are either empty or contain labels, values, or formulas. Let’s start with a simple example: enter Marital Status of US Adults in A1. In B2 and C2 enter Marital Status and Count (millions).
Then make a column of entries beneath these two headings that look like the columns below:

Never married     43.9
Married          116.7
Widowed           13.4
Divorced          17.6

Most likely the labels in B2, C2, and B3 through B6 don’t fit in the cells, but Excel is recording all of the characters. To see them, point the mouse at the boundary between the column labels B and C, then click and drag it to the right; this adjusts the width of the column. If you double-click between the column labels, Excel will make the width just large enough to cover all the existing entries in that column. The equivalent menu command, with the cells selected (highlighted), is Format > Column > AutoFit, or Alt+O C A.

First Graph

There are many ways to make graphs with Excel. One of the easier ways is to use the “Chart Wizard.” Highlight the range from B2 to C6. For a small area like this, the easiest way to highlight is to click on any “corner” (B2, C2, B6, or C6), and drag to the opposite corner. Very large areas are highlighted more quickly by holding the Shift key and then using the End and arrow keys. Try that method also.

While this area is highlighted, click on the Chart button; this icon looks like a blue, yellow, and red bar graph. Excel gives you many choices and sub-choices of graphs. A Pie Chart is good for the marital status data. Click on Pie, the fourth choice down. The highlighted sub-type is described, and you can click on other choices to see what they look like. You can always change to a different choice; choose any one for now. Click Next.

Step 2 allows you to choose or change the input data. No changes are necessary for this example; click Next.

Step 3 allows you to input a title for your chart. Notice that Excel already “guessed” a title for you, but you can change that by highlighting the title and typing “over” it. Try that. Then choose the Data Labels tab. Try the Show label and percent button; do you like the graph?
With those labels shown, the legend on the right is not necessary. Click on the Legend tab, and uncheck the Show legend box. Remember, any of this can be changed easily. Click Next.

The last choice is entirely subjective; I usually like my charts in separate sheets; try that. There are advantages to either choice, so we should try both.

Simple Formulas

Suppose we want to find the percentage of each marital status (without the graphs we just did). Highlight the values in C3 to C6. Click on the AutoSum (Σ) button. Click on C7. Notice the formula: =Sum(C3:C6). Since spreadsheets often compute totals for rows or columns, the Σ button provides an easy way to do it.

However, many formulas can be entered at least somewhat automatically. Next to the Σ icon is the fx icon. Choose an empty cell, say C8, and click the fx button. The Paste Function dialog box appears and asks you to choose a function. For category, choose All, or Math & Trig, and then scroll down for your function, Sum. Highlight Sum and click OK (or double-click Sum). A second dialog box appears that allows you to enter the data, or the arguments, to the Sum function. Click on the little red arrow button to the right of the box. Now highlight the data you want summed, from C3 to C6. Click the red arrow button again. You are again presented with the dialog box, but there are no more values to be summed. Note that the dialog box also displays the answer to your function, 191.6.

In D2 enter the label Percent. In D3 enter =C3/C$7. This formula divides the entry in C3 by the entry in C7 and shows the result of this calculation in D3. Note that when D3 is the active cell, the entry window shows the formula, but D3 shows the result of the calculation. This does not look like a percentage; make D3 active and click on Format in the menu row. (Skilled keystrokers may prefer Alt+O, the underlined letter in Format.) Click Cells and the Number tab. Then click Percentage, and choose the number of decimal places you want, say 1, by clicking the up or down arrows in the Decimal places box. Look OK?

One reason spreadsheets are powerful calculation tools is that they can copy and paste formulas with automatic adjustment. We could enter a similar formula into each of D4 through D7, but if these data were lengthy, that would be tedious. Instead, we will copy and paste this formula. With D3 active, click on Edit > Copy (or the Copy icon, or Alt+E C, or Ctrl+C). Then highlight D4 through D7. Then click Edit > Paste (or the Paste icon, or Alt+E P, or Ctrl+V). Another way to do this is with the fill handle, but we’ll discuss this later.

Relative and Absolute References

When we copied the formula in D3, =C3/C$7, into D4, notice that the formula in D4 became =C4/C$7. The C3 became C4, but the C$7 stayed the same. The reason is that a $ sign before a row number or column letter makes that part of the reference absolute, so it will not change (or adjust relative to the formula’s other references) when the formula is copied. For example, try entering the formula =C3/C7 into E3, and then copying and pasting it into E4. It will become =C4/C8 and probably produce an error message. Try putting dollar signs into the formula in E3 at various places (and choose arbitrary rows and columns for the paste) and see how that changes the pasted results.

Practice: Make a normal table of male and female heights. This should run from heights of 4 feet 10 inches to 7 feet 2 inches in increments of 0.5 inches and tabulate the percentage of males and females shorter than each of these heights. The function you will need is =normdist(x, mean, standard deviation, 1), where the values of (mean, standard deviation) are approximately (64.5” and 2.2”) for women and (70” and 2.8”) for men. (The 1 dictates that the formula gives the area to the left of x.) Put the heights in column A, percentages for women in column B, and percentages for men in column C.
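
As a cross-check on the Excel table, the same cumulative percentages can be computed outside Excel. The sketch below uses Python, which is not part of this handout, and assumes that the standard library's NormalDist.cdf plays the role of =normdist(x, mean, standard deviation, 1). The names women, men, and height_table are made up for this illustration.

```python
from statistics import NormalDist

# Means and standard deviations (in inches) as given in the practice problem.
women = NormalDist(mu=64.5, sigma=2.2)
men = NormalDist(mu=70.0, sigma=2.8)


def height_table(start=58.0, stop=86.0, step=0.5):
    """Return rows of (height, fraction of women shorter, fraction of men shorter)."""
    n_rows = int(round((stop - start) / step)) + 1
    rows = []
    for i in range(n_rows):
        h = start + i * step
        # cdf(h) is the area to the left of h, like the final argument 1 in normdist.
        rows.append((h, women.cdf(h), men.cdf(h)))
    return rows


table = height_table()

# Print a few rows around 65 inches for comparison with the spreadsheet.
print(f"{'Height (in.)':>12} {'Women':>8} {'Men':>8}")
for h, w, m in table:
    if 64.5 <= h <= 65.5:
        print(f"{h:>12.1f} {w:>8.4f} {m:>8.4f}")
```

If the spreadsheet formulas are entered correctly, the Excel columns should agree with these values to four decimal places.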
In Excel, try to use references properly so that your formulas look like =normdist(A10, B$1, B$2, 1) rather than =normdist(A10, 64.5, 2.2, 1). Part of your table should look something like this:

Height (in.)   Women    Men
…              …        …
64.5           0.5000   0.0247
65             0.5899   0.0371
65.5           0.6753   0.0540

Fill Handles – Copying and Series

The table for the work above has consecutive heights in column A. Start a new sheet by clicking on Sheet 2 in the sheet tabs below the spreadsheet cells. (If you have only one sheet, the current one, available, click Insert > Worksheet.) In A3 enter 58 (4 feet 10 inches), and then in A4 enter 58.5. Highlight these two cells. In the lower right-hand corner of cell A4 is a tiny square. If you place the mouse cursor over that corner, the cursor will change shape to a + sign, indicating you have engaged the “fill handle.” Click and drag down. A small message to the right of the highlight informs you that you reach 86 (7 feet 2 inches) around cell A59. Release the click and you will have created an arithmetic series with increments chosen by the difference between the two highlighted cells, A3 and A4.

Play with the fill handle for a bit. Try entering a number like 4 in any open cell, say C3. See what happens when you use the fill handle on it. Then try a formula, e.g., =Average(a3:a10) entered into cell D5. Fill down; fill right. Then finish the normal table above.

Homework: On a new sheet make the normal table indicated on pp. 40 to 41 of the textbook. Extend that table in the negative direction, to values of z down to -3.99. Be careful; -2.00 + 0.04 = -1.96, not -2.04. Note that you can insert rows as necessary by highlighting the place(s) where rows are to be inserted and then clicking Insert > Rows. The interior of your table should look something like the table below.

(Note: The correct function for the standard normal distribution in Excel 2000 is NORMSDIST(Z) (with an S in the middle).
This is equivalent to NORMDIST(Z, 0, 1, 1), a normal distribution with mean 0 and standard deviation 1; the last entry, 1, asks for the cumulative area to the left of Z.)

   z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
   …     …       …       …       …       …       …       …       …       …       …
   …     …       …       …       …       …       …       …       …       …       …
-0.2  0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
-0.1  0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
-0.0  0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
 0.0  0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
 0.1  0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
 0.2  0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141

Elementary Statistical Calculations and Graphs

Retrieve the data called SurvivalTimes from the shared hard drive. The data are the number of days guinea pigs survived after a bacterial injection. (This was a medical study on acquisition of resistance to infection, taken from American Journal of Hygiene 72 (1960), pp. 130-148.) Insert a column of ID numbers for the data from 1 to 72. The missing data are:

ID   Days
26     53
27     84
28    126
56     80
57     83
58     80

Use End and the Down arrow to efficiently fill in the missing values. Then sort the data from smallest to largest, keeping track of the IDs, as follows:

1. Highlight the entire block of data to be sorted, from A1 to B73. (This step is not necessary in this case, where there is only one block of data and we’re sorting all of it.)
2. Choose Data > Sort (or Alt+D S). A dialog box asks how you want to sort the data; note that Header row should be chosen, and use column B ascending.
3. When all your choices seem OK, click the OK button.

Observe that your data are now sorted from smallest to largest (ascending) and that the IDs have the same correspondence.

The median is the datum in the middle when data are ordered.
This datum is in position (n+1) / 2, where n is the number of data. In this case, n = 72, and the median’s position is “36.5”, which means the median is the average of the two measurements in the 36th and 37th positions of the ordered data. These two data are 102 and 103, so the median is 102.5.

Sorting is not actually necessary to find the median, since Excel provides fx functions for statistical calculations. Insert a few rows at the top of the worksheet for some statistical functions. In column A write the name of the statistic, and in column B its value. The median is one of five numbers often cited in a five-number summary, which consists of the minimum, maximum, first and third quartiles (25th and 75th percentiles), and the median. Find all these values. Note that the quartile function in Excel is designed to give you all five of these numbers. Can you see how to do this easily with Copy and Paste?

Another statistic is the mean or average; Excel uses the name average. Compute the mean (average) survival time. Check that the average is defined as the sum divided by the number of data by comparing =Average(range) with =Sum(range) / Count(range). What do you think the function Count computes?

The most commonly used measure of data spread is the standard deviation. The Excel function is STDEV; if the data are more “spread out,” the standard deviation is larger.

Like most “lifetime” data, whether of machinery under stress or survival with an illness, these data are right-skewed. This means that the right-hand tail, defined as the maximum - median, is (much) greater than the left-hand tail, defined as the median - minimum. One could also compare the “extreme tails” defined as maximum - third quartile and …

A graph of the data will show this right-skewness. Numeric data like survival times are graphed with a histogram, a contiguous column graph of the frequency or percentage of data in each group. Here are the steps.

1. Click Tools > Data Analysis. If that is not available (click on the down arrows at the bottom of the menu if you don’t see it immediately), follow the instructions on pp. 23-24 of the text to “add it in.”
2. A dialog box with many statistical procedures appears. Scroll to Histogram and click.
3. Select our SurvivalTimes data, either by typing the range (like B9:B80) directly in the window, or by clicking on the red arrow button and selecting it off the spreadsheet as we have done before.
4. Choose the bin range.
   a. For now, leave the Bin range blank. Go to step 5.
   b. Go to step 7 to see how to choose a bin range.
5. Choose a place for the output, either on a new sheet or on some (preferably blank) part of this sheet.
6. Make the chart look better (it is ugly).
   a. Stretch the graph by left-clicking anywhere inside it and then dragging, say, the lower right corner down and right (watch me).
   b. Double-click inside one of the columns. The Format Data Series dialog box appears.
   c. Click on the Options tab.
   d. Set the Gap width to zero. Choose Vary colors by point for a colorful graph. You can also choose individual colors later.
   e. Clear the legend (if you like; I like) by right-clicking inside it, and then clicking on Clear.
7. It’s still not too good, and that’s usually the case if we leave the Bin range blank. There is no rule for choosing bins, but usually we can improve the graph’s appearance by overriding Excel’s choice. We need a somewhat narrower bin width, so for starters, try again, this time with bins of width 50, say.
   a. Choose a blank area of the spreadsheet, say D15 of the data sheet (the sheet with the original SurvivalTimes data).
   b. Start a series with 50, 100, and then use the automatic fill procedure to go to 600.
   c. When you get to step 4, choose the range of the spreadsheet that these numbers occupy by either method (typing or pointing).
   d. Go back to step 5 and finish your modifications.
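
To see what a bin range of 50, 100, …, 600 does to the data, here is a minimal sketch in Python (not part of this handout). It assumes that each bin value acts as an upper limit, so that a datum is counted in the first bin whose value is greater than or equal to it, and that anything above the last bin goes into a final “More” category. That is my understanding of how the Histogram tool counts, so compare it with your own output. The function name bin_counts is made up for this illustration, and the only data used are the six values listed earlier as missing, not the full SurvivalTimes set.

```python
from bisect import bisect_left


def bin_counts(values, upper_bounds):
    """Count values per bin, treating each bound as an inclusive upper limit.

    The returned list has one extra slot at the end for values larger than
    the last bound (the "More" category).
    """
    counts = [0] * (len(upper_bounds) + 1)
    for x in values:
        # bisect_left gives the index of the first bound that is >= x.
        counts[bisect_left(upper_bounds, x)] += 1
    return counts


# Bins of width 50 from 50 to 600, as in step 7b.
bins = list(range(50, 601, 50))

# Only the six survival times listed as missing in the handout (IDs 26-28, 56-58).
days = [53, 84, 126, 80, 83, 80]

counts = bin_counts(days, bins)
labels = [str(b) for b in bins] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>5} {count}")
```

With the full data set in place of these six values, the counts should match the frequency column that the Histogram tool writes next to your bin range.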