Chapter 2: How Values Cluster Together

There are three main measures of central tendency: mean, median, and mode. Each describes how a group of people or things tends to cluster around a central value.

Mean: When you think about a group that’s measured on some numeric variable, you often start thinking about the group’s average value. For example, to investigate Presidential approval, you might go to 100 Independent voters, ask them each for a rating from 1 to 10, add up all the ratings, and divide by 100. That’s one kind of average, and it’s more precisely termed the mean. [p. 35]

Median: If you’re after the average housing value in Boston, you probably ask some group such as a board of realtors. They’ll likely tell you what the median price is. The reason you’re less likely to get the mean value is that in real estate sales, there are always a few houses that sell for really outrageous amounts of money. Those few houses pull the mean up so far that it isn’t really representative of the price of a typical house in the region you’re interested in. The median, on the other hand, is right on the 50th percentile for house prices; half the houses sold for less than the median price and half sold for more. [p. 35]

Mode: The most frequently occurring value. [p. 36]

Calculating the Mean [p. 36-37]

Mean refers to the total divided by the count. In the context of statistics, it’s very convenient, and more precise, to use the word mean this way. It avoids the vagueness of the word average, which, as just discussed, can refer to the mean, to the median, or to the mode. [questionable: most people understand that average and mean are the same thing]

Excel uses the function name AVERAGE() instead of MEAN().

Calculating the Median [p. 46-47]

The median of a group of observations is usually, and somewhat casually, thought of as the middle observation when they are in sorted order.
But with an even number of observations there is no specific middle record, and therefore there is no identifiable median record. One way, used by Excel, to calculate the median with an even number of records is to take the mean of the two middle numbers. The syntax for the MEDIAN() function echoes the syntax of the AVERAGE() function.

The median is sometimes a more descriptive measure of central tendency than the mean. For example, Figure 2.7 shows what’s called a skewed distribution; that is, the distribution isn’t symmetric.

Calculating the Mode [p. 48]

The mode takes yet another tack: It tells you which one of several categories occurs most frequently. The MODE() function returns only the most frequently occurring observation.

Defining a Worksheet Function [p. 37-40]

An Excel worksheet function (more briefly, a function) is just a formula that someone at Microsoft wrote to save you time, effort, and mistakes. In other words, a worksheet function is a prewritten formula. An Excel formula begins with an equal sign (=).

The three formulas below return the same mean when all ten cells in B2:B11 contain numbers. The first spells out the arithmetic by hand, the second uses the SUM() and COUNT() functions, and the third uses AVERAGE() alone:

    =(B2+B3+B4+B5+B6+B7+B8+B9+B10+B11) / 10
    =SUM(B2:B11)/COUNT(B2:B11)
    =AVERAGE(B2:B11)

Defining Arguments

More terminology: Those “other, individual values” are called arguments. That’s a highfalutin name for the values that you hand off to the function or, put another way, that you plug into the prewritten formula. In the instance of the function =AVERAGE(B2:B11), the range of cells represented by B2:B11 is the function’s argument. The arguments always appear in parentheses following the function name.

The FREQUENCY() function expects that you will use its first argument to tell it where the individual observations are (here, they’re in B2:B11, called the data array by Excel) and that you’ll use its second argument to tell it where to find the boundaries of the groups (here, E2:E6, called the bins array).
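The three measures, and the equivalence between the hand-written formula and the worksheet functions, can be checked outside Excel. The following is a minimal sketch in Python using the standard statistics module. It is an analogue of the Excel functions, not Excel itself, and the ratings and house prices are made-up values chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical ratings standing in for worksheet cells B2:B11 (ten values).
ratings = [3, 7, 7, 2, 9, 7, 4, 5, 6, 10]

# Total divided by count, comparable to =SUM(B2:B11)/COUNT(B2:B11).
total_over_count = sum(ratings) / len(ratings)

# The prewritten equivalents, comparable to AVERAGE(), MEDIAN(), and MODE().
average = mean(ratings)
middle = median(ratings)      # even count: mean of the two middle values (6 and 7)
most_common = mode(ratings)   # 7 appears three times

# Hypothetical house prices in thousands; one outrageous sale skews the data.
prices = [200, 210, 220, 230, 2000]
skewed_mean = mean(prices)      # pulled far above the typical price
skewed_median = median(prices)  # stays at the middle sale

print(total_over_count, average, middle, most_common)
print(skewed_mean, skewed_median)
```

With these made-up prices the mean is 572 while the median is 220, which is the reason the notes give for reporting median house prices.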
Defining Return

One final bit of terminology used in functions: When a function calculates its result using the arguments you have supplied, it displays the result in the cell where you entered the function. This process is termed returning the result. For example, the AVERAGE() function returns the mean of the values you supply.

Using Two Special Excel Skills [p. 36]

There are two particular skills in Excel that you will find indispensable for statistical analysis, and they’re also handy for other sorts of work you do in Excel. One is the design and construction of pivot tables and pivot charts. The other is array-entering formulas.

Using an Array Formula to Count the Values [p. 53-56]

Using an Array Formula

Various reasons exist for using array formulas in Excel. Two of the most typical reasons are to support a function that requires it and to enable a function to work on more than just one value.

Accommodating a Function

One reason you might need to use an array formula is that you’re employing a function that must be array-entered if it is to return results properly. For example, the FREQUENCY() function, which counts the number of values between a lower bound and an upper bound (see “Defining Arguments,” earlier in this chapter), requires that you enter it in an array formula. Another function that requires array-entry is the LINEST() function, which will be discussed in great detail in several subsequent chapters.

Both FREQUENCY() and LINEST(), along with a number of other functions, return an array of values to the worksheet. You need to accommodate that array. To do so, begin by selecting a range of cells that has the number of rows and columns needed to show the function’s results. (Knowing how many rows and columns to select depends on your knowledge of the function and your experience with it.) Then you enter the formula that calls the function by means of Ctrl+Shift+Enter instead of simply Enter; again, this sequence is called array entering the formula.
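To see why a function like FREQUENCY() needs a whole range of cells for its result, it can help to look at a rough Python analogue. The sketch below is not Excel's implementation. The function name frequency, the sample data, and the bin boundaries are hypothetical, and the code assumes the bins are listed in ascending order. It follows FREQUENCY()'s documented behavior of counting values up to and including each boundary and returning one extra count for values above the highest boundary.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count how many values fall at or below each boundary in `bins`.

    Assumes `bins` is in ascending order. Returns len(bins) + 1 counts:
    one per boundary, plus a final count for values above the last one.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left gives the index of the first boundary >= value,
        # so a value equal to a boundary is counted in that boundary's bin.
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical observations (like B2:B11) and bin boundaries (like E2:E6).
data = [3, 7, 7, 2, 9, 7, 4, 5, 6, 10]
bins = [2, 4, 6, 8, 10]

counts = frequency(data, bins)
print(counts)  # [1, 2, 2, 3, 2, 0]
```

The result is a list of six counts for five boundaries, not a single number. That is the situation the notes describe: the result needs a range of cells with the right number of rows before the formula is array-entered.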