Chapter 2: How Values Cluster Together
There are three main measures of central tendency: mean, median, and mode. That is, they describe
how a group of people or things tends to cluster in some way around a central value.
Mean: When you think about a group that’s measured on some numeric variable, you often start
thinking about the group’s average value. For example, to investigate Presidential approval, you might
go to 100 Independent voters, ask them each for a rating from 1 to 10, add up all the ratings, and divide
by 100. That’s one kind of average, and it’s more precisely termed the mean. [p. 35]
Median: If you’re after the average housing value in Boston, you probably ask some group such as a
board of realtors. They’ll likely tell you what the median price is. The reason you’re less likely to get the
mean value is that in real estate sales, there are always a few houses that sell for really outrageous
amounts of money. Those few houses pull the mean up so far that it isn’t really representative of the
price of a typical house in the region you’re interested in. The median, on the other hand, is right on the
50th percentile for house prices; half the houses sold for less than the median price and half sold for
more. [p. 35]
Mode: the most frequently occurring value [p. 36]
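To make the difference between the three measures concrete, here is a minimal Python sketch. The sale prices are invented for illustration (they are not from the book), and Python's standard statistics module stands in for the Excel functions discussed later.

```python
from statistics import mean, median, mode

# Hypothetical sale prices in thousands of dollars (made-up data).
# The last house is the "really outrageous" sale that distorts the mean.
prices = [300, 320, 320, 350, 380, 400, 2500]

avg = mean(prices)      # total divided by the count
mid = median(prices)    # middle value once the prices are sorted
common = mode(prices)   # most frequently occurring value

print(f"mean={avg:.1f}  median={mid}  mode={common}")
```

With this made-up data the single expensive house pulls the mean well above the median, which is the effect the real estate example describes.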
Calculating the Mean [p. 36-37]
Mean refers to the total divided by the count. In the context of statistics, it’s very convenient, and
more precise, to use the word mean this way. It avoids the vagueness of the word average, which—as
just discussed—can refer to the mean, to the median, or to the mode. [questionable – most people
understand that average and mean are the same thing]

Excel uses the function name AVERAGE() instead of MEAN().
Calculating the Median [p. 46-47]
The median of a group of observations is usually, and somewhat casually, thought of as the middle
observation when they are in sorted order. But with an even number of observations there is no
specific, middle record, and therefore there is no identifiable median record. One way, used by Excel, to
calculate the median with an even number of records is to take the mean of the two middle numbers.
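The even-count rule can be sketched in Python as follows. The helper name median_by_hand and the sample values are hypothetical, and the sketch assumes a non-empty list of numbers; it illustrates the rule described above rather than Excel's own code.

```python
from statistics import median

def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle observation.
        return ordered[middle]
    # Even count: take the mean of the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_count = [7, 1, 5]
even_count = [7, 1, 5, 3]

print(median_by_hand(odd_count))    # middle of 1, 5, 7
print(median_by_hand(even_count))   # mean of 3 and 5
print(median(even_count))           # the standard library applies the same rule
```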

The syntax for the MEDIAN() function echoes the syntax of the AVERAGE() function.
The median is sometimes a more descriptive measure of central tendency than the mean. For example,
Figure 2.7 shows what’s called a skewed distribution—that is, the distribution isn’t symmetric.
Calculating the Mode [p.48]
The mode takes yet another tack: It tells you which one of several categories occurs most frequently.

The MODE() function returns only the single most frequently occurring observation.
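A function that returns a single mode hides any ties. The sketch below uses Python's statistics module (version 3.8 or later) with invented ratings to show the difference between asking for one mode and asking for all of them. It describes Python's behavior only and makes no claim about how Excel chooses among tied values.

```python
from statistics import mode, multimode

# Hypothetical ratings in which 3 and 7 each occur twice (a tie).
ratings = [3, 7, 7, 5, 3, 9]

single = mode(ratings)          # one value: the first mode encountered
every_mode = multimode(ratings) # all tied values, in order of first appearance

print(single)
print(every_mode)
```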
Defining a Worksheet Function [p. 37-40]
An Excel worksheet function—more briefly, a function—is just a formula that someone at Microsoft
wrote to save you time, effort, and mistakes.
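Without a function, you could calculate the mean of ten cells with a formula such as:
=(B2+B3+B4+B5+B6+B7+B8+B9+B10+B11) / 10
A worksheet function is a prewritten formula. An Excel formula begins with an equal sign (=). Using
functions, the same mean can be calculated with either of the following:
=SUM(B2:B11)/COUNT(B2:B11)
=AVERAGE(B2:B11)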
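The three formulas are three routes to the same number. A rough Python analogue, assuming a list of ten made-up numeric values standing in for cells B2:B11, looks like this:

```python
from statistics import mean

# Hypothetical stand-ins for the ten numeric cells B2:B11.
cells = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]

# Like =(B2+B3+...+B11) / 10: every cell and the count typed by hand.
long_form = (cells[0] + cells[1] + cells[2] + cells[3] + cells[4]
             + cells[5] + cells[6] + cells[7] + cells[8] + cells[9]) / 10

# Like =SUM(B2:B11)/COUNT(B2:B11): the total divided by the count.
sum_over_count = sum(cells) / len(cells)

# Like =AVERAGE(B2:B11): one prewritten function does the whole job.
prewritten = mean(cells)

print(long_form, sum_over_count, prewritten)
```

The long form breaks silently if a value is added or removed, because the hard-coded 10 no longer matches the count. That is the kind of mistake a prewritten function helps avoid.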
Defining Arguments
More terminology: The values that a function operates on are called arguments. That’s a highfalutin name for
the values that you hand off to the function—or, put another way, that you plug into the prewritten
formula. In the instance of the function
=AVERAGE(B2:B11)
the range of cells represented by B2:B11 is the function’s argument. The arguments always appear in
parentheses following the function.
The FREQUENCY() function expects that you will use its first argument to tell it where the individual
observations are (here, they’re in B2:B11, called the data array by Excel) and that you’ll use its second
argument to tell it where to find the boundaries of the groups (here, E2:E6, called the bins array).
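The two-argument pattern can be sketched in Python. The function below is a hypothetical imitation written for these notes, not Excel's implementation. It assumes the bin boundaries are in ascending order, counts each value in the first bin whose boundary is greater than or equal to it, and adds one extra slot at the end for values above the highest boundary. The sample data is invented.

```python
from bisect import bisect_left

def frequency(data_array, bins_array):
    """Count observations per bin; bins_array must be in ascending order."""
    counts = [0] * (len(bins_array) + 1)   # one extra slot for overflow
    for value in data_array:
        # bisect_left returns the index of the first boundary >= value,
        # or len(bins_array) when the value exceeds every boundary.
        counts[bisect_left(bins_array, value)] += 1
    return counts

# Hypothetical observations (first argument) and boundaries (second argument).
observations = [12, 25, 37, 41, 58, 63, 70, 70, 85, 99]
boundaries = [20, 40, 60, 80, 100]

result = frequency(observations, boundaries)
print(result)
```

The function hands back a whole list of counts rather than a single value, which is why a function of this kind needs more than one cell to display its result.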
Defining Return
One final bit of terminology used in functions: When a function calculates its result using the arguments
you have supplied, it displays the result in the cell where you entered the function. This process is
termed returning the result. For example, the AVERAGE() function returns the mean of the values you
supply.
Using Two Special Excel Skills [p. 36]
There are two particular skills in Excel that you will find indispensable for statistical analysis—and
they’re also handy for other sorts of work you do in Excel. One is the design and construction of pivot
tables and pivot charts. The other is array-entering formulas.
Using an Array Formula to Count the Values [p. 53-56]
Using an Array Formula
Various reasons exist for using array formulas in Excel. Two of the most typical reasons are to support a
function that requires it and to enable a function to work on more than just one value.
Accommodating a Function
One reason you might need to use an array formula is that you’re employing a function that must be
array-entered if it is to return results properly. For example, the FREQUENCY() function, which counts
the number of values between a lower bound and an upper bound (see “Defining Arguments,” earlier in
this chapter) requires that you enter it in an array formula. Another function that requires array-entry is
the LINEST() function, which will be discussed in great detail in several subsequent chapters.
Both FREQUENCY() and LINEST(), along with a number of other functions, return an array of values to
the worksheet. You need to accommodate that array. To do so, begin by selecting a range of cells that
has the number of rows and columns needed to show the function’s results. (Knowing how many rows
and columns to select depends on your knowledge of the function and your experience with it.) Then
you enter the formula that calls the function by means of Ctrl+Shift+Enter instead of simply Enter; again,
this sequence is called array entering the formula.