Class 2 – Variability

Measures of Variability

We have already discussed the most frequently used measures of central tendency. Measures of central tendency allow us to select one number to represent a distribution of scores. In discussing them, we focused on the type of measurement scale being employed (Nominal, Ordinal, Interval, and Ratio). Measures of Variability are a second form of descriptive statistic, which we use to describe how spread out the scores in our distribution are.

We will begin by defining the most common ways of measuring variability. We will then discuss how different types of distributions might affect our choice of measure of central tendency. Finally, we will talk about the importance of looking at variability when interpreting results. Since the logic underlying inferential statistics depends on a good understanding of variability, it is important that you understand these concepts.

When dealing with nominal scales, we have the same limitations we had with measures of central tendency. Numbers assigned to nominal scales are not truly numbers – they are name labels. We therefore cannot calculate a number that would describe the variability of the responses. In fact, we cannot even meaningfully say that the scores range from category 1 to category 7, because the order of the categories is arbitrary. We can only summarize by listing the categories and their frequency of occurrence. If there are only a few categories, you might simply summarize the distribution in text form (e.g., "Thirty-eight percent of respondents were male and 62% were female"). When there are around 4 to 7 categories, bar charts or pie graphs may be appropriate, whereas larger numbers of categories might best be summarized in tables. This is not a hard and fast rule; it depends on the variable. We tend to reserve figures for more important variables.

With ordinal scales, we can define the Range of the categories. We might say our sample included respondents ranging in education level from no high school through to PhDs. This defines the extremes, or end points, of our ordered categories. Since the intervals between our categories are not equal, we cannot use a number to define the average difference between scores.

With continuous/scale variables (Interval and Ratio), we can use numbers to describe the variability of the distribution. As an example, let's compare final exam scores from three sections of a Gen Psych course.

        Section 1   Section 2   Section 3
        160         102         200
        130         101          78
        100         100          77
         70          99          75
         40          98          70
Total   500         500         500
Mean    100         100         100

All three classes have the same mean, but the variability of the grades differs greatly. There are several measures of variability that can be used to describe these differences.

Range - the simplest measure of variability. It is defined as the highest score minus the lowest score.

Range Section 1 = 160 - 40 = 120
Range Section 2 = 102 - 98 = 4
Range Section 3 = 200 - 70 = 130

The higher the range, the more variable the scores. However, the range may not be a very representative measure of a distribution of scores. Notice that the range is defined by only two numbers: the highest and the lowest. In section 1, there is a relatively large range and the scores are fairly evenly distributed within it. The range of section 2 is small, but once again the distribution of scores is fairly even. Section 3 has the largest range; however, this is due to one extremely high score. All other scores in section 3 are very close to each other.
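If you ever want to check this kind of arithmetic by machine, here is a minimal Python sketch using the scores from the table above:

    # Final exam scores from the three Gen Psych sections above
    section_1 = [160, 130, 100, 70, 40]
    section_2 = [102, 101, 100, 99, 98]
    section_3 = [200, 78, 77, 75, 70]

    for name, scores in [("Section 1", section_1),
                         ("Section 2", section_2),
                         ("Section 3", section_3)]:
        # Range = highest score minus lowest score
        print(name, "range =", max(scores) - min(scores))   # 120, 4, 130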
Range can be strongly affected by the occurrence of extreme scores. Therefore, the range is often not the best way to represent the variability of scores.

Deviation Scores. One way to represent the distribution of scores is to look at the average amount that scores deviate (differ) from the mean. We could subtract the mean from each score and then sum these values. The problem is that the mean is the arithmetic center of the distribution. By definition, the sum of the deviations of scores that fall above the mean (positive deviations) is equal in magnitude to the sum of the deviations of scores that fall below the mean (negative deviations). When we sum deviation scores, we will always get zero, so the mean deviation score will always be zero as well. Clearly, the mean deviation score will not help us describe the distribution of scores. We could get around this, however, if we used absolute deviation scores (ignoring the positive or negative sign). In essence, that is what we do when we calculate Variance.

Variance - Instead of ignoring the sign, we use a little mathematical trick to convert all the deviation scores to positive numbers: we square them. You might recall from algebra that all squared values are positive, e.g., 2² = 4 and (-2)² = 4 (a negative multiplied by a negative is a positive). If we square all the deviation scores and sum them, we get a positive number. If we then divide by the number of scores, we obtain the average squared distance that scores deviate from the mean.

Variance is one of the most commonly used measures of variability. It is at the heart of the analysis we call Analysis of Variance (ANOVA). We are also going to learn that the assumption of homogeneity of variance (the requirement that the variances of the samples we are comparing do not differ from each other) is something you will need to test in order to use ANOVAs. For the moment, the important thing to realize about variance is that it measures variability in terms of the average squared distance between the individual scores in the distribution and the mean of that distribution.

In this course you will not be asked to calculate the variance of a distribution by hand, nor with a calculator. SPSS will do these calculations for you. There is, however, something you should be aware of about the manner in which SPSS calculates variance. It sums the squared deviation scores and divides by N - 1 (the total number of scores minus one). Why does it do that? To understand this, we have to take a short detour and discuss the difference between Samples and Populations.

A Population is the entire group of people or scores that you want to apply your sample statistics to. If I wanted to know the average height of students attending Platteville, I could go out and measure them all and then determine the exact average height. When we obtain measurements from an entire population, we refer to the descriptive values (mean, range, variability) as parameters. They are exact. When we do research, more often than not, we measure a sample of the population. The descriptive statistics from the sample are used as estimates of the population parameters. The word "statistic" means that we are estimating: a statistic is an estimate of a parameter. When we use statistics, we are taking a subset of scores (the sample) and generalizing from it to the population. One way that statistics can be misleading is that the sample might not be an unbiased subset of the population.
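Here is a minimal Python sketch of the two ideas above, using invented scores. It shows that deviation scores always sum to zero, and that dividing the sum of squared deviations by N - 1 rather than N gives a larger (corrected) value:

    scores = [2, 3, 5, 8, 12]               # invented scores for illustration
    n = len(scores)
    mean = sum(scores) / n                  # mean = 6.0

    deviations = [x - mean for x in scores]
    print(sum(deviations))                  # 0.0 -- the deviations always cancel out

    squared = [d ** 2 for d in deviations]
    print(sum(squared) / n)                 # 13.2 -- divide by N (population variance)
    print(sum(squared) / (n - 1))           # 16.5 -- divide by N - 1 (what SPSS reports)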
We have discussed sampling bias a great deal when talking about sample selection and external validity. Statisticians have also done a great deal of work looking at the degree to which statistics are unbiased estimates of parameters. The easiest way to understand this is to look at a technique they use called Monte Carlo studies. Statisticians generate a large distribution of numbers with a known mean and variability and then repeatedly (thousands of times) draw random samples of a given size from this population. Generally, they use computers to do these studies. Monte Carlo studies have provided us with two important findings.

1) Larger samples give more precise estimates of the population parameters. This should make sense: the larger the sample, the more representative it is of the population. Extreme scores at one end of the distribution are more likely to be counteracted by extreme scores at the other end, and thus the estimate is more accurate.

2) No matter how large the sample, some statistics are still biased estimates of the population. The mean is an unbiased estimate. If you calculate the mean from several samples, any given sample is as likely to overestimate the population mean as it is to underestimate it. If you average the means from several samples, you will get a good estimate of the population mean. Variance, however, is a biased estimate of the population variance: the smaller the sample, the more it underestimates the variance. I am sure there are complex mathematical explanations for this, but they would be well beyond what you need to know. The important thing to know is that this bias is very easily corrected. Statisticians have found that if you divide the sum of the squared deviation scores by N - 1, you get an unbiased estimate of the population variance. Notice that the larger the sample, the smaller this correction is. For example, dividing by 99 instead of 100 is a much smaller adjustment than dividing by 9 instead of 10.

When you have SPSS calculate the variance of a distribution of scores, it assumes you are working with a sample: it divides the sum of the squared deviation scores by N - 1. If you are really working with a population, you should correct this by multiplying the variance by N - 1 and then dividing by N.

One of the major advantages of variance is that it is easy for a computer program to work with. The major limitation is that, unlike computers, people have a difficult time thinking about squared values. If you look at the two distributions below, you can see that the variability of scores in Distribution B is twice that of Distribution A, but the variance of B is four times as large as A's. We can, however, convert variances to values that are easier to think about simply by taking their square roots. These are called standard deviations. The standard deviation of Distribution A is 1.58 and the standard deviation of Distribution B is 3.16. The variability of Distribution A is half that of Distribution B, and this is also true of the magnitudes of their respective standard deviations. In other words, it is not easy for us to compare distributions using variance, but it is easy to do so with standard deviations.

Distribution A (Mean = 6)
Score   Deviation   Squared Deviation
4       -2          4
5       -1          1
6        0          0
7        1          1
8        2          4
s² = 2.5    s = 1.58

Distribution B (Mean = 6)
Score   Deviation   Squared Deviation
 2      -4          16
 4      -2           4
 6       0           0
 8       2           4
10       4          16
s² = 10     s = 3.16
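You can verify the values in these tables with a short sketch using Python's statistics module, which (like SPSS) uses the N - 1 formulas for a sample:

    import statistics

    dist_a = [4, 5, 6, 7, 8]
    dist_b = [2, 4, 6, 8, 10]

    # statistics.variance() and statistics.stdev() divide by N - 1,
    # matching the sample formulas used in the tables above
    print(statistics.variance(dist_a), statistics.stdev(dist_a))   # 2.5  1.58...
    print(statistics.variance(dist_b), statistics.stdev(dist_b))   # 10   3.16...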
Standard Deviations. The easiest way to think about a standard deviation is as an approximation of the average amount that the scores in the distribution deviate from the mean. Yes, I know that the average deviation of scores in Distribution A is 1.5, not 1.58, and the average deviation for Distribution B is 3, not 3.16, but the standard deviation is a very close estimate. The small discrepancy is once again there so that the statistic estimates the population parameter. The important thing to remember is that variances and standard deviations allow us to use a number to describe the amount by which scores in a distribution differ from each other.

Properties of Variance and Standard Deviations. Standard deviations are more useful for describing the distribution of scores in a manner most people can understand, and they allow us to compare the average variability of scores between distributions. However, standard deviations cannot be meaningfully added or averaged. For example, if I wanted to calculate the average standard deviation of two distributions, I could not simply add them together and divide by 2. Instead, I would need to go back to the variances, find their average, and then reconvert to a standard deviation. The main point is that you cannot add, subtract, divide, or multiply standard deviations and obtain a meaningful answer. These mathematical manipulations can be done with variances, and that makes variance much more useful when computing inferential statistics. Although there is debate about which statistic, variance or standard deviation, should be reported in a results section, I suggest you use the one that is most easily understood: the standard deviation. Whenever you report a mean, you should report the standard deviation as well.

Variability is only one aspect of a distribution that may be important to look at, and perhaps include in your write-up. The shape of the distribution can also be important. Below are some examples of distributions and the terms used to describe them.

One way to describe a distribution is as symmetric or skewed. A distribution curve is symmetric if, when folded in half, the two sides match up. If a curve is not symmetrical, it is skewed. When a curve is positively skewed, most of the scores occur at the lower values of the horizontal axis, and the curve tails off toward the higher end. When a curve is negatively skewed, most of the scores occur at the higher values, and the curve tails off toward the lower end of the horizontal axis. SPSS reports skewness as a number. A perfectly symmetrical curve has a skewness value of 0. Positively skewed curves have positive values, whereas negatively skewed curves have negative values.

If a distribution is a bell-shaped symmetrical curve, the mean, median, and mode of the distribution will all be the same value. When the distribution is skewed, the mean and median will not be equal. Since the mean is most affected by extreme scores, it will have a value closer to the extreme scores than will the median. For example, consider a country so small that its entire population consists of a queen (Queen Cori) and four subjects. Their annual incomes are:

Citizen        Annual Income
Queen Cori     $1,000,000
Subject 1      $5,000
Subject 2      $4,000
Subject 3      $4,000
Subject 4      $2,000

I, as Queen, might boast that this is a fantastic country with an "average" annual income of $203,000. Before rushing off to become a citizen, you might want to be wise and find out what measure of central tendency I am using!
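A quick check in Python, using the incomes from the table above:

    import statistics

    incomes = [1_000_000, 5_000, 4_000, 4_000, 2_000]   # incomes from the table
    print(statistics.mean(incomes))     # 203000
    print(statistics.median(incomes))   # 4000
    print(statistics.mode(incomes))     # 4000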
The mean is $203,000, so I am not lying, but this is not a very accurate representation of the incomes of the population. Money is a continuous (ratio) variable, but in this case the median ($4,000) or the mode (also $4,000) would be a more representative value of the "average" income. The point to be made here is that the appropriate measure of central tendency is affected not only by the type of measurement but also by the shape of the distribution. The mean of a distribution is strongly affected by extreme scores, which we call outliers.

Another way to describe the shape of a distribution is kurtosis, which describes how peaked or flat a distribution is. The standard normal curve (which we will speak about more in the future) has a kurtosis value of 0. Curves that are narrower (more peaked) have positive values, whereas curves that are flatter (more evenly distributed) have negative kurtosis values. While numbers can be used to define skewness and kurtosis, these values are rarely reported in results sections.

Lab 2: Exploratory Data Analysis

Before analyzing data that you have entered into SPSS, it is advisable to conduct a quick exploratory analysis. In the lab today we will be learning how to do that. The purpose of the exploratory analysis is to familiarize you with the nature of your data distributions and to identify problems that might need to be dealt with, such as errors, extreme outliers, or extreme skew and/or kurtosis.

For Lab 2, I will give you the data; you should define the variables. In the first column, I have used 1 to indicate that the subject is male and 2 to indicate that they are female. In the second column, I have entered final exam scores for all the subjects (who just happen to have been in my Gen Psych class). In the third column, I have entered scores on a life happiness rating scale that ranges from 1 (very unhappy) to 7 (extremely happy). The first step is to name and define your variables (just like last week).

To begin with, we will look at the distribution of the overall scores. Then we will redo the analysis, looking at the exploratory analysis for males' and females' scores individually.

Using the Explore option.
From the top menu, click on Analyze, then click on Descriptive Statistics, and then click on Explore.
For the first analysis, move the final grade variable and the happiness rating variable into the Dependent List box.
In the Display option box, click on Both. Explore will limit output to either statistics (numbers) or plots (graphs), but I want you to become familiar with both, so choose Both.
Click on the box labeled Statistics. On the menu that comes up, Means should already have a check mark beside it; this is the default analysis. I want you to click the box next to Outliers as well. Click on Continue to return to the Explore menu.
Click on the Plots box. Then click on Histogram (note: Stem-and-leaf should not be selected – it will give you output you do not know how to interpret). Click on Continue to return to the Explore menu.
Click on OK and SPSS will display the results of your exploratory analysis.

The output will consist of a chart that contains various statistics for both variables. Most of these statistics you will be familiar with, but some are new. I will explain the new ones. Remember, statistics are estimates of population parameters. Based on the distribution of the scores, the 95% Confidence Interval for Mean output defines a range of scores within which we can be 95% confident the population mean lies.
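If you are curious where that interval comes from, here is a rough Python sketch of the standard t-based formula (the scores are invented for illustration; the critical value is taken from a t table rather than computed):

    import statistics

    scores = [72, 85, 90, 65, 78, 88, 95, 70]    # invented exam scores
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / n ** 0.5     # standard error of the mean

    t_crit = 2.365   # two-tailed .05 critical value of t for df = n - 1 = 7
    print(mean - t_crit * se, mean + t_crit * se)   # the 95% confidence interval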
If the sample is representative of the population, then in only 5% of samples would we obtain a sample so extreme (due to chance alone) that the population mean would not fall within this range.

5% Trimmed Mean - To obtain this value, SPSS removes the top and bottom 5% of cases and recalculates the mean. If you compare the original mean to the trimmed mean, you can see whether some of your extreme scores are having a strong influence on the mean. If the two are very different, it would be a good idea to check whether there is a possible data entry error, or whether a subject who should not have been included in your sample has, for some reason, been included. Perhaps they differ in age or in some other important way that might explain their extreme score. Perhaps they were unable to complete the task due to language difficulties or some other impairment.

Interquartile Range - defines the middle 50% of the scores. It is the range that would remain if the top 25% and the bottom 25% of scores were removed. We will come back to this at a later time.

The second table gives a list of the five highest and the five lowest scores in each variable's distribution. The case number is also present so that if an error is detected, it can be quickly identified and changed in the data set.

SPSS provides a Histogram so that you can see the distribution of scores for each variable.

The last output you will obtain for each variable is a Boxplot (also called a Box and Whisker Plot). The rectangle in the middle represents the interquartile range (the middle 50% of cases). The lines protruding from this rectangle (called whiskers) extend to the smallest and largest values. You may see additional circles outside this range – these are classified by SPSS as outliers. Data points are considered outliers if they are more than 1.5 box lengths from the edge of the box. Extreme scores (marked with an asterisk, *) are defined as scores more than three box lengths from the edge of the box. The line in the middle of the central rectangle is the median of the distribution.

One of the two variables has an extreme value. Identify the value (it is clearly an error), replace it with the most likely correct entry, and then redo the analysis for this variable. Note: there is no reason to redo the other variable, but do not lose the results for that variable – you will need them.

One final analysis. This time I want you to look at the final exam scores for males and females individually. You do this by moving the variable that defines subjects' sex into the Factor List box on the Explore menu. You will be comparing these to the overall statistics obtained for grade scores (when both males and females were included in the same distribution).

With this computer printout, you should be able to answer the following questions:

1. Compare the statistics you obtained after you removed the error from your data to the statistics you obtained when the error was present. Pay attention to which statistics (mean, median, mode, range, standard deviation, skewness, and kurtosis) are affected by extreme scores and which are not.

2. On your printout, find the overall variance for the final exams. If I wanted to determine the variance of the class's final grades as a parameter of the class population, how would I need to adjust this value? (i.e., what would the variance be if the class is treated as the entire population rather than as a sample?)

3. Find the mean and standard deviation for males' final exam scores and for females' final exam scores.
Then find the overall mean and standard deviation of the final exams for the entire class. Average the mean scores for males and females. Is the result the same as the overall mean for the distribution? Now average the standard deviations obtained for the males and the females. Is this value the same as the standard deviation obtained for the entire class? If not, why not? Mathematically, how would you go about getting the standard deviation of the class if all you had to work with were the number of subjects in each subgroup and the standard deviation of each subgroup?
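A hint for that last question, in the form of a Python sketch. The scores are invented, and both groups are deliberately given the same mean (6) to keep the arithmetic simple; notice that averaging the standard deviations fails, while rebuilding the sums of squared deviations from the subgroup variances succeeds:

    import statistics

    males   = [5, 6, 7]            # invented scores; mean 6, s = 1.00
    females = [0, 3, 6, 9, 12]     # invented scores; mean 6, s ≈ 4.74
    both    = males + females

    # Simply averaging the two standard deviations gives the wrong answer:
    print((statistics.stdev(males) + statistics.stdev(females)) / 2)   # ≈ 2.87
    print(statistics.stdev(both))                                      # ≈ 3.63

    # Rebuilding each group's sum of squared deviations (variance * (n - 1))
    # and re-dividing by the combined N - 1 recovers the overall value.
    # (This shortcut is exact here only because both group means equal
    # the overall mean.)
    ss = (statistics.variance(males) * (len(males) - 1)
          + statistics.variance(females) * (len(females) - 1))
    print((ss / (len(both) - 1)) ** 0.5)                               # ≈ 3.63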