How do you know?: Interpreting and Analyzing Data
NCLC 203
New Century College, George Mason University
April 6, 2010

We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time
- T.S. Eliot

Learning objectives:
Quantitative
• Differentiate between descriptive and inferential statistics.
• Know how to calculate measures of central tendency (mean, median, mode).
• Understand measures of dispersion (range, frequencies, percentages) and how to report them.
Qualitative
• Review Stringer’s (2007) steps for interpreting and analyzing data.
• Differentiate between coding and categorizing data.
Overall
• Understand the benefits and challenges of using different techniques.
• Learn to draw conclusions from the data (what is known? what is interesting or intriguing but uncertain or unknown?).

Brief review of surveys
Advantages
• Gather lots of data in short time frame
• Relatively inexpensive
• Standardized format and choices allow for direct comparison across participants
• Participants may feel more confident in anonymity and confidentiality
Disadvantages
• Responses may not be accurate
• Wording of questions may affect responses
• Researcher bias may affect survey design choices
• Literacy issues
• Lack of depth of information, chance for follow-up

Introduction to Descriptive Statistics
Descriptive versus Inferential Statistics:
Descriptive statistics are used to DESCRIBE data.
Examples: What is the average case? The most frequently occurring case? The percentage of people who do X? What is the relationship between X and Y?
Other kinds of statistics are used to make INFERENCES (deductions, conclusions) from data, offer probabilities about the likelihood that something will happen…
Examples: What is the likelihood that X will happen?

Average (Mean)
Sum of all data points divided by the number of cases.
Given the sample data, what is the average?
Our Sample Data Set: 2, 4, 4, 4, 6
(2 + 4 + 4 + 4 + 6) / 5 = 20 / 5 = 4.0

Notes about the average:
• Every item in the group is used to calculate the mean.
• Every group of data has one and only one mean (rigidly determined).
• The mean may take on a value that is not realistic/meaningful in a social sense.
• An extreme value in the group (an outlier) has a disproportionate influence on the mean.

Median
The middle value in an ordered array of data.
• First, arrange the data in order, either ascending or descending.
• After the numbers are in order, find the middle number.
• If there is an even number of cases, add the middle two values and divide by two.
In our data set 2, 4, 4, 4, 6, the median is 4.
What is the median for the following data set? 2, 4, 6, 9

Notes about the median:
• The median is not affected by extreme values.
• At most, only two items will be used to calculate the median.
• If items are not clustered closely around the median, the median is not a good measure of central tendency.
• Medians would not usually take on unrealistic values (only if two data items are averaged).

Mode
The mode is the most frequently occurring value(s).
In our data set 2, 4, 4, 4, 6, the mode is 4.

Notes on mode:
A data set can have one or more modes (unimodal, bimodal, trimodal, tetramodal, etc.), which can make it difficult to interpret.
For instance, what is the most common score in the list: 1, 2, 2, 3, 4, 4, 5, 8?
In this case, the data set has two modes: 2 and 4.

Mean vs Median vs Mode
The mode will often tell us more about a set of data than does the mean.
Example: 78, 78, 78, 42. Which score better represents the data?
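The three measures of central tendency above can also be checked in software. The following is a minimal sketch in Python using the standard library's statistics module; the helper name summarize is an assumption made for illustration and is not part of the course materials.

```python
from statistics import fmean, median, multimode


def summarize(data):
    """Return the mean, median, and mode(s) of a list of numbers.

    `summarize` is a hypothetical helper written for this example.
    """
    return {
        "mean": fmean(data),        # sum of all data points / number of cases
        "median": median(data),     # middle value (or average of the middle two)
        "modes": multimode(data),   # every most frequently occurring value
    }


# Data sets taken from the slides above.
print(summarize([2, 4, 4, 4, 6]))           # mean 4.0, median 4, modes [4]
print(summarize([2, 4, 6, 9]))              # median 5.0 (average of 4 and 6)
print(summarize([1, 2, 2, 3, 4, 4, 5, 8]))  # modes [2, 4] (bimodal)
print(summarize([78, 78, 78, 42]))          # mean 69.0, median 78.0, modes [78]
```

For the scores 78, 78, 78, 42, the single low score pulls the mean down to 69, while the median and the mode both stay at 78.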
Another example: 42, 78, 80, 82, 83
The median is 80, the mean is 73.
The median is more resistant to outliers than is the mean.
If these were a student’s scores, would you give them a “C” or a “B”?

An example….

Measures of Dispersion
Say something about how spread out the data are, that is, how the data are “dispersed” throughout the data set.
Includes:
• Range
• Max
• Min
• Frequency: number of times something appears in a data set
• Also, Standard Deviation (won’t cover this here)

Range: the difference between the maximum and the minimum.
Maximum (MAX): the maximum number in the data set.
Minimum (MIN): the minimum number in the data set.

FREQUENCY:
Often, neither the measures of central tendency nor the measures of dispersion describe the data well enough. One very useful tool in this case is the Frequency Table. A Frequency Table is just that: a table that tells the frequency (how often) values occur in a data set. Usually data are grouped together in what are called BINS.
This concept is best illustrated with a large data set; however, to use the data set we have been working with thus far (2, 4, 4, 4, 6):

Bin Boundaries
Bigger Than    Less than or Equal to    Number of Items in Bin [Frequency]
>0             0                        0
0              2                        1
3              5                        3
6              8                        1
9              Infinity                 0

HISTOGRAM:
A HISTOGRAM is simply a bar graph of the frequency table. Along the base ("x" axis) go the bin boundaries and in the vertical direction go the frequencies. For our simple data set of 2, 4, 4, 4, 6, the histogram would be:
[Histogram "Sample Data": x-axis Bins (0, 2, 4, 6), y-axis Frequency (0 to 3.5)]

Small group activity
• Take the brief survey, anonymously.
• Trade surveys with another group.
• Calculate the mean, median, mode, min/max for questions 1-2.
• Calculate the frequency for question 3.
• Discuss how you would report question 4.
How would you report these results?

Using technology for quantitative analysis
Can use multiple platforms for tabulating, compiling, and analyzing survey results.
A few include:
• Excel
• SPSS
• Surveymonkey.com

Working with qualitative data
Advantages
• Treats each individual as unique, builds on their reality and experiences
• Questions can be asked in a number of formats (structured, unstructured or semi-structured interviews, focus groups)
• Permits rich data
Challenges
• Open-ended questions not easily aggregated/reduced
• Can be time-consuming
• Can be difficult to compare across cases

One example of a customer survey interview gone wrong…
http://www.nbc.com/The_Office/video/customer-survey/817102/

Steps from Stringer (chapter 5)
1. Review the collected data
2. Identify key experiences (categorizing and coding)
   Categorizing = grouping similar concepts together
   Coding = noting variations in your categories
3. Identify main features of each experience
4. Identify elements that compose the experience
5. Identify themes
6. Develop a report framework

Small group activity
• Each group is assigned a topic.
• Everyone should write for 10 minutes about that topic.
• Have each person read their paragraph aloud to the group.
• Work through Stringer’s stages… identify key experiences, and elements that compose that experience.

Big Picture Reflection:
• What will you look for?
• What will you measure?
• What will you do with it?
(Adapted from Gelmon and Holland, AAHE 2003)

Using technology for qualitative analysis…
Can use multiple platforms for categorizing, coding, and making meaning of qualitative data. A few include:
• Excel
• Microsoft Word (use the comment function)
• QDA (free!) http://www.pressure.to/qda/
• HyperResearch
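For a small project, the categorizing and coding step can also be tallied with a few lines of a general-purpose language. The following is only a rough sketch in Python: the category and code labels are invented for illustration (they do not come from Stringer or from any real data), and tally_codes is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Hypothetical coded excerpts: each pair is (category, code), as a reader
# might assign them while working through written responses.
coded_excerpts = [
    ("workload", "too much reading"),
    ("workload", "time pressure"),
    ("workload", "time pressure"),
    ("group work", "unequal effort"),
]


def tally_codes(pairs):
    """Count how often each code appears within each category."""
    tally = defaultdict(Counter)
    for category, code in pairs:
        tally[category][code] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {category: dict(codes) for category, codes in tally.items()}


for category, codes in tally_codes(coded_excerpts).items():
    print(category, codes)
```

A tally like this only counts labels that a person has already assigned; the interpretive work of identifying experiences and themes still happens in the reading.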