Interpreting and Analyzing Data

How do you know?
Interpreting and
Analyzing Data
NCLC 203
New Century College,
George Mason University
April 6, 2010
We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time
- T.S. Eliot
Learning objectives:
Quantitative

Differentiate between descriptive and inferential statistics.

Know how to calculate measures of central tendency (mean, median, mode).

Understand measures of dispersion (range, frequencies, percentages) and how to
report them.
Qualitative

Review Stringer’s (2007) steps for interpreting and analyzing data.

Differentiate between coding and categorizing data.
Overall

Understand the benefits and challenges of using different techniques.

Learn to draw conclusions from the data (what is known? what is interesting or
intriguing but uncertain or unknown?).
Brief review of surveys
Advantages
•Gather lots of data in a short time frame
•Relatively inexpensive
•Standardized format and choices allow for direct comparison across participants
•Participants may feel more confident because of anonymity and confidentiality
Disadvantages
•Responses may not be accurate
•Wording of questions may affect responses
•Researcher bias may affect survey design choices
•Literacy issues
•Lack of depth of information and little chance for follow-up
Introduction to Descriptive Statistics
Descriptive versus Inferential Statistics:
Descriptive statistics are used to DESCRIBE data
examples: what is the average case? The most frequently occurring
case? The percentage of people who do x? What is the relationship
between X and Y?
Inferential statistics are used to make INFERENCES (deductions,
conclusions) from data and to offer probabilities about the likelihood that
something will happen.
examples: what is the likelihood that X will happen?
Average (Mean)
Sum of all data points divided by the number of cases.
Given the sample data, what is the average?
Our Sample Data Set: 2, 4, 4, 4, 6
(2+4+4+4+6)/5 = 20/5 = 4.0
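As a minimal sketch in Python, the same calculation looks like this (the function name `mean` is just illustrative):

```python
def mean(values):
    """Average: sum of all data points divided by the number of cases."""
    return sum(values) / len(values)

data = [2, 4, 4, 4, 6]  # our sample data set
print(mean(data))  # 4.0
```

Python's standard library also provides `statistics.mean`, which gives the same value for this data.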
Average (Mean)
Notes about the average:

Every item in the group is used to calculate the mean.

Every group of data has one and only one mean
(rigidly determined).

The mean may take on a value that is not
realistic/meaningful in a social sense.

An extreme value in the group (an outlier) has a
disproportionate influence on the mean.
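To see the last point in action, this sketch swaps the 6 in our sample data for a hypothetical extreme value of 96. The outlier is invented purely for illustration.

```python
def mean(values):
    """Sum of all data points divided by the number of cases."""
    return sum(values) / len(values)

data = [2, 4, 4, 4, 6]
with_outlier = [2, 4, 4, 4, 96]  # hypothetical: the 6 replaced by an extreme value

print(mean(data))          # 4.0
print(mean(with_outlier))  # 22.0
```

One extreme value moves the mean from 4.0 to 22.0, even though four of the five cases are unchanged.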
Median
The middle value in an ordered array of
data.

First, arrange the data in order, either
ascending or descending.

After the numbers are in order, find the middle number.

If there is an even number of cases, then add the
middle two values and divide by two.
In our data set 2, 4, 4, 4, 6, the Median is 4
Median
What is the median for the following data
set?
2, 4, 6, 9
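Both median rules (an odd or an even number of cases) fit in a short Python sketch. The function name `median` is illustrative; the standard `statistics.median` follows the same rules. Running it on the exercise data set gives the answer.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)  # step 1: put the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of cases: take the single middle value
        return ordered[mid]
    # Even number of cases: add the two middle values and divide by two
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 4, 4, 6]))  # 4
print(median([2, 4, 6, 9]))     # 5.0
```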
Median
Notes about the median:

The median is not affected by extreme values.

At most, only two items will be used to calculate the
median.

If items are not clustered closely around the median,
the median is not a good measure of central tendency.

Medians rarely take on unrealistic values; this can happen
only when two data items are averaged (an even number of cases).
Mode
The mode is the most frequently occurring
value(s).

In our data set 2, 4, 4, 4, 6 the mode is 4.
Mode
Notes on Mode:

A data set can have one or more modes (unimodal,
bimodal, trimodal, tetramodal, etc.), which can make it
difficult to interpret.
For instance, what is the most common score in the list:
1, 2, 2, 3, 4, 4, 5, 8?
In this case, the data set has two modes: 2 and 4.
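Because a data set can have more than one mode, a sketch that returns every most-frequent value (not just one) handles both examples. It tallies frequencies with `collections.Counter`; the function name `modes` is illustrative. Since Python 3.8 the standard `statistics.multimode` does the same job, though it lists modes in order of first appearance rather than sorted.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, sorted (several if the data are multimodal)."""
    counts = Counter(values)       # value -> number of times it occurs
    top = max(counts.values())     # highest frequency in the data
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 4, 4, 6]))           # [4]
print(modes([1, 2, 2, 3, 4, 4, 5, 8]))  # [2, 4]
```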
Mean vs Median vs Mode

The mode can sometimes tell us more about a set of
data than the mean does.
Example: 78, 78, 78, 42.
Which score better represents the data?

Another example: 42, 78, 80, 82, 83.
The median is 80; the mean is 73. The median is
more resistant to outliers than the mean is. If these
were a student’s scores, would you give them a “C” or
a “B”?
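The numbers in both examples can be checked with Python's standard `statistics` module. This sketch prints the mean and median side by side.

```python
from statistics import mean, median

examples = {
    "78, 78, 78, 42": [78, 78, 78, 42],
    "42, 78, 80, 82, 83": [42, 78, 80, 82, 83],
}

for label, scores in examples.items():
    # Format as floats so whole and fractional results print the same way
    print(f"{label}: mean = {mean(scores):.1f}, median = {median(scores):.1f}")

# Prints:
# 78, 78, 78, 42: mean = 69.0, median = 78.0
# 42, 78, 80, 82, 83: mean = 73.0, median = 80.0
```

In both cases the single low score (42) drags the mean down, while the median stays with the cluster of higher scores.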
Measures of Dispersion
Measures of dispersion describe how spread out the values in a data set are,
that is, how the data are “dispersed” throughout the data set.
Includes:
•Range
•Max
•Min
•Frequency: the number of times a value appears in a data set
•Also, standard deviation (not covered here)
Measures of Dispersion
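Range: the difference between the maximum and the minimum (MAX - MIN).

Maximum (MAX): the largest value in the data set.

Minimum (MIN): the smallest value in the data set.
For 2, 4, 4, 4, 6: MAX = 6, MIN = 2, and the range is 6 - 2 = 4.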
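In Python, the built-in `min` and `max` functions give MIN and MAX directly, and the range is their difference (range = MAX - MIN). A minimal sketch on our sample data:

```python
data = [2, 4, 4, 4, 6]

minimum = min(data)              # MIN: smallest value
maximum = max(data)              # MAX: largest value
value_range = maximum - minimum  # range = MAX - MIN (named to avoid shadowing Python's built-in range)

print(minimum, maximum, value_range)  # 2 6 4
```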
Measures of Dispersion
FREQUENCY:

Often, neither measures of central tendency nor measures of
dispersion such as the range describe the data well enough.

One very useful tool in this case is the Frequency Table.

A Frequency Table is just that: a table that tells the frequency (how
often) values occur in a data set.

Usually data are grouped together in what are called BINS.
Measures of Dispersion
FREQUENCY:

This concept is best illustrated with a large data set; however, it can also be
applied to the data set we have been working with thus far: 2, 4, 4, 4, 6
Bin Boundaries              Number of Items in Bin
(Less than or Equal to)     [Frequency]
0                           0
2                           1
5                           3
8                           1
Bigger Than 8               0
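The counting behind a frequency table can be sketched in Python. This assumes the bins are given as sorted "less than or equal to" upper boundaries (0, 2, 5, 8, the values in the table) plus one overflow bin for anything bigger; the function name `frequency_table` is hypothetical. The sketch also prints each bin as a percentage of all cases.

```python
from bisect import bisect_left

def frequency_table(values, upper_bounds):
    """Count how many values fall in each bin.

    upper_bounds must be sorted. A value goes in the first bin whose upper
    boundary is >= the value; values above the last boundary go in an
    extra "bigger than" bin at the end.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for v in values:
        # bisect_left finds the index of the first boundary >= v
        counts[bisect_left(upper_bounds, v)] += 1
    return counts

data = [2, 4, 4, 4, 6]
bounds = [0, 2, 5, 8]
counts = frequency_table(data, bounds)

labels = [f"<= {b}" for b in bounds] + [f"> {bounds[-1]}"]
for label, count in zip(labels, counts):
    percent = 100 * count / len(data)  # share of all cases in this bin
    print(f"{label:>5}: {count} ({percent:.0f}%)")
```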
Measures of Dispersion
HISTOGRAM:

A HISTOGRAM is simply a bar graph of the frequency
table. Along the base ("x" axis) go the bin boundaries and in
the vertical direction go the frequencies.
For our simple data set of 2, 4, 4, 4, 6, the histogram would be:
[Figure: histogram titled "Sample Data", with Bins on the horizontal axis and Frequency on the vertical axis]
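Without a charting tool, a rough text histogram can be printed instead. This sketch turns the chart on its side (bins down the left, frequency as bar length) and, as an assumption for this small data set, treats each distinct value as its own bin; `text_histogram` is a hypothetical helper.

```python
from collections import Counter

def text_histogram(values):
    """Return one line per distinct value, with a bar of '#' whose length is its frequency."""
    counts = Counter(values)
    return [f"{value:>3} | {'#' * counts[value]}" for value in sorted(counts)]

for line in text_histogram([2, 4, 4, 4, 6]):
    print(line)

# Prints:
#   2 | #
#   4 | ###
#   6 | #
```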
Small group activity
•Take the brief survey, anonymously.
•Trade surveys with another group.
•Calculate the mean, median, mode, and min/max for questions 1-2. Calculate the
frequency for question 3. Discuss how you would report question 4.
•How would you report these results?

Using technology for quantitative
analysis

Multiple platforms can be used for tabulating, compiling, and analyzing
survey results. A few include:
•Excel
•SPSS
•Surveymonkey.com
Working with qualitative data
Advantages
•Treats each individual as unique, building on their reality and
experiences
•Questions can be asked in a number of formats (structured,
unstructured, or semi-structured interviews; focus groups)
•Permits rich data
Challenges
•Open-ended questions are not easily aggregated or reduced
•Can be time-consuming
•Can be difficult to compare across cases
One example of a customer survey
interview gone wrong…
http://www.nbc.com/The_Office/video/customer-survey/817102/
Steps from Stringer (chapter 5)
1. Review the collected data
2. Identify key experiences (categorizing and coding)
   Categorizing = grouping similar concepts together
   Coding = noting variations in your categories
3. Identify the main features of each experience
4. Identify the elements that compose the experience
5. Identify themes
6. Develop a report framework
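Categorizing and coding are interpretive work done by the researcher, but once excerpts have been tagged, keeping track of them is mechanical. This Python sketch assumes each excerpt has already been given a category (a group of similar concepts) and a code (a variation within that category); the excerpts, categories, and codes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical excerpts, each already tagged by the researcher:
# (excerpt text, category, code)
excerpts = [
    ("I never know who to ask for help", "support", "unclear contacts"),
    ("My mentor checks in every week", "support", "regular mentoring"),
    ("The forms take hours to fill out", "workload", "paperwork"),
    ("Nobody answered my email for a month", "support", "unclear contacts"),
]

# Categorizing: group excerpts under their category.
# Coding: within each category, keep the variations (codes) apart.
by_category = defaultdict(lambda: defaultdict(list))
for text, category, code in excerpts:
    by_category[category][code].append(text)

for category, codes in by_category.items():
    print(category)
    for code, texts in codes.items():
        print(f"  {code}: {len(texts)} excerpt(s)")
```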
Small group activity




•Each group is assigned a topic.
•Everyone should write for 10 minutes about that topic.
•Have each person read their paragraph aloud to the group.
•Work through Stringer’s stages: identify key experiences and the elements
that compose each experience.
Big Picture Reflection:

What will you look for?

What will you measure?

What will you do with it?
(Adapted from Gelmon and Holland, AAHE 2003)
Using technology for qualitative
analysis

Multiple platforms can be used for categorizing, coding, and making meaning
of qualitative data. A few include:
•Excel
•Microsoft Word (use the comment function)
•QDA (free!) http://www.pressure.to/qda/
•HyperResearch