Chapter 14 – Data and Information Analysis (pp. 348-387)
Overall teaching objective: To introduce undergraduate criminal justice research methods
students to various quantitative and qualitative analytical techniques and to demonstrate
their applications.









Analysis is a rather useful word. It can mean some form of examination, study,
investigation, scrutiny, and/or testing.
In research, we use the term ‘analysis’ to describe the process by which researchers
evaluate the data they gather and formulate an answer to their research questions or
hypotheses.
Analysis occurs at the end of the research process, after we have conducted our literature
review, designed our research method and collected our data.
But our planning of the analysis phase of research should start at the very beginning of
the research process. Without an eye on how we plan to analyze our data, we cannot
develop an effective research design. Indeed, concerns about analysis underlie the entire
research process since analysis is the essential task in research. Making Research Real 14.1 offers an example.
The purpose of this chapter is to describe the tools that researchers use to analyze data.
The chapter is divided into two parts.
o The first part focuses on the analysis of quantitative data.
o The second part focuses on the analysis of qualitative data.
The analytical tools used by quantitative and qualitative researchers are different because
the data these researchers collect are different. But generally speaking, analysis is
analysis: in both cases, researchers evaluate their data to answer their research questions.
In both quantitative and qualitative research, considerations about analysis should be
made during the design phase of the research process to ensure that the proper analysis
can take place.
During the analysis phase, researchers evaluate the data they gather to answer their
research questions or hypotheses. Even though analysis occurs near the end of the
research process, considerations of analysis should occur earlier in the research process.
Making Research Real 14.1 – Anticipating Analysis (p. 348)
 A high school counselor and school resource officer are interested in the
relationship (correlation) between illegal drug use and juvenile delinquency.
 They decide to distribute a survey that, among other things, asks: “Do you use
drugs?” (yes or no) and “Before your 18th birthday, did you violate the law?” (yes or no).
 The most accurate way to measure correlation is with the Pearson r statistic.
 Unfortunately for our researchers, this option is not available because this
statistical technique requires interval or ratio level data. Their data are nominal.
 Had the researchers considered earlier the level of data they needed to perform this
analysis (and answer their research question), they could have revised their
questions and response sets so that they yielded interval or ratio level data.
Quantitative Data Analysis (p. 349)


Statistics summarize large amounts of data into a single number and enable us to
communicate information efficiently.
There are two general types of statistics:
o descriptive statistics and
o inferential statistics.
Descriptive Statistics (p. 350)
 Descriptive statistics describe the current state of something. An important set of
descriptive statistics are known as the measures of central tendency. These
measures include the mean, median, and mode.
Measures of central tendency (mean, median, mode) (p. 350)
 The mean is calculated by adding together all of the values for a particular
variable and dividing that sum by the total number of cases. Although it is a good
measure of central tendency, it is sensitive to extreme values, or outliers.
 The median is referred to as the middlemost value because it is the value that is
situated in the middle, with half the cases equal to or greater than and half the
cases equal to or less than this value. It is less susceptible to extreme values or
outliers than the mean.
 The mode is the most frequently occurring value in a population or sample. Like
the median, the mode is less susceptible to extreme values or outliers than the
mean.
 The decision about which measure of central tendency to use should be based on
two factors:
o whether the data are skewed toward extreme scores, and
o what level the variables are measured at.
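To see why the mean is sensitive to outliers while the median and mode are not, the short Python sketch below computes all three measures for a small set of made-up data and then recomputes them after one extreme case is added. The prior-arrest counts are hypothetical, and `central_tendency` is a helper name introduced only for illustration.

```python
from statistics import mean, median, multimode

def central_tendency(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns every value tied for most frequent (Python 3.8+)
        "modes": multimode(values),
    }

# Hypothetical data: number of prior arrests for ten probationers.
prior_arrests = [0, 1, 1, 1, 2, 2, 3, 4, 5, 6]

print(central_tendency(prior_arrests))
# Add one extreme case (an outlier) and compare the three measures.
print(central_tendency(prior_arrests + [40]))
```

With the outlier added, the mean rises from 2.5 to about 5.9, while the median (2) and the mode (1) stay where they were.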
Table 14.1 - Level of measurement and measures of central tendency. (p. 355)
Level of measurement | Measure(s) of central tendency | Example
Nominal | Mode | With the exception of the driver, the most frequently injured occupant in a vehicle crash (the mode) is the person in the front passenger seat.
Ordinal | Mode, median | A review of the top scores in the Sergeant’s Promotional Exams indicates that the patrol officers who most frequently place near the top of the list (the mode) are those with 10 or more years of experience. Of the eleven officers who took the latest promotional exam, six scored 95 percent or higher and six scored 95 percent or lower (the one officer who scored exactly 95 percent falls in both groups), meaning that 95 percent is the median score among these test takers. According to this department’s promotional policy, only the six officers who scored 95 percent or higher are eligible for further promotional consideration.
Interval/Ratio | Mode, median, mean | The most frequently occurring age (the mode) at which juveniles begin offending is 12 years of age. The median age at which juveniles begin offending is 12 years of age: half of all juvenile offenders begin offending at 12 years of age or younger, and half begin offending at 12 years of age or older. The average (mean) age at which juveniles begin offending is 12 years old.
Variability (range, standard deviation, percentages, percentiles, percent change) (p. 356)
 Measures of variability are descriptive statistics that tell us how much variation
exists within a sample or population.
 Among the measures of variability is the range, which is the difference between
the highest and lowest value in a sample or population. The range is computed by
subtracting the smallest value from the largest value. Like the mean, this
descriptive statistic is susceptible to extreme scores or outliers.
 The standard deviation is a descriptive statistic that describes how much
variability exists within a sample or population. Because the standard deviation
is based on how far every case falls from the mean, rather than on only the two
most extreme values, it is a much more stable statistic than the range.
 A percentage is a descriptive statistic that describes a portion of a sample or
population. Percentages are calculated by dividing the number of like cases by
the total number of cases, then multiplying that quotient by 100.
 A percentile is a statistic that tells us where a value ranks within a distribution.
Sometimes this is referred to as the percentile rank. We calculate the percentile
rank by dividing the number of cases below the value by the total number of cases
and then multiplying that quotient by 100.
 Percent change is a descriptive statistic that indicates how much something
changed from one time to the next. We calculate the percent change by
subtracting the original number from the new number, dividing that difference by
the original number and then multiplying that quotient by 100.
 Rates are descriptive statistics that enable us to compare similar behaviors across
multiple locations. Rates factor in population size and report incidents per n
units of population (for example, crimes per 100,000 residents).
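The formulas above translate directly into a few lines of code. The following is a minimal Python sketch, assuming made-up burglary counts and case totals; the helper names are hypothetical, and the default of 100,000 residents for rates is an assumption (a common choice for crime rates, not something this chapter prescribes). Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` is the population version.

```python
from statistics import stdev

def value_range(values):
    # Range: the largest value minus the smallest value.
    return max(values) - min(values)

def percentage(like_cases, total_cases):
    # Number of like cases divided by the total number of cases, times 100.
    return like_cases / total_cases * 100

def percentile_rank(value, values):
    # Number of cases below the value divided by the total number of cases, times 100.
    below = sum(1 for v in values if v < value)
    return below / len(values) * 100

def percent_change(original, new):
    # (new - original) divided by original, times 100.
    return (new - original) / original * 100

def rate(incidents, population, per=100_000):
    # Incidents per `per` units of population; 100,000 residents is an assumed default.
    return incidents / population * per

# Hypothetical monthly burglary counts for one city.
monthly_burglaries = [20, 22, 25, 18, 30, 23]

print("range:", value_range(monthly_burglaries))
print("sample standard deviation:", round(stdev(monthly_burglaries), 2))
print("percent of 180 cases cleared (45 cleared):", percentage(45, 180))
print("percentile rank of a 25-burglary month:", round(percentile_rank(25, monthly_burglaries), 1))
print("percent change, 400 -> 340 burglaries:", round(percent_change(400, 340), 1))
print("rate per 100,000 (1,250 burglaries, population 500,000):", round(rate(1250, 500_000), 1))
```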
The normal distribution (p. 361)
 In normally distributed data, the mean, median and mode are equal because the
data are distributed symmetrically around the same value.
 In a normal distribution:
o about 68.27 percent of all cases fall within one standard deviation of the mean,
o about 95.45 percent of all cases fall within two standard deviations of the mean,
and
o about 99.73 percent of all cases fall within three standard deviations of the
mean.
Making Research Real 14.2 – How Intelligent are the Inmates in our System? (p. 362)
 Using what he knows about normally distributed data, a researcher evaluates the
results of intelligence tests administered to inmates.
 This information enables a correctional administrator to make more
appropriate assignments (e.g., to programs) for inmates.
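The percentages listed for the normal distribution can be checked numerically, and the same idea underlies the inmate intelligence example. This Python sketch uses the standard library's `statistics.NormalDist`; the mean of 100 and standard deviation of 15 are an assumption (a common scale for intelligence tests), and the score of 70 is hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

def percent_scoring_below(score, mean=100, sd=15):
    # Assumes scores are normally distributed with mean 100 and SD 15 (an assumed IQ-style scale).
    return NormalDist(mu=mean, sigma=sd).cdf(score) * 100

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k) * 100:.2f}%")

# Hypothetical inmate score two standard deviations below the assumed mean.
print(f"percent of test takers scoring below 70: {percent_scoring_below(70):.1f}%")
```

The loop reports roughly 68.27, 95.45, and 99.73 percent. A score of 70 sits two standard deviations below the assumed mean, so only about 2.3 percent of test takers would be expected to score lower.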
Inferential Statistics (p. 362)
 Inferential statistics enable analysts to determine the probability of certain
outcomes. When reading inferential statistics, we are concerned with statistical
significance, which is a measure of the probability that a result at least as large as
the one observed would occur by chance alone. If the statistical significance of a
statistic is .05 or less, we conclude that the result is unlikely to be due to chance.
Statistical significance (p. 363)
 Statistical significance is a measure of the probability that a result at least as
large as the one observed would occur by chance alone. As a general rule, if the
statistical significance of a statistic is .05 or less, we conclude that the result is
unlikely to be due to chance. The .05 level of statistical significance means that
there is a 5 in 100 chance of obtaining a result this large by pure chance.
t-tests (p. 363)
 The t-test is a statistical technique used to determine whether or not two groups
are different with respect to a single variable. T-tests can only be run using
interval or ratio level data. If the statistical significance of the t-score is .05 or
less, it can be concluded that the difference between the two groups is unlikely to be
due to chance.
Making Research Real 14.3 – Improving the Self Esteem of Juvenile Offenders
 In this research, a juvenile probation officer compares the results of an experiment
that tested the effect of a new program designed to improve the self-esteem of
probationers in her caseload.
 She uses a t-test to determine whether the difference in self-esteem scores between
the experimental and control groups (post-treatment) is statistically significant
(i.e., unlikely to be due to chance).
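A sketch of the probation officer's comparison in Python, assuming hypothetical post-treatment scores and using only the standard library. It computes an independent-samples t statistic with a pooled variance (this version assumes the two groups have similar variances) and estimates the two-tailed significance by numerically integrating the t distribution with Simpson's rule. Statistical packages compute the same quantity more efficiently; this version is only meant to show where the number comes from, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|): 1 minus twice the area under the density from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)

def independent_t_test(group_a, group_b):
    """Pooled-variance t-test; returns (t, degrees of freedom, two-tailed p)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df, two_tailed_p(t, df)

# Hypothetical post-treatment self-esteem scores.
experimental = [32, 35, 30, 38, 36, 34, 33, 37, 31, 36]
control = [28, 30, 27, 31, 29, 33, 26, 30, 28, 29]

t, df, p = independent_t_test(experimental, control)
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```

Because p falls well below .05 for these made-up scores, the officer would conclude that the post-treatment difference is unlikely to be due to chance.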
Analysis of variance (ANOVA) (p. 363)


The analysis of variance (ANOVA) model allows analysts to compare two or more
groups to see if they are different with respect to a single variable measured at
the interval or ratio level.
An ANOVA produces an F-ratio statistic. If the statistical significance of the F-ratio
is .05 or less, it can be concluded that the difference between at least two of
the groups is unlikely to be due to chance.
Making Research Real 14.4 – Improving the Self Esteem of Juvenile Offenders – Part II
(p. 364)
 In this research, a juvenile probation officer compares the results of an experiment
that tested the effect of a new program designed to improve the self-esteem of
probationers in her caseload.
 In this case, she divided her caseload into three groups.
 She used an ANOVA to determine whether the differences in self-esteem scores
among the three groups (post-treatment) are statistically significant (i.e.,
unlikely to be due to chance).
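The same logic extends to three groups. In this hypothetical Python sketch, the F-ratio is the between-groups mean square divided by the within-groups mean square. To stay within the standard library, the significance level uses a closed-form expression for the F distribution that holds only when there are exactly two between-groups degrees of freedom (that is, three groups); for any other number of groups, a statistics package would be needed. The scores and function names are illustrative assumptions.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return the F-ratio and its degrees of freedom for two or more groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: how far each score sits from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

def f_p_value(f_ratio, df_between, df_within):
    # Closed form for P(F >= f) that holds only when df_between == 2 (three groups).
    if df_between != 2:
        raise ValueError("this shortcut only covers three groups")
    return (1 + 2 * f_ratio / df_within) ** (-df_within / 2)

# Hypothetical post-treatment self-esteem scores for three groups of probationers.
control = [28, 30, 27, 31, 29]
program = [33, 35, 32, 34, 36]
program_plus = [31, 33, 30, 32, 34]

f, df_b, df_w = one_way_anova(control, program, program_plus)
print(f"F({df_b}, {df_w}) = {f:.2f}, p = {f_p_value(f, df_b, df_w):.4f}")
```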
Chi Square (p. 365)
 The chi-square test is used to determine whether there is a statistically significant
difference between what we expect to happen and what actually happens.
 The operative statistic is called the chi-square statistic.
 If the statistical significance of the chi-square statistic is .05 or less, it can be
concluded that the difference between what actually happened and what was
expected to happen is unlikely to be due to chance.
Making Research Real 14.5 – Profiling at the Airport (p. 365)
 In this research, a researcher attempts to determine whether racial or ethnic
minorities (particularly Muslim-appearing travelers) are selected for more
invasive searches at airports.
 The researcher calculates the percentage of each racial and ethnic group that
comes through the security gate.
 The researcher then compares these baseline figures (what is expected to
happen) with the racial or ethnic breakdown of the individuals who were
actually searched (what actually happened).
 The researcher concluded that Muslim-appearing individuals were searched more
frequently than expected.
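A chi-square goodness-of-fit sketch in Python for a comparison like the airport study, using entirely hypothetical baseline shares and search counts (the group labels are placeholders). Expected counts come from multiplying each group's baseline share by the total number searched; the significance level is computed from a series expansion of the chi-square distribution rather than a printed lookup table. The function names are hypothetical.

```python
import math

def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with df degrees of freedom."""
    # Power series for the regularized lower incomplete gamma function P(df/2, x/2).
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def goodness_of_fit(observed, expected_shares):
    """Compare observed counts with the counts expected from baseline shares."""
    n = sum(observed.values())
    chi2 = 0.0
    for group, count in observed.items():
        expected = expected_shares[group] * n
        chi2 += (count - expected) ** 2 / expected
    df = len(observed) - 1
    return chi2, df, chi_square_sf(chi2, df)

# Hypothetical baseline: share of travelers passing the checkpoint, by group.
baseline = {"Group A": 0.60, "Group B": 0.20, "Group C": 0.15, "Group D": 0.05}
# Hypothetical counts of travelers selected for an invasive search, by group.
searched = {"Group A": 100, "Group B": 40, "Group C": 30, "Group D": 30}

chi2, df, p = goodness_of_fit(searched, baseline)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.6f}")
```

In this made-up data, Group D makes up 5 percent of travelers but 15 percent of searches, and that gap drives almost all of the chi-square value.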
Pearson r (p. 367)
 The Pearson r is used to determine whether two variables measured at the
interval or ratio level are correlated.
 The Pearson r coefficient ranges from -1 to +1. The closer it is to -1 or +1, the
higher the level of correlation between the two variables.
 Positive Pearson r coefficients indicate a positive correlation.
 Negative Pearson r coefficients indicate a negative correlation.
Making Research Real 14.6 – Keeping Kids Involved (p. 368)
 This research attempts to determine the relationship (correlation) between
participation in extracurricular activities, illegal drug use and grades.
 The researchers determine that there is some correlation; however, in some cases
the relationships are weak.
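Computing a Pearson r by hand shows what the coefficient measures: how closely two variables move together relative to how much each varies on its own. The activity hours and grade point averages in this Python sketch are hypothetical, and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two interval/ratio variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data for seven students.
activity_hours = [0, 2, 3, 5, 6, 8, 10]    # weekly hours in extracurricular activities
gpa = [2.1, 2.4, 2.9, 2.8, 3.2, 3.5, 3.4]  # grade point average

print(f"r = {pearson_r(activity_hours, gpa):.2f}")
```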
Spearman rho (p. 370)
 The Spearman rho statistic is similar to the Pearson r, but it indicates the level of
correlation between variables measured at the ordinal level. Like the Pearson r, it
ranges from -1 to +1.
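Spearman rho can be computed by ranking each variable (giving tied values the average of their ranks) and then calculating a Pearson r on the ranks, which is why, like the Pearson r, it is bounded by -1 and +1. A Python sketch with hypothetical ordinal ratings; the helper names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

# Hypothetical ordinal ratings for six officers (1 = lowest category, 5 = highest).
training_rating = [1, 2, 2, 3, 4, 5]
supervisor_rating = [2, 1, 3, 3, 5, 4]
print(f"rho = {spearman_rho(training_rating, supervisor_rating):.2f}")
```

Any perfectly increasing relationship, even a curved one such as 1, 4, 9, 16, 25, yields rho = 1.0, because only the order of the values matters.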
Multiple regression (p. 370)
 Multiple regression enables the analyst to measure the individual and combined
effects of various independent variables on a dependent variable. A multiple
regression requires data collected at the interval or ratio levels.
Making Research Real 14.7 – Keeping Kids Involved – Part II (p. 371)
 In a continuation of the study described in Making Research Real 14.6,
researchers attempt to determine if grade point average can actually be predicted
by involvement in extracurricular activities and drug use.
 The researchers conclude that in-school activities do have an effect on grades,
while out-of-school activities do not.
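A multiple regression estimates one coefficient per independent variable (its individual effect, holding the others constant) and an R-squared that summarizes their combined explanatory power. The Python sketch below fits ordinary least squares by solving the normal equations with Gaussian elimination. The data are hypothetical, the function names are illustrative, and a real analysis would use a statistics package that also reports standard errors and significance levels.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] for outcome regressed on the predictor columns."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(outcome))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def r_squared(predictors, outcome, coefs):
    """Share of the variance in the outcome explained by all predictors combined."""
    fitted = [coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], predictors))
              for i in range(len(outcome))]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_res / ss_tot

# Hypothetical data for eight students.
in_school_hours = [0, 1, 2, 3, 4, 5, 6, 2]    # weekly hours, in-school activities
out_school_hours = [3, 0, 4, 1, 5, 2, 0, 6]   # weekly hours, out-of-school activities
gpa = [2.2, 2.3, 2.6, 2.4, 2.9, 2.8, 2.9, 2.5]

coefs = fit_ols([in_school_hours, out_school_hours], gpa)
print("intercept, in-school, out-of-school:", [round(c, 3) for c in coefs])
print("R-squared:", round(r_squared([in_school_hours, out_school_hours], gpa, coefs), 3))
```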
Selecting an appropriate inferential statistical technique (p. 372)
 The decision as to which inferential statistical technique to use depends on the
level at which the data are measured and the type of hypothesis that the study is
testing.
Table 14.2 - Commonly used inferential statistical techniques. (p. 373)
Level of measurement | Type of hypothesis | Appropriate statistical technique
Nominal | Association | NA
Nominal | Difference | Chi-Square
Ordinal | Association | Spearman rho
Ordinal | Difference | NA
Interval/Ratio | Association | Pearson r (without prediction); Regression (with prediction)
Interval/Ratio | Difference | t-test (two groups); ANOVA (three or more groups)
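Because Table 14.2 is a decision rule, it can be written as a small function. This Python sketch is a hypothetical helper that simply encodes the table; it returns `None` where the table lists NA and raises an error for inputs the table does not cover.

```python
def choose_technique(level, hypothesis, groups=None, prediction=False):
    """Encode Table 14.2; returns None where the table lists NA."""
    level, hypothesis = level.lower(), hypothesis.lower()
    if hypothesis not in ("association", "difference"):
        raise ValueError("hypothesis must be 'association' or 'difference'")
    if level == "nominal":
        return "chi-square" if hypothesis == "difference" else None
    if level == "ordinal":
        return "Spearman rho" if hypothesis == "association" else None
    if level in ("interval", "ratio", "interval/ratio"):
        if hypothesis == "association":
            return "regression" if prediction else "Pearson r"
        if groups is None or groups < 2:
            raise ValueError("a difference hypothesis needs the number of groups (2 or more)")
        return "t-test" if groups == 2 else "ANOVA"
    raise ValueError("level must be nominal, ordinal, interval, or ratio")

# The yes/no drug-use survey from Making Research Real 14.1 (association hypothesis):
print(choose_technique("nominal", "association"))
# The three-group self-esteem comparison from Making Research Real 14.4:
print(choose_technique("interval", "difference", groups=3))
```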
Qualitative Data Analysis (p. 373)

Qualitative researchers focus more on analyzing words than on analyzing numbers;
they attempt to explain the ‘how’ and ‘why’ of social processes.
Making Research Real 14.8 – It Wasn’t What He Said; It Was How He Said It! (p. 373)
 A highway patrolman gets in trouble for telling a motorist to “Have a nice day.”
 Objectively, this comment seems benign, even nice.
 But the way the trooper said it appears to anger the woman.
 In the end, the lesson lies not in what is said but in how it is said, and attending
to how things are said is an important part of qualitative analysis.
Transcription (p. 374)
 The process of producing a written transcript of interviews that have been video- or audio-taped is known as transcription.
 These transcripts provide the written data that qualitative researchers analyze.
Memoing (p. 375)
 Qualitative researchers use a process called memoing to record their thoughts
and ideas on the research data.
 Memoing is typically on-going throughout the data collection process.
Segmenting (p. 376)
 Segmenting is a process used by researchers to organize or categorize qualitative
data.
 This stage of qualitative data analysis occurs after the researcher has
familiarized themselves with the data.
Making Research Real 14.9 – A Typology of Violence (p. 376)
 In this study the researcher develops an inventory to classify behaviors in terms of
their level of violence.
 This is an example of how qualitative researchers use segmenting.
Coding (p. 377)
 After segmenting the data, qualitative researchers go through their data and code
it.
 Coding refers to a process whereby researchers identify recurring themes, label
these themes with a descriptive word or phrase (“codes”), and organize their notes
or transcripts according to these themes.
Diagramming (p. 378)
 Diagramming is a process by which researchers develop flow charts or
hierarchical diagrams to illustrate relationships between different parts of their
qualitative data.
Matrices (p. 378)
 Researchers also use matrices, or tables, to illustrate such relationships.
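Matrices are often built once coding is finished. As a hypothetical illustration in Python, the sketch below assumes the researcher has already assigned codes to transcript segments by hand (the program does none of the interpretive work) and simply tallies those codes into a participant-by-code matrix. The participants, codes, and function name are invented for the example.

```python
from collections import Counter

# Hypothetical (participant, code) pairs produced by a researcher's hand coding.
coded_segments = [
    ("Participant 1", "tone of voice"),
    ("Participant 1", "de-escalation"),
    ("Participant 2", "tone of voice"),
    ("Participant 2", "tone of voice"),
    ("Participant 3", "procedural fairness"),
]

def code_matrix(segments):
    """Return row labels, column labels, and counts of each code for each participant."""
    participants = sorted({p for p, _ in segments})
    codes = sorted({c for _, c in segments})
    counts = Counter(segments)  # counts each (participant, code) pair
    rows = [[counts[(p, c)] for c in codes] for p in participants]
    return participants, codes, rows

participants, codes, rows = code_matrix(coded_segments)
print("participant".ljust(15) + "".join(c.ljust(22) for c in codes))
for p, row in zip(participants, rows):
    print(p.ljust(15) + "".join(str(n).ljust(22) for n in row))
```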
Getting to the Point (Chapter Summary) (p. 380)
 During the analysis phase, researchers evaluate the data they gather to answer
their research questions or hypotheses. Even though analysis occurs near the end
of the research process, considerations of analysis should occur earlier in the
research process.

Statistics summarize large amounts of data into a single number and enable us to
communicate information efficiently. There are two general types of statistics:
descriptive statistics and inferential statistics.

Descriptive statistics describe the current state of something. An important set of
descriptive statistics are known as the measures of central tendency. These
measures include the mean, median, and mode.

The mean is calculated by adding together all of the values for a particular
variable and dividing that sum by the total number of cases. Although it is a good
measure of central tendency, it is sensitive to extreme values, or outliers.

The median is referred to as the middlemost value because it is the value that is
situated in the middle, with half the cases equal to or greater than and half the
cases equal to or less than this value. It is less susceptible to extreme values or
outliers than the mean.

The mode is the most frequently occurring value in a population or sample. Like
the median, the mode is less susceptible to extreme values or outliers than the
mean.

The decision about which measure of central tendency to use should be based on
two factors: (1) whether the data are skewed toward extreme scores, and (2) what
level the variables are measured at.

Measures of variability are descriptive statistics that tell us how much variation
exists within a sample or population. Among the measures of variability is the
range, which is the difference between the highest and lowest value in a sample or
population. This descriptive statistic, like the mean, is susceptible to extreme
scores or outliers.

The standard deviation is a descriptive statistic that describes how much
variability exists within a sample or population. Because the standard deviation
is based on how far every case falls from the mean, rather than on only the two
most extreme values, it is a much more stable statistic than the range.

A percentage is a descriptive statistic that describes a portion of a sample or
population. Percentages are calculated by dividing the number of like cases by
the total number of cases, then multiplying that quotient by 100.

A percentile is a statistic that tells us where a value ranks within a distribution.
Sometimes this is referred to as the percentile rank. We calculate the percentile
rank by dividing the number of cases below the value by the total number of cases
and then multiplying that quotient by 100.

Percent change is a descriptive statistic that indicates how much something
changed from one time to the next. We calculate the percent change by
subtracting the original number from the new number, dividing that difference by
the original number and then multiplying that quotient by 100.

Rates are descriptive statistics that enable us to compare similar behaviors across
multiple locations. Rates factor in population size and report incidents per n
units of population (for example, crimes per 100,000 residents).

In normally distributed data, the mean, median and mode are equal because the
data are distributed symmetrically around the same value. In a normal distribution,
about 68.27 percent of all cases fall within one standard deviation of the mean; about
95.45 percent fall within two standard deviations of the mean; and about 99.73
percent fall within three standard deviations of the mean.

Inferential statistics enable analysts to determine the probability of certain
outcomes. When reading inferential statistics, we are concerned with statistical
significance, which is a measure of the probability that a result at least as large as
the one observed would occur by chance alone. If the statistical significance of a
statistic is .05 or less, we conclude that the result is unlikely to be due to chance.

The t-test is a statistical technique used to determine whether or not two groups
are different with respect to a single variable. T-tests can only be run using
interval or ratio level data. If the statistical significance of the t-score is .05 or
less, it can be concluded that the difference between the two groups is unlikely to be
due to chance.

The analysis of variance (ANOVA) model allows analysts to compare two or
more groups to see if they are different with respect to a single variable measured
at the interval or ratio level. An ANOVA produces an F-ratio statistic. If the
statistical significance of the F-ratio is .05 or less, it can be concluded that the
difference between at least two of the groups is unlikely to be due to chance.

The chi-square test is used to determine whether there is a statistically significant
difference between what we expect to happen and what actually happens. The
operative statistic is called the chi-square statistic. If the statistical significance of
the chi-square statistic is .05 or less, it can be concluded that the difference
between what actually happened and what was expected to happen is unlikely to be
due to chance.

The Pearson r is used to determine whether two variables measured at the interval
or ratio level are correlated. The Pearson r coefficient ranges from -1 to +1. The
closer it is to -1 or +1, the higher the level of correlation between the two
variables. Positive Pearson r coefficients indicate a positive correlation; negative
Pearson r coefficients indicate a negative correlation. The Spearman rho statistic
is similar to the Pearson r, but it indicates the level of correlation between
variables measured at the ordinal level; like the Pearson r, it ranges from -1 to +1.

Multiple regression enables the analyst to measure the individual and combined
effects of various independent variables on a dependent variable. A multiple
regression requires data collected at the interval or ratio levels.

The decision as to which inferential statistical technique to use depends on the
level at which the data are measured and the type of hypothesis that the study is
testing.

Qualitative researchers focus more on analyzing words than they do numbers;
they attempt to explain the ‘how’ and ‘why’ of social processes.

The process of producing a written transcript of interviews that have been video- or audio-taped is known as transcription. These transcripts provide the written
data that qualitative researchers analyze.

Qualitative researchers use a process called memoing to record their thoughts and
ideas on the research data. Memoing is typically on-going throughout the data
collection process.

Segmenting is a process used by researchers to organize or categorize qualitative
data. This stage of qualitative data analysis occurs after the researcher has
familiarized themselves with the data.

After segmenting the data, qualitative researchers go through their data and code
it. Coding refers to a process whereby researchers identify recurring themes, label
these themes with a descriptive word or phrase (“codes”), and organize their notes
or transcripts according to these themes.

Diagramming is a process by which researchers develop flow charts or
hierarchical diagrams to illustrate relationships between different parts of their
qualitative data. Researchers also use matrices, or tables, to illustrate such
relationships.

There are a number of software programs specifically designed for qualitative
data analysis. These programs include ATLAS™, Nvivo™, NUD-IST™, and
Ethnograph™. Using these and other programs, researchers and practitioners can
mine data for patterns and other useful information.