Content analysis

From Words to Meaning to Insight
Julia Cretchley & Mike Neal
 Content Analysis
 What is Leximancer?
 Steps to your first analysis
 In-depth Leximancer
Consider Some Text
"We use the laser 500 printer here at the office. We are pretty happy
with it. Once there was a leak and all the toner spilled out of the
machine, but a technician came out and fixed the problem for us. We
still have to top the toner up often. The printer goes through ink
quickly and the cartridges are expensive, but we put up with this
because it delivers good results reliably. We are pleased with the
quality of printing we get. The laser 500 can batch process, and collate
the pages to save us time. Sometimes paper gets jammed in the laser
500. Then we have to open it up to remove the crumpled pages. We
have tried other machines in the past, but have not found an
alternative that works better for us."
A Definition
 Content analysis is
‣ A formal methodology
‣ to study a collection of media
‣ to discover, uncover, or answer
A Little History
 Systematic analysis of texts was performed several times by
religious entities prior to 1900 (Krippendorff, 2004)
 Major growth periods in the 20th Century
(Krippendorff, 2004)
• Early 20th Century studies of newspaper content
• Behavioral sciences emerged in the 1930s and 1940s and began to
study media effects
• World War II brought about propaganda studies
• The postwar period saw expansion into conversation analysis, personal
document analysis, processes of communication, and a
generalized measure of meaning
A Little (More) History
 Computer text analysis began in the 1960s but was
challenging beyond quantitative analysis of text
(Krippendorff, 2004)
 Today, the extensive proliferation of traditional, electronic,
and social media is leading to strong interest in
content analysis and more powerful software
Application Areas Today
News Analysis
Social Media
Historical Reviews
Political Documents
Television Content
Bias Determination
Song Lyrics
National Security
Video Game Content
More ...
A Definition: Again
 Content analysis is
‣ A formal methodology
‣ to study a collection of media
‣ to discover, uncover, or answer
A Formal Methodology
 A formal, objective method with rigor and repeatability
 Many methods and processes are valid
 Methodology example
Determine research question
Identify and collect samples
Perform quantitative analysis
Perform qualitative analysis
Draw conclusions
Summarize, publish, and share results
To Study A Collection of Media
 Media are methods of communicating information
 Collections include the following (normalize formats to
• Written media such as newspapers, magazines, websites,
blogs, tweets, Facebook pages, emails
• Audio, such as radio programs, interview transcripts,
conversations (can be transcribed into text)
• Video, such as television, movies, news footage, YouTube
videos (can be transcribed into text)
• Images described in text
To Discover, Uncover, Answer...
 Discover concepts, themes, and relationships in
the collection
 Uncover unknown qualities about the data
 Answer a specific research question
Key Points Outline
 All methods of content analysis share common
components, which will now be presented
 Quantitative (counting) and qualitative (meaning)
• Analysts can use one or both methods
• A content analysis is best when both quantitative and
qualitative approaches are combined (Weber, 1990)
• More later...
 Important study aspects include sampling, units of
measure, coding, validity, and reliability
Sampling
 Sampling is a method to take subsets of documents to
 Krippendorff (2004) provided this guidance
Sampling plans are needed to reduce researcher bias
Select a type of sampling (e.g., random)
Sample size is important to be representative
Split-half technique: two halves of a sample should yield the same result
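The split-half idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`word_counts`, `split_half`, `top_word_overlap`) are hypothetical, and comparing the most frequent words of each half is one simple stand-in for "the same result", not a procedure prescribed by Krippendorff.

```python
import random
from collections import Counter


def word_counts(docs):
    """Count lowercase, whitespace-separated words across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def split_half(docs, seed=0):
    """Shuffle the sample reproducibly and split it into two halves."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def top_word_overlap(half_a, half_b, k=5):
    """Jaccard overlap between the k most frequent words of each half (0.0 to 1.0)."""
    top_a = {w for w, _ in word_counts(half_a).most_common(k)}
    top_b = {w for w, _ in word_counts(half_b).most_common(k)}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0
```

An overlap near 1.0 suggests the two halves tell the same story; a low overlap hints that the sample may be too small or too uneven to be representative.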
Units of Measure
 Sampling Part 2: Samples require a definition of data
• Television comedies, 1/2 hour, Wednesday nights
• Entire tweet, tweets from a user, collection of topical
• One blog entry, an entire blog, or consolidation of many
• Newspaper article, articles of a set timeline
 Content analysts must determine these units to
• Impacts relationships of words and coding
• Concept discovery restricted to within units
Coding
 Process of examining text in a specific unit and
extracting relevant data
 Look for words, phrases, word sense, and categorize
units of text (i.e., words, sentences, paragraphs,
 Three methods of coding
1. Manual, by person(s) coding from codebooks,
instructional guides, intuition
2. Computer-assisted (NVivo) beginning with coding then
often some automation for remaining documents
3. Computer generated (Leximancer, CATPAC)
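As a rough illustration of codebook-driven coding, the sketch below assigns categories to sentence-sized units. The codebook, its category names, and the functions `code_unit` and `code_text` are all hypothetical; real tools such as NVivo or Leximancer work differently and are not modeled here.

```python
import re

# Hypothetical codebook: category -> indicator terms (illustrative only).
CODEBOOK = {
    "reliability": {"reliably", "jammed", "leak", "fixed"},
    "cost": {"expensive", "cartridges", "ink", "toner"},
    "quality": {"quality", "results", "pleased", "happy"},
}


def code_unit(unit, codebook=CODEBOOK):
    """Return the sorted categories whose indicator terms appear in one unit of text."""
    words = set(re.findall(r"[a-z']+", unit.lower()))
    return sorted(cat for cat, terms in codebook.items() if words & terms)


def code_text(text, codebook=CODEBOOK):
    """Split text into sentence units and code each unit against the codebook."""
    units = [u.strip() for u in re.split(r"(?<=[.!?])\s+", text) if u.strip()]
    return [(u, code_unit(u, codebook)) for u in units]
```

Choosing the sentence as the unit is itself an assumption; the same functions could be applied to paragraphs or whole documents by changing how `code_text` splits the input.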
Reliability and Validity
 For a formal analysis method to be sound,
reliability and validity must be addressed
 Reliability refers to stability and reproducibility
• Coding must be repeatable, whether manual or computer-based
• Inter-rater reliability for manual coding with multiple
coders affects reproducibility and must be ensured
• Measure of accuracy is tied to statistical norms
• Accuracy is the strongest form of reliability (Weber,
1990)
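Inter-rater reliability is often summarized with an agreement statistic. The slides do not name one, so the sketch below uses Cohen's kappa as an assumed example: it compares the agreement two coders actually reached with the agreement expected by chance from their label frequencies. The function name `cohens_kappa` is illustrative.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same units in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of units.")
    n = len(coder_a)
    # Observed agreement: share of units given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's label proportions, summed over labels.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A value of 1.0 means perfect agreement and 0.0 means agreement no better than chance; what counts as acceptable in between is a judgment the study design has to state.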
Reliability and Validity (cont.)
 Validity refers to general applicability of results and
conclusions obtained from inferences in the study
• Major concern for qualitative analysis in general
 Researcher chooses coding concepts and makes inferences
 Researcher bias, errors, conclusions
• Neuendorf listed external validity, face validity,
criterion validity, content validity, and construct validity
“Are we measuring what we want to measure?”
(Neuendorf, 2002, p. 112)
Quantitative Analysis
 Counting and statistics: Numeric measurements
• Word frequencies: how many times does a word occur?
 Specify stop-words to ignore (e.g., the, and, others)
 Need to consolidate synonyms, stems (e.g., dog =
 Compound words (i.e., word pairs) are important
o United States
o not good
• Categories (simply present or frequencies)
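The counting steps above (stop-words, consolidated variants, compound words) can be sketched as follows. The stop-word list and the merge table are small hypothetical examples, and the compound pairs are the two from the slide; a real study would define all three from its own material.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption, not a standard list).
STOP_WORDS = {"the", "and", "a", "we", "is", "are", "it", "of", "to", "with"}
# Hypothetical merge table mapping variants and synonyms to one form.
MERGE = {"printers": "printer", "printing": "print", "cartridges": "cartridge"}
# Word pairs counted as single compound tokens.
COMPOUNDS = {("united", "states"), ("not", "good")}


def tokenize(text):
    """Lowercase the text, join known compounds, merge variants, drop stop-words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in COMPOUNDS:
            # Keep the pair together so "not good" is not counted as "good".
            tokens.append("_".join(pair))
            i += 2
            continue
        word = MERGE.get(words[i], words[i])
        if word not in STOP_WORDS:
            tokens.append(word)
        i += 1
    return tokens


def word_frequencies(text):
    """Return a Counter of token frequencies for one text."""
    return Counter(tokenize(text))
```

Compounds are checked before stop-words are removed, which is a deliberate choice here: otherwise a pair such as "not good" could be broken apart and its meaning reversed.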
Quantitative Analysis (cont.)
 Concept frequencies
• How often do concepts occur?
• Existence (occurs) or actual counts
 Other Statistics
• Proximity and co-occurrence frequencies can all be
used to determine concept relationships
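A minimal sketch of co-occurrence counting, assuming concepts are single words and that "together" means "in the same unit" (sentence, paragraph, or document, as chosen by the analyst). The function name `cooccurrence_counts` is hypothetical, and proximity within a unit is not measured here.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(units, concepts):
    """Count how many units contain each pair of concepts together."""
    pair_counts = Counter()
    for unit in units:
        words = set(re.findall(r"[a-z0-9']+", unit.lower()))
        # Sort so each pair is always stored in the same (alphabetical) order.
        present = sorted(c for c in concepts if c in words)
        pair_counts.update(combinations(present, 2))
    return pair_counts
```

Pairs that share many units are candidates for a relationship between the two concepts; because counting is restricted to units, changing the unit size changes the counts.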
Qualitative Analysis
 Coding is performed to reduce text collection to
categories (i.e., concepts)
 Analyst can seed concepts or discover concepts
during analysis
 Often, the more discovery allowed the more
objective the analysis (grounded theory reduces
researcher bias)
 Concepts and their relationships form the
foundations for extracting meaning
What is a Concept?
 Synthesis of a text representation
• Key words, including consolidating synonyms, stems
• Represents something meaningful
• Found by examining word, compound word, and
surrounding words in a measurable unit
 Useful to display on a graphical “map”
A Concept Map
Role of Computer Solutions
 A content analysis can be done without a computer.
• At a minimum, a computer serves as a document file
folder and backup device
• And a search tool for and within documents
 Software can also assist with manual coding then
continue coding automatically (NVivo)
 Or software can code automatically through statistical
processing (Leximancer) or networks (CATPAC)
Key Points Summary
 A content analysis is best when both quantitative
and qualitative approaches are combined (Weber,
1990)
 Quantitative analysis counts and finds statistics
 Qualitative analysis determines meaning
 Important operational aspects include sampling,
units of measure, coding, validity, and reliability
References
Krippendorff, K. (2004). Content analysis: An
introduction to its methodology (2nd ed.). Thousand
Oaks, CA: Sage.
Neuendorf, K. A. (2002). The content analysis
guidebook. Thousand Oaks, CA: Sage.
Weber, R. P. (1990). Basic content analysis. Newbury
Park, CA: Sage.
Willig, C. (2008). Introducing qualitative research in
psychology: Adventures in theory and method (2nd
ed.). Philadelphia, PA: Open University Press.
Reading List
 Evaluation of Unsupervised Semantic Mapping of
Natural Language with Leximancer Concept
Mapping, Andrew Smith.
 Conversations Between Carers and People With
Schizophrenia: A Qualitative Analysis Using
Leximancer, Julia Cretchley, Cindy Gallois, Helen
Chenery, and Andrew Smith
 Analysis of Asynchronous Discourse in Web-assisted
and Web-based Courses, David Thomas and
Cleborne Maddux
Reading List (cont.)
 Computer Aided Phenomenography: The Role
of Leximancer Computer Software in
Phenomenographic Investigation, Sorrel Penn-Edwards
 Content Analysis of a Random Day of Two
News Sites: and,
Michael R. Neal