From Words to Meaning to Insight
Julia Cretchley & Mike Neal

Outline
• Content Analysis
• What is Leximancer?
• Steps to your first analysis
• In-depth Leximancer

Consider Some Text
"We use the laser 500 printer here at the office. We are pretty happy with it. Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often. The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of printing we get. The laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the laser 500. Then we have to open it up to remove the crumpled pages. We have tried other machines in the past, but have not found an alternative that works better for us."

A Definition
Content analysis is
‣ A formal methodology
‣ to study a collection of media
‣ to discover, uncover, or answer

A Little History
Systematic analysis of texts was performed several times by religious entities prior to 1900 (Krippendorff, 2004)
Major growth periods in the 20th century (Krippendorff, 2004)
• Early 20th-century studies of newspaper content
• Behavioral sciences emerged in the 1930s and 1940s and began to study media effects
• World War II brought about propaganda studies
• The postwar period saw expansion into conversation analysis, personal document analysis, processes of communication, and a generalized measure of meaning

A Little (More) History
Computer text analysis began in the 1960s but remained challenging beyond quantitative analysis of text (Krippendorff, 2004)
Today, the extensive proliferation of traditional, electronic, and social media is leading to strong interest in content analysis and more powerful software

Application Areas Today
News Analysis Sentiment Social Media Forensics Historical Reviews Political Documents Conversations Propaganda Television Content Bias Determination Song Lyrics
National Security Video Game Content More ...

A Definition: Again
Content analysis is
‣ A formal methodology
‣ to study a collection of media
‣ to discover, uncover, or answer

A Formal Methodology
A formal, objective method with rigor and repeatability
Many methods and processes are valid
Methodology example
1. Determine research question
2. Identify and collect samples
3. Perform quantitative analysis
4. Perform qualitative analysis
5. Draw conclusions
6. Summarize, publish, and share results

To Study A Collection of Media
Media are means of communicating information
Collections include the following (normalize formats to text)
• Written media such as newspapers, magazines, websites, blogs, tweets, Facebook pages, emails
• Audio, such as radio programs, interview transcripts, conversations (can be transcribed into text)
• Video, such as television, movies, news footage, YouTube videos (can be transcribed into text)
• Images described in text

To Discover, Uncover, Answer...
Discover concepts, themes, and relationships in the collection
Uncover unknown qualities about the data
Answer a specific research question

Key Points Outline
All methods of content analysis share common components, which will now be presented
Quantitative (counting) and qualitative (meaning) analyses
• Analysts can use one or both methods
• A content analysis is best when both quantitative and qualitative approaches are combined (Weber, 1990)
• More later...
Important study aspects include sampling, units of measure, coding, validity, and reliability

Sampling
Sampling is a method to take subsets of documents to study
Krippendorff (2004) provided this guidance
• Sampling plans are needed to reduce researcher bias
• Select a type of sampling (e.g., random)
• Sample size must be large enough to be representative
• Split-half technique: two halves of the sample should yield the same result

Units of Measure
Sampling Part 2: Samples require a definition of data resolution
• Television comedies, 1/2 hour, Wednesday nights
• Entire tweet, tweets from a user, collection of topical tweets
• One blog entry, an entire blog, or consolidation of many blogs
• Newspaper article, articles of a set timeline
Content analysts must determine these units to measure
• The choice of unit affects word relationships and coding
• Concept discovery is restricted to within units

Coding
Process of examining text in a specific unit and extracting relevant data
Look for words, phrases, word sense, and categorize units of text (e.g., words, sentences, paragraphs, tweets)
Three methods of coding
1. Manual, by person(s) coding from codebooks, instructional guides, intuition
2. Computer-assisted (NVivo), beginning with manual coding, then often some automation for remaining documents
3. Computer-generated (Leximancer, CATPAC)

Reliability and Validity
For a formal analysis method to be sound, reliability and validity must be addressed
Reliability refers to stability and reproducibility
• Coding must be repeatable, whether manual or computer-assisted
• Inter-rater reliability for manual coding with multiple coders affects reproducibility and must be ensured
• Measure of accuracy is tied to statistical norms
• Accuracy is the strongest form of reliability (Weber, 1990)

Reliability and Validity (cont.)
Validity refers to general applicability of results and conclusions obtained from inferences in the study
• Major concern for qualitative analysis in general
  o Researcher chooses coding concepts, makes inferences
  o Researcher bias, errors, conclusions
• Neuendorf listed external validity, face validity, criterion validity, content validity, and construct validity
“Are we measuring what we want to measure?” (Neuendorf, 2002, p. 112)

Quantitative Analysis
Counting and statistics: Numeric measurements
• Word frequencies: how many times does a word appear?
  o Specify stop-words to ignore (e.g., the, and, others)
  o Need to consolidate synonyms, stems (e.g., dog = dogs)
  o Compound words (i.e., word pairs) are important (e.g., United States, not good)
• Categories (simply present or frequencies)

Quantitative Analysis (cont.)
Concept frequencies
• How often do concepts occur?
• Existence (occurs) or actual counts
Other Statistics
• Proximity and co-occurrence frequencies can all be used to determine concept relationships

Qualitative Analysis
Coding is performed to reduce text collection to categories (i.e., concepts)
Analyst can seed concepts or discover concepts during analysis
Often, the more discovery allowed, the more objective the analysis (grounded theory reduces researcher bias)
Concepts and their relationships form the foundations for extracting meaning

What is a Concept?
Synthesis of a text representation
• Key words, including consolidating synonyms, stems
• Represents something meaningful
• Found by examining word, compound word, and surrounding words in a measurable unit
Useful to display on a graphical “map”

A Concept Map

Role of the Computer Solutions
A content analysis can be done without a computer. Although...
• At a minimum, a computer serves as a document file folder and backup device
• And a search tool for and within documents
Software can also assist with manual coding, then continue coding automatically (NVivo)
Or software can automate coding through statistical processing (Leximancer) or networks (CATPAC)

Key Points Summary
A content analysis is best when both quantitative and qualitative approaches are combined (Weber, 1990).
Quantitative analysis counts and finds statistics
Qualitative analysis determines meaning
Important operational aspects include sampling, units of measure, coding, validity, and reliability

References
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage.
Neuendorf, K. A. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage.
Weber, R. P. (1990). Basic content analysis. Newbury Park, CA: Sage.
Willig, C. (2008). Introducing qualitative research in psychology: Adventures in theory and method (2nd ed.). Philadelphia, PA: Open University Press.

Reading List
Evaluation of Unsupervised Semantic Mapping of Natural Language with Leximancer Concept Mapping, Andrew Smith
Conversations Between Carers and People With Schizophrenia: A Qualitative Analysis Using Leximancer, Julia Cretchley, Cindy Gallois, Helen Chenery, and Andrew Smith
Analysis of Asynchronous Discourse in Web-assisted and Web-based Courses, David Thomas and Cleborne Maddux

Reading List (cont.)
Computer Aided Phenomenography: The Role of Leximancer Computer Software in Phenomenographic Investigation, Sorrel Penn-Edwards
Content Analysis of a Random Day of Two News Sites: FoxNews.com and MSNBC.com, Michael R. Neal
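
Appendix: A Quantitative Analysis Sketch
The counting steps listed under Quantitative Analysis (stop-words, stems, compound words, co-occurrence within a unit) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not how Leximancer, NVivo, or CATPAC work: the stop-word list, the compound-word table, the plural-stripping "stemmer", and the choice of the sentence as the unit of measure are all hypothetical and chosen for brevity. The sample text is an excerpt of the printer review from the "Consider Some Text" slide.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real study would use a fuller one.
STOP_WORDS = {
    "the", "and", "a", "an", "of", "to", "we", "it", "is", "was", "in",
    "at", "for", "us", "there", "all", "out", "but", "with", "have",
    "still", "up", "here", "are", "once",
}

# Hypothetical compound words (word pairs) treated as one token.
COMPOUNDS = {("laser", "500"): "laser_500"}


def stem(word):
    """Very naive stemmer: strip a trailing plural 's' (dogs -> dog)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokenize(unit):
    """Lowercase, merge compounds, drop stop-words, then stem."""
    words = re.findall(r"[a-z0-9]+", unit.lower())
    merged = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in COMPOUNDS:
            merged.append(COMPOUNDS[pair])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return [stem(w) for w in merged if w not in STOP_WORDS]


def split_units(text):
    """Assumed unit of measure: one sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def analyze(text):
    """Return word frequencies and within-unit co-occurrence counts."""
    freq = Counter()
    cooc = Counter()
    for unit in split_units(text):
        tokens = tokenize(unit)
        freq.update(tokens)
        # Count each unordered pair of distinct words once per unit.
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[(a, b)] += 1
    return freq, cooc


TEXT = (
    "We use the laser 500 printer here at the office. "
    "Once there was a leak and all the toner spilled out of the machine. "
    "We still have to top the toner up often. "
    "Sometimes paper gets jammed in the laser 500."
)

frequencies, cooccurrences = analyze(TEXT)
print(frequencies.most_common(3))
print(cooccurrences[("leak", "toner")])
```

Because co-occurrence is counted only within a sentence, changing the unit of measure (for example, to the whole review) would change which word pairs appear related, which is the point made on the Units of Measure slide.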