From Words to Meaning to Insight
Julia Cretchley & Mike Neal

Outline
Content Analysis
What is Leximancer?
Steps to your first analysis
In-depth Leximancer

What Is Leximancer?
Leximancer is a software tool designed for analyzing natural language text data
Uses statistics-based algorithms
  • Initial analysis in minutes
Automatically analyzes a text collection
  • User can direct search, add, remove, merge terms
Extracts semantic (meaning) and relational information (more later)
Outputs include concept map, network cloud, quantitative data, concept thesaurus

Leximancer Overview
Text

Let’s Look at Some Text
"We use the Laser 500 printer here at the office. We are pretty happy with it. Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often. The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of printing we get. The Laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the Laser 500. Then we have to open it up to remove the crumpled paper. We have tried other machines in the past, but have not found an alternative that works better for us."
What is this text about? (one main topic)

Concept Extraction
Terms around a word indicate its meaning
Concepts are discovered from word associations; the approach is language independent
Leximancer concept: a group of related words that travel together in the text
  • Evidence words include synonyms and adjectives
Concepts begin as seed words for coding and evolve into a thesaurus
  • Word-like, name-like (proper nouns), and compounds (United States)

Concept Extraction (cont.)
A few things to note...
• Several concepts may be in a single sentence
• A concept may span multiple sentences
• Adjustable resolution (default: 2 sentences)
• Stop lists remove common words (the, and)
Algorithms
  • A threshold of evidence words for a concept must be present for the concept to be coded in a block of text
  • A concept can be coded from its evidence words, even if the actual seed word (printer) is not present

Concept Extraction: Units of Resolution
"We use the Laser 500 printer here at the office. We are pretty happy with it.
Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often.
The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of printing we get.
The Laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the Laser 500.
Then we have to open it up to remove the crumpled paper. We have tried other machines in the past, but have not found an alternative that works better for us."
Leximancer divides the text into two-sentence units (configurable)

Concept Extraction: Units of Resolution
"We use the Laser 500 printer here at the office. We are pretty happy with it. Once there was a leak and all the toner spilled out of the machine, but a technician came out and fixed the problem for us. We still have to top the toner up often. The printer goes through ink quickly and the cartridges are expensive, but we put up with this because it delivers good results reliably. We are pleased with the quality of printing we get. The Laser 500 can batch process, and collate the pages to save us time. Sometimes paper gets jammed in the Laser 500. Then we have to open it up to remove the crumpled paper.
We have tried other machines in the past, but have not found an alternative that works better for us."
printer concept: Laser 500, toner, machine, printing
paper concept: pages, crumpled, jammed

Semantic and Relational Analysis
Semantic meaning created through conceptual analysis
  • Presence and frequency of words, phrases
  • Co-occurrence of words makes a concept
  • Explicit and implicit concepts identified (tsunami and earthquake imply Japan)
Relationships created through concept co-occurrence

Themes and Concept Map
Themes
  • Collection of related concepts in close proximity on the map
  • Theme name is the most prominent concept
Concept map display
  • Size of a dot indicates frequency of occurrence
  • Lines between concepts show relationships
  • Map proximity reflects shared links, as with shared connections on LinkedIn
Concept map becomes the interface to explore the underlying text

Concept and Theme Creation
Evidence words (thesaurus): Laser 500, machine, toner, printing, pages, crumpled, jammed
Concepts: printer, paper
2 co-occurrences of printer and paper

Additional Features
Thesaurus (coding dictionary) automatically generated
  • No manual coding required
  • Profiling and directed coding supported
Analyst can seed their own terms
Sentiment lens feature for affective analysis
Discourse analysis of speakers supported
Survey data analysis supported

Key Points Summary
Automated, statistical approach
  • How do you do this manually?
  • No data management, dictionary creation and updates
User does not have to formulate a coding scheme
  • This saves time, and
  • Avoids introduction of researcher bias (grounded theory)
Nuances, subtleties, distinctions in expression
  • Word association approach most likely to identify these
Evidence words with links from Leximancer allow deeper exploration and documentation of findings

Questions?
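
Worked sketch: coding two-sentence units
The coding steps on the Units of Resolution and Concept and Theme Creation slides can be illustrated with a toy Python sketch. It is not Leximancer's algorithm. The thesaurus is typed in by hand from the evidence words on the slides rather than learned, sentences are split with a simple regular expression, evidence words are matched by prefix (so "machines" counts as "machine"), and the threshold of one evidence word per unit is an assumption. All function and variable names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Sample text from the slides ("printing" spelled out in full).
TEXT = (
    "We use the Laser 500 printer here at the office. "
    "We are pretty happy with it. "
    "Once there was a leak and all the toner spilled out of the machine, "
    "but a technician came out and fixed the problem for us. "
    "We still have to top the toner up often. "
    "The printer goes through ink quickly and the cartridges are expensive, "
    "but we put up with this because it delivers good results reliably. "
    "We are pleased with the quality of printing we get. "
    "The Laser 500 can batch process, and collate the pages to save us time. "
    "Sometimes paper gets jammed in the Laser 500. "
    "Then we have to open it up to remove the crumpled paper. "
    "We have tried other machines in the past, but have not found an "
    "alternative that works better for us."
)

# Hand-written thesaurus (assumption): seed word plus the slide's evidence words.
THESAURUS = {
    "printer": ["printer", "laser 500", "toner", "machine", "printing"],
    "paper": ["paper", "pages", "crumpled", "jammed"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_units(sentences, size=2):
    """Group consecutive sentences into blocks of `size` (slide default: 2)."""
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def code_unit(unit, thesaurus, threshold=1):
    """Return the concepts whose evidence-word hits reach the threshold."""
    lowered = unit.lower()
    coded = set()
    for concept, words in thesaurus.items():
        # Prefix match at a word boundary, so "machines" hits "machine".
        hits = sum(len(re.findall(r"\b" + re.escape(w), lowered)) for w in words)
        if hits >= threshold:
            coded.add(concept)
    return coded


def count_cooccurrences(coded_units):
    """Count how many units each pair of concepts is coded in together."""
    pairs = Counter()
    for coded in coded_units:
        for pair in combinations(sorted(coded), 2):
            pairs[pair] += 1
    return pairs


SENTENCES = split_sentences(TEXT)
UNITS = make_units(SENTENCES)
CODED = [code_unit(u, THESAURUS) for u in UNITS]
PAIRS = count_cooccurrences(CODED)

for number, concepts in enumerate(CODED, start=1):
    print(number, sorted(concepts))
print(PAIRS[("paper", "printer")])  # 2
```

Under these assumptions the second unit is coded as printer from "toner" and "machine" alone, without the seed word, and printer and paper are coded together in the last two units.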