Research News | Australian Market & Social Research Society | Volume 32 | Number 6 | July 2015 | ISSN: 1839-4256

THE MEATY SECTION: INVITED COMMENTARY FROM INDUSTRY FIGURES

Will text analytics replace qualitative analysis?

Introduction

Text analytics turns words into data. It is the qualitative heart of ‘big data’ in that it is the only practical way to be able to see what is being expressed in writing in social media and in open-ends of surveys with very large samples. As surprising as this will seem to some researchers, the term ‘qualitative analysis’ to some people means the coding of open-ends rather than analysis of qualitative research. I am going to answer the question ‘will text analytics replace qualitative analysis?’ from both of these points of view.

I knew a great deal about qualitative analysis before I started the research for this article, but less about text analytics, having only tried out two of the many on offer. Not one to be deterred, I set out to do some research. In this article, I share what I have learned: that text analytics is more powerful, useful and sophisticated than many researchers realise, but much less so than some of the text analytics suppliers are promising. Text analytics represents an opportunity for researchers working in some types of social and market research because it turns a currently impractical idea - that we can ‘listen’ to vast amounts of text without reading anything - into something do-able. It represents something of a threat to researchers who simply describe qualitative data, but is no threat to researchers who do more than describe - i.e. those who interpret and explain. Such researchers can in fact bring unique sense-making and knowledge transfer skills to text analytics.

Why should market and social researchers be interested in text?

While many of us perhaps continue to conceptualise research in terms of the spoken word - CATI interviews, face to face groups, and so on - the truth is that our industry is moving away from talking towards text. Online surveys, online qual, online communities, and social media are all obvious examples. Organisations across the world are conducting online customer satisfaction surveys with sample sizes in their tens or hundreds of thousands, each with an open-ended question to complete. In social media, people are writing publicly about their lives, their state of health, their experiences, what they are feeling, thinking and doing.

We can choose to ignore all this data and leave the analysis task to someone else or we can look at the best way to get the most out of it. The main barriers to doing this seem to me to be:

1. Understanding the terminology of text analysis, and
2. Understanding what’s available.

Let’s have a look at each of these.

Understanding the terminology

Much of what has been written about text analytics has been highly technical, written by computational linguists, statisticians and IT specialists. Even the core terms ‘text analytics’ and ‘qualitative analysis’ need to be explained.

Text analysis, textual analytics or text mining?

There are three terms in common use in this field which for our purposes mean pretty much the same thing: text analysis, textual analytics and text mining. ‘Textual analytics’ is probably the better term because it tends to be used more for the “systematic application of numeric and statistical methods that service and deliver quantitative information” (Grimes). ‘Text analysis’ can get confused with the kind of close text analysis conducted on novels and poetry. I have a personal objection to the word ‘mining’ - I don’t like to compare the interesting things that people share with us with pieces of coal, but that’s just a personal gripe!

Disambiguating ‘qualitative analysis’

Disambiguation is a word often used in text analytics. It refers to the process of differentiating between words and phrases that have more than one meaning. Ironically, the field of text analytics seems greatly in need of its own disambiguation. In this context ‘qualitative analysis’ means the process of turning unstructured data, e.g. written words and symbols like hashtags, into a structured form for counting and further statistical analysis. It is not the same as the ‘qualitative analysis’ which means the process of finding meaning in qualitative research data drawn from focus groups, in-depth interviews, and the like. This is a useful lesson to text analysts about how meaning can be context-sensitive!

What about the qualitative analysis of qualitative research?

Here too we need to clarify. For qualitative researchers, one of the key tenets of qualitative analysis is that it includes interpretation of the findings, with interpretation occurring iteratively throughout the project. (Esomar)

The key difference between the two is that in qualitative research qualitative analysis preserves the qualitative nature of the data. In text analytics, qualitative analysis quantifies the qualitative data so that it can be further analysed.

Understanding what is available

So now we are all on the same page, let’s explore what is actually available in text analytics. There are three types (which have their own sub-types which I am ignoring for this summary):

• ‘Bag of words’ models. ‘Bag-of-words’ is the term used to describe the most basic form of text analytics. These models produce frequency counts of words. They treat text just as words, disregarding grammar and word order. They can provide a ‘right now’ summary of brand mentions, for example, visualised in frequency charts or a word cloud (Greenbook, 2014). Sentiment analysis can be built into these models, to (with some limitations) identify negative descriptors like ‘terrible’ versus positive ones, like ‘great’. The limitation of ‘bag of words’ models is that they ignore what the word means. Because they don’t know what role the word plays in the sentence, they can’t differentiate between ‘right’ meaning correct and ‘right’ meaning direction.

• Corpus-based models based on natural language processing (NLP). Models which use machine-learning based on NLP were developed to overcome the problems of ‘bag of words’ models, to find out what people were saying, not just what they mentioned. They deduce what people mean by understanding the syntax.

“As an overly broad generalization, we can say that NLP is fundamentally about taking an opaque document that consists of an ordered collection of symbols adhering to proper syntax and a reasonably well-defined grammar, and deducing the semantics associated with those symbols.” (Russell, 2011)

These models can parse sentences, but they need to be taught what to look for. Their first step is to ‘learn’ from a large training corpus, which may need to be developed for this purpose. Adjacent words and phrases frequently found together are said to represent a thought, idea or concept. These are first identified in the training corpus and then collated in the data being analysed for the project. It helps if the corpus used for this purpose is domain-specific. In other words, it’s about the same thing as the data you are going to analyse. This is an iterative and by no means simple process, but these models can identify statements such as ‘the service was slow’, or ‘I am looking for a ….’.

It has always seemed a shame to me that the combined brain power of computational linguists, AI experts and statisticians has been reduced to measuring sentiment. NLP-based systems give us the opportunity to do so much more than that. In fact, the one I use for qualitative research does not tell me what the actual words were, it tells me what type of words they were. For example, it can identify a preponderance of words that express certainty such as ‘extremely’ and ‘absolutely’ and words that express doubt as in ‘uncertain’ and ‘might’. We get much closer to the emotional texture of the transcript that way.

• Unsupervised NLP-based models. Some models have been developed to identify topics or themes in the data without the need to develop a separate training corpus. This is called ‘unsupervised’ machine learning. This is much closer to the way in which qualitative researchers would manually identify themes in the data, whereas ‘bag of words’ models are based on content analysis, which few commercial qualitative researchers use. Here of course there is an opportunity to use them on very large samples of text.

What to use it for

It is clear that contemporary versions of text analytics bring sheer computing power to market research problems that are hard to solve any other way. Detailed case studies are available online which show successful use for:

• Developing new product ideas (Pettit, 2014)
• Understanding how people talk about their health (Ramirez-Esparza, 2008)
• Client feedback / complaints (Anderson).

One case study using a corpus-based approach is described below.

Case Study: a UK airport carpark and transfer service

A UK airport replaced the manual coding of an open end in its large customer survey: “what is the single most important factor you feel we can improve upon to enhance your car park and transfer experience?” Manual coding was taking about 2 weeks per survey period.

The company first selected N=100 comments from their December survey results. One of the company’s experienced coders annotated each of these comments, identifying and assigning subcategories to different stages in the carpark experience.

They then used these categories to develop a text analytics model, assigning experiences to a specific part of the service process (e.g. booking). They refined and tested the model several times. Using the human coder’s input, they recognised that some people used the question to complain, some to compliment and some to make suggestions, so they adapted the text analytics model to allow for this.

The final model was fully implemented and was considered successful not because it delivered greater insight than manual coding - it was broadly the same - but because it was quicker, and this can matter since customers may leave if there is a delay in responding to their complaint. (Villarroel Ordenes, 2013)

Some things to bear in mind

• Prepare, prepare, prepare. A minor issue, but worth noting - no matter which text analytics model you use, you need to prepare the text to be machine-readable. All text has to be lower case, for example, and words are ‘stemmed’ to make sure that the machine recognises related words like ‘research’ and ‘researched’.

“In reality, dealing with text requires dedicated pre-processing steps and sometimes specific expertise on the part of the data science team.” (Provost, 2013)

• Manage the ‘stop word’ list. Most of the words which aren’t nouns or verbs are removed before conducting text analytics since you don’t want the machine to count all the articles (‘the’), pronouns (‘my’), prepositions (‘despite’) and conjunctions (‘because’). These are called ‘stop words’. However, if you want more than just lists of words, you will need to customise the stop word list. In the carpark case study, they had to remove ‘to’ from the stop word list because travel ‘to’ the carpark was relevant.

• You need to customise. Sentiment analysis versions work by customising lists of positive and negative words or word co-occurrences. Corpus-based NLP models require humans to teach the machine about that particular corpus, which may not be useful to the next project you work on, and they have to be continually updated. Such models may therefore suit a client-side researcher better than a research agency which works for many clients.

• Text analytics is unsuited to some market and social research projects. For a start, it requires scale to work. One healthcare organisation discovered that their patients’ open-ended survey comments mentioned the unsurprising: “doctor”, “appointment”, “surgery”, “practice”, and “time”. You need lots of text. In Australia, only a few brands probably have the right kind of scale to make analysis of social media useful. Some models work best with unique terms such as brand names because they are easy for the machine to ‘see’ but much less well for general topics like ‘skin care’, which limits the kind of research project they can be used for.

• Data reduction is not the same as analysis. What most of these models do is reduce the data to a manageable form, but data reduction is only the first stage in analysis. To make matters worse, there is a fine line between data reduction and being too reductive with data. Am I the only person who instinctively disengages when faced with a word list? Word lists are dull, because they are reductive; they need context to be meaningful. Context is easier to see in customer satisfaction research, because the original question acts as the context. (Menictas, 2013)

• Visualisation is not the same as analysis. The output from some text analytics is a concept map, in which it is possible to explore the relationships between concepts - qualitatively that is interesting and useful. On the other hand, some produce word clouds which look pretty. Semiotically, they give a potentially misleading impression of unity, since you can create a pretty word cloud out of any jumbled mass of words.

Are researchers still needed?

We have seen that machines can analyse text that we could not analyse any other way, and do so reasonably well for some products and services. Like everything, they have weaknesses. We live with the weaknesses if the strengths are compelling enough, so I can see several applications for text analytics as an adjunct to research, not a replacement.

The main thing is that we don’t want to lose our nerve; machines are good but they are not that good. As powerful and sophisticated as they are, machines cannot replace what researchers do, because they cannot think, understand language, or interpret.

Machines cannot think

Some parts of the text analytics industry would have us believe that text analytics can do anything and everything. For example, one supplier promises to deliver “100% of the meaning” from any text, without actually specifying what they mean by ‘meaning’.

We need go no further than academics and data scientists working in this field to see that the unconditional claims of some commercial text analytics suppliers are not warranted. There is a simply wonderful ongoing conversation about whether machines can think here: http://edge.org/annual-question/what-do-you-think-about-machines-that-think. I quote Carlo Rovelli: “The gap between our best computers and the brain of a child is the gap between a drop of water and the Pacific Ocean” [1]

As one market research firm which conducts text analytics puts it: “These software systems are very powerful, but they cannot take the place of the thinking human brain. The results from these software systems should be thought of as approximations, as crude indicators of truth and trends, but the results must always be verified by other methods and other data.” (http://www.decisionanalyst.com/Database/TextMining.dai)

One of the reasons for this is that machines can’t differentiate sense from nonsense. Concepts or topics that emerge from text analytics are just “statistical regularities in the data. As such, they are not necessarily intelligible, and they are not guaranteed to correspond to topics familiar to people ...” (Provost, 2013).

As a linguist, I question the assumption that all text is the same regardless of its source. People use customer satisfaction surveys very differently from the way they use social media, yet the text analytics industry seems to suggest that ‘one size fits all’. It’s not just that people use more negative language in some than others, but the way people use language is different. We need to ask: why are people communicating - what do they want to achieve by completing this open end, or tweeting, or blogging? Are they communicating a fact, or are they trying to impress, or persuade?

Machines can’t understand language

Machines can identify words and parse sentences, but they can’t ‘understand’. Machines can’t detect or understand figurative language, for example, despite the fact that people use figurative language all the time, especially to express emotion. Think of the emotional implications of the ‘war’ metaphor used in skin care discourse; it is about defence against (the enemy) ageing. Text analysts perhaps unwittingly reveal how they feel about words which have to be ‘extracted’, while ideas are ‘nuggets’ that have to be ‘mined’.

Machines can’t interpret

At the beginning of the 21st Century, there was a revolution in market research which seems to have been forgotten. Research was said to be moving away from describing to interpreting. A quote from 2004:

“Information was once power. But today, the power lies in interpreting what the information really means. In the hands of a skilled analyst, survey data may unearth invaluable insights into what makes people ‘tick’. But the same data, in the hands of a journeyman analyst, may lead to a creative idea being stifled at birth”. (Smith, 2004)

If ‘journeyman analysts’ can’t interpret, what hope has a machine? As information scientists who work in the field of ‘sense-making’ argue, the processes that humans use to make sense out of something are ‘interactive, dynamic and infinite’. It is not just about counting.

“Information seeking is a complex communication process that involves the interaction among the information seeker, the information, and the information provider”. (Lui, 2013)

In research, interpretation frequently involves categorising and re-categorising findings, and decontextualising and then recontextualising them, especially in terms of what this means, or doesn’t mean, for the client.

People not used to interpreting assume that “Words, like little buckets, are assumed to pick up their loads of meaning in one person’s mind, carry them across the intervening space, and dump them into the mind of another” (Osgood, 1979, cited in http://langs.eserver.org/linell/chapter09.html). In contrast to this, the meaning of something is not in the words but in the hearts and minds of the people who made the words and the hearts and minds of the people who receive the words. Researchers also know that the best insights come from going beyond what people tell you, to why they told you, or why they didn’t, so during the analysis we ask ourselves questions like:

• Why have people said that?
• What’s missing?

Conclusion

Let’s embrace the new possibilities brought by text analytics, but do so on our own terms. Researchers can add value to text analytics because we know that data is just data. Researchers can bring understanding from individual and social psychology, culture, tactical and strategic marketing, and social policy issues, and can synthesise and communicate insights to interpret data so that it is meaningful.

“The ability to organize knowledge into concepts is one of the defining characteristics of the human mind. A truly intelligent system needs physical knowledge of how objects behave, social knowledge of how people interact, sensory knowledge of how things look and taste, psychological knowledge about the way people think, and so on. Having a database of millions of common-sense facts, however, is not enough for computational natural language understanding: we will need to teach NLP systems how to handle this knowledge (IQ), but also interpret emotions (EEL) and cultural nuances (CHQ).” (Cambria & White, 2014)

[1] http://edge.org/response-detail/26026

NOTE: References have been omitted for space reasons, but can be found via Research News online, using the ‘view and search text’ function.

Susan Bell

Susan Bell is an AMSRS Fellow, with an Honours degree in English and Linguistics, and a Graduate Diploma in Psychology. She started her research career as an interviewer, moving on to running the data preparation department of Hoare Wheeler and Lenehan and was then trained as a qualitative and quantitative researcher by Yann Campbell Hoare Wheeler. She started her own agency Susan Bell Research in 1994 with clients from professional services, government and FMCG. As well as being a research allrounder, Susan has a special expertise in semiotics and discourse analysis and has taught qualitative analysis at AMSRS Summer and Winter Schools and at the NewMR Festival.
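Illustrative sketch: a ‘bag of words’ count in Python

For readers who would like to see what the ‘bag of words’ approach and the preparation steps discussed in this article (lower-casing, stemming, managing the stop word list, and a simple sentiment word list) might look like in practice, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the stop word list, the positive and negative word lists, the crude suffix-stripping rule standing in for a real stemmer, and the three sample comments are all invented and do not come from the carpark case study or from any commercial text analytics product.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (real lists are far longer).
DEFAULT_STOP_WORDS = {
    "the", "a", "an", "my", "i", "to", "and", "but", "was", "is",
    "it", "of", "in", "for", "at", "because", "despite",
}

# Hypothetical sentiment word lists; a real project would customise these.
POSITIVE = {"great", "friendly", "easy"}
NEGATIVE = {"terrible", "slow", "late"}

# Invented sample comments, loosely in the style of a carpark survey open end.
COMMENTS = [
    "The transfer bus was terrible and slow",
    "Booking was great but the drive to the carpark was slow",
    "Great staff, I booked again",
]


def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer, not one."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def bag_of_words(comments, stop_words):
    """Count stemmed words across all comments, ignoring stop words and word order."""
    counts = Counter()
    for comment in comments:
        for token in tokenize(comment):
            if token not in stop_words:
                counts[crude_stem(token)] += 1
    return counts


def sentiment_tally(comments):
    """Count how many words fall in the positive and negative word lists."""
    tally = Counter()
    for comment in comments:
        for token in tokenize(comment):
            if token in POSITIVE:
                tally["positive"] += 1
            elif token in NEGATIVE:
                tally["negative"] += 1
    return tally


if __name__ == "__main__":
    # Default stop words: 'to' is discarded along with 'the', 'was', etc.
    print(bag_of_words(COMMENTS, DEFAULT_STOP_WORDS).most_common(5))

    # Customised list: keep 'to' because travel 'to' the carpark may matter.
    custom_stop_words = DEFAULT_STOP_WORDS - {"to"}
    print(bag_of_words(COMMENTS, custom_stop_words)["to"])

    print(sentiment_tally(COMMENTS))
```

The sketch also shows the limitation described in the article: because each word is counted on its own, the code has no way of telling ‘right’ meaning correct from ‘right’ meaning direction, and a word list like this says nothing about why people wrote what they wrote.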