Running head: PERSONALITY AND LANGUAGE

To appear in T. Holtgraves (Ed.), Oxford Handbook of Language and Social Psychology

Natural Language Use as a Marker of Personality

Molly E. Ireland
University of Pennsylvania

Matthias R. Mehl
University of Arizona

Address correspondence to Molly E. Ireland (mireland@asc.upenn.edu), Annenberg School for Communication, University of Pennsylvania, 3620 Walnut Street, Philadelphia, PA 19104, or Matthias Mehl (mehl@email.arizona.edu), Department of Psychology, University of Arizona, 1503 E University Blvd., P.O. Box 210068, Tucson, AZ 85721.

Natural language has been integral to the study of personality since the field’s inception. Lexical approaches to personality factor analyze the words people use to describe others in order to zero in on a small number of mostly-independent personality traits. Early on, this approach generated the Big Five model that remains a major force in personality research today (Costa & McCrae, 1992; Goldberg, 1981). More holistic narrative approaches code content themes in the stories that people tell about their lives in order to understand how individuals construe their own personality and represent it to others (McAdams & Pals, 2006). Despite personality researchers’ early recognition that language and stories might accurately describe personality, the idea that natural language use also reflects personality and might provide knowledge about individual differences above and beyond self-reports did not gain traction until the last decade. Before the computer revolution, Walter Weintraub (1981) spent decades amassing data on natural language use – primarily by training coders to count words and phrases by hand – with little recognition.
Although some accepted his premise that verbal behavior, like other actions, reflects psychological processes, the psychological community was slow to adopt the burdens that came with carrying out linguistic research on sufficiently large samples by hand (Mehl, 2006a).

The recent surge in the popularity of studying language in the social and behavioral sciences stems in large part from technological advances in computational linguistics. The internet and computer science more generally have made it possible to easily compile and analyze natural language corpora with word counts climbing into the millions and billions. As of 2010, Google Books had digitized 12 percent of all books ever published and has made those data available to interested researchers (Michel et al., 2010). Social networking and publishing sites like Livejournal and, more recently, Twitter allow researchers to download their users’ language use for free or at relatively low cost (e.g., Golder & Macy, 2011; Ramirez-Esparza, Chung, Kacewicz, & Pennebaker, 2008; Yarkoni, 2010). Facebook similarly allows researchers to access users’ status updates, although it tends to impose stricter restrictions on usage than other sites do (e.g., Kramer, 2010; Schwartz et al., 2013). The near-universal use of the internet has helped psychology widen its sampling nets for a wide range of purposes. The rise of the internet has been particularly fruitful for language research: Observing verbal communication or other language behavior online is simple. Within minutes, online social media users often produce hundreds of units of objectively and easily quantifiable (i.e., text-based) behavioral data. Perhaps more importantly, sophisticated text analysis tools allow researchers to analyze the increasingly large and diverse datasets that technological advances have made possible.
One of the first text analysis tools to enter into common usage is a computationally simple word count program called the Linguistic Inquiry and Word Count (LIWC; Pennebaker, Francis & Booth, 2007). LIWC is a computerized text analysis program that outputs the percentage of words used in a text or batch of texts that fall into one or more of over 80 grammatical (e.g., articles, first-person singular pronouns), psychological (e.g., positive emotion, insight), and topical (e.g., social, sex) categories. It does so by comparing each word in a text against a set of internal word lists or dictionaries. Although the psychometrically-developed content categories have a (minor) subjective component, the grammatical and linguistic categories are, for the most part, based on objective and factual information about the established lexical members of that category. LIWC, along with its predecessors the General Inquirer (Stone, Dunphy, Smith, & Ogilvie, 1966) and DICTION (Hart, 1984), focuses on word frequencies alone irrespective of context. For example, the sentences, “I’ve never been less happy” and “I’m the happiest person alive” would be coded as containing identical proportions of positive emotion words despite the very different meaning of each sentence. Critics have pointed out that, in examples like these, the programs’ context blindness can lead to noisy or difficult-to-interpret results. The role of word count programs, however, is not to obviate self- and observer reports but to complement them, and to clarify the gaps and inconsistencies they leave behind. A cursory reading of the two statements about happiness above or a single-item measure of positive emotion could tell you that the above speakers range from least to most happy, respectively.
However, much of word count programs’ incremental utility comes from the information that questionnaires and content coding often miss: Programs like LIWC are able to tell us that the speakers in the examples above, despite their differences, are focusing on happiness to similar degrees. Studies have borne out the intuition that it’s often speakers’ focus rather than their conveyed meaning that matters. For example, early expressive writing studies showed that people who wrote about past traumas using positive emotion words tended to benefit more than those who used exclusively negative emotion words (Pennebaker, 1997). This finding was particularly striking because, due to the traumatic nature of the expressive writing topics, uses of positive emotion words were frequently negated (e.g., “she’ll never forgive me,” “I’m not a good person”).

Measurement and Psychometrics of Natural Language Use

Before serious and widespread work could begin on the links between natural language use and personality, two questions needed to be addressed: First, how can we measure (i.e., analyze) natural language use? And, second, does natural language use fulfill the basic psychometric requirements for a personality or individual difference variable (i.e., is it moderately stable over time and across context)?

Before computerized text analysis, language analysis in both psychology and linguistics was by necessity primarily qualitative. Linguists would often subject single utterances or exchanges to intense scrutiny (see Tanenhaus & Trueswell, 1995). Others would draw inferences about human behavior in general based on everyday observations of speech patterns of social relationships (e.g., Lakoff, 1975).
Psychologists who sought psychometrically tractable linguistic data tended to use content coding methods, in which trained judges rated texts according to a set of criteria that typically focused on writers’ or speakers’ use of content themes, such as anxiety, hope, and health or sickness (Gottschalk & Gleser, 1969). However, even the most reliable and statistically sound of these methods were slow, labor-intensive, and subject to human error.

A major benefit of word count approaches is that their reliability is never compromised by subjective biases or experimenter error. Computer programs output the same results regardless of the mental state of the experimenter using them, and they will always find exactly how many times a certain word or group of words occurs in a given text regardless of how easily overlooked those words might be (e.g., the notoriously invisible of¹). Indeed, the context blindness of word count programs is beneficial in this respect. Whereas a human coder might be bogged down by shades of meaning or biased assumptions about a speaker or writer, computerized text analysis programs focus single-mindedly on what language alone can tell us.

Once programs were available that could rapidly and reliably identify language patterns in large samples of texts, it was possible to establish the basic psychometrics of natural language use. One of the first investigations into the reliability of language use was conducted by Gleser and colleagues (1959). In that study, participants told a personal story in monologue for about 5 minutes. Transcripts were divided into two equal halves and coded for several linguistic and psychological language categories, such as adjectives and feelings. The average correlation between the two halves of these stories across all categories was moderate-to-high, approximately r = .50.
In other words, as anyone who has either written or read a story could tell you, there is some variation in stories over time. Authors use different words when laying out the setting of a story than when describing the climax (Vine & Pennebaker, 2009). Yet despite these obvious differences, the linguistic fingerprint of an author or speaker tends to remain visible over the course of a narrative. Similarly, individuals’ spoken language use tends to remain consistent during hour-long life history interviews, with consistency coefficients (Cronbach’s alpha) ranging from .41 to .64 for several stylistic and content language categories, despite stark differences in interviewer questions between the first and second halves (Fast & Funder, 2008). Several studies have now demonstrated that natural language use evidences substantial consistency not only over the course of a narrative, but over time as well. Schnurr et al. (1986) asked medical students to describe their experiences coming to medical school in two unscripted monologues spaced one week apart. They found that language use, as analyzed with the General Inquirer (Stone et al., 1966), was highly reliable over time. Across 83 content categories measured in that study, including references to people, work, affect, and evaluations, the average correlation between the two monologues was .78. Later, around the time that computerized text analysis approaches were beginning to gain mainstream momentum, Pennebaker and King (1999) further tested the limits of linguistic stability by comparing individuals’ language use over longer stretches of time, lasting up to several years, and across diverse contexts, ranging from scientific articles to students’ stream-of-consciousness essays. They once again found, using word frequencies calculated by LIWC, that people maintain good linguistic consistency in most language categories – often despite predictable situational fluctuations in language use. 
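The consistency estimates described above rest on two simple steps: scoring each text as the percentage of its words that fall into each category, and then correlating those percentages across halves of a text (or across occasions). The following is a minimal sketch of that logic, not the procedure used in any of the cited studies. The category word lists, the transcripts, and all function names are invented for illustration and are not the LIWC or General Inquirer dictionaries.

```python
import math
import re

# Hypothetical mini-dictionaries for illustration; NOT the actual LIWC word lists.
CATEGORIES = {
    "i": {"i", "i'm", "i've", "me", "my"},
    "article": {"a", "an", "the"},
    "posemo": {"happy", "nice", "good", "love"},
}


def tokenize(text):
    """Lowercase a text and return its word tokens (straight apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(tokens):
    """Percentage of tokens in each category, ignoring context entirely."""
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(tok in words for tok in tokens) / total
        for name, words in CATEGORIES.items()
    }


def split_half_scores(text):
    """Score the first and second half of one transcript separately."""
    tokens = tokenize(text)
    mid = len(tokens) // 2
    return category_percentages(tokens[:mid]), category_percentages(tokens[mid:])


def pearson_r(xs, ys):
    """Pearson correlation; returns None when it is undefined (no variance)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(texts):
    """Per-category correlation between halves, computed across speakers."""
    halves = [split_half_scores(text) for text in texts]
    return {
        name: pearson_r([h[0][name] for h in halves], [h[1][name] for h in halves])
        for name in CATEGORIES
    }


# Invented transcripts, one per (hypothetical) speaker.
transcripts = [
    "I went to the store and I saw my friend there it was a good day for me",
    "The train left the station late and the crowd waited on a cold platform",
    "My sister said I should call but I forgot and she was not happy with me",
    "We talked about a film and then the weather and a nice place to eat",
]
for category, r in split_half_reliability(transcripts).items():
    print(category, "undefined" if r is None else round(r, 2))
```

The same `pearson_r` helper could be applied to percentages from two separate occasions to approximate a test-retest estimate. With real data, far larger samples and validated dictionaries would be needed before the coefficients meant anything.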
The same consistency across time and place has been found for naturalistic, everyday spoken language as well. In one of the first studies using the Electronically Activated Recorder (EAR; Mehl, Pennebaker, Crow, Dabbs, & Price, 2001) methodology, Mehl and Pennebaker (2003) recorded college students in their natural environments over the course of two 2-day periods spaced 4 weeks apart. The EAR sampled 30 seconds of ambient sounds roughly every 12 minutes. All captured talking was transcribed and coded for participants’ location, activity, and mode of conversation (i.e., telephone or face-to-face). With few exceptions, both linguistic and psychological categories were substantially correlated across time, activity, and interaction mode. Interestingly, the authors also found that function word categories, including grammatical parts of speech such as pronouns and articles, were more consistent than were content-based psychological categories (average function word r = .41; average psychological processes r = .24).

Taken together, this past research establishes that people’s natural language use is characterized by a good degree of temporal stability and cross-situational consistency. Therefore, the ways in which people spontaneously use language – for example, idiosyncrasies in word choices or speaking styles – satisfy psychometric requirements for personality or individual difference variables.

Language Style versus Language Content

Early analyses of personality and language tended to base their conclusions on transcripts that had been coded for content words and phrases (e.g., Smith, 1992). Indeed, in computational linguistics, the function words that make up language style continue to be referred to as “stop words” because they are usually ignored during automated language processing.
However, individual differences in language style are often more psychologically telling and psychometrically parsimonious than are differences in language content. Language style is defined by a person’s use of function words, including pronouns, articles, conjunctions, and several other categories that make up the grammatical structure of utterances (Table 1). Function words tend to be short and frequently used, and they have little meaning outside the context of a conversation. In part because of these characteristics, they are processed fluently and largely automatically during both language production and comprehension (Bell, Brenier, Gregory, Girand, & Jurafsky, 2009; Levelt, 1989). Language content, on the other hand, is defined by a person’s use of nouns, verbs, adjectives, and most adverbs. In short, content words determine what a person says, and function words determine how they say it.

Table 1. Function word categories and examples.

Category                          LIWC Label   Examples
First-person singular pronouns    I            I, me, my
Third-person singular pronouns    shehe        she, his, her
Second-person pronouns            you          you, y’all, yours
First-person plural pronouns      we           we, our, us
Third-person plural pronouns      they         they, their, them
Impersonal pronouns               ipron        it, those, there
Articles                          article      a, an, the
Conjunctions                      conj         and, but, because
Prepositions                      preps        in, under, about
Auxiliary verbs                   auxverb      shall, be, was
High frequency adverbs            adverb       quite, highly, very
Negations                         negate       no, not, never
Quantifiers                       quant        much, few, lots

Note. Only basic-level and not superordinate function word categories were included. Categories are from LIWC2007 (Pennebaker et al., 2007).

To a greater degree than function words, content words are practically constrained by the topic or context of conversation.
For example, group members assigned to solve math problems together will uniformly use content words related to that task (e.g., calculate, solution) regardless of whether they are each individually thinking about the problems in different ways. Function words, on the other hand, are more loosely constrained by the topic of a conversation, allowing people to discuss the same content in different styles. The versatility of function words allows researchers to measure differences in language style across contexts rapidly and objectively. Despite some degree of natural verbal and nonverbal convergence between individuals during social interaction (Chartrand & van Baaren, 2009; Ireland & Pennebaker, 2010; Pickering & Garrod, 2004), function words used during conversation reliably reflect differences in social status, honesty, and leadership styles (Hancock, Curry, Goorha, & Woodworth, 2008; Pennebaker, 2011; Slatcher, Chung, Pennebaker, & Stone, 2007; see Tausczik & Pennebaker, 2010 for a review).

A second reason that researchers might focus on language style instead of or in addition to content is that function words tend to more directly reflect social cognition during conversation. The relationship between function word use and social cognition is primarily a practical matter: Because function words have little meaning outside of the context of a sentence, they require common ground or shared social knowledge to be interpreted (Pennebaker et al., 2003). For example, for the sentence He shut the dog in there to be understood, the speaker must know that the listener shares his knowledge of the man, the dog, and the location in question. This mutual understanding of a situation, its potential referents, and each conversation partner’s knowledge of the situation is known as common ground and theoretically forms the foundation of any successful conversation (Clark & Brennan, 1991).
Given that interest in and attention to others’ thoughts and feelings is an integral aspect of personality (particularly Big Five extraversion and agreeableness), the ability to automatically extract individuals’ social cognitive styles from their language use could be a valuable addition to personality researchers’ toolboxes.

A final reason for paying attention to style in addition to content is purely psychometric. Individual differences in language observed in early language research focused primarily on phrases that include both language content and style. To use women as an example, female speakers tend to use more uncertainty phrases, such as I wonder if, and extra-polite phrases, such as would you mind, than male speakers do (Holmes, 1995; Lakoff, 1975; Poole, 1979; Rubin & Green, 1992). However, many of these phrases can be measured more simply by counting the function words that they commonly include – specifically, in the previous examples, first-person singular pronouns (e.g., I) and auxiliary verbs (e.g., would). Indeed, an analysis of formal writing in the British National Corpus found that function words offered the most efficient way to classify texts as authored by men or women (Koppel, Argamon, & Shimoni, 2003). Similarly, a corpus analysis of spoken and written language collected in 70 studies revealed that function words more reliably discriminated between male and female participants than did content words (Newman, Groom, Handelman, & Pennebaker, 2008). In personality research specifically, direct comparisons are relatively sparse. However, style appears to provide the best classification accuracy for neuroticism, providing gains over content alone and even over content and style combined (Argamon, Koppel, Pennebaker, & Schler, 2009). Whether similar effects will be found for other personality traits and individual differences has yet to be conclusively determined.
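The psychometric point above, that uncertainty and extra-polite phrases can be approximated by counting the function words they contain, can be sketched in a few lines. This is an illustrative sketch only: the word lists are the handful of examples from Table 1 (plus would, taken from the text's own example), not the full LIWC2007 categories, and the function name and sample phrases are hypothetical.

```python
import re
from collections import Counter

# Example words taken from Table 1, plus "would" from the text's own example.
# Illustrative only; the real LIWC2007 categories contain many more words.
FUNCTION_WORDS = {
    "i": {"i", "me", "my"},
    "you": {"you", "y'all", "yours"},
    "auxverb": {"shall", "be", "was", "would"},
    "negate": {"no", "not", "never"},
}


def function_word_profile(text):
    """Count function-word categories in a text; returns (counts, token total)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in FUNCTION_WORDS.items():
            if token in words:
                counts[category] += 1
    return counts, len(tokens)


# Invented phrases built around the two examples mentioned in the text.
for phrase in ["I wonder if it is open", "Would you mind closing the door"]:
    counts, total = function_word_profile(phrase)
    print(phrase, dict(counts), total)
```

Counting at the category level in this way trades phrase-specific detail for simplicity: the hedge I wonder if is not detected as a unit, but its first-person pronoun still contributes to the speaker's overall style profile.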
In the end, whether language content or style is a more reliable indicator of personality and individual differences may depend to a large degree on what personality measures are used to establish criterion validity. Although individual differences such as age and sex appear to be more strongly linked to language style, personality traits as measured by Big Five scales are often more consistently and strongly linked with language content – including both language categories and individual words – than language style. This may be because research exploring the link between language use and the Big Five has overwhelmingly used personality self-reports as the gold standard of true personality, whereas demographic variables can be more objectively observed. The pattern of personality self-reports matching language content and demographic individual differences matching language style may be due to a match between the automaticity of the behavior and the measurement (Eastwick, Eagly, Finkel, & Johnson, 2011). For example, a neurotic person who realizes that neuroticism is socially undesirable may – deliberately or not – project a cheerful exterior by using positive emotion words and by downplaying his anxiety in self-report questionnaires. However, less accessible components of language use, such as increased use of self-references like I and me, may correlate with less accessible behavioral indicators of worry such as compulsively checking the status of a loved one’s flight or spending extensive time on WebMD.
Given the abundance of online language use – including e-mails, blogs, and online chats, which are often archived by default (see Baddeley, 2011) – and the fact that many nonverbal behaviors can be accessed simply by downloading browser histories, future personality research may be able to incorporate naturalistic measures of individuals’ online behavior to triangulate when and where language style and content and behavioral and self-reported personality converge.

The Big Five Personality Domain

The literature on language and the Big Five is the largest of the subareas within the study of personality and natural language. To accommodate its size, the sections below first summarize the samples commonly used in this research and next address major findings for each Big Five dimension individually.

Language samples. The kind of naturalistic language that has perhaps most frequently been subjected to computerized text analysis and linked with the Big Five is online or computerized language use. In the roughly 20 years since the internet was made accessible to the general public, language has become the most accessible naturalistic behavior available to behavioral scientists. As the sections below will explore as well, everyday verbal behavior is carried out online and often automatically saved in blogs, social networking sites, e-mail accounts, online chats, and text messages. More formal texts abound as well, including a huge range of academic submissions, ranging from admissions essays to published scholarly work, not to mention roughly 12 percent of the fictional novels, poetry collections, and nonfictional books published in recorded history (see Michel et al., 2010).
Considering that this goldmine of information is often free and accessible to anyone with the necessary web programming or copying-and-pasting skills, it is surprising that only a few studies linking the Big Five and the kind of quasi-naturalistic language use that occurs in these formats have been conducted. The studies that have been conducted show great promise, however, for both understanding naturalistic manifestations of personality and for the longstanding goal of automatically building personality profiles based on behavioral data (Dodds & Danforth, 2009; Mairesse & Walker, 2011; Mairesse, Walker, Mehl, & Moore, 2007).

A few studies have gone to the effort of collecting spoken language as it occurs in real life. These studies were made possible by the advent of the EAR, or Electronically Activated Recorder, about a decade ago (Mehl et al., 2001). The EAR is a programmable audio recorder that periodically records snippets of ambient sounds (e.g., 30 seconds every 12 minutes). When the EAR records, it captures any surrounding noise – including language used by subjects in their daily interactions with their social networks. Later, trained transcribers listen to the recordings, type the language they hear, and typically also code for basic features of subjects’ momentary social environments (e.g., location, activity).

Within studies that have looked at laboratory writing or dialog tasks, language use largely falls into two categories: tasks with face-valid relevance to personality, such as asking people to talk about events that were important in shaping their identity, and those that attempt a more circumspect route, such as asking students to describe an object (e.g., a water bottle; Pennebaker, 2011). Not surprisingly, considering that the criterion for personality is nearly always responses to face-valid self-report scales, language used in the former tasks tends to correlate more strongly with personality dimensions.
For example, although Pennebaker and King (1999) found only a small number of modest significant correlations (rs < .20) between self-reported Big Five traits and language used in stream-of-consciousness writing and essays about coming to college, Hirsh and Peterson (2008) and Fast and Funder (2008) found a large number of moderate correlations (rs = .20-.40) between self-reported personality and language used in separate tasks that asked participants to describe their life stories.

Extraversion. Among the Big Five, extraversion tends to leave some of the strongest and most predictable traces in individuals’ online and physical environments (Gosling, 2008) and, to a slightly lesser degree, language (Mairesse & Walker, 2006). People who are rated by others and themselves as higher in extraversion use less inhibited (e.g., careful, avoid), tentative (e.g., doubt, maybe), and self-focused language and more positive emotion words (e.g., adorable, nice), are more talkative, and talk more about social topics, such as friends, people, communication, and leisure activities (Augustine, Mehl & Larsen, 2011; Dewaele & Furnham, 1999; Fast & Funder, 2008; Mehl et al., 2006; Mairesse & Walker, 2006; Oberlander & Gill, 2006; Qiu, Lin, Ramsay, & Yang, 2012; Walker et al., 2007; Yarkoni, 2010). Word- and phrase-level analyses of large corpora made up of Facebook status updates (Kosinski & Stillwell, 2012) and blog entries (Yarkoni, 2010) have found that party, bar, and can’t wait are among the best indicators of high extraversion, and internet, computer, and cats are good indicators of low extraversion.

In terms of language style, extraverts use more immediate first-person plural and second-person pronouns such as we and you (i.e., pronouns used primarily to talk with rather than about a person; Dewaele & Furnham, 2000; Holtgraves, 2011; Yarkoni, 2010).
Although extraverts do not appear to use first-person singular pronouns at different rates than their introverted counterparts overall (Mehl et al., 2006; Pennebaker & King, 1999; Yarkoni, 2010), some evidence suggests that the link between extraversion and I is moderated by the words that first-person singular co-occurs with. A study that counted pairs of co-occurring words rather than individual words found that extraverts used some I-phrases more frequently and others less frequently than introverts did, leading to null first-person singular correlations overall. For example, Gill and Oberlander (2002) found that people asked to write e-mails to a close friend used a greater variety of I-phrases and more bigrams containing negations (I don’t, I’m not) to the degree that they reported being relatively introverted. In the same study, those higher in extraversion limited themselves to a small number of less negative and perhaps implicitly more social first-person phrases such as I’ll be and I was.

Agreeableness. A clear and intuitive indicator of agreeableness is linguistic positivity. Agreeable people use more positive language, including both verbs (e.g., laughing) and modifiers (e.g., lovely), and fewer negative emotion words (e.g., damn, jerk) in everyday speech and writing (Augustine et al., 2011; Holtgraves, 2011; Küfner, Back, Nestler, & Egloff, 2010; Yarkoni, 2010). They have also been found to use more social words (Küfner et al., 2010) and self-references (Mehl et al., 2006), which, in the context of generally cheerful language, may suggest polite hedge phrases such as I think rather than the ruminative self-references that characterize negative affective traits (see Depression, below).
First-person singular also signifies lower social status, relative to other conversation partners (Kacewicz, Pennebaker, Davis, Jeon, & Graesser, in press), further suggesting that I-words used by agreeable individuals reflect polite self-effacement rather than neurotic self-consciousness (see Holtgraves, 2010, for a review of polite language use). Consistent with this interpretation, one of the best classifiers of high agreeableness in Facebook status updates is the phrase thank you (Schwartz et al., 2013). A facet-level analysis of agreeableness demonstrated that it may be one of the most cohesive five-factor dimensions, linguistically and otherwise: Across most or all of its five facets, people who rank higher in agreeableness talk more about home, family, and communication and avoid sensitive topics such as death (e.g., coffin, killer; Yarkoni, 2010).

Another face-valid linguistic correlate of agreeableness is swearing. Unsurprisingly, people swear less to the degree that they report being more agreeable (Holtgraves, 2010; Mehl et al., 2006; Yarkoni, 2010; see Jay, 2009). Indeed, the five words that best discriminate between individuals ranking high and low in agreeableness in Facebook status updates are all swear words (Schwartz et al., 2013). The negative correlation between agreeableness and swearing fits with lay theories of personality as well and is correctly interpreted by outside observers of students’ EAR recordings (Mehl et al., 2006). Given the low overall incidence of swearing – making up about 1/3 of a percent of spoken conversation and 1/10 of a percent of emotional writing – these results essentially mean that highly agreeable people are unlikely to swear even once in a given sample, whereas a highly disagreeable person might swear only a few times.
Nevertheless, swearing, like negative emotion words, another low-frequency category, is a potent and reliable indicator of agreeableness and other key psychological variables (e.g., Robbins, Focella, Kasle, Weihs, Lopez, & Mehl, 2011).

Neuroticism. Low emotional stability, or neuroticism, is characterized by negative thoughts, anxiety, and ruminative self-focus (Teasdale & Green, 2004). The language use of individuals who rate themselves as high in neuroticism reflects these characteristics in higher rates of I-words, negations, and references to sadness, anger, and anxiety in written life stories, stream-of-consciousness essays (in both Korean and English), blogs, and text messages (Argamon et al., 2009; Holtgraves, 2011; Hirsh & Peterson, 2008; Lee et al., 2007; Mairesse et al., 2007; Pennebaker & King, 1999; Qiu et al., 2012; Yarkoni, 2010). In Facebook status updates, the words basketball and success were the best classifiers of emotionally stable individuals, whereas the single best classifier of high neuroticism was the swear word fucking, closely followed by variants of depression and the phrases I hate and sick of (Schwartz et al., 2013).

In contrast with the majority of research cited above, everyday spoken language shows very few signs of neuroticism (Mehl et al., 2006). The reason behind the discrepancy between these sets of findings may lie in the extent to which individuals feel pressure to behave in socially desirable or appropriate ways (Mehl & Holleran, 2008). Like depression, neuroticism is known to be a socially undesirable trait, and the expression of negative emotion is typically considered a private event. As such, individuals who are aware that they tend to be dysphoric and anxious may mask their negative emotions from many of the individuals they interact with directly in their daily lives.
Thus, as with depression, disclosure of negative emotions may turn out to be moderated by closeness with one’s interaction partners (Baddeley, 2011; see Depression). In anonymous experimental writing, during which showing obvious signs of neuroticism could not possibly matter, or text messages and Facebook statuses, which are likely to be read by friends rather than casual acquaintances or passersby, neurotic individuals may feel free to express their chronic negativity (Holtgraves, 2011; Schwartz et al., 2013). Consistent with this interpretation, Mehl and Holleran (2008) found that private but not public negative emotion word usage reflects increased neuroticism. As a result of this moderation, when language is aggregated across conversation contexts, ranging from public conversations with professors to private conversations with relationship partners, the typical markers of neuroticism may be critically attenuated, resulting in surprisingly modest effect sizes for intuitively valid markers of neuroticism, such as anxiety words (Pennebaker & King, 1999).

Openness. The trait of openness is an interesting case that strongly recommends the inclusion of facet-level traits in research on language and personality. First, openness is relatively difficult to capture linguistically. Analyses of naturalistic spoken and text message conversations have produced a number of null findings and a handful of significant correlations that fail to paint a cohesive or theory-consistent picture (Holtgraves, 2011; Mehl et al., 2006). When meaningful patterns do emerge, the linguistic indicators of openness seem to reflect only its intellectual aspects, ignoring facets related to artistic expression and emotionality. Bloggers whose self-reports indicated a higher degree of openness used more articles and prepositions and fewer personal pronouns and references to family and home (Yarkoni, 2010).
On Facebook, highly open individuals are distinguished by more frequent uses of the words universe, writing, and music (Schwartz et al., 2013). In life history interviews, people who ranked higher in openness in general and intellectualism specifically used more articles as well (Fast & Funder, 2008), and in a Korean stream-of-consciousness writing sample, people who were higher in openness produced more sentences and talked about sleeping and resting less (Lee, Kim, Seo, & Chung, 2008). In other words, at least when talking about themselves or their interests at length, people who are high in openness seemed to adopt a formal rather than narrative writing style. This distinction is captured by Biber’s (1995) involved-informative dimension of language use, which describes a high rate of verbs and pronouns at the involved end of the spectrum and nouns (implied by the use of articles) and prepositions at the informative end. Informative language tends to be more characteristic of male language use, scripted speech, formal writing, and, in the last US presidential elections, liberal political candidates (Koppel et al., 2003; Pennebaker, 2011). Intellectualism and liberalism may be salient and central characteristics of openness, but they are hardly necessary or sufficient for a person to rank relatively highly on a broad five-factor measure of openness. The other facets of openness concern interest in art, emotionality, adventurousness, and imagination. In the only study that examined the relationship between language and the finer-grained facets of the Big Five, Yarkoni (2010) found that several of the language categories that predicted emotionality and artistic interest were at odds with those that predicted the remaining facets.
People who are emotionally and artistically open use more personal pronouns, references to physical states, positive emotions, and words related to leisure or rest; liberal, intellectual, imaginative, and adventurous bloggers used fewer of each category. In studies that reported language correlates of only the primary Big Five dimensions, this moderation by facet is likely to have led to null results for categories that show full crossover effects between facets and weakened results for categories that only occur for one or a few facets. For example, in the Yarkoni (2010) sample, the modest negative correlation between references to home and total openness scores was diluted by the finding that references to home are unrelated to emotionality and artistic interests.

Conscientiousness. Like agreeableness, conscientiousness appears to manifest in everyday language use as polite speech. In blogs as well as students’ EAR recordings and stream-of-consciousness essays, conscientious individuals swear less and use fewer negative emotion words regardless of their sex (Lee et al., 2007; Mairesse & Walker, 2006; Mehl et al., 2006; Pennebaker & King, 1999; Yarkoni, 2010). At the level of individual words as well, conscientiousness is best defined by the words that people high on the trait do not use. For the facets of achievement striving and self-discipline in particular, almost every one of the top 10 strongest single-word predictors was negatively correlated (e.g., protest, boring), excepting only a few positively correlated words, including ready and HR [human resources] (Yarkoni, 2010). Schwartz and colleagues’ (2013) word- and phrase-level Facebook analysis corroborates these findings: Although highly conscientious people use the phrases ready for, to work, and great day more than less conscientious people do, the strongest correlates of conscientiousness were negative, including several swear words as well as the words YouTube and bored.
The high number of negative correlations suggests that, perhaps especially for those high in the achievement striving and self-discipline facets of conscientiousness, increased conscientiousness is associated with regulation or inhibition of those words that serve as indicators of individuality for other traits. That behavior pattern would be consistent with findings that people who rank higher in conscientiousness tend to value conformity more than less conscientious individuals do (Roccas, Sagiv, Schwartz, & Knafo, 2002). Surprisingly, the more conscientious women were, the less they used second-person pronouns (you), whereas the correlation was the opposite for men – a pattern that observers listening to participants’ daily recordings accurately (although modestly) observed and used in their personality ratings (Mehl et al., 2006). Like many pronouns, you is a versatile word. However, it most notably predicts hostility, and is associated with both state and trait anger (Simmons, Chambless, & Gordon, 2005; Weintraub, 1981). It may be the case that, in the context of daily social interactions with friends and strangers, conscientious people strive to follow societal gender norms that define men as assertive (more you) and women as nonconfrontational (less you).

Moderation by sex. Sex (see Footnote 2) appears to significantly moderate the strength of behavioral signals of personality in some cases and reverse them in others. Moderation appears to be particularly likely in cases where word categories have relatively low base rates and are taboo or socially sensitive (characteristics that are clearly related).
In an interesting study that was designed to capture correlations between Big Five traits and swearing, a taboo linguistic dimension that is especially elusive in more controlled written language use, Fast and Funder (2008) had participants complete a life history interview as well as a spontaneous face-to-face conversation with two strangers who were also participants in the experiment. When acquaintances’ ratings of participants’ personality (i.e., informant reports) were compared with swearing across these two natural dialog tasks, men but not women who swore were rated by informants as more extraverted. (In that study, self-report personality measures were not collected.) Mehl et al. (2006) found a similar moderation by sex: Untrained judges who listened to EAR recordings of students’ daily lives rated men but not women as more extraverted when they swore. In that study, self-reported extraversion was unrelated to swearing for either sex. In other words, in contrast with individuals’ self-ratings, judges tend to interpret swearing as a sign of aggression or assertiveness which, for men but not women, might be used as a heuristic for extraversion. Indeed, in Mehl et al. (2006), men who argued more during their EAR recordings were also seen by judges – but not by themselves – as more extraverted. As mentioned above, the display of content-based indicators of neuroticism, such as negative emotion words, is likely moderated by whether language use is public and identifiable or private and anonymous. Consistent with the trait’s characteristic negative emotionality, worry, and tendency to over-report health symptoms (Watson & Pennebaker, 1989), women who are more neurotic use more negative emotion words and references to health in naturalistic text messages (Holtgraves, 2011).
However, men who are more neurotic not only fail to use more negative emotion words, but they also fail to use any language category that was examined more or less than those who are more emotionally stable. Mehl et al.’s (2006) EAR recordings showed a similar effect: whereas for women, neuroticism was reflected in decreased verbosity and laughter and increased arguing, men’s neuroticism was reflected in more socializing and time spent outdoors. These results may be due to gender differences in emotional display rules. It may be that the traditional social sanction against male expressions of strong emotion causes men to regulate their emotional language (even in private text messages) and, as an alternative, cope with the negative emotions of neuroticism through social and physical activity to a greater degree than women do.

Trait Emotionality

Presumably due to the strong presence of the Big Five approach in personality research, trait positivity and negativity have received relatively little attention in language research. Yet, there are a few studies that suggest both face-valid and non-obvious relationships between trait emotionality and natural language. Despite the apparent importance of the trait, however, those studies that have investigated language and trait affect have not examined its relationship with stylistic aspects of natural language such as pronoun usage. Rather, the focus tends to be on improving the ability to unobtrusively gauge individuals’ affect through automated text analysis (e.g., Cohen, Minor, Najolia, & Hong, 2009; Dodds et al., 2010).

Trait negativity. Unlike more specific measures of depression or neuroticism, general psychological distress appears to be strongly related to more references to negative emotions and fewer references to positive emotions. This is what Cohen (2011) found in a sample of students who were asked to speak in monologue about a recent disagreement.
In this study, the author used custom positive and negative emotion dictionaries that excluded positive emotion words with more common non-emotional meanings such as like and pretty. Psychological distress in that sample was unrelated to positive or negative emotion word usage for LIWC, although LIWC’s negative emotion category was positively correlated with depression symptoms in the same sample (Cohen, 2011). Part of the success of this study in accurately tracking negative affect in dialogue may be due to the nature of the task, which likely afforded greater use of emotional language than more neutral monologues. The study also suggests, however, that LIWC, the most widely used text analysis program in personality research, has room to improve its emotion categories (O’Carroll Bantum & Owen, 2009). Trait anger. Walter Weintraub (1981), a pioneer in the study of personality and language, found in a case study of a young man with an explosively angry temperament, that angry speech is characterized by a high rate of second-person pronouns (e.g., you, you’re). A follow-up analysis of simulated anger in spontaneous monologues produced by two trained actors replicated the second-person singular finding and further found that angry speech contained fewer uses of we, more uses of me, more swear words, and more negations (e.g., no, not). (Coming before the computer revolution, Weintraub and collaborators counted pronouns by hand.) The parallels between the chronically angry young man and the actors asked to simulate anger suggest that these linguistic characteristics reflect both state and trait anger. Despite the small sample sizes used in Weintraub’s original studies, his results presaged later findings regarding the roles of decreased we and increased you and me as indicators of negativity during marital disputes (Simmons et al., 2005) and hostility during interactions between family members (Simmons, Chambless, & Gordon, 2008). 
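The hand counts that Weintraub and collaborators carried out can be illustrated computationally. What follows is only a minimal sketch in Python, not the procedure used in any study cited here: the word lists in CATEGORIES are small, hypothetical stand-ins chosen for illustration (they are not LIWC categories or Weintraub’s coding scheme), and the function name category_rates is likewise an assumption. The sketch expresses each category as a percentage of all words in a text, which is the usual way word-count results are reported.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
# They are NOT LIWC categories and NOT Weintraub's coding scheme.
CATEGORIES = {
    "you": {"you", "your", "yours", "you're"},
    "we": {"we", "us", "our", "ours"},
    "me": {"me"},
    "negation": {"no", "not", "never"},
}


def category_rates(text):
    """Return each category's share of all word tokens, as a percentage."""
    # Normalize typographic apostrophes so that you’re matches you're.
    normalized = text.lower().replace("\u2019", "'")
    # A token is a run of letters, optionally followed by an apostrophe suffix.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens)
    # Guard against empty input to avoid division by zero.
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in CATEGORIES
    }


if __name__ == "__main__":
    print(category_rates("You never listen to me"))
```

A simple count of this kind ignores context, so it cannot tell an accusatory you from a generic one; that limitation is one reason the moderators discussed in this chapter matter for interpretation.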
Trait positivity. Positive emotionality or trait positive affect is another broad and important aspect of personality. Beyond the fact that being happy is valuable for its own sake, happier people live longer, a fact that is reflected in language use. In a sample of 180 autobiographical sketches written by Catholic nuns when they first entered their convent, women who used more positive language had lower mortality risk between the ages of 75 and 95 (Danner, Snowdon, & Friesen, 2002). However, Danner et al. (2002) did not measure trait positive affectivity, optimism, or other enduring personality dimensions outside of nuns’ positive emotion references. As such, it is unclear whether their results were, as the longitudinal results strongly suggest, due to trait rather than state positivity. If we assume that positive language tapped some chronic underlying trait in that study, it is not clear which trait it specifically reflects. More recent research sheds some light on what trait or traits positive references in Danner et al. (2002) may have captured. In unstructured experimental language use, trait positive affect is, as expected, positively correlated with positive emotion word usage. For example, a sample of students who were instructed to talk aloud about any topic that they chose for 3 minutes used more positive emotion words to the degree that they scored higher on measures of trait positive affect and behavioral approach system sensitivity, and used more negative words to the degree that they ranked higher in trait negative affect and behavioral inhibition system sensitivity (Cohen, Minor, Baillie, & Dahir, 2008).

Psychopathology-Related Personality Traits

Type A. Some of the earliest research on language use and personality interestingly came out of behavioral medicine.
In a series of studies, Scherwitz and colleagues (for reviews see Scherwitz & Canick, 1988; Scherwitz, Graham, & Ornish, 1985) linked self-involvement to the Type A behavior pattern and Coronary Heart Disease (CHD). Self-involvement was operationalized as the frequency with which participants used first-person singular pronouns during the structured Type A interview. Type A emerged as positively related to the use of first-person singular. Further (and going beyond personality), use of first-person singular pronouns was related to blood pressure, coronary atherosclerosis, and prospectively to CHD incidence and mortality. Interestingly, the relationship between first-person singular use and CHD outcome often remained significant even after statistically controlling for the Type A behavior pattern (Scherwitz et al., 1985; Scherwitz & Canick, 1988). Despite these promising early findings, links between pronoun use, Type A, and heart disease have not been pursued in recent years (see Rohrbaugh, Mehl, Shoham, Reilly, & Ewy, 2008).

Depression. One theory of depression suggests that emotional pain – like physical pain – forces people to pay attention to themselves (Pyszczynski & Greenberg, 1987). High degrees of self-focus can be seen in people’s inattention to others and preoccupation with the self. Other work on self-focus suggests that states of self-awareness are linked to elevated use of first-person singular pronouns – especially the use of I-words such as I, I’m, I’ll (e.g., Davis & Brock, 1975). Depressive states have been linked to higher use of I-words across several genres. In some of the first systematic word count studies, Walter Weintraub (1981) found that stream of consciousness writing produced much higher use of I-words in depressed patients than patients dealing with other medical disorders.
More recently, Rude, Gortner, and Pennebaker (2004) found that college students diagnosed with depression used more I-words than non-depressed students when writing essays about their college experiences. Also, in natural speech captured over several days of tape recordings, use of first-person singular is more frequent among those with high depression scores than those with low depression scores (Mehl, 2006b).

The link between depression and increased self-focus is found not only in everyday language use but in published writing as well. Being a published poet is a surprisingly dangerous job. According to Kay Jamison (1995), published poets are up to 18 times as likely as the general population to commit suicide. Her research suggests that poets have an extraordinarily high rate of bipolar disorder, which is known to be closely linked to suicidal behaviors. Motivated by this observation, Stirman and Pennebaker (2001) analyzed the published work of 18 poets – 9 who committed suicide, and 9 yoked controls. They found that those who eventually committed suicide used first-person singular pronouns at higher rates than those who did not (Stirman & Pennebaker, 2001). Ironically, suicidal poets did not use more negative emotion words than other poets. Overall, suicidal poets’ language use showed that they were more self-focused and less socially integrated than non-suicidal poets.

Surprisingly, depression diagnosis and sub-clinical symptomatology are only positively correlated with references to negative emotions in certain contexts. For example, in naturalistic e-mails, talking, and blog posts, clinically depressed people and those with higher subclinical depression scores use as many or more references to positive feelings overall as never-depressed and less depressed individuals do (Baddeley, 2011; Rodriguez, Holleran, & Mehl, 2010).
In a sample of college students with subclinical depression symptomatology, Mehl (2006) found no correlation between depression symptom severity and use of positive emotion, sadness, or anger words in everyday spoken language. It is clear, however, that people suffering from depression experience more negative emotions than do unaffected people, and that those who are vulnerable to or currently experiencing depression preferentially focus on negative over positive stimuli (Bistricky, Ingram, & Atchley, 2011; Teasdale, 1988). Null results for negative emotion word use and depression symptoms therefore suggest that depressed individuals inhibit the most obvious markers of depression – negativity, sadness, irritability – by regulating their emotional language, perhaps in an attempt to avoid the negative social consequences of depression. Depressed individuals may mask negative emotions in order to maintain their social network.

Jenna Baddeley and colleagues have conducted two studies of naturalistic language use among individuals with major depressive disorder (MDD) that may shed light on where and to whom negative emotions are expressed during depression. In an EAR study that recorded the daily lives of individuals from the community with and without MDD over 3-4 days, she found that depressed individuals laugh, socialize, and talk as much as non-depressed people (Baddeley, Beevers, & Pennebaker, 2012). In contrast with other EAR findings, those participants with MDD did use negative emotion words more frequently than never-depressed participants; however, their negative emotion word use was moderated by interaction partner, such that they primarily expressed negativity only with close friends and relationship partners. A similar pattern emerged in real-life e-mails. In a recent study, Baddeley (2011) downloaded e-mails sent and received by individuals with and without MDD over the course of 1 year.
For MDD participants, that year included a depressive episode and a period of remission that each lasted at least 1 month. She found that depressed individuals overall used more positive language than did matched controls, and that when depressed individuals used negative emotion words they did so primarily with correspondents they rated as being close to them. Depressed individuals’ tendency to hide their depression from all but their closest friends may be an effective method of coping with the negative social consequences of depression, such as being avoided by non-depressed friends (Schaefer, Kornienko, & Fox, 2011). Content words, such as the negative emotion terms awful and cry, and their meaning are more salient during language processing than are function words like I and me; as a result, they are easier to inhibit and control (Schmauder, Morris, & Poynor, 2000; Townsend & Saltz, 1972). These results therefore suggest that when depressed individuals attempt to appear normal, content markers of depression decrease while function word markers of depression remain the same.

The Dark Triad. Another area of psychopathology-related individual differences that language researchers are increasingly interested in is the Dark Triad (Paulhus & Williams, 2002). The Dark Triad is comprised of a group of socially antagonistic personality traits: Machiavellianism (a tendency to strategically deceive and manipulate others), narcissism (a tendency to hold inflated, grandiose self-views), and psychopathy (a tendency to show disregard and a lack of empathy and remorse towards others). Similar to research on subclinical depression, the Dark Triad personalities are typically considered sub-clinical, non-pathological versions of “full blown” clinical personality disorders. Machiavellianism, narcissism, and psychopathy share a “dark” socially malevolent core but also have unique psychological aspects and correlates (Paulhus & Williams, 2002).
Among the Dark Triad personalities, subclinical narcissism has by far received the most scientific attention (Campbell & Miller, 2011). The idea that narcissism is characterized by a frequent use of first-person singular pronouns was straightforward given narcissists’ pervasive self-focus in social interactions. This hypothesis was therefore also the first to be subjected to empirical tests. Raskin and Shaw (1988) found that subclinical narcissism correlated positively with the use of first-person singular and negatively with the use of first-person plural in tape-recorded impromptu monologues. Such a bias towards linguistic egocentrism (Weintraub, 1981) appeared theoretically meaningful and suggested that narcissism is distinctly manifested in people’s natural use of self-references. Building on Raskin and Shaw’s (1988) landmark study, other researchers have taken first-person singular use as a face-valid linguistic indicator of self-focus. DeWall, Buffardi, Bonser, and Campbell (2011) found that narcissistic Facebook users who did not draw attention to themselves in their profile pictures (e.g., by posting a sexy, revealing photo) used first-person singular self-references at a high rate in their profile self-descriptions. Similarly, narcissistic participants who did not use antisocial language (i.e., swear or anger words) in personality essays (explaining why they possess the traits of honesty, trustworthiness, and kindness) used first-person singular self-references at a high rate in their essays. In both studies, presumably, the use of first-person singular indexed a compensatory verbal attention-seeking strategy, although, interestingly, it did not show a (reported) significant zero-order correlation with narcissism. Most recently, DeWall, Pond, Campbell, and Twenge (2011) tested the notion of a generational increase in narcissism by tracking the use of self-references in popular U.S. song lyrics.
The authors reported a linear trend pointing to an increase in first-person singular pronouns between 1980 and 2007. Again, presumably, the use of first-person singular in song lyrics indexes (cultural) egocentrism, and the data are meant to speak to the debate on whether or not U.S. society has become more narcissistic over time (Trzesniewski & Donnellan, 2010).

Several other studies, though, question the simple logic that I-use equals narcissistic self-focus. Sampling language representatively from all of participants’ daily conversations (using the EAR method), Holtzman, Vazire, and Mehl (2010) found no systematic association between use of first-person singular and narcissism (r = .13, p = .25). Also, Fast and Funder (2010) found no simple correlation between use of first-person singular and narcissism among participants who underwent a life history interview (r = .02 for women and r = .11, n.s. for men). Among men but not women, authority – a facet of narcissism – was significantly positively related to I-use. However, self-sufficiency, yet another facet, showed a significant negative association with I-use among men. Among women, I-use was generally more strongly related to reported and observed anxiety and depressive symptoms than it was to any facet of narcissism – a finding that is consistent with prior research on the link between first-person singular self-references and depression (Mehl, 2006b; Rude, Gortner, & Pennebaker, 2004; Weintraub, 1981). The scientific question of whether or when first-person singular use indexes narcissism is thus still open. Conceptually, it seems important to distinguish between self-focus and self-awareness, with narcissists presumably ranging high on the former but low on the latter.
Future research should aim at reconciling the discrepant findings by identifying language contexts that particularly afford the expression of narcissism and neuroticism (or negative affect or depression), respectively. Going beyond the narcissism-I link, Holtzman and colleagues (2010) found that narcissism in general, and especially its “toxic” components Superiority/Arrogance and Exploitativeness/Entitlement, were correlated with a more frequent use of swear and anger words. The study further found that narcissists make more sexual references in their daily language use.

Finally, in the only language study on the Dark Triad personality trait of Machiavellianism, Ickes, Reidhead, and Patterson (1986) used text analysis to compare Machiavellianism and self-monitoring. Following the idea that Machiavellianism is a form of self-oriented, assimilative impression management, it was positively related to the use of first-person singular (and plural) pronouns and negatively related to the use of second- and third-person pronouns. Self-monitoring, on the other hand, as a form of other-oriented, accommodative impression management, was negatively related to first-person singular use and positively related to second-person pronouns.

Summary and Future Directions

The aim of this chapter has been to provide a blueprint of research on language and personality up to this point that depicts both its structural soundness and need for additions and improvements. In closing, we will provide an overview of the existing studies in the form of a summary table (see Table 2) and outline a few recommendations for future progress.

Table 2.
Summary of Linguistic Indicators of Personality

Each row lists the trait, its linguistic correlates, and, in brackets, any qualifier (e.g., moderator).

Big Five
Extraversion: second-person pronouns (+), first-person plural pronouns (+), positive emotion (+), social (+), leisure (+), sex (+), inhibition (-), tentativeness (-) [Qualifier: Sex]
Agreeableness: positivity (+), first-person singular pronouns (+), social (+), home (+), family (+), communication (+), death (-), money (-), swearing (-)
Openness: articles (+), prepositions (+), personal pronouns (-), family (-), home (-), rest (-) [Qualifier: Contradictory facets]
Conscientiousness: swearing (-), negative emotion (-) [Qualifier: Sex]
Neuroticism: first-person singular pronouns (+), negative emotion (+) [Qualifier: Sex, both only in private]

Psychopathology
Type A: first-person singular pronouns (+)
Narcissism: first-person singular pronouns (+), negative emotion (+), anger (+), swear (+), sex (+), first-person plural pronouns (-) [Qualifier: Sex, Communication Context]
Machiavellianism: first-person singular pronouns (+), first-person plural pronouns (+), second-person pronouns (-), third-person pronouns (-) [Qualifier: Sex]
Depression: first-person singular pronouns (+), negative emotion (+) [Qualifier: Public vs. private, correspondent closeness]

Trait Emotionality
Anger: second-person singular pronouns (+)
Negative emotions: negative emotion (+) [Qualifier: Writing or speaking topic]
Positive emotions: positive emotion (+)

Note. See text for references. For the Big Five, only the most common and universal correlates are listed.

Finding consistent threads among studies is sometimes made difficult by differing methodologies. Even among studies that used the same text analysis tool, some focused only on linguistic content rather than all categories, and others used different versions of a program that include several non-overlapping categories. The literature on language and personality would no doubt benefit from more comprehensive reporting of effects, in papers or in online supplemental materials.
The existing studies suggest that both content and style categories are critical. Although content words are more susceptible to self-regulation and thus tend to be lower fidelity indicators of internal states, the degree to which a person’s language use fails to reflect their self- or informant-reported personality is often a telling indicator of self-regulatory personality processes and person × situation interactions (Baddeley, 2011; Baddeley & Pennebaker, 2012; Mehl et al., 2006; Mehl & Holleran, 2008). Style words are often more challenging to interpret, but are valuable as the mostly automatic, and therefore more psychologically representative, indicators of attentional focus and thinking styles (see Tausczik & Pennebaker, 2010). Content and style are two sides of a data-rich coin, and personality psychology has much to gain from increasingly considering both aspects of language use.

In order to correctly interpret the nature and true magnitude of effects, studies of language and personality may also need to increasingly measure and consider a range of potential moderators or modifiers, including facet-level trait measures (Yarkoni, 2010), individuals’ sex (Mehl et al., 2006), whether language use is public or private (Mehl & Holleran, 2008), the closeness of conversation partners (Baddeley, 2011), and linguistic co-occurrences (Gill & Oberlander, 2002). Specifically for function words, which are by definition extraordinarily versatile, research has shown that moderators matter. For example, whether I or you is said by a man or a woman and in the context of an angry or cheerful communication can dramatically influence which psychological processes those words reflect (Fast & Funder, 2010; Mehl et al., 2006; Tausczik & Pennebaker, 2010).

Context effects, such as the types of communication that a situation affords or demands, are important considerations in any area of behavioral research. Studies of language use are no exception.
Just as a highly extraverted person would not be expected to behave dramatically differently than an introverted person in a situation lacking the potential for social interaction, personality traits that are predominantly defined by differences in social interaction are likely to leave fewer observable traces in solitary writing such as stream-of-consciousness essays. Furthermore, writing or speaking tasks that resemble criterion measures of personality (e.g., self-report personality questionnaires and essays describing one’s personality) are bound to be more highly correlated than naturalistic measures of language (e.g., Hirsh & Peterson, 2008). However, perhaps in part due to the influence of corpus linguistics, where language from a wide range of communication media is frequently compiled into a single dataset comprising billions of words, studies of linguistic indicators of personality have only recently come to seriously consider communication context. Given that so many personality dimensions hinge on how people react to and interact with others, it is particularly important – in studies of natural language use and beyond – for personality research to increasingly study the links between naturally occurring dialog, self-reports, and observer reports. As naturalistic language research expands with ongoing advances in audio recording technology and computer science methods, it should become easier to understand how linguistic signals are attenuated and warped by contextual influences such as experimental task, communication medium, and motivation.

The accomplishments of computerized text analysis in the last 15 years have been extraordinary. However, the software designers, programmers, and data analysts behind this revolution readily admit that there is room for improvement. Cohen and colleagues’ (2009) and S.
Cohen’s (2011) research on the measurement of trait affect points to a possible need to improve word-count measurements of common positive emotion words, which are often used in ways that do not reflect positivity (e.g., I was pretty bored, someone like you), by considering their linguistic contexts. New discoveries made in function word categories that are new to the most recent version of LIWC (Pennebaker, Booth, & Francis, 2007) suggest that finer grained analyses based on words’ grammatical roles have the potential to clarify mixed results in past research and shed light on the cognitive mechanisms underlying personality dimensions. Measures of within-text context – and the usability of tools that consider linguistic context – are bound to improve studies of language and personality as well. A word’s location in a text or sentence (Vine & Pennebaker, 2009) and its neighboring words (Gill & Oberlander, 2004) clearly matter but are rarely considered in psychological text analyses. Programs such as Latent Semantic Analysis (Landauer & Dumais, 1997) and WordSmith (Scott, 2008) handle such variables and, as they become more widely known and user-friendly, stand to greatly enrich future research.

Conclusion

In his famous monograph on personality, Allport (1937) wrote “language is a codification of common human experience, and by analyzing it much may be found that reflects the nature of human personality” (p. 373). Interestingly, the field of personality and language use only started gaining serious momentum more than half a century later. As the research reviewed in this chapter reveals, though, the field is now rich, vibrant, and has already produced many important discoveries.
We expect that the immense progress in (stationary and mobile) computing technology and parallel advances in computational linguistics will create a strong push for the field over the coming years and lead to critical improvements in the complexity with which naturalistic language can be analyzed. It is our sense that the field will thrive to the extent that it uses these technologically driven, “bottom-up” analytic advances and, at the same time, balances them with innovative “top-down” theoretical developments and clarifications. To achieve this, it will undoubtedly become necessary for researchers from different fields to “crosstalk”. Social psychologists, personality psychologists, cognitive psychologists, linguists, communication scholars, computer scientists, and other researchers will need to engage in conversations and collaborations and thereby transcend (and hopefully reduce) traditional disciplinary boundaries to more fully understand how our words reflect our selves.

Footnotes

1 At some point, you may have received the following test over e-mail: “How many Fs does the following passage contain? ‘Finished files are the result of years of scientific study combined with the experience of years.’” Finding only three Fs tends to result from readers skipping ofs.

2 The term sex is used by default to refer to all differences in personality-language links between men and women. However, gender may be more appropriate in cases where linguistic differences seem to be more strongly influenced by gender norms than biology (see Eagly, 1995).

References

Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the ACM, 52, 119.
Augustine, A. A., Mehl, M. R., & Larsen, R. J. (2011). A positivity bias in written and spoken English, and its moderation by personality and gender.
Social Psychological and Personality Science, 2, 508-515.
Baddeley, J. L., Beevers, C. G., & Pennebaker, J. W. (2012). Everyday social behavior during a major depressive episode. Manuscript under revision, Social Psychological and Personality Science. University of Texas at Austin, Austin, TX.
Baddeley, J. L. (2011). Email communications among people with and without major depressive disorder. Unpublished doctoral dissertation. University of Texas at Austin, Austin, TX.
Baddeley, J. L., & Singer, J. A. (2008). Telling losses: Personality correlates and functions of bereavement narratives. Journal of Research in Personality, 42, 421-438. doi:10.1016/j.jrp.2007.07.006
Bell, A., Brenier, J., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60, 92-111. doi:10.1016/j.jml.2008.06.003
Bistricky, S. L., Ingram, R. E., & Atchley, R. A. (2011). Facial affect processing and depression susceptibility: Cognitive biases and cognitive neuroscience. Psychological Bulletin, 137, 998-1028. doi:10.1037/a0025348
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge: Cambridge University Press.
Burke, P. A., & Dollinger, S. J. (2005). A picture’s worth a thousand words: Language use in autophotographic essay. Personality and Social Psychology Bulletin, 31, 536-548.
Campbell, W. K., & Miller, J. D. (2011). The Handbook of Narcissism and Narcissistic Personality Disorder: Theoretical Approaches, Empirical Findings, and Treatments. Hoboken, NJ: John Wiley & Sons.
Clark, H. H., & Brennan, S. A. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127-149). Washington, DC: APA Books.
Cohen, A. S., Minor, K. S., Baillie, L. E., & Dahir, A. M. (2008).
Clarifying the linguistic signature: Measuring personality from natural speech. Journal of Personality Assessment, 90, 559-563.
Cohen, A. S., Minor, K. S., Najolia, G. M., & Lee Hong, S. (2009). A laboratory-based procedure for measuring emotional expression from natural speech. Behavior Research Methods, 41, 204-212. doi:10.3758/BRM.41.1.204
Cohen, S. J. (2011). Measurement of negativity bias in personal narratives using corpus-based emotion dictionaries. Journal of Psycholinguistic Research, 40, 119-135.
Costa, P. T., Jr., & McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO Personality Inventory. Psychological Assessment, 4, 5-13.
Danner, D. D., Snowdon, D. A., & Friesen, W. V. (2002). Positive emotions in early life and longevity: Findings from the Nun Study. Journal of Personality and Social Psychology, 80, 804-813.
Davis, D., & Brock, T. C. (1975). Use of first-person pronouns as a function of increased objective self-awareness and performance feedback. Journal of Experimental Social Psychology, 11, 381-388.
Dewaele, J-M., & Furnham, A. (2000). Personality and speech production: A pilot study of second language learners. Personality and Individual Differences, 28, 355-365.
DeWall, C. N., Buffardi, L. E., Bonser, I., & Campbell, W. K. (2011). Narcissism and implicit attention seeking: Evidence from linguistic analyses of social networking and online presentation. Personality and Individual Differences, 51, 57-62.
DeWall, C. N., Pond, R. S., Campbell, W. K., & Twenge, J. M. (2011). Tuning in to psychological change: Linguistic markers of self-focus, loneliness, anger, antisocial behavior, and misery increase over time in popular U.S. song lyrics. Psychology of Aesthetics, Creativity, and the Arts.
Dodds, P. S., & Danforth, C. M. (2009). Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. Journal of Happiness Studies, 11, 441-456. doi:10.1007/s10902-009-9150-9
Eagly, A.
(1995). The science and politics of comparing women and men. American Psychologist, 50, 145-158.
Eastwick, P. W., Eagly, A. H., Finkel, E. J., & Johnson, S. E. (2011). Implicit and explicit preferences for physical attractiveness in a romantic partner: A double dissociation in predictive validity. Journal of Personality and Social Psychology, 101, 993-1011.
Fast, L. A., & Funder, D. C. (2008). Personality as manifest in word use: Correlations with self-report, acquaintance report, and behavior. Journal of Personality and Social Psychology, 94, 334-346. doi:10.1037/0022-3514.94.2.334
Fast, L. A., & Funder, D. C. (2010). Gender differences in the correlates of self-referent word use: Authority, entitlement, and depressive symptoms. Journal of Personality, 78, 313-338. doi:10.1111/j.1467-6494.2009.00617.x
Gill, A. J., & Oberlander, J. (2002). Taking care of the linguistic features of extraversion. Proceedings of the 24th Annual Conference of the Cognitive Science Society, 363-368.
Goldberg, L. R. (1981). Language and individual differences: The search for universals in personality lexicons. In L. Wheeler (Ed.), Review of personality and social psychology (pp. 141-165). Beverly Hills: Sage.
Goldenfeld, N., Baron-Cohen, S., & Wheelwright, S. (2007). Empathizing and systemizing in males, females, and autism: A test of the neural competition theory. Autism, 1-16.
Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep and daylength across diverse cultures. Science, 333, 1878-1881.
Gosling, S. D. (2008). Snoop: What your stuff says about you. New York: Basic Books.
Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue: Judgments of personality based on offices and bedrooms. Journal of Personality and Social Psychology, 82, 379-398.
Gottschalk, L. A., & Gleser, G. C. (1969). Measurement of Psychological States Through the Content Analysis of Verbal Behaviour.
Berkeley, CA: University of California Press.
Groom, C. J., & Pennebaker, J. W. (2005). The language of love: Sex, sexual orientation, and language use in online personal advertisements. Sex Roles, 52, 447-461. doi:10.1007/s11199-005-3711-0
Hancock, J., Curry, L., Goorha, S., & Woodworth, M. (2008). On lying and being lied to: A linguistic analysis of deception. Discourse Processes, 45, 1-23.
Hart, R. P. (1984). Verbal style and the presidency: A computer-based analysis. New York: Academic Press.
Hirsh, J. B., & Peterson, J. B. (2009). Personality and language use in self-narratives. Journal of Research in Personality, 43(3), 524-527. doi:10.1016/j.jrp.2009.01.006
Holtgraves, T. (2011). Text messaging, personality, and the social context. Journal of Research in Personality, 45(1), 92-99. doi:10.1016/j.jrp.2010.11.015
Holtgraves, T. (2010). Social psychology and language: Words, utterances and conversations. In S. Fiske, D. Gilbert, & G. Lindzey (Eds.), Handbook of social psychology, 5th edition.
Holtzman, N. S., Vazire, S., & Mehl, M. R. (2010). Sounds like a narcissist: Behavioral manifestations of narcissism in everyday life. Journal of Research in Personality, 44, 478-484. doi:10.1016/j.jrp.2010.06.001
Ickes, W., Reidhead, S., & Patterson, M. (1986). Machiavellianism and self-monitoring: As different as “me” and “you”. Social Cognition, 4, 58-74.
Jay, T. (2009). The utility and ubiquity of taboo words. Perspectives on Psychological Science, 4, 153-161.
Kacewicz, E., Pennebaker, J. W., Davis, M., Jeon, M., & Graesser, A. C. (in press, pending minor revision). The language of status hierarchies. Social Psychological and Personality Science.
Koppel, M., Argamon, S., & Shimoni, A. (2003). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17, 401-412.
Kosinski, M., & Stillwell, D. (2012). myPersonality research wiki: myPersonality project. In http://www.mypersonality.org/wiki/.
Kramer, A. D. I.
(2010). An unobtrusive behavioral model of “gross national happiness”. Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI, 287-290.
Küfner, A. C. P., Back, M. D., Nestler, S., & Egloff, B. (2010). Tell me a story and I will tell you who you are! Lens model analyses of personality and creative writing. Journal of Research in Personality, 44, 427-435. doi:10.1016/j.jrp.2010.05.003
Lakoff, R. T. (1975). Language and woman's place. New York: Harper & Row.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
Lee, C. H., Kim, K., Seo, Y. S., & Chung, C. K. (2007). The relations between personality and language use. Journal of General Psychology, 134, 405-413.
Mairesse, F., & Walker, M. (2011). Controlling user perceptions of linguistic style: Trainable generation of personality traits. Computational Linguistics, 37, 445-488.
Mairesse, F., & Walker, M. A. (2006). Words mark the nerds: Computational models of personality recognition through language. Proceedings of the 28th Annual Conference of the Cognitive Science Society, 543-548.
Mairesse, F., Walker, M. A., Mehl, M. R., & Moore, R. K. (2007). Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30, 457-500.
McAdams, D. P., & Pals, J. L. (2006). A new Big Five: Fundamental principles for an integrative science of personality. American Psychologist, 61, 204-217. doi:10.1037/0003-066X.61.3.204
Mehl, M. R. (2006a). Quantitative text analysis. In M. Eid & E. Diener (Eds.), Handbook of multimethod measurement in psychology (pp. 141-156). Washington, DC: American Psychological Association.
Mehl, M. R. (2006b). The lay assessment of sub-clinical depression in daily life.
Psychological Assessment, 18, 340-345.
Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural habitat: Manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology, 90, 862-877.
Mehl, M. R., & Holleran, S. E. (2008). How taking a word for a word can be problematic: Context-dependent linguistic markers of extraversion and neuroticism. Paper presented at the 11th Conference of the International Association for Language and Social Psychology, Tucson, Arizona.
Mehl, M. R., & Pennebaker, J. W. (2003). The sounds of social life: A psychometric analysis of students’ daily social environments and natural conversations. Journal of Personality and Social Psychology, 84, 857-870.
Mehl, M., Pennebaker, J. W., Crow, D. M., Dabbs, J., & Price, J. (2001). The Electronically Activated Recorder (EAR): A device for sampling naturalistic daily activities and conversations. Behavior Research Methods, Instruments, & Computers, 33, 517-523.
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., The Google Books Team, Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A., & Aiden, E. L. (2010). Quantitative analysis of culture using millions of digitized books. Science. doi:10.1126/science.1199644
Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes, 45(3), 211-236. doi:10.1080/01638530802073712
Oberlander, J., & Gill, A. J. (2006). Language with character: A corpus-based study of individual differences in e-mail communication. Discourse Processes, 42, 239-270.
O’Carroll Bantum, E., & Owen, J. E. (2009). Evaluating the validity of computerized content analysis programs for identification of emotional expression in cancer narratives. Psychological Assessment, 21, 79-88.
Paulhus, D. L., & Williams, K. M. (2002).
The Dark Triad of personality: Narcissism, Machiavellianism, and psychopathy. Journal of Research in Personality, 36, 556-563. doi:10.1016/S0092-6566(02)00505-6
Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. New York: Bloomsbury Press.
Pennebaker, J. W. (1997). Opening up: The healing power of expressing emotion. New York: Guilford Press.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2007). Linguistic Inquiry and Word Count (LIWC): LIWC 2007 [Computer program]. Austin, TX: LIWC.net.
Pennebaker, J. W., & Ireland, M. E. (2011). Using literature to understand authors: The case for computerized text analysis. Scientific Study of Literature, 1, 34-48.
Pennebaker, J. W., & Lay, T. C. (2002). Language use and personality during crises: Analyses of Mayor Rudolph Giuliani’s press conferences. Journal of Research in Personality, 36, 271-282.
Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. G. (2003). Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54, 547-577.
Poole, M. E. (1979). Social class, sex, and linguistic coding. Language and Speech, 22, 49-67.
Pyszczynski, T., & Greenberg, J. (1987). Self-regulatory perseveration and the depressive self-focusing style: A self-awareness theory of reactive depression. Psychological Bulletin, 102, 122-138.
Qiu, L., Lin, H., Ramsay, J., & Yang, F. (2012). You are what you tweet: Personality expression and perception on Twitter. Journal of Research in Personality, 46, 710-718.
Ramírez-Esparza, N., Chung, C., Kacewicz, E., & Pennebaker, J. W. (2008). The psychology of word use in depression forums in English and in Spanish: Testing two text analytic approaches. Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2008).
Raskin, R., & Shaw, R. (1988). Narcissism and the use of personal pronouns. Journal of Personality, 56, 393-404.
Robbins, M. L., Focella, E. S., Kasle, S., Weihs, K.
L., Lopez, A. M., & Mehl, M. R. (2011). Naturalistically observed swearing, emotional support and depressive symptoms in women coping with illness. Health Psychology, 30, 789-792.
Roccas, S., Sagiv, L., Schwartz, S. H., & Knafo, A. (2002). The Big Five personality factors and personal values. Personality and Social Psychology Bulletin, 28, 789-801. doi:10.1177/0146167202289008
Rodriguez, A. J., Holleran, S. E., & Mehl, M. R. (2010). Reading between the lines: The lay assessment of subclinical depression from written self-descriptions. Journal of Personality, 78, 575-598. doi:10.1111/j.1467-6494.2010.00627.x
Rohrbaugh, M. J., Mehl, M. R., Shoham, V., Reilly, E. S., & Ewy, G. A. (2008). Prognostic significance of spouse we talk in couples coping with heart failure. Journal of Consulting and Clinical Psychology, 76, 781-789.
Rude, S. S., Gortner, E.-M., & Pennebaker, J. W. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18, 1121-1133.
Schaefer, D. R., Kornienko, O., & Fox, A. M. (2011). Misery does not love company: Network selection mechanisms and depression homophily. American Sociological Review, 76, 764-785.
Scherwitz, L., & Canick, J. (1988). Self-reference and coronary heart disease risk. In K. Houston & C. R. Snyder (Eds.), Type A behavior pattern: Research, theory, and intervention. New York: John Wiley & Sons.
Scherwitz, L., Graham, L. E., & Ornish, D. (1985). Self-involvement and the risk factors for coronary heart disease. Advances, 2, 6-18.
Schmauder, A. R., Morris, R. K., & Poynor, D. V. (2000). Lexical processing and text integration of function and content words: Evidence from priming and eye fixations. Memory & Cognition, 7, 1098-1108.
Schnurr, P. P., Rosenberg, S. D., Oxman, T. E., & Tucker, G. J. (1986). A methodological note on content analysis: Estimates of reliability. Journal of Personality Assessment, 50, 601-609.
Schwartz, H. A.,
Eichstaedt, J., Dziurzynski, L., Kern, M., Blanco, E., Kosinski, M., Stillwell, D., Seligman, M., & Ungar, L. H. (2013). Toward personality insights from language exploration in social media. AAAI-2013 Spring Symposium: Analyzing Microtext. Stanford, California.
Scott, M. (2008). WordSmith Tools version 5. Liverpool: Lexical Analysis Software.
Simmons, R. A., Gordon, P. C., & Chambless, D. L. (2005). Pronouns in marital interaction. Psychological Science, 16, 932-936.
Simmons, R. A., Chambless, D. L., & Gordon, P. C. (2008). How do hostile and emotionally overinvolved relatives view relationships? What relatives’ pronoun use tells us. Family Process, 47, 405-419.
Slatcher, R. B., Chung, C. K., Pennebaker, J. W., & Stone, L. D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41, 63-75.
Smith, C. P. (Ed.). (1992). Motivation and personality: Handbook of thematic content analysis. Cambridge, MA: Cambridge University Press.
Stirman, S. W., & Pennebaker, J. W. (2001). Word use in the poetry of suicidal and non-suicidal poets. Psychosomatic Medicine, 63, 517-522.
Stone, P. J., Dunphy, D. C., Smith, M. S., & Ogilvie, D. M. (1966). The general inquirer: A computer approach to content analysis. Cambridge: MIT Press.
Tanenhaus, M. K., & Trueswell, J. C. (1995). Sentence comprehension. In Eimas & Miller (Eds.), Handbook in Perception and Cognition, Volume 11: Speech Language and Communication (pp. 217-262). New York: Academic Press.
Tausczik, Y. R., & Pennebaker, J. W. (2009). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29, 24-54. doi:10.1177/0261927X09351676
Teasdale, J. D. (1988). Cognitive vulnerability to persistent depression. Cognition and Emotion, 2, 247-274.
Teasdale, J. D., & Green, H. A. C. (2004). Ruminative self-focus and autobiographical memory.
Personality and Individual Differences, 36, 1933-1943.
Townsend, D. J., & Saltz, E. (1972). Phrases vs. meaning in the immediate recall of sentences. Psychonomic Science, 29, 381-384.
Trzesniewski, K. H., & Donnellan, M. B. (2010). Rethinking “Generation Me”: A study of cohort effects from 1976–2006. Perspectives on Psychological Science, 5, 58-75.
Vazire, S., & Gosling, S. D. (2004). e-Perceptions: Personality impressions based on personal websites. Journal of Personality and Social Psychology, 87, 123-132.
Vine, V., & Pennebaker, J. W. (2009). [The arc of narrative project]. Unpublished raw data. University of Texas at Austin, Austin, TX.
Watson, D., & Pennebaker, J. W. (1989). Health complaints, stress, and disease: Exploring the central role of negative affectivity. Psychological Review, 96, 234-254.
Weintraub, W. (1981). Verbal behavior: Adaptation and psychopathology. New York: Springer.
Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44, 363-373. doi:10.1016/j.jrp.2010.04.001