Social Psychological Analysis of Public Political Comments on Facebook
Márton Miháltz
21st November 2014, Budapest
www.trendminer-project.eu
TrendMiner

Overview
• What kinds of social and political trends are there in Hungarian comments on political Facebook posts?
  – Facebook in Hungary: 4.27M registered users = 59.2% of internet users, 43% of total population
• Download all public comments from the Facebook pages of Hungarian politicians and parties
• Analysis of comments:
  – Basic NLP (tokenization, PoS tagging, stemming), domain-adapted
  – Entities: political actors (people, organizations)
  – Sentiment
  – Social psychology dimensions: agency/communion, individualism/collectivism, optimism/pessimism, primordial/conceptual thinking
• In cooperation with the Narrative Psychology Research Group, Hungarian Academy of Sciences

Data Acquisition
• Get comments via the Facebook Graph API
  – 1.9M comments on 141K Facebook posts (2013.10.01 – 2014.09.02)
  – From 1344 Facebook pages
    • Organizations: parties, regional and associated branches
    • People: candidates and elected representatives (MPs), government, party officials
    • Official and fan pages
  – In 3 categories
    • Hungarian parliament 2010-2014
    • Hungarian parliament elections 2014 (6th April)
    • EU parliament elections 2014 (25th May)
• Sources: valasztas.hu, wikipedia.hu
• Everything in a MySQL database
  – For arbitrary queries (political groups, time etc.)
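To illustrate the kind of "arbitrary query" (political group, time window) that such a database makes possible, here is a minimal Python sketch. It is not the project's actual schema or code: an in-memory SQLite database stands in for MySQL, the table and column names (`fb_pages`, `fb_posts`, `fb_comments`, `page_id`, `post_id`, `party`) are assumptions loosely modeled on the data model slide, and the rows are invented toy data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the project's MySQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fb_pages (id INTEGER PRIMARY KEY, title TEXT, type TEXT, party TEXT);
CREATE TABLE fb_posts (id INTEGER PRIMARY KEY, page_id INTEGER, created_timestamp TEXT);
CREATE TABLE fb_comments (id INTEGER PRIMARY KEY, post_id INTEGER,
                          created_timestamp TEXT, message TEXT);
""")

# Invented toy rows, only to make the query below return something.
conn.executemany("INSERT INTO fb_pages VALUES (?, ?, ?, ?)", [
    (1, "Party A official page", "organization", "Party A"),
    (2, "Candidate B", "person", "Party B"),
])
conn.executemany("INSERT INTO fb_posts VALUES (?, ?, ?)", [
    (10, 1, "2014-03-01 09:00:00"),
    (11, 2, "2014-03-02 10:00:00"),
])
conn.executemany("INSERT INTO fb_comments VALUES (?, ?, ?, ?)", [
    (100, 10, "2014-03-01 09:05:00", "first toy comment"),
    (101, 10, "2014-04-07 12:00:00", "second toy comment"),
    (102, 11, "2014-03-03 08:00:00", "third toy comment"),
])


def comments_per_party(connection, start, end):
    """Count comments per affiliated party with start <= timestamp < end."""
    rows = connection.execute(
        """
        SELECT p.party, COUNT(*)
        FROM fb_comments AS c
        JOIN fb_posts AS s ON c.post_id = s.id
        JOIN fb_pages AS p ON s.page_id = p.id
        WHERE c.created_timestamp >= ? AND c.created_timestamp < ?
        GROUP BY p.party
        ORDER BY p.party
        """,
        (start, end),
    ).fetchall()
    return dict(rows)


# Comments written before the 6th April 2014 election, grouped by party.
counts = comments_per_party(conn, "2014-03-01", "2014-04-06")
print(counts)
```

Timestamps are stored as ISO-formatted strings here so that plain string comparison orders them correctly; a MySQL `DATETIME` column would serve the same purpose.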
Data Model
• Fb_pages
  – Id, URL, Page title
  – Type: person or organization
  – Affiliated party (3 campaigns)
• Fb_posts, Fb_comments
  – Id, Created_timestamp
  – Message text, Author_user_id
• Comments_annotations
  – Sentence_id, Start_token, End_token index
  – Annotated text, Lemmatized_annotated_text, Annotation_tag
• Fb_comments_scores
  – 16 scores and counts (sentiment, RID, agency, communion, optimism, …)

Hungarian Political Ontology
• Extending the TrendMiner (TM) multilingual political ontology
  – 8 new classes, 3+3 new object/data properties, 1579 new instances (1 Country, 18 Party, 661 Politician, 899 Nomination)
  – Nominated and elected MPs (2010 Hungarian Parliament, 2014 Hungarian Parliament, 2014 EU Parliament), nominating parties
  – Names, abbreviated names, nicknames, Facebook page URLs etc.
• Example: Benedek Jávor was a member of the Hungarian Parliament during 2010-2014 (nominated by LMP) and has been a member of the European Parliament since 2014 (nominated by EGYÜTT-PM).

Processing Pipeline
• Downloading (Facebook Graph API Python script)
• Tokenization (huntoken tool)
• PoS tagging (hunpos tool)
• Morphological analysis (hunmorph tool)
• Stem + analysis disambiguation (Python script)
• Content analysis (Java NooJ)
• Scoring and storage in the database
• Uploading in RDF to the TM Integration Server

Domain Adaptation
• Problem: existing NLP tools were developed on a different domain and (f)ail on social media language (Facebook comments)
• Using the corpus for a survey:
  – 1.25M Facebook comments (29M tokens)
  – 2.25M unknown tokens (694K types)
  – Frequency list: items with f > 15 manually revised
  – Identify common problems
  – Lists of frequent, relevant unknown words, new words etc.

Domain Adaptation: Tokenization
• Huntoken tool
• Frequent problems:
  – Missing spaces around punctuation: ... end of sentence.Beginning of another ...
  – Repeated punctuation: first part……. Second part
  – Contracted words (slang): asszem = azt hiszem (“I think”)
  – Consonant repetition (interjections, onomatopoeic words etc.), e.g. pfffffffff, uffffff, ejjjjjjjj (pff(f*), uff(f*), ej(j*))
  – Large numbers split by digit groups: 125 000
  – URLs split
  – Emoticons split: :D

Domain Adaptation: PoS/Stemming
• Hunpos tagger + hunmorph analyzer + stemming script
• Frequent problems:
  – Unknown words (no lemma/PoS)
    • Add to the hunmorph analyzer’s lexicon
    • Using analogous words (morphological paradigm)
    • Compounds, abbreviations, acronyms, slang words etc.
  – Frequently misspelled word forms
    • Replace with correct forms
  – Wrong capitalization, e.g. SENTENCES IN ALL CAPS
  – Missing accent characters: disambiguation model needed, e.g. kor (age), kór (disease), kör (circle)

NooJ, Java NooJ, NooJ-Cmd
• Java NooJ
  – Open source version of NooJ: define and run finite state machines for querying, annotation etc. (morphology, syntax)
  – NooJ-Cmd extension: all NooJ GUI features => command line options
  – Open source: https://github.com/tkb-/nooj-cmd
• NooJ grammars (FSMs) for annotation:
  – Actors (entities)
  – Emotional valence (sentiment polarity)
  – Regressive Imagery Dictionary
  – Agency-communion
  – Optimism-pessimism
  – Individualism-collectivism

Development of NooJ Grammars
• In collaboration with social psychology researchers
  – Social Psychology Department, Eötvös Loránd University, Budapest
  – Narrative Psychology Research Group, Hungarian Academy of Sciences
• Development corpus
  – 176K sample Facebook comments from 570 Facebook pages (4.9M tokens)
  – NLP annotation
  – Frequency lists (lemmas, lemmas+PoS, lemmas+morphological info etc.)
• Development:
  – Content words with f > 100 from the development corpus (3500 types)
  – 7 independent annotators
  – Items where >= 4 annotators agree: manual revision
  – Compile into a NooJ grammar with polarity shifters, items to be excluded etc.

1. Political Actors (NEs)
• Maxent NE tool (huntag): low performance on this domain
  – Trained on standard-language news texts
  – Miscategorization, false positive NEs, entity boundary recognition problems
• NooJ grammar/lexicon for TrendMiner
  – Person names: family_name (given_name_lemmatized)? | frequent_nicknames …
  – Organization names: Standard_form | abbreviated_forms… | nicknames…
  – Created automatically (names from DB) + manually (nicknames from frequency lists)

2. Emotional Valence
• Emotions with positive or negative polarity
• Polarity in context: recognize negation using simple rules
• Nouns, adjectives, verbs, adverbs, emoticons, multiword expressions
• 500 positive, 420 negative entries

3. Regressive Imagery Dictionary
• Martindale (1975, 1990): uncover psychological processes reflected in the text
• 2 basic categories of thinking:
  – Primordial (primary): associative, concrete, takes little account of reality (fantasy, dreams)
  – Conceptual (secondary): abstract, logical, reality-oriented, aimed at problem solving
• 7+29 more subcategories (social behavior, cognition, perceptions, sensations etc.)
• Hungarian version by Pólya and Szász
• 3000+ terms

4. Agency/Communion
• 2 fundamental dimensions of social values:
  – Communion: moral and emotional aspects of an individual’s relations to others (affection, expressiveness, cooperation, social benefit etc.)
  – Agency: efficiency of an individual’s goal-oriented behavior (motivation, competence, control)
• Positive or negative for both dimensions
  – Context dependent (e.g. negation)
• 640 expressions

5. Optimism/Pessimism
• Based on PoS and morphology annotations + time expressions
• 2 measures:
  1. |future_tense_verbs| / (|present_tense_verbs| + |past_tense_verbs|)
  2. |present_tense_verbs| / |past_tense_verbs|
• Both correlate with degree of optimism

6. Individualism/Collectivism
• Based on PoS and morphology annotations
• 1 measure: |personal pronouns| / (|verbs with personal inflection| + |nouns with possessive inflection|)
• Higher score: higher degree of individualism

Visualisation
(figures on slides 19-23 not included)

Dissemination and Exploitation
• Presentations
  – Hungarian NLP Meetup, Sept. 25, 2014, Budapest
  – conText, Nov. 20, 2014, Budapest
• Conference papers, presentations
  – 2 papers at the 11th Conference on Hungarian Computational Linguistics (January 15-16, 2015, Szeged)
• Source code
  – https://github.com/mmihaltz/trendminer-hunlp
  – https://github.com/mmihaltz/trendminer-hutools
  – https://github.com/tkb-/nooj-cmd
• Project website (http://corpus.nytud.hu/trendminer)
  – Download the political ontology
  – Download the 1.9M Facebook comments corpus (with annotations)
  – Project info, papers, presentation slides

Thank You!
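As a supplement to the Optimism/Pessimism and Individualism/Collectivism slides, the three ratio measures can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's scoring code: each token is represented as a `(pos, features)` pair, and the tag names (`VERB`, `NOUN`, `PRON`, `future`, `present`, `past`, `personal`, `possessive`) are hypothetical placeholders rather than actual hunpos/hunmorph output. Returning `None` for an empty denominator is also an assumption; the slides do not say how that case is handled.

```python
from collections import Counter


def count_features(tokens):
    """Count the categories used by the three measures.

    `tokens` is a list of (pos, features) pairs, where `features` is a set
    of strings. All tag names are hypothetical, not real tagger output.
    """
    counts = Counter()
    for pos, feats in tokens:
        if pos == "VERB":
            for tense in ("future", "present", "past"):
                if tense in feats:
                    counts[tense] += 1
            if "personal" in feats:
                counts["verb_personal"] += 1
        elif pos == "NOUN" and "possessive" in feats:
            counts["noun_possessive"] += 1
        elif pos == "PRON" and "personal" in feats:
            counts["personal_pronoun"] += 1
    return counts


def ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    return numerator / denominator if denominator else None


def optimism_measures(counts):
    """Measure 1: future / (present + past). Measure 2: present / past."""
    m1 = ratio(counts["future"], counts["present"] + counts["past"])
    m2 = ratio(counts["present"], counts["past"])
    return m1, m2


def individualism(counts):
    """personal pronouns / (personally inflected verbs + possessive nouns)."""
    return ratio(
        counts["personal_pronoun"],
        counts["verb_personal"] + counts["noun_possessive"],
    )


# Toy annotated comment: 1 personal pronoun, 4 verbs, 1 possessive noun.
tokens = [
    ("PRON", {"personal"}),
    ("VERB", {"future", "personal"}),
    ("VERB", {"present", "personal"}),
    ("VERB", {"past", "personal"}),
    ("VERB", {"present", "personal"}),
    ("NOUN", {"possessive"}),
    ("NOUN", set()),
]
counts = count_features(tokens)
m1, m2 = optimism_measures(counts)
ind = individualism(counts)
print(m1, m2, ind)
```

Separating counting from the ratios keeps the sketch close to the slide formulas: the three functions at the end map one-to-one onto the measures as written.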
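As a supplement to the Domain Adaptation: Tokenization slide, some of the listed problems (repeated punctuation, missing spaces after sentence-final punctuation, consonant repetition, slang contractions) could be handled by a normalization pass run before the tokenizer. The sketch below is a guess at what such a pass might look like, not the adaptation actually applied to huntoken. The function name `normalize`, the regular expressions, and the tiny word lists are assumptions; only the example strings and the patterns pff(f*), uff(f*), ej(j*) come from the slide.

```python
import re

# Assumption: tiny illustrative lists; real ones would come from frequency lists.
CONTRACTIONS = {"asszem": "azt hiszem"}
INTERJECTION_BASES = ["pff", "uff", "ej"]  # canonical forms pff(f*), uff(f*), ej(j*)

_LOWER = "a-záéíóöőúüű"
_UPPER = "A-ZÁÉÍÓÖŐÚÜŰ"


def normalize(text):
    """Apply a few pre-tokenization fixes for social media text."""
    # Collapse runs of dots/ellipsis characters into a single ellipsis.
    text = re.sub(r"[.…]{2,}", "…", text)
    # Insert a space when sentence-final punctuation is glued between a
    # lowercase and an uppercase letter ("sentence.Beginning").
    text = re.sub(rf"(?<=[{_LOWER}])([.!?…])(?=[{_UPPER}])", r"\1 ", text)
    # Reduce stretched interjections to their canonical form.
    for base in INTERJECTION_BASES:
        pattern = rf"\b{re.escape(base)}{re.escape(base[-1])}*\b"
        text = re.sub(pattern, base, text)
    # Expand known slang contractions.
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text


print(normalize("end of sentence.Beginning of another"))
print(normalize("first part……. Second part"))
print(normalize("asszem pfffffffff"))
```

Requiring a lowercase letter before and an uppercase letter after the punctuation mark is a deliberate restriction: it leaves strings such as lowercase URLs untouched, at the cost of missing sentences that start in lowercase.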