Terri & Ben present Diction & WordStat

Diction 5.0
By Ben Gifford
and Terri Johnson
About the creators
Diction was created by Roderick P. Hart, Dean of the
College of Communication at The University
of Texas at Austin, and Craig Carroll, associate
professor and department chair of Communication
and Journalism at Lipscomb University.
Hart is very focused on political communication.
He's "passionate about many things, but
especially about his family and basketball."
Features
• Examines text for 5 main semantic
features:
o Activity
o Optimism
o Certainty
o Realism
o Commonality
Features
It also has built-in dictionaries for
numerical terms, ambivalence, self-referencing, tenacity,
leveling terms, collectives, praise, satisfaction, inspiration,
blame, hardship, aggression, accomplishment,
communication, cognition, passivity, spatial terms,
familiarity, temporal terms, present concerns, human
interest, concreteness, past concern, centrality, rapport,
cooperation, diversity, exclusion, liberation, denial, and
motion
Features
• Reports an average frequency and
whether the variable falls within a
standard range.
• Diction can analyze
o the first 500 words of a given passage
o up to 500,000 words in 500-word units averaged together
o any passage up to 5,000 words in length (500-word units)
o units smaller than 500 words
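The "500-word units averaged together" option can be pictured with a short Python sketch. This is only an illustration of the idea, not Diction's own code: the function names and the `score_unit` callback are hypothetical, and words are split on whitespace for simplicity.

```python
def split_into_units(text, unit_size=500):
    """Split text into consecutive units of at most `unit_size` words."""
    words = text.split()  # simple whitespace tokenization (an assumption)
    return [words[i:i + unit_size] for i in range(0, len(words), unit_size)]


def averaged_score(text, score_unit, unit_size=500):
    """Score each unit with `score_unit` and average the per-unit scores.

    `score_unit` is a hypothetical callback that takes a list of words
    and returns a number.
    """
    units = split_into_units(text, unit_size)
    if not units:
        return 0.0
    return sum(score_unit(unit) for unit in units) / len(units)
```

Note that the last unit may be shorter than 500 words; this sketch averages it with the same weight as the full units, which may differ from what Diction does.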
Comparing Bill Clinton and Barack
Obama
• Make sure you have some text (.txt file)
Create a new project file
File > New or Ctrl+N
Add the text files
Edit > Add File(s) or Insert
Then navigate to your .txt files and open them
Check the properties
Go to File->Properties.
This is especially important if you want to
use SPSS
In the processing tab,
check under Large File
Options. “Averaged” is
probably your best bet.
Properties continued
Give the output file a
unique name. Under
“Numeric Filename”
highlight the text
immediately before
“.num” and replace it
with a chosen name.
*Note: this is different from saving
the entire project. When you save a
project, this file is created and
saved separately and can be used
in SPSS
Choose Norms Profile
• Diction notes when text falls outside of a normal range based
on previous content analyses
o The default for this is a "single normative profile"
o Can tailor to more specific needs
▪ Public speeches
▪ Poetry
▪ Newspaper editorials
▪ Music lyrics
▪ etc.
• It's simple to change the profile
Choose Norms Profile Cont.
• Go to View ->Normative Values
• To choose a more specific set of norms, make a general
selection under Class, and then a more specific selection
under Type.
Choose Norms Profile Cont.
• Some normative values are "better" than others.
• Searching for "Normative Values" under help-> help topics
will bring up a list of all the different profiles.
e.g. the creators sampled 2,357 campaign speeches,
but only 78 poetry and verse samples
Process your files
Processing-> All Files (Ctrl+Shift+G)
or Selected Files (Ctrl+G)
You may have to add new words
to the insistence score
More on that soon,
but just go ahead and hit “yes”
(there’s really no reason not to)
Viewing output
Output for one file
Abridged output for all files
Clinton Results
It’s possible to look at some raw results. This presentation will
touch on some of the variables. A full list is available in the
manual.
Clinton Results
Diction brings up a count of all words that appear
three or more times (in a 500-word passage), called
the “Insistence Score”
Looks for nouns, noun-derived
objects, or words that can be
used as both a verb and a
noun/noun-derived object
Interesting, but probably not
statistically significant or
practical.
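As a rough Python sketch of the "three or more times" count described above: the function name is hypothetical, and unlike Diction this version counts every word, since it has no way of telling nouns from other parts of speech.

```python
import re
from collections import Counter


def repeated_words(passage, min_count=3):
    """Return words that appear at least `min_count` times in the passage.

    Assumption: all words are counted. Diction restricts the count to
    nouns and noun-derived words, which this sketch does not attempt.
    """
    words = re.findall(r"[a-z']+", passage.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}
```

In practice you would pass in one 500-word unit at a time, since the slide describes the count per 500-word passage.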
Clinton Results
Calculated Variables
Insistence - repetition of key terms
Embellishment - Ratio of adj. to verbs
Variety - Different words/total words
Complexity - Avg. # of chars. per word
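Two of these calculated variables are simple enough to sketch in Python. This follows the one-line definitions on the slide only; the tokenizer (letters and apostrophes, lowercased) is an assumption, and Diction's own counting rules may differ.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def variety(text):
    """Different words divided by total words."""
    words = tokenize(text)
    return len(set(words)) / len(words) if words else 0.0


def complexity(text):
    """Average number of characters per word."""
    words = tokenize(text)
    return sum(len(word) for word in words) / len(words) if words else 0.0
```

Insistence and Embellishment are left out here because they need part-of-speech information that a plain word list does not provide.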
Master Variables
These scores use built-in dictionaries
(See next slide)
Clinton Results
Activity: Language featuring movement,
change, the implementation of ideas and the
avoidance of inertia.
ex. formula: [Aggression + Accomplishment +
Communication + Motion] - [Cognitive Terms +
Passivity + Embellishment]
Optimism: Language endorsing some person,
group, concept or event or highlighting their
positive entailments.
Certainty: Language indicating resoluteness,
inflexibility, and completeness and a tendency
to speak ex cathedra (authority from
office/position)
Realism: Language describing tangible,
immediate, recognizable matters that affect
people’s everyday lives
Commonality: Language highlighting the
agreed -upon values of a group and rejecting
idiosyncratic modes of engagement.
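The example Activity formula above can be written out as a small Python function. This is a sketch of the slide's formula as a plain sum and difference; the dictionary keys are hypothetical names, and any standardization Diction applies before combining the scores is not reproduced here.

```python
def activity(scores):
    """Combine sub-scores following the slide's example Activity formula.

    `scores` maps variable names (hypothetical keys) to numeric scores.
    """
    added = ("aggression", "accomplishment", "communication", "motion")
    subtracted = ("cognitive_terms", "passivity", "embellishment")
    return sum(scores[k] for k in added) - sum(scores[k] for k in subtracted)
```

The other four master variables are built the same way from their own dictionaries; the manual lists the exact terms for each.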
Clinton Results
Diction flags any of these variables that it
deems "out of range"
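The "out of range" check can be pictured as a comparison against a low and high bound for each variable. In this Python sketch the bounds are passed in by the caller; in Diction they come from the normative profile you selected, and the function and argument names here are hypothetical.

```python
def flag_out_of_range(scores, norms):
    """Return the variables whose score falls outside its normal range.

    `scores` maps variable name -> score.
    `norms` maps variable name -> (low, high) bounds; these are supplied
    by the caller and stand in for the chosen normative profile.
    """
    flags = {}
    for name, value in scores.items():
        low, high = norms[name]
        if value < low:
            flags[name] = "below"
        elif value > high:
            flags[name] = "above"
    return flags
```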
When you’re ready to use
SPSS…
Save first (File -> Save, Ctrl+S). This creates the .num file
Find where your .num file went. Copy it
(Ctrl+C)
Move some stuff around
Create folders in C:\ called “RAWDATA” and “spssdata” if they aren’t there
already. Go to C:\RAWDATA. Paste your .num file (Ctrl+V). Right-click it and
click “rename.” Rename it “mystudy.dat”
Click “Yes”
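If you would rather script the copy-and-rename step than do it by hand, a Python sketch might look like the following. The function name and the `root` parameter are hypothetical; the folder names and "mystudy.dat" are the ones from the slide.

```python
import shutil
from pathlib import Path


def stage_for_spss(num_file, root="C:/"):
    """Copy a Diction .num file to <root>/RAWDATA/mystudy.dat.

    Creates the RAWDATA and spssdata folders if they are missing.
    `root` defaults to the C: drive used on the slide (an assumption
    about where your folders should live).
    """
    root = Path(root)
    rawdata = root / "RAWDATA"
    spssdata = root / "spssdata"
    rawdata.mkdir(parents=True, exist_ok=True)
    spssdata.mkdir(parents=True, exist_ok=True)
    target = rawdata / "mystudy.dat"
    shutil.copyfile(num_file, target)  # leaves the original .num file in place
    return target
```

Because this copies rather than renames in place, the original .num file stays next to your project.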
Open SPSS
Get to the default blank view.
Go to File->Open->Syntax…
Navigate to C:\Program Files\Diction\Stats\
Open ‘SPSS-DIC.SPS’
This file is a pain… no really. Open it, you’ll see.
You need to make this…
Look something like this
Consult the Diction manual
Go to page 54 (of the Diction 4.0 manual) and
look at Figure 28.
You need to make SPSS-DIC.SPS look
somewhat like that file. The most important part
is that each word in all CAPS is on its own line.
Figure out where there should be a line break and hit
“Enter” at each one.
After that long and tedious
process
Run the syntax. Cross fingers. Pray.
This must match up with the .dat file you created
If it worked
You should have an SPSS sheet filled
with data
From here, the sky is the limit
For example
A simple means comparison shows Clinton
was much more ambivalent in the speeches
sampled
Though the results are not
significant
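For readers without SPSS, a means comparison can be sketched in plain Python. The slides do not say which test was run, so the Welch t statistic below is an assumption, and the function name is hypothetical. Each group is a list of per-speech scores (for example, ambivalence scores) and needs at least two values.

```python
from math import sqrt
from statistics import mean, variance


def compare_means(group_a, group_b):
    """Return (mean_a, mean_b, t) for two lists of per-speech scores.

    t is Welch's t statistic (unequal variances assumed). Looking up a
    p-value is left to a statistics package.
    """
    mean_a, mean_b = mean(group_a), mean(group_b)
    std_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return mean_a, mean_b, (mean_a - mean_b) / std_error
```

A large difference in means with a small t value is exactly the "more ambivalent, but not significant" situation described above.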
WordStat
By Ben Gifford and Terri
Johnson
Craig Stovall
JetBlue Airlines
To obtain a free trial of WordStat
About WordStat
• Content analysis module of SimStat
• Analyzes textual information
o open-ended responses
o interview transcripts
o journal articles
o websites
• Can be used for automatic categorization of text
• Can be used for manual coding
• Facilitates the development of new dictionaries
Features
• Integrated text-mining analysis
• Visualization tools
• Hierarchical categorization dictionary
• User-generated dictionary
• Keyword-in-context (KWIC) retrieval tools
• Statistical analysis capabilities
o factor analysis
o word frequencies
To Create Dictionary
Go to My Computer
...C drive
...Program Files
...Provalis Research
...Dictionary
...Copy Existing CAT file
...Rename to ______.cat
...Right click new cat file
...Open with Notepad ("Choose program" if Notepad is not listed)
<--Category
<--Words
Create dictionary using correct
formatting: Category flush left, word
tabbed in with " (1)" after (space is
important). Everything single-spaced.
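The formatting rules above are easy to get wrong by hand, so here is a Python sketch that generates text in that layout. It follows only what the slide describes (category flush left, each word on its own tabbed line followed by " (1)"); the function name is hypothetical, and a real .cat file may have details not covered here.

```python
def format_dictionary(categories):
    """Build dictionary text in the layout described on the slide.

    `categories` maps a category name to a list of words.
    """
    lines = []
    for category, words in categories.items():
        lines.append(category)             # category flush left
        for word in words:
            lines.append(f"\t{word} (1)")  # tab, word, space, "(1)"
    return "\n".join(lines) + "\n"
```

You could write the returned text to your renamed .cat file and then open it in Notepad to check it against the existing dictionary you copied.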
Dictionary needs
fixing
Dictionary results
To Open WordStat
...Go to CATA
...Provalis Research
...Simstat
...Simstat for Windows
Go to File>Data>New
You'll get a screen that looks like this:
Create Variables
...1) Person (integer); press Tab to add additional variables
...2) Speech (memo)
(text files will always be memo variables)
A Content Analysis on Speeches by
Clinton and Obama
Step #1 - Enter data
• In this case, enter "1" for a speech by Obama, "2" for a Clinton speech.
• Copy/paste the text into the window below when the appropriate "memo" column is
highlighted.
• To add another line, hit "tab" while in the right-most column.
A Content Analysis on Speeches by
Clinton and Obama
Step #2 - Select the variables
• Execute the STATISTICS...CHOOSE X-Y command
• Move the PERSON variable to the INDEPENDENT
• Move the SPEECH variable to the DEPENDENT
• Press the OK button
A Content Analysis on Speeches by
Clinton and Obama
Step #3 - Run the content analysis module
• Execute STATISTICS...CONTENT ANALYSIS
Step #4 - Choose the proper dictionaries (for Inclusion)
Speeches by Clinton and Obama
Step #5 - View the results
• Click different tabs (word count and crosstabs)
• Click that button
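To get a feel for what the crosstab tab is showing, here is a Python sketch that counts dictionary-category hits per speaker. It is an illustration only, not WordStat's method: the function name is hypothetical, records are (person, speech) pairs using the 1/2 coding from Step #1, and matching is exact, lowercase, single-word.

```python
import re
from collections import Counter, defaultdict


def category_crosstab(records, dictionary):
    """Count category hits per person.

    `records` is a list of (person, speech_text) pairs.
    `dictionary` maps category name -> list of words.
    Returns {person: {category: count}}.
    """
    word_to_category = {
        word.lower(): category
        for category, words in dictionary.items()
        for word in words
    }
    table = defaultdict(Counter)
    for person, speech in records:
        for token in re.findall(r"[a-z']+", speech.lower()):
            category = word_to_category.get(token)
            if category is not None:
                table[person][category] += 1
    return {person: dict(counts) for person, counts in table.items()}
```

A person whose speeches contain no dictionary words does not appear in the result at all, which is one place this sketch is simpler than a real crosstab.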
Clinton and Obama Speeches
More Results
THE END...
(or should we say this is just the beginning)