Chapter 4

CMST 4N03 – PRODUCING AND VIEWING THE NEWS
DR. ALEXANDRE SEVIGNY
LECTURE 4
I. Message Units and Sampling
1. Units: A unit is an identifiable message or message component. Units can be words,
characters, themes, time periods, interactions or any other result of breaking up a
communication into bits. It has three qualities:
i. It serves as the basis for identifying the population and drawing a sample.
ii. It is the basis on which variables are measured.
iii. It serves as the basis for reporting analyses.
2. Corresponding to these three roles, they are called units of sampling, units of data collection and units of analysis.
i. Lombard et al. (1996) – random sampling of time periods, dates, and TV
channels to obtain a representative sample of TV programming.
1. They analyse episodes or 10-second chunks of episodes.
ii. Weyls (2001) – collected news stories using text analysis software. His
ultimate goal was to track changes in coverage by year.
Study | Unit(s) of sampling | Unit(s) of data collection | Unit(s) of analysis
Weyls (2001) | News story | News story | Year
Lombard et al. (1996) | Time, date, channel | Episode, time interval, etc. | Episode, time interval, etc.
3. There are two perspectives on unitization:
i. Etic – scientifically generated knowledge; these units are determined before
the analysis.
ii. Emic – subjective knowledge or experience; these units are determined
during the analysis.
iii. Content analysis research cannot begin before the end of the emic discovery
process.
iv. The sampling unit should be large enough to represent the phenomenon
under investigation well.
1. Hill and Hughes (1997) used the thread of discussion (entire
conversation) found in USENET discussions about American politics.
They were interested in the dynamics of the interaction.
4. Unitizing a Continuous Stream of Information
i. Hard to do, because coders’ perceptions of time and of what is important in
continuous speech vary widely.
ii. Time-based units are one solution (5 mins, 15 mins, etc.).
iii. Training coders to extract discrete events from a continuous stream works better.
1. Greenberg (1980) – asked his 50 coders to identify unique instances of
TV behaviour – antisocial, prosocial, sex-role, etc.
2. Hirokawa identified four options for interaction analysis:
a. Thought units
b. Themes
c. Time intervals
d. Speech acts
5. Defining the population
i. Population: the set of units being studied, to which the findings will be generalized.
1. Often messages, sometimes people (psychometrics)
2. Once the population is defined, it must serve as the basis for sampling
3. Populations can be gigantic – all the books ever written; or tiny – two
weeks of newspaper coverage.
4. If your population is small, there is no need for a representative sample.
You can include all the message units – this is called a CENSUS.
Description | Population | Sample
The study of its units | Census | Survey, experiment, content analysis
The number of units in it | N | n
A number that summarizes information about a variable and its distribution | Parameter | Statistic
The mean of a variable | μ | M or X̄
The standard deviation of a variable | σ | sd
The variance of a variable | σ² | sd²
Sometimes the size of your n will be determined by the availability of documents.
Sometimes documents that we think should be indexed are not indexed at all.
6. Archives: collections of messages, usually well indexed. Remember to distinguish this
from the index itself. Indexes contain listings, whereas archives contain the messages
in their entireties.
i. Longitudinal analyses can be done retrospectively.
ii. Content analysis can be done on “dead corpora” (psychometrics of dead
celebrities, presidents, scientists)
iii. Archives are good for sorting out cross-cultural data that would otherwise be very
noisy.
7. Medium Management: you need to understand the medium in which the target
messages are found and the operation of the equipment used for delivery of the
messages.
8. The Digital World
i. Things you can do with electronics:
1. Archiving messages
2. Searching for messages
3. Message preparation for coding
4. Automatic coding
9. Sampling: the process of selecting a subset of units for study from the larger
population.
i. Randomness: every unit in the population must have an equal chance of being
selected.
ii. Sampling frame: an itemized set of units that make up a population
iii. If individuals generate the messages that you will be analysing, you require a
two-step process:
1. Sampling the individuals or groups
2. Sampling messages generated by those individuals or groups.
iv. Simple random sampling: pulling units out of a hat; if the sampling frame is
numbered, then you can use a random number generator
1. With replacement: once it’s drawn, we put it back in the hat
2. Without replacement: once it’s drawn it’s out
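The two variants of simple random sampling can be sketched in Python. This is only an illustration: the sampling frame of 20 numbered news stories and the names `frame`, `with_replacement` and `without_replacement` are assumptions, not part of the lecture.

```python
import random

# Hypothetical sampling frame: 20 numbered news stories (assumption for illustration).
frame = [f"story_{i:02d}" for i in range(1, 21)]

rng = random.Random(4)  # fixed seed so the same draw can be repeated

# With replacement: each drawn unit goes back in the hat, so repeats are possible.
with_replacement = rng.choices(frame, k=5)

# Without replacement: a drawn unit is out, so every unit appears at most once.
without_replacement = rng.sample(frame, k=5)

print(with_replacement)
print(without_replacement)
```

Drawing without replacement is the usual choice when each message should be coded only once.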
v. Systematic Random sampling: selecting every xth unit either from the
sampling frame or in some flow of occurrence over time.
1. You need a SKIP INTERVAL: if the size of the population is known, the skip
interval is N/n. For example, with 10,000 units and a desired sample size of
500, the interval is 10,000/500 = 20, so we sample every 20th unit.
2. PERIODICITY: do things repeat regularly? If so, you have to account for
this. For example, if the sampling frame is a series of Top 50 movie lists and
the skip interval turns out to be 50, then the sampled films may not represent
all 50 rankings but only one specific ranking (1st, 10th, etc.).
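The skip-interval arithmetic can be sketched as follows, using the lecture's figures of 10,000 units and a sample of 500. The function name `systematic_sample` is hypothetical, and the random starting point within the first interval is an assumption (a common practice that the notes do not state).

```python
import random

def systematic_sample(frame, n, rng=None):
    """Select every k-th unit after a random start, where k = N // n."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    rng = rng or random.Random()
    skip = len(frame) // n          # skip interval N/n (integer part)
    start = rng.randrange(skip)     # assumed: random start within the first interval
    return frame[start::skip][:n]

frame = list(range(1, 10001))       # hypothetical frame of 10,000 numbered units
sample = systematic_sample(frame, 500, random.Random(1))
print(len(sample), sample[1] - sample[0])   # prints "500 20"
```

If the frame has a cycle whose length matches the skip interval, every selected unit falls at the same point in the cycle, which is the periodicity problem described above.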
vi. Cluster sampling: any random sampling in which a group or a set of
messages is sampled together, e.g. Lin (1997) collected a full week of
broadcast commercials.
vii. Stratified sampling: the sampling frame is stratified according to categories
on some variable(s) of prime interest to the researcher. For example, Smith
(1999) studied women in film. She constructed three different sampling
frames of the top box office films featuring women, one for each of the target
decades, and then conducted a systematic random sample for each.
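A minimal sketch of this design, one sampling frame per stratum with a systematic random sample drawn from each. The stratum labels, frame sizes and sample sizes are invented for illustration and are not taken from Smith (1999); `systematic_sample` is a hypothetical helper.

```python
import random

def systematic_sample(frame, n, rng):
    """Every k-th unit after a random start, with k = N // n."""
    skip = len(frame) // n
    start = rng.randrange(skip)
    return frame[start::skip][:n]

# Hypothetical strata: three decades, each with its own frame of 100 films.
frames = {
    decade: [f"{decade}_film_{i:03d}" for i in range(1, 101)]
    for decade in ("decade_1", "decade_2", "decade_3")
}

rng = random.Random(7)

# Sample each stratum separately so every decade is represented.
stratified = {
    decade: systematic_sample(films, 20, rng)
    for decade, films in frames.items()
}

for decade, films in stratified.items():
    print(decade, len(films), films[:2])
```

Sampling within each stratum guarantees that no decade is left out by chance, which a single random draw over the pooled frame could not promise.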
viii. Multistage sampling: any random sampling technique in which two or more
sampling steps are used.
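A two-stage version can be sketched as follows: first sample the sources, then sample messages within each selected source. The population of 10 sources with 30 messages each, and all names, are assumptions made for illustration.

```python
import random

rng = random.Random(11)

# Hypothetical population: 10 sources, each of which generated 30 messages.
messages_by_source = {
    f"source_{s}": [f"source_{s}_msg_{m}" for m in range(1, 31)]
    for s in range(1, 11)
}

# Stage 1: randomly sample the individuals or groups (here, sources).
stage1 = rng.sample(sorted(messages_by_source), k=3)

# Stage 2: randomly sample messages generated by each selected source.
stage2 = {src: rng.sample(messages_by_source[src], k=5) for src in stage1}

for src, msgs in stage2.items():
    print(src, msgs)
```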
ix. Combinations of random sampling techniques:
1. Danielson and Lasorsa (1997) – stratified, multi-stage, cluster sampling
technique in their study of symbolic content in sentences on the front
pages of the New York Times and LA Times over a 10-year period.
10. Nonrandom Sampling: these methods are generally undesirable because their
results cannot be generalized to the population.
i. Convenience sampling: relies on the selection of readily available units.
ii. Purposive or Judgement Sampling: involves the researcher making a
decision as to what units he or she deems appropriate to include in the
sample (e.g. Fan and Shaffir (1989): handwritten essays for legibility).
iii. Quota sampling: similar to a nonrandom stratified sample. Key variable
categories are identified and then a certain number of units is selected from each
category. The mall intercept is a common example from survey research:
interviewers are instructed to get a certain number of targeted consumers,
such as 20 females with children or 20 males over 40.
11. Sample Size: this is usually calculated using two measures:
i. Standard error: a measure of dispersion for a hypothetical distribution of
sample means for a given variable. The SE allows us to calculate a
confidence interval around a particular sample mean.
ii. Confidence Intervals: this measure tells us how confident we are that the true
population mean (μ) falls within a given range.
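As a rough illustration of how the two measures fit together, the sketch below computes a standard error and a confidence interval for one variable. The data values are invented, and the multiplier 1.96 is an assumption: it corresponds to a 95% confidence level under a normal approximation, which the notes do not specify.

```python
import math
import statistics

# Hypothetical coded values for one variable across 10 sampled messages.
values = [2, 4, 4, 5, 3, 6, 2, 4, 5, 5]

n = len(values)
mean = statistics.mean(values)    # sample mean M
sd = statistics.stdev(values)     # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)            # standard error of the mean

z = 1.96                          # assumed: 95% confidence, normal approximation
ci_low, ci_high = mean - z * se, mean + z * se

print(f"M = {mean:.2f}, SE = {se:.3f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the standard error shrinks as n grows, a larger sample gives a narrower interval. With a sample as small as this one, a t-based multiplier would give a somewhat wider interval than 1.96 does.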