Using Test Scores to Guide the Content Analysis of TV Materials

Langbourne W. Rust

Journal of Advertising Research, Volume 25, Number 5, October/November 1985

Langbourne W. Rust is president of Langbourne Rust Research, Inc., which he founded 14 years ago. Most of his research has been directed toward understanding people's perceptions of products and communications. Dr. Rust received a B.A. from Harvard and a Masters and an Ed.D. in psychology from Teachers College, Columbia University.
There is a class of questions of great importance to market and advertising research which has been remarkably resistant to solution. These are questions which require the discovery of the critical attributes of complex stimuli. What is there about different commercials, programs, and products that makes people respond to them as they do?

Some recent methodological advances, originating in a qualitative wing of academic sociology, may give new life to our efforts to answer such questions. The new approach is often referred to as "the development of grounded theory," after a seminal book of that name by Glaser and Strauss (1967).

This paper reports on an analysis that sought to define the critical attributes of television pilot shows. It tested the generality of its findings by using them to predict audience response to new material.
I wish to express my deep appreciation to Arnold Becker, vice president of research, CBS, for sponsoring the original research and granting permission to publish it here. I also wish to thank Francis Ianni, Howard Becker, Arthur Kover, and Andrzej Osiatynski, who read early versions of the paper and gave me most valuable encouragement and advice.

The Positivist Tradition. Most American behavioral scientists have been trained to follow theory-testing strategies. Observations of the world (data) are used to test deductions from theories and thereby to test the theories themselves. The coding schemes used to describe behavior are established before the data are gathered. Research designs impose stringent limitations (controls) on the parameters of variation, reflecting the reasoning that the more you limit the phenomena being observed, the more confident you can be in drawing inferences about theory.
There have been debates about this hypothetico-deductive model of science for years, many of them echoing the philosophical differences between logical positivists and phenomenologists. In the United States, especially in the more quantitative schools of psychology and economics, the positivists have had the upper hand.
American advertising researchers are likely to approach a question like, "What makes some commercials get high recall scores and others get low?" with some sort of experimental or quasi-experimental design (Campbell, 1963). They might set up a controlled experiment to compare people's recall of specially constructed commercials (intended to vary on only one dimension). Or they might do a survey: correlating a grid of content ratings with recall scores, checking for those which are statistically significant. By either design, hypotheses are framed, variables defined, scales selected, and controls applied before the data are collected. The results are numbers reflecting the degree of association between the variables. Regarded as a system, this type of research takes qualitative inputs (the definitions) and uses data to produce quantitative outputs.
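To make the contrast with the grounded approach concrete, the following is a minimal sketch of that conventional design: pre-defined content ratings are correlated with recall scores and screened for statistical significance. The variables, data, and significance threshold are hypothetical illustrations, not figures from any actual study.

    # A toy sketch of the conventional design described above: pre-defined
    # content ratings correlated with recall scores and screened for
    # statistical significance. All variables and data are hypothetical.
    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    n_commercials = 40

    ratings = {                                   # a priori rating grid
        "humor":     rng.uniform(1, 7, n_commercials),
        "pacing":    rng.uniform(1, 7, n_commercials),
        "celebrity": rng.integers(0, 2, n_commercials).astype(float),
    }
    recall = rng.uniform(0, 100, n_commercials)   # hypothetical recall scores

    for name, values in ratings.items():
        r, p = pearsonr(values, recall)
        print(f"{name:10s} r = {r:+.2f}  p = {p:.3f}" +
              ("  (significant)" if p < 0.05 else ""))

Whatever the correlations turn out to be, the analysis can only speak about the variables that were defined in advance, which is the limitation taken up next.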
The Problem of Input: "Garbage in ...". There is at least one Achilles heel to this otherwise robust-looking strategy: its qualitative inputs. The pictures of the world produced by theory-validation research are limited by the theory the researcher started with. Stated in more general form:
the relationships found between variables depend on how we define the variables in the first place. This is a critical consideration in the behavioral sciences, where the subjects' conceptualizations of their environments may be very different from the researchers'.
Many grown-ups, for example, have been concerned about "violent" content in children's TV. Many studies have attempted to establish whether or not "violent" programming leads to antisocial behavior in children. Children may not perceive the same things as violent that adults do, but I am aware neither of any comprehensive efforts to relate children's behavior to programming which they see as violent nor of any attempts to develop grounded definitions of whatever it is in programs that might lead to antisocial behavior (without presuming it to be violence).
An Alternative Research Tradition: the Chicago School. Academic sociology has been dominated by the theory-validation approach during much of this century. But minority positions, reaching back to the prewar Chicago School (W. I. Thomas, R. Park et al.), have been arguing for a focus on how people describe and experience their own worlds. The voice of these qualitative researchers has grown stronger, and their influence has spread into other areas, including anthropology and educational research (Bogdan and Biklin, 1982; Patton, 1980).
The key to the new approach is the way it uses data. While traditional techniques use data to test preformed theories, the grounded approach uses data as a "ground" from which to induce new theories and concepts. A number of earlier writers considered ways to conduct "inductive analysis" (Znaniecki, 1934; Robinson, 1950; Turner, 1953). They laid the groundwork for the synthesis achieved by Glaser and Strauss (1967).

While most of the grounded theory literature has focused on the traditional concerns of sociology, education, and anthropology, some of this author's own work has explored matters of more direct concern to marketing and advertising: the attributes of toys affecting children's spontaneous choices (Rust, 1971a); attributes of Sesame Street and The Electric Company affecting children's attention levels (Rust, 1971b,c); and the study reported here, attributes of CBS sitcom pilot shows affecting Lazarsfeld-Stanton test scores. These grounded studies can be seen as using quantitative inputs to guide the production of qualitative outputs.
Lazarsfeld-Stanton Data. For many years, CBS has tested prospective TV programs by using its program-analyzer system, usually referred to as the Stanton-Lazarsfeld procedure at CBS and as the Lazarsfeld-Stanton procedure elsewhere (Frank Stanton worked for CBS).

Each pilot show is tested with a sample of 80 out-of-town tourists who are recruited off the street following a quota-sampling procedure. Each is provided with two buttons to press as they watch the show. They are told to press the green button (on the right) if they like what they are watching and to press the red button (on the left) if they dislike it. They are free to press neither button, but are requested not to press both at once.
The button pressings are recorded
continuously. By aggregating viewer
scores at regular intervals throughout
the show, two graphs of audience response are prepared: one plotting
green-button (positive) scores, the
other plotting red-button (negative)
scores. These are superimposed on a
scene by scene outline of the test show
to produce a viewer reaction profile.
Data from the program analyzer are
important inputs to CBS programming.
A bad score or unhealthy profile (e.g.,
a steadily declining one) can kill a
show. Fluctuations in response can
help identify program elements that
should be culled or emphasized in future development.
Once a pilot show has been tested
and the decisions regarding its fate
have been made, the test data have
little more utility to the network. They
get filed away, along with a copy of the
test show itself, and forgotten. Until the
time of the study reported here there
had been few, if any, attempts to use
this large data base to develop a formal
set of programming generalities.
This analysis has two resources to
work from: the films of the pilot shows
and the aggregated scene scores. Scene
scores express the percent of the audience pressing a button for the average moment within a scene. Shows
had between 27 and 43 scenes each
(see Table 1).
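The scene scores are simple aggregates of the continuous button-press record. The sketch below shows one way such scores could be computed; the data layout, sampling interval, and function name are assumptions made for illustration, not CBS's actual tabulation procedure.

    # Sketch: turning continuous button-press records into scene scores.
    # presses[viewer, t] is True if that viewer was holding the button down
    # at sampled moment t; scenes are (start, end) index ranges.
    # Data layout and sampling are assumptions for illustration only.
    import numpy as np

    def scene_scores(presses, scenes):
        """Percent of the audience pressing at the average moment of each
        scene (run once on the green-button record, once on the red)."""
        scores = []
        for start, end in scenes:
            per_moment = presses[:, start:end].mean(axis=0)  # fraction pressing
            scores.append(100.0 * per_moment.mean())         # scene average
        return scores

    # Tiny synthetic example: 80 viewers, 120 sampled moments, 3 scenes.
    rng = np.random.default_rng(1)
    green = rng.random((80, 120)) < 0.30
    print(scene_scores(green, [(0, 40), (40, 90), (90, 120)]))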
Procedure
The green- and red-button scores
were analyzed separately. Each analysis was done in two phases or
“rounds.” Round one took 12 shows
and designed attributes to account for
the scores of all their scenes. Round
two applied the attributes to 14 new
shows, changing, deleting, and adding
to the definitions on the basis of the new data.

Table 1
Pilot Shows Used in Rounds 1 and 2

Round 1: After the Honeymoon; Arnie; Barefoot in the Park; Bridget Loves Bernie; The Governor and JJ; Me and Benjy; Miss Stewart, Sir; My Sister, Hank; My Wives, Jane; Oh Nurse!; The Plumbum; Vernon's Volunteers.

Round 2: Anna and the King; The Blonde; Bobby-Jo & the Big Apple Goodtime Band; The Bob Newhart Show; Chicago Teddy Bears; The Don Rickles Show; Funny Face; Keep the Faith; The Living End; Mary Tyler Moore Show; Missy's Men; The New Dick Van Dyke Show; Stanley versus the System; Tim Conway Show.

Scenes ending in the first three minutes and the credits at the end were excluded in calculating program scores.

The analytical process involved three steps: generation, refinement, and integration. They are elaborated here:

(1) Generate the prototypes.

• Identify and review the maximum-contrast cases. Which were the very highest and lowest scoring scenes? These are likely to highlight whatever features audiences are affected by and to be least complicated by mixtures of positive and negative attributes.

• Induce prototype differentiators by studying contrasts. Review the maximum-contrast cases, make note of features that seem to differentiate the high- from the low-scoring ones.

• Quantify prototype performance within these cases. Measure how well each attribute discriminates high-scoring scenes from low.

• Preliminary cull. Scrap all weak performers. If a prototype does not help you differentiate these very highest and very lowest scenes, it is not likely to reflect a feature that was critical to the viewers.
(2) Refine the definitions. For each prototype surviving the initial cull (a code sketch of this loop follows the list):

• Test against the remaining cases and quantify its performance. This means reviewing the complete sample of scenes, not just those with the highest and lowest scores. If an attribute actually affects viewers, then there should be a measurable trend in the scores of scenes that have the attribute. If there is no trend, one is better off working with another prototype.
• Deviant case analysis. Identify all cases scoring against the trend. Contrast them with conforming cases to induce ways to improve the definition. The working assumption is that scenes which score against the trend probably do so because the attribute is improperly defined.

• Test the revisions against the full sample of cases.

• Iterate: Keep modifying the definition through deviant case analysis until improvements cease. Then select the next prototype for refinement.
(3) Integrate the attribute set. When all
prototypes have been explored and
refined, take the survivors and look
for ways to:
• Generate superordinate attributes:
Seek definitions which permit
collapsing several into one
higher-order attribute definition
that accounts for the data as
accurately as the independent
ones do.
• Reduce redundancy: Eliminate
attributes which fail to make a net
contribution to the analysis, i.e.,
those which differentiate cases
which are already accounted for
by the others.
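The refine-and-iterate loop of step (2) can be stated compactly in code. The sketch below only illustrates the bookkeeping; the original analysis was carried out by hand, and the attribute judgments here are assumed to be supplied by a human coder as True/False values per scene.

    # Sketch of step (2): quantify an attribute over the full sample and
    # list the deviant cases for review. Illustrative only; the original
    # analysis was done by hand, not with scripts.
    from scipy.stats import chi2_contingency

    def performance(has_attr, scored_low):
        """Uncorrected 2x2 chi-square: attribute presence vs. low/high score."""
        a = sum(h and l for h, l in zip(has_attr, scored_low))          # present, low
        b = sum(h and not l for h, l in zip(has_attr, scored_low))      # present, high
        c = sum(not h and l for h, l in zip(has_attr, scored_low))      # absent, low
        d = sum(not h and not l for h, l in zip(has_attr, scored_low))  # absent, high
        chi2, p, _, _ = chi2_contingency([[a, b], [c, d]], correction=False)
        return chi2, p

    def deviant_cases(scene_ids, has_attr, scored_low):
        """Scenes that carry a 'low' attribute yet scored high: review these,
        revise the definition, recode the scenes, and rerun performance()."""
        return [s for s, h, l in zip(scene_ids, has_attr, scored_low) if h and not l]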
An Example: Design of a Green-Button Attribute. The search for prototypes began with the round 1 shows. The highest and lowest scene in each show were identified. These scenes (12 high, 12 low) were viewed over and over again. Long lists of differential attributes were prepared. Glaser and Strauss refer to this sort of induction as "comparative analysis" and to the selection of maximum-contrast cases as "theoretical sampling."

One of the first attributes to be induced was seen in 5 of the 12 low-scoring scenes and in none of the high-scoring ones. It was "presence of new characters." (Chi-square with 1 df = 6.3, p < .02.) Its performance was adequate to take it into the next phase, refinement, which began by testing it against the sample of remaining scenes. This second phase of analysis follows what Glaser and Strauss call "representative sampling" and uses a strategy of deviant case analysis to refine the definitions.
Defined as "scenes in which new characters enter the show for the first time," 59 of the 357 scenes were judged to have it; 42 of them scored low. Chi-square = 9.7. Although the definition had improved, there were still 17 deviant cases. Examination of these led to a series of refinements.

The term "characters" was modified to read, "characters with impact on the plot line, not just walk-ons." This eliminated six deviant cases and one supporting case. Chi-square became 16.7.

Then the definition was modified to specify "new characters" as those with no prior indication of their personality. This eliminated four deviant cases and three supporting ones. Chi-square = 20.9.

The final modification redefined "new" to "no prior introduction or characterization" and opened up the "impact on plot-line" restriction to include all "salient characters." The net effect was to reduce deviant cases to 6 and increase supporting cases to 40. Chi-square = 24.9.
In round 2, the attribute was found
in 55 of 455 scenes. Seventy-eight percent scored low. Chi-square = 13.8, p <
.001. A study of the 12 deviant cases
led to no modifications.
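The chi-square figures quoted above can be checked directly from the counts given. For the initial prototype (present in 5 of the 12 low-scoring scenes and in none of the 12 high-scoring ones), the uncorrected Pearson statistic on the 2x2 table reproduces the reported 6.3:

    # Checking the reported chi-square for the initial "new characters" prototype.
    from scipy.stats import chi2_contingency

    table = [[5, 0],    # attribute present: low-scoring, high-scoring
             [7, 12]]   # attribute absent:  low-scoring, high-scoring
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    print(f"chi-square = {chi2:.1f} with {dof} df, p = {p:.3f}")  # 6.3, p < .02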
In the red-button analysis, the attribute appeared again. No alterations in the definition were indicated by the data. Presence of the attribute was associated with low levels of red-button pressing. This analysis showed that audiences hold back all judgment of a show (positive or negative) when confronted with a new character.
Attributes of Scenes with High Green-button Scores

Goodness, help, and kindness: where persons are, or try to be, good to others in an open, direct, face-to-face way. They must be moved by a genuine good-hearted feeling. It is not sufficient to be routinely or habitually polite, remorseful, sympathetic, or affectionate.
In the Funny Face pilot, Sandy
Duncan accepted a date with a shy
bookstore clerk. Later, after having
broken their date, she ditched a movie
star to go out with the clerk after all.
Both scenes got high levels of
green-button pressing. People seem to
like people who are good, kind, and
helpful. They like scenes in which
characters act this way in an open and
well-meaning manner.
Harmless embarrassment: where characters appear or feel awkward, silly, embarrassed, or ridiculous. The embarrassing incident may actually occur or be vividly recalled or anticipated.
This attribute is quite characteristic
of sitcoms. Over 25 percent of the
scenes had it. Many others attempted
to portray embarrassment but failed to
meet all the criteria.
Surprise: where the audience or one
of the characters is surprised or startled. If a character is surprised, he/she
must show it clearly. The emphasis
must be on something happening that
the audience is in on: the kind of private surprise that is shown by characters when they get a sudden idea does
not qualify.
A suitor in Miss Stewart, Sir had his
pass blocked with a stomp on his foot.
A supposedly dying character in My
Wives, Jane spoke unexpectedly from
her hospital bed. A nurse in Oh Nurse!
took a shower in a stall adjoining one
in which two men were hiding. This
was the most frequent high-appeal attribute, present in 34 percent of the scenes.
Justice realized: where a negative
character (or a character who has acted
with questionable motives) sees the
light and reforms, relents under pressure, gets comeuppance, retribution,
put-down, makes a fool of him/herself,
or is resisted successfully. It is essential that the character be aware of what
has happened.
This attribute occurred most often
as retribution. Audiences seem to love
to see evil punished and the bad guy
put in his/her place. Most situation
comedies give a scene or two with it.
Audiences also like to see someone
reform or see the light. They hit the
green button in My Wives, Jane when a
TV producer finally let an actress have
a baby, and they liked it when an
immigration official in Missy’s Men finally decided to grant Missy a visa.
Catch and chase, rough and tumble: such scenes often have an air of childhood roughhousing or intrigue, though they need not be done playfully in order to get high ratings.
Physical discomfort: where non-threatening physical discomfort is received with good-natured grumbling, gag-making, slow burn, blustering, etc.
Some Attributes of Scenes with Low
Green-button Scores
Sudden scene changes: where a show
cuts suddenly to new surroundings not
anticipated by the audience, scores
drop. This excludes scenes which have
been led into by the plot or dialogue or
lyrics, but the lead-in must be very
clear.
Nearly 25 percent of all scenes in rounds 1 and 2 began with a sudden change of surroundings. Eighty-two percent of them scored low, irrespective of their other attributes.

Not all scene changes get low
scores. When a chase moves smoothly
from one scene to the next, or when
the audience knows about the new
setting in advance (where it will be,
who will be there, etc.), scores may
not be low.
Sad or troubled characters:
where a positive or sympathetic
character feels genuinely troubled,
worried, or unhappy and expresses it
openly.
New characters: where salient
characters have their presence felt for
the first time, with no prior
introduction or characterization.
First five minutes: scenes that end
within the first five minutes of a show.
It seems to take some time for an
audience to warm up to any show (at
least in terms of button-pressing) no
matter how strong the early scenes
may be in terms of other attributes.
Red-button pressing also tended to be
below average for the first few
minutes.
Story over: scenes which occur
after the story is over and the end of
the show is in sight.
Attributes of Scenes with High
Red-button Scores
The attributes affecting red-button
pressing were more difficult to
discover than those affecting green
buttons. Given the low frequency of
red-button pressing, the opinion of just
one or two persons could strongly
affect the scores, making them less
reliable and obscuring the underlying
patterns. Despite this problem, a
number of attributes were identified.
Apprehension: scenes which introduce, reinforce, or sustain audience apprehension that something bad, wrong,
or anxiety-provoking may happen. The
sense of apprehension must be strong,
and the nature of the threat must be
clear.
Apprehension was the most frequent
stimulus to red-button pressing.
One-third of all scenes had it.
Red-button pressing may not always reflect something undesirable about a scene. Could it be that pilot shows which succeed in building apprehension (in order to resolve it later) tend to get screened out by network pretests, and get replaced by shows which are blander and less involving? Perhaps one should discount high red-button scores when they are inflated by scenes with high apprehension levels.
Manifest anxiety: scenes with seriously felt complaining, anger, expressed anxiety, unhappiness, or hostility. The definition does not refer to
scenes where a concrete solution to a
problem is signaled (hope, if concrete,
seems to defuse audience anxiety), to
scenes where characters express outrage at clear injustice, where the anxiety is expressed through gags, or
where it is trivial (e.g., good-natured
grumbling about some physical
discomfort or inconvenience).
Manifest anxiety was present when
Katie told her father-in-law about her
worries (with no clear call for
solution)
in After the Honeymoon. Expressing
anxiety through gags tends to mute the
negative response of audiences: a pattern demonstrated by the many scenes
where Dick Van Dyke expressed
worry over his screen test. Writers
often try to build tension through the
display of “manifest anxiety,” but it
may be a dangerous strategy.
The rest of the high-negative attributes seem to reflect qualities that
viewers judge as socially “bad”:
Tattles and lies: scenes where
characters lie, tattle on someone, or
violate a confidence. Excluded from
this factor is putting on an assumed
role for a legitimate purpose.
Meddlers: where characters mind
other people’s business, try to thwart
their legitimate aspirations, or meddle
in their concerns. The acts must have a
negative intent.
Illicit sex: scenes with unmarried
couples making physical contact with
strong and at least vaguely illicit erotic
implications. Also included in this factor are scenes in which such activity is talked about, as well as dirty jokes or allusions.
Ridicule: scenes where characters
poke fun at someone, laugh at their
embarrassment, or otherwise make fun
of them to their face.
Conceit: when salient characters indulge in conceited behavior or self-adulation, either at the expense of others
around them or as a flaw in an otherwise positive character. Normally arrogant characters are OK unless they
exercise their conceit at the expense of
innocent others.
Red-button Analysis:
Low-Negative Attributes
The four attributes found to account
for low red-button scores were almost
identical to four of the low-green attributes. One required a slight change. It
specified scenes ending in the “first
three minutes” instead of the first five.
The others remained unchanged: “new
characters,” “story over,” and “credits.”
Predicting New Scene Scores
An assistant prepared the materials
for the analysis. Scores were withheld
from the researcher until all the scenes
were classified, using the differentiating attributes. A prediction was
made for each scene: high if there were
more high attributes than low attributes, low if there were not. A 2 x 2
contingency table related the predictions to the scores (high or low, relative
to the show mean). Table 2 lists the
nine pilot shows. There were 305
scenes.
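A sketch of that prediction rule is given below; the attribute names are shorthand labels for the definitions above, and the per-scene codings are hypothetical examples.

    # Sketch of the prediction rule: a scene is predicted high-scoring if it
    # carries more high attributes than low attributes, low-scoring otherwise.
    HIGH_ATTRS = {"goodness", "embarrassment", "surprise", "justice",
                  "catch_and_chase", "physical_discomfort"}
    LOW_ATTRS = {"sudden_scene_change", "sad_character", "new_character",
                 "first_five_minutes", "story_over"}

    def predict(scene_attrs):
        highs = len(scene_attrs & HIGH_ATTRS)
        lows = len(scene_attrs & LOW_ATTRS)
        return "high" if highs > lows else "low"

    print(predict({"surprise", "justice"}))                   # high
    print(predict({"new_character", "sudden_scene_change"}))  # low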
The green-button predictions were correct 83 percent of the time: 76 percent of the highs and 86 percent of the lows; chi-square = 118 with 1 df, p < .001. The red-button predictions were correct for 69 percent of the highs and 65 percent of the lows; chi-square = 32, p < .001 (see Table 3).
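These statistics can be reproduced from the counts in Table 3 below, using the uncorrected Pearson chi-square on each 2x2 prediction-by-score table:

    # Reproducing the reported chi-square values from the Table 3 counts.
    from scipy.stats import chi2_contingency

    green = [[98, 25],    # predicted high: scored high, scored low
             [31, 151]]   # predicted low:  scored high, scored low
    red   = [[74, 68],
             [34, 129]]
    for name, table in (("green", green), ("red", red)):
        chi2, p, dof, _ = chi2_contingency(table, correction=False)
        print(f"{name}: chi-square = {chi2:.0f}, {dof} df, p = {p:.1g}")
    # green: about 118; red: about 32; both p < .001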
Table 2
Pilot Shows Used for the Prediction Test

The Doris Day Show; Houseboat; M*A*S*H; Maureen; The Minnie Pearl Show; When in Rome (To Rome with Love); Scared Stiff; Shepherd's Flock; The Singles.

Table 3
Scene-Score Predictions

                              Number of scenes           Accuracy
                              High score   Low score     %
Green-button predictions
  High-score predicted            98           25        76
  Low-score predicted             31          151        86
Red-button predictions
  High-score predicted            74           68        69
  Low-score predicted             34          129        65

Conclusion

The relationship we discover between any two variables depends on how we define those variables. The problem of definition is especially critical in the social sciences, where the subjects may process experience differently than the researchers. The tendency for researchers to build abstract models (descriptive or theoretical) around ad hoc, a priori variables is a primary reason for their difficulties in predicting how people will respond to previously untested complex stimuli.

The attributes discovered through grounded-analysis procedures proved to generalize well in this study. When applied to new program material, they differentiated high- from low-scoring scenes with an encouraging degree of accuracy.

It is hoped that publication of this study will encourage extension of the methods of grounded analysis to new substantive areas and to the evolution of inductive methodologies in general. There is a large class of problems that can be analyzed by the grounded approach. Appropriate data bases abound, particularly in the worlds of business and communications. Wherever standardized pretesting (of advertisements, products, programs, etc.) has been carried on long enough, the necessary raw materials are available. Grounded analysis makes it possible to draw valuable generalizations from data on the shelf.

The results take some getting used to. We are accustomed to research which reports that the relationship between two predefined variables is such-and-such a statistic. We are less accustomed to research which says that the critical stimuli to a certain type of response are best defined in such-and-such a way. But the latter is what these grounded analyses produce, and the findings have shown a high level of predictive generality. By providing a qualitative characterization of the structure of other people's perceptions, we may better understand why they act as they do, and we may be able to produce better materials for them.

References

Bogdan, R., and S. K. Biklin. Qualitative Research for Education: An Introduction to Theory and Methods. Boston: Allyn and Bacon, Inc., 1982.

Campbell, D. T., and J. C. Stanley. "Experimental and Quasi-experimental Designs for Research on Teaching." In Handbook of Research on Teaching, N. L. Gage, ed. Chicago: Rand McNally, 1963.

Glaser, B. G. "The Constant Comparative Method of Qualitative Analysis." Social Problems (1965): 436-445.

Glaser, B. G., and A. L. Strauss. The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine Publishing Company, 1967.

Levy, M. R. "The Lazarsfeld-Stanton Program Analyzer: An Historical Note." Journal of Communication 32, 4 (1982): 30-38.

Patton, M. Q. Qualitative Evaluation Methods. Beverly Hills, CA: Sage Publications, 1980.

Rust, L. W. Attributes That Differentiate Boys' and Girls' Preference for Materials in the Preschool Classroom: A Systems Design Approach. Ann Arbor, MI: University Microfilms, 1971.

Rust, L. W. Attributes of The Electric Company Pilot Shows That Produced High and Low Attention in 2nd and 3rd Graders. New York: Children's Television Workshop, 1971. ERIC text-fiche.

Rust, L. W. Attributes of Sesame Street That Influence Preschoolers' Attention to the TV Screen. New York: Children's Television Workshop, 1971.

Turner, R. H. "The Quest for Universals in Sociological Research." American Sociological Review 18 (1953): 604-611.

Znaniecki, F. The Method of Sociology. New York: Farrar & Rinehart, 1934.