Lecture 4a. Scales.

ICU. MARKETING RESEARCH
Vladimir V. Bulatov (bbe@voliacable.com)
LECTURE 3B. SCALES
Note: the notion of measurement assumes that there is something worth measuring.
The “thing” to be measured (e.g., an attitude toward a supplier, favorite color, or
sales) is referred to here as a construct.
Many constructs are fairly complex (e.g., one’s attitude toward Japanese restaurants
selling liquor on Sundays includes feelings toward Japanese, restaurants, liquor,
etc.). Nonetheless, in order to arrive at a bottom-line statement about such
constructs, there is a strong tendency to convert/simplify these constructs into a
single scale or series of scales, usually quantitative ones.
SCALE TYPES
Nominal (Categorical).
Refers to arbitrarily assigning a number to different response categories. The scale
number has no meaning in and of itself. Some obvious examples of nominal scales
include tax ID codes and football players' jersey numbers. There is no obvious relationship between the
quantity of the construct being measured and the numerical value assigned to it.
→ Non-metric scale. Used to compute frequencies; other calculations (e.g.,
average values) are meaningless.
Ordinal.
The higher the number, the more (or less) of the construct exists. The absolute size of
the number, however, has no meaning, nor do the differences between two scale values.
Ranking is the most common form of ordinal scale. If the ranking is based on
intelligence, we know that the subject ranked first is more intelligent (at least
according to our ranking method) than the person ranked second, but we have no
idea how much smarter he/she is.
→ Non-metric scale. Frequencies, percentiles, medians, plus a variety of other order
statistics can be used.
Interval.
An interval scale is a scale where differences (intervals) between scale values have
meaning, but the absolute scale values do not. E.g. Celsius or Fahrenheit
temperature scales. The difference between 40 and 41 is the same as between 1 and 2, but 0 is
an arbitrary point. All we can say is that 0 is one degree warmer than –1 and one degree
colder than +1; hence, 100 degrees is not 2 times warmer than 50.
→ Metric scale. Allows computation of means, standard deviations, use of parametric
statistical tests, and the computation of product-moment correlations between 2 interval-scaled variables; this in turn allows the use of such “fancy” techniques as regression,
discriminant, and factor analysis (a very desirable scale type for data being collected).
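The temperature example can be checked numerically. The sketch below is only an illustration: the function and variable names are invented here, and the only fact relied on is the standard Celsius-to-Fahrenheit conversion. It shows that a ratio of raw scale values changes when the scale is changed, while a ratio of intervals does not.

```python
def celsius_to_fahrenheit(c):
    """Convert between two interval scales of the same construct."""
    return c * 9 / 5 + 32


f50 = celsius_to_fahrenheit(50.0)
f100 = celsius_to_fahrenheit(100.0)

# Ratio of raw values: depends on where the arbitrary zero sits.
ratio_celsius = 100.0 / 50.0      # 2.0 on the Celsius scale
ratio_fahrenheit = f100 / f50     # 212 / 122 on the Fahrenheit scale

# Ratio of intervals: (100 - 50) compared with (41 - 40) on each scale.
interval_ratio_celsius = (100.0 - 50.0) / (41.0 - 40.0)
interval_ratio_fahrenheit = (f100 - f50) / (
    celsius_to_fahrenheit(41.0) - celsius_to_fahrenheit(40.0)
)

print(ratio_celsius, ratio_fahrenheit)
print(interval_ratio_celsius, interval_ratio_fahrenheit)
```

The two raw ratios disagree, so "twice as warm" is not a meaningful statement on an interval scale; the two interval ratios agree up to floating-point rounding.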
Ratio.
The highest-order scale. Here the ratio between 2 scale values is meaningful. 0 represents
the absence of the construct (e.g., money), and 50 units of the construct are half as much as
100.
→ Metric scale. Allows everything interval scales allow, plus the geometric mean and the
coefficient of variation; ratio-scaled variables are also meaningful when multiplied together.
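A minimal sketch of the two extra statistics, using Python's standard `statistics` module. The spending figures are made up for illustration, and `coefficient_of_variation` is a helper name introduced here. The point is that the coefficient of variation survives a change of unit (dollars to cents) but not a shift of the zero point, which is why it needs a ratio scale.

```python
from statistics import geometric_mean, mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical ratio-scaled data: money spent, in dollars.
spend_dollars = [10.0, 20.0, 30.0]
spend_cents = [100 * x for x in spend_dollars]    # change of unit only
spend_shifted = [x + 32 for x in spend_dollars]   # zero point moved

cv_dollars = coefficient_of_variation(spend_dollars)
cv_cents = coefficient_of_variation(spend_cents)
cv_shifted = coefficient_of_variation(spend_shifted)

# The geometric mean is likewise defined only for positive ratio-scaled values.
gm = geometric_mean([2.0, 8.0])

print(cv_dollars, cv_cents, cv_shifted, gm)
```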
Higher-order scales are desirable for analysts but are more demanding for respondents.
There must be a trade-off over whom to burden: the respondent
(who may refuse to fill in the questionnaire) or the analyst (who will later have to convert
scales or deal with a less desirable type of data). The interval scale is generally the most
preferred by marketing researchers.
EXAMPLES OF IMPLEMENTING DIFFERENT SCALES INTO QUESTIONS.
NOMINAL SCALES.
Multiple choice. (Checking a single answer from a set of alternatives.)
E.g.
Which of the following terms best describes inizlots?
__A. Riboflavit.
__B. Ordils and humspiels.
__C. Octiviniginianus.
__D. All of the above.
__E. B on alternate Tuesdays.
__F. None of the above.
(For quantification purposes, we would typically assign a 1 to the first answer
(riboflavit), a 2 to the second, etc. The numbers would represent only which category
was chosen, not how much of the construct was present.)
E.g.
Marital status: What is your marital status?
______ Single (=1)
______ Married (=2)
______ Divorced (=3)
______ Widowed (=4)
Occupation: What is your occupation?
______ Lawyer (1)
______ Teacher (2)
………
Brand choice/used: Which brand of soft drink did you last buy?
______ Coke (1)
______ 7UP (2)
______ Pepsi (3)
______ Other (4)
The categories used may be either supplied in advance to the respondent (aided) or
coded after the respondent gives a verbal/written answer (unaided). In general, the
aided/structured approach is easier for both the respondent and the analyst.
Yes/No (Binary) (Measures with only 2 possible values are typically nominal scales).
Ownership: Do you own a color TV?
______ Yes (1)
______ No (2)
Trait association (adjective checklist). Please indicate which of the following
descriptions apply to these products. Check as many descriptions as you feel apply
to each product.
                  Descriptions
Product           Necessary   Fun      Useless   Good investment
Color TV          ______      ______   ______    ______
Snowmobile        ______      ______   ______    ______
Life insurance    ______      ______   ______    ______
ORDINAL SCALES.
Forced ranking. The most obvious ordinal scale is a forced ranking:
E.g. Please, rank the following five brands in terms of your preference by marking 1
next to your most preferred brand, 2 next to your second most preferred brand, and
so forth:
Coke ______
Pepsi ______
7UP ______
Dr. Pepper ______
Fresca ______
Paired comparison. A means of generating an ordinal scale without asking the
respondent to consider all the alternatives simultaneously. Respondents only choose
the more preferred (or heavier or prettier, or any other characteristic you wish to
measure) of two alternatives at a time. Converting the previous ranking task into a paired
comparison framework yields 10 pairs:
Coke, Pepsi
Coke, 7UP
Coke, Dr. Pepper
Coke, Fresca
Pepsi, 7UP
Pepsi, Dr Pepper
Pepsi, Fresca
7UP, Dr. Pepper
7UP, Fresca
Dr. Pepper, Fresca
Formula
The number of distinct ways to draw a sample of size b out of a universe of size a is
\[ \binom{a}{b} = \frac{a!}{(a-b)!\,b!} \]
where a! = (a)(a-1)(a-2)…(2)(1) is called the factorial of a. For the five brands above:
\[ \binom{5}{2} = \frac{5!}{(5-2)!\,2!} = \frac{(5)(4)(3)(2)(1)}{[(3)(2)(1)][(2)(1)]} = 10 \]
Paired comparison allows intransitivity (A preferred to B, B to C, but C to A again!);
this can uncover the special nature of preferences, but makes data quality
questionable. Another difficulty with paired comparison is that with a large number of
alternatives the number of distinct pairs explodes, making it hard to
obtain an ordinal scale. Because of their cumbersome nature, complete paired
comparisons are rarely used except in pilot studies or laboratory situations.
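Both points, the growth in the number of pairs and the possibility of intransitive answers, can be sketched in a few lines of Python. The respondent's answers below are hypothetical, and `find_intransitive_triples` is a helper name made up for this illustration; it is not a standard routine.

```python
from itertools import combinations
from math import comb

brands = ["Coke", "Pepsi", "7UP", "Dr. Pepper", "Fresca"]

# Every unordered pair of brands; comb(a, 2) gives the count directly.
pairs = list(combinations(brands, 2))
n_pairs = comb(len(brands), 2)
n_pairs_20 = comb(20, 2)   # with 20 alternatives the task becomes much longer


def find_intransitive_triples(prefers):
    """Return triples that form a preference cycle.

    `prefers` is a set of (winner, loser) tuples, one per judged pair.
    """
    items = sorted({item for pair in prefers for item in pair})
    cycles = []
    for a, b, c in combinations(items, 3):
        one_way = {(a, b), (b, c), (c, a)} <= prefers
        other_way = {(b, a), (c, b), (a, c)} <= prefers
        if one_way or other_way:
            cycles.append((a, b, c))
    return cycles


# Hypothetical answers: Coke over Pepsi, Pepsi over 7UP, but 7UP over Coke.
answers = {("Coke", "Pepsi"), ("Pepsi", "7UP"), ("7UP", "Coke")}
cycles = find_intransitive_triples(answers)

print(n_pairs, n_pairs_20, cycles)
```

A respondent whose answers contain such a cycle cannot be placed on a single ordinal ranking without discarding or reinterpreting at least one judgement.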
Semantic scale. A SS obtains responses to a stimulus in terms of semantic
categories.
E.g. Do you like yogurt?
______ Dislike extremely (1)
______ Dislike (2)
______ Neutral (3)
______ Like (4)
______ Like extremely (5)
Respondents are instructed to check the category which best describes their
feelings. Since they choose the category on the basis of the words (semantics)
attached to it, this is called a semantic scale. (It is still ordinal, not interval.)
Picture scale. E.g. for children, smiling faces ranging from sad to happy; or another
set of pictures for areas where literacy is low.
Summated (Likert) scale. It is an extension of the semantic scale in two ways. First, rather
than measuring a construct by a single item, a series of items is used and a sum score
is calculated. Second, the scales are calibrated so that the neutral score is coded “0”.
E.g.
Do you like the taste of yogurt?
______ Dislike strongly (-2)
__x___ Dislike (-1)
______ Neutral (0)
______ Like (1)
______ Like strongly (2)
Is yogurt a healthful food?
______ Extremely not healthful (-2)
______ Not healthful (-1)
__x___ Neutral (0)
______ Healthful (1)
______ Extremely healthful (2)
Do you feel your friends like yogurt?
___x__ Dislike strongly (-2)
______ Dislike (-1)
______ Neutral (0)
______ Like (1)
______ Like strongly (2)
The summated score is (-1) + 0 + (-2) = -3, so the overall score on yogurt is negative.
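The scoring of the yogurt example can be sketched as follows. The three marked answers are read off the example as -1 (taste), 0 (healthful) and -2 (friends); the item keys and the function name `summated_score` are chosen here purely for illustration.

```python
VALID_CODES = (-2, -1, 0, 1, 2)   # neutral answer is coded 0


def summated_score(item_codes):
    """Add up one respondent's coded answers across all items."""
    codes = list(item_codes)
    for code in codes:
        if code not in VALID_CODES:
            raise ValueError(f"unexpected code: {code}")
    return sum(codes)


# Answers marked with "x" in the example above.
answers = {"taste": -1, "healthful": 0, "friends": -2}
total = summated_score(answers.values())

print(total)
```

Because neutral is coded 0, the sign of the total shows directly whether the overall attitude leans negative or positive.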
Other ordinal scales can be incorporated into MR, but their use is usually limited.
INTERVAL SCALES
Equal Appearing Interval. (Thurstone’s technique; impractical and rarely used.)
Bipolar Adjective.
Rather than attaching a description to each of the response categories, only the two
extreme categories are labeled; e.g.:
Dislike extremely   1   2   3   4   5   6   Like extremely
____________________________
Since the responses are equally spaced both physically and numerically, it can be
assumed that the responses will be intervally scaled.
Note: many consider this scale to be only somewhere between ordinal and interval; secondly,
results (if the research is sound) obtained with bipolar adjective and semantic scales
rarely differ from each other.
Agree-Disagree scale. (A variant of bipolar adjective).
E.g. Indicate your agreement with the statement “I like yogurt”
Disagree strongly   1   2   3   4   5   6   Agree strongly
____________________________
(Minor logical problem: I may strongly disagree either because I am neutral or because I
dislike yogurt; however, respondents usually interpret the scale as intended.)
Continuous scales. The same as above, but the respondent marks a point anywhere on an
unbroken line instead of choosing a numbered category, e.g.:
Very bad _____________________ Very good
Optical devices are then used to measure the response.
(However, since results obtained with these continuous scales are usually identical to those from bipolar
adjective scales, the former are almost never used.)
Equal width interval. (Assessing which category the respondent falls into.)
E.g. (yet, only an ordinal scale):
______ None
______ 1-2
______ 3-15
______ 16-99
______ 100 or more
The scale below is interval:
______ 0-4
______ 5-9
______ 10-14
______ 15-19
______ …
The second approach has some advantages (and no additional cost); but
if the data are unevenly distributed (e.g., almost all responses are 2 or 3), then a
scale with unequal category widths is to be used.
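The difference between the two category schemes can be sketched as follows. The function names are hypothetical and the category boundaries are those of the two examples; the sketch only shows how a raw count maps to a category and why equal widths give equally spaced category midpoints.

```python
from bisect import bisect_right

# Lower bounds of the unequal-width categories: None, 1-2, 3-15, 16-99, 100 or more.
UNEQUAL_LOWER_BOUNDS = [0, 1, 3, 16, 100]


def unequal_category(value):
    """Index of the unequal-width category (ordinal information only)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return bisect_right(UNEQUAL_LOWER_BOUNDS, value) - 1


def equal_width_category(value, width=5):
    """Index of the equal-width category 0-4, 5-9, 10-14, ..."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return value // width


def category_midpoint(index, width=5):
    """Midpoint of an equal-width category; midpoints are `width` apart."""
    return index * width + (width - 1) / 2


print(unequal_category(2), unequal_category(15), unequal_category(100))
print(equal_width_category(7), category_midpoint(1), category_midpoint(2))
```

With equal widths, moving up one category always means the same increase in the underlying quantity, which is what justifies treating the category index as interval data.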
Dollar Metric (Graded Paired Comparison). Resembles the paired comparison test,
but each choice within a pair must be accompanied by a “how much” evaluation. E.g. Which
brand do you prefer? (pairs follow). How much extra would you be willing to pay to get
your more preferred brand? (An amount of money must be stated.)
Pilot tests are run to evaluate how well respondents can discriminate between differences
in the construct. Appropriate scales are then built.
Generally, for individual-level analysis 6 or more points are usually sufficient to
account for respondents’ discriminatory abilities; for aggregate analysis, even fewer
are needed. Therefore, most scales should use between four scale points (for phone
surveys, intercept interviews, low-commitment situations) and eight (for committed
and knowledgeable respondents).
Either an odd or an even number of scale points may be used. In well-done research the difference
will have almost no effect on the results.
The Law of Comparative Judgement.
Paired comparison judgements can be converted into intervally scaled data by
means of Thurstone’s law of comparative judgement.
We will discuss this approach in detail later.
(Other approaches are also applicable and can be used.)
RATIO SCALES
Direct quantification (the simplest way). Ask directly for the quantity of a construct
that is ratio scaled.
E.g.
How many dress shirts do you own? ____
How old are you? ____
The problem with this approach is that the respondent may not know the answer or may refuse to give it.
Consequently, this approach is to be used only in pilot or small-scale surveys.
Constant Sum Scale. A very popular device in marketing research. Respondents are
given a number of points (if the process is conducted in person, chips or other
physical objects are often used) and told to divide them among alternatives according
to some criterion (e.g., preference, importance, aesthetic appeal). Since respondents
are told to allocate chips in a ratio manner (if you like brand A twice as much as
brand B, assign it twice as many chips, etc.), the results are presumably ratio
scaled.
E.g., I might ask for 10 points to be allocated among three brands:
A    2
B    3
C    5
_______
    10
There are two problems with this approach. First, respondents may allocate points that do not
add up to the required total, necessitating recalculations. Second, determining the appropriate
number of points/chips to use requires trading off between rounding error if too few are
used and fatigue/frustration/refusal problems if too many are used. Still, the approach
is quite useful.
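One possible form of the recalculation, shown only as a sketch: proportional rescaling of the allocated points to the required total. This particular rule is an assumption made here for illustration (the lecture does not prescribe one), as are the function names and the miscounted allocation.

```python
def rescale_to_total(allocation, total=10):
    """Proportionally rescale points so that they add up to `total`."""
    raw_sum = sum(allocation.values())
    if raw_sum <= 0:
        raise ValueError("allocation must contain some points")
    return {brand: points * total / raw_sum for brand, points in allocation.items()}


def preference_ratio(allocation, brand_a, brand_b):
    """How many times more points brand_a received than brand_b."""
    return allocation[brand_a] / allocation[brand_b]


# Allocation from the example above: 10 points over brands A, B, C.
correct = {"A": 2, "B": 3, "C": 5}

# Hypothetical respondent who handed out 12 points instead of 10.
miscounted = {"A": 3, "B": 3, "C": 6}
fixed = rescale_to_total(miscounted, total=10)

print(preference_ratio(correct, "C", "A"))
print(fixed, preference_ratio(fixed, "C", "A"))
```

Proportional rescaling leaves the ratios between brands unchanged, which matters because those ratios are the information a constant sum scale is meant to capture.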
(Constant sum paired comparison: by combining a constant sum scale and paired
comparison methods, we get a constant sum paired comparison; this allows for ratio
scaled paired comparison judgement).
Delphi procedure. (Separate hand-out will be given). The Delphi procedure is a
modification of the constant sum scale designed to produce agreement among
judges.
Reference Alternative. (or: Fractionation or Magnitude scaling). This approach seeks
a ratio scale by having respondents compare alternatives to a reference alternative.
E.g.
Reference alternative X=100
Alternative A ___
Alternative B ___
Alternative C ___
Respondents are instructed to indicate how alternatives compare to the reference
alternative on some criterion such as preference by putting down a number half as
large if the alternative is half as preferred, and so on.
In the example case, a respondent might assign 50 to A, 250 to B, and 130 to C.
Note: the constant sum approach is used more often than the reference alternative approach.
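The relationship between the two approaches can be sketched with the example scores (A = 50, B = 250, C = 130 against X = 100). The function names are hypothetical, and converting to shares of 100 points is just one way to put reference-alternative answers in constant-sum form.

```python
def ratios_to_reference(scores, reference=100):
    """Express each alternative as a multiple of the reference alternative."""
    return {alt: score / reference for alt, score in scores.items()}


def to_constant_sum(scores, total=100):
    """Rescale the same scores so that they add up to `total` points."""
    raw_sum = sum(scores.values())
    return {alt: score * total / raw_sum for alt, score in scores.items()}


scores = {"A": 50, "B": 250, "C": 130}

ratios = ratios_to_reference(scores)   # A is half as preferred as X, B 2.5 times
shares = to_constant_sum(scores)       # the same preferences as shares of 100 points

print(ratios)
print(shares)
```

The ratio between any two alternatives (for instance B to A) is the same in both representations, since each is only a rescaling of the original answers.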
To conclude:
It may be interesting to consider how crucial the choice of method is. One study
(Haley and Case, 1979) compared 13 disparate measures of response to a brand,
and found that:
1. All 13 tested measures are highly correlated.
2. Awareness and brand choice are somewhat different from the other attitude
measures.
3. Acceptability, 6-point adjective, agreement, quality, 10-point numerical,
thermometer, and Stapel (a modification of the semantic scale) tend to produce
predominantly favorable readings.
4. For purposes of predicting market share, scales that restrict the number of
brands getting top ratings (such as constant sum) tend to discriminate better.
5. With the exception of the constant sum scale, ratings below the midpoint were
associated with essentially zero share, and even top-category ratings (e.g. will
absolutely definitely buy) tended to be related to only about a 50 percent share.
CONSEQUENTLY, IT APPEARS THAT CONSISTENCY IN USE OF A SCALE IS AT
LEAST AS IMPORTANT AS THE SPECIFIC SCALE USED.