75 Years After Likert: Thurstone Was Right!

Assessing Personality
75 Years After Likert:
Thurstone Was Right!
(And some implications for I/O)
Colleagues
Sasha Chernyshenko
 Steve Stark

Thurstone



In a series of papers in the late 1920s,
Thurstone asserted “Attitudes Can Be
Measured” and provided several methods
for their measurement
He assumed that a conscientious person
would endorse a statement that reflected
his/her attitude…but
“as a result of imperfections, obscurities, or
irrelevancies in the statement, and
inaccuracy or carelessness of the subjects”
not everyone will endorse a statement, even
when it matches their attitude
Thurstone, Psych Review,
1929



For N1 people with attitude S1, all should
endorse a statement with scale value S1 if
they were conscientious and the item was
perfect; but only n1 actually endorse the
item
These people will endorse another
statement with scale value S2 with a
probability p that is a function of |S1-S2|
Figure from Thurstone’s paper:
Thurstone 1929
Thurstone 1928 Attitudes
Can Be Measured

Gave an example of an attitude
variable, militarism-pacifism, with six
statements representing a range of
attitudes:
Thurstone 1928
Thurstone 1928


A pacifist “would be willing to indorse all or
most of the opinions in the range d to e and
… he would reject as too extremely
pacifistic most of the opinions to the left of
d, and would also reject the whole range of
militaristic opinions.”
“His attitude would then be indicated by the
average or mean of the range that he
indorses”
Implications

On Thurstone’s pacifism-militarism scale,
three people might endorse two items each:




Person 1 endorses f and d, and is very
pacifistic
Person 2 endorses e and b, and is neutral
Person 3 endorses c and a, and is very
militaristic
Thus, it is crucial to know which items are
endorsed!
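A minimal sketch of the point above, assuming hypothetical scale values for the six statements (the ordering f, d, e, b, c, a from pacifistic to militaristic and the numbers 1 to 6 are illustrative, not Thurstone's published values). Each person endorses exactly two items, so a simple count cannot tell them apart, while the mean scale value of the endorsed items can.

```python
# Hypothetical scale values, from most pacifistic (f) to most militaristic (a).
# These numbers are assumptions made for illustration only.
SCALE_VALUES = {"f": 1.0, "d": 2.0, "e": 3.0, "b": 4.0, "c": 5.0, "a": 6.0}


def thurstone_score(endorsed):
    """Mean scale value of the statements a person endorses."""
    return sum(SCALE_VALUES[s] for s in endorsed) / len(endorsed)


people = {
    "Person 1": ["f", "d"],
    "Person 2": ["e", "b"],
    "Person 3": ["c", "a"],
}

for name, endorsed in people.items():
    # The count is 2 for everyone; only the mean scale value differs.
    print(name, "count =", len(endorsed), "score =", thurstone_score(endorsed))
```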
Likert 1932



Proposed a much simpler approach: A five-point response scale with options “Strongly
Approve”, “Approve”, “Neutral”,
“Disapprove”, and “Strongly Disapprove”.
The numerical values 1 to 5 were assigned
to the different response options
And an individual’s score was the sum or
mean of the numerical scores
Likert 1932

Likert evaluated his scales by
Split-half reliability
 Item-total correlations


To make this work, he hit upon the
idea of reverse scoring, e.g.,
statements like d and f from
Thurstone needed to be scored in the
opposite direction of statements like a
and c.
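Likert's summated scoring with reverse scoring can be sketched as follows. The coding (5 = "Strongly Approve", higher totals = more militaristic) and the example responses are assumptions for illustration; the function name `likert_score` is hypothetical.

```python
def likert_score(responses, reverse_keyed, n_points=5):
    """Sum of item scores after reverse-scoring the reverse-keyed items.

    responses: dict mapping item label to a response from 1 to n_points.
    reverse_keyed: set of item labels scored in the opposite direction.
    """
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= n_points:
            raise ValueError(f"response for {item!r} is out of range")
        # A reversed 1 becomes n_points, a reversed n_points becomes 1.
        total += (n_points + 1 - value) if item in reverse_keyed else value
    return total


# Made-up respondent: approves the militaristic statements a and c,
# disapproves the pacifistic statements d and f.
responses = {"a": 5, "c": 4, "d": 1, "f": 2}
print(likert_score(responses, reverse_keyed={"d", "f"}))
```

Note that this scoring has no natural place for an intermediate statement: it is neither keyed nor reverse-keyed.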
Likert 1932


When computing item-total correlations, “if a
zero or very low correlation coefficient is
obtained, it indicates that the statement fails
to measure that which the rest of the
statements measure.” (p. 48)
“Thus item analysis reveals the
satisfactoriness of any statement so far as
its inclusion in a given attitude scale is
concerned”
Likert 1932
Likert discarded intermediate
statements like “Compulsory military
training in all countries should be
reduced but not eliminated”
 Such a statement is “double-barreled
and of little value because it does not
differentiate persons in terms of their
attitudes” (p. 34)

Likert Scaling


Although Likert didn’t articulate a
psychometric model for his procedure, his
analysis implies what Coombs (1964) called
a dominance response process.
Specifically, someone high on the trait or
attitude measured by a scale is likely to
“Strongly Agree” with a positively worded
item and “Strongly Disagree” with a
negatively worded item
Example of a Dominance Process
Person endorses item if her standing on
the latent trait, theta, is more extreme
than that of the item.
[Figure: probability of a positive response (0.0 to 1.0) plotted against Theta (-3 to 3), with Person and Item marked]
Thurstone Scaling
Thurstone assumed people endorse
items reflecting attitudes close to their
own feelings
 Coombs (1964) called this an ideal
point process
 Sometimes called an unfolding model

Example of an Ideal Point Process

Person endorses item if his standing on the
latent trait is near that of the item.

“I enjoy chatting quietly with a friend at a cafe.”

Disagree either because:
Too introverted (uncomfortable in public places)
Too extraverted (chatting over coffee is boring)
[Figure: trait continuum with the Item located between "Too Introverted" and "Too Extraverted"]
Important Point:

The item-total correlation of
intermediate ideal point items will be
close to zero!
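A small simulation can illustrate this. It is a sketch under stated assumptions: the endorsement curve `endorse_prob` is a simple squared-distance ideal point function chosen for illustration (not the GGUM), the item locations are made up, and low items are reverse-scored before computing corrected item-total correlations.

```python
import math
import random


def endorse_prob(theta, delta):
    """Hypothetical ideal point curve: endorsement peaks when theta equals delta."""
    return math.exp(-0.5 * (theta - delta) ** 2)


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_people=5000, seed=1):
    """Return the corrected item-total correlation of each simulated item."""
    rng = random.Random(seed)
    # (location, reverse-scored?): three low items, one intermediate, three high.
    items = [(-2.0, True)] * 3 + [(0.0, False)] + [(2.0, False)] * 3
    scored = []  # one row of 0/1 item scores per simulated person
    for _ in range(n_people):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for delta, reverse in items:
            x = 1 if rng.random() < endorse_prob(theta, delta) else 0
            row.append(1 - x if reverse else x)
        scored.append(row)
    result = []
    for j in range(len(items)):
        item = [row[j] for row in scored]
        rest = [sum(row) - row[j] for row in scored]  # total without item j
        result.append(pearson(item, rest))
    return result


citc = simulate()
print("intermediate item:", round(citc[3], 2))
print("high item:", round(citc[6], 2))
```

Under these assumptions the intermediate item is endorsed equally often by people below and above its location, so its correlation with the total hovers around zero even though it measures the trait.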
Which Process is Appropriate
for Temperament Assessment?

In a series of studies, we’ve

Examined appropriateness of dominance process
by fitting models of increasing complexity to data
from two personality inventories

Compared fits of dominance and ideal point
models of similar complexity to 16PF data

Compared fits of dominance and ideal point
models to sets of items not preselected to fit
dominance models
Fitting Traditional Dominance
Models to Personality Data


Data

16PF 5th Edition

• 13,059 examinees completed 16 noncognitive scales
Goldberg’s Big Five factor markers
• 1,594 examinees completed 5 noncognitive scales
Models examined

Parametric – 2PLM, 3PLM

Nonparametric – Levine’s Maximum Likelihood Formula
Scoring (MFSM)
Three-Parameter Logistic
Model
Two-Parameter Logistic
Model
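The slide equations did not survive extraction. As a sketch, the standard logistic forms of these two dominance models can be written as below; the function name `irf_3pl` and the default scaling constant D = 1.702 are assumptions, and setting c = 0 gives the 2PL.

```python
import math


def irf_3pl(theta, a, b, c=0.0, D=1.702):
    """Probability of a positive response under the 3PL.

    a: discrimination, b: location (difficulty), c: lower asymptote.
    With c = 0 this reduces to the 2PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# The curve rises monotonically in theta: a dominance process.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irf_3pl(theta, a=1.0, b=0.0), 3))
```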
Methods for Assessing Fit: Fit Plots
[Figure: fit plot comparing the model item response function (IRF) with the empirical curve (EMP); x-axis Theta from -3.0 to 3.0, y-axis probability of positive response from 0.0 to 1.0]
Methods for Assessing Fit: Chi-Squares

Chi-squares typically computed for single items

\[ \chi_i^2 = \sum_{k=1}^{s} \frac{[O_i(k) - E_i(k)]^2}{E_i(k)} \]

\[ E_i(k) = N \int P(u_i = k \mid \theta) \, f(\theta) \, d\theta \]

Very important to examine item pairs and triplets

May indicate violations of local independence or misspecified model

\[ E_{ij}(k, k') = N \int P(u_i = k \mid \theta) \, P(u_j = k' \mid \theta) \, f(\theta) \, d\theta \]
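A minimal sketch of the single-item computation, assuming a 2PL item, a standard normal trait density f(theta), and a simple normalized grid in place of the integral. The function names, the grid, and the observed counts are all hypothetical.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a positive response under the 2PL."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def quadrature(n_points=81, lo=-4.0, hi=4.0):
    """Theta grid with standard normal weights normalized to sum to 1."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(dens)
    return thetas, [d / total for d in dens]


def expected_counts(n, a, b):
    """E_i(k) for k = 0, 1: N times the integral of P(u_i = k | theta) f(theta)."""
    thetas, weights = quadrature()
    p1 = sum(irf_2pl(t, a, b) * w for t, w in zip(thetas, weights))
    return [n * (1.0 - p1), n * p1]


def item_chi_square(observed, expected):
    """Sum over options of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = expected_counts(n=1000, a=1.2, b=0.0)
observed = [520, 480]  # made-up observed counts for options 0 and 1
print(expected, item_chi_square(observed, expected))
```

Pairs and triplets follow the same pattern, with the product of the item probabilities inside the integral.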
Methods for Assessing Fit: Chi-Squares
To aid interpretation of chi-squares:

Adjust to sample size of 3,000

Compare groups of different size
The expected value of a non-central chi-square is equal to its df
plus N times the noncentrality parameter d

\[ E(\chi^2) = df + N d \]

where N is the sample size. So an estimate of the noncentrality
parameter is

\[ \hat{d} = (\chi^2 - df) / N . \]
Adjusted Chi-square

To adjust to a sample size of, say,
250, use
\[ \text{Adjusted } \chi^2 = df + 250 \, (\chi^2 - df) / N \]
For IRT, we usually adjust to N =
3000, and divide by the df to get an
adjusted chi-square/df ratio
 Less than 2 is great, less than 3 is OK
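The adjustment is simple arithmetic and can be sketched as below. The function names and the example numbers are hypothetical; the default target of 3,000 follows the convention stated above.

```python
def adjusted_chi_square(chi_square, df, n, target_n=3000):
    """Rescale a chi-square observed at sample size n to sample size target_n."""
    d_hat = (chi_square - df) / n  # estimated noncentrality per person
    return df + target_n * d_hat


def adjusted_ratio(chi_square, df, n, target_n=3000):
    """Adjusted chi-square divided by its degrees of freedom."""
    return adjusted_chi_square(chi_square, df, n, target_n) / df


# Made-up example: chi-square of 50 on 10 df from 10,000 examinees.
print(adjusted_chi_square(50.0, 10, 10000))
print(adjusted_ratio(50.0, 10, 10000))
```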

Adjusted Chi-square/df for
an Ability Test
FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS
             <1   1<2   2<3   3<4   4<5   5<7    >7    Mean      SD
Singlets     11     3     2     1     0     2     1   1.877   2.923
Doublets     77    44    31    16    12     9     1   1.829   1.734
Triplets    327   424   264    92    14    16     3   1.71    1.092
(Annotation on slide: AdjChf < 3)
Results for 16 PF Sensitivity
Scale: Mean Chi-sq/df Ratios
Model       Singles   Doubles   Triples
2PL            0.98      4.05      5.45
3PL            0.87      3.89      5.23
SGR            0.99      7.76      7.12
MFS-dich       2.91      2.61      2.42
MFS-poly       1.55      2.68      2.58
What if Items Assessed Trait Values
Along the Whole Continuum?
Items on existing personality scales
have been pre-screened on item-total
correlation
 We speculate that items measuring
intermediate trait values are
systematically deleted
 So, what happens if a scale includes
some intermediate items?

TAPAS Well-being Scale
Tailored Adaptive Personality
Assessment System
 Assesses up to 22 facets of the Big
Five
 Well-being is a facet of emotional
stability
 We wrote items reflecting low,
moderate, and high well-being

For example, TAPAS Well-Being Scale
WELL04, “I don’t have as many happy
moments in my life as others have”
WELL17, “My life has had about an
equal share of ups and downs”
WELL41, “Most days I feel extremely
good about myself”
In total, 20 items: 5 negative items, 9
positive, and 6 neutral

Traditional Analysis Results
 #   Item_Name   Initial SME Location   Reverse   Mean    SD     Factor Loading   CITC (alpha=.76)
 1   WELL02      negative               r         2.14   0.80        -0.40             0.35
 2   WELL04      negative               r         2.08   0.87        -0.45             0.40
 3   WELL06      negative               r         2.23   0.78        -0.55             0.45
 4   WELL09      negative               r         2.22   0.76        -0.53             0.42
 5   WELL13      negative               r         2.20   0.77        -0.54             0.45
 6   WELL16      neutral                          2.48   0.85         0.08             0.08
 7   WELL17      neutral                          2.82   0.73         0.13             0.15
 8   WELL19      neutral                          2.85   0.65        -0.09            -0.05
 9   WELL20      neutral                          3.00   0.89         0.04             0.06
10   WELL23      neutral                          3.03   0.64         0.07             0.11
11   WELL26      neutral                          2.80   0.78        -0.14             0.06
12   WELL29      positive                         2.89   0.74         0.36             0.48
13   WELL30      positive                         2.77   0.74         0.56             0.42
14   WELL34      positive                         3.13   0.70         0.46             0.35
15   WELL38      positive                         2.80   0.82         0.57             0.49
16   WELL40      positive                         2.53   0.75         0.56             0.48
17   WELL41      positive                         2.96   0.73         0.56             0.50
18   WELL43      positive                         3.13   0.66         0.63             0.55
19   WELL45      positive                         2.82   0.70         0.53             0.46
20   WELL46      positive                         2.89   0.72         0.47             0.41
(Two further items among 9 to 20 are marked r on the slide; which ones is not recoverable from the slide text.)
Fit Plot for 2PL WELL17
[Figure: fit plot for Item 7 comparing the 2PL item response function (IRF7) with the empirical curve (EMP7); x-axis Theta from -3.0 to 3.0, y-axis probability of positive response from 0.0 to 1.0]
An Ideal Point Model: The
Generalized Graded Unfolding
Model (GGUM)

Roberts, Donoghue, & Laughlin (2000). Applied
Psychological Measurement.

The model assumes that the probability of
endorsement is higher the closer the item to the
person

GGUM software provides maximum likelihood
estimates of item parameters
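GGUM

The probability of disagree is:
\[ P(Z_i = 0 \mid \theta_j) = \frac{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\}}{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\} + \exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}} \]
and the probability of agree is
\[ P(Z_i = 1 \mid \theta_j) = \frac{\exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\} + \exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}} \]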
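The dichotomous GGUM probabilities can be evaluated directly. This is a sketch only: the function name `ggum_dichotomous` and the parameter values (alpha = 1, delta = 0, tau = -1) are made up for illustration and are not estimates from the TAPAS data.

```python
import math


def ggum_dichotomous(theta, alpha, delta, tau):
    """Return (P(disagree), P(agree)) for a dichotomous GGUM item.

    alpha: discrimination, delta: item location, tau: threshold (tau_i1).
    """
    x = theta - delta
    disagree = 1.0 + math.exp(3.0 * alpha * x)
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2.0 * x - tau))
    total = disagree + agree
    return disagree / total, agree / total


# Agreement is most likely when the person is at the item location and
# falls off symmetrically on either side: an ideal point process.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(ggum_dichotomous(theta, 1.0, 0.0, -1.0)[1], 3))
```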
GGUM Estimated IRF for
Moderate Item
[Figure: GGUM ORF for Option 2; x-axis Well-Being from -3.0 to 3.0, y-axis probability of positive response from 0.0 to 1.0]
IRF for Agree response to TAPAS Well-being item “My life
has had about an equal share of ups and downs.”
TAPAS Well-being Scale
2PL Results:
FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS
             <1   1<2   2<3   3<4   4<5   5<7    >7    Mean      SD
Singlets     20     0     0     0     0     0     0   0       0
Doublets     17     1     0     0     1     2     3   2.955   6.439
Triplets      5     0     1     0     0     1     5   5.408   6.512

GGUM Results:
FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS
             <1   1<2   2<3   3<4   4<5   5<7    >7    Mean      SD
Singlets     20     0     0     0     0     0     0   0       0
Doublets     22     0     0     0     0     0     2   0.997   3.256
Triplets      9     0     0     1     1     1     0   1.081   2.001
Summary of Findings

2PLM and 3PLM fit scales developed by traditional
methods OK, but if moderate items are included:
• Chi-square doublets and triplets can be large, especially when
moderate items are included
• Discrimination parameter estimates are uniformly small for
moderate items (and item-total correlations are near zero).

GGUM fits all items, including moderate items
• Adj. chi-square to df ratios are small for doubles and triples
• GGUM discrimination parameter estimates are large for the
moderate items!
So, for Well-Being
Fitting a dominance item response
theory model (the 2-parameter
logistic) produced an adjusted Chi-square to df ratio of 2.955 for pairs
 The ideal point model yielded an
adjusted Chi-square/df ratio of 0.997
for pairs

Conclusion

Ideal point model seems more appropriate
for temperament assessment

BUT there’s a “Fly in the ointment” for I/O

Correct specification of response process
does not guarantee more accurate
assessment, because …

Traditional items are easily FAKED
Examples of “Traditional” Items
that are Easily Faked
In each case, the positively keyed response is
obvious.
I get along well with others. (A+)
 I try to be the best at everything I do. (C+)
 I insult people. (A-)
 My peers call me “absent minded.” (C-)

Because these items consist of individual statements, they
are commonly referred to as “single stimulus” items.
Army Assessment of
Individual Motivation (AIM)

Uses tetrads:
• I get along well with others. (A+)
• I set very high standards for myself. (C+)
• I worry a lot. (ES-)
• I like to sit on the couch and eat potato chips. (Physical condition-)
Respondent picks the statement that is
Most Like Me and the statement that is
Least Like Me
Army AIM has shown less score inflation
What psychometric model would describe
this type of data????
So…
US Army researchers Len White and
Mark Young (and others) found some
fake resistance and criterion-related
validity for the tetrad format
 But modeling four-dimensional items
was too hard for me!
 How about two-dimensional items?

Multidimensional Pairwise
Preference (MDPP) Format

Create items by pairing stimuli that are similar in
desirability, but representing different dimensions

“Which is more like you?”
• I get along well with others. (A+)
• I always get my work done on time. (C+)

This led to my work on personality assessment
over the past 10 years

And the result is:
Tailored Adaptive
Personality Assessment
System (TAPAS)


TAPAS is designed to overcome existing limitations of
personality assessment for selection by incorporating
recent advancements in:

Temperament/personality assessment

Item response theory (IRT)

Computerized adaptive testing (CAT)
Our goal is for TAPAS to be innovative in both how we
assess (IRT, CAT) and what we assess (facets of
personality)
TAPAS Vision


Fully customizable assessment to fit array of users’
needs
Users can select





any dimension from a comprehensive superset;
a scale length to suit their needs
a response format (depends on faking worries)
adaptive or static
Resulting scores can be used to predict multiple criteria
or as source of feedback
TAPAS Facet Dimensions
Based on factor analysis of each of the
Big Five dimensions

E.g., Roberts, B., Chernyshenko, O.S., Stark, S., & Goldberg,
L. (2005). The structure of conscientiousness. Personnel
Psychology

Analyzed 7 major personality inventories

Currently 21 facets + additional “physical
condition” facet for military jobs
TAPAS Facet Dimensions
Conscientiousness

Six facet hierarchical structure:

Industriousness: task- and goal-directed

Order: planful and organized

Self-control: delays gratification

Traditionalism: follows norms and rules

Social Responsibility: dependable and reliable

Virtue: ethical, honest, and moral
Factor Analysis Results

For each facet, we have an empirical mapping of existing scales to our
facets

Provide basis for existing scale classification

Validity of each facet can be investigated via meta-analysis
Factor loadings
Scale Name                   Industriousness   Order   Virtue
neo competence                     .88          -.28    -.09
neo achievement striving           .76           .02    -.18
ab5c organization                  .75           .11    -.17
ab5c purposefulness                .67           .18     .24
neo self-discipline                .65           .22     .16
ab5c efficiency                    .63           .36     .21
ab5c rationality                   .50           .16    -.01
neo dutifulness                    .49          -.05     .09

Self-control, Responsibility, and Traditionalism loadings, in the order extracted from the slide (row and column assignment not recoverable):
.14 .10 -.01 -.12 .10 .09 .05 .11 -.10 -.04 -.02 -.11 -.11 -.03 -.02 -.19 -.03 -.07 .12 -.28 .16 .14 -.02 .26
TAPAS Military Meta-Analysis

42 studies or technical reports

1988-2006

Small number of police and fire-fighter studies were also
included

22 TAPAS facets

8 criteria (e.g., task proficiency, contextual
performance, leadership, attrition, fitness)

1494 empirical correlations
TAPAS Military Meta-Analysis
Industriousness Results
Criterion                    N      kd    kc   Observed Validity   Corrected Validity
Job/Task Performance       38964    14    36          .05                 .06
Contextual Performance     19423     9    18          .21                 .26
Counterproductivity        17673     8    17         -.14                -.18
Attrition                  17912     5     8         -.09                -.10
Leadership                  9429    12    20          .15                 .18
Training Performance        6156     8    27          .14                 .17
Adaptability                1291     3     4          .17                 .21
Physical Fitness           18044     5    17          .18                 .23

Validity tables can be used to guide the choice of facets!

TAPAS Civilian Meta-Analysis

Studies or technical reports in the period
1988-2006

Same 8 criterion categories and 22 TAPAS
facets

4755 validity coefficients (so, in total, we
have over 6,000 validities in our database)
“How” TAPAS Measures

Our research on the item response process for
personality stimuli (Stark et al., 2006; Chernyshenko et al., 2007)
suggests that


Response endorsement is driven by the similarity between
the person and the behavior described by the stimulus
(aka, an ideal point process)
Implications:


Different models (not the 3PL or SGR) should be used for
item administration and scoring: e.g., GGUM
Multiple stimuli per item are possible (i.e., pairs)
“How” TAPAS Measures
The choice of 4 response formats will be
available

Single statement dichotomous (Agree/Disagree)
Single statement polytomous (SA, A, D, SD)
Unidimensional pairwise preference (i.e., two-alternative forced choice)
Multidimensional pairwise preference (Stark, 2002)
• Used when faking is likely
Single Statement Scales

Generalized Graded Unfolding Model (GGUM;
Roberts et al., 1998)



Reverse scoring is not needed
Basic idea: a person endorses an item if it
accurately describes him/her
Thus, the probability of endorsement is higher
the closer the item to the person
GGUM IRFs for two
Personality Statements
"I enjoy chatting quietly with a friend at a café."
(Sociability)
"I am about as organized as most people."
(Order)
[Figure: two GGUM IRF panels, one for each statement; x-axis Theta from -3.0 to 3.0, y-axis P(theta) from 0.0 to 1.0]
Multidimensional Pairwise
Preference (MDPP) Format

Create items by pairing stimuli that are similar in
desirability, but representing different dimensions

“Which is more like you?”
• I get along well with others. (A+)
• I set very high standards for myself. (C+)
MDPP Roots: Assessment of
Individual Motivation (AIM)

AIM utilizes forced-choice tetrad format to reduce
social desirability effects

Greater resistance to faking than ABLE (a single
statement personality inventory developed by the
Army researchers)

Low correlations (.00 to .25) with examinee race
and gender and measures of cognitive ability

Predicts attrition and various job and training
performance criteria in research and operational
testing
MDPP Roots: Assessment of
Individual Motivation (AIM)

But, due to quasi-ipsative scoring

AIM items are difficult to create and

Score accuracy cannot be checked against
known scores, because no formal
psychometric model for stimulus
endorsement is available

CAT is not possible without a psychometric
model
IRT Model for Scoring
Multidimensional Pairwise Preference Items
(Stark, 2002; Stark, Chernyshenko, & Drasgow, 2005)
1 = Agree
0 = Disagree
\[ P_{(s>t)_i}(\theta_{d_s}, \theta_{d_t}) = \frac{P_{st}\{1,0\}}{P_{st}\{1,0\} + P_{st}\{0,1\}} \approx \frac{P_s\{1\} \, P_t\{0\}}{P_s\{1\} \, P_t\{0\} + P_s\{0\} \, P_t\{1\}} \]

Respondent evaluates each stimulus (personality statement) separately
and makes independent decisions about endorsement.

Stimuli may be on different dimensions.

Single stimulus response probabilities P{0} and P{1} computed using a
unidimensional ideal point model for “traditional” items (GGUM)
Refer to new pairwise preference model as MDPP
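A minimal sketch of the MDPP preference probability, assuming dichotomous GGUM probabilities for each statement and independent endorsement decisions as described above. The function names and all parameter values are hypothetical.

```python
import math


def ggum_agree(theta, alpha, delta, tau):
    """P(agree) for a dichotomous GGUM statement."""
    x = theta - delta
    agree = math.exp(alpha * (x - tau)) + math.exp(alpha * (2.0 * x - tau))
    return agree / (1.0 + math.exp(3.0 * alpha * x) + agree)


def mdpp_prefer_s(theta_s, theta_t, stmt_s, stmt_t):
    """P(statement s is preferred to statement t).

    theta_s, theta_t: the person's standing on the dimensions measured by
    s and t. stmt_s, stmt_t: (alpha, delta, tau) tuples for each statement.
    """
    ps1 = ggum_agree(theta_s, *stmt_s)
    pt1 = ggum_agree(theta_t, *stmt_t)
    prefer_s = ps1 * (1.0 - pt1)  # endorse s, reject t
    prefer_t = (1.0 - ps1) * pt1  # reject s, endorse t
    return prefer_s / (prefer_s + prefer_t)


# Made-up statements: s located at 0 on its dimension, t located at 2 on
# its dimension. The person is at 0 on both dimensions, so s fits better.
s = (1.0, 0.0, -1.0)
t = (1.0, 2.0, -1.0)
print(round(mdpp_prefer_s(0.0, 0.0, s, t), 3))
```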
MDPP IRF for Item Measuring
Sociability and Order
[Figure: MDPP IRF plot, panel (a)]
MDPP Model Performance

Stark & Drasgow (2002)

.77 correlation between estimated and known
scores in 2-D tests, 20 pairs, 10% unidimensional

Stark & Chernyshenko

.88 for 5-D tests, 50 items, 5% unidimensional

All possible pairings of dimensions was not
required for good parameter recovery
CAT vs. Nonadaptive
Average Correlation Across Dimensions
Rows: % Unidim. (5, 10, 20) by Items Per Construct (5, 10, 20)
Columns: Nonadaptive 3-d, 5-d, 7-d, 10-d; Adaptive 3-d, 5-d, 7-d, 10-d

Values are listed in the order extracted from the slide, in groups of three. Each group appears to run over 5, 10, and 20 items per construct; the assignment of groups to % Unidim. and dimensionality is not recoverable.

Nonadaptive
.73 .85 .93
.73 .85 .93
.74 .85 .92
.72 .87 .93
.74 .85 .93
.74 .85 .93
.76 .87 .93
.75 .86 .94
.74 .87 .93
.76 .86 .94
.75 .87 .94
.75 .86 .94

Adaptive
.87 .93 .96
.87 .92 .96
.87 .92 .96
.85 .93 .96
.87 .93 .96
.84 .90 .96
.86 .93 .96
.85 .93 .96
.86 .93 .96
.87 .93 .96
.88 .93 .96
.87 .93 .96

* CAT yielded similar correlations with only half as many items.
* 10-d CAT correlations > .9 with 100 items (only 5 unidim!).
Summary of MDPP Model
Studies

MDPP items are attractive for applied use:

Faking is more difficult

Can create huge pool with relatively few statements
representing each dimension (20 stimuli = 190 items)

5% unidimensional pairings sufficient for accurate score
recovery

As with SS models, MDPP CAT can reduce test length
by about 50% while maintaining accuracy, which is
important if many dimensions assessed.
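The pool-size figure above is the number of unordered pairs that can be formed from a set of statements. A quick check, using only the standard library (the helper name `n_pairs` is hypothetical, and the count ignores any constraints on which statements may be paired):

```python
from math import comb


def n_pairs(n_statements):
    """Number of distinct pairwise preference items from n statements."""
    return comb(n_statements, 2)


print(n_pairs(20))
```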
Current Empirical TAPAS
Studies

Comparing MDPP format to single statement
(SS) format

Testing what makes forced-choice items
resistant to faking

# of dimensions?

Matching on social desirability?

Matching on statement locations?
Study 1: Benchmark Study

4-D MDPP measure (41 pairs) designed using “conventional
wisdom”

Match stimuli on social desirability (average difference
between SocD did not exceed 1.08 on 5-point scale)

Match stimuli to have different locations on respective
dimensions (average distance 4.3 units on Z-score metric)

4-D SS measure (40 items)

Both measures administered under faking and honest conditions (N
= 510 and N = 574)

2-D SS measure (20 items) – all honest (n=1084)
Very Strong Faking
Instructions!


Unlike in the previous sections where the instructions
asked you to be as honest and accurate as possible,
we now ask that you PRETEND you are not yet in the
Army, but very much want to be. Imagine a recruiter
asks you to take this questionnaire to determine if you
are GOOD ARMY MATERIAL. If you score well, you
will be let into the Army. If you don’t score well, you
will not.
For the remaining sections, you are to answer the test
questions by describing yourself in a way that will
make you look like “good Army material” so you are
sure to pass the test and get into the Army.
Remember you are not yet in the Army, but very much
want to be. In other words, create the best possible
impression of yourself and convince the Army that you
will make a good Soldier.
Study 1: Benchmark Study

Comparability of formats under Honest Conditions
              dom_MDPP   enr_MDPP   ord_MDPP   trad_MDPP
dom_MDPP        1.00       0.27       0.12       0.11
enr_MDPP        0.27       1.00       0.09       0.12
ord_MDPP        0.12       0.09       1.00       0.33
trad_MDPP       0.11       0.12       0.33       1.00
dom_GGUM        0.59       0.22       0.02       0.08
enr_GGUM        0.21       0.49       0.06       0.13
ord_GGUM        0.20       0.15       0.49       0.34
trad_GGUM       0.05       0.10       0.21       0.54
ord_GOLD        0.21       0.13       0.50       0.35
trad_GOLD       0.06       0.10       0.24       0.50
Study 1: Benchmark Study
Scale         Honest   Faking   Difference   Effect Size
dom_MDPP        0.10     0.32       0.21         0.32
enr_MDPP        0.17     0.95       0.78         0.97
ord_MDPP       -0.07     0.32       0.39         0.70
trad_MDPP       0.48     1.56       1.08         1.06
dom_GGUM        0.13     0.44       0.31         0.41
enr_GGUM        0.25     0.65       0.41         0.59
ord_GGUM       -0.19     0.36       0.54         0.71
trad_GGUM       0.65     1.25       0.60         0.77
TRAD_GOLD      31.43    31.31      -0.12        -0.03
ORD_GOLD       29.96    29.69      -0.26        -0.05

MDPP scales created using conventional wisdom are as fakable as SS scales
in strong faking conditions
In faking conditions, respondents chose items with “more positive” location
(i.e., > 20% endorsement shift across conditions)
Study 2: Location Matching

11-D MDPP static measure with 117 items

Match stimuli on similarity in locations (average distance
2.09 z-score units)

11-D SS measure (7 items each)

Both measures administered under faking and honest
conditions (N = 286 and N = 358)

Again, very strong faking instructions
Study 2: Location Matching

Unlike benchmark study, only 20 out of 117
items showed inflated percent endorsement
shifts
Note that we matched only on locations, not
Soc.D
 Scored 97 pair 11-D MDPP measure

Similar correlations across formats as in
benchmark study
 But, less score inflation

Study 2: Location Matching
MDPP Scores     Honest (N=358)   Faking (N=276)   Difference   Effect Size
ORD_MDPP97          -0.08             0.10            0.18         0.38
SOC_MDPP97           0.13             0.06           -0.07        -0.12
TRAD_MDPP97         -0.24            -0.01            0.23         0.30
ENR_MDPP97          -0.77            -0.57            0.20         0.28
DOM_MDPP97          -0.29            -0.33           -0.04        -0.06
IND_MDPP97          -0.72            -0.43            0.29         0.49
INTE_MDPP97         -0.17            -0.01            0.15         0.26
TRUST_MDPP97        -0.24            -0.18            0.07         0.07
CURI_MDPP97          0.01             0.13            0.12         0.20
WELL_MDPP97         -0.38            -0.26            0.12         0.20
PHYC_MDPP97         -0.54            -0.28            0.25         0.42
Compare to: SS scales in benchmarking study had .41 SD inflation for DOM,
and .79 SD inflation for TRAD
Conclusions




MDPP model (Stark, 2002) can be used effectively to score real
MDPP response patterns
 MDPP scores agree with SS scores under honest conditions
Fake resistance of forced-choice format should not be taken for
granted
 E.g., must match on item locations, not just Soc.D
Our MDPP CAT algorithm has constraints on location difference
and Soc.D difference
 Adaptive testing format may further decrease fakability (e.g.,
NCAPS results with UPP scales)
But, there is lots of R&D work to be done…
Current Work
TAPAS is being implemented by the
US Army for enlistment screening
June 8 for applicants without high
school diplomas
 Will it predict their attrition and
counter-productive behaviors?

Current Work



We have about 50 statements for each of
the 13 dimensions that are being used by
the US Army
Are some statements overused? We don’t
have an exposure control algorithm
In principle, each of the approximately 650
statements could be paired with any of the
other 649…but there are lots of constraints
on item selection…
In Sum,

TAPAS designed to bring the latest in
Psychometric theory
 Computer technology
 Personality theory


Our goal is to produce an easily
customizable assessment tool to meet
the needs of diverse users and
researchers