
PSYCH ASSESSMENT REVIEWER

PSYCHOLOGICAL ASSESSMENT
GENERAL CONCEPTS
Uses:
1. Measure differences between individuals or
between reactions of the same individual under
different circumstances
2. Detection of intellectual difficulties, severe
emotional problems, and behavioral disorders
3. Classification of students according to type of
instruction, slow and fast learners, educational and
occupational counseling, selection of applicants for
professional schools
4. Individual counseling – educational and
vocational plans, emotional well-being, effective
interpersonal relations, enhance understanding and
personal development, aid in decision- making
5. Basic research – nature and extent of individual
differences, psychological traits, group differences,
identification of biological and cultural factors
6. Investigating problems such as developmental changes across the lifespan, effectiveness of educational interventions, psychotherapy outcomes, community program impact assessment, and the influence of environment on performance

Tests range from measures of broad aptitudes to measures of specific skills
Features of a psychological test:
 Sample of behavior
 Objective and standardized measure of behavior
 Diagnostic or predictive value depends on how much it is an indicator of a relatively broad and significant area of behavior
 Tests alone are not enough – it has to be empirically demonstrated that test performance is related to the skills for which the person is tested
 Tests need not closely resemble the behavior they try to predict
 Prediction – assumes that the individual's performance on the test generalizes to other situations
 Capacity – tests can measure “potential” only in the sense that present behavior can be used as an indicator of future behavior
 No psychological test can do more than measure behavior
OBJECTIVE MEASUREMENT OF DIFFICULTY
 Objective – scores remain the same regardless of examiner characteristics
 Difficulty – items passed by the greatest number of people are the easiest
STANDARDIZATION
 Uniformity of procedure in administering and scoring the test
 Testing conditions must be the same for all
 Establishing norms (normal or average performance of others who took the same test under the same conditions)
 Raw scores are meaningless unless evaluated against suitable interpretative data
 Standardization sample – indicates average performance and the frequency of deviating by varying degrees from the average
o Indicates position with reference to all others who took the test
o In personality tests, indicates scores typically obtained by average persons
Reliability - Consistency of scores obtained
when retested with the same test or with an
equivalent form of test
Validity
- Degree to which the test measures what
it’s supposed to measure
- Requires independent, external criteria
against which the test is evaluated
- Validity coefficient – determines how
closely the criterion performance can be
predicted from the test score
- Low validity coefficient – low correspondence between test performance and criterion
- High validity coefficient – high correspondence between test performance and criterion
- Broader tests must be validated against
accumulated data based on different
investigations
- Validity is first established on a representative sample of test takers before the test is ready for use
- It is through the validation process that we learn what the test measures
Guidelines in the Use of Psychological Tests
 General: prevent the misinterpretation and misuse of the test to avoid:
o Rendering the test invalid; and
o Hurting the individual
 A qualified examiner needs to:
o Select, administer, score, and
interpret the test
o Evaluate validity, reliability, difficulty level, and norms
o Be familiar with standardized
instructions and conditions
o Understand the test, test-taker, and
testing conditions
o Remember that scores obtained can only be interpreted with reference to the specific procedure used to validate the test
o Obtain some background data in
order to interpret the score
o Obtain information on other special
factors that influenced the score
The test user is anyone who uses test scores
to arrive at decisions
o Most frequent cause of misuse:
insufficient or faulty knowledge
about the test
Ensure the security of test content and
communication
o Need to forestall deliberate efforts
to fake scores
o Need to communicate in order to:
- Dispel the mystery surrounding the test and correct prevalent misconceptions
- Present relevant data about reliability, validity, and other psychometric properties
- Familiarize test-takers with procedures, dispel anxiety, and ensure that best performance is given
- Give feedback regarding test performance
 The test administration:
o Should help predict how the client will behave outside the testing situation
o Influences specific to the testing situation introduce error variance and reduce test validity
o Examiners need to memorize exact verbal instructions, prepare test materials, and familiarize themselves with the specific testing procedure
 The testing conditions
o Suitable testing room with adequate lighting, ventilation, seating facilities, and work space
o Implications of details during testing (e.g. improvised answer sheet, paper-and-pencil vs. computer, familiar examiner vs. stranger)
o Need to:
- Inform test-takers in advance about the purpose of the test
- Explain how to take the test
- Follow the standardized procedure down to the most minute detail
- Take testing conditions into account when interpreting test results
o Some examiners may deviate from procedure to extract more information; however, scores obtained this way can no longer be compared to the norm
 Establish rapport
o Examiner's efforts to arouse interest in the test, elicit cooperation, and encourage test-takers to respond in a manner appropriate to the test objectives
o Any deviation from standard motivating conditions should be noted and used for interpretation
o Maximizing rapport:
- Maintain a friendly, cheerful, and relaxed manner
- Consider examinee characteristics (e.g. for children, consider presenting the test as a game; keep test periods brief)
- Be sensitive to special difficulties
- Give reassurance – no one is expected to finish or to get all items correct (every test implies a threat to a person's prestige)
- Eliminate surprise (e.g. by presenting sample items)
- Convince them that it is in their own interest to obtain a valid and reliable score (e.g. avoiding waste of time, arriving at correct decisions)
Examiner and Situational Variables
- E.g. age, sex, ethnicity, professional or socioeconomic status, training and experience, personality characteristics
- Manner: warm vs. cold, rigid vs. natural
- Testing variables: nature of the test, purpose of testing, instructions given to test-takers
- Examiner's non-verbal behavior (e.g. facial or postural cues)
- Test-taker variables: activities preceding the task, receiving feedback
- In case these conditions cannot be controlled, qualify this in the feedback / report
Training effects
- Coaching – the closer the resemblance between test content and coaching material, the greater the improvement in scores
o When improvement is restricted to specific test content, there is low generalizability of improvement to other criteria
- Test sophistication – repeated testing experience introduces an advantage over first-time test-takers
o Short orientation and practice sessions help equalize test sophistication
- It is more effective to train on broad cognitive skills such as problem-solving:
o Grasping the details and implications of a problem
o Formulating a solution
o Evaluating one's performance
NORMS AND THE MEANING OF TEST SCORES
REMEMBER: In the absence of additional
interpretative data, a raw score on any
psychological test is meaningless.
 Norms – represent the test performance of the standardization sample
 The raw score is converted into a derived score, which:
o Measures relative standing in the normative sample – performance in reference to other persons
o Permits direct comparison on different tests
o Can be expressed in terms of:
- Developmental level attained; or
- Relative position within a specified group
STATISTICAL CONCEPTS
 Statistics – used to organize and summarize quantitative data to facilitate understanding of it
 Frequency distribution – tabulating scores into class intervals and counting how often scores fall within each interval
 Normal curve features:
- Largest number of cases cluster in the center of the range
- Number drops gradually in both directions as the extremes are approached
- Bilaterally symmetrical – 50% of cases fall to the left and 50% to the right of the mean
- Single peak in the center
 Central tendency – a single, most typical or representative score used to characterize the performance of the entire group
 Mean – average; add all scores and divide by the total number of cases
 Mode – most frequent score; the midpoint of the class interval with the highest frequency; the highest point on the distribution curve
 Median – the middlemost score when all scores have been arranged from smallest to largest
 Variability – extent of individual differences around the central tendency (see the sketch below)
o Range – difference between the highest and lowest scores
o Deviation – difference between an individual's score and the mean of the group (x = X − M)
o Standard deviation – square root of the variance; used to compare the variability of different groups
- A higher standard deviation means more individual differences (variation)
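A minimal Python sketch of these statistics using the standard library; the score list is hypothetical:

```python
# Hypothetical raw scores for ten examinees.
from statistics import mean, median, mode, pstdev

scores = [10, 12, 12, 13, 14, 15, 15, 15, 17, 20]

print(mean(scores))    # average: sum of scores / number of cases
print(median(scores))  # middlemost score after sorting
print(mode(scores))    # most frequent score
print(pstdev(scores))  # population SD: square root of the variance
```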
DEVELOPMENTAL NORMS
 Basal age – highest age at and below which all tests were passed
 Mental age – basal age + partial credits in months for tests passed above the basal age level
o The mental-age unit shrinks correspondingly with age
 Grade equivalent – mean raw score obtained by children in each grade
o Disadvantages:
- Appropriate only for common subjects taught across grade levels (e.g. not applicable at the high school level)
- Emphasis on different subjects may vary from grade to grade
- Grade norms are not performance standards

Where X = raw
score, µ = mean,
and σ = standard
deviation
o
Percentile – percentage of persons who fall
below a given raw score
o Indicates person’s relative position in
the standardization sample
o The lower the percentile, the lower the
standing
o Advantages:
Easy to compute
Can be easily understood
Universally applicable
o
Disadvantage: inequality of units
 Shows only the relative position, but not the amount of difference between the scores
- T-score – normalized standard score × 10 + 50
o µ = 50, σ = 10
- Stanine – “standard nine”
o µ = 5, σ = 2
o Percentage of cases in stanines 1–9 under the normal curve: 4, 7, 12, 17, 20, 17, 12, 7, 4 (see the sketch below)
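These conversions can be sketched in Python; the raw score, mean, and SD are hypothetical, and the stanine banding (half-SD-wide bands clipped to 1–9) is one common convention:

```python
import math

def z_score(x, mu, sigma):
    """Linear standard score: distance from the mean in SD units."""
    return (x - mu) / sigma

def t_score(z):
    """T-score: mu = 50, sigma = 10."""
    return 50 + 10 * z

def stanine(z):
    """Stanine: mu = 5, sigma = 2; half-SD-wide bands clipped to 1..9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))

z = z_score(x=65, mu=50, sigma=10)   # hypothetical raw score of 65
print(z, t_score(z), stanine(z))     # 1.5, 65.0, 8
```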
 Ordinal scales – sequential patterning of early behavior development
o Developmental stages follow a
constant order; each stage
presupposes mastery of an earlier
stage
WITHIN-GROUP NORMS
 Standard score – individual's distance from the mean in terms of standard deviation units
o Linear transformation – retains the exact numerical relations of the original raw scores
- Subtract a constant, then divide by a constant
- Also called z-score: z = (X − µ) / σ, where X = raw score, µ = mean, and σ = standard deviation
o Non-linear transformation – fits scores to a specified distribution curve (usually the normal curve)
 Normalized standard scores – scores from a distribution that has been transformed to fit the normal curve
- Compute the percentage of persons falling at or above each raw score
- Locate this percentage in the normal-curve frequency table
- Obtain the corresponding normalized standard score
Example: A score of −1 means that the person surpassed approximately 16% of the group (the distance from −3σ to −2σ covers 2.14% of cases, plus 13.59% from −2σ to −1σ: 2.14 + 13.59 = 15.73 ≈ 16%)
 Deviation IQ
o Ratio IQ = (mental age / chronological age) × 100
- If IQ = 100, mental age = chronological age
o Deviation IQ (DIQ) – a standard score with µ = 100 and σ = 15 (or 16, depending on the test); see the sketch below
o DIQ is only comparable across tests if they have the same mean and standard deviation
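A short Python sketch contrasting the two definitions; the scores and norms are hypothetical:

```python
def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: (MA / CA) x 100."""
    return 100 * mental_age / chronological_age

def deviation_iq(raw, mu, sigma, test_sd=15):
    """Deviation IQ: standard score rescaled to mu = 100, sigma = 15 (or 16)."""
    z = (raw - mu) / sigma
    return 100 + test_sd * z

print(ratio_iq(10, 8))                    # MA 10, CA 8 -> 125.0
print(deviation_iq(65, mu=50, sigma=10))  # z = 1.5 -> 122.5
```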
Relativity of Norms
 IQ should always be accompanied by the name of the test
 An individual's standing may be misrepresented if inappropriate norms are used
Sources of variation across tests:
o Test content
o Scale units of mean and standard
deviation
o Composition of standardization
sample
 Normative sample – ideally, a representative cross-section of the population for which the test is designed
o Sample – group of persons actually
tested
o Population – larger but similarly
constituted group from which the
sample is drawn
o Should be large enough to provide
stable values
o Should be representative of the
population under consideration
 Else, restrict the population
to fit the sample (redefine
population)
o Should consider specific influences
affecting the normative sample
Anchor Norms – used to work out equivalency
between tests
 Equipercentile method – scores are
equivalent if they have equal percentiles on
two tests (e.g. 80th percentile in Test A = IQ
of 115, 80th percentile in Test B = IQ of 120,
therefore Test A’s 115 is Test B’s 120)
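A minimal sketch of equipercentile equating, assuming two hypothetical score samples; it uses the percentile-rank definition given earlier (here with ties split in half):

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, sample):
    """Percentage of persons in the sample falling below the score."""
    s = sorted(sample)
    below = bisect_left(s, score)
    ties = bisect_right(s, score) - below
    return 100 * (below + ties / 2) / len(s)

def equipercentile_equivalent(score_a, sample_a, sample_b):
    """Find the Test B score whose percentile rank matches score_a on Test A."""
    target = percentile_rank(score_a, sample_a)
    return min(sorted(sample_b),
               key=lambda b: abs(percentile_rank(b, sample_b) - target))

a = [90, 95, 100, 105, 110, 115, 120]   # hypothetical Test A scores
b = [85, 95, 105, 115, 120, 125, 130]   # hypothetical Test B scores
print(equipercentile_equivalent(115, a, b))   # -> 125
```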
Specific norms – tests are standardized to more
specific populations to suit the purpose of the test
(a.k.a. subgroup, local norms)
 Fixed reference group – referred to for comparability and continuity of scores
 An independently sampled group against which future test scores are compared
 Updated via an anchor test (a set of common items) that also appeared in the original reference group; adjustments are made by comparing the frequency of correct answers on the common items between the previous group and the present group
DOMAIN-REFERENCED TEST INTERPRETATION
 Aka “criterion-referenced” testing
 Reference is a content domain rather than a group of persons
 Tests mastery of specific content (what can the client do?)
 Content meaning – focus on what test-takers can do vs. how they compare with others
 Should have content that is widely recognized as important
 Should have items that sample each objective
 Best used for testing basic skills at elementary levels
 Mastery testing – whether the individual has or has not attained a pre-established level of mastery
o Individual differences are of little or no importance
o Impractical for content beyond elementary skills because of differing levels of achievement and instruction
 Tests need to cover the critical variables required for performance of certain functions
 Efforts should be made to address the limitations of a single test score:
o The cutoff should be a band of scores rather than a single score from one administration of the test
o Decisions should also depend on other sources of information
o Both test-construction and content experts should decide on cutoff scores
o Cutoff scores should be established on empirical data
RELIABILITY
 Consistency of scores obtained by the same person across time, items, or other test conditions
 Extent to which individual differences in test scores represent “true” differences or chance errors
 Estimates what proportion of test-score variance is error variance
o Error variance – differences in scores resulting from conditions that are irrelevant to the purpose of the test
 No test is a perfectly reliable instrument
CORRELATION COEFFICIENT
- Expresses the degree of relationship between two sets of scores
- A zero correlation indicates the total absence of a relationship
- Pearson Product-Moment Correlation Coefficient – takes into account the individual's position in the group and the amount of deviation from the mean (see the sketch below)
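A minimal Python sketch of the Pearson coefficient as the mean cross-product of z-scores, matching the definition above; the score lists are hypothetical:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson r: average cross-product of paired z-scores."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / len(x)

test1 = [10, 14, 9, 16, 12]   # hypothetical scores, first testing
test2 = [11, 13, 10, 15, 13]  # hypothetical scores, retest
print(pearson_r(test1, test2))
```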
- Statistical significance – whether findings in the sample can be generalized to the population
o “Significant at the .01 level” = there is only about a 1 in 100 chance that the findings in the sample are wrong (i.e. only a 1 in 100 chance that the correlation is actually 0)
- Significance level – the risk of error we are willing to take in drawing conclusions from our data
- Confidence interval – range of scores within which the true score might fall, given a specified level of confidence
- Reliability coefficient – application of the correlation coefficient to psychometric properties
o Acceptable level: .80 – .90
TYPES OF RELIABILITY
Test-Retest Reliability
 Repeat the same test on the same person on another occasion
 Correlate the scores from the two testing occasions
 Source of error variance – random fluctuations in performance between the two occasions
 Shows how far test results can be generalized across situations: the higher the reliability, the lower the susceptibility to random changes
 The length of the interval must be specified; it rarely exceeds six months
 Disadvantage: practice effects
 Best applied to tests in which performance is not affected by repetition (e.g. sensorimotor and motor tests)
Alternate-Form Reliability
 The same person is tested with one form on one occasion and an equivalent, alternate form on another occasion
 Correlate the scores on the two forms
 Measures both temporal stability and consistency of responses to two different item samples
 Source of error variance: content sampling (to what extent does performance depend on the specific items or arrangement of the test?)
 Parallel forms must:
o Be independently constructed;
o Express items in the same form;
o Cover the same type of content;
o Have an equivalent range and level of difficulty;
o Have equivalent instructions, time limits, and sample items
 Disadvantage: reduces but does not completely eliminate practice effects
 Questionable for tasks that change with repetition (e.g. insight tasks)
Split-Half Reliability
 Two scores are obtained for each person by dividing the test into equivalent halves
 Source of error variance: content sampling
 Yields a coefficient of internal consistency
 Requires a single administration of a single form
 A longer test is more reliable
 Spearman-Brown formula – estimates the effect of shortening or lengthening a test; for a doubled test, r_full = 2r_half / (1 + r_half)
o Used because the half-test correlation technically gives the reliability of only half the test (see the sketch below)
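A sketch of split-half reliability with the Spearman-Brown step-up; the half-test scores are hypothetical, and statistics.correlation requires Python 3.10+:

```python
from statistics import correlation  # Python 3.10+

def spearman_brown(r, n=2):
    """Estimated reliability when the test is lengthened n times."""
    return n * r / (1 + (n - 1) * r)

# Hypothetical odd-item and even-item half scores for five examinees.
odd_half  = [10, 14, 9, 16, 12]
even_half = [11, 13, 10, 15, 13]

r_half = correlation(odd_half, even_half)  # reliability of half the test
print(spearman_brown(r_half))              # stepped-up full-test reliability
```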
Inter-item Consistency (a.k.a. Kuder-Richardson Reliability and Coefficient Alpha)
 Single administration of a single form
 Consistency of responses to all items in the test
 Sources of error variance:
o Content sampling
o Heterogeneity of the behavior domain
 The more homogeneous the items, the higher the consistency
 However, is a homogeneous test appropriate for a heterogeneous psychological construct? Is the criterion being predicted homogeneous or heterogeneous?
 Unless the items are highly homogeneous, the KR coefficient will be lower than the split-half coefficient
 Coefficient alpha is used for tests with no right or wrong answers (see the sketch below)
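A minimal sketch of coefficient alpha, which reduces to KR-20 for dichotomous (0/1) items; the response matrix is hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, parallel across examinees.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical 0/1 responses: 3 items x 5 examinees.
items = [[1, 1, 0, 1, 0],
         [1, 0, 0, 1, 1],
         [1, 1, 0, 1, 0]]
print(cronbach_alpha(items))   # ~0.71
```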
Scorer Reliability
 Correlate the scores obtained by two separate scorers
 Factors excluded from error variance:
o True variance (remains in the scores)
o Irrelevant factors that can be controlled experimentally
Interpreting reliability coefficients:
 r = .85 means 85% of score variance is true variance and 15% is error variance
 Analysis of error variance (see the sketch below):
o Error from delayed alternate forms: 1 − .70 = .30 (content + time)
o Error from split-half: 1 − .80 = .20 (content)
o Error variance due to time: .30 − .20 = .10
o Error from scorer reliability: 1 − .92 = .08 (interscorer)
o Total error variance = .10 + .20 + .08 = .38
o True variance = 1 − .38 = .62
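The same partitioning as straight-line Python arithmetic, using the reliability coefficients from the example above:

```python
# Partitioning error variance from several reliability coefficients.
r_alternate_delayed = 0.70   # content + time sampling
r_split_half        = 0.80   # content sampling only
r_scorer            = 0.92   # interscorer agreement

content_error = 1 - r_split_half                           # 0.20
time_error    = (1 - r_alternate_delayed) - content_error  # 0.10
scorer_error  = 1 - r_scorer                               # 0.08

error_variance = content_error + time_error + scorer_error  # 0.38
true_variance  = 1 - error_variance                         # 0.62
print(error_variance, true_variance)
```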
 Speed tests – low-difficulty items with a very short time limit
 Power tests – high-difficulty items with no time limit; difficulty is such that no one can complete all the items
 The reliability of speed tests cannot be measured from a single administration
 Reliability is affected by the range of individual differences in the group
 Reliability is also affected by the average ability level of the group
Standard Error of Measurement
 Another way of expressing reliability: SEM = SD × √(1 − r)
 Gives the interval within which the true score may lie (obtained score ± 1 SEM); see the sketch below
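A sketch using the standard formula SEM = SD × √(1 − r); the SD, reliability, and obtained score are hypothetical:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical: an IQ scale with SD = 15 and reliability .89.
e = sem(15, 0.89)
obtained = 110
print(obtained - e, obtained + e)   # band of +/- 1 SEM around the score
```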
VALIDITY
 What the test measures and how well it measures it
 What can be inferred from the test scores
Content-Description Procedures (Content Validity)
 Systematic examination of the test to evaluate whether it covers a representative sample of the behavior to be tested
 Content must be broadly defined to cover the major objectives
 Important to consider test-taker responses, not just the relevance of the content
Test specifications – content areas or topics
to be covered, objectives, importance of
topics, number of items per topic
o More appropriate for achievement
tests
o Does the test cover a representative
sample of specified skills and
knowledge?
o Is test performance free from irrelevant
variables?
Face validity – whether the test “looks
valid” to test-takers and other technically
untrained observers
o Desirable feature of a test but should
not be a substitute for other types of
validity
Criterion-Prediction Procedures (Criterion-related Validity)
 Correlation between a test score and a direct, independent measure of the criterion
 Indicates the test's effectiveness in predicting performance in specified activities
 The concurrent–predictive distinction is not about time, but about the objective of testing
 Concurrent – used to diagnose existing status (Does the person qualify for the job?)
o Criterion data are already available
 Predictive – used to predict future performance (Does the person have the prerequisites to do well in the job?)
 Avoid criterion contamination (e.g. the rater's knowledge of test scores contaminates criterion ratings)
 Criterion measure examples: academic
achievement, performance in training,
actual job performance, contrasted groups
(extremes of distribution of criterion
measures); psychiatric diagnoses, ratings by
authority, correlation between new test
and previously-available test
 Pre-test and post-test scores – training is valid if items failed in the pre-test are passed in the post-test
 Structural Equation Modeling (SEM) – explores relationships among constructs and the paths through which a construct affects criterion performance
Construct Validity
 Extent to which a test measures a theoretical construct or trait
 Evidence includes research on the nature of the trait and the conditions affecting its development and manifestation
 Age differentiation – used in traditional intelligence tests
 Correlation with other tests – the new test measures approximately the same behavior as the previous test
o Moderate correlation is desirable
 Factorial validity – identification of factors and determination of the factors that impact the scores
 Internal consistency – a measure of homogeneity
o Upper vs. lower criterion group – items that do not show higher scores in the upper criterion group are eliminated
 Convergent – the test correlates highly with other tests it should theoretically correlate with
 Discriminant – the test does not correlate with variables from which it should theoretically differ
Measurement and Interpretation of Validity
 Validity coefficient – correlation between test scores and the criterion
 Conditions affecting validity:
o Demographics of the group
o Sample heterogeneity (e.g. the sample was pre-selected)
o Change over time because of selection standards
o Form of the relationship between test and criterion (linear, curved, curvilinear)
 Heteroscedasticity – unequal variability at high and low score levels (e.g. little variability in Test A scores when Test B scores are low, wider variability in Test A scores when Test B scores are higher)
Uses of Tests for Decision-making
 Selection – the person is either accepted or rejected
 Placement – assignment to different categories based on a single score
 Classification – involves two or more criteria for assignment
 Differential validity – the test should be able to detect differences in a person's likely performance in different jobs or programs (i.e. show that the person is good at Job A but not at Job B)
o The battery should include tests that are good predictors of criterion A and poor predictors of criterion B, and vice versa
 Multiple discriminant functions – determine how closely a set of scores approximates the typical scores in a given job, diagnosis, etc. Used when:
o The criterion is unavailable but group characteristics are known
o There is a non-linear relationship between the criterion and one or more predictors
Test bias
 Slope bias – significantly different validity coefficients in the two groups (differential validity)
 Intercept bias – the test systematically under- or over-predicts criterion performance for a particular group
ITEM ANALYSIS
 Used to shorten a test while increasing its reliability and validity
 Item difficulty – percentage of people passing the item
o Items are usually arranged in order of increasing difficulty
o The higher the inter-item correlations, the wider the spread of item difficulty should be
 Thurstone Absolute Scaling
o Find the scale values of items separately within each group by converting the percentage passing into z-values
o Translate these scale values into corresponding values for the group chosen as the reference group
o The test-score distribution must approximate the normal curve
 Item discrimination – degree to which an item differentiates correctly among test-takers in the behavior being measured
o In contrasted groups, the upper and lower 27% are used
o Purpose: identify deficiencies in the test or in the teaching
 Index of discrimination (D) – difference between the percentage of upper scorers and of lower scorers passing the item (see the sketch below)
 Phi coefficient – relationship between item and criterion
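A minimal sketch of both indices; the counts are hypothetical:

```python
def difficulty(passed, total):
    """Item difficulty p: proportion of examinees passing the item."""
    return passed / total

def discrimination_index(upper_passed, lower_passed, group_size):
    """D: proportion passing in the upper-27% group minus the lower-27% group."""
    return (upper_passed - lower_passed) / group_size

print(difficulty(60, 100))               # p = .60
print(discrimination_index(24, 9, 27))   # D from 27-person extreme groups
```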
Item Response Theory
 Item-test regression – represents both item difficulty and item discrimination
o Difficulty level – the 50% threshold (50% passing, 50% failing)
o Discriminative power – the steeper the curve, the higher the discrimination index
 Item performance is related to an estimated amount of a latent trait
 Item information functions – take all item parameters into account and show how efficiently an item measures behavior at different ability levels
 Item parameters should not vary according to ability levels (see the sketch below)
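A sketch of a two-parameter logistic (2PL) item characteristic curve, one common IRT model; the a and b values are hypothetical (some formulations also include a 1.7 scaling constant):

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve:
    P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
    b = difficulty (theta at the 50% threshold), a = discrimination (slope)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):   # latent-trait levels
    print(theta, round(icc_2pl(theta, a=1.5, b=0.0), 2))
```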
 Cross-validation – independent validation of the test on a group separate from the one on which the items were selected
o Factors lowering validity across groups:
- Size of the original item pool – if the original pool was large and the number of retained items is small, there is a higher chance that the validity of the retained items is spurious, because of more opportunities to capitalize on chance differences
- Size of the sample – a smaller sample has higher error variance
- Items assembled without a theory
 Differential item functioning – identifies items for which persons of equal ability from different cultural groups have different probabilities of success
o Possible reason: the item does not measure the same construct in the two groups
INTELLIGENCE
Intelligence
 Ability level at a given point in time
 A score is not indicative of the reasons behind performance
o Should be descriptive rather than explanatory
 Should not be used to label individuals but to help in understanding them
 Start where they are, assess strengths and weaknesses, make interventions
 Contributes to self-understanding and personal development
 Not a single entity but a composite of several functions
o A combination of the abilities required for survival and advancement within a culture
 Intelligence tests are largely measures of scholastic aptitude or academic achievement
o Reflective of prior educational achievement
o An indicator of future performance
o An effective predictor of performance in various occupations and daily-life activities
 Should not be the only basis for making decisions
Heritability and Modifiability
 Heritability index – how much of the variation in scores is due to genetics?
o Obtained from correlations of monozygotic and dizygotic twins (see the sketch below)
 Limitations
o Applicable to populations, not individuals – e.g. an individual's mental retardation can still be due to a defective gene
o Limited to population characteristics at a given time
o Does not indicate modifiability
 IQ is not fixed and unchanging; it can be modified
o Changes can result from life events or environmental interventions
o Training in cognitive skills, problem-solving strategies, and efficient learning habits
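One common twin-based estimate is Falconer's formula, h² = 2(r_MZ − r_DZ), which doubles the extra similarity of identical over fraternal twins; a sketch with hypothetical twin correlations:

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's estimate: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)

print(falconer_h2(r_mz=0.86, r_dz=0.60))   # hypothetical values -> 0.52
```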
Motivation
 Personality is not independent of aptitude
 Aptitudes cannot be investigated independently of affect
o Prediction of subsequent performance can be enhanced by combining test scores with information about motivation and attitudes
 Achievement in other areas can help shape cognitive performance (self-concept)
Theories of Intelligence Organization
Two-Factor Theory
 Charles Spearman
 All intellectual activities share a common factor (g)
 (s) – specific factors limited to particular abilities
 Only g accounts for the correlation between performances on two intelligence tests
 Aim of testing: measure the amount of g
 A single test highly saturated with g can be substituted for a test with heterogeneous items
 Tests of abstract relations are the best measures of g
 Group factors – correlations above and beyond g (e.g. arithmetic, mechanical, linguistic) – common to some but not all activities
Multiple Factor Theories
 Thurstone
 Group factors called “primary mental abilities”:
o Verbal comprehension – reading comprehension, verbal analogies, verbal reasoning, etc.
o Word fluency – anagrams, rhyming, naming words within a category
o Number – speed and accuracy of arithmetic operations
o Space – perception of fixed spatial or geometric relations, manipulatory visualizations
o Associative Memory
o Perceptual Speed – quick and accurate grasp of visual details, similarities, and differences
o Induction / General Reasoning – finding a rule and applying it to new cases
Structure of Intellect Model
 Guilford
 Mental abilities can be traced to underlying factors, categorized along three dimensions:
o Operations – what the person does (cognition, memory recording and retention, divergent and convergent production, evaluation)
o Contents – the nature of the materials on which operations are performed (visual, auditory, symbolic, semantic, behavioral)
o Products – the form in which information is processed (units, classes, relations, systems, transformations, implications)
Cattell-Horn-Carroll Theory (CHC)
 Cattell
o Fluid intelligence – broad ability to reason, form concepts, and solve problems using unfamiliar information or novel procedures
o Crystallized intelligence – breadth and depth of a person's acquired knowledge, the ability to communicate one's knowledge, and the ability to reason using previously learned experiences or procedures
 Carroll
o Three strata of abilities: general intelligence (g) at the top, broad abilities in the middle, and narrow abilities at the bottom
ABILITY TESTS
 Index of the general level of performance
 Often designated as tests of scholastic aptitude or academic achievement
Nature and Development of Traits
 Differences in factor patterns are influenced by experiential background
 Change over time is also observed, partly because of different methods of carrying out the same task
 Mechanisms
o Learning set – learning through the presentation of different problems of the same kind
o Transfer of training – formal schooling, where efficient and systematic problem-solving techniques are learned
o Co-occurrence of learning experiences – learn one, learn all, in a proper environment
 Processing skills tend to be specific to the type of content being processed (domain specificity)
o Domain – content (linguistic, mathematical) or context (cultural, social, geographical)
 Existing intelligence tests are largely measures of scholastic achievement. Are there tests that will measure so-called “practical, everyday” intelligence?
Individually-Administered Intelligence Tests
 Stanford-Binet Intelligence Scales (5th Edition)
o Age range: 2–85+ years old
o Verbal and Non-verbal domains, each covering:
- Fluid Reasoning
- Knowledge
- Quantitative Reasoning
- Visual-Spatial Processing
- Working Memory
o Reliability: split-half, test-retest, inter-scorer
o Validity: Content, Construct (age differentiation)
 Wechsler Scales
o Wechsler Preschool and Primary Scale of Intelligence (WPPSI-R), Wechsler Intelligence Scale for Children (WISC-IV), Wechsler Adult Intelligence Scale (WAIS-IV)
 Verbal
o Information
o Comprehension
o Similarities
o Vocabulary
 Performance
o Block Design
o Matrix Reasoning
o Visual Puzzles
o Picture Completion
o Figure Weights
 Working Memory
o Digit Span
o Arithmetic
o Letter-Number Sequencing
 Processing Speed
o Symbol Search
o Coding
 Reliability: test-retest, inter-scorer
 Validity: Construct (convergent with other cognitive abilities including motor, memory, language, attention)
 Differential Ability Scales
o Measure specific abilities rather than a global IQ
- Core subtests = General Conceptual Ability
- Diagnostic subtests – relatively independent abilities
- Achievement tests
o Reliability: internal consistency, test-retest
o Validity: Criterion (with Wechsler, SB), Construct
 Kaufman Scales
Tests for Special Populations
Infant and Preschool Testing
 Require individual administration
 Bayley Scales of Infant Development
o Assess current developmental status rather than predict subsequent ability
o Mental scale – sensory and perceptual abilities, memory, learning, problem-solving, vocalization, verbal communication, abstract thinking
o Motor scale – gross motor abilities
o Behavior rating scale – personality development: emotional and social behavior, attention span and arousal, persistence, goal-directedness
 McCarthy Scales of Children's Abilities
o Index of functioning at the time of testing
- Verbal
- Perceptual-Performance
- Quantitative
- General Cognitive
- Memory
- Motor
 Piagetian Scales
o Presuppose a uniform sequence of development through successive stages
- Object permanence
- Development of means to achieve ends
- Imitation
- Operational causality
- Object relations in space
- Development of schemata for relating to objects
Comprehensive Assessment of Mentally Retarded Persons
 Mental retardation – substantial limitations in present functioning
o Sub-average intellectual functioning concurrent with limitations in two or more of the following: communication, self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and work
 Vineland Adaptive Behavior Scales – focus on what the individual habitually does rather than what he or she can do
Testing Persons with Physical Disabilities
 Modify the testing medium, time limits, and content of tests
 Individualized assessment using a variety of data from different sources
 Hearing impairments – usually handicapped on verbal tests
 Visual impairments – adapt tests orally; performance tests are generally unusable
 Motor impairments – may be unable to compose oral or written responses; remove time limits; test-takers are more prone to fatigue
Multicultural Testing
 Language and speed removed as parameters
 Varying test content
 Raven's Progressive Matrices
o A measure of g
o Requires eduction of relations among abstract items
 Culture-Fair Intelligence Test
o Cattell
o A test of fluid reasoning
o Inductive reasoning – making broad generalizations based on available data
o Thinking logically and solving problems in novel situations, regardless of acquired knowledge
 Series - Choose which best
completes the series
 Classification - Identify two figures
which are in some way different
from others
 Matrices - Complete the design or
matrix presented
 Conditions - Select the one that
duplicates the conditions given
 Goodenough-Harris Drawing Test
o Measures accuracy of observation and development of conceptual thinking
o The test may measure different functions at different ages
Approaches to cross-cultural testing
1. Choose items that are common across
cultures; validate against local criteria
2. Develop a test within one culture and
administer it to persons with different
cultural backgrounds
3. Different tests are developed for each
culture, validated, and used only within
that culture
Group Tests
 Used in the educational system, government service, industry, and the military
 Typically employ a multiple-choice format for uniformity and objectivity in scoring
 Items of increasing difficulty arranged in separately timed subtests
 Spiral-omnibus format – a single long time limit, with mixed items of increasing difficulty
 Advantages
o Can be administered to many people simultaneously
o Greatly simplifies the examiner's role
o Provides more uniform testing conditions
o Scoring is more objective
o Provide better-established norms
 Disadvantages
o Less opportunity for rapport, maintaining cooperation, and interest
o Less likely to detect extraneous interfering variables
o Examinees have restricted responses – original thinkers are penalized
o Little to no opportunity for direct observation
o Lack of flexibility
Multi-level batteries
 Sample the major intellectual skills found to be prerequisite to schoolwork
 Suitable for schools because of comparability across levels
 Youngest age suitable for group testing: kindergarten / 1st grade
Cognitive Abilities Test
o Verbal – verbal classification, sentence
completion, verbal analogies
o Quantitative – quantitative relations,
number series, equation building
o Nonverbal – figure classification, figure
analogies, figure analysis
Test of Cognitive Skills
o Sequences – understanding and
applying rules of arrangement in
patterns of figures, letters, or numbers
o Analogies – identifying the relationship
and applying the principle to select a
second pair exhibiting the same
relationship
o Verbal Reasoning – identification of
essential elements in objects or things,
inferring relationships between sets of
words, drawing logical conclusions from
verbal passages
o Memory – definitions of a set of
artificial words are presented and recall
is tested after other tests have been
given
Tests for Multiple Aptitudes
 Used because of:
o Intraindividual variation in performance on intelligence scales
o Intelligence tests being found to be primarily measures of verbal comprehension
 Differential Aptitude Tests
o For educational and career counseling of students in grades 8–12
o Verbal Reasoning, Numerical Reasoning, Abstract Reasoning, Perceptual Speed and Accuracy, Mechanical Reasoning, Space Relations, Spelling, Language Usage
Multidimensional Aptitude Battery
o Group test designed to measure the
same aptitudes as WAIS-R
o Suitable for adolescents and adults, not
for individuals with mental disturbance
or retardation
o Provides fully interpretable scores at
the subtest level (T-score), Verbal and
Performance Level, and Overall Total
Score
Psychological Issues in Ability Testing
 Nature of intelligence
o Intelligence is complex and dynamic
o Intelligence-test performance is highly stable
o Intelligence develops cumulatively
 Environmental contributions to intelligence
o Environmental stability contributes to IQ stability
o Prerequisite learning skills contribute to subsequent learning
o Functional academics interact with personality characteristics
o Rises or drops may occur as a result of environmental changes
 Genetics and development
o Preschool tests have moderate predictive validity; infant tests have none
o In the absence of inborn pathology, environment plays the major role in subsequent development
o Developmental transformations – rudimentary skills in infancy are transformed with age into more complex manifestations
o Individual differences within an age level are greater than individual differences across age levels
o Changes that occur with aging vary with the individual
o Demands on adults differ from those on school-age children (practical vs. academic information)
o Mean IQ of adult performance has increased over the years (Flynn effect)
 Research trends
o Cross-sectional analyses of IQ trends – older adults show lower IQs (because older cohorts received less education)
o Longitudinal studies – scores tend to improve with age
 Culture
o Cultural changes, and not simply age, determine rises and declines in performance
o Cultural influences will and should be reflected in test scores
o Cultural differences can become a handicap when the individual moves from one culture to another and attempts to succeed in the latter culture
PERSONALITY
 Measures the emotional, motivational, interpersonal, and attitudinal characteristics of a person
Development of Personality Tests
 Content-related Procedures
o Obtain information regarding a psychological construct; create items consistent with that construct
o Example: Woodworth Personal Data Sheet – information regarding psychiatric and pre-neurotic symptoms
Empirical Criterion-Keying
o Development of scoring key in terms of
some external criterion
o Select items that differentiate between
clinical samples and normal population
o Example: if 25% or more “normal”
people answered an item unfavorably,
it could not be “abnormal” since it is
present in the “normal” population
with such frequency
o Responses are treated as diagnostic or
symptomatic of the criterion behavior
with which they are associated
 Examples: Minnesota Multiphasic
Personality Inventories (MMPI),
California Psychological Inventory,
Personality Inventory for Children
Factor analysis
o Systematic classification of personality
traits
o Example:
Guilford-Zimmerman
Temperament Survey
 Personality Theories
o Biopsychosocial
- Source of reinforcement (detached, discordant, dependent, independent, and ambivalent)
- Pattern of coping behavior (active vs. passive)
- Not a general personality instrument
- Helps in differential diagnosis
- Example: Millon Clinical Multiaxial Inventory
o Manifest Needs System (Henry Murray)
- Results in ipsative scores – the strength of a need is expressed in relation to other needs within the individual, which makes normative comparisons questionable
- Example: Edwards Personal Preference Schedule
 Achievement - need to
accomplish tasks well
 Deference - need to conform to
customs and defer to others
 Order - need to plan well and
be organized
 Exhibition - need to be the
center of attention in a group
 Autonomy - need to be free of
responsibilities and obligations
 Affiliation - need to form strong
friendships and attachments
 Intraception - need to analyze
behaviors and feelings of
others
 Succorance - need to receive
support and attention from
others
 Dominance - need to be a
leader and influence others
 Abasement - need to accept
blame for problems and
confess errors to others
 Nurturance - need to be of
assistance to others
 Change - need to seek new
experiences and avoid routine
 Endurance - need to follow
through on tasks and complete
assignments
 Heterosexuality - need to be
associated with and attractive
to members of the opposite sex
 Aggression - need to express one's opinion and be critical of others
Example: Personality Research Form and other Jackson inventories
o Behaviorally-oriented and mutually-exclusive definitions of 20 personality constructs
o For predicting the behavior of individuals in normal contexts
Test-Taking Attitudes and Response Bias
 Faking
o Faking good – choosing answers that create a favorable impression
o Faking bad – choosing answers that make the test-taker appear more disturbed
o Face validity increases susceptibility to faking
 Social desirability – the test-taker is unaware of putting up a “good front”
 Impression management – conscious dissembling to create a specific effect
o Countermeasure: forced-choice items
 Acquiescence – tendency to answer True or Yes
o Countermeasure: the number of items keyed positively should equal the number keyed negatively
 Deviation – tendency to give unusual or uncommon responses
Traits, States, Persons, and Situations
 Behavior can be explained by traits, by states, and by their interaction
 Individuals differ in the extent to which they alter behavior to meet the situation
 Different behavior settings influence behavior
 Trait – relatively stable
 State – a transitory condition
Measuring Interests and Attitudes
Values
 Difficulties with value inventories:
o Sampling systematically
o Choosing an appropriate level of abstraction
o Defining value domains
o Early inventories were incompatible with contemporary definitions
Interest Inventories
 Interest testing – used for educational and career assessment
o Also stimulated by occupational selection and classification
 Opinion and attitude measurement – used in:
o Social psychology research
o Consumer research and employee relations
 Exploration validity – interest inventories increase the behaviors needed for career exploration
o Used to introduce the individual to careers he or she has not previously considered
 Issue: sex fairness
o Because tests are validated against existing groups, they can perpetuate group differences
 Example: Strong Interest Inventory
o “Like,” “Indifferent,” or “Dislike” responses to 5 categories:
- Occupations
- School subjects
- Activities
- Leisure activities
- Day-to-day contact with various people
o Levels of scores:
- General Occupational Themes (Realistic, Investigative, Artistic, Social, Enterprising, Conventional – RIASEC)
- Basic Interest Scales
- Occupational Scales
o Personal Style Scales (e.g. work style, learning environment, leadership style, risk-taking)
o Validity: Criterion
Example: Jackson Vocational Interest
Survey
o Work roles – what a person does on the
job
o Work styles – preference for situations
or environments
o 34 basic interest scales, 26 work roles,
and 8 work styles
o Equally applicable to both sexes
o Validity: Construct
Example: Kuder Occupational Interest Survey
o Uses forced-choice triads (most liked to least liked)
o 10 broad interest areas: Outdoor, Mechanical, Computational, Scientific, Persuasive, Artistic, Literary, Musical, Social Service, Clerical
o Grouped based on content validity
o Scores expressed as the correlation between the respondent's scores and the interest pattern of a particular occupational group
Example: Self-Directed Search
o Self-administered, self-scored, self-interpreted
o Holland – occupational preference as a choice of a way of life
- Individuals seek environments that are congruent with their personality types
- Vocational choices are implementations of self-concepts
 Trends
o Expansion of occupational levels
o Effect of the inventory on the test-taker
o RIASEC model is not a good fit for minority and other cultures
Opinion Surveys and Attitude Scales
 Attitude – tendency to react favorably or unfavorably to a class of stimuli (e.g. an ethnic group, custom, or institution)
o Cannot be directly observed
o Must be inferred from verbal and nonverbal behavior
 Opinions – replies to specific questions

Other Assessment Techniques
Measures of Styles and Types
 Cognitive style – preferred and typical modes of perceiving, remembering, thinking, and problem-solving
 Individuals differ in how they perceive and categorize situations, which depends on prior learning and experience
 Aptitude cannot be investigated independently of affect
o Example: performance on perceptual tasks is related to attitude, motivation, and emotion
o Example: flexibility of closure is associated with being socially retiring, independent, and analytical
o Example: field dependence – the extent to which one's perception of what is upright is influenced by the surrounding visual field; sometimes called “cognitive control”
- Field-independent persons take an active, participant approach to learning
 Personality types – constructs used to explain similarities and differences in preferred modes of thinking, perceiving, and behaving across individuals
o Example: Myers-Briggs Type Indicator
- Attitude: Introversion vs. Extraversion
- Ways of Perceiving: Sensing vs. Intuition
- Ways of Judging: Thinking vs. Feeling
- Lifestyle: Judging vs. Perceiving
- All types are valuable and necessary; each has strengths and weaknesses
- Individuals are more skilled within their preferred functions, processes, and attitudes
o Criticism: type labels can harden into stereotypes, and continuous scores are forced into categorical data
Situational Tests
 Place the individual in a situation closely resembling a “real-life” criterion situation
 Character Education Inquiry – makes use of familiar, natural situations in one's routine
o Measures honesty, self-control, altruism
 Situational Stress Test – samples the individual's behavior in a stressful, frustrating, or emotionally disruptive environment
 Leaderless Group Discussion – the group is assigned a topic for discussion; measures verbal communication, verbal problem-solving, and acceptance by peers
 Role-playing
Self-Concepts and Personal Constructs
 How events are perceived by the individual
 Extent of self-acceptance by the individual
 Capacity to conceptualize the self – the capability to take distance from one's self and one's impulses
o Manifests in test-taking defensiveness, response sets, social desirability
o Increases with age, education, SES, and intelligence
 Example: Washington University Sentence Completion Test – measures levels of ego development
o Presocial, Impulsive, Self-Protective, Conformist, Self-Aware, Conscientious, Individualistic, Autonomous, and Integrated
Self-Esteem Inventories and Others
o Self-esteem – the evaluative component of the self-construct; an individual's evaluation of his or her own performance
o Example: Adjective Checklist – 300 adjectives and adjectival phrases commonly used to describe a person's attributes
o Example: Q-Sort – given piles, arrange cards from “most characteristic” to “least characteristic” in a forced-normal distribution (the examiner specifies the number of cards to be placed in each pile)
o Example: Semantic Differential – examines the connotations of any given concept for the individual (e.g. on a scale from 1 meaning bad to 7 meaning good, how would you rate “Father”?)
- Evaluative factor (good-bad, valuable-worthless, clean-dirty)
- Potency factor (strong-weak, large-small, heavy-light)
- Activity factor (active-passive, fast-slow, sharp-dull)
Observer Reports
 Naturalistic observation – direct observation of spontaneous behavior in natural settings (e.g. diary method, time sampling)
o No control is exerted over the stimulus situation
 Interview – elicits life-history data
o Can range from highly structured to unstructured
o Affords direct observation
 Ratings – evaluation of the individual based on cumulative, uncontrolled observations
o Disadvantages:
- Ambiguity
- Amount of relevant contact
- Halo effect
- Error of central tendency
- Leniency error
 Nominating technique – choose the person with whom one would most like to study, work, or eat lunch
o Can identify potential leaders and isolates
o Good concurrent and predictive validity, because of the high number of raters, raters being in a good position to observe, and the observers' opinions influencing the observed behavior
APPLICATIONS OF TESTING
Educational Testing
 Prediction and classification within a specific educational setting
 Uses educational achievement tests

Biodata
 Interviews and questionnaires to elicit life-history data
 Consistently good predictors of performance
 Developed through:
o Criterion keying and cross-validation
o Identification of constructs through job analyses and surveys
Achievement Tests

Measures effects of specific program of
instruction or training
Measure effects of relatively standardized
experiences (controlled, known)
Aptitude – cumulative influence of different
learning experiences in daily living
o Measure effect of learning under
relatively uncontrolled or unknown
conditions
Ability – any measure of cognitive behavior
o Sample of what individual knows at the
time of testing
o Level of development in one or more
abilities
o Includes
both
aptitude
and
achievement
No two tests correlate perfectly with one
another
o Difference in achievement and ability
could be about over prediction or under
prediction
Is objective, uniform, and efficient
Functions:
o Reveal weaknesses in learning
o Give direction to subsequent learning
o Motivate learner


Provide means of adapting to individual
results
o Aid in evaluating teaching
o Aid in formulating educational goals
(analyze educational objectives, critical
examination of content of instruction
methods)
Item format
o Multiple choice is often used

 Promote rote memorization
 Learning of isolated facts vs.
development of problemsolving
and
conceptual
understanding
o Constructed-response / open-ended =
requires examinee to generate an
answer
o Portfolio assessment – cumulative
record of a sample of a student’s work
in various areas over a period of time
General Achievement Batteries
o Provide profiles of scores on individual subtests or major academic areas
o Allow horizontal and vertical comparisons
o The large majority have overlapping items across levels
o Some are concurrently normed with aptitude tests, which enables direct comparison of scores
 Tests of Minimum Competency
o Ascertain mastery of basic skills
 Teacher-Made Classroom Tests
 Tests for College Level
o Used for placement and admissions
 Graduate School Admission tests
o Used for admission decisions, fellowships, and special appointments
 Diagnostic and Prognostic Testing
o Diagnosis of learning disabilities
o Prognostic – predicts probable performance in a course
o Teach-test-teach – how well one can learn during one-to-one instruction
 Assessment in Early Childhood Education
o Measures the outcomes of early childhood education
o School readiness – attainment of the prerequisite skills, knowledge, attitudes, motivation, and behavior needed to profit from school instruction
o Emphasis on the abilities required for learning to read
Occupational Testing
 Selection and classification of personnel
 Individuals should be placed in the jobs for which they are best qualified
 Traits irrelevant to job requirements should not affect selection decisions
 Selection tests should be validated against job performance
Global Procedures for Performance Assessment
 Job sample – the task is part of the work to be performed, but all applicants operate under uniform conditions
 Simulations – reproduce the functions of the job
 Job analysis – identifies the requirements that differentiate one job from other jobs
o Description of job activities in terms of behavioral requirements
o Identifies the aspects of performance that differentiate good and poor workers
o Facilitates effective use of tests across jobs that may seem different
 Job elements – units describing critical work requirements
 Synthetic validation – it is possible to identify skills, knowledge, and performance requirements common to many jobs
o Job analysis identifies the elements and their relative weights
o Analysis and empirical study of each test determine the extent to which it measures proficiency in performing the job elements
o The validity of each test is derived from the weights of these elements in the job and in the test
 Validity generalization – application of prior validity findings to a new situation via meta-analysis
 Multiple Factor Theory of job performance
o Considers behaviors under the worker's control plus environmental conditions
o Effectiveness + productivity + utility
o Any job has multiple performance components, consisting of various combinations of knowledge, skills, and motivations
 Tests of verbal and numerical reasoning have some predictive validity for many different jobs; however, additional variables need to be measured
 Special aptitude tests – for abilities “supplemental” to IQ, such as mechanical or musical abilities
o For abilities specific to situations, not included in standard batteries
o Example: psychomotor tests – manual dexterity, motor, perceptual, mechanical abilities
o Mechanical aptitudes – rapid manipulation of items, spatial manipulation / perception
o Clerical aptitudes – perceptual speed and accuracy
o Computer-related aptitudes
o Social and emotional aspects of intelligence – knowledge, skills, and abilities of examinees in interpersonal relations and self-management
Personality Testing in the Workplace
 Identify the personality dimensions most relevant to specific jobs
 Examples:
o Emotional stability – quick decision-making under stressful conditions
o Agreeableness – needed for extensive interpersonal contact
 Integrity tests – the applicant's attitude toward, and history of involvement in, illegal behaviors
 Leadership – ability to persuade others to work toward a common goal
Clinical and Counseling Psychology
 Individual intelligence tests, educational tests, brief questionnaires, and rating scales
 For diagnostic, prognostic, and therapeutic decisions in mental health settings
 Psychological assessment – intensive study of one or more individuals through multiple sources of data
o Provides an integrated picture of the individual
o Multiple sources protect against overgeneralizing from test data
o Aim: making informed decisions pertaining to differential diagnosis, career selection, treatment, education, and forensic questions
o A continuous process of hypothesis generation and testing
o Involves professional judgment based on knowledge about the specific problems of specific populations
o Ecological viewpoint – also consider the context of the person's life
Intelligence Tests
 Explore patterns of test scores for strengths and weaknesses
 Profile analysis:
o Amount of scatter or variation among scores
o Base-rate data – frequency of such features in the normative population
o Score patterns typical of special populations / clinical syndromes
 Irregularities in performance suggest avenues for exploration
 Observe general behavior in the context of testing
 Integrate the test's statistical information with knowledge of human development, personality theory, etc.
 Consider both skills and extraneous conditions
 Need for supplementary information
 Calls for individualized interpretation of test performance rather than uniform application of any type of pattern analysis
Neuropsychological Assessment
 Applies what is known about brain-behavior relationships to the diagnosis and treatment of brain-damaged individuals
 E.g. left-hemisphere lesion = Verbal < Performance on the Wechsler; the opposite pattern appears in right-hemisphere lesions and diffuse brain damage
 Age affects the behavioral symptoms caused by brain damage
o Amount of prior learning, intellectual development
o The younger the age, the greater the effect of brain damage on intellectual functioning
 Chronicity – the amount of time elapsed since the injury affects physiological changes and behavioral recovery through learning / compensation
 Intellectual impairment may be an indirect result of brain damage
 The same behavior may be due to organic, emotional, or mixed causes
 The premorbid ability level is needed to examine the extent of damage
 Instruments: tests of perception of spatial relations and memory for newly learned material (example: Bender Visual-Motor Gestalt Test)
 Results are difficult to interpret in terms of score patterns
 Batteries can measure all significant neuropsychological skills:
o Detect brain damage
o Help identify and localize the damaged area
o Differentiate among syndromes
o Plan rehabilitation by identifying the type and extent of behavioral deficits
Identifying Learning Disabilities
 Specific learning disability
o A disorder in the basic psychological processes involved in using and understanding spoken or written language, which manifests in an imperfect ability to listen, think, speak, read, write, spell, or do mathematical calculations
o Does not include children whose learning problems result from economic, environmental, or cultural disadvantage
 Severe discrepancy between ability and achievement in different communication and math skills
 Not achieving in a manner commensurate with age and ability levels, even with proper education
 Shows normal or above-normal intelligence, with difficulties in learning one or more basic skills
 May also manifest in difficulty perceiving and encoding information, poor integration of input from different senses, and disruption of sensorimotor coordination
 Also: aggression, affective and interpersonal problems arising from academic failure and frustration
 Assessment uses different sources because:
o Various behavioral disorders are associated with LD
o There are individual differences in the combination of symptoms
o There is a need for specific information on the nature and extent of the disability
 Dynamic assessment – deliberate departure from standardized or uniform test administration to elicit qualitative data
o “Testing the limits” – additional cues are provided
o Learning potential assessment – teach-test-teach
o Disadvantages:
- Transportability – the extent to which it can be used by others
- Generalizability of trained problem-solving to real-life problems
Behavioral Assessment
 Define the problem through functional analysis of behavior
 Select appropriate treatments
 Assess behavior change resulting from treatment
 Procedures:
o Self-report
o Direct observation
o Physiological measures (for anxiety, sex, and sleep disorders)
Career Assessment
 Integrate information from expressed interests, preferences, and the value system
 Career maturity – mastery of the vocational tasks appropriate to one's age level
Clinical Judgment
 Influenced by cultural stereotypes and fallacious prediction principles
 Used when satisfactory tests are unavailable
 Suited for cases that are rare and idiosyncratic, where the frequency is too low for the development of statistical strategies
 Psychologists with low levels of cognitive complexity are more likely to form biased clinical judgments
The Assessment Report
 There is no standard form or outline
 The report must be adapted to the needs, interests, and background of those who will receive it
 Select what is relevant to answering the referral questions
 Concentrate on the individual's differentiating characteristics rather than on traits on which the individual is average
 Barnum effect – pseudo-validation from general, vague statements that apply to most people
ETHICAL AND SOCIAL CONSIDERATIONS
1. Do no harm.
a. Provide services and use techniques in which
they have been trained
b. Choose tests that are appropriate for the
purpose and for the examinee
c. Recognize boundaries of competencies and
limitations of expertise
2. Be sufficiently knowledgeable about the science
of human behavior to guard against unwarranted
inferences in interpretation.
3. Protect safety and security of test materials.
4. Protect safety and security of examinees
a. Persons should not be subjected to testing
programs under false pretenses
b. Protect the individual’s privacy
i. Information to be asked must be relevant to
purpose
ii. Informed consent should include the purpose
of testing, data needed, and use of scores
c. Test-taker should have the opportunity to
comment on the report
i. The report should be readily understandable,
free from technical jargon and labels, and oriented
towards the immediate objective of testing
d. Records should not be released without the
knowledge or consent of the examinee, unless
mandated by law