Uploaded by Gewille Dynne

psychological-assessment-reviewerpdf

PSYCHOLOGICAL ASSESSMENT
GENERAL CONCEPTS
Uses:

1. Measure differences between individuals or between reactions of the same individual under
different circumstances
2. Detection of intellectual difficulties, severe emotional problems, and behavioral disorders
3. Classification of students according to type of instruction, slow and fast learners,
educational and occupational counseling, selection of applicants for professional schools
4. Individual counseling – educational and vocational plans, emotional well-being, effective
interpersonal relations, enhanced understanding and personal development, aid in decision-making
5. Basic research – nature and extent of individual differences, psychological traits, group
differences, identification of biological and cultural factors
6. Investigating problems such as developmental changes in the lifespan, effectiveness of
educational interventions, psychotherapy outcomes, community program impact
assessment, influence of environment on performance
Tests measure anything from broad aptitudes to specific skills
Features of a psychological test:
 Sample of behavior
o Objective and standardized measure of behavior
o Diagnostic or predictive value depends on how much it is an indicator of relatively broad
and significant areas of behavior
o Tests alone are not enough – it has to be empirically demonstrated that test
performance is related to the skill set for which the test-taker is tested
o Tests need not closely resemble the behavior they are meant to predict
o Prediction – assumes that the performance of the individual in the test generalizes to
other situations
o Capacity – can tests measure “potential”?
 Only in the sense that present behavior can be used as an indicator of future
behavior
o No psychological test can do more than measure behavior
 Standardization
o Uniformity of procedure when administering and scoring a test
o Testing conditions must be the same for all
o Establishing norms (normal or average performance of others who took the same test
under the same conditions)
o Raw scores are meaningless unless evaluated against suitable interpretative data
o Standardization sample – indicates average performance and frequency of deviating by
varying degrees from the average
 Indicates position with reference to all others who took the test
 In personality tests, indicates scores typically obtained by average persons
 Objective measurement of difficulty
o Objective – scores remain the same regardless of examiner characteristics
o Difficulty – items passed by the greatest number of people are the easiest
 Reliability
o Consistency of scores obtained when retested with the same test or with an equivalent
form of test
 Validity
o Degree to which the test measures what it’s supposed to measure
o Requires independent, external criteria against which the test is evaluated
o Validity coefficient – determines how closely the criterion performance can be predicted
from the test score
 Low VC – low correspondence between test performance and criterion
 High VC – high correspondence between test performance and criterion
o Broader tests must be validated against accumulated data based on different
investigations
o Validity is first established on a representative sample of test takers before it is ready for
use
 Tells us what the test is measuring
 Tells us the extent to which we know what the test measures
Guidelines in the Use of Psychological Tests
 General: prevent the misinterpretation and misuse of the test to avoid:
o Rendering test invalid; and
o Hurting the individual
 A qualified examiner needs to:
o Select, administer, score, and interpret the test
o Evaluate validity, reliability, difficulty level, and norms
o Be familiar with standardized instructions and conditions
o Understand the test, test-taker, and testing conditions
o Remember that scores obtained can only be interpreted with reference to the specific
procedure used to validate the test
o Obtain some background data in order to interpret the score
o Obtain information on other special factors that influenced the score
 The test user is anyone who uses test scores to arrive at decisions
o Most frequent cause of misuse: insufficient or faulty knowledge about the test
 Ensure the security of test content and communication
o Need to forestall deliberate efforts to fake scores
o Need to communicate in order to:
 Dispel the mystery surrounding the test and correct prevalent misconceptions
 Present relevant data about reliability, validity, and other psychometric properties
 Familiarize test-takers with procedures, dispel anxiety, and ensure that best
performance is given
 Feedback regarding test performance
 The test administration:
o Should help predict how the client will behave outside the testing situation
o Influences specific to the testing situation introduce error variance and reduce test validity
o Examiners need to memorize exact verbal instructions, prepare test materials, and
familiarize themselves with specific testing procedure
 The testing conditions
o Suitable testing room with adequate lighting, ventilation, seating facilities, and work space
o Implications of details during testing (e.g. improvised answer sheet, paper-and-pencil vs.
computer, familiar examiner vs. stranger)
o Need to:
 Follow testing conditions to the most minute detail
 Record unusual testing conditions
 Take testing conditions into account when interpreting test results
Some examiners may deviate from procedure to extract more information. However, scores
obtained this way can no longer be compared to the norm.
Establish rapport
o Examiner’s efforts to arouse test-takers’ interest in the test, elicit cooperation, and encourage them to
respond in a manner appropriate to the test objectives
o Any deviation from standard motivating conditions should be noted and used for
interpretation
o Maximizing rapport:
 Maintain a friendly, cheerful, and relaxed manner
 Consider examinee characteristics (e.g. for children, consider presenting the test as
a game, have brief test periods)
 Be sensitive to special difficulties
 Give reassurance – no one is expected to finish or to get all items correctly (every
test implies a threat to a person’s prestige)
 Eliminate surprise by:
 Explaining purpose and nature of test
 Giving general suggestions on how to take the test
 Giving and answering sample items
 Convince them that it is in their own interest to obtain a valid and reliable score (e.g.
avoiding waste of time, arriving at correct decisions)
Examiner and Situational Variables
o E.g. age, sex, ethnicity, professional or socioeconomic status, training and experience,
personality characteristics
o Manner: warm vs. cold, rigid vs. normal
o Testing variables: nature of test, purpose of testing, instructions given to test-takers
o Examiner’s non-verbal behavior (e.g. facial or postural cues)
o Test-taker: activities preceding the task, receiving feedback
o In case these situations cannot be controlled, qualify this in the feedback / report
Training effects
o Coaching – close resemblance of test content and coaching material
 Leads to improvement of test scores
 BUT, since coaching is restricted to specific test content, there is low generalizability
of improvement to other criteria
o Test sophistication – repeated testing experience introduces an advantage over first-time
test-takers
 Need orientation and practice to equalize
o It’s more effective to train on broad cognitive skills such as:
 Effective problem-solving
 Analysis of problems / questions
 Consideration of alternatives, details, and implications of a solution
 Deliberate formulation of a solution
 Application of high standards to evaluate performance
NORMS AND THE MEANING OF TEST SCORES
REMEMBER: In the absence of additional interpretative data, a raw score on any psychological test is
meaningless.
Norms – represent test performance of the standardization sample
The raw score is converted into a derived score, which:
 Measures relative standing in the normative sample – performance in reference to other
persons
 Permits direct comparison on different tests
 Can be expressed in terms of
o Developmental level attained; or
o Relative position within a specified group
STATISTICAL CONCEPTS
 Statistics – used to organize and summarize quantitative data to facilitate an understanding of it
 Frequency distribution – tabulating scores into class intervals and counting how often a score falling
in the class interval appears within the data
 Normal curve features:
o Largest number of cases cluster in the center of the range
o Number drops gradually in both directions as extremes are approached
o Bilaterally symmetrical – 50% of cases fall to the left and 50% to the right of the center
o Single peak in the center
 Central tendency – single, most typical or representative score that characterizes the performance of
an entire group
o Mean – average; add all scores and divide by total number of cases
o Mode – most frequent score; midpoint of the class interval with the highest frequency;
highest point on the distribution curve
o Median – middlemost score when all scores have been arranged from smallest to largest
 Variability – extent of individual differences around the central tendency
 Range – difference between the highest and lowest scores
 Deviation – difference between an individual’s score and the mean of the group (x = X - M)
 Standard deviation – square root of the variance; used to compare the variability of different groups
o higher standard deviation means more individual differences (variation)
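The measures of central tendency and variability above can be checked on a small data set. The sketch below is only illustrative: the scores are made-up values, and the variance is computed as a population variance (dividing by the number of cases), which is an assumption about the intended formula.

```python
import statistics

# Hypothetical raw scores for a small group (illustrative data only).
scores = [12, 15, 15, 18, 20, 22, 24]

mean = statistics.mean(scores)            # add all scores, divide by number of cases
median = statistics.median(scores)        # middlemost score after sorting
mode = statistics.mode(scores)            # most frequent score
score_range = max(scores) - min(scores)   # highest score minus lowest score

# Deviation of each score from the group mean: x = X - M
deviations = [x - mean for x in scores]

# Population variance = mean of the squared deviations; SD = its square root.
variance = sum(d ** 2 for d in deviations) / len(scores)
sd = variance ** 0.5

print(mean, median, mode, score_range, round(sd, 2))
```

A group with the same mean but scores spread further from it would give a larger SD, which is what "more individual differences" refers to.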
DEVELOPMENTAL NORMS
 Basal age – highest age at and below which all tests were passed
 Mental age – basal age + partial credits in months for tests passed above basal age-level tests
o Mental age unit shrinks correspondingly with age
 Grade equivalent – mean raw score obtained by children in each grade
o Disadvantages:
 Appropriate only for common subjects taught across grade levels (e.g. not
applicable for high school level)
 Emphasis on different subjects may vary from grade to grade
 Grade norms are not performance standards
 Ordinal scales – sequential patterning of early behavior development
o Developmental stages follow a constant order; each stage presupposes mastery of an earlier
stage
WITHIN-GROUP NORMS
 Percentile – percentage of persons who fall below a given raw score
o Indicates person’s relative position in the standardization sample
o The lower the percentile, the lower the standing
o Advantages:
 Easy to compute
 Can be easily understood
 Universally applicable
o Disadvantage: inequality of units
 Shows only the relative position but not the amount of difference between the
scores
 Standard score – individual’s distance from the mean in terms of standard deviation units
o Linear transformation – retain exact numerical relations of original raw scores
 Subtract constant, divide by constant
 Also called z-score
$$ z = \frac{X - \mu}{\sigma} $$
Where X = raw score, µ = mean, and σ = standard deviation
o Non-linear transformation – fit scores to any specified distribution curve (usually normal
curve)
 Normalized standard scores – distribution that has been transformed to fit the
normal curve
 Compute the percentage of persons falling at or above each raw score
 Locate percentage in the normal curve
 Obtain normalized standard score
 Example: A score of -1 means that person surpassed approximately 16% of the
group (2.14% of cases fall between -3σ and -2σ and 13.59% fall between -2σ and -1σ:
2.14 + 13.59 = 15.73; adding the 0.13% below -3σ gives about 16%)
o T-score – (normalized standard score) x 10 + 50
 µ = 50, σ = 10
o Stanine – also called “standard nine”
 µ = 5, σ = 2
Stanine   Percentage
1         First 4%
2         Next 7%
3         Next 12%
4         Next 17%
5         Next 20%
6         Next 17%
7         Next 12%
8         Next 7%
9         Last 4%
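The conversions described in this part (normalized standard score, T-score, stanine) can be sketched in a few lines. This is an illustration under stated assumptions rather than a scoring procedure from any test manual: the function names are hypothetical, the input is the percentage of the group a person surpasses, and a percentage that lands exactly on a stanine boundary is assigned to the higher stanine.

```python
from bisect import bisect_right
from statistics import NormalDist

# Cumulative upper bounds (in percent) of stanines 1 to 8,
# built from the 4-7-12-17-20-17-12-7-4 split in the table above.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def normalized_z(percent_below):
    """z value that cuts off `percent_below` percent of a normal curve (0 < percent < 100)."""
    return NormalDist().inv_cdf(percent_below / 100)

def t_score(z):
    """T-score: standard score rescaled to mean 50 and SD 10."""
    return 10 * z + 50

def stanine(percent_below):
    """Stanine (1 to 9) for a given percentage of the group surpassed."""
    return bisect_right(STANINE_CUTS, percent_below) + 1

# A person who surpasses about 16% of the group sits near z = -1, T = 40.
z = normalized_z(16)
print(round(z, 2), round(t_score(z)), stanine(16))
```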
Deviation IQ
o Ratio IQ = ratio of mental age to chronological age, multiplied by 100
 if IQ = 100, mental age = chronological age
$$ IQ = \frac{\text{mental age}}{\text{chronological age}} \times 100 $$
o Deviation IQ (DIQ) = standard score with µ = 100 and σ = 15 (or 16, depending on the test)
o DIQ is only comparable across tests if they have the same mean and standard deviation
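A short sketch of the two IQ computations mentioned above, assuming ages are given in months and that the deviation IQ is obtained from a z-score; the function names are hypothetical. It also shows why a DIQ is only comparable across tests that share the same mean and standard deviation.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return 100 * mental_age_months / chronological_age_months

def deviation_iq(z, mean=100, sd=15):
    """Deviation IQ: a standard score on a scale with the given mean and SD."""
    return mean + sd * z

# Mental age of 10 years (120 months) at a chronological age of 8 years (96 months).
print(ratio_iq(120, 96))

# The same relative standing (z = 1) gives different numbers when the SD differs.
print(deviation_iq(1, sd=15), deviation_iq(1, sd=16))
```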
Relativity of Norms
 IQ should always be accompanied by the name of the test
 Individual’s standing may be misrepresented if inappropriate norms are used
 Sources of variation across tests:
o Test content
o Scale units of mean and standard deviation
o Composition of standardization sample
 Normative sample – ideally, a representative cross-section of the population for which the test is
designed
o Sample – group of persons actually tested
o Population – larger but similarly constituted group from which the sample is drawn
o Should be large enough to provide stable values
o Should be representative of the population under consideration
 Else, restrict the population to fit the sample (redefine population)
o Should consider specific influences affecting the normative sample
Anchor Norms – used to work out equivalency between tests
 Equipercentile method – scores are equivalent if they have equal percentiles on two tests (e.g. 80th
percentile in Test A = IQ of 115, 80th percentile in Test B = IQ of 120, therefore Test A’s 115 is Test
B’s 120)
Specific norms – tests are standardized to more specific populations to suit the purpose of the test
(a.k.a. subgroup, local norms)
Fixed reference group – referred to for comparability and continuity
o An independently sampled group against which future test scores are compared
o Updated via an anchor test (or list of common items) whose items also appeared in the test given to
the original reference group. Adjustments are made by comparing the frequency of correct answers
on the common items in the previous group and the present group
DOMAIN-REFERENCED TEST INTERPRETATION
 Aka “criterion-referenced” testing
 Reference is content domain rather than a group of persons
 Tests mastery of specific content (what can the client do?)
 Content meaning – focus on what they can do vs. how they compare with others
 Should have content that is widely-recognized as important
 Should have items that sample each objective
 Best used for testing basic skills at elementary levels
 Mastery testing – if individual has or has not obtained a pre-established level of mastery
o Individual differences are of little or no importance
o Impractical for content beyond elementary skills because of differing levels of achievement,
instruction
 Tests need to have critical variables required for performance of certain functions
 Efforts should be made to address limitations of a single test score
o Cutoff should be a band of scores rather than a single score on one administration of the
test
o Should be dependent on other sources of information
o Both test construction and content experts should decide on cutoff scores
o Score should be established on empirical data
 Expectancy Table – probability of different criterion outcomes given a score
Score          Count   Lower than D   C    B    A
Lower than 10  14      43             36   14   7
10-19          71      47             37   24   3
20-29          104     9              21   43   27
30 or higher   22      5              0    36   59
OR
Score   Percentage eliminated during training
9       4%
8       10%
7       14%
6       22%
5       30%
4       40%
3       53%
2       57%
1       77%
Gives general idea about the validity of the test in predicting a criterion
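An expectancy table like the ones above can be built by tallying criterion outcomes within each score band. The sketch below assumes the data arrive as (score band, outcome) pairs; the function name and the sample records are hypothetical and are not taken from the tables in these notes.

```python
from collections import Counter, defaultdict

def expectancy_table(pairs):
    """Percentage of each criterion outcome within each score band."""
    counts = defaultdict(Counter)
    for band, outcome in pairs:
        counts[band][outcome] += 1
    table = {}
    for band, outcome_counts in counts.items():
        total = sum(outcome_counts.values())
        table[band] = {o: 100 * k / total for o, k in outcome_counts.items()}
    return table

# Made-up records: (test score band, later course grade).
records = [("20-29", "A"), ("20-29", "B"), ("20-29", "B"), ("20-29", "A"),
           ("10-19", "C"), ("10-19", "B"), ("10-19", "C"), ("10-19", "C")]

for band, row in expectancy_table(records).items():
    print(band, row)
```

Each row reads as the chance of each outcome given a score in that band, which is how the table gives a rough picture of predictive validity.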
RELIABILITY
Reliability
 Consistency of scores obtained by the same person across time, items, or other test conditions
 Extent to which individual differences in test scores represent “true” differences or chance errors
 Estimate what proportion of test score variance is error variance
o Error variance – difference in scores resulting from conditions that are irrelevant to the
purpose of the test
 No test is a perfectly reliable instrument
CORRELATION COEFFICIENT
 Expresses the degree of relationship between two scores
 Zero correlation indicates the total absence of a relationship
 Pearson Product-Moment Correlation Coefficient – accounts for individual’s position in the group
and the amount of deviation from the mean
$$ r_{xy} = \frac{\sum xy}{(N)(SD_x)(SD_y)} $$
Where Σxy = sum of the products of each person’s deviation scores on the two tests (x = X - Mx, y = Y - My)
N = total number of cases
SDx = SD of Test X
SDy = SD of Test Y
 Statistical significance – whether findings in the sample can be generalized to the population
o “significant at the .01 level” = if the population correlation were actually 0, a correlation this
large would turn up by chance in only about 1 out of 100 samples
o Significance level – risk of error we’re willing to take in drawing conclusions from our data
o Confidence interval – range of scores within which the true score is expected to fall at a specified
level of confidence
 Reliability coefficient – use of correlation coefficient for psychometric properties
o Level of acceptable correlation coefficient = .80 to .90
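The Pearson formula in this part can be computed directly from deviation scores. The sketch below assumes x and y are deviations from each test's mean and that the SDs are population SDs (dividing by N); the function name and the sample scores are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson r = sum of deviation cross-products / (N * SDx * SDy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # x = X - Mx
    dev_y = [y - mean_y for y in ys]          # y = Y - My
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sd_x = (sum(a * a for a in dev_x) / n) ** 0.5
    sd_y = (sum(b * b for b in dev_y) / n) ** 0.5
    return sum_xy / (n * sd_x * sd_y)

# Made-up scores of five people on Test X and Test Y.
test_x = [10, 12, 15, 18, 20]
test_y = [30, 33, 35, 41, 44]
print(round(pearson_r(test_x, test_y), 3))
```

When the same function is applied to two administrations or two forms of one test, the result is read as a reliability coefficient.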
TYPES OF RELIABILITY
Test-Retest Reliability
 Repeat same test on the same person on another occasion
 Test for correlation between scores on the two separate testing occasions
 Source of error variance – fluctuations in performance between the two testing occasions
 Shows how test can be generalized across situations
 Higher reliability, lower susceptibility to random changes
 Need to specify length of interval
 Interval rarely exceeds 6 months
 Disadvantage: practice effect
 Can only be applied to tests in which performance is not affected by repetition (e.g.
sensorimotor, motor)
Alternate-Form Reliability
 Same person is tested with one form on one occasion and an alternate, equivalent form on
another occasion
 Test for correlation of scores on the two forms
 Measure of both temporal stability and consistency of responses to two different item samples
 Source of error variance: content sampling (to what extent does performance depend on
specific items or arrangement of the test?)
 Parallel forms must:
o Be independently constructed;
o Have items expressed in the same form;
o Cover the same type of content;
o Have an equivalent range and level of difficulty;
o Have equivalent instructions, time limits, and sample items
 Disadvantage: reduces but does not completely eliminate practice effect
 Questionable: degree of change in the test due to repetition (e.g. insight tasks)
Split-Half Reliability
 Two scores are obtained for each person by dividing the test into equivalent halves
 Source of error variance: content sampling
 Test for coefficient of internal consistency
 Single administration of a single form
 Longer test = more reliable
 Spearman-Brown formula – for estimating the effect of shortening or lengthening the test
o Used because this type of reliability only technically computes for the reliability of half
the test
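The notes name the Spearman-Brown formula without writing it out. Its usual form is r_new = n·r / (1 + (n - 1)·r), where r is the obtained coefficient and n is the factor by which the test is lengthened or shortened; the function name below is hypothetical.

```python
def spearman_brown(r, n):
    """Estimated reliability when a test with reliability r is made n times as long."""
    return n * r / (1 + (n - 1) * r)

# Split-half use: the correlation between halves (r = .60) describes a half-length
# test, so n = 2 estimates the reliability of the full-length test.
print(spearman_brown(0.60, 2))

# Shortening a test with reliability .80 to half its length (n = 0.5).
print(round(spearman_brown(0.80, 0.5), 2))
```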
Inter-item Consistency (a.k.a. Kuder-Richardson Reliability and Coefficient Alpha)
 Single administration of a single form
 Consistency of all items in the test
 Source of error variance:
o Content sampling
o Heterogeneity of behavior
 More homogenous items, more consistency
 However, is a homogenous test appropriate for a heterogeneous psychological construct? Is the
criterion being predicted homogenous or heterogeneous?
 Unless items are highly homogenous, the KR coefficient will be lower than S-H
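 Coefficient alpha is used for tests with no right or wrong answers (multiple-scored items); Kuder-Richardson applies to items scored right or wrong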
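Inter-item consistency can be illustrated with coefficient alpha, computed as k/(k - 1) × (1 - sum of item variances / variance of total scores). The sketch assumes a table with one row per person and one column per item and uses population variances; the function name and data are hypothetical. With items scored 0 or 1 the same computation gives the Kuder-Richardson 20 value.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient alpha for rows = persons, columns = items."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up ratings of five people on a four-item scale (1 to 5 per item).
ratings = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(coefficient_alpha(ratings), 2))
```

The more the items rise and fall together across people, the smaller the item variances are relative to the total-score variance, and the closer alpha gets to 1.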
Scorer Reliability
 Factors excluded from error variance:
o True variance (remains in scores)
o Irrelevant factors that can be controlled experimentally
 Correlate results obtained by two separate scorers
Interpreting reliability scores:
 .85 = 85% is true variance, 15% is error variance
 Analysis of error variance:
o Delayed alternate-form reliability = .70, so error variance = 1 - .70 = .30 (content + time)
o Split-half reliability = .80, so error variance = 1 - .80 = .20 (content)
o Error variance due to time: .30 - .20 = .10 (time)
o Scorer reliability = .92, so error variance = 1 - .92 = .08 (interscorer)
o Total error variance = .20 + .10 + .08 = .38
o True variance = 1 - .38 = .62
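The error-variance analysis above is plain arithmetic on three reliability coefficients, so it can be written as a small function. The function and key names are hypothetical; the inputs are the coefficients used in the example (.70, .80, .92).

```python
def error_variance_breakdown(delayed_alternate_r, split_half_r, scorer_r):
    """Split total error variance into content, time, and interscorer parts."""
    content_plus_time = 1 - delayed_alternate_r   # delayed alternate forms
    content = 1 - split_half_r                    # split-half
    time = content_plus_time - content            # what is left is due to time
    interscorer = 1 - scorer_r                    # scorer reliability
    total_error = content + time + interscorer
    return {
        "content": content,
        "time": time,
        "interscorer": interscorer,
        "total_error": total_error,
        "true_variance": 1 - total_error,
    }

result = error_variance_breakdown(0.70, 0.80, 0.92)
print({name: round(value, 2) for name, value in result.items()})
```

Note that content is counted once: the .30 from delayed alternate forms already contains the .20 content component, so the total is .20 + .10 + .08.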
 Speed tests – low-difficulty items with a very short time limit
 Power tests – high-difficulty items with no time limit
 Both are designed so that no one obtains a perfect score
 Reliability of speed tests cannot be measured from a single administration
 Reliability is affected by the range of individual differences in the group
 Reliability is also affected by varying average ability level
Standard Error of Measurement
 Also expresses reliability
 Interval between which the true score may lie (obtained score ± 1 SEM)
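The notes do not give the formula, but the standard error of measurement is usually computed as SEM = SD × √(1 - r), where SD is the standard deviation of test scores and r is the reliability coefficient. The sketch below uses that formula; the function names and numbers are hypothetical.

```python
def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5

def score_band(obtained_score, sd, reliability, n_sem=1):
    """Interval of obtained score plus or minus n_sem standard errors."""
    sem = standard_error_of_measurement(sd, reliability)
    return obtained_score - n_sem * sem, obtained_score + n_sem * sem

# A scale with SD = 15 and reliability .89 has an SEM of about 5 points.
print(round(standard_error_of_measurement(15, 0.89), 2))
low, high = score_band(110, 15, 0.89)
print(round(low, 1), round(high, 1))
```

Higher reliability shrinks the SEM, so the band around an obtained score gets narrower.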
VALIDITY
Validity
 What the test measures and how well it measures it
 What can be inferred from the test scores
 Correlation coefficient between a test score and a direct and independent measure of criterion
Content-Description Procedures (Content Validity)
 Systematic examination of the test to evaluate if it covers a representative sample of behavior to be
tested
 Content must be broadly defined to cover major objectives
 Important to consider test-taker responses, not just relevance of content
 Test specifications – content areas or topics to be covered, objectives, importance of topics, number
of items per topic
o More appropriate for achievement tests
o Does the test cover a representative sample of specified skills and knowledge?
o Is test performance free from irrelevant variables?
 Face validity – whether the test “looks valid” to test-takers and other technically untrained
observers
o Desirable feature of a test but should not be a substitute for other types of validity
Criterion-related Validity
 Indicate test’s effectiveness in predicting performance in specified activities
 Not about time, but about objective of testing
 Concurrent – used to diagnose existing status (Does person qualify for the job?)
o Criterion data is already available
 Predictive – used to predict future performance (Does person have the pre-requisites to do well in a
job?)
 Avoid criterion contamination (e.g. rater’s knowledge of test contaminates criterion ratings)
 Criterion measure examples: academic achievement, performance in training, actual job
performance, contrasted groups (extremes of distribution of criterion measures); psychiatric
diagnoses, ratings by authority, correlation between new test and previously-available test
Construct Validity
 Extent to which a test measures a theoretical construct or trait
 Evidence includes research on nature of the trait and the conditions affecting development and
manifestation
 Age differentiation – used in traditional intelligence tests
 Correlation with other tests – new test measures approximately the same behavior as the previous
test
o Moderate correlation is desirable
 Factorial validity – identification of factors and determining factors that impact the scores
 Internal consistency – measure of homogeneity
o Upper criterion group vs lower criterion group – items that do not show higher scores on
upper criterion group are eliminated
 Convergent – test correlates highly with other tests that it should theoretically correlate with
 Discriminant – test does not correlate with variables it should be theoretically different from
 Pre-test and post-test scores = training is valid if, after training, failed items in pre-test were passed
during post-test
 Structural Equation Modeling (SEM) – explores relationships among constructs and the path that a
construct uses to affect criterion performance
The same test can have different purposes, leading to different kinds of validity evidence.
Purpose of testing        Illustrative question                                              Type of validity
Achievement test          How much did you learn?                                            Content
For future performance    How well will you learn in the future?                             Criterion – predictive
Learning difficulty       Is your performance indicative of a learning disorder?             Criterion – concurrent
Measure of ability        How well does your score relate to other indicators of ability?    Construct
Validity is built in from the outset and extends even after test dissemination.
Measurement and Interpretation of Validity
 Validity coefficient – correlation between test scores and criterion
 Conditions affecting validity:
o Demographics of the group
o Sample heterogeneity (e.g. a pre-selected sample has a narrower range of scores, which lowers the validity coefficient)
o Change over time because of selection standards
o Relationship between test and criterion (linear or curvilinear)
 Heteroscedasticity – unequal variability in high and low scores (e.g. little variability in scores of Test A
when scores on Test B are low, wider variability of scores in Test A when scores on Test B are higher)
Uses of Tests for Decision-making
 Selection – either accepted or rejected
 Placement – assignments to different categories based on a single score
 Classification – involves two or more criteria for placement
 Differential validity – test should be able to determine differences in person’s performance in
different jobs or programs (i.e. test should be able to see if person is good at Job A and not at Job B)
o Battery should include tests that are good predictors of criterion A and poor predictors of
criterion B, and vice-versa
 Multiple discriminant functions – determine how closely a set of scores approximates typical scores
in a given job, diagnosis, etc. Used when:
o Criterion is unavailable but group characteristics are
o There is a non-linear relationship between a criterion and one or more predictors
Test bias
 Slope bias – significantly different validity coefficients in the two groups (differential validity)
 Intercept bias – systematically under- or over-predicts criterion performance for a particular group
ITEM ANALYSIS
Item Analysis
 Used to shorten test and increase its reliability and validity
 Item difficulty – percentage of people passing the item
o Items are usually arranged in increasing difficulty
o The higher the inter-item correlations, the wider the spread of difficulty should be
 Thurstone Absolute Scaling
o Find scale values of items separately within each group by converting percentage passing
into z-values
o Translate all these scale values into corresponding values for the group chosen as the
reference group
 Test score distribution must approximate the normal curve
 Item discrimination – degree to which an item differentiates correctly among test-takers in the
measured behavior
 In contrasting groups, upper and lower 27% are used
 Purpose: identify deficiencies in the test or in the teaching
 Index of discrimination (D) – difference in percentage passing of upper scorers and lower scorers
(convert number of persons passing into percentages)
 Phi coefficient – relationship between item and criterion
o What items will significantly correlate with the criterion?
$$ \phi_{.05} = \frac{1.96}{\sqrt{N}} $$
Where N = total number of cases in the upper and lower groups
o Strictly applicable to dichotomous conditions
 Biserial correlation – assumes a continuous and normal distribution of traits that underlie the
dichotomous item response and the criterion variable
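Item difficulty, the index of discrimination (D), and the phi cutoff above can be sketched as follows. The sketch assumes each person has a total test score and a 0/1 result on the item, forms the contrasting groups from the top and bottom 27% of total scores, and rounds the group size to the nearest whole person; the function names and data are hypothetical.

```python
def item_analysis(total_scores, item_correct, fraction=0.27):
    """Return (difficulty, D): percent passing overall, and upper minus lower percent passing."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])  # persons, low to high
    lower, upper = order[:group_size], order[-group_size:]
    pct_upper = 100 * sum(item_correct[i] for i in upper) / group_size
    pct_lower = 100 * sum(item_correct[i] for i in lower) / group_size
    difficulty = 100 * sum(item_correct) / n
    return difficulty, pct_upper - pct_lower

def phi_cutoff_05(n_upper_plus_lower):
    """Smallest phi that reaches the .05 level: 1.96 / sqrt(N)."""
    return 1.96 / n_upper_plus_lower ** 0.5

# Ten made-up test-takers: total scores and whether each passed the item (1) or not (0).
totals = [11, 14, 17, 20, 22, 25, 27, 30, 33, 36]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(item_analysis(totals, passed))
print(round(phi_cutoff_05(100), 3))
```

A D near 0 (or negative) flags an item that high scorers pass no more often than low scorers.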
Item Response Theory
 Item-test regression – represents both item difficulty and item discrimination
o Difficulty level – 50% threshold (50% passing and 50% failing)
o Discriminative power – the steeper the curve, the higher the discriminative index
 Item performance is related to an estimated amount of latent trait
 Item information functions – take all item parameters into account and show how efficiently an
item measures behavior at different ability levels
 Item parameters should not vary according to ability levels
 Cross-validation – independent validation of the test on a group separate from the one on which items were
selected
o Factors that lower validity on cross-validation:
 Size of the original item pool – if the original item pool was large and the
number retained is small, there is a higher chance that the validity of the retained
items will be spurious because of more opportunities to capitalize on chance
differences
 Size of the sample – smaller sample size has higher error variance
 Items are assembled without a theory
 Differential item functioning – identifies items for which persons with equal ability from different
cultural groups have different probabilities of success
o Possible reason: item does not measure the same construct in the two groups.
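The idea of an item curve with a 50% difficulty threshold and a slope that reflects discrimination can be illustrated with a two-parameter logistic model. The notes do not name a specific model, so the model choice is an assumption of this sketch, as are the function names; a is the discrimination (slope) parameter, b the difficulty, and theta the latent trait level.

```python
import math

def item_characteristic_curve(theta, a, b):
    """Probability of passing the item at trait level theta (two-parameter logistic)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Item information for the two-parameter logistic model: a^2 * P * (1 - P)."""
    p = item_characteristic_curve(theta, a, b)
    return a * a * p * (1 - p)

# At theta equal to the difficulty b, the chance of passing is 50%.
print(item_characteristic_curve(0.5, a=1.2, b=0.5))

# A steeper item (larger a) separates people just above and just below b more sharply.
for a in (0.5, 2.0):
    below = item_characteristic_curve(-0.5, a, b=0.0)
    above = item_characteristic_curve(0.5, a, b=0.0)
    print(a, round(above - below, 2))
```

In this model an item is most informative for people whose trait level is near its difficulty, which is what an item information function displays.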
INTELLIGENCE
Intelligence
 Ability level at a given point in time
 Score is not indicative of the reasons behind performance
o Should be descriptive rather than explanatory
 Should not be used to label individuals but help in understanding them
 Start where they are, assess strengths and weaknesses, make interventions
 Contribute to self-understanding and personal development
 Not a single entity but a composite of several functions
o Combination of abilities required for survival and advancement within a culture
 Measures of scholastic aptitude or academic achievement
o Reflective of prior educational achievement
o Indicator of future performance
o Effective predictor of performance in various occupations and daily life activities
 Should not be the only basis for making decisions
Heritability and Modifiability
 Heritability index – how much of the variation in scores is due to genetics?
o Obtained using correlations of monozygotic and dizygotic twins
 Limitations
o Applicable to populations and not individuals – an individual’s intellectual disability (MR) can still be due to a defective gene
o Limited to population characteristics at a given time
o Does not indicate modifiability
 IQ is not fixed and unchanging, it can be modified
o Changes can result from events or environmental interventions
o Training on cognitive skills, problem-solving strategies, efficient learning habits
Motivation
 Personality is not independent from aptitude
 Aptitudes cannot be investigated independent from affect
o Prediction of subsequent performance can be enhanced by combining it with information
about motivation and attitudes
 Achievement elsewhere can help shape cognitive performance (self-concept)
Theories of Intelligence Organization
Two-Factor Theory
 Charles Spearman
 All intellectual factors share a common factor (g)
 (s) – specific factors limited to very specific abilities
 Only g accounts for the correlation of performance in two intelligence tests
 Aim of testing: measure the amount of g
 Single test that is highly saturated with g could be substituted for test with heterogeneous items
 Abstract relations are the best measures of g
 Group factor – degree of correlation that may result above and beyond g (e.g. arithmetic,
mechanical, linguistic) – common to some but not all
Multiple Factor Theories
 Thurstone
 Group factors called “primary mental abilities”
o Verbal comprehension – tests such as reading comprehension, verbal analogies, verbal
reasoning, etc.
o Word fluency – anagrams, rhyming, naming words within a category
o Number – speed and accuracy of arithmetic operations
o Space – perception of fixed spatial or geometric relations, manipulatory visualizations
o Associative Memory
o Perceptual Speed – quick and accurate grasp of visual details, similarities, and differences
o Induction / General Reasoning – find a rule and apply it to others
Structure of Intellect Model
 Guilford
 Mental abilities can be traced into underlying factors, which are categorized into three dimensions
o Operations – what the person does (memory recording and retention, divergent and
convergent production, evaluation, cognition)
o Contents – nature of materials on which operations are performed (visual, auditory,
symbolic, semantic, behavioral)
o Products – form in which information is processed (units, classes, relations, systems,
transformations, implications)
 In each cell, at least one factor or ability is expected; each factor is described in all three dimensions
 Contributions:
o Distinguished between operations and content
o Recognition of divergent thinking
Hierarchical Theories
 Organized factors in a hierarchy
 Reconciles single-factor model with multiple-factor models
 Broader factors have more loadings on more variables
Cattell-Horn-Carroll Theory (C-H-C)
 Cattell
o Fluid intelligence - broad ability to reason, form concepts, and solve problems using
unfamiliar information or novel procedures.
o Crystallized intelligence - breadth and depth of a person's acquired knowledge, the ability to
communicate one's knowledge, and the ability to reason using previously learned
experiences or procedures.
 Carroll
o Layers that represent
 Narrow
 Broad
 General (g)
Nature and Development of Traits
 Differences in factor patterns are influenced by experiential background
 Change over time is also observed, also because of different methods in carrying out the same task
 Mechanisms
o Learning set – learning through presentation of different problems of the same kind
 “learn how to learn”
o Transfer of training – formal schooling where efficient and systematic problem-solving
techniques are learned
o Co-occurrence of learning experiences – learn one, learn all in a proper environment
 Processing skills tend to be specific to type of content being processed (domain specificity)
 Domain – content (linguistic, mathematical) or context (cultural, social, geographical)
 Intelligence tests developed are just measures of scholastic achievement. Are there tests that will
measure so-called “practical, everyday” intelligence?
ABILITY TESTS
 Index of general level of performance
 Often designated as tests of scholastic aptitude or academic achievement
Individually-Administered Intelligence Tests
 Stanford-Binet Intelligence Test (5th Edition)
o Age range: 2-85+ years old
o Verbal and Non-verbal
 Fluid Reasoning
 Knowledge
 Quantitative Reasoning
 Visual-Spatial Processing
 Working Memory
o Reliability: split-half, test-retest, inter-scorer
o Validity: Content, Construct (age differentiation)
 Wechsler Scales
o Wechsler Preschool and Primary Scale of Intelligence (WPPSI-R), Wechsler Intelligence Scale
for Children (WISC-IV), Wechsler Adult Intelligence Scale (WAIS-IV)
o Verbal
 Information
 Comprehension
 Similarities
 Vocabulary
o Performance
 Block Design
 Matrix Reasoning
 Visual Puzzles
 Picture Completion
 Figure Weights
o Working Memory
 Digit Span
 Arithmetic
 Letter-Number Sequencing
o Processing Speed
 Symbol Search
 Coding
o Reliability: test-retest, interscorer
o Validity: Construct (convergent with other cognitive abilities including motor, memory,
language, attention)
 Differential Ability Scales
o Measure specific abilities rather than a global IQ
 Core subtests = General Conceptual Ability
 Diagnostic subtests – relatively independent abilities
 Achievement tests
o Reliability: internal consistency, test-retest
o Validity: Criterion (w/ Wechsler, SB), Construct
 Kaufman Scales
Tests for Special Populations
Infant and Preschool Testing
 Require individual administration
 Bayley Scales of Infant Development
o Assess current developmental status rather than subsequent ability
o Mental scale – sensory and perceptual, memory, learning, problem-solving, vocalization,
vocal communication, abstract thinking
o Motor scale – gross motor abilities
o Behavior rating scale – personality development: emotional and social behavior, attention
span and arousal, persistence, goal-directedness
 McCarthy Scales of Children’s Abilities
o Index of functioning at the time of testing
 Verbal
 Perceptual-Performance
 Quantitative
 General Cognitive
 Memory
 Motor
 Piagetian Scales
o Presuppose a uniform sequence of development through successive stages
 Object permanence
 Development of means to achieve ends
 Imitation
 Operational causality
 Object relations in space
 Development of schemata for relating to objects
Comprehensive Assessment of Mentally-Retarded Persons
 Mental retardation – substantial limitations in present functioning
o Sub-average intellectual functioning concurrent with limitations in two or more of the
following: communication, self-care, home living, social skills, community use, self-direction,
health and safety, functional academics, leisure and work
 Vineland Adaptive Behavior Scale – focuses on what the individual habitually does rather than what s/he
can do
Testing Persons with Physical Disabilities
 Modify testing medium, time limits, content of tests
 Individualized assessment using a variety of data from different sources
 Hearing impairments – usually handicapped by verbal tests
 Visual impairments – adapting oral tests, no performance tests
 Motor impairments – may not be able to compose oral or written responses, no time limit, more
prone to fatigue
Multicultural Testing
 Language, speed removed as a parameter
 Varying test content
 Raven’s Advanced Progressive Matrices
o Measure of ‘g’
o Requires eduction of relations among abstract items
 Culture-Fair Intelligence Test
o Cattell
o Test of fluid reasoning
o Inductive reasoning – make broad generalizations based on available data
o Think logically and solve problems in novel situations, regardless of learned intelligence
 Series - Choose which best completes the series
 Classification - Identify two figures which are in some way different from others
 Matrices - Complete the design or matrix presented
 Conditions - Select the one that duplicates the conditions given
 Goodenough-Harris Drawing Test
o Accuracy of observation and development of conceptual thinking
o Test may measure different functions at different ages
 Approaches to cross-cultural testing
1. Choose items that are common across cultures; validate against local criteria
2. Develop a test within one culture and administer it to persons with different cultural
backgrounds
3. Different tests are developed for each culture, validated, and used only within that culture
Group Tests
 Used in the educational system, government service, industry, and the military
 Typically employs multiple-choice format for uniformity and objectivity in scoring
 Increasing difficulty arranged in separately timed subtests
 Spiral-omnibus format – single long time limit, mixed items of increasing difficulty
 Advantages
o Can be administered simultaneously
o Greatly simplifies examiner’s role
o Provides more uniform testing conditions
o Scoring is more objective
o Provide better established norms
 Disadvantages
o Less opportunity for rapport, maintaining cooperation and interest
o Less likely to detect extraneous interfering variables
o Examinees have restricted responses – penalizes original thinkers
o Little to no opportunity for direct observations
o Lack of flexibility
Multi-level batteries
 Sample major intellectual skills found to be pre-requisite for schoolwork
 Suitable for schools for comparability across levels
 Youngest age suitable for group testing: Kindergarten / 1st grade
 Cognitive Abilities Test
o Verbal – verbal classification, sentence completion, verbal analogies
o Quantitative – quantitative relations, number series, equation building
o Nonverbal – figure classification, figure analogies, figure analysis
 Test of Cognitive Skills
o Sequences – understanding and applying rules of arrangement in patterns of figures, letters,
or numbers
o Analogies – identifying the relationship and applying the principle to select a second pair
exhibiting the same relationship
o Verbal Reasoning – identification of essential elements in objects or things, inferring
relationships between sets of words, drawing logical conclusions from verbal passages
o Memory – definitions of a set of artificial words are presented and recall is tested after
other tests have been given
Tests for Multiple Aptitudes
 Used because of:
o Intraindividual variation in performance on intelligence scales
o Tests are found to be primarily a measure of verbal comprehension
 Differential Aptitudes Test
o For educational and career counseling of grades 8-12 students
o Verbal Reasoning, Numerical Reasoning, Abstract Reasoning, Perceptual Speed and
Accuracy, Mechanical Reasoning, Space Relations, Spelling, Language Use
 Multidimensional Aptitude Battery
o Group test designed to measure the same aptitudes as WAIS-R
o Suitable for adolescents and adults, not for individuals with mental disturbance or
retardation
o Provides fully interpretable scores at the subtest level (T-score), Verbal and Performance
Level, and Overall Total Score
Verbal           Performance
Information      Digit Symbol
Comprehension    Picture Completion
Arithmetic       Spatial
Similarities     Picture Arrangement
Vocabulary       Object Assembly
Psychological Issues in Ability Testing
 Nature of intelligence
o Intelligence is complex and dynamic
o Intelligence test performance is highly stable
o Intelligence develops cumulatively
 Environmental contributions to intelligence
o Environmental stability contributes to IQ stability
o Pre-requisite learning skills contribute to subsequent learning
 Functional academics + personality characteristics
o Rises or drops may occur as a result of environmental changes
 Genetics and development
o Pre-school tests have moderate predictive validity, infant tests have none
o In the absence of inborn pathology, environment plays a major role in subsequent
development
o Developmental transformations – rudimentary skills at infancy are transformed with age
into more complex manifestations
o Individual differences within an age level are greater than individual differences across age levels
o Changes that occur with aging vary with the individual
o Demands for adults are different from those for school-age children (practical vs. academic
information)
o Mean IQ of adult performance increased over the years (Flynn effect)
 Research trends
o Cross-sectional analysis of IQ trends – older adults score lower (because they received less
education)
o Longitudinal studies – scores tend to improve with age
 Culture
o Cultural changes, and not simply age, determine rises and declines in performance
o Cultural influences will and should be reflected in test scores
o Cultural differences can become a handicap when the individual moves from one culture to
another and attempts to succeed in the latter culture
PERSONALITY
 Measures emotional, motivational, interpersonal, and attitudinal characteristics of a person
Development of Personality Tests
 Content-related Procedures
o Obtain information regarding a psychological construct, create items consistent with that
construct
o Example: Woodworth Personal Data Sheet – information regarding psychiatric and preneurotic symptoms
 Empirical Criterion-Keying
o Development of scoring key in terms of some external criterion
o Select items that differentiate between clinical samples and normal population
o Example: if 25% or more “normal” people answered an item unfavorably, it could not be
“abnormal” since it is present in the “normal” population with such frequency
o Responses are treated as diagnostic or symptomatic of the criterion behavior with which
they are associated
o Examples: Minnesota Multiphasic Personality Inventories (MMPI), California
Psychological Inventory, Personality Inventory for Children
 Factor analysis
o Systematic classification of personality traits
o Example: Guilford-Zimmerman Temperament Survey
 General Activity (G)
High score: Rapid pace of activities, energy, keeping in motion, liking for speed, hurrying,
quickness of action, enthusiasm
Low score: Slow and deliberate pace, fatigability, pausing for rest, low production, liking for
slow pace, taking time, slowness of action
 Restraint (R)
High score: Serious-mindedness, deliberate, persistent effort, self-control
Low score: Happy-go-lucky, carefree, impulsive, excitement-loving
 Ascendance (A)
High score: Leadership, persuasion, conspicuousness, self-defense, speaking with others and in public
Low score: Submissiveness, following, hesitation to speak, avoiding conspicuousness
 Sociability (S)
High score: Having many friends and acquaintances, liking social activities, seeking limelight
Low score: Few friends and acquaintances, avoiding social activities, shyness
 Emotional Stability (E)
High score: Evenness of moods, interests, and energy; composure, optimism, feeling in good health
Low score: Fluctuation of moods, interests, and energy; pessimism; feelings of guilt, loneliness,
and worry
 Friendliness (F)
High score: Tolerates hostile action, acceptance of domination, respect for others
Low score: Belligerence, readiness to fight, desire to fight, resistance to domination,
contempt for others
 Thoughtfulness (T)
High score: Reflectiveness, meditativeness, observing behavior in others, philosophically-inclined,
mental poise
Low score: Interested in overt activity, mental disconcertedness
 Personal Relations (P)
High score: Tolerance of people, faith in social institutions
Low score: Hypercriticalness of people, fault-finding habits, self-pity, suspiciousness of others
 Masculinity (M)
High score: Not easily disgusted, resistant to fear, inhibition of emotional expressions,
little interest in clothes and styles
Low score: Easily disgusted, sympathetic, fearful, romantic interests, emotional expressiveness,
much interest in clothes and styles, dislike of vermin
o Example: 16 Personality Factors
 Scores are expressed in Stanine
 Each factor is listed with its High (Sten 7-9) and Low (Sten 1-3) descriptions
 Warmth (A)
High: Warm, outgoing, attentive to others, kindly, easy-going, participating, likes people
Low: Impersonal, distant, cool, reserved, detached, formal, aloof
 Reasoning (B)
High: Abstract-thinking, more intelligent, bright, higher general mental capacity, fast learner
Low: Concrete thinking, lower general mental capacity, less intelligent, unable to handle
abstract problems
 Emotional Stability (C)
High: Emotionally stable, adaptive, mature, faces reality calmly
Low: Reactive emotionally, changeable, affected by feelings, emotionally less stable, easily upset
 Dominance (E)
High: Dominant, forceful, assertive, aggressive, competitive, stubborn, bossy
Low: Deferential, cooperative, avoids conflict, submissive, humble, obedient, easily led, docile,
accommodating
 Liveliness (F)
High: Lively, animated, spontaneous, enthusiastic, happy-go-lucky, cheerful, expressive, impulsive
Low: Serious, restrained, prudent, taciturn, introspective, silent
 Rule-Consciousness (G)
High: Rule-conscious, dutiful, conscientious, conforming, moralistic, staid, rule-bound
Low: Expedient, nonconforming, disregards rules, self-indulgent
 Social Boldness (H)
High: Socially bold, venturesome, thick-skinned, uninhibited
Low: Shy, threat-sensitive, timid, hesitant, intimidated
 Sensitivity (I)
High: Sensitive, aesthetic, sentimental, tender-minded, intuitive, refined
Low: Utilitarian, objective, unsentimental, tough-minded, self-reliant, no-nonsense, rough
 Vigilance (L)
High: Vigilant, suspicious, skeptical, distrustful, oppositional
Low: Trusting, unsuspecting, accepting, unconditional, easy
 Abstractedness (M)
High: Abstract, imaginative, absent-minded, impractical, absorbed in ideas
Low: Grounded, practical, prosaic, solution-oriented, steady, conventional
 Privateness (N)
High: Private, discreet, nondisclosing, shrewd, polished, worldly, astute, diplomatic
Low: Forthright, genuine, artless, open, guileless, naive, unpretentious, involved
 Apprehension (O)
High: Apprehensive, self-doubting, worried, guilt-prone, insecure, worrying, self-blaming
Low: Self-assured, unworried, complacent, secure, free of guilt, confident, self-satisfied
 Openness to Change (Q1)
High: Open to change, experimental, liberal, analytical, critical, free-thinking, flexibility
Low: Traditional, attached to familiar, conservative, respecting traditional ideas
 Self-Reliance (Q2)
High: Self-reliant, solitary, resourceful, individualistic, self-sufficient
Low: Group-oriented, affiliative, a joiner and follower, dependent
 Perfectionism (Q3)
High: Perfectionistic, organized, compulsive, self-disciplined, socially precise, exacting
will power, control, self-sentimental
Low: Tolerates disorder, unexacting, flexible, undisciplined, lax, self-conflict, impulsive,
careless of social rules, uncontrolled
 Tension (Q4)
High: Tense, high energy, impatient, driven, frustrated, overwrought, time-driven
Low: Relaxed, placid, tranquil, torpid, patient, composed, low drive
 5 Global Factors
o Introversion / Extraversion (A, F, H, N, Q2)
o Low Anxiety / High Anxiety (C, L, O, Q4)
o Receptivity / Tough-Mindedness (A, I, M, Q1)
o Accommodation / Independence (E, H, L, Q1)
o Lack of Restraint / Self-Control (F, G, M, Q3)
 Example: Big 5 Personality Traits
o Costa and McCrae
o Factors are descriptive, not explanatory
o Domain: OCEAN
o Facets – additional traits that identify each domain
 Extraversion – engagement with the outer world, enthusiasm, engaging with others
Facets: Warmth, Gregariousness, Assertiveness, Activity, Excitement-Seeking, Positive Emotions
 Agreeableness – reflects individual differences in concern with cooperation and social harmony
Facets: Trust, Straightforwardness, Altruism, Compliance, Modesty, Tender-Mindedness
 Conscientiousness – concerns the way in which we control, regulate, and direct our impulses
Facets: Competence, Order, Dutifulness, Achievement-Striving, Self-Discipline, Deliberation
 Neuroticism – refers to the tendency to experience negative feelings
Facets: Anxiety, Angry Hostility, Depression, Self-Consciousness, Impulsiveness, Vulnerability
 Openness to Experience – describes a dimension of cognitive style that distinguishes imaginative,
creative people from down-to-earth, conventional people
Facets: Fantasy, Aesthetics, Feelings, Actions, Ideas, Values
 Personality Theories
o Biopsychosocial
 Source of reinforcement (detached, discordant, dependent, independent,
ambivalent)
 Pattern of coping behavior (active vs. passive)
 Not a general personality instrument
 Help in differential diagnoses
 Example: Millon Clinical Multiaxial Inventory
o Manifest Needs System (Henry Murray)
 Results in ipsative scores = strength of a need is expressed in relation to other needs
within the individual
 Normative comparisons questionable
 Example: Edwards Personal Preference Schedule
 Achievement - need to accomplish tasks well
 Deference - need to conform to customs and defer to others
 Order - need to plan well and be organized
 Exhibition - need to be the center of attention in a group
 Autonomy - need to be free of responsibilities and obligations
 Affiliation - need to form strong friendships and attachments
 Intraception - need to analyze behaviors and feelings of others
 Succorance - need to receive support and attention from others
 Dominance - need to be a leader and influence others
 Abasement - need to accept blame for problems and confess errors to
others
 Nurturance - need to be of assistance to others
 Change - need to seek new experiences and avoid routine
 Endurance - need to follow through on tasks and complete assignments
 Heterosexuality - need to be associated with and attractive to members of
the opposite sex
 Aggression - need to express one's opinion and be critical of others
 Example: Personality Research Form and Other Jackson Inventories
 Behaviorally-oriented and mutually-exclusive definitions of 20 personality
constructs
 For prediction of behavior of individuals in normal contexts
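The ipsative scoring mentioned above for forced-choice need inventories can be sketched as follows. This is only an illustration: the need names come from the list above, but the item pairings, the responses, and the function name `ipsative_scores` are invented and do not reproduce the published instrument.

```python
from collections import Counter

# Hypothetical forced-choice items: each pairs two needs and the
# respondent picks the one that describes them better.
items = [("Achievement", "Order"), ("Order", "Affiliation"),
         ("Achievement", "Affiliation"), ("Affiliation", "Order"),
         ("Order", "Achievement"), ("Affiliation", "Achievement")]
choices = ["Achievement", "Order", "Achievement",
           "Affiliation", "Achievement", "Achievement"]

def ipsative_scores(items, choices):
    """Count how often each need was preferred over its paired alternative."""
    scores = Counter({need: 0 for pair in items for need in pair})
    for pair, chosen in zip(items, choices):
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not an option in {pair}")
        scores[chosen] += 1
    return dict(scores)

scores = ipsative_scores(items, choices)
print(scores)
# The total always equals the number of items, so a high score on one
# need forces lower scores on the others.
print(sum(scores.values()) == len(items))
```

Because every respondent's scores sum to the same total, the scores rank needs within a person, which is why normative comparisons between people are questionable.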
Test-Taking Attitudes and Response Bias
 Faking
o Faking good – choosing answers that create a favorable impression
o Faking bad – choosing answers that make them appear more disturbed
o Face validity increases susceptibility to faking
 Social desirability – test-taker is unaware of putting up a “good front”
 Impression management – conscious dissembling to create a specific effect
o To reduce: use forced-choice items
 Acquiescence – tendency to answer True or Yes
o To reduce: balance the number of positively and negatively keyed items
 Deviation – tendency to give unusual or uncommon responses
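A small sketch of why balanced keying counters acquiescence: when half the items are keyed True and half False, a respondent who answers True (or False) to everything lands at the midpoint rather than at an extreme. The four-item scale and the function name `scale_score` are hypothetical.

```python
# Hypothetical 4-item True/False scale: two items keyed True, two keyed False.
keyed_true = [True, False, True, False]

def scale_score(responses, keyed_true):
    """Score 1 point whenever the response matches the keyed direction."""
    return sum(int(r == k) for r, k in zip(responses, keyed_true))

yea_sayer = [True, True, True, True]      # answers True to everything
nay_sayer = [False, False, False, False]  # answers False to everything

print(scale_score(yea_sayer, keyed_true))  # 2
print(scale_score(nay_sayer, keyed_true))  # 2
```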
Traits, States, Persons, and Situations
 Behavior can be explained by traits, states, and their interaction
 Individuals differ in extent of altering behavior to meet the situation
 Different behavior settings influence behavior
 Trait – relatively stable
 State – transitory condition
Measuring Interests and Attitudes
Values
 Difficulties with value inventories:
o Sampling systematically
o Appropriate level of abstraction
o Value domains
o Early inventories were incompatible with contemporary definitions
Interest Inventories
 Interest testing – used for educational and career assessment
o Also stimulated by occupational selection and classification
 Opinions and attitudes
o For social psychology research
o Consumer research and employee relations
 Has exploration validity – interest inventories increase behaviors needed for career exploration
o Used to introduce individual to careers that he or she has not previously considered
 Issue: Sex fairness
o Because tests are validated against existing groups, they can perpetuate group differences
 Example: Strong Interest Inventory
o “Like,” “Indifferent,” “Dislike” of 5 categories:
 Occupations
 School subjects
 Activities
 Leisure activities
 Day-to-day contact with various people
o Levels of Scores
 6 General Occupational Themes (Realistic, Investigative, Artistic, Social,
Enterprising, Conventional) – RIASEC
 25 Basic Interest Scales
 211 Occupational Scales
o Personal Style
 Work Style
 Learning Environment
 Leadership Style
 Risk-taking
o Validity: Criterion
 Example: Jackson Vocational Interest Survey
o Work roles – what a person does on the job
o Work styles – preference for situations or environments
o 34 basic interest scales, 26 work roles, and 8 work styles
o Equally applicable to both sexes
o Validity: Construct
 Example: Kuder Occupational Interest Survey
o Uses forced-choice triad (liked most to liked least)
o 10 Broad interest areas: Outdoor, Mechanical, Computational, Scientific, Persuasive, Artistic,
Literary, Musical, Social Service, Clerical
o Grouped based on content validity
o Scores expressed as correlation between respondent scores and the interest pattern of a
particular group
 Example: Self-Directed Search
o Self-administered, self-scored, self-interpreted
o Holland – occupational preference as a choice of a way of life
 Individuals seek environments that are congruent with their personality types
 Vocational choices are implementations of self-concepts
 Trends
o Expansion of occupational levels
o Effect of inventory on test-taker
o RIASEC model may not be a good fit for minority groups and other cultures
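The Kuder approach noted above, scoring a respondent by how closely their interest pattern matches that of a particular group, can be sketched with a plain Pearson correlation. These notes do not specify the index the published survey actually uses, so Pearson r stands in here as an assumption; the profiles, group names, and the helper `pearson_r` are all invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores on five interest areas.
respondent = [8, 3, 5, 9, 2]
group_profiles = {
    "Group A": [7, 4, 5, 8, 3],
    "Group B": [2, 9, 6, 1, 8],
}

# Similarity of the respondent's pattern to each group's pattern.
similarity = {name: pearson_r(respondent, profile)
              for name, profile in group_profiles.items()}
best_match = max(similarity, key=similarity.get)
print(similarity)
print(best_match)
```

The respondent is described by the group whose pattern they resemble most, rather than by how high any single score is.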
Opinion Surveys and Attitude Scales
 Attitude – tendency to react favorably or unfavorably to a stimulus (e.g. ethnic group, custom,
institution)
o Cannot be directly observed
o Should be inferred from verbal and nonverbal behavior
 Opinions – replies to specific questions
Other Assessment Techniques
Measures of Styles and Types
 Cognitive style – preferred and typical modes of perceiving, remembering, thinking, and problem-solving
 Individuals differ on how they perceive and categorize situations, which depends on prior learning
and experience
 Aptitude cannot be investigated independent of affect
o Example: Perceptual tasks are related to attitude, motivation, and emotion
o Example: Flexibility of closure = socially retiring, independent, analytical
o Example: Field dependence – extent to which a person’s perception of what is upright is
influenced by the surrounding visual field; sometimes called “cognitive control”
 Field-independent = active, participant approach to learning
 Personality types – constructs used to explain similarities and differences in preferred modes of
thinking, perceiving, and behaving across individuals
o Example: Myers-Briggs Type Indicator
 Attitude: Introversion vs. Extraversion
 Ways of Perceiving: Sensing vs. Intuition
 Ways of Judging: Thinking vs. Feeling
 Lifestyle: Judging vs. Perceiving
 All types are valuable and necessary, they each have strengths and weaknesses
 Individuals are more skilled within their preferred functions, processes, and
attitudes
o Criticism:
 Individuals should be unique
 Link between types and stereotypes
 Inadequate method for analyzing categorical data
Situational Tests
 Placing individual in a situation closely resembling a “real-life” criterion situation
 Character Education Inquiry – makes use of familiar, natural situations in one’s routine
o Measures honesty, self-control, altruism
 Situational Stress Test – sample individual’s behavior in a stressful, frustrating, or emotionally
disruptive environment
 Leaderless Group Discussion – group is assigned a topic for discussion. Measures verbal
communication, verbal problem-solving, and acceptance by peers
 Role-playing
Self-Concepts and Personal Constructs
 How events are perceived by the individual
 Extent of self-acceptance by the individual
 Capacity to conceptualize self – capability to assume distance from one’s self and one’s impulses
o Manifests in test-taking defensiveness, response sets, social desirability
o Increases with age, education, SES, and intelligence
 Example: Washington University Sentence Completion Test – measures levels of ego development
o Presocial, Impulsive, Self-Protective, Conformist, Self-Aware, Conscientious, Individualistic,
Autonomous, and Integrated
 Self-Esteem Inventories and Others
o Self-esteem – evaluative component of self-construct; an individual’s evaluation of his or
her own performance
o Example: Adjective Checklist – consists of 300 adjectives and adjectival phrases commonly
used to describe a person’s attributes.
o Example: Q-Sort – cards are sorted into piles ranging from “most characteristic” to “least
characteristic” in a forced-normal distribution (examiner specifies number of cards to be placed in each pile)
o Example: Semantic Differential – examines connotations of any given concept for the
individual (e.g. On a scale from 1 (bad) to 7 (good), how would you rate “Father”?)
 Evaluative (good-bad, valuable-worthless, clean-dirty)
 Potency (strong-weak, large-small, heavy-light)
 Activity (active-passive, fast-slow, sharp-dull)
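Scoring a semantic differential along the three factors listed above can be sketched as a mean rating per factor. The bipolar scales are the ones named in these notes; the ratings, the 1-7 direction (7 = first-named pole), and the function name `factor_scores` are assumptions made for illustration.

```python
from statistics import mean

# Bipolar scales grouped by factor, as listed above.
factors = {
    "Evaluative": ["good-bad", "valuable-worthless", "clean-dirty"],
    "Potency": ["strong-weak", "large-small", "heavy-light"],
    "Activity": ["active-passive", "fast-slow", "sharp-dull"],
}

# Hypothetical 1-7 ratings of the concept "Father" (7 = first-named pole).
ratings = {
    "good-bad": 6, "valuable-worthless": 7, "clean-dirty": 5,
    "strong-weak": 6, "large-small": 4, "heavy-light": 5,
    "active-passive": 3, "fast-slow": 2, "sharp-dull": 4,
}

def factor_scores(ratings, factors):
    """Average the ratings of the scales that belong to each factor."""
    return {name: mean(ratings[scale] for scale in scales)
            for name, scales in factors.items()}

profile = factor_scores(ratings, factors)
print(profile)
```

In this invented example the concept is rated favorably and as fairly potent, but low on activity.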
Observer Reports
 Naturalistic observation – direct observation of spontaneous behavior in natural settings (e.g. diary
method, time sampling)
o No control is exerted over the stimulus situation
 Interview – elicit life-history data
o Can be highly-structured to unstructured
o Affords direct observation
 Ratings – evaluation of the individual based on cumulative, uncontrolled observations
o Disadvantages:
 Ambiguity
 Amount of relevant contact
 Halo effect
 Error of central tendency
 Leniency error
 Nominating technique – choose one person with whom individual would like to study, work, eat
lunch
o Can identify potential leaders, isolates
o Good concurrent and predictive validity because of the high number of raters, raters being in a
good position to observe, and observers’ opinions influencing the observed person’s actions
Biodata
 Interview and questionnaires to elicit life-history data
 Consistently good predictors of performance
 Developed through
o Criterion keying and cross-validation
o Identification of constructs through job analyses and surveys
APPLICATIONS OF TESTING
Educational Testing
 Prediction and classification within a specific educational setting
 Uses educational achievement tests
Achievement tests
 Measures effects of specific program of instruction or training
 Measure effects of relatively standardized experiences (controlled, known)
 Aptitude – cumulative influence of different learning experiences in daily living
o Measure effect of learning under relatively uncontrolled or unknown conditions
 Ability – any measure of cognitive behavior
o Sample of what individual knows at the time of testing
o Level of development in one or more abilities
o Includes both aptitude and achievement
 No two tests correlate perfectly with one another
o Difference in achievement and ability could be about overprediction or underprediction
 Is objective, uniform, and efficient
 Functions:
o Reveal weaknesses in learning
o Give direction to subsequent learning
o Motivate learner
o Provide means of adapting to individual results
o Aid in evaluating teaching
o Aid in formulating educational goals (analyze educational objectives, critical examination of
content of instruction methods)
 Item format
o Multiple choice is often used
 Disadvantages:
 Promote rote memorization
 Learning of isolated facts vs. development of problem-solving and
conceptual understanding
o Constructed-response / open-ended = requires examinee to generate an answer
o Portfolio assessment – cumulative record of a sample of a student’s work in various areas
over a period of time
 General Achievement Batteries
o Provide profiles of scores on individual subtests or in major academic areas
o Horizontal and vertical comparisons
o Large majority have overlapping items for different levels
o Some are concurrently normed with aptitude tests
 Using same normative sample to enable direct comparison of scores
 WISC & WIAT
 Tests of Minimum Competency
o Ascertain mastery of basic skills
o Teacher-Made Classroom Tests
o Tests for College Level
 Used for placement and admissions
o Graduate School Admission
 For admission and placement
 For scholarships, fellowships, and special appointments
o Diagnostic and Prognostic Testing
 Diagnosis of learning disabilities
 Prognostic – predict usual performance in a course
 Teach-test-teach – how well s/he can learn during one-to-one instruction
o Assessment in Early Childhood Education
 Measure outcomes of early childhood education
 School readiness – attainment of pre-requisite skills, knowledge, attitudes,
motivation, and behavior to profit from school instruction
 Emphasis on abilities required for learning to read
Occupational Testing
 Selection and classification of personnel
 Individuals should be placed in a job where they are most qualified
 Traits irrelevant to job requirements should not affect selection decisions
 Selection tests should be validated against job performance
Global Procedures for Performance Assessment
 Job Sample – task is part of work to be performed, but all applicants operate under uniform
conditions
 Simulations – reproduce functions in the job
 Job analysis – identify requirements that differentiate one job from other jobs
o Identify aspects of performance that differentiate good and poor workers
o Facilitates effective use of tests across jobs that may seem different
 Job elements – units describing critical work requirements
o Description of job activities in terms of behavioral requirements
 Synthetic validation – it is possible to identify skills, knowledge, and performance requirements
common to many jobs
o Job analysis to identify elements and relative weights
o Analysis and empirical study of each test to determine extent to which it measures
proficiency in performing job elements
o Finding validity in each test from the weight of these elements in the job and in the test
 Validity generalization – application of prior validity findings to a new situation via meta-analysis
 Multiple Factor Theory
o Considers behaviors under the control of worker + environmental conditions
o Effectiveness + productivity + utility
o Any job has multiple performance components, consisting of various combinations of
knowledge, skills, and motivations
 Tests of verbal and numerical reasoning have some predictive validity for different jobs. However,
additional variables need to be measured
 Special aptitude tests – for testing abilities that are “supplemental” to IQ, such as mechanical,
musical, etc.
o For abilities specific to situations, not included in standard batteries
o Example: psychomotor tests – for manual dexterity, motor, perceptual, mechanical abilities
o Mechanical aptitudes – rapid manipulation of items, spatial manipulation / perception
o Clerical aptitudes – perceptual speed, accuracy
o Computer-related aptitudes
o Social and emotional aspects of intelligence – knowledge, skills, abilities of examinees in
interpersonal and self-management
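The synthetic validation steps described earlier in this section (weight the job elements, estimate how well a test measures each element, then combine) can be sketched as a weighted average. This is a deliberately simplified illustration, not a published formula: the element names, weights, validities, and the function name `synthetic_validity` are all hypothetical.

```python
# Hypothetical relative weights of job elements, from a job analysis.
element_weights = {
    "clerical accuracy": 0.5,
    "numerical reasoning": 0.3,
    "customer contact": 0.2,
}

# Hypothetical validity of one test for each job element.
test_element_validity = {
    "clerical accuracy": 0.40,
    "numerical reasoning": 0.30,
    "customer contact": 0.10,
}

def synthetic_validity(weights, validities):
    """Weight each element's validity by the element's importance in the job."""
    total_weight = sum(weights.values())
    weighted = sum(w * validities.get(element, 0.0)
                   for element, w in weights.items())
    return weighted / total_weight

estimate = synthetic_validity(element_weights, test_element_validity)
print(round(estimate, 2))  # 0.31
```

Because the element-level validities can be reused, the same test can be evaluated for a different job simply by swapping in that job's element weights.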
Personality Testing in the Workplace
 Most relevant personality dimensions in specific jobs
 Examples:
o Emotional stability – quick decision-making in stressful conditions
o Agreeableness – needed for extensive interpersonal contact
 Integrity tests – applicant’s attitude toward and history of involvement in illegal behaviors
 Leadership – ability to persuade others to work towards a common goal
Clinical and Counseling Psychology
 Individual intelligence tests, educational tests, brief questionnaires and rating scales
 For diagnostic, prognostic, and therapeutic decisions in mental health settings
 Psychological assessment – intensive study of one or more individuals through multiple sources of
data
o Provides an integrated picture of the individual
o Multiple sources protect against overgeneralizing test data
o Aim: making informed decisions pertaining to differential diagnoses, career selection,
treatment, education, forensic
o Continuous process of hypothesis-generation and testing
o Involves professional judgement based on knowledge about specific problems of specific
populations
o Ecological viewpoint – also need to consider the context of person’s life
Intelligence Tests
 Explore patterns of test scores for strengths and weaknesses
 Profile analysis:
o Amount of scatter or variation among scores
o Base rate data – frequency of such features in normative population
o Score patterns that are typical of special populations / clinical syndromes
 Irregularities in performance suggest avenues for exploration
 Observing general behavior in the context of testing
 Integrate test’s statistical info with human development, personality theory, etc.
 Consider both skills and extraneous conditions
 Need for supplementary information
 Calls for individualized interpretation of test performance rather than uniform application of any
type of pattern analysis
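The two quantities named under profile analysis, scatter and base rate, can be illustrated with a small sketch. It assumes scatter is taken as the range of subtest scores (one of several possible definitions) and that a list of scatter values from a normative sample is available. All names and numbers are hypothetical.

```python
def scatter(subtest_scores):
    """Scatter as the range of a profile: highest minus lowest subtest score."""
    return max(subtest_scores) - min(subtest_scores)


def base_rate(observed_scatter, normative_scatters):
    """Proportion of the normative sample with scatter at least this large."""
    at_least = sum(1 for s in normative_scatters if s >= observed_scatter)
    return at_least / len(normative_scatters)


# Hypothetical scaled scores for one examinee and hypothetical normative scatters.
examinee = [12, 9, 14, 6, 11]
norm_scatters = [3, 5, 6, 7, 8, 4, 9, 5, 6, 10]

observed = scatter(examinee)  # 14 - 6 = 8
print(observed, base_rate(observed, norm_scatters))
```

A high base rate means the observed scatter is common in the normative population, so it carries little diagnostic weight on its own.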
Neuropsychological Assessment
 Apply what is known about the brain-body relationship for diagnosis and treatment of brain-damaged individuals
 E.g. left hemisphere lesion = V < P in Wechsler, opposite pattern in right-hemisphere lesions and
diffuse brain damage
 Age affects behavioral symptoms caused by brain damage
o Amount of learning, intellectual development
o The younger the age, the greater the effect of brain damage on intellectual functioning
 Chronicity – amount of time elapsed since injury will affect physiological changes and behavioral
recovery through learning / compensation
 Intellectual impairment may be an indirect result of brain damage
 Same behavior may be due to organic, emotional, or mixed causes
 Need premorbid ability level to examine the extent of damage
 Instruments: tests of perception of spatial relations and memory for newly learned material (example:
Bender Visual Motor Gestalt Test)
 Difficult to interpret results in terms of score patterns
 Batteries can measure all significant neuropsychological skills:
o Detect brain damage
o Help identify and localize damaged area
o Differentiate among syndromes
o Planning rehabilitation through identifying type and extent of behavioral deficits
Identifying Learning Disabilities
 Specific learning disability
o Disorder in basic psychological processes involved in using and understanding spoken or
written language, which manifests in imperfect ability to listen, think, speak, read, write,
spell, do math calculations
o Does not include children whose learning problems are a result of economic /
environmental / cultural disadvantage
 Severe discrepancy in ability and achievement in different communication and math skills
 Not achieving in a manner commensurate with age and ability levels, even with proper education
 Shows normal or above-normal intelligence, with difficulties in learning one or more basic skills
 Could also manifest in difficulty perceiving and encoding information, poor integration of input from
different senses, disruption of sensorimotor coordination
 Also: aggression, affective and interpersonal problems because of academic failures and frustration
 Assessment uses different sources because:
o Various behavioral disorders are associated with LD
o Individual differences in combination of symptoms
o Need for specific information on nature and extent of disability
 Dynamic assessment – deliberate departure from standardized or uniform test administration to
elicit qualitative data
o “Testing the limits” – additional cues are provided
o Learning potential assessment – test-teach-test
o Disadvantages:
 Transportability – extent to which it can be used by others
 Generalizability of taught problem-solving to real-life problems
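The "severe discrepancy in ability and achievement" criterion listed above can be illustrated with a simple-difference sketch. It assumes both scores are standard scores on the same scale with a standard deviation of 15, and it uses an illustrative cutoff of 1.5 standard deviations. The function names and the cutoff are assumptions, not values given in these notes.

```python
def discrepancy_in_sd(ability, achievement, sd=15.0):
    """Ability-achievement gap in standard deviation units.

    Both scores are assumed to be standard scores on the same scale.
    """
    return (ability - achievement) / sd


def is_severe(ability, achievement, cutoff_sd=1.5):
    """Flag a gap of at least cutoff_sd; the cutoff is an illustrative assumption."""
    return discrepancy_in_sd(ability, achievement) >= cutoff_sd


print(discrepancy_in_sd(110, 80))  # (110 - 80) / 15
print(is_severe(110, 80))
```

A flag like this is only a starting point. As the notes state, assessment of learning disabilities draws on different sources because symptoms vary across individuals.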
Behavioral Assessment
 Define problem through functional analysis of behavior
 Select appropriate treatments
 Assess behavior change resulting from treatment
 Procedures:
o Self-report
o Direct observation
o Physiological measures (for anxiety, sexual, and sleep disorders)
Career Assessment
 Integrate information from expressed interests, preferences, and value system
 Career maturity – mastery of vocational tasks appropriate to age level
Clinical Judgment
 Influenced by cultural stereotypes, fallacious prediction principles
 Used when satisfactory tests are unavailable
 Suited for cases that are rare and idiosyncratic, frequency is too low for the development of
statistical strategies
 Psychologists with low levels of cognitive complexity are more likely to form biased clinical judgments
The Assessment Report
 There is no standard form or outline
 Report must adapt to needs, interests, and background of those who will receive it
 Should select what is relevant to answering questions
 Concentrate on individual’s differentiating characteristics rather than on traits on which the
individual is average
 Barnum effect – pseudo-validation from general, vague statements that apply to most people
ETHICAL AND SOCIAL CONSIDERATIONS
1. Do no harm.
a. Provide services and use techniques in which they have been trained
b. Choose tests that are appropriate for the purpose and for the examinee
c. Recognize boundaries of competencies and limitations of expertise
2. Be sufficiently knowledgeable about the science of human behavior to guard against
unwarranted inferences in interpretation.
3. Protect safety and security of test materials.
4. Protect safety and security of examinees.
a. Persons should not be subjected to testing programs under false pretenses
b. Protect the individual’s privacy
i. Information to be asked must be relevant to purpose
ii. Informed consent should include the purpose of testing, data needed, and use
of scores
c. Test-taker should have the opportunity to comment on the report
i. The report should be readily understandable, free from technical jargon and
labels, and oriented towards the immediate objective of testing
d. Records should not be released without the knowledge or consent of the examinee,
unless mandated by law
For Further Reading:
Anastasi, A. & Urbina, S. (1997). Psychological Testing (7th Ed.). New Jersey: Prentice Hall
Psychological Association of the Philippines. (2008). Code of Ethics for Philippine Psychologists. Retrieved
from http://www.pap.org.ph/includes/view/default/uploads/code_of_ethics_pdf.pdf