
Psych-Assess-Reviewer-Chapter-1-to-6

INTRODUCTION TO PSYCHOLOGICAL TESTING AND ASSESSMENT
- 1905 – Alfred Binet and a colleague published a test designed to help place Paris schoolchildren in appropriate classes.
- WWI and WWII: psychological testing for the military.
Psychological Testing
- Measuring psychology-related
variables by means of devices or
procedures designed to obtain a
sample of behavior.
- Objective: obtain some gauge,
numerical in nature, with regard
to an ability or attribute.
- Process: may be individual or
group in nature; adding up the
number of correct answers or the
number of certain types of
responses.
- Role of Evaluator: one tester may
be substituted for another tester
without appreciably affecting the
evaluation.
- Skill of Evaluator: technician-like
skills in terms of administering
and scoring a test, and
interpreting a test result.
- Outcome: testing yields a test
score or series of test scores.
Psychological Assessment
- Gathering and integration of psychology-related data for the purpose of making a psychological evaluation.
- Through use of tools: tests, interviews, case studies, behavioral observation.
- Objective: to answer a referral question, solve a problem, or arrive at a decision through use of tools of evaluation.
- Process: individualized; focuses on how an individual processes rather than simply the results of that processing.
- Role of Evaluator: assessor is key to the process of selecting tests and/or other tools of evaluation.
- Skill of Evaluator: educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data.
- Outcome: logical problem-solving approach.
VARIETIES OF ASSESSMENT
• Psychological Assessment
• Therapeutic Psychological
Assessment
• Educational Assessment
• Retrospective Assessment
• Remote Assessment
PROCESS OF ASSESSMENT
Referral - Initial Contact - Selection of Tools - Formal Assessment - Report Writing - Feedback Sessions
Collaborative Psychological Assessment
Therapeutic Psychological Assessment
Dynamic Assessment: Evaluation – Intervention – Evaluation
TOOLS OF PSYCHOLOGICAL ASSESSMENT
1. Psychological Tests - may differ in
content, format, administration
procedures, scoring, interpretation,
and technical quality or psychometric
soundness.
Score: code or summary statement, not necessarily numerical in nature, that reflects an evaluation of performance on a test, task, interview, or some other sample of behavior.
Scoring: process of assigning such evaluative codes or statements to performance on tests, tasks, interviews, or other samples of behavior.
Cut Score: reference point, usually numerical, derived by judgement and used to divide a set of data into two or more classifications.
Psychometrics: science of psychological measurement.
Psychometrician or Psychometrist: professional who uses, analyzes, and interprets psychological test data.
2. Interview
- Taking note of verbal and nonverbal behavior.
- May be conducted in various formats.
- Through direct communication involving a reciprocal exchange.
- Motivational Interviewing – therapeutic dialogue that combines person-centered listening skills, such as openness and empathy, with the use of cognition-altering techniques designed to positively affect motivation and effect therapeutic change.
3. Portfolio – work products in any medium: canvas, film, video, audio, or some other form.
4. Case History Data – records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information.
5. Behavioral Observation – monitoring the actions of others by recording quantitative and/or qualitative information.
6. Role-play Test – acting an improvised part in a simulated situation.
7. Computers/Computer-Assisted Psychological Assessment (CAPA)
PARTIES IN THE ASSESSMENT ENTERPRISE
• Test Developers – publishers,
creators, distributors.
• Test Users – Levels A, S, B, and C.
• Test Takers – psychological autopsy
HISTORICAL, CULTURAL, LEGAL, AND
ETHICAL CONSIDERATIONS
Chinese Imperial Examinations
2200 BCE
- Music
- Archery
- Horsemanship
- Writing and Arithmetic
- Agriculture
- Geography
- Civil Law
- Military Strategy
(Song Dynasty) 960 – 1279 CE
- Tests: knowledge of classical
literature.
- Wisdom of the past.
➢ Privileges: wearing of special garb, exemption from taxes, exemption from government-sponsored interrogation by torture.
Middle Ages
- Ancient Greco-Roman writings: indicative of attempts to categorize people in terms of personality types – overabundance or deficiency in some bodily fluid.
Charles Darwin
- Natural Selection (1859)
- Chance variation in species would
be selected or rejected by nature
according to adaptivity and
survival value.
- Individual Differences
Francis Galton
- Classified people according to their natural gifts and ascertained their deviation from an average.
Wilhelm Max Wundt
- Experimental psychology laboratory.
- General description of human abilities with respect to variables such as reaction time, perception, and attention span.
James McKeen Cattell
- Coined the term mental test
(1890).
- The Psychological Corporation – advancement of psychology and the promotion of the useful applications of psychology.
Charles Spearman
- Originated the concept of test reliability and built the mathematical framework for the statistical technique of factor analysis.
Victor Henri
- With Alfred Binet: how mental
tests could be used to measure
higher mental processes.
Emil Kraepelin
- Word association technique
Edward B. Titchener
- Psychological school of thought:
Structuralism.
- Coined the word empathy.
G. Stanley Hall
- First president of American
Psychological Association (APA)
Lightner Witmer
- Succeeded Cattell as director of the psychology laboratory at the University of Pennsylvania.
- Founded the first psychological clinic in the US.
20TH CENTURY: MEASUREMENT OF
INTELLIGENCE
1895 – Alfred Binet and Victor Henri:
Measurement of Abilities (memory and
social comprehension).
1905 – Binet and Theodore Simon: 30-item
measuring scale of intelligence.
1939 – David Wechsler: measure adult
intelligence.
1914 – Robert S. Woodworth: emotional
fitness – Woodworth Psychoneurotic
Inventory: personality test.
Self-Report – process whereby assessees themselves supply assessment-related information by responding to questions, keeping a diary, or self-monitoring thoughts or behaviors.
Projective Test – the individual is assumed to project onto some ambiguous stimulus (inkblot, drawing, photograph).
Thematic Apperception Test (TAT) –
use of pictures as projective stimuli.
CULTURE AND ASSESSMENT
Culture
- The socially transmitted behavior
patterns, beliefs, and products of
work of a particular population,
community, or group of people.
Cultural Considerations
- Culture-Specific Tests – for
people from one culture but not
from another.
- Culture-Fair Tests – to be free of
cultural bias.
LEGAL AND ETHICAL CONSIDERATIONS
Law – body of rules thought to be for the
good of society as a whole.
Ethics – body of principles of right, proper, or
good conduct.
STATISTICS
➢ Range of techniques and procedures
for analyzing, interpreting, displaying,
and making decisions based on data.
➢ Figures and facts.
➢ Central component: Math
Psychological Statistics
- Based on Psychological Data
- Psychological constructs or variables.
Types of Data
• Data – set of qualitative and
quantitative values, made up of
variables.
• Variable – can be measured,
different values between individuals
or in the same individual at different
time points.
Types of Variables
• Independent Variable – can be controlled by the researcher; not affected by the state of any other variable in the experiment; may have different levels.
• Dependent Variable – the variable that is measured.
•
Qualitative Variable – qualities, does
not imply numerical ordering.
• Quantitative Variable – measured in
terms of numbers.
• Discrete Variable – possible scores
are discrete points on the scale.
• Continuous Variable – possible
scores are continuous.
Level or Scales of Measurement
• Nominal – differentiates between
items or subjects based on
categories, CLASSIFICATIONS.
• Ordinal – ranking, spectrum of
values.
• Interval – degree of difference
between
observations,
equal
distances.
• Ratio – fixed intervals between
scores, has TRUE ZERO POINT.
Types of Statistical Analyses
• Descriptive Statistics – summarizing,
graphing, and describing quantitative
information.
• Inferential Statistics – drawing of conclusions and generalizations, testing hypotheses, and deriving estimates.
Describing Data
Qualitative Variables
- Frequency Tables
- Pie Charts
- Bar Charts
- Comparing Distributions using
Bar Charts
Quantitative Variables
- Histogram
- Frequency Polygon
- Cumulative Frequency Polygon
- Bar Charts
- Line Graph
Shape of Distribution
• Symmetrical – can be cut in two to form mirror images; in practice there is never a perfectly symmetrical distribution.
• Asymmetrical/Skewed – one of the 2 tails of the distribution is disproportionately longer than the other: positive or negative.
• Normal Distribution – 2 sides are roughly the same shape, single peak at the center, 2 tails extend out equally; bell-shaped or bell curve.
• Kurtosis – degree of flatness or peakedness of a distribution; higher peak: data are clustered around the middle; flat: data are spread out evenly.
Measures of Central Tendency
➢ Mean – sum of the numbers divided
by the number of numbers.
➢ Median – midpoint of the ordered data set.
➢ Mode – most frequently occurring value; the only measure of central tendency applicable to qualitative data.
Measures of Spread or Variability
- How spread out a group of scores
is within a distribution.
- Range – simplest measure of
variability; highest score minus
lowest score.
- Interquartile Range (IQR) –
divides data set into 4 parts (25%
of data), range of the middle 50%
of the scores in a distribution.
- Variance – average squared
difference of the scores from the
mean.
- Standard Deviation – square
root of variance.
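A minimal Python sketch (the scores are made up for illustration) showing how these measures of central tendency and spread are computed:

    import statistics

    scores = [85, 90, 78, 92, 88, 75, 90, 84]        # hypothetical test scores

    mean = statistics.mean(scores)                   # sum of the numbers / number of numbers
    median = statistics.median(scores)               # midpoint of the ordered scores
    mode = statistics.mode(scores)                   # most frequently occurring value

    score_range = max(scores) - min(scores)          # highest score minus lowest score
    q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
    iqr = q3 - q1                                    # range of the middle 50% of scores
    variance = statistics.pvariance(scores)          # average squared deviation from the mean
    sd = statistics.pstdev(scores)                   # square root of the variance

    print(mean, median, mode, score_range, iqr, variance, sd)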
Standard Normal Distribution
1. Symmetrical around the mean.
2. Mean, median, and mode are equal.
3. Area under the normal curve is equal to
1.0.
4. Denser in the center, less dense in
the tails.
5. Two parameters: mean and
standard deviation.
6. 68% of the area is one standard
deviation above and below the
mean.
7. 95% of the area is within two
standard deviations above and
below the mean.
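The 68% and 95% area figures can be checked with a short Python sketch using the standard normal cumulative distribution function:

    from math import erf, sqrt

    def normal_cdf(z):
        # cumulative area under the standard normal curve up to z
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(normal_cdf(1) - normal_cdf(-1))   # ~0.68: within 1 SD of the mean
    print(normal_cdf(2) - normal_cdf(-2))   # ~0.95: within 2 SDs of the mean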
Standard Scores
- Raw scores that have been converted from one scale to another scale.
- Position of a test taker's performance relative to other test takers is readily apparent.
Z Scores – Mean: 0, SD: 1
T Scores – Mean: 50, SD: 10
A Scores – SD: 100
IQ Scores – Mean: 100, SD: 15
Stanine – Mean: 5, SD: 2
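A sketch of converting raw scores to the standard scales above (the raw scores are hypothetical; each conversion is mean + SD * z; the stanine line is a simplification, since published stanines are assigned from fixed percentile bands):

    import statistics

    raw = [12, 15, 18, 20, 25, 9, 14]            # hypothetical raw scores
    m, sd = statistics.mean(raw), statistics.pstdev(raw)

    for x in raw:
        z = (x - m) / sd                         # z score: mean 0, SD 1
        t = 50 + 10 * z                          # T score: mean 50, SD 10
        iq = 100 + 15 * z                        # deviation IQ: mean 100, SD 15
        stanine = min(9, max(1, round(5 + 2 * z)))   # stanine: mean 5, SD 2, kept in 1-9
        print(f"raw={x:2d}  z={z:+.2f}  T={t:5.1f}  IQ={iq:5.1f}  stanine={stanine}")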
Correlation
- Statistical technique that is used
to measure and describe the
relationship between two
variables.
Strength of Correlation
- Correlation coefficients range from -1.00 to +1.00.
- Closer to 1: stronger (regardless of the sign).
Types of Correlation:
➢ Positive Correlation – as one variable increases or decreases, the other variable increases or decreases in the same direction.
➢ Negative Correlation – as one variable increases, the other variable decreases.
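A small Python sketch of the Pearson correlation coefficient, using made-up paired scores:

    def pearson_r(x, y):
        # Pearson r: covariance of X and Y divided by the
        # product of their standard deviations.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    x = [2, 4, 5, 6, 8]      # hypothetical scores on one variable
    y = [1, 3, 4, 5, 9]      # hypothetical scores on a second variable
    print(pearson_r(x, y))   # close to +1: strong positive correlation

The sign gives the direction of the relationship; the absolute value gives its strength.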
RELIABILITY, VALIDITY, AND UTILITY
RELIABILITY
- Consistency in measurement.
- Reliability Coefficient: an index of
reliability, a proportion that
indicates the ratio between the
true score variance on a test and
the total variance.
- Proportion of the total variance
attributed to true variance.
- Greater proportion of the total
variance attributed to true
variance: more reliable.
Classical Test Theory – score on a test is
presumed to reflect not only the test taker’s
true score but also error.
Error – component of the observed test
score that does not have to do with the test
taker’s ability.
X = T + E, where X is the observed score, T the true score, and E the error.
Variance – standard deviation squared; a statistic useful in describing sources of test score variability.
True Variance – variance from true differences.
Error Variance – variance from irrelevant, random sources.
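A simulation sketch of the X = T + E model (all numbers are assumed): generating true scores plus random error shows that the reliability coefficient works out to true variance divided by total variance:

    import random
    import statistics

    random.seed(1)

    # Observed score X = true score T + error E
    true_scores = [random.gauss(100, 15) for _ in range(10000)]   # true differences
    errors = [random.gauss(0, 5) for _ in range(10000)]           # random measurement error
    observed = [t + e for t, e in zip(true_scores, errors)]

    true_var = statistics.pvariance(true_scores)
    total_var = statistics.pvariance(observed)
    print(true_var / total_var)   # ~0.90, i.e., 15**2 / (15**2 + 5**2)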
TYPES OF ERROR
➢ Measurement Error – all of the
factors associated with the process of
measuring some variable, other than
the variable being measured.
➢ Random Error – source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables.
➢ Systematic Error – typically constant
or proportionate to what is presumed
to be the true value of the variable
being measured.
SOURCES OF ERROR VARIANCE
➢ Test Construction
➢ Test Administration
➢ Test Scoring and Interpretation
TEST-RETEST RELIABILITY ESTIMATES
- Reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
- Appropriate when evaluating the
reliability of a test that purports
to measure something that is
relatively stable over time.
- Interval between administrations
increases = correlation between
the scores obtained on each
testing decreases.
- Longer time passes = reliability
coefficient will be lower.
- Coefficient of Stability – interval
between testing is greater than 6
months.
Internal Consistency Estimate of Reliability
or Inter-Item Consistency
- Degree of correlation among all the items on a scale (homogeneity).
- Split-Half Reliability Estimates – obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
SPEARMAN - BROWN FORMULA
- To estimate internal consistency
reliability from a correlation of
two halves of a test.
- Specific application of a more
general formula to estimate the
reliability of a test that is
lengthened or shortened by any
number of items.
General Spearman-Brown Formula: r_SB = (n * r_xy) / (1 + (n - 1) * r_xy), where n is the factor by which test length is changed and r_xy is the observed reliability.
Adjusted/Corrected Split-Half: r_SB = (2 * r_hh) / (1 + r_hh), where r_hh is the correlation between the two test halves.
Reliability increases = test length
increases
- Based on consideration of the
entire test – tend to be higher
than those based on half of a test.
- To determine the number of
items needed to attain a desired
level of reliability – new items
must be equivalent in content
and difficulty.
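A minimal implementation of the general formula given above (r_half is a hypothetical split-half correlation):

    def spearman_brown(r_xy, n):
        # Predicted reliability when test length changes by a factor of n
        # (n = 2 corresponds to correcting a split-half correlation).
        return (n * r_xy) / (1 + (n - 1) * r_xy)

    r_half = 0.70
    print(spearman_brown(r_half, 2))    # corrected split-half reliability: ~0.82
    print(spearman_brown(0.82, 1.5))    # predicted reliability if lengthened 1.5x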
KUDER-RICHARDSON FORMULA
- G. Frederic Kuder and M.W.
Richardson
- 20th formula developed in a
series.
- Statistic of choice for determining
the inter-item consistency of
dichotomous items – items that
can be scored right or wrong.
COEFFICIENT ALPHA
- Mean of all possible split-half
correlations, corrected by the
Spearman-Brown formula.
- Appropriate for use on tests containing non-dichotomous items.
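A sketch of coefficient alpha computed directly from its variance form, alpha = (k / (k - 1)) * (1 - sum of item variances / total score variance); the responses below are invented, and with 0/1 (dichotomous) items the same code gives KR-20:

    import statistics

    def coefficient_alpha(item_scores):
        # item_scores: one list of item responses per test taker
        k = len(item_scores[0])                                   # number of items
        items = list(zip(*item_scores))                           # responses grouped by item
        item_var_sum = sum(statistics.pvariance(i) for i in items)
        totals = [sum(person) for person in item_scores]
        total_var = statistics.pvariance(totals)                  # variance of total scores
        return (k / (k - 1)) * (1 - item_var_sum / total_var)

    # Hypothetical 1-5 ratings from five test takers on four items
    data = [[4, 5, 4, 4], [2, 3, 2, 3], [5, 5, 4, 5], [3, 3, 3, 2], [4, 4, 5, 4]]
    print(coefficient_alpha(data))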
Measures of Inter-Scorer Reliability
- Also called scorer reliability, judge reliability, observer reliability, and inter-rater reliability.
- Degree of agreement or consistency between two or more scorers with regard to a particular measure.
- Often used when coding nonverbal behavior.
- Expressed as a coefficient of inter-scorer reliability.
• Test-retest – stability of a measure
• Alternate Forms – relationship
between different forms of a
measure.
• Internal Consistency – extent to
which items on a scale relate to one
another.
• Inter-Scorer – level of agreement between raters on a measure.
True Score Model of Measurement and
Alternatives
- Classical Test Theory (CTT) – each
test taker has a true score on a
test that would be obtained but
for the action of measurement
error.
- Domain Sampling Theory/Generalizability Theory – to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
- Item Response Theory (IRT) – item difficulty and item discrimination.
• Item Difficulty – attribute of not
being easily accomplished, solved,
or comprehended.
• Item Discrimination – degree to
which an item differentiates among
people with higher or lower levels of
the trait or ability.
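A hedged sketch of how difficulty and discrimination appear in an IRT model, here the common two-parameter logistic (2PL) item characteristic curve; the parameter values are made up:

    from math import exp

    def icc_2pl(theta, a, b):
        # Probability of a correct response given ability theta,
        # discrimination a, and difficulty b.
        return 1 / (1 + exp(-a * (theta - b)))

    # Higher b shifts the curve right (a harder item);
    # higher a makes the curve steeper (a more discriminating item).
    for theta in (-2, -1, 0, 1, 2):
        print(theta, round(icc_2pl(theta, a=1.5, b=0.0), 3))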
VALIDITY
- Judgement or estimate of how
well a test measures what it
purports to measure in a
particular context.
- Appropriateness of inferences
drawn from test scores.
- Validation/Validation Studies – gathering and evaluating evidence about validity.
- Local Validation Studies – needed when the test user plans to alter in some way the format, instructions, language, or content of the test.
Trinitarian Model of Validity
- Content Validity – measure of
validity based on an evaluation of
the subjects, topics, or content
covered by the items in the test.
- Test Blueprint – a plan regarding
the types of information to be
covered by the items.
- Criterion-Related Validity – obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
o Concurrent Validity – index of the degree to which a test score is related to some criterion measure obtained at the same time.
o Predictive Validity – index of the degree to which a test score predicts some criterion measure.
Base Rate – extent to which a particular trait, behavior, characteristic, or attribute exists in the population.
- Hit Rate – proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.
- Miss Rate – proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute; a miss amounts to an inaccurate prediction.
- False Positive – a miss wherein the test predicted that the test taker did possess the particular characteristic or attribute being measured when in fact the test taker did not.
- False Negative – a miss wherein the test predicted that the test taker did not possess the particular characteristic or attribute being measured when the test taker actually did.
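A toy Python sketch of these rates computed from hypothetical (prediction, actual) pairs:

    # Each pair: (test predicted the attribute, person actually has it)
    outcomes = [(True, True), (True, False), (False, True), (True, True),
                (False, False), (True, True), (False, False), (False, True)]

    hits = sum(pred == actual for pred, actual in outcomes)   # accurate identifications
    false_pos = sum(pred and not actual for pred, actual in outcomes)
    false_neg = sum(actual and not pred for pred, actual in outcomes)
    base_rate = sum(actual for _, actual in outcomes) / len(outcomes)

    print(f"hit rate: {hits / len(outcomes):.2f}")
    print(f"miss rate: {1 - hits / len(outcomes):.2f}")       # false positives + negatives
    print(f"false positives: {false_pos}, false negatives: {false_neg}")
    print(f"base rate: {base_rate:.2f}")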
Validity Coefficient
- A correlation coefficient that
provides a measure of the
relationship between test scores
and scores on the criterion
measure.
- Incremental Validity – degree to
which an additional predictor
explains something about the
criterion measure that is not
explained by predictors already in
use.
Trinitarian Model of Validity
• Construct Validity – “umbrella
validity”
• Construct – informed, scientific idea
developed or hypothesized to
describe or explain behavior.
Evidence of Construct Validity
• Evidence of Homogeneity – how
uniform a test is in measuring a
single concept/construct.
• Evidence of Changes with Age – if a
test score purports to be a measure
of a construct that could be
expected to change over time, then,
the test score too should show the
same progressive changes with age
to be considered a valid measure of
the construct.
• Convergent Evidence – from a
number of sources.
• Discriminant Evidence – showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated.
Validation Strategies
• Ecological Validity – how well a test
measures what it purports to
measure at the time and place that
the variable being measured is
actually emitted.
• Face Validity – how relevant test
items appear to be, if a test
definitely appears to measure what
it purports to measure ‘on the face
of it’.
Factor Analysis
- Class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
o Exploratory Factor Analysis –
estimating or extracting factors,
deciding how many factors to retain,
and rotating factors to an
interpretable orientation.
o Confirmatory Factor Analysis –
testing the degree to which a
hypothetical model fits the actual
data.
o Factor Loading – conveys information about the extent to which the factor determines the test score or scores.
Test Bias and Rating Errors
• Bias – factor inherent in a test that
systematically prevents accurate,
impartial measurement, bias implies
systematic variation.
• Rating Error – judgement resulting from the intentional or unintentional misuse of a rating scale.
• Leniency/Generosity Error – from
the tendency on the part of the rater
to be lenient in scoring, marking,
and/or grading.
• Severity Error – from the tendency on the part of the rater to be overly critical in scoring, marking, and/or grading.
• Central Tendency Error – rater
exhibits a general and systematic
reluctance to giving ratings at either
the positive or the negative
extreme.
• Restriction of Range Rating Errors –
central tendency, leniency, and
severity errors.
• Rankings – requires the rater to
measure individuals against one
another instead of against an
absolute scale, used to overcome
restriction of range rating errors.
Halo Effect – tendency to give a particular
ratee a higher rating than he or she
objectively deserves because of the rater’s
failure to discriminate among conceptually
distinct and potentially independent aspects
of a ratee’s behavior.
UTILITY
- How useful a test is.
- Practical value of using a test to aid in decision making.
- Improves efficiency.
Factors that Affect a Test’s Utility
• Psychometric Soundness – reliability and validity of a test.
• Costs – disadvantages, losses, or
expenses in both economic and
noneconomic terms.
• Benefits – profits, gains, or
advantages in both economic and
noneconomic terms.
Utility Analysis
- Family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.
- Expectancy Table or Chart – likelihood that individuals who score within a given range on the predictor will perform successfully on the criterion.
- Taylor-Russell Tables – increase in base rate of successful performance that is associated with a particular level of criterion-related validity.
- Naylor-Shine Tables – likely average increase in criterion performance as a result of using a particular test or intervention; provides the selection ratio needed to achieve a particular increase in criterion performance.
Cut Scores
- Cutoff Score, reference point
derived as a result of a judgement
and used to divide a set of data
into two or more classifications.
Relatively Cut Score/Norm-Referenced Cut
Score
- Set based on norm-related
considerations or with reference
to the performance of a group.
Fixed Cut Score/Absolute Cut Score
- Set with reference to a judgement concerning a minimum level of proficiency required to be included in a particular classification.
Multiple Cut Scores
- Two or more cut scores with
reference to one predictor for the
purpose of categorizing test
takers.
Multiple Hurdles
- One collective element of a multistage decision-making process in which the achievement of a particular cut score on one test is necessary in order to advance to the next stage of evaluation in the selection process.
Compensatory Model of Selection
- Assumption is made that high
scores on one attribute can
balance out or compensate for
low scores on another attribute.
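A minimal sketch of compensatory selection; the predictors, weights, and scores are all hypothetical:

    # A weighted composite lets a high score on one predictor
    # offset a low score on another.
    weights = {"cognitive_test": 0.6, "interview": 0.4}   # assumed weights

    applicants = {
        "A": {"cognitive_test": 90, "interview": 55},
        "B": {"cognitive_test": 70, "interview": 80},
    }

    for name, scores in applicants.items():
        composite = sum(weights[p] * scores[p] for p in weights)
        print(name, composite)   # A: 76.0, B: 74.0 – A's high test score compensates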
Methods of Setting Cut Scores
• Angoff Method – by William Angoff,
can be applied to personnel
selection tasks as well as to
questions regarding the presence or
absence of a particular trait,
attribute, or ability.
• The Known Groups Method –
collection of data on the predictor of
interest from groups known to
possess, and not to possess, a trait,
attribute, or ability of interest.
• IRT-Based Methods – based in item
response theory framework.
TEST AND TESTING
ASSUMPTION 1
o PSYCHOLOGICAL TRAITS AND STATES EXIST
Trait – distinguishable, relatively enduring.
State – relatively less enduring.
Construct – informed, scientific concept developed to explain behavior.
Overt Behavior – observable action or the
product of an observable action.
ASSUMPTION 2
o PSYCHOLOGICAL
TRAITS
AND
STATES CAN BE MEASURED
- Carefully defined traits and states
to be measured.
- Considering the types of item
content.
- Appropriate ways to score the test and interpret the results.
- Cumulative Scoring – higher
score: higher amount of trait.
ASSUMPTION 3
o TEST-RELATED BEHAVIOR PREDICTS
NON-TEST RELATED BEHAVIORS
- Obtained sample of behavior is
used to make predictions about
future behavior.
- Postdict – aid in the understanding of behavior that has already taken place.
ASSUMPTION 4
o TESTS AND OTHER MEASUREMENT
TECHNIQUES HAVE STRENGTHS AND
WEAKNESSES
ASSUMPTION 5
o VARIOUS SOURCES OF ERROR ARE
PART OF THE ASSESSMENT PROCESS
- Error – long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
- Error Variance – component of a test score attributable to sources other than the trait or ability measured.
- Classical Test Theory (CTT) – true score theory.
ASSUMPTION 6
o TESTING AND ASSESSMENT CAN BE
CONDUCTED IN A FAIR AND
UNBIASED MANNER
ASSUMPTION 7
o TESTING AND ASSESSMENT BENEFIT
SOCIETY
Reliable – consistency, yields the same
numerical measurement every time it
measures the same thing under the same
conditions.
Valid – measure what it purports to
measure.
Norms
- Behavior that is usual, average,
normal, standard, expected, or
typical.
- Test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.
- Norm-Referenced Testing and
Assessment – comparing to
scores of a group of test takers.
- Criterion-Referenced Testing and Assessment – evaluating with reference to a set standard or criterion.
- Normative Sample – group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers.
- Norming – deriving norms.
- Race Norming – controversial practice of norming on the basis of race or ethnic background.
- User Norms/Program Norms – descriptive statistics based on a group of test takers in a given period of time rather than norms obtained by formal sampling methods.
Sampling to Develop Norms
• Test Standardization – process of
administering a test to a
representative sample of test takers
for the purpose of establishing
norms.
• Sampling – selecting the portion of
the universe deemed to be
representative of the whole
population – sample.
• Random Sampling – every member of the population has the same chance of being included in the sample.
Sampling Techniques
• Probability Sampling – random
selection, equal chances of being
selected.
- Simple Random Sampling
- Systematic Sampling
- Stratified Sampling
- Cluster Sampling
- Multi-Stage Random Sampling
• Non-Probability Sampling – selecting samples on the basis of accessibility or the personal judgement of the researcher.
- Convenience/Haphazard/Incidental Sampling
- Purposive or Judgmental Sampling
- Snowball Sampling
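A short Python sketch contrasting three of the probability techniques above on a made-up population:

    import random

    random.seed(0)
    population = list(range(1, 101))   # hypothetical population of 100 test takers

    # Simple random sampling: every member has an equal chance of selection
    simple = random.sample(population, 10)

    # Systematic sampling: every k-th member after a random start
    k = len(population) // 10
    systematic = population[random.randrange(k)::k]

    # Stratified sampling: sample within each stratum of the population
    strata = {"lower": population[:50], "upper": population[50:]}
    stratified = [m for group in strata.values() for m in random.sample(group, 5)]

    print(simple, systematic, stratified, sep="\n")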
Types of Norms
• Percentile Norms – expression of
the percentage of people whose
score on a test or measure falls
below a particular raw score.
• Percentage Correct – distribution of
raw scores.
• Age Equivalent Scores/Age Norms –
average performance of different
samples of test takers who were at
various ages at the time the test was
administered.
• Grade Norms
• National Norms
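A minimal sketch of a percentile norm: the percentage of scores in a (hypothetical) normative sample that fall below a given raw score:

    def percentile_rank(raw_score, norm_scores):
        # Percentage of scores in the normative sample below the raw score
        below = sum(s < raw_score for s in norm_scores)
        return 100 * below / len(norm_scores)

    norms = [60, 65, 70, 72, 75, 78, 80, 85, 90, 95]   # hypothetical normative sample
    print(percentile_rank(80, norms))                   # 60.0: exceeds 60% of the sample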
TEST DEVELOPMENT
1. Test Conceptualization
- Conceptualizing the idea or
construct of the test.
- Through a review of related literature (RRL).
- Emerging social phenomenon or
pattern of behavior.
2. Test Construction
- Scaling – setting rules for assigning numbers in measurement.
- Age-Based Scale – scores are a function of age.
- Grade-Based Scale – scores are a function of grade.
- Stanine Scale – all raw scores on the test are transformed into scores that can range from 1 to 9.
Scaling Methods
• Rating Scales – group of words,
statements, or symbols.
• Summative Scale – obtained by
summing the ratings across all the
items – e.g., Likert scale.
• Method of Paired Comparisons –
pairs of stimuli.
• Comparative Scaling – judgments of
a stimulus in comparison with every
other stimulus on the scale.
• Categorical Scaling – stimuli are
placed into one of two or more
alternative categories that differ
quantitatively with respect to some
continuum.
• Guttman Scale – items range from weaker to stronger statements; agreement with a stronger statement implies agreement with the weaker ones.
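A small sketch of summative (Likert-type) scoring; the items and the reverse-keying step are illustrative assumptions, not part of the notes above:

    # The scale score is the sum of the ratings across all items.
    responses = {"item1": 4, "item2": 5, "item3": 3, "item4": 4}   # hypothetical 1-5 ratings

    # Reverse-keyed items (if any) are flipped before summing: 6 - rating on a 1-5 scale
    reverse_keyed = {"item3"}
    total = sum((6 - r) if item in reverse_keyed else r
                for item, r in responses.items())
    print(total)   # 4 + 5 + (6 - 3) + 4 = 16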
Item Format
- Form,
plan,
structure,
arrangement, and layout of
individual test items.
- Selected-Response Format – set
of alternative responses – e.g.,
multiple choices.
- Constructed-Response Format –
to supply or to create the correct
answer.
Writing Items
• Item Bank – collection of test questions.
• Computerized Adaptive Testing (CAT) – interactive, computer-administered test-taking process.
• Item Branching – ability of the
computer to tailor the content and
order of presentation of test items
on the basis of responses to previous
items.
• Floor Effect – low end of the ability.
• Ceiling Effect – high end of the
ability.
Scoring Items
• Cumulative Model – higher score:
higher test taker is on the ability,
trait, or other characteristic that the
test purports to measure.
• Class Scoring/Category Scoring –
test taker responses earn credit
toward placement in a particular
class or category with other test
takers whose pattern of responses is
presumably similar in some way.
• Ipsative Scoring – comparing a test taker's score on one scale within a test to a score on another scale within that same test.
3. Test Tryout
- Informal rule of thumb: there
should be no fewer than 5
subjects and preferably as many
as 10 for each item on the test.
- More subjects the better.
4. Test Analysis
- Item-Difficulty Index – used for achievement tests.
- Item-Endorsement Index – used for personality tests.
- A lowercase italic p is used to denote item difficulty, with a subscript referring to the item number. The larger the item-difficulty index, the easier the item.
- Item-Reliability Index – indication of the internal consistency of a test.
- Item-Validity Index – provides an indication of the degree to which a test is measuring what it purports to measure.
- Needs: the item-score standard deviation and the correlation between the item score and the criterion score.
- Item-Discrimination Index – indicates how adequately an item separates or discriminates between high scorers and low scorers on an entire test; highest possible value: +1.00, lowest: -1.00.
- Item Characteristic Curve – graphic representation of item difficulty and discrimination.
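A toy item-analysis sketch computing p (item difficulty: proportion answering correctly) and d (item discrimination, via one common shortcut of comparing the highest- and lowest-scoring thirds) from invented 1/0 responses:

    # Hypothetical right/wrong (1/0) responses: one row per test taker
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
    ]

    totals = [sum(person) for person in responses]
    ranked = sorted(range(len(responses)), key=lambda i: totals[i])
    n_group = len(responses) // 3
    low, high = ranked[:n_group], ranked[-n_group:]   # lowest and highest scorers

    for item in range(len(responses[0])):
        # p: proportion of all test takers answering the item correctly
        p = sum(person[item] for person in responses) / len(responses)
        # d: proportion correct in the high group minus proportion
        # correct in the low group (+1.00 to -1.00)
        d = (sum(responses[i][item] for i in high) -
             sum(responses[i][item] for i in low)) / n_group
        print(f"item {item + 1}: p = {p:.2f}, d = {d:+.2f}")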
Qualitative Item Analysis
- Nonstatistical procedures to
explore how individual test items
work.
5. Test Revision
- Cross Validation – revalidation of
test on a sample of test takers
other than those on whom test
performance was originally found
to be a valid predictor of some
criterion.
- Validity Shrinkage – decrease in
item validities that inevitably
occurs after cross-validation of
findings.
- Co-Validation – conducted on two or more tests using the same sample of test takers; when combined with norming, called co-norming.