Psychological Assessment Reviewer
LEGEND:
Main Topic
❖ Primary Terms
➢ Secondary Terms
▪ Definition
• Examples
 Additional Notes
Chapter 1
Psychological Assessment VS Psychological Testing
❖ Psych Assessment
▪ the gathering and integration of psychology-related data for the
purpose of making a psychological evaluation.
▪ answer a referral question, solve a problem, or arrive at a decision
through the use of tools of evaluation.
▪ Assessor is the key to the process or evaluation
❖ Psych Testing
▪ the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.
▪ obtain some gauge, usually numerical in nature, with regard to an
ability or attribute.
▪ The tester does not affect the process or evaluation
Varieties of assessment
❖ Retrospective Assessment
▪ draw conclusions about psychological aspects of a person as they
existed at some point in time prior to the assessment.
❖ Remote Assessment
▪ draws conclusions about a subject who is not in physical proximity to the assessor.
❖ Ecological momentary assessment
▪ evaluation of specific problems and related cognitive and
behavioral variables at the very time and place they occur.
❖ Educational assessment
▪ to evaluate abilities and skills relevant to success or failure in a
school or pre-school context.
❖ Collaborative Assessment
▪ Assessor and assessee may work as partners from initial contact
through final feedback.
❖ Therapeutic psychological assessment
▪ Therapeutic self-discovery and new understandings are
encouraged throughout the assessment process.
❖ Dynamic assessment
▪ Interactive approach to psychological assessment that usually
follows a model of evaluation, intervention of some sort, and
evaluation.
The Tools of Psychological Assessment
❖ Test
▪ Defined simply as a measuring device or procedure.
➢ Psychological test
▪ Refers to a device or procedure designed to measure variables
related to psychology.
❖ Format
▪ The form, plan, structure, arrangement, and layout of test items, as well as related considerations such as time limits.
▪ Also refers to the form in which a test is administered: computerized, pencil-and-paper, or some other form.
❖ Score
▪ A code or summary statement, usually but not necessarily numerical, that reflects an evaluation of performance on a test, task, interview, or other sample of behavior.
The Interview
➢ Panel Interview
▪ More than one interviewer participates in the assessment.
➢ Motivational Interview
▪ A dialogue that combines person-centered listening skills, such as openness and empathy, with the use of cognition-altering techniques designed to positively affect motivation and effect therapeutic change.
Case History Data
❖ Case History Data
▪ Records, transcripts, and other accounts, in written, pictorial, or other form.
❖ Case Study (Case History)
▪ A report about a person or event that was compiled on the basis of case history data.
❖ Groupthink
▪ Arises as a result of the varied forces that drive decision-makers to reach a consensus.
Behavioral Observation
❖ Behavioral Observation
▪ Monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions.
❖ Naturalistic Observation
▪ Research participants are observed in their natural environment, unlike in lab experiments.
Role-Play Tests
❖ Role Play
▪ Acting an improvised or partially improvised part in a simulated situation.
➢ Role-play test
▪ A tool of assessment wherein assessees are directed to act as if they were in a particular situation.
Computers as Tools
❖ Local Processing
▪ Scoring done on-site.
❖ Central Processing
▪ Scoring conducted at some central location.
❖ Teleprocessing
▪ When test-related data are sent to and returned from a central scoring facility electronically.
➢ Interpretive Report
▪ A formal or official computer-generated account of test performance presented in both numeric and narrative form, including an explanation of the findings.
▪ The three varieties of interpretive report are descriptive, screening, and consultative; contrast with a simple scoring report.
➢ Consultative Report
▪ A type of interpretive report designed to provide expert and detailed analysis of test data that mimics the work of an expert consultant.
➢ Integrative Report
▪ Integrates data from sources other than the test itself, such as medication records or behavioral observation data, into the test report.
❖ CAT (Computer Adaptive Testing)
▪ A computer-based test that adapts the items presented to the test taker's performance on previous items.
▪ Also called tailored testing.
How Are Assessments Conducted?
▪ The test administrator must be familiar with the test materials and procedures.
➢ Protocol
▪ Refers to a description of a set of test- or assessment-related procedures.
➢ Rapport
▪ Establishing a working relationship between the examiner and the examinee.
➢ Level C
▪ Tests and aids that require substantial understanding of testing and supporting psychological fields, together with supervised experience in the use of these devices (e.g., projective tests, individual mental tests).
▪ For individually administered tests of intelligence and projective methods, test users need at least a Master's degree in Psychology or supervised experience under a licensed psychologist.
❖ Computerized test administration, scoring, and interpretation
➢ Computer-assisted psychological assessment (CAPA)
▪ Any application of computers to the administration, scoring, or interpretation of tests and questionnaires used in educational and psychological assessment.
▪ A number of psychological tests can now be administered and scored online.
❖ Major issues with CAPA:
▪ Comparability of pencil-and-paper and computerized versions of tests.
▪ Thousands of words are spewed out as computerized interpretation results, but the validity of those interpretations is questionable.
▪ Unprofessional, unregulated "psychological testing" online may contribute to more public skepticism.
The Rights of Test Takers
❖ The right of informed consent
❖ The right to be informed of test findings
❖ The right to privacy and confidentiality
➢ Privacy Right
▪ "Recognizes the freedom of the individual to pick and choose for himself the time, circumstances, and extent to which he wishes to share or withhold from others his attitudes, beliefs, behavior, and opinions."
➢ Confidentiality
▪ Confidentiality concerns matters of communication outside the courtroom; privilege protects clients from disclosure in judicial proceedings.
❖ The right to the least stigmatizing label
▪ Advises that the least stigmatizing labels should always be assigned when reporting test results.
Chapter 2
A Historical Perspective
▪ It is believed that tests and testing came from China as early as 2200 B.C.E.
▪ In ancient China, passing a test examination gave the passer a government job or a particular benefit, like an official position or entitlement to wear a special garb.
Twentieth Century
❖ The measurement of Intelligence
▪ In 1905, Alfred Binet and Theodore Simon published a 30-item measure of intelligence to help identify Paris schoolchildren with intellectual disability.
➢ In 1939, David Wechsler introduced a test to measure adult intelligence: the Wechsler Adult Intelligence Scale (WAIS).
❖ The measurement of Personality
➢ Woodworth Psychoneurotic Inventory
▪ A measure of personality.
▪ After WWI, Woodworth developed a personality test for civilian use that was based on the Personal Data Sheet.
▪ The WPI, or Woodworth Psychoneurotic Inventory, was the first widely used self-report measure of personality.
❖ Self-report
▪ Refers to a process whereby test takers themselves supply assessment-related information by responding to questions, keeping a diary, or self-monitoring thoughts or behaviors.
❖ Projective test
▪ A psychological test in which words, images, or situations are presented to a person and the responses analyzed for the unconscious expression of elements of personality that they reveal.
Culture and Assessment
❖ Culture
▪ The socially transmitted behavior patterns, beliefs, and products of work of a particular population, community, or group of people.
➢ Culture-specific tests
▪ Tests designed for use with people from one culture but not from another.
➢ Individualist culture
▪ Characterized by value being placed on traits such as self-reliance, autonomy, independence, uniqueness, and competitiveness.
▪ The dominant culture in the United States and Great Britain.
➢ Collectivist culture
▪ Value placed on traits such as conformity, cooperation, interdependence, and striving toward group goals.
▪ The dominant culture in Asia, Latin America, and Africa.
 Read about it more in "Test and Group Membership" and "Legal and Ethical Considerations."
CHAPTER 3
Scales of Measurement
❖ Measurement
▪ The act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules.
❖ Scale
▪ A system of ordered numerical or verbal descriptors, occurring at fixed intervals, used as a reference standard in measurement.
▪ A set of numbers or symbols whose properties model empirical properties of the objects to which they are assigned.
➢ Continuous Scale
▪ Measure a continuous variable
▪ Exists when it is theoretically possible to divide any of the values of the scale.
➢ Discrete Scale
▪ Measure a discrete variable
▪ Discrete variable values can take on only certain defined values; nothing exists between adjacent scale points.
❖ Error
▪ Refers to the collective influence of factors other than what a test attempts to measure on performance on the test.
❖ Ordinal Scales
▪ Most frequently used in psychology.
• Kerlinger (1973) said, “Intelligence, aptitude, and personality
test scores are, basically and strictly speaking, ordinal. These
tests indicate with more or less accuracy not the amount of
intelligence, aptitude, and personality traits of individuals, but
rather the rank-order positions of the individuals.”
❖ Interval Scales
▪ Contain equal intervals between numbers.
▪ Numerical
▪ Contains no absolute zero point or fixed beginning.
❖ Ratio Scales
▪ Contains true zero point
▪ Numerical and Informative
▪ Permits not only addition, subtraction, and multiplication but also
division
▪ You can tell that there is a fixed beginning.
❖ Describing Data
➢ Distribution
▪ A set of test scores arrayed for recording or study.
➢ Raw Scores
▪ A straightforward, unmodified accounting of performance that is usually numerical.
❖ Frequency Distributions
▪ All scores are listed alongside the number of times each score
occurred.
▪ The scores might be presented in tabular or graphic form.
❖ Grouped frequency distribution
▪ Replaces the actual test scores with class intervals.
▪ Described as an indicator of how many times each variable value
occurs in a set of grouped observations.
❖ Histogram
▪ A graph with vertical lines drawn at the true limits of each test
score (or class interval), forming a series of contiguous rectangles.
Measures of Central Tendency
▪ A measure of central tendency is a statistic that indicates the average or midmost score between the extreme scores in a distribution.
▪ The mean is a measure of central tendency appropriate for the interval or ratio level of measurement; the median is a measure of central tendency that takes into account the order of scores and is ordinal in nature; and the mode is a measure of central tendency that is nominal in nature.
❖ Arithmetic Mean
▪ Denoted by the symbol X̄ (pronounced "X bar"); equal to the sum of the observations (or test scores) divided by the number of observations.
▪ Formula: X̄ = ΣX / n
❖ The Median
▪ The middle score in a distribution.
▪ When the total number of scores is even, the median can be calculated by averaging the values of the two middle scores.
❖ The Mode
▪ The most frequently occurring score in a distribution.
➢ Bimodal distribution
▪ A probability distribution with two modes.
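 ADDITIONAL: a minimal Python sketch (not from the book) of how the three measures of central tendency above can be computed; the score list is hypothetical.

import statistics

scores = [85, 90, 90, 95, 100]      # hypothetical test scores
print(statistics.mean(scores))      # arithmetic mean: 92
print(statistics.median(scores))    # middle score: 90
print(statistics.mode(scores))      # most frequent score: 90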
Measures of Variability
❖ Variability
▪ An indication of how scores in a distribution are scattered or dispersed.
▪ Two or more distributions of test scores can have the same mean even though differences in the dispersion of scores around the mean can be wide.
❖ The Range
▪ The range of a distribution is equal to the difference between the highest and the lowest scores.
❖ The interquartile and semi-interquartile ranges
▪ A distribution of test scores (or any other data) can be divided into four parts such that 25% of the scores fall within each quarter.
❖ Bar Graph
▪ Numbers indicative of frequency also appears on the Y-axis, and
reference to some categorization (e.g., yes/no/maybe,
male/female) appears on the X-axis.
❖ Interquartile range
▪ A measure of variability equal to the difference between Q3 and Q1.
▪ It is an ordinal statistic
❖ Variance
▪ Equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
• Formula: σ² = Σ(X − X̄)² / n
▪ From raw scores, first calculate the summation of the raw scores squared, divide by the number of scores, and then subtract the mean squared.
• Formula: σ² = ΣX² / n − X̄²
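 ADDITIONAL: a minimal Python sketch (not from the book) showing that the deviation formula and the raw-score formula for the variance agree; the scores are hypothetical.

scores = [2, 4, 4, 4, 6]                               # hypothetical scores
n = len(scores)
mean = sum(scores) / n                                 # X̄ = 4.0
dev_form = sum((x - mean) ** 2 for x in scores) / n    # Σ(X − X̄)² / n = 1.6
raw_form = sum(x * x for x in scores) / n - mean ** 2  # ΣX² / n − X̄² = 1.6
sd = dev_form ** 0.5                                   # standard deviation ≈ 1.26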
Skewness
▪ The nature and extent to which symmetry is absent.
▪ An indication of how the measurements in a distribution are
distributed.
❖ Positive Skew
▪ A distribution has a positive skew when relatively few of the scores
fall at the high end of the distribution.
• High left declining to right
▪ Positively skewed examination results may indicate that the test
was too difficult.
❖ Negative Skew
▪ When relatively few of the scores fall at the low end of the
distribution.
• Low left rising to right
▪ Negatively skewed examination results may indicate that the test
was too easy.
❖ T scores
▪ Can be called a "fifty plus or minus ten" scale; a scale with a mean set at 50 and a standard deviation set at 10.
▪ Used to describe how far from the mean a score falls.
• Computed from a z score as T = 10z + 50.
❖ Stanine
▪ A method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of approximately two.
❖ Linear Transformation
▪ One that retains a direct numerical relationship to the original raw score.
❖ Non-linear Transformation
▪ The resulting standard score does not have a direct numerical relationship to the original raw score.
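 ADDITIONAL: a minimal Python sketch (not from the book) of a linear transformation: raw scores are converted to z scores and then to T scores via T = 10z + 50; the raw scores are hypothetical.

raw = [10, 12, 14, 16, 18]                           # hypothetical raw scores
n = len(raw)
mean = sum(raw) / n                                  # 14.0
sd = (sum((x - mean) ** 2 for x in raw) / n) ** 0.5  # ≈ 2.83
z_scores = [(x - mean) / sd for x in raw]            # mean 0, SD 1
t_scores = [50 + 10 * z for z in z_scores]           # mean 50, SD 10
print(t_scores)                                      # e.g., raw 18 -> T ≈ 64.1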
Normal Curve
▪ A bell-shaped, smooth, mathematically defined curve that is highest at its center.
▪ From the center it tapers on both sides, approaching the X-axis asymptotically (meaning that it approaches, but never touches, the axis).
▪ The curve is perfectly symmetrical, with no skewness.
Kurtosis
▪ Used to refer to the steepness of a distribution in its center.
▪ Describes the peakedness or flatness of three general types of curves:
❖ Platykurtic (relatively flat)
❖ Leptokurtic (relatively peaked)
❖ Mesokurtic (somewhere in the middle)
Correlation and Inference
❖ Coefficient of correlation (correlation coefficient)
▪ A number that provides us with an index of the strength of the relationship between two things.
The Concept of Correlation
▪ An expression of the degree and direction of correspondence between two things.
▪ The coefficient of correlation (r) expresses a linear relationship between two (and only two) variables, usually continuous in nature.
• If a correlation coefficient has a value of +1 or −1, then the relationship between the two variables is perfect (without error in the statistical sense).
❖ Positive Correlation
▪ Exists when two variables simultaneously increase or simultaneously decrease.
❖ Negative Correlation
▪ Occurs when one variable increases while the other decreases.
• If a correlation is zero, then no relationship exists between the two variables.
The Pearson r
▪ Devised by Karl Pearson; r can be the statistical tool of choice when the relationship between the variables is linear and the two variables being correlated are continuous (i.e., they can theoretically take any value).
▪ The value obtained for the coefficient of correlation can be further interpreted by deriving from it what is called the coefficient of determination, or r².
▪ The coefficient of determination indicates how much variance is shared by the X- and the Y-variables.
The Spearman Rho
▪ Developed by Charles Spearman; this coefficient of correlation is frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are in ordinal (rank-order) form.
Graphic Representations of Correlation
❖ Scatterplot
▪ Useful in revealing the presence of curvilinearity in a relationship.
▪ Curvilinearity in this context refers to the degree to which a graph is curved.
❖ Outlier
▪ An extremely atypical point located at a relative distance from the rest of the coordinate points in a scatterplot.
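 ADDITIONAL: a minimal Python sketch (not from the book) of the Pearson r and the coefficient of determination (r²); the paired scores are hypothetical.

x = [1, 2, 3, 4, 5]                                       # hypothetical scores on test X
y = [2, 4, 5, 4, 5]                                       # hypothetical scores on test Y
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
r = cov / (sx * sy)                                       # Pearson r ≈ 0.77
print(r, r ** 2)                                          # r² ≈ 0.60: shared variance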
CHAPTER 4
Assumption 1: Psychological Traits and States Exist
❖ Trait
▪ “Any distinguishable, relatively enduring way in which one individual varies from another.”
▪ Tends to be a more stable and enduring characteristic or pattern of
behavior.
❖ States
▪ Distinguish one person from another but are relatively less
enduring (Chaplin et al., 1988)
▪ A state is a temporary way of being (i.e., thinking, feeling,
behaving, and relating)
❖ Construct
▪ An informed, scientific concept developed or constructed to
describe or explain behavior.
▪ We can’t see, hear, or touch constructs, but we can infer their
existence from overt behavior
❖ Overt Behavior
▪ Refers to an observable action or the product of an observable
action, including test- or assessment-related responses.
Assumption 2: Psychological Traits and States Can Be Quantified and
Measured
▪ Test developers must first define the construct that needs to be measured.
▪ They then consider the items (supposed to be indicative of the construct, trait, or state being measured) to be included in the test.
❖ Cumulative Scoring
▪ Represent the strength of the targeted ability or trait or state.
▪ The assumption that the more the test taker responds in a
particular direction as keyed by the test manual as correct or
consistent with a particular trait, the higher that test taker is
presumed to be on the targeted ability or trait.
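 ADDITIONAL: a minimal Python sketch (not from the book) of cumulative scoring: each response matching the keyed direction adds one point, so a higher total implies more of the targeted trait; the items and key are hypothetical.

responses = ["yes", "no", "yes", "yes"]  # hypothetical answers to four items
keyed     = ["yes", "yes", "yes", "no"]  # trait-consistent answer for each item
score = sum(r == k for r, k in zip(responses, keyed))  # each match adds 1
print(score)                             # 2: items 1 and 3 match the key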
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
▪ Some tests are used not to predict future behavior but to postdict
it. To understand behavior that has already taken place.
▪ the objective of the test is to provide some indication of other
aspects of the examinee’s behavior.
Assumption 4: Tests and Other Measurement Techniques Have Strengths and
Weaknesses
▪ Test user should understand the test that they are going to use.
Assumption 5: Various Sources of Error Are Part of the Assessment Process
▪ error traditionally refers to something that is more than expected;
it is actually a component of the measurement process.
▪ error refers to a long-standing assumption that factors other than
what a test attempts to measure will influence performance on the
test.
Assumption 6: Testing and Assessment Can Be Conducted in a Fair and
Unbiased Manner
▪ One source of fairness-related problems is the test user who
attempts to use a particular test with people whose background
and experience are different from the background and experience
of people for whom the test was intended.
Assumption 7: Testing and Assessment Benefit Society
▪ Imagine a world without tests: how else could we identify the proficiency, skill, and intelligence of people like Bong-bong Marcos and other officials to determine whether they are fit for the presidency?
What Is a Good Test?
▪ Logically, the criteria for a good test would include clear instructions for administration, scoring, and interpretation.
▪ Most of all, a good test would seem to be one that measures what it claims to measure.
Norm-Referenced Testing and Assessment
▪ A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers on the same test.
❖ Norms
▪ The singular "norm" is used in the scholarly literature to refer to behavior that is usual, average, normal, standard, expected, or typical.
▪ In a psychometric context, norms are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.
❖ Normative Sample
▪ The group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers.
❖ Norming
▪ The process of deriving norms.
❖ Race norming
▪ The controversial practice of norming on the basis of race or ethnic background.
❖ Program Norms / User Norms
▪ "Consist of descriptive statistics based on a group of test takers in a given period of time rather than norms obtained by formal sampling methods."
Sampling to Develop Norms
❖ Test Standardization
▪ The process of administering a test to a representative sample of test takers for the purpose of establishing norms.
▪ A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.
❖ Sampling
▪ The process of selecting the portion of the universe deemed to be representative of the whole population.
❖ Sample
▪ A portion of the universe of people deemed to be representative of the whole population.
➢ Stratified Sampling
▪ A method of sampling from a population that can be partitioned into subpopulations.
➢ Purposive Sampling
▪ A sampling technique that relies on the judgment of the researcher: members of the population are arbitrarily selected to participate because they are believed to be representative.
➢ Convenience Sample/Sampling
▪ One that is convenient or available for use.
❖ Developing Norms for a Standardized Test
▪ The test developer administers the test following a standard set of instructions that will be used by future test users.
▪ The test developer also describes the recommended setting for giving the test; it has to be consistent across administrations.
▪ The instructions and conditions of administration should be the same from the normative sample onward to every later administration of the test.
Types of Norms
❖ Percentiles
▪ An expression of the percentage of people whose score on a test or measure falls below a particular raw score. (A computational sketch follows the list of norm types below.)
❖ Grade Norms
▪ Designed to indicate the average test performance of test takers in a given school grade (e.g., first through sixth grades).
❖ Developmental Norms
▪ a term applied broadly to norms developed on the basis of any
trait, ability, skill, or other characteristic that is presumed to
develop, deteriorate, or otherwise be affected by chronological
age, school grade, or stage of life.
❖ National Norms
▪ Derived from a normative sample that was nationally
representative of the population at the time the norming study
was conducted.
❖ National Anchor Norms
▪ provide some stability to test scores by anchoring them to other
test scores.
❖ Equipercentile Method
▪ The equivalency of scores on different tests is calculated with
reference to corresponding percentile scores.
❖ Subgroup Norms
▪ A normative sample can be segmented by any of the criteria
initially used in selecting subjects for the sample
❖ Local Norms
▪ provide normative information with respect to the local
population’s performance on some test.
▪ Typically developed by test users themselves
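 ADDITIONAL: a minimal Python sketch (not from the book) of a percentile rank as defined under Types of Norms: the percentage of scores in the normative sample falling below a particular raw score; the sample is hypothetical.

norm_sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical normative scores
raw_score = 82
rank = 100 * sum(s < raw_score for s in norm_sample) / len(norm_sample)
print(rank)  # 60.0: six of the ten normative scores fall below 82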
Fixed Reference Group Scoring Systems
▪ The distribution of scores obtained on the test from one group of
test takers
▪ Used as the basis for the calculation of test scores for future
administrations of the test.
Norm-Referenced Versus Criterion-Referenced Evaluation
❖ Norm-Referenced
▪ One way to derive meaning from a test score is to evaluate the
test score in relation to other scores on the same test.
❖ Criterion
▪ A standard on which a judgment or decision may be based.
❖ Criterion-referenced testing and assessment
▪ A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.
CHAPTER 5 (read the book for examples)
Reliability
▪ In the language of psychometrics, reliability refers to consistency in measurement.
❖ Reliability coefficient
▪ An index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.
The Concept of Reliability
❖ Variance (σ²)
▪ The expected value of the squared variation of a random variable from its mean value, in probability and statistics.
▪ A measure of how far a set of data (numbers) are spread out from their mean (average) value.
▪ A measure of how data points differ from the mean.
➢ True Variance
▪ Variance from true differences.
➢ Error Variance
▪ Variance from irrelevant, random sources.
❖ Random Error
▪ Sometimes referred to as "noise"; a source of error that fluctuates from one testing situation to another with no pattern that would systematically raise or lower scores.
❖ Systematic Error
▪ Refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
▪ A systematic source of error does not affect score consistency.
▪ Once a systematic error becomes known, it becomes predictable, as well as fixable.
Sources of Error Variance
❖ Test construction
➢ Item Sampling/Content Sampling
▪ Terms that refer to variation among items within a test as well as to variation among items between tests.
▪ Consider two or more tests designed to measure the same skill, personality attribute, or body of knowledge: differences are sure to be found in the way the items are worded and in the specific content sampled.
❖ Test administration (read in the book)
▪ Sources of error variance that occur during test administration may influence the test taker's attention or motivation.
▪ The test taker's reactions to those influences are the source of one kind of error variance.
• Test environment: room temperature, level of lighting, and amount of ventilation and noise.
➢ Test taker variables:
▪ Pressing emotional problems, physical discomfort, and the effects of drugs or medication are all sources of error variance.
➢ Examiner-related variables:
▪ The examiner's physical appearance and demeanor, and even the presence or absence of an examiner.
❖ Test scoring and interpretation
▪ Scorers and scoring systems are potential sources of error variance.
▪ The advent of computer scoring and a growing reliance on objective, computer-scorable items have reduced error variance caused by scorer differences.
❖ Other sources of error
▪ Surveys and polls are two tools of assessment commonly used by researchers who study public opinion.
Reliability Estimates
❖ Test-retest Reliability Estimates
▪ One way of estimating the reliability of a measuring instrument is by using the same instrument to measure the same thing at two points in time; this is called test-retest reliability.
▪ The result of such an evaluation is an estimate of test-retest reliability.
• When the interval between testings is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of stability.
Parallel-Forms and Alternate-Forms Reliability
❖ Coefficient of Equivalence
▪ The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, also known as the coefficient of equivalence.
❖ Parallel Forms
▪ Exist when, for each form of the test, the means and the variances of observed test scores are equal.
❖ Alternate Forms
▪ Simply different versions of a test that have been constructed so as to be parallel.
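 ADDITIONAL: a minimal Python sketch (not from the book) of the reliability coefficient defined at the start of this chapter, the ratio of true score variance to total variance; the true scores and errors are simulated, not real data.

import random
random.seed(0)
true_scores = [random.gauss(100, 15) for _ in range(10000)]  # simulated true scores
observed = [t + random.gauss(0, 5) for t in true_scores]     # true score + random error

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# expected value: 15² / (15² + 5²) = 0.90
print(variance(true_scores) / variance(observed))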
Split-half Reliability Estimates
❖ Split-half Reliability
▪ Statistical method used to measure the consistency of the scores
of a test.
▪ Obtained by correlating two pairs of scores obtained from
equivalent halves of a single test administered once.
 CHECK EXAMPLE IN THE BOOK
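 ADDITIONAL: a minimal Python sketch (not from the book) of a split-half estimate: total the odd items and the even items from one administration, then correlate the two half-scores; the item data are hypothetical.

# rows = test takers, columns = six dichotomous items (1 = correct)
data = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
odd_half  = [sum(row[0::2]) for row in data]  # items 1, 3, 5
even_half = [sum(row[1::2]) for row in data]  # items 2, 4, 6

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

r_half = pearson(odd_half, even_half)  # reliability of a half-length test
print(r_half)                          # ≈ 0.51 for this made-up data

The Spearman-Brown formula (next entry) can then adjust this half-test value to the full test length.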
❖ Spearman-Brown Formula
▪ Relates psychometric reliability to test length; used by psychometricians to predict the reliability of a test after changing the test length.
▪ Because the reliability of a test is affected by its length, a formula is necessary for estimating the reliability of a test that has been shortened or lengthened.
 CHECK EXAMPLE IN THE BOOK
Other Methods of Estimating Internal Consistency
❖ Inter-item consistency
▪ The degree of correlation among all the items on a scale.
▪ Calculated from a single administration of a single form of a test.
▪ Useful in assessing the homogeneity of the test.
• Tests are said to be homogeneous if they contain items that measure a single trait.
• The more homogeneous a test is, the more inter-item consistency it can be expected to have.
▪ Heterogeneity describes the degree to which a test measures different factors.
• Test takers with the same score on a more heterogeneous test may have quite different abilities.
❖ The Kuder–Richardson formulas
▪ Check the internal consistency of measurements with dichotomous choices.
▪ KR-20 is the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong (such as multiple-choice items).
▪ If test items are more heterogeneous, KR-20 will yield lower reliability estimates than the split-half method.
▪ KR-21 may be used if there is reason to assume that all the test items have approximately the same degree of difficulty.
 CHECK EXAMPLE IN THE BOOK
❖ Coefficient Alpha
▪ One way to quantify reliability; represents the proportion of observed score variance that is true score variance.
▪ Developed by Cronbach (1951).
▪ Appropriate for use on tests containing non-dichotomous items.
 CHECK EXAMPLE IN THE BOOK
❖ Average Proportional Distance
▪ A newer measure for evaluating the internal consistency of a test.
▪ A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
 CHECK EXAMPLE IN THE BOOK
Measures of Inter-Scorer Reliability
❖ Inter-scorer reliability
▪ The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
▪ If the reliability coefficient is high, the prospective test user knows that test scores can be derived in a systematic, consistent way by various scorers with sufficient training.
▪ Often used when coding nonverbal behavior.
 CHECK EXAMPLE IN THE BOOK
The Nature of the Test
Considerations in interpreting a reliability coefficient include whether:
1) the test items are homogeneous or heterogeneous in nature;
2) the characteristic, ability, or trait being measured is presumed to be dynamic or static;
3) the range of test scores is or is not restricted;
4) the test is a speed or a power test; and
5) the test is or is not criterion-referenced.
❖ Homogeneity versus heterogeneity of test items
▪ Tests designed to measure one factor, such as one ability or one trait, are expected to be homogeneous in items.
▪ For such tests, it is reasonable to expect a high degree of internal consistency.
▪ If the test is heterogeneous in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability.
❖ Dynamic versus static characteristics
➢ Dynamic
▪ A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
➢ Static
▪ A trait, state, or ability presumed to be relatively unchanging, such as intelligence, so that even hourly assessments would be expected to yield similar results.
❖ Restriction or inflation of range
➢ Restriction of Range/Variance
▪ If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower.
➢ Inflation of Range/Variance
▪ If the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher.
❖ Speed tests versus power tests
➢ Power Test
▪ Has a long time limit, and some items are so difficult that no test taker is able to obtain a perfect score.
➢ Speed Test
▪ The time limit on a speed test is established so that few if any of the test takers will be able to complete the entire test.
▪ A reliability estimate of a speed test should be based on performance from two independent testing periods using one of the following: (1) test-retest reliability, (2) alternate-forms reliability, or (3) split-half reliability from two separately timed half tests.
❖ Criterion-referenced tests
▪ Designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or a vocational objective.
▪ Use test scores to generate a statement about the behavior that can be expected of a person with a given score.
▪ Scores on criterion-referenced tests tend to be interpreted in pass–fail (or, perhaps more accurately, "master/failed-to-master") terms, and any scrutiny of performance on individual items tends to be for diagnostic and remedial purposes.
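 ADDITIONAL: a minimal Python sketch (not from the book) of the Spearman-Brown prediction, r' = n·r / (1 + (n − 1)·r), and of coefficient alpha from the sections above; all numbers are hypothetical.

def spearman_brown(r, n):
    # predicted reliability when the test length changes by factor n
    return n * r / (1 + (n - 1) * r)

# full-test reliability predicted from a half-test correlation of .70 (n = 2)
print(spearman_brown(0.70, 2))   # ≈ 0.82

def coefficient_alpha(items):
    # items: one list of scores per item, same test takers in the same order
    k = len(items)
    totals = [sum(col[i] for col in items) for i in range(len(items[0]))]
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

items = [[1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1]]  # 3 items, 4 test takers
print(coefficient_alpha(items))                     # ≈ 0.27 for this tiny example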
The True Score Model of Measurement and Alternatives to It
❖ Classical test theory (CTT)
▪ The true score (or classical) model of measurement.
▪ Assumes that an observed score is composed of a true score plus error (X = T + E).
➢ Generalizability theory
▪ Based on the idea that a person’s test scores vary from testing to
testing because of variables in the testing situation
▪ Cronbach encouraged test developers and researchers to describe
the details of the particular test situation or universe leading to a
specific test score.
▪ This universe is described in terms of its facets, which include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
▪ According to generalizability theory, given the exact same
conditions of all the facets in the universe, the exact same test
score should be obtained. This test score is the universe score, and
it is, as Cronbach noted, analogous to a true score in the true
score model.
➢ Generalizability Study
▪ Examines how generalizable scores from a particular test are if the
test is administered in different situations
➢ Coefficients of Generalizability
▪ It represents the influence of particular facets on the test score.
➢ Decision Study
▪ Developers examine the usefulness of test scores in helping the
test user make decisions.
▪ Designed to tell the test user how test scores should be used and
how dependable those scores are as a basis for decisions,
depending on the context of their use.
❖ Item response theory (IRT)
▪ A paradigm for the design, analysis, and scoring of tests,
questionnaires, and similar instruments measuring abilities,
attitudes, or other variables.
▪ Also known as the latent response theory refers to a family of
mathematical models that attempt to explain the relationship
between latent traits (unobservable characteristic or attribute) and
their manifestations (i.e., observed outcomes, responses or
performance).
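 ADDITIONAL: a minimal Python sketch (not from the book) of one common IRT model, the two-parameter logistic (2PL), in which the probability of a correct response depends on the latent trait θ, item difficulty b, and item discrimination a; the parameter values are hypothetical.

import math

def p_correct(theta, a, b):
    # 2PL model: P(correct) = 1 / (1 + exp(-a * (theta - b)))
    return 1 / (1 + math.exp(-a * (theta - b)))

print(p_correct(0.0, a=1.0, b=0.0))   # 0.50: trait level equal to item difficulty
print(p_correct(1.0, a=1.0, b=0.0))   # ≈ 0.73: higher trait, higher probability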
❖ Polytomous Test Items
▪ Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
❖ Dichotomous Test Items
▪ Test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions.
Reliability and Individual Scores
▪ The reliability coefficient helps the test developer build an adequate measuring instrument, and it helps the test user select a suitable test.
▪ By employing the reliability coefficient in the formula for the standard error of measurement, the test user has another descriptive statistic relevant to test interpretation, this one useful in estimating the precision of a particular test score.
The Standard Error of Measurement
▪ Used to estimate or infer the extent to which an observed score deviates from a true score.
▪ Provides an estimate of the amount of error inherent in an observed score or measurement.
▪ Often abbreviated as SEM.
▪ The higher the reliability of a test (or individual subtest within a test), the lower the SEM.
CHAPTER 6
Validity
▪ An estimate of how well a test measures what it purports to measure in a particular context.
▪ A judgment based on evidence about the appropriateness of inferences drawn from test scores.
❖ Inference
▪ A logical result or deduction.
❖ Validation
▪ The process of gathering and evaluating evidence about validity.
▪ It is the test developer's responsibility to supply validity evidence in the test manual.
❖ Local validation studies
▪ Absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.
One way measurement specialists have traditionally conceptualized validity is according to three categories:
❖ Content validity
▪ A measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test.
❖ Criterion-related validity
▪ A measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
❖ Construct validity
▪ A measure of validity that is arrived at by executing a comprehensive analysis of:
▪ how scores on the test relate to other test scores and measures, and
▪ how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.
❖ Ecological Validity
▪ A judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured (typically a behavior, cognition, or emotion) actually occurs.
▪ The greater the ecological validity of a test or other measurement procedure, the greater the generalizability of the measurement results to particular real-life circumstances.
Face Validity
▪ A test can be said to have face validity if it appears to measure what it is supposed to measure.
• If a test is prepared to measure multiplication, and the people who take it think that it looks like a good test of multiplication ability, that demonstrates face validity of the test.
• A test's lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test, with a consequential decrease in the test taker's motivation to do his or her best.
• In a corporate environment, lack of face validity may lead to unwillingness of administrators to endorse the use of a particular test.
Content Validity
▪ Describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
❖ Concurrent Validity
▪ Refers to the ability of a test to relate to a criterion measure obtained at the same time (concurrently).
▪ Measures how well a new test compares to a well-established test.
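 ADDITIONAL: a minimal Python sketch (not from the book) of the standard error of measurement described earlier in this chapter, using the standard formula SEM = SD·√(1 − r), and of banding an observed score with it; the SD and reliability values are hypothetical.

sd, reliability = 15, 0.91
sem = sd * (1 - reliability) ** 0.5   # 15 · √0.09 = 4.5
observed = 100
# an approximate 95% band around the observed score: ± 1.96 · SEM
print(observed - 1.96 * sem, observed + 1.96 * sem)   # ≈ 91.2 to 108.8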
❖ Predictive Validity
▪ An index of the degree to which a test score predicts some
criterion measure.
▪ refers to how likely it is for test scores to predict future job
performance.
❖ Base Rate
▪ The extent to which a particular trait, behavior, characteristic, or
attribute exists in the population (expressed as a proportion).
❖ Hit Rate
▪ Defined as the proportion of people a test accurately identifies as
possessing or exhibiting a particular trait, behavior, characteristic,
or attribute.
❖ Miss Rate
▪ Defined as the proportion of people the test fails to identify as
having, or not having, a particular characteristic or attribute.
❖ False Positive
▪ A miss wherein the test predicted that the test taker did possess
the particular characteristic or attribute being measured when in
fact the test taker did not.
• The test said yes, but the truth was no.
❖ False Negative
▪ A miss wherein the test predicted that the test taker did not
possess the particular characteristic or attribute being measured
when the test taker actually did.
• The test said no, but the truth was yes.
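 ADDITIONAL: a minimal Python sketch (not from the book) of tallying hits, false positives, and false negatives from paired test predictions and actual outcomes; the data are hypothetical.

# 1 = has the characteristic, 0 = does not
predicted = [1, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 1, 0]
pairs = list(zip(predicted, actual))
hits            = sum(p == a for p, a in pairs)             # correct calls: 4
false_positives = sum(p == 1 and a == 0 for p, a in pairs)  # said yes, truth no: 1
false_negatives = sum(p == 0 and a == 1 for p, a in pairs)  # said no, truth yes: 1
print(hits / len(pairs))                                    # hit rate ≈ 0.67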
What Is a Criterion?
▪ The standard against which a test or a test score is evaluated.
❖ Characteristics of a Criterion
1. An adequate criterion measure must be valid for the purpose for which it is being used.
2. A criterion should be uncontaminated. When a measure used as a predictor is also used as the criterion, the result is called criterion contamination.
❖ The validity coefficient
▪ A correlation coefficient that provides a measure of the
relationship between test scores and scores on the criterion
measure.
▪ The correlation coefficient computed from a score (or classification) on a psychodiagnostic test and the criterion score (or classification) assigned by psychodiagnosticians is one example of a validity coefficient.
Construct Validity
▪ A judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
▪ The researcher investigating a test's construct validity must formulate hypotheses about the expected behavior of high scorers and low scorers on the test.
❖ Construct
▪ An informed, scientific concept developed or constructed to describe or explain behavior (see Chapter 4).
Evidence of Construct Validity
❖ Evidence of pretest–posttest changes
▪ The design of such pretest–posttest research should ideally include a control group to rule out alternative explanations.
❖ Test scores obtained by people from distinct groups vary as predicted by the theory
▪ Also referred to as the method of contrasted groups: one way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predicted way as a function of membership in some group.
▪ The rationale here is that if a test is a valid measure of a particular construct, then test scores from groups of people who would be presumed to differ with respect to that construct should be correspondingly different.
▪ Test scores correlate with scores on other tests in accordance with what would be predicted from a theory that covers the manifestation of the construct in question.
❖ Convergent evidence
▪ Evidence for the construct validity of a particular test may converge from a number of sources, such as other tests or measures designed to assess the same (or a similar) construct.
▪ It is an example of convergent evidence when scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, already validated tests designed to measure the same (or a similar) construct.
❖ Discriminant evidence
▪ When measures of constructs that theoretically should not be highly related to each other are shown, in fact, not to be highly related to each other.
❖ The multitrait-multimethod matrix (Campbell & Fiske, 1959)
▪ The matrix or table that results from correlating variables (traits) within and between methods.
❖ Factor Analysis
▪ A shorthand term for a class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
• Frequently employed as a data reduction method in which several sets of scores and the correlations between them are analyzed.
• Conducted on either an exploratory or a confirmatory basis.
❖ Exploratory factor analysis
▪ Typically entails "estimating, or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation."
❖ Confirmatory Factor Analysis
▪ Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.
❖ Factor Loading
▪ Conveys information about the extent to which a factor determines a test score; factor analysis thereby serves as a data reduction method built on the correlations between observed variables and underlying factors.
❖ Incremental validity
▪ The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
• Example: predicting grade point average by either studying or resting.
Validity, Bias, and Fairness
❖ Test Bias
▪ For psychometricians, bias is a factor inherent in a test that systematically prevents accurate, impartial measurement.
▪ Bias implies systematic variation.
❖ Rating Error
▪ A judgment resulting from the intentional or unintentional misuse of a rating scale.