Psychological Assessment
#BLEPP2023
Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls
Psychometric Properties and Principles
Psychometric properties are essential in constructing, selecting, and interpreting tests.

Psychological Testing – the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior
- numerical in nature
- individual or group
- administrators can be interchangeable without affecting the evaluation
- requires technician-like skills in terms of administration and scoring
- yields a test score or series of test scores
- minutes to a few hours

Psychological Assessment – the gathering and integration of psychology-related data for the purpose of making a psychological evaluation
- answers the referral question through the use of different tools of evaluation
- individual
- the assessor is the key to the process of selecting tests and/or other tools of evaluation
- requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data
- entails logical problem-solving that brings to bear many sources of data designed to answer the referral question
- Educational: evaluates abilities and skills relevant in a school context
- Retrospective: draws conclusions about psychological aspects of a person as they existed at some point in time prior to the assessment
- Remote: the subject is not in physical proximity to the person conducting the evaluation
- Ecological Momentary: "in the moment" evaluation of specific problems and related cognitive and behavioral variables at the very time and place that they occur
- Collaborative: the assessor and assessee may work as "partners" from initial contact through final feedback
- Therapeutic: therapeutic self-discovery and new understandings are encouraged
- Dynamic: describes an interactive approach to psychological assessment that usually follows the model: evaluation > intervention of some sort > evaluation

o Psychological Test – a device or procedure designed to measure variables related to psychology
▪ Content: subject matter
▪ Format: form, plan, structure, arrangement, and layout
▪ Item: a specific stimulus to which a person responds overtly; this response is scored or evaluated
▪ Administration Procedures: one-to-one basis or group administration
▪ Score: a code or summary statement, usually but not necessarily numerical in nature, that reflects an evaluation of performance on a test
▪ Scoring: the process of assigning scores to performances
▪ Cut Score: a reference point derived by judgment and used to divide a set of data into two or more classifications
▪ Psychometric Soundness: technical quality
▪ Psychometrics: the science of psychological measurement
▪ Psychometrist or Psychometrician: a professional who uses, analyzes, and interprets psychological data

Ability or Maximal Performance Tests – assess what a person can do
1. Achievement Test – measurement of previous learning
- used to measure general knowledge in a specific period of time
- used to assess mastery
- relies mostly on content validity
- fact-based or conceptual
2. Aptitude Test – the potential for learning or acquiring a specific skill
- tends to focus on informal learning
- relies mostly on predictive validity
3. Intelligence Test – a person's general potential to solve problems, adapt to changing environments, think abstractly, and profit from experience

Human Ability – considerable overlap among achievement, aptitude, and intelligence tests

Typical Performance Tests – measure usual or habitual thoughts, feelings, and behavior
- indicate how test takers think and act on a daily basis
- use interval scales
- no right or wrong answers

Personality Test – measures individual dispositions and preferences
- designed to identify characteristics
- measured idiographically or nomothetically
1. Structured Personality Tests – provide statements, usually self-report, and require the subject to choose between two or more alternative responses
2. Projective Personality Tests – unstructured; the stimulus or response is ambiguous
Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly
3. Attitude Test – elicits personal beliefs and opinions
4. Interest Inventories – measure likes and dislikes, as well as one's personality orientation toward the world of work

Other Tests:
1. Speed Tests – the interest is in the number of items a test taker can answer correctly in a specific period
2. Power Tests – reflect the level of difficulty of the items the test takers answer correctly
3. Values Inventory
4. Trade Test
5. Neuropsychological Test
6. Norm-Referenced Test
7. Criterion-Referenced Test

Psychological Assessment Process
1. Determining the referral question
2. Acquiring knowledge relating to the content of the problem
3. Data collection
4. Data interpretation

o Hit Rate – accurately predicts success or failure
o Profile – a narrative description, graph, table, or other representation of the extent to which a person has demonstrated certain targeted characteristics as a result of the administration or application of tools of assessment
o Actuarial Assessment – an approach to evaluation characterized by the application of empirically demonstrated statistical rules as the determining factor in assessors' judgments and actions
o Mechanical Prediction – the application of computer algorithms, together with statistical rules and probabilities, to generate findings and recommendations
o Extra-Test Behavior – observations made by an examiner regarding what the examinee does and how the examinee reacts during the course of testing that are indirectly related to the test's specific content but of possible significance to interpretation

Parties in Psychological Assessment
1. Test Author/Developer – creates the tests or other methods of assessment
2. Test Publishers – publish, market, sell, and control the distribution of tests
3. Test Reviewers – prepare evaluative critiques based on the technical and practical aspects of the tests
4. Test Users – use the tests or other tools of assessment
5. Test Takers – those who take the tests
6. Test Sponsors – institutions or government agencies that contract test developers for various testing services
7. Society

o Behavioral Observation – monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
▪ Naturalistic Observation: observing humans in their natural setting
▪ SORC Model: Stimulus, Organismic Variables, Actual Response, Consequence
o Role Play – acting an improvised or partially improvised part in a simulated situation
▪ Role Play Test: assessees are directed to act as if they were in a particular situation
o Other tools include computers and physiological devices (e.g., biofeedback devices)
o Interview – a method of gathering information through direct communication involving a reciprocal exchange
- Standardized/Structured: questions are prepared
- Non-standardized/Unstructured: pursues relevant ideas in depth
- Semi-standardized/Focused: may probe further on a specific number of questions
- Non-directive: the subject is allowed to express his feelings without fear of disapproval
▪ Mental Status Examination: determines the mental status of the patient
▪ Intake Interview: determines why the client came for assessment; a chance to inform the client about the policies, fees, and process involved
▪ Social Case History: biographical sketch of the client
▪ Employment Interview: determines whether the candidate is suitable for hiring
▪ Panel Interview (Board Interview): more than one interviewer participates in the assessment
▪ Motivational Interview: used by counselors and clinicians to gather information about some problematic behavior while simultaneously attempting to address it therapeutically
o Portfolio – samples of one's ability and accomplishment
o Case History Data – records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee
▪ Case Study: a report or illustrative account concerning a person or an event, compiled on the basis of case history data
▪ Groupthink: the result of the varied forces that drive decision-makers to reach a consensus
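The actuarial (mechanical) prediction idea above, in which findings follow from a fixed, empirically derived rule rather than case-by-case judgment, can be sketched in a few lines. The cut-score and the scores below are invented for illustration only.

```python
# Minimal sketch of mechanical/actuarial prediction: a fixed cut-score rule
# is applied uniformly, with no clinical judgment per case.
# The cutoff (65) and the scores are invented for illustration.

def actuarial_decision(score, cutoff=65):
    """Classify a score against a fixed, empirically derived cut-score."""
    return "recommend" if score >= cutoff else "do not recommend"

for score in (72, 65, 58):
    print(score, "->", actuarial_decision(score))
```

Because the rule is fixed, any two assessors (or a computer) reach identical conclusions from the same data, which is the point of the mechanical approach.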
o Test Battery – a selection of tests and assessment procedures, typically composed of tests designed to measure different variables but having a common objective

Assumptions about Psychological Testing and Assessment

Assumption 1: Psychological Traits and States Exist
o Trait – any distinguishable, relatively enduring way in which one individual varies from another
- permits people to predict the present from the past
- characteristic patterns of thinking, feeling, and behaving that generalize across similar situations, differ systematically between individuals, and remain rather stable across time
- Psychological Traits: intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, etc.
o States – distinguish one person from another but are relatively less enduring
- a characteristic pattern of thinking, feeling, and behaving in a concrete situation at a specific moment in time
- identify those behaviors that can be controlled by manipulating the situation
o Psychological traits exist as constructs
- Construct: an informed, scientific concept developed or constructed to explain a behavior, inferred from overt behavior
- Overt Behavior: an observable action or the product of an observable action
o A trait is not expected to be manifested in behavior 100% of the time
o Whether a trait manifests itself in observable behavior, and to what degree it manifests, is presumed to depend not only on the strength of the trait in the individual but also on the nature of the situation (situation-dependent)
o The context within which behavior occurs also plays a role in helping us select appropriate trait terms for observed behaviors
o The definitions of trait and state also refer to a way in which one individual varies from another
o Assessors may make comparisons among people who, because of their membership in some group or for any number of other reasons, are decidedly not average

Assumption 2: Psychological Traits and States can be Quantified and Measured
o Once the trait, state, or other construct to be measured has been defined, a test developer considers the types of item content that would provide insight into it, to gauge the strength of that trait
o Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results
o Cumulative Scoring – the assumption that the more the testtaker responds in a particular direction keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
o The tasks in some tests mimic the actual behaviors that the test user is attempting to understand
o Such tests yield only a sample of the behavior that can be expected to be emitted under nontest conditions

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
o Competent test users understand and appreciate the limitations of the tests they use, as well as how those limitations might be compensated for by data from other sources

Assumption 5: Various Sources of Error are Part of the Assessment Process
o Error – refers to something that is more than expected; it is a component of the measurement process
▪ reflects the long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
▪ Error Variance – the component of a test score attributable to sources other than the trait or ability measured
o Potential sources of error variance:
1. Assessors
2. Measuring instruments
3. Random errors such as luck
o Classical Test Theory – each testtaker has a true score on a test that would be obtained but for the action of measurement error

Assumption 6: Testing and Assessment can be Conducted in a Fair and Unbiased Manner
o Despite the best efforts of many professionals, fairness-related questions and problems do occasionally arise
o In all questions about tests with regard to fairness, it is important to keep in mind that tests are tools – they can be used properly or improperly
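The cumulative scoring assumption (Assumption 2) can be sketched as a simple count of responses in the keyed direction. The answer key and the testtaker's responses below are invented for illustration.

```python
# Minimal sketch of cumulative scoring: the more responses a testtaker gives
# in the keyed (trait-consistent) direction, the higher the score.
# The key and responses are invented for illustration.

def cumulative_score(responses, key):
    """Count responses that match the keyed (trait-consistent) direction."""
    return sum(1 for r, k in zip(responses, key) if r == k)

key       = ["T", "F", "T", "T", "F"]   # keyed directions from a hypothetical manual
responses = ["T", "F", "F", "T", "F"]   # one testtaker's answers

print(cumulative_score(responses, key))  # 4 of 5 responses in the keyed direction
```

A higher count is then presumed to reflect a higher standing on the targeted trait or ability.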
Assumption 7: Testing and Assessment Benefit Society
o Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests

Reliability
o Reliability – the dependability or consistency of the instrument, or of the scores obtained by the same person when re-examined with the same test on different occasions or with different sets of equivalent items
▪ a test may be reliable in one context but unreliable in another
▪ Factors that contribute to consistency: stable attributes of the individual
▪ Factors that contribute to inconsistency: characteristics of the individual, test, or situation that have nothing to do with the attribute being measured but still affect the scores
o Goals of reliability:
✓ estimate errors
✓ devise techniques to improve testing and reduce errors

Measurement Error – all of the factors associated with the process of measuring some variable other than the variable being measured
- the difference between the observed score and the true score
- Positive: can increase one's score
- Negative: can decrease one's score
- Sources of error variance:
a. Item Sampling/Content Sampling: variation among items within a test, as well as variation among items between tests; the extent to which a testtaker's score is affected by the content sampled on a test, and by the way the content is sampled, is a source of error variance
b. Test Administration: the testtaker's motivation or attention, the environment, etc.
c. Test Scoring and Interpretation: may employ objective-type items amenable to computer scoring of well-documented reliability

Random Error – a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)

Systematic Error – a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
- has a consistent effect on the true score
- the SD does not change, but the mean does

o Variance – useful in describing sources of test score variability
▪ True Variance: variance from true differences
▪ Error Variance: variance from irrelevant random sources
▪ reliability refers to the proportion of total variance attributed to true variance
▪ the greater the proportion of the total variance attributed to true variance, the more reliable the test
▪ error variance may increase or decrease a test score by varying amounts, so the consistency of the test score, and thus the reliability, can be affected
▪ reliability estimates the range of possible random fluctuations that can be expected in an individual's score

o Classical Test Theory (True Score Theory) – a score on an ability test is presumed to reflect not only the testtaker's true score on the ability being measured but also error
▪ Error: the component of the observed test score that does not have to do with the testtaker's ability
▪ errors of measurement are random
▪ the true score is free from error but cannot be found directly; only observed scores can be obtained
▪ when you average all the observed scores obtained over a period of time, the result would be closest to the true score
▪ minimizing error means using a representative sample of items to obtain the observed score
▪ the greater the number of items, the higher the reliability
▪ Reliability Coefficient: an index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance
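The classical test theory decomposition above (observed score = true score + random error, with reliability as the ratio of true variance to total variance) can be illustrated by a small simulation. The distribution parameters below are invented for illustration.

```python
# Illustrative sketch of classical test theory: X = T + E, and
# reliability = true variance / total variance.
# Distribution parameters (mean 100, true SD 15, error SD 5) are invented.
import random
import statistics

random.seed(0)
true_scores = [random.gauss(100, 15) for _ in range(5000)]    # T
observed    = [t + random.gauss(0, 5) for t in true_scores]   # X = T + E

true_var    = statistics.pvariance(true_scores)
total_var   = statistics.pvariance(observed)
reliability = true_var / total_var
print(round(reliability, 2))   # close to 15**2 / (15**2 + 5**2) = 0.9
```

Because the error is random and independent of the true scores, total variance is approximately true variance plus error variance, which is why the ratio lands near 0.9 here.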
Test-Retest Reliability
Error: Time Sampling
- time sampling reliability
- an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the test
- appropriate when evaluating the reliability of a test that purports to measure an enduring and stable attribute, such as a personality trait
- established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores
- the longer the time that passes, the greater the likelihood that the reliability coefficient will be insignificant
- Carryover Effects: happen when the test-retest interval is short and the second test is influenced by the first because testtakers remember or have practiced the previous test = inflated correlation / overestimation of reliability
- Practice Effect: scores in the second session are higher due to the experience of the first testing session
- a test-retest with a longer interval might be affected by other extraneous factors, resulting in a low correlation
- lower correlation = poorer reliability
- Mortality: problems with absences in the second session (just remove the first tests of the absentees)
- Coefficient of Stability
- Statistical tool: Pearson r or Spearman rho

Parallel Forms/Alternate Forms Reliability
Error: Item Sampling (immediate); Item Sampling changes over time (delayed)
- established when at least two different versions of the test yield almost the same scores
- has the most universal applicability
- Parallel Forms: for each form of the test, the means and the variances are EQUAL; same items, different positionings/numberings
- Alternate Forms: simply different versions of a test that have been constructed so as to be parallel
- the tests should contain the same number of items; the items should be expressed in the same form and should cover the same type of content; the range and difficulty must also be equal
- if there is test leakage, use the form that is less often administered
- Counterbalancing: a technique to avoid carryover effects for parallel forms by using different sequences for different groups
- can be administered on the same day or at different times
- most rigorous and burdensome, since test developers have to create two forms of the test
- main problem: differences between the two tests
- test scores may be affected by motivation, fatigue, or intervening events
- the means and the variances of the observed scores must be equal for the two forms
- Statistical tool: Pearson r or Spearman rho

Internal Consistency (Inter-Item Reliability)
Error: Item Sampling; Homogeneity
- used when tests are administered once
- consistency among items within the test
- measures the internal consistency of the test, i.e., the degree to which each item measures the same construct
- a measurement for unstable traits
- if all items measure the same construct, the test has good internal consistency
- useful in assessing homogeneity
- Homogeneity: the test contains items that measure a single trait (unifactorial)
- Heterogeneity: the degree to which a test measures different factors (more than one factor/trait)
- more homogeneous = higher inter-item consistency
- KR-20: used for the inter-item consistency of dichotomous items (intelligence tests, personality tests with yes/no options, multiple choice); unequal variances, dichotomously scored
- KR-21: used if all the items have the same degree of difficulty (speed tests); equal variances, dichotomously scored
- Cronbach's Coefficient Alpha: used when the two halves of the test have unequal variances and on tests containing non-dichotomous items
- Average Proportional Distance: a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

Split-Half Reliability
Error: Item Sampling – Nature of the Split
- obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered ONCE
- useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
- you cannot just divide the items in the middle, because doing so might spuriously raise or lower the reliability coefficient; instead, randomly assign items, or assign odd-numbered items to one half and even-numbered items to the other half
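Coefficient alpha, noted above under internal consistency (KR-20 is its special case for dichotomous items), can be sketched directly from its definition: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The small 4-item, 6-person response matrix below is invented for illustration.

```python
# Minimal sketch of Cronbach's coefficient alpha; with 0/1 items, as here,
# it reduces to KR-20. The response matrix is invented for illustration.
import statistics

def cronbach_alpha(items):
    """items: list of per-item score lists (one inner list per item)."""
    k = len(items)
    item_vars = sum(statistics.pvariance(item) for item in items)
    totals = [sum(person) for person in zip(*items)]   # total score per person
    return k / (k - 1) * (1 - item_vars / statistics.pvariance(totals))

items = [                     # rows = items, columns = 6 testtakers
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
]
print(round(cronbach_alpha(items), 2))   # 0.83
```

The more the items covary (i.e., the more homogeneous the test), the larger the total-score variance relative to the summed item variances, and the higher the alpha, matching the "more homogeneous = higher inter-item consistency" rule above.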
- Spearman-Brown Formula: allows a test developer or user to estimate internal consistency reliability from the correlation between two halves of a test, as if each half had been the length of the whole test and the halves had equal variances
- Spearman-Brown Prophecy Formula: estimates how many more items are needed in order to achieve a target reliability
- multiply the estimate by the original number of items
- Rulon's Formula: a counterpart of the Spearman-Brown formula; the ratio of the variance of the differences between the odd and even splits to the variance of the total (combined odd-even) score
- if the reliability of the original test is relatively low, the developer could create new items, clarify the test instructions, or simplify the scoring rules
- equal variances, dichotomously scored
- Statistical tool: Pearson r or Spearman rho

Inter-Scorer Reliability
Error: Scorer Differences
- the degree of agreement or consistency between two or more scorers with regard to a particular measure
- used for coding nonverbal behavior
- observer differences
- Fleiss' Kappa: determines the level of agreement between TWO or MORE raters when the assessment is measured on a CATEGORICAL SCALE
- Cohen's Kappa: two raters only
- Krippendorff's Alpha: two or more raters; based on observed disagreement corrected for the disagreement expected by chance

o Tests designed to measure one factor (homogeneous) are expected to have a high degree of internal consistency, and vice versa
o Dynamic – a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experience
o Static – barely changing or relatively unchanging
o Restriction of range (restriction of variance) – if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower
o Power Tests – the time limit is long enough to allow testtakers to attempt all items
o Speed Tests – generally contain items of a uniform level of difficulty, with a time limit
▪ reliability should be based on performance from two independent testing periods, using test-retest, alternate-forms, or split-half reliability

o Criterion-Referenced Tests – designed to provide an indication of where a testtaker stands with respect to some variable or criterion
▪ as individual differences decrease, a traditional measure of reliability would also decrease, regardless of the stability of individual performance
o Classical Test Theory – everyone has a "true score" on a test
▪ True Score: genuinely reflects an individual's ability level as measured by a particular test
▪ random error
o Domain Sampling Theory – estimates the extent to which specific sources of variation under defined conditions contribute to the test scores
▪ considers the problem created by using a limited number of items to represent a larger and more complicated construct
▪ test reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample
▪ Generalizability Theory: based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation
▪ Universe: the test situation
▪ Facets: the number of items in the test, the amount of review, and the purpose of the test administration
▪ according to generalizability theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained (universe score)
▪ Decision Study: developers examine the usefulness of test scores in helping the test user make decisions
▪ systematic error
o Item Response Theory – the probability that a person with X ability will be able to perform at a level of Y on a test
▪ focus: item difficulty
▪ also called Latent-Trait Theory
▪ a system of assumptions about measurement and the extent to which each item measures the trait
▪ the computer is used to focus on the range of item difficulty that helps assess an individual's ability level
▪ if you get several easy items correct, the computer will then move on to more difficult items
▪ Difficulty: the attribute of not being easily accomplished, solved, or comprehended
▪ Discrimination: the degree to which an item differentiates among people with higher or lower levels of the trait, ability, etc.
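The Spearman-Brown relations described above can be sketched as two one-line functions: the general formula predicts reliability when test length changes by a factor n, and the prophecy formula inverts it to find the lengthening factor needed for a target reliability. The reliability figures below are invented for illustration.

```python
# Sketch of the Spearman-Brown formula and its "prophecy" inverse.
# The numbers (r = .70, target r = .90, 20 items) are invented.

def spearman_brown(r, n):
    """Predicted reliability when test length is changed by a factor n."""
    return n * r / (1 + (n - 1) * r)

def prophecy_factor(r_orig, r_target):
    """Lengthening factor needed to reach a target reliability."""
    return r_target * (1 - r_orig) / (r_orig * (1 - r_target))

# Correcting a split-half correlation (half-length) up to full length (n = 2):
print(round(spearman_brown(0.70, 2), 2))   # 0.82

# How much longer must a 20-item test with r = .70 be to reach r = .90?
n = prophecy_factor(0.70, 0.90)
print(round(n, 2), "x, i.e. about", round(n * 20), "items")
```

The second print shows the "multiply the estimate by the original number of items" step noted above: a factor of about 3.86 times 20 items means roughly 77 items in total.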
▪ Dichotomous: can be answered with only one of two alternative responses
▪ Polytomous: three or more alternative responses

o Standard Error of Measurement – provides a measure of the precision of an observed test score
▪ the standard deviation of errors, used as the basic measure of error
▪ an index of the amount of inconsistency, or the amount of expected error, in an individual's score
▪ allows us to quantify the extent to which a test provides accurate scores
▪ provides an estimate of the amount of error inherent in an observed score or measurement
▪ the higher the reliability, the lower the SEM
▪ used to estimate or infer the extent to which an observed score deviates from a true score
▪ also called the Standard Error of a Score
▪ Confidence Interval: a range or band of test scores that is likely to contain the true score
o Standard Error of the Difference – can aid a test user in determining how large a difference between scores should be before it is considered statistically significant
o Standard Error of Estimate – the standard error of the difference between predicted and observed values
o Confidence Interval – a range of test scores that is likely to contain the true score
▪ tells us the relative position of the true score within the specified range and confidence level
▪ the larger the range, the higher the confidence
o If reliability is low, you can increase the number of items, or use factor analysis and item analysis to increase internal consistency
o Reliability Estimates – the nature of the test will often determine the appropriate reliability metric:
a) homogeneous (unifactor) or heterogeneous (multifactor)
b) dynamic (unstable) or static (stable)
c) range of scores restricted or not
d) speed test or power test
e) criterion-referenced or not
o Test Sensitivity – detects true positives
o Test Specificity – detects true negatives
o Base Rate – the proportion of the population that actually possesses the characteristic of interest
o Selection Ratio – the number of available positions compared to the number of applicants
o Four Possible Hit and Miss Outcomes:
1. True Positive (Sensitivity) – predicted success that does occur
2. True Negative (Specificity) – predicted failure that does occur
3. False Positive (Type I) – predicted success that does not occur
4. False Negative (Type II) – predicted failure, but the person succeeds

Validity
o Validity – a judgment or estimate of how well a test measures what it is supposed to measure
▪ evidence about the appropriateness of inferences drawn from test scores
▪ the degree to which the measurement procedure measures the variables it is designed to measure
▪ Inference: a logical result or deduction
▪ may diminish as the culture or the times change
✓ predicts future performance
✓ measures the appropriate domain
✓ measures the appropriate characteristics
o Validation – the process of gathering and evaluating evidence about validity
o Validation Studies – yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual
o Internal Validity – the degree of control among variables in a study (increased through random assignment)
o External Validity – the generalizability of research results (increased through random selection)
o Conceptual Validity – focuses on individuals with their unique histories and behaviors
▪ a means of evaluating and integrating test data so that the clinician's conclusions make accurate statements about the examinee
o Face Validity – what a test appears to measure to the person being tested, rather than what the test actually measures

Content Validity
- a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample
- achieved when the proportion of the material covered by the test approximates the proportion of material covered in the course
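The standard error of measurement and the confidence band it implies can be sketched from the standard formula SEM = SD x sqrt(1 - reliability). The SD and reliability values below are invented for illustration (SD = 15 echoes a typical IQ-style scale).

```python
# Sketch of the standard error of measurement and a 95% confidence band
# around an observed score. The inputs (SD = 15, r = .91, score 110) are
# invented for illustration.
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band of +/- z SEMs around an observed score (z = 1.96 -> ~95%)."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

print(round(sem(15, 0.91), 1))                 # 4.5
lo, hi = confidence_interval(110, 15, 0.91)
print(round(lo, 1), round(hi, 1))              # 101.2 118.8
```

Raising the reliability shrinks the SEM and narrows the band, which is the "higher reliability, lower SEM" rule above; widening the band (a larger z) raises the confidence that it contains the true score.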
- Test Blueprint: a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items, and so forth
- more logical than statistical
- concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes
- a panel of experts can review the test items and rate them in terms of how closely they match the objective or domain specification
- examine whether items are essential, useful, and necessary
- construct underrepresentation: failure to capture important components of a construct
- construct-irrelevant variance: occurs when scores are influenced by factors irrelevant to the construct
- Lawshe: developed the formula for the Content Validity Ratio (CVR)
- Zero CVR: exactly half of the experts rate the item as essential

Criterion Validity
- more statistical than logical
- a judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest – the measure of interest being the criterion
- Criterion: the standard on which a judgment or decision may be based
- characteristics: relevant, valid, uncontaminated
- Criterion Contamination: occurs when the criterion measure includes aspects of performance that are not part of the job, or when the measure is affected by "construct-irrelevant" (Messick, 1989) factors that are not part of the criterion construct
1. Concurrent Validity: the test scores are obtained at about the same time as the criterion measures; economically efficient
2. Predictive Validity: measures the relationship between test scores and a criterion measure obtained at a future time
- Incremental Validity: the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use; related to predictive validity; used to improve the domain

Construct Validity
- both logical and statistical
- a judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct
- Construct: an informed, scientific idea developed or hypothesized to describe or explain behavior; unobservable, presupposed traits that may be invoked to describe test behavior or criterion performance
- one way a test developer can improve the homogeneity of a test containing dichotomous items is by eliminating items that do not show significant correlation coefficients with total test scores
- if it is an academic test and high scorers on the entire test tended, for some reason, to get a particular item wrong while low scorers got it right, then the item is obviously not a good one
- some constructs lend themselves more readily than others to predictions of change over time
- Method of Contrasted Groups: demonstrates that scores on the test vary in a predictable way as a function of membership in a group
- if a test is a valid measure of a particular construct, then the scores of a group of people who do not have that construct should differ from the scores of those who really possess that construct
- Convergent Evidence: scores on the test undergoing construct validation tend to correlate highly with an established, validated test that measures the same construct
- Discriminant Evidence: a validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not be correlated
- the test is homogeneous
- test scores increase or decrease as a function of age, the passage of time, or experimental manipulation
- pretest-posttest differences
- scores differ between groups
- scores correlate with scores on other tests in accordance with what is predicted
o Factor Analysis – designed to identify factors, or specific variables that are typically attributes, characteristics, or dimensions on which people may differ
▪ developed by Charles Spearman
▪ employed as a data reduction method
▪ used to study the interrelationships among a set of variables
something about the criterion measure that is not
variables
explained by predictors already in use
▪ Identify the factor or factors in common between
Construct Validity (Umbrella Validity)
test scores on subscales within a particular test
- covers all types of validity
▪ Explanatory FA: estimating or extracting factors;
deciding how many factors must be retained
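Lawshe's Content Validity Ratio mentioned above is computed per item as CVR = (ne - N/2) / (N/2), where ne is the number of panelists rating the item "essential" and N is the panel size. A minimal sketch (function name is illustrative):

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (ne - N/2) / (N/2), where ne is the number of
    experts rating the item 'essential' and N is the total panel size."""
    half = n_experts / 2
    return (n_essential - half) / half

# When exactly half of the panel rates the item essential, CVR is zero.
print(content_validity_ratio(5, 10))   # 0.0
# All experts agree the item is essential -> CVR = 1.0
print(content_validity_ratio(10, 10))  # 1.0
```

CVR ranges from -1 (no expert rates the item essential) through 0 (exactly half do) to +1 (all do).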
▪ Confirmatory FA: researchers test the degree to which a hypothetical model fits the actual data
▪ Factor Loading: conveys info about the extent to which the factor determines the test score or scores
▪ can be used to obtain both convergent and discriminant validity
o Cross-Validation – revalidation of the test to a criterion based on another group different from the original group on which the test was validated
▪ Validity Shrinkage: decrease in validity after cross-validation
▪ Co-Validation: validation of more than one test from the same group
▪ Co-Norming: norming more than one test from the same group
o Bias – factor inherent in a test that systematically prevents accurate, impartial measurement
▪ Prejudice, preferential treatment
▪ Prevention during test development through a procedure called Estimated True Score Transformation
o Rating – numerical or verbal judgement that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a Rating Scale
▪ Rating Error: intentional or unintentional misuse of the scale
▪ Leniency Error: rater is lenient in scoring (Generosity Error)
▪ Severity Error: rater is strict in scoring
▪ Central Tendency Error: rater's ratings tend to cluster in the middle of the rating scale
▪ One way to overcome rating errors is to use rankings
▪ Halo Effect: tendency to give a high score due to failure to discriminate among conceptually distinct and potentially independent aspects of a ratee's behavior
o Fairness – the extent to which a test is used in an impartial, just, and equitable way
o Attempting to define the validity of the test will be futile if the test is NOT reliable

Utility
o Utility – usefulness or practical value of testing to improve efficiency
o Can tell us something about the practical value of the information derived from scores on the test
o Helps us make better decisions
o Higher criterion-related validity = higher utility
o One of the most basic elements in utility analysis is the financial cost of the selection device
o Cost – disadvantages, losses, or expenses in both economic and noneconomic terms
o Benefit – profits, gains, or advantages
o The cost of test administration can be well worth it if the result is certain noneconomic benefits
o Utility Analysis – family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
o Expectancy Table – provides an indication that a testtaker will score within some interval of scores on a criterion measure (passing, acceptable, failing)
▪ Might indicate future behaviors; if successful, the test is working as it should
o Taylor-Russell Tables – provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection
o Selection Ratio – numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired
o Base Rate – percentage of people hired under the existing system for a particular position
o One limitation of the Taylor-Russell Tables is that the relationship between the predictor (test) and the criterion must be linear
o Naylor-Shine Tables – entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures
o Brogden-Cronbach-Gleser Formula – used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument
o Utility Gain – estimate of the benefit of using a particular test
o Productivity Gains – an estimated increase in work output
o High-performing applicants may have been offered positions in other companies as well
o The more complex the job, the more people differ on how well or poorly they do that job
o Cut Score – reference point derived as a result of a judgement and used to divide a set of data into two or more classifications
Relative Cut Score – reference point based on norm-related considerations (norm-referenced); e.g., NMAT
Fixed Cut Scores – set with reference to a judgement concerning the minimum level of proficiency required; e.g., Board Exams
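The Brogden-Cronbach-Gleser utility gain above is often written as N × T × r_xy × SDy × Zx minus the cost of testing. A sketch of that common textbook form; all numbers and the function name are hypothetical:

```python
def utility_gain(n_selected, tenure_years, validity, sd_y, mean_z_selected,
                 n_applicants, cost_per_applicant):
    """Brogden-Cronbach-Gleser utility estimate (one common textbook form).

    Gain = N * T * r_xy * SDy * Zx  -  (applicants tested * cost per applicant)
    where r_xy is the criterion-related validity of the selection test,
    SDy the dollar value of a one-SD difference in job performance, and
    Zx the mean standard score on the test among those selected.
    """
    benefit = n_selected * tenure_years * validity * sd_y * mean_z_selected
    cost = n_applicants * cost_per_applicant
    return benefit - cost

# Hypothetical numbers: 10 hires staying 2 years, validity .40,
# SDy = $10,000, mean z of hires = 1.0, 100 applicants at $25 each.
print(utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 25))  # 77500.0
```

The dollar figure is only as good as the SDy estimate, which is why utility analyses report it as an estimate rather than an exact gain.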
Multiple Cut Scores – refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization
Multiple Hurdle – multi-stage selection process; a cut score is in place for each predictor
Compensatory Model of Selection – assumption that high scores on one attribute can compensate for lower scores on another
o Angoff Method – for setting fixed cut scores
▪ low interrater reliability
o Known Groups Method – collection of data on the predictor of interest from groups known to possess, and not to possess, a trait of interest
▪ The determination of where to set the cutoff score is inherently affected by the composition of the contrasting groups
o IRT-Based Methods – cut scores are typically set based on testtakers' performance across all the items on the test
▪ Item-Mapping Method: arrangement of items in a histogram, with each column containing items deemed to be of equivalent value
▪ Bookmark Method: an expert places a "bookmark" between the two pages that are deemed to separate testtakers who have acquired the minimal knowledge, skills, and/or abilities from those who have not
o Method of Predictive Yield – takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
o Discriminant Analysis – sheds light on the relationship between identified variables and two naturally occurring groups

Reasons for accepting or rejecting instruments and tools based on psychometric properties
Reliability
o Basic Research = 0.70 to 0.90
o Clinical Setting = 0.90 to 0.95
Validity
Item Difficulty
Item Discrimination
P-Value
o P-Value ≤ α: reject the null hypothesis
o P-Value > α: fail to reject the null hypothesis
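The p-value decision rule and the reliability rules of thumb above can be sketched as a small helper; the function names and the default α are illustrative assumptions, not from the source:

```python
ALPHA = 0.05  # assumed significance level for illustration

def decide(p_value, alpha=ALPHA):
    """Null-hypothesis decision rule from the notes: reject H0 when
    p <= alpha, otherwise fail to reject H0."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def reliability_acceptable(coefficient, setting="basic"):
    """Rule-of-thumb reliability standards cited in the notes:
    0.70-0.90 for basic research, 0.90-0.95 for clinical settings."""
    low, high = (0.90, 0.95) if setting == "clinical" else (0.70, 0.90)
    return low <= coefficient <= high

print(decide(0.03))                              # reject H0
print(reliability_acceptable(0.85))              # True (basic research)
print(reliability_acceptable(0.85, "clinical"))  # False
```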
Research Methods and Statistics (20)
Statistics Applied in Research Studies on Tests and Test Development
Measures of Central Tendency - statistics that
indicates the average or midmost score between the
extreme scores in a distribution
- Goal: identify the most typical or representative score of the entire group
- Measures of Central Location

Mean – the average of all the raw scores
- Equal to the sum of the observations divided by the number of observations
- Interval and ratio data (when normally distributed)
- Point of least squares
- Balance point for the distribution
- susceptible to outliers

Median – the middle score of the distribution
- Ordinal, interval, ratio data
- for extreme scores, use the median
- Identical for sample and population
- Also used when there is an unknown or undetermined score
- Used for "open-ended" categories (e.g., 5 or more, more than 8, at least 10)
- For ordinal data
- if the distribution is skewed for ratio/interval data, use the median

Mode – the most frequently occurring score in the distribution
- Bimodal Distribution: if there are two scores that occur with the highest frequency
- Not commonly used
- Useful in analyses of a qualitative or verbal nature
- For nominal scales, discrete variables
- The value of the mode gives an indication of the shape of the distribution as well as a measure of central tendency

Measures of Spread or Variability – statistics that describe the amount of variation in a distribution
- give an idea of how well the measure of central tendency represents the data
- a large spread of values means large differences between individual scores

Range – equal to the difference between the highest and the lowest score
- Provides a quick but gross description of the spread of scores
- When its value is based on extreme scores of the distribution, the resulting description of variation may be understated or overstated

Interquartile Range – difference between Q3 and Q1
Semi-Quartile Range – interquartile range divided by 2

Standard Deviation – approximation of the average deviation around the mean
- gives detail of how much above or below the mean a score is
- equal to the square root of the average squared deviations about the mean
- Equal to the square root of the variance
- Distance from the mean

Variance – equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
- average squared deviation around the mean

Measures of Location
Percentile or Percentile Rank – not linearly transformable; converged at the middle, while the outer ends show large intervals
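The measures of central tendency and variability above map directly onto Python's standard-library statistics module; a minimal sketch with made-up scores:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)           # sum / n ("balance point"): 5
median = statistics.median(scores)       # middle score: 4.5
mode = statistics.mode(scores)           # most frequent score: 4
value_range = max(scores) - min(scores)  # quick, gross spread: 7
variance = statistics.pvariance(scores)  # average squared deviation: 4
sd = statistics.pstdev(scores)           # square root of the variance: 2.0

print(mean, median, mode, value_range, variance, sd)
```

pvariance/pstdev treat the data as the whole population; variance/stdev are the sample (n - 1) versions.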
Percentile or Percentile Rank – expressed in terms of the percentage of persons in the standardization sample who fall below a given score
- indicates the individual's relative position in the standardization sample

Quartile – dividing points between the four quarters in the distribution
- Specific point
- Quarter: refers to an interval

Decile/STEN – divide the distribution into 10 equal parts

Skewness – a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean

Correlation
Pearson R – interval/ratio + interval/ratio
Spearman Rho – ordinal + ordinal
Biserial – artificial dichotomous + interval/ratio
Point Biserial – true dichotomous + interval/ratio
Phi Coefficient – nominal (true dichotomous) + nominal (true/artificial dichotomous)
Tetrachoric – artificial dichotomous + artificial dichotomous
Kendall's – 3 or more ordinal/rank variables
Rank Biserial – nominal + ordinal

Differences
T-Test Independent – two separate groups, random assignment
- e.g., blood pressure of male and female grad students
T-Test Dependent – one group, two scores
- e.g., blood pressure before and after the lecture for grad students
One-Way ANOVA – 3 or more groups, tested once
- e.g., people in different socio-economic statuses and the differences in their salaries
One-Way Repeated Measures – 1 group, measured at least 3 times
- e.g., measuring the focus level of board reviewers during morning, afternoon, and night sessions of review
Two-Way ANOVA – 3 or more groups, tested for 2 variables
- e.g., people in different socio-economic statuses and the differences in their salaries and their eating habits
ANCOVA – used when you need to control for an additional variable which may be influencing the relationship between your independent and dependent variables
ANOVA Mixed Design – 2 or more groups, measured more than 3 times
- e.g., Young Adults, Middle Adults, and Old Adults' blood pressure is measured during breakfast, lunch, and dinner

Non-Parametric Tests
Mann Whitney U Test – counterpart of the t-test independent
Wilcoxon Signed Rank Test – counterpart of the t-test dependent
Kruskal-Wallis H Test – counterpart of the one-way ANOVA
Friedman Test – counterpart of the ANOVA repeated measures
Lambda – for 2 groups of nominal data
Chi-Square Goodness of Fit – used to measure differences; involves nominal data and only one variable with 2 or more categories
Chi-Square Test of Independence – used to measure correlation; involves nominal data and two variables with two or more categories
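As a quick illustration of the correlation choices above, Pearson r (two interval/ratio variables) and Spearman rho (the same formula applied to ranks) can be hand-computed; a sketch with made-up data and a simple no-ties ranking shortcut:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r for two interval/ratio variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rho: Pearson r applied to ranks (two ordinal variables).
    Simplified version that assumes no tied scores."""
    def ranks(v):
        order = sorted(v)
        return [order.index(a) + 1 for a in v]
    return pearson_r(ranks(x), ranks(y))

hours = [1, 2, 3, 4, 5]           # hypothetical review hours
score = [50, 55, 65, 70, 80]      # hypothetical exam scores
print(round(pearson_r(hours, score), 3))   # 0.993
print(round(spearman_rho(hours, score), 3))  # 1.0 (rank orders agree perfectly)
```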
Regression – used when one wants to provide a framework of prediction on the basis of one factor in order to predict the probable value of another factor
Linear Regression of Y on X – Y = a + bX
- Used to predict the unknown value of variable Y when the value of variable X is known
Linear Regression of X on Y – X = c + dY
- Used to predict the unknown value of variable X using the known variable Y

o True Dichotomy – dichotomy in which there are only fixed possible categories
o Artificial Dichotomy – dichotomy in which there are other possibilities in a certain category

Methods and Statistics used in Research Studies and Test Construction

Test Development
o Test Development – an umbrella term for all that goes into the process of creating a test

I. Test Conceptualization – brainstorming of ideas about what kind of test a developer wants to publish
- stage wherein the ff. is determined: construct, goal, user, taker, administration, format, response, benefits, costs, interpretation
- determines whether the test would be norm-referenced or criterion-referenced
- Pilot Work/Pilot Study/Pilot Research – preliminary research surrounding the creation of a prototype of the test
- Attempts to determine how best to measure a targeted construct
- Entails lit reviews and experimentation, and the creation, revision, and deletion of preliminary items

II. Test Construction – stage in the process that entails writing test items, revisions, formatting, and setting scoring rules
- it is not good to create an item that contains numerous ideas
- Item Pool: reservoir or well from which the items will or will not be drawn for the final version of the test
- Item Banks: relatively large and easily accessible collection of test questions
- Computerized Adaptive Testing: refers to an interactive, computer-administered test-taking process wherein items presented to the testtaker are based in part on the testtaker's performance on previous items
- The test administered may be different for each testtaker, depending on the test performance on the items presented
- Reduces floor and ceiling effects
- Floor Effects: occur when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (testtakers have low scores)
- Ceiling Effects: occur when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (testtakers have high scores)
- Item Branching: ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
- Item Format: form, plan, structure, arrangement, and layout of individual test items
- Dichotomous Format: offers two alternatives for each item
- Polychotomous Format: each item has more than two alternatives
- Category Format: a format where respondents are asked to rate a construct
1. Checklist – subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself
2. Guttman Scale – items are arranged from weaker to stronger expressions of attitude, belief, or feeling
- Selected-Response Format: requires testtakers to select a response from a set of alternative responses
1. Multiple Choice – has three elements: a stem (question), a correct option, and several incorrect alternatives (distractors or foils); should have one correct answer, grammatically parallel alternatives of similar length, and alternatives that fit grammatically with the stem; avoid ridiculous distractors; not excessively long; "all of the above", "none of the above" (25%)
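The linear regression of Y on X (Y = a + bX) listed earlier is fit by least squares; a minimal sketch with made-up data that falls exactly on a line:

```python
def linear_regression(x, y):
    """Least-squares line Y = a + bX for predicting Y from a known X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope b = covariance of X and Y divided by variance of X
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # intercept chosen so the line passes through the means
    return a, b

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # exactly y = 1 + 2x
a, b = linear_regression(x, y)
print(a, b)               # 1.0 2.0
print(a + b * 5)          # predict unknown Y from known X = 5 -> 11.0
```

Predicting X from Y (X = c + dY) is the same computation with the roles of the two variables swapped; the two lines differ unless the correlation is perfect.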
- Effective Distractors: a distractor that is chosen equally by both high- and low-performing groups and that enhances the consistency of test results
- Ineffective Distractors: may hurt the reliability of the test because they are time-consuming to read and can limit the no. of good items
- Cute Distractors: less likely to be chosen; may affect the reliability of the test because the testtakers may guess from the remaining options
2. Matching Item – the testtaker is presented with two columns: premises and responses
3. Binary Choice – usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact (50%)
- Constructed-Response Format: requires testtakers to supply or to create the correct answer, not merely select it
1. Completion Item – requires the examinee to provide a word or phrase that completes a sentence
2. Short-Answer – should be written clearly enough that the testtaker can respond succinctly, with a short answer
3. Essay – allows creative integration and expression of the material
- Scaling: process of setting rules for assigning numbers in measurement

Primary Scales of Measurement
1. Nominal – involves classification or categorization based on one or more distinguishing characteristics
- Labels and categorizes observations but does not make any quantitative distinctions between observations
- mode
2. Ordinal – rank ordering on some characteristic is also permissible
- median
3. Interval – contains equal intervals, has no absolute zero point (even negative values have an interpretation)
- A zero value does not mean it represents none
4. Ratio – has a true zero point (if the score is zero, it means none/null)
- Easiest to manipulate

Comparative Scales of Measurement
1. Paired Comparison – produces ordinal data by presenting pairs of two stimuli which respondents are asked to compare
- the respondent is presented with two objects at a time and asked to select one object according to some criterion
2. Rank Order – respondents are presented with several items simultaneously and asked to rank them in order of priority
3. Constant Sum – respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion
4. Q-Sort Technique – sort objects based on similarity with respect to some criterion

Non-Comparative Scales of Measurement
1. Continuous Rating – rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other
- e.g., rating Guardians of the Galaxy as the best Marvel movie of Phase 4
2. Itemized Rating – having numbers or brief descriptions associated with each category
- e.g., 1 if you like the item the most, 2 if so-so, 3 if you hate it
3. Likert Scale – respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object
- principle of measuring attitudes by asking people to respond to a series of statements about a topic, in terms of the extent to which they agree with them
4. Visual Analogue Scale – a 100-mm line that allows subjects to express the magnitude of an experience or belief
5. Semantic Differential Scale – derives the respondent's attitude toward the given object by asking him to select an appropriate position on a scale between two bipolar opposites
6. Staple Scale – developed to measure the direction and intensity of an attitude simultaneously
7. Summative Scale – the final score is obtained by summing the ratings across all the items
8. Thurstone Scale – involves the collection of a variety of different statements about a phenomenon which are ranked by an expert panel in order to develop the questionnaire
- allows multiple answers
9. Ipsative Scale – the respondent must choose between two or more equally socially acceptable options

III. Test Tryout – the test should be tried out on people who are similar in critical respects to the people for whom the test was designed
- An informal rule of thumb: no fewer than 5 and preferably as many as 10 testtakers for each item (the more, the better)
- Risk of using few subjects = phantom factors emerge
- Should be executed under conditions as identical as possible
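A summative (Likert-type) scale like those above is scored by summing the item ratings, reverse-keying negatively worded items first. A minimal sketch; the reverse-keyed item numbers and the 1-5 response range are assumptions for illustration:

```python
REVERSE_KEYED = {2, 4}  # hypothetical negatively worded items
SCALE_MAX = 5           # assumed 1-5 agree/disagree response range

def summative_score(responses):
    """Sum ratings across items, reverse-scoring negatively keyed items
    (on a 1-5 scale, 1 <-> 5 and 2 <-> 4)."""
    total = 0
    for item_no, rating in enumerate(responses, start=1):
        if item_no in REVERSE_KEYED:
            rating = SCALE_MAX + 1 - rating
        total += rating
    return total

# Items 2 and 4 are flipped before summing: 5 + 5 + 4 + 4 + 5 = 23
print(summative_score([5, 1, 4, 2, 5]))  # 23
```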
- A good test item is one that is answered correctly by high scorers as a whole
- Empirical Criterion Keying: administering a large pool of test items to a sample of individuals who are known to differ on the construct being measured
- Item Analysis: statistical procedure used to analyze and evaluate test items
- Discriminability Analysis: employed to examine the correlation between each item and the total score of the test
- Item: suggests a sample of behavior of an individual
- Table of Specification: a blueprint of the test in terms of the number of items per difficulty, topic importance, or taxonomy
- Guidelines for Item Writing: define clearly what to measure, generate an item pool, avoid long items, keep the level of reading difficulty appropriate for those who will complete the test, avoid double-barreled items, consider making positively and negatively worded items
- Double-Barreled Items: items that convey more than one idea at the same time
- Item Difficulty: defined by the number of people who get a particular item correct
- Item-Difficulty Index: calculated as the proportion of the total number of testtakers who answered the item correctly; the larger, the easier the item
- Item-Endorsement Index: for personality testing; the percentage of individuals who endorsed an item in a personality test
- The optimal average item difficulty is approx. 50%, with items on the test ranging in difficulty from about 30% to 80%
- Omnibus Spiral Format: items in an ability test are arranged in increasing difficulty
- Item-Reliability Index: provides an indication of the internal consistency of a test
- The higher the Item-Reliability Index, the greater the test's internal consistency
- Item-Validity Index: designed to provide an indication of the degree to which a test measures what it purports to measure
- The higher the Item-Validity Index, the greater the test's criterion-related validity
- Item-Discrimination Index: measure of item discrimination; the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
- Extreme Group Method: compares people who have done well with those who have done poorly
- Discrimination Index: difference between these proportions
- Point-Biserial Method: correlation between a dichotomous variable and a continuous variable
- Item-Characteristic Curve: graphic representation of item difficulty and discrimination
- Guessing: an issue that has eluded any universally accepted solution
- Item analyses taken under speed conditions yield misleading or uninterpretable results
- Restrict item analysis on a speed test only to the items completed by the testtaker
- The test developer ideally should administer the test to be item-analyzed with generous time limits to complete the test

Scoring Items/Scoring Models
1. Cumulative Model – the testtaker obtains a measure of the level of the trait; thus, high scores may suggest a high level of the trait being measured
2. Class Scoring/Category Scoring – testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is similar in some way
3. Ipsative Scoring – compares a testtaker's score on one scale within a test to another scale within that same test; the two scales may measure unrelated constructs

IV. Test Revision – characterize each item according to its strengths and weaknesses
- As revision proceeds, the advantage of writing a large item pool becomes more apparent, because some items will be removed and must be replaced by items from the item pool
- Administer the revised test under standardized conditions to a second appropriate sample of examinees
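The item-difficulty index and the item-discrimination (extreme group) index described above reduce to simple proportions; a minimal sketch with made-up right/wrong response vectors (1 = correct, 0 = wrong):

```python
def item_difficulty(responses):
    """Item-difficulty index p: proportion of testtakers answering the
    item correctly. The larger p is, the easier the item."""
    return sum(responses) / len(responses)

def item_discrimination(high_group, low_group):
    """Item-discrimination index d: proportion correct among high total
    scorers minus proportion correct among low total scorers."""
    return item_difficulty(high_group) - item_difficulty(low_group)

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(item_difficulty(item))  # 0.7 (7 of 10 correct)

# 3 of 4 high scorers vs 1 of 4 low scorers got the item right.
print(item_discrimination([1, 1, 1, 0], [0, 1, 0, 0]))  # 0.75 - 0.25 = 0.5
```

A negative d flags the pathological case the notes mention: low scorers outperforming high scorers on the item.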
- Cross-Validation: revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion; often results in validity shrinkage
- Validity Shrinkage: decrease in item validities that inevitably occurs after cross-validation
- Co-Validation: conducted on two or more tests using the same sample of testtakers
- Co-Norming: creation of norms or the revision of existing norms
- Anchor Protocol: test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies
- Scoring Drift: discrepancy between the scoring in an anchor protocol and the scoring of another protocol
- Differential Item Functioning: an item functions differently in one group of testtakers known to have the same level of the underlying trait
- DIF Analysis: test developers scrutinize group-by-group item response curves looking for DIF items
- DIF Items: items for which respondents from different groups at the same level of the underlying trait have different probabilities of endorsement as a function of their group membership
o Computerized Adaptive Testing – refers to an interactive, computer-administered test-taking process wherein items presented to the testtaker are based in part on the testtaker's performance on previous items
▪ The test administered may be different for each testtaker, depending on the test performance on the items presented
▪ Reduces floor and ceiling effects
▪ Floor Effects: occur when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (testtakers have low scores)
▪ Ceiling Effects: occur when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (testtakers have high scores)
▪ Item Branching: ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
▪ Routing Test: subtest used to direct or route the testtaker to a suitable level of items
▪ Item-Mapping Method: setting cut scores through a histographic representation of items and expert judgments regarding item effectiveness
o Basal Level – the level at which the minimum criterion number of correct responses is obtained
o Computer Assisted Psychological Assessment – standardized test administration is assured for testtakers, and variation is kept to a minimum
▪ Test content and length are tailored according to the taker's ability

Statistics
o Measurement – the act of assigning numbers or symbols to characteristics of things according to rules
Descriptive Statistics – methods used to provide a concise description of a collection of quantitative information
Inferential Statistics – methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population
o Magnitude – the property of "moreness"
o Equal Intervals – the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units
o Absolute 0 – when nothing of the property being measured exists
o Scale – a set of numbers whose properties model empirical properties of the objects to which the numbers are assigned
Continuous Scale – takes on any value within the range, and the possible values within that range are infinite
- used to measure a variable which can theoretically be divided
Discrete Scale – can be counted; has distinct, countable values
- used to measure a variable which cannot theoretically be divided
o Error – refers to the collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement
▪ Degree to which the test score/measurement may be wrong, considering other factors like the state of the testtaker, the venue, the test itself, etc.
▪ Measurement with a continuous scale always involves error

Four Levels of Scales of Measurement
Nominal – involves classification or categorization based on one or more distinguishing characteristics
- Labels and categorizes observations but does not make any quantitative distinctions between observations
- mode
Ordinal – rank ordering on some characteristic is also permissible
- median
Interval – contains equal intervals; has no absolute zero point (even negative values have an interpretation)
- A zero value does not mean it represents none
Ratio – has a true zero point (if the score is zero, it means none/null)
- Easiest to manipulate
o Distribution – defined as a set of test scores arrayed for recording or study
o Raw Scores – a straightforward, unmodified accounting of performance that is usually numerical
o Frequency Distribution – all scores are listed alongside the number of times each score occurred
o Independent Variable – being manipulated in the study
o Quasi-Independent Variable – nonmanipulated variable used to designate groups
▪ Factor: for ANOVA
Post-Hoc Tests – used in ANOVA to determine which mean differences are significantly different
Tukey's HSD test – allows computing a single value that determines the minimum difference between treatment means that is necessary for significance
o Measures of Central Tendency – statistics that indicate the average or midmost score between the extreme scores in a distribution
▪ Goal: identify the most typical or representative score of the entire group
Mean – the average of all the raw scores
- Equal to the sum of the observations divided by the number of observations
- Interval and ratio data (when normally distributed)
- Point of least squares
- Balance point for the distribution
Median – the middle score of the distribution
- Ordinal, Interval, Ratio
- Useful in cases where relatively few scores fall at the high end of the distribution or relatively few scores fall at the low end of the distribution
- In other words, for extreme scores, use the median (skewed)
- Identical for sample and population
- Also used when there is an unknown or undetermined score
- Used in "open-ended" categories (e.g., 5 or more, more than 8, at least 10)
- For ordinal data
Mode – the most frequently occurring score in the distribution
- Bimodal Distribution: if there are two scores that occur with the highest frequency
- Not commonly used
- Useful in analyses of a qualitative or verbal nature
- For nominal scales, discrete variables
- The value of the mode gives an indication of the shape of the distribution as well as a measure of central tendency
o Variability – an indication of how scores in a distribution are scattered or dispersed
o Measures of Variability – statistics that describe the amount of variation in a distribution
o Range – equal to the difference between the highest and the lowest score
▪ Provides a quick but gross description of the spread of scores
▪ When its value is based on extreme scores of the distribution, the resulting description of variation may be understated or overstated
o Quartiles – the dividing points between the four quarters in the distribution
▪ A specific point
▪ Quarter: refers to an interval
▪ Interquartile Range: a measure of variability equal to the difference between Q3 and Q1
▪ Semi-interquartile Range: equal to the interquartile range divided by 2
o Standard Deviation – equal to the square root of the average squared deviations about the mean
▪ Equal to the square root of the variance
▪ Variance: equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
▪ Distance from the mean
o Normal Curve – also known as the Gaussian Curve
o Bell-shaped, smooth, mathematically defined curve that is highest at its center
o Asymptotic = approaches but never touches the axis
o Tail – 2 to 3 standard deviations above and below the mean
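The central-tendency and variability statistics above can be computed directly with Python's standard library; a minimal sketch (the score list is invented for illustration):

```python
# Central tendency and variability, using only Python's standard library.
from statistics import mean, median, mode, pvariance, pstdev, quantiles

scores = [85, 90, 90, 70, 95, 80, 90, 75, 85, 100]  # invented sample

print(mean(scores))    # sum of observations / number of observations
print(median(scores))  # middle score of the distribution
print(mode(scores))    # most frequently occurring score

# Range: difference between the highest and the lowest score
print(max(scores) - min(scores))

# Quartiles -> interquartile and semi-interquartile range
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1
print(iqr, iqr / 2)

# Variance: mean of squared deviations about the mean;
# standard deviation: its square root (population formulas here)
print(pvariance(scores), pstdev(scores))
```

Note that `pvariance`/`pstdev` divide by N (population formulas); `variance`/`stdev` divide by N - 1 for samples.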
o Symmetrical Distribution – the right side of the graph is a mirror image of the left side
▪ Has only one mode, and it is in the center of the distribution
▪ Mean = Median = Mode
o Skewness – the nature and extent to which symmetry is absent
o Positively Skewed – relatively few scores fall at the high end of the distribution
▪ The exam is difficult
▪ More items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores
▪ Mean > Median > Mode
o Negatively Skewed – relatively few scores fall at the low end of the distribution
▪ The exam is easy
▪ More items of a higher level of difficulty would make it possible to better discriminate between scores at the upper end of the distribution
▪ Mean < Median < Mode
o "Skewed" is associated with "abnormal", perhaps because a skewed distribution deviates from the symmetrical, so-called normal distribution
o Kurtosis – the steepness of a distribution in its center
Platykurtic – relatively flat
Leptokurtic – relatively peaked
Mesokurtic – somewhere in the middle
▪ High kurtosis = high peak and fatter tails
▪ Low kurtosis = rounded peak and thinner tails
o Standard Score – a raw score that has been converted from one scale to another scale
o Z-Scores – result from the conversion of a raw score into a number indicating how many SD units the raw score is below or above the mean of the distribution
▪ Identify and describe the exact location of each score in a distribution
▪ Standardize an entire distribution
▪ Zero plus or minus one scale
▪ Can have negative values
▪ Requires that we know the value of the variance to compute the standard error
o T-Scores – a scale with a mean set at 50 and a standard deviation set at 10
▪ Fifty plus or minus ten scale
▪ A raw score 5 standard deviations below the mean would be equal to a T-score of 0
▪ A raw score that falls at the mean has a T of 50
▪ A raw score 5 standard deviations above the mean would be equal to a T of 100
▪ No negative values
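The z-score and T-score conversions above amount to two small formulas; a minimal Python sketch, with an invented raw-score list:

```python
# Standard-score conversions: a z-score locates a raw score in SD units;
# a T-score is the same location rescaled to mean 50, SD 10.
from statistics import mean, pstdev

raw = [12, 15, 18, 10, 20, 15]  # invented raw scores
m, sd = mean(raw), pstdev(raw)

def z_score(x):
    # how many SD units x lies above (+) or below (-) the mean
    return (x - m) / sd

def t_score(x):
    # T = 50 + 10z: the mean maps to 50, and a score 5 SDs below
    # the mean maps to 0, so T-scores have no negative values in practice
    return 50 + 10 * z_score(x)

print(z_score(m))  # 0.0 -> a raw score exactly at the mean
print(t_score(m))  # 50.0
```

The same pattern gives the other scales in the table below (e.g., IQ = 100 + 15z, stanine ≈ 5 + 2z).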
o Stanine – a method of scaling test scores on a nine-point standard scale with a mean of five (5) and a standard deviation of two (2)
o Linear Transformation – one that retains a direct numerical relationship to the original raw score
o Nonlinear Transformation – required when the data under consideration are not normally distributed
o Normalizing a distribution involves stretching the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores, a scale technically referred to as a Normalized Standard Score Scale
o It is generally preferable to fine-tune the test according to difficulty or other relevant variables so that the resulting distribution approximates the normal curve
o STEN – standard to ten; divides a scale into 10 units

Scale        Mean   SD
Z-Score      0      1
T-Score      50     10
Stanine      5      2
STEN         5.5    2
IQ           100    15
GRE or SAT   500    100

Hypothesis Testing – a statistical method that uses sample data to evaluate a hypothesis about a population
Alternative Hypothesis – states that there is a change, difference, or relationship
Null Hypothesis – no change, no difference, or no relationship
o Alpha Level or Level of Significance – used to define the concept of "very unlikely" in a hypothesis test
o Critical Region – composed of extreme values that are very unlikely to be obtained if the null hypothesis is true
o If sample data fall in the critical region, the null hypothesis is rejected
o The alpha level for a hypothesis test is the probability that the test will lead to a Type I error
o Directional Hypothesis Test or One-Tailed Test – the statistical hypotheses specify either an increase or a decrease in the population mean
o T-Test – used to test hypotheses about an unknown population mean and variance
▪ Used when the population variance is unknown
▪ Can be used in "before and after" types of research
▪ The sample must consist of independent observations; that is, there is no consistent, predictable relationship between the first observation and the second
▪ The population that is sampled must be normal
▪ If the distribution is not normal, use a large sample
o Correlation Coefficient – a number that provides an index of the strength of the relationship between two things
o Correlation – an expression of the degree and direction of correspondence between two things
▪ + and - = direction
▪ A number anywhere from -1 to +1 = magnitude
▪ Positive – same direction: either both go up or both go down
▪ Negative – inverse direction: the DV goes up while the IV goes down, or the IV goes up while the DV goes down
▪ 0 = no correlation
o Pearson r/Pearson Correlation Coefficient/Pearson Product-Moment Coefficient of Correlation – used when the two variables being correlated are continuous and linear
▪ Devised by Karl Pearson
▪ Coefficient of Determination – an indication of how much variance is shared by the X- and Y-variables
o Spearman Rho/Rank-Order Correlation Coefficient/Rank-Difference Correlation Coefficient – frequently used when the sample size is small and when both sets of measurements are ordinal
▪ Developed by Charles Spearman
o Outlier – an extremely atypical point located at a relatively long distance from the rest of the coordinate points in a scatterplot
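Pearson's r as defined above can be hand-rolled in a few lines (Spearman's rho is the same computation applied to the ranks of the scores); the two score lists are invented:

```python
# Pearson product-moment correlation from its definition:
# covariance divided by the product of the two standard deviations.
from statistics import mean

x = [2, 4, 6, 8, 10]   # invented scores on variable X
y = [1, 3, 5, 7, 9]    # invented scores on variable Y

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5

r = pearson_r(x, y)
print(r)       # 1.0 -> perfect positive (same-direction) relationship
print(r ** 2)  # coefficient of determination: proportion of shared variance
print(pearson_r(x, y[::-1]))  # -1.0 -> perfect inverse relationship
```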
o Regression Analysis – used for prediction
▪ Predicts the values of a dependent or response variable based on the values of at least one independent or explanatory variable
▪ Residual: the difference between an observed value of the response variable and the value predicted from the regression line
▪ The Principle of Least Squares
▪ Standard Error of Estimate: the standard deviation of the residuals in regression analysis
▪ Slope: determines how much the Y variable changes when X is increased by 1 point
o T-Test (Independent) – comparison or determination of differences
▪ 2 different groups/independent samples + interval/ratio scales (continuous variables)
Equal Variance – the 2 groups are equal
Unequal Variance – the groups are unequal
o T-Test (Dependent)/Paired Test – one group, nominal (either matched or repeated measures) + 2 treatments
o One-Way ANOVA – 1 IV with 3 or more levels/groups, 1 DV; comparison of differences
o Two-Way ANOVA – 2 IV, 1 DV
o Critical Value – reject the null and accept the alternative if [ obtained value > critical value ]
o P-Value (Probability Value) – reject the null and accept the alternative if [ p-value < alpha level ]
o Norms – refer to the performances by defined groups on a particular test
o Age-Related Norms – certain tests have different normative groups for particular age groups
o Tracking – the tendency to stay at about the same level relative to one's peers
Norm-Referenced Tests – compare each person with the norm
Criterion-Referenced Tests – describe the specific types of skills, tasks, or knowledge that the test taker can demonstrate

Selection of Assessment Methods and Tools and Uses, Benefits, and Limitations of Assessment Tools and Instruments (32)
Identify appropriate assessment methods, tools (2)
1. Test – a measuring device or procedure
- Psychological Test: a device or procedure designed to measure variables related to psychology
Ability or Maximal Performance Test – assesses what a person can do
a. Achievement Test – measurement of previous learning
b. Aptitude – refers to the potential for learning or acquiring a specific skill
c. Intelligence – refers to a person's general potential to solve problems, adapt to changing environments, think abstractly, and profit from experience
Human Ability – considerable overlap among achievement, aptitude, and intelligence tests
Typical Performance Test – measures usual or habitual thoughts, feelings, and behavior
Personality Test – measures individual dispositions and preferences
a. Structured Personality Tests – provide statements, usually self-report, and require the subject to choose between two or more alternative responses
b. Projective Personality Tests – unstructured; the stimulus or response is ambiguous
c. Attitude Tests – elicit personal beliefs and opinions
d. Interest Inventories – measure likes and dislikes as well as one's personality orientation toward the world of work
- Purpose: evaluation, drawing conclusions about some aspect of a person's behavior, therapy, decision-making
- Settings: Industrial, Clinical, Educational, Counseling, Business, Courts, Research
- Population: Test Developers, Test Publishers, Test Reviewers, Test Users, Test Sponsors, Test Takers, Society
Levels of Tests
1. Level A – anyone, under the direction of a supervisor or consultant
2. Level B – psychometricians and psychologists only
3. Level C – psychologists only
2. Interview – a method of gathering information through direct communication involving reciprocal exchange
- can be structured, unstructured, semi-structured, or non-directive
- Mental Status Examination: determines the mental status of the patient
- Intake Interview: determines why the client came for assessment; a chance to inform the client about the policies, fees, and process involved
- Social Case: a biographical sketch of the client
- Employment Interview: determines whether the candidate is suitable for hiring
- Panel Interview (Board Interview): more than one interviewer participates in the assessment
- Motivational Interview: used by counselors and clinicians to gather information about some problematic behavior, while simultaneously attempting to address it therapeutically
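The least-squares quantities introduced under Regression Analysis above (slope, residuals, standard error of estimate) can be sketched in a few lines of Python; the data points are invented:

```python
# Simple least-squares regression of Y on X: the fitted line minimizes
# the sum of squared residuals (the Principle of Least Squares).
from statistics import mean

x = [1, 2, 3, 4, 5]   # invented predictor values
y = [2, 4, 5, 4, 5]   # invented response values

mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

predicted = [intercept + slope * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]  # observed minus predicted

# Standard error of estimate: spread of the residuals around the line
# (computed here with n - 2 degrees of freedom, the usual convention)
n = len(x)
see = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5

print(slope)      # how much Y changes when X increases by 1 point
print(intercept)
print(round(see, 3))
```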
3. Portfolio – samples of one's ability and accomplishment
- Purpose: usually in industrial settings, for evaluation of future performance
4. Case History Data – refers to records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee
5. Behavioral Observation – monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
- Naturalistic Observation: observe humans in a natural setting
6. Role Play – defined as acting an improvised or partially improvised part in a simulated situation
- Role Play Test: assessees are directed to act as if they are in a particular situation
- Purpose: Assessment and Evaluation
- Settings: Industrial, Clinical
- Population: Job Applicants, Children
7. Computers – using technology to assess a client; computers can serve as test administrators and very efficient test scorers
8. Others: videos, biofeedback devices

Intelligence Tests
Stanford-Binet Intelligence Scale 5th Ed. (SB-5)
[C]
- 2-85 years old
- individually administered
- norm-referenced
- Scales: Verbal, Nonverbal, and Full Scale (FSIQ)
- Nonverbal and Verbal Cognitive Factors: Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, Working Memory
- age-scale and point-scale format
- originally created to identify mentally disabled children in Paris
- the 1908 Scale introduced the Age Scale format and Mental Age
- the 1916 scale significantly applied the IQ concept
- Standard Scores: 100 (mean), 15 (SD)
- Scaled Scores: 10 (mean), 3 (SD)
- co-normed with the Bender-Gestalt and Woodcock-Johnson Tests
- based on the Cattell-Horn-Carroll Model of General Intellectual Ability
- no accommodations for PWDs
- 2 routing tests
- w/ teaching items, floor level, and ceiling level
- provides behavioral observations during administration

Wechsler Intelligence Scales (WAIS-IV, WPPSI-IV, WISC-V)
[C]
- WAIS (16-90 years old), WPPSI (2-6 years old), WISC (6-16 years old)
- individually administered
- norm-referenced
- Standard Scores: 100 (mean), 15 (SD)
- Scaled Scores: 10 (mean), 3 (SD)
- addresses weaknesses in the Stanford-Binet
- could also assess functioning in people with brain injury
- evaluates patterns of brain dysfunction
- yields FSIQ, Index Scores (Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed), and subtest-level scaled scores

Raven's Progressive Matrices (RPM)
[B]
- 4-90 years old
- nonverbal test
- used to measure general intelligence and abstract reasoning
- multiple-choice abstract reasoning
- group test
- IRT-based

Culture Fair Intelligence Test (CFIT)
[B]
- a nonverbal instrument to measure analytical and reasoning ability in abstract and novel situations
- measures individual intelligence in a manner designed to reduce, as much as possible, the influence of culture
- individual or by group
- aids in the identification of learning problems and helps in making more reliable and informed decisions in relation to the special education needs of children

Purdue Non-Language Test
[B]
- designed to measure mental ability, since it consists entirely of geometric forms
- culture-fair
- self-administering

Panukat ng Katalinuhang Pilipino
- a basis for screening, classifying, and identifying needs that will enhance the learning process
- in business, it is utilized as a predictor of occupational achievement by gauging an applicant's ability and fitness for a particular job
- essential for determining one's capacity to handle the challenges associated with certain degree programs
- Subtests: Vocabulary, Analogy, Numerical Ability, Nonverbal Ability

Wonderlic Personnel Test (WPT)
- assesses the cognitive ability and problem-solving aptitude of prospective employees
- multiple choice, answered in 12 minutes

Armed Services Vocational Aptitude Battery
- most widely used aptitude test in the US
- a multiple-aptitude battery that measures developed abilities and helps predict future academic and occupational success in the military

Kaufman Assessment Battery for Children-II (KABC-II)
- Alan & Nadeen Kaufman
- for assessing cognitive development in children
- 3 to 18 years old

Personality Tests
Minnesota Multiphasic Personality Inventory (MMPI-2)
[C]
- a multiphasic personality inventory intended for use with both clinical and normal populations to identify sources of maladjustment and personal strengths
- Starke Hathaway and J. Charnley McKinley
- helps in diagnosing mental health disorders, distinguishing normal from abnormal
- should be administered to someone with no guilt feelings for committing a crime
- individual or by group
- Clinical Scales: Hypochondriasis, Depression, Hysteria, Psychopathic Deviate, Masculinity/Femininity, Paranoia, Psychasthenia (Anxiety, Depression, OCD), Schizophrenia, Hypomania, Social Introversion
- Lie Scale (L Scale): items that are somewhat negative but apply to most people; assesses the likelihood of the test taker approaching the instrument with a defensive mindset
- High L scale = faking good
- High F scale = faking bad, severe distress, or psychopathology
- Superlative Self-Presentation Scale (S Scale): a measure of defensiveness; checks whether you intentionally distort answers to look better
- Correction Scale (K Scale): a reflection of the frankness of the testtaker's self-report
- K Scale = reveals a person's defensiveness around certain questions and traits; also faking good
- The K scale is sometimes used to correct scores on five clinical scales; the scores are statistically corrected for an individual's overwillingness or unwillingness to admit deviance
- "Cannot Say" (CNS) Scale: measures how often a person doesn't answer a test item
- High ? Scale: the client might have difficulties with reading, psychomotor retardation, or extreme defensiveness
- True Response Inconsistency (TRIN): five true, then five false answers
- Varied Response Inconsistency (VRIN): random true or false
- Infrequency-Psychopathology Scale (Fp): reveals intentional or unintentional over-reporting
- FBS Scale: a "symptom validity scale" designed to detect intentional over-reporting of symptoms
- Back Page Infrequency (Fb): reflects a significant change in the testtaker's approach to the latter part of the test

Myers-Briggs Type Indicator (MBTI)
- Katharine Cook Briggs and Isabel Briggs Myers
- a self-report inventory designed to identify a person's personality type, strengths, and preferences
- Extraversion-Introversion Scale: where you prefer to focus your attention and energy, the outer world and external events or your inner world of ideas and experiences
- Sensing-Intuition Scale: how you take in information, whether you take it in as it is or focus on interpreting and adding meaning to the information
- Thinking-Feeling Scale: how you make decisions, logically or following what your heart says
- Judging-Perceiving Scale: how you orient to the outer world; what is your style in dealing with the outer world, getting things decided or staying open to new info and options?

Edwards Personal Preference Schedule (EPPS)
[B]
- designed primarily as an instrument for research and counselling purposes, to provide quick and convenient measures of a number of relatively normal personality variables
- based on Murray's Need Theory
- an objective, forced-choice inventory for assessing the relative importance that an individual places on 15 personality variables
- useful in personal counselling and with non-clinical adults
- individual
Guilford-Zimmerman Temperament Survey (GZTS)
- items are stated affirmatively rather than in question form, using the 2nd person pronoun
- measures 10 personality traits: General Activity, Restraint, Ascendance, Sociability, Emotional Stability, Objectivity, Friendliness, Thoughtfulness, Personal Relations, Masculinity

NEO Personality Inventory (NEO-PI-R)
- a standard questionnaire measure of the Five Factor Model; provides systematic assessment of emotional, interpersonal, experiential, attitudinal, and motivational styles
- gold standard for personality assessment
- self-administered
- Neuroticism: identifies individuals who are prone to psychological distress
- Extraversion: quantity and intensity of energy directed outward
- Openness to Experience: active seeking and appreciation of experiences for their own sake
- Agreeableness: the kind of interactions an individual prefers, from compassion to tough-mindedness
- Conscientiousness: degree of organization, persistence, control, and motivation in goal-directed behavior

Panukat ng Ugali at Pagkatao/Panukat ng Pagkataong Pilipino
- indigenous personality test
- taps specific values, traits, and behavioral dimensions related or meaningful to the study of Filipinos

Sixteen Personality Factor Questionnaire
- Raymond Cattell
- constructed through factor analysis
- evaluates personality on two levels of traits
- Primary Scales: Warmth, Reasoning, Emotional Stability, Dominance, Liveliness, Rule-Consciousness, Social Boldness, Sensitivity, Vigilance, Abstractedness, Privateness, Apprehension, Openness to Change, Self-Reliance, Perfectionism, Tension
- Global Scales: Extraversion, Anxiety, Tough-Mindedness, Independence, Self-Control

Big Five Inventory-II (BFI-2)
- Soto & John
- assesses the Big 5 domains and 15 facets
- available for noncommercial purposes to researchers and students

Projective Tests
Rorschach Inkblot Test
[C]
- Hermann Rorschach
- 5 years and older
- subjects look at 10 ambiguous inkblot images and describe what they see in each one
- once used to diagnose mental illnesses like schizophrenia
- Exner System: the coding system used in this test
- Content: the name or class of objects used in the patient's responses
Content:
1. Nature
2. Animal Feature
3. Whole Human
4. Human Feature
5. Fictional/Mythical Human Detail
6. Sex
Determinants:
1. Form
2. Movement
3. Color
4. Shading
5. Pairs and Reflections
Location:
1. W – the whole inkblot was used to depict an image
2. D – a commonly described part of the blot was used
3. Dd – an uncommonly described or unusual detail was used
4. S – the white space in the background was used

Thematic Apperception Test
[C]
- Christiana Morgan and Henry Murray
- 5 and above
- 31 picture cards serve as stimuli for stories and descriptions about relationships or social situations
- popularly known as the picture interpretation technique because it uses a standard series of provocative yet ambiguous pictures about which the subject is asked to tell a story
- also modified for African American testtakers

Children's Apperception Test
- Bellak & Bellak
- 3-10 years old
- based on the idea that animals engaged in various activities were useful in stimulating projective storytelling by children

Hand Test
- Edward Wagner
- 5 years old and above
- used to measure action tendencies, particularly acting out and aggressive behavior, in adults and children
- 10 cards (1 blank)

Apperceptive Personality Test (APT)
- Holmstrom et al.
- an attempt to address the criticisms of the TAT
- introduced objectivity into the scoring system
- 8 cards include males and females of different ages and minority group members
- testtakers respond to a series of multiple-choice questions after storytelling

Word Association Test (WAT)
- Rapaport et al.
- presentation of a list of stimulus words; the assessee responds verbally or in writing with the first thing that comes to mind

Rotter Incomplete Sentences Blank (RISB)
- Julian Rotter & Janet Rafferty
- Grade 9 to adulthood
- most popular SCT

Sacks Sentence Completion Test (SSCT)
- Joseph Sacks and Sidney Levy
- 12 years old and older
- asks respondents to complete 60 items with the first thing that comes to mind across four areas: Family, Sex, Interpersonal Relationships, and Self-Concept

Bender-Gestalt Visual Motor Test
[C]
- Lauretta Bender
- 4 years and older
- consists of a series of durable template cards, each displaying a unique figure; examinees are asked to draw each figure as they observe it
- provides interpretative information about an individual's development and neuropsychological functioning
- reveals the maturation level of visuomotor perception, which is associated with language ability and various functions of intelligence

House-Tree-Person Test (HTP)
- John Buck and Emmanuel Hammer
- 3 years and up
- measures aspects of a person's personality through interpretation of drawings and responses to questions
- can also be used to assess brain damage and general mental functioning
- measures the person's psychological and emotional functioning
- The house reflects the person's experience of their immediate social world
- The tree is a more direct expression of the person's emotional and psychological sense of self
- The person is a more direct reflection of the person's sense of self

Draw-A-Person Test (DAP)
- Florence Goodenough
- 4 to 10 years old
- a projective drawing task that is often utilized in psychological assessments of children
- aspects such as the size of the head, placement of the arms, and even things such as whether teeth were drawn or not are thought to reveal a range of personality traits
- helps people who have anxiety taking tests (no strict format)
- can assess people with communication problems
- relatively culture-free
- allows for self-administration

Kinetic Family Drawing
- Burns & Kaufman
- derived from Hulse's FDT; the family is drawn "doing something"

Clinical & Counseling Tests
Millon Clinical Multiaxial Inventory-IV (MCMI-IV)
- Theodore Millon
- 18 years old and above
- for diagnosing and treating personality disorders
- exaggeration of polarities results in maladaptive behavior
- Pleasure-Pain: the fundamental evolutionary task
- Active-Passive: one adapts to the environment or adapts the environment to one's self
- Self-Others: invest in others versus invest in oneself

Beck Depression Inventory (BDI-II)
- Aaron Beck
- 13 to 80 years old
- a 21-item self-report that taps Major Depressive symptoms according to the criteria in the DSM

MacAndrew Alcoholism Scale (MAC & MAC-R)
- from the MMPI-2
- personality and attitude variables thought to underlie alcoholism
California Psychological Inventory (CPI-III)
- attempts to evaluate personality in normally adjusted individuals
- has validity scales that determine faking bad and faking good
- measures interpersonal style and orientation, normative orientation and values, cognitive and intellectual function, and role and personal style
- has special-purpose scales, such as managerial potential, work orientation, creative temperament, leadership potential, amicability, law enforcement orientation, tough-mindedness

Rosenberg Self-Esteem Scale
- measures global feelings of self-worth
- 10-item, 4-point Likert scale
- used with adolescents

Dispositional Resilience Scale (DRS)
- measures psychological hardiness, defined as the ability to view stressful situations as meaningful, changeable, and challenging

Ego Resiliency Scale-Revised
- measures ego resiliency or emotional intelligence

HOPE Scale
- developed by Snyder
- Agency: cognitive model with goal-driven energy
- Pathway: capacity to construct systems to meet goals
- a good measure of hope for traumatized people
- positively correlated with healthy psychological adjustment, high achievement, good problem-solving skills, and positive health-related outcomes

Satisfaction with Life Scale (SWLS)
- overall assessment of life satisfaction as a cognitive-judgmental process

Positive and Negative Affect Schedule (PANAS)
- measures the level of positive and negative emotions a test taker has during the test administration

Strengths and weaknesses of assessment tools (2)

Test
Pros:
- can gather a sample of behavior objectively with lesser bias
- flexible, can be verbal or nonverbal
Cons:
- in crisis situations, when relatively rapid decisions need to be made, it can be impractical to take the time required to administer and interpret tests

Interview
Pros:
- can take note of verbal and nonverbal cues
- flexible
- time- and cost-effective
- both structured and unstructured interviews allow clinicians to place information in a wider, more meaningful context
- can also be used to help predict future behaviors
- allows clinicians to establish rapport and encourage client self-exploration
Cons:
- sometimes, due to negligence of the interviewer and interviewee, it can miss out on important information
- interviewer's effect on the interviewee
- various errors such as the halo effect, primacy effect, etc.
- interrater reliability
- interviewer bias

Portfolio
Pros:
- provides a comprehensive illustration of the client which highlights strengths and weaknesses
Cons:
- can be very demanding
- time-consuming

Observation
Pros:
- flexible
- suitable for subjects that cannot be studied in a lab setting
- more realistic
- affordable
- can detect patterns
Cons:
- for private practitioners, it is typically not practical or economically feasible to spend hours out of the consulting room observing clients as they go about their daily lives
- lack of scientific control, ethical considerations, and potential for bias from observers and subjects
- unable to draw cause-and-effect conclusions
- lack of control
- lack of validity
- observer bias

Case History
Pros:
- can fully show the experience of the observer in the program
- sheds light on an individual's past and current adjustment as well as on the events and circumstances that may have contributed to any changes in adjustment
Cons:
- cannot be used to generalize a phenomenon

Role Play
Pros:
- encourages individuals to come together to find solutions and to get to know how their colleagues think
- the group can discuss ways to potentially resolve the situation, and participants leave with as much information as possible, resulting in more efficient handling of similar real-life scenarios
Cons:
- may not be as useful as the real thing in all situations
- time-consuming
- expensive
- inconvenient to assess in a real situation
- while some employees will be comfortable role playing, others are less adept at getting into the required mood needed to actually replicate a situation
Test Administration, Scoring, Interpretation and Usage (20)
Detect errors and their impacts in testing

Issues in Intelligence Testing
1. Flynn Effect – progressive rise in intelligence scores that is expected to occur on a normed intelligence test from the date when the test was first normed
▪ gradual increase in general intelligence among newborns
▪ Frog Pond Effect: theory that individuals evaluate themselves as worse when in a group of high-performing individuals
2. Culture Bias of Testing
▪ Culture-Free: attempt to eliminate culture so nature can be isolated
▪ impossible to develop, because culture is evident in its influence from the birth of an individual, and the interaction between nature and nurture is cumulative and not relative
▪ Culture-Fair: minimize the influence of culture with regard to various aspects of the evaluation procedures
▪ fair to all, fair to some cultures, fair only to one culture
▪ Culture Loading: the extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, etc. of a particular culture

Errors: Reliability
o Classical Test Theory (True Score Theory) – a score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also error
▪ Error: refers to the component of the observed test score that does not have to do with the testtaker’s ability
▪ errors of measurement are random
▪ the greater the number of items, the higher the reliability
▪ factors that contribute to inconsistency: characteristics of the individual, test, or situation, which have nothing to do with the attribute being measured but still affect the scores
o Error Variance – variance from irrelevant random sources
- Measurement Error – all of the factors associated with the process of measuring some variable, other than the variable being measured
- the difference between the observed score and the true score
- positive error can increase one’s score; negative error can decrease it
- Sources of Error Variance:
a. Item Sampling/Content Sampling
b. Test Administration
c. Test Scoring and Interpretation
- Random Error – source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)
- Systematic Error – source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
- has a consistent effect on the true score: the SD does not change, but the mean does
▪ error variance may increase or decrease a test score by varying amounts; the consistency of the test score, and thus the reliability, can be affected

Test-Retest Reliability
Error: Time Sampling
- the longer the time that passes, the greater the likelihood that the reliability coefficient will be insignificant
- Carryover Effects: happen when the test-retest interval is short, wherein the second test is influenced by the first because testtakers remember or have practiced the previous test = inflated correlation/overestimation of reliability
- Practice Effect: scores on the second session are
higher due to the testtaker’s experience of the first session of testing
- test-retest with a longer interval might be affected by other extreme factors, thus resulting in a low correlation
- target time for the next administration: at least two weeks

Parallel Forms/Alternate Forms Reliability
Error: Item Sampling (immediate), Item Sampling changes over time (delayed)
- Counterbalancing: technique to avoid carryover effects for parallel forms by using a different sequence for each group
- most rigorous and burdensome, since test developers create two forms of the test
- main problem: difference between the two tests
- test scores may be affected by motivation, fatigue, or intervening events
- create a large set of questions that address the same construct, then randomly divide the questions into two sets

Internal Consistency (Inter-Item Reliability)
Error: Item Sampling Homogeneity

Split-Half Reliability
Error: Item Sampling: Nature of the Split

Inter-Scorer Reliability
Error: Scorer Differences

o Standard Error of Measurement – provides a measure of the precision of an observed test score
▪ standard deviation of errors as the basic measure of error
▪ index of the amount of inconsistency or the amount of expected error in an individual’s score
▪ allows us to quantify the extent to which a test provides accurate scores
▪ provides an estimate of the amount of error inherent in an observed score or measurement
▪ the higher the reliability, the lower the SEM
▪ used to estimate or infer the extent to which an observed score deviates from a true score
▪ also called the Standard Error of a Score
▪ Confidence Interval: a range or band of test scores that is likely to contain the true score
o Standard Error of the Difference – can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant
o Standard Error of Estimate – refers to the standard error of the difference between the predicted and observed values
o Four Possible Hit and Miss Outcomes
1. True Positive (Sensitivity) – predicted success that does occur
2. True Negative (Specificity) – predicted failure that does occur
3. False Positive (Type I) – predicted success that does not occur
4. False Negative (Type II) – predicted failure, but the person succeeds

Errors due to Behavioral Assessment
1. Reactivity – when evaluated, the behavior increases
- Hawthorne Effect
2. Drift – moving away from what one has learned toward idiosyncratic definitions of behavior
- subjects should be retrained at a point in time
- Contrast Effect: cognitive bias that distorts our perception of something when we compare it to something else, by enhancing the differences between them
3. Expectancies – tendency for results to be influenced by what test administrators expect to find
- Rosenthal/Pygmalion Effect: the test administrator’s expected results influence the result of the test
- Golem Effect: negative expectations decrease one’s performance
4. Rating Errors – intentional or unintentional misuse of the scale
- Leniency Error: rater is lenient in scoring (Generosity Error)
- Severity Error: rater is strict in scoring
- Central Tendency Error: rater’s ratings tend to cluster in the middle of the rating scale
- Halo Effect: tendency to give a high score due to failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior
- snap judgment on the basis of a positive trait
- Horn Effect: opposite of the Halo Effect
- one way to overcome rating errors is to use rankings
5. Fundamental Attribution Error – tendency to explain someone’s behavior based on internal factors such as personality or disposition, and to underestimate the influence that external factors have on another person’s behavior, instead of blaming it on the situation
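The standard error of measurement and standard error of the difference discussed earlier follow short formulas that the reviewer states only in words: SEM = SD·√(1 − r), the confidence interval is the observed score ± z·SEM, and the standard error of the difference between two scores is √(SEM₁² + SEM₂²). A minimal Python sketch of those textbook formulas; the IQ-style numbers (SD = 15, r = .91) are made up for illustration:

```python
import math

def sem(sd, reliability):
    # Standard error of measurement: SD * sqrt(1 - r).
    # Higher reliability -> lower SEM.
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    # Band of scores likely to contain the true score
    # (95% by default: observed score +/- z * SEM).
    e = sem(sd, reliability)
    return (score - z * e, score + z * e)

def sed(sd, r1, r2):
    # Standard error of the difference between two scores
    # from tests sharing the same SD: sqrt(SEM1^2 + SEM2^2).
    return math.sqrt(sem(sd, r1) ** 2 + sem(sd, r2) ** 2)

# Example: an IQ-style scale with SD = 15 and reliability .91
print(round(sem(15, 0.91), 2))   # 4.5
print(tuple(round(x, 1) for x in confidence_interval(100, 15, 0.91)))
```

So for an observed score of 100 on this hypothetical scale, the 95% confidence interval runs from roughly 91 to 109, which is why single scores are better reported as bands than as exact points.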
- Barnum Effect: people tend to accept vague personality descriptions as accurate descriptions of themselves (Aunt Fanny Effect)

o Bias – a factor inherent in a test that systematically prevents accurate, impartial measurement
▪ prejudice, preferential treatment
▪ prevented during test development through a procedure called Estimated True Score Transformation

Ethical Principles and Standards of Practice (19)
o If mistakes were made, psychologists should do something to correct or minimize the mistakes
o If an ethical violation by another psychologist is witnessed, they should resolve the issue through informal resolution, as long as it does not violate any confidentiality rights that may be involved
o If informal resolution is not enough or not appropriate, referral to state or national committees on professional ethics, state licensing boards, or the appropriate institutional authorities can be done. Still, the confidentiality rights of the professional in question must be kept.
o Failure to cooperate in an ethics investigation is itself an ethics violation, unless they request a deferment of adjudication of the ethics complaint
o Psychologists must file complaints responsibly by checking the facts of the allegations
o Psychologists DO NOT deny persons employment, advancement, admissions, tenure, or promotion based solely upon their having made, or their being the subject of, an ethics complaint
▪ being questioned by the ethics committee or being involved in an ongoing ethics investigation is not grounds for being discriminated against or denied advancement
▪ unless the outcome of the proceedings is already considered
o Psychologists should provide services within the boundaries of their competence, which is based on the amount of training, education, experience, or consultation they have had
o When they are tasked to provide services to clients deprived of mental health services (e.g., communities far from urban cities) but have not yet obtained the needed competence for the job, they could still provide services AS LONG AS they make a reasonable effort to obtain the required competence, to ensure that services are not denied to those communities
o During emergencies, psychologists provide services to individuals even though they have yet to complete the needed competency/training, just to ensure that services are not denied. However, the services are discontinued once the appropriate services are available
o Psychologists should discuss the limits of confidentiality, and the uses of the information that would be generated from the services, with the persons and organizations with whom they establish a scientific or professional relationship
o Before recording voices or images, they must first obtain permission from all persons involved or their legal representatives
o Only discuss confidential information with persons clearly concerned/involved with the matters
o Disclosure is allowed with appropriate consent
▪ disclosure without consent is not allowed UNLESS mandated by law
o No disclosure of confidential information that could lead to the identification of a client, unless they have obtained prior consent or the disclosure cannot be avoided
▪ only disclose necessary information
o Exemptions to disclosure:
✓ the client is disguised/identity is protected
✓ has consent
✓ legally mandated
o Psychologists can create public statements as long as they are responsible for them
▪ they cannot compensate employees of the media in return for publicity in a news item
▪ paid advertisements must be clearly recognizable
▪ when commenting publicly via the internet, media, etc., they must ensure that their statements are based on their professional knowledge, in accord with appropriate psychological literature and practice, consistent with ethics, and do not indicate that a professional relationship has been established with the recipient
o Must provide accurate information and obtain approval prior to conducting research
o Informed consent is required, which includes:
✓ purpose of the research
✓ duration and procedures
✓ right to decline and withdraw
✓ consequences of declining or withdrawing
✓ potential risks, discomfort, or adverse effects
✓ benefits
✓ limits of confidentiality
✓ incentives for participation
✓ researcher’s contact information
o Permission for recording images or voices is needed unless the research consists solely of naturalistic
observations in public places, or the research design includes deception
▪ consent must then be obtained during debriefing
o Dispensing with or omitting informed consent is allowed only when:
1. the research would not create distress or harm
▪ study of normal educational practices conducted in educational settings
▪ anonymous questionnaires, naturalistic observation, archival research
▪ confidentiality is protected
2. permitted by law
o Avoid offering excessive incentives for research participation that could coerce participation
o Do NOT conduct a study that involves deception unless the use of deceptive techniques in the study has been justified
▪ the deception must be discussed with participants as early as possible, and no later than the conclusion of data collection
o They must give participants the opportunity to learn about the nature, results, and conclusions of the research, and make sure that there are no misconceptions about the research
o Must ensure the safety of animal subjects and minimize their discomfort, infection, illness, and pain
▪ if discomfort is necessary, procedures must be justified and kept as minimal as possible
▪ during termination, they must do it rapidly and minimize the pain
o Must not present portions of another’s work or data as their own
▪ must take responsibility and credit, including authorship credit, only for work they have actually performed or to which they have substantially contributed
▪ faculty advisors discuss publication credit with students as early as possible
o After publishing, they should not withhold data from other competent professionals who intend to reanalyze the data
▪ shared data must be used only for the declared purpose

o RA 9258 – Guidance and Counseling Act of 2004
o RA 9262 – Violence Against Women and Children
o RA 7610 – Child Abuse
o RA 9165 – Comprehensive Dangerous Drugs Act of 2002
o RA 11469 – Bayanihan to Heal as One Act
o RA 7277 – Magna Carta for Disabled Persons
o RA 11210 – Expanded Maternity Leave Law
o RA 11650 – Inclusive Education Law
o RA 10173 – Data Privacy Act
o House Bill 4982 – SOGIE Bill
o Art. 12 of the Revised Penal Code – Insanity Plea

end

congratulations on reaching the end of this reviewer!! i hope u learned something!! :D

one day, we will be remembered.

- aly <3