Psychological Testing and Psychological Assessment
Differentiate Psychological Testing and Assessment
1905 Alfred Binet and Theodore Simon = developers of the world’s first IQ test
Binet-Simon Test = measured the cognitive ability of schoolchildren in Paris,
to help place them in appropriate classes
Testing = the term used to describe the whole process, from administering
the test to interpreting the test
1914 World War I
1917 Large numbers of military recruits needed to be screened for intellectual and emotional problems
OSS Model (Office of Strategic Services) = the forerunner of today’s CIA (Central
Intelligence Agency).
Psychological Assessment
Psychological Testing
Psychological Soundness
Parapsychology = the study of alleged psychic phenomena
Psychometrist or Psychometrician = refers to a professional who uses, analyzes, and
interprets psychological test data
Utility = refers to the usefulness/practical value that a test or other tool of assessment
has for a particular purpose
Test Developers and publishers = create tests or other methods of assessment
Test Users = professionals who use psychological tests and assessments
● clinicians
● experimental psychologists
● counselors
● social psychologists
● school psychologists
● human resources personnel
● consumer psychologists
Methods of Psychological Assessment
● Retrospective Assessment
● Remote Assessment
● Ecological Momentary Assessment
● Collaborative Assessment
● Therapeutic Assessment
● Dynamic Assessment
The Tools of Psychological Assessment
● Test
● Format
● Score
● Scoring
● Cut Score
Motivational Interview
Case History data
Behavioral observation
Naturalistic observation
Role play
Test development
= an umbrella term for everything that goes into the process of creating a psychometrically sound (valid and reliable) test.
The stages are as follows:
1. Test Conceptualization
2. Test Construction
3. Test Try-out
4. Test Analysis
5. Test Revision
Test Conceptualization
= the test starts from the developer’s decision
to create a certain test that measures a certain construct in a certain way
Basis: review of literature of another existing test with psychometric
: existing social phenomena/ behavior pattern that needs to be
: needs of the test to assess mastery in emerging occupation or
Questions :
1. What is the test designed to measure? (purpose/rationale/construct)
2. What is the objective of the test? What real-world behaviors are assessed?
3. Is there a need for this test? (should be better and more important than previous tests)
4. Who will use the test? (clinicians? educators? others?)
5. Who will take the test? (children? adults? women?)
6. What content will this cover? And why?
7. How will the test be administered? (individual, group, amenable for both)
8. What is the ideal format? (True/False, Multiple Choice, Essay, agree/disagree, open-ended
questions, or some mix of these)
9. Should more than one form of the test be developed?
10. What competence does the administrator need to qualify?
11. What response is required from test takers? (refers to limitations of the test)
12. Who benefits from the test? How will test takers benefit from this? Is there
potential harm from the result?
13. How is interpretation of the score done?
Criterion-referenced = referencing how your score compares to a criterion such as a
cut score or a body of knowledge
Norm-referenced = referencing how your score compares to other people’s scores.
Test Construction
Scaling = the process by which a test/measuring device is designed and
calibrated, and by which numbers are assigned to different amounts of the trait, attribute, or characteristic being measured
Scale Values = are assigned to different amounts of the trait, state, or
ability being measured
A few types are:
1. Age-based = where the performance is a function of age
2. Grade-based = where the performance is a function of grade
3. Stanine scale (Standard Nine) = all raw scores are
transformed into scores running from 1 to 9
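As a rough illustration of the stanine transformation, the sketch below converts raw scores to stanines using the common linear approximation stanine = 2z + 5, rounded and clipped to the 1 to 9 range. The function name `to_stanines` and the sample scores are made up for illustration, and published tests usually assign stanines from percentile bands in a norm group rather than from this shortcut.

```python
import math
from statistics import mean, pstdev


def to_stanines(raw_scores):
    """Convert raw scores to stanines (1-9) via the approximation 2z + 5."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        # All scores identical: everyone sits at the middle stanine.
        return [5 for _ in raw_scores]
    stanines = []
    for x in raw_scores:
        z = (x - m) / sd                   # standard (z) score
        s = math.floor(2 * z + 5 + 0.5)    # round half up
        stanines.append(max(1, min(9, s)))  # clip to the 1-9 range
    return stanines


scores = [10, 20, 30, 40, 50]  # hypothetical raw scores
print(to_stanines(scores))
```

The score equal to the group mean lands on stanine 5, and scores farther from the mean move toward 1 or 9.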
Scaling Methods
● Rating Scale = a grouping of words, statements, or symbols
on which judgments of the strength of a particular trait, emotion, or attitude are
indicated by the test taker. Yields ordinal-level data
Example: Please rate the employee on ability to cooperate and get
along with fellow employees
● Summative Scale = a type of rating scale where the final score is
obtained by summing the ratings across all items
● Likert Scale = a type of summative rating scale; individual item scores are added to
get the total score
= is usually used to scale attitudes
● Method of Paired Comparisons = test takers are presented with pairs of
stimuli (2 paragraphs, 2 subjects, 2 statements) which they are asked to compare
= They must select the stimulus with which they agree more or the one
they find more appealing
● Scoring = test takers receive a higher score for selecting the option
deemed more justifiable by the majority of a group of judges. Yields
ordinal data
● Comparative Scaling = entails judgment of a stimulus in comparison with
every other stimulus on the scale
● Ranking of Experts = asking a panel of experts to rank behavioral
indicators and provide a meaningful numerical score
● Method of Equal-Appearing Intervals
● Method of Absolute Scaling
● Guttman Scales
● Method of Empirical Keying
● Method of Rational Scaling
● Categorical Scaling
Writing Items
● Define clearly what you want to measure
● Generate an item pool
● Avoid exceptionally long items
● Keep the level of difficulty appropriate for those who will take the test
● Avoid double-barreled items that convey two or more ideas at the same time
● Consider mixing positively and negatively worded items
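When positively and negatively worded items are mixed in a summative (Likert-type) scale, the negatively worded items are typically reverse-scored before the ratings are summed. The sketch below shows one way this could be done. The function name `score_likert`, the 1 to 5 response scale, and the item labels are assumptions made for illustration, not part of any particular published test.

```python
def score_likert(responses, reverse_keyed=(), scale_min=1, scale_max=5):
    """Sum item ratings, reverse-scoring the negatively worded items.

    responses     : dict mapping item label -> rating chosen by the test taker
    reverse_keyed : labels of negatively worded items (hypothetical)
    """
    total = 0
    for item, rating in responses.items():
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} for {item!r} is outside the scale")
        if item in reverse_keyed:
            # On a 1-5 scale: 1 becomes 5, 2 becomes 4, and so on.
            rating = scale_max + scale_min - rating
        total += rating
    return total


answers = {"q1": 4, "q2": 2, "q3": 5}   # hypothetical ratings
print(score_likert(answers, reverse_keyed={"q2"}))
```

Here `q2` is treated as negatively worded, so its rating of 2 counts as 4 toward the total.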
1. Rational (Theoretical) Approach = reliance on reason and logic over data
collection for statistical analysis
2. Empirical Approach = reliance on data gathering to identify items that relate
to the construct
3. Bootstrap = combination of the rational and empirical approaches: items are first written based on a
theory, then an empirical approach is used to identify the items that are highly
related to the construct
Item Format = the form, plan, structure, arrangement, and layout
of individual test items
● Multiple Choice
● Binary-choice (i.e., True or False)
● Short Answer
Scoring Models
Cumulative Scoring = the higher the score on the test, the higher the testtaker is
on the ability, trait, or other characteristic that the test purports to measure
Class/Category Scoring = testtaker responses earn credit toward placement in a
particular class/category with other testtakers whose pattern of responses is
presumably similar in some way
Ipsative Scoring = a typical objective is comparing a testtaker’s score on one scale
within a test to another scale within that same test.
Writing Items for Computer Administration
Digital media offer 2 advantages for item writing:
Item bank = a relatively large and easily accessible
collection of test questions
Item branching = the ability of the computer to tailor
the content and order of test items
on the basis of responses to previous items
Computerized adaptive testing
an interactive, computer-administered test-taking process wherein the items presented
to the testtaker, and their difficulty, are based in part on the testtaker’s performance on
previous items.
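The item-branching idea behind computerized adaptive testing can be sketched with a toy rule: move up one difficulty level after a correct answer and down one level after an incorrect answer. This is a deliberately simplified illustration. Operational adaptive tests typically select items from ability estimates rather than fixed steps, and the names `run_adaptive_test`, `item_bank`, and `answer_fn` are hypothetical.

```python
def run_adaptive_test(item_bank, answer_fn, n_items=5):
    """Present items one at a time, branching on the previous response.

    item_bank : dict mapping difficulty level -> list of item labels
    answer_fn : callable(item, level) -> True if answered correctly
    """
    levels = sorted(item_bank)     # available difficulty levels
    idx = len(levels) // 2         # start at the middle level
    used = set()
    history = []
    for _ in range(n_items):
        level = levels[idx]
        remaining = [q for q in item_bank[level] if q not in used]
        if not remaining:
            break                  # no unused item left at this level
        item = remaining[0]
        used.add(item)
        correct = answer_fn(item, level)
        history.append((item, level, correct))
        # Branch: harder item after a correct answer, easier after an error.
        step = 1 if correct else -1
        idx = max(0, min(len(levels) - 1, idx + step))
    return history


bank = {1: ["e1", "e2"], 2: ["m1", "m2"], 3: ["h1", "h2"]}  # hypothetical bank
# Simulated testtaker who can answer items up to difficulty level 2.
for record in run_adaptive_test(bank, lambda item, level: level <= 2):
    print(record)
```

The simulated testtaker ends up alternating between levels 2 and 3, which is where this toy rule locates the limit of their ability.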
Floor Effect
refers to the diminished utility of an assessment tool
for distinguishing testtakers at the low end of the
ability, trait or other attribute being measured
Ceiling effect
refers to the diminished utility of an assessment
tool for distinguishing testtakers at the high end
of the ability, trait or other attribute being measured
Test Tryout
= The test should be tried out on people who are similar in critical respects to
the people for whom the test was designed
Rule of thumb in deciding the number of people on whom the test should be tried out:
There should be no fewer than 5 subjects and preferably as many as 10 for each item.
The more subjects employed, the weaker the role of chance in subsequent factor analysis
A x 5 to 10 = n
A = items on a questionnaire, n = participants
= for validation purposes, there must be at least 20 participants each
= the following conditions of the tryout should be identical or similar to those under which the test
is primarily designed to be administered
o all instructions
o time limits allotted for completing the test
o atmosphere at the test site
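The 5-to-10-subjects-per-item rule of thumb above is simple arithmetic, sketched here for convenience. The function name `tryout_sample_range` and the 30-item example are illustrative assumptions.

```python
def tryout_sample_range(n_items, per_item_min=5, per_item_max=10):
    """Return the (minimum, preferred) number of tryout participants
    for a test with n_items items, using n = A x 5 to A x 10."""
    if n_items < 1:
        raise ValueError("a test needs at least one item")
    return n_items * per_item_min, n_items * per_item_max


low, high = tryout_sample_range(30)   # hypothetical 30-item questionnaire
print(f"Try out on at least {low} and preferably {high} people")
```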
What is a good item?
= a good test helps in
discriminating among test takers
= a good test item is one that
is answered correctly (or in
an expected manner) by the high scorers on the test as a whole
= it is also one that is answered incorrectly by the low scorers on the test as a whole
What is a bad item?
= a bad test item is one that is answered correctly
by low scorers on the test as
a whole
Item Analysis
Item Analysis = refers to the process of statistically analyzing assessment data
to evaluate the quality and performance of your test items
+ this provides documentation of validity (evidence that the test performs well and that score
interpretations mean what you intend)
= a group of procedures used by test developers to identify the best items from
a pool of tryout items
= through this, test developers can identify which items are good and which are
deficient (which items are retained, revised, or removed)
= this also identifies the concepts testtakers have or have not mastered
GOAL: 1. find the items that are not performing well (difficulty and discrimination)
2. find out why those items are not performing well
o the item is too difficult or too easy
o too confusing (not discriminating)
o biased against a minority group
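To make the difficulty and discrimination checks concrete, the sketch below computes two commonly used classical statistics for a dichotomous (0/1) item. The difficulty index p is the proportion of testtakers answering correctly. The discrimination index is d = (U - L) / n, where U and L are the numbers of correct answers in the upper and lower scoring groups of size n. Using the top and bottom 27% as the groups is a common convention assumed here, and the function names and data are made up for illustration.

```python
def item_difficulty(item_responses):
    """Proportion of testtakers who answered the item correctly (p)."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """d = (U - L) / n for upper and lower groups ranked by total score.

    item_responses : list of 0/1 scores on one item, one per testtaker
    total_scores   : list of total test scores, in the same order
    fraction       : share of testtakers in each extreme group (assumed 27%)
    """
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    U = sum(item_responses[i] for i in upper)   # correct among high scorers
    L = sum(item_responses[i] for i in lower)   # correct among low scorers
    return (U - L) / n


totals = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]        # hypothetical total scores
good_item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]       # high scorers get it right
bad_item = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]        # low scorers get it right

print(item_difficulty(good_item))
print(item_discrimination(good_item, totals))
print(item_discrimination(bad_item, totals))
```

A positive d means the item is answered correctly mostly by high scorers, which matches the description of a good item. A negative d flags an item answered correctly mostly by low scorers, which is a candidate for revision or removal.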
2 Paradigms for Test Analysis
Classical Test Theory
Item Response Theory
The analyses can differ based on whether the item is
Test Item Types
dichotomous (right or wrong)
polytomous (2 or more points)
Forced-choice items (represents a different construct/domain, but they
Ipsative scores
Advantages of
=Item Difficulty Index
= Item Reliability Index
= Item Validity Index
= Item-Discrimination Index
- most popular and frequently used type of test item
- easily scored (saves time and enhances score reliability)
Item fairness
Speed Tests
Qualitative Item Analysis
“Think Aloud” Test Administration
-Innovative approach to cognitive assessment by having respondents
verbalize thoughts as they occur
Expert Panels
- Sensitivity Review
- Testtakers could be interviewed
Test Revision
Select-response format (multiple choice, matching type, true or false)