PSYCHOLOGICAL ASSESSMENT

PRINCIPLES/INTRO
PSYCHOLOGICAL TEST - a set of items designed to measure characteristics of human beings that pertain to behavior.
PSYCHOLOGICAL ASSESSMENT - gathering of information using tools: tests, interviews, case studies, observations.

Forms of assessment:
- Collaborative - the assessor and assessee may work as partners
- Therapeutic
- Dynamic

SCORING - relates raw scores to some theoretical/empirical distribution
*cut score*
SCALES OF MEASUREMENT (IRON)

                               Mag.   Eq.Int   Abs. 0
NOMINAL: no ranking             -       -        -
ORDINAL: ranking                /       -        -
INTERVAL: temp., time, IQ       /       /        -
RATIO: weight, height           /       /        /

PARAMETRIC: normal distribution of scores (Pearson r)
NONPARAMETRIC: abnormal distribution of scores (Spearman; Chi-square for nominal data)

DISTRIBUTION
FREQUENCY DISTRIBUTION - shows how frequently each value was obtained
- Abnormal distribution - skewed
- Normal distribution - falls on the central tendency (mean, median, mode)

Mean - average score
SD - approximation of the average deviation around the mean; the square root of the variance
Z score - the difference between a score and the mean, divided by the SD

POSITIVE SKEW - the tail falls at the high end of the distribution (most scores are low); *means the test is too difficult
NEGATIVE SKEW - the tail falls at the low end of the distribution (most scores are high); *means the test is too easy

PERCENTILE RANK - the percentage of people whose scores on a test fall below a particular raw score
PERCENTILE - a specific score within a distribution
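A minimal Python sketch of the mean, SD, and z-score definitions above (the scores are hypothetical):

    # Hypothetical raw scores for one group of test takers.
    scores = [10, 12, 14, 16, 18, 20, 30]

    n = len(scores)
    mean = sum(scores) / n                       # mean: the average score
    variance = sum((x - mean) ** 2 for x in scores) / n
    sd = variance ** 0.5                         # SD: square root of the variance

    # z score: difference between a score and the mean, divided by the SD
    z = (30 - mean) / sd
    print(round(mean, 2), round(sd, 2), round(z, 2))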
ASSESSMENT TECHNIQUES (D I T O)
DOCUMENTS - records, protocols, collateral reports
INTERVIEWS - interview responses; initial assessment > screening, verification; structured or unstructured
TESTS - written, verbal, visual
OBSERVATION - behavioral observation; observation checklist

BIAS SOURCES (rating errors)
RESPONSE SET - the rater marks the same place on the rating scale regardless of the examinee's performance
LENIENCY ERROR - gives high, positive ratings despite differences among examinees' performance
SEVERITY ERROR - gives low, negative ratings despite differences among examinees' performance
CENTRAL TENDENCY ERROR - gives middle-range ratings (on a Likert scale) regardless of performance
PROXIMITY ERROR - differing skills are rated similarly when sequentially ordered, as in a process
HALO ERROR - the performance rating is influenced by unrelated impressions
LOGICAL ERROR - a poorly worded skill specification is rated in an unintended manner
LACK OF INTEREST ERROR - the rater is not really interested in the rating process
IDIOSYNCRATIC ERROR - unexpected and unpredictable ratings given for a number of reasons
WHICH TYPE OF RELIABILITY IS APPROPRIATE?
- Test designed to be administered to an individual more than once -> Test-retest Reliability
- Test has two forms -> Parallel-Forms Reliability
- Test with factorial purity -> Internal Consistency (Cronbach Alpha)
- Test with items carefully ordered according to difficulty -> Split-half Reliability
- Test with dichotomous items -> KR20
- Test involving some degree of subjective scoring -> Inter-rater Reliability

Dynamic Characteristics – ever-changing characteristics that change through time or situation.
Static Characteristics – characteristics that would not vary across time.
PSYCHOLOGICAL TESTS

ABILITY TESTS
INTELLIGENCE TESTS: general potential to solve problems
- Verbal intelligence
- Non-verbal intelligence
Ex. WAIS, Stanford-Binet Intelligence Scale, Culture Fair Intelligence Test
ACHIEVEMENT TESTS: previous learnings; measures the extent of one's knowledge in various academic subjects
Ex. Stanford Achievement Test in Reading
APTITUDE TESTS: predicting the acquisition of skills or competencies
Ex. Differential Aptitude Test

PERSONALITY TESTS: traits/domains/factors; usually no right or wrong answers
Ex. MBTI
OBJECTIVE TESTS: structured; "Yes or No" or "True or False"
- Standardized test administration, scoring, and interpretation of scores
- Limited number of responses
- Group tests
PROJECTIVE TESTS: ambiguous test stimulus; unclear responses
- Taps wishes, intrapsychic conflicts, desires, unconscious motives
- Subjectivity in test interpretation/clinical judgement
- Self-administered/individual tests
- Unlimited responses
- Results are integrated into a single score interpretation

NORMS: where we base the scores
- Norm-referenced test (NRT) – indicates whether test takers perform better or worse than a comparison group (ex. age norms)
- Criterion-referenced test (CRT) – scores relate to the content of the test (ex. there is a certain criterion to be met)
VALIDITY
-measures what it purports to measure

CONTENT VALIDITY
-the essence of what you're measuring consists of topics and processes
-often established by expert judgement
-GENERALIZABILITY – the examiner generalizes from the sample of items to the degree of content mastery possessed by the individual examinee
-EDUCATIONAL CONTENT-VALID TEST - follows a TOS (table of specifications)
-EMPLOYMENT CONTENT-VALID TEST - covers appropriate job-related skills; reflects the job specification for the test
-CLINICAL CONTENT-VALID TEST - symptoms of disorders are covered; reflects the diagnostic criteria for a test
-CONSTRUCT UNDERREPRESENTATION - failure to capture important components of a construct
-CONSTRUCT-IRRELEVANT VARIANCE - when scores are influenced by factors irrelevant to the construct
-CONTENT VALIDITY RATIO (CVR) - proposed by Lawshe; a structured and systematic way of establishing the content validity of a test

CRITERION-RELATED VALIDITY
–how well a test corresponds with a particular criterion
-criterion – a standard. Characteristics: relevant, valid and reliable, uncontaminated
-criterion contamination – the criterion is based on predictor measures
-both measures must be valid and reliable
-performance on the first measure should be highly correlated with performance on the second

PREDICTIVE
-correlates with what occurs in the future
-test scores may be obtained at one time and the criterion measure obtained in the future, after an intervening event
-performance is predicted based on one or more known measured variables
-ex. MAT, GRE, GMAT

CONCURRENT
-correlates with what is occurring now
-both the test scores and the criterion measures are obtained at present
-valid, reliable, and considered a standard
-administered to the same subjects as the measure being validated; the two measures are intended to measure the same construct but are NOT administered in the same fashion
-often confused with a construct validity strategy

CONSTRUCT
-an informed scientific idea developed or hypothesized to describe or explain a behavior; something built by mental synthesis; unobservable, presupposed traits
-required when no criterion or universe of content is accepted as entirely adequate to define the quality being measured
-a test has good construct validity if there is an existing psychological theory which can support what the test items are measuring
-uses both logical analysis and empirical data
-more general than specific; provides a frame of reference
EVIDENCES:
1. The test is homogeneous, measuring a single construct.
2. Test scores increase or decrease as a function of age, passage of time, or experimental manipulation.
3. Pretest-posttest differences.
4. Test scores differ between groups.
5. Test scores correlate with scores on other tests in accordance with what is predicted.
UNIDIMENSIONAL - one construct
MULTIDIMENSIONAL - several constructs

CONVERGENT
-the test is correlated with another measure
-correlates well; measures the same construct as the other test
Ex. Depression test and Negative Affect Scale

DIVERGENT
-also called discriminant validity
-a validity coefficient sharing little or no relationship between the newly created test and an existing test
-the test measures something different from what the other test measures
Ex. Social Desirability test and Marital Satisfaction test
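For reference, Lawshe's CVR is conventionally computed per item as

    \[ \mathrm{CVR} = \frac{n_e - N/2}{N/2} \]

where n_e is the number of panelists rating the item "essential" and N is the total number of panelists. CVR ranges from -1 to +1 and is positive only when more than half of the panel rates the item essential.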
RELIABILITY
-consistency of a test
-indicates how stable a test score is
-a test should produce similar results consistently if it measures the same thing
-A TEST CAN BE RELIABLE WITHOUT BEING VALID

TEST-RETEST RELIABILITY
-Stability ("Will the scores be stable over time?")
-Pearson r
-gives the same test to the same group of test takers on two different occasions
-carryover effect: when the first testing session influences the results of the second session (interval "too short"); this can affect the test-retest reliability of a psychological measure
-practice effect: a type of carryover effect wherein scores on the second administration are higher than they were on the first
-used only in measuring traits/characteristics that do not change over time
-error variance: corresponds to the random fluctuation of performance from one test session to the other

PARALLEL-FORMS RELIABILITY
-r
-Equivalence ("Are the two forms of the test equivalent?")
-different forms of the same test are administered to the same group at different times -> high reliability coefficient
-the tests should contain the same number of items, and the items should be expressed in the same form and should cover the same type of content; the range and level of difficulty of the items should also be equal; instructions, time limits, illustrative examples, format, and all other aspects of the test must likewise be checked for equivalence
-PROBLEM: difficulty of developing another form

INTERNAL CONSISTENCY
-"How well does each item measure the content/construct under consideration?"
-used when tests are administered once
-there is consistency among items within the test; if all items on a test measure the same construct, then it has good internal consistency
*SPLIT-HALF RELIABILITY - Spearman-Brown prophecy formula
-splitting the items on a questionnaire or test in half (odd or even), computing a separate score for each half, and then calculating the degree of consistency between the two scores for a group of participants
*CRONBACH ALPHA - used when the two halves of the test have unequal variances
-provides the lowest estimate of reliability
-the average of all possible split halves. Ex. Likert-scale items
*KR20 - for binary/dichotomous items; tests with a right-or-wrong format

INTER-RATER RELIABILITY
-Kappa statistics
-different raters, using a common rating form, measure the object of interest consistently
-"Are the raters consistent in their ratings?"
*Cohen's Kappa – used to determine the agreement between 2 raters
*Fleiss' Kappa – used to determine the agreement among 3 or more raters
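A minimal Python sketch of the three internal-consistency estimates above, on a hypothetical matrix of dichotomous item responses (with consistently computed variances, KR20 is the dichotomous special case of alpha):

    import numpy as np

    # Hypothetical data: rows = test takers, columns = right/wrong (1/0) items.
    X = np.array([[1, 1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 1, 1, 1, 1],
                  [0, 0, 1, 0, 1, 0],
                  [1, 1, 0, 1, 1, 1]], dtype=float)
    k = X.shape[1]

    # Split-half (odd vs. even items), corrected by the Spearman-Brown prophecy formula.
    odd, even = X[:, 0::2].sum(axis=1), X[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    spearman_brown = (2 * r_half) / (1 + r_half)

    # Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / total-score variance).
    alpha = (k / (k - 1)) * (1 - X.var(axis=0).sum() / X.sum(axis=1).var())

    # KR20: same form, with item variance written as p*q for dichotomous items.
    p = X.mean(axis=0)                       # proportion passing each item
    kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / X.sum(axis=1).var())

    print(round(spearman_brown, 2), round(alpha, 2), round(kr20, 2))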
I. DESCRIPTION OF THE GROUP
A. Central Tendency
B. Variability
C. Standard Scores
D. Frequencies

II. CORRELATE VARIABLES
A. Pair of interval/continuous variables – Pearson r
B. Pair of ordinal variables – Spearman Rho
C. Pair of dichotomous variables (both alternatives) – KR20
D. One continuous and one dichotomous variable
   a. True dichotomy – Point-Biserial
   b. Artificial dichotomy – Biserial
E. 3 or more raters, agreement – Kendall's Coefficient of Concordance

III. COMPARISON OF GROUPS
A. Random sampling
   a. 2 separate groups with individual means – t-test for independent measures
   b. 1 group, 2 scores – t-test for dependent measures
   c. 3 or more groups – one-way ANOVA
   d. 1 group, 3 or more scores – repeated-measures ANOVA
   e. 2 or more groups, each measured 2 or more times – split-plot (mixed-design) ANOVA
   f. 2 IVs, 1 DV – two-way ANOVA
      i. 4 groups – 2x2 design
B. Non-random sampling
   a. 2 separate groups – Mann-Whitney U
   b. 1 group, 2 ordinal scores – Wilcoxon Signed-Rank Test
   c. 3 or more groups – Kruskal-Wallis H-test
   d. 3 or more ranks – Friedman Test
   e. 1 group sorted into categories/frequencies – Chi-square

IV. PREDICTING VARIABLES
A. One predictor to one outcome – Linear Regression
B. More than one predictor to one outcome (X1 + X2 + X3 = Y) – Multiple Regression
C. Sets of predictors, tested whether significant or not – Hierarchical Regression
   M1: X1 = Y
   M2: X1 + X2 = Y
   M3: X1 + X2 + X3 = Y
D. Sets of predictors, all significant (*) – Stepwise Regression
   M1: X1* = Y
   M2: X1* + X2* = Y
   M3: X1* + X2* + X3* = Y
E. Outcome is nominal – Logistic Regression
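As an illustration, several branches of the chart above map directly onto standard scipy.stats calls (the data here are hypothetical):

    import numpy as np
    from scipy import stats

    x = np.array([3.1, 4.2, 5.0, 6.3, 7.1])   # hypothetical continuous variable
    y = np.array([2.9, 4.0, 5.5, 6.0, 7.4])   # hypothetical continuous variable
    g1, g2 = np.array([5, 6, 7, 8]), np.array([4, 4, 6, 5])  # two separate groups

    r, _ = stats.pearsonr(x, y)           # pair of interval/continuous variables
    rho, _ = stats.spearmanr(x, y)        # pair of ordinal variables
    t, _ = stats.ttest_ind(g1, g2)        # 2 separate groups, random sampling
    u, _ = stats.mannwhitneyu(g1, g2)     # 2 separate groups, non-random sampling
    h, _ = stats.kruskal(g1, g2, x)       # 3 or more groups, non-random sampling
    print(round(r, 2), round(rho, 2))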
ASSESSMENT vs. TESTING
- Assessment: a broad array of evaluative processes. Testing: instruments that yield scores based on collected data (a subset of assessment).
- Objective: Assessment - answers questions, solves problems, decides. Testing - obtains some measure (numerical in nature) with regard to an ability/attribute.
- Process: Assessment - individualized. Testing - individualized or grouped.
- Role of evaluator: Assessment - key in the choice of tests. Testing - may be substituted.
- Skills of evaluator: Assessment - educated selection of tools; skilled. Testing - technician-like skills.
- Outcome: Assessment - a logical problem-solving approach. Testing - yields a test score or series of test scores.

Technical Quality – refers to a test's psychometric soundness.
TESTS
ITEM - suggests a sample of behavior of an individual.
SCALE - the process by which a response can be scored.
1. Content – the subject matter of the test.
2. Format – pertains to the form, plan, structure, arrangement, and layout of test items.
3. Administration Procedures – administered on a one-to-one basis or by group.
4. Scoring and Interpretation
   a. Score – a code or summary statement that reflects an evaluation of performance on a test.
   b. Scoring – the process of assigning such evaluative codes or statements to performance on tests.

3 FORMS OF ASSESSMENT (T C D)
1. THERAPEUTIC PSYCHOLOGICAL ASSESSMENT – the patient gains insight about the disorder and later develops psychological wellness.
2. COLLABORATIVE PSYCHOLOGICAL ASSESSMENT – the patient helps the clinician to uncover the disorder.
3. DYNAMIC PSYCHOLOGICAL ASSESSMENT – follows a process (ABA design):
   a. Evaluation
   b. Therapy/intervention
   c. Evaluation

TYPES OF PSYCHOLOGICAL TESTS
1. NUMBER OF TEST TAKERS
   a. Individual
   b. Group
2. VARIABLE BEING MEASURED
   a. ABILITY
      i. ACHIEVEMENT
      ii. APTITUDE/PROGNOSTIC
      iii. INTELLIGENCE
   b. PERSONALITY
      i. OBJECTIVE/STRUCTURED
      ii. PROJECTIVE/UNSTRUCTURED
      iii. INTERESTS

ASSESSMENT TOOLS (O P I)
1. OBSERVATION – monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions.
   a. Natural observation - observing behaviors in a setting in which the behavior would typically be expected to occur.
   b. Role play test - a tool of assessment wherein examinees are directed to act as if they were in a particular situation.
2. PSYCHOLOGICAL TESTING – a set of items used for testing/measuring/determining individual differences; the process of measuring psychology-related variables by means of a device.
3. INTERVIEW – gathering information through direct communication. Interviews differ in their purpose, length, and nature.
   a. Panel interview – multiple interviewers.
      i. Advantage: minimizes the idiosyncratic biases of a lone interviewer.
      ii. Disadvantage: costly; the use of multiple interviewers may not even be justified.
   iii. Portfolio: samples of one's ability and accomplishment.
   iv. Case history data: records, transcripts, and other accounts in written or pictorial form. CASE STUDY - a report or illustrative account concerning a person or an event, compiled on the basis of case history data.

MAXIMUM PERFORMANCE TESTS
- SPEED TEST – the test is homogeneous, meaning it is easy; short time limit.
- POWER TEST – few items, but more complex.

CHARACTERISTICS OF PSYCHOLOGICAL TESTING
1. Objective – free from subjective perception.
2. Standardized – uniformity exists.
3. Reliable – there is consistency in test results.
4. Valid – the test measures what it purports to measure.
5. Good predictive validity – test results suggest future behavior.

REFERENCE SOURCES – sources of authoritative information about published tests
- Test catalogues – brief descriptions of tests
- Test manuals – detailed information about a test
- REFERENCE VOLUMES – "one-stop shopping"
- Journal articles
- Online databases
ETHICAL CODE
- Professional guidelines for appropriate behavior
  o American Counseling Association (2005)
  o American Psychological Association (2003)
  o Psychological Association of the Philippines (2009)

WHEN CONFIDENTIAL INFORMATION CAN BE REVEALED
1. If a client is in danger of harming himself or herself or someone else;
2. If a client is a minor and the law states that parents have a right to information about their child;
3. If a client asks you to break confidentiality (for example, your testimony is needed in court);
4. If you are bound by the law to break confidentiality (for example, you are hired by the courts to assess an individual's capacity to stand trial);
5. To reveal information about your client to your supervisor in order to benefit the client;
6. When you have a written agreement from your client to reveal information to specified sources (for example, the court has asked you to send a test report to them).
RESPONSIBILITIES OF TEST USERS, PUBLISHERS, AND CONSTRUCTORS
- Use assessment instruments only on samples similar to the standardization group (reliability, validity, established norms).
- Test users must possess knowledge of test construction and of the supporting research for any test they administer.
- Test developers should provide the psychometric properties of the test, specified scoring and administration procedures, and a clear description of the normative sample.

MORAL ISSUES
- Human Rights
- Labeling
- Invasion of Privacy
- Divided Loyalties
- Responsibilities of Test Users, Test Publishers, and Test Constructors

DIVIDED LOYALTIES - psychologists are torn over whether their client is the institution or the person. Institutions should be informed only of what they need, or given an answer to the referral question only.
HUMAN RIGHTS
- Right to informed consent.
- Right to know their test results and the basis of any decisions that affect their lives.
- Right to know who will have access to test data, and right to confidentiality of test results.

INFORMED CONSENT
- Permission given by the client after the assessment process is explained.
- Informed consent involves the right of clients to obtain information about the nature and purpose of all aspects of the assessment process, and for clients to give their permission to be assessed.

NON-REQUIREMENT OF INFORMED CONSENT
- Mandated by the law.
- Testing as a routine educational, institutional, or organizational activity.
- Evaluation of decisional capacity.
LABELING
- Effects of labeling:
  o Results in stigmatization
  o Affects one's access to help
  o Makes a person passive

CONFIDENTIALITY - ethical guideline to protect client information. Whether conducting a broad assessment of a client or giving one test, keeping information confidential is a critical part of the assessment process and follows guidelines similar to how one would keep information confidential in a therapeutic relationship.

INVASION OF PRIVACY
- The codes generally acknowledge that, to some degree, all tests invade one's privacy, and highlight the importance of clients understanding how their privacy might be violated.
TEST SCORING & INTERPRETATION
- The codes highlight the fact that when scoring tests and interpreting their results, professionals should reflect on how test worthiness (reliability, validity, cross-cultural fairness, and practicality) might affect the results.

TEST SECURITY
- The codes remind professionals that it is their responsibility to make reasonable efforts to ensure the integrity of test content and the security of the test itself. Professionals should not duplicate tests or change test materials without the permission of the publisher.
ETHICS IN PSYCHOLOGICAL TESTING

CHOOSING APPROPRIATE ASSESSMENT INSTRUMENTS
- Ethical codes stress the importance of professionals choosing assessment instruments that show test worthiness, which has to do with the reliability, validity, cross-cultural fairness, and practicality of a test.
- Professionals must take appropriate action when issues of test worthiness arise during an assessment, so that the results of the assessment are not misconstrued.

COMPETENCE IN USING TESTS
- Requires adequate knowledge of, and training in, administering an instrument.
- Competence to use tests accurately is another aspect stressed in the codes. The codes declare that professionals should have adequate knowledge about testing and familiarity with any test they may use.

THREE-TIER SYSTEM
- LEVEL A - tests that can be administered, scored, and interpreted by responsible nonpsychologists who have carefully read the manual and are familiar with the overall purpose of testing. Educational achievement tests fall into this category. Ex. Achievement tests, specialized aptitude tests.
- LEVEL B - requires technical knowledge of test construction and use, and appropriate advanced coursework in psychology and related courses (Statistics, Individual Differences, Counseling). Ex. Group intelligence tests, personality tests.
- LEVEL C - requires an advanced degree in psychology or licensure as a psychologist, and advanced training/supervised experience with the particular test. Ex. Projective tests, individual intelligence tests, diagnostic tests.
CROSS-CULTURAL SENSITIVITY
- Ethical guideline to protect clients from discrimination and bias in testing.
- The codes stress the importance of professionals being aware of and attending to the effects of age, color, cultural identity, disability, ethnicity, gender, religion, sexual orientation, and socioeconomic status on test administration and interpretation.

PROPER DIAGNOSIS
- Choose appropriate assessment techniques for accurate diagnosis.
- The codes emphasize the important role professionals play when deciding which assessment techniques to use in forming diagnoses of mental disorders, and the ramifications of making such diagnoses.

RELEASE OF TEST DATA
- Test data are protected; client release is required.
- The codes assert that data should only be released to others if the clients have given their consent.
- The release of such data is generally only given to individuals who can adequately interpret the test data and who will not misuse the information.

TEST ADMINISTRATION
- The codes reinforce the notion that tests should be administered in a manner that accords with the way they were established and standardized.
- Alterations to this process should be noted, and interpretations of test data adjusted, if the testing conditions were not ideal.
MORAL MODEL OF DECISION MAKING
- AUTONOMY - respecting the client's right of self-determination and freedom of choice.
- NON-MALEFICENCE - ensuring that professionals do no harm.
- BENEFICENCE - promoting the well-being of others and of society.
- JUSTICE - equal and fair treatment of all people; being nondiscriminatory.
- FIDELITY - being loyal and faithful to your commitments in the helping relationship.
- VERACITY - dealing honestly with the client.
NORMS AND STATISTICS

USE OF TWO TYPES OF STATISTICS
1. DESCRIPTIVE – used for making interpretations of test results; provides a concise description of quantitative information.
2. INFERENTIAL – provides conclusions regarding a population based on observation of a sample.

MEASURES OF CENTRAL TENDENCY - statistics that indicate the average or midmost score between the extreme scores in a distribution.
- MEAN – the most appropriate measure of central tendency for interval and ratio data when the distribution is normal.
- MEDIAN – the middle score of the distribution.
- MODE – the most frequently occurring score in a distribution.

SCALES OF MEASUREMENT
1. NOMINAL – naming; labeling; one category does not suggest that another is higher or lower. Ex. gender, religion.
2. ORDINAL – observations can be ranked in order, but the degree of difference is unobtainable. Ex. position in a company.
3. INTERVAL – there is magnitude and equal intervals, but no true zero.
4. RATIO – there is magnitude, equal intervals, and a true zero.
*magnitude - "moreness"; suggests that one is more than the other.
*equal intervals - the difference between two points at any place on the scale has the same meaning as the difference between two other points at other places.
*absolute zero - zero suggests the absence of the variable being measured.
*most psychological data are ordinal by nature but are treated as interval.
*IQ was initially for classification and not for measurement (per Binet).

MEASUREMENT OF VARIABILITY
- Indicates how scattered the scores in a distribution are; how far one score is from another. Measures the dispersion of the scores.
- Range – equal to the difference between the highest score (HS) and the lowest score (LS).

INTERQUARTILE AND SEMI-INTERQUARTILE RANGE
- Quartiles – points that divide the distribution into 4 equal parts.
- Interquartile range – the difference between Q3 and Q1; represents the middle 50% of the distribution.
- Semi-interquartile range = (Q3 – Q1)/2.
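A short numpy sketch of the variability measures above (the scores are hypothetical):

    import numpy as np

    scores = np.array([55, 60, 62, 65, 70, 72, 75, 80, 85, 90])
    rng = scores.max() - scores.min()          # range = HS - LS
    q1, q3 = np.percentile(scores, [25, 75])   # quartiles Q1 and Q3
    iqr = q3 - q1                              # interquartile range (middle 50%)
    semi_iqr = (q3 - q1) / 2                   # semi-interquartile range
    print(rng, iqr, semi_iqr)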
FREQUENCY DISTRIBUTION - displays scores on a variable or measure to reflect how frequently each value was obtained.
*GRAPH - a diagram or chart illustrating data.
- Histogram - a graph with vertical bars at the true limits of each test score; connected bars; used for continuous data.
- Bar graph – used for describing frequencies; disconnected bars.
- Frequency polygon – points are plotted at the class mark of each interval; continuous lines.

KURTOSIS - the steepness of a distribution.
- PLATYKURTIC – flat; the number of test takers with high and low scores is not far from the number who scored at the mean.
- LEPTOKURTIC – peaked; the number of test takers with high and low scores is far from the number who scored at the mean.
- MESOKURTIC – middle; the distribution is deemed normal.

DECILE - points that divide the distribution into 10 equal parts (D1–D9).
LINEAR TRANSFORMATION - derived from the z-score formula to transform a score from one scale to another: NS = SD(z) + M, i.e., the new score equals the new scale's SD times z, plus the new scale's mean.
PERCENTILE RANK
- Tells the relative position of a test taker in a group of 100.
- Suggests how many samples fall below a specified score.
- For example: if a person's score is at the 50th percentile, 50 percent of the test takers fall below that specific score.
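In code, the definition above reads as follows (a sketch; the group scores are hypothetical):

    # Hypothetical scores for a group of test takers.
    scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

    def percentile_rank(raw, group):
        # Percentage of scores in the group that fall below the raw score.
        below = sum(1 for s in group if s < raw)
        return 100.0 * below / len(group)

    print(percentile_rank(62, scores))  # 50.0 -> half the group scored below 62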
CORRELATION - statistical tool for testing the relationship between variables.
- COVARIANCE – how much two scores vary together.
- CORRELATION COEFFICIENT – a mathematical index that describes the direction and magnitude of a relationship.
  o Ranges from -1.00 to +1.00.
  o The nearer to 1, the stronger the relationship.
  o The nearer to 0, the weaker the relationship.
  o The sign suggests the type of relationship (negative = indirect relationship; positive = direct relationship).

CORRELATIONAL STATISTICS
  o PEARSON PRODUCT-MOMENT CORRELATION – 2 variables on an interval/ratio scale.
  o SPEARMAN RHO – correlates 2 variables on an ordinal scale; also called rank-order correlation.
  o BISERIAL CORRELATION – 1 continuous and 1 artificial dichotomous variable (a dichotomy in which there are other possibilities within a category).
  o POINT-BISERIAL CORRELATION – 1 continuous and 1 true dichotomous variable (a dichotomy in which there are only two possible categories).
  o PHI COEFFICIENT – 2 dichotomous variables; at least 1 true dichotomy.
  o TETRACHORIC COEFFICIENT – 2 dichotomous variables; both are artificial dichotomies.
  o COEFFICIENT OF ALIENATION - a measure of non-association between two variables.
  o COEFFICIENT OF DETERMINATION - the percentage of variance shared by two variables; the effect of one variable on another. Ex. r = 0.75; r² = 0.75² ≈ 0.56.
STANDARD DEVIATION - approximation of the average deviation around the mean. Details how far above or below the mean a score lies.
- NORMAL DISTRIBUTION – the majority of test takers bulk at the middle of the distribution; very few test takers are at the extremes.
- POSITIVELY SKEWED – more test takers got a low score. Mean > median > mode.
- NEGATIVELY SKEWED – more test takers got a high score. Mode > median > mean.

STANDARD SCORES - a raw score that has been converted from one scale to another. Provides a context for comparing scores on different tests by converting scores from the two tests into z scores.
- Z SCORE – mean of 0; SD of 1. A "zero plus or minus one" scale. Once determined, can be used to translate one scale into another.
- T SCORE – mean = 50; SD = 10. Created by McCall in honor of his professor Thorndike.
- STANINE – mean = 5; SD = 2. Used by the US Air Force. Takes whole numbers 1–9; no decimals.
- DEVIATION IQ – mean = 100; SD = 15. Used for interpreting IQ.
- STEN – "standard ten." Mean = 5.5; SD = 2.
- GRE/SAT – mean = 500; SD = 100. Used for admission to graduate school and college.
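Each of these standard scores is an instance of the linear transformation NS = SD(z) + M noted earlier; a sketch with a hypothetical raw score:

    # Hypothetical raw score from a distribution with mean 100 and SD 15.
    raw, mean, sd = 120, 100, 15

    z = (raw - mean) / sd                        # z score: mean 0, SD 1
    t_score = 10 * z + 50                        # T score: mean 50, SD 10
    stanine = max(1, min(9, round(2 * z + 5)))   # stanine: mean 5, SD 2, whole numbers 1-9
    deviation_iq = 15 * z + 100                  # deviation IQ: mean 100, SD 15
    sten = 2 * z + 5.5                           # sten: mean 5.5, SD 2
    gre_style = 100 * z + 500                    # GRE/SAT-style: mean 500, SD 100
    print(round(z, 2), round(t_score, 1), stanine, round(sten, 1))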
NORMS - the performance of defined groups on a particular test. Used to transform raw scores into meaningful interpretations of scores on a test.
- NORMING - the process of creating norms.
- NORMATIVE SAMPLE - the group of people whose performance on a particular test is analyzed and referred to.
- RACE NORMING – norming based on race/culture.
- USER NORMS - norms provided by the test manuals.
- NORMAN - the person who constructs a norm.

CRITERION-REFERENCED - interpretation of the test is based on a certain standard.
NORM-REFERENCED - the score is interpreted based on the performance of a standardization group.
1. DEVELOPMENTAL NORMS – indicate how far along the normal developmental path an individual has progressed.
   - AGE NORMS, GRADE NORMS, ORDINAL SCALES
2. WITHIN-GROUP NORMS – the individual's performance is evaluated in terms of the performance of the most nearly comparable standardization group.
   a. PERCENTILE
   b. STANDARD SCORE
   c. DEVIATION IQ
3. NATIONAL NORMS – norms from large-scale samples.
   a. SUBGROUP NORMS
   b. LOCAL NORMS
REGRESSION (Ŷ = a + bX)
- Intercept (a) – the point at which the regression line crosses the Y axis.
- Regression coefficient (b) – the slope of the regression line.
- Regression line – the best-fitting straight line through a set of points in a scatter plot.
- Standard error of estimate – measures the accuracy of prediction.
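A sketch of fitting Ŷ = a + bX by least squares and computing the standard error of estimate (the data are hypothetical; the n − 2 denominator is the usual convention for prediction error):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical criterion

    b = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)  # regression coefficient (slope)
    a = y.mean() - b * x.mean()                     # intercept: where the line crosses Y
    y_hat = a + b * x                               # predicted scores on the regression line
    see = np.sqrt(((y - y_hat) ** 2).sum() / (len(x) - 2))  # standard error of estimate
    print(round(a, 2), round(b, 2), round(see, 2))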
MULTIPLE REGRESSION - statistical technique for predicting one variable from a series of predictors. Used to find linear combinations of three or more variables. Applicable only when the data are all continuous.
STANDARDIZED REGRESSION COEFFICIENTS - also called beta weights. Tell how much each variable in a given list of variables predicts a single variable.
FACTOR ANALYSIS - used to study the interrelationships among a set of variables.
- Factors – variables; also called principal components.
- Factor loading – the correlation between the original variables and the factors; depicted through beta weights.
META-ANALYSIS - a family of techniques used to statistically combine information across studies to produce single estimates of the data under study.
- Effect size – the estimate of the strength of a relationship or the size of differences; evaluated through the correlation coefficient.
ITEM ANALYSIS AND ITEM CONSTRUCTION

ITEM WRITING GUIDELINES:
- Define clearly what you want to measure.
- Generate an item pool.
- Avoid long items.
- Keep the level of reading difficulty appropriate for those who will complete the test.
- Avoid double-barreled items (more than one idea in one item).
- Consider making positively and negatively worded items.

ITEM ANALYSIS - general term for a set of methods used to evaluate test items; one of the most important aspects of test construction. An example sketch follows the list below.
I. ITEM DIFFICULTY - for tests measuring achievement/ability, defined by the proportion of people who get the item correct. Indicates the easiness of the test; should range from 0.30 to 0.70. Achievement tests make use of multiple choice because it leaves a 0.25 chance of guessing the correct response.
   a. Optimum item difficulty - the best difficulty for an item, based on the number of response alternatives.
      i. OID = (chance performance + 1)/2; e.g., with chance = 0.25, OID = (0.25 + 1)/2 = 0.625.
      ii. Chance performance – performance based on guessing; equal to 1 divided by the number of response alternatives.
   b. Item difficulty index - a value that describes the item difficulty for an ability test.
   c. Item endorsement index - a value that describes the percentage of individuals who endorsed an item on a personality test.
   d. Omnibus spiral format - items in an ability test are arranged in increasing difficulty.
      i. Giveaway items – presented near the beginning of the test to spur motivation and lessen test anxiety.
II. ITEM RELIABILITY - indicates the internal consistency of a test; the higher the index, the higher the internal consistency.
   a. Item reliability index = (SD of the item) x (item-total correlation).
   b. Factor analysis can also be used to determine which items load more on the whole test.
III. ITEM VALIDITY - an indication of the degree to which a test measures what it purports to measure; the higher the item-validity index, the higher the criterion-related validity of the test.
   a. Item validity index = (item standard deviation) x (correlation between item and criterion).
IV. ITEM DISCRIMINABILITY - how well an item performs in relation to some criterion; how adequately an item separates high scorers from low scorers on the entire test. 0.30 is the usual lower limit for the discrimination index; the higher the d, the more high scorers answer the item correctly.
   a. Extreme group method – compares people who have done well with those who have done poorly on a test.
   b. Point-biserial method – correlates dichotomous and continuous data; tests whether those who got an item correct tend to have high total scores as well.
V. ITEM CHARACTERISTIC CURVE - a graphic representation of item difficulty and discrimination; usually plots the total scores on the x-axis and p and d on the y-axis.
VI. ITEMS FOR CRITERION-REFERENCED TESTS - a frequency polygon is created after the test is given to two groups: one exposed to the learning unit, another not exposed.
   a. Antimode - the score with the lowest frequency.
   b. Used in determining the cut score (passing score) for a criterion-referenced test.
VII. DISTRACTOR ANALYSIS
VIII. ISSUES AMONG TEST ITEMS
   a. ITEM FAIRNESS - the degree to which an item is biased.
      i. Biased test items – items that favor one particular group of examinees; can be tested using inferential statistics among groups.
   b. QUALITATIVE ITEM ANALYSIS - involves exploration of issues through verbal means, such as interviews and group discussions conducted with test takers and other relevant parties.
   c. THINK-OUT-LOUD ADMINISTRATION - allows test takers (during standardization) to speak their minds while taking the test; used to shed light on the test taker's thought processes during administration.
   d. EXPERT PANELS - guide researchers/test developers in doing a sensitivity review (especially on cultural issues).
      i. Sensitivity review – a study of test items, typically to examine test bias and the presence of offensive language and stereotypes.

ITEM FORMAT - the form, plan, structure, arrangement, and layout of individual test items.
I. SELECTED-RESPONSE FORMAT – select a response from a set of alternatives.
   a. DICHOTOMOUS FORMAT - offers 2 alternatives per item. ADVANTAGES: simplicity, easy administration, quick scoring, no neutral response. DISADVANTAGES: needs more items; 50% chance of guessing the correct answer; samples can memorize responses.
   b. POLYCHOTOMOUS FORMAT - more than 2 alternatives. Ex. multiple choice.
      i. Question - the stem.
      ii. Correct choice - the keyed response.
      iii. Distractors - the incorrect choices.
      iv. Cute distractors - less likely to be chosen; may affect the reliability of the test.
   c. LIKERT FORMAT - requires the respondent to indicate the degree of agreement with a particular attitudinal statement; a superior item format; uses factor analysis. Can be a 5-, 4-, or 6-choice format (*without a neutral point*). Negative items are reverse-scored, then all scores are summed.
   d. CATEGORY FORMAT - respondents rate a construct from 1 to 10 (1 = lowest, 10 = highest).
   e. CHECKLIST - the subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself.
   f. Q-SORT - requires respondents to sort a group of statements into 9 piles.
   g. GUTTMAN SCALE - items are arranged from weaker to stronger expressions of the attitude, belief, or feeling being measured.
II. COMPLETION ITEMS – the respondent completes a set of stimuli to complete a certain item.
   a. ESSAY ITEMS - respondents answer a question by writing a composition; used to determine the depth of knowledge of the respondent.

EQUAL-APPEARING INTERVAL
- Described by Thurstone.
- A scale wherein + and – items are present.
- Adds all responses in order to transform them into an interval scale.
- Uses direct estimation scaling.
  o Direct estimation scaling - transformation of the scale into other scales is possible due to the computable value of the mean.
  o Indirect estimation scaling - cannot be transformed into other scales because the mean is not present.

COMPUTER ADAPTIVE TESTING - also called computer-assisted testing; an interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items.
- ITEM BANK – a relatively large and easily accessible collection of test questions.
- ITEM BRANCHING – the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.

SCORING ITEMS
I. CUMULATIVE MODEL – the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic measured.
II. CLASS/CATEGORY SCORING – test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar in some way; most useful in diagnostic tests.
III. IPSATIVE SCORING – compares a test taker's score on one scale within a test to another scale within that same test.
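A sketch tying together the item statistics above (item difficulty p, extreme-group discrimination d, and the item-reliability index) on a hypothetical 0/1 response matrix; note the item-total correlation here is uncorrected (each item is included in the total):

    import numpy as np

    # Hypothetical data: rows = examinees, columns = right/wrong (1/0) items.
    X = np.array([[1, 1, 0, 1],
                  [1, 0, 0, 0],
                  [1, 1, 1, 1],
                  [0, 0, 0, 1],
                  [1, 1, 1, 0],
                  [0, 0, 0, 0]], dtype=float)
    total = X.sum(axis=1)

    p = X.mean(axis=0)                 # item difficulty: proportion answering correctly
    oid = (0.25 + 1) / 2               # optimum difficulty for 4-option multiple choice

    # Extreme-group discrimination: p(upper group) - p(lower group) per item.
    order = np.argsort(total)
    lower, upper = X[order[:3]], X[order[-3:]]
    d = upper.mean(axis=0) - lower.mean(axis=0)

    # Item-reliability index = item SD x item-total correlation.
    item_sd = X.std(axis=0)
    r_it = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(X.shape[1])])
    item_reliability = item_sd * r_it
    print(p, oid, d, item_reliability.round(2))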
TEST DEVELOPMENT - umbrella term for the process of creating a test.
I. TEST CONCEPTUALIZATION - the idea for a particular test is conceived. The following are determined: construct, goal, user, taker, administration, format, response, benefits, costs, and interpretation, as well as whether the test will be norm-referenced or criterion-referenced.
   a. Pilot work - may take the form of interviews to determine appropriate items for the test.
II. TEST CONSTRUCTION – writing test items, formatting items, setting scoring rules; designing and building the test.
   a. Scaling – the process of setting rules for assigning numbers in measurement; manifested through the item format (dichotomous, polytomous, Likert, category).
   b. Item pool - usually 2 times the intended number of items in the final form; 3 times is more advisable.
III. TEST TRYOUT - administration of the test to a representative sample of test takers under standardized conditions. Issues:
   a. Determination of the target population.
   b. Determination of the number of samples for tryout (number of items multiplied by 10).
   c. Tryout should be executed under conditions as identical as possible to those under which the standardized test will be administered.
IV. ITEM ANALYSIS - entails procedures, usually statistical, designed to explore how individual test items work compared with other items in the test and in the context of the whole test (validity, reliability, item difficulty, and discrimination).
V. TEST REVISION - balancing the strengths and weaknesses of the test and its items.
   a. Norming - done after the test has been revised to acceptable levels of reliability, validity, and item indices.
TEST ADMINISTRATION

ISSUES IN TEST ADMINISTRATION
- The Examiner and the Subject
- Subject Variables
- Training of the Test Administrator
- Behavioral Assessment Issues
- Mode of Administration

EXAMINER AND THE SUBJECT - the relationship between the examiner and the test taker.
- On the Wechsler Intelligence Scale for Children (WISC), enhanced rapport increased scores.
- Faulty response styles:
  o Acquiescent response style – the tendency toward increased agreement when responding on a test or in an interview; most responses to test items are positive, regardless of item content.
  o Socially desirable response style – presenting oneself in a favorable or socially desirable way.
- Language of the test taker - test takers proficient in two or more languages should be tested in the language in which they are most comfortable.
- Race of the test taker - there are significant effects of the examiner's race on the samples' responses.
TRAINING OF THE TEST ADMINISTRATOR
- Different assessment procedures require different levels of training.
- According to research, at least 10 practice sessions are needed to gain competency in scoring the WAIS-R.

MODE OF ADMINISTRATION
- Self-administered measures show lower results than psychologist-administered measures.
- Telephone interviews show better reported health than self-administered interviews.
SUBJECT VARIABLES
I. TEST ANXIETY - anxiety about test performance (worry, emotionality, lack of self-confidence).
II. ILLNESS - diseases influence test-taking behavior and performance (malingerers).
III. HORMONES - hormonal imbalances affect mood cycles and thus performance on a test.
IV. MOTIVATION - those required to take testing as an occupational requirement tend to produce unreliable results.
ERRORS OF BEHAVIORAL ASSESSMENT
I. REACTIVITY - being evaluated increases performance; also called the Hawthorne effect.
II. DRIFT - moving away from what one has learned toward idiosyncratic definitions of behavior; suggests that observers should be retrained at some point in time.
   a. CONTRAST EFFECT - the tendency to rate the same behavior differently when observations are repeated in the same context.
III. EXPECTANCIES - the tendency for results to be influenced by what test administrators expect to find.
   a. Rosenthal effect – the test administrator's expected results influence the result of the test.
   b. Golem effect – negative expectations from the test administrator decrease performance.
IV. RATING ERRORS - judgment errors resulting from the intentional or unintentional misuse of a rating scale.
   a. Halo effect – the tendency to ascribe positive attributes independently of the observed behavior; suggested by Thorndike.
   b. Leniency error/generosity error – the rater's tendency to be too forgiving and insufficiently critical.
   c. Severity error – evaluation that is overly critical.
   d. Central tendency error – the rater is reluctant to give ratings at either the positive or negative extreme; ratings tend to cluster in the middle of the continuum.
   e. General standoutishness – people tend to judge on the basis of one outstanding characteristic.
INTERVIEW - a method of gathering information through talk, discussion, or direct questions.
I. DIRECTIVE INTERVIEW - the interviewer directs, guides, and controls the course of the interview.
II. NONDIRECTIVE INTERVIEW - the interviewee guides the interview process.
III. SELECTION INTERVIEW - designed to elicit information pertaining to an applicant's qualifications and capabilities for particular employment duties.
IV. SOCIAL FACILITATION INTERVIEW - the interviewer serves as a model for the interviewee.

PRINCIPLES OF EFFECTIVE INTERVIEWING
I. PROPER ATTITUDE
   a. INTERPERSONAL INFLUENCE – the degree to which one person can influence another.
   b. INTERPERSONAL ATTRACTION – the degree to which people share a feeling of understanding, mutual respect, similarity, and the like.
II. RESPONSES TO AVOID
   a. JUDGEMENTAL STATEMENTS – evaluating the thoughts, feelings, or actions of another.
   b. PROBING STATEMENTS – demanding more information than the interviewee wishes to provide voluntarily.
   c. HOSTILE STATEMENTS
   d. FALSE ASSURANCE
III. EFFECTIVE RESPONSES
   a. OPEN-ENDED QUESTIONS
   b. SUMMARIZING
   c. TRANSITIONAL PHRASE
   d. CLARIFICATION RESPONSE
   e. PARAPHRASING AND RESTATEMENT
   f. EMPATHY & UNDERSTANDING
TEST UTILITY

USES OF TESTS
- Classification – assigning a person to one category rather than another.
- Screening – quick and simple tests or procedures to identify persons who might have special characteristics or needs.
- Placement – sorting persons into different programs appropriate to their needs or skills.
- Selection - a process whereby each person evaluated for a position will be either accepted or rejected for that position.
- Diagnosis and treatment planning – determination of abnormal behavior; classification using diagnostic criteria; a precursor to recommending treatment of personal distress.
- Self-knowledge – understanding of an individual's intelligence and personality characteristics.
- Program evaluation – systematic assessment and evaluation of educational and social programs.
- Research – measuring variables to suggest correlations and causal relationships.

UTILITY - the usefulness or practical value of testing efficiency.
- PSYCHOMETRIC SOUNDNESS – tests should be reliable and valid to be used. Reliability sets the limit for validity: the upper boundary of validity is reliability.
- COST – disadvantages, losses, or expenses, in both economic and noneconomic terms, associated with testing or not testing.
  o ECONOMIC COST – monetary expenses (personnel, test protocols, testing venues, etc.).
  o NONECONOMIC COST – intangible losses (loss of trust from patrons due to unqualified personnel).
- BENEFIT – profits, gains, or advantages of testing or not testing.
  o ECONOMIC BENEFIT – monetary benefits (a highly qualified, extroverted salesperson can reach quotas, equivalent to financial gains).
  o NONECONOMIC BENEFIT – an increase in the quality and quantity of workers' performance.

UTILITY ANALYSIS - a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment. Used for:
- Test comparison
- Assessment tools comparison
- Addition of a test/assessment tool
- Determination of non-testing
APPROACHES TO UTILITY ANALYSIS
I. EXPECTANCY TABLES – show the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion.
   a. TAYLOR-RUSSELL TABLES - statistical tables once extensively used to provide test users with an estimate of the extent to which inclusion of a particular test in the selection system would improve selection decisions.
      i. SELECTION RATIO – the ratio of the number of people to be hired to the number of applicants.
      ii. BASE RATE – the percentage of people hired who are expected to be successful in their jobs.
   b. NAYLOR-SHINE TABLES – indicate the mean difference between the newly selected group and the standard/unselected group.
II. BROGDEN-CRONBACH-GLESER (BCG) FORMULA – calculates the dollar amount of the utility gain resulting from the use of a particular selection instrument under specified conditions.
   a. UTILITY GAIN – the estimated benefit of using a particular test.
   b. PRODUCTIVITY GAIN – the estimated increase in work output.
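As a reference point, the BCG utility gain is commonly presented as

    \[ \Delta U = N \cdot T \cdot r_{xy} \cdot SD_y \cdot \bar{Z}_m \; - \; N \cdot C \]

where N = the number of applicants selected, T = average tenure in years, r_xy = the test's validity coefficient, SD_y = the standard deviation of job performance in dollar terms, Z̄_m = the mean standardized test score of those selected, and C = the cost of testing one applicant (symbol conventions vary across sources).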
TYPES OF INTERVIEWS
1. INTAKE INTERVIEW - entails detailed questioning about the present complaints.
2. DIAGNOSTIC INTERVIEW - assignment of a DSM diagnosis.
3. STRUCTURED - a predetermined, planned sequence of questions that the interviewer asks the client.
4. UNSTRUCTURED - no predetermined plan of questions.
5. SEMI-STRUCTURED - usually starts unstructured, followed by structured questions targeting a diagnostic classification.
6. MENTAL STATUS EXAMINATION (MSE) - a quick assessment of how the client/patient is functioning at the time of evaluation.
7. CRISIS INTERVIEW - usually for suicide or abuse cases.
8. CASE HISTORY INTERVIEW - discusses the developmental stages of the patient.

SOURCES OF ERROR IN THE INTERVIEW
I. INTERVIEW VALIDITY
   a. HALO EFFECT
   b. GENERAL STANDOUTISHNESS
   c. CULTURAL DIFFERENCES
   d. INTERVIEWER BIAS
II. INTERVIEW RELIABILITY
   a. MEMORY AND HONESTY OF THE INTERVIEWEE
   b. CLERICAL CAPABILITIES OF THE INTERVIEWER

MEASURING UNDERSTANDING
- LEVEL 1 – little or no relationship to the interviewee's response.
- LEVEL 2 – communicates a superficial awareness of the meaning of a statement.
- LEVEL 3 – interchangeable with the interviewee's statements.
- LEVEL 4 – communicates empathy and adds minimal information/ideas.
- LEVEL 5 – communicates empathy and adds major information/ideas.