
Psych Assessment Midterms Reviewer

Chapter 1
Psychological Testing and Assessment
McGraw-Hill/Irwin
© 2013 McGraw-Hill Companies. All Rights Reserved.
Testing Defined
Testing: The process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.
The objective of testing is typically to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.
Assessment Defined
Assessment: The gathering and integration of psychology-related data for the purpose of making a psychological evaluation through tools such as tests, interviews, case studies, behavioral observation, and other methods.
The objective of assessment is typically to answer a referral question, solve a problem, or arrive at a decision through the tools of evaluation.
Testing in Contrast to Assessment

TESTING (Tester/Test User – Test Taker)
Objective: To obtain some gauge, usually numerical in nature, with regard to an ability or attribute.
Process: Individual or group. After test administration, the tester adds up the number of correct answers or the number of certain types of responses, with little regard for the how or mechanics of the content.
Role of Evaluator: The tester is not the key to the process; one tester may be substituted for another without appreciably affecting the evaluation.
Skill of Evaluator: Testing typically requires technician-like skills in administering and scoring a test as well as in interpreting a test result.
Outcome: Testing yields a test score or series of test scores.

ASSESSMENT (Assessor – Assessee/Client)
Objective: To answer a referral question, solve a problem, or arrive at a decision through the use of tools of evaluation.
Process: Individualized; focuses on how an individual processes rather than the results of that processing.
Role of Evaluator: The assessor is key to the process of selecting tests and/or other tools of evaluation, as well as in drawing conclusions from the entire evaluation.
Skill of Evaluator: Requires educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data.
Outcome: Entails a logical problem-solving approach that brings to bear many sources of data designed to shed light on referral questions.
Assessment
Non-Collaborative Psychological Assessment: An approach to assessment with minimal input from the assessee.
Collaborative Psychological Assessment: The assessor and assessee work as partners.
Therapeutic Psychological Assessment: Therapeutic self-discovery is encouraged through the assessment process.
Meet an Assessment Professional
Dr. Stephen Finn
Assessment
Dynamic Assessment: an interactive approach to assessment that follows an evaluation-intervention-evaluation model. Dynamic assessment is typically employed in educational settings but also may be used in correctional, corporate, neuropsychological, clinical, and other settings.
Tools of Psychological Assessment
The Test
• A psychological test is a device or procedure
designed to measure variables related to
psychology (e.g. intelligence, attitudes,
personality, interests, etc.).
• Psychological tests vary by content, format, administration, scoring, interpretation, and technical quality.
Psychological Tests
Content: The subject matter of the test. Content
depends on the theoretical orientation of test
developers and the unique way in which they define
the construct of interest.
Format: The form, plan, structure, layout of test
items, and other considerations (e.g. time limits).
Administration: Tests may require certain tasks to
be performed, trained observation of performance, or
little involvement by the test administrators (e.g.
self-report questionnaires).
Psychological Tests
Scoring and Interpretation: Scoring of tests may
be simple, such as summing responses to items, or
may require more elaborate procedures.
• Some test results can be interpreted easily, or
interpreted by computer, whereas other tests
require expertise for proper interpretation.
Cut Score: A reference point, usually numerical,
used to divide data into two or more classifications
(e.g. pass or fail).
Psychological Tests
Technical Quality or Psychometric Soundness:
Psychometrics is the science of psychological
measurement. The psychometric soundness of a test
depends on how consistently and accurately the test
measures what it purports to measure.
• Test users are sometimes referred to as
psychometrists or psychometricians.
The Interview
The interview is a method of
gathering information through
direct communication
involving reciprocal exchange
Interviews vary as to their
purpose, length and nature
The quality of information obtained in an interview often
depends on the skills of the interviewer (e.g. their pacing,
rapport, and their ability to convey genuineness, empathy,
and humor)
Other Tools of Psychological Assessment
The Portfolio: A file containing the products of one's work. May serve as a sample of one's abilities and accomplishments.
Case History Data: Information preserved in records, transcripts, or other forms.
Behavioral Observation: Monitoring the actions of people through visual or electronic means.
Other Tools of Psychological Assessment
Role-Play Tests: Assessees are directed to act as if
they were in a particular situation. Useful in
evaluating various skills.
Computers as Tools:
Computers can assist in test
administration, scoring, and
interpretation.
Computers as Tools Contd.
• Scoring may be done on-site (local processing) or at
a central location (central processing).
• Reports may come in the form of a simple scoring
report, extended scoring report, interpretive report,
consultative report, or integrative report.
• Computer Assisted Psychological Assessment
(CAPA) and Computer Adaptive Testing (CAT)
have allowed for tailor-made tests with built-in
scoring and interpretive capabilities.
Computers as Tools Contd.
• Assessment is increasingly conducted via the
internet.
Advantages of Internet Testing
1) Greater access to potential test-users
2) Scoring and interpretation tends to be quicker
3) Costs tend to be lower
4) Facilitates testing otherwise isolated populations
and people with disabilities
Who, What, Why, How, and Where?
Who Are the Parties?
The test developer – tests are created for research
studies, publication (as commercially available
instruments), or as modifications of existing tests.
• The Standards for Educational and Psychological
Testing covers issues related to test construction and
evaluation, test administration and use, special
applications of tests and considerations for
linguistic minorities.
Who are the Parties?
The test user – Tests are used by a wide range of professionals.
• The Standards contains guidelines for who should be administering psychological tests, but many countries have no ethical or legal guidelines for test use.
The test-taker – Anyone who is the subject of an assessment or evaluation is a test-taker.
• Test-takers may differ on a number of variables at the time of testing (e.g. test anxiety, emotional distress, physical discomfort, alertness, etc.).
Who are the Parties?
Society at large – Test developers create tests to
meet the needs of an evolving society.
• Laws and court decisions may play a major role in
test development, administration, and interpretation.
Other parties - Organizations, companies, and
governmental agencies sponsor the development of
tests.
• Companies may offer test scoring and interpretation
• Researchers may review tests and evaluate their
psychometric soundness
What Types of Settings?
Geriatric settings: Assessment primarily evaluates
cognitive, psychological, adaptive, or other
functioning. The issue is quality of life.
Business and military settings: Decisions regarding
careers of personnel are made with a variety of
achievement, aptitude, interest, motivational, and
other tests.
Government and organizational credentialing:
Includes governmental licensing, certification, or
general credentialing of professionals (e.g. attorneys,
physicians, teachers, and psychologists)
What Types of Settings?
Educational settings:
Students typically undergo
school ability tests and
achievement tests.
Diagnostic tests may be used to identify areas for educational intervention.
Educators may also make informal evaluations of their students.
What Types of Settings?
Clinical settings: Includes hospitals, inpatient and
outpatient clinics, private-practice consulting rooms,
schools, and other institutions.
• Assessment tools are used to help screen for or
diagnose behavior problems.
Counseling settings: Includes schools, prisons, and
governmental or privately owned institutions.
• The goal of assessment in these settings is improvement in adjustment, productivity, or related variables.
How are Assessments Conducted?
• There are many different methods used.
• Ethical testers have responsibilities before, during,
and after testing.
Obligations include:
• familiarity with test materials and procedures
• ensuring that the room in which the test will be
conducted is suitable and conducive to the testing
• It is important to establish rapport during test
administration. Rapport can be defined as a working
relationship between the examiner and the examinee.
Assessment of People with Disabilities
• The law mandates “alternate
assessment” – The definition of this
is up to states or school districts
• Accommodations need to be
made – the adaptation of a test,
procedure, or situation, or the
substitution of one test for
another, to make the assessment
more suitable for an assessee
with exceptional needs
Where to go for Information on Tests
• Test catalogues – catalogues distributed by publishers of tests. Usually brief and uncritical descriptions of tests.
• Test manuals – detailed information concerning the development of a particular test and technical information.
• Reference volumes – reference volumes like the Mental Measurements Yearbook or Tests in Print provide detailed information on many tests.
• Journal articles – contain reviews of a test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or an applied context.
Where to go for Information on Tests
• Online databases - Educational Resources Information
Center (ERIC) contains a wealth of resources and news
about tests, testing, and assessment. There are abstracts of
articles, original articles, and links to other useful websites.
• The American Psychological Association (APA) has a
number of databases including PsycINFO, ClinPSYC,
PsycARTICLES, and PsycSCAN.
• Other sources - Directory of Unpublished Experimental
Mental Measures and Tests in Microfiche. Also, university
libraries provide access to online databases such as
PsycINFO and full-text articles.
Questions / Clarifications
Activity 1
1. List the standardized test materials used in the following settings:
A. Educational Setting
B. Clinical Setting
C. Industrial/Organizational Setting
2. Include the name, proponent, and goal/purpose of each test.
3. To be submitted via Teams in the folder for class activities.
4. Create your own folder indicating your Full Name (Dela Cruz, Juan, A) and make sure to label your file appropriately (Activity 1).
Agreed Weights on Board Examination Subjects for Psychometricians

Board Examination Course      Weight   Number of Items
Psychological Assessment      40%      150
Theories of Personality       20%      100
Abnormal Psychology           20%      100
Industrial Psychology         20%      100
Table of Specification for the Psychometrician Board Examination: Psychological Assessment

Outcome                                                           Weight   Items
1. Apply technical concepts, basic principles and tools of
   psychometrics and psychological assessment.                    20%      29
2. Describe the process, research methods and statistics used
   in test development and standardization.                       20%      29
3. Identify the importance, benefits and limitations of
   psychological assessment.                                      10%      19
4. Identify, assess and evaluate the methods and tools of
   psychological assessment relative to the specific purpose
   and context: school, hospital, industry and community.         20%      29
5. Evaluate the administration and scoring procedures of
   intelligence and objective personality tests and other
   alternative forms of tests.                                    15%      22
6. Apply ethical considerations and standards in the various
   dimensions of psychological assessment.                        15%      22
Total                                                             100%     150
A Brief History of
Psychological Testing
2200 B.C. – Chinese begin civil service examinations.
A.D. 1862 – Wilhelm Wundt uses a calibrated pendulum to measure the "speed of thought."
1884 – Francis Galton administers the first test battery to thousands of citizens at the International Health Exhibit.
1890 – James McKeen Cattell uses the term mental test in announcing the agenda for his Galtonian test battery.
1901 – Clark Wissler discovers that Cattellian "brass instruments" tests have no correlation with college grades.
1905 – Binet and Simon invent the first modern intelligence test.
1914 – Stern introduces the IQ, or intelligence quotient: the mental age divided by chronological age.
1916 – Lewis Terman revises the Binet-Simon scales and publishes the Stanford-Binet. Revisions appear in 1937, 1960, and 1986.
1917 – Robert Yerkes spearheads the development of the Army Alpha and Beta examinations used for testing WWI recruits.
1917 – Robert Woodworth develops the Personal Data Sheet, the first personality test.
1920 – Rorschach Inkblot test published.
1921 – Psychological Corporation—the first major test publisher—founded by Cattell, Thorndike, and Woodworth.
1927 – First edition of the Strong Vocational Interest Blank published.
1935 – Henry Murray and Christiana Morgan develop the Thematic Apperception Test.
1939 – Wechsler-Bellevue Intelligence Scale published. Revisions published in 1955, 1981, and 1997.
1942 – Minnesota Multiphasic Personality Inventory published.
1949 – Wechsler Intelligence Scale for Children published. Revisions published in 1974, 1991.
1949 – Raymond B. Cattell introduces the 16PF.
Types of Psychological Tests
• ACHIEVEMENT TESTS – designed to measure what you have already learned.
• APTITUDE TESTS – designed to determine your potential for learning new information or skills.
• PERSONALITY TESTS – designed to measure usual or habitual thoughts, feelings, and behavior.
• ATTITUDE TESTS – designed to elicit personal beliefs and opinions.
• INTEREST TESTS – designed to identify patterns of likes and dislikes useful for making decisions about future careers and job training.
Principles of Test Administration
• Tester must become thoroughly familiar with
the test.
• Tester must maintain an impartial and
scientific attitude.
• Tester must be able to establish and
maintain rapport.
Principles of Test Administration
• Tester must maintain a completely
unrevealing expression while at the same
time silently assuring the subject of his
interest.
• Tester observes the subject’s performance
with care.
Types of Test Administration
• Individual – common for intelligence and ability tests and projective instruments.
• Group – more common than individual administration; tests are administered to a large group at the same time.
Common Intelligence Tests
• The Stanford-Binet Intelligence Scales (5th ed.)
• The Wechsler Adult Intelligence Scale (4th ed.)
• The Wechsler Intelligence Scale for Children (4th ed.)
• The Wechsler Preschool and Primary Scale of Intelligence (3rd ed.)
• Raven’s Progressive Matrices
• Panukat Ng Katalinuhang Pilipino
Common Psychoeducational Test Batteries
• The Kaufman Assessment Battery for Children
• The Woodcock-Johnson III Tests of Cognitive Abilities
Common Aptitude Tests
• Differential Aptitude Test
• Flanagan Industrial Tests
• Philippine Aptitude Classification Test
Common Personality Tests
• NEO Personality Inventory – Revised
• Sixteen Personality Factor Questionnaire
• Myers-Briggs Type Indicator
• Minnesota Multiphasic Personality Inventory
• Panukat Ng Pagkataong Pilipino
• Panukat Ng Ugali At Pagkatao
• Pictorial Self-Concept Scale For Children
• Vineland Adaptive Behavior Scales
Legal and Ethical
Considerations
• Laws – are rules that individuals must obey for the good of society as a whole. These are promulgated by the legislative bodies of government.
• Ethics – is a body of principles of right, proper, or good conduct.
These are crafted by professional organizations and institutions.
What happens if there is a conflict between a
law and the code of ethics?
PAP Code of Ethics on Assessment
Bases for Assessment
• The expert opinions that we provide through our recommendations,
reports, and diagnostic or evaluative statements are based on
substantial information and appropriate assessment techniques.
- We provide expert opinions regarding the psychological characteristics of a person after employing adequate assessment procedures and examination to support our conclusions and recommendations.
PAP Code of Ethics on Assessment
Bases for Assessment
• In instances where we are asked to provide opinions about an
individual without conducting an examination on the basis of review of
existing test results and reports, we discuss the limitations of our
opinions and the basis of our conclusions and recommendations.
PAP Code of Ethics on Assessment
Informed Consent in Assessment
• We gather informed consent prior to the assessment of our clients
except for the following instances:
- when it is mandated by law
- when it is implied, such as in routine educational, institutional, and organizational activity
- when the purpose of the assessment is to determine the individual’s decisional capacity
PAP Code of Ethics on Assessment
Informed Consent in Assessment
• We educate our clients about the nature of our services, financial
arrangements, potential risks, and limits of confidentiality. In instances
where our clients are not competent to provide informed consent on
assessment, we discuss these matters with immediate family members or legal guardians.
PAP Code of Ethics on Assessment
Informed Consent in Assessment
• In instances where a third party interpreter is needed, confidentiality of
test results and the security of the tests must be ensured. The
limitations of the obtained data are discussed in our results,
conclusions and recommendations.
PAP Code of Ethics on Assessment
Assessment Tools
• We judiciously select and administer only those tests which are
pertinent to the reasons for referral and purpose of the assessment.
- We use data collection methods and procedures that are consistent with current scientific and professional developments.
PAP Code of Ethics on Assessment
Assessment Tools
• We use tests that are standardized, valid, reliable, and have normative data directly referable to the population of our clients.
- We administer assessment tools that are appropriate to the language, competence, and other relevant characteristics of our client.
PAP Code of Ethics on Assessment
Obsolete and Outdated Test Results
• We do not base our interpretations, conclusions, and recommendations on outdated test results.
- We do not provide interpretations, conclusions, and recommendations on the basis of obsolete tests.
PAP Code of Ethics on Assessment
Interpreting Assessment Results
• In fairness to our clients, under no circumstances should we report
the test results without taking into consideration the validity,
reliability and appropriateness of the test. We should therefore
indicate our reservations regarding the interpretations.
- We interpret assessment results while considering the purpose
of the assessment and other factors such as the client’s test
taking abilities, characteristics, situational, personal and
cultural differences.
PAP Code of Ethics on Assessment
Release of Test Results
• It is our responsibility to ensure that test results and interpretations
are not used by persons other than those explicitly agreed upon by
the referral sources prior to the assessment procedure.
- We do not release test data in the forms of raw and scaled
scores, client’s responses to test questions or stimuli, and notes
regarding the client’s statements and behaviors during the
examination unless ordered by the court.
PAP Code of Ethics on Assessment
Explaining Assessment Results
• We release test results only to sources of referral, and with written permission from the client if it is a self-referral.
- Where test results have to be communicated to relatives, parents, or teachers, we explain them in non-technical language.
PAP Code of Ethics on Assessment
Explaining Assessment Results
• We explain findings and test results to our clients or designated representatives, except when the relationship precludes the provision of an explanation of results and this is explained in advance to the client.
- When test results need to be shared with schools, social agencies, the courts, or industry, we supervise such releases.
PAP Code of Ethics on Assessment
Test Security
- The administration and handling of all test materials shall be done only by qualified users or personnel.
PAP Code of Ethics on Assessment
Assessment by Unqualified Persons
• We do not promote the use of assessment tools and methods by
unqualified persons except for training purposes with adequate
supervision.
• We ensure that test protocols, their interpretations and all
other records are kept secured from unqualified persons.
PAP Code of Ethics on Assessment
Test Construction
• We develop tests and other assessment tools using current scientific
findings and knowledge, appropriate psychometric properties,
validation and standardization procedures.
APA Committee on Ethical Standards
Test-User Qualifications
Level A – tests that can adequately be administered, scored, and
interpreted with the aid of the manual and a general orientation
to the kind of institution or organization in which one is working.
Level B – tests that require some technical knowledge of test
construction and use of supporting psychological and educational
fields such as statistics, individual differences, psychology of
adjustment, personnel psychology and guidance.
APA Committee on Ethical Standards
Test-User Qualifications
Level C – tests that require substantial understanding of testing
and supporting psychological fields together with supervised
experience in the use of these devices.
APA Committee on Ethical Standards
The Rights of Testtakers
The right of informed consent – Testtakers have the right to know why they are being evaluated, how the test data will be used, and what information will be released to whom.
The right to be informed of test findings – Testtakers have a right to be informed, in language they can understand, of the nature of the findings with respect to a test they have taken. They are also entitled to know what recommendations are being made as a consequence of the test data.
The right to privacy – Recognizes the freedom of the individual to choose for himself the time, circumstances, and particularly the extent to which he wishes to share with or withhold from others his attitudes, beliefs, behavior, and opinions.
The right to confidentiality – Information revealed through the process of psychological assessment is not to be shared with any other person without the consent of the assessee.
The right to the least stigmatizing label – The least stigmatizing labels should always be assigned when reporting test results.
Chapter 3
A Statistics Refresher
Scales of Measurement
Continuous scales – it is theoretically possible to divide any of the values of the scale; such scales typically have a wide range of possible values (e.g. height or a depression scale). For example, a girl’s weight or height, or the length of a road: a weight can be any value, such as 54 kg, 54.5 kg, or 54.5436 kg.
Discrete scales – categorical values (e.g. male or female). For example, when you roll a die, the possible outcomes are 1, 2, 3, 4, 5, or 6 and not 1.5 or 2.45.
Error – the collective influence of all of the factors on a test score beyond those specifically measured by the test.
Scales of Measurement (cont’d.)
Nominal Scales – involve classification or categorization based on one or more distinguishing characteristics; all things measured must be placed into mutually exclusive and exhaustive categories (e.g. apples and oranges, DSM-IV diagnoses, etc.).
Ordinal Scales – involve classification, like nominal scales, but also allow rank ordering (e.g. Olympic medalists).
Scales of Measurement (cont’d.)
Interval Scales - contain equal intervals between
numbers. Each unit on the scale is exactly equal to any
other unit on the scale (e.g. IQ scores and most other
psychological measures).
Ratio Scales – Interval scales with a true zero point
(e.g. height or reaction time).
Psychological Measurement – Most psychological
measures are truly ordinal but are treated as interval
measures for statistical purposes.
Describing Data
Distributions - a set of test scores arrayed for
recording or study.
Raw Score - a straightforward, unmodified
accounting of performance that is usually
numerical.
Frequency Distribution - all scores are listed
alongside the number of times each score occurred
Describing Data
Frequency distributions may be presented in tabular form. A simple frequency distribution is one in which the scores have not been grouped.
Describing Data
Grouped frequency distributions have class intervals
rather than actual test scores
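To make the distinction concrete, here is a minimal Python sketch (not from the slides) that builds a simple and a grouped frequency distribution from invented raw scores:

```python
# Minimal sketch: simple vs. grouped frequency distributions.
# Scores are invented for illustration.
from collections import Counter

scores = [78, 85, 85, 67, 92, 85, 78, 71, 88, 92, 60, 74]

# Simple frequency distribution: each score alongside its frequency.
simple = Counter(scores)
for score in sorted(simple, reverse=True):
    print(score, simple[score])

# Grouped frequency distribution: class intervals (width 10) replace
# the actual test scores.
grouped = Counter((s // 10) * 10 for s in scores)
for lower in sorted(grouped, reverse=True):
    print(f"{lower}-{lower + 9}: {grouped[lower]}")
```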
Describing Data
A histogram is a graph with vertical lines drawn at the true limits of
each test score (or class interval), forming a series of contiguous rectangles
Describing Data
Bar graph – numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/no/maybe, male/female) appears on the X-axis.
Describing Data
Frequency polygon – test scores or class intervals (as indicated on the X-axis) meet frequencies (as indicated on the Y-axis).
Types of Distribution
Bimodal literally means "two modes" and is typically used to describe
distributions of values that have two centers. For example, the distribution
of heights in a sample of adults might have two peaks, one for women and
one for men.
Skewness is a measure of the asymmetry of a distribution. A distribution is
asymmetrical when its left and right side are not mirror images. A
distribution can have right (or positive), left (or negative), or zero skewness.
A J-curve is a frequency distribution that is extremely asymmetrical, in that the initial (or final) frequency group contains the highest frequency, with succeeding frequencies becoming smaller (or larger) elsewhere; the shape of the curve roughly approximates the letter “J” lying on its side.
Measures of Central Tendency
Central tendency - a statistic that indicates the average or
midmost score between the extreme scores in a
distribution.
Mean - Sum of the observations (or test scores), in
this case divided by the number of observations.
Median – The middle score in a distribution. Particularly useful
when there are outliers, or extreme scores in a distribution.
Mode – The most frequently occurring score in a distribution.
When two scores occur with the highest frequency a
distribution is said to be bimodal.
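A minimal Python sketch of the three measures on invented scores; note how the outlier pulls the mean above the median:

```python
# Mean, median, and mode from the standard library; scores are invented.
import statistics

scores = [10, 12, 12, 14, 15, 15, 15, 18, 40]  # 40 is an outlier

print(statistics.mean(scores))    # ~16.78: sum of scores / number of scores
print(statistics.median(scores))  # 15: middle score, resistant to the outlier
print(statistics.mode(scores))    # 15: most frequently occurring score
```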
Measures of Variability
Variability is an indication of the degree to which scores
are scattered or dispersed in a distribution.
Distributions A and B have the same mean score, but Distribution B has greater variability in scores (scores are more spread out).
Measures of Variability
Measures of variability are statistics that describe the
amount of variation in a distribution.
Range - difference between the highest and the lowest
scores.
Average deviation – the average deviation of scores in
a distribution from the mean.
Variance - the arithmetic mean of the squares of the
differences between the scores in a distribution and
their mean
Standard deviation – the square root of the average squared deviations about the mean; that is, the square root of the variance. It represents the typical distance of scores from the mean.
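A minimal sketch computing these variability statistics from first principles on invented scores (variance here is the population form, dividing by N, to match the definition above):

```python
# Range, average deviation, variance, and standard deviation; scores invented.
scores = [4, 8, 6, 5, 3, 7, 9, 6]
n = len(scores)
mean = sum(scores) / n

score_range = max(scores) - min(scores)              # highest minus lowest
avg_dev = sum(abs(x - mean) for x in scores) / n     # mean absolute deviation
variance = sum((x - mean) ** 2 for x in scores) / n  # mean squared deviation
std_dev = variance ** 0.5                            # square root of variance

print(score_range, avg_dev, variance, std_dev)
```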
Measures of Variability
Skewness - the nature and extent to which symmetry is
absent in a distribution.
• Positive skew - relatively few of the scores fall at the high
end of the distribution.
• Negative skew – relatively few of the scores fall at the low
end of the distribution.
Kurtosis – the sharpness of the peak of a frequency-distribution curve.
• Platykurtic – relatively flat.
• Leptokurtic – relatively peaked.
• Mesokurtic – somewhere in the middle.
The Normal Curve
The normal curve is a bell-shaped, smooth, mathematically
defined curve that is highest at its center. Perfectly symmetrical.
Area Under the Normal Curve
The normal curve can be
conveniently divided into areas
defined by units of standard
deviations.
Standard Scores
A standard score is a raw score that has been converted from
one scale to another scale, where the latter scale has some
arbitrarily set mean and standard deviation.
Z-score - conversion of a raw score into a number
indicating how many standard deviation units the
raw score is below or above the mean of the
distribution.
T scores - can be called a fifty plus or minus ten scale; that is, a
scale with a mean set at 50 and a standard deviation set at 10
Stanine - a standard score with a mean of 5 and a standard
deviation of approximately 2. Divided into nine units.
Normalizing a distribution - involves “stretching” the skewed
curve into the shape of a normal curve and creating a
corresponding scale of standard scores
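A minimal sketch of the conversions described above; the raw-score mean and standard deviation are assumed values, not from the slides:

```python
# Converting one raw score into z, T, and stanine units.
mean, sd = 50.0, 8.0  # distribution parameters (assumed for illustration)
raw = 62.0

z = (raw - mean) / sd                       # SD units above/below the mean
t = 50 + 10 * z                             # "fifty plus or minus ten" scale
stanine = min(9, max(1, round(5 + 2 * z)))  # mean 5, SD ~2, clipped to 1-9

print(z, t, stanine)  # 1.5, 65.0, 8
```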
Correlation and Inference
• A coefficient of correlation (or correlation coefficient)
is a number that provides us with an index of the
strength of the relationship between two things.
• Correlation coefficients vary in magnitude between -1
and +1. A correlation of 0 indicates no relationship
between two variables.
• Positive correlations indicate that as one variable
increases or decreases, the other variable follows suit.
• Negative correlations indicate that as one variable
increases the other decreases.
• Correlation between variables does not imply
causation but it does aid in prediction.
Correlation and Inference
Pearson r: A method of computing correlation when
both variables are linearly related and continuous.
Once a correlation coefficient is obtained, it needs to
be checked for statistical significance (typically a
probability level below .05).
By squaring r, one is able to obtain a coefficient of
determination, or the variance that the variables share
with one another.
Spearman Rho: A method for computing correlation,
used primarily when sample sizes are small or the
variables are ordinal in nature.
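A minimal sketch using scipy.stats (an assumed dependency) to compute these statistics on invented paired scores:

```python
# Pearson r, its significance level, r-squared, and Spearman rho.
from scipy import stats

x = [2, 4, 5, 7, 9, 10, 12]
y = [1, 3, 4, 6, 8, 11, 13]

r, p = stats.pearsonr(x, y)         # linear correlation and p-value
rho, p_rho = stats.spearmanr(x, y)  # rank-order correlation for ordinal data

print(f"r = {r:.3f}, p = {p:.4f}")
print(f"coefficient of determination r^2 = {r ** 2:.3f}")
print(f"rho = {rho:.3f}")
```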
Correlation and Inference
Scatterplot – Involves simply plotting one variable on the X
(horizontal) axis and the other on the Y (vertical) axis
Scatterplots of no correlation (left) and moderate correlation (right)
Correlation and Inference
Scatterplots of strong correlations feature points tightly clustered
together in a diagonal line. For positive correlations the line goes from
bottom left to top right.
Correlation and Inference
Strong negative correlations form a tightly clustered diagonal
line from top left to bottom right.
Correlation and Inference
Outlier – an extremely atypical point (case), lying relatively
far away from the other points in a scatterplot
Correlation and Inference
Restriction of range leads to weaker correlations
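A small simulation (not from the slides) makes the point: selecting only high scorers on one variable shrinks the observed correlation:

```python
# Demonstrating restriction of range with simulated data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.6 * x + 0.8 * rng.normal(size=5000)  # population correlation about .60

full_r = np.corrcoef(x, y)[0, 1]

keep = x > 1.0  # e.g., only applicants with high test scores were selected
restricted_r = np.corrcoef(x[keep], y[keep])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted r = {restricted_r:.2f}")
```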
Meta-Analysis
• Meta-analysis allows researchers to look at the
relationship between variables across many
separate studies.
• Meta-analysis- a family of techniques to
statistically combine information across studies to
produce single estimates of the data under study.
• The estimates are in the form of effect size, which
is often expressed as a correlation coefficient.
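One common computational step in such a meta-analysis is pooling correlations via Fisher’s z transform, weighting each study by its sample size; a toy sketch with invented study data:

```python
# Pooling correlation coefficients across studies (Fisher's z method).
import math

studies = [(0.30, 120), (0.45, 80), (0.25, 200)]  # (r, sample size n), invented

num = sum((n - 3) * math.atanh(r) for r, n in studies)  # z = atanh(r)
den = sum(n - 3 for _, n in studies)
mean_z = num / den

pooled_r = math.tanh(mean_z)  # back-transform z to a correlation
print(f"pooled effect size r = {pooled_r:.3f}")
```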
Chapter 4
Of Tests and Testing
Assumptions about Psychological Testing
Psychological Traits and States Exist
• A trait has been defined as “any distinguishable, relatively
enduring way in which one individual varies from another”
(Guilford, 1959, p. 6).
• States also distinguish one person from another but are
relatively less enduring (Chaplin et al., 1988).
• Thousands of trait terms can be found in the English
language (e.g. outgoing, shy, reliable, calm, etc.).
• Psychological traits exist as constructs - an informed,
scientific concept developed or constructed to describe or
explain behavior.
• We can’t see, hear, or touch constructs, but we can infer their
existence from overt behavior, such as test scores.
Assumptions about Psychological Testing
• Traits are relatively stable. They
may change over time, yet there
are often high correlations
between trait scores at different
time points.
• The nature of the situation
influences how traits will be
manifested.
• Traits refer to ways in which one
individual varies, or differs, from
another
Some people score higher than others
on traits like sensation-seeking
Assumptions about Psychological Testing
Traits and States Can Be Quantified and Measured
• Different test developers may define and measure constructs in different ways.
• Once a construct is defined, test developers turn to item content and item weighting.
• A scoring system and a way to interpret results need to be devised.
Assumptions about Psychological Testing
Test-Related Behavior Predicts Non-Test-Related Behavior
Responses on tests are thought to predict real-world
behavior. The obtained sample of behavior is expected
to predict future behavior.
Tests Have Strengths and Weaknesses
Competent test users understand and appreciate the
limitations of the tests they use as well as how those
limitations might be compensated for by data from
other sources.
Assumptions about Psychological Testing
Various Sources of Error are Part of Assessment
Error refers to a long-standing assumption that factors
other than what a test attempts to measure will
influence performance on the test.
Error variance - the component of a test score
attributable to sources other than the trait or ability
measured.
• Both the assessee and assessor are sources of error
variance
Assumptions about Psychological Testing
Testing and Assessment can be Conducted in a Fair Manner
• All major test publishers strive to develop instruments that are
fair when used in strict accordance with guidelines in the test
manual.
• Problems arise if the test is used with people for whom it was
not intended.
• Some problems are more political than psychometric in
nature.
Testing and Assessment Benefit Society
• There is a great need for tests, especially good tests,
considering the many areas of our lives that they benefit.
What’s a “Good Test?”
Reliability: The consistency of the measuring tool: the
precision with which the test measures and the extent to
which error is present in measurements.
Validity: The test measures what it purports to measure.
Other considerations: Administration, scoring,
interpretation should be straightforward for trained
examiners. A good test is a useful test that will ultimately
benefit individual testtakers or society at large.
Norms
• Norm-referenced testing and assessment: a method of
evaluation and a way of deriving meaning from test
scores by evaluating an individual testtaker’s score and
comparing it to scores of a group of testtakers.
• The meaning of an individual test score is understood
relative to other scores on the same test.
• Norms are the test performance data of a particular group
of testtakers that are designed for use as a reference when
evaluating or interpreting individual test scores.
• A normative sample is the reference group to which testtakers are compared.
Sampling to Develop Norms
Standardization: The process of administering a test to
a representative sample of testtakers for the
purpose of establishing norms.
Sampling – Test developers select a population, for
which the test is intended, that has at least one common,
observable characteristic.
Stratified sampling: Sampling that includes different
subgroups, or strata, from the population.
Stratified-random sampling: Every member of the
population has an equal opportunity of being included in
a sample.
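A minimal sketch of stratified-random sampling: divide the population into strata, then draw randomly from each stratum in proportion to its size. The strata and population here are hypothetical:

```python
# Proportional stratified-random sampling.
import random

population = ([("urban", i) for i in range(600)] +
              [("rural", i) for i in range(400)])

def stratified_random_sample(pop, sample_size):
    strata = {}
    for person in pop:  # group members by stratum label
        strata.setdefault(person[0], []).append(person)
    sample = []
    for members in strata.values():
        k = round(sample_size * len(members) / len(pop))  # proportional share
        sample.extend(random.sample(members, k))          # random within stratum
    return sample

sample = stratified_random_sample(population, 100)
print(len(sample))  # ~100 (about 60 urban, 40 rural)
```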
Sampling to Develop Norms
Purposive sample: Arbitrarily selecting a sample that
is believed to be representative of the population.
Incidental/convenience sample: A sample that is
convenient or available for use. May not be
representative of the population.
• Generalization of findings from convenience
samples must be made with caution.
Sampling to Develop Norms
Developing Norms
Having obtained a sample test developers:
• Administer the test with standard set of instructions
• Recommend a setting for test administration
• Collect and analyze data
• Summarize data using descriptive statistics, including measures of central tendency and variability
• Provide a detailed description of the standardization sample itself
Types of Norms
• Percentile - the percentage of people whose score
on a test or measure falls below a particular raw
score.
• Percentiles are a popular method for organizing
test-related data because they are easily calculated.
• One problem is that real differences between raw
scores may be minimized near the ends of the
distribution and exaggerated in the middle of the
distribution.
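A minimal sketch of the definition above, using a hypothetical norm group:

```python
# Percentile rank: percentage of normative scores below a given raw score.
def percentile_rank(norm_scores, raw_score):
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)

norms = [55, 60, 62, 65, 68, 70, 71, 74, 78, 85]  # hypothetical norm group
print(percentile_rank(norms, 70))  # 50.0 -> a raw score of 70 is the 50th percentile
```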
Types of Norms (cont’d.)
Age norms: average performance of different samples of testtakers who were at various ages when the test was administered.
Grade norms: the average test performance of testtakers in a
given school grade.
National norms: derived from a normative sample that was
nationally representative of the population at the time the
norming study was conducted.
National anchor norms: An equivalency table for scores on two
different tests. Allows for a basis of comparison.
Subgroup norms: A normative sample can be segmented by any
of the criteria initially used in selecting subjects for the sample.
Local norms: provide normative information with respect to the
local population’s performance on some test.
Fixed Reference Group Scoring Systems
Fixed Reference Group Scoring Systems: The
distribution of scores obtained on the test from one group
of testtakers is used as the basis for the calculation of test
scores for future administrations of the test.
• The SAT employs this method.
Norm-Referenced versus Criterion-Referenced
Interpretation
Norm referenced tests involve comparing individuals to
the normative group. With criterion referenced tests
testtakers are evaluated as to whether they meet a set
standard (e.g. a driving exam).
Culture and Inference
• In selecting a test for use, responsible test users
should research the test’s available norms to check
how appropriate they are for use with the targeted
testtaker population.
• When interpreting test results it helps to know
about the culture and era of the test-taker.
• It is important to conduct culturally informed
assessment.
Chapter 7
Utility
What is Utility?
Utility: the usefulness or practical value of testing to
improve efficiency.
Factors Affecting Utility
Psychometric soundness – Generally, the higher the
criterion validity of a test the greater the utility.
• There are exceptions because many factors affect the
utility of an instrument and utility is assessed in many
different ways.
• Valid tests are not always useful tests.
Factors Affecting Utility
Costs – One of the most basic elements of a utility analysis is
the financial cost associated with a test.
• Cost in the context of test utility refers to disadvantages,
losses, or expenses in both economic and noneconomic terms.
• Economic costs may include purchasing a test, a supply bank
of test protocols, and computerized test processing.
• Other economic costs are more difficult to calculate such as
the cost of not testing or testing with an inadequate instrument.
• Non-economic costs include things such as human life and
safety
Factors Affecting Utility
Benefits – We should take into account whether the benefits
of testing justify the costs of administering, scoring, and
interpreting the test.
• Benefits can be defined as profits, gains, or advantages.
• Successful testing programs can yield higher worker
productivity and profits for a company.
• Some potential benefits include: an increase in the quality
of workers’ performance; an increase in the quantity of
workers’ performance; a decrease in the time needed to train
workers; a reduction in the number of accidents; a reduction
in worker turnover.
• Non-economic benefits may include a better work
environment and improved morale.
Utility Analysis
Utility Analysis: a family of techniques that entail a cost–
benefit analysis designed to yield information relevant to
a decision about the usefulness and/or practical value of a
tool of assessment.
• Some utility analyses are straightforward, while others are more sophisticated, employing complicated mathematical models.
• The endpoint of a utility analysis is an educated decision as to which of several alternative courses of action is optimal in terms of costs and benefits.
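The slides do not give a formula, but one widely used model for this kind of cost-benefit estimate is the Brogden-Cronbach-Gleser productivity gain; a sketch with invented figures:

```python
# Brogden-Cronbach-Gleser utility estimate (all input values invented).
def bcg_utility(n_hired, tenure_years, validity, sd_y, mean_zx,
                cost_per_applicant, n_applicants):
    gain = n_hired * tenure_years * validity * sd_y * mean_zx
    cost = n_applicants * cost_per_applicant
    return gain - cost

# 10 hires staying 2 years; test validity .40; SD of job performance in
# money terms 40,000; mean standardized test score of those hired 1.0;
# 100 applicants tested at 500 each.
print(bcg_utility(10, 2, 0.40, 40_000, 1.0, 500, 100))  # 270,000 net gain
```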
Practical Considerations
The pool of job applicants – Some utility models are
based on the assumption that for a particular position there
is a limitless pool of candidates.
• However, some jobs require such expertise or sacrifice
that the pool of qualified candidates may be very small.
• The economic climate also affects the size of the pool.
• The top performers on a selection test may not accept a
job offer.
Practical Considerations
The complexity of the job – The same utility models are used for a
variety of positions, yet the more complex the job the bigger the
difference in people who perform well or poorly.
The cut score in use – Cut scores may be relative, in which case
they are determined in reference to normative data (e.g. selecting
people in the top 10% of test scores).
Fixed cut scores are made on the basis of having achieved a
minimum level of proficiency on a test (e.g. a driving license exam).
Multiple cut scores – The use of multiple cut scores for a single
predictor (e.g. students may achieve grades of A, B, C, D, or E).
Multiple hurdles - achievement of a particular cut score on one test
is necessary in order to advance to the next stage of evaluation in
the selection process (e.g. Miss America contest).
Methods of Setting Cut Scores
The Angoff Method: judgments of experts are averaged to
yield cut scores for the test.
• Can be used for personnel selection, traits, attributes, and
abilities.
• Problems arise if there is low agreement between experts.
The Known Groups Method: entails collection of data on
the predictor of interest from groups known to possess, and
not to possess, a trait, attribute, or ability of interest.
• After analysis of the data, a cut score is chosen that best discriminates between the groups.
• One problem with the known groups method is that no standard set of guidelines exists for establishing the groups.
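A minimal sketch of the Angoff computation: each expert rates, for every item, the probability that a minimally competent testtaker would answer correctly; the per-item means are summed to give the cut score. The judgment matrix is invented:

```python
# Angoff method: average expert judgments per item, then sum across items.
judgments = [            # rows = experts, columns = items (invented)
    [0.8, 0.6, 0.9, 0.5],
    [0.7, 0.5, 0.8, 0.6],
    [0.9, 0.6, 0.9, 0.4],
]

n_experts = len(judgments)
item_means = [sum(col) / n_experts for col in zip(*judgments)]
cut_score = sum(item_means)  # expected score of a minimally competent taker

print(f"cut score = {cut_score:.1f} out of {len(item_means)} items")  # 2.7 / 4
```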
Methods of Setting Cut Scores
IRT Based Methods: In an IRT framework, each item is
associated with a particular level of difficulty.
• In order to “pass” the test, the testtaker must answer
items that are deemed to be above some minimum level
of difficulty, which is determined by experts and serves
as the cut score.
Activity 3
Each group must write a topic/variable proposal for their test development.
Indicate what variable(s) will be measured and what kind of methodology and theoretical framework will be used.
To be submitted Jan. 10, 2023.
Test Development
The Five Stages of Test Development
Test development is an umbrella term for all that goes into the process of creating a test. The five stages are test conceptualization, test construction, test tryout, item analysis, and test revision.
TEST CONCEPTUALIZATION
• The impetus for developing a new test is some thought that “there ought to be a test for…”
• The stimulus could be knowledge of psychometric problems with other tests, a new social phenomenon, or any number of things.
• There may be a need to assess mastery in an emerging occupation.
TEST CONCEPTUALIZATION
Some preliminary questions:
• What is the test designed to measure?
• What is the objective of the test?
• Is there a need for this test?
• Who will use this test?
• Who will take this test?
• What content will the test cover?
• How will the test be administered?
• What is the ideal format of the test?
• Should more than one form of the test be developed?
• What special training will be required of test users for administering or interpreting the test?
• What types of responses will be required of testtakers?
• Who benefits from an administration of this test?
• Is there any potential for harm as the result of an administration of this test?
• How will meaning be attributed to scores on this test?
ITEM DEVELOPMENT IN NORM-REFERENCED AND CRITERION-REFERENCED TESTS
• Generally, a good item on a norm-referenced achievement test is an item for which high scorers on the test respond correctly and low scorers respond incorrectly.
• Ideally, each item on a criterion-oriented test addresses the issue of whether the respondent has met certain criteria.
• Development of a criterion-referenced test may entail exploratory work with at least two groups of testtakers: one group known to have mastered the knowledge or skill being measured and another group known not to have mastered it.
• Test items may be pilot studied to evaluate whether they should be included in the final form of the instrument.
TEST CONSTRUCTION
Scaling: the process of setting rules for assigning numbers in measurement.
Types of scales: scales are instruments used to measure some trait, state, or ability; they may be categorized in many ways (e.g. unidimensional, multidimensional, etc.).
L.L. Thurstone was very influential in the development of sound scaling methods.
TEST CONSTRUCTION – SCALING METHODS
Numbers can be assigned to responses to calculate test
scores using a number of methods.
Rating Scales - a grouping of words, statements, or
symbols on which judgments of the strength of a
particular trait, attitude, or emotion are indicated by
the testtaker.
Likert scale - Each item presents the testtaker with five
alternative responses (sometimes seven), usually on an
agree–disagree or approve–disapprove continuum.
[Figure: an example of a Likert scale]
TEST CONSTRUCTION – SCALING METHODS
• Likert scales are typically reliable.
• All rating scales result in ordinal-level data.
• Some rating scales are unidimensional, meaning that only one dimension is presumed to underlie the ratings.
• Other rating scales are multidimensional, meaning that more than one dimension is thought to underlie the ratings.
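A minimal sketch of scoring a five-point Likert scale, including the usual reversal of negatively worded items; the response coding and items are invented:

```python
# Summative (Likert) scoring with reverse-keyed items.
AGREEMENT = {"SD": 1, "D": 2, "N": 3, "A": 4, "SA": 5}

def score_likert(responses, reverse_keyed=()):
    total = 0
    for i, answer in enumerate(responses):
        value = AGREEMENT[answer]
        if i in reverse_keyed:  # flip negatively worded items: 1<->5, 2<->4
            value = 6 - value
        total += value
    return total

# Four items; the item at index 2 is negatively worded.
print(score_likert(["A", "SA", "D", "N"], reverse_keyed={2}))  # 4+5+4+3 = 16
```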
TEST CONSTRUCTION – SCALING METHODS
Method of Paired Comparisons – Test-takers must choose
between two alternatives according to some rule.
• For each pair of options, testtakers receive a higher score
for selecting the option deemed more justifiable by the
majority of a group of judges.
• The test score would reflect the number of times the
choices of a testtaker agreed with those of the judges.
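A minimal sketch of that scoring rule, with invented item pairs and judge keys:

```python
# Method of paired comparisons: one point per agreement with the judges.
pairs = [("cheat on taxes", "accept a bribe"),
         ("keep found money", "return found money")]
judge_key = ["cheat on taxes", "return found money"]  # judged more justifiable

def score_paired_comparisons(choices, key):
    return sum(1 for chosen, keyed in zip(choices, key) if chosen == keyed)

choices = ["cheat on taxes", "keep found money"]  # a testtaker's selections
print(score_paired_comparisons(choices, judge_key))  # 1 of 2 agreements
```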
TEST CONSTRUCTION – SCALING METHODS
• Comparative scaling: entails judgments of a stimulus in comparison with every other stimulus on the scale.
• Categorical scaling: stimuli (e.g. index cards) are placed into one of two or more alternative categories.
• Guttman scale: items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured. All respondents who agree with the stronger statements of the attitude will also agree with milder statements (a consistency check is sketched after this list).
• The method of equal-appearing intervals can be used to obtain data that are interval in nature.
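Because Guttman items are cumulative, a response pattern is consistent only if endorsing a stronger item is accompanied by endorsing every milder one; a minimal sketch of that check, with invented responses:

```python
# Checking a Guttman (cumulative) response pattern.
def is_guttman_pattern(responses):
    # Items ordered mildest -> strongest; valid patterns are 1s then 0s,
    # i.e., the sequence never increases.
    return sorted(responses, reverse=True) == list(responses)

print(is_guttman_pattern([1, 1, 1, 0, 0]))  # True: cumulative pattern
print(is_guttman_pattern([1, 0, 1, 0, 0]))  # False: skipped a milder item
```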
TEST CONSTRUCTION – WRITING ITEMS
Item pool: the reservoir or well from which items will or will not be drawn for the final version of the test.
• Comprehensive sampling provides a basis for the content validity of the final version of the test.
Item format: includes variables such as the form, plan, structure, arrangement, and layout of individual test items.
• Selected-response format – items require testtakers to select a response from a set of alternative responses.
• Constructed-response format – items require testtakers to supply or to create the correct answer, not merely to select it.
TEST CONSTRUCTION – WRITING ITEMS
Multiple-choice format has three elements: (1) a stem,
(2) a correct alternative or option, and (3) several
incorrect alternatives or options variously referred to
as distractors or foils.
Other commonly used selected-response formats include matching and true-false items.