Chapter 1: Psychological Testing and Assessment
McGraw-Hill/Irwin © 2013 McGraw-Hill Companies. All Rights Reserved.

Testing Defined
Testing: the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior. The objective of testing is typically to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.

Assessment Defined
Assessment: the gathering and integration of psychology-related data for the purpose of making a psychological evaluation, through tools such as tests, interviews, case studies, behavioral observation, and other methods. The objective of assessment is typically to answer a referral question, solve a problem, or arrive at a decision through the tools of evaluation.

Testing in Contrast to Assessment

TESTING (Tester/Test User – Test Taker)
• Objective: to obtain some gauge, usually numerical in nature, with regard to an ability or attribute.
• Process: individual or group. After test administration, the tester adds up the number of correct answers or the number of certain types of responses, with little regard for the how or mechanics of the content.
• Role of Evaluator: the tester is not the key to the process; one tester may be substituted for another without appreciably affecting the evaluation.

ASSESSMENT (Assessor – Assessee/Client)
• Objective: to answer a referral question, solve a problem, or arrive at a decision through the use of tools of evaluation.
• Process: individualized; focuses on how an individual processes rather than the results of that processing.
• Role of Evaluator: the assessor is key to the process of selecting tests and/or other tools of evaluation, as well as in drawing conclusions from the entire evaluation.
• Skill of Evaluator (Testing): testing typically requires technician-like skills in administering and scoring a test, as well as in interpreting a test result.
• Skill of Evaluator (Assessment): assessment requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data.
• Outcome (Testing): testing yields a test score or series of test scores.
• Outcome (Assessment): assessment entails a logical problem-solving approach that brings to bear many sources of data designed to shed light on referral questions.

Approaches to Assessment
• Non-Collaborative Psychological Assessment: assessment with minimal input from the assessee.
• Collaborative Psychological Assessment: the assessor and assessee work as partners.
• Therapeutic Psychological Assessment: therapeutic self-discovery is encouraged through the assessment process.

Meet an Assessment Professional: Dr. Stephen Finn

Dynamic Assessment
• Follows a cycle of evaluation, intervention, and re-evaluation.
• Typically employed in educational settings, but may also be used in correctional, corporate, neuropsychological, clinical, and other settings.

Tools of Psychological Assessment
The Test
• A psychological test is a device or procedure designed to measure variables related to psychology (e.g., intelligence, attitudes, personality, interests).
• Psychological tests vary by content, format, administration, scoring, interpretation, and technical quality.

Psychological Tests
• Content: the subject matter of the test. Content depends on the theoretical orientation of the test developers and the unique way in which they define the construct of interest.
• Format: the form, plan, structure, and layout of test items, and other considerations (e.g., time limits).
• Administration: tests may require certain tasks to be performed, trained observation of performance, or little involvement by the test administrator (e.g., self-report questionnaires).
• Scoring and Interpretation: scoring may be simple, such as summing responses to items, or may require more elaborate procedures. Some test results can be interpreted easily, or by computer, whereas other tests require expertise for proper interpretation.
• Cut Score: a reference point, usually numerical, used to divide data into two or more classifications (e.g., pass or fail).
• Technical Quality or Psychometric Soundness: psychometrics is the science of psychological measurement. The psychometric soundness of a test depends on how consistently and accurately the test measures what it purports to measure. Test users are sometimes referred to as psychometrists or psychometricians.

The Interview
• A method of gathering information through direct communication involving a reciprocal exchange.
• Interviews vary as to their purpose, length, and nature.
• The quality of information obtained in an interview often depends on the skills of the interviewer (e.g., pacing, rapport, and the ability to convey genuineness, empathy, and humor).

Other Tools of Psychological Assessment
• The Portfolio: a file containing the products of one's work; may serve as a sample of one's abilities and accomplishments.
• Case History Data: information preserved in records, transcripts, or other forms.
• Behavioral Observation: monitoring the actions of people through visual or electronic means.
• Role-Play Tests: assessees are directed to act as if they were in a particular situation; useful in evaluating various skills.
• Computers as Tools: computers can assist in test administration, scoring, and interpretation.

Computers as Tools (cont'd.)
• Scoring may be done on-site (local processing) or at a central location (central processing).
• Reports may come in the form of a simple scoring report, extended scoring report, interpretive report, consultative report, or integrative report.
• Computer Assisted Psychological Assessment (CAPA) and Computer Adaptive Testing (CAT) have allowed for tailor-made tests with built-in scoring and interpretive capabilities.
• Assessment is increasingly conducted via the Internet.

Advantages of Internet Testing
1) Greater access to potential test users
2) Scoring and interpretation tend to be quicker
3) Costs tend to be lower
4) Facilitates testing of otherwise isolated populations and of people with disabilities

Who, What, Why, How, and Where?
Who Are the Parties?
• The test developer: tests are created for research studies, for publication (as commercially available instruments), or as modifications of existing tests. The Standards for Educational and Psychological Testing covers issues related to test construction and evaluation, test administration and use, special applications of tests, and considerations for linguistic minorities.
• The test user: tests are used by a wide range of professionals. The Standards contains guidelines for who should administer psychological tests, but many countries have no ethical or legal guidelines for test use.
• The test-taker: anyone who is the subject of an assessment or evaluation. Test-takers may differ on a number of variables at the time of testing (e.g., test anxiety, emotional distress, physical discomfort, alertness).
• Society at large: test developers create tests to meet the needs of an evolving society. Laws and court decisions may play a major role in test development, administration, and interpretation.
• Other parties: organizations, companies, and governmental agencies sponsor the development of tests.
• Companies may offer test scoring and interpretation services.
• Researchers may review tests and evaluate their psychometric soundness.

What Types of Settings?
• Geriatric settings: assessment primarily evaluates cognitive, psychological, adaptive, or other functioning. The central issue is quality of life.
• Business and military settings: decisions regarding the careers of personnel are made with a variety of achievement, aptitude, interest, motivational, and other tests.
• Government and organizational credentialing: includes governmental licensing, certification, or general credentialing of professionals (e.g., attorneys, physicians, teachers, and psychologists).
• Educational settings: students typically undergo school ability tests and achievement tests. Diagnostic tests may be used to identify areas for educational intervention. Educators may also make informal evaluations of their students.
• Clinical settings: includes hospitals, inpatient and outpatient clinics, private-practice consulting rooms, schools, and other institutions. Assessment tools are used to help screen for or diagnose behavior problems.
• Counseling settings: includes schools, prisons, and governmental or privately owned institutions. The goal of assessment in this setting is improvement in adjustment, productivity, or a related variable.

How Are Assessments Conducted?
• There are many different methods. Ethical testers have responsibilities before, during, and after testing. Obligations include:
  • familiarity with test materials and procedures
  • ensuring that the room in which the test will be conducted is suitable and conducive to testing
• It is important to establish rapport during test administration. Rapport can be defined as a working relationship between the examiner and the examinee.
Assessment of People with Disabilities
• The law mandates "alternate assessment"; the definition of this is left to states or school districts.
• Accommodations need to be made: the adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.

Where to Go for Information on Tests
• Test catalogues: catalogues distributed by test publishers; usually brief, uncritical descriptions of tests.
• Test manuals: detailed information concerning the development of a particular test, plus technical information.
• Reference volumes: volumes such as the Mental Measurements Yearbook or Tests in Print provide detailed information on many tests.
• Journal articles: contain reviews of a test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in research or applied contexts.
• Online databases: the Educational Resources Information Center (ERIC) contains a wealth of resources and news about tests, testing, and assessment, including abstracts, original articles, and links to other useful websites. The American Psychological Association (APA) maintains a number of databases, including PsycINFO, ClinPSYC, PsycARTICLES, and PsycSCAN.
• Other sources: the Directory of Unpublished Experimental Mental Measures and Tests in Microfiche. University libraries also provide access to online databases such as PsycINFO and to full-text articles.

Questions / Clarifications

Activity 1
1. List the different standardized test materials used in the following settings:
   A. Educational Setting
   B. Clinical Setting
   C. Industrial/Organizational Setting
2. Include the name, proponent, and goal/purpose of each test.
3. Submit via Teams in the folder for class activities.
4. Create your own folder indicating your full name (Dela Cruz, Juan, A.) and make sure to label your file appropriately (Activity 1).

Agreed Weights on Board Examination Subjects for Psychometricians
• Psychological Assessment: 40% (150 items)
• Theories of Personality: 20% (100 items)
• Abnormal Psychology: 20% (100 items)
• Industrial Psychology: 20% (100 items)

Table of Specification for the Psychometrician Board Examination: Psychological Assessment
1. Apply technical concepts, basic principles, and tools of psychometrics and psychological assessment. (20%, 29 items)
2. Describe the process, research methods, and statistics used in test development and standardization. (20%, 29 items)
3. Identify the importance, benefits, and limitations of psychological assessment. (10%, 19 items)
4. Identify, assess, and evaluate the methods and tools of psychological assessment relative to the specific purpose and context: school, hospital, industry, and community. (20%, 29 items)
5. Evaluate the administration and scoring procedures of intelligence and objective personality tests and other alternative forms of tests. (15%, 22 items)
6. Apply ethical considerations and standards in the various dimensions of psychological assessment. (15%, 22 items)
Total: 100% (150 items)

A Brief History of Psychological Testing
2200 B.C.: Chinese begin civil service examinations.
A.D. 1862: Wilhelm Wundt uses a calibrated pendulum to measure the "speed of thought."
1884: Francis Galton administers the first test battery to thousands of citizens at the International Health Exhibit.
1890: James McKeen Cattell uses the term "mental test" in announcing the agenda for his Galtonian test battery.
1901: Clark Wissler discovers that Cattellian "brass instruments" tests have no correlation with college grades.
1905: Binet and Simon invent the first modern intelligence test.
1914: Stern introduces the IQ, or intelligence quotient: the mental age divided by the chronological age.
1916: Lewis Terman revises the Binet-Simon scales and publishes the Stanford-Binet. Revisions appear in 1937, 1960, and 1986.
1917: Robert Yerkes spearheads the development of the Army Alpha and Beta examinations used for testing WWI recruits.
1917: Robert Woodworth develops the Personal Data Sheet, the first personality test.
1920: Rorschach Inkblot Test published.
1921: The Psychological Corporation, the first major test publisher, is founded by Cattell, Thorndike, and Woodworth.
1927: First edition of the Strong Vocational Interest Blank published.
1935: Henry Murray and Christiana Morgan develop the Thematic Apperception Test.
1939: Wechsler-Bellevue Intelligence Scale published. Revisions published in 1955, 1981, and 1997.
1942: Minnesota Multiphasic Personality Inventory published.
1949: Wechsler Intelligence Scale for Children published. Revisions published in 1974 and 1991.
1949: Raymond B. Cattell introduces the 16PF.

Types of Psychological Tests
• ACHIEVEMENT TESTS: designed to measure what you have already learned.
• APTITUDE TESTS: designed to determine your potential for learning new information or skills.
• PERSONALITY TESTS: measure usual or habitual thoughts, feelings, and behavior.
• ATTITUDE TESTS: designed to elicit personal beliefs and opinions.
• INTEREST TESTS: designed to identify patterns of likes and dislikes useful for making decisions about future careers and job training.

Principles of Test Administration
• The tester must become thoroughly familiar with the test.
• The tester must maintain an impartial and scientific attitude.
• The tester must be able to establish and maintain rapport.
• The tester must maintain a completely unrevealing expression while at the same time silently assuring the subject of his or her interest.
• The tester observes the subject's performance with care.
Types of Test Administration
• Individual: common for intelligence and ability tests and projective instruments.
• Group: more common than individual administration; the test is administered to a large group at once.

Common Intelligence Tests
• The Stanford-Binet Intelligence Scales (5th ed.)
• The Wechsler Adult Intelligence Scale (4th ed.)
• The Wechsler Intelligence Scale for Children (4th ed.)
• The Wechsler Preschool and Primary Scale of Intelligence (3rd ed.)
• Raven's Progressive Matrices
• Panukat ng Katalinuhang Pilipino

Common Psychoeducational Test Batteries
• The Kaufman Assessment Battery for Children
• The Woodcock-Johnson III Tests of Cognitive Abilities

Common Aptitude Tests
• Differential Aptitude Tests
• Flanagan Industrial Tests
• Philippine Aptitude Classification Test

Common Personality Tests
• NEO Personality Inventory – Revised
• Sixteen Personality Factor Questionnaire
• Myers-Briggs Type Indicator
• Minnesota Multiphasic Personality Inventory
• Panukat ng Pagkataong Pilipino
• Panukat ng Ugali at Pagkatao
• Pictorial Self-Concept Scale for Children
• Vineland Adaptive Behavior Scales

Legal and Ethical Considerations
• Laws: rules that individuals must obey for the good of society as a whole, promulgated by the legislative bodies of government.
• Ethics: a body of principles of right, proper, or good conduct, crafted by professional organizations and institutions.
• What happens if there is a conflict between a law and the code of ethics?

PAP Code of Ethics on Assessment
Bases for Assessment
• The expert opinions that we provide through our recommendations, reports, and diagnostic or evaluative statements are based on substantial information and appropriate assessment techniques. We provide expert opinions regarding the psychological characteristics of a person after employing adequate assessment procedures and examination to support our conclusions and recommendations.
• In instances where we are asked to provide opinions about an individual without conducting an examination, on the basis of a review of existing test results and reports, we discuss the limitations of our opinions and the basis of our conclusions and recommendations.

Informed Consent in Assessment
• We gather informed consent prior to the assessment of our clients, except in the following instances: when it is mandated by law; when it is implied, such as in routine educational, institutional, and organizational activity; and when the purpose of the assessment is to determine the individual's decisional capacity.
• We educate our clients about the nature of our services, financial arrangements, potential risks, and the limits of confidentiality. In instances where our clients are not competent to provide informed consent for assessment, we discuss these matters with an immediate family member or legal guardian.
• In instances where a third-party interpreter is needed, the confidentiality of test results and the security of the tests must be ensured. The limitations of the obtained data are discussed in our results, conclusions, and recommendations.

Assessment Tools
• We judiciously select and administer only those tests that are pertinent to the reasons for referral and the purpose of the assessment. We use data collection methods and procedures that are consistent with current scientific and professional developments.
• We use tests that are standardized, valid, and reliable, and that have normative data directly referable to the population of our clients. We administer assessment tools that are appropriate to the language, competence, and other relevant characteristics of our clients.
Obsolete and Outdated Test Results
• We do not base our interpretations, conclusions, and recommendations on outdated test results. We do not provide interpretations, conclusions, and recommendations on the basis of obsolete tests.

Interpreting Assessment Results
• In fairness to our clients, under no circumstances should we report test results without taking into consideration the validity, reliability, and appropriateness of the test. We should therefore indicate our reservations regarding the interpretations. We interpret assessment results while considering the purpose of the assessment and other factors such as the client's test-taking abilities and characteristics, and situational, personal, and cultural differences.

Release of Test Results
• It is our responsibility to ensure that test results and interpretations are not used by persons other than those explicitly agreed upon by the referral sources prior to the assessment procedure. We do not release test data in the form of raw and scaled scores, the client's responses to test questions or stimuli, or notes regarding the client's statements and behaviors during the examination, unless ordered by the court.

Explaining Assessment Results
• We release test results only to sources of referral, and with written permission from the client if it is a self-referral. Where test results have to be communicated to relatives, parents, or teachers, we explain them in non-technical language.
• We explain findings and test results to our clients or their designated representatives, except when the relationship precludes the provision of an explanation of results and this is explained in advance to the client. When test results need to be shared with schools, social agencies, the courts, or industry, we supervise such releases.
Test Security
• The administration and handling of all test materials shall be done only by qualified users or personnel.

Assessment by Unqualified Persons
• We do not promote the use of assessment tools and methods by unqualified persons, except for training purposes with adequate supervision.
• We ensure that test protocols, their interpretations, and all other records are kept secure from unqualified persons.

Test Construction
• We develop tests and other assessment tools using current scientific findings and knowledge, appropriate psychometric properties, and validation and standardization procedures.

APA Committee on Ethical Standards: Test-User Qualifications
• Level A: tests that can adequately be administered, scored, and interpreted with the aid of the manual and a general orientation to the kind of institution or organization in which one is working.
• Level B: tests that require some technical knowledge of test construction and use, and of supporting psychological and educational fields such as statistics, individual differences, psychology of adjustment, personnel psychology, and guidance.
• Level C: tests that require substantial understanding of testing and supporting psychological fields, together with supervised experience in the use of these devices.

APA Committee on Ethical Standards: The Rights of Test-takers
• The right of informed consent: test-takers have the right to know why they are being evaluated, how the test data will be used, and what information will be released to whom.
• The right to be informed of test findings: test-takers have a right to be informed, in language they can understand, of the nature of the findings with respect to a test they have taken. They are also entitled to know what recommendations are being made as a consequence of the test data.
• The right to privacy: recognizes the freedom of the individual to pick and choose for himself the time, circumstances, and particularly the extent to which he wishes to share with, or withhold from, others his attitudes, beliefs, behavior, and opinions.
• The right to confidentiality: information revealed through the process of psychological assessment is not to be shared with any other person without the consent of the assessee.
• The right to the least stigmatizing label: the least stigmatizing labels should always be assigned when reporting test results.

Chapter 3: A Statistics Refresher

Scales of Measurement
• Continuous scales: it is theoretically possible to divide any of the values of the scale; such scales typically have a wide range of possible values (e.g., height or a depression scale). For example, a girl's weight can take any value: 54 kg, 54.5 kg, or 54.5436 kg.
• Discrete scales: take categorical values (e.g., male or female). For example, when you roll a die, the possible outcomes are 1, 2, 3, 4, 5, or 6, not 1.5 or 2.45.
• Error: the collective influence of all factors on a test score beyond those specifically measured by the test.
• Nominal scales: involve classification or categorization based on one or more distinguishing characteristics; all things measured must be placed into mutually exclusive and exhaustive categories (e.g., apples and oranges, DSM-IV diagnoses).
• Ordinal scales: involve classification, like nominal scales, but also allow rank ordering (e.g., Olympic medalists).
• Interval scales: contain equal intervals between numbers; each unit on the scale is exactly equal to any other unit on the scale (e.g., IQ scores and most other psychological measures).
• Ratio scales: interval scales with a true zero point (e.g., height or reaction time).
• Psychological measurement: most psychological measures are truly ordinal but are treated as interval measures for statistical purposes.

Describing Data
• Distribution: a set of test scores arrayed for recording or study.
• Raw score: a straightforward, unmodified accounting of performance, usually numerical.
• Frequency distribution: all scores are listed alongside the number of times each score occurred. Frequency distributions may be presented in tabular form; a simple frequency distribution lists scores that have not been grouped.
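As an illustration (not from the text; the scores and the interval width of 10 are made up), a simple and a grouped frequency distribution can be tabulated with Python's collections.Counter:

```python
from collections import Counter

scores = [85, 92, 78, 85, 90, 78, 88, 85, 95, 70]  # hypothetical raw scores

# Simple frequency distribution: each score listed with its count
simple = Counter(scores)
print(sorted(simple.items()))
# [(70, 1), (78, 2), (85, 3), (88, 1), (90, 1), (92, 1), (95, 1)]

# Grouped frequency distribution: class intervals of width 10
grouped = Counter((s // 10) * 10 for s in scores)
for lower in sorted(grouped):
    print(f"{lower}-{lower + 9}: {grouped[lower]}")
```

The grouping simply maps each score to the lower limit of its class interval, trading detail for a compact summary of the distribution's shape.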
• Grouped frequency distributions have class intervals rather than actual test scores.
• Histogram: a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
• Bar graph: numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/no/maybe, male/female) appears on the X-axis.
• Frequency polygon: test scores or class intervals (indicated on the X-axis) are plotted against frequencies (indicated on the Y-axis).

Types of Distributions
• Bimodal literally means "two modes" and typically describes distributions of values that have two centers. For example, the distribution of heights in a sample of adults might have two peaks, one for women and one for men.
• Skewness is a measure of the asymmetry of a distribution. A distribution is asymmetrical when its left and right sides are not mirror images. A distribution can have right (positive), left (negative), or zero skewness.
• A J-shaped distribution is a frequency distribution that is extremely asymmetrical: the initial (or final) frequency group contains the highest frequency, with succeeding frequencies becoming smaller (or larger) elsewhere; the shape of the curve roughly approximates the letter "J" lying on its side.

Measures of Central Tendency
• Central tendency: a statistic that indicates the average or midmost score between the extreme scores in a distribution.
• Mean: the sum of the observations (or test scores) divided by the number of observations.
• Median: the middle score in a distribution; particularly useful when there are outliers, or extreme scores, in a distribution.
• Mode: the most frequently occurring score in a distribution. When two scores occur with the highest frequency, a distribution is said to be bimodal.
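As a made-up illustration of these three averages (the scores are hypothetical and include one deliberate outlier), Python's statistics module computes them directly:

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 30]  # hypothetical test scores; 30 is an outlier

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle score -> 5
mode = statistics.mode(scores)      # most frequent score -> 4

print(mean, median, mode)
```

Note how the single outlier (30) pulls the mean well above the median, which is exactly why the median is described above as the more useful statistic when extreme scores are present.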
Measures of Variability
• Variability is an indication of the degree to which scores are scattered or dispersed in a distribution. Two distributions can have the same mean score while one has greater variability (its scores are more spread out).
• Measures of variability are statistics that describe the amount of variation in a distribution.
• Range: the difference between the highest and the lowest scores.
• Average deviation: the average deviation of scores in a distribution from the mean.
• Variance: the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
• Standard deviation: the square root of the average squared deviation about the mean; it is the square root of the variance, and represents the typical distance of scores from the mean.
• Skewness: the nature and extent to which symmetry is absent in a distribution. Positive skew: relatively few of the scores fall at the high end of the distribution. Negative skew: relatively few of the scores fall at the low end of the distribution.
• Kurtosis: the sharpness of the peak of a frequency-distribution curve. Platykurtic: relatively flat. Leptokurtic: relatively peaked. Mesokurtic: somewhere in the middle.

The Normal Curve
• The normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its center and perfectly symmetrical.
• Area under the normal curve: the normal curve can conveniently be divided into areas defined by units of standard deviation.

Standard Scores
• A standard score is a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
• Z-score: conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
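A minimal sketch of the variance, standard deviation, and z-score definitions above, using hypothetical scores and the population form of the variance (dividing by N, as in the definition given here; many texts divide by n - 1 for samples):

```python
scores = [50, 60, 70, 80, 90]  # hypothetical raw scores

# Variance: the arithmetic mean of the squared deviations from the mean
mean = sum(scores) / len(scores)                                # 70.0
variance = sum((s - mean) ** 2 for s in scores) / len(scores)   # 200.0

# Standard deviation: the square root of the variance
sd = variance ** 0.5                                            # ~14.14

# z-score: how many SD units a raw score lies above or below the mean
def z(raw):
    return (raw - mean) / sd

print(variance, sd, z(90))  # a raw score of 90 is ~1.41 SDs above the mean
```

From a z-score, the other standard scores mentioned in this chapter are simple linear rescalings to a chosen mean and SD.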
• T score: can be called a "fifty plus or minus ten" scale; that is, a scale with a mean set at 50 and a standard deviation set at 10.
• Stanine: a standard score scale with a mean of 5 and a standard deviation of approximately 2, divided into nine units.
• Normalizing a distribution: involves "stretching" the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores.

Correlation and Inference
• A coefficient of correlation (or correlation coefficient) is a number that provides an index of the strength of the relationship between two things.
• Correlation coefficients vary in magnitude between -1 and +1. A correlation of 0 indicates no relationship between two variables.
• Positive correlations indicate that as one variable increases or decreases, the other variable follows suit.
• Negative correlations indicate that as one variable increases, the other decreases.
• Correlation between variables does not imply causation, but it does aid in prediction.
• Pearson r: a method of computing correlation when both variables are linearly related and continuous. Once a correlation coefficient is obtained, it is checked for statistical significance (typically a probability level below .05). By squaring r, one obtains the coefficient of determination, the proportion of variance that the variables share with one another.
• Spearman rho: a method for computing correlation, used primarily when sample sizes are small or the variables are ordinal in nature.
• Scatterplot: plots one variable on the X (horizontal) axis and the other on the Y (vertical) axis; scatterplots can depict anything from no correlation to moderate or strong correlation.
• Scatterplots of strong correlations feature points tightly clustered together in a diagonal line. For positive correlations, the line goes from bottom left to top right.
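Pearson r and Spearman rho can be sketched directly from their definitions; the paired scores below are hypothetical, and this simple rank function assumes no tied scores:

```python
def pearson_r(x, y):
    """Pearson r: sum of deviation cross-products over the product of
    the root sums of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks of the scores
    (assumes no ties, purely for illustration)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        rk = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            rk[i] = float(rank)
        return rk
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]  # hypothetical test scores
y = [2, 1, 4, 3, 6, 5]  # hypothetical criterion scores

r = pearson_r(x, y)
print(r, r ** 2)        # r and the coefficient of determination (r squared)
print(spearman_rho(x, y))
```

Squaring r gives the coefficient of determination described above: the proportion of variance the two variables share.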
3-22 Correlation and Inference Strong negative correlations form a tightly clustered diagonal line from top left to bottom right. 3-23 Correlation and Inference Outlier – an extremely atypical point (case), lying relatively far away from the other points in a scatterplot. 3-24 Correlation and Inference Restriction of range leads to weaker correlations. 3-25 Meta-Analysis • Meta-analysis allows researchers to look at the relationship between variables across many separate studies. • Meta-analysis – a family of techniques used to statistically combine information across studies to produce single estimates of the data under study. • The estimates are in the form of an effect size, which is often expressed as a correlation coefficient. 3-26 Chapter 4 Of Tests and Testing Assumptions about Psychological Testing Psychological Traits and States Exist • A trait has been defined as “any distinguishable, relatively enduring way in which one individual varies from another” (Guilford, 1959, p. 6). • States also distinguish one person from another but are relatively less enduring (Chaplin et al., 1988). • Thousands of trait terms can be found in the English language (e.g. outgoing, shy, reliable, calm, etc.). • Psychological traits exist as constructs – informed, scientific concepts developed or constructed to describe or explain behavior. • We can’t see, hear, or touch constructs, but we can infer their existence from overt behavior, such as test scores. 4-2 Assumptions about Psychological Testing • Traits are relatively stable. They may change over time, yet there are often high correlations between trait scores at different time points. • The nature of the situation influences how traits will be manifested.
• Traits refer to ways in which one individual varies, or differs, from another. Some people score higher than others on traits like sensation-seeking. 4-3 Assumptions about Psychological Testing Traits and States Can Be Quantified and Measured • Different test developers may define and measure constructs in different ways. • Once a construct is defined, test developers turn to item content and item weighting. • A scoring system and a way to interpret results need to be devised. 4-4 Assumptions about Psychological Testing Test-Related Behavior Predicts Non-Test-Related Behavior Responses on tests are thought to predict real-world behavior. The obtained sample of behavior is expected to predict future behavior. Tests Have Strengths and Weaknesses Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources. 4-5 Assumptions about Psychological Testing Various Sources of Error Are Part of Assessment Error refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test. Error variance – the component of a test score attributable to sources other than the trait or ability measured. • Both the assessee and the assessor are sources of error variance. 4-6 Assumptions about Psychological Testing Testing and Assessment Can Be Conducted in a Fair Manner • All major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual. • Problems arise if the test is used with people for whom it was not intended. • Some problems are more political than psychometric in nature. Testing and Assessment Benefit Society • There is a great need for tests, especially good tests, considering the many areas of our lives that they benefit.
4-7 What’s a “Good Test?” Reliability: The consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements. Validity: The test measures what it purports to measure. Other considerations: Administration, scoring, and interpretation should be straightforward for trained examiners. A good test is a useful test that will ultimately benefit individual testtakers or society at large. 4-8 Norms • Norm-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to the scores of a group of testtakers. • The meaning of an individual test score is understood relative to other scores on the same test. • Norms are the test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores. • A normative sample is the reference group to which testtakers are compared. 4-9 Sampling to Develop Norms Standardization: The process of administering a test to a representative sample of testtakers for the purpose of establishing norms. Sampling – Test developers select a population, for which the test is intended, that has at least one common, observable characteristic. Stratified sampling: Sampling that includes different subgroups, or strata, from the population. Stratified-random sampling: Stratified sampling in which every member of the population has an equal opportunity of being included in the sample. 4-10 Sampling to Develop Norms Purposive sample: Arbitrarily selecting a sample that is believed to be representative of the population. Incidental/convenience sample: A sample that is convenient or available for use. May not be representative of the population. • Generalization of findings from convenience samples must be made with caution.
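Stratified-random sampling as described above can be sketched in a few lines. This is a simplified illustration with an invented population (60% urban, 40% rural); note that the per-stratum rounding can make the total differ slightly from the requested n for some strata mixes.

```python
import random

def stratified_random_sample(population, strata_key, n, seed=0):
    """Draw a sample whose strata proportions mirror the population's.
    Members within each stratum are chosen at random, so every member
    has a chance of inclusion."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(strata_key(person), []).append(person)
    sample = []
    for members in strata.values():
        # stratum gets a share of n proportional to its share of the population
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural testtakers
population = [{"id": i, "region": "urban" if i < 60 else "rural"} for i in range(100)]
sample = stratified_random_sample(population, lambda p: p["region"], n=10)
```

With these proportions, a sample of 10 contains 6 urban and 4 rural testtakers, mirroring the population.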
4-11 Sampling to Develop Norms Developing Norms Having obtained a sample, test developers: • Administer the test with a standard set of instructions • Recommend a setting for test administration • Collect and analyze data • Summarize data using descriptive statistics, including measures of central tendency and variability • Provide a detailed description of the standardization sample itself 4-12 Types of Norms • Percentile – the percentage of people whose score on a test or measure falls below a particular raw score. • Percentiles are a popular method for organizing test-related data because they are easily calculated. • One problem is that real differences between raw scores may be minimized near the ends of the distribution and exaggerated in the middle of the distribution. 4-13 Types of Norms (cont’d.) Age norms: average performance of different samples of testtakers who were at various ages when the test was administered. Grade norms: the average test performance of testtakers in a given school grade. National norms: derived from a normative sample that was nationally representative of the population at the time the norming study was conducted. National anchor norms: an equivalency table for scores on two different tests. Allows for a basis of comparison. Subgroup norms: a normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample. Local norms: provide normative information with respect to the local population’s performance on some test. 4-14 Fixed Reference Group Scoring Systems Fixed reference group scoring systems: The distribution of scores obtained on the test from one group of testtakers is used as the basis for the calculation of test scores for future administrations of the test. • The SAT employs this method. Norm-Referenced versus Criterion-Referenced Interpretation Norm-referenced tests involve comparing individuals to the normative group.
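The percentile definition above (the percentage of scores falling below a particular raw score) translates directly into code. A minimal sketch with an invented normative sample:

```python
def percentile_rank(score, norm_scores):
    """Percentage of scores in the normative sample falling below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

# Hypothetical normative sample of ten scores
norms = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
pr = percentile_rank(80, norms)  # five of the ten norm scores fall below 80
```

A raw score of 80 here sits at the 50th percentile, since half of the normative sample scored below it.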
With criterion-referenced tests, testtakers are evaluated as to whether they meet a set standard (e.g. a driving exam). 4-15 Culture and Inference • In selecting a test for use, responsible test users should research the test’s available norms to check how appropriate they are for use with the targeted testtaker population. • When interpreting test results, it helps to know about the culture and era of the testtaker. • It is important to conduct culturally informed assessment. 4-16 Chapter 7 Utility What is Utility? Utility: the usefulness or practical value of testing to improve efficiency. Factors Affecting Utility Psychometric soundness – Generally, the higher the criterion validity of a test, the greater the utility. • There are exceptions, because many factors affect the utility of an instrument and utility is assessed in many different ways. • Valid tests are not always useful tests. 7-2 Factors Affecting Utility Costs – One of the most basic elements of a utility analysis is the financial cost associated with a test. • Cost in the context of test utility refers to disadvantages, losses, or expenses in both economic and noneconomic terms. • Economic costs may include purchasing a test, a supply bank of test protocols, and computerized test processing. • Other economic costs are more difficult to calculate, such as the cost of not testing or of testing with an inadequate instrument. • Non-economic costs include things such as human life and safety. 7-3 Factors Affecting Utility Benefits – We should take into account whether the benefits of testing justify the costs of administering, scoring, and interpreting the test. • Benefits can be defined as profits, gains, or advantages. • Successful testing programs can yield higher worker productivity and profits for a company.
• Some potential benefits include: an increase in the quality of workers’ performance; an increase in the quantity of workers’ performance; a decrease in the time needed to train workers; a reduction in the number of accidents; a reduction in worker turnover. • Non-economic benefits may include a better work environment and improved morale. 7-4 Utility Analysis Utility analysis: a family of techniques that entail a cost–benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment. • Some utility analyses are straightforward, while others are more sophisticated, employing complicated mathematical models. • The endpoint of a utility analysis is an educated decision as to which of several alternative courses of action is optimal (in terms of costs and benefits). 7-5 Practical Considerations The pool of job applicants – Some utility models are based on the assumption that for a particular position there is a limitless pool of candidates. • However, some jobs require such expertise or sacrifice that the pool of qualified candidates may be very small. • The economic climate also affects the size of the pool. • The top performers on a selection test may not accept a job offer. 7-6 Practical Considerations The complexity of the job – The same utility models are used for a variety of positions, yet the more complex the job, the greater the difference between those who perform it well and those who perform it poorly. The cut score in use – Cut scores may be relative, in which case they are determined in reference to normative data (e.g. selecting people in the top 10% of test scores). Fixed cut scores are set on the basis of having achieved a minimum level of proficiency on a test (e.g. a driving license exam). Multiple cut scores – the use of multiple cut scores for a single predictor (e.g. students may achieve grades of A, B, C, D, or E).
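The relative-versus-fixed distinction above is easy to make concrete. In this sketch (applicant scores invented, and the fixed cut of 70 is an arbitrary illustration, not a standard from the text), a relative cut depends on the particular group tested, while a fixed cut does not:

```python
def relative_cut_score(scores, top_fraction=0.10):
    """Relative cut: the score at the boundary of the top `top_fraction`
    of this particular group of testtakers (norm-referenced)."""
    k = max(1, round(len(scores) * top_fraction))
    return sorted(scores, reverse=True)[k - 1]

def passes_fixed_cut(score, cut=70):
    """Fixed cut: a preset minimum proficiency level, independent of the group."""
    return score >= cut

# Hypothetical scores for ten applicants
scores = [52, 61, 64, 68, 70, 73, 77, 81, 88, 94]
cut = relative_cut_score(scores, 0.10)  # top 10% of ten applicants -> the single highest score
```

With these ten applicants, a top-10% relative cut selects only the 94; loosening to the top 20% moves the cut down to 88, while the fixed cut of 70 passes everyone at or above that score regardless of how the group performed.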
Multiple hurdles – achievement of a particular cut score on one test is necessary in order to advance to the next stage of evaluation in the selection process (e.g. the Miss America contest). 7-7 Methods of Setting Cut Scores The Angoff Method: judgments of experts are averaged to yield cut scores for the test. • Can be used for personnel selection and for assessing traits, attributes, and abilities. • Problems arise if there is low agreement between experts. The Known Groups Method: entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest. • After analysis of the data, a cut score is chosen that best discriminates between the groups. • One problem with the known groups method is that no standard set of guidelines exists for establishing the cut score. 7-8 Methods of Setting Cut Scores IRT-Based Methods: In an IRT framework, each item is associated with a particular level of difficulty. • In order to “pass” the test, the testtaker must answer items that are deemed to be above some minimum level of difficulty, which is determined by experts and serves as the cut score. 7-9 Activity 3 Each group must write a topic/variable proposal for their test development. Indicate what variable(s) will be measured and what methodology and theoretical framework will be used. To be submitted Jan. 10, 2023. 7-10 Test Development TEST DEVELOPMENT The Five Stages of Test Development Test development is an umbrella term for all that goes into the process of creating a test. 8-2 TEST CONCEPTUALIZATION The impetus for developing a new test is some thought that “there ought to be a test for…” The stimulus could be knowledge of psychometric problems with other tests, a new social phenomenon, or any number of things. There may be a need to assess mastery in an emerging occupation.
8-3 TEST CONCEPTUALIZATION Some preliminary questions: What is the test designed to measure? What is the objective of the test? Is there a need for this test? Who will use this test? Who will take this test? What content will the test cover? How will the test be administered? What is the ideal format of the test? Should more than one form of the test be developed? What special training will be required of test users for administering or interpreting the test? What types of responses will be required of testtakers? Who benefits from an administration of this test? Is there any potential for harm as the result of an administration of this test? How will meaning be attributed to scores on this test? 8-4 ITEM DEVELOPMENT IN NORM-REFERENCED AND CRITERION-REFERENCED TESTS Generally, a good item on a norm-referenced achievement test is an item for which high scorers on the test respond correctly and low scorers respond incorrectly. Ideally, each item on a criterion-oriented test addresses the issue of whether the respondent has met certain criteria. Development of a criterion-referenced test may entail exploratory work with at least two groups of testtakers: one group known to have mastered the knowledge or skill being measured and another group known not to have mastered it. Test items may be pilot studied to evaluate whether they should be included in the final form of the instrument. 8-5 TEST CONSTRUCTION Scaling: the process of setting rules for assigning numbers in measurement. Types of scales: Scales are instruments used to measure some trait, state, or ability. They may be categorized in many ways (e.g. multidimensional, unidimensional, etc.). L.L. Thurstone was very influential in the development of sound scaling methods. 8-6 TEST CONSTRUCTION – SCALING METHODS Numbers can be assigned to responses to calculate test scores using a number of methods.
Rating Scales – a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker. Likert scale – each item presents the testtaker with five alternative responses (sometimes seven), usually on an agree–disagree or approve–disapprove continuum. An example of a Likert scale 8-7 TEST CONSTRUCTION – SCALING METHODS Likert scales are typically reliable. All rating scales result in ordinal-level data. Some rating scales are unidimensional, meaning that only one dimension is presumed to underlie the ratings. Other rating scales are multidimensional, meaning that more than one dimension is thought to underlie the ratings. 8-8 TEST CONSTRUCTION – SCALING METHODS Method of Paired Comparisons – testtakers must choose between two alternatives according to some rule. • For each pair of options, testtakers receive a higher score for selecting the option deemed more justifiable by the majority of a group of judges. • The test score would reflect the number of times the choices of a testtaker agreed with those of the judges. 8-9 TEST CONSTRUCTION – SCALING METHODS Comparative scaling: entails judgments of a stimulus in comparison with every other stimulus on the scale. Categorical scaling: stimuli (e.g. index cards) are placed into one of two or more alternative categories. Guttman scale: items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured. All respondents who agree with the stronger statements of the attitude will also agree with the milder statements. The method of equal-appearing intervals can be used to obtain data that are interval in nature. 8-10 TEST CONSTRUCTION – WRITING ITEMS Item pool: the reservoir or well from which items will or will not be drawn for the final version of the test. Comprehensive sampling provides a basis for the content validity of the final version of the test.
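Scoring a Likert-type scale is usually a matter of summing the item responses. The sketch below uses invented responses, and the reverse-keying it shows is a common scale-construction practice rather than something described above: oppositely worded items are flipped so that higher totals always mean stronger endorsement.

```python
def score_likert(responses, reverse_keyed=(), points=5):
    """Sum responses on a 1..points agree-disagree continuum.
    Items listed in `reverse_keyed` (by index) are flipped, a common
    practice for oppositely worded items (an assumption, not from the text)."""
    total = 0
    for i, r in enumerate(responses):
        total += (points + 1 - r) if i in reverse_keyed else r
    return total

# Hypothetical 4-item scale; item 2 is worded in the opposite direction
responses = [5, 4, 2, 5]  # 1 = strongly disagree ... 5 = strongly agree
total = score_likert(responses, reverse_keyed={2})
```

Because the summed total only orders testtakers (it does not guarantee equal intervals between totals), the result is ordinal-level data, as the slide notes for all rating scales.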
Item format: includes variables such as the form, plan, structure, arrangement, and layout of individual test items. Selected-response format – items require testtakers to select a response from a set of alternative responses. Constructed-response format – items require testtakers to supply or to create the correct answer, not merely to select it. 8-11 TEST CONSTRUCTION – WRITING ITEMS The multiple-choice format has three elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options variously referred to as distractors or foils. Other commonly used selected-response formats include matching and true–false items. 8-12