Psychological Testing and Psychological Assessment

Differentiate Psychological Testing and Assessment

History
1905: Alfred Binet and Theodore Simon = developers of the world's first IQ test
Binet-Simon Test = measured the cognitive ability of schoolchildren in Paris so that they could be placed in appropriate classes
Testing = the term used to describe the whole process, from administering the test to interpreting the test results
1914: World War I
1917: Large numbers of military recruits needed to be screened for intellectual and emotional problems
OSS Model (Office of Strategic Services) = the OSS was the forerunner of today's CIA (Central Intelligence Agency)

Definition
Psychological Assessment
Psychological Testing
Psychometric Soundness
Parapsychology = the study of alleged psychic phenomena
Psychometrist or Psychometrician = a professional who uses, analyzes, and interprets psychological test data
Utility = the usefulness or practical value that a test or other tool of assessment has for a particular purpose
Test Developers and Publishers = create tests or other methods of assessment
Test Users = professionals who use psychological tests and assessment methodologies:
● clinicians
● experimental psychologists
● counselors
● social psychologists
● school psychologists
● human resources personnel
● consumer psychologists

Methods of Psychological Assessment
● Retrospective Assessment
● Remote Assessment
● Ecological Momentary Assessment
● Collaborative Assessment
● Therapeutic Assessment
● Dynamic Assessment

The Tools of Psychological Assessment
● Test
● Format
● Score
● Scoring
● Cut Score
● Interview
● Motivational Interview
● Portfolio
● Case History Data
● Groupthink
● Behavioral Observation
● Naturalistic Observation
● Role Play

Test development = an umbrella term for everything that goes into the process of creating a psychometrically sound (valid and reliable) test.
The process is as follows:
● Test Conceptualization
● Test Construction
● Test Tryout
● Test Analysis
● Test Revision

Test Conceptualization = the test starts from the developer's decision to create a certain test that measures a certain construct in a certain way
Basis:
: a review of the literature on existing tests with psychometric soundness
: an existing social phenomenon or behavior pattern that needs to be measured
: the need for a test to assess mastery in an emerging occupation or profession
Questions:
1. What is the test designed to measure? (purpose/rationale/construct)
2. What is the objective of the test? What real-world behaviors are assessed?
3. Is there a need for this test? (it should be better and more important than previous tests)
4. Who will use the test? (clinicians? educators? others?)
5. Who will take the test? (children? adults? women?)
6. What content will the test cover, and why?
7. How will the test be administered? (individual, group, or amenable to both)
8. What is the ideal format? (true/false, multiple choice, essay, agree/disagree, open-ended questions, or some mix of these)
9. Should more than one form of the test be developed?
10. What competence must the administrator have to qualify?
11. What response is required from testtakers? (refers to the limitations of the test)
12. Who benefits from the test? How will testtakers benefit from it? Is there potential harm from the result?
13. How is the score interpreted?
Criterion-referenced = referencing how your score compares to a criterion such as a cut score or a body of knowledge
Norm-referenced = referencing how your score compares to the scores of other people

Test Construction
Scaling = the process by which a test or measuring device is designed and calibrated, and by which numbers are assigned to different amounts of the traits, attributes, and characteristics being measured
Scale values = numbers assigned to differentiate amounts of the trait, state, or ability being measured
A few types of scales are:
1. Age-based = where performance is a function of age
2. Grade-based = where performance is a function of grade
3. Stanine scale (Standard Nine) = where all raw scores are transformed into scores running from 1 to 9
THERE IS NO BEST TYPE, ONLY THE MOST APPROPRIATE FOR THAT CERTAIN MEASUREMENT

Scaling Methods
● Rating Scale = a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, emotion, or attitude are indicated by the testtaker. Yields ordinal-level data.
Example: Please rate the employee on ability to cooperate and get along with fellow employees.
● Summative Scale = a type of rating scale where the final score is obtained by summing the ratings across all items
● Likert Scale = a type of summative rating scale; individual item scores are added to get a total score
= usually used to scale attitudes
● Method of Paired Comparisons = testtakers are presented with pairs of stimuli (2 paragraphs, 2 subjects, 2 statements) which they are asked to compare
= they must select the stimulus they agree with more or the one they find more appealing
= Scoring: testtakers receive a higher score for selecting the option deemed more justifiable by the majority of a group of judges. Yields ordinal data.
● Comparative Scaling = entails judging a stimulus in comparison with every other stimulus on the scale
● Ranking of Experts = asking a panel of experts to rank behavioral indicators and provide a meaningful numerical score
● Method of Equal-Appearing Intervals
● Method of Absolute Scaling
● Guttman Scales
● Method of Empirical Keying
● Method of Rational Scaling
● Categorical Scaling

Writing Items
- Define clearly what you want to measure
- Generate an item pool
- Avoid exceptionally long items
- Keep the level of difficulty appropriate for those who will take the test
- Avoid double-barreled items that convey two or more ideas at the same time
- Consider mixing positively and negatively worded items

Approaches
1. Rational (Theoretical) Approach = reliance on reason and logic over data collection for statistical analysis
2. Empirical Approach = reliance on data gathering to identify items that relate to the construct
3. Bootstrap = a combination of the rational and empirical approaches: items are first written based on a theory, then an empirical approach is used to identify the items that are highly related to the construct

Item Format = the form, plan, structure, arrangement, and layout of individual test items
● Multiple choice
● Matching
● Binary-choice (i.e., True or False)
● Short answer

Scoring Models
Cumulative Scoring = the higher the score on the test, the higher the testtaker is on the ability, trait, or other characteristic that the test purports to measure
Class/Category Scoring = testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way
Ipsative Scoring = a typical objective is comparing a testtaker's score on one scale within a test to another scale within that same test

Writing Items for Computer Administration
Digital media offer 2 advantages:
Item bank = a relatively large and easily accessible collection of test questions
Item branching = the ability of the computer to tailor the content and order of test items on the basis of responses to previous items
Computerized adaptive testing = an interactive, computer-administered test-taking process wherein the items presented to the testtaker, and their difficulty, are based in part on the testtaker's performance on previous items.
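
Item branching from an item bank can be illustrated with a very small Python sketch. It is only a simplified "one step up, one step down" rule, not how any particular published adaptive test works. The function name `run_branching_test`, the three-level bank, and the item labels are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple


def run_branching_test(
    item_bank: Dict[int, List[str]],
    is_correct: Callable[[str], bool],
    start_level: int,
    max_items: int,
) -> List[Tuple[str, int, bool]]:
    """Administer up to max_items items, branching on each response.

    item_bank maps a difficulty level (hypothetical integer scale) to the
    items available at that level. is_correct stands in for the testtaker.
    """
    levels = sorted(item_bank)
    # Copy the bank so administered items can be removed without side effects.
    remaining = {level: list(items) for level, items in item_bank.items()}
    position = levels.index(start_level)
    history: List[Tuple[str, int, bool]] = []

    while len(history) < max_items:
        level = levels[position]
        if not remaining[level]:
            break  # no unused item left at this difficulty level
        item = remaining[level].pop(0)
        correct = is_correct(item)
        history.append((item, level, correct))
        # Branch: harder after a correct answer, easier after an error,
        # staying within the range of levels the bank provides.
        if correct:
            position = min(position + 1, len(levels) - 1)
        else:
            position = max(position - 1, 0)
    return history


# Hypothetical bank and a simulated testtaker who can solve levels 1 and 2 only.
bank = {1: ["e1", "e2"], 2: ["m1", "m2"], 3: ["h1", "h2"]}
solvable = {"e1", "e2", "m1", "m2"}
history = run_branching_test(
    bank, lambda item: item in solvable, start_level=2, max_items=4
)
for item, level, correct in history:
    print(item, level, correct)
```

In this simulated run the testtaker alternates between level 2 and level 3 items, which is the behavior the definition describes: each item presented depends in part on performance on the previous one.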
Floor effect = the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured
Ceiling effect = the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

Test Tryout
= the test should be tried out on people who are similar in critical respects to the people for whom the test was designed
Rule of thumb for deciding the number of people on whom the test should be tried out: there should be no fewer than 5 subjects, and preferably as many as 10, for each item. The more subjects employed, the weaker the role of chance in subsequent factor analysis.
A x 5 to 10 = n
A = items on a questionnaire
n = participants
For example, a 30-item questionnaire calls for 150 to 300 tryout participants.
= for validation purposes, there must be at least 20 participants each
= the following conditions of the tryout should be identical or similar to those for which the test is primarily designed:
o all instructions
o time limits allotted for completing the test
o atmosphere at the test site

What is a good item?
= a good test helps in discriminating among testtakers
= a good test item is one that is answered correctly (or in an expected manner) by high scorers on the test as a whole
= it is one that is answered incorrectly by low scorers on the test as a whole
What is a bad item?
= an item that is answered correctly by low scorers on the test as a whole

Item Analysis
Item Analysis = the process of statistically analyzing assessment data to evaluate the quality and performance of your test items
+ this provides documentation of validity (evidence that the test performs well and that score interpretations mean what you intend)
= a group of procedures used by test developers to identify the best items from a pool of tryout items
= through this, test developers can identify which items are good and which are deficient (which items are retained, revised, or removed)
= this also identifies the concepts testtakers have mastered or have not mastered

Nominal scale

GOAL:
1. Find the items that are not performing well (difficulty and discrimination)
2. Find out why those items are not performing well
Reasons:
- the item is too difficult or too easy
- too confusing (not discriminating)
- miskeyed
- biased against a minority group

2 Paradigms for Test Analysis
● Classical Test Theory
● Item Response Theory
The analyses can differ based on whether the item is:
● dichotomous (right or wrong)
● polytomous (2 or more points)

Test Item Types
Selected-response items (SRI)
Forced-choice items (each option represents a different construct/domain, but they are matched)
Ipsative scores
Advantages of SRI
- most popular and frequently used type of test item
- easily scored (saves time and enhances score reliability)

= Item-Difficulty Index
= Item-Reliability Index
= Item-Validity Index
= Item-Discrimination Index

Considerations
● Guessing
● Item fairness
● Speed tests

Qualitative Item Analysis
"Think Aloud" Test Administration
- an innovative approach to cognitive assessment that has respondents verbalize their thoughts as they occur
Expert Panels
- Sensitivity review
- Testtakers could be interviewed

Test Revision

Selected-response format (multiple choice, matching type, true or false)
Constructed-response format (completion item, fill in the blanks)
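
Two of the indices named under item analysis can be sketched in Python under the usual classical test theory definitions, which these notes list but do not spell out. Item difficulty is taken as the proportion of testtakers answering the item correctly. Item discrimination is taken as d = (U - L) / n, where U and L are the numbers of correct answers in the upper- and lower-scoring groups of size n. The 27% group size is a common convention assumed here, and the function names and data are hypothetical.

```python
from typing import List


def item_difficulty(item_scores: List[int]) -> float:
    """Proportion of testtakers who answered the item correctly (1 = correct)."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(
    item_scores: List[int], total_scores: List[float], fraction: float = 0.27
) -> float:
    """d = (U - L) / n using upper and lower groups on the total test score.

    fraction=0.27 is an assumed convention, not a value taken from the notes.
    """
    n = max(1, round(len(total_scores) * fraction))
    # Rank testtakers from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    u = sum(item_scores[i] for i in upper)  # correct answers among high scorers
    l = sum(item_scores[i] for i in lower)  # correct answers among low scorers
    return (u - l) / n


# Hypothetical tryout data for 10 testtakers (same order in every list).
total_scores = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
item_a = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # mostly answered by high scorers
item_b = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # mostly answered by low scorers

print(item_difficulty(item_a), item_discrimination(item_a, total_scores))
print(item_difficulty(item_b), item_discrimination(item_b, total_scores))
```

A positive d matches the description of a good item (answered correctly by high scorers on the test as a whole), while a negative d, as for `item_b`, matches the description of a bad item.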