Review Questions for PSY 405 Assessment and Evaluation Midterm Exam

Multiple-Choice

Chap. 1

1) The use of formal oral examinations for the selection of civil servants began with the ancient
A) Chinese. B) Egyptians. C) Greeks. D) Jews.
Answer: A

2) Although philosophers and scientists in many countries have contributed to psychology, the science of psychology was not formally established until the last quarter of the 19th century in
A) England. B) France. C) Germany. D) the United States.
Answer: C

3) Early investigations of the hereditary basis of genius were associated primarily with the name of __________, whereas the research conducted by __________ was motivated by interest in identifying mentally retarded children.
A) Binet, Cattell B) Darwin, Binet C) Galton, Binet D) Galton, Cattell
Answer: C

4) Sensorimotor tests for measuring individual differences in abilities and temperament were devised originally by
A) Alfred Binet. B) J. M. Cattell. C) Francis Galton. D) Wilhelm Wundt.
Answer: C

5) The method of co-relations for analyzing test scores was introduced by
A) J. M. Cattell. B) Gustav Fechner. C) Francis Galton. D) Wilhelm Wundt.
Answer: C

6) The use of printed examinations by the Boston School Committee occurred under the guidance of
A) George Fisher. B) Horace Mann. C) J. M. Rice. D) E. L. Thorndike.
Answer: B

7) The tests devised by __________ emphasized the ability to judge, understand, and reason.
A) Alfred Binet B) J. M. Cattell C) Francis Galton D) Joseph Jastrow
Answer: A

8) The concept of mental age was first used in scoring an intelligence test constructed by
A) Alfred Binet. B) Francis Galton. C) Lewis Terman. D) David Wechsler.
Answer: A

9) The first intelligence test that proved to be an effective predictor of scholastic achievement was constructed by
A) Binet. B) Cattell. C) Galton. D) Otis.
Answer: A

10) Thorndike was to achievement testing as Woodworth was to __________ testing.
A) ability B) aptitude C) interest D) personality
Answer: D

11) Inkblots are to personality inventory as __________ is to __________.
A) Binet, Cattell B) Murray, Pearson C) Rorschach, Woodworth D) Terman, Spearman
Answer: C

12) Credit for devising the first group test of intelligence belongs to
A) Arthur Otis. B) Lewis Terman. C) Robert Woodworth. D) Robert Yerkes.
Answer: A

13) The first group intelligence test used in the United States military service was the
A) Armed Forces Qualification Test.
B) Armed Services Vocational Aptitude Battery.
C) Army Examination Alpha.
D) Army General Classification Test.
Answer: C

14) Group intelligence tests and personality inventories were both used initially on a mass basis during
A) the Renaissance period. B) the late 19th century. C) World War I. D) World War II.
Answer: C

15) The most obvious source to consult for a test review is the
A) Mental Measurements Yearbooks.
B) Psychological Abstracts.
C) Standards for Educational and Psychological Tests.
D) Tests in Print.
Answer: A

16) A test that has printed directions for scoring and administration, as well as norms, and is constructed by testing professionals is by definition a(n)
A) individual test. B) objective test. C) performance test. D) standardized test.
Answer: D

17) Interpreting test results by referring to a table of norms is most closely associated with the question of whether the test is a(n) __________ instrument.
A) achievement or aptitude B) affective or cognitive C) individual or group D) standardized or non-standardized E) verbal or nonverbal
Answer: D

18) The most popular of all intelligence tests in clinical and counseling situations today are those designed initially by
A) Alfred Binet. B) Arthur Otis. C) Lewis Terman. D) David Wechsler.
Answer: D

19) The most popular test of personality in clinical and counseling situations today is the
A) Bender-Gestalt Test.
B) Minnesota Multiphasic Personality Inventory.
C) Rorschach Inkblot Test.
D) Thematic Apperception Test.
Answer: B

20) Which of the following dichotomies is concerned primarily with the method of scoring a test?
A) affective vs. cognitive B) individual vs. group C) objective vs. nonobjective D) speed vs. power
Answer: C

21) Tests requiring examinees to manipulate various physical objects are, of necessity, __________ tests.
A) affective B) performance C) power D) speed
Answer: B

22) Affective is to cognitive as
A) ability is to achievement.
B) attitude is to personality.
C) subjective is to objective.
D) temperament is to intelligence.
Answer: D

23) Most essay examinations can be characterized as __________ tests.
A) affective B) group C) objective D) speed
Answer: B

24) Interest inventories are __________ instruments.
A) affective B) performance C) speed D) subjective
Answer: A

25) The most comprehensive classification of psychological tests is found in the
A) American Psychological Association's Test Classification Catalog.
B) Library of Congress Test Classification System.
C) Mental Measurements Yearbooks.
D) Publishers' Index to Psychological Tests.
Answer: C

26) At which qualification level are users required to have a PhD in psychology or education, the equivalent in training in assessment, or verification of licensure or certification?
A) Level A B) Level B C) Level C D) Level D
Answer: C

Chap. 2 Multiple-Choice Questions

1) The first step in devising a test to screen job applicants is
A) analyzing the job content into components.
B) constructing a table of specifications.
C) consulting a taxonomy of job objectives.
D) writing test items to predict the criterion.
Answer: A

2) Classroom achievement tests are measures of
A) affective objectives. B) cognitive objectives. C) nonintellective objectives. D) psychomotor objectives.
Answer: B

3) In Bloom's Taxonomy of Educational Objectives: Cognitive Domain, "analysis and synthesis" are considered to be more fundamental (i.e., they appear earlier in the taxonomy) than
A) application. B) comprehension. C) evaluation. D) knowledge.
Answer: C

4) Before beginning the task of writing items for an achievement test, it is important to construct a(n)
A) expectancy chart. B) list of critical behaviors. C) predictable criterion. D) table of specifications.
Answer: D

5) In constructing an objective test, approximately __________ more items than needed for the final version of the test should be prepared initially.
A) 10% B) 20% C) 50% D) 75%
Answer: B

6) The only thing that is "objective" about an objective test is the
A) construction. B) design. C) interpretation. D) scoring.
Answer: D

7) Although __________ items are useful in measuring knowledge of terminology, they are of little value in assessing higher-order thinking.
A) essay B) multiple-choice C) short-answer D) true-false
Answer: C

8) Short-answer items are most useful in measuring
A) ability to evaluate propositions.
B) analysis and synthesis.
C) knowledge of terminology.
D) understanding of principles.
Answer: C

9) Qualifying words such as "never," "sometimes," and "always," which reveal the answer to an examinee who has no information about the subject of the item, are called
A) false positives and false negatives.
B) glittering generalities.
C) interlocking adverbs.
D) specific determiners.
Answer: D

10) On which kind of objective test items are specific determiners a more serious problem?
A) matching B) multiple-choice C) short-answer D) true-false
Answer: D

11) Bluffing is more of a problem on __________ tests.
A) essay B) objective C) oral D) performance
Answer: A

12) The tendency to answer an item on the basis of its form rather than its content is known as a
A) content distraction. B) context effect. C) form bias. D) response set.
Answer: D

13) On a matching item, the number of
A) response options should equal the number of premises.
B) response options should be greater than the number of premises.
C) premises should be greater than the number of response options.
D) premises is immaterial, but the number of response options should be substantially less.
Answer: B

14) Rearrangement and ranking items are special varieties of
A) matching items. B) multiple-choice items. C) short-answer items. D) true-false items.
Answer: A

15) Recall is to recognition as __________ is to __________.
A) essay, multiple-choice B) matching, ranking C) multiple-choice, true-false D) short answer, completion
Answer: A

16) The type of test item preferred by most professional testers, because of its ability to measure both simple and complex skills, is the
A) completion item. B) essay item. C) multiple-choice item. D) true-false item.
Answer: C

17) The type of objective item that is most effective in measuring comprehension, application, and other higher-order objectives of instruction is the __________ item.
A) multiple-choice (completion) B) rearrangement C) short answer D) true-false
Answer: A

18) "Which of the following terms does not belong with the others?" is the stem of a(n) __________ item.
A) classification B) correlate C) oddity D) relational
Answer: C

19) The results of psychometric research have revealed that the quality of a cognitive test is improved when
A) examinees are urged to attempt each item before going on to the next item.
B) items are arranged in order from least to most difficult.
C) items of the same general type are grouped together.
D) more time is spent in item writing and less time in item arrangement.
Answer: D

20) In a typical 50-minute class period at the high-school or college level, an examinee can be expected to answer approximately __________ multiple-choice or __________ true-false items during the time limit.
A) 25, 50 B) 40, 75 C) 50, 100 D) 60, 100
Answer: C

21) The arrangement in which test items are placed in order of increasing difficulty but are alternated with other items of similar difficulty throughout the test is known as the __________ format.
A) critical incidence B) homogeneous hierarchy C) spiral omnibus D) tandem order
Answer: C

22) Which of the following is a type of Likert scale?
A) visual analogue B) true-false C) performance assessment D) hierarchical omnibus
Answer: A

23) Likert scales request examinees to make ratings using __________ choices.
A) 5 B) 7 C) 4 D) can be all of the above
Answer: A

24) Test scores are higher when items are grouped (1) in order from easiest to most difficult, and (2) according to the type of item.
A) Both 1 and 2 are true. B) Only 1 is true. C) Only 2 is true. D) Neither 1 nor 2 is true.
Answer: D

25) Which of the following can be evaluated more effectively by a written test than by an oral test?
A) personal qualities B) cheating and bluffing C) knowledge of facts D) in-depth understanding
Answer: C

26) Oral tests are used most often in grades
A) 1-3. B) 4-6. C) 7-9. D) 10-12.
Answer: A

27) Separate answer sheets can be used at the __________ school level and beyond.
A) primary B) upper elementary C) junior high D) senior high
Answer: B

Chap. 3

10) Test wiseness is an aspect of test
A) planning. B) preparation. C) scoring. D) taking.
Answer: D

11) Which of the following statements concerning test taking is true?
A) Answers are more likely to be changed from right to wrong than vice versa.
B) Changing answers tends to raise scores more on less difficult than on more difficult tests.
C) Examinees should always review their answers when time permits.
D) Girls tend to improve their test scores more than boys when they change answers.
Answer: C

12) In scoring a classroom achievement test, it is recommended that
A) a correction for guessing be applied to multiple-choice, but not to true-false items.
B) essay answers containing the same information not be weighted according to the length of the answer.
C) multiple-choice items be weighted according to their length and degree of complexity.
D) multiple-choice items be weighted more than true-false and short-answer items.
Answer: B

13) Giving a person a high score or high rating on one question simply because he or she scores high on other questions is referred to as the
A) central tendency error. B) halo effect. C) leniency error. D) response set.
Answer: B

14) Ebel's confidence weighting procedure was designed for __________ items.
A) completion B) matching C) multiple-choice D) true-false
Answer: D

15) Computing the absolute values of the differences between the correct answers and the examinee's answers is the first step in scoring a __________ item.
A) matching B) multiple-choice C) rearrangement D) true-false
Answer: C

16) Which of the following recommendations concerning test scoring is incorrect?
A) Do not use correction for guessing formulas.
B) Give multiple-choice items having four alternatives twice as much weight as true-false items.
C) Let examinees know the scoring system before the test begins.
D) Weight essay items according to the amount of space that examinees are instructed to use in their answers.
Answer: B

17) On a classroom test composed of 50 multiple-choice items, the most appropriate scoring formula, where S = final score, R = number of items answered correctly, W = number of items answered incorrectly, and k = number of options per item, is
A) S = R. B) S = R - kW. C) S = R - W. D) S = R - W/(k - 1).
Answer: A

18) Employing the standard correction for guessing formula in scoring a 50-item true-false test, an examinee who gets 30 items right and 20 items wrong will make a score of
A) 10. B) 20. C) 25. D) 30.
Answer: A

19) By sheer random guessing, on the average an examinee can expect to get __________ of the items on a true-false test right and __________ of the items on a four-option multiple-choice test right.
A) 25%, 40% B) 50%, 25% C) 60%, 30% D) 75%, 20%
Answer: B

20) On a four-option, multiple-choice test scored by the recommended procedure for a classroom achievement test, James gets 24 items right, 24 items wrong, and leaves 2 items blank. His score on the test is
A) 0. B) 16. C) 18. D) 24.
Answer: D

21) Which of the following is not an advantage of computer-based adaptive tests?
A) Answers can be revised and changed if the examinee likes.
B) Time to complete them is less than for conventional tests.
C) They are less reliable and valid than conventional tests.
D) Adaptive tests are less expensive than conventional tests.

Chap. 4

23) The mean and standard deviation of z scores are __________ and __________.
A) 0, 1 B) 5, 2 C) 50, 10 D) 100, 15 E) 500, 100
Answer: A

24) Percentile is to percentile rank as
A) decile is to quartile.
B) percentage is to decile.
C) score is to percentage.
D) score is to quartile.
Answer: C

25) The deviation IQs on the Wechsler Intelligence scales are
A) CEEB scores. B) percentile ranks. C) standard scores. D) T scores.
Answer: C

26) The use of either age norms or grade norms assumes that the rate of increase in achievement or ability is
A) greater during the early years or grades.
B) greater during the later years or grades.
C) highly variable from year to year.
D) the same from year to year or grade to grade.
Answer: D

27) From which of the following are all the others derived?
A) AGCT scores B) CEEB scores C) T scores D) z scores
Answer: D

28) Given that the mean WAIS IQ is 100 and the standard deviation is 15, if Johnny's WAIS IQ is 125, what is his corresponding z score?
A) -1.67 B) 1.67 C) 2.00 D) 2.33
Answer: B

29) The problem of unequal score units is least troublesome in the case of __________ norms.
A) age B) grade C) percentile D) standard score
Answer: D

30) Approximately 68% of a normal distribution of test scores falls between a z score range of
A) -.5 to +.5. B) -1.0 to +1.0. C) -1.5 to +1.5. D) -2.0 to +2.0.
Answer: B

31) Which of the following types of scale is an "open-ended" standard score scale?
A) CEEB B) stanine C) Wechsler IQ D) z
Answer: B

32) Wechsler deviation IQ is to Wechsler subtest scaled scores as __________ is to __________.
A) 15, 2 B) 15, 10 C) 100, 3 D) 100, 10
Answer: D

Chap. 5

1) The extent to which a test measures anything consistently is a definition of
A) normality. B) objectivity. C) reliability. D) validity.
Answer: C

2) According to classical test theory, $s_{obs}^2 = s_{tru}^2 + s_{err}^2$, and $r_{11} = s_{tru}^2 / s_{obs}^2$. If the observed variance of a test is 50 and the error variance is 10, what is the estimated reliability of the test?
A) .20 B) .25 C) .45 D) .80
Answer: D

3) The definition of reliability as the ratio of true score variance to observed score variance comes from
A) dependability theory. B) classical test theory. C) generalizability theory. D) item response theory.
Answer: B

4) If 40% of a test's observed variance is due to errors of measurement, then what is the reliability coefficient of the test?
A) .20 B) .40 C) .60 D) .80
Answer: C

5) Which of the following types of reliability coefficients is a coefficient of equivalence?
A) coefficient alpha B) Kuder-Richardson C) parallel forms D) split-half E) test-retest
Answer: C

6) Which type of reliability takes into account error variance due to both different samples of test items and different conditions of administration?
A) internal consistency B) parallel forms C) split-half D) test-retest
Answer: B

7) The standard deviation of a particular WISC-III subtest is 3, and its reliability coefficient is .84. What is the standard error of measurement of the subtest?
A) 1.2 B) 2.0 C) 2.5 D) 3.0
Answer: A

8) Assuming that the correlation between the odd-numbered and the even-numbered items on a test is .74, the corrected split-half reliability of the test (using the Spearman-Brown prophecy formula) is approximately
A) .80. B) .85. C) .90. D) .95.
Answer: B

9) The Kuder-Richardson method of determining reliability yields an average __________ coefficient.
A) alternate tests B) parallel forms C) split-half D) test-retest
Answer: C

10) Which of the following statistics enables an examiner to establish confidence limits for the true scores of examinees having a given observed score on a test?
A) Kuder-Richardson predictive index
B) Spearman-Brown prophecy coefficient
C) standard error of estimate
D) standard error of measurement
Answer: D

11) A test can usually be made more reliable by increasing the
A) correlation between test and criterion scores.
B) length of time for administering the test.
C) number of items on the test.
D) observed variance relative to true variance.
Answer: C

12) The most general formula for computing an internal consistency reliability coefficient is the
A) Cronbach coefficient alpha formula.
B) Kuder-Richardson formula 21.
C) Spearman-Brown prophecy formula.
D) Wherry-Doolittle consistency formula.
Answer: A

13) Which of the following types of reliability does not belong in the same category as the other three?
A) coefficient alpha B) Kuder-Richardson C) split-half D) test-retest
Answer: D

14) An interrater reliability coefficient is computed in determining the reliability of __________ tests.
A) completion and matching B) essay and oral C) multiple-choice and true-false D) objective and projective
Answer: B

15) The standard error of measurement is always zero whenever the reliability coefficient equals
A) -1.00. B) .00. C) .50. D) 1.00.
Answer: D

16) Suppose that Jane makes a score of 60 on a test having a standard deviation of 5 and a reliability coefficient of .85.
Between what two values can we be 95% confident that Jane's true score on the test falls?
A) 59-61 B) 57-63 C) 56-64 D) 55-65
Answer: C

17) Percentile bands for a true score on a test are computed by determining the percentile rank equivalents of scores that are one standard __________ on either side of the examinee's obtained score.
A) deviation B) error of the difference C) error of estimate D) error of measurement
Answer: D

18) Increasing the true variance and the observed variance by the same amount will increase the test's
A) arithmetic mean. B) error variance. C) reliability. D) validity.
Answer: C

19) The reliability of a certain test is .70. Approximately how much longer will the test have to be made, by adding more items of the same general type, to increase the reliability of the test to .90?
A) two times as long B) three times as long C) four times as long D) five times as long
Answer: C

20) Analysis of variance techniques are used in the reliability estimation procedure known as
A) classical reliability theory. B) generalizability theory. C) split-half theory. D) true score theory.
Answer: B

21) A test consisting of 25 items has a reliability of .80. Approximately how many items of the same type as the original ones must be added to the test in order to increase its reliability to .95?
A) 25 B) 50 C) 75 D) 95
Answer: D

22) The extent to which a test measures what it was designed to measure is its
A) internal consistency. B) reliability. C) standardization. D) validity.
Answer: D

23) If a test measures consistently but does not measure what it was designed to measure, then the test is
A) reliable but not valid.
B) reliable but not standardized.
C) standardized but not valid.
D) valid but not reliable.
Answer: A

24) What type of validity is of greatest importance for an achievement test?
A) concurrent B) construct C) content D) predictive
Answer: C

25) The type of validity that is used most often in selection programs in education and industry, in which criteria of success are specified, is
A) concurrent validity. B) construct validity. C) content validity. D) predictive validity.
Answer: D

26) A confidence interval for an examinee's obtained score on a criterion measure can be determined by using the standard error of
A) estimate. B) measurement. C) the mean. D) the sample.
Answer: A
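
Worked check for the computational items

Several items in Chaps. 3, 4, and 5 depend on a handful of formulas: the correction for guessing, the z score, the classical reliability ratio, the standard error of measurement, and the Spearman-Brown formula. The Python sketch below is not part of the original review set. It is a minimal illustration for checking the arithmetic, assuming the standard textbook forms of these formulas. All function names are hypothetical and were chosen only for this example.

```python
import math


def corrected_score(right, wrong, k):
    """Correction for guessing: S = R - W/(k - 1), with k options per item."""
    return right - wrong / (k - 1)


def z_score(x, mean, sd):
    """Standard score: distance from the mean in standard deviation units."""
    return (x - mean) / sd


def reliability(obs_var, err_var):
    """Classical test theory: r11 = true variance / observed variance,
    where true variance = observed variance - error variance."""
    return (obs_var - err_var) / obs_var


def sem(sd, r11):
    """Standard error of measurement: s * sqrt(1 - r11)."""
    return sd * math.sqrt(1 - r11)


def spearman_brown(r, n=2):
    """Estimated reliability of a test lengthened n times (n=2 corrects a split-half r)."""
    return n * r / (1 + (n - 1) * r)


def lengthening_factor(r_old, r_new):
    """Spearman-Brown solved for n: how many times longer the test must be."""
    return r_new * (1 - r_old) / (r_old * (1 - r_new))


def true_score_interval(score, sd, r11, z=1.96):
    """Approximate 95% limits for a true score: observed score +/- z * SEM."""
    half_width = z * sem(sd, r11)
    return score - half_width, score + half_width


if __name__ == "__main__":
    # Chap. 3, item 18: 30 right, 20 wrong, true-false (k = 2)
    print(corrected_score(30, 20, 2))
    # Chap. 4, item 28: IQ 125, mean 100, SD 15
    print(round(z_score(125, 100, 15), 2))
    # Chap. 5, item 2: observed variance 50, error variance 10
    print(round(reliability(50, 10), 2))
    # Chap. 5, item 7: SD 3, reliability .84
    print(round(sem(3, 0.84), 2))
    # Chap. 5, item 8: odd-even correlation .74
    print(round(spearman_brown(0.74), 2))
    # Chap. 5, item 16: score 60, SD 5, reliability .85
    low, high = true_score_interval(60, 5, 0.85)
    print(round(low, 1), round(high, 1))
    # Chap. 5, item 19: reliability .70 raised to .90
    print(round(lengthening_factor(0.70, 0.90), 2))
    # Chap. 5, item 21: 25 items, reliability .80 raised to .95
    print(round(25 * (lengthening_factor(0.80, 0.95) - 1), 2))
```

The keyed answers to these items are rounded to the nearest listed option, so small differences between a computed value and the option (for example, a lengthening factor slightly under 4 for item 19) are expected.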