Psychological Assessment Reviewer
LEGEND:
Main Topic
❖ Primary Terms
➢ Secondary Terms
▪ Definition
• Examples
 Additional Notes
Chapter 1
Psychological Assessment VS Psychological Testing
❖ Psych Assessment
▪ the gathering and integration of psychology-related data for the
purpose of making a psychological evaluation.
▪ answer a referral question, solve a problem, or arrive at a decision
through the use of tools of evaluation.
▪ Assessor is the key to the process or evaluation
❖ Psych Testing
▪ the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.
▪ obtain some gauge, usually numerical in nature, with regard to an
ability or attribute.
▪ The tester does not affect the process or evaluation
Varieties of assessment
❖ Retrospective Assessment
▪ draw conclusions about psychological aspects of a person as they
existed at some point in time prior to the assessment.
❖ Remote Assessment
▪ draws conclusions about a subject who is not in physical proximity to the assessor.
❖ Ecological momentary assessment
▪ evaluation of specific problems and related cognitive and
behavioral variables at the very time and place they occur.
❖ Educational assessment
▪ to evaluate abilities and skills relevant to success or failure in a
school or pre-school context.
❖ Collaborative Assessment
▪ Assessor and assessee may work as partners from initial contact
through final feedback.
❖ Therapeutic psychological assessment
▪ Therapeutic self-discovery and new understandings are
encouraged throughout the assessment process.
❖ Dynamic assessment
▪ Interactive approach to psychological assessment that usually
follows a model of evaluation, intervention of some sort, and
evaluation.
The Tools of Psychological Assessment
❖ Test
▪ Defined simply as a measuring device or procedure.
➢ Psychological test
▪ Refers to a device or procedure designed to measure variables
related to psychology.
❖ Format
▪ The form, plan, structure, arrangement, and layout of test items, as well as related considerations such as time limits.
▪ Also refers to the form in which a test is administered: computerized, pencil-and-paper, or some other form.
❖ Score
▪ A code or summary statement, usually but not necessarily numerical, that reflects an evaluation of performance on a test, task, interview, or other sample of behavior.
The Interview
➢ Panel Interview
▪ More than one interviewer participates in the assessment.
➢ Motivational Interview
▪ A dialogue that combines person-centered listening skills, such as openness and empathy, with the use of cognition-altering techniques designed to positively affect motivation and effect therapeutic change.
Case History Data
❖ Case History Data
▪ Records, transcripts, and other accounts, in written, pictorial, or other form.
❖ Case Study (Case History)
▪ A report about a person or event that was compiled on the basis of case history data.
❖ Groupthink
▪ Arises as a result of the varied forces that drive decision-makers to reach a consensus.
Behavioral Observation
❖ Behavioral Observation
▪ Monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions.
❖ Naturalistic Observation
▪ Research participants are observed in their natural environment, unlike in lab experiments.
Role-Play Tests
❖ Role Play
▪ Acting an improvised or partially improvised part in a simulated situation.
➢ Role-play test
▪ A tool of assessment wherein assessees are directed to act as if they were in a particular situation.
Computers as Tools
❖ Local Processing
▪ Scoring done on-site.
❖ Central Processing
▪ Scoring conducted at some central location.
❖ Teleprocessing
▪ When test-related data are sent to and returned from a central scoring facility electronically.
➢ Interpretive Report
▪ A formal or official computer-generated account of test performance presented in both numeric and narrative form, including an explanation of the findings.
▪ The three varieties of interpretive report are descriptive, screening, and consultative; contrast with a simple scoring report.
➢ Consultative Report
▪ A type of interpretive report designed to provide expert and detailed analysis of test data that mimics the work of an expert consultant.
➢ Integrative Report
▪ Integrates data from sources other than the test itself, such as medication records or behavioral observation data, into the test report.
❖ CAT (Computer Adaptive Testing)
▪ A computer-based test that adapts the items presented to the test taker's performance on previous items.
▪ Also called tailored testing.
How Are Assessments Conducted?
▪ The test administrator must be familiar with the test materials and procedures.
➢ Protocol
▪ Refers to a description of a set of test- or assessment-related procedures.
➢ Rapport
▪ Establishing a working relationship between the examiner and the examinee.
➢ Level C
▪ Tests and aids that require substantial understanding of testing and supporting psychological fields, together with supervised experience in the use of these devices (e.g., projective tests, individual mental tests).
▪ For individually administered tests of intelligence and projective methods, test users need at least a Master's degree in Psychology or supervised experience under a licensed psychologist.
❖ Computerized test administration, scoring, and interpretation
➢ Computer-assisted psychological assessment (CAPA)
▪ Any application of computers to the administration, scoring, or interpretation of tests and questionnaires used in educational and psychological assessment.
▪ A number of psychological tests can now be administered and scored online.
❖ Major issues with CAPA:
▪ Comparability of pencil-and-paper and computerized versions of tests.
▪ Thousands of words are spewed out as computerized interpretation results, but the validity of those interpretations is questionable.
▪ Unprofessional, unregulated "psychological testing" online may contribute to more public skepticism.
The Rights of Test Takers
❖ The right of informed consent
❖ The right to be informed of test findings
❖ The right to privacy and confidentiality
➢ Privacy Right
▪ "Recognizes the freedom of the individual to pick and choose for himself the time, circumstances, and extent to which he wishes to share or withhold from others his attitudes, beliefs, behavior, and opinions."
➢ Confidentiality
▪ Confidentiality concerns matters of communication outside the courtroom; privilege protects clients from disclosure in judicial proceedings.
❖ The right to the least stigmatizing label
▪ Advises that the least stigmatizing labels should always be assigned when reporting test results.
Chapter 2
A Historical Perspective
▪ It is believed that tests and testing came from China as early as 2200 B.C.E.
▪ In ancient China, passing a test examination gave the passer a government job or a particular benefit, like an official position or entitlement to wear a special garb.
Twentieth Century
❖ The measurement of Intelligence
▪ In 1905, Alfred Binet and Theodore Simon published a 30-item measure of intelligence to help identify Paris schoolchildren with intellectual disability.
➢ In 1939, David Wechsler introduced a test to measure adult intelligence: the Wechsler Adult Intelligence Scale (WAIS).
❖ The measurement of Personality
➢ Woodworth Psychoneurotic Inventory
▪ A measure of personality.
▪ After WWI, Woodworth developed a personality test for civilian use that was based on the Personal Data Sheet.
▪ The WPI, or Woodworth Psychoneurotic Inventory, was the first widely used self-report measure of personality.
❖ Self-report
▪ Refers to a process whereby test takers themselves supply assessment-related information by responding to questions, keeping a diary, or self-monitoring thoughts or behaviors.
❖ Projective test
▪ A psychological test in which words, images, or situations are presented to a person and the responses analyzed for the unconscious expression of elements of personality that they reveal.
Culture and Assessment
❖ Culture
▪ The socially transmitted behavior patterns, beliefs, and products of work of a particular population, community, or group of people.
➢ Culture-specific tests
▪ Tests designed for use with people from one culture but not from another.
➢ Individualist culture
▪ Characterized by value being placed on traits such as self-reliance, autonomy, independence, uniqueness, and competitiveness.
▪ The dominant culture in the United States and Great Britain.
➢ Collectivist culture
▪ Value placed on traits such as conformity, cooperation, interdependence, and striving toward group goals.
▪ The dominant culture in Asia, Latin America, and Africa.
 Read about it more in "Test and Group Membership" and "Legal and Ethical Considerations."
CHAPTER 3
Scales of Measurement
❖ Measurement
▪ The act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules.
❖ Scale
▪ A system of ordered numerical or verbal descriptors, occurring at fixed intervals, used as a reference standard in measurement.
▪ A set of numbers or symbols whose properties model empirical properties of the objects to which they are assigned.
➢ Continuous Scale
▪ Measure a continuous variable
▪ Exists when it is theoretically possible to divide any of the values of the scale.
➢ Discrete Scale
▪ Measure a discrete variable
▪ Discrete variable values can take on only certain defined values; nothing exists between adjacent scale points.
❖ Error
▪ Refers to the collective influence of factors other than what a test attempts to measure on performance on the test.
❖ Ordinal Scales
▪ Most frequently used in psychology.
• Kerlinger (1973) said, “Intelligence, aptitude, and personality
test scores are, basically and strictly speaking, ordinal. These
tests indicate with more or less accuracy not the amount of
intelligence, aptitude, and personality traits of individuals, but
rather the rank-order positions of the individuals.”
❖ Interval Scales
▪ Contain equal intervals between numbers.
▪ Numerical
▪ Contains no absolute zero point or fixed beginning.
❖ Ratio Scales
▪ Contains true zero point
▪ Numerical and Informative
▪ Permits not only addition, subtraction, and multiplication but also
division
▪ You can tell that there is a fixed beginning.
❖ Describing Data
➢ Distribution
▪ A set of test scores arrayed for recording or study.
➢ Raw Scores
▪ A straightforward, unmodified accounting of performance that is usually numerical.
❖ Frequency Distributions
▪ All scores are listed alongside the number of times each score
occurred.
▪ The scores might be presented in tabular or graphic form.
❖ Grouped frequency distribution
▪ Replaces the actual test scores with class intervals.
▪ Described as an indicator of how many times each variable value
occurs in a set of grouped observations.
❖ Histogram
▪ A graph with vertical lines drawn at the true limits of each test
score (or class interval), forming a series of contiguous rectangles.
Measures of Central Tendency
▪ A measure of central tendency is a statistic that indicates the average or midmost score between the extreme scores in a distribution.
▪ The mean is a measure of central tendency appropriate for the interval or ratio level of measurement; the median is a measure of central tendency that takes into account the order of scores and is ordinal in nature; and the mode is a measure of central tendency that is nominal in nature.
❖ Arithmetic Mean
▪ Denoted by the symbol X̄ (pronounced "X bar"); equal to the sum of the observations (or test scores) divided by the number of observations.
▪ Formula: X̄ = ΣX / n
❖ The Median
▪ The middle score in a distribution.
▪ When the total number of scores is even, the median can be calculated by averaging the values of the two middle scores.
❖ The Mode
▪ The most frequently occurring score in a distribution.
➢ Bimodal distribution
▪ A probability distribution with two modes.
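 ADDITIONAL: a minimal Python sketch (not from the book) of how the three measures of central tendency above can be computed; the score list is hypothetical.

import statistics

scores = [85, 90, 90, 95, 100]      # hypothetical test scores
print(statistics.mean(scores))      # arithmetic mean: 92
print(statistics.median(scores))    # middle score: 90
print(statistics.mode(scores))      # most frequent score: 90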
Measures of Variability
❖ Variability
▪ An indication of how scores in a distribution are scattered or dispersed.
▪ Two or more distributions of test scores can have the same mean even though differences in the dispersion of scores around the mean can be wide.
❖ The Range
▪ The range of a distribution is equal to the difference between the highest and the lowest scores.
❖ The interquartile and semi-interquartile ranges
▪ A distribution of test scores (or any other data) can be divided into four parts such that 25% of the scores fall within each quarter.
❖ Bar Graph
▪ Numbers indicative of frequency also appears on the Y-axis, and
reference to some categorization (e.g., yes/no/maybe,
male/female) appears on the X-axis.
❖ Interquartile range
▪ A measure of variability equal to the difference between Q3 and Q1.
▪ It is an ordinal statistic
❖ Variance
▪ Equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
• Formula: σ² = Σ(X − X̄)² / n
▪ From raw scores, first calculate the summation of the raw scores squared, divide by the number of scores, and then subtract the mean squared.
• Formula: σ² = ΣX² / n − X̄²
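 ADDITIONAL: a minimal Python sketch (not from the book) showing that the deviation formula and the raw-score formula for the variance agree; the scores are hypothetical.

scores = [2, 4, 4, 4, 6]                               # hypothetical scores
n = len(scores)
mean = sum(scores) / n                                 # X̄ = 4.0
dev_form = sum((x - mean) ** 2 for x in scores) / n    # Σ(X − X̄)² / n = 1.6
raw_form = sum(x * x for x in scores) / n - mean ** 2  # ΣX² / n − X̄² = 1.6
sd = dev_form ** 0.5                                   # standard deviation ≈ 1.26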
Skewness
▪ The nature and extent to which symmetry is absent.
▪ An indication of how the measurements in a distribution are
distributed.
❖ Positive Skew
▪ A distribution has a positive skew when relatively few of the scores
fall at the high end of the distribution.
• High left declining to right
▪ Positively skewed examination results may indicate that the test
was too difficult.
❖ Negative Skew
▪ When relatively few of the scores fall at the low end of the
distribution.
• Low left rising to right
▪ Negatively skewed examination results may indicate that the test
was too easy.
❖ T scores
▪ Can be called a "fifty plus or minus ten" scale; a scale with a mean set at 50 and a standard deviation set at 10.
▪ Used to describe how far from the mean a score falls.
• Computed from a z score as T = 10z + 50.
❖ Stanine
▪ A method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of approximately two.
❖ Linear Transformation
▪ One that retains a direct numerical relationship to the original raw score.
❖ Non-linear Transformation
▪ The resulting standard score does not have a direct numerical relationship to the original raw score.
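 ADDITIONAL: a minimal Python sketch (not from the book) of a linear transformation: raw scores are converted to z scores and then to T scores via T = 10z + 50; the raw scores are hypothetical.

raw = [10, 12, 14, 16, 18]                           # hypothetical raw scores
n = len(raw)
mean = sum(raw) / n                                  # 14.0
sd = (sum((x - mean) ** 2 for x in raw) / n) ** 0.5  # ≈ 2.83
z_scores = [(x - mean) / sd for x in raw]            # mean 0, SD 1
t_scores = [50 + 10 * z for z in z_scores]           # mean 50, SD 10
print(t_scores)                                      # e.g., raw 18 -> T ≈ 64.1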
Normal Curve
▪ A bell-shaped, smooth, mathematically defined curve that is highest at its center.
▪ From the center it tapers on both sides, approaching the X-axis asymptotically (meaning that it approaches, but never touches, the axis).
▪ The curve is perfectly symmetrical, with no skewness.
Kurtosis
▪ Used to refer to the steepness of a distribution in its center.
▪ Describes the peakedness or flatness of three general types of curves:
❖ Platykurtic (relatively flat)
❖ Leptokurtic (relatively peaked)
❖ Mesokurtic (somewhere in the middle)
Correlation and Inference
❖ Coefficient of correlation (correlation coefficient)
▪ A number that provides us with an index of the strength of the relationship between two things.
The Concept of Correlation
▪ An expression of the degree and direction of correspondence between two things.
▪ The coefficient of correlation (r) expresses a linear relationship between two (and only two) variables, usually continuous in nature.
• If a correlation coefficient has a value of +1 or −1, then the relationship between the two variables is perfect (without error in the statistical sense).
❖ Positive Correlation
▪ Exists when two variables simultaneously increase or simultaneously decrease.
❖ Negative Correlation
▪ Occurs when one variable increases while the other decreases.
• If a correlation is zero, then no relationship exists between the two variables.
The Pearson r
▪ Devised by Karl Pearson; r can be the statistical tool of choice when the relationship between the variables is linear and the two variables being correlated are continuous (i.e., they can theoretically take any value).
▪ The value obtained for the coefficient of correlation can be further interpreted by deriving from it what is called the coefficient of determination, or r².
▪ The coefficient of determination indicates how much variance is shared by the X- and the Y-variables.
The Spearman Rho
▪ Developed by Charles Spearman; this coefficient of correlation is frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are in ordinal (rank-order) form.
Graphic Representations of Correlation
❖ Scatterplot
▪ Useful in revealing the presence of curvilinearity in a relationship.
▪ Curvilinearity in this context refers to the degree to which a graph is curved.
❖ Outlier
▪ An extremely atypical point located at a relative distance from the rest of the coordinate points in a scatterplot.
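 ADDITIONAL: a minimal Python sketch (not from the book) of the Pearson r and the coefficient of determination (r²); the paired scores are hypothetical.

x = [1, 2, 3, 4, 5]                                       # hypothetical scores on test X
y = [2, 4, 5, 4, 5]                                       # hypothetical scores on test Y
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
r = cov / (sx * sy)                                       # Pearson r ≈ 0.77
print(r, r ** 2)                                          # r² ≈ 0.60: shared variance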
CHAPTER 4
Assumption 1: Psychological Traits and States Exist
❖ Trait
▪ “Any distinguishable, relatively enduring way in which one individual varies from another.”
▪ Tends to be a more stable and enduring characteristic or pattern of
behavior.
❖ States
▪ Distinguish one person from another but are relatively less
enduring (Chaplin et al., 1988)
▪ A state is a temporary way of being (i.e., thinking, feeling,
behaving, and relating)
❖ Construct
▪ An informed, scientific concept developed or constructed to
describe or explain behavior.
▪ We can’t see, hear, or touch constructs, but we can infer their
existence from overt behavior
❖ Overt Behavior
▪ Refers to an observable action or the product of an observable
action, including test- or assessment-related responses.
Assumption 2: Psychological Traits and States Can Be Quantified and
Measured
▪ Test developers must first define the construct that needs to be measured.
▪ They then consider the items (supposed to be indicative of the construct, trait, or state being measured) to be included in the test.
❖ Cumulative Scoring
▪ Represent the strength of the targeted ability or trait or state.
▪ The assumption that the more the test taker responds in a
particular direction as keyed by the test manual as correct or
consistent with a particular trait, the higher that test taker is
presumed to be on the targeted ability or trait.
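 ADDITIONAL: a minimal Python sketch (not from the book) of cumulative scoring: each response matching the keyed direction adds one point, so a higher total implies more of the targeted trait; the items and key are hypothetical.

responses = ["yes", "no", "yes", "yes"]  # hypothetical answers to four items
keyed     = ["yes", "yes", "yes", "no"]  # trait-consistent answer for each item
score = sum(r == k for r, k in zip(responses, keyed))  # each match adds 1
print(score)                             # 2: items 1 and 3 match the key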
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
▪ Some tests are used not to predict future behavior but to postdict
it. To understand behavior that has already taken place.
▪ the objective of the test is to provide some indication of other
aspects of the examinee’s behavior.
Assumption 4: Tests and Other Measurement Techniques Have Strengths and
Weaknesses
▪ Test user should understand the test that they are going to use.
Assumption 5: Various Sources of Error Are Part of the Assessment Process
▪ error traditionally refers to something that is more than expected;
it is actually a component of the measurement process.
▪ error refers to a long-standing assumption that factors other than
what a test attempts to measure will influence performance on the
test.
Assumption 6: Testing and Assessment Can Be Conducted in a Fair and
Unbiased Manner
▪ One source of fairness-related problems is the test user who
attempts to use a particular test with people whose background
and experience are different from the background and experience
of people for whom the test was intended.
Assumption 7: Testing and Assessment Benefit Society
▪ Imagine a world without tests: how else could we identify the proficiency, skill, and intelligence of people like Bong-bong Marcos and other officials to determine whether they are fit for the presidency?
What Is a Good Test?
▪ Logically, the criteria for a good test would include clear instructions for administration, scoring, and interpretation.
▪ Most of all, a good test would seem to be one that measures what it claims to measure.
Norm-Referenced Testing and Assessment
▪ A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers on the same test.
❖ Norms
▪ The singular "norm" is used in the scholarly literature to refer to behavior that is usual, average, normal, standard, expected, or typical.
▪ In a psychometric context, norms are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.
❖ Normative Sample
▪ The group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers.
❖ Norming
▪ The process of deriving norms.
❖ Race norming
▪ The controversial practice of norming on the basis of race or ethnic background.
❖ Program Norms / User Norms
▪ "Consist of descriptive statistics based on a group of test takers in a given period of time rather than norms obtained by formal sampling methods."
Sampling to Develop Norms
❖ Test Standardization
▪ The process of administering a test to a representative sample of test takers for the purpose of establishing norms.
▪ A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.
❖ Sampling
▪ The process of selecting the portion of the universe deemed to be representative of the whole population.
❖ Sample
▪ A portion of the universe of people deemed to be representative of the whole population.
➢ Stratified Sampling
▪ A method of sampling from a population that can be partitioned into subpopulations.
➢ Purposive Sampling
▪ A sampling technique that relies on the judgment of the researcher: members of the population are arbitrarily selected to participate because they are believed to be representative.
➢ Convenience Sample/Sampling
▪ One that is convenient or available for use.
❖ Developing Norms for a Standardized Test
▪ The test developer administers the test following a standard set of instructions that will be used by future test users.
▪ The test developer also describes the recommended setting for giving the test; it has to be consistent across administrations.
▪ The instructions and conditions of administration should be the same from the normative sample onward to every later administration of the test.
Types of Norms
❖ Percentiles
▪ An expression of the percentage of people whose score on a test or measure falls below a particular raw score. (A computational sketch follows the list of norm types below.)
❖ Grade Norms
▪ Designed to indicate the average test performance of test takers in a given school grade (e.g., first through sixth grades).
❖ Developmental Norms
▪ a term applied broadly to norms developed on the basis of any
trait, ability, skill, or other characteristic that is presumed to
develop, deteriorate, or otherwise be affected by chronological
age, school grade, or stage of life.
❖ National Norms
▪ Derived from a normative sample that was nationally
representative of the population at the time the norming study
was conducted.
❖ National Anchor Norms
▪ provide some stability to test scores by anchoring them to other
test scores.
❖ Equipercentile Method
▪ The equivalency of scores on different tests is calculated with
reference to corresponding percentile scores.
❖ Subgroup Norms
▪ A normative sample can be segmented by any of the criteria
initially used in selecting subjects for the sample
❖ Local Norms
▪ provide normative information with respect to the local
population’s performance on some test.
▪ Typically developed by test users themselves
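 ADDITIONAL: a minimal Python sketch (not from the book) of a percentile rank as defined under Types of Norms: the percentage of scores in the normative sample falling below a particular raw score; the sample is hypothetical.

norm_sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical normative scores
raw_score = 82
rank = 100 * sum(s < raw_score for s in norm_sample) / len(norm_sample)
print(rank)  # 60.0: six of the ten normative scores fall below 82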
Fixed Reference Group Scoring Systems
▪ The distribution of scores obtained on the test from one group of
test takers
▪ Used as the basis for the calculation of test scores for future
administrations of the test.
Norm-Referenced Versus Criterion-Referenced Evaluation
❖ Norm-Referenced
▪ One way to derive meaning from a test score is to evaluate the
test score in relation to other scores on the same test.
❖ Criterion
▪ A standard on which a judgment or decision may be based.
❖ Criterion-referenced testing and assessment
▪ A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.
CHAPTER 5 (read the book for examples)
Reliability
▪ In the language of psychometrics, reliability refers to consistency in measurement.
❖ Reliability coefficient
▪ An index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.
The Concept of Reliability
❖ Variance (σ²)
▪ The expected value of the squared variation of a random variable from its mean value, in probability and statistics.
▪ A measure of how far a set of data (numbers) are spread out from their mean (average) value.
▪ A measure of how data points differ from the mean.
➢ True Variance
▪ Variance from true differences.
➢ Error Variance
▪ Variance from irrelevant, random sources.
❖ Random Error
▪ Sometimes referred to as "noise"; a source of error that fluctuates from one testing situation to another with no pattern that would systematically raise or lower scores.
❖ Systematic Error
▪ Refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
▪ A systematic source of error does not affect score consistency.
▪ Once a systematic error becomes known, it becomes predictable, as well as fixable.
Sources of Error Variance
❖ Test construction
➢ Item Sampling/Content Sampling
▪ Terms that refer to variation among items within a test as well as to variation among items between tests.
▪ Consider two or more tests designed to measure the same skill, personality attribute, or body of knowledge: differences are sure to be found in the way the items are worded and in the specific content sampled.
❖ Test administration (read in the book)
▪ Sources of error variance that occur during test administration may influence the test taker's attention or motivation.
▪ The test taker's reactions to those influences are the source of one kind of error variance.
• Test environment: room temperature, level of lighting, and amount of ventilation and noise.
➢ Test taker variables:
▪ Pressing emotional problems, physical discomfort, and the effects of drugs or medication are all sources of error variance.
➢ Examiner-related variables:
▪ The examiner's physical appearance and demeanor, and even the presence or absence of an examiner.
❖ Test scoring and interpretation
▪ Scorers and scoring systems are potential sources of error variance.
▪ The advent of computer scoring and a growing reliance on objective, computer-scorable items have reduced error variance caused by scorer differences.
❖ Other sources of error
▪ Surveys and polls are two tools of assessment commonly used by researchers who study public opinion.
Reliability Estimates
❖ Test-retest Reliability Estimates
▪ One way of estimating the reliability of a measuring instrument is by using the same instrument to measure the same thing at two points in time; this is called test-retest reliability.
▪ The result of such an evaluation is an estimate of test-retest reliability.
• When the interval between testings is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of stability.
Parallel-Forms and Alternate-Forms Reliability
❖ Coefficient of Equivalence
▪ The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, also known as the coefficient of equivalence.
❖ Parallel Forms
▪ Exist when, for each form of the test, the means and the variances of observed test scores are equal.
❖ Alternate Forms
▪ Simply different versions of a test that have been constructed so as to be parallel.
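 ADDITIONAL: a minimal Python sketch (not from the book) of the reliability coefficient defined at the start of this chapter, the ratio of true score variance to total variance; the true scores and errors are simulated, not real data.

import random
random.seed(0)
true_scores = [random.gauss(100, 15) for _ in range(10000)]  # simulated true scores
observed = [t + random.gauss(0, 5) for t in true_scores]     # true score + random error

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# expected value: 15² / (15² + 5²) = 0.90
print(variance(true_scores) / variance(observed))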
Split-half Reliability Estimates
❖ Split-half Reliability
▪ Statistical method used to measure the consistency of the scores
of a test.
▪ Obtained by correlating two pairs of scores obtained from
equivalent halves of a single test administered once.
 CHECK EXAMPLE IN THE BOOK
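 ADDITIONAL: a minimal Python sketch (not from the book) of a split-half estimate: total the odd items and the even items from one administration, then correlate the two half-scores; the item data are hypothetical.

# rows = test takers, columns = six dichotomous items (1 = correct)
data = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
odd_half  = [sum(row[0::2]) for row in data]  # items 1, 3, 5
even_half = [sum(row[1::2]) for row in data]  # items 2, 4, 6

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

r_half = pearson(odd_half, even_half)  # reliability of a half-length test
print(r_half)                          # ≈ 0.51 for this made-up data

The Spearman-Brown formula (next entry) can then adjust this half-test value to the full test length.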
❖ Spearman-Brown Formula
▪ Relates psychometric reliability to test length; used by psychometricians to predict the reliability of a test after changing the test length.
▪ Because the reliability of a test is affected by its length, a formula is necessary for estimating the reliability of a test that has been shortened or lengthened.
 CHECK EXAMPLE IN THE BOOK
Other Methods of Estimating Internal Consistency
❖ Inter-item consistency
▪ The degree of correlation among all the items on a scale.
▪ Calculated from a single administration of a single form of a test.
▪ Useful in assessing the homogeneity of the test.
• Tests are said to be homogeneous if they contain items that measure a single trait.
• The more homogeneous a test is, the more inter-item consistency it can be expected to have.
▪ Heterogeneity describes the degree to which a test measures different factors.
• Test takers with the same score on a more heterogeneous test may have quite different abilities.
❖ The Kuder–Richardson formulas
▪ Check the internal consistency of measurements with dichotomous choices.
▪ KR-20 is the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong (such as multiple-choice items).
▪ If test items are more heterogeneous, KR-20 will yield lower reliability estimates than the split-half method.
▪ KR-21 may be used if there is reason to assume that all the test items have approximately the same degree of difficulty.
 CHECK EXAMPLE IN THE BOOK
❖ Coefficient Alpha
▪ One way to quantify reliability; represents the proportion of observed score variance that is true score variance.
▪ Developed by Cronbach (1951).
▪ Appropriate for use on tests containing non-dichotomous items.
 CHECK EXAMPLE IN THE BOOK
❖ Average Proportional Distance
▪ A newer measure for evaluating the internal consistency of a test.
▪ A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
 CHECK EXAMPLE IN THE BOOK
Measures of Inter-Scorer Reliability
❖ Inter-scorer reliability
▪ The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
▪ If the reliability coefficient is high, the prospective test user knows that test scores can be derived in a systematic, consistent way by various scorers with sufficient training.
▪ Often used when coding nonverbal behavior.
 CHECK EXAMPLE IN THE BOOK
The Nature of the Test
Considerations in interpreting a reliability coefficient include whether:
1) the test items are homogeneous or heterogeneous in nature;
2) the characteristic, ability, or trait being measured is presumed to be dynamic or static;
3) the range of test scores is or is not restricted;
4) the test is a speed or a power test; and
5) the test is or is not criterion-referenced.
❖ Homogeneity versus heterogeneity of test items
▪ Tests designed to measure one factor, such as one ability or one trait, are expected to be homogeneous in items.
▪ For such tests, it is reasonable to expect a high degree of internal consistency.
▪ If the test is heterogeneous in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability.
❖ Dynamic versus static characteristics
➢ Dynamic
▪ A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
➢ Static
▪ A trait, state, or ability presumed to be relatively unchanging, such as intelligence, so that even hourly assessments would be expected to yield similar results.
❖ Restriction or inflation of range
➢ Restriction of Range/Variance
▪ If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower.
➢ Inflation of Range/Variance
▪ If the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher.
❖ Speed tests versus power tests
➢ Power Test
▪ Has a long time limit, and some items are so difficult that no test taker is able to obtain a perfect score.
➢ Speed Test
▪ The time limit on a speed test is established so that few if any of the test takers will be able to complete the entire test.
▪ A reliability estimate of a speed test should be based on performance from two independent testing periods using one of the following: (1) test-retest reliability, (2) alternate-forms reliability, or (3) split-half reliability from two separately timed half tests.
❖ Criterion-referenced tests
▪ Designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or a vocational objective.
▪ Use test scores to generate a statement about the behavior that can be expected of a person with a given score.
▪ Scores on criterion-referenced tests tend to be interpreted in pass–fail (or, perhaps more accurately, "master/failed-to-master") terms, and any scrutiny of performance on individual items tends to be for diagnostic and remedial purposes.
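 ADDITIONAL: a minimal Python sketch (not from the book) of the Spearman-Brown prediction, r' = n·r / (1 + (n − 1)·r), and of coefficient alpha from the sections above; all numbers are hypothetical.

def spearman_brown(r, n):
    # predicted reliability when the test length changes by factor n
    return n * r / (1 + (n - 1) * r)

# full-test reliability predicted from a half-test correlation of .70 (n = 2)
print(spearman_brown(0.70, 2))   # ≈ 0.82

def coefficient_alpha(items):
    # items: one list of scores per item, same test takers in the same order
    k = len(items)
    totals = [sum(col[i] for col in items) for i in range(len(items[0]))]
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

items = [[1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1]]  # 3 items, 4 test takers
print(coefficient_alpha(items))                     # ≈ 0.27 for this tiny example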
The True Score Model of Measurement and Alternatives to It
❖ Classical test theory (CTT)
▪ The true score (or classical) model of measurement.
▪ Assumes that an observed score is composed of a true score plus error (X = T + E).
➢ Generalizability theory
▪ Based on the idea that a person’s test scores vary from testing to
testing because of variables in the testing situation
▪ Cronbach encouraged test developers and researchers to describe
the details of the particular test situation or universe leading to a
specific test score.
▪ This universe is described in terms of its facets, which include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
▪ According to generalizability theory, given the exact same
conditions of all the facets in the universe, the exact same test
score should be obtained. This test score is the universe score, and
it is, as Cronbach noted, analogous to a true score in the true
score model.
➢ Generalizability Study
▪ Examines how generalizable scores from a particular test are if the
test is administered in different situations
➢ Coefficients of Generalizability
▪ It represents the influence of particular facets on the test score.
➢ Decision Study
▪ Developers examine the usefulness of test scores in helping the
test user make decisions.
▪ Designed to tell the test user how test scores should be used and
how dependable those scores are as a basis for decisions,
depending on the context of their use.
❖ Item response theory (IRT)
▪ A paradigm for the design, analysis, and scoring of tests,
questionnaires, and similar instruments measuring abilities,
attitudes, or other variables.
▪ Also known as the latent response theory refers to a family of
mathematical models that attempt to explain the relationship
between latent traits (unobservable characteristic or attribute) and
their manifestations (i.e., observed outcomes, responses or
performance).
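 ADDITIONAL: a minimal Python sketch (not from the book) of one common IRT model, the two-parameter logistic (2PL), in which the probability of a correct response depends on the latent trait θ, item difficulty b, and item discrimination a; the parameter values are hypothetical.

import math

def p_correct(theta, a, b):
    # 2PL model: P(correct) = 1 / (1 + exp(-a * (theta - b)))
    return 1 / (1 + math.exp(-a * (theta - b)))

print(p_correct(0.0, a=1.0, b=0.0))   # 0.50: trait level equal to item difficulty
print(p_correct(1.0, a=1.0, b=0.0))   # ≈ 0.73: higher trait, higher probability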
❖ Polytomous Test Items
▪ Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
❖ Dichotomous Test Items
▪ Test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions.
Reliability and Individual Scores
▪ The reliability coefficient helps the test developer build an adequate measuring instrument, and it helps the test user select a suitable test.
▪ By employing the reliability coefficient in the formula for the standard error of measurement, the test user has another descriptive statistic relevant to test interpretation, this one useful in estimating the precision of a particular test score.
The Standard Error of Measurement
▪ Used to estimate or infer the extent to which an observed score deviates from a true score.
▪ Provides an estimate of the amount of error inherent in an observed score or measurement.
▪ Often abbreviated as SEM.
▪ The higher the reliability of a test (or individual subtest within a test), the lower the SEM.
CHAPTER 6
Validity
▪ An estimate of how well a test measures what it purports to measure in a particular context.
▪ A judgment based on evidence about the appropriateness of inferences drawn from test scores.
❖ Inference
▪ A logical result or deduction.
❖ Validation
▪ The process of gathering and evaluating evidence about validity.
▪ It is the test developer's responsibility to supply validity evidence in the test manual.
❖ Local validation studies
▪ Absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.
One way measurement specialists have traditionally conceptualized validity is according to three categories:
❖ Content validity
▪ A measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test.
❖ Criterion-related validity
▪ A measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
❖ Construct validity
▪ A measure of validity that is arrived at by executing a comprehensive analysis of:
▪ how scores on the test relate to other test scores and measures, and
▪ how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.
❖ Ecological Validity
▪ A judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured (typically a behavior, cognition, or emotion) actually occurs.
▪ The greater the ecological validity of a test or other measurement procedure, the greater the generalizability of the measurement results to particular real-life circumstances.
Face Validity
▪ A test can be said to have face validity if it appears to measure what it is supposed to measure.
• If a test is prepared to measure multiplication, and the people who take it think that it looks like a good test of multiplication ability, that demonstrates face validity of the test.
• A test's lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test, with a consequential decrease in the test taker's motivation to do his or her best.
• In a corporate environment, lack of face validity may lead to unwillingness of administrators to endorse the use of a particular test.
Content Validity
▪ Describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
❖ Concurrent Validity
▪ Refers to the ability of a test to relate to a criterion measure obtained at the same time (concurrently).
▪ Measures how well a new test compares to a well-established test.
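 ADDITIONAL: a minimal Python sketch (not from the book) of the standard error of measurement described earlier in this chapter, using the standard formula SEM = SD·√(1 − r), and of banding an observed score with it; the SD and reliability values are hypothetical.

sd, reliability = 15, 0.91
sem = sd * (1 - reliability) ** 0.5   # 15 · √0.09 = 4.5
observed = 100
# an approximate 95% band around the observed score: ± 1.96 · SEM
print(observed - 1.96 * sem, observed + 1.96 * sem)   # ≈ 91.2 to 108.8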
❖ Predictive Validity
▪ An index of the degree to which a test score predicts some
criterion measure.
▪ refers to how likely it is for test scores to predict future job
performance.
❖ Base Rate
▪ The extent to which a particular trait, behavior, characteristic, or
attribute exists in the population (expressed as a proportion).
❖ Hit Rate
▪ Defined as the proportion of people a test accurately identifies as
possessing or exhibiting a particular trait, behavior, characteristic,
or attribute.
❖ Miss Rate
▪ Defined as the proportion of people the test fails to identify as
having, or not having, a particular characteristic or attribute.
❖ False Positive
▪ A miss wherein the test predicted that the test taker did possess
the particular characteristic or attribute being measured when in
fact the test taker did not.
• The test said yes, but the truth was no.
❖ False Negative
▪ A miss wherein the test predicted that the test taker did not
possess the particular characteristic or attribute being measured
when the test taker actually did.
• The test said no, but the truth was yes.
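 ADDITIONAL: a minimal Python sketch (not from the book) of tallying hits, false positives, and false negatives from paired test predictions and actual outcomes; the data are hypothetical.

# 1 = has the characteristic, 0 = does not
predicted = [1, 1, 0, 0, 1, 0]
actual    = [1, 0, 0, 1, 1, 0]
pairs = list(zip(predicted, actual))
hits            = sum(p == a for p, a in pairs)             # correct calls: 4
false_positives = sum(p == 1 and a == 0 for p, a in pairs)  # said yes, truth no: 1
false_negatives = sum(p == 0 and a == 1 for p, a in pairs)  # said no, truth yes: 1
print(hits / len(pairs))                                    # hit rate ≈ 0.67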
What Is a Criterion?
▪ The standard against which a test or a test score is evaluated.
❖ Characteristics of a Criterion
1. An adequate criterion measure must be valid for the purpose for which it is being used.
2. A criterion should be uncontaminated. When a measure used as a predictor is also used as the criterion, the result is called criterion contamination.
❖ The validity coefficient
▪ A correlation coefficient that provides a measure of the
relationship between test scores and scores on the criterion
measure.
▪ The correlation coefficient computed from a score (or classification) on a psychodiagnostic test and the criterion score (or classification) assigned by psychodiagnosticians is one example of a validity coefficient.
Construct Validity
▪ A judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
▪ The researcher investigating a test's construct validity must formulate hypotheses about the expected behavior of high scorers and low scorers on the test.
❖ Construct
▪ An informed, scientific concept developed or constructed to describe or explain behavior (see Chapter 4).
Evidence of Construct Validity
❖ Evidence of pretest–posttest changes
▪ The design of such pretest–posttest research should ideally include a control group to rule out alternative explanations.
❖ Test scores obtained by people from distinct groups vary as predicted by the theory
▪ Also referred to as the method of contrasted groups: one way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predicted way as a function of membership in some group.
▪ The rationale here is that if a test is a valid measure of a particular construct, then test scores from groups of people who would be presumed to differ with respect to that construct should be correspondingly different.
▪ Test scores correlate with scores on other tests in accordance with what would be predicted from a theory that covers the manifestation of the construct in question.
❖ Convergent evidence
▪ Evidence for the construct validity of a particular test may converge from a number of sources, such as other tests or measures designed to assess the same (or a similar) construct.
▪ It is an example of convergent evidence when scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, already validated tests designed to measure the same (or a similar) construct.
❖ Discriminant evidence
▪ When measures of constructs that theoretically should not be highly related to each other are shown, in fact, not to be highly related to each other.
❖ The multitrait-multimethod matrix (Campbell & Fiske, 1959)
▪ The matrix or table that results from correlating variables (traits) within and between methods.
❖ Factor Analysis
▪ A shorthand term for a class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
• Frequently employed as a data reduction method in which several sets of scores and the correlations between them are analyzed.
• Conducted on either an exploratory or a confirmatory basis.
❖ Exploratory factor analysis
▪ Typically entails "estimating, or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation."
❖ Confirmatory Factor Analysis
▪ Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.
❖ Factor Loading
▪ Conveys information about the extent to which a factor determines a test score; factor analysis thereby serves as a data reduction method built on the correlations between observed variables and underlying factors.
❖ Incremental validity
▪ The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
• Example: predicting grade point average by either studying or resting.
Validity, Bias, and Fairness
❖ Test Bias
▪ For psychometricians, bias is a factor inherent in a test that systematically prevents accurate, impartial measurement.
▪ Bias implies systematic variation.
❖ Rating Error
▪ A judgment resulting from the intentional or unintentional misuse of a rating scale.