Psychological Assessment #BLEPP2023
Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls
Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly

Psychometric Properties and Principles (39)
Psychometric Properties – essential in constructing, selecting, and interpreting tests

Psychological Testing – process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior
- numerical in nature
- individual or by group
- administrators can be interchangeable without affecting the evaluation
- requires technician-like skills in terms of administration and scoring
- yields a test score or a series of test scores
- takes minutes to a few hours

Psychological Assessment – gathering and integration of psychology-related data for the purpose of making a psychological evaluation
- answers the referral question through the use of different tools of evaluation
- individual
- the assessor is the key to the process of selecting tests and/or other tools of evaluation
- requires an educated selection of tools of evaluation, skill in evaluation, and thoughtful organization and integration of data
- entails logical problem-solving that brings to bear many sources of data assigned to answer the referral question
- Educational: evaluates abilities and skills relevant in the school context
- Retrospective: draws conclusions about psychological aspects of a person as they existed at some point in time prior to the assessment
- Remote: the subject is not in physical proximity to the person conducting the evaluation
- Ecological Momentary: "in the moment" evaluation of specific problems and related cognitive and behavioral variables at the very time and place that they occur
- Collaborative: the assessor and assessee may work as "partners" from initial contact through final feedback
- Therapeutic: therapeutic self-discovery and new understanding are encouraged
- Dynamic: an interactive approach to psychological assessment that usually follows the model: evaluation > intervention of some sort > evaluation

o Psychological Test – device or procedure designed to measure variables related to psychology
  ▪ Content: subject matter
  ▪ Format: form, plan, structure, arrangement, layout
  ▪ Item: a specific stimulus to which a person responds overtly; this response is scored or evaluated
  ▪ Administration Procedures: one-to-one basis or group administration
  ▪ Score: code or summary statement, usually but not necessarily numerical in nature, that reflects an evaluation of performance on a test
  ▪ Scoring: the process of assigning scores to performances
  ▪ Cut-Score: reference point derived by judgement and used to divide a set of data into two or more classifications
  ▪ Psychometric Soundness: technical quality
  ▪ Psychometrics: science of psychological measurement
  ▪ Psychometrist or Psychometrician: a professional who uses, analyzes, and interprets psychological data

Ability or Maximal Performance Test – assesses what a person can do
1. Achievement Test – measurement of previous learning
- used to measure general knowledge in a specific period of time
- used to assess mastery
- relies mostly on content validity
- fact-based or conceptual
2. Aptitude – refers to the potential for learning or acquiring a specific skill
- tends to focus on informal learning
- relies mostly on predictive validity
3. Intelligence – refers to a person’s general potential to solve problems, adapt to changing environments, think abstractly, and profit from experience
Human Ability – considerable overlap of achievement, aptitude, and intelligence tests

Typical Performance Test – measures usual or habitual thoughts, feelings, and behavior
- indicates how test takers think and act on a daily basis
- uses interval scales
- no right and wrong answers
Personality Test – measures individual dispositions and preferences
- designed to identify characteristics
- measured idiographically or nomothetically
1. Structured Personality Tests – provide statements, usually self-report, and require the subject to choose between two or more alternative responses
2. Projective Personality Tests – unstructured; the stimulus or the response is ambiguous
3. Attitude Test – elicits personal beliefs and opinions
4. Interest Inventories – measure likes and dislikes as well as one’s personality orientation towards the world of work

Other Tests:
1. Speed Tests – the interest is the number of items a test taker can answer correctly in a specific period
2. Power Tests – reflect the level of difficulty of the items the test takers answer correctly
3. Values Inventory
4. Trade Test
5. Neuropsychological Test
6. Norm-Referenced Test
7. Criterion-Referenced Test

Psychological Assessment Process
1. Determining the Referral Question
2. Acquiring Knowledge relating to the content of the problem
3. Data Collection
4. Data Interpretation
o Hit Rate – accurately predicts success or failure
o Profile – narrative description, graph, table, or other representation of the extent to which a person has demonstrated certain targeted characteristics as a result of the administration or application of tools of assessment
o Actuarial Assessment – an approach to evaluation characterized by the application of empirically demonstrated statistical rules as a determining factor in assessors’ judgement and actions
o Mechanical Prediction – application of computer algorithms together with statistical rules and probabilities to generate findings and recommendations
o Extra-Test Behavior – observations made by an examiner regarding what the examinee does and how the examinee reacts during the course of testing that are indirectly related to the test’s specific content but of possible significance to interpretation

Parties in Psychological Assessment
1. Test Author/Developer – creates the tests or other methods of assessment
2. Test Publishers – publish, market, sell, and control the distribution of tests
3. Test Reviewers – prepare evaluative critiques based on the technical and practical aspects of the tests
4. Test Users – use the test or method of assessment
5. Test Takers – those who take the tests
6. Test Sponsors – institutions or government bodies that contract test developers for various testing services
7. Society

o Behavioral Observation – monitoring of the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
  ▪ Naturalistic Observation: observe humans in a natural setting
  ▪ SORC Model: Stimulus, Organismic Variables, Actual Response, Consequence
o Role Play – acting an improvised or partially improvised part in a simulated situation
  ▪ Role Play Test: assessees are directed to act as if they are in a particular situation
o Other tools include computers and physiological devices (biofeedback devices)
o Interview – method of gathering information through direct communication involving reciprocal exchange
  - Standardized/Structured – questions are prepared
  - Non-standardized/Unstructured – pursue relevant ideas in depth
  - Semi-Standardized/Focused – may probe further on a specific number of questions
  - Non-Directive – the subject is allowed to express his feelings without fear of disapproval
  ▪ Mental Status Examination: determines the mental status of the patient
  ▪ Intake Interview: determines why the client came for assessment; a chance to inform the client about the policies, fees, and process involved
  ▪ Social Case: biographical sketch of the client
  ▪ Employment Interview: determines whether the candidate is suitable for hiring
  ▪ Panel Interview (Board Interview): more than one interviewer participates in the assessment
  ▪ Motivational Interview: used by counselors and clinicians to gather information about some problematic behavior while simultaneously attempting to address it therapeutically
o Portfolio – samples of one’s ability and accomplishment
o Case History Data – records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee
  ▪ Case Study: a report or illustrative account concerning a person or an event that was compiled on the basis of case history data
  ▪ Groupthink: result of the varied forces that drive decision-makers to reach a consensus
o Test Battery – selection of tests and assessment procedures typically composed of tests designed to measure different variables but having a common objective

Assumptions about Psychological Testing and Assessment

Assumption 1: Psychological Traits and States Exist
o Trait – any distinguishable, relatively enduring way in which one individual varies from another
- permits people to predict the present from the past
- characteristic patterns of thinking, feeling, and behaving that generalize across similar situations, differ systematically between individuals, and remain rather stable across time
- Psychological Trait – intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, etc.
o States – distinguish one person from another but are relatively less enduring
- characteristic pattern of thinking, feeling, and behaving in a concrete situation at a specific moment in time
- identify those behaviors that can be controlled by manipulating the situation
o Psychological traits exist as constructs
- Construct: an informed, scientific concept developed or constructed to explain a behavior, inferred from overt behavior
- Overt Behavior: an observable action or the product of an observable action
o A trait is not expected to be manifested in behavior 100% of the time
o Whether a trait manifests itself in observable behavior, and to what degree it manifests, is presumed to depend not only on the strength of the trait in the individual but also on the nature of the situation (situation-dependent)
o The context within which behavior occurs also plays a role in helping us select appropriate trait terms for observed behaviors
o The definitions of trait and state also refer to a way in which one individual varies from another
o Assessors may make comparisons among people who, because of their membership in some group or for any number of other reasons, are decidedly not average

Assumption 2: Psychological Traits and States can be Quantified and Measured
o Once the trait, state, or other construct to be measured has been defined, a test developer considers the types of item content that would provide insight into it, to gauge the strength of that trait
o Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results
o Cumulative Scoring – assumption that the more the testtaker responds in a particular direction keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
o The tasks in some tests mimic the actual behaviors that the test user is attempting to understand
o Such tests yield only a sample of the behavior that can be expected to be emitted under nontest conditions

Assumption 4: Tests and Other Measurement Techniques have Strengths and Weaknesses
o Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources

Assumption 5: Various Sources of Error are part of the Assessment Process
o Error – refers to something that is more than expected; it is a component of the measurement process
  ▪ Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
  ▪ Error Variance – the component of a test score attributable to sources other than the trait or ability measured
o Potential sources of error variance:
1. Assessors
2. Measuring Instruments
3. Random errors such as luck
o Classical Test Theory – each testtaker has a true score on a test that would be obtained but for the action of measurement error

Assumption 6: Testing and Assessment can be conducted in a Fair and Unbiased Manner
o Despite the best efforts of many professionals, fairness-related questions and problems do occasionally arise
o In all questions about tests with regard to fairness, it is important to keep in mind that tests are tools: they can be used properly or improperly
let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls ▪ Factors that contribute to inconsistency: Assumption 7: Testing and Assessment Benefit characteristics of the individual, test, or situation, Society which have nothing to do with the attribute being o Considering the many critical decisions that are measured, but still affect the scores based on testing and assessment procedures, we o Goals of Reliability: can readily appreciate the need for tests ✓ Estimate errors Reliability ✓ Devise techniques to improve testing and reduce o Reliability – dependability or consistency of the errors instrument or scores obtained by the same person o Variance – useful in describing sources of test score when re-examined with the same test on different variability occasions, or with different sets of equivalent items ▪ True Variance: variance from true differences ▪ Test may be reliable in one context, but ▪ Error Variance: variance from irrelevant random unreliable in another sources ▪ Estimate the range of possible random Measurement Error – all of the factors associated fluctuations that can be expected in an with the process of measuring some variable, other than individual’s score the variable being measured ▪ Free from errors - difference between the observed score and the true ▪ More number of items = higher reliability score ▪ Minimizing error Positive: can increase one’s score ▪ Using only representative sample to obtain an - Negative: decrease one’s score observed score - Sources of Error Variance: ▪ True score cannot be found a. 
Item Sampling/Content Sampling: refer to variation ▪ Reliability Coefficient: index of reliability, a among items within a test as well as to variation among proportion that indicates the ratio between the items between tests true score variance on a test and the total - The extent to which testtaker’s score is affected by the variance content sampled on a test and by the way the content is o Classical Test Theory (True Score Theory) – score sampled is a source of error variance on a ability tests is presumed to reflect not only the b. Test Administration- testtaker’s motivation or testtaker’s true score on the ability being measured attention, environment, etc. but also the error c. Test Scoring and Interpretation – may employ ▪ Error: refers to the component of the observed objective-type items amenable to computer scoring of test score that does not have to do with the well-documented reliability testtaker’s ability ▪ Errors of measurement are random Random Error – source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in measurement process (e.g., noise, temperature, weather) Systematic Error – source of error in a measuring a variable that is typically constant or proportionate to what is presumed to be the true values of the variable being measured - has consistent effect on the true score - SD does not change, the mean does ▪ Reliability refers to the proportion of total variance attributed to true variance ▪ The greater the proportion of the total variance ▪ When you average all the observed scores attributed to true variance, the more reliable the obtained over a period of time, then the result test would be closest to the true score ▪ Error variance may increase or decrease a test ▪ The greater number of items, the higher the score by varying amounts, consistency of test reliability score, and thus, the reliability can be affected ▪ Factors the contribute to consistency: stable 
attributes Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls Test-Retest Reliability time - most rigorous and burdensome, since test developers Error: Time Sampling create two forms of the test - time sampling reliability - main problem: difference between the two test - an estimate of reliability obtained by correlating - test scores may be affected by motivation, fatigue, or pairs of scores from the same people on two different intervening events administrations of the test - means and the variances of the observed scores must - appropriate when evaluating the reliability of a test be equal for two forms that purports to measure an enduring and stable - Statistical Tool: Pearson R or Spearman Rho attribute such as personality trait - established by comparing the scores obtained from Internal Consistency (Inter-Item Reliability) two successive measurements of the same individuals Error: Item Sampling Homogeneity and calculating a correlated between the two set of - used when tests are administered once scores - consistency among items within the test - the longer the time passes, the greater likelihood that - measures the internal consistency of the test which is the reliability coefficient would be insignificant the degree to which each item measures the same - Carryover Effects: happened when the test-retest construct interval is short, wherein the second test is influenced - measurement for unstable traits by the first test because they remember or practiced - if all items measure the same construct, then it has a the previous test = inflated correlation/overestimation good internal consistency of reliability - useful in assessing Homogeneity - Practice Effect: scores on the second session are - Homogeneity: if a test contains items that measure a higher due to their experience of the first 
session of single trait (unifactorial) testing - Heterogeneity: degree to which a test measures - test-retest with longer interval might be affected of different factors (more than one factor/trait) other extreme factors, thus, resulting to low - more homogenous = higher inter-item consistency correlation - KR-20: used for inter-item consistency of - lower correlation = poor reliability dichotomous items (intelligence tests, personality tests - Mortality: problems in absences in second session with yes or no options, multiple choice), unequal (just remove the first tests of the absents) variances, dichotomous scored - Coefficient of Stability - KR-21: if all the items have the same degree of - statistical tool: Pearson R, Spearman Rho difficulty (speed tests), equal variances, dichotomous Parallel Forms/Alternate Forms Reliability scored - Cronbach’s Coefficient Alpha: used when two Error: Item Sampling (Immediate), Item Sampling halves of the test have unequal variances and on tests changes over time (delaued) containing non-dichotomous items, unequal variances - established when at least two different versions of - Average Proportional Distance: measure used to the test yield almost the same scores evaluate internal consistence of a test that focuses on - has the most universal applicability the degree of differences that exists between item - Parallel Forms: each form of the test, the means, scores and the variances, are EQUAL; same items, different positionings/numberings Split-Half Reliability - Alternate Forms: simply different version of a test Error: Item sample: Nature of Split that has been constructed so as to be parallel - Split Half Reliability: obtained by correlating two - test should contain the same number of items and the pairs of scores obtained from equivalent halves of a items should be expressed in the same form and single test administered ONCE should cover the same type of content; range and - useful when it is impractical or undesirable to 
assess difficulty must also be equal reliability with two tests or to administer a test twice - if there is a test leakage, use the form that is not - cannot just divide the items in the middle because it mostly administered might spuriously raise or lower the reliability - Counterbalancing: technique to avoid carryover coefficient, so just randomly assign items or assign effects for parallel forms, by using different sequence odd-numbered items to one half and even-numbered for groups items to the other half - can be administered on the same day or different Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls o Criterion-Referenced Tests – designed to provide - Spearman-Brown Formula: allows a test developer an indication of where a testtaker stands with respect of user to estimate internal consistency reliability to some variable or criterion from a correlation of two halves of a test, if each half ▪ As individual differences decrease, a traditional had been the length of the whole test and have the measure of reliability would also decrease, equal variances regardless of the stability of individual - Spearman-Brown Prophecy Formula: estimates how performance many more items are needed in order to achieve the o Classical Test Theory – everyone has a “true score” target reliability on test - multiply the estimate to the original number of items ▪ True Score: genuinely reflects an individual’s - Rulon’s Formula: counterpart of spearman-brown ability level as measured by a particular test formula, which is the ratio of the variance of ▪ Random Error difference between the odd and even splits and the o Domain Sampling Theory – estimate the extent to variance of the total, combined odd-even, score which specific sources of variation under defined - if the reliability of the original test is relatively 
low, conditions are contributing to the test scores then developer could create new items, clarify test ▪ Considers problem created by using a limited instructions, or simplifying the scoring rules number of items to represent a larger and more - equal variances, dichotomous scored complicated construct - Statistical Tool: Pearson R or Spearman Rho ▪ Test reliability is conceived of as an objective Inter-Scorer Reliability measure of how precisely the test score assesses Error: Scorer Differences the domain from which the test draws a sample - the degree of agreement or consistency between two ▪ Generalizability Theory: based on the idea that a or more scorers with regard to a particular measure person’s test scores vary from testing to testing - used for coding nonbehavioral behavior because of the variables in the testing situations - observer differences ▪ Universe: test situation - Fleiss Kappa: determine the level between TWO or ▪ Facets: number of items in the test, amount of MORE raters when the method of assessment is review, and the purpose of test administration measured on CATEGORICAL SCALE ▪ According to Generalizability Theory, given the - Cohen’s Kappa: two raters only exact same conditions of all the facets in the - Krippendorff’s Alpha: two or more rater, based on universe, the exact same test score should be observed disagreement corrected for disagreement obtained (Universe score) expected by chance ▪ Decision Study: developers examine the o Tests designed to measure one factor (Homogenous) usefulness of test scores in helping the test user are expected to have high degree of internal make decisions consistency and vice versa ▪ Systematic Error o Dynamic – trait, state, or ability presumed to be evero Item Response Theory – the probability that a changing as a function of situational and cognitive person with X ability will be able to perform at a experience level of Y in a test o Static – barely changing or relatively unchanging ▪ Focus: item 
difficulty o Restriction of range or Restriction of variance – if ▪ Latent-Trait Theory the variance of either variable in a correlational ▪ a system of assumption about measurement and analysis is restricted by the sampling procedure used, the extent to which item measures the trait then the resulting correlation coefficient tends to be ▪ The computer is used to focus on the range of lower item difficulty that helps assess an individual’s o Power Tests – when time limit is long enough to ability level allow test takers to attempt all times ▪ If you got several easy items correct, the o Speed Tests – generally contains items of uniform computer will them move to more difficult items level of difficulty with time limit ▪ Difficulty: attribute of not being easily ▪ Reliability should be based on performance from accomplished, solved, or comprehended two independent testing periods using test-retest ▪ Discrimination: degree to which an item and alternate-forms or split-half-reliability differentiates among people with higher or lower levels of the trait, ability etc. Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls Dichotomous: can be answered with only one of 3. False Positive (Type 1) – success does not occur two alternative responses 4. 
False Negative (Type 2) – predicted failure but ▪ Polytomous: 3 or more alternative responses succeed o Standard Error of Measurement – provide a measure of the precision of an observed test score ▪ Standard deviation of errors as the basic measure of error ▪ Index of the amount of inconsistent or the amount of the expected error in an individual’s score ▪ Allows to quantify the extent to which a test provide accurate scores ▪ Provides an estimate of the amount of error inherent in an observed score or measurement ▪ Higher reliability, lower SEM ▪ Used to estimate or infer the extent to which an Validity observed score deviates from a true score o Validity – a judgment or estimate of how well a test ▪ Standard Error of a Score measures what it supposed to measure ▪ Confidence Interval: a range or band of test ▪ Evidence about the appropriateness of inferences scores that is likely to contain true scores drawn from test scores o Standard Error of the Difference – can aid a test ▪ Degree to which the measurement procedure user in determining how large a difference should be measures the variables to measure before it is considered statistically significant ▪ Inferences – logical result or deduction o Standard Error of Estimate – refers to the standard ▪ May diminish as the culture or times change error of the difference between the predicted and ✓ Predicts future performance observed values ✓ Measures appropriate domain o Confidence Interval – a range of and of test score ✓ Measures appropriate characteristics that is likely to contain true score o Validation – the process of gathering and evaluating ▪ Tells us the relative ability of the true score within evidence about validity the specified range and confidence level o Validation Studies – yield insights regarding a ▪ The larger the range, the higher the confidence particular population of testtakers as compared to the o If the reliability is low, you can increase the number norming sample described in a test 
manual of items or use factor analysis and item analysis to o Internal Validity – degree of control among increase internal consistency variables in the study (increased through random o Reliability Estimates – nature of the test will often assignment) determine the reliability metric o External Validity – generalizability of the research a) Homogenous (unifactor) or heterogeneous results (increased through random selection) (multifactor) o Conceptual Validity – focuses on individual with b) Dynamic (unstable) or static (stable) their unique histories and behaviors c) Range of scores is restricted or not ▪ Means of evaluating and integrating test data so d) Speed Test or Power Test that the clinician’s conclusions make accurate e) Criterion or non-Criterion statements about the examinee o Test Sensitivity – detects true positive o Face Validity – a test appears to measure to the o Test Specificity – detects true negative person being tested than to what the test actually o Base Rate – proportion of the population that measures actually possess the characteristic of interest Content Validity o Selection ratio – no. of available positions compared describes a judgement of how adequately a test to the no. of applicants samples behavior representative of the universe of o Four Possible Hit and Miss Outcomes behavior that the test was designed to sample 1. True Positives (Sensitivity) – predict success - when the proportion of the material covered by the that does occur test approximates the proportion of material covered in 2. True Negatives (Specificity) – predict failure the course that does occur Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly ▪ Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls - Test Blueprint: a plan regarding the types of - logical and statistical information to be covered by the items, the no. 
of items tapping each area of coverage, the organization of the items, and so forth
- more logical than statistical
- concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes
- a panel of experts can review the test items and rate them in terms of how closely they match the objective or domain specification
- examine whether items are essential, useful, and necessary
- Construct Underrepresentation: failure to capture important components of a construct
- Construct-Irrelevant Variance: happens when scores are influenced by factors irrelevant to the construct
- Lawshe: developed the formula for the Content Validity Ratio (CVR)
- Zero CVR: exactly half of the experts rate the item as essential
Criterion Validity
- more statistical than logical
- a judgement of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest, the measure of interest being the criterion
- Criterion: standard on which a judgement or decision may be made
- Characteristics: relevant, valid, uncontaminated
- Criterion Contamination: occurs when the criterion measure includes aspects of performance that are not part of the job, or when the measure is affected by "construct-irrelevant" (Messick, 1989) factors that are not part of the criterion construct
1. Concurrent Validity: the test scores are obtained at about the same time as the criterion measures; economically efficient
2. Predictive Validity: measure of the relationship between test scores and a criterion measure obtained at a future time
- Incremental Validity: related to predictive validity; the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use; used to improve the domain
Construct Validity (Umbrella Validity)
- covers all types of validity
- judgement about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct
- Construct: an informed, scientific idea developed or hypothesized to describe or explain behavior; unobservable, presupposed traits that may be invoked to describe test behavior or criterion performance
- One way a test developer can improve the homogeneity of a test containing dichotomous items is by eliminating items that do not show significant correlation coefficients with total test scores
- If it is an academic test and high scorers on the entire test for some reason tended to get a particular item wrong while low scorers got it right, then the item is obviously not a good one
- Some constructs lend themselves more readily than others to predictions of change over time
- Method of Contrasted Groups: demonstrates that scores on the test vary in a predictable way as a function of membership in a group
- If a test is a valid measure of a particular construct, then people who do not have that construct should obtain different test scores from those who really possess it
- Convergent Evidence: scores on the test undergoing construct validation tend to correlate highly with another established, validated test that measures the same construct
- Discriminant Evidence: a validity coefficient showing little relationship between test scores and other variables with which scores on the test being construct-validated should not be correlated
- the test is homogeneous
- test scores increase or decrease as a function of age, passage of time, or experimental manipulation
- pretest-posttest differences
- scores differ between groups
- scores correlate with scores on other tests in accordance with what is predicted
o Factor Analysis – designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ
▪ Developed by Charles Spearman
▪ Employed as a data reduction method
▪ Used to study the interrelationships among a set of variables
▪ Identifies the factor or factors in common between test scores on subscales within a particular test
▪ Exploratory FA: estimating or extracting factors; deciding how many factors must be retained
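The Content Validity Ratio mentioned under content validity can be sketched in a few lines. This is a minimal illustration that assumes the commonly cited form of Lawshe's formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating the item "essential" and N is the panel size. The function name and the example panel sizes are hypothetical, not from the reviewer.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), assuming that standard form.

    n_essential: experts who rated the item "essential" (n_e)
    n_experts:   total number of experts on the panel (N)
    """
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Exactly half of the panel says "essential" -> zero CVR, as noted above.
print(content_validity_ratio(5, 10))
# Everyone says "essential" -> the maximum value of 1.
print(content_validity_ratio(10, 10))
```

A negative value means fewer than half of the experts rated the item essential.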
▪ Confirmatory FA: researchers test the degree to which a hypothetical model fits the actual data
▪ Factor Loading: conveys information about the extent to which the factor determines the test score or scores
▪ can be used to obtain both convergent and discriminant validity evidence
o Cross-Validation – revalidation of the test against a criterion using a group different from the original group on which the test was validated
▪ Validity Shrinkage: decrease in validity after cross-validation
▪ Co-Validation: validation of more than one test on the same group
▪ Co-Norming: norming more than one test on the same group
o Bias – factor inherent in a test that systematically prevents accurate, impartial measurement
▪ Prejudice, preferential treatment
▪ Prevented during test development through a procedure called Estimated True Score Transformation
o Rating – numerical or verbal judgement that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a Rating Scale
▪ Rating Error: intentional or unintentional misuse of the scale
▪ Leniency Error (Generosity Error): rater is lenient in scoring
▪ Severity Error: rater is strict in scoring
▪ Central Tendency Error: rater's ratings tend to cluster in the middle of the rating scale
▪ One way to overcome rating errors is to use rankings
▪ Halo Effect: tendency to give a high score due to failure to discriminate among conceptually distinct and potentially independent aspects of a ratee's behavior
o Fairness – the extent to which a test is used in an impartial, just, and equitable way
o Attempting to define the validity of a test will be futile if the test is NOT reliable
Utility
o Utility – usefulness or practical value of testing to improve efficiency
o Can tell us something about the practical value of the information derived from scores on the test
o Helps us make better decisions
o Higher criterion-related validity = higher utility
o One of the most basic elements in utility analysis is the financial cost of the selection device
o Cost – disadvantages, losses, or expenses in both economic and noneconomic terms
o Benefit – profits, gains, or advantages
o The cost of test administration can be well worth it if the result is certain noneconomic benefits
o Utility Analysis – family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
o Expectancy Table – provides an indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure (passing, acceptable, failing)
o Might indicate future behaviors; if it does so successfully, the test is working as it should
o Taylor-Russell Tables – provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection
o Selection Ratio – numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired
o Base Rate – percentage of people hired under the existing system for a particular position
o One limitation of the Taylor-Russell Tables is that the relationship between the predictor (test) and the criterion must be linear
o Naylor-Shine Tables – entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures
o Brogden-Cronbach-Gleser Formula – used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument
o Utility Gain – estimate of the benefit of using a particular test
o Productivity Gain – an estimated increase in work output
o High-performing applicants may have been offered positions at other companies as well
o The more complex the job, the more people differ on how well or poorly they do that job
o Cut Score – reference point derived as a result of a judgement and used to divide a set of data into two or more classifications
▪ Relative Cut Score – reference point based on norm-related considerations (norm-referenced); e.g., NMAT
▪ Fixed Cut Score – set with reference to a judgement concerning the minimum level of proficiency required; e.g., Board Exams
▪ Multiple Cut Scores – the use of two or more cut scores with reference to one predictor for the purpose of categorization
▪ Multiple Hurdle – multi-stage selection process in which a cut score is in place for each predictor
▪ Compensatory Model of Selection – assumption that high scores on one attribute can compensate for lower scores on another
o Angoff Method – for setting fixed cut scores
▪ limitation: low interrater reliability
o Known Groups Method – collection of data on the predictor of interest from groups known to possess and not to possess a trait of interest
▪ The determination of where to set the cutoff score is inherently affected by the composition of the contrasting groups
o IRT-Based Methods – cut scores are typically set based on testtakers' performance across all the items on the test
▪ Item-Mapping Method: arrangement of items in a histogram, with each column containing items deemed to be of equivalent value
▪ Bookmark Method: an expert places a "bookmark" between the two pages that are deemed to separate testtakers who have acquired the minimal knowledge, skills, and/or abilities from those who have not
o Method of Predictive Yield – takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
o Discriminant Analysis – sheds light on the relationship between identified variables and two naturally occurring groups
Reason for accepting or rejecting instruments and tools based on Psychometric Properties
Reliability
o Basic Research = 0.70 to 0.90
o Clinical Setting = 0.90 to 0.95
Validity, Item Difficulty, Item Discrimination: [figures not recovered]
P-Value
o P-value ≤ α: reject the null hypothesis
o P-value > α: retain (fail to reject) the null hypothesis
Research Methods and Statistics (20)
Statistics Applied in Research Studies on Tests and Test Development
Measures of Central Tendency - statistics that indicate the average or midmost score between the extreme scores in a distribution
- Goal: identify the most typical or representative score of the entire group
- Measures of Central Location
Mean - the average of all the raw scores
- Equal to the sum of the observations divided by the number of observations
- Interval and ratio data (when the distribution is normal)
- Point of least squares
- Balance point for the distribution
- Susceptible to outliers
Median - the middle score of the distribution
- Ordinal, interval, ratio
- For extreme scores, use the median
- Identical for sample and population
- Also used when there is an unknown or undetermined score
- Used in "open-ended" categories (e.g., 5 or more, more than 8, at least 10)
- For ordinal data
- If the distribution is skewed for ratio/interval data, use the median
Mode - most frequently occurring score in the distribution
- Bimodal Distribution: two scores occur with the highest frequency
- Not commonly used
- Useful in analyses of a qualitative or verbal nature
- For nominal scales, discrete variables
- The value of the mode gives an indication of the shape of the distribution as well as a measure of central tendency
Measures of Spread or Variability - statistics that describe the amount of variation in a distribution
- give an idea of how well the measure of central tendency represents the data
- a large spread of values means large differences between individual scores
Range - equal to the difference between the highest and the lowest score
- Provides a quick but gross description of the spread of scores
- When its value is based on extreme scores of the distribution, the resulting description of variation may be understated or overstated
Interquartile Range - difference between Q3 and Q1
Semi-Interquartile Range - interquartile range divided by 2
Standard Deviation - approximation of the average deviation around the mean
- gives detail of how far above or below the mean a score is
- equal to the square root of the average squared deviations about the mean
- Equal to the square root of the variance
- Distance from the mean
Variance - equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
- average squared deviation around the mean
Measures of Location
Percentile or Percentile Rank - not linearly transformable; scores converge at the middle, and the outer ends show large intervals
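The measures of central tendency and variability listed above can be checked on a small set of raw scores. The sketch below uses Python's standard `statistics` module. The score list and the `describe` helper are made up for illustration, and the population formulas (`pvariance`, `pstdev`) are chosen because the notes define variance as the arithmetic mean of the squared deviations. Quartile values depend on the interpolation method, so no specific quartile output is claimed here.

```python
import statistics


def describe(scores):
    """Summarize raw scores with the measures named in the notes."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    return {
        "mean": statistics.mean(scores),          # sum / number of scores
        "median": statistics.median(scores),      # middle score
        "modes": statistics.multimode(scores),    # two values => bimodal
        "range": max(scores) - min(scores),       # highest - lowest
        "iqr": iqr,                               # Q3 - Q1
        "semi_iqr": iqr / 2,                      # IQR / 2
        "variance": statistics.pvariance(scores), # mean squared deviation
        "sd": statistics.pstdev(scores),          # sqrt of the variance
    }


summary = describe([2, 4, 4, 5, 7, 8])  # hypothetical raw scores
for name, value in summary.items():
    print(name, value)
```

Adding an extreme score such as 40 to the list moves the mean much more than the median, which is why the median is preferred for skewed distributions.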
Percentile or Percentile Rank (continued)
- expressed in terms of the percentage of persons in the standardization sample who fall below a given score
- indicates the individual's relative position in the standardization sample
Quartile - dividing points between the four quarters in the distribution
- Specific point
- Quarter: refers to an interval
Decile/STEN - divides the distribution into 10 equal parts
Skewness - a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean
Correlation
Pearson r - interval/ratio + interval/ratio
Spearman Rho - ordinal + ordinal
Biserial - artificial dichotomous + interval/ratio
Point Biserial - true dichotomous + interval/ratio
Phi Coefficient - nominal (true dichotomous) + nominal (true/artificial dichotomous)
Tetrachoric - artificial dichotomous + artificial dichotomous
Kendall's - 3 or more ordinal/rank
Rank Biserial - nominal + ordinal
Differences
T-test Independent - two separate groups, random assignment
- e.g., blood pressure of male and female grad students
T-test Dependent - one group, two scores
- e.g., blood pressure of grad students before and after the lecture
One-Way ANOVA - 3 or more groups, tested once
- e.g., people in different socio-economic statuses and the differences in their salaries
One-Way Repeated Measures - 1 group, measured at least 3 times
- e.g., measuring the focus level of board reviewers during morning, afternoon, and night review sessions
Two-Way ANOVA - 3 or more groups, tested for 2 variables
- e.g., people in different socio-economic statuses and the differences in their salaries and their eating habits
ANCOVA - used when you need to control for an additional variable which may be influencing the relationship between your independent and dependent variables
ANOVA Mixed Design - 2 or more groups, measured more than 3 times
- e.g., the blood pressure of Young Adults, Middle Adults, and Old Adults is measured during breakfast, lunch, and dinner
Non-Parametric Tests
Mann-Whitney U Test - counterpart of the independent t-test
Wilcoxon Signed Rank Test - counterpart of the dependent t-test
Kruskal-Wallis H Test - counterpart of one-way ANOVA
Friedman Test - counterpart of repeated-measures ANOVA
Lambda - for 2 groups of nominal data
Chi-Square
Goodness of Fit - used to measure differences; involves nominal data and only one variable with 2 or more categories
Test of Independence - used to measure association; involves nominal data and two variables with two or more categories
Regression – used when one wants a framework of prediction, using one factor to predict the probable value of another factor
Linear Regression of Y on X - Y = a + bX
- Used to predict the unknown value of variable Y when the value of variable X is known
Linear Regression of X on Y - X = c + dY
- Used to predict the unknown value of variable X using the known variable Y
o True Dichotomy – dichotomy in which there are only fixed possible categories
o Artificial Dichotomy – dichotomy in which there are other possibilities in a certain category
Methods and Statistics used in Research Studies and Test Construction
Test Development
o Test Development – an umbrella term for all that goes into the process of creating a test
I. Test Conceptualization – brainstorming of ideas about what kind of test a developer wants to publish
- stage wherein the ff. are determined: construct, goal, user, taker, administration, format, response, benefits, costs, interpretation
- determines whether the test will be norm-referenced or criterion-referenced
- Pilot Work/Pilot Study/Pilot Research – preliminary research surrounding the creation of a prototype of the test
- Attempts to determine how best to measure a targeted construct
- Entails literature reviews and experimentation, and the creation, revision, and deletion of preliminary items
II. Test Construction – stage in the process that entails writing test items, revisions, formatting, and setting scoring rules
- it is not good to create an item that contains numerous ideas
- Item Pool: reservoir or well from which the items will or will not be drawn for the final version of the test
- Item Bank: relatively large and easily accessible collection of test questions
- Computerized Adaptive Testing: an interactive, computer-administered test-taking process wherein items presented to the testtaker are based in part on the testtaker's performance on previous items
- The test administered may be different for each testtaker, depending on the test performance on the items presented
- Reduces floor and ceiling effects
- Floor Effect: occurs when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (testtakers have low scores)
- Ceiling Effect: occurs when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (testtakers have high scores)
- Item Branching: ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
- Item Format: form, plan, structure, arrangement, and layout of individual test items
- Dichotomous Format: offers two alternatives for each item
- Polychotomous Format: each item has more than two alternatives
- Category Format: a format where respondents are asked to rate a construct
1. Checklist – subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself
2. Guttman Scale – items are arranged from weaker to stronger expressions of attitude, belief, or feelings
- Selected-Response Format: requires testtakers to select a response from a set of alternative responses
1. Multiple Choice - Has three elements: stem (question), a correct option, and several incorrect alternatives (distractors or foils). Should have one correct answer and grammatically parallel alternatives of similar length that fit grammatically with the stem; avoid ridiculous distractors, excessively long items, "all of the above", and "none of the above" (25% chance of guessing correctly with four options)
- Effective Distractors: distractors chosen more often by low-performing than by high-performing testtakers; they enhance the consistency of test results
- Ineffective Distractors: may hurt the reliability of the test because they are time-consuming to read and can limit the number of good items
- Cute Distractors: less likely to be chosen; may affect the reliability of the test because testtakers may guess from the remaining options
2. Matching Item - Testtaker is presented with two columns: Premises and Responses
3. Binary Choice - Usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact (50% chance of guessing correctly)
- Constructed-Response Format: requires testtakers to supply or create the correct answer, not merely select it
1. Completion Item - Requires the examinee to provide a word or phrase that completes a sentence
2. Short-Answer - Should be written clearly enough that the testtaker can respond succinctly, with a short answer
3. Essay – allows creative integration and expression of the material
- Scaling: process of setting rules for assigning numbers in measurement
Primary Scales of Measurement
1. Nominal - involves classification or categorization based on one or more distinguishing characteristics
- Labels and categorizes observations but does not make any quantitative distinctions between observations
- mode
2. Ordinal - rank ordering on some characteristic is also permissible
- median
3. Interval - contains equal intervals, has no absolute zero point (even negative values have an interpretation)
- A zero value does not mean "none"
4. Ratio - has a true zero point (if the score is zero, it means none/null)
- Easiest to manipulate
Comparative Scales of Measurement
1. Paired Comparison - produces ordinal data by presenting respondents with pairs of stimuli which they are asked to compare
- respondent is presented with two objects at a time and asked to select one object according to some criterion
2. Rank Order – respondents are presented with several items simultaneously and asked to rank them in order of priority
3. Constant Sum – respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion
4. Q-Sort Technique – sort objects based on similarity with respect to some criterion
Non-Comparative Scales of Measurement
1. Continuous Rating – rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other
- e.g., rating Guardians of the Galaxy as the best Marvel movie of Phase 4
2. Itemized Rating – numbers or brief descriptions are associated with each category
- e.g., 1 if you like the item the most, 2 if so-so, 3 if you hate it
3. Likert Scale – respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative towards the attitudinal object
- principle of measuring attitudes by asking people to respond to a series of statements about a topic, in terms of the extent to which they agree with them
4. Visual Analogue Scale – a 100-mm line that allows subjects to express the magnitude of an experience or belief
5. Semantic Differential Scale – derives the respondent's attitude towards the given object by asking him or her to select an appropriate position on a scale between two bipolar opposites
6. Stapel Scale – developed to measure the direction and intensity of an attitude simultaneously
7. Summative Scale – final score is obtained by summing the ratings across all the items
8. Thurstone Scale – involves the collection of a variety of different statements about a phenomenon, which are ranked by an expert panel in order to develop the questionnaire
- allows multiple answers
9. Ipsative Scale – the respondent must choose between two or more equally socially acceptable options
III. Test Tryout - the test should be tried out on people who are similar in critical respects to the people for whom the test was designed
- An informal rule of thumb: no fewer than 5 and preferably as many as 10 subjects for each item (the more, the better)
- Risk of using few subjects = phantom factors emerge
- Should be executed under conditions as identical as possible to those under which the final test will be administered
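The test-tryout rule of thumb (no fewer than 5, preferably as many as 10 subjects per item) is simple arithmetic. The helper below is only an illustrative sketch; its name and default parameters are assumptions made for the example.

```python
def tryout_sample_range(n_items, min_per_item=5, preferred_per_item=10):
    """Return (minimum, preferred) tryout sample sizes for a draft test."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_items * min_per_item, n_items * preferred_per_item


# A hypothetical 40-item draft: at least 200 subjects, preferably 400.
minimum, preferred = tryout_sample_range(40)
print(minimum, preferred)
```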
- A good test item is one that is answered correctly by high scorers as a whole
- Empirical Criterion Keying: administering a large pool of test items to a sample of individuals who are known to differ on the construct being measured
- Item Analysis: statistical procedure used to analyze and evaluate test items
- Discriminability Analysis: employed to examine the correlation between each item and the total score of the test
- Item: suggests a sample of behavior of an individual
- Table of Specification: a blueprint of the test in terms of the number of items per difficulty, topic importance, or taxonomy
- Guidelines for Item Writing: define clearly what to measure, generate an item pool, avoid long items, keep the level of reading difficulty appropriate for those who will complete the test, avoid double-barreled items, consider including both positively and negatively worded items
- Double-Barreled Items: items that convey more than one idea at the same time
- Item Difficulty: defined by the number of people who get a particular item correct
- Item-Difficulty Index: the proportion of the total number of testtakers who answered the item correctly; the larger the index, the easier the item
- Item-Endorsement Index: for personality testing, the percentage of individuals who endorsed an item in a personality test
- The optimal average item difficulty is approx. 50%, with items on the test ranging in difficulty from about 30% to 80%
- Omnibus Spiral Format: items in an ability test are arranged in order of increasing difficulty
- Item-Reliability Index: provides an indication of the internal consistency of a test
- The higher the Item-Reliability Index, the greater the test's internal consistency
- Item-Validity Index: designed to provide an indication of the degree to which a test measures what it purports to measure
- The higher the Item-Validity Index, the greater the test's criterion-related validity
- Item-Discrimination Index: measure of item discrimination; measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
- Extreme Group Method: compares people who have done well with those who have done poorly
- Discrimination Index: difference between these proportions
- Point-Biserial Method: correlation between a dichotomous variable and a continuous variable
- Item-Characteristic Curve: graphic representation of item difficulty and discrimination
- Guessing: a problem that has eluded any universally accepted solution
- Item analyses of tests taken under speed conditions yield misleading or uninterpretable results
- Restrict item analysis on a speed test only to the items completed by the testtaker
- The test developer ideally should administer the test to be item-analyzed with generous time limits to complete the test
Scoring Items/Scoring Models
1. Cumulative Model – testtaker obtains a measure of the level of the trait; thus, high scores may suggest a high level of the trait being measured
2. Class Scoring/Category Scoring – testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is similar in some way
3. Ipsative Scoring – compares a testtaker's score on one scale within a test to another scale within that same test (two unrelated constructs)
IV. Test Revision – characterize each item according to its strengths and weaknesses
- As revision proceeds, the advantage of writing a large item pool becomes more apparent, because some items are removed and must be replaced by items from the item pool
- Administer the revised test under standardized conditions to a second appropriate sample of examinees
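The item-difficulty index and the extreme-group discrimination index described in the item analysis notes can be sketched as follows. This is a minimal illustration on made-up 0/1 data. The function names are hypothetical, and the 27% upper/lower group fraction is an assumption chosen for the example (the notes do not specify a cutoff).

```python
def item_difficulty(responses):
    """Proportion of testtakers answering each item correctly (p).

    responses: list of rows, one per testtaker, with 1 = correct, 0 = wrong.
    """
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]


def item_discrimination(responses, fraction=0.27):
    """Extreme group method: d = p(upper group) - p(lower group) per item.

    fraction is the share of testtakers placed in each extreme group
    (an assumed value for this sketch).
    """
    ranked = sorted(responses, key=sum, reverse=True)  # high scorers first
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [
        sum(r[i] for r in upper) / k - sum(r[i] for r in lower) / k
        for i in range(n_items)
    ]


# Hypothetical data: 6 testtakers x 3 items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(item_difficulty(data))      # larger p = easier item
print(item_discrimination(data))  # d near 0 = item does not discriminate
```

In this toy data the third item is answered correctly by one high scorer and one low scorer, so its discrimination index is zero even though its difficulty (0.5) looks ideal.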
- Cross-Validation: revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion; often results in validity shrinkage
- Validity Shrinkage: decrease in item validities that inevitably occurs after cross-validation
- Co-Validation: conducted on two or more tests using the same sample of testtakers
- Co-Norming: creation of norms or the revision of existing norms
- Anchor Protocol: test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies
- Scoring Drift: discrepancy between scoring in an anchor protocol and the scoring of another protocol
- Differential Item Functioning (DIF): an item functions differently in one group of testtakers than in another group known to have the same level of the underlying trait
- DIF Analysis: test developers scrutinize group-by-group item response curves looking for DIF items
- DIF Items: items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership
o Computerized Adaptive Testing – an interactive, computer-administered test-taking process wherein items presented to the testtaker are based in part on the testtaker's performance on previous items
▪ The test administered may be different for each testtaker, depending on the test performance on the items presented
▪ Reduces floor and ceiling effects
▪ Floor Effect: occurs when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (testtakers have low scores)
▪ Ceiling Effect: occurs when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (testtakers have high scores)
▪ Item Branching: ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
▪ Routing Test: subtest used to direct or route the testtaker to a suitable level of items
▪ Item-Mapping Method: setting cut scores in a way that entails a histographic representation of items and expert judgments regarding item effectiveness
o Basal Level – the level at which the minimum criterion number of correct responses is obtained
o Computer Assisted Psychological Assessment – standardized test administration is assured for testtakers and variation is kept to a minimum
▪ Test content and length are tailored according to the taker's ability
Statistics
o Measurement – the act of assigning numbers or symbols to characteristics of things according to rules
Descriptive Statistics – methods used to provide a concise description of a collection of quantitative information
Inferential Statistics – methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population
o Magnitude – the property of "moreness"
o Equal Intervals – the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units
o Absolute 0 – when nothing of the property being measured exists
o Scale – a set of numbers whose properties model empirical properties of the objects to which the numbers are assigned
Continuous Scale – takes on any value within the range, and the number of possible values within that range is infinite
- used to measure a variable which can theoretically be divided
Discrete Scale – can be counted; has distinct, countable values
- used to measure a variable which cannot theoretically be divided
o Error – the collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement
▪ Degree to which the test score/measurement may be wrong, considering other factors like the state of the testtaker, the venue, the test itself, etc.
▪ Measurement with a continuous scale always involves error
Four Levels of Scales of Measurement
Nominal – involves classification or categorization based on one or more distinguishing characteristics
- Labels and categorizes observations but does not make any quantitative distinctions between observations
- mode
Ordinal - rank ordering on some characteristic is also permissible
- median
Interval - contains equal intervals, has no absolute zero point (even negative values have an interpretation)
- A zero value does not mean "none"
Ratio - has a true zero point (if the score is zero, it means none/null)
- Easiest to manipulate
o Distribution – a set of test scores arrayed for recording or study
o Raw Score – straightforward, unmodified accounting of performance that is usually numerical
o Frequency Distribution – all scores are listed alongside the number of times each score occurred
o Independent Variable – the variable being manipulated in the study
o Quasi-Independent Variable – nonmanipulated variable used to designate groups
▪ Factor: term used for ANOVA
Post-Hoc Tests – used in ANOVA to determine which mean differences are significantly different
Tukey's HSD Test – allows computing a single value that determines the minimum difference between treatment means that is necessary for significance
o Measures of Central Tendency – statistics that indicate the average or midmost score between the extreme scores in a distribution
▪ Goal: identify the most typical or representative score of the entire group
Mean – the average of all the raw scores
- Equal to the sum of the observations divided by the number of observations
- Interval and ratio data (when the distribution is normal)
- Point of least squares
- Balance point for the distribution
Median – the middle score of the distribution
- Ordinal, Interval, Ratio
- Bimodal Distribution: two scores occur with the highest frequency
- Not commonly used
- Useful in analyses of a qualitative or verbal nature
- For nominal scales, discrete variables
- The value of the mode gives an indication of the shape of the distribution as well as a measure of central tendency
o Variability – an indication of how scores in a distribution are scattered or dispersed
o Measures of Variability – statistics that describe the amount of variation in a distribution
o Range – equal to the difference between the highest and the lowest score
▪ Provides a quick but gross description of the spread of scores
▪ When its value is based on extreme scores of the distribution, the resulting description of variation may be understated or overstated
o Quartile – dividing points between the four quarters in the distribution
▪ Specific point
▪ Quarter: refers to an interval
▪ Interquartile Range: measure of variability equal to the difference between Q3 and Q1
▪ Semi-Interquartile Range: equal to the interquartile range divided by 2
o Standard Deviation – equal to the square root of the average squared deviations about the mean
▪ Equal to the square root of the variance
▪ Variance: equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
▪ Distance from the mean
o Normal Curve – also known as the Gaussian Curve
o Bell-shaped, smooth, mathematically defined curve that is highest at its center
o Asymptotic = approaches but never touches the axis
o Tail – 2 –
3 standard deviations above and below the - Useful in cases where relatively few scores fall at the mean high end of the distribution or relatively few scores fall at the low end of the distribution - In other words, for extreme scores, use median (skewed) - Identical for sample and population - Also used when there has an unknown or undetermined score - Used in “open-ended” categories (e.g., 5 or more, more than 8, at least 10) - For ordinal data Mode – most frequently occurring score in the distribution Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls ▪ Mean < Median < Mode Skewed is associated with abnormal, perhaps because the skewed distribution deviates from the symmetrical or so-called normal distribution o Kurtosis – steepness if a distribution in its center Platykurtic – relatively flat Leptokurtic – relatively peaked Mesokurtic – somewhere in the middle o Symmetrical Distribution – right side of the graph is mirror image of the left side ▪ Has only one mode and it is in the center of the distribution ▪ Mean = median = mode o Skewness – nature and extent to which symmetry is absent o Positive Skewed – few scores fall the high end of the distribution ▪ The exam is difficult ▪ More items that was easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores o o ▪ Mean > Median > Mode Negative Skewed – when relatively few of the scores fall at the low end of the distribution ▪ The exam is easy ▪ More items of a higher level of difficulty would make it possible to better discriminate between scores at the upper end of the distribution ▪ High Kurtosis = high peak and fatter tails ▪ Lower Kurtosis = rounded peak and thinner tails o Standard Score – raw score that has been converted from one scale to another scale o Z-Scores – results from the 
conversion of a raw score into a number indicating how many SD units the raw score is below or above the mean of the distribution
▪ Identify and describe the exact location of each score in a distribution
▪ Standardize an entire distribution
▪ Zero plus or minus one scale
▪ Can have negative values
▪ Requires that we know the value of the variance to compute the standard error
o T-Scores – a scale with a mean set at 50 and a standard deviation set at 10
▪ Fifty plus or minus 10 scale
▪ A raw score 5 standard deviations below the mean would be equal to a T-score of 0
▪ A raw score that fell at the mean has a T of 50
▪ A raw score 5 standard deviations above the mean would be equal to a T of 100
▪ No negative values
▪ Used when the population or variance is unknown
o Stanine – a method of scaling test scores on a nine-point standard scale with a mean of five (5) and a standard deviation of two (2)
o Linear Transformation – one that retains a direct numerical relationship to the original raw score
o Nonlinear Transformation – required when the data under consideration are not normally distributed
▪ Normalizing the distribution involves stretching the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores, a scale that is technically referred to as a Normalized Standard Score Scale
▪ Generally preferable to fine-tune the test according to difficulty or other relevant variables so that the resulting distribution will approximate the normal curve
o STEN – standard to ten; divides a scale into 10 units

        Z-Score   T-Score   Stanine   STEN   IQ    GRE or SAT
Mean    0         50        5         5.5    100   500
SD      1         10        2         2      15    100

o Hypothesis Testing – statistical method that uses sample data to evaluate a hypothesis about a population
Alternative Hypothesis – states there is a change, difference, or relationship
Null Hypothesis – no change, no difference, or no relationship
o Alpha Level or Level of Significance – used to define the concept of “very unlikely” in a hypothesis test
o Critical Region – composed of extreme values that are very unlikely to be obtained if the null hypothesis is true
o If sample data fall in the critical region, the null hypothesis is rejected
o The alpha level for a hypothesis test is the probability that the test will lead to a Type I error
o Directional Hypothesis Test or One-Tailed Test – statistical hypotheses specify either an increase or a decrease in the population mean
o T-Test – used to test hypotheses about an unknown population mean and variance
▪ Can be used in “before and after” type of research
▪ Sample must consist of independent observations, that is, there is no consistent, predictable relationship between the first observation and the second
▪ The population that is sampled must be normal
▪ If the distribution is not normal, use a large sample
o Correlation Coefficient – number that provides us with an index of the strength of the relationship between two things
o Correlation – an expression of the degree and direction of correspondence between two things
▪ + & - = direction
▪ Number anywhere from -1 to 1 = magnitude
▪ Positive – same direction, either both going up or both going down
▪ Negative – inverse direction, either DV goes up and IV goes down or IV goes up and DV goes down
▪ 0 = no correlation
o Pearson r/Pearson Correlation Coefficient/Pearson Product-Moment Coefficient of Correlation – used when the two variables being correlated are continuous and linearly related
▪ Devised by Karl Pearson
▪ Coefficient of Determination – an indication of how much variance is shared by the X- and Y-variables
o Spearman Rho/Rank-Order Correlation Coefficient/Rank-Difference Correlation Coefficient – frequently used if the sample size is small and when both sets of measurement are ordinal
▪ Developed by Charles Spearman
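As a rough illustration of the standard-score table and the correlation definitions above, the Python sketch below converts raw scores to z-scores, rescales them to the T and IQ-style scales using the means and SDs from the table, and computes Pearson r and Spearman rho by hand. The function names (z_scores, rescale, pearson_r, ranks, spearman_rho) and the sample data are illustrative assumptions, not part of the source. The sketch assumes the population SD and gives tied scores their average rank.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Convert raw scores to z-scores (mean 0, SD 1), using the population SD."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]


def rescale(z, new_mean, new_sd):
    """Linear transformation of a z-score onto another scale, e.g. T = 50 + 10z."""
    return new_mean + new_sd * z


def pearson_r(x, y):
    """Pearson r as the mean product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def ranks(values):
    """Rank values from 1 (lowest); tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rho is Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical raw scores: mean 5, population SD 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
t = [rescale(v, 50, 10) for v in z]      # T-score scale (mean 50, SD 10)
iq = [rescale(v, 100, 15) for v in z]    # IQ-style scale (mean 100, SD 15)

# Hypothetical paired data: y rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
r_squared = r ** 2                       # coefficient of determination
```

With these hypothetical data, the raw score of 9 sits two SDs above the mean, so it maps to z = 2, T = 70, and 130 on the IQ-style scale. Rho comes out as 1 because the two rank orders agree perfectly, while r is slightly below 1 because the relationship is not linear.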
o Outlier – extremely atypical point located at a relatively long distance from the rest of the coordinate points in a scatterplot
o Regression Analysis – used for prediction
▪ Predict the values of a dependent or response variable based on values of at least one independent or explanatory variable
▪ Residual: the difference between an observed value of the response variable and the value of the response variable predicted from the regression line
▪ The Principle of Least Squares
▪ Standard Error of Estimate: standard deviation of the residuals in regression analysis
▪ Slope: determines how much the Y variable changes when X is increased by 1 point
o T-Test (Independent) – comparison or determining differences
▪ 2 different groups/independent samples + interval/ratio scales (continuous variables)
Equal Variance – 2 groups are equal
Unequal Variance – groups are unequal
o T-Test (Dependent)/Paired Test – one group, nominal (either matched or repeated measures) + 2 treatments
o One-Way ANOVA – 1 IV with 3 or more levels (groups), 1 DV; comparison of differences
o Two-Way ANOVA – 2 IV, 1 DV
o Critical Value – reject the null and accept the alternative if [ obtained value > critical value ]
o P-Value (Probability Value) – reject the null and accept the alternative if [ p-value < alpha level ]
o Norms – refer to the performances by defined groups on a particular test
o Age-Related Norms – certain tests have different normative groups for age groups
o Tracking – tendency to stay at about the same level relative to one’s peers
Norm-Referenced Tests – compare each person with the norm
Criterion-Referenced Tests – describe specific types of skills, tasks, or knowledge that the test taker can demonstrate

Selection of Assessment Methods and Tools and Uses, Benefits, and Limitations of Assessment tools and instruments (32)
Identify appropriate assessment methods, tools (2)
1. Test – measuring device or procedure
- Psychological Test: device or procedure designed to measure variables related to psychology
Ability or Maximal Performance Test – assesses what a person can do
a. Achievement Test – measurement of previous learning
b. Aptitude – refers to the potential for learning or acquiring a specific skill
c. Intelligence – refers to a person’s general potential to solve problems, adapt to changing environments, think abstractly, and profit from experience
Human Ability – considerable overlap of achievement, aptitude, and intelligence tests
Typical Performance Test – measures usual or habitual thoughts, feelings, and behavior
Personality Test – measures individual dispositions and preferences
a. Structured Personality Tests – provide statements, usually self-report, and require the subject to choose between two or more alternative responses
b. Projective Personality Tests – unstructured, and the stimulus or response is ambiguous
c. Attitude Test – elicits personal beliefs and opinions
d. Interest Inventories – measure likes and dislikes as well as one’s personality orientation towards the world of work
- Purpose: for evaluation, drawing conclusions about some aspects of the behavior of a person, therapy, decision-making
- Settings: Industrial, Clinical, Educational, Counseling, Business, Courts, Research
- Population: Test Developers, Test Publishers, Test Reviewers, Test Users, Test Sponsors, Test Takers, Society
Levels of Tests
1. Level A – anyone under the direction of a supervisor or consultant
2. Level B – psychometricians and psychologists only
3. Level C – psychologists only
2. Interview – method of gathering information through direct communication involving reciprocal exchange
- can be structured, unstructured, semi-structured, or non-directive
- Mental Status Examination: determines the mental status of the patient
- Intake Interview: determines why the client came for assessment; chance to inform the client about the policies, fees, and process involved
- Social Case: biographical sketch of the client
- Employment Interview: determines whether the candidate is suitable for hiring
- Panel Interview (Board Interview): more than one interviewer participates in the assessment
- Motivational Interview: used by counselors and clinicians to gather information about some problematic behavior, while simultaneously attempting to address it therapeutically
3. Portfolio – samples of one’s ability and accomplishment
- Purpose: usually in industrial settings, for evaluation of future performance
4. Case History Data – refers to records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to an assessee
5. Behavioral Observation – monitoring of actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
- Naturalistic Observation: observe humans in a natural setting
6. Role Play – defined as acting an improvised or partially improvised part in a simulated situation
- Role Play Test: assessees are directed to act as if they are in a particular situation
- Purpose: Assessment and Evaluation
- Settings: Industrial, Clinical
- Population: Job Applicants, Children
7. Computers – using technology to assess a client; thus, they can serve as test administrators and very efficient test scorers
8. Others: videos, biofeedback devices

Intelligence Tests
Stanford-Binet Intelligence Scale 5th Ed. (SB-5) [C]
- 2-85 years old
- individually administered
- norm-referenced
- Scales: Verbal, Nonverbal, and Full Scale (FSIQ)
- Nonverbal and Verbal Cognitive Factors: Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, Working Memory
- age scale and point-scale format
- originally created to identify mentally disabled children in Paris
- 1908 Scale introduced Age Scale format and Mental Age
- 1916 scale significantly applied the IQ concept
- Standard Scores: 100 (mean), 15 (SD)
- Scaled Scores: 10 (mean), 3 (SD)
- co-normed with Bender-Gestalt and Woodcock-Johnson Tests
- based on Cattell-Horn-Carroll Model of General Intellectual Ability
- no accommodations for PWDs
- 2 routing tests
- with teaching items, floor level, and ceiling level
- provides behavioral observations during administration
Wechsler Intelligence Scales (WAIS-IV, WPPSI-IV, WISC-V) [C]
- WAIS (16-90 years old), WPPSI (2 years 6 months to 7 years 7 months), WISC (6-16 years old)
- individually administered
- norm-referenced
- Standard Scores: 100 (mean), 15 (SD)
- Scaled Scores: 10 (mean), 3 (SD)
- addresses the weakness in Stanford-Binet
- could also assess functioning in people with brain injury
- evaluates patterns of brain dysfunction
- yields FSIQ, Index Scores (Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed), and subtest-level scaled scores
Raven’s Progressive Matrices (RPM) [B]
- 4 – 90 years old
- nonverbal test
- used to measure general intelligence & abstract reasoning
- multiple choice of abstract reasoning
- group test
- IRT-Based
Culture Fair Intelligence Test (CFIT) [B]
- Nonverbal instrument to measure analytical and reasoning ability in abstract and novel situations
- Measures individual intelligence in a manner designed to reduce, as much as possible, the influence of culture
- Individual or by group
- Aids in the identification of learning problems and helps in making more reliable and informed decisions in relation to the special education needs of children
Purdue Non-Language Test [B]
- Designed to measure mental ability, since it consists entirely of geometric forms
- Culture-fair
- Self-Administering
Panukat ng Katalinuhang Pilipino
- Basis for screening, classifying, and identifying needs that will enhance the learning process
- In business, it is utilized as a predictor of occupational achievement by gauging an applicant’s ability and fitness for a particular job
- Essential for determining one’s capacity to handle the challenges associated with certain degree programs
- Subtests: Vocabulary, Analogy, Numerical Ability, Nonverbal Ability
Wonderlic Personnel Test (WPT)
- Assesses cognitive ability and problem-solving aptitude of prospective employees
- Multiple choice, answered in 12 minutes
Armed Services Vocational Aptitude Battery
- Most widely used aptitude test in the US
- Multiple-aptitude battery that measures developed abilities and helps predict future academic and occupational success in the military
Kaufman Assessment Battery for Children-II (KABC-II)
- Alan & Nadeen Kaufman
- for assessing cognitive development in children
- 3 to 18 years old

Personality Tests
Minnesota Multiphasic Personality Inventory (MMPI-2) [C]
- Multiphasic personality inventory intended for use with both clinical and normal populations to identify sources of maladjustment and personal strengths
- Starke Hathaway and J. Charnley McKinley
- Helps in diagnosing mental health disorders, distinguishing normal from abnormal
- should be administered to someone with no guilt feelings for committing a crime
- individual or by groups
- Clinical Scales: Hypochondriasis, Depression, Hysteria, Psychopathic Deviate, Masculinity/Femininity, Paranoia, Psychasthenia (Anxiety, Depression, OCD), Schizophrenia, Hypomania, Social Introversion
Myers-Briggs Type Indicator (MBTI)
- Katherine Cook Briggs and Isabel Briggs Myers
- Self-report inventory designed to identify a person’s personality type, strengths, and preferences
- Extraversion-Introversion Scale: where you prefer to focus your attention and energy, the outer world and external events or your inner world of ideas and experiences
- Sensing-Intuition Scale: how you take in information, whether you take it in as it is or focus on interpreting and adding meaning to the information
- Thinking-Feeling Scale: how you make decisions, logically or by following what your heart says
- Judging-Perceiving Scale: how do you orient to the outer world? What is your style in dealing with the outer world – get things decided or stay open to new info and options?
MMPI-2 validity scales
- K Scale = reveals a person’s defensiveness around certain questions and traits; also faking good
- K scale is sometimes used to correct scores on five clinical scales. The scores are statistically corrected for an individual’s overwillingness or unwillingness to admit deviance
- “Cannot Say” (CNS) Scale: measures how often a person doesn’t answer a test item
- High ? Scale: client might have difficulties with reading, psychomotor retardation, or extreme defensiveness
- True Response Inconsistency (TRIN): five true, then five false answers
- Varied Response Inconsistency (VRIN): random true or false
- Infrequency-Psychopathology Scale (Fp): reveals intentional or unintentional over-reporting
- FBS Scale: “symptom validity scale” designed to detect intentional over-reporting of symptoms
- Back Page Infrequency (Fb): reflects significant change in the testtaker’s approach to the latter part of the test
MMPI-2 validity scales (continued)
- Lie Scale (L Scale): items that are somewhat negative but apply to most people; assesses the likelihood of the test taker approaching the instrument with a defensive mindset
- High in L scale = faking good
- High in F scale = faking bad, severe distress or psychopathology
- Superlative Self-Presentation Scale (S Scale): a measure of defensiveness; Superlative Self-Presentation to see if you intentionally distort answers to look better
- Correction Scale (K Scale): reflection of the frankness of the testtaker’s self-report
Edwards Personal Preference Schedule (EPPS) [B]
- designed primarily as an instrument for research and counselling purposes to provide quick and convenient measures of a number of relatively normal personality variables
- based on Murray’s Need Theory
- Objective, forced-choice inventory for assessing the relative importance that an individual places on 15 personality variables
- Useful in personal counselling and with non-clinical adults
- Individual
Guilford-Zimmerman Temperament Survey (GZTS)
- items are stated affirmatively rather than in question form, using the 2nd person pronoun
- measures 10 personality traits: General Activity, Restraint, Ascendance, Sociability, Emotional Stability, Objectivity, Friendliness, Thoughtfulness, Personal Relations, Masculinity
NEO Personality Inventory (NEO-PI-R)
- Standard questionnaire measure of the Five Factor Model, provides systematic assessment of emotional, interpersonal, experiential, attitudinal, and motivational styles
- gold standard for personality assessment
- Self-Administered
- Neuroticism: identifies individuals who are prone to psychological distress
- Extraversion: quantity and intensity of energy directed
- Openness To Experience: active seeking and appreciation of experiences for their own sake
- Agreeableness: the kind of interactions an individual prefers, from compassion to tough-mindedness
- Conscientiousness: degree of organization, persistence, control, and motivation in goal-directed behavior
Panukat ng Ugali at Pagkatao/Panukat ng Pagkataong Pilipino
- Indigenous personality test
- Taps specific values, traits, and behavioral dimensions related or meaningful to the study of Filipinos
Sixteen Personality Factor Questionnaire
- Raymond Cattell
- constructed through factor analysis
- Evaluates a personality on two levels of traits
- Primary Scales: Warmth, Reasoning, Emotional Stability, Dominance, Liveliness, Rule-Consciousness, Social Boldness, Sensitivity, Vigilance, Abstractedness, Privateness, Apprehension, Openness to Change, Self-Reliance, Perfectionism, Tension
- Global Scales: Extraversion, Anxiety, Tough-Mindedness, Independence, Self-Control
Big Five Inventory-II (BFI-2)
- Soto & John
- Assesses big 5 domains and 15 facets
- for non-commercial purposes, for researchers and students

Projective Tests
Rorschach Inkblot Test [C]
- Hermann Rorschach
- 5 years and older
- subjects look at 10 ambiguous inkblot images and describe what they see in each one
- once used to diagnose mental illnesses like schizophrenia
- Exner System: coding system used in this test
- Content: the name or class of objects used in the patient’s responses
Content:
1. Nature
2. Animal Feature
3. Whole Human
4. Human Feature
5. Fictional/Mythical Human Detail
6. Sex
Determinants:
1. Form
2. Movement
3. Color
4. Shading
5. Pairs and Reflections
Location:
1. W – the whole inkblot was used to depict an image
2. D – a commonly described part of the blot was used
3. Dd – an uncommonly described or unusual detail was used
4. S – the white space in the background was used
Thematic Apperception Test [C]
- Christiana Morgan and Henry Murray
- 5 and above
- 31 picture cards serve as stimuli for stories and descriptions about relationships or social situations
- popularly known as the picture interpretation technique because it uses a standard series of provocative yet ambiguous pictures about which the subject is asked to tell a story
- also modified for African American testtakers
Children’s Apperception Test
- Bellak & Bellak
- 3-10 years old
- based on the idea that animals engaged in various activities were useful in stimulating projective storytelling by children
Hand Test
- Edward Wagner
- 5 years old and above
- used to measure action tendencies, particularly acting out and aggressive behavior, in adults and children
- 10 cards (1 blank)
Apperceptive Personality Test (APT)
- Holmstrom et al.
- attempt to address the criticisms of TAT
- introduced objectivity in the scoring system
- 8 cards include male and female figures of different ages and minority group members
- testtakers respond to a series of multiple-choice questions after storytelling
Word Association Test (WAT)
- Rapaport et al.
- presentation of a list of stimulus words; the assessee responds verbally or in writing with the first thing that comes into their mind
Rotter Incomplete Sentences Blank (RISB)
- Julian Rotter & Janet Rafferty
- Grade 9 to Adulthood
- most popular SCT
Sacks Sentence Completion Test (SSCT)
- Joseph Sacks and Sidney Levy
- 12 years old and older
- asks respondents to complete 60 questions with the first thing that comes to mind across four areas: Family, Sex, Interpersonal Relationships, and Self-Concept
Bender-Gestalt Visual Motor Test [C]
- Lauretta Bender
- 4 years and older
- consists of a series of durable template cards, each displaying a unique figure; testtakers are then asked to draw each figure as they observe it
- provides interpretative information about an individual’s development and neuropsychological functioning
- reveals the maturation level of visuomotor perceptions, which is associated with language ability and various functions of intelligence
House-Tree-Person Test (HTP)
- John Buck and Emmanuel Hammer
- 3 years and up
- measures aspects of a person’s personality through interpretation of drawings and responses to questions
- can also be used to assess brain damage and general mental functioning
- measures the person’s psychological and emotional functioning
- The house reflects the person’s experience of their immediate social world
- The tree is a more direct expression of the person’s emotional and psychological sense of self
- The person is a more direct reflection of the person’s sense of self
Draw-A-Person Test (DAP)
- Florence Goodenough
- 4 to 10 years old
- a projective drawing task that is often utilized in psychological assessments of children
- Aspects such as the size of the head, placement of the arms, and even things such as whether teeth were drawn or not are thought to reveal a range of personality traits
- Helps people who have anxieties about taking tests (no strict format)
- Can assess people with communication problems
- Relatively culture free
- Allows for self-administration
Kinetic Family Drawing
- Burns & Kaufman
- derived from Hulse’s FDT, “doing something”

Clinical & Counseling Tests
Millon Clinical Multiaxial Inventory-IV (MCMI-IV)
- Theodore Millon
- 18 years old and above
- for diagnosing and treatment of personality disorders
- exaggeration of polarities results in maladaptive behavior
- Pleasure-Pain: the fundamental evolutionary task
- Active-Passive: one adapts to the environment or adapts the environment to one’s self
- Self-Others: invest in others versus invest in oneself
Beck Depression Inventory (BDI-II)
- Aaron Beck
- 13 to 80 years old
- 21-item self-report that taps Major Depressive symptoms according to the criteria in the DSM
MacAndrew Alcoholism Scale (MAC & MAC-R)
- from MMPI-II
- Personality & attitude variables thought to underlie alcoholism
California Psychological Inventory (CPI-III)
- attempts to evaluate personality in normally adjusted individuals
- has validity scales that determine faking bad and faking good
- interpersonal style and orientation, normative orientation and values, cognitive and intellectual function, and role and personal style
- has special purpose scales, such as managerial potential, work orientation, creative temperament, leadership potential, amicability, law enforcement orientation, tough-mindedness
Rosenberg Self-Esteem Scale
- measures global feelings of self-worth
- 10-item, 4-point Likert scale
- used with adolescents
Dispositional Resilience Scale (DRS)
- measures psychological hardiness, defined as the ability to view stressful situations as meaningful, changeable, and challenging
Ego Resiliency Scale-Revised
- measures ego resiliency or emotional intelligence
HOPE Scale
- developed by Snyder
- Agency: cognitive model with goal-driven energy
- Pathway: capacity to construct systems to meet goals
- good measure of hope for traumatized people
- positively correlated with healthy psychological adjustment, high achievement, good problem-solving skills, and positive health-related outcomes
Satisfaction with Life Scale (SWLS)
- overall assessment of life satisfaction as a cognitive judgmental process
Positive and Negative Affect Schedule (PANAS)
- measures the level of positive and negative emotions a test taker has during the test administration

Strengths and weaknesses of assessment tools (2)
Test
Pros:
- can gather a sample of behavior objectively with lesser bias
- flexible, can be verbal or nonverbal
Cons:
- In crisis situations when relatively rapid decisions need to be made, it can be impractical to take the time required to administer and interpret tests
Interview
Pros:
- can take note of verbal and nonverbal cues
- flexible
- time and cost effective
- both structured and unstructured interviews allow clinicians to place findings in a wider, more meaningful context
- can also be used to help predict future behaviors
- interviews allow clinicians to establish rapport and encourage client self-exploration
Cons:
- sometimes, due to negligence of the interviewer and interviewee, it can miss out important information
- interviewer’s effect on the interviewee
- various errors such as halo effect, primacy effect, etc.
- interrater reliability
- interviewer bias
Portfolio
Pros:
- provides a comprehensive illustration of the client which highlights the strengths and weaknesses
Cons:
- can be very demanding
- time consuming
Observation
Pros:
- flexible
- suitable for subjects that cannot be studied in a lab setting
- more realistic
- affordable
- can detect patterns
Cons:
- For private practitioners, it is typically not practical or economically feasible to spend hours out of the consulting room observing clients as they go about their daily lives
- lack of scientific control, ethical considerations, and potential for bias from observers and subjects
- unable to draw cause-and-effect conclusions
- lack of control
- lack of validity
- observer bias
Case History
Pros:
- can fully show the experience of the observer in the program
- sheds light on an individual’s past and current adjustment as well as on the events and circumstances that may have contributed to any changes in adjustment
Cons:
- cannot be used to generalize a phenomenon
Role Play
Pros:
- encourages individuals to come together to find solutions and to get to know how their colleagues think
- group can discuss ways to potentially resolve the situation, and participants leave with as much information as possible, resulting in more efficient handling of similar real-life scenarios
Cons:
- may not be as useful as the real thing in all situations
- time-consuming
- expensive
- inconvenient to assess in a real situation
- While some employees will be comfortable role playing, they’re less adept at getting into the required mood needed to actually replicate a situation

Test Administration, Scoring, Interpretation and Usage (20)
Detect Errors and impacts in Test
Issues in Intelligence Testing
1. Flynn Effect – progressive rise in intelligence score that is expected to occur on a normed intelligence test from the date when the test was first normed
▪ Gradual increase in the general intelligence among newborns
▪ Frog Pond Effect: theory that individuals evaluate themselves as worse when in a group of high-performing individuals
2. Culture Bias of Testing
▪ Culture-Free: attempt to eliminate culture so nature can be isolated
▪ Impossible to develop because culture is evident in its influence since the birth of an individual, and the interaction between nature and nurture is cumulative and not relative
▪ Culture Fair: minimize the influence of culture with regard to various aspects of the evaluation procedures
▪ Fair to all, fair to some cultures, fair only to one culture
▪ Culture Loading: the extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, etc. associated with a particular culture
Errors: Reliability
o Classical Test Theory (True Score Theory) – score on ability tests is presumed to reflect not only the testtaker’s true score on the ability being measured but also the error
▪ Error: refers to the component of the observed test score that does not have to do with the testtaker’s ability
▪ Errors of measurement are random
▪ The greater the number of items, the higher the reliability
▪ Factors that contribute to inconsistency: characteristics of the individual, test, or situation, which have nothing to do with the attribute being measured, but still affect the scores
o Error Variance – variance from irrelevant random sources
Measurement Error – all of the factors associated with the process of measuring some variable, other than the variable being measured
- difference between the observed score and the true score
- Positive: can increase one’s score
- Negative: can decrease one’s score
- Sources of Error Variance:
a. Item Sampling/Content Sampling
b. Test Administration
c. Test Scoring and Interpretation
Random Error – source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)
Systematic Error – source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
- has a consistent effect on the true score
- SD does not change, the mean does
▪ Error variance may increase or decrease a test score by varying amounts; consistency of the test score, and thus the reliability, can be affected
Test-Retest Reliability
Error: Time Sampling
- the longer the time that passes, the greater the likelihood that the reliability coefficient would be insignificant
- Carryover Effects: happen when the test-retest interval is short, wherein the second test is influenced by the first test because testtakers remember or practiced the previous test = inflated correlation/overestimation of reliability
- Practice Effect: scores on the second session are higher due to their experience of the first session of testing
2. True Negatives (Specificity) – predict failure that does occur
3. False Positive (Type 1) – success does not occur
- test-retest with longer interval might be affected of
4.
False Negative (Type 2) – predicted failure but other extreme factors, thus, resulting to low correlation succeed - target time for next administration: at least two weeks Parallel Forms/Alternate Forms Reliability Error: Item Sampling (Immediate), Item Sampling changes over time (delayed) - Counterbalancing: technique to avoid carryover effects for parallel forms, by using different sequence for groups - most rigorous and burdensome, since test developers create two forms of the test - main problem: difference between the two tests - test scores may be affected by motivation, fatigue, or intervening events - create a large set of questions that address the same construct then randomly divide the questions into two Errors due to Behavioral Assessment sets 1. Reactivity – when evaluated, the behavior increases Internal Consistency (Inter-Item Reliability) - Hawthorne Effect Error: Item Sampling Homogeneity 2. Drift – moving away from what one has learned Split-Half Reliability going to idiosyncratic definitions of behavior Error: Item sample: Nature of Split - subjects should be retrained in a point of time Inter-Scorer Reliability - Contrast Effect: cognitive bias that distorts our Error: Scorer Differences perception of something when we compare it to o Standard Error of Measurement – provide a something else, by enhancing the differences between measure of the precision of an observed test score them ▪ Standard deviation of errors as the basic measure 3. Expectancies – tendency for results to be influenced of error by what test administrators expect to find ▪ Index of the amount of inconsistent or the amount - Rosenthal/Pygmalion Effect: Test administrator’s of the expected error in an individual’s score expected results influences the result of the test ▪ Allows to quantify the extent to which a test - Golem Effect: negative expectations decreases one’s provide accurate scores performance ▪ Provides an estimate of the amount of error 4. 
Rating Errors – intentional or unintentional misuse inherent in an observed score or measurement of the scale ▪ Higher reliability, lower SEM - Leniency Error: rater is lenient in scoring (Generosity ▪ Used to estimate or infer the extent to which an Error) observed score deviates from a true score - Severity Error: rater is strict in scoring ▪ Standard Error of a Score - Central Tendency Error: rater’s rating would tend to ▪ Confidence Interval: a range or band of test cluster in the middle of the rating scale scores that is likely to contain true scores - Halo Effect: tendency to give high score due to failure o Standard Error of the Difference – can aid a test to discriminate among conceptually distinct and user in determining how large a difference should be potentially independent aspects of a ratee’s behavior before it is considered statistically significant - snap judgement on the basis of positive trait o Standard Error of Estimate – refers to the standard - Horn Effect: Opposite of Halo Effect error of the difference between the predicted and - One way to overcome rating errors is to use rankings observed values 5. Fundamental Attribution Error – tendency to o Four Possible Hit and Miss Outcomes explain someone’s behavior based on internal factors 1. True Positives (Sensitivity) – predict success such as personality or disposition, and to underestimate that does occur the influence the external factors have on another person’s behavior, blaming it on the situation Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls to ensure that services were not denied. 
However, - Barnum Effect: people tend to accept vague the services are discontinued once the appropriate personality descriptions as accurate descriptions of services are available themselves (Aunt Fanny Effect) o Psychologists should discuss the limits of o Bias – factor inherent in a test that systematically confidentiality, uses of the information that would be prevents accurate, impartial measurement generated from the services to the persons and ▪ Prejudice, preferential treatment organizations with whom they establish a scientific ▪ Prevention during test dev through a procedure or professional relationships called Estimated True Score Transformation o Before recording voices or images, they must obtain Ethical Principles and Standards of Practice (19) permission first from all persons involved or their o If mistakes was made, they should do something to legal rep correct or minimize the mistakes o Only discuss confidential information with persons o If an ethical violation made by another psychologist clearly concerned/involved with the matters was witnessed, they should resolve the issue with o Disclosure is allowed with appropriate consent informal resolution, as long as it does not violate any ▪ No consent is not allowed UNLESS mandated by confidentiality rights that may be involved the law o If informal resolution is not enough or appropriate, o No disclosure of confidential information that could referral to state or national committees on lead to the identification of a client unless they have professional ethics, state licensing boards, or the obtained prior consent or the disclosure cannot be appropriate institutional authorities can be done. avoided Still, confidentiality rights of the professional in ▪ Only disclose necessary information question must be kept. 
o Exemptions to disclosure: o Failure to cooperate in ethics investigation itself, is ✓ If the client is disguised/identity is protected an ethics violation, unless they request for deferment ✓ Has consent of adjudication of an ethics complaint ✓ Legally mandated o Psychologists must file complaints responsibly by o Psychologists can create public statements as long as checking facts about the allegations they would be responsible for it o Psychologists DO NOT deny persons employment, ▪ They cannot compensate employees of the media advancement, admissions, tenure or promotion based in return for publicity in a news item solely upon their having made or their being the ▪ Paid Advertisement must be clearly recognizable subject of an ethics complaint ▪ when they are commenting publicly via internet, ▪ Just because they are questioned by the ethics media, etc., they must ensure that their statement committee or involved in an on-going ethics are based on their professional knowledge in investigation, they would be discriminated or accord with appropriate psych literature and denied advancement practice, consistent with ethics, and do not ▪ Unless the outcome of the proceedings are indicate that a professional relationship has been already considered established with the recipient o Psychologists should do their services within the o Must provide accurate information and obtain boundaries of their competence, which is based on approval prior to conducting the research the amount of training, education, experience, or o Informed consent is required, which include: consultation they had ✓ Purpose of the research o When they are tasked to provide services to ✓ Duration and procedures clients who are deprived with mental health ✓ Right to decline and withdraw services (e.g., communities far from the urban ✓ Consequences of declining or withdrawing cities), however, they were still not able to obtain ✓ Potential risks, discomfort, or adverse effects the needed competence for 
the job, they could ✓ Benefits still provide services AS LONG AS they make ✓ Limits of confidentiality reasonable effort to obtain the competence ✓ Incentives for participation required, just to ensure that the services were not ✓ Researcher’s contact information denied to those communities o Permission for recording images or vices are needed o During emergencies, psychologists provide unless the research consists of solely naturalistic services to individuals, even though they are yet to complete the competency/training needed just Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly Psychological Assessment #BLEPP2023 Source: Cohen & Swerdlik (2018), Kaplan & Saccuzzo (2018), Psych Pearls observations in public places, or research designed o Art. 12 of Revised Penal Code – Insanity Plea includes deception end ▪ Consent must be obtained during debriefing o Dispense or Omitting Informed consent only when: congratulations on reaching the end of this reviewer!! i 1. Research would not create distress or harm hope u learned something!! :D ▪ Study of normal educational practices conducted in an educational settings one day, we will be remembered. ▪ Anonymous questionnaires, naturalistic observation, archival research - aly <3 ▪ Confidentiality is protected 2. 
Permitted by law o Avoid offering excessive incentives for research participation that could coerce participation o DO not conduct study that involves deception unless they have justified the use of deceptive techniques in the study ▪ Must be discussed as early as possible and not during the conclusion of data collection o They must give opportunity to the participants about the nature, results, and conclusions of the research and make sure that there are no misconceptions about the research o Must ensure the safety and minimize the discomfort, infection, illness, and pain of animal subjects ▪ If so, procedures must be justified and be as minimal as possible ▪ During termination, they must do it rapidly and minimize the pain o Must no present portions of another’s work or data as their own ▪ Must take responsibility and credit, including authorship credit, only for work they have actually performed or to which they have substantially contributed ▪ Faculty advisors discuss publication credit with students as early as possible o After publishing, they should not withhold data from other competent professionals who intends to reanalyze the data ▪ Shared data must be used only for the declared purpose o RA 9258 – Guidance and Counseling Act of 2004 o RA 9262 – Violence Against Women and Children o RA 7610 – Child Abuse o RA 9165 – Comprehensive Dangerous Drugs Act of 2002 o RA 11469 – Bayanihan to Heal as One Act o RA 7277 – Magna Carta for Disabled Persons o RA 11210 – Expanded Maternity Leave Law o RA 11650 – Inclusive Education Law o RA 10173 – Data Privacy Act o House Bill 4982 – SOGIE Bill Hi :) this reviewer is FREE! u can share it with others but never sell it okay? let’s help each other <3 -aly
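Worked example for the Standard Error of Measurement notes under Errors: Reliability. This is only a sketch and is not taken from the cited sources. It assumes the usual textbook formulas SEM = SD x sqrt(1 - r) and SE of the difference = sqrt(SEM1^2 + SEM2^2), a 95% band using z = 1.96, and a made-up test with SD = 15 and reliability = .91. The function names are hypothetical and only for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Band of scores around the observed score that likely contains the true score."""
    return (observed - z * sem, observed + z * sem)


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """SE of the difference between two scores reported on the same scale."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Assumed (made-up) values: SD = 15, reliability = .91, observed score = 110
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, sem)
print(round(sem, 2), round(low, 2), round(high, 2))  # 4.5 101.18 118.82
```

With these assumed numbers, the true score is likely to fall between about 101 and 119. Raising the reliability to .96 shrinks the SEM to about 3.0, which is the "higher reliability, lower SEM" rule in action.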