PSYCHOLOGICAL TESTING AND ASSESSMENT
PRINCIPLES / INTRO

PSYCHOLOGICAL TEST – a set of items designed to measure characteristics of human beings that pertain to behavior.
PSYCHOLOGICAL ASSESSMENT – gathering of information using tools: tests, interviews, case studies, observations.
- Collaborative – assessor and assessee may work as partners
- Therapeutic
- Dynamic
SCORING – relate raw scores to some theoretical/empirical distribution.
*cut score – a reference point in the distribution used to classify test takers (e.g., pass/fail)

SCALES OF MEASUREMENT (IRON)
                             Magnitude   Equal intervals   Absolute zero
NOMINAL  (no ranking)            -              -                -
ORDINAL  (ranking)               /              -                -
INTERVAL (temp., time, IQ)       /              /                -
RATIO    (weight, height)        /              /                /

PARAMETRIC – normal distribution of scores (e.g., Pearson r)
NONPARAMETRIC – abnormal distribution of scores (e.g., Spearman rho; chi-square for nominal data)

FREQUENCY DISTRIBUTION – displays how frequently each value was obtained.
- Normal distribution – scores fall on the central tendency (mean, median, and mode coincide)
- Abnormal distribution – skewed

Mean – the average score
SD – approximation of the average deviation around the mean; the square root of the variance
Z score – the difference between a score and the mean, divided by the SD

POSITIVE SKEW – the tail falls at the high end of the distribution; most scores are low. *means the test is too difficult
NEGATIVE SKEW – the tail falls at the low end of the distribution; most scores are high. *means the test is too easy
PERCENTILE RANK – the percentage of people whose scores on a test fall below a particular raw score
PERCENTILE – specific scores within a distribution

ASSESSMENT TECHNIQUES (D I T O)
DOCUMENTS – records, protocols, collateral reports
INTERVIEWS – interview responses; initial assessment > screening verification
- Structured
- Unstructured
TESTS – written, verbal, visual
o Criterion-referenced test (CRT) – interpreted relative to the content of the test (there is a certain criterion to be met)
o Norm-referenced test (NRT) – interpreted relative to how other test takers perform, better or worse (e.g., age norms)
OBSERVATION
- behavioral observation
- observation checklist

RATING ERRORS / BIAS SOURCES
RESPONSE SET – the rater marks the same place on the rating scale regardless of the examinee's performance
LENIENCY ERROR – gives high positive ratings despite differences among examinees' performance
SEVERITY ERROR – gives low negative ratings despite differences among examinees' performance
CENTRAL TENDENCY ERROR – gives middle-range ratings (e.g., on a Likert scale)
PROXIMITY ERROR – differing skills are rated similarly when sequentially ordered, as in a process
HALO ERROR – the performance rating is influenced by unrelated impressions
LOGICAL ERROR – a poorly worded skill specification is rated in an unintended manner
LACK OF INTEREST ERROR – the rater is not really interested in the process
IDIOSYNCRATIC ERROR – unexpected and unpredictable ratings given for a number of reasons

PSYCHOLOGICAL TESTS
ABILITY TESTS
INTELLIGENCE TESTS – general potential to solve problems; verbal and non-verbal intelligence. Ex. WAIS, Stanford-Binet Intelligence Scale, Culture Fair Intelligence Test
ACHIEVEMENT TESTS – previous learning; measure the extent of one's knowledge of various academic subjects. Ex. Stanford Achievement Test in Reading
APTITUDE TESTS – predicting the acquisition of skills or competencies. Ex. Differential Aptitude Test

OBJECTIVE TESTS – structured; "Yes or No" or "True or False" items
- Standardized test administration, scoring, and interpretation of scores
- Limited number of responses
- Group tests
PROJECTIVE TESTS – ambiguous test stimulus; unclear, unlimited responses
- Tap wishes, desires, intrapsychic conflicts, unconscious motives
- Subjectivity in test interpretation / clinical judgement
- Self-administered / individual tests
- Results are integrated into a single score interpretation

PERSONALITY TESTS – measure traits/domains/factors; usually no right or wrong answers. Ex. MBTI
- Dynamic characteristics – ever-changing characteristics that vary through time or situation
- Static characteristics – characteristics that would not vary across time

WHICH TYPE OF RELIABILITY IS APPROPRIATE?
- Test has two forms → Parallel-Forms Reliability
- Test designed to be administered to an individual more than once → Test-Retest Reliability
- Test with factorial purity → Cronbach Alpha
- Test with items carefully ordered according to difficulty → Split-Half Reliability
- Test involves some degree of subjective scoring → Inter-Rater Reliability
- Test involves dichotomous items → KR20

NORMS – the basis for interpreting scores.

CONTENT VALIDITY – the test's content covers what it purports to measure.
CRITERION-RELATED VALIDITY – correlates test scores with a criterion occurring now or in the future.
- Test scores may be obtained at one time and the criterion measure obtained in the future, after an intervening event.
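The descriptive ideas above (mean, SD as the square root of the variance, and z-scores) can be sketched in a few lines of Python. This is an illustrative sketch added to the notes, not part of the original reviewer:

```python
import statistics

def z_scores(scores):
    """z = (raw score - mean) / SD, where SD = sqrt(variance)."""
    m = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD = square root of the variance
    return [(x - m) / sd for x in scores]

raw = [10, 12, 14, 16, 18]
print(statistics.mean(raw))                    # mean = 14
print([round(z, 2) for z in z_scores(raw)])    # symmetric around 0
```

A z of 0 sits exactly at the mean; positive z-scores fall above it and negative ones below, which is what makes z the common currency for the standard scores discussed later.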
PREDICTIVE VALIDITY
- how well a test corresponds with a particular criterion measured in the future
- performance is predicted based on one or more known measured variables
- performance on the first measure should be highly correlated with performance on the second
- ex. MAT, GRE, GMAT
- criterion – the standard. Characteristics: relevant, valid and reliable, uncontaminated
- criterion contamination – the criterion is based on the predictor measures

CONCURRENT VALIDITY
- correlates with what is occurring now
- both the test scores and the criterion measures are obtained at present
- the criterion must be valid, reliable, and considered a standard
- often confused with a construct validity strategy

CONSTRUCT VALIDITY
- construct: an informed scientific idea developed or hypothesized to describe or explain a behavior; something built by mental synthesis; unobservable, presupposed traits
- required when no criterion or universe of content is accepted as entirely adequate to define the quality being measured
- a test has good construct validity if an existing psychological theory can support what the test items are measuring
- uses both logical analysis and empirical data
- general rather than specific; provides a frame of reference
EVIDENCES:
1. The test is homogeneous, measuring a single construct.
2. Test scores increase or decrease as a function of age, passage of time, or experimental manipulation.
3. Pretest–posttest differences.
4. Test scores differ between groups.
5. Test scores correlate with scores on other tests in accordance with what is predicted.
UNIDIMENSIONAL – one construct; MULTIDIMENSIONAL – several constructs

CONVERGENT VALIDITY
- the test is correlated with another measure administered to the same subjects as the measure being validated
- the tests correlate well; they measure the same construct
- the two measures are intended to measure the same construct but are NOT administered in the same fashion
- ex. Depression test and Negative Affect Scale

DIVERGENT VALIDITY
- also called discriminant validity
- a validity coefficient showing little or no relationship between the newly created test and an existing test
- the test measures something different from what the other test measures
- ex. Social Desirability test and Marital Satisfaction test

INTER-RATER RELIABILITY
- "Are the raters consistent in their ratings?"
- different raters, using a common rating form, measure the object of interest consistently
- Kappa statistics:
*Cohen's Kappa – agreement between 2 raters
*Fleiss' Kappa – agreement among 3 or more raters

CONTENT VALIDITY (details)
- the essence of what you are measuring consists of topics and processes
- often established by expert judgement
- GENERALIZABILITY – the examiner generalizes from the sample of items to the degree of content mastery possessed by the individual examinee
- EDUCATIONAL content-valid test – follows the TOS (table of specifications)
- EMPLOYMENT content-valid test – covers appropriate job-related skills; reflects the job specification
- CLINICAL content-valid test – symptoms of the disorder are covered; reflects the diagnostic criteria
- CONSTRUCT UNDERREPRESENTATION – failure to capture important components of a construct
- CONSTRUCT-IRRELEVANT VARIANCE – scores are influenced by factors irrelevant to the construct
- CONTENT VALIDITY RATIO (CVR) – Lawshe proposed this structured and systematic way of establishing the content validity of a test

RELIABILITY – the consistency of a test
- indicates how stable a test score is
- a test should produce similar results consistently if it measures the same thing
- A TEST CAN BE RELIABLE WITHOUT BEING VALID

TEST-RETEST RELIABILITY
- Stability ("Will the scores be stable over time?")
- gives the same test to the same group of test takers at 2 different times; correlated with Pearson r
- used only for traits/characteristics that do not change over time
- carryover effect – the first testing session influences the results of the second (e.g., the interval between sessions is too short), which can distort the reliability estimate
- practice effect – a type of carryover effect wherein scores on the second administration are higher than on the first
- error variance – corresponds to the random fluctuations of performance from one test session to the other

PARALLEL-FORMS RELIABILITY
- Equivalence ("Are the two forms of the test equivalent?")
- different forms of the same test are administered to the same group at different times → a high reliability coefficient is expected
- the forms should contain the same number of items, expressed in the same form, covering the same type of content; the range and level of difficulty of the items should also be equal; instructions, time limits, illustrative examples, format, and all other aspects of the test must likewise be checked for equivalence
- PROBLEM: the difficulty of developing another form

INTERNAL CONSISTENCY
- "How well does each item measure the content/construct under consideration?"
- used when the test is administered once
- there is consistency among items within the test: if all items measure the same construct, the test has good internal consistency
*SPLIT-HALF RELIABILITY – split the items in half (e.g., odd/even), compute a separate score for each half, then calculate the degree of consistency between the two scores; stepped up to full length with the Spearman-Brown prophecy formula
*CRONBACH ALPHA – used when the two halves of the test have unequal variances; provides the lowest estimate of reliability; the average of all possible split halves. Ex. Likert-scale items
*KR20 – for binary/dichotomous items; tests with a right-or-wrong format

CHOOSING A STATISTIC
I. DESCRIPTION OF THE GROUP
A. Central tendency
B. Variability
C. Standard scores
D. Frequencies
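As a rough illustration (mine, not the reviewer's), the internal-consistency coefficients above can be computed from an items-by-persons score table. The example data are invented:

```python
def cronbach_alpha(items):
    """Cronbach alpha. items: list of per-item score lists (rows = items, columns = persons)."""
    k, n = len(items), len(items[0])
    totals = [sum(item[p] for item in items) for p in range(n)]  # each person's total score
    def pvar(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return (k / (k - 1)) * (1 - sum(pvar(it) for it in items) / pvar(totals))

def kr20(items):
    """KR-20: the special case of alpha for dichotomous (0/1) items, using sum of p*q."""
    k, n = len(items), len(items[0])
    totals = [sum(item[p] for item in items) for p in range(n)]
    m = sum(totals) / n
    var_total = sum((t - m) ** 2 for t in totals) / n
    pq = sum((sum(it) / n) * (1 - sum(it) / n) for it in items)
    return (k / (k - 1)) * (1 - pq / var_total)

def spearman_brown(r_half):
    """Spearman-Brown prophecy: step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)

demo = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]]  # 3 dichotomous items, 4 examinees
print(round(kr20(demo), 3))  # 0.632
```

For 0/1 items the item variance equals p*q, so KR-20 and alpha return the same value here, which matches the note that KR-20 is the dichotomous case.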
II. CORRELATING VARIABLES
a. Pair of interval or continuous variables – Pearson r
b. Pair of ordinal variables – Spearman rho
c. Pair of dichotomous variables – Phi coefficient (at least one true dichotomy) or Tetrachoric (both artificial)
d. One continuous and one dichotomous variable
   i. True dichotomy – Point-Biserial
   ii. Artificial dichotomy – Biserial
e. Agreement among 3 or more raters – Kendall's Coefficient of Concordance

III. COMPARISON OF GROUPS
Random sampling (parametric):
a. 2 separate groups with individual means – independent-measures t-test
b. 1 group, 2 scores – dependent t-test
c. 3 or more separate groups – one-way ANOVA
d. 1 group, 3 or more scores – repeated-measures ANOVA
e. 2 or more groups with repeated scores per group – split-plot (mixed-design) ANOVA
f. 2 IVs, 1 DV – two-way ANOVA
   i. 4 groups – 2x2 design
Non-random sampling (nonparametric):
a. 2 separate groups – Mann-Whitney U
b. 1 group, 2 ordinals – Wilcoxon Signed-Rank Test
c. 3 or more groups – Kruskal-Wallis H test
d. 3 or more ranks – Friedman Test
e. 1 group sorted into categories/frequencies – Chi-square

IV. PREDICTING VARIABLES
a. One predictor, one outcome – Linear Regression
b. More than one predictor, one outcome (X1 + X2 + X3 = Y) – Multiple Regression
c. Sets of predictors, significant or not – Hierarchical Regression
   M1: X1 = Y
   M2: X1 + X2 = Y
   M3: X1 + X2 + X3 = Y
d. Sets of predictors, all significant – Stepwise Regression
   M1: X1* = Y
   M2: X1* + X2* = Y
   M3: X1* + X2* + X3* = Y
e. Outcome is nominal – Logistic Regression

ASSESSMENT vs. TESTING
ASSESSMENT – a broad array of evaluative processes
- Objective: answers questions, solves problems, decides
- Process: individualized
- Role of evaluator: key in the choice of tests
- Skills of evaluator: educated selection of tools; skilled
- Outcome: a logical problem-solving approach
TESTING – instruments that yield scores based on collected data (a subset of assessment)
- Objective: obtain some measure (numerical in nature) with regard to an ability/attribute
- Process: individualized or grouped
- Role of evaluator: may be substituted
- Skills of evaluator: technician-like
- Outcome: yields a test score or series of test scores
Technical quality – refers to a test's psychometric soundness

TEST ITEM – suggests a sample of behavior of an individual.
PARTS OF A TEST:
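To make the "correlating variables" branch concrete, here is a from-scratch sketch (illustrative only, standard library) of the two most common coefficients in that list. Spearman rho is computed the way the notes imply: rank the data, then apply Pearson r to the ranks.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation for two interval/continuous variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r applied to ranks (average ranks for ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1                      # extend the run of tied values
            avg = (i + j) / 2 + 1           # average rank for the tied run
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    return pearson_r(ranks(x), ranks(y))

print(round(spearman_rho([1, 2, 3], [10, 30, 20]), 2))  # 0.5
```

With perfectly monotonic data both coefficients reach 1.0; rho only cares about rank order, which is why it is the choice for ordinal data.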
1. Content – the subject matter of the test
2. Format – the form, plan, structure, arrangement, and layout of the test items
3. Administration procedures – administered on a one-to-one basis or by group
4. Scoring and interpretation
   a. Score – a code or summary statement that reflects an evaluation of performance on a test
   b. Scoring – the process of assigning such evaluative codes or statements to performance on tests
SCALE – the process by which a response can be scored.

TYPES OF PSYCHOLOGICAL TESTS
1. NUMBER OF TEST TAKERS
   a. Individual
   b. Group
2. VARIABLE BEING MEASURED
   a. ABILITY
      i. ACHIEVEMENT
      ii. APTITUDE/PROGNOSTIC
      iii. INTELLIGENCE
   b. PERSONALITY
      i. OBJECTIVE/STRUCTURED
      ii. PROJECTIVE/UNSTRUCTURED
      iii. INTERESTS

3 FORMS OF ASSESSMENT (T C D)
1. THERAPEUTIC PSYCHOLOGICAL ASSESSMENT – the patient gains insight about the disorder and later develops psychological wellness
2. COLLABORATIVE PSYCHOLOGICAL ASSESSMENT – the patient helps the clinician uncover the disorder
3. DYNAMIC PSYCHOLOGICAL ASSESSMENT – follows a process (ABA design):
   a. Evaluation
   b. Therapy/intervention
   c. Evaluation

ASSESSMENT TOOLS (O P I)
1. OBSERVATION – monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
   a. Naturalistic observation – observing behaviors in the setting in which the behavior would typically be expected to occur
   b. Role-play test – a tool of assessment wherein examinees are directed to act as if they were in a particular situation
2. PSYCHOLOGICAL TESTING – a set of items used for testing/measuring/determining individual differences; the process of measuring psychology-related variables by means of a device
3. INTERVIEW – gathering information through direct communication. Interviews differ in their purpose, length, and nature.
   a. Panel interview – multiple interviewers
      i. Advantage: minimizes the idiosyncratic biases of a lone interviewer
      ii. Disadvantage: costly; the use of multiple interviewers may not even be justified
   Portfolio – a sample of one's ability and accomplishment
   Case history data – records, transcripts, and other accounts in written or pictorial form
   CASE STUDY – a report or illustrative account concerning a person or an event, compiled on the basis of case history data

MAXIMUM PERFORMANCE TESTS
SPEED TEST – a homogeneous test (the items are easy); short time limit
POWER TEST – few items, but more complex

CHARACTERISTICS OF PSYCHOLOGICAL TESTING
1. Objective – free from subjective perception
2. Standardized – uniformity exists
3. Reliable – there is consistency in test results
4. Valid – the test measures what it purports to measure
5. Good predictive validity – test results suggest future behavior

REFERENCE SOURCES – sources of authoritative information about published tests
- Test catalogues – brief descriptions of tests
- Test manuals – detailed information about a test
- Reference volumes – "one-stop shopping"
- Journal articles
- Online databases

ETHICAL CODE – professional guidelines for appropriate behavior
o American Counseling Association (2005)
o American Psychological Association (2003)
o Psychological Association of the Philippines (2009)

WHEN CONFIDENTIAL INFORMATION MAY BE REVEALED
1. If a client is in danger of harming himself or herself or someone else;
2. If the client is a minor and the law states that parents have a right to information about their child;
3. If a client asks you to break confidentiality (for example, your testimony is needed in court);
4. If you are bound by law to break confidentiality (for example, you are hired by the courts to assess an individual's capacity to stand trial);
5. To reveal information about your client to your supervisor in order to benefit the client;
6. When you have a written agreement from your client to reveal information to specified sources (for example, the court has asked you to send a test report to them).
RESPONSIBILITIES OF TEST USERS, PUBLISHERS, AND CONSTRUCTORS
- Use assessment instruments only with samples similar to the standardization group (reliability, validity, established norms).
- Test users must possess knowledge of test construction and the supporting research of any test they administer.
- Test developers should provide the psychometric properties of the test, specified scoring and administration procedures, and a clear description of the normative sample.

MORAL ISSUES
- Divided Loyalties
- Human Rights
- Labeling
- Invasion of Privacy
- Responsibilities of Test Users, Test Publishers, and Test Constructors

DIVIDED LOYALTIES – psychologists are torn over whether their client is the institution or the person. Institutions should be informed only of what they need, or given the answer to the referral question only.

HUMAN RIGHTS
- Right to informed consent
- Right to know their test results and the basis of any decisions that affect their lives
- Right to know who will have access to test data, and right to confidentiality of test results

INFORMED CONSENT – permission given by the client after the assessment process is explained. It involves the right of clients to obtain information about the nature and purpose of all aspects of the assessment process, and to give their permission to be assessed.

WHEN INFORMED CONSENT IS NOT REQUIRED
- Testing is mandated by law.
- Testing is a routine educational, institutional, or organizational activity.
- Evaluation of decisional capacity.

LABELING – effects of labeling:
o Results in stigmatization
o Affects one's access to help
o Makes a person passive

CONFIDENTIALITY – ethical guideline to protect client information. Whether conducting a broad assessment of a client or giving one test, keeping information confidential is a critical part of the assessment process and follows guidelines similar to keeping information confidential in a therapeutic relationship.

INVASION OF PRIVACY – the codes generally acknowledge that, to some degree, all tests invade one's privacy, and they highlight the importance of clients understanding how their privacy might be violated.

TEST SCORING AND INTERPRETATION – the codes highlight the fact that when scoring tests and interpreting their results, professionals should reflect on how test worthiness (reliability, validity, cross-cultural fairness, and practicality) might affect the results.

TEST SECURITY – the codes remind professionals that it is their responsibility to make reasonable efforts to ensure the integrity of test content and the security of the test itself. Professionals should not duplicate tests or change test materials without the permission of the publisher.

ETHICS IN PSYCHOLOGICAL TESTING
CHOOSING APPROPRIATE ASSESSMENT INSTRUMENTS – ethical codes stress the importance of professionals choosing assessment instruments that show test worthiness (reliability, validity, cross-cultural fairness, and practicality). Professionals must take appropriate action when issues of test worthiness arise during an assessment so that the results are not misconstrued.

COMPETENCE IN USING TESTS – requires adequate knowledge of and training in administering an instrument. The codes declare that professionals should have adequate knowledge about testing and familiarity with any test they may use.

THREE-TIER SYSTEM
LEVEL A – tests that can be administered, scored, and interpreted by responsible nonpsychologists who have carefully read the manual and are familiar with the overall purpose of testing. Ex. Achievement tests, specialized aptitude tests
LEVEL B – requires technical knowledge of test construction and use, and appropriate advanced coursework in psychology and related courses (statistics, individual differences, counseling). Ex. Group intelligence tests, personality tests
LEVEL C – requires an advanced degree in psychology or licensure as a psychologist, plus advanced training/supervised experience with the particular test. Ex. Projective tests, individual intelligence tests, diagnostic tests

CROSS-CULTURAL SENSITIVITY – ethical guideline to protect clients from discrimination and bias in testing. The codes stress the importance of professionals being aware of and attending to the effects of age, color, cultural identity, disability, ethnicity, gender, religion, sexual orientation, and socioeconomic status on test administration and interpretation.

PROPER DIAGNOSIS – choose appropriate assessment techniques for accurate diagnosis. The codes emphasize the important role professionals play when deciding which assessment techniques to use in forming a diagnosis of a mental disorder, and the ramifications of making such a diagnosis.

RELEASE OF TEST DATA – test data are protected; a client release is required. The codes assert that data should be released to others only if the client has given consent, and generally only to individuals who can adequately interpret the data and who will not misuse the information.

TEST ADMINISTRATION – the codes reinforce the notion that tests should be administered in the manner in which they were established and standardized. Alterations to this process should be noted, and interpretations of test data adjusted if the testing conditions were not ideal.

MORAL MODEL OF DECISION MAKING
AUTONOMY – respecting the client's right of self-determination and freedom of choice.
NON-MALEFICENCE – ensuring that professionals do no harm.
BENEFICENCE – promoting the well-being of others and of society.
JUSTICE – equal and fair treatment of all people; being non-discriminatory.
FIDELITY – being loyal and faithful to your commitments in the helping relationship.
VERACITY – dealing honestly with the client.

NORMS AND STATISTICS
TWO TYPES OF STATISTICS
1. DESCRIPTIVE – used for making interpretations of test results; provides a concise description of quantitative information.
2. INFERENTIAL – provides conclusions about a population based on observation of a sample.

SCALES OF MEASUREMENT
1. NOMINAL – naming/labeling; one category does not suggest that another is higher or lower. Ex. gender, religion
2. ORDINAL – observations can be ranked in order, but the degree of difference is unobtainable. Ex. position in the company
3. INTERVAL – there is magnitude and equal intervals, but no true zero
4. RATIO – there is magnitude, equal intervals, and a true zero
*magnitude – "moreness"; one value is more than another
*equal intervals – the difference between two points at any place on the scale has the same meaning as the difference between two other points elsewhere
*absolute zero – zero suggests the absence of the variable being measured
*Most psychological data are ordinal by nature but are treated as interval.
*IQ was initially for classification, not measurement (cited by Binet).

MEASURES OF CENTRAL TENDENCY – statistics that indicate the average or midmost score between the extreme scores in a distribution.
MEAN – the most appropriate measure of central tendency for interval and ratio data when the distribution is normal
MEDIAN – the middle score of a distribution
MODE – the most frequently occurring score in a distribution

MEASURES OF VARIABILITY – indicate how scattered the scores in a distribution are; how far one score is from another; the dispersion of the scores.
Range – the difference between the highest and lowest scores
Quartiles – points that divide the distribution into 4 equal parts
Interquartile range – the difference between Q3 and Q1; represents the middle 50% of the distribution
Semi-interquartile range – (Q3 – Q1)/2

FREQUENCY DISTRIBUTION – displays scores on a variable or measure to reflect how frequently each value was obtained.
*GRAPH – a diagram or chart illustrating data
- Histogram – vertical bars at the true limits of each test score; connected bars; used for continuous data
- Bar graph – used for describing frequencies; disconnected bars
- Frequency polygon – points plotted at the class mark of each interval; continuous lines

KURTOSIS – the steepness of a distribution
- PLATYKURTIC – flat; the frequencies of high and low scores are not far from the frequency of scores near the mean
- LEPTOKURTIC – peaked; the frequencies of high and low scores are far from the frequency of scores near the mean
- MESOKURTIC – in the middle; the distribution is deemed normal

DECILE – points where the distribution is equally divided into 10 parts, D1–D9.

LINEAR TRANSFORMATION – derived from the z-score formula to transform a score from one scale to another: NS = SD(Z) + M

PERCENTILE RANK – tells the relative position of a test taker in a group of 100; suggests how many scores fall below a specified score. For example, a score at the 50th percentile means that 50 percent of the test takers fall below that specific score.

CORRELATION – statistical tool for testing the relationship between variables.
COVARIANCE – how much two scores vary together
CORRELATION COEFFICIENT – a mathematical index that describes the direction and magnitude of a relationship
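The linear transformation NS = SD(Z) + M and the percentile-rank definition above are easy to express directly. A minimal sketch (my addition, with made-up example scores):

```python
def linear_transform(raw, mean, sd, new_mean, new_sd):
    """NS = new_SD * Z + new_M : move a raw score onto a new scale via its z-score."""
    z = (raw - mean) / sd
    return new_sd * z + new_mean

def percentile_rank(score, group):
    """Percentage of the group scoring below the given score."""
    return 100 * sum(1 for s in group if s < score) / len(group)

# A z of +1 expressed on the deviation-IQ scale (M = 100, SD = 15):
print(linear_transform(1, 0, 1, 100, 15))   # 115.0
```

The same function produces T-scores (M = 50, SD = 10), stanines (M = 5, SD = 2), and so on, by changing only the target mean and SD.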
o Ranges from -1.00 to +1.00
o The nearer to 1, the stronger the relationship
o The nearer to 0, the weaker the relationship
o The sign indicates the type of relationship (negative = indirect/inverse; positive = direct)

CORRELATIONAL STATISTICS
o PEARSON PRODUCT-MOMENT CORRELATION – 2 variables on an interval/ratio scale
o SPEARMAN RHO – correlates 2 variables on an ordinal scale; also called rank-order correlation
o BISERIAL CORRELATION – 1 continuous and 1 artificial dichotomous variable (a dichotomy in which there are other possibilities within a category)
o POINT-BISERIAL CORRELATION – 1 continuous and 1 true dichotomous variable (a dichotomy in which there are only two possible categories)
o PHI COEFFICIENT – 2 dichotomous variables; at least 1 true dichotomy
o TETRACHORIC COEFFICIENT – 2 dichotomous variables; both artificial dichotomies
COEFFICIENT OF ALIENATION – a measure of the non-association between two variables
COEFFICIENT OF DETERMINATION – the percentage of variance shared by two variables; the effect of one variable on another. Ex. r = 0.75; r² = 0.56

STANDARD DEVIATION – approximation of the average deviation around the mean; details how far above or below the mean a score lies.
NORMAL DISTRIBUTION – the majority of test takers bulk at the middle of the distribution; very few test takers are at the extremes
POSITIVELY SKEWED – more test takers got low scores; mean > median > mode
NEGATIVELY SKEWED – more test takers got high scores; mode > median > mean

STANDARD SCORES – a raw score that has been converted from one scale to another. Provide a context for comparing scores on different tests by converting scores from the two tests into z-scores.
- Z SCORE – mean = 0; SD = 1 ("zero plus or minus one" scale). Once determined, it can be used to translate one scale into another.
- T-SCORE – mean = 50; SD = 10. Created by McCall in honor of his professor Thorndike
- STANINE – mean = 5; SD = 2. Used by the US Air Force. Takes whole numbers 1–9; no decimals
- DEVIATION IQ – mean = 100; SD = 15. Used for interpreting IQ
- STEN – "standard ten"; mean = 5.5; SD = 2
- GRE/SAT – mean = 500; SD = 100. Used for admission to graduate school and college

NORMS – performance by defined groups on a particular test; the transformation of raw scores for making meaningful interpretations of scores on a test.
- NORMING – the process of creating norms
- NORMATIVE SAMPLE – the group of people whose performance on a particular test is analyzed and referred to
- RACE NORMING – norming based on race/culture
- USER NORMS – norms provided in the test manuals
- NORMAN – the person who constructs a norm
- CRITERION-REFERENCED – interpretation of the test is based on a certain standard
- NORM-REFERENCED – the score is interpreted based on the performance of a standardization group
1. DEVELOPMENTAL NORMS – indicate how far along the normal developmental path an individual has progressed: age norms, grade norms, ordinal scales
2. WITHIN-GROUP NORMS – the individual's performance is evaluated in terms of the performance of the most nearly comparable standardization group:
   a. Percentile
   b. Standard score
   c. Deviation IQ
3. NATIONAL NORMS – norms from large-scale samples:
   a. Subgroup norms
   b. Local norms

REGRESSION (Ŷ = a + bX)
- Intercept (a) – the point at which the regression line crosses the Y axis
- Regression coefficient (b) – the slope of the regression line
- Regression line – the best-fitting straight line through a set of points in a scatter plot
- Standard error of estimate – measures the accuracy of prediction

MULTIPLE REGRESSION – statistical technique for predicting one variable from a series of predictors; used to find linear combinations of three or more variables. Applicable only when the data are all continuous.
STANDARDIZED REGRESSION COEFFICIENTS – also called beta weights; tell how much each variable in a list of predictors predicts a single variable.
FACTOR ANALYSIS – used to study the interrelationships among a set of variables.
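The regression pieces named above (intercept a, slope b, standard error of estimate) can be sketched from scratch; this is an illustrative example with invented data, not part of the reviewer:

```python
def linear_regression(x, y):
    """Least-squares fit of Y-hat = a + bX; returns (a, b, standard error of estimate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)          # regression coefficient (slope)
    a = my - b * mx                                # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = (sse / (n - 2)) ** 0.5                   # accuracy of prediction
    return a, b, see

a, b, see = linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, see)   # 1.0 2.0 0.0
```

Here the points lie exactly on a line, so the standard error of estimate is 0; with real, noisy criterion data it grows as prediction accuracy worsens.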
Factors –variables; Also called as principal components Factor Loading –the correlations between the original and the factors; depicted through beta weights. META-ANALYSIS - Family of techniques used to statistically combine information across studies to produce single estimates of the data under study. Effect size –the estimate of the strength of relationship or size of differences. Evaluated through correlation coefficient ITEM ANALYSIS AND ITEM CONSTRUCTION ITEM WRITING GUIDELINES: ITEM ANALYSIS - general term for a set of methods used to evaluate test items, one of the most important aspects of test construction. I. ITEM DIFFICULTY - measures achievement/ability, defined by the number of people who get correct items. Indicates the easiness of Define clearly the test. Should range from 0.30-0.70. Achievement tests make use of multiple choice because it has 0.25 chance of getting the correct Generate item pool response Avoid long items a. Optimum item difficulty - suggests the best difficulty for an item based on the number of responses. Keep level of reading difficulty appropriate for those who will complete the test. i. OID = (chance performance + 1)/ 2 Avoid double-barreled items (more than one ideas in one item) ii. Chance performance –performance based on guessing. Can be equated by dividing 1 from the number of Consider making positive & negative worded items distractors. ITEM FORMAT - Form, plan, structure, arrangement, and layout of individual test items. b. Item difficulty index - value that describes the item difficulty for an ability test. I. SELECTED RESPONSE FORMAT – select a response from a set of alternative responses. c. Item endorsement index - value that describes the percentage of individuals who said endorsed an item in a personality a. DICHOTOMOUS FORMAT - offers 2 alternatives for each item. ADVANTAGE: simplicity, easy administration, quick score, test. no neutral response. 
DISADVANTAGE: needs more items, 50% chance of getting the correct answer; sample can d. Omnibus spiral format - Items in an ability test are arranged into increasing difficulty. memorize responses i. Give away items –presented near the beginning of the test to spur motivation and lessen test anxiety. b. POLYCHOTOMOUS - has more than 2 alternatives. Ex. multiple choice. II. ITEM RELIABILITY - Indicates the internal consistency of a test. The higher the index; the higher the internal consistency. i. Question - stems a. (Item Reliability) = (SD of the item) x (item-total correlation) ii. Correct choice - keyed response b. Factor analysis can also be used to determine which items has more load for the whole test. iii. Distractors - incorrect choices. III. ITEM VALIDITY - indication of the degree to which a test is measuring what it purports to measure. Higher item-validity index; the iv. Cute distractors - less likely to be chosen, may affect the reliability of the test higher the criterion related validity for the test. c. LIKERT FORMAT - requires the respondent to indicate the degree of agreement with a particular attitudinal question. a. Item Validity = (item standard deviation) x (correlation of item and criterion) Superior item format. Uses factor analysis. Can be 5-4/6 choice format *without neutral point*. Negative items are IV. ITEM DISCRIMINABILITY - How well an item performs in relation to some criterion. How adequately an item separates high scorers reversed score then summed up all scores. from low scorers on the entire test. Limits at 0.30 discrimination index. The higher the d the more high scorers answering the item d. CATEGORY - asked to rate a construct from 1-10; 1-lowest and 10-highest. correctly e. CHECKLIST - a subject receives a long list of adjectives and indicates whether each one is characteristic of himself or a. Extreme group method – compares people who have done well with those who have done poorly on a test herself b. 
b. Point-biserial – correlates dichotomous item data with continuous total-score data; shows whether those who got an item correct also tend to have high total scores.
V. ITEM CHARACTERISTIC CURVE - Graphic representation of item difficulty and discrimination; usually plots total scores on the x-axis and p and d on the y-axis.
VI. ITEMS FOR A CRITERION-REFERENCED TEST - A frequency polygon is created after the test is given to two groups: one exposed to the learning unit and one not exposed to it.
a. Antimode – the score with the lowest frequency.
b. Used in the determination of the cut score (passing score) for a criterion-referenced test.
VII. DISTRACTOR ANALYSIS – Examination of how test takers respond to each incorrect alternative of a multiple-choice item.
VIII. ISSUES AMONG TEST ITEMS
a. ITEM FAIRNESS - The degree to which an item is biased.
i. Biased test items – items that favor one particular group of examinees; can be tested using inferential statistics among groups.

f. Q-SORT - requires respondents to sort a group of statements into 9 piles.
g. GUTTMAN SCALE - items are arranged from weaker to stronger expressions of the attitude, belief, or feeling being measured.
II. COMPLETION ITEMS – the respondent completes a set of stimuli.
a. ESSAY ITEMS - the respondent answers a question by writing a composition; used to determine the depth of the respondent's knowledge.

EQUAL-APPEARING INTERVAL
Described by Thurstone
A scale wherein positive and negative items are present
Adds all responses in order to transform them into an interval scale
Uses direct estimation scaling
o Direct estimation scaling - transformation of the scale to other scales is possible due to the computable value of the mean.
o Indirect estimation scaling - cannot be transformed to other scales because the mean is not present.

COMPUTER ADAPTIVE TESTING - Also called computer-assisted testing. An interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items.
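The item statistics above (the item-reliability and item-validity indices, the extreme-group discrimination index, and the point-biserial) can be sketched in Python. The function names and the 27% extreme-group fraction are illustrative assumptions, not from the source:

```python
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation, used by the index computations below."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / denom

def item_index(item_scores, other_scores):
    """(item SD) x (correlation with the other variable): this is the
    item-reliability index when `other_scores` are total test scores,
    and the item-validity index when they are an external criterion."""
    return statistics.pstdev(item_scores) * pearson_r(item_scores, other_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Extreme-group method: d = p(upper group) - p(lower group), where
    the groups are the top and bottom `fraction` of total scorers."""
    n = max(1, int(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    p = lambda grp: sum(item_scores[i] for i in grp) / len(grp)
    return p(order[-n:]) - p(order[:n])

def point_biserial(item_scores, total_scores):
    """r_pb = ((M1 - M0) / SD_total) * sqrt(p * q) for a 0/1 item."""
    ones = [t for i, t in zip(item_scores, total_scores) if i == 1]
    zeros = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(ones) / len(item_scores)
    return ((statistics.mean(ones) - statistics.mean(zeros))
            / statistics.pstdev(total_scores)) * (p * (1 - p)) ** 0.5
```

An item whose d falls below the 0.30 limit cited above would be flagged for revision.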
b. QUALITATIVE ITEM ANALYSIS - Involves exploring issues through verbal means, such as interviews and group discussions conducted with test takers and other relevant parties.
c. THINK-ALOUD ADMINISTRATION - Allows test takers (during standardization) to speak their minds while taking the test. Used to shed light on the test taker's thought processes during the administration of the test.
d. EXPERT PANELS - Guide researchers/test developers in doing a sensitivity review (especially on cultural issues).
i. Sensitivity review – a study of test items, typically to examine test bias and the presence of offensive language and stereotypes.

ITEM BANK – a relatively large and easily accessible collection of test questions.
ITEM BRANCHING – the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.

SCORING ITEMS
I. CUMULATIVE MODEL – the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic measured.
II. CLASS SCORING/CATEGORY SCORING – test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar in some way. Most useful in diagnostic tests.
III. IPSATIVE SCORING – compares a test taker's score on one scale within a test to another scale within that same test.

TEST DEVELOPMENT - Umbrella term for the whole process of creating a test.
I. TEST CONCEPTUALIZATION - The idea for a particular test is conceived. The following are determined: construct, goal, user, taker, administration, format, response, benefits, costs, and interpretation, as well as whether the test will be norm-referenced or criterion-referenced.
a. Pilot work - may take the form of interviews to determine appropriate items for the test.
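Stepping back to the SCORING ITEMS models above, a small Python sketch contrasts the cumulative and ipsative models; the scale names are hypothetical:

```python
def cumulative_score(responses):
    """Cumulative model: the higher the summed score, the higher the
    test taker stands on the measured ability or trait."""
    return sum(responses)

def ipsative_profile(scale_scores):
    """Ipsative scoring: each scale is compared with the test taker's
    own mean across scales, not with other test takers' scores."""
    mean = sum(scale_scores.values()) / len(scale_scores)
    return {scale: score - mean for scale, score in scale_scores.items()}

print(cumulative_score([1, 0, 1, 1]))                     # 3
print(ipsative_profile({"verbal": 30, "numerical": 20}))  # verbal +5, numerical -5
```

Ipsatively, the same raw score can be a relative strength for one person and a relative weakness for another, which is why ipsative scores are not compared across test takers.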
II. TEST CONSTRUCTION – writing test items, formatting items, setting scoring rules, and designing/building the test.
a. Scaling – the process of setting rules for assigning numbers in measurement; manifested through the item format (dichotomous, polytomous, Likert, category).
b. Item pool - usually 2 times the number of items intended for the final form; 3 times is more advisable.
III. TEST TRYOUT - administration of the test to a representative sample of test takers under standardized conditions. Issues:
a. Determination of the target population.
b. Determination of the number of tryout participants (number of items multiplied by 10).
c. The tryout should be executed under conditions as identical as possible to those under which the standardized test will be administered.
IV. ITEM ANALYSIS - entails procedures, usually statistical, designed to explore how individual test items work as compared with other items and in the context of the whole test (validity, reliability, item difficulty, and discrimination).
V. TEST REVISION - balancing the weaknesses and strengths of the test and its items.
a. Norming - done after the test has been revised to acceptable levels of reliability, validity, and item indices.

TEST ADMINISTRATION
ISSUES IN TEST ADMINISTRATION: the examiner and the subject; subject variables; training of the test administrator; behavioral assessment issues; mode of administration.

EXAMINER AND THE SUBJECT - Relationship between the examiner and the test taker. On the Wechsler Intelligence Scale for Children (WISC), enhanced rapport increased scores.
Faulty response styles:
o Acquiescent response style – tendency toward increased agreement when responding in a test or interview; most responses are positive regardless of item content.
o Socially desirable response style – presenting oneself in a favorable or socially desirable way.
Language of the test taker - Test takers proficient in two or more languages should be tested in the language in which they are most comfortable.
Race of the test taker - There are significant effects of the examiner's race on the sample's responses.

TRAINING OF THE TEST ADMINISTRATOR - Different assessment procedures require different levels of training. According to research, at least 10 practice sessions are needed to gain competency in scoring the WAIS-R.

MODE OF ADMINISTRATION - Self-administered measures show lower results than psychologist-administered ones. Telephone interviews show better reported health than self-administered interviews.

SUBJECT VARIABLES
I. TEST ANXIETY - anxiety based on test performance (worry, emotionality, lack of self-confidence).
II. ILLNESS - diseases influence test-taking behavior and performance (malingerers).
III. HORMONES - hormonal imbalances affect mood cycles and thus performance on a test.
IV. MOTIVATION - those required to take a test as an occupational requirement tend to produce unreliable results.

ERRORS OF BEHAVIORAL ASSESSMENT
I. REACTIVITY - Being evaluated increases performance; also called the Hawthorne effect.
II. DRIFT - Moving away from what one has learned toward idiosyncratic definitions of behavior; suggests that observers should be retrained from time to time.
a. CONTRAST EFFECT - tendency to rate the same behavior differently when observations are repeated in the same context.
III. EXPECTANCIES - Tendency for results to be influenced by what test administrators expect to find.
a. Rosenthal effect – the test administrator's expected results influence the result of the test.
b. Golem effect – negative expectations from the test administrator decrease performance.
IV. RATING ERRORS - Judgments resulting from the intentional or unintentional misuse of a rating scale.
a. Halo effect – tendency to ascribe positive attributes independently of the observed behavior; suggested by Thorndike.
b. Leniency/generosity error – the rater's tendency to be too forgiving and insufficiently critical.
c. Severity error – evaluations that are overly critical.
d. Central tendency error – the rater is reluctant to give ratings at either the positive or the negative extreme, so ratings tend to cluster in the middle of the continuum.
e. General standoutishness – the tendency to judge on the basis of one outstanding characteristic.

INTERVIEW - Method of getting information by talk, discussion, or direct questioning.
I. DIRECTIVE INTERVIEW - The interviewer directs, guides, and controls the course of the interview.
II. NONDIRECTIVE INTERVIEW - The interviewee guides the interview process.
III. SELECTION INTERVIEW - Designed to elicit information pertaining to an applicant's qualifications and capabilities for particular employment duties.
IV. SOCIAL FACILITATION INTERVIEW - The interviewer serves as a model for the interviewee.

PRINCIPLES OF EFFECTIVE INTERVIEWING
I. PROPER ATTITUDE
a. INTERPERSONAL INFLUENCE – degree to which one person can influence another.
b. INTERPERSONAL ATTRACTION – degree to which people share a feeling of understanding, mutual respect, similarity, and the like.
II. RESPONSES TO AVOID
a. JUDGMENTAL STATEMENTS – evaluating the thoughts, feelings, or actions of another.
b. PROBING STATEMENTS – demanding more information than the interviewee wishes to provide voluntarily.
c. HOSTILE STATEMENTS
d. FALSE ASSURANCE
III. EFFECTIVE RESPONSES
a. OPEN-ENDED QUESTIONS
b. SUMMARIZING
c. TRANSITIONAL PHRASES
d. CLARIFICATION RESPONSES
e. PARAPHRASING AND RESTATEMENT
f. EMPATHY AND UNDERSTANDING

TEST UTILITY
USES OF TESTS
Classification – assigning a person to one category rather than another.
Screening – quick and simple tests or procedures to identify persons who might have special characteristics or needs.
Placement – sorting of persons into different programs appropriate to their needs or skills.
Selection – a process whereby each person evaluated for a position is either accepted or rejected for that position.
Diagnosis and treatment planning – determination of abnormal behavior; classification using diagnostic criteria; a precursor to recommending treatment for personal distress.
Self-knowledge – understanding of an individual's intelligence and personality characteristics.
Program evaluation – systematic assessment and evaluation of educational and social programs.
Research – measuring variables to suggest correlational and causal relationships.

UTILITY - Usefulness or practical value of testing to improve efficiency. Factors that affect a test's utility:
o PSYCHOMETRIC SOUNDNESS – a test should be reliable and valid to be used. Reliability sets the limit for validity: the upper boundary of validity is reliability.
o COST – disadvantages, losses, or expenses, in both economic and non-economic terms, associated with testing or not testing.
- ECONOMIC COST – monetary expenses (personnel, test protocols, testing venues, etc.).
- NON-ECONOMIC COST – intangible losses (e.g., loss of trust from patrons due to unqualified personnel).
o BENEFIT – profits, gains, or advantages of testing or not testing.
- ECONOMIC BENEFIT – monetary benefits (a highly qualified, extroverted salesperson can reach quotas equivalent to financial gains).
- NON-ECONOMIC BENEFIT – increases in the quality and quantity of workers' performance.

UTILITY ANALYSIS - Family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment. Used for:
- Test comparison
- Assessment-tool comparison
- Addition of a test or assessment tool
- Determination of a non-testing alternative

APPROACHES TO UTILITY ANALYSIS
I. EXPECTANCY TABLES – show the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion.
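An expectancy table as described above can be tabulated with a short Python sketch; the score bands and outcomes below are hypothetical:

```python
def expectancy_table(scores, outcomes, bands):
    """For each (low, high) score interval, compute the percentage of
    people in that interval who later met the criterion (outcome True)."""
    table = {}
    for low, high in bands:
        hits = [ok for s, ok in zip(scores, outcomes) if low <= s < high]
        table[(low, high)] = 100 * sum(hits) / len(hits) if hits else None
    return table

scores  = [55, 62, 71, 78, 84, 91]
success = [False, False, True, True, True, True]
print(expectancy_table(scores, success, [(50, 70), (70, 100)]))
# {(50, 70): 0.0, (70, 100): 100.0}
```

A test user would read this as: applicants scoring 70 or above were far more likely to succeed on the criterion than those below 70.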
a. TAYLOR-RUSSELL TABLES - Statistical tables once extensively used to provide test users with an estimate of the extent to which including a particular test in the selection system would improve selection decisions.
i. SELECTION RATIO – the ratio of the number of people to be hired to the number of applicants.
ii. BASE RATE – the percentage of people hired under the existing system who are expected to be successful in their jobs.
b. NAYLOR-SHINE TABLES – indicate the difference between the mean criterion score of the newly selected group and that of the standard/unselected group.
II. BROGDEN-CRONBACH-GLESER (BCG) FORMULA – calculates the dollar amount of the utility gain resulting from the use of a particular selection instrument under specified conditions.
a. UTILITY GAIN – an estimate of the benefit of using a particular test.
b. PRODUCTIVITY GAIN – the estimated increase in work output.

TYPES OF INTERVIEWS
1. INTAKE INTERVIEW - entails detailed questioning about the presenting complaints.
2. DIAGNOSTIC INTERVIEW - assignment of a DSM diagnosis.
3. STRUCTURED - a predetermined, planned sequence of questions that the interviewer asks the client.
4. UNSTRUCTURED - no predetermined plan of questions.
5. SEMI-STRUCTURED - usually starts unstructured, followed by structured questions targeting a diagnostic classification.
6. MENTAL STATUS EXAMINATION (MSE) - a quick assessment of how the client/patient is functioning at the time of evaluation.
7. CRISIS INTERVIEW - usually for suicide or abuse cases.
8. CASE HISTORY INTERVIEW - discusses the developmental stages of the patient.

SOURCES OF ERROR IN THE INTERVIEW
I. INTERVIEW VALIDITY
a. HALO EFFECT
b. GENERAL STANDOUTISHNESS
c. CULTURAL DIFFERENCES
d. INTERVIEWER BIAS
II. INTERVIEW RELIABILITY
a. MEMORY AND HONESTY OF THE INTERVIEWEE
b. CLERICAL CAPABILITIES OF THE INTERVIEWER

MEASURING UNDERSTANDING
LEVEL 1 – little or no relationship to the interviewee's response.
LEVEL 2 – communicates superficial awareness of the meaning of a statement.
LEVEL 3 – interchangeable with the interviewee's statements.
LEVEL 4 – communicates empathy and adds minimal information/ideas.
LEVEL 5 – communicates empathy and adds major information/ideas.
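The Brogden-Cronbach-Gleser utility gain described above can be sketched as follows. The core benefit term (N x T x r_xy x SD_y x Zm) is the commonly cited form, but the exact cost term varies by source; treating it as (number of applicants) x (cost per applicant), and all of the input figures below, are assumptions for illustration:

```python
def bcg_utility_gain(n_hired, tenure, validity, sd_y, mean_z_hired,
                     cost_per_applicant, n_applicants):
    """Utility gain = N * T * r_xy * SD_y * Zm - total testing cost,
    where N = number hired, T = average tenure in years, r_xy = test
    validity, SD_y = dollar SD of job performance, and Zm = mean
    standardized test score of those hired."""
    benefit = n_hired * tenure * validity * sd_y * mean_z_hired
    cost = n_applicants * cost_per_applicant
    return benefit - cost

# 10 hires, 2-year tenure, validity .40, SD_y = $10,000, mean z = 1.0,
# 100 applicants tested at $25 each
print(bcg_utility_gain(10, 2, 0.40, 10_000, 1.0, 25, 100))  # 77500.0
```

Even with a modest validity of .40, the estimated gain here far exceeds the cost of testing, which is the typical argument a utility analysis is used to make.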