SESSION Measurement and Scaling: Fundamentals and Comparative Scaling SESSION: Outline • Measurement and Scaling • Primary Scales of Measurement • A Comparison of Scaling Techniques • Comparative Scaling Techniques Measurement and Scaling Measurement means assigning numbers or other symbols to characteristics of consumers according to certain pre-specified rules. Measure consumer perceptions, attitudes, preferences, and other relevant characteristics. The rules for assigning numbers should be standardized and applied uniformly. • Measurement is assigning range/ points across which respondents can not vary in their response Measurement and Scaling cont... • Scaling is an extension of measurement. • Scaling involves plotting consumers according to their responses e.g. Consider an attitude scale from 1 to 5. Each respondent is assigned a number from 1 to 5 Where: 1 = Extremely Unfavorable 5 = Extremely Favorable • Measurement is the actual assigning range from 1 to 5 to each respondent and Scaling is the placing of respondents on these numbers with respect to their attitude. Primary Scales of Measurement Nominal Scale • The numbers serve only as labels or tags for identifying and classifying objects. • When used for identification, there is a strict one-to-one correspondence between the numbers and the objects. • The numbers do not reflect the amount of the characteristic possessed by the objects. • The only permissible operation on the numbers in a nominal scale is counting. • Only a limited number of statistics, all of which are based on frequency counts, are permissible, e.g. percentages and mode. Primary Scales of Measurement Ordinal Scale • A ranking scale in which numbers are assigned to objects to indicate the relative extent to which the objects possess some characteristic and also categories of objects. • Can determine whether an object has more or less of a characteristic than some other object, but not how much more or less. • In addition to the counting operation allowable for nominal scale data, ordinal scales permit the use of statistics based on centiles, e.g., percentile, quartile, median. Primary Scales of Measurement Interval Scale • Numerically equal distances on the scale represent equal values in the characteristic being measured. • In Interval scale there is no absolute Zero scale point. • It permits comparison of the differences between objects. • Magnitude of differences can be captured and can comparing the differences in attributes. • Statistical techniques that may be used include all of those that can be applied to nominal and ordinal data, and in addition the arithmetic mean, standard deviation, and other statistics commonly used in marketing research. Nominal, Ordinal and Interval Scale • Country E refers to Croatia, assigned to denote the country and serve the purpose of identification and are not related to their football playing capabilities. - Nominal • Croatia, ranked 5, played better than Turkey, ranked 10. - Ordinal • The magnitude of the difference can be obtained only by looking at the points i.e. difference between 1,266 points and 1,033 points respectively. - Interval Nominal Team ID Ordinal Interval Rank Points A Spain 1 1,565 B Italy 2 1,339 C Germany 3 1,329 D Netherlands 4 1,295 E Croatia 5 1,266 F Brazil 6 1,252 G Argentina 7 1,230 H Czech Rep. 8 1,134 I Portugal 9 1,120 J Turkey 10 1,033 Primary Scales of Measurement Ratio Scale • Possesses all the properties of the nominal, ordinal, and interval scales: – Can identify or classify objects (nominal nature) – Rank the objects (ordinal nature) – Compare intervals or differences (interval nature) • It has an absolute zero point. Absolute zero represents a point on the scale where there is an absence of the given attribute. • All statistical techniques can be applied to ratio data. • Ratio scales determines the absolute rather than the relative strengths of the attribute. Illustration of Primary Scales of Measurement Nominal Scale Ordinal Scale Interval Scale Ratio Scale No. Store Preference Rankings Preference Ratings 1-7 $ spent last 3 months 1. Parisian 2. Macy’s 3. Kmart 4. Kohl’s 5. J.C. Penney 6. Neiman Marcus 7. Marshalls 8. Saks Fifth Avenue 9. Sears 10.Wal-Mart 7 2 8 3 1 5 9 6 4 10 5 7 4 6 7 5 4 5 6 2 0 200 0 100 250 35 0 100 0 10 Primary Scales of Measurement Scale Nominal Ordinal Interval Ratio Basic Characteristics Numbers identify & classify objects Common Examples Social Security nos., numbering of football players Nos. indicate the Quality rankings, relative positions rankings of teams of objects but not in a tournament the magnitude of differences between them Differences Temperature between objects (Fahrenheit) Zero point is fixed, Length, weight ratios of scale values can be compared Marketing Examples Brand numbers, store types Preference rankings, market position, social class Attitudes, opinions, index sales, income, & costs Primary Scales of Measurement (Source: Churchill & Brown, Basic MR, 6e, ISE, 2007 A Classification of Scaling Techniques Scaling Techniques Noncomparative Scales Comparative Scales Paired Comparison Rank Order Continuous Itemized Rating Scales Rating Scales Constant Sum Likert Semantic Differential Stapel A Comparison of Scaling Techniques • Comparative scales involve the direct comparison between options. e.g. What you prefer b/w Pepsi and Coca-Cola. Comparative scale data must be interpreted in relative terms and have only ordinal or rank order properties. • In noncomparative scales, each object is scaled independently of the others in the option set. The resulting data are generally assumed to be interval or ratio scaled. Comparative Scaling Techniques Paired Comparison Scaling • A respondent is presented with two objects and asked to select one according to some criterion. • The data obtained are ordinal in nature. • Paired comparison scaling is the most widely used comparative scaling technique. • Paired Comparison data can be converted to a Rank order data. Obtaining Shampoo Preferences Using Paired Comparisons Instructions: We are going to present you with ten pairs of shampoo brands. For each pair, please indicate which one of the two brands of shampoo you would prefer for personal use. Head & Pert Recording Form: Jhirmack Finesse Vidal Jhirmack aA 0 Sassoon Shoulders 0 1 Finesse 1a Vidal Sassoon 1 1 Head & Shoulders 0 0 0 Pert 1 1 0 1 Number of Times Preferredb 3 2 0 4 0 1 1 0 0 1 0 1 1 in a particular box means that the brand in that column was preferred over the brand in the corresponding row. A 0 means that the row brand was preferred over the column brand. bThe number of times a brand was preferred is obtained by summing the 1s in each column. Paired Comparison Selling The most common method of taste testing is paired comparison. The consumer is asked to sample two different products and select the one with the most appealing taste. The test is done in private and a minimum of 1,000 responses is considered an adequate sample. A blind taste test identifies the imagery, self-perception and brand reputation in the consumer’s mind to predict the probable purchasing decision. A paired comparison taste test Comparative Scaling Techniques Rank Order Scaling • Respondents are presented with several objects simultaneously and asked to order or rank them according to some criterion. • Measure preferences for brands and attributes. • Rank order scaling forces the respondent to discriminate among various brands. Preference for Toothpaste Brands Using Rank Order Scaling Instructions: Rank the various brands of toothpaste in order of preference. Begin by picking out the one brand that you like most and assign it a number 1. Then find the second most preferred brand and assign it a number 2. Continue this procedure until you have ranked all the brands of toothpaste in order of preference. The least preferred brand should be assigned a rank of 10. No two brands should receive the same rank number. Preference for Toothpaste Brands Using Rank Order Scaling Form Brand Rank Order 1. Crest _________ 2. Colgate _________ 3. Aim _________ 4. Gleem _________ 5. Macleans _________ 6. Ultra Brite _________ 7. Close Up _________ 8. Pepsodent _________ 9. Plus White _________ 10. Stripe _________ Comparative Scaling Techniques Constant Sum Scaling • Respondents allocate a constant sum of units, such as 100 points to attributes of a product to reflect their importance. • If an attribute is unimportant, the respondent assigns it zero points. • If an attribute is twice as important as some other attribute, it receives twice as many points. • The sum of all the points is 100. Hence, the name of the scale. Importance of Bathing Soap Attributes Using a Constant Sum Scale Instructions On the next slide, there are eight attributes of bathing soaps. Please allocate 100 points among the attributes so that your allocation reflects the relative importance you attach to each attribute. The more points an attribute receives, the more important the attribute is. If an attribute is not at all important, assign it zero points. If an attribute is twice as important as some other attribute, it should receive twice as many points. Importance of Bathing Soap Attributes Using a Constant Sum Scale Form Average Responses of Three Segments Attribute 1. Mildness 2. Lather 3. Shrinkage 4. Price 5. Fragrance 6. Packaging 7. Moisturizing 8. Cleaning Power Sum Segment I 8 2 3 53 9 7 5 13 100 Segment II 2 4 9 17 0 5 3 60 100 Segment III 4 17 7 9 19 9 20 15 100 SESSION Measurement and Scaling: Non comparative Scaling Techniques SESSION: Outline • • Non comparative Scaling Techniques • Continuous Rating Scale • Itemized Rating Scale Non comparative Itemized Rating Scale Decisions Noncomparative Scaling Techniques • Respondents evaluate only one object at a time. • Noncomparative techniques consist of: – Continuous and – Itemized rating scales. Continuous Rating Scale Respondents rate the objects by placing a mark at the appropriate position on a line that runs from one extreme of the criterion variable to the other. The form of the continuous scale may vary considerably. How would you rate Sears as a department store? Version 1 Probably the worst - - - - - - -I - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Probably the best Version 2 Probably the worst - - - - - - -I - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -- - Probably the best 0 10 20 30 40 50 60 70 80 90 100 Version 3 Very bad Neither good Very good nor bad Probably the worst - - - - - - -I - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - -Probably the best 0 10 20 30 40 50 60 70 80 90 100 RATE: Rapid Analysis and Testing Environment A relatively new research tool, the perception analyzer, provides continuous measurement of “gut reaction.” A group of up to 400 respondents are presented with TV or radio spots or advertising copy. The measuring device consists of a dial that contains a 100-point range. Each participant is given a dial and instructed to continuously record his or her reaction to the material being tested. As the respondents turn the dials, the information is fed to a computer, which tabulates second-by-second response profiles. As the results are recorded by the computer, they are superimposed on a video screen, enabling the researcher to view the respondents' scores immediately. The responses are also stored in a permanent data file for use in further analysis. The response scores can be broken down by categories, such as age, income, sex, or product usage. Itemized Rating Scales • The respondents are provided with a scale that has a number or brief description associated with each category. • The categories are ordered in terms of scale position, and the respondents are required to select the specified category that best describes the object being rated. • The commonly used itemized rating scales are the Likert, semantic differential, and Stapel scales. Likert Scale The Likert scale requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects. Strongly disagree Disagree Neither agree nor disagree Agree Strongly agree 1. Sears sells high quality merchandise. 1 2X 3 4 5 2. Sears has poor in-store service. 1 2X 3 4 5 3. I like to shop at Sears. 1 2 3X 4 5 • The analysis can be conducted on an item-by-item basis (profile analysis), or a total (summated) score can be calculated. Semantic Differential Scale • • • • • • • The semantic differential is a seven-point rating scale with end points associated with bipolar labels that have semantic meaning. SEARS IS: Powerful --:--:--:--:-X-:--:--: Weak Unreliable --:--:--:--:--:-X-:--: Reliable Modern --:--:--:--:--:--:-X-: Old-fashioned The negative adjective or phrase sometimes appears at the left side of the scale and sometimes at the right. This controls the tendency of some respondents, particularly those with very positive or very negative attitudes, to mark the right- or left-hand sides without reading the labels. Individual items on a semantic differential scale may be scored on either a -3 to +3 or a 1 to 7 scale. Mark the blank that best indicates how accurately one or more words/ adjectives describes the characteristics of an object (i.e. aspect to be studied – co., brand etc.) Profile Analysis is used here – each profile/bipolar words are separately analyzed. Means on each rating scale are calculated and compared by plotting the graph. Determines the overall differences and/or similarities among the objects/ respondents. Assess difference across segment of respondents – compare mean responses of different segments. A Semantic Differential Scale for Measuring Self- Concepts, Person Concepts, and Product Concepts 1) Rugged :---:---:---:---:---:---:---: Delicate 2) Excitable :---:---:---:---:---:---:---: Calm 3) Uncomfortable :---:---:---:---:---:---:---: Comfortable 4) Dominating :---:---:---:---:---:---:---: Submissive 5) Thrifty :---:---:---:---:---:---:---: Indulgent 6) Pleasant :---:---:---:---:---:---:---: Unpleasant 7) Contemporary :---:---:---:---:---:---:---: Obsolete 8) Organized :---:---:---:---:---:---:---: Unorganized 9) Rational :---:---:---:---:---:---:---: Emotional 10) Youthful :---:---:---:---:---:---:---: Mature 11) Formal :---:---:---:---:---:---:---: Informal 12) Orthodox :---:---:---:---:---:---:---: Liberal 13) Complex :---:---:---:---:---:---:---: Simple 14) Colorless :---:---:---:---:---:---:---: Colorful 15) Modest :---:---:---:---:---:---:---: Vain Stapel Scale The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero). This scale is usually presented vertically. SEARS +5 +4 +3 +2 +1 HIGH QUALITY -1 -2 -3 -4X -5 +5 +4 +3 +2X +1 POOR SERVICE -1 -2 -3 -4 -5 • Evaluate how accurately each word/ phrase describes the characteristics. Basic Noncomparative Scales Scale Basic Characteristics Examples Advantages Disadvantages Place a mark on a continuous line Reaction to TV commercials Easy to construct Scoring can be cumbersome unless computerized Degrees of agreement on a 1 (strongly disagree) to 5 (strongly agree) scale Measurement of attitudes Easy to construct, administer, and understand More time - consuming Semantic Differential Seven - point scale with bipolar labels Brand, product, and company images Versatile Controversy as to whether the data are interval Stapel Scale Unipolar ten - point scale, - 5 to +5, without a neutral point (zero) Measurement of attitudes and images Easy to construct, administer over telephone Confusing and difficult to apply Continuous Rating Scale Itemized Rating Scales Likert Scale (As respondents have to read statements in detail) Some Commonly Used Scales in Marketing CONSTRUCT SCALE DESCRIPTORS Attitude Very Bad Bad Neither bad nor Good Good Very Good Importance Not at All important Not important Neutral Important Very Important Satisfaction Very Dissatisfied Dissatisfied Neither dissatisfied nor satisfied Satisfied Very Satisfied Purchase Intent Definitely Probably will Not Buy will Not Buy Might or Might Not Buy Probably Will Buy Definitely Will Buy Purchase Frequency Never Sometimes Often Very Often Rarely Non-comparative Itemized Rating Scale decisions Six major decisions when constructing noncomparative itemized rating scales: • Number of scale categories to use • Balanced vs. Unbalanced scale • Odd or Even number of categories • Forced vs. Non-forced choice • Nature and Degree of Verbal description • Physical form of the Scale Number of scale categories • Trade-offs required: greater the number of scale categories, finer the discrimination among respondents/ objects; whereas respondents have a limit of handling categories. • Traditional guidelines: – 7±2 = 5 or 9 categories are ideal. – If the respondent is knowledgeable about the study, a larger scale categories give them better choice for discrimination, whereas if respondent are not much aware then fewer categories should be used to avoid confusion. – Nature of objects: in some objects finer discrimination is difficult – small scale categories should be used e.g. attitude or choice behavioral studies. – Mode of data collection: if done through telephonic survey – lesser categories are suitable whereas use maximum categories in personal surveys. – Type of data analysis: if higher level of data analysis is needed, 7 or more categories are recommended (As size of the Correlation coefficient – measure of relationship – is influenced by the number of scale categories – Coefficient decreases with reduction in scale categories - further impact on other higher level analysis). Balanced vs. Unbalanced Scales • Balanced scale: number of favourable and unfavourable categories are equal. Balanced Scale Unbalanced Scale Surfing the Internet is Surfing the Internet is ____ Extremely Good ____ Extremely Good ____ Very Good ____ Very Good ____ Good ____ Good ____ Bad ____ Somewhat Good ____ Very Bad ____ Bad ____ Extremely Bad ____ Very Bad Odd vs. Even number of Scales & Forced vs. Non forced scales • Odd number of categories: middle scale position is designated as neutral/ impartial. e.g. Likert scale is the most balanced rating scale proposed with odd number of categories (5) and a neutral point. • Odd categories are useful when possibility of neutral responses is there. • When respondent is forced to respond on categories and no response option is not there – even number of scale categories are used. Nature and Degree of Verbal Description • Scale categories may have verbal, numerical or pictorial descriptions. • It is recommended – to have verbal description for each category to reduce scale ambiguity. Physical Form or Configuration Rating Scale Configurations Cheer detergent is: 1) Very harsh --- 2) Very harsh 1 --2 --- --- --- --- --- Very gentle 3 4 5 6 7 Very gentle 3) . Very harsh . Cheer . . Neither harsh nor gentle . . . Very gentle 4) ____ ____ ____ ____ Very Harsh Somewhat Neither harsh harsh Harsh nor gentle 5) -3 Very harsh -2 -1 0 Neither harsh nor gentle ____ Somewhat gentle +1 ____ Gentle ____ Very gentle +2 +3 Very gentle Some Unique Rating Scale Configurations Thermometer Scale Instructions: Please indicate how much you like McDonald’s hamburgers by coloring in the thermometer. Start at the bottom and color up to the temperature level that best indicates how strong your preference is. Form: Like very much Dislike very much 100 75 50 25 0 Smiling Face Scale Instructions: Please point to the face that shows how much you like the Barbie Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much, you would point to Face 5. Form: 1 2 3 4 5 Measurement Accuracy • Measurement is not the complete representation of the characteristics of studied object but an observation of it. So, it may go wrong. This causes Measurement error. • This results in the measurement or observed score being different from the true score of the characteristic (e.g. behaviour) being measured. The True Score Model is a mathematical model that provides the framework for understanding the accuracy of measurement. X O X T X S X R where XO = the observed score or measurement XT = the true score of the characteristic XS = systematic error XR = random error Measurement Accuracy cont… Two possible errors may occur: • Systematic/Constant errors: affects the measurement always in a constant way. It represents the stable factors that affect the observed score in the same way every time. • Random errors: factors that affect the observed score in different ways each time based on situations. e.g. noise/ distractions, emotions, fatigue etc. Potential Sources of Error in Measurement • Stable characteristics of the individual that influence the test score mostly: intelligence, social background and education. • Short-term or transient personal factors: health, emotions and fatigue. • Situational factors: presence of other people, noise, and distractions. • Sampling of items included in the scale: addition, deletion, or changes in the scale items. • Lack of clarity of the scale: inadequate instructions on the items. • Mechanical factors: poor printing, overcrowding items in the questionnaire, and poor design. • Administration of the scale: differences among interviewers. • Analysis factors: fault in scoring and using the statistical analysis. Measurement Accuracy cont… • In the long run, the expected mean of measurement errors should be zero. When the error term is zero, the observed score is the true score: XO = XT + 0; XO = XT • Therefore, if an instrument is reliable, the observed scores should be fairly consistent and stable throughout repeated measures. Many popular reliability estimate methods such as Cronbach’s coefficient alpha, test-retest are based upon this rationale. Reliability V/S Validity • Reliability means the extent to which scaling results are free from experimental (random) errors i.e. XR => extent to which a scale produces consistent scores/ results, if measurements are repeated. (factor of ‘trust’) • Systematic errors do not have adverse impact on reliability – as they effect measurement in a constant way and thus do not lead to inconsistency. • Random errors produces inconsistency – leads to lower reliability. • Validity means that the data obtained must be unbiased and relevant to describe the characteristics being measured – extent to which differences in observed scale scores reflect true differences among objects/ respondents response on the characteristics being observed. [ XO = XT ] • Reliability is a necessary, but not sufficient, condition for validity – because Systematic error may still be there. [ XO = XT + XS ] Reliability V/S Validity • Reliability refers to the repeatability of findings. If the study were to be done a second time, would it yield the same results? If so, the data are reliable. • If more than one person is observing behavior or some event, all observers should agree on what is being recorded in order to claim that the data are reliable. E.g. When people take a vocabulary test two times, their scores on the two occasions should be very similar. Reliability V/S Validity • Validity refers to the credibility or believability of the research. Are the findings genuine? • There are two aspects of validity: • Internal validity - the instruments or procedures used in the research measured what they were supposed to measure. Example: As part of a stress experiment, people are shown photos of war killings. After the study, they are asked how the pictures made them feel, and they respond that the pictures were very upsetting. In this study, the photos have good internal validity as stress producers. • External validity - the results can be generalized beyond the immediate study. It should also apply to people beyond the sample in the study. Measurement Accuracy Scale Evaluation Validity Reliability Test/ Retest Alternative Forms Internal Consistency Content Criterion convergent Construct discriminant Test – Retest Reliability • Suppose, we measure Job Satisfaction among sales executives at the first week and then measure it again two weeks later. Condition: • If measure of job satisfaction is ‘reliable’, the two scores must be highly correlated. • If not, the measure is ‘unstable’ and unreliable. • This is Known as test – retest reliability. Limitations of Test – Retest Reliability 1. It is sensitive to the time interval of two measurements i.e. longer the time interval, lower the reliability. 2. The initial measurement may alter the characteristic being measured i.e. measuring attitude towards low-fat milk may cause them to become more health conscious and their later response may be biased due to the positive attitude towards low fat milk. 3. Respondent may attempt to remember responses and may give a biased/ polished response next time. Alternative – Forms Reliability • Two equivalent form of scales are constructed. • Same respondent is measured at two different time intervals (usually 2-4 weeks). • Scores are correlated to assess the reliability. E.g. Highly satisfied – Slightly satisfied – Neither satisfied nor dissatisfied – Slightly Dissatisfied – Highly Dissatisfied and Very Satisfied – Satisfied – Neither satisfied nor dissatisfied – Dissatisfied – Very Dissatisfied Limitation: • It is time consuming and expensive to construct equivalent type of scales i.e. equivalent w.r.t. the content in the scale and they should have the same meaning, variance and intercorrelations. Internal Consistency Reliability • Internal consistency reliability is a measure of how well a test addresses different constructs and delivers reliable scores. For example, Store image is a multidimensional construct that includes: quality of merchandise, variety, assortments, service, price offers, convenience of location, store layout, and credit and billing facilities. • The internal consistency reliability test provides a measure that each of these particular constructs are measured correctly and reliably. measure of how well a test addresses different constructs and delivers reliable scores. Measure 1: SPLIT-HALVES TEST • For example, a questionnaire to measure attitude could be divided into odd and even questions. The results from both halves are statistically analysed, and if there is weak correlation between the two, then there is a reliability problem with the test. The division of the question into two sets must be random. Internal Consistency Reliability • Each item measures some aspect of the belief construct i.e. 1-5 (strongly agree & strongly disagree are among five items in 5 point scale). • Items used in the scale should be consistent to measure all aspects of characteristics i.e. behaviour/attitude etc. Measure 2: Cronbach’s Alpha Cronbach's alpha Internal consistency α ≥ .9 Excellent • Coefficient alpha/ Cronbach’s Alpha is used to measure this reliability and widely used. .9 > α ≥ .8 Good .8 > α ≥ .7 Acceptable • This coefficient varies b/w 0 to 1 and value of ≤ 0.6 indicates unsatisfactory internal consistency reliability. .7 > α ≥ .6 Questionable .6 > α ≥ .5 Poor .5 > α Unacceptable • Thus, this measure focuses on internal consistency of scale points in the scale type. For example: a series of questions might ask the respondent to rate their response between one and five. Cronbach's Alpha gives a score of between zero and one, with 0.7 generally accepted as a sign of acceptable reliability. The test also takes into account both the size of the sample and the number of potential responses. A 40-question test with possible ratings of 1 – 5 is seen as having more accuracy than a tenquestion test with three possible levels of response. Validity Perfect validity requires that there be no measurement error. • Content or Face validity is a subjective but systematic evaluation of how well the content of a scale represents the measurement and whether the scale items adequately cover the entire domain of the construct being measured. E.g. a scale designed to measure store image would be inadequate, if it omitted any major dimension like quality, variety or cleanliness etc. • Criterion validity reflects whether a scale performs as expected in relation to the variables selected (criterion variables). It may include demographic and psychographic characteristics; attitudinal and behavioural measures. Based on the time period involved, it may take two forms: – Concurrent validity – Predictive validity Criterion Validity • Predictive Validity: Collects data on the scale at one time and data on criterion variables at a future time. E.g. attitude and intensions towards a commodity brand captured by survey can be used to predict future purchases of a scanner panel. Then their future purchases are tracked by scanner data (observation). The predicted and the actual purchases are compared to assess the predictive validity of attitudinal/ intension scale. • Concurrent Validity: this validity is assessed when data on the scale is evaluated and at the same time data on the criterion variables are collected and then both are compared. Construct Validity • Most difficult type of Validity to establish. • Theoretical model is constructed based on the variables to be measured. • This been tested out with further research to either reject or accept or modify the model. • Construct validity defines how a well a test or experiment measures up to its claims. • Construct validity is a device used almost exclusively in social sciences, psychology and education. E.g. research might design to measure whether an educational program increases artistic ability amongst pre-school children. Construct Validity Example: • Construct validity is a measure of how well the test measures the construct. • Construct validity is valuable in social sciences, where there is a lot of subjectivity to concepts. Often, there is no accepted unit of measurement for constructs and even fairly well known ones, such as IQ, are open to debate. CONVERGENT AND DISCRIMINANT VALIDITY • The basic difference between convergent and discriminant validity is that Convergent validity tests whether constructs that should be related, are related. Discriminant validity tests whether believed unrelated constructs are, unrelated. E.g. Imagine that a researcher wants to measure self-esteem, but also knows that the other four constructs are related to self-esteem, and have some overlap. The ultimate goal is to attempt to isolate self-esteem. CONVERGENT AND DISCRIMINANT VALIDITY • Convergent validity would test that the four other constructs are, related to self-esteem in the study. • Discriminant validity would ensure that, in the study, the non-overlapping factors do not overlap. For example, self esteem and intelligence (not in the figure) should not relate. HOW TO MEASURE CONSTRUCT VARIABILITY? • For major and extensive research, especially in education and language studies, most researchers test the construct validity before the main research. • These Pilot Studies establish the strength of their research and allow them to make any adjustments. • The other option is an intervention study, where a group with low scores in the construct is tested, taught the construct, and then remeasured. E.g. testing the knowledge level of a subject to the students. – If there is a significant difference pre and post-test, usually analyzed with simple statistical tests, then this proves good construct validity.