Validity in Research
Dr Ayaz Afsar

Introduction
There are many different types of validity and reliability. Threats to validity and reliability can never be erased completely; rather, the effects of these threats can be attenuated by attention to validity and reliability throughout a piece of research. I will discuss validity and reliability in quantitative and qualitative, naturalistic research. I suggest that both of these terms can be applied to these two types of research, though how validity and reliability are addressed in the two approaches varies. Finally, validity and reliability are addressed in relation to different instruments for data collection. It is suggested that reliability is a necessary but insufficient condition for validity in research; reliability is a necessary precondition of validity, and validity may be a sufficient but not necessary condition for reliability.

Defining validity
Validity is an important key to effective research. If a piece of research is invalid then it is worthless. Validity is thus a requirement for both quantitative and qualitative/naturalistic research. While earlier versions of validity were based on the view that it was essentially a demonstration that a particular instrument in fact measures what it purports to measure, more recently validity has taken many forms. For example, in qualitative data validity might be addressed through the honesty, depth, richness and scope of the data achieved, the participants approached, the extent of triangulation and the disinterestedness or objectivity of the researcher (Winter 2000). In quantitative data, validity might be improved through careful sampling, appropriate instrumentation and appropriate statistical treatments of the data.

It is impossible for research to be 100 per cent valid; that is the optimism of perfection. Quantitative research possesses a measure of standard error which is inbuilt and which has to be acknowledged.
In qualitative data the subjectivity of respondents, their opinions, attitudes and perspectives together contribute to a degree of bias. Validity, then, should be seen as a matter of degree rather than as an absolute state (Gronlund 1981). Hence at best we strive to minimize invalidity and maximize validity. There are several different kinds of validity.

Kinds of Validity
• content validity
• criterion-related validity
• construct validity
• internal validity
• external validity
• concurrent validity
• face validity
• jury validity
• predictive validity
• consequential validity
• systemic validity
• catalytic validity
• ecological validity
• cultural validity
• descriptive validity
• interpretive validity
• theoretical validity
• evaluative validity.

It is not my intention to discuss all of these terms in depth. Rather, the main types of validity will be addressed. The argument will be made that, while some of these terms are more comfortably the preserve of quantitative methodologies, this is not exclusively the case. Indeed, validity is the touchstone of all types of educational research. That said, it is important that validity in different research traditions is faithful to those traditions; it would be absurd to declare a piece of research invalid if it were not striving to meet certain kinds of validity, e.g. generalizability, replicability and controllability. Hence the researcher will need to locate discussions of validity within the research paradigm that is being used. This is not to suggest, however, that research should be paradigm-bound; that is a recipe for stagnation and conservatism.
Nevertheless, validity must be faithful to its premises and positivist research has to be faithful to positivist principles, for example:
• controllability
• replicability
• predictability
• the derivation of laws and universal statements of behaviour
• context-freedom
• fragmentation and atomization of research
• randomization of samples
• observability.

Naturalistic research
By way of contrast, naturalistic research has several principles (Lincoln and Guba 1985; Bogdan and Biklen 1992):
• The natural setting is the principal source of data.
• Context-boundedness and ‘thick description’ are important.
• Data are socially situated, and socially and culturally saturated.
• The researcher is part of the researched world.
• As we live in an already interpreted world, a doubly hermeneutic exercise (Giddens 1979) is necessary to understand others’ understandings of the world; the paradox here is that the most sufficiently complex instrument to understand human life is another human, but that this risks human error in all its forms.
• There should be holism in the research.
• The researcher, rather than a research tool, is the key instrument of research.
• The data are descriptive.
• There is a concern for processes rather than simply with outcomes.
• Data are analysed inductively rather than using a priori categories.
• Data are presented in terms of the respondents rather than researchers.
• Seeing and reporting the situation should be through the eyes of participants, from the native’s point of view (Geertz 1974).
• Respondent validation is important.
• Catching meaning and intention is essential.

Indeed Maxwell (1992) argues that qualitative researchers need to be cautious not to be working within the agenda of the positivists in arguing for the need for research to demonstrate concurrent, predictive, convergent, criterion-related, internal and external validity.
The claim is made (Agar 1993) that, in qualitative data collection, the intensive personal involvement and in-depth responses of individuals secure a sufficient level of validity and reliability.

Maxwell (1992) argues for five kinds of validity in qualitative methods that explore his notion of ‘understanding’:
• Descriptive validity (the factual accuracy of the account, that it is not made up, selective or distorted): in this respect validity subsumes reliability.
• Interpretive validity (the ability of the research to catch the meaning, interpretations, terms and intentions that situations and events, i.e. data, have for the participants/subjects themselves, in their terms).
• Theoretical validity (the theoretical constructions that the researcher brings to the research, including those of the researched).
• Generalizability (the view that the theory generated may be useful in understanding other similar situations).
• Evaluative validity (the application of an evaluative, judgemental framework to that which is being researched, rather than a descriptive, explanatory or interpretive one).

Both qualitative and quantitative methods can address internal and external validity.

Internal validity
Internal validity seeks to demonstrate that the explanation of a particular event, issue or set of data which a piece of research provides can actually be sustained by the data. The findings must describe accurately the phenomena being researched. In ethnographic research internal validity can be addressed in several ways:
• using low-inference descriptors
• using multiple researchers
• using participant researchers
• using peer examination of data
• using mechanical means to record, store and retrieve data.
In ethnographic, qualitative research there are several overriding kinds of internal validity (LeCompte and Preissle 1993: 323–4):
• confidence in the data
• the authenticity of the data (the ability of the research to report a situation through the eyes of the participants)
• the cogency of the data
• the soundness of the research design
• the credibility of the data
• the auditability of the data
• the dependability of the data
• the confirmability of the data.

External validity
External validity refers to the degree to which the results can be generalized to the wider population, cases or situations. The issue of generalization is problematical. For positivist researchers generalizability is a sine qua non, while this is attenuated in naturalistic research. For positivists variables have to be isolated and controlled, and samples randomized, while for ethnographers human behaviour is infinitely complex, irreducible, socially situated and unique. Generalizability in naturalistic research is interpreted as comparability and transferability. Schofield (1990: 200) suggests that it is important in qualitative research to provide a clear, detailed and in-depth description so that others can decide the extent to which findings from one piece of research are generalizable to another situation, i.e. to address the twin issues of comparability and translatability.

Lincoln and Guba (1985: 316) caution the naturalistic researcher against this; they argue that it is not the researcher’s task to provide an index of transferability; rather, they suggest, researchers should provide sufficiently rich data for the readers and users of research to determine whether transferability is possible. In this respect transferability requires thick description. Positivist researchers are more concerned to derive universal statements of general social processes rather than to provide accounts of the degree of commonality between various social settings (e.g.
schools and classrooms).

In naturalistic research threats to external validity include (Lincoln and Guba 1985: 189, 300):
• selection effects: where constructs selected in fact are only relevant to a certain group
• setting effects: where the results are largely a function of their context
• history effects: where the situations have been arrived at by unique circumstances and, therefore, are not comparable
• construct effects: where the constructs being used are peculiar to a certain group.

Content validity
To demonstrate this form of validity the instrument must show that it fairly and comprehensively covers the domain or items that it purports to cover. It is unlikely that each issue will be able to be addressed in its entirety simply because of the time available or respondents’ motivation to complete, for example, a long questionnaire. If this is the case, then the researcher must ensure that the elements of the main issue to be covered in the research are both a fair representation of the wider issue under investigation and that the elements chosen for the research sample are themselves addressed in depth and breadth. Careful sampling of items is required to ensure their representativeness.

For example, if the researcher wished to see how well a group of students could spell 1,000 words in French but decided to have a sample of only 50 words for the spelling test, then that test would have to ensure that it represented the range of spellings in the 1,000 words, perhaps by ensuring that the spelling rules had all been included or that possible spelling errors had been covered in the test in the proportions in which they occurred in the 1,000 words.

Construct validity
A construct is an abstract; this separates it from the previous types of validity which dealt in actualities – defined content. In this type of validity agreement is sought on the ‘operationalized’ forms of a construct, clarifying what we mean when we use this construct.
Hence in this form of validity the articulation of the construct is important; is the researcher’s understanding of this construct similar to that which is generally accepted to be the construct?

For example, let us say that the researcher wished to assess a child’s intelligence (assuming, for the sake of this example, that it is a unitary quality). The researcher could say that he or she construed intelligence to be demonstrated in the ability to sharpen a pencil. How acceptable a construction of intelligence is this? Is not intelligence something else (e.g. that which is demonstrated by a high result in an intelligence test)? To establish construct validity the researcher would need to be assured that his or her construction of a particular issue agreed with other constructions of the same underlying issue, e.g. intelligence, creativity, anxiety, motivation. …

In qualitative/ethnographic research construct validity must demonstrate that the categories that the researchers are using are meaningful to the participants themselves, i.e. that they reflect the way in which the participants actually experience and construe the situations in the research, that they see the situation through the actors’ eyes.

Ecological validity
In quantitative, positivist research variables are frequently isolated, controlled and manipulated in contrived settings. For qualitative, naturalistic research a fundamental premise is that the researcher deliberately does not try to manipulate variables or conditions, that the situations in the research occur naturally. The intention here is to give accurate portrayals of the realities of social situations in their own terms, in their natural or conventional settings. In education, ecological validity is particularly important and useful in charting how policies are actually happening ‘at the chalk face’.
For ecological validity to be demonstrated it is important to include and address in the research as many characteristics in, and factors of, a given situation as possible. The difficulty is that the more characteristics are included and described, the more difficult it is to abide by central ethical tenets of much research: non-traceability, anonymity and non-identifiability.

Cultural validity
A type of validity related to ecological validity is cultural validity (Morgan 1999). This is particularly an issue in cross-cultural, intercultural and comparative kinds of research, where the intention is to shape research so that it is appropriate to the culture of the researched, and where the researcher and the researched are members of different cultures. Cultural validity is defined as ‘the degree to which a study is appropriate to the cultural setting where research is to be carried out’ (Joy 2003: 1). Cultural validity applies at all stages of the research, and affects its planning, implementation and dissemination. It involves a degree of sensitivity to the participants, cultures and circumstances being studied.

Questions the researchers may face
• Is the research question understandable and of importance to the target group?
• Is the researcher the appropriate person to conduct the research?
• Are the sources of the theories that the research is based on appropriate for the target culture?
• How do researchers in the target culture deal with the issues related to the research question (including their method and findings)?
• Are appropriate gatekeepers and informants chosen?
• Are the research design and research instruments ethical and appropriate according to the standards of the target culture?
• How do members of the target culture define the salient terms of the research?
• Are documents and other information translated in a culturally appropriate way?
• Are the possible results of the research of potential value and benefit to the target culture?
• Does interpretation of the results include the opinions and views of members of the target culture?
• Are the results made available to members of the target culture for review and comment?
• Does the researcher accurately and fairly communicate the results in their cultural context to people who are not members of the target culture?

Catalytic validity
Catalytic validity embraces the paradigm of critical theory. Put neutrally, catalytic validity simply strives to ensure that research leads to action. However, the story does not end there, for discussions of catalytic validity are substantive; like critical theory, catalytic validity suggests an agenda. The agenda for catalytic validity is to help participants to understand their worlds in order to transform them. The agenda is explicitly political, for catalytic validity suggests the need to expose whose definitions of the situation are operating in the situation.

Lincoln and Guba (1986) suggest that the criterion of ‘fairness’ should be applied to research, meaning that it should not only augment and improve the participants’ experience of the world, but also improve the empowerment of the participants.

Catalytic validity – a major feature in feminist research which needs to permeate all research – requires solidarity in the participants, and an ability of the research to promote emancipation, autonomy and freedom within a just, egalitarian and democratic society and to reveal the distortions, ideological deformations and limitations that reside in research, communication and social structures (see also LeCompte and Preissle 1993). Validity, it is argued (Mishler 1990; Scheurich 1996), is no longer an ahistorical given, but contestable, suggesting that the definitions of valid research reside in the academic communities of the powerful.
Catalytic validity reasserts the centrality of ethics in the research process, for it requires researchers to interrogate their allegiances, responsibilities and self-interestedness (Burgess 1989).

Consequential validity
• Partially related to catalytic validity is consequential validity, which argues that the ways in which research data are used (the consequences of the research) are in keeping with the capability or intentions of the research, i.e. the consequences of the research do not exceed the capability of the research and the action-related consequences of the research are both legitimate and fulfilled.
• Clearly, once the research is in the public domain, the researcher has little or no control over the way in which it is used.
• However, and this is often a political matter, research should not be used in ways in which it was not intended to be used, for example by exceeding the capability of the research data to make claims, by acting on the research in ways that the research does not support (e.g. by using the research for illegitimate epistemic support), by making illegitimate claims by using the research in unacceptable ways (e.g. by selection, distortion) and by not acting on the research in ways that were agreed, i.e. errors of omission and commission.

A clear example of consequential validity is formative assessment. This is concerned with the extent to which students improve as a result of feedback given, hence if there is insufficient feedback for students to improve, or if students are unable to improve as a result of – a consequence of – the feedback, then the formative assessment has little consequential validity.

Criterion-related validity
This form of validity endeavours to relate the results of one particular instrument to another external criterion. Within this type of validity there are two principal forms: predictive validity and concurrent validity.
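Both forms rest on correlating scores from the instrument with scores on the external criterion. As a minimal sketch, assuming hypothetical scores for six students and taking the Pearson coefficient as one common choice of correlation (the slides do not prescribe a particular statistic), the check might look like this. The names `pearson_r`, `new_test` and `criterion` are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores does not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six students (illustrative, not taken from any study).
new_test = [12, 15, 9, 20, 17, 11]    # instrument being validated
criterion = [48, 55, 40, 70, 66, 45]  # external criterion, e.g. a later examination

r = pearson_r(new_test, criterion)
print(f"r = {r:.2f}")
```

A coefficient close to 1 would be read as support for criterion-related validity. What counts as a high enough correlation is a judgement that depends on the field and on the instrument.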
Predictive validity is achieved if the data acquired at the first round of research correlate highly with data acquired at a future date. A variation on this theme is encountered in the notion of concurrent validity. To demonstrate this form of validity the data gathered from using one instrument must correlate highly with data gathered from using another instrument. For example, suppose it was decided to research a student’s problem-solving ability. The researcher might observe the student working on a problem, or might talk to the student about how s/he is tackling the problem, or might ask the student to write down how s/he tackled the problem.

Here the researcher has three different data-collecting instruments: observation, interview and documentation respectively. If the results all agreed (concurred) that, according to given criteria for problem-solving ability, the student demonstrated a good ability to solve a problem, then the researcher would be able to say with greater confidence (validity) that the student was good at problem-solving than if the researcher had arrived at that judgement simply from using one instrument. An important partner to concurrent validity, which is also a bridge into later discussions of reliability, is triangulation.

Triangulation
• Triangulation may be defined as the use of two or more methods of data collection in the study of some aspect of human behaviour.
• The use of multiple methods, or the multi-method approach as it is sometimes called, contrasts with the ubiquitous but generally more vulnerable single-method approach that characterizes so much of research in the social sciences.
• In its original and literal sense, triangulation is a technique of physical measurement: maritime navigators, military strategists and surveyors, for example, use (or used to use) several locational markers in their endeavours to pinpoint a single spot or objective.
• By analogy, triangular techniques in the social sciences attempt to map out, or explain more fully, the richness and complexity of human behaviour by studying it from more than one standpoint and, in so doing, by making use of both quantitative and qualitative data.
• Triangulation is a powerful way of demonstrating concurrent validity, particularly in qualitative research (Campbell and Fiske 1959).

The advantages of the multi-method approach in social research are manifold and I will examine two of them. First, whereas the single observation in fields such as medicine, chemistry and physics normally yields sufficient and unambiguous information on selected phenomena, it provides only a limited view of the complexity of human behaviour and of situations in which human beings interact. It has been observed that as research methods act as filters through which the environment is selectively experienced, they are never atheoretical or neutral in representing the world of experience (Smith 1975). Exclusive reliance on one method, therefore, may bias or distort the researcher’s picture of the particular slice of reality being investigated. The researcher needs to be confident that the data generated are not simply artefacts of one specific method of collection (Lin 1976).

I come now to a second advantage: some theorists have been sharply critical of the limited use to which existing methods of inquiry in the social sciences have been put.
The use of triangular techniques, it is argued, will help to overcome the problem of ‘method-boundedness’, as it has been termed; indeed Gorard and Taylor (2004) demonstrate the value of combining qualitative and quantitative methods.

Types of triangulation and their characteristics
We have just seen how triangulation is characterized by a multi-method approach to a problem in contrast to a single-method approach. Denzin (1970b) has, however, extended this view of triangulation to take in several other types as well as the multi-method kind, which he terms ‘methodological triangulation’:
• Time triangulation: this type attempts to take into consideration the factors of change and process by utilizing cross-sectional and longitudinal designs.
• Space triangulation: this type attempts to overcome the parochialism of studies conducted in the same country or within the same subculture by making use of cross-cultural techniques.
• Combined levels of triangulation: this type uses more than one level of analysis from the three principal levels used in the social sciences, namely, the individual level, the interactive level (groups), and the level of collectivities (organizational, cultural or societal).
• Theoretical triangulation: this type draws upon alternative or competing theories in preference to utilizing one viewpoint only.
• Investigator triangulation: this type engages more than one observer; data are discovered independently by more than one observer (Silverman 1993: 99).
• Methodological triangulation: this type uses either the same method on different occasions, or different methods on the same object of study.

Many studies in the social sciences are conducted at one point only in time, thereby ignoring the effects of social change and process. Time triangulation goes some way to rectifying these omissions by making use of cross-sectional and longitudinal approaches.
Cross-sectional studies collect data at one point in time; longitudinal studies collect data from the same group at different points in the time sequence.

Ensuring validity
It is very easy to slip into invalidity; it is both insidious and pernicious as it can enter at every stage of a piece of research. The attempt to build out invalidity is essential if the researcher is to be able to have confidence in the elements of the research plan, data acquisition, data processing, analysis, interpretation and its ensuing judgement. At the design stage, threats to validity can be minimized by:
• choosing an appropriate time scale
• ensuring that there are adequate resources for the required research to be undertaken
• selecting an appropriate methodology for answering the research questions
• selecting appropriate instrumentation for gathering the type of data required
• using an appropriate sample (e.g. one which is representative, not too small or too large)
• demonstrating internal, external, content, concurrent and construct validity and ‘operationalizing’ the constructs fairly
• ensuring reliability in terms of stability (consistency, equivalence, split-half analysis of test material)
• selecting appropriate foci to answer the research questions
• devising and using appropriate instruments:
  • ensuring that readability levels are appropriate; avoiding any ambiguity of instructions, terms and questions; using instruments that will catch the complexity of issues
  • avoiding leading questions
  • ensuring that the level of test is appropriate, e.g. neither too easy nor too difficult; avoiding test items with little discriminability
  • avoiding making the instruments too short or too long
  • avoiding too many or too few items for each issue
• avoiding a biased choice of researcher or research team (e.g. insiders or outsiders as researchers).