Data in empirical research
Some fundamental issues
Daniel Gile
daniel.gile@yahoo.com
www.cirinandgile.com
D Gile Data in empir res 1

Reminder: Data, the foundation of progress in CSA (1)
In HSA, scholars can observe reality, and then speculate and theorize with much freedom.
The norms of caution and rigorous inferencing make this impossible in CSA.
In CSA, theoretical speculation is acceptable:
- As a starting point for further empirical exploration
- As a basis for theory construction, but the theory will need to be tested empirically
- As tentative ideas to explain findings
But unlike the situation in HSA, in CSA all progress is by definition based on data and their analysis.

Reminder: Data, the foundation of progress in CSA (2)
So the quality of research is limited by the quantity and quality of the data on which it is based.
In many cases, it is difficult to:
- Collect valid, relevant data
- Measure the data in a way that will help advance towards finding an answer to the research question(s)
- Extrapolate from the data that can be collected on part of the environment or population to the whole environment or population to which the research question(s) apply
If the data are not valid or not representative of the population, no reliable inferences on the population can be made.
If you cannot measure them adequately, they are of limited use.

Collecting data – Access and indicators
Access to the data is often problematic:
- Cost, confidentiality, difficult to detect…
- Cost and complexity of technical equipment
- Physical access to the location
- Permission to observe/record…
But more fundamentally:
- How do you gain access to the content of dreams?
- How do you gain access to mental processes?
- How do you gain access to skills for observation?
You cannot observe them directly.
What you generally observe (and measure) are indicators.
In other words, data are not the phenomenon itself, but an indicator of the phenomenon – more later.

Collecting data – Identifying target data
When collecting data on a phenomenon or an indicator, it is not always easy to distinguish the target data from other information picked up.
- When studying language skills and using errors and infelicities as an indicator: how do you identify errors and infelicities in linguistic data?
- When studying translation tactics (decisions made when confronting a problem): how do you distinguish between the result of a tactic and the result of insufficient skills (e.g. omissions, small semantic changes)?

Problems with data validity (1)
Reminder: Research explores various phenomena in Reality.
Generally, data are not the phenomena themselves, but something believed to ‘correspond’ to them in some way.
For instance, when studying voting behavior, the data used, e.g. the number of ballots cast in favor of a certain candidate, are not the voting behavior itself. They are something that reflects voting behavior.
One could say that generally, data are indicators, though the term ‘indicator’ tends to be used for ‘something’ that is even more remotely connected to the reality it is supposed to represent.
Data are said to be valid if they correspond strongly to what they are supposed to correspond to.

Problems with data validity (2)
Data are valid if one or some of their features correspond strongly to what they are supposed to correspond to in the object of study.
Such correspondence may be required for detection only, i.e.
if and only if a particular feature of the object of study exists, the data take on a particular feature, and vice versa (the presence of particular objects on archaeological sites is valid data to indicate skills/beliefs/rituals in the population which lived in these particular sites).
Quantitative correspondence may be required in other cases (e.g. measuring the amount of radioactivity, of a particular chemical substance, etc.).

Data validity – uncertain correspondence (1)
Voting statistics are a valid indicator of voting behavior.
What about voting intentions as stated in interviews? Are they valid as an indicator of voting behavior?
They say something about voting behavior, but that something is not enough to determine how people are going to vote, because:
- Some people may change their mind
- Some people do not speak the truth
[Diagram: Data and Phenomenon]

Data validity – uncertain correspondence (2)
One frequent problem with data validity is the uncertain correspondence between the data and the target phenomenon, e.g.:
- Native speakers’ assessment of a non-native speaker’s mastery of their language (How sensitive are they to errors and infelicities? What are their personal norms? What are their expectations?…)
- Students’ assessment of their teachers (personal bias, political correctness…)
Problems arise because of interference from affective factors and the (often subconscious) desire to preserve self-image.
Example: in Translation Studies, the relative weight of quality components.
This problem is particularly frequent in behavioral sciences.

Data validity – partial correspondence (1)
Are police reports about sexual assaults a valid indicator of actual sexual assault activity in a given city?
Most police reports about sexual assaults probably report genuine sexual assaults, but many assaults are never reported because the victims are afraid or ashamed to report them.
So the data are valid for one part of the phenomenon only.
[Diagram: Data and Phenomenon, with a question mark]

Data validity – partial correspondence (2)
When data are valid for one part of the phenomenon only, whereas exploration of the whole phenomenon is sought:
How safe is it to extrapolate from information on part of the phenomenon only?
(This is distinct from the issue of representativeness, taken up later.)
Example: a single test to test language proficiency?
Language proficiency is multi-dimensional (declarative knowledge, procedural knowledge, distinct skills like pronunciation, fluency, reading ability, listening comprehension ability, flexibility in using various registers…).

Validity of other research environment components
The validity of the data/the indicator chosen is not the only validity issue in empirical research.
As will be seen later, especially in experimental research, ecological validity can be an issue:
- Task
- Environment
- Participants

Measurable data
Often, advancing towards an answer to the research question(s) requires some kind of measurement of data (intensity, magnitude, amount, frequency…).
In some cases, this is rather easy (thermometer, number of ballots cast, money/time spent…).
In other cases, it is difficult (intensity of feelings, ‘amount’ of deviation from a norm…).

Representative data (1)
Generally, it is not possible to have data on the whole object of study (cost, time [including future], physical access…).
You can only access data on part of it.
They may be valid and measurable, but are they representative of the whole object of study? Or of part of it only?
[Diagram: Data and Phenomenon]

Representative data (2)
- If the phenomenon is very homogeneous
- If the accessible part has the same relevant features as the whole
the data are said to be representative.
If not, you cannot legitimately make inferences from your sample on the whole.
[Diagram: Data and Phenomenon]

Validity and Representativeness
They are not the same:
Data can be valid, that is, provide reliable indications on part of a phenomenon/object of study (for instance, on a sample of people from a population), without being representative,
because it is possible that the characteristics of the sample are different from the characteristics of the population (for instance, the average height of a population, if the sample of people used has a high proportion of basketball players).

Priorities and strategies
Validity is particularly important:
- Scientifically legitimate inferences on a phenomenon can only be made if the data are valid
Representativeness is less of a problem:
- Provided no generalization is asserted
Measurability can be important:
- If only to measure the actual impact of a particular factor or feature on the object of study
- Sometimes, measurability can be constructed (scales)
But limited measurability does not mean nothing can be learned about the object of study → Qualitative research

The effects of variability
One other important issue in empirical research is variability.
Variability can be intrinsic to the phenomenon (for instance in meteorological phenomena).
It can also be a feature of the data collected, due to:
- Intrinsic variability in the phenomenon and/or
- Heterogeneity in the phenomenon and/or
- Variability in the collection procedures
Its effects can be very large.

CASE STUDY (FICTION): THE EFFECT OF EXPERIENCE ON TRANSLATION QUALITY
• Suppose you want to investigate the effect of experience on translation quality
• Suppose that in reality, on
average, there is a fast progression along the learning curve during the first 5 years, and over the next decades, translators continue to improve, but at a lower and lower speed
D. Gile Variability 19

“REAL” AVERAGE PERFORMANCE VS. EXPERIENCE
As measured by some valid indicator on a scale from 1 to 10

Exper.   0 yrs   5 yrs   10 yrs   15 yrs   20 yrs   25 yrs
Qual.    1       5       7        8        8.5      8.8

[Figure: “Real” average learning curve. Quality (scale 0 to 10) plotted against experience (0 to 25 years)]

Effects of attitude
- The translators’ attitude towards translation may influence the quality of their work.
- Attitudes may change over time.
- Suppose that attitudes are very positive in the beginning, that they become negative after a while because translators are disappointed with market conditions, and that they gradually become more positive when they adapt to the situation.

Experience vs. Attitude
Very positive to very negative to positive

Exp.     0 yrs   5 yrs   10 yrs   15 yrs   20 yrs   25 yrs
Attit.   +++     ++      ---      -        +        +

The effect of attitude: two scenarios

Exp.           0 yrs   5 yrs   10 yrs   15 yrs   20 yrs   25 yrs
Large influ.   +3      +2      -3       -1       +1       +1
Small influ.   +0.3    +0.2    -0.3     -0.1     +0.1     +0.1

[Figure: The effect of attitude. Three series (Real, Large influ., Small influ.) on a vertical scale from 0 to 12, plotted against experience (0 to 25 years)]

The effect of attitude
- In the small influence scenario, the output pattern is only changed marginally.
- In the large influence scenario, it is changed considerably. In particular, real improvement seems to occur only after 10 years of experience.

Controllability
- Experimenters may be able to control attitude, for instance by telling participants that the quality of their output is important, or that they will be assessed by peers, etc.
- But it is not realistic to assume they can control everything – the participants’ personality, fatigue, biorhythm, likes or dislikes of certain types of texts, themes, etc.

The effect of uncontrolled variability
Assume a variability of up to ±30%, either intrinsic or from uncontrolled factors:

Exp.   0 yrs   5 yrs   10 yrs   15 yrs   20 yrs   25 yrs
Var.   +30%    -30%    -30%     +30%     -20%     -30%

[Figure: The effect of uncontrolled variability. Two series (Real, w/ variab.) on a vertical scale from 0 to 12, plotted against experience (0 to 25 years)]

The effect of variability
- With such variability, very common in empirical studies in translation and interpreting (actually, in such studies variability is often of several hundred percent), the underlying “true” pattern is severely distorted.
- In particular, from the data, it seems that improvement occurs for 15 years, after which there is a steady decline in the quality of the translation output.

Consequences and conclusion (1)
Variability is a major enemy of research, in that it is likely to hide ‘true’ trends and suggest false trends.
In experiments, some variability is counterbalanced by control over relevant variables, both in sampling and in the handling of environmental and independent variables.
Variability is further reduced by strict design and implementation of the experimental procedure.
Replications also reduce the effects of variability by providing data for different constellations of parameters.

Consequences and conclusion (2)
But in behavioral sciences, residual variability is often very large.
If you plan to do experimental research, expect to find high variability, and do not be disappointed if this happens.
Unless you need to arrive at a ‘clear-cut result’, results that are not clear-cut can also be of interest.
They may show for instance that there is no regular, clear ‘superiority’ of one method or one condition over another, so don’t let the probability of not reaching ‘significance’ stop you from doing the research.

The sensitivity of indicators/tools (1)
The concepts of ‘signal’ and ‘noise’ (from radio transmission):
In empirical research, when seeking to collect data, you need tools with a certain sensitivity.
For instance, casual listeners will not necessarily spot traces of foreign accent or infelicities in a non-native speaker.
Their sensitivity to these phenomena may be too low, and they will miss the ‘signal’ which is supposed to be detected.
Other listeners may be too sensitive and mistake ‘native’ deviations from norms for signs of non-native language use (certain violations of rules of grammar, false cognates…).

Sensitivity of data collection tools (2)
[Diagram: a sensitivity axis with three levels, a, b and c]
- At level a: Not sensitive enough. Does not pick up the signal, or picks up part of it only
- At level b: Appropriate sensitivity. Picks up the signal, not the noise
- At level c: Too sensitive. Picks up the signal and the noise

The sensitivity of indicators/tools (3)
Very high sensitivity which may pick up the ‘noise’ (i.e. non-signal) is all right if it is then possible to filter out the noise from the signal.
But often, this is not possible, because the noise is very similar to the signal.
Other tactics may help.
One is triangulation, i.e. using a different method to throw a different light on the phenomenon/data, including qualitative methods.
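
Worked sketch of the fictional case study
The arithmetic behind the fictional case study on experience and translation quality can be checked with a few lines of Python. This is only an illustrative sketch: the figures are those given in the case study tables, but all names (REAL, add_effect, apply_variability, peak_year) are hypothetical, and it assumes that the attitude effect is simply added to the "real" value and that variability is a percentage deviation from it.

```python
# Illustrative sketch only; all names are hypothetical, figures come from the case study.
YEARS = [0, 5, 10, 15, 20, 25]
REAL = [1, 5, 7, 8, 8.5, 8.8]  # "real" average quality, scale from 1 to 10

LARGE_INFLUENCE = [3, 2, -3, -1, 1, 1]               # attitude effect, large scenario
SMALL_INFLUENCE = [0.3, 0.2, -0.3, -0.1, 0.1, 0.1]   # attitude effect, small scenario
VARIABILITY = [0.30, -0.30, -0.30, 0.30, -0.20, -0.30]  # uncontrolled variability


def add_effect(real, effect):
    """Assumption: the attitude effect is added to the real value."""
    return [round(r + e, 2) for r, e in zip(real, effect)]


def apply_variability(real, variation):
    """Assumption: variability is a percentage deviation from the real value."""
    return [round(r * (1 + v), 2) for r, v in zip(real, variation)]


def peak_year(years, values):
    """Years of experience at which the observed quality is highest."""
    return years[values.index(max(values))]


large = add_effect(REAL, LARGE_INFLUENCE)
small = add_effect(REAL, SMALL_INFLUENCE)
noisy = apply_variability(REAL, VARIABILITY)

for label, series in [
    ("Real", REAL),
    ("Large influ.", large),
    ("Small influ.", small),
    ("With variability", noisy),
]:
    print(f"{label:>16}: {series} (peak at {peak_year(YEARS, series)} yrs)")
```

Under these assumptions, the "real" series keeps rising up to 25 years, whereas the series with variability peaks at 15 years and then declines, which is the false trend described in the case study.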