OBSERVATIONAL STUDIES AND EXPERIMENTAL STUDIES IN THE INVESTIGATION OF CONFERENCE INTERPRETING

Daniel Gile
Université Lumière Lyon 2 and ISIT, Paris

Published in Target 10:1.69-93. 1998

Abstract: In conference interpreting research (IR), empirical investigation can be classified as observational or experimental. The former can be used for exploration, analysis and hypothesis-testing, and is either interactive or non-interactive. Besides its conventional role of hypothesis-testing, the latter can be exploratory. The main methodological problems in both are related to validity and representativity and to quantification. At this stage, the most important contribution to interpreting research can be expected from observational procedures. Simple descriptive statistics and uncomplicated quantitative processing of the data still have much to offer.

Résumé: La recherche empirique sur l'interprétation peut être classée comme observationnelle ou expérimentale. La recherche observationnelle est interactive ou non et peut servir à l'exploration, à l'analyse et à la vérification d'hypothèses. Utilisée généralement pour la vérification d'hypothèses, la recherche expérimentale peut également être exploratoire. Les principaux problèmes méthodologiques dans ces deux démarches se posent en matière de validité et de représentativité et à propos de la quantification. A ce stade, c'est encore la recherche observationnelle qui présente le plus important apport potentiel. Les statistiques descriptives et des méthodes quantitatives simples gardent un grand intérêt.

1. Introduction

As Translation Studies evolves into an academic (inter)discipline (Snell-Hornby, Pöchhacker and Kaindl 1994, Wilss 1996), it has also been migrating from a predominantly 'philosophical' and prescriptive approach towards forms of investigation more consistent with the norms of research in more established disciplines (see Holmes, Lambert and Van den Broeck 1978, Toury 1995).
At the same time, there is more empirical research as opposed to essay-type reflection, as exemplified inter alia by Tirkkonen-Condit's regular endeavors in this direction (Tirkkonen-Condit and Condit 1989, Tirkkonen-Condit 1991, Tirkkonen-Condit and Laffling 1993) and by the work on Think Aloud Protocols spearheaded by German investigators (see Krings 1986, Lörscher 1986, Königs 1987, Kiraly 1995, Wilss 1996) - also see the collection of papers in Meta 41:1 (1996) and Séguinot 1989. While the discipline is maturing, it is still vulnerable to various methodological weaknesses, as discussed in Toury 1991, 1995. Conference interpreting research (hereafter IR) is less mature and may be facing more fundamental questions about its identity and the way ahead (Gile 1995a, Gambier et al. 1997). One fundamental decision to be made by investigators selecting empirical research projects is choosing between the experimental and observational paradigm. In IR, the present trend seems to favor experimental studies, thus possibly depriving the discipline of much useful data, and even of a large number of empirical research projects, since beginners and students who may be able and willing to carry out simple observational projects could hesitate to take on, or fail to complete, experimental ones. This paper discusses methodological issues linked to this choice.

2. Definitions

In the literature, a fundamental distinction is made between "theoretical research" which, roughly speaking, focuses on the intellectual processing of ideas, and "empirical research", which centers around the collection and processing of data. In principle, this distinction refers to different relative weights in a bipolar activity, not to the existence of two separate paradigms, one ignoring theory and the other ignoring data.
D.Gile 1996 Obsvsexp page 1

Empirical investigation can be further classified into two broad categories: observational research, also called 'naturalistic', which consists in studying situations and phenomena as they occur 'naturally' in the field, and experimental research, in which situations and phenomena are generated for the specific purpose of studying them. The study of situations and phenomena initially generated for other purposes will be defined in this paper as observational. Classroom experiments will be considered experimental in this sense of the word if they are created for research purposes, and observational if they are initially designed for didactic purposes and are processed scientifically as an afterthought. As is the case in any classification, class boundaries can be fuzzy. For instance, having translators do their professional work in a room with a camera so that their activity can be observed with some precision (Séguinot 1989) could be considered an observational operation in spite of the controlled spatial environment, while having them work with a carefully controlled set of dictionaries and other documents could be enough to have the study cross the line into experimentation. However, boundary problems should not cause difficulty in the present discussion. As a footnote, one might add that in literary and historical studies, including those with a translation focus, the divisions explained above are less relevant, since research is based on pre-existing data and is therefore almost necessarily observational (though one could conceive of experimental setups in literary research).

3. Basic approaches in observational research

Over the years, research methods have become increasingly sophisticated.
While in sociology, in ethnology, in ethnolinguistics, in political science and in other disciplines, observational investigation is still the core approach, in other sciences, and in particular in psychology and in psychology-related disciplines, there has been a shift towards experimental procedures to the detriment of observational studies, which now tend to be viewed as pre-experimental, and often as pre-scientific: "...exploratory activities that are not experimental are often denied the right to be classified as sciences at all" (Medawar 1981:69). In IR, after a long period of anecdote-, personal experience- and speculation-based writing, the research community is aspiring towards solid scientific status (see Arjona-Tseng 1989, Moser-Mercer 1991, Kondo and Mizuno 1995) and seems to be particularly sensitive to the scientific prestige factor. This may strengthen the attractiveness of the experimental paradigm, which has become prominent in conjunction with the recognition of the relevance of cognitive psychology to interpreting - cognitive psychology uses the experimental paradigm almost exclusively. As explained in the introduction, the predominance of the experimental approach may have deleterious effects. Most scientific disciplines need a descriptive foundation (cf. Beaugrand 1988:33, McBurney 1990:27). In interpreting research, there is not enough observational data yet, and Stenzl's call for more descriptive studies (1983:47) is still topical. It is therefore important to give such studies a chance by recalling their legitimacy as scientific endeavours per se even when they are not followed by hypothesis testing (cf. Fourastié 1966:134, Mialaret 1984:25, Grawitz 1986:430).
Observational research can be broadly classified into three types of approach:

3.1 The exploratory approach

In this paper, the adjective 'exploratory' refers to endeavors primarily concerned with the analysis of situations and events in the field without any prior intent to make a specific point, ask a specific question or test a specific hypothesis. Again, the popularity of experimental research in its hypothesis-testing variety (see below) has been detrimental to the exploratory approach, in that one often hears or reads that investigation around well-defined hypotheses is a sine qua non condition for "scientific work". This opinion is not shared by all (cf. Babbie 1992:91). Besides the authority argument, examples from mathematics, medicine and psychology show that good results can be obtained through the exploratory approach (also see Kourganoff 1971:54-55), though it must be stressed that:
a. The absence of a well-defined and explicitly formulated hypothesis does not mean that exploration is done at random; some expectations always guide the investigator,
b. Exploratory projects can, and in fact most often do lead to precise hypotheses and their subsequent testing.
One example of such an exploratory approach can be found in a study of interpreting research productivity in the 80's and 90's undertaken by Franz Pöchhacker (1995a,b). Pöchhacker collected publication data, and then analysed it in terms of individual authors, of 'schools', of evolution over time, and drew a number of conclusions. He did not start with a specific hypothesis to test.

3.2 The focused analytical approach

This title refers to research efforts focusing on a particular phenomenon through the analysis of observational data. This is the type of observational research most frequently encountered in IR.
For example, Thiéry (1975) sent out a questionnaire to the 48 interpreters classified as 'true bilingual' members of the International Association of Conference Interpreters in order to identify their biographical, professional and 'linguistic' idiosyncrasies. A more recent example is a study of how interpreting affected the role of participants in a war crime trial (Morris 1989). The idea was to analyse data obtained during the trial to see whether and how the interpreters' intervention actually changed the courtroom situation, especially as regards the role and interaction between the various actors. Another relatively recent analytical study is Gile's case study of his own terminological preparation for a medical conference and its 'efficiency' (1989, ch.14).

3.3 The hypothesis-testing approach

This approach is similar to that usually found in experimental studies, in that the researcher seeks to collect data that will strengthen or weaken specific hypotheses about a particular phenomenon. The difference is that here the data are derived from field observation rather than experimentation, with the obvious limitations which arise from the fact that environmental conditions are given, not controllable, hence the risks associated with uncontrolled variability. Observational hypothesis-testing is rare in interpreting research. One example is Shlesinger's M.A. thesis (1989), in which she used field data to test the hypothesis that interpreting turns rather literate original speeches into more oral-like target speeches and vice versa. Observational studies can be methodologically simple. Generally speaking, as research in a field makes headway, observation and analysis tend to become finer and require more sophisticated tools and methods.
In IR, the paucity of empirical research at this stage implies that the complexity of methods cannot be taken as an indicator for the relative value of their contribution: there are still many important aspects of interpretation to discover with simple methods. And yet, when such simple projects are suggested to beginners, they often balk at the idea and look for more ambitious projects which mostly turn out to be out of their reach.

3.4 Interactive and non-interactive observational research

Another methodologically important distinction can be made between 'interactive' and 'non-interactive' observational research. The latter consists in observing situations and phenomena as they happen without the observed subjects playing an active role in the collection, analysis and assessment of the data (as opposed to interview-based and questionnaire-based studies), and with minimum data-distorting influence from the observer during the collection of the data. Actually, one could claim that there is always some interaction: the terms "interactive" and "non-interactive" are simplifications. It would be more appropriate, but less convenient, to talk of higher-interaction vs. lower-interaction. An example of such "non-interactive" or lower-interaction observational research is Strolz's doctoral dissertation (1992), in which she tested a number of hypotheses on the basis of the recordings of two interpreters' rendering of the same speech on the media. The two interpreters were not involved in the collection of the data; they only 'provided' the corpus by doing their job in a professional situation. Similarly, Williams's 1995 study of anomalous stress in interpreting consisted of an analysis of recordings of source texts and target texts in an 'authentic' conference.
"Interactive" observational research in translation and interpreting studies generally involves participation of the investigator in the process under study and/or questionnaires or interviews, and thus entails the risk of interference from the researcher and/or a significant influence of the research procedure on the phenomenon under study, or the risk of interference from the subjects' personal perception, interpreting and reporting of facts. For example, Thiéry's questionnaire was sent out to his 'true bilingual' colleagues who were also his competitors. Since the issue of language classification and in particular of 'true bilingualism' is a sensitive one in the conference interpreters' community, with concrete professional implications, it is difficult to rule out an influence of the very question on the responses on the one hand, and on the investigator's interpretation thereof on the other. As to Gile's case study on conference preparation (1989), the knowledge that his preparatory work would be examined for research purposes is likely to have had some influence on the way he worked. The potential effects of such interaction may be taken on board usefully in the discussion of results if the direction of the bias is known or strongly suspected, as in the case of self-observation of preparation for a conference (the assumption being that the preparation is done at least as seriously, and possibly more seriously, than under 'normal' circumstances). In other cases, its effects can be difficult to assess. Issues associated with interaction can be illustrated with a recent, rather large study on the expectations of users from conference interpreters (Moser 1995), based on user interviews performed by interpreters.
Though the interviews followed a carefully designed questionnaire, one may question the validity and representativity of some of the findings for the following reasons (not to mention the fact that the interviewers were volunteers and had received written instructions, but no training - see Shipman 1988:85 on this subject):
- It is a strong possibility that interpreters selected their respondents among users who seemed to show a friendly or otherwise positive attitude towards them or towards interpreting, rather than among indifferent or hostile delegates.
- During the interview, the identity of the interviewers as interpreters was clear to the respondents, hence a possible effect on the answers given to the questions.
- Because of the interpreters' subjective position on the matter, one cannot rule out bias in their interpretation of the responses.
The risks associated with interaction in social anthropology, where the researcher's presence may influence the observed group's behaviour, were acknowledged long ago. The numerous think-aloud protocol (TAP) studies performed on translators over the past few years also entail a strong possibility of interaction between the research process and the translation processes under study (see for instance Toury 1991, Kiraly 1995 and Kussmaul 1995).

4. Experimental research: statistical hypothesis-testing vs. 'open experimenting'.

Statistical hypothesis-testing is the most frequently used approach in experimental research in the social sciences. The procedure involves the selection of a sample of subjects who perform a task or are submitted to certain conditions in a 'controlled environment' in order to test a hypothesis on the basis of a 'statistical test' ('Chi square', 't test', 'F test', analysis of variance, etc.) showing 'significance' or 'non-significance' of measured and calculated values. The hypothesis is then either rejected or 'not rejected'.
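The procedure just described can be illustrated with a minimal sketch. The scores below are invented for illustration (omissions counted for ten hypothetical subjects, five per delivery condition) and do not come from any study cited here. The sketch also swaps in an exact permutation test rather than the t test or F test named above, purely because it needs nothing beyond the Python standard library; the logic of "reject" vs. "not reject" is the same.

```python
from fractions import Fraction
from itertools import combinations


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Returns the proportion of all possible splits of the pooled data whose
    mean difference is at least as large as the one actually observed.
    """
    # Fractions keep the arithmetic exact, so ties are counted reliably.
    pooled = [Fraction(x) for x in list(group_a) + list(group_b)]
    n, n_a = len(pooled), len(group_a)
    total = sum(pooled)

    def mean_gap(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / (n - n_a))

    observed = mean_gap(sum(pooled[:n_a]))
    extreme = 0
    splits = 0
    for indices in combinations(range(n), n_a):
        splits += 1
        if mean_gap(sum(pooled[i] for i in indices)) >= observed:
            extreme += 1
    return extreme / splits


# Hypothetical data: omissions per passage under two delivery conditions.
slow_delivery = [2, 3, 1, 4, 2]
fast_delivery = [5, 6, 4, 7, 5]

ALPHA = 0.05  # conventional significance threshold (an assumption)
p_value = permutation_test(slow_delivery, fast_delivery)
decision = "rejected" if p_value < ALPHA else "not rejected"
print(f"p = {p_value:.4f}; null hypothesis {decision}")
```

With these invented figures only 4 of the 252 possible splits are as extreme as the observed one, so the null hypothesis of no difference would be rejected at the 0.05 level. With samples this small and non-random, such a result says little about interpreters in general.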
Most of the experimental research done on interpreting is of this type (see Gran and Taylor 1990, Lambert and Moser-Mercer 1994 and Tommola 1995). For many behavioral scientists, 'experimental research' has become a synonym for statistical hypothesis-testing. Researchers tend to forget that experiments can also be used for other purposes. In particular, they can have an exploratory function and help find answers to questions of the type "what will happen if..." (Fraisse, Piaget and Reuchlin 1989:I-92). Such experiments will be called 'open experiments' here. Again, present social norms make hypothesis-testing more appealing, and open experimenting appears as 'naive'. However, in industrial research, which is anything but naive, and in physics, chemistry and biology, which do not have to worry about their scientific status, researchers have no qualms about its use. The same applies to medicine, to agronomic research, and even to education research. It may also be worthwhile to recall that one of the main criticisms levelled against open experimenting, namely its inefficiency, attributable to the fact that the absence of a precise hypothesis to test leads to possible waste of time, material means and efforts, loses much of its relevance when computer simulation can be performed. Last but not least, advocates of the primacy of the hypothesis-testing paradigm point out repeatedly that "you only find what you are looking for", meaning that researchers need a precise hypothesis or research question to work on for fear of missing the significant phenomenon in a sea of data. The principle is generally true, although major discoveries have also been made at some distance from the researcher's focal point. On the other hand, focusing on one item may prevent the observer from noticing the presence of another, which may be important as well (see Reuchlin 1969:29-30).
One example of open experimenting in translation research is a study in which an idea was presented graphically and subjects were asked to verbalise it in their native language (Gile 1995b, Ch. 3). The initial objective was to identify phenomena that could help justify some liberty in the production of the target text. Findings included a confirmation of interindividual and intra-individual variability in production, but also the identification of several types of information, partly language-dependent, that subjects added to the information they aimed to get across to their receivers (the 'message'), without necessarily being aware of it or controlling this added information. Open experimenting provides an observation platform which may then lead to more precise hypotheses and to statistical hypothesis-testing. In terms of experimental design, it is less difficult than the latter, because it requires less stringent controls (Fraisse, Piaget and Reuchlin 1989), but it does provide the basic advantages of experimental procedure as a whole, and in particular the possibility of having the same task performed under identical conditions by more than one person, something which in interpreting only rarely happens in the field.

5. Fundamental methodological issues in experimental and observational research in IR

In principle, the major advantage of observational research is that it allows the investigation of phenomena as they occur naturally, with no distortion induced by the study. However, as already pointed out, this is not always the case, as many observational procedures do generate interference with the 'natural' phenomena (see also Shipman 1988). This is particularly true in interpreting: due to the evanescent nature of both input and output, on-line, on-site observation is often required, but when interpreters know they are being 'observed' for the purposes of close scrutiny, this may change their behavior, and not telling them is ethically problematic.
The obvious way around the problem is to use data which are 'naturally' recorded and non-confidential, as is the case in TV interpreting, but samples thus obtained are not representative of all practitioners and all professional environments (see Kurz forthcoming). Another problem is associated with the diversity of working conditions (see section 6.2), which makes it difficult to identify the contribution of individual variables to the phenomena observed. Ideally, experimental research offers the solution to the problem: by creating an environment where all relevant factors are controlled, it is possible to 'isolate' and measure the effect of the variable under study (the 'experimental' or 'independent' variable). Unfortunately, the implementation of this principle is often far from simple. The following obstacles are particularly salient and have been generating much criticism among interpreters against experimental research:
a. The experimental situation itself, being 'unnatural' (if only because the subjects know that they are being observed by researchers as opposed to being listened to by clients for remuneration within a professional framework), may generate processes somewhat different from the ones occurring in the natural environment (see section 6.1). Not enough data are available about differences in interpreting performance depending on the context being natural or experimental to decide that experiments do or do not distort processes significantly, but the possibility must be considered.
b. The very selection of the relevant variables to be controlled is determined by subjective, non-scientific considerations, whenever not enough solid evidence is available (see for example Moravcsik 1980:46). As pointed out earlier, this is the case in IR. By controlling some less relevant variables and ignoring more important ones, investigators may well lose much of the potential benefit of the experimental paradigm.
This is a serious risk in research performed by non-interpreters, who may be unaware of factors deemed relevant by interpreters and fail to control them.
c. To address the issue of variability, experiments are done on samples. In interpreting, access to data and to subjects is problematic (see for example Gile 1995a), which leads to small, non-random samples, and to the possibility of interference from uncontrolled variables associated with the sampling procedure (see Section 6.2).

6. Methodological issues in the literature

Though interpreting and translation investigators are vulnerable to a wide range of possible methodological weaknesses and errors, in the context of observational studies vs. experimental studies, three types of problems seem to be particularly salient in their research:

6.1 Weaknesses related to the validity of the data

One of the most frequently encountered problems in IR has to do with the validity of the data as actual interpreting data. While in non-interactive observational research, data can be considered 'authentic', validity problems start with interactive observational research, precisely because responses elicited 'artificially' tend to be influenced by the way questions are put, by the interaction between the interviewer and the interviewee, and by the image of the interviewee or respondent that will be reflected back onto him by his own responses (see for instance Frankfort-Nachmias and Nachmias 1992). At an even more basic level, the subjective perception of 'reality' by respondents and interviewees can also be misleading. For instance, in Déjean Le Féal 1978, it turned out that interpreters may perceive delivery speed as the number one difficulty in the interpretation of a speech read from a text, while the actual delivery rate is not high - other features of the speech make it appear to be high.
Similarly, experiments by Collados Ais (1996) and Gile (1985 and 1995c) suggest that informants assessing the quality of an interpreter's linguistic output or its fidelity to the original can be very unreliable. This misleading nature of personal perception is one of the main reasons why scientific methods in general can be advocated in the field of interpreting despite the limitations of the scientific approach and the significant contribution of the 'personal theorizing' approach (Gile 1990). A validity problem that deserves special attention is associated with self-observation, as mentioned above. Not only is the analysis of mental processes through introspection obviously vulnerable to personal bias, but even attempts at more objective observational research on one's own work are probably less reliable than studies of other people's work. This does not imply that self-observation should be banned, as it offers advantages in terms of motivation, availability and access to a corpus. Nevertheless, attention to potential methodological problems arising in such research is called for. Validity problems become more severe in experimental research, especially in hypothesis testing and in research done by non-interpreters. Practitioners challenge the validity of data obtained with student interpreters and with amateurs with no training in interpreting as opposed to professionals (e.g. in Treisman 1965, Goldman-Eisler 1972, Lawson 1967, Chernov 1969, Barik 1971, Kopczynski 1980). They also express doubts as to the comparability of 'real' interpreting and the tasks given to subjects in experiments: interpreting written texts (Gerver 1974), some artificially composed with the aim of obtaining certain linguistic characteristics (Dillinger 1989), or even 'interpreting' random strings of words (Treisman 1965).
Finally, as mentioned earlier, they have doubts as to the validity of data obtained in a laboratory situation where the speaker is not actually seen and where the interpreter does not 'feel the atmosphere' of a real conference. References to this issue are numerous: Politi 1989, Shlesinger 1989 (she rejected experimental procedures and later changed her mind and started performing experimental research herself), Dillinger 1990, Lambert 1994, Jörg 1995 (who reports that one of his subjects complained about the 'artificial' nature of the experiment).

6.2 Representativity of the data

Another major methodological problem lies with the representativity of the data even when they are considered valid as interpreting data:
- As already mentioned, samples in interpreting research are generally small, with a few exceptions in observational research such as Thiéry's dissertation (1975), a Japanese study on 189 subjects (Yoshitomi and Arai 1991), and Moser's user expectation survey (1995). In small samples, the statistical procedure only partly smooths out possible inter-individual variation.
- The fact that samples are not random undermines the validity of statistical tests performed on them (Hansen et al. 1953, Snedecor and Cochran 1967, Charbonneau 1988, Babbie 1992:448), as non-random sampling does not exclude bias, and bias implies that sample means do not tend to approximate the population mean. The same problem is found in research in many other fields, in particular medicine and psychology, and the latter has traditionally come under criticism for studying not the human population at large but the population of first-year psychology students (McNemar 1946, Smart 1966 and Sears 1986 are quoted on this point in Charbonneau 1988:333).
An acceptable approximation can sometimes be found in the form of critically controlled convenience sampling, in which subjects are selected because they are easy to access but are screened on the basis of the researcher's knowledge of the field and (ideally) of empirical data derived from observation and experimentation. In interpreting, it is sometimes risky to make similar claims, as interpreters do have direct knowledge of the field, but are also highly involved, and may therefore lack objectivity when investigating issues on which they have strong views.
- Interpreting conditions are diverse with respect to many parameters widely viewed as significant, such as the subject of the speech, previous knowledge of the subject by the interpreter, the syntactic and lexical make-up of the speech, delivery parameters, interpreter fatigue, interpreter motivation, working conditions, etc. In particular, the source-language and target-language variables are poorly taken into account, as most experiments and observational projects are carried out with two languages only, and hardly ever with 'exotic' languages such as Chinese, Japanese, Arabic, Polynesian languages, etc. Each experiment only represents one, two or up to what can only be a minute fraction of the total number of possible combinations of values of the significant parameters. Generalising is therefore hazardous.
- Another problem is linked to the possible variation over time in individual interpreting performance (the interpreter's "ups and downs"), a phenomenon intuitively perceived by interpreters, which may also limit the representativity of any single experiment. The amplitude of and precise reasons for such variations are not known, but the phenomenon is perceived as sufficiently salient by the practitioner's community to require caution.

6.3 Quantification

Interpretation is a complex phenomenon which interpreters consider difficult to measure (see for instance Pöchhacker 1994).
Though they may accept the validity of a word or syllable count as a measure of 'linguistic throughput' (with reservations, as explained in Pöchhacker 1994), no such yardstick exists for the measurement of information or 'message' throughput. Neither have satisfactory ways been found to measure fidelity or the message reception rate among delegates, and fidelity indicators such as errors are challenged because of allegedly inadequate definitions (see for example Bros-Brann 1976 and Stenzl 1983). In this context, observational research is less vulnerable to errors than experimental research. This is obvious as regards the validity of data, but is also easily explained for the two other types of problems. Observational studies generally do not rely on quantification to the same extent as experimental studies, and tend to be content with rather gross order-of-magnitude indications. Experimental studies of the hypothesis-testing type are based on mathematical comparisons, and uncertainty and variability in the measurement of the relevant values have more serious consequences. Furthermore, while such studies are designed for the purpose of drawing inferences on the population from the samples used (hence the importance of proper statistical procedure, including sampling methods), generalizations are not intrinsically part of observational research. When authors of such studies do generalize, their inferencing is intuitive rather than mathematical, and easier to assess for readers than in experimental research, because the data are known to be authentic rather than derived from an 'artificial' experimental environment.

7. Strategies

7.1 Validity of the data

Validity problems in interactive observational interpreting research as described above are essentially the same as in questionnaire- and interview-based studies in other disciplines, and countering them is possible with the strategies and tactics developed for such studies in other fields such as sociology and psychology. Validity problems in experimental interpreting research are more difficult to deal with: too few solid data are available for assertions to be made regarding the comparability of sight translation data with interpreting data, of student performance with professional performance, of work into a regular active language with experimental interpreting into a usually passive language, etc. Under the circumstances, I believe that in research on interpreting (as opposed to linguistic, psycholinguistic or neurolinguistic research performed on interpreters for the purpose of investigating cognitive processes rather than interpreting per se), more weight should be given to the study of phenomena as they occur in the field, in other words to observational research. Experimental set-ups create 'artificial' environments by definition. Some would probably be considered an acceptable approximation for the phenomena under study (even Seleskovitch, the most outspoken anti-experimentalist, based her own dissertation on an experiment - cf. Seleskovitch 1975), while others have been rejected as invalid. The uncertainty zone between the two may be quite wide, and it would appear reasonable to try as much as possible to design experiments in which factors intuitively deemed relevant by practitioners are not ignored (see Dillinger 1990 for a challenge regarding the interpreters' doubts and an answer in Gile 1991).
One constructive way of addressing the problem would be to compare experimental results and field performance, and study the amplitude and direction of possible differences: recordings of actual speeches can be used for experiments, and the product of their 'real life' interpretation and laboratory interpretation can be compared. As such comparisons accumulate, a solid foundation for assumptions on the comparability of experimental findings and interpretation in the field will gradually be constructed. Studies in which professional interpreters and students, amateurs and other subjects are given the same tasks (see for example Dillinger 1989 and Viezzi 1989) are also useful insofar as they provide data on comparability (or lack of comparability) in the performance of various categories of subjects and may thus open new avenues for extending virtual sample size.

7.2 Representativity of the data

There is no easy solution to the problem of representativity, because of the small size of samples and the large number of different working environments and conditions, as well as the variety of individual interpreters' personal parameters. I believe the best way to address this problem is replication, in both experimental and observational procedures. Even if it is only partial (with somewhat different procedures and/or conditions), replication generates data comparable to at least some extent with the initial data, and thus may be said to 'increase' sample size, though admittedly not in the strict sense, which prevents the indiscriminate use of the data as if they came from a single sample (see Gay 1990:115). Unfortunately, at present, the number of interpreting investigators who perform experimental or observational research is very small (see Pöchhacker 1995a,b), and all but a handful engage in a single effort for the purpose of obtaining a degree and then stop. Such investigators are interested in original research, much less in replication.
The situation may change if interpreting schools establish research centers or research programs (see Gile 1995a). Another potentially useful resource for the assessment of data representativity in a study is the corpus of previously published studies, which may differ in their objectives and in conditions, but offer relevant data nevertheless. For instance, publications which contain transcripts of original speeches and their interpretation can be used to study the typology and distribution of errors, though inferencing possibilities vary, depending on available information on how the data were obtained.

7.3 Quantification

Admittedly, complex aspects of human behaviour such as interpreting are difficult to measure directly in a precise, thorough and monolithic way. There are, however, specific aspects of interpreting which are easier to quantify, such as the number and duration of pauses in the interpreter's delivery, the proportion of proper names correctly rendered, the number of lexical units in source and target texts and the relative frequency of specific lexical units in each, etc. Other aspects are more difficult to quantify. In a study on the quality of interpreting students' linguistic output (Gile 1987), the sensitivity and criteria of native informants for deviations from acceptable linguistic usage were found to be highly variable (Gile 1985). Similarly, the quality of interpreting is difficult to assess accurately, as stressed inter alia by Pöchhacker (1994) and demonstrated in Collados Aís (1996). Measurements can be inaccurate because of a lack of precise definitions (in error counts, quality assessment, etc.), because of observational inaccuracies or errors (due to personal bias or shifts in attention - see Gile 1995c), and because of the natural variability of the phenomena under study. In cases where such inaccuracies are likely, this must be taken into account when drawing conclusions.
For instance, in a study on interpreting, error rates for two groups were respectively 27.22 and 23.45 (see Politi 1989, reporting on a study by Balzani), and the difference, about 15%, was checked for statistical significance. One may wonder about the usefulness of such statistical significance, considering the variability and possible non-random inaccuracies that may have crept into this small-sample experiment, including possible variability in error identification (in her 1983 M.A. thesis, Stenzl reports that when she applied Barik's error criteria to the corpus and tried to replicate the procedure, she found different results from his - see Barik 1971). The effect of a single outlier in a small sample can be large, as pointed out by Jörg (1995) in his data from a sample of 6 interpreters. No matter how powerful and refined new statistical methods are, they cannot overcome the fundamental limitations associated with high variability in small non-random samples. On the other hand, such inaccuracies and the limitations they entail need not be a major problem in research projects. Less accurate quantification can be performed in many cases using the tools that have been developed for similar problems in the social sciences, and be sufficient to provide the required answers. For instance, in a study on the deterioration of linguistic output quality during interpreting exercises (Gile 1987), in spite of poor informant reliability, the existence of 3 distinct levels of linguistic deviation frequency depending on the type of exercise (simultaneous interpreting, consecutive interpreting and ad-libbed presentations) could be established and proved robust when submitted to sensitivity analysis (sensitivity analysis checks to what extent results and conclusions change when relevant parameters vary within specified intervals, as can happen inter alia because of variability in human evaluation).
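The kind of robustness check just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three deviation frequencies, their ordering and the tolerance values are hypothetical and are not data from Gile 1987, and the function name `ranking_is_robust` is invented for this sketch. Each frequency is allowed to move up or down by a given proportion (standing in for variability in human evaluation), and the ordering of the three exercise types is called robust if it survives every combination of such shifts.

```python
from itertools import product


def ranking_is_robust(rates, tolerance):
    """Return True if the ordering of the rates survives every combination
    of each rate being scaled by (1 - tolerance) or (1 + tolerance)."""
    names = sorted(rates, key=rates.get)  # baseline ordering, lowest first
    factors = (1 - tolerance, 1 + tolerance)
    for combo in product(factors, repeat=len(names)):
        perturbed = [rates[name] * f for name, f in zip(names, combo)]
        # The ordering holds only if the perturbed values are still
        # strictly increasing in the baseline order.
        if any(a >= b for a, b in zip(perturbed, perturbed[1:])):
            return False
    return True


# Hypothetical deviation frequencies (per 100 words), for illustration only.
rates = {"ad-libbed": 2.0, "consecutive": 5.0, "simultaneous": 11.0}

print(ranking_is_robust(rates, 0.30))
print(ranking_is_robust(rates, 0.50))
```

With these made-up figures the three levels remain distinct when every frequency may be off by 30%, but not when it may be off by 50%. A statement of that form is what a sensitivity analysis contributes: it says how much measurement error the conclusion can absorb.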
Similarly, when trying to decide whether proper names pose a 'real' problem in interpreting, the point will have been made regardless of whether 80, 90 or 100% of the names in a speech are incorrectly rendered in the experiment (Gile 1984). If the question is whether or not interpreters lose some information in simultaneous interpreting under given conditions, the result is significant regardless of whether 10, 20 or 30 errors are detected in a half-hour interpreting segment (Gile 1989 - also see Babbie 1992:127). Here as elsewhere, common sense is paramount, and more important than statistical techniques (see Medawar 1981:27,93, McBurney 1990:5). One might also recall in this respect that 'statistically significant' differences may have no practical significance: the 20% difference between an average of 10 and an average of 12 errors per half an hour of interpreting may not be enough to provide 'significantly better' interpreting to a client; the roughly 30% difference between having an average of 150 and 200 new terms to learn while preparing for a technical conference may not significantly change the difficulty of the endeavor or the choice of appropriate interpreting strategies. Most of the debates in interpreting research still focus not on minute differences, but on basic principles of 'doctrine': does work into one's non-native language produce a target speech of significantly poorer quality than that produced in one's native language? Should consecutive interpreting be taught for a whole year before training in simultaneous interpreting starts? Are shadowing exercises useful? Small random differences between conditions do not contribute much to the debate, while wide, regular differences can have practical significance. Interpretation being largely unexplored, I believe that such wide, regular differences can still be found with simple quantitative methods, and advocate their use rather than more complex techniques which may yield less reliable data.
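The arithmetic behind the two examples of practical significance can be made explicit with a short Python sketch. The function names and the 50% threshold are assumptions introduced for illustration, not part of the studies cited. The threshold stands for whatever gap an investigator or client would regard as making a real difference. Differences are expressed relative to the smaller average. On that basis the gap between 150 and 200 terms is about 33%, and relative to the larger figure it is 25%, so the figure of 30% given in the passage falls between the two.

```python
def relative_difference(a, b):
    """Gap between two averages, expressed as a fraction of the smaller one."""
    low, high = sorted((a, b))
    return (high - low) / low


def is_practically_relevant(a, b, threshold):
    """True when the relative gap reaches a threshold chosen by the investigator."""
    return relative_difference(a, b) >= threshold


# Figures quoted in the text: errors per half hour, and new terms to learn.
for a, b in [(10, 12), (150, 200)]:
    print(a, b, round(relative_difference(a, b), 2))

# Hypothetical threshold: only gaps of 50% or more are treated as relevant.
print(is_practically_relevant(10, 12, threshold=0.5))
```

The point of separating the two functions is that the size of a difference is a descriptive fact, while its relevance depends on a threshold that has to be argued for on practical grounds, whatever a significance test may say.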
Science proceeds by "successive approximations" (Kourganoff 1971, Camus 1996:8). First approximations, achieved with simple methods, are necessary to provide the basis for finer approximations, which will eventually require more sophisticated methods. At this stage, due to the paucity of data on variability and on co-variability between a larger number of potentially important variables, descriptive statistics (which describe phenomena) may be a more appropriate tool than inferential statistics (which attempt to make mathematics-based inferences on the population from the data collected on samples). Explicit descriptive statistics enable readers of published studies to decide for themselves whether samples and conditions are representative and whether results are significant, while statistical tests may give an undeserved seal of 'scientific objectivity' to a procedure which is not valid by strict standards (see Gore and Altman 1982).

8. Observational studies vs. experimental studies in translation research

Similar issues exist in translation research, but their relative weights are different. The following are some differences which can have implications for research strategies, and in some cases for the choice of observational vs. experimental studies:

8.1 Factors facilitating observational research:

- The product of translation, basically a text, is easy to store and readily available to investigators on a large scale at little cost, especially as regards literary translation. The product of interpreting, composed of spoken discourse plus body language (a "hypertext", Pöchhacker 1994), is evanescent. Its observation requires recordings, preferably of both image and sound. There are relatively few recordings of interpreted speeches, and existing ones are difficult to access.
- Technically speaking, translated texts are easier to process, with the possibility of rapid visual access to all parts of the corpus, while recordings of interpreted speeches require time-consuming linear listening, or equally time-consuming transcription work, which inevitably leads to a loss of information. From this viewpoint, the experimental approach, which involves a 'controlled' limited corpus, is likely to be preferred by interpreters.
- In translation, the time elapsed between the moment the translator reads a source-text segment and the moment s/he produces its final target-language version can last anywhere from a few seconds to hours, days or weeks. A corollary is that in field conditions, reactions to specific stimuli (such as manipulated independent variables) can be offset by further thinking and strategies or 'drown' in the general noise associated with the operation, and be undetectable to the researcher. This makes experimental procedures more difficult to conduct when studying the translation process. The TAP paradigm is a clever attempt to address the problem, but it is not without its weaknesses (cf. Toury 1991). In simultaneous interpreting, the EVS (ear-voice span, or time elapsed between the moment the interpreter detects a speech segment and the moment s/he produces its target-language reformulation) rarely exceeds a few seconds, and reactions to stimuli are likely to be more salient.

8.2 Factors facilitating experimental research:

- Translators are numerous and are used to their product being exposed, as opposed to interpreters, whose population is much smaller, who feel vulnerable and who dislike exposure. Access to samples is therefore easier in translation.
- 'Authentic' replication of an interpreting task in the field is essentially limited to speeches made during international events which are interpreted by several interpreters and broadcast over several radio and television channels.
The validity of any further replication in the laboratory can be challenged because of the 'artificial' environment. In translation, it is possible to generate 'authentic' replication of translation tasks, for instance by commissioning the translation of the same text into the same language from several translators or translation agencies. The cost of a similar operation replicating field conditions in interpreting would be much higher.
- In translation, text manipulations are easy to perform without the subject noticing them and without jeopardizing the validity of the translation task. In interpreting, text manipulations have been criticized for possibly generating interpreting behavior inconsistent with 'authentic' behavior.
- Translation performance is less dependent on cognitive load management than interpreting performance, in which phenomena lasting a fraction of a second can lead to major irreversible consequences (Gile 1995a,b). It follows that:
(1) Strict control of environmental conditions is less critical in translation than in interpreting. A gradual increase of sample size over time in several locations and under slightly different conditions is therefore less problematic.
(2) Intra-individual performance variability is probably lower in translation than in interpreting, and may well lead to solid conclusions on the basis of smaller samples.

The preceding analysis underscores two major differences between translation and interpretation as regards research possibilities:
- On the whole, access to data on the product and their analysis are easier in translation.
- The observation of the process is more difficult in translation.
This may offer a partial explanation for the fact that translation is far ahead of interpretation in observational product-centered studies, while it is struggling with process-oriented studies.

9. Conclusion

Observational and experimental studies should be seen as mutually reinforcing, not mutually exclusive.
Some issues are better addressed by the observational paradigm, and some by experimental investigations, especially in fields where a considerable corpus of research already exists, as is the case in neurophysiology. This paper does not challenge the basic advantages of the experimental paradigm, nor does it advocate the choice of this or that method in any individual research endeavor. It does claim that in IR, there is a lack of descriptive data obtainable through simple methods, and that this weakens the power of more complex methods. In problem-solving oriented research disciplines such as mathematics, medicine, chemistry, etc., self-regulation occurs naturally as a function of results. In the behavioral sciences, including IR, it is more difficult to assess the value or 'truthfulness' of research findings, and paradigmatic directions are to a larger extent a function of social norms and research policy. Underlining problematic aspects of sophisticated methods and stressing the value of simple methods may encourage leaders of the IR community to shift somewhat the balance of efforts in favor of the latter, in particular among students working on a graduation thesis, which may help fill the existing gap in descriptive data.

References

Arjona-Tseng, Etilvia. 1989. "Preparing for the XXIst Century". Proceedings of the Twentieth Anniversary Symposium of the Monterey Institute of International Studies on The Training of Teachers of Translation and Interpretation. Monterey. pages not numbered.
Babbie, Earl. 1992. The practice of social research. Belmont, California: Wadsworth. Sixth Edition.
Barik, Henri. 1971. "A description of various types of omissions, additions and errors encountered in simultaneous interpretation". Meta 15:1. 199-210.
Beaugrand, Jacques P. 1988. "Démarche scientifique et cycle de la recherche". Robert, ed. 1988: 1-35.
Bros-Brann, Eliane. 1976. "Critical Comments on H.C. Barik's 'Interpreters Talk a Lot, Among Other Things'". Bulletin de l'AIIC 4:1. 16-18.
Camus, Jean-François. 1996. La psychologie cognitive de l'attention. Paris: Armand Colin.
Charbonneau, Claude. 1988. "Problématique et hypothèses d'une recherche". Robert, ed. 1988: 66-77.
Chernov, Ghelly V. 1969. "Linguistic Problems in the Compression of Speech in Simultaneous Interpretation". Tetradi Perevodchika Vol. 6: 52-65.
Collados Aís, Ángela. 1996. La entonación monótona como parámetro de calidad en interpretación simultánea: la evaluación de los receptores. Unpublished doctoral dissertation. Universidad de Granada.
Déjean Le Féal, Karla. 1978. Lectures et improvisations: incidences de la forme de l'énonciation sur la traduction simultanée. Unpublished doctoral dissertation. University of Paris III.
Dillinger, Michael L. 1989. Component Processes of Simultaneous Interpreting. Unpublished PhD dissertation. McGill University, Montreal.
Dillinger, Michael L. 1990. "Comprehension during interpreting: What do interpreters know that bilinguals don't?" The Interpreter's Newsletter 3: 41-55.
Fourastié, Jean. 1966. Les conditions de l'esprit scientifique. Paris: Gallimard.
Fraisse, Paul, Jean Piaget and Maurice Reuchlin. 1989. Traité de psychologie expérimentale. Vol. I. Histoire et méthode. Paris: PUF. Sixth Edition.
Frankfort-Nachmias, Chava and David Nachmias. 1992. Research Methods in the Social Sciences. London, Melbourne, Auckland: Edward Arnold. Fourth Edition.
Gambier, Yves, Daniel Gile and Christopher Taylor, eds. 1997. Conference Interpreting: Current Trends in Research. Amsterdam/Philadelphia: John Benjamins.
Gay, L.R. 1990. Educational Research. Competencies for Analysis and Application. New York: Merrill, MacMillan Publishing Company. Third Edition.
Gerver, David. 1974. "The effects of noise on the performance of simultaneous interpreters: accuracy of performance". Acta Psychologica 38: 159-167.
Gile, Daniel. 1984. "Les noms propres en interprétation simultanée". Multilingua 3:2. 79-85.
Gile, Daniel. 1985. "La sensibilité aux écarts de langue et la sélection d'informateurs dans l'analyse d'erreurs: une expérience". The Incorporated Linguist 24:1. 29-32.
Gile, Daniel. 1987. "Les exercices d'interprétation et la dégradation du français: une étude de cas". Meta 31:4. 363-369.
Gile, Daniel. 1989. La communication linguistique en réunion multilingue, Les difficultés de la transmission informationnelle en interprétation simultanée. Unpublished PhD dissertation. Université Paris III.
Gile, Daniel. 1990. "Scientific Research vs. Personal Theories in the investigation of interpretation". Gran and Taylor, eds. 1990: 28-41.
Gile, Daniel. 1991. "Methodological Aspects of Translation (and Interpretation) Research". Target 3:2. 153-174.
Gile, Daniel. 1995a. Regards sur la recherche en interprétation de conférence. Lille: Presses Universitaires de Lille.
Gile, Daniel. 1995b. Basic Concepts and Models for Interpreter and Translator Training. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Gile, Daniel. 1995c. "Fidelity Assessment in Consecutive Interpretation: An Experiment". Target 7:1. 151-164.
Goldman-Eisler, Frieda. 1972. "Segmentation of input in simultaneous interpretation". Psycholinguistic Research 1. 127-140.
Gore, Sheila M. and Douglas G. Altman. 1982. Statistics in Practice. London: British Medical Journal.
Gran, Laura and Christopher Taylor, eds. 1990. Aspects of applied and experimental research on conference interpretation. Udine: Campanotto Editore.
Grawitz, Madeleine. 1986. Méthodes des sciences sociales. Paris: Dalloz.
Hansen, M., W. Horwitz and W. Madow. 1953. Sample Surveys: Methods and Theory. New York, London, Sydney: John Wiley and Sons.
Holmes, James S., José Lambert and Raymond van den Broeck, eds. 1978. Literature and Translation: New Perspectives in Literary Studies. Leuven: Acco.
Jörg, Udo. 1995. Verb anticipation in German-English simultaneous interpreting. Unpublished M.A. dissertation. University of Bradford.
Kiraly, Donald. 1995. Pathways to Translation. Pedagogy and Process. Kent, Ohio and London, England: The Kent State University Press.
Kondo, Masaomi and Akira Mizuno. 1995. "Interpretation Research in Japan". Target 7:1. 91-106.
Königs, Frank G. 1987. "Was beim Übersetzen passiert". Die Neueren Sprachen 86. 162-185.
Kopczynski, Andrzej. 1980. Conference Interpreting (Some Linguistic and Communicative Problems). Nauk, Poznan: Wydawn.
Kourganoff, Vladimir. 1971. La recherche scientifique. Paris: PUF.
Krings, Hans P. 1986. Was in den Köpfen von Übersetzern vorgeht. Tübingen: Gunter Narr.
Kurz, Ingrid. forthcoming. "Getting the Message Across: Simultaneous Interpreting for the Media". Proceedings of the EST conference in Prague, September 1995.
Kussmaul, Paul. 1995. Training the Translator. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Lambert, Sylvie. 1994. "Foreword". Lambert, Sylvie and Barbara Moser-Mercer, eds. 1994: 5-14.
Lambert, Sylvie and Barbara Moser-Mercer, eds. 1994. Bridging the Gap: Empirical research on simultaneous interpretation. Amsterdam and Philadelphia: John Benjamins Publishing Company.
Lawson, Everdina. 1967. "Attention and Simultaneous Translation". Language and Speech 10:1. 29-35.
Lörscher, Wolfgang. 1986. "Linguistic aspects of translation processes". House, J. and S. Blum-Kulka, eds. 1986. Interlingual and intercultural communication. Tübingen: Gunter Narr: 277-292.
McBurney, Donald H. 1990. Experimental Psychology. Belmont, California: Wadsworth.
McNemar, Q. 1946. "Opinion-attitude methodology". Psychological Bulletin 43. 289-374.
Medawar, Peter B. 1981. Advice to a Young Scientist. London and Sydney: Pan Books.
Meta 41:1. 1996. Special issue on Translation processes, guest-edited by Frank Königs.
Mialaret, Gaston. 1984. La pédagogie expérimentale. Paris: PUF.
Moravcsik, Michael J. 1980. How to Grow Science. New York: Universe Books.
Morris, Ruth. 1989. The impact of interpretation on legal proceedings. Unpublished M.A. thesis. Communications Institute, Hebrew University of Jerusalem.
Moser, Peter. 1995. Survey on Expectations of Users of Conference Interpretation. Final Report. Produced by SRZ Stadt + Regionalforschung GmbH, Vienna. Mimeographed. English translation.
Moser-Mercer, Barbara. 1991. "Paradigms gained or the art of productive disagreement". AIIC Bulletin 19:2. 11-15.
Pöchhacker, Franz. 1994. Simultandolmetschen als komplexes Handeln. Tübingen: Gunter Narr.
Pöchhacker, Franz. 1995a. "'Those Who Do...': A Profile of Research(ers) in Interpreting". Target 7:1. 47-64.
Pöchhacker, Franz. 1995b. "Writings and Research on Interpreting: A Bibliographic Analysis". The Interpreter's Newsletter 6. 17-32.
Politi, Monique. 1989. "Le signal non verbal en interprétation simultanée". The Interpreter's Newsletter 2. 6-10.
Reuchlin, Maurice. 1969. Les méthodes en psychologie. Paris: PUF.
Robert, Michèle, ed. 1988. Fondements et étapes de la recherche scientifique en psychologie. St-Hyacinthe, Québec: Edisem and Paris: Maloine.
Schjoldager, Anne. 1995. "Interpreting Research and the 'Manipulation School' of Translation Studies". Target 7:1. 29-45.
Sears, D.O. 1986. "College sophomores in the laboratory: influences of a narrow data base on social psychology's view of human nature". Journal of personality and social psychology 51. 515-530.
Séguinot, Candace. 1989. "The Translation Process: An Experimental Study". Séguinot, Candace, ed. 1989. The Translation Process. Toronto: H.G. Publications, School of Translation, York University.
Seleskovitch, Danica. 1975. Langages, langues et mémoire. Paris: Minard.
Shipman, Martin. 1988. The Limitations of Social Research. London and New York: Longman. Third Edition.
Shlesinger, Miriam. 1989. Simultaneous Interpretation as a Factor in Effecting Shifts in the Position of Texts on the Oral-Literate Continuum. Unpublished M.A. thesis. Tel Aviv University.
Smart, R. 1966. "Subject selection bias in psychological research". Canadian psychologist 7. 115-121.
Snedecor, George W. and William Cochran. 1967. Statistical Methods. Ames, Iowa: The Iowa State University. Sixth Edition.
Snell-Hornby, Mary, Franz Pöchhacker and Klaus Kaindl, eds. 1994. Translation Studies: An Interdiscipline. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Stenzl, Catherine. 1983. Simultaneous Interpretation - Groundwork towards a Comprehensive Model. Unpublished M.A. thesis. University of London.
Target 7:1. Special Issue on Interpreting Research. John Benjamins Library.
Strolz, Birgit. 1992. Theorie und Praxis des Simultandolmetschens. Unpublished PhD Dissertation. Geisteswissenschaftliche Fakultät der Universität Wien.
Thiéry, Christopher. 1975. Le bilinguisme chez les interprètes de conférence professionnels. Unpublished PhD dissertation. University of Paris III.
Tirkkonen-Condit, Sonja, ed. 1991. Empirical Research in Translation and Intercultural Studies. Tübingen: Gunter Narr Verlag.
Tirkkonen-Condit, Sonja and John Laffling, eds. 1993. Recent Trends in Empirical Translation Research. University of Joensuu. Faculty of Arts.
Tirkkonen-Condit, Sonja and Stephen Condit, eds. 1989. Empirical Studies in Translation and Linguistics. University of Joensuu. Faculty of Arts.
Tommola, Jorma, ed. 1995. Topics in Interpreting Research. University of Turku. Centre for Translation and Interpreting.
Toury, Gideon. 1991. "Experimentation in Translation Studies: Achievements, Prospects and some Pitfalls". Tirkkonen-Condit, Sonja, ed. 1991: 45-66.
Toury, Gideon. 1995. Descriptive Translation Studies and Beyond. Amsterdam and Philadelphia: John Benjamins Publishing Company.
Treisman, Anne. 1965. "The Effects of Redundancy and Familiarity on Translation and Repeating Back a Native and Foreign Language". British Journal of Psychology 56. 369-379.
Viezzi, Maurizio. 1989. "Information retention as a parameter for the comparison of sight translation and simultaneous interpretation: an experimental study". The Interpreter's Newsletter 2. 65-69.
Williams, Sarah. 1995. "Observations on Anomalous Stress in Interpreting". The Translator 1:1. 47-64.
Wilss, Wolfram. 1996. Knowledge and Skills in Translator Behavior. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Yoshitomi, Asako and Kiwa Arai. 1991. "Tsûyaku katei ni kansuru kisô jikken" ("A basic experiment on the interpreting process"), in Watanabe, Shoichi, ed. 1991. Mombushô josei kagaku kenkyû: gaikokugokyôiku no ikkan to shite no tsûyakuyôsei no tame no kyôikunaiyôhôhô no kaihatsu ni kansuru sôgôteki kenkyû (Report on a Project funded by the Ministry of Education: A Comprehensive Study of the Development of Interpretation Training Methodology as a Part of Foreign Language Training). Unpublished manuscript. Sophia University, Tokyo. 373-439.