ALFRED P. SLOAN SCHOOL OF MANAGEMENT
WORKING PAPER

Using the Literature in the Study of Emerging Fields of Science and Technology: An Examination of Researcher Contribution-Spans

Michael A. Rappa, Massachusetts Institute of Technology
Raghu Garud, New York University

July 1991
Sloan WP # 3392-92

© 1991 Massachusetts Institute of Technology

Alfred P. Sloan School of Management
Massachusetts Institute of Technology
50 Memorial Drive, E52-538
Cambridge, MA 02139-4307

ABSTRACT

This study describes how the scientific and technical literature can be used as a source of data to analyze the emergence of new fields. In particular, the literature is used to determine the length of contribution spans of individual researchers in a field. Through the use of statistical techniques, such as survival analysis, the authors examine the probability that a researcher will contribute to the field for a specified length of time and the probability that a researcher, having contributed for a specified period of time, will cease to contribute to the field in the future. The authors also test the significance of several variables in explaining researchers' contribution spans.

INTRODUCTION

New fields of science and technology typically emerge in the context of a community of individuals who develop a set of common ideas and techniques and who communicate with each other about the nature of their work.
Frequently, a portion of the communication between these individuals takes the form of conference presentations and scientific papers, which then become documented in the scientific and technical literature. When used to its fullest and in conjunction with other kinds of data, this information can be extremely helpful in improving our understanding of the dynamics of a research community.

Michael Rappa is assistant professor of management with the Massachusetts Institute of Technology. Raghu Garud is assistant professor of management with New York University. The authors are indebted to Susan Bjarner for her assistance in conducting the literature searches and Gary Quick for his assistance in creating the database. The authors also thank Joel Baum, Hayagreeva Rao and Robert Yaffee for helpful comments and guidance, and Edward Roberts for his support and encouragement. This study was funded, in part, with a grant from the Council on Library Resources.

The following study, which serves as one illustration of this approach, uses data from the literature to measure the length of time researchers remain in a field, thereby determining statistically their survival and hazard rates as well as some of the factors associated with their longevity. In particular, it examines the duration of the participation of individual researchers in a field through an analysis of their "contribution-spans": that is, the time span between their first and last contribution to the published literature in a given field. From an analysis of the contribution-spans of researchers in the field of cochlear implants, estimates are made of: (1) the probability that a researcher's contribution-span will extend a given number of years, and (2) the probability that, having contributed a given number of years, a researcher will cease to contribute in the future.
Furthermore, this paper examines the relevance of a number of individual-, organizational-, and community-related variables in explaining the duration of researchers' contribution-spans. Before elaborating upon the concept of contribution-spans and discussing the procedures and results of this study, a brief review of earlier studies that have used the literature in assessing the growth of new fields is in order.

Prior Literature-based Studies of Emerging Fields

The use of the literature to understand the growth of a field has a long and well-established tradition. When Cole and Eales published their statistical study of the development of comparative anatomy in 1917, they were among the first to seriously utilize the published literature in order to quantify the progress in a field.¹ Given the technology of the day, it was truly a painstaking effort by the two scientists, who examined nearly 6,500 books and papers to compile their data. The result was a detailed account of the ebb and flow of research in comparative anatomy over three centuries. Cole and Eales clearly illustrated the prevalence and magnitude of cyclical changes in the level of publication activity that occur in a field over time. This finding led them to suggest that such waves of activity can be attributed, in part, to the movement of individuals between different research fields.

The pioneering effort of Cole and Eales was joined subsequently by Wilson and Fred, who published a study in 1935 of the growth and development of research on the subject of nitrogen fixation by plants.² It is clear that Wilson and Fred had an intuitive grasp of the value of the literature as something more than a means of communicating research results among scientists. "Study of the literature as an entity...," they claim, "...should provide valuable information for the interpretation of past production and might afford some basis for prediction of future trends" within a given field.³
Like their predecessors, Wilson and Fred showed empirical evidence of the cyclical nature of research within a field. Similarly, they suggest that the fluctuating level of publication activity in nitrogen fixation research was a result of the movement of researchers between fields. They claim that the level of activity would rise with an important breakthrough as it attracted new scientists to the field and led to further experiments; afterward it would subside as scientists found the remaining problems too complicated to solve.

It was not for another two decades that the study of the literature would once again receive serious attention. The one most responsible for this resurgence was Price, who in 1956 used the physics literature to document the "exponential growth of science."⁴ Price's work brought legitimacy to the scholarly examination of literature, partly because his conclusions coincided with, and indeed reinforced, the belief that the proliferation of science seemingly knew no bounds.

More recently, literature-based studies of emerging fields of science and technology have become fairly common. The nature of this research is quite diverse. However, in general, there have emerged two basic types of studies. The first seeks to model the growth of a field by measuring annual publication volume.⁵ The second is quite different in that it uses the citation as a unit of analysis.⁶ Citation-based studies seek to understand the interlocking nature of citation patterns as a means for observing the development of clusters of researchers who may ultimately come to form the basis for a new field. Both types of literature studies cover a wide range of topics in science and technology, although attention to the latter has recently led to a more intensive examination of the patent literature.⁷

The contributions of Cole and Eales, Wilson and Fred, and Price, among many others, have established a tradition of research that uses the literature to assess the growth and direction of a field. However, having led the way in revealing the promise of using the literature, their work has exposed some of its limitations, as well. Perhaps the most troublesome criticism of prior studies is that the literature is a singular, coarse-grained indicator of the level of research activity in a field. While it may be true that the literature offers a useful measure of scientific and technological output, the focus on publication and patent statistics alone tells us little else about the process of the emergence of a new field.

Although the criticisms are valid, it would be unfortunate if they were to dampen interest in the use of the literature for the study of new fields of science and technology. The scientific literature is a rich source of information. Many of the limitations of previous studies are not necessarily inherent in the data, but rather they arise from the inability of these studies to exploit the full potential of the literature. However, this situation is changing. Recent advances in computer hardware and software are making it possible to use the literature with greater ease and sophistication, enabling investigators to go beyond measures of publication volume and citations to take full advantage of the information it contains.

Once the perspective is shifted from treating the literature as something to be measured in and of itself, to using the literature as a source of data about what goes on in research communities, many new avenues of research become readily apparent. In this vein, a small number of scholars have sought to use the literature as a source of data about research communities and the emergence of new fields. One pioneering effort is Mullins' 1972 study of molecular biology, in which he takes advantage of the incidence of co-authorship in the literature to understand the communication network that forms among researchers in the field.⁸ Taking a different approach, Comroe and Dripps show how the content of the literature can be analyzed to understand the contribution of long-term basic research to major advances in clinical medicine.⁹ Yet another approach is that of Spiegel-Rosing, who uses the literature to identify individual researchers in order to compare the level of scientific manpower in different countries.¹⁰ Notice that in each of these instances, the investigators went beyond numbers of publications and instead probed the literature for specific information contained therein: that is, about the people involved, their interaction with each other, or the nature of their work.

These studies have only just begun to open up the abundance of information that is contained within the literature. Journal articles, conference papers, and patents in a given field represent a detailed, self-reported archival record of the effort generated by researchers to solve the scientific and technical problems confronting them. Furthermore, the literature is an appealing source of data in several respects: the conventions of publication ensure a level of quality and authenticity; the data can be collected unobtrusively; the findings can be replicated and tested for reliability; and the data are publicly available and relatively inexpensive to collect. Clearly, it would be very difficult to match the comprehensive scope and longitudinal nature of the literature using other data collection techniques.

When taken together, the literature can be viewed as a unique chronology of the efforts of researchers to establish a new field, and can provide information with respect to the individuals involved, where they are employed, who they collaborate with, what problems they are pursuing, and when they were active in the field. This paper illustrates how such data can be used to obtain a better understanding of the longevity of researchers' contributions during the emergence of a new field.

¹ F.J. Cole and N.B. Eales, "The History of Comparative Anatomy, Part I. A Statistical Analysis of the Literature," Science Progress 11 (1917): 578-96.

² P.W. Wilson and E.B. Fred, "The Growth Curve of a Scientific Literature: Nitrogen Fixation by Plants," Scientific Monthly 41 (1935): 240-50.

³ Ibid., p. 240.

⁴ D.J. Price, "The Exponential Curve of Science," Discovery 17 (1956): 240-43.

⁵ For example, see A. Granberg, "A Bibliometric Survey of Fiber-Optics Research in Sweden, West Germany, and Japan," Monograph, Research Policy Institute, University of Lund, Sweden, 1985.

⁶ The notion of "bibliographic coupling" was proposed by Kessler in 1962. Since then, numerous "co-citation" studies have been published, including the pioneering work of Small and his colleagues. For a recent example, see H. Small and E. Greenlee, "Collagen Research in the 1970s," Scientometrics 10 (1986): 95-117. A variant of co-citation analysis is co-word analysis (alone or in combination with co-citation), sometimes referred to as "science mapping." See P. Healy, H. Rothman, and P.K. Hoch, "An Experiment in Science Mapping for Research Planning," Research Policy 15 (1986): 233-51; R.R. Braam, H.F. Moed, and A.F.J. van Raan, "Mapping of Science: Critical Elaboration and New Approaches," Informetrics (1987/88): 15-28; and M. Callon, J. Law and A. Rip (eds.), Mapping the Dynamics of Science and Technology (London: Macmillan Press, 1986).

⁷ For an overview, see B.L. Basberg, "Patents and the Measurement of Technological Change: A Survey of the Literature," Research Policy 16 (1987): 131-41. An early study of patents is provided by W.D. Reekie, "Patent Data as a Guide to Industrial Activity," Research Policy 2 (1973): 246-64; for a more recent study, see R.M. Wilson, "Patent Analysis using Online Databases: Technological Trend Analysis," World Patent Information 9 (1987): 18-26.

⁸ N.C. Mullins, "The Development of a Scientific Specialty: The Phage Group and the Origins of Molecular Biology," Minerva 10 (1972): 51-82. Since Mullins, others have sought to use co-authorship data to investigate collaboration among researchers, including constructing a map of the inter-organizational linkages within a community. For example, see D. deB. Beaver and R. Rosen, "Studies in Scientific Collaboration," Scientometrics 1 (1978): 63-84; M.S. Sridhar, "A Study of Coauthorship and Collaborative Research Among Indian Space Scientists," R&D Management 15 (1985): 243-49; and M.A. Rappa, "Assessing the Emergence of New Technologies: The Case of Compound Semiconductors," in A. Van de Ven et al., Research on the Management of Innovation (Cambridge, Mass.: Ballinger, 1989).

⁹ J.H. Comroe, Jr. and R.D. Dripps, "Scientific Basis for the Support of Biomedical Science," Science 192 (1976): 105-11.

¹⁰ I.S. Spiegel-Rosing, "Journal Authors as an Indicator of Scientific Manpower: A Methodological Study Using Data for the Two Germanies and Europe," Science Studies 2 (1972): 337-59.

Researcher Contribution-spans

The challenge to understanding the emergence of new fields of science and technology is, in part, a problem of understanding why researchers choose the topic they do, and why they remain committed to that topic or leave it for something else. The fundamental assumption is that fields which attract and retain researchers are likely to progress more quickly than those that are less attractive and are therefore less able to recruit and retain researchers. Furthermore, it is reasonable to assume that researchers' decisions to remain with a field are influenced by their assessment of the rate of progress being made.
Given this behavior, it may be interesting to use the literature to understand in a statistical manner the duration of researchers' participation in a given field and the factors that may influence it.

Determining the number of years a researcher spends working in a field is not a simple task. First, it requires some degree of historical investigation: if we simply ask everyone who currently works in a particular field how long they have participated, we would overlook the experience of numerous individuals who have already come and gone over the years. Second, the number of individuals who have participated in a field can be quite large and can be widely dispersed geographically, thereby making it very hard to ascertain the extent of their participation. Confronted with these constraints, it becomes appealing to look to the literature as a source of data about who participates in a field and for how long. Thus, the "contribution-span" (that is, the number of years spanning an author's first and last known publications in a field) can serve as a unique and useful measure of the duration of their participation in a field.

Using the statistical techniques of survival analysis, data on researcher contribution-spans derived from the literature can be used to estimate the probability that a researcher will remain in the community a given number of years (the survival rate). Furthermore, the data can also be used to estimate the conditional probability of a researcher leaving the field after contributing a given number of years (the hazard rate). However, once the distribution of the survivor and hazard functions is understood, it becomes pertinent to ask what factors might influence researcher contribution-spans. Here again, the literature can offer some insights in terms of providing additional data regarding each researcher.
For example, the length of an individual's contribution-span might be affected by: (1) their own or their organization's cumulative productivity in the field; (2) the size of their collaborative social network; (3) the size of their organization's research effort in the field; (4) the kind of organization in which they are employed and its geographic location; (5) the number of other researchers working in the field and the extent of their dispersion among different organizations; (6) the accumulated amount of knowledge in the field; and (7) the evolutionary stage of development of the field. Using data from the literature, the statistical relationship between each of these factors and researchers' contribution-spans will be examined.

The Cochlear Implants Field

The field of cochlear implants was chosen as an illustrative case for the present analysis, but it is just one of several fields that the authors are studying as part of a larger research program on the assessment of emerging areas of science and technology.¹¹ Although studies of the electrical stimulation of the ear have a long history, it was not until the 1950s that researchers began the systematic investigation of how electrical stimulation might provide a means for enabling individuals with sensorineural deafness to gain some sense of hearing. One result of this research was the advent of the cochlear implant, a device that uses electrical stimulation of the cochlea to provide a sense of sound for profoundly deaf individuals.¹²

¹¹ For a full account of the historical development of cochlear implants, see R. Garud and A.H. Van de Ven, "Technological Innovation and Industry Emergence: The Case of Cochlear Implants," in Van de Ven, et al., Research on the Management of Innovation (Cambridge, Mass.: Ballinger, 1989).
One of the first clinical trials of a rudimentary cochlear implant device was performed in 1961 by William House, a physician, who later founded the House Ear Institute, a major center for cochlear implant research. At first, few researchers took an interest in cochlear implants, largely due to the intense controversy the field encountered. Many considered the device to lack a scientific basis and believed that it was too premature for human experimentation. It was not until the early 1970s that the controversy subsided and cochlear implants emerged as a viable research field. A flurry of meetings held in 1973 galvanized the burgeoning field, including a workshop held as part of the American Otological Society meeting (a group that was adamantly opposed to the topic in previous years), a major international conference on the subject, and a workshop at the University of California, San Francisco. As a result of these meetings, a limited number of clinical trials began.¹³

One study of particular importance is Bilger's work, initiated in 1975, on the comparative performance of cochlear implants.¹⁴ Funded by the U.S. National Institutes of Health and released in 1977, the "Bilger Report" showed that although the early claims regarding performance were exaggerated, cochlear implants did indeed have merit. In its conclusions, the report lent some sorely needed legitimacy to the field and thereby opened the door for a greater number of researchers to participate. This growth in participation can be seen clearly in terms of the number of researchers contributing to the literature (see Figure 1).¹⁵

It is estimated that by 1990 there were about four-hundred individuals active in the field, who were employed in nearly one-hundred different organizations worldwide. About forty-[…] percent of the cochlear implant community is located in the U.S. Although tremendous progress has been made over the years and about three-thousand patients have received the cochlear devices, the cochlear implant remains largely an experimental procedure with many challenging problems to be overcome.¹⁶

FIGURE 1: Growth of Cochlear Implant Research Community, 1973-89

¹² A cochlear implant is composed of a microphone, signal processor, and transmitter worn on the outside of the body, and a receiver that is surgically implanted behind the ear, just under the skin, either into the cochlea or placed around it. For a detailed description of cochlear implants see: Gerald E. Loeb, "The Functional Replacement of the Ear," Scientific American, 225(2):104-11.

¹³ Proceedings of the First International Conference on the Electrical Stimulation of the Acoustic Nerve as a Treatment for Profound Sensorineural Deafness in Man, San Francisco, California, 1974 (edited by M.M. Merzenich, R.A. Schindler and F. Sooy); Report on a Workshop on Cochlear Implants held at the University of California at San Francisco, October 23-25, 1974 (edited by M.M. Merzenich and F. Sooy).

¹⁴ R.C. Bilger, et al., "Evaluation of Subjects Presently Fitted With Implanted Auditory Prostheses," Ann. Otol. Rhinol. Laryngol., (Suppl. 38) 86 (1977): 3-10.

¹⁵ The number of researchers in the community in a given year is calculated to be the cumulative number of individuals entering the field (as evidenced by an initial publication) minus the cumulative number of researchers who have left the field (as evidenced by their failure to continue to publish in a future year).

¹⁶ Several researchers have provided historical accounts of the field of cochlear implants and its technical evolution. For example, see W.F. House and K.I. Berliner, "Cochlear Implants: From Idea to Clinical Practice," in H. Cooper, ed., Practical Aspects of Cochlear Implants (London: Taylor and Francis) in press; and F.B. Simmons, "History of Cochlear Implants in the United States," in R.A. Schindler and M.M. Merzenich, eds., Cochlear Implants (New York: Raven Press) 1985.

DATA COLLECTION AND METHODS

Five commercial electronic databases covering the medical science and engineering literature were used to identify publications related to the field of cochlear implants.¹⁷ The databases were searched on-line using a set of key terms that are known to be commonly used in the lexicon of cochlear implant researchers and might be either in the title, abstract or classification terms of a document. The search strategy was derived from an earlier one used by the National Library of Medicine to create a research bibliography on cochlear implants.¹⁸

¹⁷ The databases used are Medline, Compendex, Biosis, Excerpta Medica and INSPEC. A total of 3064 documents were originally identified as related to cochlear implants. After implementing an on-line duplication removal process, the database was reduced to 1884 documents.

¹⁸ K. Patrias and R.F. Naunton, National Library of Medicine, Current Bibliographies in Medicine: Cochlear Implants, January 1983-March 1988 (Washington, DC: U.S. Department of Health and Human Services, National Public Health Service, National Institute of Health, 1988).
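As an aside on the head count plotted in Figure 1: the rule stated in the note on community size (cumulative entrants minus cumulative leavers) can be reproduced from each author's first and last publication years alone. The following is a minimal sketch in Python, not the authors' own procedure. The function name `community_size_by_year`, the author identifiers, and the convention that an author counts as present from the first through the last publication year inclusive are all assumptions made for illustration.

```python
def community_size_by_year(first_last, start, end):
    """Head count per year: cumulative entrants minus cumulative leavers.

    first_last maps a (hypothetical) author id to (first_year, last_year).
    Assumption: an author has "left" in every year after the last
    known publication year.
    """
    sizes = {}
    for year in range(start, end + 1):
        # Everyone whose first publication is in or before this year has entered.
        entered = sum(1 for first, _ in first_last.values() if first <= year)
        # Everyone whose last publication is before this year has left.
        left = sum(1 for _, last in first_last.values() if last < year)
        sizes[year] = entered - left
    return sizes


# Illustrative records only, not data from the study.
authors = {"a": (1973, 1975), "b": (1974, 1974), "c": (1975, 1989)}
sizes = community_size_by_year(authors, 1973, 1976)
```

Near the end of the observation window this rule would undercount authors who are still active but have not yet published again, which is the same difficulty that motivates censoring in the survival analysis.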
The cochlear implant documents retrieved electronically from the search were temporarily placed in a bibliographic relational database operating on a personal computer. This allowed for a careful inspection of each document in order to ensure the accuracy and integrity of the search procedure. Since multiple source databases were used, it was necessary to remove duplicate documents. In addition, while inspecting the database, an effort was made to remove misclassified documents that did not pertain to cochlear implants as well as those that did not have a research orientation, such as editorials or journalistic documents. In the process of inspecting the documents, any that seemed inappropriate were flagged, so that an individual active in cochlear implant research could make the final judgment as to their relevance.

The data collection procedure described above ultimately resulted in the identification of 1,329 unique documents related to cochlear implants published between 1973 and 1989.¹⁹ The data subsequently used in this study were derived from these documents. However, before the documents could be used as a source of data, they required extensive editing in order to create consistency among author names and affiliation names. It is frequently the case that the name of an author or an affiliation is not standardized across documents, especially when using different commercial databases. Sometimes this arises because of misspellings, but mostly this is the result of variations in the use of abbreviations, middle initials, capitalizations, hyphenations, and other sources of inconsistency.²⁰ Although such a lack of standardization might not be a problem for the typical user of an electronic literature database, it would be a major source of error in determining the duration of researcher contribution-spans.
Therefore, author and affiliation in it was essential to meticulously the relational database so that all inspea the name of each inconsistencies could be eliminated. Upon completing the editing of the documents, the database was used individual author who procedure yielded a total contributed to the field of 1,257 authors. At to identify each over the seventeen-year period. This this stage, a statistical database was created containing several individual-, organizational-, and community-level variables for each '-T^otc that the electronic databases do not provide any indication of the wori< in cochlear implants prior to 1973- This due It in part to the fact that also is due many eienronic services. Unfortunately, this ^he literature daubaie to the lack of publication of the pre-seventies prevalence of author is services did not begin until the late sixties work in one limitation of the electronic databases name inconsistencies is examined by Research," National On-Line Meeting (10th) Proceedingi, May M.L 9-11, and is early seventies. cochlear implants in journals abstracted by the database 1 that is extremely difficult and cosdy to circumvent. Pao, "Importance of Quality Data for Bibliomctric 989. 10 author that were derived from information obtained from the published documents. Table provides a of the variables and their definitions. list The dependent number of variable for the analysis, the contribution-span, years that have elapsed from the some methodological is is these individuals have not yet account for publication for each issues that arise that require further explanation. TTie that for those researchers is calculated as the is relatively straightforward, there are who aaive are left the field, some minimum it is value (that this, survival analysis statistics is, indeterminate: that is known only primary issue of cochlear implants in the field present time, the ultimate length of their contribution-span contribution-span known to the last first author. 
21 Although calculating the contribution-span concern 1 at the is, since that the length of their the entry year to the present year). were implemented in of To analysing the data.^^ Such techniques take into consideration precisely this kind of problem in the calculations with a procedure that adjusts for the biases that right-censored data Determining whether or not certain cases. The reason for this a researcher is is still create. aaive that typically researchers in the field can be difficult in do not publish every and year, indeed, an author's contribution-span in a field can be characterized by "gaps" of several which there years in duration in no publications are discontinuities in publication records raise the issue of to their credit. how publish in order to be considered an active contributor to the field. nature and prevalence of gaps in a researcher's contribution-span the proper censoring scheme to use in the analysis. someone answer to ceases to publish this question is is it The The existence of frequently a researcher must is Thus, understanding the important question arises: in determining How long after reasonable to assume they are no longer in the field? necessary in order to determine who has exited the field and The who continues to be a panicipant. Normally, the time between publications can be quite long. Therefore, a rule is after the last publication in order for exited the field. An may be a year or two, but in some instances required to determine it how many to be reasonable to classify a researcher as having analysis of the frequency of gaps between publication in researcher contribution-spans provides the evidence on which to base this decision (see Figure For example, be caJculated Note if a researcher as six years. first is 2). The in 1975 and last published in 1980, the researcher's contribution span would assumed that a researcher who publishes in only one year has a span of one year. 
published Furthermore, that the contribution span it years should transpire it is unaffected by the frequency of publication within a given year.

[22] See R.C. Elandt-Johnson and N.L. Johnson, Survival Models and Data Analysis (New York: John Wiley & Sons, 1980) and J.D. Kalbfleisch and R.L. Prentice, The Statistical Analysis of Failure Time Data (New York: John Wiley & Sons, 1980).

The cochlear implant data indicate that 242 (19%) researchers have a gap in their contribution-spans. Among those researchers who have a gap in their contribution-span, four out of five have a gap of three years or less in duration. Based upon this evidence, it was decided that researchers who published within the past three years of the last year of the data set (1989) would be censored.

FIGURE 2: Frequency of Gaps Between Publication in Researcher Contribution-spans (horizontal axis: Duration of Gaps in Contribution Span, in years)

Only a small number (3.7%) of all researchers have a gap between publications of longer than three years in duration, and still fewer have an exceptionally large gap, such as ten years or more. Although rare, these "sparse" contribution-spans may be indicative of an individual who does not in fact continuously contribute to the field; and thus, although their contribution-span might be quite long, their actual participation in the field is far less than the span implies. Because of their relative rarity in the present data, researchers with sparse contribution-spans were not treated as special cases in the analysis. Nevertheless, it is advisable to address this issue more closely in future studies.

Having determined the distribution of contribution-spans, it is interesting to examine what factors might affect how long a researcher contributes to the field. Using the literature, a number of explanatory variables were constructed. Although the explanatory variables could be treated as time-varying covariates (that is, as variables having values that vary yearly in the course of an author's contribution-span), the present analysis implements a "single-spell" approach to formulating the data set.[23] Therefore, the value for each explanatory variable is taken according to the last year in the author's contribution-span.

In this manner, several variables were created to control for certain factors that might account for heterogeneity among researchers. First, the kind of organization in which each author was employed is coded according to whether they reside in an academic, industrial, government, or independent research laboratory.[24] Second, the country in which the author was located is coded. Since nearly half of the authors were based in the United States, to simplify the analysis, a location variable was coded as to whether the author resided in the U.S. or not. Third, a variable was created to determine whether the author left the field during the "pioneering" phase (considered to be prior to the publication of the Bilger Report), or during the "rapid growth" phase that occurred from 1978 onwards. The last control variable is a measure of the accumulated knowledge in the field, as determined by the cumulative number of publications.

Additional explanatory variables were created, which are grouped according to their level of analysis: individual-, organizational-, and community-level. At the individual-level, a variable was constructed to reflect an author's cumulative productivity in the field as measured by the cumulative number of publications to their credit. In addition, a variable was created to reflect the extent to which an author is embedded in a larger social network of collaborators. This network is measured in terms of the cumulative number of unique individuals with which an author has been associated as a co-author on publications.

Two organizational-level variables were created to reflect the size and productivity of an author's institutional affiliation. The size of an author's organization is measured as the number of individuals affiliated with that organization who also publish in the field. Cumulative organizational productivity is measured by the cumulative number of cochlear implant publications by individuals affiliated with an author's organization.

Three community-level variables were created to reflect the size and dispersion of the cochlear implant field in each year. Population size is measured in terms of the number of individual authors who publish in the field in a given year. A second-order variable, the square of population size, was created in order to capture any quadratic association between population size among authors and contribution-span. The third variable is a measure of dispersion of researchers among different organizations: that is, the extent to which the community is concentrated in a few organizations or spread across many. For this purpose, a Herfindahl statistic, which is determined by calculating the sum of the squared share of researchers affiliated with each organization, is used.

[23] An analysis that implements time-varying covariates would require a multiple-spell data structure. This work is currently being performed and the results will be described by the authors in a future report.

[24] It is important to note that one unfortunate limitation of electronic literature databases is the tendency of providing affiliation data for the lead author only. Therefore, in some instances secondary authors who are affiliated with a different organization may be misclassified.
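As an illustration of the dispersion measure described above, the following is a minimal sketch of a Herfindahl statistic computed from a list of organizational affiliations, one entry per researcher active in a given year. The function name and the toy affiliation labels are hypothetical and are not taken from the study's database; the sketch only assumes the definition given in the text (the sum of squared shares of researchers per organization).

```python
from collections import Counter


def herfindahl(affiliations):
    """Return the sum of squared shares of researchers per organization.

    affiliations: one organization label per researcher (hypothetical input
    format; the study's actual data layout is not described in the text).
    A value of 1.0 means all researchers share one organization; values near
    1/k mean researchers are spread evenly over k organizations.
    """
    counts = Counter(affiliations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one researcher is required")
    return sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Toy example: four researchers, two of them in the same organization.
    print(herfindahl(["Org A", "Org A", "Org B", "Org C"]))
```

Under this definition a higher value indicates a more concentrated community, so a positive coefficient on the statistic corresponds to longer contribution-spans in more concentrated communities.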
RESULTS

Using data from the scientific and technical literature on cochlear implants published between 1973 and 1989, the contribution-spans for 1,257 researchers and several explanatory variables associated with each were compiled into a database. The data were statistically analyzed using the LIFETEST and LIFEREG procedures of SAS (version 5.18). Of the 1,257 cases, 700 (55.7%) were active within three years of the last year of the data, and were therefore classified as censored.

Using the LIFETEST procedure, the first step in the analysis was to make non-parametric estimates of the survival and hazard functions for the data. The lifetable approach was chosen. The results of this procedure are illustrated in Figure 3. The survival function is negatively-sloped and non-monotonic. The median survival time is 5 years: that is, half the sample leave the field within five years of their first publication. The probability of a researcher's contribution-span lasting longer than two years is about 0.6. The survival rate continues to diminish rapidly between two and six years, and then levels off, eventually reaching a value of about 0.33 for contribution-spans of ten years or more.

The hazard rate is also a negatively-sloped non-monotonic function. The hazard rate decreases very rapidly for researchers who have contribution-spans of at least two years: that is, the probability of a researcher ceasing to contribute after having contributed for two years is only about 0.08 compared to about 0.25 for a researcher in the field only one year. The hazard rate stabilizes for researchers whose contribution-spans are between two and six years, and then diminishes rapidly between six and twelve years. The basic implication of the hazard function is that the longer a researcher contributes to the field, the less likely he or she is to leave it, with the first and sixth years being particularly critical points. Indeed, the risk of leaving the field is highest within the first year.
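The lifetable estimates discussed above can be sketched as follows. This is a minimal illustration of the standard actuarial (life-table) estimator, in which censored cases count as half an observation in the interval where they are withdrawn; it is not the SAS LIFETEST implementation used in the study. The function name, the input layout, and the toy data are assumptions made for the example.

```python
def life_table(durations, censored, width=1):
    """Actuarial (life-table) estimates of survival and hazard.

    durations: observed contribution-spans in years (non-negative numbers).
    censored:  parallel list of booleans, True if the span is censored.
    width:     interval width in years.
    Returns one dict per interval [start, start + width).
    """
    if not durations or len(durations) != len(censored):
        raise ValueError("durations and censored must be non-empty and equal in length")

    n_intervals = int(max(durations) // width) + 1
    events = [0] * n_intervals      # exits from the field in each interval
    withdrawn = [0] * n_intervals   # censored cases in each interval
    for t, is_censored in zip(durations, censored):
        k = int(t // width)
        if is_censored:
            withdrawn[k] += 1
        else:
            events[k] += 1

    rows = []
    at_risk = len(durations)
    survival = 1.0  # probability of surviving to the start of the interval
    for k in range(n_intervals):
        # Censored cases are assumed to be at risk for half the interval.
        effective = at_risk - withdrawn[k] / 2.0
        q = events[k] / effective if effective > 0 else 0.0
        # Hazard at the interval midpoint: events per unit of exposure time.
        exposure = width * (effective - events[k] / 2.0)
        hazard = events[k] / exposure if exposure > 0 else 0.0
        rows.append({
            "start": k * width,
            "at_risk": at_risk,
            "events": events[k],
            "censored": withdrawn[k],
            "survival_at_start": survival,
            "hazard": hazard,
        })
        survival *= 1.0 - q
        at_risk -= events[k] + withdrawn[k]
    return rows


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the study).
    spans = [1, 1, 2, 3, 3, 5]
    is_censored = [False, False, True, False, False, True]
    for row in life_table(spans, is_censored):
        print(row)
```

Applied to real data, each row gives the estimated probability of still contributing at the start of the interval and the estimated rate of leaving the field during it.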
FIGURE 3: Non-parametric estimation of survival and hazard functions for researcher contribution-spans in cochlear implants (horizontal axis: Year)

The next step in the analysis was to determine the parametric model that best fits the distribution of contribution-spans (or, in the terminology of survival analysis, the failure time). Although non-parametric analysis permits certain assumptions that can be made about the shape of the failure distribution (for instance, that it is non-monotonic), nonetheless it was decided to statistically examine several different distributions for goodness of fit. The basic model adopted for the analysis is:

$$Y = X\beta + \sigma\epsilon$$

where $Y$ is the log of the contribution-span (the failure time), $X$ is the matrix of explanatory variables, $\beta$ is a vector of unknown regression parameters, $\sigma$ is a scale parameter and $\epsilon$ is a vector of errors from an assumed distribution. This model is often referred to as an accelerated failure time model because the effect of the explanatory variables is to scale a baseline distribution of failure times.

Specifically, four different types of distributions were evaluated: the exponential, Weibull, gamma, and log-logistic distributions. The results of this procedure are provided in Table 2. The parameters are estimated by maximum likelihood using a Newton-Raphson algorithm. The overall fit of each model is represented by the log-likelihood function. Minus two times the log-likelihood value has a $\chi^2$ distribution with appropriate degrees of freedom. Using the baseline model, the goodness of fit for each distribution is evaluated in terms of minimizing the absolute value of the log-likelihood score. As a result, the log-logistic distribution was chosen and became the basis for estimating the regression coefficients of the explanatory variables in the model.

The model was estimated with the LIFEREG procedure in a sequence of six steps by adding explanatory variables into the equation according to the level of analysis (see Table 3): the control variables are added first, followed by the population variables, the organization variables, and the individual variables. Model 1, which did not include any explanatory variables, has a log-likelihood score of -1299. The addition of each set of explanatory variables has the effect of improving the log-likelihood score, such that Model 6, which has a score of -295, was chosen as the baseline for comparing other models to understand the effect of each explanatory variable.

The estimation results of Model 6 are provided in Table 3. Among the dummy variables that control for organization type, only the variable that distinguishes research institutes from other types of organizations is significant (p<.05 level). Its positive coefficient suggests that researchers employed in research institutes have longer contribution-spans as compared to researchers with other types of affiliations. However, it should be noted that industrial researchers represent only a small segment of the sample (3%); the largest segment being academic (58%). In addition, the dummy variable that distinguishes between US and non-US researchers is non significant, implying that the contribution-spans of US researchers are no different than their counterparts in other countries.

The remaining control variables, the phase and cumulative publication, are both highly significant (p<.001 level). The negative sign of the phase variable implies that researchers who were active prior to the field's legitimacy (before 1978) have shorter contribution-spans than those who were active in the growth phase.[25] The coefficient of cumulative publication suggests that the greater the accumulated number of publications in the field, the longer the contribution-spans.

[25] Upon careful reflection, the true effect of this variable is difficult to ascertain given the structure of the present analysis. Understanding the issue of phases of growth in the community might be more appropriately addressed with the use of time-varying covariates.

In the case of the population variables, the first- and second-order population terms are significant right from their initial inclusion in Model 3. (The third community variable, dispersion, is not significant.) The negative coefficient for population size combined with the positive coefficient for the second-order term implies a U-shaped relationship between population size and researcher contribution-spans (see Figure 4). The data indicate that when the community is small (< 250 researchers), population size is negatively related to contribution-spans and increasingly so until it reaches a size of about 250 individuals; at which point the slope of the curve turns positive. This result suggests that there may be a point of critical mass for the community, at which its size becomes sufficiently large to be associated with increasing contribution-spans. The population dispersion variable is also significant (p<.001) and has a positive coefficient. Thus, the more dispersed researchers are among organizations, the shorter are researchers' contribution-spans.

FIGURE 4: Relationship between population size and researcher contribution-spans (horizontal axis: Population size)

The second group of explanatory variables, which examine organizational-level factors, are added in Model 5, and are both non significant at first. In Model 6, organizational size is weakly significant (p<.05) and has a negative coefficient. This suggests that the larger the number of researchers working in cochlear implants in one's organization, the shorter is a researcher's contribution-span. The cumulative productivity of a researcher's organization is not significant. The relative lack of significance of organization variables is noteworthy given the continued significance of the population variables: it is the size of the entire community, as opposed to a researcher's own organizational effort, which is the contributing factor to the length of contribution-spans.

The last two explanatory variables, the cumulative number of co-authors and an author's cumulative productivity, are highly significant (p<.001 level). The data indicate, as might be expected, that researchers who accumulate a greater number of publications will have longer contribution-spans. Perhaps more interesting is the significance of the cumulative number of individual co-authors a researcher is associated with in the course of their contribution-span. The data show that researchers with a larger network of co-authors will have a longer contribution-span. Since the co-author network can be viewed as a measure of the extent to which a researcher is socially embedded into the research community as a whole, this result suggests that the degree to which researchers are connected in collaborative activities with their peers in the community is a contributing factor to their longevity in the field.

CONCLUSION

Using the literature as a source of data, this paper provides an analysis of the contribution-spans of researchers in the emerging field of cochlear implants. Non-parametric estimates of the survival rate and hazard rate are made, and it is found that the distribution of contribution-spans follows most closely a log-logistic function. In addition, a model of the relationship between the hazard and a set of explanatory variables is examined.

The statistical findings of this analysis indicate that the sample survival and hazard functions for 1,257 cochlear implant researchers are negatively-sloped and non-monotonic.
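For reference, the log-logistic accelerated failure time form referred to above can be written out explicitly. This is the standard textbook parameterization, stated here as an assumption rather than as the notation of the original tables: with $\log T = x\beta + \sigma\epsilon$ and $\epsilon$ following a standard logistic distribution, the survival and hazard functions of the contribution-span $T$ for a researcher with covariates $x$ are

$$S(t \mid x) = \frac{1}{1 + \left(t\,e^{-x\beta}\right)^{1/\sigma}}, \qquad h(t \mid x) = \frac{(1/\sigma)\,t^{-1}\left(t\,e^{-x\beta}\right)^{1/\sigma}}{1 + \left(t\,e^{-x\beta}\right)^{1/\sigma}}.$$

Setting $S(t \mid x) = 1/2$ gives a median contribution-span of $e^{x\beta}$, so a positive coefficient lengthens the median span multiplicatively. Under this form the hazard declines monotonically when $\sigma \ge 1$ and first rises and then falls when $\sigma < 1$.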
About 60 percent of the researchers have a contribution-span of more than two years, and the estimated median contribution-span is five years. The risk of a researcher ceasing to contribute to the field is greatest in the first year of their contribution-span. The hazard rate declines sharply after the first year and remains fairly constant until the sixth year, after which it continues to decline rapidly.

This result might have an explanation in a behavioral characteristic of the research process. For example, a researcher might be drawn to enter a new field by the prospect of making an important contribution and by the possibility of attracting the resources to underwrite the cost of his or her work. Barring success at either, the researcher might exit the field shortly thereafter for more fertile territory. But if the researcher can initially "survive" in the field, a research program might be established; once in place it would require several years of data collection and analysis. By the sixth year, a critical point is reached, at which time the success of the program leads the researcher to remain in the field with little probability of ever leaving, or lacking success (or intellectual interest), the researcher might decide to move in an alternative research direction.

After controlling for organizational type, geographic location, and the field's phase of growth, it is found that the size of the community, the community's organizational dispersion, the researcher's productivity, and the researcher's collaborative network are statistically significant variables in explaining the length of a researcher's contribution-span. The size of a researcher's organization (in terms of the number of researchers working in the same field) is found to be only weakly significant, and the organization's cumulative productivity in the field is found to be not significant.
These findings provide support to the often-cited importance of the existence of critical mass within a field (that is, the field is sufficiently large in membership to enable a healthy rate of progress to be sustained by researchers). In the case of the cochlear implants community, the data indicate a minimum effective size of about 250 researchers. What is perhaps most interesting is that critical mass pertains to the community as opposed to the organization: the number of researchers working within the same organization is less important than the number of researchers working in the field as a whole. The significance of a researcher's co-author network is also interesting in this regard, since it underscores the importance of the researcher being connected to the larger research community in collaborative activities.

The present analysis has certain limitations, some of which are peculiar to literature-based studies and some of which are more generic in nature. Given that the primary interest in this research is to understand the process of a new field's emergence, it will be necessary to construct a data set that implements a time-varying covariate data structure. Such an approach will permit a more careful scrutiny of the dynamic phenomena that affect a researcher's contribution-spans. Furthermore, this approach will allow for the examination of whether or not changes in the hazard rate of a community can serve as an indicator of the future momentum of the field. It is also necessary to determine the extent to which the present and future findings from the cochlear implants field can be generalized to other fields of science and technology and to examine the importance of other explanatory variables in understanding contribution-spans.

Work is currently underway to address these issues.
First, a preliminary investigation suggests that data from the literature is structured in such a manner that time-varying covariates should be feasible to create. Second, data sets for ten additional fields are currently being constructed, with fields varying in terms of their size and disciplinary composition, the national and sectoral distribution of their researchers, their commercial impact, and the degree to which they have succeeded in becoming well-established, institutionalized research communities. Third, further studies will be supplemented with other data, derived both from the literature and from other sources. For example, data from the cochlear implant literature is currently being gathered with respect to the nature of the work being conducted by each researcher (such as basic research, applied research, development, or clinical research), and the particular "technological trajectory" each researcher is pursuing (such as a single-channel versus multi-channel device). Other kinds of data, such as annual funding levels and the extent of market commercialization, are being sought as well.

TABLE 1: Variables used in the analysis and their definitions

TABLE 2: ML estimation of contribution-spans using different distributions

TABLE 3: ML estimation of contribution-spans: log-logistic distribution