WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT

Technological Progress and the Duration of Contribution Spans

Michael A. Rappa (Massachusetts Institute of Technology), Koenraad Debackere (Rijksuniversiteit Gent) and Raghu Garud (New York University)¹

December 1991

Sloan WP # 3393-92

Forthcoming in Technological Forecasting and Social Change, Vol. 42, No. 4 (199*)

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Alfred P. Sloan School of Management
50 Memorial Drive, E52-538
Cambridge, MA 02139-4307

ABSTRACT

This study uses the scientific and patent literature as a source of data to analyze the relationship between author/inventor contribution spans and the rate of technological progress in two chemical fields. Using survival analysis statistics, the authors examine the probability that an individual will contribute to the field for a specified length of time and the probability that an individual, having contributed to the field for a specified period of time, will cease to contribute in the future. The authors also test the significance of several covariates in predicting the length of contribution spans.

INTRODUCTION

Predicting the rate of technological progress within a given field is no small problem for those individuals who are responsible for the allocation of scarce resources. Ideally, if managers and government policy-makers had at their disposal an array of indicators to enable them to predict the rate of technological progress in a field, optimal resource allocation would be assured. Indeed, the ability of managers and policymakers to comprehend the pace and the direction of technological advancement will largely determine their firm's or nation's competitive performance in world markets into the next century. This is an enduring task. Historical accounts of industrial evolution, such as with the development of semiconductors, videocassette recorders, and personal computers, show the immense difficulties some firms encounter when confronted by new technologies [1, 2, 3].

¹ Michael Rappa is an assistant professor with the Massachusetts Institute of Technology, Sloan School of Management. Koenraad Debackere is a Fulbright post-doctoral research fellow at MIT, and a research associate with the Vlerick School, Rijksuniversiteit Gent. Raghu Garud is an assistant professor with the Stern School, New York University. The authors thank Hans van Cool, Jef Vincent and George Evens for helpful comments and assistance. This study was funded, in part, with a grant from DSM Research. Dr. Debackere's research was supported by a fellowship from ICM (Brussels). Preliminary results of this research were presented at the Decision Sciences Institute First International Meeting held in Brussels, Belgium, June 24-26, 1991.

Undoubtedly, there is an obvious need to enhance our understanding of the way in which new technologies emerge. To this end, different methodologies, ranging from qualitative case studies to sophisticated quantitative forecasting models, have been developed [4, 5, 6, 7]. These developments have established technological forecasting as an academic discipline in its own right.
However, the many pitfalls and the often limited usefulness characteristic of forecasts of technological outcomes have been noted with striking regularity [8, 9, 10]. As a consequence, technological forecasters face a dilemma. On the one hand, indicators of technological progress have often been elusive. On the other hand, intense global competition in industry and constrained government budgets for science and technology have made the search for such indicators all the more urgent.

One approach to understanding the rate of technological progress within a given field is to focus on the actions of those individuals who are responsible for creating progress: namely, the scientists and engineers who solve the problems that enable the technology to move forward. The basic rationale for this perspective is that in the process of solving problems at the technological frontier, researchers continually formulate their own opinions regarding the rate of progress and the probability of success within a particular field. Based upon their assessment and reassessment of a particular field, researchers will decide (or influence the decision) where to best apply their energies.² Simply stated, if progress is rapid, researchers will be more likely to stick with their research agenda long enough to reap the rewards of their work. Conversely, if progress is slow, researchers will be more likely to seek out more promising research areas where they can apply their energies. As a result, it may be possible to ascertain the relative rate of progress in a field by examining the duration of the contributions made by those individuals who are actively involved in shaping its technological progress.

² This basic rationale has been a recurring theme among many sociologists of the sciences, who have come to view scientists as "investors of credibility." That is, scientists are likely to invest their credibility in those specialties where they hypothesize the probability of reaping rewards to be the highest [11].

The following study focuses on how long researchers persist in a field in order to assess the relative rate of technological progress. Scientific and patent literature are used as a source of data to measure the length of time that individual authors and inventors contribute to the field and to determine statistically the survival and hazard rates as well as some of the factors associated with longevity.³ In particular, the duration of the participation of individuals in a field is examined through an analysis of their "contribution spans": that is, the time span between their first and last paper or patent contribution. From an analysis of the contribution spans, estimates are made of: (1) the probability that an author/inventor's contribution span will extend a given number of years, and (2) the probability that, having contributed a given number of years, an author/inventor will cease to contribute in the future. Furthermore, the relevance of a number of covariates in predicting the duration of author/inventors' contribution spans is examined.⁴

Catalysts for two stereoregular polymers, EPDM (a synthetic rubber) and polypropylene (a plastic), were selected as the technical fields for comparative analysis in this study.⁵ The choice of EPDM and polypropylene as test cases was based solely on the independent evaluation of individuals in the chemical industry that these fields experienced markedly different rates of technical progress over the past several decades. Indeed, the historical record reveals that since the discovery of the Ziegler-Natta catalyst process in the 1950s, on which both processes are based, the rate of catalyst development in each field diverges significantly.
In the case of polypropylene, four major breakthroughs have led to the development of three well-defined generations of catalysts. In contrast, EPDM catalysts have evolved less rapidly. No major breakthroughs have occurred, and in the past fifteen years no radically new processes have been developed. EPDM catalysts are generally considered to be first generation technology.⁶ By comparing the contribution spans of EPDM and polypropylene researchers, some insight may be gained into the question of whether or not the duration of their participation in the field can serve as an accurate indicator of the rate of technological progress. It is expected that EPDM researchers would be more likely to leave their field sooner than their counterparts in polypropylene.

³ For lack of a better term, we will refer to these individuals as author/inventors in order to encompass individuals whose contributions may include scientific publications or patents.

⁴ The methodology used is discussed in [12].

⁵ The historical development of EPDM and polypropylene catalysts is described in [13, 14, 15].

⁶ It must be noted that although the fields realized different rates of technical progress in catalyst development, this does not imply that one field is necessarily unsuccessful in a commercial sense. Both fields are considered commercial successes, although polypropylene has become much more widely used.

DATA COLLECTION AND METHODS

Commercial electronic databases were used to identify patents and publications related to the fields of EPDM and polypropylene catalyst development. The databases were searched on-line using a set of key terms that are known to be commonly used in the lexicon of author/inventors and might be either in the title, abstract or classification terms of a document. The searches resulted in the retrieval of 1,383 polypropylene- and 613 EPDM-related patents and publications between 1955 and 1989. In each case, the majority of documents retrieved were patents (60% of the polypropylene documents and 78% of the EPDM documents).

The documents were retrieved electronically and were temporarily placed in a bibliographic relational database operating on a personal computer. This allowed for a careful inspection of each document in order to ensure the accuracy and integrity of the search procedure. Since multiple source databases were used, it was necessary to remove duplicate documents. In addition, while inspecting the database, an effort was made to remove misclassified documents that did not pertain to EPDM or polypropylene catalyst development. In the process of inspecting the documents, any that seemed inappropriate were flagged, so that an individual active in these areas could make the final judgment as to their relevance. Furthermore, to avoid problems of incompleteness in the later years due to patent lags, as well as in the earlier years, the databases were constrained to the period from 1960 to 1986.

The data collection procedure described above ultimately resulted in the identification of 613 EPDM and 1,314 polypropylene patent applications and publications from 1960 to 1986. The contribution span data subsequently used in this study were derived from these documents. However, before the documents could be used as a source of data, they required extensive editing in order to create consistency among author/inventor names and their affiliation names. It is frequently the case with commercial databases that the name of an author or an affiliation is not standardized across documents. Sometimes the inconsistencies arise because of misspellings, but mostly they are the result of variations in the use of abbreviations, middle initials, capitalizations and hyphenations. Although such a lack of standardization might not be a problem for the typical user of an electronic literature database, it would be a major source of error in determining the duration of author/inventor contribution spans.
Therefore, it was essential to meticulously inspect the name of each author and affiliation in the relational database so that all inconsistencies could be eliminated. A particularly unfortunate complication specific to the patent literature databases is the frequent absence of inventor names from corporate patents. This required the use of multiple databases and the cross-checking of patent numbers in order to obtain the missing data.

Upon completing the editing of the documents, the database was used to identify each author/inventor who contributed to the field over the twenty-seven year period. This procedure yielded a total of 3,280 individuals. At this stage, a statistical database was created containing several covariates for each author/inventor that were derived from information obtained from the published documents as well as other sources. Table 1 provides a list of the variables and their definitions. The dependent variable for the analysis, the contribution span, is calculated as the number of years that have elapsed from the first to the last patent or publication for each author.⁷

⁷ For example, if a researcher first published in 1975 and last published in 1980, the researcher's contribution span would be calculated as six years. Furthermore, it is assumed that a researcher who publishes in only one year has a span of one year. Note that the contribution span is unaffected by the frequency of publication within a given year.

Although calculating the contribution span is relatively straightforward, there are some methodological issues that arise that require further explanation. The primary issue of concern is that for those author/inventors who are active in the fields of EPDM and polypropylene during the last year of the database, the ultimate length of their contribution span is indeterminate. In other words, since these individuals have not yet left the field, the length of their contribution span is only known to be some minimum value (that is, the entry year to the present year). To account for this, survival analysis statistics were implemented in analyzing the data [16, 17, 18, 19]. Such techniques take into consideration precisely this kind of problem in the calculations with a procedure that adjusts for the biases that right-censored data create.

Having determined the distribution of contribution spans, it is interesting to examine what factors might affect how long an author/inventor contributes to the field. Using the literature, a number of covariates were constructed. Although they could be treated as time-varying covariates (that is, as having values that vary yearly in the course of an author's contribution span), the present analysis does not implement such an approach to formulating the data set. Therefore, the value for each covariate is taken according to the last year of the author's contribution span.

In this manner, several covariates were created, including two dummy variables to control for factors that might account for heterogeneity within the population examined. First, the kind of organization in which each author is employed was coded according to whether they are employed in an industrial or non-industrial (i.e., academic or government) research laboratory. Second, the country in which the author is located was coded, and a covariate was created to signify whether the individual's affiliation is located in a Western industrial country or in an Eastbloc country.

Additional covariates were created which reflect individual, organization and population attributes. At the individual-level, a covariate was constructed to reflect an author's productivity in the field as measured by the cumulative number of publications and patents to their credit.
An organizational-level covariate was created to reflect the productivity (or what might be considered an organization's cumulative investment in the field) in terms of the cumulative number of patents and publications assigned to the author/inventor's affiliation. Three population-level covariates were created to reflect the size and dispersion of each field in each year. Population size is measured in terms of the number of individuals who publish or patent in the field in a given year, as described below and illustrated in Figure 1. A second-order covariate, the square of population size, was created in order to capture any quadratic association between population size and contribution spans. The third covariate is a measure of dispersion of author/inventors among different organizations: that is, the extent to which the population is concentrated in a few organizations or spread across many. For this purpose, a Herfindahl concentration statistic, which is determined by calculating the sum of the squared share of author/inventors affiliated with each organization annually, is used.

Lastly, two covariates were included to reflect the maturity of the field. The first is market size (in terms of kilotons of EPDM and polypropylene produced annually). Because statistics for total world production are not available over the entire twenty-seven years, the aggregate production of Western Europe, Japan and the U.S. is used instead.⁸ The second maturity covariate is the cumulative number of patents granted in each field.

⁸ The market data were provided by the marketing research department of a major chemical firm. The data were checked for accuracy with data from Kline & Co., an organization that publishes statistics on the chemical industry.

RESULTS

Using data from the scientific and patent literature on EPDM and polypropylene published between 1960 and 1986, the contribution spans for 3,280 author/inventors and several explanatory variables associated with each were compiled into a statistical database: 2,267 (69%) individuals in polypropylene and 1,013 (31%) in EPDM. Of the 3,280 cases, 739 (22.5%) were active in the last three years of the database and were therefore classified as censored. Sixty percent of the total population are employed in industrial firms and nearly twenty-five percent are located in Eastbloc countries.

The historical growth in participation in each field can be seen clearly in terms of the number of author/inventors contributing to the literature (see Figure 1). The number of researchers in each field in a given year is calculated as the cumulative number of individuals entering the field (as evidenced by an initial publication or patent application) minus the cumulative number of individuals who have left the field (as evidenced by their failure to continue to publish or patent in a future year).

The data were analyzed using the LIFETEST and LIFEREG procedures of SAS (v5.18). Using LIFETEST, the first step in the analysis was to make non-parametric estimates of the survival and hazard functions for the data. The lifetable approach was chosen. The results of this procedure are illustrated in Figure 2. The survival function for each field is a monotone decreasing function, and the two functions are nearly identical. Tests of homogeneity of the survival curves stratified by field cannot reject the null hypothesis that the strata have identical survival distributions. The probability of an author/inventor's contribution span lasting two years or longer is about 0.3. After two years, the survival rate continues to diminish, eventually leveling off at about 0.07 for contribution spans of 15 years or more. The longer an author/inventor stays in the field, the less likely he or she will be to leave it, with the first and sixth years being particularly critical points. Indeed, the risk of leaving the field is highest within the first year.
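The steps described above (computing contribution spans, flagging recently active author/inventors as right-censored, and estimating a survival function) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run in SAS: it uses the Kaplan-Meier product-limit estimator as a simple stand-in for the life-table method of LIFETEST, the function names are hypothetical, and the six author records are invented for the example.

```python
from collections import Counter

def contribution_span(first_year, last_year):
    """Span in years, counting both endpoints (1975..1980 -> 6; one year -> 1)."""
    return last_year - first_year + 1

def is_censored(last_year, database_end=1986, window=3):
    """Treat anyone still active in the final `window` years as right-censored."""
    return last_year > database_end - window

def kaplan_meier(records):
    """Product-limit survival estimate from (span, censored) pairs.

    Returns a list of (t, S(t)), where S(t) estimates the probability
    that a contribution span lasts longer than t years.
    """
    exits = Counter(span for span, censored in records if not censored)
    leaving = Counter(span for span, _ in records)  # exits plus censored cases
    at_risk = len(records)
    survival, curve = 1.0, []
    for t in sorted(leaving):
        if exits[t]:
            # Observed exits at t reduce survival; censored cases do not.
            survival *= 1.0 - exits[t] / at_risk
            curve.append((t, survival))
        # Everyone whose span ends at t (exit or censored) leaves the risk set.
        at_risk -= leaving[t]
    return curve

# Hypothetical author/inventors: (year of first, year of last contribution).
authors = [(1975, 1980), (1970, 1970), (1982, 1986),
           (1965, 1966), (1984, 1984), (1961, 1961)]
records = [(contribution_span(f, l), is_censored(l)) for f, l in authors]
curve = kaplan_meier(records)
for t, s in curve:
    print(f"span > {t} years: {s:.3f}")
```

The censoring rule mirrors the three-year window reported in the Results; a life-table estimator would additionally group spans into intervals and treat censored cases as at risk for half of the interval.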
FIGURE 2: Non-Parametric Estimates of Survival and Hazard Functions for Author/Inventor Contribution Spans in EPDM and Polypropylene (two panels; horizontal axis: Duration of Contribution Span, in years)

The next step in the analysis was to determine the parametric model that best fits the distribution of contribution spans. Although non-parametric analysis permits certain assumptions to be made about the shape of the survival distribution (for instance, that its hazard is non-monotonic), we nonetheless decided to examine statistically several different distributions for goodness of fit. The basic model adopted for the analysis is:

$Y = X\beta + \sigma\varepsilon$

where $Y$ is the log of the contribution span, $X$ is the matrix of covariates, $\beta$ is a vector of unknown regression parameters, $\sigma$ is a scale parameter and $\varepsilon$ is a vector of errors from an assumed distribution. This model is referred to as an accelerated failure time model because the effect of the explanatory variables is to scale a baseline distribution of failure times.

In order to determine the underlying distribution that best fits the data, four different types of distributions were evaluated: the exponential, Weibull, gamma, and log-logistic distributions. Using LIFEREG, the results of this procedure for the entire sample are provided in Table 2. The parameters are estimated by maximum likelihood using a Newton-Raphson algorithm. The overall fit of each model is represented by the log-likelihood function. Minus two times the log-likelihood value has a chi-square distribution with appropriate degrees of freedom. Using the baseline model, the goodness of fit for each distribution is evaluated in terms of minimizing the absolute value of the log-likelihood score. As a result, the log-logistic distribution (with a log-likelihood score of -1287) was chosen and became the basis for estimating the regression coefficients of the explanatory variables in the model. This result, which is consistent with the non-parametric findings, suggests a non-monotonic hazard function.

Table 2 also shows the results of the inclusion of a dummy variable (Technical Field) to stratify on the two fields being investigated. The estimates for the log-logistic distribution indicate no field effect. There is no evidence to suggest that the distributions of contribution spans for the two fields are divergent, which confirms the findings from the non-parametric analysis. This result is consistent across all distributions.

In the next step, the model was estimated with LIFEREG in a sequence of steps by adding each covariate into the equation using the log-logistic distribution. Since it is desirable to investigate whether the significance of covariates differs between the two fields, the modelling results are presented separately (see Tables 3a and b). In the case of each field, the addition of each covariate has the effect of generally improving the log-likelihood score. Model 9 was chosen as the baseline for the comparison to understand the effect of the covariates.

The estimation results of Model 9 indicate a number of differences in terms of the significance of the covariates examined. First, several covariates that are significant in the case of polypropylene are not significant for EPDM: namely, organization productivity (+), industrial (-), patents (+), and market size (+) are all significant for polypropylene but not for EPDM. Second, although the population size variables are significant in both cases, the signs of the coefficients indicate a different relationship in each case. The only similarities between the two fields are the significance of author/inventor productivity (+), and the lack of significance for the Eastbloc and concentration covariates.
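To make the log-logistic accelerated failure time model used in these estimates more concrete, the following Python sketch shows why a log-logistic error distribution can produce a non-monotonic hazard and how a covariate rescales time. It assumes the common parameterization in which the intercept `mu` and scale `sigma` of the log-span model map to a log-logistic scale exp(mu) and shape 1/sigma; the function names and all parameter values are hypothetical and are not the estimates reported in Tables 2 or 3.

```python
import math

def loglogistic_survival(t, mu, sigma):
    """S(t) = 1 / (1 + (t/alpha)^beta), with alpha = exp(mu), beta = 1/sigma."""
    alpha, beta = math.exp(mu), 1.0 / sigma
    return 1.0 / (1.0 + (t / alpha) ** beta)

def loglogistic_hazard(t, mu, sigma):
    """h(t) = (beta/t) * z / (1 + z), where z = (t/alpha)^beta."""
    alpha, beta = math.exp(mu), 1.0 / sigma
    z = (t / alpha) ** beta
    return (beta / t) * z / (1.0 + z)

def hazard_peak(mu, sigma):
    """Time of maximum hazard, or None when the hazard only decreases.

    For sigma < 1 (shape beta > 1) the hazard rises and then falls,
    peaking at alpha * (beta - 1)^(1/beta); otherwise it is monotone.
    """
    if sigma >= 1.0:
        return None
    alpha, beta = math.exp(mu), 1.0 / sigma
    return alpha * (beta - 1.0) ** (1.0 / beta)

# Hypothetical parameter values chosen only for illustration.
mu, sigma = 0.5, 0.5
peak = hazard_peak(mu, sigma)
print(f"hazard peaks at t = {peak:.3f} years")

# Accelerated failure time property: adding `shift` to the linear predictor
# stretches the time axis by exp(shift) without changing the curve's shape.
shift = 0.3
same = loglogistic_survival(2.0 * math.exp(shift), mu + shift, sigma)
base = loglogistic_survival(2.0, mu, sigma)
print(f"S at rescaled time: {same:.6f}, baseline S: {base:.6f}")
```

Under this parameterization a positive coefficient lengthens expected contribution spans and a negative one shortens them, which is how the (+) and (-) signs attached to the covariates can be read.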
In the case of the population variables, the first- and second-order population terms are significant right from their initial inclusion in the models for both fields. However, in the case of EPDM, the negative coefficient for population size combined with the positive coefficient for its second-order term implies a U-shaped relationship between population size and author/inventor contribution spans (see Figure 3). Population size is negatively related to the length of contribution spans when it is small; but after it reaches the size of about 50 individuals the slope of the curve turns positive. This result suggests that there may be a point of critical mass for the size of the author/inventor population, at which its size may become sufficiently large to be significant in increasing contribution spans. It is interesting to note that the data indicate that the EPDM population never grew beyond this size (refer back to Figure 1). In contrast to the EPDM case, the relationship between population size and the length of contribution spans in the case of polypropylene is negatively sloped, and increasingly so, as the population grows larger.

[Figure 3: plot not recoverable from the source; surviving labels: EPDM, population size]

DISCUSSION

Using the scientific and patent literature as a source of data, this paper provides an analysis of the contribution spans of author/inventors in the field of EPDM and polypropylene catalyst development. Non-parametric estimates of the survival rate and hazard rate are made, and it is found that the distribution of contribution spans follows most closely a log-logistic function. In addition, a statistical model of the relationship between the hazard rate and a set of covariates is examined.

The findings of this analysis indicate that the sample survival and hazard functions for the 3,280 author/inventors in EPDM and polypropylene are generally similar. About 30 percent of the author/inventors have a contribution span of more than two years. The risk of an author/inventor ceasing to contribute to the field is greatest in the first year of their contribution span. The hazard rate declines sharply after the first year and subsequently remains fairly constant. The critical difference between the two fields comes in the second year, when the hazard for EPDM researchers is more than one-and-a-half times higher than for polypropylene researchers. This result is suggestive of the basic hypothesis that individuals confronted with slow progress in a field will be more likely to seek an alternative area of research. The fact that it is the early years that are most critical is understandable, since at that point not much of a researcher's career has been invested in the field. Needless to say, the longer one stays in the field, the less likely they are to leave it, regardless of the rate of progress.

An examination of the relationship between several covariates and the length of contribution spans indicates that EPDM and polypropylene are quite different. Most noteworthy is the difference in the relationship between population size and contribution spans. It appears that in the rapidly progressing field of polypropylene catalysts, the larger the field became, the shorter and shorter the duration of contribution spans became. This is perhaps the result of the competitive pressures that arise as more people work in a field. In contrast, EPDM shows a much different relationship, suggesting that a lack of individuals in the field had a detrimental effect on the length of contribution spans.

Another important difference lies in the effect of market size. In the case of EPDM, market size is not significant in influencing contribution spans. However, in the case of polypropylene, market size is significant and has a positive effect on the length of author/inventor contribution spans. This is suggestive of the notion that rapidly progressing fields of technology may have an important "market pull" component.

The present analysis has certain limitations, some of which are peculiar to literature-based studies and some of which are more generic in nature. Given that the primary interest in this research is to understand progress in a field, it will be necessary to construct a data set that implements a time-varying covariate data structure. Such an approach will permit a more careful scrutiny of the dynamic phenomena that affect an author/inventor's contribution spans. Furthermore, this approach will allow for the examination of whether or not changes in the hazard rate of a community can serve as an indicator of future momentum of the field. It is also necessary to determine the extent to which the present and future findings from the EPDM and polypropylene fields can be generalized to other fields of science and technology and to examine the importance of other explanatory variables in understanding contribution spans.

Work is currently underway to address these issues. First, a preliminary investigation suggests that data from the literature are structured in such a manner that time-varying covariates should be feasible to create. Second, data sets for ten additional fields are currently being constructed, with fields varying in terms of their size and disciplinary composition, the national and sectoral distribution of their author/inventors, their commercial impact, and the degree to which they have succeeded in becoming well-established, institutionalized research communities. Third, further studies will be supplemented with other data, derived both from the literature and from other sources.

We believe the approach towards the scientific and technological literature outlined in this paper offers new perspectives to the application of bibliometric methods to technological forecasting.
Instead of predicting the growth and the decline of particular fields by looking at publication or patent volumes, our research points to the usefulness of publication and patent information in determining the contribution spans of researchers. Analyzing contribution spans may eventually serve as a useful tool for managers and government policy-makers who are responsible for monitoring the progress of emerging technologies globally. In essence, we are proposing a technique that allows one to gauge the worldwide pulse of technological progress by measuring the rate of change in researchers' commitment to a field. By focusing on the determinants of researcher contribution spans, our aim is to shift attention away from predicting the technological future and towards understanding the underlying fundamentals of researcher behavior. Improvements in our understanding of survival and hazard rates for researchers in a field may ultimately lead to the identification of critical factors and events that can inform our policy decisions regarding emerging technologies. In this manner, we suggest that measurements of researchers' persistence using contribution spans may serve as an indicator of change in the rate of technological progress in a particular field. Perhaps we can most clearly envision our technological future by understanding in a comprehensive and systematic manner the sustained commitment of researchers to the ideas they are pursuing today.

NOTES

[1] Braun, E. and Macdonald, S., Revolution in Miniature, Cambridge University Press, Cambridge, UK (1978).

[2] Rosenbloom, R.S. and Cusumano, M., "Technological Pioneering and Competitive Advantage: The Birth of the VCR Industry," California Management Review, 29, 4 (1987).

[3] Smith, D.K. and Alexander, R.C., Fumbling the Future: How Xerox Invented, then Ignored, the First Personal Computer, William Morrow, New York (1988).

[4] Rowe, G., Wright, G. and Bolger, F., "Delphi: A Reevaluation of Research and Theory," Technological Forecasting and Social Change 39, 235-51 (1991).

[5] Webler, T., Levine, D., Rakel, H., and Renn, O., "A Novel Approach to Reducing Uncertainty: The Group Delphi," Technological Forecasting and Social Change 39, 253-63 (1991).

[6] Martino, J., Technological Forecasting and Planning, North Holland, New York (1983).

[7] Girifalco, L., The Dynamics of Technological Change, Van Nostrand Reinhold, New York (1991).

[8] Einhorn, H.J. and Hogarth, R.M., "Confidence in Judgement: Persistence of the Illusion of Validity," Psychological Review, 85, 5, 395-476 (1978).

[9] Hogarth, R.M., and Makridakis, S., "Forecasting and Planning: An Evaluation," Management Science, 27, 2, 115-38 (1981).

[10] Schnaars, S.P., Megamistakes and the Myth of Rapid Technological Change, The Free Press, New York (1989).

[11] Latour, B., and Woolgar, S., "The Cycle of Credibility," in Science in Context, Barnes and Edge, eds., MIT Press, Cambridge, Mass. (1982).

[12] Rappa, M.A., and Garud, R., "Using the Literature in the Study of Emerging Fields of Science and Technology," MIT International Center for Research in the Management of Technology, Working Paper #59-92 (1991).

[13] McMillan, F.M., The Chain Straighteners, MacMillan Press, London (1979).

[14] Morris, P.J.T., The American Synthetic Rubber Industry, University of Pennsylvania Press, Philadelphia (1989).

[15] Sicilia, D.B., "A Most Invented Invention," Invention & Technology, Spring/Summer, 45-50 (1990).

[16] Elandt-Johnson, R.C., and Johnson, N.L., Survival Models and Data Analysis, John Wiley & Sons, New York (1980).

[17] Kalbfleisch, J.D., and Prentice, R.L., The Statistical Analysis of Failure Time Data, John Wiley & Sons, New York (1980).

[18] Allison, P.D., Event History Analysis, Sage Publications, Newbury Park (1984).

[19] Lancaster, T., The Econometric Analysis of Transition Data, Cambridge University Press, Cambridge (1990).
TABLE 1: Variables used in the analysis and their definitions
[Table body not recoverable from the source; the only surviving column header is CATEGORY.]

TABLE 2: ML estimation of contribution spans using different distributions
[Table body not recoverable from the source.]