Supporting Online Material

Data

We parsed 8,592,483 PubMed records, extracting from each the PubMed ID, journal name, year of publication, publication type, chemical names, and MeSH terms. We chose the period from 1985 to 2004, which is characterized by steady growth in the number of MeSH terms, chemical names, and articles (Sup. Fig. 1). Our dataset for this period covers 12,039 journals that mention a total of 22,371 unique MeSH terms and 153,756 unique chemical names. There are 49 different publication types.

Supplementary Figure 1. Number of articles, MeSH terms and chemical names mentioned in PubMed since 1950.

Analysis

A list of terms (MeSH terms or chemical names) accompanies most time-stamped records in the PubMed database, which allows us to monitor how the popularity of terms changes over time. Given a time point t, we define $N_t^i$ as the number of terms that occur in PubMed before t exactly i times. To characterize the probability of encountering terms with the same level of popularity (number of instances in PubMed before the same point in time), we introduce a popularity variable q that takes integer values 0, 1, 2, …; the notation p(q | t, parameter values) represents the expected proportion of terms with popularity q at time point t given our model and parameter values.

When we model the stochastic generation of scientific texts, we assume that each time-stamped text is allowed to contain terms with zero popularity (novel terms), and that the expected frequency of such terms is $\nu$, a parameter that we call novelty:

$$p(q = 0 \mid \mathbf{N}_t, \nu, \tau) = \nu, \qquad (1)$$

where $\mathbf{N}_t$ is a vector summarizing all popularity counts associated with time point t. We further assume that the expected frequency of known (q > 0) terms is

$$p(q \mid \mathbf{N}_t, \nu, \tau) = (1 - \nu)\,\frac{N_t^{q}\, q^{\tau}}{\sum_{n=1}^{\infty} N_t^{n}\, n^{\tau}}, \qquad (2)$$

where $\tau$ is another model parameter that we call temperature.
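As a minimal sketch of equations (1) and (2), the following Python function turns a table of popularity counts into the model's probabilities. It assumes that known terms are weighted by the number of terms at each popularity level multiplied by that popularity raised to the power of the temperature, normalized over all observed levels. The function name, the argument names, and the dictionary representation of the counts are illustrative choices, not part of the original analysis.

```python
def popularity_probabilities(counts, novelty, temperature):
    """Return {q: probability} for q = 0 and for every observed popularity q >= 1.

    counts maps a popularity level q >= 1 to the number of terms that have
    been seen exactly q times so far (a hypothetical stand-in for N_t).
    """
    if not 0.0 <= novelty <= 1.0:
        raise ValueError("novelty must lie in [0, 1]")

    # Weight of each popularity level: (number of terms) * q ** temperature.
    weights = {q: n * q ** temperature for q, n in counts.items() if q > 0 and n > 0}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("counts must contain at least one known term")

    probabilities = {0: novelty}  # equation (1): chance of a novel term
    for q, weight in weights.items():
        # equation (2): the remaining mass is shared among known popularity levels
        probabilities[q] = (1.0 - novelty) * weight / total
    return probabilities
```

Under this assumed form, a temperature of 0 makes every known term equally likely regardless of its popularity, while a temperature of 1 makes the chance of reuse proportional to popularity.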
We use equations (1) and (2) to compute the likelihood of any collection of term mentions given parameter values, and, assuming an uninformative prior parameter distribution, estimate the joint posterior distribution of novelty and temperature.

In our analysis, we first estimate the novelty and temperature separately for the topic (MeSH) and method (chemical) content of articles published in journals that mentioned at least 1,000 MeSH terms and at least 1,000 chemicals within the chosen interval, and that had a known impact factor. This left us with a set of 1,757 journals. Each journal's impact factor was computed as the average of its IF values reported between 1999 and 2004.

We use the following linear regression model within a stepwise regression analysis framework to test for a five-way correlation among journal-specific parameters ($\tau$ and $\nu$ are temperature and novelty, respectively) and the impact factor (IF) of a journal:

$$\mathrm{IF}_i = A\,\tau_{\mathrm{topic},i} + B\,\nu_{\mathrm{topic},i} + C\,\tau_{\mathrm{method},i} + D\,\nu_{\mathrm{method},i} + E + \mathrm{error},$$

where subscript i refers to the ith journal and A, B, C, D and E are parameters of the linear regression model. We assume that the error term follows a normal distribution. Our analysis shows that the estimates of B and D are not significantly different from zero. The estimate for A is significantly larger than zero (4.55, with 95% confidence interval [3, 6]) and the estimate for C is significantly smaller than zero (-9.8, with 95% confidence interval [-12.6, -7]).

We estimate model parameters and credible intervals for publication types using a version of the Markov chain Monte Carlo approach (see Figure 1c and 1d; we use the maximum posterior probability estimator in each case). Parameter estimation for topics is done for publication types that mentioned at least 1,000 MeSH terms (the same selection strategy applies to parameter estimation for methods). ‘Average temperature’ and ‘average novelty’ refer to a weighted average of temperature and novelty when all publication types are considered together.
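The estimation step can be illustrated with a small sketch. Instead of the Markov chain Monte Carlo procedure used in the analysis, it evaluates the log-likelihood of a stream of term mentions on a coarse parameter grid and returns the grid point with the highest value, which under a flat prior is the maximum-posterior point on that grid. The function names, the grid, and the treatment of each mention as a draw from equations (1) and (2), with temperature acting as an exponent on popularity, are assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood(mentions, novelty, temperature):
    """Log-likelihood of an ordered list of term mentions (hypothetical input)."""
    if not 0.0 < novelty < 1.0:
        raise ValueError("novelty must lie strictly between 0 and 1")
    term_counts = Counter()   # term -> popularity so far
    level_counts = Counter()  # popularity q -> number of terms seen exactly q times
    total = 0.0
    for term in mentions:
        q = term_counts[term]
        if q == 0:
            # Novel term: equation (1).
            total += math.log(novelty)
        else:
            # Known term: equation (2), evaluated on the counts seen so far.
            denom = sum(n * level ** temperature for level, n in level_counts.items())
            total += math.log((1.0 - novelty) * level_counts[q] * q ** temperature / denom)
            level_counts[q] -= 1
            if level_counts[q] == 0:
                del level_counts[q]
        # The mentioned term moves up one popularity level.
        term_counts[term] = q + 1
        level_counts[q + 1] += 1
    return total


def map_estimate(mentions, novelty_grid, temperature_grid):
    """Grid point (novelty, temperature) with the highest likelihood (flat prior)."""
    candidates = ((nu, tau) for nu in novelty_grid for tau in temperature_grid)
    return max(candidates, key=lambda point: log_likelihood(mentions, *point))
```

A grid search is only practical for two parameters and small inputs; it stands in here for the sampling-based estimate and does not produce credible intervals.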
To fit the topic and method volumes to a Zipf's (Pareto) distribution, we use the maximum likelihood estimate of the distribution's parameter; our estimates of this parameter for topic and method volumes are 1.153 and 1.528, respectively.
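For orientation, one common closed-form maximum likelihood estimator for a power-law tail is sketched below. It assumes a continuous Pareto density proportional to x^-(a + 1) for x at or above a chosen minimum x_min; the text does not state which parameterization or lower cutoff was used, so the function name, the `x_min` argument, and the continuous form are all assumptions rather than the procedure behind the reported values.

```python
import math


def pareto_shape_mle(values, x_min):
    """Maximum likelihood estimate of the Pareto shape a for values >= x_min.

    Assumes density proportional to x ** -(a + 1); the estimate is
    n / sum(log(x / x_min)) over the n values at or above x_min.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one value strictly above x_min")
    return len(tail) / log_sum
```

For discrete counts, a discrete (zeta-type) likelihood maximized numerically would be a closer match to Zipf's law than this continuous approximation.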
