Public Comments on NISO Altmetrics White Paper, received from June 9 to July 18, 2014
These comments are also available via http://www.niso.org/apps/group_public/document.php?document_id=13295&wg_abbrev=altmetrics

510 Cynthia Hodgson chodgson@niso.org NISO 6/10/14 6:26 Editorial References
There is currently no reference section in the white paper. Following the public comment period, we plan to add a references section.

511 Gregor McDonagh grmc@nerc.ac.uk NERC 6/10/14 9:41 N/A Personal feedback via PowerPoint
Picking up on issues with traditional citation analysis and scope of usage, and on options going forward to address cultural resistance.

512 Laurel Haak l.haak@orcid.org ORCID 6/12/14 5:46 Substantive General Comments and use of PIDs
Overall, this paper is a balanced summary of the state of the art of altmetrics. Well done! A few comments, by section:
Research Outputs: Conference presentations and papers are absolutely critical for researchers in computer sciences, maths, engineering...
Data Quality: The emphasis on resolution of the source document is important and fundamental. This is not just about a digital document identifier, however; it must also encompass identifiers for the contributor and related organization. Also, it is important to provide guidelines on how to cite things that are not journal articles. figshare does this well for items deposited there; we can do better in the community with citation guidelines for blogs and other communication products.
Groups: One plea here: please use the parlance "ORCID record" (not "profile"). More substantively, it seems this section would benefit from an action item/next step in the area of persistent identifiers, namely how PIDs can be a major assist in grouping documents/people/organizations.

513 Pat Loria ploria@csu.edu.au Charles Sturt University Library 6/17/14 0:49 N/A Feedback on Altmetrics White Paper
Firstly, I would like to applaud NISO for attempting to develop standards for such a nascent, yet significant, field. In my opinion, altmetrics would be better described as "alternative impact metrics" rather than "alternative assessment metrics". The latter implies quality assessment, while the former more accurately describes what they in fact measure: impact. Attempting to call them something else in order to engender wider acceptance and adoption risks the creation of competing terms, as "altmetrics" is what they have become known as by their developers and users. And I believe it is quite helpful to have a reference built into the name that distinguishes them from their traditional counterparts, yet highlights their complementarity with other metrics. Furthermore, the nature of altmetrics suggests they should be defined in as inclusive a manner as possible, and not in a prescriptive way. This is one of the defining features of altmetrics, compared with the narrower citation metrics. In essence, altmetrics measure the engagement of online communities with a wide variety of digital research outputs. In other words, they should not be merely defined as an "alternative to the established citation counts" (page 4 of the white paper). The types of research outputs they measure and the types of metrics they employ should also not be prescriptive. Altmetrics measure both scholarly and social impact. Scholarly impact is sometimes quantitatively defined as the number of citations in scholarly books and journals.
However, there is no end to the number of ways that social impact may be defined or counted, and therefore output types and metric types should be open and inclusive, to reflect the wide diversity of possibilities for communicating and engaging with research. I am not a fan of aggregated metrics, such as the h-index or the journal impact factor, with all of their inherent faults, not the least of which is their inability to reveal contextual, time-related, artifact-level impact data. A defining feature of altmetrics is that they gather artifact-level metrics, which users can use and value as they wish, without being subjected to imposed value weightings for various outputs or metrics, thus maximizing the number of user-defined applications. Specifying use cases may inadvertently serve to undermine the value of potential applications. According to page 8 of the white paper, critics of altmetrics argue that altmetrics correlate poorly with scholarly citations. If we accept this critique as true, then it is equally true that scholarly citations correlate poorly with altmetrics. While altmetrics include scholarly citation counts (via APIs with Scopus and other scholarly citation applications), what this reveals is that altmetrics measure a much wider range of outputs and impacts, including social impact, which scholarly citations do not measure. This is why altmetrics adherents highlight the complementarity of the new metrics to the more traditional ones. They reveal different flavors of impact, or impact occurring in different sectors beyond the academy, which can include government, industry, and public impact. Promoting the use of persistent identifiers may be problematic, due to the difficulties associated with imposing public compliance with industry standards. The public will engage with and make reference to research outputs in all manner of ways, and it would be best to develop standards and processes that are able to effectively capture this diversity of engagement. The proposed stakeholder groups do not include third-party systems developers, such as institutional repository developers, research information management developers, and researcher profiling developers. Due to the early uptake and popularity of altmetrics in Australia, another potential partner for NISO in the standardization process is Standards Australia. Thank you for the opportunity to comment on the standardization process of this significant development in the field of bibliometrics. Pat Loria

514 Mustapha Mokrane mustapha.mokrane@icsu-wds.org ICSU WDS 6/18/14 20:31 Substantive Datasets are traditional research outputs
I was rather surprised to read in the project description that "this project will explore potential assessment criteria for non-traditional research outputs, such as data sets..." in opposition to the "traditional" articles and books! In the White Paper itself, however, you mention at times "New forms of scholarly outputs, such as datasets posted in repositories" and that "There seems to be consensus that research datasets and scientific software should be included in the list of valuable research outputs". This shows, in my opinion, the perpetuation of an unfortunate and artificial divide created between papers and datasets, when they used to be one unit! This has certainly resulted in the devaluation of datasets as research outputs, as they became "supplementary material" to articles.
Datasets are research outputs and are as valuable as articles, for the simple reason that an article has no value without the underlying datasets.

515 Peter Kraker pkraker@know-center.at Know-Center 7/1/14 7:31 Substantive Data Quality The importance of openness and transparency
This white paper is a concise yet comprehensive summary of the various issues surrounding altmetrics. The members of this working group have obviously gone to great lengths to prepare this report and I would like to congratulate them on their effort. In my opinion there is, however, one important issue that has not been raised in this report: the inherent biases of altmetrics. Altmetrics are usually created as a by-product of the activity of a certain community, e.g. the members of a social reference management system, the Twitter users in a certain discipline, etc. These communities are usually not representative of the basic population in a discipline/research community. Therefore, the altmetrics created from this activity carry an inherent bias, e.g. towards a certain age group, geographic region, etc. This has also been reported in the literature; see [1] and [2]. Of course, biases affect all scientometric analyses. In citation analysis, the criteria for the inclusion of authors and papers in the analysis have an impact on the result. Therefore, the question is not how to avoid biases, but how to make these biases visible and reproducible. In my opinion, the only way to deal with this issue properly is to make the underlying data openly available. This way, the properties of the sample are intersubjectively reproducible. Open data would also give more context to the aggregated measures discussed in the report. Furthermore, open datasets would make it easier to uncover gaming (and therefore possibly make gaming less appealing). In my opinion, openness and transparency should therefore be strongly considered for altmetrics standards.
[1] Bollen, J., & Van de Sompel, H. (2008). Usage Impact Factor: The Effects of Sample Characteristics on Usage-Based Impact Metrics. Journal of the American Society for Information Science and Technology, 59(1), 136-149.
[2] Kraker, P. (2013). Visualizing Research Fields Based on Scholarly Communication on the Web. University of Graz. Available from http://media.obvsg.at/p-AC11312305-2001

517 Marcus Banks mab992@yahoo.com Independent consultant 7/3/14 14:41 Substantive Grouping and Aggregation; Context 10-12 Nomenclature, contributorship/context
Thank you for preparing this excellent and comprehensive white paper. It definitely captures the tenor of conversations at the NISO altmetrics meeting held in San Francisco last fall. Two observations and one request. The request first: please use the same numbering of action items in the body of the text as at the beginning of the text, rather than starting over with #1 in every new set of action items. Consecutive numbering throughout, please! This would make it easier to see at a glance which action items are associated with which broad category of recommendations. Thanks for considering this. On to the observations:
1. Nomenclature: I agree that "altmetrics" is not an apt term anymore, as we've moved past the "alt" stage. How about "digital scholarship metrics"? This is less catchy but more descriptive and more current.
2. Contributorship/context: The need for a typology of contributorship roles, so that contributors at various levels of intensity get proper credit, is pressing.
Likewise, contextual clues -- not all references to other work are positive -- are vital. That said, this information would necessarily be at a level of granularity that cuts against the desire for one simple number that explains everything. The forest vs. the trees. Proposed solution below. To resolve the tension between the desire for granularity regarding contributorship roles and context, and the simultaneous desire for a simple number "to rule them all," I propose that NISO develop standards that facilitate analysis at multiple levels. The standards should allow for both broad/high-level and deep/granular exploration.

518 Paola De Castro paola.decastro@iss.it Istituto Superiore di Sanità / EASE Council 7/4/14 6:58 Substantive Stakeholders perspectives 12 Editors as missing stakeholders
Editors, as gatekeepers of the information to be published in scholarly journal articles, are missing stakeholders in this draft, which includes "researchers, institutions, funders, publishers and general public". Editors strive to guarantee the quality of the information published in their journals, which will then be evaluated through different metrics. Therefore we suggest also considering the role and perspectives of editors and editors' associations (like the European Association of Science Editors, EASE), which strive for quality in scientific publications and support editorial freedom, research integrity, and ethical principles in publication. We believe that the issue of quality cannot be disregarded when considering any form of alternative metrics. Recognizing the need to improve the ways in which the output of scientific research is evaluated, many journal editors and editors' associations (including EASE) signed the San Francisco Declaration on Research Assessment (DORA), a worldwide initiative covering all disciplines, mentioned in the NISO draft. DORA includes a set of recommendations to improve the ways in which the output of scientific research is evaluated by funding agencies, academic institutions, and other parties. It is noteworthy that DORA was preceded by the EASE Statement on Inappropriate Use of Impact Factors, published by EASE as early as 2007 to alert the scientific community, with drastic examples, to the inappropriate use of the IF. EASE was one of the initial signing organizations of DORA.

519 Judy Luther judy.luther@informedstrategies.com Informed Strategies 7/6/14 12:21 N/A Scientists or researchers?
Throughout the document the terminology refers to scientists, which typically does not include the arts and humanities, although there are references to books and to performances. "Researcher" is a more inclusive term.

520 Lorrie Johnson johnsonl@osti.gov U.S. Department of Energy / Office of Scientific and Technical Information 7/11/14 8:04 N/A Funding agency perspective
Thank you for an informative white paper, and for the opportunity to comment. The Department of Energy funds over $10 billion per year in energy-related research, and we are pleased to see that "Funders" have been identified as an important stakeholder in the development of altmetrics measures. As a funding agency, the ability to assess the impact and influence of the Department's research programs is important, for traditional text-based scientific and technical information as well as for new and emerging forms, such as multimedia, software tools, and datasets.
This white paper, and the potential action items defined therein, address many aspects of interest to DOE, including the determination of differences in practices between various disciplinary fields; the recognition of contextual and qualitative facets, such as how research is spreading into other disciplines; and the assessment of long-term economic impact and benefit to taxpayers. We look forward to further development of standards and best practices in this area.

521 Ferdinando Pucci pucci.ferdinando@mgh.harvard.edu Massachusetts General Hospital 7/16/14 16:24 N/A Fork Factor: an index of research impact based on re-use
I am very happy to know about the NISO effort to improve and standardize the current ways to assess scholarship. As a postdoctoral fellow, I believe there is an urgent need to increase the reproducibility of science, especially in the biomedical field. The intent of this comment is to propose to the NISO Alternative Assessment Metrics Project committee a novel index to measure the impact of research, i.e. the fork factor (FF). I recently developed the ideas behind the FF, and you can find a blog post describing them at: https://www.authorea.com/users/6973/articles/8213/_show_article Briefly, the main advantages of the FF are:
- it can distinguish positive from negative citations (research evaluation issue, pages 7-8)
- it is a metric for nanopublication (research output, page 6) as well as for journal articles
- it is based on a solid, well-established versioning infrastructure (GitHub)
- it promotes the publication of negative results (journalization of science issue, page 6)
- it increases reproducibility by immediately spotting bad science (journalization of science issue, page 6)
In addition, I believe that the FF will allow a smooth transition from journal articles to nanopublication, which will considerably increase the speed of scientific discovery (considering that a biomedical paper in a high-impact journal can take 4 to 6 years to be published, and that researchers prune away all the negative results and dead-end investigations). Nonetheless, journal articles may still be considered for later publication, after the research story line shapes up, which may allow researchers to also target a broader/lay audience. In conclusion, I would love to contribute as a working group member. Thank you.

523 Richard O'Beirne richard.obeirne@oup.com Oxford University Press 7/17/14 9:47 N/A General comments
Comments are in a personal capacity, from a publisher's point of view.
[p. 2] Potential Action Items. Some of these should be regarded as best practice that any organization which considers itself to be an academic publisher is expected to support -- regardless of whether it's in the context of altmetrics. I'd say the following fall into this category:
4, 5 - agreeing on, then supporting, a taxonomy of academic Research Output types
10 - promote and use persistent identifiers
12 (possibly) - data normalization (e.g. COUNTER rules)
13 - standardized APIs
18 - define and use contributorship roles
[p. 9] I think we need to get used to the idea that altmetrics - or at least some of the components of what make up altmetrics - are inherently messy, change rapidly, are unreliable, and to some degree should be considered transient/disposable. Maybe 'indicator' rather than 'metric' is more appropriate.
[p. 10] Grouping and aggregation is indeed complex, but I would focus NISO's efforts on modularization of specific metrics, almost as a 'raw material', leaving aggregation questions to the altmetrics provider.
[p. 14] Publishers also want to (have an obligation to) demonstrate to their authors the reach of their research in terms of readership etc.
[p. 15] Fully agree with the three bullet points re prioritization (and I'd add the 'best practice' items above).
Many thanks for the work NISO has coordinated and put into the white paper. In the increasingly interconnected academic publishing world, standards, and engagement between standards organizations and all service providers, are essential. Keep up the good work!

524 Anna Maria Rossi annamaria.rossi@iss.it Italian National Institute of Health - Publishing Unit 7/17/14 10:58 Editorial - Bioresource Research Impact Factor (BRIF)
The BRIF Journal Editors subgroup, including researchers and experts with editorial competencies working at the Istituto Superiore di Sanità (ISS, Italian National Institute of Health) and at the Institut National de la Santé et de la Recherche Médicale (INSERM, French National Institute of Health and Medical Research), would recommend considering the development of a new metric, the Bioresource Research Impact Factor (BRIF), in the NISO Alternative Assessment Metrics (Altmetrics) Project. The BRIF is an ongoing international initiative aiming to develop a framework to facilitate accurate acknowledgement of resource use in scientific publications and grant applications via unique resource identifiers, and to measure the impact of such resources through relevant metrics (algorithm). Bioresources include both biological samples and their derivatives (e.g. blood, tissues, cells, RNA, DNA) and/or related data (associated clinical and research data) stored in biobanks or databases. An increasing proportion of biomedical research relies on biosamples, and much of our medical knowledge is acquired with the aid of bioresource collections. Sharing bioresources has been recognised as an important tool for the advancement of biomedical research. A major obstacle to sharing bioresources is the lack of acknowledgement of the efforts directed at establishing and maintaining such resources. Thus, the BRIF's main objective is to promote the sharing of bioresources by creating a link between their initiators or implementers and their impact on scientific research. A BRIF would make it possible to trace the quantitative use of a bioresource, the kind of research using it, and the people and institutions involved. The idea is to construct a quantitative parameter, similar to the well-known journal Impact Factor (IF), that would recognise the most influential bioresources for the biomedical scientific community and measure their impact on research production through relevant metrics. The BRIF Journal Editors subgroup works with science journal editors and is leading the development of a guideline for bioresource citation in the scientific literature. A proposal for a specific guideline was posted in the Reporting Guidelines under development section of the EQUATOR Network (October 2013). Please consult:
www.gen2phen.org/groups/brif-bio-resource-impact-factor
EQUATOR Network [http://www.equator-network.org/library/reporting-guidelines-underdevelopment/#19]
Mabile L, Dalgleish R, Thorisson GA, Deschênes M, Hewitt R, Carpenter J, Bravo E, Filocamo M, Gourraud PA, Harris JR, Hofman P, Kauffmann F, Muñoz-Fernàndez MA, Pasterk M, Cambon-Thomsen A; BRIF working group: Quantifying the use of bioresources for promoting their sharing in scientific research. GigaScience 2013, 2(1):7.
De Castro P, Calzolari A, Napolitani F, Rossi AM, Mabile L, Cambon-Thomsen A, Bravo E: Open Data Sharing in the Context of Bioresources. Acta Inform Med 2013, 21(4):291-292.
Bravo E, Cambon-Thomsen A, De Castro P, Mabile L, Napolitani F, Napolitano M, Rossi AM: Citation of bioresources in biomedical journals: moving towards standardization for an impact evaluation. European Science Editing 2013, 39:36-38.
Cambon-Thomsen A, De Castro P, Napolitani F, Rossi AM, Calzolari A, Mabile L, Bravo E: Standardizing Bioresources Citation in Scientific Publications. International Congress on Peer Review and Biomedical Publication: 8-10 September 2013; Chicago, USA.
Anna Maria Rossi, on behalf of the BRIF Journal Editors subgroup: Paola De Castro (1), Elena Bravo (2), Anne Cambon-Thomsen (2), Alessia Calzolari (1), Laurence Mabile (2), Federica Napolitani (1), Anna Maria Rossi (1). (1) Istituto Superiore di Sanità, Rome, Italy; (2) UMR U 1027, Inserm, Université Toulouse III - Paul Sabatier, Toulouse, France.

525 Andrew Sandland andysandland@yahoo.co.uk n/a 7/18/14 7:15 Editorial General comments
(Publisher point of view, though all comments are personal.)
P. 7: You point to the conflation of discovery and evaluation use cases, and I think the paper misses a chance to address the conflation of 'article level metrics' (ALMs) with 'alternative metrics'. The former can be somewhat traditional measures and metrics, but are non-aggregated measures that perform many of the discovery functions described. Discovery enhancement via ALMs (including altmetrics) is essential to the long-term outlook for high-volume publishing outlets and some of the purer elements of the Open Access ethos -- the idea that journal brand and aggregated metrics are less valuable than publishing and letting the market -- via appropriate filters (rather than the editors that put value behind the journal brand) -- show the way to the most 'valuable' articles. The paper misses this stake for volume publishers in the stakeholder perspective (many different publishers are now volume publishers/technical review publishers via at least a few products).
P. 5: The term 'social media metrics' is inherently limiting, and denies some of the more targeted or bespoke measures of impact that would be more relevant to stakeholders such as funders. Similarly, 'altmetrics' is a nascent terminology that would pre-date a period when these become 'metrics', and ideas of 'alternativeness' should probably be dropped.
P. 16: The outreach section doesn't seem to be within the remit of the NISO project.
P. 8: Use of these metrics in tenure and promotion decisions -- I'm not a researcher, but I have heard much about the use and misuse of the impact factor and the unforeseen consequences that this has had on academic culture. Altmetrics are still metrics, and though a more diverse base for judgement is perhaps a good thing, pouring 'more metrics' onto this process is perhaps a case of not learning our lessons from prior experience. While it is nice for alternative research outputs to be formally recognised, do the majority of academics across the majority of disciplines really want to have their work put into these checks and balances as they have with the IF?
General: The IF developed as the product of a single commercial outlet that now holds a monopoly on this metric. Many of the inputs to altmetrics, and some of the providers themselves, are private, for-profit enterprises (measures of Facebook, Twitter, Mendeley, and providers like Digital Science's Altmetric.com).
Inclusivity in these metrics must be fluid and barrier-less, and the mechanism by which they are implemented must be open to inputs from potentially competing commercial interests. Otherwise they stand in danger of becoming walled gardens, following the direction of the single-provider status of the IF.
P. 6: The argument regarding personality types -- and the idea that scientists/researchers should be expected to have a Twitter profile. Notwithstanding dedication to single commercial entities like Twitter, there is a degree to which funding agencies can and could dictate these activities. If they wish to demonstrate impact and social engagement via these statistics, I can envisage it being a requirement -- part of the job -- to publicise as well as conduct the research.

526 Micah Altman escience@mit.edu MIT 7/18/14 13:10 Substantive - Scientific Basis of Altmetric Construction
Scholarly metrics should be broadly understood as measurement constructs applied to the domain of scholarship and research (broadly, any form of rigorous enquiry): outputs, actors, impacts (i.e. broader consequences), and the relationships among them. Most traditional formal scholarly metrics, such as the H-index, Journal Impact Factor, and citation count, are relatively simple summary statistics applied to the attributes of a corpus of bibliographic citations extracted from a selection of peer-reviewed journals. The altmetrics movement aims to develop more sophisticated measures, based on a broader set of attributes and covering a deeper corpus of outputs. As the Draft aptly notes, in general our current scholarly metrics, and the decision systems around them, are far from rigorous: "Unfortunately, the scientific rigor applied to using these numbers for evaluation is often far below the rigor scholars use in their own scholarship." [1] The Draft takes a step towards a more rigorous understanding of altmetrics. Its primary contribution is to suggest a set of potential action items to increase clarity and understanding. However, the Draft does not yet identify the key elements of a rigorous (or systematic) foundation for defining scholarly metrics, their properties, and their quality; nor does it identify key research in evaluation and measurement that could provide such a foundation. The aim of these comments is to start to fill this structural gap. Informally speaking, good scholarly metrics are fit for use in a scholarly incentive system. More formally, most scholarly metrics are parts of larger evaluation and incentive systems, where the metric is used to support descriptive and predictive/causal inference, in support of some decision. Defining metrics formally in this way also helps to clarify what characteristics of metrics are important for determining their quality and usefulness.
- Characteristics supporting any inference. Classical test theory is well developed in this area. [2] A useful metric supports some form of inference, and reliable inference requires reliability. [3] Informally, good metrics should yield similar results across repeated measurements of the same purported phenomenon.
- Characteristics supporting descriptive inference. Since an objective of most incentive systems is descriptive, good measures must have appropriate measurement validity. [4] In informal terms, all measures should be internally consistent, and the metric should be related to the concept being measured.
- Characteristics supporting prediction or intervention.
Since the objective of most incentive systems is both descriptive and predictive/causal inference, good measures must aid accurate and unbiased inference. [5] In informal terms, the metric should demonstrably be able to increase the accuracy of predicting something relevant to scholarly evaluation.
- Characteristics supporting decisions. Decision theory is well developed in this area. [6] The usefulness of a metric depends on the cost of computing the metric and the value of the information that the metric produces. The value of the information depends on the expected value of the optimal decisions that would be made with and without that information. In informal terms, good metrics provide information that helps one avoid costly mistakes, and good metrics cost less than the expected cost of the mistakes one avoids by using them. (A compact formulation is sketched after the references below.)
- Characteristics supporting evaluation systems. This is a more complex area, but the fields of game theory and mechanism design are most relevant. [7] Measures that are used in a strategic context must be resistant to manipulation -- either (a) requiring extensive resources to manipulate, (b) requiring extensive coordination across independent actors to manipulate, or (c) incentivizing truthful revelation. Trust engineering is another relevant area -- characteristics such as transparency, monitoring, and punishment of bad behavior, among other systems factors, may have substantial effects. [8]
The above characteristics comprise a large part of the scientific basis for assessing the quality and usefulness of scholarly metrics. They are necessarily abstract, but closely related to the categories of action items already in the report, in particular to Definitions, Research Evaluation, Data Quality, and Grouping. Specifically, we recommend adding the following action items respectively:
- [Definitions] Develop specific definitions of altmetrics that are consistent with best practice in the social-science field on the development of measures.
- [Research Evaluation] Promote evaluation of the construct and predictive validity of individual scholarly metrics, compared to the best available evaluations of scholarly impact.
- [Data Quality and Gaming] Promote the evaluation and documentation of the reliability of measures, their predictive validity, cost of computing, potential value of information, and susceptibility to manipulation based on the resources available, incentives, or collaboration among parties.
[1] NISO Altmetrics Standards Project White Paper, Draft 4, June 6, 2014; page 8.
[2] See chapters 5-7 in Raykov, Tenko, and George A. Marcoulides. Introduction to Psychometric Theory. Taylor & Francis, 2010.
[3] See chapter 6 in Raykov, Tenko, and George A. Marcoulides. Introduction to Psychometric Theory. Taylor & Francis, 2010.
[4] See chapter 7 in Raykov, Tenko, and George A. Marcoulides. Introduction to Psychometric Theory. Taylor & Francis, 2010.
[5] See Morgan, Stephen L., and Christopher Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge University Press, 2007.
[6] See Pratt, John Winsor, Howard Raiffa, and Robert Schlaifer. Introduction to Statistical Decision Theory. MIT Press, 1995.
[7] See ch. 7 in Fudenberg, Drew, and Jean Tirole. Game Theory. MIT Press, 1991.
[8] Schneier, Bruce. Liars and Outliers: Enabling the Trust That Society Needs to Thrive. John Wiley & Sons, 2012.
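
For concreteness, the reliability and value-of-information points above can be written compactly in standard classical test theory and decision theory notation. This is only an illustrative sketch; the symbols below are generic textbook notation and are not drawn from the white paper itself.

\[ X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} \]

An observed metric X decomposes into a true score T plus measurement error E; reliability is the share of observed variance that is true-score variance, so a reliable metric yields similar values on repeated measurement of the same phenomenon.

\[ \mathrm{VoI} = \mathbb{E}\big[U(d^{*}_{\mathrm{with\ metric}})\big] - \mathbb{E}\big[U(d^{*}_{\mathrm{without\ metric}})\big], \qquad \text{worth computing only if } c_{\mathrm{metric}} \le \mathrm{VoI} \]

Here d* is the optimal decision given the available information, U the utility of the resulting outcome, and c_metric the cost of producing the metric: the metric earns its keep only when the expected cost of the mistakes it prevents exceeds the cost of computing it.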
Submitter Proposed Solution: see the three recommended action items (Definitions; Research Evaluation; Data Quality and Gaming) listed above.

527/8 Cameron Neylon cneylong@plos.org PLOS 7/21/14 Substantive - PLOS Response to NISO Altmetrics White Paper
*Executive Summary*
PLOS welcomes the White Paper, which offers a coherent statement of the issues and challenges facing the development, deployment, adoption, and effective use of new forms of research impact indicators. PLOS has an interest in pursuing several of the actions suggested, particularly in three areas:
1. Actions to increase the availability, diversity, and quality of new and emerging indicators of research use and impact
2. Community building to create shared, public, and open infrastructures that will enhance trust in, critique of, and access to useful information
3. Expansion of existing data sources and tools to enhance the availability of indicators on the use and impact of research data and other under-served research output types
PLOS proposes that the Potential Action Items be prioritised by the community as well as ordered in terms of dependencies and timelines. We propose some prioritisation below. In terms of timelines, we suggest that a useful categorisation would be to divide actions into "near-term" (i.e. the next 6-12 months), "within-project", and "beyond-project". NISO should focus its efforts on those actions that are a good fit for the organisation's scope and remit. These will focus on tractable best practice developments alongside coordinating community developments in this space. The best-fit actions for NISO are:
1. Develop specific definitions for alternative assessment metrics
4. Identify research output types that are applicable for the use of metrics
14. Develop strategies to increase trust, e.g. openly available data, audits, or a clearinghouse
17. Identify best practices for grouping and aggregation by journal, author, institution, and funder
One aspect missing from the report is a discussion of data licensing. This is directly coupled to the issue of availability, but also underpins many aspects of our comments. Downstream service provision, as well as wider critique and research, will be best supported by clear, consistent, and open terms for usage data. Identifying how to achieve this will be an ongoing community challenge.
*Categorising and Prioritising the Potential Action Items*
The following builds on our submission to the HEFCE Enquiry on Metrics in Research Assessment. Our fundamental position is that these new indicators are currently useful as evidence to support narratives.
They may be useful in the mid-term as comparators within specific, closely related sets of work, but even in the long term it is unlikely that any form of global analysis across the research enterprise could rely credibly on mechanistic analysis. This position is built on our understanding of, and expertise in, what currently available indicators are useful for and what further work is required for their sensible use in a range of use cases. Currently the underlying data is not generally available, is not of high enough consistency and quality, and has not been subject to sufficient critical analysis to support quantitative comparisons across research outputs in general. Enabling deep critical analysis of what the data can tell us requires access to a larger corpus of consistent and coherent data. This data should be available for analysis by all interested parties, to build trust and enable critique from a wide range of perspectives. Therefore our priorities are: first, to make more data available; second, to drive an increase in the consistency and comparability of that data through comparison, definition development, and ultimately agreement on standards; third, to create the environment in which interested parties can compare, analyse, aggregate, and critique the underlying data, emerging standards, aggregation and analysis processes, and the uses to which indicators and aggregate measures are put. Our experience is that widespread adoption can only be built on community trust in frameworks, systems, and uses of these new sources of information. That community trust can only be built, in turn, by using reliable and widely available data to support conversations on how that information might be used. Throughout all of this, a powerful mechanism for ensuring an even playing field for innovative service providers, enabling scholarly critique, and ensuring transparency is to ensure that the underlying data that supports the development of metrics is openly available.
*Suggestions on Prioritisation*
We propose priorities at the level of the categories used in the White Paper. The Potential Actions listed under Data Quality and Gaming speak most directly to the question of data availability and consistency. In parallel with this, initial actions under Discovery and Evaluation will define the use cases needed for the next stage of analysis. Many of these actions are also possible immediately, and several are underway. We suggest that the actions listed under Research Outputs, Grouping and Aggregation, and Context form the next tier of priorities, not because these are less important but because they will depend on wider data availability to be properly informed. They will also build on each other and therefore should be taken forward together. Work on Definitions, Stakeholder Perspectives, and Adoption is necessary, and elements of this should be explored. Success in these areas will be best achieved by ensuring stakeholder engagement in the development activities above. Trust and wider engagement will be best achieved by adopting open approaches to the technical developments, including an active outreach component. PLOS is keen to engage in concrete actions that will help to deliver community infrastructures and wider data availability.
We are already engaged in projects that are investigating shared community spaces and infrastructure (the Crossref DOI Event Tracker Working Group) and exploring new data sources and forms of indicator, as well as ongoing work on community building and advocacy (European Research Monitoring Workshop - report forthcoming).
*Timing*
Several potential actions are feasible to undertake immediately or are already underway. Engaging with those initiatives already promoting unique identifiers and building shared infrastructure will be valuable. The Crossref Working Group should be engaged as a locus for data availability and experimentation. The Data Citation Interest Group hosted by FORCE11 is another point of contact, and is already engaged with the Research Data Alliance. In the medium term, a stable working group locus should be established to provide a centre for ongoing conversations. This could be based within a NISO context or could take advantage of the FORCE11 infrastructure. There should be coordination with the European Commission Study on 'Science 2.0', which proposes an observatory of changing practices in the research enterprise. Within the scope of the NISO project, such a group could seek to develop Best Practice statements in this space, using the Data Citation Principles as a template. Any such group should be broad-based and include a range of stakeholders, including those beyond the technical development community. We regard the development of "capital S Standards" as feasible for only a small subset of indicators within the 2014-16 timeframe. It may be possible to define Standards development for some data collection practices and definitions of data sources/API usage, such as Mendeley reader counts or Facebook counts. However, as all such efforts take time, it is important to lay the groundwork for such Standards to be developed as and when they can be. The working group should therefore focus on producing Best Practice and "small s standards" as and when possible, so as to build the wider conversation towards wide adoption of consistent community practice that can ultimately be codified.
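
As a purely illustrative sketch of what a "small s standard" for collecting such counts might look like in practice, the snippet below shows one possible source-agnostic record for a single indicator observation, carrying its own provenance so it can be audited, aggregated, or re-collected. All field names, the IndicatorObservation class, the observe helper, and the example DOI are hypothetical and invented for illustration; they do not correspond to any existing specification, provider API, or NISO recommendation.

# Illustrative sketch only: a hypothetical, source-agnostic record for one indicator
# observation. Field names are invented and do not reflect any existing standard,
# provider API, or NISO recommendation.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class IndicatorObservation:
    target_id: str          # persistent identifier of the research output, e.g. a DOI
    target_id_type: str     # "doi", "handle", "arxiv", ...
    source: str             # data source, e.g. "mendeley", "facebook", "twitter"
    indicator: str          # what was counted, e.g. "readers", "shares", "mentions"
    value: int              # the raw count as reported by the source
    collected_at: str       # ISO 8601 timestamp of collection, for reproducibility
    collection_method: str  # how the count was gathered, e.g. "provider-api"

def observe(target_id: str, source: str, indicator: str, value: int,
            method: str = "provider-api") -> dict:
    """Wrap a raw count in a normalized, provenance-carrying record."""
    record = IndicatorObservation(
        target_id=target_id,
        target_id_type="doi",
        source=source,
        indicator=indicator,
        value=value,
        collected_at=datetime.now(timezone.utc).isoformat(),
        collection_method=method,
    )
    return asdict(record)

# Example: two observations for the same (placeholder) article from different sources,
# expressed in the same shape so they can be compared or aggregated downstream.
if __name__ == "__main__":
    print(observe("10.1371/journal.pone.0000000", "mendeley", "readers", 42))
    print(observe("10.1371/journal.pone.0000000", "facebook", "shares", 7))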