World Benchmark Calculations for IntERAct

Citations per paper for world benchmarking

We can use the Scopus Custom Dataset to determine year-specific, FoR-specific world benchmark citations per paper. These benchmarks need to be calculated for each publication year 2005-2010 for each relevant 4-digit FoR code (designated cppFoR,y).

For a particular 4-digit FoR code and a given year in the ERA window, we need to identify the outputs in the Scopus Custom Dataset that satisfy the following eligibility criteria:

1. have that publication year (based on the element publicationyear in the Scopus Custom Dataset);
2. are in journals with that 4-digit FoR permitted on the 2012 ERA journal list (based on ISSN and/or title matching); and
3. are indexed with a document type of journal article, conference paper or review (based on the element citation type having a value of ar, cp or re in the Scopus Custom Dataset; these are the Scopus article types the ARC used in ERA 2010 to determine the world benchmarks).

If c is the sum of citation counts for the publications that satisfy criteria 1-3 above, and n is the number of publications that satisfy them, then the benchmark for that FoR and publication year is:

cppFoR,y = c / n

That is, the citations per paper benchmark for a particular year and FoR is the sum of citations to the publications in that FoR published in that year, divided by the total number of those publications. This needs to be done for each publication year 2005, 2006, 2007, 2008, 2009 and 2010, for each FoR in which citation analysis is being used.

Relative Citation Impact (RCI)

RCI for an individual paper

For an individual indexed journal article published in year y, with a citation count of cja as at the date we capture static citation counts and calculate benchmarks, the Relative Citation Impact for that article in a particular FoR (RCIja,FoR) is:

RCIja,FoR = cja / cppFoR,y

If the apportionment of the journal article to a particular FoR is a, then the apportioned Relative Citation Impact for that FoR (aRCIja,FoR) is:

aRCIja,FoR = RCIja,FoR * a

Average Apportioned RCI (world) for an FoR

The Average Apportioned RCI for an FoR is the average of the apportioned RCIs for the papers in that FoR: that is, the sum of all of the apportioned RCIs divided by the total apportioned count of indexed articles in the FoR. In the ERA 2010 Citation Benchmarking Methodology document this was referred to as the Average RCI (world). In the current document it is referred to as the Average Apportioned RCI (AaRCI), to differentiate it from the average of the RCIs calculated without taking apportionment into account (designated ARCI).
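As a rough illustration of the calculations defined above, the following is a minimal Python sketch. It assumes the Scopus Custom Dataset has already been parsed into simple records and that the ERA 2012 journal list has been matched to Scopus journals; the field names used (journal_id, year, doc_type, cites) are hypothetical stand-ins for the actual dataset elements (e.g. publicationyear, citation type).

```python
from collections import defaultdict

# Minimal sketch of the benchmark and RCI calculations defined above.
# Field names are hypothetical stand-ins for the Scopus Custom Dataset elements.
ELIGIBLE_TYPES = {"ar", "cp", "re"}  # journal article, conference paper, review


def world_benchmarks(records, journal_fors):
    """cppFoR,y for every (4-digit FoR, publication year) cell.

    records      -- iterable of dicts, one per Scopus-indexed output
    journal_fors -- dict: journal_id -> set of 4-digit FoR codes permitted
                    for that journal on the ERA 2012 journal list
    """
    totals = defaultdict(lambda: [0, 0])          # (FoR, year) -> [c, n]
    for rec in records:
        if rec["doc_type"] not in ELIGIBLE_TYPES:
            continue
        for for4 in journal_fors.get(rec["journal_id"], ()):
            totals[(for4, rec["year"])][0] += rec["cites"]
            totals[(for4, rec["year"])][1] += 1
    return {cell: c / n for cell, (c, n) in totals.items()}


def rci(cites, cpp):
    """Article-level RCI: static citation count over the year/FoR benchmark."""
    return cites / cpp


def average_apportioned_rci(papers, cpp_by_year):
    """AaRCI for one FoR.

    papers      -- list of (cites, year, apportionment) tuples for the FoR
    cpp_by_year -- dict: publication year -> cppFoR,y for this FoR
    """
    total_arci = sum(rci(c, cpp_by_year[y]) * a for c, y, a in papers)
    total_apportionment = sum(a for _, _, a in papers)
    return total_arci / total_apportionment
```

For the ten hypothetical papers in Table 1 below, average_apportioned_rci gives approximately 1.16, and the corresponding unapportioned average (the sum of the RCIs divided by 10) gives approximately 1.17, matching the totals shown in the table.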
Table 1 shows each of these calculations for a hypothetical FoR to which 10 papers are assigned. (The number of outputs is too small to be statistically meaningful; it is for illustrative purposes only.)

Paper | Static citation count cja (from Scopus Custom Dataset) | Apportionment a (from IntERAct) | Publication year y (from IntERAct) | World benchmark cppFoR,y (calculated from Scopus Custom Dataset) | RCIja,FoR = cja/cppFoR,y | Apportioned RCI aRCIja,FoR = RCIja,FoR * a | RCI Class
1 | 2 | 0.8 | 2005 | 2.3 | 0.869565 | 0.695652 | II
2 | 3 | 0.5 | 2005 | 2.3 | 1.304348 | 0.652174 | III
3 | 3 | 1.0 | 2006 | 3.1 | 0.967742 | 0.967742 | II
4 | 4 | 0.3 | 2006 | 3.1 | 1.290323 | 0.387097 | III
5 | 3 | 1.0 | 2007 | 3.8 | 0.789474 | 0.789474 | I
6 | 5 | 0.7 | 2007 | 3.8 | 1.315789 | 0.921053 | III
7 | 6 | 0.8 | 2007 | 3.8 | 1.578947 | 1.263158 | III
8 | 4 | 0.2 | 2008 | 4.2 | 0.952381 | 0.190476 | II
9 | 5 | 1.0 | 2008 | 4.2 | 1.190476 | 1.190476 | II
10 | 6 | 1.0 | 2009 | 4.2 | 1.428571 | 1.428571 | III

Total number of articles: 10
Total apportionment: 7.3
Total RCIFoR: 11.68762; ARCIFoR = 11.68762 / 10 = 1.17
Total aRCIFoR: 8.485873; AaRCIFoR (world) = 8.485873 / 7.3 = 1.16

Table 1: RCI calculations for a hypothetical FoR with 10 publications assigned with various apportionments.

RCI Classes

We can use the RCI calculated for each article (RCIja,FoR) to assign an RCI Class based on the seven classes of RCIs used by the ARC in ERA 2010. The range of RCIs corresponding to each RCI Class is shown in the first two columns of Table 2. An RCI Class Profile can then be compiled by counting the number of apportioned articles within each RCI Class. As an example, the RCI classes for each of the 10 papers in the hypothetical FoR are shown in the last column of Table 1. (Note that the RCI class assignment for an individual publication is based on the article's RCI, not the apportioned RCI; however, as described, the profile is compiled by counting the apportioned articles within each RCI class.) Based on the information in Table 1, the RCI class profile for this FoR would be as shown in Table 2.

Class | RCI range | Apportioned no. of indexed articles | % of apportioned indexed articles
0 | 0 | 0 | 0%
I | 0.01 – 0.79 | 1.0 | 13.7%
II | 0.80 – 1.19 | 3.0 | 41.1%
III | 1.20 – 1.99 | 3.3 | 45.2%
IV | 2.00 – 3.99 | 0 | 0%
V | 4.00 – 7.99 | 0 | 0%
VI | >= 8 | 0 | 0%

Table 2: RCI Classes and the RCI Class profile for the hypothetical FoR with the publications described in Table 1.
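As a rough illustration, the class assignment and the apportioned class profile described above can be sketched as follows (a minimal Python sketch; the treatment of RCI values falling between the published class boundaries is an assumption, noted in the comments).

```python
# RCI classes as used by the ARC in ERA 2010 (Table 2). The published ranges
# are given to two decimal places; treating the upper bounds as strict
# cut-offs (e.g. class I is 0 < RCI < 0.80) is an assumption about how values
# falling between the published ranges would be handled.
RCI_CLASS_UPPER_BOUNDS = [
    ("I", 0.80), ("II", 1.20), ("III", 2.00),
    ("IV", 4.00), ("V", 8.00), ("VI", float("inf")),
]


def rci_class(article_rci):
    """Class is assigned on the article-level RCI, not the apportioned RCI."""
    if article_rci == 0:
        return "0"
    for label, upper in RCI_CLASS_UPPER_BOUNDS:
        if article_rci < upper:
            return label


def rci_class_profile(articles):
    """RCI class profile for one FoR, counting apportioned articles per class.

    articles -- iterable of (article_rci, apportionment) pairs
    Returns {class: (apportioned count, % of total apportionment)}.
    """
    counts = {"0": 0.0}
    counts.update({label: 0.0 for label, _ in RCI_CLASS_UPPER_BOUNDS})
    for article_rci, a in articles:
        counts[rci_class(article_rci)] += a
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total if total else 0.0)
            for label, n in counts.items()}
```

Applied to the ten papers in Table 1, this reproduces the profile in Table 2: 1.0 apportioned article in class I, 3.0 in class II and 3.3 in class III, out of 7.3 apportioned articles in total.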
Notes on the proposed methodology for world benchmark calculations

We are assuming that benchmarking will be based on outputs indexed by Scopus as articles, reviews or conference papers, as per the 2010 benchmarking methodology.

The proposed methodology for determining the year-specific benchmarks does not use any data from MD or 2-digit-coded journals. In ERA 2010, the ARC actually used the data submitted to identify MD and 2-digit journals that behaved more like 4-digit journals: if at least 25% of all articles submitted to ERA 2010 in a particular MD or 2-digit-coded journal were assigned a particular 4-digit FoR code, and this constituted more than 50 apportioned items, that MD or 2-digit journal contributed to the relevant FoR code world benchmark. The ARC applied this approach to 157 journals in ERA 2010; the way in which these journals were coded for the purpose of the benchmark calculations is presented in Appendix 2 of the ERA 2010 Benchmarking Methodology document. As an example, the journal Applied Physics Letters was assigned 02, 09 and 01 on the ERA 2010 RJL but contributed to the benchmarks in FoR 0204. (I wondered whether the ARC used this to refine the FoR codes provided for these journals on the 2012 list, but Applied Physics Letters still has 09 and 02 in the 2012 list.) Obviously, we do not have access to this data for the 2012 submission, since it will be based on the actual FoR code allocations institutions use in the 2012 submission.

Further, the 66% rule may be a complicating factor. For example, if journal articles in a particular journal are consistently reassigned to a particular non-listed FoR using the 66% rule, the ARC might choose to use that journal in the benchmark calculations for that non-listed FoR.

In short, while we can calculate reliable benchmarks based on the Scopus Custom Dataset, they will not be exact matches of the benchmarks used by the ARC. This is because of the MD and 2-digit journal issue above, and because the capture date of the citation counts we use to determine benchmarks will be some months earlier than the 1 March 2012 date used by the ARC.

For the Average Apportioned RCI calculations we have assumed that apportionment will be used in the same way in ERA 2012 as in ERA 2010. The ARC has not made any statement regarding the citation benchmark methodology to be adopted in ERA 2012. It is therefore proposed that we present both the Average RCIs and the Average Apportioned RCIs in IntERAct.

Proposed Stages for Benchmark Calculations for IntERAct

Stage 1

a. Match the ERA 2012 Journal List to the Scopus Custom Dataset

A critical first step is to match the Scopus Custom Dataset to the ERA 2012 Journal List. This can be done on the basis of ISSNs as well as journal titles, but will also require some manual effort. For example, the British Medical Journal, indexed by Scopus as "BMJ", is on the ERA Journal List as "BMJ: British Medical Journal", so it would not match on title. Further, the ISSNs on the ERA 2012 Journal List do not include the ISSN in the Scopus journal record for BMJ, so ISSN matching would also fail. Such cases would need to be matched manually.

b. Calculate world benchmarks for 4-digit FoRs and extract static citation counts for UQ indexed papers

Calculate year-specific, FoR-specific world benchmark citations per paper (cppFoR,y) at the 4-digit FoR level. Extract static citation counts from the Scopus Custom Dataset for each publication in eSpace with an EID. Calculate RCIs and aRCIs for each indexed article and present them in IntERAct. Calculate Average RCIs (ARCI) and Average Apportioned RCIs (AaRCI) for each 4-digit FoR where citation analysis is used. Make the data available in IntERAct (screenshots are in the Appendix).

Stage 2

a. MD and 2-digit journal considerations

Include the journals listed in Appendix 2 of the 2010 benchmarking methodology document: take the 2010 MD and 2-digit list (Appendix 2), work out which journals are still MD or 2-digit in 2012, assign them the re-mapped FoR codes used in the 2010 exercise, and re-calculate the year-specific, FoR-specific world benchmark citations per paper (cppFoR,y) at the 4-digit FoR level. Identify which 4-digit benchmarks change because of the inclusion of these MD/2-digit journals, and decide whether to update the 4-digit benchmarks on this basis.

b. 2-digit benchmark calculations

Calculate year-specific, FoR-specific world benchmark citations per paper (cppFoR,y) at the 2-digit FoR level. In ERA 2010, a 2-digit level benchmark was used where the number of journal articles indexed in Scopus for a given FoR across all ERA years was fewer than 800. For 4-digit FoRs where the combined outputs for 2005-2010 number fewer than 800, compare the 2-digit benchmark with the 4-digit benchmark and decide which to use.

c. Centile Analysis

The world centile thresholds for a particular FoR code and publication year can be derived by determining the number of citations required to be in the top 1, 5, 10, 25 and 50 per cent of all articles in the Scopus Custom Dataset satisfying the eligibility criteria outlined in 1, 2 and 3 above. Put together a "Percentiles Table" showing, by year of publication, the minimum number of citations a paper needs to meet each percentile within each 4-digit FoR. This comes with the caveat that our calculations will not be the same as the ARC's.
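To derive the thresholds for the Percentiles Table, something along the following lines could be used. This is a minimal sketch only: it assumes the citation counts of the eligible papers for a given 4-digit FoR and publication year have already been extracted, and the rank-based cut-off is our own assumption, since the ARC's exact percentile convention is not published.

```python
import math


def centile_thresholds(citation_counts, percentiles=(1, 5, 10, 25, 50)):
    """Minimum citations needed to fall in the top p% of eligible papers.

    citation_counts -- citation counts of all eligible papers for one
                       (4-digit FoR, publication year) cell
    Uses a simple rank-based definition: the threshold for the top p% is the
    citation count of the paper at rank ceil(p% of n) when the counts are
    sorted in descending order. The resulting thresholds are indicative only.
    """
    ranked = sorted(citation_counts, reverse=True)
    n = len(ranked)
    if n == 0:
        return {p: None for p in percentiles}
    thresholds = {}
    for p in percentiles:
        rank = max(1, math.ceil(n * p / 100.0))  # last rank still inside the top p%
        thresholds[p] = ranked[rank - 1]
    return thresholds
```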
Stage 3

1. Australian HEP Benchmarks

It has been decided not to pursue Australian benchmark calculations, as we cannot accurately calculate Australian HEP benchmarks. For ERA, these will be calculated from the papers actually submitted to ERA by Australian HEPs, determined by the staff census date approach rather than by addresses on papers. However, options using "addresses on papers" as proxies were discussed; three of them are described here.

One is to follow the same methodology as for the world benchmarks but restrict it to those items in the Scopus Custom Dataset with Australia listed in the address (as an affiliation country). This would include all outputs with Australian research organisations listed on publications, including CSIRO and similar bodies, not just the HEPs.

The second is to search for the outputs of each of the Australian HEPs separately, based on the organisation field in the Scopus Custom Dataset.

The third is to derive Go8 benchmarks (plus, perhaps, some of the other institutions that did well in ERA 2010, such as Griffith). We would then calculate HEP Go8 benchmarks based on address searches of the Scopus Custom Dataset. This may be useful in FoRs where the Australian RCI is higher than the world benchmark.

APPENDIX: Suggestions for IntERAct

Add a column here for the Average Apportioned RCI (AaRCI). Only include rows for the following: UQ Total Indexed (number of publications); UQ Total Indexed apportioned number; UQ Total Cites (Scopus); and AaRCI, the Average Apportioned RCI (taking apportionment into account).

Include additional columns here for RCIja,FoR and aRCIja,FoR for each paper. The column headings would be "RCI" and "aRCI".

One or more RCIs will need to be presented here, as they are FoR-code specific. For an output assigned 100% to one FoR, only one RCI is required; where assignment is to more than one FoR, all relevant RCIs will need to be presented. Present article-level RCIs and apportioned RCIs (aRCI) as: FoR code, RCIja,FoR, aRCIja,FoR (e.g. 0302, 2.3, 1.8). The column heading should be "FoR, RCI, aRCI".

This is the article-level apportioned RCI (i.e. aRCIja,FoR). The column heading should be "aRCI".

This is the article-level RCI (i.e. RCIja,FoR); it will show the relative performance of the article across all possible FoRs. The column heading should be "RCI".

For each assigned FoR, include columns for RCIja,FoR (the article-level RCI) and aRCIja,FoR (the article-level apportioned RCI). The column headings for these would be "RCI" and "aRCI": the "World RCI" column should be relabelled "RCI", and another column added for "aRCI". Remove the World cpp column and all columns with "National" benchmark information.
Definitions:

Acronym | IntERAct Label | Description | Definition/Formula
FoR | FoR % | Field of Research |
a |  | Apportionment |
cpp | cpp | Citations per paper | Derived from the Scopus Custom Dataset; year- and FoR-dependent
c | Scopus Cites | Citation count of an individual paper | Based on the static citation count in the Scopus Custom Dataset
AaRCI | AaRCI | Average Apportioned RCI (taking apportionment into account) | The sum of all of the article-level apportioned RCIs divided by the total apportioned count of indexed articles in the FoR (in ERA 2010 this was referred to as the Average RCI (world))
RCIja,FoR | RCI | Article-level RCI (based on the citations per paper benchmark for papers in that FoR published in the same year as the article of interest) | RCIja,FoR = cja / cppFoR,y
aRCIja,FoR | aRCI | Article-level apportioned RCI | aRCIja,FoR = RCIja,FoR * a