Sara Pérez Álvarez, Felipe Pablo Álvarez and Isidro F. Aguillo
{sara.perez.alvarez; felipepablo.alvarez; isidro.aguillo}@cchs.csic.es
Cybermetrics Lab, CCHS-CSIC, Albasanz 26-28, Madrid, 28037 (Spain)
Abstract
Open access repositories are a reliable source of academic items that can be used to test the capabilities of webometric analysis. This paper deals with the actions needed to extract web indicators from bibliographic records in open access repositories, provides guidelines to support further webometric studies and presents the results of a preliminary web impact evaluation carried out over a sample of 1386 EU FP7 output papers available from the OpenAIRE database. The European Commission project OpenAIRE aims, among other objectives, to provide impact measures to assess research performance from repository contents and, especially, that of Special Clause 39 (SC39) project participants within EU FP7. Using URL citations, title mentions and copies of titles as the main web impact indicators, this study suggests that, a priori, the implementation of the mandatory clause SC39 to encourage open access to European research may indeed have resulted in a greater and more immediate web visibility of these papers.
Introduction
Webometrics is a quantitative science devoted to the analysis of scholarly communication, like Informetrics, Bibliometrics and Scientometrics. It was introduced in the mid-nineties by a group of researchers including Ingwersen (1997, 1998), Rousseau (1997), Aguillo (1998), Bar-Ilan (1999), Smith (1999) and Thelwall (2002, 2009), among others.
Methodologically, Webometrics is concerned with gathering data and measuring aspects of the Web, for example: web sites or web pages; hyperlinks; rich files (documents) or media files; web search engine results; Web 2.0 social networks, etc. Data can be collected directly, using robots or crawlers specially customized for this task, or indirectly, by extracting information from the databases of the large commercial search engines (Google, Bing and others).
The web indicators can be useful for describing both formal and informal academic activities and results, the performance of organizations, institutions, research groups or even individual scientists and scholars. They can be grouped in three main families (Table 1): those describing activity or presence, counting number of pages, documents, files or other items; a second group describes the visibility or impact of such contents, obtaining statistics after applying link or mention analysis; finally, usage analysis is a fairly new group consisting of numbers related to visits and visitors of the websites. A more detailed classification is available in Aguillo (2009).
1 This work is supported by the OpenAIRE project, grant agreement number 246686, under the Seventh Framework Programme of the European Union. The authors thank the Statistical Analysis Unit of the CCHS-CSIC (Spain) for its assistance.
Table 1. Comparative classification of the main webometric and bibliometric indicators.
FAMILY | WEBOMETRICS | BIBLIOMETRICS
Activity | Web pages; Web documents; Web domains; Web contents | Publications; Authors/Affiliations; Disciplines; Evolution/Dynamics
Impact | Link Analysis; Mention Analysis; Web 2.0 | Citation Analysis; Semantic Analysis
Usage | Visits/Visitors; Downloads; Altmetrics | Journal Circulation
As shown in Table 1, a novel and promising approach is to examine the use and citation of articles in new forums: Web 2.0 services (Priem & Hemminger, 2010). Because measurements of these new traces may inform alternatives to traditional citation metrics, they have been dubbed “altmetrics”. This is an umbrella term that condenses ideas on how to combine social media with aspects of traditional scholarly practice (Priem et al., 2010). As such, it is properly a subset of Webometrics (Bar-Ilan et al., 2012).
Within this framework, this paper deals with actions needed for extracting web indicators from bibliographic records in open access repositories, provides guidelines to support further webometric studies and presents the results of a preliminary web impact evaluation carried out over a sample of records available in the OpenAIRE repository network.
The European Commission project OpenAIRE aims to deliver an electronic infrastructure and supporting mechanisms for the identification, deposition, access and monitoring of Framework Programme 7 (FP7) and European Research Council (ERC) funded articles, and to provide impact measures to assess research performance from repository contents and, especially, that of Special Clause 39 (SC39) project participants within EC FP7. SC39 covers scholarly literature across 7 disciplines for projects granted after August 2008. In particular, the participants of projects with SC39 in their contracts shall deposit their scientific publications in institutional or subject-based repositories allowing open access (after an embargo period, if applicable) to project outcomes.
OA means free availability of the results of research, and it also brings other advantages such as immediate and global dissemination, increased citations, new metrics, open data, access to publicly funded research, etc. (Giglia, 2010; Swan, 2007; Suber, 2006; Jeffery, 2006; Zhang, 2006). OpenAIRE is working to integrate its information closely with the CORDA database, the master database of all EU-funded research projects. Soon it should be possible to click on a project in CORDIS (the EU’s portal for research funding), for example, and to access all the open access papers published by that project (Manola, 2012).
Given the limited duration of the OpenAIRE project (2009-2012), data collection for traditional bibliometrics is highly constrained since impact analysis based on bibliometrics requires an extensive period of time (at least 2-3 years) after research results have been published in order to gain enough insights. As a result, the alternative metrics promising an earlier analysis (usage statistics and webometrics) are also considered in OpenAIRE as indicators for assessing the impact of FP7 publications.
Figure 1 shows the relative positions of the three groups of indicators being developed according to their specific characteristics. In that figure, the concept of quality refers to the fact that citations come from peers recognizing papers already published in refereed journals with a high visibility.
Figure 1: Mapping the indicators according to general characteristics (figure labels: Coverage, Latency, Freshness, Quality; indicator groups: Visits/Downloads, Citations, Mentions/Links). Source: Aguillo (2011a)
Methodology
The original and most widespread approach to web impact analysis is to count hyperlinks to the objects studied. However, although link counts had been available from free commercial search engines for over a decade, this search facility no longer exists in 2012 (Thelwall & Sud, 2011). The loss of this tool requires theoretical and methodological developments in Webometrics (Aguillo, 2012). The role of link analysis might now be assumed by mention analysis, a promising technique that had already been reported by several authors (Aguillo, 2009; Thelwall, 2009). Without abandoning the search engines, the goal now is to analyze not links but terms or phrases, and to evaluate their presence in a quantitative way. In this regard, two possible alternative methods for estimating the online impact of any piece of information on the Web are URL citations and title mentions. A URL citation is the mention of the URL of a web page or web site in another web page, whether accompanied by a hyperlink or not, and a title mention is the inclusion of a title in a web page, with or without a hyperlink (Thelwall, Sud & Wilkinson, 2012).
The data set analyzed here is a collection of 1386 records (9% are SC39 records: 122 titles) available in the OpenAIRE network and extracted on November 1st, 2011. Not all the repositories with SC39 records are OpenAIRE compliant, which means that at this stage only partial results are available.
OpenAIRE is based on a technology developed in an earlier project called Driver (www.driverrepository.eu/). It uses the same underlying technology to index Framework Programme 7 (FP7) publications and results. FP7 project participants are encouraged to publish their papers, reports and conference presentations in their institutional open access repositories. The OpenAIRE engine constantly crawls these repositories to identify and index any publications related to FP7-funded projects (Manola, 2012). It is also linked to CERN's open access repository for 'orphan' publications, those from FP7 participants that do not have access to their own institutional repository.
OpenAIRE stores bibliographic metadata (harvested from repositories or claimed by end-users) and related project information (from CORDIS) in a database. Relevant data is then indexed and represented in the OpenAIRE portal (www.openaire.eu). The sample records were obtained by querying the OpenAIRE database and the results were transformed into a CSV file.
First, the extracted CSV fields are analyzed. The aim is to detect bibliographic errors, to evaluate the quality of the metadata in the records and to prepare the different web search strategies. The unit of the webometric analysis is each deposited item, represented by the title and the (first or corresponding) author of the paper in order to count title mentions and copies of documents, and by its URL or URLs in order to count URL citations. Figure 2 shows the basic model for data extraction: parsing the record, cleaning strange characters and preparing the different strategies with the correct syntax for the search engines to be used in the web impact study.
Figure 2. Theoretical diagram of actions needed for extracting web indicators from bibliographic records in an open access repository.
As for the sources used for extracting the web indicators, the following have been considered (Table 2):
Table 2. Proposal of sources for building individual web indicators.
WEB INDICATORS
SECTIONS TOOLS
Public
Web
Search engines
Bing
Specialized Scholar
ACTIVITY IMPACT
Webpages Documents Ranks
YES
YES
YES
YES
YES
PageRank
USAGE
Links Mentions Ranks
YES
YES
YES
Web
2.0
Web 2.0 tools
Mendeley
Bibsonomy
YES
YES
(YES)
(YES)
Google includes the "link" operator, but this does not allow an easy collection of aggregated data.
That is, it provides the total number of links to a webpage, but it does not allow quantifying the links coming from a source (Orduña-Malea, 2012).
Regarding Web 2.0, we have also studied other tools like CiteULike, Connotea and Delicious. Additional applications and their possibilities for research assessment are described in Wouters & Costas (2012). We have considered the search results returned by general search engines because they provide all the different mentions at once (Table 3). The strategy used is: "title" site:Web2.0 domain. Finally, we have chosen Mendeley (mendeley.com) and Bibsonomy (bibsonomy.org). In the case of Connotea and Delicious, a small set of titles from the sample was searched, but no usable results were obtained.
Table 3. Title mentions returned by general search engines in Web 2.0 domains.
Domain | General search engine results (title mentions)
Mendeley | Metadata record. Mentions as related research. Number of times referenced by other documents.
Bibsonomy | Number of times referenced by other documents. Metadata record in each author’s list of documents in BibTeX format.
CiteULike | Metadata record. “Posting history”.
Connotea | Number of times the document is tagged in a bookmark.
Delicious | Number of times the document is tagged in a bookmark.
Quality of metadata
The main problem detected is the lack of homogeneity among repositories when providing the information:
Character encoding differs among repositories and in many cases, especially when the language is not English or the discipline uses Greek letters (mathematical or scientific symbols) or non-roman letters, the record is full of strange characters. This applies to titles and to authors’ names as well.
In some repositories, the fields are unexpectedly empty.
Some fields have multiple entries, e.g. different URLs.
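The three problems listed above lend themselves to an automatic check before any query is built. The following is a minimal sketch, not the procedure used in this study: the record layout (a dict with `title`, `author` and `url` keys), the `;` separator for multiple URLs and the list of suspicious characters are all assumptions made for illustration.

```python
# Heuristic markers of mis-decoded text (an assumption, not an exhaustive list).
SUSPICIOUS_MARKERS = ("Ã", "Â", "\ufffd")


def check_record(record, url_separator=";"):
    """Return a list of quality problems found in one bibliographic record.

    `record` is a hypothetical dict with 'title', 'author' and 'url' keys.
    """
    problems = []
    for field in ("title", "author", "url"):
        value = (record.get(field) or "").strip()
        if not value:
            # Field is missing or unexpectedly empty.
            problems.append(f"{field}: empty")
            continue
        if field != "url" and any(m in value for m in SUSPICIOUS_MARKERS):
            # Title or author contains characters typical of encoding damage.
            problems.append(f"{field}: suspicious characters")
    urls = [u for u in (record.get("url") or "").split(url_separator) if u.strip()]
    if len(urls) > 1:
        # Several URLs in one field: each one needs its own URL citation query.
        problems.append(f"url: {len(urls)} entries")
    return problems
```

Records flagged this way would still need manual review, since legitimate titles can contain any of the marker characters.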
Guidelines for webometric analysis
The following criteria and procedures are recommended for performing the analysis:
Titles of scientific papers are usually long, which reduces the probability of generating noise, so the full title of the record is used (but without exceeding the limits of the search engines: no more than 32 words in Google or 150 characters in Bing). The text should be enclosed in quotation marks (strict adjacency operator) for exact matching.
When the number of characters is low, the first author’s last name can be added.
If there are two versions of the title (original and translated), they can be combined using the OR operator, but the limitations of search engines when using more than one "boolean" operator must be taken into account.
Regarding titles with non-standard characters, the use of the wildcard operator (*) by Google, Google Scholar and Bing has been studied. Table 4 summarizes the main results obtained when testing the wildcard operator (*) on titles from the sample. As a result, the effectiveness of this operator is questionable in all search engines, and it is recommended to use the search string with standard characters (or parts of the title with standard characters combined by the AND operator) + author last name.
Table 4. Use of wildcard operator (*) by Bing, Google Scholar and Google.
Columns: Wildcard operator (*) | Exceptions found (titles from the sample)

BING: Find multiple forms of a word
1st e.g. Title: An example of high order residual distribution scheme using non Lagrange elements: example of Bézier and NURBS.
Search query (*): "An example of high order residual distribution scheme using non Lagrange elements: example of B* and NURBS". No results.
Search query (by incomplete title): "An example of high order residual distribution scheme using non Lagrange elements: example of ". Relevant results.
2nd e.g. Title: La conception et les usages de ressources en ligne comme moteur et révélateur du travail collectif des enseignants.
Search query (*): "La conception et les usages de ressources en ligne comme moteur et révélat* du travail collectif des enseignants". No results.
Search query (AND): "La conception et les usages de ressources en ligne comme moteur" AND "du travail collectif des enseignants". Relevant results.

SCHOLAR: Substitute for whole words
E.g. Title: A colecção de estirpes autóctones de Saccharomyces cerevisiae das principais regiões vitivinÃcolas portuguesas.
Search query (*): "A * de estirpes * de Saccharomyces cerevisiae das principais * portuguesas". No results.
However, correct full title: "A colecção de estirpes autóctones de Saccharomyces cerevisiae das principais regiões vitivinícolas portuguesas". Relevant results.
Proposed search: allintitle:Saccharomyces cerevisiae das principais author:Machado (or: estirpes AND "Saccharomyces cerevisiae das principais" AND portuguesas author:Machado). Relevant results.

GOOGLE: Substitute for whole words
E.g. Title: The cosmology of induced $f({\cal R})$ gravity.
Search query (*): "The cosmology of induced * gravity" AND "Brouzakis". 3 filtered results are returned (5 unfiltered), but 2 of them are unrelated.
However, the search using the same criteria as for Bing and Google Scholar: "The cosmology of induced" AND "gravity" AND "Brouzakis" returns 21 filtered results (70 unfiltered), all relevant.
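The title-preparation criteria above (exact phrase, engine length limits, splitting around non-standard characters and adding the author's last name) can be sketched as a small function. This is an illustration under stated assumptions rather than the tool used in the study: the function names are hypothetical, "standard" words are approximated by a simple pattern of unaccented letters and digits, and every word is counted as a relevant word.

```python
import re

GOOGLE_MAX_WORDS = 32    # Google limit quoted in the guidelines
BING_MAX_CHARS = 150     # Bing limit quoted in the guidelines
MIN_RELEVANT_WORDS = 5   # below this, the author's last name is added

# Assumption: a "standard" word holds only unaccented letters, digits, hyphens
# or apostrophes, optionally followed by one punctuation mark.
STANDARD_WORD = re.compile(r"[A-Za-z0-9][A-Za-z0-9'\-]*[.,:;?!]?")


def standard_chunks(title):
    """Split a title into runs of consecutive standard words."""
    chunks, current = [], []
    for word in title.split():
        if STANDARD_WORD.fullmatch(word):
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


def truncate(chunk, engine):
    """Cut a phrase to the length limit of the given engine."""
    if engine == "google":
        return " ".join(chunk.split()[:GOOGLE_MAX_WORDS])
    if len(chunk) <= BING_MAX_CHARS:
        return chunk
    # Drop the last, possibly incomplete, word.
    return chunk[:BING_MAX_CHARS].rsplit(" ", 1)[0]


def build_title_query(title, author_last_name, engine="google"):
    """Build an exact-phrase title query, adding the author when needed."""
    chunks = standard_chunks(title)
    # Add the author for short titles and for titles split by odd characters.
    needs_author = len(chunks) != 1 or len(title.split()) < MIN_RELEVANT_WORDS
    phrases = [f'"{truncate(chunk, engine)}"' for chunk in chunks]
    if needs_author:
        phrases.append(f'"{author_last_name}"')
    return " AND ".join(phrases)
```

For the title with the LaTeX fragment in Table 4 and the author Brouzakis, this sketch yields the same string as the successful query reported there: "The cosmology of induced" AND "gravity" AND "Brouzakis".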
Once these criteria have been applied to prepare titles and authors, Table 5 presents a summary of the indicators and search strategies used in the analysis:
Table 5. Web indicators and search strategies.
Columns: Web indicator | Search strategy

Title mentions:
"Title". Exceptions:
Titles with fewer than 5 relevant words: "title" AND author's last name.
Titles with non-standard characters: "The search string with standard characters (or parts with standard characters combined by the AND operator)" AND author's last name.
Search in Mendeley and Bibsonomy: "Title" site:Web 2.0 domain.

Title copies:
Google: intitle:"x" or allintitle:x.
Google Scholar: operators allintitle and author; also the structure intitle:"x" AND author:x. As allintitle returns errors with some symbols, intitle is used instead, as in the case of Google.
Bing: there is no allintitle operator, but intitle:"x" can be used instead.
As a result, the search strategy suggested here is:
Bing and Google: intitle:"x" (intitle: "x") (AND author last name)
Google Scholar: intitle:"x" (intitle: "x") (author:x)

URL citations:
"URL"
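As an illustration of Table 5, the query strings for one record could be assembled as follows. This is a sketch only: the function and dictionary keys are hypothetical names, the title is assumed to be already cleaned, and the `intitle:`, `author:` and `site:` operators are taken from the table, so their availability in any given search engine is not guaranteed.

```python
# Web 2.0 domains selected in this study.
WEB20_DOMAINS = ("mendeley.com", "bibsonomy.org")


def quote(text):
    """Wrap text in double quotes for exact matching, dropping inner quotes."""
    return '"' + text.replace('"', "") + '"'


def search_strategies(title, author_last_name, url):
    """Return the query strings of Table 5 for one record, keyed by indicator."""
    phrase = quote(title)
    strategies = {
        # Title mentions: the exact title as a phrase.
        "title_mentions": phrase,
        # Title copies: the title restricted to the page title.
        "title_copies_google_bing": f"intitle:{phrase}",
        "title_copies_scholar": f"intitle:{phrase} author:{author_last_name}",
        # URL citations: the exact URL as a phrase.
        "url_citations": quote(url),
    }
    for domain in WEB20_DOMAINS:
        # Title mentions inside one Web 2.0 service.
        strategies[f"web20_{domain}"] = f"{phrase} site:{domain}"
    return strategies
```

Keeping all strategies for a record in one dictionary makes it easy to run the same record against several engines and store the counts side by side.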
Results
Some relevant results from the data set are:
94% of the sample records are “open”. However, 69% link only to the metadata record in the repository instead of the full-text document.
Only 26% (16% for the SC39 records) provide a URL to the full text (97% to PDF files; 50% in the case of SC39 records).
Regarding the names of the PDF files, most of them (95%) are not representative, as they do not refer clearly to the document content (instead, they consist of numbers, title abbreviations combined with authors, other codes, parts, etc.). We consider that the most appropriate way to name a PDF file would include explicit semantic content related to the author(s), publication year and title.
There is a lack of homogeneity among repositories in the type and number of URLs to be extracted for this field. 88% of the total records present one unique URL (76%) or two (22%). In the case of SC39 records this percentage is still higher, reaching almost 100%: 79% present 1 URL and 19% present 2.
The results of the webometric analysis are presented below. All data come from the filtered results offered by the search engines.
URL citations
The total number of URLs analysed is 1800 and the total number of URL citations received using Google is 4807. As most of the records in the sample have a unique URL, and 69% link only to the metadata record in the repository, it is not surprising that the largest number of citations received comes from this type of URL (Table 6). The main repositories by number of URL citations received are: 1) The CERN Document Server (http://cdsweb.cern.ch/); 2) French repositories: L'archive ouverte pluridisciplinaire HAL (http://hal.archives-ouvertes.fr), HAL – Inria (http://hal.inria.fr/); 3) University of Twente (http://doc.utwente.nl/); 4) The Orphan Repository (http://openaire.cern.ch/).
Table 6. Distribution of URL citations.
Type of URL | Nº of URLs in the sample | Nº of URL citations in Google | Bibliographic citations (by DOI) in Google Scholar | Percentage over the total citations | Percentage over the total nº of URLs
Metadata records in the main repositories | 999 | 2834 | | 59% | 56%
PDFs | 374 | 921 | | 19% | 21%
PURL (handles) | 266 | 496 | | 10% | 15%
Other URLs (mainly, metadata records in databases such as IEEE Xplore, Science Direct, etc.) | 68 | 305 | | 6% | 4%
DOIs (identifier) | 24 | 197 | 288 | 4% | 1%
Other file formats (not PDF) | 69 | 54 | | 1% | 4%
TOTAL | 1800 | 4807 | | 100% | 100%
However, the number of citations received varies significantly by type of URL (Kruskal-Wallis test, p < 0.001). In particular, DOIs receive more citations than metadata record URLs (Figure 3). This is also true for SC39 records.
Figure 3. Nº of citations/Type of URLs.
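The difference between URL types can also be read directly from the totals in Table 6 by dividing citations by the number of URLs of each type. The short sketch below does only that; it is a descriptive ratio of means, not the rank-based Kruskal-Wallis test reported above, and the shortened row labels are ours.

```python
# (number of URLs in the sample, URL citations in Google), copied from Table 6.
TABLE_6 = {
    "Metadata records": (999, 2834),
    "PDFs": (374, 921),
    "PURL (handles)": (266, 496),
    "Other URLs": (68, 305),
    "DOIs": (24, 197),
    "Other file formats": (69, 54),
}


def citations_per_url(table):
    """Return the mean number of URL citations per URL for each URL type."""
    return {kind: round(cites / urls, 2) for kind, (urls, cites) in table.items()}


ratios = citations_per_url(TABLE_6)
# List URL types from the most cited per URL to the least cited.
for kind, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{kind}: {ratio}")
```

On these totals DOIs average about 8.2 citations per URL, against about 2.8 for metadata records, which agrees with the direction of the result shown in Figure 3.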
Given the high ratio of URL citations per DOI, this set is worth analyzing in more detail. The number of times these 24 titles have been cited by other works was therefore obtained through Google Scholar, giving a total of 288 bibliographic citations. As expected, the most cited publication of this set proved to be the oldest (from 2004). Nevertheless, the next three most cited DOIs are SC39 titles (from 2009 and 2011).
Title mentions
In webometric analysis, self-mentions should always be excluded (Aguillo, 2012) using expressions like "-site:urlrepository". Self-mentions in Google represent only 1% of the results, while in Bing this figure is significantly higher, 47%. Taking this into account, the results presented in this study always refer to non self-mentions. It is also noteworthy that the difference between the number of mentions returned by Google (56414) and by Bing (7793) is considerable: Bing returns 86% fewer results.
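Both steps in this paragraph are simple enough to show explicitly. The sketch below appends the exclusion expression to a query and reproduces the 86% figure from the two totals; the function names and the example host are illustrative assumptions.

```python
def exclude_self_mentions(query, repository_host):
    """Append a -site: restriction so hits from the repository itself are dropped."""
    return f"{query} -site:{repository_host}"


def relative_decrease(reference, other):
    """Return how much lower `other` is than `reference`, as a percentage."""
    return (reference - other) / reference * 100


# Title mentions without self-mentions: Google versus Bing.
print(round(relative_decrease(56414, 7793)))  # 86
# Example query for a record held in a repository at hal.inria.fr.
print(exclude_self_mentions('"Some title"', "hal.inria.fr"))
```

The same helper reproduces the 82% gap between title copies in Google (8960) and in Google Scholar (1601).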
As for the distribution of mentions, and focusing on Google (Bing is set aside because of its lower number of results), the largest share of titles (Figure 4) falls in the range from 31 to 40 mentions (20% of titles).
Figure 4. Title mentions (without self-mentions) in Google.
SC39 titles represent 10% of total mentions. In this case (Figure 5), the highest percentage of results is in the range of 41 to 50 mentions (27% of titles). Figure 5 also shows that 61% of the results are concentrated in the second part of the graph, from 41 mentions onwards, unlike what happened with the whole sample set (Figure 4), where the highest weight lies on the first half of the chart (0 to 40 mentions).
Figure 5. SC39 title mentions (without self-mentions) in Google.
Copies of titles
A total of 8960 copies of titles were found in Google, 10% of them SC39 titles. The highest percentage of results (52%) is in the range between 1 and 5 copies. In the case of the SC39 titles, most have between 6 and 10 copies. On the other hand, it is rare to find titles that have more than 20 copies or none (only 4%), and the same is true of SC39 titles.
Using Google Scholar, 1601 copies were detected (82% fewer than in Google), 9% of them SC39 titles. 81% of the titles in Google Scholar have only 1 copy and in no case are there more than 10 copies per title. Again, this also applies to SC39 titles.
Social bookmarking: Mendeley and Bibsonomy
The study of title mentions in the Mendeley and Bibsonomy sites using Google reflects a larger presence in the former than in the latter (Figure 6). For the total sample: 64% presence in Mendeley versus 35% presence in Bibsonomy. For the SC39 records the ratios are 83% versus 27%.
There were a total of 4216 title mentions from Mendeley (11% relate to SC39 titles). In both cases, the highest concentration of mentions is in the range of 1 to 5. Specifically, 45% of the titles are mentioned from 1 to 5 times in Mendeley, while in the case of SC39 titles this figure rises to 61%. No title exceeds 60 mentions in this site (no more than 30 mentions in the case of SC39 titles).
Regarding Bibsonomy, there were a total of 2639 mentions (5% refer to SC39 titles). In both cases, most titles are not mentioned in this social bookmarking service. Of the titles mentioned in Bibsonomy, most appear only 1 to 5 times, and those mentioned more than 10 times are rarer still. The same applies to SC39 titles (Figure 6).
Figure 6. Title mentions in Bibsonomy and Mendeley (by Google)
As has been observed, the share of SC39 records is in general similar in all cases, around 9 to 11%. However, this is not true of Bibsonomy, where this share drops to 5%.
Considering the sum of the collected mentions (titles, copies of titles and presence on Mendeley and Bibsonomy) as an overall indicator of web impact, there is a slight but statistically significant difference in favor of the SC39 titles (Mann-Whitney test, p < 0.02). In other words, titles that meet clause SC39 have a slightly higher visibility than the rest of the titles in the sample.
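For readers unfamiliar with the test, the statistic behind this comparison can be sketched in a few lines. The code below computes the Mann-Whitney U statistics for two groups of overall scores using average ranks for ties; it does not compute the p-value, and the numbers in the usage lines are invented toy values, not data from this study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), the Mann-Whitney U statistics of the two samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    # U_a counts the pairs in which the value from sample_a is the larger one
    # (ties count one half).
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Toy overall scores (mentions + copies + Mendeley + Bibsonomy), invented.
sc39_scores = [52, 61, 47, 70]
other_scores = [33, 45, 38, 52, 29]
print(mann_whitney_u(sc39_scores, other_scores))
```

A U value for one group close to the product of the two sample sizes means that group's scores are almost always the larger ones; a statistics package would then turn U into the p-value quoted above.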
Conclusions
One of the main conclusions to be drawn from this study is that the lack of homogeneity and standardization in the records, in terms of levels of description and/or terminology used in certain fields (such as titles, authors or URLs, in type and number), makes webometric analysis difficult, so it has been necessary to establish some recommendations. Furthermore, an analysis of this type on a set of titles requires a specific design of the search strategies, which must be studied in detail to ensure that the results obtained are truly representative of each and every record in the sample.
Regarding the results of the present study, the number of title mentions is greater than the number of URL citations. That is, there are fewer mentions of the “addresses” of the documents than of their titles: 4807 URL citations versus 56414 title mentions. Likewise, the most cited “addresses”, proportionately, are those referring to DOIs.
As for the title mentions, the fact that self-mentions do not pose a relevant percentage of the total mentions retrieved by Google (not in the case of Bing), suggests that the sample achieved substantial visibility (taking into account that, a priori, 99% of the total mentions come from external sites). However, it is necessary to develop similar studies to compare and to correlate the results obtained.
In connection with SC39 records, it is interesting to highlight that most of the records that have a URL to a DOI are SC39 and that, even though these publications are recent (mainly 2009-2011), they are in the leading positions in terms of number of bibliographic citations received. Furthermore, most records in the sample do not exceed 40 mentions per title, but in the case of SC39 titles visibility is proportionally higher, as most have between 41 and 100 mentions per title. In terms of visibility on the selected social bookmarking tools, there is a much more notable presence in Mendeley than in Bibsonomy, especially in the case of SC39 records.
Searching for copies of titles, there was again a greater presence in the record set of SC39 titles (in Google), with an average of 6 to 10 copies detected for each title. However, it is rare (both on Google and Google Scholar) to find a title which does not have any copy.
Ideally, the full-text files of the documents should have greater visibility. For that, it is advisable to use as official URL the one to the full text document, as Aguillo (2011b) recommended when discussing research priorities in relation to the open access initiatives. It is also recommended that this URL to the full text would be, if possible, short in length, without strange or complex codes and with meaningful content.
There is growing evidence suggesting that open access increases the citation and impact of research results, as Swan (2010) concluded after analyzing a series of studies devoted specifically to this issue. The data obtained in the present study suggest, a priori, that the implementation of the SC39 mandatory clause to encourage open access to European research may indeed have resulted in a greater and more immediate visibility of these titles. Of course, subsequent studies are needed to confirm this result more firmly.
References
(All URLs have been reviewed in May 2012)
Aguillo, I.F. (1998). STM information on the Web and the development of new Internet R & D databases and indicators. In D. Raitt, (Ed.). Proceedings, Online Information 98 (pp. 239-243).
London: Learned Information.
Aguillo, I.F. (2012). La necesaria evolución de la cibermetría. Anuario ThinkEPI, 2012, v. 6. Retrieved from http://www.thinkepi.net/la-necesaria-evolucion-de-la-cibermetria
Aguillo, I.F. (2011a). Building web indicators for the EU OA repository. In: Workshop on New
Research Lines in Informetrics . IPP-CCHS (CSIC). Madrid, May 16th 2011. Retrieved from http://digital.csic.es/bitstream/10261/40279/1/OpenAIRE%20Webometrics.pdf
Aguillo, I.F. (2011b). Ranking Web de repositorios: Webometrics y el acceso abierto. In:
Visibilidad y Acceso a la Producción Científica . Lima (Perú). 22-24 de septiembre de 2011.
Aguillo, I.F. (2009). Measuring the institution's footprint in the web. Library Hi Tech, 27(4): 540-556.
Bar-Ilan, J. (1999). Search engine results over time - a case study on search engine stability.
Cybermetrics , 2 (1): Paper 1. Retrieved from http://cybermetrics.cindoc.csic.es/articles/v2i1p1.html
Bar-Ilan, J., Haustein, S., Peters, I., Priem, J., Shema, H. & Terliesner, J. (2012). Beyond citations: Scholar’s visibility on the social Web (Preprint). Retrieved from http://arxiv.org/ftp/arxiv/papers/1205/1205.5611.pdf
Giglia, E. (2010). Open access to scientific research: where are we and where are we going? European Journal of Physical and Rehabilitation Medicine. Minerva Medica, 461-469. Retrieved from http://eprints.rclis.org/bitstream/10760/14980/1/eur_jnl_med_rehab_3_2010_open_access%5B1%5D.pdf
Ingwersen, P. (1998). The calculation of Web Impact Factors. Journal of Documentation, 54(2), 236-243.
Jeffery, K. (2006). Open access: An introduction. Retrieved from http://www.ercim.org/publication/Ercim_News/enw64/jeffery.html
Manola, N. (2012). Open access: EU project results go public. Retrieved from http://cordis.europa.eu/fetch?CALLER=PRINT_OFFR&SESSION=&ACTION=D&RCN=8519
Orduña-Malea, E. (2012). Fuentes de enlaces web para análisis cibermétricos (2012). Anuario ThinkEPI, 2012 (to be published).
Priem, J., Taraborelli, D., Groth, P. & Neylon, C. (2010). Alt-metrics: A manifesto. Retrieved from http://altmetrics.org/manifesto
Priem, J., & Hemminger, B. H. (2010). Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web. First Monday , 15 (7). Retrieved from http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2874/2570
Rousseau, R. (1997). Sitations: an exploratory study. Cybermetrics , 1(1), paper 1. Retrieved from http://cybermetrics.cindoc.csic.es/articles/v1i1p1.html
Smith, A. G. (1999). A tale of two Web spaces; comparing sites using Web Impact Factors.
Journal of Documentation , 55(5), 577-592.
Suber, P. (2006). Open access overview. Retrieved from http://www.earlham.edu/~peters/fos/overview.htm
Swan, A. (2007). Open Access and the progress of science. American Scientist , 95, (3), 198-200.
Swan, A. (2010). The Open Access citation advantage: Studies and results to date. Technical
Report, School of Electronics & Computer Science, University of Southampton. Retrieved from http://eprints.ecs.soton.ac.uk/18516/2/Citation_advantage_paper.pdf
Thelwall, M. (2002). An initial exploration of the link relationship between UK university Web sites. ASLIB Proceedings , 54(2), 118-126.
Thelwall, M. (2009). Introduction to webometrics: Quantitative Web research for the social sciences. Synthesis Lectures on Information Concepts, Retrieval, and Services , 116 pp. doi:10.2200/S00176ED1V01Y200903ICR004
Thelwall, M. and Sud, P. (2011). A comparison of methods for collecting web citation data for academic organizations. Journal of the American Society for Information Science and
Technology , 62: 1488–1497. doi: 10.1002/asi.21571
Thelwall, M., Sud, P., & Wilkinson, D. (2012). Link and co-inlink network diagrams with URL citations or title mentions. Journal of the American Society for Information Science and
Technology (in press). Retrieved from http://www.scit.wlv.ac.uk/~cm1993/papers/URCitationsTitleMentionNetworks_preprint.doc
Wouters, P. & Costas, R. (2012). Users, narcissism and control – tracking the impact of scholarly publications in the 21st century, SURFfoundation. Utrecht. Retrieved from http://www.surffoundation.nl/nl/publicaties/Documents/Users%20narcissism%20and%20control.pdf
Zhang, Y. (2006). The Effect of Open Access on Citation Impact: A Comparison Study Based on Web Citation Analysis. Libri, 56(3), 145-156. Retrieved from http://librijournal.org/pdf/2006-3pp145-156.pdf