Public Engagement with Research Online

Appendix N: Evaluating the impact of research online with Google Analytics

The web provides extensive opportunities for raising awareness and discussion of research findings and issues. As a commonly used channel for communication, the web can also provide a source of data for evidencing the impact of research dissemination, public engagement and knowledge transfer activities. For example, the number and location of people accessing a research report can be used as an indicator of reach, and favorable quotations from practitioner discussion forums citing research can illustrate significance.

Web analytics refers to the study of user data collected on websites. Online commerce has been the main application area driving the development of web analytics in recent years. Nonetheless, the general goal of web analytics is to capture and analyse data on the use made of websites. Here we present an overview of the historical background and the technologies used for tracking user behavior. We also highlight the features of Google Analytics and how it can be set up to monitor and evidence the impact of research online.

Background

The World Wide Web was proposed in March 1989 and the first web browser was developed in December 1990. As use of the web spread, people became interested in who was accessing their pages and for what purposes, and by the mid 1990s commercial companies (such as Web Trends and Analog) were emerging that provided reports of log file data. This was the start of web analytics: the measurement, analysis and reporting of user behavior on the web.

From a technical perspective, a web server program logs each request for an HTML element, recording the Internet Protocol (IP) address of the client computer (i.e. browser), the date, the time, the element requested, and the status of the request. Each request in the log file is referred to as a ‘hit’.
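The log fields listed above can be illustrated with a short sketch. It assumes the server writes its log in the widely used Common Log Format; the sample lines, the regular expression and the names `LOG_PATTERN`, `parse_hit` and `SAMPLE_LOG` are illustrative assumptions rather than part of any particular analytics product.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (Common Log Format), for example:
# 192.0.2.10 - - [04/Oct/2012:09:15:02 +0100] "GET /index.html HTTP/1.0" 200 5120
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<element>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_hit(line):
    """Return one hit as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # Convert the timestamp and status code into typed values.
    hit["when"] = datetime.strptime(hit["when"], "%d/%b/%Y:%H:%M:%S %z")
    hit["status"] = int(hit["status"])
    return hit


# Made-up sample data using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.10 - - [04/Oct/2012:09:15:02 +0100] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.10 - - [04/Oct/2012:09:15:03 +0100] "GET /logo.png HTTP/1.0" 200 2048',
    '198.51.100.7 - - [04/Oct/2012:10:02:41 +0100] "GET /index.html HTTP/1.0" 200 5120',
    'not a log line',
]

# Keep only the lines that parsed, then summarise hits per hour and per element.
hits = [h for h in map(parse_hit, SAMPLE_LOG) if h is not None]
hits_per_hour = Counter(h["when"].strftime("%Y-%m-%d %H:00") for h in hits)
hits_per_element = Counter(h["element"] for h in hits)

print(len(hits))
print(dict(hits_per_hour))
print(dict(hits_per_element))
```

In this sample the first two hits come from the same address within a second of each other, one for a page and one for an image, so counting hits alone does not say how many pages were requested.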
As the IP address of each web browser can be attributed to a geographical location, the summary reports typically identify the number of hits for specific time periods (e.g. hourly, weekly, monthly) per location (e.g. Europe, America, Asia). In 1996 hit-counters started to appear on pages, showing the number of requests for that page. However, by 1997 the number of hits logged no longer represented the number of page requests, because multiple elements (e.g. text and images) were being used to create a page. This led to the use of JavaScript tags that can be included as a page element to explicitly log each page request. JavaScript tags are still the most commonly used way of explicitly logging page request data.

In 2004 the Web Analytics Association was formed as a professional association for web analytics; it changed its name to the Digital Analytics Association in 2010 to reflect a broader approach to multi-channel analytics. In 2005, Google launched the Google Analytics platform. This uses JavaScript tags to log page requests with Google, and the logged data can then be accessed as online reports through the Google Analytics website. Other commercial providers, such as ClickTale, also host customer analytics services. As well as third-party services, a range of server software for logging and reporting user behavior is available for installing on a web server; this includes open source solutions such as Piwik.

Web analytics for online commerce

Online commerce has been one of the primary drivers for web analytics and there has been significant interest in using analytics to help identify the impact of website design and marketing initiatives on online sales and popularity. Google Analytics, for example, currently presents reports on: audience, advertising, traffic sources, content and conversions. The audience reports include: demographics (in terms of the location and browser language setting), visitor behavior (i.e.
the number of new and returning visits, and the duration of their page visits), technology used (i.e. the browser version, operating system and the network service provider), mobile (i.e. the number of visitors via specific phones or other mobile devices), and visitors’ flow (i.e. the pathways commonly used through the website). The reports on advertising can be used to describe access associated with AdWords. Traffic source reports identify the source and frequency of referral links from other websites, access via search engines (including the search terms used), and the number of visitors accessing the website directly by entering the web address. The content reports identify the relative popularity of pages within the website. Finally, the conversions reports can be used to help indicate the performance of the website with regard to e-commerce.

Web analytics for evaluating research impact

The potential of web analytics to help evidence impact has recently drawn considerable interest from the academic research community, where researchers are increasingly required to account for the impact of their research in terms of the reach and significance of their work outside of their research communities. The following section gives a brief overview of the process of setting up a Google Analytics account and the types of report that are produced. Other web analytic platforms are available, but Google Analytics is currently the most widely used free service.

Getting started: Setting up Google Analytics

To set up a Google Analytics account you first need to sign in to the website (http://www.google.com/analytics; see Figure 1, left) with a Google user account, which can be created at http://accounts.google.com. After signing in, new Google Analytics accounts can be added, edited and deleted under the ‘Admin’ area (accessed by clicking on the ‘Admin’ tab in the top navigation bar of the webpage; see Figure 1, right).

Figure 1.
Google Analytics sign in page (left) and account administration page (right).

Figure 2. Google Analytics profile administration page.

As previously noted, JavaScript tags are used to record user behavior in web analytics platforms. In Google Analytics this is referred to as the ‘tracking code’. The tracking code is unique to each Google Analytics account. Once signed in, the code can be copied from the profile page within the ‘Admin’ area (see Figure 2) and pasted into every web page that is to be tracked under that account (for further guidance see the Google Analytics support pages [1]).

Once a Google Analytics account has been set up and the tracking code has been inserted into the web pages that are to be monitored, the resulting data can be accessed via the Google Analytics account page. A range of reports is automatically generated (see Figure 3). The date range for a report can be modified; under each type of report an initial overview is presented, with more specific reports available under each section.

Figure 3. Google Analytics overview reports for audience (left) and traffic sources (right).

The specific metrics used within each report are explained in the Google Analytics help pages. A brief description of each metric is also displayed as a mouse-over pop-up within the reports. There is also a set of explanatory video clips, provided as preparation for the Google Analytics Individual Qualification test [2], that support the interpretation of reports.

Example use case: Using Google Analytics to evaluate impact

When web pages are part of a research dissemination, public engagement or knowledge transfer activity, the use made of those pages can provide an insight into the impact of the activity. The audience report in Google Analytics, for example, provides information on the number and location of website visitors over a specified time period, which can be used to evidence reach.
The following examples, drawn from the work of the Centre for Competitive Advantage in the Global Economy (CAGE) at the University of Warwick, are provided to illustrate how Google Analytics can be deployed to inform engagement activities and evidence impact. They show how the impact of dissemination, engagement and knowledge transfer activities can be informed by (and evidenced through) the audience, traffic source and content reports, and how dashboards can be configured to provide specific report features. Finally, we consider how these reports can provide an overview of user behavior, in terms of online activity, and how the reports can be further explored to extract specific details.

[1] Google Analytics Support Page: Manage Google Analytics – Basic web tracking setup, ‘How to set up the web tracking code’, http://support.google.com/analytics/bin/answer.py?hl=en&answer=1008080 (last accessed November 2012).
[2] The Google Analytics Individual Qualification test and associated preparatory video clips are available from http://www.google.com/intl/en/analytics/iq.html (last accessed November 2012).

Audience reports

The audience reports present data on the number and location of people visiting the website (see Figure 4). The number of visitors is recorded in several forms, including the number of visits, the number of unique visits, the number of page visits, and the average number of pages per visit. It is important to remember the form of data being collected when interpreting it: there is no explicit way to identify each person; instead, the Internet address and details of each web browser are recorded. Selecting the appropriate visitor metric depends on the question that you need to answer. For example, the number of visits includes repeat visits, whereas the number of unique visits does not. The location data refers to the registered city of the Internet Service Provider.
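To make the distinction between these metrics concrete, the following sketch counts them from a handful of made-up page view records. The record layout, the `visitor_id` and `visit_id` fields and all of the values are hypothetical; Google Analytics derives its own figures internally, and this is not its implementation.

```python
from collections import namedtuple

# Hypothetical record: one row per page view. A "visitor" here is a browser,
# not a person, in line with the form of data described above.
PageView = namedtuple("PageView", ["visitor_id", "visit_id", "page"])

page_views = [
    PageView("browser-a", "a-1", "/home"),
    PageView("browser-a", "a-1", "/research"),
    PageView("browser-a", "a-2", "/home"),      # same browser, a later visit
    PageView("browser-b", "b-1", "/home"),
    PageView("browser-b", "b-1", "/research"),
    PageView("browser-b", "b-1", "/home"),      # return to a page already viewed
]

# Visits include repeat visits by the same browser.
visits = len({pv.visit_id for pv in page_views})

# Unique visitors count each browser once, however often it returns.
unique_visitors = len({pv.visitor_id for pv in page_views})

# Page views count every page request, including returns to the same page.
total_page_views = len(page_views)

# Average number of pages viewed per visit.
pages_per_visit = total_page_views / visits

print(visits, unique_visitors, total_page_views, pages_per_visit)
```

Here two browsers account for three visits, so a question about how many distinct browsers reached the site and a question about how often the site was visited give different answers from the same data.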
As a result of the form of data being captured, audience reports tend to give a good indication of the frequency and location of visitors, which helps to evidence the reach of the work being communicated through the website. In the audience overview report shown in Figure 4 there is a clear spike in the number of visits on October 4th, with over 67% of visits during that month being from browsers where the language setting was en-us (i.e. English - United States) and over 10% with the language setting of en-gb (i.e. English - Great Britain). By default, the demographic data refers to the language rather than the location (i.e. country / territory or city). In this case, 739 of the 1,421 visits in the month (52%) were from the United Kingdom and 215 (15%) were from the United States.

Figure 4. An audience report for the CAGE website at the University of Warwick for a one month time period (i.e. October 2012).

Of the 1,421 visits during October 2012, 733 were from new visitors (i.e. people who had not accessed the website before) and 866 were from returning visitors. The 1,421 visits came from 863 unique visitors (a visitor here being a distinct browser rather than a person), indicating that some visitors came to the website more than once during the month.

Traffic source reports

The traffic source reports present data on the visits resulting from: search engines (i.e. the results of a search query); referrals (i.e. links) from other websites; directly entering the web address; and campaigns (i.e. online promotion through paid campaign search keywords and adverts). The traffic source overview indicates the pathways used to access the website, although some of the details regarding search term access are not available or may be withheld (e.g. by not agreeing to browser cookies) (see Figure 5).

Figure 5. A traffic source report for the CAGE website at the University of Warwick for a one month time period (i.e. October 2012).
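The four kinds of traffic source described above can be illustrated with a simplified classifier. This is a sketch under stated assumptions, not the rule set that Google Analytics applies: the `SEARCH_HOSTS` list, the function name `classify_source`, the sample URLs, and the convention that a `utm_medium` parameter on the landing URL marks a campaign visit are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Illustrative list only; real reports recognise many more search engines.
SEARCH_HOSTS = {"www.google.com", "www.bing.com", "search.yahoo.com"}


def classify_source(landing_url, referrer):
    """Return 'campaign', 'direct', 'search' or 'referral' for one visit."""
    query = parse_qs(urlparse(landing_url).query)
    if "utm_medium" in query:
        # Assumed marker for a link that was tagged as part of a campaign.
        return "campaign"
    if not referrer:
        # No referring page: the address was typed in or bookmarked.
        return "direct"
    host = urlparse(referrer).netloc.lower()
    if host in SEARCH_HOSTS:
        return "search"
    # Any other referring website counts as a referral.
    return "referral"


# Hypothetical visits as (landing URL, referrer) pairs.
sample_visits = [
    ("http://www.example.org/research", ""),
    ("http://www.example.org/research",
     "http://www.google.com/search?q=happiness+economics"),
    ("http://www.example.org/research", "http://edition.cnn.com/some-article"),
    ("http://www.example.org/research?utm_medium=cpc", "http://www.google.com/"),
]

source_counts = Counter(classify_source(url, ref) for url, ref in sample_visits)
print(dict(source_counts))
```

The campaign check is made first so that a paid link clicked on a search results page is counted as a campaign visit rather than as organic search; a different ordering would be an equally defensible design choice for other purposes.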
For campaigns that pay for search keywords and adverts, the source of search traffic (i.e. the search engine providers) and the search keywords used will be an important indicator of the value of the campaign. For both paid campaigns and organic search, the keywords used by people who visit the website can also provide a useful indicator of the concepts associated with the website and the work that it communicates.

The other part of the traffic source data that can be useful for informing dissemination, engagement and knowledge transfer activities is the source of referral traffic. Much like the most commonly used search engines or search keywords, the more common sources of website referrals (reported in the Referral Traffic Source data table) can give a clear indication of the publics that are engaging with the website. In the example used here, the spike in website visits around October 4th can be attributed to an article on Andrew Oswald’s work published on the cnn.com news website, which linked to pages on the CAGE website.

Content reports

The content report overview provides data regarding the specific pages (i.e. website content) accessed during the selected time period (see Figure 6). As with the audience report, the number of page views is measured in terms of the total number (including returns to a previously viewed page) and the number of unique page views. Based on the time between the page views within a visit, the average time on the page is also presented for the given time period. The overview also displays the bounce rate and the percentage exit. The bounce rate is the percentage of visits within the selected time period that viewed only a single page (i.e. landing on the website and ‘bouncing off’ again). The percentage exit is the percentage of page views that were the last in a visit (i.e. not followed by another page view within the same website).

Figure 6.
A content report for the CAGE website at the University of Warwick for a one month time period (i.e. October 2012).

Dashboards

As well as the themed reports, Google Analytics provides ‘dashboards’ as part of the ‘home’ section of the Google Analytics website. A dashboard is a set of reporting widgets that can be added, moved and deleted by the user in order to create their own reports using any of the Google Analytics metrics. By default, an account includes an initial ‘my dashboard’ (see Figure 7), which can be viewed, edited and deleted; additional dashboards can also be created.

Figure 7. A (default) dashboard report for the CAGE website at the University of Warwick for a one month time period (i.e. October 2012).

Overviews and specifics

The Google Analytics reports are live and interactive web pages that can be used to generate monthly or annual reports. These can be archived as (static) reports, but one of their strengths is that they provide interactive visualisations of the recorded user data, which can be actively explored. For example, the site content reports in the content area can be presented as a line graph showing changes in a selected metric over a chosen reporting period (see Figure 6), or they can be ‘played’ as frame animations of scatterplots or bar charts showing daily changes in the metrics over the selected time period (see Figure 8). The user can configure the visualisations included in the reports. For example, in the timeline graphs the user can select the metrics plotted on the y axis; in the scatterplot, the metrics plotted on the x and y axes and the metrics represented by the point colour and size; and in the bar charts, the metrics plotted on the two axes and the bar colour. Individual data values are also displayed in each of the visualisations when the user moves the cursor over a data item.

Figure 8.
Two interactive visualisations from the ‘All Pages’ section of the content report for the CAGE website at the University of Warwick for a one month time period (i.e. October 2012).

A further example of the ways in which the Google Analytics reports can be explored interactively is shown in Figure 9. This screen image shows a visitor flow report, which illustrates the order of page views through the website for a selected segment of the visitors (such as the country / territory demographic). Moving the mouse cursor over the banded links between the blocks of the diagram reveals the number of visits and the percentage of total traffic that took that path through the website. Although this is a complex diagram, it gives an overview of how the website is being navigated, and can therefore be a useful source of evidence for exploring the effect of changes to the website structure or navigation mechanisms.

Figure 9. An interactive visualisation from the ‘Visitors Flow’ section of the audience report for the CAGE website at the University of Warwick for a one month time period (i.e. October 2012).

Summary and further information

While Google Analytics reports can be used to evidence the impact of research with regard to reach, evidencing significance requires additional process information on how the research has been used, or on the changes that have happened for specific groups as a result of the research. So, although Google Analytics does not currently provide a complete view of research impact, the information on user behavior captured through web analytics can help identify the demographics, interests and access routes of those engaging with the research through the web. These in turn can be used to help guide and inform how the significance of the research impact could be monitored and facilitated.

Interest in web analytics has been increasing since the creation of the web.
Specialist research communities are now emerging around application areas, such as learning analytics (e.g. the Society for Learning Analytics Research [3]) and research analytics (e.g. Altmetrics [4] and ImpactStory [5]). How analytic tools might be adapted or extended through these initiatives is difficult to predict. Nonetheless, the tracking and analysis of online behavior as part of evaluating research dissemination, public engagement and knowledge transfer is clearly an important aspect of recognizing the role the web plays in research, and informing how researchers and research institutions use the web to share their research and engage with publics.

[3] The Society for Learning Analytics Research (SoLAR) is an international network of researchers exploring how analytics can be used for teaching, learning, training and development. For further details see http://www.solaresearch.org (last accessed November 2012).
[4] Altmetrics refers to the development and use of social web metrics for analysing and informing scholarship. For further details see http://altmetrics.org/manifesto (last accessed November 2012).
[5] ImpactStory is a toolset for altmetrics. For further details see http://impactstory.org (last accessed November 2012).

Online resources

The following resources provide information and support on web analytics and their application (links last accessed November 2012).
The history of web analytics

Brice Bottegal blog – Definition and history of web analytics (March 2012)
http://en.bricebottegal.com/definition-history-web-analytics

Web analytics and usability blog – A brief history of web analytics (November 2010)
http://blog.clicktale.com/2010/11/17/a-brief-history-of-web-analytics

Online video tutorials

Google Analytics Walkthrough (February 2012)
http://www.youtube.com/watch?v=XZDUWd_ezcI

Google Analytics - Getting Started with Google Analytics (May 2012)
http://www.youtube.com/watch?v=l9joLoZOjK4

Google Analytics information and reference materials

Google Analytics Blog
http://analytics.blogspot.co.uk

Google Analytics Help Centre
http://support.google.com/analytics

Google Analytics YouTube channel
http://www.youtube.com/googleanalytics