Online reading and power browsing
Some interesting questions for e-research
Ian Rowlands
UCL Department of Information Studies
CIBER group

Understanding online behaviour
Trends in scientific reading
Supporting strategic reading

Understanding online behaviour

Reading the tea leaves
Deep log analysis

86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] "GET /image/google-sm.gif HTTP/1.1" 200 654 "http://www.slais.ucl.ac.uk/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"
86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] "GET /image/xhtml1.0.png HTTP/1.1" 200 929 "http://www.slais.ucl.ac.uk/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"
86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] "GET /image/css.png HTTP/1.1" 200 918 "http://www.slais.ucl.ac.uk/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"

Strengths of DLA
• Comprehensive data, no sampling issues.
• What people actually did, not what they remember or invent.
• Grounded theory, no prior assumptions.

Weaknesses of DLA
• Little or no contextual information about task or motivation.
• Unit of analysis is usually the session, not the individual.
• What does it mean? (Value judgements inappropriate.)

What the tea leaves tell us
Deep log analysis
• horizontal information seeking: skimming, viewing 1-2 pages of an online source, probably never returning
• navigation: extended time navigating a site rather than viewing content
• power browsing: very short dwell times, rapid clicking
• squirrelling behaviour: squirrelling away material, either by downloading it, bookmarking it, or cutting and pasting
• checking: establishing the reliability of information by rapid cross-checking across multiple sites

In densely printed pages of text, reading is linear and strictly coded. Such texts must be read the way they are designed to be read: from left to right and from top to bottom, line by line. Any other form of reading (skipping, looking at the last page to see how the plot will be resolved) is a form of cheating and produces a slight sense of guilt in the reader.
Kress and Van Leeuwen, 2006: 204.

I am not thinking the way I used to think. I can feel it most strongly when I am reading. Immersing myself in a book or lengthy article used to be easy ... deep reading that used to come naturally has become a struggle. Once I was a scuba diver in a sea of words. Now I zip along the surface like a guy on a jet-ski.
Nicholas Carr, 2008: 2.

Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
Generation X born after 1973
Generation Y born before 1973
Google generation born after 1993

Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
Question: Where did the first commercial flight land?
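One of the measures charted for this question is an edit distance between the question as set and the query a participant typed, scaled so that 1 means the question was copied and pasted. The slides do not say how the measure was computed, so the following is only a sketch under stated assumptions: it uses the Levenshtein distance, ignores case and surrounding whitespace, and normalises by the length of the longer string. The function names `levenshtein` and `query_similarity` are hypothetical and are not taken from the experiment.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def query_similarity(question: str, query: str) -> float:
    """Assumed scaling: 1.0 when the query reproduces the question exactly,
    falling towards 0.0 as the query departs from it."""
    q = question.strip().lower()
    s = query.strip().lower()
    longest = max(len(q), len(s))
    if longest == 0:
        return 1.0  # two empty strings are trivially identical
    return 1.0 - levenshtein(q, s) / longest


if __name__ == "__main__":
    question = "Where did the first commercial flight land?"
    for typed in (question, "first commercial flight", "aviation history"):
        print(f"{query_similarity(question, typed):.2f}  {typed}")
```

Under this assumed scaling, a participant who pastes the question scores 1, while one who reformulates it in their own keywords scores lower, which is the contrast the chart draws between the two age groups.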
[Charts, 20 and younger versus 21 and older: Number of searches; Edit distance (where 1 = copy and paste of the question). Values as extracted: 3.8, 0.78, 0.54, 1.9]

Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
[Charts, 20 and younger versus 21 and older: Pages viewed; Domains visited. Values as extracted: 12.6, 4.1, 2.2, 4.2]

Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
[Charts, 20 and younger versus 21 and older: Confidence in answer (where 10 = highly confident); Working memory. Values as extracted: 4.9, 2.5, 21.5, 11.7]

Strategic reading
In the physical space
[Diagram labels: text, search, gather, browse, scan, assess, chain, filter, compare, arrange, link, annotate, analyse, fragments]

Strategic reading
In the digital space
[Diagram labels: search, gather, browse, scan, assess, chain, filter, compare, arrange, link, annotate, analyse, fragments]

Trends in scientific reading

Trends in information production
Growth of the scientific literature

Trends in e-journal use
CIBER analysis of Sconul library returns
The growth in article downloads
n=67 UK universities

The graph shows the number of full text article downloads (from all publishers). Downloads are indexed to 100 for the academic year 2003/04 for ease of comparison.

Academic year    Downloads (2003/04 = 100)
2003/04          100
2004/05          142
2005/06          189
2006/07          219

In just three years:
• total use more than doubled
• ... at a staggering compound annual growth rate (CAGR) of 21.7 per cent.

Source: Sconul / COUNTER 2008

Trends in e-journal availability
CIBER analysis of Sconul returns
E-journal titles per academic FTE
n=115 UK universities
[Chart: e-journal titles per academic FTE by academic year, 2001/02 to 2006/07; vertical axis from 4 to 8]
Source: Sconul / COUNTER 2008

Trends in scientific reading
Carol Tenopir and Don King surveys

Strategic reading: do scientists recognise themselves?
CIBER Virtual Scholar findings

Scientists have always strived to avoid unnecessary reading. Like all researchers, they use indexing and citations as indicators of relevance, abstracts and literature reviews as surrogates for full papers, and social networks of colleagues and postgraduate students as personal alerting services. The aim is to move rapidly through the literature to assess and exploit content with as little actual reading as possible. As indexing, recommending, and navigation has become more sophisticated in the online environment, these strategic reading practices have intensified.
Renear and Palmer, 2009: 828.

Broad levels of agreement with the above statement
CIBER survey of UK academics (n=228)
Subjects, in slide order: Agriculture and biological sciences; Chemistry and chemical engineering; Earth and environmental sciences; Economics and econometrics; History; Physics
Agreement, in slide order: 80%, 85%, 90%, 75%, 55%, 90%

Strategic reading: do scientists recognise themselves?
CIBER Virtual Scholar findings

The climate of intense proposal throughput and paper generation to meet targets and funding criteria will eventually push us into [this situation] driven by the lack of time for consistent reading.
Chemist, 30-39

The pressure to publish has increased the volume of literature produced in every field, with the result that it is harder and harder to keep on top of it. So, scientists use whatever techniques they can to avoid reading anything which isn’t essential.
Chemist, 40-49

Intensified strategic reading is both good and bad. But it has allowed more work to be done.
Mining engineer, 40-49

In general, I agree. A scientific paper has to have an amazing and obvious draw for me to read it if it [is] outside of my immediate field. I rarely have time to read a paper from introduction to final conclusions. Typically I will read the methods and look at the results and then skim through the discussion.
Botanist, 30-39

This does reflect to some extent the changing practices over the past two decades.
The pressure to publish has increased the volume of literature produced in every field, with the result that it is harder and harder to keep on top of it. So, scientists use whatever techniques they can to avoid reading anything which isn’t essential.
Economist, 60-65
Zoologist, 40-49
... finding papers by citations and reading only sections/abstracts of papers does speed up the [research] process
Sounds pretty accurate. Scientists want to “do”, not read!
Agronomist, 40-49
Historian, 40-49
Metallurgist, 40-49

So, what’s new?
Scientists and literature anxiety

It is certainly impossible for any person who wishes to devote a portion of his time to chemical experiment, to read all the books and papers that are published; their number is immense, and the labour of winnowing out the few [of interest] ... is such, that most persons who try ..., pass by what is really good.
Michael Faraday, 1826 diary entry

Exponential literature growth 1665-2000
[Chart: extant journal titles (log scale) by year]
Source: Elsevier research department

So, what’s new?
Scientists and literature anxiety

In 1900, in 1800, and perhaps in 1700, one could look back and say that most of the scientists that have ever been are alive now, and most of what is known has been determined within living memory… The scientific world is no different now from what it has always been since the seventeenth century. Science has always been modern ... Scientists have always felt themselves to be awash in a sea of scientific literature that augments each decade as much as in all times before.
Derek de Solla Price, Little Science, Big Science, 1963: 14-15.

Science is always modern
Source: Elsevier Science research department

Supporting strategic reading

Supporting strategic reading
What are journal publishers doing?
• downloadable XML formats, in addition to PDF (e.g. Public Library of Science)
• downloadable datasets (e.g. SourceOECD) and other supplementary materials
• podcasts (e.g. Nature)
• tabbed articles (e.g. New England Journal of Medicine)
• reference management (e.g. Connotea)
• structured digital abstracts (e.g. FEBS Letters)
• semantic markup of text (e.g. Royal Society of Chemistry)

Stimulating strategic reading
What are journal publishers doing?

Semantic retrieval
Don R Swanson and linking literatures

Scenario
Two research tribes (1 and 2), who would not like to be stuck in a lift with one another. Tribe 1 is researching A, Tribe 2 is researching C. They have b in common, but don’t realise it. This poses an information retrieval problem.

The two tribes inhabit communication-rich environments but they don’t get out much. Tribes 1 and 2 read different journals and go to different conferences. Citation links are rich within the two communities, but almost non-existent between them.

A and C are linked transitively by b, but citations and reference clicking will not reveal this. Keyword searches on A will not recover C, nor vice versa. Search on b and you might see that A and C are linked.

[Diagram: Research tribe 1 (A) and Research tribe 2 (C), linked by b]

But how do you know that searching on b is a productive strategy? How can we load the dice in b’s favour?

Semantic retrieval
New generation of search engines

Textpresso is a text-mining system for scientific literature. It offers (1) access to full text, so that entire articles can be searched, and (2) categories of biological concepts and classes that relate two objects (e.g., association, regulation) or describe one (e.g., methods). A search engine enables the user to search for one or a combination of these categories and/or keywords within an entire literature.
[Screenshot annotations: biological concept relationship nematode-specific relationship]

Allen H. Renear and Carole L. Palmer, Strategic reading, ontologies, and the future of scientific publishing, Science, 14 August 2009, pp 828-832.
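The Swanson scenario on the semantic retrieval slide can be made concrete with a toy co-occurrence search. The sketch below is illustrative only: the documents, the terms and the function names `cooccurrence` and `linking_terms` are invented for the example, and real literature-based discovery systems rank and filter candidate b-terms far more carefully. It treats each paper as a set of index terms and looks for terms that appear alongside A in one literature and alongside C in the other, when A and C never appear together.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbours = defaultdict(set)
    for terms in docs:
        for x, y in combinations(sorted(set(terms)), 2):
            neighbours[x].add(y)
            neighbours[y].add(x)
    return neighbours


def linking_terms(docs, a, c):
    """Return the terms b that co-occur with both a and c.

    An empty set is returned when a and c already co-occur directly,
    because then there is no hidden connection left to uncover.
    """
    neighbours = cooccurrence(docs)
    if c in neighbours[a]:
        return set()
    return neighbours[a] & neighbours[c]


if __name__ == "__main__":
    # Hypothetical index terms: tribe 1 writes about A, tribe 2 about C.
    tribe_1 = [{"A", "b"}, {"A", "x"}]
    tribe_2 = [{"C", "b"}, {"C", "y"}]
    print(linking_terms(tribe_1 + tribe_2, "A", "C"))
```

With these toy documents the only candidate returned is b. Deciding which candidates deserve attention, that is, loading the dice in b's favour, is the harder problem the slide raises, and the sketch does not address it.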