Online reading and power browsing
Some interesting questions for e-research
Ian Rowlands
UCL Department of Information Studies CIBER group
1
Understanding online behaviour
Trends in scientific reading
Supporting strategic reading
2
Understanding online behaviour
3
Reading the tea leaves
Deep log analysis
86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] "GET /image/google-sm.gif HTTP/1.1" 200 654 "http://www.slais.ucl.ac.uk/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"
86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] "GET /image/xhtml1.0.png HTTP/1.1" 200 929 "http://www.slais.ucl.ac.uk/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"
86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] "GET /image/css.png HTTP/1.1" 200 918 "http://www.slais.ucl.ac.uk/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"
Strengths of DLA
Comprehensive data, no sampling issues.
What people actually did, not what they remember or invent.
Grounded theory, no prior assumptions.
Weaknesses of DLA
Little or no contextual information about task or motivation.
Unit of analysis is usually the session, not the individual.
What does it mean? (Value judgements inappropriate.)
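The raw entries shown on this slide follow the widely used Combined Log Format, so the first step of any log analysis is to split each line into fields. The following is a minimal sketch of that step, not the CIBER group's actual tooling: the regular expression, the field names and the `parse_line` helper are assumptions made for illustration, and the user-agent string in the sample is shortened.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# First entry from the slide, with the user-agent string shortened.
sample = (
    '86.42.128.213 - - [02/Aug/2009:23:56:31 +0100] '
    '"GET /image/google-sm.gif HTTP/1.1" 200 654 '
    '"http://www.slais.ucl.ac.uk/" '
    '"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)"'
)
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Note what the parsed record does and does not contain: an address, a time, a resource and a browser, but nothing about the person's task or motivation, which is the weakness listed above.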
6
What the tea leaves tell us
Deep log analysis
horizontal information seeking: skimming, viewing 1-2 pages of an online source, probably never returning
navigation: extended time navigating a site rather than viewing content
power browsing: very short dwell times, rapid clicking
squirrelling behaviour: squirrelling away material, either by downloading it, bookmarking it, or cutting and pasting
checking: establishing the reliability of information by rapid cross-checking across multiple sites
In densely printed pages of text, reading is linear and strictly coded. Such texts must be read the way they are designed to be read: from left to right and from top to bottom, line by line. Any other form of reading (skipping, looking at the last page to see how the plot will be resolved) is a form of cheating and produces a slight sense of guilt in the reader.
Kress and Van Leeuwen, 2006: 204.
I am not thinking the way I used to think. I can feel it most strongly when I am reading. Immersing myself in a book or lengthy article used to be easy ... deep reading that used to come naturally has become a struggle. Once I was a scuba diver in a sea of words. Now I zip along the surface like a guy on a jet-ski.
Nicholas Carr, 2008: 2.
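Two of the behaviours on this slide, horizontal information seeking and power browsing, are defined in terms that can be measured per session: pages viewed and dwell time. The sketch below shows one way such flags might be derived from the page-view timestamps of a single session. The thresholds and function names are hypothetical, chosen only for illustration; the slides do not state the cut-offs used in the original analysis.

```python
from datetime import datetime
from statistics import median

# Hypothetical thresholds, chosen only for illustration.
MAX_PAGES_HORIZONTAL = 2      # "viewing 1-2 pages of an online source"
MAX_DWELL_POWER_BROWSE = 10   # seconds; "very short dwell times"


def dwell_times(timestamps):
    """Seconds between consecutive page views; the last page has no measurable dwell."""
    ordered = sorted(timestamps)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]


def describe_session(timestamps):
    """Summarise one session and flag the two behaviours defined above."""
    dwells = dwell_times(timestamps)
    typical = median(dwells) if dwells else None
    return {
        "pages_viewed": len(timestamps),
        "median_dwell_seconds": typical,
        "horizontal": len(timestamps) <= MAX_PAGES_HORIZONTAL,
        "power_browsing": typical is not None and typical <= MAX_DWELL_POWER_BROWSE,
    }


# Invented page-view times for one session (not real data).
session = [
    datetime(2009, 8, 2, 23, 56, 31),
    datetime(2009, 8, 2, 23, 56, 36),
    datetime(2009, 8, 2, 23, 56, 40),
    datetime(2009, 8, 2, 23, 56, 49),
]
summary = describe_session(session)
print(summary)
```

The unit here is the session rather than the person, which mirrors the limitation of deep log analysis: the same individual may appear as many unrelated sessions.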
7
Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
Generation X
born before 1973
Generation Y
born after 1973
Google generation
born after 1993
8
Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
Question:
Where did the first commercial flight land?
[Two bar charts, each comparing 20 and younger with 21 and older; the extracted values are listed without their age-group labels]
Number of searches: 3.8, 1.9
Edit distance (where 1 = copy and paste question): 0.78, 0.54
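The edit-distance measure on this slide scores how closely a search query follows the wording of the question, with 1 meaning the question was copied and pasted. The slides do not say how the score was computed, so the following is only a sketch under one plausible assumption: a character-level Levenshtein distance, normalised by the longer string and subtracted from 1. The function names and the second example query are invented for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(question, query):
    """1.0 when the query repeats the question verbatim, falling towards 0.0 as they diverge."""
    q1, q2 = question.strip().lower(), query.strip().lower()
    longest = max(len(q1), len(q2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(q1, q2) / longest


question = "Where did the first commercial flight land?"
print(similarity(question, question))  # copy and paste
print(round(similarity(question, "first commercial flight landing place"), 2))
```

Under this assumption, a higher score indicates a query that stays closer to the wording of the question as set.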
9
Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
[Two bar charts, each comparing 20 and younger with 21 and older: Pages viewed and Domains visited. Bar values as extracted, without their chart or age-group labels: 12.6, 4.1, 2.2, 4.2]
10
Age-related differences in information-seeking
BBC Digital Revolution experiment pilot phase
[Two bar charts, each comparing 20 and younger with 21 and older; the extracted values are listed without their age-group labels]
Confidence in answer (where 10 = highly confident): 4.9, 2.5
Working memory: 21.5, 11.7
11
Strategic reading
In the physical space
search
gather
browse
scan
assess
chain
filter
compare
arrange
link
annotate
analyse fragments
12
Strategic reading
In the digital space
search
gather
browse
scan
assess
chain
filter
compare
arrange
link
annotate
analyse fragments
13
Trends in scientific reading
14
Trends in information production
Growth of the scientific literature
15
Trends in e-journal use
CIBER analysis of Sconul library returns
The growth in article downloads
n=67 UK universities
The graph opposite shows the number of full text article downloads (from all publishers). Downloads are indexed to 100 for the academic year 2003/04 for ease of comparison.
Download index by academic year: 2003/04: 100; 2004/05: 142; 2005/06: 189; 2006/07: 219
In just three years:
total use more than doubled
... at a staggering compound annual growth rate (CAGR) of 21.7 per cent per annum.
Source: Sconul / COUNTER 2008
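The compound annual growth rate can be checked directly from the index values on the chart. The sketch below assumes that the values 100, 142, 189 and 219 belong to 2003/04 through 2006/07 in that order; the `cagr` helper is written for this illustration. Read as three year-on-year intervals, the index implies a rate of roughly 29.9 per cent, while the same ratio spread over four periods gives about 21.6 per cent, which is close to the figure quoted on the slide.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over the given number of yearly periods."""
    return (end_value / start_value) ** (1 / periods) - 1


# Assumed mapping of chart values to academic years.
index = {"2003/04": 100, "2004/05": 142, "2005/06": 189, "2006/07": 219}
values = list(index.values())

# Year-on-year growth for each consecutive pair of academic years.
yearly = [later / earlier - 1 for earlier, later in zip(values, values[1:])]

three_intervals = cagr(values[0], values[-1], periods=3)
four_periods = cagr(values[0], values[-1], periods=4)

print([round(growth, 3) for growth in yearly])
print(round(three_intervals, 3), round(four_periods, 3))
```

Whichever convention is used, the year-on-year figures show growth slowing across the period even as total use more than doubles.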
16
Trends in e-journal availability
CIBER analysis of Sconul returns
E-journal titles per academic FTE
n=115 UK universities
[Chart: e-journal titles per academic FTE (y-axis from 4 to 8) by academic year, 2001/02 to 2006/07]
Source: Sconul / COUNTER 2008
17
Trends in scientific reading
Carol Tenopir and Don King surveys
18
Strategic reading: do scientists recognise themselves?
CIBER Virtual Scholar findings
Scientists have always strived to avoid unnecessary reading. Like all researchers,
they use indexing and citations as indicators of relevance, abstracts and literature
reviews as surrogates for full papers, and social networks of colleagues and
postgraduate students as personal alerting services. The aim is to move rapidly
through the literature to assess and exploit content with as little actual reading as
possible. As indexing, recommending, and navigation has become more
sophisticated in the online environment, these strategic reading practices have
intensified.
Renear and Palmer, 2009: 828.
Broad levels of agreement with the above statement
CIBER survey of UK academics (n=228)
Agriculture and biological sciences: 80%
Chemistry and chemical engineering: 85%
Earth and environmental sciences: 90%
Economics and econometrics: 75%
History: 55%
Physics: 90%
19
Strategic reading: do scientists recognise themselves?
CIBER Virtual Scholar findings
The climate of intense proposal throughput and
paper generation to meet targets and funding
criteria will eventually push us into [this situation]
driven by the lack of time for consistent reading.
Chemist, 30-39
The pressure to publish has increased the volume
of literature produced in every field, with the result
that it is harder and harder to keep on top of it. So,
scientists use whatever techniques they can to
avoid reading anything which isn’t essential.
Chemist, 40-49
Intensified strategic reading is both good and
bad. But it has allowed more work to be done.
Mining engineer, 40-49
In general, I agree. A scientific paper has to have
an amazing and obvious draw for me to read it if it
is outside of my immediate field.
I rarely have time to read a paper from introduction
to final conclusions. Typically I will read the
methods and look at the results and then skim
through the discussion.
Botanist, 30-39
This does reflect to some extent the changing
practices over the past two decades.
The pressure to publish has increased the volume
of literature produced in every field, with the result
that it is harder and harder to keep on top of it. So,
scientists use whatever techniques they can to
avoid reading anything which isn’t essential.
Economist, 60-65
Zoologist, 40-49
... finding papers by citations and reading only
sections . abstracts of papers does speed up the
[research] process
Sounds pretty accurate. Scientists want to “do”,
not read!
Agronomist, 40-49
Historian, 40-49
Metallurgist, 40-49
20
So, what’s new?
Scientists and literature anxiety
It is certainly impossible for any person
who wishes to devote a portion of his
time to chemical experiment, to read
all the books and papers that are
published; their number is immense,
and the labour of winnowing out the
few [of interest] .. is such, that most
persons who try .., pass by what is
really good.
Michael Faraday 1826 diary entry
21
Exponential literature growth 1665-2000
[Chart: extant journal titles (log scale) by year]
Source: Elsevier research department
22
So, what’s new?
Scientists and literature anxiety
In 1900, in 1800, and perhaps in 1700,
one could look back and say that most
of the scientists that have ever been are
alive now, and most of what is known
has been determined within living
memory… The scientific world is no
different now from what it has always
been since the seventeenth century.
Science has always been modern ...
Scientists have always felt themselves to
be awash in a sea of scientific literature
that augments each decade as much as
in all times before.
Derek de Solla Price
Little Science, Big Science, 1963:14-15.
23
Science is always modern
Source: Elsevier Science research department
24
Supporting strategic reading
25
Supporting strategic reading
What are journal publishers doing?
downloadable XML formats, in addition to pdf (e.g. Public Library of Science)
downloadable datasets (e.g. SourceOECD) and other supplementary materials
podcasts (e.g. Nature)
tabbed articles (e.g. New England Journal of Medicine)
reference management (e.g. Connotea)
structured digital abstracts (e.g. FEBS Letters)
semantic markup of text (e.g. Royal Society of Chemistry)
26
Stimulating strategic reading
What are journal publishers doing?
27
Semantic retrieval
Don R Swanson and linking literatures
Scenario
Two research tribes (1 and 2) - who would not like to be stuck in the lift with one another.
Tribe 1 is researching A, Tribe 2 is researching C.
They have b in common, but don’t realise it.
This poses an information retrieval problem. The two tribes inhabit communication-rich environments but they
don’t get out much. Tribes 1 and 2 read different journals and go to different conferences. Citation links are rich within
the two communities, but almost non-existent between them. A and C are linked transitively by b, but citations and
reference clicking will not reveal this. Keyword searches on A will not recover C, nor vice versa. Search on b and
you might see that A and C are linked.
[Diagram: research tribe 1 (A) and research tribe 2 (C), each connected to b]
But how do you know that searching on b is a productive strategy? How can we load the dice in b’s favour?
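One way to load the dice is to let co-occurrence counts nominate candidate b terms: list everything that appears alongside A in tribe 1's literature and alongside C in tribe 2's, then rank the overlap. The sketch below illustrates that idea on toy data loosely modelled on Swanson's well-known fish oil and Raynaud's syndrome case. The keyword sets are hand-made stand-ins for indexed papers, not real records, and the function names and scoring rule are assumptions made for this example rather than Swanson's own procedure.

```python
from collections import defaultdict


def terms_cooccurring_with(target, documents):
    """Count how many documents mention each other term alongside the target term."""
    counts = defaultdict(int)
    for terms in documents:
        if target in terms:
            for term in terms - {target}:
                counts[term] += 1
    return counts


def bridging_terms(a, c, literature_1, literature_2):
    """Rank candidate b terms linked to A in one literature and to C in the other."""
    with_a = terms_cooccurring_with(a, literature_1)
    with_c = terms_cooccurring_with(c, literature_2)
    shared = set(with_a) & set(with_c)
    # Score by the weaker of the two links, so b must be attested on both sides.
    return sorted(shared, key=lambda b: (-min(with_a[b], with_c[b]), b))


# Toy keyword sets standing in for indexed papers (not real data).
tribe_1 = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "blood viscosity", "diet"},
]
tribe_2 = [
    {"raynaud's syndrome", "blood viscosity"},
    {"raynaud's syndrome", "vasoconstriction"},
    {"raynaud's syndrome", "blood viscosity", "cold exposure"},
]

print(bridging_terms("fish oil", "raynaud's syndrome", tribe_1, tribe_2))
```

This version assumes A and C are already suspected of being related; finding C from A alone would mean scanning every term linked to each candidate b, which is where richer indexing becomes important.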
28
Semantic retrieval
New generation of search engines
Textpresso is a text-mining system for scientific literature. It offers
(1) access to full text, so that entire articles can be searched, and
(2) introduction of categories of biological concepts and classes that relate two objects (e.g., association,
regulation, etc.) or describe one (e.g., methods, etc).
A search engine enables the user to search for one or a combination of these categories and/or keywords within an
entire literature.
[Annotated example with callouts: biological concept, relationship, nematode-specific, relationship]
29
Allen H. Renear and Carole L. Palmer, Strategic reading, ontologies, and the future of
scientific publishing, Science, 14 August 2009, pp 828-832.
30