Information Seeking Behavior of Scientists

Brad Hemminger
bmh@ils.unc.edu
School of Information and Library Science
University of North Carolina at Chapel Hill

Contributors
• Assisting Researchers
  – Jackson Fox (web survey)
  – Steph Adams (participant recruiter)
  – Dihui Lu (initial descriptive statistical analysis)
  – Billy Saelim (continued statistical analysis)
  – Chris Weisen (Odum Institute, statistical consultant)
• Feedback on Survey Design
  – UNC Libraries: Bill Burke (Botany), David Romito (Zoology), Jimmy Dickerson (Chemistry), Zari Kamarei (Math/Physics)
  – KT Vaughan (Health Sciences Library)
  – Cecy Brown (University of Oklahoma)
• Supported by
  – UNC Libraries
  – Carolina Center for Genome Sciences
  – Basic Science Department chairs
  – RENCI P20 grant

Why Study Information Seeking Behavior of Scientists
• Goal is to improve scholarly communications. Other areas of my research involve presentation aspects (visualization/computer human interaction) and the storage and communication of scholarly information (digital libraries, institutional repositories, virtual communities of practice).
• To do this we need to understand how people search out and use information currently, and why. As part of investigating this we found that there has been a significant change in the last 5-10 years.
• So we’re studying ISB both to understand it, and to look at recent changes.

How to Study the Information Seeking Behavior of Scientists?
• Survey
  – Reach many people
  – Address common questions
  – Produce lots of feedback for libraries
  – Quantitative, models of variance (“positivist” approach)
• Interviews
  – In depth coverage of selected groups (bioinformatics)
  – Use grounded theory and critical incident techniques to capture more qualitative, contextual experiences
  – Develop models of information processing and use

Survey--Long Term Plan
• Conduct an initial survey study at UNC.
  Develop a survey instrument and interview methodologies that work here, but could easily be applied on a larger scale.
• From the results of the initial UNC study, draft a national version (with feedback from national sites).
• Run the national study. Set it up so that other sites only have to recruit subjects; the entire survey runs off the UNC website. Hopefully this results in a large number of sites and participants for minimal experimental costs.

Survey Sampling Technique
• Census
  – Need to be able to reach all members
  – Best if we can get responses from a large segment of the population
  – Results in potentially more input from wider audiences, especially for the open comment questions
  – Subject to bias (only computer users take it, etc.)
• Random sample
  – Statistically, generally a better choice
  – Higher cost and significantly more work due to identifying and following up with individual subjects

Questions
• Questions were based on
  – Prior studies with which we wished to correlate our results. This is facilitated by authors who have published their surveys (in papers as an appendix, e.g. Cecy Brown), and especially by those who have put their collections of surveys online (e.g. Carol Tenopir).
  – This allows us to compare results over time, as well as to clarify current practices (for instance, whether print or electronic formats are used, breaking this out into two questions: retrieval versus reading)
  – Covering issues that our librarians were concerned about
  – Developed over several drafts that were reviewed by representatives from all libraries on campus

Survey Instrument Choices
• Paper
• Phone
• Email
• Web-based. While these can require more effort than anticipated, if the number of survey respondents is over several hundred it is generally more cost effective*. This seemed the best choice since our pilot survey was of several thousand subjects, and our national survey was planned for tens of thousands.
  Since we have web and database expertise, we were able to automate the process with minimal startup costs.
*[Schonlau 2001, “Conducting Research Surveys via E-mail and the Web”]

Data Acquisition Details
• PHP Surveyor was used for the web-based survey. Another common choice at our school for simpler surveys is Survey Monkey. PHP Surveyor allowed us to ask multi-part questions and to constrain answers to specific response formats.
• PHP Surveyor dumps data directly into a MySQL database.
• Data is cleaned up and then fed into SAS for analysis. (Data cleaning is still a significant manual effort! Examples were determining Dept/CB and handling browsers that didn’t validate datatypes on forms properly.)

Subjects and Recruitment
• Subjects are university faculty, grad students and research staff.
• We approached all science department chairs to get support first.
• Contact
  – Initial contact was by email, giving the motivation for the study, an indication of support by departments and campus, and a link to the web-based survey
  – Follow-ups by letter, then two emails
  – Flyers in department, Pizza Party Rewards

Look at Survey

902 participants from recruited departments, which were classified as either science or medicine. Participation rate was 26%.

Participants by Department

Survey Analysis
• For the quantitative response variables, standard descriptive statistics (mean, min, max, standard deviation) are computed, and histograms are used to visualize the distribution.
• Categorical variables are reported as counts and percentages for each category, and displayed as frequency tables.
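The frequency-table step described for categorical variables can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used in the study: the counts are the "Distance to Library" responses reported later in this deck, and the function name `frequency_table` and the printed layout are assumptions made for the example.

```python
# Frequency table for one categorical survey variable.
# Counts are the "Distance to Library" responses reported in this deck;
# the function name and output layout are illustrative assumptions.

def frequency_table(counts):
    """Return (category, count, percentage) rows for a dict of category counts."""
    total = sum(counts.values())
    return [(name, n, round(100 * n / total, 2)) for name, n in counts.items()]


distance_counts = {
    "Same building": 175,
    "1/4 mile": 570,
    "1/2 mile": 88,
    "1 mile or more": 69,
}

rows = frequency_table(distance_counts)
for name, n, pct in rows:
    # Left-align the category, right-align count and percentage.
    print(f"{name:<16}{n:>6}{pct:>8.2f}")
```

The percentages are computed against the sum of the supplied counts, so a variable with missing responses would need its own denominator.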

Analysis: Correlations
• Categorical vs Categorical – Chi-square
• Categorical vs Quantitative – Analysis of Variance
• Quantitative vs Quantitative – Correlation
• Examples: analysis of other features by department; age vs preferred interface (Google or Library)

Participants

Position                Science  Science (%)  Medicine  Medicine (%)  Total  Total (%)
professor                    58        12.47        39          8.92     97      10.75
associate professor          23         4.95        41          9.38     64       7.10
assistant professor          40         8.60        46         10.53     86       9.53
research staff/adjunct       15         3.23        17          3.89     32       3.55
post graduate/fellow         46         9.89        37          8.47     83       9.20
others                       19         4.09        48         10.98     67       7.43
doctoral student            246        52.90       179         40.96    425      47.12
masters student              18         3.87        30          6.86     48       5.32

Gender                  Science  Science (%)  Medicine  Medicine (%)  Total  Total (%)
Female                      179        38.49       280         64.07    459      50.89
Male                        286        61.51       157         35.93    443      49.11

Distance to Library

Distance to Library    Count  Percentage
Same building            175       19.40
1/4 mile                 570       63.19
1/2 mile                  88        9.76
1 mile or more            69        7.65

Simple Questions
• Ninety-one percent of the participants had access to the internet in their office or lab.
• “Do you maintain a personal article collection?” Most participants (85.4%) responded that they did, while only 14.6% did not.
• “Do you maintain a personal bibliographic database for print and/or electronic references?” 52.2% of the participants maintained one, while 47.8% did not.
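
As an illustration of the categorical-vs-categorical case listed under Analysis: Correlations, the sketch below computes the Pearson chi-square statistic in plain Python for the gender-by-group counts in the Participants table. It is a textbook calculation without a continuity correction, not the SAS analysis run for the study, and the function name `chi_square` is an assumption made for the example.

```python
# Pearson chi-square test of independence for a contingency table, in plain Python.
# Observed counts are the gender-by-group counts reported in this deck.
# Sketch of the textbook statistic (no continuity correction), not the study's SAS run.

def chi_square(observed):
    """Return (statistic, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: Female, Male. Columns: Science, Medicine.
gender_by_group = [[179, 280], [286, 157]]
statistic, dof = chi_square(gender_by_group)
print(f"chi-square = {statistic:.2f}, df = {dof}")
```

For these counts the statistic comes out near 59 on 1 degree of freedom, far above the 3.84 critical value at the 0.05 level, which matches the visible difference in gender mix between the science and medicine groups.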

How often do you use…

Source                    Daily or Weekly %  daily  weekly  monthly  quarterly  annually  never
book                                    24%     60     157      241        223       148     73
journal                                 87%    509     277       72         22         6     16
preprint                                18%     57     105      155        109        72    404
conference                               2%      4      14       37        193       492    162
proceeding                               5%     14      37       79        168       273    331
webpage                                 70%    362     277      132         67        19     45
online database                         67%    293     311      119         49        32     98
personal communication                  52%    241     228      132        114        64    123
other                                    1%      5       7        3          0         2    885

Most Important Individual Sources

Basic Science Journals                    Count    Medicine Journals                  Count
Science                                      99    Science                               45
Nature                                       90    Nature                                39
Cell                                         36    JAMA                                  38
Journal of the American Chemical Society     34    UpToDate                              30
Journal of Cell Biology                      20    New England Journal of Medicine       28
Journal of Biological Chemistry              19    Journal of Immunology                 18
Analytical Chemistry                         18    American Journal of Epidemiology      17
PNAS                                         13    Cell                                  16
Journal of Neuroscience                      12    Lexi-Comp                             15
Evolution                                    11    Journal of Biological Chemistry       14
Neuron                                       11    Epidemiology                          13
Development                                  10    AIDS                                  12
Journal of Organic                           12

Important Alerts

Basic Science Alerts    Count    Medical Alerts         Count
PubMed                     40    PubMed                    53
Faculty of 1000            27    Medscape                  11
ISI                        14    Nature                    10
ACS Journal Alert          11    Faculty of 1000            9
Nature                     10    PubCrawler                 9
ScienceDirect               9    ISI                        7
Science                     7    ePocrates                  6
PubCrawler                  4    ASHP                       5
Biomail                     3    NEJM                       5
COS                         3    MDLinx                     4
J Biol Chem                 3    Science                    4
ACM                         2    ScienceDirect              4
ArXiv                       2    ADA Daily Knowledge        3
BMC alerts                  2    JAMA                       3
Cancer Research             2    Kaiser listserv            3

Tools for Searching Information

Search tool type             Frequency  Percentage
Citation index database           1084      47.25%
General web search engine          694      30.25%
Fulltext digital library           156       6.80%
Personal search tool               125       5.45%
Knowledgebase web portal            93       4.05%
Others                              69       3.01%
Online or local database            52       2.27%
Library collection                  21       0.92%

Types of Information Sources

Sources                                            Science  Medicine  Total
(electronic) library subscribed journal              20.17     19.89  20.03
(electronic) open (free) access journal or
  institutional repository or digital library         7.86      9.29   8.57
(print) library subscribed journal                    4.48      3.61   4.05
(electronic) web site (author's website)              4.36      3.31   3.89
(print) Personally subscribed journal                 3.44      4.01   3.73
(print) copy of colleague's print copy                1.07      5.00   3.00
(electronic) personal subscribed journal              3.10      2.65   2.88
(electronic) personal digital library                 2.89      1.97   2.43
(electronic) lab subscribed journal                   2.72      1.14   1.97
(electronic) copy of colleague's electronic copy      1.60      1.98   1.79
(print) lab subscribed journal                        2.05      0.79   1.43
(print) interlibrary loan                             0.59      0.55   0.57
(print) document delivery service                     0.13      0.19   0.16
other                                                 0.02      0.13   0.07

Articles in Personal Collection

Number of Articles  Print  Print %  Electronic  Electronic %
none                   45                  104
1-49                  154   21.24%         259        38.89%
50-99                 160   22.07%         127        19.07%
100-499               280   38.62%         210        31.53%
500-999                81   11.17%          44         6.61%
1000+                  50    6.90%          26         3.90%

Articles in Personal Article Collection that have annotations

Percentage of entries with notes  Total count  Total Percentage
<10%                                      327             36.25
11-20%                                     75              8.31
21-30%                                     82              9.09
31-40%                                     30              3.33
41-50%                                    126             13.97
51-60%                                     19              2.11
61-70%                                     26              2.88
71-80%                                    100             11.09
81-90%                                     47              5.21
>90%                                       70              7.76

Preferred Search Method

                                               Science  Science %  Medicine  Medicine %  Total  Total %
Electronic versions of databases and journals      443      95.27       429       98.17    872    96.67
Print versions of databases and journals            22       4.73         8        1.83     30     3.33

Preferred Viewing Method

                            Science  Science (%)  Medicine  Medicine (%)  Total  Total (%)
Both/it depends                 292        62.80       260         59.50    552      61.20
electronic (computer) only       63        13.55        52         11.90    115      12.75
print (hard copy) only          110        23.66       125         28.60    235      26.05

Number of Visits to the Library in the past 12 Months

Visits    Science  Science %  Medicine  Medicine %  Total  Total %
0-2           101      21.72       107       24.49    208    23.06
3-5            75      16.13        99       22.65    174    19.29
6-10           77      16.56        71       16.25    148    16.41
11-20          84      18.06        55       12.59    139    15.41
21-50          85      18.28        67       15.33    152    16.85
51-100         34       7.31        19        4.35     53     5.88
101-200         7       1.51        13        2.97     20     2.22
>200            2       0.43         6        1.37      8     0.89

Reasons for Visiting the Library

Reason                                    Science  Science %  Medicine  Medicine %  Total  Total %
photocopy                                     256      22.54       274       22.81    530    22.68
get assistance from a librarian                65       5.72        96        7.99    161     6.89
use computers                                  59       5.19       112        9.33    171     7.32
perform searches                               81       7.13       117        9.74    198     8.47
read current journals or other materials      161      14.17       156       12.99    317    13.56
quiet reading space                           156      13.73       179       14.90    335    14.33
meeting                                        45       3.96        73        6.08    118     5.05
browse                                         99       8.71        60        5.00    159     6.80
pick up/drop off materials                    214      18.84       134       11.16    348    14.89

Factors Affecting Choice of Journal to Publish In

Factor                                                 Science  Medicine  Total
Ability to include links, color, graphics, multimedia     1.38      1.24   2.31
audience                                                  3.52      3.38   4.45
author having to pay cost of publication                  1.51      1.54   2.53
availability on campus                                    1.79      1.88   2.83
editorial board                                           2.11      1.95   3.03
page charges for long articles or color figures           1.40      1.45   2.42
speed of publication                                      2.42      2.27   3.35
standing of journal in your field                         3.77      3.61   4.70
support of open access to journal articles                2.09      2.17   3.13

Google vs Library Search Page
• Participants were asked “Which interface would you rather use to begin your search process?” with the possible responses “Google search page” and “Your library’s home page”. Overall, a slight majority of users preferred Google (53.3%) over the library page (46.7%); however, the difference was substantially larger for basic science researchers (Google 58.5% versus Library 41.5%) than for medical researchers (Google 52.2% versus Library 47.8%).
• This difference might have been larger had the question asked which style or type of interface the users preferred, as many of the comments in the survey indicated a strong preference for a single “meta” search tool where the user could enter a single search string that would result in all content in all resource collections being searched (as opposed to manually identifying resource collections and individually searching them).

Summary

We never leave our chairs…
• Almost all information seeking and use interactions occur on the researchers’ computers in their offices.
• As a result, library visits have declined dramatically, and the reasons for visiting the library have changed.
• Researchers read in both electronic and print form; among those who prefer a single format, print (paper) is still chosen about twice as often as electronic (26.05% versus 12.75%).

Single Text Box + MetaSearch
• Researchers prefer a single text box for initial searching that covers all resources.
• This is most evidenced by the preference for Google Scholar over library web page interfaces.

More than just text
• Researchers are making increasing use of content contained in online databases like Genbank, or on the web pages of research labs.
• For the scientists in our survey, this type of access has surpassed personal communications and is close to journal articles in frequency of usage.

Transformative Changes
• Transformative collaborative group communications have already taken place in the consumer marketplace, and are finding their way into scholarly communications. Examples include folksonomies supporting community tagging (Del.icio.us), comment and review systems like Amazon’s rankings, Flickr, etc. Similar changes are in their initial stages for scholarly communities, for instance Faculty of 1000 and the Connotea application for online sharing of bibliographic databases and annotations by scientists.

What might the future hold?
• In the future, researchers may maintain all their scholarly knowledge online and make it accessible to others as they see fit. Having scholars’ descriptions and annotations of the digital scholarly materials, as well as the materials themselves, available on the web will allow online communities and community review systems to blossom, just as the availability of online journal articles has transformed basic information seeking of science scholars today.

Future Work
• Upcoming papers from UNC survey
  – Correlations, information seeking behavior predictions from demographics
  – By department/research area comparisons
  – Review and reflection on major changes (with Cecy Brown, Don King, Carol Tenopir)
  – Textual analysis of library comments (Meredith Pulley)
  – New work being proposed by other researchers using this data (if you think the data from this study might help you in your research, come talk to me)
• National Study… about to begin…
• Interview Studies (labs, individuals)

bmh@ils.unc.edu