Flying to the Top, One Tweet at a Time: Using Social Media to Rank Online Search Results

Robyn B. Reed, MA, MLIS
Co-authors: Carrie L. Iwema, PhD, MLS; Ansuman Chattopadhyay, PhD
Health Sciences Library System, University of Pittsburgh

Molecular Biology Information Service
• Workshops
• Consultations
• Website
• Software Licensing

Online Bioinformatics Resources Collection (OBRC)
http://www.hsls.pitt.edu/obrc/
Resources displayed by keyword ranking

Challenges:
• Many tools exist, and the number is increasing
• A user may retrieve several resources
• Common question: How do I know which one(s) to use?

Goal:
Provide up-to-date ratings of the most highly regarded resources in bioinformatics

Objectives:
• Using social media, design a ranking system for OBRC resources
• Determine whether social media results reflect the opinions of bioinformatics experts

Why use social media?
• No official rankings of bioinformatics tools
• Opinions of several people
• Social media data has many applications
http://beta.socialguide.com/

Methodology
• Wrote 5 research questions (common bioinformatics queries)
• Each question listed 3 possible resources to accomplish that task
• Experts (2) independently ranked the resources
• Resources were also ranked using social media data

Methodology – Social Media Ranking
Sources used for data collection (www.google.com):
• Google Blogs
• Google Discussions, which includes forums, groups, and comments

Methodology – Data Sources
Twitter was considered and removed
• 50% of the resources had zero Tweets
• For 20% of the resources, the search captured non-specific Tweets
Facebook was not included
• Concern over privacy settings

Methodology – Social Media Ranking
Searched “all time”
Optimized for most accurate retrieval
• Resource name in quotes
• Increased specificity, decreased noise
• Fewer hits

Methodology – Search Filter
• Put all OBRC resources in a bioinformatics context
• Automate the searches
[(“ucsc genome browser”) AND ( bioinformatics | genome | genetics | genomics | computer |
algorithm | software | server | database | computer model | protein | proteomics | proteome | gene | DNA | RNA | sequence | alignment | interactions | structure | modeling | prediction | biochemistry | molecular biology | systems biology | computational biology)]
Example of search of UCSC genome browser

Results

Bioinformatics tool              Blogs + Discussion   Social media   Expert 1   Expert 2
                                 raw numbers          rank           rank       rank
3-D protein prediction
  CPHmodels                      49                   2              2          2
  ESypred3D                      17                   3              3          3
  SWISS-MODEL                    228                  1              1          1
PCR primer design
  IDT SciTools                   4                    2              2          2
  Primer3                        728                  1              1          1
  Primer Design Assistant        0                    3              3          3
microRNA target design
  DIANA-microT                   12                   1              1          2
  miRGator                       9                    2              2          3
  siRNA target finder (Ambion)   3                    3              3          1
Multiple sequence alignment
  ClustalW                       1494                 1              1          3
  ECR Browser                    8                    3              3          1
  Tcoffee                        63                   2              2          2
Genome browsers
  Ensembl                        3070                 1              3          2
  NCBI Map Viewer                56                   3              2          3
  UCSC Genome Browser            928                  2              1          1

Conclusions:
This system can be used to
determine highly regarded tools
Explain that rankings are subjective; try the top 3-5 resources
Provides the patron with a starting point when using the OBRC

Limitations
• Quotation marks can be limiting if the resource name is more than one word
• Covers only a very small part of all social media
• “Negative” discussion about a resource is still counted as a hit

Future Directions
• Test more than 3 bioinformatics tools per category
• Increase the number of expert ratings
• Test the applicability of the system in areas other than bioinformatics

Special thanks to:
Project collaborators and experts:
Ansuman Chattopadhyay, PhD
Carrie Iwema, PhD, MLS
Research and academic advisors:
Nancy Tannery, MLS
Rebecca Crowley, MD, MS
Funding from the Pittsburgh Biomedical Informatics Training Program, NLM Grant 3 T15 LM007059-23S1

Thank you! Any questions?
Robyn Reed
rreed@pitt.edu
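
Supplementary sketch: the search filter and the ranking step described in the methodology could be automated roughly as follows. This is a minimal illustration, not the code used in the project. The function names (build_query, rank_by_hits), the lowercasing of the resource name, and the first-come handling of ties are all assumptions. The context terms are copied from the search filter slide, and the hit counts and Expert 1 ranks are those read from the Results slide for the 3-D protein prediction category.

```python
# Context terms copied from the search filter slide.
CONTEXT_TERMS = [
    "bioinformatics", "genome", "genetics", "genomics", "computer",
    "algorithm", "software", "server", "database", "computer model",
    "protein", "proteomics", "proteome", "gene", "DNA", "RNA", "sequence",
    "alignment", "interactions", "structure", "modeling", "prediction",
    "biochemistry", "molecular biology", "systems biology",
    "computational biology",
]


def build_query(resource_name):
    """Quote the resource name and AND it with the OR-ed context terms.

    Lowercasing mirrors the slide's example and is an assumption.
    """
    terms = " | ".join(CONTEXT_TERMS)
    return f'[("{resource_name.lower()}") AND ( {terms} )]'


def rank_by_hits(hit_counts):
    """Rank resources within one category; rank 1 has the most hits.

    Ties keep their input order (an assumption; the slides do not
    say how ties were handled).
    """
    ordered = sorted(hit_counts, key=hit_counts.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Blogs + Discussion raw numbers for one category, as read from the Results slide.
protein_prediction_hits = {"CPHmodels": 49, "ESypred3D": 17, "SWISS-MODEL": 228}
expert_1_rank = {"CPHmodels": 2, "ESypred3D": 3, "SWISS-MODEL": 1}

social_rank = rank_by_hits(protein_prediction_hits)

# Count how many resources received the same rank from both sources.
agreements = sum(social_rank[name] == expert_1_rank[name] for name in social_rank)

print(build_query("UCSC Genome Browser"))
print(social_rank)
print(f"Agreement with Expert 1: {agreements} of {len(social_rank)}")
```

Running the same comparison for each category and each expert gives a quick view of where the social media ranking and the expert opinions diverge.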