PubMed Search: Building Query - HSLS


Informatics for biologists

9th June 2009

Ansuman Chattopadhyay, PhD

Head, Molecular Biology Information Services

Health Sciences Library System

University of Pittsburgh

ansuman@pitt.edu

http://www.hsls.pitt.edu/guides/genetics

Goals

HSLS Molecular Biology Information Service

Literature searching tools

Gene-centered databases

DNA and protein analysis tools

My Background

1991-1996: University of Nebraska, Lincoln

PhD: Biochemistry, Protein Synthesis

1996-2000: Vanderbilt University School of Medicine, Nashville, Cancer Biology

2000-2001: Cellomics Inc., Pittsburgh, Bioinformatics

2002-2008: HSLS, University of Pittsburgh, Molecular Biology Information Service

Molecular Biology in the Library

University of Washington, Health Sciences Library

University of Pittsburgh, Falk Library

Harvard Medical School, Countway Medical Library

Washington University, Bernard Becker Medical Library

University of Florida, Health Sciences Center Library

University of Southern California, Norris Medical Library

Stanford University, Lane Medical Library

Information Overload

5,200 journals

Breast Cancer: 197K

Schizophrenia: 82K

BRCA1: 6.4K

p53: 48K

Sequence Based Information

In silico Support

Literature Search

Sequence Analysis

Omics Data Analysis

Literature Informatics

US National Library of Medicine (NLM) MEDLINE database

Medical Subject Headings (MeSH)

PubMed-based literature informatics tools: ClusterMed, GoPubMed, NovoSeek, eTBLAST

(Diagram: PubMed, MeSH Database, ClusterMed, GoPubMed)

PubMed (http://pubmed.gov)

MEDLINE is the largest component of PubMed, the freely accessible online database of biomedical journal citations and abstracts created by the U.S. National Library of Medicine (NLM®).

MEDLINE covers 5,200 journals published in the United States and more than 80 other countries.

It contains citations indexed from 1949 to the present.

PubMed Search Stats

PubMed Display: Abstract Plus

PubMed Display: MEDLINE

Medical Subject Headings (MeSH)

The U.S. National Library of Medicine's controlled vocabulary (thesaurus)

Arranged in a hierarchy called the MeSH Tree Structures

Updated annually

MeSH Vocabulary

Headings: over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste)

Subheadings: attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy)

Supplementary Concept Records: over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4)

Publication Types (Letter, Review, Randomized Controlled Trial)
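To show how these vocabulary pieces turn into a search term, here is a minimal Python sketch. The helper name mesh_term is hypothetical; the [Mesh] and [Majr] field tags and the "Heading/subheading" form are PubMed search syntax, and the heading/subheading pairs in the example are illustrative only.

```python
from typing import Optional


def mesh_term(heading: str, subheading: Optional[str] = None, major: bool = False) -> str:
    """Format a MeSH heading, optionally qualified by a subheading, as a PubMed term."""
    # Heading/subheading pairs are written "Heading/subheading" inside the quotes.
    term = heading if subheading is None else f"{heading}/{subheading}"
    # [Majr] matches only articles where the heading is a major topic;
    # [Mesh] matches the heading wherever it was assigned during indexing.
    tag = "Majr" if major else "Mesh"
    return f'"{term}"[{tag}]'


print(mesh_term("Dengue", "epidemiology"))     # "Dengue/epidemiology"[Mesh]
print(mesh_term("Schizophrenia", major=True))  # "Schizophrenia"[Majr]
```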

MeSH Tree Structure

A. Anatomy

B. Organisms

C. Diseases

D. Chemicals and Drugs

E. Analytical, Diagnostic and Therapeutic Techniques and Equipment

F. Psychiatry and Psychology

G. Biological Sciences

H. Physical Sciences

I. Anthropology, Education, Sociology and Social Phenomena

J. Technology and Food and Beverages

K. Humanities

L. Information Science

M. Persons

N. Health Care

V. Publication Characteristics

Z. Geographic Locations
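MeSH tree numbers are dot-separated codes whose first letter is one of the categories above; each extra group moves one level down the tree. The sketch below assumes that convention; the tree numbers in the example ("C01.925" and so on) are made up for illustration and are not claimed to be real MeSH assignments.

```python
from typing import Dict, List

# Top-level MeSH tree categories (first character of a tree number).
MESH_CATEGORIES: Dict[str, str] = {
    "A": "Anatomy", "B": "Organisms", "C": "Diseases", "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology", "G": "Biological Sciences",
    "H": "Physical Sciences",
    "I": "Anthropology, Education, Sociology and Social Phenomena",
    "J": "Technology and Food and Beverages", "K": "Humanities",
    "L": "Information Science", "M": "Persons", "N": "Health Care",
    "V": "Publication Characteristics", "Z": "Geographic Locations",
}


def tree_category(tree_number: str) -> str:
    """Top-level category for a tree number such as 'C01.925'."""
    letter = tree_number[:1].upper()
    if letter not in MESH_CATEGORIES:
        raise ValueError(f"unknown MeSH category letter: {tree_number!r}")
    return MESH_CATEGORIES[letter]


def ancestors(tree_number: str) -> List[str]:
    """Path from the top-level node down to this tree number (inclusive)."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(tree_category("C01.925"))   # Diseases
print(ancestors("C01.925.081"))   # ['C01', 'C01.925', 'C01.925.081']
```

Walking the ancestor path is what makes a broad heading "include" its narrower terms when a search is exploded down the tree.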

MeSH Indexing

Source: NLM

MeSH Indexing (figure labels: Genes/Chemicals, MeSH Terms)

PubMed Search

Retrieve published articles on:

Dengue outbreaks

Dengue outbreaks in India

Dengue outbreaks outside India

Statistical and numerical data on dengue outbreaks

PubMed Search: Dengue Outbreaks

Step 1: open the MeSH database (http://www.ncbi.nlm.nih.gov/mesh?itool=sidebar)

PubMed Search: Dengue Outbreaks

Step 2: search the MeSH database for Dengue

PubMed Search: Building Query (screenshots, steps 3 to 10)

PubMed Search: Building Query

Query: number of papers

Dengue AND outbreaks: 687

Dengue AND outbreaks AND India: 109

Dengue AND outbreaks NOT India: 578

Dengue AND Outbreaks/Statistics and numerical data: 82
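The first three rows of the table can be reproduced programmatically. This is a minimal sketch, assuming you want to build the Boolean strings and NCBI E-utilities esearch URLs that would return their counts; the helper names are hypothetical, the URLs are only built (not fetched), and today's counts will differ from the 2009 figures.

```python
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint; this sketch only builds URLs, it does not fetch them.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
BOOLEAN_OPS = {"AND", "OR", "NOT"}


def extend_query(base: str, op: Optional[str] = None, term: Optional[str] = None) -> str:
    """Combine an existing query with one more term using a Boolean operator."""
    if op is None:
        return base
    op = op.upper()
    if op not in BOOLEAN_OPS or not term:
        raise ValueError("need one of AND, OR, NOT plus a non-empty term")
    # Parentheses make the new operator apply to the whole existing query.
    return f"({base}) {op} {term}"


def esearch_count_url(query: str) -> str:
    """URL asking esearch for the number of PubMed records matching a query."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "rettype": "count"})


base = "Dengue AND outbreaks"
for q in (base, extend_query(base, "AND", "India"), extend_query(base, "NOT", "India")):
    print(q)
    print("   ", esearch_count_url(q))
```

Because AND India and NOT India split the same result set, their counts should add up to the first row: 109 + 578 = 687.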

Search Result Clustering: http://clusty.com

Vivisimo ClusterMed: http://demos.vivisimo.com/vivisimo/cgi-bin/query-meta?v:frame=form&frontpage=1&v:project=clustermed

Query details: Dengue AND Outbreaks AND India (109 papers)

ClusterMed: Topics

Clustering in PubMed: ClusterMed

ClusterMed organizes the long list of results returned by PubMed into hierarchical folders with meaningful categories, allowing researchers to home in on the most relevant results quickly.

It does this on the fly, without any preprocessing, using terms taken from the brief descriptions in the search results.

Using the folders, users can discover themes, view related articles, and drill down through the topic hierarchy.

Topical Clusters

Institutions

Authors

Pub Dates
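Vivisimo's clustering method is proprietary and not described in these slides, so the sketch below swaps in a deliberately simple technique to illustrate the idea of on-the-fly, term-based folders: frequent words shared by several titles become folder labels, and each title is filed under every label it contains. Function names, stopword list, and the example titles are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def content_words(text: str) -> List[str]:
    """Lower-cased alphabetic words with a few common stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cluster_by_terms(titles: List[str], n_folders: int = 3, min_size: int = 2) -> Dict[str, List[str]]:
    """Group titles into folders labelled by frequent terms; a title may sit in several folders."""
    # Document frequency: each word counted at most once per title.
    df = Counter(w for t in titles for w in set(content_words(t)))
    # Skip rare words and words shared by every title (they cannot separate anything).
    candidates = [(w, c) for w, c in df.items() if min_size <= c < len(titles)]
    # Most frequent first; ties broken alphabetically so the output is deterministic.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    labels = [w for w, _ in candidates[:n_folders]]
    folders: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        words = set(content_words(title))
        for label in labels:
            if label in words:
                folders[label].append(title)
    return dict(folders)


titles = [
    "Dengue outbreak in Delhi",
    "Dengue vector control in Delhi",
    "Dengue virus serotypes in Kerala",
    "Vector control of Aedes mosquitoes",
]
for label, members in cluster_by_terms(titles).items():
    print(label, len(members))
```

Real clustering engines also build nested folders and use phrases rather than single words; this flat version only shows the principle.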

GoPubMed, developed by Transinsight: http://www.gopubmed.com/

GoPubMed

Query: Dengue AND Outbreaks NOT India (578 papers)

GoPubMed: Authors, Journals, Pub Dates
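The Authors, Journals, and Pub Dates panels are facet counts over the result set. A minimal sketch of that counting, assuming each result is a record with list-valued authors and single-valued journal and year; the records and names below are hypothetical placeholders, not real citations.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def facet_counts(records: List[Dict[str, Any]], field: str, top: int = 5) -> List[Tuple[Any, int]]:
    """Count records per value of one field; list-valued fields (authors) count each entry."""
    counts: Counter = Counter()
    for rec in records:
        value = rec.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))  # a value counts once per record
    # Largest facets first, ties in sorted order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


records = [  # hypothetical search results
    {"authors": ["Author A", "Author B"], "journal": "Journal X", "year": 2006},
    {"authors": ["Author B"], "journal": "Journal Y", "year": 2006},
    {"authors": ["Author C"], "journal": "Journal X", "year": 2008},
]
print(facet_counts(records, "authors"))  # [('Author B', 2), ('Author A', 1), ('Author C', 1)]
print(facet_counts(records, "year"))     # [(2006, 2), (2008, 1)]
```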

PubMed Advanced Search

Literature Search Workflow (diagram: MeSH Database, PubMed, ClusterMed, GoPubMed)

PubMed Query with Boolean

Retrieve articles focused on the molecular genetics of schizophrenia published in the past five years:

("schizophrenia"[Majr] AND (("genetics, medical"[MeSH Terms] OR ("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields]) OR "genetics"[Subheading] AND ("genetics"[Subheading] OR "genetics"[All Fields] OR "genetics"[MeSH Terms])) AND ("2004/03/13"[PDat] : "2009/03/11"[PDat]))
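The "past five years" part of the query is a [PDat] range. The sketch below builds such a range from an end date and combines it with a major-topic clause; it is a simplified version of the slide's query, not PubMed's full automatic term mapping, and the helper names are hypothetical. Note the slide's range starts on 2004/03/13, whereas this sketch simply goes back to the same calendar day.

```python
from datetime import date


def pdat_range(end: date, years: int) -> str:
    """PubMed publication-date range covering `years` years up to and including `end`."""
    try:
        start = end.replace(year=end.year - years)
    except ValueError:  # `end` is 29 February and the start year is not a leap year
        start = end.replace(year=end.year - years, day=28)
    fmt = "%Y/%m/%d"
    return f'("{start.strftime(fmt)}"[PDat] : "{end.strftime(fmt)}"[PDat])'


def recent_major_topic(topic: str, aspect: str, end: date, years: int = 5) -> str:
    """Major-topic query, narrowed by a second clause, limited to a recent date range."""
    return f'("{topic}"[Majr] AND {aspect}) AND {pdat_range(end, years)}'


print(recent_major_topic("schizophrenia", '"genetics"[Subheading]', date(2009, 3, 11)))
# ("schizophrenia"[Majr] AND "genetics"[Subheading]) AND ("2004/03/11"[PDat] : "2009/03/11"[PDat])
```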

Research on Optimal Search Strategies

PubMed Pre-Built Queries: http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.shtml


GoPubMed Analysis

GoPubMed Search Result Statistics

GoPubMed Authors Network

GoPubMed World Map of Publications

My NCBI: an Automated Email Notification Tool

Save your searches in My NCBI and set up email notifications of new publications matching your search query.
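My NCBI does this comparison on NCBI's servers; the sketch below only imitates the idea behind an alert: re-run a saved search and report the PMIDs that were not in the previous run. The function name and the PMIDs are hypothetical.

```python
from typing import Iterable, List


def new_pmids(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """PMIDs returned by the current run of a saved search but not by the previous run."""
    seen = set(previous)
    fresh = {pmid for pmid in current if pmid not in seen}
    # PMIDs are assigned roughly in order, so sorting numerically (descending) lists newer records first.
    return sorted(fresh, key=int, reverse=True)


last_week = ["19200001", "19200002"]            # hypothetical PMIDs
this_week = ["19200002", "19250010", "19240005"]
print(new_pmids(last_week, this_week))          # ['19250010', '19240005']
```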

My NCBI (screenshots, steps 1 to 4)

NovoSeek: a Search Engine for Biomedical Literature in MEDLINE and US Grants

NovoSeek Search: Alzheimer Clinical Trials

Literature Search Workflow (diagram: Literature Search linked to Authors, Journals, Genes, Topics, Authors Networks, Global Map)

eTBLAST

Input: an abstract

Visualize related articles

Find an expert in this field

Find a journal for your manuscript

View the publication history of this topic
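eTBLAST takes a whole abstract as its query and ranks MEDLINE records by text similarity. Its own scoring (a text alignment in the spirit of BLAST) is not described in the slides, so this sketch swaps in a simpler, different technique, cosine similarity over word counts, just to show how "find papers like this abstract" can be computed. The document IDs and texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def word_vector(text: str) -> Counter:
    """Bag of lower-cased words (letters and digits)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0 = no shared words, 1 = same proportions)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_similar(query: str, abstracts: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank abstracts by similarity to the query text, most similar first."""
    q = word_vector(query)
    scores = [(doc_id, cosine(q, word_vector(text))) for doc_id, text in abstracts.items()]
    return sorted(scores, key=lambda s: (-s[1], s[0]))


abstracts = {
    "doc1": "Dengue outbreak in India",
    "doc2": "Protein structure prediction",
    "doc3": "Dengue virus outbreak",
}
for doc_id, score in rank_similar("dengue outbreak", abstracts):
    print(doc_id, round(score, 3))  # doc3 0.816, then doc1 0.707, then doc2 0.0
```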

eTBLAST Analysis

eTBLAST Analysis on Scientific Papers

Growth of Molecular Databases and Software Related Publications in PubMed

Source: Nodal Point Blog

Molecular Databases

Nucleic Acids Research (Oxford Journals): annual Database Issue, annual Web Server Issue

Journals: Bioinformatics, BMC Bioinformatics

Articles on molecular databases: ClusterMed search, 3196 results

Growth of Molecular Databases (chart; 2008: 1078)

Source: Nodal Point Blog

Molecular Databases Catalog: HSLS OBRC

1768 links to databases and software

~2050 hits/day

HSLS OBRC (screenshots)

Search.HSLS.MolBio

Integrated search system:

Databases & Software

Articles on Databases & Software

Genes/Proteins

Pathways

Protocols

Recommended Articles

Tabbed browsing

Clustered search results

Molecular Databases and Software: protein structure prediction

Molecular Databases and Software: sumoylation

Genes/Proteins Info

Entrez Gene

Genes/Proteins Info: p53

Clustering by Organisms

Search pathways: p53

Protocols: search protocols for microarray data analysis

Recommended Articles

Hands-on exercise

Locate databases on:

Natural antisense, UTR, copy number variation

Sumoylation, phosphorylation

Retrieve gene information for:

Your favorite gene, or BRCA1 or BAD

Find a suitable protocol for experiments:

Real-time PCR, methylation PCR, CGH array

Multiple sequence alignment, BLAST search

Protein structure prediction

HSLS Mol Bio Homepage Hits (chart: hits/day by year, 2003 to 2008)

HSLS Molecular Biology Information Service

Workshops

Bioinformatics Consultations

Software Licensing

Website

HSLS Licensed Bioinformatics Resources

HSLS Licensed Tools

Pathway Drawing Tool: PathwayBuilder

VectorNTI

Sequencher

Microarray Data Analysis

Raw data: GeneSpring, statistical packages

Gene list

Literature findings and biology: Ingenuity IPA, GeneGO Metacore, PathwayArchitect, ExPlain

Omics Tools

Control: A549 cells + ethanol (25 µmol/L), 48 hr

Treated: A549 cells + resveratrol (25 µmol/L), 48 hr
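Comparing the treated and control arrays usually starts from a per-gene fold change. This sketch computes log2(treated/control) and keeps genes changed at least two-fold; the gene IDs and intensities are hypothetical, and real pipelines (such as the GeneSpring and ArrayAssist workflows below) add normalization and statistical testing that this omits.

```python
import math
from typing import Dict, List


def log2_fold_changes(control: Dict[str, float], treated: Dict[str, float]) -> Dict[str, float]:
    """log2(treated / control) for genes measured with a positive signal in both conditions."""
    return {
        gene: math.log2(treated[gene] / control[gene])
        for gene in control
        if gene in treated and control[gene] > 0 and treated[gene] > 0
    }


def changed_genes(fold_changes: Dict[str, float], min_log2: float = 1.0) -> List[str]:
    """Genes whose |log2 fold change| is at least `min_log2` (1.0 means two-fold), sorted by name."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= min_log2)


control = {"GENE_A": 100.0, "GENE_B": 200.0, "GENE_C": 50.0}  # hypothetical intensities
treated = {"GENE_A": 400.0, "GENE_B": 210.0, "GENE_C": 25.0}
fc = log2_fold_changes(control, treated)
print(fc["GENE_A"], fc["GENE_C"])  # 2.0 -1.0
print(changed_genes(fc))           # ['GENE_A', 'GENE_C']
```

The resulting gene list is the kind of input the pathway tools (IPA, Metacore, ExPlain) take in the discovery step.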

Gene Expression Databases

NCBI: Gene Expression Omnibus (GEO)

EBI: ArrayExpress

Raw data processing: ArrayAssist

Microarray Data Processing: ArrayAssist Workflow

Microarray Processed Data

Discovery Step: Pathway Analysis

Gene list: Ingenuity IPA, GeneGo Metacore, Biobase ExPlain

Common Functions and Pathways: IPA Analysis

Common Pathways: IPA

Pathway Map: p53

Pathway Map: NF-kB and TGF-beta

Interaction Map: IPA

Interaction Map with Data Overlay: IPA

Pathway Analysis: GeneGo Metacore

Metacore: Analyze Network (receptors); figure labels: c-Myc, p53

Promoter Analysis: BioBase ExPlain

Over-representation of transcription factor (TF) binding sites

BioBase ExPlain: Key Molecules Prediction (diagram: promoter, over-representation of TF binding sites, key molecules)
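The slides do not say which statistic ExPlain uses for "over-representation". A common generic choice is a one-sided hypergeometric test: how surprising is it that k of your n promoters contain a site, when K of all N background promoters do? This is a sketch of that generic test, not BioBase's implementation; the function name is hypothetical.

```python
from math import comb


def overrepresentation_p(total: int, with_site: int, selected: int, selected_with_site: int) -> float:
    """One-sided hypergeometric p-value: P(X >= selected_with_site).

    total               promoters in the background set (N)
    with_site           background promoters containing the binding site (K)
    selected            promoters of the genes in your list (n)
    selected_with_site  promoters in your list containing the site (k)
    """
    upper = min(with_site, selected)
    tail = sum(
        comb(with_site, k) * comb(total - with_site, selected - k)
        for k in range(selected_with_site, upper + 1)
    )
    return tail / comb(total, selected)


print(overrepresentation_p(10, 5, 5, 5))  # 1/252, about 0.00397
```

A small p-value means the site appears in your gene list's promoters more often than chance would predict; with many candidate TFs, the p-values also need a multiple-testing correction.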

Microarray

Use of Ingenuity, GeneGO and BioBase

DNA microarrays

Protein arrays

ChIP-chip

Search for:

Gene/protein info (human, mouse, rat, yeast)

Drug/small molecule

Disease info

Promoter sequences

Transcription factor binding sites

Gene Regulation: TRANSFAC and TransPro


HSLS Bioinformatics Workshops

Online PPT

Bioinformatics FAQ: How Do I?

Bioinformatics Databases & Software Providers

 NCBI

Home page

Site map

Resource Guide

 EBI

Home page

Databases

Software

US NCBI Databases

NCBI sequence databases

GenBank: archival database of nucleotide sequences from >160,000 organisms

GenPept: conceptual translations of GenBank CDS (coding sequence) features

RefSeq: based on GenBank records; a non-redundant, expert-verified database of reference sequences

International Nucleotide Sequence Database Collaboration

Primary vs. Derivative Databases

RefSeq Scope & Accessions

Genomic DNA:

NC_123456: complete genome, complete chromosome, complete plasmid

NG_123456: genomic region

NT_123456: genomic contig

mRNA: NM_123456

Protein: NP_123456
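Because the molecule type is encoded in the accession prefix, it can be read off mechanically. A minimal sketch covering only the prefixes listed above (RefSeq defines further prefixes, for example XM_ and XP_ for model records, which this deliberately rejects); the function name is hypothetical.

```python
from typing import Dict

# Prefixes from the RefSeq accession list above; other RefSeq prefixes exist and are not covered.
REFSEQ_PREFIXES: Dict[str, str] = {
    "NC": "genomic DNA: complete genome, chromosome, or plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}


def refseq_type(accession: str) -> str:
    """Classify a RefSeq accession such as 'NM_123456' or 'NM_123456.2' by its prefix."""
    prefix, sep, _ = accession.strip().partition("_")
    if not sep or prefix.upper() not in REFSEQ_PREFIXES:
        raise ValueError(f"not a RefSeq prefix covered here: {accession!r}")
    return REFSEQ_PREFIXES[prefix.upper()]


for acc in ("NC_123456", "NM_123456.2", "NP_123456"):
    print(acc, "->", refseq_type(acc))
```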

Entrez Gene

A searchable database of genes from RefSeq genomes, defined by sequence and/or located in the NCBI Map Viewer

Statistics:

Gene: 4,818 organisms

GenBank: 160,000 organisms

Each record represents a single gene from a given organism.

US NCBI: Entrez Gene (figure labels: Chromosomal Localization, Genomic Sequence, mRNA Sequence, Amino Acid Sequence, Homologous Sequences, Expression Profile, 3D Structure, Interacting Partners, Disease, SNP)

Entrez Gene

Find:

Gene symbols and aliases

Sequences: genomic, mRNA, protein

Intron-exon architecture

Genomic context: neighboring and antisense genes

Interacting partners

Associated Gene Ontology terms: function, cellular component and biological process

Entrez Gene search

Query: BRCA1

Search tips:

Query text box: BRCA1

Limits: select "Gene name" from the drop-down menu

Limit by taxonomy: select "Homo sapiens"
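The same limits can be written as a single query string and sent to the Gene database through NCBI's E-utilities esearch service. In this sketch the esearch endpoint and its db, term, and retmax parameters are real E-utilities parameters, but the exact field-tag spellings [Gene Name] and [Organism] are an assumption to check in the Gene database's advanced search builder; the helper names are hypothetical and the URL is built, not fetched.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_term(symbol: str, organism: str = "Homo sapiens") -> str:
    """Entrez Gene query mirroring the limits above: gene name plus taxonomy."""
    # Field-tag spellings are assumed; verify them in the Gene advanced search builder.
    return f'{symbol}[Gene Name] AND "{organism}"[Organism]'


def gene_search_url(symbol: str, organism: str = "Homo sapiens", retmax: int = 20) -> str:
    """esearch URL returning up to `retmax` Gene IDs (built here, not fetched)."""
    params = {"db": "gene", "term": gene_search_term(symbol, organism), "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


print(gene_search_term("BRCA1"))  # BRCA1[Gene Name] AND "Homo sapiens"[Organism]
print(gene_search_url("BRCA1"))
```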

Alternative Splicing

Gene Ontology (GO)

Controlled vocabulary tagging:

Molecular Function

Biological Process

Cellular Component

Sample Exercises

Find intron-exon coordinates for your favorite gene.

What genes are involved in the disease phenylketonuria in humans?

Where can you order a cDNA clone for the human PAH gene?

How many splice variants have been reported for the human EGFR gene?

What is the genomic sequence for the mouse muscle protein titin?
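For the first exercise, a gene record usually lists exon coordinates; the introns are the gaps between consecutive exons. A minimal sketch, assuming 1-based inclusive coordinates for the exons of a single transcript; the coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end + 1:
            raise ValueError("exons overlap or touch; no intron between them")
        introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(201, 300), (1, 100), (401, 450)]  # hypothetical exon coordinates
print(introns_from_exons(exons))            # [(101, 200), (301, 400)]
```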

Gene-centered open access resources: beyond NCBI & EBI

Weizmann Institute of Science: GeneCards

UCSC Genome Bioinformatics:

UCSC gene page

UCSC BLAT

How do I identify a short nucleotide fragment and determine its corresponding protein sequence?
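BLAT locates the fragment in the genome; getting its protein sequence means translating the right reading frame. This sketch shows only that translation step, using the standard genetic code (NCBI translation table 1) and trying all six frames because the fragment's strand and frame are unknown. Ambiguous bases translate to 'X'; the function names are illustrative.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' marks a stop codon, 'X' an unrecognised codon."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> Dict[str, str]:
    """Translations of all three forward and three reverse-complement frames."""
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames


print(translate("ATGGCCTAA"))  # MA*
print(six_frames("ATGGCC"))
```

The frame without internal stop codons (and, for a coding fragment, matching a known protein) is usually the one of interest.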

HSLS Licensed Gene-centered Resources

Ingenuity IPA (Ingenuity Systems): expert-extracted, gene-focused knowledgebase

Metacore (GeneGo): human-curated, gene-based knowledge library

Protein Lounge: biological pathway database

Biobase Knowledge Library: promoter sequence database

TRANSFAC

Hands-on exercise:

Take your favorite gene or p53 (human) as a search term and:

find its promoter sequence

identify transcription factors that could bind its regulatory region

find its upstream regulators and report supporting citations

Thank you!

Any questions?

Carrie Iwema iwema@pitt.edu

412-383-6887

Ansuman Chattopadhyay ansuman@pitt.edu

412-648-1297
