9th June 2009
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Services
Health Sciences Library System
University of Pittsburgh
ansuman@pitt.edu
http://www.hsls.pitt.edu/guides/genetics
Goals
HSLS Molecular Biology Information Service
Literature searching tools
Gene-centered databases
DNA and protein analysis tools
My Background
1991-1996: University of Nebraska, Lincoln
PhD: Biochemistry, Protein Synthesis
1996-2000: Vanderbilt University School of Medicine, Nashville, Cancer Biology
2000-2001: Cellomics Inc., Pittsburgh, Bioinformatics
2002-2008: HSLS, University of Pittsburgh, Molecular Biology Information Service
Molecular Biology in the Library
University of Washington, Health Sciences Library
University of Pittsburgh, Falk Library
Harvard Medical School, Countway Medical Library
Washington University, Bernard Becker Medical Library
University of Florida, Health Sciences Center Library
University of Southern California, Norris Medical Library
Stanford University, Lane Medical Library
Information Overload
5,200 journals
• Breast Cancer: 197K
• Schizophrenia: 82K
• BRCA1: 6.4K
• p53: 48K
Sequence Based Information
In silico Support
Literature Search
Sequence Analysis
Omics Data Analysis
Literature Informatics
US National Library of Medicine (NLM) Medline Database
Medical Subject Headings (MeSH)
PubMed Based Literature Informatics Tools
ClusterMed
GoPubMed
NovoSeek
eTBLAST
[Diagram: PubMed, MeSH Database, ClusterMed, GoPubMed]
PubMed (http://pubmed.gov)
MEDLINE is the largest component of PubMed, the freely accessible online database of biomedical journal citations and abstracts created by the U.S. National Library of Medicine (NLM®).
Covers 5,200 journals published in the United States and more than 80 other countries.
Contains citations indexed from 1949 to the present.
PubMed Search Stats
PubMed Display: Abstract Plus
PubMed Display: Medline
Medical Subject Headings (MeSH)
The U.S. National Library of Medicine's controlled vocabulary (thesaurus)
Arranged in a hierarchical manner called the MeSH Tree Structures
Updated annually
MeSH Vocabulary
Headings: over 24,000, representing concepts found in the biomedical literature (Body Weight, Kidney, Radioactive Waste)
Subheadings: attached to headings to describe a specific aspect of a concept (adverse effects, metabolism, diagnosis, therapy)
Supplementary Concept Records: over 172,000 terms in a separate chemical thesaurus, updated weekly (cordycepin, valspodar, tacrolimus binding protein 4)
Publication Types (Letter, Review, Randomized Controlled Trial)
MeSH Tree Structure
A. Anatomy
B. Organisms
C. Diseases
D. Chemicals and Drugs
E. Analytical, Diagnostic and Therapeutic Techniques and Equipment
F. Psychiatry and Psychology
G. Biological Sciences
H. Physical Sciences
I. Anthropology, Education, Sociology and Social Phenomena
J. Technology and Food and Beverages
K. Humanities
L. Information Science
M. Persons
N. Health Care
V. Publication Characteristics
Z. Geographic Locations
MeSH Indexing
Source: NLM
MeSH Indexing
Genes/Chemicals
MeSH Terms
PubMed Search
Retrieve published articles on
Dengue outbreaks
Dengue outbreaks in India
Dengue outbreaks outside India including
Statistical and numerical data on dengue outbreaks
PubMed Search: Dengue Outbreaks
1 http://www.ncbi.nlm.nih.gov/mesh?itool=sidebar
PubMed Search: Dengue Outbreaks
MESH DATABASE
2
Dengue
PubMed Search: Building Query
[Screenshots with step callouts 3-10]
PubMed Search: Building Query
Term     Boolean   Term                                      Boolean   Term    # of Papers
Dengue   AND       outbreaks                                 -         -       687
Dengue   AND       outbreaks                                 AND       India   109
Dengue   AND       outbreaks                                 NOT       India   578
Dengue   AND       Outbreaks/Statistics and numerical data   -         -       82
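The Boolean combinations listed above can be sketched in a few lines of Python. This is only an illustration, not the presenter's method: the helper names `combine` and `esearch_url` are hypothetical, and the URL targets the public NCBI E-utilities `esearch` endpoint, which the slides do not mention. No request is sent; the paper counts are the 2009 figures from the slide.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (an assumption; not named on the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine(*parts):
    """Join alternating terms and Boolean operators into one query string."""
    return " ".join(parts)


def esearch_url(query, db="pubmed"):
    """Build (but do not send) an esearch request URL for the query."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


queries = {
    "all": combine("Dengue", "AND", "outbreaks"),
    "india": combine("Dengue", "AND", "outbreaks", "AND", "India"),
    "not_india": combine("Dengue", "AND", "outbreaks", "NOT", "India"),
}

# Paper counts as reported on the slide (2009 snapshot).
reported = {"all": 687, "india": 109, "not_india": 578}

# "AND India" and "NOT India" split the full result set into two disjoint parts,
# so their counts should add up to the count for the unrestricted query.
partition_ok = reported["india"] + reported["not_india"] == reported["all"]

for name, q in queries.items():
    print(name, "->", esearch_url(q))
```

The partition check is a quick sanity test worth running whenever AND/NOT pairs are used to split a result set.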
Search Result Clustering
Search Result Clustering http://clusty.com
Vivisimo ClusterMed http://demos.vivisimo.com/vivisimo/cgi-bin/query-meta?v:frame=form&frontpage=1&v:project=clustermed
Query Details
Dengue AND Outbreaks AND India: 109
ClusterMed
Topics
Clustering in PubMed: ClusterMed
Organizes the long list of results returned by PubMed into hierarchical folders with meaningful categories, allowing researchers to home in on the most relevant results quickly.
Does this on the fly without requiring any preprocessing, using terms taken from the brief descriptions in the search results.
With the folders, users can discover themes, view related articles, and drill down through the topic hierarchy.
Topical Clusters
Institutions
Authors
Pub Dates
GoPubMed Developed by Transinsight http://www.gopubmed.com/
GoPubMed
Dengue AND Outbreaks NOT India: 578
GoPubMed
Authors
Journals
Pub Dates
Dengue AND Outbreaks NOT India: 578
PubMed Advanced Search
Literature Search Workflow
[Diagram: PubMed, MeSH Database, ClusterMed, GoPubMed]
PubMed Query with Boolean
Retrieve articles focused on the molecular genetics of schizophrenia published in the past five years:
("schizophrenia"[Majr] AND (("genetics, medical"[MeSH Terms] OR
("genetics"[All Fields] AND "medical"[All Fields]) OR "medical genetics"[All Fields] OR ("medical"[All Fields] AND "genetics"[All
Fields])) OR ("genotype"[MeSH Terms] OR "genotype"[All Fields])
OR "genetics"[Subheading] AND ("genetics"[Subheading] OR
"genetics"[All Fields] OR "genetics"[MeSH Terms])) AND
("2004/03/13"[PDat] : "2009/03/11"[PDat])
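A field-tagged query with a publication-date range can also be assembled programmatically. The sketch below is a deliberately simplified version of the query above (it keeps only the major-topic term, the genetics subheading, and the date range), and the helper names `tagged`, `pdat_range`, and `all_of` are hypothetical, introduced only for illustration.

```python
from datetime import date


def tagged(term, field):
    """Quote a term and attach a PubMed field tag, e.g. "x"[Majr]."""
    return f'"{term}"[{field}]'


def pdat_range(start, end):
    """Publication-date range in the YYYY/MM/DD form used in the query above."""
    fmt = "%Y/%m/%d"
    lo = tagged(start.strftime(fmt), "PDat")
    hi = tagged(end.strftime(fmt), "PDat")
    return f"({lo} : {hi})"


def all_of(*clauses):
    """AND the clauses together inside one pair of parentheses."""
    return "(" + " AND ".join(clauses) + ")"


# Simplified query: schizophrenia as a major topic, genetics subheading,
# restricted to the same date window as the slide's example.
query = all_of(
    tagged("schizophrenia", "Majr"),
    tagged("genetics", "Subheading"),
    pdat_range(date(2004, 3, 13), date(2009, 3, 11)),
)
print(query)
```

Building the string from small helpers keeps parentheses balanced, which is easy to get wrong when long queries are edited by hand.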
Research on Optimal Search Strategies
PubMed Pre-Built Queries http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.shtml
GoPubMed Analysis
GoPubMed Search Result Statistics
GoPubMed Authors Network
GoPubMed World Map of Publications
My NCBI: an Automated Email Notification Tool
Save your searches in My NCBI and set up email notifications for new publications that match your search query.
My NCBI
[Screenshots with step callouts 1-4]
NovoSeek: a Search Engine for Biomedical Literature in Medline and US Grants
NovoSeek Search: Alzheimer Clinical Trials
Literature Search Workflow
[Diagram: Literature Search, linked to Authors, Global Map, Topics, Authors Networks, Journals, Genes]
eTBLAST
Visualize Related Articles
Find an Expert in this Field
Find a Journal for your Manuscript
View the Publication History of this Topic
eTBLAST
eTBLAST Analysis
eTBLAST Analysis on Scientific Papers
Growth of Molecular Databases and Software Related Publications in PubMed
Source: Nodal Point Blog
Molecular Databases
Nucleic Acids Research (Oxford Journals)
Annual Database Issue
Annual Web Server Issue
Journals
Bioinformatics
BMC Bioinformatics
Articles on molecular databases
ClusterMed search: 3196
Growth of Molecular Databases
2008: 1078
Source: Nodal Point Blog
Molecular Databases Catalog
HSLS OBRC
1768 links to databases and software
~2050 hits/day
HSLS OBRC
Search.HSLS.MolBio
Integrated search system
Databases & Software
Articles on Databases & Software
Genes/Proteins
Pathways
Protocols
Recommended Articles
Tabbed browsing
Clustered search results
Search.HSLS.MolBio
Molecular Databases and Software: protein structure prediction
Molecular Databases and Software: Sumoylation
Genes/Proteins Info
Entrez Gene
Genes/Proteins Info: p53
Clustering by Organisms
Search pathways: p53
Protocols: Search protocols for microarray data analysis
Recommended Articles
Hands-on exercise
Locate databases on
Natural antisense, UTR, copy number variation
Sumoylation, phosphorylation
Retrieve gene information for
Your favorite gene or BRCA1 or BAD
Find a suitable protocol for experiments:
Real time PCR, methylation PCR, CGH array
Multiple sequence alignment, BLAST search
Protein structure prediction
HSLS Mol Bio Homepage Hits
[Bar chart: hits/day by year, 2003-2008; values shown on the chart: 329, 439, 523, 1027, 1576, 2853]
HSLS Molecular Biology Information Service
Workshops
Bioinformatics Consultations
Software Licensing
Website
HSLS Licensed Bioinformatics Resources
HSLS Licensed Tools
Pathway Drawing Tool: PathwayBuilder
HSLS Licensed Tools
VectorNTI
Sequencher
HSLS Licensed Tools
Microarray Data Analysis
Raw data
GeneSpring
Statistical packages
Gene List
Literature findings
Biology
• Ingenuity IPA
• GeneGO Metacore
• PathwayArchitect
• ExPlain
Omics Tools
Control: A549 cell + Ethanol (25 m mol/L) 48 hr
Treated: A549 cell + Resveratrol (25 m mol/L) 48 hr
Gene Expression Databases
NCBI
• Gene Expression Omnibus (GEO)
EBI
• ArrayExpress
Raw data processing: ArrayAssist
Microarray Data Processing: ArrayAssist Workflow
Microarray Processed Data
Discovery Step: Pathway Analysis
Gene List
Ingenuity IPA
GeneGo Metacore
Biobase ExPlain
Common Functions and Pathways: IPA Analysis
Common Pathways: IPA
Pathway Map: p53
Pathway Map: NFkB and TGF beta
Interaction Map: IPA
Interaction Map with Data Overlay: IPA
Pathway Analysis: GeneGo Metacore
Metacore: Analyze Network (receptors)
C-Myc p53
Pathway Analysis: GeneGo Metacore
Promoter Analysis: BioBase ExPlain
Overrepresentation of TF binding sites
BioBase ExPlain: Key Molecules Prediction
Key Molecules
Overrepresentation of TF binding sites
Promoter
Microarray
Use of Ingenuity, GeneGO and BioBase
DNA microarrays
Protein arrays
ChIP-chip
Search for
Gene/protein info (human, mouse, rat, yeast)
Drug/small molecule
Disease info
Promoter sequences
Transcription factor binding sites
Gene Regulation: Transfac and TransPro
HSLS Molecular Biology Information Service
Workshops
Bioinformatics Consultations
Software Licensing
Website
HSLS Bioinformatics Workshops
Online PPT
Bioinformatics FAQ: How Do I?
Bioinformatics Databases & Software Providers
NCBI
Home page
Site map
Resource Guide
EBI
Home page
Databases
Software
US NCBI Databases
NCBI sequence databases
GenBank
archival database of nucleotide sequences from >160,000 organisms
GenPept
conceptual translation of GenBank CDS
RefSeq
based on GenBank records; a non-redundant, expert-verified database of reference sequences
International Nucleotide Sequence Database Collaboration
Primary vs. Derivative Databases
RefSeq Scope & Accessions
Genomic DNA
NC_123456 - complete genome, complete chromosome, complete plasmid
NG_123456 - genomic region
NT_123456 - genomic contig
mRNA - NM_123456
Protein - NP_123456
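The prefix scheme above lends itself to a small lookup. The following is a minimal sketch, assuming only the five prefixes listed on this slide; the function name `classify_refseq` is hypothetical, and the optional `.version` suffix it accepts (as in `NP_123456.2`) is an addition not shown on the slide.

```python
import re

# Prefix meanings taken from the RefSeq list above; other prefixes are not covered.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA: complete genome, chromosome, or plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}

# Two-letter prefix, underscore, digits, and an optional ".version" suffix.
_ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return the molecule type for a RefSeq-style accession, or None if unknown."""
    match = _ACCESSION.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


for acc in ("NC_123456", "NM_123456", "NP_123456.2", "not-an-accession"):
    print(acc, "->", classify_refseq(acc))
```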
Entrez Gene
a searchable database of genes from RefSeq genomes that are defined by sequence and/or located in the NCBI Map Viewer
Statistics
Gene: 4818 organisms
GenBank: 160,000 organisms
each record represents a single gene from a given organism
US NCBI: Entrez Gene
[Diagram: Entrez Gene linked to Chromosomal Localization, Genomic Sequence, mRNA Sequence, Amino Acid Sequence, Homologous Sequences, Expression Profile, 3D Structure, Interacting Partners, Disease, SNP]
Entrez Gene
Find:
gene symbols and aliases
sequences: genomic, mRNA, protein
intron-exon architecture
genomic context: neighboring and antisense genes
interacting partners
associated gene ontology terms: function, cellular component and biological process
Entrez Gene search
Query: BRCA1
Search Tips:
Query text box: BRCA1
Limits:
select “Gene name” from the drop-down menu
Limit by taxonomy: select “Homo sapiens”
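The same limits can be written directly into the query string instead of being picked from the Limits menus. This is a hedged sketch: it assumes the Entrez Gene field tags `[Gene Name]` and `[Organism]` and the public NCBI E-utilities `esearch` endpoint, neither of which appears on the slide, and the helper names are hypothetical. The code only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (an assumption; not named on the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(symbol, organism):
    """Restrict a gene symbol to the gene-name field and to one organism."""
    return f'{symbol}[Gene Name] AND "{organism}"[Organism]'


def gene_search_url(symbol, organism):
    """Build (but do not send) an esearch URL against the Entrez Gene database."""
    params = {"db": "gene", "term": gene_query(symbol, organism)}
    return ESEARCH + "?" + urlencode(params)


print(gene_query("BRCA1", "Homo sapiens"))
print(gene_search_url("BRCA1", "Homo sapiens"))
```

Typing the field tags by hand gives the same restriction as the Limits page and is easier to save or share as a single line.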
Alternative Splicing
Gene Ontology (GO)
Controlled vocabulary tagging
Sample Exercises
Find intron-exon coordinates for your favorite gene
What are the genes involved in the disease phenylketonuria in humans?
From where can you order a cDNA clone for the human PAH gene?
How many splice variants have been reported for the human EGFR gene?
What is the genomic sequence for the mouse muscle protein titin?
Gene-centered open access resources: beyond NCBI & EBI
Weizmann Institute of Science: GeneCards
UCSC Genome Bioinformatics:
UCSC gene page
UCSC BLAT
How do I?
identify a short nucleotide fragment and determine its corresponding protein sequence?
HSLS Licensed Gene-centered Resources
Ingenuity IPA: Ingenuity Systems
Expert-extracted, gene-focused knowledge base
Metacore: GeneGo
Human-curated, gene-based knowledge library
Protein Lounge
Biological pathway database
Biobase Knowledge Library
Promoter sequence database
Transfac
Hands-on exercise:
Take your favorite gene or p53 (human) as a search term and
find its promoter sequence
identify transcription factors that could bind its regulatory region
find its upstream regulators and report supporting citations
Any questions?
Carrie Iwema
iwema@pitt.edu
412-383-6887
Ansuman Chattopadhyay
ansuman@pitt.edu
412-648-1297