bc17

What is systems biology?
“Systems biology is the analysis of the relationships among the elements in a system in response to genetic or environmental perturbations, with the goal of understanding the system or the emergent properties of the system.”
What is systems biology?
…
The integration between computer science and biology can:
• have predictive potential
• identify new molecular targets
• help in designing targeted experiments
• bring a better understanding of cellular mechanisms and dynamics (even in the presence of “noise”)
Post-genomics methodologies
Genetics
Genomics
Transcriptomics
Proteomics
Systems biology
Science (2001)
Protein networks
Systems of molecular interactions guide cellular processes.
Protein networks direct:
• developmental programs
• signalling
• metabolic pathways …
They depend on cellular localization and physical interactions.
Systems biology and Protein networks
One of the goals of systems biology is to explain the relationships between the structure, function, and regulation of molecular networks by combining experimental and theoretical approaches.
Final goal: a full molecular map of the cell.
Graph theory enables the analysis of the structural properties of PPI networks and links them to function.
Protein Networks: graphs
Protein-protein interaction (PPI) networks are commonly represented as graphs.
Graphs are a collection of points connected by lines:
• Points = nodes (proteins/genes…)
• Lines = edges (interactions)
Graphs
Directed graph
Interactions: activation, inhibition, feedback loop…
Undirected graph
Interactions: binary interactions
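As a minimal sketch (Python, with hypothetical protein names A, B, C), the two kinds of graph can be stored as adjacency sets: an undirected (binary) interaction is recorded from both ends, while a directed interaction is stored only from its source and keeps its type of effect, so a feedback loop appears as two arcs pointing in opposite directions.

```python
from collections import defaultdict

def undirected_graph(pairs):
    """Binary interactions: an edge A-B is stored from both ends."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def directed_graph(arcs):
    """Directed interactions: (source, target, effect) is stored only from the source."""
    adj = defaultdict(set)
    for source, target, effect in arcs:
        adj[source].add((target, effect))
        adj.setdefault(target, set())  # make sure pure targets still appear as nodes
    return dict(adj)

ppi = undirected_graph([("A", "B"), ("B", "C")])
# Hypothetical feedback loop: A activates B, and B in turn inhibits A.
signalling = directed_graph([("A", "B", "activation"), ("B", "A", "inhibition")])
```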
Graphs
A graph is weighted if a weight function is associated with its edges (or nodes), e.g., confidence view (edge weights = interaction confidence).
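A weighted PPI graph can be sketched as a map from each unordered protein pair to its weight. In this illustration the weights are hypothetical confidence scores in [0, 1], and the 0.7 cutoff is an arbitrary choice, not a value taken from any particular database.

```python
def weighted_graph(edges):
    """Map each unordered protein pair to its weight (here: a confidence score)."""
    return {frozenset((a, b)): weight for a, b, weight in edges}

def filter_by_confidence(graph, threshold):
    """Keep only the edges whose weight is at least `threshold`."""
    return {pair: w for pair, w in graph.items() if w >= threshold}

# Hypothetical scores; the 0.7 cutoff is arbitrary.
g = weighted_graph([("A", "B", 0.9), ("B", "C", 0.4), ("A", "C", 0.75)])
confident = filter_by_confidence(g, 0.7)
```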
Graphs
A graph is complete if it has an edge between every pair of nodes (a complete graph, or complete subgraph, is called a clique).
Example shown: an incomplete graph.
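Checking completeness amounts to testing every pair of distinct nodes; a complete graph on n nodes therefore has n(n-1)/2 edges. A minimal sketch for undirected graphs, with hypothetical node names:

```python
from itertools import combinations

def is_complete(nodes, edges):
    """True if every pair of distinct nodes is joined by an (undirected) edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(set(nodes), 2))

triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(is_complete(["A", "B", "C"], triangle))        # True: a 3-node clique
print(is_complete(["A", "B", "C"], triangle[:2]))    # False: the A-C edge is missing
```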
Graphs
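• Adjacent nodes: two nodes joined by an edge
• Closed neighbourhood: a node together with all its adjacent nodes
• Hub: a node that connects many nodes (e.g., degree 6)
• Degree: the number of edges of a node
• Subnetwork: a subset of nodes and the edges among them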
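The definitions above translate directly into operations on adjacency sets. This is a sketch on a hypothetical star-shaped network (hub H bound to six partners, plus one extra P1-P2 edge); the hub cutoff `min_degree` is an arbitrary, user-chosen parameter, since there is no universal degree threshold for a hub.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj, node):
    """Number of edges incident to `node`."""
    return len(adj[node])

def closed_neighbourhood(adj, node):
    """The node itself plus every node adjacent to it."""
    return {node} | adj[node]

def hubs(adj, min_degree):
    """Nodes whose degree reaches an arbitrary, user-chosen cutoff."""
    return {n for n, nbrs in adj.items() if len(nbrs) >= min_degree}

def subnetwork(adj, nodes):
    """Induced subnetwork: the chosen nodes and only the edges among them."""
    keep = {n for n in nodes if n in adj}
    return {n: adj[n] & keep for n in keep}

adj = build_adjacency([("H", f"P{i}") for i in range(1, 7)] + [("P1", "P2")])
print(degree(adj, "H"))           # 6
print(hubs(adj, min_degree=5))    # {'H'}
```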
Shortcomings
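1. False positives
• Some experimental protocols generate complex data, e.g., tandem affinity purification (TAP), in which a tagged bait protein is purified together with its associated proteins (preys).
• One may want to convert these complexes into sets of binary interactions (which can introduce false positives, since not every member of a complex binds every other directly); two algorithms are available: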
Bait
Preys
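The two expansions are commonly referred to as the spoke and matrix models; the names are added here for reference, and the sketch below is an illustration rather than any specific tool's implementation. The spoke model links the bait to each prey only; the matrix model links every pair of complex members, so it yields many more pairs and is more prone to false positives. Protein names are hypothetical.

```python
from itertools import combinations

def spoke_model(bait, preys):
    """Bait-prey pairs only: assumes the bait binds every prey directly."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_model(bait, preys):
    """All pairs among bait and preys: assumes every member binds every other."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}

preys = ["P1", "P2", "P3"]
print(len(spoke_model("B", preys)))    # 3 pairs
print(len(matrix_model("B", preys)))   # 6 pairs: 4 members -> 4*3/2
```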
Shortcomings
2. False negatives: imperfect experimental techniques; overlap of multiple datasets
3. No spatial or temporal information
How to build networks?
• Web inference
• Literature meta-analysis
• Genomics/transcriptomics/proteomics data
How to build networks?
• Database co-occurrence
• Physical interaction
• Genomic proximity
• Co-expression
• Proteomics
• Literature (PubMed)
• Pathways (KEGG, Reactome, …)
• GO terms
Physical interactions databases
http://www.imexconsortium.org/
•A non-redundant set of protein-protein interaction data from a broad
taxonomic range of organisms
•Expertly curated from direct submissions or peer-reviewed journals to a
consistent high standard.
•Provided by a network of participating major public domain databases.
Includes:
o IntAct (EBI) http://www.ebi.ac.uk/intact/
o DIP http://dip.doe-mbi.ucla.edu/dip/Main.cgi
o UniProt http://www.uniprot.org/
o MINT http://mint.bio.uniroma2.it/mint/ (now in IntAct)
o BioGRID http://www.thebiogrid.org/
o …
Others:
• NCBI Entrez Gene http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
• BIND/BOND http://bond.unleashedinformatics.com/
• HPID http://wilab.inha.ac.kr/hpid/
Physical interactions databases
• PSICQUIC (Proteomics Standard Initiative Common QUery InterfaCe) is an effort from the HUPO Proteomics Standard Initiative (HUPO-PSI) to standardise programmatic access to molecular interaction databases.
https://code.google.com/p/psicquic/
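PSICQUIC services return interactions in the tab-separated PSI-MITAB format, one interaction per line (15 columns in MITAB 2.5). Fetching results requires a service endpoint taken from the PSICQUIC registry, which is not shown here; the sketch below only parses one returned line. The column choices follow MITAB 2.5, and the example line is made up with placeholder identifiers.

```python
def _first_id(field):
    """'uniprotkb:X|intact:Y' -> 'X'; '-' (empty field) -> None."""
    value = field.split("|")[0]
    return None if value == "-" else value.split(":", 1)[-1]

def parse_mitab_line(line):
    """Extract a few fields from one PSI-MITAB 2.5 line (15 tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(cols)}")
    return {
        "id_a": _first_id(cols[0]),        # column 1: unique ID of interactor A
        "id_b": _first_id(cols[1]),        # column 2: unique ID of interactor B
        "detection_method": cols[6],       # column 7
        "publication": cols[8],            # column 9
        "interaction_type": cols[11],      # column 12
        "source_db": cols[12],             # column 13
    }

# Made-up example line (placeholder identifiers, not a real database record).
example = "\t".join([
    "uniprotkb:EXAMPLE_A", "uniprotkb:EXAMPLE_B", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "-", "-", "-",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)', "-", "-",
])
record = parse_mitab_line(example)
```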
Functional interactions databases
• Pathways
– Reactome http://www.reactome.org/
– KEGG http://www.genome.jp/kegg/pathway.html
– Panther http://www.pantherdb.org/pathway/
– NCI Nature PathwayInteractionDb http://pid.nci.nih.gov/
– BioPATH http://www.molecular-networks.com/biopath/index.html
• Platforms
– WikiPathways http://www.wikipathways.org
• Established to facilitate the contribution and maintenance of pathway information by the biology community.
• Built on the same MediaWiki software that powers Wikipedia.
“The familiar web-based format of WikiPathways reduces the barrier to participate in pathway
curation. More importantly, the open, public approach of WikiPathways allows for broader
participation by the entire community, ranging from students to senior experts in each field. This
approach also shifts the bulk of peer review, editorial curation, and maintenance to the
community.”
– PathwayCommons http://www.pathwaycommons.org
• A point of access to biological pathway information collected from public pathway databases, which you can search, visualize and download. All data are freely available under the license terms of each contributing database.
Physical interactions - IntAct
http://www.ebi.ac.uk/intact/
BRCA2 interactome
KEGG
Kyoto Encyclopedia of Genes and Genomes
KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
KEGG pathways
A Database of
human biological
pathways
Rationale – Journal information
Nature 407(6805):770-6. The Biochemistry of Apoptosis.
“Caspase-8 is the key initiator caspase in the death-receptor pathway. Upon ligand binding,
death receptors such as CD95 (Apo-1/Fas) aggregate and form membrane-bound signalling
complexes (Box 3). These complexes then recruit, through adapter proteins, several molecules of
procaspase-8, resulting in a high local concentration of zymogen. The induced proximity model
posits that under these crowded conditions, the low intrinsic protease activity of procaspase-8
(ref. 20) is sufficient to allow the various proenzyme molecules to mutually cleave and activate
each other (Box 2). A similar mechanism of action has been proposed to mediate the activation
of several other caspases, including caspase-2 and the nematode caspase CED-3 (ref. 21).”
How can I access the pathway described here and reuse it?
Rationale - Figures
A picture paints a thousand words…
but…
• Just pixels
• Omits key details
• Assumes
• Fact or hypothesis?
Nature. 2000 Oct 12;407(6805):770-6. The biochemistry of apoptosis.
Reactome is…
• a free, online, open-source, curated database of pathways and reactions in human biology
• authored by expert biologists and maintained by Reactome editorial staff (curators)
• mapped to cellular compartments
• extensively cross-referenced
• equipped with tools for data analysis: Pathway Analysis, Expression Overlay, Species Comparison, BioMart…
• used to infer orthologous events in 20 non-human species
Theory - Reactions
Pathway steps = the “units” of Reactome = events in biology, called reactions
Reaction types: binding, degradation, dissociation, dephosphorylation, phosphorylation, classic biochemical, transport
Reaction Example 1: Enzymatic
Inputs are converted into outputs by a catalyst.
Reaction Example 2: Transport
Transport of Ca2+ from platelet dense tubular system to cytoplasm (REACT_945.4): input, facilitator, output
• In Reactome, a molecule in the cytosol is NOT the same entity as the same molecule in another part of the cell.
• Reactome uses the GO cellular component classification for compartments.
Other Reaction Types
Dimerization, binding, phosphorylation
• In Reactome, a phosphorylated protein is NOT the same entity as the same protein in its non-phosphorylated form.
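One way to picture these identity rules is a physical entity whose identity includes its compartment and modification state. The sketch below is a hypothetical Python model, not Reactome's actual data schema; the compartment names and "protein A" are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalEntity:
    """Hypothetical entity: identity = name + compartment + modifications."""
    name: str
    compartment: str                           # e.g. a GO cellular component name
    modifications: frozenset = frozenset()     # e.g. {"phosphorylated"}

cytosolic_ca = PhysicalEntity("Ca2+", "cytosol")
stored_ca = PhysicalEntity("Ca2+", "endoplasmic reticulum lumen")
protein_a = PhysicalEntity("protein A", "cytosol")
phospho_a = PhysicalEntity("protein A", "cytosol", frozenset({"phosphorylated"}))

print(cytosolic_ca == stored_ca)   # False: same molecule, different compartment
print(protein_a == phospho_a)      # False: same protein, different modification state
```

Making the class frozen also makes entities hashable, so they can serve as reaction inputs and outputs in sets.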
Reactions Connect into Pathways
(Figure: a chain of reactions, each with its own inputs, outputs and catalyst.)
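Reactions connect into a pathway when an output of one reaction is an input of another. A minimal sketch, with hypothetical reactions and entity names, that derives these links is shown here. Note that, in practice, shared small molecules such as ATP/ADP would link almost every reaction, so a real analysis would likely need to exclude them.

```python
def reaction_links(reactions):
    """reactions: name -> (inputs, outputs). Returns (r1, r2) when an output of r1 is an input of r2."""
    links = set()
    for r1, (_, outputs) in reactions.items():
        for r2, (inputs, _) in reactions.items():
            if r1 != r2 and set(outputs) & set(inputs):
                links.add((r1, r2))
    return links

pathway = {
    "R1": (["A", "ATP"], ["A-P", "ADP"]),   # phosphorylation of A
    "R2": (["A-P", "C"], ["A-P:C"]),        # phosphorylated A binds C
    "R3": (["A-P:C"], ["D"]),               # the complex produces D
}
print(sorted(reaction_links(pathway)))      # [('R1', 'R2'), ('R2', 'R3')]
```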
Evidence Tracking – Inferred Reactions
(Figure: a human pathway supported by direct evidence and by indirect evidence from other species (mouse, cow); example references PMID:5555, PMID:4444, PMID:8976, PMID:1234.)
Data Expansion - Link-outs From Reactome
• GO
• Molecular Function
• Compartment
• Biological process
• KEGG, ChEBI – small molecules
• UniProt – proteins
• Sequence dbs – Ensembl, OMIM, Entrez Gene, RefSeq, HapMap,
UCSC, KEGG Gene
• PubMed references – literature evidence for events
Data Expansion – Projecting to Other Species
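(Figure: a human reaction, A + ATP → A-P + ADP with protein B, is projected to other species. In mouse, orthologues exist, so the reaction is inferred. In Drosophila, a protein has no orthologue, so that protein is not inferred and the reaction is not inferred.)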
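The projection logic in the figure can be sketched as an all-or-nothing rule: a reaction is inferred only if every protein participant has an orthologue in the target species. This is a deliberate simplification; Reactome's actual inference criteria are more detailed. The orthologue tables and protein names are hypothetical, and small molecules such as ATP/ADP are left out because they are not species-specific.

```python
def project_reaction(proteins, orthologues):
    """
    Simplified projection: return {human protein: orthologue} if every
    protein has an orthologue in the target species, else None.
    """
    mapped = {}
    for protein in proteins:
        if protein not in orthologues:
            return None          # protein not inferred, so the reaction is not inferred
        mapped[protein] = orthologues[protein]
    return mapped

human_reaction = ["A", "B"]                   # hypothetical protein participants
mouse = {"A": "mouse_A", "B": "mouse_B"}      # hypothetical orthologue table
fly = {"A": "fly_A"}                          # no orthologue for B
```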
Exportable Protein-Protein Interactions
Reactome is not an interactions db but inferred
from complexes and reactions
Types of derived interactions:
•Interactions between proteins in the same complex
(direct complex or indirect subcomplex)
•interactors participate in the same reaction
•or adjoining reactions (2 consecutive reactions)
Lists available from Downloads
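The three kinds of derived interaction can be sketched as pair generation over complexes, reactions and consecutive reaction pairs. This is a hypothetical illustration of the idea, not Reactome's export code. Each pair keeps the first evidence type found, checked in the order complex, reaction, neighbouring reaction.

```python
from itertools import combinations

def pairs(proteins):
    """All unordered pairs of distinct proteins."""
    return {frozenset(p) for p in combinations(sorted(set(proteins)), 2)}

def derived_interactions(complexes, reactions, adjoining):
    """
    complexes: complex name -> member proteins
    reactions: reaction name -> participating proteins
    adjoining: iterable of (r1, r2) consecutive reaction pairs
    """
    result = {}
    for members in complexes.values():
        for pr in pairs(members):
            result.setdefault(pr, "complex")
    for participants in reactions.values():
        for pr in pairs(participants):
            result.setdefault(pr, "reaction")
    for r1, r2 in adjoining:
        for a in reactions[r1]:
            for b in reactions[r2]:
                if a != b:
                    result.setdefault(frozenset((a, b)), "neighbouring reaction")
    return result

derived = derived_interactions(
    complexes={"C1": ["A", "B"]},
    reactions={"R1": ["A", "C"], "R2": ["D"]},
    adjoining=[("R1", "R2")],
)
```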
Front Page
http://www.reactome.org
GO
• The Gene Ontology (GO) project is a collaborative effort to
address the need for consistent descriptions of gene products
across databases.
• Founded in 1998 as a collaboration between three model
organism databases, FlyBase (Drosophila), the Saccharomyces
Genome Database (SGD) and the Mouse Genome Database
(MGD).
• The GO Consortium (GOC) has since incorporated many
databases, including several of the world's major repositories for
plant, animal, and microbial genomes.
• The GO project has developed three structured, controlled
vocabularies (ontologies) that describe gene products in terms of
their associated biological processes, cellular components and
molecular functions in a species-independent manner.
Outside the Scope of GO
• Gene products: e.g. cytochrome c is not in the ontologies, but
attributes of cytochrome c, such as oxidoreductase activity, are.
• Processes, functions or components that are unique to
mutants or diseases: e.g. oncogenesis is not a valid GO term, as
"causing cancer" is the result of reprogrammed, not normal cells
and thus it is not the normal function of a gene.
• Attributes of sequence such as "intron" or "exon" parameters.
• Protein domains or structural features.
• Protein-protein interactions.
• Environment, evolution and expression.
• Anatomical or histological features above the level of cellular
components, including cell types.
http://www.geneontology.org/
GO enrichment
One of the main uses of the GO is to perform enrichment analysis on gene
sets.
Given a set of genes that are up-regulated under certain conditions, an enrichment analysis finds which GO terms are over-represented (or under-represented) among the annotations of that gene set.
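A common way to score over-representation for a single GO term is a one-sided hypergeometric test. The sketch below assumes that test; individual tools may use a different test (e.g. Fisher's exact test) and usually also correct for testing many terms at once. The numbers in the example are toy values.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """
    One-sided over-representation p-value P(X >= k), X ~ Hypergeometric:
      N genes in the background, K of them annotated to the GO term,
      n genes in the study list, k of those annotated to the term.
    """
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

# Toy numbers: 10 background genes, 3 annotated, list of 3 genes, all 3 annotated.
p = hypergeom_enrichment_p(N=10, K=3, n=3, k=3)   # equals 1/120
```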
Identifiers
• UniProt accession number (http://www.uniprot.org )
• Gene symbol (http://www.genenames.org )
• Entrez Gene (http://www.ncbi.nlm.nih.gov/gene )
• Ensembl (http://www.ensembl.org/index.html )
Online tool to convert IDs (http://biit.cs.ut.ee/gprofiler/gconvert.cgi )
String 9.1
http://string-db.org/
WEBGESTALT
http://bioinfo.vanderbilt.edu/webgestalt/
Enrichment Analysis
1. Gene Ontology Analysis: enrichment analysis for the Gene Ontology categories. The result is visualized as a directed acyclic graph (DAG) to preserve the relationships among the enriched GO categories.
2. KEGG Analysis: enrichment analysis for the KEGG pathways. Genes can be highlighted in the KEGG pathway maps.
3. Wikipathways Analysis: enrichment analysis for the pathways in the Wikipathways database. Genes and corresponding changes can be colored in the Wikipathways maps.
4. Pathway Commons Analysis: enrichment analysis for the pathways in the Pathway Commons database.
5. Transcription Factor Target Analysis: enrichment analysis for the targets of transcription factors (data source: MSigDB).
6. MicroRNA Target Analysis: enrichment analysis for the targets of microRNAs (data source: MSigDB).
7. Protein Interaction Network Module Analysis: enrichment analysis for hierarchical network modules predicted from protein interaction networks.
8. Cytogenetic Band Analysis: enrichment analysis for the cytogenetic bands.
9. Disease Association Analysis: enrichment analysis for disease-associated genes derived from GLAD4U.
10. Drug Association Analysis: enrichment analysis for drug-associated genes derived from GLAD4U.
11. Phenotype Analysis: enrichment analysis based on the Mammalian Phenotype Ontology and the Human Phenotype Ontology.
GO Slim Classification
1.Biological process
2.Molecular Function
3.Cellular component
An example:
- Organism: hsapiens
- Gene ID type: entrezgene
- File: OUR.txt
1. Gene Ontology Analysis
2. KEGG Analysis
3. Wikipathways Analysis
4. Pathway Commons Analysis
GO enrichment analysis
GO enrichment: description of results
KEGG analysis
WIKIPATHWAYS analysis
PathwayCommons analysis
BioProfiling
http://www.bioprofiling.de/
BioProfiling - results
R spider
R spider implements the Global Network statistical framework to analyze gene lists, using as reference knowledge a global gene network built by combining signaling and metabolic pathways from the Reactome and KEGG databases. Reactome is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways. The Reactome data model specifies protein-protein interaction pairs, and the meaning of "interaction" is broad: two protein sequences occur in the same complex, or they occur in the same or neighbouring reaction(s). The Reactome signaling network and the KEGG metabolic network were merged into an integral network. For the human genome, the resulting integral network covers about 4,000 genes involved in approximately 50,000 unique pairwise gene interactions.
Reference: if you find the results produced by R spider useful, please cite:
1. Antonov A.V., Schmidt E., Dietmann S., Krestyaninova M., Hermjakob H. R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from Reactome and KEGG databases. Nucleic Acids Research, 2010, Vol. 38, No. suppl_2, W118-W123.
PPI spider
PPI spider implements the Global Network statistical framework to analyze gene/protein lists, using as reference knowledge a global protein-protein interaction network from the IntAct database. For the human genome, the reference network covers about 7,960 genes involved in approximately 40,000 unique pairwise interactions.
Reference: if you find the results produced by PPI spider useful, please cite:
1. Antonov A.V., Dietmann S., Rodchenkov I., Mewes H.W. PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks. Proteomics, Volume 9, Issue 10, 10 May 2009.
BioProfiling - results
http://bioprofiling.simbioms.org/Results/dir_24924_1409769655/main.html
The reported p-value, computed by Monte Carlo simulation (Global Network), is the probability of obtaining a model of the same quality for a random gene list of the same size.
http://bioprofiling.simbioms.org/Results/dir_24924_1409769655/r_spider_main.html
http://bioprofiling.simbioms.org/Results/dir_24924_1409769655/ppi_spider_main.html
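The Monte Carlo idea can be sketched by repeatedly drawing random gene lists of the same size and counting how often they score at least as well as the real list. The score used here (number of network edges among the listed genes) is a hypothetical stand-in, not the Global Network framework's actual model quality, and the background, network and gene lists are toy data. The +1 in numerator and denominator is a common convention so that the estimate is never exactly zero.

```python
import random
from itertools import combinations

def edges_within(genes, network):
    """Hypothetical score: number of network edges with both ends in `genes`."""
    s = set(genes)
    return sum(1 for a, b in network if a in s and b in s)

def monte_carlo_pvalue(gene_list, background, network, n_iter=1000, seed=0):
    """Fraction of random gene lists of the same size that score at least as well."""
    rng = random.Random(seed)
    pool = sorted(background)                  # random.sample needs a sequence
    observed = edges_within(gene_list, network)
    hits = sum(
        edges_within(rng.sample(pool, len(gene_list)), network) >= observed
        for _ in range(n_iter)
    )
    return (hits + 1) / (n_iter + 1)

background = [f"G{i}" for i in range(1, 21)]
network = list(combinations(["G1", "G2", "G3", "G4"], 2))   # hypothetical 4-gene module
p_module = monte_carlo_pvalue(["G1", "G2", "G3", "G4"], background, network)
p_unlinked = monte_carlo_pvalue(["G10", "G11", "G12", "G13"], background, network)
```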
ProfCom-GO
http://www.cytoscape.org/
Cytoscape
“Cytoscape is an open source bioinformatics software platform for
visualizing molecular interaction networks and integrating these
interactions with gene expression profiles and other state data”
Developed by a collaboration of private and public institutions
Apps (previously called plugins)
Cytoscape – Import Network from Public Databases
Cytoscape – Style
Cytoscape – Layout
Cytoscape App Store
BiNGO (Biological Network Gene Ontology): for GO enrichment analysis
Jepetto
(Figure: example result with functional labels TRANSPORT, CHROMATIN MODIFICATION, APOPTOSIS, RESPONSE TO UNFOLDED PROTEIN, METABOLISM, ATP PRODUCTION, POSITIVE REGULATION OF IKB KINASE/NF-KB CASCADE.)
MCODE
MCODE is a Cytoscape plugin that finds clusters (highly interconnected
regions) in a network. Clusters mean different things in different types
of networks. For instance, clusters in a protein-protein interaction
network are often protein complexes and parts of pathways, while
clusters in a protein similarity network represent protein families.
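"Highly interconnected" can be made concrete with two simple measures: the density of a node set (the fraction of possible pairs that are actually linked) and a node's clustering coefficient (the density among its neighbours). The sketch below computes only these measures on a hypothetical network; it is not the MCODE algorithm itself, which uses its own vertex weighting and cluster-growing procedure.

```python
from itertools import combinations

def density(adj, nodes):
    """Fraction of possible pairs among `nodes` that are joined by an edge."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(nodes, 2) if b in adj.get(a, set()))
    return edges / (n * (n - 1) / 2)

def clustering_coefficient(adj, node):
    """Density of the node's neighbours: how many neighbour pairs are connected."""
    return density(adj, adj[node])

# Hypothetical network: triangle A-B-C plus D attached to A only.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(density(adj, {"A", "B", "C"}))        # 1.0: a fully connected cluster
print(clustering_coefficient(adj, "A"))     # 1 of 3 neighbour pairs linked
```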
ClusterViz (includes MCODE)