What is systems biology?
“Systems biology is the analysis of the relationships among the elements in a system in response to genetic or environmental perturbations, with the goal of understanding the system or the emergent properties of the system.”
…
The integration of computer science and biology can:
• have predictive potential
• identify new molecular targets
• help in designing targeted experiments
• bring a better understanding of cellular mechanisms and dynamics (even in the presence of “noise”)

Post-genomics methodologies
Genetics, genomics, transcriptomics, proteomics, systems biology (Science, 2001)

Protein networks
Systems of molecular interactions guide cellular processes. Protein networks direct:
• developmental programs
• signalling
• metabolic pathways
• …
They depend on cellular localization and physical interactions.

Systems biology and protein networks
One of the goals of systems biology is to explain the relationships between the structure, function and regulation of molecular networks by combining experimental and theoretical approaches.
Final goal: a full molecular map of the cell.
Graph theory enables the analysis of the structural properties of PPI networks and links them with function.

Protein networks: graphs
Protein-protein interaction (PPI) networks are commonly represented as graphs. A graph is a collection of points connected by lines:
• points = nodes (proteins, genes, …)
• lines = edges (interactions)

Graphs
• Directed graph: interactions such as activation, inhibition, feedback loops…
• Undirected graph: binary interactions
• A graph is weighted if a weight function is associated with its edges (or nodes), e.g., the confidence of each interaction.
• A graph is complete if there is an edge between every pair of nodes; such a graph is called a clique. A graph missing some of these edges is incomplete.
• Adjacent nodes are joined directly by an edge; the closed neighbourhood of a node is the node together with all its adjacent nodes.
• Hub: a node that connects many other nodes (in the example figure, a hub of degree 6, i.e., with six edges).
• Subnetwork: a part of the network (a subset of its nodes and edges).

Shortcomings
1. False positives
• Some experimental protocols generate complex (multi-protein) data, e.g., tandem affinity purification (TAP), in which a bait protein pulls down a set of preys.
• One may want to convert these complexes into sets of binary interactions; two algorithms are available (illustrated with a bait and its preys).
2. False negatives: imperfect experimental techniques; overlap of multiple datasets.
3. No spatial or temporal information.

How to build networks?
• Web inference
• Literature meta-analysis
• Genomics/transcriptomics/proteomics data
• Database co-occurrence
• Physical interaction
• Genomic proximity
• Co-expression
• Proteomics
• Literature (PubMed)
• Pathways (KEGG, Reactome, …)
• GO terms

Physical interaction databases
IMEx consortium: http://www.imexconsortium.org/
• A non-redundant set of protein-protein interaction data from a broad taxonomic range of organisms
• Expertly curated from direct submissions or peer-reviewed journals to a consistent high standard
• Provided by a network of major participating public-domain databases, including:
o IntAct (EBI) http://www.ebi.ac.uk/intact/
o DIP http://dip.doe-mbi.ucla.edu/dip/Main.cgi
o UniProt http://www.uniprot.org/
o MINT http://mint.bio.uniroma2.it/mint/ (now merged into IntAct)
o BioGRID http://www.thebiogrid.org/
o …
Others:
• NCBI Entrez Gene http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
• BIND/BOND http://bond.unleashedinformatics.com/
• HPID http://wilab.inha.ac.kr/hpid/

PSICQUIC (Proteomics Standard Initiative Common QUery InterfaCe) is an effort by the HUPO Proteomics Standards Initiative (HUPO-PSI) to standardise programmatic access to molecular interaction databases.
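The TAP-to-binary conversion and the graph measures introduced earlier in this section (weighted edges, degree, hubs, cliques) can be sketched in plain Python. The slides do not name the two conversion algorithms; this sketch assumes the two commonly used ones, the spoke model (bait linked to each prey) and the matrix model (every pair of co-purified proteins linked). The protein names, the hub threshold and the use of "number of supporting purifications" as an edge weight are illustrative assumptions, not part of the original material.

```python
from collections import Counter
from itertools import combinations


def spoke_model(bait, preys):
    """Spoke model: one edge between the bait and each prey (no prey-prey edges)."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_model(bait, preys):
    """Matrix model: one edge between every pair of co-purified proteins."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


def edge_weights(purifications, model):
    """Weighted graph: weight = number of purifications supporting each edge."""
    weights = Counter()
    for bait, preys in purifications:
        weights.update(model(bait, preys))
    return weights


def degrees(edges):
    """Degree of each node = number of edges touching it (undirected graph)."""
    deg = Counter()
    for edge in edges:
        deg.update(edge)
    return deg


def hubs(edges, min_degree):
    """Hubs: nodes whose degree reaches a chosen (arbitrary) threshold."""
    return {node for node, d in degrees(edges).items() if d >= min_degree}


def is_clique(nodes, edges):
    """True if every pair of the given nodes is connected (complete subgraph)."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


# Hypothetical TAP experiment: bait "A" co-purifies with preys "B", "C", "D".
spoke = spoke_model("A", ["B", "C", "D"])
matrix = matrix_model("A", ["B", "C", "D"])
print(len(spoke), len(matrix))                  # 3 6
print(hubs(spoke, min_degree=3))                # {'A'}
print(is_clique(["A", "B", "C", "D"], matrix))  # True
```

The matrix model generally produces many more edges than the spoke model for the same purification, so it tends to add more false positives; counting how many purifications support an edge is one simple way to turn the result into a weighted graph.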
https://code.google.com/p/psicquic/

Functional interaction databases
• Pathways
– Reactome http://www.reactome.org/
– KEGG http://www.genome.jp/kegg/pathway.html
– Panther http://www.pantherdb.org/pathway/
– NCI-Nature Pathway Interaction Database http://pid.nci.nih.gov/
– BioPATH http://www.molecular-networks.com/biopath/index.html
• Platforms
– WikiPathways http://www.wikipathways.org
• established to facilitate the contribution and maintenance of pathway information by the biology community
• built on the same MediaWiki software that powers Wikipedia
“The familiar web-based format of WikiPathways reduces the barrier to participate in pathway curation. More importantly, the open, public approach of WikiPathways allows for broader participation by the entire community, ranging from students to senior experts in each field. This approach also shifts the bulk of peer review, editorial curation, and maintenance to the community.”
– PathwayCommons http://www.pathwaycommons.org
• a point of access to biological pathway information collected from public pathway databases, which you can search, visualize and download. All data are freely available, under the license terms of each contributing database.

Physical interactions – IntAct
http://www.ebi.ac.uk/intact/
Example: the BRCA2 interactome

KEGG (Kyoto Encyclopedia of Genes and Genomes)
KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
KEGG pathways

Reactome: a database of human biological pathways

Rationale – journal information
Nature 407(6805):770-6. The biochemistry of apoptosis.
“Caspase-8 is the key initiator caspase in the death-receptor pathway. Upon ligand binding, death receptors such as CD95 (Apo-1/Fas) aggregate and form membrane-bound signalling complexes (Box 3). These complexes then recruit, through adapter proteins, several molecules of procaspase-8, resulting in a high local concentration of zymogen. The induced proximity model posits that under these crowded conditions, the low intrinsic protease activity of procaspase-8 (ref. 20) is sufficient to allow the various proenzyme molecules to mutually cleave and activate each other (Box 2). A similar mechanism of action has been proposed to mediate the activation of several other caspases, including caspase-2 and the nematode caspase CED-3 (ref. 21).”
How can I access the pathway described here and reuse it?

Rationale – figures
A picture paints a thousand words… but…
• Just pixels
• Omits key details
• Makes assumptions
• Fact or hypothesis?
(Figure source: Nature. 2000 Oct 12;407(6805):770-6. The biochemistry of apoptosis.)

Reactome is…
• a free, online, open-source, curated database of pathways and reactions in human biology
• authored by expert biologists and maintained by Reactome editorial staff (curators)
• mapped to cellular compartments
• extensively cross-referenced
• equipped with tools for data analysis: Pathway Analysis, Expression Overlay, Species Comparison, BioMart…
• used to infer orthologous events in 20 non-human species

Theory – reactions
Pathway steps, the “units” of Reactome, are biological events called reactions. Reaction types include binding, degradation, dissociation, dephosphorylation, phosphorylation, transport and classic biochemical reactions.
Reaction example 1: enzymatic. Inputs are converted into outputs by a catalyst.
Reaction example 2: transport. Transport of Ca2+ from the platelet dense tubular system to the cytoplasm (REACT_945.4): the input is moved to the output with the help of a facilitator.
• In Reactome, a molecule in the cytosol is NOT the same entity as the same molecule in another part of the cell.
• Reactome uses the GO cellular component classification for compartments.
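A minimal Python sketch of the reaction model described above. It is not Reactome's actual schema, only an illustration with assumed names: an entity's identity includes its compartment (plus a modification field, e.g. for phosphorylation), so Ca2+ in the dense tubular system and Ca2+ in the cytosol compare as different objects. The facilitator name `channel_X` and the `is_transport` heuristic are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A physical entity. Identity includes compartment and modification,
    so the same molecule in another compartment is a different entity."""
    name: str
    compartment: str        # GO cellular-component style label (illustrative)
    modification: str = ""  # e.g. "p" for phosphorylated; "" means unmodified


@dataclass
class Reaction:
    """A pathway step: inputs -> outputs, optionally helped by a catalyst/facilitator."""
    name: str
    inputs: list[Entity]
    outputs: list[Entity]
    catalyst: Optional[Entity] = None

    def is_transport(self) -> bool:
        """Heuristic: same molecules on both sides, but in different compartments."""
        same_names = {e.name for e in self.inputs} == {e.name for e in self.outputs}
        moved = {e.compartment for e in self.inputs} != {e.compartment for e in self.outputs}
        return same_names and moved


# Illustrative transport step modelled on the Ca2+ example (facilitator is hypothetical).
ca_store = Entity("Ca2+", "platelet dense tubular system")
ca_cytosol = Entity("Ca2+", "cytosol")
transport = Reaction("Ca2+ release", [ca_store], [ca_cytosol],
                     catalyst=Entity("channel_X", "platelet dense tubular system membrane"))

print(ca_store == ca_cytosol)                                  # False: different compartment
print(transport.is_transport())                                # True
print(Entity("A", "cytosol") == Entity("A", "cytosol", "p"))   # False: different modification
```

Making `Entity` a frozen dataclass gives value-based equality and hashing for free, which is what encodes the rule that compartment (and modification) are part of an entity's identity.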
Other reaction types
Examples include dimerization, binding and phosphorylation. In Reactome, a phosphorylated protein is NOT the same entity as the same protein in its non-phosphorylated form.

Reactions connect into pathways
Each reaction has inputs, outputs and a catalyst; the output of one reaction becomes the input of the next, chaining reactions into a pathway.

Evidence tracking – inferred reactions
A reaction in a human pathway is supported either by direct evidence (human experiments, cited by PubMed ID) or by indirect evidence inferred from experiments in other species such as mouse or cow (also cited by PubMed ID).

Data expansion – link-outs from Reactome
• GO: molecular function, compartment, biological process
• KEGG, ChEBI: small molecules
• UniProt: proteins
• Sequence databases: Ensembl, OMIM, Entrez Gene, RefSeq, HapMap, UCSC, KEGG Gene
• PubMed references: literature evidence for events

Data expansion – projecting to other species
Example: in human, protein B catalyses A + ATP → A-P + ADP.
• Mouse: orthologues exist, so the reaction A + ATP → A-P + ADP, catalysed by B, is inferred.
• Drosophila: a participating protein has no orthologue, so that protein is not inferred and the reaction is not inferred either.

Exportable protein-protein interactions
Reactome is not an interaction database, but protein-protein interactions can be inferred from its complexes and reactions. Types of derived interactions:
• proteins in the same complex (directly, or indirectly through a subcomplex)
• interactors participating in the same reaction
• interactors participating in adjoining reactions (two consecutive reactions)
Interaction lists are available from the Downloads page.
Front page: http://www.reactome.org

GO
• The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases.
• It was founded in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD).
• The GO Consortium (GOC) has since incorporated many databases, including several of the world's major repositories for plant, animal and microbial genomes.
• The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.
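Because each GO ontology is structured as a directed acyclic graph (a term may have more than one parent), an annotation to a specific term also implies annotation to all of its ancestor terms (the "true path rule"); enrichment tools depend on this propagation. A minimal sketch, assuming a tiny made-up DAG and hypothetical gene names: it is NOT the real GO graph, and it ignores which relationship types should propagate in the real ontology.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by repeatedly following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents):
    """Add every ancestor of each directly annotated term to each gene."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}


# Toy DAG (not the real GO): one term has two parents, as a DAG allows.
parents = {
    "caspase activation": ["apoptotic process", "protein processing"],
    "apoptotic process": ["cell death"],
    "protein processing": ["protein metabolic process"],
    "cell death": ["biological_process"],
    "protein metabolic process": ["biological_process"],
}
annotations = {"gene1": {"caspase activation"}, "gene2": {"cell death"}}
full = propagate(annotations, parents)
print(sorted(full["gene2"]))  # ['biological_process', 'cell death']
```

An explicit stack with a `seen` set visits shared ancestors (here `biological_process`, reached by two paths) only once.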
Outside the scope of GO
• Gene products: e.g., cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
• Processes, functions or components that are unique to mutants or diseases: e.g., oncogenesis is not a valid GO term, because "causing cancer" is the result of reprogrammed rather than normal cells, and is therefore not the normal function of a gene.
• Attributes of sequence, such as "intron" or "exon" parameters.
• Protein domains or structural features.
• Protein-protein interactions.
• Environment, evolution and expression.
• Anatomical or histological features above the level of cellular components, including cell types.
http://www.geneontology.org/

GO enrichment
One of the main uses of GO is to perform enrichment analysis on gene sets: given a set of genes that are up-regulated under certain conditions, an enrichment analysis finds which GO terms are over-represented (or under-represented) among the annotations of that gene set.

Identifiers
• UniProt accession number (http://www.uniprot.org)
• Gene symbol (http://www.genenames.org)
• Entrez Gene (http://www.ncbi.nlm.nih.gov/gene)
• Ensembl (http://www.ensembl.org/index.html)
Online tool to convert identifiers: http://biit.cs.ut.ee/gprofiler/gconvert.cgi

STRING 9.1
http://string-db.org/

WebGestalt
http://bioinfo.vanderbilt.edu/webgestalt/
Enrichment analysis
1. Gene Ontology Analysis: enrichment analysis for the Gene Ontology categories. The result is visualized as a directed acyclic graph (DAG) in order to maintain the relationships among the enriched GO categories.
2. KEGG Analysis: enrichment analysis for the KEGG pathways. Genes can be highlighted in the KEGG pathway maps.
3. WikiPathways Analysis: enrichment analysis for the pathways in the WikiPathways database. Genes and corresponding changes can be colored in the WikiPathways maps.
4. Pathway Commons Analysis: enrichment analysis for the pathways in the Pathway Commons database.
5. Transcription Factor Target Analysis: enrichment analysis for the targets of transcription factors (data source: MSigDB).
6. MicroRNA Target Analysis: enrichment analysis for the targets of microRNAs (data source: MSigDB).
7. Protein Interaction Network Module Analysis: enrichment analysis for hierarchical network modules predicted from protein interaction networks.
8. Cytogenetic Band Analysis: enrichment analysis for the cytogenetic bands.
9. Disease Association Analysis: enrichment analysis for disease-associated genes derived from GLAD4U.
10. Drug Association Analysis: enrichment analysis for drug-associated genes derived from GLAD4U.
11. Phenotype Analysis: enrichment analysis based on the Mammalian Phenotype Ontology and the Human Phenotype Ontology.

GO Slim classification
1. Biological process
2. Molecular function
3. Cellular component

An example:
• Organism: hsapiens
• Gene ID type: entrezgene
• File: OUR.txt
Analyses run: 1. Gene Ontology, 2. KEGG, 3. WikiPathways, 4. Pathway Commons
GO enrichment analysis
GO enrichment: description of results
KEGG analysis
WikiPathways analysis
Pathway Commons analysis

BioProfiling
http://www.bioprofiling.de/
BioProfiling – results

R spider
R spider implements the Global Network statistical framework to analyze gene lists, using as reference knowledge a global gene network constructed by combining signaling and metabolic pathways from the Reactome and KEGG databases. Reactome is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways. The Reactome database model specifies protein-protein interaction pairs. The meaning of "interaction" is broad: two protein sequences occur in the same complex, or they occur in the same or neighbouring reaction(s). The Reactome signaling network and the KEGG metabolic network were merged into a single integral network. For the human genome, the resulting integral network covers about 4000 genes involved in approximately 50,000 unique pairwise gene interactions.
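The over-representation test behind the GO (and pathway) enrichment analyses described above can be sketched with a one-sided hypergeometric test followed by Benjamini-Hochberg correction across terms. This is one common choice, assumed here for illustration; the tools listed may use different tests, reference backgrounds or corrections. Gene and term names are hypothetical toy data.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k), X ~ Hypergeometric: N background genes, K of them annotated
    with the term, n genes in the submitted list."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def enrich(gene_list, annotations, background):
    """Over-representation test per term: returns {term: (k, K, p_value)}."""
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    term_genes = {}
    for gene in background:
        for term in annotations.get(gene, ()):
            term_genes.setdefault(term, set()).add(gene)
    results = {}
    for term, members in term_genes.items():
        k = len(members & genes)
        if k > 0:
            results[term] = (k, len(members), hypergeom_sf(k, N, len(members), n))
    return results


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: 20 background genes; term "T" is annotated on g0..g4.
background = {f"g{i}" for i in range(20)}
annotations = {f"g{i}": {"T"} for i in range(5)}
result = enrich(["g0", "g1", "g2", "g10"], annotations, background)
k, K, p = result["T"]
print(k, K, round(p, 4))  # 3 5 0.032
```

Here 3 of the 4 listed genes carry term T, against 5 of 20 in the background, giving P(X >= 3) = 155/4845. With many terms tested at once, the adjusted p-values from `benjamini_hochberg` are the ones to threshold.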
Reference: if you find the results produced by R spider useful, please cite:
1. Antonov A.V., Schmidt E., Dietmann S., Krestyaninova M., Hermjakob H. R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from Reactome and KEGG databases. Nucleic Acids Research, 2010, 38(suppl_2): W118-W123.

PPI spider
PPI spider implements the Global Network statistical framework to analyze gene/protein lists, using as reference knowledge a global protein-protein interaction network from the IntAct database. For the human genome, the reference network covers about 7960 genes involved in approximately 40,000 unique pairwise interactions.
Reference: if you find the results produced by PPI spider useful, please cite:
1. Antonov A.V., Dietmann S., Rodchenkov I., Mewes H.W. PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks. Proteomics, 2009 May, 9(10).

BioProfiling – results
http://bioprofiling.simbioms.org/Results/dir_24924_1409769655/main.html
The p-value provided, computed by Monte Carlo simulation (Global Network), is the probability of obtaining a model of the same quality from a random gene list of the same size.
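The Monte Carlo idea behind that p-value can be sketched as follows: score the submitted list, score many random lists of the same size drawn from the same network, and report how often a random list does at least as well. This is not the Global Network framework's actual model-quality statistic; as a stand-in, the score here is simply the number of network edges among the listed genes, and the toy network, gene names, iteration count and seed are all hypothetical.

```python
import random
from itertools import combinations


def internal_edges(genes, edges):
    """Toy quality score: number of network edges with both ends in the gene list."""
    gs = set(genes)
    return sum(1 for a, b in edges if a in gs and b in gs)


def monte_carlo_pvalue(gene_list, edges, n_iter=1000, seed=0):
    """Empirical p-value: how often a random gene list of the same size
    scores at least as well as the observed list."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    node_set = set(nodes)
    genes = [g for g in gene_list if g in node_set]  # keep genes present in the network
    observed = internal_edges(genes, edges)
    hits = sum(
        internal_edges(rng.sample(nodes, len(genes)), edges) >= observed
        for _ in range(n_iter)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Toy network: a 4-protein clique plus a 40-node chain.
clique = list(combinations(["A", "B", "C", "D"], 2))
chain = [(f"N{i}", f"N{i + 1}") for i in range(39)]
edges = clique + chain

print(monte_carlo_pvalue(["A", "B", "C", "D"], edges))
```

A tightly connected list (the clique) is rarely matched by random lists, so its p-value is small; a scattered list scores no better than chance and gets a p-value near 1.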
http://bioprofiling.simbioms.org/Results/dir_24924_1409769655/r_spider_main.html
http://bioprofiling.simbioms.org/Results/dir_24924_1409769655/ppi_spider_main.html

ProfCom-GO

Cytoscape
http://www.cytoscape.org/
“Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data”
• A collaboration of private and public institutions
• Extensions are called apps (previously called plugins)
Cytoscape – Import Network from Public Databases
Cytoscape – Style
Cytoscape – Layout
Cytoscape App Store
BiNGO (Biological Network Gene Ontology): GO enrichment analysis
Jepetto (example output: functional annotations such as transport, chromatin modification, apoptosis, response to unfolded protein, metabolism, ATP production, positive regulation of IkB kinase/NF-kB cascade)

MCODE
MCODE is a Cytoscape plugin that finds clusters (highly interconnected regions) in a network. Clusters mean different things in different types of networks. For instance, clusters in a protein-protein interaction network are often protein complexes and parts of pathways, while clusters in a protein similarity network represent protein families.
ClusterViz (includes MCODE)
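As published, MCODE roughly weights each node by the density of the highest k-core found in its neighbourhood (scaled by k) and then grows clusters outward from high-weight seed nodes. The plain-Python sketch below is a simplified illustration of that idea, not the plugin's code: the standard undirected density formula, the vertex-weight-percentage cut-off `vwp=0.2` and the toy graph are assumptions, and MCODE's post-processing options (such as haircut and fluff) are omitted.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_core(nodes, adj, k):
    """Largest induced subgraph of `nodes` in which every node has degree >= k."""
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(adj[v] & core) < k:
                core.discard(v)
                changed = True
    return core


def density(nodes, adj):
    """Edges present divided by edges possible (undirected, no self-loops)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    e = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
    return e / (n * (n - 1) / 2)


def vertex_weight(v, adj):
    """k of the highest k-core in v's closed neighbourhood times that core's density."""
    hood = adj[v] | {v}
    best_k, best_core, k = 0, set(), 1
    while True:
        core = k_core(hood, adj, k)
        if not core:
            break
        best_k, best_core = k, core
        k += 1
    return best_k * density(best_core, adj)


def grow_cluster(seed, adj, weights, vwp=0.2):
    """Greedily add neighbours whose weight is within vwp of the seed's weight."""
    threshold = (1 - vwp) * weights[seed]
    cluster, frontier = {seed}, [seed]
    while frontier:
        v = frontier.pop()
        for u in adj[v]:
            if u not in cluster and weights[u] >= threshold:
                cluster.add(u)
                frontier.append(u)
    return cluster


# Toy graph: a 4-node clique (A-D) with a short tail D-E-F.
edges = list(combinations(["A", "B", "C", "D"], 2)) + [("D", "E"), ("E", "F")]
adj = build_adjacency(edges)
weights = {v: vertex_weight(v, adj) for v in adj}
seed = max(weights, key=weights.get)
print(seed, sorted(grow_cluster(seed, adj, weights)))  # A ['A', 'B', 'C', 'D']
```

The clique members all receive weight 3.0 (a 3-core of density 1), while the tail nodes score lower, so growth from the seed stops at the densely connected region, which is the kind of cluster the slide describes as a likely protein complex.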