Current Integrated View of Macrophage Activation

To do:
• Other systems: scavenger receptors, cytokine networks (CCLs, CXCLs, ILs), growth factors (TGFB, CSFs), kinase cascades, phagocytosis...
• Better integration of transcription factor networks/gene expression networks from profiling data
• 3D modelling of pathways
• Computational modelling of flow through the pathway network: Petri-net flow analyses

2,031 Nodes, 2,494 Edges

An Introduction to Network Modelling of Pathway Knowledge
NESC 2009

http://www.systemsbiology.org/cd/
SBGN Current Draft Notation - 16th June 2008

http://www.cytoscape.org/
Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

Plugins
* Analysis Plugins - Used for analyzing existing networks.
* Network and Attribute I/O Plugins - Used for importing networks and attributes in different file formats.
* Network Inference Plugins - Used for inferring new networks.
* Functional Enrichment Plugins - Used for functional enrichment of networks.
* Communication/Scripting Plugins - Used for communicating with or scripting Cytoscape.

Reactome - a curated knowledgebase of biological pathways
http://www.reactome.org/

Lectin pathway of complement activation [Homo sapiens]
http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=166662&

Mannose-binding lectin (MBL), a Ca-dependent (C-type) lectin, initiates the complement cascade after binding to specific carbohydrate patterns on pathogenic cell surfaces. The MBL polypeptide chain consists of a short N-terminal cysteine-rich region, a collagen-like region comprising 19 Gly-X-Y triplets, a 34-residue hydrophobic stretch, and a C-terminal C-type lectin domain.
MBL monomers associate via their cysteine-rich and collagen-like regions to form homotrimers, and these in turn associate into oligomers. The predominant oligomers found in human serum contain three (MBL-I) or four (MBL-II) homotrimers (Fujita et al. 2004; Teillet et al. 2005). These oligomers are associated with homodimers of the MASP2 serine protease (Fujita et al. 2004; Hajela et al. 2002). MBL-II is associated with one or two MASP homodimers (Chen and Wallis 2001, 2004). The carbohydrate recognition domain (CRD) of MBL binds carbohydrates with 3- and 4-OH groups in the pyranose ring, such as mannose and N-acetyl-D-glucosamine, in the presence of Ca2+. This binding results in a change in conformation of the MBL and activation of MASP by cleavage (Fujita et al. 2004). MASP2a cleaves C4 to generate C4a and C4b. C4b binds to the bacterial or foreign cell surface via its thioester bond (Law and Dodds 1997) and binds circulating C2. Bound C2 is then cleaved by MASP2 to yield the C3 convertase C4b-C2a. MASP1, a serine protease encoded by an alternatively spliced transcript of the same gene that encodes MASP2, can also be activated by binding of the MBL complex to carbohydrate patterns. MASP-1a cleaves fibrinogen to yield fibrinopeptide B, and cleaves and activates factor XIII. While MASP1 can also cleave C2, it is not thought to mediate the initial cleavage and activation of C2 in vivo (Chen and Wallis 2004). MASP-1a may have a role in cleavage of 'dead C3', i.e. C3(H2O) (Hajela et al. 2002).
[Chen & Wallis 2001, Chen & Wallis 2004, Fujita et al. 2004, Hajela et al. 2002, Law & Dodds 1997, Teillet et al. 2005]

Cytoscape Visualisation of Lectin Pathway of Complement Activation (.sif file)

Visualisation and Analysis of Biological Networks
An Introduction to BioLayout Express3D

A Brief History of Gene Expression Profiling
• ISH (1969)
• Northern blot (1977)
• RT-PCR (1983)
• Differential Display (1991)
• SAGE (1995)
• Spotted cDNA (1996)
• In situ oligo array (1996)
• NextGen (2005)

Microarray Gene Expression Profiling
(Diagram labels: DATA, Statistics, Explorative)
There must be a better way...

GNF1M Mouse Atlas
• 61 ‘tissues’ from C57/BL6 mice, 8-11 weeks old
• Tissue pooled from 7 mice, 4 male/3 female
• Synthesize cRNA and hybridise to chip
• 122 Affymetrix chips (36k features)
Su et al. PNAS 101(16):6062-7 (2004)

PLoS Comp Biol. 3:2032-42 (2007)
www.biolayout.org
• New tool for the construction and analysis of large network graphs (25K nodes, 2M edges)
• Built-in correlation (Pearson, Spearman rank) matrix calculation
• Highly visual and interactive interface for the analysis of 2D and 3D network graphs
• Integration of a powerful network clustering algorithm (MCL)
• Built-in data mining capabilities
• Particularly suited for the analysis of large datasets (up to 500 genome arrays)
• Supports the layout of numerous data types
• Main code in Java 1.6, providing OS-independent, multi-platform compatibility
Open source and available now at: www.biolayout.org

Repositories of Gene Expression Data

BioLayout Requirements
• Windows, Macintosh or Linux
• 3-button mouse (preferably)
• Java 1.6 must be installed, from java.sun.com (support on the Tiger operating system is limited to Java 1.5)
• 1GB of RAM
• Fast 3D graphics card (Nvidia or ATI)

BioLayout Input
• Garbage in / Garbage out
• Numerous file types:
  1) Regular (.layout, .txt, .tgf)
  2) Cytoscape SIF format (.sif)
  3) GraphML (.graphml)
  4) Matrix (.matrix)
  5) Expression (.expression)
• Under development:
  6) BioPAX Level 2 (.owl)
  7) SBML (.sbml)

Data Formats
Simple
(Diagram: nodes A, B, C and Z)
ProteinA  ProteinB  80
ProteinB  ProteinA  90
ProteinC  ProteinB  50
ProteinZ  ProteinA  40

Complicated (e.g.
BioPAX)
<sbml xmlns:bp="http://www.biopax.org/release1/biopaxrelease1.owl" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation>
        <bp:protein rdf:ID="#PdhA"/>
      </annotation>
    </species>
    <species id="NADP+" metaid="NADP+">
      <annotation>
        <bp:smallMolecule rdf:ID="#NADP+"/>
      </annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation>
        <bp:complexAssembly rdf:ID="#pyruvate_dehydrogenase_cplx"/>
      </annotation>
    </reaction>
    <reaction id="pyruvate_dehydrogenase_rxn" metaid="pyruvate_dehydrogenase_rxn">
      <annotation>
        <bp:biochemicalReaction rdf:ID="#pyruvate_dehydrogenase_rxn"/>
      </annotation>
    </reaction>
  </listOfReactions>
</sbml>

BioLayout Input Formats: Simple Pairwise (.txt, .tgf)
NodeA  NodeB
NodeA  NodeC
NodeB  NodeC
NodeC  NodeC
NodeD  NodeC
NodeE  NodeC
NodeC  NodeE
NodeF  NodeA
NodeA  NodeE
NodeD  NodeE
NodeG  NodeF
NodeG  NodeD
NodeE  NodeG
NodeH  NodeG
NodeH  NodeJ
NodeJ  NodeA
NodeE  NodeJ
NodeC  NodeG
NodeB  NodeG
NodeJ  NodeD

Go to BioLayout website: www.biolayout.org

BioLayout Input Formats: Weighted Pairwise (.txt, .tgf)
Node pairs:
NodeA  NodeB
NodeA  NodeC
NodeB  NodeC
NodeC  NodeC
NodeD  NodeC
NodeE  NodeC
NodeC  NodeE
NodeF  NodeA
NodeA  NodeE
NodeD  NodeE
NodeG  NodeF
NodeG  NodeD
NodeE  NodeG
NodeH  NodeG
NodeH  NodeJ
NodeE  NodeJ
NodeC  NodeG
NodeB  NodeG
NodeJ  NodeD
Edge weights:
1.0 0.93 0.34 0.87 0.70 0.01 0.25 0.51 0.553 0.778 0.358 0.965 0.02 1.0 0.338 0.87 0.61 0.448
0.17 0.17 0.338 0.87 0.778 0.965 1.0 0.02 0.553 1 0.51 0.934 0.358 0.448 0.61 1.00 0.34 0.70

BioLayout Input Formats: Cytoscape Annotated Edge (.sif)
NodeA  phosphorylation   NodeB
NodeA  binding           NodeC
NodeB  phosphorylation   NodeC
NodeD  binding           NodeC
NodeE  binding           NodeC
NodeC  phosphorylation   NodeE
NodeF  phosphorylation   NodeA
NodeA  phosphorylation   NodeE
NodeD  binding           NodeE
NodeG  ubiquitinisation  NodeF
NodeG  ubiquitinisation  NodeD
NodeE  phosphorylation   NodeG
NodeH  ubiquitinisation  NodeG
NodeH  ubiquitinisation  NodeJ
NodeE  phosphorylation   NodeJ
NodeC  ubiquitinisation  NodeG
NodeB  binding           NodeG
NodeJ  phosphorylation   NodeD

BioLayout Input Formats: Matrix (.matrix)
   A         B         C         D         E         F
A  1         0.925071  0.917047  0.913604  0.908269  0.900065
B  0.925071  1         0.914733  0.909063  0.90429   0.898442
C  0.917047  0.914733  1         0.945363  0.939911  0.929149
D  0.913604  0.909063  0.945363  1         0.933104  0.928441
E  0.908269  0.90429   0.939911  0.933104  1         0.9277
F  0.900065  0.898442  0.929149  0.928441  0.9277    1

Adding Annotations to Nodes
Nodes can be assigned to ‘classes’, which can be used to select nodes and to give different visual properties to individual classes. Terms can be mined.
//NODECLASS  A  "up"      "4hrs"
//NODECLASS  B  "down"    "4hrs"
//NODECLASS  C  "Kinase"  "GOTerm"
Likewise, nodes can be assigned a shape and size in the input file.

BioLayout Input Formats: Expression Data (.expression)
Probe  Annotation1  Annotation2  Data1  Data2   Data3
XXXX   Kinase       MAPKK        400.3  900.34  295.3
YYYY   Unknown      PKZZ         300.3  100.3   0.22
ZZZZ   Phosphatase  HPp43a       405.3  1002.4  299.2
• Can be created in Excel as a tab-separated text file
• Input file must be called "something.expression"
• BioLayout will then create a something.pearson file
• Watch the console for graph size: current max size 25,000 nodes and 2,000,000 edges = 600MB

Graph Paradigm for Gene Expression Data
• Co-expression defined using a correlation measure (e.g. Pearson)
• Genes (nodes) are connected to each other in a network based on their level of co-expression (edges)

Expression values:
       Tissue1  Tissue2  Tissue3  Tissue4
Gene1  100      100      50       50
Gene2  100      95       55       60
Gene3  100      50       50       50
Gene4  100      40       50       55
Gene5  4        4        2        5
(Plot: Expression of gene1 to gene5 across Samples 1 to 4)

Pairwise correlations:
       Gene1  Gene2  Gene3  Gene4  Gene5
Gene1  100%   99%    58%    38%    23%
Gene2  99%    100%   64%    46%    31%
Gene3  58%    64%    100%   97%    13%
Gene4  38%    46%    97%    100%   16%
Gene5  23%    31%    13%    16%    100%

50,000 x 50,000 probes: 1.25 billion calculations

GNF1M Node/Edge Profiles Across Range of Pearson Correlation Coefficients
(Figure: gcRMA and MAS5 panels; axis: No. of nodes; legend: Rank Maximum Expression 67-100%, 34-66%, 0-33%)

GNF1M Graph Profiles Across Range of Pearson Correlation Coefficients
(Figure: gcRMA and MAS5 graphs at 0.9 Pearson)

Markov Cluster Routine (MCL)
MCL is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for network graphs based on simulation of stochastic flow in graphs. This is one of the few methods available for clustering data in network graphs and arguably the best. Application of MCL allows large graphs to be divided into sub-groups based on the connectivity between groups of nodes.
Stijn van Dongen (2000) Graph clustering by flow simulation
http://micans.org/mcl/

Experimental Design
The design of a given experiment influences the resultant graph.
The size of a graph at a given correlation cut-off value is a function of the number of probes on the array and the number of samples analysed:
• The smaller the dataset, the larger the graph
• The less biological or experimental variation, the less structure in the network
The order of the samples has no effect on the correlation.
The experimental question has no influence on the graph.

Normalisation Method and Platform Dependency
BioLayout Express3D does not possess the ability to normalise data, nor in principle does it matter whether the input data has been normalised, log transformed or converted into ratio-metric data. A correlation matrix will be calculated and a graph plotted regardless. Normalisation method does, however, have a profound effect on graph structure: data normalised by quantile methods show reduced variation and therefore increased correlation. BioLayout Express3D is not restricted to data from any particular commercial or academic microarray platform; the input format is the same regardless of the platform the data was generated on.

Focus of the Study
In contrast to a statistical approach to identifying genes of interest in a dataset, where the biological groupings and contrasts of interest need to be defined, the network paradigm presents the answer irrespective of the question asked.
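The flow simulation behind the Markov Cluster Routine slide above can be illustrated with a small sketch. This is a minimal, pure-Python reading of the published expansion/inflation idea, not the code used by BioLayout Express3D or by the mcl program at micans.org. The function names, the added self-loops, the inflation value of 2.0 and the convergence tolerance are all assumptions chosen for illustration.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def multiply(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iterations=100, tolerance=1e-9):
    """Cluster a graph given as a symmetric adjacency matrix (list of lists).

    Hypothetical helper for illustration only.
    """
    n = len(adjacency)
    # Assumption: add a self-loop to every node before normalising.
    flow = normalise_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iterations):
        # Expansion: flow spreads along two-step random walks.
        expanded = multiply(flow, flow)
        # Inflation: strong flows are boosted, weak flows decay.
        inflated = normalise_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(abs(inflated[i][j] - flow[i][j])
                     for i in range(n) for j in range(n))
        flow = inflated
        if change < tolerance:
            break
    # Rows that still carry flow are attractors; the columns they reach
    # form one cluster each.
    clusters = {frozenset(j for j in range(n) if flow[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(cluster) for cluster in clusters if cluster)


# Two triangles (nodes 0-2 and 3-5) joined by a single edge between 2 and 3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(adjacency))
```

Raising the inflation parameter generally yields smaller, tighter clusters, which is why it is exposed as an argument here.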
1) Expression data (normalised and annotated)
2) Gene-to-gene Pearson correlation calculated for every probe set on the array
3) Pearson correlations >0.7 saved
4) Pearson correlation file (>0.7) filtered based on user-defined threshold (0.7-1.0)
5) Edges drawn between nodes (genes) based on correlations greater than the selected threshold
6) Singletons and graphs with <n members removed
7) Optimised weighted Fruchterman-Reingold layout
8) 3-D visualisation (OpenGL)
9) Network graphs laid out in tiled arrangement and clustered
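The first steps of this workflow (pairwise Pearson correlation, thresholding, edge creation and removal of singletons and small graphs) can be sketched as follows. This is a minimal illustration rather than BioLayout's own implementation. The function names are hypothetical, and the expression values are an assumed reading of the five-gene, four-tissue example from the graph paradigm slide. The 0.7 threshold is the saved cut-off mentioned in the workflow.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(var_x * var_y)


def correlation_edges(profiles, threshold):
    """Return (gene_a, gene_b, r) for every pair with r above the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r > threshold:
            edges.append((a, b, r))
    return edges


def components(edges, min_size=2):
    """Group connected nodes and drop components smaller than min_size.

    Genes without any edge (singletons) never enter the graph at all.
    """
    neighbours = {}
    for a, b, _ in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        if len(group) >= min_size:
            groups.append(sorted(group))
    return groups


# Assumed toy data: expression of five genes in Tissue1..Tissue4.
profiles = {
    "Gene1": [100, 100, 50, 50],
    "Gene2": [100, 95, 55, 60],
    "Gene3": [100, 50, 50, 50],
    "Gene4": [100, 40, 50, 55],
    "Gene5": [4, 4, 2, 5],
}

edges = correlation_edges(profiles, threshold=0.7)
for gene_a, gene_b, r in edges:
    print(f"{gene_a}\t{gene_b}\t{r:.2f}")
print(components(edges))
```

With these values only the Gene1/Gene2 and Gene3/Gene4 pairs exceed 0.7, so Gene5 drops out as a singleton. The layout, 3-D rendering and clustering steps are outside the scope of this sketch.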