Current Integrated View of Macrophage Activation

To do:
Other systems: scavenger receptors, cytokine
networks (CCL, CXCL, ILs), growth factors
(TGFB, CSFs), kinase cascades,
phagocytosis…
Better integration of transcription factor
networks/gene expression networks from
profiling data
3D modelling of pathways
Computational modelling of flow through the
pathway network - Petri-net flow analyses
2,031 nodes
2,494 edges
An Introduction to Network Modelling of
Pathway Knowledge
NESC 2009
http://www.systemsbiology.org/cd/
SBGN Current Draft Notation - 16th June 2008
http://www.cytoscape.org/
Cytoscape is an open source bioinformatics software platform for
visualizing molecular interaction networks and biological pathways and
integrating these networks with annotations, gene expression profiles and
other state data.
Plugins
* Analysis Plugins - Used for analyzing existing networks.
* Network and Attribute I/O Plugins - Used for importing networks and attributes in
different file formats.
* Network Inference Plugins - Used for inferring new networks.
* Functional Enrichment Plugins - Used for functional enrichment of networks.
* Communication/Scripting Plugins - Used for communicating with or scripting Cytoscape.
Reactome - a curated knowledgebase of biological pathways
http://www.reactome.org/
http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=166662&
Lectin pathway of complement activation [Homo sapiens]
Mannose-binding lectin (MBL), a Ca-dependent (C-type) lectin, initiates the complement cascade after binding
to specific carbohydrate patterns on pathogenic cell surfaces. The MBL polypeptide chain consists of a short
N-terminal cysteine-rich region, a collagen-like region comprising 19 Gly-X-Y triplets, a 34-residue hydrophobic
stretch, and a C-terminal C-type lectin domain. MBL monomers associate via their cysteine-rich and collagen-like
regions to form homotrimers, and these in turn associate into oligomers. The predominant oligomers found
in human serum contain three (MBL-I) or four (MBL-II) homotrimers (Fujita et al. 2004; Teillet et al. 2005). These
oligomers are associated with homodimers of the MASP2 serine protease (Fujita et al. 2004; Hajela et al. 2002).
MBL-II is associated with one or two MASP homodimers (Chen and Wallis 2001, 2004). The carbohydrate
recognition domain (CRD) of MBL binds carbohydrates with 3- and 4- OH groups in the pyranose ring, such as
mannose and N-acetyl-D-glucosamine, in the presence of Ca2+. This binding results in a change in
conformation of the MBL and activation of MASP by cleavage (Fujita et al. 2004). MASP2a cleaves C4 to
generate C4a and C4b. C4b binds to the bacterial or foreign cell surface via its thioester bond (Law and Dodds
1997) and binds circulating C2. Bound C2 is then cleaved by MASP2 to yield the C3 convertase C4b-C2a.
MASP1, a serine protease encoded by an alternatively spliced transcript of the same gene that encodes MASP2,
can also be activated by binding of the MBL complex to carbohydrate patterns. MASP-1a cleaves fibrinogen to
yield fibrinopeptide B, and cleaves and activates factor XIII. While MASP1 can also cleave C2, it is not thought
to mediate the initial cleavage and activation of C2 in vivo (Chen and Wallis 2004). MASP-1a may have a role in
cleavage of 'dead C3', i.e. C3(H2O) (Hajela et al. 2002). [Chen & Wallis 2001, Chen & Wallis 2004, Fujita et al
2004, Hajela et al 2002, Law & Dodds 1997, Teillet et al 2005]
Cytoscape Visualisation of Lectin Pathway of Complement Activation (.sif file)
Visualisation and Analysis of
Biological Networks
An Introduction to
BioLayout Express3D
A Brief History of Gene Expression Profiling
ISH (1969)
Northern blot (1977)
RT-PCR (1983)
Differential Display (1991)
SAGE (1995)
Spotted cDNA (1996)
In situ oligo array (1996)
NextGen (2005)
Microarray Gene Expression Profiling
Statistics
Explorative
There must be a better way…
GNF1M Mouse Atlas
• 61 ‘tissues’ from C57BL/6 mice, 8-11 weeks old
• Tissue pooled from 7 mice (4 male, 3 female)
• cRNA synthesised and hybridised to chip
• 122 Affymetrix chips (36k features)
Su et al. PNAS 101(16):6062-7 (2004)
PLoS Comp Biol. 3:2032-42 (2007)
www.biolayout.org
• New tool for the construction and analysis of large network graphs
(25K nodes, 2M edges)
• Built in correlation (Pearson, Spearman rank) matrix calculation
• Highly visual and interactive interface for the analysis of 2D and 3D
network graphs
• Integration of powerful network clustering algorithm (MCL)
• Built in data mining capabilities
• Particularly suited for the analysis of large datasets (up to 500
genome arrays)
• Supports the layout of numerous data types
• Main code Java 1.6, providing OS-independent, multi-platform
compatibility
Open source and available now at:
www.biolayout.org
Repositories of Gene Expression Data
BioLayout Requirements
• Windows, Macintosh or Linux
• 3 button mouse (preferably)
• Must have Java 1.6 installed
– java.sun.com
(support on Mac OS X Tiger is limited to Java 1.5)
• 1GB of RAM
• Fast 3D graphics card
– Nvidia
– ATI
BioLayout Input
• Garbage in / Garbage out
• Numerous file types
– 1) Regular (.layout, .txt, .tgf)
– 2) Cytoscape SIF format (.sif)
– 3) GraphML (.graphml)
– 4) Matrix (.matrix)
– 5) Expression (.expression)
• Under development
– 6) BioPAX Level 2 (.owl)
– 7) SBML (.sbml)
Data Formats
Simple
ProteinA  ProteinB  80
ProteinB  ProteinA  90
ProteinC  ProteinB  50
ProteinZ  ProteinA  40
(Figure: graph of nodes A, B, C and Z)
Complicated (e.g. BioPAX)
<sbml xmlns:bp="http://www.biopax.org/release1/biopaxrelease1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation><bp:protein rdf:ID="#PdhA"/></annotation>
    </species>
    <species id="NADP+" metaid="NADP+">
      <annotation><bp:smallMolecule rdf:ID="#NADP+"/></annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation><bp:complexAssembly rdf:ID="#pyruvate_dehydrogenase_cplx"/></annotation>
    </reaction>
    <reaction id="pyruvate_dehydrogenase_rxn" metaid="pyruvate_dehydrogenase_rxn">
      <annotation><bp:biochemicalReaction rdf:ID="#pyruvate_dehydrogenase_rxn"/></annotation>
    </reaction>
  </listOfReactions>
</sbml>
BioLayout Input Formats:
Simple Pairwise (.txt, .tgf)
NodeA  NodeB
NodeA  NodeC
NodeB  NodeC
NodeC  NodeC
NodeD  NodeC
NodeE  NodeC
NodeC  NodeE
NodeF  NodeA
NodeA  NodeE
NodeD  NodeE
NodeG  NodeF
NodeG  NodeD
NodeE  NodeG
NodeH  NodeG
NodeH  NodeJ
NodeJ  NodeA
NodeE  NodeJ
NodeC  NodeG
NodeB  NodeG
NodeJ  NodeD
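A pairwise file lists one edge per line as two node names, optionally followed by a weight. The sketch below shows one way such text could be read into an adjacency map. The function name `read_pairwise` is hypothetical, and treating edges as undirected with a default weight of 1.0 is an assumption made for illustration, not a statement about how BioLayout parses the file.

```python
from collections import defaultdict


def read_pairwise(text):
    """Parse 'source target [weight]' lines into an adjacency map."""
    adjacency = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        source, target = fields[0], fields[1]
        # Assumption: an unweighted edge gets weight 1.0
        weight = float(fields[2]) if len(fields) > 2 else 1.0
        adjacency[source][target] = weight
        adjacency[target][source] = weight  # assumption: undirected edges
    return adjacency


example = """NodeA NodeB
NodeA NodeC
NodeB NodeC
NodeD NodeC
"""

graph = read_pairwise(example)
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
print(degrees)
```

The same function accepts a third column, so a line such as `NodeA NodeB 0.93` would store 0.93 as the edge weight.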
Go to BioLayout website:
www.biolayout.org
BioLayout Input Formats:
Weighted Pairwise (.txt, .tgf)
NodeA  NodeB
NodeA  NodeC
NodeB  NodeC
NodeC  NodeC
NodeD  NodeC
NodeE  NodeC
NodeC  NodeE
NodeF  NodeA
NodeA  NodeE
NodeD  NodeE
NodeG  NodeF
NodeG  NodeD
NodeE  NodeG
NodeH  NodeG
NodeH  NodeJ
NodeE  NodeJ
NodeC  NodeG
NodeB  NodeG
NodeJ  NodeD
Weights (third column), as listed on the slide:
1.0 0.93 0.34 0.87 0.70 0.01 0.25 0.51 0.553 0.778 0.358 0.965 0.02 1.0 0.338 0.87 0.61 0.448
0.17 0.17 0.338 0.87 0.778 0.965 1.0 0.02 0.553 1 0.51 0.934 0.358 0.448 0.61 1.00 0.34 0.70
BioLayout Input Formats:
Cytoscape Annotated Edge (.sif)
NodeA  phosphorylation   NodeB
NodeA  binding           NodeC
NodeB  phosphorylation   NodeC
NodeD  binding           NodeC
NodeE  binding           NodeC
NodeC  phosphorylation   NodeE
NodeF  phosphorylation   NodeA
NodeA  phosphorylation   NodeE
NodeD  binding           NodeE
NodeG  ubiquitinisation  NodeF
NodeG  ubiquitinisation  NodeD
NodeE  phosphorylation   NodeG
NodeH  ubiquitinisation  NodeG
NodeH  ubiquitinisation  NodeJ
NodeE  phosphorylation   NodeJ
NodeC  ubiquitinisation  NodeG
NodeB  binding           NodeG
NodeJ  phosphorylation   NodeD
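Each SIF line names a source node, an interaction type and one or more target nodes. A minimal sketch of reading such lines and counting edges per interaction type follows. `read_sif` is a hypothetical helper written for this example, and splitting on any whitespace is a simplifying assumption.

```python
from collections import Counter


def read_sif(text):
    """Return (source, interaction, target) triples from SIF-style text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # a lone node name or blank line carries no edge
        source, interaction = fields[0], fields[1]
        # SIF allows several targets after one source and interaction
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


example = """NodeA phosphorylation NodeB
NodeA binding NodeC
NodeB phosphorylation NodeC
NodeD binding NodeC
"""

edges = read_sif(example)
by_type = Counter(interaction for _, interaction, _ in edges)
print(by_type)
```

Keeping the interaction label with each edge is what distinguishes this format from the plain pairwise one: the label can later drive edge colour or filtering.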
BioLayout Input Formats:
Matrix (.matrix)
     A         B         C         D         E         F
A    1         0.925071  0.917047  0.913604  0.908269  0.900065
B    0.925071  1         0.914733  0.909063  0.90429   0.898442
C    0.917047  0.914733  1         0.945363  0.939911  0.929149
D    0.913604  0.909063  0.945363  1         0.933104  0.928441
E    0.908269  0.90429   0.939911  0.933104  1         0.9277
F    0.900065  0.898442  0.929149  0.928441  0.9277    1
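A symmetric matrix like this one can be turned into an edge list by keeping only the values above a chosen cut-off. The sketch below uses the matrix values from the slide; the function name `matrix_to_edges`, the whitespace-separated layout and the 0.93 cut-off are assumptions chosen for illustration.

```python
def matrix_to_edges(text, threshold):
    """Return (row, column, value) for upper-triangle entries above threshold."""
    rows = [line.split() for line in text.strip().splitlines()]
    labels = rows[0]  # header line holds the column labels
    edges = []
    for i, row in enumerate(rows[1:]):
        name = row[0]
        values = [float(v) for v in row[1:]]
        # Only look right of the diagonal: the matrix is symmetric
        for j in range(i + 1, len(labels)):
            if values[j] > threshold:
                edges.append((name, labels[j], values[j]))
    return edges


matrix_text = """
A B C D E F
A 1 0.925071 0.917047 0.913604 0.908269 0.900065
B 0.925071 1 0.914733 0.909063 0.90429 0.898442
C 0.917047 0.914733 1 0.945363 0.939911 0.929149
D 0.913604 0.909063 0.945363 1 0.933104 0.928441
E 0.908269 0.90429 0.939911 0.933104 1 0.9277
F 0.900065 0.898442 0.929149 0.928441 0.9277 1
"""

strong_edges = matrix_to_edges(matrix_text, threshold=0.93)
print(strong_edges)
```

Raising the cut-off shrinks the graph; at 0.93 only the three edges among C, D and E remain.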
Adding Annotations to Nodes
Nodes can be assigned to ‘classes’, which can be
used to select nodes and to give individual classes
different visual properties. Terms can be mined
//NODECLASS  A  "up"      "4hrs"
//NODECLASS  B  "down"    "4hrs"
//NODECLASS  C  "Kinase"  "GOTerm"
Likewise nodes can be assigned a shape and size in
the input file
BioLayout Input Formats:
Expression Data (.expression)
Probe  Annotation1  Annotation2  Data1  Data2   Data3
XXXX   Kinase       MAPKK        400.3  900.34  295.3
YYYY   Unknown      PKZZ         300.3  100.3   0.22
ZZZZ   Phosphatase  HPp43a       405.3  1002.4  299.2
Can be created in Excel as a tab-separated text file
Input file must be called “something.expression”
BioLayout will then create a something.pearson file
Watch the console for graph size: current maximum size is
25,000 nodes and 2,000,000 edges = 600MB
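As an alternative to exporting from Excel, a tab-separated expression table can be written with a few lines of script. The sketch below reproduces the example table from the slide; the helper name `to_expression_text` is hypothetical, and the exact column layout BioLayout expects should be checked against its documentation rather than taken from this example.

```python
import csv
import io

header = ["Probe", "Annotation1", "Annotation2", "Data1", "Data2", "Data3"]
rows = [
    ["XXXX", "Kinase", "MAPKK", 400.3, 900.34, 295.3],
    ["YYYY", "Unknown", "PKZZ", 300.3, 100.3, 0.22],
    ["ZZZZ", "Phosphatase", "HPp43a", 405.3, 1002.4, 299.2],
]


def to_expression_text(header, rows):
    """Serialise a header and data rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = to_expression_text(header, rows)
print(text)

# To save it under the required extension (file name is illustrative):
# with open("something.expression", "w", newline="") as handle:
#     handle.write(text)
```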
Graph Paradigm for Gene Expression Data
• Co-expression defined using correlation measure (e.g. Pearson)
• Genes (nodes) are connected to each other in a network based on their level
of co-expression (edges)
Expression (plotted per gene across samples):
       Tissue1  Tissue2  Tissue3  Tissue4
Gene1  100      100      50       50
Gene2  100      95       55       60
Gene3  100      50       50       50
Gene4  100      40       50       55
Gene5  4        4        2        5
Pearson correlation between genes:
       Gene1  Gene2  Gene3  Gene4  Gene5
Gene1  100%   99%    58%    38%    23%
Gene2  99%    100%   64%    46%    31%
Gene3  58%    64%    100%   97%    13%
Gene4  38%    46%    97%    100%   16%
Gene5  23%    31%    13%    16%    100%
50,000 genes vs. 50,000 genes: 1.25 billion pairwise calculations
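The correlation step can be checked by hand on the toy data. The sketch below assumes the slide's table reads as five genes measured across four tissues, computes the Pearson correlation for every gene pair, and counts the unique pairs for a 50,000-probe array. The variable and function names are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy expression values: Tissue1..Tissue4 for each gene (read from the slide)
expression = {
    "Gene1": [100, 100, 50, 50],
    "Gene2": [100, 95, 55, 60],
    "Gene3": [100, 50, 50, 50],
    "Gene4": [100, 40, 50, 55],
    "Gene5": [4, 4, 2, 5],
}

genes = list(expression)
percent = [
    [round(100 * pearson(expression[a], expression[b])) for b in genes]
    for a in genes
]
for gene, row in zip(genes, percent):
    print(gene, row)

# Unique gene pairs for an array with 50,000 probes: n * (n - 1) / 2
pairs_for_50k = 50_000 * 49_999 // 2
print(pairs_for_50k)
```

Gene1 and Gene2 rise and fall together across the tissues and so correlate strongly, while Gene5, expressed at a very low level with a different pattern, correlates weakly with all of the others.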
GNF1M Node/Edge Profiles Across Range of Pearson
Correlation Coefficients
(Figure: number of nodes for gcRMA- and MAS5-normalised data, grouped by rank of maximum expression: 0-33%, 34-66%, 67-100%)
GNF1M Graph Profiles Across Range of Pearson
Correlation Coefficients
(Figure: gcRMA and MAS5 graphs at Pearson 0.9)
Markov Cluster Routine (MCL)
MCL is short for the Markov Cluster Algorithm, a fast
and scalable unsupervised cluster algorithm for network
graphs based on simulation of stochastic flow in graphs.
This is one of the few methods available for clustering
data in network graphs and arguably the best.
Application of MCL allows large graphs to be divided into
sub-groups based on the connectivity between groups of
nodes
Stijn van Dongen (2000)
Graph clustering by flow simulation
http://micans.org/mcl/
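The core of flow-simulation clustering is an alternation of two matrix operations: expansion (squaring the column-stochastic matrix, which spreads flow) and inflation (raising entries to a power and renormalising, which strengthens strong flow and starves weak flow). The sketch below is a small pure-Python illustration of that idea on a six-node graph. It is not the reference `mcl` implementation, and the self-loops, inflation value of 2.0, fixed iteration count and pruning tolerance are all assumptions made for the example.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix multiplication."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_sketch(adjacency, inflation=2.0, iterations=50):
    """Cluster a small graph by alternating expansion and inflation."""
    n = len(adjacency)
    # Assumption: add a self-loop to every node before normalising
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion
        m = normalise_columns([[v ** inflation for v in row] for row in m])  # inflation
    # Each non-empty row collects the nodes whose flow ends at that attractor
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two triangles (0-1-2 and 3-4-5) joined by a single edge between 2 and 3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0

clusters = mcl_sketch(adjacency)
print(clusters)
```

A higher inflation value generally yields more, smaller clusters; a lower one yields fewer, larger clusters.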
Experimental Design
The design of a given experiment influences the resultant graph
The size of a graph at a given correlation cut-off value is a function of the number
of probes on the array and the number of samples analysed
• The smaller the dataset, the larger the graph
• The less biological or experimental variation, the less structure in the network
The order of the samples has no effect on the correlation
The experimental question has no influence on the graph
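The point about sample order can be verified directly: reordering the samples in the same way for every gene leaves the Pearson coefficient unchanged, because each sample still pairs the same two measurements. The values and the reordering below are arbitrary illustrations.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


gene_a = [100, 95, 55, 60]
gene_b = [100, 40, 50, 55]

# An arbitrary reordering of the four samples, applied to both genes
new_order = [2, 0, 3, 1]
original = pearson(gene_a, gene_b)
reordered = pearson([gene_a[i] for i in new_order],
                    [gene_b[i] for i in new_order])

print(original, reordered)
```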
Normalisation Method and Platform Dependency
BioLayout Express3D does not possess the ability to normalise data, nor in
principle does it matter whether the input data has been normalised, log
transformed or converted into ratio-metric data. A correlation matrix will be
calculated and a graph plotted regardless.
Normalisation method does, however, have a profound effect on graph structure: data
normalised by quantile methods show reduced variation and therefore increased correlation.
BioLayout Express3D is not restricted to data from any particular commercial
or academic microarray platform; the input format is the same regardless of the
platform the data was generated on.
Focus of the Study
In contrast to a statistical approach to identifying genes of interest in a
dataset where the biological groupings and contrasts of interest need to be
defined, the network paradigm presents the answer irrespective of the
question asked.
Expression data (normalised and annotated)
Gene-to-gene Pearson correlation calculated for every probe set on the array
Pearson correlations >0.7 saved to file
Pearson correlation file filtered based on user-defined threshold (0.7-1.0)
Edges drawn between nodes (genes) with correlations greater than the selected threshold
Singletons and graphs with <n members removed
Optimised weighted Fruchterman-Reingold layout
3D visualisation (OpenGL)
Network graphs laid out in tiled arrangement and clustered
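The filtering steps of this workflow (threshold the correlations, draw edges, drop singletons and small graphs) can be sketched in a few lines. The function name `filter_graph`, its parameters and the rounded correlation values are hypothetical; layout and rendering are not covered here.

```python
def filter_graph(correlations, threshold, min_size):
    """Keep edges above threshold, then drop components smaller than min_size."""
    adjacency = {}
    for (a, b), r in correlations.items():
        if r > threshold:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    # Nodes with no surviving edge never enter the map, so singletons vanish
    components, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:
            components.append(sorted(component))
    return sorted(components)


# Illustrative pairwise correlations for five genes
correlations = {
    ("Gene1", "Gene2"): 0.99,
    ("Gene1", "Gene3"): 0.58,
    ("Gene3", "Gene4"): 0.97,
    ("Gene2", "Gene4"): 0.46,
    ("Gene4", "Gene5"): 0.16,
}

kept = filter_graph(correlations, threshold=0.9, min_size=2)
print(kept)
```

With a cut-off of 0.9 the five genes fall into two separate two-node graphs and Gene5 is dropped as a singleton; raising `min_size` to 3 would remove everything.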