David Amar
http://tau.ac.il/~davidama/bioinfo_tutorials
Network biology
 Overview: systems biology
 Represent molecular entities
 Represent interactions
 Two main data types
 Pathways
 Interaction networks
Biological interaction networks
 Nodes: genes or other molecules
 Edges: evidence for some interaction; can carry weights and directions
Magtanong et al. 2011 Nature
Biological interaction networks
 Nodes: genes/proteins or other molecules
 Edges based on evidence for interaction
Gene co-expression
Protein-protein interaction
Genetic interaction
Breker and Schuldiner 2009
Voineagu et al. 2011 Nature
Cytoscape
 Cytoscape is open-source software for integrating, visualizing, and analyzing networks.
 This tutorial describes the Cytoscape 3 user interface.
 Outline
 Basics
 Load and visualize data
 Customize
 Applications
 Clustering
 Enrichment analysis
 GeneMANIA
 Modmap
 Gene expression analysis
Initial window
Main Network View: initially blank.
Toolbar: contains command buttons; a button's name is shown when the mouse pointer hovers over it.
Control Panel: lists the available networks by name.
Network Overview Pane
Table Panel: can be used to display node, edge, and network table data.
Load data: import from databases
The initial window enables searching the large public databases.
Search example: by gene name
Choose databases
Import result
Imported networks are listed by name
Basic statistics
Look at a network
Main Network View
Toolbar: contains command buttons; a button's name is shown when the mouse pointer hovers over it.
Control Panel: lists the available networks by name.
Network Overview Pane: move around!
Table Panel: displays node, edge, and network table data.
Search for a gene
Information about the marked nodes
Load data: import all interactions
Import result
The new network
Load data: from files
 We sometimes have our own data
 From papers
 A special search in a database
 Our experiment (e.g., correlation between genes)
 Famous formats
 SIF
 A table
 OWL: for pathways; “complex” text, but easy to get and very informative once loaded
Load from files
Contains an interaction network of 331 genes from Ideker et al. 2001 Science
Load data: from SIF files
Text: name1<space or tab>interaction_type<space or tab>name2
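The SIF line format above can be read with a few lines of Python. This is only a sketch for inspecting a file outside Cytoscape: the function name `parse_sif` and the example node names are made up for illustration, and it assumes node names contain no spaces (it splits on any whitespace). A SIF line may list several targets after the interaction type, which the sketch expands into one edge per target.

```python
def parse_sif(text):
    """Parse SIF text into a list of (source, interaction_type, target) edges."""
    edges = []
    for line in text.splitlines():
        fields = line.split()  # splits on spaces or tabs
        if len(fields) < 3:
            continue  # skip blank lines and lines that only name a single node
        source, interaction = fields[0], fields[1]
        # One edge per target listed after the interaction type.
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


# Hypothetical example content, not taken from the tutorial files.
example = "geneA pp geneB\ngeneA\tpd\tgeneC geneD\n"
for edge in parse_sif(example):
    print(edge)
```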
Load data: from a table
 From Excel files or tab-delimited text tables
Set which columns hold the nodes and the interaction type
OPTIONAL: Click on the columns that you want to keep as “attributes”
Result
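To check a tab-delimited table before importing it, the same column choices can be mimicked in Python. This is a sketch under assumptions: the function name `read_edge_table` and the column names in the example are hypothetical, and the table is assumed to have a header row. Every column that is not a node or interaction-type column is kept as an attribute, mirroring the optional step above.

```python
import csv
import io


def read_edge_table(text, source_col, target_col, type_col):
    """Read a tab-delimited table; return (source, type, target, attributes) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    role_columns = (source_col, target_col, type_col)
    edges = []
    for row in reader:
        # Remaining columns become attributes of the edge.
        attributes = {k: v for k, v in row.items() if k not in role_columns}
        edges.append((row[source_col], row[type_col], row[target_col], attributes))
    return edges


# Hypothetical table with one extra column kept as an attribute.
table = "gene1\tgene2\ttype\tscore\nA\tB\tPPI\t0.9\nA\tC\tGI\t0.4\n"
for edge in read_edge_table(table, "gene1", "gene2", "type"):
    print(edge)
```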
Load data: OWL
 Good for looking at pathways
 This example: data from the Reactome database
Load data: result
Directed edges: signaling
Zoom
Focus on a selected region (nodes in yellow)
Zoom: result
Move around
Get a sub-network
The subnetwork was created below the original network
Save the session
 We imported six networks
 Before we start modifying them, let's save the session
 File -> Save
Sanity check: close Cytoscape and load the session!
Remarks
 At this point we know how to load data from databases and files
 We can perform simple navigation, zoom, and save
 We saved different networks, each with its own visualization ‘rules’
 A good habit that saves trouble: save a session for each visualization type
 Multiple networks, but keep a consistent visualization
Modifying and saving a visualization
 Cytoscape supports countless options
 Layouts
 Node size, color, label…
 Edge width, line type…
 We will show main examples that are enough to start
 To save the graph as an image:
Change the layout
Organic layout
Circular layout
•Places all of the nodes in a circular arrangement.
•Very quick
•Partitions the network into disconnected parts and independently lays out those parts.
Force-directed
Uses a physical simulation that models the nodes as physical objects and the edges as springs connecting those objects together.
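The spring idea behind a force-directed layout can be illustrated with a toy simulation. This is a generic sketch, not Cytoscape's implementation: the function name `force_directed_layout`, the force formulas (all pairs repel with strength k²/d, edges attract with strength d²/k), and the step-size schedule are assumptions chosen for brevity.

```python
import math


def force_directed_layout(nodes, edges, iterations=200, k=1.0):
    """Toy spring layout: all node pairs repel, edges pull their endpoints together."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    step = 0.1  # maximum move per iteration; shrinks so the layout settles
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k * k / dist
                force[u][0] += dx / dist * f
                force[u][1] += dy / dist * f
                force[v][0] -= dx / dist * f
                force[v][1] -= dy / dist * f
        # Attraction along each edge (the "spring").
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = dist * dist / k
            force[u][0] -= dx / dist * f
            force[u][1] -= dy / dist * f
            force[v][0] += dx / dist * f
            force[v][1] += dy / dist * f
        # Move each node along its net force, capped at the current step size.
        for v in nodes:
            length = math.hypot(force[v][0], force[v][1]) or 1e-9
            move = min(length, step)
            pos[v][0] += force[v][0] / length * move
            pos[v][1] += force[v][1] / length * move
        step *= 0.98
    return pos


# A three-node path a-b-c: linked nodes should end up closer than unlinked ones.
layout = force_directed_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(layout)
```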
Change layout scale
Change the scale
Before: scale is 1
Scale is 8
Style
Open and modify
The IntAct network: node color
Node color
Each column represents some information that we have
Discrete: set a value for each type of information
Apps
 Cytoscape also has many tools called ‘apps’
 Install by going to Apps -> App Manager
 Applications support
 Advanced analysis
 Biological analysis
 Integrating data
 Import special data
I) Find and annotate dense areas
 Use an app that “clusters” the network
 Biological assumption
 We look for protein communities
 Many interactions within
 Probably share function
 Gene function prediction
Step 1: remove duplicated edges
 Sometimes nodes are linked by more than one edge
 Multiple evidence for interaction
 Remove them for clustering and simpler visualization
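Outside Cytoscape, the same clean-up can be sketched in Python. The function name `remove_duplicate_edges` is hypothetical, and the sketch assumes undirected edges, so (a, b) and (b, a) count as the same edge. The optional `drop_self_loops` flag also removes edges from a node to itself.

```python
def remove_duplicate_edges(edges, drop_self_loops=False):
    """Keep the first occurrence of each undirected edge."""
    seen = set()
    unique = []
    for u, v in edges:
        if drop_self_loops and u == v:
            continue
        key = frozenset((u, v))  # order-independent key for an undirected edge
        if key in seen:
            continue
        seen.add(key)
        unique.append((u, v))
    return unique


edges = [("a", "b"), ("b", "a"), ("a", "c"), ("a", "b"), ("c", "c")]
print(remove_duplicate_edges(edges))
print(remove_duplicate_edges(edges, drop_self_loops=True))
```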
Step 2: use ClusterViz
Step 3: look at the results
All clusters
Sorted by size
Select a cluster
Step 4: biological function?
 We discovered a cluster
 A set of highly connected proteins
 What biological processes/functions are enriched in this cluster?
 Discover significantly over-represented biological functions
 Compared to creating random clusters
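One common way to formalize "compared to random clusters" is a hypergeometric test: how likely is it that a random gene set of the same size contains at least as many genes annotated with the function? The sketch below illustrates that idea; the function name `enrichment_pvalue` and the toy gene names are assumptions, and the enrichment tool used in the next step may differ in its details and options.

```python
from math import comb


def enrichment_pvalue(cluster_genes, annotated_genes, background_genes):
    """P(overlap >= observed) when drawing a random cluster of the same size."""
    background = set(background_genes)
    cluster = set(cluster_genes) & background
    annotated = set(annotated_genes) & background
    N, K, n = len(background), len(annotated), len(cluster)
    k = len(cluster & annotated)  # observed overlap
    # Upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 5 annotated, cluster equals the annotated set.
background = [f"g{i}" for i in range(10)]
p = enrichment_pvalue(background[:5], background[:5], background)
print(p)
```

A small p-value means the overlap is unlikely for a random cluster; with many functions tested, the p-values still need multiple-testing correction.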
Step 4: BINGO
Select all nodes (Ctrl+A)
Give the cluster a name (“Cluster 1”)
Select human
Step 4: Results
Summary table
GO graph
Only corrected p-values matter!
Mark in the network
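Why only corrected p-values matter: when many functions are tested at once, some small raw p-values appear by chance. A standard correction is the Benjamini-Hochberg false discovery rate procedure, sketched below. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions; which correction is applied in practice depends on the settings chosen in the tool.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```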
II) Analyze a gene set
 We have a set of genes we want to interpret
 From papers
 From data analysis
 We want to discover
 Functional enrichments
 How they interact within themselves and similar genes
 Use GeneMANIA
Resources and installation
 Installing GeneMANIA may take >30 minutes
 Steps
1. Apps -> App Manager
2. Install GeneMANIA
3. Open GeneMANIA (Apps -> GeneMANIA)
Confirm data download
A new window will open: select human for this tutorial
GeneMANIA
 Our input: a set of genes from Hauser et al. 2005
(http://archneur.ama-assn.org/cgi/pmidlookup?view=long&pmid=15956162)
HSPA1B, HSPA1A, DNAJC6, DNAJB2, UBE1, PARK5,
SLC25A5, COX5B, COX6C, NDUFA3, ATP5I, HK1, COX4I1,
ATP1B1, COX6B, SLC25A3, NDUFS5, ATP5O, UQCRH,
ATP5C1, NDUFB8, ATP5G3, ATP5C1, VDAC3, COX4I1,
COX7B, NDUFA9, ATP1B1, ATP6V0A1, ATP6V0D1, ATP6V0C,
ATP6V1B2, SLC9A6, ATP61P1, ATP6V1D, ATP6V0B,
ATP6V1A1, ATP6V1E1, GDI1, STXBP1, SYT1, VAMP1
GeneMANIA: input window
Paste the gene names (or IDs) here, separated by spaces (no commas)
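A comma-separated list, such as one copied from a paper, can be converted to the space-separated form with a short helper. This is a convenience sketch; the function name `to_genemania_input` is made up, and removing repeated symbols is an optional extra rather than a stated requirement of the input window.

```python
import re


def to_genemania_input(raw):
    """Turn a comma/whitespace-separated gene list into a space-separated string."""
    symbols = [s for s in re.split(r"[,\s]+", raw) if s]
    unique = list(dict.fromkeys(symbols))  # drop repeats, keep first-seen order
    return " ".join(unique)


raw = "HSPA1B, HSPA1A,\nATP5C1, NDUFB8, ATP5C1"
print(to_genemania_input(raw))
```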
The recognized genes and their full names
The types of supported networks
For each interaction type there is a list of networks that can be marked
Use physical interactions, pathways, and co-expression for our example
Results
The output network. Grey nodes are new genes that were added to improve the connectivity
Information tables. For example: the detected functions
Layout was modified to organic for better visualization
Mark a function: automatically marks the relevant nodes
Highlight specific interactions
III) Analyze different interaction types…
Members of a protein complex vs. members of parallel pathways
 “Positive” – expected within families
 “Negative” – expected between families
 Some networks contain both
Analysis of network pairs
 Interaction types can differ: within (“positive”) vs. between (“negative”) functional units
 Input: networks H, G with the same vertex set
 Goal: summarize both networks in a module map
 Node – module: gene set highly connected in H
 Link – two modules highly interconnected in G
 Between-pathway models
Kelley and Ideker 2005
Ulitsky et al. 2008
Kelley and Kingsford 2011
Leiserson et al. 2011
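The module map definition can be made concrete by scoring a given assignment of nodes to modules: the edge density inside each module in H, and the edge density between each pair of modules in G. The sketch below only evaluates a solution under these simple density measures; it is not the ModMap algorithm, and the function name `module_map_scores` and the toy data are hypothetical.

```python
from itertools import combinations


def module_map_scores(h_edges, g_edges, module_of):
    """Return (within-module density in H, between-module density in G)."""
    H = {frozenset(e) for e in h_edges}
    G = {frozenset(e) for e in g_edges}
    members = {}
    for node, module in module_of.items():
        members.setdefault(module, []).append(node)
    within = {}
    for module, nodes in members.items():
        pairs = [frozenset(p) for p in combinations(nodes, 2)]
        # Fraction of node pairs inside the module that are linked in H.
        within[module] = sum(p in H for p in pairs) / len(pairs) if pairs else 0.0
    between = {}
    for m1, m2 in combinations(sorted(members), 2):
        pairs = [frozenset((u, v)) for u in members[m1] for v in members[m2]]
        # Fraction of cross-module node pairs that are linked in G.
        between[(m1, m2)] = sum(p in G for p in pairs) / len(pairs)
    return within, between


# Toy data: H links nodes inside modules, G links nodes across modules.
h = [("a", "b"), ("c", "d")]
g = [("a", "c"), ("a", "d"), ("b", "c")]
assignment = {"a": "M1", "b": "M1", "c": "M2", "d": "M2"}
print(module_map_scores(h, g, assignment))
```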
Solution: ModMap
 Cytoscape app: under construction
 Currently: run the command-line tool and load the resulting solution into Cytoscape
 We will show how to upload a solution
Load ModMap analysis
 Our example: combined analysis of yeast PPI and GI data
 Find GI among complexes
1. Load the network, including the interaction types
2. Load the association of nodes to modules
3. Color the results and set the layout
Load the network
 Load the YeastData.xlsx file
Important: we have several interaction types
The network is large, so we tell Cytoscape to generate it
Load a clustering solution
Modmap_modules.txt file format (text file):
Node module_name
Import Table: a way to add external information about the nodes
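A file in this two-column format can be sanity-checked in Python before importing it as a table. The sketch assumes each line holds a node name and a module name separated by whitespace; whether the real file starts with a header line is not stated here, so the `has_header` flag is an assumption, as are the function name `read_modules` and the example identifiers.

```python
def read_modules(text, has_header=False):
    """Map each node to its module from 'node module_name' lines."""
    lines = text.splitlines()
    if has_header:
        lines = lines[1:]  # skip an assumed header line
    module_of = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        node, module = fields
        module_of[node] = module
    return module_of


# Hypothetical content; real node and module names will differ.
example = "YAL001C module_1\nYAL002W module_1\nYBR010W module_2\n"
print(read_modules(example))
```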
Load a clustering solution
Right click and give it a name
Layout a clustering solution
Layout a clustering solution: results
A circle for each cluster
Unclustered nodes
Remove unclustered nodes
Mark the selected nodes and create a sub-network
Remove self and duplicated edges
Zoom in on a part of the solution
Not informative enough, we cannot see edge types…
Change the visualization style
IV) Overlay gene expression data
 Class/Home exercise (data in the exp_data directory)
 Load human PPI
 Load the gene fold changes from a gene expression experiment
 Set node color and size by the fold change
 Play with the layout
 For example, group attribute layout
 Run BINGO on a selected sub-network
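The idea of a continuous mapping from fold change to node color and size can be sketched as follows. In Cytoscape this is configured in the Style panel; the code only illustrates the arithmetic. The function name `fold_change_to_style`, the blue-white-red scale, the size range, and the assumption that the values are log2 fold changes clamped to ±2 are all illustrative choices.

```python
def fold_change_to_style(log2_fc, max_abs=2.0, min_size=20.0, max_size=60.0):
    """Map a log2 fold change to a (hex color, node size) pair."""
    # Clamp to [-max_abs, max_abs] and rescale to [-1, 1].
    x = max(-max_abs, min(max_abs, log2_fc)) / max_abs
    if x >= 0:
        # Up-regulated: fade from white to red.
        r, g, b = 255, round(255 * (1 - x)), round(255 * (1 - x))
    else:
        # Down-regulated: fade from white to blue.
        r, g, b = round(255 * (1 + x)), round(255 * (1 + x)), 255
    color = "#{:02X}{:02X}{:02X}".format(r, g, b)
    # Larger absolute change gives a larger node.
    size = min_size + (max_size - min_size) * abs(x)
    return color, size


for value in (-5.0, 0.0, 2.0):
    print(value, fold_change_to_style(value))
```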