Introduction_final - Bioinfo-casl

Exploring PPI networks using
Cytoscape
EMBO Practical Course Session 8
Nadezhda Doncheva and Piet Molenaar
Course Outline
Lectures & Labs


Protein focus

Graph context

Demo & Do it yourself use cases

Data from recent literature

Tips & Tricks
Biological questions


I have a protein


I have a list of proteins


Shared features, connections
I have data

Function, characteristics from known
interactions
Derive causal networks

Network



Topology
Hubs
Clusters
New hypotheses
4/13/2015
Instructor Introductions
Nadezhda Doncheva
Max Planck Institute for
Informatics,
Saarbrücken, Germany
http://www.mpiinf.mpg.de/departments/d3
Graph analysis using Cytoscape
Developed Cytoscape core
plugin
Piet Molenaar
AMC Oncogenomics,
Amsterdam, The Netherlands
piet.amc@gmail.com
http://humangenetics-amc.nl/
Network visualization and
analysis using Cytoscape
Developing Cytoscape plugins in
Java
Member of Cytoscape dev-team
Aidan Budd
Computational Biologist,
Gibson Team,
EMBL Heidelberg
http://www.embl.de/~budd/
Course coordinator/organizer
Schedule
Timeslot      Course item
09:00-10:30   1. Introduction
              • Networks and graph theory
              • Cytoscape workflow
              2. Tutorial session 1
              • Focus: network generation
10:30-11:00   Coffee break
11:00-12:30   3. Tutorial session 2
              • Focus: network annotation and visualization
12:30-14:00   Lunch
14:00-15:30   4. Tutorial session 3
              • Focus: network analysis
15:30-16:00   Tea break
17:30-18:30   Afternoon session; Additional networking ;-)
Overview Introduction
Part I: Introduction to molecular networks and graph
concepts


What are molecular networks?

Why are they useful?

What tools are available?
Part II: Introduction to Cytoscape


Network visualization

Plugins/Apps

Workflows
Why networks?

Complex systems are better described as networks of
interacting components

The topology of a network characterizes the underlying
complex system (global topology parameters) and its
individual components (local topology parameters)

Network topology parameters are easily compared

Useful for discovering patterns in large data sets (better
than tables in Excel)

Allow the integration of multiple data types
Biological networks

Nodes can represent proteins,
genes, metabolites, etc.

Edges can be physical or
functional interactions like

Protein-Protein interactions

Protein-DNA interactions

Metabolic interactions

Co-expression relations

Genetic interactions

…
Important to understand what
the nodes and edges mean

Applications of network biology
“What do you want to do with your network?”

Gene function prediction based on connections to sets of
genes/proteins involved in same biological process

Detection of protein complexes by analyzing modularity
and higher order organization (motifs, feedback loops)

Identification of disease subnetworks that are
transcriptionally active in a disease
Network visualization
Network layouts


Force-directed: nodes repel and
edges pull

Hierarchical: for tree-like networks

Manually adjust layout
Visually interpret a network


Global relationships

Dense clusters
Visual features

Node and edge attributes
represent e.g. gene or
interaction attributes

Map attributes to node and
edge visual properties like
color, shape or size
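
The mapping idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the Cytoscape API: the function name, the attribute names (`expression`, `interaction`) and the chosen colors and line styles are all assumptions made for the example.

```python
def continuous_color_mapping(value, low, high,
                             low_rgb=(0, 0, 255), high_rgb=(255, 0, 0)):
    """Map a numeric attribute value onto a color gradient (hypothetical helper).

    Values at or below `low` get `low_rgb`, values at or above `high` get
    `high_rgb`, and values in between are linearly interpolated.
    """
    if high == low:
        raise ValueError("low and high must differ")
    # Clamp the relative position of the value to the range [0, 1].
    t = min(1.0, max(0.0, (value - low) / (high - low)))
    rgb = tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Discrete mapping: each attribute value is assigned one visual property value.
# The interaction types and line styles below are illustrative assumptions.
EDGE_STYLE_BY_INTERACTION = {
    "protein-protein": "solid",
    "protein-DNA": "dashed",
}

# Hypothetical node attribute: an expression value between -2 and 2.
node_color = continuous_color_mapping(value=2.0, low=-2.0, high=2.0)
edge_style = EDGE_STYLE_BY_INTERACTION["protein-DNA"]
```

A continuous mapping suits numeric attributes such as expression values, while a discrete mapping suits categorical attributes such as interaction type.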
Common network analysis tasks

Network topology statistics
such as node degree,
betweenness, degree distribution
of nodes, clustering coefficient,
shortest path between nodes
and robustness of the network
to the random removal of single
nodes.

Modularity refers to the
identification of sub-networks of
interconnected nodes that might
represent molecules physically
or functionally linked that work
coordinately to achieve a specific
function.

Motif analysis is the
identification of small network
patterns that are overrepresented when compared
with a randomized version of
the same network. Discrete
biological processes such as
regulatory elements are often
composed of such motifs.

Network alignment and
comparison tools can identify
similarities between networks
and have been used to study
evolutionary relationships
between protein networks of
organisms.
Networks as graphs

Formal graph definition: A graph G is a pair of two sets V
(nodes) and E (edges): G = (V, E)

Neighbors are two nodes n1 and n2 connected by an edge

Neighborhood is the set of all neighbors of node n

Connectivity k_n is the size of the neighborhood of n

Degree k is the number of edges incident on n
→ Note that cases exist with k ≠ k_n, e.g. when n has several edges to the same neighbor!
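
A minimal Python sketch of these definitions, assuming an undirected toy network without self-loops. The node names and the helper functions are hypothetical; the duplicated A-B edge is there to show a case where degree and connectivity differ.

```python
# Hypothetical toy network: each edge is a (node, node) pair.
# The edge A-B appears twice, e.g. reported by two different experiments.
edges = [("A", "B"), ("A", "B"), ("A", "C"), ("C", "D")]


def neighborhood(edges, n):
    """Set of all nodes connected to n by at least one edge."""
    nbrs = set()
    for u, v in edges:
        if u == n:
            nbrs.add(v)
        elif v == n:
            nbrs.add(u)
    return nbrs


def connectivity(edges, n):
    """Size of the neighborhood of n."""
    return len(neighborhood(edges, n))


def degree(edges, n):
    """Number of edges incident on n (parallel edges are counted separately)."""
    return sum(1 for u, v in edges if n in (u, v))


k = degree(edges, "A")           # three edges touch A
k_n = connectivity(edges, "A")   # but A has only two distinct neighbors
```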
Node degree and shortest path




Hub is a node with an exceptionally
high degree, larger than the average
node degree (see red nodes).
A shortest path between the nodes n
and m is a path between n and m of
minimal length.
The shortest path length, or distance,
between n and m is the length of a
shortest path between n and m.
The characteristic path length is the
average shortest path length, the
expected distance between two
connected nodes.
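
For an unweighted network, shortest path lengths can be computed with a breadth-first search. The sketch below assumes an undirected network given as a hypothetical edge list; the function names are illustrative and not taken from any particular tool.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set of neighbors mapping."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search: shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def characteristic_path_length(adj):
    """Average shortest path length over all pairs of connected, distinct nodes."""
    total = pairs = 0
    for n in adj:
        for m, d in distances_from(adj, n).items():
            if m != n:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical chain-shaped network A - B - C - D.
adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "D")])
distance_a_d = distances_from(adj, "A")["D"]
average = characteristic_path_length(adj)
```

Unreachable node pairs are simply skipped, which matches the restriction to connected nodes in the definition above.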
Small-world networks

A network is a small-world
network if any two arbitrary
nodes are connected by a small
number of intermediate edges, i.e.
the network has an average
shortest path length much smaller
than the number of nodes in the
network (Watts & Strogatz, Nature, 1998).

Interaction networks have been
shown to be small-world
networks (Barabási & Oltvai, Nature
Reviews Genetics, 2004)
Scale-free networks

Node degree
distribution counts the
number of nodes with
degree k, for k = 0, 1, 2, …

If the node degree
distribution of a network
approximates a power law
P(k) ~ a·k^(-b) with b < 3, the
network is scale-free
(Barabási & Albert, Science, 1999).
Many biological networks are scale-free.
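
The degree distribution is easy to compute from an edge list. The sketch below also includes a deliberately crude estimate of the exponent b from a straight-line fit on log-log axes; this is an assumption made for illustration, since such fits are known to be unreliable on real data and more careful statistical methods exist. All names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes with that degree.

    Nodes with degree 0 do not appear in an edge list and are therefore missing.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def loglog_exponent(distribution):
    """Crude estimate of b in P(k) ~ a * k**(-b): least-squares slope on log-log axes."""
    points = [(math.log(k), math.log(c))
              for k, c in distribution.items() if k > 0 and c > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


# Hypothetical star-shaped network: one hub with three neighbors.
star = degree_distribution([("hub", "a"), ("hub", "b"), ("hub", "c")])

# Hypothetical distribution that follows k**(-2) exactly.
b = loglog_exponent({1: 16, 2: 4, 4: 1})
```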
Scale-free vs. random networks

Random networks are
homogeneous: most nodes
have about the same number of links
→ not robust to arbitrary
node failure

Scale-free networks have a
small number of highly connected
nodes (hubs)
→ robust to random failure,
but very sensitive to hub
failures

Implications for the robustness
of PPI networks (Jeong, Nature,
2001)
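
The difference between losing a hub and losing a peripheral node can be illustrated on a toy network. The sketch below measures the size of the largest connected component after removing nodes; the network, the node names and the choice of this measure are assumptions for the example, not the analysis from the cited paper.

```python
def largest_component_size(adj, removed=()):
    """Size of the largest connected component after deleting the removed nodes."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in removed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


# Hypothetical network: hub H holds most of the network together.
adj = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}

intact = largest_component_size(adj)
without_leaf = largest_component_size(adj, removed=["C"])
without_hub = largest_component_size(adj, removed=["H"])
```

Removing the peripheral node C leaves the remaining nodes connected, whereas removing the hub H breaks the network into small pieces.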
Clustering coefficient

The clustering coefficient of
a node n is a ratio N/M, where
N is the number of edges
between the neighbors of a
node n, and M is the maximum
number of edges that could
possibly exist between the
neighbors of n.

The network clustering
coefficient is the average of
the clustering coefficients for all
nodes in the network.
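
A small sketch of both definitions for an undirected network stored as an adjacency mapping. Setting the coefficient of nodes with fewer than two neighbors to 0 is a convention assumed here; the example network and function names are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(adj, n):
    """Ratio N/M for node n: existing edges among its neighbors over possible ones."""
    nbrs = adj[n] - {n}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbors, so no ratio
    possible = k * (k - 1) / 2                      # M
    existing = sum(1 for u, v in combinations(nbrs, 2)
                   if v in adj[u])                  # N
    return existing / possible


def network_clustering_coefficient(adj):
    """Average of the clustering coefficients of all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


# Hypothetical network: triangle A-B-C plus a node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

c_a = clustering_coefficient(adj, "A")   # 1 of 3 possible neighbor pairs is linked
c_net = network_clustering_coefficient(adj)
```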
Network clustering

Find subsets of nodes, modules or
clusters, that satisfy some pre-defined
quality measure

Benefits


Finding “natural” clusters

Classifying the data

Detecting outliers

Reducing the data
Downsides

Real data very rarely presents a unique
clustering

Many different models → try out more
than one

Several alternative solutions could exist

Interpretation of clusters
Motifs

A small connected graph with a
given number of nodes

Motif frequency is the number of
different matches of a motif

Functionally relevant motifs in
biological networks:


Feed-forward loop (1)

Bifan motif (2)

Single-input motif (3)

Multi-input motif (4)
Significance profiles of motifs
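
Motif frequency can be illustrated by counting feed-forward loops in a directed network. The sketch assumes that a feed-forward loop consists of three distinct nodes x, y, z with edges x→y, y→z and x→z, and it counts every such match even if the three nodes share additional edges. The example network is hypothetical.

```python
def count_feed_forward_loops(edges):
    """Count matches of the pattern x->y, y->z, x->z in a directed edge list."""
    successors = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            successors.setdefault(u, set()).add(v)

    count = 0
    for x, targets in successors.items():
        for y in targets:
            for z in successors.get(y, ()):
                # z must differ from x and must also be a direct target of x.
                if z != x and z in targets:
                    count += 1
    return count


# Hypothetical regulatory network with one feed-forward loop (X, Y, Z).
regulation = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffl_count = count_feed_forward_loops(regulation)

# A three-node cycle is not a feed-forward loop.
cycle_count = count_feed_forward_loops([("A", "B"), ("B", "C"), ("C", "A")])
```

To judge whether a motif is overrepresented, such a count would be compared with the counts obtained on randomized versions of the same network.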
Network organization
The levels of organization of
complex networks:

Node degree provides
information about single nodes

Three or more nodes represent a
motif

Larger groups of nodes are called
modules or communities

Hierarchy describes how the
various structural elements are
combined
Available software tools

Cytoscape http://cytoscape.org/

BioLayout Express3D http://www.biolayout.org/

VisANT http://visant.bu.edu/

Ondex http://www.ondex.org/

Pajek http://pajek.imfm.si/

Ingenuity Pathway Analysis
http://www.ingenuity.com/products/pathways_analysis.html

Pathway Studio
http://www.ariadnegenomics.com/products/pathway-studio/
Why Cytoscape?
www.cytoscape.org

Visualization, Integration & Analysis

Free & open source software application (LGPL license)

Written in Java: can run on Windows, Mac, & Linux

Developed by a consortium (UCSD, ISB, Agilent, MSKCC, Pasteur,
UCSF, Unilever, UToronto) that provides a permanent, dedicated team of
developers

Active community: mailing lists, annual conferences

10,000s of users, 3000 downloads/month

Extensible through plugins developed by third parties

It is used! Lots of citations
Network analysis using Cytoscape
Cytoscape extended functionality

Cytoscape extends its functionality
with plugins or apps

Developed by third parties

Listed at http://apps.cytoscape.org/

Usually available through the Plugin
Manager

Can be downloaded from the
plugins' websites

Cover many diverse areas of
application
A typical Cytoscape workflow
1. Load networks
2. Load attributes
3. Analyze and visualize networks
4. Prepare for publication
Cline, et al. “Integration of biological networks and
gene expression data using Cytoscape”, Nature
Protocols, 2, 2366-2382 (2007).
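
Steps 1 and 2 of the workflow amount to reading an interaction list and joining node attributes onto it. The sketch below mimics this outside Cytoscape; the tab-separated layouts, the protein names P1 to P3 and the attribute values are invented for the example and are not a description of Cytoscape's importers.

```python
def load_network(text):
    """Parse lines of the form 'source<TAB>interaction<TAB>target' into edge tuples."""
    edges = []
    for line in text.strip().splitlines():
        source, interaction, target = line.split("\t")
        edges.append((source, interaction, target))
    return edges


def load_attributes(text):
    """Parse lines of the form 'node<TAB>value' into a node -> float mapping."""
    attributes = {}
    for line in text.strip().splitlines():
        node, value = line.split("\t")
        attributes[node] = float(value)
    return attributes


def annotate(edges, attributes):
    """Attach an attribute value (or None if missing) to every node in the network."""
    nodes = sorted({n for source, _, target in edges for n in (source, target)})
    return {n: attributes.get(n) for n in nodes}


# Hypothetical input data.
network_text = "P1\tpp\tP2\nP2\tpp\tP3\n"
attribute_text = "P1\t2.5\nP2\t-1.0\n"

edges = load_network(network_text)
node_table = annotate(edges, load_attributes(attribute_text))
```

Nodes without an attribute value, such as P3 here, stay in the network and simply carry no value for that attribute.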
Some useful Cytoscape links

Download: http://www.cytoscape.org/download.html

Tutorials:
http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape

Cytoscape Mailing lists:
http://www.cytoscape.org/community.html

Plugins/Apps: http://apps.cytoscape.org/

Documentation:
http://www.cytoscape.org/documentation_users.html
On to the first Tutorial session

Unless there are any questions?