Crunching Huge Phylogenies: A Rapid Bootstrap Algorithm and Massive Parallelism on the IBM BlueGene

Alexandros Stamatakis
Swiss Federal Institute of Technology Lausanne (EPFL)
School of Computer & Communication Sciences
Laboratory for Computational Biology and Bioinformatics
Lausanne, Switzerland
& Swiss Institute of Bioinformatics
Alexandros.Stamatakis@epfl.ch
icwww.epfl.ch/~stamatak
Alexandros Stamatakis, October 2007

The Missing Part
Data Assembly → Inference (?) → Tree Analysis
Addressed in this talk: the IBM BlueGene/L supercomputer, Rapid Bootstrapping, and a Bootstopping Criterion

The Big Hardware Problem
From 1980 to 2007, CPU speed improved by 40% p.a., but memory speed by only 9% p.a.

... and why this concerns Bioinformatics
[Figure, 1980 to 2007: growth of sequence data, CPU speed (40% p.a.), and memory speed (9% p.a.)]
Application of HPC techniques will become much more important.
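A back-of-the-envelope illustration of the growing CPU/memory gap, assuming (as an idealization, not measured data) that the 40% and 9% annual improvement rates quoted above compound constantly from 1980 to 2007. The function name `speed_gap` is purely illustrative.

```python
# Rough "memory wall" arithmetic: assumes constant compound annual growth
# at the rates quoted on the slide (an idealization, not measured data).
CPU_GROWTH = 0.40     # CPU speed: +40% per year
MEMORY_GROWTH = 0.09  # memory speed: +9% per year


def speed_gap(years: int) -> float:
    """Factor by which CPU speed outgrows memory speed after `years` years."""
    cpu = (1 + CPU_GROWTH) ** years
    memory = (1 + MEMORY_GROWTH) ** years
    return cpu / memory


gap = speed_gap(2007 - 1980)
print(f"CPU/memory speed gap after 27 years: about {gap:.0f}x")
```

Under these assumptions the gap widens by a factor of roughly 860, which is why cache-aware, memory-efficient code matters for likelihood computations.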
Cache Hierarchy
[Figure: memory/cache hierarchy]

Outline
● Introduction
● Computation of Phylogenies
● Maximum Likelihood
● Web & Grid Services
● Three Steps Towards the Tree of Life
  ● Parallelism on IBM BlueGene/L
  ● Rapid Bootstrapping
  ● A Bootstopping criterion
● Related Projects
● Outlook

Phylogenetics
Input: a “good” multiple alignment
Output: an unrooted binary tree
Various methods for phylogenetic inference:
● Neighbour Joining (fast & simple)
● Maximum Parsimony (relatively fast & simple)
● Maximum Likelihood (complex & slow)
● Bayesian Methods (complex & slower)
ML & Bayesian methods require an explicit model choice.
Complex methods & models are required to reconstruct large & complicated trees! The focus of this talk is on Maximum Likelihood!
The real reason for working on ML: ......

Challenges for Phyloinformatics
● Holy grail: the “Tree of Life”
● What is a good alignment in a phylogenetic context?
● Simultaneous alignment and tree building
● Improve/extend models ...
  but thereby the size of computable trees decreases!
● More HPC awareness
● Exploit multi-core architectures
● The amount of available data grows at a higher rate than algorithms are getting faster

The Algorithmic Problem

The Number of Trees
[Figure: the number of possible trees as taxa are added]
The number of trees explodes! BANG! The number of distinct unrooted binary trees on n taxa is (2n-5)!! = 3 × 5 × ... × (2n-5); for 50 taxa this already exceeds 10^74.

Maximum Likelihood
[Figure, built up over several slides: an alignment of length m for Seq1 to Seq4; a substitution model over A, C, G, T; prior probabilities given by the empirical base frequencies πA, πC, πG, πT; an unrooted tree for Seq 1 to Seq 4 with branch lengths b1 to b5; and a virtual root vr at which vectors of P(A), P(C), P(G), P(T) values are computed for all m sites]
● Computing the likelihood requires lots of floating-point operations!
● Optimize the branch lengths
● Optimize the model parameters

Maximum Likelihood
Goal: obtain the topology with the maximum likelihood value
Problem I: the number of possible topologies grows exponentially (in fact, super-exponentially) with n
Problem II: computation of the likelihood function is expensive
Problem III: high score accuracy is probably required
Problem IV: high memory consumption
Solution:
● New Algorithms
● New Models
● High Performance Computing
→ RAxML: Randomized Axelerated Maximum Likelihood

Web & Grid Services
● RAxML Web-Server at the San Diego Supercomputer Center via www.phylo.org (CIPRES project)
● Web-Server at the Vital-IT unit of the Swiss Institute of Bioinformatics: phylobench.vital-it.ch/raxml-bb/
  ● Includes a novel search algorithm with a run-time improvement of one order of magnitude
  ● Since Sept 3: about 700 jobs from 130 IPs
  ● Extension to SwissGrid planned
● Novel algorithm with Bootstopping to be integrated into the CIPRES portal soon
● RAxML integration into the Distributed European Infrastructure for Supercomputing Applications (www.deisa.org) started 10 days ago
● Integration into the Debian medical distribution

RAxML Black Box
Why are Black Boxes useful?

Levels of Parallelism
● Embarrassing parallelism: MPI, CORBA, Grid technologies
● Inference parallelism: MPI, algorithm-dependent
● Loop-level parallelism: OpenMP, GPUs, IBM CELL (Playstation), IBM BlueGene, clusters with fast interconnect

Coarse-Grained Parallelism: MPI Version of RAxML
[Figure: on a PC cluster, a master process distributes tasks B-0 to B-4 to worker processes over the interconnection network]

Loop-Level Parallelism
[Figure: at the virtual root, vector P is computed from the vectors Q and R of its two children]
P[i] = f(Q[i], R[i])
This operation uses ≥ 90% of total execution time!
→ simple fine-grained parallelization
[Figures: the vectors P, Q, and R at the virtual root]
The real reason for assuming independent evolution among sites: ......

Fine-Grained Parallelism: OpenMP version of RAxML
[Figures]

HPC for ML (Bayesian)
Proof of concept & programming techniques:
● RAxML on a Graphics Processing Unit
● RAxML on the IBM CELL & Playstation (a good excuse to buy one)
Production-level implementations:
● RAxML with OpenMP
● RAxML with MPI
● RAxML on BlueGene
● Multi-core architectures

RAxML-BlueGene (to be presented at the IEEE/ACM Supercomputing 2007 Conference)
● Many slow processors: 1024 in one rack
● 512 MB or 1 GB of main memory per node
● But: high-performance network
Challenges:
● Distribute the tree data structure among CPUs
● Exploit the fast collective communication network
● For optimal efficiency: loop-level + embarrassing parallelism (hybrid parallelism with MPI)
Test & production run data:
● With Olaf Bininda-Emonds, Jena: 2,182 mammalian sequences x 51,000 base pairs
● With Dan Janies, Ohio State: 270 Human Haplotype Map sequences x 500,000 base pairs
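The loop-level kernel P[i] = f(Q[i], R[i]) described above can be sketched in Python. This is a minimal illustration, not RAxML's code: it assumes the simple JC69 substitution model (chosen only for brevity; the slides do not specify a model), uses hypothetical helper names (`jc69`, `tip_vector`, `f`, `combine`), and stores each site's conditional likelihoods as four values for A, C, G, T. Because every site i is computed independently, the site range can be cut into contiguous chunks handed to different workers, the same kind of decomposition used with OpenMP or on BlueGene. The thread pool only demonstrates the decomposition; Python threads will not give an OpenMP-style speedup for this pure-Python loop.

```python
import math
from concurrent.futures import ThreadPoolExecutor

BASES = "ACGT"


def jc69(t: float) -> list[list[float]]:
    """JC69 transition probabilities P(x -> y) along a branch of length t
    (expected substitutions per site). JC69 is an assumption for brevity."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if x == y else diff for y in range(4)] for x in range(4)]


def tip_vector(seq: str) -> list[list[float]]:
    """Per-site conditional likelihoods at a tip: 1.0 for the observed base."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq]


def f(q: list[float], r: list[float], pq, pr) -> list[float]:
    """P[i] = f(Q[i], R[i]): for each parent state x, multiply the
    probabilities of explaining the two child subtrees."""
    return [
        sum(pq[x][y] * q[y] for y in range(4)) * sum(pr[x][z] * r[z] for z in range(4))
        for x in range(4)
    ]


def combine(Q, R, bq: float, br: float, workers: int = 1):
    """Compute the parent vector P for all m sites. Sites are independent,
    so the range 0..m-1 is cut into contiguous chunks, one per worker."""
    pq, pr = jc69(bq), jc69(br)
    m = len(Q)
    chunk = max(1, -(-m // workers))  # ceiling division
    ranges = [(lo, min(lo + chunk, m)) for lo in range(0, m, chunk)]

    def work(span):
        lo, hi = span
        return [f(Q[i], R[i], pq, pr) for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, ranges))  # map preserves chunk order
    return [site for part in parts for site in part]


if __name__ == "__main__":
    Q, R = tip_vector("ACGTTGCA"), tip_vector("ACGATGCC")
    serial = combine(Q, R, 0.1, 0.2, workers=1)
    chunked = combine(Q, R, 0.1, 0.2, workers=4)
    print(serial == chunked)  # prints True: no site depends on another
```

Splitting the sites among workers gives exactly the serial result, since each P[i] reads only Q[i] and R[i].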
This is the largest ML analysis to date in terms of memory footprint.

Loop-Level Parallelism on BlueGene
[Figure: 50 sequences x 23,385 bp, superlinear speedup]
[Figure: 250 sequences x 403,581 bp]

Embarrassing Parallelism
[Figure: processors arranged as masters (M) and workers (W)]

Confidence Values
● A tree without node confidence values is mostly useless
● Problem: confidence value calculation is a major computational obstacle
● We can compute large trees but not analyse them: compute ≠ analyse!
Current slow methods:
● Sampling with Bayesian methods
● Non-parametric bootstrapping

A Tree with Confidence Values
[Figure] Joint work with Marc Gottschling, Charité Hospital, Berlin

Bootstrapping
Original alignment → perturbation → compute tree (one tree per perturbed alignment)
This needs to be done 100 to 1,000 times: embarrassingly parallel!

Two Questions
● How can we compute bootstraps faster?
● How many bootstrap replicates do we need?

Current Work: Rapid Bootstrapping Algorithm
● Tested on 22 diverse real-world DNA/AA single-/multi-gene datasets (mammals, bacteria, archaea, grasses, fishes, plants, viruses) containing 125 to 7,764 sequences
● Pearson correlation between RBS (Rapid BS) and SBS (Standard BS) support values on the best-scoring ML trees: 0.95 to 0.99 (except one dataset at 0.91), average 0.97
● Weighted topological distance < 6%, average 4%
● Program acceleration: 8 to 20, average ≈ 15, i.e., one order of magnitude
● Full ML analysis (100 BS + ML search) of datasets of up to 5,000 sequences in less than 5 days on your desktop!
● Allows for a sufficiently large number of bootstrap replicates

Quick & Dirty Bootstrap
Modify algorithm → computational experiments → iterate

Rapid Bootstrap
Bootstrap replicates are represented as column weight vectors; the first row (every weight 1) is the original alignment:
11111111111111
01102211111111
10111102220111
11111110112021
● Compute a starting tree
● Optimize model parameters & branch lengths
● Use the starting tree & model parameters to compute RELL scores (here -110, -105, and -100 for the three replicates)
● Sort the replicates by RELL score and search them in that order:
  11111110112021   -100   T0: thorough search
  10111102220111   -105   T1: quick search on T0
  01102211111111   -110   T2: quick search on T1
● This sequential dependency is bad for parallelism

Scalability of Rapid Bootstrap
[Figures] Some datasets are harder than others.
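A minimal sketch of two ingredients of the Rapid Bootstrap slides: representing a bootstrap replicate as a column weight vector, and ranking replicates by their RELL (resampling of estimated log-likelihoods) score, computed from the per-site log-likelihoods of one fixed starting tree. The per-site log-likelihoods in the example are made-up placeholders (in practice they would come from the starting tree under the optimized model), and the function names are hypothetical. The tree searches themselves (T0 thorough, T1, T2, ... quick) are only printed, not implemented.

```python
import random


def bootstrap_weights(num_sites: int, rng: random.Random) -> list[int]:
    """One non-parametric bootstrap replicate as a column weight vector:
    draw num_sites alignment columns with replacement and count how often
    each column was drawn (e.g. 01102211111111 for 14 columns)."""
    weights = [0] * num_sites
    for _ in range(num_sites):
        weights[rng.randrange(num_sites)] += 1
    return weights


def rell_score(weights: list[int], site_lnl: list[float]) -> float:
    """RELL score: reuse the per-site log-likelihoods of one fixed tree
    and weight each site by how often the replicate sampled it."""
    return sum(w * lnl for w, lnl in zip(weights, site_lnl))


def sort_by_rell(replicates: list[list[int]], site_lnl: list[float]) -> list[list[int]]:
    """Order replicates from best (highest) to worst RELL score."""
    return sorted(replicates, key=lambda w: rell_score(w, site_lnl), reverse=True)


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder per-site log-likelihoods for a 14-column alignment.
    site_lnl = [-3.2, -2.9, -4.1, -3.0, -5.6, -2.7, -3.3,
                -4.4, -2.8, -3.9, -6.1, -3.1, -2.6, -4.0]
    replicates = [bootstrap_weights(len(site_lnl), rng) for _ in range(3)]
    for k, w in enumerate(sort_by_rell(replicates, site_lnl)):
        search = "thorough search" if k == 0 else f"quick search on T{k - 1}"
        print("".join(map(str, w)), round(rell_score(w, site_lnl), 2), f"T{k}: {search}")
```

RELL avoids re-optimizing anything per replicate: scoring all replicates costs only one weighted sum each, which makes the sort cheap compared to the searches that follow.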
ML Scores: GARLI, RAxML, PHYML
[Figure: 715 sequences]

Correlation
[Figure: 125 taxa, correlation 0.91]

Support Value Distribution
[Figure]

Bootstrap Likelihood Values
[Figures: 125 x 19,436 alignment, 10,000 replicates, only 195 non-trivial bipartitions]

Rapid versus Standard BS
[Figure: 3,491 rbcL sequences, correlation 0.98]

7,764 DNA Sequences
[Figures: best tree; all bipartitions]

775 x 3,838 AA
[Figure]

New Opportunities
● Assess the impact of the alignment method on the tree and support values
● Test Bootstrap of the Bootstrap (double bootstrap) procedures
● Devise and empirically verify bootstopping criteria

Bootstrap of the Bootstrap
[Figures: 140 AA (Efron et al., PNAS 1996); 3,491 rbcL]

Bootstopping
Rapid Bootstrapping allows us to assess bootstopping criteria as follows:
1. Compute a high number of BS replicates (10,000).
2. Devise a topology-based bootstopping criterion and apply it to these 10,000 replicates.
3. Compare the support values induced by the bootstopped trees (say, 300 replicates) with those of all 10,000 replicates.
We have 10,000 replicates for 18 datasets containing 125 to 2,554 sequences.

Bootstopping Criterion
Every 50, 100, 150, ... replicates, run the following test on the current set of N BS trees:
● Repeat 100 times:
  ● Randomly split the N trees into 2 equal sets S1, S2 of size N/2
  ● Compute the bipartition support vectors for S1 and S2
  ● Compute the Pearson correlation of the two support vectors
● Average the 100 Pearson correlations
● If the average > 0.99: stop

Result Overview
● Bootstopped between 100 and 400 replicates (avg 213)
● Correlation on the best tree, bootstopped versus 10,000 replicates: > 0.99 (avg 0.995)
● Correlation of all bipartitions: > 0.995 (avg 0.997)

Bootstopping Results
[Figures: best tree for 140 AA, 404 DNA (multi-gene), 994 DNA, 1,908 DNA, and 2,554 DNA; all bipartitions for 994 DNA]

Putting the Pieces Together
● BlueGene: can handle huge datasets
● Use the CAT approximation on BlueGene: further speedup by a factor of 3.5, memory footprint reduced by a factor of 4
  [Figure: 8,864 bacteria under GTR+Γ and GTR+CAT; log likelihood score under Γ versus execution time, 7 and 14 days]
● Integrate rapid bootstrapping into the BlueGene version: additional speedup ≈ 15
● Integrate bootstopping into BlueGene
Mechanisms are available to accelerate the BlueGene version by a factor of 50 to 60.
Conclusion: we will soon be able to compute a small tree of life with 10,000 organisms and data from multiple genes!
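The bootstopping test from the Bootstopping Criterion slide can be sketched as follows. Assumptions not taken from the slides: each bootstrap tree is given as a frozenset of its non-trivial bipartitions, each bipartition is encoded as the frozenset of taxa on the side containing a fixed reference taxon, and when a support vector has zero variance (e.g., all trees identical) the Pearson correlation is taken as 1.0 if the two vectors are equal and 0.0 otherwise. Function names are hypothetical.

```python
import math
import random


def support_vector(trees, bipartitions):
    """Fraction of the given trees that contain each bipartition."""
    return [sum(bp in t for t in trees) / len(trees) for bp in bipartitions]


def pearson(x, y):
    """Pearson correlation; the zero-variance case is handled by assumption."""
    n = len(x)
    if n == 0:
        return 1.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0 if x == y else 0.0
    return sxy / math.sqrt(sxx * syy)


def average_split_correlation(trees, permutations=100, rng=None):
    """Average Pearson correlation between the support vectors of two random
    halves S1, S2 of the N trees, over `permutations` random splits."""
    if len(trees) < 2:
        raise ValueError("need at least two trees")
    rng = rng or random.Random()
    bipartitions = list(set().union(*trees))  # every bipartition seen in any tree
    half = len(trees) // 2
    total = 0.0
    for _ in range(permutations):
        shuffled = list(trees)
        rng.shuffle(shuffled)
        s1, s2 = shuffled[:half], shuffled[half:2 * half]
        total += pearson(support_vector(s1, bipartitions),
                         support_vector(s2, bipartitions))
    return total / permutations


def bootstop(tree_stream, step=50, threshold=0.99, permutations=100, rng=None):
    """Collect trees from tree_stream; after every `step` trees run the test
    and stop as soon as the average correlation exceeds `threshold`."""
    trees = []
    for tree in tree_stream:
        trees.append(tree)
        if len(trees) % step == 0 and \
                average_split_correlation(trees, permutations, rng) > threshold:
            break
    return trees
```

In this sketch `tree_stream` would be a generator that computes one bootstrap tree per iteration, so replicates are only computed until the criterion is met.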
Host-Parasite Co-Evolution
[Figure: host tree (e.g., mammals) and parasite tree (e.g., lice)]
A co-evolution hypothesis is encoded as a 0/1 adjacency matrix (here 8 parasites x 6 hosts) and assessed by a statistical test.

What can HPC do for Bioinformatics? Axelerated Parafit
● “Parafit: statistical test of co-evolution”, Pierre Legendre, Syst. Biol. 2003
● AxParafit (Axelerated Parafit): statistical test of hypotheses of host-parasite co-evolution
● C porting, optimization, BLAS integration: speedup up to a factor of 67
● Master-worker MPI parallelization
● Largest co-phylogenetic study to date conducted within 8 minutes instead of 4 weeks
● Open-source code: http://icwww.epfl.ch/~stamatak/AxParafit.html
● SwissGrid-based web-server planned

AxParafit: Sequential Performance
[Figure]

AxParafit: Parallel Performance
[Figure]

The ML Benchmark: A Current Community Project
● A standardized way to test ML search programs is required
● Web-server with real-world alignments and performance data at the Swiss Institute of Bioinformatics
● Many developers of popular ML programs are involved:
  ● Stephane Guindon (PHYML), Montpellier
  ● Simon Whelan (LeaPhy), Manchester
  ● Bui Quang Minh (IQPNNI), Vienna
  ● Derrick Zwickl (GARLI), Virginia
  ● Thomas Keane (dprML), Cambridge
● Byproduct: a SPEC-like CPU benchmark for phylogenetics
● Follow-up (planned): ML competition at a major conference with an industrial sponsor

A Current Problem:
Handling Multi-Gene Alignments
[Figure: alignment of Gene 1 and Gene 2 for Sequence 1 to Sequence 5, with missing data]
Missing data ≠ gap data

A Multi-Gene Model
[Figures: a tree with separate (red and yellow) branch lengths per gene]
LogLH(T) = LogLH(T | Red) + LogLH(T | Yellow)
Challenge: devise efficient data structures for this.

Why are Individual Branches per Gene a Challenge?
[Figures]

Outlook
● Tree of Life
● What is a good alignment in a phylogenetic context?
● Simultaneous alignment and tree building
● More HPC & memory-aware programming
● Multi-core architectures
● Models for “gappy” multi-gene alignments

Acknowledgements
● BlueGene Project: Michael Ott, TUM; Srinivas Aluru, Jaroslaw Zola, Iowa State; Dan Janies, Andrew Johnson, Ohio State
● IBM CELL & Playstation: Filip Blagojevic, Dimitris Nikolopoulos, Virginia Tech; Christos Antonopoulos, Univ. of Thessaly
● Bootstopping: Bernard Moret, Masoud Alipour, EPFL; Olaf Bininda-Emonds, Univ. Jena
● RAxML Web-Server: Jacques Rougemont, SIB; Terri Liebowitz, SDSC
● AxParafit/AxPcoords: Markus Goeker, Alexander Auch, Jan Meier-Kolthoff, University of Tuebingen
● Datasets for Studies: Jun Inoue (Florida), Nicolas Salamin (Lausanne), Marc Gottschling (Berlin), Guido Grimm (Tuebingen), Nikos Poulakakis (Yale), Usman Roshan (NJIT)

Thank you for your attention!
[Photo: Lake Geneva, Switzerland]