How EPCC can help with computationally demanding code and large data sets
MVM Research Symposium, 3rd November 2008
Eilidh Grant
Applications Consultant, EPCC
egrant1@epcc.ac.uk
+44 131 650 5115

Overview
• What is EPCC?
• Sample projects
  – Modelling gene regulatory networks
  – Storing large amounts of microscopy data
• How to get involved

How EPCC can help – 3rd November 2008

What is EPCC?
• High performance computing
• Develop software for all areas of science and technology
• 80 expert staff
• 2 national UK supercomputers: HECToR and HPCx

Biology projects
• Genetic markers of colorectal cancer
• Circadian rhythm of Arabidopsis thaliana
• Gene regulatory networks
• Parallel R for the statistical analysis of microarray data
• A data grid for cell biology
• Evolution of virulence in bird flu

Gene regulatory networks
• Aim is to statistically analyse various biological data (microarrays, proteomics) to produce a Bayesian network model of the underlying gene regulatory network.
• A desktop computer can only handle small networks of genes.
• EPCC are investigating how to get this code, originally written in Matlab, running on HPCx.

Gene regulatory networks
CSBE Research Fellow Marco Grzegorczyk said: "The implementation of the Metropolis-coupled Markov chain Monte Carlo algorithms [on HPCx] will allow us to infer more interesting networks with much more genes."
– Centre for Systems Biology at Edinburgh

Data grid
• Aim is to help scientists to share their data:
  – in a simple, secure, and effective manner.
• Microscope image data from cell biology:
  – though any (experimental) data could be relevant.
• Working with researchers in Oxford and Edinburgh:
  – Davis and Finnegan groups studying early development of the fruit fly Drosophila melanogaster.

Data grid
• EPCC have developed a technology called DiGS (Distributed Grid Storage).
• This data grid software provides:
  – multi-terabyte, distributed storage capability.
  – automated data replication and backup.
  – periodic validation and consistency checking of data.
• Supports annotation of experiments with scientific metadata:
  – assisting with data provenance.
  – allowing fast and efficient searches across large volumes of data.

How to get involved
• Speak to us.
• Collaborative research through a joint funding bid.
• HECToR: dCSE grants are available for 6-12 months of effort and 10,000s of CPU hours.
• MSc projects or co-supervision of a PhD.
• HPCx: for the next year there will be time available on HPCx, with some developer support.

Contact Details
• Please get in touch to discuss any projects that could benefit from working with EPCC.
• http://www.epcc.ed.ac.uk
• Contact EPCC at EPCC-Support@ed.ac.uk