Evolving inversion methods in Geophysics with Cloud Computing – a case study of an eScience collaboration

Mudge, Chandrasekhar, Heinson, Thiel

Prof J Craig Mudge FTSE
University of Adelaide, Australia
School of Computer Science / School of Earth Sciences
7th IEEE eScience Conference, Stockholm, December 2011

Two South Australian successes in geology
1. Hot rocks for geothermal energy - 95% of investment is in South Australia
2. Olympic Dam - BHP Billiton - world's fourth largest copper deposit, fifth largest gold deposit and the largest uranium deposit

craig.mudge@adelaide.edu.au IEEE eScience 2011

Outline
1. Cloud computing
2. Collaborative Cloud Computing Lab (C3L)
3. Inversion in magnetotelluric processing
4. Geothermal – EGS in South Australia
5. Results and lessons learned
6. Future work

Cloud computing
The cloud service provider owns and operates the infrastructure and innovates to keep the technology leading edge, handle software upgrades, and steadily reduce energy costs.
[Figures: Google, The Dalles, Oregon; Microsoft Azure, Chicago]

Massive scale of data centres delivers 4-7X cost reduction and energy efficiency
[Figure: air flow]

A no-machines Lab
eScience enabled by cloud computing
Seed funding from
- Department of Mines, www.pir.sa.gov.au
- MSFT Research Jim Gray Seed Grant
Started June 2010

Our three cloud service providers
1. Amazon Web Services
2. Microsoft Azure
Now adding government-funded eResearch clouds, which will run OpenStack (NASA and Rackspace)

Magnetotelluric (MT) imaging
1. Using the magnetic and electric fields of the earth, MT imaging determines the resistivity structure of a sub-surface area of interest.
2. It goes deeper (a hundred or so km) than seismic (<2 km) but does not have the same resolution.
3. Applications
   1. mineral exploration
   2. water management in mining
   3. geothermal exploration
   4. carbon storage
   5. aquifer research and management
   6. earthquake and volcano studies
(Heinson and Mudge, 2010)
[Figure: CO2 in depleted gas field]

Electrical resistivity

Electromagnetic methods

Data logging by University of Adelaide Geophysics on a geothermal site – Paralana, SA, Australia

MT processing steps
[Figure: MT processing steps; labelled step: Inversion]

Searching the solution space
Inversion iterations: compute model response, compare with observed data
[Flowchart, nodes in reading order:]
- start
- compute sensitivity matrix
- compute model's MT response
- locally improve model misfit
- compare model response to observed data
- decision: can locally improve misfit? (yes/no)
- decision: required misfit? (yes/no)
- locally improve model smoothness
- decision: can locally improve smoothness? (yes/no)
- decision: smooth enough? (yes/no)
- decision: > max iterations? (yes/no)
- finish

Setting up a new inversion – part 1

Setting up a new inversion – part 2

Dashboard

Results and Lessons learned

Speedup
[Figure: Sequential vs Parallel]

Performance analysis beyond speedup
[Figure: Sequential vs Parallel]
Examples of recent performance analysis
1. The effect of the FORTRAN compiler and its optimisation settings has been worth exploring: a 3X speedup from the Intel Visual Fortran Composer XE 2011 for Windows.
2. "Steal time": time lost to the hypervisor's management of a virtual machine. Netflix have analysed their Amazon experience extensively.

Results and learnings
1. "No-machines" works
2. Speedup has led to 100% adoption in MT research
3. First results of monitoring fluid injection in EGS reservoirs using magnetotellurics (MT) – promising, since seismic does not indicate fluid flow and MT is low cost
4. Taking chunks of FORTRAN is achievable in a timely manner
5. Capability building – a true eScience partnership
6. Our Web Services user interactions took the same amount of programming effort as parallelising

eScience in the cloud - observations of a veteran of the computer industry (but not of my co-authors on this eScience paper)
1. Web Services (giving interoperability between disparate services, of historic proportion) could have been adopted faster in eScience

[Two figure slides, both credited (Mudge, 2002)]

eScience in the cloud - observations of a veteran of the computer industry (but not of my co-authors on this eScience paper)
1. Web Services (giving interoperability between disparate services, of historic proportion) could have been adopted faster in eScience
2. Cloud computing will speed up the use of web services, because cloud makes it natural to interact using web services (service orientation, discovery, interoperability)

Lessons learned – HPC programming
1. MapReduce (Hadoop) is the programming model that best matches the data centre as the computer. However, because it requires a rewrite of existing programs, the first wave of benefits comes from simpler parallelism – parameter sweeps, Monte Carlo simulation, job-level parallelism, etc.
2. The second wave of benefits will be new algorithms and rewrites using MapReduce
3. Nevertheless, the first wave in cloud-based bioinformatics (matching short reads against a reference genome) did use MapReduce

Lessons learned - Azure
1. Why was Azure much harder to migrate to than predicted?
   Answer:
   - We came from a non-.NET environment
   - Azure is younger than Amazon (by 2 years)
   - Virtual Machine in Beta
   - Deployment times of 20 minutes vs 20 seconds slow debugging
   - Azure is designed for long-running applications, e.g., e-commerce, more than for scientific applications
2. However, we persist.
   - Warehouse-sized data centre – operating system is robust and rich, e.g., hot swap of patches
   - Benefits of PaaS

Future work

Future work 1 of 2
1. Inversion on demand, available to colleagues and explorers world-wide, wrapped in workflow (persistence, provenance, partial runs, ...)
2. National/international collaboration building on a national Geophysics Virtual Lab - access to disparate data (seismic, borehole images, gravity, magnetic, ...) built by AuScope using results of the GeoSciML Interoperability Working Group

[Figure: virtual laboratories diagram, credited to Dr Robert Woodcock and Dr Lesley Wyborn. Labels: Societal Need; Sustainable Energy Policy; Environment Virtual Laboratory; Energy Exploration Integrated Virtual Laboratory; Integrated Virtual Labs; Virtual Laboratories (modelling & analytic tools, processing services); Virtual Geophysical Laboratory; National Borehole Laboratory; Virtual Geodesy Laboratory; Virtual Earth Observation Laboratory; Virtual Oceans Laboratory; Virtual Libraries (data): Geophysics, Borehole, Geodesy, Land cover, Marine]

Future work 2 of 2
3. Explore statistical machine learning to detect interesting patterns
4. Explore the solution space using Evolutionary Algorithms implemented on thousands of processors in the cloud (Brad Alexander)
5. Promulgate security best practices
6. Following the success of speedup, model size has become the limiter for our geophysicists

Acknowledgements
- Brad Alexander
- Gordon Bell
- Pinaki Chandrasekhar
- Dennis Gannon
- Graham Heinson
- Tony Hey
- Ed Lazowska
- Stephan Thiel

Summary
1. Cloud computing
2. Collaborative Cloud Computing Lab (C3L)
3. Inversion in magnetotelluric processing
4. Geothermal – EGS in South Australia
5. Lessons learned
6. Future work

Thanks and questions
craig.mudge@adelaide.edu.au
www.cloudinnovation.com.au
+61 417 679 266
+1 650 224 2111

Security best practices
1. Certifications
2. Physical security
3. Secure services
4. Data privacy via encryption
5. Backups
6. Constant monitoring
7. External review
8. Compare yours with Google, Amazon, Azure
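
Sketch: inversion loop and parameter sweep

The loop on the "Searching the solution space" slide (compute the model response, compare it with observed data, improve the model locally, stop at the required misfit or at the iteration limit) and the "first wave" parameter sweep named in the HPC programming lessons can be illustrated together. What follows is a minimal sketch, not the authors' FORTRAN inversion code. It assumes a toy linear forward operator `G` in place of real MT modelling, plain gradient steps with a first-difference smoothness penalty, and a thread pool standing in for independent cloud workers. All function names, parameters and data are hypothetical. The flowchart's separate misfit and smoothness branches are collapsed into a single update here.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def forward(G, model):
    """Toy linear forward operator standing in for the MT response (assumption)."""
    return [sum(g * m for g, m in zip(row, model)) for row in G]


def roughness_gradient(model):
    """Gradient of 0.5 * (sum of squared first differences), the smoothness penalty."""
    grad = [0.0] * len(model)
    for j in range(len(model) - 1):
        diff = model[j + 1] - model[j]
        grad[j] -= diff
        grad[j + 1] += diff
    return grad


def invert(G, observed, smoothing, required_misfit=1e-3, max_iterations=200):
    """Toy inversion: returns the final model and the RMS misfit per iteration."""
    n = len(G[0])
    model = [0.0] * n
    # Step size small enough that an update cannot increase the objective
    # (data misfit plus smoothing-weighted roughness).
    step = 1.0 / (sum(g * g for row in G for g in row) + 4.0 * smoothing)
    history = []
    for _ in range(max_iterations):              # "> max iterations?"
        response = forward(G, model)             # compute model response
        residual = [o - r for o, r in zip(observed, response)]  # compare with data
        misfit = math.sqrt(sum(r * r for r in residual) / len(residual))
        history.append(misfit)
        if misfit <= required_misfit:            # "required misfit?"
            break
        # For a linear problem the sensitivity matrix is G itself.
        data_grad = [sum(G[i][j] * residual[i] for i in range(len(G)))
                     for j in range(n)]
        rough_grad = roughness_gradient(model)
        # Local improvement of misfit and smoothness in one gradient step.
        model = [m + step * (d - smoothing * r)
                 for m, d, r in zip(model, data_grad, rough_grad)]
    return model, history


def sweep(G, observed, smoothing_values, workers=4):
    """Parameter sweep: one independent inversion job per smoothing value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(invert, G, observed, s) for s in smoothing_values]
        return {s: job.result() for s, job in zip(smoothing_values, jobs)}


def synthetic_problem(rows=12, cols=5, seed=0):
    """Hypothetical noise-free test data generated from a smooth 'true' model."""
    rng = random.Random(seed)
    G = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    true_model = [1.0 + j / (cols - 1) for j in range(cols)]
    return G, forward(G, true_model)


if __name__ == "__main__":
    G, observed = synthetic_problem()
    for smoothing, (model, history) in sweep(G, observed, [0.001, 0.01, 0.1]).items():
        print(f"smoothing={smoothing}: {len(history)} iterations, "
              f"final RMS misfit {history[-1]:.3g}")
```

Each job in the sweep is independent, which is why this style of parallelism needs no rewrite of the inversion itself. On a single machine Python threads will not deliver a speedup for this pure-Python loop; the thread pool only shows the job structure that a cloud deployment would spread across separate workers.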