UK e-Science
OGC Technical Committee, Edinburgh
Malcolm Atkinson
Director & e-Science Envoy
e-Science Institute
www.nesc.ac.uk
28th June 2006

Overview
• Brief History
  – E-Science, Grids & Service-oriented Architectures
• (Geo)Data Deluge
  – Causes of Growth
  – Interpretational challenges
• Crucial Issues
  – Usability & Abstraction
  – Interoperation & Federations

What is e-Science?
• Goal: to enable better research in all disciplines
• Method: Develop collaboration supported by advanced distributed computation
  – to generate, curate and analyse rich data resources
    – From experiments, observations and simulations
    – Quality management, preservation and reliable evidence
  – to develop and explore models and simulations
    – Computation and data at all scales
    – Trustworthy, economic, timely and relevant results
  – to enable dynamic distributed collaboration
    – Facilitating collaboration with information and resource sharing
    – Security, trust, reliability, accountability, manageability and agility

A Grid Computing Timeline
[Figure: timeline, 1995 to 2006; milestone labels not recoverable]
• UK e-Science program starts
• DARPA funds Globus Toolkit & Legion
• EU funds UNICORE project
• US DoE pioneers grids for scientific research
• NSF funds National Technology Grid
• NASA starts Information Power Grid
• Japan government funds:
  – Business Grid project
  – NAREGI project
• Today:
  – Grid solutions are common for HPC
  – Grid-based business solutions are becoming common
  – Required technologies & standards are evolving
Source: Hiro Kishimoto GGF17 Keynote May 2006

What is a Grid?
• A grid is a system consisting of
  – Distributed but connected resources and
  – Software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives
[Figure: example resources: License, Web server, Handheld, Server, Workstation, Database, Supercomputer, Cluster, Data Center, Printer]
R2AD

Grid & Related Paradigms
• Distributed Computing
  – Loosely coupled
  – Heterogeneous
  – Single Administration
• Grid Computing
  – Large scale
  – Cross-organizational
  – Geographical distribution
  – Distributed Management
• Utility Computing
  – Computing “services”
  – No knowledge of provider
  – Enabled by grid technology
• Cluster
  – Tightly coupled
  – Homogeneous
  – Cooperative working

How Are Grids Used?
• High-performance computing
• Collaborative design
• E-Business
• High-energy physics
• Financial modeling
• Data center automation
• Drug discovery
• Life sciences
• E-Science
• Collaborative data-sharing

Commitment to e-Infrastructure
• A shared resource
  – That enables science, research, engineering, medicine, industry, …
• It will improve UK / European / … productivity
  – Lisbon Accord 2000
  – e-Science Vision SR2000 – John Taylor
• Commitment by UK government
  – Sections 2.23-2.25
• Always there
  – c.f.
    telephones, transport, power

UK e-Science Budget (2001-2006)
Total: £213M + £100M via JISC

Council   Amount    Share
MRC       £21.1M    10%
EPSRC     £77.7M    37%
BBSRC     £18M       8%
NERC      £15M       7%
PPARC     £57.6M    27%
CLRC      £10M       5%
ESRC      £13.6M     6%

EPSRC Breakdown
Applied   £35M      45%
HPC       £11.5M    15%
Core      £31.2M    40%

Staff costs only
Grid Resources, Computers & Network funded separately
+ Industrial Contributions £25M
Source: Science Budget 2003/4 – 2005/6, DTI(OST)
Slide from Steve Newhouse

The e-Science On The Map Today
[Figure: map of funded centres]
• Globus Apache Project & CDIG
• National Centre for e-Social Science
• NERC e-Science Centre
• e-Science Institute
• National Grid Service
• Digital Curation Centre
• NGS Support Centre
• CeSC (Cambridge)
• OMII-UK
• EGEE-II
• National Institute for Environmental e-Science

Invest in People
• Training
  – Targeted
  – Immediate goals
  – Specific skills
  – Building a workforce
• Education
  – Pervasive
  – Long term and sustained
  – Generic conceptual models
  – Developing a culture
• Both are needed
[Figure: cycle diagram. Nodes: Society, Organisation, Training, Education, Skilled Workers, Graduates, Services & Applications, Innovation. Arrow labels: Invests, Prepares, Strengthens, Develop, Enriches, Create]
INFSO-SSA-26637, 25 May 2006

Compound Causes of (Geo)Data Growth
• Faster devices
• Cheaper devices
• Higher-resolution
  – all ~ Moore’s law
• Increased processor throughput
  – ⇒ more derived data
• Cheaper & higher-volume storage
• Remote data more accessible
  – Public policy to make research data available
  – Bandwidth increases
  – Latency doesn’t get less though

Interpretational Challenges
• Finding & Accessing data
  – Variety of mechanisms & policies
• Interpreting data
  – Variety of forms, value systems & ontologies
• Independent provision & ownership
  – Autonomous changes in availability, form, policy, …
• Processing data
  – Understanding how it may be related
  – Devising models that expose the relationships
• Presenting results
  – Humans need either
    – Derived small volumes of statistics
    – Visualisations

Collaboration
• Collaboration is a Key Issue
  – Multi-disciplinary
  – Multi-national
  – Academia & industry
• Trustworthy data sharing key for collaboration
  – Plenty of opportunities for research and innovation
  – Establish common frameworks where possible
    – Islands of stability – reference points for data integration
  – Establish international standards and cooperative behaviour
    – Extend incrementally
• Trustworthy code & service sharing also key

Federation
• Federation is a Key Issue
  – Multi-organisation
  – Multi-purpose
  – Multi-national
  – Academia & industry
• Build shared standards and ontologies
  – Require immense effort
  – Require critical mass of adoption
• Trustworthy code & e-Infrastructure sharing
  – Economic & social necessity

Major Intellectual Challenges
• Require
  – Many approaches to be integrated
  – Many minds engaged
  – Many years of effort
• Using the Systems
  – Requires well-tuned models
  – Well-tuned relationships between systems & people
  – Flexibility, adaptability & agility
• Enabling this
  – Is itself a major intellectual challenge