Drug Design & Delivery: The role of e-Science
Jeremy Frey
School of Chemistry, University of Southampton, UK
Comb-e-Chem, Sept 2004
[Title-slide image labels: X-ray; single Mol; Raman; STM; Ocean; Monolayer]

e-Science
• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
• ‘e-Science will change the dynamic of the way science is undertaken.’ John Taylor, DG of UK OST
• ‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ Tony Blair, 2002

Jeremy G. Frey

The UK e-Science Challenge
• £120M over a 3 Year Programme to create the next generation IT infrastructure to support e-Science and Business
• Essential that UK plays a leading role in Global Grid development with the USA and EU
• Phase 1: Started roll out of plan for Grid Research, Development and Support of e-Science Pilot Projects

UK e-Science Grid
[Figure labels: Edinburgh; Glasgow; DL; Belfast; Newcastle; Manchester; Cambridge; Oxford; Cardiff; RAL; London; Southampton; Hinxton]

National e-Science Centre (NeSC)
• NeSC is in Edinburgh
• Provides Courses & Meetings
• Also has some funding for fellowships to visit NeSC

The Collaboratory Concept
• In 1989, William Wulf, then with the U.S. National Science Foundation, defined a collaboratory as "a center without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries."

The Current “Client – Server ad hoc” model
[Diagram labels: Scientist; HPC; Experiment; Analysis; Storage; Computing]

The Future
The Grid Model - Information Utilities
[Diagram labels: Scientist; MIDDLEWARE; Experiment; Analysis; Computing; Storage]

Access Grid
• Full multi-site video conferencing over the IP network
• Many sites now in the UK, all running the same system
• System originated in the USA, so there are also sites there.

Access Grid nodes

Access Grid

The Grid
• Grid is needed because
  – Volume of data (real time data, images, video)
  – Scale of computation (analysis, simulation)
  – Complexity of process (automation)
  – Variable demands on computation
  – Provenance (audit trails, timestamps, process)

Comb-e-Chem Partners
[Diagram labels: IBM; IT Innovation; NCS; CCDC; ECS; Chemistry; Stats; Combi Centre; Pfizer; Bristol Chemistry; GSK; Southampton; AZ; IUPAC; RSC]

CombeChem People & Places
[Figure labels: AZ; GSK; Pfizer; IBM]

People
• Chemistry (Southampton & Bristol)
  – Mike Hursthouse, Chris Frampton, Jon Essex, Jeremy Frey, Guy Orpen, Stephan Christensen, Thomas Gelbrich, Sam Peppe, Hongchen Fu, Graham Tizard, Suzanna Ward, Lefteris Danos
• National Crystallography Service (NCS)
  – Simon Coles, Mark Light, Ann Bingham
• Electronics and Computer Science (Southampton)
  – Dave De Roure, Luc Moreau, Mike Luck, Hugo Mills, Graham Smith, Simon Miles, Nicky Harding, Gareth Hughes, monica Schraefel, Terry Payne
• IT Innovation (Southampton)
  – Mike Surridge, Ken Meacham, Steve Taylor, Daren Marvin
• Statistics (Southampton)
  – Alan Welsh, Sue Lewis, Ralph Manson, Dave Woods
• Rutherford Appleton Laboratory

All steps must be Grid Aware
[Diagram labels: Goal; Design; Plan; Synthesis; Structure; Properties; Modelling; Prediction; Analysis & Correlation; Dissemination]
I will illustrate the application of e-Science to some of these stages using examples from the Comb-e-Chem Project

All steps must be Grid Aware
[Diagram labels: Goal; Design; Plan; Synthesis; Structure; Properties; Modelling; Prediction; Analysis & Correlation; Dissemination]
[Example labels: Salt Selection; Descriptors; Smart Lab; Semantic Grid; Combinatorial Chemistry; Crystallography; Simulations; Non-linear optical effects; Publication@Source; Structural Similarities]
With examples…

The Comb-e-Chem Project
• The exponential world of Combinatorial Synthesis and High throughput analysis meets the exponentially growing power of computing
• Funding: EPSRC, IBM, GSK, AZ, Southampton

The Comb-e-Chem Vision
[Diagram labels: Structure + Properties; Knowledge + Prediction; Structures DB; Automation & Remote interaction; Properties DB; Simulation and calculation]

Co-Laboratory
Interaction between users & “Dark Labs”
[Diagram labels: Properties; Models; Design; Experiment; Automation; Structures; Analysis]

All about Automation
• Design
• Synthesis
• Measurement
[Labels: Experiments; Information & Knowledge]
• Analysis
• Databases
• Agents

Smart Laboratory
[Diagram labels: Goal; Knowledge; Literature; Plan & COSHH; Report; Information Integration; Digital Model; Analysis; Synthesis]

Smart Laboratory
not just one laboratory but many co-laboratories working together
[Diagram labels: Goal; Knowledge; Literature; Plan & COSHH; Digital Model; Report; Information Integration; Analysis; Synthesis]

Making best use of the Plan
[Figure label: COSHH]

Smart Lab
http://smarttea.org

Smart Help
http://smarttea.org

Laboratory Context
[Diagram labels: COSHH; Plan; Record; Annotation; Guide; Experimenters]

Digital Context
Chemistry Starts in the Lab
[Diagram labels: Lab (×3); NCS; Raw data; Database; Publication; Structure; each linked by URI]

Semantic Grid Project
• Inference based on the semantics
• Importance of Ontology
• But problem of contradictions even within a domain
• This is not an avoidable issue

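The point about contradictions within a domain can be illustrated with a small sketch. It assumes statements are held as RDF-style (subject, predicate, object) triples tagged with a source, and that some predicates are declared single-valued. The compound identifier, predicate names, laboratory names and values are all invented for illustration and do not come from the Comb-e-Chem project.

```python
from collections import defaultdict

# Hypothetical statements about one compound from two sources.
# Each entry is (subject, predicate, object, source); all values are made up.
statements = [
    ("compound:1", "hasMeltingPointC", "122", "lab-A"),
    ("compound:1", "hasMeltingPointC", "131", "lab-B"),
    ("compound:1", "hasFormula", "C7H6O2", "lab-A"),
    ("compound:1", "hasFormula", "C7H6O2", "lab-B"),
]

# Assumption: the ontology says these predicates take exactly one value.
SINGLE_VALUED = {"hasMeltingPointC", "hasFormula"}


def find_contradictions(triples, single_valued):
    """Return {(subject, predicate): [objects]} where sources disagree."""
    seen = defaultdict(set)
    for subject, predicate, obj, _source in triples:
        if predicate in single_valued:
            seen[(subject, predicate)].add(obj)
    return {key: sorted(objs) for key, objs in seen.items() if len(objs) > 1}


if __name__ == "__main__":
    for (subject, predicate), objs in find_contradictions(
        statements, SINGLE_VALUED
    ).items():
        print(f"{subject} {predicate}: conflicting values {objs}")
```

A check like this only detects disagreement; deciding which source to trust still needs provenance information about each statement.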
XML
But need more general descriptions for services
RDF – Resource Description Framework
DAML-S (for describing services)
[Diagram labels: Interface; Simulation program; Gaussian ab initio program; XML wrapper (×2); Personal Agent]

Databases
• Databases will become the key method of handling all data
• Metadata must be generated at inception and added as data traverses the workflow
• Version control, audit and backup handled at the database level.

Talk
• The UK e-Science Programme
• The Comb-e-Chem Project
• “Smart Lab”
• NCS Grid Service
• Structure Analysis Services
• Dissemination & Publication

Centralised remote equipment, multiple users, few experts
[Diagram labels: Users (×3); Data & control links; Experiment (×2); Expert; Access Grid links; Remote (Dark) Laboratory]
• Model for National Crystallography Service (NCS)

Expert is the central resource in short supply
[Diagram labels: Expert; Users (×3); Experiment (×3); Access Grid & control links; Manufacturer Support Service; Local link; “External” link; Raman Project]
• Model for Combinatorial Smart Labs

[Diagram labels: Synthesis; Sample; NCS; Archive; Raw images; Processed diffraction pattern; Structure; CCDC; Validation; Database; CIF metadata; Automated structure determination; Journal]

Archiving of Data
RAW DATA:
• Automatic archiving and retrieval with Atlas Datastore (RAL)
• Development of schema for retrieval of crystallographic metadata from relational databases (ISIS Data analysis group)
• Storage Resource Broker (SRB): Uniform access interface to different types of storage devices
RESULTS DATA:
• Automatic deposition of CIF data with CCDC
• Grid-enabled pre-deposition database

Data Trail
• Drill down through the analysis path
• Look at increasingly raw data
• Often large expansion in quantity and variety at each stage

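The data trail idea can be sketched in a few lines. This is a minimal illustration, assuming each stage of the analysis path is stored as a record that carries a timestamp created at inception and a link to the record it was derived from. The `Record` class, the `data_trail` function and the payload strings are hypothetical; only the stage names are taken from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Record:
    """One stage of the analysis path, with metadata set when it is created."""
    stage: str
    payload: str  # placeholder for the real data held at this stage
    derived_from: Optional["Record"] = None
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def data_trail(record: Optional[Record]) -> List[str]:
    """Drill down from a result towards the raw data, most processed first."""
    trail = []
    while record is not None:
        trail.append(record.stage)
        record = record.derived_from
    return trail


# Hypothetical crystallography chain; payloads are illustrative strings only.
raw = Record("raw images", "detector frames")
pattern = Record("processed diffraction pattern", "reflection list", raw)
structure = Record("structure", "atomic coordinates", pattern)
cif = Record("CIF", "deposited result", structure)

if __name__ == "__main__":
    for depth, stage in enumerate(data_trail(cif)):
        print("  " * depth + stage)
```

Keeping the link inside each record means a reader who starts from the published result can always reach the raw data without a separate index.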
Publication@Source
• Must be able to track back to the original data
• Primary reason is to allow new analysis in the future by other researchers.
• In a university environment this may be viewed as a public responsibility; in a business environment, as ensuring maximum value from investment.
• Does have implications for provenance and even fraud!

Journals: Publication @ source
[Diagram labels: Database; Journal (×2); Paper; Materials; Laboratory; Data; Multimedia; “Full” record]

Publication Chain
[Diagram labels: Bibliography; Student; Journal; Professional Body; Archive; Institution; Laboratory]

e-Bank Project
• Link comb-e-chem and other semantic grid science projects to the e-print system at Southampton
• Provide dissemination and provenance

Changing the way we work
[Diagram labels: E-Lab: X-Ray Crystallography; E-Lab: Combinatorial Synthesis; E-Lab: Properties Measurement; Samples (×2); Laboratory Processes (×3); Structures DB; Properties DB; Quantum Mechanical Analysis; Properties Prediction; Data Mining, QSAR, etc; Design of Experiment; Data Provenance; Authorship/Submission; Data Streaming; Visualisation; Agent Assistant]