National e-Science Centre Local Developments

Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
ros@dcs.gla.ac.uk

Dr Dave Berry
Research Manager, National e-Science Centre
University of Edinburgh
daveb@nesc.ac.uk

5th February 2004

Overview
• NeSC Role in UK e-Science
• NeSC Edinburgh developments
  – e-Science Institute
  – Infrastructure/set-up
  – Projects
  – Plans
• NeSC Glasgow developments
  – Infrastructure/set-up
  – Projects
  – Plans
• Conclusions

NeSC’s Role
• Help coordinate and lead the UK e-Science Programme
  – Community building activities, regional support & outreach
  – Grid building as a member of the Engineering Task Force
  – Skill building through training events & support centre
• Help establish the UK’s international role
  – International meetings, standardisation work & presentations
• Undertake R&D projects
  – To deliver reliable middleware
  – To engage industry
  – To stimulate the uptake of e-Science technology and methods
• Run the e-Science Institute
  – Knowledge building through workshops and conferences
  – Research visitors and events

NeSC at Edinburgh: Recent Developments
• Globus Alliance
• Digital Curation Centre
  – Edinburgh, Glasgow, UKOLN, CCLRC
• New e-Science Lecturer (Particle Physics)
• Training Team
  – PPARC and EGEE funding
  – Manager + 4 trainers
  – Europe-wide role
• DAI Two (Extension of OGSA-DAI)
• OGSA Test Grid

Digital Curation Centre
[Diagram labels: communities of practice: users; curation organisations; community support & outreach; Collaborative Associates Network of Data Organisations; services; management & coordination; research; research collaborators; development; testbeds & tools; Industry; standards bodies]

e-Science Institute
• A meeting place
• The focus for presenting UK e-Science
• Visiting researchers
  – Collaborate in our research and development
  – Engage in and develop our event programme
  – Build bridges with their community
  – Visits last between one week and six months
• Research-oriented event programme
  – e-Science research topics
  – Training to
    e-Science research teams

eSI Workshops
• Space for real work
• Crossing communities
• Creativity: new strategies and solutions
• Written reports
• Workshop topics (suggestions always welcome!)
  – Scientific Data Mining, Integration and Visualisation
  – Grid Information Systems
  – Portals and Portlets
  – Virtual Observatory as a Data Grid
  – Imaging, Medical Analysis and Grid Environments
  – Open Issues in Grid Scheduling
  – Data Provenance & Annotation
  – e-Science Workflow Services
  – GeoSciences & Scottish Bioinformatics Forum
• http://www.nesc.ac.uk/events/

Projects
• OGSA-DAI/DAIT, MS.NETGrid, SunDCG, GridWeaver, BRIDGES, PGPGrid, FirstDIG, ODD-Genes
• EGEE, NextGrid
• OGSA Test Grid, IBM Early Evaluation
• edikt
• Publishing Scientific Data
• GridPP, AstroGrid, QCDGrid, RealityGrid Portal
• Biological Spatio-Temporal Databases
• CoAKTinG, Grid-enabled Modelling Tools and Databases for Neuroinformatics, e-Diamond
• Dynamic Configuration of Grid Fabrics, Dependable Grid Services, Deductive Synthesis Techniques, Inferring QoS Properties for Grid Applications, Mobile Resource Guarantees
• TIES, TIES-II

The Virtual Observatory
• International Virtual Observatory Alliance
  – UK, Australia, EU, China, Canada, Italy, Germany, Japan, Korea, US, Russia, France, India
• How to integrate many multi-TB collections of heterogeneous data distributed globally?
• Sociological and technological challenges to be met

Data Services
• GGF Data Access and Integration Svcs (DAIS)
  – OGSI-compliant interfaces to access relational and XML databases
  – Needs to be generalized to encompass other data sources (see next slide…)
• Generalized DAIS becomes the foundation for:
  – Replication: Data located in multiple locations
  – Federation: Composition of multiple sources
  – Provenance: How was data generated?

Data Access & Integration Services
[Diagram: Client, Registry, Factory, Grid Data Service, XML / Relational database; legend: SOAP/HTTP, service creation, API interactions]
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML

edikt
[Diagram labels: Standards; Requirements analysis; Technology matchmaking; E-Science Apps; CS Research; Edikt project; Gap filling; Grid Services for e-Science Data Management; Rigorous engineering; Commercial SW components and skills]
• The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
• SHEFC funded research and development grant
  – 3 years funding: May 2002 – 2005
  – +3 years funding upon successful project and review

ELDAS – Data Access Service
• ELDAS runs anywhere
• Suitable for grid & web
• Implemented using Enterprise Java Beans
• Data Access Components interface to distinct DBMSs
• Accessible as a grid data service or a web data service
[Diagram labels: Grid User1; Grid User2; Web User1; Grid Proxy; Web Servlet; ELDAS; Java Framework; EJB - DAS; DAC (four instances); Xindice DB; MySQL DB; DB2 DB; Oracle 9i DB]

BinX – accessing legacy binary data simulations
• The Problem:
  – Many binary data files
  – Applications must “know” the data format
  – Binary data formats are machine-specific
• The Solution:
  – Write a “stand-aside” format description in XML
  – Provide a library to
    » Interpret the description
    » Provide file access across different machines
  – Build higher-level services
[Diagram labels: Binary Data File (three instances); BinX file describes binary file structure; BinX Library; e-Science Application]

NeSC at Glasgow E-Science Hub
Externally
• Glasgow end of NeSC
  – Involved in UK wide activities
    » ETF: In May 2003 became first UK e-Science Centre to run integration tests across every site of the UK (Level 2) Grid.
      Therefore 100% access to UK Grid resources at this time
  – Public visibility of NeSC
    » responsible for NeSC web site
Internally
• Focal point for e-Science research/activities at Glasgow
• Work closely with foundation departments
  – Department of Computing Science
  – Department of Physics & Astronomy
• Also working closely with other groups including
  – Bioinformatics Research Centre
  – Electronics and Electrical Engineering
  – Biostatistics, …

Glasgow e-Science Investment
• Major investment by university
  – 230 m² of newly renovated floor space in Kelvin Building
    » offices
    » access grid facility
    » training room – equipped with 20 PCs/server for training courses
  – Funding Technical Director

Resource Consolidation at Glasgow
• Building around ScotGrid
• Providing shared Grid resource for wide variety of scientists inside/outside Glasgow
  – Particle physicists, computer scientists, electronic engineers, bioinformaticians, …
• Focal point, knowledge pool, primary resource for e-Science activity at Glasgow

Shared Resources:
• Disk ~15TB
• CPU ~330 1GHz
• Target shares
  – 60% PP, 20% Bioinf, 20% open share…
[Chart labels: CDF, BIO, LHC]

Hardware
• 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
• 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory
• 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 + 1000 Mbit/s ethernet
• 1TB disk
• LTO/Ultrium Tape Library
• Cisco ethernet switches
• IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
• 5TB FastT500 disk 70 x 73.4 GB IBM FC Hot-Swap HDD
• eDIKT: 28 IBM blades dual 2.4 GHz Xeon with 1.5GB memory
• eDIKT: 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5GB memory
• CDF: 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5GB memory
• CDF: 7.5TB Raid disk

Projects with NeSC Glasgow Involvement
• DCC: National Digital Curation Centre
• AMUSE: Autonomous Management of Ubiquitous Systems for e-Health
• P2Popt: Performance measurement & mgt of 2-Layer Peer to Peer NWs…
• PGPGrid: Peppers Ghost Productions
• Equator: Environmental e-Science Interdisciplinary Research Project
• BPS: Biochemical Pathway Simulator
• BRIDGES

Overview of BRIDGES
• Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
  – NeSC (Edinburgh and Glasgow) and IBM
  – 2 year project started 1st October 2003
• Supporting project for CFG project
  – Generating data on hypertension
  – Rat, Mouse, Human genome databases
• Variety of tools used
  – BLAST, FASTA, MPsrch, BLAT, Gene Prediction, visualisation, …
• Variety of data sources and formats
  – Microarray data, genome DBs, project partner research data, medical records, …
• Aim is integrated infrastructure supporting
  – Data federation
  – Security

CFG Partner Distribution
[Diagram: partner sites Glasgow, Edinburgh, Leicester, Oxford, Netherlands and London, each holding private data, alongside shared data and public curated data]

Problems specific to BioCommunity
• DBs growing exponentially!!!
• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT…)
• Molecular Classifications (SCOP, CATH, …)
• Motif Libraries (PROSITE, Blocks, …)
• …
[Chart: PDB Content Growth]

More genomes …...
[Figure labels: Yersinia pestis; Arabidopsis thaliana; Buchnera sp. APS; Aquifex aeolicus; Caenorhabditis elegans; Campylobacter jejuni; Chlamydia pneumoniae; Helicobacter pylori; Mycobacterium leprae; rat; mouse; Archaeoglobus fulgidus; Borrelia burgdorferi; Mycobacterium tuberculosis; Vibrio cholerae; Drosophila melanogaster; Escherichia coli; Thermoplasma acidophilum; Neisseria meningitidis Z2491; Plasmodium falciparum; Pseudomonas aeruginosa; Ureaplasma urealyticum; Rickettsia prowazekii; Saccharomyces cerevisiae; Salmonella enterica; Bacillus subtilis; Thermotoga maritima; Xylella fastidiosa]

Complexity of Biological Data
[Diagram labels: Organisms; Physiology; Tissues; Protein-protein interaction (pathways); Protein Structures; Gene expressions; Nucleotide structures]

BRIDGES Data Integration/Federation
• Local repository being developed
  – Populated with data that cannot be federated, e.g.
    public data sets with no programmatic interface
  – Shared data sets of CFG scientists
• Security through
  – X.509 PKI (authentication)
  – PERMIS (authorisation)
• Will make use of e-Science technologies (OGSA-DAI/DAIT, ELDAS, IBM’s DiscoveryLink)
  – Automatically keep data fresh/updated
• Web (Grid) services offered that allow use of these local data sets
  – For example for visualising, searching, querying, …
• Example usage scenario …

System Usage Scenario
[Diagram labels: Smith W SV; Authorisation; Java App downloaded (via WebStart); Per user, per site; DL; Secure Data Repository; Remote data in Oracle, DB2, Sybase, Excel, flat files, XML...; BLAST results input to DB wrappers; Shared/Private Data Sets; Personalised Services; Generic services used by other projects; Up to date; OGSA-DAI; Browser based clients…; Client Site X; Secure access for CFG VO; BRIDGES Portal; Push relevant data onto ScotGrid for BLAST’ing]

Conclusions
• NeSC continues to provide leadership in UK e-Science
  – Difficult with multitude of scientific research areas, heterogeneity of systems and fluidity of technologies, GT2, GT3, WSRF, GT4…?
• Closer working with GridPP beneficial for everyone
  – move towards Production Grid
  – ScotGrid a good model for co-operation
• Planning for soft landing through diversification and more integration into university
  – MRC bids, BBSRC bids, EPSRC bids, …
  – UK e-Science operating as community for upcoming DTI funding opportunities
  – Plans for developing Grid Computing teaching modules as part of advanced MSc

Website
• National e-Science Centre http://www.nesc.ac.uk/
  – Mission, Background, Foundation
  – Locations, Staff, Resources, Projects
  – Register interest, Mailing lists, NeSCForge
  – Regional associations and Collaborations
  – News, Notices
• Presentations & Lectures http://www.nesc.ac.uk/presentations/
• e-Science Institute http://www.nesc.ac.uk/esi/
  – Mission, Events (Future and Past)
  – Register for Events, Visitor Programme
• UK e-Science Map and Index of Centres http://www.nesc.ac.uk/centres/
• Technical Papers http://www.nesc.ac.uk/technical_papers/
• Index of >100 Projects http://www.nesc.ac.uk/projects/
• Task Forces http://www.nesc.ac.uk/teams/
• General Information
  – Glossary, Bibliography, Who’s who
  – E-Science job vacancies

Questions…?