Patterns for E-Research
Dave Berry, Research Manager
E-Research within the University of Edinburgh, 2nd March 2005

E-Research
  “The invention and application of computing methods to extend our capabilities in any research discipline”
  “Research in any discipline which benefits from and often depends on the use of advanced facilities and methods for computation, data curation, digital communication and visualisation”

Technology Growth
  [Figure: performance per dollar spent against number of years (0 to 5)]
  Optical fibre (bits per second): doubling time 9 months, Gilder’s Law (32X in 4 yrs)
  Data storage (bits per sq. inch): doubling time 12 months, Storage Law (16X in 4 yrs)
  Chip capacity (# transistors): doubling time 18 months, Moore’s Law (5X in 4 yrs)
  Source: “Triumph of Light”, Scientific American, George Stix, January 2001

Pattern 1: Distributed Collaboration
  Groups in different sites working together
  Sharing knowledge and ideas
  Technologies:
    Shared repositories
      Wikis, SourceForge/NeSCForge, Forums, …
    Videoconferencing
    Computer Supported Cooperative Work (CSCW)

Technology: Access Grid
  Microphones
  Cameras

Pattern 2: Simulation & Modelling
  Large variety of topics, e.g.
    Protein folding
    Position of atoms in semiconductors
    Human heart
    Ecology of ice sheets
  Multiple scales
  Remote visualisation and control

Example: The TeraGyroid Scientific Experiment
  High-density isosurface of the late-time configuration in a ternary amphiphilic fluid as simulated on a 64^3 lattice by LB3D. Gyroid ordering coexists with defect-rich, sponge-like regions. The dynamical behaviour of such defect-rich systems can only be studied with very large scale simulations, in conjunction with high-performance visualisation and computational steering.
  See http://www.realitygrid.org/workshop-2004/presentations/blake.ppt

Example: Terrestrial Carbon Dynamics

Pattern 3: Data archives
  Data archives maintain data for widespread use, e.g.
    UK Borders, Go-Geo, … (EDINA)
    ArkDB (Roslin)
    Mouse Atlas (HGU)
    EMBL, UniProt, … (EBI)
    Census, … (MIMAS)
  Client-server access
  Schemas defined centrally
    Often subject to change…
    … if they’re defined at all!

Infrastructure: Digital Curation Centre
  [Diagram labels]
  communities of practice: users
  curation organisations, e.g. DPC
  community support & outreach
  Collaborative Associates Network of Data Organisations
  service definition & delivery
  management & admin support
  research
  research collaborators
  development co-ordination
  testbeds & tools
  Industry standards bodies

Pattern 4: Federated data
  Sites maintain their own data
  Remote access to other sites
  Control access to your site
  Integrated views
  Community-defined schemas
  Translation between schemas
  Distributed algorithms
  Run jobs remotely
  Distributed data mining

Example: Mass-scale Data Mining

Pattern 5: Parameter Search
  Run the same algorithm on different data, e.g.
    Finding local minima
    Combinatorial search
  Allows the use of multiple machines, e.g.
    A cluster
    Multiple clusters
    Desktop PCs

Example: ClimatePrediction.net
  See www.climateprediction.net

Composing Patterns
  Patterns that compose…
    Complex problems require many inputs and many processes
    Shared contributions compose indefinitely, accumulating knowledge
  … and how to compose them
    A common infrastructure
      Technologies, naming, schemas, …
    Workflow languages
    Portals and “problem-solving environments”

Example: BRIDGES (BioInformatics)
  [Diagram: the CFG Virtual Organisation]
  Sites, each holding private data: Glasgow, Edinburgh, Oxford, Leicester, Netherlands, London
  Publicly curated data: Ensembl, OMIM, SWISS-PROT, MGI, HUGO, …, RGD
  Components: DATA HUB, Synteny Grid Service, Authorisation

Example: FireGrid (proposal)
  1000s of sensors & gateway processing
  Emergency Responders
  KBS and Planning
  Super-real-time simulation (HPC)
  Maps, models, scenarios
  Mont Blanc, Kings Cross, Piper Alpha, WTC, Kobe

Practical Challenges
  Technical
    A variety of partial answers
    Standardisation work is long and political
  Social
    Sharing of resources means sharing YOUR resources
    Contributor recognition and IPR
    Defining common schemas and ontologies
    Training, funding for software developers and sysadmins
  Responsibility of data publishers
    Cost, dependability, trustworthiness, capability, flexibility, …
  Management of infrastructure
    Operation – NGS (national), ACF (local)
    Funding
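
Sketch: Pattern 5 (Parameter Search)
  Pattern 5 runs the same algorithm on different data and spreads the runs over multiple machines. A minimal sketch of that idea follows, assuming a local thread pool as a stand-in for a cluster or a set of desktop PCs. The names objective and parameter_search, and the quadratic cost function, are hypothetical illustrations and do not come from any of the projects named in these slides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def objective(x: float) -> float:
    """Hypothetical stand-in for one expensive simulation run."""
    return (x - 3.0) ** 2 + 1.0


def parameter_search(
    func: Callable[[float], float],
    candidates: Iterable[float],
    workers: int = 4,
) -> Tuple[float, float]:
    """Evaluate func at every candidate in parallel; return (best_x, best_value)."""
    points: List[float] = list(candidates)
    if not points:
        raise ValueError("no candidate parameters supplied")
    # Every evaluation is independent of the others, so the runs can be
    # handed to any pool of workers. A thread pool is assumed here only
    # to keep the sketch self-contained.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(func, points))  # results keep input order
    # Pick the candidate with the smallest value (ties go to the smaller x).
    best_value, best_x = min(zip(values, points))
    return best_x, best_value


if __name__ == "__main__":
    grid = [i * 0.5 for i in range(13)]  # 0.0, 0.5, ..., 6.0
    print(parameter_search(objective, grid))  # prints (3.0, 1.0)
```

  Because no run depends on another, the same structure should carry over when the pool is replaced by a job scheduler or a volunteer-computing client; only the dispatch step would change.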