Patterns for E-Research
Dave Berry, Research Manager
E-Research within the University of Edinburgh, 2nd March 2005
E-Research
“The invention and application of computing methods to extend our capabilities in any research discipline”
“Research in any discipline which benefits from and often depends on the use of advanced facilities and methods for computation, data curation, digital communication and visualisation”
Technology Growth
[Figure: performance per dollar spent against number of years (0 to 5). Optical fibre (bits per second), Gilder’s Law: 32X in 4 yrs, doubling time 9 months. Data storage (bits per sq. inch), Storage Law: 16X in 4 yrs, doubling time 12 months. Chip capacity (# transistors), Moore’s Law: 5X in 4 yrs, doubling time 18 months.]
Triumph of Light – Scientific American. George Stix, January 2001
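The chart's growth figures follow from compound doubling: a quantity that doubles every T months grows by a factor of 2^(t/T) over t months. The minimal sketch below inverts that relation to recover the doubling time implied by each 4-year factor on the chart; the function names are illustrative only.

```python
import math

FOUR_YEARS_MONTHS = 48  # the chart's "in 4 yrs" span, expressed in months


def growth_factor(months: float, doubling_months: float) -> float:
    """Compound growth over `months` when capacity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def implied_doubling_months(factor: float, months: float = FOUR_YEARS_MONTHS) -> float:
    """Invert growth_factor(): the doubling time that yields `factor` over `months`."""
    return months / math.log2(factor)


# 4-year growth factors as labelled on the chart
chart = {
    "Optical fibre (Gilder's Law)": 32,
    "Data storage (Storage Law)": 16,
    "Chip capacity (Moore's Law)": 5,
}

for name, factor in chart.items():
    print(f"{name}: {factor}X in 4 yrs -> doubles every "
          f"{implied_doubling_months(factor):.1f} months")
```

Run as-is, it reports about 9.6, 12.0 and 20.7 months. The storage figure (16X with a 12-month doubling) is exactly consistent; the 32X and 5X labels sit close to, but not exactly on, the 9- and 18-month doubling times, so those chart values are best read as approximate.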
Pattern 1: Distributed Collaboration
Groups in different sites working together
Sharing knowledge and ideas
Technologies:
Shared repositories: Wikis, SourceForge/NeSCForge, Forums, …
Videoconferencing
Computer Supported Cooperative Work (CSCW)
Technology: Access Grid (microphones, cameras)
Pattern 2: Simulation & Modelling
Large variety of topics, e.g.
Protein folding
Position of atoms in semiconductors
Human heart
Ecology of ice sheets
Multiple scales
Remote visualisation and control
Example: The TeraGyroid Scientific Experiment
High-density isosurface of the late-time configuration in a ternary amphiphilic fluid as simulated on a 64³ lattice by LB3D. Gyroid ordering coexists with defect-rich, sponge-like regions. The dynamical behaviour of such defect-rich systems can only be studied with very large-scale simulations, in conjunction with high-performance visualisation and computational steering.
See http://www.realitygrid.org/workshop-2004/presentations/blake.ppt
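The TeraGyroid caption pairs large simulations with computational steering: a user adjusts parameters of a running simulation instead of restarting it. The sketch below illustrates that control loop under simple assumptions. `SteerableSimulation` is a hypothetical toy model (not LB3D), and steering commands arrive as `(parameter, value)` pairs on a queue that is drained between timesteps.

```python
from dataclasses import dataclass, field
from queue import Empty, Queue


@dataclass
class SteerableSimulation:
    """Toy stand-in for a long-running simulation (not LB3D): a value relaxes
    towards a target that a remote user may change while the run continues."""
    value: float = 0.0
    target: float = 1.0
    rate: float = 0.1
    history: list = field(default_factory=list)

    def step(self) -> None:
        # One timestep of simple relaxation towards the current target.
        self.value += self.rate * (self.target - self.value)
        self.history.append(self.value)


def run_with_steering(sim: SteerableSimulation, steps: int,
                      commands: Queue, monitor_every: int = 10) -> SteerableSimulation:
    """Advance `sim`, applying queued (parameter, value) commands between steps."""
    for t in range(steps):
        # Drain any steering commands that arrived since the previous step.
        while True:
            try:
                param, new_value = commands.get_nowait()
            except Empty:
                break
            if not hasattr(sim, param):
                raise AttributeError(f"unknown steerable parameter: {param}")
            setattr(sim, param, new_value)
        sim.step()
        if t % monitor_every == 0:
            # Stand-in for streaming state to a remote visualisation client.
            print(f"t={t:4d} value={sim.value:.3f} target={sim.target}")
    return sim


if __name__ == "__main__":
    q: Queue = Queue()
    sim = SteerableSimulation()
    run_with_steering(sim, 50, q)   # relaxes towards target 1.0
    q.put(("target", 2.0))          # a "remote user" steers the run
    run_with_steering(sim, 50, q)   # continues towards the new target
```

In a real deployment the queue would be fed from another thread or a network connection while the simulation runs; the key design point is that steering is applied only at step boundaries, so the model is never left in a half-updated state.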
Example: Terrestrial Carbon Dynamics
Pattern 3: Data archives
Data archives maintain data for widespread use, e.g.
UK Borders, Go-Geo, … (EDINA)
ArkDB (Roslin)
Mouse Atlas (HGU)
EMBL, UniProt, … (EBI)
Census, … (MIMAS)
Client-server access
Schemas defined centrally
Often subject to change…
… if they’re defined at all!
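Because centrally defined archive schemas change over time, clients often have to upgrade records written under older versions. The following is a minimal sketch assuming each record carries a `schema_version` tag and each migration upgrades by exactly one version; the field names and both migrations are hypothetical, not taken from any of the archives listed above.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]

CURRENT_VERSION = 3


def v1_to_v2(r: Record) -> Record:
    # Hypothetical change: v2 made the measurement units explicit.
    out = dict(r)
    out.setdefault("units", "unknown")
    out["schema_version"] = 2
    return out


def v2_to_v3(r: Record) -> Record:
    # Hypothetical change: v3 renamed "loc" to "location".
    out = dict(r)
    out["location"] = out.pop("loc")
    out["schema_version"] = 3
    return out


# Each migration upgrades a record by exactly one version.
MIGRATIONS: Dict[int, Callable[[Record], Record]] = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(record: Record) -> Record:
    """Apply migrations in order until the record matches CURRENT_VERSION."""
    version = record.get("schema_version", 1)  # untagged records treated as v1
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version = record["schema_version"]
    return record


if __name__ == "__main__":
    print(upgrade({"id": 7, "loc": "Edinburgh"}))
```

Chaining single-step migrations keeps each change small and testable, and a record from any older version reaches the current schema by the same path. When no schema is defined at all, this approach has nothing to hang on, which is part of the point the slide makes.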
Infrastructure: Digital Curation Centre
[Figure: Digital Curation Centre organisation. Labels: communities of practice (users, curation organisations e.g. DPC); community support & outreach; Collaborative Associates Network of Data Organisations; service definition & delivery; management & admin support; research; research collaborators; development; co-ordination; testbeds & tools; industry; standards bodies.]
Pattern 4: Federated data
Sites maintain their own data
Remote access to other sites
Control access to your site
Integrated views
Community-defined schemas
Translation between schemas
Distributed algorithms
Run jobs remotely
Distributed data mining
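The ideas above (sites keep their own data, control access, translate to a community schema, and run algorithms locally) can be combined in a small sketch. It assumes a hypothetical `Site` class in which each site applies its own translation function and returns only a partial aggregate, so a pooled mean is computed without moving raw rows between sites; the schemas, users and data are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Site:
    """A hypothetical federation member: it keeps its own data, in its own
    schema, and decides who may query it."""
    name: str
    rows: List[Dict[str, float]]
    to_community: Callable[[Dict[str, float]], Dict[str, float]]  # schema translation
    allowed_users: Set[str]

    def partial_sum(self, user: str, column: str) -> Tuple[float, int]:
        # Access control is enforced locally by each site.
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not query {self.name}")
        values = [self.to_community(row)[column] for row in self.rows]
        # Only an aggregate (sum, count) leaves the site, never the raw rows.
        return sum(values), len(values)


def federated_mean(sites: List[Site], user: str, column: str) -> float:
    """Combine per-site partial results into one integrated answer."""
    total, count = 0.0, 0
    for site in sites:
        s, n = site.partial_sum(user, column)
        total += s
        count += n
    if count == 0:
        raise ValueError("no rows available across the federation")
    return total / count


if __name__ == "__main__":
    # Two sites store temperature under different local schemas.
    site_a = Site("A", [{"temp_c": 10.0}, {"temp_c": 20.0}],
                  lambda r: {"temperature_c": r["temp_c"]}, {"alice"})
    site_b = Site("B", [{"temp_f": 50.0}],
                  lambda r: {"temperature_c": (r["temp_f"] - 32) * 5 / 9}, {"alice"})
    print(federated_mean([site_a, site_b], "alice", "temperature_c"))
```

With the example data this prints roughly 13.33 (the three readings 10, 20 and 10 °C pooled). Returning `(sum, count)` rather than a per-site mean is what lets the partial results combine correctly when sites hold different numbers of rows.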
Example: Mass-scale Data Mining
Pattern 5: Parameter Search
Run the same algorithm on different data, e.g.
Finding local minima
Combinatorial search
Allows the use of multiple machines, e.g.
A cluster
Multiple clusters
Desktop PCs
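Finding local minima is a natural fit for this pattern: the same local search runs from many starting points, and every run is independent. The sketch below assumes a hypothetical one-dimensional objective with two local minima and uses plain gradient descent as the "same algorithm"; a thread pool stands in for the cluster or desktop machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def objective(x: float) -> float:
    # Hypothetical 1-D objective with two local minima, one either side of zero.
    return x**4 - 2 * x**2 + 0.3 * x


def gradient(x: float) -> float:
    # Analytic derivative of objective().
    return 4 * x**3 - 4 * x + 0.3


def local_minimise(x0: float, lr: float = 0.01, steps: int = 2000) -> Tuple[float, float]:
    """Plain gradient descent from one starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * gradient(x)
    return x, objective(x)


def parameter_search(starts: List[float], workers: int = 4):
    """Run independent local searches in parallel and keep the best result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_minimise, starts))
    best = min(results, key=lambda r: r[1])
    return best, results


if __name__ == "__main__":
    best, results = parameter_search([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
    for x, fx in results:
        print(f"x={x:+.4f}  f(x)={fx:+.4f}")
    print("best:", best)
```

Threads keep the sketch portable; for genuinely CPU-bound work one would swap in `ProcessPoolExecutor`, a cluster scheduler, or a desktop-grid system. The structure stays the same because no run depends on another.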
Example: ClimatePrediction.net
See www.climateprediction.net
Composing Patterns
Patterns that compose…
Complex problems require many inputs and many processes
Shared contributions compose indefinitely, accumulating knowledge
… and how to compose them
A common infrastructure: technologies, naming, schemas, …
Workflow languages
Portals and “problem-solving environments”
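Workflow languages let researchers declare which steps depend on which, leaving an engine to run them in a valid order. A minimal sketch of that idea, using Python's standard-library `graphlib` (Python 3.9+); the task names and actions are hypothetical, and real workflow systems typically add remote execution, data staging and failure handling on top.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, Set

# Hypothetical workflow: each task lists the tasks whose outputs it consumes.
WORKFLOW: Dict[str, Set[str]] = {
    "fetch_public": set(),
    "fetch_private": set(),
    "translate_schemas": {"fetch_public", "fetch_private"},
    "analyse": {"translate_schemas"},
    "visualise": {"analyse"},
}


def run(workflow: Dict[str, Set[str]],
        actions: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Execute each task once all of its dependencies have produced results."""
    results: Dict[str, Any] = {}
    for task in TopologicalSorter(workflow).static_order():
        # Pass dependency outputs in a fixed (alphabetical) order.
        inputs = [results[dep] for dep in sorted(workflow[task])]
        results[task] = actions[task](*inputs)
    return results


if __name__ == "__main__":
    actions = {
        "fetch_public": lambda: [1, 2],
        "fetch_private": lambda: [3],
        # Inputs arrive sorted by name: (fetch_private, fetch_public).
        "translate_schemas": lambda private, public: sorted(public + private),
        "analyse": lambda rows: sum(rows),
        "visualise": lambda total: f"total={total}",
    }
    print(run(WORKFLOW, actions)["visualise"])  # total=6
```

Because the dependency graph is data rather than code, the same definition could be run sequentially as here or handed to a scheduler that starts independent tasks (the two fetches) in parallel.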
Example: BRIDGES (BioInformatics)
[Figure: BRIDGES architecture. Labels: CFG Virtual Organisation; publicly curated data (Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD, …); private data at Glasgow, Edinburgh, Oxford, Leicester, Netherlands and London; authorisation; data hub; Synteny Grid Service.]
Example: FireGrid (proposal)
[Figure: FireGrid overview. Labels: 1000s of sensors & gateway; processing; super-real-time simulation (HPC); KBS and planning; maps, models, scenarios; emergency responders. Images labelled Mont Blanc, Kings Cross, Piper Alpha, WTC, Kobe.]
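With thousands of sensors feeding a gateway, some reduction of raw readings is needed before they reach simulation and planning. The slide does not describe FireGrid's actual processing; the sketch below simply assumes a gateway that collapses `(zone, temperature)` readings into per-zone summaries with an alarm flag. The zone names and the 60 °C threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

ALARM_C = 60.0  # hypothetical alarm threshold in °C, not taken from FireGrid


def summarise(readings: Iterable[Tuple[str, float]],
              alarm_c: float = ALARM_C) -> Dict[str, dict]:
    """Reduce raw (zone, temperature) readings to one summary per zone."""
    by_zone = defaultdict(list)
    for zone, temp_c in readings:
        by_zone[zone].append(temp_c)
    summary = {}
    for zone, temps in sorted(by_zone.items()):
        peak = max(temps)
        summary[zone] = {"max": peak, "mean": mean(temps), "alarm": peak >= alarm_c}
    return summary


if __name__ == "__main__":
    raw = [("stairwell", 21.5), ("stairwell", 64.0), ("lobby", 22.0)]
    for zone, info in summarise(raw).items():
        print(zone, info)
```

Summarising at the gateway keeps downstream traffic proportional to the number of zones rather than the number of sensors, which matters when the summaries must drive a simulation that runs faster than real time.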
Practical Challenges
Technical
A variety of partial answers
Standardisation work is long and political
Social
Sharing of resources means sharing YOUR resources
Contributor recognition and IPR
Defining common schemas and ontologies
Training and funding for software developers and sysadmins
Responsibility of data publishers
Cost, dependability, trustworthiness, capability, flexibility, …
Management of infrastructure
Operation – NGS (national), ACF (local)
Funding