Uptake and Sustainability of e-Research Technologies
Alexander Voss
alex.voss@ncess.ac.uk
National Centre for e-Social Science and e-Science Institute
25th Oct., 2006

e-Science
…the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists. (Research Councils UK)
Goal: to enable better research in all disciplines, to enable research that was not feasible previously

Drivers
Technical
– Faster, cheaper devices, higher resolutions, increased throughput, cheaper and higher capacity storage, increased bandwidth, etc.
Research Process: coping with the data deluge
– Finding and accessing data
– Independent provision and ownership, local policies
– Linking data
– Processing data
– Interpreting data
– Presenting results
Increased international collaboration
Doing what was previously impossible

e-Research in the UK
UK e-Science Programme (since 2001)
International Programmes (esp. US, EU)
Supported data and information services
Access to scientific facilities
Communities developing resources, systems and practices
Pilot projects in most areas
Core middleware development and code repositories

Grid Technologies
‘An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources.’ (Ian Foster and Carl Kesselman)
Like the power grid, the Grid makes services available through common interfaces without the user having to worry about the details of how these services are provided.

Grid Technologies
What characterises a grid?
– Coordinated resource sharing
– Standard, open, general-purpose protocols and interfaces
– Delivering non-trivial qualities of service
The vision of “the Grid” has not yet been achieved, and there are reasons why it may never be, but there are many ‘grids’, each supporting one or more…
‘Virtual organisations’: people in different organisations seeking to cooperate and share resources across their organisational boundaries

The Web, the Grid and the Internet
The Web provides access to documents (at least that is what it has been designed for)
Communication between remote servers and a person using a web browser
The Grid provides access to resources - data, computation, experimental apparatus, etc.
Communication between resources brought together to perform an overall task
Does not replace but rather complements the Web
The Internet is the common underlying infrastructure for both - it provides connectivity and basic management of networks independent of their use.

Grid Middleware
Grids support a vast range of applications accessing a range of different resources.
Grid Middleware provides the ‘glue’ that binds them together.
Open, standardised interfaces reduce the complexity of the interface between resources and applications.
(Diagram: Applications / Grid Middleware / Resources)

Related (but not identical) concepts
Internet computing (e.g., FightAIDS@Home)
Peer-to-peer computing (e.g., Napster)
Utility computing (e.g., Sun, IBM)
Cluster computing (e.g., supercomputing, reliability)
Distributed systems (e.g., e-Business)
Groupware (e.g., Lotus Notes)

What is e-Research?
Extension of the concept of e-Science into other domains (social sciences, arts & humanities) and an extension from large research institutions into all parts of life where research might be conducted (e.g. in schools or at home).
Recognising the ‘small steps’ that are sometimes crucial in ‘big science’.
Grid technologies are not sufficient on their own to enable the vision of e-Science; other elements are needed, e.g., data sharing agreements, changed reward structures, domain standards, etc.
Focus more on uses of ICTs in research than on the technology per se.
e-Research is an emerging phenomenon - we all make use of modern ICT infrastructures in our daily research activities.

Example 1: Integrative Biology

Overview - Integrative Biology
IB is an EPSRC-funded e-Science project tackling the UK’s two biggest killers, cancer and heart disease, through large-scale multi-scale simulations.
Globally distributed and interdisciplinary community: US, Europe, New Zealand
Developing a web-services based grid infrastructure providing tailored access to compute and data resources.
Courtesy of Matthew Mascord, Oxford e-Research Centre
Integrative Biology VRE

Heart Modelling
Requires access to compute resources, data management facilities, visualisation capability and collaborative working tools.
Typically solving coupled systems of PDEs (tissue level) and non-linear ODEs (cellular level) for the electrical potential.
Complex three-dimensional geometries
Investigation of how ischemic tissue interacts with electric shocks in order to improve defibrillation efficacy in patients with coronary heart disease (Tulane/Oxford)
Courtesy of Tulane/Oxford
Image is part of a study to figure out the arrangement of different cell types in the heart wall that accounts for the shape of the T wave in the ECG
Courtesy of Richard Clayton, Sheffield
Visualization of Cardiac Virtual Tissue
Partners: Oxford, Sheffield, New Orleans, Washington Lee, UCSD, UCLA, Baltimore, Monash, Auckland, Graz, Utrecht
Courtesy of the Integrative Biology Consortium, funded by EPSRC

Example 2: e-Social Science (Grid-enabled microeconomic data analysis)

Background: Social Science Problem and Policy Issue
• Researchers frequently have to use more than one data set in order to obtain a more complete answer to their questions
– What do we know about ethnic minority economic welfare when it is disaggregated by group and geography?
• One data set may provide a large sample of the target population, but offer incomplete coverage of the topics of interest
– Census data can lack direct measures of income
• Another data set with coverage of the topics of interest may not sample the target population adequately
– Survey data yield minority samples that may be too small for meaningful results to be obtained
Courtesy of Simon Peters

Data
The British Household Panel Survey (BHPS) provides the small scale survey data.
• BHPS is a longitudinal (panel) study with yearly waves.
The Sample of Anonymised Records (SARs) provides the large scale Census data.
• SARs are a random sample of individuals and households from the UK Census
• Uses 1991 data because of projected confidentiality restrictions on the publicly available version of the 2001 SARs.
• 2% sample of individuals, 1% sample of households.
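The small-sample problem named on the Background slide can be made concrete with a short sketch. It disaggregates a toy survey by group and geography and flags the cells that are too thin to analyse. The records, the group and region labels, and the `MIN_CELL_SIZE` threshold are all hypothetical illustrations; nothing here is taken from the BHPS, the SARs, or the project's actual method.

```python
from collections import Counter

# Hypothetical toy records: (group, region). NOT real BHPS or SARs data.
survey = [
    ("A", "North"), ("A", "North"), ("A", "North"),
    ("A", "South"), ("A", "South"), ("A", "South"),
    ("B", "North"), ("B", "South"),
    ("C", "North"),
]

# Illustrative threshold; an assumption, not a published disclosure or power rule.
MIN_CELL_SIZE = 3


def cell_counts(records):
    """Count respondents in each (group, region) cell."""
    return Counter(records)


def sparse_cells(records, groups, regions, min_size=MIN_CELL_SIZE):
    """Return every (group, region) cell with fewer than min_size respondents.

    Cells with no respondents at all are included, because Counter
    reports a count of 0 for keys it has never seen.
    """
    counts = cell_counts(records)
    return sorted(
        (g, r) for g in groups for r in regions if counts[(g, r)] < min_size
    )


groups = ["A", "B", "C"]
regions = ["North", "South"]
too_small = sparse_cells(survey, groups, regions)
print(too_small)
```

In this toy case only group A has enough respondents in both regions. Groups B and C fall below the threshold everywhere, which is the situation in which a larger source such as a census sample becomes attractive even though it lacks some variables.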
Courtesy of Simon Peters

Example 3: Environmental e-Science (Grid for Ocean Diagnostics Interactive Visualisation and Analysis)

Exploring environmental data with Google Maps and Google Earth
• “Godiva2” website provides very quick visualisations of numerical model and satellite data
• Scientists use an interactive website to select a dataset to visualise on a draggable, zoomable map
– can view data at a large range of scales
• Can then view the same data in Google Earth
– 3-D globe
– Lightweight, easy to use GIS tool
– Can visualise alongside other datasets
• Don’t have to download any data!
• Images generated dynamically on the server
• Spin-off from GODIVA project
Courtesy of Jon Blower

Demo
• http://lovejoy.nerc-essc.ac.uk:8080/Godiva2/

Example 4: Archaeology (Silchester Roman Town)
Courtesy of Michael Fulford

Integrated Archaeological Database
Courtesy of Michael Fulford
Silchester: A VRE for Archaeology

Examples have shown instances of:
Use of public datasets
Confidentiality issues
Use of high-performance computing
Fieldwork - not all research happens inside!
Mapping geographies
Record linkage (coping with incomplete data)
Use of national infrastructures
Collaborative activities
Various web-based, desktop and mobile user interfaces
Management of large datasets
Meta-data: where is data from, what can be said about it?
Mining data
Using data previously thought worthless or intractable

Research Challenges

Dealing with complexity and heterogeneity
Just four examples have highlighted the complexity and heterogeneity of what is meant by ‘e-Research’.
There tend to be similarities as well as differences between the needs of different researchers.
This is where the chance lies for building common infrastructures while supporting
– a wide range of different research activities
– and different kinds of resources,
– across organisational contexts.
Need to know about the different cultures in, say, particle physics and sociology.
Teasing out the similarities and differences is an important part of realising e-Research.

We need to know more about:
The early adopters, the interested, the disengaged
What motivates people to collaborate and share
What the barriers to entry are and how they can be overcome
How e-Science endeavours can be effectively and efficiently managed in different organisational contexts
How we can manage user-designer relations to ensure what we build is useful and usable
How we can ensure people have reasonable expectations of what can and cannot be done
How we encourage future generations of researchers to engage in research in the first place and to make use of the vast potential of e-Research
And other socio-technical issues

Broad themes
Supporting Innovation and Diffusion
Improving usability
Fostering new forms of research and community
Deployability, configurability and sustainability
National and International Comparisons
Measuring Impact of e-Research

Commodification
The process that transforms the market from a collection of individual, proprietary and idiosyncratic products to one that defines open standards and provides competing but interoperable implementations.
Aims are to:
– Flatten the learning curve
– Ease deployment
– Centrally provide functionality
– Overcome / leverage network effects
OGF - engaged in standardisation
OMII - repository of production software
NGS - national service providing a compute grid and operations support
Role that University Computing Services play: centrally and locally provided services will be required.

Project Management
Proposals are sales documents!
Project funding assumes a project plan is in place and work can start soon
This is routinely not the case
e-Research projects tend to differ from other IT development projects, e.g.:
– Multiple stakeholders with only partially aligned agendas
– Raised expectations
– Different ways of working and professional cultures
– Short timelines (funding)
Project management often tells us what to do but not how to do it.
Need to pay attention to the ‘seen but unnoticed’ skills of good project managers:
– e.g., tackling problems arising from people’s different motivations, professional identities and languages

Democratic e-Research
How we communicate with the wider public is crucial where we touch upon potentially contentious issues or make use of personal data.
This requires further interdisciplinary work involving, e.g., ethicists and social science researchers
– EthOx centre in Oxford
– Innogen in Edinburgh
Also, involvement of the wider public as active participants in research activities.

User-Designer Relations in e-Research
Designers of e-Research systems need to be familiar with the working practices and concerns of researchers
Researchers need to understand what is possible, what is feasible and what is not, and what the trade-offs between different options are
This involves a degree of familiarity with the research domain and e-Research technologies.
This can be achieved through:
– Training (e.g., bioinformatics, Grid literacy)
– Boundary spanning (e.g., researchers employed on projects)
– Facilitation (e.g., workplace studies)
– Shared practice (co-location, corealisation)

eSI Theme Activities
Establish and consolidate what we already know
– e-Research BOK: Formulating e-Research practices
– 1st step: Realising e-Research Endeavours, call to be issued end November, workshop in March ’07, write-up soon after
Identify major gaps and address through
– Targeted research (focused observational studies, interviews, surveys, depending on the issue at hand)
– in collaboration with other projects as well as
– seeking additional grants
Workshops and Visitors as input and control mechanism
Raise awareness of e-Research in the communities
Aim to drive technical development

Prior Work
Usability Task Force: Usability Research Challenges in e-Science
JISC Human Factors Audit of Selected e-Science Projects
Angela Sasse and Brock Craft: Security and Usability of Grid Projects: Implications for e-Science
Paul David: Towards a Cyberinfrastructure for Enhanced Scientific Collaboration: providing its ‘soft’ foundations may be the hardest part

Related Activities
Projects funded under EPSRC Usability Call
AVROSS (EU Strep): e-Social Science
JISC e-Infrastructure Call (Issued end Sept.)
– Barriers to Uptake
– Service Usage Models (practice templates)
SUPER: informing prioritisation of e-Infrastructure work
Usability Task Force Portal: assembling a network of people working in this area and disseminating results

Outlook
There is still much to be done to deliver the promise of e-Science and to extend its uptake. Each step requires work. More research communities will need to agree their methods of collaboration. The technology requires further development, in particular to make it more usable, versatile and economic.
And production support requires more operational experience and extension of arrangements for sharing. It is now time to expand its application across the academic world and to introduce it to students as well as to academics.
(Malcolm Atkinson writing in THES)

Credits
Malcolm Atkinson, e-Science Envoy and Director of the e-Science Institute
Anna Kenway, Deputy Director of the e-Science Institute
Rob Procter, Research Director, NCeSS
Tom Rodden, University of Nottingham and Usability Task Force
Colleagues who have kindly allowed me to use their slides

Questions, please…