The monthly newsletter from the National e-Science Centre
NeSC News
Issue 60, May 2008
www.nesc.ac.uk

Trust, But Verify
By Iain Coleman

On general election night, victorious candidates always give tedious speeches in which they thank the police, polling staff and council officials. And quite right too. The integrity of the system that has just sent a parliamentary candidate to Westminster, the very foundation of their legitimacy as an MP, depends on an unbroken trail of verification from the issuing of ballot papers to the final count.

This is just one example of provenance: recording the origin, changes and influences on objects or information, in order to guarantee authenticity, integrity and quality. It’s as important in science as it is in politics: if you don’t know where your data came from, or can’t be sure it hasn’t been corrupted, what confidence can you have in your scientific results? In many areas of life, though, the shift to electronic data has made it much more difficult to guarantee provenance, and science is far from immune.

Traditionally, provenance has been about a paper trail. Every change is physically documented, and modifications to documents – whether legitimate annotations or attempted forgeries – can be detected by physical inspection. This isn’t the case with electronic data. As anyone who has accidentally lost the only copy of a Word document knows, it is easy to create a finished product electronically without generating a sequence of separate drafts. And data in a file can be silently altered without leaving evidence of the change. So there is a real problem in tracking the provenance of electronic data. The latest e-Science Institute theme, Principles of Provenance, is aimed at studying this problem at a fundamental level, and discovering how computing science can help to solve it.
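To make the point about silent alteration concrete, here is a minimal sketch of one well-known way to make changes to electronic records detectable: chaining cryptographic hashes, so that each record's hash depends on everything recorded before it. This is an illustration only, not a technique attributed to the theme or its speakers. The function names (record_hash, append_record, verify) and the record layout are hypothetical, and the check only helps if the most recent hash is also kept somewhere that cannot be quietly rewritten.

```python
import hashlib
import json


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash one record together with the hash of its predecessor."""
    # sort_keys gives a stable serialisation, so equal payloads hash equally.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_record(log: list, payload: dict) -> None:
    """Append a payload to the log, linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else ""
    log.append({"payload": payload, "hash": record_hash(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited payload breaks the chain."""
    prev_hash = ""
    for entry in log:
        if entry["hash"] != record_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Appending two records and then editing the payload of the first makes verify(log) return False, because the stored hash no longer matches the altered content. That gives back some of what physical inspection offered for a paper trail, though it says nothing about where the data came from in the first place.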
In the first public lecture of the theme, held at eSI on 15 April, Theme Leader James Cheney outlined the main problems in the field of scientific provenance, and his vision of how the theme aims to tackle them.

[Photo: Theme Leader - Dr James Cheney]

The kinds of provenance information that scientists are interested in are varied. They include data integrity and quality filtering, error correction and propagation, and intellectual credit and citation. At the moment, whatever information exists along these lines is added manually, or at best maintained by customised systems. For example, there are thousands of specialised biological databases; independent, heterogeneous, and frequently updated. Many of these are curated by the manual efforts of scientists. Curators either do this manually – which is dull, time-consuming and not very robust – or they use ad hoc systems, which entails a lot of wheel reinvention and few guarantees of quality. This is expensive, and the results vary greatly in reliability.

The question is, can we develop effective systems for tracking provenance automatically? It won’t be easy. There is currently no consensus on how to define provenance, and not enough understanding of what a provenance system is for. Just storing all the information that anyone ever generates is no solution: the quantity of data simply explodes. Nor is it enough to simply store whatever users say they want. Not only can you never please everyone, people don’t always consciously know what they want and need. Part of the job of the computer scientist is to understand what the problem really is, not just what people say they want. Making this happen will, however, require more interaction between computer science researchers and potential users than is generally seen at present.
The plan for this theme is to support focused research into the foundations of provenance in computer science, to bridge the gap between computer science and e-Science, to identify key problems and set the research agenda, and to disseminate results and incubate further research programmes and funding proposals. The theme will host four small symposia focusing on these areas, bringing together computer science researchers and working scientists. By the end of the theme, it is hoped, there should be a much clearer understanding of the state of the art and of the direction in which the field needs to travel.

Slides and a webcast of this event can be downloaded from: http://www.nesc.ac.uk/esi/events/884/

Updated Call For Papers for the UK e-Science AHM 2008

Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructures
8th – 11th September 2008 in Edinburgh, Scotland
http://www.allhands.org.uk/2008/conference/venue.cfm

Abstract Submission Deadline: 15th May 2008.
Early Bird Registration opens: 12th May 2008

AHM 2008 is the principal e-Science meeting in the UK and brings together researchers from all disciplines, computer scientists and developers to meet and exchange ideas. The meeting is in its seventh year and normally attracts between 500 and 600 participants.

The theme for this year’s meeting is Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructures. The appointment of Professor Peter Coveney (UCL) as Programme Chair heralds a new approach. This year, for the first time, key papers will be published in two back-to-back editions of Philosophical Transactions of the Royal Society A, in the early part of 2009, with the title Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructures. One of the central aims of this year’s meeting is to promote the domain specific applications aspects of e-Science, as well as building bridges between the three communities of the theme title.
The general format of the meeting will include cross-community symposia (kicked off by invited key speakers) and workshops. The workshops are being championed by Programme Committee members in what are considered to be key areas of e-Science that need to be addressed, rather than by a call for workshops as has been done in the past. There will also be opportunities to present 20 minute talks.

The proposed workshops include:
• Delivering Grid Services - the role of Central Computing Services
• Infrastructure Provision for ‘Grids’: Infrastructure for Users
• Software Development for Scientific Applications: Current and Future Perspectives
• Information Assurance for the Grid: Crossing Boundaries between Stakeholders
• Frontiers of High Performance and Distributed Computing in Computational Science
• Interactive e-Science to Support Creativity and Intuition in Research
• HPC Grids of Continental Scope
• Computational Biomedicine: e-Science from Molecules to Man
• The Global Data-Centric View
• e-Science in the Arts and Humanities: Early Experiments and Systematic Investigations
• Profiling UK e-Research: Mapping Communities and Measuring Impacts

We are therefore calling for abstract submissions for:
1. General papers which are not particularly attached to a workshop.
2. Workshop papers related to the above workshops.

Further details about the workshops and important information about the submission and review process, including guidelines for authors can be found at: http://www.allhands.org.uk/2008/programme/call.cfm

Enquiries: Please address any enquiries about abstract submission to: admin@allhands.org.uk

On the Road with NGS

[Photo: Katie Weeks presenting at the first NGS Roadshow in Southampton. Photo courtesy of Gillian Sinclair]

The NGS and OMII-UK held the first National Grid Service (NGS) Roadshow in Southampton on the 10th of April. Katie Weeks kicked off the presentations with an introduction to the National Grid Service and an overview of the services that NGS offers. This was followed by a presentation on the training available in conjunction with the Training, Outreach and Education (TOE) team from NeSC, including courses tailored specifically for individual research groups.

Andrew Price from the University of Southampton gave a presentation on environmental simulations on the NGS, focused on the GENIE project (http://www.genie.ac.uk/). The GENIE project aims to develop a Grid-based computing framework to produce a unified Earth System Model (ESM).

Dan Adams from the University of Southampton Information Systems Services (ISS) presented the Microsoft HPC resources that are available on the NGS. The Spitfire-B Cluster is based on the Microsoft Compute Cluster Server 2003. The cluster allows various industrial and academic users to work on their problems and codes in collaboration with academics at Southampton on an HPC platform, enabled by Windows Compute Cluster Server. Further information on the Windows resources is available at: http://www.ngs.ac.uk/sites/southampton/.

The final presentation of the day was from the local OMII-UK operations manager, Tim Parkinson, who gave an overview of OMII-UK, the services they can offer users and how OMII-UK and the NGS work together.

The roadshow finished off with lunch for all participants as they browsed the NGS and OMII-UK exhibition stands. Many of the participants took advantage of the new system for applying for a grid certificate at the roadshow which enables users to leave with a grid certificate on a USB key.
There are more photos from the day at: http://www.flickr.com/groups/uk_ngs/

If you are interested in hosting an NGS roadshow, then please contact Gillian Sinclair at: gillian.sinclair@manchester.ac.uk

AHM 2008 National Grid Service Workshop

The National Grid Service (NGS) is hosting a workshop at the annual e-Science All Hands Meeting which will be held at the University of Edinburgh on the 8th – 11th of September. The workshop, organised by Andrew Richards, Gillian Sinclair, Katie Weeks and Claire Devereux, is entitled Infrastructure Provision for Grids, Infrastructure for Users. It will focus on grid support centres, user outreach, portals and user applications, end user experiences of grid computing, subject to specific requirements from the research community and the problems of recruiting users to grid initiatives. Full details can be found at http://www.allhands.org.uk/. The deadline for paper submissions is 1st May 2008.

e-Science Institute

Conference Round-up

The NGS has been busy attending several conferences over the last month. The NGS has exhibited at the Molecular Graphics and Modelling Society Spring meeting in Cardiff, the NERC environmental e-Science meeting in London, the JISC annual conference in Birmingham, the RSC Theoretical Chemistry Group graduate student meeting at the University of Manchester and BioSysBio 2008 at Imperial College London.

Photos from all these conferences can be found at: http://www.flickr.com/groups/uk_ngs/

Presentations given by NGS staff at these meetings will be available soon on the NGS website.

[Photo: The NGS exhibition stand at the JISC conference. Photo courtesy of Gillian Sinclair.]

An Introduction to Writing Ontologies in the Web Ontology Language (OWL)
21 – 22 May 2008
e-Science Institute, 15 South College Street, Edinburgh, EH8 9AA

The aims of this workshop are to:
• Understand the use of ontologies.
• Understand statements written in OWL.
• Understand the role of automatic reasoning in ontology building.
• Build an ontology and use a reasoner to draw inferences based on that ontology.
• Gain experience in the Protégé 4 ontology building environment.
• Gain insight into how OWL can play a role in semantic metadata.

This two-day, introductory, ‘hands-on’ workshop aims to provide attendees with both the theoretical foundations and practical experience to begin building OWL ontologies using the latest version of the Protégé-OWL tools (Protégé 4). It is based on Manchester’s well-known “Pizza tutorial” (see http://www.co-ode.org).

This tutorial will cover the main conceptual parts of OWL through the hands-on building of an ontology of pizzas and their ingredients. A series of exercises takes attendees through the process of conceptualizing the toppings found on a pizza; the entry of this classification into the Protégé environment and the description of many types of pizza. All this is set in the context of using automatic reasoning to check the consistency of the growing ontology and to use the reasoner to make queries about pizzas.

Since 2003, this tutorial, in various forms, has been given over 20 times and been attended by hundreds of budding ontologists.

For registration and more details see http://www.nesc.ac.uk/esi/events/895/.

e-Science Institute

The Science of Scholarship
by Iain Coleman

Science is easy. You get nice clean data files, with measurements, times and locations all lined up in neat little rows, and a cornucopia of mathematical methods and visualisation tools to help you make sense of it all. What luxury this must seem to the humanities scholar, struggling to synthesise some new understanding from scribbled documents, fragmentary records and partial accounts, all expressed in the infinitely rich, mutating forms of natural language.
And how enviously they must have looked upon the emerging technologies of e-Science, allowing unprecedented levels of collaboration and knowledge exchange, built on the well-defined quantitative data sets of natural science. So when scholars started to take up the tools of e-Science for their own purposes, it’s not surprising they had to do some serious customisation.

The theoretical and practical issues encountered in the creation of digital scholarly texts were examined in The Marriage of Mercury and Philology: Problems and Outcomes in Digital Philology, a workshop held at the e-Science Institute on 25-27 March.

Markup languages, particularly XML, are enjoying eager use in the humanities. These can be used to characterise a text in a rich, yet still machine-readable, way. The MOM Archive, presented by Georg Vogeler (Ludwig-Maximilian University, Munich) and Mirko Gontek (University of Cologne) is an attempt to bridge the gap between abstract data models and practical scholarly work. Individual words in the underlying data can be tagged to identify names of places and people, types of statement, and so on. The text can also be annotated, and the annotations themselves tagged, building up a multilayered, complex structure from relatively simple components. This system is available on a multilingual website, and users can annotate texts to create their own critical editions, according to a system that preserves technical consistency and reusability. Whether this private edition is added to the public archive is decided by an expert moderator – this human intervention is essential to maintaining scholarly standards.

A more specialised and elaborate markup system can be seen in CLELIA, a project to create a unified, electronically editable collection of Stendhal’s manuscripts.
Thomas Lebarbé (Université Stendhal - Grenoble 3) described how existing Stendhal databases are scattered, poorly structured, poorly interfaced and difficult to publish online due to the use of proprietary software. The challenge for CLELIA is to take manuscript pages, which may have many different kinds of marking on them from the body of the text to date headers, corrections, notes and marginal scribbles, and structure all this data in an easily usable form. By breaking down the concept of a “page” into its component parts, and building this structure into a database with a usable interface and XML transcription support, the project aims to create a comprehensive web-based archive that can be used by a wide range of users – scholars, teachers and linguists – in many different ways. This has been a multidisciplinary effort, involving computer scientists, literature specialists, and computational linguists: a demonstration version is now available, with the site officially going live in September.

These digital editions mark a qualitative change in how scholars and lay readers interact with text. The concept of a critical edition is central to the humanities, and until now these have always existed as written or printed texts. An electronic edition is more than just an e-book: it is a scholarly environment that comes with tools allowing you to explore it, modify it, and connect it with other information and analysis. Federico Meschini (De Montfort) called for an approach to digital scholarship that would maximise interaction and interoperability, by developing a set of specialised individual tools with well-defined interfaces, rather than trying to have a single do-it-all solution – lego bricks rather than a sonic screwdriver. In this way, technology and culture can work in partnership, with readers able to replicate an editor’s research, interact with an edition and even contribute to it.
Sadly, there are institutional structures within the humanities that can get in the way of this ideal of digital scholarship. A current example was provided by Peter Robinson (Birmingham). An excellent scholarly treatment of the Domesday Book, with every word of the text annotated and classified, is virtually inaccessible because of commercial and licensing restrictions. Robinson outlined the institutional structures in research councils and libraries that lead to this frustrating state of affairs, and proposed a programme of reform to move away from the model of humanities data centres and towards empowering individual scholars to create and revise digital editions, with libraries making the editions available.

Finally, Manfred Thaller touched on the emerging relationship between the humanities and computer science. Scholars certainly need the tools that computer science can provide, but what do the computer scientists get out of the deal? Perhaps an answer lies in the global data explosion. To a scientist accustomed to clean, well-ordered information, the internet now seems like a cacophony. Social networking, geographic tagging, commercial, social and political transactions – all human life is there, in all its glorious disorder. And this is exactly the kind of data scholars have been working with for millennia. Perhaps now is the ideal time for the humanities to show the sciences how it’s done.
Slides from this event can be downloaded from: http://www.nesc.ac.uk/esi/events/854/

e-Science Institute

Forthcoming Events Timetable

May
21: Symposium on Provenance in Databases (eSI) http://www.nesc.ac.uk/esi/events/894/
21 - 22: An Introduction to Writing Ontologies in the Web Ontology Language (OWL) (TOE) http://www.nesc.ac.uk/esi/events/895/
27 - 30: Building data grids with iRODS (NeSC) http://www.nesc.ac.uk/esi/events/866/

June
3 - 4: Tutorial on Trusted Computing for eScientists (eSI) http://www.nesc.ac.uk/esi/events/886/
9 - 10: Real-time qPCR Biostatistics and Gene Expression Profiling (NeSC) http://www.nesc.ac.uk/esi/events/890/
17: HPCx 6th Annual Seminar (eSI) http://www.nesc.ac.uk/esi/events/891/
18: Novel Parallel Programming Languages for HPC (eSI) http://www.nesc.ac.uk/esi/events/892/

July
2: e-Science Directors’ Forum (NeSC) http://www.nesc.ac.uk/esi/events/870/
17: EPSRC/TSB e-Science Projects Meeting (NeSC) http://www.nesc.ac.uk/esi/events/888/
31: PASTA Workshop 2008 (eSI) http://www.nesc.ac.uk/esi/events/889/

Call For Papers: 4th International Digital Curation Conference
Radical Sharing: Transforming Science?

In partnership with the National e-Science Centre, the Digital Curation Centre (DCC) is holding the 4th International Digital Curation Conference on 1-3 December 2008 at the Hilton Grosvenor Hotel in Edinburgh, Scotland. The DCC invites the submission of full papers, posters and demos from individuals, organisations and institutions across all disciplines and domains engaged in the creation, use and management of digital data, especially those involved in the challenge of curating data in e-Science and e-Research. The closing date for papers is 25 July 2008. Full details available at: http://www.dcc.ac.uk/events/dcc-2008/

This is only a selection of events that are happening in the next few months. For the full listing go to the following websites:
Events at the e-Science Institute: http://www.nesc.ac.uk/esi/esi.html
External events: http://www.nesc.ac.uk/events/ww_events.html

If you would like to hold an e-Science event at the e-Science Institute, please contact:
Conference Administrator, National e-Science Centre, 15 South College Street, Edinburgh, EH8 9AA
Tel: 0131 650 9833
Fax: 0131 650 9819
Email: events@nesc.ac.uk

This NeSC Newsletter was edited by Katharine Woods. Layout by Jennifer Hurst.
email kwoods1@nesc.ac.uk
The deadline for the June 2008 Newsletter is: 23rd May 2008