
The monthly newsletter from the National e-Science Centre
NeSC News
Issue 60 May 2008 www.nesc.ac.uk
Trust, But Verify
By Iain Coleman
On general election night, victorious candidates always give tedious speeches in which they thank the police, polling
staff and council officials. And quite right too. The integrity of the system that has just sent a parliamentary candidate
to Westminster, the very foundation of their legitimacy as an MP, depends on an unbroken trail of verification from
the issuing of ballot papers to the final count. This is just one example of provenance: recording the origin, changes
and influences on objects or information, in order to guarantee authenticity, integrity and quality. It’s as important in
science as it is in politics: if you don’t know where your data came from, or can’t be sure it hasn’t been corrupted,
what confidence can you have in your scientific results? In many areas of life, though, the shift to electronic data has
made it much more difficult to guarantee provenance, and science is far from immune.
Traditionally, provenance has been about a paper trail. Every change is physically documented, and modifications to
documents – whether legitimate annotations or attempted forgeries – can be detected by physical inspection. This
isn’t the case with electronic data. As anyone who has accidentally lost the only copy of a Word document knows, it
is easy to create a finished product electronically without generating a sequence of separate drafts. And data in a file
can be silently altered without leaving evidence of the change.
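One common way to make silent alteration detectable is to keep a cryptographic fingerprint of the data alongside a record of who did what to it. The sketch below is only an illustration of that general idea, not something presented in the lecture: the function names `record_step` and `verify_log` and the fields of each log entry are assumptions made up for this example. Each entry stores a SHA-256 hash of the data and of the previous entry, so editing an earlier entry breaks the chain.

```python
import hashlib
import json


def record_step(log, actor, action, data):
    """Append a provenance entry covering the data and the previous entry.

    `log` is a plain list of dicts; the field names are illustrative only.
    """
    prev_hash = log[-1]["entry_hash"] if log else ""
    entry = {
        "actor": actor,
        "action": action,
        "data_hash": hashlib.sha256(data).hexdigest(),
        "prev_hash": prev_hash,
    }
    # Hash the entry itself so that later edits to it can be detected.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_log(log):
    """Return True if every entry is unmodified and correctly linked."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


if __name__ == "__main__":
    log = []
    record_step(log, "instrument", "measured", b"1.0,2.0,3.0")
    record_step(log, "curator", "corrected", b"1.0,2.1,3.0")
    print(verify_log(log))      # chain is intact
    log[0]["actor"] = "someone else"
    print(verify_log(log))      # the edit to the first entry is detected
```

A scheme like this only shows that a record has changed; it says nothing about whether the change was legitimate, and anyone able to rewrite the whole log can recompute every hash. That is part of why provenance needs more than a checksum.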
So there is a real problem in tracking the provenance of electronic
data. The latest e-Science Institute theme, Principles of Provenance,
is aimed at studying this problem at a fundamental level, and
discovering how computing science can help to solve it. In the first
public lecture of the theme, held at eSI on 15 April, Theme Leader
James Cheney outlined the main problems in the field of scientific
provenance, and his vision of how the theme aims to tackle them.
Theme Leader - Dr James Cheney
The kinds of provenance information that scientists are interested in are varied. They include data integrity and
quality filtering, error correction and propagation, and intellectual credit and citation. At the moment, whatever
information exists along these lines is added manually, or at best maintained by customised systems. For example,
there are thousands of specialised biological databases; independent, heterogeneous, and frequently updated.
Many of these are curated by scientists, who either do the work manually – which is dull,
time-consuming and not very robust – or use ad hoc systems, which entail a lot of wheel reinvention and few
guarantees of quality. This is expensive, and the results vary greatly in reliability.
The question is, can we develop effective systems for tracking provenance automatically? It won’t be easy. There is
currently no consensus on how to define provenance, and not enough understanding of what a provenance system
is for.
Just storing all the information that anyone ever generates is no solution: the quantity of data simply explodes. Nor
is it enough to simply store whatever users say they want. Not only can you never please everyone, people don’t
always consciously know what they want and need. Part of the job of the computer scientist is to understand what
the problem really is, not just what people say they want. Making this happen will, however, require more interaction
between computer science researchers and potential users than is generally seen at present.
The plan for this theme is to support focused research into the foundations of provenance in computer science, to
bridge the gap between computer science and e-Science, to identify key problems and set the research agenda, and
to disseminate results and incubate further research programmes and funding proposals. The theme will host four
small symposia focusing on these areas, bringing together computer science researchers and working scientists.
By the end of the theme, it is hoped, there should be a much clearer understanding of the state of the art and of the
direction in which the field needs to travel.
Slides and a webcast of this event can be downloaded from: http://www.nesc.ac.uk/esi/events/884/
Updated Call For Papers for the UK e-Science AHM 2008
Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructures
8th – 11th September 2008 in Edinburgh, Scotland
http://www.allhands.org.uk/2008/conference/venue.cfm
Abstract Submission Deadline: 15th May 2008. Early Bird Registration opens: 12th May 2008
AHM 2008 is the principal e-Science meeting in the UK and brings together researchers from all disciplines,
computer scientists and developers to meet and exchange ideas. The meeting is in its seventh year and
normally attracts between 500 and 600 participants. The theme for this year’s meeting is Crossing Boundaries:
Computational Science, E-Science and Global E-Infrastructures. The appointment of Professor Peter Coveney
(UCL) as Programme Chair heralds a new approach. This year, for the first time, key papers will be published
in two back-to-back editions of Philosophical Transactions of the Royal Society A, in the early part of 2009,
with the title Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructures. One of the
central aims of this year’s meeting is to promote the domain specific applications aspects of e-Science, as well
as building bridges between the three communities of the theme title.
The general format of the meeting will include cross-community symposia (kicked off by invited key speakers)
and workshops. The workshops are being championed by Programme Committee members in what are
considered to be key areas of e-Science that need to be addressed, rather than by a call for workshops as has
been done in the past. There will also be opportunities to present 20 minute talks. The proposed workshops
include:
• Delivering Grid Services - the role of Central Computing Services
• Infrastructure Provision for ‘Grids’: Infrastructure for Users
• Software Development for Scientific Applications: Current and Future Perspectives
• Information Assurance for the Grid: Crossing Boundaries between Stakeholders
• Frontiers of High Performance and Distributed Computing in Computational Science
• Interactive e-Science to Support Creativity and Intuition in Research
• HPC Grids of Continental Scope
• Computational Biomedicine: e-Science from Molecules to Man
• The Global Data-Centric View
• e-Science in the Arts and Humanities: Early Experiments and Systematic Investigations
• Profiling UK e-Research: Mapping Communities and Measuring Impacts
We are therefore calling for abstract submissions for:
1. General papers which are not particularly attached to a workshop.
2. Workshop papers related to the above workshops.
Further details about the workshops and important information about the submission and review process,
including guidelines for authors, can be found at: http://www.allhands.org.uk/2008/programme/call.cfm
Enquiries: Please address any enquiries about abstract submission to: admin@allhands.org.uk
On the Road with NGS
The NGS and OMII-UK held the first National Grid Service (NGS) Roadshow in Southampton on the 10th of April.
Katie Weeks kicked off the presentations with an introduction to the National Grid Service and an overview of the
services that NGS offers. This was followed by a presentation on the training available in conjunction with the
Training, Outreach and Education (TOE) team from NeSC, including courses tailored specifically for individual
research groups.
Andrew Price from the University of Southampton gave a presentation on environmental simulations on the NGS,
focused on the GENIE project (http://www.genie.ac.uk/). The GENIE project aims to develop a Grid-based computing
framework to produce a unified Earth System Model (ESM). Dan Adams from the University of Southampton
Information Systems Services (ISS) presented the Microsoft HPC resources that are available on the NGS.
The Spitfire-B Cluster is based on the Microsoft Compute Cluster Server 2003. The cluster allows various industrial
and academic users to work on their problems and codes in collaboration with academics at Southampton on an HPC
platform, enabled by Windows Compute Cluster Server. Further information on the Windows resources is available at:
http://www.ngs.ac.uk/sites/southampton/.
The final presentation of the day was from the local OMII-UK operations manager, Tim Parkinson, who gave an
overview of OMII-UK, the services they can offer users and how OMII-UK and the NGS work together.
The roadshow finished off with lunch for all participants as they browsed the NGS and OMII-UK exhibition stands.
Many of the participants took advantage of the new system for applying for a grid certificate at the roadshow, which
enables users to leave with a grid certificate on a USB key.
There are more photos from the day at: http://www.flickr.com/groups/uk_ngs/
If you are interested in hosting an NGS roadshow, then please contact Gillian Sinclair at:
gillian.sinclair@manchester.ac.uk
Katie Weeks presenting at the first NGS Roadshow in Southampton.
Photo courtesy of Gillian Sinclair
AHM 2008 National Grid Service Workshop
The National Grid Service (NGS) is hosting a workshop at the annual e-Science All Hands Meeting which will be
held at the University of Edinburgh on the 8th – 11th of September. The workshop, organised by Andrew Richards,
Gillian Sinclair, Katie Weeks and Claire Devereux, is entitled Infrastructure Provision for Grids, Infrastructure for
Users. It will focus on grid support centres, user outreach, portals and user applications, end user experiences of
grid computing, subject-specific requirements from the research community and the problems of recruiting users
to grid initiatives. Full details can be found at http://www.allhands.org.uk/. The deadline for paper submissions is
1st May 2008.
Conference Round-up
The NGS has been busy attending several
conferences over the last month.
The NGS has exhibited at the Molecular
Graphics and Modelling Society Spring
meeting in Cardiff, the NERC environmental
e-Science meeting in London, the JISC
annual conference in Birmingham, the RSC
Theoretical Chemistry Group graduate student
meeting at the University of Manchester and
BioSysBio 2008 at Imperial College London.
Photos from all these conferences can be
found at:
http://www.flickr.com/groups/uk_ngs/
Presentations given by NGS staff at these
meetings will be available soon on the NGS
website.
The NGS exhibition stand at the JISC conference.
Photo courtesy of Gillian Sinclair.
An Introduction to Writing Ontologies in the Web Ontology
Language (OWL)
21 – 22 May 2008
e-Science Institute, 15 South College Street, Edinburgh, EH8 9AA
The aims of this workshop are to:
• Understand the use of ontologies.
• Understand statements written in OWL.
• Understand the role of automatic reasoning in ontology building.
• Build an ontology and use a reasoner to draw inferences based on that ontology.
• Gain experience in the Protégé 4 ontology building environment.
• Gain insight into how OWL can play a role in semantic metadata.
This two-day, introductory, ‘hands-on’ workshop aims to provide attendees with both the theoretical foundations and
practical experience to begin building OWL ontologies using the latest version of the Protégé-OWL tools (Protégé 4).
It is based on Manchester’s well-known “Pizza tutorial” (see http://www.co-ode.org).
This tutorial will cover the main conceptual parts of OWL through the hands-on building of an ontology of pizzas and
their ingredients. A series of exercises takes attendees through the process of conceptualizing the toppings found on
a pizza, the entry of this classification into the Protégé environment, and the description of many types of pizza. All
this is set in the context of using automatic reasoning to check the consistency of the growing ontology and to use
the reasoner to make queries about pizzas. Since 2003, this tutorial, in various forms, has been given over 20 times
and been attended by hundreds of budding ontologists.
For registration and more details see http://www.nesc.ac.uk/esi/events/895/.
The Science of Scholarship by Iain Coleman
Science is easy. You get nice clean data files, with measurements, times and locations all lined up in neat little rows,
and a cornucopia of mathematical methods and visualisation tools to help you make sense of it all. What luxury this
must seem to the humanities scholar, struggling to synthesise some new understanding from scribbled documents,
fragmentary records and partial accounts, all expressed in the infinitely rich, mutating forms of natural language. And
how enviously they must have looked upon the emerging technologies of e-Science, allowing unprecedented levels
of collaboration and knowledge exchange, built on the well-defined quantitative data sets of natural science.
So when scholars started to take up the tools of e-Science for their own purposes, it’s not surprising they had to do
some serious customisation. The theoretical and practical issues encountered in the creation of digital scholarly texts
were examined in The Marriage of Mercury and Philology: Problems and Outcomes in Digital Philology, a workshop
held at the e-Science Institute on 25-27 March.
Markup languages, particularly XML, are enjoying eager use in the humanities. These can be used to characterise
a text in a rich, yet still machine-readable, way. The MOM Archive, presented by Georg Vogeler (Ludwig-Maximilian
University, Munich) and Mirko Gontek (University of Cologne) is an attempt to bridge the gap between abstract data
models and practical scholarly work. Individual words in the underlying data can be tagged to identify names of
places and people, types of statement, and so on. The text can also be annotated, and the annotations themselves
tagged, building up a multilayered, complex structure from relatively simple components. This system is available on
a multilingual website, and users can annotate texts to create their own critical editions, according to a system that
preserves technical consistency and reusability. Whether this private edition is added to the public archive is decided
by an expert moderator – this human intervention is essential to maintaining scholarly standards.
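To make the idea of tagged words and layered annotations concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`charter`, `persName`, `placeName`, `note`, `by`, `type`) and the sample sentence are invented for illustration; they are not taken from the MOM Archive's actual schema or data.

```python
import xml.etree.ElementTree as ET

# An invented fragment: names of people and places are tagged inline,
# and an editor's annotation is itself tagged with its author and type.
SAMPLE = """<charter id="example-1">
  <text>Granted by <persName>Ada</persName> at <placeName>Northtown</placeName>
  to the abbey of <placeName>Southfield</placeName>.</text>
  <note by="editor-a" type="dating">Date inferred from the witness list.</note>
</charter>"""


def tagged_terms(xml_source, tag):
    """Return the text of every element with the given tag, in document order."""
    root = ET.fromstring(xml_source)
    return [element.text for element in root.iter(tag)]


def annotations(xml_source):
    """Return (author, type, text) for each annotation in the fragment."""
    root = ET.fromstring(xml_source)
    return [(n.get("by"), n.get("type"), n.text) for n in root.iter("note")]


if __name__ == "__main__":
    print(tagged_terms(SAMPLE, "placeName"))
    print(tagged_terms(SAMPLE, "persName"))
    print(annotations(SAMPLE))
```

Because the markup is machine-readable, the same fragment can feed an index of places, a list of people, or a view of one editor's notes without any change to the underlying text.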
A more specialised and elaborate markup system can be seen in CLELIA, a project to create a unified, electronically
editable collection of Stendhal’s manuscripts. Thomas Lebarbé (Université Stendhal - Grenoble 3) described how
existing Stendhal databases are scattered, poorly structured, poorly interfaced and difficult to publish online due
to the use of proprietary software. The challenge for CLELIA is to take manuscript pages, which may carry many
different kinds of marking, from the body of the text to date headers, corrections, notes and marginal
scribbles, and structure all this data in an easily usable form. By breaking down the concept of a “page” into its
component parts, and building this structure into a database with a usable interface and XML transcription support,
the project aims to create a comprehensive web-based archive that can be used by a wide range of users – scholars,
teachers and linguists – in many different ways. This has been a multidisciplinary effort, involving computer
scientists, literature specialists, and computational linguists: a demonstration version is now available, with the site
officially going live in September.
These digital editions mark a qualitative change in how scholars and lay readers interact with text. The concept of
a critical edition is central to the humanities, and until now these have always existed as written or printed texts.
An electronic edition is more than just an e-book: it is a scholarly environment that comes with tools allowing you
to explore it, modify it, and connect it with other information and analysis. Federico Meschini (De Montfort) called
for an approach to digital scholarship that would maximise interaction and interoperability, by developing a set of
specialised individual tools with well-defined interfaces, rather than trying to have a single do-it-all solution: Lego
bricks rather than a sonic screwdriver. In this way, technology and culture can work in partnership, with readers able
to replicate an editor’s research, interact with an edition and even contribute to it.
Sadly, there are institutional structures within the humanities that can get in the way of this ideal of digital scholarship.
A current example was provided by Peter Robinson (Birmingham). An excellent scholarly treatment of the Domesday
Book, with every word of the text annotated and classified, is virtually inaccessible because of commercial and
licensing restrictions. Robinson outlined the institutional structures in research councils and libraries that lead to this
frustrating state of affairs, and proposed a programme of reform to move away from the model of humanities data
centres and towards empowering individual scholars to create and revise digital editions, with libraries making the
editions available.
Finally, Manfred Thaller touched on the emerging relationship between the humanities and computer science.
Scholars certainly need the tools that computer science can provide, but what do the computer scientists get out
of the deal? Perhaps an answer lies in the global data explosion. To a scientist accustomed to clean, well-ordered
information, the internet now seems like a cacophony. Social networking, geographic tagging, commercial, social and
political transactions – all human life is there, in all its glorious disorder. And this is exactly the kind of data scholars
have been working with for millennia. Perhaps now is the ideal time for the humanities to show the sciences how it’s
done.
Slides from this event can be downloaded from: http://www.nesc.ac.uk/esi/events/854/
Forthcoming Events Timetable
May
21: Symposium on Provenance in Databases (eSI) http://www.nesc.ac.uk/esi/events/894/
21 - 22: An Introduction to Writing Ontologies in the Web Ontology Language (OWL) (TOE) http://www.nesc.ac.uk/esi/events/895/
27 - 30: Building data grids with iRODS (NeSC) http://www.nesc.ac.uk/esi/events/866/
June
3 - 4: Tutorial on Trusted Computing for eScientists (eSI) http://www.nesc.ac.uk/esi/events/886/
9 - 10: Real-time qPCR Biostatistics and Gene Expression Profiling (NeSC) http://www.nesc.ac.uk/esi/events/890/
17: HPCx 6th Annual Seminar (eSI) http://www.nesc.ac.uk/esi/events/891/
18: Novel Parallel Programming Languages for HPC (eSI) http://www.nesc.ac.uk/esi/events/892/
July
2: e-Science Directors’ Forum (NeSC) http://www.nesc.ac.uk/esi/events/870/
17: EPSRC/TSB e-Science Projects Meeting (NeSC) http://www.nesc.ac.uk/esi/events/888/
31: PASTA Workshop 2008 (eSI) http://www.nesc.ac.uk/esi/events/889/
Call For Papers: 4th International Digital Curation Conference
Radical Sharing: Transforming Science?
In partnership with the National e-Science Centre, the Digital Curation Centre (DCC) is holding the 4th International
Digital Curation Conference on 1-3 December 2008 at the Hilton Grosvenor Hotel in Edinburgh, Scotland.
The DCC invites the submission of full papers, posters and demos from individuals, organisations and institutions
across all disciplines and domains engaged in the creation, use and management of digital data, especially those
involved in the challenge of curating data in e-Science and e-Research.
The closing date for papers is 25 July 2008.
Full details available at: http://www.dcc.ac.uk/events/dcc-2008/
This is only a selection of events that are happening in the next few months. For the full listing go to the following
websites:
Events at the e-Science Institute: http://www.nesc.ac.uk/esi/esi.html
External events: http://www.nesc.ac.uk/events/ww_events.html
If you would like to hold an e-Science event at the e-Science Institute, please contact:
Conference Administrator,
National e-Science Centre, 15 South College Street, Edinburgh, EH8 9AA
Tel: 0131 650 9833 Fax: 0131 650 9819
Email: events@nesc.ac.uk
This NeSC Newsletter was edited by Katharine Woods. Layout by Jennifer Hurst.
email kwoods1@nesc.ac.uk
The deadline for the June 2008 Newsletter is: 23rd May 2008