533577047
Page 1 of 20
Progress towards a Holistic Web: Integrating OpenSource
programs, Semantic data, Wikis and Podcasts
Henry S. Rzepa and Marion E. Cass
A Contribution from Imperial College London and Carleton College, USA.
Abstract
The way educators typically use the Web to support their teaching in 2006 is arguably a regression from many
of the ideals first anticipated in 1994. Time pressure, a reluctance to learn "difficult HTML", and pressure from
the publishing industry have allowed the Web to retreat into "shrink-wrapped" black holes known as Acrobat
files. An ever greater reluctance (by both authors and publishers) to appreciate the importance of deploying
meta-data in a meaningful manner means that most often, these Acrobat files represent the bones lying in an
information graveyard, stripped of any "reusability" and really fit only for printing (e-books have yet to take off
in any significant sense). In our article (we hesitate to perpetuate the above by calling it a "paper"!) we discuss
two particular themes. Firstly, how a greater emphasis on data capture and its re-usability, together with the use
of opensource software such as the remarkable Jmol, can result in a much more meaningful and future-proofed
way of presenting chemical knowledge to students. We illustrate this via two resources, one designed to
introduce symmetry to chemistry students, the other a dynamical introduction to pseudorotation in fluxional
molecules. Secondly, we address the issue of how to create holistic resources and to overcome the reluctance of
stressed and pressured academics by discussing two recent phenomena, that of the "Wiki" and the "Podcast".
Wikipedia is perhaps the best known illustration of how a community can coalesce and produce something
far greater than the sum of its parts.1 Podcasting, which seems to be taking off in chemistry, focuses on audio
and video content, but seems divorced from other forms of content, and is currently rather less than holistic.
Currently, these two broad themes about how the Web should evolve are more or less developing
independently. The prospects of coalescence are discussed.
Introduction. Is the current state of the Web what was intended: an Accumulation of Acrobat?
What exactly has the "Web" evolved into by 2006? It's a difficult question, since it now means so many things to
so many people. From a student's perspective, it is frequently synonymous with "Google", and very possibly with a
feeling of frustration at being unable to find high-quality, "trusted" or relevant information amongst the
overwhelming mass of trivia. Google may also offer a convenient way of avoiding a rigorous search of the available
scientific databases (perhaps databases of reviewed scientific materials). From many an educator's point of
view, we suspect the Web is simply a convenient avoidance of the photocopier. They see it as providing an
alternative and apparently easier mechanism for the delivery of "shrink-wrapped" printable lecture-sized
nuggets of otherwise conventionally authored notes which accompany (but which must not compromise) the
still traditional 50 minutes of oral lecture delivery. The motivation for the lecturer is to minimise the amount of
time spent in "maintenance of the lecture". This means adopting mechanisms which require a minimal
learning curve, and which incur little or no subsequent maintenance.
How does many a time-pressured educator approach using the Web in 2006? Coerced by most publishers into
learning the shrink-wrapping technology of Acrobat (the collective noun for which is an accumulation) and
receiving this back in turn from essentially all Web-based scientific journals, it becomes easy to be lulled into
thinking that re-use of this expertise for producing teaching and learning materials for students is an effective
use of an educator's time. Implicit is the assumption that students too will benefit from being introduced to
this system at an early stage, both as consumers and, in turn, as authors. Before moving on to consider how these
objects are used, we would like to ask the reader to consider whether they have ever seen, or contemplated
adding to, the menus shown in Figure 1 as part of the process of producing an Acrobat file. These relate to
"metadata", a topic that will be returned to in more detail below (and which in colloquial terms is now referred
to as the Next Intel Inside).
Figure 1: Menu pages in Acrobat 7.0 Professional relating to document information and meta-data for (a) a
typical journal article, (b) with the additional metadata opened and (c) with the advanced selection inspected. The
values (or lack of values) for the various fields are exactly as found for a typical journal article.
Consider next the inevitable consequence of being bombarded by the Acrobat cannonball. As it happens,
delivery of such an object often bypasses the Web altogether, being emailed by individual lecturers to students
directly, or via intermediate tutors (a mechanism particularly favored by colleagues in the department of one of
the present authors). A little more arduously, the material will be posted to a content management system such
as WebCT (perhaps more accurately described as a document management system if the content is largely still
managed by Acrobat), in order to provide greater context for each document. The CMS also serves to provide
an access control (or, in modern parlance, a digital rights management) mechanism, with a parochial
navigation tree for students to find their way to a document and its relatives. The net outcome of this use of the
Web (or of email) is that the student acquires a copy of the Acrobat file in their own document space,
committed from thence more often than not to a printer. Other Acrobat files come from sources such as
journals, expedited nowadays by the recent introduction of digital object identifiers (DOIs).2 These provide
1-click access if embedded in Acrobat files or Web pages as a hyperlink. An example is given here:3
http://dx.doi.org/10.1021/ci010082q (note however that you will need valid access to the journal to acquire the
reprint). The outcome is an increasingly large, and quite possibly disorganized collection of these objects (one
author speaks purely from personal experience here!). During a typical degree, a student might nowadays
accumulate perhaps 1000 such files; a researcher may have many more, perhaps 5000+.
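Turning DOIs into 1-click links is mechanical enough to automate across such a collection. The Python sketch below is our own illustration, not part of any DOI tooling: it finds DOI-like strings in free text and converts them into resolver URLs of the form quoted above. The regular expression is an assumption modelled on the pattern commonly suggested for matching most modern DOIs ("10." + registrant code + "/" + suffix); it will miss some older or unusual identifiers, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# Assumed pattern: matches most modern DOIs, not every legacy form.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

def find_dois(text):
    """Return DOI-like strings found in free text, minus trailing sentence punctuation."""
    return [match.rstrip(".,;:") for match in DOI_PATTERN.findall(text)]

def doi_links(text, resolver="http://dx.doi.org/"):
    """Turn every DOI found in `text` into a resolver URL."""
    return [resolver + quote(doi, safe="/()") for doi in find_dois(text)]

print(doi_links("A representative article is 10.1021/ci010082q."))
# prints ['http://dx.doi.org/10.1021/ci010082q']
```

The same identifiers also resolve via https://doi.org/, which can be passed as the `resolver` argument.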
How does this collection represent the facts and knowledge pertaining to the bulk of a degree course, in say
chemistry? The assemblage has some of the following properties:
• The Acrobat files are likely to be distributed across a number of computer hard drives, memory sticks
and mobile devices such as iPods. Aggregating and reconciling these is not a trivial task.
• The files are unlikely to have an obvious, consistent, or even meaningful naming convention.
• It is very unlikely that any document metadata of the kind illustrated in Figure 1 will have been added (the
document illustrated above dates from 2002, but inspection of 2006-vintage examples produces
very similar results). Thus it is not obvious how even simple questions such as "is this document about
chemistry?" can be answered without opening each document in a suitable reader and having a human
read it. A "mouse-over" on an unopened document merely reveals that it is an Acrobat document. A
"right-click" to reveal the document properties (Figure 2) merely shows the same fields which were left
undefined in Figure 1.
Figure 2: Document properties panes from Windows XP, illustrating known metadata for a document.
One also notes that such metadata relates to the document as a whole; any finer-grained structure which the
document may contain (such as reference to discrete molecular species) cannot be captured at this level.
• Any relationships between two or more documents (such as one document citing the other, or two
documents each covering the same concepts or terms) are only really definable via the keywords or title
metadata fields, and even then without any real context. To do better, one would have to read and understand each
document.
• Chemical relationships (defined, say, by having one or more molecules in common) may be equally
difficult to detect, even if individual documents are visually inspected, since molecules may not be
represented in a consistent, and hence comparable, manner. Thus some may merely be named, either
trivially or systematically, others may be drawn as unparsable line diagrams, and yet others represented
generically using e.g. the Markush R conventions to denote a variable group.
• Conceptual relationships (in chemistry), the so-called synoptic or holistic view of a subject, represent the
ultimate challenge in perception. Normally, students would rely on a carefully and lucidly written
textbook to achieve this sort of perspective across a subject. The issue with textbooks, or other static
organizations of knowledge, is that they are not readily extensible, and new information cannot so easily
be related to existing organized knowledge. Lecture handouts, it is true, can have an appended reading
list, this indeed being part of the function of a lecturer giving a course, but it is still hard work actually
making the semantic connections between diverse sets of materials.
Perhaps a chemical analogy to this state of affairs can be drawn. The hard drive would be equivalent to a
container filled with argon gas. Being "inert", the individual atoms of this gas hardly interact with one another,
and apart from a count of the total number of atoms, little further structure to the collection can easily be
discerned; molecules are certainly unlikely to form under these conditions, and nor will it be easy to crystallize
this collection into something with a well defined structure and internal relationships.
Partial solutions to some of these issues do exist. Thus:
• One might rely on the likes of Google (or the rather more selective Google Scholar4) to perform a free-text indexing of the content of any information object at its original site of publication, but, as noted
above, with the limitation that heavy contamination with less relevant sources is likely. It is also unlikely that
the Google robot would be allowed to access protected content in a content/document management
system, which probably has its own parochial index and search facility; the latter could not achieve the
holistic overview required.
• If the content can be consolidated onto a single computer hard drive, "desktop searching tools" can now be
acquired to index the free-text contents, and thus to establish some sort of holistic relationships
between documents from diverse sources. Such relationships will however rely on finding unique and
common descriptors of suitable concepts in the body of the document. In the example we illustrate
below, one might find terms like Berry pseudorotation and non-dissociative ligand exchange which
describe the same concept, but in fact have no words in common.
• Higher-order concepts, for example those requiring an understanding of three dimensions, may be definable
only with great difficulty. Describing, uniquely, the actual processes involved in a Berry pseudorotation
using only simple words and definitions is quite a challenge! In our article10 we introduce two other
modes for this process, the "lever" and the "turnstile" modes, and, just to complicate matters, show how
some systems can have attributes of all three. If you are not familiar with these terms, you are unlikely
to understand what is meant by them from reading just the preceding text. An indexing/search system
might make a connection between these various concepts, but the outcome is likely to be quite variable
in general, and may depend on how effective any dictionary of synonyms available to the indexing
engine is.
• It is possible to achieve greater structure in a document collection using bibliographic tools such as
Reference Manager or EndNote, but this requires a great deal of effort on the part of the user. The lack
of self-identifying metadata, as noted above for a typical document, makes this process relatively manual
and arduous, and one few students would assiduously adopt over a degree course.
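The dependence on a synonym dictionary can be made concrete with a toy inverted index. The Python below is a hypothetical sketch, not a description of how any particular desktop search tool works: it indexes the words of each document and can optionally expand a query using a hand-written synonym table. Without that table, a search for "Berry pseudorotation" cannot find a document that only speaks of non-dissociative ligand exchange. The document names and the synonym entry are invented for illustration.

```python
import re
from collections import defaultdict

def tokens(text):
    """Lower-case word tokens; hyphenated terms are split into their parts."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Inverted index mapping each word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokens(text):
            index[word].add(name)
    return index

def search(index, phrase, synonyms=None):
    """Documents containing every word of `phrase`, or of any phrase the synonym table equates with it."""
    phrases = [phrase] + (synonyms or {}).get(phrase.lower(), [])
    hits = set()
    for p in phrases:
        words = tokens(p)
        if words:
            hits |= set.intersection(*(index.get(w, set()) for w in words))
    return hits

documents = {
    "handout.pdf": "Berry pseudorotation in PF5",
    "review.pdf": "Non-dissociative ligand exchange in five-coordinate phosphorus compounds",
}
index = build_index(documents)
synonyms = {"berry pseudorotation": ["non-dissociative ligand exchange"]}  # hypothetical, hand-written
print(sorted(search(index, "Berry pseudorotation")))            # ['handout.pdf']
print(sorted(search(index, "Berry pseudorotation", synonyms)))  # ['handout.pdf', 'review.pdf']
```

The quality of the result rests entirely on who writes the synonym table, which is precisely the variability described above.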
Data is key to the Holistic Web (Data is the new Intel Inside)
Metadata: The Dublin Core
The emptiness illustrated via Figures 1 and 2 emphasizes how neglected metadata is. Simple fields such as
document title, author names, dates, and keywords are all too frequently omitted from all kinds of documents.
This may be because the authors of such documents regard these fields as self-evident to anyone reading them.
This misses the point entirely. If indeed one is in possession of 1000, or even worse 5000, documents, merely
opening each of them would take almost 28 hours in the latter case (assuming 20 seconds per document). As it happens,
computers are rather good at doing this automatically; appropriately declared metadata is simply a standard
mechanism for exposing this information to a computer (rather than just a human), thus enabling more efficient
subsequent indexing and retrieval. The so-called Dublin Core (DC) metadata schema ensures cross-compatibility across a diversity of programs and documents. To drive this point home, we here declare our
manifesto, the first point of which is:
1. Wherever possible, meta-data based on the Dublin Core should be declared in any document
circulated to students.
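As a minimal sketch of what declaring Dublin Core metadata buys, the Python below reads DC elements written in the common HTML convention of `<meta name="DC.title" content="...">` (with a `schema.DC` link pointing at the DC element set) and answers the question posed earlier, "is this document about chemistry?", without a human opening the file. The function names and the sample page are illustrative assumptions; only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser

class DublinCoreReader(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> values; repeated elements (e.g. DC.subject) keep every value."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            element = name[3:].lower()  # "DC.Subject" -> "subject"
            self.metadata.setdefault(element, []).append(attributes.get("content") or "")

def dublin_core(html_text):
    reader = DublinCoreReader()
    reader.feed(html_text)
    reader.close()
    return reader.metadata

def is_about(html_text, topic):
    """True if any declared DC.subject mentions `topic` (case-insensitive)."""
    return any(topic.lower() in subject.lower()
               for subject in dublin_core(html_text).get("subject", []))

page = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="Molecular symmetry exercises" />
<meta name="DC.subject" content="chemistry" />
<meta name="DC.subject" content="molecular symmetry" />
</head><body>...</body></html>"""

print(dublin_core(page))
print(is_about(page, "Chemistry"))  # True
```

Run over a folder of pages, a loop of such calls replaces the hours of opening documents estimated above.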
Molecular Metadata: InChI
Unlike the bibliographically derived DC metadata schema, general chemical metadata schemes are relatively
underdeveloped. However, one particular type is worth noting at this stage: the so-called InChI identifier.5 This
International Chemical Identifier enables a unique descriptor to be derived for a molecule, based primarily on how
the atoms are connected (without specifying the type of bond that connects them), with additional layers for
features such as stereochemistry. Although it is difficult to be
precise, we estimate that in a typical chemistry degree, students may come across around 1000 unique molecules
exemplifying some aspect of their lectures or laboratories. Many of these probably recur in more than one
context. If each of these were to be labeled with an InChI, then detecting instances of each molecule would be
greatly facilitated. An InChI string can be embedded into documents in various ways. Thus in an HTML page, it
might appear as follows:
<link type="chemical/x-cml" rel="meta"
href="http://www.ch.ic.ac.uk/local/symmetry/symmetry/exercises/12GW98.cml"
title="InChI=1/CHBrClF/c2-1(3)4/h1H/t1-/m1/s1 for (S)-CHBrFCl" />
or
<link type="application/rdf+xml" rel="meta"
href="http://www.ch.ic.ac.uk/rzepa/bpr/YX5.rdf"
title="InChI identifiers for YX5 species" />
This latter approach was adopted in adding molecular meta-data to our symmetry exercises web resource,
described below.
2. Wherever possible, molecular metadata based on the InChI identifier should be declared in any
document circulated to students.
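Once molecules carry InChI labels, spotting the same molecule across documents reduces to string matching. The sketch below is a hypothetical Python illustration, not part of the InChI software: it pulls InChI strings out of arbitrary text (for instance the `title` attributes shown above), splits one into its formula and prefixed layers, and reports which molecules occur in more than one document. The regular expression is a simplifying assumption that an embedded InChI ends at the first whitespace, quote or angle bracket; the document names are invented.

```python
import re
from collections import defaultdict

# Assumption: an embedded InChI runs until whitespace, a quote or an angle bracket.
INCHI_PATTERN = re.compile(r'InChI=1S?/[^\s"<>]+')

def inchi_layers(inchi):
    """Split an InChI into its formula plus prefixed layers (c = connections, h = hydrogens, t/m/s = stereo, ...)."""
    body = inchi.split("/", 1)[1]          # drop the "InChI=1" or "InChI=1S" version prefix
    formula, *rest = body.split("/")
    layers = {"formula": formula}
    for layer in rest:
        layers.setdefault(layer[0], layer[1:])  # keeps the first occurrence if a prefix repeats
    return layers

def shared_molecules(documents):
    """Map each InChI found in more than one document to the sorted list of those documents."""
    index = defaultdict(set)
    for name, text in documents.items():
        for inchi in INCHI_PATTERN.findall(text):
            index[inchi].add(name)
    return {inchi: sorted(names) for inchi, names in index.items() if len(names) > 1}

documents = {
    "symmetry.html": '<link rel="meta" title="InChI=1/CHBrClF/c2-1(3)4/h1H/t1-/m1/s1 for (S)-CHBrFCl" />',
    "lecture3.html": "Chiral centres: InChI=1/CHBrClF/c2-1(3)4/h1H/t1-/m1/s1 revisited.",
    "notes.html": "No labelled molecules here.",
}
print(shared_molecules(documents))
print(inchi_layers("InChI=1/CHBrClF/c2-1(3)4/h1H/t1-/m1/s1")["c"])  # 2-1(3)4
```

Comparing only the formula and connection layers would, in the same way, group stereoisomers of one constitution together.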
Metadata: Resource Description Framework
The type of metadata described above is a relatively one-dimensional descriptor; it merely states that
somewhere in the document to which it relates one can find, e.g., a particular type of data or information. It provides
little more context than this. A much more powerful form of metadata has been developed, known as RDF, or
Resource Description Framework, in which each statement has three components. These are known as the
subject, the predicate and the object; the predicate makes an assertion relating the subject to the object. The
predicate imparts a powerful context to the
information, and both subject and object can exist either within a document, or independently of it. More
interestingly, the subject of one assertion can be the object of another. In theory at least, such a declared
mechanism could provide a powerful method of creating links between different types of information; a process
which has been referred to as intertwingling. Until now, such mechanisms have hardly been exploited in
chemistry, but they could form the basis for one holistic approach to the subject. For example, close inspection
of Figure 1(c) reveals an entry referred to as XMP Core properties. This is the way in which Acrobat can contain
an RDF declaration (each statement being known technically as a triple). The purpose of carrying it in this manner is that it can now
be queried by any system wishing to find out about the document properties; it is self-identifying metadata. To
return to the metaphor of a container of argon gas, RDF can introduce strong intermolecular
interactions! Perhaps even bonding. The collection now becomes much easier to crystallize. For further detail of
such a system, see reference 6.
3. In future, relationships between information expressed as RDF or other information triples should be
declared in any document circulated to students.
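To show why subject/object chaining matters, here is a deliberately tiny in-memory triple store in Python. It is a sketch under stated assumptions: the predicate names (`describes`, `isMechanismOf`) and the resource names are hypothetical rather than taken from any published vocabulary, and a real system would use URIs and a proper RDF library. Because "Berry pseudorotation" is the object of one statement and the subject of another, a simple graph walk links two documents that, as noted earlier, share no words.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical statements: predicate and resource names are illustrative, not a published vocabulary.
TRIPLES = [
    Triple("symmetry-exercises.html", "describes", "InChI=1/CHBrClF/c2-1(3)4/h1H/t1-/m1/s1"),
    Triple("bpr-article.html", "describes", "Berry pseudorotation"),
    Triple("Berry pseudorotation", "isMechanismOf", "non-dissociative ligand exchange"),
    Triple("lecture-notes.pdf", "describes", "non-dissociative ligand exchange"),
]

def objects(triples, subject, predicate):
    """All objects asserted for (subject, predicate)."""
    return [t.object for t in triples if t.subject == subject and t.predicate == predicate]

def subjects(triples, predicate, obj):
    """All subjects that assert `predicate` of `obj`."""
    return [t.subject for t in triples if t.predicate == predicate and t.object == obj]

def related(triples, start):
    """Every resource reachable from `start` by following statements in either direction."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for t in triples:
            for here, there in ((t.subject, t.object), (t.object, t.subject)):
                if here == node and there not in seen:
                    seen.add(there)
                    frontier.append(there)
    return seen - {start}

print(sorted(related(TRIPLES, "bpr-article.html")))
# ['Berry pseudorotation', 'lecture-notes.pdf', 'non-dissociative ligand exchange']
```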
Data and the Means to display it
What better means to complement metadata than to have access to the actual data itself! Such data can be
merely referenced within the document, or it can become an integral part of the document itself, in which latter
case we would use the term datument instead. Most participants at this conference are entirely familiar with
such concepts, which date back to 1994 when they were first introduced7 and which have no doubt been the
subject of many previous articles and discussions at this forum. The particular topic of the use of XML in this
context will be deferred to the discussion of podcasting below. Here the focus will be on how data can be used,
on the presumption that operations rather more complex than merely printing are now required. At this point we leave
the realm of Acrobat almost entirely, and enter that of the (hyper)active, re-usable document.
Molecular Structures and Animations + OpenSource: Jmol and Molecular Symmetry Exercises
Several of the themes introduced above can be illustrated via our project on molecular symmetry. The
premise is that this topic is particularly difficult to teach without the help of three dimensions, either real or
virtualized by movement (or indeed stereoscopic computer visualization). Clearly, something must be
supplied in addition to the data and its context that comprise a datument, namely some suitable
software to visualize the combination. Enter the remarkable Jmol opensource software. This owes its origins to
two software packages. The first was called Xmol, which had been developed in the 1980s to allow users of
national supercomputer centers to visualize the results of their molecular calculations. The second was called
Rasmol, and it had been developed around 1990 to visualize the results of protein crystallography. Both were
early examples of OpenSource software, in that their source code was available for new generations of enthusiasts
to adopt. In 1996, Xmol evolved with such a new generation into Jmol (with a new set of developers), this now
being a Java-based reworking of the program; its OpenSource credentials were emphatically retained however.
Rasmol, a little earlier in 1995, had been adopted by a commercial vendor and reworked as a so-called browser
plug-in known as Chime. Significantly, this re-working was never released as OpenSource, and it is noticeable
that whilst Chime has been little enhanced and extended recently, Jmol goes from strength to strength. Our first
effort at producing Web-based symmetry exercises had in fact been based on Chime, but we decided to seize
the opportunity of adopting an OpenSource presentational method and have reworked it using Jmol.
We also took the opportunity to incorporate some of the themes discussed above.8 DC metadata and InChI
identifiers were added so that the molecules used to illustrate the symmetry operations could be readily
identified and indeed indexed and searched for (Figure 3).
Figure 3: Introductory screen for the Molecular symmetry exercises, viewable at
http://teaching.ch.ic.ac.uk/symmetry/
Molecular animations of pseudorotations
Our second example9 arose from a not untypical scenario. A member of the teaching staff in one of our
institutions was taking a sabbatical, and someone else was asked to give the course for one year only. One
component involved discussing with the students a phenomenon known as pseudorotation, along with
associated apparent modes called turnstile rotation, and a further obscure mode referred to as an umbrella
motion. All they were given by way of explanation was a roughly sketched diagram of the former, and a reference
to a journal article for the latter. The surrogate lecturer approached us asking whether we could devise a Web
page to illustrate these three effects for students (and actually for the lecturer themselves in the first instance,
Figure 4).
Figure 4: One of the interactive figures produced for the pseudorotation article; viewable at
http://www.ch.ic.ac.uk/rzepa/bpr/esi/
We deliberately couched this to have the appearance of a journal article (which it indeed eventually became10), in part
to emphasize to students what a holistic integration of journals and teaching materials might look like. But there
were other outcomes of taking this approach.
1. To gain an accurate representation of the process, we undertook ab initio calculations for our systems.
This held some surprises.
2. Thus for the molecule PF5, we could find no evidence of turnstile rotation, or indeed any umbrella
motion. The latter in particular we think is merely a speculation, for which no explicit examples have
ever been identified. Even the former mode is unproven for PF5.
3. Seeking to generalize the effect for students, we approached IF5. During the course of viewing the Jmol-generated animations of this species, we concluded it approximated to an "upside-down" inverted
version of pseudorotation. We struggled to find a name for this mode, and observed that depending on
the angle the vibration was viewed from, it magically assumed differing characters. Finally, we settled
on "chimeric pseudorotation". Chimeric stems from Chimera, the mythological Greek monster with the
mixed and "monstrous" character of a goat, a snake and a lion. Recently, chimeric has become "a byword
for fabulous and fantastic - but utterly mixed-up - ideas"; in this case, ligand exchange via Berry
pseudorotation, turnstile rotation, and a new mode we called lever rotation. Ligand exchange in IF7 proved
equally challenging to describe using only words. The point of this discourse is that in extending how
the concept of pseudorotation could be presented to an audience of students seeing it for the first time,
we had rapidly discovered new and magical modes which had hardly been described in the chemical
literature (indeed, two well-known textbooks actually made fundamental errors in describing
these phenomena). By presenting the data from calculations of these effects in such an animated manner,
even these quite new and conventionally difficult concepts may become more accessible to students
and staff alike.
4. Further surprises were still in store for us. We found that the related BrF5 could sustain not one but two
distinct modes for interchanging the F atoms. We eventually found a convenient label for these two
modes in a quite different area of chemistry; that of the stereochemistry of Pb(II) compounds. Such
systems exhibit both hemidirected and holodirected coordinations. We had thus joined one aspect of the
stable structures of lead systems with the modes adopted by the fluxional transition states of bromine
compounds. A lot of concepts in chemistry have been covered by the last few sentences. Metadata
mechanisms such as RDF would nicely capture these connections. Whereas we discovered the
connection in part by accident (one of us attended a scientific conference and came across a student
poster describing Pb(II) systems, which, some months later, "clicked" as they say), perhaps in the future
the expression of these concepts as espoused above will allow a more systematic and less accidental
mode of discovery!
5. Convinced that some of these insights were sufficiently novel to be worth publishing as a formal research
article, in our first submission we made the tactical mistake of referring to the pedagogic stimulation for
the discoveries (a referee disparaged the work on this basis). A rewriting which avoided such mention
was more successful; indeed the article has been selected by the journal for "enhancement" in the sense
that the data and resulting animations are now to be incorporated into the mainstream article, rather than
merely being downgraded to "supporting information".10
In this sense, the holistic aspirations of this work have been recognized. However, we look forward with some
anticipation to receiving the Acrobat version of the eventually published article (see Figure 1)! In particular, a
unique feature of the "enhanced" resource is the potential to "re-use" the data contained in it. Figure 5 shows
how 3D coordinates for a molecule displayed using e.g. Jmol can be extracted for use, possibly in another
context. Acrobat is poorly suited for such data-mining, although the presence of meta-data at least indicating its
existence elsewhere might be a workable alternative.
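As one illustration of such re-use, coordinates captured from a viewer in a plain-text format can be fed straight into further calculation. The Python sketch below assumes the simple XYZ layout (atom count, a comment line, then one `element x y z` line per atom) and computes bond angles from it. The geometry is an idealized trigonal bipyramid built by hand for illustration; it is not data from our calculations, and the function names are hypothetical.

```python
import math

def parse_xyz(text):
    """Parse XYZ text: atom count, a comment line, then 'element x y z' per atom."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, (float(x), float(y), float(z))))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return atoms

def angle(a, b, c):
    """Angle a-b-c in degrees, where b is the central atom."""
    u = [p - q for p, q in zip(a, b)]
    v = [p - q for p, q in zip(c, b)]
    cosine = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))  # clamp against rounding error

# Idealized trigonal bipyramid, built by hand for illustration (not calculated data).
XYZ = """6
idealized AX5 trigonal bipyramid
P   0.000  0.000  0.000
F   0.000  0.000  1.600
F   0.000  0.000 -1.600
F   1.600  0.000  0.000
F  -0.800  1.386  0.000
F  -0.800 -1.386  0.000
"""

atoms = parse_xyz(XYZ)
centre, ax1, ax2, eq1, eq2 = (atoms[i][1] for i in (0, 1, 2, 3, 4))
print(round(angle(ax1, centre, eq1), 1))  # 90.0  (axial-equatorial)
print(round(angle(eq1, centre, eq2), 1))  # 120.0 (equatorial-equatorial)
print(round(angle(ax1, centre, ax2), 1))  # 180.0 (axial-axial)
```

Applied frame by frame to an animation, the same two functions would track how these angles change along a pseudorotation pathway.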
Figure 5: Interface for capturing 3D coordinates from a Jmol display.
Towards the Holistic Approach
The Podcast
The metaphor of student learning resources merging with a journal article is one means of achieving a holistic
effect. But this neglects the audio-visual components of teaching with which we are all familiar. We were
intrigued by the possibility of somehow combining the two approaches. A few years ago, the newsfeed (more
formally known as the RSS feed) started becoming popular. In its original incarnation, it carried only text, in the
form of meta-data describing a recently created or changed resource on the Web. This alerting mechanism has
recently been adapted to cover broadcast and audio-visual materials, and in this guise has become known as a
podcast. A few adventurous souls have adapted existing taught courses and recast them as podcasts. Apple
Computer has produced a software solution, iTunes, which allows such podcasts to be captured on a portable
storage device known as an iPod (from which the name derives). They have extended the concept by producing
a customised version of iTunes known as iTunes U which allows its use in a university environment for
podcasting lectures. In effect it is a content management system for audio-visual materials; a selection of
chemistry podcasts available in their "store" is shown in Figure 6.
Figure 6: The podcast page in the Apple iTunes store, filtered using the search term chemistry.
Our entry into Podcasts started with two observations. Firstly, the information in a podcast is defined using a
language known as XML. This is essentially a framework for defining markup languages, generalizing the well-known HTML into more specific
and accurate ways of capturing information. An XML document can contain several XML-based languages,
each distinguished by a so-called namespace. The basic podcast, for example, contains two namespaces; the first
is the RSS 2.0 syntax, which provides mechanisms for linking in basic metadata (title, description, author etc.)
and also defines a way of expressing an <enclosure />, which points to an audio or video file. The second is the
proprietary <itunes: /> component, which handles more specific meta-data (relating to properties which the
iTunes software needs to know about). Importantly, these two components, one "open", the other not, co-exist
quite happily together; neither destroys the other. With this premise, it becomes trivial to add further
namespaces. We added a <cml:molecule>...</cml:molecule> namespace, into which chemistry can be poured. Again,
its presence is quite benign, and it is simply ignored by programs such as iTunes which do not recognise it.
Importantly, it can quietly lurk in the podcast, and when software which does recognise its presence receives the
podcast, it can then spring to life. For good measure, further namespaces such as <mathml: /> (for
mathematical content) and <svg: /> (for graphical content) can be inserted. An example of this can be found at
http://www.ch.ic.ac.uk/video/index.rss.
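The peaceful coexistence of namespaces can be demonstrated in a few lines. The Python sketch below (standard library only) builds a one-item RSS 2.0 feed carrying an audio `<enclosure>`, an iTunes author element and a CML molecule, then reads it back in two ways: as a plain podcast client would, looking only at enclosures, and as a chemistry-aware client would, looking only at `cml:` elements. The namespace URIs are those in common use for iTunes podcasts and CML; the feed title, URL, author and atoms are invented for illustration and do not reproduce the feed cited above.

```python
import xml.etree.ElementTree as ET

ITUNES = "http://www.itunes.com/dtds/podcast-1.0.dtd"
CML = "http://www.xml-cml.org/schema"
ET.register_namespace("itunes", ITUNES)
ET.register_namespace("cml", CML)

def build_feed(title, audio_url, audio_bytes, author, atoms):
    """One-item RSS 2.0 feed: an enclosure (audio), iTunes metadata and a CML molecule side by side."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "enclosure", url=audio_url, length=str(audio_bytes), type="audio/mpeg")
    ET.SubElement(item, f"{{{ITUNES}}}author").text = author
    molecule = ET.SubElement(item, f"{{{CML}}}molecule", id="m1")
    atom_array = ET.SubElement(molecule, f"{{{CML}}}atomArray")
    for n, (element, (x, y, z)) in enumerate(atoms, start=1):
        ET.SubElement(atom_array, f"{{{CML}}}atom", id=f"a{n}", elementType=element,
                      x3=str(x), y3=str(y), z3=str(z))
    return ET.tostring(rss, encoding="unicode")

def enclosures(feed_xml):
    """What a plain podcast client uses; the cml: elements are simply never looked at."""
    return [e.get("url") for e in ET.fromstring(feed_xml).iter("enclosure")]

def molecules(feed_xml):
    """What a chemistry-aware client extracts from the very same feed."""
    root = ET.fromstring(feed_xml)
    return [[atom.get("elementType") for atom in mol.iter(f"{{{CML}}}atom")]
            for mol in root.iter(f"{{{CML}}}molecule")]

feed = build_feed("Pseudorotation in PF5", "http://example.org/episode1.mp3", 1048576,
                  "A. Lecturer", [("P", (0.0, 0.0, 0.0)), ("F", (0.0, 0.0, 1.6))])
print(enclosures(feed))  # ['http://example.org/episode1.mp3']
print(molecules(feed))   # [['P', 'F']]
```

Neither reader needs to know about the other's namespace, which is the property that lets further namespaces such as MathML or SVG be added in the same way.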
The podcast viewed in this manner becomes an interesting vehicle for accurately expressing a variety of types
of information and data in a single document, which can be automatically delivered to either a viewer of some
sort, or a storage device on a regular basis. It can be processed in quite a few ways. Most trivially, it could be
viewed simply in a Web browser. If you would like to view the above in e.g. Firefox, you should be able to see
some of the embedded components (particularly the MathML and the SVG). iTunes would be used to play any
audio and visual components. A system known as Bioclipse11 can extract the molecular content. The list of
applications is potentially endless! A glimmer of how a holistic approach to the Web might be achieved is
starting to be seen.
One new variation on the Podcast rather spoils the above modular approach. Introduced by Apple into its iTunes
software, the enhanced podcast allows visual materials to be integrated into the timeline of an audio stream.
The intent was clearly to enable reproduction of a conventional lecture, which obviously contains these two
components. The mechanism adopted for this is to multiplex the visuals directly into the file containing the
audio components (an MPEG-4 file), and to add a time track to the MPEG-4 file which indicates start and end
timings for each visual component. Each inserted component can also have an associated URL which can point
to further materials. This is illustrated in Figure 7, which shows the editing interface of one program that enables all
this (Podcast Maker).
Figure 7: Interface for creating an enhanced podcast. Each episode contains a single audio file, with defined
timepoints at which visual material (normally a JPEG diagram, but it can also be an Acrobat file) is introduced
as a so-called chapter. Each chapter can have a title which will be displayed, and a hyperlink which can be
displayed in a Web browser.
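The chapter mechanism amounts to a sorted list of start times, each chapter remaining on screen until the next one begins. The Python sketch below models only that timing logic; it does not read or write the MPEG-4 chapter track itself, and the class, field and function names are hypothetical. Kept as plain data alongside the audio, such a list would also leave chapter titles and links available for indexing.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chapter:
    start: float               # seconds from the beginning of the audio
    title: str                 # shown while the chapter is current
    image: str                 # visual displayed from `start` until the next chapter
    url: Optional[str] = None  # optional link to further material

def chapter_spans(chapters, duration):
    """Pair each chapter with its (start, end); a chapter ends where the next one starts."""
    ordered = sorted(chapters, key=lambda c: c.start)
    ends = [c.start for c in ordered[1:]] + [duration]
    return [(c, c.start, end) for c, end in zip(ordered, ends)]

def chapter_at(chapters, t):
    """The chapter on screen at time t, or None before the first chapter starts."""
    ordered = sorted(chapters, key=lambda c: c.start)
    i = bisect_right([c.start for c in ordered], t)
    return ordered[i - 1] if i else None

chapters = [
    Chapter(0.0, "Introduction", "intro.jpg"),
    Chapter(95.0, "Berry pseudorotation", "bpr.jpg", "http://www.ch.ic.ac.uk/rzepa/bpr/esi/"),
    Chapter(240.0, "Turnstile rotation", "turnstile.jpg"),
]
print(chapter_at(chapters, 100.0).title)  # Berry pseudorotation
for chapter, start, end in chapter_spans(chapters, 300.0):
    print(f"{start:6.1f}-{end:6.1f} s  {chapter.title}")
```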
Is the Current Podcast Genre Holistic?
Whilst this relatively simple one step forward for the podcast mechanism is easy to author, and to view in
iTunes, in other ways it also represents one step back. The additional content is buried inside a binary file, and
once in, cannot be extracted out again. Worryingly, it cannot be indexed, and hence searched for. It has no
meta-data of its own. Given this mechanism is very new (having been introduced only in late 2005), we might
presume it is still quite immature, and that some of these failings will in time be addressed. At any rate, as
teachers we have an interesting new way of delivering audio, video, and lecture notes in a smartly integrated
manner, with hooks for attaching chemical content to these.
Let us however inject a controversial note at this stage. A perusal of iTunes U podcasts with alleged chemical
content (April 14, 2006) reveals most of them to be in the "easy to produce" audio-only format, with no visual
or other enhanced content. In one example (we will not identify it), one is first treated to around 3 minutes of
introductory noise from the audience, before a speaker introduces themselves. Thereafter, we can only hear the
sound of chalk being applied to a blackboard for a further minute or so. The only way to find out the content is
to either have the detailed course timetable to hand, or to endure the four minutes of "noise" in the hope of
eventually hearing something worthwhile. A suspicion must be that if this level of (chemical) quality persists,
the podcast will rapidly join other technically clever, but eventually irrelevant technologies in the dustbin of
neglect. There are no "quick and easy" solutions to this issue; one always gets what one pays for (or at least
invests the time in)!
The Wiki as a Collabulary
The final approach to be discussed here is the Wiki (exemplified here by one particular flavour of Wiki,
MediaWiki; other types of Wiki implement some of the features described below slightly differently). In some
ways, this mechanism (a representative of what is often colloquially referred to as Web 2.0) is a substantial
simplification of the original Web concept (which would have to be Web 1.0), and in other ways it is arguably a
welcome return to what Berners-Lee always envisaged as the true objective of the Web. The two main
differences from a conventional Web page are:
1. The basic authoring syntax is simpler than HTML. Whereas the latter has around 60 elements that
carry the basic information, the Wiki reduces these to a smaller number of simpler rules (these can be
summarised in a one-page manual, supplemented by templates, of which, sadly, none exist for
chemistry). There is, however, a smart processor (written in PHP) in the background that interprets these
rules and stores the result in a properly structured (MySQL) database. In HTML, any unclosed or improperly
constructed element will result in a severe error; in a Wiki, a piece of markup either works properly or is
simply not treated as a container, so there is no equivalent of a broken element or container.
2. This simplified approach in turn allows a Wiki to move away from the conventional write-once, read-many
approach to a more symmetric write-many, read-many metaphor. The "write many" can be either the entire
world or, to address institutional concerns about legality etc., a constrained local subset of students and
teachers. For anyone interested, the latter can easily be implemented in a MediaWiki installation that has the
LDAP Authentication extension (LdapAuthentication.php) by adding the following lines, suitably localized,
to the LocalSettings.php file:
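require_once('LdapAuthentication.php');      // load the LDAP Authentication extension
$wgAuth = new LdapAuthenticationPlugin();     // use it as the wiki's authentication plugin
$wgLDAPDomainNames = array( "ic.ac.uk" );     // domain(s) offered on the login form
$wgLDAPServerNames = array( "ic.ac.uk"=>"ldap.ic.ac.uk" );             // LDAP server for each domain
$wgLDAPSearchAttributes = array( "ic.ac.uk"=>"uid" );                  // attribute holding the user name
$wgLDAPBaseDNs = array( "ic.ac.uk"=>"ou=everyone,dc=ic,dc=ac,dc=uk" ); // where in the directory to search
$wgLDAPUseSSL = true;              // talk to the directory over SSL
$wgLDAPUseLocal = false;           // do not also check the wiki's own user database
$wgLDAPAddLDAPUsers = false;       // the wiki may not create accounts in the directory
$wgLDAPUpdateLDAP = false;         // nor write changes back to the directory
$wgLDAPMailPassword = false;       // do not e-mail temporary passwords
$wgLDAPRetrievePrefs = false;      // do not import preferences (e-mail etc.) from the directory
$wgMinimalPasswordLength = 1;      // MediaWiki core setting: minimum password length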
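The claim that Wiki markup has no equivalent of a broken element can be illustrated with a deliberately tiny renderer. The Python sketch below handles only three rules (== headings ==, '''bold''' and ''italic'') and is not MediaWiki's own parser, which is written in PHP and handles far more; it merely shows that markup which fails to match a rule is passed through as ordinary text rather than producing an error.

```python
import html
import re

# A deliberately tiny, hypothetical subset of MediaWiki-style markup.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")


def render_line(line):
    # Escape &, < and > first, so stray characters cannot break the page.
    text = html.escape(line, quote=False)
    m = HEADING.match(text)
    if m:
        level = len(m.group(1))
        return f"<h{level}>{m.group(2)}</h{level}>"
    # Bold must be tried before italic, since ''' contains ''.
    text = BOLD.sub(r"<b>\1</b>", text)
    text = ITALIC.sub(r"<i>\1</i>", text)
    # Unmatched markup (e.g. a lone '') simply survives as literal text.
    return f"<p>{text}</p>" if text.strip() else ""


def render(wikitext):
    """Render each line, dropping blank ones."""
    return "\n".join(filter(None, (render_line(ln) for ln in wikitext.splitlines())))


print(render("== Symmetry ==\nA '''point group''' is ''closed''.\nAn ''unclosed emphasis"))
```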
A Wiki is therefore very much a collaborative medium (a collabulary), and the database-driven approach tends
to reduce the error rate in the underlying syntax to a minimum. One of the more interesting features is arguably
neither of the above, but the mechanism adopted for linking. As an author, if you wish a word or phrase
(or image) to signify a link, the term is simply enclosed in double square brackets, thus: [[electrocyclic
reactions|A class of pericyclic reaction]]. The syntax of this example indicates a hyperlink to another page on
the same Wiki named electrocyclic reactions, with the informative text following the | character displayed as
the link label (in effect, a simple metadata term for the hyperlink). The author does not need to know the
location of this page; the system determines it automatically. If the page does not in fact exist, selecting the
link takes the reader to a blank page, whereupon they are invited to become its author! This Janus-like role for
the Wikipedian is entirely novel, and could potentially revolutionise the relationship between teacher and student.
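The link mechanism just described can be sketched in a few lines. In this hypothetical Python example, each [[target|label]] is resolved to a page name using MediaWiki-like normalisation (spaces become underscores and the first letter is capitalised) and is flagged according to whether that page already exists. The simplified link grammar and the page names are assumptions for illustration only.

```python
import re

# [[target]] or [[target|label]]; a simplified, hypothetical link grammar.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def normalise(title):
    # MediaWiki-like: spaces become underscores, first letter upper-cased.
    t = title.strip().replace(" ", "_")
    return t[:1].upper() + t[1:]


def resolve_links(text, existing_pages):
    """Yield (page_name, label, exists) for each wiki link found in text."""
    for m in LINK.finditer(text):
        target = m.group(1)
        label = m.group(2) or target  # no | part: show the target itself
        page = normalise(target)
        yield page, label.strip(), page in existing_pages


text = ("See [[electrocyclic reactions|A class of pericyclic reaction]] "
        "and [[sigmatropic reaction]].")
for link in resolve_links(text, {"Electrocyclic_reactions"}):
    print(link)
```

A renderer would send a link whose flag is False to an editing form rather than to a finished page, which is the point at which the reader is invited to become the author.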
A second example is also interesting: [[organic:electrocyclic reaction|A class of pericyclic reaction]]. The
word preceding the : character acts as a prefix (in MediaWiki terms, a namespace) that defines a category. It
provides a means of collecting groups of terms into a small communal space, within which all participants can
more or less agree upon the terms they will use. Another way of thinking about it is that the community has
defined a dictionary of terms which they agree to call organic. This type of link functions in a similar manner
to the (XML) namespaces we noted in the paragraph on podcasts above. A further (very loose) analogy can be
drawn to the RDF discussion above. Each link originates from a subject page and points to an object page, with
a predicate specified by the term following the | character. Viewed this way, the Wiki incorporates (admittedly
in a much less formal manner) many of the concepts already discussed above. Indeed, a more formal integration
of RDF into MediaWiki has already been done, and can be seen implemented in our own Wiki experiment.
Another outcome of the use of categories (and sub-categories) is that the structure of the collection can be
expressed via a table of contents; it also goes without saying that a fully searchable index is automatically
available.
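Following the (very loose) RDF analogy above, a link can be read as a triple: the page containing it is the subject, the text after the | character plays the role of the predicate, and the (optionally prefixed) target is the object. The Python sketch below makes that reading explicit and also collects the terms used under each prefix, i.e. the community dictionary called organic in the example. The mapping, the default predicate "links to" and the page names are assumptions for illustration, not a description of how MediaWiki or any RDF extension actually stores links.

```python
import re
from collections import defaultdict

# [[prefix:target|label]], with both the prefix and the label optional.
LINK = re.compile(r"\[\[(?:([^\[\]|:]+):)?([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def link_triples(page, text):
    """Read each link on `page` as a loose (subject, predicate, object) triple."""
    triples = []
    for m in LINK.finditer(text):
        prefix, target, label = m.group(1), m.group(2).strip(), m.group(3)
        obj = f"{prefix.strip()}:{target}" if prefix else target
        # Hypothetical default predicate when no | text is given.
        triples.append((page, (label or "links to").strip(), obj))
    return triples


def vocabulary(texts):
    """Collect the terms each community prefix (e.g. 'organic') has used."""
    vocab = defaultdict(set)
    for text in texts:
        for m in LINK.finditer(text):
            if m.group(1):
                vocab[m.group(1).strip()].add(m.group(2).strip())
    return dict(vocab)


page_text = ("[[organic:electrocyclic reaction|A class of pericyclic reaction]] "
             "and [[Woodward-Hoffmann rules]]")
for triple in link_triples("Pericyclic reaction", page_text):
    print(triple)
```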
Finally, in this overview of the Wiki, we note that under the skin, the technology uses very sophisticated Web
standards, in particular extensive use of stylesheets. This means that for each page of a Wiki, one has the
option of producing a printable page, also suitable for archiving onto CD-ROM etc. At least in part, this might
address the concerns of e.g. Mark Bishop, who asks: are chemistry instructors and students ready for an
Internet-based chemistry text?
The Wikipedia
The preceding discussion about categories is very highly developed in the best-known incarnation of a Wiki,
the Wikipedia.1 Anyone thinking of designing a Wiki-based course is well advised to become familiar with the
Wikipedia; to become in fact a Wikipedian (of which there are apparently already more than one million), or
even, say, a Wikipedian chemist. Doing so enables one to learn about the sociology of how to create and
edit articles, and the shorter forms of articles known as stubs, and how trust works in this extraordinary
environment. The taxonomy of a subject is handled using Wiktionary and its associated thesaurus. It is also
apparent that the current Wikipedia concept is evolving very rapidly, with new concepts emerging, so it seems,
on almost a daily basis. Becoming a master Wikipedian is no longer an afternoon's work!
Using the Wikipedia approach for Synoptic Course Creation?
A number of issues arise from the Wiki concept.
1. Should one create courses using a local, carefully controlled Wiki, or strive simply to extend the global
Wikipedia (or both)? This may boil down to how one views Open Access. Wikipedia is freely available
to all; some may regard university courses as suitably available only to those who have paid the fees!
For the record, we nail our colours firmly to the Open Access mast, together with the use of institutional
repositories.
2. Should one or more Wikipedia articles map onto the conventional 50-minute lecture?
3. If not (as seems probable), then how might the two relate to each other?
4. Is the approach adopted in the article you are presently reading perhaps the way to go (in which links
to the Wikipedia are scattered throughout the linear discourse)? Wikipedia, of course, takes a
rather less linear approach.
5. On a more technical level, could the two chemical stories above (Figures 3 and 4) be recast in their
entirety into a Wiki (tools for HTML2Wiki conversion do exist)?
The simple answer to this last question, we think, is: yes, but not as we know it! This is because both our
course examples require the Jmol applet to render the molecule files into an on-screen representation. This in
turn conventionally requires an HTML element such as <object /> (or the older <applet />) to be used.
Currently, although MediaWiki does support the insertion of some HTML elements, neither of these two is on
the supported list. Instead, the relevant calls must be made at the PHP code level.
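To see why this has to happen at the code level, consider the markup the applet actually needs. The Python sketch below builds the kind of <applet> element that a server-side Wiki extension (in MediaWiki, a PHP parser extension) would have to emit in place of an author-supplied tag. The load and script parameters are those used by the Jmol applet; the codebase directory, the jar name and the file pf5.mol are assumptions that would need adjusting for any real installation, and the hook that wires such a function into the Wiki is not shown.

```python
import html


def jmol_applet_html(model_url, width=300, height=300, script=None,
                     codebase="jmol", archive="JmolApplet.jar"):
    """Build the <applet> markup a server-side extension would emit.

    Assumed (hypothetical) defaults: Jmol files served from a 'jmol'
    directory and a jar called 'JmolApplet.jar'.
    """
    params = {"load": model_url}           # molecule file to display
    if script:
        params["script"] = script          # optional Jmol script, e.g. "spin on"
    # Escape every attribute value so page text cannot inject markup.
    param_tags = "\n".join(
        f'  <param name="{html.escape(name)}" value="{html.escape(value)}" />'
        for name, value in params.items()
    )
    return (
        f'<applet code="JmolApplet" archive="{html.escape(archive)}" '
        f'codebase="{html.escape(codebase)}" '
        f'width="{int(width)}" height="{int(height)}">\n'
        f"{param_tags}\n"
        "</applet>"
    )


print(jmol_applet_html("pf5.mol", script="spin on"))
```

Generating the element on the server, rather than letting authors type it, is also what allows the Wiki to keep its list of permitted HTML elements short.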
The Wikipedia chemical community is growing, and currently appears to consist of at least the following
groupings:
1. http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemistry
2. http://en.wikipedia.org/wiki/Portal:Chemistry
3. http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals
It is becoming obvious that a more holistic view of these diverse projects might be useful (perhaps similar in
intent to the folksonomy), although it would need to be generated at least in part by automated means (a bot)
rather than just by human editorialising. One means of doing this might be to embed an InChI identifier in any
Wiki page that includes a molecule. A bot could then trawl the Wiki(pedia) and conclusively identify any
molecules it finds. Examples of Wiki(pedia) pages with such annotation are those for mauveine and
safranine. These both mention the same molecule (although they may not necessarily link to each other), and
so have an InChI in common.
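A minimal version of such a bot can be sketched as follows. The Python example assumes the page texts have already been fetched (the retrieval step is omitted) and uses a simple regular expression to find InChI strings; pages that share an identifier are reported as candidates for cross-linking. The page titles are hypothetical, and the two InChIs (ethanol and methane) are used only as examples. Note that the strings are compared exactly, so differently generated InChIs for the same molecule would not be matched by this sketch.

```python
import re
from collections import defaultdict

# Matches e.g. "InChI=1S/CH4/h1H4" as well as the older "InChI=1/..." form.
INCHI = re.compile(r"InChI=1S?/[^\s<>|\[\]]+")


def pages_by_inchi(pages):
    """Map each InChI found to the sorted titles of the pages containing it.

    `pages` is a dict of {title: wikitext}; fetching it is not shown.
    """
    index = defaultdict(set)
    for title, text in pages.items():
        for match in INCHI.findall(text):
            inchi = match.rstrip(".,;")  # drop sentence punctuation after the identifier
            index[inchi].add(title)
    return {inchi: sorted(titles) for inchi, titles in index.items()}


def shared_molecules(pages):
    """InChIs occurring on more than one page: candidate cross-links."""
    return {inchi: titles for inchi, titles in pages_by_inchi(pages).items()
            if len(titles) > 1}


pages = {
    "Page A": "Solvent: InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 (ethanol)",
    "Page B": "Also contains InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3.",
    "Page C": "InChI=1S/CH4/h1H4",
}
print(shared_molecules(pages))
```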
This brief discussion of the Wiki only covers a few aspects of this system, and its sociology; it may be that other
features overlooked here are equally relevant to the chemical community, and the present authors welcome
other perspectives on this and the other systems mentioned here. Many of the more controversial aspects (such
as trust and quality) are discussed in the various articles cited in reference 1.
Conclusions
The Web version 2006 is a multi-faceted organism, with many attributes. Here we have covered a number of
these facets. One, namely the headlong rush of the world's scientific content into bundles (black holes?) known
as Acrobat files, has, we argue, been badly thought out in terms of how these objects may subsequently be used
and integrated into a more holistic whole. Publishers and authors alike have the solutions in their own hands;
they should start taking metadata very seriously, and start adding it to Acrobat files in a manner which is truly
open (exposed) and queryable by any software. With time, we would expect increasingly finely grained
metadata, including molecular metadata, from them. More sophisticated applications of metadata such as RDF
perhaps hold the potential to build much more robust structures from our knowledge.
The XML-based podcasting models provide interesting new mechanisms for closely associating audio and
video content with more traditional text-based content delivery. It is clear, however, that the current
implementation shares some undesirable characteristics with Acrobat: it bundles up the information in a manner
which can lose it to search engines and inhibits its re-use, and it lacks formal mechanisms for adding metadata
to the "enhanced" components.
The Wiki, another newcomer to the block, provides a more natural way in which authors of material can
integrate it within a bounded discipline. The Wiki also provides a more symmetric medium, in which students
as well as lecturers can participate, filling in gaps perhaps as part of projects of their own. It also has holistic
aspirations. To quote from the section on meta-data: "Finally, RDF and the Wiki-principle are a perfect match.
The take-off of the semantic web is slowed down by the need for trust. Anybody can write information on the
web, but it is hard to see which information is indeed correct. Wikipedia has shown, that the collaborative
editing of articles leads to better quality and less disinformation. (A sign for the reliability of Wikipedia is that
its articles very often appear as the first search result in Google.) Therefore, also RDF data in the Wikipedia is
more trustworthy than on nonfamous web pages. The second Wiki-characteristic which makes it suitable for
RDF is that anybody can easily contribute. Contributing information is almost as easy as getting information
since the dawn of the WikiWikiWeb."
Which of these various media will prove popular with educators, and which will be adopted by only a tiny
fraction of enthusiasts, only time will tell. We can be fairly confident, however, that the rate of innovation the
Web has demonstrated over the period 1994 to 2006 will continue. In five years' time (a Web age), it is to be
hoped that the trend will be towards taking a holistic, synoptic and Open-oriented view of a subject
(summarised by the concepts espoused by Web 2.0), rather than a headlong rush into protectionism,
proprietary standards and software, and heavy-handed digital rights management.
References
1. Internet encyclopaedias go head to head, J. Giles, Nature, 2005, 438, 900-901. DOI: 10.1038/438900a.
See also Wikipedia: Social revolution or information disaster? M. A. Walker, Abstracts of Papers, 231st
ACS National Meeting, Atlanta, GA, United States, March 26-30, 2006, CINF-007; Wiki ware could
harness the Internet for science, K. Yager, Nature, 2006, 440, 278. DOI: 10.1038/440278a.
2. Digital object identifiers for scientific data, N. Paskin, Data Science Journal, 2005, 4, 12-20. DOI:
10.2481/dsj.4.12.
3. Electronic Chemistry Conferences: 7 Years of CONFCHEM, B. M. Tissue, S. E. Van Bramer, and D.
Rosenthal, J. Chem. Inf. Comput. Sci., 2002, 42, 23-25. DOI: 10.1021/ci010082q
4. http://scholar.google.com/.
5. InChI: Open access/open source and the IUPAC International Chemical Identifier, S. R. Heller, S. E.
Stein, D. V. Tchekhovskoi, Abstracts of Papers, 230th ACS National Meeting, Washington, DC, United
States, Aug. 28-Sept. 1, 2005, CINF-060; Enhancement of the chemical semantic web through the use of
InChI identifiers, S. J. Coles, N. E. Day, P. Murray-Rust, H. S. Rzepa and Y. Zhang, Org. Biomol.
Chem., 2005, 3, 1832-1834. DOI: 10.1039/b502828k
6. Bringing Chemical Data onto the Semantic Web, K. R. Taylor, R. J. Gledhill, J. W. Essex, J. G. Frey, S.
W. Harris, and D. C. De Roure, J. Chem. Inf. Model., 2006, in press. DOI: 10.1021/ci050378m. The
Swoogle search engine http://swoogle.umbc.edu/ searches for metadata defined specifically using RDF.
7. Chemical Applications of the World-Wide-Web, H. S. Rzepa, B. J. Whitaker and M. J. Winter, J. Chem.
Soc., Chem. Commun., 1994, 1907. DOI: 10.1039/C39940001907; Hyperactive Molecules and the
World-Wide-Web Information System, O. Casher, G. Chandramohan, M. Hargreaves, C. Leach, P.
Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans. 2, 1995, 7. DOI:
10.1039/P29950000007. For a historical view, see our contribution along these lines to an early
CONFCHEM in 1996.
8. The Use of the Free, Open Source Program Jmol to Generate an Interactive Web Site to Teach
Molecular Symmetry, M. E. Cass, H. S. Rzepa, D. R. Rzepa and C. K. Williams, J. Chem. Ed., 2005, 82,
1736; An Animated Interactive Overview of Molecular Symmetry, M. E. Cass, H. S. Rzepa, D. R.
Rzepa and C. K. Williams, ibid., 2005, 82, 1742. See also Webware:
http://www.jce.divched.org/JCEDLib/WebWare/collection/reviewed/JCE2005p1742WW/. DOI: none defined.
9. Mechanisms That Interchange Axial and Equatorial Atoms in Fluxional Processes: Illustration of the
Berry Pseudorotation, the Turnstile, and the Lever Mechanisms via Animation of Transition State
Normal Vibrational Modes, M. E. Cass, K. K. Hii, H. S. Rzepa, J. Chem. Ed., 2006, 83, 336. Webware:
http://www.jce.divched.org/JCEDLib/WebWare/collection/reviewed/JCE2006p0336_2WW/
10. A Computational Study of the Non-dissociative Mechanisms that Interchange Apical and Equatorial
Atoms in Square Pyramidal Molecules, M. E. Cass and H. S. Rzepa, Inorg. Chem., 2006, in press. DOI:
10.1021/ic0519988.
11. On the go with CHM 125, ECON 210, PHYS 218, and BIOL 205: Coursecasting at a large research
university. J. R. Garritano, D. B. Eisert, Abstracts of Papers, 231st ACS National Meeting, Atlanta, GA,
United States, March 26-30, 2006, CINF-005.
12. Bioclipse; see http://www.bioclipse.net/