Progress towards a Holistic Web

533577047 Page 1 of 20

Progress towards a Holistic Web: Integrating OpenSource programs, Semantic data, Wikis and Podcasts

Henry S. Rzepa and Marion E. Cass

A Contribution from Imperial College London and Carleton College, USA.

Abstract

The way educators typically use the Web to support their teaching in 2006 is arguably a regression from many of the ideals first anticipated in 1994. Time pressure, a reluctance to learn "difficult HTML", and pressure from the publishing industry have allowed the Web to retreat into "shrink-wrapped" black holes known as Acrobat files. An ever greater reluctance (by both authors and publishers) to appreciate the importance of deploying meta-data in a meaningful manner means that most often, these Acrobat files represent the bones lying in an information graveyard, stripped of any "reusability" and really fit only for printing (e-books have yet to take off in any significant sense). In our article (we hesitate to perpetuate the above by calling it a "paper"!) we discuss two particular themes. Firstly, how a greater emphasis on data capture and its re-usability, together with the use of opensource software such as the remarkable Jmol, can result in a much more meaningful and future-proofed way of presenting chemical knowledge to students. We illustrate this via two resources, one designed to introduce symmetry to chemistry students, the other a dynamical introduction to pseudorotation in fluxional molecules. Secondly, we address the issue of how to create holistic resources and to overcome the reluctance of stressed and pressured academics by discussing two recent phenomena, that of the "Wiki" and the "Podcast". The Wikipedia is perhaps the best known illustration of how a community can coalesce and produce something far greater than the sum of its parts.1 Podcasting, which seems to be taking off in chemistry, focuses on audio and video content, but seems divorced from other forms of content, and is currently rather less than holistic. 
Currently, these two broad themes about how the Web should evolve are developing more or less independently. The prospects of coalescence are discussed.

Introduction. Is the current state of the Web what was intended: an Accumulation of Acrobat?

What exactly has the "Web" evolved to in 2006? It's a difficult question, since it now means so many things to so many people. From a student's perspective, it is frequently synonymous with "Google", and very possibly with a feeling of frustration at being unable to find high quality "trusted" or relevant information amongst the overwhelming mass of trivia. Google may also be a convenient way of avoiding a rigorous search of the available scientific databases (in particular, databases of reviewed scientific materials). From many an educator's point of view, we suspect the Web is simply a convenient way of avoiding the photocopier. They see it as providing an alternative and apparently easier mechanism for the delivery of "shrink-wrapped" printable lecture-sized nuggets of otherwise conventionally authored notes which accompany (but which must not compromise) the still traditional 50 minutes of oral lecture delivery. The motivation for the lecturer is to minimise the amount of time spent in "maintenance of the lecture". This means adopting mechanisms which require a minimal learning curve, and which incur little or no subsequent maintenance.

How does many a time-pressured educator approach using the Web in 2006? Coerced by most publishers into learning the shrink-wrapping technology of Acrobat (the collective noun for which is an accumulation) and receiving this back in turn from essentially all Web-based scientific journals, it becomes easy to be lulled into thinking that re-use of this expertise for producing teaching and learning materials for students is an effective use of an educator's time. 
Implicit is the assumption that students too will benefit from being introduced to this system at an early stage, both as consumers, and in turn as authors. Before moving on to consider how these objects are used, we would like to ask the reader to consider whether they have ever seen, or contemplated adding to, the menus shown in Figure 1 as part of the process of producing an Acrobat file. These relate to "metadata", a topic that will be returned to in more detail below (and which in colloquial terms is now referred to as the Next Intel Inside).

Figure 1: Menu pages in Acrobat 7.0 Professional relating to document information and meta-data for (a) a typical journal article, with (b) the additional metadata opened and (c) the advanced selection inspected. The values (or lack of values) for the various fields are exactly as found for a typical journal article.

Consider next the inevitable consequence of being bombarded by the Acrobat cannonball. As it happens, delivery of such an object often bypasses the Web altogether, being emailed by individual lecturers to students directly, or via intermediate tutors (a mechanism particularly favored by colleagues in the department of one of the present authors). A little more arduously, the material will be posted to a content management system (CMS) such as WebCT (perhaps more accurately described as a document management system if the content is largely still managed by Acrobat), in order to provide greater context for each document. The CMS also serves to provide an access control (or in the modern parlance, a Digital-rights-management) mechanism, with a parochial navigation tree for students to find their way to a document and its relatives. The net outcome of this use of the Web (or of email) is that the student acquires a copy of the Acrobat file in their own document space, committed thence more often than not to a printer. 
Other Acrobat files come from sources such as journals, expedited nowadays by the recent introduction of digital object identifiers (DOIs).2 These provide 1-click access if embedded in Acrobat files or Web pages as a hyperlink. An example is given here:3 http://dx.doi.org/10.1021/ci010082q (note however that you will need valid access to the journal to acquire the reprint). The outcome is an increasingly large, and quite possibly disorganized, collection of these objects (one author speaks purely from personal experience here!). During a typical degree, a student might nowadays accumulate perhaps 1000 such files; a researcher may have many more, perhaps 5000+. How does this collection represent the facts and knowledge pertaining to the bulk of a degree course, in say chemistry? The assemblage has some of the following properties:

• The Acrobat files are likely to be distributed across a number of computer hard drives, memory sticks and mobile devices such as iPods etc. Aggregating and reconciling these is not a trivial task.
• The files are unlikely to have an obvious, consistent, or even meaningful naming convention.
• It is very unlikely that any document meta-data, illustrated in Figure 1, will have been added (the document illustrated above dates from 2002, but in fact inspection of 2006-vintage examples produces very similar results). Thus it is not obvious how even simple questions such as "is this document about chemistry?" can be answered without opening each document in a suitable reader and having a human read it. A "mouse-over" on an unopened document merely informs that it is an Acrobat document. A "right-click" to reveal the document properties reflects (Figure 2) merely fields which were left undefined in Figure 1.

Figure 2: Document properties panes from Windows XP, illustrating known metadata for a document. 
One also notes that such meta-data relates to the document as a whole; any finer grained structure which the document may contain (such as reference to discrete molecular species) cannot be captured at this level.

• Any relationships between two or more documents (such as one document citing the other, or two documents each covering the same concepts or terms) are only really definable via the keywords or title meta-data field, and are given no real context. To establish such relationships reliably one would have to read and understand each document.
• Chemical relationships (defined by say having one or more molecules in common) may be equally difficult to detect, even if individual documents are visually inspected, since molecules may not be represented in a consistent, and hence comparable, manner. Thus some may merely be named, either trivially or systematically, others may be drawn as unparsable line diagrams, and yet others represented generically using e.g. the Markush R conventions to denote a variable group.
• Conceptual relationships (in chemistry), the so-called synoptic or holistic view of a subject, represent the ultimate challenge in perception. Normally, students would rely on a carefully and lucidly written textbook to achieve this sort of perspective across a subject. The issue with textbooks, or other static organizations of knowledge, is that they are not readily extensible, and new information cannot so easily be related to existing organized knowledge. Lecture handouts, it is true, can have an appended reading list, this indeed being part of the function of a lecturer giving a course, but it is still hard work actually making the semantic connections between diverse sets of materials.

Perhaps a chemical analogy to this state of affairs can be drawn. The hard drive would be equivalent to a container filled with argon gas. 
Being "inert", the individual atoms of this gas hardly interact with one another, and apart from a count of the total number of atoms, little further structure to the collection can easily be discerned; molecules are certainly unlikely to form under these conditions, and nor will it be easy to crystallize this collection into something with a well defined structure and internal relationships. Partial solutions to some of these issues do exist. Thus;     One might rely on the likes of Google (or the rather more selective Google Scholar4) to perform a freetext indexing of the content of any information object at its original site of publication, but, as noted above, with the limitation that heavy contamination with less relevant sources is likely. It is unlikely that the Google robot would be allowed to access protected content in a content/document management system which probably has its own parochial index and search facility. This latter could not achieve the holistic overview required. If the content can be consolidated to a single computer hard drive, "desktop searching tools" can now be acquired to index the free text-based contents, and thus to establish some sort of holistic relationships between documents from diverse sources. Such relationships will however rely on finding unique and common descriptors of suitable concepts in the body of the document. In the example we illustrate below, one might find terms like Berry pseudorotation and non dissociative ligand exchange which describe the same concept, but in fact have no words in common. Higher order concepts, for example requiring an understanding of three dimensions, may be definable only with great difficulty. Describing, uniquely, the actual processes involved in a Berry pseudorotation using only simple words and definitions, is quite a challenge! 
In our article10 we introduce two other modes for this process, the "lever" and the "turnstile" modes, and just to complicate matters, show how some systems can have attributes of all three. If you are not familiar with these terms, you are unlikely to understand what is meant by them by reading just the preceding text. Any indexing/search system might make a connection between these various concepts, but the outcome is likely to be quite variable in general, and may depend on how effective any dictionary of synonyms available to the indexing engine is. It is possible to achieve greater structure to a document collection using bibliographic tools such as Reference Manager or EndNote, but this requires a great deal of effort on the part of the user. The lack of self-identifying metadata, as noted above for a typical document, makes this process relatively manual and arduous, and one few students would assiduously adopt over a degree course.

Data is key to the Holistic Web (Data is the new Intel Inside)

Metadata; The Dublin Core

The emptiness illustrated via Figures 1 and 2 emphasizes how neglected meta-data is. Simple fields such as document title, author names, dates, and keywords are all too frequently omitted from all kinds of documents. This may be because the authors of such documents regard these fields as self-evident to anyone reading them. This misses the point entirely. If indeed one is in possession of 1000, or even worse 5000 documents, merely opening each of them would take nearly 28 hours in the latter case (assuming 20 seconds per document). As it happens, computers are rather good at doing this automatically; appropriately declared meta-data is simply a standard mechanism for exposing this information to a computer (rather than just a human), thus enabling more efficient subsequent indexing and retrieval. The so-called Dublin Core (DC) meta-data schema ensures cross-compatibility across a diversity of programs and documents. 
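To make the point about machine-readable metadata concrete, the following is a minimal sketch of how a program might answer the question "is this document about chemistry?" without a human opening the file. It assumes the common convention of carrying Dublin Core fields in HTML as meta elements whose names begin with "DC."; the sample handout header, its authors and the helper names (DublinCoreParser, dublin_core) are invented purely for illustration and are not taken from any of the resources described here.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> declarations from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.lower().startswith("dc."):
            # Keep every value: DC elements such as creator may be repeated.
            key = name[3:].lower()
            self.fields.setdefault(key, []).append(attr.get("content") or "")


def dublin_core(html_text):
    """Return a dict mapping DC element names to lists of declared values."""
    parser = DublinCoreParser()
    parser.feed(html_text)
    return parser.fields


# Hypothetical lecture handout header, invented for illustration only.
SAMPLE = """
<html><head>
<meta name="DC.title" content="Pseudorotation in fluxional molecules">
<meta name="DC.creator" content="A. Lecturer">
<meta name="DC.creator" content="B. Tutor">
<meta name="DC.subject" content="chemistry; Berry pseudorotation">
<meta name="DC.date" content="2006-04-14">
</head><body>...</body></html>
"""

fields = dublin_core(SAMPLE)
# The question posed in the text, answered from declared metadata alone.
is_chemistry = any("chemistry" in s.lower() for s in fields.get("subject", []))
print(fields["title"], fields["creator"], is_chemistry)
```

A document with no declared fields simply yields an empty dictionary, which is the situation illustrated in Figures 1 and 2.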
To drive this point home, we here declare our manifesto, the first point of which is:

1. Wherever possible, meta-data based on the Dublin Core should be declared in any document circulated to students.

Molecular Metadata: InChI

Unlike the bibliographic-derived DC metadata schema, general chemical metadata schemes are relatively underdeveloped. However, one particular type is worth noting at this stage; the so-called InChI identifier.5 This International Chemical Identifier enables a unique descriptor to be derived for a molecule, based purely on how the atoms are connected (but not specifying the type of bond that connects them). Although it is difficult to be precise, we estimate that in a typical chemical degree, students may come across around 1000 unique molecules exemplifying some aspect of their lectures or laboratories. Many of these probably recur in more than one context. If each of these were to be labeled with an InChI, then detecting instances of each molecule would be greatly facilitated. An InChI string can be embedded into documents in various ways. Thus in an HTML page, it might be found as such:

<link type="chemical/x-cml" rel="meta" href="http://www.ch.ic.ac.uk/local/symmetry/symmetry/exercises/12GW98.cml" title="InChI=1/CHBrClF/c2-1(3)4/h1H/t1-/m1/s1 for (S)-CHBrFCl)" />

or

<link type="application/rdf+xml" rel="meta" href="http://www.ch.ic.ac.uk/rzepa/bpr/YX5.rdf" title="InChI identifiers for YX5 species" />

This latter approach was adopted in adding molecular meta-data to our symmetry exercises web resource, described below.

2. Wherever possible, molecular metadata based on the InChI identifier should be declared in any document circulated to students.

Metadata: Resource Description Framework

The type of metadata described above is a relatively one-dimensional descriptor; it merely states that somewhere in the document to which it relates one can find e.g. a particular type of data or information. 
It provides for little more context than this. A much more powerful form of metadata has been developed known as RDF, or Resource Description Framework, in which each item actually has three components. These are known as the subject, the predicate and the object; the predicate makes an assertion relating the subject to the object. The predicate imparts a powerful context to the information, and both subject and object can exist either within a document, or independently of it. More interestingly, the subject of one assertion can be the object of another. In theory at least, such a declared mechanism could provide a powerful method of creating links between different types of information; a process which has been referred to as intertwingling. Until now, such mechanisms have hardly been exploited in chemistry, but they could form the basis for one holistic approach to the subject. For example, close inspection of Figure 1(c) reveals an entry referred to as XMP Core properties. This is the way in which Acrobat can contain an RDF declaration (known technically as a triple). The purpose of carrying it in this manner is that it can now be queried by any system wishing to find out about the document properties; it is self-identifying metadata. To refer back to the metaphor introduced earlier of a container of argon gas, RDF can introduce strong intermolecular interactions, perhaps even bonding! The collection now becomes much easier to crystallize. For further detail of such a system, see reference 6.

3. In future, relationships between information expressed as RDF or other information triples should be declared in any document circulated to students.

Data and the Means to display it

What better means to complement metadata, than to have access to the actual data itself! Such data can be merely referenced within the document, or it can become an integral part of the document itself, in which latter case we would use the term datument instead. 
Most participants at this conference are entirely familiar with such concepts, which date back to 1994 when they were first introduced7 and which have no doubt been the subject of many previous articles and discussions at this forum. The particular topic of the use of XML in this context will be deferred to the discussion of podcasting below. Here the focus will be on how data can be used, on the presumption that operations rather more complex than merely printing are now required. Here, we leave the realm of Acrobat almost entirely, and enter that of the (hyper)active, re-usable document. Molecular Structures and Animations + OpenSource: Jmol and Molecular Symmetry Exercises Several of the themes introduced above can be illustrated via our project to illustrate molecular symmetry. The premise is that this topic is particularly difficult to teach without the help of three-dimensions, either real or virtualized by movement (or indeed stereoscopic computer visualization). Clearly, an additional component must be supplied in addition to the data and its context that comprise a datument, namely some suitable software to visualize the combination. Enter the remarkable Jmol opensource software. This owes its origins to two software packages. The first was called Xmol, which had been developed in the 1980s to allow users of national supercomputer centers to visualize the results of their molecular calculations. The second was called Rasmol, and it had been developed around 1990 to visualize the results of protein crystallography. Both were early examples of OpenSource software, in that source codes were available for new generations of enthusiasts to adopt. In 1996, Xmol evolved with such a new generation into Jmol (with a new set of developers), this now being a Java-based reworking of the program; its OpenSource credentials were emphatically retained however. 
Rasmol, a little earlier in 1995, had been adopted by a commercial vendor and reworked as a so-called browser plug-in known as Chime. Significantly, this re-working was never released as OpenSource, and it is noticeable that whilst Chime has been little enhanced and extended recently, Jmol goes from strength to strength. Our first effort at producing Web-based symmetry exercises had in fact been based on Chime, but we decided to seize the opportunity of adopting an OpenSource presentational method and have reworked it using Jmol. We also took the opportunity to incorporate some of the themes discussed above.8 DC metadata and InChI identifiers were added so that the molecules used to illustrate the symmetry operations could be readily identified and indeed indexed and searched for (Figure 3).

Figure 3: Introductory screen for the Molecular symmetry exercises, viewable at http://teaching.ch.ic.ac.uk/symmetry/

Molecular animations of pseudorotations

Our second example9 arose from a not untypical scenario. A member of the teaching staff in one of our institutions was taking a sabbatical, and someone else was asked to give the course for one year only. One component involved discussing with the students a phenomenon known as pseudorotation, along with an associated apparent mode called turnstile rotation, and a further obscure mode referred to as an umbrella motion. All they were given by way of explanation was a rough sketched diagram of the former, and a reference to a journal article for the latter. The surrogate lecturer approached us asking whether we could devise a Web page to illustrate these three effects for students (and actually for the lecturer themselves in the first instance, Figure 4).

Figure 4:
One of the interactive figures produced for the pseudorotation article; viewable at http://www.ch.ic.ac.uk/rzepa/bpr/esi/

We deliberately couched this to have the appearance of a journal article (which indeed it spun out into10), in part to emphasize to students what a holistic integration of journals and teaching materials might look like. But there were other outcomes of taking this approach.

1. To gain an accurate representation of the process, we undertook ab initio calculations for our systems. This held some surprises.

2. Thus for the molecule PF5, we could find no evidence of turnstile rotation, or indeed any umbrella motion. The latter in particular we think is merely a speculation, for which no explicit examples have ever been identified. Even the former mode is unproven for PF5.

3. Seeking to generalize the effect for students, we approached IF5. During the course of viewing the Jmol-generated animations of this species, we concluded it approximated to an "upside-down" inverted version of pseudorotation. We struggled to find a name for this mode, and observed that depending on the angle the vibration was viewed from, it magically assumed differing characters. Finally, we settled on "chimeric pseudorotation". Chimeric stems from Chimera, the monster of Greek mythology with the mixed and "monstrous" character of a goat, snake and lion. Recently, chimeric has become "a byword for fabulous and fantastic - but utterly mixed-up - ideas", in this case ligand exchange via Berry pseudorotation, turnstile rotation, and a new mode called lever rotation. Ligand exchange in IF7 proved equally challenging to describe using only words. 
The point of this discourse is that in extending how the concept of pseudorotation could be presented to an audience of students seeing it for the first time, we had rapidly discovered new and magical modes which had hardly been described in the chemical literature (and for which two well known textbooks actually made fundamental errors of description). By presenting the data from calculations of these effects in such an animated manner, even these quite new and conventionally difficult concepts may become more accessible for students, and staff alike.

4. Further surprises were still in store for us. We found that the related BrF5 could sustain not one but two distinct modes for interchanging the F atoms. We eventually found a convenient label for these two modes in a quite different area of chemistry; that of the stereochemistry of Pb(II) compounds. Such systems exhibit both hemidirected and holodirected coordinations. We had thus joined one aspect of the stable structures of lead systems with the modes adopted by the fluxional transition states of bromine molecules. A lot of concepts in chemistry have been covered by the last few sentences. Metadata mechanisms such as RDF would nicely capture these connections. Whereas we discovered the connection in part by accident (one of us attended a scientific conference and came across a student poster describing Pb(II) systems, which, some months later, "clicked" as they say), perhaps in the future the expression of these concepts as espoused above will allow a more systematic and less accidental mode of discovery!

5. Convinced that some of these insights were sufficiently novel to be worth publishing as a formal research article, in our first submission we made the tactical mistake of referring to the pedagogic stimulation for the discoveries (a referee disparaged the work on this basis). 
A rewriting which avoided such mention was more successful; indeed the article has been selected by the journal for "enhancement" in the sense that the data and resulting animations are now to be incorporated into the mainstream article, rather than merely being downgraded to "supporting information".10 In this sense, the holistic aspirations of this work have been recognized. However, we look forward with some anticipation to receiving the Acrobat version of the eventually published article (see Figure 1)! In particular, a unique feature of the "enhanced" resource is the potential to "re-use" the data contained in it. Figure 5 shows how 3D coordinates for a molecule displayed using e.g. Jmol can be extracted for use in possibly another context. Acrobat is poorly suited for such data-mining, although the presence of meta-data at least indicating its existence elsewhere might be a workable alternative.

Figure 5: Interface for capturing 3D coordinates from a Jmol display.

Towards the Holistic Approach

The Podcast

The metaphor of student learning resources merging with a journal article is one means of achieving a holistic effect. But this neglects the audio-visual components of teaching with which we are all familiar. We were intrigued by the possibility of somehow combining the two approaches. A few years ago, the newsfeed (more formally known as the RSS feed) started becoming popular. In its original incarnation, it carried only text, in the form of meta-data describing a recently created or changed resource on the Web. This alerting mechanism has recently been adapted to cover broadcast and audio-visual materials, and in this guise has become known as a podcast. A few adventurous souls have adapted existing taught courses and recast them as podcasts. Apple Computer has produced a software solution, iTunes, which allows such podcasts to be captured on a portable storage device known as an iPod (from which the name derives). 
They have extended the concept by producing a customised version of iTunes known as iTunes U which allows its use in a university environment for podcasting lectures. In effect it is a content management system for audio-visual materials; a selection of chemistry podcasts available in their "store" is shown in Figure 6.

Figure 6: The podcast page in the Apple iTunes store, filtered using the search term chemistry.

Our entry into Podcasts started with two observations. Firstly, the information in a podcast is defined using a language known as XML. This is essentially a generalisation of the markup approach familiar from HTML, which allows more specific and accurate ways of capturing information to be defined. An XML document can contain several XML-based languages, each distinguished by a so-called namespace. The basic podcast for example contains two namespaces; the first is the RSS 2.0 syntax, which provides mechanisms for linking in basic meta-data (title, description, author etc) and also defines a way of expressing an <enclosure /> which is a basic audio or video file. The second is the proprietary <itunes: /> component, which handles more specific meta-data (relating to properties which the iTunes software needs to know about). Importantly, these two components, one "open", the other not, co-exist quite happily together; neither destroys the other. With this premise, it becomes trivial to add further namespaces. We added a <cml:molecule>...</cml:molecule> namespace, into which chemistry can be poured. Again, its presence is quite benign, and it is simply ignored by programs such as iTunes which do not recognise it. Importantly, it can quietly lurk in the podcast, and when software which does recognise its presence receives the podcast, it can then spring to life. For good measure, further namespaces such as <mathml: /> (for mathematical content) and <svg: /> (for graphical content) can be inserted. 
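The benign co-existence of namespaces can be sketched in a few lines. The example below is only an illustration under stated assumptions: the feed content, its URL and the function names (media_view, chemistry_view) are invented here, and the namespace URIs are the commonly published ones for iTunes and CML rather than values taken from our own podcast. It shows a "media player" view that reads only the RSS and iTunes parts, and a "chemistry-aware" view that picks up the CML molecule which the first view never notices.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups (assumed, commonly published URIs).
NS = {
    "itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd",
    "cml": "http://www.xml-cml.org/schema",
}

# Hypothetical one-episode feed, invented for illustration only.
FEED = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
     xmlns:cml="http://www.xml-cml.org/schema">
  <channel>
    <title>Hypothetical chemistry podcast</title>
    <item>
      <title>Pseudorotation in PF5</title>
      <enclosure url="http://example.org/pf5.m4a" type="audio/x-m4a" length="1234"/>
      <itunes:author>A. Lecturer</itunes:author>
      <cml:molecule id="PF5">
        <cml:atomArray>
          <cml:atom id="a1" elementType="P"/>
          <cml:atom id="a2" elementType="F"/>
          <cml:atom id="a3" elementType="F"/>
          <cml:atom id="a4" elementType="F"/>
          <cml:atom id="a5" elementType="F"/>
          <cml:atom id="a6" elementType="F"/>
        </cml:atomArray>
      </cml:molecule>
    </item>
  </channel>
</rss>"""


def media_view(item):
    """What a media player needs: the enclosure plus its own metadata."""
    enclosure = item.find("enclosure")
    return {
        "title": item.findtext("title"),
        "url": enclosure.get("url"),
        "author": item.findtext("itunes:author", namespaces=NS),
    }


def chemistry_view(item):
    """What chemistry-aware software could extract: element symbols per molecule."""
    return {
        mol.get("id"): [atom.get("elementType")
                        for atom in mol.iterfind(".//cml:atom", NS)]
        for mol in item.iterfind("cml:molecule", NS)
    }


root = ET.fromstring(FEED)
item = root.find("channel/item")
print(media_view(item))
print(chemistry_view(item))
```

Neither function needs to know about the other's namespace, which is the modularity the text describes: software that does not recognise a namespace simply never asks for it.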
An example of this can be found at http://www.ch.ic.ac.uk/video/index.rss. The podcast viewed in this manner becomes an interesting vehicle for accurately expressing a variety of types of information and data in a single document, which can be automatically delivered on a regular basis to either a viewer of some sort, or a storage device. It can be processed in quite a few ways. Most trivially, it could be viewed simply in a Web browser. If you view the above in e.g. Firefox, you should be able to see some of the embedded components (particularly the MathML and the SVG). iTunes would be used to play any audio and visual components. A system known as Bioclipse11 can extract the molecular content. The list of applications is potentially endless! A glimmer of how a holistic approach to the Web might be achieved is starting to be seen.

One new variation on the Podcast rather spoils the above modular approach. Introduced by Apple into its iTunes software, the enhanced podcast allows visual materials to be integrated into the timeline of an audio stream. The intent was clearly to enable reproduction of a conventional lecture, which obviously contains these two components. The mechanism adopted for this is to multiplex the visuals directly into the file containing the audio components (an MPEG-4 file), and add a timetrack to the MPEG file which indicates start and end timings for each visual component. Each inserted component can also have an associated URL which can point to further materials. This is illustrated in Figure 7, being the editing presentation of one program that enables all this (Podcast Maker).

Figure 7: Interface for creating an enhanced podcast. Each episode contains a single audio file, with defined timepoints at which visual material (normally a JPEG diagram, but it can also be an Acrobat file) is introduced as a so-called chapter. 
Each chapter can have a title which will be displayed, and a hyperlink which can be displayed in a Web browser.

Is the Current Podcast Genre Holistic?

Whilst this one relatively simple step forward for the podcast mechanism is easy to author, and to view in iTunes, in other ways it also represents one step back. The additional content is buried inside a binary file, and once in, cannot be extracted out again. Worryingly, it cannot be indexed, and hence searched for. It has no meta-data of its own. Given this mechanism is very new (having been introduced only in late 2005), we might presume it is still quite immature, and that some of these failings will in time be addressed. At any rate, as teachers we have an interesting new way of delivering audio, video, and lecture notes in a smartly integrated manner, with hooks for attaching chemical content to these.

Let us however inject a controversial note at this stage. A perusal of iTunes U Podcasts with alleged chemical content (April 14, 2006) reveals most of them to be in the "easy to produce" audio-only format, with no visual or other enhanced content. In one example (we will not identify it), one is first treated to around 3 minutes of introductory noise from the audience, before a speaker introduces themselves. Thereafter, we can only hear the sound of chalk being applied to a blackboard for a further minute or so. The only way to find out the content is to either have the detailed course timetable to hand, or to endure the four minutes of "noise" in the hope of eventually hearing something worthwhile. A suspicion must be that if this level of (chemical) quality persists, the podcast will rapidly join other technically clever, but eventually irrelevant technologies in the dustbin of neglect. There are no "quick and easy" solutions to this issue; one always gets what one pays for (or at least invests the time in)! 
The Wiki as a Collabulary

The final approach to be discussed here is the Wiki (exemplified here by one particular flavour of Wiki, MediaWiki. Other types of Wiki implement some of the features described below slightly differently). In some ways, this mechanism (a representative of what is often colloquially referred to as Web 2.0) is a substantial simplification of the original Web concept (which would have to be Web 1.0) and in other ways is arguably a welcome return to what Berners-Lee always envisaged as the true objective of the Web. The two main differences from a conventional Web page are:

1. The basic authoring syntax is simpler than HTML. Whereas the latter has around 60 elements, which carry the basic information, the Wiki reduces this to a smaller number of simpler rules (these can be summarised in a one-page manual, supplemented by templates, of which sadly none exist for chemistry). There is however a smart processor (written in PHP) in the background that interprets these rules and stores the result in a proper structured (MySQL) database. In HTML, any unclosed or improperly constructed element will result in a severe error; in a Wiki, markup either works properly or is simply not treated as a container, and there is no equivalent of a broken element or container.
2. This simplified approach in turn allows a Wiki to move away from the conventional write once, read many approach to a more symmetric write many, read many metaphor. The write many can be either the entire world, or constrained to, say, a local subset of students and teachers in order to address institutional concerns about legality etc.
For anyone interested, the latter can be easily implemented in a Wiki by adding the following (suitably localized) lines to the LocalSettings.php file:

    // Load the LDAP authentication extension and use it for logins
    require_once('LdapAuthentication.php');
    $wgAuth = new LdapAuthenticationPlugin();

    // Directory to authenticate against (localize these values)
    $wgLDAPDomainNames = array( "ic.ac.uk" );
    $wgLDAPServerNames = array( "ic.ac.uk"=>"ldap.ic.ac.uk" );
    $wgLDAPSearchAttributes = array( "ic.ac.uk"=>"uid" );
    $wgLDAPBaseDNs = array( "ic.ac.uk"=>"ou=everyone,dc=ic,dc=ac,dc=uk" );
    $wgLDAPUseSSL = true;
    $wgLDAPUseLocal = false;

    // The wiki never creates or modifies entries in the directory
    $wgLDAPAddLDAPUsers = false;
    $wgLDAPUpdateLDAP = false;
    $wgLDAPMailPassword = false;
    $wgLDAPRetrievePrefs = false;
    $wgMinimalPasswordLength = 1;

A Wiki is therefore very much a collaborative medium (a collabulary), and the database-driven approach tends to reduce the error rate in the underlying syntax to a minimum.

One of the more interesting features is arguably neither of the above, but the mechanism which is adopted for linking. As an author, if you wish a word, phrase (or image) to signify a link, the term is simply enclosed in double square brackets thus: [[electrocyclic reactions|A class of pericyclic reaction]]. The syntax of this example indicates a hyperlink to another page on the same Wiki with the name electrocyclic reactions, and with an informative title following the | character (which in effect is treated as a simple metadata term for the hyperlink). The author does not need to know the location of this page; the system determines it automatically. If in fact the page does not exist, selecting the link will forward the reader to a blank page, whereupon they are invited to become its author! This Janus-like role for the Wikipedian is entirely novel, and could potentially revolutionise the relationship between teacher and student.

A second example is also interesting: [[organic:electrocyclic reaction|A class of pericyclic reaction]]. The word preceding the : character defines a category.
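The two link forms just shown can be taken apart mechanically, which is what makes them usable as lightweight metadata. The sketch below, in Python, is an illustration only: it is not MediaWiki's own parser (which handles many more cases), and the names WikiLink and parse_links are hypothetical. It assumes that the text before the first : is the category prefix and that the text after | is the descriptive title.

```python
import re
from typing import List, NamedTuple, Optional


class WikiLink(NamedTuple):
    prefix: Optional[str]  # text before ':' (the "category"), if any
    target: str            # name of the page being linked to
    label: Optional[str]   # descriptive title after '|', if any


# [[target]] or [[target|label]]; the target may not contain brackets or '|'.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def parse_links(text: str) -> List[WikiLink]:
    """Extract every [[...]] link from a piece of wiki text."""
    links = []
    for match in LINK_RE.finditer(text):
        target = match.group(1).strip()
        label = match.group(2)
        prefix = None
        if ":" in target:
            # Split only on the first ':' so later colons stay in the page name.
            prefix, target = (part.strip() for part in target.split(":", 1))
        links.append(
            WikiLink(prefix, target, label.strip() if label is not None else None)
        )
    return links


if __name__ == "__main__":
    sample = (
        "See [[electrocyclic reactions|A class of pericyclic reaction]] and "
        "[[organic:electrocyclic reaction|A class of pericyclic reaction]]."
    )
    for link in parse_links(sample):
        print(link)
```

Run on the two examples from the text, the first link yields no prefix and the page name electrocyclic reactions, while the second yields the prefix organic and the page name electrocyclic reaction; both carry the same descriptive title.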
Such a category provides a means of collecting groups of terms into a small communal space, within which all participants can more or less agree upon the terms they will use. Another way of thinking about it is that the community has defined a dictionary of terms which they agree to call organic. This type of link functions in a similar manner to the (XML) namespaces we noted in the paragraph on podcasts above. A further (very loose) analogy can be drawn to the RDF discussion above. Each link originates from an object page and defines the subject page, with a predicate specified by the term following the | character. Viewed this way, the Wiki incorporates (admittedly in a much less formal manner) many of the concepts already discussed above. Indeed, a more formal integration of RDF into MediaWiki has already been done, and can be seen implemented in our own Wiki experiment. Another outcome of the use of categories (and sub-categories) is that the structure of the collection can be expressed via a table of contents; it also goes without saying that a fully searchable index is automatically available.

Finally, in this overview of the Wiki, we note that under the skin, the technology uses very sophisticated Web standards, and in particular makes extensive use of stylesheets. This means that for each page of a Wiki, one has the option of producing a printable page, also suitable for archiving onto CD-ROM etc. At least in part, this might address the concerns of e.g. Mark Bishop, who asks: are chemistry instructors and students ready for an Internet-based chemistry text?

The Wikipedia

The preceding discussion about categories is very highly developed in the best known incarnation of a Wiki, the Wikipedia.1 Anyone thinking of designing a Wiki-based course is well advised to become familiar with the Wikipedia; to become in fact a Wikipedian (of which there are already apparently more than one million), or even say a Wikipedian chemist.
Doing so enables one to learn about the sociology of how to create and edit articles (and the shorter forms of articles known as stubs), and how trust in this extraordinary environment works. The taxonomy of a subject is handled using Wiktionary and its associated thesaurus. It is also apparent that the current Wikipedia concept is evolving very rapidly, with new concepts emerging, so it seems, on almost a daily basis. Becoming a master Wikipedian is no longer an afternoon's work!

Using the Wikipedia approach for Synoptic Course Creation?

A number of issues arise from the Wiki concept.

1. Should one create courses using a local, carefully controlled Wiki, or strive simply to extend the global Wikipedia (or both)? This may boil down to how one views Open Access. Wikipedia is freely available to all; some may regard university courses as suitably available only to those who have paid the fees! For the record, we pin our flag firmly to the Open Access mast, together with the use of institutional repositories.
2. Should one or more Wikipedia articles map onto the conventional 50 minute delivered lecture?
3. If not (as seems probable), then how might the two relate to each other?
4. Is the approach adopted in the article you are presently reading perhaps the way to go (whereby links to the Wikipedia are scattered throughout the linear discourse here)? Wikipedia of course takes a rather less linear approach.
5. On a more technical level, could the two chemical stories above (Figures 3 and 4) be recast in their entirety into a Wiki (tools for HTML2Wiki conversion do exist)?

The simple answer to this last question we think is: yes, but not as we know it! This is because both our course examples require the Jmol applet to render the molecule files into an on-screen representation. This in turn conventionally requires an HTML element such as <object /> (or the older <applet />) to be used.
Currently, although MediaWiki does support the insertion of some HTML elements, neither of these two is included in the supported list. Instead, the relevant calls must be made at the PHP code level.

The Wikipedia chemical community is growing, and currently appears to consist of at least the following groupings:

1. http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemistry
2. http://en.wikipedia.org/wiki/Portal:Chemistry
3. http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals

It is becoming obvious that a more holistic view of these diverse projects might be useful (perhaps similar in intent to the folksonomy), although it would need to be generated at least in part by automated means (a bot) rather than just human editorialising. One means of doing this might be to embed an InChI identifier in any Wiki page that includes a molecule. A bot could then trawl the Wiki(pedia) and be able to conclusively identify any molecules it finds. An example of Wiki(pedia) pages with such annotation can be found for mauveine and safranine. These both mention the same molecule (although they may not necessarily link to each other), but have an InChI in common.

This brief discussion of the Wiki only covers a few aspects of this system, and its sociology; it may be that other features overlooked here are equally relevant to the chemical community, and the present authors welcome other perspectives on this and the other systems mentioned here. Many of the more controversial aspects (such as trust and quality) are discussed in the various articles cited in reference 1.

Conclusions

The Web version 2006 is a multi-faceted organism, with many attributes. Here we have covered a number of these facets. One, namely the headlong rush of the world's scientific content into bundles (black holes?) known as Acrobat files, we argue, has been badly thought out in terms of how these objects may be subsequently used and integrated into a more holistic whole.
Publishers and authors alike have solutions in their own hands; they should start taking metadata very seriously, and start adding it to Acrobat files in a manner which is truly open (exposed) and queryable by any software. With time, increasingly finely grained metadata, including molecular metadata, would be expected from them. More sophisticated applications of meta-data such as RDF perhaps hold the potential to achieve much more robust structures from our knowledge.

The XML-based podcasting models provide interesting new mechanisms for closely associating audio and video content with more traditional text-based content delivery. It is clear however that the current implementation has some undesirable characteristics in common with Acrobat, i.e. bundling up the information in a manner which can lose it to search engines and inhibit its re-usability, together with the lack of formal mechanisms to add metadata to the "enhanced" components.

The Wiki, another newcomer on the block, provides a more natural way in which authors of material can integrate it within a bounded discipline. The Wiki also provides a more symmetric medium, in which students as well as lecturers can participate, filling in gaps perhaps as part of projects of their own. It also has holistic aspirations. To quote from the section on meta-data:

"Finally, RDF and the Wiki-principle are a perfect match. The take-off of the semantic web is slowed down by the need for trust. Anybody can write information on the web, but it is hard to see which information is indeed correct. Wikipedia has shown, that the collaborative editing of articles leads to better quality and less disinformation. (A sign for the reliability of Wikipedia is that its articles very often appear as the first search result in Google.) Therefore, also RDF data in the Wikipedia is more trustworthy than on nonfamous web pages.
The second Wiki-characteristic which makes it suitable for RDF is that anybody can easily contribute. Contributing information is almost as easy as getting information since the dawn of the WikiWikiWeb."

Which of these various media will prove popular with educators, and which will be adopted by only a tiny fraction of enthusiasts, only time will tell. We can be sure, however, that the rate of innovation which the Web has demonstrated to be possible over the period 1994-2006 will continue. In five years' time, a Web age, it is to be hoped that the trend will be towards taking a holistic, synoptic and Open-oriented view of a subject (summarised by the concepts espoused by Web 2.0), rather than a headlong rush into protectionism, proprietary standards and software, and heavy-handed digital rights management.

References

1. Internet encyclopedias go head to head, J. Giles, Nature, 2006, 438, 900-901. DOI: 10.1038/438900a. See also Wikipedia: Social revolution or information disaster? M. A. Walker, Abstracts of Papers, 231st ACS National Meeting, Atlanta, GA, United States, March 26-30, 2006, CINF-007; Wiki ware could harness the Internet for science, K. Yager, Nature, 2006, 440, 278. DOI: 10.1038/440278a.
2. Digital object identifiers for scientific data, N. Paskin, Data Science Journal, 2005, 4, 12-20. DOI: 10.2481/dsj.4.12.
3. Electronic Chemistry Conferences: 7 Years of CONFCHEM, B. M. Tissue, S. E. Van Bramer, and D. Rosenthal, J. Chem. Inf. Comput. Sci., 2002, 42, 23-25. DOI: 10.1021/ci010082q.
4. http://scholar.google.com/.
5. InChI: Open access/open source and the IUPAC International Chemical Identifier, S. R. Heller, S. E. Stein, D. V. Tchekhovskoi, Abstracts of Papers, 230th ACS National Meeting, Washington, DC, United States, Aug. 28-Sept. 1, 2005, CINF-060; Enhancement of the chemical semantic web through the use of InChI identifiers, S. J. Coles, N. E. Day, P. Murray-Rust, H. S. Rzepa and Y. Zhang, Org. Biomol. Chem., 2005, 3, 1832-1834.
DOI: 10.1039/b502828k.
6. Bringing Chemical Data onto the Semantic Web, K. R. Taylor, R. J. Gledhill, J. W. Essex, J. G. Frey, S. W. Harris, and D. C. De Roure, J. Chem. Inf. Model., 2006, in press. DOI: 10.1021/ci050378m. The Swoogle search engine http://swoogle.umbc.edu/ searches for metadata defined specifically using RDF.
7. Chemical Applications of the World-Wide-Web, H. S. Rzepa, B. J. Whitaker and M. J. Winter, J. Chem. Soc., Chem. Commun., 1994, 1907. DOI: 10.1039/C39940001907; Hyperactive Molecules and the World-Wide-Web Information System, O. Casher, G. Chandramohan, M. Hargreaves, C. Leach, P. Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans 2, 1995, 7. DOI: 10.1039/P29950000007. For a historical view, see our contribution along these lines to an early CONFCHEM in 1996.
8. The Use of the Free, Open Source Program Jmol to Generate an Interactive Web Site to Teach Molecular Symmetry, M. E. Cass, H. S. Rzepa, D. R. Rzepa and C. K. Williams, J. Chem. Ed., 2005, 82, 1736; An Animated Interactive Overview of Molecular Symmetry, M. E. Cass, H. S. Rzepa, D. R. Rzepa and C. K. Williams, ibid, 82, 1742. See also Webware: http://www.jce.divched.org/JCEDLib/WebWare/collection/reviewed/JCE2005p1742WW/. DOI: None defined.
9. Mechanisms That Interchange Axial and Equatorial Atoms in Fluxional Processes: Illustration of the Berry Pseudorotation, the Turnstile, and the Lever Mechanisms via Animation of Transition State Normal Vibrational Modes, M. E. Cass, K. K. Hii, H. S. Rzepa, J. Chem. Ed., 2006, 83, 336. Webware: http://www.jce.divched.org/JCEDLib/WebWare/collection/reviewed/JCE2006p0336_2WW/
10. A Computational Study of the Non-dissociative Mechanisms that Interchange Apical and Equatorial Atoms in Square Pyramidal Molecules, M. E. Cass and H. S. Rzepa, Inorg. Chem., 2006, in press. DOI: 10.1021/ic0519988.
11. On the go with CHM 125, ECON 210, PHYS 218, and BIOL 205: Coursecasting at a large research university.
J. R. Garritano, D. B. Eisert, Abstracts of Papers, 231st ACS National Meeting, Atlanta, GA, United States, March 26-30, 2006, CINF-005.
12. Bioclipse; see http://www.bioclipse.net/
