The Web Lecture Archive Project

S. Goldfarb, University of Michigan, Ann Arbor, MI 48109, USA
J. Herr, University of Michigan, Ann Arbor, MI 48109, USA
H. A. Neal, University of Michigan, Ann Arbor, MI 48109, USA
K. M. Storr, CERN, 1211 Geneva, Switzerland
The Web Lecture Archive Project (WLAP) is a joint
project between the University of Michigan and CERN
[1] that has recorded, archived and published web-based
lectures and tutorials for the ATLAS Collaboration, the
LHC, CERN, and the University since 1999. This paper
presents an overview of the project, its history,
achievements and current projects.
In this context, we provide a brief technical description
of web lectures, how they are constructed and published,
and the content they typically provide to the viewer. We
focus on issues specific to lectures recorded for HEP
(High-Energy Physics), the university classroom, and
large-scale conferences. Finally, we describe current
projects designed to address these issues and discuss
possible future directions for the field.
Project History
WLAP [2] was launched in 1999 as a pilot project [3]
coordinated by the University of Michigan ATLAS
Collaboratory Project [4] and sponsored by the U. S.
National Science Foundation to investigate the usefulness
and feasibility of recording and archiving web-based
electronic lectures for a HEP (High-Energy Physics)
audience. The initial project focused on recording lectures
under a variety of circumstances, including educational
lectures in an auditorium setting, plenary meetings, and
tutorials. Electronic archives, in which the slides of a
presentation are synchronized to the video of the lecturer,
were constructed using the Sync-O-Matic application [5]
of Charles Severance, one of the founders of the project.
The prestigious CERN Summer Student Programme [6]
lectures were chosen as an initial test bed, along with
plenary meetings and software tutorials of the ATLAS
Collaboration [7]. No formal program was specified, as
the goal was to evaluate the recording and archival
methods, as well as the quality and value of the product.
Consequently, additional opportunities, such as CERN
seminars, colloquia, and tutorials on LHC software
projects, were commonly included in the tests.
The success of the pilot project [8] was immediate and
evident, based on the positive feedback concerning the
posted lectures and an abundance of requests for more
ATLAS and CERN recordings. The University of
Michigan, ATLAS, and CERN Academic and Technical
Training launched follow-up programs to record and
publish web lectures, and our partnership was born, with
technical support from CERN IT.
In the next few years, web-based repositories were
brought on-line at CERN [9] and at the University [2] to
host the growing supply of lectures. The project team has
gained tremendous experience since that time, adapting to
the ever-increasing demand and providing production
services, complementary to the research. We have
applied this experience to advances in the technology and
methods used for capturing, building and publishing
lectures, which we describe briefly here.
Technological Achievements
The large-scale production of web lectures (and limited
resources) drove the project team to seek automation at
nearly every step of the process. This led to the natural
development of new recording and publishing techniques,
which were written up, along with a description of the
project, in 2001 [10].
Several of the more important technological advances
that have been achieved or that are currently in
development include:
• hardware and software solutions to automate the
encoding, compression and synchronization of audio,
video and slides;
• a proposed XML standard, called Lecture Object, to
facilitate the archiving and sharing of multimedia
presentations in an open fashion;
• a robotic camera tracking system to remove the need
for a camera operator in tracking speakers;
• software to harvest text from captured slides,
associating the resulting metadata with relevant
sections within a lecture and radically improving
search capabilities.
The Lecture Object standard was first proposed in 2001
[11]. A complete description of the robotic camera project
is presented as a separate contribution to the proceedings
of this workshop. Both projects are discussed in more
detail below.
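As an illustration of how harvested slide text can improve search, the following minimal sketch builds a word index that maps each term to the start times of the slides containing it, so that a query returns positions within a lecture rather than whole lectures. It is not the WLAP implementation; the function names and the (start time, text) input format are assumptions made for this example.

```python
import re
from collections import defaultdict


def build_index(slides):
    """Map each lowercase word to the sorted start times of slides containing it.

    slides: iterable of (start_seconds, harvested_text) pairs (hypothetical format).
    """
    index = defaultdict(set)
    for start, text in slides:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


def search(index, query):
    """Return the start times of slides that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Illustrative data only: three slides with made-up text and timings.
slides = [
    (0.0, "ATLAS detector overview"),
    (95.0, "Muon spectrometer alignment"),
    (240.0, "ATLAS muon trigger"),
]
index = build_index(slides)
print(search(index, "atlas muon"))  # [240.0]
```

A viewer could use the returned times to jump directly to the matching part of the lecture.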
Current Activities and Research Focus
Our group regularly records ATLAS, LHC, CERN and
University of Michigan events, hosting a significant
archive of hundreds of lectures for these communities,
while simultaneously benefiting from each recording as a
test bed for newly developed technologies. Over the
years, our program has included the recording and
publishing of web lectures for:
• ATLAS Collaboration plenary sessions, workshops,
physics and computing tutorials;
• general CERN and LHC computing seminars and
major events at the University or at CERN (seminars
from recent Nobel prize recipients, etc.);
• CERN academic and technical training seminars and
• Fermilab software tutorials;
• University of Michigan Saturday Morning Physics
talks, from 2001 until the present;
• 2005 International Conference on Systems Biology at
• University of Michigan Medical School Grand
Rounds talks;
• American Physical Society conferences.
Each of these programs has presented us with new
challenges, defining and motivating our research.
While WLAP was one of the first collaborations to
create large-scale web lecture archives, there now exists a
large and growing number of dedicated web archives, as
well as a variety of software applications available for
publishing lectures on the web. Lecture archiving is
rapidly becoming a common – if not yet standard –
procedure for conferences, tutorials, and even the
WLAP has focused its efforts on improving the
technology to handle the specific requirements of large-scale global collaborations, such as ATLAS and the LHC,
as well as the academic environment of the university
classroom. In the text that follows, we describe the
current status of the technology and present several of our
ongoing projects designed to address these particular
challenges. We start with a brief description of the
lectures themselves.
Description of a Web Lecture
An electronic web-based lecture, in general, provides
the audience with the audio and video of a speaker
making a presentation, together with a view of supporting
documents or other material, such as slides, notes, or even
screen captures. The supporting material can be presented
as images or video streams, often synchronized to the
speaker or incorporated into the video of the speaker.
WLAP lectures present the viewer with a video of the
speaker making the presentation, high-quality audio, and
a larger, high-resolution image of the slides, screen
capture, or other material synchronized to the
audio/video. Slides change automatically, for example, as
a speaker reaches the corresponding part of the
presentation. In addition, supporting material is indexed
and tagged with metadata to facilitate searches and
random access to any location in the lecture. A screen
capture of a WLAP lecture is presented in Figure 1.
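The synchronization of slides to the audio/video can be pictured with a short sketch: given the time at which each slide first appears, a viewer looks up which slide belongs on screen at the current playback time. The timings, file names, and function name below are hypothetical and are not taken from the WLAP software.

```python
from bisect import bisect_right

# Hypothetical slide-change times, in seconds from the start of the video,
# and the matching slide images (illustrative names only).
SLIDE_STARTS = [0.0, 42.5, 130.0, 305.2]
SLIDE_FILES = ["slide01.jpg", "slide02.jpg", "slide03.jpg", "slide04.jpg"]


def slide_at(t, starts=SLIDE_STARTS, files=SLIDE_FILES):
    """Return the slide that should be on screen at playback time t (seconds)."""
    # bisect_right counts how many slide changes occurred at or before t.
    i = bisect_right(starts, t) - 1
    # Before the first change (or for negative t), show the first slide.
    return files[max(i, 0)]


print(slide_at(100.0))  # slide02.jpg
```

The same lookup serves random access: seeking to any time in the video immediately yields the correct slide.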
Example WLAP lectures can be viewed or downloaded
from the archive [2]. Viewing of the lectures requires
only a web browser and the free Real Player video plugin, and works on any modern platform.
Lecture Recording for HEP
The size and globally distributed nature of modern HEP
collaborations, such as those of the LHC experiments,
impose unique requirements and constraints on lecture
archiving, as well as on collaborative tools, in general. A
complete summary of these requirements is documented
in the LCG (LHC Computing Grid) RTAG (Requirements
and Technical Assessment) report on Collaborative Tools
[12]. Concerning material, our experience is that there is
significant demand for the recording of collaboration
plenary sessions, training seminars, and tutorial sessions.
Recording lectures in each of these environments presents
a variety of challenges.
Figure 1: Screen capture of a typical WLAP web lecture.
Archiving of collaboration plenary sessions provides a
permanent record of key stages of the planning and
decision-making process of a major collaboration. If such
sessions can be recorded and made available quickly,
however, the archive can also serve as a
communication tool, complementary to the remote
participation provided by web casting, phone or audio
conferencing. Such asynchronous communication can be
advantageous both for addressing major time-zone
differences and for the capability of selecting only the
material of interest.
The need for quick turnaround time from the recording
of a lecture to its publication has been an important
motivating factor in our development of automated
techniques to record, encode and compress audio, video,
and support material, as well as our methods to construct
lectures in real time.
Training seminars and tutorial sessions are, in general,
the most viewed of the lectures in the archive. The
existence of these archives is invaluable for large
collaborations that need to almost continuously train a
new work force in the ever-evolving tools of the trade.
Experts on particular topics, who would normally have
spent a significant amount of time and resources
travelling to train colleagues, can dramatically decrease
this effort to a few recording sessions at their home
institute, a conference, or at CERN.
The archival of training seminars and tutorials typically
does not require the fast turnaround time of a plenary
session. Some time must be spent on the development of a
high quality record of the event, often with the inclusion
of additional support material. However, a certain amount
of automation is desirable, primarily to reduce the
manpower necessary to construct the lectures.
Large-Scale Lecture Recording
Collaboration events, such as plenary weeks or
seminars, typically require the effort of 1-2 FTE to handle
the recording and archiving of lectures. Large-scale
workshops, such as CHEP (Computing in High Energy
and Nuclear Physics), or national physical society
meetings, such as the APS (American Physical Society)
March Meeting, host a large number of parallel sessions,
and would require a substantial scaling of resources,
unless a significant degree of additional automation could
be achieved.
Another potentially revolutionary use-case for the
large-scale recording of lectures is that of the classroom.
Several educational institutes, including the University of
Michigan, are now beginning to implement the systematic
recording of classroom lectures. The University of
Michigan School of Dentistry has conducted a successful
program of podcasting audio recordings of its first and
second year courses, and the Physics Department, in
cooperation with our group, is launching the web lecture
recording of its General Physics I course, attended by
hundreds of students. Our efforts to investigate the
feasibility of such large-scale programs for the University
and for the APS have led to some dramatic developments
toward the eventually complete automation of audio and
video capture. We describe one of these developments below.
The Web Lecture Capture Device
The Web Lecture Capture Device (WLCD) is a
portable kit designed to provide the tools necessary to
automatically capture, construct, and upload WLAP web
lectures in real time. The kit includes an audio and video
capture device, associated software, and a robotic camera.
Development of the kit is coordinated by the University
of Michigan ATLAS Collaboratory Project, with the goal
of automatically recording classroom lectures at the
University for publication on its online course
management system, CTools [13].
Perhaps the most challenging and intriguing component
of the WLCD project is the development of the robotic
camera. If successful, it would be possible to install such
a system in a classroom or in the parallel session lecture
rooms of major conferences, to record and publish
lectures with essentially no manpower, other than that
needed for installation and take down. The design and
construction of such a system, however, is far from
obvious, and has thus far never been successfully
One could imagine, as an alternative, using the video
obtained by placing a camera with a wide-angle view of
the lecture area on a tripod, so as to capture an image of
the speaker, regardless of where she or he roams during
the presentation. It is our experience, however, that video
obtained in such a manner does not provide images of the
quality obtained by a camera operator, which enrich the
content of a lecture. Without a clear view of
facial expressions, hand gestures, and other visual cues,
the video becomes more of a distraction than an aid and is
better off omitted from the final lecture.
We are thus left with the difficult task of attempting to
mimic the intelligence and abilities of an experienced
camera operator. To achieve this, we have developed a
system that is based on the detection of signals coming
from an infrared emitting LED necklace, worn by the
lecturer. Two cameras are required to make the system
work: one camera that tracks movements of the necklace,
and a second camera for capturing the video, which is
directed by software that inputs and analyzes the signals
coming from the first camera. Figure 2 presents the
various components of the tracking camera system.
Figure 2: The IR Camera Tracking System, comprising
the camera pair (left) and the IR emitting necklace.
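As a rough illustration of the tracking step, the sketch below locates a bright infrared source in a single grayscale frame by averaging the coordinates of all pixels above a brightness threshold. This is a simplified stand-in and not the algorithm used in the WLAP system; the frame format (rows of 0-255 values), the threshold, and the minimum pixel count are assumptions made for the example.

```python
def ir_centroid(frame, threshold=200, min_pixels=3):
    """Return the (x, y) centroid of pixels at or above threshold, or None.

    frame: list of rows, each a list of brightness values (assumed 0-255).
    min_pixels: reject detections with too few bright pixels (likely noise).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count < min_pixels:
        return None  # no credible source in this frame
    return (sum_x / count, sum_y / count)


# Illustrative 5x5 frame with a short bright streak in the middle row.
frame = [[0] * 5 for _ in range(5)]
for x in (1, 2, 3):
    frame[2][x] = 255
print(ir_centroid(frame))  # (2.0, 2.0)
```

A simple threshold of this kind cannot by itself separate the necklace from other infrared sources such as lamps or sunlight.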
Although development of the robotic camera is still in
its early stages, preliminary results are promising and
work is now focused on refining the algorithms for
differentiating the necklace signal from background due
to other infrared sources (bright incandescent lights and
the sun, notably) and on tuning the camera so that it
reacts only to major movements of the lecturer. The
ability to mimic the brain of an experienced camera
operator is a distant goal, but providing video from a
completely automated lecture capture device with
sufficient quality for web lectures appears within reach.
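One common way to make a camera react only to major movements is a deadband: the camera holds still while the speaker stays near the center of the frame and pans only when the offset exceeds a set fraction of the frame width. The sketch below shows the idea under stated assumptions; the function name, the deadband and gain values, and the units of the returned correction are hypothetical and do not describe the WLAP software.

```python
def pan_command(target_x, frame_width, deadband=0.15, gain=0.5):
    """Return a pan correction as a fraction of frame width (0.0 = hold still).

    target_x: horizontal pixel position of the necklace in the tracking
              camera image, or None if it was not detected.
    deadband: offsets from center no larger than this fraction are ignored.
    gain:     fraction of the remaining offset corrected per update.
    """
    if target_x is None:
        return 0.0  # hold position when the necklace is not seen
    # Offset from image center: -0.5 at the left edge, +0.5 at the right edge.
    offset = target_x / frame_width - 0.5
    if abs(offset) <= deadband:
        return 0.0  # small movements do not move the camera
    return gain * offset


print(pan_command(340, 640))  # 0.0  (speaker near center)
print(pan_command(640, 640))  # 0.25 (speaker at right edge)
```

Widening the deadband yields a calmer picture at the cost of letting the speaker drift further from center before the camera follows.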
Lecture Viewing
The diverse nature of modern HEP collaborations and
their increasingly long lifetimes (e.g., 20-30 years for the
LHC collaborations) impose restrictions on the nature of
the lecture archives, their storage, and the means by
which they can be viewed. Proprietary solutions, which
would be otherwise suitable and easily implemented in a
commercial setting, pose problems in our environment for
a variety of reasons.
First of all, large fees often accompany solutions that
rely on one particular software technology for archiving
and/or viewing. The technology might be free at the
onset, but once one has built a significant library of
material, one does not want to be held hostage to newly
imposed and unforeseen licensing costs. Secondly, the
companies providing the solutions and/or the product
being exploited might not last as long as is desired for the
archive. Some LHC material, such as tutorials or records
of major discoveries (hopefully a recurrent theme), will
have value for the entire duration of the experiments, if
not longer. Very few commercial software
applications have such a long lifetime.
Finally – and perhaps most pragmatically – our HEP
colleagues are not easily convinced to buy into specific
technologies. In some cases, it is for the reasons cited
above, in other cases due to licensing agreements already
made within the home institutes, and in other cases for
matters of taste. Regardless, it would not be wise to store
lectures in a format that required one specific brand of
software for viewing, unless that format was easily

<slide title="Second" type="image/jpeg"

The Lecture Object
We present here a few ideas for advancements to the
technology, which in our opinion merit immediate effort.
It is for the reasons cited above that the Lecture Object
was proposed as a standard format for lecture archive
storage. Details of the Lecture Object can be found in the
proposal [11]. It is essentially the collection of the raw
material that has been captured from a presentation, such
as audio, video, slides, screen captures, etc., and an XML
representation of the information needed to construct a
viewable lecture, such as the timing of the slides,
metadata, and the names and locations of the media files.
Initial tests of the lecture object included the
development of transformations to allow the construction
of lectures in SMIL (Synchronized Multimedia
Integration Language) [14], a W3C (World-Wide Web
Consortium) standard format, or the usual Sync-O-Matic
HTML lectures. It should be noted that SMIL was not
chosen as the archive format, as it lacks the complete
functionality needed to construct lectures for a variety of
different viewing applications. For example, Lecture
Objects can now be easily transformed into lectures
viewable on PDAs or even Apple iPods, as well as in
the usual web browser interface. Figure 3 presents a small
extraction from the lecture object XML description of a
typical WLAP lecture.
<?xml version="1.0" encoding="UTF-8"?>
<video title="Welcome to WLAP"
<seq title="Sequence of slides"
<slide title="First" type="image/gif"
Figure 3: Extraction from a lecture object XML
description of a typical WLAP lecture.
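To make the idea of a transformation concrete, the following sketch converts a small XML lecture description into a SMIL presentation in which the video plays in parallel with a timed sequence of slide images. The input schema here (the lecture, video, and slide elements and their src and begin attributes) is invented for the example and is not the Lecture Object schema of the proposal [11]; only the SMIL element names (smil, body, par, seq, video, img) and the src and dur attributes come from the W3C standard. The sketch also assumes that the first slide begins at time zero.

```python
import xml.etree.ElementTree as ET

# Hypothetical lecture description; element and attribute names are
# illustrative only and do not reproduce the Lecture Object schema.
LECTURE_XML = """
<lecture title="Welcome to WLAP">
  <video src="talk.mp4"/>
  <slide title="First" src="slide01.gif" begin="0"/>
  <slide title="Second" src="slide02.jpg" begin="45"/>
</lecture>
"""


def to_smil(lecture_xml, total_seconds):
    """Build a SMIL document: video in parallel with a sequence of slides."""
    lecture = ET.fromstring(lecture_xml.strip())
    smil = ET.Element("smil")
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    ET.SubElement(par, "video", src=lecture.find("video").get("src"))
    seq = ET.SubElement(par, "seq")
    slides = lecture.findall("slide")
    # Each slide is shown until the next one begins; the last slide runs
    # until the end of the lecture.
    begins = [float(s.get("begin")) for s in slides] + [float(total_seconds)]
    for slide, start, end in zip(slides, begins, begins[1:]):
        ET.SubElement(seq, "img", src=slide.get("src"), dur=f"{end - start:g}s")
    return ET.tostring(smil, encoding="unicode")


print(to_smil(LECTURE_XML, total_seconds=120))
```

Because the description is kept separate from any one player format, a second function of the same shape could target a different viewer without touching the archived material.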
All recent WLAP lectures are stored in the archive
database as lecture objects. Media files, such as audio and
video – a part of the lecture object – can be stored in a
physically separate database, provided the transformation
software has access. This can be advantageous, depending
on the size and type of media stored. Naturally, WLAP
media are stored in standard formats, such as MPEG-4 or
JPEG. Metadata is stored using RDF (Resource
Description Framework) [15], another W3C standard
XML language.
As with most computing-based efforts, efficiency in
lecture archival would greatly benefit from the adoption
of a standard language for lecture description. Universal
acceptance of the Lecture Object or an equivalent
standard would simplify the development of large-scale
web-lecture databases and allow easy sharing of tools and
Our focus in this direction is to complete the
development of transformations from LO to a variety of
viewing platforms and applications, including Real
Player™, QuickTime™, and Media Player™. Much work
will also be needed in the political arena,
convincing the major players to find agreement, both on
the recording and publishing sides.
Merging of web archiving with web casting and video
conferencing is a must and ought to be straightforward.
Existing video conferencing standards, such as the ITU
(International Telecommunication Union) standard H.239
[16], already provide the ability to broadcast simultaneous
video signals. Capture and archival of these signals in real
time is only a matter of adapting existing technology.
Demand for such functionality is evident in HEP as well
as academia.
The ability to port lectures to new technologies, such as
PDAs or Apple iPods™ [17], has already been
demonstrated. This flexibility would be given a serious
boost by adoption of a standard, such as the lecture
Our focus is on optimizing transformations to adapt to
the peculiarities of these devices (small video, limited
formats) to create quality lectures for their viewers. This
work targets a typically younger audience and could make
a lasting impact in education, but also on HEP. After all,
shouldn’t CHEP presentations, such as this, be available
for anyone, anywhere?
We would like to thank all past and current members of
WLAP for the significant contributions that have helped
to make lecture archival a growing success. We thank the
CERN Summer School, Academic and Technical
Training Programmes, CERN IT, the ATLAS
Collaboration, the University of Michigan Department of
Physics, the U. M. Media Union, and the ATLAS
Collaboratory Project, for their collaboration and support.
We also acknowledge and thank the U. S. National
Science Foundation and the U. S. Department of Energy
for funding the research. Finally, we thank the organizers
of CHEP 2006 and the Tata Institute of Fundamental
Research for inviting us to present our work and vision.
