PENTIMENTO: RETROACTIVE EDITING FOR LECTURES
by
Kenny H. Lam
B.S. Physics, B.S. Computer Science & Engineering
Massachusetts Institute of Technology, 2013
Submitted to the Department of Electrical Engineering and Computer Science in Partial
Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology
June 2014
© 2014 Massachusetts Institute of Technology. All rights reserved.
Signature of Author: [Signature redacted]
Department of Electrical Engineering and Computer Science
May 20, 2014
Certified by: [Signature redacted]
Fredo Durand
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by: [Signature redacted]
Albert R. Meyer
Professor of Electrical Engineering and Computer Science
Chairman, Masters of Engineering Thesis Committee
PENTIMENTO: RETROACTIVE EDITING FOR LECTURES
by
Kenny H. Lam
B.S. Physics, B.S. Computer Science & Engineering
Massachusetts Institute of Technology, 2013
Submitted to the Department of Electrical Engineering and Computer Science
on May 20, 2014
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology
ABSTRACT
The boom in online education has created the potential for a personalized lecture
experience for every student. Recorded lectures provide a major benefit to both
students and authors, but currently present several drawbacks as well. These limitations
stem from the method by which lectures are created: video recording. Video
recordings inherently limit the editing capabilities of an author and constrain the interaction
available to students, making video a poor choice of medium. An alternative encoding of a
lecture could provide a much fuller feature set to users on both sides of a lecture.
The Pentimento system was designed to expedite the creation of hand-drawn
lecture notes for online education platforms such as edX or Coursera. By decoupling the
visual and audio domains of a lecture, content creators are able to more freely fix mistakes or
change small portions without the need to re-record the correct portions. Small recordings are
pieced together to give the final lecture; the system handles the correct synchronization of
edits across the lecture, so the lecture appears to have been seamlessly recorded
in one session. Full control of the data also allows for the potential of increased interactivity
from students.
Thesis Supervisor: Fredo Durand
Title: Associate Professor of Electrical Engineering and Computer Science
Acknowledgements
First of all, I must thank Fredo Durand for his guidance in the creation of the overall
system. He was the original creator of the Pentimento system, and without his vision, none
of this would have been possible. His experience definitely played a key role in solving many
of the technical issues which came up in the course of this work. As I encountered the
quickly-growing complexity of the system, I could not help but be amazed that one person
could have built so much alone and without guidance.
I must also thank my collaborators who diligently put work into various parts of the
system; without their efforts, the implementation would have taken dramatically longer.
Steve Komarov and others at edX and Harvard helped provide an early-stage recorder, along
with knowledge of future challenges to overcome. Halla Moore and Richard Lu also created
important building blocks for the system, the undo manager and renderer, respectively. Their
efforts and enthusiasm were invaluable to completion of the project in a timely manner.
I would also like to thank the incredible set of people whom I have met here, and
especially the friends who have helped me get through the entirety of MIT with some
semblance of my sanity remaining. A special thank you goes to the members of the student
organization Camp Kesem. Though they often encouraged me to embrace the craziness
within, the counselors, the kids, and the families are truly the greatest inspiration I have
ever encountered. Without their constant laughter, love, and support, I am sure that I would
not be where I am today.
Finally, I must thank both my parents, Mai and Joe Lam, and my brother, Andy Lam.
In so many ways, my brother has been the major driving force that allowed me to get to MIT
in the first place. Without him and his subtle praises or guidance, I doubt I could be here
today. My parents, being immigrants, opened up doors for me through their sheer diligence.
It is through them that I learned how to work hard, and I cannot thank them enough for all
their support throughout the years.
Contents
1 Introduction
    1.1 Motivation
    1.2 Overview
2 Related Work
    2.1 Recording
    2.2 Editing
    2.3 Playback
3 Features
    3.1 Recording Mode
    3.2 Edit Mode
    3.3 Undo & Redo
    3.4 Constraints
4 Design
    4.1 Models
        4.1.1 State Model
        4.1.2 Lecture Model
    4.2 Controllers
        4.2.1 Tools Controller
        4.2.2 Time Controller
        4.2.3 Lecture Controller
        4.2.4 Visuals Controller
        4.2.5 Recording Controller
        4.2.6 Retiming Controller
        4.2.7 Undo Manager
    4.3 View
5 Implementation
    5.1 Method
    5.2 Recording
    5.3 Editing
    5.4 Undo & Redo
6 Conclusion
    6.1 Future Work
List of Figures
1.1 edX Finger Exercises
2.1 YouTube seek mechanism
2.2 YouTube seek for long videos
3.1 Features available in Pentimento
3.2 Recording tools
3.3 Editing tools
4.1 MVC Layout
4.2 Monotonic ordering of constraints
4.3 Lecture model hierarchy
4.4 Controller logic flow
4.5 Overview of the view
5.1 Snap-to-insert
5.2 Deletion cascade
Chapter 1
Introduction
The current explosion of massive open online courses (MOOCs) has meant that lectures
should no longer be thought of as limited strictly to the classroom, but now have
a reach that is worldwide. With new distribution platforms such as edX and Coursera,
direct access to top-tier educational instruction on a variety of subjects is freely and openly
available. The growth of technology into the field of education has also meant that lectures
can be dramatically changed so that they no longer need to be delivered in real time but can be
pre-recorded and replayed later. These pre-recorded and durably stored lectures provide
several benefits, such as:
* any-time accessibility for students to watch on demand as they please, including the
freedom to stop and resume a lecture at a later, more convenient time
* speed control, which gives students the ability to vary the pace of a lecture
depending on the difficulty of the material, speed of the lecturer, and natural learning
speed of the student
* selective replay of sections which were confusing or misheard; while lectures were
previously monotonic in time and may have moved on without a student, students can
now individually control their own personal time within the lecture
* ability for students to pause and find external resources on the topic or underlying
principles if they are unfamiliar
However, the benefits of pre-recorded lectures are not limited to students; they also
provide value to the creator. Authors can now ensure that derivations and examples are
fully correct ahead of time. Later clarifications can be inserted directly into the original
lecture, and fixes for mistakes can be applied to the original lecture as well. Recorded
lectures also allow for incremental updating as the class is taught over several sessions, as the
relevant material changes, and as better techniques in the field arise.
This thesis presents Pentimento, a system which allows flexible authoring of hand-written
lectures. Pentimento no longer relies on the strict video format to deliver a lecture, but
the viewing experience remains seamless and familiar to the viewer. The separation of a
lecture into visual and audio components is the primary differentiator of the system from
other recording applications. Section 1.1 further discusses the motivations and applications
for this thesis, while Section 1.2 outlines the remainder of the thesis.
1.1 Motivation
Much of the motivation for this thesis comes from the ubiquity of MOOCs and online education
in the current world. We believe the recent entry of the edX and Coursera platforms into the
space, together with their numerous partnerships with acclaimed universities, can bring a
shift in the way higher learning is provided, and has already changed the way in which higher
education is available. The already-existing options of Khan Academy and MIT's
OpenCourseWare also add to the variety of resources available for students seeking to further
their education, each already servicing millions of requests a month [1], [2].
As the manner in which lectures are delivered changes, so must the tools change to take full
advantage of the new media. Currently, video provides a familiar but antiquated method
of delivery for lectures. The first drawback of the video delivery method is that it severely
limits the level of interaction available to students. With videos, students can pause
and resume, change playback speeds, or skip to another time, but these actions hardly
constitute meaningful interaction with the content presented.
As a result of the limited interaction available in videos, instructors are unable to make
accurate assessments until an evaluation is issued, returned, and graded, typically in the
form of an exam or problem set. Exams suffer from the fact that they exist at a large
granularity with respect to topics, so not all topics may be thoroughly addressed within
the limits of an exam. This brings about a potentially non-representative distribution of
scores, which may not reflect student understanding in general, but only understanding of
the topics which appeared and under the time constraints given. Problem sets seek to address
understanding at a smaller granularity, but typically still batch topics into one- or two-week
groupings. The trade-off that problem sets make for this smaller granularity, however, is their
rapid pace, which can also present a problem: if an assignment falls by the wayside, students
often need to focus on the current material and neglect to review what has passed until an
exam is upcoming.
Both of these methodologies fundamentally suffer from the fact that they do not
guide a student's understanding during a lecture, but instead ask students to review lectures
after they have already passed. More in-the-moment assessments could lead to a more
solid understanding the first time material is introduced, as opposed to a look-back-and-review
approach. edX currently provides a version of this by interspersing "finger exercises"
with short video segments, as shown in Figure 1.1. In this way, lectures no longer need
to be so rigid as to enforce one track upon all students, and students who struggle with
such mini-assessments could be put onto a slower track than students who excel with the
material. A slower track could have additional examples or sample problems, rather than
only providing students with the option to re-watch what they have already covered. Such a
system could help to mitigate learning inequality with its more thorough coverage of material
for slower tracks. Additionally, students who struggle with different concepts can receive more
personalized content on different sections, which a single, strictly linear lecture cannot provide.
Figure 1.1: An example of the finger exercises from a typical edX course. The exercises are
interspersed with shorter sections of video [3].
Videos also continue to present a variety of problems for students and authors. They
provide a good medium for distribution only in the case of sequential viewing; the option
to skip forwards or backwards raises the problem of how to discretize or sample the video
in a manner that is useful to someone seeking particular content. Video is also much more
costly to edit, due to the large amount of processing which must be
done on the data. Finally, videos are a relatively expensive means of storing data,
as not all elements in a video frame are relevant to the content meant to be served.
Additionally, since the aforementioned benefits of recorded lectures are predicated on the idea
that lectures can easily be manipulated after their initial recording, video is an inappropriate
choice of medium. Content which cannot be changed after the fact imposes rigidity and in
fact limits creators more than it benefits them. A lecture which cannot be edited, or where
many pains must be taken to do so, forces authors to have a fully fleshed-out script prior
to any final recording. Such recordings are a major hindrance since they impose an
all-or-nothing atomicity, requiring that lecturers not only have a full script or
notes beforehand, but also produce a recording which is acceptable. Any mistakes or stumbles
can force an entire re-recording from the beginning, since the mistakes are captured
in the recording. Long pauses in the recordings also disrupt the
continuity of the lecture, whereas classrooms give lecturers the option to naturally pause for
questions or to pose questions.
Moreover, this also takes away the ability of a lecturer to reuse the recording if any
information should change or need to be updated. While some lecturers may choose to do
their recordings at a smaller granularity so that more local changes can be made, there will be
increased overhead in stitching together the piecewise components. Creators must also impose
some sort of naming structure if they believe they may need to update the sub-lectures
at a later time. This presents yet another barrier to authors using the traditional video
format that is so common on platforms such as Khan Academy, Coursera, edX, and MIT's
OCW [4], [5], [6], [7].
1.2 Overview
This thesis presents Pentimento, based in principle on the original version proposed by
Professor Durand, but migrated to the web browser with a new overall design of system
components. The key element of the framework is that while other systems record a contiguous
and coupled stream of audio and visuals as they occur, Pentimento allows either to be
recorded on its own through a full decoupling of the audio and visual channels. This has the
major benefit that since the channels are recorded separately, they can be edited separately
without affecting the counterpart channel. Once each recording is correct, the two can be
synchronized however the user wishes, with variable pacing for different sections. The
synchronization, however, is non-binding and can still be edited afterwards. At no point is
the lecture in any sort of final or draft state; instead, the production and editing versions
are one and the same. This is very much by design, and reflects the mentality that a document
may never be finished, but may always be edited or updated with new information.
Chapter 2 discusses other pieces of work which are relevant to Pentimento.
A variety of tools exist for recording or editing, though they do not provide the same
flexibility in editing as our application. Chapter 3 discusses the features and tools which are
available to users of the system. Chapter 4 discusses the design of the data structures and
components which constitute Pentimento, along with the interactions between them. Chapter
5 discusses specific implementation choices which were made in the current version
of Pentimento and the specific semantics which are available in the application. Chapter 6
provides a summary of the system and its benefits, while also discussing what extensions can
be built on top of what exists at the time of writing.
Chapter 2
Related Work
A variety of applications exist for several of the individual features related to the Pentimento
system, though none embrace all the functionality that can be leveraged with the work here,
or allow for the extensibility that Pentimento provides for other media besides hand-drawn
strokes. Primarily, work relevant to us falls into three general categories of functionality:
recording, editing, and playback.
2.1 Recording
A variety of options are available for the recording of a lecture, ranging from true video
recording to screen capture to capture at the level of a stroke. The most basic approach
to capturing lectures is a full video capture of the lecturer as well as the blackboard.
This can require a substantial amount of hardware, including a camera or camcorder, tripod
stand, and microphone system. Overall, cost estimates for this type of system range
from $1000 to $3000 according to [8] and [9], not including time or cost for post-processing
and editing. Document cameras are also a feasible approach, though a microphone must be
purchased in addition to the video capture [10]. The price of some document cameras
in use, though, can be comparable with that of a camcorder alone, even with education pricing
[8], [11]. Instead of capturing physically written strokes, screen capture of hand-written
strokes is another approach, popularized by Khan Academy. Solutions such as this can use
any variety of screen capture tool and input tablet, with Khan Academy specifically using
Camtasia Recorder and a Wacom Bamboo Tablet in conjunction with a recording microphone
[12], [13].
A final option is to capture the strokes themselves instead of what is visible at all points
in time. Options such as Penultimate, Note Taker HD, Write, and a host of others referenced
in [14] or [15] capture writing at the stroke level instead of maintaining an entire video
screen capture at all times, which provides a much more efficient and relevant encoding for
hand-written lectures. More advanced options such as LiveScribe, NoteLedge, SoundNote,
and AudioNote also allow for synchronized recording of audio with the hand-written strokes
[14], [15]. Some options also allow for later recording of audio after the visuals have been
recorded.
2.2 Editing
The editing of recorded lectures primarily falls into the domain of video editors,
such as iMovie, Final Cut Pro, Adobe Premiere, or Avid. These video editors are highly
restrictive in that they do not fully leverage the underlying data within a stroke of the lecture,
but are intended as a more fully-featured option which can operate on any video [16]. This
means that in addition to facing the learning curve of a new piece of software and becoming
comfortable with it, users will also experience difficulty in trying to manipulate the strokes
within a video. The other option for video editing is to outsource the process to professionals,
though this option can be prohibitively expensive. Specifically, MIT's AMPS service provides
services for both the recording of a lecture and the editing process, though the cost is
around $295/hr, with a total semester cost around $10,000 [17], [9].
The prior options mentioned for recording of strokes also offer editing, in some
cases. However, to the best of our knowledge, the edits are strictly applied to the state of the
document: no sense of when the edits occurred is maintained, and the edits serve only to
apply fixes to notes, but do not constitute a process. While this is an incredibly useful
feature, some edits should be visible in a lecture to preserve a sense of why or when a change
occurred. Additionally, we do not believe these applications provide the ability to selectively
edit the speed of the different channels independently.
2.3 Playback
At a minimum, two very different types of authors exist in practice: those who record
many smaller lectures and provide the breakdown to students, and those who record one
longer lecture in a contiguous time block. The latter is very common for lecturers who have
agreed to their lectures being recorded as they are delivered in a classroom. However, either
scenario raises the question of how playback should be presented to the user, especially so
that playback is useful to users when seeking content.
Now commonplace, YouTube's seeking mechanism was rolled out in 2012. As shown in Figure
2.1, a series of thumbnails is presented when a user hovers over the seek bar within a video
[18]. This provides users with a way to quickly glance at different frames within the video so
that they may more quickly identify relevant frames and move in that direction. The
mechanism also allows for an additional layer of nesting, shown in Figure 2.2, for very long
videos, where a smaller pop-out of the seek bar is displayed before generation of thumbnails.
However, Figure 2.1 demonstrates a series of potential problems with the thumbnail display.
In videos where frames may be very distinct, thumbnails provide near immediate uniqueness,
but lectures where the majority of elements are on a blackboard do not necessarily display
such properties at a simple glance. An additional problem with thumbnails is that there is
no causality between them: it is unclear how one frame relates to another, and this continuity
may be very important in finding a particular section of a lecture.
Figure 2.1: The YouTube filmstrip allows for seeking within a video by displaying a
thumbnail preview of different frames within the video [18].
Another approach is to analyze the video for relevant items before presentation to the
students. NoteVideo offers such an approach for visuals by doing a video analysis on strokes
and providing an interface where users can click on strokes to replay at any particular stroke.
NoteVideo+ expands on this by closely coupling the transcript with the video, so that users
can see the related text displayed when hovering over a stroke, and also giving users the
option to search the transcript and jump to those points within the video [19].
Figure 2.2: The YouTube interface allows for zoom into a smaller section of the seek bar,
which then expands into a thumbnail [18].
Chapter 3
Features
The primary feature of this thesis is the ability for edits to occur in a system which also
performs the recording of hand-written lectures. As such, it is important for users to
distinguish between two different modes of operation: one is the recording mode, which
allows new material to be inserted at any time within the lecture, and the other is
the editing mode, where users can change the properties of already existing material. Each
mode offers a different suite of tools appropriate to the current mode context.
Figure 3.1 presents some common features provided by the application which are useful in
the creation of a hand-written lecture.
3.1 Recording Mode
The recording mode encapsulates all activity of both the tools which provide live changes
and their potential configurations. The tools which are active during this time are all
additive in the sense that they modify a lecture with new material which did not previously
exist and they never remove content from the lecture itself. Tool modifiers such as changing
the stroke width also exist for the recording-mode tools. The handful of tools that exist for
recording are shown in Figure 3.2 and described below:
Figure 3.1: A preview of some of the features which Pentimento provides for flexible
recording and editing.
(a) The pen tool allows for strokes to be captured directly by the application.
(b) There is the option to edit strokes which were written in various ways, such as selection
followed by a deletion.
(c) Flexible recording allows users to move back in time and record fixes at the time of the
mistake, instead of recording them later. This makes it seem as if the mistake has never
occurred.
Figure 3.2: Tools available during recording
* selection tool, which is used to take visuals on-screen and put them into the current
selection. This can be used as a more long-lasting highlight tool or to buffer visuals for
another tool's functionality
* pen tool, which is the primary tool used for adding content. This allows a user to create
new visuals and draw strokes onto the canvas which are all recorded by the application
* highlighter tool, which is used to emphasize a section of the canvas, typically
already-drawn visuals. The highlights are recorded by the system,
though the emphasis is temporary with respect to time within the lecture
* width tool, which allows users to change the stroke width of their pen or highlight strokes
* delete tool, which can be used to remove visuals at a time in the lecture. It is important to
note that a deletion which occurs during recording simply removes that visual from
being visible at any later time in the lecture. The visual will still exist in the underlying
data structure, but will have a flag set for time of deletion. This is in contrast to an
edit-mode deletion, which is described in the next section
* additional slide tool, which provides the user with a new, blank canvas which has no notion
of any previous or future slide
3.2 Edit Mode
The edit mode tools are the ones which allow for retroactive changes to be applied to the
lecture. If a mistake is made and the user wishes to fully erase the mistake from ever having
existed in the lecture, the suite of edit tools allows for changes to be directly applied to visual
elements. Tools within this mode cannot add any new material into the lecture, but instead
modify visuals or data which are already within the lecture. The set of tools which exist for
edits, shown in Figure 3.3, are:
* play/pause tool, which allows the user to preview the lecture as it will be exported in its
current form
* select tool, which is analogous to the select tool in the recording mode. Selecting in the edit
mode has no side effects for the recording itself
* delete tool, which is the tool for full removal of a visual from the history of a lecture. In
contrast with the recording-mode deletion, this deletion is a full removal of the visual
from the data structure
* redraw tool, which allows the user to replace visuals with a new set of visuals. This action
acts as a shortcut for the compound action of deleting a selection and starting a recording.
Because users can replace one visual with any number, they must specify when the
recording ends
* changing the stroke width, which will edit a stroke or strokes to appear as if they were
originally drawn with the specified width
* deletion of the currently active slide, which will remove the entire slide and its encapsulated
visuals from the lecture
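The difference between the two deletion semantics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Visual` and `Slide` classes, their fields, and the method names are hypothetical and are not taken from the Pentimento code base. The sketch only mirrors the behavior described above, where a recording-mode deletion flags a time of deletion and an edit-mode deletion removes the visual from the data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Visual:
    """A recorded visual (hypothetical structure, for illustration only)."""
    name: str
    t_created: float                   # lecture time at which the visual appears
    t_deleted: Optional[float] = None  # flag set by a recording-mode deletion

    def visible_at(self, t: float) -> bool:
        return self.t_created <= t and (self.t_deleted is None or t < self.t_deleted)


@dataclass
class Slide:
    visuals: List[Visual] = field(default_factory=list)

    def record_delete(self, visual: Visual, t: float) -> None:
        """Recording-mode deletion: keep the visual, flag its time of deletion."""
        visual.t_deleted = t

    def edit_delete(self, visual: Visual) -> None:
        """Edit-mode deletion: remove the visual from the data structure."""
        self.visuals.remove(visual)

    def visible_at(self, t: float) -> List[str]:
        return [v.name for v in self.visuals if v.visible_at(t)]


slide = Slide([Visual("typo", 1.0), Visual("title", 0.0)])
slide.record_delete(slide.visuals[0], t=5.0)
print(slide.visible_at(3.0))  # the typo is still visible before its deletion time
print(slide.visible_at(6.0))  # the typo is hidden, but still stored
slide.edit_delete(slide.visuals[0])
print(slide.visible_at(3.0))  # the typo is gone at every time in the lecture
```

In this sketch a recording-mode deletion is itself part of the lecture's timeline, while an edit-mode deletion rewrites history so that playback at any time no longer shows the visual.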
24
,Ow. 0No Pcae
S
0-4-'. Select I
Cdft* I
redraw selectlon I
I
Figure 3.3: Tools available during editing
3.3 Undo & Redo
Pentimento is also integrated with both an undo and a redo stack, which allow for actions to
be inverted or replayed. The ability to undo or redo at varying granularity is also supported,
where users may want to undo only the last action or undo an entire set of successive actions.
The undo and redo actions are accessible in both the recording and edit modes, though the
ability to perform either is dependent on which actions have most recently been performed.
They appear in both sections of tools simply for spatial association in the view.
While the recording and editing modes enforce strict additivity or non-additivity,
respectively, the undo and redo exist as meta-tools which diverge from the mode-enforced
semantics. These tools are exemptions from the mode-based restrictions in order to provide
a correct meaning of the undo action and redo action to the user. For example, a stroke can
be drawn in the recording mode, and the undo action applied to the stroke will erase the
stroke from ever having existed in the lecture, a very edit-like action during the recording
mode. Likewise, a stroke can be deleted in the edit mode, and the undo of that action will
replace that stroke in time and space, an action which is additive in nature. In both cases, if
the user wishes to not have performed the most recent action in the first place, the undo
must break the mode semantics. The exact meaning of undo & redo in different contexts
is provided in further detail in Chapter 5, which discusses the implementation and specific
guarantees of Pentimento.
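A minimal sketch of the two-stack behavior described above is given here in Python. It is not the undo manager used in Pentimento: the `UndoManager` class, its method names, and the choice to clear the redo stack on every new action are assumptions made for illustration. Each action is stored with its inverse, so the undo of an additive action removes content and the undo of a deletion restores it, regardless of mode.

```python
from typing import Callable, List, Tuple

Action = Tuple[Callable[[], None], Callable[[], None]]  # (do, undo)


class UndoManager:
    """Hypothetical two-stack undo manager, for illustration only."""

    def __init__(self) -> None:
        self.undo_stack: List[Action] = []
        self.redo_stack: List[Action] = []

    def perform(self, do: Callable[[], None], undo: Callable[[], None]) -> None:
        do()
        self.undo_stack.append((do, undo))
        self.redo_stack.clear()  # assumption: a new action invalidates redo history

    def undo(self, count: int = 1) -> None:
        """Invert the last `count` actions (coarser granularity with count > 1)."""
        for _ in range(min(count, len(self.undo_stack))):
            do, undo = self.undo_stack.pop()
            undo()
            self.redo_stack.append((do, undo))

    def redo(self, count: int = 1) -> None:
        """Replay the last `count` undone actions."""
        for _ in range(min(count, len(self.redo_stack))):
            do, undo = self.redo_stack.pop()
            do()
            self.undo_stack.append((do, undo))


strokes: List[str] = []
manager = UndoManager()

# Recording mode: drawing is additive, but its undo erases the stroke entirely.
manager.perform(lambda: strokes.append("s1"), lambda: strokes.remove("s1"))
manager.perform(lambda: strokes.append("s2"), lambda: strokes.remove("s2"))
# Edit mode: deletion is non-additive, but its undo puts the stroke back.
manager.perform(lambda: strokes.remove("s1"), lambda: strokes.insert(0, "s1"))

manager.undo()         # restores s1, an additive result reached from the edit mode
manager.undo(count=2)  # undoes both drawn strokes as one coarser step
manager.redo()         # redraws s1
print(strokes)
```

Storing the inverse alongside each action is one common way to let undo step outside the mode restrictions without the modes themselves needing to know about it.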
3.4 Constraints
The decoupling of audio and visuals is what allows Pentimento to give users the freedom to
edit one channel without directly affecting the counterpart. However, this immense
freedom comes at the price of removing the tightly coupled meaning of any specific time:
where previously a time corresponded to a visual and its associated audio, it now may
correspond only to a visual, with the associated audio shifted to another time, or vice versa.
This disjointness brings about two separate notions of time, which can now be skewed or
shifted by variable amounts at different sections within the lecture.
This is not a problem, though, since the natural speeds of writing and speaking are not
the same, and each will change when trying to do both [16]. Additionally, the pacing of
each channel may need to be adjusted separately in different sections to compensate for this
difference. In order to allow users to specify the relative speed of each channel, users can place
constraints within a lecture, where a constraint consists of a (visual-time, audio-time)
pair. This allows for the specification of a loose synchronization between the two channels,
and these constraint points can be changed later in the edit mode. The interpretation of
constraints is further discussed in Chapter 5.
Chapter 4
Design
As with any system, the design of Pentimento is strongly influenced by the features and
semantics which we guarantee to the user. The promises of the application motivate the
layout of the data structures, which are described in detail within this chapter. Several
additional factors also strongly influenced the design of this new Pentimento system: the
need for a clean separation of logic between components and the desire for extensibility of the
platform to other media beyond handwriting. The overall design of the system was
therefore geared to follow the MVC ideology as closely as possible to satisfy all preceding
factors. Strong adherence to the design principles set forth in the ideology has enforced
a clean design structure, which naturally provides strong modularity within each of the
components. This modularity in turn provides extensibility through the replacement of any
specific component or combination of components.
Figure 4.1 shows a high-level architecture, including several controllers and several
data layers. Important to note is that the controllers and models do not necessarily exist
in a one-to-one relation, as the controllers represent more a logical grouping of functions,
while the models represent data important to the application. In fact, for Pentimento, data
can be manipulated in several meaningful ways depending on the mode of operation, so
controllers are numerous compared to models.
This specific MVC design is also highly controller-centric: all updates between models and
view must be controller-driven, and no direct communication occurs between the model and view.
[Figure 4.1 diagram. Nodes: user, view, controllers, models. Edge labels: interacts, fires
events, apply updates, services reads, pushes updates.]
Figure 4.1: MVC layout of each component. This follows the MVP pattern, an MVC derivative
in which presenters bridge the model and view, as opposed to the canonical MVC [20], [21]. We
continue with the label of controllers for more familiarity to readers.
4.1 Models
The models serve the sole purpose of being containers with the proper structure to encapsulate
all the application data in a consistent manner. Two very different models exist for the
application: the lecture model and a global notion of the state model.
The lecture
encapsulates all the data relevant to the presentation and what will be presented to students
by the author, while the state is information relevant to the current session of the author,
but which should not be saved across sessions or presented to the students.
4.1.1 State Model
Overall, the state model is an umbrella term for different pieces of sub-state which are each
managed through a different controller responsible for that sub-state. Any data which might
be considered transient for a session should be included within this model, and not within the
lecture model. The state serves primarily the purpose of being a channel for inter-controller
communication. Generally, there are three major sub-categories for the state: input state,
navigation state, and tool state.
The navigation state consists of the timing within the lecture and the currently active
slide that is viewed. The importance of slide-locality is discussed further in subsection 4.1.2,
but the notion of only being aware of the current slide allows for consideration of only the
intra-slide elements in most cases. This state is important so that the elements rendered
match the time cursor available to the user.
Input state comprises the sum of the input state from all the I/O devices on the author's
machine. The inputs may not always be relevant, but changes in their state are always
monitored. The choice to keep track of state variables allows the system to check against
the state instead of querying the device for its state on every event. These inputs serve to
potentially modify the effect of the tools which are active: selecting visuals with the Ctrl/Cmd
key or Shift key down will add to the current selection, the selection box should only be
expanded if the left mouse button is held down, etc. The correct combination of inputs can
also trigger the firing of a tool if enabled, such as the commonly used Ctrl/Cmd-Z shortcut
to fire the undo action.
The final state element, the tool state, consists of the properties which may modify
the functionality of any tool when changed. The most recently active color and the most
recently active stroke width both affect any type of drawing which may occur. Recording
is also considered a tool, so the parameters relevant to recording are encapsulated within
this category. Additionally, flags can be set for some elements of the hardware I/O, such as
whether to consider pressure-sensitivity in assigning a stroke's color or width or whether to
consider keyboard shortcuts in the firing of tools. Though the input state may continue to
reflect ignored variables, tools refer to the tool state for whether to consider the inputs.
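The three sub-states described above can be pictured as one plain object. The following is only a minimal sketch: every field name and the helper `selectVisuals` are hypothetical and chosen for illustration, not taken from the Pentimento source.

```javascript
// Hypothetical shape of the transient state model; all field names are illustrative.
function createState() {
  return {
    // Navigation state: the two time cursors and the active slide.
    navigation: { visualTime: 0, audioTime: 0, currentSlide: 0 },
    // Input state: cached device state, updated by event handlers.
    input: { ctrlOrCmdDown: false, shiftDown: false, leftMouseDown: false },
    // Tool state: properties that modify how tools behave.
    tool: {
      color: "#000000",
      strokeWidth: 2,
      isRecording: false,
      usePressure: true,
      useKeyboardShortcuts: true,
    },
  };
}

// A selection tool checks the cached input state instead of querying the device.
function selectVisuals(state, currentSelection, clicked) {
  const additive = state.input.ctrlOrCmdDown || state.input.shiftDown;
  return additive ? currentSelection.concat(clicked) : clicked.slice();
}
```

Keeping the modifier keys in the state object means a tool handler never needs to ask the keyboard directly; it reads the most recently recorded value.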
4.1.2 Lecture Model
The lecture model is the entity which contains all the information for the replay of a lecture,
and exists as a hierarchy of elements. At the highest level, this model consists of only a few
elements: an array of slides and an array of constraints, each explicitly ordered based on
monotonically increasing time. Though no unified notion of time exists, each channel should
be monotonically increasing, and constraints are not allowed to crisscross in their (visual-time,
audio-time) mappings, as shown in Figure 4.2. The hierarchy of components to the lecture
model is shown in Figure 4.3.
[Figure 4.2 panels, each plotting visual time against audio time. (a) This set of constraints is
disallowed, since a pair of constraints conflict. (b) This set of constraints is allowed, since all
constraints are monotonic and do not intersect.]
Figure 4.2: Pentimento enforces strictly non-intersecting constraints, so that constraints are
always monotonic in visuals and in audio. If two constraints conflict, such as in (a), one
constraint will be removed.
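The non-crossing rule can be checked pairwise. The sketch below assumes a constraint is an object with hypothetical fields `tVis` and `tAud`; it only detects a conflict and does not decide which constraint to remove, since the text does not specify that choice.

```javascript
// Returns true when `candidate` keeps the constraint set strictly monotonic
// in both channels. Field names tVis/tAud are assumptions for illustration.
function canInsertConstraint(constraints, candidate) {
  return constraints.every((c) => {
    const dv = candidate.tVis - c.tVis; // offset in visual time
    const da = candidate.tAud - c.tAud; // offset in audio time
    // Both offsets must be non-zero and share a sign, i.e. the candidate is
    // strictly before c in both channels or strictly after c in both.
    return dv * da > 0;
  });
}
```

A candidate that sits later in visual time but earlier in audio time than an existing constraint yields a negative product and is rejected, which corresponds to the crossing case in panel (a) of Figure 4.2.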
The choice of sub-dividing a lecture into slides is inspired by the idea that most lectures
can naturally be discretized into sections such as blackboards or Powerpoint slides. A slide is
a fully-contained, standalone unit that defines sharp boundaries for all data encapsulated
within it. Slides, like lectures, contain an array of visuals and a duration of existence, but
each slide also maintains an array of its transformations over time. Transformations at this
level primarily consist of camera changes, such as zooming in/out or panning to another area.
The duration of the entire lecture can be determined based on the sum of the slide durations,
and at any time t within a lecture, the active slide i can be defined such that:
$$\sum_{j=0}^{i-1} j.\mathrm{duration} < t \le \sum_{j=0}^{i} j.\mathrm{duration} \tag{4.1}$$
Time 0 is special in that it belongs to the first slide and is the only exception to this rule.
Slides also lack an explicit slide number; slide numbers are implicit and can instead
be inferred from the position of a slide in the array.
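The lookup described by Equation 4.1 can be sketched as a running sum over slide durations. This is an illustration only: it assumes each slide exposes a numeric `duration`, treats the upper bound of each slide's interval as inclusive, and uses a return value of -1 (an assumption) for times outside the lecture.

```javascript
// Total lecture duration is the sum of the slide durations.
function lectureDuration(slides) {
  return slides.reduce((sum, slide) => sum + slide.duration, 0);
}

// Index of the slide active at lecture time t, or -1 if t is out of range.
function activeSlideIndex(slides, t) {
  if (slides.length === 0) return -1;
  if (t === 0) return 0; // time 0 is the special case: it belongs to the first slide
  let elapsed = 0; // sum of durations of slides 0..i-1
  for (let i = 0; i < slides.length; i++) {
    const end = elapsed + slides[i].duration;
    if (elapsed < t && t <= end) return i;
    elapsed = end;
  }
  return -1;
}
```

Because slide numbers are implicit, the returned array index doubles as the slide number.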
[Figure 4.3 diagram: lecture object, slide object, visual object.]
Figure 4.3: Lecture model hierarchy
The visuals within a slide may vary quite a bit depending on their types, but they share
some common and fundamental fields. Visuals that are often helpful during a lecture
include the strokes an author writes, relevant images, and relevant videos.
All fields for a visual are defined relative to the slide, as opposed to the more global lecture,
to allow for more modularity between slides. Among the common fields these different visuals
may have are:
• type of the visual, which essentially identifies the class. In the mentioned examples of
visuals, the types would be stroke, img, video, respectively.
• tMin, which defines when the visual came into existence, either through direct recording
or later editing. This property is also when the viewer of the lecture should first
acknowledge the existence of the visual.
• properties is a map between strings which may vary based on the type of a visual.
For example, the properties of a stroke may include the color and the width of the
stroke, while images may have height and width attributes or a scaling attribute.
• transforms is an array of time-ordered transformations which may have been applied
to the visual. If a visual has been modified in any way after creation, such as being
scaled, sheared, or moved, such transformations will be stored in the transforms of
the visual.
• tDeletion is when the visual should no longer be visible or in existence during the
slide, after its tMin. This field is useful when a visual should only be visible for a small
duration during a slide, but not the entire duration of the slide after its tMin.
• hyperlink is a link to a foreign resource which is bound to the visual. For img and
video visuals, the link may simply be a reference to the source of the visual, but a
series of strokes which define an equation may link to an alternate derivation, such as
from Wikipedia.
The final layer fundamental to the lecture model is that of the vertex. Every visual
needs some notion of an (x,y) coordinate: images and videos may define attributes such
as height and width, but need a top-left-corner or some other such relative point defined;
strokes are simply a series of (x,y) coordinate vertices connected together. For Pentimento's
needs in particular, especially with its focus on hand-drawn strokes, vertices are defined as
(x,y,t, p) coordinates, with t the time it was laid down and p the pen-pressure with which
it was laid down. The latter field may not always be useful for all visual types.
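As an illustration of the fields above, a stroke visual might be represented as follows. The concrete values, the `vertices` field name, and the use of `null` for "never deleted" and "no hyperlink" are assumptions made for this sketch.

```javascript
// A hypothetical stroke visual; times are relative to the start of its slide.
const stroke = {
  type: "stroke",
  tMin: 2.0, // when the visual comes into existence
  tDeletion: null, // assumption: null means the visual is never deleted
  properties: { color: "#0000ff", width: "3" }, // string-to-string map
  transforms: [], // time-ordered transformations applied after creation
  hyperlink: null,
  vertices: [
    { x: 10, y: 20, t: 2.0, p: 0.4 }, // (x, y, t, p): position, time, pressure
    { x: 14, y: 26, t: 2.1, p: 0.6 },
    { x: 19, y: 30, t: 2.2, p: 0.5 },
  ],
};

// A visual is shown from tMin onward until its tDeletion, if one is set.
function isVisible(visual, tSlide) {
  const deleted = visual.tDeletion !== null && tSlide >= visual.tDeletion;
  return tSlide >= visual.tMin && !deleted;
}
```

For example, `isVisible(stroke, 1.0)` is false because the stroke has not yet come into existence at that slide time.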
4.2 Controllers
Controllers serve as the primary hub for all the application logic, serving to bridge the gap
between the data within the model and the view presented to the user. This means that
controllers are in charge of properly interpreting user input, manipulating the models in the
proper manner, and keeping a consistent state. Figure 4.4 shows a layout of the controllers
and their interactions among one another.
4.2.1 Tools Controller
The tools controller is the entry point for all user input in relation to tools and a dispatcher
to all the other controllers to perform the correct logic on the usage of a tool. This means
that this controller is in charge of attaching and detaching handlers on tool changes, while
also keeping track of changes to tool state, such as current stroke color or current stroke
width. Mode changes, from editing-mode to recording-mode or vice-versa, are also loosely
classified as tool changes, but such changes render some tools usable and others unusable.
Therefore, it is also the responsibility of the tools controller to update the view in response
to user input which changes the mode of operation.
[Figure 4.4 diagram. Recoverable labels: UI, UI events, Recording Controller, Lecture
Controller, Visuals Controller, Lecture Model. Edge labels: queries for update to model, visual
changes, slide changes, constraint changes, applied changes to model. Flows: recording mode,
editing mode.]
Figure 4.4: The layout of controllers in relation to the lecture model. The green flow
represents the recording mode action flow, while the blue flow is the flow for editing mode
actions.
4.2.2 Time Controller
The time controller is the sole controller responsible for updating the state to reflect the
correct current time of the session. Important to note, though, is that there are two separate
notions of time, with one for the visuals and another for the audio. Though many other
components may ask to read the current visual time or audio time from the state, such as
the renderer which must know at what time step to render, the time controller maintains
the exclusive right to update the time. Any controller that wishes to update the time must
ask the time controller to do so on its behalf.
4.2.3 Lecture Controller
The lecture model itself is hidden behind this controller so that the lecture controller is
the only one capable of editing the lecture's fields. The lecture controller only considers
high-level changes such as the addition or deletion of slides. This is also the controller
that maintains the global notion of the lecture, so it performs any computation which depends
on the entirety of its components, such as determining which slide a given time corresponds to.
The lecture also maintains no notion of duration; instead, its duration is determined by the sum
of the durations of its slides, which is yet another computation this controller is responsible for.
The time controller will update the time appropriately, then request that the lecture controller
update the current slide within the state to reflect the correct slide.
4.2.4 Visuals Controller
Manipulation of the properties of a visual falls under the responsibilities of this controller,
regardless of which slide is relevant. Any change related to visuals must ask the visuals
controller to perform the update on its behalf. The addition of a new visual, deletion of
a visual, or modification of a visual are all encapsulated by the visuals controller. This
controller may modify a visual by either changing its properties directly or appending a
transform object into its array of transformations. The visuals controller also has an audio
controller parallel which is discussed in Chapter 6.
4.2.5 Recording Controller
Changes to the lecture model can occur in one of two fundamental forms, edits and
additions, and the recording controller is charged with the latter functionality. The entry
point for any changes still falls with the tools controller, which then delegates responsibility
for any recording to this controller. While edits may alter the properties of already-existing
visuals, edits may not create or insert new visuals. The recording controller handles additions,
which occur at a time or span some time interval and incorporate into the lecture model a
change which did not previously exist.
Any change to the lecture happens directly on the underlying data structure, so the
controller acts as a hub for all logic which manipulates the model. It appropriately handles
temporary buffers for in-flight visuals which have not yet been completed, and asks the lecture
controller and visuals controller to perform actions on its behalf, such as appending a new
slide or inserting a new visual. As a side effect of direct manipulation, the currently active
time and currently active slide may change during a recording, so this controller must also
act in conjunction with the time controller to ask for correct updates to both.
4.2.6 Retiming Controller
The retiming controller handles the constraints which are placed in the application. At the
beginning and end of a recording, the recording controller will ask the retiming controller
to place constraints so that the elements within a recording remain coherent as they were
initially recorded. Later during editing, users can move these automatically placed constraints,
delete them, or insert their own. This controller also enforces the requirement that no two
constraints can conflict. Other controllers may also query the retiming controller for proper
interpolation of a visual time to an audio time or vice versa.
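One plausible reading of this interpolation is piecewise-linear mapping between neighboring constraints. The sketch below makes that assumption explicitly, along with two others: constraints are objects with hypothetical fields `tVis` and `tAud` sorted by `tVis`, and times outside the outermost constraints keep a constant offset. None of these choices is confirmed by the text.

```javascript
// Map a visual time to an audio time, assuming linear interpolation between
// neighboring constraints (an assumption, not the documented behavior).
function visualToAudio(constraints, tVis) {
  if (constraints.length === 0) return tVis; // no constraints: channels coincide
  const first = constraints[0];
  const last = constraints[constraints.length - 1];
  // Outside the constrained range, keep the offset of the nearest constraint.
  if (tVis <= first.tVis) return first.tAud + (tVis - first.tVis);
  if (tVis >= last.tVis) return last.tAud + (tVis - last.tVis);
  for (let i = 1; i < constraints.length; i++) {
    const a = constraints[i - 1];
    const b = constraints[i];
    if (tVis <= b.tVis) {
      const fraction = (tVis - a.tVis) / (b.tVis - a.tVis);
      return a.tAud + fraction * (b.tAud - a.tAud);
    }
  }
  return tVis; // not reached when constraints are sorted by tVis
}
```

The inverse mapping, from audio time to visual time, would follow the same pattern with the roles of the two fields swapped.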
4.2.7 Undo Manager
The undo manager is not in fact a controller, but exists as a logging mechanism for all actions
that have been performed. With each action, a log exists of its inverse action to be applied
on an undo() call. The undo manager keeps track not only of an undo stack of actions
that can be undone, but also of a redo stack of actions which were undone. Any action other
than an undo or redo that pushes directly to the undo stack will clear the redo stack.
Important to note is that the undo manager helps to guarantee flexible semantics of
the application. In fact, the undo manager helps to guarantee an asymmetric undo in some
cases, where the lecture model is modified to a point where it never was before the undo
action. Much like how the undo manager breaks the semantics of each mode, such scenarios
are possible where we believe it would be more meaningful to the user than performing an
exact undo. This is discussed further in Chapter 5, which discusses specific implementation
choices.
The undo manager's logic also reinforces the controller break-down as presented here.
Since each controller performs specific and clearly defined actions, the same controller must be
responsible for the inverse action. For example, because the visuals controller is responsible
for all changes to visuals, it is responsible for both the deletion of a visual and the addition
of a visual, which is the inverse action of the deletion. In this way, the types of events which
are placed onto the undo stack are well localized to each controller.
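The two-stack behavior described in this section can be sketched as below. The class and method names are hypothetical; the sketch only demonstrates that each logged action carries its inverse and that a fresh action clears the redo stack.

```javascript
// Minimal two-stack undo manager; names are illustrative only.
class UndoManager {
  constructor() {
    this.undoStack = [];
    this.redoStack = [];
  }

  // Log an action that was just performed, together with its inverse.
  // A fresh (non-undo, non-redo) action invalidates the redo stack.
  register(redoFn, undoFn) {
    this.undoStack.push({ redo: redoFn, undo: undoFn });
    this.redoStack.length = 0;
  }

  // Apply the inverse of the most recent action, if there is one.
  undo() {
    const entry = this.undoStack.pop();
    if (!entry) return false;
    entry.undo();
    this.redoStack.push(entry);
    return true;
  }

  // Re-apply the most recently undone action, if there is one.
  redo() {
    const entry = this.redoStack.pop();
    if (!entry) return false;
    entry.redo();
    this.undoStack.push(entry);
    return true;
  }
}
```

A controller that adds a visual would register a pair of closures, one that removes the visual and one that adds it again, which keeps the inverse logic local to the controller that owns the action.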
4.3 View
The view has several separate components which are managed individually: the editing tools,
the recording tools, the navigation, the drawing canvas, and the record start/stop. The
aforementioned tools controller handles the actions of correct tools in each mode, which still
leaves the problem of needing to correctly render the state of the lecture at any time. That
responsibility is moved to a separate component called the player. The player correctly handles
the constraints and changes to visuals which may have been made over the course of the
lecture. The player generates output based on the time frame at which it is asked to render.
The tools laid out across the top consist of the non-recording tools, which are used
during times when the application is not recording a section for the lecture. Along the side
are the recording tools which are available for use during a recording. Additionally, the
navigation state is reflected in the view as well. The layout can be seen in Figure 4.5, and
includes the option to begin a recording or stop a recording, with what channel should be
recorded.
The video time cursor and ticker box reflect the current state of a session. The primary
way in which authors navigate the visuals is through the slider, but the ticker will always
reflect the time of a lecture, even during a recording. The slider is disabled during recording.
The recording tools and editing tools reflect those which were described in more detail in
Section 3.1 and Section 3.2, respectively. The playing of a lecture and moving across slides
may also be helpful in editing, but do not modify the lecture itself, though they will change
the state. These tools are grouped with the editing tools since they navigate a lecture only
when not recording.
Figure 4.5: An overview of the view which is presented to a user. The aforementioned tools
are logically grouped by color in this figure.
Chapter 5
Implementation
5.1 Method
The underlying requirement that this embodiment of Pentimento be cross-compatible among
different platforms and different operating systems meant that we primarily considered options
which are widespread and mature for development. While we could have allowed for users to
download and install a binary, many of the features we wish to support lend themselves naturally
to the web, specifically hyperlinking and embedding of foreign visuals such as images or videos.
Additionally, because the majority of MOOCs are accessed through the internet, it seemed a
natural extension to provide the recording and viewing within a browser environment.
While the browser environment is primarily a benefit, there are some drawbacks
which should be addressed. In order to record audio, the origin policies of current browsers
require that the application be hosted, meaning that authors must either run a local
server to host the code or connect to a remote server which does so. Saving the contents of a
lecture, however, may be done either locally or remotely, though a local save would require
re-uploading the lecture before editing. The primary problem, however, is the
fragmentation of browsers and of the features each browser supports. Most notably, Internet
Explorer deviates from the standards to which Chrome and Firefox adhere, and vice-versa.
Additionally, we aim to support HTML5-compliant browsers, so last-generation browsers may
not receive a full set of features. Lastly, the different browsers also support different sets of
events and event rates, which can affect the smoothness of strokes, for example. Some of these
issues can be mitigated, such as by importing the third-party JavaScript library excanvas.js
in order to compensate for older browsers not supporting the canvas element [22].
5.2 Recording
Recording brings about a series of problems which are masked behind the logic of the
recording controller. The recording controller acts as the primary hub for every action that
occurs during a recording. It primarily handles the setting of various state aspects related to
recording, so that other controllers and UI handlers can reference the state to determine
whether or not to proceed. Control flow during a recording is shown in Figure 4.4.
The recording controller firstly marks the state to indicate a recording is in progress,
and keeps a local variable of when the recording began with respect to its local clock. It
then asks the time controller to begin a regular interval at which it updates the video cursor
(and/or the audio cursor depending on which type of recording is being performed). We
found that the regular updates were not quite regular, so instead each update performs
a local read of the clock and advances the cursor by the difference between that read and the
previous clock read. Handlers for the writing of strokes then notify the recording controller
of their specific event, leaving the recording controller to act on it as desired. For example,
beginning a stroke on the canvas will fire a handler which passes a new stroke object
into the recording controller; the recording controller will then mutate the time to be relative
to the beginning of the slide, and ask the visuals controller to add the visual into the slide.
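The clock-read approach to cursor updates can be sketched as follows. The function `createClockTicker` and its parameters are hypothetical; the point is that each tick advances time by the measured difference between clock reads, not by the nominal interval length.

```javascript
// Build a tick function that advances time by the measured elapsed clock time.
// `now` is any function returning the current time in milliseconds;
// `onAdvance` receives the elapsed time since the previous read.
function createClockTicker(now, onAdvance) {
  let lastRead = now();
  return function tick() {
    const current = now();
    const elapsed = current - lastRead; // actual time since the previous read
    lastRead = current;
    onAdvance(elapsed);
    return elapsed;
  };
}
```

In a browser, such a tick function could be driven by `setInterval(tick, 50)` with `Date.now` as the clock; late or early timer firings then change only how often the cursor moves, not how far it has moved in total.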
Additionally, the recording controller also masks visuals which do not apply to the
recording. Consider a visual which exists at time $t_1$, and a recording which begins at time
$t_0$ such that $t_0 < t_1$. The visual should come into existence once the time reaches $t_1$,
but this specification brings about awkward semantics during recording for the author. Instead,
the original visual should be shifted by the time of the recording within the slide, $\Delta$, and
should come into existence at $t_1 + \Delta$ during the slide. However, the value of $\Delta$
cannot be known until the recording is finished.
A variety of options are available for this problem, and we considered three possible
solutions. The first is to record into a temporary lecture which is merged with the
original at the end of the recording. However, this introduced large amounts of complexity in
both integration with the undo manager and rendering. The second solution is to perform a
"rolling shift", where the visuals are shifted on an interval basis, so they are never
displayed. Our solution is that any visuals in the current slide after $t_0$ are "disabled" by
the recording controller, which sets their tMin to Number.POSITIVE_INFINITY in JavaScript,
essentially moving those visuals to the end of the slide. Once a recording is finished, those
visuals are reset to their original tMin values, then shifted by $\Delta$.
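The disable-then-restore approach can be sketched in two small functions. The function names and the `slide.visuals` layout are assumptions; the sketch also treats "after the recording start" as a strict comparison and leaves out the matching shift of tDeletion and vertex times for brevity.

```javascript
// Hide every visual that begins after the recording start time t0 by pushing
// its tMin to infinity; remember the original values for later restoration.
function disableLaterVisuals(slide, t0) {
  const saved = [];
  for (const visual of slide.visuals) {
    if (visual.tMin > t0) {
      saved.push({ visual, tMin: visual.tMin });
      visual.tMin = Number.POSITIVE_INFINITY;
    }
  }
  return saved;
}

// Once the recording length (delta) is known, restore and shift the visuals.
function restoreAndShift(saved, delta) {
  for (const { visual, tMin } of saved) {
    visual.tMin = tMin + delta;
  }
}
```

Because the shift is applied only after the recording ends, the renderer never has to know the recording length in advance; a visibility test of the form `t >= tMin` is false for every finite time while the visual is disabled.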
Recording also makes three additional guarantees worthy of note. Firstly, it provides
snap-left insertion, meaning that if a user seeks to do a recording, the time will snap left
to immediately after the previous visual. However, if there is a visual appearing during
the insertion time, no snap-left will occur. This helps to eliminate the silences that come
about when a user begins recording at an arbitrary time within the lecture. We believe
that users very rarely want silences, but instead want to place constraints that are more
meaningful. Second, recording provides strict additivity in all aspects, but of particular note
are the transformations. For example, selection of visuals during a recording places a transform
within each of the selected visuals which says to display the selection color and selection
width until another transformation says otherwise. The deletion of a visual during a recording
also does not delete the visual, but simply sets the tDeletion appropriately. Third, the
recording controller also asks the retiming controller to properly maintain the timing within
the recording by placing constraints at the beginning and end of a recording.
[Figure 5.1 panels, each marking an insertion time on a visual-time axis. (a) The user wishes to
begin a recording at some point in time. (b) The system will snap the insertion time left to right
after the most previous visual. (c) When the recording time overlaps with a visual, no snap will
occur.]
Figure 5.1: The current time will snap backwards to whenever the user last had a visual,
eliminating silences between the recording and the previous visual, only when it does not
conflict with another visual.
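The snap-left rule can be sketched as a small function. This is an illustration under several assumptions: each visual carries a hypothetical `tEnd` (for example, the time of a stroke's last vertex) alongside its `tMin`, overlap is tested inclusively, and with no earlier visual the time snaps to the start of the slide.

```javascript
// Compute the effective insertion time for a recording requested at tInsert.
function snapLeft(visuals, tInsert) {
  let latestEnd = 0; // assumption: with no earlier visual, snap to slide start
  for (const v of visuals) {
    // A visual is being drawn at the insertion time: do not snap.
    if (v.tMin <= tInsert && tInsert <= v.tEnd) return tInsert;
    // Otherwise remember the latest visual that finished before tInsert.
    if (v.tEnd < tInsert) latestEnd = Math.max(latestEnd, v.tEnd);
  }
  return latestEnd;
}
```

With visuals spanning 1 to 3 and 6 to 8, a recording requested at time 5 starts at 3, which removes the silent gap, while a request at time 7 stays at 7 because it overlaps the second visual.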
5.3 Editing
Where recordings applied transforms to get the desired effect, editing is the process of
retroactively changing the properties of a visual directly. Changes to the color of a stroke or
width of a stroke directly change the stroke's property to make the stroke appear as if it were
originally drawn with the new color or width. Editing also performs the action of clearing all
conflicting transforms, so if a stroke is edited to have a new width, it will have that width for
its entire lifetime unless the user changes the stroke's properties during another recording.
Also important are our specifications for a deletion and their relation to time. In simply
deleting a visual, a space of empty time is left, which may be highly undesirable. Instead, we
allow for a time collapse upon a deletion, where the empty space left by a visual will get filled
in by shifting all subsequent visuals left by the time of the deleted visual. Again, we believe
the user rarely, if ever, means to leave empty space, but usually means to more appropriately
place constraints. However, simply shifting visuals will affect the constraints, so subsequent
constraints are shifted as well. Deletion of a slide affects the constraints in a similar fashion.
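The time collapse can be sketched as follows. This is a simplified illustration: it assumes each visual has hypothetical `tMin` and `tEnd` fields, that the slide starts at lecture time 0 so that slide-relative times and constraint visual times share one axis, and that deletions are processed from the latest visual to the earliest.

```javascript
// Delete the target visuals from a slide and close the gaps they leave.
function deleteWithCollapse(slide, constraints, targets) {
  // Process from the back forwards so earlier shifts do not disturb later ones.
  const ordered = [...targets].sort((a, b) => b.tMin - a.tMin);
  for (const target of ordered) {
    const gap = target.tEnd - target.tMin;
    slide.visuals = slide.visuals.filter((v) => v !== target);
    // Shift every subsequent visual left by the deleted visual's time span.
    for (const v of slide.visuals) {
      if (v.tMin >= target.tEnd) {
        v.tMin -= gap;
        v.tEnd -= gap;
      }
    }
    // Subsequent constraints move with the visuals they follow.
    for (const c of constraints) {
      if (c.tVis >= target.tEnd) c.tVis -= gap;
    }
    slide.duration -= gap;
  }
}
```

Only the visual-time side of each constraint is shifted here, since the deletion removes visual time and leaves the audio channel untouched.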
[Figure 5.2 diagram: two visual-time axes with deletion steps labeled (1) and (2).]
Figure 5.2: Deletions cascade from the back forwards. The items marked for deletion are in
red. (1) is performed first, followed by (2), leading to a collapse of time based on the items
which were marked for deletion.
5.4 Undo & Redo
The undo manager provides for either a symmetric undo or an asymmetric undo depending on
whether the original action was performed in the recording mode or in the editing mode. The
latter is the easier case to address, which will be discussed first. The actions in the editing
mode provide for a symmetric undo, though calling an undo will break the non-additive
nature of the editing mode. Strokes which are deleted during the editing mode can be added
back into the lecture, which is an additive form of action. Likewise, edits to the stroke
property can be exactly undone to restore the property back to the original.
The undo manager provides for an asymmetric undo semantic in almost all cases with our
application, in relation to a recording. In particular, we believe that most users will intend
to undo their visual actions or undo audio actions, but not necessarily undo both. During a
recording, we only allow for the undoing of visuals; because time is strictly monotonically
increasing for both visual and audio channels, this allows the lecture to be placed into a state
which did not previously exist. Likewise, redoing an action during a recording will place the
lecture into a new state which did not previously exist due to the increase in timing.
For audio, once the recording ends, we place an event at the beginning of the current
recording which indicates the insertion of audio. This means that users who wish to modify
the recording can edit it directly, but would have to undo all visuals during the recording
to reach the audio on the undo stack. Additionally, the shifting of visuals by $\Delta$ is placed
at the beginning of the recording after the audio insertion event. Placing the $\Delta$ shift at
the end would lead to an immediate undo of the shift on the invocation of the undo call,
meaning that visuals would immediately overlap.
Important to note is that a recording maintains external consistency with the outside
world. That is, if a recording begins at a time $t_{begin}$ in the world and ends at a time
$t_{end}$, then the total time which has elapsed is $t_{end} - t_{begin}$. This difference is
placed into the lecture no matter if an undo or redo of a slide occurs in between. If a user
performs the undo of a slide at time $t_{undo}$ such that $t_{begin} < t_{undo} < t_{end}$,
then the entirety of $t_{end} - t_{begin}$ will be placed into the preceding slide. Additionally,
if a user then performs a redo at time $t_{redo}$ such that
$t_{begin} < t_{undo} < t_{redo} < t_{end}$, then the time of $t_{undo} - t_{begin}$ is placed
into the preceding slide, and $t_{end} - t_{undo}$ will be placed into the newly added slide,
such that the total time of $t_{end} - t_{begin}$ is placed into the lecture. This same logic can
be recursively applied to any number of slides beyond simply one prior slide and one newly
added slide, so that $t_{end} - t_{begin}$ is preserved within the lecture.
This naturally leads us to the semantic that the duration of a recording belongs to the
recording itself, when considering undo and redo. After a recording is finished, users can still
undo and redo actions which occurred during the recording, but because the in-recording
undo guarantees a duration of $t_{end} - t_{begin}$, the out-of-recording undo must provide the
same guarantees. Hence, while each slide has a duration, that duration can change if a user
performs an undo/redo on the addition of a new slide. In order to remove the extra addition
of $t_{end} - t_{begin}$, a user must either delete a visual which was added, delete a slide
which was added, or undo the entire recording.
Chapter 6
Conclusion
Pentimento provides a framework for the flexible authoring of hand-written lectures which
can be modified retroactively. The decoupling of visuals and audio during recording time
allows for more free-form edits which affect each channel independently, and users can later
specify how to re-synchronize the channels.
Our system follows the MVC design philosophy, where the view is separated from the
models of the application by a variety of controllers which are the hub of the system logic.
The tools controller acts as the interface point for the view, delegating responsibilities for
other controllers and updating the view as appropriate. The recording controller and the time
controller work in conjunction to provide meaningful semantics during a recording session,
with an audio recording controller soon to be integrated. After a recording, the user will enter
the editing mode, where they can ask the lecture controller and visuals controller to act upon
the underlying model appropriately. During either recording or editing, users can specify
an undo or redo of their last action, which provides the semantics that we guarantee, but
not always an exact undo or redo. At any point in the editing mode, users can also specify
constraints to be placed or removed from the lecture. These constraints provide a loose
synchronization between the two channels, and are handled by the retiming controller.
The models within the application consist of either durable data which should be persisted
within the lecture or temporary data which should only exist during a user's session with
the application. The state model consists of user-specified configurations or changes to tool
state. The lecture model is designed around the philosophy that lectures can be broken into
discrete chunks such as slides or blackboards, where one chunk can function independently of
another. Each slide then independently contains an array of transformations and an array of
visuals. Each visual in turn contains properties such as tMin, the time at which it comes
into existence during the lecture; tDeletion, the time at which it is removed from the
lecture; and transforms, the alterations applied over its existence. Currently, each visual also stores
its own set of vertices in the form of (x,y,t,p) tuples. Chapter 5 more specifically details
the exact semantics for all of the operations which are supported.
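The lecture model described above can be sketched roughly as follows. The property names tMin, tDeletion, transforms, and vertices follow the text; the constructor functions, the helpers isVisibleAt and verticesAt, the use of null for a visual that is never deleted, and the reading of p as pen pressure are assumptions made for illustration only.

```javascript
// Hypothetical sketch of the lecture model; only tMin, tDeletion, transforms,
// and vertices come from the text, everything else is assumed.

// A vertex is an (x, y, t, p) tuple; p is assumed here to be pen pressure.
function makeVertex(x, y, t, p) {
  return { x, y, t, p };
}

// A visual owns its vertices plus its lifetime within the lecture.
function makeVisual(tMin, vertices) {
  return {
    tMin,            // time at which the visual comes into existence
    tDeletion: null, // time at which it is removed (null: never removed)
    transforms: [],  // alterations applied over the visual's existence
    vertices,
  };
}

// Each slide independently holds its own transformations and visuals.
function makeSlide() {
  return { transforms: [], visuals: [] };
}

// A visual is shown at time t if it exists and has not yet been deleted.
function isVisibleAt(visual, t) {
  const deleted = visual.tDeletion !== null && t >= visual.tDeletion;
  return t >= visual.tMin && !deleted;
}

// Only the vertices drawn by time t are part of the stroke at that moment.
function verticesAt(visual, t) {
  return isVisibleAt(visual, t) ? visual.vertices.filter((v) => v.t <= t) : [];
}

const slide = makeSlide();
const stroke = makeVisual(1.0, [
  makeVertex(0, 0, 1.0, 0.5),
  makeVertex(5, 5, 1.5, 0.6),
  makeVertex(9, 9, 2.0, 0.4),
]);
stroke.tDeletion = 4.0;
slide.visuals.push(stroke);

console.log(verticesAt(stroke, 1.6).length); // 2: the third vertex is not drawn yet
console.log(isVisibleAt(stroke, 4.5));       // false: the stroke was deleted at t = 4.0
```

Storing a timestamp on every vertex is what would let a player redraw a stroke progressively rather than showing it all at once.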
6.1
Future Work
The existing implementation of Pentimento provides for the recording of written strokes
and allows for the later incorporation of an audio controller. Integrating audio is a primary goal
which is under development, and it can generally follow the structure set forth in the existing framework,
where the audio controller parallels the visuals controller. We believe audio represents
a more realistic estimate of time than visuals, so the audio channel would be the basis on
which playback is based. We also hope to support processing of the audio to detect silences
and highlight them more easily for the user. Audio presents a different set of needs than the
visuals, though, since authors may have multiple audio tracks. For example, if authors were
to show a YouTube video of an experiment, they might narrate over the video's audio. We
also do not believe the slide semantics of discretization will necessarily apply to audio.
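One simple way the proposed silence detection could work is amplitude thresholding, sketched below. This is not part of the existing implementation: the function findSilences, its threshold and minSamples parameters, and the assumption that samples are normalized to the range [-1, 1] are all hypothetical.

```javascript
// Hypothetical sketch: find silent spans in audio samples normalized to [-1, 1].
// threshold and minSamples are illustrative parameters, not values from Pentimento.
function findSilences(samples, threshold, minSamples) {
  const silences = [];
  let start = -1; // index where the current quiet run began, or -1 if none
  // Iterate one step past the end so a quiet run at the tail is closed too.
  for (let i = 0; i <= samples.length; i++) {
    const quiet = i < samples.length && Math.abs(samples[i]) < threshold;
    if (quiet && start === -1) {
      start = i;
    } else if (!quiet && start !== -1) {
      // Keep only runs long enough to count as a silence; end is exclusive.
      if (i - start >= minSamples) silences.push({ start, end: i });
      start = -1;
    }
  }
  return silences;
}

const samples = [0.4, 0.01, 0.0, -0.02, 0.5, 0.01, 0.3, 0.0, 0.0];
console.log(findSilences(samples, 0.05, 2));
// [ { start: 1, end: 4 }, { start: 7, end: 9 } ]
```

The returned index ranges could then be converted to times and highlighted for the author in the editing view.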
Another goal is to refine the interface and its layout for users. Once an
audio controller has been integrated, more visual elements can be presented to the user for
editing, such as the placement of constraints or visualization of audio. We also currently
present both recording tools and editing tools simultaneously, though a complete replacement
of tools on mode change could help to eliminate any mode error a user may experience.
Several add-ons to Pentimento would also provide a much richer interaction for users
than what is currently given. We have implemented a best guess at the primary feature set
and logic which users will need for a coherent recording of hand-written lectures, but some
tools, such as color changes, remain to be completed. Space and logic have also been allocated for
tools which apply spatial transformations, though none exist at the moment. Additionally,
users typically prefer recording and editing visuals first, followed by syncing audio to those
visuals [16]. This sync function currently depends on the audio controller and would require
correct constraint placement.
Pentimento also currently takes full charge of its visuals representation, from each
individual vertex up to the entire stroke structure. This allows the player to remain
backwards-compatible with the OS X implementation [16]. However, some third-party
libraries, such as Fabric.js, support visual operations based on the canvas element.
Moving to an alternative library would change how visuals are represented, but may expedite
development of tools which such a library already supports, such as drag-and-drop.
Lastly, Pentimento currently covers only the recording and replaying of hand-written lecture
notes, but it could be expanded to any realm where this recording logic
would apply. In particular, text-based lectures could be a great field of expansion, combining
the interactivity of environments like Codecademy with the lecture style of presentations
like RailsCasts. As online engines to execute submitted code also exist, text would be a very
interesting area for us to move towards.
Bibliography
[1] MIT OpenCourseWare. Monthly Reports. 2014.
http://ocw.mit.edu/about/site-statistics/monthly-reports/
[2] Michael Horn. Special K: Don't Sleep On Khan Academy, Knewton. March 21, 2013.
http://www.forbes.com/sites/michaelhorn/2013/03/21/special-k-dont-sleep-on-khan-academy-knewton/
[3] MITx. 6.00r Courseware. 2014.
https://lms.mitx.mit.edu/courses/MITx/6.00r/2014_Spring/courseware/
[4] Khan Academy. What is Khan Academy?
http://khanacademy.desk.com/customer/portal/articles/337790-what-is-khan-academy
[5] Coursera Help. What format are the courses in?
http://help.coursera.org/customer/portal/articles/1164362-what-format-are-the-courses-in-
[6] edX. Student FAQ.
https://www.edx.org/student-faq
[7] MIT OpenCourseWare. Audio/Video Lectures.
http://ocw.mit.edu/courses/audio-video-courses/
[8] Academia Stack Exchange. How much effort does it take to record video courses?
October 9, 2013.
http://academia.stackexchange.com/questions/9221/how-much-effort-does-it-take-to-record-video-courses
[9] Erik Demaine and Martin Demaine. Recording Video Lectures - a guide by Erik Demaine
and Martin Demaine. October 17, 2012.
http://erikdemaine.org/classes/recording/
[10] Google Groups. Video Capture of Handwritten Notes. 2012.
https://groups.google.com/a/ascue.org/forum/#!topic/members/a8xl-9amCq4
[11] Epson. Education Only US Product Pricing - April 2014. April 2014.
http://www.epson.com/_alfresco/projectors/brighterfutures/pdf/Pricing_Sheets/pricing-bf-projectors.pdf
[12] Khan Academy. What software program / equipment is used to make Khan Academy
videos?
http://khanacademy.desk.com/customer/portal/articles/329318-what-software-program-equipment-is-used-to-make-khan-academy-videos
[13] Sanjay Gupta. Khan Academy: The future of education? CBS News 2012.
https://www.youtube.com/watch?v=zxJgPHM5NYI
[14] Android 4 Schools. 5 Note-taking Android Apps for Students.
http://www.android4schools.com/2013/08/21/5-note-taking-android-apps-for-students/
[15] AppAdvice. iPad Note Taking Apps.
http://appadvice.com/applists/show/notepad
[16] Fredo Durand. Non-Sequential Authoring of Handwritten Video Lectures with Pentimento.
2014.
[17] AMPS. Lecture Capture.
http://mit-amps.mit.edu/services/lecture.html
[18] Google Product Forums. New YouTube Player Features: Previewing with Thumbnails &
More. 2012.
https://productforums.google.com/forum/#!topic/youtube/oOzNeuJkrQc
[19] Toni-Jan Keith Monserrat, Shengdong Zhao, Kevin McGee, Anshul Vikram Pandey,
NoteVideo: Facilitating Navigation of Blackboard-style Lecture Videos, CHI 2013.
[20] Chris Ramsdale, GWT Project, 2010.
http://www.gwtproject.org/articles/mvp-architecture.html
[21] John T. Emmatty, Differences between MVC and MVP for Beginners, 2011.
http://www.codeproject.com/Articles/288928/Differences-between-MVC-and-MVP-for-Beginners
[22] explorercanvas: HTML5 Canvas for Internet Explorer.
https://code.google.com/p/explorercanvas/wiki/Instructions