Creating and Editing "Digital Blackboard" Videos using Pentimento: With a Focus on
Syncing Audio and Visual Components
by
Alexandra Hsu
S.B., C.S., M.I.T., 2013; M.Eng., C.S., M.I.T., 2015
Submitted to the Department of Electrical Engineering
and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology
June, 2015
Copyright 2015 Alexandra Hsu. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce and
to distribute publicly paper and electronic copies of this thesis document in whole and in part in
any medium now known or hereafter created.
Author:
Department of Electrical Engineering and Computer Science
May 22, 2015
Certified by:
Fredo Durand, Thesis Supervisor
May 22, 2015
Accepted by:
Prof. Albert R. Meyer, Chairman, Masters of Engineering Thesis Committee
Creating and Editing "Digital Blackboard" Videos using Pentimento: With a Focus on
Syncing Audio and Visual Components
by Alexandra Hsu
Submitted to the Department of Electrical Engineering and Computer Science
May 22, 2015
In Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical
Engineering and Computer Science
Abstract
Online education is a rapidly growing field, and with the desire to create more online educational
content comes the need to easily generate and maintain that content. This project
aims to allow for recording, editing, and maintaining “digital blackboard” style lectures, in which
a handwritten lecture with voiceover narration is created. Current recording software requires
that a lecture’s audio and visuals be recorded correctly in one take; otherwise they must be re-recorded. By
utilizing vector graphics, a separation of audio and visual components, and a way for the user to
be in control of the synchronization of audio and visuals, Pentimento is a unique piece of
software that is specifically designed to record and edit online handwritten lectures.
Acknowledgements
I would like to thank the many people who helped me reach the completion of this thesis.
First and foremost, I would like to thank my family for many years of support, especially my
mom, Stephanie Maze-Hsu, for reading countless papers and constantly helping me get through
everything. I would also like to thank my brother Robert Hsu for always being there for a laugh.
Additionally, I would like to thank my friends without whom I couldn’t have gotten to this point.
Most importantly, I would like to acknowledge Zach Hynes for his unending encouragement and
support. Kevin Hsiue and Eric Kline both helped me immensely by giving feedback and motivation
as I wrote my thesis.
Professor Fredo Durand was a wonderful guiding force and mentor and was extremely
understanding about non-academic issues I faced during my undergraduate and graduate careers.
Without his vision and guidance this project wouldn’t have been realized.
Jonathan Wang worked on this project with me and co-authored portions of this thesis. Without
him I don’t think the Pentimento prototype would work.
Finally, I would like to thank all of the medical professionals who have helped patch me together
enough to hand in this thesis.
There were times I thought I would never complete this document and the fact that I have is a
testament to my amazing support system. Thanks to all of those people mentioned here and
countless others, without whom I wouldn’t have been able to complete my thesis.
Table of Contents
1. Introduction
2. Background Information
   2.1 Visual Edits
   2.2 Audio Edits
3. Related Work and Current Solutions
4. Project Goals
   4.1 Features
      4.1.1 Implemented Recording Features
      4.1.2 Implemented Editing Features
5. User Guide and Tutorial
   5.1 The Main Recording UI
      5.1.1 Recording Visuals
      5.1.2 Editing Visuals
      5.1.3 Recording Audio
      5.1.4 Editing Audio
   5.2 The Retimer
6. Code Organization Overview
   6.1 Lecture
      6.1.1 Lecture Model
      6.1.2 Lecture Controller
   6.2 Time Controller
   6.3 Visuals
      6.3.1 Visuals Model
      6.3.2 Visuals Controller
      6.3.3 Tools Controller
         6.3.3.1 Visuals Selection
   6.4 Audio
      6.4.1 Audio Model
      6.4.2 Audio Controller
      6.4.3 Audio Playback
      6.4.4 Audio Timeline
      6.4.5 Audio Track Controller
      6.4.6 Audio Segment Controller
      6.4.7 Audio Plug-in
   6.5 Retimer
      6.5.1 Retimer Model
      6.5.2 Retimer Controller
   6.6 Thumbnails Controller
   6.7 Undo Manager
   6.8 Renderer
   6.9 Save and Load Files
7. Future Work
   7.1 Future Features
      7.1.1 Recording
      7.1.2 Editing
   7.2 User Interface Additions
      7.2.1 Main Visuals Recording and Editing UI
      7.2.2 Audio and Retimer Timeline UI
   7.3 Student Player
8. Conclusions
Appendix A: Documentation
   A.1 Lecture
      A.1.1 Lecture Model
      A.1.2 Lecture Controller
   A.2 Time Controller
   A.3 Visuals
      A.3.1 Visuals Model
      A.3.2 Visuals Controller
   A.4 Tools Controller
   A.5 Audio
      A.5.1 Audio Model
      A.5.2 Audio Controller
      A.5.3 Track Controller
      A.5.4 Segment Controller
   A.6 Retimer
      A.6.1 Retimer Model
      A.6.2 Retimer Controller
   A.7 Thumbnails Controller
Appendix B: Example of Saved Lecture JSON Structure
References
1. Introduction
With the growing popularity of online education through programs such as Khan Academy and
MIT’s EdX, there is an increased need for an easier way to create educational video lectures.
This project strives to simplify the process of creating these online videos and to make editing
these videos much easier. In current solutions, an entire incorrect segment must be re-recorded
to edit the lecture. There are many advantages to separately editing the audio and written visual
components of an online lecture, such as allowing content to be updated as the years pass rather
than becoming obsolete. It will also be less frustrating and time-consuming for educators to record
presentations that do not have to be done correctly in a single take.
The project focuses on the popular style exemplified by Khan Academy where an educator
writes notes on a virtual blackboard as he or she gives a lecture or explains a topic. With these
“blackboard lectures” students see the handwritten notes and drawings produced by the lecturer,
while hearing a voiceover narration explaining the written content. Currently, creating these
videos is done by recording the written notes using extremely basic tools. These tools essentially
use a tablet and pen as input and screen capture to record the strokes in a simple paint program.
Although this method is an improvement from merely taking a live video recording of the
lecturer and the blackboard they are writing on, it still requires the educator to get everything
correct in a single pass because there is no editing capability beyond cuts.
The technology created by the Pentimento project makes it easy to edit and update the content of
each lecture. Pentimento was started by MIT Professor Fredo Durand for the purpose of
addressing the specific needs that arise when editing handwritten lecture content. The software
is currently under development by Professor Durand, and my thesis work involved taking the
Pentimento prototype (which can only be run on Mac OS) and converting it into a web-based
tool that can be used to record, edit, and view online lectures.
2. Background Information
There are some key differences between editing a handwritten lecture, such as the ones that can
be seen on Khan Academy or EdX, and editing a standard video or movie. Typical movies can
be edited with cuts and if there are errors it often makes sense to re-record the entire scene or
video. However, handwritten lectures introduce different types of errors and corrections that
could ideally be fixed without having to remake a segment or, worse, the entire lecture.
Handwritten lecture style videos also often benefit from recording and editing the audio
portions separately from the visuals. This minimizes the cognitive load on the speaker while
recording [4] and provides the ability to independently correct mistakes that occur only in
the audio recording or only in the visuals. In addition to allowing for temporal editing, such as
moving visuals to a different time in the lecture, the new editing capabilities introduced by
Pentimento facilitate correction of the following types of errors:
2.1 Visual Edits
• Correct existing lines: While writing, a lecturer will often make a writing mistake that they
would like to correct later, such as accidentally writing “x=…” instead of “y=…” In these cases,
the software allows for modification of the strokes to change the “x” into a “y” without altering
the timing of the writing or the synchronization with the audio.
• Insert missing lines: As the lecture is recorded, the lecturer may omit something that they
would like to add later, such as leaving the prime off of a variable (writing “x” instead of “x’”).
While editing, this stroke can be added at the appropriate time in the lecture without altering the
timing of the following strokes or the synchronization with the audio. This can also be extended
to include entirely new content, such as adding a clarifying step in a derivation.
• Move/resize drawings and text: Sometimes when viewing the video, the lecturer discovers that
it would make more sense to arrange the text/visuals differently or that a certain visual should be
made bigger or smaller. While editing they will have the ability to rearrange and resize these
visual components without altering the timing of the strokes or the synchronization with the audio.
2.2 Audio Edits
• Re-record and sync: This feature gives the lecturer the ability to record only the audio portion
for a section of the lecture and have it remain synced with the visuals for that portion. This could
be the first pass recording (e.g. recording audio after the visuals have already been drawn) or an
edited audio recording to correct a spoken error during the first pass of the lecture.
• Eliminate silence: Often it takes longer to write something than to say it, which can lead to
long silences in the lecture. Pentimento allows the visuals to be sped up to fill only the time that
corresponds to speech and to eliminate these silences.
These differences require alternative tools and editing capabilities that currently available standard
video editing software cannot provide. Pentimento strives to allow for these different types of video
alterations and aims to make it extremely easy for lecturers or others to edit the videos quickly
and efficiently.
For visual edits, this will be accomplished by tracking the user’s drawing and writing inputs and
representing the strokes as vectors, which can be modified later. By using a vector
representation, the position, size and speed at which the text and drawings are presented can be
altered and updated after the recording has taken place. The ability to move and edit drawings
and text allows the presenter to fix or change the focus of a section of the video without
re-recording the whole thing. Pentimento uses the vectors to represent the lecture in a vector
graphics format, which can be edited much more simply than the current representations in the
form of raster graphics. Raster graphics are images that are displayed as an array of pixels (i.e. on
a computer screen or television) [6]. Vector representations offer various advantages over raster
graphics, such as compatibility with differing resolutions and ease of modification based on the
construction of the vectors [10]. The vectors can be formed either by storing two points and the
connecting line segment, or by storing a single point with an associated magnitude and direction [2].
Both of these constructions allow changes to be made fairly efficiently, since the image does not
have to be redrawn. Instead, a parameter in the construction simply has to be modified to fix the
problem.
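As a concrete illustration, consider a stroke stored as vector data. The following is a hedged sketch, not the actual Pentimento data structures: field names are illustrative, and the rendering uses a plain HTML5 canvas context. The point is that an edit only changes a parameter of the construction; the pixels are regenerated at display time.

    // Minimal sketch of a stroke stored as vector data rather than pixels.
    var stroke = {
        color: '#000000',
        width: 2,
        vertices: [                      // captured from pen or mouse input
            { x: 10, y: 40, t: 0,  p: 0.8 },
            { x: 18, y: 35, t: 16, p: 0.9 },
            { x: 27, y: 31, t: 33, p: 0.9 }
        ]
    };

    // Editing means changing a parameter, not redrawing recorded pixels.
    stroke.color = '#ff0000';            // recolor the whole stroke

    function translateStroke(stroke, dx, dy) {
        // Move the drawing after recording by shifting every vertex.
        stroke.vertices.forEach(function (v) { v.x += dx; v.y += dy; });
    }

    function renderStroke(ctx, stroke) {
        // Rasterization happens only at display time, at whatever
        // resolution the canvas currently has.
        ctx.strokeStyle = stroke.color;
        ctx.lineWidth = stroke.width;
        ctx.beginPath();
        stroke.vertices.forEach(function (v, i) {
            if (i === 0) { ctx.moveTo(v.x, v.y); } else { ctx.lineTo(v.x, v.y); }
        });
        ctx.stroke();
    }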
3. Related Work and Current Solutions
The current process for editing videos of this type is not quite so simple. Even the preeminent
video editing tools (such as Apple’s Final Cut Pro, Adobe’s Premiere or Avid [4]) only allow
traditional raster graphics to be edited and moved, but don’t facilitate the modification of
handwritten videos stored as vectors. Some promising video editing work has been done to
allow removal of segments of a video (selected from the video transcript) and then to seamlessly
transition between the portions of the video before and after the deletion [3]. However, these
sorts of editing capabilities only address some of the difficulties with editing handwritten
lectures. While these tools would be useful for changes such as removing long silences that
occur while writing or helping to correct the speaker’s mistakes, they are unable to address and
correct errors in the writing or help make significant changes to the content of the lecture.
Editing these videos isn’t the only challenge with current technologies; even creating
handwritten content for teaching purposes is somewhat difficult with currently available tools.
There are very few ways to record, edit, and view freehand drawings [4]. Although there is some
ability to animate vector graphics with formats like SVG [9], the ability to easily maintain
audio synchronization is not supported. More commonly, to give the appearance of real-time
handwriting in web browsers, Flash animation can be used [8]. However, this can be complex:
it often requires animated masks or other techniques [1], which can be difficult and require far
more effort and processing than simply recording a video of the handwritten content as it
occurs. There is immense desire to create the appearance of handwritten lectures,
and software has even been created to automatically create animations that give the appearance
of handwriting [5]. However, none of these solutions allow for real-time recording and
post-process editing of the actual writing in the ways that are necessary to record an effective
video lecture.
The Khan Academy style videos are recorded using technology available for capturing live
handwritten lectures. These are usually recorded using a screencast method, which involves
digitally capturing a certain portion of a virtual whiteboard screen during a live lecture (i.e.
capturing the screen as the lecture is written). These methods can produce videos similar to the
style we hope to achieve, but they often require extensive preparation to plan out exactly what
will be said and when to say it to avoid having to re-record large portions of the lecture [7]. As
mentioned above, a key feature of our software is the capability to easily edit or update lectures
after they have been recorded, which is not a possibility with the screencast methods (especially
separately editing the visual and audio components and keeping them synced in time).
4. Project Goals
There is a clear need and audience for software that allows for straightforward recording and
editing of digitally handwritten video lectures. Professor Durand has been developing this
software, but his current prototype version is only compatible with Mac OS. In order to make
the software easily accessible and widespread, we are aiming to create a web-based version that
allows for easy recording, editing, and viewing of these online video lectures. There are many
advantages to web-based software that we considered when deciding to develop a second
prototype. Most importantly, it is accessible to anyone regardless of the operating system they
are using. Secondly, many existing premier video editing tools are extremely expensive and
cannot be used cross-platform. Another advantage is that web-based tools allow for easier
collaboration because they do not require all parties to have the same software.
Ultimately, we hope to have an available web version that lecturers can easily use to record one
of the handwritten whiteboard style lecture videos. The web version of Pentimento was written
using JavaScript, HTML5, and CSS, as well as additional libraries including jQuery, jCanvas,
wavesurfer.js, and spectrum.js. By using these technologies we have created a website prototype
where lecturers can produce videos that can then be easily edited and updated after recording.
4.1 Features
To create the web version of Pentimento there is a minimum feature set that must be
implemented to make the program useful and effective. As I worked towards my thesis, I
assisted in designing and implementing the main interface and the key features to make the web
version of Pentimento successful. The key features will be refined as users actually test the
website and, ideally, iteratively updated to match user needs. New features will also be
implemented by future students. There are two components of the program: recording and
editing. Each has specific features that will make the website functional, in addition to other
features that would be nice for a user to have available but weren’t essential during the first
prototype (discussed in the future work section).
4.1.1 Implemented Recording Features
Lecturers require recording capability to record both the audio and visual components of the
lectures. During the recording phase the lecturer will write and draw the visual components of
the lecture as well as record the corresponding audio explanation. To be able to do these tasks
the software must support the following features:
• Record Button: This allows the lecturer to start the recording.
• Stop/Pause Button: The lecturer is able to stop the recording at any point (and then resume
recording from the same point later).
• Pen Tool: The main use of the software is to create handwritten lectures, and the pen is the
basic input tool used to write/create the visual input content.
• Recording Canvas: The writing and drawing must be done on something resembling a virtual
whiteboard. The canvas is the area of the screen devoted to creating the actual lecture content
(i.e. this is the area that would be recorded using the typical screen capture techniques).
• Insert New Slide: The recording area consumes a finite space on the screen, but most lectures
need more space available. Inserting a slide allows the lecturer to reveal new blank space to fill
with content.
• Selection of Visuals: While recording, the lecturer often needs to select content to move or
delete. Note: Selection during recording is different than selection during editing. If something is
selected while being recorded, that selection will be part of the final video.
• Deletion of Visuals: During recording the lecturer may want to remove content. Note: Deletion
during recording is different than deletion during editing. In the final video, something that was
written and recorded and then deleted while recording will appear in the video and then the
viewer will see it was removed (in contrast to deleting something while editing, where the strokes
will never appear in the video).
• Time Display and Slider: While recording, the lecturer should be aware of how much time has
elapsed in the recording. The time display shows the current recording time. The time slider, as
part of the audio timeline, also allows the lecturer to choose when to insert a recording.
• Separate Recording of Audio/Visuals: Lecturers have the ability to record solely audio, solely
visuals, or both components simultaneously, allowing for fine tuning the recording of the lectures.
• Pen Color: Many online handwritten lectures utilize changes in pen color to emphasize certain
topics or at least vary the visuals to make it easy to quickly notice the key points.
• Line Weight: Similar to variations in pen color, being able to support different line weights
allows lecturers more flexibility in the visual quality of their videos and allows different areas to
be emphasized and stand out.
• Recording Indicator: Sometimes it is difficult to tell if certain software is in recording mode or
not, so we want to make it very obvious to the lecturer that the current actions are being
recorded. Currently this is done by changing the “Recording Tools” label background to red,
which probably is not obvious enough.
4.1.2 Implemented Editing Features
Once the lecture has been recorded, the lecturer (or another person) may proceed to edit the
recorded content. The editing phase requires a different tool set than the recording phase, and
these post-recording editing capabilities are crucial to saving time and maintaining the quality of
the content. The essential editing capabilities are:
• Play/Pause Buttons: Playback is an essential part of editing because it allows reviewing the
portions of the video that have been recorded.
• Time Slider: Navigating to a certain point in the lecture is necessary to be able to edit parts of
the recording at certain times. The time slider also allows the lecturer to choose when to start or
stop playback.
• Selection of Visuals: While editing, many tools require use of a selection tool, which allows for
selection of certain components to edit (e.g. to delete them). Note: this selection tool is different
than the selection tool used during the recording phase. Selections made while editing will not be
seen as part of the final video.
• Deletion of Visuals: Removing errors or unwanted content is an essential part of editing video
lectures. Note: deletion during editing is different from deletion during recording. In the editing
phase, strokes that are deleted will be removed from the final video (as if they were never written).
• Retiming and Resynchronization: There is a tool that allows realigning the visual and audio
components of the video in case they are recorded separately or some adjustments need to be
made. This also allows for removing long silences introduced by writing taking longer than
speaking and adjusting other places where the visual/audio parts of the lecture may need to be
edited separately and realigned. Additionally, temporal edits are necessary so that visuals are
sped up or slowed down to match the speed of the audio. Note: the retimer is a separate user
interface.
• Stroke Color: This would allow the editor to change the color of a stroke for the duration of the
video (whereas if the pen color is changed while recording, the original color would be
maintained where it was already recorded).
• Stroke Weight: Similar to editing stroke color, this feature would allow the editor to change the
weight of a stroke after recording, and the new weight would be evident for the full duration of
the recording for that stroke.
5. User Guide and Tutorial
Pentimento was created to allow for easy creation and revision of handwritten “digital
blackboard” style lecture videos. However, Pentimento transcends current solutions by adding a
simple editing component, which facilitates increased flexibility in updating lecture content once
recording is completed. Other solutions barely allow editing beyond cutting content, but
Pentimento has much stronger editing capabilities, including separate editing and
synchronization of audio and visual components.
This section walks a user through the Pentimento web software, detailing the user interface and
explaining how to do simple recordings of lectures. Since Pentimento allows for non-linear
recording and editing of lectures there are a lot of options for how to begin recording a lecture.
The basic Pentimento lecture consists of handwritten strokes on slides with a voiceover lecture,
but there are many choices for how to create this lecture. As a lecture is recorded the user has
the option to insert slides for organizational purposes or simply to create a blank slate to record
visuals on. The visuals are currently in the form of strokes, which appear as the handwritten part
of the lecture. An audio track is created while recording audio and new audio segments are
created by breaks in audio recording. The audio segments can be rearranged by the user after
recording. Finally, the user has a chance to create synchronization points to connect specific
audio and visual moments in the lecture, allowing for playback to show user selected visuals at a
user specified audio time.
The first unique aspect of the Pentimento software is the ability to record audio and visuals
separately or together. Once audio and visual components of a lecture have been recorded, the
audio and visuals can be synchronized through the retimer. The second, and probably most
important, innovation of Pentimento is the ability to edit the lecture after recording. By allowing
the user to change the content of the lecture (visual or audio) after recording and keep the timing
the same, it is much simpler to create an accurate, effective, and up-to-date lecture video.
Pentimento allows users to edit the lecture in many ways, such as updating layout and display
(e.g. changing the color of visuals), inserting content at any time, and synchronizing the audio
and visual components to make the timing exactly what is desired.
5.1 The Main Recording UI
The main recording portion of the web interface is where a user can begin recording and editing
visuals.
Figure 1: Main Pentimento Recording Interface (in editing mode)
As a lecturer begins recording, he or she is given the option to record just the visuals, just the
audio, or both. User tests indicated that most people choose to record the visuals first, then the
audio, and then add synchronization between the two [4]. For simplicity, here we will discuss
how to record and edit each modality separately, but they can also be recorded at the same time.
Figure 2: Record Button with Recording Options (the current state would record both visuals and audio)
5.1.1 Recording Visuals
The basis for recording visuals is the pen tool. After the record button is pressed, any time the
pen or mouse is placed on the main drawing canvas the resulting strokes will be recorded as part
of the lecture. Hitting stop in the top left corner then stops the recording.
Figure 3: Main Recording User Interface (in Recording Mode). The pen tool is highlighted as the main input
for lecturers. The recording canvas is shaded to indicate space where the pen tool can be used. Finally, to
stop recording, the stop button in the top left corner would be clicked.
While recording visuals, it is possible to select visuals and then resize, delete or move those
visuals. If a selection is made while in recording mode, that selection will become part of the
recorded lecture, so when it is played back the person watching the lecture will be able to see the
selection and any actions that have been taken (e.g. moving the selected visuals).
Figure 4: Using the Selection Tool. The selection tool is highlighted. In this example, the letter "C" is
selected and could be deleted, moved or resized by the user.
Additionally, the color and/or width of the pen strokes can be adjusted by selecting these options
from the recording menu.
Figure 5: Pen Stroke Changes. On the left is the color palette to change the color of
the pen strokes. The right image shows the available widths of the pen tool.
A lecturer also has the ability to insert a new slide by pressing the add slide button. This clears
the canvas and allows for a blank slate while recording. Slides can be used as organizational
tools, or simply to wipe the screen clean for more space.
Figure 6: Add Slide Button
Once some visuals have been recorded they can be played back by hitting the play button.
Figure 7: In editing mode visuals can be played back by clicking the play button (emphasized here)
5.1.2 Editing Visuals
When recording has stopped, Pentimento enters editing mode. This allows a user to make
changes that are not recorded as part of the lecture, but instead change from the moment the
visual appears. For example, changing the color of a stroke while editing will change the color of
that stroke from the moment it was written, instead of changing it mid-playback (which is what
would happen if the color was changed during recording). Some other examples of visual edits
are changing the width, resizing, moving, and deleting visuals. This allows for errors to be
corrected (e.g. if something is misspelled, the visuals could be deleted in editing mode and the
specific word could be re-recorded) and content to be updated. Layout changes are also common,
since sometimes it is difficult to allocate space properly the first time a lecture is recorded.
Figure 8: Editing Toolbar
5.1.3 Recording Audio
While audio can be recorded at the same time as the visuals, many users choose to record it
separately. Recording audio is as simple as hitting record and then speaking into the microphone.
It is also possible to insert audio files, such as background music or audio examples, to enhance a
lecture.
Figure 9: Recording Only Audio
5.1.4 Editing Audio
The main type of audio edit that is necessary in handwritten lectures of this kind is removing
long silences. Often, if recording audio and visuals at the same time, writing takes longer than
speaking, filling the lecture with long silences that can be deleted in the audio editing phase.
Audio segments can also be rearranged or dragged to a different time.
Figure 10: Audio Waveform displayed on the audio timeline
5.2 The Retimer
Retiming is a key innovation of Pentimento, allowing the user to resynchronize the visual and
audio components of a lecture. This is a form of editing that affects the playback of the lecture,
playing visuals at a user specified time during the audio. To achieve this synchronization the
user uses the retimer display as shown. The display is comprised of a thumbnail timeline,
displaying snapshots of the visuals at time intervals throughout the lecture. These correspond to the
audio timeline below. In between the thumbnails and the audio is the main feature of the
retimer, where correspondences between audio and visuals are drawn.
Figure 11: The Audio Timeline and Retimer. This displays the user interface that can be used to
add synchronization points between visual and audio time in a lecture. The top displays
thumbnails of the lecture visuals. The bottom is the audio waveform representing the lecture
audio. In between is the retiming canvas, which allows the user to add synchronization points
between the visuals (represented by thumbnails) and the audio (represented by an audio
waveform).
To insert a new constraint, the “add constraint” button must be clicked and then the user must
click on the place on the retimer timeline where he or she wants to draw the correspondence.
These synchronization points are represented by arrows pointing to the point in the audio time
and the corresponding point in the visual time. Note: Some constraints are added automatically at
the beginning and end of recordings to preserve other constraint points. Automatic constraints are
gray, while manually added constraints are black.
Figure 12: Add Constraint Button
Figure 13: New constraint added to the constraints canvas by the user.
To fine tune the audio and visual correspondence, the user can drag the ends of the arrow to line
up with the exact audio time and the exact visual time they would like to be played together.
Then the visuals on either side of the constraint will be sped up or slowed down appropriately to
ensure that during playback the desired audio and visual points are played at the same time.
Note: it is always the visual time being adjusted to correspond to the audio time (this decision
was made because writing faster or slower flows much better than the lecturer suddenly talking
faster or slower).
Figure 14: User dragging a constraint to synchronize a certain point in the audio (bottom of the arrow)
with a new point in the visuals (the point the top of the arrow is dragged to)
To delete a constraint, a user simply clicks within the constraints timeline and drags a selection
box over the constraint(s) he or she wishes to remove.
Figure 15: User selecting a constraint to delete
This turns the selected constraints red (to visually confirm that the desired constraint has been
chosen). Then the user can click on the delete constraint(s) button to remove the correspondence.
Figure 16(a): Selected constraint (indicated by turning red)
Figure 16(b): Delete Constraint(s) Button
Figure 16(c): Selected Constraint Removed
6. Code Organization Overview
The base functionality of Pentimento is the ability to record a lecture. This process is initialized
when a user clicks the record button and starts to record visuals and/or audio. This then begins
the recording process in the LectureController, which propagates down to recording
visuals and audio. As the user adds strokes on the main canvas these events are captured by the
VisualsController and added to the visuals array in the current slide of the
VisualsModel. Similarly, the AudioController processes audio input and creates an
audio segment which is stored in the current audio track. Recording input is continually added to
these data structures and changes are also processed and added. For example, if a user decides to
change the color of a stroke, that property transformation is added to the data structure for that
visual. Ultimately, when a recording is completed, users can then go back and edit the recorded
content. This process also stores property transforms and spatial transforms as part of the visuals
data structure. Retiming is another key part of editing. When a user adds a constraint to the
retiming canvas that constraint is processed and added to the constraints array with the associated
visual and audio times to be synchronized.
All of these components are combined to create a Pentimento lecture. A lecture is the basic data
structure and it is comprised of separate visual and audio pieces, each of which is organized into
a hierarchy. The visuals are comprised of slides, each of which contains visual strokes written
by the lecturer. These strokes are made up of vertices (points that are connected to display the
stroke). The audio contains various tracks, each of which includes audio segments. The final
component of a lecture is the retiming constraints, which are the synchronization information
that unites the audio and visual components at a certain time.
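The hierarchy described above can be pictured roughly as the following nested structure. This is a sketch with illustrative field names, not the exact fields used by the Pentimento models.

    // Rough sketch of the lecture hierarchy; field names are illustrative.
    var lecture = {
        visuals: {
            slides: [{
                duration: 12000,                 // milliseconds of visual time
                visuals: [{
                    type: 'stroke',
                    tMin: 1500,                  // visual time the stroke appears
                    vertices: [{ x: 10, y: 40, t: 1500, p: 0.7 }]
                }]
            }]
        },
        audio: {
            tracks: [{
                segments: [{ audioClip: 'clip-0.wav', startTime: 0, endTime: 9000 }]
            }]
        },
        retimer: {
            constraints: [
                { tVisual: 1500, tAudio: 2000 }  // sync point: show this visual here
            ]
        }
    };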
The Pentimento code base is organized into a Model-View-Controller (MVC) architectural
pattern. The basis for any recording is the Lecture, which contains visuals, audio, and retiming
information. Each of these main components has a model and a controller, the details and
specifications of which are outlined below. The models contain the specific data structures for
each component, allowing lecture data to be manipulated. The controllers connect the lecture
data to the view (the user interface), handling user inputs and making the necessary changes to
the models, updating the lecture appropriately.
Figure 17: All of the modules in the Pentimento code base. Arrows indicate that there is a reference
in the file at the origin of the arrow to the module where the arrow is pointing. This allows the
originating file to access the functionality of the sub-file.
The web version of Pentimento was written using JavaScript, jQuery, HTML5, and CSS.
Additional packages were used for displaying certain aspects of the user interface. jCanvas was
used for displaying the retimer constraints, providing a simple API for drawing and dragging the
constraints, as well as selection and other canvas interactions. Wavesurfer.js is used for
displaying audio waveforms. Spectrum.js is used as a color selection tool.
6.1 Lecture
A Pentimento lecture is made up of visual and audio components. To allow the lecture to be
played back correctly a Pentimento lecture also contains a “retimer,” which stores the
synchronization information between the visuals and the audio.
Figure 18: Illustration of the data types that comprise a Pentimento Lecture. At the highest level there is the lecture,
which is comprised of visuals, audio, and retiming data.
6.1.1 Lecture Model
The LectureModel contains access to the VisualsModel, AudioModel and
RetimerModel. Each of these models has a getter and a setter in the lecture model,
establishing the places to store and update the data associated with each component of the
lecture. The LectureModel also contains functionality for initializing and getting the total
duration of the lecture, and for saving and loading a lecture to JSON.
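A hedged sketch of what these accessors and the JSON conversion might look like follows; the method names on the sub-models (getDuration, saveToJSON) are assumptions for illustration, not the exact implementation.

    // Illustrative accessors on the LectureModel; not the exact implementation.
    LectureModel.prototype.getVisualsModel = function () { return this.visualsModel; };
    LectureModel.prototype.setVisualsModel = function (model) { this.visualsModel = model; };

    LectureModel.prototype.getLectureDuration = function () {
        // The lecture lasts as long as its longest component.
        return Math.max(this.visualsModel.getDuration(), this.audioModel.getDuration());
    };

    LectureModel.prototype.saveToJSON = function () {
        // Each sub-model is responsible for serializing its own data.
        return {
            visuals: this.visualsModel.saveToJSON(),
            audio: this.audioModel.saveToJSON(),
            retimer: this.retimerModel.saveToJSON()
        };
    };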
6.1.2 Lecture Controller
The LectureController handles the UI functionality for recording and playback, undo and
redo, and loading and saving lectures. It also serves as the entry point for the application through
the $(document).ready() function. For recording and playback, it uses the
TimeController to start timing and then calls the appropriate methods in the audio and
visuals controllers. The LectureController determines if the recording mode is visuals
only, audio only, or both visuals and audio. This information is used in functions to start and stop
recording a lecture.
During a recording, the LectureController creates a grouping for the UndoManager so
that all undoable actions fall within that undo group. When the undo button is pressed, it calls a
method in the LectureController that calls the undo method of the UndoManager and
redraws all of the other controllers. The LectureController also registers a function as a
callback to the UndoManager and the role of this function is to update the state of the undo and
redo buttons so that each one is inactive if there are no undo or redo actions that can be
performed.
The LectureController is also responsible for initializing the process of creating and
loading saved Pentimento files. This is discussed in the Save File section.
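The overall flow when a recording starts can be sketched as follows. This is a simplified illustration: the button ID, the global lectureController variable, and the method names on the sub-controllers and undo manager (beginGrouping, startTiming, startRecording) are assumptions rather than the exact API.

    // Simplified sketch of the recording control flow.
    $(document).ready(function () {
        // The LectureController is the entry point for the application.
        var lectureController = window.lectureController;

        $('#record-button').click(function () {
            beginRecording(lectureController);
        });
    });

    function beginRecording(lecture) {
        // Group everything recorded from here on into one undoable action.
        lecture.undoManager.beginGrouping();

        // Start advancing the global lecture time.
        lecture.timeController.startTiming();

        // Then tell the modality controllers to start capturing input,
        // depending on the selected recording mode.
        if (lecture.recordVisuals) { lecture.visualsController.startRecording(); }
        if (lecture.recordAudio) { lecture.audioController.startRecording(); }
    }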
6.2 Time Controller
In a Pentimento lecture, time must be carefully tracked because the visuals and audio of the
lecture may operate on different timelines. Two different timelines can occur if audio and visuals
are recorded separately, or if the retimer is utilized to adjust visual time to coincide with certain
audio times.
The TimeController manages the “global time” of the lecture, or the time seen when the
lecture is being played back (regardless of the corresponding time that the visuals being played
were recorded at). The TimeController contains access to the current lecture time, as well
as providing the necessary calls to start or stop keeping track of global lecture time.
When referring to time, there are four different time measurements.
1. “Real Time” refers to the time of the system clock. Real time is the time returned by the
system clock when using the JavaScript Date object: new Date().getTime().
2. “Global Time” or “Lecture Time” refers to the global time for the lecture that is kept by
the TimeController. The global time starts at 0 and the units are milliseconds.
3. “Audio Time” refers to the time used for keeping track of the audio elements. There is a
1:1 correspondence between global time and audio time, so audio time directly matches
with the global time. Because of this, there is no real difference between the global time
and the audio time. The only difference is that global time is used when referring to the
time kept by the TimeController, and audio time is used when keeping track of the
time in the context of the audio.
4. “Visual Time” is used when keeping track of the time for the visual elements, and it is
aligned with the global time through the retimer and its constraints. All times from the
TimeController must be passed through the retimer in order to convert them into
visual time.
The audio, visuals, and retimer need the TimeController in order to get the time, but the
TimeController operates independently from the audio, visuals, and retimer. The
TimeController has functionality to get the current time, start timing (automatic time
updating), allow a manual update of the time, and notify listeners of changes in the time. When
the TimeController starts timing, the global time will begin to count up from its current
time. Timing can be stopped with a method call to the TimeController. When the
LectureController begins recording, it uses this timing functionality to advance the time
of the lecture. Methods can also be registered as callbacks to the TimeController so that
they are called when the time is updated automatically through timing or manually through the
updateTime method.
Internally, timing works by keeping track of the previous real time and then using a JavaScript
interval to trigger a time update after a predetermined real time interval. When the interval
triggers, the difference between the current real time and previous real time is calculated and
used to increment the global time. The current real time is saved as the previous time. The
updateTimeCallbacks are called with the new global time as an argument. When timing is
not in progress, the getTime method just returns the current time. However, when timing is in
progress, the getTime method will get the current real time and calculate the difference
between that and the previous real time, just as happens during an interval update. Effectively, this
pulls the current global time instead of just observing an outdated global time. This allows a finer
granularity of time readings during timing. This mechanism is important because if the time were
only updated every interval without pulling the most recent global time, then there would be
visuals occurring at different times but still reading the same global time.
The updateTimeCallbacks are not called when the time is pulled during a getTime call. This is
to prevent an overwhelming number of functions getting called when there are a large number of
getTime calls, such as those that occur during a recording when there are many visuals being
drawn that require getTime to get the time of the visual.
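The timing mechanism described above can be summarized with the following simplified sketch. The real TimeController keeps additional state and callbacks, and the 50 ms interval is an assumed value.

    // Simplified sketch of the timing mechanism; not the exact implementation.
    function TimeController() {
        this.globalTime = 0;           // lecture time in milliseconds
        this.previousRealTime = null;  // last system-clock reading
        this.intervalID = null;
        this.updateTimeCallbacks = [];
    }

    TimeController.prototype.startTiming = function () {
        var self = this;
        self.previousRealTime = new Date().getTime();
        self.intervalID = setInterval(function () {
            var now = new Date().getTime();
            self.globalTime += now - self.previousRealTime;
            self.previousRealTime = now;
            // Listeners (audio, visuals, retimer) are notified only on
            // interval updates, not on every getTime call.
            self.updateTimeCallbacks.forEach(function (cb) { cb(self.globalTime); });
        }, 50);                        // predetermined real-time interval (assumed)
    };

    TimeController.prototype.getTime = function () {
        if (this.intervalID === null) {
            return this.globalTime;    // not timing: return the stored time
        }
        // Timing: pull the most recent global time so that visuals recorded
        // between interval updates still get distinct timestamps.
        var now = new Date().getTime();
        this.globalTime += now - this.previousRealTime;
        this.previousRealTime = now;
        return this.globalTime;
    };

    TimeController.prototype.stopTiming = function () {
        clearInterval(this.intervalID);
        this.intervalID = null;
    };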
The TimeController also has methods to check if it is currently timing and to check the
beginning and ending times of the previous timing. The TimeController does not have any
notion of recording or playback. It is the LectureController that uses the
TimeController timing to start a recording or playback.
6.3 Visuals
The visuals component of a Pentimento lecture is organized in a hierarchy. Slides are the base
level, which contain visuals. Each type of visual then has a certain data structure associated with
it. Currently, strokes are the only type of visual that has been implemented. Strokes are
comprised of vertices, which are points containing x, y, t and p coordinates (x and y coordinate
position, time, and pressure respectively).
6.3.1 Visuals Model
The VisualsModel contains the constructors for all components of visuals. The
VisualsModel contains an array of slides, allowing slides to be created and manipulated. A
slide provides a blank canvas for recording new visuals and allows the lecturer to have a level of
control over the organization of information. A slide contains visuals, slide duration and camera
transforms.
The visuals themselves have many components including type (e.g. stroke, dot, or image),
properties (e.g. color, width and emphasis), tMin (the time when the visual first appears),
tDeletion (time when the visual is removed), property transforms (e.g. changing color or width)
and spatial transforms (e.g. moving or resizing). Property transforms have a value, time and
duration. Spatial transforms also have a time and duration, as well as containing a matrix
associated with the transform to be performed.
Finally, to actually display the visuals, the type of visual is used to determine the drawing
method. Currently, strokes are the only supported type of visuals and strokes are comprised of
vertices. A vertex is represented by (x,y,t,p) coordinates, where x is the x position, y is the y
position, t is the time and p is the pen pressure associated with that vertex.
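Put together, a single stroke visual with its transforms might look roughly like the following sketch; the field names are approximations of what is described above rather than the exact model fields.

    // Illustrative sketch of a single stroke visual.
    var strokeVisual = {
        type: 'stroke',
        properties: { color: '#0000ff', width: 3 },
        tMin: 4200,                 // visual time at which the stroke appears
        tDeletion: null,            // set if the stroke is later erased
        vertices: [
            { x: 120, y: 80, t: 4200, p: 0.6 },
            { x: 131, y: 77, t: 4233, p: 0.7 }
        ],
        propertyTransforms: [
            // e.g. an edit that recolors the stroke partway through the lecture
            { property: 'color', value: '#ff0000', time: 9000, duration: 0 }
        ],
        spatialTransforms: [
            // e.g. a move or resize, stored as a matrix applied at render time
            { matrix: [1, 0, 0, 1, 25, -10], time: 9000, duration: 0 }
        ]
    };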
6.3.2 Visuals Controller
The VisualsController has access to the VisualsModel and the RetimerModel.
The VisualsController also utilizes the ToolsController and the Renderer. The
VisualsController is responsible for drawing the visuals onto the canvas as the lecture is
being recorded. As visuals and slides are added to the view by the user, the
VisualsController accesses the VisualsModel and adds the appropriate data structure.
The VisualsController also allows the user to adjust properties of the visuals, such as the
width and color.
6.3.3 Tools Controller
The ToolsController allows the user to manipulate which tool they are using while
recording or editing the visuals of the lecture. The ToolsController allows switching of
tools as well as indicating what to do with each tool while the lecture is recording or in playback
mode. The ToolsController also creates the distinction of which tools are available in
editing mode vs. in recording mode.
6.3.3.1 Visuals Selection
Visual elements can be selected by using the selection box tool. This tool works in both
recording and editing modes. In the VisualsController, the selection is an array of visuals
that is under the selection box drawn by the user. For StrokeVisuals, the renderer uses
different properties to display these visuals so that the user has feedback that the visuals have
been selected.
The selection box itself is implemented on a separate HTML div on top of the rendering canvas.
Inside this div, there is another div that is set up using jQuery UI Draggable and Resizable. This
allows the box to be dragged and resized by the user. Callback methods are registered so that
when the box is resized or dragged, a transform matrix is created based on the change in the
dimensions and position of the selection box. This transformation matrix is passed on to the
VisualsModel.
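A hedged sketch of that wiring, using the jQuery UI Draggable and Resizable callbacks, is shown below; the element ID and the applyTransformToSelection helper are assumptions for illustration.

    // Illustrative selection-box wiring; matrices are written here in the
    // form [a, b, c, d, tx, ty] (a 2D affine transform).
    $('#selection-box').draggable({
        stop: function (event, ui) {
            // Turn the drag offset into a translation matrix and hand it to
            // the visuals controller so the selected strokes are moved.
            var dx = ui.position.left - ui.originalPosition.left;
            var dy = ui.position.top - ui.originalPosition.top;
            visualsController.applyTransformToSelection([1, 0, 0, 1, dx, dy]);
        }
    });

    $('#selection-box').resizable({
        stop: function (event, ui) {
            // A resize becomes a scale matrix relative to the box origin.
            var sx = ui.size.width / ui.originalSize.width;
            var sy = ui.size.height / ui.originalSize.height;
            visualsController.applyTransformToSelection([sx, 0, 0, sy, 0, 0]);
        }
    });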
6.4 Audio
Similar to the visuals, the audio components of Pentimento lectures are organized into a
hierarchy, with audio tracks divided into segments. A lecture could contain multiple
tracks (the simplest example being one track containing the narration of the lecture while a
second track contains background music). Each track is organized into separate segments (e.g.
the audio associated with a slide).
6.4.1 Audio Model
The AudioModel consists of an array of audio tracks, where each audio track consists of an
array of audio segments. An audio segment contains the URL for an audio clip, the total length
of the clip, the start and end times within the clip, and the start and end locations within the track
(audio time).
The top level AudioModel has functions to insert and delete tracks. The audio track class has
functions to insert and delete segments. All functionality for modifying segments within a track
is handled by the audio track. This includes shifting segments, cropping segments, and scaling
segments. This is because no segments can overlap within a track, so modifying a segment
requires knowledge of the other segments within that track to ensure that the operation is valid.
The audio segment class has methods for converting the track audio time into the time within the
clip and vice versa.
The AudioModel can be converted to and from JSON for the purpose of saving to and loading
from a file. During the saving process, the audio clip URLs are converted into indices and the
resources they point to are saved with filenames corresponding to those indices.
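The segment fields and the clip/track time conversions described above might look roughly like the following sketch, with assumed field names and milliseconds as the time unit.

    // Sketch of an audio segment with clip/track time conversion.
    function AudioSegment(audioClipURL, totalLength, clipStart, clipEnd, trackStart) {
        this.audioClip = audioClipURL;  // URL of the recorded clip
        this.totalLength = totalLength; // full length of the clip
        this.clipStart = clipStart;     // portion of the clip actually used
        this.clipEnd = clipEnd;
        this.trackStart = trackStart;   // where the segment sits in audio (track) time
    }

    AudioSegment.prototype.trackEnd = function () {
        return this.trackStart + (this.clipEnd - this.clipStart);
    };

    // Convert an audio (track) time into the time within the underlying clip.
    AudioSegment.prototype.trackTimeToClipTime = function (trackTime) {
        return this.clipStart + (trackTime - this.trackStart);
    };

    // And the inverse: a clip time back into track time.
    AudioSegment.prototype.clipTimeToTrackTime = function (clipTime) {
        return this.trackStart + (clipTime - this.clipStart);
    };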
6.4.2 Audio Controller
The AudioController accesses the AudioModel so user changes can be applied. Within
the AudioController there is also access to the track and segment controllers. Each track
and segment is initialized when the user begins recording, which is processed through the
AudioController. The AudioController also handles the end of recording. In addition
to handling recording, the AudioController is responsible for playback of the audio.
The audio TrackController contains access to all of the segments contained within that
track. Each TrackController can also retrieve the track ID and the duration of the track.
The TrackController also allows for manipulation of the segments within the track
(dragging, cropping, inserting, and removing segments). The SegmentController handles
access to specific segments and contains the means to display the audio segments.
6.4.3 Audio Playback
When the LectureController begins playback, it calls the startPlayback method in
the AudioController, which starts the playback in the tracks. The TrackController
uses a timer to start playback for the segments after a delay. The delay is equal to the difference
between the segment start time and the current audio time. If the current audio time intersects a
segment, then playback for that segment begins immediately. Playback uses the wavesurfer.js
library to play the audio resource in the audio segments. When a segment playback starts, the
SegmentController uses wavesurfer.js to start playing audio. The start point of the audio
can be specified so that it can start playing in the middle of the audio clip if specified by the
segment parameters.
Automatically stopping playback for the segment when the current audio time moves past the
end of the segment is handled by wavesurfer.js by specifying the stop time for the audio clip.
When playback is stopped in the LectureController, the stopPlayback method of the
AudioController is called, and it stops playback in all of the TrackControllers, which
then stops playback in all of the SegmentControllers. These SegmentControllers
manually stop any “wavesurfers” that are in the process of playing an audio clip.
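The scheduling described above could be sketched roughly as follows. The names segmentControllers, trackStart, trackEnd, and trackToAudioTime are assumptions; wavesurfer.js does provide a play(start, end) call that takes offsets within the clip in seconds.

    // Sketch: schedule each segment to start after a delay equal to the gap
    // between the current audio time and the segment's start time in the track.
    TrackController.prototype.startPlayback = function(currentAudioTime) {
        this.segmentControllers.forEach(function(segmentController) {
            var segment = segmentController.getAudioSegment();
            if (segment.trackEnd <= currentAudioTime) {
                return;  // this segment already finished before the current time
            }
            var delay = Math.max(0, segment.trackStart - currentAudioTime);
            setTimeout(function() {
                // Start in the middle of the clip if the current time intersects the segment
                var clipStart = segment.trackToAudioTime(Math.max(currentAudioTime, segment.trackStart));
                segmentController.wavesurfer.play(clipStart / 1000, segment.clipEnd / 1000);
            }, delay);
        });
    };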
6.4.4 Audio Timeline
The audio timeline is used for displaying the audio tracks to illustrate where the segments are in
relation to the audio/global time. It also has a plug-in functionality so that other items can be
displayed on the timeline. The timeline has a pixel-to-second scale which is used for drawing
items on the timeline, and it has the functionality to change this scale by zooming in and out.
This scale is illustrated through the use of labels and gradations.
6.4.5 Audio Track Controller
The TrackController draws the audio track from the model and handles playback for the
track. It delegates playback for the individual segments to the SegmentController. It has
the functionality for translating the UI events for editing the track into parameters that can be
used to call the methods to change the audio model.
6.4.6 Audio Segment Controller
The SegmentController draws the audio segment from the model. It uses the Wavesurfer
JavaScript library to display and play the audio files that are associated with the segments. It
creates the view for the segments and registers the callbacks associated with the various UI
actions such as dragging and cropping.
For the UI functionality of segment editing, the jQuery UI library is used. The Draggable and
Resizable functionality is used to implement segment shifting and cropping, respectively. In
order to enforce the constraint that audio segments cannot overlap one another in the same
track, the drag and resize functions test whether the new position of the segment view resulting
from the drag or resize action leads to a valid shift or crop. If the action is invalid, the position
of the segment view is restored to the last valid position. The functionality for checking the
validity of these operations resides in the AudioModel, so the UI uses the AudioModel to check
whether user actions are valid and then provides the relevant visual feedback to the user.
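For instance, the dragging callback might look roughly like this sketch. The canShiftSegment and pixelsToMilliseconds calls are taken from the function list in Appendix A; the surrounding variable names are assumptions.

    // Sketch: while a segment is being dragged, ask the model whether the
    // proposed shift is valid; if not, snap the view back to the last valid spot.
    var lastValidLeft = 0;  // updated as the user drags

    function segmentDragging(event, ui, segmentController) {
        var segment = segmentController.getAudioSegment();
        var proposedStart = audioController.pixelsToMilliseconds(ui.position.left);
        var shiftMillisec = proposedStart - segment.trackStart;
        if (audioTrack.canShiftSegment(segment, shiftMillisec) === true) {
            lastValidLeft = ui.position.left;   // valid: remember this position
        } else {
            ui.position.left = lastValidLeft;   // invalid: restore the previous position
        }
    }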
6.4.7 Audio Plug-in
Audio timeline plug-ins are a way to display additional views in the audio timeline. This is a
useful feature because it allows those views to line up with the audio segments. For example,
visual thumbnails that will be played at the corresponding audio time can be displayed. Since the
audio time is used as the global time, it makes sense to visualize the information in this way.
When the timeline is panned from side to side, the plug-in views also pan with the rest of the
timeline. The plug-in is able to register a callback function that gets called when the timeline is
zoomed in or out. This allows the plug-in to define its own implementation of what is supposed
to happen when the pixel-to-second scale of the timeline changes.
The other components that currently use the plug-in functionality of the audio timeline are the
retimer constraints view and the thumbnails view. For these views, it makes sense to display
them as audio timeline plug-ins because it gives the user a sense of how the visual time relates to
the audio time and how that relationship changes when retimer constraints are added and
modified.
6.5 Retimer
The retimer is one of the main innovations of the Pentimento lecture software. By allowing
users to easily manipulate the synchronization between the visual and audio components of a
lecture, the retimer provides much needed flexibility in recording and editing lectures. The
retimer contains constraints, which are the synchronization connections between a point in the
audio time of the lecture and the visuals time of the lecture. Thus the retimer allows the
playback of the lecture to have proper synchronization between the visuals timeline and the
audio timeline.
6.5.1 Retimer Model
The RetimerModel provides the ability to manipulate constraints, including addition,
deletion, shifting and access to the constraints. The RetimerModel contains an array of
constraints, which are used to synchronize the audio and visual time. A constraint is comprised
of a type (automatic or manual), an audio time and a visual time. Automatic constraints are
inserted mechanically as the lecture is recorded (e.g. at insertion points or at the beginning/end of
a recording). Manual constraints are added by the user to synchronize a certain point in the
audio with a certain point in the visuals.
Adding constraints to the model requires finding the previous and next constraints (in audio
time). Once these constraints have been determined, the visual time of the added constraint can
be interpolated between the visual times of the two surrounding constraints to allow for smooth
playback. This is done because adding a constraint only affects the timing of visuals between the
two surrounding constraints.
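A sketch of that interpolation is shown below; the accessor names are assumptions.

    // Sketch: linearly interpolate the visual time for a constraint added at
    // audioTime, using the two constraints that surround it in audio time.
    function interpolateVisualTime(previousConstraint, nextConstraint, audioTime) {
        var audioSpan = nextConstraint.getAudioTime() - previousConstraint.getAudioTime();
        var visualSpan = nextConstraint.getVisualTime() - previousConstraint.getVisualTime();
        var fraction = (audioTime - previousConstraint.getAudioTime()) / audioSpan;
        return previousConstraint.getVisualTime() + fraction * visualSpan;
    }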
When visuals or audio is inserted, automatic constraints are added to preserve the
synchronization provided by existing constraints. This requires shifting the existing constraints
by the amount of time that is added by the insertion. This is accomplished by marking the
constraints after the insertion point as “dirty” until the insertion is completed, which means
moving them to an audio time of “infinity” to indicate that they will be shifted. The original time
is stored so that when the recording is completed the constraint times can be shifted by the
appropriate amount. To perform the shift, the “dirty” constraints are “cleaned” by shifting the
stored original time by the duration of the inserted recording (and removing the value of infinity
from the constraint time).
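In code, the dirty/clean bookkeeping could look roughly like the following sketch. The makeConstraintDirty and cleanConstraints functions appear in Appendix A; the field and accessor names used here are assumptions.

    // Sketch: park a constraint at "infinity" during an insertion, remembering its
    // original time, then restore it shifted by the duration of the inserted recording.
    RetimerModel.prototype.makeConstraintDirty = function(constraint) {
        constraint.originalAudioTime = constraint.getAudioTime();
        constraint.setAudioTime(Number.POSITIVE_INFINITY);
        return constraint;
    };

    RetimerModel.prototype.cleanConstraints = function(constraints, amount) {
        constraints.forEach(function(constraint) {
            constraint.setAudioTime(constraint.originalAudioTime + amount);
            delete constraint.originalAudioTime;
        });
    };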
6.5.2 Retimer Controller
The RetimerController has access to the RetimerModel, so that when a user
manipulates constraints the necessary updates can be made to the constraints data. The retimer
controller also has access to the visuals and audio controllers so that synchronizations can be
inserted properly.
Additionally, the RetimerController manages redrawing of constraints and thumbnails so
that the view is properly updated when a user adds, drags or deletes a constraint. The
RetimerController interacts with the UI, so all user input events are handled properly.
When a constraint is added by a user, the RetimerController handles converting the
location of the click on the retiming canvas (in x and y positions) to the audio and visual time
represented by that location. Similarly, when a user is selecting to delete a constraint, the
RetimerController processes the selection area and locates the constraints within the
selection by converting constraint times to positions on the retiming canvas. Dragging
constraints is also handled by the RetimerController and when a user stops dragging the
RetimerController updates the RetimerModel to reflect the newly selected
synchronization timing.
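The coordinate conversion is essentially a scale change. A rough sketch is below; the canvas element ID and controller references are assumptions, while pixelsToMilliseconds is listed in Appendix A.

    // Sketch: convert a click on the retiming canvas into the audio time it
    // represents, using the audio timeline's current pixel-to-second scale.
    RetimerController.prototype.clickToAudioTime = function(event) {
        var canvasX = event.pageX - $('#retimerCanvas').offset().left;  // x within the canvas
        return audioController.pixelsToMilliseconds(canvasX);
    };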
6.6 Thumbnails Controller
The ThumbnailsController displays the visual thumbnails to the user (as part of the
retimer and audio timeline display). The ThumbnailsController requires access to the
Renderer to display the lecture visuals in a thumbnail timeline. The thumbnails are generated by
calculating how many thumbnails will fit in the timeline (based on the length of the lecture).
Then each thumbnail is drawn on its own canvas at the time in the middle of the time span
represented by the thumbnail. The thumbnails are redrawn any time a user updates a recording
or drags a constraint to update the visual or audio time synchronization.
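A sketch of that layout calculation follows; the renderer call, the helper, and the width parameters are assumptions, while getLectureModel and getLectureDuration appear in Appendix A.

    // Sketch: each thumbnail covers an equal slice of the lecture and is drawn
    // on its own canvas at the midpoint of its time span.
    ThumbnailsController.prototype.drawThumbnails = function(timelineWidth, thumbnailWidth) {
        var duration = lectureController.getLectureModel().getLectureDuration();
        var numThumbnails = Math.floor(timelineWidth / thumbnailWidth);
        var span = duration / numThumbnails;
        for (var i = 0; i < numThumbnails; i++) {
            var midTime = i * span + span / 2;             // middle of this thumbnail's time span
            var canvas = this.createThumbnailCanvas(i);    // one canvas per thumbnail (assumed helper)
            this.renderer.drawCanvas(canvas, 0, midTime);  // draw visuals active up to the midpoint
        }
    };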
6.7 Undo Manager
The UndoManager allows for undoing/redoing any action while recording or editing any
component of a lecture. The UndoManager is organized into an undo stack and a redo stack.
The stacks contain actions that can be undone or redone. Actions can be a single event or a
collection of actions all of which would be undone or redone.
The UndoManager is integrated with the rest of the application by registering undo actions in
the visuals, audio, and retimer models. For example, in the AudioModel, a segment can be
shifted by incrementing its start and end times by a certain amount. The shift is then registered
with the UndoManager, but with the inverse of the shift amount as the argument (e.g. if the
segment was shifted +5 seconds, the UndoManager would store an undo action that shifts the
segment -5 seconds). If the action is undone, the UndoManager will push the action onto the
redo stack instead of the undo stack. Differentiating between undo and redo actions is handled
by the UndoManager; in the implementation of the models, each action only needs to register
its inverse with the UndoManager.
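For example, the shift registration might look like the sketch below; registerUndoAction is an assumed name for the registration call.

    // Sketch: after performing a shift, register the inverse shift so that undoing
    // replays the opposite action. The UndoManager decides whether a registered
    // action lands on the undo or the redo stack.
    AudioTrack.prototype.shiftSegment = function(segment, shiftMillisec) {
        segment.trackStart += shiftMillisec;
        segment.trackEnd += shiftMillisec;
        undoManager.registerUndoAction(this, this.shiftSegment, [segment, -shiftMillisec]);
        return true;
    };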
In the LectureController, when recording begins, an undo group is started. When the
recording ends, the group is ended. This allows the user to undo a recording as though it were
one action. The beginning of a recording also registers changeTime as an undo action. The
changeTime argument is the begin time of the recording. This way, if the recording is undone,
the time in the TimeController can be set to the begin time before the recording started.
When changeTime is called, it pushes another changeTime call to the UndoManager. The
argument for this call is the current time. The result is that when undoing and then redoing, the
time will switch to the place it was before the recording started and switch back to the place it
was after the recording ended.
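A sketch of that wrapper is shown below. The getTime and updateTime calls appear in the TimeController documentation in Appendix A; registerUndoAction is again an assumed name.

    // Sketch: changeTime registers another changeTime call with the current time as
    // its argument, so undoing and redoing bounce the playhead between the time
    // before the recording started and the time after it ended.
    LectureController.prototype.changeTime = function(time) {
        var currentTime = this.getTimeController().getTime();
        undoManager.registerUndoAction(this, this.changeTime, [currentTime]);
        this.getTimeController().updateTime(time);
    };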
6.8 Renderer
The Renderer is used for displaying visuals at a certain time, either as a still frame or during
playback. The Renderer takes in the canvas where the visuals should be displayed (and handles
scaling the visuals to the appropriate size for the given canvas). Then, to display the visuals, the
Renderer uses a minimum and maximum time and draws all visuals that are active in that time
range. The Renderer is used for the main canvas while visuals are being recorded or during
playback. The Renderer is also used to display the thumbnails for the retimer.
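A rough sketch of that filtering follows. The visual accessors and the stroke-drawing helper are assumptions; the saved JSON in Appendix B shows the tMin and tDeletion fields this relies on.

    // Sketch: draw every visual that is "active" in [minTime, maxTime]: it has
    // appeared by maxTime and has not been deleted before minTime.
    Renderer.prototype.drawCanvas = function(canvas, minTime, maxTime) {
        var context = canvas.getContext('2d');
        context.clearRect(0, 0, canvas.width, canvas.height);
        var scale = canvas.width / this.visualsModel.getCanvasSize().width;  // scale visuals to this canvas
        var slide = this.visualsModel.getSlideAtTime(maxTime);
        slide.getVisuals().forEach(function(visual) {
            var appeared = visual.tMin <= maxTime;
            var deleted = visual.tDeletion !== null && visual.tDeletion <= minTime;
            if (appeared && !deleted) {
                drawStroke(context, visual, scale);  // assumed stroke-drawing helper
            }
        });
    };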
6.9 Save and Load Files
Pentimento saved files are regular .ZIP files with the content located in different files and
folders. In the top-level of the .ZIP, there is a JSON file containing the model. There is also a
folder containing the audio files. In the future, there could be other top-level folders for items
such as images and other external resources. Inside the audio folder, the audio clips are stored
with filenames starting at 0 and counting up for the remaining files. The JSZip library is used for
loading and saving .ZIP files.
The save is handled through the LectureController. The JSON representation of the model is
obtained by using the saveToJSON method in the different models. In the AudioModel
JSON, the URL references to the audio clips are replaced by index numbers which will be used
as the file names of the audio files. The JSON is saved as a text file in the .ZIP, and then the
audio files are converted to HTML5 blobs asynchronously. When all of the downloads have
completed, the .ZIP file is downloaded to the computer’s local storage.
Loading of the Pentimento .ZIP files is also handled by the LectureController. The .ZIP
is loaded asynchronously and when the load is complete the contents are read. The different
models are loaded from their JSON representations using their respective loadFromJSON
methods. The audio files are loaded into the browser and the URL for the resource is obtained.
This URL is substituted back into the audio segments where an index number was previously
used as a substitute for the audio clip URL.
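The save path could be sketched like this. The blob handling and the download helper are assumptions, and generateAsync is the current JSZip API, which may differ from the version used in the prototype; saveToJSON, getAudioModel, and getBlobURLs appear in Appendix A.

    // Sketch: pack the lecture JSON and the audio blobs into a .ZIP, with each
    // audio clip saved under its index, then hand the archive to the browser.
    LectureController.prototype.save = function() {
        var zip = new JSZip();
        zip.file('lecture.json', JSON.stringify(this.getLectureModel().saveToJSON()));
        var audioFolder = zip.folder('audio');
        var blobURLs = this.getLectureModel().getAudioModel().getBlobURLs();
        Promise.all(blobURLs.map(function(url) {
            return fetch(url).then(function(response) { return response.blob(); });
        })).then(function(blobs) {
            blobs.forEach(function(blob, index) {
                audioFolder.file(String(index), blob);  // filename is the clip's index
            });
            return zip.generateAsync({ type: 'blob' });
        }).then(function(content) {
            saveAs(content, 'lecture.zip');  // e.g. a FileSaver.js-style helper (assumed)
        });
    };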
7. Future Work
While we have successfully created a working prototype, there are many additional features that
would certainly benefit Pentimento users. In addition to the features and additions outlined
below, user testing and input would be essential for determining exactly how the Pentimento
website would be used by real lecturers. While the feature sets below have been divided roughly
into recording vs. editing tools, many of the recording tools would require additional tools for
editing and many editing tools could be used while recording.
7.1 Future Features
7.1.1 Recording
After the baseline feature set was created and established, we determined that there are many
additional recording tools that lecturers would find extremely helpful, even though they aren’t
necessarily part of the minimum successful product. Some of these possible features are
outlined below:
Auto Record: There could be an option for “auto” recording, which would begin recording when
the pen is touched to the screen (as opposed to manually clicking the button before recording
begins). Auto record would stop after a few seconds of the pen being inactive. This would allow
for more flow while recording lectures.

Text Boxes: Some lecturers may want to include typed text instead of handwriting. The textbox
feature would also include the ability to resize text, adjust fonts, format text, and use other
general text editing capabilities. Note: this feature is currently under development.

Set Background: The background of the lecture or slide could be set to a certain color or even to
an inserted image or a slide deck.

Emphasis: This tool would allow the lecturer to circle or otherwise select a certain area of text
and change either the line weight or color of the selection for emphasis. Since these videos omit
the actual lecturer and are solely visual and audio based, it is hard to direct student attention to a
certain point on the screen without a tool that facilitates emphasizing exactly what should be
focused on at that moment.

Set/Go to “Home View”: Some lectures are based on a central idea that the lecturer would like to
emphasize by revisiting the same set of visuals repeatedly throughout the lecture. This view
should be easily accessible and could be set as the home view. Then, throughout the lecture, the
lecturer can just click a button to return to that view, as opposed to moving the canvas until the
desired view is found.

Inclusion of Other Content: Many times lecturers will want to include images, videos,
hyperlinks, or other media in their recordings.

Zoom In/Out: It might be useful for the lecturer to be able to zoom in or out on certain parts of
the canvas while recording to emphasize certain aspects of diagrams or ideas.

Traceable Images: Many times lecturers want to accurately draw an image (e.g. a map). With
this feature the image could be inserted onto the canvas, traced by the lecturer, and then
removed, so that during playback only the lecturer’s tracing is shown and not the base image.

Shapes: Often when creating slides it is nice to be able to draw shapes using a shape tool as
opposed to drawing them by hand (e.g. inserting rectangles and circles). A shape tool is provided
by many slide creation tools (e.g. PowerPoint) and would be a nice addition to Pentimento.

Animations: Animations are extremely useful educational tools in many scenarios. For example,
when learning about mass-spring systems it would be ideal if the lecturer could draw the mass
and the spring and then have it animate over a specified curve. Adding the ability to insert these
types of animations would make more lecture styles possible.

Stroke Type: Currently, whenever the lecturer uses the pen tool, the handwritten strokes are
recorded as calligraphy. Having a non-calligraphic mode would allow for more variation in
lectures.

Redraw Visuals: This feature would allow a user to select certain visuals and redraw them while
preserving the timing of the previous visuals. This would allow for faster editing because the
visuals would not necessarily need to be resynchronized with the audio.

Record Audio for Selected Visuals: Similar to redrawing visuals to maintain synchronization
with audio, this feature would allow audio to be recorded for a set of selected visuals. Those
visuals would then be retimed to play back in correspondence with the newly recorded audio
(automatically adding the constraints instead of retiming manually later).

Recording Buffer: When a lecturer is writing, it is sometimes difficult to plan how much space
they will need. Adding a recording buffer area off the canvas would warn them that they are
running out of space while still letting them finish a thought, allowing for more fluidity in
recording lectures. After recording in the buffer space, the user could resize the visuals while
editing to make them fit onto the playback screen.
7.1.2 Editing
In addition to the basic editing techniques enabled by the tools described above, there are some
additional capabilities that the lecturers may want to be able to use while editing these lectures.
Background and Insertion of Media: While editing, the lecturer may realize that they want to
change the background to something different or insert a video or other media at a certain point,
which would be allowed during the editing phase.

Grid View: While editing, the lecturer may want to view gridlines to make sure that everything is
lined up properly. Note: this same tool could also be used during the recording phase (grid lines
would not be recorded).

Redraw Tool: This tool is complex because it combines recording and editing modes. With the
redraw tool, editors would be able to select content that they want to re-record, remove that
content, and then re-record it while maintaining the same timing within the lecture (i.e. the
redrawn strokes will play back in the video at the same time as the original strokes).

Slide Manipulation: Currently, a user can only add slides to a lecture. Adding a way to select and
delete a slide or to rearrange slides would be highly beneficial for lecturers. This may require
additional UI work (e.g. a slide view).

Auto-Delete Silence: One of the many issues Pentimento set out to address is the fact that writing
often takes longer than speaking. If audio and visuals are recorded concurrently, there may be
long silences introduced by the writing taking longer. A feature to auto-delete these silences
could make editing more efficient.

Override Automatic Constraints: Currently, automatic constraints are handled the same as
manually added constraints. We do not allow visuals to be drawn “backwards”, so a constraint
may not be dragged past the previous or next constraint, to preserve timing. However, user-added
constraints could override automatically added constraints. One way to do this would be to allow
a manual constraint to be dragged past an automatic constraint, deleting the automatic constraint.

Handwriting Beautification: There are algorithms available for handwriting beautification [11]
which could be used for these handwritten lectures. A user could select the handwritten text and
make it appear neater, while still looking like it is written by hand (as opposed to simply typed).

Keyboard Shortcuts: Most computer programs allow for keyboard shortcuts, and adding them to
Pentimento would allow experienced users to be more efficient. Some examples might be
initiating playback by pressing the space bar or deleting visuals by pressing the delete key.
7.2 User Interface Additions
There are many additional improvements to the User Interface that could be beneficial for users
of Pentimento. One such improvement could be the ability to toggle between the main recording
view and the audio and retimer timeline view, instead of having them stacked on top of each
other and scrolling between them. This could be implemented by having a tab or accordion
system within the webpage, which could easily allow the user to update what he or she is seeing.
This would make the UI cleaner and easier to scale to different size displays since each
component of the view could have its own screen. While this would improve the interaction
between the two main components of the user interface, each portion could also be improved.
7.2.1 Main Visuals Recording and Editing UI
Slides Interface: It could be good for the lecturer to have the ability to see all of their slides (e.g.
in a grid) so that slides could be manipulated more easily. For example, this would allow a user
to rearrange slides, insert a slide at a certain point, or select and delete a slide.

Save/Open Buttons: These buttons could be resized or a different icon chosen to fit more
cohesively into the overall UI.

Time Indicator: The time indicator could be made more obvious. Additionally, if the main
recording UI and audio/retimer timeline UI are separated, a new time slider should be available
for navigating through the visuals.

Color Select Palette: The spectrum.js color picker design doesn’t necessarily fit in with the rest
of the Pentimento UI, so it could be updated or restyled to display more attractively.

Jump to Beginning or End: Buttons could be added to allow the user to quickly jump to the
beginning or end of the lecture without dragging a time slider. Jumping to the beginning is often
useful for playback; jumping to the end would be useful for adding more content.

Adding Tools: The layout may need to be updated to add more tools. Currently the recording and
editing tools fit nicely alongside the canvas, but a column approach or a different layout may
become necessary as more tools become available.
7.2.2 Audio and Retimer Timeline UI

Selection of Thumbnails: By clicking on the thumbnail timeline and dragging, a user could
potentially select visuals. This would add another way to select visuals that may appear across
different slides or a long period of time and edit all of them at once.

Fluid Arrow Dragging: Currently when a user drags an arrow, it creates a straight line to connect
the two ends of the arrow. However, it might be clearer if the center of the arrow were anchored
and the arrow dragging looked more fluid (e.g. the arrow became curved so that the tip still
points straight up or straight down). jCanvas supports drawing many types of curves, which
could facilitate this improvement.

Tick Marks for Thumbnail Speed: Displaying tick marks below the visuals thumbnails or on the
visuals side of the retimer canvas could indicate the speed at which the thumbnails are being
played back. If the tick marks are close together, the visuals have been sped up by constraints; if
they are far apart, the visuals have been slowed down to maintain synchronization with the audio.

Audio “Transcript”: It is somewhat complicated for lecturers to synchronize audio and visuals
because the audio is presented as a waveform display. It would be extremely helpful if there
were a way to process the waveform and display a transcript of the speech below the audio
timeline, so that it is faster and easier to find the correct point for synchronization.

Audio Manipulation Buttons: Currently the buttons are styled very simply to function, but they
could be replaced by icons or styled in a way that is more consistent with the rest of the UI.
7.3 Student Player
Another vital addition to the Pentimento lecture software will be creating a web based student
viewer, where students can see these videos and interact with them in ways specified by the
lecturer. For example, the lecturer could insert a pause point that requires the student to submit
an answer to a question. The student view would still allow for interactivity and would have
some of the features available to teachers (e.g. students might have to record and show their
work for a problem), but it would mainly be used for recording new student content and
wouldn’t require as much editing capability as the lecture recording tool. The student input may
also come in forms that do not require writing, since writing is facilitated by pen and tablet input,
a resource that isn’t readily available to students. For example, a multiple choice question could
appear at a certain point in the lecture requiring a student answer. However, many students
would have access to a touch screen in the form of a tablet (e.g. iPad) or smartphone, which
suggests the desire for a mobile version of the student player. The need for secure accounts or
online “classroom” structures adds additional complexity to student interfaces.
8. Conclusions
The main goal of this project is to make creating and recording online educational video content
easy and maintainable. To begin achieving this goal we were able to turn the current iOS
prototype into a web based tool that can be accessed by anyone to record and edit presentations.
The presentations can be edited in both space and time and these changes can be applied to both
the visual and audio elements. For visual changes, this means having the ability to change
strokes and placement after the recording. This can include correcting existing strokes, adding
missing ones (such as a missing prime on a variable), moving drawings and text, or resizing
drawings and text. In the future we hope that it will also be possible to re-record, insert, or delete
sections of audio and keep them synchronized with the visual recording.
By using my thesis work to implement a base set of features, it will be possible for educators to
use Pentimento to easily record high-quality educational videos that can be maintained for many
years. Since online learning is becoming extremely widespread, it is important to have accessible
software that can keep up with the demand and is easy for lecturers to use.
Appendix A: Documentation
Here we will outline all of the functions available in the Pentimento code base, with their input
parameters and outputs if any.
A.1 Lecture
A.1.1 Lecture Model
getVisualsModel
Return: the current visuals model
setVisualsModel
Parameters: newVisualsModel (the new visuals model that the visuals model should be set to)
Sets the visuals model to the newly chosen visuals model
getAudioModel
Return: the current audio model
setAudioModel
Parameters: newAudioModel (the new audio model that the audio model should be set to)
Sets the audio model to the newly chosen audio model
getRetimerModel
Return: the current retimer model
setRetimerModel
Parameters: newRetimerModel (the new retimer model that the retimer model should be set to)
Sets the retimer model to the newly chosen retimer model
getLectureDuration
Return: the duration of the lecture in milliseconds. The full lecture duration is the maximum of
the audio recording duration and the visuals recording duration.
loadFromJSON
Parameters: json_object (the lecture in JSON form to be loaded from a file)
Loads a lecture file. Calls the necessary load methods from the visuals, audio and retimer to
construct the models for each and allow for playback. (Note: see JSON file structure in Saving
and Loading appendix).
saveToJSON
Saves the whole lecture to a JSON file. Gets the data from the saving methods from the visuals
model, audio model and retimer model to create the JSON object. (Note: see JSON file structure
in Saving and Loading appendix).
A.1.2 Lecture Controller
save
Saves the current lecture to a JSON file. The JSON is put into a zip file with the audio blob
(Note: see JSON file structure in Saving and Loading appendix).
load
Reads the selected lecture file so that it can be opened and displayed to the user.
openFile
Parameters: jszip (the JSON zip file including the JSON object that represents the lecture and the
audio blob).
Opens the specified file into the UI and resets all of the controllers and models to be consistent
with the loaded lecture. (Note: see JSON file structure in Saving and Loading appendix).
getLectureModel
Return: the lecture model
getTimeController
Return: the time controller
recordingTypeIsAudio
Return: true if the recording will include audio (i.e. if the audio checkbox is checked), otherwise
returns false
recordingTypeIsVisual
Return: true if the recording will include visuals (i.e. if the visuals checkbox is checked),
otherwise returns false
isRecording
Return: true if a recording is in progress, otherwise returns false
isPlaying
Return: true if playback is in progress, otherwise returns false
startRecording
Starts the recording and notifies other controllers (time, visuals, audio and retimer) to begin
recording. Updates the UI to toggle the recording button to the stop button. Note: only notifies
the audio and visuals controllers if their respective checkboxes are checked for recording.
Return: true if successful
stopRecording
Stops the recording and notifies other controllers (time, visuals, audio and retimer) to end
recording. Updates the UI to toggle the stop button to the recording button. Note: only notifies
the audio and visuals controllers if their respective checkboxes are checked for recording.
Return: true if successful
startPlayback
Starts playback and notifies the other controllers (time, visuals, audio) that playback has begun.
Toggles the play button to the pause button.
Return: true if successful
stopPlayback
Stops playback and notifies the other controllers (time, visuals, audio) that playback has ended.
Toggles the pause button to the play button.
Return: true if successful
getPlaybackEndTime
Return: the lecture time when playback is supposed to end (returns -1 if not currently in playback
mode)
draw
Redraws the views of all of the controllers (visuals, audio and retimer).
undo
Undoes the last action and redraws the view to reflect the change.
redo
Redoes the last undone action and redraws the view to reflect the change.
changeTime
This function creates a wrapper around a call to the time controller and the undo manager. This
is necessary because the time needs to revert back to the correct time if an action is undone or
redone.
loadInputHandlers
Initializes the input handlers (i.e. mousedown, mouseup, keydown, and keyup). Also detects
whether the device is a touch screen to determine if pen pressure will be applied, and connects
the click events to the lecture buttons.
updateButtons
Toggles the UI display between the recording/stop button and the play/pause button to reflect the
current recording or playback state.
A.2 Time Controller
addUpdateTimeCallback
Adds a callback that will be called when the current time changes (note: callback functions
should take one argument, currentTime, in milliseconds)
getTime
Return: current time, in milliseconds
updateTime
Manually update the current time and notify callbacks
globalTime
Return: UTC time (to keep track of timing while it is in progress)
isTiming
Return: true if timing is in progress, otherwise returns false
startTiming
Starts progressing the lecture time
Return: true if successful
stopTiming
Stops progressing the lecture time
Return: true if successful
getBeginTime
Return: the time (in milliseconds) when the previous or current timing started (returns -1 if there
was no previous or current timing).
getEndTime
Return: the time (in milliseconds) when the previous timing ended (returns -1 if there was no
previous timing event).
A.3 Visuals
A.3.1 Visuals Model
getCanvasSize
Return: an object with the size of the canvas where the visuals are being recorded, formatted as:
{‘width’: <canvas width>, ‘height’: <canvas height>}
getDuration
Return: the total visuals duration (calculated by adding the durations of each of the slides)
getSlides
Return: the array of all of the lecture slides
getSlidesIterator
Return: an iterator over the slides
getSlideAtTime
Parameters: time (in milliseconds)
Return: the slide that is displayed at the specified time
insertSlide
Parameters: prevSlide (the previous slide before the point where the new slide will be inserted),
newSlide (the slide to be inserted)
Return: true if successful (false if the previous slide does not exist)
removeSlide
Parameters: slide (the slide to be removed, type slide)
Return: true if successful (false if there are no slides or if there is only one slide remaining)
addVisuals
Parameters: visual (the visual to be added, type visual)
Gets the slide at the minimum time of the visual and then adds the indicated visual to the visuals
belonging to that slide.
deleteVisuals
Parameters: visual (the visual to be deleted, type visual)
Gets the slide at the minimum time of the visual and then removes the indicated visual from the
visuals belonging to that slide
visualsSetTDeletion
Parameters: visual (the visual to be deleted), visuals_time (the time to delete the visual, in
milliseconds)
Sets the deletion time property of the given visual to the specified deletion time.
setDirtyVisuals
Parameters: currentVisualTime (the current visual time, after this time all visuals will be set to
dirty)
Creates wrappers around the visuals that keep track of their previous time and the times of their
vertices, then moves the visuals to positive infinity. Used at the end of a recording so that the
visuals will not overlap with the ones being recorded. Only processes visuals in the current slide
after the current time.
cleanVisuals
Parameters: amount (the amount of time, in milliseconds, that the visuals that were previously set
as dirty will need to be shifted by to accommodate the new recording)
Restores visuals to their previous time plus the amount indicated. Used at the end of a recording
during insertion to shift visuals forward.
doShiftVisual and shiftVisual
These functions are not functional; they were written by a previous M.Eng. student as part of the
“shift as you go” approach to shifting visuals during insertion. They are left in the code base in
case that method is revisited.
prevNeighbor
Parameters: visual
Return: the previous visual (i.e. the visual that occurs right before the specified visual in time).
nextNeighbor
Parameters: visual
Return: the next visual (i.e. the visual that occurs right after the specified visual in time).
segmentVisuals
Parameters: visuals (an array of all visuals)
Return: an array of segments, where each segment consists of a set of contiguous visuals.
getSegmentShifts
Parameters: segments (an array of visual segments, where a segment is a set of contiguous
visuals)
Return: an array of the amounts by which to shift each segment
saveToJSON
Saves the visuals as a JSON object
loadFromJSON
Parameters: json_object
Return: an instance of the visuals model with the data specified in the JSON object (loaded from
a file)
A.3.2 Visuals Controller
getVisualsModel
Return: visuals model
getRetimerModel
Return: retimer model
drawVisuals
Parameters: audio_time
Draws visuals on the canvas using the renderer. The time argument is optional; if specified, it is
the audio time at which to draw the associated visuals (the visual time is calculated from the
retimer). If the time is not specified, visuals are drawn at the current time of the time controller.
startRecording
Parameters: currentTime (time at which to start recording)
Begins recording visuals on the slide at the current time.
stopRecording
Parameters: currentTime (time at which to stop recording)
Stops the recording. If it is an insertion, visuals after the recording time are “cleaned” and moved
to the end of the insertion. Durations are updated.
startPlayback
Parameters: currentTime (time at which to start playback)
Starts playback
stopPlayback
Parameters: currentTime (time at which to stop playback)
Stops playback
currentVisualTime
Return: visual time (converted from the time controller time through the retimer)
currentSlide
Return: the slide at the current time (gotten from the visuals model)
addSlide
Adds a slide to the visuals model
addVisual
Adds a visual to the visuals model (once it is done being drawn)
recordingDeleteSelection
Deletes the selected visuals during recording and sets the tDeletion property for all of the
selected visuals.
editingDeleteSelection
Deletes the selected visuals while in editing mode, which removes the selected visuals entirely
from all points in time.
recordingSpatialTransformSelection
Parameters: transform_matrix (the matrix that will transform the selected visuals to the correct
place).
Transforms the visuals spatially during recording. Gets the selected visuals, calculates the final
transform matrix from the new position relative to the original position, and adds it to the spatial
transforms of those visuals.
editingSpatialTransformSelection
Parameters: transform_matrix (the matrix that will transform the selected visuals to the correct
place).
Transforms the visuals spatially during editing. Gets the selected visuals and adds the transform
matrix to the spatial transforms of those visuals.
recordingPropertyTransformSelection
Parameters: visual_property_transform (visual property that will be changed by the selection, i.e.
color or width).
Changes the properties of the selected visuals during recording. Adds a property transform to the
selected visuals’ property transforms.
editingPropertyTransformSelection
Parameters: property_name (property that will be changed), new_value (value to change the
property to)
Changes the properties of the selected visuals during editing. Updates the specified property to
the new property value (e.g. changes from one color to another).
A.4 Tools Controller
startRecording
Activates the recording tools and hides the editing tools
stopRecording
Activates editing tools and hides recording tools
toolEventHandler
Parameters: event
Handles a click event on one of the tool buttons (handles both recording and editing tools)
activateCanvasTool
This activates a tool on the canvas. This is used for tools such as pen, highlight, and select. The
tool that is registered is the active tool for the current mode (recording/editing). Initializes
mouse and touch events for the active tool.
drawMouseDown
Parameters: event
Used when the pen tool is active. Called when the mouse is pressed down or a touch event is
started. Activates the mouse move and mouse up handlers and starts a new current visual (i.e.
the visual that is being drawn by the pen).
drawMouseMove
Parameters: event
Used when the pen tool is active. When the mouse is down and moved or touch is moving,
appends a new vertex to the current visual.
drawMouseUp
Parameters: event
Used when the pen tool is active. When the mouse is released or a touch ends, clears the handlers
and adds the completed visual.
resetSelectionBox
Parameters: event
Resets the selection box so that it is not visible.
selectMouseDown
Used when the selection tool is active. When the mouse is pressed down or a touch event is
started, activates the selection box and the mouse move and mouse up handlers
selectMouseMove
Parameters: event
Used when the selection tool is active. When the mouse is down and moved or a touch event is
moving, updates the dimensions of the selection box and selection vertices.
selectMouseUp
Parameters: event
Used when the selection tool is active. When the mouse is released or a touch ends, clears the
handlers and turns on dragging and resizing of the selection box.
selectBoxStartTranslate
Parameters: event, ui
While dragging a selection box, stores the original UI element dimensions
selectBoxEndTranslate
Parameters: event, ui
While editing handles the end of dragging a selection box
selectBoxEndScale
Parameters: event, ui
While editing handles the end of resizing a selection box
widthChanged
Parameters: new_width (the newly selected width for the pen tool)
Handles changing the width of the pen tool.
colorChanged
Parameters: new_spectrum_color (the newly chosen color for the pen tool. The color is passed
in as a spectrum.js color and then converted to hex).
Handles changing the color of the pen tool.
isInside
Parameters: rectPoint1 (top left corner of selection rectangle), rectPoint2 (bottom right corner of
selection rectangle), testPoint (vertex point).
Return: true if the test vertex is inside the selection, otherwise returns false.
Tests whether a vertex is inside the rectangle formed by the two points that define the selection
box
getCanvasPoint
Parameters: event
Return: Vertex(x,y,t,p) with x,y on the canvas, and t a global time
Gives the location of the mouse event on the canvas, as opposed to on the page
getTouchPoint
Parameters: eventX, eventY (the coordinates of the touch event)
Return: Vertex(x,y,t,p) with x,y on the canvas, and t a global time
Gives the location of the touch event on the canvas, as opposed to on the page
calculateTranslateMatrix
Parameters: original_position, new_position (position is represented as { left, top })
Return: translation matrix
Given the original and new position of a box in the canvas, calculates and returns the math.js
matrix necessary to translate the box from the original to the new coordinates.
calculateScaleMatrix
Parameters: original_position, original_size, new_position, new_size (position is represented as {
left, top }, size is represented as { width, height })
Return: scaling matrix
Given the original and new dimensions of a selection box in the canvas, calculate and return the
math.js matrix necessary to scale the box from the original to the new coordinates. Scaling
normally ends up translating, so the matrix returned by this function will negate that translation.
A.5 Audio
A.5.1 Audio Model
getAudioTracks
Return: array containing all audio tracks
setAudioTracks
Parameters: tracks (array containing audio tracks)
Sets the audio tracks to the specified tracks
addTrack
Parameters: track, insert_index (optional argument)
Adds the track to the end of the audio tracks, unless an insertion index is specified, in which case
the track is inserted at that index.
removeTrack
Parameters: track
Return: true if completes, false otherwise
Removes the specified audio track
getDuration
Return: the total duration of the audio (in milliseconds), which is the maximum of all of the
audio track lengths. Returns 0 if there are no audio tracks
getBlobURLs
Return: an array of all the unique audio blob URLs
saveToJSON
Return: a JSON object containing the audio JSON
Saves the model to JSON
loadFromJSON
Parameters: json_object (JSON object containing the audio information)
Return: audio model populated with the information from the JSON object
getAudioSegments
Return: an array of all audio segments
setAudioSegments
Parameters: segments
Sets the segments in the track to the specified segments
insertSegment
Parameters: new_segment, do_shift_split
Return: true if the insert succeeds and no split occurs. If there is a split, returns an object {left,
right, remove} with the left and right sides of the split segment and the segment that was
removed to become the left and right parts.
Inserts the provided segment. Note: another segment in the track may need to be split to insert
the specified new segment.
addSegment
Parameters: segment
Add the segment to the audio segments array.
removeSegment
Parameters: segment
Return: true if the segment is removed
Removes the specified audio segment.
canShiftSegment
Parameters: segment, shift_millisec
Return: true if the shift is valid, otherwise returns the shift value of the greatest magnitude that
would have produced a valid shift
Determines whether the specified segment can be shifted to the left or right. If a negative
number is given for shift_millisec, then the shift will be left. The final value of the segment
starting time cannot be negative. The segment cannot overlap existing segments in the track. If
the shift will cause either of these conditions to be true, then the shift cannot occur.
shiftSegment
Parameters: segment, shift_millisec, check (optional and defaults to true. If false, shift is
performed without checking for validity)
Return: true if the shift succeeds, otherwise returns the shift value of the greatest magnitude that
would have produced a valid shift
Shifts the specified segment left or right by a certain number of milliseconds. If a negative
number is given for shift_millisec, then the shift will be left.
canCropSegment
Parameters: segment, crop_millisec, left_side (boolean indicating whether the left side is being
cropped)
Return: Returns true if the crop is valid, otherwise returns a crop millisecond of the greatest
magnitude that would have produced a valid crop
Determines whether the specified segment can be cropped on the left or right. If a negative
number is given for crop_millisec, then the crop will shrink the segment. If a positive number is
given for crop_millisec, then the crop will extend the segment. The segment cannot overlap
existing segments in the track. The segment cannot extend past the audio length and cannot
shrink below a length of 0.
cropSegment
Parameters: segment, crop_millisec, left_side (boolean indicating whether the left side is being
cropped), check (optional and defaults to true. If false, it will crop without checking for validity)
Return: Returns true if the crop is valid, otherwise returns a crop millisecond of the greatest
magnitude that would have produced a valid crop
Crop the specified segment by the specified number of milliseconds. If a negative number is
given for crop_millisec, then the crop will shrink that side of the segment
endTime
Return: the end time of the track in milliseconds, which is the greatest segment end time.
Returns 0 if the track is empty.
saveToJSON
Return: a JSON object containing the audio track JSON
Saves the audio tracks to JSON
loadFromJSON
Parameters: json_object (JSON object containing the audio information)
Return: audio track with the information from the JSON object
audioResource
Return: the URL of the audio resource blob needed for playback
totalAudioLength
Return: total length of the audio resource blob
lengthInTrack
Return: the length of the segment in the track
audioLength
Return: the length of the audio that should be played back
splitSegment
Parameters: splitTime
Return: an object {left, right} with two segments that are the result of splitting the segment at the
specified track time. Returns null if the track time does not intersect the segment within
(start_time, end_time)
Splits an audio segment at the specified time
trackToAudioTime
Parameters: trackTime
Return: audio time. Returns false if the given track time is invalid.
Converts a track time to the corresponding time in the audio resource at the current scale
audioToTrackTime
Parameters: audioTime
Return: track time. Returns false if given audio time is invalid.
Converts a time in the audio resource to the corresponding time in the track at the current scale
saveToJSON
Return: a JSON object containing the audio segment JSON
Saves the audio segment to JSON
loadFromJSON
Parameters: json_object (JSON object containing the audio segment information)
Return: audio segment with the information from the JSON object
A.5.2 Audio Controller
getAudioModel
Return: the audio model
addTrack
Creates a new track and adds it to the model. Redraws the audio timeline
removeTrack
Remove a track from the audio model. Redraws the audio timeline
changeActiveTrack
Parameters: index (index of track to make active)
Changes the active track index to refer to another track
startRecording
Parameters: currentTime
Starts recording the audio at the given track time (in milliseconds)
stopRecording
Parameters: currentTime
Ends the recording (only applies if there is an ongoing recording)
startPlayback
Parameters: currentTime
Begins audio playback at the given track time (in milliseconds)
stopPlayback
Parameters: currentTime
Stops all playback activity.
millisecondsToPixels
Parameters: millSec
Return: pixel value
Converts milliseconds to pixels according to the current audio timeline scale
pixelsToMilliseconds
Parameters: pixels
Return: millisecond value
Converts pixels to milliseconds according to the current audio timeline scale
tickFormatter
Parameters: tickpoint
Return: time (e.g. 00:30:00)
Changes tickpoints into time display (e.g. 00:30:00). Each tickpoint unit is one second which is
then scaled by the audio timeline scale.
disableEditUI
Disables all UI functionality for editing audio (used during recording and playback)
enableEditUI
Enables all UI functionality for editing audio (used when recording or playback stops)
drawTracksContainer
Return: jquery tracks container object
Draw the container that will be used to hold audio tracks
pluginTopOffset
Parameters: pluginIndex
Return: offset from the top of the tracks container (in pixels)
Gets the offset (pixels) from the top of the tracks container for the nth plugin. Using a
pluginIndex equal to the number of plugins will return the offset needed by the tracks that are
drawn under the plugins.
refreshGradations
Redraw the gradations container to fit the current audio tracks
drawGradations
Draw the graduation marks on the audio timeline
refreshPlayhead
Refreshes the playhead position
drawPlayhead
Draws the playhead for showing playback location
zoom
Parameters: zoomOut (default true to indicate zoom out, false means zoom in)
Zooms the audio timeline in or out
draw
Draws all parts of the audio timeline onto the page
updatePlayheadTime
Parameters: currentTime
Updates the current time (ms) of the audio timeline (the time indicated by the playhead)
updateTicker
Parameters: time
Updates the ticker display indicating the current time as a string
timelineClicked
Parameters: event
When the timeline is clicked, update the playhead to be drawn at the time of the clicked position.
addTimelinePlugin
Parameters: plugin
Adds the plugin to the list of plugins
getTimelinePluginID
Parameters: plugin
Return: the ID of the plugin, which is calculated as the base plus the index of the plugin in the
array
A.5.3 Track Controller
getID
Return: the ID of the track
getLength
Return: the length of the track (in milliseconds)
getAudioTrack
Return: the audio track
insertSegment
Parameters: newSegment (segment to be inserted)
Insert a new segment into the audio track
removeSegment
Parameters: segment
Remove a segment from the track
segmentDragStart
Parameters: event, ui, segmentController
Callback for when a segment UI div starts to be dragged. Sets initial internal variables.
segmentDragging
Parameters: event, ui, segmentController
Callback for when a segment UI div is being dragged. Tests whether or not the drag is valid. If
the dragging is valid, it does nothing, allowing the segment UI div to be dragged to the new
position. If the dragging is invalid, it sets the segment UI div back to the last valid position.
segmentDragFinish
Parameters: event, ui, segmentController
Callback for when a segment UI div is finished being dragged. Performs the drag in the audio
model.
segmentCropStart
Parameters: event, ui, segmentController
Callback for when a segment UI div starts to be cropped. Sets the initial internal variables.
segmentCropping
Parameters: event, ui, segmentController
Callback for when a segment UI div is being cropped. If the cropping is valid, it does nothing. If
the cropping is invalid, it sets the UI div back to the original size and position.
segmentCropFinish
Parameters: event, ui, segmentController
Callback for when a segment UI div has finished being cropped. The cropping should always be
valid because the 'segmentCropping' callback only allows cropping to happen in valid ranges.
Performs the crop in the audio track.
removeFocusedSegments
Remove all segments that have focus.
startPlayback
Parameters: startTime, endTime
Start the playback of the track at the specified time interval. Stops the previous playback if there
is one currently going. The time is specified in milliseconds. If the end time is not specified,
playback goes until the end of the track.
stopPlayback
Stop the playback of the track. Does nothing if the track is not playing.
refreshView
Refresh the view to reflect the state of the model for an audio track
draw
Parameters: jqParent (jQuery container where track should be drawn)
Return: a new jQuery track
Draw a track into the parent jQuery container
A.5.4 Segment Controller
getID
Return: the segment ID
getWavesurferContainerID
Return: the ID of the wavesurfer container
getClassName
Return: the name of the class used to represent audio segments
getAudioSegment
Return: the audio segment
getParentTrackController
Return: the parent track controller
startPlayback
Parameters: delay, trackStartTime, trackEndTime
Play the audio segment back after a delay at the specified time interval (milliseconds). If the end
time is undefined, play until the end. If playback is currently going or scheduled, then cancel the
current and start a new one.
stopPlayback
Stop any ongoing or scheduled playback
refreshView
Refresh the view to reflect the state of the model for the audio segment
draw
Parameters: jqParent (jQuery container where segment should be drawn)
Return: a new jQuery segment
Draw a segment into the parent jQuery container
shiftWavesurferContainer
Parameters: pixelShift
Shift the internal wavesurfer container left (negative) or right (positive) in pixels. This is used
when cropping to move the container so the cropping motion looks natural.
A.6 Retimer
A.6.1 Retimer Model
getConstraints
Return: an array containing all of the constraints
makeConstraintDirty
Parameters: constraint
Return: the constraint after it has been marked dirty (disabled)
cleanConstraints
Parameters: constraint, amount (amount to shift the original time of the constraint)
Shifts the dirty constraints by the specified amount (from their original time) and enables the
constraints.
checkConstraint
Parameters: constraint
Return: true if this is a valid constraint, false otherwise
Check to see if the constraint is in a valid position
updateConstraintVisualsTime
Parameters: constraint, audioTimeCorrespondingToNewVisualsTime, test (default is false,
optional Boolean indicating whether to test the update without actually updating)
Return: a Boolean indicating whether the update was successful
Update the visuals part of the constraint located at the specified audio time (tAud)
updateConstraintAudioTime
Parameters: constraint, newTAudio, test (default is false, optional Boolean indicating whether to
test the update without actually updating)
Return: a Boolean indicating whether the update was successful
Update the audio part of the constraint located at the specified visuals time (tVid)
addConstraint
Parameters: constraint
Return: true if constraint is successfully added
Add a constraint to the lecture
deleteConstraint
Parameters: constraint
Deletes the specified constraint
shiftConstraints
Parameters: constraints, amount
Shifts the specified constraints by the specified amount of time
getConstraintsIterator
Return: an iterator over all constraints
getPreviousConstraint
Parameters: time, type (visual or audio)
Return: the constraint that appears before the given time (in visual or audio time, depending on
the type)
getNextConstraint
Parameters: time, type (visual or audio)
Return: the constraint that appears after the given time (in visual or audio time, depending on the
type)
getVisualTime
Parameters: audioTime
Return: visual time associated with the given audio time
Converts audio time to visual time
getAudioTime
Parameters: visualTime
Return: audio time associated with the given visual time
Converts visual time to audio time
saveToJSON
Return: a JSON object containing the constraints JSON information
Saves the constraints to JSON
loadFromJSON
Parameters: json_object
Return: an instance of the retimer model with the data specified in the JSON object (loaded from
a file)
A.6.2 Retimer Controller
addArrowHandler
Parameters: event
The event handler for when a user clicks on the constraints canvas after clicking on the “add
constraint” button. It adds the constraint to the model, and then draws the arrow on the canvas
drawTickMarks
Draws tick marks on the retimer canvas to indicate how quickly or slowly the visuals are being
played back. (Note: not active currently, interpolation isn’t working properly)
drawConstraint
Parameters: constraint_num (unique id for each constraint added, incremented by the retimer)
Draw the constraint on the constraints canvas (for manual/user added constraints)
redrawConstraints
Refresh the canvas and redraw the constraints
redrawConstraint
Parameters: constraint, constraint_num
Redraw an individual constraint on the retimer canvas
addConstraint
When a user adds a constraint, add the constraint to the retimer model
selectArea
Parameters: event
Handles the event when a user clicks on the retimer canvas to select a constraint
selectionDrag
Parameters: event
As a user drags along the retimer canvas, the selection box is updated and drawn
endSelect
Parameters: event
Handles the end of a selection dragging along the retimer canvas
selectConstraints
Parameters: event
Finds the constraints that are within the selection area
displaySelectedConstraints
Parameters: event
Redraws the constraints that have been selected to be displayed in red
deleteConstraints
Parameters: event
Deletes the selected constraint(s) from the retimer model
constraintDragStart
Parameters: layer (jCanvas layer containing the constraint to be dragged)
When dragging starts, record whether the drag is for the top or bottom of the arrow (visuals end
or audio end respectively) and record the original x position of that end of the arrow.
constraintDrag
Parameters: layer (jCanvas layer containing the constraint being dragged)
Dragging moves one end of the arrow while the other end remains in place
constraintDragStop
Parameters: layer (jCanvas layer containing the constraint that has stopped being dragged)
When dragging stops, update the visuals or audio time of the constraint depending on whether
the top or bottom end was dragged, and update the thumbnails accordingly.
constraintDragCancel
Parameters: layer (jCanvas layer containing the constraint being dragged)
When dragging is cancelled (e.g., if a user drags the constraint off the canvas), the constraint
resets to its original value.
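The sketch below illustrates how a drag-stop handler could translate the dragged end back into a
time and push it into the retimer model. The pixel-to-time conversion and the layer fields used
here (x, isTopEnd, constraint) are assumptions, not the actual Pentimento code.

// Sketch only: when a constraint drag ends, convert the end's x position back into a
// time on the timeline and update the matching half of the constraint in the model.
function onConstraintDragStop(layer, retimerModel, canvasWidth, lectureDuration) {
    var newTime = (layer.x / canvasWidth) * lectureDuration;
    if (layer.isTopEnd) {
        retimerModel.updateConstraintVisualsTime(layer.constraint, newTime);  // visuals end (top)
    } else {
        retimerModel.updateConstraintAudioTime(layer.constraint, newTime);    // audio end (bottom)
    }
}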
beginRecording
Parameters: currentTime
Adds automatic constraints at the beginning of a recording
endRecording
Parameters: currentTime
Adds an automatic constraint at the end of a recording
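Automatic constraints simply pin the two timelines together at the recording boundaries. A
minimal sketch, assuming constraints are plain objects in the Appendix B shape; the helper name
is illustrative.

// Sketch only: an automatic constraint ties audio and visual time together at a
// recording boundary; the same shape works for both beginRecording and endRecording.
function makeAutomaticConstraint(currentTime) {
    return {
        tVis: currentTime,
        tAud: currentTime,
        constraintType: 'Automatic'
    };
}
// e.g. retimerModel.addConstraint(makeAutomaticConstraint(currentTime));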
A.7 Thumbnails Controller
drawThumbnails
Draws the thumbnails whenever the visuals in the main window are updated or changed.
Calculates the number of thumbnails to draw, sets up the thumbnail canvases (each thumbnail is
drawn on a separate canvas), then iterates over the thumbnails and calls generateThumbnail for
each one.
generateThumbnail
Parameters: thumbOffset (the number of the thumbnail in the sequence of all of the thumbnails),
visuals_min (the minimum time to be displayed by the current thumbnail), visuals_max (the
maximum time to be displayed by the current thumbnail), thumbnail_width (the width of the
thumbnails canvas, specified to ensure that it will line up with the audio timeline)
Generate a thumbnail by getting the visuals from the slides.
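A sketch of how drawThumbnails might divide the timeline into slices before handing each slice
to generateThumbnail; the width-based thumbnail count and all parameter names other than those
documented above are assumptions.

// Sketch only: split the visual timeline into equal slices, one per thumbnail, so the
// thumbnail strip lines up with an audio timeline of the given width.
function drawThumbnailSlices(totalVisualDuration, timelineWidth, thumbnailWidth) {
    var numThumbnails = Math.ceil(timelineWidth / thumbnailWidth);
    var sliceDuration = totalVisualDuration / numThumbnails;
    for (var i = 0; i < numThumbnails; i++) {
        var visualsMin = i * sliceDuration;
        var visualsMax = (i + 1) * sliceDuration;
        generateThumbnail(i, visualsMin, visualsMax, thumbnailWidth);
    }
}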
Appendix B: Example of Saved Lecture JSON Structure
{
"visuals_model": {
"slides": [
{
"visuals": [
{
"type": "Stroke",
"hyperlink": null,
"tDeletion": null,
"propertyTransforms": [],
"spatialTransforms": [],
"tMin": 947,
"properties": {
"c": "#777",
"w": 2
},
"vertices": [
{
"x": 92.0625,
"y": 31,
"t": 949
},
{
"x": 92.0625,
"y": 32,
"t": 1034
},
{
"x": 93.0625,
"y": 33,
"t": 1046
},
{
"x": 93.0625,
"y": 34,
"t": 1059
},
{
"x": 93.0625,
"y": 36,
"t": 1073
}
]
},
{
"type": "Stroke",
"hyperlink": null,
"tDeletion": null,
"propertyTransforms": [],
"spatialTransforms": [],
"tMin": 2531,
"properties": {
"c": "#777",
"w": 2
},
"vertices": [
{
"x": 163.0625,
"y": 56,
"t": 2531
},
{
"x": 166.0625,
"y": 53,
"t": 2594
},
{
"x": 168.0625,
"y": 51,
"t": 2603
},
{
"x": 171.0625,
"y": 50,
"t": 2617
},
{
"x": 174.0625,
"y": 48,
"t": 2629
}
]
},
{
"type": "Stroke",
"hyperlink": null,
"tDeletion": null,
"propertyTransforms": [],
"spatialTransforms": [],
"tMin": 9468,
"properties": {
"c": "#777",
"w": 2
},
"vertices": [
{
"x": 125.0625,
"y": 258,
"t": 9470
},
{
"x": 125.0625,
"y": 257,
"t": 9491
},
{
"x": 127.0625,
"y": 254,
"t": 9522
},
{
"x": 131.0625,
"y": 251,
"t": 9528
}
]
}
],
"duration": 23116
},
{
"visuals": [
{
"type": "Stroke",
"hyperlink": null,
"tDeletion": null,
"propertyTransforms": [],
"spatialTransforms": [],
"tMin": 947,
"properties": {
"c": "#777",
"w": 2
},
"vertices": [
{
"x": 92.0625,
"y": 31,
"t": 949
},
{
"x": 92.0625,
"y": 32,
"t": 1034
},
{
"x": 93.0625,
"y": 33,
"t": 1046
},
{
"x": 93.0625,
"y": 34,
"t": 1059
}
]
},
{
"type": "Stroke",
"hyperlink": null,
"tDeletion": null,
"propertyTransforms": [],
"spatialTransforms": [],
"tMin": 2531,
"properties": {
"c": "#777",
"w": 2
},
"vertices": [
{
"x": 163.0625,
"y": 56,
"t": 2531
},
{
"x": 166.0625,
"y": 53,
"t": 2594
},
{
"x": 168.0625,
"y": 51,
"t": 2603
},
{
"x": 171.0625,
"y": 50,
"t": 2617
},
{
"x": 174.0625,
"y": 48,
"t": 2629
}
]
},
{
"type": "Stroke",
"hyperlink": null,
"tDeletion": null,
"propertyTransforms": [],
"spatialTransforms": [],
"tMin": 9468,
"properties": {
"c": "#777",
"w": 2
},
"vertices": [
{
"x": 125.0625,
"y": 258,
"t": 9470
},
{
"x": 125.0625,
"y": 257,
"t": 9491
},
{
"x": 127.0625,
"y": 254,
"t": 9522
},
{
"x": 131.0625,
"y": 251,
"t": 9528
}
]
}
],
"duration": 23116
}
],
"canvas_width": 800,
"canvas_height": 500
},
"audio_model": {
"audio_tracks": [
{
"audio_segments": [
{
"audio_clip": 0,
"total_audio_length": 12528,
"audio_start_time": 0,
"audio_end_time": 12528,
"start_time": 0,
"end_time": 12528
},
{
"audio_clip": 1,
"total_audio_length": 4399,
"audio_start_time": 0,
"audio_end_time": 4399,
"start_time": 12528,
"end_time": 16927
}
]
}
]
},
"retimer_model": {
"constraints": [
{
"tVis": 0,
"tAud": 0,
"constraintType": "Automatic"
},
{
"tVis": 6650,
"tAud": 6650,
"constraintType": "Manual"
},
{
"tVis": 9525,
"tAud": 9525,
"constraintType": "Manual"
},
{
"tVis": 12528,
"tAud": 12528,
"constraintType": "Automatic"
},
{
"tVis": 14500,
"tAud": 14500,
"constraintType": "Manual"
},
{
"tVis": 16927,
"tAud": 16927,
"constraintType": "Automatic"
}
]
}
}