Analysing Features of Lecture Slides and Past Exam Paper Materials
Towards Automatic Associating E-materials for Self-revision
Petch Sajjacholapunt, Mike Joy
Department of Computer Science, University of Warwick, United Kingdom
{P.Sajjacholapunt, M.S.Joy}@warwick.ac.uk
Keywords: Information Retrieval, Technical Terms Extraction, Technology Enhanced Learning
Abstract: Digital materials not only provide opportunities as enablers of e-learning development, but also create a new challenge. The e-materials currently provided on a course website are individually designed for learning in classrooms rather than for revision. In order to enable e-materials to support a student's revision, we need an efficient system for associating related pieces of different e-materials. In this case, the features of each item of e-material, including its structure and the technical terms it contains, need to be studied and applied in order to calculate the similarity between relevant e-materials. Even though the difficulties of technical term extraction and of measuring the similarity between two text documents have been widely discussed, empirical experiments on particular types of e-learning materials (for instance, lecture slides and past exam papers) are still rare. In this paper, we propose a framework and a relatedness model for associating lecture slides with past exam paper materials to support revision, based on Natural Language Processing (NLP) techniques. We compare and evaluate the efficiency of different combinations of three weighting schemes, term frequency (TF), inverse document frequency (IDF), and term location (TL), for calculating the relatedness score. The experiments were conducted on 30 lectures (approximately 900 slides) and 3 past exam papers (12 pages) of a data structures course at the authors' institution. The findings indicate the appropriate features for calculating the relatedness score between lecture slides and past exam papers.
1 INTRODUCTION
E-materials that are normally provided on a course website comprise lecture slides, past exam papers, homework and assignments, and a list of related textbooks. They are likely to be just supplementary materials for the class, and are not intended specifically for self-study. Although it is difficult to deny that attending lectures, rather than relying only on self-study, must be a student's first priority, students still need self-study to recall knowledge before an examination.
In order to prepare successfully for examinations, students need to understand the course materials that are delivered by the lecturers. These e-course materials, however, have not yet maximised the advantage of being digital media. Most e-materials on a course website are closed materials, in that they are independent, with no possibility of free linking and combining features (Krnel and Barbra, 2009). Many studies illustrate the difficulties of using online e-materials. For example, Mertens et al. (2006) state that searching through a pile of digital content is time-consuming and requires skill and knowledge in using computers and computer applications. Moreover, students sometimes suffer from inadequate lecture note content or the poor linking of key concepts between course materials.
The main goal of our research is to address the difficulties of using e-materials by associating relevant content among different types of e-materials. In order to achieve this goal, the structure of each item of material has to be studied and determined, using techniques including natural language processing (NLP) for technical term extraction. In this paper, we construct a framework for associating e-materials based on existing NLP techniques. We also experiment with comparing potential features of e-materials that are appropriate when constructing a weighting scheme for calculating the relatedness scores between different e-materials. At this preliminary stage, we focus our experiment on the structure of lecture slides and past exam papers, as they are the primary set of materials that students tend to review (Sajjacholapunt and Joy, 2014). Additional e-materials will be examined in the near future. The details of the framework, the techniques and the experimental results are discussed in the following sections.
2 RELATED WORK
Lecture slides are a common material used for presenting lectures in class, and many studies have focused on the use of lecture slides in a classroom context. For example, Frey and Birnbaum (2002) stated that instructors do not wish to make PowerPoint slides a substitute for lecturing; they aim at presenting only the main ideas, not a summary of the lecture. Holmes (2004) also stated that a presentation can serve as a guide for listeners or readers, but it can never be a medium capable of replacing a skilled teacher. What it can do is conceal poor-quality teaching by lending an appearance of validity, albeit without gains in terms of learning outcomes (Pros et al., 2013).
Many studies emphasise improving presentation videos based on presentation slides, such as synchronising a speech transcript with presentation slides (Chen and Heng, 2003), annotating each video segment with the related presentation slide (Sack and Waitelonis, 2006), and improving search performance in recorded video presentations by indexing the presentation slides (Vinciarelli and Odobez, 2006; Le et al., 2008). There are also related works that aim at enriching presentation slides with additional information. Hayama and Kunifuji (2011) describe a method to extract related pieces of information from presentation slides and display them properly as preview information. Hill (2011) proposes the idea of checking and grading presentation file assignments by comparing student documents with a correct version of the assignment. Yuanyuan and Kazutoshi (2011) concentrate on improving the browsing performance of presentation slides by recommending other related presentation slides and identifying their relationships.
Many research projects have analysed and reused the content of presentation slides, whether for summarising slide content for a quick overview, for building an index for quick access to other digital media, or for linking related slides together. These projects have not, however, concentrated on improving presentation slides based on other e-course materials.
At this stage, we considered using past exam papers as supplementary materials for enriching lecture slide content, because past exam papers are also a commonly used resource during revision. Most students use a past exam paper to get familiar with the terminology and the style of question that will be used in the actual exam (du Boulay, 2011). Past papers are also used to work out the time required for each question, as well as to identify subject areas to focus on during revision. Enhancing e-lecture slides based on past exam papers is a challenging problem: we need to understand the structure of both materials, including their features.
3 METHODOLOGY
In order to implement a system for the automatic association of lecture slides with past exam papers, we need a framework as a guideline for designing the system. The framework for associating e-lecture slides with related past exam paper materials is illustrated in Figure 1. It was designed based on common NLP techniques presented in (Baeza-Yates and Ribeiro-Neto, 2011), and starts by determining the potential features of the selected e-materials, extracting candidate terms, and calculating the similarity table of the two documents.
Figure 1: The framework for associating e-lecture slides with related past exam paper materials.
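Conceptually, the framework reduces to a small pipeline. The toy sketch below is our illustration of that flow, not the system itself: a bare tokeniser stands in for the candidate term extraction of Section 3.2, and plain term overlap stands in for the weighted relatedness scores of Section 3.3.

    # Toy end-to-end sketch of the framework in Figure 1. This is our
    # illustration, not the authors' system: the tokeniser below stands in
    # for the candidate term extraction of Section 3.2, and raw term
    # overlap stands in for the weighted relatedness scores of Section 3.3.

    def candidate_terms(text):
        # Stand-in extractor: lower-case alphabetic tokens of 3+ characters.
        return {t for t in text.lower().split() if t.isalpha() and len(t) > 2}

    def similarity_table(slides, papers):
        # One entry per (slide, paper) pair; here the score is raw overlap.
        return {
            (sj, pk): len(candidate_terms(s) & candidate_terms(p))
            for sj, s in enumerate(slides)
            for pk, p in enumerate(papers)
        }

    slides = ["Binary search trees and balanced insertion",
              "Hash tables collision handling and load factor"]
    papers = ["Explain insertion into a binary search tree"]
    print(similarity_table(slides, papers))  # {(0, 0): 3, (1, 0): 0}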
3.1 Determining Potential Features
Determining the key features of individual e-materials is a challenging issue because their structures are normally different. By their nature, lecture slides and exam paper materials contain far fewer technical terms and much less content than an e-book does. In order to create a link between these related materials, it is necessary to understand their structure. The following sections explore the structure of lecture slides and exam papers, as well as identifying potential common features.
3.1.1 Lecture Slides
The format of lecture slides usually contains two major parts, the title and the content, as presented in Figure 2. The former offers an overview of the content that is in the body, while the latter contains key information related to the title. The content is mostly presented as a bullet list, which allows students to obtain information quickly.

Figure 2: Lecture slides structure.
The format of lecture slides can be altered according to the style of the author. The two major choices an author normally makes concern media selection and media format. Media selection is the process of deciding what media should be used on the slide, such as figures, tables or text. Media format is the process of selecting the properties of the media to be presented on the slide; for example, text can be adjusted in terms of its font size, colour or location.
3.1.2 Past Exam Paper
Generally, the format of an exam paper contains two major parts: the cover page and the question pages. The former, as presented in Figure 3, consists of general information with regard to the course, the date and time, the exam duration and length, as well as instructional information. The information on the cover page usually does not touch on the course content.

Figure 3: Cover page of exam paper structure.

The latter, as presented in Figure 4, consists of a set of main questions and sub-questions. Sub-questions are not only defined in the form of an interrogative sentence; sometimes they are also defined in the form of an affirmative sentence or an equation following a core question, in order to measure abilities in terms of explanation and discussion. A mark for each sub-question is commonly provided to indicate its level of difficulty.

Figure 4: Question page of exam paper structure.
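To make the question-page structure concrete, the sketch below parses a small plain-text fragment into question labels and marks. The "1.", "(a)" and "[n marks]" formats shown are illustrative assumptions, not the layout of the actual papers used in this study.

    import re

    # Hedged sketch: split question-page lines into main questions,
    # sub-questions and their marks. The numbering and mark formats are
    # assumptions for illustration only.
    lines = [
        "1. Explain the difference between a stack and a queue. [4 marks]",
        "(a) Give one use case for each structure. [2 marks]",
        "(b) State the complexity of push and pop. [2 marks]",
    ]

    main_q = re.compile(r"^(\d+)\.\s+")   # e.g. "1. ..."
    sub_q = re.compile(r"^\(([a-z])\)\s+")  # e.g. "(a) ..."
    marks = re.compile(r"\[(\d+)\s+marks?\]")

    for line in lines:
        label = main_q.match(line) or sub_q.match(line)
        mark = marks.search(line)
        if label:
            print(label.group(1), "->", mark.group(1) if mark else "?", "marks")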
3.2 Candidate Term Extraction
The following subsections explain the processes and approaches for extracting candidate terms, including the techniques considered appropriate in this research.
3.2.1 Data Pre-Processing
In order to convert the current format of e-materials to plain text, we consider applying the six common pre-processing techniques presented in Table 1.
Table 1: Common pre-processing tasks.

1. Converting the document to plain text
2. Sentence segmentation
3. Tokenisation
4. Part-of-speech tagging
5. Stemming and lemmatisation
6. Stop-word filtering
Converting E-Lecture Slides and Exam Papers to Plain Text

Converting a document to plain text is the process of transforming a non-plain-text document (e.g., JPEG, PDF, PPT) into a plain text document (Foo, 2012). Much research has considered the use of e-lecture slides in PowerPoint (PPT) format. In the real world, however, most e-lecture slides and past exam papers are provided in PDF format for accessibility and security reasons. We therefore focus mainly on converting PDF documents to plain text. iText (Lowagie, 2007), a free Java PDF library (version 5.5.0), is used for the PDF-to-text transformation. Apache OpenNLP (http://opennlp.apache.org) was selected as the tool for sentence segmentation, tokenisation, part-of-speech tagging, stemming and lemmatisation, because it is an open-source tool that is widely used in term extraction systems. Finally, the 517-word stop list built by Salton (1971) for the experimental SMART information retrieval system at Cornell University was used in the stop-word filtering process.
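As a sketch of how these six steps chain together, the following Python fragment uses NLTK as a stand-in for the Apache OpenNLP pipeline used in this research; the PDF-to-text step (done with iText here) is omitted, and NLTK's English stop list stands in for Salton's SMART list.

    # Sketch of the six pre-processing steps using NLTK as a stand-in for
    # Apache OpenNLP. Resource names may differ slightly across NLTK versions.
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    for pkg in ("punkt", "averaged_perceptron_tagger", "wordnet", "stopwords"):
        nltk.download(pkg, quiet=True)

    text = "Binary search trees support efficient insertion. Lookups are fast."

    sentences = nltk.sent_tokenize(text)                   # sentence segmentation
    tokens = [t for s in sentences for t in nltk.word_tokenize(s)]  # tokenisation
    tagged = nltk.pos_tag(tokens)                          # part-of-speech tagging

    lemmatizer = WordNetLemmatizer()                       # lemmatisation
    lemmas = [lemmatizer.lemmatize(tok.lower()) for tok, _tag in tagged]

    stop = set(stopwords.words("english"))                 # stop-word filtering
    candidates = [w for w in lemmas if w.isalpha() and w not in stop]
    print(candidates)  # ['binary', 'search', 'tree', 'support', ...]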
3.2.2 Linguistic and Statistical Approaches for Term Extraction
Techniques for extracting terminology can mainly be classified into two approaches (Pazienza et al., 2005; Conrado et al., 2013): (1) linguistic approaches and (2) statistical approaches. The former deal with purely linguistic properties to extract terminology, such as part-of-speech patterns and words sharing a stem. The latter apply statistics, such as basic term frequencies and word length counts, to measure the degree of termhood of candidate terms and to decide whether to choose a term. The use of these two approaches in this research is described below.
Using a linguistic or a statistical approach alone does not provide effective results (Pazienza et al., 2005), and only a few works use the statistical method without touching any linguistic approach (Jones et al., 1990; Salton et al., 1975). In this research, we therefore chose to use a hybrid approach for the lecture slide materials. The open-source JATEtoolkit (https://code.google.com/p/jatetoolkit/) was used because it supports both approaches. However, we do not apply the statistical approach to extract candidate terms from past exam papers, because exam papers contain such low term frequencies that a statistical approach may not be useful. For the statistical approach, the C-value algorithm was tested first, because it outperforms other algorithms and does not need a reference corpus as training data (Pazienza et al., 2005).
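The C-value measure itself can be sketched compactly. The snippet below follows the standard formulation surveyed in (Pazienza et al., 2005): a candidate's termhood grows with its length and frequency, discounted by the frequency of longer candidates that contain it. The candidate frequencies are toy inputs, not drawn from our corpus.

    import math

    # Sketch of the C-value termhood measure for a candidate term a:
    #   C-value(a) = log2(|a|) * f(a)                            if a is not nested,
    #   C-value(a) = log2(|a|) * (f(a) - mean of f(b), b in T_a) otherwise,
    # where |a| is a's length in words, f its frequency, and T_a the set of
    # longer candidates containing a. The frequencies below are toy inputs.
    freq = {
        "binary search tree": 4,
        "search tree": 6,
        "balanced binary search tree": 2,
    }

    def c_value(term):
        longer = [t for t in freq if t != term and term in t]
        score = math.log2(len(term.split())) * freq[term]
        if longer:
            score -= math.log2(len(term.split())) * (
                sum(freq[t] for t in longer) / len(longer))
        return score

    for term in sorted(freq):
        print(f"{term}: {c_value(term):.2f}")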
3.3 Similarity Calculation
Identifying the lecture slides that relate to relevant past exam papers is another challenge: the degree of similarity between each item of material needs to be calculated. Having analysed the structure of both the lecture slides and the exam papers (in Subsection 3.1), we considered three potential features for testing the effectiveness of the association between lecture slides and past exam papers, which are the following.

• Term Frequency (TF): a measure of how frequently a candidate term appears in the target document. For the candidate term t_i in the document d_j, the normalised term frequency tf(i,j) is the raw frequency freq(i,j) divided by the maximum raw frequency of any term in any document (Salton and Buckley, 1988):

\[ tf(i,j) = \frac{freq(i,j)}{\max_{i,j} freq(i,j)}. \tag{1} \]

• Inverse Document Frequency (IDF): sometimes a term has a high frequency in the document yet does not represent it, because the term also has a high frequency in other documents. In this case, the statistical term extraction technique called TF-IDF is applied. Let N be the total number of documents and n_i the number of documents in which a candidate term appears. The inverse document frequency of term t_i is computed by (Salton and Buckley, 1988)

\[ idf(i) = \log \frac{N}{n_i}, \tag{2} \]

and the common term weighting score tfidf(i,j) is computed by

\[ tfidf(i,j) = tf(i,j) \times idf(i). \tag{3} \]

• Term Location (TL): the location at which the term is displayed in the document. The significance of a term can change based on its location; for example, in lecture slides, terms that appear in titles should be more important than terms that appear in the body.

As we mentioned earlier, candidate terms present in a past exam paper are likely to be subject areas that the student must know before the exam. We can therefore assume that a term which appears both in a lecture slide and in an exam paper is a key for linking the two documents. Simple Boolean matching of candidate terms, as presented in Figure 5, is thus a promising technique for identifying the lecture slides that relate to a past exam paper.

Figure 5: Technical term association.

In this research, we chose to investigate and compare the three features mentioned above, in order to identify the best features for scoring and ranking the relatedness between lecture slides and exam papers. The experiment was conducted using the following four cases. Denote:

w_(.)(i,j) = weight of term t_i in slide S_j,
freq(i,j) = number of occurrences of term t_i in slide S_j,
H_j := header of slide S_j,
W_H := weight of the header (varied over the range 2-10).

• Case 1: Term Frequency (TF) feature. Calculate the TF score of candidate terms that appear in both slide S_j and past exam paper P_k:

\[ w_1(i,j) = \frac{freq(i,j)}{\max_{i,j} freq(i,j)}. \tag{4} \]

• Case 2: Term Frequency (TF) feature with Term Location (TL) adjustment. In addition to Case 1, candidate terms located in the header of slide S_j are given more weight than candidate terms that appear only in the body of the slide:

\[ w_2(i,j) = \frac{\widehat{freq}(i,j)}{\max_{i,j} \widehat{freq}(i,j)}, \tag{5} \]

where

\[ \widehat{freq}(i,j) = \begin{cases} freq(i,j) & \text{if } i \notin H_j, \\ freq(i,j) \cdot W_H & \text{if } i \in H_j. \end{cases} \]

• Case 3: Term Frequency and Inverse Document Frequency (TF-IDF) feature. Calculate the TF-IDF score of candidate terms that appear in both slide S_j and past exam paper P_k:

\[ w_3(i,j) = \frac{freq{\times}idf(i,j)}{\max_{i,j} freq{\times}idf(i,j)}. \tag{6} \]

• Case 4: Term Frequency and Inverse Document Frequency (TF-IDF) feature with Term Location (TL) adjustment. In addition to Case 3, candidate terms located in the header of slide S_j are given more weight than candidate terms that appear only in the body of the slide:

\[ w_4(i,j) = \frac{\widehat{freq{\times}idf}(i,j)}{\max_{i,j} \widehat{freq{\times}idf}(i,j)}, \tag{7} \]

where

\[ \widehat{freq{\times}idf}(i,j) = \begin{cases} freq{\times}idf(i,j) & \text{if } i \notin H_j, \\ freq{\times}idf(i,j) \cdot W_H & \text{if } i \in H_j. \end{cases} \]

The relatedness, RScore_(.)(j,k), is a score of the similarity between lecture slide S_j and past exam paper P_k, where term i is present in both S_j and P_k. The RScore ranking model is calculated by the following equation:

\[ RScore_{(\cdot)}(j,k) = \begin{cases} \dfrac{\sum_{i \in j} w_{(\cdot)}(i,j)}{\max_j \big( \sum_{i \in j} w_{(\cdot)}(i,j) \big)} & \text{if } i \in k, \\ 0 & \text{otherwise}. \end{cases} \tag{8} \]

A higher relatedness score implies that more information related to a past exam paper is contained in a lecture slide. An appropriate relatedness threshold therefore has to be defined. This can be achieved by using different relatedness thresholds to retrieve documents and measuring their effectiveness. The retrieved results are compared with the answer set to evaluate the accuracy of the different cases. The expected outcome is the combination of features that provides the best-fitting retrieved results, including the similarity threshold and the weight of terms located in the header of the lecture slides that produce the most accurate result. Effectiveness is measured by the standard precision, recall and F-score measures:

\[ Precision = \frac{|\text{answer set} \cap \text{retrieved documents}|}{|\text{retrieved documents}|}, \tag{9} \]

\[ Recall = \frac{|\text{answer set} \cap \text{retrieved documents}|}{|\text{answer set}|}, \tag{10} \]

\[ F\text{-}score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}. \tag{11} \]
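To make the weighting cases and the ranking model concrete, the following sketch implements the Case 2 weighting (Eq. 5) and the RScore model (Eq. 8) in Python. This is our toy illustration, not the experimental system: the slide counts, header sets and exam terms are invented inputs, and only one exam paper is scored.

    # Hedged sketch of the Case 2 weighting (Eq. 5) and the RScore ranking
    # model (Eq. 8), with toy data. Each slide records per-term counts and
    # which terms appear in its header; exam_terms is the candidate-term
    # set of one past exam paper.

    W_H = 2  # header weight W_H, varied over the range 2-10 in the experiments

    slides = {
        "slide1": {"counts": {"tree": 3, "node": 1}, "header": {"tree"}},
        "slide2": {"counts": {"hash": 2, "node": 2}, "header": {"hash"}},
    }
    exam_terms = {"tree", "node"}

    def adjusted_freq(slide, term):
        # freq-hat from Eq. 5: the raw count, boosted by W_H for header terms.
        count = slide["counts"][term]
        return count * W_H if term in slide["header"] else count

    # Normalise by the maximum adjusted frequency over all slides and terms.
    max_freq = max(adjusted_freq(s, t) for s in slides.values() for t in s["counts"])

    def w2(slide, term):
        return adjusted_freq(slide, term) / max_freq

    def rscore(slide):
        # Eq. 8: sum the weights of the terms shared with the exam paper.
        return sum(w2(slide, t) for t in slide["counts"] if t in exam_terms)

    raw = {name: rscore(s) for name, s in slides.items()}
    best = max(raw.values())
    print({name: round(r / best, 3) for name, r in raw.items()})
    # {'slide1': 1.0, 'slide2': 0.286}

With W_H = 2, the header term of slide1 counts double, so slide1 dominates the normalised ranking even though slide2 shares a term with the paper.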
4 EXPERIMENTS
In order to perform the experiment, we designed a system based on the above-mentioned framework. The four weighting-scheme cases were investigated to determine the appropriate weighting scheme for the proposed framework.
4.1 Datasets for Training
The datasets for training the system were obtained from a corpus of undergraduate lecture slides at the Department of Computer Science, University of Warwick. The course CS126 Design of Information Structures was selected because it provided sufficient materials, such as lecture slides and past exam papers, as well as a recommended textbook, for the purposes of our study.
Past exam papers for the module were available for 3 years. All three past exam papers (12 pages) and 30 lectures (approximately 900 lecture slide pages) were selected for the experiment. The answer set was manually defined by matching lecture slide pages with related past exam papers, based on the rule that the content of a lecture slide page must contain technical terms that appear in a past exam paper.
The related documents retrieved under the different cases of calculating the similarity table were compared with the answer set at different similarity thresholds. Precision (Eq. 9), recall (Eq. 10) and F-score (Eq. 11) were used as the evaluation measures for comparison.
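As an illustration of this evaluation loop, the sketch below (our toy scores and answer set, not the experimental data) sweeps three thresholds and reports the three measures at each:

    # Sketch of the evaluation loop: retrieve slide-paper pairs whose
    # relatedness score clears each threshold, then compute precision
    # (Eq. 9), recall (Eq. 10) and F-score (Eq. 11) against the manually
    # defined answer set. All values here are toy inputs.
    scores = {("s1", "p1"): 0.9, ("s2", "p1"): 0.4, ("s3", "p1"): 0.1}
    answer_set = {("s1", "p1"), ("s3", "p1")}

    for threshold in (0.05, 0.25, 0.5):
        retrieved = {pair for pair, s in scores.items() if s >= threshold}
        hits = len(retrieved & answer_set)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(answer_set)
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        print(f"t={threshold:.2f}: P={precision:.2f} R={recall:.2f} F={f_score:.2f}")

Even on these toy inputs, raising the threshold improves precision while reducing recall, which is the trade-off observed in the results below.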
4.2 Evaluation Results

As part of the evaluation for determining the effective features for the proposed framework, we studied the effectiveness of the TF and TF-IDF weighting schemes when adjusting the weight based on term location. Bar charts are used to represent the precision, recall and F-score values at different similarity thresholds; the ten bars at each similarity threshold represent the different adjusted weights (1-10) from left to right.
4.2.1 Weighting Approach of TF Feature
At any similarity threshold in Figure 6, increasing the weight of the terms in the header location does not greatly affect the average precision score. This preliminary result suggests that terms in the header location are relatively unimportant for the relatedness score. The highest precision score is 27.59%, with a weight greater than 4 and a relatedness threshold of 0.25. This is an improvement of only 1.84% over the highest precision score obtained using the TF feature with no adjusted weight.

Figure 6: The average precision of the TF feature for different weights of terms in the header.
The trend of the average recall (Figure 7) shows that, for the TF feature, recall decreases as the relatedness threshold increases from 0.05 to 0.55 and then remains stable. The pattern is almost exactly the same at every adjusted weight, which again suggests that increasing the weight of terms located in the headers of the lecture slides does not greatly impact the recall score.

Figure 7: The average recall of the TF feature for different weights of terms in the header.
The F-score, in Figure 8, confirms that giving more weight to the terms located in the headers of the lecture slides does not greatly affect the relatedness score. The highest F-score is 18.92%, with a relatedness threshold of 0.15 and a header weight of 2. This is an increase of only 0.24% over the highest average F-score obtained using the TF feature without weighting.

Figure 8: The average F-score of the TF feature for different weights of terms in the header.
4.2.2 Weighting Approach of TF-IDF Feature
When the IDF score is applied, the precision score (see Figure 9) fluctuates more than when only the TF score is used (Figure 6). From a relatedness threshold of 0.25 to 0.55, the average precision of the TF feature falls considerably, while the TF-IDF feature continues increasing towards its highest average precision, 34.58%, at a relatedness threshold of 0.5 and a weight of 1.5. It can thus be stated that applying IDF can eliminate some of the non-technical terms extracted from the lecture slides.

At some points of the relatedness threshold, for instance at 0.35, the greater the weight given to the terms located in the headers, the greater the average precision score. On the other hand, at a relatedness threshold of 0.5, giving more weight can return a lower average precision.

Figure 9: The average precision of the TF-IDF feature for different weights of terms in the header.
The bar chart trends for the average recall of TF-IDF (see Figure 10) are similar to those for the average recall of TF presented in Figure 7. This confirms that using the TF-IDF feature and giving different weights to the term location cannot increase the number of relevant documents retrieved.

Figure 10: The average recall of the TF-IDF feature for different weights of terms in the header.

Giving different weights to the term location with the TF-IDF feature does not greatly affect the F-score. The highest average F-score presented in Figure 11 is 22.93%, at a similarity threshold of 0.2 and a weight of 2. This is just 1% higher than using the TF-IDF feature alone.

Figure 11: The average F-score of the TF-IDF feature for different weights of terms in the header.
5 CONCLUSIONS AND FUTURE WORK
In conclusion, the idea of increasing the relatedness threshold is to eliminate non-relevant documents by retrieving only those documents whose relatedness score is above the threshold. A target document with a high relatedness score thus appears to have a high degree of relevance to the source documents, which should improve the precision score.
The recall score depends primarily on the relevant-document extraction method: the more of the relevant documents (in the answer set) that are retrieved, the greater the recall score. Increasing the relatedness threshold to eliminate non-relevant documents therefore cannot improve the recall score; at best, it can retain the recall score or slow down its reduction.
In this experiment, increasing the relatedness threshold up to a point improves the average precision score but not the recall score. This can be explained by the fact that the TF and TF-IDF features, both with and without weight given to the term location, eliminate non-relevant documents from being retrieved. The sharp reduction in recall scores over the initial range of the relatedness threshold, between 0.05 and 0.3 in all cases, implies that some relevant documents retrieved with a lower relatedness score were also eliminated. It can thus be concluded that, among the tested features, TF-IDF is the best feature for eliminating non-relevant documents and improving precision, while the term location weight has only a minimal, non-significant impact.
In future, this research will need to find out in detail why the term location feature is of little help, as well as identifying how to improve the recall score at the stage of extracting an initial candidate set of related documents.
REFERENCES
Baeza-Yates, R. and Ribeiro-Neto, B. (2011). Modern Information Retrieval: The Concepts and Technology behind Search (2nd edition). ACM Press Books. Addison-Wesley Professional.

Chen, Y. and Heng, W. J. (2003). Automatic synchronization of speech transcript and slides in presentation. In Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), volume 2, pages II-568–II-571.

Conrado, M., Pardo, T. A. S., and Rezende, S. O. (2013). Exploration of a Rich Feature Set for Automatic Term Extraction, volume 8265 of Lecture Notes in Computer Science, pages 342–354. Springer Berlin Heidelberg.

du Boulay, D. (2011). Study Skills For Dummies. John Wiley & Sons, UK edition.

Foo, J. (2012). Computational Terminology: Exploring Bilingual and Monolingual Term Extraction. PhD thesis, Linköping University.

Frey, B. A. and Birnbaum, D. J. (2002). Learners' perceptions on the value of PowerPoint in lectures. ERIC document ED467192, 10 pages.

Hayama, T. and Kunifuji, S. (2011). Relevant piece of information extraction from presentation slide page for slide information retrieval system. In Theeramunkong, T., Kunifuji, S., Sornlertlamvanich, V., and Nattee, C., editors, Knowledge, Information, and Creativity Support Systems, volume 6746 of Lecture Notes in Computer Science, pages 22–31. Springer Berlin Heidelberg.

Hill, T. G. (2011). Word grader and PowerPoint grader. ACM Inroads, 2(2):34–36.

Holmes, W. N. (2004). In defense of PowerPoint. IEEE Computer, 37(7):98–99.

Jones, L. P., Gassie Jr, E. W., and Radhakrishnan, S. (1990). INDEX: The statistical basis for an automatic conceptual phrase-indexing system. Journal of the American Society for Information Science, 41(2):87–97.

Krnel, D. and Barbra, B. (2009). Learning and e-materials. Acta Didactica Napocensia, 2(1):97–108.

Le, H. H., Lertrusdachakul, T., Watanabe, T., and Yokota, H. (2008). Automatic digest generation by extracting important scenes from the content of presentations. In 19th International Workshop on Database and Expert Systems Applications (DEXA '08), pages 590–594.

Lowagie, B. (2007). iText in Action: Creating and Manipulating PDF. Manning Pubs Co Series. Manning.

Mertens, R., Farzan, R., and Brusilovsky, P. (2006). Social navigation in web lectures. In Proceedings of the Seventeenth Conference on Hypertext and Hypermedia, HYPERTEXT '06, pages 41–44, New York, NY, USA. ACM.

Pazienza, M., Pennacchiotti, M., and Zanzotto, F. (2005). Terminology extraction: An analysis of linguistic and statistical approaches. In Sirmakessis, S., editor, Knowledge Mining, volume 185 of Studies in Fuzziness and Soft Computing, pages 255–279. Springer Berlin Heidelberg.

Pros, R. C., Tarrida, A. C., Martin, M. d. M. B., and Amores, M. d. C. C. (2013). Effects of the PowerPoint methodology on content learning. Intangible Capital, 9(1):184–198.

Sack, H. and Waitelonis, J. (2006). Automated annotations of synchronized multimedia presentations. In Workshop on Mastering the Gap: From Information Extraction to Semantic Representation, CEUR Workshop Proceedings.

Sajjacholapunt, P. and Joy, M. (2014). Exploring patterns of using learning resources as a guideline to improve self-revision. In INTED2014 Proceedings, 8th International Technology, Education and Development Conference, pages 5263–5271. IATED.

Salton, G. (1971). The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523.

Salton, G., Yang, C.-S., and Yu, C. T. (1975). A theory of term importance in automatic text analysis. Journal of the American Society for Information Science, 26(1):33–44.

Vinciarelli, A. and Odobez, J. (2006). Application of information retrieval technologies to presentation slides. IEEE Transactions on Multimedia, 8(5):981–995.

Yuanyuan, W. and Kazutoshi, S. (2011). A browsing method for presentation slides based on semantic relations and document structure for e-learning. IPSJ Journal, 52(12):15p.