Latent Semantic Analysis (LSA)

Introduction to LSA

- Learning model
  - Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage meaning
  - Represents word and passage meaning as high-dimensional vectors in the semantic space
- Its success depends on:
  - Sufficient scale and sampling of the data it is given
  - Choosing the right number of dimensions to extract
- Shows that empirical association data, when sufficient to induce how elements are related to each other, makes learning from experience very powerful
Introduction to LSA cont’d (2)

- Implements the idea that the meaning of a passage is the sum of the meanings of its words:
  meaning of word1 + meaning of word2 + … + meaning of wordn = meaning of passage
- This “bag of words” formulation treats a passage as an unordered set of word tokens whose meanings are additive
- Writing an equation of this kind for every passage of language that a learner observes yields a large system of linear equations (a toy sketch follows below)
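To make the additivity concrete, here is a minimal Python sketch; the three-dimensional word vectors are invented toy values, not real LSA output (real vectors come from SVD over a large corpus and have hundreds of dimensions).

```python
import numpy as np

# Toy, hand-picked "meaning" vectors for illustration only.
word_vectors = {
    "doctor":   np.array([0.9, 0.1, 0.0]),
    "nurse":    np.array([0.8, 0.2, 0.1]),
    "hospital": np.array([0.7, 0.3, 0.2]),
}

def passage_vector(words):
    """Meaning of a passage = sum of the meanings of its words (bag of words)."""
    return np.sum([word_vectors[w] for w in words], axis=0)

print(passage_vector(["doctor", "nurse", "hospital"]))  # ~[2.4 0.6 0.3]
```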
Introduction to LSA cont’d (3)

- The system is “ill-conditioned”:
  - Too few equations to specify the values of the variables
  - Different values for the same variable (natural, since meanings are vague or multiple)
- Instead of finding absolute values for the meanings, they are represented in a richer form: vectors
  - SVD reduces the linear system to multidimensional vectors
How LSA works

- Takes as input a corpus of natural language
- The corpus is parsed into meaningful passages (such as paragraphs)
- A matrix is formed with passages as rows and words as columns; each cell contains the number of times a given word is used in a given passage
- The cell values are transformed into a measure of the information about passage identity that they carry
- SVD is applied to represent the words and passages as vectors in a high-dimensional semantic space (a pipeline sketch follows below)
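A hedged sketch of this pipeline with scikit-learn. The toy corpus, the TF-IDF weighting (classic LSA work typically uses a log-entropy weighting instead), and the choice of two dimensions are all illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.decomposition import TruncatedSVD

corpus = [                               # toy stand-in for a large corpus
    "the pilot flew the plane",
    "a mosquito is an insect",
    "an insect can fly and soar",
    "the pilot can soar in a glider",
]

counts = CountVectorizer().fit_transform(corpus)     # passages x words counts
weighted = TfidfTransformer().fit_transform(counts)  # information weighting
svd = TruncatedSVD(n_components=2)                   # dimensions to extract
passage_vecs = svd.fit_transform(weighted)           # one vector per passage
word_vecs = svd.components_.T                        # one vector per word
```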
Similarity in LSA

- The vector of a passage is the vector sum of the vectors standing for the words it contains
- Similarity of any two words or two passages is computed as the cosine between them in the semantic space (a helper function follows below):
  - Identical meaning: cosine = 1
  - Unrelated meaning: cosine = 0
  - Opposite meaning: cosine = -1
- The number of dimensions used is an important issue:
  - Small dimensions (small singular values) represent local, unique components
  - Large dimensions capture similarities and differences
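A minimal cosine helper illustrating the three reference values above; the vectors here are toy examples, not real LSA output.

```python
import numpy as np

def cosine(a, b):
    """Cosine between two semantic-space vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 0.5])
print(cosine(v, v))                                         # ~1.0 identical
print(cosine(v, -v))                                        # ~-1.0 opposite
print(cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0 unrelated
```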
Similarity in LSA cont’d

- Dropping dimensions that do not matter is an advantage for detecting similarity
- SVD supports optimal dimension reduction (sketched after this slide):
  - Keeps aspects that are more characteristic
  - Deletes aspects that are unreliable
- In LSA a word with many senses does not have multiple representations, even when the senses are not related to each other:
[Table: cosine similarities among the vectors for “fly”, “insect”, “mosquito”, “soar”, and “pilot” (values 0.26, 0.34, 0.54, 0.58, 0.61, 0.27, 0.09); the original layout was lost in extraction, but the point is that “fly” is similar to words from both of its senses.]
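The rank reduction itself can be sketched directly with NumPy’s SVD; by the Eckart–Young theorem, keeping the k largest singular values gives the best rank-k approximation, which is the sense in which the reduction is “optimal”. Splitting the result into passage and word vectors as below follows the usual LSA convention and is an illustrative assumption.

```python
import numpy as np

def reduce_rank(X, k):
    """Keep the k largest singular values of the passages-x-words matrix X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s sorted descending
    passage_vecs = U[:, :k] * s[:k]   # scaled left singular vectors
    word_vecs = Vt[:k, :].T           # right singular vectors
    return passage_vecs, word_vecs
```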
Sample Applications of LSA

- Essay grading (a scoring sketch follows this list)
  - LSA is trained on a large sample of text from the same domain as the topic of the essay
  - Each essay is compared to a large set of essays scored by experts, and LSA identifies a subset of the most similar ones
  - The target essay is assigned a score consisting of a weighted combination of the scores for the comparison essays
- Automatic information retrieval
  - LSA matches users’ queries with documents that have the desired conceptual meaning
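A sketch of the essay-scoring step under assumed inputs: `essay_vecs` holds LSA vectors of the expert-scored essays (as a NumPy array), `expert_scores` their scores, and the similarity-normalized weighting is one plausible choice, not the published scheme.

```python
import numpy as np

def grade_essay(target_vec, essay_vecs, expert_scores, k=5):
    """Weighted combination of expert scores for the k most LSA-similar essays."""
    sims = essay_vecs @ target_vec / (
        np.linalg.norm(essay_vecs, axis=1) * np.linalg.norm(target_vec))
    top = np.argsort(sims)[-k:]             # the k most similar essays
    weights = sims[top] / sims[top].sum()   # similarity-proportional weights
    return float(weights @ expert_scores[top])
```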
Sample Applications of LSA cont’d (2)

- Retrieval when queries and documents are in different languages (see the alignment sketch after this list)
  - Requires an overlapping set of documents (which does not have to be large)
  - The two semantic spaces are rotated so that they correspond on the overlapping set
- Second language learning
- Prediction of how much an individual student will learn from a particular instructional text
  - Based on the similarity of an essay on a topic to a given text
  - The optimal text can be chosen
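One way to read the “rotation” step is as an orthogonal Procrustes problem fitted on the shared documents; that interpretation, and the toy data below, are my assumptions rather than the published method.

```python
import numpy as np

def fit_rotation(X, Y):
    """Orthogonal R minimizing ||X @ R - Y|| (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)                      # toy data only
X = rng.normal(size=(20, 5))                        # shared docs, language A space
R_true = np.linalg.qr(rng.normal(size=(5, 5)))[0]   # hidden rotation
Y = X @ R_true                                      # same docs, language B space
R = fit_rotation(X, Y)
print(np.allclose(X @ R, Y))                        # True: spaces now correspond
```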
Sample Applications of LSA cont’d (3)

- Prediction of differences in the comprehensibility of texts (sketched after this list)
  - Uses conceptual similarity measures between successive sentences
  - LSA has predicted comprehension test results with students
- Evaluating and advising students as they write and revise summaries of texts they have read
- Assessing psychiatric status
  - By representing the semantic content of answers to psychiatric interview questions
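One way to operationalize the successive-sentence measure, assuming `sentence_vecs` is an array of LSA sentence vectors; using the mean cosine as the coherence score is an illustrative choice.

```python
import numpy as np

def coherence(sentence_vecs):
    """Mean cosine between each sentence vector and the next."""
    a, b = sentence_vecs[:-1], sentence_vecs[1:]
    sims = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(sims.mean())
```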
“Poverty of the stimulus” argument

- The belief that the information available from observing language is insufficient for learning to use or understand it
  - On this view, LSA cannot be considered a theory of mind because it is based only on co-occurrence
  - Many human abilities are centered on representing units that do not co-occur
- The anti-learning position on verbal meaning has deeper roots
  - Chomsky has stated that “concepts must be available prior to experience; children acquire labels for concepts they already have”
“Poverty of the stimulus” argument cont’d (2)

- Landauer claims that LSA has proven it can represent meanings without pre-existing knowledge, just by learning from experience
- To date, LSA does not account for syntax at all
  - There is no proof, however, that no method exists for inducing syntactic meaning from observed syntactic patterns
  - Tomasello (2000) shows that syntactic patterns of language develop gradually in children, which suggests that syntax can be learned from experience
“Poverty of the stimulus” argument cont’d (3)

- Another version of the poverty-of-the-stimulus argument comes from Gold (1967) and his followers
  - A language is a deterministic set of rules specifying an infinite set of word strings, and thus it cannot be learned by observing samples of the language
- Landauer claims that
  - LSA models the representation of meaning rather than the process by which language is produced
  - LSA could also have some relevance to production without using rules: instead, production would be based on finding the set of words whose vector sum approximates the vector for an idea
“Poverty of the stimulus” argument cont’d (4)

- It remains to be seen whether this mechanism can be used for language production once the ordering of the words is also taken into consideration (a toy sketch of the rule-free idea follows below)
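A purely speculative greedy sketch of that rule-free production idea: keep adding whichever word moves the running vector sum closest (by cosine) to the idea vector. Nothing here is from Landauer; it only illustrates the “find words whose sum approximates the idea” notion, and word order is ignored, which is exactly the open question.

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def produce(idea_vec, word_vecs, vocab, max_words=10):
    """Greedily pick words whose vector sum best approximates idea_vec."""
    chosen, total = [], np.zeros_like(idea_vec)
    for _ in range(max_words):
        gains = [cosine(total + v, idea_vec) for v in word_vecs]
        best = int(np.argmax(gains))
        if chosen and gains[best] <= cosine(total, idea_vec):
            break                         # no word improves the approximation
        chosen.append(vocab[best])
        total = total + word_vecs[best]
    return chosen
```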
- LSA is not a complete theory of verbal meaning, but it provides a good starting point for future exploration