Latent Semantic Analysis (LSA)

Introduction to LSA
- A learning model: uses Singular Value Decomposition (SVD) to simulate human learning of word and passage meaning
- Represents word and passage meaning as high-dimensional vectors in a semantic space
- Its success depends on:
  - Sufficient scale and sampling of the data it is given
  - Choosing the right number of dimensions to extract
- Shows that empirical association data, when sufficient to induce how elements are related to each other, makes learning from experience very powerful

Introduction to LSA cont'd (2)
- Implements the idea that the meaning of a passage is the sum of the meanings of its words:
  meaning of word1 + meaning of word2 + ... + meaning of wordn = meaning of passage
- This "bag of words" formulation treats a passage as an unordered set of word tokens whose meanings combine additively
- Writing an equation of this kind for every passage a learner observes yields a large system of linear equations

Introduction to LSA cont'd (3)
- The system is "ill-conditioned":
  - Too few equations to specify the values of the variables
  - The same variable takes different values in different equations (natural, since meanings are vague or multiple)
- Instead of finding absolute values for the meanings, LSA represents them in a richer form: vectors
- SVD reduces the linear system to multidimensional vectors

How LSA works
- Takes as input a corpus of natural language
- The corpus is parsed into meaningful passages (such as paragraphs)
- A matrix is formed with passages as rows and words as columns; each cell contains the number of times a given word is used in a given passage
- The cell values are transformed into a measure of the information they carry about the identity of the passage
- SVD is applied to represent the words and passages as vectors in a high-dimensional semantic space
- (The whole pipeline is sketched in code after the two similarity slides below)

Similarity in LSA
- The vector of a passage is the vector sum of the vectors of the words it contains
- The similarity of any two words or two passages is computed as the cosine between their vectors in the semantic space:
  - Identical meaning: cosine = 1
  - Unrelated meaning: cosine = 0
  - Opposite meaning: cosine = -1
- The number of dimensions used is an important issue:
  - Small dimensions (small singular values) represent local, unique components
  - Large dimensions capture broad similarities and differences

Similarity in LSA cont'd
- Dropping dimensions that do not matter is an advantage for detecting similarity
- SVD supports optimal dimension reduction:
  - Keeps aspects that are more characteristic
  - Deletes aspects that are unreliable
- In LSA a word with many senses does not have multiple representations, even when the senses are not related to each other
- (Slide example: cosine similarities among "fly", "insect", "mosquito", "soar", and "pilot", with values between 0.09 and 0.61, illustrating that the single vector for "fly" is related to words from both its insect sense and its flight sense)
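To make the pipeline concrete, here is a minimal sketch in Python/NumPy. The three-passage corpus, the log-entropy weighting (one common choice for the information transform, which the slides leave unspecified), and the choice of k = 2 retained dimensions are all illustrative assumptions, not part of the slides.

import numpy as np

passages = [
    "the pilot can fly the plane",
    "a mosquito is a small insect",
    "an insect can fly and soar",
]
vocab = sorted({w for p in passages for w in p.split()})
w2i = {w: i for i, w in enumerate(vocab)}

# Passage-by-word count matrix: passages as rows, words as columns.
X = np.zeros((len(passages), len(vocab)))
for r, p in enumerate(passages):
    for w in p.split():
        X[r, w2i[w]] += 1

# Transform raw counts into a measure of the information each
# occurrence carries; log-entropy weighting is assumed here.
P = X / X.sum(axis=0)  # each word's distribution over passages
with np.errstate(divide="ignore", invalid="ignore"):
    entropy = 1 + np.nansum(P * np.log(P), axis=0) / np.log(len(passages))
W = np.log(X + 1) * entropy

# SVD, truncated to the k largest singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
word_vecs = Vt[:k].T * s[:k]   # one k-dimensional vector per word
pass_vecs = U[:, :k] * s[:k]   # one k-dimensional vector per passage

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# A new passage is "folded in" as the vector sum of its words,
# then compared to stored passages by cosine.
query = sum(word_vecs[w2i[w]] for w in "can a pilot soar".split())
print([round(cosine(query, pv), 2) for pv in pass_vecs])
print(round(cosine(word_vecs[w2i["fly"]], word_vecs[w2i["soar"]]), 2))

Folding a new passage in as the sum of its word vectors and taking cosines against stored texts is the same similarity computation that the applications on the following slides rely on.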
Sample Applications of LSA
- Essay grading:
  - LSA is trained on a large sample of text from the same domain as the topic of the essay
  - Each essay is compared with a large set of expert-scored essays, and LSA identifies a subset of the most similar ones
  - The target essay is assigned a score that is a weighted combination of the scores of those comparison essays
- Automatic information retrieval:
  - LSA matches users' queries with documents that have the desired conceptual meaning

Sample Applications of LSA cont'd (2)
- Retrieval when queries and documents are in different languages:
  - Requires an overlapping set of documents (which does not have to be large)
  - The two semantic spaces are rotated so that they correspond on the overlapping set
- Second-language learning
- Prediction of how much an individual student will learn from a particular instructional text:
  - Based on the similarity of an essay on the topic to the given text; an optimal text can then be chosen

Sample Applications of LSA cont'd (3)
- Prediction of differences in the comprehensibility of texts:
  - Uses conceptual-similarity measures between successive sentences
  - LSA has predicted students' comprehension-test results this way
- Evaluating and advising students as they write and revise summaries of texts they have read
- Assessing psychiatric status by representing the semantic content of answers to psychiatric interview questions

"Poverty of the stimulus" argument
- The belief that the information available from observing language is insufficient for learning to use or understand it
- On this view, LSA cannot be considered a theory of mind because it is based only on co-occurrence, while many human abilities center on the representation of units that do not co-occur
- The anti-learning position on verbal meaning has deeper roots: Chomsky has stated that "concepts must be available prior to experience; children acquire labels for concepts they already have"

"Poverty of the stimulus" argument cont'd (2)
- Landauer claims that LSA has proven it can represent meanings without pre-existing knowledge, just by learning from experience
- LSA to date does not account for syntax at all; there is no proof, however, that no method exists for inducing syntactic meaning from the observation of syntactic patterns
- Tomasello (2000) shows that syntactic patterns of language develop gradually in children, which suggests that syntax, too, can be learned from experience

"Poverty of the stimulus" argument cont'd (3)
- Another version of the poverty-of-the-stimulus argument comes from Gold (1967) and his followers:
  - If a language is a deterministic set of rules specifying an infinite set of word strings, it cannot be learned by observing samples of the language
- Landauer replies that LSA models the representation of meaning rather than the process by which language is produced
- LSA could also bear on production without using rules: production would be based on the process of finding the set of words whose vector sum approximates the vector for an idea (see the sketch after this section)

"Poverty of the stimulus" argument cont'd (4)
- It remains to be seen whether this mechanism can serve language production once the ordering of words is also taken into account
- LSA is not a complete theory of verbal meaning, but it provides a good starting point for future exploration
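The production idea above can be made concrete. The sketch below reuses vocab, w2i, word_vecs, pass_vecs, and cosine from the earlier LSA sketch and greedily picks words whose running vector sum moves closest to a target "idea" vector. The greedy strategy is an illustrative assumption of mine, not a method from Landauer, and, as the slides note, it yields an unordered set of words.

import numpy as np

def words_for_idea(idea_vec, max_words=5):
    """Greedily choose words whose vector sum approximates idea_vec."""
    chosen, total = [], np.zeros_like(idea_vec)
    for _ in range(max_words):
        # Pick the word that moves the running sum closest to the idea.
        best = max(
            (w for w in vocab if w not in chosen),
            key=lambda w: cosine(total + word_vecs[w2i[w]], idea_vec),
        )
        new_total = total + word_vecs[w2i[best]]
        # Stop once adding another word no longer improves the match.
        if chosen and cosine(new_total, idea_vec) <= cosine(total, idea_vec):
            break
        chosen.append(best)
        total = new_total
    return chosen

# e.g. recover a word set for the idea expressed by the first passage:
print(words_for_idea(pass_vecs[0]))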