A Probabilistic Approach to Semantic Representation
Tom Griffiths, Mark Steyvers, Josh Tenenbaum

• How do we store the meanings of words?
  – a question of representation
  – requires efficient abstraction
• Why do we store this information?
  – the function of semantic memory
  – predictive structure

Latent Semantic Analysis (Landauer & Dumais, 1997)
A word-document co-occurrence matrix is mapped into a high-dimensional space:

              Doc1  Doc2  Doc3  …
  words         34     0     3  …
  in             0    12     2  …
  semantic       5    19     6  …
  spaces        11     6     1  …
  …

The matrix X is factorized by the singular value decomposition, X = U D V^T.

Mechanistic Claim
Some component of word meaning can be extracted from co-occurrence statistics. But…
  – Why should this be true?
  – Is the SVD the best way to treat these data?
  – What assumptions are we making about meaning?

Mechanism and Function
• Mechanism: some component of word meaning can be extracted from co-occurrence statistics
• Function: semantic memory is structured to aid retrieval via context-specific prediction

Functional Claim
Semantic memory is structured to aid retrieval via context-specific prediction. This claim
  – motivates sensitivity to co-occurrence statistics
  – identifies how co-occurrence data should be used
  – allows the role of meaning to be specified exactly, and finds a meaningful decomposition of language

A Probabilistic Approach (outline)
• The function of semantic memory
  – The psychological problem of meaning
  – One approach to meaning
• Solving the statistical problem of meaning
  – Maximum likelihood estimation
  – Bayesian statistics
• Comparisons with Latent Semantic Analysis
  – Quantitative
  – Qualitative

The Function of Semantic Memory
• To predict which concepts are likely to be needed in a context, and thereby ease their retrieval
• Similar to rational accounts of categorization and memory (Anderson, 1990)
• The same principle appears in semantic networks (Collins & Quillian, 1969; Collins & Loftus, 1975)

The Psychological Problem of Meaning
• Simply memorizing the whole word-document co-occurrence matrix does not help
• Generalization requires abstraction, and this abstraction identifies the nature of meaning
• Specifying a generative model for documents allows inference and generalization

One Approach to Meaning
• Each document is a mixture of topics
• Each word is chosen from a single topic
• Documents are generated from the topic distributions φ and the mixture weights θ

For example, two topics over a ten-word vocabulary:

  P(w | z = 1) = φ(1):  HEART .2, LOVE .2, SOUL .2, TEARS .2, JOY .2; all other words 0
  P(w | z = 2) = φ(2):  SCIENTIFIC .2, KNOWLEDGE .2, WORK .2, RESEARCH .2, MATHEMATICS .2; all other words 0

Choose mixture weights θ = {P(z = 1), P(z = 2)} for each document and generate a "bag of words":

  θ = {0, 1}:        MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
  θ = {0.25, 0.75}:  SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
  θ = {0.5, 0.5}:    MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
  θ = {0.75, 0.25}:  WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
  θ = {1, 0}:        TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
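This generative process is short enough to state in code. Below is a minimal sketch in Python, using the two topic distributions and mixture weights from the example above; the variable names and random seed are illustrative, so the sampled bags of words will differ from the slide's.

    import numpy as np

    rng = np.random.default_rng(0)

    vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
             "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]

    # P(w | z): one row per topic, matching phi(1) and phi(2) above
    phi = np.array([
        [0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],  # topic 1
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2],  # topic 2
    ])

    def generate_document(theta, n_words=10):
        """Draw a topic for each word from theta, then a word from that topic."""
        words = []
        for _ in range(n_words):
            z = rng.choice(len(theta), p=theta)   # choose a topic
            w = rng.choice(len(vocab), p=phi[z])  # choose a word from it
            words.append(vocab[w])
        return words

    for theta in ([0.0, 1.0], [0.5, 0.5], [1.0, 0.0]):
        print(theta, " ".join(generate_document(np.array(theta))))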
One Approach to Meaning
• A generative model for co-occurrence data: for each word, draw a topic z from θ, then the word w from φ(z)
• Introduced by Blei, Ng, and Jordan (2002)
• Clarifies pLSI (Hofmann, 1999)

Matrix Interpretation
The normalized co-occurrence matrix C (words × documents) factorizes as C = Φ Θ, where
  – Φ (words × topics) holds the mixture components, one topic distribution φ(j) per column
  – Θ (topics × documents) holds the mixture weights θ, one document per column
This is a form of non-negative matrix factorization. Compare LSA, where C = U D V^T with U (words × vectors), D (vectors × vectors), and V^T (vectors × documents).

The Function of Semantic Memory
• Prediction of needed concepts aids retrieval
• Generalization is aided by a generative model
• One such generative model: mixtures of topics
• Gives a non-negative, non-orthogonal factorization of the word-document co-occurrence matrix

A Probabilistic Approach (outline)
• The function of semantic memory
• Solving the statistical problem of meaning
  – Maximum likelihood estimation
  – Bayesian statistics
• Comparisons with Latent Semantic Analysis

The Statistical Problem of Meaning
• Generating data from parameters is easy
• Learning parameters from data is hard
• Two approaches to this problem
  – Maximum likelihood estimation
  – Bayesian statistics

Inverting the Generative Model
(W = vocabulary size, D = number of documents, T = number of topics)
• Maximum likelihood estimation: WT + DT parameters
• Variational EM (Blei, Ng & Jordan, 2002): WT + T parameters
• Bayesian inference: 0 parameters, since φ and θ are integrated out

Bayesian Inference
• P(z | w) = P(w | z) P(z) / Σ_z P(w | z) P(z), where the sum in the denominator runs over T^n assignments for an n-word corpus
• The full posterior is tractable only up to a constant

Markov Chain Monte Carlo
• Sample from a Markov chain that converges to the target distribution
• Allows sampling from an unnormalized posterior distribution
• Can compute approximate statistics from intractable distributions (MacKay, 2002)

Gibbs Sampling
For variables x1, x2, …, xn, draw xi(t) from the full conditional P(xi | x-i), where
  x-i = x1(t), x2(t), …, xi-1(t), xi+1(t-1), …, xn(t-1)
(MacKay, 2002)

Gibbs Sampling for the Topic Model
• We need the full conditional distributions for the variables we sample
• Since we only sample the assignments z, we need

  P(zi = j | z-i, w) ∝ [n(wi, j) + β] / [n(·, j) + Wβ] × [n(di, j) + α] / [n(di, ·) + Tα]

  where n(w, j) is the number of times word w is assigned to topic j, n(d, j) is the number of times topic j is used in document d, all counts exclude the current token i, and n(·, j), n(d, ·) sum over words and topics respectively

Gibbs Sampling: a worked example
Each word token i carries a word wi, a document di, and a topic assignment zi. Iteration 1 starts from random assignments; each sweep revisits the tokens in turn, redrawing zi from its full conditional; by iteration 1000 the assignments are samples from the posterior:

   i   wi            di   iter 1   iter 2   …   iter 1000
   1   MATHEMATICS    1      2        2             2
   2   KNOWLEDGE      1      2        2             2
   3   RESEARCH       1      1        1             2
   4   WORK           1      2        1             1
   5   MATHEMATICS    1      1        2             2
   6   RESEARCH       1      2        2             2
   7   WORK           1      2        2             2
   8   SCIENTIFIC     1      1        2             1
   9   MATHEMATICS    1      2        1             2
  10   WORK           1      1        2             2
  11   SCIENTIFIC     2      1        2             2
  12   KNOWLEDGE      2      1        1             2
   …   …              …      …        …             …
  50   JOY            5      2        2             1
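A minimal sketch of this sampler in Python, implementing the full conditional above directly; the hyperparameters alpha and beta are smoothing constants that the slides leave implicit, and all names are illustrative:

    import numpy as np

    def gibbs_topic_model(words, docs, W, D, T, alpha=0.1, beta=0.01,
                          iters=1000, seed=0):
        """Collapsed Gibbs sampling of the topic assignments z."""
        rng = np.random.default_rng(seed)
        N = len(words)
        z = rng.integers(T, size=N)        # iteration 1: random assignments
        nwt = np.zeros((W, T))             # times word w assigned to topic j
        ndt = np.zeros((D, T))             # times topic j used in document d
        for i in range(N):
            nwt[words[i], z[i]] += 1
            ndt[docs[i], z[i]] += 1
        for _ in range(iters):
            for i in range(N):
                w, d = words[i], docs[i]
                nwt[w, z[i]] -= 1          # remove token i from the counts
                ndt[d, z[i]] -= 1
                # full conditional up to a constant; the document denominator
                # (n_d + T*alpha) is the same for every topic, so it drops out
                p = ((nwt[w] + beta) / (nwt.sum(axis=0) + W * beta)
                     * (ndt[d] + alpha))
                z[i] = rng.choice(T, p=p / p.sum())
                nwt[w, z[i]] += 1          # add it back under the new topic
                ndt[d, z[i]] += 1
        return z, nwt, ndt

On the fifty-token, two-topic corpus of the worked example (W = 10, D = 5, T = 2), a call like gibbs_topic_model(words, docs, 10, 5, 2) settles into assignments of the kind shown in the iteration-1000 column.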
A Visual Example: Bars
• Sample each pixel from a mixture of topics: a pixel plays the role of a word, an image the role of a document
• [Figure: the topics recovered from 1000 such images]

Interpretable Decomposition
• The SVD gives a basis for the data, but not an interpretable one
• The true basis is not orthogonal, so rotation does no good

Application to Corpus Data
• TASA corpus: text from first grade through college
• Vocabulary of 26,414 words
• Set of 36,999 documents
• Approximately 6 million words in the corpus

A Selection of Topics
Each line lists the most probable words in one topic:

  THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED OBSERVE FACTS
  SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT SATURN MILES
  ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS PORTRAIT PAINTERS
  STUDENTS TEACHER STUDENT TEACHERS TEACHING CLASS CLASSROOM SCHOOL LEARNING PUPILS CONTENT INSTRUCTION TAUGHT GROUP GRADE SHOULD GRADES CLASSES PUPIL GIVEN
  BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY PAIN IS
  CURRENT ELECTRICITY ELECTRIC CIRCUIT IS ELECTRICAL VOLTAGE FLOW BATTERY WIRE WIRES SWITCH CONNECTED ELECTRONS RESISTANCE POWER CONDUCTORS CIRCUITS TUBE NEGATIVE
  NATURE WORLD HUMAN PHILOSOPHY MORAL KNOWLEDGE THOUGHT REASON SENSE OUR TRUTH NATURAL EXISTENCE BEING LIFE MIND ARISTOTLE BELIEVED EXPERIENCE REALITY
  THIRD FIRST SECOND THREE FOURTH FOUR GRADE TWO FIFTH SEVENTH SIXTH EIGHTH HALF SEVEN SIX SINGLE NINTH END TENTH ANOTHER

A Selection of Topics
  JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
  SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
  BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
  FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
  STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
  MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
  DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
  WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
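Each list above is read off a fitted model as the highest-probability words under one topic. A small sketch of that read-out, reusing the word-topic counts nwt from the sampler sketched earlier; vocab (word id to string) and the smoothing constant are assumptions:

    import numpy as np

    def top_words(nwt, vocab, beta=0.01, n=20):
        """Print the n most probable words in each topic."""
        W = len(vocab)
        phi = (nwt + beta) / (nwt.sum(axis=0) + W * beta)  # P(w | z) by column
        for j in range(phi.shape[1]):
            ranked = np.argsort(phi[:, j])[::-1][:n]
            print(f"topic {j}:", " ".join(vocab[w] for w in ranked))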
A Probabilistic Approach (outline)
• The function of semantic memory
• Solving the statistical problem of meaning
• Comparisons with Latent Semantic Analysis
  – Quantitative
  – Qualitative

Probabilistic Queries
• P(w | context) can be computed in different ways
  – Fixed topic assumption: condition on a single topic mixture inferred from the context
  – Multiple samples: average the prediction over samples of topic assignments

Quantitative Comparisons
• Two types of task
  – general semantic tasks: dictionary, thesaurus
  – prediction of memory data
• All tests use LSA with 400 vectors, and a probabilistic model with 100 samples, each using 500 topics

Fill in the Blank
• 12,856 sentences extracted from WordNet
    his cold deprived him of his sense of _
    silence broken by dogs barking _
    a _ hybrid accent
• Overall performance
  – LSA gives a median rank of 3393
  – the probabilistic model gives a median rank of 3344

Synonyms
• 280 sets of five synonyms from WordNet, ordered by number of senses (in parentheses), e.g.
    BREAK (78), EXPOSE (9), DISCOVER (8), DECLARE (7), REVEAL (3)
    CUT (72), REDUCE (19), CONTRACT (12), SHORTEN (5), ABRIDGE (1)
    RUN (53), GO (34), WORK (25), FUNCTION (9), OPERATE (7)
• Two tasks:
  – Predict the first synonym
  – Predict the last synonym
• [Figures: performance on the first- and last-synonym tasks as the number of synonyms increases]

Synonyms and Word Frequency
[Figures: synonym performance as a function of word frequency, probabilistic model vs. LSA]

Word Frequency and Filling Blanks
[Figure: fill-in-the-blank performance as a function of word frequency, probabilistic model vs. LSA]

Performance on Semantic Tasks
• Performance is comparable, and neither model is great
• The difference in the effects of word frequency is due to the treatment of the co-occurrence data
• The probabilistic approach is useful in addressing psychological data, where frequency is important

Intrusions in Free Recall
• Study list of associates: CHAIR FOOD DESK TOP LEG EAT CLOTH DISH WOOD DINNER MARBLE TENNIS
• Intrusion rates from Deese (1959)
• Used average word vectors in LSA, P(word | list) in the probabilistic model
• This favors LSA, since the probabilistic combination can be multimodal
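A sketch of the P(word | list) query used for the intrusion predictions, under the fixed-topic assumption named earlier: infer P(z | list) from the study list, then score every word under the resulting mixture. Here phi is the P(w | z) matrix (words × topics) and is assumed to be smoothed, with no exact zeros; all names are illustrative:

    import numpy as np

    def p_word_given_list(phi, list_word_ids, theta=None):
        """P(word | study list) under a single shared topic mixture."""
        W, T = phi.shape
        prior = np.full(T, 1.0 / T) if theta is None else theta
        # P(z | list) is proportional to P(z) * prod_i P(w_i | z);
        # work in log space for numerical stability
        log_pz = np.log(prior) + np.log(phi[list_word_ids]).sum(axis=0)
        pz = np.exp(log_pz - log_pz.max())
        pz /= pz.sum()
        return phi @ pz   # P(w | list) = sum_z P(w | z) P(z | list)

Ranking all words by this score, with the studied words removed, gives the model's predicted intrusions.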
[Figures: predicted and observed intrusion rates; intrusion rates compared with raw word frequency]

Word Frequency is Not Enough
• An explanation needs to address two questions:
  – Why do these words intrude?
  – Why do other words not intrude?
• Median word frequency rank of the intruding words: 1698.5
• Median rank in the model: 21

Word Association
• Word association norms from Nelson et al. (1998), e.g. for the cue PLANETS:

  associate   people     model
      1       EARTH      STARS
      2       STARS      STAR
      3       SPACE      SUN
      4       SUN        EARTH
      5       MARS       SPACE
      6       UNIVERSE   SKY
      7       SATURN     PLANET
      8       GALAXY     UNIVERSE

[Figure: overall performance predicting the word association norms]

Performance on Memory Tasks
• The probabilistic model outperforms LSA on simple memory tasks; both do far better at predicting memory data
• The improvement is due to the role of word frequency
• Not a complete account, but can form part of more complex memory models

Qualitative Comparisons
• Naturally deals with complications for LSA
  – Polysemy
  – Asymmetry
• Respects the natural statistics of language
• Easily extends to other models of meaning

Beyond the Bag of Words
• The bag-of-words model: one θ per document, with a topic zi and word wi for each token
• The extended model adds a hidden Markov chain of syntactic states si: one state emits words from the topic model, the others from class-specific distributions (a generative sketch follows the category lists below)

Semantic categories
  MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
  FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
  GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
  CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
  BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
  DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
  BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
  PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW

Syntactic categories
  SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
  THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
  MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
  ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
  GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE * BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
  ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
  HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
  BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME
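The sentence-generation examples below come from sampling this composite model. A minimal generative sketch, assuming an HMM over syntactic classes in which one designated state defers to the topic model; the transition matrix A, class distributions psi, and all other names are illustrative placeholders, not the fitted model:

    import numpy as np

    rng = np.random.default_rng(0)

    def generate_sentence(A, psi, phi, theta, vocab,
                          semantic_state=0, length=8):
        """One pass through the HMM; the semantic state emits via topics."""
        words, s = [], 0                            # illustrative start state
        for _ in range(length):
            s = rng.choice(len(A), p=A[s])          # next syntactic class
            if s == semantic_state:
                z = rng.choice(len(theta), p=theta)      # document's topic
                w = rng.choice(len(vocab), p=phi[:, z])  # word from the topic
            else:
                w = rng.choice(len(vocab), p=psi[s])     # class distribution
            words.append(vocab[w])
        return " ".join(words)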
Sentence generation (each [S] begins a generated sentence):

RESEARCH:
  [S] THE CHIEF WICKED SELECTION OF RESEARCH IN THE BIG MONTHS
  [S] EXPLANATIONS
  [S] IN THE PHYSICISTS EXPERIMENTS
  [S] HE MUST QUIT THE USE OF THE CONCLUSIONS
  [S] ASTRONOMY PEERED UPON YOUR SCIENTISTS DOOR
  [S] ANATOMY ESTABLISHED WITH PRINCIPLES EXPECTED IN BIOLOGY
  [S] ONCE BUT KNOWLEDGE MAY GROW
  [S] HE DECIDED THE MODERATE SCIENCE

LANGUAGE:
  [S] RESEARCHERS GIVE THE SPEECH
  [S] THE SOUND FEEL NO LISTENERS
  [S] WHICH WAS TO BE MEANING
  [S] HER VOCABULARIES STOPPED WORDS
  [S] HE EXPRESSLY WANTED THAT BETTER VOWEL

LAW:
  [S] BUT THE CRIME HAD BEEN SEVERELY POLITE OR CONFUSED
  [S] CUSTODY ON ENFORCEMENT RIGHTS IS PLENTIFUL

CLOTHING:
  [S] WEALTHY COTTON PORTFOLIO WAS OUT OF ALL SMALL SUITS
  [S] HE IS CONNECTING SNEAKERS
  [S] THUS CLOTHING ARE THOSE OF CORDUROY
  [S] THE FIRST AMOUNTS OF FASHION IN THE SKIRT
  [S] GET TIGHT TO GET THE EXTENT OF THE BELTS
  [S] ANY WARDROBE CHOOSES TWO SHOES

THE ARTS:
  [S] SHE INFURIATED THE MUSIC
  [S] ACTORS WILL MANAGE FLOATING FOR JOY
  [S] THEY ARE A SCENE AWAY WITH MY THINKER
  [S] IT MEANS A CONCLUSION

Conclusion
Taking a probabilistic approach can clarify some of the central issues in semantic representation:
  – Motivates sensitivity to co-occurrence statistics
  – Identifies how co-occurrence data should be used
  – Allows the role of meaning to be specified exactly, and finds a meaningful decomposition of language

Probabilities and Inner Products
[Equations comparing the probability of a single word, and of a list of words, under the topic model with the corresponding inner products in LSA's space F]

Model Selection
• How many topics does a language contain?
• A major issue for parametric models
• Less so for non-parametric models
  – Dirichlet process mixtures
  – Expect more topics than are tractable
  – The choice of number is a choice of scale

Gibbs Sampling and EM
• How many topics does a language contain?
• EM finds a fixed set of topics, a single estimate
• Sampling allows for multiple sets of topics, and multimodal posterior distributions

Natural Statistics
• Treating co-occurrence data as frequencies preserves the natural statistics of language
  – Word frequency
  – Zipf's Law of Meaning

[Figures: word frequency distributions and Zipf's Law of Meaning under each model]

Word Association
• Cue: CROWN

  people     model
  KING       KING
  JEWEL      TEETH
  QUEEN      HAIR
  HEAD       TOOTH
  HAT        ENGLAND
  TOP        MOUTH
  ROYAL      QUEEN
  THRONE     PRINCE

Word Association
• Cue: SANTA

  people       model
  CHRISTMAS    MEXICO
  TOYS         SPANISH
  LIE          CALIFORNIA
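The association lists above reflect the one-word query P(w2 | w1) = Σ_z P(w2 | z) P(z | w1), which is what lets a polysemous cue like CROWN or SANTA mix its senses. A minimal sketch, reusing the illustrative phi matrix (words × topics) and a uniform topic prior from the earlier sketches:

    import numpy as np

    def associates(phi, w1, vocab, n=8):
        """Rank the model's associates of cue word w1."""
        pz = phi[w1] / phi[w1].sum()   # P(z | w1) under a uniform prior
        pw = phi @ pz                  # P(w2 | w1) = sum_z P(w2 | z) P(z | w1)
        pw[w1] = 0.0                   # exclude the cue itself
        return [vocab[w] for w in np.argsort(pw)[::-1][:n]]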