Problem sets
• Late policy: 5% off per day, but the weekend counts as only one day. E.g.,
  – Friday: -5%
  – Monday: -15%
  – Tuesday: -20%
  – Thursday: -30%

Outline
• Final thoughts on hierarchical Bayesian models and MCMC
• Bayesian classification
• Bayesian concept learning

MCMC methods
• Gibbs sampling
  – Factorize hypotheses $h = \langle h_1, h_2, \ldots, h_n \rangle$
  – Cycle through variables $h_1, h_2, \ldots, h_n$
  – Draw $h_i^{(t+1)}$ from $P(h_i \mid h_{-i}, \text{evidence})$
• Metropolis-Hastings
  – Propose changes to hypothesis from some distribution $Q(h^{(t+1)} \mid h^{(t)})$
  – Accept proposals with probability

$$A(h^{(t+1)} \mid h^{(t)}) = \min\left\{ 1,\ \frac{P(h^{(t+1)} \mid \text{evidence})\, Q(h^{(t)} \mid h^{(t+1)})}{P(h^{(t)} \mid \text{evidence})\, Q(h^{(t+1)} \mid h^{(t)})} \right\}$$

Why MCMC is important
• Simple
• Can be used with just about any kind of probabilistic model, including complex hierarchical structures
• Always works pretty well, if you’re willing to wait a long time (cf. back-propagation for neural networks)

A model for cognitive development?
• Some features of cognitive development:
  – Small, random, dumb, local steps
  – Takes a long time
  – Can get stuck in plateaus or stages
  – “Two steps forward, one step back”
  – Over time, intuitive theories get consistently better (more veridical, more powerful, broader scope).
  – Everyone reaches basically the same state (though some take longer than others).

Topic models of semantic structure: e.g., Latent Dirichlet Allocation (Blei, Ng, Jordan)
  – Each document in a corpus is associated with a distribution θ over topics.
  – Each topic t is associated with a distribution φ(t) over words.

Image removed due to copyright considerations. Please see: Blei, David, Andrew Ng, and Michael Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (Jan 2003): 993-1022.
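Returning to the Metropolis-Hastings rule on the MCMC methods slide, the propose/accept loop can be sketched in a few lines. This is a minimal illustration rather than code from the lecture: the target (an unnormalized standard normal standing in for P(h | evidence)), the Gaussian random-walk proposal, and the names `log_target`, `metropolis_hastings` and `step` are all assumptions. Because this proposal is symmetric, the Q terms cancel and the acceptance probability reduces to min{1, P(h') / P(h)}.

```python
import math
import random


def log_target(h):
    """Unnormalized log P(h | evidence); a standard normal is assumed here."""
    return -0.5 * h * h


def metropolis_hastings(log_p, h0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    h = h0
    samples = []
    for _ in range(n_steps):
        proposal = h + rng.gauss(0.0, step)  # draw h' from Q(h' | h)
        # Symmetric Q cancels, so log A = min(0, log P(h') - log P(h)).
        log_accept = min(0.0, log_p(proposal) - log_p(h))
        if rng.random() < math.exp(log_accept):
            h = proposal  # accept; otherwise keep the current hypothesis
        samples.append(h)
    return samples


# Usage: sample, discard an initial burn-in, and summarize.
samples = metropolis_hastings(log_target, h0=0.0, n_steps=20000)
kept = samples[2000:]
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(round(mean, 2), round(var, 2))
```

Only ratios of the target appear, so the normalizing constant of P(h | evidence) is never needed; that is a large part of why the method applies to almost any probabilistic model.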
Choose mixture weights for each document, generate “bag of words”

θ = {P(z = 1), P(z = 2)}

{0, 1}:        MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
{0.25, 0.75}:  SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
{0.5, 0.5}:    MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
{0.75, 0.25}:  WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
{1, 0}:        TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY

Gibbs sampling

                           iteration
                            1     2
  i    wi            di     zi    zi
  1    MATHEMATICS    1     2     ?
  2    KNOWLEDGE      1     2
  3    RESEARCH       1     1
  4    WORK           1     2
  5    MATHEMATICS    1     1
  6    RESEARCH       1     2
  7    WORK           1     2
  8    SCIENTIFIC     1     1
  9    MATHEMATICS    1     2
  10   WORK           1     1
  11   SCIENTIFIC     2     1
  12   KNOWLEDGE      2     1
  .    .              .     .
  .    .              .     .
  50   JOY            5     2
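One sweep of the Gibbs sampler in the table above resamples each z_i given all the other assignments. The slides do not state the conditional distribution, so the sketch below assumes the collapsed form commonly used for LDA with symmetric Dirichlet priors: P(z_i = j | z_-i, w) ∝ (n_wj + β) / (n_j + Wβ) × (n_dj + α), where n_wj counts how often word w_i is assigned to topic j, n_j counts all words in topic j, n_dj counts words of document d_i in topic j (all excluding token i), and W is the vocabulary size. The hyperparameter values, the two-document toy corpus (the first and last documents from the slide) and every function and variable name are illustrative assumptions.

```python
import random


def gibbs_sweep(words, docs, z, n_topics, vocab_size, alpha, beta, rng):
    """Resample every topic assignment z[i] once, in place."""
    n_docs = max(docs) + 1
    # Count tables implied by the current assignments.
    n_wt = [[0] * n_topics for _ in range(vocab_size)]  # word-topic counts
    n_dt = [[0] * n_topics for _ in range(n_docs)]      # document-topic counts
    n_t = [0] * n_topics                                # total words per topic
    for w, d, t in zip(words, docs, z):
        n_wt[w][t] += 1
        n_dt[d][t] += 1
        n_t[t] += 1
    for i, (w, d) in enumerate(zip(words, docs)):
        t = z[i]
        # Remove token i from the counts (the "-i" in the conditional).
        n_wt[w][t] -= 1
        n_dt[d][t] -= 1
        n_t[t] -= 1
        # Assumed collapsed conditional, up to a constant factor.
        weights = [
            (n_wt[w][j] + beta) / (n_t[j] + vocab_size * beta) * (n_dt[d][j] + alpha)
            for j in range(n_topics)
        ]
        t = rng.choices(range(n_topics), weights=weights)[0]
        z[i] = t
        n_wt[w][t] += 1
        n_dt[d][t] += 1
        n_t[t] += 1
    return z


# Toy corpus: topics are numbered 0 and 1 here (1 and 2 on the slide).
vocab = ["MATHEMATICS", "KNOWLEDGE", "RESEARCH", "WORK", "SCIENTIFIC",
         "HEART", "LOVE", "TEARS", "SOUL", "JOY"]
doc_a = ("MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS "
         "RESEARCH WORK SCIENTIFIC MATHEMATICS WORK").split()
doc_b = "TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY".split()
tokens = doc_a + doc_b
words = [vocab.index(w) for w in tokens]
docs = [0] * len(doc_a) + [1] * len(doc_b)

rng = random.Random(0)
z = [rng.randrange(2) for _ in words]  # random initial assignments
for _ in range(50):
    gibbs_sweep(words, docs, z, n_topics=2, vocab_size=len(vocab),
                alpha=0.5, beta=0.1, rng=rng)
print(list(zip(tokens, z)))
```

The count tables are rebuilt at the start of each sweep to keep the function self-contained; a longer-running sampler would normally keep them between sweeps.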
A selection of topics (TASA)
(one topic per line)
• JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
• SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
• BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
• FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
• STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
• MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
• DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
• WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER

The^14 “shape^7” of^4 a^23 female^115 mating^115 preference^125 is^32 the^14 relationship^7 between^4 a^23 male^115 trait^15 and^37 the^14 probability^7 of^4 acceptance^21 as^43 a^23 mating^115 partner^20. The^14 shape^7 of^4 preferences^115 is^32 important^49 in^5 many^39 models^6 of^4 sexual^115 selection^46, mate^115 recognition^125, communication^9, and^37 speciation^46, yet^50 it^41 has^18 rarely^19 been^33 measured^17 precisely^19. Here^12 I^9 examine^34 preference^7 shape^7 for^5 male^115 calling^115 song^125 in^22 a^23 bushcricket*^13 (katydid*^48). Preferences^115 change^46 dramatically^19 between^22 races^46 of^4 a^23 species^15, from^22 strongly^19 directional^11 to^31 broadly^19 stabilizing^45 (but^50 with^21 a^23 net^49 directional^46 effect^46). Preference^115 shape^46 generally^19 matches^10 the^14 distribution^16 of^4 the^14 male^115 trait^15. This^41 is^32 compatible^29 with^21 a^23 coevolutionary^46 model^20 of^4 signal^9-preference^115 evolution^46, although^50 it^41 does^33 not^37 rule^20 out^17 an^23 alternative^11 model^20, sensory^125 exploitation^150. Preference^46 shapes^40 are^8 shown^35 to^31 be^44 genetic^11 in^5 origin^7.
(superscripts give each word's topic assignment; graylevel = membership in topic 115)

Ritchie, Michael G. "The Shape of Female Mating Preferences." PNAS 93 (1996): 14628-14631. Copyright 1996. Courtesy of the National Academy of Sciences, U.S.A. Used with permission.
(The same abstract, with graylevel = membership in topics 115, 46)
(The same abstract, with graylevel = membership in topics 115, 46, 125)

Joint models of syntax and semantics (Griffiths, Steyvers, Blei & Tenenbaum, NIPS 2004)
• Embed topics model inside an nth order Hidden Markov Model:

Image removed due to copyright considerations. Please see: Griffiths, T. L., M. Steyvers, D. M. Blei, and J. B. Tenenbaum. "Integrating Topics and Syntax." Advances in Neural Information Processing Systems 17 (2005).
Semantic classes
(one class per line)
• MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
• FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
• GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
• CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
• BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
• DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
• BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
• PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW

Image removed due to copyright considerations. Please see: Griffiths, T. L., M. Steyvers, D. M. Blei, and J. B. Tenenbaum. "Integrating Topics and Syntax." Advances in Neural Information Processing Systems 17 (2005).
Syntactic classes
(one class per line)
• SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
• THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
• MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
• ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
• GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE * BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
• ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
• HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
• BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME

Corpus-specific factorization (NIPS)

Image removed due to copyright considerations. Please see: Griffiths, T. L., M. Steyvers, D. M. Blei, and J. B. Tenenbaum. "Integrating Topics and Syntax." Advances in Neural Information Processing Systems 17 (2005).
Syntactic classes in PNAS
(class number, then one class per line)
• 5: IN FOR ON BETWEEN DURING AMONG FROM UNDER WITHIN THROUGHOUT THROUGH TOWARD INTO AT INVOLVING AFTER ACROSS AGAINST WHEN ALONG
• 8: ARE WERE WAS IS WHEN REMAIN REMAINS REMAINED PREVIOUSLY BECOME BECAME BEING BUT GIVE MERE APPEARED APPEAR ALLOWED NORMALLY EACH
• 14: THE THIS ITS THEIR AN EACH ONE ANY INCREASED EXOGENOUS OUR RECOMBINANT ENDOGENOUS TOTAL PURIFIED TILE FULL CHRONIC ANOTHER EXCESS
• 25: SUGGEST INDICATE SUGGESTING SUGGESTS SHOWED REVEALED SHOW DEMONSTRATE INDICATING PROVIDE SUPPORT INDICATES PROVIDES INDICATED DEMONSTRATED SHOWS SO REVEAL DEMONSTRATES SUGGESTED
• 26: LEVELS NUMBER LEVEL RATE TIME CONCENTRATIONS VARIETY RANGE CONCENTRATION DOSE FAMILY SET FREQUENCY SERIES AMOUNTS RATES CLASS VALUES AMOUNT SITES
• 30: RESULTS ANALYSIS DATA STUDIES STUDY FINDINGS EXPERIMENTS OBSERVATIONS HYPOTHESIS ANALYSES ASSAYS POSSIBILITY MICROSCOPY PAPER WORK EVIDENCE FINDING MUTAGENESIS OBSERVATION MEASUREMENTS
• 33: BEEN MAY CAN COULD WELL DID DOES DO MIGHT SHOULD WILL WOULD MUST CANNOT REMAINED ALSO THEY BECOME MAG LIKELY

Semantic highlighting
Darker words are more likely to have been generated from the topic-based “semantics” module:

Outline
• Final thoughts on hierarchical Bayesian models and MCMC
• Bayesian classification
• Bayesian concept learning

Concepts and categories
• A category is a set of objects that are treated equivalently for some purpose.
• A concept is a mental representation of the category.
• Functions for concepts:
  – Categorization/classification
  – Prediction
  – Inductive generalization
  – Explanation
  – Reference in communication and thought

• Classical view of concepts (1950’s-1960’s): Concepts are rules or symbolic representations for classifying.
• Examples
  – Psychology: Bruner et al.
    "Striped and Three Borders": Conjunctive Concept
    Figure by MIT OCW.

• Classical view of concepts (1950’s-1960’s): Concepts are rules or symbolic representations
• Examples
  – AI: Winston’s arch learner

Image removed due to copyright considerations.
Please see: Winston, P. H., ed. The Psychology of Computer Vision. New York, NY: McGraw-Hill, 1975. ISBN: 0070710481.
http://www.rci.rutgers.edu/~cfs/472_html/Learn/LearnGifs/ArchExSeq.gif

• Statistical view of concepts (1960’s-1970’s)
• Examples
  – Machine learning/statistics: Iris classification

Images removed due to copyright considerations.

• Standard version (1960’s-1970’s): Concepts are statistical representations for classifying.
• Examples
  – Psychology: Posner and Keele

Image removed due to copyright considerations. Please see: Posner, M. I., and S. W. Keele. "On the Genesis of Abstract Ideas." Journal of Experimental Psychology 77 (1968): 353-363.

Different levels of random distortion:
Images removed due to copyright considerations.

Statistical pattern recognition
Two-class classification problem:
Images removed due to copyright considerations.
The task: Given an object generated from class 1 or class 2, infer the generating class.

Formalizing two-class classification:
Images removed due to copyright considerations.
The task: Observe $x$ generated from $c_1$ or $c_2$, compute:

$$p(c_1 \mid x) = \frac{p(x \mid c_1)\, p(c_1)}{p(x \mid c_1)\, p(c_1) + p(x \mid c_2)\, p(c_2)}$$

Different approaches vary in how they represent $p(x \mid c_j)$.

Parametric approach
• Assume a simple canonical form for $p(x \mid c_j)$.
• E.g., Gaussian distributions:
Images removed due to copyright considerations.

Parametric approach
• Assume a simple canonical form for $p(x \mid c_j)$.
• The simplest Gaussians have all dimensions independent, variances equal for all classes:
  – Classification based on distance to means.
  – Covariance ellipse determines the distance metric.

Parametric approach
• Assume a simple canonical form for $p(x \mid c_j)$.
• The simplest Gaussians have all dimensions independent, variances equal for all classes:
  – Bayes net representation: C → x1, C → x2 (“naïve Bayes”)

$$p(x \mid c_j) = p(x_1 \mid c_j) \times p(x_2 \mid c_j)$$
$$p(x_i \mid c_j) \propto e^{-(x_i - \mu_{ij})^2 / (2\sigma_i^2)}$$

Parametric approach
• Other possible forms:
  – All dimensions independent with variances equal across dimensions and classes (“naïve Bayes”: C → x1, C → x2):

$$p(x \mid c_j) = p(x_1 \mid c_j) \times p(x_2 \mid c_j)$$
$$p(x_i \mid c_j) \propto e^{-(x_i - \mu_{ij})^2 / (2\sigma^2)}$$

Parametric approach
• Other possible forms:
  – All dimensions independent with equal variances, but variances differ across classes (“naïve Bayes”: C → x1, C → x2):

$$p(x \mid c_j) = p(x_1 \mid c_j) \times p(x_2 \mid c_j)$$
$$p(x_i \mid c_j) \propto e^{-(x_i - \mu_{ij})^2 / (2\sigma_j^2)}$$

Parametric approach
• Other possible forms:
  – All dimensions independent, variances differ across dimensions and across classes (“naïve Bayes”: C → x1, C → x2):

$$p(x \mid c_j) = p(x_1 \mid c_j) \times p(x_2 \mid c_j)$$
$$p(x_i \mid c_j) \propto e^{-(x_i - \mu_{ij})^2 / (2\sigma_{ij}^2)}$$

Parametric approach
• Other possible forms:
  – Arbitrary covariance matrices for each class (C → x = {x1, x2}).
    Board formula

Learning
• Hypothesis space of possible Gaussians:
Images removed due to copyright considerations.
• Find parameters that maximize likelihood of examples.
  – $\vec{\mu}_j$ = mean of examples of class j.
  – $\sigma_i$ = standard deviation along dimension i, for examples in each class.

Relevance to human concept learning
• Natural categories often have Gaussian (or other simple parametric) forms in perceptual feature spaces.
• Prototype effects in categorization (Rosch)
• Posner & Keele studies of prototype abstraction in concept learning.

Posner and Keele: design
Image removed due to copyright considerations.
Please see: Posner, M. I., and S. W. Keele. "On the Genesis of Abstract Ideas." Journal of Experimental Psychology 77 (1968): 353-363.

Posner and Keele: results
Image removed due to copyright considerations. Please see: Posner, M. I., and S. W. Keele. "On the Genesis of Abstract Ideas." Journal of Experimental Psychology 77 (1968): 353-363.
Unseen prototype (“Schema”) classified as well as memorized variants, and much better than new random variants (“5”).

Parametric approach
• Other possible forms:
  – All dimensions independent with variances equal across dimensions and classes (“naïve Bayes”: C → x1, C → x2):

$$p(x \mid c_j) = p(x_1 \mid c_j) \times p(x_2 \mid c_j)$$
$$p(x_i \mid c_j) \propto e^{-(x_i - \mu_{ij})^2 / (2\sigma^2)}$$

Equivalent to prototype model:
  – Prototype of class j: $\vec{\mu}_j = \{\mu_{1j}, \mu_{2j}\}$
  – Variability of categories: $\sigma$

Limitations
• Of this empirical paradigm?
• Of this computational approach?

Limitations
• Is categorization just discrimination among mutually exclusive classes?
  – Overlapping concepts? Hierarchies? “None of the above”? Can we learn a single new concept?
• How do we learn concepts from just a few positive examples?
  – Learning with high certainty from little data.
  – Schema abstraction from one imperfect example.
• Are most categories Gaussian, or any simple parametric shape?
  – What about superordinate categories?
  – What about learning rule-based categories?

Limitations
• Is prototypicality = degree of membership?
  – Armstrong et al.: No, for classical rule-based categories
  – Not for complex real-world categories either: “Christmas eve”, “Hollywood actress”, “Californian”, “Professor”
  – For natural kinds, huge variability in prototypicality independent of membership.
• Richer concepts?
  – Meaningful stimuli, background knowledge, theories?
  – Role of causal reasoning? “Essentialism”?
• Difference between “perceptual” and “cognitive” concepts?
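As a concrete reference point for the questions above, the equal-variance Gaussian (prototype) classifier from the parametric-approach slides can be sketched as follows. Prototypes are the class means, a single variance is shared by all dimensions and classes, and the posterior over classes comes from Bayes' rule. The function names, the toy data, the use of class frequencies as priors and the pooled maximum-likelihood estimate of the variance are illustrative assumptions, not taken from the lecture.

```python
import math


def fit_prototypes(examples):
    """examples: dict mapping class label -> list of feature tuples.
    Returns (prototypes, sigma2, priors), all estimated by maximum likelihood."""
    prototypes, priors = {}, {}
    n_total = sum(len(xs) for xs in examples.values())
    sq_err, n_values = 0.0, 0
    for c, xs in examples.items():
        dim = len(xs[0])
        mu = tuple(sum(x[i] for x in xs) / len(xs) for i in range(dim))
        prototypes[c] = mu               # prototype = mean of the class examples
        priors[c] = len(xs) / n_total    # assumed prior: class frequency
        for x in xs:
            sq_err += sum((x[i] - mu[i]) ** 2 for i in range(dim))
            n_values += dim
    sigma2 = sq_err / n_values  # one variance shared by all dimensions and classes
    return prototypes, sigma2, priors


def posterior(x, prototypes, sigma2, priors):
    """p(c | x) via Bayes' rule with p(x_i | c) proportional to
    exp(-(x_i - mu_ic)^2 / (2 sigma^2)); shared normalizers cancel."""
    log_joint = {
        c: math.log(priors[c])
        - sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / (2.0 * sigma2)
        for c, mu in prototypes.items()
    }
    m = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - m) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


# Usage on hypothetical two-dimensional data.
data = {
    "c1": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "c2": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)],
}
protos, s2, pri = fit_prototypes(data)
near_c1 = posterior((0.5, 0.5), protos, s2, pri)
midpoint = posterior((2.5, 2.5), protos, s2, pri)
print(protos, s2, near_c1, midpoint)
```

With equal priors and a shared variance, the most probable class is simply the one with the nearest prototype, which is the sense in which this Gaussian model is equivalent to a prototype model. It also makes the limitations concrete: every input is assigned to one of the known classes, so there is no “none of the above”.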