Some applications in human behavior modeling (and open questions)
Jerry Zhu, University of Wisconsin-Madison
Simons Institute Workshop on Spectral Algorithms: From Theory to Practice, Oct. 2014

Outline
- Human Memory Search
- Machine Teaching

Verbal fluency
Say as many animals as you can, without repeating, in one minute.

Semantic "runs"
1. cow, horse, chicken, pig, elephant, lion, tiger, porcupine, gopher, rat, mouse, duck, goose, horse, bird, pelican, alligator, crocodile, iguana, goose
2. elephant, tiger, dog, cow, horse, sheep, cat, lynx, elk, moose, antelope, deer, tiger, wolverine, bobcat, mink, rabbit, wolf, coyote, fox, cow, zebra
3. cat, dog, horse, chicken, duck, cow, pig, gorilla, giraffe, tiger, lion, ostrich, elephant, squirrel, gopher, rat, mouse, gerbil, hamster, duck, goose
4. cat, dog, sheep, goat, elephant, tiger, dog, deer, lynx, wolf, mountain goat, bear, giraffe, moose, elk, hyena, aardvark, platypus, lion, skunk, wolverine, raccoon
5. dog, cat, leopard, elephant, monkey, sea lion, tiger, leopard, bird, squirrel, deer, antelope, snake, beaver, robin, panda, vulture
6. deer, muskrat, bear, fish, raccoon, zebra, elephant, giraffe, cat, dog, mouse, rat, bird, snake, lizard, lamb, hippopotamus, elephant, skunk, lion, tiger

Memory search
[Figure, two views: a 2D embedding of animal words recovered from fluency data; K = 12, obj = 14.1208.]

Censored random walk [Abbott, Austerweil, Griffiths 2012]
- V: n animal word types in English
- P: a (dense) n × n transition matrix
- Censored random walk: observe only the first token of each type,
  x_1, x_2, ..., x_t, ... ⇒ a_1, ..., a_n
- (star example)
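To make the censoring concrete, here is a minimal simulation sketch (my own illustration, not code from the talk; the 4-state transition matrix is made up). It runs the walk and emits a state only the first time that state is visited, the way a fluency list names each animal once:

```python
import numpy as np

def censored_walk(P, start=0, rng=None):
    """Run a random walk with transition matrix P and return the
    censored observation: each state, in order of first visit."""
    rng = np.random.default_rng() if rng is None else rng
    n = P.shape[0]
    state = start
    seen = [start]
    while len(seen) < n:                    # walk until every type has appeared
        state = rng.choice(n, p=P[state])   # one latent step
        if state not in seen:               # censoring: repeat visits are suppressed
            seen.append(state)
    return seen

# A toy 4-state chain; rows sum to 1, values are arbitrary.
P = np.array([[0.0, 0.5, 0.3, 0.2],
              [0.4, 0.0, 0.4, 0.2],
              [0.3, 0.3, 0.0, 0.4],
              [0.2, 0.3, 0.5, 0.0]])
print(censored_walk(P, rng=np.random.default_rng(0)))  # a permutation of 0..3 starting at 0
```

The censored output is always a permutation of the n types, while the latent walk between first visits can be arbitrarily long.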
The estimation problem
Given m censored random walks
D = { (a_1^{(1)}, ..., a_n^{(1)}), ..., (a_1^{(m)}, ..., a_n^{(m)}) },
estimate P.

Each observed step is an absorbing random walk (with Kwang-Sung Jun)
- P(a_{k+1} | a_1, ..., a_k) may involve infinitely many latent steps
- Instead, model this observed step as an absorbing random walk with absorbing states V \ {a_1, ..., a_k}; in canonical form,
  P ⇒ ( Q  R )
      ( 0  I )

Maximum likelihood
- Fundamental matrix N = (I − Q)^{-1}; N_{ij} is the expected number of visits to j before absorption when starting at i
- p(a_{k+1} | a_1, ..., a_k) = Σ_{i=1}^{k} N_{ki} R_{i1}, with a_{k+1} indexed as the first absorbing state
- Log likelihood: Σ_{i=1}^{m} Σ_{k=1}^{n−1} log p(a_{k+1}^{(i)} | a_1^{(i)}, ..., a_k^{(i)})
- Nonconvex; gradient method
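For concreteness, a small numerical sketch of this step probability and log likelihood (my own illustration; the 3-state P is a placeholder, and the gradient-based optimization itself is not shown):

```python
import numpy as np

def step_prob(P, prefix, next_state):
    """p(next_state | prefix) under the absorbing-walk model:
    states already in the prefix are transient, everything else absorbs."""
    seen = list(prefix)
    unseen = [j for j in range(P.shape[0]) if j not in seen]
    Q = P[np.ix_(seen, seen)]                    # transitions among visited states
    R = P[np.ix_(seen, unseen)]                  # transitions into new states
    N = np.linalg.inv(np.eye(len(seen)) - Q)     # fundamental matrix (I - Q)^{-1}
    B = N @ R                                    # absorption probabilities
    return B[len(seen) - 1, unseen.index(next_state)]   # walk restarts at the last seen word

def log_likelihood(P, walks):
    """Sum of log p(a_{k+1} | a_1..a_k) over all censored walks and steps."""
    return sum(np.log(step_prob(P, w[:k], w[k]))
               for w in walks for k in range(1, len(w)))

# Toy 3-state example (rows of P sum to 1; values are arbitrary).
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
print(step_prob(P, [0], 1))                      # 0.5: with one state seen, just P[0, 1]
print(log_likelihood(P, [[0, 1, 2], [0, 2, 1]]))
```

An actual estimator would parameterize P (e.g., row-wise softmax) and ascend this objective with a gradient method, since the likelihood is nonconvex in P.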
Other estimators
- PRW: pretend a_1, ..., a_n are not censored
- PFirst2: use only a_1, a_2 from each walk (consistent)

[Plots: estimation error ‖P̂ − P‖²_F vs. m on four graphs: star graph; 2D grid; Erdős–Rényi with p = log(n)/n; ring graph.]

Questions
- Consistency?
- Rate?

Outline
- Human Memory Search
- Machine Teaching

Learning a noiseless 1D threshold classifier
x ~ uniform[0, 1]
y = −1 if x < θ*, +1 if x ≥ θ*
Θ = {θ : 0 ≤ θ ≤ 1}

Passive learning:
1. given training data D = (x_1, y_1), ..., (x_n, y_n) drawn iid from p(x, y)
2. find θ̂ consistent with D
Risk: |θ̂ − θ*| = O(n^{-1})

Active learning (sequential experimental design)
Binary search for θ*.
Risk: |θ̂ − θ*| = O(2^{-n})

Machine teaching
What is the minimum training set a helpful teacher can construct?
Risk: |θ̂ − θ*| = ε for any ε > 0, achievable with two teaching examples straddling θ*.

Comparing the three
- passive learning "waits": O(1/n)
- active learning "explores": O(1/2^n)
- teaching "guides"
The teacher knows θ* and the learning algorithm.

Example 2: Teaching a Gaussian distribution
Given a training set x_1, ..., x_n ∈ R^d, let the learning algorithm be
µ̂ = (1/n) Σ_{i=1}^{n} x_i
Σ̂ = (1/(n−1)) Σ_{i=1}^{n} (x_i − µ̂)(x_i − µ̂)^⊤
How to teach N(µ*, Σ*) to the learner quickly? A non-iid teaching set of n = d + 1 points suffices.

An optimization view of machine teaching
[Diagram: the learner A maps a training set D to a model A(D) ∈ Θ; the teacher picks from the preimage A^{-1}(θ*).]
min_D |D|  s.t.  A(D) = θ*
The objective is the teaching dimension [Goldman, Kearns 1995] of θ* with respect to A, Θ.

Example 3: Works on humans
Human categorization on 1D stimuli [Patil, Z, Kopeć, Love 2014]:

  human training set    human test accuracy
  machine teaching      72.5%
  iid                   69.8%

(The difference is statistically significant.)

New task: teaching humans how to label a graph
Given:
- a graph G = (V, E)
- target labels y*: V → {−1, 1}
- a label-completion cognitive model A (a graph diffusion algorithm), such as:
  - mincut
  - harmonic function [Z, Ghahramani, Lafferty 2003]
  - local-global consistency [Zhou et al. 2004]
  - ...
Find the smallest seed set:
min_{S ⊆ V} |S|  s.t.  A(y*(S)) = y*
(the inverse problem of semi-supervised learning; a brute-force sketch appears at the end of this deck)

Example: A = local-global consistency [Zhou et al. 2004]
F = (1 − α)(I − α D^{-1/2} W D^{-1/2})^{-1} y*(S)
ŷ = sgn(F)
[Figure: the completed labeling on a 16-node example graph; color scale from −1 to 1.]

|A^{-1}(y*)| is large
It turns out 4649 out of 2^16 = 65536 training sets S lead to the target label completion.

Optimal seed set 1: |S| = 3 [figure]
Optimal seed set 2: |S| = 3 [figure]

Questions
- How does |S| relate to (spectral) properties of G?
- How to solve for S?
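Finally, a minimal sketch tying this last example together (my own construction: the two-triangle graph, α = 0.9, and the labels are placeholders, not the 16-node example from the slides). It implements the local-global consistency completion above and brute-forces the smallest seed set S with A(y*(S)) = y*:

```python
import itertools
import numpy as np

def lgc_complete(W, alpha, y_seed):
    """Local-global consistency [Zhou et al. 2004]:
    F = (1 - alpha) (I - alpha D^{-1/2} W D^{-1/2})^{-1} y_seed,
    where y_seed carries y* on the seed set and 0 elsewhere."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S_norm = D_inv_sqrt @ W @ D_inv_sqrt
    F = (1 - alpha) * np.linalg.solve(np.eye(len(d)) - alpha * S_norm, y_seed)
    return np.sign(F)

def min_seed_set(W, alpha, y_star):
    """Brute-force the smallest S with A(y*(S)) = y*."""
    n = len(y_star)
    for size in range(1, n + 1):
        for S in itertools.combinations(range(n), size):
            y_seed = np.zeros(n)
            y_seed[list(S)] = y_star[list(S)]
            if np.array_equal(lgc_complete(W, alpha, y_seed), y_star):
                return S
    return None

# Toy example: two triangles joined by one bridge edge; labels split at the bridge.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
y_star = np.array([-1, -1, -1, 1, 1, 1])
print(min_seed_set(W, alpha=0.9, y_star=y_star))  # expect a 2-node seed, one in each triangle
```

Brute force is exponential in |V|, which is exactly why the closing questions matter: how |S| relates to spectral properties of G, and how to solve for S efficiently.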