CS 194-10 Fall 2011: Introduction to Machine Learning
Machine Learning: An Overview
Lecture 1, 8/25/11

People
• Avital Steinitz, 2nd-year CS PhD student
• Stuart Russell, 30th-year CS PhD student
• Mert Pilanci, 2nd-year EE PhD student

Administrative details
• Web page
• Newsgroup

Course outline
• Overview of machine learning (today)
• Classical supervised learning
– Linear regression, perceptrons, neural nets, SVMs, decision trees, nearest neighbors, and all that
– A little bit of theory, a lot of applications
• Learning probabilistic models
– Probabilistic classifiers (logistic regression, etc.)
– Unsupervised learning, density estimation, EM
– Bayes net learning
– Time series models
– Dimensionality reduction
– Gaussian process models
– Language models
• Bandits and other exciting topics

Lecture outline
• Goal: provide a framework for understanding all the detailed content to come, and why it matters
• Learning: why and how
• Supervised learning
– Classical: finding simple, accurate hypotheses
– Probabilistic: finding likely hypotheses
– Bayesian: updating belief in hypotheses
• Data and applications
• Expressiveness and cumulative learning
• CTBT

Learning is…
… a computational process for improving performance based on experience

Learning: Why?
• "The baby, assailed by eyes, ears, nose, skin, and entrails at once, feels it all as one great blooming, buzzing confusion …" [William James, 1890]
• Learning is essential for unknown environments, i.e., when the designer lacks omniscience

Learning: Why?
• "Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationer's. Rather little mechanism, and lots of blank sheets." [Alan Turing, 1950]
• Learning is useful as a system construction method, i.e., expose the system to reality rather than trying to write it down

Learning: How?
(sequence of image slides)

Structure of a learning agent
(diagram slide)

Design of learning element
• Key questions:
– What is the agent design that will implement the desired performance?
– Improve the performance of what piece of the agent system, and how is that piece represented?
– What data are available relevant to that piece? (In particular, do we know the right answers?)
– What knowledge is already available?
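Purely as an illustration (not from the slides), the sketch below shows one way the pieces named in these questions, a performance element that uses a learned component, a critic's feedback, and a learning element that updates the component's representation, might fit together in code. All class and method names here are invented for the example.

class LinearComponent:
    """The piece being improved: a linear function with adjustable weights."""

    def __init__(self, n):
        self.w = [0.0] * n

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, error, lr=0.1):
        # Learning element: nudge the weights to reduce the observed error
        self.w = [wi + lr * error * xi for wi, xi in zip(self.w, x)]


class LearningAgent:
    """Illustrative skeleton: performance element + critic + learning element."""

    def __init__(self, component):
        self.component = component

    def act(self, percept):
        # Performance element: evaluate/choose using the current component
        return self.component.predict(percept)

    def observe(self, percept, correct_value):
        # Critic: compare the agent's output with feedback from the environment
        error = correct_value - self.component.predict(percept)
        self.component.update(percept, error)


agent = LearningAgent(LinearComponent(2))
for x, y in [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)] * 50:
    agent.observe(x, y)
print(agent.act((1.0, 1.0)))  # approaches 2 + (-1) = 1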
Examples

Agent design | Component | Representation | Feedback | Knowledge
Alpha-beta search | Evaluation function | Linear polynomial | Win/loss | Rules of game; coefficient signs
Logical planning agent | Transition model (observable envt) | Successor-state axioms | Action outcomes | Available actions; argument types
Utility-based patient monitor | Physiology/sensor model | Dynamic Bayesian network | Observation sequences | Gen physiology; sensor design
Satellite image pixel classifier | Classifier (policy) | Markov random field | Partial labels | Coastline; continuity scales

• Supervised learning: correct answers for each training instance
• Reinforcement learning: reward sequence, no correct answers
• Unsupervised learning: "just make sense of the data"

Supervised learning
• To learn an unknown target function f
• Input: a training set of labeled examples (xj, yj) where yj = f(xj)
– E.g., xj is an image, f(xj) is the label "giraffe"
– E.g., xj is a seismic signal, f(xj) is the label "explosion"
• Output: hypothesis h that is "close" to f, i.e., predicts well on unseen examples (the "test set")
• Many possible hypothesis families for h
– Linear models, logistic regression, neural networks, decision trees, examples (nearest-neighbor), grammars, kernelized separators, etc.

Example: object recognition
• Training images x with labels f(x): three giraffes, three llamas
• Given a new image X, what is f(X)?

Example: curve fitting
(sequence of image slides: data points fitted by curves of increasing complexity)

Basic questions
• Which hypothesis space H to choose?
• How to measure degree of fit?
• How to trade off degree of fit vs. complexity? ("Ockham's razor")
• How do we find a good h?
• How do we know if a good h will predict well?
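To make the curve-fitting example and these questions concrete, here is a minimal sketch (not from the slides). It fits polynomials of increasing degree to noisy samples of an assumed target function sin(2πx); training error keeps falling as the degree grows, while held-out error exposes overfitting, which is exactly the fit-vs.-complexity trade-off Ockham's razor addresses.

import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an (assumed) unknown target function f(x) = sin(2*pi*x)
x_train = rng.uniform(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = rng.uniform(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 9):
    # Hypothesis space H_d: polynomials of degree d, fitted by least squares
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")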
Philosophy of Science (Physics)
• Which hypothesis space H to choose?
– Deterministic hypotheses, usually mathematical formulas and/or logical sentences; implicit relevance determination
• How to measure degree of fit?
– Ideally, h will be consistent with the data
• How to trade off degree of fit vs. complexity?
– Theory must be correct up to "experimental error"
• How do we find a good h?
– Intuition, imagination, inspiration (invent new terms!!)
• How do we know if a good h will predict well?
– Hume's Problem of Induction: most philosophers give up

Kolmogorov complexity (also MDL, MML)
• Which hypothesis space H to choose?
– All Turing machines (or programs for a UTM)
• How to measure degree of fit?
– Fit is perfect (the program has to output the data exactly)
• How to trade off degree of fit vs. complexity?
– Minimize the size of the program
• How do we find a good h?
– Undecidable (unless we bound the time complexity of h)
• How do we know if a good h will predict well?
– (recent theory borrowed from PAC learning)

Classical stats/ML: Minimize loss function
• Which hypothesis space H to choose?
– E.g., linear combinations of features: hw(x) = wTx
• How to measure degree of fit?
– Loss function, e.g., squared error Σj (yj – wTxj)2
• How to trade off degree of fit vs. complexity?
– Regularization: complexity penalty, e.g., ||w||2
• How do we find a good h?
– Optimization (closed-form, numerical); discrete search
• How do we know if a good h will predict well?
– Try it and see (cross-validation, bootstrap, etc.)

Probabilistic: Maximum likelihood, maximum a posteriori
• Which hypothesis space H to choose?
– A probability model P(y | x, h), e.g., Y ~ N(wTx, σ2)
• How to measure degree of fit?
– Data likelihood Πj P(yj | xj, h)
• How to trade off degree of fit vs. complexity?
– Regularization or prior: argmaxh P(h) Πj P(yj | xj, h) (MAP)
• How do we find a good h?
– Optimization (closed-form, numerical); discrete search
• How do we know if a good h will predict well?
– Empirical process theory (generalizes Chebyshev, CLT, PAC, …); key assumption is (i)id

Bayesian: Computing posterior over H
• Which hypothesis space H to choose?
– All hypotheses with nonzero a priori probability
• How to measure degree of fit?
– Data probability, as for MLE/MAP
• How to trade off degree of fit vs. complexity?
– Use the prior, as for MAP
• How do we find a good h?
– Don't! The Bayes predictor averages over all hypotheses: P(y|x,D) = Σh P(y|x,h) P(h|D), where P(h|D) ∝ P(D|h) P(h)
• How do we know if a good h will predict well?
– Silly question! Bayesian prediction is optimal!!
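As an illustrative aside (not from the slides), ridge regression is the simplest instance of the loss-plus-regularizer recipe above, and its closed-form solution is also the MAP estimate under Gaussian noise with a zero-mean Gaussian prior on w, which ties the classical and probabilistic views together. The data and the value of lam below are made up.

import numpy as np

def ridge_fit(X, y, lam):
    """Minimize sum_j (y_j - w^T x_j)^2 + lam * ||w||^2 (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data: y = 2*x1 - 3*x2 + noise
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -3.0]) + rng.normal(0, 0.1, 50)

w_ls = ridge_fit(X, y, lam=0.0)    # plain least squares = maximum likelihood under Gaussian noise
w_map = ridge_fit(X, y, lam=10.0)  # regularized = MAP with a zero-mean Gaussian prior on w
print(w_ls, w_map)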
Neon sculpture at Autonomy Corp.
(photo slide)

Lots of data
• Web: estimated Google index 45 billion pages
• Clickstream data: 10-100 TB/day
• Transaction data: 5-50 TB/day
• Satellite image feeds: ~1 TB/day/satellite
• Sensor networks/arrays
– CERN Large Hadron Collider: ~100 petabytes/day
• Biological data: 1-10 TB/day/sequencer
• TV: 2 TB/day/channel; YouTube: 4 TB/day uploaded
• Digitized telephony: ~100 petabytes/day

Real data are messy
(image slide)

Arterial blood pressure (high/low/mean), 1 s scale
(waveform plot)

Application: satellite image analysis
(image slide)

Application: discovering DNA motifs
...TTGGAACAACCATGCACGGTTGATTCGTGCCTGTGACCGCGCGCCTCACACGGAAGACGCAGCCACCGGTTGTGATG
TCATAGGGAATTCCCCATGTCGTGAATAATGCCTCGAATGATGAGTAATAGTAAAACGCAGGGGAGGTTCTTCAGTAGTA
TCAATATGAGACACATACAAACGGGCGTACCTACCGCAGCTCAAAGCTGGGTGCATTTTTGCCAAGTGCCTTACTGTTAT
CTTAGGACGGAAATCCACTATAAGATTATAGAAAGGAAGGCGGGCCGAGCGAATCGATTCAATTAAGTTATGTCACAAGG
GTGCTATAGCCTATTCCTAAGATTTGTACGTGCGTATGACTGGAATTAATAACCCCTCCCTGCACTGACCTTGACTGAAT
AACTGTGATACGACGCAAACTGAACGCTGCGGGTCCTTTATGACCACGGATCACGACCGCTTAAGACCTGAGTTGGAGTT
GATACATCCGGCAGGCAGCCAAATCTTTTGTAGTTGAGACGGATTGCTAAGTGTGTTAACTAAGACTGGTATTTCCACTA
GGACCACGCTTACATCAGGTCCCAAGTGGACAACGAGTCCGTAGTATTGTCCACGAGAGGTCTCCTGATTACATCTTGAA
GTTTGCGACGTGTTATGCGGATGAAACAGGCGGTTCTCATACGGTGGGGCTGGTAAACGAGTTCCGGTCGCGGAGATAAC
TGTTGTGATTGGCACTGAAGTGCGAGGTCTTAAACAGGCCGGGTGTACTAACCCAAAGACCGGCCCAGCGTCAGTGA...

Application: user website behavior from clickstream data (from P. Smyth, UCI)
128.195.36.195, -, 3/22/00, 10:35:11, W3SVC, SRVR1, 128.200.39.181, 781, 363, 875, 200, 0, GET, /top.html, -,
128.195.36.195, -, 3/22/00, 10:35:16, W3SVC, SRVR1, 128.200.39.181, 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.195, -, 3/22/00, 10:35:17, W3SVC, SRVR1, 128.200.39.181, 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.195.36.101, -, 3/22/00, 16:18:50, W3SVC, SRVR1, 128.200.39.181, 60, 425, 72, 304, 0, GET, /top.html, -,
128.195.36.101, -, 3/22/00, 16:18:58, W3SVC, SRVR1, 128.200.39.181, 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.101, -, 3/22/00, 16:18:59, W3SVC, SRVR1, 128.200.39.181, 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:54:37, W3SVC, SRVR1, 128.200.39.181, 140, 199, 875, 200, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:07, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:39, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:03, W3SVC, SRVR1, 128.200.39.181, 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:56:04, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:33, W3SVC, SRVR1, 128.200.39.181, 0, 262, 72, 304, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:56:52, W3SVC, SRVR1, 128.200.39.181, 19598, 382, 414, 200, 0, POST, /spt/main.html, -,

Each user's session becomes a sequence of page-category indices (sequences for Users 1-5 shown on the slide).

Application: social network analysis
• HP Labs email data: 500 users, 20k connections, evolving over time

Application: spam filtering
• ~200 billion spam messages sent per day
• Asymmetric cost of false positives vs. false negatives
• Weak label: discarded without reading
• Strong label ("this is spam") is hard to come by
• Standard iid assumption violated: spammers alter spam generators to evade or subvert spam filters (an "adversarial learning" task)

Learning
(diagram: data feeds a learning process that produces knowledge)

Learning with prior knowledge
(diagram: data and prior knowledge feed the learning process, which produces knowledge)
• Crucial open problem: weak intermediate forms of knowledge that support future generalizations

Example
(sequence of image slides)

Example – arriving at Sao Paulo, Brazil
• Bem-vindo! ("Welcome!")
Weak prior knowledge
• In this case, people in a given country (and city) tend to speak the same language
• Where did this knowledge come from?
– Experience with other countries
– "Common sense", i.e., knowledge of how societies and languages work
• And where did that knowledge come from?

Knowledge?
• "What is knowledge? All I know is samples!!" [V. Vapnik]
• All knowledge derives, directly or indirectly, from experience of individuals
• Knowledge serves as a directly applicable shorthand for all that experience, better than requiring constant review of the entire sensory/evolutionary history of the human race

CTBT: Comprehensive Nuclear-Test-Ban Treaty
• Bans testing of nuclear weapons on earth
– Allows for outside inspection of 1000 km2
• 182/195 states have signed; 153/195 have ratified
• Need 9 more ratifications, including the US and China
• US Senate refused to ratify in 1998: "too hard to monitor"

2053 nuclear explosions
(map slide)

254 monitoring stations
(map slide)

The problem
• Given waveform traces from all seismic stations, figure out what events occurred, when and where
• Traces at each sensor station may be preprocessed to form "detections" (90% are not real)

Example detection records:
ARID ORID STA PH BEL DELTA SEAZ ESAZ TIME TDEF AZRES ADEF SLORES SDEF WGT VMODEL LDDATE
49392708 5295499 WRA  P -1.0 23.673881 342.00274 163.08123 0.19513991 d -1.2503497 d 0.24876981 d -999.0 0.61806399 IASP 2009-04-02 12:54:27
49595064 5295499 FITZ P -1.0 20.835616 4.3960142 184.18581 1.2515257 d 2.7290018 d 5.4541182 n -999.0 0.46613527 IASP 2009-04-02 12:54:27
49674189 5295499 MKAR P -1.0 58.574266 124.26633 325.35514 -0.053738765 d -4.6295428 d 1.5126035 d -999.0 0.76750542 IASP 2009-04-02 12:54:27
49674227 5295499 ASAR P -1.0 27.114852 345.18433 166.42383 -0.71255454 d -6.4901126 d 0.95510033 d -999.0 0.66453657 IASP 2009-04-02 12:54:27

What do we know?
• Events happen randomly; each has a time, location, depth, and magnitude; seismicity varies with location
• Seismic waves of many kinds ("phases") travel through the Earth
– Travel time and attenuation depend on phase and source/destination
• Arriving waves may or may not be detected, depending on the sensor and local noise environment
• Local noise may also produce false detections

(sequence of image slides)

Generative model of events, detections, and arrivals:

# SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
IsEarthQuake(e) ~ Bernoulli(.999);
EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()
                   Else UniformEarthDistribution();
Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
#Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
#Arrivals(event=e, site) = If IsDetected(e,s) then 1 else 0;
Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)
          else IASPEI(EventLocation(event(a)),SiteLocation(site(a)),Phase(a)) + TimeRes(a);
TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
             else GeoAzimuth(EventLocation(event(a)),SiteLocation(site(a))) + AzRes(a);
AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
Slow(a) ~ If (event(a) = null) then Uniform(0,20)
          else IASPEI-SLOW(EventLocation(event(a)),SiteLocation(site(a))) + SlowRes(site(a));

Learning with prior knowledge
• Instead of learning a mapping from detection histories to event bulletins, learn local pieces of an overall structured model:
– Event location prior (A6)
– Predictive travel time model (A1)
– Phase type classifier (A2)

Event location prior (A6)
(map slide)

Travel time prediction (A1)
• How long does it take for a seismic signal to get from A to B? This is the travel time T(A,B).
• If we know this accurately, and we know the arrival times t1, t2, t3, … at several stations B1, B2, B3, …, we can find an accurate estimate of the location A and time t of the event, such that T(A,Bi) ≈ ti – t for all i

Earth 101
(image slide)

Seismic "phases" (wave types/paths)
• Seismic energy is emitted in different types of waves; there are also qualitatively distinct paths (e.g., direct vs. reflected from the surface vs. refracted through the core). P and S are the direct waves; P is faster.

IASP91 reference velocity model
• Spherically symmetric, Vphase(depth); from this, obtain Tpredicted(A,B)
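An illustrative aside, not from the slides: once a travel-time model gives Tpredicted(A,B), locating an event is the least-squares problem from the travel-time slide, i.e., find A and t such that T(A,Bi) ≈ ti – t. The sketch below assumes a toy flat Earth with a single constant wave speed and invented station coordinates (it needs NumPy and SciPy); the real system uses IASP91 travel times and much more careful statistics.

import numpy as np
from scipy.optimize import least_squares

# Hypothetical station coordinates (km) and a single constant wave speed (km/s)
stations = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 150.0], [80.0, 120.0]])
speed = 6.0

def travel_time(event_xy, station_xy):
    return np.linalg.norm(station_xy - event_xy, axis=-1) / speed

# Simulate arrivals from a "true" event at (40, 60) km with origin time 10 s
true_xy, true_t0 = np.array([40.0, 60.0]), 10.0
rng = np.random.default_rng(2)
arrivals = true_t0 + travel_time(true_xy, stations) + rng.normal(0, 0.1, len(stations))

def residuals(params):
    # ti - t - T(A, Bi) for each station Bi
    x, y, t0 = params
    return arrivals - t0 - travel_time(np.array([x, y]), stations)

fit = least_squares(residuals, x0=[0.0, 0.0, 0.0])
print("estimated x, y, t0:", fit.x)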
IASP91 inaccuracy is too big!
• The Earth is inhomogeneous: variations in crust thickness and rock properties ("fast" and "slow")

Travel time residuals (Tactual – Tpredicted)
• The residual surface (with respect to a particular station) is locally smooth; estimate it by local regression
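A minimal sketch of the "estimate by local regression" idea, not the actual method behind the slide: predict the residual at a new source location as a Gaussian-kernel-weighted (locally constant) average of residuals observed at nearby source locations. The locations, residual values, and bandwidth below are invented.

import numpy as np

def local_residual(query, locations, residuals, bandwidth=5.0):
    """Kernel-weighted average of travel-time residuals at nearby source locations."""
    d2 = np.sum((locations - query) ** 2, axis=1)
    weights = np.exp(-d2 / (2 * bandwidth ** 2))
    return np.sum(weights * residuals) / np.sum(weights)

# Hypothetical residuals (seconds) at known source locations, for one station
locations = np.array([[10.0, 20.0], [11.0, 21.0], [30.0, 40.0], [12.0, 19.0]])
residuals = np.array([0.8, 0.9, -1.2, 0.7])
print(local_residual(np.array([10.5, 20.5]), locations, residuals))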