CS194-10 Fall 2011
Introduction to Machine Learning
Machine Learning: An Overview
People
• Avital Steinitz, 2nd-year CS PhD student
• Stuart Russell, 30th-year CS PhD student
• Mert Pilanci, 2nd-year EE PhD student
Administrative details
• Web page
• Newsgroup
Course outline
• Overview of machine learning (today)
• Classical supervised learning
– Linear regression, perceptrons, neural nets, SVMs, decision trees,
nearest neighbors, and all that
– A little bit of theory, a lot of applications
• Learning probabilistic models
– Probabilistic classifiers (logistic regression, etc.)
– Unsupervised learning, density estimation, EM
– Bayes net learning
– Time series models
– Dimensionality reduction
– Gaussian process models
– Language models
• Bandits and other exciting topics
Lecture outline
• Goal: Provide a framework for understanding all the
detailed content to come, and why it matters
• Learning: why and how
• Supervised learning
– Classical: finding simple, accurate hypotheses
– Probabilistic: finding likely hypotheses
– Bayesian: updating belief in hypotheses
• Data and applications
• Expressiveness and cumulative learning
• CTBT
Learning is…
… a computational process for improving performance based on experience
Learning: Why?
• The baby, assailed by eyes, ears, nose, skin, and
entrails at once, feels it all as one great blooming,
buzzing confusion …
– [William James, 1890]
Learning is essential for unknown environments,
i.e., when the designer lacks omniscience
Learning: Why?
• Instead of trying to produce a programme to simulate the
adult mind, why not rather try to produce one which
simulates the child's? If this were then subjected to an
appropriate course of education one would obtain the adult
brain. Presumably the child brain is something like a
notebook as one buys it from the stationer's. Rather little
mechanism, and lots of blank sheets.
– [Alan Turing, 1950]
• Learning is useful as a system construction
method, i.e., expose the system to reality rather
than trying to write it down
Learning: How?

[Figures not preserved.]
Structure of a learning agent
Design of learning element
• Key questions:
– What is the agent design that will implement the desired performance?
– Which piece of the agent system is to be improved, and how is that piece represented?
– What data are available relevant to that piece? (In
particular, do we know the right answers?)
– What knowledge is already available?
Examples
Agent design                     | Component                           | Representation           | Feedback              | Knowledge
Alpha-beta search                | Evaluation function                 | Linear polynomial        | Win/loss              | Rules of game; coefficient signs
Logical planning agent           | Transition model (observable envt)  | Successor-state axioms   | Action outcomes       | Available actions; argument types
Utility-based patient monitor    | Physiology/sensor model             | Dynamic Bayesian network | Observation sequences | Gen. physiology; sensor design
Satellite image pixel classifier | Classifier (policy)                 | Markov random field      | Partial labels        | Coastline; continuity scales
Supervised learning: correct answers for each training instance
Reinforcement learning: reward sequence, no correct answers
Unsupervised learning: “just make sense of the data”
Supervised learning
• To learn an unknown target function f
• Input: a training set of labeled examples (xj,yj)
where yj = f(xj)
• E.g., xj is an image, f(xj) is the label “giraffe”
• E.g., xj is a seismic signal, f(xj) is the label “explosion”
• Output: hypothesis h that is “close” to f, i.e., predicts
well on unseen examples (“test set”)
• Many possible hypothesis families for h
– Linear models, logistic regression, neural networks, decision trees, examples (nearest-neighbor), grammars, kernelized separators, etc.
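
To make this concrete, here is a minimal sketch of the supervised-learning loop in Python (not from the original slides; the tiny data set and the choice of a nearest-neighbor hypothesis are illustrative assumptions):

import numpy as np

def nearest_neighbor(X_train, y_train):
    """Hypothesis h: predict the label of the closest training example."""
    def h(x):
        j = np.argmin(np.linalg.norm(X_train - x, axis=1))
        return y_train[j]
    return h

# Made-up labeled training pairs (x_j, y_j) with y_j = f(x_j)
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y_train = np.array(["llama", "llama", "giraffe", "giraffe"])
# Held-out test set for judging whether h is "close" to f
X_test = np.array([[0.5, 0.2], [4.5, 4.5]])
y_test = np.array(["llama", "giraffe"])

h = nearest_neighbor(X_train, y_train)
test_error = np.mean([h(x) != y for x, y in zip(X_test, y_test)])
print("test error:", test_error)

Any of the hypothesis families above could replace the nearest-neighbor h; the fit-then-test skeleton stays the same.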
Example: object recognition

[Figure: training images x with labels f(x) = "giraffe" or "llama"; given a new image X, predict f(X) = ?]
Example: curve fitting

[Figures: successive curve fits to the same data points; not preserved.]
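
In place of the missing figures, a small sketch of least-squares curve fitting on synthetic data (the degrees and data are my choices), showing how training error falls as hypothesis complexity grows — the trade-off behind the questions on the next slide:

import numpy as np

# Noisy samples of an unknown target function (synthetic)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Fit polynomials of increasing degree: better fit, more complexity
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)               # least-squares fit
    sse = np.sum((np.polyval(coeffs, x) - y) ** 2)  # training squared error
    print(f"degree {degree}: training squared error = {sse:.3f}")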
Basic questions
• Which hypothesis space H to choose?
• How to measure degree of fit?
• How to trade off degree of fit vs. complexity?
– “Ockham’s razor”
• How do we find a good h?
• How do we know if a good h will predict well?
Philosophy of Science (Physics)
• Which hypothesis space H to choose?
– Deterministic hypotheses, usually mathematical formulas
and/or logical sentences; implicit relevance determination
• How to measure degree of fit?
– Ideally, h will be consistent with data
• How to trade off degree of fit vs. complexity?
– Theory must be correct up to “experimental error”
• How do we find a good h?
– Intuition, imagination, inspiration (invent new terms!!)
• How do we know if a good h will predict well?
– Hume’s Problem of Induction: most philosophers give up
Kolmogorov complexity (also MDL, MML)
• Which hypothesis space H to choose?
– All Turing machines (or programs for a UTM)
• How to measure degree of fit?
– Fit is perfect (program has to output data exactly)
• How to trade off degree of fit vs. complexity?
– Minimize size of program
• How do we find a good h?
– Undecidable (unless we bound time complexity of h)
• How do we know if a good h will predict well?
– (recent theory borrowed from PAC learning)
Classical stats/ML: Minimize loss function
• Which hypothesis space H to choose?
– E.g., linear combinations of features: hw(x) = wTx
• How to measure degree of fit?
– Loss function, e.g., squared error Σj (yj – wTxj)2
• How to trade off degree of fit vs. complexity?
– Regularization: complexity penalty, e.g., ||w||2
• How do we find a good h?
– Optimization (closed-form, numerical); discrete search
• How do we know if a good h will predict well?
– Try it and see (cross-validation, bootstrap, etc.)
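
A minimal sketch of this recipe (synthetic data; assumptions mine): linear hypotheses hw(x) = wTx, squared-error loss, ||w||2 regularization, and the closed-form optimum w = (XᵀX + λI)⁻¹Xᵀy:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                  # feature vectors x_j
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

lam = 0.1                                      # regularization strength
# Minimize sum_j (y_j - w^T x_j)^2 + lam * ||w||^2 in closed form
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print("estimated w:", w_hat)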
Probabilistic: Max. likelihood, max. a posteriori
• Which hypothesis space H to choose?
– Probability model P(y | x,h) , e.g., Y ~ N(wTx,σ2)
• How to measure degree of fit?
– Data likelihood Πj P(yj | xj,h)
• How to trade off degree of fit vs. complexity?
– Regularization or prior: argmaxh P(h) Πj P(yj | xj,h) (MAP)
• How do we find a good h?
– Optimization (closed-form, numerical); discrete search
• How do we know if a good h will predict well?
– Empirical process theory (generalizes Chebyshev, CLT, PAC…); key assumption is i.i.d. data
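
A sketch of the connection to the previous slide (data and prior parameters are made up): with Y ~ N(wTx, σ2), the negative log-likelihood is squared error up to constants, and a Gaussian prior w ~ N(0, τ2I) makes the MAP estimate exactly the regularized least-squares fit with λ = σ2/τ2:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
w_true = np.array([1.0, -2.0])
sigma = 0.3
y = X @ w_true + rng.normal(scale=sigma, size=50)

def neg_log_likelihood(w):
    # -log prod_j N(y_j | w^T x_j, sigma^2) = squared error/(2 sigma^2) + const
    return np.sum((y - X @ w) ** 2) / (2 * sigma ** 2)

tau = 1.0                                  # prior std dev on each weight
lam = sigma ** 2 / tau ** 2                # equivalent regularization strength
w_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print("MAP estimate:", w_map, "NLL:", neg_log_likelihood(w_map))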
Bayesian: Computing posterior over H
• Which hypothesis space H to choose?
– All hypotheses with nonzero a priori probability
• How to measure degree of fit?
– Data probability, as for MLE/MAP
• How to trade off degree of fit vs. complexity?
– Use prior, as for MAP
• How do we find a good h?
– Don’t! Bayes predictor P(y|x,D) = Σh P(y|x,h) P(h|D) ∝ Σh P(y|x,h) P(D|h) P(h)
• How do we know if a good h will predict well?
– Silly question! Bayesian prediction is optimal!!
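
A toy sketch of the Bayes predictor over a small discrete hypothesis space (coin biases; the prior and data are made up): no single h is chosen, and the prediction averages over all hypotheses weighted by their posterior:

import numpy as np

# Hypotheses: coin bias h in {0.1, ..., 0.9}, uniform prior P(h)
H = np.linspace(0.1, 0.9, 9)
prior = np.full(len(H), 1.0 / len(H))

D = [1, 1, 0, 1]                                        # observed flips (made up)
heads = sum(D)
likelihood = H ** heads * (1 - H) ** (len(D) - heads)   # P(D | h)

posterior = prior * likelihood
posterior /= posterior.sum()                            # P(h | D)

# Bayes predictor: P(y=1 | D) = sum_h P(y=1 | h) P(h | D)
print("P(next flip = heads | D) =", np.sum(H * posterior))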
Neon sculpture at Autonomy Corp.
Lots of data
• Web: estimated Google index 45 billion pages
• Clickstream data: 10-100 TB/day
• Transaction data: 5-50 TB/day
• Satellite image feeds: ~1 TB/day/satellite
• Sensor networks/arrays
– CERN Large Hadron Collider ~100 petabytes/day
• Biological data: 1-10 TB/day/sequencer
• TV: 2 TB/day/channel; YouTube 4 TB/day uploaded
• Digitized telephony: ~100 petabytes/day
Real data are messy
[Figure: arterial blood pressure (high/low/mean), 1 s samples; an example of messy real data.]
Application: satellite image analysis
Application: Discovering DNA motifs
...TTGGAACAACCATGCACGGTTGATTCGTGCCTGTGACCGCGCGCCTCACACGGAAGACGCAGCCACCGGTTGTGATG
TCATAGGGAATTCCCCATGTCGTGAATAATGCCTCGAATGATGAGTAATAGTAAAACGCAGGGGAGGTTCTTCAGTAGTA
TCAATATGAGACACATACAAACGGGCGTACCTACCGCAGCTCAAAGCTGGGTGCATTTTTGCCAAGTGCCTTACTGTTAT
CTTAGGACGGAAATCCACTATAAGATTATAGAAAGGAAGGCGGGCCGAGCGAATCGATTCAATTAAGTTATGTCACAAGG
GTGCTATAGCCTATTCCTAAGATTTGTACGTGCGTATGACTGGAATTAATAACCCCTCCCTGCACTGACCTTGACTGAAT
AACTGTGATACGACGCAAACTGAACGCTGCGGGTCCTTTATGACCACGGATCACGACCGCTTAAGACCTGAGTTGGAGTT
GATACATCCGGCAGGCAGCCAAATCTTTTGTAGTTGAGACGGATTGCTAAGTGTGTTAACTAAGACTGGTATTTCCACTA
GGACCACGCTTACATCAGGTCCCAAGTGGACAACGAGTCCGTAGTATTGTCCACGAGAGGTCTCCTGATTACATCTTGAA
GTTTGCGACGTGTTATGCGGATGAAACAGGCGGTTCTCATACGGTGGGGCTGGTAAACGAGTTCCGGTCGCGGAGATAAC
TGTTGTGATTGGCACTGAAGTGCGAGGTCTTAAACAGGCCGGGTGTACTAACCCAAAGACCGGCCCAGCGTCAGTGA...
Application: User website behavior from
clickstream data (from P. Smyth, UCI)
128.195.36.195, -, 3/22/00, 10:35:11, W3SVC, SRVR1, 128.200.39.181, 781, 363, 875, 200, 0, GET, /top.html, -,
128.195.36.195, -, 3/22/00, 10:35:16, W3SVC, SRVR1, 128.200.39.181, 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.195, -, 3/22/00, 10:35:17, W3SVC, SRVR1, 128.200.39.181, 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.195.36.101, -, 3/22/00, 16:18:50, W3SVC, SRVR1, 128.200.39.181, 60, 425, 72, 304, 0, GET, /top.html, -,
128.195.36.101, -, 3/22/00, 16:18:58, W3SVC, SRVR1, 128.200.39.181, 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.101, -, 3/22/00, 16:18:59, W3SVC, SRVR1, 128.200.39.181, 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:54:37, W3SVC, SRVR1, 128.200.39.181, 140, 199, 875, 200, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:07, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:39, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:03, W3SVC, SRVR1, 128.200.39.181, 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:56:04, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:33, W3SVC, SRVR1, 128.200.39.181, 0, 262, 72, 304, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:56:52, W3SVC, SRVR1, 128.200.39.181, 19598, 382, 414, 200, 0, POST, /spt/main.html, -,
[Figure: the log above grouped into per-user sequences of page-category visits (User 1 – User 5, …); the exact sequences are not cleanly preserved.]
Application: social network analysis
HP Labs email data
500 users, 20k connections
evolving over time
Application: spam filtering
• 200 billion spam messages sent per day
• Asymmetric cost of false positive/false negative
• Weak label: discarded without reading
• Strong label (“this is spam”) hard to come by
• Standard iid assumption violated: spammers alter spam generators to evade or subvert spam filters (“adversarial learning” task)
Learning

[Diagram: data and prior knowledge feed into Learning, which produces knowledge; that knowledge can in turn serve as prior knowledge for later learning.]

Crucial open problem: weak intermediate forms of knowledge
that support future generalizations
Example – arriving at São Paulo, Brazil

Bem-vindo! (“Welcome!”)

[Figures not preserved.]
Weak prior knowledge
• In this case, people in a given country (and city)
tend to speak the same language
• Where did this knowledge come from?
– Experience with other countries
– “Common sense” – i.e., knowledge of how societies and
languages work
• And where did that knowledge come from?
Knowledge? What is knowledge?
All I know is samples!! [V. Vapnik]
• All knowledge derives, directly or indirectly, from
experience of individuals
• Knowledge serves as a directly applicable shorthand
for all that experience – better than requiring
constant review of the entire sensory/evolutionary
history of the human race
CTBT: Comprehensive Nuclear-Test-Ban Treaty
• Bans testing of nuclear weapons on earth
– Allows for outside inspection of 1,000 km²
• 182/195 states have signed
• 153/195 have ratified
• Need 9 more ratifications including US, China
• US Senate refused to ratify in 1998
– “too hard to monitor”
2053 nuclear explosions
254 monitoring stations
The problem
• Given waveform traces from all seismic stations,
figure out what events occurred when and where
• Traces at each sensor station may be preprocessed
to form “detections” (90% are not real)
ARID      ORID     STA   PH  BEL   DELTA      SEAZ       ESAZ       TIME          TDEF  AZRES        ADEF  SLORES      SDEF  WGT                 VMODEL  LDDATE
49392708  5295499  WRA   P   -1.0  23.673881  342.00274  163.08123  0.19513991    d     -1.2503497   d     0.24876981  d     -999.0  0.61806399  IASP    2009-04-02 12:54:27
49595064  5295499  FITZ  P   -1.0  20.835616  4.3960142  184.18581  1.2515257     d     2.7290018    d     5.4541182   n     -999.0  0.46613527  IASP    2009-04-02 12:54:27
49674189  5295499  MKAR  P   -1.0  58.574266  124.26633  325.35514  -0.053738765  d     -4.6295428   d     1.5126035   d     -999.0  0.76750542  IASP    2009-04-02 12:54:27
49674227  5295499  ASAR  P   -1.0  27.114852  345.18433  166.42383  -0.71255454   d     -6.4901126   d     0.95510033  d     -999.0  0.66453657  IASP    2009-04-02 12:54:27
What do we know?
• Events happen randomly; each has a time, location,
depth, magnitude; seismicity varies with location
• Seismic waves of many kinds (“phases”) travel
through the Earth
– Travel time and attenuation depend on phase and
source/destination
• Arriving waves may or may not be detected,
depending on sensor and local noise environment
• Local noise may also produce false detections
# SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
IsEarthQuake(e) ~ Bernoulli(.999);
EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()
                   Else UniformEarthDistribution();
Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
#Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
#Arrivals(event=e, site=s) = If IsDetected(e,s) then 1 else 0;
Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)
          else IASPEI(EventLocation(event(a)), SiteLocation(site(a)), Phase(a)) + TimeRes(a);
TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
             else GeoAzimuth(EventLocation(event(a)), SiteLocation(site(a))) + AzRes(a);
AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
Slow(a) ~ If (event(a) = null) then Uniform(0,20)
          else IASPEI-SLOW(EventLocation(event(a)), SiteLocation(site(a))) + SlowRes(site(a));
Learning with prior knowledge
• Instead of learning a mapping from detection
histories to event bulletins, learn local pieces of an
overall structured model:
– Event location prior (A6)
– Predictive travel time model (A1)
– Phase type classifier (A2)
Event location prior (A6)
Travel time prediction (A1)
How long does it take for a seismic signal to get from A to B? This is the travel time T(A,B). If we know this accurately, and we know the arrival times t1, t2, t3, … at several stations B1, B2, B3, …, we can find an accurate estimate of the location A and time t for the event, such that
– T(A,Bi) ≈ ti – t for all i
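
A sketch of that inversion (assumptions mine: 2-D geometry, a constant wave speed standing in for the real travel-time model, scipy's generic optimizer): search for the (A, t) minimizing Σi (T(A,Bi) − (ti − t))2:

import numpy as np
from scipy.optimize import minimize

SPEED = 6.0                                    # km/s, stand-in for a real T(A,B)
stations = np.array([[0.0, 100.0], [80.0, -20.0], [-60.0, 40.0]])  # B_i (km)

def travel_time(A, B):
    return np.linalg.norm(A - B) / SPEED       # T(A,B) under constant speed

# Synthetic "observed" arrivals from a true event at (10, 20) km, origin time 5 s
true_A, true_t = np.array([10.0, 20.0]), 5.0
arrivals = np.array([true_t + travel_time(true_A, B) for B in stations])

def misfit(params):
    A, t = params[:2], params[2]
    return sum((travel_time(A, B) - (ti - t)) ** 2
               for B, ti in zip(stations, arrivals))

result = minimize(misfit, x0=np.zeros(3))      # estimate location A and time t
print("estimated location:", result.x[:2], "origin time:", result.x[2])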
Earth 101
Seismic “phases” (wave types/paths)
Seismic energy is emitted in different types of waves; there are also qualitatively distinct paths (e.g., direct vs. reflected from the surface vs. refracted through the core). P and S are the direct waves; P is faster.
IASP91 reference velocity model
Spherically symmetric, Vphase(depth); from this, obtain Tpredicted(A,B).
IASP91 inaccuracy is too big!
• Earth is inhomogeneous: variations in crust
thickness and rock properties (“fast” and “slow”)
Travel time residuals (Tactual – Tpredicted)
• Residual surface (wrt a particular station) is locally
smooth; estimate by local regression
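
One way such a local regression might look (the Gaussian kernel, bandwidth, and data below are illustrative assumptions, not the project's actual estimator): predict the residual at a query location as a distance-weighted average of nearby observed residuals.

import numpy as np

def local_residual(query, locations, residuals, bandwidth=5.0):
    """Kernel-weighted average of observed residuals near `query`."""
    d = np.linalg.norm(locations - query, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)    # Gaussian kernel weights
    return np.sum(w * residuals) / np.sum(w)

# Made-up observations: event location (lon, lat) -> T_actual - T_predicted (s)
locations = np.array([[10.0, 45.0], [12.0, 44.0], [11.0, 46.0]])
residuals = np.array([1.2, 0.8, 1.0])
print(local_residual(np.array([11.0, 45.0]), locations, residuals))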