russell-ctbt-llnl08 - Computer Science Division

Machine learning, probabilistic modelling
Stuart Russell
Computer Science Division, UC Berkeley
1
Outline
 Some basic aspects of machine learning
 Example: detecting artifacts in ICU data
 Example: probabilistic data association
 Multitarget tracking
 Freeway traffic
 CiteSeer
 Sibyl attacks on recommender systems
2
Machine learning: model-free
[Diagram: data → Learning → hypothesis]
3
Model-free learning contd.
 Supervised learning
 Input: (x1, f(x1)), …, (xn, f(xn))
 (many possible input and label spaces)
 Output: h ≈ f
 E.g., f classifies xi as earthquake/explosion
 Unsupervised learning
 Input: x1, … xn
 Output: clustering of inputs into categories
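As a rough illustration of the supervised setting above, the sketch below builds a hypothesis h from labeled pairs (xi, f(xi)) using a nearest-neighbor rule. The two-dimensional feature vectors and the function name `fit_nearest_neighbor` are invented for illustration; they are not from the talk.

```python
from math import dist

def fit_nearest_neighbor(examples):
    """Return a hypothesis h built from labeled pairs (x_i, f(x_i))."""
    def h(x):
        # Predict the label of the closest training input (Euclidean distance).
        _, nearest_label = min(examples, key=lambda pair: dist(pair[0], x))
        return nearest_label
    return h

# Made-up feature vectors, purely illustrative (not real seismic data).
training = [
    ((0.2, 0.9), "earthquake"),
    ((0.3, 0.8), "earthquake"),
    ((0.9, 0.1), "explosion"),
    ((0.8, 0.2), "explosion"),
]

h = fit_nearest_neighbor(training)
print(h((0.25, 0.85)))
print(h((0.85, 0.15)))
```

Nearest neighbor is only one possible hypothesis class; the choice depends on the application and the form of the data.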
4
Model-free learning contd.
 Application, form of data influence choice of hypothesis class H
  Linear models, logistic regression
  Decision trees (classification or regression)
  Nonparametric (instance-based)
  Kernel methods
   effectively linear separators in a transformed high-dimensional input space
  Probabilistic grammars for strings
  Etc.
5
Model-based learning
[Diagram: prior knowledge and data → Learning → knowledge]
6
Bayesian model-based learning
 Generative approach
  P(world) describes prior over what is (source), also over model parameters, structure
  P(signal | world) describes sensor model (channel)
  Given new signal, compute P(world | signal)
 Learning
  Posterior over parameters (or structure) given data
  Or use maximum a posteriori, maximum likelihood
 Substantial advances in modeling capabilities, general-purpose inference algorithms
 Applications with millions of parameters, gigabytes of data are fairly routine
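The step "given new signal, compute P(world | signal)" is Bayes' rule. A minimal sketch for a discrete world follows; the two worlds, the two signal values, and all probabilities are made-up numbers chosen only to show the arithmetic.

```python
def posterior(prior, likelihood, signal):
    """Compute P(world | signal) from P(world) and P(signal | world)."""
    # Numerator of Bayes' rule for each candidate world.
    unnormalized = {w: prior[w] * likelihood[w][signal] for w in prior}
    evidence = sum(unnormalized.values())  # P(signal)
    return {w: p / evidence for w, p in unnormalized.items()}

# Illustrative numbers only (assumptions, not from the talk).
prior = {"earthquake": 0.9, "explosion": 0.1}            # P(world)
likelihood = {                                           # P(signal | world)
    "earthquake": {"sharp_onset": 0.2, "gradual_onset": 0.8},
    "explosion": {"sharp_onset": 0.7, "gradual_onset": 0.3},
}

post = posterior(prior, likelihood, "sharp_onset")
map_world = max(post, key=post.get)  # maximum a posteriori world
print(post, map_world)
```

With these numbers the posterior on "earthquake" is 0.18 / (0.18 + 0.07) = 0.72, so the prior still dominates even though the signal favors "explosion".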
8
9
Artifact events ubiquitous
10
Blood pressure signals
11
Artifact events
 Goal: detect, categorize, and correct for artifacts in blood pressure signal
12
Generative model
 Parameters for event duration, frequency trained on small sample of one-second data
 Detection uses equivalent one-minute model based on measurement and artifact processes
13
ALARM
14
Example: classical data association
15
Generative model
 World = aircraft, trajectories, blip associations
#Aircraft ~ NumAircraftPrior();
State(a, t)
    if t = 0 then ~ InitState()
    else ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t)
    ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t)
    ~ NumFalseAlarmsPrior();
ApparentPos(r)
    if (Source(r) = null) then ~ FalseAlarmDistrib()
    else ~ ObsCPD(State(Source(r), Time(r)));
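One way to read the model above is as a recipe for sampling a possible world. The Python sketch below mirrors its structure with stand-in distributions: Poisson counts, a one-dimensional random-walk state, at most one detection per aircraft per step, and Gaussian observation noise. All of those choices and every numeric constant are assumptions made for illustration; the talk does not specify them.

```python
import math
import random

def sample_poisson(rng, mean):
    """Draw from a Poisson distribution with Knuth's method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_world(num_steps, seed=0):
    rng = random.Random(seed)
    num_aircraft = sample_poisson(rng, 3.0)       # stand-in for NumAircraftPrior
    states = {}                                   # (aircraft, time) -> 1-D position
    blips = []                                    # (time, source or None, apparent position)
    for a in range(num_aircraft):
        for t in range(num_steps):
            if t == 0:
                states[a, t] = rng.uniform(0.0, 100.0)                 # InitState
            else:
                states[a, t] = states[a, t - 1] + rng.gauss(1.0, 0.5)  # StateTransition
            if rng.random() < 0.9:                # NumDetectionsCPD: 0 or 1 blip
                blips.append((t, a, states[a, t] + rng.gauss(0.0, 1.0)))  # ObsCPD
    for t in range(num_steps):
        for _ in range(sample_poisson(rng, 1.0)): # stand-in for NumFalseAlarmsPrior
            blips.append((t, None, rng.uniform(0.0, 100.0)))  # FalseAlarmDistrib
    return num_aircraft, states, blips

num_aircraft, states, blips = sample_world(num_steps=5, seed=1)
print(num_aircraft, len(blips))
```

A tracker sees only the times and apparent positions of the blips; the number of aircraft, their states, and each blip's source are the hidden parts of the world to be inferred.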
21
Aircraft Tracking Results
[Oh et al., CDC 2004] (simulated data)
 MCMC has smallest error, hardly degrades at all as tracks get dense
 MCMC is nearly as fast as greedy algorithm; much faster than MHT
[Figures by Songhwai Oh]
22
Extending the Model: Air Bases
#Aircraft(InitialBase = b) ~ InitialAircraftPerBasePrior();
CurBase(a, t)
    if t = 0 then = InitialBase(a)
    elseif TakesOff(a, t-1) then = null
    elseif Lands(a, t-1) then = Dest(a, t-1)
    else = CurBase(a, t-1);
InFlight(a, t) = (CurBase(a, t) = null);
TakesOff(a, t)
    if !InFlight(a, t) then ~ Bernoulli(0.1);
Lands(a, t)
    if InFlight(a, t) then
        ~ LandingCPD(State(a, t), Location(Dest(a, t)));
Dest(a, t)
    if TakesOff(a, t) then ~ Uniform({Base b})
    elseif InFlight(a, t) then = Dest(a, t-1);
State(a, t)
    if TakesOff(a, t-1) then
        ~ InitState(Location(CurBase(a, t-1)))
    elseif InFlight(a, t) then
        ~ StateTrans(State(a, t-1), Location(Dest(a, t)));
23
Unknown Air Bases
 Just add two more lines:
#AirBase ~ NumBasesPrior();
Location(b) ~ BaseLocPrior();
24
Example: traffic surveillance
Multiple distributed sensors
Uncertain, time-varying travel time
Prediction error >>> object separation
25
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents,
Yezdi Lashkari, Max Metral, and Pattie Maes,
Proceedings of the Twelfth National Conference on
Articial Intelligence, MIT Press, Cambridge, MA,
1994.
Metral M. Lashkari, Y. and P. Maes. Collaborative
interface agents. In Conference of the American
Association for Artificial Intelligence, Seattle,
WA, August 1994.
Are these descriptions of the same object?
Core task in CiteSeer, Google Scholar
26
(Simplified) BLOG model
#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))),
                               Title(PubCited(c)));
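Read generatively, the model above says that citations pointing to the same paper may come out as different strings. The sketch below samples from a toy version of it. The name list, the title vocabulary, the count ranges, and the `noisy_format` corruption rule are all hypothetical stand-ins for the priors and the citation grammar, and Position(r) is ignored.

```python
import random

def noisy_format(rng, name, title):
    """Stand-in for NoisyCitationGrammar: vary layout, sometimes drop one character."""
    text = rng.choice([f"{name}. {title}.", f"{title}, {name}"])
    if rng.random() < 0.3:
        i = rng.randrange(len(text))
        text = text[:i] + text[i + 1:]
    return text

def sample_citations(num_citations, seed=0):
    rng = random.Random(seed)
    names = ["A. Author", "B. Writer", "C. Scholar"]             # NamePrior stand-in
    title_words = ["learning", "agents", "models", "inference"]  # TitlePrior stand-in
    researchers = [rng.choice(names) for _ in range(rng.randint(1, 3))]
    papers = []                                  # (first author index, title)
    for r in range(len(researchers)):
        for _ in range(rng.randint(1, 2)):       # NumPapersPrior stand-in
            papers.append((r, " ".join(rng.sample(title_words, 2))))
    citations = []                               # (index of cited paper, observed text)
    for _ in range(num_citations):
        p = rng.randrange(len(papers))           # PubCited(c) ~ Uniform({Paper p})
        r, title = papers[p]
        citations.append((p, noisy_format(rng, researchers[r], title)))
    return researchers, papers, citations

researchers, papers, citations = sample_citations(6, seed=2)
for paper_index, text in citations:
    print(paper_index, text)
```

Citation matching runs this process in reverse: only the text strings are observed, and the paper index attached to each citation is what inference has to recover.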
27
Citation Matching Results
[Chart: error (fraction of clusters not recovered correctly), y-axis from 0 to 0.25, on the Reinforce, Face, Reason, and Constraint data sets, comparing Phrase Matching [Lawrence et al. 1999], Generative Model + MCMC [Pasula et al. 2002], and Conditional Random Field [Wellner et al. 2004]]
Four data sets of ~300-500 citations, referring to ~150-300 papers
28
Example: Sibyl attacks
 Typically between 100 and 10,000 real entities
 About 90% are honest, have one identity
 Dishonest entities own between 10 and 1000 identities
 Transactions may occur between identities
  If two identities are owned by the same entity (sibyls), then a transaction is highly likely;
  Otherwise, transaction is less likely (depending on honesty of each identity's owner).
 An identity may recommend another after a transaction:
  Sibyls with the same owner usually recommend each other;
  Otherwise, probability of recommendation depends on the honesty of the two entities.
29
#Entity ~ LogNormal[6.9, 2.3]();
Honest(x) ~ Boolean[0.9]();
#Identity(Owner = x) ~
    if Honest(x) then 1 else LogNormal[4.6, 2.3]();
Transaction(x, y) ~
    if Owner(x) = Owner(y) then SibylPrior()
    else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
Recommends(x, y) ~
    if Transaction(x, y) then
        if Owner(x) = Owner(y) then Boolean[0.99]()
        else RecPrior(Honest(Owner(x)), Honest(Owner(y)));
Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
Query: Honest(x)
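To make the model concrete, the sketch below forward-samples a small population from it. Several things here are assumptions rather than facts from the talk: the LogNormal parameters are read as the mean and standard deviation of the log, population sizes are capped so the example stays small, and the transaction and recommendation probabilities standing in for SibylPrior, TransactionPrior, and RecPrior are invented.

```python
import random

def sample_population(rng, max_entities=30, max_identities=10):
    # #Entity ~ LogNormal[6.9, 2.3], capped here to keep the sketch small.
    num_entities = min(max_entities, max(1, round(rng.lognormvariate(6.9, 2.3))))
    honest = [rng.random() < 0.9 for _ in range(num_entities)]   # Honest(x) ~ Boolean[0.9]
    owner = []                                   # owner[i] = entity owning identity i
    for x in range(num_entities):
        if honest[x]:
            count = 1                            # honest entities have one identity
        else:
            count = min(max_identities, max(1, round(rng.lognormvariate(4.6, 2.3))))
        owner.extend([x] * count)
    return honest, owner

def sample_links(rng, honest, owner):
    transactions, recommends = set(), set()
    for i in range(len(owner)):
        for j in range(len(owner)):
            if i == j:
                continue
            same_owner = owner[i] == owner[j]
            both_honest = honest[owner[i]] and honest[owner[j]]
            # Invented stand-ins for SibylPrior and TransactionPrior.
            p_transaction = 0.8 if same_owner else (0.05 if both_honest else 0.02)
            if rng.random() < p_transaction:
                transactions.add((i, j))
                # Boolean[0.99] for sibyls; invented stand-in for RecPrior otherwise.
                p_recommend = 0.99 if same_owner else (0.5 if both_honest else 0.1)
                if rng.random() < p_recommend:
                    recommends.add((i, j))
    return transactions, recommends

rng = random.Random(1)
honest, owner = sample_population(rng)
transactions, recommends = sample_links(rng, honest, owner)
print(len(honest), len(owner), len(transactions), len(recommends))
```

Answering the query Honest(x) means inverting this process: the transactions and recommendations are observed, while ownership and honesty are hidden.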
30
Summary
 Generative approach to machine learning
 Can accommodate
 strong prior knowledge
 heterogeneous data
 noise, artifacts
 Vertically integrated probability models (not pipeline) connect events, transmission, detection, association
31