COMP 538 (Reasoning and Decision under Uncertainty)
Introduction to Bayesian Networks
Nevin L. Zhang
Bayesian Networks

Bayesian networks

Are networks of random variables

Are a marriage between probability theory and graph theory.

Represent conditional independence
– A random variable is directly related to only a few neighboring variables.
– It is independent of all other variables given the neighboring variables (sketched below).

Facilitate the application of probability theory to many problems in AI,
Applied Mathematics, Statistics, and Engineering that
– Are complex
– Involve uncertainty.
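
A minimal formal sketch of this independence property (the notation ne(X) for the neighbors of X is ours, not from the slides):

```latex
% Local Markov property (sketch): given its neighbors,
% a variable is independent of every other variable in the network.
P\bigl(X \,\big|\, \text{all other variables}\bigr)
  \;=\; P\bigl(X \,\big|\, \mathrm{ne}(X)\bigr)
```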
Probability Theory & Uncertainty in AI

Bayesian networks have been developed in the AI community as a
tool to build intelligent reasoning systems, in particular expert
systems.


We next provide a brief historical account of this development.
Prior to 1980, intelligent reasoning systems were based on symbolic
logic.

To tackle uncertainty, numerical tags were attached to if-then rules.

Sometimes, the numbers were interpreted probabilistically (MYCIN
(Buchanan et al 1984), PROSPECTOR).

The probabilistic interpretation was not justified, because the numbers
were not manipulated according to the principles of probability theory.

Rule-based systems

Uncertainty associated with rules: summarizes exceptions
– Consider rule

If the ground is wet, then it rained.
– Exceptions: sprinkler was on, water truck leaked, water pipe burst, …
– In general, there are too many exceptions to enumerate. They are summarized by a weight:


If the ground is wet, then it rained (0.8).
Application of rule “if A then B”:
– If you see A in the knowledge base, then conclude B
– (Locality) regardless of what else is in the knowledge base
– (Detachment) regardless of how A was derived.

Problem with locality (Pearl 1988)
– Rule 1: If ground wet, then it rained (0.8).
– If we see wet ground, we conclude that it rained with probability 0.8.
– But what if, somewhere in the knowledge base, there is the
sentence “Sprinkler on last night”?

Problem with detachment (Pearl 1988; the sketch below reproduces this chain)
– Rule 2: If Sprinkler on (last night), then ground wet (this morning).
– We know: Sprinkler on
– Using rule 2: Ground wet
– Using rule 1 (Detachment here): It rained.
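
A toy forward-chaining sketch (hypothetical Python, with an assumed weight for Rule 2; not any real expert system's machinery) that reproduces this faulty chain:

```python
# Toy rule firing with certainty weights. A rule fires whenever its
# antecedent is in the knowledge base, ignoring everything else that is
# known (locality) and how the antecedent was derived (detachment).
rules = [
    ("sprinkler_on", "ground_wet", 0.9),   # Rule 2 (weight assumed)
    ("ground_wet", "rained", 0.8),         # Rule 1
]

kb = {"sprinkler_on": 1.0}                 # all we actually observed
changed = True
while changed:
    changed = False
    for antecedent, consequent, weight in rules:
        if antecedent in kb and consequent not in kb:
            kb[consequent] = kb[antecedent] * weight
            changed = True

print(kb)   # {'sprinkler_on': 1.0, 'ground_wet': 0.9, 'rained': 0.72}
```

The system concludes “it rained” with weight 0.72 even though the sprinkler fully explains the wet ground.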

Detachment also implies that there is no way to determine
whether two pieces of information originate from the same source
or from two independent sources (Pearl 1988, Henrion 1986).
– Analogy: Is Shanghai Disney larger than Hong Kong Disney?
[Figure: a single Shanghai businessman's statement feeds both a TV report and a newspaper report, which in turn both feed my belief, so the two reports are not independent sources.]

Rule-based systems can operate safely only in tree-structured
networks, and they can perform either diagnosis or prediction, but
not both (Shafer and Pearl 1990, Introduction to Chapter 5).

Classical logic does not suffer from these problems because a
truth value characterizes a logical formula itself rather than
summarizing its exceptions.

Model-based systems

The uncertainty measure is placed not on individual logical formulae but on sets of
possible worlds.

In the case of probability models, there is a probability distribution over
the sample space --- the Cartesian product of the state spaces of all
random variables; equivalently, a joint probability distribution over all the random variables.
– Well-known and well-understood framework for uncertainty
– Clear semantics
– Provides principled answers (see the sketch after this list) for:

Combining evidence

Predictive & Diagnostic reasoning

Belief update
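
For instance (a toy sketch; all numbers are invented for illustration), conditioning a joint model of the wet-ground story on the extra evidence “sprinkler was on” lowers the belief in rain, which is exactly what the locality principle could not do:

```python
from itertools import product

def prior(rain, sprinkler):
    """Hypothetical priors; rain and sprinkler assumed independent."""
    return (0.3 if rain else 0.7) * (0.2 if sprinkler else 0.8)

def p_wet(rain, sprinkler):
    """Hypothetical P(wet ground | rain, sprinkler)."""
    if rain and sprinkler:
        return 0.99
    if rain:
        return 0.90
    if sprinkler:
        return 0.80
    return 0.05

def posterior_rain(sprinkler_evidence=None):
    """P(rain | wet ground [, sprinkler]) by conditioning the joint."""
    num = den = 0.0
    for rain, sprinkler in product([True, False], repeat=2):
        if sprinkler_evidence is not None and sprinkler != sprinkler_evidence:
            continue
        w = prior(rain, sprinkler) * p_wet(rain, sprinkler)  # weight of this world
        den += w
        if rain:
            num += w
    return num / den

print(posterior_rain())       # P(rain | wet)               ~ 0.66
print(posterior_rain(True))   # P(rain | wet, sprinkler on) ~ 0.35
```

The same joint answers predictive and diagnostic queries alike, and belief update is just conditioning.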


Difficulties in applying probability theory:

Complexity of model construction

Complexity of problem solving

Both exponential in problem size: number of variables.
Example:

Patients in a hospital are described by several attributes:
– Background: age, gender, history of diseases, …
– Symptoms: fever, blood pressure, headache, …
– Diseases: pneumonia, heart attack, …

A joint probability distribution needs to assign a number to each combination of
values of these attributes, so model size is exponential (see the sketch below).
– 20 attributes require 2^20 ≈ 10^6 numbers
– Real applications usually involve hundreds of attributes

One of the reasons why probability theory did not play a significant role in AI
reasoning systems before 1980.
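
A throwaway sketch of this blow-up, using the attribute counts above:

```python
# Free parameters in a full joint table over n binary attributes: 2**n - 1.
for n in (10, 20, 100):
    print(f"{n} attributes -> {2**n - 1:.3e} numbers")
# 20 attributes already need about a million numbers;
# 100 attributes would need ~1.3e30, hopelessly beyond storage.
```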

The breakthrough came in the early 1980s (Pearl 1986,
1988; Howard & Matheson 1984)

In a joint probability, every variable is, in theory, directly related to
all other variables.

Pearl and others realized:
– It is often reasonable to make the (sometimes simplifying)
assumption that each variable is directly related to only a few other
variables.
– This leads to modularity: splitting a complex model and its
associated calculations into small, manageable pieces.

Example: Africa Visit (Lauritzen & Spiegelhalter 1988, modified)

Variables:
– Patient complaint: Dyspnea --- D,
– Q&A and Exam:

Visit-Africa --- A; Smoking --- S; X-Ray --- X
– Diagnosis


Lung Cancer --- L; Tuberculosis --- T; Bronchitis --- B
Assuming all variables are binary, the size of the joint probability model
P(A, S, T, L, B, D, X)
is 2^7 - 1 = 127


Reasonable assumptions:

X directly influenced by T & L; conditioned on T & L, it is independent of all other
variables.

D directly influenced by T, L, B; conditioned on T, L, B, it is independent of all
other variables.

A directly influences T.

Smoking directly influences L & B.
Break up the model P(A, S, T, L, B, X, D) into factors:
P(A), P(S), P(T|A), P(L|S), P(B|S),
P(TorL|T, L), P(X|TorL), P(D|TorL, B)

Total number of parameters (verified in the sketch below)
– 1 + 1 + 2 + 2 + 2 + 4 + 2 + 4 = 18
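
A minimal sketch (all variables binary; the dictionary encoding of parent sets is ours) that reproduces this count:

```python
# Parents of each variable in the factored Africa-visit model;
# "E" stands for the deterministic "T or L" node.
parents = {
    "A": [], "S": [],
    "T": ["A"], "L": ["S"], "B": ["S"],
    "E": ["T", "L"], "X": ["E"], "D": ["E", "B"],
}

def free_params(parent_list, card=2):
    """(card - 1) free numbers per row, one row per parent configuration."""
    rows = 1
    for _ in parent_list:
        rows *= card
    return (card - 1) * rows

total = sum(free_params(ps) for ps in parents.values())
print(total, "vs", 2**7 - 1)   # 18 vs 127 for the unfactored joint
```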

Modularity (conditional independence)

Simplifies model construction
– 18 parameters instead of 127
– The savings are more drastic in real-world applications.

Model construction, inference, and model learning become
possible for realistic applications
– Before: exponential in problem size --- the number of all variables.
– Now: exponential in the “number of neighbors” (more precisely, the size of the
largest clique; see the sketch below).
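
To see why, here is a sketch of computing P(d) in the Africa-visit model (writing E for “T or L”; the elimination order is our choice). Pushing each sum inside the product means no intermediate table ever involves more than two or three variables, instead of all seven:

```latex
% Brute force sums the full joint: 2^7 terms. With the sums pushed
% inward (and \sum_x P(x \mid e) = 1 dropped), every intermediate
% factor is small:
P(d) \;=\; \sum_{e,b} P(d \mid e,b)
           \sum_{s} P(s)\,P(b \mid s)
           \sum_{l} P(l \mid s)
           \sum_{t} P(e \mid t,l)
           \sum_{a} P(a)\,P(t \mid a)
```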

After the breakthrough

1980s
– Representation (graphical representation of conditional
independence)
– Inference (polytree propagation, clique tree propagation)

1990s
– Inference (Variable elimination, search, MCMC, variational methods,
special local structures, …)
– Learning (Parameter learning, structure learning, incomplete data,
latent variables, …)
– Sensitivity analysis, temporal models, causal models, ….
Impact of Bayesian Networks

From non-existence to prominence

Prior to 1980, probability theory had essentially no role in AI.

1980-1990: The breakthrough and much research activity.
– However, by 1990, “there is still no consensus on the theoretical and
practical role of probability in AI” (Shafer and Pearl 1990,
Introduction)

2002: The role of Bayesian networks in AI is so prominent that
the first invited talk at AAAI-2002 was entitled
– Probabilistic AI (M. Jordan)


Bayesian networks are now a major topic in influential textbooks on

AI (Russell & Norvig 1995, Artificial Intelligence: A Modern Approach)

Machine Learning (Mitchell 1997, Machine Learning)
They are also discussed in textbooks on

Data Mining (Hand et al 2001, Principles of Data Mining)

Pattern Recognition (Duda et al 2000, Pattern Classification)

Impact beyond AI

In statistics, Bayesian networks are
– Viewed as a kind of statistical model, just like regression models
– Called graphical models.
– Used for multivariate data analysis.

A side note: Bayesian networks vs. neural networks
– Bayesian networks: model the data generation process; more
interpretable.
– Neural networks: motivated by biological processes; less interpretable.

Bayesian networks provide a uniform framework to view a variety
of models in Statistics and Engineering:
– Hidden Markov models, mixture models, latent class models,
Kalman filters, factor analysis, and Ising models.

Books were written that
– Use Bayesian networks to explain algorithms in digital
communication, in particular data compression and channel coding
(Frey 1998, Graphical Models for Machine Learning and Digital
Communication)
– Draw connections between Bayesian networks and contemporary
cognitive psychology (Glymour 2001, The Mind’s Arrows: Bayes Nets
and Graphical Causal Models in Psychology)

Applications: too many to survey. Some random samples:

Medical diagnostic systems,

Real-time weapons scheduling,

Jet-engine fault diagnosis,

Intel processor fault diagnosis (Intel),

Generator monitoring expert system (General Electric),

Software troubleshooting (Microsoft Office assistant, Windows 98 print
troubleshooting),

Space shuttle engine monitoring (Vista project),

Biological sequence analysis and classification
Contents of This Course

This course is designed for graduate students in science and
engineering.

Our objective is to give in-depth coverage of what we deem the
core concepts, ideas, and results of Bayesian networks:

Concept and semantics of Bayesian networks

Representational issues: what can/cannot be represented

Inference: How to answer queries efficiently

Learning: How to adapt/learn Bayesian networks based on/from data

Special models: Hidden Markov models, Latent class models

Bayesian networks for classification and cluster analysis