ppt file

advertisement
Introduction to
Predictive Learning
LECTURE SET 1
INTRODUCTION and OVERVIEW
Electrical and Computer Engineering
1
OUTLINE of Set 1
1.1 Overview: what is this course about:
- subject matter
- philosophical connections
- prerequisites and HW1
- expected outcome of this course
1.2 Historical Perspective
1.3 Motivation for Empirical Knowledge
1.4 General Experimental Procedure for
Estimating Models from Data
2
1.1.1 Subject Matter
Uncertainty and Learning
• Decision making under uncertainty
• Biological learning (adaptation)
(examples and discussion)
• Induction, Statistics and Philosophy
Ex. 1: Many old men are bald
Ex. 2: Sun rises on the East every
day
3
(cont’d) Many old men are bald
•
Psychological Induction:
- inductive statement based on experience
- also has certain predictive aspect
- no scientific explanation
•
Statistical View:
- the lack of hair = random variable
- estimate its distribution (depending on age) from
past observations (training sample)
•
Philosophy of Science Approach:
- find scientific theory to explain the lack of hair
- explanation itself is not sufficient
- true theory needs to make non-trivial predictions
4
Conceptual Issues
• Any theory (or model) has two aspects:
1. explanation of past data (observations)
2. prediction of future (unobserved) data
• Achieving both goals perfectly not possible
• Important issues to be addressed:
- quality of explanation and prediction
- is good prediction possible at all ?
- if two models explain past data equally well, which one
is better?
- how to distinguish between true scientific and pseudoscientific theories?
5
Beliefs vs True Theories
Men have lower life expectancy than women
• Because they choose to do so
• Because they make more money (on
average) and experience higher stress
managing it
• Because they engage in risky activities
• Because …..
Demarcation problem in philosophy
6
1.1.2 Philosophical Connections
• Oxford English dictionary:
Induction is the process of inferring a
general law or principle from the
observations of particular instances.
• Clearly related to Predictive Learning.
• All science and (most of) human knowledge
involves induction
• How to form ‘good’ inductive theories?
7
Challenge of Predictive Learning
• Explain the past and predict the future
8
Background: philosophy
William of Ockham: entities should not
be multiplied beyond necessity
Epicurus of Samos: If more than one
theory is consistent with the
observations, keep all theories
9
Background: philosophy
Thomas Bayes:
How to update/ revise beliefs in
light of new evidence
Karl Popper: Every true
(inductive) theory prohibits
certain events or occurences,
i.e. it should be falsifiable
10
Background: philosophy
George W. Bush:
I am The Decider
11
Background: philosophy
Bill Clinton:
I told the Truth
12
1.1.3 Prerequisites and Hwk1
• Math: working knowledge of basic
Probability + Linear Algebra
• Statistical software (of your choice):
- MATLAB, also R-project, Mathematica etc.
Note: you will be using s/w implementations
of learning algorithm (not writing programs)
• Writing: no special requirements
• Philosophy: no special requirements
13
Homework 1
• Purpose: testing background on probability
and computer skills
• Goal: estimate pdf of a random variable X
• Real Data: X=daily price changes of SP500
i.e. X (t ) 
Z (t )  Z (t  1)
 100% where Z(t) = closing price
Z (t  1)
• Typical + Useful Statistics
- Histogram (empirical pdf)
- mean, standard deviation
14
(cont’d) Homework 1
Histogram = estimated pdf (from data)
• Example: histograms of 5 and 30 bins to model N(0,1)
also mean and standard deviation (estimated from data)
500
100
400
80
300
60
200
40
100
20
0
-3
-2
-1
0
1
2
3
4
0
-3
-2
-1
0
1
2
3
4
15
(cont’d) Homework 1
NOTE: histogram ~ empirical pdf, i.e. scale of y-axis scale is
in % (frequency).
See histogram of SP500 daily price changes in 1981:
1981
7.00%
6.00%
5.00%
4.00%
3.00%
2.00%
1.00%
2.00%
1.80%
1.60%
1.40%
1.20%
1.00%
0.80%
0.60%
0.40%
0.20%
0.00%
-0.20%
-0.40%
-0.60%
-0.80%
-1.00%
-1.20%
-1.40%
-1.60%
-1.80%
-2.00%
0.00%
16
Homework 1 (background)
• Is the stock market truly random?
- model daily price changes of SP500 as a
random coin toss process
- measure average trend duration, i.e.
UP, DOWN,UP, DOWN,…~ +1,-1,+1,-1,…
here average trend duration =1 day and the
process is not random
• Calculate average trend duration for random
coin toss process, compare it with the stock
market, and draw conclusions
17
1.1.4 Expected Outcome (of this course)
Scientific:
• Learning = generalization, concepts and issues
• Math theory: Statistical Learning Theory aka VC-theory
• Implications for Philosophy and Applications
Philosophical:
• Nature of human knowledge and intelligence
• Demarcation principle
• Psychology of learning
Practical:
• Financial engineering
• Security
• Genomics
• Predicting successful marriage, climate modeling etc., etc.
What is this course NOT about
18
OUTLINE of Set 1
1.1 Overview: what is this course about
1.2 Historical Perspective
Handling uncertainty and risk:
- probabilistic
vs - risk minimization
1.3 Motivation for Empirical Knowledge
1.4 General Experimental Procedure for
Estimating Models from Data
19
1.2.1Handling Uncertainty and Risk
• Ancient times
• Probability for quantifying uncertainty
- degree-of-belief
- frequentist (Cardano-1525, Pascale, Fermat)
• Newton and causal determinism
• Probability theory and statistics (20th century)
• Modern science (A. Einstein)
Goal of science: estimating a true model or
system identification
20
Handling Uncertainty and Risk(2)
• Making decisions under uncertainty
= risk management
• Probabilistic approach:
- estimate probabilities (of future events)
- assign costs and minimize expected risk
• Risk minimization approach:
- apply decisions to known past events
- select one minimizing expected risk
• Common in all living things: learning,
generalization
21
Human Generalization
•
•
All men by nature desire knowledge
- Aristotle
Example 1: continue given sequence
6, 10, 14, 18,….
Example 2:
Sceitnitss osbevred: it is nt inptrant how
lteters are msspled isnide the word. It is
ipmoratnt that the fisrt and lsat letetrs
do not chngae, tehn the txet is itneprted
corrcetly
22
Human Generalization
•
Example 3:
Emily Cherkassky
The Dictator
23
Scientific Example: Planetary Motions
• How planets move among the stars?
- Ptolemaic system (geocentric)
- Copernican system (heliocentric)
• Tycho Brahe (16 century)
- measure positions of the planets in the sky
- use experimental data to support one’s
view
• Johannes Kepler:
- used volumes of Tycho’s data to discover
three remarkably simple laws
24
First Kepler’s Law
• Sun lies in the plane of orbit, so we can
represent positions as (x,y) pairs
• An orbit is an ellipse, with the sun at a
focus
c1 x  c2 y  c3 xy  c4 x  c5 y  c6  0
2
2
25
Second Kepler’s Law
• The radius vector from the sun to the
planet sweeps out equal areas in the
same time intervals
26
Third Kepler’s Law
P
Mercury 0.24
Venus
0.62
Earth
1.00
Mars
1.88
Jupiter 11.90
Saturn 29.30
P = orbit period
D
0.39
0.72
1.00
1.53
5.31
9.55
P2
0.058
0.38
1.00
3.53
142.0
870.0
D3
0.059
0.39
1.00
3.58
141.00
871.00
D = orbit size (half-diameter)
For any two planets: P2 ~ D3
27
Empirical Scientific Theory
• Kepler’s Laws can
- explain experimental data
- predict new data (i.e., other planets)
- BUT do not explain why planets move.
• Popular explanation
- planets move because there are invisible
angels beating the wings behind them
• First principle scientific explanation
Galileo, Newton discovered laws of motion
and gravity that explain Kepler’s laws.
28
OUTLINE of Set 1
1.1 Overview: what is this course about
1.2 Historical Perspective
1.3 Motivation for Empirical Knowledge
- human (scientific) knowledge
- growth of empirical knowledge
- the nature of human knowledge
1.4 General Experimental Procedure for
Estimating Models from Data
29
1.3.1Human scientific knowledge
•
Knowledge is relationship between
facts and ideas (mental constructs)
• Two types of scientific knowledge:
- First-principles vs empirical
• Classical (first principles) knowledge:
- rich in ideas
- relatively few facts (amount of data)
- simple relationships
Examples/ discussion
30
1.3.2Growth of empirical knowledge
•
•
•
•
Huge growth of the amount of data in
20th century (computers and sensors)
Complex systems (engineering, life
sciences and social)
Classical first-principles science is
inadequate for empirical knowledge
Need for Data-Analytic Modeling:
How to estimate good predictive
models from noisy data
31
1.3.3 Nature of human knowledge
•
Three types of knowledge
- scientific (first-principles, deterministic)
- empirical
- metaphysical (beliefs)
•
Boundaries are poorly understood
32
More on Empirical Knowledge
Demarcation:
•
Empirical Knowledge vs Beliefs
Examples
•
Empirical vs First Principle Knowledge
Examples
33
Example: Empirical Knowledge vs Beliefs
From WSJ, January 13, 2007; Page B5, By Isabelle Lindemayer
Dollar Ends Three-Day Advance But Could Be Poised for Gains
"The dollar saw an after-the-fact
selloff following better-thanexpected retail-sales
numbers for December,"
Naomi Fink, currency
strategist at BNP Paribas,
said.
"More likely than not, investors felt
as though the dollar's threeday rally preceding the
release had compensated
adequately for the upside
surprise," she said
34
Empirical vs First Principle Knowledge
Compare:
•
•
•
First Principle Knowledge
Example: Newton’s Law of Gravity
Empirical Knowledge
Example: my wife’s successful betting
Question for Discussion:
What is the difference btwn the two
approaches to knowledge discovery?
35
Summary
• First-principles knowledge (taught at
school):
deterministic relationships between a few
concepts (variables)
• Importance of empirical knowledge:
- statistical in nature
- (possibly) many variables
• Goal of modeling: to act/perform well,
rather than system identification
36
1.4 General Experimental Procedure
1. Statement of the Problem
2. Hypothesis Formulation (Learning Problem
Statement)
3. Data Generation/ Experiment Design
4. Data Collection and Preprocessing
5. Model Estimation (learning)
6. Model Interpretion, Model Assessment and
Drawing Conclusions
Note:
- each step is complex and usually involves several
iterations
- estimated model depends on all previous steps
37
Honest Disclosure of Results
• Recall Tycho Brahe (16th century)
• Modern drug studies
Review of studies submitted to FDA
• Of 74 studies reviewed, 38 were
judged to be positive by the FDA.
All but one were published.
• Most of the studies found to
have negative or questionable
results were not published,
researchers found.
Source: The New England Journal of
Medicine, WSJ Jan 17, 2008)
Publication bias:
widespread in modern research
38
Download