Introduction to Predictive Learning LECTURE SET 3 Philosophical Perspectives Electrical and Computer Engineering 1 OUTLINE 3.1 Overview of Philosophy 3.2 Epistemology 3.3 Acquisition of Knowledge and Inference 3.4 Empirical Risk Minimization 3.5 Occam’s Razor 3.4 Popper’s Falsifiability 3.5 Bayesian Inference 3.6 Principle of Multiple Explanations 3.7 Social Systems and Philosophy 3.8 Summary and Discussion 2 3.1 Overview of Philosophy Motivation: why philosophy is relevant? • • • • • Relationship between reality (facts) and mental constructs (ideas) Epistemology and view of uncertainty Inference: from facts to models Truth vs utility Role of human culture /social structure 3 Observations, Reality and Mind Philosophy is concerned with the relationship btwn - Reality (Nature) - Sensory Perceptions - Mental Constructs (interpretations of reality) Three Philosophical Schools • REALISM: - objective physical reality perceived via senses - mental constructs reflect objective reality • IDEALISM: - primary role belongs to ideas (mental constructs) - physical reality is a by-product of Mind • INSTRUMENTALISM: - the goal of science is to produce useful theories 4 Philosophical Schools • Realism (materialism) • Idealism • Instrumentalism 5 • Realism is essential to common sense, but can not be proven by logic arguments • Idealism states that only mental constructs exist: R. Descartes: Cogito ergo sum I. Kant: … if I remove the thinking subject, the whole material world must at once vanish because it is nothing but a phenomenal appearance in the sensibility of ourselves as a subject, and a manner or species of representation. • Hegel (1770-1831): Reality and Mind are parts of a system Whatever exists (is real) is rational, and whatever is rational is real • Instrumentalist view: Whatever is useful is rational and (maybe) real 6 Discussion: Science and Philosophy • • • • REALISM - adopted by most scientists and engineers IDEALISM: - should not be trivialized INSTRUMENTALISM: - the goal of science is to produce useful theories Example(s) 7 • 3.2 Epistemology and Treatment of Uncertainty Epistemology ~ theory of knowledge - what is the nature of knowledge? - how it is acquired? • Statistical Learning: - what is the goal of learning? How to measure the quality of estimated models? - principles of inductive inference? • • Interpretation of uncertainty/randomness Examples: Ancient Greece and Age of Reason 8 Ancient Greece • Idealistic Approach - emphasizing logical reasoning - similar to geometry/ math • Start with self-evident truth ~ axioms • Then proceed via deduction to more complex forms of knowledge Example: Men are mortal. Socrates is a man. Therefore Socrates is mortal. • Drawbacks of Greek Epistemology - self-evident truth may be subjective - emphasis on human reasoning - lack of experimental science 9 Aristotle (384-322 BC) • Lack of empirical observations: Claimed that human males have more teeth than females • • • • Cosmological view: Earth is the center of the Universe Physical model: the World consists of ‘self-evident’ 5 elements: Fire, Earth, Air, Water, and Aether View of uncertainty plausability ~ ‘likeness to truth’ frequentist probability was totally unknown Rejecting randomness in Nature: “Chance always involves choice and hence cannot occur in nature” 10 Age of Reason (Rationalism) • 17th century in Western Europe was the start of modern philosophy, influenced by: • • 1. Advances in Physical Sciences: deterministic view of the physical world 2. Ideas from Renaissance: role of individuals in taking control of their density renewed interest in characterization of randomness and uncertainty Rationalism: any view appealing to reason as a source of knowledge. In more technical terms, it is a theory or method in which the criterion of truth is not empirical but intellectual and deductive. aka Causal Determinism: have been used to explain physical world, social systems, psychology etc. 11 Eastern Philosophy and Culture • Inductive Inference + Causal Reasoning originate from Western Philosophy • Ancient Eastern Societies developed different systems of thought Nisbett et al (2001): Philosophy and Epistemology are affected by the social structure • Ancient Greek Society - emphasis on personal freedom - society = union of independent individuals - tradition of public debate, and logic arguments - logical concept of right and wrong Western epistemology = analytic approach: - the World is collection of discrete objects - need to describe/ categorize these objects (via logic rules) - analytic knowledge focuses on an object’s attributes (rather than on its context, i.e. relation to other objects) 12 Ancient Chinese Society/Culture • • • • Individual is a part of a social group (community) Individual’s rights are subdued to group’s interests Emphasis on the relationships between objects (parts of the system) , i.e. harmony Harmony ~ avoidance of confrontation (i.e. public debate) Chinese epistemology = holistic approach: - knowledge is empirical and practical (no pure/abstract knowledge in Confucianism) - dialectical thought, i.e. focus on the relationships between an object and other parts of the system - the “Middle Way” solution to reconcile opposite views. That is, two opposing propositions can both contain some truth 13 Holistic vs Analytical way of thinking and epistemology: • Empirical knowledge vs first-principle knowledge + abstract logic • Relationships vs categorization (based on object’s attributes) • Multiple explanations vs search for the truth Eastern vs Western cognition: • Collection of discrete objects vs continuous changing system Example: Eastern and Western medicine • The Ying-Yang principle for dealing with contradictions: everything can be explained by interaction of two opposing energies 14 BLUE~WESTERN • RED~CHINESE Waiting in line: 15 BLUE~WESTERN • RED~CHINESE Old age: 16 BLUE~WESTERN • RED~CHINESE Conflict resolution: 17 BLUE~WESTERN RED~CHINESE • Me: 18 BLUE~WESTERN RED~CHINESE • Leader: 19 BLUE~WESTERN RED~CHINESE • Good looking skin tone: 20 3.3 Acquisition of Knowledge & Inference • • • Inference ~ the process of deriving a conclusion based on existing knowledge and/or facts (data) Inference involves interaction between mental constructs (ideas) and facts: - emphasis on ideas leads to logical inference (that dominated philosophical tradition much of human history) - emphasis on facts (experimental data) leads to empirical inference (that became popular only in 20th century) Distinction between logical and empirical inference is not clearly understood in philosophy • Predictive Learning needs empirical inference 21 Logical Inference • For idealists: knowledge corresponds to ideas (beliefs) that are consistent with true propositions. • Plato: Knowledge is what is both true and believed • Knowledge is justified true belief 22 Logical Inference and New Knowledge • Mathematical Induction ~ method of proof for an infinite number of statements: - initial guess ~ belief - the proof (math induction) makes it justified. Can new knowledge be obtained from existing knowledge alone (via logical reasoning)? • Scientific discovery requires intelligent quessing (plausible reasoning) from observations ~ empirical inference 23 Inductive Inference • Scientific Method had a major impact on philosophical understanding of inference: objectivity + repeatability experimental data • Scientific Knowledge constrained by two factors: - assumptions used to form an enquiry - empirical observations (experimental data) • Emphasis on assumptions (concepts) leads to first-principle knowledge (deterministic laws) Emphasis on empirical data leads to extracting knowledge from noisy data, or empirical inference: - estimation of reliable models from noisy data - fits the instrumental view, i.e. the goal is to estimate useful models • 24 Inductive Inference and Generalization • • • • Any scientific theory ~ generalization over finite number of observations (i.e., experiments used to confirm it) Impossible to logically justify new theory by experimental data alone need philosophical principle known as inductive inference = generalization from repeatedly observed instances (observations) to some as yet unobserved instances (Popper, 1953). Similar to psychological induction (or learning by association – Pavlov’s conditional reflex etc.) Philosophy of Science and Predictive Learning both are concerned with general strategies for obtaining good (valid) models from data - known as inductive principles in learning theory 25 Empirical Inference • Two approaches to empirical or statistical inference EMPIRICAL DATA KNOWLEDGE, ASSUMPTIONS STATISTICAL INFERENCE PROBABILISTIC MODELING • RISK-MINIMIZATION APPROACH General strategies for obtaining good models from data - known as inductive principles in learning theory 26 OUTLINE 3.1 Overview of Philosophy 3.2 Epistemology 3.3 Acquisition of Knowledge and Inference 3.4 Empirical Risk Minimization 3.5 Occam’s Razor 3.4 Popper’s Falsifiability 3.5 Bayesian Inference 3.6 Principle of Multiple Explanations 3.7 Social Systems and Philosophy 3.8 Summary and Discussion 27 Empirical Risk Minimization • • • • Two goals of inductive inference: 1. Explanation (of observed data) 2. Generalization for new data ERM is only concerned with 1st goal ERM ~ biological/ psychological induction (learning by association/ correlation) Example: continue given sequence 6, 10, 14, 18, …. 28 Empirical Risk Minimization (ERM) Given: training data (xi , yi ), i 1,2,...n and model estimates yˆ f (x, w) Find a function f (x, w ) that explains data best, i.e. n 1 minimizes total error L( y , f (x , w)) min n i 1 i k L( y, f (x, w)) is a non-negative loss function given a priori (reflects application requirements) • Why/ when inductive models estimated via ERM can generalize??? ERM does not provide guidance for model selection 29 3.5 Occam’s Razor • • Several known versions 1. William of Occam (14-th century): entities should not be multiplied beyond necessity 2. Isaac Newton: We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances 3. Modern version: KISS principle All things being equal, the simplest solution tends to be the best one Interpretation for model selection. 30 Example Applications of Occam’s Razor From Wikipedia • Ptolemaic vs Copernican model • Diagnostic parsimony in medicine: look for the fewest possible causes that will explain (account for) all symptoms Example: a patient’s symptoms include fatique and cirrhosis (of lever) Most simple explanation (hypothesis) is a drinking problem (unless patient is a practicing Muslim) 31 Limitations and misuse of Occam’s Razor • In medicine : Hickam’s dictum Patients can have as many diseases as they like! • Misuse: If you really wanted to start from a small number of entities, use the letters of alphabet, since you could certainly construct the whole of human knowledge out of them (Galileo) • Nature (and complex systems) may have its own notion of simplicity: ‘There are more things on earth, Horatio, than are dreamt of in your philosophy’ (Shakespeare) • Chatton’s Anti-Razor Principle (can be also viewed as a strategy for implementing Occam’s Razor) 32 Statistical Applications and Discussion Occam’s Razor as principle for model selection: trade-off btwn fitting error and model complexity • Analytic model selection methods • Data Compression Approach: MML • Variable Selection (feature selection for dimensionality reduction) …… Limitations • • Meaning of terms: explained, entities, necessity Successful application requires common sense and good engineering 33 3.6 Popper’s Falsifiability • • • • Karl Popper (1902-1994) Background: Vienna in early 20th century - new social theories (Marxism, socialism) - psychoanalysis (Freud) - theory of relativity (Einstein) Demarcation Principle A theory which is compatible with all possible empirical observations is unscientific. True theory can be falsified. Characteristic property of scientific method is falsifiability (rather than inductive inference) Instrumentalist View: a good theory cannot be regarded as absolute truth 34 Conditions for Scientific Theories Scientific hypothesis must satisfy two conditions • • • • Testability via empirical data Falsifiability: it must be falsifiable by some conceivable evant (fact) New scientific theory is a scientific hypothesis that also makes non-trivial(risky) predictions, not explained by old theories Examples 35 Examples of Falsifiable Theories • • • Statement: All ferrous metals are affected by magnetic fields Newton’s Law of Gravitation (falsified by Einstein Theory of General Relativity) How about pure math, i.e. Pythagorean th? - non-falsifiable metaphysical - if math is viewed as natural science, then math axioms reflect empirical evidence may be falsifiable. 36 Examples of Metaphysical Theories • Conspiracy theories: the absence of evidence is claimed as verification of the conspiracy. Examples: in politics, everyday life, social sciences • • Psychoanalysis Historicism: there exists a historical (or economic) law that determines the way in which historical events proceed. cannot be falsified. 37 Statistical Use: Connection to Occam’s Razor • Interpretation of Complexity: complexity ~ degree-of-falsifiability Popper justifies Occam’s Razor by a more general Principle of Falsifiability: • For given data set, choose the model explaining this data well (testability) and having large falsifiability • Discussion: concepts ‘testability’ and ‘falsifiability’ lack quantitative meaning 38 OUTLINE 3.1 Overview of Philosophy 3.2 Epistemology 3.3 Acquisition of Knowledge and Inference 3.4 Empirical Risk Minimization 3.5 Occam’s Razor 3.4 Popper’s Falsifiability 3.5 Bayesian Inference 3.6 Principle of Multiple Explanations 3.7 Social Systems and Philosophy 3.8 Summary and Discussion 39 Bayesian Inference & Max Likelihood Thomas Bayes(1702-1761) How to update/ revise beliefs in light of new evidence Scientific Method: Collecting empirical evidence (data) E that may be consistent (or inconsistent) with the original hypothesis (model) H0 . Given prior probability P(H0) calculate posterior probability as: 40 P(E | H0) ~ the likelihood of observing E given hypothesis H0 P(E) ~ the marginal probability of E, i.e. probability of observing new evidence under all mutually exclusive hypotheses, i.e. P(E) = P(E/H0) P(H0) + P(E | not H0) P(not H0) P(H0 | E) ~ the posterior probability of hypothesis H0 given E. 41 Example: medical diagnosis A patient goes to see a doctor. A test is performed such that: - if a patient is sick, the test result is positive (with prob. 0.99) - if a patient is healthy, the result is negative (with prob. 0.95). The patient tests positive. Assuming that only 0.1 percent of all people in the country are sick, what are the chances this patient is sick? Hypotheses: Patient Sick ~ S; Patient Healthy ~ H Evidence (Data): Test Positive (+); Test Negative (-). Bayesian probabilities: prior probability P(S) = 0.001 likelihood P(+/S) = 0.99 likelihood P(-/H) = 0.95 The “chance that patient is sick” = the prosterior probability P(S/+): P( / S ) P( S ) P( S / ) P() Calculations yield P() P( / H ) P( H ) P( / S ) P( S ) P( S / ) 0.99 0.001 0.02 0.05 0.999 0.99 0.001 42 Example: Bayesian criminal justice Bayesian inference for jury trial: to incorporate the evidence for and against the guilt of the defendant. Consider the following possible events: G ~ the defendant is guilty E ~ the defendant's DNA matches DNA found at the crime scene. Also denote the following probabilities: P(E | G) is the probability of observing E assuming G (that the defendant is guilty). This probability is usually taken to be 1. P(G) is the prior probability: subjective belief of the jury that the defendant is guilty, based on the evidence other than the DNA match. This could be based on previously presented evidence. Let us assume that P(G)=0.3 P(E) can be found as P(E) = P(G) P(E/G) + P(notG) P(E/notG) where the probability that a person chosen at random would have DNA that matched that at the crime scene was 1 in a million, i.e., P(E/notG) = 10^^-6. 43 Bayesian criminal justice (cont’d) Then the posterior probability P(G | E), that the defendant is guilty assuming the DNA match event E, equals where P(E) = P(G) P(E/G) + P(notG) P(E/notG) Thus Bayesian jury can revise its opinion to account for DNA: Bayesian approach can be applied successively to incorporate various pieces of evidence. 44 Maximum A Posterior Probability (MAP) for model selection • Several hypotheses (models) Hi describing the same evidence (data) Calculate posterior probabilities for each Hi, and choose the one with MAP Example: number of students using drugs in high school H1: 100% used H2: 75% used + 25% never_used H3: 50% used + 50% never_used H4: 25% used + 75% never_used H5: 100% never_used Survey of 10 randomly chosen students: never_used 45 MAP for model selection (cont’d) • • • Prior Probabilities (of each Hi) are given as: P(H1)=0.1 P(H2)=0.2 P(H3)=0.4 P(H4) =0.2 P(H5)=0.1 The likelihood of observing a student who Never_used under each Hi : P(N/H1) = 0 P(N/H2) = 0.25 P(N/H3) = 0.5 P(N/H4) = 0.75 P(N/H5) = 1 The posterior probabilities of each Hi can be updated sequentially after each observation j as: P( H i | x j ) P( x j | H i ) P( H i ) P( x j ) i 1,...,5 j 1,...,10 5 P( x j ) P( x j M ) P( M / H i ) P( H i ) 0.5 i 1 46 MAP for model selection (cont’d) Posterior probabilities P( H j | x j ) for each j shown below From the final values (after 10 observations) select H5 according to Maximum A Posterior Probability (MAP) 47 Principle of Multiple Explanations Epicurus of Samos: If more than one theory is consistent with the observations, keep all theories • Epicurean philosophy: explains everything in terms of pleasure (good) and pain (bad) • Epistemology: emphasizes sensory inputs Combine several theories explaining the data 48 Examples On the Nature of Things by Lucretius: - Provides explanations to natural events and social customs (i.e. naïve ‘scientific’ approach) Behavior of the Moon (natural phenomenon): As for the moon, it may be that she owes her brilliance to the impingement of solar rays, and that day by day, as she recedes farther from the sun's disk, she turns her light more fully toward our view, until, when exactly opposite him, she shines with her fullest splendor and, as she soars high above the horizon, sees his setting. Then she must hide her light little by little behind her, as she glides nearer to the sun's fire, moving through the zone of the zodiac from the opposite side. This is the view of those who suppose the moon to be a spherical body, whose course is lower than that of the sun. 49 Examples (cont’d) On the subject of sex: Venus united the bodies of lovers in the woods. The woman either yielded from mutual desire, or was mastered by the man's impetuous might and inordinate lust, or sold her favors for acorns or berries or choice pears. (Lucretius, v. 963) Note: keep all reasonable explanations 50 Fuzzy logic and fuzzy inference • Principle of multiple explanations is similar to fuzzy logic Light 50 100 Heavy 150 200 250 Boolean logic demands crisp membership Light 50 100 Heavy 150 200 250 Fuzzy logic allows partial membership 51 Fuzzy inference • Given several fuzzy rules: IF X IS ABOUT Xa THEN Y IS ABOUT Ya etc. • How to combine into a single model? y yB yC yA xA xB xC x 52 Bayesian Averaging • Several hypotheses (models) Hi describing the same evidence (data) Calculate posterior probabilities for each Hi, and then combine all hypotheses. Example: fraction of students who Never_used drugs H1: 100% used H2: 75% used + 25% never_used H3: 50% used + 50% never_used H4: 25% used + 75% never_used H5: 100% never_used Survey of 10 randomly chosen households: all Never_used 53 Bayesian averaging (cont’d) • Posterior probabilities P( w 0 | X ) 0 P( H j | X ) P( w 0.25 | X ) 0 P( w 0.75 | X ) 0.1 for each j are P( w 0.5 | X ) 0.0035 P( w 1) 0.8956 • Keep all hypotheses by Bayesian averaging. That is, use the weighted average of all hypotheses as the best prediction 0% * P( w 0) 25% * P( w 0.25) 50% * P( w 0.5) 75% * P( w 0.75) 100% * P( w 1) 92.5% • This example shows Bayesian averaging approach to estimating the most likely (‘true’) value of unknown parameter, as a weighted combination of ‘plausible’ parameter values. 54 OUTLINE 3.1 Overview of Philosophy 3.2 Epistemology 3.3 Acquisition of Knowledge and Inference 3.4 Empirical Risk Minimization 3.5 Occam’s Razor 3.4 Popper’s Falsifiability 3.5 Bayesian Inference 3.6 Principle of Multiple Explanations 3.7 Social Systems and Philosophy 3.8 Summary and Discussion 55 Modeling Social Systems Recall two main philosophical interpretations of scientific theories • REALISM: - objective physical reality perceived via senses - mental constructs reflect objective reality - for first-principles knowledge • INSTRUMENTALISM: - the goal of science is to produce useful theories - where ‘usefulness’ can be objectively verified (that is, experimentally measured) 56 • Can Realism or Instrumentalism be used to model social systems ??? • Deterministic (mechanical) approach: - yes, social systems are governed by objective laws independent of our current knowledge. Examples: most economic theories (efficient markets), historicism • Instrumentalist approach: - social systems are described via data-driven models derived from historical data + self-evident assumptions. • Alternative View (G. Soros): social systems are fundamentally different from natural systems 57 G. Soros: philosophy and epistemology • Personal History - survivor of Holocaust (1940’s) - student of Karl Popper (1950’s) - successful speculator 70’s - philantropist and philosopher • G. Soros’ philosophy: - falsifiability as a guiding principle (in his own life) - disagreed with Popper about equal treatment of natural and social sciences - developed new theory for understanding and modeling social systems, and successfully applied this theory to financial markets - applied the same philosophical principles to his philantropic and political activities. 58 Reflexive Property of Social Systems • • • REALISM + INSTRUMENTALISM - passive (human) observers SOCIAL SYSTEM - observers ~ active participants Reflexive property of social systems (G. Soros) 59 Epistemology and Reflexivity • • • • • Scientific knowledge: is independent of the system and of the people who apply it. Social systems: involve thinking participants who try to understand the world and participate in it Cognitive function: reality affects mental models Participating function: human decisions affect reality Reflexive property our understanding of social systems is inherently imperfect (flawed) 60 Reflexivity and falsifiability • Popper the same methods and criteria (falsifiability) apply to both natural and social sciences • Soros “Social science” is an oxymoron since reflexive events cannot be explained by universally applicable laws (i.e., first-principle laws) • Reflexivity in Social Systems is not a scientific theory, but Soros claims its validity in: - recognizing the difference btwn social and natural phenomena - practical utility for making real-life deciisions - important role of reflexivity in financial markets (see The Alchemy of Finance by G. Soros) 61 Reflexivity in financial markets • Reflexivity the biases of market participants may change the fundamentals of the economy. • This leads to the state of disequilibrium (where conventional economic theory does not apply). • Investors' participation in the capital markets may influence valuations AND fundamental conditions. Examples: equity leveraging, trend-following • Distortion between mental models and reality (actual state of economy) leads to ‘boom-and-bust’ cycles. 62 Reflexive systems and predictive learning • According to Soros: reflexive systems (i.e. financial markets) can not be modeled quantitatively • Predictive Learning approach can be still used, as long as one can specify learning formulations meaningful from practical (investment) point of view. Examples - market timing of international mutual funds - estimating NAV of domestic mutual funds • Explanation (of contradiction): - Soros only considers distinction between scientific (first-principles) and metaphysical theories - He does not discuss predictive models (derived from empirical data) 63 Examples of reflexive systems • • • Mental constructs are (potentially) flawed - these misconceptions help to shape the events (reality) in which we participate - mental models and reality can be out of sync Personal Life: falling in love - the other person’s feelings (towards you) are influenced by your own feelings/actions - love depends on participants’ beliefs - changing beliefs: marriage divorce History and social systems: - Marxism, communism - terrorist ideology 64 Example: reflexivity in economic systems • Recent housing Boom-Bust cycle - started in 2001-2002 (after the stock market bubble) - enthusiastic home buyers and speculators - fueled by easy credit and lax lending irrational exuberance • Housing bust (2007 – 2010) - over-leveraged homeowners - increase in mortgage payments (ARMs) - overbuilding/ oversupply of homes for sale - decrease in home values (10-20%) - near collapse of US banking system in late 2007 Distortion between mental models (real-estate prices can only go up) and reality (~ disequilibrium) 65 • Housing bust - national median rent $943 (in 2006) - national median mortgage payment $1,566 15-20% rise in home foreclosures (2007) • Subprime lending industry - no credit check - no down payment - risks compensated by higher interest rate • Packaging (hiding) risk into new tradable financial instruments 66 Real estate bust: rise in foreclosures (WSJ, Feb 17, 2007) 67 OUTLINE 3.1 Overview of Philosophy 3.2 Epistemology 3.3 Acquisition of Knowledge and Inference 3.4 Empirical Risk Minimization 3.5 Occam’s Razor 3.4 Popper’s Falsifiability 3.5 Bayesian Inference 3.6 Principle of Multiple Explanations 3.7 Social Systems and Philosophy 3.8 Summary and Discussion 68 Summary and Discussion • • Many useful ideas and concepts Can be readily interpreted for statistical learning • Limitations: lack of well-defined math framework, i.e. - dealing with uncertainty - quantifying ‘complexity’, ‘falsifiability’ • A. Bierce, The Devil’s Dictionary Philosophy: A route of many roads leading from nowhere to nothing 69 References on Philosophy • • • • • • P. Bernstein, Against the Gods: the remarkable story of risk, Wiley 1998 B. Russell, A History of Western Philosophy, Simon & Schuster Popper Selections, edited by D. Miller, Princeton University Press, 1985 K. Popper, The Logic of Scientific Discovery, Routledge, 2002 G. Soros, The Alchemy of Finance, Wiley, 2003 G. Soros, Soros on Soros: Staying Ahead of the Curve, Wiley, 1995 70