REFERENCES: A personal history of Bayesian Statistics (2014), Wiley Interdisciplinary Reviews: Computational Statistics, 6:80-115, with a link to the remaining chapters (from 1972) on my website www.thomashoskynsleonard.co.uk
Refers to technical material in my book Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers (1999, with John S.J. Hsu), Cambridge University Press.
See also my academic life story The Life of a Bayesian Boy, self-published on my website.
Slides prepared by Thomas Tallis

OCCAM'S RAZOR (William of Ockham, c.1287-1347)
Among competing (plausible) hypotheses, the hypothesis with the fewest assumptions should be selected. (WILLIAM OF OCKHAM)
In other words: keep things simple, and cut out extraneous information.
FOR EXAMPLE: use parameter-parsimonious sampling models which depend upon low numbers of unknown parameters (e.g. models which minimise AIC or DIC).
Contrasts with: 'A model should be as big as an elephant' (Leonard 'Jimmie' Savage, 1954; Lindley, 1983)
Agrees with: 'The greater the amount of information the less you actually know' (Toby Mitchell, c.1980)
Related to: E.T. Jaynes' extremely valuable idea (1957 and 1968) of choosing the 'maximum entropy' prior distribution when only p summaries of the prior information are specified.

Blaise Pascal (1623-1662) formulated 'Pascal's Wager' by reference to the notion of subjective probability. Pascal corresponded with Pierre de Fermat about the potential development of probability theory. In 1654, Pascal and De Fermat (1601 or 1607-1665) together solved the problem of 'points', or 'division of stakes'. In 1657, Christian Huygens discussed the Pascal-De Fermat debate in De ratiociniis in ludo aleae.

Daniel Bernoulli (1700-1782)
Swiss physician and mathematician. Formalised the subjective view of probability, decision making and risk.
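Jaynes' maximum-entropy idea can be illustrated in its simplest setting: a distribution on a finite set when one summary (here, the mean) is specified. The sketch below is an illustrative assumption, not taken from the slides. It uses a six-sided die with specified mean 4.5, echoing Jaynes' well-known dice example. It relies on the standard result that, under a single mean constraint, the maximum-entropy solution has the form p_i ∝ exp(λ·i). The names `maxent_die` and `entropy` are hypothetical.

```python
import math


def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy distribution on `faces` subject to a specified mean.

    With one mean constraint the maximum-entropy solution is
    p_i proportional to exp(lam * face_i); lam is found by bisection.
    """
    if not min(faces) < target_mean < max(faces):
        raise ValueError("target_mean must lie strictly inside the range of faces")

    def probs(lam):
        weights = [math.exp(lam * f) for f in faces]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(f * p for f, p in zip(faces, probs(lam)))

    lo, hi = -50.0, 50.0  # mean(lam) increases with lam, so bisection works
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return probs(0.5 * (lo + hi))


def entropy(p):
    # Shannon entropy in nats.
    return -sum(x * math.log(x) for x in p if x > 0)


p = maxent_die(4.5)
print([round(x, 4) for x in p])
```

The resulting probabilities rise geometrically with the face value. Any other distribution with the same mean, for example equal weight on faces 3 to 6, has lower entropy.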
Introduced the concept of EXPECTED UTILITY in 1738, in a historic paper published in St Petersburg. Used the St PETERSBURG PARADOX to justify maximising expected utility (the expected reward from the specified betting scheme is infinite, but most punters would only want to place a small bet on the outcome because of the high probability of a low return).

David Hume F.R.S.E. (1711-1776)
Educated (from age 12) at the University of Edinburgh, between 1723 and 1725.
Sceptical views about causality in his 1739-41 trilogy.
Questionable cause fallacy: the false assumption that correlation proves causality.
Subjective probability discussed in Ch. 6 of his 1748 book.
Author of the is-ought problem, or Hume's guillotine: there is a significant difference between descriptive statements (about what is) and prescriptive statements (about what ought to be), and it is not obvious how to get from descriptive statements to prescriptive ones.
Hume's Law: you can't derive an ought from an is.
'A midget on the shoulders of giants like Hume and Huygens' (Tom Leonard, 2014)

Rev. Thomas Bayes (1701-1763)
Studied for the Presbyterian Ministry at the University of Edinburgh between 1719 and about 1722. Probably derived a continuous version of 'Bayes' Theorem' during the 1740s, while a wealthy, well-connected minister in Tunbridge Wells, with a serious demeanour and happy disposition.
The Notebook of Thomas Bayes (1747-1760) contains a section on probabilities.
In his tract In defence of Isaac Newton (1736, printed by John Noon), sold for a shilling, Bayes writes: 'To suspect Isaac Newton of the mean design of seeking reputation among the ignorant by venting unintelligible notions, and defending them by artful cunning and cunning artistry, is what no man is capable of doing.'

Moral philosopher, inductive thinker, and political activist in support of the American Revolution.
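The paradox can be checked numerically. The sketch assumes the usual version of the game: a fair coin is tossed until the first head, and a head on toss k pays 2^(k-1) units. As a simplification not taken from the slides, it uses logarithmic utility of the payoff alone and ignores the punter's initial wealth. The function names are hypothetical.

```python
import math


def truncated_expected_value(max_tosses):
    # P(first head on toss k) = 2**-k and the payoff is 2**(k - 1), so each term
    # contributes 1/2 and the expected reward grows without bound.
    return sum(2.0 ** -k * 2 ** (k - 1) for k in range(1, max_tosses + 1))


def truncated_expected_log_utility(max_tosses):
    # Expected log utility of the payoff; this series converges to log(2).
    return sum(2.0 ** -k * math.log(2 ** (k - 1)) for k in range(1, max_tosses + 1))


for n in (10, 100, 1000):
    eu = truncated_expected_log_utility(n)
    # Certainty equivalent: the sure payoff with the same expected log utility.
    print(n, truncated_expected_value(n), round(math.exp(eu), 6))
```

The truncated expected reward is n/2 and keeps growing. The certainty equivalent under log utility settles at 2 units, which matches the modest stake most punters would accept.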
Rev. Richard Price F.R.S. (1723-1791)
In 1763, Richard Price published Bayes' paper 'An Essay towards solving a Problem in the Doctrine of Chances', posthumously, in the Philosophical Transactions of the Royal Society of London.

Bayes solved a complicated 'Ball tossing problem' involving n non-independent trials, with applications in life assurance. His mathematical solution was brilliant, but counterintuitive. He posed this as a special case of:
Obscurely Worded General Problem: Given the number of times (n) an unknown event has happened and failed, REQUIRED the chance that the probability (ξ) of its happening in a single trial lies somewhere between any two degrees of probability that can be named.

A further special case (n=50 independent Bernoulli trials; see Bayes' Appendix): if you fail to win a lottery on n=50 occasions, with equal chance ξ of winning on each occasion, then what is the chance that your probability ξ of winning it on the 51st attempt lies between 0.001 and 0.01?

[Photo: A young Bayesette]
VERY SPECIAL CASE (n=1): if a mother's first baby is a girl, then what is the chance that the probability ξ that her second baby is a boy lies between 0.5 and 1?
Note that P(girl on first birth, given ξ) = 1-ξ. Therefore the LIKELIHOOD FUNCTION OF ξ is
L(ξ, given girl on first birth) = 1-ξ, for 0 < ξ < 1.
In general, the likelihood of the unknown parameters is the assumed sampling density or probability mass function of the observations, expressed as a function of the unknown parameters, given the observations actually observed.

Initiated the 'Savageous' philosophy of Bayesian Statistics.
THE BAYESIAN PARADIGM: Posterior information = Prior information + Sampling information.
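Once Bayes' own uniform prior on ξ is assumed, the lottery question has a closed-form answer. After n failures and no successes the posterior is proportional to (1-ξ)^n, which is a Beta(1, n+1) distribution. So P(a < ξ < b) = (1-a)^(n+1) - (1-b)^(n+1). The sketch below (function names hypothetical) evaluates this and cross-checks it with a simple midpoint-rule integral of the normalised posterior.

```python
def prob_between_closed_form(n_failures, a, b):
    # Posterior CDF under a uniform prior after n failures: 1 - (1 - xi)**(n + 1).
    return (1 - a) ** (n_failures + 1) - (1 - b) ** (n_failures + 1)


def prob_between_numeric(n_failures, a, b, steps=100_000):
    # Midpoint rule applied to the normalised posterior density (n + 1)(1 - xi)**n.
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        xi = a + (i + 0.5) * h
        total += (n_failures + 1) * (1 - xi) ** n_failures
    return total * h


exact = prob_between_closed_form(50, 0.001, 0.01)
approx = prob_between_numeric(50, 0.001, 0.01)
print(round(exact, 4), round(approx, 4))
```

The answer is roughly 0.35. Under this prior, after 50 straight losses there is still about a one-in-three chance that the winning probability lies between 0.001 and 0.01.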
($$$) A Bayesian is somebody who tries to represent his prior information about ξ by a probability distribution on ξ.
BAYES' THEOREM (continuous case): POSTERIOR DENSITY = K x PRIOR DENSITY x LIKELIHOOD, where K can be calculated by noting that the posterior density integrates to unity across the parameter space.
LEONARD 'JIMMIE' SAVAGE (1917-1971)

However, in his 1763 paper, Bayes assumed a uniform prior distribution on (0,1) for ξ, in which case POSTERIOR DENSITY = K x LIKELIHOOD.
In the preceding very special case, the posterior density of ξ, given a girl on first birth, is 2(1-ξ) for 0<ξ<1 (*), because the likelihood 1-ξ integrates to 1/2 over (0,1), so that K = 2.
[Figure: posterior density of ξ]
Posterior mean of ξ = predictive probability that the next baby is a boy = 1/3, and P(0.5 < ξ < 1, given girl on first birth) = 1/4.
If the first n babies are girls, then the predictive probability that the next baby is a boy is 1/(n+2).

Le Marquis Pierre-Simon de Laplace (1749-1827)
French astronomer, mathematician and politician; minister in Napoleon's government.
FOUNDING FATHER OF BAYESIAN STATISTICS AND DATA ANALYSIS
In 1774, his Memoir on the Probability of the Causes of Events included a Bayesian analysis of the causes of events.
In 1812, his Analytic Theory of Probabilities contained a number of detailed statistical analyses. He introduced a general version of Bayes' theorem that includes the discrete and multiparameter cases, and applied it to ANALYZE DATA in celestial mechanics, MEDICAL STATISTICS, reliability and jurisprudence.
Developed LAPLACE'S APPROXIMATION to multidimensional integrals, and LAPLACE TRANSFORMS (moment generating functions).

Adam Smith (1723-1790)
Scottish moral philosopher and leading political economist. The Wealth of Nations, 1776.
Rejected the idea that demand must be related to utility, i.e. that the more useful a thing is, and the more satisfaction it gives, the more people would be willing to pay for it.
THE PARADOX OF DIAMONDS AND WATER: water is necessary for life, and yet very cheap; diamonds have little utility, and yet are very costly.
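The figures in the very special case, and the 1/(n+2) rule, follow directly from the uniform prior. After n girls the posterior density is (n+1)(1-ξ)^n. The posterior mean is therefore 1/(n+2), and P(0.5 < ξ < 1) = (1/2)^(n+1). The sketch below (hypothetical function names) computes these exactly with Python's `fractions` module and checks the normalising constant K numerically.

```python
from fractions import Fraction


def girls_then_boy(n_girls):
    """Exact posterior summaries for xi = P(boy) after n_girls girls, uniform prior.

    The posterior density is (n + 1) * (1 - xi)**n on (0, 1), i.e. Beta(1, n + 1).
    """
    predictive_boy = Fraction(1, n_girls + 2)       # posterior mean of xi
    p_upper_half = Fraction(1, 2) ** (n_girls + 1)  # P(0.5 < xi < 1 | data)
    return predictive_boy, p_upper_half


def normalising_constant(n_girls, steps=10_000):
    # K = 1 / (integral of the likelihood (1 - xi)**n over (0, 1)), by the midpoint rule.
    h = 1.0 / steps
    integral = sum((1 - (i + 0.5) * h) ** n_girls for i in range(steps)) * h
    return 1.0 / integral


print(girls_then_boy(1))        # the very special case: one girl
print(normalising_constant(1))  # K for one girl; the posterior density is K(1 - xi)
```

For one girl this reproduces the posterior mean 1/3 and the probability 1/4. The numerical K is 2, which confirms the posterior density 2(1-ξ).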
Smith thereby concluded that willingness to pay is not related to utility.
Adam Smith proposed using interval bounds for probabilities, rather than precisely specified subjective probabilities.

Jeremy Bentham (1748-1832)
British philosopher, jurist and social reformer. Regarded by some as the father of modern utilitarianism, and by others, in the context of banking, insurance and speculation, as the founder of the subjectivist, Bayesian approach to decision making. (Bentham's approach to subjective probability is an earlier version of the exact, linear approach recommended as being rational by Tversky and Kahneman.)
An Introduction to the Principles of Morals and Legislation, 1780.
GREATEST HAPPINESS PRINCIPLE: It is the greatest happiness of the greatest number which is the principle of right or wrong.
Classification of 12 pains and 14 pleasures, by which we may test the happiness factor of any action.
Formalised a set of criteria for measuring the extent of pain or pleasure that any decision will create.
Reviewed the concept of punishment, and whether a particular punishment will create more pain or pleasure for society.
Bentham applied similar ideas to monetary economics.

Augustus De Morgan (1806-71)
Anglo-Indian mathematician, statistician and spiritualist. Appointed to the Chair of Mathematics at the University of London (later UCL) in 1828. See his Essay on Probabilities (1838).
De Morgan further developed Bayes' and Laplace's approach to INVERSE PROBABILITY, i.e. posterior probabilities when the prior distribution is uniform.
This is somewhat arbitrary: e.g. a uniform prior for a non-linear transformation of the parameter will give a different posterior.
Uniform priors over a continuous unbounded parameter space are improper, but can (though not always) yield meaningful proper posteriors.
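De Morgan's difficulty can be made concrete with the very special case of a girl on the first birth. The sketch below (hypothetical names) compares a uniform prior on ξ with a uniform prior on φ = ξ². By the change-of-variables rule, the second prior corresponds to the density 2ξ on the ξ scale. Both priors are 'uniform', yet they lead to different posterior means for ξ.

```python
def posterior_mean(prior_density, likelihood, steps=100_000):
    # Midpoint-rule approximation to E(xi | data) for xi in (0, 1).
    h = 1.0 / steps
    numerator = denominator = 0.0
    for i in range(steps):
        xi = (i + 0.5) * h
        w = prior_density(xi) * likelihood(xi)
        numerator += xi * w
        denominator += w
    return numerator / denominator


def likelihood_girl_first(xi):
    return 1.0 - xi  # P(girl on first birth | xi), where xi = P(boy)


def uniform_on_xi(xi):
    return 1.0


def uniform_on_xi_squared(xi):
    # If phi = xi**2 is uniform on (0, 1), then xi has density 2 * xi.
    return 2.0 * xi


print(posterior_mean(uniform_on_xi, likelihood_girl_first))          # about 1/3
print(posterior_mean(uniform_on_xi_squared, likelihood_girl_first))  # about 1/2
```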
De Morgan sought to justify the uniform prior by Laplace's Principle of Insufficient Reason.

Florence Nightingale (1820-1910), nurse and statistician

For the remainder of the 19th century:
(A) Many statistical scientists (e.g. Gauss, Edgeworth, Galton) thought Bayesian.
(B) Inverse probabilities remained the main methodology for statistical inference. Fisher dabbled with them in the early 20th century and discarded them because of the arbitrariness in the choice of uniform prior.
(C) Emphasis seemed to shift somewhat to numerical and graphical summaries of data, e.g. the London cholera epidemic map (1832) and the Crimean War (Florence Nightingale, e.g. pie charts).

Sir Francis Galton (1822-1911)
English geneticist, statistician and polymath, a truly great man of science.
In 1877 built a machine called the GALTON QUINCUNX, and used simulations while attempting to calculate a posterior distribution.
Galton encouraged the use of Bayes' Theorem.
Informative conjugate analysis for the normal distribution was developed around that time.

Charles Sanders Peirce (1839-1914)
American philosopher, logician, mathematician and scientist. 'The father of pragmatism'.
Emphasised that objective statistical conclusions can only be hoped for if the data result from a randomised experiment.
Was the first scientist to elicit subjective probabilities in experimental psychology.

Alfred Dreyfus (9 October 1859 - 12 July 1935)
French military officer. 1894: TRIAL OF THE MILLENNIUM. Dreyfus was tried for treason, and falsely convicted of transmitting military secrets to Germany. A subjective 'probability' of forgery was bizarrely justified; it related to possible coincidences concerning the frequencies of symbols in the code.
'SIMILAR PROBLEMS OCCUR TODAY WHENEVER STATISTICAL EVIDENCE AND SUBJECTIVE PROBABILITIES ARE INTRODUCED INTO EVIDENCE' (David H. Kaye, Minnesota Law Review, 2007)
E.g. the O.J. Simpson murder case, the Adams rape case, and the Sally Clark cot death case.
See also D.H. Kaye (2010) DNA identification and the threat to civil liberties.
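The slides do not spell out the 19th-century conjugate analysis. The following is therefore a sketch of the standard modern normal-normal update, assuming a known sampling variance σ². A N(m0, v0) prior for the mean μ, combined with n observations, gives a normal posterior. Its precision is the sum of the prior and data precisions, and its mean is the precision-weighted average of m0 and the sample mean. The function name is hypothetical.

```python
def normal_conjugate_update(prior_mean, prior_var, data, sampling_var):
    """Posterior (mean, variance) for mu, given a N(prior_mean, prior_var) prior
    and observations that are N(mu, sampling_var) with known sampling_var."""
    n = len(data)
    xbar = sum(data) / n
    prior_precision = 1.0 / prior_var
    data_precision = n / sampling_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * xbar)
    return post_mean, post_var


print(normal_conjugate_update(0.0, 1.0, [1.0, 2.0, 3.0], 1.0))
```

Letting v0 grow without bound recovers the posterior N(x̄, σ²/n) that arises from an improper flat prior. This is one case where an improper prior yields a proper posterior.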
Yale University Press.

Frank Ramsey (1903-1930)
British mathematician, philosopher and economist.
His 1926 papers on subjective probability and utility were encouraged by the economist John Maynard Keynes.
His work on subjective probability and its elicitation satisfied Charles Peirce's empirical test. It was used by experimental psychologists and recognised in 1944 by Von Neumann and Morgenstern, in their book The Theory of Games and Economic Behaviour.
Famously used utility theory to judge 'how much of its wealth a nation should spend'.
Close friend of the philosopher Ludwig Wittgenstein, whose works he translated.
'Never stay up on the barren heights of cleverness, but come down into the green valleys of silliness.'

Sir Ronald Fisher (1890-1962)
Highly eccentric English statistician, evolutionary biologist, geneticist and eugenicist. One of the chief architects of the neo-Darwinian synthesis. Galton Professor of Eugenics at UCL (1933-43).
Argued with Karl Pearson, e.g. about who should teach which course.
Dabbled with Bayesian inference and inverse probability, then argued vehemently against it because of its dependence on the prior, e.g. the choice of a 'vague', so-called ignorance, prior.
Introduced FIDUCIAL INFERENCE in a paper in the Annals of Eugenics (1935). Disputed by Neyman, and shown by Lindley in 1958 to violate Kolmogorov's addition laws of probability.

Baron Keynes of Tilton
Cambridge economist. Employed expected utility in 1936, in Chapter 12 of The General Theory of Employment, Interest and Money.
Keynesian economics fundamentally affected the theory and practice of modern macroeconomics. It influenced the policies of governments until about 1979, when the ideas of Milton Friedman, who also used expected utility, took over.
John Maynard Keynes (1883-1946)

Sir Harold Jeffreys F.R.S. (1891-1989)
Cambridge-based mathematician, statistician, geologist and astronomer.
His Theory of Probability (1939) was a precursor of the Anglo-American Bayesian revival of the 1960s, led by Rudolf Kalman, Raiffa and Schlaifer, Mosteller and Wallace, Box and Tiao, John Aitchison F.R.S.E. and Dennis Lindley.
It INCLUDED:
Invariance priors: vague priors which are proportional to the square root of the determinant of Fisher's information, and which yield posterior distributions that are invariant under non-linear transformations of the parameters.
Approximate Bayes intervals (also approximate confidence intervals) centred on the maximum likelihood estimate, which also refer to the likelihood dispersion.

Andrey Kolmogorov (1903-1987)
Pre-eminent Russian mathematician and probabilist.
Introduced the concept of Bayesian sufficiency in his 1942 paper on the statistical estimation of the law of Gauss, in the URSS Bulletin of the Academy of Sciences.
Kolmogorov's Extension Theorem constrains us to defining our probability distributions only on measurable subsets of the parameter space or sample space (i.e. those which are elements of an appropriate sigma-field, such as a Borel field).

Alan Turing (1912-1954) and Irving Jack Good (1916-2009)
Alan Turing: gay icon and martyr, father of machine intelligence, modern computer science and artificial intelligence. Also the father of modern Bayesian applied statistics.
Jack Good: cryptanalyst, mathematician, statistician and philosopher.
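For a single Bernoulli parameter ξ, such as the chance of a boy, Fisher's information is 1/(ξ(1-ξ)). Jeffreys' prior is therefore proportional to ξ^(-1/2)(1-ξ)^(-1/2). The sketch below (hypothetical names) checks the invariance property numerically. Applying Jeffreys' rule directly on the log-odds scale φ = log(ξ/(1-ξ)) gives the same density as transforming the ξ-scale prior with the Jacobian dξ/dφ = ξ(1-ξ).

```python
import math


def inv_logit(phi):
    return 1.0 / (1.0 + math.exp(-phi))


def jeffreys_on_xi(xi):
    # Square root of the Bernoulli Fisher information 1 / (xi * (1 - xi)); unnormalised.
    return math.sqrt(1.0 / (xi * (1.0 - xi)))


def jeffreys_on_logit_direct(phi):
    # Fisher information on the log-odds scale is xi * (1 - xi).
    xi = inv_logit(phi)
    return math.sqrt(xi * (1.0 - xi))


def jeffreys_on_logit_transformed(phi):
    # Change of variables: p(phi) = p(xi) * |d xi / d phi|, with d xi / d phi = xi * (1 - xi).
    xi = inv_logit(phi)
    return jeffreys_on_xi(xi) * xi * (1.0 - xi)


for phi in (-3.0, 0.0, 2.5):
    print(phi, jeffreys_on_logit_direct(phi), jeffreys_on_logit_transformed(phi))
```

Normalised, this prior is Beta(1/2, 1/2). Combined with the single girl in the very special case, it gives a Beta(1/2, 3/2) posterior, with posterior mean 1/4 for ξ rather than the 1/3 obtained from Bayes' uniform prior.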
While solving the Nazi codes at Bletchley Park, Turing and Good used various pioneering, effectively Bayesian procedures, including:
•Empirical alternatives to Bayes factors as measures of evidence
•Effectively Bayesian sequential analysis and decision-tree analysis
•Shrinkage estimators for multinomial cell probabilities, which smooth the relative frequencies of the letters in the German code towards a common value.

Thomas Tallis (1988-NotDeadYet) and Adam Empirius Logan

"If Bayesians live to be a hundred they think they've got it made. Very few people die past that age."

If we deduce that knowledge comes from irrationality and out of rationality comes rationality then we must also deduce that most of our conventional knowledge derives from the senses and that every rational saying is a pragmatic lie. (Adam Logan, Farewell Halcyon Days, 2013)
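The shrinkage idea can be sketched with a symmetric Dirichlet prior. This is an assumption chosen for illustration, not a reconstruction of Turing and Good's wartime procedures. The posterior mean (n_i + α)/(N + kα) is a weighted average of the relative frequency n_i/N and the common value 1/k, with weight N/(N + kα) on the data. The letter counts and the function name below are hypothetical.

```python
def shrink_cell_probabilities(counts, alpha=1.0):
    """Posterior-mean cell probabilities under a symmetric Dirichlet(alpha) prior.

    Each estimate equals w * (n_i / N) + (1 - w) * (1 / k) with w = N / (N + k * alpha),
    so relative frequencies are pulled towards the common value 1 / k.
    """
    k = len(counts)
    total = sum(counts.values())
    return {cell: (n + alpha) / (total + k * alpha) for cell, n in counts.items()}


counts = {"E": 5, "Q": 0, "X": 1}  # hypothetical letter counts
print(shrink_cell_probabilities(counts))
```

Unseen letters (here Q) receive a small positive probability instead of zero. A larger alpha shrinks the estimates more strongly towards 1/k.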