Data analysis
Ben Graham
MA930, University of Warwick
October 5, 2015

Data Analysis
- George Box: All models are wrong, some models are useful.
- Corollary: This model is wrong, therefore it is useful.
- Peter Norvig (not): All models are wrong, and increasingly you can succeed without them.

Statistics
- What is a statistic?
- Why is this course not called statistics?
- Data
  - Designed experiments
  - Observed data
  - Big data
  - Small data
- Summary statistics (mean, median, mode, min, max, ...)
- Graphs
- Probabilistic models
  - Seek the model parameters
  - Frequentist statistics
  - Bayesian statistics

Key principles of statistics
- Taking averages is good.
- Correlation does not imply causation (missing covariates, Simpson's paradox).
- Interpolation good; extrapolation bad.
- Can you play catch?

Machine learning *
- Supervised learning
  - Learning to approximate high dimensional functions
  - Boring? Includes a huge range of problems.
  - Neural networks, decision trees, random forests, support vector machines, bagging and boosting.
- Unsupervised learning
  - Clustering
  - PCA, LLE, RBMs
  - Dimensionality reduction
  - Simplify correlation structures of data?

Problems with English
From the FT: Linda is single, outspoken, and deeply engaged in social issues. Which of the following is more likely?
1. That Linda is a bank manager.
2. That Linda is a bank manager who is an active feminist.

Set theory
Definition 1.1.1. The set, $S$, of all possible outcomes of a particular experiment is called the sample space for the experiment.
- Coin toss
- Sequence of coin tosses
- Two children, at least one of them a boy.
- Waiting time at a red traffic light.
- Waiting time passing a traffic light.

Events
Definition 1.1.2. An event is any collection of possible outcomes of an experiment, that is, any subset of $S$. This includes $\emptyset$, $\{x\}$ for every $x \in S$, and $S$ itself.
How many events are there when you
- toss a coin
- roll a die

De Morgan's Laws
$(A \cup B)^c = A^c \cap B^c$
$(A \cap B)^c = A^c \cup B^c$
Proof?
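The two counting questions on the Events slide and the De Morgan identities can be explored by brute force on small sample spaces. The following is a minimal sketch, not part of the original slides: the helper names `events` and `de_morgan_holds` are hypothetical, and an exhaustive check on a finite sample space is an illustration, not a proof.

```python
from itertools import chain, combinations


def events(sample_space):
    """Return every subset of a finite sample space, i.e. every event."""
    outcomes = list(sample_space)
    subsets = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    return [frozenset(s) for s in subsets]


def de_morgan_holds(sample_space):
    """Check both De Morgan laws for every pair of events (brute force)."""
    S = frozenset(sample_space)
    all_events = events(S)
    for A in all_events:
        for B in all_events:
            # (A ∪ B)^c == A^c ∩ B^c, complements taken relative to S
            if S - (A | B) != (S - A) & (S - B):
                return False
            # (A ∩ B)^c == A^c ∪ B^c
            if S - (A & B) != (S - A) | (S - B):
                return False
    return True


coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

print(len(events(coin)))     # 4  (= 2**2)
print(len(events(die)))      # 64 (= 2**6)
print(de_morgan_holds(die))  # True
```

A sample space with $n$ outcomes has $2^n$ events, which is why the exhaustive check is only practical for very small $n$.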

Disjoint events
Definition 1.1.5. Two events $A$ and $B$ are disjoint if $A \cap B = \emptyset$.
Definition 1.1.6. If $A_1, A_2, \ldots$ are a collection of pairwise disjoint events, and if $\bigcup_i A_i = S$, then $A_1, A_2, \ldots$ form a partition of $S$.

Axioms of Probability
Def 1.2.1. A collection of events is called a $\sigma$-algebra, denoted $\mathcal{B}$, if
1. $\emptyset \in \mathcal{B}$
2. If $A \in \mathcal{B}$ then $A^c \in \mathcal{B}$ (closed under complements)
3. If $A_1, A_2, \ldots \in \mathcal{B}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$ (closed under countable unions).
Examples
- Toss a coin
- Roll a die
- Roll a die to see if you get a 6.

Probability space
Def 1.2.4. Given $S$ and $\mathcal{B}$, a probability function is a function $P : \mathcal{B} \to [0, 1]$ such that
1. $P(A) \ge 0$ for all $A \in \mathcal{B}$
2. $P(S) = 1$
3. If $A_1, A_2, \ldots \in \mathcal{B}$ are pairwise disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_i P(A_i)$.
Examples
- Toss a coin
- Roll a die to see if you get a 6.
- Circle $\{(x, y) : (x - 0.5)^2 + (y - 0.5)^2 \le 1\}$ in the unit square $[0, 1]^2$.

National Lottery counting
- There are $49! = 1 \times 2 \times \cdots \times 48 \times 49$ ways to pick 49 balls in order (without replacement).
- If we only pick 6 balls, there are $\frac{49 \times 48 \times \cdots \times 44}{6 \times 5 \times \cdots \times 1}$ possibilities.
Def 1.2.17. Binomial coefficients: $\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$, read "$n$ choose $r$", is the number of ways of picking $r$ objects from $n$ objects.

Conditional probability
Def 1.3.2
- Events $A, B \in \mathcal{B}$.
- $P(B) > 0$.
- The conditional probability of $A$ given $B$ is $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.
$P(\cdot \mid B)$ satisfies the axioms for being a probability measure!

Bayes Rule
By the definition of conditional probability:
$P(A \mid B) = P(B \mid A) \frac{P(A)}{P(B)}$
Theorem 1.3.5:
- Let $A_1, A_2, \ldots$ partition the sample space $S$,
- Let $B$ be any event ($P(B) > 0$),
$P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{\infty} P(B \mid A_j) P(A_j)}$
Or if $A_1 = A$, $A_2 = A^c$,
$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A^c) P(A^c)}$

Independence
Def 1.3.7: Two events $A$ and $B$ are independent if $P(A \cap B) = P(A) P(B)$.
Can an event be independent of itself?
Is $A^c$ independent of $B$?
Def 1.3.12: A collection of events $A_1, A_2, \ldots$ is independent if for every $n$ element subset $A_{i_1}, \ldots, A_{i_n}$
$P\left(\bigcap_{j=1}^{n} A_{i_j}\right) = \prod_{j=1}^{n} P(A_{i_j})$.
Do not confuse this with pairwise independence! Do not think about pairwise independence!!!

Random variables
Def 1.4.1: A random variable is a function $X : S \to \mathbb{R}$.
- Represent something random like rolling a die
- Not actually random themselves,
- also not actually variables, on account of being functions.
Examples
- Toss a coin (1 for H, 0 for T)
- Toss $n$ coins and count the number of H
- Toss a coin repeatedly: count how many H before the first T.

CDF - cumulative distribution function
Def 1.5.1: $F_X(x) = P(X \le x)$
- cadlag: continue à droite, limite à gauche (right-continuous with left limits)
- Limit 0 on the left ($x \to -\infty$)
- Limit 1 on the right ($x \to +\infty$)
- non-decreasing
Examples
- Roll a die
- Traffic lights waiting time.
- Radioactive decay

Density and mass functions
Def 1.6.1: Discrete r.v. - probability mass function
$f_X(x) = P(X = x)$ for all $x$
Def 1.6.3: Continuous r.v. - the probability density function $f_X(x)$ satisfies
$P(X \le x) = F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
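
The relationship between a mass or density function and the CDF can be checked numerically on two of the examples listed above. This is a minimal sketch under stated assumptions: the die is taken to be fair, radioactive decay is modelled by an exponential distribution with a hypothetical rate of 1.0, and all function names are illustrative and do not come from the slides.

```python
from fractions import Fraction
import math


def die_pmf(x):
    """PMF of a fair six-sided die: f_X(x) = P(X = x)."""
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)


def die_cdf(x):
    """CDF F_X(x) = P(X <= x), obtained by summing the PMF up to x."""
    return sum((die_pmf(k) for k in range(1, 7) if k <= x), Fraction(0))


def exp_pdf(t, rate=1.0):
    """Density of an exponential waiting time (assumed decay model)."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def exp_cdf(x, rate=1.0):
    """Closed-form CDF of the exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def integrate_pdf(pdf, x, steps=10_000):
    """Midpoint-rule integral of pdf over [0, x].

    The density is zero for t < 0, so this equals the integral from
    minus infinity to x that appears in Def 1.6.3.
    """
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h


print(die_cdf(3))    # 1/2
print(die_cdf(3.5))  # 1/2 (the CDF is a step function, flat between jumps)
print(die_cdf(6))    # 1

# The numerical integral of the density should agree with the closed-form CDF.
print(abs(integrate_pdf(exp_pdf, 2.0) - exp_cdf(2.0)) < 1e-6)  # True
```

The die CDF jumps at each integer from 1 to 6 and is constant in between, which is the right-continuous, non-decreasing behaviour described on the CDF slide; the exponential CDF has no jumps because the distribution has a density.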