Random Variables & Entropy: Extension and Examples
Brooks Zurn
EE 270 / STAT 270, Fall 2007

Overview
• Density Functions and Random Variables
• Distribution Types
• Entropy

Density Functions
• PDF vs. CDF
[Figure: overlaid PDF and CDF curves; both axes run from 0 to 1]
– PDF shows the probability of each size bin
– CDF shows the cumulative probability for all sizes up to and including the current bin
– This data shows the normalized, relative size of a rodent as seen from an overhead camera, for 8 behaviors

Markov & Chebyshev Inequalities
• What's the point?
• They set an upper limit on a probability
• This limits the search space for a solution
– When looking for a needle in a haystack, it helps to have a smaller haystack.
• The limit can be used to determine the necessary sample size

Markov & Chebyshev Inequalities
• Example: the mean height of a child in a kindergarten class is 3'6", i.e., 42 inches. (Leon-Garcia text, p. 137; see the references at the end of the presentation)
– Using Markov's inequality, the probability of a child being taller than 9 feet (108 inches) is $\le 42/108 = 0.389$. So in a class of 100 students there will be fewer than 39 students over 9 feet tall, and no fewer than 61 students under 9 feet tall.
– Using Chebyshev's inequality, and assuming a standard deviation of 1 foot (12 inches), the probability that a height deviates from the mean by the $108 - 42 = 66$ inches needed to reach 9 feet is $\le 12^2/66^2 \approx 0.033$. So there will be no more than about 3 students taller than 9 feet in a class of 100 (consistent with Markov's inequality), and no fewer than about 97 students under 9 feet tall.
– This gives us a basic idea of how many student heights we need to measure to rule out the possibility that we have a 9-foot-tall student... SAMPLE SIZE!! (See the numerical check in the backup slides.)

Markov's Inequality
For a random variable $X \ge 0$ and any $c > 0$,
$$P\{X \ge c\} \le \frac{E[X]}{c}$$
Derivation:
$$E[X] = \int_0^\infty x f_X(x)\,dx \ge \int_c^\infty x f_X(x)\,dx \ge c \int_c^\infty f_X(x)\,dx = c\,P\{X \ge c\},$$
where the density is interpreted as $f_X(x) = \lim_{\varepsilon \to 0} P[x - \varepsilon/2 \le X \le x + \varepsilon/2]/\varepsilon$.
The key step is the last inequality: over the region $x \ge c$, the constant $c$ never exceeds $x$, so $x f_X(x) \ge c f_X(x)$ throughout.
Reference: Lefebvre text.

Chebyshev's Inequality
For any $c > 0$,
$$P\{\,|Y - E[Y]| \ge c\,\} \le \frac{\sigma_Y^2}{c^2}$$
Derivation (sketch):
Apply Markov's inequality to the nonnegative random variable $(Y - E[Y])^2$ with threshold $c^2$:
$$P\{(Y - E[Y])^2 \ge c^2\} \le \frac{E[(Y - E[Y])^2]}{c^2} = \frac{\sigma_Y^2}{c^2}.$$
Because $|Y - E[Y]|^2 = (Y - E[Y])^2$, the event $\{(Y - E[Y])^2 \ge c^2\}$ is exactly the event $\{|Y - E[Y]| \ge c\}$, which gives the result. As before, $c^2$ is a constant while $(Y - E[Y])^2$ varies over the region of integration.
Reference: Lefebvre text.

Note
• Both inequalities are connected to the Central Limit Theorem, which is derived in the Leon-Garcia text on p. 287.
• The Central Limit Theorem states that the CDF of a normalized sum of n random variables approaches the CDF of a Gaussian random variable. (p. 280)

Overview
• Entropy
– What is it?
– Used in...

Entropy
• What is it?
– According to Jorge Cham (PhD Comics): [comic omitted]

Entropy
• "Measure of uncertainty in a random experiment" (Reference: Leon-Garcia text)
• Used in information theory
– Message transmission (for example, Lathi text p. 682)
– Decision tree 'gain criterion'
• Leon-Garcia text p. 167
• ID3 and C4.5 by J. Ross Quinlan; ITI by Paul Utgoff and colleagues
• Note: NOT the same as the Gini index used as a splitting criterion by the CART tree method (Breiman et al., 1984).

Entropy
• ID3 Decision Tree: expected information for a binary tree
$$E(A) = \sum_{j=1}^{q} \frac{s_{1j} + s_{2j} + \cdots + s_{nj}}{s}\, I(s_{1j}, s_{2j}, \ldots, s_{nj})$$
where the entropy $I$ is
$$I(s_1, s_2, \ldots, s_n) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
• E(A) is the average information needed to classify a sample after splitting on attribute A (a short sketch follows).
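The sketch below is a minimal illustration of the two formulas above. The dataset is hypothetical (14 samples in two classes, split on a made-up binary attribute A); none of the numbers come from the lecture or the rodent study.

    import math

    def entropy(counts):
        # I(s_1, ..., s_n) = -sum_i p_i log2 p_i, with p_i = s_i / s
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total)
                    for c in counts if c > 0)

    def expected_info(partitions):
        # E(A): entropy of each branch j, weighted by the fraction of
        # samples (s_1j + ... + s_nj) / s that fall into that branch
        s = sum(sum(branch) for branch in partitions)
        return sum((sum(branch) / s) * entropy(branch)
                   for branch in partitions)

    # Hypothetical data: 14 samples, 9 in class 1 and 5 in class 2.
    before = entropy([9, 5])                       # ~0.940 bits
    # Splitting on A sends [6, 1] down one branch and [3, 4] down the other.
    after = expected_info([[6, 1], [3, 4]])        # ~0.788 bits
    print(f"gain(A) = {before - after:.3f} bits")

ID3 evaluates E(A) for every candidate attribute and splits on the one with the largest gain, i.e., the largest drop in average information needed to classify a sample.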
Entropy
• ITI (Incremental Tree Inducer)
– Based on ID3 and its successor, C4.5
– Uses a gain-ratio metric to improve performance in certain cases

Entropy
• ITI Decision Tree for Rodent Behaviors
– ITI is an extension of ID3
Reference: 'Rodent Data' paper.

Distribution Types
• Continuous random variables
– Normal (or Gaussian) distribution
– Uniform distribution
– Exponential distribution
– Rayleigh distribution
• Discrete ('counting') random variables
– Binomial distribution
– Bernoulli and geometric distributions
– Poisson distribution

Poisson Distribution
$$P\{X = n\} = \frac{\alpha^n}{n!} e^{-\alpha}, \quad n = 0, 1, 2, \ldots$$
and the probability generating function is
$$P_X(z) = \sum_{n=0}^{\infty} \frac{(\alpha z)^n}{n!} e^{-\alpha} = e^{\alpha(z - 1)}$$
• Models the number of events occurring in one unit of time, when the time between events is exponentially distributed with mean $1/\alpha$.
• Gives a method for modeling completely random, independent events that occur after a random interval of time. (Leon-Garcia p. 106)
• The Poisson distribution arises as the limit of a sequence of Bernoulli trials when the number of trials is large and the success probability is small. (Leon-Garcia p. 109)
– A Bernoulli trial gives the probability of a single coin toss.
• A simulation check appears in the backup slides.
References: Kao text, Leon-Garcia text.

Poisson Distribution
• Example PMF plot: http://en.wikipedia.org/wiki/Image:Poisson_distribution_PMF.png

References
• Lefebvre text: M. Lefebvre, Applied Stochastic Processes. New York, NY: Springer, 2003.
• Kao text: E. P. C. Kao, An Introduction to Stochastic Processes. Belmont, CA: Duxbury Press at Wadsworth Publishing Company, 1997.
• Lathi text: B. P. Lathi, Modern Digital and Analog Communication Systems, 3rd ed. New York, Oxford: Oxford University Press, 1998.
• Entropy-based decision trees:
– ID3: P. E. Utgoff, "Incremental induction of decision trees," Machine Learning, vol. 4, pp. 161-186, 1989.
– C4.5: J. R. Quinlan, C4.5: Programs for Machine Learning, 1st ed. San Francisco, CA: Morgan Kaufmann Publishers Inc., 1993.
– ITI: P. E. Utgoff, N. C. Berkman, and J. A. Clouse, "Decision tree induction based on efficient tree restructuring," Machine Learning, vol. 29, pp. 5-44, 1997.
• Other decision tree methods:
– CART: L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
• Rodent data: J. Brooks Zurn, Xianhua Jiang, and Yuichi Motai, "Video-Based Tracking and Incremental Learning Applied to Rodent Behavioral Activity under Near-Infrared Illumination," to appear in IEEE Transactions on Instrumentation and Measurement, December 2007 or February 2008.
• Poisson distribution example: http://en.wikipedia.org/wiki/Image:Poisson_distribution_PMF.png

Questions?
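Backup Slide: Checking the Markov & Chebyshev Bounds
A minimal simulation of the kindergarten example. The height model is an assumption made only for illustration: a lognormal distribution (nonnegative, like a height) matched to the 42-inch mean and the assumed 12-inch standard deviation; it is not part of the Leon-Garcia example.

    import math
    import random

    m, s, c = 42.0, 12.0, 108.0   # mean, std dev, threshold, all in inches

    # Markov: P[X >= c] <= E[X] / c for X >= 0
    markov_bound = m / c                      # 42/108 ~ 0.389
    # Chebyshev: P[|X - m| >= a] <= var / a^2, with a = c - m = 66 inches
    chebyshev_bound = s**2 / (c - m) ** 2     # 144/4356 ~ 0.033

    # Hypothetical lognormal height model matched to mean m and std dev s
    sigma2 = math.log(1 + (s / m) ** 2)
    mu = math.log(m) - sigma2 / 2
    heights = [random.lognormvariate(mu, math.sqrt(sigma2))
               for _ in range(100_000)]
    tail = sum(h >= c for h in heights) / len(heights)

    print(f"Markov bound        : {markov_bound:.3f}")
    print(f"Chebyshev bound     : {chebyshev_bound:.3f}")
    print(f"simulated P[X >= c] : {tail:.5f}")   # tiny: the bounds are loose but valid

Both bounds hold for any distribution with the given mean (and variance, for Chebyshev); the simulation simply shows how loose they can be for one plausible model, which is exactly why they are useful for worst-case sample-size arguments.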
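Backup Slide: Poisson PMF vs. Exponential Interarrival Times
A minimal check of the interpretation on the Poisson slide: counting exponential($\alpha$) interarrival events inside one unit of time reproduces the Poisson PMF. The rate $\alpha = 3.0$ and the trial count are arbitrary illustrative choices.

    import math
    import random

    alpha, trials = 3.0, 100_000

    def poisson_pmf(n, a):
        # P{X = n} = e^(-a) a^n / n!
        return math.exp(-a) * a ** n / math.factorial(n)

    def events_in_unit_time(a):
        # Accumulate exponential gaps (mean 1/a) until time exceeds 1.0
        t, n = 0.0, 0
        while True:
            t += random.expovariate(a)
            if t > 1.0:
                return n
            n += 1

    counts = [events_in_unit_time(alpha) for _ in range(trials)]
    for n in range(6):
        sim = counts.count(n) / trials
        print(f"P[X = {n}]  pmf = {poisson_pmf(n, alpha):.4f}  sim = {sim:.4f}")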