Bioinformatics lectures at Rice University
Lecture 4: Shannon entropy and mutual information -- from Science, 16 December 2011

The definition of Shannon entropy

In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits. Here a 'message' means a specific realization of the random variable. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable. The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication".

How do we measure information in a message?

Definition of a message: a string of symbols.

The following was Shannon's argument (from Shannon's "A Mathematical Theory of Communication"; Shannon entropy was established in the context of telegraph communication).

Shannon's argument for naming H "the entropy":

Some properties of H:
• The amount of entropy is not always an integer number of bits.
• Many data bits may not convey information. For example, data structures often store information redundantly, or have identical sections regardless of the information in the data structure.
• For a message of n characters, H is larger when a larger character set is used. Thus '010111000111' carries less information than 'qwertasdfg12'.
• However, if a character is rarely used, its contribution to H is small, because p·log(p) → 0 as p → 0. Likewise, if one character constitutes the vast majority, e.g. the 1s in '10111111111011', its contribution to H is small, since p·log(p) → 0 as p → 1.
• For a random variable with n outcomes, H reaches its maximum when the outcomes are all equally probable, i.e. each has probability 1/n, and then H = log(n).
• When x is a continuous variable with a fixed variance, Shannon proved that H reaches its maximum when x follows a Gaussian distribution.

Shannon's proof:

Summary

Or simply:

    H = -Σ_i p_i log(p_i)

Note that:
• H ≥ 0
• Σ_i p_i = 1
• H is larger when there are more probable states.
• H can be computed whenever a probability distribution p is available.

(A short numerical sketch of this formula is given at the end of these notes.)

Mutual information

Mutual information (MI) is a measure of dependence between two random variables. The concept was introduced by Shannon in 1948 and has become widely used in many different fields.

Formulation of mutual information:

    I(X;Y) = Σ_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]

(See also the second sketch at the end of these notes.)

Meaning of MI
MI and H
Properties of MI
Connection between correlation and MI

Example of MI application: home advantage

Estimated MI between the goals a team scored in a game and whether the team was playing at home or away. The heights of the grey bars give the approximate 95% points of the null distribution. The Canada value is plotted below the line because it would otherwise have been hidden by the grey shading above.

Reference reading

A correlation for the 21st century. Terry Speed. Commentary on MIC. Science, December 2011. http://www.sciencemag.org/content/334/6062/1502.full
Detecting novel associations in large data sets. Reshef et al. Science, December 2011.
Some data analyses using mutual information. David Brillinger. http://www.stat.berkeley.edu/~brill/Papers/bjps1.pdf
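
Sketch 1: Shannon entropy of a message. The short Python snippet below is not part of the original lecture; it is a minimal illustration of H = -Σ p_i log2(p_i), taking the symbol probabilities p_i to be the observed frequencies in the message. The function name shannon_entropy is my own; the example strings are the ones used in the properties list above.

    # Illustrative sketch (not from the lecture): Shannon entropy of a message,
    # with p_i taken as the observed frequency of each symbol.
    from collections import Counter
    from math import log2

    def shannon_entropy(message):
        """Return H = -sum_i p_i * log2(p_i), in bits per symbol."""
        counts = Counter(message)
        n = len(message)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # A larger character set gives a larger H for a message of the same length:
    print(shannon_entropy("010111000111"))    # ~0.98 bits ('0' and '1' only)
    print(shannon_entropy("qwertasdfg12"))    # log2(12) ~ 3.58 bits (12 distinct symbols)

    # A symbol that constitutes the vast majority contributes little to H:
    print(shannon_entropy("10111111111011"))  # ~0.59 bits, well below the 1-bit maximum

The last value also illustrates the uniform-maximum property: with two symbols the maximum is log2(2) = 1 bit, reached only when '0' and '1' are equally frequent.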
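
Sketch 2: mutual information from a joint probability table. Again this is an illustration written for these notes, not lecture code; the two 2x2 tables are made-up toy distributions, not the home/away goals data analysed in the referenced papers.

    # Illustrative sketch (not from the lecture): I(X;Y) in bits from a joint
    # probability table, I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x)*p(y)) ).
    from math import log2

    def mutual_information(joint):
        """joint[i][j] = p(X = x_i, Y = y_j); returns I(X;Y) in bits."""
        px = [sum(row) for row in joint]        # marginal of X (row sums)
        py = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
        mi = 0.0
        for i, row in enumerate(joint):
            for j, pxy in enumerate(row):
                if pxy > 0:
                    mi += pxy * log2(pxy / (px[i] * py[j]))
        return mi

    # Independent variables give I = 0; dependent variables give I > 0.
    independent = [[0.25, 0.25],
                   [0.25, 0.25]]
    dependent   = [[0.40, 0.10],
                   [0.10, 0.40]]
    print(mutual_information(independent))  # 0.0
    print(mutual_information(dependent))    # ~0.28 bits

When MI is estimated from real data, as in the home-advantage example, empirical frequencies stand in for p(x,y); that is presumably why the figure compares each estimate against approximate 95% points of a null (independence) distribution.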