• Office hours shifted: 2-4 Tuesdays • Summary & Discussion of Gell-Mann • Definitions & Examples – Shannon Information – Mutual Information – Kolmogorov Complexity (AIC) • Turing Machine – Effective Complexity Shannon Information • Shannon Entropy H to measure basic information capacity: – For a random variable X with a probability mass function p(x), the entropy H ( X ) of log 2 p( x) as: Xp(isx)defined – Entropy is measured in bits. – H measures the average uncertainty in the random variable. • Example 1: – Consider a random variable with uniform distribution over 32 outcomes. – To identify an outcome, we need a label that takes on 32 different values, e.g., 5-bit strings. 32 32 H ( X ) p (i ) log p (i ) i 1 i 1 1 1 log log 32 5 bits 32 32 What is a Random Variable? • A function defined on a sample space. – Should be called “random function.” – Independent variable is a point in a sample space (e.g., the outcome of an experiment). • A function of outcomes, rather than a single given outcome. • Probability distribution of the random variable X: P{ X x j } f ( x j ) (j 1,2,...) • Example: – – – – Toss 3 fair coins. Let X denote the number of heads appearing. X is a random variable taking on one of the values (0,1,2,3). P{X=0} = 1/8; P{X=1} = 3/8; P{X=2} = 3/8; P{X=3} = 1/8. • Example 2: – A horse race with 8 horses competing. – The probabilities of 8 horses are: 1 1 1 1 1 1 1 1 , , , , , , , 2 4 8 16 64 64 64 64 – Calculate the entropy H of the horse race: 1 1 1 1 1 1 1 1 1 1 H ( X ) log log log log 4 log 2 bits 2 2 4 4 8 8 16 16 64 64 – Suppose that we wish to send a (short) message to another person indicating which horse won the race. – Could send the index of the winning horse (3 bits). – Alternatively, could use the following set of labels: • 0, 10, 110, 1110, 111100, 111101, 111110, 111111. • Average description length is 2 bits (instead of 3). • Huffman code: variable length ‘prefix free’ code for maximum lossless compression • More generally, the entropy of a random variable is a lower bound on the average number of bits required to represent the random variable. • Shannon information is the amount of ‘surprise’ when you read a message • The amount of uncertainty you have before you read the message— information measures how much you’ve reduced that uncertainty by reading the message. • The uncertainty (complexity) of a random variable can be extended to define the descriptive complexity of a single string. – E.g., Kolmogorov (or algorithmic) complexity is the length of the shortest computer program that prints out the string. • Entropy is the uncertainty of a single random variable. • Conditional entropy is the entropy of a random variable given another random variable. Mutual Information • Measures the amount of information that one random variable contains about another random variable. – Mutual information is a measure of reduction of uncertainty due to another random variable. – That is, mutual information measures the dependence between two random variables. – It is symmetric in X and Y, and is always non-negative. • Recall: Entropy of a random variable X is H(X). • Conditional entropy of a random variable X given another random variable Y = H(X | Y). • The mutual information of two random variables X and Y is: I ( X , Y ) H ( X ) H ( X | Y ) p ( x, y ) log x, y H(X) H(Y) p ( x, y ) p( x) p( y ) H(X|Y) Algorithmic Complexity (AIC) (also known as Kolmogorov-Chaitin complexity) • Kolomogorov-Chaitin complexity or Algorithmic Information Content, K(x) or KU(x), is the length, in bits, of the smallest program that when run on a Universal Turing Machine outputs (prints) x and then halts. • Example: What is K(x) where x is the first 10 even natural numbers? Where x is the first 5 million even natural numbers? • Possible representations: – 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, … (2n - 2) K(x) = O(n log n) bits – for (j = 0; j < n: j++) printf(“%d\n”, j * 2) K(x) = O(log n) Two problems with AIC – Calculation of K(x) depends on the machine we have available (e.g., what if we have a machine with an instruction “print the first 10 even natural numbers”?) – Determining K(x) for arbitrary x is uncomputable AIC cont. • AIC formalizes what it means for a set of numbers to be compressible and incompressible. – Data that are redundant can be more easily described and have lower AC. – Data that have no clear pattern and no easy algorithmic description have high AIC. • Random strings are incompressible, therefore contain no regularities to compress – K(x) = | Print(x) | • Implication: The more random a system, the greater its AIC. • Contrast with Statistical simplicity Random strings are simple because you can approximate them statistically – Coin toss, random walks, Gaussian (normal distributions) – You can compress random numbers with statistical descriptions and only a few parameters Gell-Mann’s Effective Complexity • The length of the shortest description of a set’s regularities • EC(x) = K(r) where r is the set of regularities in x and Kolmogorov Complexity, K(r), is the length of a concise description of a set • Highest for entities that are not strictly regular or random High Shannon Info Low compressibility* Random Algorithmic Complexity Effective Complexity Low Shannon Info High compressibility Orderly Randomness h m Randomness h m – Gell Mann suggests one formal way to identify regularities – Determine mutual AIC between parts of the string • If x = [x1,x2] • K(x1,x2) = K(x1) + K(x2) – K (x) • The sum of the AICs of the parts – the AIC of the whole • Eg. 10010 10011 10010 -the whole has more regularity than the sum of the regularities in the parts Total Information – Alternative approach in Gell-Mann & Lloyd 1998 – EC(x) = K(E) where E is the set of entities of which x is a typical member – Then K(x) is the length of the shortest program required to specify the members of a the set of which x is a typical member – Effective complexity measures knowledge—the extent to which the entity is nonrandom and predictable – Total Information is Effective complexity, K(E), + the Shannon Information (peculiarities, randomness) – TI(x) = K(E) + H(x) – There is a tradeoff between the effective complexity (how complete a description of the regularities) and the remaining randomness – Ex: 10010 10011 10010 10010 10010 Language Complexity • How complex is the English (or Spanish or Chinese or Thai or C programming or Java…) language? • What is the Shannon information content oof a message in this language, as a function of the message length? • What is the effective complexity (AIC of the regularities? • What is the total information content? • How well do these measures approximate the complexity you perceive in these languages? What is the information content, AIC, Effective Complexity & logical depth of this string? 0413433021131473713868974402394801381716598485518981513440862714 2027932522312442988890890859944935463236713411532481714219947455 6443658237932020095610583305754586176522220703854106467494942849 8145339172620056875566595233987560382563722564800409510712838906 1184470277585428541980111344017500242858538249833571552205223608 7250291678860362674527213399057131606875345083433934446103706309 4520191158769724322735898389037949462572512890979489867683346116 2688911656312347446057517953912204556247280709520219819909455858 1946136877445617396074115614074243754435499204869180982648652368 4387027996490173977934251347238087371362116018601281861020563818 1835409759847796417390032893617143215987824078977661439139576403 7760537119096932066998361984288981837003229412030210655743295550 3888458497370347275321219257069584140746618419819610061296401614 8771294441590140546794180019813325337859249336588307045999993837 5411726563553016862529032210862320550634510679399023341675 Summary of Complexity Measures • Information-theoretic methods: – – – • Effective Complexity: – • How complex a machine is needed to compute a function? Logical depth: – • How many resources does it take to compute a function? The language/machine hierarchy: – • Neither regular nor random Computational complexity: – • Shannon Entropy Algorithmic complexity Mutual information Run-time of the shortest program that generates the phenomena and halts. Asymptotic behavior of dynamical systems: – – Fixed points, limit cycles, chaos. Wolfram’s CA classification: the outcome of complex CA can not be predicted any faster than it can be simulated. Frozen Accidents & Sensitive dependence on initial conditions For Want of a Nail For want of a nail the shoe was lost. For want of a shoe the horse was lost. For want of a horse the rider was lost. For want of a rider the battle was lost. For want of a battle the kingdom was lost. And all for the want of a horseshoe nail. Hierarchy, Interactions, Frozen Accidents HELA: An exponential that has lasted forever Under the microscope, a cell looks a lot like a fried egg: It has a white (the cytoplasm) that's full of water and proteins to keep it fed, and a yolk (the nucleus) that holds all the genetic information that makes you you. The cytoplasm buzzes like a New York City street. It's crammed full of molecules and vessels endlessly shuttling enzymes and sugars from one part of the cell to another, pumping water, nutrients, and oxygen in and out of the cell. All the while, little cytoplasmic factories work 24/7, cranking out sugars, fats, proteins, and energy to keep the whole thing running and feed the nucleus. The nucleus is the brains of the operation; inside every nucleus within each cell in your body, there's an identical copy of your entire genome. That genome tells cells when to grow and divide and makes sure they do their jobs, whether that's controlling your heartbeat or helping your brain understand the words on this page. Defler paced the front of the classroom telling us how mitosis — the process of cell division — makes it possible for embryos to grow into babies, and for our bodies to create new cells for healing wounds or replenishing blood we've lost. It was beautiful, he said, like a perfectly choreographed dance. All it takes is one small mistake anywhere in the division process for cells to start growing out of control, he told us. Just one enzyme misfiring, just one wrong protein activation, and you could have cancer. Mitosis goes haywire, which is how it spreads. "We learned that by studying cancer cells in culture," Defler said. He grinned and spun to face the board, where he wrote two words in enormous print: HENRIETTA LACKS. I had the idea that I'd write a book that was a biography of both the cells and the woman they came from — someone's daughter, wife, and mother. From The immortal life of Henrietta Lacks. http://www.npr.org/templates/story/story.php?storyId=123232331 Defining Complexity Suggested References • • • • • • Computational Complexity by Papadimitriou. Addison-Wesley (1994). Elements of Information Theory by Cover and Thomas. Wiley (1991). Kaufmann, At Home in the Universe (1996) and Investigations (2002). Per Bak, How Nature Works: The Science of Self-Organized Criticality (1988) Gell-Mann, The Quark and the Jaguar (1994) Ay, Muller & Szkola, Effective Complexity and its Relation to Logical Depth, ArXiv (2008) Mitchell Ch. 3 Information • CAS process information – CAS are computers • Energy is Conserved • Entropy increases: the arrow of time • Statistical mechanics bridges classical Newtonian physics to thermodynamics: S = k log W • Maxwell’s Demon – Information costs energy • Real world information—analyzed for meaning, processed for some outcome, dependent on context Mitchell Ch. 4 Godel & Turing • Godel: Mathematics is Incomplete “This statement is false.” • Turing: Mathematics is undecideable • Turing Machine defines definite procedure • UTM: tape contains both the input I and the machine (or program) M • I could be the M of another machine (it could even be the same M) Turing Machines 1. 2. 3. 4. a tape A head that can r/w & move l/r Instruction table State register Tape is infinite, all else is finite and discrete Proof by Contradiction that Math/logic/computation is undecideable • Turing Statement (to be contradicted): A TM, H, given input I and Machine M will halt in finite time and return Yes if M will halt on I, or No if M will not halt on I H(M,I) = 1 if M halts on I H(M,I) = 0 if M does not halt on I • Equivalent to saying we can design an infinite loop detector H • Why can’t H just run M on I? Inspiration from Godel Create H’ that calculates H(M,M) except, H’ (M,M) does not halt if M halts on M H’ (M,M) halts if if M does not halt on M • What does H’ do when it is its own input? • H’(H’,H’) halts if H’ doesn’t halt on its own input; • H’ doesn’t halt if H’ halts on its own input CONTRADICTION • Godel encodes logical/mathematical statements so they talk about themselves. • Turing encoded logic in TM and runs TM on themselves. • Demonstrate that math/logic are incomplete and undecideable My view of CAS • Incomplete, Undecidable We know it when we see it • Process information • Interactions between components • Hierarchy of components • Result from frozen accidents & arrow of time – Largely from evolution – Exception: climate & weather: memory at different scales • Predictable at some scales, sometimes Logical Depth • Bennett 1986;1990: – The Logical depth of x is the run time of the shortest program that will cause a UTM to produce x and then halt. – Logical depth is not a measure of randomness; it is small both for trivially ordered and random strings. • Drawbacks: – Uncomputable. – Loses the ability to distinguish between systems that can be described by computational models less powerful than Turing Machines (e.g., finite-state machines). • Ay et al 2008, Recent proposed proof that strings with high effective complexity also have high logical depth, and low effective complexity have small logical depth.