Kolmogorov Complexity and Effective Computation: A New Way of Formulating Proofs

Fouad B. Chedid, Ph.D.

Presented at the "International Conference on Scientific Computations", Lebanese American University, July 1999, Beirut, Lebanon.

Abstract

Kolmogorov complexity, a topic deeply rooted in probability theory, combinatorics, and philosophical notions of randomness, has produced a new technique that captures our intuition of effective computation as set out by Turing and Church in the 1930s. This technique, named the incompressibility method, together with the fact that most strings are Kolmogorov random, has led researchers to a new way of formulating proofs, one that is intuitive, direct, and almost trivial. This paper reviews the basics of Kolmogorov complexity and explains the use of the incompressibility method. Examples are drawn from mathematics and computer science.

Index Terms: Kolmogorov complexity, incompressibility method, lower bounds.

Introduction

We use the term Kolmogorov complexity to refer to what has appeared in the literature over the last 35 years under the names Descriptive Complexity, Program-Size Complexity, Algorithmic Information Theory, Kolmogorov Complexity, and Kolmogorov-Solomonoff-Chaitin Complexity (Kolmogorov and Uspensky, 1963; 1987; Kolmogorov, 1963; 1965a; 1965b; 1983; Solomonoff, 1964; 1975; Chaitin, 1966; 1969; 1970; 1971; 1974; 1977; 1987a; 1987b; Li and Vitanyi, 1990; 1992a; 1992b; 1993; Loveland, 1969; Martin-Lof, 1966; Watanabe, 1992).

Basically, Kolmogorov complexity is about the notion of randomness, viewed in light of what the theories of computation and algorithms have taught us over the last 30 years. Turing and Church established the notion of effective computation, and we are now comfortable with the idea of a universal Turing machine. This is to say that different computing methods do not add considerably to the complexity of describing an object; the complexity of an object is an intrinsic property of the object itself, independent, up to an additive constant, of the method of computing in use. This observation has led researchers to a new definition of complexity, one that captures the real, or pure, amount of information stored in a string. The Kolmogorov complexity of a string w, denoted K(w), is the length of the shortest pair <program, input> which, when run on a computer, produces w.

The rest of this paper is organized as follows. Section 1 lists basic definitions and facts about our topic. Section 2 describes a hierarchy of randomness among strings. Section 3 describes the encoding method to be used. The incompressibility method is introduced in Section 4. Section 5 compares the incompressibility method to two other popular methods, and Section 6 concludes with four proofs that make use of the incompressibility method.

1. Definitions and Facts

We begin with a simple example. Consider the following two 16-bit series: (a) 1010101010101010 and (b) 1000101000111101. Which of these two series is more random? Intuitively, it is clear that (b) is more random than (a), because (a) is more regular ('10' repeated 8 times). Now, what should we consider as the basis of our decision? If we consider the origin of these two events, say both generated by independent trials of flipping a fair coin, recording 1 for heads and 0 for tails, then both (a) and (b) are equally likely, and as such they are equally random. Kolmogorov randomness, however, offers a different view: it is about the length of the shortest <program, input> capable of reproducing a string. In this regard, (b) is more random than (a), because (a) can be generated as "print '10' eight times", while the only way to produce (b) seems to be "print 1000101000111101"; there does not appear to be any shorter way of describing the string in (b) than writing (b) out in its entirety. It should now be clear why random strings are referred to as incompressible strings, while non-random strings are said to be compressible.

Definition 1.1. Random strings are those that do not admit descriptions significantly shorter than their own length (they cannot be significantly compressed).

In this respect, randomness is a relative quantity. Some strings are extremely compressible, others cannot be compressed at all, and still others can be compressed, but only with a lot of effort. Strings that are hard to compress will be called Kolmogorov random strings. The word random here agrees with the notion of randomness in statistics; that is, a string is random if and only if every digit in the string appears with equal relative frequency. In the case of binary strings, a random string is one in which each bit appears with a relative frequency of 1/2. This also agrees with Shannon's information theory, where the amount of information (entropy) is defined as I = - Σ p_i log p_i, where p_i is the relative frequency of symbol i. For random binary strings, p_i = 1/2 and I = 1, the maximal entropy.

Definition 1.2. A series is random if the smallest <algorithm, input> capable of specifying it to a computer has about the same number of digits as the series itself.

Fact 1.3. K(w) ≤ |w| + c. We can always use the copy Turing machine to copy the input to the output; in this inequality, c is the length of the "copy" program.

Fact 1.4. Strings which are not random are few. To be compressible, a string must exhibit enough regularity in its structure to allow a significant reduction in its description. For example, the string pp...p, the pattern p repeated n times, can be generated as "print 'p' n times". This program has length about log n + |p|, while the string itself has length n|p|. For large n, log n is significantly smaller than n.

Fact 1.5. Computable numbers have constant Kolmogorov complexity: the fixed program that computes such a number is itself a description of it.
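As a concrete illustration of the two series above and of Fact 1.4, the following small sketch (written in Python for concreteness; the function name and the cost accounting are ours, chosen only for illustration) compares a literal description of a binary string with a description of the form "repeat pattern p, k times". The toy cost model, pattern bits plus a log-sized counter plus a two-bit header, stands in for the <program, input> pairs of the definition; it is not the universal machine itself.

    import math

    def toy_description_length(s):
        """Bits needed by the shortest description our toy language can find:
        either the literal string itself, or 'repeat pattern p, k times'."""
        best = len(s)                              # literal copy: |s| bits (Fact 1.3, ignoring the constant)
        for plen in range(1, len(s) // 2 + 1):
            if len(s) % plen:
                continue
            p, k = s[:plen], len(s) // plen
            if p * k == s:                         # s is the pattern p repeated k times
                cost = plen + math.ceil(math.log2(k)) + 2   # |p| + counter + small header (assumed costs)
                best = min(best, cost)
        return best

    print(toy_description_length("1010101010101010"))  # series (a): 7 bits in the toy model ('10' repeated 8 times)
    print(toy_description_length("1000101000111101"))  # series (b): 16 bits; no repeating pattern is found

Series (a) collapses to a few bits because of its regularity, while the toy language finds nothing shorter for series (b) than the 16 bits of the series itself, which matches the intuition behind Definition 1.1.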
2. Hierarchy of Randomness

Given a string of length n, how many descriptions are there of length less than n? A simple calculation shows that there are 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 descriptions (programs) of length less than n. Thus, among all 2^n strings of length n, there is at least one string which cannot be compressed at all, not even by one bit.

Discovering that, for a given size n, one of the strings is absolutely random does not say much about the rest of the strings in that universe. What if we ease the restriction of "absolute randomness" and ask for those n-bit strings whose descriptions are of length at most n-m (strings that can be compressed by at least m bits)? For example, we saw that there are at most 2^n - 1 strings of length n whose descriptions are of length at most n-1. How many strings of length n can have descriptions of length at most n-2, n-3, ..., n-m? In general, there are at most 2^0 + 2^1 + ... + 2^(n-m) = 2^(n-m+1) - 1 strings of length n whose descriptions are of length at most n-m. Notice that the fraction of m-compressible strings among all strings of length n is therefore at most about 2^(-m+1), a value that diminishes exponentially as m increases. For example, when m = 11, the fraction is 2^(-10); that is, of all strings of length n, at most 1 in 1024 is compressible by at least 11 bits, while the remaining 1023 out of every 1024 require descriptions of n-10 bits or more. Put in probabilistic terms, the probability of selecting an 11-compressible string from all strings of length n is at most about 0.001, and the probability of obtaining an 11-incompressible string is at least about 0.999, almost certain.

Fact 2.1. Most strings of a given length are random.
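The counting above is small enough to check numerically. The following sketch (illustrative only; it evaluates the bound just derived rather than running any actual compressor, and the function name is ours) prints the largest possible fraction of n-bit strings that are compressible by at least m bits.

    def compressible_fraction(n, m):
        # At most 2^0 + 2^1 + ... + 2^(n-m) = 2^(n-m+1) - 1 descriptions are short
        # enough to save m or more bits, out of 2^n strings of length n.
        return (2 ** (n - m + 1) - 1) / 2 ** n

    for m in (1, 5, 11, 20):
        print(m, compressible_fraction(64, m))   # roughly 2^(1-m); for m = 11 this is about 1/1024

Fact 2.1 is this table read off for moderate m: already for m = 20, fewer than two strings in a million admit that much compression.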
3. From Integers to Binary Sequences

We shall map integers to binary strings written in increasing order of their length, and in lexicographic order within the same length. That is, the list 0, 1, 2, 3, 4, 5, 6, 7, 8, ... corresponds to the list 0, 1, 00, 01, 10, 11, 000, 001, 010, ... This is a better choice than the usual binary representation, in which either a string does not represent a valid integer (001 is not 1) or more than one string represents the same integer (01, 001, 0001, ... all represent 1). It is easily verified that the length of an integer n in this representation is about log n + 1; for simplicity, log n will be used instead.

Fact 3.1. For most integers n, K(n) ≥ log n - O(1).

4. The Incompressibility Method

To prove a property about a problem A, the proof is written for a single random instance of A. Since most strings are random, and since randomness is not an effective property (it cannot be tested algorithmically), the proof extends to almost all other instances. The incompressibility method depends on one single fact: most strings are random. To prove a property P about a problem A, we begin by considering a single instance of A and encoding it as a string w of length n. It is fair to assume that w is random. Then we show that unless P holds, w can be significantly compressed; hence a contradiction. Since we are dealing with a specific instance of the problem, the proof is usually easier to write.

Proof technique: the basic idea is to show that if the statement to be proven does not hold, then an incompressible string becomes compressible, a contradiction. The concept of a self-delimiting description (SDD) is useful here.

Self-delimiting description: Consider the string 1101. Its length is 4, which in our representation is 10. Double each bit in 10 and add 01 at the end; this gives 110001. Then append the string itself, in this case 1101, to the end of 110001 to obtain 1100011101. This is the SDD of 1101.

Lemma 4.1. The SDD of a string w requires |w| + 2 log |w| + 2 bits.

Lemma 4.2. The SDD of a number n requires log n + 2 log log n + 2 bits.
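The encodings of Sections 3 and 4 are concrete enough to write down in full. The Python sketch below implements the integer-to-string mapping and the self-delimiting description exactly as described above; the function names are ours and serve only as illustration.

    def int_to_string(n):
        # 0 -> '0', 1 -> '1', 2 -> '00', 3 -> '01', 4 -> '10', ...:
        # write n + 2 in ordinary binary and drop the leading 1.
        return bin(n + 2)[3:]

    def sdd(w):
        # Self-delimiting description: double every bit of the representation
        # of |w|, mark the end of the length field with '01', then append w.
        length_code = int_to_string(len(w))
        return ''.join(b + b for b in length_code) + '01' + w

    def sdd_decode(s):
        # Recover (w, rest) from a stream that begins with the SDD of w.
        i, length_code = 0, ''
        while s[i] == s[i + 1]:              # doubled pairs belong to the length field
            length_code += s[i]
            i += 2
        i += 2                               # skip the '01' delimiter
        n = int('1' + length_code, 2) - 2    # invert int_to_string
        return s[i:i + n], s[i + n:]

    print(sdd('1101'))                           # 1100011101, as in the example above
    print(sdd_decode(sdd('1101') + '0110'))      # ('1101', '0110'): the boundary is recovered

The length of sdd(w) is |w| plus twice the length of the representation of |w| plus 2, which is Lemma 4.1 up to the simplification of Section 3; applying the same encoding to the representation of a number n gives Lemma 4.2.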
5. Comparison to Other Methods

Writing proofs has always been a difficult task, so it is useful to say a few words about the relationship between the incompressibility method and other methods known from the literature. First, there is the counting method, where a proof of a property P about a problem A involves all instances of A. It would be nicer if one could identify a typical instance of A and write the proof for that single instance; but typical instances are hard to find, and one often has no choice but to cover all instances. Second, there is the probabilistic method. Here, problem instances are assumed to come from a known distribution, and the proof is written for a single instance that is shown to appear with high probability (fairly often). Probabilistic proofs are good enough when the distribution of problem instances is known in advance.

Compared to these two methods, a random string in the incompressibility method is a typical instance. To write a proof about a problem A, we begin with a single random instance I of A and write the proof for that specific instance. Even though we can never display a particular string and assert that it is random, we know that random strings exist, and that is all we need. Asking for a demonstration of a specific random string amounts to another version of Berry's paradox, which asks for "a number whose shortest description requires more characters than there are in this sentence." Obviously such a number cannot exist: Berry's sentence itself provides a short description of the very number that is defined to have no short description. Likewise, to prove that a string w of length n is random is to prove that w admits no description of length less than n; but for large n, "w admits no description of length less than n" would itself be a description of w of length less than n, and such a description is not supposed to exist.

6. Four Examples

A. Show that there are infinitely many primes. If not, then the number of primes is finite, say p_1, p_2, ..., p_m. Given a random number n, write n following its prime factorization: n = p_1^e_1 p_2^e_2 ... p_m^e_m. Each p_i ≥ 2, and therefore each e_i ≤ log n. Then n can be reconstructed from the self-delimiting descriptions of its m exponents, using O(1) + m(log log n + 2 log log log n + 2) bits; that is, K(n) = O(log log n) for all n. But this cannot be true when n is random, since a random n requires about log n bits (Fact 3.1), and log n grows much faster than log log n.

B. Binary search requires Ω(log n) time. Binary search is about identifying an index i from the set {1, ..., n}. For a random i, K(i) ≈ log n, so about log n bits of information must be acquired, and each comparison yields at most one bit; hence Ω(log n) comparisons are needed.

C. Sorting n numbers by comparison of keys requires Ω(n log n) time. Sorting n numbers is about identifying a permutation from the set {p_1, p_2, ..., p_n!}. For a random permutation p, K(p) ≈ log n! ≈ n log n, and again each comparison yields at most one bit; hence Ω(n log n) comparisons are needed.

D. Gödel's Incompleteness Theorem: in any formal system (a set of axioms and rules of inference), there are always some true statements which cannot be proven true within that system. Let f be the number of bits used in describing a formal system F, and consider a true statement P about a string of length n >> f, say the statement that a particular string w of length n is random. One can list, in lexicographic order of their length, all the proofs that can possibly be generated from F. If P had a proof, this enumeration would eventually produce that proof, and with it the string w. This means that with about f + log n bits we would be able to describe something of length n. For large n, f + log n is much smaller than n. We can expect a few strings to be compressible this far, but the majority of strings remain incompressible; hence, a contradiction, and for most random w the true statement "w is random" cannot be proven within F. As explained by Chaitin, Kolmogorov complexity provides a very comforting explanation of the big upset of Gödel's incompleteness theorem. Formal systems described by about n bits are capable of proving theorems of length about n bits, but not much larger. Or, as Chaitin put it, one cannot expect to prove a 20-pound theorem using only 10 pounds of axioms.
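Examples B and C are, at bottom, counting statements, and the quantities involved are easy to compute. The short sketch below (illustrative only; it evaluates the information content used in the arguments for a sample value of n, not the running time of any particular algorithm) compares the bits needed to identify an index among n candidates with the bits needed to identify a permutation among n! candidates.

    import math

    n = 1024

    # Example B: pinning down one index out of n takes about log2(n) bits,
    # and each comparison answers a single yes/no question.
    print(math.log2(n))                                                  # 10.0 bits

    # Example C: pinning down one permutation out of n! takes about log2(n!) bits.
    bits_for_permutation = sum(math.log2(k) for k in range(2, n + 1))    # = log2(n!)
    print(bits_for_permutation)                                          # about 8769 bits
    print(n * math.log2(n))                                              # n log2 n = 10240.0, the same order

Since a comparison of two keys supplies at most one bit of information, any comparison-based procedure needs at least as many comparisons as bits, which is exactly the content of examples B and C.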
Works Cited

Chaitin, G.J. (1966) On the Length of Programs for Computing Finite Binary Sequences, J. Assoc. Comput. Mach., 13:547-569.
Chaitin, G.J. (1969) On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers, J. Assoc. Comput. Mach., 16:407-422.
Chaitin, G.J. (1970) On the Difficulty of Computations, IEEE Trans. Inform. Theory, IT-16:5-9.
Chaitin, G.J. (1971) Computational Complexity and Gödel's Incompleteness Theorem, SIGACT News, 9:11-12.
Chaitin, G.J. (1974) Information-Theoretic Limitations of Formal Systems, J. Assoc. Comput. Mach., 21:403-424.
Chaitin, G.J. (1977) Algorithmic Information Theory, IBM J. Res. Develop., 21:350-359.
Chaitin, G.J. (1987a) Algorithmic Information Theory, Cambridge University Press.
Chaitin, G.J. (1987b) Information, Randomness, and Incompleteness: Papers on Algorithmic Information Theory, World Scientific; expanded 2nd edition, 1992.
Kolmogorov, A.N. (1963) On Tables of Random Numbers, Sankhya: The Indian Journal of Statistics, Series A, 25:369-376.
Kolmogorov, A.N. (1965a) Three Approaches to the Quantitative Definition of Information, Problems Inform. Transmission, 1(1):1-7.
Kolmogorov, A.N. (1965b) On the Logical Foundations of Information Theory and Probability Theory, Problems Inform. Transmission, 1(1):1-7.
Kolmogorov, A.N. (1983) On Logical Foundations of Probability Theory, in K. Ito and J.V. Prokhorov, editors, Probability Theory and Mathematical Statistics, Springer-Verlag, 1-5.
Kolmogorov, A.N., and Uspensky, V.A. (1963) On the Definition of an Algorithm (in Russian), English translation: Amer. Math. Soc. Translat., Series 2, 29:217-245.
Kolmogorov, A.N., and Uspensky, V.A. (1987) Algorithms and Randomness, SIAM J. Theory Probab. Appl., 32:389-412.
Li, M., and Vitanyi, P.M.B. (1990) Applications of Kolmogorov Complexity in the Theory of Computation, in J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 4, Elsevier and MIT Press, 187-254.
Li, M., and Vitanyi, P.M.B. (1992a) Inductive Reasoning and Kolmogorov Complexity, J. Comput. Syst. Sci., 44(2):343-384.
Li, M., and Vitanyi, P.M.B. (1992b) Worst Case Complexity Is Equal to Average Case Complexity under the Universal Distribution, Inform. Process. Lett., 42:145-149.
Li, M., and Vitanyi, P.M.B. (1993) An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag.
Loveland, D.W. (1969) A Variant of the Kolmogorov Concept of Complexity, Inform. Contr., 15:510-526.
Martin-Lof, P. (1966) The Definition of Random Sequences, Inform. Contr., 9:602-619.
Solomonoff, R.J. (1964) A Formal Theory of Inductive Inference, Parts 1 and 2, Inform. Contr., 7:1-22, 224-254.
Solomonoff, R.J. (1975) Inductive Inference Theory: A Unified Approach to Problems in Pattern Recognition and Artificial Intelligence, in Proceedings of the 4th International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, 274-280.
Watanabe, O., editor (1992) Kolmogorov Complexity and Computational Complexity, Springer-Verlag.