Kolmogorov Complexity and Effective Computation:
A New Way of Formulating Proofs
Fouad B. Chedid, Ph.D.
Presented at the “International Conference on Scientific
Computations”, Lebanese American University, July 1999, Beirut,
Lebanon.
Abstract
Kolmogorov complexity, a topic deeply rooted in probability theory, combinatorics, and
philosophical notions of randomness, has come to produce a new technique that captures
our intuition of effective computation as set by Turing and Church during the 1930s. This
new technique, named the incompressibility method, along with the fact that most strings are Kolmogorov random, has led researchers to a new way of formulating proofs, one that is intuitive, direct, and almost trivial. This paper reviews the basics of Kolmogorov complexity and explains the use of the incompressibility method. Examples are drawn from mathematics and computer science.
Index Terms: Kolmogorov complexity, incompressibility method, lower bounds.
Introduction
We use the term Kolmogorov Complexity to refer to what has appeared in the literature over the last 35 years under the names Descriptive Complexity, Program Size Complexity, Algorithmic Information Theory, Kolmogorov Complexity, and Kolmogorov-Solomonoff-Chaitin Complexity (Kolmogorov and Uspensky, 1963; 1987; Kolmogorov, 1963; 1965a; 1965b; 1983; Solomonoff, 1964; 1975; Chaitin, 1966; 1969; 1970; 1971; 1974; 1977; 1987a; 1987b; Li and Vitanyi, 1990; 1992a; 1992b; 1993; Loveland, 1969; Martin-Lof, 1966; Watanabe, 1992). Basically, Kolmogorov complexity is about the notion of randomness, viewed in light of what the theories of computation and algorithms have taught us over the last 30 years.
Turing and Church established the notion of effective computation, and we now feel comfortable with the idea of a universal Turing machine. This is to say that different computing methods do not add considerably to the complexity of describing an object; that is, the complexity of an object is an intrinsic property of the object itself and is independent, up to an additive constant, of the computing method in use. This has led researchers to a new definition of complexity, the kind of complexity which captures the real, or pure, amount of information stored in a string. The Kolmogorov complexity of a string w, denoted K(w), is the length of the shortest <program, input> pair which, when run on a computer, produces w. The rest of this paper is organized as follows. Section 1 lists basic definitions and facts about our topic. Section 2 describes a hierarchy of randomness among strings. Section 3 describes the encoding method to be used. The incompressibility method is introduced in Section 4. Section 5 compares the incompressibility method to two other popular methods, and Section 6 concludes with four proofs that make use of the incompressibility method.
1. Definitions and Facts
We begin by considering a simple example. Consider the following
two 16-bit series: (a) 1010101010101010 and (b)
1000101000111101. Which of these two series is more random? Intuitively, (b) appears more random than (a) because (a) is more regular ('10' repeated 8 times). Now, what should we consider as the basis of our decision? If we were to consider the origin of these two series, say both generated by independent trials of flipping a fair coin, recording 1 for heads and 0 for tails, then both (a) and (b) are equally likely and, as such, equally random. However, Kolmogorov randomness offers a different view. Kolmogorov randomness is about the length of the shortest <program, input> capable of reproducing a string. In this regard, (b) is more random than (a), because (a) can be generated as "print '10' eight times", while the only apparent way to produce (b) is "print 1000101000111101." There does not seem to exist any shorter way of describing the string in (b) than writing (b) in its entirety. It should now be clear why random strings are referred to as incompressible strings, while non-random strings are said to be compressible.
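Although K(w) itself is not computable, any off-the-shelf compressor gives an upper bound on it: if a program can shrink w, then w has a short description. The following Python sketch (our own illustration, not part of the original argument; the strings are lengthened to 10,000 bits only because a general-purpose compressor needs more than 16 bits to show the effect) contrasts a regular series like (a) with an irregular one like (b):

    import random
    import zlib

    regular = "10" * 5000              # pattern '10' repeated, in the spirit of series (a)
    random.seed(1)
    irregular = "".join(random.choice("01") for _ in range(10000))  # in the spirit of series (b)

    # Compressed size is an upper bound on the description length (in bytes here).
    print(len(zlib.compress(regular.encode())))     # small: the regularity is detected
    print(len(zlib.compress(irregular.encode())))   # much larger: little structure to exploit

The compressed size of the regular string stays tiny no matter how long it grows, while the compressed size of the irregular string grows roughly in proportion to its length.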
Definition 1.1. Random strings are those that do not admit descriptions significantly shorter than their own length (that is, they cannot be significantly compressed).
In this respect, randomness is a relative quantity. Some strings are extremely compressible, others may not be compressed at all, and still others may be compressed, but then it might take a lot of effort to do so. Strings that are hard to compress will be called Kolmogorov random strings. The word random here agrees with the notion of randomness in statistics: in a random string, every digit appears with roughly equal relative frequency. In the case of binary strings, a random string is one where each bit appears with a relative frequency of about 1/2. This also agrees with Shannon's information theory, where the amount of information (entropy) is defined as I = -Σi pi log pi, where pi is the relative frequency of symbol i. For random binary strings, pi = 1/2 and I = 1: maximal entropy.
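As a quick computation sketch of the formula above (our own check; the second, heavily biased string is introduced only for contrast):

    from math import log2

    def entropy(bits):
        # Empirical per-bit entropy I = -sum(p * log2 p) over the bit frequencies.
        p1 = bits.count("1") / len(bits)
        p0 = 1 - p1
        return -sum(p * log2(p) for p in (p0, p1) if p > 0)

    print(entropy("1000101000111101"))   # 1.0: eight 1s and eight 0s, maximal entropy
    print(entropy("1111111111111110"))   # about 0.337: a biased string carries less information per bit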
Definition 1.2. A series is random if the smallest <algorithm, input>
capable of specifying it to a computer has about the same number of
digits as the series itself.
Fact 1.3. K(w) ≤ |w| + c.
We can always use the copy Turing machine to copy the input to the
output. In this inequality, c is the length of the "copy" program.
Fact 1.4. Strings which are not random are few.
To be compressible, a string must exhibit enough regularity in its
structure to allow for significant reduction in its description. For
example, the string pp...p, pattern p repeated n times, can be
generated as "print 'p' n times". This program has length about log n + |p| bits, plus a constant for the fixed program text. The string itself has length n|p|. For large n, log n + |p| is significantly smaller than n|p|.
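The gap is easy to see with concrete numbers; a small sketch (constants for the fixed program text are ignored):

    import math

    p = "10"
    n = 1_000_000
    string_length = n * len(p)                              # 2,000,000 bits
    description_length = math.ceil(math.log2(n)) + len(p)   # about 20 + 2 = 22 bits
    print(string_length, description_length)                # 2000000 22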
Fact 1.5. Computable numbers have constant Kolmogorov complexity.
2. Hierarchy of Randomness
Given a string of length n, how many descriptions are there of length less than n? A simple calculation shows that there are Σ_{i=0}^{n-1} 2^i = 2^n − 1 descriptions (programs) of length less than n. Thus, among all 2^n strings of length n, there is at least one string which cannot be compressed at all, not even by one bit.
Discovering that, for a given size n, one of the strings is absolutely random doesn't say much about the rest of the strings in that universe. What if we ease the restriction of "absolute randomness" and ask for those n-bit strings whose descriptions are of length at most n−m (strings that can be compressed by at least m bits)? For example, we just saw that there are at most 2^n − 1 strings of length n whose descriptions are of length at most n−1. How many strings of length n can have descriptions of length at most n−2, n−3, ...? In general, there are at most Σ_{i=0}^{n−m} 2^i = 2^{n−m+1} − 1 strings of length n whose descriptions are of length at most n−m. Notice that the fraction of m-compressible strings among all strings of length n is at most 2^{−m+1}, a value that diminishes exponentially as m increases. For example, when m = 11 the fraction is about 2^{−10}; that is, of all strings of length n, at most 1 in 1024 is compressible by at least 11 bits, all the others requiring descriptions of n−10 bits or more. Put in probabilistic terms, the probability of selecting an 11-compressible string uniformly from all strings of length n is less than 0.001; the probability of obtaining an 11-noncompressible string is more than 0.999, almost certain.
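These counts are easy to verify numerically; a small sketch for n = 30 and m = 11 (any values with n > m would do):

    n, m = 30, 11
    # Number of descriptions of length at most n - m (lengths 0 through n - m).
    descriptions = sum(2 ** i for i in range(n - m + 1))
    print(descriptions == 2 ** (n - m + 1) - 1)    # True
    print(descriptions / 2 ** n)                   # about 0.000977, i.e. just under 2**(-10)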
Fact 2.1. Most strings of a given length are random.
3. From Integers to Binary Sequences
We shall map integers to binary strings written in increasing order of
their length, and in lexicographical order within the same length. That
is, the list 0,1,2,3,4,5,6,7,8,... will correspond to the list
0, 1, 00, 01, 10, 11, 000, 001, 010, ..., etc. This is a better choice than the usual binary representation, where either a string does not represent a valid integer (001 is not a valid representation of 1) or more than one string represents the same integer (01, 001, 0001, ... all represent 1). It is easily verified that the length of an integer n in this representation is about log n + 1; for simplicity, log n will be used instead.
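One way to implement this correspondence (a sketch; the n + 2 indexing trick is merely one convenient realization of the enumeration above):

    def int_to_string(n):
        # The binary form of n + 2 always starts with a 1; dropping it yields the
        # n-th string in the enumeration 0, 1, 00, 01, 10, 11, 000, ...
        return bin(n + 2)[3:]

    def string_to_int(s):
        return int("1" + s, 2) - 2       # inverse of the mapping above

    print([int_to_string(n) for n in range(9)])
    # ['0', '1', '00', '01', '10', '11', '000', '001', '010']
    print(string_to_int("010"))          # 8
    print(len(int_to_string(1000000)))   # 19, roughly log2(1000000)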
Fact 3.1. For a random integer n, K(n) ≥ log n − O(1).
4. The Incompressibility Method
To prove a property about a problem A, the proof is written for a single random instance of A. However, since most strings are random and randomness is a non-effective property, the proof extends to almost all other instances.
The incompressibility method depends on a single fact: most strings are random. To prove a property P about a problem A,
we begin by considering a single instance of the problem A, and
encode it as a string w of length n. It is fair to assume that w is
random. Then, we show that unless P holds, w can be significantly
compressed; hence a contradiction. Since we are dealing with a
specific instance of the problem, the proof is usually easier to write.
Proof Technique: The basic idea is to show that if the statement to be proven does not hold, then an incompressible string becomes compressible, a contradiction. The concept of a Self-Delimiting Description (SDD) is useful here.
Self Delimiting Description: Consider the string 1101. Its length is 4,
which in our representation is 10. Double each bit in 10, and add 01 at
the end. This gives 110001. Then append the string itself, in this case
1101, to the end of 110001 to obtain 1100011101. This is the SDD of
1101.
Lemma 4.1. The SDD of a string w requires |w| + 2log|w| + 2 bits.
Lemma 4.2. The SDD of a number n requires log n + 2log log n + 2
bits.
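A short Python sketch of the construction (our own illustration; int_to_string is the Section 3 representation shown earlier), including a decoder that shows why the description is self-delimiting, together with a check of Lemma 4.1:

    def int_to_string(n):
        return bin(n + 2)[3:]            # the Section 3 representation of an integer

    def string_to_int(s):
        return int("1" + s, 2) - 2

    def sdd(w):
        # Double every bit of the representation of |w|, mark its end with '01',
        # then append w itself.
        length_code = int_to_string(len(w))
        return "".join(b + b for b in length_code) + "01" + w

    def read_sdd(stream):
        # Recover w from the front of an arbitrary bit stream: doubled bits give
        # the length field, and the first unequal pair '01' ends it.
        i, length_code = 0, ""
        while stream[i] == stream[i + 1]:
            length_code += stream[i]
            i += 2
        i += 2                           # skip the '01' delimiter
        n = string_to_int(length_code)
        return stream[i:i + n]

    print(sdd("1101"))                           # 1100011101, as in the text
    print(read_sdd(sdd("1101") + "0110100"))     # 1101: no end marker for w is needed
    w = "1101"
    print(len(sdd(w)) == len(w) + 2 * len(int_to_string(len(w))) + 2)   # True (Lemma 4.1)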
5. Comparison to Other Methods
Writing proofs has always been a difficult task, so it might be useful to say a few words here about the relationship between the incompressibility method and other methods known from the literature. First, there is the counting method, where a proof of a property P about a problem A involves all instances of A. It would be nicer if one could identify a typical instance of the problem A and write the proof for that single instance; but typical instances are hard to find, and one has no choice but to cover all instances.
Second, there is the probabilistic method. Here, problem instances are assumed to come from a known distribution, and the proof is written for a single instance that is shown to appear with high probability (fairly often). Probabilistic proofs are good enough if the distribution of problem instances is known in advance.
Compared to the above two methods, a random string in the
incompressibility method is a typical instance. To write a proof about
a problem A, we begin with a single random instance I of A and write
the proof for that specific instance I. Even though we can never
display a string and say that it is random, we know that random
strings exist and that is all that we need.
Asking someone to exhibit a random string amounts to another version of Berry's paradox. That paradox asks for "a number whose shortest description requires more characters than there are in this sentence." Obviously, such a number cannot exist: Berry's statement itself provides a shorter description of that number, whose defining property is to have no description with fewer characters. Similarly, to prove that a string w of length n is random is to prove that w admits no description of length less than n. For large n, "w admits no description of length less than n" is itself a description of w of length less than n, which is not supposed to exist.
6. Four Examples
A. Show that there are infinitely many primes.
If not, then the number of primes is finite; say p1, p2, ..., pm. Given a random number n, write n following its prime factorization: n = p1^e1 · p2^e2 · ... · pm^em. Each pi ≥ 2, and therefore each ei ≤ log n. Then n can be reconstructed from the list of its exponents, each given by its SDD of at most log log n + 2 log log log n + 2 bits (Lemma 4.2); that is, K(n) ≤ m(log log n + 2 log log log n + 2) + O(1) = O(log log n). But this cannot be true, since n is random and hence K(n) ≥ log n − O(1).
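A numerical illustration of the encoding step only (our own sketch; it takes m = 4 and a number that happens to factor over those four primes, and ignores constants and the self-delimiting overhead):

    primes = [2, 3, 5, 7]     # pretend these were ALL the primes (m = 4)
    n = (2 ** 50) * (3 ** 40) * (5 ** 30) * (7 ** 20)    # a number built from them alone

    exponents = []
    for p in primes:
        e, rest = 0, n
        while rest % p == 0:
            rest //= p
            e += 1
        exponents.append(e)

    naive_bits = n.bit_length()                              # about log n, what a random n costs
    exponent_bits = sum(e.bit_length() for e in exponents)   # about m * log log n
    print(exponents)        # [50, 40, 30, 20]
    print(naive_bits)       # 240
    print(exponent_bits)    # 22: far fewer bits than a random n of this size requires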
B. Binary search requires Ω(log n) time.
Searching by comparisons amounts to identifying an index i from the set {1, ..., n}. For a random index i, K(i) ≥ log n − O(1); since each comparison yields at most one bit of information about i, about log n comparisons are needed.
C. Sorting n numbers by comparison of keys requires Ω(n log n) time.
Sorting by comparisons amounts to identifying a permutation from the set {p1, p2, ..., pn!} of all n! permutations. For a random permutation p, K(p) ≥ log n! ≈ n log n; since each comparison yields at most one bit of information, about n log n comparisons are needed.
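The bound log n! ≈ n log n used here is Stirling's approximation; a quick numerical check (our own sketch, base-2 logarithms):

    import math

    n = 1000
    log2_factorial = math.lgamma(n + 1) / math.log(2)   # log2(1000!) is about 8529
    print(log2_factorial)
    print(n * math.log2(n))                             # 1000 * log2(1000) is about 9966
    # The difference, about n * log2(e), is of lower order than n log n.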
D. Godel's Incompleteness Theorem: In any formal system (a set of axioms and rules of inference), there are always some true statements which cannot be proven true within that system.
Let f be the number of bits used in describing a formal system F, and consider a theorem P (a true statement) of length n >> f. One can list, in increasing order of length, all the proofs that can possibly be generated from F; when a proof of P is found, the theorem is output along with its proof. This means that with about f + log n bits we are able to describe something of length n. For large n, f + log n is much smaller than n. We can expect a few strings to be compressible this far, but the majority of strings remain incompressible; hence, a contradiction. As explained by Chaitin, Kolmogorov complexity provides a very comforting explanation of the big upset of Godel's incompleteness theorem: formal systems described by about n bits are capable of proving theorems of length about n bits, but not much larger. Or, as Chaitin put it, one cannot expect to prove a twenty-pound theorem using only ten pounds of axioms.
Works Cited
Chaitin, G.J. (1966) On the Length of Programs for Computing Finite
Binary Sequences, J. Assoc. Comput. Mach., 13:547-569.
Chaitin, G.J. (1969) On the simplicity and speed of programs for
computing infinite sets of natural numbers, J. Assoc. Comput.
Mach., 16:407-422.
Chaitin, G.J. (1970) On the Difficulty of Computations, IEEE Trans. Inform. Theory, IT-16:5-9.
Chaitin, G.J. (1971) Computational Complexity and Godel's Incompleteness Theorem, SIGACT News, 9:11-12.
Chaitin, G.J. (1974) Information-Theoretic Limitations of Formal Systems, J. Assoc. Comput. Mach., 21:403-424.
Chaitin, G.J. (1977) Algorithmic Information Theory, IBM J. Res. Develop., 21:350-359.
Chaitin, G.J. (1987a) Algorithmic Information Theory, Cambridge University Press.
Chaitin, G.J. (1987b) Information, Randomness, and Incompleteness: Papers on Algorithmic Information Theory, World Scientific, expanded 2nd edition, 1992.
Kolmogorov, A.N. (1963) On Tables of Random Numbers, Sankhya: The Indian Journal of Statistics, Series A, 25:369-376.
Kolmogorov, A.N. (1965a) Three Approaches to the Quantitative Definition of Information, Problems Inform. Transmission, 1(1):1-7.
Kolmogorov, A.N. (1965b) On the Logical Foundations of
Information Theory and Probability Theory, Problems Inform.
Transmission, 1(1):1-7.
Kolmogorov, A.N. (1983) On Logical Foundations of Probability
Theory, In K. Ito and J.V. Prokhorov, editors, Probability
Theory and Mathematics, Springer-Verlag, 1-5.
Kolmogorov, A.N., and Uspensky V.A. (1963) On the Definition of
an Algorithm, In Russian, English translation: Amer. Math.
Soc. Translat., 29:2, 217-245.
Kolmogorov, A.N., and Uspensky V.A. (1987) Algorithms and
Randomness: SIAM J. Theory Probab. Appl., 32:389-412.
Li, M., and Vitanyi P.M.B. (1990) Applications of Kolmogorov
Complexity in the Theory of Computation, In J. van Leeuwen,
editor, Handbook of Theoretical Computer Science, chapter 4,
Elsevier and MIT Press, 187-254.
Li, M., and Vitanyi, P. (1993) An Introduction to Kolmogorov
Complexity and its Applications, Springer-Verlag.
Li, M., and Vitanyi, P.M.B.(1992a) Inductive reasoning and
Kolmogorov complexity, J. Comput. Syst. Sci., 44(2):343-384.
Li, M., and Vitanyi, P.M.B. (1992b) Worst case complexity is equal
to average case complexity under the universal distribution,
Inform. Process. Lett., 42:145-149.
Loveland, D.W. (1969) A variant of the Kolmogorov concept of
complexity, Inform. Contr., 15:510-526.
Martin-Lof, P. (1966) The definition of random sequences, Inform.
Contr., 9:602-619.
Solomonoff, R.J. (1964) A formal theory of inductive inference, part 1
and part 2, Inform. Contr., 7:1-22, 224-254.
Solomonoff, R.J. (1975) Inductive inference theory - a unified
approach to problems in pattern recognition and artificial
intelligence, In 4th International Conference on Artificial
Intelligence, Tbilisi, Georgia, USSR, 274-280.
Watanabe, O. (1992) editor, Kolmogorov Complexity and
Computational Complexity, Springer-Verlag.