Kolmogorov Complexity

Krzysztof Zawada
University of Illinois at Chicago
E-mail: kzawada@uic.edu

Abstract: Information theory is a branch of mathematics that attempts to quantify information. To quantify information one needs to look at data compression and transmission rates. Information theory has applications in many fields. The purpose of this paper is to comprehensively present a subset of information theory as it applies to computer science. Andrei Kolmogorov was the pioneering mathematician who applied information theory to computer science. The branch of information theory that relates to computer science became known as Kolmogorov complexity or algorithmic complexity.

I. INTRODUCTION

In 1965 Andrei Kolmogorov, a researcher and mathematician, presented a definition for the descriptive complexity of an object. He stated that the complexity of an object is the length of the shortest binary computer program that describes the object. Kolmogorov complexity, or algorithmic complexity, studies the complexity of strings and other data structures. Although he was the pioneer in this area, other researchers such as Solomonoff and Chaitin contributed to the idea in the same time frame (1960s-1970s).

As a real-world example of measuring the computational resources needed to specify an object, consider an embedded system. Embedded systems use hardware and software to accomplish something specific for an application. Embedded systems use messaging schemes, data structures, and algorithms that define information and transmit this information reliably. A high level of complexity is involved in producing something deterministic (real-time) using limited computational resources.

Kolmogorov complexity is an abstract and deep idea, and it can be used to state or prove many impossibility results, statements that are true but cannot be proven. In essence, there is no way to actually find the shortest computer program for a given string in practice, because this might take an infinite amount of time. Kolmogorov complexity is considered a way of thinking and a good basis for inductive inference and for a fundamental understanding of computer science, as well as of other areas such as physics and communication theory. This paper will present the basic idea, the original authors' contributions, and new developments in the area.

II. KOLMOGOROV COMPLEXITY DEFINITION AND EXPLANATION

To understand and explain the concept of Kolmogorov complexity a few classic examples are used. Since it is a study of the complexity of strings and other data structures, consider the example strings in Figure 1.

[Fig. 1. Kolmogorov examples]

Kolmogorov complexity attempts to answer the following questions:
(a) What is the length of the shortest description of the above strings or sequences?
(b) What is the shortest binary computer program for the above strings or sequences?
(c) What is the minimum number of bits needed to describe the above sequences or strings?

Analyzing the string sequences above, it is easy to see that each has a length and an English-language description. Some can be described easily, but others cannot. In fact, the only way to describe the more complex and random strings is by giving the string itself. Finding the shortest computer program that could generate the above strings would involve a CPU, memory, input/output, an encoding (such as ASCII), and a description language such as assembly, C/C++, or Java Virtual Machine byte code.
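As an informal illustration of this idea (a sketch in Python added here for concreteness, not an example from the original papers), the first string below can be produced by a program far shorter than the string itself, while the best known description of the second, random-looking string is essentially the string spelled out literally:

# A regular string: a tiny description ("repeat '01' ten times") generates it.
def regular_string() -> str:
    return "01" * 10                      # expands to 20 characters

# A random-looking string: no shorter description is known than the literal digits.
def random_looking_string() -> str:
    return "01101100110111100010"         # simply quoted, bit by bit

if __name__ == "__main__":
    print(regular_string())               # 01010101010101010101
    print(random_looking_string())        # 01101100110111100010

The gap between the sizes of these two descriptions is exactly the intuition that Kolmogorov complexity makes precise.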
Even though such implementation details would be factors to consider, Kolmogorov showed that the definition of complexity is computer independent. Cover and Thomas [6] explain this and define the Kolmogorov complexity of a string with respect to a universal computer as:

$K_U(x) = \min_{p : U(p) = x} l(p)$

where $x$ represents a string, $U$ is a universal computer, and $p$ is a program. This equation states that $K_U(x)$ is the length of the shortest description (program) of the string $x$ interpreted by a universal computer.

The universal computer is modeled as a Turing machine, developed by Alan Turing to be the simplest universal computer. It is a widely used computational system for analysis that simplifies a computer. In a Turing machine, the input tape carries a binary program that is fed to the computer. The computer uses a state machine to interpret the instructions and produce a binary output on another tape. This is in fact a simple and good model of a digital computer for academic analysis. Mathematically, the computer independence is expressed by the following formulas:

$K_U(x) \le K_A(x) + c_A$
$|K_U(x) - K_A(x)| < c$

Here $A$ is any other computer and $c_A$ is a constant that does not depend on the string $x$. The constant $c_A$ accounts for the difference between using a universal computer and computer $A$, which may have a larger instruction set or more functions.

The idea that the universal computer has a program with a specific minimal length that could generate a string or sequence was further developed. The development stated that the program (or the minimal description) could not be much larger than the string itself. The proof of this is provided by Cover and Thomas [6]. Using conditional Kolmogorov complexity, with the precondition that the length of the string is known, the length of the string is related to the minimum program length as follows:

$K(x \mid l(x)) \le l(x) + c$
$K(x) \le K(x \mid l(x)) + 2 \log l(x) + c$ (upper bound)

This means that there is an upper bound on the size of the minimal description (or program). There also exists a lower bound on this minimal description. It is defined, for a prefix-free set that satisfies the Kraft inequality, by Cover and Thomas [6] as:

$\sum_{x^n} P(x^n) K(x^n \mid n) \ge H(X_1, X_2, X_3, \ldots, X_n) = nH(X)$

The relationship of Kolmogorov complexity to information theory is through entropy. The entropy is essentially the expected length of the shortest binary computer description, expressed by:

$H(X) \le \frac{1}{n} E\,K(X^n \mid n) \le H(X) + \frac{|\mathcal{X}| \log n}{n} + \frac{c}{n}$

The chain rule in information theory also has an analog in Kolmogorov complexity, given by:

$K(X, Y) = K(X) + K(Y \mid X) + O(\log K(X, Y))$

Kolmogorov complexity has many other applications in which it can be used to find bounds. For instance, in image compression, the algorithmic complexity of an image with $n$ pixels that is compressed by some factor $f$ can be bounded by

$K(\mathrm{image} \mid n) \le \frac{n}{f} + c.$

Another example is gambling, where it can be shown that the logarithm of the wealth a gambler achieves on a sequence, plus the complexity of the sequence, is no smaller than the length of the sequence [6]. This idea also extends to finding the complexity of integers, algorithmically random and incompressible sequences, universal probability, statistics, and other applications. Although Kolmogorov complexity cannot be computed, computable estimates converge toward the true value, and this provides a framework to study randomness and inference.

The main contributors to the original idea are Kolmogorov, Chaitin, and Solomonoff. Their contributions are given next.
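Before turning to those contributions, one practical remark: although $K(x)$ itself is not computable, any lossless compressor yields a computable upper bound on it, since the compressed form plus a fixed decompressor is a valid description of $x$. The following Python sketch illustrates only this upper-bound idea and is not a construction from [6]; the choice of zlib and the ignored decompressor constant are assumptions made here:

import os
import zlib

def k_upper_bound_bits(x: bytes) -> int:
    """Length in bits of one particular description of x: its zlib-compressed form.
    K(x) is at most this value plus a constant for the decompressor."""
    return 8 * len(zlib.compress(x, 9))

regular = b"01" * 5000           # 10000 highly regular bytes
random_like = os.urandom(10000)  # 10000 random bytes, incompressible with high probability

print(k_upper_bound_bits(regular))      # far fewer than 80000 bits
print(k_upper_bound_bits(random_like))  # close to (or slightly above) 80000 bits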
III. ORIGINAL PAPERS

A. Andrei Kolmogorov

The original ideas of complexity were published in two main papers. The first, in 1965, was titled "Three approaches to the quantitative definition of information" [11]. In this paper Kolmogorov presented the combinatorial approach, the probabilistic approach, and the algorithmic approach to defining information. He stated that the two common approaches were the combinatorial and the probabilistic. When describing the combinatorial method the author used sets of elements, language alphabets, and Russian literary text as examples in deriving the entropy, the mutual information, and bounds on information. Then he applied meaning to these outcomes. With the probabilistic approach Kolmogorov focused on random variables with a given probability distribution. He showed that the mutual information $I(X; Y)$ is non-negative in the combinatorial approach, but that in the probabilistic approach the pointwise quantity can be negative, and that the averaged quantity $I(X, Y)$ (the expectation over the joint distribution) was the true measure of the information content. The probabilistic approach was identified as more applicable to communication channels with many weakly related messages. The last topic in his paper was the algorithmic approach, which used the theory of recursive functions. To describe the length of a string $x$ (with $l(x) = n$), he proposed doing it recursively in

$\log n + \log\log n + \log\log\log n + \cdots$

bits, continuing until the last positive term. This gives a more efficient way to describe the length of a string. With the use of recursive functions, Kolmogorov stated that this allowed for a correct description of the quantity of hereditary information. He concluded that he intended to study the relationship between the necessary complexity of a program and its difficulty.

In the second paper, published in 1968 and titled "Logical basis for information theory and probability theory" [12], Kolmogorov linked conditional entropy $H(x \mid y)$ to the quantity of information required for computing. His definition of conditional entropy $H(x \mid y)$ was interpreted as the minimal length of a program $p$, recorded as a sequence of zeros and ones, that constructs the value $x$ when $y$ (for example, the length of $x$) is given. The definition was

$H(x \mid y) = \min_{A(p, y) = x} l(p).$

He saw a need to take the conditional entropy $H(x \mid y)$ and the mutual information $I(x; y)$ and attach a definite meaning to these concepts. He stated that he was following Solomonoff's work, since Solomonoff was the first to put forward this idea. After developing the conditional entropy definition, the next developments were slowed because he could not derive algebraically the exact relations $I(x; y) = I(y; x)$ and $H(x, y) = H(x) + H(y \mid x)$. He was able to derive some form of these concepts with approximate inequalities. The new relations he gave were

$|I(x; y) - I(y; x)| = O(\log I(x, y))$

and

$H(x, y) = H(x) + H(y \mid x) + O(\log I(x, y)).$

He went on to analyze long sequences $x = (x_1, x_2, \ldots, x_n)$ of zeros and ones with length $l(x)$, to show that such sequences have entropy that is not smaller than their length, $H(x) \ge l(x)$. With his student Martin-Löf he next looked at the symptoms of randomness and related them to Bernoulli sequences. In this paper he also included Martin-Löf's work on m-Bernoulli-type sequences, giving bounds on their conditional entropies $H[x \mid l(x), p(x)]$ and $H(x_n \mid n, p)$.
He also stated that Bernoulli sequences have logarithmically increasing entropy, $H(x_n) = O(\log n)$. These were the new basic information theory concepts that he found linking to probability theory, together with the introduction of random sequences, i.e., sequences that do not have periodicity. He concluded the paper with a universal programming method, defined by

$H_{A'}(x \mid y) \le H_A(x \mid y) + C_A,$

where $A'$ is the universal method, and credited himself as well as Solomonoff for the accomplishments. There were several other papers he published that built up to this idea [8] [9] [10].

B. Gregory Chaitin

Gregory Chaitin is a mathematician and computer scientist who contributed to Kolmogorov complexity. In 1966 he wrote a paper titled "On the Length of Programs for Computing Finite Binary Sequences" [2]. Chaitin goes into a great amount of detail about the functionality of Turing machines. He describes a Turing machine as a general-purpose input/output computer, as shown below.

[Fig. 2. Turing Machine]

The general-purpose computer takes the input tape and scans it for binary input. It subsequently produces a binary output on the tape. The Turing machine is defined to be an N-state, M-tape-symbol machine, i.e., an N-row by M-column table. Chaitin's interest was in finding the shortest program for calculating an arbitrary binary sequence. The study includes several factors such as the number of symbols a Turing machine can write, changing the logic design, removing redundancies, recursion, and transferring from one part of the program to another. He concludes that there are exactly $((N+1)(M+2))^{NM}$ possible programs for an N-state M-tape-symbol Turing machine, which means that the machine embodies on the order of $NM \log_2 N$ bits of information. He also proposes a redesign of the machine to remove any redundancies. The result of the paper shows the mathematical outcomes of changing the factors above.

Chaitin's second contribution to Kolmogorov complexity came in 1974 with a paper titled "Information-theoretic limitations of formal systems" [3]. In this paper Chaitin investigates mathematical applications of computational complexity. The goal of the paper was to examine the time and the number of bits of instructions needed to carry out a finite or infinite number of tasks. To accomplish this he considered the size of proofs and the tradeoff between how much is assumed and what can be derived.

After analyzing Turing machines and mathematical models for these machines, Chaitin began to research randomness. In 1975 he wrote a paper titled "Randomness and Mathematical Proof" [4]. In this paper he established that for a non-random sequence such as 01010101010101010101 (01 repeated ten times), the program a computer would execute to produce the output can be relatively small. For example:

$1010 \rightarrow \text{Computer} \rightarrow 01010101010101010101.$ (1)

The input processed by the computer is obviously smaller in size than the output. However, a random sequence such as 01101100110111100010, with twenty random outcomes, is closely modeled as twenty coin flips, one outcome out of $2^{20}$ possible binary series. Therefore, the program length needed to produce a random output sequence would be at least the length of the sequence itself. For example:

$01101100110111100010 \rightarrow \text{Computer} \rightarrow 01101100110111100010.$ (2)

Here the input is the same size as the output. He also touched on other properties of randomness, such as the frequencies of binary digits, as well as formal systems, which were concluded to be unable to prove that a number is random unless the complexity of the number is less than that of the system itself. Gödel's theorem was used to prove this.
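The coin-flip example reflects a simple counting argument: there are far fewer short descriptions than long strings, so most strings cannot be compressed by much. The short Python sketch below (an illustration added here, not a calculation from Chaitin's paper) bounds the fraction of n-bit strings that could even in principle be produced by programs more than k bits shorter than the string:

def fraction_compressible(n: int, k: int) -> float:
    """There are at most 2^0 + 2^1 + ... + 2^(n-k-1) = 2^(n-k) - 1 programs
    shorter than n - k bits, so at most that many of the 2^n strings of
    length n can be compressed by more than k bits."""
    return (2 ** (n - k) - 1) / 2 ** n

print(fraction_compressible(20, 5))   # < 1/32 of all twenty-bit strings
print(fraction_compressible(20, 10))  # < 1/1024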
Lastly, in 1977 Chaitin published a paper titled "Algorithmic information theory" [5], in which he attempted to show that Solomonoff's probability measure on programs and his and Kolmogorov's ideas about the size of the smallest programs were equivalent.

C. Ray Solomonoff

Solomonoff is known as one of the founders of Kolmogorov complexity. Solomonoff used induction to predict future events based on prior results. In 1964 he published a two-part paper titled "A Formal Theory of Inductive Inference" [22], [23]. For instance, taking a Turing machine that outputs some sequence of symbols from an input sequence of some length, these output sequences could be assigned probabilities and used to predict future events. This is a form of extrapolating data, specifically extrapolation of long sequences of symbols. Some of the mathematical models that he presented involved Bayes' theorem. Bayes' theorem is based on conditional probability. One modern example of this is Bayesian SPAM filters. For instance, to predict whether an incoming message is SPAM, previous information is used to assign a probability to the incoming message. If a user builds up an extensive list of known SPAM, a Bayesian filter can use this to predict whether a new incoming message is SPAM based on the content of the message (a toy sketch of such a filter is given at the end of this subsection). Solomonoff develops mathematics to support his induction/extrapolation methods, but at the time of writing he also states that these findings are based on heuristics and that the findings (equations) may need to be corrected. Another method that he proposed in his paper is what he called inverse Huffman coding. When using Huffman coding, knowing the probabilities of strings allows a code to be constructed that encodes the strings minimally, so that the strings with the highest probability get the shortest code words. Inversion of Huffman coding means that a minimal code is obtained for the string first, and using this code a probability for the string is found. In part two of his paper he applied his models to Bernoulli sequences, some Markov chains, and phrase-structure grammars.
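The toy sketch promised above gives the flavor of such a Bayesian filter. It is a modern analogy to Solomonoff's use of Bayes' theorem, not something from his paper, and the word counts and prior below are invented for illustration:

from collections import Counter

# Hypothetical training data: how often each word appeared in known spam and ham.
spam_counts = Counter({"free": 40, "winner": 25, "meeting": 2})
ham_counts = Counter({"free": 5, "winner": 1, "meeting": 30})
vocabulary = set(spam_counts) | set(ham_counts)
p_spam = 0.5  # assumed prior probability that a message is spam

def spam_probability(message: str) -> float:
    """Naive-Bayes posterior P(spam | words) with add-one (Laplace) smoothing."""
    p_s, p_h = p_spam, 1.0 - p_spam
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    for word in message.lower().split():
        p_s *= (spam_counts[word] + 1) / (spam_total + len(vocabulary))
        p_h *= (ham_counts[word] + 1) / (ham_total + len(vocabulary))
    return p_s / (p_s + p_h)

print(spam_probability("free winner"))   # close to 1: probably spam
print(spam_probability("team meeting"))  # close to 0: probably legitimate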
D. Others

There are two other authors who provide a basis for developing proofs and who have been mentioned throughout the previous publications. One is William of Ockham, to whom the Occam's Razor principle is attributed. This principle states that "entities should not be multiplied beyond necessity" [17]. This means that if there are multiple theories, one should choose the simplest, which makes the confidence in a prediction higher. The other is Kurt Gödel, to whom Gödel's incompleteness theorem is attributed. This theorem uses self-reference to show that there are true statements within a formal system that cannot be proven inside the system. The theorem is used in theory to determine what can be done and what its limits are. The fact that Kolmogorov complexity cannot be computed is related to Gödel's theorem.

IV. FURTHER DEVELOPMENTS

After the ideas of Kolmogorov complexity were put forth, students of Kolmogorov began further developing them. Martin-Löf's work in 1966 [15] was dedicated to defining random sequences as well as developing tests for random sequences. This was presented in a paper titled "The Definition of Random Sequences". In this paper Martin-Löf first restates the complexity measure of Kolmogorov and follows this with definitions and proofs of a universal test for randomness. He also develops the idea that the non-random sequences form a null set. The remainder of the paper is dedicated to Bernoulli sequences.

Zvonkin and Levin [14] also studied Kolmogorov's work and wrote a paper that surveyed the results of Kolmogorov, Solomonoff, Martin-Löf, and others. They developed more ideas about universal probability and its relationship to complexity. In the 1970s, Schnorr studied and published three papers [19] [20] [21] on random sequences. These papers state several definitions about finite and infinite random sequences. In one paper he stated that he formulates a thesis on the true concept of randomness. These studies are related to gambling. Cover and Thomas [6] explain this in a more comprehensive manner and call it universal gambling. They explain that if a gambler bets on binary sequences ($x \in \{0,1\}^*$) without knowing the origin of the sequence, there is a way to associate his wealth with a betting scheme. The theorem that is proposed states that the length of the sequence, $l(x)$, is no greater than the logarithm of the wealth the gambler achieves on the sequence plus the complexity of the sequence; formally,

$\log S(x) + K(x) \ge l(x),$

where $S(x)$ denotes the gambler's wealth. This is proven using Schnorr's papers.

V. ADVANCEMENTS IN TOPIC

Since the publication of the Kolmogorov complexity results in the 1960s there have been several developments and applications in this area. One of the topics is called prefix-free complexity. Prefix-free languages are used to solve a specific problem: determining where one string stops and another begins. Several papers have been written by Miller [16] and by Bienvenu and Downey [1] examining the upper bounds on plain and prefix-free Kolmogorov complexity.
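To see concretely how a prefix-free (self-delimiting) description removes the ambiguity of where one string stops, consider the textbook-style construction sketched below in Python; it is a generic illustration added here, not one of the specific codes analyzed in [1] or [16]. The length of the string is sent first with each of its bits doubled, terminated by the marker '01', so concatenated code words can be split apart unambiguously:

def self_delimiting(x: str) -> str:
    """Encode the bit string x so that its end is recognizable inside a longer stream:
    the binary form of l(x) is sent with every bit doubled, terminated by '01'."""
    length_bits = bin(len(x))[2:]
    header = "".join(b + b for b in length_bits) + "01"
    return header + x

def decode_stream(stream: str) -> list:
    """Split a concatenation of self-delimiting code words back into the original strings."""
    out, i = [], 0
    while i < len(stream):
        length_bits = ""
        while stream[i:i + 2] != "01":   # read doubled length bits until the marker
            length_bits += stream[i]
            i += 2
        i += 2                           # skip the '01' marker
        n = int(length_bits, 2)
        out.append(stream[i:i + n])
        i += n
    return out

coded = self_delimiting("0110") + self_delimiting("111")
print(decode_stream(coded))              # ['0110', '111']

Prefix-free code words of this kind also satisfy the Kraft inequality used in the lower bound of Section II.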
New applications of Kolmogorov complexity have also been developed. For example, Boris Ryabko and Daniil Ryabko have published a paper titled "Using Kolmogorov Complexity for Understanding Some Limitations on Steganography" [18]. Steganography is a way of writing hidden messages so that the sender and the recipient can convey messages while others do not suspect that any message exists. They describe cover texts, which can be sequences of photographic images, videos, or text e-mails that are exchanged between two parties over a public channel. The main idea is that the speed of transmission of secret information is proportional to the length of the cover text. Therefore, cover texts with maximal Kolmogorov complexity would require a stegosystem of the same order of complexity, and such a system does not exist. They also define secure stegosystems that are simple and that have a high speed of transmitting secret information, where simplicity is characterized by the bound $\exp(o(n))$ as the cover-text length $n$ grows toward infinity. Highly complicated sources of data are modeled by Kolmogorov complexity.

Kolmogorov complexity has also been used to study data mining, because data mining is equivalent to data compression. The process of data mining is looking at and extracting patterns from large sets of data. Faloutsos and Megalooikonomou have written a paper, "On Data Mining, Compression, and Kolmogorov Complexity" [7], in which they argue that data mining is an art and not a theory because the Kolmogorov complexity cannot be found. That is, data mining cannot be automated, and many different approaches to it exist.

Another very interesting and recent application of Kolmogorov complexity is detecting denial-of-service attacks. Kulkarni and Bush wrote a paper on this topic titled "Detecting Distributed Denial-of-Service Attacks Using Kolmogorov Complexity Metrics" [13]. A denial-of-service attack occurs when a source machine floods a target machine, typically with ICMP or UDP protocol packets, overloading the target. In this case the details (bits) of the packets are of specific interest and can be analyzed based on the fundamentals of information theory. In this paper the complexity relation for two random strings X and Y is given by

$K(XY) \le K(X) + K(Y) + c.$ (3)

This relation comes from a fundamental theorem of Kolmogorov complexity. In this theorem $K(XY)$ is the joint complexity of strings $X$ and $Y$. The individual complexities of strings $X$ and $Y$ are given by $K(X)$ and $K(Y)$. When the correlation between the strings increases, the joint complexity decreases. When a high amount of traffic occurs, the packets can be analyzed in terms of their destination, type, and execution pattern. In this paper the authors were able to show that, by sampling packets, a complexity differential can be computed over the collected samples. This complexity differential can be used to determine whether a denial-of-service attack is occurring. The results in the paper showed that this type of method outperforms current methods of applying filters.
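A rough way to see the effect exploited in [13] is to estimate complexity with an ordinary compressor: a flood of nearly identical packets compresses far better jointly than unrelated traffic, so the estimate of K(XY) grows much more slowly than K(X) + K(Y). The Python sketch below uses zlib as a crude stand-in for the uncomputable complexity, and the packet payloads are invented for illustration; it is an analogy to, not a reproduction of, the metric defined in the paper:

import os
import zlib

def k_est(data: bytes) -> int:
    """Compressed length in bytes, a crude computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

# Hypothetical packet samples: an attack flood repeats nearly the same payload,
# while the 'normal' samples are unrelated random bytes.
flood = [b"GET /index.html HTTP/1.1 host=victim seq=%d" % i for i in range(200)]
normal = [os.urandom(40) for _ in range(200)]

def joint_vs_separate(samples):
    """Compare the complexity estimate of the concatenation with the sum over samples."""
    joint = k_est(b"".join(samples))
    separate = sum(k_est(s) for s in samples)
    return joint, separate

print(joint_vs_separate(flood))   # joint estimate far below the sum: highly correlated traffic
print(joint_vs_separate(normal))  # joint estimate comparable to the sum: weakly correlated traffic

The sketch only mirrors the inequality in (3); the actual metric and sampling procedure are those described in [13].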
VI. CONCLUSION

In conclusion, Kolmogorov complexity studies the complexity of objects and defines the complexity of an object as the length of the shortest binary program that describes it. It is a subset of information theory relating to computer science. The idea was developed by Solomonoff, Kolmogorov, and Chaitin. Each of these authors approached it through inductive inference, the use of Turing machines, the randomness and non-randomness of objects (strings), and other means. Although Kolmogorov complexity cannot be computed, it can be used as a way of thinking and as a basis for inductive inference in understanding computer science. It connects to fundamental information theory concepts such as entropy, which is the expected length of the shortest binary computer description. The topic has many applications in physics, computer science, and communication theory [17], and it is still an active area of research.

REFERENCES

[1] L. Bienvenu and R. Downey, "Kolmogorov complexity and Solovay functions," CoRR, vol. abs/0902.1041, 2009.
[2] G. J. Chaitin, "On the length of programs for computing finite binary sequences," Journal of the Association for Computing Machinery (ACM), vol. 13, pp. 547–569, 1966.
[3] ——, "Information-theoretic limitations of formal systems," Journal of the Association for Computing Machinery (ACM), vol. 21, pp. 403–424, 1974.
[4] ——, "Randomness and mathematical proof," Scientific American, vol. 232, no. 5, pp. 47–52, 1975.
[5] ——, "Algorithmic information theory," IBM Journal of Research and Development, vol. 21, pp. 350–359, 1977.
[6] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: John Wiley & Sons, Inc., 2006.
[7] C. Faloutsos and V. Megalooikonomou, "On data mining, compression, and Kolmogorov complexity," Data Mining and Knowledge Discovery, Tenth Anniversary Issue, 2007.
[8] A. N. Kolmogorov, "On the Shannon theory of information transmission in the case of continuous signals," IRE Transactions on Information Theory, vol. 2, pp. 102–108, 1956.
[9] ——, "A new metric invariant of transitive dynamical systems and automorphisms in Lebesgue spaces," Doklady Akademii Nauk SSSR, pp. 861–864, 1958.
[10] ——, "A new invariant for transitive dynamical systems," Doklady Akademii Nauk SSSR, vol. 119, pp. 861–864, 1958.
[11] ——, "Three approaches to the quantitative definition of information," Problems of Information Transmission, vol. 1, pp. 4–7, 1965.
[12] ——, "Logical basis for information theory and probability theory," IEEE Transactions on Information Theory, vol. 14, no. 5, pp. 662–664, 1968.
[13] A. Kulkarni and S. Bush, "Detecting distributed denial-of-service attacks using Kolmogorov complexity metrics," J. Netw. Syst. Manage., vol. 14, no. 1, pp. 69–80, 2006.
[14] L. A. Levin and A. K. Zvonkin, "The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms," Russ. Math. Surv., vol. 25, no. 6, pp. 83–124, 1970.
[15] P. Martin-Löf, "The definition of random sequences," Inform. and Control, vol. 9, pp. 602–619, 1966.
[16] J. S. Miller, "Contrasting plain and prefix-free Kolmogorov complexity," in preparation.
[17] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd ed. New York: Springer, 1997.
[18] B. Ryabko and D. Ryabko, "Using Kolmogorov complexity for understanding some limitations on steganography," CoRR, vol. abs/0901.4023, 2009.
[19] C. P. Schnorr, "A unified approach to the definition of random sequences," Math. Syst. Theory, vol. 5, pp. 246–258, 1971.
[20] ——, "Process complexity and effective random tests," J. Comput. Syst. Sci., vol. 7, pp. 376–378, 1973.
[21] ——, "A survey of the theory of random sequences," in R. Butts and J. Hintikka (Eds.), Logic, Methodology and Philosophy of Science. Reidel, Dordrecht, The Netherlands, 1977.
[22] R. J. Solomonoff, "A formal theory of inductive inference. Part I," Information and Control, vol. 7, no. 1, pp. 1–22, 1964.
[23] ——, "A formal theory of inductive inference. Part II," Information and Control, vol. 7, no. 2, pp. 224–254, 1964.