Cryptography – Authentication Codes

When Alice sends a message to Bob (encrypted or not), how can Bob be sure that it was Alice who sent the message, and how does he know that the message was not altered by someone else during its transmission? This points to the need for an authentication code.

The mathematical setting:

There are three participants: Alice, Bob, and Oscar. Alice and Bob want to communicate over an insecure channel (e.g., by e-mail, fax, or cell phone). Oscar (the "bad guy") has the ability to introduce his own messages into the channel and/or to modify existing messages.

Consider two types of attacks by Oscar. When Oscar places a (new) message m' into the channel, it is called impersonation. When Oscar sees a message m and changes it to a (different) message m' ≠ m, it is called substitution.

As an example, suppose that Bob is Alice's stockbroker. When Alice sends a message to Bob, such as "buy 1000 shares of Acme stock", she would not be very happy if Oscar changed buy to sell!

The goal of an authentication code is to allow Bob to detect with high probability when such an attack has taken place.

Definition: An authentication code is a four-tuple (S, A, K, E), where the following conditions are satisfied.
S is a finite set of source states.
A is a finite set of authenticators.
K is a finite set of keys.
For each k ∈ K, there is an authentication rule e_k ∈ E, where e_k: S → A.

How an authentication code works:
Alice and Bob jointly choose a secret key k ∈ K at random and ahead of time.
A source state is just the information that Alice wants to communicate to Bob (e.g., "buy 100 shares …").
When Alice wants to communicate the source state s ∈ S to Bob, she uses the authentication rule e_k to construct the authenticator a = e_k(s).
The message m is formed by concatenating s and a, i.e., m = (s, a).
The message m is then sent over the channel.
When Bob receives m, he verifies that a = e_k(s) to authenticate the source state s.
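
The exchange just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the authentication rule used here, e_k(s) = (s·i + j) mod P with key k = (i, j) and P prime, is chosen for the sketch and is not fixed by the definition above, and the names authenticator, make_message and verify are hypothetical.

```python
import secrets

P = 5003  # assumed prime modulus; authenticators are residues 0..P-1


def authenticator(key, s):
    """Illustrative rule e_k(s) = (s*i + j) mod P for the key k = (i, j)."""
    i, j = key
    return (s * i + j) % P


def make_message(key, s):
    """Alice: form the message m = (s, a) with a = e_k(s)."""
    return (s, authenticator(key, s))


def verify(key, message):
    """Bob: accept m = (s, a) only if a equals e_k(s)."""
    s, a = message
    return a == authenticator(key, s)


# Alice and Bob agree on a random secret key ahead of time.
key = (secrets.randbelow(P), secrets.randbelow(P))

m = make_message(key, 219)          # genuine message from Alice
tampered = (m[0], (m[1] + 1) % P)   # Oscar alters the authenticator

print(verify(key, m))         # True: the genuine message is accepted
print(verify(key, tampered))  # False: the altered message is rejected
```

Oscar never sees the key, so to forge or alter a message he has to guess an authenticator that matches it.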
If a ≠ e_k(s), then Bob is able to detect that an attack has taken place.

Let P0 denote the probability that Oscar can deceive Bob by impersonation (sending a message in Alice's name).
Let P1 denote the probability that Oscar can deceive Bob by substitution (changing the message Alice sent).

Theorem: Suppose there are m MOLS(n). Then there is an authentication code for m source states, having n authenticators and n^2 keys, in which P0 = P1 = 1/n.

Note that this is the best possible with n authenticators.

Example: Suppose that Alice and Bob want at least 300 source states (so they need at least 300 MOLS). Now suppose that they want a security level of 1/5000. This says that they want MOLS of order n ≥ 5000. The easiest way to satisfy these requirements is to take n to be the smallest prime greater than 5000. This is 5003. They construct 300 MOLS(5003) (we saw how to do this earlier). Call these L_1, L_2, …, L_300. They also have a previously agreed upon secret key k: this will be an ordered pair of numbers from 1 to 5003 (say k = (1244, 346)).

Now suppose Alice wants to send the source state s = 219 (this could stand for "buy 219 shares of Acme"). Alice computes her authenticator a = e_(1244,346)(219) = L_219(1244, 346) and sends the message m = (219, a) to Bob. Bob can check the authenticity of m by looking at the (1244, 346) cell of L_219. If it is not a, then he knows that something is wrong.

Latin Square Statistical Designs and Covering Arrays

Latin squares provide an efficient way to test for two-way interaction among several variables.

Example: Suppose there are n varieties of wheat to be tested with n fertilizers and n insecticides. Then there are n^3 variety-fertilizer-insecticide triples to be tested. To reduce experimental cost we can use a Latin square design.
Let the symbols of an n × n Latin square correspond to the wheat varieties.
Let the rows correspond to the n fertilizers.
Let the columns correspond to the n insecticides.
Then each variety of wheat can be tested with each of the fertilizers and each of the insecticides in n^2 tests.

n = 4
      i1  i2  i3  i4
f1     1   2   3   4
f2     3   4   1   2
f3     4   3   2   1
f4     2   1   4   3

Note that wheat variety 1 is matched with the four fertilizer-insecticide pairs (1,1), (2,3), (3,4) and (4,2), so it is tested once with each fertilizer and once with each insecticide. An analysis of variance can determine the significance of the data and whether or not one of the fertilizers or insecticides is better than the others.

This can easily be generalized to more than three variables by using orthogonal Latin squares.

Write the 16 tests in a 3 × 16 array:

rows      1111222233334444
columns   1234123412341234
symbols   1234341243212143

This is called an Orthogonal Array. This one is an OA(3,4).

Fact: The existence of k MOLS(n) is equivalent to the existence of an OA(k+2, n).

For example, the following square is orthogonal to the 4 × 4 square above, so the two together give an OA(4,4):

1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1

OAs have been generalized to Covering Arrays. (Basically, these are arrays that cover all pairs of variables.)

An example (Cohen, Dalal, et al. (1996)): AT&T is testing a telephone network service called AIN (Advanced Intelligent Network). This is an automated phone service. There are four parameters:
1. Type of announcement: None, Interruptible, or Noninterruptible
2. User input of digits: No digits, Fixed number of digits, or Variable number of digits terminated by the # key
3. Make a billing record: Yes or No
4. User access: Local phone or Long-distance trunk
(The combination of no announcement with no digits is not permitted.)

Note that testing every permitted combination of values of the 4 parameters would take 32 = (3 × 3 × 2 × 2) − 4 tests; the 4 excluded combinations are those that pair no announcement with no digits. The following covering array shows that all pairwise combinations of the parameter values can be tested in 8 tests (a 75% reduction in experimental cost).

Test              1      2      3       4       5       6         7         8
Announcement      none   none   inter.  inter.  inter.  non-int.  non-int.  non-int.
Digits wanted     fixed  var    none    fixed   var     none      fixed     var
Billing           no     yes    no      yes     yes     yes       yes       no
Access line type  line   trunk  trunk   trunk   line    line      trunk     trunk

Much research has been done to design good covering arrays.
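
The pairwise-coverage claim for the 8-test array can be checked mechanically. The sketch below is a hedged illustration, not part of the cited study: the test list is transcribed from the table above, and the names PARAMS, TESTS, is_permitted_pair and uncovered_pairs are hypothetical.

```python
from itertools import combinations, product

# Parameter values, in the order used for each test tuple.
PARAMS = {
    "announcement": ["none", "interruptible", "noninterruptible"],
    "digits": ["none", "fixed", "variable"],
    "billing": ["yes", "no"],
    "access": ["line", "trunk"],
}

# The eight tests, read column by column from the covering array.
TESTS = [
    ("none", "fixed", "no", "line"),
    ("none", "variable", "yes", "trunk"),
    ("interruptible", "none", "no", "trunk"),
    ("interruptible", "fixed", "yes", "trunk"),
    ("interruptible", "variable", "yes", "line"),
    ("noninterruptible", "none", "yes", "line"),
    ("noninterruptible", "fixed", "yes", "trunk"),
    ("noninterruptible", "variable", "no", "trunk"),
]


def is_permitted_pair(name_a, val_a, name_b, val_b):
    """No announcement combined with no digits is not permitted."""
    pair = {name_a: val_a, name_b: val_b}
    return not (pair.get("announcement") == "none" and pair.get("digits") == "none")


def uncovered_pairs(tests):
    """Return every permitted value pair that no test exercises."""
    names = list(PARAMS)
    missing = []
    for (ia, a), (ib, b) in combinations(enumerate(names), 2):
        seen = {(t[ia], t[ib]) for t in tests}
        for va, vb in product(PARAMS[a], PARAMS[b]):
            if is_permitted_pair(a, va, b, vb) and (va, vb) not in seen:
                missing.append((a, va, b, vb))
    return missing


# Exhaustive testing: all combinations except none/none.
permitted = [c for c in product(*PARAMS.values())
             if not (c[0] == "none" and c[1] == "none")]

print(len(permitted))          # 32
print(uncovered_pairs(TESTS))  # []
```

An empty list means every permitted pair of parameter values appears in at least one of the 8 tests.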