A Unidirectional Approach to Achieving Instant Message Confidentiality [1]

Student: David Ja (davidja@seas.upenn.edu)
Advisor: Dr. Matt Blaze

[1] I would like to thank Micah Sherr for all the guidance and support he has provided throughout this project.

Abstract

Instant message systems (IMs) have become a common communication tool. Yet, despite all the effort devoted to securing phone calls and faxes, IMs remain relatively unprotected. While many protocols exist to supplement and secure current IM systems, they have too many drawbacks to become widely adopted. The common fatal flaws are that they are not simple enough for widespread use and that they require bilateral cooperation among the communicating parties. The confusion protocol offers an alternative to the existing protocols. It does not encrypt traffic but generates noise along with the actual traffic, so that the actual traffic remains secure. The purpose of this project is to apply the confusion protocol to current IM systems. The goal is to create a secure IM system that can replace the current one by building a library add-on that implements the confusion protocol on top of existing IM systems. Most importantly, it will give the user the same flexibility as the current system without the added complication of encryption.

Related Works

Instant Message Systems

There has been a plethora of research on instant message systems since the beginning of the internet. A modern client-server IM system such as GAIM (an instant message client that works with multiple protocols) will be the basis for this research project.

Encryption

Although encryption for IM is not common, encryption for internet traffic is not a novel concept.
Implementations of encryption protocols such as PGP and SSL have been around for more than a decade. The majority of the encryption schemes currently deployed do not differ in structure from PGP. PGP offers two primary services: encryption and digital signature. Encryption protects a message from eavesdropping, and a digital signature authenticates it.

Encryption: the encryption scheme used by PGP involves public-key encryption followed by the exchange of a symmetric key. The symmetric session key is protected by the RSA algorithm so that it remains secret. Symmetric-key encryption is used for the message itself, rather than RSA alone, because it is far more efficient.

Digital Signature: a digital signature allows any file to be authenticated. It is generated from a hash of the message and the private key of the sender. The signature can be attached to the end of a message to allow for authentication.

SSL provides similar encryption services, relying on a key exchange before symmetric-key encryption begins. Like PGP, it also provides a digital signature service. Other security protocols not discussed here include PEM, MOSS, S/MIME, PKCS#7, and CMS. All these protocols have been used to protect internet traffic such as email. IM encryption is usually a modified version of these processes: the public-key exchange and symmetric-key system used in PGP is often applied to IM with minor modifications (for example, SecureIM and AIM's built-in encryption).

Confusion Project [2]

Confusion is a project that Dr. Blaze’s group worked on in the Distributed Systems Lab. In a simple sense, confusion is the use of noise to hide the real message, which in turn confuses the interceptor. Current papers suggest looking at the security problem from the eavesdropper’s perspective, by considering the “fidelity” of a system.
These papers suggest that the confusion concept can degrade the eavesdropper’s ability to compromise a network system. The important result of the Confusion proposal is that it can be implemented unilaterally: it does not require the cooperation of both users to obtain security. Thus, the Confusion proposal is a strong alternative to the added complication of public-key exchange and symmetric-key systems, or of other encryption algorithms that require the users to have agreed beforehand on the encryption scheme and the encryption keys. The ability to incorporate confusion into existing protocols without changing their implementations is what allows its use on legacy systems. In conclusion, a confusion-based network protocol can protect information against interception.

[2] “The Eavesdropper’s Dilemma”, Eric Cronin, Micah Sherr, and Matt Blaze, submitted for publication.

Current Proposal

The key change in this project compared with current systems is the implementation of the confusion protocol on top of an existing IM client. The idea is to use confusion to transfer IMs securely without the overhead of a complicated key exchange. This solves three problems.

First, key exchange requires the user's computer to carry out the encryption and decryption. This requires the user to install software on the machine in use, which turns the majority of users away from this security measure: it is not user-friendly, and the “reward” of extra security does not seem to outweigh the inconvenience of additional software. The library provided by this project will be much more user-friendly.

Second, current security implementations are vulnerable to attacks. J. Katz and B. Schneier published a paper on one such vulnerability: a chosen-ciphertext attack against the majority of the security protocols discussed above. [3] Confusion offers an alternative that can provide some measure of data confidentiality without relying on cryptography.

Third, and most important, implementing confusion does not require a symmetric implementation on the receiving side. The receiving user need not do anything for the traffic to be secure. The confusion protocol is carried out unilaterally: only the sender has to use the library for the outgoing message to be secure. Unlike with traditional key-exchange encryption, the receiver need not even know that the sender is using the confusion protocol.

These three advantages give a confusion-based system a significant advantage over the current security implementations for IMs.

[3] “A Chosen Ciphertext Attack Against Several E-Mail Encryption Protocols”, Jonathan Katz and Bruce Schneier, June 23, 2000.

Technical Approach

The construction of this project is broken down into two sections. The first is the IM system itself. The second is the confusion protocol on top of the IM system.

IM System

The IM system will be no different than the existing model. The goal is to be able to add the confusion protocol to current IMs without modifying them. The IM protocol used for testing will be the Jabber protocol implemented on UNIX. The system used to demonstrate the confusion protocol will be GAIM.

The Jabber protocol offers all the capabilities that other IM protocols do. It has a client-server architecture. The client can compose messages and send them to another client. In a client-server architecture, a message sent from a client goes to a server, which then reroutes it to the destination client.
This allows the server to keep track of user data and to provide additional services such as contact list management and external access. The real services are all carried out on the server side; the client side serves only to interact with the user. The Jabber protocol runs over TCP. The diagram below illustrates the basic client-server architecture.

   C1----S1---S2---C3
         |
   C2----+--G1===FN1===FC1

The symbols are as follows:

o C1, C2, C3 = XMPP clients
o S1, S2 = XMPP servers
o G1 = A gateway that translates between XMPP and the protocol(s) used on a foreign (non-XMPP) messaging network
o FN1 = A foreign messaging network
o FC1 = A client on a foreign messaging network

"-" represents communications that use XMPP and "=" represents communications that use any other protocol. [4]

[4] http://www.ietf.org/rfc/rfc3920.txt, the Extensible Messaging and Presence Protocol (XMPP): Core, by the Jabber Software Foundation

XMPP stands for Extensible Messaging and Presence Protocol, the formal name for the Jabber protocol.

Protocol

The Jabber protocol defines the communication syntax between all the elements of the IM system. This includes multiple servers, clients, and gateways. A gateway is a system that allows the Jabber system to interact with foreign clients and servers. The core of XMPP is its use of XML as the basis for communication. The communication syntax relies on the parsing of specific XML tags, which are specified in RFCs 3920 and 3921 (I will not go into much detail here, as the RFCs are extensive). Communication between server and client is done by establishing a stream (usually over a TCP connection). Within a stream, multiple XML stanzas (the XML stanza is the unit of communication) can be sent between the elements on either end. Below is an illustration of the stream.
   |--------------------|
   | <stream>           |
   |--------------------|
   | <presence>         |
   |   <show/>          |
   | </presence>        |
   |--------------------|
   | <message to='foo'> |
   |   <body/>          |
   | </message>         |
   |--------------------|
   | <iq to='bar'>      |
   |   <query/>         |
   | </iq>              |
   |--------------------|
   | ...                |
   |--------------------|
   | </stream>          |
   |--------------------| [5]

[5] http://www.ietf.org/rfc/rfc3920.txt, the Extensible Messaging and Presence Protocol (XMPP): Core, by the Jabber Software Foundation

The keywords enclosed in ‘<’ and ‘>’ denote specific tags, which usually come in pairs. The Jabber protocol also supports security protocols and other mechanisms that guard against bad connections and other errors. All are defined in RFCs 3920 and 3921. The part of the protocol that primarily concerns this project is the <body> to </body> element, which encases the actual IM message itself. The confusion protocol is applied only to the message between those two tags. [6]

Confusion

The confusion protocol will be layered on top of the IM system. The client will use a semantic noise generator to secure any transmission. A semantic noise generator generates words in a specific language; in this case, it generates English noise. Because the noise created is indistinguishable from the message itself, it will successfully hide the real transmission. [7] The key modification to the normal client is that messages sent from the client are sent along with the noise produced by the semantic noise generator. Again, the confusion protocol will be implemented as a library. Because it is implemented only on the sender side, all security actions are carried out unilaterally, regardless of the receiver.

Implementation

The implementation is broken down into three distinct parts: the noise generation, the capturing of system calls to replace them with new libraries, and the editing of the TTL and MAC of outgoing packets.
By setting the TTL or MAC of the noise packets appropriately, the network drops them before they reach the end user.

Example TTL: The TTL (time to live) is essentially the maximum number of “jumps” (hops) a packet may take on the way to its destination. If the packet takes more jumps than its TTL allows, it is dropped from the network. Thus, by setting the TTL on a confuse packet LOWER than the number of jumps required to reach the destination, the receiver never receives the confuse packets. Instead, they are all dropped in transit.

Noise generation

Noise generation relies on Dadadodo. [8] The Dadadodo program, written by Jamie Zawinski, takes an input text and constructs a random parse tree. New text is generated by a random walk through the parse tree, weighted by the probability of the connecting words. The important result is that the more extensive the input text, the better the generated text will be. I will use chapter 1 of Orwell’s novel 1984 as an example in this project. However, any English text written by humans will work with Dadadodo (for instance, the protocol works equally well with the complete works of William Shakespeare).

[6] See the Implementation section.
[7] See the Security section.
[8] http://www.jwz.org/dadadodo/, Dadadodo by Jamie Zawinski

Noise generation is done by a function that calls Dadadodo via popen(). The output of Dadadodo is stored in a 2-dimensional data structure: the first dimension indexes the words and the second holds the characters that form each word. Each word is fixed at 20 characters; if a word is shorter than that, null characters fill out the end. A fixed size of 20 characters is used because TCP packets are sequence-numbered, and the sequence number advances by the number of bytes sent. If the word packets were not all sent with the same size, an eavesdropper could tell from the numbering which word follows which, so the sentences could easily be reconstructed and the security would be lost.
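
As a rough illustration of the fixed-width word packets described above, the following Python sketch splits a message into words and pads each one to 20 bytes with null characters. It is not the project's C code: the names pad_word and split_message are hypothetical, and the choice to reject over-long words (rather than truncate them) is an assumption.

```python
from typing import List

WORD_SIZE = 20  # fixed payload size per word packet, as stated in the text


def pad_word(word: str, size: int = WORD_SIZE) -> bytes:
    """Return the word as exactly `size` bytes, filled out with NUL bytes."""
    raw = word.encode("ascii")
    if len(raw) > size:
        # Assumption: over-long words are rejected rather than truncated.
        raise ValueError("word longer than %d bytes: %r" % (size, word))
    return raw + b"\x00" * (size - len(raw))


def split_message(message: str) -> List[bytes]:
    """Split a message on whitespace and pad every word to the same size."""
    return [pad_word(word) for word in message.split()]


if __name__ == "__main__":
    for packet in split_message("David had Chinese for lunch"):
        print(len(packet), packet)
```

Because every payload has the same length, each word position advances the byte count by the same amount whether the packet carries a real word or a noise word.
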
Consider the following example:

   Real Message: David had Chinese for lunch
   Confusion Message: Micah helped quite a bit

If these were sent on a TCP stream without padding, there would be only 4 possible combinations, because “David” can only be followed by “helped quite a bit” or “had Chinese for lunch”, and the same goes for “Micah”. If all the packets are 20 bytes in size, then the number of combinations becomes 2^5 = 32.

The second problem is the irregular length of sentences. This is solved by passing the length (number of words) of the actual message into the noise generation function, so that sentences that are too short are padded with NULL packets and sentences that are too long are cut off at that length. The noise generation function calls Dadadodo “number” times (“number” is also a passed-in variable). It returns a 3-dimensional character array, indexed by confused message, word position, and character within the word. The final size of the returned array is “number” by “word_count” by 20. The implementation is in the confuser2.c file.

Capturing System Calls

All relevant system calls involved in sending packets over a network are replaced by a new set of system calls. The functions replaced are:

o Socket
o Setsockopt
o Write
o Send
o Close

The most important rewrites are of the “write” and “send” system calls. These two calls are rewritten to parse the outgoing message and separate the real IM message into word-by-word packets. The “message body” in the Jabber protocol is contained between the <body> and </body> tags. The “write” and “send” functions first detect the start tag and then break the sentence up into words (padding each word to 20 characters and sending it) until the end tag. The code is contained in the libcvore2.c file.

Confuse Process

This is the background process that waits for each message, calls the noise generation function, and sends out the noise and the real message, using either TTL or MAC for confusion.
The process first waits for a message; when it receives one, it notes the packet information (IP address, etc.) so that confuse packets can be generated. Upon receiving the <body> tag from the send command, the program determines whether this is part of a message that already had noise generated. If not, the program generates noise messages and stores them with a counter (to keep track of the noise words sent) and a destination IP address (to distinguish different calls). It then sends the message word through with a set of generated noise words and increments the counter stored with the generated noise (moving on to the next word). The noise information is stored in a linked list of noise nodes defined as a struct. The program loops continuously, waiting for new messages. The code is contained in the confuser2.c file.

Security

This is a two-part discussion. The first issue is the security of the algorithm. The second is the cost of the algorithm. I will begin with the latter.

Cost

There are two different types of cost related to this algorithm: the additional bytes sent, which tax the network (the cost to the network), and the latency caused by the additional transmissions.

Consider a Jabber IM message of size m + x + n, where:

o m is the size of the actual text message (e.g., “my roommate is asleep.”)
o x is the XML overhead of the Jabber protocol
o n is the network overhead (basically everything other than the data the packet carries)

Let C be the number of noise packets generated per word.
Let T be the time it takes to send one packet.

Assumptions made about the messages to simplify the analysis:

1. The message contains words of the English language.
2. Messages have a consistent grammatical structure.
3. Messages do not contain abnormal characters or spacing (e.g., ASCII art).
4. The message is sent in one packet.
5. Words are restricted to fewer than 20 characters.

Latency

Latency caused by the additional transmissions comes from the TCP protocol.
Each transmission waits for an acknowledgment before the next packet is transmitted. As a result, sending the additional noise packets increases the wait time. For each Jabber IM message, the implementation breaks the message down into all data before the <body> tag, the <body> tag itself, each word in the actual text message, the </body> tag, and the rest of the data. This breaks the XML part of the Jabber message into 4 parts (all data before the <body> tag, the <body> tag itself, the </body> tag itself, and the rest of the data) and the text into word packets. Thus, the number of packets sent in place of the original packet is

   4 + m / 6

where 6 reflects an average English word length of 5 characters plus one character for the space or punctuation that delimits a word. The noise packets are all the same size as the word packets (which are padded to 20 bytes regardless of the original word size). Thus the number of noise packets sent is

   C * m / 6

Therefore the final number of packets sent through the network in place of the original packet is

   4 + m / 6 + C * m / 6

Thus, the latency caused by the additional packets is

   ( 4 + m / 6 + C * m / 6 ) * T

Byte Cost

The byte cost is the cost to the network. Again consider the same packet as above. The total number of packets sent because of the confusion algorithm is 4 + m / 6 + C * m / 6. The network overhead is n, and it is duplicated for every packet. Thus, we send

   n * ( 4 + m / 6 + C * m / 6 )

bytes of network overhead. The original packet carries x + m bytes of data. The implementation breaks the packet down into words and lengthens each word to 20 characters, so the data sent to complete the original message becomes

   x + m / 6 * 20

Finally, consider the bytes sent in the noise packets. Each noise packet is 20 bytes and there are C copies per word.
The bytes sent are

   C * m / 6 * 20

The total number of bytes sent is:

   n * ( 4 + m / 6 + C * m / 6 ) + x + m / 6 * 20 + C * m / 6 * 20

or

   n * ( 4 + m / 6 + C * m / 6 ) + x + ( m / 6 * 20 ) * ( C + 1 )

The cost ratio is:

   ( n * ( 4 + m / 6 + C * m / 6 ) + x + ( m / 6 * 20 ) * ( C + 1 ) ) / ( n + x + m )

Algorithm

The English language consists of the approximately 500,000 words included in the Oxford dictionary. If each word were to be confused with 500,000 noise words, then the message is considered to be “secure”. [9] Unfortunately, if C = 500,000, the cost would be too high for most networks to handle efficiently, even though the cost increases only linearly. If the number of noise packets is below the number of possible words in the English language, then security depends on the number of word combinations that can be used to hide the real message. Since the implementation sends C confused words for each word, the number of combinations is ( C + 1 )^( m / 6 ), without taking into account any structure of the English language.

[9] Claude E. Shannon, "Communication Theory of Secrecy Systems", Bell System Technical Journal, vol.28-4, page 656--715, 1949.

English

The language structure increases the complexity of the problem. A shrewd eavesdropper can analyze the combinations of noise packets sent and autonomously eliminate those that do not follow the norm of the English language. The term “norm” is used because there is still no perfect method to determine whether a combination of English words forms a valid sentence. The lack of a perfect grammar checker illustrates the challenge posed by attempting a perfect, autonomous analysis of the English language. However, many probabilistic attempts have been made. Dadadodo, for example, builds a Markov chain based on existing English text to generate new text that follows the same probabilistic pattern.
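
The Markov-chain idea can be sketched in a few lines of Python. This is only a minimal illustration of a first-order word chain with a frequency-weighted random walk; it is not Dadadodo's actual code, and the names build_chain and generate, as well as the restart-on-dead-end rule, are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_chain(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that follow it in the text.

    Followers are stored with repetition, so a uniform pick from the list
    is automatically weighted by how often each transition occurred.
    """
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: Dict[str, List[str]], length: int,
             rng: random.Random) -> List[str]:
    """Random walk of `length` words through the chain."""
    if not chain:
        raise ValueError("the training text needs at least two words")
    if length <= 0:
        return []
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Assumption: on a dead end (word with no follower), restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return out


if __name__ == "__main__":
    chain = build_chain("the cat sat on the mat and the cat ran")
    print(" ".join(generate(chain, 8, random.Random())))
```

A larger training text gives each word more observed followers, which matches the earlier remark that a more extensive input text produces better noise.
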
For this confusion implementation, the language structure implies that the measure of security is not ( C + 1 )^( m / 6 ) but ( C + 1 )^( m / 6 ) * x, where x here denotes the fraction of the combinations that cannot be eliminated autonomously by taking the English language into account (it is not the XML overhead x of the cost analysis). The nonautonomous threat is eliminated because one can choose a value of C large enough that manual elimination becomes infeasible. Note that x can be a function of C rather than a constant.

Empirical Results

A formal derivation is beyond the scope of the current project. It is, however, trivially true that ( C + 1 )^( m / 6 ) * x has a lower bound of ( C + 1 ). Thus, an increase in the number of copies of confuse packets translates to at least a linear increase in security ( d ( C + 1 ) / d C = 1 ).

However, some empirical results were obtained. The easiest way to eliminate sentences that do not conform to the norm of the English language is to build a Markov chain of words based on a pre-existing text, much like Dadadodo, and then eliminate the sentence combinations that cannot be stepped through in the tree built from that text. With a limited amount of testing on a very simple program that builds the tree from chapter 2 of Orwell’s novel, the results have been promising. The program returns the most likely sentence given only 5 noise copies and the real message. So far, none of the results has returned the original message.

The reason for using chapter 2 of Orwell’s novel is the importance of consistent language, so that past and current text messages are compatible with one another. If Shakespeare were used, then most likely none of the combinations, including the original message, would have passed through the filter. Naturally, an eavesdropper would use past text collected from the user being eavesdropped on.
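
The elimination step can be illustrated with a small Python sketch. It is a simplified stand-in for the test program mentioned above, not the program itself: it only checks whether every adjacent word pair of a candidate sentence was seen in a reference text, and it does not rank the survivors by likelihood. The names bigrams and plausible_sentences and the toy reference text are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple


def bigrams(text: str) -> Set[Tuple[str, str]]:
    """Collect every adjacent (word, next word) pair seen in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def plausible_sentences(candidates: List[List[str]],
                        seen: Set[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Keep the combinations whose every adjacent pair occurs in `seen`.

    candidates[i] holds the real word and the noise words observed at
    position i. Brute force over all combinations is exponential in the
    sentence length, so this is only practical for tiny examples.
    """
    survivors = []
    for combo in product(*candidates):
        pairs = zip(combo, combo[1:])
        if all((a.lower(), b.lower()) in seen for a, b in pairs):
            survivors.append(combo)
    return survivors


if __name__ == "__main__":
    reference = ("david had chinese for lunch micah helped quite a bit "
                 "david helped")
    observed = [["David", "Micah"], ["had", "helped"], ["Chinese", "quite"],
                ["for", "a"], ["lunch", "bit"]]
    kept = plausible_sentences(observed, bigrams(reference))
    print(len(kept), "of", 2 ** 5, "combinations survive")
```

In this toy setting, the ratio of surviving combinations to all combinations plays the role of the fraction x in the security measure discussed above.
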
Conclusion

While much additional research is needed to complete the security analysis of the confusion implementation over IM, the current implementation offers much promise. The same approach can also be applied to email and other forms of network traffic that transport English text. The implementation of this project will not replace the use of encryption but will offer additional security to current IM systems. Many of the assumptions served only to simplify the implementation and can be removed in a more elaborate system (these include, for example, the 40-word sentence limit, noise generated in the form of complete sentences, and the abnormal-character restriction, which can be removed with more complex inter-process communication and a more complex parser). The 20-character word limit is a choice based on the fact that the majority of English words are far shorter than 20 characters. The longest word recorded in the Oxford dictionary is 52 characters. The filler characters can also be changed (currently set as space) to more suitable characters depending on the system.

Overall, the implementation succeeded in its original intent: to generate noise that hides the real message traveling across a real-time IM stream. Many implementation difficulties were successfully worked around, such as the TCP packet sequencing problem. [10] With the implementation, the main goal of showing that confusion can be achieved on a practical legacy system is accomplished. Even though its security value will need more rigorous analysis, the preliminary results support the confusion protocol for IMs.

[10] Additional explanations are in the Difficulties section.

References

1. http://www.ietf.org/rfc/rfc2440.txt, the OpenPGP protocol; PGP originally by P. Zimmermann
2. “The Eavesdropper’s Dilemma”, Eric Cronin, Micah Sherr, and Matt Blaze, submitted for publication.
3. http://www.jwz.org/dadadodo/, Dadadodo by Jamie Zawinski
4. Jonathan Katz and Bruce Schneier, “A Chosen Ciphertext Attack Against Several E-Mail Encryption Protocols”, 9th USENIX Security Symposium, June 23, 2000
5. http://www.ietf.org/rfc/rfc3920.txt, the Extensible Messaging and Presence Protocol (XMPP): Core, by the Jabber Software Foundation
6. Claude E. Shannon, "Communication Theory of Secrecy Systems", Bell System Technical Journal, vol.28-4, page 656--715, 1949.

Appendix 1

Difficulties

The difficulties this project has faced so far lie in both the theoretical work and the implementation of the protocol. Many works on ordinary client-server IM systems exist and provide models for theoretical security analysis. However, most existing proofs concern the security of encryption, and the same procedures are difficult to reuse to prove that this protocol is secure. Unlike the proof for an encryption algorithm, much depends on the ability to analyze a specific language. The proof also depends on the ability to generate noise. Thus, the two abilities rely on the same technology. In a sense, the implementation also grows stronger as the ability to “decode” the language improves. The traditional theoretical concept of perfect secrecy is also not applicable to this project. Perfect secrecy is the condition in which the probability of producing a given encrypted text is independent of the message. [11] More simply, given the encrypted text, an attacker can do no better than guess the decrypted text at random.

The other difficult problem is that the cost of security cannot be analyzed until the implementation is completely designed. This problem halted progress on the theoretical work until the project was almost complete, because many design considerations affect the cost of the confusion algorithm (the 20-characters-per-word issue, for example).
The third difficulty is the implementation of the protocol, because there are many unforeseen network issues, such as the TCP numbering of packets, which prevents the transmission of sentences of uneven length. By contrast, using Dadadodo [12] proved to be relatively simple. The greatest difficulty, however, is network programming itself. It is difficult to construct and test programs for a network system. This significantly impeded progress at the implementation level (we spent 3 weeks on a bug that we do not understand, regarding either its occurrence or its disappearance).

[11] Claude E. Shannon, "Communication Theory of Secrecy Systems", Bell System Technical Journal, vol.28-4, page 656--715, 1949.
[12] http://www.jwz.org/dadadodo/, Dadadodo by Jamie Zawinski