Support for the adoption of a domain based authentication mechanism for electronic mail

Robert T. Johnson, III

Abstract

This paper analyzes the current mail system and the weaknesses inherent in it that allow unethical users to misrepresent themselves and attempt to hide their identities. The ways in which this is done are discussed, as well as which information present in mail messages can be expected to remain accurate in the presence of potential forgery. A current proposal submitted for consideration by the IETF is then discussed, and a recommendation is made that the proposal be adopted as an eventual standard to increase the integrity of mail messages and diminish the impact of spam.

1 Introduction

Electronic mail (email) has seen widespread adoption as a communications medium. Many of the advantages of other forms of communication are present in email, and it provides new ones as well. Messages can be sent easily and are generally delivered quickly to a receiving mail server. Recipients can retrieve messages and respond, if needed, at their convenience. Worldwide communication between interested parties is possible. Other electronic information can also be sent with a message. You can communicate with anyone as long as you know their email address. There is almost no cost to send messages on a per message basis.

The popularity of the medium, its open communication, and its nearly nonexistent cost per message have also given rise to nefarious uses of electronic mail: unsolicited bulk commercial advertising and attempts to trick users into giving up personal financial information, such as bank account numbers (phishing). These messages are labeled spam and their senders spammers. They clog up the inboxes of all types of users and consume large amounts of the limited resources of the systems involved in mail delivery. This has detrimental effects on the usefulness of email as a communications medium.
In the extreme, if not halted or kept in check, spam may even have the potential to destroy the usefulness of email as a communications medium. Time, money, and system resources are lost both by the recipients of spam and by the entities involved in delivering it.

Quantitative data illustrate the problem: 96% of end users think that spam is at least annoying, while only 4% do not think it is annoying. That translates to hundreds of millions of annoyed end users worldwide. In one day, AOL end users reported over five million pieces of spam. Costs for receiving spam have been estimated at 30 to 50 dollars a year in direct costs to each end user, 730 dollars a year in lost productivity for each employee, and 8.9 billion dollars a year in total cost to US corporations [13]. This is clearly a serious problem.

Spamming is attractive to the unethical and criminal element of society because of the potential to reach a large audience at almost no cost to the spammer. The cost of the spam is forced upon both its recipients and the entities involved in its delivery. The ease with which this can be done is what has made spam so prevalent in the email system. Spam is like "junk" postal mail that arrives with postage due and must be paid for by the recipient.

Even with the large public outcry against spamming, it is still a popular and apparently profitable tactic among the undesirable element of society. Thirty million dollars was seized from one spam group, and another spam related business was estimated at 3.2 billion dollars [13]. One reason for this is the low accountability spammers currently face for their actions. Since most spammers are able to get away with it, there is little incentive for them to stop.

So far the most successful and promising ways of dealing with spam involve blacklists, whitelists, and filtering at some level. These are valuable ways to combat spam and can be quite effective.
However, these methods are stopgap measures that give spammers no incentive to stop their activities. They also have drawbacks of their own: users must still occasionally sort through mail tagged as spam, and legitimate mail can be lost.

Ultimately, one of the biggest weaknesses of the current mail system is the lack of authentication for messages. Undesirable elements of society exploit this weakness to trick users into thinking a message is from someone else and to avoid being detected as the sender of the message. How this is done in the current mail system, as well as a current proposal called DomainKeys that attempts to overcome these weaknesses, are the topics of this paper. The next section discusses the original SMTP protocol, while Section 3 discusses the newer ESMTP protocol. Open relays and the Authenticated SMTP extension are covered in Section 4, while issues involved in tracking spammers based on message header analysis are presented in Section 5. DomainKeys is discussed in Section 6, and Section 7 concludes the paper.

2 SMTP

The original SMTP protocol was developed in 1982 for the Arpanet, and was designed as a very open protocol. Historically, this has been one of email's greatest strengths: anyone can send anyone a message as long as they know that person's email address. However, the nefarious activities of spammers are undermining the usefulness of this strength.

The protocol provides communication between a mail user agent (MUA) and a mail transfer agent (MTA), also known as a mail server, as well as communication between one MTA and another. Because the protocol is fairly simple, users can also telnet to an MTA on port 25 (the reserved port for SMTP) and send mail directly. Several MTAs can also be involved in mail delivery, which is known as mail relaying.
The general pattern to send mail to the final destination MTA is as follows: MUA→MTA→MTA, or MUA→MTA→MTA→…→MTA if relaying is used [1].

Since one of the main goals of a spammer is to avoid being identified, and since an ESMTP receiver is required to be able to accept mail from an SMTP sender to be fully compliant with the standard, I will first present the minimal use of the protocol to send mail. It is very unlikely that a smart spammer would use more than the minimum required unless it provided some advantage in avoiding identification. Five ordered commands are needed to accomplish this, as shown in the table below [2].

    Command   Parameters             Parameter meaning
    HELO      domain                 Host name of sender-SMTP
    MAIL      FROM: <reverse-path>   Path leading back to the originator of the message
    RCPT      TO: <forward-path>     Path leading to the ultimate destination of the message
    DATA
    QUIT

Table 1 – Commands used by the sender-SMTP to send mail

HELO is used to synchronize the initial connection, and should be used to identify the sender-SMTP to the receiver-SMTP. MAIL should be used to provide a path back to the originator of the message. Both the message headers and the message body are sent between the DATA and QUIT commands. The minimum message headers required for a valid message are shown in the table below, though only one of the To or Bcc headers is required [3].

    Header   Parameters   Parameter meaning
    Date:    date-time    The date and time the message was sent
    From:    mailbox      The mail address of the sender
    To:      address      The mail address(es) of the recipient(s) of the message
    Bcc:     address      Same as To:, but can be empty

Table 2 – Minimal message headers

When an MTA takes responsibility for delivering a message, it also adds a new header at the top of the message. This header takes the form of Received: <stamp>, with an example shown below.
    Received: FROM ABC.ARPA BY XYZ.ARPA ; 22 OCT 81 09:23:59 PDT

This potentially provides information about who the sender-SMTP and the receiver-SMTP were and what time the mail session took place. When the final receiving MTA takes receipt of the message, it also adds an additional header at the top of the message in the form of Return-Path: <reverse-path>. The reverse-path is the same reverse-path that is given in the MAIL command for this session. An example is shown below.

    Return-Path: <@ABC.ARPA:JOE@ABC.ARPA>

The problem involves who provides information in the initial mail session. In this session, the parameters of both the HELO and MAIL commands, as well as all the initial message headers, are dependent upon the initial sender-SMTP. If the initial sender-SMTP chooses to provide incorrect information in some or all of these areas, there is nothing that stops it from doing so. An example of the complete message headers illustrates this [2, 3]:

    Return-Path: <@DEF.ARPA, @ABC.ARPA:JOE@ABC.ARPA>
    Received: from DEF.ARPA by GHI.ARPA ; 27 Oct 81 15:15:13 PST
    Received: from ABC.ARPA by DEF.ARPA ; 27 Oct 81 15:01:59 PST
    Date: 27 Oct 81 15:01:01 PST
    From: JOE@ABC.ARPA
    TO: SAM@GHI.ARPA

    (Figure annotation: Dependent on initial sender-SMTP)

If the true initial sender-SMTP is really sending the message on behalf of, say, david@spammer.com, there is no information about this present in the final mail message headers. A spammer is able to mask his own identity in this fashion. The only information we are left with for some clue to their identity is what they have in the body of their message and the bogus information provided in the headers.

3 ESMTP

As stated earlier, full compliance with the newer standard requires the ability to interact with a sender-SMTP that uses the original protocol and to accept its mail for delivery. So that the receiver-ESMTP can determine whether a session has been established with a sender-SMTP or a sender-ESMTP, the HELO mail command is replaced by the command EHLO.
For tracking purposes, there is one important addition, though it is not absolutely required for compliance with the specification. The receiver may record in the Received header at least the IP address of the sender-SMTP, taken from the TCP connection, and optionally the host name for that IP address if available, along with the name the sender-(E)SMTP supplied in the HELO (or EHLO) command [4, 5]. The only reason I believe a legitimate MTA would not wish to include this information is to provide some type of anonymous email service, though such services give a spammer a perfect opportunity to avoid being traced by IP address if the service provider is unscrupulous or careless [6]. The example below shows what the new Received header can potentially look like.

    Received: from ad.bogus.com (goofy.cs.wisc.edu [128.105.181.25])
        by obsidian.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id iBG8gXCs029856

    ad.bogus.com: host name given in the EHLO command
    goofy.cs.wisc.edu [128.105.181.25]: the real host name and IP

Two aspects of the current specification are particularly interesting in relation to spamming. First, to be fully compliant, a receiver-ESMTP must not refuse to accept a message because the domain name given in the EHLO command does not correspond to the IP address of the client. Giving a false name in EHLO also happens to be one of the most common tricks spammers use to try to avoid detection. Second, full compliance requires that a receiver-ESMTP that provides relay capabilities must not inspect the message headers or body unless it is attempting to detect mailing loops [4]. Since one of the main techniques spammers use is finding relays that will deliver their mail for them (known as open relays), this information could otherwise help a relay decide to refuse responsibility for delivering their mail. Using an open relay to send out spam allows the spammers to trick the open relay into performing the bulk of their work for them.
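The two name fields in the Received example from this section can be compared mechanically. The following is a minimal sketch in Python, assuming the sendmail-style layout of that example (`from <EHLO name> (<resolved name> [<IP>]) by <host>`); other MTAs format the header differently, and the names `RECEIVED_RE` and `parse_received` are invented here for illustration. A mismatch is only a hint, since the specification does not allow mail to be refused on that basis.

```python
import re

# Assumed layout (taken from the example in this section, not from any standard):
#   from <EHLO name> (<resolved name> [<IP>]) by <receiving host> ...
RECEIVED_RE = re.compile(
    r"from\s+(?P<claimed>\S+)\s+"
    r"\((?P<resolved>[^\s\[\]]+)?\s*\[(?P<ip>[0-9.]+)\]\)\s+"
    r"by\s+(?P<by>\S+)",
    re.IGNORECASE,
)


def parse_received(value):
    """Split a Received stamp into claimed name, resolved name, IP and receiver.

    Returns None when the stamp does not follow the assumed layout.
    """
    match = RECEIVED_RE.search(value)
    if match is None:
        return None
    fields = {
        "claimed": match.group("claimed").lower(),            # from the EHLO command
        "resolved": (match.group("resolved") or "").lower(),  # from the TCP connection
        "ip": match.group("ip"),
        "by": match.group("by").lower(),
    }
    # True only when a resolved name is present and equals the EHLO name.
    fields["ehlo_matches"] = bool(fields["resolved"]) and (
        fields["claimed"] == fields["resolved"]
    )
    return fields


example = (
    "from ad.bogus.com (goofy.cs.wisc.edu [128.105.181.25]) "
    "by obsidian.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id iBG8gXCs029856"
)
info = parse_received(example)
print(info["claimed"], info["resolved"], info["ip"], info["ehlo_matches"])
```

For the example header, the claimed name (ad.bogus.com) and the resolved name (goofy.cs.wisc.edu) differ, so `ehlo_matches` is false.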
Note: For the remainder of the paper I will use the term SMTP to refer to the enhanced SMTP protocol. To specifically distinguish between the original and the extended SMTP protocol, the terms “original SMTP” and “extended SMTP” will be used instead.

4 Open Relays and Authenticated SMTP

An open relay is an SMTP server that will assume responsibility for delivering a mail message when neither the sender-SMTP nor the final receiver-SMTP belongs to the domain of the SMTP server. I am unaware of any specific justification for why an open relay might be required for any domain. Stronger claims also exist that they are never required, which I believe is most likely accurate [7]. The “best” justification for an open relay I am aware of is to allow users to send mail through the mail server(s) in their domain while they are not on a part of the network in the domain. However, Authenticated SMTP (ASMTP) gives legitimate users the ability to relay through the mail server without requiring an open relay. ASMTP adds the following command to SMTP.

    Command   Parameters         Parameter meaning
    AUTH      mechanism          A Simple Authentication and Security Layer (SASL) authentication mechanism
              initial-response   An optional initial response for the sender-SMTP being authenticated

Table 3 – The ASMTP AUTH command

An example of the command is shown below [8].

    AUTH CRAM-MD5

Open relays are used by spammers to offload onto the relay the work involved in delivering all of their messages. The spammer can use a sender-SMTP that connects to the relay and sends a single mail message body with one million RCPT commands containing one million email addresses. Once the session is terminated the spammer is finished, and the relay is stuck trying to deliver one million pieces of spam. Spammers also use the relays in attempts to avoid having their spam traced back to them [9].
If the spammer is able to find an open relay that uses the original SMTP, or one using extended SMTP that does not include the IP information from the TCP connection, there will be no information in the message headers that can be used to track the spammer (assuming forgery is present). Even without these types of open relays, spammers can attempt to mask their identity by forging Received headers in their messages, making it look like the spammer is simply an “innocent” open relay that relayed along the mail message from another sender-SMTP.

Given that open relays are so useful to spammers, it is interesting to look at some statistics on them. The Open Relay Database is a non-profit organization that maintains a database of IP addresses belonging to mail servers that have been verified as open relays. Visitors to the site submit the host name or the IP address of the mail server(s) they wish to have checked. If the organization determines that the hosts are open relays, they are added to the database. While the verification process is not published on the site, the organization suggests setting up a mail server and submitting it for testing to determine which specific tests are currently in use. After the organization has tested the server, the server logs can be examined for this information [7].

    Country                     Number of open relays
    United States               82981
    China                       25774
    Republic of Korea           16421
    Japan                       9921
    Taiwan, Province of China   8468

Table 4 – Top five countries with the most open relays in the Open Relay Database, based on data from countries.nerd.dk : http://www.ordb.org/statistics/countries

Figure 1 – The number of open relays present in the Open Relay Database

As Table 4 and Figure 1 show, there are still plenty of open relays on the Internet for spammers to use, and these numbers most likely do not account for every open relay present on the entire Internet.
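The definition of an open relay given at the start of this section, together with the ASMTP exception, can be expressed as a small policy check. The sketch below is illustrative only: the domain `example.edu`, the network `192.0.2.0/24`, and the function names are hypothetical, and modeling "belongs to the domain" as "connects from a local network" is a simplifying assumption rather than something the specifications prescribe.

```python
import ipaddress

# Hypothetical site configuration, invented for this sketch.
LOCAL_DOMAINS = {"example.edu"}
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_local_client(client_ip):
    """True if the sender-SMTP connects from one of the site's own networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in LOCAL_NETWORKS)


def is_local_recipient(rcpt):
    """True if the RCPT address is in a domain this server delivers for."""
    return rcpt.rpartition("@")[2].lower() in LOCAL_DOMAINS


def accept_rcpt(client_ip, rcpt, authenticated=False):
    """Decide whether to take responsibility for one RCPT command."""
    if is_local_recipient(rcpt):
        return True   # final delivery, not relaying
    if is_local_client(client_ip) or authenticated:
        return True   # relaying on behalf of one of the site's own users
    return False      # neither end is local: only an open relay would accept


print(accept_rcpt("198.51.100.7", "sam@example.edu"))
print(accept_rcpt("198.51.100.7", "bob@example.org"))
print(accept_rcpt("198.51.100.7", "bob@example.org", authenticated=True))
```

An open relay behaves as if the final `return False` were `return True`. The `authenticated` flag stands in for a successful AUTH exchange, which is what lets roaming users relay without the server being open to everyone.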
5 Analysis

Common tricks spammers use to try to avoid being identified involve using a bogus email address or someone else’s legitimate email address in the From header and MAIL command, faking the host name in the EHLO command, and forging Received headers. A proper sequence of Received headers should take the form:

1. Received: from y by z (the y used is the host name as determined by the IP address of the TCP connection, if present)
2. Received: from x by y

If the two values for y do not match, this represents a “broken chain” and most likely indicates that the Received header in line two is forged [14, 15, 16].

Abuse and eMailTrackerPro are two software tools that have been created to automate header analysis in order to determine the IP address of the spammer in a mail message. Abuse is an open source project, while eMailTrackerPro is a commercial product sold by Visualware [10, 11]. Because various types of forgery may be present, an investigation was performed to judge how accurately the software can be expected to determine the true IP address of the spammer. Fifty pieces of spam were taken from three email accounts for each software package to analyze, and their results were compared to the IP address I determined belonged to the spammer.

Figure 2 – The results of the header analysis

Abuse performed very well in the analysis, agreeing with my determination for all of the messages. eMailTrackerPro performed well on messages that did not contain forged Received headers. Forged headers gave the software some problems: while it would recognize that some type of misdirection was taking place, it would still continue down broken or invalid Received header chains to pick its choice for the spammer’s IP address. This led to some interesting choices, such as the spam originating from the Internet Corporation for Assigned Names and Numbers (ICANN).
I believe it is fairly safe to assume that the spam did not originate from their domain.

I have been able to determine one type of forgery that fools Abuse into giving an incorrect IP address for the spammer. The following SMTP session was initiated from the host goofy.cs.wisc.edu:

    telnet mta104.mail.sc5.yahoo.com 25
    EHLO goofy.cs.wisc.edu
    MAIL FROM:< djones@hotmail.com>
    RCPT TO: xhockeyguy11@yahoo.com
    DATA
    Received: from 68.78.232.93 (adsl-68-78-232-93.dsl.mdsnwi.ameritech.net [68.78.232.93])
        by goofy.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id iHF33VNXH
        Wed, 15 Dec 2004 02:10:30 -0600
    Date: Wed, 15 Dec 2004 02:05:24 -0600
    From: Dr. Jones <djones@hotmail.com>
    To: xhockeyguy11@yahoo.com
    Subject: Get rich quick

    Make money now!
    .
    QUIT

    (Figure annotation on the Received header: Forged, with valid host names and IP addresses for the hosts)

The following mail message was then received:

    From Dr. Jones Wed Dec 15 00:05:24 2004
    X-Apparently-To: xhockeyguy11@yahoo.com via 206.190.39.91; Mon, 20 Dec 2004 20:35:45 -0800
    Authentication-Results: mta104.mail.sc5.yahoo.com from=hotmail.com; domainkeys=neutral (no sig)
    X-Originating-IP: [128.105.181.25]
    Return-Path: <djones@hotmail.com>
    Received: from 128.105.181.25 (EHLO goofy.cs.wisc.edu) (128.105.181.25)
        by mta104.mail.sc5.yahoo.com with SMTP; Mon, 20 Dec 2004 20:35:45 -0800
    Received: from 68.78.232.93 (adsl-68-78-232-93.dsl.mdsnwi.ameritech.net [68.78.232.93])
        by goofy.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id iHF33VNXH
        Wed, 15 Dec 2004 02:10:30 -0600
    Date: Wed, 15 Dec 2004 02:05:24 -0600
    From: "Dr. Jones" <djones@hotmail.com>
    To: xhockeyguy11@yahoo.com
    Subject: Get rich quick
    Content-Length: 16

    Make money now!

When the headers were analyzed by Abuse, the IP address of the spammer was determined to be 68.78.232.93. Even by manual analysis it is difficult to detect the forgery unless you try to initiate an SMTP session with goofy.cs.wisc.edu. To avoid even this possible method of detection the spammer could set up an open relay on the host.
The two cases, in which the spam was initially sent from 68.78.232.93 or from goofy.cs.wisc.edu with a forged Received header, would then be indistinguishable from the headers.

6 DomainKeys

DomainKeys is a current proposal that has been submitted to the Internet Engineering Task Force (IETF) for consideration. It is based on public/private key cryptography and digital signatures. When a user who is authorized to send messages from a domain submits a message for delivery to an MTA in the domain, the message headers and body are signed by the MTA. The MTA adds the signature, and the information needed to verify it, at the top of the message in a new DomainKey-Signature header. When an MTA in another domain receives the message, it can retrieve the correct public key for the domain listed in the From header and verify the authenticity of the message. The public key information is stored in the domain’s DNS record, along with policy information about how the domain uses DomainKeys and what should be done if a message claiming to be from the domain arrives without a signature or fails signature verification. DomainKeys does not require that all message headers be included in the signature and verification process, but the From message header must be [12, 17].

7 Conclusion

Due to the abuse of the openness of the current mail system by certain elements of society, increased authentication and integrity checks need to be added to provide stronger guarantees against forgery and misrepresentation. This needs to be done both to better track the criminal elements and to make it more difficult for them to engage in criminal activities, such as conning naïve users into providing financial information to their “bank”. DomainKeys seems to provide a reasonably effective solution towards that end, without requiring an overly massive restructuring of the existing mail system.
I recommend that it eventually be adopted as a standard by the IETF to help address the problems currently faced by the mail system.

References

[1] – Wood, David. Programming Internet Email. O’Reilly, August 1999
[2] – Postel, Jonathan. Simple Mail Transfer Protocol. RFC 821, August 1982
[3] – Crocker, David. Standard For The Format Of Arpa Internet Text Messages. RFC 822, August 1982
[4] – Klensin, J. Simple Mail Transfer Protocol. RFC 2821, April 2001
[5] – Resnick, P. Internet Message Format. RFC 2822, April 2001
[6] – Anonymous Email. http://www.advicebox.com/
[7] – The Open Relay Database. http://www.ordb.org/
[8] – Myers, J. SMTP Service Extension for Authentication. RFC 2554, March 1999
[9] – Lindberg, G. Anti-Spam Recommendations for SMTP MTAs. RFC 2505, February 1999
[10] – Visualware. eMailTrackerPro. http://download.visualware.com/#emailtrackerpro
[11] – Abuse. http://spam-abuse.sourceforge.net/
[12] – Delany, Mark. Domain-based Email Authentication Using Public-Keys Advertised in the DNS (DomainKeys). Internet Draft, August 2004
[13] – Atkins, Steve. Size and cost of the problem. http://www.ietf.org/proceedings/03mar/slides/asrg-1/index.html
[14] – Mattocks, Bill. Email Spam Tracking. http://www.mailsbroadcast.com/email.bolts&nuts/101.email.spam.tracking.htm
[15] – How to analyze spam. http://www.rickconner.net/spamweb/analysis.html
[16] – Tracking email. http://www.phaster.com/news_articles/tracking_email.html
[17] – DomainKeys Overview. http://antispam.yahoo.com/domainkeys#a3