Support for the adoption of a domain based authentication
mechanism for electronic mail
Robert T. Johnson, III
Abstract
This paper analyzes the current mail system and the weaknesses inherent in it that
allow unethical users to misrepresent themselves and attempt to hide their identities.
The ways in which this is done are discussed, as well as which information present in mail
messages can be expected to remain accurate in the presence of potential
forgery. A current proposal submitted for consideration by the IETF is then discussed, and
a recommendation is made that the proposal be adopted as an eventual standard to increase
the integrity of mail messages and diminish the impact of spam.
1 Introduction
Electronic mail (email) has seen widespread adoption as a communications
medium. Many of the advantages of other forms of communication are present in email
and it provides new ones as well. Messages can be sent easily and generally delivered
quickly to a receiving mail server. Recipients can retrieve messages and respond if
needed at their convenience. Worldwide communication between interested parties is
possible. Other electronic information can also be sent with a message. You can
communicate with anyone as long as you know their email address. There is almost no
cost to send messages on a per message basis.
The popularity of the medium, open communication, and nearly nonexistent cost
per message have also given rise to nefarious uses of electronic mail: unsolicited bulk
commercial advertising and attempts to trick users into giving up personal financial
information, such as bank account numbers (phishing). These messages are labeled spam
and their perpetrators spammers. Spam clogs the inboxes of all types of users and
consumes large amounts of the limited resources of systems involved in mail delivery.
This has detrimental effects on the usefulness of email as a communications medium and,
in the extreme, if not halted or kept in check, may even destroy it. Time, money, and
system resources
are lost both by the recipients of spam and by the entities involved in the delivery of it.
For quantitative data on the problem, 96% of end users think that spam is at least
annoying, while only 4% do not think it is annoying. That translates to hundreds of
millions of annoyed end users worldwide. In one day, AOL end users reported over five
million pieces of spam. Costs for receiving spam have been estimated at: 30 to 50 dollars
a year in direct costs to each end user, 730 dollars a year in lost productivity for each
employee, and 8,900,000,000 dollars a year in total cost to US corporations [13].
Obviously this is a serious problem!
Spamming is attractive to the unethical and criminal element of society due to the
potential to reach a large audience for almost no cost to the spammer. The cost of the
spam is forced upon both the recipients of it and the entities involved in delivery. The
ability to do this so easily is what has made spam so prevalent in the email system. Spam
is like “junk” postal mail that has postage due upon delivery and must be paid for by the
recipient.
Even with the large public outcry against spamming, it is still a popular and
apparently profitable tactic among the undesirable element of society. Thirty million
dollars was seized from one spam group, and another spam-related business was estimated
at 3,200,000,000 dollars [13]. One reason for this is the low accountability
spammers currently face for their actions. Since most spammers are able to get away with
it, there is little incentive for them to stop. So far the most successful and promising
ways of dealing with spam involve blacklists, whitelists, and filtering at some level.
These are valuable ways to combat spam and can be quite effective. However, these
methods are stopgap measures that give spammers no incentive to stop
their activities. They also have drawbacks of their own: users must still
occasionally sort through mail tagged as spam, and legitimate
mail can be lost.
Ultimately, one of the biggest weaknesses of the current mail system is the lack of
authentication present for messages. Undesirable elements of society exploit this
weakness to trick users into thinking a message is from someone else and to avoid being
detected as the sender of the message. How this is done in the current mail system, and
a current proposal called DomainKeys that attempts to overcome these
weaknesses, are the topics of this paper. The next section will discuss the original SMTP
protocol, while Section 3 discusses the newer ESMTP protocol. Open Relays and the
Authenticated SMTP extension will be covered in Section 4, while issues involved in
tracking spammers based on message header analysis will be presented in Section 5.
DomainKeys is discussed in Section 6 and Section 7 concludes the paper.
2 SMTP
The original SMTP protocol was developed in 1982 for the Arpanet, and was
designed as a very open protocol. Historically, this has been one of email's greatest
strengths – anyone can send anyone a message as long as they know that person’s email
address. However, due to the nefarious activities of spammers the usefulness of this
strength is being undermined. The protocol provides communication between a mail-user
agent (MUA) and a mail transfer agent (MTA), also known as a mail server, as well as
communication between an MTA and another MTA. Due to the fairly simple nature of
the protocol, users can also telnet on port 25 (the reserved port for SMTP) to an MTA and
send mail directly. Several MTAs can also be involved in mail delivery, which is known
as mail relaying. The general pattern to send mail to the final destination MTA is as
follows: MUA→MTA→MTA, or MUA→MTA→MTA→…→MTA if relaying is
used [1].
Since one of the main goals of a spammer is to avoid being identified, and since
an ESMTP receiver is required to be able to accept mail from an SMTP sender to be fully
compliant with the standard, I will first present the minimal use of the protocol to send
mail. It is very unlikely a smart spammer would use more than the minimum required
unless it provided them with some advantage to avoid being identified. There are five
ordered commands that are needed to accomplish this, as shown in the table below [2].
Command   Parameters              Parameter meaning
HELO      domain                  Host name of sender-SMTP
MAIL      FROM: <reverse-path>    Path leading back to the originator of the message
RCPT      TO: <forward-path>      Path leading to the ultimate destination of the message
DATA
QUIT
Table 1 – Commands used by the sender-SMTP to send mail
HELO is used to synchronize the initial connection, and should be used to identify
the sender-SMTP to the receiver-SMTP. MAIL should be used to provide a path back to
the originator of the message. The message headers and the message body are both sent
between the DATA and QUIT commands. The minimum message headers required for a
valid message are shown in the table below, though only one of the To or Bcc headers is
required [3].
Header   Parameters   Parameter meaning
Date:    date-time    The date and time the message was sent
From:    mailbox      The mail address of the sender
To:      address      The mail address(es) of the recipient(s) of the message
Bcc:     address      Same as To:, but can be empty
Table 2 – Minimal message headers
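To make the minimal exchange concrete, the following Python sketch assembles the lines a sender-SMTP would transmit using only the commands in Table 1 and the headers in Table 2. The function name and its parameters are illustrative assumptions, not part of any standard library, and the sketch leaves out reply handling and the escaping of body lines that begin with a period.

```python
def minimal_smtp_session(helo_domain, reverse_path, forward_path,
                         date, from_mailbox, to_address, body):
    """Return, in order, the lines a sender-SMTP would transmit.

    Hypothetical helper for illustration only: it builds text and does
    not open a connection or read any server replies.
    """
    headers = [
        f"Date: {date}",
        f"From: {from_mailbox}",
        f"To: {to_address}",
    ]
    return [
        f"HELO {helo_domain}",
        f"MAIL FROM:<{reverse_path}>",
        f"RCPT TO:<{forward_path}>",
        "DATA",
        *headers,
        "",                  # a blank line separates headers from body
        *body.splitlines(),
        ".",                 # a line holding only a period ends the data
        "QUIT",
    ]


session = minimal_smtp_session(
    "ABC.ARPA", "JOE@ABC.ARPA", "SAM@GHI.ARPA",
    "27 Oct 81 15:01:01 PST", "JOE@ABC.ARPA", "SAM@GHI.ARPA",
    "Hello Sam",
)
print("\n".join(session))
```

Every argument passed to the function is chosen by the sender, and nothing in the exchange requires any of them to be truthful.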
When an MTA takes responsibility for delivering a message, it also adds a new
header at the top of the message. This header takes the form of Received: <stamp>, with
an example shown below.
Received: FROM ABC.ARPA BY XYZ.ARPA ; 22 OCT 81 09:23:59 PDT
This potentially provides information about who the sender-SMTP and the receiver-SMTP
were and what time the mail session took place. When the final receiving MTA
takes receipt of the message, it also adds an additional header at the top of the message in
the form of Return-Path: <reverse-path>. The reverse-path is the same reverse-path that
is given in the MAIL command for this session. An example is shown below.
Return-Path: <@ABC.ARPA:JOE@ABC.ARPA>
The problem lies in who provides the information in the initial mail session. In this
session, the parameters of both the HELO and MAIL commands, as well as all the initial
message headers, depend on the initial sender-SMTP. If the initial sender-SMTP
chooses to provide incorrect information in some or all of these areas, nothing
stops it from doing so. An example of the complete message headers illustrates
this [2, 3]:
Return-Path: <@DEF.ARPA, @ABC.ARPA:JOE@ABC.ARPA>
Received: from DEF.ARPA by GHI.ARPA ; 27 Oct 81 15:15:13 PST
Received: from ABC.ARPA by DEF.ARPA ; 27 Oct 81 15:01:59 PST
Date: 27 Oct 81 15:01:01 PST
From: JOE@ABC.ARPA
TO: SAM@GHI.ARPA
[Annotation in the original figure: Dependent on initial sender-SMTP]
If the true initial sender-SMTP is really sending the message on behalf of, say,
david@spammer.com, no information about this is present in the final mail
message headers. A spammer is able to mask his own identity in this fashion. The only
clues left to the spammer's identity are the body
of the message and the bogus information provided in the headers.
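The effect can be reproduced with a small simulation. The Python sketch below is only an illustration under simplifying assumptions: the two helper functions are hypothetical, they model the original SMTP stamping described above, and the source-route syntax of the Return-Path is left out. Each receiving MTA builds its Received header from the HELO name the sender claimed, and the Return-Path is copied from the MAIL command.

```python
def stamp_received(receiver_host, claimed_helo, headers, timestamp):
    """Prepend the Received header an original-SMTP receiver would add.

    The "from" part is copied from the HELO parameter, which the sender
    chooses freely.
    """
    stamp = f"Received: from {claimed_helo} by {receiver_host} ; {timestamp}"
    return [stamp] + headers


def stamp_return_path(reverse_path, headers):
    """Prepend the Return-Path header added at final delivery.

    The value is copied from the MAIL command, also chosen by the sender.
    """
    return [f"Return-Path: <{reverse_path}>"] + headers


# The true origin. Note that no function above ever receives it.
true_sender = "david@spammer.com"

# Headers written by the spammer.
message = [
    "Date: 27 Oct 81 15:01:01 PST",
    "From: JOE@ABC.ARPA",
    "To: SAM@GHI.ARPA",
]

# Session 1: the spammer's host connects to DEF.ARPA and claims to be ABC.ARPA.
message = stamp_received("DEF.ARPA", "ABC.ARPA", message, "27 Oct 81 15:01:59 PST")
# Session 2: DEF.ARPA relays to GHI.ARPA and identifies itself truthfully.
message = stamp_received("GHI.ARPA", "DEF.ARPA", message, "27 Oct 81 15:15:13 PST")
# Final delivery at GHI.ARPA.
message = stamp_return_path("JOE@ABC.ARPA", message)

# Lines of the delivered message that mention the true origin.
leaked = [line for line in message if "spammer.com" in line]
print("\n".join(message))
print("lines revealing the true sender:", len(leaked))
```

The delivered headers have the same shape as the example above, and none of them mentions the true sender.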
3 ESMTP
As stated earlier, full compliance with the newer standard requires the ability to
interact with a sender-SMTP that uses the original protocol and to accept their mail for
delivery. So that the receiver-ESMTP can determine if a session has been established
with a sender-SMTP or a sender-ESMTP, the HELO mail command is replaced by the
command EHLO. For tracking purposes, there is one important addition, though it is not
absolutely required for compliance with the specification: the Received header may
include the IP address of the sender-SMTP taken from the TCP connection
and, optionally, the host name for that IP address if available, in addition to the information
provided by the sender-(E)SMTP in the EHLO (or HELO) command [4, 5]. The only reason
I can see for a legitimate MTA to omit this information is to provide some type of
anonymous email service, though such services give a spammer a perfect opportunity
to avoid being traced by IP address if the service provider is unscrupulous or
careless [6]. The example below shows what the new Received header can
look like.
Received: from ad.bogus.com (goofy.cs.wisc.edu [128.105.181.25]) by
obsidian.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id iBG8gXCs029856
ad.bogus.com: host name given in the EHLO command
goofy.cs.wisc.edu [128.105.181.25]: the real host name and IP
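As a rough illustration of how this extra information can be used, the Python sketch below pulls the claimed EHLO name and the connection-derived host name and IP address out of a Received header and compares them. The regular expression assumes the layout of the example above, "from <EHLO name> (<host name> [<IP>]) by <receiver>", which is not guaranteed for every MTA, and the function names are hypothetical.

```python
import re

# Matches only the layout used in the example header; other MTAs format
# this information differently.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+"
    r"\((?P<rdns>\S+)\s+\[(?P<ip>[0-9.]+)\]\)\s+"
    r"by\s+(?P<by>\S+)"
)


def parse_received(value):
    """Return the helo, rdns, ip and by fields as a dict, or None."""
    # Folded header lines are collapsed into one line before matching.
    match = RECEIVED_RE.search(" ".join(value.split()))
    return match.groupdict() if match else None


def helo_matches_connection(fields):
    """True when the claimed EHLO name equals the host name of the connection."""
    return fields["helo"].lower() == fields["rdns"].lower()


example = (
    "from ad.bogus.com (goofy.cs.wisc.edu [128.105.181.25]) by\n"
    " obsidian.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id iBG8gXCs029856"
)
fields = parse_received(example)
print(fields)
print("EHLO name matches connection:", helo_matches_connection(fields))
```

For the example header the comparison fails, since ad.bogus.com was only claimed in the EHLO command while goofy.cs.wisc.edu was taken from the TCP connection.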
Two aspects of the current specification are very interesting when considering their
relation to spamming. To be fully compliant, a receiver-ESMTP must not refuse to accept
a message if the domain name given in the EHLO command does not correspond to the
IP address of the client. Giving a false domain name here also happens to be one of the
most common tricks spammers use to try to avoid detection. In addition, full compliance
requires that a receiver-ESMTP that provides relay capabilities must not inspect the
message headers or body except to detect mailing loops [4]. One of the main
techniques spammers use is finding relays that will deliver their mail for them (known as
open relays), and the contents of a message could help such a relay decide to refuse
responsibility for delivering it. Using an open relay to send out spam
allows the spammers to trick the relay into performing the bulk of their work for
them.
Note: For the remainder of the paper I will use the term SMTP to refer to the enhanced
SMTP protocol. To specifically distinguish between the original and the extended SMTP
protocol the terms “original SMTP” and “extended SMTP” will be used instead.
4 Open Relays and Authenticated SMTP
An open relay is an SMTP server that will assume responsibility for delivering the
mail message when neither the sender-SMTP nor the final receiver-SMTP belongs to the
domain of the SMTP server. I am unaware of any specific justification for why an open
relay might be required for any domain. Stronger claims also exist that they are never
required, which I believe is most likely accurate [7]. The “best” justification for an open
relay I am aware of is to allow users to send mail through the mail server(s) in their
domain while they are outside the domain's network. However, Authenticated
SMTP (ASMTP) lets legitimate users relay through the mail server
without requiring an open relay. ASMTP adds the following command to SMTP.
Command   Parameters         Parameter meaning
AUTH      mechanism          A Simple Authentication and Security Layer (SASL)
                             authentication mechanism
          initial-response   An optional initial response for the sender-SMTP being
                             authenticated
Table 3 – The ASMTP AUTH command
An example of the command is shown below [8].
AUTH CRAM-MD5
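AUTH CRAM-MD5 starts a challenge-response exchange. The Python sketch below shows the client's side under the assumption that the mechanism behaves as defined in RFC 2195, which is not among the references of this paper: the server sends a base64-encoded challenge, and the client answers with its user name followed by an HMAC-MD5 digest of the challenge keyed with the shared secret. The function name and the credentials in the usage lines are hypothetical.

```python
import base64
import hashlib
import hmac


def cram_md5_response(username, secret, challenge_b64):
    """Build the base64 reply to a CRAM-MD5 challenge (per RFC 2195)."""
    challenge = base64.b64decode(challenge_b64)
    digest = hmac.new(secret.encode(), challenge, hashlib.md5).hexdigest()
    return base64.b64encode(f"{username} {digest}".encode()).decode()


# Hypothetical exchange: the challenge text would normally come from the server.
server_challenge = base64.b64encode(b"<12345.67890@mail.example.com>").decode()
reply = cram_md5_response("alice", "not-a-real-secret", server_challenge)
print(reply)
```

The secret itself never crosses the connection, so a relay can restrict relaying to users who can produce a valid reply.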
Open relays are used by spammers to offload to the relay the work involved in delivering
all of their messages. The spammer can use a sender-SMTP that connects to the
relay and sends a single message body with one million RCPT commands
containing one million email addresses. Once the session is terminated the spammer is
finished, and the relay is stuck trying to deliver one million pieces of spam. Spammers
also use relays in an attempt to keep their spam from being traced back to them [9]. If the spammer is
able to find an open relay that uses the original SMTP, or one using extended SMTP that
does not include the IP information from the TCP connection, there will be no
information in the message headers that can be used to track the spammer (assuming
forgery is present). Even without these types of open relays, spammers can attempt to
mask their identity by forging Received headers in their messages, making it look like the
spammer is simply an “innocent” open relay that relayed along the mail message from
another sender-SMTP.
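The difference between an open relay and a properly restricted one can be stated as a short decision rule. The Python sketch below is one possible policy under stated assumptions, not a description of any particular MTA: the function name, the network 128.105.0.0/16, and the domain list are all hypothetical values chosen for illustration.

```python
import ipaddress


def should_relay(client_ip, recipient_domain, local_networks, local_domains,
                 authenticated=False):
    """Decide whether to accept responsibility for delivering a message.

    A message is accepted when the sender has authenticated (ASMTP), when
    the recipient belongs to one of our domains, or when the client is
    connecting from one of our networks. An open relay is a server that
    effectively answers True for everyone.
    """
    if authenticated:
        return True
    if recipient_domain.lower() in local_domains:
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in local_networks)


# Hypothetical configuration for illustration.
NETWORKS = ["128.105.0.0/16"]
DOMAINS = {"cs.wisc.edu"}

# Outside client, outside recipient: this is the open-relay case.
print(should_relay("68.78.232.93", "yahoo.com", NETWORKS, DOMAINS))
# The same client after authenticating with ASMTP.
print(should_relay("68.78.232.93", "yahoo.com", NETWORKS, DOMAINS, authenticated=True))
```

With such a rule, a roaming user is served through authentication rather than by opening the relay to every sender.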
Given that open relays are so useful to spammers, it is interesting to look at some
statistics on them. The Open Relay Database is a non-profit organization that stores a
database of IP addresses belonging to mail servers that have been verified as open relays.
Visitors to the site submit the host name or the IP address of the mail server(s) they wish
to be checked. If the hosts are determined to be open relays by the organization then they
are added to the database. While their verification process is not published on the site,
they suggest setting up a mail server and submitting it to them for testing to determine
what specific tests they are currently using. After the organization has tested the server,
the server logs can be examined for this information [7].
Country                      Number of open relays
United States                82981
China                        25774
Republic of Korea            16421
Japan                        9921
Taiwan, Province of China    8468
Table 4 – Top five countries with the most open relays in the Open Relay Database,
based on data from countries.nerd.dk : http://www.ordb.org/statistics/countries
Figure 1 – The number of open relays present in the Open Relay Database
As Table 4 and Figure 1 show, there are still plenty of open relays on the Internet for
spammers to use, and these numbers most likely do not account for every open relay
present on the Internet.
5 Analysis
Common tricks spammers use to try to avoid being identified involve using a
bogus email address or someone else’s legitimate email address in the From header and
MAIL command, faking the host name in the EHLO command, and forging Received
headers. A proper sequence of Received headers should take the form:
1. Received: from y by z (the y used is the host name as determined by the IP
address of the TCP connection if present)
2. Received: from x by y
If the two values for y do not match, this represents a “broken chain” and most likely
indicates that the Received header in line two is forged [14, 15, 16].
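The chain test above is simple enough to sketch in Python. The function below is a hypothetical helper that assumes each Received header has already been reduced to a (from, by) pair of host names, listed in the order they appear in the message, newest first.

```python
def find_broken_link(hops):
    """Return the index of the first header that breaks the chain, or None.

    hops is a list of (from_host, by_host) pairs, newest header first.
    Header i is consistent when its "by" host equals the "from" host of
    the header directly above it.
    """
    for i in range(1, len(hops)):
        newer_from = hops[i - 1][0].lower()
        older_by = hops[i][1].lower()
        if newer_from != older_by:
            return i
    return None


intact = [("y", "z"), ("x", "y")]
broken = [("y", "z"), ("x", "w")]
print(find_broken_link(intact))   # None: the chain is consistent
print(find_broken_link(broken))   # 1: the second header is suspect
```

An intact chain does not prove that the headers are genuine; the check only catches forgeries that fail to line up with the headers above them.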
Abuse and eMailTrackerPro are two software tools that have been created to
automate the procedure of header analysis to determine the IP address of the spammer in
a mail message. Abuse is an open source project, while eMailTrackerPro is a commercial
product sold by VisualWare [10, 11]. Due to the possible presence of various types of
forgery, an investigation was performed to judge how accurate we might be able to
expect the software analysis to be in determining the true IP address of the spammer.
Fifty pieces of spam were taken from three email accounts for each software package to
analyze, and their results were compared to the IP address I determined belonged to the
spammer.
Figure 2 – The results of the header analysis
Abuse performed very well in the analysis, agreeing with my determination for
all of the messages. eMailTrackerPro performed well on messages that did not contain
forged Received headers. These forged headers gave the software some
problems: while it would recognize that some type of misdirection was taking place, it
would still continue down broken or invalid Received header chains to pick its choice for
the spammer's IP address. This led to some interesting choices, such as the spam
originating from the Internet Corporation for Assigned Names and Numbers (ICANN). I
believe it is fairly safe to assume that the spam did not originate from their domain.
I have been able to determine one type of forgery that fools Abuse into giving an
incorrect IP address for the spammer. The following SMTP session was initiated from the
host goofy.cs.wisc.edu:
telnet mta104.mail.sc5.yahoo.com 25
EHLO goofy.cs.wisc.edu
MAIL FROM:< djones@hotmail.com>
RCPT TO: xhockeyguy11@yahoo.com
DATA
Received: from 68.78.232.93 (adsl-68-78-232-93.dsl.mdsnwi.ameritech.net
[68.78.232.93]) by goofy.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id
iHF33VNXH Wed, 15 Dec 2004 02:10:30 -0600
Date: Wed, 15 Dec 2004 02:05:24 -0600
From: Dr. Jones <djones@hotmail.com>
To: xhockeyguy11@yahoo.com
Subject: Get rich quick
Make money now!
.
QUIT
In this session the Received header supplied after DATA is forged (with valid host
names and IP addresses for the hosts).
The following mail message was then received:
From Dr. Jones Wed Dec 15 00:05:24 2004
X-Apparently-To: xhockeyguy11@yahoo.com via 206.190.39.91; Mon, 20 Dec 2004 20:35:45 -0800
Authentication-Results: mta104.mail.sc5.yahoo.com from=hotmail.com;
    domainkeys=neutral (no sig)
X-Originating-IP: [128.105.181.25]
Return-Path: <djones@hotmail.com>
Received: from 128.105.181.25 (EHLO goofy.cs.wisc.edu) (128.105.181.25)
    by mta104.mail.sc5.yahoo.com with SMTP; Mon, 20 Dec 2004 20:35:45 -0800
Received: from 68.78.232.93 (adsl-68-78-232-93.dsl.mdsnwi.ameritech.net
    [68.78.232.93]) by goofy.cs.wisc.edu (8.13.1/8.13.1) with ESMTP id
    iHF33VNXH Wed, 15 Dec 2004 02:10:30 -0600
Date: Wed, 15 Dec 2004 02:05:24 -0600
From: "Dr. Jones" <djones@hotmail.com>
To: xhockeyguy11@yahoo.com
Subject: Get rich quick
Content-Length: 16
Make money now!
When the headers were analyzed by Abuse, the IP address of the spammer was
determined to be 68.78.232.93. Even by manual analysis it is difficult to detect the
forgery unless you try to initiate an SMTP session with goofy.cs.wisc.edu. To avoid even
this possible method of detection, the spammer could set up an open relay on the host.
The two cases, spam initially sent from 68.78.232.93 and spam initially sent from
goofy.cs.wisc.edu with a forged Received header, would then be indistinguishable from
the headers.
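The following Python sketch contrasts two ways of reading the received message. It makes no claim about how Abuse or eMailTrackerPro work internally; both functions are hypothetical. The first follows the chain to its oldest header, and the second stops at the last header stamped by an MTA the recipient has reason to trust, here assumed to be only the Yahoo server that took delivery.

```python
def naive_origin(hops):
    """Report the connection IP recorded in the oldest Received header."""
    return hops[-1]["ip"]


def trusted_origin(hops, trusted_mtas):
    """Report the connection IP from the oldest header in the unbroken run
    of headers, counted from the top, that were stamped by trusted MTAs."""
    origin = None
    for hop in hops:
        if hop["by"] not in trusted_mtas:
            break
        origin = hop["ip"]
    return origin


# Received headers of the message above, newest first.
hops = [
    {"by": "mta104.mail.sc5.yahoo.com", "ip": "128.105.181.25"},
    {"by": "goofy.cs.wisc.edu", "ip": "68.78.232.93"},
]

print(naive_origin(hops))
print(trusted_origin(hops, {"mta104.mail.sc5.yahoo.com"}))
```

The first reading names 68.78.232.93 and the second names 128.105.181.25. The second reading is only as good as the trust list: had goofy.cs.wisc.edu been a genuine open relay, it would name the relay rather than the spammer, which is the ambiguity described above.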
6 DomainKeys
DomainKeys is a current proposal that has been submitted to the Internet
Engineering Task Force (IETF) for consideration. It is based on public/private key
cryptography and digital signatures. When a user that is authorized to send messages
from a domain submits a message for delivery to an MTA in the domain the message
headers and body will be signed by the MTA. The signature, and the information needed
to verify the signature is added at the top of the message in a new DomainKey-signature
header by the MTA. When an MTA in another domain receives the message it can then
retrieve the correct public key for the domain listed in the from header and verify the
authenticity of the message. The public key information is stored in the domains DNS
record as well as policy information about how the domain both uses DomainKeys and
what should be done if a message is received claiming to be from the domain without a
signature or the verification of a signature
fails. DomainKeys does not require that all message headers be included as part of the
signature and verification process, but the From message header must be [12, 17].
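The sign-on-send, verify-on-receive flow can be sketched in a few lines of Python. This is a toy under loudly stated assumptions and not the DomainKeys format: it uses textbook RSA with a deliberately tiny key built from two known Mersenne primes, hashes only the From header and body with SHA-1, replaces the DNS lookup with a dictionary, and ignores canonicalization, selectors, and the published policy. All names are hypothetical, and the code requires Python 3.8 or later.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the sending MTA

# Stand-in for DNS: maps a sending domain to its published public key.
PUBLIC_KEYS = {"example.com": (N, E)}


def _digest(from_header, body, n):
    """Hash the signed parts of the message and reduce the result modulo n."""
    data = f"From: {from_header}\r\n\r\n{body}".encode()
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % n


def sign(from_header, body):
    """Sending MTA: sign the From header and body with the private key."""
    return pow(_digest(from_header, body, N), D, N)


def verify(from_header, body, signature):
    """Receiving MTA: look up the key for the From domain and check the signature."""
    domain = from_header.rsplit("@", 1)[-1].rstrip(">").lower()
    key = PUBLIC_KEYS.get(domain)
    if key is None:
        return False
    n, e = key
    return pow(signature, e, n) == _digest(from_header, body, n)


signature = sign("joe@example.com", "Quarterly report attached.")
print(verify("joe@example.com", "Quarterly report attached.", signature))
print(verify("joe@example.com", "Send me your bank details.", signature))
```

A message whose body or From header was altered after signing, or whose From domain publishes no key, fails verification in this sketch.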
7 Conclusion
Due to the abuse of the openness of the current mail system by certain elements of
society, increased authentication and integrity checks need to be added to provide
stronger guarantees against forgery and misrepresentation. This needs to be done both to
make it easier to track the criminal elements and to make it more difficult for them
to engage in criminal activities, such as conning naïve users into providing financial
information to their “bank”. DomainKeys seems to provide a reasonably effective
solution towards that end, without requiring an overly massive restructuring of the
existing mail system. I recommend that it eventually be adopted as a standard by the IETF
to help address the current problems faced by the mail system.
References
[1] – Wood, David. Programming Internet Email. O’Reilly, August 1999
[2] – Postel, Jonathan. Simple Mail Transfer Protocol. RFC 821, August 1982
[3] – Crocker, David. Standard For The Format Of Arpa Internet Text Messages.
RFC 822, August 1982
[4] – Klensin, J. Simple Mail Transfer Protocol. RFC 2821, April 2001
[5] – Resnick, P. Internet Message Format. RFC 2822, April 2001
[6] – Anonymous Email. http://www.advicebox.com/
[7] – The Open Relay Database. http://www.ordb.org/
[8] – Myers, J. SMTP Service Extension for Authentication. RFC 2554, March 1999
[9] – Lindberg, G. Anti-Spam Recommendations for SMTP MTAs. RFC 2505, February
1999
[10] – Visualware. eMailTrackerPro. http://download.visualware.com/#emailtrackerpro
[11] – Abuse. http://spam-abuse.sourceforge.net/
[12] – Delany, Mark. Domain-based Email Authentication Using Public-Keys Advertised
in the DNS (DomainKeys). Internet Draft, August 2004
[13] – Atkins, Steve. Size and cost of the problem.
http://www.ietf.org/proceedings/03mar/slides/asrg-1/index.html
[14] – Mattocks, Bill. Email Spam Tracking.
http://www.mailsbroadcast.com/email.bolts&nuts/101.email.spam.tracking.htm
[15] – How to analyze spam. http://www.rickconner.net/spamweb/analysis.html
[16] – Tracking email. http://www.phaster.com/news_articles/tracking_email.html
[17] – DomainKeys Overview. http://antispam.yahoo.com/domainkeys#a3