Andrew System E-mail Architecture at Carnegie Mellon University Rob Siemborski

rjs3@andrew.cmu.edu
Last Revision: 01/27/2004 (wcw)
Walter Wong
wcw@cmu.edu
Computing Services
Carnegie Mellon University
5000 Forbes Ave
Pittsburgh, PA 15213

Presentation Overview

History & Goals
The Big Picture
Mail Transfer Agents
Mail Processing (Spam & Virus Detection)
The Directory
The Cyrus IMAP Aggregator
Clients and Andrew Webmail
Current Andrew Hardware Configuration
Future Directions

The Early Years

Early 80s – The Andrew Project
  Campus-wide computing
  Joint IBM/CMU Venture
  One of the first large-scale distributed systems, challenging the ‘mainframe’ mentality
  The Andrew File System (AFS)
  The Andrew Message System (AMS)

Goals of the Andrew Message System

Reliability
Machine and Location Independence
Integrated Message Database
Personal Mail and Bulletin Boards
Separation of Interface from Functionality
Support for Multi-Media
Scalability
Easy to Extend, Easy to Use

End of AMS

AMS was a nonstandard system
Avoid becoming a “technology island”
Desire to not maintain our own clients
AMS was showing scalability problems
Desire to decouple the file system from the mail system

Project Cyrus Goals

Scalable to tens of thousands of users
Support wide use of bulletin boards
Use widely accepted standards-based technologies
Comprehensive client support on all major platforms
Supports a disconnected mode of operation for the mobile user

Project Cyrus Goals (2)

Supports Kerberos authentication
Allows for easy sharing of private folders with select individuals
Separation of the mail store from a distributed file system
Can be independently installed, managed and set up for use in small departmental computing facilities

More CMU Mail System Goals

Allow users to have a single @cmu.edu address no matter where their actual mail store is located
  “CMUName” Service
Ability to detect and act on incoming Spam and Virus Messages
Provide access to mail over the Web
Integration of messaging into the overall Computing Experience

The Big Picture

[Diagram: The Internet; Users / Mail Clients; LDAP Directory Servers; Mail Transfer Agents (Three Pools); Cyrus IMAP Aggregator]

Mail Transfer Agents

Andrew has 3 Pools of Mail Transfer Agent (MTA) Machines
  Mail exchangers (MX Servers) receive and handle mail from the outside world for the ANDREW.CMU.EDU domain.
  The “SMTP Servers” process user-submitted messages (SMTP.ANDREW.CMU.EDU)
  Mail exchangers for the CMU.EDU domain (the CMU.EDU MXs)
All Andrew MTAs run Sendmail

Mail Transfer Agents (2)

Why 3 Pools?
MX Servers
  Subject to the ebb and flow of the outside world
  Significant CPU-intensive processing
  Typically handle much larger queues (7,000+ messages each)
SMTP Servers
  Speak directly to our clients
  Need to be very responsive
  Very small queues (200 messages each)

Mail Transfer Agents (3)

CMU.EDU MXs
  Service separation from Andrew MX servers
    Mostly just forwarding
    No real need to duplicate processing done on Andrew MX servers
All Three Pools are Redundant
  Minimize impact of a machine failure

Mail Transfer Agents (4)

Separate MTA pools give significant control over incoming email.
A message may touch multiple pools
Example:
  User submits message to foo@CMU.EDU via SMTP servers
  Message processed by CMU.EDU MX, bound for foo@ANDREW.CMU.EDU
  Message processed by ANDREW MX
  Final Delivery To Cyrus Aggregator
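
The example path above can be sketched in a few lines of Python. This is only an illustration of the hop order on the slide, not the actual Sendmail configuration: the function name `trace_pools` is hypothetical, and it assumes (as in the slide's example) that a CMU.EDU address is forwarded to the same local part at ANDREW.CMU.EDU.

```python
def trace_pools(recipient):
    """Return the ordered pools a user-submitted message passes through.

    Assumption (from the slide example only): mail for CMU.EDU is
    forwarded by the CMU.EDU MXs to the same local part at ANDREW.CMU.EDU.
    """
    hops = ["SMTP servers"]  # user submission always enters here
    local, domain = recipient.rsplit("@", 1)
    domain = domain.upper()

    if domain == "CMU.EDU":
        hops.append("CMU.EDU MX")  # mostly just forwarding
        domain = "ANDREW.CMU.EDU"  # assumed forwarding target

    if domain == "ANDREW.CMU.EDU":
        hops.append("ANDREW MX")         # heavier processing happens here
        hops.append("Cyrus Aggregator")  # final delivery

    return hops


print(trace_pools("foo@CMU.EDU"))
```

A message sent straight to foo@ANDREW.CMU.EDU skips the CMU.EDU MX hop in this sketch.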

Mail Processing

All mail through the system is “processed” to some degree.
  Audit Logging
  Cleaning badly-formed messages
  Blocking restricted sender/recipients/relays
More substantial processing done by Andrew MX Servers

Mail Processing (2)

Spam Detection
  Uses Heuristic Algorithms to identify Spam Messages (SpamAssassin)
  Tags message with a header and score
  User initiated filters (SIEVE) can detect the header and act upon it (bounce the message or file it into an alternate folder)
  Very computationally expensive on MX
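
The tag-then-filter idea can be illustrated with a small Python sketch. It stands in for a user's SIEVE rule rather than reproducing one. The header name `X-Spam-Status`, the `hits=`/`score=` field, the thresholds, and the "Junk" folder name are all assumptions, not details taken from the Andrew configuration.

```python
import re
from email import message_from_string

# Assumed header format, e.g. "X-Spam-Status: Yes, hits=7.2 required=5.0"
_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_action(raw_message, file_at=5.0, reject_at=15.0):
    """Decide what a user filter might do with a tagged message.

    Thresholds and the folder name are hypothetical.
    """
    msg = message_from_string(raw_message)
    match = _SCORE.search(msg.get("X-Spam-Status", ""))
    if match is None:
        return "keep"  # message was not tagged
    score = float(match.group(1))
    if score >= reject_at:
        return "reject"
    if score >= file_at:
        return "fileinto:Junk"
    return "keep"


sample = "X-Spam-Status: Yes, hits=7.2 required=5.0\nSubject: hi\n\nbody\n"
print(spam_action(sample))
```

The expensive scoring happens once on the MX; the per-user decision afterwards is only a cheap header check.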

Mail Processing (3)

Virus Detection
  Uses signatures to match virus messages (ClamAV)
  “Bounce” message immediately at the incoming RCPT
  Debate between bounce vs. tag

The Directory

Mail delivery and routing is assisted by an LDAP-accessible database.
Every valid destination address has an LDAP entity
LDAP lookups can do “fuzzy matching”
LDAP queries done against replicated pool

The Directory (2)

Every account has a mailRoutingAddress (mRA): the “next hop” of the delivery process
  mRA is not generally user configurable
Some accounts have a user-configurable mailForwardingAddress (mFA)
  mFA will override the mRA
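
The override rule can be written as a short Python sketch. The directory entry is modelled as a plain dict keyed by attribute name, which is an assumption made for illustration. The function name `next_hop` and the sample addresses are hypothetical, and no real LDAP query is performed.

```python
def next_hop(entry):
    """Pick the next delivery hop for one directory entry.

    `entry` is a dict standing in for an LDAP entry (an assumption).
    A non-empty mailForwardingAddress overrides mailRoutingAddress.
    """
    forwarding = entry.get("mailForwardingAddress")
    if forwarding:
        return forwarding
    return entry["mailRoutingAddress"]


# Hypothetical entries, for illustration only.
plain = {"mailRoutingAddress": "foo@backend.example.edu"}
forwarded = {
    "mailRoutingAddress": "bar@backend.example.edu",
    "mailForwardingAddress": "bar@elsewhere.example.org",
}
print(next_hop(plain))
print(next_hop(forwarded))
```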

The Cyrus IMAP Aggregator

The IMAP Protocol

Standard Protocol developed by the IETF
Messages Remain on Server
MIME (Multipurpose Internet Mail Extensions) Aware
Support for Disconnected Operation
AMS-Like Features (ACLs, Quota, etc.)

The Cyrus IMAP Server

CMU Developed IMAP/POP Server
Released to public and maintained as active Open Source project under BSD-like License
No available servers implemented all of the features needed to replace AMS.
Designed to be a “Black Box” server
Performance and Scalability were key to Design

Initial Cyrus IMAP Deployment

Single monolithic server (1994-2002)
Originally deployed alongside AMS
Features were implemented incrementally
Users were transitioned incrementally
Local users provided a good testing pool
Scaled surprisingly well

Cyrus IMAP Aggregator Design

IMAP not well suited to clustering
  No real concept of mailbox “location”
  Clients expect consistent views of the server and its mailboxes
  Significantly varying client implementation quality
Aggregator was designed to make many machines look like one so any user can share a folder with any other user

Cyrus IMAP Aggregator Design (2)

Three Participating Types of Servers
  IMAP Frontends (“dataless” Proxies)
  IMAP Backends (“Normal” IMAP Servers; your data here)
  MUPDATE (Mailbox Database)
[Diagram: Users / Mail Clients; Frontends proxy requests for clients; Backends hold traditional mailbox data; MUPDATE Server maintains list]

IMAP Frontends

Fully redundant
All are identical
Maintain local replica of mailbox list
Proxies most requests, querying backends as needed
May also send IMAP referrals to capable clients
[Diagram: Users / Mail Clients; Frontends proxy requests for clients; Backends hold traditional mailbox data; MUPDATE Server propagates mailbox list changes to frontends]
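
A frontend's decision can be sketched roughly as below. The local mailbox-list replica is modelled as a dict from mailbox name to backend host, which is an assumption; the function name, host names, and the simplified referral URL are hypothetical. This is not the Cyrus proxy code.

```python
def handle_request(mailbox_replica, mailbox, supports_referrals=False):
    """Decide how a frontend serves a request for `mailbox`.

    `mailbox_replica` is the frontend's local copy of the mailbox list,
    modelled here as {mailbox name: backend host} (an assumption).
    """
    backend = mailbox_replica.get(mailbox)
    if backend is None:
        return ("NO", "mailbox does not exist")
    if supports_referrals:
        # Capable clients can be pointed straight at the backend.
        return ("REFERRAL", f"imap://{backend}/{mailbox}")
    # Otherwise the frontend proxies the request to the backend.
    return ("PROXY", backend)


replica = {"user.alice": "backend1.example.edu"}  # hypothetical data
print(handle_request(replica, "user.alice"))
print(handle_request(replica, "user.alice", supports_referrals=True))
```

Because the lookup only touches the local replica, every frontend can answer for every mailbox, which is what allows the frontends to be identical and redundant.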

IMAP Backends

Basically Normal IMAP Servers
Mailbox Operations are approved & recorded by MUPDATE server
  Create / Delete
  Rename
  ACL Changes
[Diagram: Users / Mail Clients; Requests are proxied by Frontends; Backends hold traditional mailbox data; MUPDATE Server approves mailbox operations]

MUPDATE Server

Specialized Location Server (similar to VLDB in AFS)
Provides guarantees about replica consistency
Simpler than maintaining database consistency between all the frontends
[Diagram: Users / Mail Clients; Frontends update local mailbox list replicas; Backends send mailbox list updates; MUPDATE Server approves and replicates updates]
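
The approve-then-replicate flow can be modelled with a toy Python class. It does not implement the MUPDATE wire protocol; the class and method names are hypothetical, and replicas are plain dicts. It only shows why a single authority makes replica consistency simple: each change is checked in one place and then pushed to every frontend.

```python
class MupdateMaster:
    """Toy model of a central mailbox-list authority (not the real protocol)."""

    def __init__(self):
        self.mailboxes = {}  # authoritative list: mailbox -> backend
        self.replicas = []   # frontends' local copies (plain dicts)

    def subscribe(self, replica):
        """Bring a frontend's replica up to date and keep it updated."""
        replica.clear()
        replica.update(self.mailboxes)
        self.replicas.append(replica)

    def create(self, mailbox, backend):
        """Approve a backend's CREATE only if the name is unused."""
        if mailbox in self.mailboxes:
            return False  # refused: the name already exists somewhere
        self.mailboxes[mailbox] = backend
        for replica in self.replicas:
            replica[mailbox] = backend
        return True

    def delete(self, mailbox):
        """Approve a DELETE and remove the entry from every replica."""
        if mailbox not in self.mailboxes:
            return False
        del self.mailboxes[mailbox]
        for replica in self.replicas:
            replica.pop(mailbox, None)
        return True


master = MupdateMaster()
frontend_a, frontend_b = {}, {}
master.subscribe(frontend_a)
master.subscribe(frontend_b)
master.create("user.alice", "backend1")
print(frontend_a == frontend_b)
```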

Cyrus Aggregator: Data Usage

User INBOXes and sub folders
Users can share their folders
Internet mailing lists as public folders
Netnews Newsgroups as public folders
Public folders for “workflow”, general discussion, etc.
Continued “bboard” paradigm: 30,000+ folders visible

Cyrus IMAP Aggregator: Advantages

Horizontal Scalability
  Adding new capacity to frontend and/or backend is easy to do and can be done with no user visible downtime
Management possible through single IMAP client session
Wide client interoperability
Simple Client configuration
Ability to (mostly) transparently move users from one backend to another
Failures are partitioned

Cyrus IMAP Aggregator: Limitations

Backends are NOT redundant
MUPDATE is a single point of failure
  Failure only results in error when trying to CREATE/DELETE/RENAME or change ACLs on mailboxes

Cyrus IMAP Aggregator: Backups

Disk partition backup via Kerberized Amanda (http://www.amanda.org)
Restores are manual
21 day rotation – no archival
Backup to disk (no tapes)

Cyrus IMAP Aggregator: Other Protocol Support

POP3 support for completeness
  Possibly creates more problems than not (where did my INBOX go?)
NNTP to populate bboards
NNTP access to mail store
LMTP w/AUTH for mail transport from MTA to backends

Clients

IMAP has many publicly available clients
  Varying quality
  Varying feature sets
Central computing recommends Mulberry
  Roaming Profiles via IMSP
  Many IMAP extensions supported (e.g. ACL)
  UI not as popular

Clients - Webmail

Use SquirrelMail as a Webmail Client
Local Modifications
  Interaction with WebISO (pubcookie) Authentication
  Kerberos Authentication to Cyrus
  Local proxy (using imtest) to reduce connection load on server
  Preferences and session information shared via AFS (simple, non-ideal)

Clients – Mailing Lists

+dist+ for “personal” mailing lists
  +dist+~user/foo.dl@andrew.cmu.edu
Majordomo for “Internet-style” mailing lists
Prototype web interface for accessing bboards
  Authenticated (for protected bboards)
    http://bboard.andrew.cmu.edu/bb/org.acs.asg.coverage
  Unauthenticated (for mailing list archives)
    http://asg.web.cmu.edu/bb/archive.info-cyrus

Andrew Mail Statistics

Approximately 30,000 Users
12,000+ Peak Concurrent IMAP Sessions
8+ IMAP Connections / Second
650 Peak Concurrent Webmail Sessions
Approximately 1.5 Million Emails/week

See Also: http://graphs.andrew.cmu.edu
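
For a sense of scale, the figures above can be turned into rough averages. The arithmetic below uses only the numbers on this slide; the variable names are illustrative, and the results are averages, so real peak rates will be higher.

```python
users = 30_000
emails_per_week = 1_500_000
peak_imap_sessions = 12_000

seconds_per_week = 7 * 24 * 60 * 60  # 604,800 seconds

avg_msgs_per_second = emails_per_week / seconds_per_week
emails_per_user_per_week = emails_per_week / users
peak_sessions_per_user = peak_imap_sessions / users

print(round(avg_msgs_per_second, 2))  # 2.48 messages per second on average
print(emails_per_user_per_week)       # 50.0 messages per user per week
print(peak_sessions_per_user)         # 0.4 peak IMAP sessions per user
```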





Andrew Hardware

5 frontends



5 backends



Dell 2650 (Pentium 4 3ghz; 2 GB memory; PERC3 RAID1 2x73GB 15,000rpm disks)
1 mailing list


Dell 2650 (Pentium 4 3ghz; 2 GB memory; PERC3 RAID1 2x73GB 15,000rpm disks)
2 CMU.EDU MX


Dell 2650 (Pentium 4 3ghz; 2 GB memory; PERC3 RAID1 2x73GB 15,000rpm disks)
3 SMTP.ANDREW.CMU.EDU


Dell 2450 (Pentium III 733 MHz; 1 GB memory; PERC3 RAID5 4x36GB 10000RPM disks)
3 ANDREW.CMU.EDU MX


4 Sun 220R (450mhz UltraSparc II; 2GB memory; JetStor II-LVD RAID5 8x36 GB 15000 RPM disks)
1 SunFire 280R (2x1ghz UltraSparc III; 4GB memory; JetStor III U160 RAID5 8x73 GB 15000 RPM disks)
1 mupdate


3 Sun Ultra 80s (2x450mhz UltraSparc II; 2 GB memory; Internal 10000 RPM disk)
2 SunFire 280Rs (2x1ghz UltraSparc III; 4 GB memory; Internal 10000 RPM disk)
Dell 2650 (Pentium 4 2.8ghz; 1 GB memory; PERC3 RAID1 2x73GB 15,000rpm disks)
3 webmail

Dell Optiplex GX 260 small form factor (Pentium 4 2.4Ghz; 1GB memory; 80GB ATA disk)

Current Issues

Lack of client support for ‘check new’ for IMAP folders (even when client supports NNTP)
Large number of visible folders can be problematic for clients (e.g. PocketInbox)

Potential Future Work

Online/Self-Service Restores (e.g. AFS “OldFiles”, delayed EXPUNGE)
Virtual “Search” Folders
Fault tolerance
  Replicate backends
  Support multiple MUPDATE servers
Multi-Access Messaging Hub
  One Mail Store, many APIs
  IMAP, POP, NNTP, HTTP/DAV/RSS, XML/SOAP
  Web Bulletin Boards / blog interface
  Remove Shared Folder / Mailing List Distinction

Current Software

MTA: Sendmail 8.12.10
LDAP: OpenLDAP 2.0
Cyrus: 2.2.3
MIMEDefang: 2.28
SpamAssassin: 2.61
ClamAV: 0.63
Squirrelmail: 1.4.2 (w/Local Modifications)