History and Evolution of Electronic Mail (and a bit of a tutorial)
John C Klensin, Ph.D.
APEC, 2014-10-30

About Internet History – A Disclaimer
• Early period (~1965 to ~1985)
  – Many parallel developments
  – Extensive collaboration and idea-sharing
• Recent period
  – Internet has become important
  – Many claims of individual invention
• I will tell the story I know:
  – It is not the only story; others may be equally accurate

More Warnings
• Everything is connected to everything else
• Many places where this talk says (another talk)
• Any time you have a spare couple of weeks…
• Going to say some controversial things
• Welcome questions and arguments (mostly tomorrow)

Before the Beginning: Messages to the Computer Operator
• Probably goes back to handwritten notes with job submissions
• Some batch job control options
  – For example, device mount instructions
• Similar user → operator messages in early time-sharing systems
• Typically one-way only!

The CTSS Insight
• MIT’s Compatible Time-Sharing System
  – Often recognized as the beginning of interactive, multiple concurrent user, computing
• Two features of many
  – Messages to operators
  – Interprocess signaling between users
• Why not permit users to send messages to each other and notify on arrival? (van Vleck and Morris, 1965-1966)

Parallel and Slightly Later Developments
• DTSS
• Sigma-7
• MTS
• Multics
• TENEX ?
• CompuServe
• All multiple-user, single machines until
  – MIT cloned CTSS and ran two separate systems with tape transfer of data… and messages
  – 6-12 hour turnaround, plus or minus

From the Beginning
• Postal mail model
  – Envelope and content
  – Origination, transport, and delivery systems
• Terminology changed
  – Mail, electronic mail, net mail, email
  – MUA, MTA, MSA, MDA
• Even regulatory concerns

Then the ARPANET Happened
• Original usage model involved resource sharing
  – First two important application protocols were remote login (“telnet”) and file transfer (“FTP”)
  – FTP very soon acquired a “mail” verb and conventions
  – “netmail” and “user@host”
• FTP was recognized as not really a good model
• ITU OSI work, including X.400, started

Internet Mail Redesign 1
• Large community effort
• Mail transport separated from FTP
• Separation of envelope and headers
  – Detailed specification of headers
  – Detailed specification of envelope and transport model
• DNS-based and explicit models for dealing with relays and intermittently-connected hosts
• ARPANET/Internet still very restricted use
• Deployed 1981-1982, DNS mostly later

Alternative Mail Systems
• Mail over UUCP
• Development of BITNET/EARN/NetNorth and mail; also JANET, etc.
• FidoNet
• Many private/proprietary mail system developments… Just in the US:
  – ccMail
  – Notes
  – AOL
  – MSMail
  – CompuServe
  – Delphi
  – MCIMail
  – MS Exchange (later)
• ITU/ISO X.400 / MHS

A World of Gateways
• People wanting to communicate no matter which mail system they were using
• “Gateways” for translation
  – Had to be built one pair at a time
  – Different information models
  – Never perfect
  – Information often got lost, messages sometimes
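
The cost of building gateways "one pair at a time" can be made concrete with a little arithmetic. The sketch below is illustrative only. It assumes every pair of mail systems needs its own gateway, and compares that with each system keeping a single gateway to one shared format. The function names and the sample system counts are hypothetical, not figures from the talk.

```python
def pairwise_gateways(n_systems: int) -> int:
    """Gateways needed when every pair of systems gets its own translator."""
    return n_systems * (n_systems - 1) // 2


def shared_format_gateways(n_systems: int) -> int:
    """Gateways needed when each system translates to one shared format.

    Assumption: the shared format is not itself counted as a system.
    """
    return n_systems


if __name__ == "__main__":
    # Sample sizes are arbitrary and chosen only to show the growth rate.
    for n in (3, 5, 10, 20):
        print(f"{n:>2} systems: {pairwise_gateways(n):>3} pairwise, "
              f"{shared_format_gateways(n):>2} via a shared format")
```

Under these assumptions the pairwise count grows quadratically while the shared-format count grows linearly, which is one way to see why a common denominator becomes attractive as the number of mail systems rises.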
SMTP as Common Denominator
• Since the early 1990s, mail exchange among other systems
  – Primarily went through Internet- (and SMTP-) capable gateways
  – Many-one rather than many-many conversions
• SMTP became the model for envelopes in many other systems
• Headers:
  – Internet Mail Header Format (RFC 822) for many
  – X.400 for several more
  – Completely proprietary for a few

It Just Works (and the robustness principle)
• SMTP Design
  – Very simple command structure
  – Rules against guessing and transforming midway
  – Can deliver almost anything; sort out at destination
  – Notification of non-delivery
• Headers
  – ASCII “name: value” fields
  – Few requirements; recipients generally ignore what they do not understand
• Robustness: Senders expected to be careful, receivers liberal
• All worked well until anti-spam came along (another talk)

Why Internationalize?
• People prefer to communicate in their own languages (obvious, and always has been)
• Use of “foreign” languages and scripts can be hard
• Support for localization
  – Very few people really care about “i18n”
  – Without it as foundation, chaos or isolation

Going Multilingual and Multimedia
• IETF effort started ~1990 to standardize coding and identification for non-Latin script content
  – Not the first use of those scripts in Internet email
  – Just mechanisms to identify what was being used, thereby promoting interchange
• Language issues immediately came into play
• Effort expanded to multimedia mail, etc.
• Result was MIME
  – Structured messages
  – Content/Media type and “charset” identification
  – Plus multimedia stuff (another talk)
• And an SMTP extension/negotiation mechanism

The Internationalization Tradeoff and People
• More accessibility to Internet but more fragmentation:
  – Obvious advantages for communication within a language/script community
  – Disadvantages for communication among people and communities who use different languages and/or scripts
• Enables local content
  – More accessibility
  – Translation possible, but with all the usual problems
  – Email bodies are content

Rare and Endangered Languages and Scripts
• Really quite important (another talk by someone else)
• May not benefit from some internationalization approaches
  – Applications software rarely adopted
  – Inability to render a script produces meaningless displays (□□□□ or ????)
  – The “wait for Unicode” problem
• Further drive toward major languages

Requirements for internationalized message content
• Either
  – Coding scheme to transmit ASCII-only or
  – Reliable way to indicate extensions are in use
  (did both)
• Clear identification of Character Set and encoding used (“charset”)
• Optional identification of language
• SMTP extension mechanism
  – Included provisions for non-ASCII-coded message bodies

ESMTP and MIME
[Diagram: Source Message…]
Envelope: EHLO, MAIL FROM:, RCPT TO:, DATA
Headers: From:, To:, Subject:, Date:

The Internationalization Tradeoff and Computer Networks
• With one interconnected network
  – Computers are not very smart
  – Mnemonics, acronyms, and codes don’t translate
• Alias models do not scale well
• Some lessons there about domains (another talk)
• In particular, when the audience is computers
  – Actual protocol elements do not need translation (at least in theory)
  – Identifier strings used with protocol elements may not translate (or need to)

Be Careful What You Try to Internationalize

Internationalizing Domain Names
• Significant pressure
for mnemonics in local scripts
  – “All will be well if we work at 2nd level and below”
  – Some incorrect conceptions about DNS
  – In particular, cannot enforce language
  – Whoops, need TLDs (!)
• IDNA and coding (another talk)

Are IDNs Necessary?
• Socially and politically, definitely yes
• If search is used more than remembering or guessing domain names, maybe not
• Favorites and bookmarks can be anchored in any language and mapped to domains in any script

Beyond content to addresses
• Internationalization tradeoffs still a problem
  – Good within language/script communities
  – Problem when sender and recipient use different ones
  – If I cannot read or type your address, we have a problem (noticed in the Post a long time ago)
• Updating email transport systems is easy
  – Legacy conversion is harder
  – Interface to and in MUAs is really hard
• Unlike content, multiple character codes are a problem for addresses

Messages with New Addresses to Old Systems
• No conversion gateways
  – Sender System (MSA or MTA): Can you accept this?
  – Receiver MTA: No
  – Sender MTA: ok, goodbye… will tell the user

Mail Transport
[Diagram labels: Source Message …, MUA, MSA, MTA, Gateway, Relay, Relay, Retrieval & Presentation, Delivery Process, MTA]

Why no “downgrading”?
• Note: local-part@domain
• Constraints imply
  – No way to do IDNA-like mapping of addresses
  – Local-part may be an arbitrary string; domain not much better
• No translation either
• Transliteration not reliable even if agreement could be reached

Email Extended for Non-ASCII Addresses - Characteristics
• local-part@domain
  – entirely Unicode UTF-8
• Requires non-ASCII Unicode support in header field data
• Addresses in envelope
  – Supported through SMTP extension
  – No fallback or translation/coding in transit
  – System accepting the extensions must be prepared for any Unicode-supported script
• New addresses + older systems: No communication

I Did Not Talk About MUAs
• Always the hard part
  – Need to understand people and behavior, not just computers
  – Figuring out what to do when something is not understood is hard too
• Not clear that we know how to build a perfect one, even for all-ASCII messages and systems (another talk)

As the Extensions Deploy…
• More Internet accessibility to people unfamiliar with Latin characters
• Better ability to use non-basic-Latin email addresses
  – Both local parts and domain names
• Better communication within language communities
• Probably little change between communities
  – Learning that from inevitable problems

Email Probably Has A Future
• It is as universal as human communication
• But humans still communicate better when
  – Same language
  – Same writing system
  – Same culture
• More internationalized email probably won’t change that

Thank you
Bring questions tomorrow.