Abstract

Over the past 5 years the Internet has changed vastly from its academic origins to become the lifeblood of global business communications. Unfortunately, along with all the obvious benefits associated with such connectivity there is a downside. This paper aims to investigate the carrier class virus phenomenon and propose solutions.

Historically, computer viruses spread primarily via booting from infected disks or executing infected files or documents. In every case, a human element has factored in a virus's ability to pollinate. However, this situation has recently changed, and with it so have some important factors regarding virus protection. Email aware viruses such as Melissa, Happy99 or ExploreZIP are able to pollinate themselves, instantly and with great efficiency. Worse still, the powerful scripting languages present in today’s email clients and office suites make creating such viruses a comparatively easy task.

On average, 1 in every 1,500 emails contains a virus. In order to help counter this trend towards email capable viruses, ISPs must take more responsibility for the email they forward and provide an effective front line of defense for their customers.

In this presentation, MessageLabs will introduce supporting data to highlight important new issues and trends in virus protection and detail how ISPs can efficiently and easily integrate virus scanning into their networks. The Internet is at the core of the problem, so it is logical that it should be at the core of the solution.

Contents:

1. Emerging Virus Trends
2. Why Are Viruses So Prolific?
2.1 Networked Plumbing
2.2 Too Much Functionality
2.3 Common Platforms
3. Current Solutions are Outdated
4. Evolving a New Approach: Scanning For Viruses at the Internet Level
4.1 How Does the Virus Scanning System Work?
4.2 How is the Virus Scanning System Deployed?
4.3 Skeptic™
5. Mail Encryption
6. Making Use of Live Statistics

http://www.messagelabs.com/
Mark Sunner, MessageLabs, November 2000

Section 1. Emerging Virus Trends

Recent developments in the industry have seen virus writers incorporating email capabilities into viruses with devastating effects. Older techniques used to increase the impact or longevity of a virus, such as stealth and polymorphism, have become old hat. By using email as a method of distribution, it is possible to write a virus capable of infecting thousands of computers in a matter of minutes. Worse still, creating such viruses has become easy due to the immense programming capabilities contained within today’s powerful office suites.

Since the first PC virus appeared on the scene a little over 12 years ago, the number of known viruses has grown to approximately 40,000-50,000. However, whilst this figure may sound dramatic, it is not an indication of the overall threat; at any point in time only a very small proportion of viruses are actually in the wild causing damage. The current number of viruses in the wild is estimated to be around 400. Although this figure excludes variants, it still seems small when set against the number of documented incidents.

Traditionally, virus incidents are hard to track and obtain accurate data on, as this information is often kept “under wraps”. In December 1999, Dell Computer Corporation was publicly exposed as having been hit by the Funlove virus.

Section 2. Why are Viruses So Prolific?

It is easy to see that the number of incidents has risen sharply over the past 18 months. In order to see what conclusions can be drawn and what preventative steps can be taken, let’s first take a closer look at the three main contributory elements.

2.1 Networked Plumbing

We’re all connected!
The corporate world has just spent the last 5 years feverishly gluing itself together. The major driving force behind this is email. Five years ago, the chances of the author of this paper being able to email you, the reader, were probably around 50/50. Nowadays, having an email address at work is taken as a given. Aside from the corporate world, domestic connectivity is catching up fast.

According to a recent Email Marketing Report (February 2000), there were 409 million email accounts worldwide in 1999, up from 234 million the previous year, an increase of roughly 75%. With current estimates projecting 700M+ mail accounts worldwide by the year 2005, email will be truly woven into the fabric of society.

Email is a type of plumbing that links us all together, globally. It does not take a rocket scientist to see that viruses engineered to exploit this global connectivity have the potential to wreak havoc worldwide. This is what we are seeing right now. According to a recent study published by ICSA Labs, The Computer Virus Prevalence Survey 2000 (Nov 2000), there have been very significant changes in virus distribution methods: in 1996, 9% of viruses were distributed by email, whereas in 2000 email was the main method of virus distribution at 87%.

2.2 Too much functionality?

Consider the Scenario:
Date: 25th December 1978, Christmas Day morning.
Venue: My parents’ house.

It’s Christmas Day and I’ve just opened my largest Christmas present. It’s a chemistry set and I’m a very happy 8 year old boy! I can’t believe my luck! I can clearly remember taking out and ogling nearly 100 test tubes full of various potions and powders.

I can also remember my parents returning to the house, after having been with friends for just an hour. They found me standing in the center of the living room, looking very guilty. The living room was proudly sporting a new look, a fairly uniform dark purple splatter.
Having been handed all the essential tools (chemicals, a large test tube, methylated spirit and a wick) it was just a matter of time before I’d try my hand at some form of explosive.

All this brings me to the powerful programming languages now incorporated as standard in today’s office application suites and browsers. A while ago, writing a virus was just like writing any program and required a bit of savvy on behalf of the author. Rapid development tools we now take for granted simply did not exist in the past, and this fact kept many would-be virus writers at bay. Putting the obvious malicious aspect to one side for a moment, this was a job for men, not boys.

However, the proliferation of the Internet has accelerated the need for new development tools and environments that can exploit this connectivity. Visual Basic for Applications (VBA) and Visual Basic Script (VBS) both present obvious advantages for power users who want better integration between applications, but they also present virus authors with a perfect toolkit for quickly developing viruses based around Office applications. More recently, the ability to easily gain access to just about any conceivable email function has been responsible for the sharp increase in the number of worm type macro viruses now propagating in the wild. This is a growing trend, which is visible on just about every virus prevalence table available.

Given the right tools, it is only a matter of time before a certain combination will cause havoc.

2.3 Common Platforms: Microsoft Outlook

Consider the Scenario:

Hybridization was the term given to a crude form of genetic engineering back in the 1960s. The idea was simple: cross the biggest, hardiest, fastest-growing varieties of corn (or any other crop) until you end up with SUPERCORN. SUPERCORN is so much better than regular corn. It yields more bushels per acre and is more resistant to disease.
Soon there are millions of acres of SUPERCORN that are not only the same variety but, since they are derived from a single master plant, genetically identical. And that's the dark side of hybridization, because if a new disease comes along that does bother SUPERCORN, it affects the entire crop. There is always the risk that the entire crop will die all at once.

This brings us on to the Love Bug, the latest in a recent spate of computer virus outbreaks, which caused billions of dollars in damage. Hybridization comes into this drama because the I Love You worm isn't just a computer virus or a PC virus or a Windows virus or even an e-mail virus. I Love You is specifically a Microsoft Outlook/Visual Basic virus. It takes advantage of features in this SUPERCORN of e-mail programs to cause damage to the greatest possible number of users. Our networks, desktops and office suites are littered with SUPERCORN like Outlook. This is a strong argument for genetic diversity in software, because what made the Love Bug so costly was Microsoft's success at getting people to use its software.

In summary:
- We are all totally connected via the Internet and email.
- Writing viruses is easier than ever before.
- Once flaws are discovered in common software platforms, virus writers will exploit them, impacting a massive installed base.

Section 3. Current Solutions are Outdated

An important conclusion that can be drawn is that anti-virus solutions have been slow in evolving to meet the new virus threat. Anti-virus solutions in the main are still focused on the desktop or local gateway. A desktop strategy should certainly always exist, as people will always continue to use suspect floppy disks and CDs from magazines. Such solutions are, however, inadequate at dealing with a global outbreak such as the Love Bug. To help illustrate this, suppose we could step into a time machine and travel back four years...
Four years in the past, somewhere in the Philippines, our bad guy sits hunched over his keyboard in a darkened room illuminated only by the phosphor glow of his monitor. He is about to release a virus into a CompuServe forum. However, back then, the virus he is about to release is not a potential threat to the “computer user” over in the UK for weeks, if not months or years. The essential ingredient for virus pollination at this time is people. The speed at which people can exchange floppy disks, .exe files and office documents is a very limiting factor; at this point, we are not all glued together. Anti-virus strategies around at this time were adequate in dealing with the virus problem.

Our time machine now whizzes forward to the present day. This time, our bad guy sits at his flat screen and posts his email aware virus into a newsgroup. It’s a simple bit of Visual Basic Script (VBS) and, thanks to COM, it is easily able to locate and exploit Microsoft Outlook, the dominant mail client. Depending on whose address book contains my email details, this virus threat could arrive within minutes. My desktop AV software does not know about this new virus, nor can it catch it heuristically. It takes, on average, 6 hours for my AV vendor to obtain a sample and make a signature available. As everyone is trying to obtain the signature at the same time from my AV vendor’s publicly accessible sites, the FTP server is unavailable. In effect, a pseudo denial of service attack is being performed.

Working in this way places a huge burden on network managers and dramatically increases the amount of anti-virus activity necessary to provide effective protection. Tasks such as maintaining multiple scanners and updating virus signatures several times a day are often impractical for network administrators.

Section 4.
Evolving a New Approach: Scanning for Viruses at the Internet Level

In order to provide better virus protection we need to implement scanning and detection systems higher up the food chain of mail delivery, where economies of scale make a more sophisticated approach possible. The logic lies in scanning for email viruses at the Internet level.

MessageLabs has been providing a virus scanning service at the ISP level for 2 years. The system took just under a year to develop and essentially integrated several commercial virus scanners into an ISP’s mail infrastructure, plus a rules-based system for emergency outbreaks (such as Lovebug). The greatest hurdle to overcome with ISP virus scanning is scalability.

From the first live date the Virus Scanning System started to intercept approximately 40 viruses each day, instantly vindicating the approach. Today the system has evolved into a carrier class solution and regularly intercepts in excess of 1,000 viruses per day. As an important aside, the system also generates valuable real-time data about which viruses are actually in the wild and about the effectiveness of many commercial scanners against our ‘live list’.

The chart below details the progression of MessageLabs’ virus scanning technology and effectiveness over the last year. Some of the major lessons learned along the way were that more than one virus scanner is needed and that the overriding threat to computer users lay within macro viruses. We settled on 3 scanners, ultimately finding that this number gave the most effective results without reducing performance. Significant re-coding of the mail platform was necessary in order to provide the self-tuning required to ensure latency remained low. Failure to use more than one scanner resulted in an average 3% non-detection rate. Given that a single virus could prove devastating, this was thought to be unacceptable.
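The multi-scanner lesson above can be illustrated with a minimal Python sketch in which a message is treated as infected as soon as any one scanner flags it. The scanner functions here are hypothetical stand-ins built from toy byte signatures; they are not the interfaces of any commercial product, and the sketch is only an assumption about how verdicts might be combined.

```python
from typing import Callable, Iterable, List

# A scanner is modelled as a function that returns True when a payload looks infected.
Scanner = Callable[[bytes], bool]


def make_signature_scanner(signatures: Iterable[bytes]) -> Scanner:
    """Build a toy scanner from byte signatures (hypothetical, for illustration only)."""
    known: List[bytes] = list(signatures)

    def scan(payload: bytes) -> bool:
        return any(sig in payload for sig in known)

    return scan


def is_infected(payload: bytes, scanners: Iterable[Scanner]) -> bool:
    """Flag the payload if at least one scanner reports a detection."""
    return any(scan(payload) for scan in scanners)


# Two toy scanners with different signature sets: each one alone misses something.
scanner_a = make_signature_scanner([b"AAA"])
scanner_b = make_signature_scanner([b"BBB"])
```

With these toy scanners, `scanner_a` alone misses a payload containing `b"BBB"`, while `is_infected(payload, [scanner_a, scanner_b])` catches it, which is the effect the paragraph above describes.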
The average figure for virus detection across all email is 1 virus in every 1,500 emails. From this we are able to estimate that an average of 66 viruses will pass through an ISP’s network for every 100,000 messages handled (100,000 / 1,500 ≈ 66). This formula makes it easy for any ISP to estimate the potential effectiveness of implementing virus scanning technology.

Analysis over a long period of time has shown that a greater percentage of viruses come from ‘free’ mail accounts than from general private domains. Further investigation revealed that the average rate of viruses in mail from popular free mail accounts soars to 1 in 500 messages. From this information we are able to broadly estimate that the average number of viruses passing through an ISP increases to 200 in every 100,000 free mail messages handled. We suspect webmail vastly increases the promiscuous use of multiple computers for handling documents and hence increases the chance of infection.

4.1 How does the Virus Scanning System Work?

The system, known as the MessageLabs Virus Control Center (VCC), runs on a scalable architecture comprising a cluster of towers densely populated with high performance servers, Cisco Catalysts and load distributors, which host McAfee, F-Secure and Vfind virus scanning software.

We have a rolling program to locate tower clusters at major peering locations around the world. Installations are currently deployed in London, Amsterdam and New York. Additional towers will be deployed on a global basis during Q2 and Q3 2001.

Each tower comprises 2 Cisco load balancers, 2 Cisco high performance 100 Mbps switches, 24 industrial PCs each running the MessageLabs proprietary mail engine, and temperature and fan monitors linked to all PCs. A management system (PC) ensures that load is not only distributed but also tuned dynamically. If an individual mail server queue becomes excessive, the management system lessens the delivery priority to the affected system.
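One way to picture the dynamic tuning just described is a weighted choice in which a backlogged mail server receives a smaller share of new sessions. The Python sketch below is only an illustration under assumed names and numbers: the `max_queue` threshold, the weighting formula and the server names are hypothetical and are not taken from the MessageLabs management system.

```python
import random
from typing import Dict, Optional


def delivery_weights(queue_lengths: Dict[str, int], max_queue: int = 100) -> Dict[str, float]:
    """Assign each server a weight; servers over max_queue get a reduced weight."""
    weights: Dict[str, float] = {}
    for server, depth in queue_lengths.items():
        if depth > max_queue:
            # Assumed rule: priority shrinks in proportion to the backlog.
            weights[server] = max_queue / depth
        else:
            weights[server] = 1.0
    return weights


def pick_server(queue_lengths: Dict[str, int], max_queue: int = 100,
                rng: Optional[random.Random] = None) -> str:
    """Pick a server for the next session, favouring servers with short queues."""
    rng = rng or random.Random()
    weights = delivery_weights(queue_lengths, max_queue)
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


# Hypothetical queue depths: mail-02 is backlogged and gets a quarter of the normal weight.
example_weights = delivery_weights({"mail-01": 10, "mail-02": 400})
```

A weighted random choice is used here rather than always picking the shortest queue so that a busy server still receives some traffic and can recover gradually; other policies would fit the description equally well.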
Excessively large emails are handled by a separate “Big-email server” to permit a more even flow. Should either the management server or the big email server fail, an election is forced and other systems will take over either role.

In order to perform the scan, we intercept all mail as it passes through our system so that it can be processed and examined for viruses before being allowed to continue to its ultimate destination. After being delivered to a tower, the SMTP session is distributed to one of the scanning mail servers. The session is authenticated against the customer database to ensure that it is either coming from or going to a known customer. If authentication succeeds, the mail and associated file attachments are decoded using open standard formats (nested and combined encodings are also decoded) and passed to the binary queue for scanning. If an abnormal file structure is unpacked during this process, such as a “Zip of Death” expanding to several terabytes, the file is rejected and an error message sent.

The file is then passed through three commercial virus scanners and Skeptic™, MessageLabs’ own proprietary heuristics and rules based scanner, used to detect the very latest viruses for which no signature is available. If all is well, the message finally passes to the Processed queue, where an optional corporate “Scan successful” message can be added. If a virus is detected, the email is moved to a quarantine area where it will remain retrievable for a period of 10 days. Once 10 days have elapsed, the email is destroyed.

A separate Health monitor process constantly monitors the status of the mail server to ensure that all processes are running smoothly, that security tripwires have not been activated and that adequate disk space always remains. Health reports are then fed back to the management system, which is monitoring the health of the tower as a whole.
These reports are in turn fed back to our central monitoring system, which keeps an eye on our worldwide network.

Several external processes exist to obtain updated information, such as new customers, configuration changes for existing customers and virus signature updates. As new data is received it is encrypted and delivered to a distribution host. Once there, a trigger is sent (push) to inform all the relevant towers that new data is waiting to be collected. The affected towers then collect (pull) the new data via the management system. This distribution method is part of an overall security policy ensuring that no communication is allowed into the towers other than SMTP.

4.2 How is the VCC Service Deployed?

Getting a customer’s mail scanned by the VCC is a very simple process, but it varies slightly depending on whether the customer has a leased line or dial up connection. Both scenarios are described as follows:

Leased Line:
Signing up leased line customers onto the service simply entails making the VCC the lowest-numbered (most preferred) MX record for a given domain. Once the scan has been completed, the VCC relays the mail on to its final destination. Outbound mail is relayed through the VCC by making a simple change to the customer’s outgoing mail gateway.

ISDN/Dial up:
The dial up scenario is slightly more complex, depending on whether the final mail relay can be contacted. In the majority of cases, mail for dial up customers is simply queued at the ISP until the customer connects and either triggers mail delivery with something like a finger command or pulls the mail via POP. In either case it is necessary to have a private DNS structure to prevent mail looping back to the VCC, which is where public DNS and the corresponding MX record will point. Creating a private DNS structure is a simple process.

Scanning mail via the VCC introduces virtually no perceptible delay on the overall delivery of mail.
During a four-week sample period processing approximately 1 million messages per day, MessageLabs calculated that the average message size being processed was 66 KB and that the time taken to process these messages under normal load was 1.2 seconds.

Scalability and resilience

From the beginning, the architecture behind the VCC solution was designed to be both scalable and resilient. Resilience starts within the towers, which have dual load balancers, dual switches and 24 mail servers for redundancy. Towers are also always deployed as a pair, so if a whole tower should fail it will always have an immediate twin ready to take over. Should a whole site fail, multi-site redundancy is achieved via MX records, which permit a dark site to take over the handling of mail.

MessageLabs’ BackOffice architecture uses MS SQL Server 7 for Windows NT. A centralised server cluster incorporates load balancing and redundancy to provide high performance, fault tolerance and scalability for management of customer configuration data and customer statistics data. Additional offsite servers provide further contingency.

4.3 Skeptic: MessageLabs’ proprietary heuristics and rules based scanner

Skeptic is MessageLabs’ own virus scanner. Using our wealth of mail experience, we have developed a set of heuristics which Skeptic uses to detect new viruses in email. Skeptic has been very successful, trapping the following viruses BEFORE signatures were available:

ExploreZip
Irok
JS/Kak.A
JS/Kak.days
NewApt
Lifestages.Worm
PrettyPark.variant
VBS/Fireburn.A
VBS/LoveBug
WinExt.worm
WScript\Unicle.worm

As an example of email heuristics, an email going to 20 or more recipients, and containing an office document with a macro, would be suspicious. If the macro also contained code that mass emailed, that would be very suspicious!
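The example heuristic above can be written down as a small Python sketch. This is an illustration only and not Skeptic's actual implementation: the `MessageSummary` fields, the level names and the handling of cases the text does not mention are all assumptions; only the 20-recipient figure comes from the example.

```python
from dataclasses import dataclass


@dataclass
class MessageSummary:
    """Hypothetical summary of an already-decoded email (field names are assumptions)."""
    recipient_count: int
    has_office_macro: bool   # an attached office document contains a macro
    macro_sends_mail: bool   # the macro contains code that mass emails


def suspicion_level(msg: MessageSummary, recipient_threshold: int = 20) -> str:
    """Classify a message using the recipients-plus-macro rule from the example."""
    widely_addressed = msg.recipient_count >= recipient_threshold
    if widely_addressed and msg.has_office_macro:
        if msg.macro_sends_mail:
            return "very suspicious"
        return "suspicious"
    # Assumption: combinations not covered by the example are treated as normal here.
    return "normal"


# A macro-bearing document sent to 25 recipients, without and with mass-mailing code.
wide_macro = suspicion_level(MessageSummary(25, True, False))
wide_mailer = suspicion_level(MessageSummary(25, True, True))
```

Here `wide_macro` evaluates to "suspicious" and `wide_mailer` to "very suspicious". A production scanner would presumably weigh many more signals together; the point of the sketch is only that simple, combinable checks on message structure can raise an alarm without any virus signature.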
Following the discovery of Bubbleboy we have added analysis of HTML formatted email: we search for virus-like executable scripts buried within HTML. We also added heuristics for various other scripting languages, such as VBS and JS. This enabled us to detect and trap the LoveBug virus over 10 hours before conventional anti-virus companies were able to make their signatures public.

We now often find ourselves in the position of being aware of new viruses, or of having samples of new viruses, before virus signatures are available. We are able to configure Skeptic very quickly to counter such threats. For instance, we can search for emails containing certain attachments, or with certain text in the subject line or body text. We can check whether emails contain Office documents which contain macros. We can check whether attachments have specific MD5 checksums, and we can combine these checks to be very specific and cut down the chance of false alarms. Usually, we have detection configured, tested and running within 20 minutes. Solutions from other vendors often require a wait of several hours, days or even weeks for the public signatures to arrive.

Section 5. Mail Encryption

It is sometimes argued that scanning for viruses at either a gateway or ISP level becomes obsolete if the email in transit is encrypted. Whilst on the surface this is seemingly a valid point, it has no basis in current fact. Firstly, the most recent statistics indicate that the prevalence of encrypted email is extremely minor. Encryption will undoubtedly become an important issue in the future. When it does, the solution employed will involve a dual keyed approach whereby the higher level scanning facility will have a copy of the potential recipient’s private key.

Section 6. Making use of Live Statistics

Finally, along with many anti-virus vendors, MessageLabs contributes to the wildlist. The wildlist (www.wildlist.org) is widely considered a de facto meeting point between anti-virus vendors.
The data presented is essentially a compiled report of all viruses seen actively in the wild by all contributing vendors. By scanning for viruses at the Internet level using four anti-virus scanners, the actual number of viruses detected and intercepted is significantly higher than with a single vendor’s scanner deployed at the desktop or gateway. This type of virus prevention therefore not only prevents viruses entering a company’s network, but also provides the anti-virus industry with continuous, real-time virus statistics. This data is used by the anti-virus vendors to develop signatures and further increase their virus knowledge base.