Tuning Your Mailscan Account
07/28/06 – 3 – AJR
Daily Training
– We recommend that you process your Cache Contents daily for each non-zero cache –
be sure to click “Confirm the Status of these Items” when you finish each cache
Adjusting Your Threshold
– Your spam threshold is a number representing Mailscan's dividing line between spam
and non-spam (good email) – this number is usually 5.0 or less – any email that scores at
or above the threshold is considered spam, and anything less is non-spam
– An email's spam score is the sum of many small numbers, each of which gets added
whenever the email fails one of Mailscan's 700+ spam tests
– Suppose an email accumulates a total spam score of 4.1 – if your threshold is 4.25 that
email is classified as non-spam – if your threshold is 4.0 it's classified as spam – the
lower your threshold, the more spam you'll catch BUT... you also increase the
likelihood of a high-scoring piece of good email being classified as spam – setting
your threshold is an art, not a science
– To adjust your threshold, click the gear icon on your Mailscan home page:
– Then click on your email address in the upper right corner of the E-Mail Addresses box:
– You'll now be in your Mail Filter Settings page:
– Change your threshold by editing the value Consider mail 'Spam' when Score is >= (in
this screenshot it's being set to 4.0), then click Update This Address' Settings – the new
value takes effect immediately for all new email
– We recommend adjusting your threshold up or down in increments of 0.25 or less – it's
smart to adjust slowly and watch the results for a few days
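– The comparison described in this section can be sketched in a few lines of Python – this is only an illustration of the "at or above the threshold" rule, not Mailscan's actual code, and the function name classify is an assumption made for the example:

```python
def classify(score: float, threshold: float) -> str:
    """Return 'spam' when the score is at or above the threshold, else 'non-spam'."""
    return "spam" if score >= threshold else "non-spam"


# The example from the text: an email with a total spam score of 4.1
print(classify(4.1, 4.25))  # non-spam
print(classify(4.1, 4.0))   # spam

# A score exactly equal to the threshold counts as spam
print(classify(4.0, 4.0))   # spam
```

– Lowering the threshold by one 0.25 step (for example from 4.25 to 4.0) is enough to move the 4.1 email from non-spam to spam, which is why small steps are recommended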
Examining Mailscan Statistics
– It is instructive and impressive to check Mailscan's stats to see what it's really doing
– Click the chart icon on your Mailscan home page:
– This will take you to your Statistics page:
– The information in this section also applies to your personal stats, but the systemwide
stats are a better indication of Mailscan's overall performance – at the bottom of this page
click View Systemwide Statistics:
– The colors of the pie chart wedges correspond to the rows of the table (e.g., medium pink =
confirmed spam)
– Look at the top three Mail Types:
– Unconfirmed Non-spam is good email that has not yet been classified by trainers
– Confirmed Non-spam is good email that has been classified by trainers
– False Positives are bad, since Mailscan has mistakenly classified good email as spam
– The sum of these three Mail Types represents all the non-spam we get (18% of incoming
email in this screenshot)
– Look at the next three Mail Types:
– Suspected Spam is spam that has not yet been classified by trainers
– Confirmed Spam is spam that has been classified by trainers
– False Negatives are good – when we trainers find spam in our non-spam cache and
reclassify it, we're telling Mailscan, “here's a piece of junk you missed... learn about it so
you'll be more likely to recognize it next time and classify it as spam” – false negatives
are a sign that Mailscan is “learning”
– The sum of these second three Mail Types represents all the spam we get (78.8% of
incoming email in this screenshot) – this corresponds with recent studies which suggest
75% or more of all Internet email is spam
– Look at the bottom of this table:
– Efficiency is the best overall measure of Mailscan's performance – the more diligently
we trainers confirm/reclassify spam/non-spam, the higher the efficiency will become –
we hope to push this slow-moving number into the high 90s
– Viruses/Malware are found in 2.9% of incoming email – “viruses” is a catch-all term for
viruses, worms and trojans (bad stuff which can directly infect a PC) – “malware” is a
catch-all term for spyware, adware, phishing scams, and HTML-borne programming
(more bad stuff which can slow down or disable your PC, steal your data and passwords,
turn your PC into a spam generator, turn your PC into a robot [“bot”] that attacks other
computers, reprogram your browser, etc) – all infected email is immediately discarded
– Click on View Virus Statistics:
– The most frequently detected viruses and malware are at the top of the Viruses list –
notice that the top five are phishing scams embedded in HTML-formatted email, and the
top one alone (HTML.Phishing.Pay-168) accounts for 54.3% of all viruses received to
date – can you see why HTML-formatted email should be avoided like the plague?
– Return to the systemwide stats page, then click on View Spamassassin Rule Statistics:
– SpamAssassin is the name of the software which actually does the spam analysis – it uses
over 700 rules to examine each email, and every time an email meets a rule's criteria, the
rule “triggers” – this table is sorted with the most frequently triggered rules at the top
– Razor2 is one of the two global anti-spam networks to which we've connected Mailscan,
and the fact that Razor2 rules are so high on the list shows how much this participation
helps us fight spam – the other network is DCC, which appears in the fifth rule
– The second rule (RAZOR2_CHECK) has a Score of 1.511 – this means each time an
email triggers this rule, 1.511 points are added to its accumulating spam score, helping tip
it toward a classification as spam
– Even more significant is the BAYES_99 rule – the type of artificial intelligence used by
SpamAssassin is known as a “Bayesian filter”, and it is the Bayesian filter that we are
training – when the BAYES_99 rule triggers it means “because of my training I'm 99-100%
certain this email is spam, and I'm adding 3.500 points to its spam score” – 3.5
points are a lot, and help to quickly tip the email toward a classification as spam
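– To see how rule scores add up, here is a minimal Python sketch using only the two scores quoted above (RAZOR2_CHECK at 1.511 and BAYES_99 at 3.500) – the table RULE_SCORES and the function spam_score are names invented for this example, the other 700+ rules are left out, and the threshold of 5.0 is simply the "usually 5.0 or less" figure from earlier:

```python
# Scores quoted in the text; every other rule is omitted from this sketch.
RULE_SCORES = {
    "RAZOR2_CHECK": 1.511,
    "BAYES_99": 3.500,
}


def spam_score(triggered_rules):
    """Add up the scores of the rules that an email triggered."""
    return sum(RULE_SCORES[name] for name in triggered_rules)


score = spam_score(["RAZOR2_CHECK", "BAYES_99"])
print(round(score, 3))  # 5.011
print(score >= 5.0)     # True: these two rules alone reach a 5.0 threshold
```

– An email that triggers only RAZOR2_CHECK would score 1.511 from this table, well short of any typical threshold, which shows why BAYES_99 carries so much weight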
White and Black Lists
– A whitelist is a list of email addresses (e.g., joe@boguscom.com) and/or domains (e.g.,
boguscom.com) from which email addressed to you is always accepted, no matter what –
this means you must completely trust the address or domain because its email will
bypass Mailscan's tests, including the virus check – in other words, don't take
whitelists lightly
– A blacklist is just the opposite – it's a list of addresses and/or domains from which email
addressed to you will always be blocked and discarded, no matter what
– The purpose of whitelists and blacklists is to allow or block email which Mailscan is not
handling correctly for you – it's your last resort (not first) in tuning Mailscan
– Click on the divided rectangle icon to reach your White/Blacklist Settings:
– To add a single address to your whitelist, type it into the box as shown below
(joe@boguscom.com in this example), then click Add to List:
– To remove an entry from your whitelist or blacklist, click in the empty circle under
Remove, then click Update:
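– The effect of the two lists can be pictured with a short Python sketch – this is not Mailscan's code, the function name list_decision is hypothetical, and checking the whitelist before the blacklist is an assumption (this document does not say what happens if a sender appears on both):

```python
def list_decision(sender, whitelist, blacklist):
    """Return 'accept', 'discard', or 'scan' for a sender address.

    Each list is a set of lowercase entries, either full addresses
    (joe@boguscom.com) or bare domains (boguscom.com).
    """
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]
    # Assumption: the whitelist is consulted first.
    if sender in whitelist or domain in whitelist:
        return "accept"    # bypasses all tests, including the virus check
    if sender in blacklist or domain in blacklist:
        return "discard"   # blocked no matter what
    return "scan"          # normal scoring against your threshold


whitelist = {"joe@boguscom.com"}
blacklist = {"spamdomain.example"}  # made-up domain for illustration

print(list_decision("joe@boguscom.com", whitelist, blacklist))        # accept
print(list_decision("ann@spamdomain.example", whitelist, blacklist))  # discard
print(list_decision("sue@boguscom.com", whitelist, blacklist))        # scan
```

– Note that whitelisting the single address joe@boguscom.com does not whitelist everyone else at boguscom.com – only a domain entry would do that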
Strategy and Expectations
– Now that you understand how Mailscan works, here's a strategy for tuning it...
– When confirming your non-spam, watch for two trends in spam values: 1) the lowest
normal value for false negatives you reclassify as spam, and 2) the highest normal value
for non-spam (your actual good email)
– If you're lucky, your lowest normal false negative value (e.g., 3.9) will be higher than your
highest normal value for good email (e.g., 3.0) – if this is the case, set your threshold a
little lower than your false negative value (e.g., 3.75) – watch the results for a few days and
adjust as necessary
– If you're not so lucky and are constantly reclassifying low-scoring spam, consider
lowering your threshold and whitelisting the few known-good addresses that regularly
score above it – an alternative is to blacklist recurring spam addresses if there aren't too
many of them (but there will always be more and you'll always be playing catch-up)
– A reasonable goal is to keep tweaking your threshold, whitelists and blacklists until
Mailscan approaches 100% accuracy in classifying your email
– It is unreasonable to expect Mailscan to consistently achieve 100% accuracy because
“things change” – your idea of spam may change, and without question the tactics of
spammers will change – the smarter spammers (who increasingly are well trained and
may be financed by organized crime syndicates) know exactly how SpamAssassin works
and keep trying to find ways around its rules
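– The "lucky" case in the strategy above can be written as a small Python sketch – the function name suggest_threshold and the default margin of 0.15 are assumptions chosen only so that the worked example (3.9 and 3.0 giving 3.75) comes out as in the text – treat the result as a starting point to watch for a few days, not as a rule:

```python
def suggest_threshold(lowest_false_negative, highest_good, margin=0.15):
    """Suggest a threshold a little below the lowest normal false negative.

    Returns None when no threshold sits cleanly between the two values,
    which is the "not so lucky" case where whitelists or blacklists
    have to help.
    """
    candidate = round(lowest_false_negative - margin, 2)
    # Scores at or above the threshold are spam, so the threshold
    # must stay strictly above the highest normal good-email score.
    if candidate <= highest_good:
        return None
    return candidate


print(suggest_threshold(3.9, 3.0))  # 3.75
print(suggest_threshold(2.5, 3.0))  # None
```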
Opting Out
– If you eventually get tired of handling your own email and training Mailscan, you can
easily opt out and return to “non-interactive mode”
– To opt out, email the HelpDesk <helpdesk@nicc.edu> with your request
– As soon as we delete your Mailscan account (including your personal threshold,
blacklists and whitelists), all your email will be processed according to Mailscan's
settings for the NICC domain – you may get a little more spam, but you won't have to do
anything more than delete it from your New mail folder
Thank You
– Once again, thank you for being a Mailscan trainer – since all of NICC's email is
processed by the same rules and training, the few minutes you contribute daily to the
training of the Bayesian filter pay off for everyone
– If you have further questions or suggestions about Mailscan, please use the server's built-in Help system or contact the HelpDesk <helpdesk@nicc.edu>