Tuesday, November 13, 2007

The War Against Spam

In my earlier post about IT security, I described the Cold War between hackers/crackers/spammers and IT departments. Spam control is one of my most challenging battlefields.

At BIDMC, we receive an average of 886,674 emails every day from the internet. We deliver 57,103 of these, meaning that 829,571 are Spam. That translates into 302,793,415 Spam per year, or nearly a third of a BILLION Spam.
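These volumes follow from just the two daily counts. The minimal Python sketch below assumes, as the post does, that every undelivered message is Spam and that the daily average holds for all 365 days. The constant and function names are illustrative only.

```python
# Daily internet email figures quoted in this post.
RECEIVED_PER_DAY = 886_674
DELIVERED_PER_DAY = 57_103
DAYS_PER_YEAR = 365

def blocked_per_day(received: int, delivered: int) -> int:
    """Messages that arrived but were not delivered, i.e. treated as Spam."""
    return received - delivered

def blocked_per_year(received: int, delivered: int, days: int = DAYS_PER_YEAR) -> int:
    """Annualize the daily blocked count, assuming every day matches the average."""
    return blocked_per_day(received, delivered) * days

if __name__ == "__main__":
    daily = blocked_per_day(RECEIVED_PER_DAY, DELIVERED_PER_DAY)
    yearly = blocked_per_year(RECEIVED_PER_DAY, DELIVERED_PER_DAY)
    print(f"Spam blocked per day:  {daily:,}")    # 829,571
    print(f"Spam blocked per year: {yearly:,}")   # 302,793,415
    print(f"Fraction of a billion: {yearly / 1e9:.3f}")  # 0.303
```

Because 302.8 million is a bit less than one third of a billion (about 333 million), "nearly a third" is the accurate way to round it.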

There are many commercial products on the market that can help with this problem. At BIDMC and Harvard Medical School we use Symantec Brightmail Anti-Spam Version 6.0. Here's the challenge: it's not easy to distinguish legitimate clinical email from advertising. In a medical environment, our clinicians describe anatomy, medications, and diagnoses using the same keywords that appear in emails advertising herbal remedies to enlarge your body parts. Suppose our filters were tuned so tightly that all Spam was eliminated, but 1% of legitimate email was also blocked. The cost of this solution would be 208,425 undelivered legitimate emails per year. Conversely, suppose our Spam filters were relaxed so that no legitimate email was blocked, but 1% of Spam got through. The cost of that solution would be 3 million Spam landing in our inboxes every year.
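Each of the two tuning scenarios above comes down to one line of arithmetic. This is a minimal Python sketch, assuming (as the post does) that every delivered message is legitimate and every undelivered one is Spam. The names are hypothetical, and integer arithmetic rounds down, which matches the 208,425 figure.

```python
DAYS_PER_YEAR = 365
LEGIT_PER_DAY = 57_103             # delivered messages, treated as legitimate email
SPAM_PER_DAY = 886_674 - 57_103    # received minus delivered, treated as Spam

def yearly_errors(per_day: int, error_percent: int, days: int = DAYS_PER_YEAR) -> int:
    """Messages per year misclassified at a whole-number error percentage.
    Integer arithmetic keeps the result exact (rounded down)."""
    return per_day * days * error_percent // 100

# Scenario A: strict filter. No Spam gets through, but 1% of legitimate email is blocked.
false_positives = yearly_errors(LEGIT_PER_DAY, 1)
# Scenario B: relaxed filter. No legitimate email is blocked, but 1% of Spam gets through.
false_negatives = yearly_errors(SPAM_PER_DAY, 1)

print(f"Strict:  {false_positives:,} legitimate emails blocked per year")  # 208,425
print(f"Relaxed: {false_negatives:,} Spam delivered per year")             # 3,027,934
```

The asymmetry is the point. The same 1% error rate costs about 200 thousand messages on one side and about 3 million on the other, because Spam outnumbers legitimate mail roughly 14 to 1.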

Balancing false positives (blocking legitimate email) against false negatives (letting Spam through) is challenging and requires continuous updating of our Spam filtering techniques. We blacklist known spamming sites. We whitelist known clinical partners, even though their emails discuss anatomical parts. We have a Spam Feedback mailbox that provides continuous feedback to Brightmail. We use Exchange and Outlook rules to automatically move Spam into folders. We block all ZIP files from the internet but notify recipients that an email containing a ZIP file was received and blocked.
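The layering described above can be illustrated with a small gateway-style policy function. This is a minimal Python sketch, not Brightmail's or Exchange's actual logic: the domain lists, the `Verdict` type, the `spam_score` input (a stand-in for a commercial content filter's verdict), and the 0.9 threshold are all hypothetical. The feedback mailbox and Outlook rules are left out.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Optional

# Hypothetical policy lists; a real gateway would load and update these continuously.
BLACKLIST = {"known-spammer.example"}
WHITELIST = {"clinical-partner.example"}

@dataclass
class Verdict:
    action: str       # "reject", "block", "quarantine", or "deliver"
    notice: str = ""  # text sent to the recipient when a ZIP message is blocked

def sender_domain(msg: EmailMessage) -> str:
    """Lower-cased domain of the From address ('' if there is none)."""
    _, addr = parseaddr(str(msg.get("From", "")))
    return addr.rpartition("@")[2].lower()

def has_zip(msg: EmailMessage) -> bool:
    """True if any attachment has a .zip filename."""
    return any((part.get_filename() or "").lower().endswith(".zip")
               for part in msg.iter_attachments())

def classify(msg: EmailMessage, spam_score: float, threshold: float = 0.9) -> Verdict:
    """Apply the layers in order. spam_score is a hypothetical content-filter
    score (0.0 = clean, 1.0 = certainly Spam)."""
    domain = sender_domain(msg)
    if domain in BLACKLIST:
        return Verdict("reject")
    if has_zip(msg):
        # ZIP files from the internet are always blocked, but the recipient is told.
        return Verdict("block", f"An email from {domain} containing a ZIP file was received and blocked.")
    if domain in WHITELIST:
        # Known clinical partners skip content filtering, so anatomy terms cannot trip it.
        return Verdict("deliver")
    return Verdict("quarantine" if spam_score >= threshold else "deliver")

def make_message(sender: str, body: str, zip_name: Optional[str] = None) -> EmailMessage:
    """Build a test message, optionally with a dummy ZIP attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg.set_content(body)
    if zip_name:
        msg.add_attachment(b"PK\x03\x04", maintype="application", subtype="zip",
                           filename=zip_name)
    return msg
```

The order of the checks matters. The ZIP rule runs before the whitelist, so even a trusted partner's archive is blocked (with a notice), while the whitelist still shields that partner's anatomy-heavy text from the content filter.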

Two types of Spam still get through

1. Spammers embed graphics of advertisements instead of text. Since our filters cannot read the text inside graphics, we cannot filter them.

2. Spammers use words that are not unique to Spam (e.g. "enhance your being a male"), which cannot be filtered without also blocking legitimate email.
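A deliberately naive keyword filter shows both failure modes at once. This is a minimal Python sketch with a hypothetical term list and hit threshold. Commercial filters weigh many more signals than this, so treat it only as an illustration of why shared vocabulary and image-only messages are hard.

```python
import re

# Hypothetical keyword list of the kind a simple content filter might use.
SPAM_TERMS = {"enhance", "enlarge", "herbal", "male"}

def keyword_hits(text: str) -> set[str]:
    """Spam terms that appear as whole words in text (case-insensitive)."""
    return SPAM_TERMS & set(re.findall(r"[a-z]+", text.lower()))

def is_flagged(text: str, min_hits: int = 2) -> bool:
    """Flag a message when it contains at least min_hits distinct spam terms."""
    return len(keyword_hits(text)) >= min_hits

spam_text = "Enhance your being a male with our herbal formula!"
clinical_text = "64-year-old male with urinary retention; asked about herbal supplements before TURP."
image_only_text = ""  # the advertisement lives entirely inside an attached image

for label, text in [("spam", spam_text), ("clinical", clinical_text),
                    ("image-only spam", image_only_text)]:
    print(f"{label:16} hits={sorted(keyword_hits(text))} flagged={is_flagged(text)}")
```

The clinical note is flagged (a false positive) because it shares "male" and "herbal" with the advertisement. The image-only Spam sails through (a false negative) because there is no text for the filter to match.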

At present, using Brightmail and the other techniques described above, we block 99% of all Spam (nearly a third of a BILLION messages per year) and deliver nearly 100% of legitimate email. That still lets 3 million Spam per year land in our mailboxes, but it ensures our doctors and staff get the mission-critical email they need to deliver good care. We'll continue to enhance our Spam filtering systems, but you can still expect some Spam to get through. As fast as we innovate, spammers innovate, creating a continuous battle against Spam.

The ultimate answer may be that the internet email infrastructure itself needs to be revised to deny all email traffic except that which is specifically whitelisted by email servers and users. Earthlink and other ISPs have used this approach. It's a bit irritating for senders, who are told that their email will not be received until the recipient approves them. It's a hassle for recipients, who have to approve every incoming email sender. The result, however, is that offending senders are blocked forever and no Spam passes through the human-mediated approval process.
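The deny-by-default approach can be sketched as a small per-recipient state machine. This minimal, in-memory Python illustration is not how Earthlink or any other ISP implements it. The class, method, and status names are hypothetical, and the "challenge" is reduced to a notice appended to an outbox.

```python
from enum import Enum

class Status(Enum):
    PENDING = "pending"    # first contact seen, awaiting the recipient's decision
    APPROVED = "approved"  # whitelisted: future mail is delivered immediately
    BLOCKED = "blocked"    # rejected: future mail is silently discarded

class ChallengeResponseInbox:
    """Hypothetical deny-by-default inbox for a single recipient."""

    def __init__(self) -> None:
        self.senders: dict[str, Status] = {}
        self.held: dict[str, list[str]] = {}         # messages awaiting approval
        self.delivered: list[tuple[str, str]] = []   # (sender, body) in the inbox
        self.outbox: list[tuple[str, str]] = []      # notices sent back to senders

    def receive(self, sender: str, body: str) -> Status:
        sender = sender.lower()
        if sender not in self.senders:
            # First contact: hold the message and tell the sender to wait for approval.
            self.senders[sender] = Status.PENDING
            self.outbox.append((sender, "Your email will be delivered once the recipient approves you."))
        status = self.senders[sender]
        if status is Status.APPROVED:
            self.delivered.append((sender, body))
        elif status is Status.PENDING:
            self.held.setdefault(sender, []).append(body)
        # BLOCKED: the message is discarded and the sender is never notified again.
        return status

    def approve(self, sender: str) -> None:
        """Whitelist a sender and release everything held from them."""
        sender = sender.lower()
        self.senders[sender] = Status.APPROVED
        for body in self.held.pop(sender, []):
            self.delivered.append((sender, body))

    def block(self, sender: str) -> None:
        """Block a sender permanently and drop anything held from them."""
        sender = sender.lower()
        self.senders[sender] = Status.BLOCKED
        self.held.pop(sender, None)
```

Both costs the post mentions are visible here: every new sender waits in `held` until someone calls `approve`, and the recipient must make that decision once per sender. After that, a blocked sender's mail never reaches `delivered` again.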

Other alternatives are to charge bulk email senders postage for sending their contents over the internet, but that's tomorrow's blog entry!
