A Last Go At Spam Filtering Before Whitelisting
Yesterday was the last day I let my three mail services run without training them on false matches. It was also the first time that SpamCop ran with all previous training removed. Despite that, it still performed great. False matches were up a bit with Gmail, while Yahoo Mail's performance was simply appalling. Below are the stats, along with a first look at "whitelisting" mail by identifying false matches. Those new to the series should see my Email category and read from the oldest post up.
Let's start with the stats. For more on what the figures mean, see this page. These are for January 19, 10:30am UK time through January 20, 11:30am UK time.
Both Gmail and SpamCop have generally been catching spam in the 60 percent range, so yesterday seemed a tough one for them. As for Yahoo, it continues to catch far less spam. Worse, of the mail it does flag as spam, a far higher share turns out to be false matches. What did Yahoo nab as spam yesterday that wasn't?
For the first time, I've now used the "Not Spam" button that Yahoo Mail Beta offers to identify the false matches. Several of the newsletters it wrongly caught yesterday arrive daily, so when I check things tomorrow, we'll see if using Not Spam has trained Yahoo to let them through. Annoyingly, there's no way to see whether the addresses have actually been added to any whitelist. Gmail did far better than Yahoo on the false match front, but it still grabbed a number of items.
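Yahoo doesn't show what clicking Not Spam actually changes, so the following is only a guess at the general idea behind a personal whitelist. As a minimal sketch, assuming the simplest design (a Not Spam click adds the sender's address to a per-user list, and that list is checked before any content-based spam score), it might look like this in Python. The `PersonalWhitelist` class, the `classify` function, the 0.5 threshold, and the example addresses are all hypothetical; this is not how Yahoo, Gmail, or SpamCop are known to implement it.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalWhitelist:
    """Hypothetical per-user whitelist: senders listed here bypass the spam filter."""
    addresses: set[str] = field(default_factory=set)

    def mark_not_spam(self, sender: str) -> None:
        # Assumed effect of a "Not Spam" or "Release and Whitelist" click:
        # remember the sender (case-insensitively) so future mail isn't held.
        self.addresses.add(sender.strip().lower())

    def is_whitelisted(self, sender: str) -> bool:
        return sender.strip().lower() in self.addresses


def classify(sender: str, spam_score: float, whitelist: PersonalWhitelist,
             threshold: float = 0.5) -> str:
    """Return 'inbox' or 'spam'. The whitelist is consulted before the content score."""
    if whitelist.is_whitelisted(sender):
        return "inbox"
    return "spam" if spam_score >= threshold else "inbox"
```

Checking the whitelist first is what makes it a whitelist: once a newsletter's sender has been marked Not Spam, `classify("news@example.com", 0.9, wl)` returns `"inbox"` no matter how spammy the content scores.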
Like Yahoo, Gmail has a Not Spam button you can use to indicate that something was nabbed by mistake, and I've used it on the false matches. Also like Yahoo, there's unfortunately no way to see exactly which addresses (if any) have been added to a personal whitelist. In addition, I continue to find it annoying that I can't sort messages in my Gmail spam folder by subject, as you can with Yahoo and SpamCop. Sorting by subject makes it much easier to scan for things falsely held as spam, especially since items in non-Latin languages get grouped together.
Over at SpamCop, removing all my previous filters didn't cause the false match rate to go up. SpamCop held only one item, a message I'd sent myself. I used the "Release and Whitelist" feature to train SpamCop about this. I also love that by going into Options, then SpamCop Tools, then Manage Your Personal Whitelist, I can see that my address was indeed added to the whitelist.
Way back, I wrote that for all their spam catching, both SpamCop and Gmail would let some spam through. Here are some stats from yesterday to ponder:
What's this showing? In the first chart, you can see that Gmail stopped a lot of spam from getting into my inbox at all. That 297 figure for my inbox represents how much mail was allowed through what's effectively my first line of defense, Gmail's own spam filtering. My second line of defense, as I've written, is Mailwasher. It has its own spam filtering features, along with a blacklist I've built up over years. Using it, I stopped another 76 items from hitting my Outlook mail application. In other words, 26 percent of what Gmail thought was "clean" wasn't. I've never measured this with SpamCop, but my guess, based on experience, is that it lets about 20 percent through. I also wanted to add a bit on why I got started using Mailwasher but still want filtering on my server as well, as I explained in comments on Jeremy's blog:
Ironically, there's such an easy way for Gmail or SpamCop to cut down on the spam still getting past their filters: just give me an option to filter out messages written predominantly in Asian or Cyrillic characters. That's what's getting through. My assumption is that the spam filters they use just don't work well with non-English or non-Latin languages. With SpamCop, I can kind of rig this by finding a unique character, such as the Asian or Cyrillic equivalent of the letter "e," the most commonly used letter in English. Unfortunately, the filtering only happens when you log into web mail. With Gmail, I might be able to make that type of filter work despite doing POP downloads; I'll try later. But I can't make it automatically move items to the spam folder for possible review. I have to tag them (which means they're still in the inbox) or throw them in the trash (which may work, but it's another place to review). Finally, I leave you with this:
Look at the top. At first I thought these must be ads, but there's no ad coding. That example just leads here. They've been out since at least April 2005.
By Danny Sullivan on Jan. 19, 2006
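Returning to the idea, earlier in this post, of filtering out messages written predominantly in Asian or Cyrillic characters: neither Gmail nor SpamCop offered that as a built-in option, so here is a minimal Python sketch of what such a rule could look like. Rather than SpamCop-style matching on one marker character, it measures what share of a message's letters fall into a few Unicode blocks and flags the message when that share crosses a threshold. The block list (Cyrillic, Hiragana/Katakana, CJK Unified Ideographs, Hangul Syllables), the 50 percent threshold, and the function names are my assumptions for illustration, not anything either service uses.

```python
# Hypothetical Unicode block ranges chosen for illustration; not exhaustive.
TARGET_RANGES = [
    (0x0400, 0x052F),  # Cyrillic + Cyrillic Supplement
    (0x3040, 0x30FF),  # Hiragana + Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def in_target_script(ch: str) -> bool:
    """True if the character's code point falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in TARGET_RANGES)


def non_latin_ratio(text: str) -> float:
    """Share of alphabetic characters that belong to the target scripts."""
    letters = [ch for ch in text if ch.isalpha()]  # skip digits, spaces, punctuation
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if in_target_script(ch))
    return hits / len(letters)


def looks_non_latin(subject: str, body: str, threshold: float = 0.5) -> bool:
    """Flag a message whose letters are mostly Asian or Cyrillic (hypothetical rule)."""
    return non_latin_ratio(subject + " " + body) >= threshold
```

Using a proportion instead of a single marker character means a mostly English message that happens to quote a Russian or Japanese word isn't caught, while a message written almost entirely in those scripts is.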