Email & Spam Filtering Stats: Jan 12, 2006

by on January 13, 2006

in Email

I wrote earlier that I hoped to do some side-by-side looks at how SpamCop, Yahoo & Gmail all handle spam fighting. The figures below cover mail received from around 12:30pm UK time yesterday to 10:30am today. In summary, SpamCop captured the most spam, with a false match rate far below Yahoo’s, though Gmail had no false matches at all. SpamCop’s showing might also be helped by training I’ve done with it over time. Here’s the rundown:

Service          Yahoo   Gmail   SpamCop
Inbox            508     396     254
Spam             142     269     408
False Match      11      0       5
Total Mail       661     665     667
% Spam Caught    21%     40%     61%
% False Match    8%      0%      1%

What do the figures on the chart mean? Here we go:

Inbox: How much mail was in my inbox when I checked at each place at the same time today. Gmail is the exception. Since I download email from Gmail throughout the day, my inbox was empty. I’ve estimated how much would be in there if I had NOT downloaded by looking at the total mail I received at Yahoo and SpamCop (about 665 items) and using that as an estimate. Why’s the inbox figure important? It shows how much mail each system forced me to deal with. If a lot of spam gets through to your inbox, more work. SpamCop would have saved me the most.
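The Gmail inbox estimate can be reproduced with a little arithmetic. This is only a sketch: the post doesn’t spell out the exact calculation, so subtracting the held spam and false matches from the estimated total is an assumption that happens to match the table, and the function name is made up for illustration.

```python
def estimate_inbox(estimated_total, spam_held, false_matches):
    """Mail that would have reached the inbox.

    Assumption: inbox = estimated total minus everything the filter held.
    """
    return estimated_total - spam_held - false_matches


# Totals reported for the two services where the inbox was counted directly.
yahoo_total, spamcop_total = 661, 667

# Averaging them gives 664.0, which the post rounds to "about 665".
average_total = (yahoo_total + spamcop_total) / 2

# Gmail held 269 items as spam and had no false matches.
gmail_inbox = estimate_inbox(665, 269, 0)

print(average_total, gmail_inbox)
```

With the table’s numbers this gives the 396 shown for Gmail’s inbox.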

Spam: Shows how much spam each service filtered. The more spam filtered, the better — EXCEPT if there are a lot of false matches, as explained below. SpamCop did the best job in spam catching. Some additional notes:

  • Yahoo uses what it calls SpamGuard, and it’s an off or on affair, no degrees, no way to crank it down to 2 or up to 11. If you pay, you can get what’s promised to be better protection with SpamGuard Plus.
  • Gmail has spam filtering on by default. You can’t turn it off or adjust settings.
  • SpamCop has a wide range of options. You can use none of them or all. I currently have it set to use SpamAssassin with a limit of 5, plus checks against the SpamCop Blacklist, the DSBL open relays, blacklists for South Korea, China, Nigeria, Argentina & Brazil, and the SORBS list. This degree of control is nice, but the options and why you might use them aren’t really explained.

False Match: Shows how many items were considered spam and held when in reality they were legit mail. Why’s the false match rate important? The higher the false match rate, the more work is required to manually check the held mail and make sure important messages weren’t missed. More for each service:

  • Yahoo in particular savaged newsletters I receive. Search Engine Guide, iMedia, MediaPost, MarketWatch, my weekly Odeon cinema listings and our own SearchDay newsletter all got nabbed as spam. Four emails from people responding to messages I’d sent were also held. There is a way to flag things held as “Not Spam.” I believe that over time, this would help train Yahoo not to reject such material. I may test this in the future.
  • Gmail had no false matches. It also caught more actual spam than Yahoo, so points for that. However, it didn’t catch as much as SpamCop. Like Yahoo, Gmail has a “Not Spam” button that I suspect if used may whitelist things and prevent them from being caught in the future.
  • SpamCop grabbed the most spam with a false match rate well below Yahoo’s, though not quite Gmail’s zero, so a pretty good compromise. Like Yahoo, it nabbed the Search Engine Guide newsletter. It also grabbed two notifications from my Yahoo Groups mailing list, a message to that list and one response to a message I’d sent someone else.

    One thing I love about SpamCop is that junk mail is sorted alphabetically by default. It makes it very easy to see all the non-English spam I’m getting, plus see the duplicate messages and so on. I could easily do the same at Yahoo (in the beta I used) by clicking to sort by subject. Gmail has no ability like this.

    SpamCop also has the ability to “Release & Whitelist” items, similar to Yahoo & Gmail. I’ve used this over time at SpamCop, so that is one reason why it might have a lower false match rate than Yahoo. I currently have 210 addresses on my whitelist. I might delete these and start fresh in the future, to better compare. I also have three addresses on my blacklist.

Total Mail: The total amount of mail I received. My “real” mail is uncertain.

Gmail sent me 396 items, and some of those were definitely spam that got through. MailWasher, which I use to prescreen further, has a statistics report window showing what I’ve deleted. Estimated the best I can, I’d say about 100 items of spam got past Gmail. So call it 300 items, which is pretty close to what was in my SpamCop inbox.

Why not take the SpamCop inbox figure as “real” mail given the high amount of spam it pulled? I know from experience some spam still gets through. Using the SpamCop figure, I’d say my real mail was about 200 items, based on a typical day. So overall, 200-300 items, that’s my estimate of what I dealt with yesterday.

% Spam Caught: The higher the better, assuming the false match rate isn’t high. Percentage comes from spam caught divided by total mail received.

% False Match: The lower the better, assuming you also have a high spam caught rate. Percentage comes from false matches divided by total spam caught.
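The two percentage rows can be recomputed from the raw counts. This is a rough sketch rather than the exact method used for the chart: it assumes Total Mail is inbox plus spam plus false matches (which matches the table) and that percentages are rounded to the nearest whole number. The `summarize` helper is a hypothetical name.

```python
# Counts from the table: (inbox, spam held, false matches)
stats = {
    "Yahoo": (508, 142, 11),
    "Gmail": (396, 269, 0),
    "SpamCop": (254, 408, 5),
}


def summarize(inbox, spam, false_match):
    # Assumption: total mail is the sum of the three counts, as in the table.
    total = inbox + spam + false_match
    # % Spam Caught: spam caught divided by total mail received.
    pct_caught = round(100 * spam / total)
    # % False Match: false matches divided by total spam caught.
    pct_false = round(100 * false_match / spam) if spam else 0
    return total, pct_caught, pct_false


results = {name: summarize(*counts) for name, counts in stats.items()}

for name, (total, caught, false) in results.items():
    print(f"{name}: total={total}, spam caught={caught}%, false match={false}%")
```

Run against the table’s counts, this reproduces the totals and both percentage rows.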

{ 1 comment }

1 SEO-siti-web July 5, 2008 at 11:58 pm

SpamCop seems to be the best choice, but i think gmail will overcome it! :)
