Bayes' Theorem

Chapter 10: Bayes’ Theorem in Real Life: Spam Filtering

If you hate spam, you’ll love Bayes’ Theorem.

Yep. It’s true, even if you don’t have a clue about what Bayes’ Theorem is.

Spam filtering has improved dramatically over the last decade, to the point where most of us rarely think about spam anymore. This is a fantastic and welcome change: not long ago, we were regularly inundated with junk mail. So how did spam filtering progress and become so much more effective?

In 1998, Microsoft applied for a patent on a spam filter that used a Bayesian approach, and in doing so ignited a new, Bayesian-style war on spam. Competitors soon joined in, and Bayes’ Theorem quickly became the backbone of spam filtering.

Bayesian filters decide whether an email is spam based on its content. When an email arrives, the filter reads each word and uses those words to estimate the probability that the message is spam or legitimate (the two classes are often called “spam” and “ham”). What sets Bayesian filters apart from other email filters, though, is that they learn and adapt to each individual email user. That, in a nutshell, is why they are so effective.
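The following is a minimal Python sketch of one common way to turn per-word probabilities into a single spam score for the whole email. It makes several assumptions. It uses the naive Bayes simplification, which treats words as independent. It treats spam and ham as equally likely before any word is read. The word table, its values, and the default for unseen words are all invented for illustration.

```python
import math

# Hypothetical per-word spam probabilities, P(spam | word), as a trained filter
# might store them. The words and values are invented for illustration.
WORD_SPAM_PROB = {
    "deals": 0.85,
    "secret": 0.80,
    "ink": 0.75,
    "sports": 0.10,
    "meeting": 0.05,
}

# Score for words never seen in training (an assumed default, slightly ham-leaning).
UNKNOWN_WORD_PROB = 0.4


def _sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spam_probability(words):
    """Combine per-word probabilities into a single P(spam | email).

    Naive Bayes assumptions: words are treated as independent, and spam and
    ham are equally likely before any word is read. Summing logs instead of
    multiplying probabilities avoids underflow on long emails.
    """
    log_spam = 0.0
    log_ham = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word.lower(), UNKNOWN_WORD_PROB)
        p = min(max(p, 0.01), 0.99)  # keep log() away from 0
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)).
    return _sigmoid(log_spam - log_ham)


print(round(spam_probability(["Secret", "ink", "deals"]), 3))  # 0.986
print(round(spam_probability(["sports", "meeting"]), 3))       # 0.006
```

A few strongly spammy words push the combined score close to 1, and a few hammy words push it close to 0. This is why a single innocent word rarely rescues an obvious piece of spam.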

Here’s how it works:
Spam filters built on Bayes’ Theorem are typically pre-populated with a list of words and characteristics commonly found in spam. This list is usually derived by feeding the filter large amounts of both spam and legitimate email, from which it learns what constitutes spam. Take a moment and think of spam you’ve seen. What words pop up?

  • Usually, anything to do with sex
  • Ink
  • Deals
  • Secret
  • This list could go on and on…
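The training step described above, feeding the filter known spam and legitimate email, can be sketched as counting how many messages of each kind contain each word. This is a minimal Python sketch under stated assumptions:

- The tiny corpus is invented.
- Tokenization is a simple lowercase word split.
- `train` and `word_spam_probability` are hypothetical names, not part of any real filter.
- Spam and ham are assumed equally likely.
- Add-one smoothing keeps words seen in only one class away from probabilities of exactly 0 or 1.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a deliberate simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def train(spam_messages, ham_messages):
    """Count, for each word, how many spam and how many ham messages contain it."""
    spam_counts = Counter()
    ham_counts = Counter()
    for msg in spam_messages:
        spam_counts.update(set(tokenize(msg)))  # set(): count a word once per message
    for msg in ham_messages:
        ham_counts.update(set(tokenize(msg)))
    return spam_counts, ham_counts, len(spam_messages), len(ham_messages)


def word_spam_probability(word, model):
    """P(spam | word) from message counts, assuming equal spam/ham priors."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)  # add-one smoothing
    p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_given_spam / (p_word_given_spam + p_word_given_ham)


# Invented example corpus.
spam = ["Secret deals on printer ink", "Cheap ink deals today", "Your secret prize"]
ham = ["Sports practice moved to Friday", "Notes from the sports meeting"]

model = train(spam, ham)
for w in ["ink", "deals", "sports", "friday"]:
    print(w, round(word_spam_probability(w, model), 2))
```

With this toy corpus, “ink” and “deals” come out at about 0.71, while “sports” lands at about 0.21 and “friday” at about 0.29.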

Where does the filter look for these words and phrases? The filter analyzes:

  • Words in the body of the message.
  • Words in the header.
  • Words in the metadata.
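A filter can tell apart the same word appearing in different parts of a message by tagging each token with where it came from. The sketch below uses Python's standard `email` package to parse a raw message. Three choices in it are illustrative assumptions, not a fixed standard:

- It reads only the Subject header.
- It uses the `subject:` and `from-domain:` prefixes.
- It treats the sender's domain as the “metadata”.

```python
import re
from email import message_from_string
from email.utils import parseaddr

WORD_RE = re.compile(r"[a-z0-9']+")


def extract_tokens(raw_message):
    """Collect tokens from a message's header, metadata, and body.

    Header and metadata tokens get a prefix so the filter can score "ink" in
    the Subject separately from "ink" in the body. Which headers to use and
    the prefix names are illustrative choices.
    """
    msg = message_from_string(raw_message)
    tokens = []

    # Header: words in the Subject line.
    subject = msg.get("Subject", "")
    tokens += ["subject:" + w for w in WORD_RE.findall(subject.lower())]

    # Metadata (assumed here to mean just the sender's domain).
    _, address = parseaddr(msg.get("From", ""))
    if "@" in address:
        tokens.append("from-domain:" + address.rsplit("@", 1)[1].lower())

    # Body: assumes a simple single-part plain-text message.
    body = msg.get_payload()
    if isinstance(body, str):
        tokens += WORD_RE.findall(body.lower())
    return tokens


raw = """From: Promo Team <offers@example.com>
To: you@example.org
Subject: Secret ink deals

Save big on ink today!
"""
print(extract_tokens(raw))
```

The resulting token list can be fed to the same counting and scoring steps as plain body words.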

This list is continually updated as each email is received, so the filter keeps learning what to look for. The filter learns in two different ways:

  • Based on its own decisions.
  • Based on the user’s decisions (for example, when you mark an email as spam).
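Both learning paths can feed the same update routine. In this minimal Python sketch, `LearningFilter`, `receive`, and `user_marks` are hypothetical names. The filter first learns from its own verdict when a message arrives. If the user later relabels that message, the filter forgets the old label and relearns the message under the new one. The following are simplifying assumptions: add-one smoothing, equal priors, and the 0.5 decision cutoff, which is arbitrary.

```python
import math
from collections import Counter


class LearningFilter:
    """Toy Bayesian filter that keeps per-word counts and updates them over time.

    Hypothetical design for illustration: add-one smoothing, equal priors for
    spam and ham, and an arbitrary 0.5 decision cutoff.
    """

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # messages containing each word
        self.totals = {"spam": 0, "ham": 0}                    # messages learned per label

    def _learn(self, words, label, step):
        # step=+1 learns a message under `label`; step=-1 forgets it.
        self.counts[label].update({w: step for w in set(words)})
        self.totals[label] += step

    def spam_probability(self, words):
        """Naive Bayes P(spam | words), computed as log-odds to avoid overflow."""
        log_odds = 0.0
        for w in set(words):
            p_s = (self.counts["spam"][w] + 1) / (self.totals["spam"] + 2)
            p_h = (self.counts["ham"][w] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_s / p_h)
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)

    def receive(self, words, cutoff=0.5):
        """Way 1: classify a new message, then learn from the filter's own decision."""
        label = "spam" if self.spam_probability(words) > cutoff else "ham"
        self._learn(words, label, +1)
        return label

    def user_marks(self, words, old_label, new_label):
        """Way 2: the user relabels a message, so move its counts to the new label."""
        if old_label != new_label:
            self._learn(words, old_label, -1)
            self._learn(words, new_label, +1)


f = LearningFilter()
first = f.receive(["sports", "deals"])            # nothing learned yet, so "ham"
f.user_marks(["sports", "deals"], first, "spam")  # you mark it as spam
print(first, f.receive(["deals"]))                # ham spam
```

After a single correction from the user, a later message containing “deals” tips over the cutoff. This is the per-user adaptation described above.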

For example, if the word “sports” often appears in your legitimate email, the filter might conclude that “sports” has a very low probability of indicating spam (on a scale of 0 to 1, maybe 0.1). However, if your legitimate email never pairs the word “deals” with “sports”, the filter might conclude that an email containing that phrase has a much higher probability of being spam (say, 0.65 on the same scale).
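Figures like these follow directly from Bayes’ Theorem. The small Python calculation below uses invented counts, chosen only so the results match the 0.1 and 0.65 above. It assumes spam and ham are equally common. It also treats the phrase “sports deals” as one two-word token, which is one way a filter can score phrases. `p_spam_given_token` is a hypothetical helper name.

```python
def p_spam_given_token(spam_with, spam_total, ham_with, ham_total, p_spam=0.5):
    """Bayes' Theorem: P(spam | token) from how often the token appears in each class.

    p_spam is the prior P(spam); 0.5 assumes spam and ham are equally common.
    At least one of spam_with / ham_with must be nonzero.
    """
    p_token_given_spam = spam_with / spam_total
    p_token_given_ham = ham_with / ham_total
    spam_term = p_token_given_spam * p_spam
    ham_term = p_token_given_ham * (1.0 - p_spam)
    return spam_term / (spam_term + ham_term)


# Invented counts: how many of 100 spam and 100 legitimate emails contain each token.
sports = p_spam_given_token(spam_with=1, spam_total=100, ham_with=9, ham_total=100)
# The phrase is scored as one two-word token, "sports deals".
sports_deals = p_spam_given_token(spam_with=13, spam_total=100, ham_with=7, ham_total=100)

print(f"P(spam | sports)       = {sports:.2f}")        # 0.10
print(f"P(spam | sports deals) = {sports_deals:.2f}")  # 0.65
```

Changing `p_spam` shows how much the prior matters. With the same counts for “sports” and `p_spam=0.8`, the result rises from 0.10 to about 0.31.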

So, how exactly could Bayes’ Theorem be used to detect spam? Here is a simplified version. Again, please note that it is deliberately simplified to demonstrate the concept of using Bayes’ Theorem for spam filtering; real filters involve many more complexities, such as calculating priors. In this example, we have abbreviated the text:
C. Word = Certain Word
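Using that abbreviation, the textbook form of Bayes’ Theorem for a single word can be written as follows, with “Ham” standing for legitimate email. This is a sketch of the standard formula. The denominator, P(C. Word), is expanded over the two possibilities, spam and ham, using the law of total probability.

```latex
P(\text{Spam} \mid \text{C. Word}) =
  \frac{P(\text{C. Word} \mid \text{Spam})\,P(\text{Spam})}
       {P(\text{C. Word} \mid \text{Spam})\,P(\text{Spam}) + P(\text{C. Word} \mid \text{Ham})\,P(\text{Ham})}
```

P(Spam) is the prior: the share of incoming email that is spam before any word is read. Choosing that number is the “calculating priors” complication mentioned above. Setting P(Spam) = P(Ham) = 0.5 makes the priors cancel, leaving P(C. Word | Spam) / (P(C. Word | Spam) + P(C. Word | Ham)).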

If you are interested in learning more, we would recommend Wikipedia’s article and Microsoft’s paper on Junk Filters.

Continue on to Chapter 11: Bayes’ Theorem History.