Glossary of CSE Part 14 - Bayesian Analysis | HackTHatCORE

Bayesian Analysis

Formerly obscure topics in mathematics have a way of suddenly becoming relevant in the information age. For example, the true/false algebraic logic invented by George Boole in the 19th century turned out to map perfectly onto the on/off operation of electronic computer circuits. The Reverend Thomas Bayes (1701?–1761) was another formerly obscure British mathematician who discovered a completely different way of looking at probability.
Classical probability assumes that one can make no prior assumptions about the events to be tested. That is, when throwing a die, one does not base the probability that it will come up with a six on the results of any prior throws. Of course that approach is correct in that the probability of a six is always 1 in 6 (as long as the die is honest).
In some situations, however, what has already happened does influence the probability of a future event. Consider a blackjack player who wants to know the probability that the next card drawn will be a face card. If the deck has been properly shuffled, that probability starts out as 12/52 (or 3/13), since there are 12 face cards in the deck of 52 cards.
But suppose that, of the six cards dealt to three players in the first hand, two are face cards. When the dealer deals the next hand, the probability that any card will be a face card has changed. There are now two fewer face cards (12 - 2 = 10) and four fewer non-face cards (40 - 4 = 36), leaving 46 cards in the deck, so the probability that a given card is a face card becomes 10/46 or 5/23. While this is pretty straightforward, in many situations one cannot easily calculate the shifting probabilities. What Bayes discovered was a more general formula:
P(T|E) = (P(E|T) * P(T)) / P(E)
In this formula T is a theory or hypothesis about a future event. E represents a new piece of evidence that tends to support or oppose the hypothesis. P(T) is an estimate of the probability that T is true, before considering the evidence represented by E. The question then becomes: If E is true, what happens to the estimate of the probability that T is true? This is called a conditional probability, represented by the left side of the equation, P(T|E), which is read “the probability of T, given E.” The right side of Bayes’s equation considers the reverse probability, that E will be true if T turns out to be true. This is represented by P(E|T), multiplied by the prior probability of T and divided by P(E), the overall probability of observing E whether or not T is true.
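As a rough check, the blackjack scenario can be worked both ways: by counting the cards left in the deck, and by plugging the same situation into Bayes's formula. The sketch below assumes T is "the next card is a face card" and E is "exactly two of the six cards already dealt were face cards"; the variable names are illustrative only, and exact fractions are used so the two answers can be compared directly.

```python
from fractions import Fraction
from math import comb

DECK, FACE = 52, 12            # a standard deck with 12 face cards
NON_FACE = DECK - FACE         # 40 non-face cards
DEALT, FACE_SEEN = 6, 2        # six cards dealt, two of them face cards

# Direct counting: 10 face cards remain among the 46 undealt cards.
direct = Fraction(FACE - FACE_SEEN, DECK - DEALT)

# P(T): prior probability that the next card is a face card,
# before looking at any of the dealt cards.
p_t = Fraction(FACE, DECK)

# P(E): probability that exactly two of six dealt cards are face cards.
p_e = Fraction(
    comb(FACE, FACE_SEEN) * comb(NON_FACE, DEALT - FACE_SEEN),
    comb(DECK, DEALT),
)

# P(E|T): the same probability if the next card is known to be a face card,
# so the six dealt cards come from the other 51 cards (11 of them face cards).
p_e_given_t = Fraction(
    comb(FACE - 1, FACE_SEEN) * comb(NON_FACE, DEALT - FACE_SEEN),
    comb(DECK - 1, DEALT),
)

# Bayes's formula: P(T|E) = P(E|T) * P(T) / P(E)
posterior = p_e_given_t * p_t / p_e

print("direct count:", direct)
print("Bayes's formula:", posterior)
```

Both routes give 5/23 (that is, 10/46). Here the direct count is much easier; the formula earns its keep in cases where P(T|E) cannot simply be counted but P(E|T), P(T), and P(E) can be estimated.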

Practical Applications

In the real world one generally has imperfect knowledge about the future, and probabilities are seldom as clear cut as those available to the card counter at the blackjack table. However, Bayes’s formula makes it possible to continually adjust or “tune” estimates based upon the accumulating evidence. One of the most common applications of Bayesian analysis is in e-mail spam filters. Bayesian spam filters work by having the user identify a sample of messages as either spam or not spam. The filter then looks for patterns in the spam and non-spam messages and calculates probabilities that a future message containing those patterns will be spam. The filter then blocks future messages whose probability of being spam exceeds some specified threshold. While it is not perfect and does require work on the part of the user, this technique has been quite effective in blocking spam.
A Bayesian algorithm’s effectiveness can be expressed in terms of its rate of true positives (in the spam example, the percentage of spam messages that are correctly blocked) and its rate of false positives (the percentage of wanted messages that are mistakenly classified as spam). If the rate of true positives is too low, the algorithm is not effective enough. However, if the rate of false positives is too high, the negative effects (blocking wanted e-mail) might outweigh the positive ones (blocking unwanted spam).
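The filtering process described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular mail program works: it assumes the "patterns" are simply individual words, treats words as independent of one another (the so-called naive Bayes simplification), looks only at words present in a message, and uses add-one smoothing so that unseen words do not produce zero probabilities. The function names, the tiny training set, and the thresholds are all made up for the example.

```python
import math
from collections import Counter


def train(messages):
    """Count, per class, how many user-labelled messages contain each word."""
    word_counts = {True: Counter(), False: Counter()}
    msg_counts = {True: 0, False: 0}
    for text, is_spam in messages:
        msg_counts[is_spam] += 1
        word_counts[is_spam].update(set(text.lower().split()))
    return word_counts, msg_counts


def spam_probability(text, word_counts, msg_counts):
    """Estimate P(spam | words) with Bayes's formula (needs both classes in training)."""
    total = msg_counts[True] + msg_counts[False]
    log_score = {}
    for label in (True, False):
        score = math.log(msg_counts[label] / total)        # prior, P(T)
        for word in set(text.lower().split()):
            # Smoothed estimate of P(word | class), one factor of P(E|T).
            p = (word_counts[label][word] + 1) / (msg_counts[label] + 2)
            score += math.log(p)
        log_score[label] = score
    # Dividing by the sum over both classes plays the role of P(E).
    top = max(log_score.values())
    spam = math.exp(log_score[True] - top)
    ham = math.exp(log_score[False] - top)
    return spam / (spam + ham)


def rates(labelled, word_counts, msg_counts, threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tp = fp = spam_total = ham_total = 0
    for text, is_spam in labelled:
        flagged = spam_probability(text, word_counts, msg_counts) > threshold
        if is_spam:
            spam_total += 1
            tp += flagged
        else:
            ham_total += 1
            fp += flagged
    return tp / spam_total, fp / ham_total


# Made-up messages standing in for the user's labelled sample.
training = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("cheap pills free shipping", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
    ("project report attached", False),
]
# Made-up messages the filter has not seen before.
held_out = [
    ("claim your free prize", True),
    ("free money", True),
    ("monday project meeting", False),
    ("free lunch on monday", False),
]

word_counts, msg_counts = train(training)
for threshold in (0.9, 0.2):
    print(threshold, rates(held_out, word_counts, msg_counts, threshold))
```

On this toy data a strict threshold of 0.9 blocks only one of the two spam messages but no wanted mail, while a loose threshold of 0.2 blocks both spam messages at the cost of also blocking one of the two wanted messages. A real filter would need far more training data and a more careful treatment of text, but the trade-off between the two rates has the same shape.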
