Bayesian Analysis
Formerly obscure topics in mathematics have a way of suddenly becoming relevant in the information age. For example, the true/false algebraic logic invented by George Boole in the 19th century turned out to map perfectly onto the on/off operation of electronic computer circuits.
The Reverend Thomas Bayes (1701?–1761) was another formerly obscure British mathematician, one who discovered a completely different way of looking at probability. Classical probability assumes that one can make no prior assumptions about the events to be tested. That is, when throwing a die, one does not base the probability that it will come up with a six on the results of any prior throws. That approach is correct in such cases: the probability of a six is always 1 in 6 (as long as the die is fair).
In some situations, however, what has already happened does influence the probability of a future event. Consider a blackjack player who wants to know the probability that the next card drawn will be a face card. If the deck has been properly shuffled, that probability starts out as 12/52 (or 3/13), since there are 12 face cards in a deck of 52 cards.
But suppose that, of the six cards dealt to three players in the first hand, two are face cards. When the dealer deals the next hand, the probability that any card will be a face card has changed. There are now two fewer face cards (12 - 2 = 10) and four fewer non-face cards (40 - 4 = 36), leaving 46 cards in all, so the probability that a given card is a face card becomes 10/46, or 5/23.
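This kind of running recalculation is easy to check in code. Here is a minimal Python sketch (the function name and inputs are my own, chosen for illustration) that recomputes the face-card probability as cards leave the deck:

from fractions import Fraction

# A standard deck holds 12 face cards and 40 non-face cards.
FACE, NON_FACE = 12, 40

def face_card_probability(face_dealt, non_face_dealt):
    # Probability that the next card is a face card, given how
    # many face and non-face cards have already been dealt.
    remaining_face = FACE - face_dealt
    remaining_total = (FACE + NON_FACE) - (face_dealt + non_face_dealt)
    return Fraction(remaining_face, remaining_total)

print(face_card_probability(0, 0))  # 3/13, the fresh-deck 12/52
print(face_card_probability(2, 4))  # 5/23, matching the 10/46 above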
While this is pretty straightforward, in many situations one cannot easily calculate the shifting probabilities. What Bayes discovered was a more general formula:
P(T|E) = (P(E|T) * P(T)) / P(E)
In this formula T is a theory or hypothesis about a future event. E represents a new piece of evidence that tends to support or oppose the hypothesis. P(T) is an estimate of the probability that T is true, before considering the evidence represented by E. The question then becomes: if E is true, what happens to the estimate of the probability that T is true? This is called a conditional probability, represented by the left side of the equation, P(T|E), which is read "the probability of T, given E." The right side of Bayes's equation considers the reverse probability, P(E|T): the probability that E will be observed if T is true. This is multiplied by the prior probability of T and divided by the independent probability of E.
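As a worked example (the numbers here are illustrative, not taken from the text), suppose the prior estimate is P(T) = 0.3, and that the evidence E appears with probability 0.8 when T is true but only 0.2 when T is false. A minimal Python sketch of the update, with P(E) expanded by the law of total probability:

def bayes_update(p_t, p_e_given_t, p_e_given_not_t):
    # Bayes's formula: P(T|E) = P(E|T) * P(T) / P(E), where
    # P(E) = P(E|T) * P(T) + P(E|not T) * P(not T).
    p_e = p_e_given_t * p_t + p_e_given_not_t * (1 - p_t)
    return (p_e_given_t * p_t) / p_e

print(bayes_update(0.3, 0.8, 0.2))  # about 0.63

Seeing the evidence roughly doubles the estimated probability that the hypothesis is true, from 0.30 to about 0.63, and the same function can be applied again as each new piece of evidence arrives.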
Practical Applications
In the real world one generally has imperfect knowledge about the future, and probabilities are seldom as clear-cut as those available to the card counter at the blackjack table. However, Bayes's formula makes it possible to continually adjust or "tune" estimates based upon the accumulating evidence.

One of the most common applications of Bayesian analysis is in e-mail filters (see SPAM). Bayesian spam filters work by having the user identify a sample of messages as either spam or not spam. The filter then looks for patterns in the spam and non-spam messages and calculates probabilities that a future message containing those patterns will be spam. The filter then blocks future messages that are (above some specified threshold) probably spam. While it is not perfect and does require work on the part of the user, this technique has been quite effective in blocking spam. A sketch of this scheme appears below.

A Bayesian algorithm's effectiveness can be expressed in terms of its rate of false positives (in the spam example, this would be the percentage of legitimate messages that have been mistakenly classified as spam). If the rate of "true positives" is too low, the algorithm is not effective enough. However, if the rate of false positives is too high, the negative effects (blocking wanted e-mail) might outweigh the positive ones (blocking unwanted spam).
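To make the pattern-counting idea concrete, here is a heavily simplified naive Bayes classifier in Python. The training messages, word handling, and smoothing constant are all invented for illustration; a real spam filter is considerably more elaborate:

import math
from collections import Counter

def train(messages):
    # messages: a list of (text, is_spam) pairs labeled by the user.
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals, k=1.0):
    # Naive Bayes with Laplace smoothing (k), computed in log space
    # to avoid underflow when many words are involved.
    prior_spam = totals[True] / (totals[True] + totals[False])
    log_odds = math.log(prior_spam / (1 - prior_spam))
    vocab = set(counts[True]) | set(counts[False])
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    for word in set(text.lower().split()):
        p_word_spam = (counts[True][word] + k) / (n_spam + k * len(vocab))
        p_word_ham = (counts[False][word] + k) / (n_ham + k * len(vocab))
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

# Toy training sample, standing in for the user's labeled mail.
training = [("win free money now", True),
            ("cheap money fast", True),
            ("meeting agenda for monday", False),
            ("lunch on monday", False)]
counts, totals = train(training)
print(round(spam_probability("free money", counts, totals), 2))  # about 0.86

The blocking threshold mentioned above is then a policy decision: raising it blocks less wanted mail (fewer false positives) at the cost of letting more spam through.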