
Bayes’ Theorem

By Aaron Brest



I’ll pose a famous question, made SAS-centric. Suppose I pick a student from the high school and tell you that they are shy and introverted. Do you expect them to be in Model United Nations (MUN) or Math Club?


A common thinking process is that because MUN is an activity centered around public speaking, a shy and introverted person probably wouldn’t thrive there and thus wouldn’t be a part of it. They’d be in the Math Club. This is a good line of reasoning—from my experience, people in the field of mathematics tend to be more socially uncomfortable than diplomats, so:


  • Let’s assume that 75% of Math Club members are shy (made up)

  • Let’s assume that 25% of MUN members are shy (made up)


But a key piece of information that I’ve neglected to tell you is the number of people in each club.


  • Math Club has about 20 active members, according to the avg. of Frank Xie and Kangmin Kim’s responses

  • MUN has about 100 active members, according to Sean Lai


In evaluating how likely a shy and introverted person is to be in Math Club versus MUN, we forgot to incorporate this other piece of information into our thought process. With the membership numbers in mind, we can construct a probability table to show the relative likelihoods.



              Shy    Not shy    Total
  Math Club    15          5       20
  MUN          25         75      100
  Total        40         80      120


If we know the person is shy and introverted, they ultimately only have a probability of 15/40, or 37.5%, of being in Math Club.


It turns out that there isn’t a need to table out our data when facing a problem such as this one. Bayes’ theorem, first derived in the mid-18th century, is a systematic way to get these comprehensive probabilities. Its derivation begins with the equation for conditional probability, which Precalculus students will be familiar with:


  1. P(A|B) = P(A ∩ B) / P(B)

  2. P(B|A) = P(B ∩ A) / P(A)

  3. P(B ∩ A) = P(A ∩ B)

  4. P(B|A) P(A) = P(A ∩ B)


And when we replace P(A ∩ B) in line 1 with the expression on the left of line 4, we get the equation below, Bayes’ Theorem:


P(A|B) = P(B|A) P(A) / P(B)
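For anyone who wants the substitution spelled out in one place, here is the same chain of steps as a LaTeX-typeset derivation; it uses nothing beyond the definition of conditional probability above:

```latex
\begin{aligned}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}
  && \text{definition of conditional probability} \\
            &= \frac{P(B \mid A)\,P(A)}{P(B)}
  && \text{since } P(A \cap B) = P(B \cap A) = P(B \mid A)\,P(A)
\end{aligned}
```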


Semantically, this equation reads that the probability of A given B is equal to the probability of B given A, multiplied by the probability of A, all over the probability of B. Taking the example of the student above, let’s calculate the probability of them being in Math Club with Bayes’ Theorem (a short code check follows the calculation):


  1. P(MC member | shy introvert) = P(shy introvert | MC member) P(MC member) / P(shy introvert)

  2. P(MC member | shy introvert) = (0.75) (0.1666) / (0.333)

  3. P(MC member | shy introvert) = 0.375
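Here is a minimal Python sketch of that same arithmetic, working directly from the made-up membership counts above. The function name posterior and all variable names are mine, chosen only for illustration:

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Made-up numbers from the club example:
# 20 Math Club members (75% shy), 100 MUN members (25% shy).
mc_total, mun_total = 20, 100
shy_mc = 0.75 * mc_total      # 15 shy Math Club members
shy_mun = 0.25 * mun_total    # 25 shy MUN members

total = mc_total + mun_total              # 120 students overall
p_mc = mc_total / total                   # P(MC member)     ~= 0.1666
p_shy = (shy_mc + shy_mun) / total        # P(shy introvert) ~= 0.333
p_shy_given_mc = shy_mc / mc_total        # P(shy introvert | MC member) = 0.75

print(posterior(p_shy_given_mc, p_mc, p_shy))   # 0.375
```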


This equation has applications beyond gauging the probability of club membership. I’ll pose another question. You are a doctor, your patient comes in with a cough, and you are aware that having cancer almost guarantees having a cough. Let’s assume that:


  • 99.999999% of cancer-afflicted patients have a cough (made up)

  • 20% of flu-afflicted patients have a cough (made up)

  • 0.003% of the population has cancer (avg. of male & female rates from NCI, 2020 multiplied by 10)

  • 7% of the population have the flu (CDC, 2010-2020)


Let’s find the likelihood that our coughing patient has cancer. For the denominator, we’ll treat the 7% flu rate as a rough stand-in for the overall probability of a cough (a code check again follows the calculation).


  1. P(cancer | coughing) = P(coughing | cancer) P(cancer) / P(coughing)

  2. P(cancer | coughing) = (0.99999999) (0.00003) / (0.07)

  3. P(cancer | coughing) ≈ 0.00042857
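The same posterior helper from the club example handles this case as well; the 0.07 in the denominator is the flu-rate stand-in for P(coughing) described above:

```python
# Reusing the posterior() helper defined in the club example.
p_cough_given_cancer = 0.99999999   # made up: nearly every cancer patient coughs
p_cancer = 0.00003                  # 0.003% of the population has cancer
p_cough = 0.07                      # stand-in for P(coughing), taken from the 7% flu rate

print(posterior(p_cough_given_cancer, p_cancer, p_cough))   # ~0.00042857, i.e. ~0.043%
```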


Despite the fact that a cancer patient is all but guaranteed to have a cough, there’s only about a 0.043% probability that a coughing patient has cancer.


This leads us into today’s actual topic—not an equation, but a way of thinking. For less scientific propositions like “McDonalds is a bad restaurant,” Bayes’ rule serves as a new way to incorporate evidence to prove or disprove a claim. Normally, we have a standing opinion for these sorts of qualitative assertions, but we remain unflinching until we face a substantially powerful piece of evidence to reverse our notion; Bayes’ Theorem allows us to treat these opinions as slidable values of beliefs, which we can constantly shift with new evidence.


Let’s suppose I think that McDonalds is a bad restaurant; I’m not a McFan. One day, a friend of mine tells me that out of the 100 orders he’s placed at McDonalds, 40 were incorrect. That’s a pretty bad tally: 40% of his orders were served incorrectly. How much should this piece of evidence cement my McDonald’s-disliking persuasions? Let’s find out:


P(McDonalds is bad | (incorrect order | McDonalds)) = P((incorrect order | McDonalds) | McDonalds is bad) P(McDonalds is bad) / P(incorrect order | McDonalds)


I know that this is messy, but we need some way to link the probability of faulty orders to badness. Essentially, think of P(McDonalds is bad) as our standing opinion of McDonalds, with values near 0 representing belief that McDonalds is not bad, and values near 1 representing belief that it is.


We already have the probability of an incorrect order given that the restaurant is McDonalds: 0.4. But you’d hopefully realize that this is quite similar to the probability of an incorrect order at McDonalds given that McDonalds is bad, at least under our current opinion of McDonalds: I already think of McDonalds as unfavorable, and this evidence does not reveal a McDonalds that is much worse than I expected.


Before we go on, try to think about what we are doing when we apply Bayes’ rule to a proposition and a piece of evidence. In McDonald’s terms: the probability that McDonalds is a bad restaurant, given the evidence of incorrect orders at McDonalds, is equal to the probability of an incorrect order at McDonalds given that McDonalds is bad, divided by the overall probability of an incorrect order at McDonalds, multiplied by our prior belief that McDonalds is bad.


That’s a lot to unpack, but the key concept here is that P(B|A) / P(B) serves as a scaling factor for our prior belief. This ratio is the updater of P(A). If P(B|A) / P(B) is less than one, the evidence has made our proposition less likely. If P(B|A) / P(B) is greater than one, the evidence has made our proposition more likely.


Now let’s go back to McDonalds. Since the probability of getting an incorrect order at McDonalds is 0.4, and the probability of getting an incorrect order at McDonalds given that McDonalds is bad is a similar value, we can claim that P(B|A) / P(B) will be only a bit larger than one. Our friend’s struggle with McDonalds isn’t that significant a piece of evidence.
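Here is a minimal Python sketch of that updating rule. The specific numbers (a prior of 0.70 that McDonalds is bad, a 40% chance of a wrong order if it is bad, a 30% chance if it isn’t) are hypothetical, chosen only to show how the ratio P(B|A) / P(B) nudges a prior:

```python
def update_belief(prior, p_e_given_h, p_e_given_not_h):
    """Update P(H) after seeing evidence E, using Bayes' rule.

    P(H|E) = P(E|H) * P(H) / P(E), where
    P(E)   = P(E|H) * P(H) + P(E|not H) * (1 - P(H)).
    """
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    likelihood_ratio = p_e_given_h / p_e      # this is the P(B|A) / P(B) "updater"
    return prior * likelihood_ratio

# Hypothetical numbers, purely for illustration:
prior = 0.70               # I already lean toward "McDonalds is bad"
p_wrong_if_bad = 0.40      # chance of a wrong order if McDonalds really is bad
p_wrong_if_not_bad = 0.30  # chance of a wrong order even if it isn't bad

print(round(update_belief(prior, p_wrong_if_bad, p_wrong_if_not_bad), 3))  # ~0.757
```

With these numbers the ratio works out to about 1.08, so a prior of 0.70 only drifts up to roughly 0.76: a nudge, not a dramatic shift, exactly as the reasoning above suggests.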


This method of updating our beliefs is formally referred to as Bayesian inference. Pedantry aside, this method reveals that probability can be interpreted as a degree of belief: being a “Bayesian” shouldn’t mean that you are plugging values into the theorem; it means that you keep values of P(B|A) / P(B) in mind.


How much more often does event B happen given A than under normal circumstances? Ultimately, this quotient shows the strength of new evidence, and therefore how much you should update your prior belief.



 

References


Laplace, P.-S. de. (1774). Oeuvres Complètes De Laplace. Gallica. Retrieved November 21, 2022, from https://gallica.bnf.fr/ark:/12148/bpt6k77596b/f284.image

Source shows Pierre-Simon de Laplace’s derivation of Rev. Thomas Bayes’s theorem, detailed on page 29. For a quick history: Bayes essentially conceived of the equation, and Laplace refined the math and pioneered the modern thinking surrounding it.


Further Reading

Bayes, T., & Price, R. (1763). An Essay towards Solving a Problem in the Doctrine of Chances (Vol. 53). Hafner.

Laplace, P. S., Truscott, F. W., Emory, F. L., & Bell, E. T. (1902). A Philosophical Essay on Probabilities. Dover Publications.



