The unintuitive nature of Bayesian thinking

What do you think the answer is for the following question?

Let's assume that one in a hundred women has breast cancer. Suppose there is a test with 90% sensitivity (true-positive rate) and a 9% false-positive rate. That is, 90% of the women with cancer will test positive, but so will 9% of those who don't have the disease.

A woman tests positive. What is the probability that she has cancer? Try to answer the question before reading further.

Almost everyone (including most doctors) would say with little hesitation that it's around 80-90%. By definition, isn't it? And everyone saying so would be wrong. The correct answer is just above 9%.

Here is the demonstration. Of 1,000 women, on average 10 have cancer and 990 do not (the 1% base rate). All of them take the test. Of the 10 women with the disease, 9 get a positive result (the 90% true-positive rate). Of the 990 healthy women, about 89 also test positive (the 9% false-positive rate).

That makes 98 positive tests in total, even though statistically only 9 of those women have cancer. Therefore, if a woman tests positive, her chance of being one of the unfortunate ones is 9/98 ≈ 9.2%.
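If you would rather let a computer do the counting, here is a tiny Python sketch of the same frequency argument (the variable names and the round figure of 1,000 women are just for illustration):

```python
women = 1000
sick = women * 0.01              # 10 women actually have cancer (1% base rate)
healthy = women - sick           # 990 do not

true_positives = sick * 0.9        # 9 sick women test positive (90% sensitivity)
false_positives = healthy * 0.09   # ~89 healthy women also test positive (9% false-positive rate)

positives = true_positives + false_positives   # ~98 positive tests in total
print(true_positives / positives)              # ~0.092, i.e. just above 9%
```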

This example is just one of many that Steven Pinker uses in his new book, Rationality, to demonstrate how unreliable our intuitions sometimes are at sizing up reality. The book is a great reservoir of examples and explanations of human folly, but this particular instance impressed me the most, because even after having read about it before, and having had it explained to me multiple times, I still find it hard to internalize.

"But we know that the test is 90% reliable?! What does that number mean then?" - screams a voice in my head.

This result seems so wrong intuitively that it's hard to shake off the incredulity, even after going through the numbers a couple of times. But there is no catch here. The numbers are right and our intuition is just wrong.

The reason for the stark mismatch between the real probabilities and our intuitive guesses is that humans are very bad at estimating probabilities. When facing problems like the one described, our mind latches onto the sensitivity rate (90%), refuses to let go, and completely ignores the base rate: the 1% prevalence rate of the disease among the population.

In the first draft of this post, I continued with a second example from Pinker's book, which dresses the same abstract problem in an intuitively more appealing narrative. But then, on the net, I stumbled across a much better one. One for me, Steven! Or maybe one for Judea Pearl, the computer scientist who developed the mathematical framework of Bayesian networks and who used the example below. More about all things Bayesian later; first, let's see the example.

Imagine that instead of the probability of having a disease, we want to calculate the probability that it rained the night before. Our test is checking whether the grass is wet. Its true-positive rate is 90% (in 10% of cases, the rain dries up before we wake up). The error of the cancer test (the 9% false-positive rate) is replaced by an unreliable sprinkler: due to some electrical malfunction, it goes off roughly once in ten nights (~9%).

So one day, we look out the window, see that the grass is wet, and want to figure out the probable cause. Did it rain (parallel: does the individual have cancer) or was it just the sprinkler again (parallel: a false-positive test result)? The answer to that question obviously depends on the frequency of rain in our area (parallel: the base rate = the prevalence of the disease) - because we already know the frequency of sprinkler misbehavior. If it usually rains once in 100 days (parallel: 1% of women have breast cancer) and the sprinkler goes off roughly once in 10 days (parallel: the ~9% false-positive rate), then the probability of the rain having caused the wetness is roughly ten times smaller than that of the sprinkler.
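Here is the same back-of-the-envelope comparison in Python. The numbers and simplifications are mine: I round the sprinkler's misfire rate to 10%, assume a misfiring sprinkler always leaves the grass wet, and ignore the rare nights when both causes occur.

```python
p_rain = 0.01                 # it rains about once in 100 nights
p_sprinkler = 0.10            # the sprinkler misfires about once in 10 nights
p_wet_given_rain = 0.90       # in 10% of cases the rain dries up before morning
p_wet_given_sprinkler = 1.0   # assume a misfiring sprinkler always wets the grass

wet_from_rain = p_rain * p_wet_given_rain                 # 0.009
wet_from_sprinkler = p_sprinkler * p_wet_given_sprinkler  # 0.100

print(wet_from_sprinkler / wet_from_rain)                    # ~11: the sprinkler is roughly ten times more likely
print(wet_from_rain / (wet_from_rain + wet_from_sprinkler))  # ~0.08: chance it was the rain, given wet grass
```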

What we grasp easily in the second example and fail to notice in the first one is the role of the sprinkler (parallel: test error = false-positive rate). There are multiple ways to test positive (having the disease or an error in the test result), and the likelihood of each depends on its respective frequency.

I'm not sure what makes the difference. Maybe false positivity is just a trickier concept than the sprinkler, which is a mundane, material thing; the physical cause of the positive result in the former case is not tangible. Or maybe it's just the difficulty of filtering out the relevant pieces of the data we are given. Knowing only that out of 100 women one has the disease but nine will test positive falsely would already give us a good estimate - we wouldn't even need to know the sensitivity of the test.

I mentioned all things Bayesian before, and it's time to give the theoretical formula for the kind of reasoning we followed above. This is called Bayesian inference, and it is captured by the following equation:

P(H|e) = P(e|H) * P(H) / P(e),   where   P(e) = ∑ P(e|Hi) * P(Hi)

In plain English, it means that if we have a number of hypotheses (H1=rain, H2=sprinkler) to explain a certain event or piece of evidence (e=wet grass), and we know the prior probability of each of those hypotheses (1% for rain, roughly 10% for the sprinkler), and also the probability of the event assuming the hypothesis to be true (if it rained, the probability of seeing wet grass is 90%; this can be taken to be the same for the sprinkler), then we can calculate the probability that a specific hypothesis (either the rain or the sprinkler) was the cause of the event - and not the others.

Applying the symbols to our first case, this is what they mean:

H - our hypothesis that one has the disease
e - positive test (evidence)
P(H) = probability of having the disease before testing = 0.01 (1%), since one in a hundred women has it
P(e|H) = probability of testing positive if you have the disease = 0.9 (90%)
P(e) = probability of testing positive, whatever the reason is. 

P(e) = ∑ P(e|Hi) * P(Hi)
     = (probability of a true-positive result) * (probability of having cancer)
       + (probability of a false-positive result) * (probability of NOT having cancer)
     = 0.9 * 0.01 + 0.09 * 0.99 = 0.0981 (≈9.8%)

The above formula gives us the probability that the person has the disease (that our hypothesis is true), given the evidence of a positive test result:

P(H|e) = 0.9 * 0.01 / 0.0981 ≈ 0.0917, which is essentially the same result we got by running the thought experiment at the beginning (the tiny difference comes from rounding the healthy positives to a whole number of women there).
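To close the loop, here is a minimal Python version of the same formula (the function name and its arguments are my own choice, just for illustration). With the cancer numbers it reproduces the ~9% result, and with the rain/sprinkler numbers it would reproduce the estimate from the wet-grass story.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: P(H|e) = P(e|H) * P(H) / P(e)."""
    # P(e): probability of the evidence, whatever the reason is
    p_evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / p_evidence

# The cancer test: 1% prevalence, 90% sensitivity, 9% false-positive rate
print(posterior(prior=0.01, true_positive_rate=0.9, false_positive_rate=0.09))  # ~0.0917
```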