Solve for One Possible Outcome – With All Data Provided
In this chapter, we use Venn diagrams to visualize our problems so that they are easier to understand and solve. If reading the term Venn diagram makes you shudder, Wikipedia provides a good overview to get you up to speed. This article at Stanford University is also helpful, but you really don’t need to worry. We explain everything step-by-step.
When searching for a probability, we are sometimes given all of the components, or ingredients, and simply need to A) identify each one, B) label each one, and C) plug each one into Bayes’ formula. Hands down, these are the easiest questions to apply Bayes’ formula to. In this section, we will work through this type of problem using Venn diagrams, which help us understand the question visually.
Here are a few tips as you approach each question:
- Try not to get overwhelmed or lost in the question. Always begin by writing down what you want to discover.
- Try not to confuse P(A|B) with P(B|A). Always double check your numbers! This is a common error (sometimes called confusion of the inverse, and closely related to the Base Rate Fallacy).
- Remember to take your time. There is no need to rush.
Let’s say that you are at work one day and have just finished lunch. You suddenly feel horrible and find yourself lying down and within a few minutes begin to panic. Wasn’t your friend at work recently sick with the flu? What if you have it? Will you have to cancel your big trip next week?
You have a headache and sore throat, and you know that people with the flu have the same symptoms roughly 90% of the time. In other words, 90% of people with the flu have the same symptoms you currently have. Does this mean you have the flu?
Wanting to gain a little more information you roll over, grab your phone and search Google. You find a reputable article that says that only 5% of the population will get the flu in a given year. Ok. So, the probability of having the flu, in general, is only 5%.
You then spot one more statistic that says 20% of the population will have a headache and sore throat at any given time. After reading this, you throw your phone down and curl up in your seat. You’re completely overwhelmed and more confused than when you started. Do you have the flu? What should you do?
Let’s break this scenario apart.
First, let’s remember what Bayes’ Theorem does: it helps us update a hypothesis based on new evidence. In this scenario, your hypothesis is that you have the flu, and your evidence is your headache and sore throat. Now, after seeing that 90% of people with the flu have your symptoms, many of us would stop and conclude that we have the flu. We would look at the 90% statistic and sigh, resigned to the fact that we likely have the flu. This reaction is very common and is called the Base Rate Fallacy or Base Rate Neglect: we ignore how rare the flu is in general (the base rate) and focus only on the most striking number. The CIA has a nifty article on this, and it explains how people often gravitate towards the easiest information available when making decisions.
So, we are left wondering. Is our assumption based on the 90% statistic right?
This is where Bayes’ Theorem comes in and helps us have a clearer picture. By using the theorem, we are forced to look at all data and update our hypothesis with new evidence. In the scenario, we are given two additional pieces of information that can help us come to a more precise probability of having the flu given our symptoms.
Let’s review all the information we do have before moving on.
- We know people with the flu have a headache and sore throat roughly 90% of the time.
- We know the probability of having the flu, in general, is only 5%.
- We know that 20% of the population will have a headache and sore throat at any given time.
To start, we always need to determine what we want to find. Here, we want to know the probability of having the flu given our current symptoms. Now that we know what we are solving for, we are going to tackle this problem in two ways. Depending on how you learn, you may prefer one over the other, and that is ok. People learn differently, and that is why we included both options.
Example 1.1 Visualize the Problem
To visualize the problem, we’ll draw two circles and merge them into a Venn diagram.
Circle #1: The area inside this circle represents all possible outcomes. In this example, the area represents all people who could get sick with the flu – in other words, the entire population. The shaded circle labeled “A” represents the 5% of the population who have the flu. Let’s step back now. What exactly does this mean? Within the circle is the entire population, and there are two possible outcomes for each person: having the flu or not having the flu. “A” is an event, and its probability is 5%. This probability is represented in our formula as P(A).
Circle #2: The area inside this circle also represents all possible outcomes. In this instance it represents all people who could have the symptoms – this is the entire population. The shaded circle labeled “B” represents the 20% of the population that does have the symptoms. What this means is that within the entire circle there are two possible outcomes: people have the symptoms or do not have the symptoms. “B” is an event, and its probability is 20%. This probability is represented in our formula as P(B).
Circle #3: In this circle, we have combined both events “A” and “B” – and this is where the magic happens!
Here is a quick breakdown of how you can read this:
- The white area inside this circle represents people who have neither the flu nor the symptoms.
- The area covered only by Circle A shows us people who have the flu but not the symptoms.
- The area covered only by Circle B shows us people who have the symptoms but not the flu.
Now, take a look at Circle B and see where it overlaps with Circle A. This is what we are really interested in! This is our question in visual form. We want to know the probability P(A|B) of having the flu given our symptoms. The area where both events occur together is called an intersection, and P(A|B) is the share of Circle B that this intersection takes up. Another way to look at it is like this: if we are in area B, what is the probability we are also in area AB (where A and B overlap)?
With both circles now merged, we can visually see our question and what we are trying to solve for. Although we won’t be solving the question with a Venn diagram, the diagram does help us visualize what we are trying to understand. If P(A) is the probability of you having the flu, and P(B) is the probability of you having your symptoms, what is the probability of you having the flu given that you have the symptoms? While we don’t yet know the actual answer, we can clearly see what we are trying to solve for.
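To make the regions of the merged diagram concrete, here is a minimal Python sketch that counts how many people would fall into each region. It assumes a hypothetical population of 1,000 people (a number chosen only for illustration, not taken from the scenario), and the variable names are our own. The overlap is sized using the 90% statistic from the scenario.

```python
# Hypothetical population size, chosen only to make the regions easy to count.
population = 1000

# Statistics from the scenario, written as decimals.
p_flu = 0.05                 # P(A): share of people with the flu
p_symptoms = 0.20            # P(B): share of people with the symptoms
p_symptoms_given_flu = 0.90  # P(B|A): share of flu sufferers with the symptoms

# Sizes of the two circles.
flu = round(p_flu * population)            # Circle A
symptoms = round(p_symptoms * population)  # Circle B

# The overlap (area AB): people who have both the flu and the symptoms.
both = round(p_symptoms_given_flu * flu)

# The remaining regions of the diagram.
flu_only = flu - both
symptoms_only = symptoms - both
neither = population - (flu_only + symptoms_only + both)

print("Circle A (flu):", flu)
print("Circle B (symptoms):", symptoms)
print("Overlap (flu and symptoms):", both)
print("Flu only:", flu_only)
print("Symptoms only:", symptoms_only)
print("Neither (white area):", neither)
```

Under these assumptions, Circle B holds 200 people and 45 of them sit in the overlap. Our question asks what share of Circle B that overlap represents.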
Example 1.2 Plugging In Bayes’ Formula and Solving
Now let’s solve the problem by using Bayes’ formula. For the sake of ease, we’ll begin by re-stating what we want to find.
Step 1: Determine what you want to find. Again, we are solving for the same thing we did above with the Venn diagram but are restating this for clarity. We want to know what the probability is of having the flu given our current symptoms.
Step 2: Write the above as a formula. Let’s translate what we are solving for into the formula. In other words, we’ll bring the language of Step #1 above into the formula.
Here is Bayes’ formula:
P(A|B) = P(B|A) × P(A) / P(B)
Now, let’s translate it into the terms of what we are solving for:
P(Flu | Symptoms) = P(Symptoms | Flu) × P(Flu) / P(Symptoms)
Step 3: Find each ingredient and label it. We have changed the ingredients provided in the scenario from percents into decimals, and we will do this every time before we begin to plug the ingredients into the formula. From the scenario, we know the following:
- P(A) – In our formula, this ingredient is represented as P(Flu) and answers the question: What is the probability of you having the flu? This number is .05.
- P(B|A) – In our formula, this ingredient is represented as P(Symptoms | Flu) and answers the question: What is the probability of you having the symptoms if you have the flu? This number is .9.
- P(B) – In our formula, this ingredient is represented as P(Symptoms) and answers the question: What is the probability of you having the symptoms? This number is .2.
Step 4: Plug each ingredient into the formula and solve.
P(Flu | Symptoms) = (.9 × .05) / .2 = .045 / .2 = .225, or 22.5%
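The same calculation can be sketched in Python. The function name `bayes` and its parameter names are our own choices for illustration; the three inputs are the ingredients labeled in Step 3. The check on P(B) is an added safeguard, not part of the original scenario: P(B) can never be smaller than P(B|A) × P(A), because everyone who has both the flu and the symptoms is also counted among the people with the symptoms.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) computed with Bayes' formula."""
    for p in (p_b_given_a, p_a, p_b):
        if not 0 <= p <= 1:
            raise ValueError("probabilities must be between 0 and 1")
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")

    joint = p_b_given_a * p_a  # P(A and B): the overlap of the two events

    # Small tolerance so floating-point rounding does not trigger the check.
    if joint > p_b + 1e-12:
        raise ValueError("P(B) cannot be smaller than P(B|A) * P(A)")

    return joint / p_b


# Ingredients from the flu scenario, written as decimals.
p_flu_given_symptoms = bayes(p_b_given_a=0.9, p_a=0.05, p_b=0.2)
print(f"P(Flu | Symptoms) = {p_flu_given_symptoms:.1%}")  # prints 22.5%
```

If the ingredients contradict each other (for example, a P(B) so small that the answer would exceed 100%), the sketch raises an error instead of returning an impossible probability.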
Conclusion: After plugging each ingredient into the formula, our answer is 22.5%. We can conclude that if you have a sore throat and headache, you have only a 22.5% probability of having the flu. Wow! Now, remember what Bayes’ Theorem does: it helps us update a hypothesis based on new evidence. Before considering the symptoms, the probability of having the flu was the general 5%; the symptoms raise it to 22.5%. Originally, we thought that the probability of having the flu was as high as 90%! This belief was based on our latching on to P(B|A). However, our answer P(A|B) is very different! The 22.5% we ended with is far more accurate than the 90% we started with.
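One way to see the update in action is to compare the starting probability with the final one. The sketch below uses our own variable names, and the “update factor” (simply P(B|A) divided by P(B)) is an informal label we introduce here for illustration rather than a term from this chapter.

```python
prior = 0.05      # P(Flu): probability before considering the symptoms
likelihood = 0.9  # P(Symptoms | Flu)
evidence = 0.2    # P(Symptoms)

# How strongly the symptoms shift the probability of having the flu.
update_factor = likelihood / evidence

# Bayes' formula, rearranged as the prior times the update factor.
posterior = prior * update_factor

print(f"Before the symptoms: {prior:.1%}")
print(f"Update factor:       {update_factor:.1f}")
print(f"After the symptoms:  {posterior:.1%}")
```

The symptoms make the flu 4.5 times more likely than the 5% starting point, which lands at 22.5%. That is still far below the 90% that P(B|A) alone seems to suggest.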
This problem is a fantastic illustration of the power that Bayes’ Theorem can give us when facing tough uncertainties. It is also a tweaked example of a questionnaire given to 1000 gynecologists. In the study, only 21% of gynecologists chose the correct answer while almost 50% chose the equivalent of our 90%! If you’d like to read more on this, Cornell University has a fantastic article.
Continue on to Chapter 5: Bayes’ Theorem Breathalyzer Example.