Bayes for Beginners
Chris Westbury
Department of Psychology, P220 Biological Sciences Bldg., University of Alberta,
Edmonton, AB, T6G 2E9, Canada.
E-mail: chrisw@ualberta.ca
This manuscript was written for my 4th year Psychometrics class. It is released under a Creative Commons
license: Attribution-NonCommercial-ShareAlike 2.5. You are free to copy, distribute, display, and (God
forbid!) perform the work under the following conditions:
- By Attribution. You must attribute the work to the author.
- Noncommercial. You may not use this work for commercial purposes.
- Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting
work only under a license identical to this one.
* For any reuse or distribution, you must make clear to others the license terms of this work.
* Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
This is a human-readable summary of the Legal Code that can be obtained from:
http://creativecommons.org/licenses/by-nc-sa/2.5/legalcode
Abstract
Bayes Rule is a simple way of calculating conditional probabilities. Although a great
deal has been written about the relevance of Bayes Rule in clinical settings, it is difficult
to find a single article that is both mathematically comprehensive and easily accessible to
students and professionals with clinical backgrounds. This article tries to fill that void, by
laying out the nature and implications of Bayes Rule in a way that requires little or no
background in probability theory. It builds on Meehl & Rosen's classic (1955) paper,
"Antecedent probability and the efficiency of psychometric signs, patterns, or cutting
scores", by laying out algebraic proofs that they simply allude to, and by providing
extremely simple and intuitively accessible examples of the concepts that they simply
assumed their reader understood.
A supplement to Meehl & Rosen's (1955) article is useful because the authors do make
mathematical claims without providing any detailed explanation of where they came from.
The present paper frames Meehl & Rosen's claims with a more basic introduction than they
give, and fills in some simple proofs that they only allude to.
The first section consists of a general introduction to understanding conditional
probabilities. The second section introduces Bayes Rule itself, in an historical and
mathematical setting. The third section lays out some implications of Bayes Rule that
follow as a direct result of its definition.
Conditional Probabilities
Conditional probabilities are probabilities whose value depends on some other
condition or event. Such probabilities are ubiquitous. For example, we may wish to
calculate the probability that a particular patient has a disease, given the presence of a
particular set of symptoms. The probability of disease may be more or less close to
certain, depending on the nature and number of symptoms. Or we may wish to calculate
the probability that a given hypothesis is true, given a diverse set of evidence (say, results
from several experiments) for and against it. Hypothesis testing is just one way of
assigning weight to belief. Conditional probabilities also come into play when we wish to
decide how much confidence to assign to a given belief.
A very simple example of conditional probability will elucidate its nature.
Consider a 6/49 lottery, in which players are invited to choose 6 out of 49 numbers, and
win a jackpot if their six numbers are chosen. The probability that any particular six
numbers will be drawn, in a particular order, is 1 in (49 x 48 x 47 x 46 x 45 x 44), or
1/10,068,347,520. These clearly are not very good odds: if you entered such a lottery every
day from your eighteenth to your eightieth birthday, you would still only have about one
chance in 445,000 of winning the jackpot.
To understand conditional probability, consider the question: how likely is it that
you would win the jackpot in a 6/49 lottery if you didn't have a ticket? It should be
obvious that the answer is zero: you certainly could not win if you didn't even have a
ticket. So the probability of winning a 6/49 lottery is really a conditional probability,
where your odds of winning are conditional on the number of tickets you have purchased.
If you have zero tickets, then you have no chance of winning. With one ticket, you have
one chance in 10,068,347,520 of winning. With two tickets, your odds will be twice as
good: two chances in 10,068,347,520.
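For readers who like to check such arithmetic by machine, the following short Python sketch (the variable names and the day count are merely illustrative) reproduces these figures:

```python
# Probability of matching six numbers drawn in order, without replacement, from 49
ordered_draws = 49 * 48 * 47 * 46 * 45 * 44        # 10,068,347,520 possible ordered draws
p_win_per_ticket = 1 / ordered_draws

# Conditional probability of winning, given the number of tickets held
def p_win(tickets):
    return tickets * p_win_per_ticket               # zero tickets means zero chance

days_of_play = 62 * 365                             # roughly every day from age 18 to age 80
print(p_win(0), p_win(1), p_win(2))
print(1 / (days_of_play * p_win_per_ticket))        # about 445,000
```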
We symbolize conditionality by using a vertical slash, |, which can be read as
"given". Then the odds of winning the 6/49 with one ticket could be expressed as
P(Winning | One ticket). There are many keywords in a problem's definition that may
(but need not necessarily) suggest that you are dealing with a problem of conditional
probability. Phrases like "given", "if", "with the constraint that", "assuming that", "under
the assumption that" and so on all suggest that there may be a conditional clause in the
problem.
One thing that sometimes confuses students of probability is the fact that all
probability problems are really conditional. Consider the simple probability question:
"What is the probability of getting a head with a coin toss?" The question implicitly
assumes that the coin is fair (that is, that heads and tails are equally probable), and should
really be phrased "What is the probability of getting a head with a coin toss, given that the
coin is fair?" Non-conditional probability problems conceal their conditional clause in the
background assumptions that either explicitly or implicitly limit the domain in which the
probability calculation is supposed to apply.
This observation sheds light on what conditionality actually does. A condition
always serves exactly this role: to limit the domain in which the non-conditional portion
of the question is supposed to apply. When you are asked "What is the probability of
getting a head with a coin toss?" you are supposed to understand that we are only
considering fair coins. When you are asked "What is the probability that you have disease
X, given that you have symptom Y?", you are supposed to understand that now the
probability calculation only applies to those people who do have symptom Y. An
appropriate way of thinking about conditional probability is to understand that a
conditional limits the number and kind of cases you are supposed to consider. You can
think of the vertical slash as meaning something like "ignoring everything to which the
following constraint does not apply". So "What is the probability of getting a head with a
coin toss, given that the coin is fair?" means "What is the probability of getting a head
with a coin toss, ignoring every coin to which the following statement does not apply:
the coin is fair?"
Bayes Rule and other methods of solving conditional probability questions are
simply mathematical means of limiting the domain across which a calculation is being
computed. To see that this is so, consider the following simple question:
Three tall and two short men went on a picnic with four tall and four short
women. What is P(Tall | Female), the probability that a person is tall, given that
the person is female?
The solution to this problem may be immediately obvious, but it is worth working
through a few ways of solving it. These are all formally the same, though they may
appear to be different.
The first way is just to turn the question into a very simple non-conditional
question that we know how to solve. Following the discussion above, the question can be
re-phrased to say "What is the probability that a person is tall, ignoring everyone who is
not a woman?" If we ignore the men, we have a really simple question, viz. "Four tall and
four short women went on a picnic. What is the probability that a woman who went on
the picnic was tall?" This is simple (that is, non-conditional) probability. Like any simple
probability question, it can be solved by dividing the number of ways the outcome of
interest (being tall) can happen by the number of ways any outcome in the domain
(being a woman) can happen. So: 4 tall women / (4 tall women + 4 short women) = 0.5
probability that a person on the picnic was tall, given that she was a woman.
A formally identical way of solving the same problem can be seen by drawing a 2
x 2 table such as the following:
HEIGHT / GENDER     FEMALE     MALE
TALL                   4         3
SHORT                  4         2
The condition "given that she was a female" means that we can simply ignore the
rightmost column of this table, the males, and act as if the question about the probability of
being tall only applied to the leftmost column, the women.
Here comes the tricky part. This diagram makes clear what the question is asking:
What is the ratio of people who are both tall and female (top left cell) to people who are
female (sum of left column)? We can re-state this and solve the problem in a third way by
asking: What is the ratio of the probability that a person is both female and tall to the
probability that a person is female? To see why, consider the concrete example again.
There were thirteen people on the picnic. Since 4 were tall females, the probability of
being a tall female is 4/13. Since 8 were females, the probability of being female was
8/13. The ratio of people who were both tall and female to people who were female is
therefore (4/13) / (8/13), or 4/8, or 50%. The reason this may seem tricky is that here we
consider the domain as a whole (all people who went on the picnic) and then take the
ratio of two probabilities in that domain.
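The same counts can be checked in a few lines of code. This Python sketch (an illustration only; the variable names are not part of the original example) computes P(Tall | Female) both by restricting the domain to the women and by taking the ratio of two probabilities over the whole picnic:

```python
# Picnic example: 3 tall men, 2 short men, 4 tall women, 4 short women
tall_men, short_men = 3, 2
tall_women, short_women = 4, 4
total = tall_men + short_men + tall_women + short_women        # 13 picnickers

# Method 1: ignore everyone who is not a woman, then use simple probability
p_tall_given_female = tall_women / (tall_women + short_women)  # 4/8 = 0.5

# Method 3: ratio of two probabilities taken over the whole domain
p_tall_and_female = tall_women / total                         # 4/13
p_female = (tall_women + short_women) / total                  # 8/13
assert abs(p_tall_given_female - p_tall_and_female / p_female) < 1e-12

print(p_tall_given_female)                                     # 0.5
```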
If you understand this third method of calculating the conditional probability, then
you will understand Bayes Rule. Bayes Rule is a way to automatically pick out this
very same ratio: the ratio of the probability of being in the cell of interest (in this case, the
cell of tall and female picnickers) to the probability of being in the sub-domain of interest
that is specified by the conditional clause (in this case, women, a subset of all the people
who went on the picnic).
Before we look at how the math works, let's introduce the rule itself.
Bayes Rule
Bayes Rule is very often referred to as Bayes Theorem, but it is not really a
theorem, and should more properly be referred to as Bayes Rule (Hacking, 2001). In
either case, it is so called because it was first stated (in a different form than we consider
here) by Reverend Thomas Bayes in his "Essay towards solving a problem in the doctrine
of chances", which was published in the Philosophical Transactions of the Royal Society
of London in 1764. Bayes was a minister interested in probability and stated a form of his
famous rule in the context of solving a somewhat complex problem involving billiard
balls that need not concern us here.
Bayes Rule has many analogous forms of varying degrees of apparent
complexity. This paper concerns itself almost entirely with the simplest form, which
covers the cases in which two sets of mutually exclusive possibilities A and B are
considered, and where the total probability in each set is 1. At the end of the paper we
will briefly examine how this most simple case is just a specific case of a more general
form of Bayes Rule. The simplest case covers many diagnostic situations, in which the
patient either has or does not have a disease (possibility set A) and either has or does not
have a set of symptoms (possibility set B). For such cases, Bayes Rule can be used to
calculate P(A | B), the probability that the patient has the disease given the symptom set.
Bayes Rule says that:
P(A | B) = P(B | A) P(A) / P(B)
P(A) is called the marginal or prior probability of A, since it is the probability of A prior
to having any information about B. Similarly, the term P(B) is the marginal or prior
probability of B. Because it does depend on having information about B, the term P(A |
B) is called the posterior probability of A given B. The term P(B | A) is called the
likelihood function for B given A.
In the third solution to the example above, we solved for the probability of being
tall, given that the person is female, by considering the ratio of those who were tall and
female to those who were female:
P(Tall | Female) = P(Tall & Female) / P(Female)
This suggests that Bayes Rule can also be stated in the following form:
P(A | B) = P(A & B) / P(B)
From this it should be evident, by equating the numerators of the two equations above,
that:
P(A & B) = P(B | A) P(A)
This is true by the definition of &. Let us try to understand why this is so, by again
considering the three tall and two short men who went on a picnic with four tall and four
short women. We have already convinced ourselves that P(Female & Tall) is 4/13, because
there are 4 people in the cell of interest and thirteen people in the problem's domain.
Let's see how the definition agrees with this answer. The definition above says that
P(Female & Tall) = P(Tall | Female)P(Female). P(Tall | Female), the probability of a
picnicker being tall given that she is female, is 4/8. P(Female) is 8/13, because eight of
the thirteen people on the picnic are females. 4/8 multiplied by 8/13 is 4/13.
Note that it is equally correct to write that:
P(A & B) = P(A | B) P(B)
In other words:
P(B | A)P(A) = P(A | B) P(B)
Let's see why, using the same example. Now we will see that P(Female & Tall) =
P(Female | Tall)P(Tall). P(Female | Tall), the probability of a picnicker being female
given that he or she is tall, is 4/7, because there are four tall females and seven tall people
altogether. P(Tall) is 7/13, because seven of the thirteen people on the picnic are tall. 4/7
multiplied by 7/13 is 4/13.
If you go back and look at the 2x2 table above, you should be able to understand
why these two calculations of P(A & B) must be the same. The first calculation picks out
the cell of tall females by column. The second picks it out by row. It doesn't matter if you
concern yourself with females who are tall or with tall people who are female: in the end you
must get to the same answer if you want to know about people who are both tall and
female. A tall female person is also a female tall person.
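A brief Python check, using the same picnic counts, confirms that both routes recover the same joint probability of 4/13 (illustrative only):

```python
total = 13                                  # people on the picnic
p_tall_and_female = 4 / total               # the cell of interest
p_female = 8 / total
p_tall = 7 / total

p_tall_given_female = 4 / 8                 # picking out the cell by column
p_female_given_tall = 4 / 7                 # picking out the cell by row

# Both routes recover the same joint probability, 4/13
assert abs(p_tall_given_female * p_female - p_tall_and_female) < 1e-12
assert abs(p_female_given_tall * p_tall - p_tall_and_female) < 1e-12
```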
So now we have
P(A | B) = P(B | A)P(A)/P(B) = P(A | B)P(B)/P(B)
Although either form will give the same answer, the first form is the canonical form of
Bayes Rule, for a reason that should be obvious: because the second form contains the
same element on the right, P(A | B), as the element on the left that we are trying to calculate. If
we already know P(A | B), then we don't need to compute it. If we don't know it, then it
will not help us to include it in the equation we will use to calculate it.
Bayes Rule can be easily derived from the definition of P(A | B), in the following
manner:
1.) P(A | B) = P(A & B) / P(B)          [ By definition ]
2.) P(B | A) = P(A & B) / P(A)          [ By definition ]
3.) P(A & B) = P(B | A) P(A)            [ Multiply both sides of 2 by P(A) ]
4.) P(A | B) = P(B | A) P(A) / P(B)     [ Substitute 3 into 1 ]
It might seem at first glance that Bayes Rule cannot be a very helpful rule,
because it says that to solve a conditional probability P(A | B) you have to know another
conditional probability P(B | A). However, Reverend Bayes' insight was that in many
cases the second probability is knowable when the first is not. In diagnostic cases where
we are trying to calculate P(Disease | Symptom) we often know P(Symptom | Disease),
the probability that you have the symptom given the disease, because these data have been
collected from previous confirmed cases. In scientific cases where we want to know
P(Hypothesis | Result), the probability that a hypothesis is true given some relevant
result, we may know P(Result | Hypothesis), the probability that we would obtain that
result given that the hypothesis is true. This is often statistically calculable, as when we
have a p-value.
Consider the following classic problem. A disease has a prevalence of 1 in 1,000. There
is a test for the disease that identifies everyone who actually has it, but that also has a
false positive rate of 5%. What is the chance that a randomly selected person with a
positive result actually has the disease?
When this question was posed to Harvard University medical students, about half said
that the answer was 95%, presumably because the test has a 5% false positive rate. The
average response was 56%. Only 16% gave the correct answer, which can be computed
with Bayes Rule in the following manner:
Let: P(A) = Probability of having the disease = 0.001
P(B) = Probability of positive test
= Sum of probabilities of all independent ways to get a positive test
= Probability of true positive + probability of false positive
= (True positive base rate x Percent correctly identified) + (Negative
Base Rate x Percent incorrectly identified)
= (0.001 x 1) + (0.999 x 0.05)
= 0.051
P(B | A) = Probability of positive test given disease = 1
Then: P(A | B) = P(B | A) P(A) / P(B)
= (1 x 0.001) / (0.051)
= 0.02, or 2%
Although the test is highly accurate, only 2% of the positive results it gives are in fact
correct. How can this be? The answer (and the importance of Bayes Rule in diagnostic
situations) lies in the highly skewed base rates of the disease. Since so few people
actually have the disease, the probability of a true positive test result is very small. It is
swamped by the probability of a false positive result, which is fifty times larger than the
probability of a true positive result.
You can concretely understand how the false positive rate swamps the true
positive rate by considering a population of 10,000 people who are given the test. Just
1/1000th of them, or 10 people, will actually have the disease and therefore a true positive
test result. However, 5% of the remaining 9,990 people, or about 500 people, will have a false
positive test result. So the probability that a person has the disease given that they have a
positive test result is 10/510, or 2%.
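Both routes to this answer can be reproduced with a short Python sketch (only an illustration; the variable names are mine):

```python
# Bayes Rule with the stated rates
p_disease = 0.001                       # base rate: 1 person in 1,000
p_pos_given_disease = 1.0               # the test catches every true case
false_positive_rate = 0.05

p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * false_positive_rate
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))    # about 0.02, i.e. 2%

# The same answer by counting heads in a population of 10,000 people
true_positives = 10_000 * p_disease                                  # 10 people
false_positives = 10_000 * (1 - p_disease) * false_positive_rate     # 499.5, about 500
print(true_positives / (true_positives + false_positives))           # about 0.02
```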
Many cases are more subtle. Consider another case cited by Meehl & Rosen (1955).
This involved a test to detect psychological adjustment in soldiers. The authors of the
instrument validated their test by giving it to 415 soldiers known to be well-adjusted, and
89 soldiers known to be maladjusted. The test correctly diagnosed 55% of the maladjusted
soldiers as maladjusted, and incorrectly diagnosed as maladjusted only 19% of the adjusted
soldiers. Since the true positive rate (55%) is much higher than the false positive rate
(19%), the authors believed their test was good. However, they failed to take into account
base rates. Meehl & Rosen did not know P(Maladjusted), the probability that a randomly
selected soldier was maladjusted, but they guessed that it might be as high as 5%. With
this estimate, we can use Bayes Rule as follows:
this estimate, we can use Bayes Rule as follows:
Let P(M) = Probability of being maladjusted = 0.05, by assumption
Let P(D) = Probability of being diagnosed as being maladjusted.
= Probability of true positive + probability of false positive
= (True positive base rate x Percent correctly identified) + (Negative
Base Rate x Percent incorrectly identified)
= (0.55*0.05) + (0.95 * 0.19)
= 0.208
P(D | M) = Probability of being diagnosed, given maladjustment.
= 0.55, as found by the authors.
P(M | D) = Probability of maladjustment given diagnosis as maladjusted
= P(D | M)P(M)/P(D)            [ Bayes Rule ]
= (0.55)(0.05)/0.208
= 0.13, or 13%
When base rates are taken into account, the probability that a soldier diagnosed as
maladjusted really is maladjusted is just 13%, not 55% as the true positive rate might
suggest. The test is still better than guessing that everyone is maladjusted: with that
strategy only 5% of positive diagnoses would be correct. However, note that the test's
diagnosis of maladjustment is much more likely to be wrong (87% probability) than right
(13% probability).
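The same calculation can be carried out in a few lines of Python (again only an illustrative sketch, using the 5% base-rate guess):

```python
p_maladjusted = 0.05            # assumed base rate
tp_rate = 0.55                  # P(diagnosed | maladjusted)
fp_rate = 0.19                  # P(diagnosed | adjusted)

p_diagnosed = p_maladjusted * tp_rate + (1 - p_maladjusted) * fp_rate    # 0.208
p_maladjusted_given_diagnosed = tp_rate * p_maladjusted / p_diagnosed
print(round(p_maladjusted_given_diagnosed, 2))                           # 0.13, i.e. 13%
```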
Of course we prefer to make diagnoses that are more likely to be right than
wrong. We can state this desire more formally by saying that we want the fraction of the
population that is correctly diagnosed as diseased to be greater than the fraction of the
population that is incorrectly diagnosed as diseased. Mathematically this leads to a useful
conclusion in the following manner:
Fraction diagnosed correctly > Fraction diagnosed incorrectly
Fraction diagnosed incorrectly / Fraction diagnosed correctly < 1
Let D = Diseased and S = Selected, that is, diagnosed as diseased (~ means not)
P(~D & S) / P(D & S) < 1                               [ Substitute symbols ]
[P(S | ~D)P(~D)] / [P(S | D)P(D)] < 1                  [ By definition of & ]
P(S | ~D) / P(S | D) < P(D) / P(~D)                    [ Multiply both sides by P(D)/P(~D) ]
In English this can be expressed as:
False positive rate / True positive rate < Positive base rate / Negative base rate
We need the ratio of positive to negative base rates to be greater than the ratio of the false
positive rate to the true positive rate, if we want to be more likely to be right than wrong.
This can be a handy heuristic because it allows us to calculate the minimum
ratio of diseased to non-diseased people that the population we are working with must
have in order for our diagnostic methods to be useful. In the example above, the ratio of
the false positive to the true positive rate is 0.19 / 0.55, or about 0.35. This means that the
test can only be useful (in the sense of having a positive diagnosis that is more likely to
be true than false) when it is used in settings in which the ratio of the number of
maladjusted people (positive base rate) to the number of people who are not maladjusted
(negative base rate) is at least 0.35.
Again we can consider another example from Meehl & Rosen (1955). Imagine
that you have a test that correctly identifies 80% of brain-damaged patients, but also
misidentifies 15% of non-brain-damaged people as brain-damaged. The calculation above
says that a positive diagnosis from this test will only be more likely right than wrong if the
ratio of brain-damaged to non-brain-damaged people is greater than 0.15 / 0.80, or about
0.19. If we are using the test in a setting which has a lower ratio of brain-damaged people,
we will run into the problem described above, in which we find that the base rates have
made it more likely that we are wrong than right when we make a diagnosis.
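A small Python helper (my own illustration, not part of Meehl & Rosen's treatment) makes the heuristic easy to apply; it also converts the required ratio of base rates into the equivalent minimum proportion of diseased people, using the fact that a ratio r corresponds to a proportion r / (1 + r):

```python
def minimum_base_rate(fp_rate, tp_rate):
    """Smallest base rate at which a positive diagnosis is more often right than wrong."""
    odds = fp_rate / tp_rate            # required ratio of positive to negative base rates
    return odds, odds / (1 + odds)      # the same threshold expressed as a proportion

print(minimum_base_rate(0.19, 0.55))    # soldiers: odds of about 0.35, base rate of about 26%
print(minimum_base_rate(0.15, 0.80))    # brain damage: odds of about 0.19, base rate of about 16%
```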
Note that the requirement given by this heuristic does not mean that the true
population base rate must be that high: it is sufficient for the base rate of the
subpopulation to which the test is exposed to be high enough. If the test is used in settings
(such as a mental clinic to which front-line physicians refer) that have a higher
concentration of maladjusted subjects than the general population, as a result of non-random
sampling of that population, then the test may be useful in that setting, even
though it would not be reliable if subjects were randomly selected from the population as
a whole.
This ability to skew true diagnosis rates in a favorable direction by pre-selecting
subjects has important implications. In most of the examples we have considered so far,
we have assumed low base rates. The implications of a conditional clause, such as the
probability that a person has a disease given a positive test result, become more
severe as the base rate moves away from 0.5. The further the base rate is from 50/50, the
further the posterior probability P(A | B) is taken from the simple hit rate, given by
taking the ratio of the true positive rate to the positive diagnosis rate (the sum of the true
and false positive rates).
Mathematically, we can see this by expanding the canonical form of Bayes Rule
given above, just as we did with the example of the maladjusted soldiers:
Let P(C) = Probability of belonging to the diagnostic category
Let TP = True positive rate = P(Diagnosed | C)
Let FP = False positive rate = P(Diagnosed | ~C)
Let B = Base rate of the diagnostic category = P(C)
Let P(D) = Probability of being diagnosed as belonging to the category
= Probability of true positive + probability of false positive
= (Base rate x True positive rate) + ((1 - Base rate) x False positive rate)
= (B * TP) + ((1 - B) * FP)
P(C | D) = Probability of belonging to the category given diagnosis
= P(D | C)P(C) / P(D)                          [ Bayes Rule ]
= (TP * B) / ((B * TP) + ((1 - B) * FP))       [ Substitute P(D) ]
= (TP * 0.5) / ((0.5 * TP) + (0.5 * FP))       [ Let the base rate B = 0.5 ]
= TP / (TP + FP)                               [ Divide numerator and denominator by 0.5 ]
This is a degenerate case of Bayes Rule, since the conditional probability collapses to the
simple unconditional probability that is given by the ratio of the probability of getting
diagnosed correctly to the probability of getting diagnosed at all, whether correctly or
not. One way of understanding what is happening in this case is to note that the true and
false positive rates are sampling from equally large sub-populations. When this is so, we don't
need to bother to weight their respective contributions to the conditional probability of
belonging to the category given a diagnosis.
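The collapse at a 50/50 base rate, and the growing gap as the base rate moves away from it, can be seen by evaluating the expanded formula at several base rates. The following Python sketch (illustrative only) reuses the 80% true positive and 15% false positive rates from the brain-damage example:

```python
def posterior(base_rate, tp_rate, fp_rate):
    """P(category | diagnosis), from the expanded form of Bayes Rule."""
    return (tp_rate * base_rate) / (base_rate * tp_rate + (1 - base_rate) * fp_rate)

tp, fp = 0.80, 0.15
print(tp / (tp + fp))                            # the simple hit rate, about 0.84
for b in (0.01, 0.10, 0.50, 0.90):
    print(b, round(posterior(b, tp, fp), 3))
# At a base rate of 0.5 the posterior equals the simple hit rate; elsewhere it diverges.
```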
A concrete example may make this interpretation more clear. Consider the
conditional probability of having blue eyes, given that you are female. Since eye color is
not a sex-linked character, the conditional is the same for both those who are in the group
of interest (females) and those who are not (males). You may be able to intuit in this case
that the conditional is therefore irrelevant: that is, the probability of being blue-eyed
given that you are female is just the same as the probability of being blue-eyed.
This degenerate case of exactly equal base rates with and without the character of
interest may occur only rarely, but the general principle illustrated by this case is of wider
relevance for the reason noted above: the further the positive and negative base rates are
from being equal, the greater the difference between the conditional probability that
depends on that base rate and the simple probability given by the ratio of the probability
of getting diagnosed correctly to the probability of getting diagnosed at all (that is, the
ratio of the true positives to the sum of the true and false positives).
Intuitively, this makes sense for the following reasons. Insofar as a disease is less
common, it becomes more likely that a larger portion of the positives are false positives,
as in the case considered above that bamboozled so many of the Harvard medical
students. By the same token, insofar as a disease is more common, it becomes more likely
that many of the negative diagnoses are false. As the base rate increases, it may eventually
come to exceed the overall accuracy of the test, at which point simply guessing that
everyone belongs to the category does better than the test, as discussed above.
Bayes Rule may be easily generalized to incorporate multiple pieces of evidence
bearing on a single belief, hypothesis, or diagnosis, or to incorporate multiple pieces of
evidence bearing on multiple beliefs, hypotheses, or diagnoses.
The simplest way to extend Bayes Rule is to note that the posterior probability
may depend on more than one piece of evidence. This is not really an extension at all, since we
noted at the beginning that what is given in a conditional may be a set of evidence rather
than a single piece. However, it is worth emphasizing this point, since so many of the
examples considered in this paper have treated the conditional as a single piece of
evidence. Given a belief, hypothesis, or diagnosis H, and a single relevant piece of
evidence E1, we have seen how to compute some new probability P(H | E1). If we get a
new piece of relevant evidence E2 that is independent of E1, we could as easily
calculate P(H | E2) for the same H. However, that calculation would not take into account
the fact that we already attached a certain level of probability to H because of the prior
evidence E1. To get that, we need to calculate P(H | E1 & E2).
For example, imagine trying to guess a single card drawn from a deck. If you know it
is red, then you have P(Guess | Red) = 1/26, because there are 26 red cards in a deck. If
you know it is a face card, you have P(Guess | Face) = 1/16, because there are four face
cards in each of the four suits, or sixteen face cards in all. If you know it is both a face
card and red, you need to calculate P(Guess | Face & Red) = 1/8, because there are eight
cards that are both red and a face card.
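Enumerating a deck makes these figures easy to verify. The Python sketch below (my own representation of the deck) follows the example in counting four face cards per suit, so that the ace is treated as a face card:

```python
from itertools import product

ranks = list(range(1, 14))                 # 1 = ace, 11 = jack, 12 = queen, 13 = king
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}
face_ranks = {1, 11, 12, 13}               # four "face" cards per suit, as in the example

deck = list(product(ranks, suits))         # 52 (rank, suit) pairs

def p_guess(keep):
    """Probability of guessing one particular card, ignoring cards that fail the condition."""
    return 1 / sum(1 for card in deck if keep(card))

print(p_guess(lambda c: c[1] in red_suits))                          # 1/26
print(p_guess(lambda c: c[0] in face_ranks))                         # 1/16
print(p_guess(lambda c: c[0] in face_ranks and c[1] in red_suits))   # 1/8
```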
A slightly more complex way of generalizing Bayes Rule comes about when
there is more than one competing hypothesis, diagnosis, or possibility to be considered.
In that case, evidence brought to bear in favor of any single hypothesis needs to be
considered in the context of the domain of all other competing hypotheses. In fact the
simple form of Bayes Rule we have considered in this paper does exactly this. We have
seen that P(H | E) = P(E | H) P(H) / P(E), where H is some hypothesis, diagnosis, or
possibility, and E is some evidence bearing on it. We have also seen in several examples
that the denominator P(E) (to be concrete, the probability of getting a positive diagnosis)
can be expanded into the sum of (the true positive rate * the positive base rate) and (the
false positive rate * the negative base rate). The two elements in this sum are just two
different hypotheses about where a positive diagnosis could have come from: it could
either have come from a mistaken diagnosis or a true diagnosis. If there were also a
possibility of a deliberately fraudulent diagnosis, we would have to add that in to our
calculation of the denominator, the total probability of a positive diagnosis.
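This expansion illustrates the general form of Bayes Rule that was alluded to earlier. When there is a set of mutually exclusive and exhaustive hypotheses H1, H2, ..., Hn and some evidence E bearing on them, the denominator is expanded into a sum over all of the hypotheses:

P(Hi | E) = P(E | Hi)P(Hi) / [ P(E | H1)P(H1) + P(E | H2)P(H2) + ... + P(E | Hn)P(Hn) ]

The simple diagnostic form used throughout this paper is just the special case in which there are only two hypotheses: belonging to the diagnostic category and not belonging to it.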
Conclusion
The goal of this paper has been to introduce conditional probability in general,
and Bayes Rule in particular, in a manner that is both comprehensive and accessible. If I
have succeeded, then P(Understanding Bayes Rule | Reading this paper) will be high. I
hope that it is.