Probability Theory
Random Experiment
A random experiment is a physical situation whose outcome cannot be predicted until it is observed.
Sample Space
A sample space is the set of all possible outcomes of a random experiment.
Random Variables
A random variable is a variable whose possible values are numerical outcomes of a random
experiment. There are two types of random variables.
1. A Discrete Random Variable is one which may take on only a countable number of distinct values
such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts.
2. A Continuous Random Variable is one which may take on any value in an interval, i.e. an
uncountable number of possible values. Continuous random variables are usually measurements.
Probability
Probability is the measure of the likelihood that an event will occur in a Random Experiment.
Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates
impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is
that the event will occur.
Example
A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes
(“heads” and “tails”) are both equally probable; the probability of “heads” equals the probability of
“tails”; and since no other outcomes are possible, the probability of either “heads” or “tails” is 1/2
(which could also be written as 0.5 or 50%).
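As a quick sanity check, the coin example can be simulated. The short Python sketch below (the 100,000-toss sample size is an arbitrary choice of mine) shows the empirical frequency of heads settling near 0.5.

```python
import random

random.seed(0)            # fixed seed so the run is reproducible
n = 100_000               # number of simulated tosses of a fair coin
heads = sum(random.random() < 0.5 for _ in range(n))

print(heads / n)          # empirical probability of heads, close to 0.5
```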
Conditional Probability
Conditional Probability is a measure of the probability of an event given that (by assumption,
presumption, assertion or evidence) another event has already occurred. If the event of interest is A
and the event B is known or assumed to have occurred, the conditional probability of A given B is
usually written as P(A|B).
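The standard way to compute it is P(A|B) = P(A ∩ B) / P(B). The sketch below illustrates this on a fair die with two events chosen purely for illustration (they are not from the text): A = "roll is even" and B = "roll is greater than 3".

```python
from fractions import Fraction

outcomes = range(1, 7)                      # sample space of a fair die
A = {x for x in outcomes if x % 2 == 0}     # event A: roll is even
B = {x for x in outcomes if x > 3}          # event B: roll is greater than 3

p_B = Fraction(len(B), 6)                   # P(B) = 3/6
p_A_and_B = Fraction(len(A & B), 6)         # P(A and B) = 2/6

print(p_A_and_B / p_B)                      # P(A|B) = 2/3
```

Knowing B shrinks the sample space to {4, 5, 6}, of which two outcomes are even, which is why the answer is 2/3.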
Independence
Two events are said to be independent of each other if the probability that one event occurs in no
way affects the probability of the other event occurring; in other words, an observation about one
event does not change the probability of the other. For independent events A and B the following
holds:

P(A ∩ B) = P(A) × P(B)
Example
Let’s say you rolled a die and flipped a coin. The probability of getting any number face on the die in
no way influences the probability of getting a head or a tail on the coin.
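The same example can be checked by enumerating the joint sample space of (die face, coin side) pairs. The two events used below ("die shows 6" and "coin shows heads") are illustrative choices of mine; the check P(A ∩ B) = P(A) × P(B) holds for any such pair of events in this experiment.

```python
from fractions import Fraction
from itertools import product

# joint sample space: every (die face, coin side) pair is equally likely
space = list(product(range(1, 7), ["H", "T"]))

A = {s for s in space if s[0] == 6}     # event A: die shows 6
B = {s for s in space if s[1] == "H"}   # event B: coin shows heads

p = lambda ev: Fraction(len(ev), len(space))
print(p(A & B) == p(A) * p(B))          # True: the events are independent
```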
Conditional Independence
Two events A and B are conditionally independent given a third event C precisely if the occurrence of
A and the occurrence of B are independent events in their conditional probability distribution given
C. In other words, A and B are conditionally independent given C if and only if, given knowledge that
C already occurred, knowledge of whether A occurs provides no additional information on the
likelihood of B occurring, and knowledge of whether B occurs provides no additional information on
the likelihood of A occurring.
Example
A box contains two coins, a regular coin and one fake two-headed coin (P(H) = 1). I choose a
coin at random and toss it twice.
Let
A = First coin toss results in heads (H).
B = Second coin toss results in heads (H).
C = Coin 1 (regular) has been selected.
Without observing C, A and B are not independent: seeing a head on the first toss makes it more
likely that the two-headed coin was chosen, which in turn raises the probability of a head on the
second toss. If C is already observed, i.e. we already know whether the regular coin was selected or
not, then events A and B become independent, since the outcome of one toss no longer tells us
anything about the other.
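A short simulation of this two-coin experiment makes the distinction visible (a sketch; the 200,000 trials and the 50/50 choice between the coins are assumptions of mine). Unconditionally, P(B|A) is noticeably larger than P(B); conditioned on C, both are approximately 0.5.

```python
import random

random.seed(1)
trials = []
for _ in range(200_000):
    regular = random.random() < 0.5          # C: the regular coin was picked
    p_heads = 0.5 if regular else 1.0        # the fake coin always lands heads
    a = random.random() < p_heads            # A: first toss is heads
    b = random.random() < p_heads            # B: second toss is heads
    trials.append((regular, a, b))

def prob(pred, cond=lambda t: True):
    sub = [t for t in trials if cond(t)]
    return sum(pred(t) for t in sub) / len(sub)

# Unconditionally, P(B|A) > P(B): a first head is evidence for the fake coin.
print(prob(lambda t: t[2], cond=lambda t: t[1]), prob(lambda t: t[2]))
# Given C (regular coin), P(B|A) ~= P(B) ~= 0.5: conditionally independent.
print(prob(lambda t: t[2], cond=lambda t: t[0] and t[1]),
      prob(lambda t: t[2], cond=lambda t: t[0]))
```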
Expectation
The expectation of a random variable X is written as E(X). If we observe N random values of X, then
the mean of the N values will be approximately equal to E(X) for large N. In more concrete terms, the
expectation is what you would expect the outcome of an experiment to be on average if you
repeat the experiment a large number of times.
For example, consider a single roll of a fair six-sided die, where each face 1 to 6 has probability 1/6:

E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 3.5
So the expectation is 3.5 . If you think about it, 3.5 is halfway between the possible values the die
can take and so this is what you should have expected.
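The long-run-average interpretation can also be checked numerically. The sketch below (1,000,000 simulated rolls, an arbitrary choice of sample size) produces a sample mean very close to 3.5.

```python
import random

random.seed(0)
n = 1_000_000
rolls = [random.randint(1, 6) for _ in range(n)]   # simulate fair die rolls

print(sum(rolls) / n)   # sample mean, approximately 3.5 for large n
```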
Variance
The variance of a random variable X is a measure of how concentrated the distribution of the random
variable X is around its mean. It is defined as

Var(X) = E[(X − μ)²] = E(X²) − (E(X))², where μ = E(X).
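Continuing the die example, this definition can be evaluated exactly with fractions: for a fair six-sided die, Var(X) = 35/12 ≈ 2.92. A minimal sketch:

```python
from fractions import Fraction

faces = range(1, 7)
mean = Fraction(sum(faces), 6)                            # E(X) = 7/2
var = sum(Fraction(1, 6) * (x - mean) ** 2 for x in faces)

print(mean, var)    # 7/2 and 35/12 (about 2.92)
```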
Probability Distribution
A probability distribution is a mathematical function that maps each possible outcome of a random
experiment to its associated probability. Its form depends on whether the random variable X is
discrete or continuous.
1. Discrete Probability Distribution: The mathematical definition of a discrete probability function,
p(x), is a function that satisfies the following properties: p(x) ≥ 0 for every possible value x, and the
probabilities sum to one, i.e. Σ p(x) = 1. This is referred to as the Probability Mass Function (PMF); a
sketch follows below.
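A discrete distribution can be stored directly as a table from outcomes to probabilities. The sketch below uses the fair-die PMF as an illustration and checks the two defining properties.

```python
from fractions import Fraction

# PMF of a fair six-sided die: p(x) = 1/6 for x in {1, ..., 6}
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())   # property 1: p(x) >= 0
assert sum(pmf.values()) == 1              # property 2: probabilities sum to 1
```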
There are a number of operations that one can perform on any probability distribution to get
interesting results. Some of the important operations are listed below.
1. Conditioning/Reduction
If we have a probability distribution over n random variables X1, X2, ..., Xn and we make an
observation that k of the variables took certain values a1, a2, ..., ak (i.e. we already know their
assignment), then the rows in the joint distribution which are not consistent with the observation
can simply be removed, leaving us with a smaller number of rows. This operation is known as
Reduction; renormalising the remaining rows gives the conditional distribution (a sketch follows
below).
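In table form, reduction is just a row filter, followed by renormalisation if we want the conditional distribution. The sketch below uses a made-up two-variable joint distribution purely for illustration.

```python
# joint distribution P(X1, X2) as rows: (x1, x2) -> probability (illustrative numbers)
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

# observe X2 = 1: drop the inconsistent rows, then renormalise
reduced = {a: p for a, p in joint.items() if a[1] == 1}
total = sum(reduced.values())
reduced = {a: p / total for a, p in reduced.items()}

print(reduced)   # {(0, 1): 0.333..., (1, 1): 0.666...}
```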
2. Marginalisation
This operation takes a probability distribution over a large set of random variables and produces a
probability distribution over a smaller subset of those variables; it is known as marginalising out the
remaining variables. It is very useful when we have a large set of random variables as features and
we are interested in a smaller set of variables and how it affects the output. For example, for two
variables, P(X1) is obtained from P(X1, X2) by summing over the values of X2 (a sketch follows
below).
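Using the same table representation, marginalising X2 out of P(X1, X2) simply sums the probabilities over the values of X2 (same illustrative numbers as in the previous sketch).

```python
from collections import defaultdict

joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

# marginalise out X2: P(X1 = x1) = sum over x2 of P(x1, x2)
marginal = defaultdict(float)
for (x1, x2), p in joint.items():
    marginal[x1] += p

print(dict(marginal))   # {0: 0.5, 1: 0.5}
```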
Factor
A factor is a function or a table which takes a number of random variables {X_1, X_2,…,X_n} as an
argument and produces a real number as a output. The set of input random variables are called
scope of the factor. For example Joint probability distribution is a factor which takes all possible
combinations of random variables as input and produces a probability value for that set of variables
which is a real number. Factors are the fundamental block to represent distributions in high
dimensions and it support all basic operations that join distributions can be operated up on like
product, reduction and marginalisation.
Factor Product
We can take the product of two factors and the result is also a factor. For example, multiplying a
factor over (A, B) by a factor over (B, C) produces a factor over (A, B, C); a sketch follows below.
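A factor over discrete variables can be stored as a table from assignments to real numbers, and the factor product multiplies the entries that agree on the shared variable. The sketch below multiplies an illustrative phi1(A, B) with phi2(B, C); the variables and numbers are arbitrary choices of mine.

```python
from itertools import product

# factor phi1 over (A, B) and factor phi2 over (B, C); values are illustrative
phi1 = {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.1, (1, 1): 0.3}
phi2 = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.9, (1, 1): 0.4}

# factor product: phi3(a, b, c) = phi1(a, b) * phi2(b, c)
phi3 = {(a, b, c): phi1[(a, b)] * phi2[(b, c)]
        for a, b, c in product((0, 1), repeat=3)}

print(phi3[(0, 1, 0)])   # phi1(0, 1) * phi2(1, 0) = 0.8 * 0.9 = 0.72
```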