Advanced Statistics and Probability

Which of the following is an arithmetic mid-value?
- mean
What is the numerical measure of average variability of a data set around the
mean – Standard Deviation
Which type of statistics helps us understand what the data looks like without
giving much information on analysis of the data – Descriptive Statistics
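How many different ways can four team members from the group of fifteen be
lined up for a photograph – 32760 (order matters in a line-up, so
15 × 14 × 13 × 12; 1365 is the number of unordered groups of four)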
Your team has 10 members and you need 3 of them for an app development.
How many possible combinations are there – 120
If a class consists of 20 males and 8 females, what is the probability of drawing
4 females without replacement – (8/28) … *(5/25)
In a hospital, 10% of patients have liver disease and 5% are alcoholics. Among
those diagnosed with liver disease, 7% are alcoholics. If a patient is an
alcoholic, what is the chance that they have liver disease? – 0.14
A watch manufacturer has two factories (FA, FB) and 60% of their watches are
made at FA. It is known that 10% of the watches made at FA and 15% of those made
at FB are defective. What is the probability that a selected defective watch was
manufactured at FB? – 0.5 (0.4 × 0.15 / (0.6 × 0.10 + 0.4 × 0.15) = 0.06 / 0.12)
Central limit theorem and central tendency are the same thing – false
An essential component of the Central Limit Theorem is that – All
Identify the variables that are continuous or discrete – time, weight:
continuous; color, country: discrete
What are the characteristics of normal distribution – mean lies at the center of
the distribution
When the Standard Deviation in a Normal Distribution is higher, which of the
following is true? – peak is lower
There may be times when data is supposed to fit a normal distribution, but
does not. Which of the following could be reasons for this? – Outliers and
small sample size
Which of the following can be modelled using Poisson distribution – Number
of accidents
Which of the following conditions are satisfied by Poisson random variable? –
Number of successes in two disjoint intervals is independent
In which of these examples, binomial distribution can be used? – probability of
getting certain number of questions
Which of the following characteristics correspond to a Binomial Distribution?
– all
Any statement whose validity is tested on the basis of a sample is called:
statistical hypothesis
The significance level is the risk of: - Rejecting H0 when H0 is correct
If a finding is statistically significant one must also interpret the data, calculate
an effect size indicator, and make an assessment of practical significance – true
After a clinical trial, it is concluded that both drugs A and B are equally
effective. What type of a hypothesis is this – Null
The probability of rejecting the null hypothesis when it is true is called: - Level
of significance
If you reject H0 but H0 is true, what type of error has occurred? – Type I
Assuming innocence until proven guilty, a Type I error occurs when an
innocent person is found guilty – true
A passing student is failed by an examiner, it is an example of – Type I error
Which variable represents the actual Type I error – p-value
A Type I error is also known as a – False Positive
The shape of the t-distribution depends upon the: - Degrees of freedom
Which of the following is not the purpose of student's t-distributions – when
modelling
Which of the following is not the purpose of using chi-square distributions –
To test how closely data follows
Which of the following is not a characteristic of F-distribution curve - the
curve is left skewed
A statistical test used to compare 2 or more group means is known as – one
way analysis
A post hoc test is – ANOVA
The ANOVA is a statistical test that is used to compare how many group
means – Two or more
The analysis of variance is a statistical test that is used to compare how many
group means – two or more
Identify which of the following steps would not be included in hypothesis
testing – Eliminate all Outliers
A failing student is passed by an examiner, it is an example of – Type II error
The use of the laws of probability to make inferences and draw statistical
conclusions about populations based on sample data is referred to as –
Inferential Statistics
The p-value in statistical significance testing should be used to assess how
strong a relationship is. For example, if relationship A has a p=.04 and
relationship B has a p=.03 then you can conclude that relationship B is
stronger than relationship A – false
Confidence intervals become narrower by increasing the – Degrees of Freedom
A door alarm works in 72 out of 100 cases and surveillance camera works in
68 out of 100 cases. What is the probability of effective screening techniques
keeping in mind that these two methods can be used together – 0.49
Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called – Null
An advertising agency wants to test the hypothesis that the proportion of
adults in a country who read a Sunday Magazine is 25 percent. The null
hypothesis is that the proportion reading the Sunday Magazine is – Equals
25%
In which examples could binomial distribution be used – Modelling the number
A good way to get a small standard error is to use a –large sample
A statistician calculates a 95% confidence interval for Mean when Standard
Deviation is known. The confidence interval is Rs.18000 to Rs.22000, the
amount of the sample mean is – 20000
The dividing point between the region where the null hypothesis is rejected
and the region where it is not rejected is said to be – Critical Value
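The counting and conditional-probability questions in this quiz can be checked numerically. The sketch below uses only the numbers given in the questions; the variable names are illustrative. A line-up is ordered, so it is counted with permutations, while choosing a team is unordered and is counted with combinations.

```python
from math import comb, perm

# Ordered line-up of 4 people chosen from 15 (order matters -> permutations).
lineups = perm(15, 4)

# Unordered team of 3 chosen from 10 (order does not matter -> combinations).
teams = comb(10, 3)

# Bayes' theorem: P(liver | alcoholic) = P(alcoholic | liver) * P(liver) / P(alcoholic)
p_liver_given_alcoholic = 0.07 * 0.10 / 0.05

# Total probability for a defective watch, then Bayes for P(FB | defective).
p_fa, p_fb = 0.60, 0.40
p_def_fa, p_def_fb = 0.10, 0.15
p_defective = p_fa * p_def_fa + p_fb * p_def_fb
p_fb_given_defective = p_fb * p_def_fb / p_defective

print(lineups, teams, round(p_liver_given_alcoholic, 2), round(p_fb_given_defective, 2))
```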
Before We Start the Journey!

Understanding the concept of multivariate data analysis requires some fundamentals
like:
- What is univariate data analysis?
- What is bivariate data analysis?
Univariate Data Analysis - Explained!

"Use of only one variable to describe the data is known as Univariate data analysis."
Consider an example in which we analyse only the age of the people living in a
colony and summarise the data as follows:
Name    Age
Ram     21
Sandy   34
Jack    26
Here we considered the variable "age" only.
Bivariate Data Analysis - Explained!

"Use of two variables to describe the data and finding out the relationship between them
is known as bivariate data analysis."
Bivariate Data Analysis - A Puzzle

"How will you predict the thermal expansion of an iron rod due to a known change in
temperature? Assume that expansion depends only on the temperature."
Let us try to solve this scenario with bivariate data analysis, in which we measure two
quantities for each observation.
- Since we will tag the variables as x and y, let one observation be represented
as (x, y).
Let,
x = Temperature variation in the atmosphere
y = Expansion / contraction of the material due to the temperature variation.

Puzzle Unpuzzled!
The observations can be arranged in a table of (x, y) pairs.
From these observations, we analyse how the thermal expansion of the iron rod
changes with temperature.
Why Multivariate Data Analysis?

In the previous scenario, we saw how the temperature factor alone affected the length of
the iron rod. In the real world, many factors like density, volume and molecular defects can alter the
expansion rate.
Whenever data is generated on a large scale, a need arises to learn something
meaningful from it, as we have many variables to handle. This led to developments in
the field of statistics to meet the following requirements:
- Recognising patterns within the data.
- Identifying hidden rules and trends in the data.
- Identifying the relationships between several variables / groups of variables in the
data.
"Multivariate data analysis" provides powerful tools with which we can
analyse the factors discussed above.
Multivariate Statistics
We often deal with statistical problems that involve many variables in the
study. The number can even run to thousands of variables!
Multivariate statistics deals with simultaneous observation and analysis of several
variables and the relations between them. The main purpose is to find out how they
interact with each other.
What is Multivariate Data Analysis?

Multivariate data analysis is one of the applications of multivariate statistics. It is all about
techniques for analysing several variables together.
It consists mainly of three phases, which we will explore in detail:
- Categorization of variables
- Dimensionality reduction
- Cause-effect relationship
Let's Dive into the Phases!

Categorization of variables:
Variables are grouped into categories so that computations on the data become easier to run.
Dimensionality reduction:
Reducing the complexity and the number of variables without losing the properties of the
variables is known as dimensionality reduction.
Cause-effect relationship:
The effect that one variable has on the value of another variable present in the
analysis is termed the cause-effect relationship.
What is a Multivariate Random Variable?

A multivariate random variable is a set of unknown variables. The value of each variable is
unknown either because the event has not yet occurred or because the value is still unclear.
For example, consider a cuboid with dimensions length, width and
height, from which we find the unknown quantity volume. The multivariate
random variable is then the set (L, W, H), and the volume derived from it is:
V = (L*W*H)
Multivariate Statistics - Applications

Which fields are using multivariate statistics?
- It is widely used in pharmaceuticals to study several drugs together.
- In medical science, it is used to study several readings from the human body
together.
- Climate and weather studies require analysing several variables together.
The applications are endless; you will explore them as you learn more about the subject.
Simpson as an Amalgamation Paradox!

Simpson's paradox is also known as the amalgamation paradox because a trend that is
present when the groups are analysed separately disappears or reverses
when the groups of data are analysed in combined form!
Simpson's Paradox
Sometimes, a multivariate analysis produces a misleading result because some
lurking variable is left unanalysed but seriously affects the relationship between the
analysed variables.
Hence, lurking variables are unanalysed variables that are not taken into consideration during
data analysis but seriously affect the result of the analysis!
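A small numerical sketch may make the reversal concrete. The counts below are illustrative assumptions (they are not taken from this text): two treatments, A and B, are compared within two groups defined by a lurking variable called "severity" here.

```python
# Illustrative counts (assumed): (successes, trials) for treatments A and B,
# split by a lurking variable "severity".
data = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials

# Within each group, treatment A has the higher success rate.
for group, arms in data.items():
    print(group, {t: round(rate(*arms[t]), 3) for t in arms})

# Combined over the groups, the ordering reverses: B looks better than A.
totals = {
    t: tuple(sum(data[g][t][i] for g in data) for i in (0, 1))
    for t in ("A", "B")
}
overall = {t: rate(*totals[t]) for t in totals}
print("overall", {t: round(r, 3) for t, r in overall.items()})
```

The reversal happens because the two treatments were applied to the groups in very different proportions, which is exactly the kind of information a lurking variable hides.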
Lurking Variable!
"Lurking variable" takes its name from the word lurk, which means to remain hidden,
waiting for something to happen!
Hence, it is a kind of variable that remains hidden during the analysis of data, but its
presence seriously affects the relationship between other variables. This is why lurking
variables need to be identified and analysed in detail.
Lurking Variable - an example!

Let us take the case of the iron rod that we heated! In bivariate data analysis, we were
constrained to only two variables. We studied the same scenario in multivariate data analysis
and found that several other factors were controlling the expansion behaviour.
But this is not the end of the story!
We might find a different expansion trend if we include a
hidden variable in the analysis. Here, the hidden variable is whether or not the iron rod had
been preheated earlier. If the iron rod was preheated earlier, then there will
be less expansion for the same temperature as compared to the normal expansion at that
temperature.
Dependent and Independent Variables

Before we start discussing the methods of multivariate analysis, take a minute
to look at an example of the factors that separate the different types of analysis.
"Suppose a scientist started experimenting on a new kind of bike. He arranged things
so that throttling the bike would accelerate it but at the same time cool
down the engine's temperature by some degree! He named the variable denoting the degree
of throttle t. The gear value would also affect the engine's temperature. Hence,
he named the variable denoting the gear value g."
"The variable c denoted the temperature of the engine. He noted down the corresponding
temperature for each degree of throttle. Finally, he found that an increase in the value of t and
a decrease in the value of g resulted in a decrease of c."
Dependent and Independent Variables (continued)

Dependent variables are the variables of the experiment whose variation is analysed. Here, c was the
dependent variable.
Independent variables are the variables that act as inputs to the experiment. They are
changed to different values to test their impact on the dependent variable! Here, t and g were the
independent variables.
Classification of Multivariate Analysis Methods

Based on the dependencies of variables, there are two kinds of analysis methods:
- Multivariate analysis in which a relationship exists between a dependent
variable and independent variable/s. For example: partial least squares
regression, multiple regression analysis, multiple discriminant analysis etc.
- Multivariate analysis in which no defined relationship exists between
dependent variable/s and independent variable/s. For example: cluster
analysis etc.
Before we Dive in!

Before we learn about partial least
squares regression, the following questions need an explanation:
- What is regression analysis?
- What is least squares regression analysis?
Regression Analysis - A Scenario!

Let us consider a scenario:
A boy is planning to make a paper aeroplane that he wants to send to his neighbour! He did
extensive research and noted down several factors that could decide the trajectory of the
paper plane, i.e. wing span, wind speed, weight, angle of inclination, humidity etc. Still, he is
unable to decide which factor will have the greatest impact and which factor the least! What
do you think could be the solution?
Regression Analysis - Decoded!

Regression analysis can be thought of as a mathematical technique for sorting out
how much impact each variable in the analysis has on the
result! It also describes how the variables interact with each other.
In the case of the boy with the paper aeroplane, the independent variables are wing
span, wind speed, weight, angle of inclination and humidity, which he feels could
have an impact on the result he wants to achieve, i.e. the trajectory. Hence, the
factor trajectory acts as the dependent variable in the analysis.
What is Least Squares Regression?

Least squares means fitting the regression line by minimizing the sum of the squares of the
distances between the actual and predicted values.
Problems like estimating the manufacturing cost, given the total number of units to be
manufactured, can be handled with least squares regression analysis.
We find the least squares regression line by finding the slope and intercept that make the
residuals as small as possible. Residuals are the errors of the line: the differences between
the observed values and the values the line predicts, which arise because the data do not lie
exactly on a straight line.
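As a minimal sketch of the idea, the following fits a straight line to a small data set of units manufactured against total cost. The data values are hypothetical and chosen only for illustration; the slope and intercept use the standard closed-form least squares formulas.

```python
# Hypothetical data: units manufactured (x) and total cost (y).
units = [10, 20, 30, 40, 50]
cost = [125, 210, 310, 395, 510]

n = len(units)
mean_x = sum(units) / n
mean_y = sum(cost) / n

# Slope and intercept that minimise the sum of squared residuals.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(units, cost))
sxx = sum((x - mean_x) ** 2 for x in units)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(units, cost)]
sse = sum(r ** 2 for r in residuals)

print(f"cost = {intercept:.2f} + {slope:.2f} * units, SSE = {sse:.2f}")
```

With an intercept in the model the residuals sum to zero, and any other line through the same data gives a larger sum of squared residuals.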
What is Partial Least Squares Regression?

Partial least squares regression is a technique used to predict dependent variables
from a large number of independent variables.
The goal of partial least squares regression is to determine the dependent variables from
the independent variables (also known as predictors). In statistical terms, it can be said that
we predict Y from X.
The predictors are first reduced to a smaller set of uncorrelated components. Once the
reduction to a smaller set is accomplished, least squares regression is done
on the fresh components. This technique is helpful when we have collinear predictors /
independent variables.
Principal component analysis is a type of multivariate data analysis for reducing a large
number of correlated variables into a smaller set of uncorrelated variables. These
uncorrelated variables are called principal components, also known as principal modes of variation.
An Example
Let us discuss principal component analysis with an example:
A research firm decides to perform principal component analysis on a Mars rover being
designed by them! The following characteristics are studied:
A. Speed and acceleration
B. Recharge time
C. Power consumption
D. Solar recharge time
E. Brake power and deceleration
Since it is difficult to analyse all the variables at once, they want to reduce them to a smaller
number of uncorrelated variables. They find that characteristics A & E can be
studied as a single component "Velocity". Similarly, B, C & D can be studied as "Battery
life". Now we have only two components to analyse. This is the main goal of principal
component analysis.
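The reduction of two correlated variables to one component can be sketched with a covariance matrix and its eigenvalues. The measurements below are hypothetical, and the closed-form eigenvalue formula used here is specific to the 2 × 2 case; a real analysis with many variables would normally rely on a linear algebra library.

```python
import math

# Hypothetical measurements of two correlated rover characteristics.
speed = [2.0, 3.1, 4.2, 5.0, 6.1, 7.2]
brake = [1.9, 3.0, 4.1, 5.2, 5.9, 7.1]

n = len(speed)
mx, my = sum(speed) / n, sum(brake) / n

# Sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum((x - mx) ** 2 for x in speed) / (n - 1)
syy = sum((y - my) ** 2 for y in brake) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in zip(speed, brake)) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix in closed form.
trace, det = sxx + syy, sxx * syy - sxy ** 2
root = math.sqrt(trace ** 2 / 4 - det)
lam1, lam2 = trace / 2 + root, trace / 2 - root

# Unit eigenvector for the largest eigenvalue: the first principal component.
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Share of the total variance captured by the first component.
explained = lam1 / (lam1 + lam2)
print(f"PC1 direction = ({pc1[0]:.3f}, {pc1[1]:.3f}), explained variance = {explained:.1%}")
```

Because the two variables move together, almost all of the variance lies along a single direction, so one component can stand in for both.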
Ocean in the Capsule!

Principal component analysis preserves the essence of the original data while compressing it, just
as a television with a two-dimensional screen can represent a three-dimensional
scene while keeping most of the information!
Principal Components Visualized!

Before we start discussing principal components, let's look at the concept
behind them.
You already know about dimensions, right? Yes, we are talking about the same 1-D, 2-D
and 3-D! Now let us look at how they lead to principal components with the help of an
example!
Plotting Points in 1-D, 2-D & 3-D

Let us study the scenario of distance, speed & volume:
- Distance is a scalar quantity and can be represented on a single number line.
  Distance (in km) - {1,4,3}
- Speed can be represented on a 2-D plot of distance against time (speed is distance divided by time).
  Distance (in km) - {4,7,8,12,10,13}
  Time (in hr) - {2,3,4,5,6,7}
- Volume can be represented on a 3-D plot as the product of length, width and height.
  Length (in m) - {2,4,6}
  Width (in m) - {2,4,6}
  Height (in m) - {2,4,6}
Multivariate data analysis helps us to__ - both of these
Multivariate data analysis is application of__ - all of the options
What is multivariate statistics? - all the options
Independent variables refers to those variables__ - which act as an input in the experiment
Use of only one variable to describe the data is known as__ - univariate data analysis
Key to Treasure!
"Future is hidden in the past". Hence, we will explore some basics that are necessary to
understand the CDF:
- Discrete and continuous random variables
- Probability mass function
What are Random Variables?

- A random variable assigns a numerical value to each possible outcome of an experiment!
- It is denoted by X.
- It is also known as a stochastic variable.
A random variable X for an experiment with:
A as the sample space and,
B as the set of real number values, can be represented as
X : A → B
Discrete Random Variables

Let us take an example of rolling two dice together. Then the outcomes (sample space)
will be:
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5),
(2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5),
(4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5),
(6,6)}
Hence,
Outcomes in which the second die shows "6" = {(1,6), (2,6), (3,6), (4,6), (5,6), (6,6)}
or, N(second die shows 6) = 6 outcomes
A count such as N can take only whole-number values. So, N is a discrete variable.
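The two-dice experiment is small enough to enumerate directly. The sketch below lists the 36 ordered outcomes, counts those whose second die shows a 6, and tabulates a discrete random variable N defined here (as an illustration) as the number of sixes in an outcome.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

# Outcomes in which the second die shows a 6.
second_is_six = [o for o in sample_space if o[1] == 6]

# Discrete random variable N: number of sixes in an outcome (0, 1 or 2).
counts = Counter(o.count(6) for o in sample_space)
pmf = {k: Fraction(v, len(sample_space)) for k, v in sorted(counts.items())}

print(len(sample_space), len(second_is_six))
for k, p in pmf.items():
    print(f"P(N = {k}) = {p}")
```

N takes only the values 0, 1 and 2, and the probabilities attached to those values add up to 1.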
Continuous Random Variables

Continuous random variables can take an infinite number of values in a given experiment.
Hence the name continuous!
For example, if the height growth of a plant is studied, then the height can take an infinite
number of values within an interval.
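Probability Mass Function
It is a function giving the probabilities of the values of a discrete random variable.

Probability Mass Function - Continued
For example, let X be a discrete random variable and let its range be:
R = {x1, x2, x3, ...}
Then, the probability of the event X = xk is represented as:
p(xk) = P(X = xk), for k = 1, 2, 3, ...
The probability mass function is also known as the discrete probability
distribution. It is sometimes loosely called a probability density function, although strictly
that term applies to continuous random variables.
The sum of all the probabilities equals 1.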
An Example for Continuous Case!

A pizza advertised as weighing 0.25 kg will almost never weigh exactly 0.25 kg!
Hence, a randomly ordered pizza may weigh 0.23 kg or 0.27 kg.
What is the probability that a randomly ordered pizza weighs between 0.20 and
0.30 kg? In terms of probability, if X denotes the weight of a randomly
selected pizza in kg, then what is P(0.20 < X < 0.30)?
Why Cumulative Distribution Function (CDF)?

What would we do if we had to find P(X ≤ i), where i is the given value up to which we
have to find the probability? For these kinds of situations, we take the help of the cumulative
distribution function.
CDF for Discrete Random Variables

For a discrete random variable X, the cumulative distribution function is a
function F(y) giving the probability that "X is less than or equal to y".
Hence,
F(y) = P(X ≤ y)
or,
F(y) = ∑ p(x), summed over all x ≤ y
Let us take an example of tossing two coins together. Then the sample space will be:
{(H,H), (H,T), (T,T), (T,H)}
Let X be the number of heads. According to the definition, the probability of
at most one head, i.e. P(X ≤ 1) for "X is less than or equal to 1", is:
P(X ≤ 1) = 1/4 + 1/4 + 1/4 = 3/4
In this case, the CDF value is F(1) = 3/4.
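The two-coin example can be reproduced by building the probability mass function of the number of heads and accumulating it. This is a minimal sketch; the helper name `cdf` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins; X counts the heads.
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, len(outcomes))

# PMF of X: add up the probability of every outcome with the same head count.
pmf = {}
for o in outcomes:
    heads = o.count("H")
    pmf[heads] = pmf.get(heads, 0) + p_each

def cdf(y):
    """F(y) = P(X <= y), the running sum of the PMF up to y."""
    return sum(p for x, p in pmf.items() if x <= y)

for y in (0, 1, 2):
    print(f"F({y}) = {cdf(y)}")
```

The CDF never decreases as y grows, and it reaches 1 at the largest possible value of X.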

The cumulative distribution function for continuous random variables is often the more natural
function, as most of the time our dataset is continuous in nature! Here, we replace the
summation "∑" by the integral "∫".
CDF for continuous random variables - continued
Hence, the CDF for a continuous random variable with probability density function f can be represented as:
F(x) = P(X ≤ x) = ∫ f(t) dt, integrated from −∞ to x
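For a continuous variable, the probability of an interval is the difference of two CDF values. The sketch below returns to the pizza question; the choice of a normal distribution with mean 0.25 kg and standard deviation 0.02 kg is purely an assumption for illustration, since the text does not specify a distribution.

```python
from statistics import NormalDist

# Assumption for illustration only: weights are normally distributed with
# mean 0.25 kg and standard deviation 0.02 kg.
weight = NormalDist(mu=0.25, sigma=0.02)

# P(a < X < b) = F(b) - F(a), where F is the CDF.
p_between = weight.cdf(0.30) - weight.cdf(0.20)
print(f"P(0.20 < X < 0.30) = {p_between:.4f}")
```

Under a different assumed distribution the number changes, but the rule F(b) − F(a) stays the same.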
Summary
We learnt about the cumulative distribution function for discrete distributions as well as for
continuous distributions.
Now we know that we can write the CDF as:
F(x) = P[X ≤ x] = α
for a discrete distribution:
F(x) = ∑ p(t), summed over all t ≤ x
and for a continuous distribution, the CDF can be written as:
F(x) = ∫ f(t) dt, integrated from −∞ to x
The Basics
Don't think much about the title "Kernel Density Estimation" now!
We will explore the basics that are prerequisites for "KDE", like:
- What is a parameter?
- What is estimation?
- What is a probability density function?
- What is density estimation?
- What is non-parametric density estimation?
What is Parameter?
A parameter is a measurable quantity telling something useful about the population! Ex:
mean, median, mode, standard deviation etc.
Note: Do not confuse parameters with variables!
The number of variables is decided by us during the experiment, whereas the number of
parameters remains constant.
What is Estimation?
In simple words, estimation is the process of estimating parameters. It lets us draw
conclusions about the population from a sample.
Estimation can be subdivided as:
- Point estimation
- Interval estimation
Point estimation:
Point estimation is estimation that can be represented as a single value.
The single value is known as a "statistic", which represents the best guess for the
unknown parameter.
Interval estimation:
Interval estimation is estimation that is represented by two numbers, which act as the
endpoints of an interval of plausible values for the unknown parameter.
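The difference between the two kinds of estimate can be sketched with a sample mean and a 95% confidence interval. The sample values and the known population standard deviation below are hypothetical; the interval uses the normal (z) critical value, which is appropriate only when the standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample; the population standard deviation is assumed known.
sample = [19500, 20500, 21000, 18800, 20200, 19900, 20600, 19500]
sigma = 800.0

# Point estimate: a single value (the sample mean).
point_estimate = mean(sample)

# Interval estimate: 95% confidence interval centred on the sample mean.
z = NormalDist().inv_cdf(0.975)
margin = z * sigma / sqrt(len(sample))
interval = (point_estimate - margin, point_estimate + margin)

print(point_estimate, interval)
```

The interval is symmetric about the point estimate, so the sample mean can always be recovered as the midpoint of the two endpoints. A larger sample shrinks the margin and narrows the interval.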
Density Estimation - Explained!
Density estimation can be defined as the construction of a probability density
function, which is at first unknown to us, with the help of observed data points
that are the samples.
In other words, density estimation is the statistical process of estimating the probability density
function of a population from the set of observations.
"Density estimation is like filling the missing gaps!"

What is Non Parametric Density Estimation?
In nonparametric estimation, we make no assumptions about the form of the distribution, and hence
there are no associated parameters. Since no parametric assumption is involved, the density estimation would
involve estimation of an infinite number of parameters!
There is no Central limit theorem in nonparametric estimation.
"Nonparametric estimation can be thought of as an infinite-parametric estimation."

Kernels are small weighting functions, also known as "window functions". A window function takes
non-zero values only within an interval. Outside the interval, its value is zero.
Kernels are used in nonparametric density estimation, in the form of kernel density estimation, to
estimate the density functions of random variables!
Kernel Density Estimation- An Introduction

Kernel density estimation is one of the implementations of nonparametric density
estimation. It estimates the probability density function directly from the data.
It is also known as the Parzen–Rosenblatt window method.
Kernel Density Estimation - Explained!

The Histogram
The simplest way to estimate probability densities is with a histogram. The histogram is
constructed by dividing the range into sub-intervals known as "bins". Whenever a new data point falls in
a sub-interval, one block is added on top of that bin. The width of the bins may be set to 1, and it stays
constant during the plotting process.
The drawbacks of this estimation process are that the plot is not smooth and that it
depends on the width of the bins.
The Better Way!
When each block of the histogram is centred over its data point to refine the
histogram, the result is known as the box kernel density estimate, in which we use a
discontinuous kernel. It may be represented as in the
above plot.
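A box kernel density estimate can be sketched in a few lines. The function name `box_kde`, the data values and the bandwidth below are illustrative assumptions; each data point contributes a box centred on itself, and the boxes are averaged.

```python
def box_kde(data, x, bandwidth):
    """Box-kernel density estimate at x.

    Each data point contributes a box of width 2 * bandwidth and
    height 1 / (2 * bandwidth) centred on itself.
    """
    inside = sum(1 for d in data if abs(x - d) <= bandwidth)
    return inside / (len(data) * 2 * bandwidth)

# Hypothetical observations.
data = [1.0, 1.3, 2.1, 2.2, 2.4, 3.8]

# Evaluate the estimate on a grid and check that it integrates to about 1.
step = 0.001
grid = [i * step for i in range(-1000, 6001)]
density = [box_kde(data, x, bandwidth=0.5) for x in grid]
area = sum(density) * step

print(f"density at 2.2 = {box_kde(data, 2.2, 0.5):.3f}, area = {area:.3f}")
```

Unlike a histogram, the estimate does not depend on where the bin edges fall, but it still depends on the bandwidth and is still not smooth; smooth kernels such as the Gaussian address the second point.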
Amalgamation paradox is also known as__Simpson's paradox
What is done when new data is added in the sub-interval? – one bin is added
to the top
If the area under the PDF curve is zero, then__probability = 0
Probability mass function is also known as__Probability density function
Least number of coordinates required to showcase a point is__dimension
Lurking variable remains__hidden
Principal component analysis reduces__a large number of correlated variables
What is the drawback of using Kernel density estimation's Histogram method?
–
What is Prior Probability?

As the name suggests, a prior probability is what we can say in general about an event before we have
data related to it.
When the probability distribution of an uncertain quantity is specified in the absence of
data, i.e. without evidence, it is called the prior probability distribution or
simply the prior.
For example, the probability distribution of the number of people who will attend a wedding party,
estimated after the invitations are sent, can be called a prior probability distribution.
Prior Probability - An Example!

"Suppose we are interested in studying people suffering from Pseudobulbar affect, a
kind of emotional disorder in which people cannot control crying or laughing, among
the age group of 20-50 years old."
Let,
Event X = Pseudobulbar affect
M = Males and F = Females
N = Total number of people being studied

Here in this case the prior probabilities are:
P(M) = 0.3
P(F) = 0.7
The Prerequisite - Likelihood Function and Normalizing Constant!
Likelihood function:
The likelihood function is a function of the parameter, with the observed data held fixed.
Let there be a parameter having the value θ and observed data X.
Then, the likelihood function is:
L(θ | X) = P(X | θ)
Normalizing constant:
The normalizing constant scales a non-negative function so that it becomes a probability density
function having total probability equal to "1".
What is Posterior Probability?

The posterior probability of a random event is the conditional probability of the
event after the evidence is taken into consideration! It is the probability distribution of
the unknown quantity, conditional on the evidence from the experiment.
If the parameter θ and the evidence X of the experiment are given, then the posterior
probability is written as p(θ | X).
By Bayes' theorem, the posterior probability can be written as:
p(θ | X) = p(X | θ) p(θ) / p(X)
where,
p(θ) is the prior probability distribution,
p(X | θ) is the likelihood and,
p(X) is the probability of the evidence X.
Posterior Probability - An Example Puzzle

"Let us study the same case of Pseudobulbar affect of the previous example."
Let,
Event X = Pseudobulbar affect
M = Males and F = Females
N = Total number of people being studied
The following prior probabilities and likelihoods are given:
P(M) = 0.3
P(F) = 0.7
P(X | M) = 0.7 (70% of males suffer from Pseudobulbar affect)
P(X | F) = 0.4 (40% of females suffer from Pseudobulbar affect)

Hence, we are interested in finding:
The probability that a person is male, and the probability that a person is female, given X!
This can be represented as:
Posterior probabilities P(M | X) and P(F | X).
Puzzle Solved!
Now let's consider the definition of posterior probability.
From the given data, we know:
p(X | θ), i.e. p(X | M) in our case and,
p(θ), i.e. p(M) in our case.
We need to find out p(X). Hence,
P(X) = P(X | M) P(M) + P(X | F) P(F)
or, P(X) = (0.7 * 0.3) + (0.4 * 0.7)
or, P(X) = 0.21 + 0.28
or, P(X) = 0.49
Puzzle Unpuzzled!
Putting the value in the equation:
p(M | X) = (0.7 * 0.3) / 0.49 = 0.43
and, p(F | X) = (0.4 * 0.7) / 0.49 = 0.57
Hence, our posterior probabilities p(M | X) and p(F | X) are found!
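The same calculation can be written as a short script. The priors and likelihoods are the values given in the example; the dictionary names are illustrative.

```python
# Priors and likelihoods taken from the example above.
prior = {"M": 0.3, "F": 0.7}
likelihood = {"M": 0.7, "F": 0.4}  # P(X | group)

# Evidence P(X) by the law of total probability (the normalizing constant).
evidence = sum(likelihood[g] * prior[g] for g in prior)

# Bayes' theorem: P(group | X) = P(X | group) * P(group) / P(X)
posterior = {g: likelihood[g] * prior[g] / evidence for g in prior}

print(round(evidence, 2), {g: round(p, 2) for g, p in posterior.items()})
```

Dividing by the evidence is what makes the two posterior probabilities add up to 1.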
What is Markov Process?

A Markov process is a random process in which past events do not affect future events once we have
knowledge of the present event! Hence, predictions about the future of the process can be made
by knowing only the present state.
Important points about Markov processes are:
- The process is named after Andrei Markov.
- A Markov process can be defined in discrete time or in continuous time.
What is Random Walk?
A random walk is a kind of process whose outcome we cannot predict in advance, as it
is the result of a series of random movements! It is the path created by random events in
succession.
For example, the path travelled by water molecules is a random walk. A possible
random walk is shown above in the picture.
Here we can see transitions between three states A, B and C. A Markov chain gives the
probability of each transition from one state to another. Note that a
state can also transition / hop to itself. For example, state A can hop to state B or C.
State A can also hop to state A.
In this three-state representation, if all three transitions out of a state are equally likely,
each has probability 1/3 ≈ 0.33.
Stochastic Process and Markov Chain
Let there be a sequence of events. The outcome at any stage of the
sequence is governed by some probability. This probability distribution
over the path / sequence is called a stochastic process.
Now let this stochastic process be finite, i.e. the number of events in the sequence is finite, and let the
outcome at any stage depend only on the outcome of the previous stage. Such a
stochastic process is termed a Markov chain!
The Scenario of People Riding in Train!

Suppose we are studying metro train ridership in a city. The following data is available:
- 30% of the population riding the train in a given year stop riding the next
year.
- 20% of the population not riding the train start riding the next year.
- 5000 people ride the train
- 10,000 people do not ride the train
We have to find the distribution of riders and non-riders in the next year.
Scenario Analysed!
We should determine the number of people riding the train next year. According to the
data:
Population who ride the train next year = b1
or, b1 = 5000(0.7) + 10000(0.2) = 5500
Also,
Population who don't ride the train next year = b2
or, b2 = 5000(0.3) + 10000(0.8) = 9500

Then the matrix equation is:
[b1]   [0.7  0.2] [ 5000]   [5500]
[b2] = [0.3  0.8] [10000] = [9500]
This is an example of a Markov process.
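The yearly update is a matrix-vector product, so it can be repeated to look further ahead. In this sketch the transition matrix is built from the percentages in the scenario; the function name `step` is illustrative.

```python
# Column-stochastic transition matrix: columns are "from" states
# (rider, non-rider), rows are "to" states (rider, non-rider).
P = [
    [0.7, 0.2],
    [0.3, 0.8],
]
state = [5000.0, 10000.0]  # current riders, non-riders

def step(matrix, vector):
    """Apply one year of transitions: next = matrix * vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

next_year = step(P, state)
print(next_year)

# Repeating the step shows how the distribution drifts in later years.
two_years = step(P, next_year)
print(two_years)
```

Each column of the matrix sums to 1, so the total population is preserved at every step; only its split between riders and non-riders changes.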
Things that you learnt!

You started the journey by exploring multivariate data analysis, and along the way you learnt
about the following topics:
- Cumulative distribution function
- Kernel density estimation
- Prior probability
- Posterior probability
- Markov process and Markov chain
Hope you had a nice time completing the journey on Advanced Statistics and Probability!
Features of probability density function are__ - all the options
What is kernel? – all the options
What is posterior probability? – conditional probability
Which estimation can be represented by a single value? – Point estimation
__ is an example of multivariate analysis in which a relationship exists between a
dependent variable and independent variable/s. – Partial least squares regression
