MLT Assignment 1

ASSIGNMENT-1

1. What are random variables? Explain different types of the random variables.
Ans: A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables: discrete and continuous.

Discrete Random Variables

A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, .... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, and the number of defective light bulbs in a box of ten.

Continuous Random Variables

A continuous random variable is one which can take an uncountably infinite number of possible values, typically any value in an interval. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.
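As a quick illustration of the distinction, here is a minimal Python sketch that samples one random variable of each type; the binomial model for defective bulbs and the normal model for heights are illustrative assumptions, not part of the definitions above.

import numpy as np

rng = np.random.default_rng(seed=0)

# Discrete: the number of defective bulbs in a box of ten,
# modelled here (an assumption) as Binomial(n=10, p=0.05).
defective = rng.binomial(n=10, p=0.05, size=5)
print(defective)   # only the counts 0..10 are possible

# Continuous: a person's height in centimetres,
# modelled here (an assumption) as Normal(mean=170, sd=8).
heights = rng.normal(loc=170.0, scale=8.0, size=5)
print(heights)     # any real value in a range can occur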
2. Explain Probability Density function with suitable example.
Ans: Probability Density Function

The probability density function is the probability function defined for a continuous random variable. The probability density function is also called the probability distribution function or probability function. It is denoted by f(x).

Conditions for a valid probability density function:

Let X be the continuous random variable with a density function f(x). Then f(x) must satisfy:

1. f(x) ≥ 0 for all x, and
2. ∫ f(x) dx = 1, where the integral runs over the entire range of X.
Example:

Check whether the given probability density function is valid or not:

f(x) = 4x³ for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Here, the function 4x³ is greater than or equal to 0 on the interval [0, 1]. Hence, the condition f(x) ≥ 0 is satisfied.

Consider the total probability:

∫₀¹ 4x³ dx = [x⁴]₀¹ = 1

Hence the condition ∫ f(x) dx = 1 is also satisfied.

Therefore, the given function is a valid probability density function.
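The same check can be done numerically. Below is a minimal sketch (assuming, as above, f(x) = 4x³ on [0, 1]) that verifies both conditions with scipy:

import numpy as np
from scipy.integrate import quad

f = lambda x: 4 * x**3

# Condition 1: f(x) >= 0 on the support.
xs = np.linspace(0.0, 1.0, 1001)
print(np.all(f(xs) >= 0))     # True

# Condition 2: the total probability integrates to 1.
total, _ = quad(f, 0.0, 1.0)
print(total)                  # 1.0 (up to numerical precision)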

3. Explain need of Bayesian probabilities with the help of suitable example.


Ans: Bayes' theorem is a mathematical equation used in probability and statistics to calculate
conditional probability. In other words, it is used to calculate the probability of an event based on
its association with another event. The theorem is also known as Bayes' law or Bayes' rule.
Formula for Bayes' Theorem

There are several different ways to write the formula for Bayes' theorem. The most common
form is:

P(A ∣ B) = P(B ∣ A)P(A) / P(B)

where A and B are two events and P(B) ≠ 0

P(A ∣ B) is the conditional probability of event A occurring given that B is true.

P(B ∣ A) is the conditional probability of event B occurring given that A is true.

P(A) and P(B) are the probabilities of A and B each occurring on its own (the marginal probabilities).
Example

You might wish to find a person's probability of having rheumatoid arthritis if they have hay
fever. In this example, "having hay fever" is the test for rheumatoid arthritis (the event).

 A would be the event "patient has rheumatoid arthritis." Data indicates 10 percent of
patients in a clinic have this type of arthritis. P(A) = 0.10
 B is the test "patient has hay fever." Data indicates 5 percent of patients in a clinic have
hay fever. P(B) = 0.05
 The clinic's records also show that of the patients with rheumatoid arthritis, 7 percent
have hay fever. In other words, the probability that a patient has hay fever, given they
have rheumatoid arthritis, is 7 percent. P(B ∣ A) = 0.07

Plugging these values into the theorem:

P(A ∣ B) = (0.07 * 0.10) / (0.05) = 0.14

So, if a patient has hay fever, their chance of having rheumatoid arthritis is 14 percent; it is still unlikely that a random patient with hay fever has rheumatoid arthritis.
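The calculation above is easy to wrap in a small function. This is a minimal sketch; the function name bayes is just an illustrative choice:

def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B), requiring P(B) != 0."""
    return p_b_given_a * p_a / p_b

# Values from the rheumatoid arthritis / hay fever example:
print(bayes(p_b_given_a=0.07, p_a=0.10, p_b=0.05))  # 0.14, i.e. 14 percent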

4. Explain the characteristics of normal distribution.


Ans: CHARACTERISTICS OF THE NORMAL PROBABILITY DISTRIBUTION:-

1. A continuous variable - the normal probability distribution reflects the distribution of a continuous variable, which can take any numerical value: whole numbers (for example, 101 centimeters), numbers with fractions (for instance, 101.25 centimeters), and positive or negative numbers (although there are no negative numbers in our example).
2. The height reflects a probability - the height of the curve above each number reflects the chance of that number occurring compared to the other numbers. The further away we get from the center (either to the left or the right), the smaller the chance of the occurrence.
3. The center is the expectation - the center result reflects the average, and the chance of getting it is higher than for any other number. The reason the center result is the average is that the curve is symmetric around the center: for every result to the right of the center that contributes to increasing the average, there is a result at the same distance to the left with an equal chance of occurring, which contributes an equivalent amount toward decreasing the average.
4. Symmetry - the normal probability distribution is symmetric relative to the average. This
means that the chances of obtaining a result exceeding the average by 10 is equal to the
chance of receiving a result that is smaller than the average by 10.
5. The probabilities are known in advance.
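Characteristics 2 through 4 above can be verified numerically. Below is a short sketch using scipy.stats.norm; the mean of 100 and standard deviation of 15 are illustrative assumptions, not values from the text:

from scipy.stats import norm

mu, sigma = 100.0, 15.0

# Characteristic 2/3: the density peaks at the center (the expectation).
print(norm.pdf(mu, mu, sigma) > norm.pdf(mu + 10, mu, sigma))   # True

# Characteristic 4: exceeding the mean by 10 is exactly as likely
# as falling short of it by 10.
print(norm.sf(mu + 10, mu, sigma))   # P(X > mu + 10)
print(norm.cdf(mu - 10, mu, sigma))  # P(X < mu - 10), the same value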

5. Explain Logistic Regression with suitable example.


Ans: Logistic Regression is one of the basic and popular algorithms for solving a classification problem. It is named ‘Logistic Regression’ because its underlying technique is much the same as that of Linear Regression. The term “Logistic” is taken from the Logit function that is used in this method of classification.

Logistic Regression Algorithm

Logistic Regression uses the Sigmoid function.

An explanation of logistic regression can begin with an explanation of the standard logistic function. The logistic function is a Sigmoid function: it takes any real-valued input and maps it to a value between zero and one. It is defined as

σ(t) = 1 / (1 + e^(−t))

If we plot it, the graph is an S-shaped curve.

Let’s consider t as a linear function of a single explanatory variable x in a univariate regression model:

t = β₀ + β₁x

So the Logistic Equation becomes

p(x) = 1 / (1 + e^(−(β₀ + β₁x)))

Now, when the logistic regression model comes across an outlier, it will usually take care of it, but sometimes the fitted S curve will shift to the left or right along the x axis depending on the outlier's position.
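To make the equations above concrete, here is a minimal sketch of the sigmoid and the resulting univariate logistic model; the coefficients β₀ = −4 and β₁ = 2 are illustrative assumptions:

import numpy as np

def sigmoid(t):
    """Maps any real value t into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

beta0, beta1 = -4.0, 2.0            # assumed coefficients
x = np.array([0.0, 1.0, 2.0, 3.0])

p = sigmoid(beta0 + beta1 * x)      # P(y = 1 | x)
print(p)                            # rises along an S curve from ~0 to ~1
labels = (p >= 0.5).astype(int)     # threshold at 0.5 to classify
print(labels)                       # [0 0 1 1]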

6. Differentiate between Regression and Classification.

Ans: The difference between regression machine learning algorithms and classification machine learning algorithms sometimes confuses data scientists, which can lead them to implement the wrong methodology when solving their prediction problems.

Andreybu, who is from Germany and has more than 5 years of machine learning experience, says
that “understanding whether the machine learning task is a regression or classification problem is
key for selecting the right algorithm to use.”

Let’s start by talking about the similarities between the two techniques.

Supervised machine learning

Regression and classification are categorized under the same umbrella of supervised machine
learning. Both share the same concept of utilizing known datasets (referred to as training datasets)
to make predictions.

In supervised learning, an algorithm is employed to learn the mapping function from the input variable (x) to the output variable (y); that is, y = f(x).

The objective of such a problem is to approximate the mapping function (f) as accurately as
possible such that whenever there is a new input data (x), the output variable (y) for the dataset
can be predicted.

[Chart: the different groupings of machine learning]


Unfortunately, that is where the similarity between regression and classification machine learning ends.

The main difference between them is that the output variable in regression is numerical (or
continuous) while that for classification is categorical (or discrete).

Regression in machine learning

In machine learning, regression algorithms attempt to estimate the mapping function (f) from the
input variables (x) to numerical or continuous output variables (y).

In this case, y is a real value, which can be an integer or a floating point value. Therefore, regression prediction problems usually involve quantities or sizes.

For example, when provided with a dataset about houses and asked to predict their prices, that is a regression task because price will be a continuous output.

Examples of the common regression algorithms include linear regression, Support Vector
Regression (SVR), and regression trees.

Some algorithms, such as logistic regression, have the word “regression” in their names, but they are not regression algorithms; logistic regression is a classification algorithm.
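The contrast can be shown in a few lines of scikit-learn. In this sketch the tiny house dataset is invented purely for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[50], [80], [120], [200]])       # e.g. house size in sq. m

# Regression: the target is a continuous quantity (price).
prices = np.array([150_000, 240_000, 360_000, 600_000])
reg = LinearRegression().fit(X, prices)
print(reg.predict([[100]]))                    # a numeric prediction

# Classification: the target is a category (0 = cheap, 1 = expensive).
labels = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([[100]]))                    # a class label, 0 or 1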

7. Explain Bayes' Theorem.


Ans: Bayes' theorem is a mathematical equation used in probability and statistics to calculate
conditional probability. In other words, it is used to calculate the probability of an event based on
its association with another event. The theorem is also known as Bayes' law or Bayes' rule.
Formula for Bayes' Theorem
There are several different ways to write the formula for Bayes' theorem. The most common
form is:

P(A ∣ B) = P(B ∣ A)P(A) / P(B)

where A and B are two events and P(B) ≠ 0

P(A ∣ B) is the conditional probability of event A occurring given that B is true.

P(B ∣ A) is the conditional probability of event B occurring given that A is true.

P(A) and P(B) are the probabilities of A and B each occurring on its own (the marginal probabilities).

Example

You might wish to find a person's probability of having rheumatoid arthritis if they have hay
fever. In this example, "having hay fever" is the test for rheumatoid arthritis (the event).

 A would be the event "patient has rheumatoid arthritis." Data indicates 10 percent of
patients in a clinic have this type of arthritis. P(A) = 0.10
 B is the test "patient has hay fever." Data indicates 5 percent of patients in a clinic have
hay fever. P(B) = 0.05
 The clinic's records also show that of the patients with rheumatoid arthritis, 7 percent
have hay fever. In other words, the probability that a patient has hay fever, given they
have rheumatoid arthritis, is 7 percent. P(B ∣ A) = 0.07

Plugging these values into the theorem:

P(A ∣ B) = (0.07 * 0.10) / (0.05) = 0.14

So, if a patient has hay fever, their chance of having rheumatoid arthritis is 14 percent; it is still unlikely that a random patient with hay fever has rheumatoid arthritis.

8. Explain Perceptron algorithm.


Ans: A perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network.

The perceptron is a linear classifier (binary). It is used in supervised learning and helps to classify the given input data.

The perceptron consists of 4 parts:

1. Input values or One input layer


2. Weights and Bias
3. Net sum
4. Activation Function

The perceptron works in these simple steps:

a. All the inputs x are multiplied by their weights w; call each product k (multiplying inputs with weights).

b. Add all the multiplied values together and call the result the Weighted Sum (adding with summation).

c. Apply that weighted sum to the chosen Activation Function, for example the Unit Step Activation Function, which outputs 1 when the sum reaches the threshold and 0 otherwise.
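Here is a minimal numpy sketch of these steps, including the classic error-driven weight update; the AND-gate data, learning rate, and epoch count are illustrative assumptions:

import numpy as np

def unit_step(z):
    # Step c: the unit step activation function.
    return 1 if z >= 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            # Steps a-b: weighted sum of inputs plus bias, then step c.
            pred = unit_step(np.dot(w, xi) + b)
            # Error-driven update: only changes weights on a mistake.
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

# Learn the logical AND function (assumed toy data):
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([unit_step(np.dot(w, xi) + b) for xi in X])  # [0, 0, 0, 1]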

9. Explain the terms Expectation, probability, covariance, correlation with suitable examples.

Ans: Variance - squaring Expectations to measure change

In mathematically rigorous treatments of probability we find a formal definition that is very enlightening. The variance of a random variable X is defined as:

Var(X) = E(X²) − E(X)²


Which is so simple and elegant that at first it might not even be clear what's happening. In plain
English this equation is saying:
Variance is the difference between when we square the inputs to Expectation and when we
square the Expectation itself.

Covariance - measuring the Variance between two variables

Mathematically, squaring something and multiplying something by itself are the same. Because of this we can rewrite our Variance equation as:

Var(X) = E(XX) − E(X)E(X)

This version of the Variance equation would have been much messier to illustrate even though it means the same thing. But now we can ask the question "What if one of the Xs were another Random Variable?", so that we would have:

Cov(X, Y) = E(XY) − E(X)E(Y)

And that, simpler than any drawing could express, is the definition of Covariance, Cov(X, Y). If Variance is a measure of how a Random Variable varies with itself, then Covariance is the measure of how one variable varies with another.

Correlation - normalizing the Covariance

Covariance is a great tool for describing the variance between two Random Variables. But this new measure is only really useful when talking about these variables in isolation. Imagine we define 3 different Random Variables on a coin toss:

A(ω) = 2 if ω is Heads, 1 if ω is Tails
B(ω) = 20 if ω is Heads, 10 if ω is Tails
C(ω) = 200 if ω is Heads, 100 if ω is Tails

Now visualize that each of these is attached to the same sampler, such that each receives the same event at the same point in the process. All three variables rise and fall together perfectly, yet (for a fair coin) Cov(A, B) = 2.5 while Cov(A, C) = 25: the covariances differ only because of scale. Dividing the Covariance by the product of the standard deviations normalizes this scale away, and the resulting Correlation is 1 for every pair.

 Expectation, E(X), is the outcomes of a Random Variable weighted by their probability.
 Variance is the difference between the Expectation of a squared Random Variable and the Expectation of that Random Variable squared: Var(X) = E(XX) − E(X)E(X).
 Covariance, Cov(X, Y) = E(XY) − E(X)E(Y), is the same as Variance, only two Random Variables are compared rather than a single Random Variable against itself.
 Correlation, Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)), is just the Covariance normalized.
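A quick simulation makes the point concrete. This sketch samples the coin toss, builds A, B, and C as defined above, and confirms that the covariances differ while every correlation is 1 (the seed and sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(seed=0)
tosses = rng.integers(0, 2, size=100_000)        # 0 = Tails, 1 = Heads

A = np.where(tosses == 1, 2, 1)
B = np.where(tosses == 1, 20, 10)
C = np.where(tosses == 1, 200, 100)

print(np.cov(A, B)[0, 1], np.cov(A, C)[0, 1])            # ~2.5 vs ~25
print(np.corrcoef(A, B)[0, 1], np.corrcoef(A, C)[0, 1])  # both ~1.0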

10. Solve the following


a. A die is rolled, find the probability that an even number is obtained.

b. Two coins are tossed, find the probability that two heads are obtained.
Note: Each coin has two possible outcomes H (heads) and T (Tails).
c. Two dice are rolled, find the probability that the sum is
i) equal to 1
ii) equal to 4
iii) less than 13
d. A card is drawn at random from a deck of cards. Find the probability of getting
the 3 of diamond.
e. A card is drawn at random from a deck of cards. Find the probability of getting a
queen.
f. A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a marble is
drawn from the jar at random, what is the probability that this marble is white?

Ans: a) The possible outcomes are 1, 2, 3, 4, 5, 6, so the number of possible outcomes is 6. The favourable outcomes when a die is rolled are 2, 4, 6, so the number of favourable outcomes is 3.

probability = number of favourable outcomes / number of possible outcomes = 3/6 = 1/2

Ans: 1/2
b) When two coins are tossed simultaneously, the possible outcomes are {HH, HT, TH, TT}, where H denotes head and T denotes tail. Therefore, a total of 4 equally likely outcomes are obtained on tossing two coins simultaneously.

Consider the event of obtaining two heads. The only favourable outcome is {HH}, so the event has 1 favourable outcome out of 4.

Therefore, the probability of obtaining two heads = 1/4.


c)

i) The smallest possible sum with two dice is 1 + 1 = 2, so a sum equal to 1 is impossible: probability = 0.

ii) There are six faces for each of two dice, giving 36 possible outcomes. If the two dice are fair,
each of 36 outcomes is equally likely.

Three outcomes sum to 4: (1+3), (2+2) and (3+1).

Probability of getting a sum of 4 on one toss of two dice is 3/36, or 1/12.

iii) With two fair six-sided dice marked 1 through 6, the largest possible sum is 6 + 6 = 12. Every one of the 36 equally likely outcomes therefore gives a sum less than 13, so the event is certain: probability = 36/36 = 1.

d) Number of cards in a deck = 52. Therefore, the sample space n(S) = 52.

Number of diamonds in a deck = 13, and the number of diamond cards showing 3 = 1.

Let E be the event of getting the 3 of diamonds. The number of favourable outcomes n(E) = 1.

Therefore, the probability of getting the 3 of diamonds = n(E)/n(S) = 1/52.

e) Given that there are 4 Queens in a deck of 52 cards, the probability of drawing a Queen on a single draw is 4/52 ≈ 0.0769, meaning there is about a 7.69% chance of drawing a Queen.

f) The jar contains 3 + 7 + 10 = 20 marbles in total, so there are 20C1 = 20 equally likely ways to draw one marble. The number of ways to draw a white marble from the 10 white marbles is 10C1 = 10.

Thus, the probability of drawing a white marble out of 20 is 10C1/20C1 = 10/20 = 1/2.
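These answers can be sanity-checked with a short Monte Carlo simulation. The sketch below uses numpy and a fixed seed (arbitrary choices) to estimate each probability:

import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000

die = rng.integers(1, 7, size=n)
print(np.mean(die % 2 == 0))             # (a) ~0.5

coins = rng.integers(0, 2, size=(n, 2))  # 1 = heads
print(np.mean(coins.sum(axis=1) == 2))   # (b) ~0.25

d1 = rng.integers(1, 7, size=n)
d2 = rng.integers(1, 7, size=n)
print(np.mean(d1 + d2 == 4))             # (c ii) ~1/12 = 0.0833
print(np.mean(d1 + d2 < 13))             # (c iii) exactly 1.0

marbles = rng.integers(0, 20, size=n)    # 0-9 white, 10-16 green, 17-19 red
print(np.mean(marbles < 10))             # (f) ~0.5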
