Statistics and Probability - Solved Assignments - Semester Fall 2007
Question 1
A district is divided into two areas, viz. an urban area and a rural area. The total population of the
district is 271,076, out of which only 46,740 live in the urban area. The total male population
of the district is 139,699 and that of the urban area is 23,083. The total unmarried population of
the district is 112,352, out of which 36,864 are rural females. In the urban area,
unmarried people number 21,072, out of which 12,149 are males.
SOLUTION
Question 2
The following figures give the numbers of children born to 50 women in a certain locality
up to the age of 40 years;
1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8,
9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3,
4, 2, 3, 4, 6, 3, 4, 2, 5, 6, 4, 0, 5, 6, 8, 5,
4, 7, 6
SOLUTION
a) FREQUENCY DISTRIBUTION
No. of children   Tally   Frequency
0 1
1 3
2 4
3 5
4 8
5 10
6 6
7 4
8 3
9 5
10 1
Total 50
No. of children   Frequency   Cumulative frequency
0 1 1
1 3 4
2 4 8
3 5 13
4 8 21
5 10 31
6 6 37
7 4 41
8 3 44
9 5 49
10 1 50
Total 50
No. of children   Frequency   Relative frequency
0 1 1/50
1 3 3/50
2 4 4/50
3 5 5/50
4 8 8/50
5 10 10/50
6 6 6/50
7 4 4/50
8 3 3/50
9 5 5/50
10 1 1/50
Total 50 1
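The three tables above can be cross-checked with a short script. This is a minimal sketch that counts the 50 observations listed in the question; the variable names are illustrative and not part of the original solution.

```python
from collections import Counter
from itertools import accumulate

# Numbers of children born to the 50 women, as listed in the question.
children = [
    1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8,
    9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3,
    4, 2, 3, 4, 6, 3, 4, 2, 5, 6, 4, 0, 5, 6, 8, 5,
    4, 7, 6,
]

counts = Counter(children)
values = sorted(counts)                    # 0, 1, ..., 10
freq = [counts[v] for v in values]         # frequency column
cum_freq = list(accumulate(freq))          # cumulative frequency column
n = len(children)

# One row per value: value, frequency, cumulative frequency, relative frequency.
for v, f, cf in zip(values, freq, cum_freq):
    print(f"{v:>2} {f:>3} {cf:>3} {f}/{n}")
```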
Question 3
a) In which situations are the weighted mean and the arithmetic mean used?
b) From the following data, find the weighted mean.
Items               Food   House rent   Clothing   Fuel & electricity   Education   Miscellaneous
Expenditure (Rs.)   3000   600          200        150                  100         50
Weights             20     8            5          4                    2           1
SOLUTION
a) Sometimes we want to find the average of certain values which are not of equal
importance. In that case we assign them numerical values, known as “weights”, that
express their relative importance. When the observations are associated with such
weights, we use the weighted mean. The arithmetic mean is used when every observation
is equally important. The weighted mean and the arithmetic mean are equal when all the
weights are equal.
b)
Items   Expenditure in Rs. (X)   Weights (W)   WX
Miscellaneous 50 1 50
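As a rough check on part b), the weighted mean can be computed directly from the expenditures and weights given in the question, using weighted mean = ΣWX / ΣW. The sketch below is only an illustration; the dictionary layout is an assumption made for readability.

```python
# Expenditure (X, in Rs.) and weight (W) per item, copied from the question's table.
items = {
    "Food": (3000, 20),
    "House rent": (600, 8),
    "Clothing": (200, 5),
    "Fuel & electricity": (150, 4),
    "Education": (100, 2),
    "Miscellaneous": (50, 1),
}

sum_w = sum(w for _, w in items.values())        # total of the weights
sum_wx = sum(x * w for x, w in items.values())   # total of weight * expenditure
weighted_mean = sum_wx / sum_w

print(sum_w, sum_wx, weighted_mean)
```

With these figures ΣW = 40 and ΣWX = 66,650, so the weighted mean comes to Rs. 1666.25.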
x 1 2 3 4
f 2 3 4 1
Solution:
a. The mode is the value which occurs the maximum number of times in the data. In
this case each value appears only once, so the mode does not exist. By
definition, the geometric mean of a set of positive values is the nth root of the
product of the values. In this case one value has a negative sign and the product
of all the values is zero, so the values are not all positive and the geometric
mean is not defined.
b.
X f 1/x f(1/x)
1 2 1 2
2 3 0.5 1.5
3 4 0.33 1.32
4 1 0.25 0.25
Total 10 5.07
\[ \text{H.M.} = \frac{\sum f}{\sum f(1/x)} = \frac{10}{5.07} = 1.972 \]
Question.2
a. Can all quartiles and deciles be expressed as percentiles? Explain.
b. The following data gives the numbers of weeks needed to find a job for 25 older
workers that lost their jobs as a result of corporation downsizing.
13 13 17 7 22
22 26 17 13 14
16 7 6 18 20
10 17 11 10 15
16 8 16 21 11
b. Ordered array
6 7 7 8 10
10 11 11 13 13
13 14 15 16 16
16 17 17 17 18
20 21 22 22 26
Range = X_m - X_0, where X_m denotes the maximum observation and X_0 denotes the minimum
observation.
\[ \text{Range} = 26 - 6 = 20 \]
\[ \text{Coefficient of dispersion} = \frac{X_m - X_0}{X_m + X_0} = \frac{26 - 6}{26 + 6} = \frac{20}{32} = 0.625 \]
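The ordered array, the range and the coefficient of dispersion can be reproduced from the raw data in the question. A minimal sketch, with illustrative variable names:

```python
# Weeks needed to find a job for the 25 workers, as listed in the question.
weeks = [
    13, 13, 17, 7, 22,
    22, 26, 17, 13, 14,
    16, 7, 6, 18, 20,
    10, 17, 11, 10, 15,
    16, 8, 16, 21, 11,
]

ordered = sorted(weeks)                     # ordered array
x_min, x_max = ordered[0], ordered[-1]
data_range = x_max - x_min                  # X_m - X_0
coeff_dispersion = (x_max - x_min) / (x_max + x_min)

print(ordered)
print(data_range, coeff_dispersion)
```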
Question.3
Find Median and Mode from the following distribution.
Solution:
Class   Frequency (f)   Cumulative frequency (F)
30-35 2 3
35-40 9 12
40-45 10 22
45-50 11 33
50-55 12 45
55-60 5 50
Total 50
MEDIAN:
Here n/2 = 50/2 = 25
So,
\[ \text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - F\right) = 45 + \frac{5}{11}(25 - 22) = 45 + \frac{5}{11}(3) = 45 + 1.364 = 46.364 \]
MODE:
\[ \text{Mode} = l + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times h = 50 + \frac{12 - 11}{(12 - 11) + (12 - 5)} \times 5 = 50 + \frac{1(5)}{8} = 50 + 0.625 = 50.625 \]
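The two grouped-data formulas used above can be written as small functions. This is a sketch only; the function and parameter names are hypothetical, and the arguments are the values read off the table (median class 45-50, modal class 50-55).

```python
def grouped_median(l, h, f, F, n):
    """Median = l + (h / f) * (n / 2 - F), where l, h, f describe the median
    class and F is the cumulative frequency of the preceding class."""
    return l + (h / f) * (n / 2 - F)


def grouped_mode(l, h, fm, f_prev, f_next):
    """Mode = l + (fm - f_prev) / ((fm - f_prev) + (fm - f_next)) * h,
    where fm is the modal-class frequency and f_prev, f_next its neighbours."""
    return l + (fm - f_prev) / ((fm - f_prev) + (fm - f_next)) * h


median = grouped_median(l=45, h=5, f=11, F=22, n=50)
mode = grouped_mode(l=50, h=5, fm=12, f_prev=11, f_next=5)
print(round(median, 3), round(mode, 3))
```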
Assignment 3
Question 1
a) Why do we use the correlation analysis technique?
b) A computer while computing the correlation coefficient between two variables x
and y from 25 pairs of observations, obtained the following results:
\[ n = 25,\ \sum x = 125,\ \sum x^2 = 650,\ \sum y = 100,\ \sum y^2 = 460,\ \sum xy = 508 \]
It was, however, discovered at the time of checking that he had copied down two pairs
of observations as:
Copied (x, y)   instead of   Correct (x, y)
(10, 9)                      (15, 7)
(8, 6)                       (11, 8)
Obtain the correct value of correlation coefficient between x and y.
Solution:
a. We usually use statistical methods to analyse data involving only one
variable. Often an analysis of data concerning two or more variables is needed
to look for any statistical relationship or association between them.
Knowledge of such a relationship is important for making inferences from the
covariation between variables in a given situation. This is done by applying the
correlation analysis technique.
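b. The corrected values of the terms needed in the formula for Pearson's coefficient
are determined as follows:
\[ \sum x = 125 - (10 + 8) + (15 + 11) = 133, \quad \sum y = 100 - (9 + 6) + (7 + 8) = 100 \]
\[ \sum x^2 = 650 - (10^2 + 8^2) + (15^2 + 11^2) = 832, \quad \sum y^2 = 460 - (9^2 + 6^2) + (7^2 + 8^2) = 456 \]
\[ \sum xy = 508 - (10 \times 9 + 8 \times 6) + (15 \times 7 + 11 \times 8) = 563 \]
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{775}{\sqrt{3111 \times 1400}} = \frac{775}{2086.96} = 0.371 \]
Thus the corrected value of the correlation coefficient between x and y is 0.371.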
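The correction can be checked numerically: subtract the contribution of the wrongly copied pairs from each sum, add the contribution of the correct pairs, and then apply Pearson's formula. The helper `adjust` below is a hypothetical name used only for this sketch.

```python
from math import sqrt

n = 25
wrong = [(10, 9), (8, 6)]    # pairs as wrongly copied
right = [(15, 7), (11, 8)]   # pairs as they should have been


def adjust(total, term):
    """Remove the wrong pairs' contribution from a sum and add the correct pairs'."""
    return (total
            - sum(term(x, y) for x, y in wrong)
            + sum(term(x, y) for x, y in right))


sum_x = adjust(125, lambda x, y: x)
sum_y = adjust(100, lambda x, y: y)
sum_x2 = adjust(650, lambda x, y: x * x)
sum_y2 = adjust(460, lambda x, y: y * y)
sum_xy = adjust(508, lambda x, y: x * y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy, round(r, 3))
```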
Question 2
Differentiate between permutation and combination.
Solution:
A permutation is any ordered subset from a set of n distinct objects. The number of
permutations of r objects, selected in a defined order from n distinct objects, is denoted
by the symbol nPr:
\[ {}^{n}P_{r} = \frac{n!}{(n-r)!} \]
A combination is any subset of r objects, selected without regard to their order,
from a set of n distinct objects. It is denoted by nCr:
\[ {}^{n}C_{r} = \frac{n!}{(n-r)!\,r!} \]
In a permutation order is important, while in a combination order is not important.
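To illustrate the difference, the sketch below enumerates ordered and unordered selections for a hypothetical case (n = 5 objects, r = 2 chosen; these numbers are not from the assignment) and compares the counts with the two formulas. It assumes Python 3.8 or later for `math.perm` and `math.comb`.

```python
from itertools import combinations, permutations
from math import comb, perm

objects = "ABCDE"   # hypothetical set of n = 5 distinct objects
n, r = len(objects), 2

ordered = list(permutations(objects, r))     # (A, B) and (B, A) both appear
unordered = list(combinations(objects, r))   # only (A, B) appears

print(len(ordered), perm(n, r))     # count of permutations vs n!/(n-r)!
print(len(unordered), comb(n, r))   # count of combinations vs n!/((n-r)! r!)
```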
Question 2
The data on the profit (in Rs lakh) earned by 60 companies is as follows:
No. of companies 5 12 20 16 5 2
Solution:
(I)
Q1 = size of the (n/4)th observation = (60/4)th = 15th observation. It lies in the class 10-20.
\[ Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right) = 10 + \frac{10}{12}(15 - 5) = 10 + 8.33 = 18.33 \text{ lakh} \]
Q3 = size of the (3n/4)th observation = 45th observation. It lies in the class 30-40.
\[ Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right) = 30 + \frac{10}{16}(45 - 37) = 30 + 5 = 35 \text{ lakh} \]
Hence the profit of central 50 percent companies lies between Rs 18.33 lakh and Rs
35 lakh.
(II)
\[ \text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 20 + \frac{10}{20}(30 - 17) = 20 + 6.5 = 26.5 \text{ lakh} \]
\[ Sk_b = \frac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{18.33 + 35 - 2(26.5)}{35 - 18.33} = \frac{0.33}{16.67} = 0.020 \]
The positive value of Sk_b indicates that the distribution is positively skewed,
that is, its longer tail extends toward the larger values on the right side of
the distribution.
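The quartiles, median and skewness coefficient above can be recomputed with one interpolation routine. This is a sketch under an assumption: the class limits are taken to be 0-10, 10-20, ..., 50-60, which matches the classes 10-20, 20-30 and 30-40 named in the solution but is not fully shown in the question. The function name is hypothetical.

```python
freqs = [5, 12, 20, 16, 5, 2]       # number of companies per class
lowers = [0, 10, 20, 30, 40, 50]    # assumed lower class limits
h = 10                              # assumed common class width


def value_at(position):
    """Interpolate the value of the given ordered position: l + (h / f) * (position - C)."""
    cum = 0  # cumulative frequency C of the classes before the current one
    for l, f in zip(lowers, freqs):
        if cum + f >= position:
            return l + (h / f) * (position - cum)
        cum += f
    raise ValueError("position exceeds the total frequency")


n = sum(freqs)
q1 = value_at(n / 4)
median = value_at(n / 2)
q3 = value_at(3 * n / 4)
sk_b = (q1 + q3 - 2 * median) / (q3 - q1)   # quartile-based skewness coefficient
print(round(q1, 2), median, q3, round(sk_b, 3))
```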
Assignment 4
Question 1
a. In which situations do we use permutations and combinations?
b. An MBA applies for a job in two firms X and Y. The probability of being
selected in firm X is 0.7 and of being rejected at Y is 0.5. The probability
of at least one of his applications being rejected is 0.6. What is the
probability that he will be selected by one of the firms?
Solution:
a. Permutations: When order matters and an object can be chosen more than once, the
number of permutations is n^r,
where n is the number of objects from which you can choose and r is the number to be
chosen. For example, if you have the letters A, B, C, and D and you wish to discover the
number of ways to arrange them in three-letter patterns where order matters (e.g., A-B is
different from B-A, and both are included as possibilities), there are 4^3 = 64 ways.
Combinations: When the order does not matter and each object can be chosen only
once, the number of combinations is the binomial coefficient n!/(r!(n−r)!),
where n is the number of objects from which you can choose and r is the number to be
chosen.
For example, if you have ten numbers and wish to choose 5, you would have
10!/(5!(10−5)!) = 252 ways to choose.
b. Let A and B denote the events that the MBA is selected by firm X and by firm Y,
respectively. Then
\[ P(A) = 0.7, \quad P(\bar{A}) = 1 - 0.7 = 0.3 \]
\[ P(\bar{B}) = 0.5, \quad P(B) = 1 - 0.5 = 0.5, \quad P(\bar{A} \cup \bar{B}) = 0.6 \]
The probability that he will be selected by one of the firms is given by
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \quad (1) \]
\[ P(A \cap B) = 1 - P(\bar{A} \cup \bar{B}) = 1 - 0.6 = 0.4 \]
Putting these values in equation (1), we get
\[ P(A \cup B) = 0.7 + 0.5 - 0.4 = 0.8 \]
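c. Let A and B be the events of the husband's and the wife's selection, respectively.
Given that P(A) = 1/7 and P(B) = 1/5:
1. The probability that both of them will be selected is:
\[ P(A \text{ and } B) = P(A)\,P(B) = \frac{1}{7} \times \frac{1}{5} = \frac{1}{35} = 0.029 \]
2. The probability that only one of them will be selected is:
\[ P[(A \text{ and } \bar{B}) \text{ or } (\bar{A} \text{ and } B)] = P(A \text{ and } \bar{B}) + P(\bar{A} \text{ and } B) = P(A)\,P(\bar{B}) + P(\bar{A})\,P(B) = P(A)[1 - P(B)] + [1 - P(A)]\,P(B) \]
\[ = \frac{1}{7}\left(1 - \frac{1}{5}\right) + \left(1 - \frac{1}{7}\right)\frac{1}{5} = \frac{1}{7} \times \frac{4}{5} + \frac{1}{5} \times \frac{6}{7} = \frac{10}{35} = 0.286 \]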
Question 2
a. The personnel department of a company has records which show the
following analysis of its 200 engineers.
Solution:
a. Let A, B, C and D denote the events that an engineer is under 30 years of age, is over
40 years of age, has a bachelor's degree only and has a master's degree, respectively.
1. The probability that he has only a bachelor's degree is:
\[ P(C) = \frac{150}{200} = 0.75 \]
2. The probability that he has a master's degree, given that he is over 40, is:
\[ P(D \mid B) = \frac{P(D \cap B)}{P(B)} = \frac{10/200}{50/200} = \frac{10}{50} = 0.20 \]
3. The probability that he is under 30, given that he has only a bachelor's degree, is:
\[ P(A \mid C) = \frac{P(A \cap C)}{P(C)} = \frac{90/200}{150/200} = \frac{90}{150} = 0.60 \]
b. Let A be the event that an item is defective. We know the prior probabilities that an
item is produced on X, Y and Z, that is,
P(X) = 1/3, P(Y) = 1/3 and P(Z) = 1/3 (each machine has a 1/3 chance of being selected).
We also know that P(A|X) = 0.02, P(A|Y) = 0.07 and P(A|Z) = 0.12.
Now, having known that the item drawn is defective, we want to know the
probability that it was produced by Y. That is,
\[ P(Y \mid A) = \frac{P(A \mid Y)\,P(Y)}{P(X)\,P(A \mid X) + P(Y)\,P(A \mid Y) + P(Z)\,P(A \mid Z)} = \frac{0.07 \times \frac{1}{3}}{\frac{1}{3}(0.02) + \frac{1}{3}(0.07) + \frac{1}{3}(0.12)} = \frac{0.07}{0.21} = 0.333 \]
Hence the probability that the defective item was produced on Y is 0.333.
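Bayes' rule for this problem can be evaluated exactly with fractions. The sketch below uses the priors and defect rates stated in the solution (equal priors of 1/3; defect rates 0.02, 0.07 and 0.12); the dictionary names are illustrative. With these inputs the posterior for Y works out to 0.07/0.21 = 1/3.

```python
from fractions import Fraction

# Prior probability that an item comes from each machine.
prior = {m: Fraction(1, 3) for m in ("X", "Y", "Z")}
# Probability that an item is defective, given the machine.
likelihood = {"X": Fraction(2, 100), "Y": Fraction(7, 100), "Z": Fraction(12, 100)}

# Total probability of a defective item, then the posterior for each machine.
p_defective = sum(prior[m] * likelihood[m] for m in prior)
posterior = {m: prior[m] * likelihood[m] / p_defective for m in prior}

print(posterior["Y"], round(float(posterior["Y"]), 3))
```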
Assignment 5
Question 1
a) Illustrate the necessary conditions for probability distributions.
b) Given the discrete probability distribution
\[ P(X = x) = {}^{4}C_{x}\,(1/2)^{x}\,(1/2)^{4-x} \]
x    P(x) = 4Cx (1/2)^x (1/2)^(4-x)
0    4C0 (1/2)^0 (1/2)^(4-0) = 1/16
1    4/16
2    6/16
3    4/16
4    1/16
∑    1
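The two conditions for a discrete probability distribution (every probability is non-negative, and the probabilities sum to 1) can be checked for the table above. A minimal sketch using exact fractions:

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 2)

# P(X = x) = 4Cx (1/2)^x (1/2)^(4 - x) for x = 0, 1, ..., 4.
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

all_non_negative = all(prob >= 0 for prob in pmf.values())
total = sum(pmf.values())

for x, prob in pmf.items():
    print(x, prob)
print(all_non_negative, total)
```

Note that `Fraction` prints values in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8.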
Question 2
A continuous random variable X that can assume values between x=2 and x=4 has a
density function given by
\[ f(x) = \frac{x + 1}{8} \]
a) Show that P(2 < X < 4) = 1.
b) Find P(X < 3.5).
Solution:
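a)
\[ P(2 < X < 4) = \int_{2}^{4} \frac{x + 1}{8}\,dx = \frac{1}{8}\int_{2}^{4} (x + 1)\,dx = \frac{1}{8}\left[\frac{x^2}{2} + x\right]_{2}^{4} = \frac{1}{8}\left(\frac{16}{2} + 4 - \frac{4}{2} - 2\right) = \frac{1}{8}(12 - 4) = 1 \]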
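b)
\[ P(X < 3.5) = \int_{2}^{3.5} \frac{x + 1}{8}\,dx = \frac{1}{8}\left[\frac{x^2}{2} + x\right]_{2}^{3.5} = \frac{1}{8}\left(\frac{3.5^2}{2} + 3.5 - \frac{4}{2} - 2\right) = 0.70 \]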
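Both integrals can be checked with the antiderivative of f(x) = (x + 1)/8. The function names below are hypothetical and used only for this sketch.

```python
def antiderivative(x):
    """Antiderivative of f(x) = (x + 1) / 8, namely (x**2 / 2 + x) / 8."""
    return (x**2 / 2 + x) / 8


def prob_between(a, b):
    """P(a < X < b) for 2 <= a <= b <= 4, by the fundamental theorem of calculus."""
    return antiderivative(b) - antiderivative(a)


total_probability = prob_between(2, 4)   # part a)
p_below_3_5 = prob_between(2, 3.5)       # part b)
print(total_probability, p_below_3_5)
```

Part b) evaluates to 0.703125, which rounds to the 0.70 given in the solution.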
Assignment 6
Question 1
Let $X_1, X_2, X_3$ be a random sample of size 3 from a population with mean $\mu$ and
variance $\sigma^2$. Consider the following two estimators of the mean:
\[ T_1 = \frac{X_1 + X_2 + X_3}{3}, \quad T_2 = \frac{X_1 + 2X_2 + X_3}{4} \]
Solution:
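First we examine the property of unbiasedness. $T_1$ is the sample mean $\bar{X}$, which
we know is unbiased.
\[ E(T_1) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{\mu + \mu + \mu}{3} = \frac{3\mu}{3} = \mu \]
\[ E(T_2) = E\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{4}(\mu + 2\mu + \mu) = \frac{4\mu}{4} = \mu \]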
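\[ \operatorname{Var}(T_1) = \operatorname{Var}\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{9}\left[\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \operatorname{Var}(X_3)\right] = \frac{1}{9}(\sigma^2 + \sigma^2 + \sigma^2) = \frac{3}{9}\sigma^2 = \frac{1}{3}\sigma^2 \]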
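\[ \operatorname{Var}(T_2) = \operatorname{Var}\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{16}\left[\operatorname{Var}(X_1) + 4\operatorname{Var}(X_2) + \operatorname{Var}(X_3)\right] = \frac{1}{16}(\sigma^2 + 4\sigma^2 + \sigma^2) = \frac{6}{16}\sigma^2 = \frac{3}{8}\sigma^2 \]
Since $\frac{1}{3} < \frac{3}{8}$,
\[ \operatorname{Var}(T_1) < \operatorname{Var}(T_2) \]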
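For a linear estimator T = Σ w_i X_i of independent observations with common mean μ and variance σ², E(T) = (Σ w_i) μ and Var(T) = (Σ w_i²) σ². The sketch below applies this to the weights of T1 and T2; the function name is hypothetical, and independence of the sample values is assumed (as for a random sample).

```python
from fractions import Fraction


def mean_and_variance_factors(weights):
    """Return (sum of w_i, sum of w_i**2): the multipliers of mu in E(T)
    and of sigma^2 in Var(T) for T = sum(w_i * X_i) with independent X_i."""
    return sum(weights), sum(w * w for w in weights)


t1_weights = [Fraction(1, 3)] * 3                              # (X1 + X2 + X3) / 3
t2_weights = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]  # (X1 + 2 X2 + X3) / 4

t1_mean, t1_var = mean_and_variance_factors(t1_weights)
t2_mean, t2_var = mean_and_variance_factors(t2_weights)
print(t1_mean, t1_var)   # multipliers for T1
print(t2_mean, t2_var)   # multipliers for T2
```

A mean multiplier of 1 corresponds to an unbiased estimator, and the smaller variance multiplier identifies the estimator with the smaller variance.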
b) Draw all possible samples of two letters each, without replacement, from
the letters of the word "Management". Find the proportion of the letter "M" in each
sample. Also construct the sampling distribution of "M".
Solution:
a) The deviation of a sample statistic T from its parameter is considered an
error. Hence the standard deviation of a sample statistic is called the standard
error of the statistic.
p̂      f     f(p̂)
0      28    28/45
1/2    16    16/45
2/2    1     1/45
Total  45    45/45 = 1
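The sampling distribution in the table can be reproduced by enumerating all two-letter samples. This sketch samples by position so that repeated letters stay distinct, and it ignores case so that both M's in "Management" are counted; those two choices are assumptions of the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

letters = list("management")   # lower-cased: M, a, n, a, g, e, m, e, n, t

# All samples of 2 positions out of 10, drawn without replacement.
samples = list(combinations(range(len(letters)), 2))

# Proportion of "m" in each sample, tallied into a frequency distribution.
distribution = Counter(
    Fraction(sum(letters[i] == "m" for i in pair), 2) for pair in samples
)

for proportion in sorted(distribution):
    f = distribution[proportion]
    print(proportion, f, f"{f}/{len(samples)}")
```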
Question 3
a) Ten vegetable cans, all of the same size, have lost their labels. It is
known that 5 contain tomatoes and 5 contain corn. If 5 are selected at
random, what is the probability that all contain tomatoes?
What is the probability that 3 or more contain tomatoes?
b) For a machine making parts, there is a small probability of 0.001 for a part
to be defective. The parts are supplied in bundles of 10. Calculate
approximately the number of bundles containing no defective, one
defective or two defectives in a consignment of 10,000 bundles, given
that $e^{-0.01} = 0.9900$.
Solution:
a)
Let X denote the number of tomato cans.
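One way to evaluate the probabilities asked for in part a) is to treat X as hypergeometric: 5 tomato cans among 10, with 5 cans drawn without replacement. The sketch below is an illustration under that assumption; the function name and its default arguments are hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, tomatoes=5, corn=5, drawn=5):
    """P(X = x): x tomato cans among `drawn` cans selected without replacement."""
    return Fraction(comb(tomatoes, x) * comb(corn, drawn - x),
                    comb(tomatoes + corn, drawn))


p_all_tomatoes = hypergeom_pmf(5)
p_three_or_more = sum(hypergeom_pmf(x) for x in range(3, 6))
print(p_all_tomatoes, p_three_or_more)
```

Under this model P(X = 5) = 1/252 and P(X ≥ 3) = 126/252 = 1/2.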
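For x = 2:
\[ N \cdot P(2; \lambda) = 10{,}000 \times \frac{(0.01)^2 e^{-0.01}}{2!} = 10{,}000 \times \frac{(0.01)^2}{2!} \times 0.9900 = 0.495 \]
\[ P(1 \le X \le 2) = P(X = 1) + P(X = 2) = 0.0099 + 0.0000495 = 0.0099495 \]
\[ N \cdot P(1 \le X \le 2) = 10{,}000 \times 0.0099495 = 99.495 \]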
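The expected numbers of bundles for part b) follow from the Poisson approximation with λ = 10 × 0.001 = 0.01 and the given value e^(-0.01) = 0.9900. A minimal sketch, with illustrative variable names:

```python
from math import factorial

N = 10_000             # bundles in the consignment
lam = 10 * 0.001       # expected defectives per bundle of 10 parts
exp_neg_lam = 0.9900   # value of e^(-0.01) given in the question

# Expected number of bundles with x defectives: N * lam^x * e^(-lam) / x!
expected = {x: N * lam**x * exp_neg_lam / factorial(x) for x in range(3)}

for x, count in expected.items():
    print(x, round(count, 3))
```

This gives roughly 9900 bundles with no defective, 99 with one defective and 0.495 with two defectives.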
Assignment 7
Question 1
a) Define non-sampling error. How can it be minimized?
b) A continuous manufacturing process produces items whose weights are normally
distributed with a mean weight of 800 grams and a standard deviation of 300
grams. A random sample of 16 items is to be drawn from the process.
i. What is the probability that the arithmetic mean of the sample exceeds 900
grams? Interpret the results.
ii. Find the values of the sample mean within which the middle 95 percent of
all sample means will fall.
z 1.95
Solution:
a) Non-sampling errors are common to both complete enumeration and
sample surveys. They include biases and mistakes. The main sources of these
errors are the definition of the population, defects in the method of interviewing,
duplication and substitution, inaccurate responses by the respondents, faulty
reporting of facts and non-response to mail questionnaires. These errors can
be controlled by giving a precise definition of the population, making an accurate
frame, improving the method of measurement, proper selection of the
questionnaire, adequate training of the investigators, cross judgment, following
up non-response and correct manipulation of the collected information.
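b (i) The sampling distribution of the mean has $\mu_{\bar{x}} = \mu = 800$
and
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{16}} = \frac{300}{4} = 75 \]
The required probability is $P(\bar{x} > 900)$; i.e.
\[ P(\bar{x} > 900) = P\left(z > \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(z > \frac{900 - 800}{75}\right) = P(z > 1.33) = 0.5 - 0.4082 = 0.0918 \]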
Hence, 9.18 percent of all possible samples of size n=16 will have a sample mean value
greater than 900 g.
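b (ii) Since z = 1.96 for the middle 95 percent of the area under the normal curve, using
the formula for z to solve for the values of $\bar{x}$ in terms of the known values gives:
\[ \bar{x}_1 = \mu_{\bar{x}} - z\,\sigma_{\bar{x}} = 800 - 1.96 \times 75 = 653 \text{ g} \]
and
\[ \bar{x}_2 = \mu_{\bar{x}} + z\,\sigma_{\bar{x}} = 800 + 1.96 \times 75 = 947 \text{ g} \]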
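Both parts can be checked with the standard library's `statistics.NormalDist`, which avoids reading the z table. This is a sketch; because it does not round z to 1.33, the tail probability comes out near 0.0912 rather than the tabulated 0.0918.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 300, 16
standard_error = sigma / sqrt(n)            # sigma / sqrt(n) = 75

# Sampling distribution of the sample mean.
sampling = NormalDist(mu, standard_error)

p_exceeds_900 = 1 - sampling.cdf(900)       # part (i)
lower = sampling.inv_cdf(0.025)             # part (ii): middle 95 percent
upper = sampling.inv_cdf(0.975)

print(standard_error, round(p_exceeds_900, 4), round(lower), round(upper))
```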
Question 2
a) What should be the sample size necessary to estimate the population
mean at 95 percent confidence with a sampling error of 5 and the standard
deviation equal to 20?
b) Suppose we want to estimate the proportion of families in a town which have two
or more children. A random sample of 144 families has been chosen and 48 have
two or more children. Setup a 95 percent confidence interval estimate of the
population proportion of families having two or more children.
Solution:
Calculate a 95% confidence interval for the mean mass of the population,
supposed normal, from which these masses were drawn (s = 1.77).
Solution:
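The 95% confidence interval for the mean mass of the population is given by:
\[ \bar{X} \pm t_{(\alpha/2,\,v)}\; s/\sqrt{n} \]
X: 21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0, 22.5, 26.9, 26.4, 25.8, 23.2, 21.9
\[ \sum_{i=1}^{n} X_i = 314.7 \]
\[ \bar{X} = \frac{\sum X}{n} = \frac{314.7}{13} = 24.21 \]
\[ s = 1.77, \quad v = n - 1 = 13 - 1 = 12 \]
Now,
\[ \bar{X} \pm t_{(\alpha/2,\,v)}\; s/\sqrt{n} = 24.21 \pm 2.179 \times \frac{1.77}{\sqrt{13}} = 24.21 \pm 2.179(0.49) = 24.21 \pm 1.07 \]
\[ 23.14 < \mu < 25.28 \]
or
23.14 to 25.28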
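The interval can be recomputed from the 13 masses. The sketch below takes the critical value t(0.025, 12) = 2.179 from the solution as a table value, since the Python standard library has no t distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

masses = [21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0,
          22.5, 26.9, 26.4, 25.8, 23.2, 21.9]

n = len(masses)
x_bar = mean(masses)
s = stdev(masses)     # sample standard deviation (divisor n - 1)
t_crit = 2.179        # table value for alpha/2 = 0.025 and v = 12, as used above

half_width = t_crit * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(x_bar, 2), round(s, 2), round(lower, 2), round(upper, 2))
```

Small differences in the last decimal place against the hand calculation can arise because the script does not round intermediate values.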
b) Ten oil tins are taken at random from an automatic filling machine. The mean
weight of the tins is 15.8 kg and the standard deviation is 0.50 kg. Does the sample
mean differ significantly from the intended weight of 16 kg?
Solution:
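\[ H_0: \mu = 16 \]
\[ H_1: \mu \neq 16 \]
Level of significance:
α = 5 % = 0.05
Critical Region:
\[ t < -t_{(\alpha/2,\,v)} = -2.262 \]
\[ t > t_{(\alpha/2,\,v)} = 2.262 \]
Test Statistic:
\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]
Calculation:
\[ t = \frac{15.8 - 16}{0.5/\sqrt{10}} = \frac{-0.2}{0.158} = -1.26 \]
Conclusion:
Since t (cal) = -1.26 does not fall in the critical region, we do not reject
$H_0$. Hence the sample mean does not differ significantly from the intended weight of 16 kg.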
Question 2
a) A random sample of 20 students obtained a mean of 72 and a variance of 16
on a college placement test in mathematics. Assuming the scores to be normally
distributed, construct a 95 % confidence interval for $\sigma^2$.
Solution:
a)
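Given that $\bar{x} = 72$, $s^2 = 16$, n = 20
And we know that
\[ \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}} \]
\[ \frac{304}{\chi^2_{0.025,\,19}} < \sigma^2 < \frac{304}{\chi^2_{0.975,\,19}} \]
\[ 9.253 < \sigma^2 < 34.130 \]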
Sol:
b)
Hypothesis:
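\[ H_0: \sigma^2 = 1.3 \]
\[ H_1: \sigma^2 \neq 1.3 \]
Level of significance:
α = 0.05
Test Statistic:
\[ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \]
Critical Region:
\[ \chi^2 > \chi^2_{\alpha/2,\,v} = \chi^2_{0.025,\,7} = 16.013 \]
\[ \chi^2 < \chi^2_{1-\alpha/2,\,v} = \chi^2_{0.975,\,7} = 1.690 \]
Computations:
\[ \chi^2 = \frac{7(1.8)^2}{1.3} = 17.44 \]
Conclusion:
Since our calculated value of $\chi^2$ is greater than the upper table value 16.013, we reject the
null hypothesis at the 5% level of significance.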
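The test statistic and the decision can be reproduced as follows. This sketch takes s = 1.8 and the hypothesised variance 1.3 from the computation above, assumes n = 8 (implied by v = 7 degrees of freedom), and uses the two critical values as table values rather than computing them.

```python
n = 8               # assumed from v = n - 1 = 7
s = 1.8             # sample standard deviation used in the computation
sigma0_sq = 1.3     # variance under the null hypothesis

chi_sq = (n - 1) * s**2 / sigma0_sq   # test statistic (n - 1) s^2 / sigma0^2

upper_crit = 16.013   # chi-square table value for 0.025 with 7 d.f.
lower_crit = 1.690    # chi-square table value for 0.975 with 7 d.f.

reject_h0 = chi_sq > upper_crit or chi_sq < lower_crit
print(round(chi_sq, 3), reject_h0)
```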