Statistics With R-Programming Lab Manual
2022-23
for
Prepared by
Professor, B.S.&H.
1) Download the latest version of RStudio from
https://www.rstudio.com/products/rstudio/download/, which takes you to the
download page. Two editions of RStudio are available: desktop and
server. Select the edition that suits your usage to
start the download.
2) Download the .exe file and double-click it to start the installation.
3) Click the ‘Next’ button to move on to choosing the installation folder.
Select ‘C:\’ as the installation directory; this manual keeps R and RStudio
in the same directory to avoid path issues when running R programs.
4) Click ‘Next’ to continue. A dialog box opens asking you to select the Start menu
folder. It is advisable to create your own folder to avoid any possible
confusion. Then click the ‘Install’ button to install RStudio.
5) When the installation completes, the final window is displayed. Click ‘Finish’ to exit the
installation window.
Installation of R in Ubuntu: Open the Software Center, search for R Base, and install it.
Then open a terminal and enter R to get the R command prompt.
Installation of RStudio in Ubuntu: Open a terminal and type the following commands
Code:
m=function()
{
print("Enter the elements of vector:")
x=scan()
n=length(x)
sum=0
for(i in 1:n)
{
sum=sum+x[i]
}
mean1=sum/n
cat("Mean of the vector is ",mean1)
}
Output:
> m()
[1] "Enter the elements of vector:"
1: 1
2: 4
3: 6
4: 3
5: 5
6:
Read 5 items
Mean of the vector is 3.8
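The same loop can be wrapped in a function that takes the vector as an argument, which makes it easy to check against R's built-in mean(). This is only a minimal sketch; the name vec_mean is a hypothetical helper and not part of the manual.

```r
# Mean of a numeric vector using an explicit loop.
# vec_mean is a hypothetical helper name used only for this sketch.
vec_mean <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total / length(x)
}

x <- c(1, 4, 6, 3, 5)                    # sample input from the run above
print(vec_mean(x))
print(all.equal(vec_mean(x), mean(x)))   # compare with the built-in
```

Iterating over the values directly avoids the 1:n idiom, which would loop over 1 and 0 if the vector were empty.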
Experiment #2-B:
Code:
m1=function()
{
print("Enter the elements of vector:")
x=scan()
print(table(x))
f=as.numeric(table(x))
x1=sort(unique(x))
mean1=sum(f*x1)/sum(f)
cat("Mean of the vector is ",mean1)
}
Output:
> m1()
[1] "Enter the elements of vector:"
1: 1
2: 1
3: 1
4: 2
5: 3
6: 4
7: 5
8: 4
9: 6
10: 7
11:
Read 10 items
x
1 2 3 4 5 6 7
3 1 1 2 1 1 1
Mean of the vector is 3.4
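The frequency-table mean computed above is a weighted mean, so it can be cross-checked with the base R function weighted.mean(). The sketch below reuses the ten values from the sample run; the variable names are illustrative assumptions.

```r
# Mean from a frequency table, checked with weighted.mean().
x <- c(1, 1, 1, 2, 3, 4, 5, 4, 6, 7)   # values from the sample run

freq   <- table(x)                      # frequency of each distinct value
values <- as.numeric(names(freq))       # distinct values, in sorted order
counts <- as.numeric(freq)              # matching frequencies

wm <- weighted.mean(values, counts)     # sum(f * x) / sum(f)
print(wm)
```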
Experiment #2-C:
Code:
med=function()
{
print("Enter the elements of vector:")
x=scan()
n=length(x)
x1=sort(x)
print(x1)
if(n%%2==0)
{
me=(x1[n/2]+x1[n/2+1])/2
} else {
me=x1[(n+1)/2]
}
cat("Median of the vector is ",me,"\n")
}
Output:
> med()
[1] "Enter the elements of vector:"
1: 1
2: 2
3: 3
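As a quick check of the even and odd cases, the median rule can be written as a function of the vector and compared with the built-in median(). This is a sketch under the assumption that the input has at least one element; vec_median is a hypothetical name.

```r
# Median of a numeric vector: middle value for odd n,
# average of the two middle values for even n.
# vec_median is a hypothetical helper name used only for this sketch.
vec_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 0) {
    (s[n / 2] + s[n / 2 + 1]) / 2
  } else {
    s[(n + 1) / 2]
  }
}

print(vec_median(c(3, 1, 2)))      # odd number of elements
print(vec_median(c(4, 1, 3, 2)))   # even number of elements
```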
Experiment #2-D:
Aim: R program to find mode of the given data
Code:
mod=function()
{
print("Enter the elements of vector:")
x=scan()
print(table(x))
f=as.numeric(table(x))
x1=sort(unique(x))
mf=max(f)
for(i in 1:length(f))
{
if(f[i]==mf)
cat("\nMode is ",x1[i])
}
}
Output:
mod()
[1] "Enter the elements of vector:"
1: 1
2: 1
3: 2
4: 3
5: 4
6: 5
7: 5
8: 6
9: 6
10: 7
11: 7
12:
Read 11 items
x
1 2 3 4 5 6 7
2 1 1 1 2 2 2
Mode is 1
Mode is 5
Mode is 6
Mode is 7
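When several values share the highest frequency, it can be convenient to return all of them as a vector instead of printing them one by one. A minimal sketch, assuming numeric input; vec_modes is a hypothetical helper name.

```r
# Return every value whose frequency equals the maximum frequency.
# vec_modes is a hypothetical helper name used only for this sketch.
vec_modes <- function(x) {
  freq <- table(x)
  is_mode <- as.vector(freq) == max(freq)
  as.numeric(names(freq)[is_mode])
}

x <- c(1, 1, 2, 3, 4, 5, 5, 6, 6, 7, 7)   # values from the sample run
print(vec_modes(x))
```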
Experiment #2-E:
Aim: R program to perform different operations on matrices
Code:
read=function()
{
A=matrix(c(1:9),nrow=3,ncol=3,byrow=T)
B=matrix(c(10:18),nrow=3,ncol=3,byrow=T)
m1=nrow(A)
n1=ncol(A)
m2=nrow(B)
n2=ncol(B)
cat("Matrix A:\n")
print(A)
cat("Matrix B:\n")
print(B)
if(m1==m2 && n1==n2)
{
cat("Sum of the matrices is A+B=\n")
print(A+B)
} else
cat("\n Addition of matrices is not possible")
if(n1==m2)
{
cat("Product of the matrices is A*B=\n")
print(A%*%B)
} else
cat("\n Multiplication of matrices is not possible")
cat("Transpose of the Matrix A is:\n")
print(t(A))
cat("Transpose of the Matrix B is:\n")
print(t(B))
}
Output:
read()
Matrix A:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
Matrix B:
[,1] [,2] [,3]
[1,] 10 11 12
[2,] 13 14 15
[3,] 16 17 18
Experiment #3-A:
Aim: R Program to create a list containing a vector, a matrix and a list and write
a code for the following.
# 1) Give names to the elements in the list
# 2) Add element at the end of the list
# 3) Remove the second element
Code:
# Creating a list
a=c(23,4,5,56)
b=matrix(data=1:9,nrow=3)
c=list(35,"ravi","Male")
lst=list(a,b,c)
print(lst)
Output:
[[1]]
[1] 23 4 5 56
[[2]]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[[3]]
[[3]][[1]]
[1] 35
[[3]][[2]]
[1] "ravi"
$matrix
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
$info
$info[[1]]
[1] 35
$info[[2]]
[1] "ravi"
$info[[3]]
[1] "Male"
$matrix
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
$info
$info[[1]]
[1] 35
$info[[2]]
[1] "ravi"
$info[[3]]
[1] "Male"
[[4]]
[1] 1 2 3
$info
$info[[1]]
[1] 35
$info[[2]]
[1] "ravi"
$info[[3]]
[1] "Male"
[[3]]
[1] 1 2 3
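The later outputs above correspond to the three tasks in the aim (naming, appending, removing). The statements below are one way to produce them; this is a sketch, not necessarily the original code. The names "matrix" and "info" appear in the output, while "vector" for the first element is an assumption.

```r
# Build the list, then name, extend and shrink it.
a    <- c(23, 4, 5, 56)
b    <- matrix(data = 1:9, nrow = 3)
info <- list(35, "ravi", "Male")
lst  <- list(a, b, info)

# 1) Give names to the elements ("vector" is an assumed name)
names(lst) <- c("vector", "matrix", "info")
print(lst)

# 2) Add an element at the end of the list
lst[[4]] <- c(1, 2, 3)
print(lst)

# 3) Remove the second element (the matrix)
lst[[2]] <- NULL
print(lst)
```

Assigning NULL to a position drops that element and shifts the later ones down, which is why the appended vector shows up as [[3]] in the last output.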
Experiment #3-B:
Aim: R program to create a data frame of student with four given vectors and write a
code
# 1) to get the structure of a given data frame.
# 2) to get the statistical summary and nature of the data of a given data frame.
# 3) to extract specific column from a data frame using column name.
# 4) to extract first two rows from a given data frame.
# 5) to extract 3rd and 5th rows with 1st and 3rd columns from a given data frame.
# 6) to add a new column in a given data frame.
# 7) to add new row(s) to an existing data frame.
# 8) to drop column(s) by name from a given data frame.
# 9) to drop row(s) by number from a given data frame.
# 10) to extract the records whose grade is greater than 9.
Code:
# creating a data frame
r.no=c("17981A0461","17981A0462","17981A0463","17981A0464","17981A0465","17981A0466")
name=c("ramu","ahmed","samuel","singh","begum","prasanthi")
grade=c(8.4,9.9,7.5,8.7,9.1,6.8)
sex=c("M","M","M","M","F","F")
df_stud=data.frame(r.no,name,grade,sex)
print(df_stud)
# 5) Extracting 3rd and 5th rows with 1st and 3rd columns
print("The 3rd and 5th rows with 1st and 3rd columns are:")
print(df_stud[c(3,5),c(1,3)])
Output:
r.no name grade sex
1 17981A0461 ramu 8.4 M
2 17981A0462 ahmed 9.9 M
3 17981A0463 samuel 7.5 M
4 17981A0464 singh 8.7 M
5 17981A0465 begum 9.1 F
6 17981A0466 prasanthi 6.8 F
[1] "The structure of the data frame is :"
'data.frame': 6 obs. of 4 variables:
$ r.no : Factor w/ 6 levels "17981A0461","17981A0462",..: 1 2 3 4 5 6
$ name : Factor w/ 6 levels "ahmed","begum",..: 4 1 5 6 2 3
$ grade: num 8.4 9.9 7.5 8.7 9.1 6.8
$ sex : Factor w/ 2 levels "F","M": 2 2 2 2 1 1
NULL
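Only part of the code for the ten tasks is shown above. The sketch below covers each task in order; it is an illustration, not the manual's original code. The added column age, its values and the extra student row are hypothetical data.

```r
# Data frame from the experiment (strings kept as character, not factor)
r.no  <- c("17981A0461", "17981A0462", "17981A0463",
           "17981A0464", "17981A0465", "17981A0466")
name  <- c("ramu", "ahmed", "samuel", "singh", "begum", "prasanthi")
grade <- c(8.4, 9.9, 7.5, 8.7, 9.1, 6.8)
sex   <- c("M", "M", "M", "M", "F", "F")
df_stud <- data.frame(r.no, name, grade, sex, stringsAsFactors = FALSE)

str(df_stud)                       # 1) structure
print(summary(df_stud))            # 2) statistical summary
print(df_stud$name)                # 3) column by name
print(df_stud[1:2, ])              # 4) first two rows
print(df_stud[c(3, 5), c(1, 3)])   # 5) rows 3 and 5, columns 1 and 3

# 6) add a new column (hypothetical ages)
df_stud$age <- c(20, 21, 20, 22, 21, 20)

# 7) add a new row (hypothetical student)
new_row <- data.frame(r.no = "17981A0467", name = "kiran", grade = 9.3,
                      sex = "M", age = 21, stringsAsFactors = FALSE)
df_more <- rbind(df_stud, new_row)

# 8) drop a column by name
df_no_age <- df_more[, names(df_more) != "age"]

# 9) drop rows by number
df_fewer <- df_more[-c(2, 4), ]

# 10) records whose grade is greater than 9
high <- subset(df_stud, grade > 9)
print(high)
```

The Factor columns in the str() output above come from R versions before 4.0, where data.frame() converted strings to factors by default.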
Experiment #4-A:
Aim: R program to find the biggest of 3 numbers
Code:
big=function()
{
x=as.numeric(readline("Enter x value:"))
y=as.numeric(readline("Enter y value:"))
z=as.numeric(readline("Enter z value:"))
t=0
if(x>y)
t=x else
t=y
if(t>z)
cat(t," is big") else
cat(z," is big")
}
big()
Output:
Enter x value:2
Enter y value:1
Enter z value:5
5 is big
Experiment #4-B:
Aim: R program to find roots of a quadratic equation
Code:
roots=function()
{
a=as.numeric(readline("Enter a value:"))
b=as.numeric(readline("Enter b value:"))
c=as.numeric(readline("Enter c value:"))
t=b^2-(4*a*c)
if(t<0)
{
cat("Roots are imaginary and roots are ",(-b/(2*a)),"+i",
((sqrt(-t))/(2*a)),"and",(-b/(2*a)),"-i",((sqrt(-t))/(2*a)))
} else
if(t==0)
{
cat("Roots are real and equal and root is ",(-b/(2*a)))
} else
{
cat("Roots are real and unequal\n")
cat("Root1=",(-b+sqrt(t))/(2*a),"\nRoot2=",(-b-sqrt(t))/(2*a))
}
}
Output:
> roots()
Enter a value:1
Enter b value:4
Enter c value:1
Roots are real and unequal
Root1= -0.2679492
Root2= -3.732051
> roots()
Enter a value:1
Enter b value:2
Enter c value:1
Roots are real and equal and root is -1
> roots()
Enter a value:1
Enter b value:1
Enter c value:1
Roots are imaginary and roots are
-0.5 +i 0.8660254 and
-0.5 -i 0.8660254
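R has a built-in complex type, so all three cases can also be handled by one formula if the discriminant is converted with as.complex() before taking the square root. A minimal sketch; quad_roots and its argument names are hypothetical.

```r
# Roots of a*x^2 + b*x + c0 = 0 using complex arithmetic.
# quad_roots is a hypothetical helper name used only for this sketch.
quad_roots <- function(a, b, c0) {
  disc <- as.complex(b^2 - 4 * a * c0)   # complex, so sqrt() accepts negative values
  c((-b + sqrt(disc)) / (2 * a),
    (-b - sqrt(disc)) / (2 * a))
}

print(quad_roots(1, 4, 1))   # real and unequal
print(quad_roots(1, 2, 1))   # real and equal
print(quad_roots(1, 1, 1))   # complex conjugate pair
```

Real roots are returned with a zero imaginary part; Re() extracts the real part when only that is needed.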
Experiment #4-C:
Aim: R program to find sum of elements of vector and to find minimum and maximum
elements of vectors
Code:
vec=function()
{
print("Enter the elements of vector:")
x=scan()
n=length(x)
sum=0
for(i in 1:n)
sum=sum+x[i]
max=min=x[1]
for(i in 1:n)
{
if(x[i]<min)
min=x[i]
if(x[i]>max)
max=x[i]
}
cat(" sum of vector elements=",sum,"\n","Minimum element of vector is:",min,"\n","Maximum element of vector is:",max,"\n")
}
Experiment #5-A:
Aim: R program to find Factorial of a number using recursive function
Code:
fact=function()
{
n=as.numeric(readline("Enter n value:"))
f=fact1(n)
if(n>=0)
cat("Factorial of ",n," is ",f,"\n")
}
fact1=function(n)
{
if(n>=0)
{
if(n==0)
return(1) else
return(n*fact1(n-1))
} else
print("Factorial of negative number is not possible to compute")
}
Output:
> fact()
Enter n value:-1
[1] "Factorial of negative number is not possible to compute"
> fact()
Enter n value:0
Factorial of 0 is 1
> fact()
Enter n value:8
Factorial of 8 is 40320
Experiment #5-B:
Aim: R program to find GCD of two numbers
Code:
gcd=function()
{
x=as.numeric(readline("Enter x value:"))
y=as.numeric(readline("Enter y value:"))
g=gcd1(x,y)
cat("GCD of ",x," and ",y," is ",g,"\n")
}
gcd1=function(x,y)
{
if(y!=0)
return(gcd1(y,x%%y)) else
return(x)
}
Output:
> gcd()
Enter x value:5
Enter y value:7
GCD of 5 and 7 is 1
> gcd()
Enter x value:125
Enter y value:35
GCD of 125 and 35 is 5
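The recursion above is Euclid's algorithm: gcd(x, y) = gcd(y, x mod y), stopping when the second argument becomes 0. The same idea can be written with a loop. This is a sketch assuming non-negative whole-number inputs; gcd_iter is a hypothetical name.

```r
# Iterative Euclid's algorithm.
# gcd_iter is a hypothetical helper name used only for this sketch.
gcd_iter <- function(x, y) {
  while (y != 0) {
    remainder <- x %% y
    x <- y
    y <- remainder
  }
  x
}

print(gcd_iter(5, 7))
print(gcd_iter(125, 35))
```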
Experiment #6-A:
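Aim: R program to find the mean, variance and standard deviation of the given discrete probability
distribution.
Code:
library(distrEx) # E(), var() and sd() for distribution objects come from distrEx, which also loads distr
discrete=function()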
{
print("Enter the values of x")
x=scan()
print("Enter the values of p")
p=scan()
y=DiscreteDistribution(supp=x,prob=p)
cat("Mean of the probability distribution is ",E(y))
cat("\nVariance of the probability distribution is ",var(y))
cat("\nStandard Deviation of the probability distribution is ",sd(y))
cat("\n The Distribution function is \n","x ",x,sep="\t","\n","F(x) ",cumsum(p))
}
Output:
> discrete()
[1] "Enter the values of x"
1: 0
2: 1
3: 2
4:
Read 3 items
[1] "Enter the values of p"
1: 0.3
2: 0.5
3: 0.2
4:
Read 3 items
Mean of the probability distribution is 0.9
Variance of the probability distribution is 0.49
Standard Deviation of the probability distribution is 0.7
The Distribution function is
x 0 1 2
F(x) 0.3 0.8 1
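The same quantities follow directly from the definitions E(X) = Σ x·p and Var(X) = Σ (x − E(X))²·p, so they can be verified in base R without any extra package. A minimal sketch using the values from the sample run; the variable names are illustrative.

```r
# Moments of a discrete distribution from first principles.
x <- c(0, 1, 2)
p <- c(0.3, 0.5, 0.2)
stopifnot(abs(sum(p) - 1) < 1e-9)   # probabilities must sum to 1

mu       <- sum(x * p)              # mean
variance <- sum((x - mu)^2 * p)     # variance
std_dev  <- sqrt(variance)          # standard deviation
cdf      <- cumsum(p)               # distribution function F(x)

cat("Mean:", mu, " Variance:", variance, " SD:", std_dev, "\n")
print(rbind(x = x, "F(x)" = cdf))
```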
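Code:
library(distrEx) # E(), var() and sd() for distribution objects come from distrEx, which also loads distr
contin=function()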
{
f=function(x) 3*x^2
p=integrate(f,lower=0.14,upper=0.71)
print("The probability that x lies between 0.14 and 0.71 is ")
print(p)
x=AbscontDistribution(d=f,low1 =0,up1 =1)
cat("Mean of the probability distribution is ",E(x))
cat("\nVariance of the probability distribution is ",var(x))
cat("\nStandard Deviation of the probability distribution is ",sd(x))
#cat("\n The Distribution function is \n","x ",x,sep="\t","\n","F(x) ",cumsum(p))
}
Output:
> contin()
[1] "The probability that x lies between 0.14 and 0.71 is "
0.355167 with absolute error < 3.9e-15
Mean of the probability distribution is 0.7496337
Variance of the probability distribution is 0.03768305
Standard Deviation of the probability distribution is 0.1941212
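For the density f(x) = 3x² on [0, 1] the moments can also be obtained with integrate() alone, which gives a check on the values above. The exact results are E(X) = 3/4 and Var(X) = 3/5 − 9/16 = 3/80 = 0.0375, so the figures printed by the distribution object are numerical approximations. A minimal sketch with illustrative variable names:

```r
# Probability and moments of f(x) = 3x^2 on [0, 1] by numerical integration.
f <- function(x) 3 * x^2

prob <- integrate(f, lower = 0.14, upper = 0.71)$value        # P(0.14 < X < 0.71)
mu   <- integrate(function(x) x * f(x), 0, 1)$value           # E(X)
ex2  <- integrate(function(x) x^2 * f(x), 0, 1)$value         # E(X^2)
variance <- ex2 - mu^2
std_dev  <- sqrt(variance)

cat("P(0.14 < X < 0.71):", prob, "\n")
cat("Mean:", mu, " Variance:", variance, " SD:", std_dev, "\n")
```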
Code:
#Scatter plot
plot(iris$Sepal.Length,iris$Sepal.Width,type="p")
#Histogram
par(mfrow=c(1,2))
hist(iris$Sepal.Length,main="First")
hist(iris$Sepal.Width,main="Second")
par(mfrow=c(1,2))
hist(iris$Petal.Length,main="Third")
hist(iris$Petal.Width,main="Fourth")
#Pie-chart
pie(table(iris$Species))
#Box-plot
boxplot(iris$Sepal.Length,iris$Sepal.Width,iris$Petal.Length,iris$Petal.Width)
#Bar-plot
barplot(head(iris$Sepal.Length),xlab="Sepal length")
Output:
Code:
fit_binom=function()
{
n=as.integer(readline("enter the no. of coins tossed: "))
x=0:n
print("enter the values of f ")
f=scan()
ex_freq=0
cat("\n The given Distribution is \n x",x,sep="\t","\n f",f,"\n")
N=sum(f)
if(length(x)==length(f))
{
a=as.logical(readline(prompt="Is the coin unbiased ? :Enter T for TRUE or F for FALSE \n"))
if(a==T)
p=0.5
else
{
meen=sum(x*f)/N
p=meen/n
}
for(i in 1:(n+1))
ex_freq[i]=N*(dbinom(i-1,n,p))
cat("The expected frequencies are \n",ex_freq)
cat("\n The fitted Binomial Distribution is \n x",x,sep="\t","\n f",round(ex_freq),"\n")
}else
print("No. of observations in x and f must be equal")
}
Output:
> fit_binom()
enter the no. of coins tossed: 3
[1] "enter the values of f "
1: 2
2: 4
3: 5
4: 6
5:
Read 4 items
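The expected frequencies for the run above can be reproduced without interactive input, because dbinom() is vectorised over x. In the sketch below, fit_binom_freq is a hypothetical helper: it takes the observed frequencies for x = 0, 1, ..., n and either a known p or, if p is omitted, the estimate p = mean/n.

```r
# Expected binomial frequencies for observed counts f at x = 0..n.
# fit_binom_freq is a hypothetical helper name used only for this sketch.
fit_binom_freq <- function(f, p = NULL) {
  n <- length(f) - 1            # number of coins tossed
  x <- 0:n
  N <- sum(f)                   # total number of observations
  if (is.null(p)) {
    p <- sum(x * f) / (N * n)   # estimate p from the sample mean
  }
  N * dbinom(x, n, p)
}

f <- c(2, 4, 5, 6)                     # frequencies from the sample run
print(fit_binom_freq(f, p = 0.5))      # unbiased coin
print(round(fit_binom_freq(f)))        # p estimated from the data
```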
Experiment #8-B:
Aim: R program to fit Poisson distribution to the given data
Code:
fit_poisson=function()
{
print("enter the values of x:")
x=scan()
print("enter the values of f:")
f=scan()
cat("\n The Given Distribution is \nx:",x,sep="\t","\nf:",f,"\n")
ex_freq=0
N=sum(f)
meen=sum(x*f)/N
for(i in 1:length(x))
ex_freq[i]=N*(dpois(x[i],meen))
cat("The expected frequencies are \n",ex_freq)
cat("\n The fitted Poisson Distribution is \nx:",x,sep="\t","\nf:",round(ex_freq),"\n")
}
Output:
[1] "enter the values of x:"
1: 0
2: 1
3: 2
4: 3
5: 4
6:
Read 5 items
[1] "enter the values of f:"
1: 10
2: 9
3: 8
4: 7
5: 6
6:
Read 5 items
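For the data entered above, the fit can be checked non-interactively: the Poisson parameter is estimated by the sample mean, and dpois() is evaluated at the observed x values. A minimal sketch with illustrative variable names:

```r
# Poisson fit for the sample data: lambda is the weighted mean of x.
x <- 0:4
f <- c(10, 9, 8, 7, 6)

N        <- sum(f)                 # total frequency
lambda   <- sum(x * f) / N         # estimated Poisson mean
expected <- N * dpois(x, lambda)   # expected frequencies at each x

cat("lambda =", lambda, "\n")
print(rbind(x = x, observed = f, expected = round(expected, 2)))
```

The expected frequencies add up to slightly less than N because the Poisson distribution also puts probability on values above the largest observed x.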
A manufacturer claims that the mean lifetime of a light bulb is more than 10,000 hrs. In
a sample of 30 light bulbs, it was found that they only last 9,900 hrs on average. Assume
that the population standard deviation is 120 hrs. At the 0.05 significance level, can we reject
the claim by the manufacturer?
Aim: To test the claim
H0: mu>=10000
H1: mu<10000
Alpha=0.05=5%
Critical value from the z-table is -1.645 (lower-tailed test)
Code:
xbar=9900
mu=10000
n=30
sigma=120
z=(xbar-mu)/(sigma/sqrt(n))
Output:
>z
[1] -4.564355
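Reading the problem as a lower-tailed test (the claim is rejected when the sample mean is significantly below 10,000 hrs), the critical value and p-value can be obtained from qnorm() and pnorm() instead of a table. A minimal sketch; the variable names are illustrative.

```r
# Lower-tailed z-test with known population standard deviation.
xbar  <- 9900     # sample mean
mu0   <- 10000    # hypothesised mean
n     <- 30       # sample size
sigma <- 120      # population standard deviation
alpha <- 0.05

z       <- (xbar - mu0) / (sigma / sqrt(n))
z_crit  <- qnorm(alpha)    # lower-tail critical value, about -1.645
p_value <- pnorm(z)        # P(Z <= z)

cat("z =", z, " critical value =", z_crit, " p-value =", p_value, "\n")
cat("Reject H0:", z < z_crit, "\n")
```

Since z is far below the critical value, the claim is rejected at the 0.05 level.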
Experiment #9-B:
Suppose the mean weight of king penguins found in an Antarctic colony last year was
15.4 kg. In a sample of 35 penguins taken at the same time this year in the same colony, the mean
penguin weight is 14.5 kg. Assume that the population standard deviation is 2.5 kg. At the
0.05 significance level, can we reject the null hypothesis that the mean penguin weight
does not differ from last year?
Code:
xbar=14.5
mu=15.4
n=35
sigma=2.5
z=(xbar-mu)/(sigma/sqrt(n))
Output:
[1] -2.129789
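Because the null hypothesis here is "no difference", the test is two-tailed: z is compared with ±z(α/2). The sketch below uses the values from the code above and computes the critical value and p-value; the variable names are illustrative.

```r
# Two-tailed z-test with known population standard deviation.
xbar  <- 14.5
mu0   <- 15.4
n     <- 35
sigma <- 2.5
alpha <- 0.05

z       <- (xbar - mu0) / (sigma / sqrt(n))
z_crit  <- qnorm(1 - alpha / 2)    # about 1.96
p_value <- 2 * pnorm(-abs(z))      # both tails

cat("z =", z, " critical values = +/-", z_crit, " p-value =", p_value, "\n")
cat("Reject H0:", abs(z) > z_crit, "\n")
```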
Experiment #9-C:
t-test
Code:
# The immer data set (barley yields for 1931 and 1932) ships with the MASS package.
# The output below (df = 29) uses all 30 of its rows.
library(MASS)
t.test(immer$Y1,immer$Y2,paired=TRUE)
Output:
Paired t-test
data: immer$Y1 and immer$Y2
t = 3.324, df = 29, p-value = 0.002413
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.121954 25.704713
sample estimates:
mean of the differences
15.91333
Experiment #9-D:
Five measurements of tar content of certain kind of cigarette yielded 14.5, 14.2, 14.4,
14.3, 14.6 milligrams per cigarette. Show that the difference between the mean of this
Code:
data<-c(14.5, 14.2, 14.4, 14.3, 14.6)
t.test(data,mu=14.0)
Output:
One Sample t-test
data: data
t = 5.6569, df = 4, p-value = 0.004813
alternative hypothesis: true mean is not equal to 14
95 percent confidence interval:
14.20368 14.59632
sample estimates:
mean of x
14.4
Experiment #9-E:
The heights of 6 randomly chosen sailors are 63,65,68,69,71,72 inches. Those of 10
randomly chosen soldiers are 61,62,65,66,69,69,70,71,72,73 inches. Discuss whether
this data gives a suggestion that the sailors are taller than soldiers.
Aim: To test the claim that sailors are taller than soldiers
H0: x = y
H1: x > y
Level of significance: Appropriate level of significance is 5% (chosen)
The tabulated value of t at 5% level of significance for 14 degrees of freedom in a right-tailed
test is 1.761. [$t_{\alpha,\,n_1+n_2-2} = t_{0.05,\,14} = 1.761$]
Code:
sailors<-c(63,65,68,69,71,72)
soldiers<-c(61,62,65,66,69,69,70,71,72,73)
t.test(sailors,soldiers, alternative = "greater", conf.level = 0.95)
Output:
Welch Two Sample t-test
data: sailors and soldiers
t = 0.10388, df = 12.228, p-value = 0.4595
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-3.226071 Inf
sample estimates:
mean of x mean of y
68.0 67.8
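By default t.test() runs the Welch test, which does not assume equal variances and therefore reports a fractional df (12.228 above). The 14 degrees of freedom quoted from the t-table correspond to the pooled-variance test, which can be requested with var.equal = TRUE. A sketch of that variant:

```r
# Pooled-variance (Student) two-sample t-test, right-tailed.
sailors  <- c(63, 65, 68, 69, 71, 72)
soldiers <- c(61, 62, 65, 66, 69, 69, 70, 71, 72, 73)

res <- t.test(sailors, soldiers, alternative = "greater",
              var.equal = TRUE, conf.level = 0.95)
print(res)

t_crit <- qt(0.95, df = length(sailors) + length(soldiers) - 2)
cat("Tabulated t at 5% for 14 df:", round(t_crit, 3), "\n")
cat("Reject H0:", unname(res$statistic) > t_crit, "\n")
```

Under either variant the test statistic is far below the critical value, so the data do not suggest that sailors are taller than soldiers.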
Sample 1 16 26 27 23 24 22
Sample 2 33 42 35 32 28 31
Do the population variances differ significantly?
Code:
data1<-c(16,26,27,23,24,22)
data2<-c(33,42,35,32,28,31)
F<-var.test(data1,data2)
F
Output:
F test to compare two variances
data: data1 and data2
F = 0.6696, num df = 5, denom df = 5, p-value = 0.6706
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.09369826 4.78524246
sample estimates:
Experiment #9-G:
In a large manufacturing factory, a survey was conducted regarding three types of bonus
schemes. Total employees were divided into four categories namely laborers, clerks,
technicians and executives. The results obtained by way of opinion survey are presented
in the form of a contingency table as given below. Test the goodness of fit at the 5% level of
significance.
EMPLOYEES          BONUS SCHEMES
CATEGORY       Type 1   Type 2   Type 3
Laborers         190      243      197
Clerks            82       44       44
Technicians       23       78       34
Executives         5       12        8
Code:
M<-as.table(rbind(c(190,243,197),c(82,44,44),c(23,78,34),c(5,12,8)))
dimnames(M)<-list(empcategory=c("labour","clerks","technicians","executives"),
bonuschemes=c("type 1","type 2","type 3"))
xsq<-chisq.test(M)
xsq
Output:
Pearson's Chi-squared test
data: M
X-squared = 48.101, df = 6, p-value = 1.128e-08
Conclusion:
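The calculated value of χ² = 48.101
The table value of χ² at the 0.05 level with 6 degrees of freedom = 12.59
Since the calculated value exceeds the table value, the null hypothesis is rejected at the 5% level of significance.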
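The table value can be obtained in R with qchisq(), and the object returned by chisq.test() also carries the expected counts under the null hypothesis. A minimal sketch using the same contingency table; the variable names are illustrative.

```r
# Chi-squared test on the bonus-scheme contingency table.
M <- rbind(labour      = c(190, 243, 197),
           clerks      = c(82, 44, 44),
           technicians = c(23, 78, 34),
           executives  = c(5, 12, 8))
colnames(M) <- c("type 1", "type 2", "type 3")

res  <- chisq.test(M)
df   <- (nrow(M) - 1) * (ncol(M) - 1)   # (4 - 1) * (3 - 1) = 6
crit <- qchisq(0.95, df = df)           # table value at the 5% level

print(round(res$expected, 2))           # expected counts under H0
cat("Calculated:", unname(res$statistic), " Table value:", crit, "\n")
cat("Reject H0:", unname(res$statistic) > crit, "\n")
```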
Code:
x<-c(16,21,26,23,28,24,17,22,21)
y<-c(33,38,50,39,52,47,35,43,41)
cor(x,y)
Output:
[1] 0.9471715
Experiment #10-B:
Find the Karl Pearson’s correlation coefficient for the following data on
heights(inches) of fathers (x) and their sons(y)
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Code:
x<-c(65,66,67,67,68,69,70,72)
y<-c(67,68,65,68,72,72,69,71)
cor(x,y)
Output:
[1] 0.6030227
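cor() computes Karl Pearson's coefficient r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). Evaluating the formula step by step gives a check on the built-in result. A minimal sketch with illustrative variable names:

```r
# Pearson correlation from its definition, compared with cor().
x <- c(65, 66, 67, 67, 68, 69, 70, 72)   # heights of fathers
y <- c(67, 68, 65, 68, 72, 72, 69, 71)   # heights of sons

dx <- x - mean(x)
dy <- y - mean(y)
r  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r)
print(all.equal(r, cor(x, y)))
```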
Experiment #10-C:
Fit a linear regression of y on x for the following data
X 1 2 3 4 5 6 7 8 9
y 11 12 13 14 15 16 17 18 19
Code:
x=c(1:9)
y=c(11:19)
lm(y~x)
summary(lm(y~x))
Output:
Coefficients:
(Intercept) x
10 1
> summary(lm(y~x))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-9.006e-16 -2.472e-16 -2.031e-16 -1.370e-16 1.724e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.000e+01 5.784e-16 1.729e+16 <2e-16 ***
x 1.000e+00 1.028e-16 9.729e+15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
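For simple linear regression the least-squares estimates have a closed form: slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. The sketch below evaluates these and compares them with coef(lm(y ~ x)); the variable names are illustrative.

```r
# Least-squares line y = a + b*x from the closed-form estimates.
x <- 1:9
y <- 11:19

b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # slope
a <- mean(y) - b * mean(x)                                       # intercept

cat("Intercept:", a, " Slope:", b, "\n")
print(coef(lm(y ~ x)))
```

Here y is exactly x + 10, so the fit is perfect and the residuals in the summary are only floating-point noise.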
Experiment #10-D:
Fit a multiple linear regression using the iris data, taking Sepal.Length as the response
variable.
Code:
lm(iris$Sepal.Length~(iris$Sepal.Width+iris$Petal.Length+iris$Petal.Width))
Output:
Call:
lm(formula = iris$Sepal.Length ~ (iris$Sepal.Width + iris$Petal.Length +
iris$Petal.Width))
Coefficients:
(Intercept) iris$Sepal.Width iris$Petal.Length
1.8560 0.6508 0.7091
iris$Petal.Width
-0.5565
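Passing the data frame through the data argument gives the same fit with shorter coefficient names, and the fitted object can then be used with predict(). A minimal sketch; the measurements in new_flower are hypothetical values chosen only to illustrate a prediction.

```r
# Same model, written with the data argument.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
print(round(coef(fit), 4))

# Predict the sepal length of one flower (hypothetical measurements)
new_flower <- data.frame(Sepal.Width = 3.0, Petal.Length = 4.0, Petal.Width = 1.2)
print(predict(fit, newdata = new_flower))
```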