Statistical Modeling Using R - Lab Manual
2. How do you import the first sheet of an Excel file named data.xlsx into R?
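# Load the readxl package
library(readxl)
# Read the first sheet of data.xlsx (the first sheet is also the default)
excel_data <- read_excel("data.xlsx", sheet = 1)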
3. Assign the value 10 to a variable a and the value 20 to a variable b. Calculate their sum,
difference, product, and quotient.
# Variable assignment
a <- 10
b <- 20
# Arithmetic operations
sum <- a + b
difference <- a - b
product <- a * b
quotient <- a / b
4. Create a numeric vector v with values from 1 to 5. Calculate the mean and standard
deviation of v.
# Create a numeric vector
v <- c(1, 2, 3, 4, 5)
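The answer above creates the vector but stops before computing the two statistics the question asks for. A minimal sketch using base R's mean() and sd(); the names v_mean and v_sd are chosen here for illustration only.

```r
# Create a numeric vector with the values 1 to 5
v <- c(1, 2, 3, 4, 5)

# Arithmetic mean of v
v_mean <- mean(v)

# Sample standard deviation of v (sd() divides by n - 1)
v_sd <- sd(v)

print(v_mean)
print(v_sd)
```

Note that sd() returns the sample standard deviation, so for this vector it equals sqrt(2.5) rather than sqrt(2).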
6. Convert the numeric vector v (from Question 5) into a factor with three levels: "Low"
(1, 2), "Medium" (3), and "High" (4, 5).
# Create a numeric vector
v <- c(1, 2, 3, 4, 5)
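The conversion step itself is not shown above. One possible way to do it is with base R's cut(), which bins a numeric vector into labelled intervals. The name v_factor and the break points c(0, 2, 3, 5) are assumptions made for this sketch; any breaks that separate 2 from 3 and 3 from 4 would give the same grouping.

```r
# Create a numeric vector
v <- c(1, 2, 3, 4, 5)

# Bin v into three right-closed intervals: (0, 2], (2, 3], (3, 5]
# and attach the labels requested in the question
v_factor <- cut(v,
                breaks = c(0, 2, 3, 5),
                labels = c("Low", "Medium", "High"))

print(v_factor)
print(levels(v_factor))
```

Because cut() closes intervals on the right by default, the values 1 and 2 fall into "Low", 3 into "Medium", and 4 and 5 into "High".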
# Load ggplot2 and create a scatterplot
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Scatterplot of MPG vs Weight") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles Per Gallon (mpg)")
8. Prepare a dataset of your own with missing values, special values, and outliers.
# Create a dataset
my_data <- data.frame(
  id = 1:10,
  # NA is a missing value, Inf is a special value, and 1000 is an outlier
  value = c(10, 20, 30, NA, 50, Inf, 70, 80, 1000, 90)
)
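To confirm that the dataset really contains each kind of problem value, it can be inspected with base R. The sketch below is only one way to do this: the 1.5 * IQR rule for outliers is an assumption (the question does not prescribe a rule), and the names n_missing, n_special, and outliers are illustrative.

```r
# Same dataset as in the answer above
my_data <- data.frame(
  id = 1:10,
  value = c(10, 20, 30, NA, 50, Inf, 70, 80, 1000, 90)
)

# Missing values: NA entries
n_missing <- sum(is.na(my_data$value))

# Special values: Inf and -Inf (is.infinite() is FALSE for NA and NaN)
n_special <- sum(is.infinite(my_data$value))

# Outliers among the finite values, using the 1.5 * IQR rule
# (assumed here; other rules are equally valid)
finite_vals <- my_data$value[is.finite(my_data$value)]
q <- quantile(finite_vals, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])
outliers <- finite_vals[finite_vals < q[1] - fence | finite_vals > q[2] + fence]

print(c(missing = n_missing, special = n_special))
print(outliers)
```

The NA and Inf entries are dropped before computing the quartiles, since quantile() cannot work with NA and Inf would distort the fences.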
9. From the starwars dataset, select the columns name, height, gender, and species.
# Load the dplyr package and starwars dataset
library(dplyr)
data(starwars)
# Select the required columns
starwars %>%
  select(name, height, gender, species)
10. From the starwars dataset, print the details of all those who have "yellow" or "blue"
as their eye_color.
# Filter rows based on eye_color
eyes <- starwars %>%
  filter(eye_color %in% c("yellow", "blue"))
# View the details
eyes
11. For all the cars in the mtcars dataset, arrange the dataset in ascending order based on
the mpg value.
# Arrange the dataset by mpg in ascending order
mtcars_arranged <- mtcars %>%
  arrange(mpg)
# View the arranged dataset
head(mtcars_arranged)
12. From the starwars dataset, print the name, height, and hair_color of all those who have
height > 175 and their hair_color is "black".
# Filter by height and hair_color
filtered_starwars <- starwars %>%
  filter(height > 175, hair_color == "black") %>%
  select(name, height, hair_color)
13. From the starwars dataset, create a factor variable for the species column and then
discretize the height column into three categories: "Short", "Average", and "Tall".
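# Load the dplyr package and starwars dataset
library(dplyr)
data(starwars)
# Create the factor and the height categories
# (the 150 cm and 190 cm cut points are illustrative assumptions)
starwars <- starwars %>%
  mutate(species_factor = factor(species),
         height_category = cut(height, breaks = c(-Inf, 150, 190, Inf),
                               labels = c("Short", "Average", "Tall")))
# View the updated dataset with the new factor and discretized columns
head(starwars %>% select(name, species, species_factor, height,
                         height_category))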
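Explanation: species_factor holds species as a factor, that is, a categorical variable with a
fixed set of levels, and height_category bins the numeric height column into the labels
"Short", "Average", and "Tall".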
25. Use the mtcars dataset to explore numerical variables. Create a boxplot to visualize the
distribution of mpg (miles per gallon), hp (horsepower), and qsec (quarter mile time).
Interpret the variability and central tendency of each variable based on the boxplot.
# Load the dataset
data(mtcars)
# Boxplot for hp
boxplot(mtcars$hp, main = "Horsepower (hp)")
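Only the hp boxplot appears above, while the question also asks for mpg and qsec. A possible sketch covering all three variables follows. Drawing the panels separately is a choice made here because the variables are on very different scales; the names vars, old_par, and stats are illustrative.

```r
# Load the dataset
data(mtcars)

# The three variables named in the question
vars <- c("mpg", "hp", "qsec")

# One boxplot per variable, side by side, each on its own scale
old_par <- par(mfrow = c(1, 3))
boxplot(mtcars$mpg, main = "Miles Per Gallon (mpg)")
boxplot(mtcars$hp, main = "Horsepower (hp)")
boxplot(mtcars$qsec, main = "Quarter Mile Time (qsec)")
par(old_par)

# Median (central tendency) and IQR (variability) for each variable
stats <- sapply(mtcars[, vars], function(x) c(median = median(x), IQR = IQR(x)))
print(stats)
```

The thick line in each box is the median, the box height is the interquartile range, and points beyond the whiskers are candidates for outliers. The printed table gives the same two quantities numerically to support the interpretation.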
26. Use the mtcars dataset to compare mpg vs wt (weight) using a scatter plot. Add a trend
line and interpret the relationship between mpg and wt.
# Scatter plot of mpg vs wt with trend line
plot(mtcars$wt, mtcars$mpg, main = "Scatter Plot of MPG vs Weight", xlab =
"Weight", ylab = "Miles Per Gallon")
abline(lm(mpg ~ wt, data = mtcars), col = "blue")
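To back the visual interpretation with numbers, the slope of the trend line and the correlation can be extracted directly. This is a sketch, not part of the original answer; the names fit, slope, and r are illustrative.

```r
data(mtcars)

# Fit the same linear model that abline() draws in the plot above
fit <- lm(mpg ~ wt, data = mtcars)

# Slope: estimated change in mpg per additional 1000 lbs of weight
slope <- unname(coef(fit)["wt"])

# Pearson correlation between weight and mpg
r <- cor(mtcars$wt, mtcars$mpg)

print(slope)
print(r)
```

For mtcars the slope is roughly -5.3 and the correlation roughly -0.87, which indicates a strong negative linear relationship: heavier cars tend to have lower fuel efficiency.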
27. Generate histograms and density plots for Sepal.Length and Petal.Length from the iris
dataset. Compare the distributions and comment on any differences or similarities
between these two variables.
# Load the dataset
data(iris)
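The plotting code for this question is not shown above. A minimal base R sketch is given below; the layout, titles, axis labels, and the object names d_sepal and d_petal are choices made here, not part of the original answer.

```r
data(iris)

# Histograms side by side
old_par <- par(mfrow = c(1, 2))
hist(iris$Sepal.Length, main = "Sepal.Length", xlab = "Sepal length (cm)")
hist(iris$Petal.Length, main = "Petal.Length", xlab = "Petal length (cm)")
par(old_par)

# Kernel density estimates, overlaid on a common scale
d_sepal <- density(iris$Sepal.Length)
d_petal <- density(iris$Petal.Length)
plot(d_sepal, main = "Density of Sepal.Length and Petal.Length",
     xlim = range(d_sepal$x, d_petal$x), ylim = range(d_sepal$y, d_petal$y))
lines(d_petal, lty = 2)
legend("topright", legend = c("Sepal.Length", "Petal.Length"), lty = c(1, 2))
```

When comparing the two, look at the number of peaks and the spread. Petal.Length usually shows two separated groups (the setosa flowers have much shorter petals than the other two species), while Sepal.Length is closer to a single hump.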
# Perform t-test
t_test <- t.test(mpg ~ cyl, data = mtcars, subset = cyl %in% c(4, 6))
t_test
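# Check assumptions with the standard diagnostic plots
# (model must already hold the fitted model object, e.g. the result of lm())
plot(model)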
32. Use the iris dataset to perform a multiple linear regression to predict Petal.Length based
on Sepal.Length, Sepal.Width, and Petal.Width. Check the assumptions of multiple
regression (linearity, normality of residuals, multicollinearity) and validate the model.
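# Fit the multiple linear regression model
model <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width, data = iris)
summary(model)
# Check assumptions with the diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)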
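The diagnostic plots address linearity and residual normality visually, but they do not address multicollinearity. The sketch below adds a Shapiro-Wilk test on the residuals and computes variance inflation factors by hand in base R, which avoids assuming that an extra package such as car is installed. The names normality, predictors, and vif are illustrative.

```r
data(iris)

# Model from the question
model <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width, data = iris)

# Normality of residuals: Shapiro-Wilk test
normality <- shapiro.test(residuals(model))

# Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Width")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = iris))$r.squared
  1 / (1 - r2)
})

print(normality)
print(vif)
```

A common rule of thumb treats VIF values above about 5 or 10 as a sign of problematic multicollinearity, and a small Shapiro-Wilk p-value as evidence against normally distributed residuals. Both thresholds are conventions rather than strict rules.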
# Perform ANOVA
anova_result <- aov(weight ~ group, data = PlantGrowth)
summary(anova_result)