R Programming for Data Science QB
QUESTION BANK
UNIT I
INTRODUCTION TO R PROGRAMMING
PART A
1. Define R programming.
R is an open-source programming language and software environment used primarily for
statistical computing and data analysis. It provides a wide variety of tools for data
manipulation, visualization, and statistical modeling, making it a popular choice in
fields such as data science, statistics, and machine learning.
2. List out any five features of R.
Open Source: R is free to use and has a vast community of contributors.
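Data Handling: It offers powerful data manipulation capabilities through packages like
dplyr and tidyr.
Visualization: It produces charts and plots through base graphics and packages like
ggplot2.
Statistical Modeling: It provides a wide range of built-in tools for statistical tests
and models.
Extensibility: R can be extended through packages and can integrate with other languages
and technologies.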
3. Differentiate between R and Python in terms of functionality.
R is designed specifically for statistics and data analysis, with extensive packages for
data manipulation, statistical modeling, and visualization (e.g., dplyr, ggplot2).
Python is a general-purpose language that is more flexible across a wider range of
applications, including machine learning and software development.
4. What are the applications of R?
R is used in various fields such as:
Data Science and Analytics: Analyzing and visualizing data to gain insights.
Machine Learning: Building predictive models using libraries like caret and
randomForest.
Bioinformatics: Analyzing genomic and biological data.
Statistical Research: Conducting statistical analysis for research in areas like economics
and psychology.
Finance: Risk analysis, portfolio management, and time-series forecasting.
7. List some R packages that can be used for data imputation.
mice: Multiple Imputation by Chained Equations, which provides a method for imputing
missing data.
Amelia: A package for multiple imputation that handles both continuous and categorical
data.
missForest: An algorithm that imputes missing data using random forests.
Hmisc: Contains functions like aregImpute() for imputing missing values.
DMwR2: Provides knnImputation(), which imputes missing values using the k-nearest
neighbor algorithm.
9. What is the result of sqrt(25) in R? How do you compute the square root of a number in
R?
The result is 5. The sqrt() function computes the principal (non-negative) square root
of its argument, so sqrt(25) returns 5. It works for both perfect squares and
non-perfect squares.
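A short sketch of sqrt() applied to a perfect square, a non-perfect square, and a vector. The variable names are illustrative only.

```r
root_perfect <- sqrt(25)           # 5
root_other   <- sqrt(2)            # approximately 1.414214
root_vector  <- sqrt(c(4, 9, 16))  # 2 3 4, because sqrt() is vectorized
root_power   <- 25^0.5             # the same result using the ^ operator

print(root_perfect)
print(root_vector)
```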
10. What function is used to load an installed package into the R session?
After installing a package, you need to load it into your R session using the library()
function. For instance, library(ggplot2) loads the ggplot2 package, allowing you to use its
functions for data visualization. If the package is not installed, R will show an error
message.
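The sketch below loads a package that ships with R and then shows the error raised for a package that is not installed. The name notARealPackage123 is made up for the demonstration and is assumed not to exist on the system.

```r
# stats is part of every R installation, so this call succeeds
library(stats)

# Hypothetical package name, used only to trigger the "not installed" error
msg <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)  # the error message R would have shown
```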
11. Which function in R lists all objects currently in the workspace?
The ls() function lists all objects (variables, functions, etc.) in the current workspace. For
example, if you run ls(), R will display the names of all objects currently held in memory.
You can also use ls(pattern = "my") to list only the objects whose names contain "my".
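A minimal sketch of ls() with and without a pattern. The object names are invented for illustration, and "^my" is used here (an assumption, slightly stricter than "my") so that only names starting with "my" match.

```r
my_value  <- 42
my_vector <- c(1, 2, 3)
other     <- "text"

all_objects <- ls()                 # names of every object in the workspace
my_objects  <- ls(pattern = "^my")  # only names that start with "my"

print(my_objects)
```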
12. How can you remove a variable x from the R workspace?
To remove a variable or object from the workspace in R, use the rm() function. For
example, running rm(x) will remove the variable x. If you want to remove all objects in
the workspace, you can use rm(list = ls()) to clear everything.
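The following sketch checks that a variable exists, removes it, and checks again. The call that clears the whole workspace is left commented out because it deletes everything.

```r
x <- 10
before_rm <- exists("x", inherits = FALSE)  # TRUE: x is in the workspace

rm(x)
after_rm <- exists("x", inherits = FALSE)   # FALSE: x has been removed

print(c(before = before_rm, after = after_rm))

# rm(list = ls())  # removes every object in the workspace; use with care
```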
13. How do you check the type of a variable x in R?
To check the type of a variable x, use the typeof() or class() functions. typeof(x) returns
the internal type of the object (like "integer" or "double"), while class(x) shows the object’s
class (e.g., "data.frame" or "matrix"). Both functions help to understand the structure of
the data.
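As an illustration, the sketch below compares typeof() and class() on a few objects. The data frame shows where the two differ: it is stored internally as a list but has the class "data.frame".

```r
n  <- 5L
d  <- 3.14
df <- data.frame(a = 1:3)

type_n   <- typeof(n)   # "integer"
type_d   <- typeof(d)   # "double"
class_d  <- class(d)    # "numeric"
type_df  <- typeof(df)  # "list"
class_df <- class(df)   # "data.frame"

print(c(type_n, type_d, class_d, type_df, class_df))
```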
14. What function can you use to view the structure of an object in R?
You can use the str() function to view the structure of an object in R. It gives you a compact
summary of the object, including its type, length, and the first few elements. For example,
str(x) shows how x is organized, whether it's a vector, matrix, data frame, etc.
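A small sketch of str() on a vector and on a data frame; the objects are made up for the example.

```r
x  <- c(2.5, 3.1, 4.8)
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

str(x)   # one line: type, length, and the first values
str(df)  # one header line, then one line per column

# capture.output() stores what str() prints, here only to count the lines
str_lines <- capture.output(str(df))
```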
15. What is the output of 5 %% 2 in R?
The output is 1. The %% operator in R is the modulus operator, which returns the remainder
of the division of 5 by 2. Since 5 divided by 2 leaves a remainder of 1, 5 %% 2 evaluates
to 1. This operator is useful for determining if a number is divisible by another.
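The sketch below shows the modulus operator next to integer division (%/%) and uses it for a divisibility test.

```r
remainder <- 5 %% 2          # 1
quotient  <- 5 %/% 2         # 2, integer division
divisible <- 10 %% 5 == 0    # TRUE: 10 is divisible by 5
is_even   <- c(1, 2, 3, 4) %% 2 == 0  # FALSE TRUE FALSE TRUE

print(remainder)
print(is_even)
```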
PART B
1. Explain how R can be used as a scientific calculator. Demonstrate basic operations with
examples.
2. Discuss how to install and load R packages. Explain the difference between
install.packages() and library().
3. Explain how the workspace functions in R, focusing on ls(), rm(), and get(). Show how
to save and load the workspace using examples, and discuss how these functions enhance
the R workflow.
4. Discuss how to inspect variables in R using functions like str(), class(), typeof(), and
summary() with examples.
5. Explain the different types of operators in R with examples and discuss how operator
precedence works in complex expressions.
6. Explain how to use conditional statements and loops in R with examples and discuss the
advantages and scenarios where each is useful.
7. Explain the importance of functions in R. Describe how to define user-defined functions,
handle arguments, and use return() with examples of nested functions and discuss
function scope in R.
UNIT II
PART A
To combine two vectors, use the c() function. For example, combined <- c(vec1, vec2)
combines vectors vec1 and vec2 into a new vector combined. This operation simply appends
the elements of the second vector to the first one.
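A minimal sketch of c() joining two vectors; the values in vec1 and vec2 are chosen only for illustration.

```r
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)

combined <- c(vec1, vec2)  # 1 2 3 4 5 6
print(combined)
print(length(combined))    # 6
```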
You can transpose a matrix in R using the t() function. For example, t(mat) transposes
the matrix mat, swapping rows and columns. This operation is useful for altering the
orientation of data.
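As an illustration, the sketch below transposes a 2 x 3 matrix and checks that rows and columns have swapped.

```r
mat <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns, filled column by column
transposed <- t(mat)          # 3 rows, 2 columns

print(dim(mat))         # 2 3
print(dim(transposed))  # 3 2
print(transposed)
```

Element [i, j] of the original matrix becomes element [j, i] of the transpose, so mat[1, 3] equals transposed[3, 1].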
Elements in a list are accessed using double square brackets [[ ]]. For example, if
my_list is list(1, "apple", TRUE), then my_list[[2]] accesses the second element, which is
"apple". You can also use the $ operator if the list elements are named.
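The sketch below uses a small named list (the names id, fruit, and flag are invented for the example) to show [[ ]], $, and the difference from single brackets.

```r
my_list <- list(id = 1, fruit = "apple", flag = TRUE)

by_position <- my_list[[2]]   # "apple"
by_name     <- my_list$fruit  # "apple"
sub_list    <- my_list[2]     # a list of length 1, not the element itself

print(by_position)
print(class(sub_list))        # "list"
```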
We can access a specific column in a data frame using the $ operator. For example,
df$Name accesses the "Name" column of the data frame df. Alternatively, you can use
indexing like df[, 1] to access the first column.
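A short sketch of three equivalent ways to pull one column out of a data frame; the data in df is made up for illustration.

```r
df <- data.frame(Name = c("Asha", "Ben"), Age = c(30, 25))

by_dollar <- df$Name       # column by name with $
by_index  <- df[, 1]       # first column by position
by_double <- df[["Name"]]  # column by name with [[ ]]

print(by_dollar)
```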
Use the gsub() function to replace every match of a pattern, which may be a regular
expression. For example, gsub("apple", "orange", "I have an apple") replaces "apple" with
"orange", returning "I have an orange".
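The sketch below repeats the literal replacement and adds a regular-expression pattern. The second input string is invented for the example.

```r
fixed_text <- gsub("apple", "orange", "I have an apple")
print(fixed_text)  # "I have an orange"

# [0-9]+ matches each run of one or more digits
masked <- gsub("[0-9]+", "#", "a1b22c333")
print(masked)      # "a#b#c#"
```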
To convert a string to a Date object, use the as.Date() function. For example,
as.Date("2024-12-16") converts the string "2024-12-16" to a Date object. You can specify
the format with the format argument if necessary.
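A minimal sketch of as.Date() with the default ISO format and with an explicit format string. The day/month/year input is an illustrative assumption.

```r
d_iso    <- as.Date("2024-12-16")                        # default format is %Y-%m-%d
d_custom <- as.Date("16/12/2024", format = "%d/%m/%Y")   # day/month/year input

print(class(d_iso))       # "Date"
print(d_iso == d_custom)  # TRUE: both represent the same day
```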
14. How do you extract the year from a Date object in R?
To extract the year from a Date object, use the format() function. For example,
format(as.Date("2024-12-16"), "%Y") returns the year "2024". This function can also
extract other components like month and day using different format codes.
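As an illustration, the sketch below extracts the year, month, and day. Note that format() returns character strings, so as.numeric() is used when a number is needed.

```r
d <- as.Date("2024-12-16")

year_chr  <- format(d, "%Y")      # "2024"
month_chr <- format(d, "%m")      # "12"
day_chr   <- format(d, "%d")      # "16"
year_num  <- as.numeric(year_chr) # 2024 as a number

print(c(year_chr, month_chr, day_chr))
```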
PART B
1. Explain what vectors are in R. Discuss how to create a vector using different methods.
2. Demonstrate vector operations such as element-wise addition, subtraction, and logical
comparisons.
3. Discuss the difference between arrays and matrices in R.
4. Explain how to create arrays and matrices, and how to manipulate their dimensions with
examples.
5. Explain the concept of lists in R and how they differ from vectors. Discuss how to create
and access elements in a list.
6. Explain the concept of data frames in R. Discuss how to create a data frame and access
its elements.
7. Discuss the process of data transformation in R, such as scaling, normalization, and log
transformations with examples.
8. Explain the different string manipulation functions available in R.
9. Provide examples of how regular expressions are used in string matching and
replacement in R.
10. Discuss how R handles date and time formats. Explain how to create date and time
objects using functions.
UNIT III
WORKING WITH DATA
PART A
10. How do you parse tables from a webpage using web scraping in R?
Use the html_table() function from the rvest package to parse HTML tables. For
example, tables <- webpage %>% html_nodes("table") %>% html_table() extracts every
table from the parsed page webpage (html_nodes() is named html_elements() in rvest 1.0.0
and later). The result is a list of data frames, one for each table.
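The sketch below runs the same pipeline on a small inline HTML string instead of a live site, so it needs no network access. It assumes the rvest package is installed, and the table contents are made up for illustration.

```r
# Runs only when rvest is available; the HTML is a made-up page
if (requireNamespace("rvest", quietly = TRUE)) {
  library(rvest)

  webpage <- read_html(
    "<html><body><table>
       <tr><th>name</th><th>score</th></tr>
       <tr><td>A</td><td>90</td></tr>
       <tr><td>B</td><td>85</td></tr>
     </table></body></html>"
  )

  tables <- webpage %>% html_nodes("table") %>% html_table()

  print(length(tables))  # one list entry per <table> element
  print(tables[[1]])     # the first table as a data frame
}
```

For a real page, pass its URL to read_html() in place of the inline string.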
PART B
1. Explain the process of reading different types of data files in R. Describe how to read
CSV, Excel, and built-in datasets.
2. Discuss how to handle missing values or data inconsistencies during the data import
process.
3. Explain how to inspect and validate the imported data for correctness.
4. Discuss the process of reading text files in R. Explain how to read both small and large
text files using functions.
5. Explain how to write and save data to files in R. Discuss how to save data frames to CSV,
Excel, or text files.
6. Discuss how to make HTTP requests and interact with REST APIs in R. Explain how to
use the httr package.
7. Explain the process of web scraping in R using the rvest package. Discuss how to extract
data from HTML web pages.
8. Discuss how to handle challenges such as dynamic content loading and data cleaning
during web scraping.
9. How do you handle messy data in R, including missing values, inconsistent data formats,
and duplicates?
10. Explain the importance of renaming columns in a data frame and how to do it in R.
Discuss how to rename individual columns, multiple columns, and handle issues with
inconsistent or non-descriptive names.
11. Explain the purpose of the attach() and detach() functions in R. Discuss how these
functions simplify the access to data frame columns.
12. How do you tabulate data in R and create simple frequency tables? Explain how to use
the table() function to summarize categorical variables.
13. Discuss the concept of factor variables in R and explain how to order them.
14. Explain how to convert a character vector to a factor and specify the order of levels with
examples.
UNIT IV
PART A
2. How can you refine and customize charts using themes in ggplot2?
Themes in ggplot2 modify the non-data elements of a plot, such as background,
gridlines, and text. Built-in themes like theme_bw() and theme_minimal() offer different
styles. You can also create custom themes by adjusting specific components, such as font
size and color. Customizing themes improves the overall readability and presentation of
plots.
3. What is the purpose of a scatter plot in data visualization?
A scatter plot is used to visualize the relationship between two continuous variables.
Each point on the plot represents an observation, with its position determined by the values
of the variables. For example, plotting height vs. weight can reveal trends or correlations.
It helps to identify patterns, clusters, and outliers in the data.
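A minimal sketch of a height vs. weight scatter plot using base graphics and the built-in women dataset, which has the columns height and weight. The axis labels and title are illustrative choices.

```r
# women is a built-in dataset: height in inches, weight in pounds
plot(women$height, women$weight,
     xlab = "Height (in)", ylab = "Weight (lbs)",
     main = "Height vs. weight")

# cor() summarizes the strength of the linear relationship seen in the plot
r <- cor(women$height, women$weight)
print(r)
```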
9. How are SQL clauses like SELECT, WHERE, and ORDER BY used in data querying?
The SELECT clause specifies which columns to retrieve, FROM indicates the data source,
and WHERE filters rows based on conditions. The ORDER BY clause sorts the results, while
LIMIT restricts the number of rows returned. Aggregate functions like MAX() and MIN()
return the maximum and minimum values, respectively, of a column.
2. What are some common file formats that can be imported into R, and which
functions are used to import them?
Common file formats include CSV (read.csv()), Excel (readxl::read_excel()), and
text files (read.table()). For specific data sources, functions like readRDS() for R data
files and DBI for databases can also be used. These functions load the data into R for
analysis.
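The sketch below writes a small data frame to temporary files and reads it back with read.csv(), read.table(), and readRDS(). The data and file paths are created only for the example; read_excel() is left out because it needs the readxl package and an existing Excel file.

```r
scores <- data.frame(id = 1:3, score = c(90, 85, 77))

# CSV: write a temporary file, then import it
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
from_csv   <- read.csv(csv_path)
from_table <- read.table(csv_path, header = TRUE, sep = ",")

# RDS: R's own format for a single object
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)
from_rds <- readRDS(rds_path)

print(from_csv)
```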
10. How do you use the aggregate() function in R for data aggregation?
The aggregate() function is used to apply summary statistics to grouped data. For
example, aggregate(x ~ group, data = df, FUN = mean) calculates the mean of x for
each unique value in group within the data frame df.
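A small sketch of the formula interface to aggregate(), using a made-up data frame with the same column names as the example above.

```r
df <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(1, 3, 10, 20)
)

group_means <- aggregate(x ~ group, data = df, FUN = mean)
print(group_means)  # one row per group: a has mean 2, b has mean 15
```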
15. What does the dplyr package in R provide for statistical analysis?
The dplyr package provides functions for data manipulation, such as filter() for
subsetting data, mutate() for creating new variables, arrange() for sorting data, and
summarize() for calculating summary statistics. It simplifies the process of wrangling
data before analysis.
18. How can you save the output of statistical analysis to a text file in R?
We can save the output to a text file using sink(). For example, sink("output.txt")
redirects the output of subsequent R commands to the file output.txt. After running the
analysis, call sink() with no arguments to stop writing to the file and restore output
to the console.
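As an illustration, the sketch below redirects a summary to a temporary file and reads it back. The temporary path stands in for "output.txt" so the example leaves no file behind in the working directory.

```r
out_path <- tempfile(fileext = ".txt")

sink(out_path)                      # start writing output to the file
print(summary(c(1, 2, 3, 4, 5)))    # this output goes to the file, not the console
sink()                              # stop redirecting; output returns to the console

saved_lines <- readLines(out_path)
print(saved_lines)
```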
PART B