R Programming Unit 2
• R can read from and write to various file formats such as CSV, Excel and XML.
• You can check which directory the R workspace is pointing to using the getwd() function.
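A minimal sketch of inspecting and temporarily changing the working directory. The use of tempdir() as the target directory and the variable names are illustrative assumptions, not part of the original notes.

```r
# Show the directory that relative file paths are resolved against.
old_dir <- getwd()
print(old_dir)

# Switch to a temporary directory (illustrative choice), then switch back.
setwd(tempdir())
in_temp <- getwd()
print(in_temp)
setwd(old_dir)
```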
The csv file is a text file in which the values in the columns are separated by a comma.
Let's consider the following data present in the file named input.csv.
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
Following is a simple example of the read.csv() function to read a CSV file available in your current working directory:
data <- read.csv("input.csv")
print(data)
When we execute the above code, it prints the contents of the file as a data frame with eight rows and five columns.
Analysing the CSV File :
By default the read.csv() function gives the output as a data frame. This can be easily checked as follows.
Also we can check the number of columns and rows.
data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
When we execute the above code, it produces the following result:
[1] TRUE
[1] 5
[1] 8
Once we read data in a data frame, we can apply all the functions applicable to data frames as explained in the subsequent section.
Get the maximum salary:
# Create a data frame.
data <- read.csv("input.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)
Get the details of the person with max salary
We can fetch rows meeting specific filter criteria similar to a SQL where clause.
# Create a data frame.
data <- read.csv("input.csv")
# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)
The result can be written to a new csv file with write.csv() and read back to check it:
write.csv(retval, "output.csv")
newdata <- read.csv("output.csv")
print(newdata)
Here the column X in newdata holds the row names that write.csv() stores by default. This can be dropped using an additional parameter while writing the file:
write.csv(retval, "output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)
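The reading, filtering and writing steps can be tried without an existing input.csv. The sketch below is an illustration under stated assumptions: it writes a shortened copy of the sample data to a temporary file, and the filter condition and the names csv_path, out_path and it_staff are hypothetical.

```r
# Write a shortened copy of the sample data to a temporary file
# (tempfile() is used so that no existing file is overwritten).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR"
), csv_path)

data <- read.csv(csv_path)

# Filter rows much like a SQL WHERE clause: IT staff earning more than 600.
it_staff <- subset(data, dept == "IT" & salary > 600)
print(it_staff)

# Write without row names so that no extra X column appears on re-reading.
out_path <- tempfile(fileext = ".csv")
write.csv(it_staff, out_path, row.names = FALSE)
newdata <- read.csv(out_path)
print(names(newdata))
```

Combining conditions with & inside subset() keeps the filter readable, in the same way a WHERE clause combines conditions with AND.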
2. R- Excel Files :
• Microsoft Excel is the most widely used spreadsheet program, and it stores data in the .xls or .xlsx format.
• R can read directly from these files using some Excel-specific packages.
• A few such packages are XLConnect, xlsx and gdata. We will be using the xlsx package, which can also write into Excel files.
R has two functions, writeBin() and readBin(), to create and read binary files.
Example :
We consider the R inbuilt data "mtcars". First we create a csv file from it, then convert it to a binary file and store it as an OS file. Next we read the binary file back into R.
# Write the "mtcars" data frame as a csv file, read it back and store only the columns
# "cyl", "am" and "gear".
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
new.mtcars <- read.csv("mtcars.csv")[, c("cyl", "am", "gear")]
# Create a connection object to write the binary file using mode "wb".
write.filename <- file("/web/com/binmtcars.dat", "wb")
# Write the column names of the data frame to the connection object.
writeBin(colnames(new.mtcars), write.filename)
# Close the file for writing so that it can be read by other programs.
close(write.filename)
Reading a Binary file :
The binary file created above stores all the data as continuous bytes. So we read it back by requesting the column names and the column values in the same order, and with the same types, in which they were written. The file is opened through a connection object in binary read mode "rb".
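A minimal round-trip sketch of writing and then reading a binary file. It is an illustration under stated assumptions: the file goes to a temporary path rather than the path used above, only the first five rows are kept, and the names bin_path, write_con, read_con and restored are hypothetical.

```r
# Keep only three columns and the first five rows of the built-in data set.
new.mtcars <- mtcars[1:5, c("cyl", "am", "gear")]

# Write to a temporary file (illustrative path) in binary mode.
bin_path <- tempfile(fileext = ".dat")
write_con <- file(bin_path, "wb")
writeBin(colnames(new.mtcars), write_con)   # three character strings
writeBin(c(new.mtcars$cyl, new.mtcars$am, new.mtcars$gear), write_con)   # 15 doubles
close(write_con)

# Read back in the same order and with the same types.
read_con <- file(bin_path, "rb")
column_names <- readBin(read_con, character(), n = 3)
values <- readBin(read_con, numeric(), n = 15)
close(read_con)

# Rebuild the data frame: the values were stored column by column.
restored <- as.data.frame(
  matrix(values, nrow = 5, dimnames = list(NULL, column_names))
)
print(restored)
```

Because the bytes carry no type or length information, the reader has to know how many items of each type were written, which is why n is given explicitly in each readBin() call.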
The JSON file is read by R using the fromJSON() function. It is stored as a list in R.
We can convert the extracted data above to an R data frame for further analysis using the as.data.frame() function.
CALLING FUNCTIONS
R – Function :
A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and
the user can create their own functions.
In R, a function is an object, so the R interpreter is able to pass control to the function, along with any arguments the function needs to accomplish its actions.
The function in turn performs its task and returns control to the interpreter, together with any result, which may be stored in other objects.
Definition :
An R function is created by using the keyword function. An R function definition has the following components:
• Function Name: This is the actual name of the function. It is stored in R environment as an object with
this name.
• Arguments: An argument is a placeholder. When a function is invoked, you pass a value to the
argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can
have default values.
• Function Body: The function body contains a collection of statements that defines what the function
does.
• Return Value: The return value of a function is the last expression in the function body to be evaluated.
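The components listed above can be seen together in a short sketch. The function name power_sum and its arguments are hypothetical, chosen only for illustration.

```r
# Function name: power_sum (hypothetical).
# Arguments: x is required; p is optional and has the default value 2.
power_sum <- function(x, p = 2) {
  # Function body: raise each element to the power p and add the results.
  total <- sum(x^p)
  # Return value: the last evaluated expression is returned.
  total
}

print(power_sum(1:3))          # uses the default p = 2
print(power_sum(1:3, p = 3))   # overrides the default
```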
R has many in-built functions which can be directly called in the program without defining them first. We can also create and use our own functions, referred to as user-defined functions.
Built in functions :
Simple examples of in-built functions are seq(), mean(), max(), sum() and paste(). They are called directly by user-written programs.
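A few illustrative calls to the built-in functions named above. The input values are arbitrary examples.

```r
print(seq(2, 10, by = 2))          # sequence from 2 to 10 in steps of 2
print(mean(c(4, 8, 12)))           # arithmetic mean
print(max(c(3, 9, 5)))             # largest value
print(sum(1:10))                   # total of the integers 1 to 10
print(paste("R", "Programming"))   # strings joined with a space
```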
User defined functions :
We can create user-defined functions in R. They are specific to what a user wants, and once created they can be used like the built-in functions. Below is an example of how a function is created and used.
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
# Call the function, supplying 6 as the argument.
new.function(6)
Calling functions :
Calling a Function with Argument Values (by position and by name)
The arguments to a function call can be supplied in the same sequence as defined in the function, or they can be supplied in a different sequence but assigned to the names of the arguments.
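Supplying arguments by position and by name can be compared in a small sketch. The function combine_values and its argument values are hypothetical, used only for illustration.

```r
# combine_values is a hypothetical function used only for illustration.
combine_values <- function(a, b, c) {
  a * b + c
}

# By position: values are matched to a, b, c in the order they are written.
by_position <- combine_values(5, 3, 11)

# By name: the order no longer matters because each value is labelled.
by_name <- combine_values(c = 11, a = 5, b = 3)

print(by_position)
print(by_name)
```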
Arguments to functions are evaluated lazily, which means they are evaluated only when needed by the function body.
# Create a function with arguments.
new.function <- function(a, b) {
  print(a^2)
  print(a)
  print(b)
}
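Lazy evaluation can be observed by leaving out an argument. The sketch below is an illustration with hypothetical function names (lazy_demo, lazy_fail); tryCatch() is used only so that the script keeps running after the expected error.

```r
# lazy_demo never uses b, so omitting b causes no error.
lazy_demo <- function(a, b) {
  a^2
}
print(lazy_demo(6))

# lazy_fail does use b, so the error appears only when print(b) is reached.
lazy_fail <- function(a, b) {
  print(a^2)   # runs fine
  print(b)     # fails here if b was not supplied
}

outcome <- tryCatch(
  {
    lazy_fail(6)
    "no error"
  },
  error = function(e) "error raised when b was needed"
)
print(outcome)
```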
• In R programming, "conditions" typically refer to logical expressions or statements that evaluate to either
"TRUE" or "FALSE."
• These conditions are used in control structures like if statements, while loops, and for loops to make
decisions in the program.
• Conditions in R can involve comparisons (e.g., greater than, less than), logical operators (e.g., AND, OR), and functions that return logical values.
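A few conditions of each kind, as a brief sketch. The variable names and values are arbitrary examples.

```r
x <- 10
y <- 5

is_greater <- x > y                 # comparison
both_hold  <- (x > 5) && (y > 5)    # logical AND of two conditions
either     <- (x > 5) || (y > 5)    # logical OR of two conditions
has_na     <- is.na(x)              # function that returns a logical value

# A condition decides whether the body of an if statement runs.
if (is_greater) {
  print("x is greater than y")
}
```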
A loop statement allows us to execute a statement or group of statements multiple times.
R programming language provides the following kinds of loop to handle looping requirements.
Loop type      Description
repeat loop    Executes a sequence of statements repeatedly until a break statement stops it.
while loop     Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.
for loop       Executes a sequence of statements once for each element of a vector or list.
Repeat loop :
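A minimal sketch of a repeat loop. The counter name and the stopping value of 3 are arbitrary choices for illustration.

```r
count <- 0
repeat {
  count <- count + 1
  print(count)
  # Without this break the loop would never end.
  if (count >= 3) {
    break
  }
}
```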
While loop :
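A minimal sketch of a while loop that adds up the numbers 1 to 5. The variable names are arbitrary.

```r
total <- 0
i <- 1
# The condition is tested before every pass through the body.
while (i <= 5) {
  total <- total + i
  i <- i + 1
}
print(total)
```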
For loop :
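A minimal sketch of a for loop that visits each element of a vector in turn. The vector and variable names are arbitrary.

```r
squares <- numeric(0)
# The loop variable takes each element of the vector, one per iteration.
for (value in c(1, 2, 3, 4)) {
  squares <- c(squares, value^2)
}
print(squares)
```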
Loop control statements :
R supports the following loop control statement.
break statement    Terminates the loop statement and transfers execution to the statement immediately following the loop.
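A short sketch of break inside a for loop: the loop stops at the first value greater than 10. The data and the names values and found_at are hypothetical.

```r
values <- c(4, 7, 12, 9, 15)
found_at <- NA

for (i in seq_along(values)) {
  if (values[i] > 10) {
    found_at <- i
    break   # leave the loop; execution continues after the closing brace
  }
}
print(found_at)
```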
# Nested if...else: the inner condition is checked only when the outer one is TRUE.
x <- 10
y <- 5
if (x > 5) {
  if (y > 2) {
    print("x is greater than 5 and y is greater than 2.")
  } else {
    print("x is greater than 5, but y is not greater than 2.")
  }
} else {
  print("x is not greater than 5.")
}
Writing functions
Defining a function allows you to reuse a chunk of code without endlessly copying and
pasting.
It also allows other users to use your functions to carry out the same computations on their
own data or objects.
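Function creation :
Syntax : a function is created by assigning the result of the function keyword to a name, in the form functionname <- function(arg1, arg2, ...) { statements }.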
Adding arguments / Using return :
The same example illustrates both: num1 and num2 are the arguments, and return() passes the result back to the caller.
# Define a function to add two numbers
add_numbers <- function(num1, num2) {
  result <- num1 + num2
  return(result)
}
# Call the function with specific values
num1 <- 5
num2 <- 7
sum_result <- add_numbers(num1, num2)
Checking for missing argument :
1. The missing() function checks if required arguments have been provided to a function.
2. It takes an argument tag and returns `TRUE` if that specified argument is missing.
3. This function helps prevent errors by ensuring that necessary arguments are provided when calling a function.
4. For example, you can use missing() to avoid errors when a required argument is not supplied.
5. missing() is particularly useful in the body code of a function to handle missing or optional arguments effectively.
# Define a function that checks if an argument is missing
check_argument <- function(str1) {
  if (missing(str1)) {
    cat("The 'str1' argument is missing.\n")
  } else {
    cat("The 'str1' argument is present.\n")
  }
}
Dealing with ellipses :
1. The ellipsis (or "..." notation) is a feature in R that allows you to pass extra, unspecified arguments to a function without explicitly defining them in the argument list.
2. These extra arguments can be collected and used in the function's code body.
3. Typically, the ellipsis is placed at the end of the argument list because it represents a variable number of arguments.
4. It's a useful way to make functions more flexible by allowing users to provide additional inputs beyond the explicitly defined arguments.
5. You can then pass these extra arguments to other functions within the code body.
6. The ellipsis is handy when you want to create functions that can handle various inputs without explicitly specifying them in the function definition.
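The use of missing() together with the ellipsis can be sketched as follows. The functions summarise_values and count_extras are hypothetical and serve only as an illustration; the extra argument na.rm is a real argument of mean() that is passed through the ellipsis.

```r
# summarise_values is a hypothetical function used only for illustration.
# Extra arguments captured by ... are forwarded unchanged to mean().
summarise_values <- function(x, label, ...) {
  if (missing(label)) {
    label <- "values"   # fall back when the caller omits label
  }
  result <- mean(x, ...)
  cat("Mean of", label, "is", result, "\n")
  result
}

plain <- summarise_values(c(1, 2, 3, 10))
no_na <- summarise_values(c(1, 2, NA, 9), "scores", na.rm = TRUE)

# The ellipsis can also be collected into a list and inspected.
count_extras <- function(...) {
  length(list(...))
}
print(count_extras(1, "a", TRUE))
```

Forwarding ... to mean() means summarise_values does not need to know in advance which optional arguments the caller might want to pass on.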
Specialized functions :
1. Helper Functions
- Designed to be called multiple times by another function.
- Can be defined inside the body of a parent function.
- Assist in specific tasks, improving code organization and readability.
• Externally defined
• Internally defined
2. Disposable Functions
- Directly defined as an argument for another function.
- Temporary functions for specific tasks.
- Used briefly and discarded.
3. Recursive Functions
- Call themselves during execution.
- Break down problems into smaller, similar sub-problems.
- Suitable for repetitive patterns or structures.
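The kinds of specialized function described above can be sketched in one short script: a helper defined outside its caller, a helper defined inside its caller, a temporary function defined directly as an argument, and a function that calls itself. All function names are hypothetical and chosen only for illustration.

```r
# Helper function, externally defined: usable by any other function.
square <- function(x) x^2

sum_of_squares <- function(values) {
  # Helper function, internally defined: visible only inside sum_of_squares.
  add_up <- function(v) sum(v)
  add_up(square(values))
}
print(sum_of_squares(1:3))

# Temporary function defined directly as an argument to sapply();
# it has no name and is discarded after the call.
doubled <- sapply(1:3, function(x) x * 2)
print(doubled)

# Recursive function: calls itself on a smaller sub-problem.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}
print(factorial_rec(5))
```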
When an exception occurs, it disrupts the normal flow of the program and may lead to unexpected behavior or
termination. Exception handling in R involves using mechanisms like the `tryCatch` function to anticipate and
manage these exceptional situations, allowing the program to gracefully handle errors without crashing.
1. Try-Catch Style Handling: In R, exception handling is primarily done by wrapping code in calls to try() or tryCatch().
2. try() Function: The try() function allows you to execute a block of code and continue running even if it results in an error.
3. tryCatch() Function: This function provides more fine-grained control over error handling by allowing specific handlers
for different types of conditions.
4. Condition Classes: R uses different condition classes (e.g., warning, error) to categorize issues that may arise during code
execution.
5. warning() Function: The `warning()` function signals a warning, which can be handled by a `warning` handler in tryCatch().
6. stop() Function: To intentionally generate an error, the `stop()` function can be used. This can be caught using try() or tryCatch().
7. Logging Errors: It's common to log errors using functions like `cat()` or write them to a log file for future reference.
8. Handling Multiple Conditions: tryCatch() allows handling multiple conditions in a single block, improving code
readability.
9. Custom Error Messages: Creating informative custom error messages using stop() helps in debugging and
understanding issues.
10. Debugging Tools: R provides debugging tools like debug() and browser() to interactively explore code execution
when errors occur.
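Several of the points above can be combined in one sketch: a custom error raised with stop(), a warning raised with warning(), separate handlers in tryCatch(), and try() as the simpler alternative. The function safe_divide and its messages are hypothetical, used only for illustration.

```r
# safe_divide is a hypothetical function used only for illustration.
safe_divide <- function(a, b) {
  tryCatch(
    {
      if (!is.numeric(a) || !is.numeric(b)) {
        stop("both arguments must be numeric")   # custom error message
      }
      if (b == 0) {
        warning("division by zero")              # signal a warning
      }
      a / b
    },
    warning = function(w) {
      cat("Warning caught:", conditionMessage(w), "\n")
      NA
    },
    error = function(e) {
      cat("Error caught:", conditionMessage(e), "\n")
      NULL
    },
    finally = cat("safe_divide finished\n")      # runs in every case
  )
}

ok     <- safe_divide(10, 2)
warned <- safe_divide(1, 0)
failed <- safe_divide("a", 2)

# try() lets the script continue; on failure it returns a "try-error" object.
attempt <- try(stop("something went wrong"), silent = TRUE)
print(inherits(attempt, "try-error"))
```

Returning NA from the warning handler and NULL from the error handler is only one possible convention; the point is that each condition class gets its own handler.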
Timing and Visualisation
1. Purpose of Tracking Progress and Timing :
Monitoring progress and timing matters in R for lengthy numeric exercises, such as simulations or complex operations. It lets you compare approaches for efficiency and evaluate how long tasks take to complete.
prog_test <- function(n){
  result <- 0
  progbar <- txtProgressBar(min=0,max=n,style=1,char="=")
  for(i in 1:n){
    result <- result + 1
    Sys.sleep(0.5)
    setTxtProgressBar(progbar,value=i)
  }
  close(progbar)
  return(result)
}
- Function Creation: The code defines a function named `prog_test` that takes a number (`n`) as input.
- Initialization: It starts a counter (`result`) at zero to keep track of the total count.
- Progress Bar Setup: It creates a progress bar (`progbar`) that displays the progress of the counting process from 0 to `n`.
- Counting Loop: Using a loop (`for(i in 1:n){...}`), it counts up to `n`.
- Counting and Pause: With each count, it increments the `result` by 1 and pauses for 0.5 seconds (`Sys.sleep(0.5)`).
- Updating Progress Bar: It updates the progress bar (`setTxtProgressBar(progbar, value=i)`) to illustrate the current progress.
- Completion: Once the counting loop finishes, it closes the progress bar (`close(progbar)`).
- Result Return: Finally, it returns the total count (`result`).
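The progress bar shows how far a task has got; measuring how long it took needs a clock. The sketch below is an illustration using Sys.time() and system.time(), with Sys.sleep() and a small loop standing in for a lengthy computation. The variable names are hypothetical.

```r
# Time a block of code by recording the clock before and after it.
start_time <- Sys.time()
Sys.sleep(0.2)                       # stand-in for a lengthy computation
elapsed <- Sys.time() - start_time   # a difftime object
print(elapsed)

# system.time() wraps the same idea around a single expression.
timing <- system.time(for (i in 1:100000) sqrt(i))
print(timing["elapsed"])
```

Running two alternative implementations through system.time() in this way gives a simple basis for comparing their efficiency.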