R Programming Unit 2

The document discusses file handling in R programming. It covers reading and writing different file formats like CSV, Excel, binary, and XML files. Some key points covered include: - Getting and setting the working directory in R using getwd() and setwd() functions. - Reading a CSV file into a data frame using read.csv() and analyzing the data frame. - Writing a CSV file from an existing data frame using write.csv(). - Reading and writing Excel files using the xlsx package functions like read.xlsx() and write.xlsx(). - Creating and reading binary files using writeBin() and readBin() to store data as bytes. - Reading an XML file

Uploaded by

suhasmnnarayan
Copyright
© All Rights Reserved
File Handling

(Reading and writing files in R programming)


• In R, we can read data from files stored outside the R environment. We can also write data into files which will be
stored and accessed by the operating system.

• R can read and write into various file formats like csv, excel, xml etc.

Getting and setting the working directory :

• You can check which directory the R workspace is pointing to using the getwd() function.

• You can also set a new working directory using the setwd() function.

# Get and print current working directory.
print(getwd())

# Set current working directory.
setwd("/web/com")

# Get and print current working directory.
print(getwd())
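The calls above only succeed if /web/com exists on the machine. A minimal sketch that runs anywhere, assuming R's per-session temporary directory as the target (the names old_dir and new_dir are chosen here for illustration), and that restores the original directory afterwards:

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# tempdir() always exists for the running R session.
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# Switch back to where we started.
setwd(old_dir)
print(getwd())
```

Restoring the directory at the end keeps later relative paths such as "input.csv" pointing where the rest of the script expects.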
1. Input as CSV File :

The csv file is a text file in which the values in the columns are separated by a comma.
Let's consider the following data present in the file named input.csv.

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Reading a CSV File :

Following is a simple example of the read.csv() function to read a CSV file available in your current working directory:

data <- read.csv("input.csv")
print(data)

Executing this code prints the contents of input.csv as a data frame.
Analysing the CSV File :

By default the read.csv() function gives the output as a data frame. This can be easily checked as follows.
Also we can check the number of columns and rows.

data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

When we execute the above code, it produces the following result:

[1] TRUE
[1] 5
[1] 8

Once we read data in a data frame, we can apply all the functions applicable to data frames as explained in subsequent section.

Get the maximum salary:

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)

Get the details of the person with max salary

We can fetch rows meeting specific filter criteria similar to a SQL where clause.

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)

# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)
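The analysis steps can be tried without an existing input.csv. The sketch below is self-contained under one assumption: it writes the first four rows of the sample data to a temporary file (csv_path, top and it_high are names introduced here) and then applies the same checks and a SQL-like filter.

```r
# Write a small sample (the first four rows of input.csv) to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR"
), csv_path)

data <- read.csv(csv_path)

# Structure checks: is it a data frame, and what are its dimensions?
print(is.data.frame(data))
print(dim(data))   # number of rows, number of columns

# Row(s) holding the maximum salary.
top <- subset(data, salary == max(salary))
print(top)

# Combine conditions with &, much like AND in a SQL where clause.
it_high <- subset(data, dept == "IT" & salary > 600)
print(it_high)
```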
Writing to a CSV File :
R can create a csv file from an existing data frame. The write.csv() function is used to create the csv file.
This file gets created in the working directory.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")
print(newdata)

When this version is executed, the printed data frame contains an extra column X. It holds the row names that write.csv() writes by default. The column can be dropped by passing row.names=FALSE while writing the file:

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv", row.names=FALSE)
newdata <- read.csv("output.csv")
print(newdata)
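The effect of the row.names parameter can be checked directly by comparing column names after a round trip. This is a small sketch using a made-up two-row data frame and temporary files (df, with_names and without_names are illustrative names):

```r
# A tiny data frame used only for this comparison.
df <- data.frame(id = c(3, 4), name = c("Michelle", "Ryan"))

with_names    <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column.
write.csv(df, with_names)

# row.names = FALSE suppresses that column.
write.csv(df, without_names, row.names = FALSE)

# Reading back shows the difference in column names.
print(names(read.csv(with_names)))
print(names(read.csv(without_names)))
```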
2. R- Excel Files :

• Microsoft Excel is the most widely used spreadsheet program which stores data in the .xls or .xlsx format.
• R can read directly from these files using some Excel-specific packages.
• A few such packages are XLConnect, xlsx and gdata. We will be using the xlsx package. R can also write into an Excel file using this package.

Install xlsx Package :


install.packages("xlsx")

Verify and load the "xlsx" Package :

# Verify the package is installed.
any(grepl("xlsx",installed.packages()))

# Load the library into R workspace.
library("xlsx")

When the script runs we get the following output.

[1] TRUE
Loading required package: rJava
Loading required package: methods
Loading required package: xlsxjars
Input as xlsx File :
Open Microsoft Excel. Enter the input data in the worksheet named sheet1.
Save the Excel file as "input.xlsx". You should save it in the current working directory of the R workspace.

Reading the Excel File :
The input.xlsx file is read by using the read.xlsx() function. The result is stored as a data frame in the R environment.
3. R- Binary Files :
• A binary file is a file that contains information stored only in the form of bits and bytes (0's and 1's).
• Attempting to read a binary file using any text editor will show characters like Ø and ð.
• Sometimes, the data generated by other programs are required to be processed by R as a binary file. Also R
is required to create binary files which can be shared with other programs.

R has two functions, writeBin() and readBin(), to create and read binary files.
Example :
We consider the R inbuilt data "mtcars". First we create a csv file from it and convert it to a binary file and
store it as an OS file. Next we read this binary file created into R.

Writing the Binary File :

We write the data frame "mtcars" to a csv file, read part of it back, and then write that part as a binary file to the OS.

# Write the "mtcars" data frame to a csv file.
write.table(mtcars, file = "mtcars.csv", row.names=FALSE, na="", col.names=TRUE, sep=",")

# Store 5 records from the csv file as a new data frame, keeping only the columns "cyl", "am" and "gear".
new.mtcars <- read.table("mtcars.csv",sep=",",header=TRUE,nrows = 5)[, c("cyl", "am", "gear")]

# Create a connection object to write the binary file using mode "wb".
write.filename <- file("/web/com/binmtcars.dat", "wb")

# Write the column names of the data frame to the connection object.
writeBin(colnames(new.mtcars), write.filename)

# Write the records in each of the columns to the file.
writeBin(c(new.mtcars$cyl,new.mtcars$am,new.mtcars$gear), write.filename)

# Close the file for writing so that it can be read by other programs.
close(write.filename)
Reading a Binary file :
The binary file created above stores all the data as continuous bytes. So we will read it by choosing
appropriate values of column names as well as the column values.

# Create a connection object to read the file in binary mode using "rb".
read.filename <- file("/web/com/binmtcars.dat", "rb")

# First read the column names. n=3 as we have 3 columns.
column.names <- readBin(read.filename, character(), n = 3)
close(read.filename)

# Next read the column values. n=18 as we have 3 column names and 15 values.
read.filename <- file("/web/com/binmtcars.dat", "rb")
bindata <- readBin(read.filename, integer(), n = 18)
close(read.filename)

# Print the data.
print(bindata)
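A minimal round-trip sketch of writeBin() and readBin(), under two assumptions that differ from the slides: it uses a temporary file instead of /web/com/binmtcars.dat, and it reads names and values in order on a single open connection rather than reopening the file. The variable names (bin_path, cols, restored and so on) are illustrative.

```r
bin_path <- tempfile(fileext = ".dat")

# First five rows of three whole-number mtcars columns, flattened column by column.
cols   <- c("cyl", "am", "gear")
values <- as.integer(unlist(mtcars[1:5, cols]))

# Write the column names first, then the 15 integer values.
con <- file(bin_path, "wb")
writeBin(cols, con)
writeBin(values, con)
close(con)

# Read back in the same order on one connection.
con <- file(bin_path, "rb")
names_back  <- readBin(con, character(), n = 3)
values_back <- readBin(con, integer(), n = 15)
close(con)

# Rebuild a data frame: 5 rows, one column per name.
restored <- as.data.frame(
  matrix(values_back, nrow = 5, dimnames = list(NULL, names_back))
)
print(restored)
```

Because a binary file carries no structure of its own, the reader has to know how many items of which type were written and in what order.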
4. R-XML Files :
You can read an xml file in R using the "XML" package. This package can be installed using the following command.
install.packages("XML")

Reading the XML file :
The xml file is read by R using the function xmlParse(). It is stored as a list in R.

XML to Dataframe :
To handle the data effectively in large files we read the data in the xml file as a data frame. Then process the data frame for data analysis.
5. R-JSON Files :
A JSON file stores data as text in human-readable format. JSON stands for JavaScript Object Notation. R can read JSON
files using the rjson package.
In the R console, you can issue the following command to install the rjson package.
install.packages("rjson")

Reading the JSON file :
The JSON file is read by R using the function fromJSON(). It is stored as a list in R.

Convert JSON file to a DataFrame:
We can convert the extracted data above to an R data frame for further analysis using the as.data.frame() function.
CALLING FUNCTIONS
R – Function :

A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and
the user can create their own functions.

In R, a function is an object so the R interpreter is able to pass control to the function, along with arguments that may be
necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result which may be stored in other
objects.

Definition :

An R function is created by using the keyword function. The basic syntax of an R function definition is as follows:

function_name <- function(arg_1, arg_2, ...) {
   Function body
}
Function components :

• Function Name: This is the actual name of the function. It is stored in R environment as an object with
this name.

• Arguments: An argument is a placeholder. When a function is invoked, you pass a value to the
argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can
have default values.

• Function Body: The function body contains a collection of statements that defines what the function
does.

• Return Value: The return value of a function is the last expression in the function body to be evaluated.

R has many in-built functions which can be directly called in the program without defining them first. We
can also create and use our own functions, referred to as user defined functions.
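The four components can be seen together in a short sketch. The function add_bonus and its arguments are hypothetical names chosen for illustration: it has a name, one argument with a default value, a body, and a return value given by the last evaluated expression.

```r
# Hypothetical function: 'rate' has a default value of 0.1.
add_bonus <- function(salary, rate = 0.1) {
  bonus <- salary * rate
  salary + bonus   # last evaluated expression becomes the return value
}

print(add_bonus(1000))               # uses the default rate
print(add_bonus(1000, rate = 0.25))  # overrides the default

# A function is an object and can be inspected like any other.
print(class(add_bonus))
print(names(formals(add_bonus)))
```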
Built in functions :
Simple examples of in-built functions are seq(), mean(), max(), sum(x) and paste(...). They are directly called by
user written programs.
User defined functions :

We can create user-defined functions in R. They are specific to what a user wants and once created they
can be used like the built-in functions. Below is an example of how a function is created and used.
Calling functions :

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}

# Call the function new.function supplying 6 as an argument.
new.function(6)

Calling a Function with Argument Values (by position and by name)
The arguments to a function call can be supplied in the same sequence as defined in the function or they can be
supplied in a different sequence but assigned to the names of the arguments.
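Calling by position and by name can be compared side by side. In this sketch, combine is a hypothetical three-argument function used only to show that all three call styles bind the same values:

```r
# Hypothetical three-argument function used only for illustration.
combine <- function(a, b, c) {
  a * b + c
}

# By position: a = 5, b = 3, c = 11.
by_position <- combine(5, 3, 11)

# By name: same values, supplied in a different order.
by_name <- combine(c = 11, a = 5, b = 3)

# Mixed: named arguments are matched first, the rest by position.
mixed <- combine(5, c = 11, b = 3)

print(by_position)
print(by_name)
print(mixed)
```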
Lazy evaluation :

Arguments to functions are evaluated lazily, which means they are evaluated only when needed by
the function body.

# Create a function with arguments.
new.function <- function(a, b) {
   print(a^2)
   print(a)
   print(b)
}

# Evaluate the function without supplying one of the arguments.
new.function(6)
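With lazy evaluation, a call that omits an argument runs normally until the body first touches that argument, and only then fails. The sketch below captures that error with tryCatch() so the script keeps running; lazy_demo and ignore_b are illustrative names.

```r
lazy_demo <- function(a, b) {
  print(a^2)   # works: only 'a' is needed here
  print(b)     # fails only now, when 'b' is first evaluated
}

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    lazy_demo(6)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# An argument that the body never uses is never evaluated,
# so even a failing expression passed for it is harmless.
ignore_b <- function(a, b) a * 2
print(ignore_b(3, stop("never evaluated")))
```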
Conditions and Loops
Conditions :

• In R programming, "conditions" typically refer to logical expressions or statements that evaluate to either
"TRUE" or "FALSE."
• These conditions are used in control structures like if statements, while loops, and for loops to make
decisions in the program.
• Conditions in R can involve comparisons (e.g., greater than, less than), logical operators (e.g., AND, OR),
and functions that return logical values. Here are some examples of conditions in R:

x > 5 (Is x greater than 5?)


y == "apple" (Is y equal to the string "apple"?)
z <= 10 & w > 2 (Is z less than or equal to 10 AND w greater than 2?)
is.na(a) (Is 'a' a missing value?)
R provides the following types of condition statements.

Statement              Description
if statement           An if statement consists of a Boolean expression followed by one or more statements.
if...else statement    An if statement can be followed by an optional else statement, which executes when the Boolean expression is false.
switch statement       A switch statement allows a variable to be tested for equality against a list of values.
R – IF Statements :
R – If else Statements :
R – If… else if..else.. Statements :
R – Switch Statements :
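A compact sketch of the four statement forms named in the headings above. The variables x, size, dept and label are illustrative, and the values are made up:

```r
x <- 7

# if: the body runs only when the condition is TRUE.
if (x > 5) {
  print("x is greater than 5")
}

# if ... else if ... else: the first TRUE branch wins.
# In R the whole construct is an expression, so its value can be assigned.
size <- if (x < 5) {
  "small"
} else if (x < 10) {
  "medium"
} else {
  "large"
}
print(size)

# switch: choose a branch by matching a string against the names;
# the unnamed last value acts as the default.
dept <- "HR"
label <- switch(dept,
  IT = "Information Technology",
  HR = "Human Resources",
  "Unknown department"
)
print(label)
```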
R- Loops :
• There may be a situation when you need to execute a block of code several times. In
general, statements are executed sequentially.
• The first statement in a function is executed first, followed by the second, and so on.
• Programming languages provide various control structures that allow for more complicated
execution paths.

A loop statement allows us to execute a statement or group of statements multiple times.
R programming language provides the following kinds of loop to handle looping requirements.

Loop Type      Description
repeat loop    Executes a sequence of statements repeatedly until a break statement is reached.
while loop     Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.
for loop       Executes the loop body once for each element of a vector or list.
Repeat loop :
While loop :
R - ForLoop :
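The three loop types can be sketched in a few lines each. The counters and vectors below are illustrative values, not taken from the slides:

```r
# repeat: runs until a break statement is reached.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}

# while: the condition is tested before each pass.
total <- 0
i <- 1
while (i <= 5) {
  total <- total + i
  i <- i + 1
}

# for: one pass per element of a vector.
squares <- numeric(0)
for (v in c(2, 4, 6)) {
  squares <- c(squares, v^2)
}

print(count)
print(total)
print(squares)
```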
Loop control statements :

• Loop control statements change execution from its normal sequence.
• Loops in R do not create a new scope, so objects assigned inside a loop remain available after the loop ends.

R supports the following control statements.

Control Statement    Description
break statement      Terminates the loop statement and transfers execution to the statement immediately following the loop.
next statement       Skips the remainder of the current iteration and starts the next iteration of the loop.

Break Statement :
Next Statement :
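Both control statements can be shown in one loop. In this sketch (the vector name evens is illustrative), next skips odd numbers and break ends the loop once the counter passes 8:

```r
evens <- c()
for (n in 1:10) {
  if (n %% 2 == 1) next   # skip odd numbers, continue with the next n
  if (n > 8) break        # leave the loop entirely
  evens <- c(evens, n)
}
print(evens)
```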
Stacking Statement or nested statements :

1. You can put an "if statement" inside another "if statement."
2. This is called "nesting" or "stacking" if statements.
3. It allows you to make complex decisions by checking multiple conditions one after
the other.
4. You can check conditions at different stages during your program's execution.

x <- 10
y <- 5

if (x > 5) {
  if (y > 2) {
    print("Both x and y are greater than 5 and 2, respectively.")
  } else {
    print("x is greater than 5, but y is not greater than 2.")
  }
} else {
  print("x is not greater than 5.")
}
Writing functions

Defining a function allows you to reuse a chunk of code without endlessly copying and
pasting.

It also allows other users to use your functions to carry out the same computations on their
own data or objects.
Function creation :

1. You can create a function in R using the `function` command.


2. You assign the function to a name, which you'll use to call it.
3. A function can have arguments (inputs), and these are specified in parentheses.
4. The arguments are like placeholders and don't have values initially.
5. Inside the function, you write the code that runs when the function is called.
6. You can use if statements, loops, and even call other functions within the function.
7. Arguments inside the function are treated as objects in its environment.
8. Functions should be documented to explain what they expect and do.
9. You can use the `return` command to send results back to the user.
10. If there's no `return`, the function will return the last result automatically.

Syntax :
Adding arguments :

Argument Declaration: Inside the function definition, you specify the arguments (inputs) the
function should accept. These arguments are like placeholders for the values you'll provide when you
call the function.

Using return :

1. If a function doesn't have a `return` statement, it will end when it reaches the last line of code in the function's body.
2. In such cases, the function will return the value of the last expression evaluated in the function.
3. If the function body is empty, so that nothing is evaluated, it will return `NULL`, which means there's no specific result.
Example :

# Define a function to add two numbers
add_numbers <- function(num1, num2) {
  result <- num1 + num2
  return(result)
}

# Call the function with specific values
num1 <- 5
num2 <- 7
sum_result <- add_numbers(num1, num2)
Checking for missing argument :

1. The "missing" function checks if required arguments have been provided to a function.
2. It takes an argument tag and returns `TRUE` if that specified argument is missing.
3. This function helps prevent errors by ensuring that necessary arguments are provided when calling a function.
4. For example, you can use "missing" to avoid errors when a required argument is not supplied.
5. "Missing" is particularly useful in the body code of a function to handle missing or optional arguments effectively.

# Define a function that checks if an argument is missing
check_argument <- function(str1) {
  if (missing(str1)) {
    cat("The 'str1' argument is missing.\n")
  } else {
    cat("The 'str1' argument is present.\n")
  }
}

Dealing with ellipses :

1. The ellipsis (or "..." notation) is a feature in R that allows you to pass extra, unspecified arguments to a function without explicitly defining them in the argument list.
2. These extra arguments can be collected and used in the function's code body.
3. Typically, the ellipsis is placed at the end of the argument list because it represents a variable number of arguments.
4. It's a useful way to make functions more flexible by allowing users to provide additional inputs beyond the explicitly defined arguments.
5. You can then pass these extra arguments to other functions within the code body.
6. The ellipsis is handy when you want to create functions that can handle various inputs without explicitly specifying them in the function definition.
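missing() and the ellipsis often appear in the same function. In the sketch below, describe and count_extras are hypothetical functions: label is treated as optional via missing(), and anything passed through "..." is forwarded to paste().

```r
# Hypothetical function: 'label' is optional, extra arguments go to paste().
describe <- function(x, label, ...) {
  if (missing(label)) {
    label <- "value"   # fall back when the caller omits 'label'
  }
  paste(label, x, ...)
}

print(describe(10))
print(describe(10, "salary"))
print(describe(10, "salary", sep = ": "))   # 'sep' travels through ...

# Collect the extra arguments into a list to inspect them.
count_extras <- function(...) {
  length(list(...))
}
print(count_extras(1, "a", TRUE))
```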
Specialized functions :

1. Helper Functions
- Designed to be called multiple times by another function.
- Can be defined inside the body of a parent function.
- Assist in specific tasks, improving code organization and readability.

• Externally defined
• Internally defined

Externally defined :

# External Helper Function Definition
multiplyByTwo <- function(x) {
  result <- x * 2
  return(result)
}

# Main Function Using the External Helper Function
mainFunction <- function(y) {
  # Call the external helper function
  result <- multiplyByTwo(y)
  return(result)
}

# Example Usage
mainResult <- mainFunction(5)
print(mainResult) # Output: 10

Internally defined :

# Main Function with Internal Helper Function
mainFunction <- function(x) {
  # Internal Helper Function Definition
  square <- function(num) {
    return(num^2)
  }

  # Using the Internal Helper Function
  result <- square(x)
  return(result)
}

# Example Usage
mainResult <- mainFunction(4)
print(mainResult) # Output: 16
2. Disposable Functions:

- Directly defined as an argument for another function.
- Temporary functions for specific tasks.
- Used briefly and discarded.

# Main Function with Disposable Function
mainFunction <- function(x, disposableFunc) {
  # Using the Disposable Function as an Argument
  result <- disposableFunc(x)
  return(result)
}

# Disposable Function Definition
doubleValue <- function(num) {
  return(num * 2)
}

# Example Usage
mainResult <- mainFunction(5, doubleValue)
print(mainResult) # Output: 10

3. Recursive Functions:

- Call themselves during execution.
- Break down problems into smaller, similar sub-problems.
- Suitable for repetitive patterns or structures.

# Recursive Function Example
factorial <- function(n) {
  if (n == 0 || n == 1) {
    return(1)
  } else {
    return(n * factorial(n - 1))
  }
}

# Example Usage
result <- factorial(5)
print(result) # Output: 120
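A disposable function can also be written inline, without ever giving it a name, which matches the "directly defined as an argument" description more literally. A short sketch using base R's sapply() and a hypothetical caller named apply_once:

```r
values <- c(1, 2, 3)

# The anonymous function exists only for the duration of this call.
doubled <- sapply(values, function(num) num * 2)
print(doubled)

# The same idea with a user-defined caller (hypothetical name apply_once).
apply_once <- function(x, f) f(x)
print(apply_once(5, function(num) num * 2))
```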
EXCEPTIONS
In R programming, an exception refers to an abnormal event or error that occurs during the execution of a
program. Exceptions can be caused by various reasons, such as unexpected input, a file that cannot be found, or attempting
operations that are not defined. Numeric division by zero is not among them in R: it returns Inf or NaN instead of raising an error.

When an exception occurs, it disrupts the normal flow of the program and may lead to unexpected behavior or
termination. Exception handling in R involves using mechanisms like the `tryCatch` function to anticipate and
manage these exceptional situations, allowing the program to gracefully handle errors without crashing.
1. Try-Catch Mechanism: In R, exception handling is primarily done with the try() and tryCatch() functions, which play the role of try-catch blocks in other languages.
2. try() Function: The try() function allows you to execute a block of code and keep running if it raises an error; the error is returned as an object of class "try-error".
3. tryCatch() Function: This function provides more fine-grained control over error handling by allowing specific handlers
for different types of conditions.
4. Condition Classes: R uses different condition classes (e.g., message, warning, error) to categorize issues that may arise during code
execution.
5. warning() Function: The `warning()` function raises a warning, which can be handled by supplying a `warning` handler to tryCatch().
6. stop() Function: To intentionally generate an error, the `stop()` function can be used. This can be caught using tryCatch().
7. Logging Errors: It's common to log errors using functions like `cat()` or write them to a log file for future reference.
8. Handling Multiple Conditions: tryCatch() allows handling multiple conditions in a single call, improving code
readability.
9. Custom Error Messages: Creating informative custom error messages using stop() helps in debugging and
understanding issues.
10. Debugging Tools: R provides debugging tools like debug() and browser() to interactively explore code execution
when errors occur.
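Several of the points above fit into one sketch. The function safe_divide is hypothetical; because numeric division by zero in R yields Inf rather than an error, the sketch raises a warning explicitly with warning() and an error with stop(), then handles each with its own tryCatch() handler.

```r
safe_divide <- function(x, y) {
  tryCatch(
    {
      if (!is.numeric(x) || !is.numeric(y)) {
        stop("both arguments must be numeric")   # custom error message
      }
      if (y == 0) {
        warning("division by zero")              # raise a warning explicitly
      }
      x / y
    },
    warning = function(w) {
      cat("Warning caught:", conditionMessage(w), "\n")
      NA
    },
    error = function(e) {
      cat("Error caught:", conditionMessage(e), "\n")
      NULL
    },
    finally = cat("safe_divide finished\n")      # runs in every case
  )
}

r1 <- safe_divide(10, 2)    # normal result
r2 <- safe_divide(10, 0)    # warning handler returns NA
r3 <- safe_divide("a", 2)   # error handler returns NULL

# try() keeps the script running and returns a "try-error" object on failure.
res <- try(stop("something went wrong"), silent = TRUE)
print(inherits(res, "try-error"))
```

Returning NA or NULL from a handler is one design choice among several; a handler could equally log the message and re-raise the error with stop().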
Timing and Visualisation
1. Purpose of Tracking Progress and Timing :
Monitoring progress and timing is important in R for lengthy numeric exercises, such as simulations or complex
operations. It lets you compare approaches for efficiency and evaluate task completion times.

2. Tools for Tracking Progress :
- Progress Bar : A progress bar is a visual representation of the execution progress in R.
- Sys.sleep Function : The `Sys.sleep()` command causes R to pause for a specified duration in seconds, which is useful for
demonstrating code execution time.

3. Demonstration of Progress Bar Implementation :
- `Sys.sleep()` can simulate a task that takes a considerable amount of time.
- The progress bar then visually indicates the progress of the task execution.

4. Importance of Timing Code Execution :
- Timing code execution allows benchmarking and comparing different programming approaches to solve a
problem.
- Tracking time helps in identifying bottlenecks and optimizing code for efficiency.

5. Comparison of Programming Approaches :
- Timing mechanisms can be used to compare the speed and efficiency of different programming approaches in R.
- Timing code execution aids in selecting the most efficient approach for a given problem.
Example Program :

prog_test <- function(n){
  result <- 0
  progbar <- txtProgressBar(min=0,max=n,style=1,char="=")
  for(i in 1:n){
    result <- result + 1
    Sys.sleep(0.5)
    setTxtProgressBar(progbar,value=i)
  }
  close(progbar)
  return(result)
}

Explanation :

- Function Creation : The code defines a function named `prog_test` that takes a number (`n`) as input.
- Initialization: It starts a counter (`result`) at zero to keep track of the total count.
- Progress Bar Setup: It creates a progress bar (`progbar`) that displays the progress of the counting process from 0 to `n`.
- Counting Loop: Using a loop (`for(i in 1:n){...}`), it counts up to `n`.
- Counting and Pause: With each count, it increments the `result` by 1 and pauses for 0.5 seconds (`Sys.sleep(0.5)`).
- Updating Progress Bar: It updates the progress bar (`setTxtProgressBar(progbar, value=i)`) to illustrate the current progress.
- Completion: Once the counting loop finishes, it closes the progress bar (`close(progbar)`).
- Result Return: Finally, it returns the total count (`result`).
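For the timing side, base R's system.time() reports how long an expression takes to run. The sketch below compares two hypothetical implementations of the same task (sum_squares_loop and sum_squares_vec are names made up here); the measured times depend on the machine, so no expected values are shown.

```r
# Two hypothetical ways to sum the squares of 1..n.
sum_squares_loop <- function(n) {
  total <- 0
  for (i in 1:n) total <- total + i^2
  total
}
sum_squares_vec <- function(n) sum((1:n)^2)

n <- 100000

# system.time() evaluates the expression and returns user, system and elapsed time.
t_loop <- system.time(r_loop <- sum_squares_loop(n))
t_vec  <- system.time(r_vec  <- sum_squares_vec(n))

print(t_loop["elapsed"])
print(t_vec["elapsed"])

# Confirm both approaches agree before comparing their speed.
print(all.equal(r_loop, r_vec))
```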
