Introduction To R Installation: Data Types Value Examples

The document provides an introduction to the R programming language including how to install R and R Studio, basic data types in R like numeric and character, common data structures like vectors and data frames, and how to define functions in R. Functions allow users to perform custom computations and are a key part of the R language.

Uploaded by

Denis Shpeka

Introduction to R

Installation
Installation of the basic R package is fairly simple. You need to complete the following steps:

1. Visit the website of one of the mirrors of the R project (e.g. http://cran.gis-lab.info/)
2. Follow the link corresponding to your operating system (Download R for Linux/Mac/Windows)
3. Download the installer (following the instructions on the website, the download starts when you click
on “install R for the first time”)
4. Launch the installer (for example, on Windows this is R-3.2.0-win.exe)

Along with the basic R package, we suggest installing RStudio, software that complements core R with
a convenient integrated development environment (IDE) and often makes life much easier. The project's
website, where you can find the installation instructions, is http://www.rstudio.com/.

In all the examples below we will work in the R console. It appears automatically when you launch RStudio, in
the upper left part of the screen, and invites you to start typing R commands with the following
message:

R version 3.1.1 (2014-07-10) -- "Sock it to Me"

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.

Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> Start typing your R code here

Data types
Like any other programming language, R supports different data types for working with different kinds of
values: integer, numeric, logical etc. The basic data types in R are described in the table below.

Data type    Value examples

integer      0L, 1L
numeric      0, 1, 2.3, Inf, NaN
complex      3+4i
logical      TRUE, FALSE, T, F
character    "hi"

The R language manages types dynamically: when it sees an expression like one of
those in the right column of the table, the R interpreter determines its type automatically. For example,
if you want to assign a logical value to some variable x, you don’t need to declare the type of x explicitly; you
simply assign the value to it with no declarations (the assignment operator in R is the arrow
“<-”):

> x <- TRUE

Given an existing variable x, you can use a range of operations (functions) to determine or
change its type:

• Getting the type: class(x)
• Type check: is.[type](x)
• Type cast: as.[type](x)

For example:

> x <- 3.14
> class(x)
[1] "numeric"
> is.numeric(x)
[1] TRUE
> is.integer(x)
[1] FALSE
> as.integer(x)
[1] 3
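
The as.[type]() casts listed above can be sketched a little further. The snippet below is only an illustration; the variable names are made up for this example and are not part of the original text. It also shows what happens when a cast is not possible: R returns NA and issues a warning, which is suppressed here to keep the output clean.

```r
# Casting between basic types with the as.*() family.
x <- "3.14"                      # a character string
y <- as.numeric(x)               # parsed into a number
z <- as.integer(y)               # the fractional part is dropped
flag <- as.logical("TRUE")       # the string "TRUE" becomes the logical TRUE
label <- as.character(42L)       # numbers can be turned back into text

# A string that does not look like a number cannot be cast:
# R returns NA and issues a warning (suppressed here).
bad <- suppressWarnings(as.numeric("abc"))

print(class(y))
print(z)
print(is.na(bad))
```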

Data about certain parameter values is often missing for some observations. R
provides a mechanism for handling that case: a missing (unknown) observation value is represented by the
special value NA (“Not Available”). The corresponding operations are:

• Checking whether a value is NA: is.na(x)
• Filtering out the NA values in an array of observations: na.omit(data)

The na.omit() function is very useful when you want to calculate some statistics on data that
may contain missing values (NA). In that case the mean() function (which computes the
mean) also returns NA, which is not the desired behavior in most cases. Here na.omit() solves the
problem:

> data <- c(1, 2, 3, NA, 4)
> mean(data)
[1] NA
> mean(na.omit(data))
[1] 2.5

Compare with MATLAB: it has a similar mechanism for solving this problem, namely functions like
nanmean().
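
As a small complement to na.omit(), many base R summary functions, mean() among them, accept an na.rm argument that skips missing values directly. The sketch below reuses the values from the example above; the variable names are chosen only for this illustration.

```r
obs <- c(1, 2, 3, NA, 4)               # same values as in the example above

n_missing <- sum(is.na(obs))           # is.na() works element-wise; TRUE counts as 1
m_omit <- mean(na.omit(obs))           # drop the NA values first, then average
m_narm <- mean(obs, na.rm = TRUE)      # let mean() skip the NA values itself

print(n_missing)
print(m_omit)
print(m_narm)
```

Both approaches give the same mean here. na.rm = TRUE avoids building an intermediate vector, while na.omit() is handy when the cleaned data is needed for several computations.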

Basic data structures

Values of basic data types (numbers, strings etc.) are usually grouped into data structures that make it
possible to work with arrays of data as with a single object. The basic data structures in R are listed in the
table below. In the right column, there are examples of expressions that can be used to construct the
corresponding data structures.

Data structure    Definition examples

vector            c(1L, 2L, 3L), 1:3, vector("integer", 3)
list              list(1L, 2.3, "hi", F)
factor            factor(c("Male", "Female", "Male", "Male"))
matrix            matrix(1:6, nrow = 2, ncol = 3)
data.frame        data.frame(
                    age = 18:23,
                    height = c(170, 171, NA, 176, 173, 180),
                    sex = factor(c("m", "f", "m", "m", "f", "m"))
                  )

A vector is an ordered sequence of values of the same type: in the examples above, the first two
vectors hold the values [1, 2, 3], and the last example creates a vector of three zeros. A list
differs from a vector in that it can hold values of different types at the same time, including other lists.
A factor is a kind of vector used to encode categorical (nominal) data (in the example above it is used to
encode the observations of a variable that takes two values: “Male”/“Female”). A matrix in R can be
created by specifying the vector of the values in its cells and its dimensions (nrow/ncol parameters).
Finally, data.frame objects hold data tables (observations of several attributes). Data frames are thus the
basic data structure for many real-world data analysis tasks. An example of data.frame creation can be
seen below. This data frame contains data about six people (with age/height/sex attributes). After the code
you can see the resulting data table:

> data <- data.frame(
    age = 18:23,
    height = c(170, 171, NA, 176, 173, 180),
    sex = factor(c("m", "f", "m", "m", "f", "m"))
  )
> data
  age height sex
1  18    170   m
2  19    171   f
3  20     NA   m
4  21    176   m
5  22    173   f
6  23    180   m
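
To make the differences between these structures more tangible, here is a minimal sketch that builds the structures from the table and inspects them with a few base R functions (length(), class(), levels(), dim()). The variable names are illustrative only.

```r
v <- c(1L, 2L, 3L)                                 # integer vector
zeros <- vector("integer", 3)                      # vector of three zeros
lst <- list(1L, 2.3, "hi", FALSE)                  # list holding mixed types
f <- factor(c("Male", "Female", "Male", "Male"))   # categorical data
m <- matrix(1:6, nrow = 2, ncol = 3)               # filled column by column

# A vector holds a single type, so c() silently converts mixed
# values to the most general one (character here).
mixed <- c(1L, 2.3, "hi")

print(length(v))
print(class(lst[[3]]))   # each list element keeps its own type
print(levels(f))         # distinct categories, sorted alphabetically
print(dim(m))            # number of rows and columns
print(m[2, 3])           # element in row 2, column 3
print(class(mixed))
```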

Functions in R
In programming, a function is a named section of a program that performs a specific task. Functions are
often also called methods or procedures. A function usually takes some parameter(s) as its input and
produces some value as its output. This return value can be a number, a logical value, or a more complex
object like a plot.

In R, you can both use the predefined functions built into the language (like the
is.numeric() function that we have already seen) and define your own functions that perform
the specific computations you need. Here is the syntax of a function definition in R:

myfunction <- function(arg1, arg2, ... ) {
    statements
    return (object)
}

To call a function, just write its name and pass the arguments to this function in parentheses:

> myfunction(<value1>, <value2>, …)
[1] <some return value>

Note that functions in R are ordinary objects: they are assigned to variables (myfunction in the
example above) and can be passed to other functions as parameters. This is what makes R different from
popular non-functional languages like C or Java.

Let us define a simple function that computes the square of a number, and test it:
> square <- function(x) { return (x * x) }
> square(3)
[1] 9
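
Since functions are objects, they can be handed to other functions. The sketch below passes square to the built-in sapply(), which applies a function to every element of a vector, and to apply_twice, a hypothetical helper defined here purely for illustration.

```r
square <- function(x) { return (x * x) }

# Hypothetical helper: takes a function f and applies it twice to x.
apply_twice <- function(f, x) {
  return (f(f(x)))
}

squares <- sapply(1:5, square)              # square every element of 1:5
fourth_power <- apply_twice(square, 3)      # square(square(3))
plus_one <- sapply(1:3, function(x) x + 1)  # an unnamed (anonymous) function works too

print(squares)
print(fourth_power)
print(plus_one)
```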

A function can also call itself in its body (main part). This is called recursion. A classic
example of a function that can be defined recursively is the factorial, defined for a non-negative integer
𝑛 as the product of all positive integers less than or equal to 𝑛. Note also the use of the if ... else
expression in its definition. It allows us to specify what the function should do when the condition is true
and when it is not:

> factorial <- function(x) {
    if (x <= 1) {   # base case: 0! = 1! = 1
        return (1)
    } else {
        return (x * factorial (x - 1))
    }
  }
> factorial(1)
[1] 1
> factorial(5)
[1] 120

Finally, let us re-implement the same factorial function without recursion. We can do that with a for-loop,
which is yet another basic construct in most programming languages. A for-loop makes it possible
to iterate through a collection of values and perform some steps at each iteration. The implementation
of the factorial function follows its definition: we iterate through all positive integers less than or equal to
x (the range of these values is denoted as 1:x) and multiply each of them into the overall product accumulated
in the variable called result:

> factorial <- function(x) {
    result <- 1
    for (i in 1:x) {
        result <- result * i
    }
    return (result)
  }
> factorial(1)
[1] 1
> factorial(5)
[1] 120
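
One detail of the loop version is worth a sketch: for x = 0 the range 1:x counts down (it is the vector 1, 0), so the loop body runs twice and the product becomes 0 instead of 1. A possible variant, named fact_iter here only to keep it apart from the functions above, uses seq_len(), which yields an empty sequence for 0. The comparison with base::factorial() is just a sanity check.

```r
# Iterative factorial that also handles 0 (hypothetical name fact_iter).
fact_iter <- function(n) {
  stopifnot(n >= 0, n == round(n))   # accept only non-negative whole numbers
  result <- 1
  for (i in seq_len(n)) {            # seq_len(0) is empty, so the loop is skipped for n = 0
    result <- result * i
  }
  return (result)
}

print(fact_iter(0))
print(fact_iter(5))
print(fact_iter(10) == base::factorial(10))   # agrees with the built-in factorial
```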

Working with data (data frames)
As mentioned above, data.frame objects provide us with the basic “container” for data in R. Given a data
frame object (let’s call it data), you can pass it as a parameter to different built-in R functions to get its
dimensions or attribute names, or to extract only certain parts of the data:

> nrow(data) # number of rows (records)
[1] 6
> ncol(data) # number of columns (attributes)
[1] 3
> names(data) # attribute names
[1] "age" "height" "sex"
> head(data, 3) # overview of the data (only the first three rows)
  age height sex
1  18    170   m
2  19    171   f
3  20     NA   m

Another useful function is called summary(). It outputs the statistics for all the attributes in the data
frame (minimum/maximum, quantiles, the number of NA values):

> summary(data)
      age            height    sex
 Min.   :18.00   Min.   :170   f:2
 1st Qu.:19.25   1st Qu.:171   m:4
 Median :20.50   Median :173
 Mean   :20.50   Mean   :174
 3rd Qu.:21.75   3rd Qu.:176
 Max.   :23.00   Max.   :180
                 NA's   :1

A data frame can be sliced both by rows and by columns. Both use the same syntax,
data[<row filter>, <column filter>]. For example, to retrieve specific rows from
the data table, run:

> data[1,] # only the first row
  age height sex
1  18    170   m

> data[1:3,] # rows from the first to the third
  age height sex
1  18    170   m
2  19    171   f
3  20     NA   m

> data[c(1,3),] # only the first and the third rows (indices vector)
  age height sex
1  18    170   m
3  20     NA   m

To get certain attributes only, you can use the data[<row filter>, <column filter>] syntax
again:

> data[, "height"]
[1] 170 171  NA 176 173 180

R also provides you with an alternative way to refer to attributes by name in a data frame separating the
data frame’s name from the name of the variable with a dollar sign ($):

> data$height
[1] 170 171 NA 176 173 180

This “$” syntax also extracts named elements from an R list and from some other R objects, such as environments.
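
The different ways of extracting a column can be compared side by side. The sketch below rebuilds the example data frame so that it is self-contained. The person list is a made-up object used only to show "$" on a list.

```r
data <- data.frame(
  age = 18:23,
  height = c(170, 171, NA, 176, 173, 180),
  sex = factor(c("m", "f", "m", "m", "f", "m"))
)

h1 <- data$height          # dollar syntax
h2 <- data[["height"]]     # double brackets with the column name
h3 <- data[, "height"]     # row/column filter syntax
same <- identical(h1, h2) && identical(h2, h3)   # all three return the same vector

h_df <- data["height"]     # single brackets with one index keep a one-column data frame

person <- list(name = "Ann", age = 30)   # hypothetical list, for illustration only
person_age <- person$age                 # "$" on a list returns the named element

print(same)
print(class(h_df))
print(person_age)
```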

More complicated queries to a data frame can combine filters by rows and by columns. In the example
below, we retrieve the height of all men older than 20:

> data[data$age > 20 & data$sex == "m", "height"]
[1] 176 180
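
The same kind of query can also be written with the base R function subset(), which lets the condition refer to column names directly. The sketch below also shows one caveat of logical row filters: a condition that evaluates to NA (here because of the missing height) produces an extra row of NA values unless it is wrapped in which(). Variable names are illustrative.

```r
data <- data.frame(
  age = 18:23,
  height = c(170, 171, NA, 176, 173, 180),
  sex = factor(c("m", "f", "m", "m", "f", "m"))
)

# Index syntax: returns a plain vector of heights.
by_index <- data[data$age > 20 & data$sex == "m", "height"]

# subset(): same rows, but the result stays a (one-column) data frame.
by_subset <- subset(data, age > 20 & sex == "m", select = height)

# Filtering on a column with NA: the NA condition yields a row of NAs ...
with_na_row <- data[data$height > 172, ]
# ... while which() keeps only the rows where the condition is TRUE.
tall <- data[which(data$height > 172), ]

print(by_index)
print(by_subset)
print(nrow(with_na_row))
print(nrow(tall))
```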

Other useful functions include all() and any(). These functions answer the question whether the
specified condition is true for all records or for at least one record in the data, respectively:

> all(data$age > 16)
[1] TRUE

> any(data$age > 60)
[1] FALSE
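
When the checked column contains missing values, all() and any() can return NA themselves, because the unknown value might change the answer. A minimal sketch using the height values from the example data frame (the variable names are illustrative):

```r
height <- c(170, 171, NA, 176, 173, 180)

some_tall <- any(height > 175)               # TRUE: one known TRUE is enough
all_above <- all(height > 160)               # NA: the missing value could be a counterexample
all_known <- all(height > 160, na.rm = TRUE) # TRUE: the NA is ignored
some_huge <- any(height > 200)               # NA: the missing value could be a match

print(some_tall)
print(all_above)
print(all_known)
print(some_huge)
```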

Finally, you can add new attributes to a data frame by passing the vector of values of this new attribute
for all records present in the data frame:

> data$foo <- c(6, 5, 4, 3, 2, 1)
> head(data, 3)
  age height sex foo
1  18    170   m   6
2  19    171   f   5
3  20     NA   m   4
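
A new attribute is often computed from existing ones rather than typed in by hand, and an attribute can be removed again by assigning NULL to it. The sketch below assumes the heights are given in centimetres (the original text does not state the unit). The column names height_m and is_male are made up for this example.

```r
data <- data.frame(
  age = 18:23,
  height = c(170, 171, NA, 176, 173, 180),
  sex = factor(c("m", "f", "m", "m", "f", "m"))
)
data$foo <- c(6, 5, 4, 3, 2, 1)

data$height_m <- data$height / 100   # computed element-wise; NA stays NA
data$is_male <- data$sex == "m"      # logical column derived from a factor
data$foo <- NULL                     # assigning NULL removes the column

print(names(data))
print(head(data, 3))
```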

Data visualization
In the simplest case, you can try to visualize the dependencies between attribute pairs using plots. For
this purpose, R has the plot() function that builds a dot diagram (scatter plot) on the plane. There is
also the lines() function for connecting the dots on the plot with lines. Let’s construct a plot to estimate
the dependency between height and age for the people in the dataset we have been using in the examples
above:

> plot(data$age, data$height, pch=19, xlab="Age", ylab="Height")
> lines(data[!is.na(data$height),]$age, na.omit(data$height), pch=19)

In fact, it makes little sense to connect the dots in a scatter plot: in a larger data set there would certainly
be several observations with the same age, which would make the plot too messy. What is worth doing
to improve the visualization is separating the points on our scatter plot by sex. In the example below
we do this by coloring the dots red for men and black for women. Note the alternative syntax for
specifying the attributes used here while constructing the scatter plot (data$height ~ data$age),
and also how the col parameter is used to set the dot colors:

> plot(data$height ~ data$age, pch=19, col=data$sex, xlab="Age",
       ylab="Height")
> legend(18, 179, legend=c("Male", "Female"), col=c("red", "black"),
         pch=19)
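
Passing a factor as col works because a factor is stored as integer codes (1 for the first level, 2 for the second), and plot() looks those codes up in the current color palette. If you prefer not to rely on the palette order, the colors can be spelled out explicitly. The sketch below is one possible way to do that. The plotting calls are wrapped in interactive() so that the snippet also runs in a non-graphical session.

```r
age <- 18:23
height <- c(170, 171, NA, 176, 173, 180)
sex <- factor(c("m", "f", "m", "m", "f", "m"))

codes <- as.integer(sex)                          # integer codes behind the factor: f = 1, m = 2
dot_col <- ifelse(sex == "m", "red", "black")     # explicit color for every point

print(codes)
print(dot_col)

if (interactive()) {
  plot(height ~ age, pch = 19, col = dot_col, xlab = "Age", ylab = "Height")
  legend("topleft", legend = c("Male", "Female"), col = c("red", "black"), pch = 19)
}
```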

The plot() function can also build a scatter plot matrix for all attribute pairs if you pass the whole
data.frame object as its argument. Such a matrix can be useful as one of the first steps in data analysis:
it gives a clear idea of which attribute pairs exhibit a dependency and which seem to be
uncorrelated:

> plot(data)

Finally, to build bar charts (histograms of a categorical attribute) in R, one can use the barplot() function.
This function should be provided not with the source data itself, but with the frequencies of the different values
of the attribute being investigated. These frequencies can be computed with the table() function.
In the example below we also pass the names of the bars to barplot() through its names.arg
argument:

> barplot(table(data$sex), names.arg=c("Female", "Male"))
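
To see what barplot() actually receives, it helps to look at the output of table() on its own. For a numeric attribute, base R also offers hist(), which groups the values into intervals itself. The sketch below uses plot = FALSE so that only the counts are computed. The break points 17, 19, 21, 23 are an arbitrary choice for this illustration.

```r
age <- 18:23
sex <- factor(c("m", "f", "m", "m", "f", "m"))

counts <- table(sex)          # frequency of each category, in level order (f, m)

# Histogram of a numeric attribute: counts per interval (17,19], (19,21], (21,23].
h <- hist(age, breaks = c(17, 19, 21, 23), plot = FALSE)

print(counts)
print(h$counts)
```

Because table() lists the levels in alphabetical order (f before m), the bar names passed through names.arg have to follow that same order.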

Importing data
The simplest way to load an existing data set into R is to read it from a file. The R language has a
range of functions for reading data in different formats: read.csv() for CSV tables,
read.xlsx() (from the xlsx package) for Excel tables, fromJSON() (from the RJSONIO package) for
JSON data. read.csv() and read.xlsx() produce a data.frame object, while fromJSON() returns a list or
a vector. In the example below, we first download a
data set called “Iris flower” from the web to a local CSV file and then read this file in R:

> fileUrl <- "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
> download.file(fileUrl, destfile="iris.csv")
> iris.data <- read.csv("iris.csv", header=FALSE) # the file has no header row
> head(iris.data, 3) # iris.data is a data frame
   V1  V2  V3  V4          V5
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
> colnames(iris.data) <- c("Sepal.Length", "Sepal.Width", "Petal.Length",
                           "Petal.Width", "Species") # Renaming attributes
> head(iris.data, 3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width     Species
1          5.1         3.5          1.4         0.2 Iris-setosa
2          4.9         3.0          1.4         0.2 Iris-setosa
3          4.7         3.2          1.3         0.2 Iris-setosa
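
Because the iris.data file carries no header row, the attribute names can also be supplied while reading, through the col.names argument of read.csv(), instead of renaming the columns afterwards. The sketch below works offline: it writes the first three records of the data set to a temporary file (the path comes from tempfile(), not from the original text) and reads them back.

```r
# Write a tiny header-less CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("5.1,3.5,1.4,0.2,Iris-setosa",
             "4.9,3.0,1.4,0.2,Iris-setosa",
             "4.7,3.2,1.3,0.2,Iris-setosa"), csv_path)

# header = FALSE: treat the first line as data, not as column names.
# col.names: name the attributes in the same call.
iris.small <- read.csv(csv_path, header = FALSE,
                       col.names = c("Sepal.Length", "Sepal.Width",
                                     "Petal.Length", "Petal.Width", "Species"))

print(nrow(iris.small))
print(names(iris.small))
print(iris.small$Sepal.Length)
```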

Matrices can be read with a combination of the read.csv() and as.matrix() functions. Let’s
assume that the matrix.txt file in your current working directory has the following contents
(you can find out what your current working directory is by calling getwd() in R and change it using the
setwd(<path>) function):

0,.11,.22,.4
.11,0,.5,.3
.22,.5,0,.7

Then you can read this matrix as follows:

> m <- as.matrix(read.csv("matrix.txt", header=FALSE))
> m
       V1   V2   V3  V4
[1,] 0.00 0.11 0.22 0.4
[2,] 0.11 0.00 0.50 0.3
[3,] 0.22 0.50 0.00 0.7

Saving the matrix back to a file is just as simple with the write.table() function. If you don’t set the
row.names and col.names parameters to FALSE, row/column names will be written to the output file
along with the raw data:

> write.table(m, "matrix2.txt", sep=",", row.names=FALSE,
              col.names=FALSE)
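
The write and read steps can be checked against each other with a small round trip. The sketch below builds the same 3 x 4 matrix in memory, saves it to a temporary file (the path comes from tempfile(), an assumption made so that nothing is written to your working directory), reads it back and compares the two.

```r
# The matrix from the example above, entered column by column.
m <- matrix(c(0, .11, .22,
              .11, 0, .5,
              .22, .5, 0,
              .4, .3, .7), nrow = 3)

path <- tempfile(fileext = ".txt")
write.table(m, path, sep = ",", row.names = FALSE, col.names = FALSE)

m2 <- as.matrix(read.csv(path, header = FALSE))
dimnames(m2) <- NULL                         # drop the automatic V1..V4 column names

round_trip_ok <- isTRUE(all.equal(m, m2))    # same values after saving and loading

print(dim(m2))
print(round_trip_ok)
```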
