Descriptive Analysis in R Programming - GeeksforGeeks
In descriptive analysis, we describe our data with the help of representative methods
such as charts, graphs, tables, and Excel files, presenting it in a meaningful way so that
it can be easily understood. It is most often performed on small data sets, and the
patterns it reveals can help us anticipate future trends based on the current findings.
Two kinds of measures are commonly used to describe a data set: measures of central
tendency and measures of variability (or dispersion).
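Measure of central tendency
A measure of central tendency represents the whole data set by a single central value.
The most common measures of central tendency are:
Mean
Mode
Median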
https://www.geeksforgeeks.org/descriptive-analysis-in-r-programming/ 1/17
9/20/23, 2:41 PM Descriptive Analysis in R Programming - GeeksforGeeks
Measure of variability
A measure of variability describes the spread of the data, that is, how widely the values
are distributed. The most common variability measures are:
Range
Variance
Standard deviation
Descriptive Analysis in R
Descriptive analysis consists of describing the data simply, using summary statistics
and graphics. Here, we’ll describe how to compute summary statistics using R.
Before doing any computation, we need to prepare our data: save it in an external .txt
or .csv file, preferably in the current working directory. After that, import your data
into R as follows:
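# R program to illustrate
# Descriptive Analysis
# Import the data ("myData.csv" is a placeholder file name)
myData = read.csv("myData.csv", stringsAsFactors = FALSE)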
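The import step can be tried without any external file. The following is only a sketch under stated assumptions: `sample_data`, its `Age` and `Gender` columns, and all of its values are made up for illustration and are not the data used in this article. It writes a small CSV to a temporary location and reads it back with read.csv().

```r
# Hypothetical sample data; not the data set used in the article.
sample_data <- data.frame(
  Age    = c(18, 22, 25, 25, 31, 40),
  Gender = c("Male", "Female", "Male", "Female", "Female", "Male")
)

# Write the data to a temporary CSV file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types and the first few rows.
str(imported)
print(head(imported, 3))
```

Checking the result with str() and head() right after the import is a quick way to confirm that numeric columns were read as numbers before computing any statistics.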
Mean
It is the sum of the observations divided by the total number of observations. It is also
known as the average.
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating mean
print(mean(myData$Age))
Output:
[1] 28.78889
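To check the definition by hand, here is a small sketch on a made-up vector (`ages` is hypothetical and is not the article's data). It computes the mean from the definition, compares it with the built-in mean(), and shows how a missing value is handled.

```r
# Hypothetical ages, chosen only for illustration.
ages <- c(18, 22, 25, 25, 31, 40)

# Mean from the definition: sum of observations / number of observations.
manual_mean <- sum(ages) / length(ages)

# Built-in equivalent.
builtin_mean <- mean(ages)

print(manual_mean)
print(builtin_mean)

# A missing value makes mean() return NA unless na.rm = TRUE is set.
ages_with_na <- c(ages, NA)
print(mean(ages_with_na))
print(mean(ages_with_na, na.rm = TRUE))
```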
Median
It is the middle value of the sorted data set; it splits the data into two halves. If the
number of elements in the data set is odd, the center element is the median; if it is even,
the median is the average of the two central elements.
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating median
print(median(myData$Age))
Output:
[1] 26
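The odd and even cases of the definition can be sketched as follows. The helper `median_by_definition` and both vectors are hypothetical names and values introduced only for this illustration; in practice the built-in median() does the same job.

```r
# Hypothetical helper that follows the textbook definition of the median.
median_by_definition <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    # Odd count: the single middle element.
    x[(n + 1) / 2]
  } else {
    # Even count: the average of the two central elements.
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

odd_values  <- c(40, 18, 25, 31, 22)      # 5 values
even_values <- c(40, 18, 25, 31, 22, 27)  # 6 values

print(median_by_definition(odd_values))
print(median_by_definition(even_values))

# The built-in median() should agree in both cases.
print(median(odd_values))
print(median(even_values))
```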
Mode
It is the value that has the highest frequency in the given data set. The data set may have
no mode if the frequency of all data points is the same. Also, we can have more than one
mode if two or more data points share the same highest frequency.
Example:
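# R program to illustrate
# Descriptive Analysis
# One way to get the mode: the most frequent value in a frequency table
print(as.numeric(names(which.max(table(myData$Age)))))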
Output:
[1] 25
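Base R has no built-in function for the statistical mode (the built-in mode() reports an object's storage mode instead). The sketch below is one possible way to cover the three cases described above: a single mode, several modes, and no mode. The function name `find_modes` and the test vectors are hypothetical.

```r
# Hypothetical helper: returns every value that shares the highest frequency,
# or an empty vector when all distinct values occur equally often.
find_modes <- function(x) {
  values <- unique(x)
  counts <- tabulate(match(x, values))
  # More than one distinct value, all equally frequent: no mode.
  if (length(values) > 1 && all(counts == counts[1])) {
    return(x[0])
  }
  values[counts == max(counts)]
}

print(find_modes(c(18, 25, 25, 31)))      # one mode
print(find_modes(c(18, 18, 25, 25, 31)))  # two modes
print(find_modes(c(18, 25, 31)))          # no mode
```

Returning all tied values, rather than only the first one, keeps the result consistent with the definition of a multimodal data set.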
Range
The range describes the difference between the largest and smallest data points in our
data set. The bigger the range, the greater the spread of the data, and vice versa.
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating the range as max - min
ageRange = max(myData$Age) - min(myData$Age)
cat("Range is:\n")
print(ageRange)
# Alternate method to get min and max
r = range(myData$Age)
print(r)
Output:
Range is:
[1] 32
[1] 18 50
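As a small sketch of how the range reflects spread, the two made-up vectors below (`tight` and `wide` are hypothetical) have the same mean but very different ranges. Since range() returns the minimum and maximum, diff(range(x)) gives the range as a single number.

```r
# Two hypothetical data sets with the same mean but different spread.
tight <- c(24, 25, 26)
wide  <- c(10, 25, 40)

# range() returns c(min, max); diff() turns that into max - min.
tight_range <- diff(range(tight))
wide_range  <- diff(range(wide))

print(c(mean(tight), mean(wide)))
print(c(tight_range, wide_range))
```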
Variance
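Variance is the average of the squared deviations of the observations from the mean:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - u)^2$$
where,
N = number of terms
u = Mean
x_i = the i-th observation
Note that R's var() function computes the sample variance, which divides by N - 1
instead of N.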
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating variance
variance = var(myData$Age)
print(variance)
Output:
[1] 48.21217
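The difference between dividing by N and dividing by N - 1 can be seen in a short sketch. The vector `ages` is made up for illustration; the point is only that var() matches the N - 1 version.

```r
# Hypothetical ages, chosen only for illustration.
ages <- c(18, 22, 25, 25, 31, 40)
n <- length(ages)
u <- mean(ages)

# Sum of squared deviations from the mean.
squared_deviations <- sum((ages - u)^2)

# Population variance divides by N; sample variance divides by N - 1.
population_variance <- squared_deviations / n
sample_variance     <- squared_deviations / (n - 1)

print(population_variance)
print(sample_variance)

# The built-in var() returns the sample variance.
print(var(ages))
```

For large data sets the two values are nearly identical; for small ones the gap is noticeable, so it helps to state which version is being reported.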
Standard Deviation
It is defined as the square root of the variance. To calculate it, find the mean, subtract
the mean from each number and square the result, add up all the squared values, divide
by the number of terms, and take the square root:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - u)^2}$$
where,
N = number of terms
u = Mean
x_i = the i-th observation
Note that R's sd() function divides by N - 1 instead of N.
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating standard deviation
print(sd(myData$Age))
Output:
[1] 6.943498
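A brief sketch of the relationship between the two measures, again on a made-up vector (`ages` is hypothetical): sd() is the square root of var(), and the population version follows the steps of the definition directly.

```r
# Hypothetical ages, chosen only for illustration.
ages <- c(18, 22, 25, 25, 31, 40)

# Built-in sample standard deviation (N - 1 in the denominator).
sample_sd <- sd(ages)

# Population standard deviation, step by step: deviations from the mean,
# squared, averaged over N, then the square root.
population_sd <- sqrt(mean((ages - mean(ages))^2))

print(sample_sd)
print(sqrt(var(ages)))  # same value as sd(ages)
print(population_sd)
```

Unlike the variance, the standard deviation is expressed in the same units as the data (years, in the case of ages), which usually makes it easier to interpret.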
Quartiles
A quartile is a type of quantile. The first quartile (Q1) is defined as the middle number
between the smallest number and the median of the data set; the second quartile (Q2) is
the median of the data set; and the third quartile (Q3) is the middle number between the
median and the largest value of the data set.
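As a hedged sketch on a made-up vector (`ages` is hypothetical), the code below requests Q1, Q2 and Q3 through the probs argument of quantile(). By default quantile() interpolates between data points (its type argument selects among several algorithms), so its Q1 and Q3 can differ slightly from the "middle number" definition above; fivenum() is shown alongside because its hinges follow that definition more closely.

```r
# Hypothetical ages, chosen only for illustration.
ages <- c(18, 22, 25, 25, 31, 40)

# Request only Q1, Q2 and Q3 via the probs argument.
q <- quantile(ages, probs = c(0.25, 0.50, 0.75))
print(q)

# fivenum() returns min, lower hinge, median, upper hinge and max.
print(fivenum(ages))
```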
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating Quartiles
quartiles = quantile(myData$Age)
print(quartiles)
Interquartile Range
The interquartile range (IQR), also called the midspread, the middle 50%, or technically
the H-spread, is the difference between the third quartile (Q3) and the first quartile (Q1).
It covers the center of the distribution and contains 50% of the observations.
IQR = Q3 – Q1
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating IQR (stored under a name that does not mask the IQR() function)
iqrAge = IQR(myData$Age)
print(iqrAge)
Output:
[1] 9
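The relation IQR = Q3 – Q1 can be verified with a short sketch on a made-up vector (`ages` is hypothetical). It assumes the default settings of quantile() and IQR(), which use the same quantile algorithm.

```r
# Hypothetical ages, chosen only for illustration.
ages <- c(18, 22, 25, 25, 31, 40)

# Q1 and Q3 from quantile(); unname() drops the "25%" / "75%" labels.
q1 <- unname(quantile(ages, 0.25))
q3 <- unname(quantile(ages, 0.75))

# IQR from the definition, and from the built-in function.
manual_iqr <- q3 - q1
print(manual_iqr)
print(IQR(ages))
```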
summary() function in R
The function summary() can be used to display several summary statistics for either one
variable or an entire data frame.
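A self-contained sketch of both uses is given here, assuming a small made-up data frame (`demo` and its `Age` and `Gender` columns are hypothetical, not the article's data). For a numeric vector, summary() reports the minimum, first quartile, median, mean, third quartile and maximum; for a data frame it summarizes each column according to its type.

```r
# Hypothetical data frame, used only for illustration.
demo <- data.frame(
  Age    = c(18, 22, 25, 25, 31, 40),
  Gender = c("Male", "Female", "Male", "Female", "Female", "Male"),
  stringsAsFactors = FALSE
)

# One numeric variable: min, quartiles, median, mean and max.
age_summary <- summary(demo$Age)
print(age_summary)

# Whole data frame: one summary per column. Character columns
# report only their length, class and mode.
print(summary(demo))
```

Converting a text column with factor() before calling summary() makes it report counts per category instead.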
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating summary of one variable
ageSummary = summary(myData$Age)
print(ageSummary)
Example:
# R program to illustrate
# Descriptive Analysis
# Calculating summary of the entire data frame
dataSummary = summary(myData)
print(dataSummary)