DA Practical Lab 02 Statistical Functions

Uploaded by

himanshut.aids22

S. B. JAIN INSTITUTE OF TECHNOLOGY, MANAGEMENT & RESEARCH, NAGPUR.

Practical No. 2

To perform different statistical functions used in Data Analytics.

Name of Student:
Roll No.:
Semester/Year:
Academic Session:
Date of Performance:
Date of Submission:
• AIM: To perform different statistical functions used in Data Analytics.
• Task:
Apply different statistical functions to the given datasets used in Data Analytics.
  • Create a dataset Employee with the attributes
    ('name', 'age', 'salary', 'BMI', 'years of experience', 'weight', 'height').
    Compute the mean, mode, median, and standard deviation, taking axis=0
    and axis=1.
  • Load any one of the built-in datasets 'Heart Disease', 'Cancer',
    'Diabetes', or 'Iris'.
    Compute the mean, mode, median, standard deviation, and variance,
    taking axis=0 and axis=1.
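
One possible way to approach the first task is sketched below with pandas. The handout does not prescribe a library, so the choice of pandas is an assumption, and the employee records, the variable names, and the column name years_of_experience are all made up for illustration.

```python
import pandas as pd

# Hypothetical employee records; every value below is invented for illustration.
employee = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John", "Sara"],
    "age": [25, 32, 41, 29, 32],
    "salary": [30000, 45000, 60000, 38000, 45000],
    "BMI": [21.5, 24.0, 27.3, 22.8, 24.0],
    "years_of_experience": [2, 8, 15, 5, 8],
    "weight": [55, 70, 82, 64, 70],
    "height": [160, 171, 173, 168, 171],
})

# The statistics below only make sense for numeric columns, so drop 'name'.
numeric = employee.drop(columns="name")

# axis=0: one result per column (computed down the rows).
col_mean = numeric.mean(axis=0)
col_median = numeric.median(axis=0)
col_mode = numeric.mode(axis=0).iloc[0]  # first mode of each column
col_std = numeric.std(axis=0)            # pandas default: sample std (ddof=1)

# axis=1: one result per row (computed across the columns).
row_mean = numeric.mean(axis=1)
row_median = numeric.median(axis=1)
row_std = numeric.std(axis=1)

print(col_mean)
print(row_mean)
```

Row-wise results (axis=1) combine values with different units, such as age and salary, so they mainly demonstrate how the axis argument works rather than giving a meaningful statistic. The same calls can be applied to the numeric columns of a built-in dataset such as Iris once it has been loaded into a DataFrame.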

OBJECTIVES:

• To apply statistical analysis and technologies on data to find trends and solve
problems.

• To cover variance in Python and how to calculate the variability of a set of
values.

THEORY:

• Statistics, in general, is the method of collecting, tabulating, and
interpreting numerical data. It is an area of applied mathematics
concerned with data collection, analysis, interpretation, and presentation. With
statistics, we can see how data can be used to solve complex problems.
Descriptive Statistics: -

Descriptive statistics generally means describing the data with the help of some
representative methods like charts, tables, Excel files, etc. The data is described in such a way
that it can express some meaningful information that can also be used to find some future
trends. Describing and summarizing a single variable is called univariate analysis. Describing
a statistical relationship between two variables is called bivariate analysis. Describing the
statistical relationship between multiple variables is called multivariate analysis. There are
two types of Descriptive Statistics:

• 1) Measures of central tendency

• 2) Measures of variability

Descriptive statistics summarize the data and are broken down into measures of
central tendency (mean, median, and mode) and measures of variability
(variance, standard deviation, and range).
Measures of central tendency:
1) mean()
It is the sum of the observations divided by the total number of observations. It is also
known as the average.

>>> import numpy as np
>>> nums = [1, 2, 3, 5, 7, 9]
>>> np.mean(nums)  # 27 / 6 = 4.5

2) mode()

It is the value that has the highest frequency in the given data set. The data set has
no single mode if every data point occurs with the same frequency, and it has more than one
mode if two or more values share the highest frequency.

>>> from scipy import stats
>>> nums = [1, 2, 3, 5, 7, 9, 7, 2, 7, 6]
>>> stats.mode(nums)  # the mode is 7, which occurs 3 times
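
The case of several modes can be illustrated with a small sketch. It uses multimode() from Python's standard statistics module (available from Python 3.8), which is a different function from the scipy call above; the sample lists are made up for illustration.

```python
from statistics import multimode

# 2 and 7 both occur twice, so the data set has two modes.
two_modes = multimode([1, 2, 2, 3, 7, 7])

# Every value occurs once: multimode() returns all of them,
# which corresponds to the "no single mode" case described above.
all_tied = multimode([1, 2, 3])

print(two_modes)  # [2, 7]
print(all_tied)   # [1, 2, 3]
```

Unlike multimode(), scipy's stats.mode() reports only one value when several are tied, so a check like this helps when ties matter.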

3) median()

It is the middle value of the sorted data set; it splits the data into two halves. If the number of
elements in the data set is odd, the middle element is the median; if it is even, the
median is the average of the two middle elements.
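
The odd and even cases can be checked with a short sketch using median() from Python's standard statistics module. The two sample lists are made up for illustration.

```python
import statistics as st

odd = [7, 1, 4]        # sorted: 1, 4, 7      -> the middle element is 4
even = [7, 1, 4, 10]   # sorted: 1, 4, 7, 10  -> (4 + 7) / 2

median_odd = st.median(odd)
median_even = st.median(even)

print(median_odd)   # 4
print(median_even)  # 5.5
```

The input does not need to be sorted beforehand, because median() works on a sorted copy of the data.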
a) median_low()

When the data has an even length, this returns the lower of the two middle values. Otherwise,
it returns the middle value.

>>> import statistics as st
>>> st.median_low([1, 2, 4])  # 2
b) median_high()
Like median_low(), but this returns the higher of the two middle values when the data has an even length.
Otherwise, it returns the middle value.

>>> st.median_high([1, 2, 4])  # 2

>>> st.median_high([1, 2, 3, 4])  # 3
Measures of variability:
A measure of variability describes the spread of the data, that is, how widely the values are distributed.
The most common variability measures are:
• Range

• Variance

• Standard deviation
Range
The range is the difference between the largest and smallest data points in the
data set. The bigger the range, the greater the spread of the data, and vice versa.
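
As a minimal sketch, the range can be computed with Python's built-in max() and min(). The two sample lists are made up for illustration; they share the same mean but differ in spread.

```python
narrow = [4, 5, 5, 6]   # values close together
wide = [1, 2, 8, 9]     # same mean (5), values far apart

narrow_range = max(narrow) - min(narrow)
wide_range = max(wide) - min(wide)

print(narrow_range)  # 2
print(wide_range)    # 8
```

Because the range depends only on the two extreme values, a single outlier can change it sharply.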
Variance
It is defined as the average squared deviation from the mean. To calculate it, find
the difference between each data point and the mean, square each difference, add the
squares, and divide by the number of data points in the data set. This gives the
population variance, a measure of how spread out the data set is. Note that the statistics.variance()
function calculates the sample variance, which uses the denominator n - 1 instead of n to
adjust for bias in the estimate of the population variance; statistics.pvariance() divides by n.
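
The steps above can be sketched by hand and compared with the standard statistics module. The list nums reuses the values from the mean example; the variable names are chosen only for illustration.

```python
import statistics as st

nums = [1, 2, 3, 5, 7, 9]
n = len(nums)
mean = sum(nums) / n                             # 4.5

# Squared deviation of every data point from the mean.
squared_devs = [(x - mean) ** 2 for x in nums]   # these sum to 47.5

population_var = sum(squared_devs) / n           # divide by n
sample_var = sum(squared_devs) / (n - 1)         # divide by n - 1

print(population_var, st.pvariance(nums))
print(sample_var, st.variance(nums))
```

The sample variance is always the larger of the two, and the gap shrinks as the number of data points grows.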

Standard Deviation:
It is defined as the square root of the variance. To calculate it, find the mean,
subtract the mean from each value and square the result, add all the squared values,
divide by the number of terms, and then take the square root.
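
A short sketch, assuming the same sample list as in the mean example, shows that the standard deviation is the square root of the variance. It uses only Python's standard math and statistics modules.

```python
import math
import statistics as st

nums = [1, 2, 3, 5, 7, 9]

# Square root of the population variance (divide by n).
population_sd = math.sqrt(st.pvariance(nums))

# Square root of the sample variance (divide by n - 1).
sample_sd = math.sqrt(st.variance(nums))

print(population_sd, st.pstdev(nums))
print(sample_sd, st.stdev(nums))
```

Library defaults differ on this point: numpy's std() divides by n unless ddof=1 is passed, while pandas' std() divides by n - 1 by default, so results from the two can disagree on the same data.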
CONCLUSION:

DISCUSSION AND VIVA VOCE:


• What are the measures used for central tendency?

• What is standard deviation?

• What are the measures of variability?
