Module 1

What Is Statistics?
Statistics is a branch of applied mathematics that involves the collection, description, and analysis
of quantitative data, and the drawing of conclusions from it. The mathematical theories behind
statistics rely heavily on differential and integral calculus, linear algebra, and probability theory.
The word is also used informally for a collection of quantitative data itself.
People who do statistics are referred to as statisticians. They're particularly concerned with
determining how to draw reliable conclusions about large groups and general events from the
behavior and other observable characteristics of small samples. These small samples represent a
portion of the large group or a limited number of instances of a general phenomenon.
Types of Statistics:
● Theoretical Statistics: Theoretical statistics concerns the study and development of the
mathematical, computational, and philosophical foundations of statistics. Pure or theoretical
statistics focuses primarily on the numbers, math, and problems themselves.
● Applied Statistics: Applied statistics is the use of statistical techniques to solve real-world data
analysis problems. In contrast to the pure study of mathematical statistics, applied statistics is
typically used by and for non-mathematicians in fields ranging from social science to business.
Thus, applied statistics can be thought of as “statistics-in-action”. Statistics alone can be used
pragmatically. However, in general, the emphasis of applied statistics tends to be more oriented
toward practical benefits.
What is Descriptive Statistics?

Descriptive statistics describes the characteristics of a data set. It is a simple technique to
describe, show, and summarize data in a meaningful way. Descriptive statistics is important for
presenting raw data in an effective, meaningful way using numerical calculations, graphs, or tables. With
descriptive statistics, one can describe both an entire population and an individual sample. This type of
statistics is applied to already known data.
Typically, there are two general types of statistic that are used to describe data:
● Measures of central tendency: these are ways of describing the central position of a
frequency distribution for a group of data. We can describe this central position using a
number of statistics, including the mode, median, and mean.
● Measures of spread: these are ways of summarizing a group of data by describing how
spread out the scores are. To describe this spread, a number of statistics are available to us,
including the range, quartiles, absolute deviation, variance, and standard deviation.
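The measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not a definitive implementation: the `scores` values and the `describe` helper are hypothetical and invented for illustration, and the data are treated as a whole population (so the population variance and standard deviation are used).

```python
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def describe(data):
    """Return common measures of central tendency and spread for a list of numbers."""
    mean = statistics.mean(data)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return {
        # Measures of central tendency
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Measures of spread
        "range": max(data) - min(data),
        "quartiles": (q1, q2, q3),
        "mean_absolute_deviation": sum(abs(x - mean) for x in data) / len(data),
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

If the data were a sample rather than a full population, `statistics.variance` and `statistics.stdev` would be the usual choices instead.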
What is Inferential Statistics?

Inferential statistics involves drawing conclusions about populations by examining samples. It
allows us to make inferences about the entire set, including specific examples within it, based on
information obtained from a subset of examples. These inferences rely on the principles of
evidence and utilize sample statistics as a basis for drawing broader conclusions. The accuracy of
inferential statistics depends largely on the accuracy of sample data and how it represents the
larger population.
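As one illustration of drawing a conclusion about a population from a sample, the sketch below estimates a population mean with an approximate 95% confidence interval. The sample values and the `mean_confidence_interval` helper are hypothetical. The sketch assumes a simple random sample and uses the normal critical value 1.96; for a sample this small, a t critical value would usually be more appropriate.

```python
import math
import statistics

# Hypothetical sample of 10 measurements; the values are invented for illustration.
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]

def mean_confidence_interval(data, z=1.96):
    """Approximate confidence interval for the population mean.

    Assumes a simple random sample and a normal critical value z
    (1.96 corresponds to roughly 95% confidence).
    """
    n = len(data)
    x_bar = statistics.mean(data)   # sample mean
    s = statistics.stdev(data)      # sample standard deviation
    margin = z * s / math.sqrt(n)   # critical value times the standard error
    return x_bar - margin, x_bar + margin

low, high = mean_confidence_interval(sample)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"approximate 95% interval for the population mean: ({low:.2f}, {high:.2f})")
```

The width of the interval reflects the uncertainty that comes from measuring only part of the population: a larger or less variable sample gives a narrower interval.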
Difference Between Descriptive and Inferential Statistics

Descriptive statistics provide a summary of the features or attributes of a dataset (a population or
sample), while inferential statistics enable hypothesis testing and evaluation of the applicability
of the data to a larger population, based on a sample of that data. The two serve a similar
function: both are used to analyze and comprehend data, and both employ statistical techniques
and instruments to make judgements about the group being studied. Here are the key differences
between descriptive and inferential statistics:
Descriptive Statistics | Inferential Statistics
Gives information about raw data; describes and summarizes data | Makes inferences and draws conclusions about a population based on sample data
Analyzes and interprets the characteristics of a dataset | Uses sample data to make generalizations or predictions about a larger population
Helps in organizing, analyzing, and presenting data in a meaningful manner | Allows us to compare data and make hypotheses and predictions
Provides measures of central tendency and dispersion | Estimates parameters, tests hypotheses, and determines the level of confidence or significance in the results
Mean, median, mode, standard deviation, range, frequency tables | Hypothesis testing, confidence intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.
Summarize, organize, and present data | Generalize findings to a larger population, make predictions, test hypotheses, evaluate relationships, and support decision-making
Can be achieved with the help of charts, graphs, tables, etc. | Estimated using sample statistics (e.g., sample mean as an estimate of population mean); can be achieved with the help of probability
What are the strengths of using descriptive statistics to examine a distribution of scores?

Besides summarizing large volumes of data clearly, descriptive statistics carry no uncertainty
about the values obtained (apart from measurement error and the like), because they describe
only the data that were actually collected.
What are the limitations of descriptive statistics?

Descriptive statistics are limited in that they only allow one to make summaries about
the people or objects that have actually been measured. One cannot use the data collected to
generalize to other people or objects (i.e., use data from a sample to infer the
properties/parameters of a population). For example, if a drug to treat cancer is tested and
works in the patients tested, one cannot claim, relying only on descriptive statistics, that it
would work in other cancer patients (inferential statistics would give us this opportunity).
What are the limitations of inferential statistics?

There are two main limitations to the use of inferential statistics. The first, and most important
limitation, which is present in all inferential statistics, is that one is providing data about a
population that one has not fully measured, and therefore, cannot ever be completely sure that
the values/statistics he/she calculated are correct. Remember, inferential statistics are based on
the concept of using the values measured in a sample to estimate/infer the values that would be
measured in a population; there will always be a degree of uncertainty in doing this. The second
limitation is connected with the first limitation. Some, but not all, inferential tests require the
user (i.e., the researcher or statistician) to make educated guesses (based on theory) to run the
inferential tests. Again, there will be some uncertainty in this process, which will have
repercussions on the certainty of the results of some inferential statistics.
Sample vs Population
A population is the entire group one wants to draw conclusions about. In statistics, it is the
entire set of items from which data is drawn in a statistical study. It can be a group of
individuals or a set of items. Because measuring a whole population is often impractical, one
usually takes a small sample and uses it to make predictions about the population. The population
mean is usually denoted by the Greek letter ‘μ’.
A sample is a subset of the population that is used to represent the whole. Ideally, it is an
unbiased subset of the population, of manageable size, made up of the elements that actually
participate in the survey or study. Samples are collected and statistics are calculated from them
so that one can make inferences or extrapolations about the population. This process of
collecting information from a sample is called sampling. The sample size is denoted by n.
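The relationship between a population and a sample can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the population is simulated rather than real, and the sample is drawn as a simple random sample without replacement using `random.sample`.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 1,000 simulated heights in cm (not real data).
population = [random.gauss(170, 10) for _ in range(1000)]

n = 50  # sample size
sample = random.sample(population, n)  # simple random sample without replacement

mu = statistics.mean(population)  # population mean
x_bar = statistics.mean(sample)   # sample mean, used as an estimate of mu

print(f"population size = {len(population)}, sample size n = {len(sample)}")
print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; that gap is the uncertainty that sampling introduces.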
Population parameter vs. sample statistic
When you collect data from a population or a sample, there are various measurements and
numbers you can calculate from the data. Parameters are numbers that describe the properties of
entire populations. Statistics are numbers that describe the properties of samples.
● Parameter = Population
● Statistic = Sample
For example, the average income for India is a population parameter. Conversely, the
average income for a sample drawn from India is a sample statistic. Both values represent the
mean income, but one is a parameter and the other is a statistic.
Statistic vs Parameter Symbols

While parameters and statistics have the same types of summary values, statisticians denote them
differently. Typically, we use Greek and upper-case Latin letters to signify parameters and
lower-case Latin letters to represent statistics.
Summary Value | Parameter | Statistic
Mean | μ or Mu | x̄ or x-bar
Standard deviation | σ or Sigma | s
Correlation | ρ or rho | r
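The distinction in the symbols carries over to how the values are computed. The sketch below uses Python's standard `statistics` module on hypothetical data: `pstdev` treats the data as a whole population (dividing by N), while `stdev` treats it as a sample (dividing by n - 1). The variable names `mu`, `sigma`, `x_bar`, and `s` are chosen here only to mirror the symbols in the table.

```python
import statistics

# Hypothetical data, invented for illustration.
population = [10, 12, 14, 16, 18]  # treated as the whole population
sample = [10, 14, 18]              # treated as a sample drawn from it

mu = statistics.mean(population)       # parameter: population mean
sigma = statistics.pstdev(population)  # parameter: population standard deviation (divides by N)

x_bar = statistics.mean(sample)        # statistic: sample mean
s = statistics.stdev(sample)           # statistic: sample standard deviation (divides by n - 1)

print(f"mu = {mu}, sigma = {sigma:.3f}")
print(f"x-bar = {x_bar}, s = {s:.3f}")
```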
Differences Between Statistic and Parameter

The difference between statistic and parameter can be drawn clearly on the following grounds:
1. A statistic is a characteristic of a small part of the population, i.e. sample. The parameter
is a fixed measure which describes the target population.
2. The statistic is a variable and known number that depends on the sample of the
population, while the parameter is a fixed and unknown numerical value.
3. Statistical notations are different for population parameters and sample statistics.
Examples are given below:
o For population parameters, µ (Greek letter mu) represents the mean, P denotes the
population proportion, the standard deviation is labeled σ (Greek letter sigma), etc.
o For sample statistics, x̄ (x-bar) represents the mean, the standard deviation is labeled s,
etc.
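The second point, that a statistic varies from sample to sample while the parameter stays fixed, can be seen in a small simulation. This is only a sketch under stated assumptions: the population is simulated, so its mean is known here, whereas in a real study the parameter would normally be unknown.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population of 500 simulated incomes (arbitrary units, not real data).
population = [random.uniform(20, 100) for _ in range(500)]
mu = statistics.mean(population)  # the parameter: a single fixed value

# Draw several samples of size 30; each yields its own value of the statistic x-bar.
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(5)
]

print(f"parameter mu = {mu:.2f} (the same for every sample)")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i}: x-bar = {x_bar:.2f}")
```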
Difference between Population and Sample

Some of the key differences between population and sample are clearly given below:
Comparison | Population | Sample
Meaning | Collection of all the units or elements that possess common characteristics | A subgroup of the members of the population
Includes | Each and every element of a group | Only includes a handful of units of population
Characteristics | Parameter | Statistic
Data Collection | Complete enumeration or census | Sampling or sample survey
Focus on | Identification of the characteristics | Making inferences about the population
Primary vs Secondary Data

In statistical analysis, collection of data plays a significant part. The method of collecting
information is divided into two different categories, namely primary data and secondary data.
Primary data is data or information assembled for the first time, whereas secondary data is
data that has already been gathered or collected by others. The most important characteristic
of primary data is that it is original and first-hand, whereas secondary data is the
interpretation and analysis of primary data.
Primary Data: The mode of assembling the information is costly, as the analysis is done by an
agency or an external organization or by the researcher himself/herself and needs human
resources and investment. The investigator supervises and controls the data collection process
directly. The data is mostly collected through observations, physical testing, mailed
questionnaires, surveys, personal interviews, telephonic interviews, case studies, and focus
groups, etc.
Secondary Data: Secondary data is second-hand data that is already collected and recorded by
some researchers for their purpose, and not for the current research problem. It is accessible in
the form of data collected from different sources such as government publications, censuses,
internal records of the organization, books, journal articles, websites and reports, etc. This
method of gathering data is affordable, readily available, and saves cost and time. However,
one disadvantage is that the information was assembled for some other purpose and may not meet
the present research purpose or may not be accurate. Secondary data often comes from existing
data generated by large government institutions, healthcare facilities, etc. as part of
organizational record keeping. The data is then extracted from more varied data files.
Pros and Cons for each.

BASIS FOR COMPARISON | PRIMARY DATA | SECONDARY DATA
Meaning | Primary data refers to the first hand data gathered by the researcher himself. | Secondary data means data collected by someone else earlier.
Data | Real time data | Past data
Process | Very involved | Quick and easy
Source | Surveys, observations, experiments, questionnaire, personal interview, etc. | Government publications, websites, books, journal articles, internal records etc.
Cost effectiveness | Expensive | Economical
Collection time | Long | Short
Specific | Always specific to the researcher's needs. | May or may not be specific to the researcher's need.
Available in | Crude form | Refined form
Accuracy and Reliability | More | Relatively less
