Unit 3
Introduction
Descriptive analysis converts raw data into a form that is easy to understand and interpret, i.e., it rearranges, orders, and summarizes the data to provide insightful information about it.
Descriptive analysis is the type of data analysis that helps describe, show, or summarize data points in a constructive way so that meaningful patterns can emerge from the data.
It is one of the most important steps in statistical data analysis. It gives you an overview of the distribution of your data, helps you detect typos and outliers, and enables you to identify similarities among variables, preparing you for further statistical analysis.
1. Descriptive techniques often include constructing tables of counts and means, measures of dispersion such as the variance or standard deviation, and cross-tabulations or "crosstabs" that can be used to examine many disparate hypotheses. These hypotheses often highlight differences among subgroups.
2. Measures like segregation, discrimination, and inequality are studied using specialised descriptive techniques. Discrimination, for example, is measured with the help of audit studies or decomposition methods. Greater segregation between groups, or greater inequality of outcomes, need not be wholly good or bad in itself, but it is often considered a marker of unjust social processes; accurate measurement of these patterns across space and time is a prerequisite to understanding them.
But this also enters the province of measuring impacts, which requires different techniques. Differences in means often arise from random variation, and statistical inference is required to determine whether observed differences could have occurred merely by chance.
1. Measures of Frequency
In descriptive analysis, it’s essential to know how frequently a certain event or response occurs. Producing such counts or percentages is the prime purpose of measures of frequency.
For example, consider a survey where 500 participants are asked about their favourite IPL
team. A list of 500 responses would be difficult to consume and accommodate, but the data
can be made much more accessible by measuring how many times a certain IPL team was
selected.
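As a minimal sketch of such a frequency count, assuming a hypothetical list of survey responses (the team names and values below are illustrative only):

```python
# Count how often each team was chosen and report counts and percentages.
from collections import Counter

responses = ["CSK", "MI", "RCB", "CSK", "KKR", "MI", "CSK"]  # a small illustrative sample

frequency = Counter(responses)      # absolute count per team
total = len(responses)

for team, count in frequency.most_common():
    print(f"{team}: {count} responses ({count / total:.0%})")
```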
2. Measures of Central Tendency
In descriptive analysis, it’s also important to find the central (or average) tendency of the responses. Central tendency is measured with three averages — mean, median, and mode. As an example, consider a survey in which the weight of 1,000 people is measured. In this case, the mean would be an excellent descriptive metric for summarizing the mid-value of the data.
3. Measures of Dispersion
Descriptive analysis also measures how spread out the responses are, using statistics such as the range, variance, and standard deviation (covered in detail later in this unit).
4. Measures of Position
Descriptive analysis also involves identifying the position of a single value or response in relation to the others. Measures such as percentiles and quartiles are very useful here.
Apart from this, if you’ve collected data on multiple variables, you can use bivariate or multivariate descriptive statistics to study whether there are relationships between them.
In bivariate analysis, you simultaneously study the frequency and variability of two different variables to see whether they vary together in a pattern. You can also compare the central tendency of the two variables before carrying out further statistical analysis.
Multivariate analysis is the same as bivariate analysis but is carried out for more than two variables. The following two methods are used for bivariate analysis.
1. Contingency table
In a contingency table, each cell represents a combination of the two variables. Typically, the independent variable (e.g., gender) is listed along the vertical axis and the dependent variable (e.g., number of activities) along the horizontal axis. You read across the table to see how the independent and dependent variables relate to each other.
Group    0–4    5–8    9–12    13–16    17+
Men       33     68      37       23     22
Women     36     48      44       83     25
A table showing a tally of the two genders by number of activities
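The tally above can be reproduced as a small pandas DataFrame; with raw per-respondent records, pandas.crosstab would build the same table directly. The counts here are taken from the table above.

```python
# Rebuild the gender-by-activities contingency table as a DataFrame.
import pandas as pd

table = pd.DataFrame(
    {"0–4": [33, 36], "5–8": [68, 48], "9–12": [37, 44],
     "13–16": [23, 83], "17+": [22, 25]},
    index=["Men", "Women"],
)
print(table)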
2. Scatter plots
A scatter plot is a chart that lets you see the relationship between two (or sometimes three) variables. It is a visual rendition of the strength of a relationship.
In a scatter plot, one variable is plotted along the x-axis and the other along the y-axis; each observation is denoted by a point in the chart.
The scatter plot shows the hours of sleep needed per day by age.
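A chart like the one described above can be sketched with matplotlib; the age and sleep-hour values below are illustrative placeholders, not survey data.

```python
# Plot illustrative sleep requirements against age as a scatter plot.
import matplotlib.pyplot as plt

ages = [1, 5, 10, 16, 25, 40, 65, 80]
sleep_hours = [14, 11, 10, 9, 8, 7.5, 7, 7.5]

plt.scatter(ages, sleep_hours)
plt.xlabel("Age (years)")
plt.ylabel("Hours of sleep per day")
plt.title("Hours of sleep needed per day by age")
plt.show()
```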
Advantages of Descriptive Analysis
● A high degree of objectivity and neutrality on the part of the researchers is one of the main advantages of descriptive analysis. Researchers need to be extra vigilant because descriptive analysis shows the characteristics of the extracted data, and if the data does not match the observed trends, a large amount of data may have to be discarded.
● Descriptive analysis is considered broader than other quantitative methods and provides a wider picture of an event or phenomenon. It can use any number of variables, or even a single variable, to conduct descriptive research.
● This type of analysis is considered a good method for collecting information that describes relationships as they naturally occur and presents the world as it exists. This makes the analysis very realistic, since the trends are derived from the real-life behaviour of the data.
● It is useful for identifying variables and new hypotheses that can be further analyzed through experimental and inferential studies. The margin for error is small because the trends are taken directly from the properties of the data.
● This type of study gives the researcher the flexibility to use both quantitative and qualitative data in order to discover the properties of the population.
For example, researchers can use both case studies (a qualitative analysis) and correlation analysis to describe a phenomenon in their own ways. Using case studies to describe people, events, and institutions enables the researcher to understand the behaviour and patterns of the group concerned to the maximum extent.
● In the case of surveys, which are one of the main types of descriptive analysis, the researcher tends to gather data points from a relatively large number of samples, unlike experimental studies, which generally need smaller samples.
This is a clear advantage of the survey method over other descriptive methods: it enables researchers to study larger groups of individuals with ease. If surveys are properly administered, they give a broader and cleaner description of the unit under research.
Data Visualization
What is Data Visualization?
Data visualization is the graphical representation of datasets and information. Data
visualization is an umbrella term for visualizing all types of data through charts, graphs, and
maps.
The ultimate goal is to visually represent your data in an accessible and easy-to-understand
manner. Visualizing data is a fundamental step in understanding trends, uncovering patterns,
and tracking outliers.
There is no question that datasets are growing at a tremendous rate. Combine this with increasingly advanced IT systems, and you have a colossal pain point for businesses.
The fact that data is becoming so overwhelming for organizations has spurred the rise
of AIOps. AIOps helps businesses with various use cases such as:
● Predictive alerting
● Prioritizing events
● Predictive outages
2. Greater Accessibility
We mentioned how large datasets are now accessible for a greater number of users. This
demonstrates that data visualization is a key factor in data democratization.
Data visualization tools help simplify complex data points and present them in highly digestible ways. This growing accessibility can help upskill employees and make businesses more efficient.
Traditional means of sifting through data were meticulous and time-consuming. Data
visualization helps businesses discover insights at a much faster rate than prior to the advent
of visualization tools.
Speed is central here. This growing scalability means that business leaders have more room
to be granular in their analysis. If data is mapped out more quickly, IT teams and data
scientists have more time to draw more complex insights from their well-organized
databases.
Prior to data democratization, gaps in communication were all too common for enterprises
and businesses alike. Boiling down and explaining advanced insights can be difficult without
a common understanding of what the datasets behind these insights mean. With modern data
visualization software, such as Tableau and Microsoft Power BI, data analysis is broadened
to virtually any department within your organization.
Here are examples of various forms of data visualization and their use cases:
Bar graphs: These types of graphs are best utilized to compare aspects of different groups or
to track those aspects over time. Bar graphs are best used when changes are rather large.
Line graphs: One of the most popular and fundamental forms of data visualization are line
graphs, which are used to track changes over short and long periods of time. Line graphs are
particularly useful to highlight smaller changes.
Graphs are, for the most part, rather modular. For example, a line graph might track bounce rates over time, in contrast to a bar graph representing page load times.
Pie Charts: Another fundamental form of data visualization, pie charts are effective for
comparing parts of a whole. Because they are not placed on an X-Y plot, tracking data over
time is not possible with a pie chart.
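A minimal matplotlib sketch of the three chart types discussed above, using made-up values:

```python
# Draw a bar graph, a line graph, and a pie chart side by side.
import matplotlib.pyplot as plt

fig, (bar_ax, line_ax, pie_ax) = plt.subplots(1, 3, figsize=(12, 4))

# Bar graph: comparing a quantity across groups
bar_ax.bar(["Group A", "Group B", "Group C"], [120, 95, 140])
bar_ax.set_title("Bar graph")

# Line graph: tracking a value over time
line_ax.plot([2019, 2020, 2021, 2022], [42, 47, 45, 51], marker="o")
line_ax.set_title("Line graph")

# Pie chart: parts of a whole
pie_ax.pie([45, 30, 25], labels=["Yes", "No", "Undecided"], autopct="%1.0f%%")
pie_ax.set_title("Pie chart")

plt.tight_layout()
plt.show()
```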
These charts are very basic examples of data visualization. Many modern tools are designed
to open up complex methods to the everyday user. For instance, here’s one way a
management trainee in the finance industry used data visualization software for their needs:
“Power BI is a widely used software in our organization where we deal with a huge amount
of raw data and process it to gather actionable insights. It helps us to visualize scattered and
unfiltered information efficiently and easy to understand manner…Overall, I would say that
this is a must-have software for any enterprise that directly witnesses a lot of data being
gathered to formulate strategies and plan of actions” – Management Trainee in finance
industry, review of Microsoft Power BI at Gartner Peer Insights.
Scatter Plots: A slightly more advanced data visualization method is the scatter plot. Scatter plots are an effective way to explore the relationship between two variables across multiple sets of data. A typical example is a scatter plot mapping profitability in various American cities, where cities with greater profitability are drawn as larger circles.
Exploratory Data Analysis Using Data Visualization Techniques
Descriptive Statistics Defined
Descriptive statistics describe, show, and summarize the basic features of a dataset found in a
given study, presented in a summary that describes the data sample and its measurements. It
helps analysts to understand the data better.
Descriptive statistics represent the available data sample and do not include theories,
inferences, probabilities, or conclusions. That’s a job for inferential statistics.
Here’s a simple example. Assume a dataset of 2, 3, 4, 5, and 6, which sums to 20. The dataset’s mean is 4, arrived at by dividing the sum by the number of values (20 divided by 5 equals 4).
Analysts often use charts and graphs to present descriptive statistics. If you stood outside of a
movie theater, asked 50 members of the audience if they liked the film they saw, then put
your findings on a pie chart, that would be descriptive statistics. In this example, descriptive
statistics measure the number of yes and no answers and show how many people in this
specific theater liked or disliked the movie. If you tried to come up with any other
conclusions, you would be wandering into inferential statistics territory, but we'll later cover
that issue.
Finally, political polling is considered a descriptive statistic, provided it’s just presenting
concrete facts (the respondents’ answers), without drawing any conclusions. Polls are
relatively straightforward: “Who did you vote for President in the recent election?”
Descriptive statistics break down into several types, characteristics, or measures. Some
authors say that there are two types. Others say three or even four.
Datasets consist of a distribution of scores or values. Statisticians use graphs and tables to
summarize the frequency of every possible value of a variable, rendered in percentages or
numbers. For instance, if you held a poll to determine people’s favorite Beatle, you’d set up
one column with all possible variables (John, Paul, George, and Ringo), and another with the
number of votes.
Measures of central tendency estimate a dataset's average or center, finding the result using
three methods: mean, mode, and median.
Mean: The mean is also known as “M” and is the most common method for finding averages.
You get the mean by adding all the response values together and dividing the sum by the
number of responses, or “N.” For instance, say someone is trying to figure out how many
hours a day they sleep in a week. The data set would be the hour entries (e.g.,
6, 8, 7, 10, 8, 4, 9), and the sum of those values is 52. There are seven responses, so N = 7. You
divide the value sum of 52 by N, or 7, to find M, which in this instance is approximately 7.43.
Mode: The mode is just the most frequent response value. Datasets may have any number of
modes, including “zero.” You can find the mode by arranging your dataset's order from the
lowest to highest value and then looking for the most common response. So, in using our
sleep study from the last part: 4,6,7,8,8,9,10. As you can see, the mode is eight.
Median: Finally, we have the median, defined as the value in the precise center of the dataset.
Arrange the values in ascending order (like we did for the mode) and look for the number in
the set’s middle. In this case, the median is eight.
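These three values can be verified with Python’s statistics module:

```python
# Mean, mode, and median of the sleep data used above.
import statistics

hours = [6, 8, 7, 10, 8, 4, 9]

print(statistics.mean(hours))    # 52 / 7 ≈ 7.43
print(statistics.mode(hours))    # 8, the most frequent value
print(statistics.median(hours))  # 8, the middle value once sorted
```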
The measure of variability gives the statistician an idea of how spread out the responses are.
The spread has three aspects — range, standard deviation, and variance.
Range: Use range to determine how far apart the most extreme values are. Start by
subtracting the dataset’s lowest value from its highest value. Once again, we turn to our sleep
study: 4,6,7,8,8,9,10. We subtract four (the lowest) from ten (the highest) and get six. There’s
your range.
Standard Deviation: This aspect takes a little more work. The standard deviation (s) is your dataset’s average amount of variability, showing you how far each score lies from the mean. The larger your standard deviation, the greater your dataset’s variability. Follow these six steps:
1. List each score and find the mean.
2. Subtract the mean from each score to get the deviation.
3. Square each deviation.
4. Add up all the squared deviations.
5. Divide the sum of the squared deviations by N - 1.
6. Take the square root of that result.
Using the sleep data (mean ≈ 7.43):
Score   Deviation from mean   Squared deviation
4       4 - 7.43 = -3.43      11.76
6       6 - 7.43 = -1.43      2.04
7       7 - 7.43 = -0.43      0.18
8       8 - 7.43 = 0.57       0.33
8       8 - 7.43 = 0.57       0.33
9       9 - 7.43 = 1.57       2.47
10      10 - 7.43 = 2.57      6.61
When you divide the sum of the squared deviations by 6 (N - 1): 23.72/6, you get 3.95, and the square root of that result is about 1.99. As a result, we now know that each score deviates from the mean by an average of roughly 1.99 points.
Variance: Variance reflects the dataset’s degree of spread. The greater the degree of data spread, the larger the variance relative to the mean. You can get the variance by simply squaring the standard deviation. Using the above example, we square 1.99 and arrive at approximately 3.95.
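The range, standard deviation, and variance of the same sleep data can be verified in the same way (statistics.stdev and statistics.variance use the N - 1 divisor):

```python
# Range, sample standard deviation, and sample variance of the sleep data.
import statistics

hours = [6, 8, 7, 10, 8, 4, 9]

print(max(hours) - min(hours))     # range: 10 - 4 = 6
print(statistics.stdev(hours))     # ≈ 1.99
print(statistics.variance(hours))  # ≈ 3.95
```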
Univariate descriptive statistics are helpful when it comes to summarizing huge amounts of
numerical data as well as revealing patterns in the raw data. Patterns discovered in univariate
data may be described using central tendency (mean, mode, and median), as well as
dispersion: variance, range, quartiles, standard deviations, maximum, and minimum.
When dealing with univariate data, you have numerous options for describing it, including:
● Histograms
● Bar Charts
● Pie Charts
● Frequency Polygon
Bivariate statistics examine the connection between two variables; they may be descriptive or inferential. In other words, bivariate statistics investigate how one variable compares to another or how one variable impacts another.
Bivariate descriptive statistics include studying (comparing) two variables at the same time in
order to see whether there is a link between them. By convention, the columns represent the
independent variable and the rows represent the dependent variable.
Sampling Methods
One of the most common methods used for analyzing and measuring data on a large scale is
Sampling. There are various types of Sampling and Sampling methods used in statistical
analysis.
What is Sampling?
With the help of Sampling, an arbitrary section of a population is taken as a sample for
analysis. It helps analysts to make inferences about an entire population quicker than the
manual observation strategy.
So, for statistical analysis of a large population, it is common practice to take a sample. Sampling thus makes a study much more efficient and cost-effective, which underlines its importance in statistics.
There are different Sampling techniques, each applying a unique strategy to gain knowledge about a broad set of near-homogeneous elements.
Different Types of Sampling Methods
Sampling methods can be broadly categorized into two types – random or probability
Sampling methods and non-random or non-probability Sampling methods.
1) Random or probability Sampling methods can be further subdivided into 2 types, i.e., unrestricted (simple) random Sampling and restricted random Sampling.
2) Restricted random Sampling can be further classified as systematic Sampling, stratified Sampling, and cluster Sampling.
3) Meanwhile, non-random or non-probability Sampling consists of 3 types: judgment Sampling, quota Sampling, and convenience Sampling. You can get a clear understanding of the various methods of Sampling and their types from the outline below –
Restricted Random Sampling
● Systematic Sampling
● Stratified Sampling
● Cluster Sampling
Non-Random Sampling
● Judgment Sampling
● Quota Sampling
● Convenience Sampling
Random or Probability Sampling
Among the different types of Sampling in statistics, random or probability Sampling method
deserves mention. In the case of random or probability Sampling methods, every individual
element or observation has an equal chance to be selected as samples.
In this method, there should be no scope of bias or any pattern when drawing a
selected group of elements for observation.
As per the law of statistical regularity, a random or probable sample of an adequate
size which has been taken from a large population tends to have the same features and
characteristics as those of the entire population as a whole.
In a population of 1,000 people, each person has a one-in-a-thousand probability of being selected for a sample. Random probability Sampling restricts population bias and ensures that all individuals in the population have an equal opportunity of being included in the sample.
Random or probability Sampling can be broken down into 4 types: simple random Sampling, systematic Sampling, stratified Sampling, and cluster Sampling.
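As a rough sketch of how some of these probability Sampling schemes can be implemented, the following uses Python’s random module on a toy population of ID numbers; the sample size and the urban/rural strata split are purely illustrative.

```python
# Illustrative simple random, systematic, and stratified sampling.
import random

population = list(range(1, 1001))   # 1,000 individuals identified by number
random.seed(42)                     # for reproducibility

# Simple random sampling: every individual has an equal chance of selection.
simple_sample = random.sample(population, 50)

# Systematic sampling: pick a random start, then every k-th individual.
k = len(population) // 50
start = random.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: sample proportionally from each stratum.
strata = {"urban": population[:600], "rural": population[600:]}
stratified_sample = [
    person
    for members in strata.values()
    for person in random.sample(members, len(members) * 50 // len(population))
]
```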
Estimation Techniques
Estimation techniques are used to forecast the cost and effort involved in following a particular course of action, such as the implementation of a solution.
Estimation techniques help organizations make strategic business decisions by analyzing a number of factors.
Once all these factors are analyzed, the result of the estimation can be shown as a single number; but if the results are described as a range, with minimum and maximum values along with a probability, it may be easier for the stakeholders to understand.
This minimum-to-maximum range is called a confidence interval, and it expresses the level of uncertainty in the estimation results.
The greater the uncertainty, the wider the confidence interval will be.
1. Methods: there are different estimation methods that can be used for specific situations, but it is very important that the stakeholders involved in the estimation have a shared understanding of the elements that are to be estimated.
This shared understanding is usually achieved with the help of a decomposition tool such as a work breakdown structure, which helps break down a complex problem into simpler pieces.
When creating and communicating an estimate, the constraints and assumptions also need to be clearly communicated to avoid any confusion or misunderstanding.
• Top-down: in this technique, the solution is analyzed from the top level and then broken down into lower levels, which are summed up.
For example, when analyzing organizational budgets, the total cost of each department is first identified and then split across the individual units in each of those departments.
• Bottom-up: this technique examines the organization from the lower levels, builds up the estimated individual costs or effort, and then adds them across all elements to provide an overall estimate.
For example, using the same budgetary example from above, the budgets for the individual units in each department would first be identified, then summed up to their departments, and then summed up to the whole organization.
• Parametric estimation: this technique uses a calibrated parametric model of the element being estimated.
The estimators identify previous projects or solutions and analyze their costs, which provides an estimate of what the current solution or project might cost.
It is vital that the organization uses its own historical records to calibrate any parametric model, because those values reflect the abilities of its employees and the processes used to perform the work.
• Rough order of magnitude (ROM): this method uses a high-level estimate of the cost of the project or solution.
It is usually used when there is little information to work with and is heavily dependent on the estimation skills of the estimators.
• Rolling wave: this technique involves continuous estimation of the project throughout its lifecycle.
It is based on the idea that as the estimators’ knowledge grows, they will be able to give better-defined estimates for the next phase of the project.
• Delphi: this technique uses a mix of expert judgment and historical information.
It depends on historical records from the organization, which are used to adjust the estimates. The process involves creating initial estimates, sharing those estimates with the stakeholders, and continuously refining them until they are accepted by all the stakeholders.
• PERT: in this technique, each element of the estimate is given three values, which are:
(1) an optimistic value, representing the best case,
(2) a pessimistic value, representing the worst case, and
(3) a most likely value, which, as the name states, is the most likely value.
Then a PERT value for each estimated element is computed as a weighted average, using this formula:
PERT estimate = (optimistic + 4 × most likely + pessimistic) / 6
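A small helper illustrating the weighted average above; the three input figures in the example are made up.

```python
# PERT three-point estimate: (optimistic + 4 * most likely + pessimistic) / 6.
def pert_estimate(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Return the PERT weighted-average estimate for one element."""
    return (optimistic + 4 * most_likely + pessimistic) / 6

# Example: a task estimated at 20 hours best case, 30 hours most likely,
# 52 hours worst case (illustrative figures).
print(pert_estimate(20, 30, 52))  # 32.0
```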
2. Accuracy of the estimate: this indicates how close the estimate is expected to be to the actual result. It can be calculated as the ratio of the width of the confidence interval to its mean value, expressed as a percentage.
When there is little information, such as early in the development of a solution approach, a rough order of magnitude (ROM) estimate can be used; it is expected to have a wide range of likely values and a high level of uncertainty.
ROM estimates are usually no more accurate than -50% to +50%, but a definitive estimate, which is more accurate, can be made as soon as more information is available.
Definitive estimates that are used for forecasting timelines, final budgets, and resource needs should ideally be accurate to within 10% or less.
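A minimal sketch of this accuracy measure, assuming the “mean value” is taken as the midpoint of the interval; the 44-to-54-hour range reuses the team example given later in this unit.

```python
# Accuracy of an estimate: width of the confidence interval as a
# percentage of its midpoint.
def estimate_accuracy(lower: float, upper: float) -> float:
    midpoint = (lower + upper) / 2
    return (upper - lower) / midpoint * 100

# Example: an estimate ranging from 44 to 54 hours.
print(f"{estimate_accuracy(44, 54):.1f}% of the midpoint")  # ≈ 20.4%
```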
3. Sources of information: estimators can use historical information from previous experience with the element being estimated to calculate the estimate.
There are also some other sources of information that might be helpful, including the following:
• Analogous situations: this uses estimates from a similar initiative in the organization’s industry, for example a competitor, to estimate the element in question.
• Organization history: this involves using historical records from similar projects in the
organization, especially if the same people and resources would be used to perform the work.
• Expert judgment: this involves using the expertise of individuals who are knowledgeable about the element being estimated.
It relies on the knowledge of those who have done similar work in the past, and this includes both internal and external people.
When using external experts, estimators should consider the relevant skills and abilities of those doing the estimation.
4. Precision and reliability of estimates: when numerous estimates are made for a specific attribute, the resulting estimate can be taken as an average of those estimates.
By analyzing measures of variability such as the variance, estimators can agree on a final estimate.
To show the degree of accuracy and precision, an estimate is often shown as a range of values with a confidence interval, i.e., its probability level.
For example, if a team estimated that some task would take 50 hours, a 90% confidence interval might be 44 to 54 hours, depending on the individual estimates given.
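As a rough sketch of how such a range might be derived, the following applies a normal approximation to hypothetical individual estimates; with so few values, a t-based interval would be somewhat wider.

```python
# Combine individual team estimates into a mean and an approximate 90% interval.
import statistics

team_estimates = [46, 48, 50, 52, 54]          # hours, one per team member (illustrative)
mean = statistics.mean(team_estimates)
margin = 1.645 * statistics.stdev(team_estimates) / len(team_estimates) ** 0.5  # z for 90%

print(f"Estimate: {mean:.0f} hours, approx. 90% interval: "
      f"{mean - margin:.0f} to {mean + margin:.0f} hours")
```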
Team estimates are usually more accurate than those of a single individual, especially if the team members are the people who would do the actual work.
Field experts can also provide estimates, especially for sensitive projects such as those used to fulfil industry regulations.
Estimation techniques have both strengths and limitations, which include the following:
Strengths
● Estimates provide a justification for an allocated budget, time frame, or the magnitude of a set of elements.
● If projects are planned without the use of estimates, it could lead to inadequate budgets and unrealistic time frames.
● If there is limited information or knowledge about a project, a rough estimate can initially be created. As more information becomes available, this estimate can be refined over time to improve its accuracy and help ensure the project’s success.
Limitations
• Estimates are only as accurate as the knowledge level of the estimators. If the estimators are
novices, their estimates can be way off the mark.
• Using just one estimation method may lead to unrealistic expectations of the project’s feasibility.
Probability Distributions
A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range is bounded by the minimum and maximum possible values, but precisely where each possible value is likely to fall on the probability distribution depends on a number of factors. These factors include the distribution's mean (average), standard deviation, skewness, and kurtosis.
Perhaps the most common probability distribution is the normal distribution, or "bell curve," although several other distributions are commonly used. Typically, the data-generating process of some phenomenon will dictate its probability distribution; for a continuous variable, the function that describes this distribution is called the probability density function.
Probability distributions can also be used to create cumulative distribution functions (CDFs), which add up the probabilities of occurrence cumulatively and always start at zero and end at 100%.
Academics, financial analysts and fund managers alike may determine a particular stock's
probability distribution to evaluate the possible expected returns that the stock may yield in
the future. The stock's history of returns, which can be measured from any time interval, will
likely be composed of only a fraction of the stock's returns, which will subject the analysis
to sampling error. By increasing the sample size, this error can be dramatically reduced.
The most commonly used distribution is the normal distribution, which is used frequently in
finance, investing, science, and engineering. The normal distribution is fully characterized by
its mean and standard deviation, meaning the distribution is not skewed and does exhibit
kurtosis. This makes the distribution symmetric and it is depicted as a bell-shaped curve
when plotted. The standard normal distribution has a mean (average) of zero and a standard deviation of 1.0, with a skew of zero and kurtosis = 3. In a normal distribution, approximately 68% of the data will fall within +/- one standard deviation of the mean, approximately 95% within +/- two standard deviations, and 99.7% within +/- three standard deviations. Unlike the binomial distribution, the normal distribution is continuous, meaning that all possible values are represented (as opposed to just 0 and 1 with nothing in between).
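The 68/95/99.7 figures above can be checked with Python’s statistics.NormalDist:

```python
# Probability mass within 1, 2, and 3 standard deviations of the mean.
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"Within +/- {k} standard deviation(s): {coverage:.1%}")
# Within +/- 1 standard deviation(s): 68.3%
# Within +/- 2 standard deviation(s): 95.4%
# Within +/- 3 standard deviation(s): 99.7%
```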
Stock returns are often assumed to be normally distributed but in reality, they exhibit kurtosis
with large negative and positive returns seeming to occur more than would be predicted by a
normal distribution. In fact, because stock prices are bounded by zero but offer a potentially
unlimited upside, the distribution of stock returns has been described as log-normal. This
shows up on a plot of stock returns with the tails of the distribution having a greater
thickness.
Probability distributions are often used in risk management as well to evaluate the probability
and amount of losses that an investment portfolio would incur based on a distribution of
historical returns. One popular risk management metric used in investing is value-at-risk
(VaR). VaR yields the minimum loss that can occur given a probability and time frame for a
portfolio. Alternatively, an investor can get a probability of loss for an amount of loss and
time frame using VaR. Misuse and overreliance on VaR has been implicated as one of the major causes of the 2008 financial crisis.
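As an illustration of the historical-simulation approach to VaR, the sketch below takes the 5th percentile of a set of made-up daily returns as a 95% one-day VaR; real implementations use far longer return histories and portfolio weights.

```python
# Historical-simulation VaR: the loss at the 5th percentile of observed returns.
returns = [-0.021, 0.004, 0.013, -0.008, 0.006, -0.015, 0.009, 0.002, -0.030, 0.011,
           0.005, -0.004, 0.018, -0.012, 0.007, 0.001, -0.006, 0.010, -0.002, 0.003]

sorted_returns = sorted(returns)
index = int(0.05 * len(sorted_returns))   # position of the 5th percentile
var_95 = -sorted_returns[index]           # loss is the negated return
print(f"95% one-day VaR: {var_95:.1%} of the portfolio value")
```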
As a simple example of a probability distribution, consider the number observed when rolling two standard six-sided dice. Each die has a 1/6 probability of rolling any single number from one through six, but the sum of the two dice forms a probability distribution in which seven is the most common outcome (1+6, 6+1, 5+2, 2+5, 3+4, 4+3), while two and twelve are far less likely (1+1 and 6+6 respectively).
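A short sketch that enumerates this distribution by brute force:

```python
# Count how many of the 36 equally likely dice pairs produce each total.
from collections import Counter
from itertools import product

sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in sorted(sums):
    print(f"{total:2d}: {sums[total]}/36 = {sums[total] / 36:.3f}")
```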