Module 4 Mathematics in The Modern World

GRACE MISSION COLLEGE

Catiningan, Socorro, Oriental Mindoro


e-Mail: grace.missioncollege@yahoo.com

Fear of the Lord is the foundation of true knowledge, but fools despise wisdom and discipline.

-Proverbs 1:7

MODULE 4 IN MATHEMATICS IN THE MODERN WORLD


DATA MANAGEMENT

LEARNING OBJECTIVES

After completing this module, the students must be able to:


a. Use a variety of statistical tools to process and manage numerical data.
b. Interpret statistical evidence from the gathered data correctly and objectively, and
make inferences from it.
c. Use the methods of linear regression and correlations to predict the value of a variable
given certain conditions.
d. Practice and display diligence, patience, honesty, accuracy and precision in solving
statistical problems.

INTRODUCTION
Data management is a process by which information is acquired and processed to ensure the
accessibility and reliability of the data for its users. One of the most important tools in processing and managing
such information is statistics. Statistics is used in most areas of human endeavor, including
education, research, business, agriculture, and even everyday life.

Definition 1: Statistics is a science which deals with the collection, organization, presentation, analysis, and
interpretation of data so as to give more meaningful information.

Data, or pieces of information, may be collected by conducting a survey, interview, observation, or
experiment. The data gathered can be organized and presented graphically by a line graph, bar graph, or
pictograph, or with the aid of a statistical table known as a frequency distribution table (FDT). A concise and
meaningful conclusion is obtained from the analysis and interpretation of data. Relevant information can be
deduced from the analysis of numerical descriptions, and predictions about the whole population may be made
based on a small group. Because the work of statistics covers a wide area of concern, statistics is subdivided
into two branches, namely: descriptive statistics and inferential statistics.

Definition 2: Descriptive statistics refers to the collection, organization, summary, and presentation of data,
while inferential statistics deals with the interpretation and analysis of data where conclusions are drawn based
on a subset of the population.

In descriptive statistics, a set of data is simply described without drawing any inferences or implications.
The data are merely summarized and discussed in a clear, concise, and informative manner. In inferential statistics,
information or inferences concerning a large group known as the population are provided based on the study of a
representative group of selected members of the population, identified as the sample. Calculating the
average rating of a class of 40 students in Math 01 illustrates descriptive statistics, while determining the
performance of the same class based on the performance of 10 randomly selected members of the class exhibits
inferential statistics.

BASIC TERMS

Some of the basic terminologies and notations involved in statistics are the following:

a. Population - a collection or set of things or objects under consideration
b. Sample - a subset or representative group of the population
c. Data - the information gathered in a research study
Statistical data are classified according to their sources, namely: primary data or secondary data.

• Primary data – information gathered from respondents by the researcher himself.
• Secondary data – information obtained from published materials or data gathered by other individuals
or agencies. These are data which are transcribed from original sources.

d. Array – a listing of observations arranged in increasing or decreasing magnitude
e. Parameter - a value which is computed from a population
f. Statistic – a value which is computed from a sample
g. Variable – a characteristic of interest that has been observed or measured on every member of the
population or sample.
A variable may be quantitative or qualitative, where a quantitative variable is further classified as discrete or
continuous.

i. Quantitative/Numerical variable – describes the amount or number of an element of a sample or
population

• Discrete – takes on a countable amount (it is usually expressed as a whole number)
Example: number of books owned by a student

• Continuous – measured on a continuous scale (it takes any value within a range or interval)
Example: height of the students (in feet)

ii. Qualitative/Categorical variable – describes the quality, category, or character of an element of a
population or sample

Examples:
gender (male or female)
hair color (black, brown, blonde)
level of satisfaction of a student with his grade (highly satisfied, satisfied, not satisfied)

LEVELS OF MEASUREMENT

A more detailed distinction, termed as the levels of measurement, is used by some researchers in
examining the information that is collected. It is classified as follows:
1. Nominal Measurement - numbers or symbols are used to code or classify each element in the
population. Note that the assigned numbers have no numerical meaning.
Examples: gender, educational background, employment status

2. Ordinal Measurement – uses numerical category that expresses the meaningful order. There is no
indication of distance between positions. The numbers become meaningful because they reveal whether
one class or category is more or less than the other. Categories are ranked according to the order of their
value on the property like first, second, third; oldest, next oldest, youngest.
Example: rank in beauty contest

3. Interval Measurement – has equal intervals. There is significance to the distance between any two
values. It tells us that one unit differs by a certain amount of the property from another unit. It has no
absolute zero.
Example: Aptitude test, temperature
4. Ratio Measurement – A variable measured at this level not only includes the concepts of order and
interval, but also includes the idea of ’nothingness’, or absolute zero.

Example: Measurement of height, weight, ages

Remark: The scale of measurement depends mainly on the method of measurement and not on the property
being measured.

For instance, the weight of a pack of milk measured in kilograms is on a ratio scale, but if the packs
are labelled small, medium, or large, the weight is measured on an ordinal scale.

MEASURE OF CENTRAL TENDENCY

One way of summarizing a data set is to use descriptive measures. Among
the most commonly used and important descriptive measures are the measures of central tendency and the
measures of dispersion.

Definition 3: A measure of central tendency (or central location) is a single value that is used to identify the
“center” of the data set or set of observations.

The three measures of central tendency are the mean, median and mode where the mean is the most
familiar measure of the “center”.

Definition 4: The mean, also known as the arithmetic average, is the sum of all the observed values divided by
the number of observations in the data set. It can be computed as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_i$ is the ith observation and n is the number of observations in the data set.

The mean of the population is symbolized by the lowercase Greek letter mu, $\mu$, while the
mean of the sample is represented by $\bar{x}$ (x-bar).
Example 1: The scores of five students who are selected randomly in a class of Math 01 are as follows: 44, 37,
41, 35 and 32. Find their average score.

Solution: Applying the formula for the mean of ungrouped data gives

$$\bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8$$

Hence, the average score of the five students is 37.8.

The means of subgroups can be combined to come up with the group mean, known as the weighted mean. This can
be calculated using the formula
$$\bar{x} = \frac{\sum f_i x_i}{n}$$
where: $x_i$ is the ith observation
$f_i$ is the frequency or weight for each observation
$n = \sum f_i$ is the total of the frequencies

Example 2: If the final examination of a class in statistics is given the weight 2, the average of the quizzes the weight
3, and a project report the weight 1, what would be the mean grade of a student who got the grades 90, 85, and
87, respectively?

Solution:

$$\bar{x} = \frac{2(90) + 3(85) + 1(87)}{2 + 3 + 1} = \frac{522}{6} = 87$$

The mean grade of the student is 87.
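
The two worked examples above can be checked with a few lines of Python. This is only a minimal sketch; the helper name `weighted_mean` is introduced here for illustration and is not part of the module.

```python
from statistics import mean


def weighted_mean(values, weights):
    """Return sum(f_i * x_i) / sum(f_i) for paired values and weights."""
    total_weight = sum(weights)
    return sum(f * x for x, f in zip(values, weights)) / total_weight


# Example 1: scores of five randomly selected students
scores = [44, 37, 41, 35, 32]
average_score = mean(scores)  # 189 / 5

# Example 2: final exam (weight 2), quizzes (weight 3), project (weight 1)
grades = [90, 85, 87]
weights = [2, 3, 1]
mean_grade = weighted_mean(grades, weights)  # 522 / 6

print(average_score, mean_grade)
```

When all weights are equal, the weighted mean reduces to the ordinary mean.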

Remarks:

1. The mean may not be an actual observation in the data set.


2. The mean reflects the magnitude of every observation since every observation contributes to the value of
the mean.
3. The mean is not a good measure of central tendency if there is an extreme value or observation since it is
easily affected by extreme values. The best measure of center for this case is the median.
Definition 5: The median is a single value which divides an array of observations into two equal parts such that
50% of the observations fall above it and the remaining 50% fall below it. It is written symbolically as
$\tilde{x}$, read as “x-tilde”.

The median of a data set consisting of an odd number of observations is the middlemost value in the
list. That is, $\tilde{x} = x_{\frac{n+1}{2}}$, where n is the number of observations. If n is even, the median is the average of the two
middlemost values. It can be computed as $\tilde{x} = \frac{m_1 + m_2}{2}$, where $m_1$ and $m_2$ are the two middlemost values. Take
note that the observations are first arranged in an array (from lowest to highest) before getting the median
value.

Example 1: The number of books owned by the eleven children are as follows: 5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6.
What is the median?

Solution: Arrange the data in an array form: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list contains 11 numbers
then the median is the middlemost value (6th number) which is 6.
Example 2: Compute the median of the data set: 2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7

Solution: Forming an array, we have 2.5, 2.5, 3.5, 3.7, 4.0, 5.8, 7.1, 8.2. There are 8 values; hence, the median
is calculated as $\tilde{x} = \frac{3.7 + 4.0}{2} = 3.85$.
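
As a rough illustration of the odd and even cases, the sketch below sorts the data into an array and picks the middlemost value or the average of the two middlemost values. The function name `median_of` is an assumption made for this example.

```python
def median_of(data):
    """Median by position: middle value (odd n) or mean of the two middle values (even n)."""
    array = sorted(data)          # arrange from lowest to highest first
    n = len(array)
    middle = n // 2
    if n % 2 == 1:                # odd number of observations
        return array[middle]
    return (array[middle - 1] + array[middle]) / 2  # even number of observations


books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]             # Example 1 (n = 11)
values = [2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7]      # Example 2 (n = 8)

median_books = median_of(books)
median_values = median_of(values)
print(median_books, median_values)
```

Python's built-in `statistics.median` follows the same rule and can be used instead of the hand-written function.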

Remarks:

1. The median value may not be an actual observation in the data set.
2. The median is a positional value, hence, it is not affected by the presence of extreme observations.
3. When the data is qualitative, the median is not a possible measure, so describe the center by determining
the mode.
Definition 6: The mode is an observation that occurs most frequently in the given data set.

Example 1: Find the mode in the following sets of scores.

a) set A: 36, 36, 12, 29, 35, 45, 50, 45, 45, 53
b) set B: 8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10
c) set C: 39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50
d) set D: 2, 9, 8, 12, 5, 13, 6, 10

Solution:
The mode in set A is 45 because 45 occurs most frequently in the list. Both 6 and 11 have the highest
frequency in set B; therefore, set B has two modes, 6 and 11. The modes in set C are 25, 37, and 45 since these
numbers have the highest frequency. Each element in set D has the same number of occurrences; thus, the data
set has no mode. The distribution of data may be classified as unimodal, bimodal, trimodal, or multimodal
depending upon the number of modal values in the given data set. In the above example, set A is
unimodal, set B is bimodal, and set C is trimodal.

Example 2: What is the modal color of the shirt worn by the students if the data gathered were as follows:
white, gray, gray, black, white, red, red, gray, black, white, white, red, gray, red, gray, black, red, red, gray, gray,
black?

Solution: Since gray has the highest frequency, then the modal color of the shirt worn by the students is gray.

Remarks:
1. The mode can be used for both quantitative and qualitative data.
2. It is very much affected by the method of grouping.
3. It is determined by the frequency and not by the values of the observations.
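
The mode examples above can be reproduced by counting frequencies. The sketch below is one possible way to do it; the function name `modes` and the convention of returning an empty list when every value occurs equally often (as in set D) are assumptions made to match the text.

```python
from collections import Counter


def modes(data):
    """Return all most-frequent values; an empty list means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    # If every distinct value occurs equally often, the text says there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


set_a = [36, 36, 12, 29, 35, 45, 50, 45, 45, 53]
set_b = [8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10]
set_c = [39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50]
set_d = [2, 9, 8, 12, 5, 13, 6, 10]
shirts = ["white", "gray", "gray", "black", "white", "red", "red", "gray",
          "black", "white", "white", "red", "gray", "red", "gray", "black",
          "red", "red", "gray", "gray", "black"]

for name, data in [("A", set_a), ("B", set_b), ("C", set_c), ("D", set_d)]:
    print("set", name, "->", modes(data))
print("shirt color ->", modes(shirts))
```

Because the function only counts occurrences, it works for qualitative data such as shirt colors as well as for numbers.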

ACTIVITY 1

1. Company ABC is awarding the top ten most outstanding workers in their company every year. The ages
of the top ten awardees for the year 2018 are 47, 53, 36, 60, 30, 28, 42, 43, 38 and 52. Determine the
mean, median and mode of the ages.
2. The mean weight of 50 Balikbayan boxes is 135 kg. What is the approximate total weight of all the
boxes?
3. The average height of the four basketball players is 74 inches. If the height of the three players are 69
inches, 72 inches and 78 inches, what is the height of the fourth player?
4. What is the median of the distribution given by 23, 17, 12, 8, 14, 25, 19, 22, 18? If the maximum value
is replaced by 40, what effect will this have on the median? How about if the minimum is replaced by 0?
5. The final grades of a student in the six subjects she enrolled in last semester are shown below. Determine her
average grade. If the subjects had an equal number of units, what would be her average?

MEASURE OF DISPERSION

In some cases, describing the data using the measures of central tendency alone is not enough to provide
sufficient information concerning a population or sample. It should be supplemented by an analysis of how the
individual elements of the population or sample tend to cluster around the center. Thus, an analysis of
the variability of the observations may be applied.

Definition 7: A measure of dispersion (or measure of variation) is a quantity that measures the spread or
variability of the values in a given set of data.

The most commonly used measures of dispersion are the range, variance, and standard deviation. The
simplest measure and easiest to compute but a rough estimate for the measure of dispersion is the range.

Definition 8: The range, R, is the difference between the highest value (H) and lowest value (L) in the data set.
That is, R = H – L.

Example 1. Compare the performances of the three students based on their ratings (in percent) in the 5 long
tests.

Student A : 83, 80, 89, 78, 70
Student B : 78, 79, 80, 81, 82
Student C : 80, 80, 80, 80, 80

Solution:
In terms of the measure of central tendency, the students perform equally since they have the same average
rating of 80%. However, looking at the variability of their ratings, Student A has the highest range (89 – 70 = 19)
compared to Student B (82 – 78 = 4) and Student C (80 – 80 = 0). This shows that the scores of Student A are more
dispersed than the others. The ratings of Student A fluctuate, while those of Student B are evenly spaced and close
to the mean. On the other hand, Student C has a range equal to zero, so his ratings are all concentrated at the
mean, indicating that the distribution has no spread.

Example 2. The average daily allowances (in pesos) of 12 college students studying at University Y are 112,
127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25 and 113. Find the range.

Solution: Given H = 165.5 and L = 99.75, the range is R = 165.5 – 99.75 = 65.75.

The range of the daily allowances of the 12 college students is 65.75 pesos.
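
For a quick check of both range examples, the definition R = H – L translates directly into code. The function name `value_range` is chosen here only for illustration.

```python
def value_range(data):
    """Range R = H - L: highest value minus lowest value."""
    return max(data) - min(data)


# Example 1: ratings of the three students in 5 long tests
ratings = {
    "A": [83, 80, 89, 78, 70],
    "B": [78, 79, 80, 81, 82],
    "C": [80, 80, 80, 80, 80],
}
ranges = {student: value_range(scores) for student, scores in ratings.items()}

# Example 2: daily allowances (in pesos) of 12 college students
allowances = [112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25, 113]
allowance_range = value_range(allowances)

print(ranges, allowance_range)
```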

Remarks:
1. The larger the value of the range, the more dispersed the observations are.
2. The range considers only the extreme values or observations in the data set.
A more reliable measure in describing the spread of a set of observations is the standard deviation. Most
researchers use this measure in the treatment of data because its computation includes all the values in the
data set.

Definition 9: The standard deviation is the positive square root of the variance. The variance is the average of
the squared deviations of every observation from the mean.

The standard deviation and variance can be obtained from a population or a sample, but most
applications use the sample rather than the population because the latter requires complete enumeration. The unit
of the variance is the square of the unit of the data, while the unit of the standard deviation is the same as that
of the data set. The following symbols are used to designate these measures for a population and a sample.

                      Population    Sample

Standard deviation    $\sigma$      $s$

Variance              $\sigma^2$    $s^2$

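The variance and standard deviation of a population and of a sample are calculated by using the formulas below.

Variance and Standard Deviation of a Population: Let $x_1, x_2, \ldots, x_N$ be the N elements of a
population. Then, the population variance is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ and the population standard deviation is
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$.

Variance and Standard Deviation of a Sample: Let $x_1, x_2, \ldots, x_n$ be a random sample of n observations. Then, the sample
variance is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ and the standard deviation of the sample is $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$.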

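Example 1: The following are the scores of a student in all her long exams in Calculus: 83, 80, 89, 78, and 70.
Calculate the standard deviation.

Solution: Treating the five scores as a population, $\mu = \frac{83 + 80 + 89 + 78 + 70}{5} = 80$. Then

$$\sigma = \sqrt{\frac{(83-80)^2 + (80-80)^2 + (89-80)^2 + (78-80)^2 + (70-80)^2}{5}} = \sqrt{\frac{194}{5}} = \sqrt{38.8} \approx 6.23$$

The result indicates that, on the average, the percentage scores of the student tend to deviate from the mean by
about 6.23 units.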
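Example 2: The following data were obtained by sampling from a population. Find the variance and the standard
deviation of the sample. 10 12 14 15 17 18 18 24

Solution: The sample mean is $\bar{x} = \frac{128}{8} = 16$. The squared deviations from the mean are 36, 16, 4, 1, 1, 4, 4,
and 64, which sum to 130. Then

$$s^2 = \frac{130}{8 - 1} \approx 18.57 \quad \text{and} \quad s = \sqrt{18.57} \approx 4.31$$

The variance is approximately 18.57 while the standard deviation is approximately 4.31.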

What can you infer from this?

Remarks: A large amount of standard deviation indicates that, on the average, the data values will be far from
the mean while the standard deviation of smaller amount shows that, on the average, the data values will be
close to the mean.
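
The difference between the population and sample versions is easy to see in code. The sketch below uses Python's `statistics` module, where `pvariance` and `pstdev` divide by N while `variance` and `stdev` divide by n - 1. It assumes, as the stated answers suggest, that the Calculus scores are treated as a population and the second data set as a sample.

```python
from statistics import pstdev, pvariance, stdev, variance

# Example 1: all long-exam scores of one student (treated as a population)
calculus_scores = [83, 80, 89, 78, 70]
population_variance = pvariance(calculus_scores)   # sum of squared deviations / N
population_sd = pstdev(calculus_scores)

# Example 2: data obtained by sampling (treated as a sample)
sample_data = [10, 12, 14, 15, 17, 18, 18, 24]
sample_variance = variance(sample_data)             # sum of squared deviations / (n - 1)
sample_sd = stdev(sample_data)

print(round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```

Using the sample formula on the Calculus scores would give a larger value, since the same sum of squared deviations is divided by 4 instead of 5.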

ACTIVITY 2

Answer the following. Show a complete and neat solution for each problem.

1. An interview was made to a class of 20 college students to determine the number of books owned by the
students. The data gathered are as follows: 4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10, 10, and 12.
Treating the data as a population, calculate the standard deviation.

2. (Adapted from Mathematics: A Practical Odyssey). To settle an argument over who is the better bowler
between Danny and George, the two agreed to bowl six games, and whoever has the highest “average” will be
the best. Their bowling scores are presented in the table below. Compute and compare their averages. Who is the
better bowler?

George 185 135 200 185 250 155

Danny 182 185 188 185 180 190


3. (Mathematical Excursions by Aufmann). A consumer testing agency has tested the strengths of 3 brands
of 1/8-inch rope. The results of the tests are shown in the following table. According to the same test results,
which company produces 1/8-inch rope for which the breaking point has the smallest standard deviation?

Company        Breaking point of 1/8-inch rope (in pounds)

Trustworthy    122, 141, 151, 114, 108, 149, 125

Brand X        128, 127, 148, 164, 97, 109, 137

NeverSnap      112, 121, 138, 131, 134, 139, 135

4. Ten used trail bikes are randomly selected from a bike shop, and the odometer reading of each is
recorded as follows: 1,902, 103, 653, 1,901, 788, 361, 216, 363, 223, 656. Solve for the standard
deviation and interpret.

Measures of Relative Position

A statistical tool which is significant in identifying the position of an observation relative to the other
elements in a given data set is the measure of relative position.

Definition 10: A measure of relative position is a statistical measure that provides the specific location of an
observation relative to the other values when the data are in ranked order.

This measure divides the data set into subgroups such that a specific portion of the data set belongs to the lower
bracket and the remaining on the higher bracket. Percentiles, deciles, and quartiles are among the most
commonly used measures of relative position.

Definition 11:

The percentile, denoted by $P_i$, is a value that divides an array of observations into 100 equal parts such
that i% of all the observations lie below $P_i$.

The quartile, denoted by $Q_i$, is a value that divides an array of observations into four equal parts such
that (i × 25)% of all the observations lie below $Q_i$.

The decile, denoted by $D_i$, is a value that divides an array of observations into ten equal parts such
that (i × 10)% of all the observations lie below $D_i$.

In determining the desired measure, the data must first be arranged in increasing order. The percentiles
partition the set of observations at 99 points located at P1, P2, ..., and P99, where 1% of
the total observations are lower than P1 and the remaining 99% are higher than P1, 2% of the total observations
are found below P2 and 98% are above it, and so on.

Analogous to this, quartiles have the subdivisions described by Q1 (the first quartile, which has 25% of
the observations falling below it and the remaining 75% above it), Q2 (the second quartile, which is equal to the
median and has 50% of the observations below it), and Q3 (the third quartile, which has 75% of the total observations
falling below it and the remaining 25% above it).

The deciles are the 1st decile (D1), 2nd decile (D2), ..., and 9th decile (D9). The lowest decile
D1 corresponds to a value in the set wherein 10% of the observations are located below D1, the second
decile D2 corresponds to a value in which 20% of the observations are lower than D2, and so on up
to the last decile D9, which has a value positioned near the top such that 90% of all the observations are located
below the value corresponding to D9.

Remarks:

1. The quartiles and deciles can be determined by solving for their equivalent percentiles.

a. $Q_i = P_{25i}$
b. $D_i = P_{10i}$
2. Given a data set, Median = P50 = Q2 = D5.

Example 1: Joy was told that relative to the other scores on a long exam in Statistics, her score was the 95th
percentile. This means that at least 95% of those who took the test had scores less than or equal to Joy’s score,
while at least 5% had a score higher than Joy’s.

Example 2: Given the following data set: 25, 5, 6, 12, 8, 16, 17, 22, 20, 9. Compute the

a) 20th percentile    c) first quartile    e) 3rd decile

b) 56th percentile    d) 2nd quartile      f) seventh decile

Solutions:

Arrange the scores in an increasing manner.

5, 6, 8, 9, 12, 16, 17, 20, 22, 25

a. 20th percentile

$\frac{20}{100}(10) = 2$ (location of 20th percentile)

This means that the 20th percentile is the second score from the lowest.

So, P20 = 6.

b. 56th percentile

$\frac{56}{100}(10) = 5.6$ (location of 56th percentile)

When the result is not exact, round it to the nearest whole number. The 56th percentile is approximately
described by the 6th value in the data set.
Thus, P56 = 16.

Note: Interpolation may be applied to find an exact value corresponding to the 56th percentile. A location of 5.6
means that the 56th percentile is between the 5th and 6th values. To interpolate, multiply the difference of the 5th
and 6th values by the decimal part, then add the result to the 5th value. That is, (16 – 12) × 0.6 = 2.4. So, P56 =
12 + 2.4 = 14.4, which is the interpolated value.

c. First quartile, Q1 = P25

$\frac{25}{100}(10) = 2.5$ (location of 25th percentile)

P25 is located halfway between the 2nd and 3rd values in the list. So, P25 = (6 + 8)/2 = 7.

Since Q1 = P25, Q1 = 7.

d. 2nd quartile

Note that Q2 has the same value as the median. Solving for the median gives (12 + 16)/2 = 14. So, Q2 = 14.

e. 3rd decile, D3 = P30

$\frac{30}{100}(10) = 3$ (3rd value from the lowest)

Therefore, D3 = 8.

f. Seventh decile, D7 = P70

$\frac{70}{100}(10) = 7$ (7th number in the list)

The seventh decile is 17.
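
The location rule used in parts (a), (b), (c), (e), and (f) can be sketched in Python as follows. This assumes the location of the ith percentile is taken as i·n/100, counted from the lowest value, which is what the worked answers imply; the function name `percentile` and its `interpolate` flag are introduced here for illustration only.

```python
def percentile(data, i, interpolate=True):
    """ith percentile using the location rule i * n / 100 (1-based position)."""
    array = sorted(data)
    n = len(array)
    location = i * n / 100
    lower = int(location)            # whole part of the location
    fraction = location - lower      # decimal part of the location
    if lower < 1 or lower > n:
        raise ValueError("location falls outside the data set")
    if fraction == 0:
        return array[lower - 1]      # exact location: take that value
    if interpolate:
        # move the decimal part of the way from the lower to the next value
        return array[lower - 1] + fraction * (array[lower] - array[lower - 1])
    return array[int(location + 0.5) - 1]   # round location to nearest whole number


scores = [25, 5, 6, 12, 8, 16, 17, 22, 20, 9]

p20 = percentile(scores, 20)                              # location 2
p56_rounded = percentile(scores, 56, interpolate=False)   # location 5.6 -> 6th value
p56_interpolated = percentile(scores, 56)                 # 12 + 0.6 * (16 - 12)
q1 = percentile(scores, 25)                               # Q1 = P25, location 2.5
d3 = percentile(scores, 30)                               # D3 = P30
d7 = percentile(scores, 70)                               # D7 = P70
print(p20, p56_rounded, p56_interpolated, q1, d3, d7)
```

Part (d) is left out of the sketch because the example obtains Q2 from the median rather than from a location. Statistical software often uses slightly different location rules, so results from other tools may not match these values exactly.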

Box - and - Whisker Plot

Definition 12: A diagram showing the representation of a 5-point summary of a data set specified by the lowest
and the highest values, the values corresponding to Q1 and Q3, and the median is called a box – and - whisker
plot also known as box plot.

The five important numbers are arranged increasingly in a horizontal or vertical scale. Diagrammatically, we
have

Diagram from Mathematical Excursions by Aufmann

Here is a summary in the construction of a box plot.


Steps in the Construction of Box – and – Whisker Plot

1. Arrange the values in an increasing pattern.


2. Compute Q1, the median, and Q3.
3. Locate the five numbers (lowest and highest values, Q1, median, and Q3) on the number line and
draw a rectangle (box) above the scale covering Q1, the median, and Q3; then draw a line segment across
the box passing through the median.
4. Connect the box to the extreme values by a line segment (known as a whisker).
Example: Draw a box-and-whisker plot for the given data set: 23, 15, 5, 6, 12, 8, 16, 17, 22, 20, 9, 10.

Solution:

• Arrange the values in an increasing pattern.

5, 6, 8, 9, 10, 12, 15, 16, 17, 20, 22, 23

• Identify the lowest and highest values and compute Q1, the median, and Q3.

• Follow steps 3 and 4 to illustrate the figure.

Stem-and-leaf display

An informative arrangement of data where actual values of the observations are displayed can be
visualized through the use of the stem-and-leaf display.

Definition 13. A stem-and-leaf display is an organized diagram showing the relative position of every
element in the data set such that the leading digit(s) become the stem and the trailing digit(s) become the leaf.

Example. The table lists the number of words used by 30 students in their reflection.

63 100 20 89 80 75 56 58 63 83

57 49 50 37 33 24 27 15 29 32

49 61 73 99 84 43 55 57 58 77

Draw a stem-and-leaf display of these data.

Answer:

Stem   Leaf             Origin

1      5                15
2      0 4 7 9          20, 24, 27, 29
3      2 3 7            32, 33, 37
4      3 9 9            43, 49, 49
5      0 5 6 7 7 8 8    50, 55, 56, 57, 57, 58, 58
6      1 3 3            61, 63, 63
7      3 5 7            73, 75, 77
8      0 3 4 9          80, 83, 84, 89
9      9                99
10     0                100
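
A stem-and-leaf display like the one above can be generated by splitting each value into its leading digits and its last digit. The sketch below assumes whole-number data with a one-digit leaf; the function name `stem_and_leaf` is chosen for this example.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group sorted whole numbers by stem (value // 10) with leaf (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


words = [63, 100, 20, 89, 80, 75, 56, 58, 63, 83,
         57, 49, 50, 37, 33, 24, 27, 15, 29, 32,
         49, 61, 73, 99, 84, 43, 55, 57, 58, 77]

display = stem_and_leaf(words)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Counting the leaves is a quick check that no observation was lost: the total should equal the number of students, 30.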

ACTIVITY 3

1. An interview was conducted with a class of 20 college students to determine the number of books owned by the
students. The data gathered are as follows: 4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10, 10, and 12.
a. Solve for the following measures and interpret the result.

i. P45 ii. Q1 iii. D4

b. Construct a box-and-whiskers plot.

c. Create the stem-and-leaf display.

2. Consider the scores of the two bowlers in the previous exercise.

George 185 135 200 185 250 155

Danny 182 185 188 185 180 190

a. Compare their scores that correspond to i) Q3 ii) D7


b. If the scores of Danny and George are combined to form a single population, compute for i) P42 ii)
P70.

NORMAL DISTRIBUTION

When most of the observations are near the “center” and the distribution of data is nearly the same on
both sides, the distribution is said to follow a normal distribution. This distribution is one of the most
commonly used distributions in the field of Statistics and has various applications.

Definition 14: A normal distribution, also known as the Gaussian distribution, is a continuous probability
distribution which is drawn graphically as a smooth bell-shaped curve called the normal curve, having an
area under it which is equal to one.

Properties of a Normal Distribution


Any normal distribution has the following properties:
1. The total area under the normal curve is one.
2. The three measures of central tendency given by the mean, median and mode are all equal.
3. It is symmetric with respect to the vertical line $x = \mu$.
4. The curve is asymptotic with respect to the horizontal axis in both directions.

The proportion of values in a normally distributed data set that fall within a given interval is based on the
mean and the standard deviation of the data set. That is,

• about 68% of the observations fall within 1 standard deviation of the mean;
• about 95% of the observations fall within 2 standard deviations of the mean; and
• about 99.7% of the observations fall within 3 standard deviations of the mean.

The diagram shows the different percentages defined by the empirical rule for normal
distributions.
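
The three percentages of the empirical rule can be verified numerically. The sketch below relies on the standard identity that, for a normal distribution, the area within k standard deviations of the mean equals erf(k/√2); the function name `area_within` is an assumption made for this example.

```python
from math import erf, sqrt


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k) * 100:.1f}%")
```

The computed areas are slightly more precise than the rounded figures of 68%, 95%, and 99.7% used in the rule.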

Diagram from Mathematical Excursion by Aufmann


Every normal distribution has its own mean and standard deviation, so areas based on the standard normal
distribution are used to find probabilities.

Definition: A standard normal distribution is the distribution of a normal random variable with mean zero and
standard deviation equal to one. That is, Z ~ N(0, 1).
A normal random variable X with mean $\mu$ and standard deviation $\sigma$ can be transformed into a standard normal
variable Z with mean zero and standard deviation equal to one by using the formula $Z = \frac{X - \mu}{\sigma}$.
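
To illustrate why standardizing works, the sketch below converts a value to a z-score and compares the area to its left computed two ways: directly from the original distribution and from the standard normal distribution. The numbers (mean 100, standard deviation 15, value 130) are hypothetical and chosen only for illustration; `NormalDist` is part of Python's standard `statistics` module.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize x: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical values, not taken from the module
mu, sigma, x = 100, 15, 130

z = z_score(x, mu, sigma)                      # (130 - 100) / 15 = 2.0
area_original = NormalDist(mu, sigma).cdf(x)   # area to the left of x
area_standard = NormalDist().cdf(z)            # area to the left of z under N(0, 1)

print(z, round(area_original, 4), round(area_standard, 4))
```

Both areas agree, which is why a single table for the standard normal curve is enough for any normal distribution.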

Rules in Finding the Areas Under the Normal Curve


Case 1.
When the area under the curve is located to the left of z, simply read the value corresponding to that area in
the table of areas under the normal curve.

Example: 1. Find the area to the left of .


2. Give the probability
Solution:
Case 2.
Example: Find
Solution:

Case 3.
This is applied when the area is bounded between two ordinates or values in an interval.

Example: What is the area bounded between Z = -1.22 and Z = 2.03?
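
An area bounded between two z-values is the area to the left of the larger value minus the area to the left of the smaller value. The sketch below computes these left-hand areas with the error function instead of reading them from a printed table; the function names `phi` and `area_between` are assumptions made for this example.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def area_between(z_low, z_high):
    """Area bounded between two z-values."""
    return phi(z_high) - phi(z_low)


area = area_between(-1.22, 2.03)
print(round(phi(2.03), 4), round(phi(-1.22), 4), round(area, 4))
```

A printed table rounded to four decimal places should give nearly the same result, possibly differing in the last digit because of rounding.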

Applications:
Example 1: (Mathematical Excursions by Aufmann) During 1 week, an overnight delivery company found that
the weights of its parcels were normally distributed, with a mean of 24 oz and a standard deviation of 6 oz.
a. What percent of the parcels weighed between 12 oz and 30 oz?
b. What percent of the parcels weighed more than 42 oz?
Solution:

a.

Example 2: The salaries of employees of a certain company in Metro Manila have a mean of Php 5,000 and a
standard deviation of Php 1,000. What is the probability that a randomly selected employee will have a salary of

a. more than Php 5000?


b. between Php 5,750 and Php 6,500?
c. less than Php 9,000?

Exercises: Show a complete solution for each problem.


2. Given a normal distribution with µ = 50 and σ = 10, find the probability that X assumes a value between
45 and 62.
3. Given a normal distribution with µ = 300 and σ = 50, find the probability that X assumes a value greater
than 362.
4. In the qualifying examination for the admittance to college, the mean score was 65 and the standard
deviation was 8. If 1,265 students took the qualifying exam, how many of them scored between 60 and
75?

5. Records show that in a certain hospital the distribution of the “length of stay” of its patients is normal
with a mean of 10.5 days and a standard deviation of 2 days.
a. What percentage of the patients stayed 8 days?
b. What is the probability that a patient stays in the hospital between 9 and 11 days?
6. An electrical firm manufactures light bulbs that have a length of life that is normally distributed with
mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns
between 778 and 834 hours.

CORRELATION AND REGRESSION


Several research studies focus on the relationships between two or more things. For instance, a
teacher may want to know if the study habits of students relate to their performance in the classroom. A
businessman needs to predict the selling prices of his products based on the monthly consumption demand.
A doctor needs to find out if there is evidence of a relationship between cholesterol and triglyceride
levels. An agriculturist wants to know if the level of experience and practices of farmers in planting
tobacco greatly affect their production. All of these are addressed by the correlation and regression
analysis of data.

Correlation and regression are two related statistical tools. Correlation is used to find out if there is a
relationship between two variables while regression is a means to predict or forecast the value of one
variable in terms of the other.

Definition: Correlation analysis is a method used to measure the degree of relationship or association
between two or more variables.

The relationship between two variables can be shown graphically by sketching the scatter diagram.

Scatter diagram – also known as a scatter plot, is a pictorial presentation showing the relationship
between two variables. It shows the direction and shape of the association being conveyed. This is done
by plotting the points corresponding to the observations/data on the first quadrant of a rectangular
coordinate system.

Example:
Types of Correlation:
1. Positive correlation – a direct relationship between two variables exists. That is, as one variable
increases (decreases), the other also increases(decreases).
2. Negative correlation – an inverse relationship exists between the variables. Here, one variable
increases as the other decreases or vice versa.
3. Zero correlation – exists when high scores in one variable tend to be neither systematically high nor
systematically low in the other variable. It indicates that there is no correlation between the
variables. The points in the scatter diagram are scattered randomly.

Remark: The relationship between two variables may be described by its magnitude or its strength. In
terms of strength, the correlation may be perfect, high, moderate, or low. In a perfect correlation, all points
in the scatter diagram lie on a straight line.

The degree or strength of relationship between two variables may also be described by computing a
single number called the correlation coefficient.

The Pearson Correlation Coefficient (r)


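- named after the English mathematician Karl Pearson (1857 – 1936)
- measures relationships between variables that are linearly related
- its value ranges from $-1$ to $+1$
- it is computed through the formula
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where n is the number of paired observations.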

The correlation coefficient may be interpreted using the correlation scale shown below:

Range of Values       Interpretation

±1                    Perfect positive (negative) correlation
±0.91 to ±0.99        Very high positive (negative) correlation
±0.71 to ±0.90        High positive (negative) correlation
±0.51 to ±0.70        Moderately positive (negative) correlation
±0.31 to ±0.50        Low positive (negative) correlation
±0.01 to ±0.30        Negligible positive (negative) correlation
0.00                  No correlation

Testing the Significance of r

The t-test is used to verify whether the result is statistically significant or not. The test statistic can be
computed by using the formula $t = r\sqrt{\frac{n - 2}{1 - r^2}}$.

Example: A research study was conducted to determine the correlation between students’ grades in English and
their grades in Mathematics. A random sample of 10 students in a class was taken and the results of the sampling
are tabulated below.

Use the 5% level of significance.

Student No. 1 2 3 4 5 6 7 8 9 10

English grade 93 89 84 91 90 83 75 81 84 77

Mathematics grade 91 86 80 88 89 87 78 78 85 76
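
The computation for this example can be sketched in Python as shown below. It assumes the usual computational form of the Pearson coefficient and the test statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom; the function names `pearson_r` and `t_statistic` are introduced here for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def t_statistic(r, n):
    """Test statistic for the significance of r, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 77]
mathematics = [91, 86, 80, 88, 89, 87, 78, 78, 85, 76]

r = pearson_r(english, mathematics)
t = t_statistic(r, len(english))
print(round(r, 2), round(t, 2))
```

The coefficient comes out at about 0.89, which falls in the "high positive correlation" band of the scale. The computed t is then compared with the tabulated critical value for 8 degrees of freedom at the 5% level (2.306 for a two-tailed test) to decide whether r is significant.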

REGRESSION – describes the process of estimating the relationship between two variables. The
relationship is estimated by fitting a straight line through the given data. The least squares method is
used to determine the equation of the line that best fits the data. This line is known as the regression line,
which keeps the prediction errors to a minimum. It is given by the equation

$$\hat{y} = a + bx$$

where $\hat{y}$ is the predicted value,

$b$ is the regression coefficient (slope of the line),

$a$ is the y-intercept of the line, which is computed as $a = \bar{y} - b\bar{x}$,
where $\bar{x}$ is the mean of the x-values and

$\bar{y}$ is the mean of the y-values.

To find the slope, use
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
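
A least squares line can be fitted with a short Python sketch. It assumes the standard least squares formulas for the slope and the y-intercept, and it reuses the English and Mathematics grades from the correlation example, treating the English grade as x and the Mathematics grade as y. The names `least_squares_line`, `a`, and `b` are chosen here for illustration.

```python
def least_squares_line(x, y):
    """Return (a, b) for the regression line y_hat = a + b * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # y-intercept
    return a, b


english = [93, 89, 84, 91, 90, 83, 75, 81, 84, 77]
mathematics = [91, 86, 80, 88, 89, 87, 78, 78, 85, 76]

a, b = least_squares_line(english, mathematics)
predicted = a + b * 85   # predicted Mathematics grade for an English grade of 85
print(round(a, 2), round(b, 4), round(predicted, 2))
```

Predictions are most trustworthy for x-values inside the range of the observed data (here, English grades from 75 to 93).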
