Math 133 - Unit 7 Graphing Data-1
In all experiments that we perform, we collect data. For example, we may collect data on a particular
metal’s resistance to electrical current with respect to its temperature. Or in social science, we may collect
data on the average income vs the age of the person. Is it true that the average income increases as one
grows older?
If the data is numerical, we will often want to graph it. From the graph, we can predict the outcome
under particular conditions. For example, we might want to know how much gasoline a
particular engine consumes at a particular speed.
From the graph that we generate from our experiments or observations, we can predict or estimate the
outcome given a particular situation. For example, suppose we perform an experiment that measures the
height and weight of people on Earth. From the data we collect, we can obtain the function that relates
the weight and height of people. Then, from this function, we can obtain the expected weight of a person
given his or her height.
To determine a useful relationship, we need to distinguish between the independent variable and the
dependent variable in the data we collect. In the previous example of the height and weight of people,
since we want to know what is the expected weight of a person given his or her height, then we can take
height as the independent variable and weight as the dependent variable, i.e. the weight depends on the
height of a person. Usually, we assign the symbol x to be the independent variable and y as the dependent
variable.
Different situations produce different types of graphs. For example, we already know that when
we throw a rock in the air, its height above the ground vs time follows a parabolic graph. Thomas
Malthus, a renowned social scientist, argued that the world’s population vs time follows an
exponential graph, i.e. the world’s population increases exponentially. If that is so, then at the rate we
populate Earth, we shall eventually run out of space to live in or run out of food.
If our data points, when plotted on the rectilinear Cartesian coordinate plane, appear to follow a straight
line, then we can say that there is a linear correlation in our data. We can define a straight line
$y = mx + b$ that best approximates the data points, and calculate its slope $m$ and its y-intercept $b$.
7.1.2 The exponential relationship, $y = b \cdot 10^{mx}$
If we plot our data points on the rectilinear Cartesian coordinate plane and the data points do not follow
a straight line, it does not mean there is no correlation in the data. It just means there is no linear
correlation and the correlation could be of a non-linear type.
To find out if this non-linear correlation is exponential, we could plot the points $(x_i, y_i)$ on
semi-log graph paper. The y-axis of this graph paper has been adjusted to increase on the logarithmic scale. So
when we plot the $(x_i, y_i)$ points on semi-log graph paper, it is equivalent to plotting the points
$(x_i, \log_{10} y_i)$ on rectilinear graph paper. If we obtain, on the semi-log graph paper, a set of points that
resemble a straight line, this indicates that the data points follow an exponential correlation.
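We can also check whether the correlation is exponential by transforming the exponential relationship into a
linear relation as follows,
$$y = b \cdot 10^{mx}$$
$$\Rightarrow \log_{10} y = \log_{10} b + \log_{10} 10^{mx}$$
$$\Rightarrow \log_{10} y = mx + \log_{10} b$$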
We then find the logarithm to the base 10 of the y-coordinates and plot them versus the x-coordinates,
i.e. we plot the points $(x_i, \log_{10} y_i)$ for all the data points on the rectilinear Cartesian coordinate system.
If this results in a set of points resembling a straight line, then the data follows an exponential correlation.
If not, then the data points do not follow an exponential correlation.
If the relationship is exponential, we can then find the straight line that best approximates the points
$(x_i, \log_{10} y_i)$. We calculate the slope $m$ and find the value of $b$ by finding the y-intercept, $B = \log_{10} b$,
and then calculating $b$ from
$$b = 10^B$$
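The semi-log check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up (generated from $y = 2 \cdot 10^{0.3x}$, not taken from these notes), and the helper name `semilog_slopes` is hypothetical. If the points $(x_i, \log_{10} y_i)$ lie on a straight line, the slope between every pair of neighbouring points should be about the same.

```python
import math

def semilog_slopes(xs, ys):
    """Slopes between consecutive points of (x, log10 y)."""
    Y = [math.log10(y) for y in ys]
    return [(Y[i + 1] - Y[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

# Illustrative data generated from y = 2 * 10**(0.3 * x); not from the notes.
xs = [0, 1, 2, 4, 7]
ys = [2 * 10 ** (0.3 * x) for x in xs]

slopes = semilog_slopes(xs, ys)
print([round(s, 6) for s in slopes])  # all close to m = 0.3, so the fit is exponential

# The line passes through (xs[0], log10 ys[0]); its intercept B gives b = 10**B.
m = slopes[0]
B = math.log10(ys[0]) - m * xs[0]
b = 10 ** B
print(round(b, 6))  # close to the b = 2 used to generate the data
```

With real measurements the slopes will only agree roughly, which is why a best-fitting line is used instead of any single pair of points.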
The power relationship, $y = bx^m$, is the other non-linear type of correlation we can check for if we are not
able to find a linear relationship in the data set. To check if there is a power relationship in the given data set, we
plot the points $(x_i, y_i)$ on log-log graph paper. Both the x-axis and the y-axis on this type of graph
paper have been adjusted to increase on the logarithmic scale. So when we plot the $(x_i, y_i)$ points on
log-log graph paper, it is equivalent to plotting the points $(\log_{10} x_i, \log_{10} y_i)$ on rectilinear graph paper.
If we obtain, on the log-log graph paper, a set of points that resemble a straight line, this indicates that
the data points follow a power correlation.
We can also check whether the correlation is a power correlation by transforming the power relationship into a
linear relation as follows,
$$y = bx^m$$
$$\Rightarrow \log_{10} y = m \log_{10} x + \log_{10} b$$
We then find the logarithm to the base 10 of both the x-coordinates and the y-coordinates, and plot the
points $(\log_{10} x_i, \log_{10} y_i)$ for all the data points on the rectilinear Cartesian coordinate system. If this
results in a set of points resembling a straight line, then the data follows a power correlation. If not, then
the data points do not follow a power correlation.
If the relationship is a power relationship, we can then find the straight line that best approximates the points
$(\log_{10} x_i, \log_{10} y_i)$. We calculate the slope $m$ and find the value of $b$ by finding the y-intercept,
$B = \log_{10} b$, and then calculating $b$ from
$$b = 10^B$$
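As a rough illustration of the log-log transformation, the Python sketch below takes made-up data generated from $y = 5x^{1.5}$ (an assumption for demonstration, not data from these notes), maps it to $(\log_{10} x_i, \log_{10} y_i)$, and recovers $m$ and $b$ from the resulting straight line. The helper name `loglog_points` is hypothetical.

```python
import math

def loglog_points(xs, ys):
    """Map (x, y) to (log10 x, log10 y); x and y must be positive."""
    return [(math.log10(x), math.log10(y)) for x, y in zip(xs, ys)]

# Illustrative data generated from y = 5 * x**1.5; not from the notes.
xs = [1, 2, 4, 8, 16]
ys = [5 * x ** 1.5 for x in xs]

pts = loglog_points(xs, ys)
(X0, Y0), (X1, Y1) = pts[0], pts[-1]

m = (Y1 - Y0) / (X1 - X0)   # slope of the line through the end points
B = Y0 - m * X0             # y-intercept of that line
b = 10 ** B                 # undo the logarithm to recover b

# How far does any transformed point sit from the line Y = mX + B?
max_dev = max(abs(Y - (m * X + B)) for X, Y in pts)
print(round(m, 6), round(b, 6), max_dev < 1e-9)
```

Because this data follows a power law exactly, every transformed point lies on the line. For measured data, the line through the end points would be replaced by the best-fitting line.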
The linear correlation coefficient, r, gives a measure of the strength of the linear relationship between
two quantitative variables.
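Suppose we are given $n$ data points $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. The formula to calculate
the linear correlation coefficient of these data points is
$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$
where
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$$
$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$$
$$\sum_{i=1}^{n} y_i^2 = y_1^2 + y_2^2 + \cdots + y_n^2$$
$$\left(\sum_{i=1}^{n} x_i\right)^2 = (x_1 + x_2 + \cdots + x_n)^2$$
$$\left(\sum_{i=1}^{n} y_i\right)^2 = (y_1 + y_2 + \cdots + y_n)^2$$
and
$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$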
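The sums listed above translate directly into code. The following Python sketch computes the linear correlation coefficient from those sums; the function name `corr` and the tiny data sets are illustrative assumptions, chosen so that the points lie exactly on a rising line and on a falling line.

```python
import math

def corr(xs, ys):
    """Linear correlation coefficient r, built from the sums defined above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)                   # sum of x_i, sum of y_i
    sxx = sum(x * x for x in xs)                # sum of x_i squared
    syy = sum(y * y for y in ys)                # sum of y_i squared
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of x_i * y_i
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

# Made-up points on the rising line y = 2x and on a falling line.
r_up = corr([1, 2, 3, 4], [2, 4, 6, 8])
r_down = corr([1, 2, 3, 4], [8, 6, 4, 2])
print(round(r_up, 6), round(r_down, 6))  # close to +1 and -1 respectively
```

Values of r near +1 or -1 indicate points that sit close to a straight line; values near 0 indicate little linear relationship.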
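The line that best approximates all the above data points is the least-squares regression line. It is the line
that minimizes the sum of the squares of the residuals or errors. It is the straight line
$$y = mx + b$$
whose slope and y-intercept are
$$m = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b = \bar{y} - m\bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the averages of the x-coordinates and the y-coordinates.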
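To make the phrase "minimizes the sum of the squares of the residuals" concrete, here is a small Python sketch. It assumes the standard least-squares formulas for the slope and intercept, $m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$ and $b = \bar{y} - m\bar{x}$, and uses made-up data; the names `fit_line` and `sse` are hypothetical.

```python
def fit_line(xs, ys):
    """Slope m and y-intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n          # b = y_bar - m * x_bar
    return m, b

def sse(xs, ys, m, b):
    """Sum of squared residuals of the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up data scattered around a straight line.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

m, b = fit_line(xs, ys)
best = sse(xs, ys, m, b)
print(round(m, 4), round(b, 4))

# Nudging the slope or the intercept can only make the squared error larger.
print(best <= sse(xs, ys, m + 0.1, b), best <= sse(xs, ys, m, b + 0.1))
```

Any other choice of slope or intercept gives a sum of squared residuals at least as large as the one for the fitted line.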
Solution: First of all, let us plot the data points on a rectilinear graph.
[Figure: Power, P (kW) versus Discharge, Q (L/s), plotted on a rectilinear graph]
Now, we plot the data points on a semi-log graph.
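[Figure: Power, P (kW) versus Discharge, Q (L/s), plotted on a semi-log graph]
Finally, we plot the data points on a log-log graph.
[Figure: Power, P (kW) versus Discharge, Q (L/s), plotted on a log-log graph]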
All three graphs appear to be close to a straight line. We can check mathematically whether the data best
follows a linear, an exponential or a power relationship by calculating the linear
correlation coefficient for all three relationships.
(i) Does the data follow a Linear Relationship?
Letting the symbols x and y represent Q and P respectively and plugging the sums from the data into the
above formula with $n = 7$ data points, we calculate the linear correlation coefficient as
$$r = \frac{7105}{\sqrt{4900}\,\sqrt{10310}} \approx 0.999624$$
This value is very close to 1 which indicates a very strong linear correlation between the Discharge and
the Power variables.
(ii) Does the data follow an Exponential Relationship?
Since we are calculating the power requirement for a particular discharge, we let Discharge be
the independent variable and Power be the dependent variable. Hence, we let x be Q and y be
P.
The exponential relationship
$$y = b \cdot 10^{mx}$$
can be expressed linearly as
$$\log_{10} y = mx + \log_{10} b$$
We further let $Y = \log_{10} y$ and $B = \log_{10} b$, and the above equation becomes
$$Y = mx + B$$
We need to calculate the linear correlation coefficient between $x$ and $\log_{10} y$. Let us not forget that we
take x to represent Q and y to represent P. Thus, $\log_{10} P = \log_{10} y = Y$.
| Discharge, Q (L/s), x | Power, P (kW), y | $Y = \log_{10} y$ | $x^2$ | $Y^2$ | $xY$ |
|---|---|---|---|---|---|
| 5 | 31 | 1.491362 | 25 | 2.22416 | 7.456808 |
| 10 | 39 | 1.591065 | 100 | 2.531487 | 15.91065 |
| 15 | 45 | 1.653213 | 225 | 2.733112 | 24.79819 |
| 20 | 53 | 1.724276 | 400 | 2.973127 | 34.48552 |
| 25 | 60 | 1.778151 | 625 | 3.161822 | 44.45378 |
| 30 | 67 | 1.826075 | 900 | 3.334549 | 54.78224 |
| 35 | 75 | 1.875061 | 1225 | 3.515855 | 65.62714 |
| Summation: 140 | - | 11.9392 | 3500 | 20.47411 | 247.5143 |
Plugging into the linear correlation coefficient formula with $n = 7$ data points, we have the
linear correlation coefficient as
$$r = \frac{61.11202}{\sqrt{4900}\,\sqrt{0.774232}} \approx 0.992186$$
This value is also quite close to 1 which indicates a strong exponential relationship. However, this value
is not as close to 1 as the value calculated when assuming the data is linearly related. So our money is still
on the linear relationship.
(iii) Does the data follow a Power Relationship?
The power relationship
$$y = bx^m$$
can be expressed linearly as
$$\log_{10} y = m \log_{10} x + \log_{10} b$$
We further let $X = \log_{10} x$, $Y = \log_{10} y$ and $B = \log_{10} b$, and the above equation becomes
$$Y = mX + B$$
We need to calculate the linear correlation coefficient between $\log_{10} x$ and $\log_{10} y$. Let us not forget that
we take x to represent Q and y to represent P. Thus, $\log_{10} Q = \log_{10} x = X$ and $\log_{10} P = \log_{10} y = Y$.
| Discharge, Q (L/s), x | Power, P (kW), y | $X = \log_{10} x$ | $Y = \log_{10} y$ | $X^2$ | $Y^2$ | $XY$ |
|---|---|---|---|---|---|---|
| 5 | 31 | 0.69897 | 1.491362 | 0.488559 | 2.22416 | 1.042417 |
| 10 | 39 | 1 | 1.591065 | 1 | 2.531487 | 1.591065 |
| 15 | 45 | 1.176091 | 1.653213 | 1.383191 | 2.733112 | 1.944329 |
| 20 | 53 | 1.30103 | 1.724276 | 1.692679 | 2.973127 | 2.243335 |
| 25 | 60 | 1.39794 | 1.778151 | 1.954236 | 3.161822 | 2.485749 |
| 30 | 67 | 1.477121 | 1.826075 | 2.181887 | 3.334549 | 2.697334 |
| 35 | 75 | 1.544068 | 1.875061 | 2.384146 | 3.515855 | 2.895222 |
| Summation: - | - | 8.595221 | 11.9392 | 11.0847 | 20.47411 | 14.89945 |
As before, plugging into the linear correlation coefficient formula with $n = 7$ data points, we
have the linear correlation coefficient as
$$r = \frac{1.676075}{\sqrt{3.715072}\,\sqrt{0.774232}} \approx 0.988262$$
This value is the lowest among the three coefficients we obtained. So this set of calculations shows that
the data follows the linear relationship best.
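The three correlation coefficients can be cross-checked with a short Python sketch. It uses the Discharge and Power values from the tables above; the function name `corr` and the dictionary layout are illustrative assumptions. Each model is tested by correlating the appropriately transformed columns: $(x, y)$ for linear, $(x, \log_{10} y)$ for exponential and $(\log_{10} x, \log_{10} y)$ for power.

```python
import math

def corr(xs, ys):
    """Linear correlation coefficient r of the points (xs[i], ys[i])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

# Discharge Q (L/s) and Power P (kW), copied from the tables above.
Q = [5, 10, 15, 20, 25, 30, 35]
P = [31, 39, 45, 53, 60, 67, 75]

logQ = [math.log10(q) for q in Q]
logP = [math.log10(p) for p in P]

r = {
    "linear": corr(Q, P),             # x versus y
    "exponential": corr(Q, logP),     # x versus log10 y
    "power": corr(logQ, logP),        # log10 x versus log10 y
}
best = max(r, key=lambda k: abs(r[k]))

for name, value in r.items():
    print(name, round(value, 6))      # compare with 0.999624, 0.992186, 0.988262
print("best fit:", best)
```

The model whose coefficient is closest to 1 in absolute value is taken as the best description of the data, which here is the linear one.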
Finding the straight line that best approximates the data points
Now, we proceed to find the straight line that best approximates the data points. Let us pull out the table
we generated to find the linear correlation coefficient for the Linear Relationship. Once again, we let
𝑥 = 𝑄 and 𝑦 = 𝑃.
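| Discharge, Q (L/s), x | Power, P (kW), y | $x^2$ | $xy$ |
|---|---|---|---|
| 5 | 31 | 25 | 155 |
| 10 | 39 | 100 | 390 |
| 15 | 45 | 225 | 675 |
| 20 | 53 | 400 | 1060 |
| 25 | 60 | 625 | 1500 |
| 30 | 67 | 900 | 2010 |
| 35 | 75 | 1225 | 2625 |
| Summation: 140 | 370 | 3500 | 8415 |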
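The straight line that best approximates the data points is
$$y = mx + b$$
with slope
$$m = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
Something interesting: This best approximating straight line is best known as the Least Squares
Regression Line because we use the least squares method to find the formula to generate its slope.
Plugging the values from the table into the formula, we have
$$m = \frac{(7)(8415) - (140)(370)}{(7)(3500) - 140^2} = \frac{7105}{4900} = \frac{29}{20} = 1\tfrac{9}{20}$$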
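So we have
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{370}{7} = 52\tfrac{6}{7}$$
and
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{140}{7} = 20$$
And the y-intercept is
$$b = \bar{y} - m\bar{x} = \frac{370}{7} - \left(\frac{29}{20}\right)(20) = \frac{167}{7} = 23\tfrac{6}{7}$$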
∴ The least squares regression line that best approximates the data points is
$$y = \left(1\tfrac{9}{20}\right)x + 23\tfrac{6}{7}$$
$$y = 1.45x + 23.86$$
Or in terms of Q and P,
$$P = 1.45Q + 23.86$$
And when $Q = 17.5$ L/s, the estimated power required is obtained by plugging this value of Q into the
straight line that we obtained,
$$P = \left(1\tfrac{9}{20}\right)(17.5) + 23\tfrac{6}{7} \approx 49.23 \text{ kW}$$
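The hand calculation above can be reproduced exactly with Python's `fractions` module, which keeps the slope and intercept as fractions instead of rounded decimals. This is a minimal sketch using the Discharge and Power data from the table; the helper name `predict` is hypothetical.

```python
from fractions import Fraction

# Discharge Q (L/s) and Power P (kW), copied from the table above.
Q = [5, 10, 15, 20, 25, 30, 35]
P = [31, 39, 45, 53, 60, 67, 75]

n = len(Q)
sx, sy = sum(Q), sum(P)                       # 140 and 370
sxx = sum(x * x for x in Q)                   # 3500
sxy = sum(x * y for x, y in zip(Q, P))        # 8415

m = Fraction(n * sxy - sx * sy, n * sxx - sx ** 2)   # slope, reduces to 29/20
b = Fraction(sy, n) - m * Fraction(sx, n)            # intercept, y_bar - m * x_bar

def predict(q):
    """Estimated power (kW) at discharge q (L/s) from the regression line."""
    return m * q + b

p = predict(Fraction(35, 2))                  # Q = 17.5 L/s
print(m, b)                                   # prints 29/20 167/7
print(round(float(p), 2))                     # about 49.23 kW
```

Working in exact fractions avoids the small discrepancy that would arise from using the rounded intercept 23.86 in the prediction.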
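[1] On a rectilinear graph
[Figure: Resistance, R versus Area, A, plotted on a rectilinear graph]
[2] On a semi-log graph
[Figure: Resistance, R versus Area, A, plotted on a semi-log graph]
[3] On a log-log graph
[Figure: Resistance, R versus Area, A, plotted on a log-log graph]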
A visual inspection of the three graphs tells us that the data follows a Power Relationship since only its
log-log graph shows the data points following a straight line quite closely.
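We want the area to be the independent variable and the resistance to be the dependent variable, i.e. we
want to know what the resistance is when the area is of a particular value. So we let $x = A$ and $y = R$.
The power relationship
$$y = bx^m$$
can be expressed linearly as
$$\log_{10} y = m \log_{10} x + \log_{10} b$$
By letting $X = \log_{10} x$, $Y = \log_{10} y$ and $B = \log_{10} b$, we can rewrite the above equation as
$$Y = mX + B$$
where
$$B = \log_{10} b \Rightarrow b = 10^B$$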
Now, we expand the given table rewriting x and y in their logarithmic values,
| Area, A (mm²), x | Resistance, R (mΩ/m), y | $X = \log_{10} x$ | $Y = \log_{10} y$ | $X^2$ | $XY$ |
|---|---|---|---|---|---|
| 0.05 | 215 | -1.30103 | 2.332438 | 1.692679 | -3.03457 |
| 0.1 | 110 | -1 | 2.041393 | 1 | -2.04139 |
| 0.2 | 57 | -0.69897 | 1.755875 | 0.488559 | -1.2273 |
| 0.5 | 23 | -0.30103 | 1.361728 | 0.090619 | -0.40992 |
| 1 | 12 | 0 | 1.079181 | 0 | 0 |
| 3 | 4 | 0.477121 | 0.60206 | 0.227645 | 0.287256 |
| 5 | 2.5 | 0.69897 | 0.39794 | 0.488559 | 0.278148 |
| 10 | 1.3 | 1 | 0.113943 | 1 | 0.113943 |
| Summation: - | - | -1.12494 | 9.684558 | 4.988061 | -6.03384 |
$$Y = mX + B$$
Plugging the values from the table into the formula, we have
$$m = \frac{(8)(-6.03384) - (-1.12494)(9.684558)}{(8)(4.988061) - (-1.12494)^2} = \frac{-37.3762}{38.639} \approx -0.96732$$
So we have
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{9.684558}{8} = 1.21057$$
and
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{-1.12494}{8} = -0.14062$$
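And the y-intercept is
$$B = \bar{Y} - m\bar{X} = 1.21057 - (-0.96732)(-0.14062) \approx 1.07455$$
so that $b = 10^B \approx 11.87$. Hence
$$y = bx^m$$
$$\Rightarrow y = 11.87266\,x^{-0.96732}$$
Or, in terms of A and R,
$$R = 11.87266\,A^{-0.96732}$$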
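The power-law fit above can be checked numerically. The Python sketch below takes the Area and Resistance values from the table, fits a least-squares line to $(\log_{10} x, \log_{10} y)$, and converts the intercept back with $b = 10^B$. The helper names `fit_line` and `resistance` are hypothetical, and the slope formula is the standard least-squares one used in the calculation above.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return m, sy / n - m * sx / n

# Area A (mm^2) and Resistance R, copied from the table above.
A = [0.05, 0.1, 0.2, 0.5, 1, 3, 5, 10]
R = [215, 110, 57, 23, 12, 4, 2.5, 1.3]

X = [math.log10(a) for a in A]
Y = [math.log10(r) for r in R]

m, B = fit_line(X, Y)     # line Y = m*X + B in log-log coordinates
b = 10 ** B               # back to the power law R = b * A**m

def resistance(area):
    """Estimated resistance for a given area, from the fitted power law."""
    return b * area ** m

print(round(m, 5), round(b, 3))   # compare with -0.96732 and 11.87266
```

Small differences in the last digits against the hand calculation are expected, since the table values are rounded.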
We want to find the type of relationship between the radiation passing through a plate and the plate’s
thickness. So let us plot the data points.
[i] Data points plotted on a rectilinear graph paper
[Figure: Geiger Reading, G (rad/s) versus Plate Thickness, t (in), on rectilinear axes]
[ii] Data points plotted on a semi-log graph paper
[Figure: Geiger Reading, G (rad/s) versus Plate Thickness, t (in), on semi-log axes]
[iii] Data points plotted on a log-log graph paper
[Figure: Geiger Reading, G (rad/s) versus Plate Thickness, t (in), on log-log axes]
A visual inspection of the three graphs tells us that the data follows an Exponential Relationship since only
its semi-log graph shows the data points following a straight line closely.
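We want the Plate Thickness to be the independent variable and the Geiger Reading to be the dependent
variable, i.e. we want to know the amount of radiation passing through the plate when the plate’s
thickness is of a particular value. So we let $x = t$ and $y = G$. The exponential relationship
$$y = b \cdot 10^{mx}$$
can be expressed linearly as
$$\log_{10} y = mx + \log_{10} b$$
By letting $Y = \log_{10} y$ and $B = \log_{10} b$, we can rewrite the above equation as
$$Y = mx + B$$
where
$$B = \log_{10} b \Rightarrow b = 10^B$$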
Now, we expand the table, rewriting y in its logarithmic values,
| Plate Thickness, t (in), x | Geiger Reading, G (rad/s), y | $Y = \log_{10} y$ | $x^2$ | $xY$ |
|---|---|---|---|---|
| 0.1 | 5950 | 3.774517 | 0.01 | 0.377452 |
| 0.5 | 5780 | 3.761928 | 0.25 | 1.880964 |
| 1 | 5565 | 3.745465 | 1 | 3.745465 |
| 2.5 | 4975 | 3.696793 | 6.25 | 9.241983 |
| 5 | 4125 | 3.615424 | 25 | 18.07712 |
| 10 | 2835 | 3.452553 | 100 | 34.52553 |
| 20 | 1340 | 3.127105 | 400 | 62.5421 |
| 30 | 630 | 2.799341 | 900 | 83.98022 |
| 40 | 300 | 2.477121 | 1600 | 99.08485 |
| 50 | 140 | 2.146128 | 2500 | 107.3064 |
| Summation: 159.1 | - | 32.59637 | 5532.51 | 420.7621 |
$$Y = mx + B$$
Plugging the values from the table into the formula, we have
$$m = \frac{(10)(420.7621) - (159.1)(32.59637)}{(10)(5532.51) - (159.1)^2} = \frac{-978.462}{30012.29} \approx -0.0326$$
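Finding the y-intercept $B = \bar{Y} - m\bar{x}$ and then $b = 10^B$ in the same way as before, we have
$$y = b \cdot 10^{mx}$$
$$\Rightarrow y = (6002.556)\,10^{-0.0326x}$$
Or, in terms of t and G,
$$G = (6002.556)\,10^{-0.0326t}$$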
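The exponential fit can be checked the same way. The Python sketch below uses the Plate Thickness and Geiger Reading values from the table, fits a least-squares line to $(x, \log_{10} y)$, and recovers $b = 10^B$. The helper names `fit_line` and `geiger` are hypothetical, and the intercept is computed as $B = \bar{Y} - m\bar{x}$, which is an assumption carried over from the earlier examples.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return m, sy / n - m * sx / n

# Plate thickness t (in) and Geiger reading G (rad/s), copied from the table above.
t = [0.1, 0.5, 1, 2.5, 5, 10, 20, 30, 40, 50]
G = [5950, 5780, 5565, 4975, 4125, 2835, 1340, 630, 300, 140]

Y = [math.log10(g) for g in G]   # only the y-values are transformed

m, B = fit_line(t, Y)            # line Y = m*x + B in semi-log coordinates
b = 10 ** B                      # back to G = b * 10**(m*t)

def geiger(thickness):
    """Estimated Geiger reading for a given plate thickness."""
    return b * 10 ** (m * thickness)

print(round(m, 4), round(b, 1))  # compare with -0.0326 and 6002.556
```

The negative slope reflects that the reading falls as the plate gets thicker, and $b$ is the estimated reading at zero thickness.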