AI in Math
1. Order of matrix – If a matrix has 3 rows and 4 columns, the order of the matrix is
3 × 4, i.e. rows × columns.
2. Square matrix – A matrix in which the number of rows is equal to the
number of columns.
3. Diagonal matrix – A square matrix in which all the non-diagonal elements are equal
to 0.
4. Upper triangular matrix – A square matrix in which all the elements below
the diagonal are equal to 0.
5. Lower triangular matrix – A square matrix in which all the elements above the
diagonal are equal to 0.
6. Scalar matrix – A diagonal matrix in which all the diagonal elements are equal to some
constant k.
7. Identity matrix – A square matrix in which all the diagonal elements are
equal to 1 and all the non-diagonal elements are equal to 0.
8. Column matrix – A matrix that consists of only 1 column.
Sometimes, it is used to represent a vector.
9. Row matrix – A matrix that consists of only 1 row.
10. Trace – The sum of all the diagonal elements of a square matrix.
When we arrange a set of numbers in ‘m’ horizontal lines (called rows) and ‘n’ vertical
lines (called columns), this arrangement is called an m x n (m by n) matrix.
If A = | 1 2 3 |
       | 4 5 6 |
       | 7 8 9 |
The top row is row 1. The leftmost column is column 1. This matrix is a 3x3 matrix because it has
three rows and three columns. In describing matrices, the format is:
rows X columns
Each number that makes up a matrix is called an element of the matrix. The elements in a matrix
have specific locations.
The upper left corner of the matrix is [row 1 x column 1]. In the above matrix the element at row 1
column 1 is the value 1. The element at [row 2 x column 3] is the value 6.
A = |  3 |
    | −5 |
3. Square Matrix: A matrix in which the number of rows is equal to the number of columns.
A = | 1 2 3 |
    | 4 5 6 |
    | 7 8 9 |
4. Diagonal Matrix: A matrix with all elements zero except its leading diagonal.
A = | 2 0 0 |
    | 0 3 0 |
    | 0 0 4 |
5. Scalar Matrix: A matrix in which all the diagonal elements are equal and all other elements
are zero.
A = | 5 0 0 |
    | 0 5 0 |
    | 0 0 5 |
If every diagonal element is unity (1) and every non-diagonal element is equal to zero, the matrix
is called a Unit matrix (or Identity matrix).
A = | 1 0 0 |
    | 0 1 0 |
    | 0 0 1 |
1. Transpose
Transposing a matrix creates a new matrix with the rows and columns flipped. This is denoted
by the superscript T next to the matrix, written A^T.
C = A^T
A = | 1 2 |
    | 3 4 |
    | 5 6 |
A^T = | 1 3 5 |
      | 2 4 6 |
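The transpose example above can be checked with a few lines of Python. This is a minimal sketch that stores a matrix as a list of rows; the helper name `transpose` is chosen here for illustration and is not part of the original text.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*matrix) groups the first elements of every row, then the second, ...
    return [list(column) for column in zip(*matrix)]


A = [[1, 2],
     [3, 4],
     [5, 6]]

A_T = transpose(A)
print(A_T)  # [[1, 3, 5], [2, 4, 6]]
```

A 3 x 2 matrix becomes a 2 x 3 matrix, which matches the hand-worked result.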
Inverse
For matrices, there is no such thing as division. You can add, subtract or multiply but you can’t divide
them. There is a related concept, which is called "inversion".
Matrix inversion is a process that finds another matrix which, when multiplied with the original matrix, results
in an identity matrix. Given an n x n matrix A, find a matrix B such that
AB = BA = I_n
where I_n is the n x n identity matrix.
Calculating the inverse of a matrix is slightly complicated, so let us use an inverse matrix calculator
- https://matrix.reshish.com/inverCalculation.php
2. Determinant
Every square matrix is associated with a single number, which is known as its determinant.
For a 2 x 2 matrix
A = | a b |
    | c d |
the determinant is |A| = ad − bc.
Example 1
If A = | 2 4 |
       | 3 8 |
|A| = 2 x 8 – 4 x 3
= 16 – 12
= 4
Example 2: A = | 6  1 1 |
               | 4 -2 5 |
               | 2  8 7 |
|A| = −306
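Both determinant examples can be verified with a short sketch. The 2 x 2 rule is ad − bc; for the 3 x 3 case the code below expands along the first row (cofactor expansion), which is one common method and an assumption here, since the text does not show its working. The function names `det2` and `det3` are illustrative.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    a, b, c = m[0]
    minor_a = [[m[1][1], m[1][2]], [m[2][1], m[2][2]]]
    minor_b = [[m[1][0], m[1][2]], [m[2][0], m[2][2]]]
    minor_c = [[m[1][0], m[1][1]], [m[2][0], m[2][1]]]
    return a * det2(minor_a) - b * det2(minor_b) + c * det2(minor_c)


print(det2([[2, 4], [3, 8]]))                    # 4
print(det3([[6, 1, 1], [4, -2, 5], [2, 8, 7]]))  # -306
```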
There are 2 more matrix operations, i.e. Trace and Rank, which students are advised to explore
themselves.
What is Vector?
A two-dimensional array of numbers enclosed in brackets is a matrix, while a one-dimensional
array enclosed in brackets is a column vector, or simply a vector.
We begin by defining a vector, a set of n numbers which we shall write in the form
    | x1  |
    | x2  |
x = | x3  |
    | ... |
    | xn  |
This object is called a column vector. Vectors are often represented using a lowercase character such
as “v”; for example: v = (v1, v2, v3), where v1, v2, v3 are scalar values, often real values.
For instance, in the popular machine learning example of housing price prediction, we might have
features (table columns) including a house's year of construction, number of bedrooms, area (m^2),
and size of garage (auto capacity). This would give input vectors such as
[ 1988 4 200 2 ]
[ 2001 3 220 1 ]
1. Vector Addition
The new vector has the same length as the other two, and each element is the sum of the elements at the
same index: x = y + z = (y1 + z1, y2 + z2, y3 + z3)
2. Vector Subtraction
A vector can be subtracted from another vector of equal length to create a new third
vector.
x = y − z
As with addition, the new vector has the same length as the parent vectors and each element of the
new vector is calculated as the subtraction of the elements at the same indices.
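Element-wise addition and subtraction can be sketched as follows. Vectors are stored as plain Python lists, and the sample values for y and z are made up for illustration. The length check reflects the requirement that both vectors have equal length.

```python
def add_vectors(y, z):
    """Add two vectors of equal length, element by element."""
    if len(y) != len(z):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(y, z)]


def subtract_vectors(y, z):
    """Subtract z from y, element by element."""
    if len(y) != len(z):
        raise ValueError("vectors must have the same length")
    return [a - b for a, b in zip(y, z)]


y = [1, 2, 3]  # illustrative values
z = [4, 5, 6]

print(add_vectors(y, z))       # [5, 7, 9]
print(subtract_vectors(y, z))  # [-3, -3, -3]
```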
3. Vector Multiplication
If we perform a scalar multiplication, there is only one type of operation: multiply a scalar with a
scalar and obtain a scalar result,
a × b = c
But vectors have a different story. There are two different kinds of multiplication: one in which the
result of the product is a scalar, and the other in which the result of the product is a vector (there is a third one
also, which gives a tensor result, but it is out of scope for now).
To begin, let’s represent vectors as column vectors. We’ll define the vectors A and B as the column
vectors
A = | Ax |     B = | Bx |
    | Ay |         | By |
    | Az |         | Bz |
We’ll now see how the two types of vector multiplication are defined in terms of these column
vectors and the rules of matrix arithmetic -
Scalar: a quantity which has only magnitude, no direction. Vector: a quantity which has both magnitude and
direction.
The first type of vector multiplication is called the dot product, written as A.B. The vector dot
product, the multiplication of one vector by another, gives a scalar result.
[Where do we use it in AI? This operation is used in machine learning to calculate weights. Please
refer to “weight” in Unit 2: Deep Learning]
Let i = unit vector along the direction of the x-axis,
j = unit vector along the direction of the y-axis,
k = unit vector along the direction of the z-axis.
If there are 2 vectors, vector a = a1i + a2j + a3k and vector b = b1i + b2j + b3k
| a1 a2 a3 |
| b1 b2 b3 |
Example 1
Using the formula for the dot product of three-dimensional vectors, a. b = a1b1 + a2b2 + a3b3,
Practice Sum -1: Calculate the dot product of c = (−4, −9) and d = (−1,2).
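The dot product formula a . b = a1b1 + a2b2 + a3b3 can be sketched in Python as below. The vectors used are illustrative values, not taken from the text; the same function can be used to check the practice sum once it has been worked by hand.

```python
def dot(a, b):
    """Dot product: multiply matching components and add the results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


a = (1, 2, 3)  # illustrative values
b = (4, 5, 6)

print(dot(a, b))  # 1*4 + 2*5 + 3*6 = 32
```

The result is a single number (a scalar), even though both inputs are vectors.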
Matrices are a foundational element of linear algebra. Matrices are used in machine learning to
process the input data variables when training a model.
If A and B are two matrices of order m x n (meaning each has m rows and n columns), then their sum A + B is
a matrix of order m x n, obtained by adding the corresponding elements of A and B.
A = | 12  1 |     B = |  8 9 |
    |  3 -5 |         | -1 4 |
A + B = | 12 + 8      1 + 9  |  =  | 20 10 |
        | 3 + (-1)   -5 + 4  |     |  2 -1 |
Let A = [a_ij] be an m x n matrix and K be any number, called a scalar. Then the matrix obtained by
multiplying every element of A by the scalar K is denoted by KA.
If A = | 12  1 |
       |  3 -5 |
and K = 2
Then KA = | 24   2 |
          |  6 -10 |
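Matrix addition and scalar multiplication, as worked above, can be sketched with nested lists. The function names are illustrative; the matrices are the ones used in the examples.

```python
def add_matrices(A, B):
    """Add two matrices of the same order, element by element."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same order")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale_matrix(K, A):
    """Multiply every element of A by the scalar K."""
    return [[K * a for a in row] for row in A]


A = [[12, 1],
     [3, -5]]
B = [[8, 9],
     [-1, 4]]

print(add_matrices(A, B))  # [[20, 10], [2, -1]]
print(scale_matrix(2, A))  # [[24, 2], [6, -10]]
```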
Two matrices with the same size can be multiplied together element by element, and this is often called element-wise
matrix multiplication.
Two matrices A and B can be multiplied (for the product AB) if the number of columns in A (the pre-multiplier)
is the same as the number of rows in B (the post-multiplier).
If A = [a_ij] is of order m x n and B = [b_ij] is of order n x p, then AB is of order m x p.
Pre-multiplier A = | 2 -3  4 |   and post-multiplier B = |  2  5 |
                   | 3  6 -1 |                           | -1  0 |
                                                         |  4 -2 |
To multiply A and B, we multiply each row of A by each column of B, element by element, and add the products:
(first row of A) x (first column of B), (first row of A) x (second column of B),
(second row of A) x (first column of B), (second row of A) x (second column of B)
For example
AB = | 4 + 3 + 16    10 + 0 − 8 |
     | 6 − 6 − 4     15 + 0 + 2 |
   = | 23  2 |
     | -4 17 |
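The row-by-column rule can be sketched as a small function. It checks that the number of columns of A equals the number of rows of B before multiplying; the name `matmul` is illustrative.

```python
def matmul(A, B):
    """Multiply matrix A (m x n) by matrix B (n x p), giving an m x p matrix."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # zip(*B) iterates over the columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[2, -3, 4],
     [3, 6, -1]]
B = [[2, 5],
     [-1, 0],
     [4, -2]]

print(matmul(A, B))  # [[23, 2], [-4, 17]]
```

Here A is 2 x 3 and B is 3 x 2, so the product is a 2 x 2 matrix.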
Three people denoted by P1, P2, P3 intend to buy some rolls, buns, cakes and bread. Each of them
needs these commodities in different amounts and can buy them in two shops S1, S2. Which shop is
the best for each person P1, P2, P3 to pay as little as possible? The individual prices and desired
quantities of the commodities are given in the following tables:
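The demands are given by the matrix P (rows: P1, P2, P3; columns: rolls, buns, cakes, bread) and the
prices by the matrix Q (rows: rolls, buns, cakes, bread; columns: S1, S2):
P = | 6 5 3 1 |
    | 3 6 2 2 |
    | 3 4 3 1 |
Q = | 1.50   1    |
    | 2      2.50 |
    | 5      4.50 |
    | 16     17   |
R = PQ = | 50     49    |
         | 58.50  61    |
         | 43.50  43.50 |
The first row of R expresses the amount spent by the person P1 in the shop S1 (the element r11) and in the shop S2
(the element r12). Hence, it is optimal for the person P1 to buy in the shop S2 and for the person P2 in
S1, while the person P3 will pay the same price in S1 as in S2.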
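The shopping problem can be reproduced with a short sketch. The entries of P and Q below are an assumption: they are the demand and price values that are consistent with the product R quoted above (rows of P are persons, columns are rolls, buns, cakes and bread). The helper names are illustrative.

```python
def matmul(A, B):
    """Multiply matrix A (m x n) by matrix B (n x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Assumed demand matrix: one row per person, columns = rolls, buns, cakes, bread
P = [[6, 5, 3, 1],
     [3, 6, 2, 2],
     [3, 4, 3, 1]]
# Assumed price matrix: one row per commodity, columns = shops S1, S2
Q = [[1.50, 1],
     [2, 2.50],
     [5, 4.50],
     [16, 17]]

R = matmul(P, Q)
shops = ["S1", "S2"]
for person, costs in zip(["P1", "P2", "P3"], R):
    # A person may have more than one cheapest shop when the totals tie
    cheapest = [s for s, c in zip(shops, costs) if c == min(costs)]
    print(person, costs, "cheapest:", cheapest)
```

Each row of R holds one person's total bill in each shop, so the smallest entry in a row identifies the best shop for that person.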
Activity 2
A has INR 1000 worth stock of Apple, INR 1000 worth of Google and INR 1000 worth of Microsoft. B
has INR 500 of Apple, INR 2000 of Google and INR 500 of Microsoft.
Suppose a news broke and Apple jumps 20%, Google drops 5%, and Microsoft stays the same. What
is the updated portfolio of A and B and net profit /loss from the event?
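The original stock price matrix looks like:
| Apple        |     | 1 0 0 |
| Google       |     | 0 1 0 |
| Microsoft    |     | 0 0 1 |
| Profit (+/-) |     | 0 0 0 |
After the news, the matrix becomes:
| Apple        |     | 1.2   0      0 |
| Google       |     | 0     0.95   0 |
| Microsoft    |     | 0     0      1 |
| Profit (+/-) |     | 0.2   -0.05  0 |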
Now let’s feed in the portfolios for A (INR 1000, 1000, 1000) and B (INR 500, 2000, 500). We can
crunch the numbers by hand.
Input Interpretation -
| 1.2   0      0 |       | 1000   500 |       | 1200   600 |
| 0     0.95   0 |   *   | 1000  2000 |   =   |  950  1900 |
| 0     0      1 |       | 1000   500 |       | 1000   500 |
| 0.2   -0.05  0 |                             |  150     0 |
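The portfolio update can be checked with a small sketch. The update matrix below is built from the stated news (Apple +20%, Google −5%, Microsoft unchanged) with a fourth row that adds up the gains and losses; the variable names are illustrative. Results are rounded to two decimals to avoid floating-point noise.

```python
def matmul(A, B):
    """Multiply matrix A (m x n) by matrix B (n x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Rows: new Apple value, new Google value, new Microsoft value, profit (+/-)
update = [[1.2, 0, 0],
          [0, 0.95, 0],
          [0, 0, 1],
          [0.2, -0.05, 0]]
# Columns: investor A, investor B; rows: Apple, Google, Microsoft (in INR)
portfolios = [[1000, 500],
              [1000, 2000],
              [1000, 500]]

result = [[round(value, 2) for value in row]
          for row in matmul(update, portfolios)]
for label, row in zip(["Apple", "Google", "Microsoft", "Profit"], result):
    print(label, row)
```

Investor A ends with INR 1200, 950 and 1000 and a net profit of INR 150, while investor B's gain on Apple is cancelled by the loss on Google, giving a net change of 0.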
The key is understanding why we’re setting up the matrix like this, not blindly crunching numbers.
This is the algorithm which runs behind any electronic spreadsheet (e.g. MS Excel) when you do
what-if analysis.
In this module, we will try to explain the confluence of set theory, which is a branch of mathematics,
and the relational database (RDBMS), which is a part of computer science. A lot of things are going
to come together today, because we are going to learn how set theory principles help in retrieving
data from a database, which in turn is used by an AI model for its training. The important
topics which we are going to cover in this strand are as below:
Before we get into the actual relation between sets and databases, we first need to understand
what these terms refer to.
A set is an unordered collection of objects, known as elements or members of the set. ‘a ∈ A’ denotes
that an element a belongs to a set A, and ‘a ∉ A’ denotes that a is not an element of the set A. So
a set is a mathematical concept, and the way we relate sets to other sets is called set theory.
We use a database (like Oracle, MS SQL Server, MySQL etc.) to store digital data. A database is made up of
several components, of which the table is the most important. The database stores its data in tables.
Without tables, there would not be much significance to the DBMS.
For example, student John Smith, participated in swimming and he must have paid $17.
The data in the tables of a database are of limited value unless the data from different tables are
combined and manipulated to generate useful information. And from here, the role of relational
algebra begins.
Relational algebra is a set of algebraic operators and rules that manipulate relational tables to
yield the desired information. Relational algebra takes relations (tables) as its operands and returns
a relation (table) as its result. Relational algebra consists of eight operators:
1. SELECT also known as RESTRICT, yields values for all the rows found in a table that satisfy a
given condition. SELECT yields a horizontal subset of a table as shown in the above diagram.
2. PROJECT yields all values for selected attributes. PROJECT yields a vertical subset of a table.
Please refer the above picture.
3. PRODUCT yields all possible pairs of rows from two tables- also known as Cartesian product.
Therefore, if one table has three rows and the other table has two, the PRODUCT yields a list
composed of 3 x 2= 6 rows as shown in the picture.
4. JOIN allows the combination of two or more tables based on common attributes. Please
refer the above picture.
5. UNION returns a table containing all records that appear in either or both of the specified
tables as shown in the diagram.
6. INTERSECTION returns only those rows that appear in both tables, see the diagram above.
7. DIFFERENCE returns all rows in one table that are not found in the other table, that is, it
subtracts one table from the other, as shown in the diagram above.
8. DIVIDE is typically required when you want to find the entities that are interacting with all
entities of a set of a different type.
Say, for example, we want to find a person who has an account in all the banks of a city.
The division operator is used when we have to evaluate queries which contain the keyword ‘all’.
Division is not supported by SQL directly. However, it can be represented using other operations (like
CROSS JOIN, EXCEPT, IN).
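Some of the operators above can be sketched in Python by treating a relation as a set of tuples. This is only an illustration of the ideas, not how a database engine implements them; the tables `accounts` and `banks`, their contents and the function names are all made up for the example.

```python
from itertools import product

# Hypothetical relations: accounts(person, bank) and banks(bank)
accounts = {("Asha", "SBI"), ("Asha", "HDFC"), ("Ravi", "SBI")}
banks = {("SBI",), ("HDFC",)}


def select(relation, predicate):
    """SELECT / RESTRICT: keep the rows that satisfy a condition."""
    return {row for row in relation if predicate(row)}


def project(relation, indices):
    """PROJECT: keep only the chosen columns (duplicates collapse in a set)."""
    return {tuple(row[i] for i in indices) for row in relation}


def cartesian(r, s):
    """PRODUCT: every row of r paired with every row of s."""
    return {a + b for a, b in product(r, s)}


def divide(r, s):
    """DIVIDE: values x such that (x, y) is in r for ALL (y,) in s."""
    return {x for x in project(r, [0])
            if all((x[0], y[0]) in r for y in s)}


print(select(accounts, lambda row: row[1] == "SBI"))  # rows for bank SBI
print(project(accounts, [0]))                         # distinct persons
print(len(cartesian(accounts, banks)))                # 3 x 2 = 6 rows
print(divide(accounts, banks))                        # {('Asha',)}
```

The last line answers the "all" style of query from the text: only Asha has an account in every bank listed.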
When two or more sets are combined to form another set under the mathematical principles of
sets, the process of combining the sets is called a set operation.
Keeping these two sets as our example, let us perform four important set operations:
Union of the sets A and B is the set whose elements are the distinct elements of set A or set B or both.
A U B = {2, 3, 4, 5}
Intersection of set A and set B is the set of elements that belong to both A and B. A∩B = {3, 4}
Complement of a set A is the set of all elements of the universal set that are not in A.
A′ = {5}
Difference between sets, denoted by ‘A – B’, is the set containing the elements of set A that are not in set B.
v) Cartesian Product
Remember the terms used when plotting a graph, like the axes (x-axis, y-axis). For example, (2, 3) means
that the value on the x-axis is 2 and that on the y-axis is 3, which is not the same as (3, 2).
The way of representation is fixed: the value of the x-coordinate comes first and then that of
y (an ordered way). Cartesian product means the product of the elements, say x and y, in an ordered way.
If A and B are two non-empty sets, then the Cartesian product of set A and set B is the set of all
ordered pairs (a, b) such that a ∈ A and b ∈ B, which is denoted as A × B.
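Python's built-in `set` type supports these operations directly, so they are easy to try out. The sets below are an assumption: A = {2, 3, 4} and B = {3, 4, 5} are chosen because they reproduce the union, intersection and complement quoted in the text, with the universal set taken to be A ∪ B.

```python
from itertools import product

A = {2, 3, 4}        # assumed example set
B = {3, 4, 5}        # assumed example set
universal = A | B    # assumed universal set for the complement

print(A | B)          # union: {2, 3, 4, 5}
print(A & B)          # intersection: {3, 4}
print(universal - A)  # complement of A: {5}
print(A - B)          # difference A - B: {2}

# Cartesian product: all ordered pairs (a, b) with a in A and b in B
pairs = set(product(A, B))
print(len(pairs))                      # 3 x 3 = 9 ordered pairs
print((2, 3) in pairs, (5, 2) in pairs)  # True False: order matters
```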
You may have understood by now that relational databases are based almost entirely upon set
theory. In fact, if you’ve ever worked with a database or queried one with SQL, you’re probably familiar with
the idea of finding records in database tables. Finding records in database tables is nothing
but some form of set operation.
Look at the diagram below, all possible table join operations have been summarized here for your
quick reference:
(http://stackoverflow.com/questions/406294/left-join-vs-left-outer-join-in-sql-server)
What do we mean in fact by joining tables? Joining tables is essentially a Cartesian product followed
by a selection criterion (did you notice, both are set theory operations?). The JOIN operation also allows joining
variously related records from different relations (tables).
1. (INNER) JOIN
In an inner join, only those tuples that satisfy the matching criteria are included, while the rest are
excluded. Let's study various types of Inner Joins.
Select records from the first (left-most) table with matching right table records.
In a left outer join, all tuples in the left relation are kept. However, if no
matching tuple is found in the right relation, then the attributes of the right relation in the join result are
filled with null values.
Select records from the second (right-most) table with matching left table records.
In a right outer join, all tuples in the right relation are kept. However, if no
matching tuple is found in the left relation, then the attributes of the left relation in the join result
are filled with null values.
Selects all records that match either left or right table records.
In a full outer join, all tuples from both relations are included in the result, irrespective of the
matching condition.
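The four joins described above can be sketched in plain Python to show which rows each one keeps. This is an illustration only, not how a database executes a join; the two tables, their contents and the function names are hypothetical. Each row is a (key, value) pair and `None` plays the role of SQL's NULL.

```python
students = [(1, "Asha"), (2, "Ravi"), (3, "Meena")]  # (roll_number, name)
marks = [(1, 88), (2, 75), (4, 60)]                  # (roll_number, marks)


def inner_join(left, right):
    """Keep only the rows whose keys match in both tables."""
    return [(k, lv, rv) for k, lv in left for rk, rv in right if k == rk]


def left_join(left, right):
    """Keep every left row; fill the right side with None when unmatched."""
    result = []
    for k, lv in left:
        matches = [rv for rk, rv in right if rk == k]
        for rv in matches or [None]:
            result.append((k, lv, rv))
    return result


def right_join(left, right):
    """Mirror of left_join, with columns put back in (key, left, right) order."""
    return [(k, lv, rv) for k, rv, lv in left_join(right, left)]


def full_outer_join(left, right):
    """All rows from both tables, matched where possible."""
    left_keys = {k for k, _ in left}
    unmatched_right = [(k, None, rv) for k, rv in right if k not in left_keys]
    return left_join(left, right) + unmatched_right


print(inner_join(students, marks))
print(left_join(students, marks))
print(right_join(students, marks))
print(full_outer_join(students, marks))
```

With this data the inner join returns two rows, the left and right joins return three rows each, and the full outer join returns four.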
Question 2: Which operation of set theory is equivalent to the ‘Product’ operation of relational
algebra?
Question 3: Specify whether each statement below is true or false:
i) A SQL query that calls for a FULL OUTER JOIN is merely returning the union of two sets.
ii) Finding the LEFT JOIN of two tables is nothing more than finding the set difference or the
relative complement of the two tables.
Question 4: Think of an entity such as students, employees or sports, and create 3 tables for any one
entity of your choice.
For example
Entity: Students
Students Table (name, roll number, age, class, address) Marks Table (roll number, subject, marks
obtained)
I won’t be wrong if I say that Artificial Intelligence (Machine Learning / Deep Learning) is an engine that
needs data as fuel, so data is the primary building block of AI. And to understand data, statistics is
the key.
The purpose of this module is not to replace the statistics that you will study as a part of
Mathematics in your school, but to introduce you to statistics from the perspective of Artificial
Intelligence and Machine Learning.
This module on statistics, is divided into the following parts:
3.3. Activities
Statistics is the science of data: a collection of mathematical techniques that help to
extract information from data. From the AI perspective, statistics transforms observations into
information that you can understand and share. You will learn more about statistics and statistical
methods in your next level, i.e. Level-2.
Usually, statistics deals with large datasets (the population of a country, the country-wise number of people
infected by the CORONA virus, and similar datasets). For understanding and analysis, we
need a data point, be it a number or a set of numbers, which can represent the whole domain of data,
and this data point is called the central tendency.
“Central tendency” is the summary of a data set in a single value that represents the entire
distribution of the data domain (or data set). One important point to highlight here is
that central tendency does not talk about the individual values in the dataset, but gives a
comprehensive summary of the whole data domain.
3.1.1. Mean
In statistics, the mean (more technically the arithmetic mean or sample mean) can be estimated
from a sample of examples drawn from the domain. It is the quotient obtained by dividing the total of
the values of a variable by the total number of observations or items.
If we have n values in a data set and they have values x1, x2, x3, …, xn, the sample mean is
M = (x1 + x2 + x3 + … + xn) / n
Example 1
Example 2
Class      Frequency
2 – 4      3
4 – 6      4
6 – 8      2
8 – 10     1
Solution
Class      Frequency (f)   Mid value (x)   f⋅x
2 – 4      3               3               9
4 – 6      4               5               20
6 – 8      2               7               14
8 – 10     1               9               9
           n = 10                          ∑f⋅x = 52
Mean = ∑f⋅x / n
= 52 / 10
= 5.2
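The grouped-data calculation above can be sketched in Python: each class is replaced by its mid value, multiplied by its frequency, and the total is divided by the number of observations. The function name and the (lower, upper, frequency) layout are choices made for this illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data given as (lower, upper, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    # Use the mid value of each class as the representative value x
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


table = [(2, 4, 3), (4, 6, 4), (6, 8, 2), (8, 10, 1)]
print(grouped_mean(table))  # 52 / 10 = 5.2
```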
1. The mean is more stable than the median and the mode. So when the measure of central
tendency with the greatest stability is wanted, the mean is used.
3. When you want a result that is not affected by sampling fluctuations.
1. The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set, being especially
small or large in numerical value.
Staff     1     2     3     4     5     6     7     8     9     10
Salary    15k   18k   16k   14k   15k   15k   12k   17k   90k   95k
Mean salary = (15 + 18 + 16 + 14 + 15 + 15 + 12 + 17 + 90 + 95) / 10
= 307 / 10
= 30.7k
The mean salary for these ten staff is INR 30.7k. However, inspecting the raw data suggests that this
mean value might not be the best way to accurately reflect the typical salary of a worker, as most
workers have salaries in the INR 12k to INR 18k range. The mean is being skewed by the two large
salaries. Therefore, in this situation, we would like to have a better measure of central tendency. As
we will find out later, taking the median would be a better measure of central tendency in this
situation.
2. Sometimes it gives absurd values. For example, there are 41, 44 and 42 students in classes VIII,
IX and X of a school. So the average number of students per class is 42.33, which is never possible.
Why did a 6 ft tall man drown while crossing a swimming pool which was, on average, 5 ft deep?
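The effect of outliers on the mean can be seen by running the salary example through Python's standard `statistics` module. The salaries are the ten values from the table above (in thousands of INR).

```python
import statistics

# Salaries of the ten staff members, in thousands of INR
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(mean_salary)    # 30.7, pulled up by the two large salaries
print(median_salary)  # 15.5, close to what most staff actually earn
```

The two large salaries roughly double the mean, while the median stays inside the INR 12k to 18k range where most of the staff fall.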
3.1.2. Median
The median is another measure of central tendency. It is the positional value of the variable which
divides the group into two equal parts, one part comprising all values greater than the median and the other
all values smaller than the median.
For example, consider the data:
17 32 35 15 21 41 32 11 10 20 27 28 30
We arrange this data in ascending or descending order: 10, 11, 15, 17, 20, 21, 27, 28, 30, 32, 32,
35, 41
As 27 is in the middle of this data position-wise, Median = 27.
How to find median values?
Use Case 1
In the case of ungrouped data, the scores are arranged in order of size. Then the midpoint is found,
which is the median. Two situations arise in the computation of the median: (a) N is odd, (b) N
is even. First we shall discuss how to compute the median (Mdn) when N is odd.
Step 2: If there is an odd number of values, locate the middle value so that there is an equal
number of values to the left and to the right. If there is an even number of values, locate the two
middle values so that there is an equal number of values to the left and to the right of these two
values.
Step 3: If there is an odd number of values, this middle value is the median. If there is an even
number of values, add the two middle values and divide by 2. The result will be the median.
Example 2
In your class, 5 students scored following marks in the unit test mathematics, find median value
Solution:
Example 2
In your class, 6 students scored the following marks in the unit test in mathematics. Find the median value: 11,
11, 14, 18, 20, 22
Solution
The total count is an even number, so the median is the average of the two middle numbers: (14 + 18) / 2 = 16.
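The steps for ungrouped data can be sketched as a small function: sort the values, then take the middle value (odd count) or the average of the two middle values (even count). The function name is illustrative; Python's `statistics.median` does the same job.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[middle]
    # Even count: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([17, 32, 35, 15, 21, 41, 32, 11, 10, 20, 27, 28, 30]))  # 27
print(median([11, 11, 14, 18, 20, 22]))                              # 16.0
```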
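Median = l + [(N/2 − c.f.) / f] × i
where l = lower limit of the median class, N = total frequency,
c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class,
i = class size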
Example -1:
Class      Number of workers (f)   Cumulative frequency (c.f.)
0-10       22                      22
10-20      38                      60
20-30      46                      106
30-40      35                      141
40-50      20                      161
N = 161
(161 + 1) / 2 = 81
(as the 81st item falls after a cumulative frequency of 60 and within 106 in the cumulative frequency table, the median group is the 20-30 group)
M = 20 + [(161/2 − 60) / 46] × 10
= 20 + [(80.5 − 60) / 46] × 10
= 20 + (20.5 × 10) / 46
= 20 + 4.46
Median = 24.46
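The grouped-data median can be sketched as below. The code finds the first class whose cumulative frequency reaches N/2 and then interpolates inside that class using l + ((N/2 − c.f.) / f) × i, matching the worked example. The function name and the (lower, upper, frequency) layout are assumptions made for the illustration.

```python
def grouped_median(classes):
    """Median of grouped data given as ascending (lower, upper, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            # This is the median class: interpolate within it
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f


workers = [(0, 10, 22), (10, 20, 38), (20, 30, 46), (30, 40, 35), (40, 50, 20)]
print(round(grouped_median(workers), 2))  # 24.46
```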
3.1.3. Mode
Mode is another important measure of central tendency of a statistical series. It is the value which
occurs most frequently in the data series. It is represented by the highest bar in a bar chart
or histogram. You can, therefore, sometimes consider the mode as being the most popular option.
An example of a mode is presented below:
In this method the mode is determined just by observation. We use the inspection method for an
individual series; it involves just an inspection of the series. One simply identifies the
value that occurs most frequently in the series; such a value is called the mode.
Age (years) 22, 24, 17, 18, 17, 19, 18, 21, 20, 21, 20, 23, 22, 22, 22,22,21,24
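Finding the mode by inspection amounts to counting how often each value occurs. A minimal sketch using `collections.Counter` on the ages listed above is shown below; the function returns a list so that series with more than one most frequent value are handled too. The function name is illustrative.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


ages = [22, 24, 17, 18, 17, 19, 18, 21, 20, 21, 20, 23, 22, 22, 22, 22, 21, 24]
print(modes(ages))             # [22]
print(Counter(ages)[22])       # 22 occurs 5 times
```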
For a frequency distribution, the method for calculating the mode is somewhat different. Here we have to
find a modal class. The modal class is the one with the highest frequency value. The class just before
the modal class is called the pre-modal class, whereas the class just after the modal class is known
as the post-modal class. Lastly, the following formula is applied for the calculation of the mode:
Mode = l + h [(f1 − f0) / (2f1 − f0 − f2)]
where l = lower limit of the modal class, h = class size, f1 = frequency of the modal class,
f2 = frequency corresponding to the post-modal class,
and f0 = frequency corresponding to the pre-modal class.
Example – 2: Calculate the mode for the
following data:
Frequency 3 10 15 10 2
Answer: As the frequency for class 30-40 is maximum, this class is the modal class. Classes 20- 30 and
40-50 are pre-modal and post-modal classes respectively. The mode is:
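The formula Mode = l + h [(f1 − f0) / (2f1 − f0 − f2)] can be evaluated with a few lines of Python. The inputs below are read off the example: the modal class 30-40 has frequency 15, and its neighbouring classes each have frequency 10. The class width of 10 is taken from the class limits 30-40; the function and parameter names are illustrative.

```python
def grouped_mode(l, h, f1, f0, f2):
    """Mode of grouped data.

    l  = lower limit of the modal class
    h  = class size
    f1 = frequency of the modal class
    f0 = frequency of the pre-modal class
    f2 = frequency of the post-modal class
    """
    return l + h * (f1 - f0) / (2 * f1 - f0 - f2)


# Modal class 30-40 (frequency 15); pre- and post-modal frequencies are both 10
print(grouped_mode(l=30, h=10, f1=15, f0=10, f2=10))  # 35.0
```

Because the classes on either side of the modal class have equal frequencies, the mode falls exactly in the middle of the modal class.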
More than one value may command the highest frequency in the series. In such cases grouping
method of calculation is used.
The mean is a good measure of central tendency when a data set contains values that are
relatively evenly spread, with no exceptionally high or low values. The median is a good
measure of the central value when the data include exceptionally high or low values. The median is also
the most suitable measure of average for data classified on an ordinal scale. The mode is used when
you need to find the peak of the distribution, and there may be more than one peak.
For example, it is important to print more of the most popular books; because printing different
books in equal numbers would cause a shortage of some books and an oversupply of others.
Measures of central tendency (mean, median and mode) provide the central value of the data set.
Measures of dispersion (such as quartiles, percentiles and ranges)
provide information on the spread of the data around the centre.
In this section we will look at two more measures of dispersion: variance and standard deviation.
Let us measure the height (at the shoulder) of 5 dogs (in millimetres)
As you can see, their heights are: 600mm, 470mm, 170mm, 430mm and 300mm. Let us calculate
their mean,
= 1970 / 5
= 394 mm
Now let us plot again after taking mean height (The green Line)
Now, let us find the deviation of each dog's height from the mean height.
Calculate each difference (from the mean height), square it, and find the average of the squares. This average is the
value of the variance.
= 108520 / 5
= 21704
And standard deviation is the square root of the variance. Standard deviation = √21704 = 147.32
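The dog-height calculation can be reproduced with a short sketch. It follows the method used in the text: average the squared deviations over all n values (the population variance), then take the square root. The function names are illustrative.

```python
from math import sqrt


def population_variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def population_std(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(population_variance(values))


heights = [600, 470, 170, 430, 300]  # dog heights in mm

print(sum(heights) / len(heights))        # mean: 394.0
print(population_variance(heights))       # 21704.0
print(round(population_std(heights), 2))  # 147.32
```

Note that dividing by n, as done here and in the text, gives the population variance; some tools divide by n − 1 instead when working with a sample.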
I am assuming that the example above, must have given you a clear idea about the variance and
standard deviation.
So just to summarize, variance is the average of the squared differences between each number and the mean.
In order to calculate the variance, first calculate the deviation of each data point from the mean, and
square each result.
For example, for the data 2, 4, 4, 4, 5, 5, 7, 9:
Mean = 5
Then the sum of squares of differences between all numbers and the mean = (2-5)² + (4-5)² + (4-5)² + (4-5)²
+ (5-5)² + (5-5)² + (7-5)² + (9-5)²
= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16
= 32
Variance = 32 / 8
= 4
Standard Deviation is square root of variance. It is a measure of the extent to which data varies from
the mean.
• A small variance indicates that the data points tend to be very close to the mean, and to
each other.
• A high variance indicates that the data points are very spread out from the mean, and from
one another.
• A low standard deviation indicates that the data points tend to be very close to the mean
• A high standard deviation indicates that the data points are spread out over a large range of
values.
Activity 2
Can you perform a piece of statistical research on “The time students spend on social media”?
Condition 1: You will collect the data outside of your school
Once you have the data ready, do your statistical analysis (central tendency, variance and
standard deviation) and present your story.
3. Visual representation of data
This module provides an introduction to the purpose, importance and various methods of
data representation using graphs. Statistics is the science of data, so we deal with large volumes of data in
statistics and Artificial Intelligence. Whenever the volume of data increases rapidly, an efficient and
convenient technique for representing data is needed. The human brain is more comfortable dealing with
a large and complex quantity of data when it is represented in a visual format.
And that is how the need arises for the graphical representation of data. The important topics that we
are going to cover in this module are:
There could be various reasons for representing data on graphs; a few of them are outlined
below
• The purpose of a graph is to present data that are too huge in volume or too complicated to
be described in text or tables.
• Graphs not only represent the data but also reveal relations between variables and show the
trends in data sets.
A graph is a chart or diagram through which data are represented in the form of lines or curves drawn through
coordinate points, and it shows the relation between variable quantities.
There are some algebraic and coordinate geometry principles which apply in drawing graphs of any
kind.
Graphs have two axes: the vertical one is called the Y-axis and the horizontal one is called the X-axis. The X and Y
axes are perpendicular to each other. The intersection of these two axes is called ‘0’ or the Origin. On
the X-axis, distances to the right of the origin have a positive value (see fig. 7.1) and distances to the left of the
origin have a negative value. On the Y-axis, distances above the origin have a positive value and distances below
the origin have a negative value.
There are many characteristics of bar graphs that make them useful. Some of these are that:
• They clearly show trends in data, meaning that they show how one variable is affected as the
other rises or falls.
• Given one variable, the value of the other can be easily determined.
Example 1
The percentage of total income spent under various heads by a family is given below.
40%
10%
10%
15%
20%
5%
3.3.2 Histogram
A histogram is drawn on a natural scale, in which the frequencies of the different classes of
values are represented by vertical rectangles drawn close to each other.
One measure of central tendency, the mode, can be easily determined with the help of this graph.
A histogram is easy to draw and simple to understand, but it has one limitation: we cannot plot
more than one data distribution on the same axes as a histogram.
Example 1
Below is the waiting time of the customer at the cash counter of a bank branch during peak hours.
You are required to create a histogram based on the below data.
Represent the above data in the form of a histogram
A scatter plot is a way to represent data on a graph which is similar to a line graph. A line graph
uses a line on an X-Y axis, while a scatter plot uses dots to represent individual pieces of data. In
statistics, these plots are useful to see whether two variables are related to each other. For example, a
scatter chart can suggest a linear relationship (i.e. a straight line).
There is no line; the dots represent the values of the variables on the graph.
Example 1
Here are the prices of 1460 apartments and their ground living areas. This dataset comes from a Kaggle
( https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data) machine learning
competition. You can read more about this example here
(Source: https://www.data-to-viz.com/story/TwoNum.html )
The scatter plot is the most frequently used data plotting technique in machine learning.
• It is used to observe the relationship between two numeric variables. The dots on the plot
denote not only the values of the variables but also the patterns in the data taken as a whole.
• The scatter plot is a useful tool for studying correlation. Relationships between variables can be
described in many ways: positive or negative, strong or weak, linear or nonlinear.
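The strength and direction of a linear relationship seen on a scatter plot can be summarised with the Pearson correlation coefficient r, which ranges from −1 (perfect negative) to +1 (perfect positive). The text does not name this measure, so it is introduced here as one common choice; the data below are made-up values for illustration, not taken from the Kaggle dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: living area (m^2) and price (arbitrary units)
area = [50, 80, 100, 120, 150]
price = [100, 160, 200, 240, 300]       # rises in step with area
discount = [300, 240, 200, 160, 100]    # falls as area rises

print(pearson_r(area, price))     # close to +1: strong positive relationship
print(pearson_r(area, discount))  # close to -1: strong negative relationship
```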