IDS - UNIT-2 - Notes part1_Introduction to Data Science and Prob concept[1]
• The business requirement step deals with identifying the problem and the objectives of the organization. It also identifies the parameters that are to be forecasted or predicted.
• The data acquisition step deals with finding and collecting the source data and storing it so that the information of interest that meets the business requirements can be extracted.
• The data processing step is used to transform the data into a form better suited to finding the required information. The major task of this step is data cleaning, i.e., removal of unwanted data from the available raw data.
• The data exploration step is a brainstorming step where patterns are identified. Here visualization charts are used to extract the required information.
• The data modeling step deals with building of data models and training the models using the data sets.
It uses machine learning algorithms and techniques for better prediction and forecasting.
• The deployment stage deals with the deployment of the model in the business environment.
2.2 Data Science Life Cycle:
Following are the seven steps that make up a data science lifecycle - business understanding, data mining,
data cleaning, data exploration, feature engineering, predictive modeling, and data visualization.
1. Business Understanding
The data scientists in the room are the people who keep asking the why’s. They’re the people who want to
ensure that every decision made in the company is supported by concrete data, and that it is guaranteed (with
a high probability) to achieve results. Before you can even start on a data science project, it is critical that you
understand the problem you are trying to solve. We typically use data science to answer five recurring types of questions.
In this stage, you should also be identifying the central objectives of your project by identifying the variables
that need to be predicted. If it’s a regression, it could be something like a sales forecast. If it’s a clustering, it
could be a customer profile. Understanding the power of data and how you can utilize it to derive results for
your business by asking the right questions is more of an art than a science, and doing this well comes with a
lot of experience. One shortcut to gaining this experience is to read what other people have to say about the
topic, which is why I’m going to suggest a bunch of books to get started.
2. Data Mining
Now that you’ve defined the objectives of your project, it’s time to start gathering the data. Data mining is the
process of gathering your data from different sources. Some people tend to group data retrieval and cleaning
together, but each of these processes is such a substantial step that I’ve decided to break them apart. At this
stage, some of the questions worth considering are - what data do I need for my project? Where does it live?
How can I obtain it? What is the most efficient way to store and access all of it?
If all the data necessary for the project is packaged and handed to you, you’ve won the lottery. More often
than not, finding the right data takes both time and effort. If the data lives in databases, your job is relatively
simple - you can query the relevant data using SQL queries, or manipulate it using a dataframe tool like
Pandas. However, if your data doesn’t actually exist in a dataset, you’ll need to scrape it. Beautiful Soup is a
popular library used to scrape web pages for data. If you’re working with a mobile app and want to track user
engagement and interactions, there are countless tools that can be integrated within the app so that you can
start getting valuable data from customers. Google Analytics, for example, allows you to define custom
events within the app which can help you understand how your users behave and collect the corresponding
data.
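A rough sketch of what this stage can look like in Python; the SQLite file name, table, column names, URL, and the "price" CSS class below are hypothetical placeholders, not part of these notes:

```python
import sqlite3
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Pull structured data straight from a database into a DataFrame
# (assumes a local SQLite file "sales.db" containing a table named "sales").
conn = sqlite3.connect("sales.db")
sales = pd.read_sql_query("SELECT region, amount, order_date FROM sales", conn)
conn.close()

# Scrape semi-structured data from a web page
# (the URL and the "price" CSS class are illustrative only).
response = requests.get("https://example.com/products")
soup = BeautifulSoup(response.text, "html.parser")
prices = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="price")]

print(sales.head())
print(prices[:5])
```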
3. Data Cleaning
Now that you’ve got all of your data, we move on to the most time-consuming step of all - cleaning and
preparing the data. This is especially true in big data projects, which often involve terabytes of data to work
with. According to interviews with data scientists, this process (also referred to as ‘data janitor work’) can
often take 50 to 80 percent of their time. So what exactly does it entail, and why does it take so long?
The reason why this is such a time consuming process is simply because there are so many possible scenarios
that could necessitate cleaning. For instance, the data could also have inconsistencies within the same column,
meaning that some rows could be labelled 0 or 1, and others could be labelled no or yes. The data types could
also be inconsistent - some of the 0s might be integers, whereas some of them could be strings. If we're dealing
with a categorical data type with multiple categories, some of the categories could be misspelled or have
different cases, such as having categories for both male and Male. This is just a subset of examples where you
can see inconsistencies, and it’s important to catch and fix them in this stage.
One issue that is often overlooked at this stage, causing a lot of problems later on, is missing data. Missing values can throw a lot of errors during model creation and training. One option is to simply ignore (drop) the instances which have any missing values. Depending on your dataset, this could be unrealistic if you have a lot of missing data. Another common approach is average imputation, which replaces missing values with the average of all the other instances. This is not always recommended because it can reduce the variability of your data, but in some cases it makes sense.
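A minimal pandas sketch of the two approaches just described, dropping rows with missing values versus average (mean) imputation, using a made-up toy DataFrame:

```python
import pandas as pd
import numpy as np

# Toy DataFrame with a missing value in the numeric "age" column.
df = pd.DataFrame({"age": [23, 31, np.nan, 45], "score": [88, 92, 79, 85]})

# Option 1: drop any row that contains a missing value.
dropped = df.dropna()

# Option 2: mean (average) imputation - replace missing ages with the column mean.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

print(dropped)
print(imputed)
```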
4. Data Exploration
Now that you’ve got a sparkling clean set of data, you’re ready to finally get started in your analysis. The data
exploration stage is like the brainstorming of data analysis. This is where you understand the patterns and bias
in your data. It could involve pulling up and analyzing a random subset of the data using Pandas, plotting a
histogram or distribution curve to see the general trend, or even creating an interactive visualization that lets
you dive down into each data point and explore the story behind the outliers.
Using all of this information, you start to form hypotheses about your data and the problem you are tackling.
If you were predicting student scores for example, you could try visualizing the relationship between scores
and sleep. If you were predicting real estate prices, you could perhaps plot the prices as a heat map on a
spatial plot to see if you can catch any trends.
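A small illustrative sketch of this kind of exploration with Pandas and Matplotlib, using an invented student sleep/score dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical student dataset: hours of sleep vs. exam score.
df = pd.DataFrame({"sleep_hours": [5, 6, 7, 8, 6, 7, 9, 4],
                   "score":       [55, 62, 70, 78, 65, 72, 85, 50]})

# Inspect a random subset and summary statistics.
print(df.sample(3))
print(df.describe())

# Histogram of scores to see the general trend.
df["score"].hist(bins=5)
plt.xlabel("Exam score")
plt.ylabel("Count")

# Scatter plot to eyeball the sleep/score relationship.
df.plot.scatter(x="sleep_hours", y="score")
plt.show()
```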
There is a great summary of tools and approaches on the Wikipedia page for exploratory data analysis.
5. Feature Engineering
In machine learning, a feature is a measurable property or attribute of a phenomenon being observed. If we
were predicting the scores of a student, a possible feature is the amount of sleep they get. In more complex
prediction tasks such as character recognition, features could be histograms counting the number of black
pixels.
According to Andrew Ng, one of the top experts in the fields of machine learning and deep learning, “Coming
up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is
basically feature engineering.” Feature engineering is the process of using domain knowledge to transform
your raw data into informative features that represent the business problem you are trying to solve. This stage
will directly influence the accuracy of the predictive model you construct in the next stage.
We typically perform two types of tasks in feature engineering - feature selection and construction.
Feature selection is the process of cutting down the features that add more noise than information. This is
typically done to avoid the curse of dimensionality, which refers to the increased complexity that arises from
high-dimensional spaces (i.e. way too many features). I won’t go too much into detail here because this topic
can be pretty heavy, but we typically use filter methods (apply statistical measure to assign scoring to each
feature), wrapper methods (frame the selection of features as a search problem and use a heuristic to perform
the search) or embedded methods (use machine learning to figure out which features contribute best to the
accuracy).
Feature construction involves creating new features from the ones that you already have (and possibly
ditching the old ones). An example of when you might want to do this is when you have a continuous
variable, but your domain knowledge informs you that you only really need an indicator variable based on a
known threshold. For example, if you have a feature for age, but your model only cares about if a person is an
adult or minor, you could threshold it at 18, and assign different categories to instances above and below that
threshold. You could also merge multiple features to make them more informative by taking their sum,
difference or product. For example, if you were predicting student scores and had features for the number of
hours of sleep on each night, you might want to create a feature that denoted the average sleep that the student
had instead.
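A brief pandas sketch of the feature construction ideas above (thresholding age into adult/minor and merging per-night sleep columns into an average); the column names and values are illustrative only:

```python
import pandas as pd

# Hypothetical raw features for a handful of students.
df = pd.DataFrame({
    "age": [15, 22, 17, 34],
    "sleep_mon": [6, 7, 5, 8],
    "sleep_tue": [7, 6, 6, 7],
    "sleep_wed": [5, 8, 7, 6],
})

# Feature construction 1: threshold a continuous variable into an indicator
# (adult vs. minor at age 18, as described above).
df["is_adult"] = (df["age"] >= 18).astype(int)

# Feature construction 2: merge several related features into one
# more informative feature - the average nightly sleep.
df["avg_sleep"] = df[["sleep_mon", "sleep_tue", "sleep_wed"]].mean(axis=1)

# Feature selection: drop the raw per-night columns that now add little information.
df = df.drop(columns=["sleep_mon", "sleep_tue", "sleep_wed"])
print(df)
```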
6. Predictive Modeling
Predictive modeling is where the machine learning finally comes into your data science project. I use the term
predictive modeling because I think a good project is not one that just trains a model and obsesses over the
accuracy, but also uses comprehensive statistical methods and tests to ensure that the outcomes from the
model actually make sense and are significant. Based on the questions you asked in the business
understanding stage, this is where you decide which model to pick for your problem. This is never an easy
decision, and there is no single right answer. The model (or models, and you should always be testing several)
that you end up training will be dependent on the size, type and quality of your data, how much time and
computational resources you are willing to invest, and the type of output you intend to derive. There are a
couple of different cheat sheets available online which have a flowchart that helps you decide the right
algorithm based on the type of classification or regression problem you are trying to solve. The two that I
really like are the Microsoft Azure Cheat Sheet and SAS Cheat Sheet.
Once you’ve trained your model, it is critical that you evaluate its success. A process called k-fold cross
validation is commonly used to measure the accuracy of a model. It involves separating the dataset into k equally sized groups of instances, training on all the groups except one, and repeating the process with a different group left out each time. This allows every instance to be used for both training and evaluation, rather than relying on a single train-test split.
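A minimal scikit-learn sketch of k-fold cross validation on a built-in dataset (the choice of logistic regression and of 5 folds here is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross validation: the data is split into 5 groups, and each group
# takes a turn as the held-out evaluation set.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```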
For classification models, we often test accuracy using PCC (percent correct classification), along with a
confusion matrix which breaks down the errors into false positives and false negatives. Plots such as ROC curves, which show the true positive rate plotted against the false positive rate, are also used to benchmark the success of a model. For a regression model, the common metrics include the coefficient of determination (which gives information about the goodness of fit of a model), mean squared error (MSE), and mean absolute error (MAE).
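A short scikit-learn sketch of these classification metrics (accuracy, confusion matrix, and ROC AUC) on a built-in dataset, purely as an illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Percent correct classification (accuracy) and the confusion matrix,
# which separates false positives from false negatives.
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Area under the ROC curve (true positive rate vs. false positive rate).
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```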
7. Data Visualization
Data visualization is a tricky field, mostly because it seems simple but it could possibly be one of the hardest
things to do well. That’s because data viz combines the fields of communication, psychology, statistics, and
art, with an ultimate goal of communicating the data in a simple yet effective and visually pleasing way. Once
you’ve derived the intended insights from your model, you have to represent them in a way that the different
key stakeholders in the project can understand.
Again, this is a topic that could be a blog post on its own, so instead of diving deeper into the field of data
visualization, I will give a couple of starting points. I personally love working through the analysis and
visualization pipeline on an interactive Python notebook like Jupyter, in which I can have my code and
visualizations side by side, allowing for rapid iteration with libraries like Seaborn and Bokeh. Tools like
Tableau and Plotly make it really easy to drag-and-drop your data into a visualization and manipulate it to get
more complex visualizations. If you’re building an interactive visualization for the web, there is no better
starting point than D3.js.
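A small sketch of the kind of stakeholder-facing chart described here, using Seaborn and Matplotlib with invented numbers:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical summarized results to present to stakeholders.
results = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "actual_sales": [120, 135, 150, 160],
    "predicted_sales": [118, 140, 148, 163],
})

# A simple, clearly labelled comparison chart built with Seaborn/Matplotlib.
melted = results.melt(id_vars="month", var_name="series", value_name="sales")
sns.barplot(data=melted, x="month", y="sales", hue="series")
plt.title("Actual vs. predicted monthly sales")
plt.ylabel("Sales (units)")
plt.show()
```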
8. Business Understanding
Phew. Now that you’ve gone through the entire lifecycle, it’s time to go back to the drawing board.
Remember, this is a cycle, and so it’s an iterative process. This is where you evaluate how the success of your
model relates to your original business understanding. Does it tackle the problems identified? Does the
analysis yield any tangible solutions? If you encountered any new insights during the first iteration of the
lifecycle (and I assure you that you will), you can now infuse that knowledge into the next iteration to
generate even more powerful insights, and unleash the power of data to derive phenomenal results for your
business or project.
1. SAS
It is one of those data science tools which are specifically designed for statistical operations.
SAS is a closed source proprietary software that is used by large organizations to analyze
data. SAS uses the base SAS programming language for performing statistical modeling.
It is widely used by professionals and companies working on reliable commercial software.
SAS offers numerous statistical libraries and tools that you as a Data Scientist can use for modeling and organizing your data. While SAS is highly reliable and has strong support from
the company, it is highly expensive and is only used by larger industries. Also, SAS pales in
comparison with some of the more modern tools which are open-source. Furthermore, there
are several libraries and packages in SAS that are not available in the base pack and can require an expensive upgrade.
2. Apache Spark
Apache Spark, or simply Spark, is a powerful analytics engine and one of the most widely used Data Science tools. Spark is specifically designed to handle batch processing and stream processing. It comes with many APIs that allow Data Scientists to make repeated access to data for Machine Learning, storage in SQL, etc. It is an improvement over Hadoop and can perform up to 100 times faster than MapReduce. Spark has many Machine Learning APIs that can help Data Scientists make powerful predictions with the given data.
Spark does better than other Big Data platforms in its ability to handle streaming data. This means that Spark can process real-time data, whereas many other analytical tools can only process historical data in batches. Spark offers various APIs that are programmable in Python, Java, and R. But the most powerful combination is Spark with the Scala programming language, which runs on the Java Virtual Machine and is cross-platform in nature.
Spark is highly efficient in cluster management, which makes it much better than Hadoop, as the latter is mainly used for storage. It is this cluster management system that allows Spark to process applications at high speed.
3. BigML
BigML is another widely used Data Science tool. It provides a fully interactive, cloud-based GUI environment that you can use for processing Machine Learning algorithms. BigML provides standardized software using cloud computing for industry requirements. Through it, companies can use Machine Learning algorithms across various parts of the business. For example, a company can use this one software for sales forecasting, risk analytics, and product innovation. BigML specializes in predictive modeling. It uses a wide
variety of Machine Learning algorithms like clustering, classification, time-series
forecasting, etc.
BigML provides an easy-to-use web interface using REST APIs, and you can create a free account or a premium account based on your data needs. It allows interactive visualizations of data and provides you with the ability to export visual charts to your mobile or IoT devices.
Furthermore, BigML comes with various automation methods that can help you to automate
the tuning of hyperparameter models and even automate the workflow of reusable scripts.
4. D3.js
JavaScript is mainly used as a client-side scripting language. D3.js, a JavaScript library, allows you to make interactive visualizations in your web browser. With several APIs of D3.js, you
can use several functions to create dynamic visualization and analysis of data in your
browser. Another powerful feature of D3.js is the usage of animated transitions. D3.js makes
documents dynamic by allowing updates on the client side and actively using changes in the data to update visualizations in the browser.
You can combine this with CSS to create rich, animated visualizations that will help you implement customized graphs on web pages. Overall, it can be a very useful tool for Data Scientists who are working on IoT-based devices that require client-side interaction for visualization and data processing.
5. MATLAB
MATLAB is a multi-paradigm numerical computing environment for processing
mathematical information. It is a closed-source software that facilitates matrix functions,
algorithmic implementation and statistical modeling of data. MATLAB is most widely used
in several scientific disciplines.
In Data Science, MATLAB is used for simulating neural networks and fuzzy logic. Using the
MATLAB graphics library, you can create powerful visualizations. MATLAB is also used in
image and signal processing. This makes it a very versatile tool for Data Scientists as they
can tackle all the problems, from data cleaning and analysis to more advanced Deep Learning
algorithms.
Furthermore, MATLAB's easy integration with enterprise applications and embedded systems makes it an ideal Data Science tool. It also helps in automating various tasks ranging
from extraction of data to re-use of scripts for decision making. However, it suffers from the
limitation of being a closed-source proprietary software.
6. Excel
Probably the most widely used Data Analysis tool. Microsoft developed Excel mostly for
spreadsheet calculations and today, it is widely used for data processing, visualization, and
complex calculations. Excel is a powerful analytical tool for Data Science. While it has been
the traditional tool for data analysis, Excel still packs a punch.
Excel comes with various formulae, tables, filters, slicers, etc. You can also create
your own custom functions and formulae using Excel. While Excel is not suited for handling huge amounts of data, it is still an ideal choice for creating powerful data visualizations and
spreadsheets. You can also connect SQL with Excel and can use it to manipulate and analyze
data. A lot of Data Scientists use Excel for data cleaning as it provides an interactable GUI
environment to pre-process information easily.
With the Analysis ToolPak add-in for Microsoft Excel, it is now much easier to perform complex analyses. However, it still pales in comparison with much more advanced Data Science
tools like SAS. Overall, on a small and non-enterprise level, Excel is an ideal tool for data
analysis.
7. ggplot2
ggplot2 is an advanced data visualization package for the R programming language. The
developers created this tool to replace the native graphics package of R, and it uses powerful commands to create compelling visualizations. It is one of the most widely used libraries that Data Scientists use for creating visualizations from analyzed data.
ggplot2 is part of the tidyverse, a collection of R packages designed for Data Science. One area in which ggplot2 is much better than most other data visualization tools is aesthetics. With ggplot2, Data Scientists can create customized visualizations and engage in enhanced storytelling. Using ggplot2, you can annotate your data in visualizations, add text labels to data points, and boost the interactivity of your graphs. You can also create various styles of maps such as choropleths, cartograms, hexbins, etc. It is one of the most widely used data science visualization libraries.
8. Tableau
Tableau is a Data Visualization software that is packed with powerful graphics to make
interactive visualizations. It is focused on industries working in the field of business
intelligence. The most important aspect of Tableau is its ability to interface with databases,
spreadsheets, OLAP (Online Analytical Processing) cubes, etc. Along with these features,
Tableau has the ability to visualize geographical data by plotting longitudes and latitudes on maps.
Along with visualizations, you can also use its analytics tool to analyze data. Tableau comes
with an active community and you can share your findings on the online platform. While
Tableau is enterprise software, it comes with a free version called Tableau Public.
9. Jupyter
Project Jupyter is an open-source tool, based on IPython, that helps developers build open-source software and experience interactive computing. Jupyter supports multiple languages like Julia, Python, and R. It is a web-application tool used for writing live code, visualizations, and presentations. Jupyter is a widely popular tool that is designed to address the requirements of Data Science.
It is an interactive environment through which Data Scientists can perform all of their responsibilities. It is also a powerful tool for storytelling, as various presentation features are built into it. Using Jupyter Notebooks, one can perform data cleaning, statistical computation, and visualization, and create predictive machine learning models. It is 100% open-source and is, therefore, free of cost. There is an online Jupyter environment called Colaboratory (Google Colab), which runs in the cloud and stores data in Google Drive.
10. Matplotlib
Matplotlib is a plotting and visualization library developed for Python. It is the most popular
tool for generating graphs with the analyzed data. It is mainly used for plotting complex
graphs using simple lines of code. Using this, one can generate bar plots, histograms,
scatterplots etc. Matplotlib has several essential modules. One of the most widely used
modules is pyplot, which offers a MATLAB-like interface. Pyplot is also an open-source alternative to MATLAB's graphics modules.
Matplotlib is a preferred tool for data visualizations and is used by Data Scientists over other
contemporary tools. As a matter of fact, NASA used Matplotlib for illustrating data
visualizations during the landing of Phoenix Spacecraft. It is also an ideal tool for beginners
in learning data visualization with Python.
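A minimal pyplot sketch showing its MATLAB-like interface, with a bar plot, a histogram, and a scatter plot in a few lines (the data is made up):

```python
import matplotlib.pyplot as plt

# pyplot offers a MATLAB-like, stateful interface: a few lines are enough
# for a bar plot, a histogram, and a scatter plot side by side.
categories = ["A", "B", "C"]
values = [10, 24, 17]

plt.figure(figsize=(9, 3))

plt.subplot(1, 3, 1)
plt.bar(categories, values)
plt.title("Bar plot")

plt.subplot(1, 3, 2)
plt.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
plt.title("Histogram")

plt.subplot(1, 3, 3)
plt.scatter([1, 2, 3, 4], [2, 4, 1, 3])
plt.title("Scatter plot")

plt.tight_layout()
plt.show()
```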
11. NLTK
Natural Language Processing has emerged as one of the most popular fields in Data Science. It deals
with the development of statistical models that help computers understand human language.
These statistical models are part of Machine Learning and through several of its algorithms,
are able to assist computers in understanding natural language. Python language comes with
a collection of libraries called Natural Language Toolkit (NLTK) developed for this
particular purpose only.
NLTK is widely used for various language processing techniques like tokenization,
stemming, tagging, parsing, and machine learning. It includes a large collection of corpora, which are datasets used for building machine learning models. It has a variety of applications such as Parts of Speech Tagging, Word Segmentation, Machine Translation, Text to Speech, Speech Recognition, etc.
12. Scikit-learn
Scikit-learn is a Python library used for implementing Machine Learning algorithms. It is a simple, easy-to-use tool that is widely used for analysis and data science. It supports a variety of Machine Learning tasks such as data preprocessing, classification, regression, clustering, dimensionality reduction, etc.
Scikit-learn makes it easy to use complex machine learning algorithms. It is therefore useful in situations that require rapid prototyping, and it is also an ideal platform for research requiring basic Machine Learning. It makes use of several underlying Python libraries such as SciPy, NumPy, Matplotlib, etc.
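A minimal scikit-learn sketch, assuming the built-in Iris dataset, that chains preprocessing and a classifier into one pipeline:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Load a built-in dataset, split it, and fit a simple pipeline that
# chains preprocessing (scaling) with a classifier.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```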
13. TensorFlow
TensorFlow has become a standard tool for Machine Learning. It is widely used for advanced
machine learning algorithms like Deep Learning. Developers named TensorFlow after
Tensors which are multidimensional arrays. It is an open-source and ever-evolving toolkit
which is known for its performance and high computational abilities. TensorFlow can run on
both CPUs and GPUs, and can also run on more powerful TPU platforms. This gives it a significant edge in terms of the processing power available for advanced machine learning algorithms.
Due to its high processing ability, Tensorflow has a variety of applications such as speech
recognition, image classification, drug discovery, image and language generation, etc. For
Data Scientists specializing in Machine Learning, Tensorflow is a must know tool.
14. Weka
Weka or Waikato Environment for Knowledge Analysis is a machine learning software
written in Java. It is a collection of various Machine Learning algorithms for data mining.
Weka consists of various machine learning tools like classification, clustering, regression,
visualization and data preparation.
It is an open-source GUI software that allows easier implementation of machine learning algorithms through an interactive platform. You can understand the functioning of Machine
Learning on the data without having to write a line of code. It is ideal for Data Scientists who
are beginners in Machine Learning.
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that
are programmed to think like humans and mimic their actions. The term may also be
applied to any machine that exhibits traits associated with a human mind such as learning
and problem-solving. The ideal characteristic of artificial intelligence is its ability to
rationalize and take actions that have the best chance of achieving a specific goal. A subset
of artificial intelligence is machine learning, which refers to the concept that computer
programs can automatically learn from and adapt to new data without being assisted by
humans. Deep learning techniques enable this automatic learning through the absorption
of huge amounts of unstructured data such as text, images, or video.
Understanding Artificial Intelligence (AI)
When most people hear the term artificial intelligence, the first thing they usually think of is
robots. That's because big-budget films and novels weave stories about human-like machines
that wreak havoc on Earth. But nothing could be further from the truth.
Artificial intelligence is based on the principle that human intelligence can be defined in a
way that a machine can easily mimic it and execute tasks, from the most simple to those
that are even more complex. The goals of artificial intelligence include mimicking human
cognitive activity. Researchers and developers in the field are making surprisingly rapid
strides in mimicking activities such as learning, reasoning, and perception, to the extent that
these can be concretely defined. Some believe that innovators may soon be able to develop
systems that exceed the capacity of humans to learn or reason out any subject. But others
remain skeptical because all cognitive activity is laced with value judgements that are subject
to human experience.
As technology advances, previous benchmarks that defined artificial intelligence become
outdated. For example, machines that calculate basic functions or recognize text through
optical character recognition are no longer considered to embody artificial intelligence, since
this function is now taken for granted as an inherent computer function.
AI is continuously evolving to benefit many different industries. Machines are wired using a cross-
disciplinary approach based on mathematics, computer science, linguistics, psychology, and more.
Algorithms often play a very important part in the structure of artificial intelligence, where simple
algorithms are used in simple applications, while more complex ones help frame strong artificial
intelligence.
Machine Learning
Machine learning is a growing technology which enables computers to learn
automatically from past data. Machine learning uses various algorithms for building
mathematical models and making predictions using historical data or information.
Currently, it is being used for various tasks such as image recognition, speech recognition,
email filtering, Facebook auto-tagging, recommender system, and many more.
This machine learning tutorial gives you an introduction to machine learning along with the
wide range of machine learning techniques such as Supervised,
Unsupervised, and Reinforcement learning. You will learn about regression and
classification models, clustering methods, hidden Markov models, and various sequential
models.
What is Machine Learning
In the real world, we are surrounded by humans who can learn everything from their
experiences with their learning capability, and we have computers or machines which work
on our instructions. But can a machine also learn from experiences or past data like a human
does? This is where Machine Learning comes in.
Machine Learning is a subset of artificial intelligence that is mainly concerned with
the development of algorithms which allow a computer to learn from the data and past
experiences on their own. The term machine learning was first introduced by Arthur
Samuel in 1959. We can define it in a summarized way as:
Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together for creating predictive models. Machine learning constructs or uses the
algorithms that learn from historical data. The more information we provide, the higher the performance will be.
A machine has the ability to learn if it can improve its performance by gaining more data.
Suppose we have a complex problem, where we need to perform some predictions, so instead
of writing code for it, we just need to feed the data to generic algorithms, and with the help of these algorithms, the machine builds the logic as per the data and predicts the output. Machine learning has changed our way of thinking about such problems. The below block diagram explains the working of a Machine Learning algorithm:
Features of Machine Learning:
o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as it also deals with huge amounts of data.
Need for Machine Learning
The need for machine learning is increasing day by day. The reason behind the need for
machine learning is that it is capable of doing tasks that are too complex for a person to
implement directly. As humans, we have limitations: we cannot manually process huge amounts of data, so we need computer systems, and this is where machine learning makes things easy for us.
We can train machine learning algorithms by providing them with huge amounts of data and letting them explore the data, construct models, and predict the required output automatically.
The performance of the machine learning algorithm depends on the amount of data, and it
can be determined by the cost function. With the help of machine learning, we can save both
time and money.
The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestion by Facebook, etc. Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interests and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:
o Rapid increment in the production of data
o Solving complex problems, which are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data.
1)Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts
the output.
The system creates a model using labeled data to understand the dataset and learn about each data point. Once the training and processing are done, we test the model by providing sample data to check whether it predicts the correct output or not.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision; it is the same as when a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision. The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from the huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association
3) Semi-Supervised Learning
Semi-Supervised Learning is a learning method in which a machine learns both with and without supervision. It is a combination of Supervised and Unsupervised Learning.
4) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent gets
a reward for each right action and gets a penalty for each wrong action. The agent learns
automatically with these feedbacks and improves its performance. In reinforcement learning,
the agent interacts with the environment and explores it. The goal of an agent is to get the
most reward points, and hence, it improves its performance.
A robotic dog that automatically learns the movement of its limbs is an example of Reinforcement Learning.
Below are some main differences between AI and machine learning along with the overview
of Artificial intelligence and machine learning.
Artificial Intelligence
Artificial intelligence is a field of computer science which makes a computer system that can
mimic human intelligence. It is comprised of two words "Artificial" and "intelligence",
which means "a human-made thinking power." Hence we can define it as,
Artificial intelligence is a technology using which we can create intelligent systems that can
simulate human intelligence.
An Artificial Intelligence system does not need to be pre-programmed; instead, it uses algorithms which can work with their own intelligence. It involves machine learning algorithms such as Reinforcement Learning and deep learning neural networks. AI is being used in multiple places such as Siri, Google's AlphaGo, AI in chess playing, etc.
Machine learning
Machine learning is about extracting knowledge from the data. It can be defined as,
Machine learning is a subfield of artificial intelligence, which enables machines to learn
from past data or experiences without being explicitly programmed.
Machine learning enables a computer system to make predictions or take some decisions
using historical data without being explicitly programmed. Machine learning uses a massive
amount of structured and semi-structured data so that a machine learning model can generate accurate results or give predictions based on that data.
Machine learning works on algorithms which learn on their own using historical data. It works only for specific domains: if we are creating a machine learning model to detect pictures of dogs, it will only give results for dog images, and if we provide new data such as a cat image, it will not respond correctly. Machine learning is being used in various places
such as for online recommender system, for Google search algorithms, Email spam filter,
Facebook Auto friend tagging suggestion, etc.
Key differences between Artificial Intelligence (AI) and Machine learning (ML):
The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In Classification, a program learns
from the given dataset or observations and then classifies new observation into a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories.
Unlike regression, the output variable of Classification is a category, not a value, such as
"Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised
learning technique, hence it takes labeled input data, which means it contains input with the
corresponding output.
The main goal of the Classification algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for categorical data.
Classification algorithms can be better understood using the below diagram. In the below diagram, there are two classes, Class A and Class B. Each class contains data points whose features are similar to each other and dissimilar to those of the other class.
o Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most closely related data stored in the training dataset. It takes less time for training but more time for prediction.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset. Opposite to lazy learners, eager learners take more time in training and less time in prediction. Example: Decision Trees, Naïve Bayes, ANN.
Classification algorithms can be mainly divided into two categories:
o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Classification:
Classification is a process of finding a function which helps in dividing the dataset into
classes based on different parameters. In Classification, a computer program is trained on the training dataset and, based on that training, it categorizes the data into different classes.
The task of the classification algorithm is to find the mapping function to map the input (x) to the discrete output (y).
Example: The best example to understand the Classification problem is Email Spam
Detection. The model is trained on the basis of millions of emails on different parameters,
and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the Spam folder.
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Regression:
The task of the Regression algorithm is to find the mapping function to map the input
variable (x) to the continuous output variable (y).
Example: Suppose we want to do weather forecasting, so for this, we will use the
Regression algorithm. In weather prediction, the model is trained on past data, and once the training is completed, it can easily predict the weather for future days.
Types of Regression Algorithm:
o Simple Linear Regression
o Multiple Linear Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
Regression Algorithm vs. Classification Algorithm:
o The task of the regression algorithm is to map the input value (x) to the continuous output variable (y); the task of the classification algorithm is to map the input value (x) to the discrete output variable (y).
o Regression algorithms are used with continuous data; classification algorithms are used with discrete data.
o In Regression, we try to find the best fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which can divide the dataset into different classes.
o The Regression algorithm can be further divided into Linear and Non-linear Regression; the Classification algorithm can be further divided into Binary Classifier and Multi-class Classifier.
The clustering technique does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, behavior, etc., and divides the data according to the presence and absence of those patterns.
After applying this clustering technique, each cluster or group is given a cluster ID. The ML system can use this ID to simplify the processing of large and complex datasets.
Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another, and similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc., are grouped separately, so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents according to topic.
The clustering technique can be widely used in various tasks. Some of the most common uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation system to provide recommendations based on past product searches. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.
The below diagram explains the working of the clustering algorithm: the different fruits are divided into several groups with similar properties.
What is Dimensionality Reduction?
The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
A dataset contains a huge number of input features in various cases, which makes the
predictive modeling task more complicated. Because it is very difficult to visualize or make
predictions for a training dataset with a high number of features, so in such cases dimensionality reduction techniques must be used.
Dimensionality reduction technique can be defined as, "It is a way of converting the higher
dimensions dataset into lesser dimensions dataset ensuring that it provides similar
information." These techniques are widely used in machine learning for obtaining a better fit
predictive model while solving the classification and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.
1. Filter Methods
In this method, the dataset is filtered, and a subset that contains only the relevant features is taken. Some common techniques of the filter method are:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
2. Wrapper Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, some features are fed to the ML model and its performance is evaluated. The performance decides whether to add or remove those features to increase the accuracy of the model. This method is more accurate than the filter method but more complex to work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination
3. Embedded Methods: Embedded methods check the different training iterations of the machine learning model and evaluate the importance of each feature. Some common techniques of embedded methods are:
o LASSO
o Elastic Net
o Ridge Regression, etc.
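A short scikit-learn sketch contrasting a filter method (chi-square scoring) with an embedded method (feature importances from a tree ensemble); the dataset and the specific choices here are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Filter method: score each feature with a chi-square test and keep the best 2.
selector = SelectKBest(score_func=chi2, k=2)
X_filtered = selector.fit_transform(X, y)
print("Chi-square scores:", selector.scores_)
print("Reduced shape:", X_filtered.shape)

# Embedded method: a tree ensemble reports how much each feature
# contributed to accuracy while the model was being trained.
forest = RandomForestClassifier(random_state=0).fit(X, y)
print("Feature importances:", forest.feature_importances_)
```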
The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables. Consider the below image:
o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The value of the logistic regression must be between 0 and 1, and it cannot go beyond this limit, so it forms a curve like the "S" form. The S-form curve is called the Sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1. Values above the threshold tend towards 1, and values below the threshold tend towards 0.
The Logistic Regression equation can be obtained from the Linear Regression equation. The mathematical steps to get the Logistic Regression equation are given below:
o Start from the Linear Regression equation: y = b0 + b1x1 + b2x2 + ... + bnxn
o In Logistic Regression y can be between 0 and 1 only, so let's divide the above equation by (1 - y): y / (1 - y)
o But we need a range between -[infinity] and +[infinity], so taking the logarithm of the equation, it becomes: log[y / (1 - y)] = b0 + b1x1 + b2x2 + ... + bnxn
Linear Regression:
o Linear Regression is one of the simplest Machine Learning algorithms; it comes under the Supervised Learning technique and is used for solving regression problems.
o It is used for predicting the continuous dependent variable with the help of
independent variables.
o The goal of the Linear regression is to find the best fit line that can accurately predict
the output for the continuous dependent variable.
o If a single independent variable is used for prediction, it is called Simple Linear Regression, and if more than one independent variable is used, such regression is called Multiple Linear Regression.
o By finding the best fit line, the algorithm establishes the relationship between the dependent variable and the independent variable, and this relationship should be linear in nature.
o The output of Linear Regression should only be continuous values such as price, age, salary, etc. The relationship between the dependent variable and the independent variable can be shown in the below image:
In the above image, the dependent variable is on the y-axis (salary) and the independent variable is on the x-axis (experience). The regression line can be written as:
y = a0 + a1x + ε
where a0 and a1 are the coefficients and ε is the error term.
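A minimal sketch of fitting this regression line with scikit-learn; the experience/salary numbers are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical salary (y, in thousands) vs. years of experience (x).
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([30, 35, 41, 44, 50])

# Fit y = a0 + a1*x; the residuals play the role of the error term ε.
model = LinearRegression().fit(x, y)
print("Intercept a0:", model.intercept_)
print("Slope a1:", model.coef_[0])
print("Prediction for 6 years:", model.predict([[6]])[0])
```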
Logistic Regression:
o Logistic regression is one of the most popular Machine Learning algorithms that come under the Supervised Learning technique.
o It can be used for Classification as well as for Regression problems, but it is mainly used for Classification problems.
o Logistic regression is used to predict the categorical dependent variable with the help of independent variables.
o The output of a Logistic Regression problem can only be between 0 and 1.
o Logistic regression can be used where the probability of one of two classes is required, such as whether it will rain today or not, either 0 or 1, true or false, etc.
o Logistic regression is based on the concept of Maximum Likelihood Estimation, according to which the observed data should be the most probable. In logistic regression, we pass the weighted sum of inputs through an activation function that maps values to between 0 and 1. This activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve. Consider the below image:
Linear Regression vs. Logistic Regression:
o Linear Regression is used to predict the continuous dependent variable using a given set of independent variables; Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables.
o In Linear Regression, we predict the value of continuous variables; in Logistic Regression, we predict the values of categorical variables.
o In Linear Regression, we find the best fit line, by which we can easily predict the output; in Logistic Regression, we find the S-curve, by which we can classify the samples.
o The output of Linear Regression must be a continuous value, such as price, age, etc.; the output of Logistic Regression must be a categorical value such as 0 or 1, Yes or No, etc.
2.8 Probability Theory
There are many sources of uncertainty in AI, including variance in the specific data values, the sample of data collected from the domain, and the imperfect nature of any models developed from such data.
• Uncertainty is the biggest source of difficulty for beginners in machine learning, especially
developers.
• Noise in data, incomplete coverage of the domain, and imperfect models provide the three main
sources of uncertainty in machine learning.
• Probability provides the foundation and tools for quantifying, handling, and harnessing uncertainty
in applied machine learning.
Uncertainty means working with imperfect or incomplete information. Probability is a numerical description
of how likely an event is to occur or how likely it is that a proposition is true. Probability is a number
between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty.
How to compute the probability?
Given: A statistical experiment has n equally likely outcomes, of which r outcomes are "successes"
Find: Probability of successful outcome(S)
P(S) = Number of Successes / Total Number of Outcomes = r/n
Example 1:
Given: 10 marbles: 2 red, 3 green, 5 blue.
Find: the probability of selecting a green marble. Solution: P(G) = 3/10 = 0.30
A Random Variable is a set of possible values from a random experiment.
Example: Throw a die once
Random Variable X = "The score shown on the top face". X could be 1, 2, 3, 4, 5 or 6
So the Sample Space is {1, 2, 3, 4, 5, 6}
We can show the probability of any one value using this style: P(X = value) = probability of that value
X = {1, 2, 3, 4, 5, 6}
In this case they are all equally likely, so the probability of any one is 1/6
• P(X = 1) = 1/6
• P(X = 2) = 1/6
• P(X = 3) = 1/6
• P(X = 4) = 1/6
• P(X = 5) = 1/6
• P(X = 6) = 1/6
Atomic event: A complete specification of the state of the world about which the agent is uncertain. E.g., if
the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic
events:
Cavity = false Toothache = false
Cavity = true Toothache = false
Cavity = false Toothache = true
Cavity = true Toothache = true
Joint Probability
It is the likelihood of more than one event occurring at the same time.
Two types of joint probability can be found:
1. Mutually exclusive events (without common outcomes)
2. Non-mutually exclusive events (with common outcomes)
Mutually exclusive means that the occurrence of both A and B together is impossible, i.e. P(A and B) = 0, and the probability of A or B is the sum of the probabilities of A and B, i.e. P(A or B) = P(A) + P(B).
In the case of non-mutually exclusive events, the probability of A or B is the sum of the probabilities of A and B minus the probability of A and B, i.e.
P(A or B) = P(A) + P(B) – P(A and B)
The conditional probability of an event B in relationship to an event A is the probability that event B occurs
given that event A has already occurred. The notation for conditional probability is P(B|A), read as the
probability of B given A i.e. probability of B given that an event A is already occurred.
When two events, A and B, are dependent, the probability of both occurring is P(A and B) = P(A) × P(B|A).
Problem 1: A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class
passed the first test. What percent of those who passed the first test also passed the second test?
Answer: P(Second | First) = P(First and Second)/P(First) = 0.25/0.42 ≈ 0.60 = 60%
Problem 2: A jar contains black and white marbles. Two marbles are chosen without replacement. The
probability of selecting a black marble and then a white marble is 0.34, and the probability of selecting a
black marble on the first draw is 0.47. What is the probability of selecting a white marble on the second
draw, given that the first marble drawn was black?
Answer: P(White | Black) = P(Black and White)/P(Black) = 0.34/0.47 ≈ 0.72 = 72%
Note:
The following terminologies are also used when the Bayes theorem is applied:
Hypotheses: The events E1, E2, …, En are called the hypotheses.
Prior Probability: The probability P(Ei) is considered the prior probability of hypothesis Ei.
Posterior Probability: The probability P(Ei|A) is considered the posterior probability of hypothesis Ei.
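For reference, the theorem being described can be written in this notation (hypotheses E1, E2, …, En and an observed event A) as:

\[
P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{k=1}^{n} P(E_k)\,P(A \mid E_k)}
\]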
Bayes' theorem follows from the conditional probability formula P(A|B) = P(A ∩ B) / P(B), where P(A|B) is the probability of event A occurring given that event B has already occurred, P(A ∩ B) is the probability of both event A and event B, and P(B) is the probability of event B.
Some illustrations will improve the understanding of the concept.
Example 1: Bag I contains 4 white and 6 black balls, while Bag II contains 4 white and 3 black balls. One ball is drawn at random from one of the bags, and it is found to be black. Find the probability that it was drawn from Bag I.
Solution:
Let E1 be the event of choosing the bag I, E2 the event of choosing the bag II, and A be the event of drawing
a black ball.
Then, P(E1) = P(E2) = 1/2
Also, P(A|E1) = P(drawing a black ball from Bag I) = 6/10 = 3/5
P(A|E2) = P(drawing a black ball from Bag II) = 3/7
By using Bayes' theorem, the probability of drawing a black ball from Bag I out of the two bags is
P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)]
= (1/2 × 3/5) / (1/2 × 3/5 + 1/2 × 3/7) = 7/12
Example 2: A man is known to speak truth 2 out of 3 times. He throws a die and reports that the
number obtained is a four. Find the probability that the number obtained is actually a four.
Solution:
Let A be the event that the man reports that number four is obtained.
Let E1 be the event that four is obtained and E2 be its complementary event.
Then, P(E1) = Probability that four occurs = 1/6
P(E2) = Probability that four does not occur = 1 – P(E1) = 1 – 1/6 = 5/6
Also, P(A|E1) = Probability that the man reports four and it is actually a four = 2/3
P(A|E2) = Probability that the man reports four and it is not a four = 1/3
By using Bayes' theorem, the probability that the number obtained is actually a four is
P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)] = (1/6 × 2/3) / (1/6 × 2/3 + 5/6 × 1/3) = 2/7
Problem: An engineering company advertises a job in three newspapers, A, B and C. It is known that these
papers attract undergraduate engineering readerships in the proportions 2:3:1. The probabilities that an
engineering undergraduate sees and replies to the job advertisement in these papers are 0.002, 0.001 and 0.005
respectively. Assume that the undergraduate sees only one job advertisement. (a) If the engineering company
receives only one reply to its advertisements, calculate the probability that the applicant saw the job
advertised in (i) paper A, (ii) paper B, (iii) paper C. (b) If the company receives two replies, what is the
probability that both applicants saw the job advertised in paper A?
1) Of the students in a college, 60% reside in the hostel and 40% are day scholars. The previous year's
results report that 30% of all students who stay in the hostel scored an A grade and 20% of day scholars
scored an A grade. At the end of the year, one student is chosen at random and is found to have an A grade.
What is the probability that the student is a hosteller?
Ans: P(Hostel | A) = P(A | Hostel)·P(Hostel) / [P(A | Hostel)·P(Hostel) + P(A | Day)·P(Day)]
= (0.3 × 0.6) / (0.3 × 0.6 + 0.2 × 0.4) = 0.18/0.26 ≈ 0.692 ≈ 69.2%
2) You are planning a picnic today, but the morning is cloudy.
a. Oh no! 50% of all rainy days start off cloudy!
b. But cloudy mornings are common (about 40% of days start cloudy), and this is usually a dry
month (only 3 of 30 days tend to be rainy, i.e. 10%). What is the chance of rain during the day?
Ans: P(Rain | Cloud) = P(Cloud | Rain)·P(Rain)/P(Cloud) = (0.5 × 0.1)/0.4 = 0.125 = 12.5%
3) Jerome graphed the relationship between the temperature and the change in the number of people sledding
at a park. The x-coordinate is the temperature in degrees Celsius (°C), and the y-coordinate is the change in
the number of people who are sledding. Which quadrants could contain a point showing a decrease in the
number of people sledding?
Explanation: Quadrants III and IV contain the points with negative y-coordinates. If y = 0 (the x-axis)
represents no change from the average number of people there, then Quadrants III and IV show a decrease
from that average number.
4) Covid-19 tests are common nowadays, but some results of tests are not true. Let’s assume; a diagnostic
test has 99% accuracy and 60% of all people have Covid-19. If a patient tests positive, what is the
probability that they actually have the disease?
P(positive) = 0.6 × 0.99 + 0.4 × 0.01 = 0.598
P(disease | positive) = (0.6 × 0.99)/0.598 = 0.594/0.598 ≈ 0.993 ≈ 99.3%
1) In a chocolate manufacturing factory, machine A, B and C produces 25%, 35% and 40% of chocolates
respectively. Out of which 5%, 4% and 2% are spoiled chocolates respectively. If a chocolate is drawn at
random is spoiled then what is the probability that it is manufactured by machine B?
P(A) = 25%, P(B) = 35%, P(C) = 40%
P(S|A) = 5%, P(S|B) = 4%, P(S|C) = 2%
P(S) = 0.25 × 0.05 + 0.35 × 0.04 + 0.40 × 0.02 = 0.0345
P(B|S) = P(S|B)·P(B)/P(S) = 0.014/0.0345 ≈ 0.406 ≈ 40.6%
2) In Exton School, 40% of the girls like music and 24% of the girls like dance. Given that 30% of those that
like music also like dance, what percent of those that like dance also like music?
P(M|D) = P(D|M)·P(M)/P(D) = (0.3 × 0.4)/0.24 = 0.5 = 50%
1) 75% of the children in Exton school have a dog, and 30% have a cat. Given that 60% of those that have a
cat also have a dog, what percent of those that have a dog also have a cat?
P(C|D) = P(D|C)·P(C)/P(D) = (0.6 × 0.3)/0.75 = 0.24 = 24%
2) 35% of the children in Exton school have a tablet, and 24% have a smart phone. Given that 42% of those
that have smart phone also have a tablet, what percent of those that have a tablet also have a smart phone?
P(P|T) = P(T|P)·P(P)/P(T) = (0.42 × 0.24)/0.35 ≈ 0.288 = 28.8%
3) A test for a disease gives a correct positive result with a probability of 0.95 when the disease is present, but
gives an incorrect positive result (false positive) with a probability of 0.15 when the disease is not present.
If 5% of the population has the disease, and Jean tests positive to the test, what is the probability Jean
really has the disease?
Ans: P(Disease) = 5%
P(Positive) = 0.95 × 0.05 + 0.15 × 0.95 = 0.19
P(Disease | Positive) = (0.95 × 0.05)/0.19 = 0.0475/0.19 = 0.25 = 25%
5) Assume that the word ‘offer’ occurs in 80% of the spam messages in my account. Also, let’s assume
‘offer’ occurs in 10% of my desired e-mails. If 30% of the received e-mails are considered as a spam, and
I will receive a new message which contains ‘offer’, what is the probability that it is spam?
P(Spam) = 30%, P(offer | Spam) = 80%, P(offer | Not Spam) = 10%
P(Spam | offer) = (0.8 × 0.3) / (0.8 × 0.3 + 0.1 × 0.7) = 0.24/0.31 ≈ 0.774 ≈ 77.4%
1) In a particular pain clinic, 10% of patients are prescribed narcotic pain killers. Overall, five percent of the
clinic’s patients are addicted to narcotics (including pain killers and illegal substances). Out of all the
people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that they will
be prescribed pain pills?
Ans: P(PK) = 10%, P(NA) = 5%, P(NA|PK) = 8%
P(PK|NA) = P(NA|PK)·P(PK)/P(NA) = (0.08 × 0.1)/0.05 = 0.16 = 16%
2) In Exton School, 60% of the boys play football and 36% of the boys play ice hockey. Given that 40% of
those that play football also play ice hockey, what percent of those that play ice hockey also play football?
P(F|IH) = P(IH|F)·P(F)/P(IH) = (0.4 × 0.6)/0.36 = 2/3 ≈ 66.7%
6) Imagine you are a financial analyst at an investment bank. According to your research of publicly-traded
companies, 60% of the companies that increased their share price by more than 5% in the last three years
replaced their CEOs during the period.
Ans: At the same time, only 35% of the companies that did not increase their share price by more than 5% in
the same period replaced their CEOs. Knowing that the probability that the stock prices grow by more than
5% is 4%, find the probability that the shares of a company that fires its CEO will increase by more than 5%.
P(A|B) – the probability that the stock price increases by more than 5% given that the CEO has been replaced
P(B|A) – the probability of a CEO replacement given that the stock price has increased by more than 5%
P(A|B) = P(B|A)·P(A) / [P(B|A)·P(A) + P(B|A′)·P(A′)] = (0.6 × 0.04)/(0.6 × 0.04 + 0.35 × 0.96)
= 0.024/0.36 ≈ 0.067 ≈ 6.7%
7) A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class passed the
first test. What percent of those who passed the first test also passed the second test?
The percentage that passed the first test is 42%, while the percentage that passed both tests is 25%.
P(Second | First) = P(First and Second)/P(First) = 0.25/0.42 ≈ 0.595 ≈ 60%
1) Physics teacher gave her class two tests. 35% of the class passed both tests and 62% of the class passed
the first test. What percent of those who passed the first test also passed the second test?
Ans: The probability of passing both tests is P(both tests) = 35%; the probability of passing the first test is P(1st test) = 62%.
The probability of passing the second test given the first, P(2nd test | 1st test) = P(both tests)/P(1st test)
= 35% / 62%
= 0.5645
= 56.45%
2) RTPCR tests are common nowadays, but some results of tests are not true. Let's assume an RTPCR test
has 60% accuracy and 80% of all people have Omicron. If a patient tests positive, what is the probability
that they actually have the disease?
If 5% of the population has the disease, and Jean tests positive to the test, what is the probability Jean really
has the disease?
Ans: Take the 60% accuracy to mean that P(positive | Omicron) = 0.6 and P(negative | no Omicron) = 0.6, so
the false-positive rate is 0.4. With P(Omicron) = 0.8:
P(positive) = 0.8 × 0.6 + 0.2 × 0.4 = 0.48 + 0.08 = 0.56
P(Omicron | positive) = 0.48/0.56 ≈ 0.857 ≈ 85.7%
For the second part, with P(disease) = 0.05:
P(positive) = 0.05 × 0.6 + 0.95 × 0.4 = 0.03 + 0.38 = 0.41
P(disease | positive) = 0.03/0.41 ≈ 0.073 ≈ 7.3%
So the posterior probability depends on both the test accuracy and the prevalence of the disease: when the
disease is rare, even a positive result leaves only a small probability of actually having it.
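All of the diagnostic-test questions above follow the same Bayes pattern, so the effect of prevalence is easy to see with a small helper. A Python sketch under the assumption that "accuracy" means both the sensitivity and the specificity of the test; the function name is illustrative, not from the notes:

def p_disease_given_positive(prevalence, sensitivity, specificity):
    # Bayes' theorem: P(disease | positive test)
    p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    return prevalence * sensitivity / p_positive

print(p_disease_given_positive(0.80, 0.60, 0.60))   # ~0.857: high prevalence
print(p_disease_given_positive(0.05, 0.60, 0.60))   # ~0.073: low prevalence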
WHAT IS CARTESIAN PLANE?
A cartesian plane is part of the cartesian coordinate system. This coordinate
system can be translated into one, two, and three dimensions. In two
dimensions, the plane is called the cartesian plane. It can also be called the
coordinate plane.
A cartesian plane can be divided into three major parts. These three parts are vital when we try
to locate a point on the cartesian plane or draw the graph of a certain function. These are given
below as follows:
Axes - The two lines that intersect to form the cartesian plane are known as the axes.
The horizontal line is called the x-axis. The vertical line that is perpendicular to the x-axis is
known as the y- axis.
Origin - The point where the two perpendicular axes - x and y - meet is known as the origin. The
coordinates of the origin are given by (0, 0). The axes are divided into two equal parts by the
origin.
Quadrants - When the x and the y axes intersect, it divides the cartesian plane into 4 regions.
These are known as quadrants and extend infinitely.
WHAT ARE QUADRANTS?
Quadrant is the region enclosed by the intersection of the X-axis and the Y-axis. On the cartesian
plane when the two axes, X-axis and Y-axis, intersect with each other at 90º there are four regions
formed around it, and those regions are called quadrants. So, every plane has four quadrants each
bounded by half of the axes. Each quadrant is denoted by Roman numerals and named as
Quadrant I, Quadrant II, Quadrant III, and Quadrant IV based on their position with respect to the
axes.
In the cartesian system, a plane is divided into four regions by a horizontal line called X-axis and
a vertical line called Y-axis. These four regions are known as quadrants.
The positive direction will be upwards and towards the right while the negative direction is
downwards and to the left.
PROBLEM ON CARTESIAN PLANE
Dexter plans and records his spending. In the coordinate plane below, the x-axis represents the
number of days before or after today. The y-axis represents the amount of money Dexter planned
to spend in dollars. The points (-5, 8) and (2, 8) represent Dexter's planned spending on two
separate days. (a) Plot the two points on the coordinate plane. (b) How many days separate the two days
Dexter planned to spend $8?
Ans: distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
= sqrt((2 - (-5))^2 + (8 - 8)^2)
= sqrt((2 + 5)^2)
= sqrt(49)
= 7 days
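The calculation above is just the distance formula applied on the coordinate plane; a quick Python check with the two points from the problem:

import math

def distance(p, q):
    # Euclidean distance between points p = (x1, y1) and q = (x2, y2)
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(distance((-5, 8), (2, 8)))   # 7.0 -> the two days are 7 days apart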
The equation of a line can be formed with the help of the slope of the line and a
point on the line. Let us understand more about the slope of the line and the
needed point on the line, to better understand the formation of the equation of a
line. The slope of the line is the inclination of the line with respect to the positive x-axis,
and is expressed as a number (an integer or a fraction) equal to the tangent of the angle
the line makes with the positive x-axis. The point refers to a point in the coordinate
system with an x coordinate and a y coordinate.
The general equation of a line in two variables of the first degree is represented as
Ax + By + C = 0
SLOPE-INTERCEPT FORM
We know that the equation of a straight line in slope-intercept form is given as:
y = mx + c
Where m indicates the slope of the line and c is the y-intercept.
When B ≠ 0, the standard first-degree equation Ax + By + C = 0 can be rewritten in slope-
intercept form as:
y = (−A/B)x − (C/B)
INTERCEPT FORM
The intercept of a line is the point through which the line crosses the x-axis or y-axis. Suppose a
line cuts the x-axis and y-axis at (a, 0) and (0, b), respectively. Then, the equation of a line
making intercepts equal to a and b on the x-axis and the y-axis respectively is given by:
x/a + y/b = 1
Now, in the case of the general form of the equation of the straight line, i.e., Ax + By + C = 0, if C ≠ 0,
then Ax + By + C = 0 can be written as
x/(−C/A) + y/(−C/B) = 1, i.e. the x-intercept is a = −C/A and the y-intercept is b = −C/B.
NORMAL FORM
The equation of the line whose perpendicular distance from the origin is p, where the angle
made by the perpendicular with the positive x-axis is α, is given by:
x cos α + y sin α = p
Comparing this with the general form Ax + By + C = 0 gives A/cos α = B/sin α = −C/p,
from which cos α = −Ap/C and sin α = −Bp/C.
Also it can be inferred that,
If two points (x1, y1) and (x2, y2) are said to lie on the same side of the line Ax + By +
C = 0, then the expressions Ax1+ By1 + C and Ax2 + By2 + C will have the same sign
or else these points would lie on the opposite sides of the line.
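The conversions between the general form Ax + By + C = 0 and the other forms, together with the same-side test just described, can be sketched in a few lines of Python. The function names and example coefficients are my own, chosen only for illustration:

def slope_intercept(A, B, C):
    # Rewrite Ax + By + C = 0 as y = mx + c (requires B != 0)
    return -A / B, -C / B            # (m, c)

def intercepts(A, B, C):
    # x- and y-intercepts of Ax + By + C = 0 (requires A, B, C all non-zero)
    return -C / A, -C / B            # (a, b) so that x/a + y/b = 1

def same_side(A, B, C, p1, p2):
    # True if p1 and p2 give the same sign when substituted into Ax + By + C
    s1 = A * p1[0] + B * p1[1] + C
    s2 = A * p2[0] + B * p2[1] + C
    return s1 * s2 > 0

print(slope_intercept(2, -1, 3))             # (2.0, 3.0): y = 2x + 3
print(intercepts(2, -1, 3))                  # (-1.5, 3.0)
print(same_side(2, -1, 3, (0, 0), (1, 10)))  # False: the points lie on opposite sides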
WHAT IS A GRAPH?
A graph G = (V, E) is a structure consisting of a set of vertices (nodes) V and a set of edges E, where each
edge joins a pair of vertices. Graphs are used to model pairwise relationships between objects.
TYPES OF GRAPHS
NULL GRAPH
The Null Graph is also known as the order zero graph. The term "null graph"
refers to a graph with an empty edge set. In other words, a null graph has no
edges; it contains only isolated vertices.
The image displayed above is a null (zero) graph because it has zero edges
between the three vertices of the graph.
TRIVIAL GRAPH
A graph is called a trivial graph if it has only one vertex present in it. The trivial
graph is the smallest possible graph that can be created with the least number of
vertices that is one vertex only.
The above is an example of a trivial graph, having only a single vertex in the
whole graph, named vertex A.
NON-DIRECTED GRAPH
A graph is called a non-directed graph if all the edges present between any
graph nodes are non-directed. By non-directed edges, we mean edges for which we cannot
determine the node at which they start and the node at which they end. All the edges of a graph
need to be non-directed to call it a non-directed graph. None of the edges of a non-directed
graph have any direction.
The graph that is displayed above is an example of a non-directed graph. It is called a non-directed
graph because there are four vertices named vertex A, vertex B, vertex C, and vertex D, and there are
exactly four edges between these vertices of the graph. None of the edges that are present between the
different nodes of the graph are directed, which means the edges don't have any specific direction.
For example, the edge between vertex A and vertex B doesn't have any direction, so we cannot
determine whether the edge between vertex A and vertex B starts from vertex A or vertex B.
Similarly, we can't determine the ending vertex of this edge between these nodes.
DIRECTED GRAPH
Another name for the directed graphs is digraphs. A graph is called a directed graph or digraph if
all the edges present between any vertices or nodes of the graph are directed or have a defined
direction. By directed edges, we mean the edges of the graph that have a direction to determine
from which node it is starting and at which node it is ending.
All the edges of a graph need to be directed to call it a directed graph or digraph. All the edges of a
directed graph or digraph have a direction, starting from one vertex and ending at another.
The graph that is displayed above is an example of a directed graph. This graph is called a
directed graph because there are four vertices in the graph named vertex A, vertex B, vertex C,
and vertex D. There are also exactly four edges between these vertices of the graph, and all
the edges that are present between the different nodes of the graph are directed (pointing to
some of the vertices), which means the edges have a specific direction assigned to them.
For example, consider the edge that is present between vertex D and vertex A. This edge has
an arrowhead pointing towards vertex A, which means this edge starts from vertex D and ends
at vertex A.
CONNECTED GRAPH
For a graph to be labelled as a connected graph, there must be at least a single path between
every pair of the graph's vertices. In other words, we can say that if we start from one vertex, we
should be able to move to any of the vertices that are present in that particular graph, which means
there exists at least one path between all the vertices of the graph.
The graph shown above is an example of a connected graph because we start from any one of the
vertices of the graph and start moving towards any other remaining vertices of the graph. There
will exist at least one path for traversing the graph.
For example, if we begin from vertex B and traverse to vertex H, there are various paths for
traversing. One of the paths is
vertex B -> vertex C -> vertex D -> vertex F -> vertex E -> vertex H.
Similarly, there are other paths for traversing the graph from vertex B to vertex H. There is at
least one path between all the graph nodes. In other words, we can say that all the vertices or
nodes of the graph are connected to each other via one or more edges.
DISCONNECTED GRAPH
A graph is said to be a disconnected graph when there does not exist any path between at least one
pair of vertices. In other words, we can say that if we start from any one of the vertices of the
graph and try to move to the remaining present vertices of the graph and there exists not even a
single path to move to that vertex, then it is the case of the disconnected graph. If any one of such
a pair of vertices doesn't have a path between them, it is called a disconnected graph.
The graph shown above is a disconnected graph. The above graph is called a disconnected graph
because at least one pair of vertices doesn't have a path to traverse starting from either node. For
example, a single path between both vertices doesn't exist if we want to traverse from vertex A to
vertex G. In other words, we can say that all the vertices or nodes of the graph are not connected
to each other via edge or number of edges so that they can be traversed.
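Connectedness can be checked programmatically with a simple traversal: start at any vertex and test whether every other vertex is reachable. A minimal Python sketch using an adjacency-list dictionary; the two example graphs are made up purely for illustration:

from collections import deque

def is_connected(adj):
    # Breadth-first search from an arbitrary start vertex;
    # the graph is connected if every vertex is reached.
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

connected = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C']}
disconnected = {'A': ['B'], 'B': ['A'], 'C': ['D'], 'D': ['C']}
print(is_connected(connected))      # True
print(is_connected(disconnected))   # False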
REGULAR GRAPH
For a graph to be called regular, it should satisfy one primary condition: all graph vertices
should have the same degree. By the degree of a vertex, we mean the number of edges incident
to that particular vertex. If all the graph nodes have the same degree value, then the graph is
called a regular graph. If all the vertices of a graph have the degree value of 6, then the graph is
called a 6-regular graph. If all the vertices in a graph are of degree 'k', then it is called a "k-
regular graph".
The graphs that are displayed above are regular graphs. In graph 1, there are three vertices named
vertex A, vertex B, and vertex C, and each vertex in graph 1 has degree 2.
The degree of each vertex is calculated by counting the number of edges connected to that
particular vertex.
For vertex A in graph 1, there are two edges associated with it, one to vertex B and
another to vertex C. Thus, the degree of vertex A in graph 1 is 2. Similarly, for the other
vertices of the graph, there are only two edges associated with each vertex, so the degrees of
vertex B and vertex C are also 2. As the degree of all three nodes of the graph is the same,
that is 2, this graph is called a 2-regular graph.
Similarly, for the second graph shown above, there are four vertices named vertex E, vertex F,
vertex G, and vertex H. The degree of all four vertices of this graph is 2, because only two
edges are associated with each of the graph's vertices. As all the nodes of this graph have the
same degree of 2, this graph is also a 2-regular graph.
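Whether a graph is k-regular can be checked by comparing the degrees of its vertices, i.e. the number of edges touching each vertex. A short Python sketch; the triangle adjacency list mirrors graph 1 described above and is assumed for illustration:

def regular_degree(adj):
    # Return the common degree k if every vertex has the same degree, else None.
    degrees = {len(neighbours) for neighbours in adj.values()}
    return degrees.pop() if len(degrees) == 1 else None

# A triangle: every vertex has degree 2, so this is a 2-regular graph.
triangle = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
print(regular_degree(triangle))   # 2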
COMPLETE GRAPH
A graph is said to be a complete graph if, for all the vertices of the graph, there exists an edge
between every pair of the vertices. In other words, we can say that all the vertices are connected to
the rest of all the vertices of the graph. A complete graph of 'n' vertices contains exactly nC2 = n(n − 1)/2
edges, and a complete graph of 'n' vertices is represented as Kn.
There are two graphs named K3 and K4 shown in the above image, and both graphs are complete
graphs. Graph K3 has three vertices, and each vertex has an edge to each of the remaining
vertices. Similarly, for graph K4, there are four nodes named vertex E, vertex F, vertex G, and
vertex H. For example, vertex F has three edges connected to it, one to each of the respective
three remaining vertices of the graph. Likewise, for the other three remaining vertices,
there are three edges associated with each one of them. As every vertex of this graph has a
separate edge to every other vertex, it is called a complete graph.
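The nC2 = n(n - 1)/2 edge count can be verified by building Kn explicitly, one edge per pair of vertices. A small Python sketch, not taken from the notes:

from itertools import combinations

def complete_graph_edges(n):
    # All edges of K_n on vertices 0..n-1: one edge for every pair of vertices.
    return list(combinations(range(n), 2))

for n in (3, 4, 5):
    edges = complete_graph_edges(n)
    print(n, len(edges), n * (n - 1) // 2)   # the two counts always agree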
CYCLE GRAPH
If a graph has three or more vertices and its edges form a single cycle, then the graph is called a
cycle graph. In a cycle graph, the degree of every vertex is exactly 2.
There are three graphs shown in the above image, and all of them are examples of cycle
graphs because each has at least three vertices and the degree of every vertex in each of them
is exactly 2.
CYCLIC GRAPH
For a graph to be called a cyclic graph, it should consist of at least one cycle. If a graph has a
minimum of one cycle present, it is called a cyclic graph.
The graph shown in the image has two cycles present, satisfying the required condition for a
graph to be cyclic, thus making it a cyclic graph.
FINITE GRAPH
A graph G = (V, E) is called a finite graph if the number of vertices and the number of edges in the graph are finite.
INFINITE GRAPH
A graph G = (V, E) is called an infinite graph if it has an infinite number of vertices or an infinite
number of edges.
SIMPLE GRAPH
A simple graph is a graph that does not contain more than one edge between any pair of vertices
and has no self-loops. A simple railway track connecting different cities is an example of a simple graph.
MULTI GRAPH
Any graph which contains some parallel edges but doesn’t contain any self-loop is called a
multigraph. For example, a Road Map.
Parallel Edges: If two vertices are connected with more than one edge, then such edges are called parallel
edges - many routes but one destination.
Loop: An edge of a graph that starts from a vertex and ends at the same vertex is called a loop or self-loop.
PSEUDO GRAPH
A graph G with a self-loop and some multiple edges is called a pseudo graph.
BIPARTITE GRAPH
A graph G = (V, E) is said to be a bipartite graph if its vertex set V(G) can be partitioned into
two non-empty disjoint subsets. V1(G) and V2(G) in such a way that each edge e of E(G)has
one end in V1(G) and another end in V2(G). The partition V1 U V2 = V is called Bipartite of G.
Here in the figure: V1(G)= {V5, V4, V3} and V2(G)= {V1, V2}
DIGRAPH
A graph G = (V, E) with a mapping f such that every edge maps onto some ordered pair of
vertices (Vi, Vj) is called a digraph. It is also called a directed graph. The ordered pair (Vi, Vj)
means an edge between Vi and Vj with an arrow directed from Vi to Vj. Here in the figure: e1 =
(V1, V2), e2 = (V2, V3), e4 = (V2, V4).
SUB GRAPH
A graph G1 = (V1, E1) is called a subgraph of a graph G = (V, E) if V1(G1) is a subset of V(G) and
E1(G1) is a subset of E(G), such that each edge of G1 has the same end vertices as in G.
EXPONENTS
The exponent of a number shows how many times the number is multiplied by itself. For
example, 2 × 2 × 2 × 2 can be written as 2^4, as 2 is multiplied by itself 4 times. Here, 2 is called the
"base" and 4 is called the "exponent" or "power." In general, x^n means that x is multiplied by
itself n times.
PROPERITES OF EXPONENTS
The properties of exponents, or laws of exponents, are used to solve problems
involving exponents. These properties are also considered the major exponent rules
to be followed while solving exponents, for example a^m × a^n = a^(m+n), a^m ÷ a^n = a^(m−n),
(a^m)^n = a^(mn), and a^0 = 1 (for a ≠ 0).
NEGATIVE EXPONENT
A negative exponent tells us how many times we have to multiply the reciprocal of
the base. For example, if it is given as a^(−n), it can be expanded as 1/a^n. It means
we have to multiply the reciprocal of a, i.e. 1/a, 'n' times. Negative exponents are
used while writing fractions with exponents. Some examples of negative
exponents are 2 × 3^(−9), 7^(−3), 67^(−5), etc.
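Both positive and negative exponents behave exactly as described, which is easy to confirm in Python, where ** is the exponent operator:

print(2 ** 4)                   # 16: 2 multiplied by itself 4 times
print(2 ** -3)                  # 0.125: a negative exponent uses the reciprocal
print(1 / 2 ** 3)               # 0.125: the same value as 2 ** -3
print(7 ** -3 == 1 / 7 ** 3)    # True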