IDS - UNIT-2 - Notes part1_Introduction to Data Science and Prob concept[1]
• The business requirement step deals with identifying the problem and the objectives of the organization. It also identifies the parameters that are to be forecasted or predicted.
• The data acquisition step deals with finding and collecting the source data and storing it so that the information of interest that meets the business requirements can be extracted.
• The data processing step is used to transform the data into a form better suited to finding the required information. The major task of this step is data cleaning, i.e., removal of unwanted data from the available raw data.
• The data exploration step is a brainstorming step where patterns are identified. Here visualization charts are used to extract the required information.
• The data modeling step deals with building of data models and training the models using the data sets.
It uses machine learning algorithms and techniques for better prediction and forecasting.
• The deployment stage deals with the deployment of the model in the business environment.
2.2 Data Science Life Cycle:
Following are the seven steps that make up a data science lifecycle - business understanding, data mining,
data cleaning, data exploration, feature engineering, predictive modeling, and data visualization.
1. Business Understanding
The data scientists in the room are the people who keep asking the why’s. They’re the people who want to
ensure that every decision made in the company is supported by concrete data, and that it is guaranteed (with
a high probability) to achieve results. Before you can even start on a data science project, it is critical that you
understand the problem you are trying to solve. We typically use data science to answer five recurring types of questions.
In this stage, you should also be identifying the central objectives of your project by identifying the variables
that need to be predicted. If it’s a regression, it could be something like a sales forecast. If it’s a clustering, it
could be a customer profile. Understanding the power of data and how you can utilize it to derive results for
your business by asking the right questions is more of an art than a science, and doing this well comes with a
lot of experience. One shortcut to gaining this experience is to read what other people have to say about the
topic, which is why I’m going to suggest a bunch of books to get started.
2. Data Mining
Now that you’ve defined the objectives of your project, it’s time to start gathering the data. Data mining is the
process of gathering your data from different sources. Some people tend to group data retrieval and cleaning
together, but each of these processes is such a substantial step that I’ve decided to break them apart. At this
stage, some of the questions worth considering are - what data do I need for my project? Where does it live?
How can I obtain it? What is the most efficient way to store and access all of it?
If all the data necessary for the project is packaged and handed to you, you’ve won the lottery. More often
than not, finding the right data takes both time and effort. If the data lives in databases, your job is relatively
simple - you can query the relevant data using SQL queries, or manipulate it using a dataframe tool like
Pandas. However, if your data doesn’t actually exist in a dataset, you’ll need to scrape it. Beautiful Soup is a
popular library used to scrape web pages for data. If you’re working with a mobile app and want to track user
engagement and interactions, there are countless tools that can be integrated within the app so that you can
start getting valuable data from customers. Google Analytics, for example, allows you to define custom
events within the app which can help you understand how your users behave and collect the corresponding
data.
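A rough sketch of what this stage can look like in Python; the SQLite file name, table, column names, URL, and the "price" CSS class below are hypothetical placeholders, not part of these notes:

```python
import sqlite3
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Pull structured data straight from a database into a DataFrame
# (assumes a local SQLite file "sales.db" containing a table named "sales").
conn = sqlite3.connect("sales.db")
sales = pd.read_sql_query("SELECT region, amount, order_date FROM sales", conn)
conn.close()

# Scrape semi-structured data from a web page
# (the URL and the "price" CSS class are illustrative only).
response = requests.get("https://example.com/products")
soup = BeautifulSoup(response.text, "html.parser")
prices = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="price")]

print(sales.head())
print(prices[:5])
```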
3. Data Cleaning
Now that you’ve got all of your data, we move on to the most time-consuming step of all - cleaning and
preparing the data. This is especially true in big data projects, which often involve terabytes of data to work
with. According to interviews with data scientists, this process (also referred to as ‘data janitor work’) can
often take 50 to 80 percent of their time. So what exactly does it entail, and why does it take so long?
The reason why this is such a time consuming process is simply because there are so many possible scenarios
that could necessitate cleaning. For instance, the data could also have inconsistencies within the same column,
meaning that some rows could be labelled 0 or 1, and others could be labelled no or yes. The data types could
also be inconsistent - some of the 0s might be integers, whereas some of them could be strings. If we're dealing
with a categorical data type with multiple categories, some of the categories could be misspelled or have
different cases, such as having categories for both male and Male. This is just a subset of examples where you
can see inconsistencies, and it’s important to catch and fix them in this stage.
One issue that is often overlooked at this stage, causing a lot of problems later on, is missing data. Missing values can throw a lot of errors during model creation and training. One option is to simply ignore (drop) the instances which have any missing values. Depending on your dataset, this could be unrealistic if you have a lot of missing data. Another common approach is average imputation, which replaces missing values with the average of all the other instances. This is not always recommended because it can reduce the variability of your data, but in some cases it makes sense.
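A minimal pandas sketch of the two approaches just described, dropping rows with missing values versus average (mean) imputation, using a made-up toy DataFrame:

```python
import pandas as pd
import numpy as np

# Toy DataFrame with a missing value in the numeric "age" column.
df = pd.DataFrame({"age": [23, 31, np.nan, 45], "score": [88, 92, 79, 85]})

# Option 1: drop any row that contains a missing value.
dropped = df.dropna()

# Option 2: mean (average) imputation - replace missing ages with the column mean.
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

print(dropped)
print(imputed)
```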
4. Data Exploration
Now that you’ve got a sparkling clean set of data, you’re ready to finally get started in your analysis. The data
exploration stage is like the brainstorming of data analysis. This is where you understand the patterns and bias
in your data. It could involve pulling up and analyzing a random subset of the data using Pandas, plotting a
histogram or distribution curve to see the general trend, or even creating an interactive visualization that lets
you dive down into each data point and explore the story behind the outliers.
Using all of this information, you start to form hypotheses about your data and the problem you are tackling.
If you were predicting student scores for example, you could try visualizing the relationship between scores
and sleep. If you were predicting real estate prices, you could perhaps plot the prices as a heat map on a
spatial plot to see if you can catch any trends.
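A small illustrative sketch of this kind of exploration with Pandas and Matplotlib, using an invented student sleep/score dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical student dataset: hours of sleep vs. exam score.
df = pd.DataFrame({"sleep_hours": [5, 6, 7, 8, 6, 7, 9, 4],
                   "score":       [55, 62, 70, 78, 65, 72, 85, 50]})

# Inspect a random subset and summary statistics.
print(df.sample(3))
print(df.describe())

# Histogram of scores to see the general trend.
df["score"].hist(bins=5)
plt.xlabel("Exam score")
plt.ylabel("Count")

# Scatter plot to eyeball the sleep/score relationship.
df.plot.scatter(x="sleep_hours", y="score")
plt.show()
```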
There is a great summary of tools and approaches on the Wikipedia page for exploratory data analysis.
5. Feature Engineering
In machine learning, a feature is a measurable property or attribute of a phenomenon being observed. If we
were predicting the scores of a student, a possible feature is the amount of sleep they get. In more complex
prediction tasks such as character recognition, features could be histograms counting the number of black
pixels.
According to Andrew Ng, one of the top experts in the fields of machine learning and deep learning, “Coming
up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is
basically feature engineering.” Feature engineering is the process of using domain knowledge to transform
your raw data into informative features that represent the business problem you are trying to solve. This stage
will directly influence the accuracy of the predictive model you construct in the next stage.
We typically perform two types of tasks in feature engineering - feature selection and construction.
Feature selection is the process of cutting down the features that add more noise than information. This is
typically done to avoid the curse of dimensionality, which refers to the increased complexity that arises from
high-dimensional spaces (i.e. way too many features). I won’t go too much into detail here because this topic
can be pretty heavy, but we typically use filter methods (apply statistical measure to assign scoring to each
feature), wrapper methods (frame the selection of features as a search problem and use a heuristic to perform
the search) or embedded methods (use machine learning to figure out which features contribute best to the
accuracy).
Feature construction involves creating new features from the ones that you already have (and possibly
ditching the old ones). An example of when you might want to do this is when you have a continuous
variable, but your domain knowledge informs you that you only really need an indicator variable based on a
known threshold. For example, if you have a feature for age, but your model only cares about if a person is an
adult or minor, you could threshold it at 18, and assign different categories to instances above and below that
threshold. You could also merge multiple features to make them more informative by taking their sum,
difference or product. For example, if you were predicting student scores and had features for the number of
hours of sleep on each night, you might want to create a feature that denoted the average sleep that the student
had instead.
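A brief pandas sketch of the feature construction ideas above (thresholding age into adult/minor and merging per-night sleep columns into an average); the column names and values are illustrative only:

```python
import pandas as pd

# Hypothetical raw features for a handful of students.
df = pd.DataFrame({
    "age": [15, 22, 17, 34],
    "sleep_mon": [6, 7, 5, 8],
    "sleep_tue": [7, 6, 6, 7],
    "sleep_wed": [5, 8, 7, 6],
})

# Feature construction 1: threshold a continuous variable into an indicator
# (adult vs. minor at age 18, as described above).
df["is_adult"] = (df["age"] >= 18).astype(int)

# Feature construction 2: merge several related features into one
# more informative feature - the average nightly sleep.
df["avg_sleep"] = df[["sleep_mon", "sleep_tue", "sleep_wed"]].mean(axis=1)

# Feature selection: drop the raw per-night columns that now add little information.
df = df.drop(columns=["sleep_mon", "sleep_tue", "sleep_wed"])
print(df)
```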
6. Predictive Modeling
Predictive modeling is where the machine learning finally comes into your data science project. I use the term
predictive modeling because I think a good project is not one that just trains a model and obsesses over the
accuracy, but also uses comprehensive statistical methods and tests to ensure that the outcomes from the
model actually make sense and are significant. Based on the questions you asked in the business
understanding stage, this is where you decide which model to pick for your problem. This is never an easy
decision, and there is no single right answer. The model (or models, and you should always be testing several)
that you end up training will be dependent on the size, type and quality of your data, how much time and
computational resources you are willing to invest, and the type of output you intend to derive. There are a
couple of different cheat sheets available online which have a flowchart that helps you decide the right
algorithm based on the type of classification or regression problem you are trying to solve. The two that I
really like are the Microsoft Azure Cheat Sheet and SAS Cheat Sheet.
Once you’ve trained your model, it is critical that you evaluate its success. A process called k-fold cross
validation is commonly used to measure the accuracy of a model. It involves separating the dataset into k equally sized groups of instances, training on all the groups except one, and repeating the process with a different group left out each time. This allows every instance to be used for both training and evaluation, rather than relying on a single train-test split.
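A minimal scikit-learn sketch of k-fold cross validation on a built-in dataset (the choice of logistic regression and of 5 folds here is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross validation: the data is split into 5 groups, and each group
# takes a turn as the held-out evaluation set.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```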
For classification models, we often test accuracy using PCC (percent correct classification), along with a
confusion matrix which breaks down the errors into false positives and false negatives. Plots such as ROC curves, which show the true positive rate plotted against the false positive rate, are also used to benchmark the success of a model. For a regression model, the common metrics include the coefficient of determination (which gives information about the goodness of fit of a model), mean squared error (MSE), and mean absolute error (MAE).
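A short scikit-learn sketch of these classification metrics (accuracy, confusion matrix, and ROC AUC) on a built-in dataset, purely as an illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Percent correct classification (accuracy) and the confusion matrix,
# which separates false positives from false negatives.
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Area under the ROC curve (true positive rate vs. false positive rate).
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```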
7. Data Visualization
Data visualization is a tricky field, mostly because it seems simple but it could possibly be one of the hardest
things to do well. That’s because data viz combines the fields of communication, psychology, statistics, and
art, with an ultimate goal of communicating the data in a simple yet effective and visually pleasing way. Once
you’ve derived the intended insights from your model, you have to represent them in a way that the different
key stakeholders in the project can understand.
Again, this is a topic that could be a blog post on its own, so instead of diving deeper into the field of data
visualization, I will give a couple of starting points. I personally love working through the analysis and
visualization pipeline on an interactive Python notebook like Jupyter, in which I can have my code and
visualizations side by side, allowing for rapid iteration with libraries like Seaborn and Bokeh. Tools like
Tableau and Plotly make it really easy to drag-and-drop your data into a visualization and manipulate it to get
more complex visualizations. If you’re building an interactive visualization for the web, there is no better
starting point than D3.js.
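A small sketch of the kind of stakeholder-facing chart described here, using Seaborn and Matplotlib with invented numbers:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical summarized results to present to stakeholders.
results = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "actual_sales": [120, 135, 150, 160],
    "predicted_sales": [118, 140, 148, 163],
})

# A simple, clearly labelled comparison chart built with Seaborn/Matplotlib.
melted = results.melt(id_vars="month", var_name="series", value_name="sales")
sns.barplot(data=melted, x="month", y="sales", hue="series")
plt.title("Actual vs. predicted monthly sales")
plt.ylabel("Sales (units)")
plt.show()
```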
8. Business Understanding
Phew. Now that you’ve gone through the entire lifecycle, it’s time to go back to the drawing board.
Remember, this is a cycle, and so it’s an iterative process. This is where you evaluate how the success of your
model relates to your original business understanding. Does it tackle the problems identified? Does the
analysis yield any tangible solutions? If you encountered any new insights during the first iteration of the
lifecycle (and I assure you that you will), you can now infuse that knowledge into the next iteration to
generate even more powerful insights, and unleash the power of data to derive phenomenal results for your
business or project.
1. SAS
It is one of those data science tools which are specifically designed for statistical operations.
SAS is a closed source proprietary software that is used by large organizations to analyze
data. SAS uses the base SAS programming language for performing statistical modeling.
It is widely used by professionals and companies working on reliable commercial software.
SAS offers numerous statistical libraries and tools that you as a Data Scientist can use for modeling and organizing your data. While SAS is highly reliable and has strong support from
the company, it is highly expensive and is only used by larger industries. Also, SAS pales in
comparison with some of the more modern tools which are open-source. Furthermore, there
are several libraries and packages in SAS that are not available in the base pack and can require an expensive upgrade.
2. Apache Spark
Apache Spark, or simply Spark, is a powerful analytics engine and one of the most widely used Data Science tools. Spark is specifically designed to handle batch processing and stream processing. It comes with many APIs that allow Data Scientists to make repeated access to data for Machine Learning, storage in SQL, etc. It is an improvement over Hadoop and can perform up to 100 times faster than MapReduce. Spark has many Machine Learning APIs that can help Data Scientists make powerful predictions with the given data.
Spark does better than other Big Data platforms in its ability to handle streaming data. This means that Spark can process real-time data, whereas many other analytical tools can only process historical data in batches. Spark offers various APIs that are programmable in Python, Java, and R. But the most powerful combination is Spark with the Scala programming language, which runs on the Java Virtual Machine and is cross-platform in nature.
Spark is highly efficient in cluster management, which makes it much better than Hadoop, as the latter is mainly used for storage. It is this cluster management system that allows Spark to process applications at high speed.
3. BigML
BigML is another widely used Data Science tool. It provides a fully interactive, cloud-based GUI environment that you can use for processing Machine Learning algorithms. BigML provides standardized software using cloud computing for industry requirements. Through it, companies can use Machine Learning algorithms across various parts of the business. For example, a company can use this one software for sales forecasting, risk analytics, and product innovation. BigML specializes in predictive modeling. It uses a wide
variety of Machine Learning algorithms like clustering, classification, time-series
forecasting, etc.
BigML provides an easy-to-use web interface using REST APIs, and you can create a free account or a premium account based on your data needs. It allows interactive visualizations of data and provides you with the ability to export visual charts to your mobile or IoT devices.
Furthermore, BigML comes with various automation methods that can help you to automate
the tuning of hyperparameter models and even automate the workflow of reusable scripts.
4. D3.js
JavaScript is mainly used as a client-side scripting language. D3.js, a JavaScript library, allows you to make interactive visualizations in your web browser. With several APIs of D3.js, you
can use several functions to create dynamic visualization and analysis of data in your
browser. Another powerful feature of D3.js is the usage of animated transitions. D3.js makes
documents dynamic by allowing updates on the client side and actively using changes in the data to update visualizations in the browser.
You can combine this with CSS to create rich, animated visualizations that will help you implement customized graphs on web pages. Overall, it can be a very useful tool for Data Scientists who are working on IoT-based devices that require client-side interaction for visualization and data processing.
5. MATLAB
MATLAB is a multi-paradigm numerical computing environment for processing
mathematical information. It is a closed-source software that facilitates matrix functions,
algorithmic implementation and statistical modeling of data. MATLAB is most widely used
in several scientific disciplines.
In Data Science, MATLAB is used for simulating neural networks and fuzzy logic. Using the
MATLAB graphics library, you can create powerful visualizations. MATLAB is also used in
image and signal processing. This makes it a very versatile tool for Data Scientists as they
can tackle all the problems, from data cleaning and analysis to more advanced Deep Learning
algorithms.
Furthermore, MATLAB's easy integration with enterprise applications and embedded systems makes it an ideal Data Science tool. It also helps in automating various tasks ranging
from extraction of data to re-use of scripts for decision making. However, it suffers from the
limitation of being a closed-source proprietary software.
6. Excel
Probably the most widely used Data Analysis tool. Microsoft developed Excel mostly for
spreadsheet calculations and today, it is widely used for data processing, visualization, and
complex calculations. Excel is a powerful analytical tool for Data Science. While it has been
the traditional tool for data analysis, Excel still packs a punch.
Excel comes with various formulae, tables, filters, slicers, etc. You can also create
your own custom functions and formulae using Excel. While Excel is not suited for handling huge amounts of data, it is still an ideal choice for creating powerful data visualizations and
spreadsheets. You can also connect SQL with Excel and can use it to manipulate and analyze
data. A lot of Data Scientists use Excel for data cleaning as it provides an interactable GUI
environment to pre-process information easily.
With the Analysis ToolPak add-in for Microsoft Excel, it is now much easier to perform complex analyses. However, it still pales in comparison with much more advanced Data Science
tools like SAS. Overall, on a small and non-enterprise level, Excel is an ideal tool for data
analysis.
7. ggplot2
ggplot2 is an advanced data visualization package for the R programming language. The
developers created this tool to replace the native graphics package of R, and it uses powerful commands to create compelling visualizations. It is one of the most widely used libraries that Data Scientists use for creating visualizations from analyzed data.
ggplot2 is part of the tidyverse, a collection of R packages designed for Data Science. One area in which ggplot2 is much better than most other data visualization tools is aesthetics. With ggplot2, Data Scientists can create customized visualizations and engage in enhanced storytelling. Using ggplot2, you can annotate your data in visualizations, add text labels to data points, and boost the interactivity of your graphs. You can also create various styles of maps such as choropleths, cartograms, hexbins, etc. It is one of the most widely used data science visualization libraries.
8. Tableau
Tableau is a Data Visualization software that is packed with powerful graphics to make
interactive visualizations. It is focused on industries working in the field of business
intelligence. The most important aspect of Tableau is its ability to interface with databases,
spreadsheets, OLAP (Online Analytical Processing) cubes, etc. Along with these features,
Tableau has the ability to visualize geographical data by plotting longitudes and latitudes on maps.
Along with visualizations, you can also use its analytics tool to analyze data. Tableau comes
with an active community and you can share your findings on the online platform. While
Tableau is enterprise software, it comes with a free version called Tableau Public.
9. Jupyter
Project Jupyter is an open-source tool, based on IPython, that helps developers build open-source software and experience interactive computing. Jupyter supports multiple languages like Julia, Python, and R. It is a web-application tool used for writing live code, visualizations, and presentations. Jupyter is a widely popular tool that is designed to address the requirements of Data Science.
It is an interactive environment through which Data Scientists can perform all of their responsibilities. It is also a powerful tool for storytelling, as various presentation features are built into it. Using Jupyter Notebooks, one can perform data cleaning, statistical computation, and visualization, and create predictive machine learning models. It is 100% open-source and is, therefore, free of cost. There is an online Jupyter environment called Colaboratory (Google Colab), which runs in the cloud and stores data in Google Drive.
10. Matplotlib
Matplotlib is a plotting and visualization library developed for Python. It is the most popular
tool for generating graphs with the analyzed data. It is mainly used for plotting complex
graphs using simple lines of code. Using this, one can generate bar plots, histograms,
scatterplots etc. Matplotlib has several essential modules. One of the most widely used
modules is pyplot, which offers a MATLAB-like interface. Pyplot is also an open-source alternative to MATLAB's graphics modules.
Matplotlib is a preferred tool for data visualizations and is used by Data Scientists over other
contemporary tools. As a matter of fact, NASA used Matplotlib for illustrating data
visualizations during the landing of Phoenix Spacecraft. It is also an ideal tool for beginners
in learning data visualization with Python.
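A minimal pyplot sketch showing its MATLAB-like interface, with a bar plot, a histogram, and a scatter plot in a few lines (the data is made up):

```python
import matplotlib.pyplot as plt

# pyplot offers a MATLAB-like, stateful interface: a few lines are enough
# for a bar plot, a histogram, and a scatter plot side by side.
categories = ["A", "B", "C"]
values = [10, 24, 17]

plt.figure(figsize=(9, 3))

plt.subplot(1, 3, 1)
plt.bar(categories, values)
plt.title("Bar plot")

plt.subplot(1, 3, 2)
plt.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
plt.title("Histogram")

plt.subplot(1, 3, 3)
plt.scatter([1, 2, 3, 4], [2, 4, 1, 3])
plt.title("Scatter plot")

plt.tight_layout()
plt.show()
```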
11. NLTK
Natural Language Processing has emerged as one of the most popular fields in Data Science. It deals
with the development of statistical models that help computers understand human language.
These statistical models are part of Machine Learning and through several of its algorithms,
are able to assist computers in understanding natural language. Python language comes with
a collection of libraries called Natural Language Toolkit (NLTK) developed for this
particular purpose only.
NLTK is widely used for various language processing techniques like tokenization,
stemming, tagging, parsing, and machine learning. It includes a large collection of corpora, which are datasets used for building machine learning models. It has a variety of applications such as Parts of Speech Tagging, Word Segmentation, Machine Translation, Text to Speech, Speech Recognition, etc.
12. Scikit-learn
Scikit-learn is a Python library used for implementing Machine Learning algorithms. It is a simple, easy-to-use tool that is widely used for analysis and data science. It supports a variety of Machine Learning tasks such as data preprocessing, classification, regression, clustering, dimensionality reduction, etc.
Scikit-learn makes it easy to use complex machine learning algorithms. It is therefore useful in situations that require rapid prototyping, and it is also an ideal platform for research requiring basic Machine Learning. It makes use of several underlying Python libraries such as SciPy, NumPy, Matplotlib, etc.
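A minimal scikit-learn sketch, assuming the built-in Iris dataset, that chains preprocessing and a classifier into one pipeline:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Load a built-in dataset, split it, and fit a simple pipeline that
# chains preprocessing (scaling) with a classifier.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```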
13. TensorFlow
TensorFlow has become a standard tool for Machine Learning. It is widely used for advanced
machine learning algorithms like Deep Learning. Developers named TensorFlow after
Tensors which are multidimensional arrays. It is an open-source and ever-evolving toolkit
which is known for its performance and high computational abilities. TensorFlow can run on
both CPUs and GPUs, and can also run on more powerful TPU platforms. This gives it a significant edge in terms of the processing power available for advanced machine learning algorithms.
Due to its high processing ability, Tensorflow has a variety of applications such as speech
recognition, image classification, drug discovery, image and language generation, etc. For
Data Scientists specializing in Machine Learning, Tensorflow is a must know tool.
14. Weka
Weka or Waikato Environment for Knowledge Analysis is a machine learning software
written in Java. It is a collection of various Machine Learning algorithms for data mining.
Weka consists of various machine learning tools like classification, clustering, regression,
visualization and data preparation.
It is an open-source GUI software that allows easier implementation of machine learning algorithms through an interactive platform. You can understand the functioning of Machine
Learning on the data without having to write a line of code. It is ideal for Data Scientists who
are beginners in Machine Learning.
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that
are programmed to think like humans and mimic their actions. The term may also be
applied to any machine that exhibits traits associated with a human mind such as learning
and problem-solving. The ideal characteristic of artificial intelligence is its ability to
rationalize and take actions that have the best chance of achieving a specific goal. A subset
of artificial intelligence is machine learning, which refers to the concept that computer
programs can automatically learn from and adapt to new data without being assisted by
humans. Deep learning techniques enable this automatic learning through the absorption
of huge amounts of unstructured data such as text, images, or video.
Understanding Artificial Intelligence (AI)
When most people hear the term artificial intelligence, the first thing they usually think of is
robots. That's because big-budget films and novels weave stories about human-like machines
that wreak havoc on Earth. But nothing could be further from the truth.
Artificial intelligence is based on the principle that human intelligence can be defined in a
way that a machine can easily mimic it and execute tasks, from the most simple to those
that are even more complex. The goals of artificial intelligence include mimicking human
cognitive activity. Researchers and developers in the field are making surprisingly rapid
strides in mimicking activities such as learning, reasoning, and perception, to the extent that
these can be concretely defined. Some believe that innovators may soon be able to develop
systems that exceed the capacity of humans to learn or reason out any subject. But others
remain skeptical because all cognitive activity is laced with value judgements that are subject
to human experience.
As technology advances, previous benchmarks that defined artificial intelligence become
outdated. For example, machines that calculate basic functions or recognize text through
optical character recognition are no longer considered to embody artificial intelligence, since
this function is now taken for granted as an inherent computer function.
AI is continuously evolving to benefit many different industries. Machines are wired using a cross-
disciplinary approach based on mathematics, computer science, linguistics, psychology, and more.
Algorithms often play a very important part in the structure of artificial intelligence, where simple
algorithms are used in simple applications, while more complex ones help frame strong artificial
intelligence.
Machine Learning
Machine learning is a growing technology which enables computers to learn
automatically from past data. Machine learning uses various algorithms for building
mathematical models and making predictions using historical data or information.
Currently, it is being used for various tasks such as image recognition, speech recognition,
email filtering, Facebook auto-tagging, recommender system, and many more.
This machine learning tutorial gives you an introduction to machine learning along with the
wide range of machine learning techniques such as Supervised,
Unsupervised, and Reinforcement learning. You will learn about regression and
classification models, clustering methods, hidden Markov models, and various sequential
models.
What is Machine Learning
In the real world, we are surrounded by humans who can learn everything from their
experiences with their learning capability, and we have computers or machines which work
on our instructions. But can a machine also learn from experiences or past data like a human
does? This is where Machine Learning comes in.
Machine Learning is a subset of artificial intelligence that is mainly concerned with
the development of algorithms which allow a computer to learn from the data and past
experiences on their own. The term machine learning was first introduced by Arthur
Samuel in 1959. We can define it in a summarized way as:
Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together for creating predictive models. Machine learning constructs or uses the
algorithms that learn from historical data. The more information we provide, the higher the performance will be.
A machine has the ability to learn if it can improve its performance by gaining more data.
Suppose we have a complex problem, where we need to perform some predictions, so instead
of writing code for it, we just need to feed the data to generic algorithms, and with the help of these algorithms, the machine builds the logic as per the data and predicts the output. Machine learning has changed our way of thinking about such problems. The below block diagram explains the working of a Machine Learning algorithm:
Features of Machine Learning:
o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as it also deals with huge amounts of data.
Need for Machine Learning
The need for machine learning is increasing day by day. The reason behind the need for
machine learning is that it is capable of doing tasks that are too complex for a person to
implement directly. As humans, we have limitations: we cannot manually process huge amounts of data, so we need computer systems, and this is where machine learning makes things easy for us.
We can train machine learning algorithms by providing them with huge amounts of data and letting them explore the data, construct models, and predict the required output automatically.
The performance of the machine learning algorithm depends on the amount of data, and it
can be determined by the cost function. With the help of machine learning, we can save both
time and money.
The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestion by Facebook, etc. Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interests and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:
o Rapid increment in the production of data
o Solving complex problems, which are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data.
1)Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts
the output.
The system creates a model using labeled data to understand the dataset and learn about each data point. Once the training and processing are done, we test the model by providing sample data to check whether it predicts the correct output or not.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision; it is the same as when a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision. The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from the huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association
3) Semi-Supervised Learning
Semi-Supervised Learning is a learning method in which a machine learns both with and without supervision. It is a combination of Supervised and Unsupervised Learning.
4) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent gets
a reward for each right action and gets a penalty for each wrong action. The agent learns
automatically with these feedbacks and improves its performance. In reinforcement learning,
the agent interacts with the environment and explores it. The goal of an agent is to get the
most reward points, and hence, it improves its performance.
A robotic dog that automatically learns the movement of its limbs is an example of Reinforcement Learning.
Below are some main differences between AI and machine learning along with the overview
of Artificial intelligence and machine learning.
Artificial Intelligence
Artificial intelligence is a field of computer science which makes a computer system that can
mimic human intelligence. It is comprised of two words "Artificial" and "intelligence",
which means "a human-made thinking power." Hence we can define it as,
Artificial intelligence is a technology using which we can create intelligent systems that can
simulate human intelligence.
An Artificial Intelligence system does not need to be pre-programmed; instead, it uses algorithms which can work with their own intelligence. It involves machine learning algorithms such as Reinforcement Learning and deep learning neural networks. AI is being used in multiple places such as Siri, Google's AlphaGo, AI in chess playing, etc.
Machine learning
Machine learning is about extracting knowledge from the data. It can be defined as,
Machine learning is a subfield of artificial intelligence, which enables machines to learn
from past data or experiences without being explicitly programmed.
Machine learning enables a computer system to make predictions or take some decisions
using historical data without being explicitly programmed. Machine learning uses a massive
amount of structured and semi-structured data so that a machine learning model can generate accurate results or give predictions based on that data.
Machine learning works on algorithms which learn on their own using historical data. It works only for specific domains: if we are creating a machine learning model to detect pictures of dogs, it will only give results for dog images, and if we provide new data such as a cat image, it will not respond correctly. Machine learning is being used in various places
such as for online recommender system, for Google search algorithms, Email spam filter,
Facebook Auto friend tagging suggestion, etc.
Key differences between Artificial Intelligence (AI) and Machine learning (ML):
The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In Classification, a program learns
from the given dataset or observations and then classifies new observation into a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories.
Unlike regression, the output variable of Classification is a category, not a value, such as
"Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised
learning technique, hence it takes labeled input data, which means it contains input with the
corresponding output.
The main goal of the Classification algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for categorical data.
Classification algorithms can be better understood using the below diagram. In the below diagram, there are two classes, Class A and Class B. Each class contains data points whose features are similar to each other and dissimilar to those of the other class.
o Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy learner case, classification is done on the basis of the most closely related data stored in the training dataset. It takes less time for training but more time for prediction.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset. Opposite to lazy learners, eager learners take more time in training and less time in prediction. Example: Decision Trees, Naïve Bayes, ANN.
Classification algorithms can be mainly divided into two categories:
o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Classification:
Classification is a process of finding a function which helps in dividing the dataset into
classes based on different parameters. In Classification, a computer program is trained on the training dataset and, based on that training, it categorizes the data into different classes.
The task of the classification algorithm is to find the mapping function to map the input (x) to the discrete output (y).
Example: The best example to understand the Classification problem is Email Spam
Detection. The model is trained on the basis of millions of emails on different parameters,
and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the Spam folder.
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Regression:
The task of the Regression algorithm is to find the mapping function to map the input
variable (x) to the continuous output variable (y).
Example: Suppose we want to do weather forecasting, so for this, we will use the
Regression algorithm. In weather prediction, the model is trained on past data, and once the training is completed, it can easily predict the weather for future days.
Types of Regression Algorithm:
o Simple Linear Regression
o Multiple Linear Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
Regression Algorithm vs. Classification Algorithm:
o The task of the regression algorithm is to map the input value (x) to the continuous output variable (y); the task of the classification algorithm is to map the input value (x) to the discrete output variable (y).
o Regression algorithms are used with continuous data; classification algorithms are used with discrete data.
o In Regression, we try to find the best fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which can divide the dataset into different classes.
o The Regression algorithm can be further divided into Linear and Non-linear Regression; the Classification algorithm can be further divided into Binary Classifier and Multi-class Classifier.
The clustering technique does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, behavior, etc., and divides the data according to the presence and absence of those patterns.
After applying this clustering technique, each cluster or group is given a cluster ID. The ML system can use this ID to simplify the processing of large and complex datasets.
Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another, and similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc., are grouped separately, so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents according to topic.
The clustering technique can be widely used in various tasks. Some of the most common uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation system to provide recommendations based on past product searches. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.
The below diagram explains the working of the clustering algorithm: the different fruits are divided into several groups with similar properties.
What is Dimensionality Reduction?
The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
A dataset contains a huge number of input features in various cases, which makes the
predictive modeling task more complicated. Because it is very difficult to visualize or make
predictions for a training dataset with a high number of features, so in such cases dimensionality reduction techniques must be used.
Dimensionality reduction technique can be defined as, "It is a way of converting the higher
dimensions dataset into lesser dimensions dataset ensuring that it provides similar
information." These techniques are widely used in machine learning for obtaining a better fit
predictive model while solving the classification and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.
1. Filter Methods
In this method, the dataset is filtered, and a subset that contains only the relevant features is taken. Some common techniques of the filter method are:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
2. Wrapper Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, some features are fed to the ML model and its performance is evaluated. The performance decides whether to add or remove those features to increase the accuracy of the model. This method is more accurate than the filter method but more complex to work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination
3. Embedded Methods: Embedded methods check the different training iterations of the machine learning model and evaluate the importance of each feature. Some common techniques of embedded methods are:
o LASSO
o Elastic Net
o Ridge Regression, etc.
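A short scikit-learn sketch contrasting a filter method (chi-square scoring) with an embedded method (feature importances from a tree ensemble); the dataset and the specific choices here are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Filter method: score each feature with a chi-square test and keep the best 2.
selector = SelectKBest(score_func=chi2, k=2)
X_filtered = selector.fit_transform(X, y)
print("Chi-square scores:", selector.scores_)
print("Reduced shape:", X_filtered.shape)

# Embedded method: a tree ensemble reports how much each feature
# contributed to accuracy while the model was being trained.
forest = RandomForestClassifier(random_state=0).fit(X, y)
print("Feature importances:", forest.feature_importances_)
```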
The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables. Consider the below image:
o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The value of the logistic regression must be between 0 and 1, and it cannot go beyond this limit, so it forms a curve like the "S" form. The S-form curve is called the Sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1. Values above the threshold tend towards 1, and values below the threshold tend towards 0.
The Logistic Regression equation can be obtained from the Linear Regression equation. The mathematical steps to get the Logistic Regression equation are given below:
o Start from the Linear Regression equation: y = b0 + b1x1 + b2x2 + ... + bnxn
o In Logistic Regression y can be between 0 and 1 only, so let's divide the above equation by (1 - y): y / (1 - y)
o But we need a range between -[infinity] and +[infinity], so taking the logarithm of the equation, it becomes: log[y / (1 - y)] = b0 + b1x1 + b2x2 + ... + bnxn
Linear Regression:
o Linear Regression is one of the simplest Machine Learning algorithms; it comes under the Supervised Learning technique and is used for solving regression problems.
o It is used for predicting the continuous dependent variable with the help of
independent variables.
o The goal of the Linear regression is to find the best fit line that can accurately predict
the output for the continuous dependent variable.
o If a single independent variable is used for prediction, it is called Simple Linear Regression, and if more than one independent variable is used, such regression is called Multiple Linear Regression.
o By finding the best fit line, the algorithm establishes the relationship between the dependent variable and the independent variable, and this relationship should be linear in nature.
o The output of Linear Regression should only be continuous values such as price, age, salary, etc. The relationship between the dependent variable and the independent variable can be shown in the below image:
In the above image, the dependent variable is on the y-axis (salary) and the independent variable is on the x-axis (experience). The regression line can be written as:
y = a0 + a1x + ε
where a0 and a1 are the coefficients and ε is the error term.
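A minimal sketch of fitting this regression line with scikit-learn; the experience/salary numbers are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical salary (y, in thousands) vs. years of experience (x).
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([30, 35, 41, 44, 50])

# Fit y = a0 + a1*x; the residuals play the role of the error term ε.
model = LinearRegression().fit(x, y)
print("Intercept a0:", model.intercept_)
print("Slope a1:", model.coef_[0])
print("Prediction for 6 years:", model.predict([[6]])[0])
```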
Logistic Regression:
o Logistic regression is one of the most popular Machine Learning algorithms that come under the Supervised Learning technique.
o It can be used for Classification as well as for Regression problems, but it is mainly used for Classification problems.
o Logistic regression is used to predict the categorical dependent variable with the help of independent variables.
o The output of a Logistic Regression problem can only be between 0 and 1.
o Logistic regression can be used where the probability of one of two classes is required, such as whether it will rain today or not, either 0 or 1, true or false, etc.
o Logistic regression is based on the concept of Maximum Likelihood Estimation, according to which the observed data should be the most probable. In logistic regression, we pass the weighted sum of inputs through an activation function that maps values to between 0 and 1. This activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve. Consider the below image:
Linear Regression vs. Logistic Regression:
o Linear Regression is used to predict the continuous dependent variable using a given set of independent variables; Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables.
o In Linear Regression, we predict the value of continuous variables; in Logistic Regression, we predict the values of categorical variables.
o In Linear Regression, we find the best fit line, by which we can easily predict the output; in Logistic Regression, we find the S-curve, by which we can classify the samples.
o The output of Linear Regression must be a continuous value, such as price, age, etc.; the output of Logistic Regression must be a categorical value such as 0 or 1, Yes or No, etc.
2.8 Probability Theory
There are many sources of uncertainty in AI, including variance in the specific data values, the sample of data collected from the domain, and the imperfect nature of any models developed from such data.
• Uncertainty is the biggest source of difficulty for beginners in machine learning, especially
developers.
• Noise in data, incomplete coverage of the domain, and imperfect models provide the three main
sources of uncertainty in machine learning.
• Probability provides the foundation and tools for quantifying, handling, and harnessing uncertainty
in applied machine learning.
Uncertainty means working with imperfect or incomplete information. Probability is a numerical description
of how likely an event is to occur or how likely it is that a proposition is true. Probability is a number
between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty.
How to compute the probability?
Given: A statistical experiment has n equally likely outcomes, of which r outcomes are "successes"
Find: Probability of successful outcome(S)
P(S) = Number of Successes / Total Number of Outcomes = r/n
Example 1:
Given: 10 marbles: 2 red, 3 green, 5 blue.
Find: the probability of selecting a green marble. Solution: P(G) = 3/10 = 0.30
A Random Variable is a set of possible values from a random experiment.
Example: Throw a die once
Random Variable X = "The score shown on the top face". X could be 1, 2, 3, 4, 5 or 6
So the Sample Space is {1, 2, 3, 4, 5, 6}
We can show the probability of any one value using this style: P(X = value) = probability of that value
X = {1, 2, 3, 4, 5, 6}
In this case they are all equally likely, so the probability of any one is 1/6
• P(X = 1) = 1/6
• P(X = 2) = 1/6
• P(X = 3) = 1/6
• P(X = 4) = 1/6
• P(X = 5) = 1/6
• P(X = 6) = 1/6
Atomic event: A complete specification of the state of the world about which the agent is uncertain. E.g., if
the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic
events:
Cavity = false Toothache = false
Cavity = true Toothache = false
Cavity = false Toothache = true
Cavity = true Toothache = true
Joint Probability
It is the likelihood of more than one event occurring at the same time.
Two types of joint probability can be found:
1. Mutually exclusive events (without common outcomes)
2. Non-mutually exclusive events (with common outcomes)
Mutually exclusive means that the occurrence of both A and B together is impossible, i.e. P(A and B) = 0, and the probability of A or B is the sum of the probabilities of A and B, i.e. P(A or B) = P(A) + P(B).
In the case of non-mutually exclusive events, the probability of A or B is the sum of the probabilities of A and B minus the probability of A and B, i.e.
P(A or B) = P(A) + P(B) – P(A and B)
The conditional probability of an event B in relationship to an event A is the probability that event B occurs
given that event A has already occurred. The notation for conditional probability is P(B|A), read as the
probability of B given A i.e. probability of B given that an event A is already occurred.
When two events, A and B, are dependent, the probability of both occurring is P(A and B) = P(A) × P(B|A).
Problem 1: A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class
passed the first test. What percent of those who passed the first test also passed the second test?
Answer: P(Second | First) = P(First and Second)/P(First) = 0.25/0.42 ≈ 0.60 = 60%
Problem 2: A jar contains black and white marbles. Two marbles are chosen without replacement. The
probability of selecting a black marble and then a white marble is 0.34, and the probability of selecting a
black marble on the first draw is 0.47. What is the probability of selecting a white marble on the second
draw, given that the first marble drawn was black?
Answer: P(White | Black) = P(Black and White)/P(Black) = 0.34/0.47 ≈ 0.72 = 72%
Note:
The following terminologies are also used when the Bayes theorem is applied:
Hypotheses: The events E1, E2, …, En are called the hypotheses.
Prior Probability: The probability P(Ei) is considered the prior probability of hypothesis Ei.
Posterior Probability: The probability P(Ei|A) is considered the posterior probability of hypothesis Ei.
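For reference, the theorem being described can be written in this notation (hypotheses E1, E2, …, En and an observed event A) as:

\[
P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{k=1}^{n} P(E_k)\,P(A \mid E_k)}
\]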
Bayes' theorem follows from the conditional probability formula P(A|B) = P(A ∩ B) / P(B), where P(A|B) is the probability of event A occurring given that event B has already occurred, P(A ∩ B) is the probability of both event A and event B, and P(B) is the probability of event B.
Some illustrations will improve the understanding of the concept.
Example 1: Bag I contains 4 white and 6 black balls, while Bag II contains 4 white and 3 black balls. One ball is drawn at random from one of the bags, and it is found to be black. Find the probability that it was drawn from Bag I.
Solution:
Let E1 be the event of choosing the bag I, E2 the event of choosing the bag II, and A be the event of drawing
a black ball.
Then, P(E1) = P(E2) = 1/2
Also, P(A|E1) = P(drawing a black ball from Bag I) = 6/10 = 3/5
P(A|E2) = P(drawing a black ball from Bag II) = 3/7
By using Bayes' theorem, the probability of drawing a black ball from Bag I out of the two bags is
P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)]
= (1/2 × 3/5) / (1/2 × 3/5 + 1/2 × 3/7) = 7/12
Example 2: A man is known to speak truth 2 out of 3 times. He throws a die and reports that the
number obtained is a four. Find the probability that the number obtained is actually a four.
Solution:
Let A be the event that the man reports that number four is obtained.
Let E1 be the event that four is obtained and E2 be its complementary event.
Then, P(E1) = Probability that four occurs = 1/6
P(E2) = Probability that four does not occur = 1 – P(E1) = 1 – 1/6 = 5/6
Also, P(A|E1) = Probability that the man reports four and it is actually a four = 2/3
P(A|E2) = Probability that the man reports four and it is not a four = 1/3
By using Bayes' theorem, the probability that the number obtained is actually a four is
P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)] = (1/6 × 2/3) / (1/6 × 2/3 + 5/6 × 1/3) = 2/7
Problem: An engineering company advertises a job in three newspapers, A, B and C. It is known that these
papers attract undergraduate engineering readerships in the proportions 2:3:1. The probabilities that an
engineering undergraduate sees and replies to the job advertisement in these papers are 0.002, 0.001 and 0.005
respectively. Assume that the undergraduate sees only one job advertisement. (a) If the engineering company
receives only one reply to its advertisements, calculate the probability that the applicant saw the job
advertised in (i) paper A, (ii) paper B, (iii) paper C. (b) If the company receives two replies, what is the
probability that both applicants saw the job advertised in paper A?
1) Of the students in a college, 60% reside in the hostel and 40% are day scholars. The previous year's
results report that 30% of all students who stay in the hostel scored an A grade and 20% of day scholars
scored an A grade. At the end of the year, one student is chosen at random and is found to have an A grade.
What is the probability that the student is a hosteller?
Ans: P(Hostel | A) = P(A | Hostel)·P(Hostel) / [P(A | Hostel)·P(Hostel) + P(A | Day)·P(Day)]
= (0.3 × 0.6) / (0.3 × 0.6 + 0.2 × 0.4) = 0.18/0.26 ≈ 0.692 ≈ 69.2%
2) You are planning a picnic today, but the morning is cloudy.
a. Oh no! 50% of all rainy days start off cloudy!
b. But cloudy mornings are common (about 40% of days start cloudy), and this is usually a dry
month (only 3 of 30 days tend to be rainy, i.e. 10%). What is the chance of rain during the day?
Ans: P(Rain | Cloud) = P(Cloud | Rain)·P(Rain)/P(Cloud) = (0.5 × 0.1)/0.4 = 0.125 = 12.5%
3) Jerome graphed the relationship between the temperature and the change in the number of people sledding
at a park. The x-coordinate is the temperature in degrees Celsius (°C), and the y-coordinate is the change in
the number of people who are sledding. Which quadrants could contain a point showing a decrease in the
number of people sledding?
Explanation: Quadrants III and IV contain the points with negative y-coordinates. If y = 0 (the x-axis)
represents no change from the average number of people there, then Quadrants III and IV show a decrease
from that average number.
4) Covid-19 tests are common nowadays, but some results of tests are not true. Let’s assume; a diagnostic
test has 99% accuracy and 60% of all people have Covid-19. If a patient tests positive, what is the
probability that they actually have the disease?
P(positive) = 0.6 × 0.99 + 0.4 × 0.01 = 0.598
P(disease | positive) = (0.6 × 0.99)/0.598 = 0.594/0.598 ≈ 0.993 ≈ 99.3%
1) In a chocolate manufacturing factory, machine A, B and C produces 25%, 35% and 40% of chocolates
respectively. Out of which 5%, 4% and 2% are spoiled chocolates respectively. If a chocolate is drawn at
random is spoiled then what is the probability that it is manufactured by machine B?
P(A) = 25%, P(B) = 35%, P(C) = 40%
P(S|A) = 5%, P(S|B) = 4%, P(S|C) = 2%
P(S) = 0.25 × 0.05 + 0.35 × 0.04 + 0.40 × 0.02 = 0.0345
P(B|S) = P(S|B)·P(B)/P(S) = 0.014/0.0345 ≈ 0.406 ≈ 40.6%
2) In Exton School, 40% of the girls like music and 24% of the girls like dance. Given that 30% of those that
like music also like dance, what percent of those that like dance also like music?
P(M|D) = P(D|M)·P(M)/P(D) = (0.3 × 0.4)/0.24 = 0.5 = 50%
1) 75% of the children in Exton school have a dog, and 30% have a cat. Given that 60% of those that have a
cat also have a dog, what percent of those that have a dog also have a cat?
P(C|D) = P(D|C)·P(C)/P(D) = (0.6 × 0.3)/0.75 = 0.24 = 24%
2) 35% of the children in Exton school have a tablet, and 24% have a smart phone. Given that 42% of those
that have smart phone also have a tablet, what percent of those that have a tablet also have a smart phone?
P(P|T) = P(T|P)·P(P)/P(T) = (0.42 × 0.24)/0.35 ≈ 0.288 = 28.8%
3) A test for a disease gives a correct positive result with a probability of 0.95 when the disease is present, but
gives an incorrect positive result (false positive) with a probability of 0.15 when the disease is not present.
If 5% of the population has the disease, and Jean tests positive to the test, what is the probability Jean
really has the disease?
Ans: P(Disease) = 5%
P(Positive) = 0.95 × 0.05 + 0.15 × 0.95 = 0.19
P(Disease | Positive) = (0.95 × 0.05)/0.19 = 0.0475/0.19 = 0.25 = 25%
5) Assume that the word ‘offer’ occurs in 80% of the spam messages in my account. Also, let’s assume
‘offer’ occurs in 10% of my desired e-mails. If 30% of the received e-mails are considered as a spam, and
I will receive a new message which contains ‘offer’, what is the probability that it is spam?
P(Spam) = 30%, P(offer | Spam) = 80%, P(offer | Not Spam) = 10%
P(Spam | offer) = (0.8 × 0.3) / (0.8 × 0.3 + 0.1 × 0.7) = 0.24/0.31 ≈ 0.774 ≈ 77.4%
1) In a particular pain clinic, 10% of patients are prescribed narcotic pain killers. Overall, five percent of the
clinic’s patients are addicted to narcotics (including pain killers and illegal substances). Out of all the
people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that they will
be prescribed pain pills?
Ans: P(PK) = 10%, P(NA) = 5%, P(NA|PK) = 8%
P(PK|NA) = P(NA|PK)·P(PK)/P(NA) = (0.08 × 0.1)/0.05 = 0.16 = 16%
2) In Exton School, 60% of the boys play football and 36% of the boys play ice hockey. Given that 40% of
those that play football also play ice hockey, what percent of those that play ice hockey also play football?
P(F|IH) = P(IH|F)·P(F)/P(IH) = (0.4 × 0.6)/0.36 = 2/3 ≈ 66.7%
6) Imagine you are a financial analyst at an investment bank. According to your research of publicly-traded
companies, 60% of the companies that increased their share price by more than 5% in the last three years
replaced their CEOs during the period.
Ans: At the same time, only 35% of the companies that did not increase their share price by more than 5% in
the same period replaced their CEOs. Knowing that the probability that the stock prices grow by more than
5% is 4%, find the probability that the shares of a company that fires its CEO will increase by more than 5%.
P(A|B) – the probability that the stock price increases by more than 5% given that the CEO has been replaced
P(B|A) – the probability of a CEO replacement given that the stock price has increased by more than 5%
P(A|B) = P(B|A)·P(A) / [P(B|A)·P(A) + P(B|A′)·P(A′)] = (0.6 × 0.04)/(0.6 × 0.04 + 0.35 × 0.96)
= 0.024/0.36 ≈ 0.067 ≈ 6.7%
7) A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class passed the
first test. What percent of those who passed the first test also passed the second test?
The percentage that passed the first test is 42%, while the percentage that passed both tests is 25%.
P(Second | First) = P(First and Second)/P(First) = 0.25/0.42 ≈ 0.595 ≈ 60%
1) Physics teacher gave her class two tests. 35% of the class passed both tests and 62% of the class passed
the first test. What percent of those who passed the first test also passed the second test?
Ans: The probability of passing both tests is P(both tests) = 35%; the probability of passing the first test is P(1st test) = 62%.
The probability of passing the second test given the first, P(2nd test | 1st test) = P(both tests)/P(1st test)
= 35% / 62%
= 0.5645
= 56.45%
2) RTPCR tests are common nowadays, but some results of tests are not true. Let's assume an RTPCR test
has 60% accuracy and 80% of all people have Omicron. If a patient tests positive, what is the probability
that they actually have the disease?
If 5% of the population has the disease, and Jean tests positive to the test, what is the probability Jean really
has the disease?
Ans: Take the 60% accuracy to mean that P(positive | Omicron) = 0.6 and P(negative | no Omicron) = 0.6, so
the false-positive rate is 0.4. With P(Omicron) = 0.8:
P(positive) = 0.8 × 0.6 + 0.2 × 0.4 = 0.48 + 0.08 = 0.56
P(Omicron | positive) = 0.48/0.56 ≈ 0.857 ≈ 85.7%
For the second part, with P(disease) = 0.05:
P(positive) = 0.05 × 0.6 + 0.95 × 0.4 = 0.03 + 0.38 = 0.41
P(disease | positive) = 0.03/0.41 ≈ 0.073 ≈ 7.3%
So the posterior probability depends on both the test accuracy and the prevalence of the disease: when the
disease is rare, even a positive result leaves only a small probability of actually having it.
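All of the diagnostic-test questions above follow the same Bayes pattern, so the effect of prevalence is easy to see with a small helper. A Python sketch under the assumption that "accuracy" means both the sensitivity and the specificity of the test; the function name is illustrative, not from the notes:

def p_disease_given_positive(prevalence, sensitivity, specificity):
    # Bayes' theorem: P(disease | positive test)
    p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    return prevalence * sensitivity / p_positive

print(p_disease_given_positive(0.80, 0.60, 0.60))   # ~0.857: high prevalence
print(p_disease_given_positive(0.05, 0.60, 0.60))   # ~0.073: low prevalence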
WHAT IS CARTESIAN PLANE?
A cartesian plane is part of the cartesian coordinate system. This coordinate
system can be translated into one, two, and three dimensions. In two
dimensions, the plane is called the cartesian plane. It can also be called the
coordinate plane.
A cartesian plane can be divided into three major parts. These three parts are vital when we try
to locate a point on the cartesian plane or draw the graph of a certain function. These are given
below as follows:
Axes - The two lines that intersect to form the cartesian plane are known as the axes.
The horizontal line is called the x-axis. The vertical line that is perpendicular to the x-axis is
known as the y- axis.
Origin - The point where the two perpendicular axes - x and y - meet is known as the origin. The
coordinates of the origin are given by (0, 0). The axes are divided into two equal parts by the
origin.
Quadrants - When the x and the y axes intersect, it divides the cartesian plane into 4 regions.
These are known as quadrants and extend infinitely.
WHAT ARE QUADRANTS?
Quadrant is the region enclosed by the intersection of the X-axis and the Y-axis. On the cartesian
plane when the two axes, X-axis and Y-axis, intersect with each other at 90º there are four regions
formed around it, and those regions are called quadrants. So, every plane has four quadrants each
bounded by half of the axes. Each quadrant is denoted by Roman numerals and named as
Quadrant I, Quadrant II, Quadrant III, and Quadrant IV based on their position with respect to the
axes.
In the cartesian system, a plane is divided into four regions by a horizontal line called X-axis and
a vertical line called Y-axis. These four regions are known as quadrants.
The positive direction will be upwards and towards the right while the negative direction is
downwards and to the left.
PROBLEM ON CARTESIAN PLANE
Dexter plans and records his spending. In the coordinate plane below, the x-axis represents the
number of days before or after today. The y-axis represents the amount of money Dexter planned
to spend in dollars. The points (-5, 8) and (2, 8) represent Dexter's planned spending on two
separate days. (a) Plot the two points on the coordinate plane. (b) How many days separate the two days
Dexter planned to spend $8?
Ans: distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
= sqrt((2 - (-5))^2 + (8 - 8)^2)
= sqrt((2 + 5)^2)
= sqrt(49)
= 7 days
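The calculation above is just the distance formula applied on the coordinate plane; a quick Python check with the two points from the problem:

import math

def distance(p, q):
    # Euclidean distance between points p = (x1, y1) and q = (x2, y2)
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(distance((-5, 8), (2, 8)))   # 7.0 -> the two days are 7 days apart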
The equation of a line can be formed with the help of the slope of the line and a
point on the line. Let us understand more about the slope of the line and the
needed point on the line, to better understand the formation of the equation of a
line. The slope of the line is the inclination of the line with respect to the positive x-axis,
and is expressed as a number (an integer or a fraction) equal to the tangent of the angle
the line makes with the positive x-axis. The point refers to a point in the coordinate
system with an x coordinate and a y coordinate.
The general equation of a line in two variables of the first degree is represented as
Ax + By + C = 0
SLOPE-INTERCEPT FORM
We know that the equation of a straight line in slope-intercept form is given as:
y = mx + c
Where m indicates the slope of the line and c is the y-intercept.
When B ≠ 0, the standard first-degree equation Ax + By + C = 0 can be rewritten in slope-
intercept form as:
y = (−A/B)x − (C/B)
INTERCEPT FORM
The intercept of a line is the point through which the line crosses the x-axis or y-axis. Suppose a
line cuts the x-axis and y-axis at (a, 0) and (0, b), respectively. Then, the equation of a line
making intercepts equal to a and b on the x-axis and the y-axis respectively is given by:
x/a + y/b = 1
Now, in the case of the general form of the equation of the straight line, i.e., Ax + By + C = 0, if C ≠ 0,
then Ax + By + C = 0 can be written as
x/(−C/A) + y/(−C/B) = 1, i.e. the x-intercept is a = −C/A and the y-intercept is b = −C/B.
NORMAL FORM
The equation of the line whose perpendicular distance from the origin is p, where the angle
made by the perpendicular with the positive x-axis is α, is given by:
x cos α + y sin α = p
Comparing this with the general form Ax + By + C = 0 gives A/cos α = B/sin α = −C/p,
from which cos α = −Ap/C and sin α = −Bp/C.
Also it can be inferred that,
If two points (x1, y1) and (x2, y2) are said to lie on the same side of the line Ax + By +
C = 0, then the expressions Ax1+ By1 + C and Ax2 + By2 + C will have the same sign
or else these points would lie on the opposite sides of the line.
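The conversions between the general form Ax + By + C = 0 and the other forms, together with the same-side test just described, can be sketched in a few lines of Python. The function names and example coefficients are my own, chosen only for illustration:

def slope_intercept(A, B, C):
    # Rewrite Ax + By + C = 0 as y = mx + c (requires B != 0)
    return -A / B, -C / B            # (m, c)

def intercepts(A, B, C):
    # x- and y-intercepts of Ax + By + C = 0 (requires A, B, C all non-zero)
    return -C / A, -C / B            # (a, b) so that x/a + y/b = 1

def same_side(A, B, C, p1, p2):
    # True if p1 and p2 give the same sign when substituted into Ax + By + C
    s1 = A * p1[0] + B * p1[1] + C
    s2 = A * p2[0] + B * p2[1] + C
    return s1 * s2 > 0

print(slope_intercept(2, -1, 3))             # (2.0, 3.0): y = 2x + 3
print(intercepts(2, -1, 3))                  # (-1.5, 3.0)
print(same_side(2, -1, 3, (0, 0), (1, 10)))  # False: the points lie on opposite sides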
WHAT IS A GRAPH?
A graph G = (V, E) is a structure consisting of a set of vertices (nodes) V and a set of edges E, where each
edge joins a pair of vertices. Graphs are used to model pairwise relationships between objects.
TYPES OF GRAPHS
NULL GRAPH
The Null Graph is also known as the order zero graph. The term "null graph"
refers to a graph with an empty edge set. In other words, a null graph has no
edges; it contains only isolated vertices.
The image displayed above is a null (zero) graph because it has zero edges
between the three vertices of the graph.
TRIVIAL GRAPH
A graph is called a trivial graph if it has only one vertex present in it. The trivial
graph is the smallest possible graph that can be created with the least number of
vertices that is one vertex only.
The above is an example of a trivial graph, having only a single vertex in the
whole graph, named vertex A.
NON-DIRECTED GRAPH
A graph is called a non-directed graph if all the edges present between any
graph nodes are non-directed. By non-directed edges, we mean edges for which we cannot
determine the node at which they start and the node at which they end. All the edges of a graph
need to be non-directed to call it a non-directed graph. None of the edges of a non-directed
graph have any direction.
The graph that is displayed above is an example of a non-directed graph. It is called a non-directed
graph because there are four vertices named vertex A, vertex B, vertex C, and vertex D, and there are
exactly four edges between these vertices of the graph. None of the edges that are present between the
different nodes of the graph are directed, which means the edges don't have any specific direction.
For example, the edge between vertex A and vertex B doesn't have any direction, so we cannot
determine whether the edge between vertex A and vertex B starts from vertex A or vertex B.
Similarly, we can't determine the ending vertex of this edge between these nodes.
DIRECTED GRAPH
Another name for the directed graphs is digraphs. A graph is called a directed graph or digraph if
all the edges present between any vertices or nodes of the graph are directed or have a defined
direction. By directed edges, we mean the edges of the graph that have a direction to determine
from which node it is starting and at which node it is ending.
All the edges of a graph need to be directed to call it a directed graph or digraph. All the edges of a
directed graph or digraph have a direction, starting from one vertex and ending at another.
The graph that is displayed above is an example of a directed graph. This graph is called a
directed graph because there are four vertices in the graph named vertex A, vertex B, vertex C,
and vertex D. There are also exactly four edges between these vertices of the graph, and all
the edges that are present between the different nodes of the graph are directed (pointing to
some of the vertices), which means the edges have a specific direction assigned to them.
For example, consider the edge that is present between vertex D and vertex A. This edge has
an arrowhead pointing towards vertex A, which means this edge starts from vertex D and ends
at vertex A.
CONNECTED GRAPH
For a graph to be labelled as a connected graph, there must be at least a single path between
every pair of the graph's vertices. In other words, we can say that if we start from one vertex, we
should be able to move to any of the vertices that are present in that particular graph, which means
there exists at least one path between all the vertices of the graph.
The graph shown above is an example of a connected graph because we start from any one of the
vertices of the graph and start moving towards any other remaining vertices of the graph. There
will exist at least one path for traversing the graph.
For example, if we begin from vertex B and traverse to vertex H, there are various paths for
traversing. One of the paths is
vertex B -> vertex C -> vertex D -> vertex F -> vertex E -> vertex H.
Similarly, there are other paths for traversing the graph from vertex B to vertex H. There is at
least one path between all the graph nodes. In other words, we can say that all the vertices or
nodes of the graph are connected to each other via one or more edges.
DISCONNECTED GRAPH
A graph is said to be a disconnected graph when there does not exist any path between at least one
pair of vertices. In other words, we can say that if we start from any one of the vertices of the
graph and try to move to the remaining present vertices of the graph and there exists not even a
single path to move to that vertex, then it is the case of the disconnected graph. If any one of such
a pair of vertices doesn't have a path between them, it is called a disconnected graph.
The graph shown above is a disconnected graph. The above graph is called a disconnected graph
because at least one pair of vertices doesn't have a path to traverse starting from either node. For
example, a single path between both vertices doesn't exist if we want to traverse from vertex A to
vertex G. In other words, we can say that all the vertices or nodes of the graph are not connected
to each other via edge or number of edges so that they can be traversed.
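Connectedness can be checked programmatically with a simple traversal: start at any vertex and test whether every other vertex is reachable. A minimal Python sketch using an adjacency-list dictionary; the two example graphs are made up purely for illustration:

from collections import deque

def is_connected(adj):
    # Breadth-first search from an arbitrary start vertex;
    # the graph is connected if every vertex is reached.
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

connected = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C']}
disconnected = {'A': ['B'], 'B': ['A'], 'C': ['D'], 'D': ['C']}
print(is_connected(connected))      # True
print(is_connected(disconnected))   # False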
REGULAR GRAPH
For a graph to be called regular, it should satisfy one primary condition: all graph vertices
should have the same degree. By the degree of a vertex, we mean the number of edges incident
to that particular vertex. If all the graph nodes have the same degree value, then the graph is
called a regular graph. If all the vertices of a graph have the degree value of 6, then the graph is
called a 6-regular graph. If all the vertices in a graph are of degree 'k', then it is called a "k-
regular graph".
The graphs that are displayed above are regular graphs. In graph 1, there are three vertices named
vertex A, vertex B, and vertex C, and each vertex in graph 1 has degree 2.
The degree of each vertex is calculated by counting the number of edges connected to that
particular vertex.
For vertex A in graph 1, there are two edges associated with it, one to vertex B and
another to vertex C. Thus, the degree of vertex A in graph 1 is 2. Similarly, for the other
vertices of the graph, there are only two edges associated with each vertex, so the degrees of
vertex B and vertex C are also 2. As the degree of all three nodes of the graph is the same,
that is 2, this graph is called a 2-regular graph.
Similarly, for the second graph shown above, there are four vertices named vertex E, vertex F,
vertex G, and vertex H. The degree of all four vertices of this graph is 2, because only two
edges are associated with each of the graph's vertices. As all the nodes of this graph have the
same degree of 2, this graph is also a 2-regular graph.
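Whether a graph is k-regular can be checked by comparing the degrees of its vertices, i.e. the number of edges touching each vertex. A short Python sketch; the triangle adjacency list mirrors graph 1 described above and is assumed for illustration:

def regular_degree(adj):
    # Return the common degree k if every vertex has the same degree, else None.
    degrees = {len(neighbours) for neighbours in adj.values()}
    return degrees.pop() if len(degrees) == 1 else None

# A triangle: every vertex has degree 2, so this is a 2-regular graph.
triangle = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B']}
print(regular_degree(triangle))   # 2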
COMPLETE GRAPH
A graph is said to be a complete graph if, for all the vertices of the graph, there exists an edge
between every pair of the vertices. In other words, we can say that all the vertices are connected to
the rest of all the vertices of the graph. A complete graph of 'n' vertices contains exactly nC2 = n(n − 1)/2
edges, and a complete graph of 'n' vertices is represented as Kn.
There are two graphs named K3 and K4 shown in the above image, and both graphs are complete
graphs. Graph K3 has three vertices, and each vertex has an edge to each of the remaining
vertices. Similarly, for graph K4, there are four nodes named vertex E, vertex F, vertex G, and
vertex H. For example, vertex F has three edges connected to it, one to each of the respective
three remaining vertices of the graph. Likewise, for the other three remaining vertices,
there are three edges associated with each one of them. As every vertex of this graph has a
separate edge to every other vertex, it is called a complete graph.
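The nC2 = n(n - 1)/2 edge count can be verified by building Kn explicitly, one edge per pair of vertices. A small Python sketch, not taken from the notes:

from itertools import combinations

def complete_graph_edges(n):
    # All edges of K_n on vertices 0..n-1: one edge for every pair of vertices.
    return list(combinations(range(n), 2))

for n in (3, 4, 5):
    edges = complete_graph_edges(n)
    print(n, len(edges), n * (n - 1) // 2)   # the two counts always agree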
CYCLE GRAPH
If a graph has three or more vertices and its edges form a single cycle, then the graph is called a
cycle graph. In a cycle graph, the degree of every vertex is exactly 2.
There are three graphs shown in the above image, and all of them are examples of cycle
graphs because each has at least three vertices and the degree of every vertex in each of them
is exactly 2.
CYCLIC GRAPH
For a graph to be called a cyclic graph, it should consist of at least one cycle. If a graph has a
minimum of one cycle present, it is called a cyclic graph.
The graph shown in the image has two cycles present, satisfying the required condition for a
graph to be cyclic, thus making it a cyclic graph.
FINITE GRAPH
A graph G = (V, E) is called a finite graph if the number of vertices and the number of edges in the graph are finite.
INFINITE GRAPH
A graph G = (V, E) is called an infinite graph if it has an infinite number of vertices or an infinite
number of edges.
SIMPLE GRAPH
A simple graph is a graph that does not contain more than one edge between any pair of vertices
and has no self-loops. A simple railway track connecting different cities is an example of a simple graph.
MULTI GRAPH
Any graph which contains some parallel edges but doesn’t contain any self-loop is called a
multigraph. For example, a Road Map.
Parallel Edges: If two vertices are connected with more than one edge, then such edges are called parallel
edges - many routes but one destination.
Loop: An edge of a graph that starts from a vertex and ends at the same vertex is called a loop or self-loop.
PSEUDO GRAPH
A graph G with a self-loop and some multiple edges is called a pseudo graph.
BIPARTITE GRAPH
A graph G = (V, E) is said to be a bipartite graph if its vertex set V(G) can be partitioned into
two non-empty disjoint subsets. V1(G) and V2(G) in such a way that each edge e of E(G)has
one end in V1(G) and another end in V2(G). The partition V1 U V2 = V is called Bipartite of G.
Here in the figure: V1(G)= {V5, V4, V3} and V2(G)= {V1, V2}
DIGRAPH
A graph G = (V, E) with a mapping f such that every edge maps onto some ordered pair of
vertices (Vi, Vj) is called a digraph. It is also called a directed graph. The ordered pair (Vi, Vj)
means an edge between Vi and Vj with an arrow directed from Vi to Vj. Here in the figure: e1 =
(V1, V2), e2 = (V2, V3), e4 = (V2, V4).
SUB GRAPH
A graph G1 = (V1, E1) is called a subgraph of a graph G = (V, E) if V1(G1) is a subset of V(G) and
E1(G1) is a subset of E(G), such that each edge of G1 has the same end vertices as in G.
EXPONENTS
The exponent of a number shows how many times the number is multiplied by itself. For
example, 2 × 2 × 2 × 2 can be written as 2^4, as 2 is multiplied by itself 4 times. Here, 2 is called the
"base" and 4 is called the "exponent" or "power." In general, x^n means that x is multiplied by
itself n times.
PROPERITES OF EXPONENTS
The properties of exponents, or laws of exponents, are used to solve problems
involving exponents. These properties are also considered the major exponent rules
to be followed while solving exponents, for example a^m × a^n = a^(m+n), a^m ÷ a^n = a^(m−n),
(a^m)^n = a^(mn), and a^0 = 1 (for a ≠ 0).
NEGATIVE EXPONENT
A negative exponent tells us how many times we have to multiply the reciprocal of
the base. For example, if it is given as a^(−n), it can be expanded as 1/a^n. It means
we have to multiply the reciprocal of a, i.e. 1/a, 'n' times. Negative exponents are
used while writing fractions with exponents. Some examples of negative
exponents are 2 × 3^(−9), 7^(−3), 67^(−5), etc.
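Both positive and negative exponents behave exactly as described, which is easy to confirm in Python, where ** is the exponent operator:

print(2 ** 4)                   # 16: 2 multiplied by itself 4 times
print(2 ** -3)                  # 0.125: a negative exponent uses the reciprocal
print(1 / 2 ** 3)               # 0.125: the same value as 2 ** -3
print(7 ** -3 == 1 / 7 ** 3)    # True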