Machine Learning
ML models can improve themselves as they receive more data, such as user behavior and feedback. For example:
Voice Assistants (Siri, Alexa, Google Assistant) – Voice assistants continuously improve as they process millions of voice inputs. They adapt to user preferences, understand regional accents better, and handle ambiguous queries more effectively.
Search Engines (Google, Bing) – Search engines analyze user behavior to refine their ranking algorithms.
Self-driving Cars – Self-driving cars use data from millions of miles driven (both in simulations and real-world scenarios) to enhance their decision-making.
Evolution of Machine Learning
Imagine a world where machines learn like humans, constantly evolving and improving. This
isn’t a scene from a sci-fi movie—it’s the reality of machine learning. This technology has come
a long way since its inception, and today, we’re taking you on a fascinating journey through the
milestones of machine learning, from 1805 to the present.
The Humble Beginnings: Linear Regression (1805-1809) It all started with Linear Regression,
developed independently by Adrien-Marie Legendre and Carl Friedrich Gauss. This technique,
based on the method of least squares, was a stepping stone in predictive modeling, allowing us to
forecast future trends from past data. It laid the groundwork for what was to become a revolution
in data analysis.
The Neural Network Precursor: Perceptron (1957) Fast forward to 1957, and we witness the
birth of the Perceptron by Frank Rosenblatt. This simple yet powerful model, simulating a
neuron for binary classification tasks, was labeled as the precursor to neural networks. It marked
the beginning of machines mimicking human brain functions.
The Art of Decision Making: Reinforcement Learning (1959) Richard Bellman’s invention of
Reinforcement Learning in 1959 introduced a new era of decision-making algorithms. By
teaching agents to make decisions based on rewards and penalties, this method laid the
foundation for developing autonomous systems and robotics.
Combining Weakness for Strength: Boosting Algorithms (1995) In 1995, Yoav Freund and
Robert Schapire introduced AdaBoost, an algorithm that improved prediction accuracy by
combining multiple weak learning models. This concept showed that strength could indeed be
found in numbers, or in this case, algorithms.
The Ensemble Approach: Random Forests (1995) Tin Kam Ho’s introduction of Random
Forests in 1995 brought a robust approach to classification and regression. By creating
ensembles of decision tree-like models, these forests demonstrated improved accuracy and
stability in predictions.
Sequencing Success: RNN and LSTM (1997) The development of RNN (Recurrent Neural
Networks) and LSTM (Long Short-Term Memory) networks, particularly by Sepp Hochreiter
and Jürgen Schmidhuber for LSTM, revolutionized sequential data processing. This was a
milestone in natural language processing and speech recognition, enabling machines to
understand and generate human-like language.
Visionary Machines: Deep Convolutional Neural Networks (2012) In 2012, Alex Krizhevsky,
Ilya Sutskever, and Geoffrey Hinton introduced Deep Convolutional Neural Networks. These
networks revolutionized image recognition, enabling machines to identify and classify images
with incredible accuracy, mimicking the human visual system.
The Creative AI: Generative Adversarial Networks (2014) Ian Goodfellow’s invention of
Generative Adversarial Networks in 2014 opened up new horizons in AI creativity. These
networks became groundbreaking in generating realistic images and videos, blurring the line
between AI-generated and real-life content.
Transforming Language Processing: Transformer Networks (2017) The introduction of
Transformer Networks by Ashish Vaswani and his team in 2017 marked a new era in natural
language processing. These networks, efficient in context-aware processing, became the
cornerstone for modern NLP tasks, leading to advanced models like BERT and GPT series.
From linear regression to transformer networks, the evolution of machine learning has been
nothing short of extraordinary. Each breakthrough has built upon the last, pushing the boundaries
of what’s possible with artificial intelligence. As we look to the future, one thing is certain:
machine learning will continue to evolve, transforming our world in ways we can only begin to
imagine. Stay tuned for the next chapter in this incredible journey.
Unsupervised Learning
In unsupervised learning, algorithms work with unlabeled
data to identify patterns and relationships. These methods
uncover commonalities within the data without predefined
categories. Techniques such as clustering and association
rules fall under unsupervised learning.
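As a concrete sketch of clustering, here is a minimal k-means example with scikit-learn; the synthetic data and the choice of three clusters are assumptions made for illustration.

```python
# Minimal clustering sketch: k-means groups unlabeled 2-D points into
# clusters. The synthetic blobs and the choice of k=3 are assumptions
# made for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([                          # three loose blobs, no labels given
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # discovered group centers
print(kmeans.labels_[:10])       # cluster assignment for the first points
```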
Reinforcement Learning
Reinforcement learning focuses on enabling intelligent
agents to learn tasks through trial-and-error interactions
with dynamic environments. Without the need for labelled
datasets, agents make decisions to maximize a reward
function. This autonomous exploration and learning approach
is crucial for tasks where explicit programming is
challenging.
Fig: Action-reward feedback loop: an agent takes actions in an environment; the outcome is interpreted into a reward and a representation of the state, which are fed back into the agent.
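This feedback loop can be sketched directly in code. Below is a minimal, hypothetical example: SimpleEnv and its reward rule are invented for illustration and stand in for any real environment.

```python
# Skeleton of the action-reward feedback loop described above. SimpleEnv
# is a made-up toy environment (reach a goal counter value); the agent
# here just acts randomly, so everything is purely illustrative.
import random

class SimpleEnv:
    def __init__(self, goal=5):
        self.goal, self.state = goal, 0
    def step(self, action):                 # action is +1 or -1
        self.state += action
        reward = 1.0 if self.state == self.goal else -0.1
        done = self.state == self.goal
        return self.state, reward, done     # new state + reward fed back

env = SimpleEnv()
for t in range(200):                        # cap the demo at 200 steps
    action = random.choice([-1, 1])         # agent chooses an action
    state, reward, done = env.step(action)  # environment responds
    if done:
        print(f"reached the goal at step {t}")
        break
```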
Supervised Learning: Relies on labeled data. Each data point has a pre-defined output or label (e.g., classifying emails as spam or not spam). The model learns the mapping from inputs to these labels.
Unsupervised Learning: Deals with unlabeled data. The goal is to identify patterns or structures within the data itself (e.g., grouping customers with similar purchase behavior).
Reinforcement Learning: Doesn’t use labeled data. The agent interacts with the environment and receives feedback in the form of rewards (positive, negative, or neutral). The agent learns through trial and error to maximize future rewards.
Learning Process
Supervised Learning: Aims to learn a function that maps inputs to desired outputs accurately. The model is explicitly shown (via training data) what the correct output should be for a given input.
Unsupervised Learning: The model is like an explorer trying to find patterns and structure in the data on its own.
In supervised learning, the model is trained with a training dataset that has a correct answer key. The decision is made on the initial input given, as it has all the data that's required to train the machine, and the decisions are independent of each other.
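As a minimal sketch of this process, here is a tiny spam classifier in Python with scikit-learn, echoing the email example above; the four training emails and their labels are made up for illustration.

```python
# Minimal supervised-learning sketch: spam vs. not-spam, echoing the email
# example above. The four training emails and their labels (the "answer
# key") are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

emails = ["win a free prize now", "meeting at 10am tomorrow",
          "free money click here", "lunch with the team today"]
labels = [1, 0, 1, 0]                 # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(emails)         # inputs (word counts)
model = LogisticRegression().fit(X, labels)   # learn input -> label mapping

test = vec.transform(["free prize money"])
print(model.predict(test))            # expected: [1] (spam)
```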
Learning by Rote:
Rote learning is an education method that involves repeating a piece of information many times to embed it in a person's memory.
Ex: phonics in reading, the periodic table in chemistry, multiplication tables in mathematics, anatomy in medicine, cases or statutes in law, basic formulae in any science, etc.
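In machine-learning terms, rote learning amounts to storing input-output pairs and recalling them verbatim, with no generalization. A minimal sketch of that idea, using the multiplication-table example above:

```python
# Rote learning as pure memorization: store seen input-output pairs in a
# table and recall them verbatim. Nothing is generalized to unseen inputs.
table = {}

def memorize(x, y, answer):
    table[(x, y)] = answer

def recall(x, y):
    return table.get((x, y))    # None for anything never memorized

# "Drill" the 1-10 multiplication tables, as in the classroom example.
for a in range(1, 11):
    for b in range(1, 11):
        memorize(a, b, a * b)

print(recall(7, 8))    # 56 -- was memorized
print(recall(12, 12))  # None -- rote learning cannot generalize
```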
Learning by Induction:
Inductive learning is a fundamental machine learning technique that involves
using specific examples to make general predictions or generalizations. It's
also known as inductive reasoning or inductive inference.
Here are some key aspects of inductive learning:
Process
Inductive learning involves identifying common features in a set of examples,
and then using those features to create a model or hypothesis that can
predict or classify new instances.
Algorithms
Inductive learning algorithms search for relationships and structures in data,
allowing machines to classify new instances or make predictions based on
the learned patterns.
Rules
Inductive learning algorithms generate classification rules in the format of "If
this, then that". These rules determine the state of an entity at each iteration
step.
Inductive bias
Inductive learning is closely related to the concept of inductive bias. For
example, k-Nearest Neighbors (k-NN) has an inductive bias that assumes
similar data points are close to each other in feature space.
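Since the paragraph above names k-NN, here is a minimal k-NN sketch with scikit-learn; the toy 2-D points and labels are assumptions made for illustration.

```python
# Minimal k-NN sketch illustrating the inductive bias described above:
# points close together in feature space are assumed to share a label.
# The toy 2-D points and labels are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]  # training examples
y = ["A", "A", "B", "B"]                               # their labels

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.2, 0.8]]))   # ['A'] -- the nearest points are class A
```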
Reinforcement Learning:
Reinforcement Learning is a part of machine learning. Here, agents are
self-trained on reward and punishment mechanisms.
The reward acts as a signal for positive and negative behaviors.
Fig: Basic diagram of reinforcement learning
Terminologies in RL
Agent – the learner and decision maker; it selects the actions to be performed.
Action – an agent's single choice (move left, pick up object) in the environment.
Action Space – the list of actions which an agent can perform.
Reward – for each action the agent takes on the problem, the environment gives a reward; it is usually a scalar value.
Reward Function – determines how rewards are assigned based on the state of the environment and the agent's actions.
Value Function – the value of a state is the total reward the agent can expect to accumulate starting from that state.
Model – not every RL agent uses a model of its environment; when one is used, it is the agent's internal view of how the environment behaves.
Types of Reinforcement
1. Positive Reinforcement
Positive reinforcement occurs when an event, produced by a particular behavior, increases the strength and frequency of that behavior. It has a positive impact on behavior.
Advantages
Maximizes performance and sustains change over a long period of time.
Disadvantage
Excess reinforcement can lead to an overload of states, which would minimize the results.
2. Negative Reinforcement
Negative reinforcement is the strengthening of a behavior: when a negative condition is barred or avoided, the behavior that prevented it is repeated in the future.
Advantages
Maximized behavior.
Application example: a training system that issues custom instructions and materials with respect to the requirements of students.
Conclusion
Reinforcement learning enables agents to learn effective behavior autonomously, through trial and error guided by rewards.
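To tie the terminology above together, here is a minimal tabular Q-learning sketch; the five-state corridor environment and all hyperparameters are invented for illustration.

```python
# Tabular Q-learning sketch that ties the terms above together: an agent,
# an action space, a scalar reward, and a learned value table (Q).
# The 5-state corridor environment and the hyperparameters are illustrative.
import random

N_STATES, GOAL = 5, 4          # states 0..4; reaching state 4 ends an episode
ACTIONS = [-1, +1]             # action space: move left or move right
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy choice: mostly exploit, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)   # environment transition
        r = 1.0 if s2 == GOAL else -0.01        # reward signal from environment
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy policy learned for the states before the goal: should be all +1.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)])
```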
TYPES OF DATA
Data Types in ML
Data types are a way of classification that specifies which type of value a variable can store and what kind of mathematical, relational, or logical operations can be applied to the variable without causing an error. In machine learning, it is very important to know the appropriate data types of the independent and dependent variables.
1. Quantitative
2. Qualitative

1. Quantitative
Quantitative data is numeric and comes in two forms, discrete and continuous.
Discrete: the numeric data which have discrete values or whole numbers. If expressed in decimal format, this type of variable value has no proper meaning. Their values can be counted.
E.g.: no. of cars you have, no. of marbles in containers, students in a class, etc.
Continuous: the numerical measures which can take any value within a certain range. If expressed in decimal format, this type of variable value has true meaning. Their values cannot be counted, only measured, and the number of possible values is infinite.

2. Qualitative
These are the data types that cannot be expressed in numbers. They describe categories or groups and are hence known as the categorical data type.
A. Structured Data:
This type of data is either numbers or words. It can take numerical values, but mathematical operations cannot be performed on it. This type of data is expressed in tabular format.
E.g.: Sunny = 1, Cloudy = 2, Windy = 3, or binary-form data like 0 or 1, Good or Bad, etc.
B. Unstructured Data:
This type of data does not have a proper format and is therefore known as unstructured data. It comprises textual data, sounds, images, videos, etc.
Besides this, there are also other types, referred to as data type preliminaries or data measures:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
1. Nominal
This is used to express names or labels which are neither ordered nor measurable.
2. Ordinal
This is also a categorical data type like nominal data, but it has some natural ordering associated with it.
3. Interval
This is numeric data with a proper order, where the differences between values are meaningful. Here zero does not mean a complete absence; it still carries some value, so there is no absolute zero. This is a local scale.
E.g.: temperature measured in degrees Celsius, time, SAT score, credit score, pH, etc.
Fig: Temperature, an example of the interval data type
4. Ratio
This quantitative data type is the same as the interval data type but has an absolute zero. Here zero means complete absence, and the scale starts from zero. This is a global scale.
Fig: Weight, an example of the ratio data type
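A small sketch of how these measures show up in practice, using pandas dtypes; the tiny weather table is invented for illustration.

```python
# Sketch of the four data measures using pandas dtypes; the tiny weather
# table is invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "city":    ["Pune", "Delhi", "Pune"],   # nominal: labels with no order
    "rating":  ["low", "high", "medium"],   # ordinal: ordered categories
    "temp_c":  [31.5, 39.0, 28.2],          # interval: zero is not "no heat"
    "rain_mm": [0.0, 2.5, 12.0],            # ratio: zero means no rain at all
})

df["city"] = df["city"].astype("category")          # unordered categorical
df["rating"] = pd.Categorical(df["rating"],
                              categories=["low", "medium", "high"],
                              ordered=True)          # ordered categorical
print(df.dtypes)
print(df["rating"].min())   # comparisons are meaningful for ordinal data
```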
Matching
There are multiple kinds of matching in machine learning, including data matching, exact matching, local feature matching, and probabilistic matching:
Data matching
The process of identifying which records from different data sources
correspond to the same real-world entity. Machine learning models can learn
the relationship between data and what is considered a match in a specific
instance.
Exact matching
A stricter version of accuracy where all classes or labels must match exactly
for the sample to be correctly classified.
Local feature matching
A technique for matching local features (such as keypoints) across images, explored in recent years with the introduction of deep learning models. However, challenges remain in improving the accuracy and robustness of matching due to factors like lighting and viewpoint variations.
Probabilistic matching
A data matching technique that uses statistical methods to determine the
probability that two records represent the same entity.
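As a minimal sketch of the probabilistic idea, here is a string-similarity matcher built on Python's standard library; the two records and the 0.8 threshold are illustrative assumptions, and real systems use richer statistical models.

```python
# Minimal probabilistic-matching sketch using string similarity from the
# Python standard library. The two records and the 0.8 threshold are
# illustrative; real systems use richer statistical models.
from difflib import SequenceMatcher

def match_probability(a: str, b: str) -> float:
    """Similarity score in [0, 1] between two record strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

rec1 = "Jon Smith, 12 Baker St"
rec2 = "John Smith, 12 Baker Street"
p = match_probability(rec1, rec2)
print(f"match score ~ {p:.2f}")
print("likely same entity" if p > 0.8 else "likely different entities")
```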
Supervised learning
A subfield of machine learning that trains algorithms to make predictions or
decisions based on labeled training data.
1. Problem Definition
2. Data Collection
3. Data Preparation
4. Data Visualization
5. ML Modeling
6. Feature Engineering
7. Model Deployment
These 7 stages are the key steps in our framework. We have
categorized them additionally into groups to get a better
understanding of the larger picture.
Phase 3 — Production
In the third phase, one takes the ML model and scales it. The goal is to integrate machine learning into a business process, solving a problem with a superior solution compared to, for example, traditional programming. The process of
taking a trained ML model and making its predictions
available to users or other systems is known as model
deployment. Lastly, it is also essential to iterate on the ML
model over time to improve it.
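As a minimal sketch of model deployment, the following wraps a trained model in a small HTTP service using Flask; the model.pkl file, the /predict route, and the feature format are hypothetical assumptions for illustration.

```python
# Minimal model-deployment sketch: expose a trained model over HTTP with
# Flask. "model.pkl" is a hypothetical file holding a previously trained,
# pickled model; the /predict route and feature format are also assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:      # hypothetical trained model artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.json["features"]        # e.g. [[5.1, 3.5, 1.4, 0.2]]
    prediction = model.predict(features)       # model inference
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(port=5000)   # for real production, use a proper WSGI server
```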
1. Problem Definition
The first stage in the DDS Machine Learning Framework is
to define and understand the problem that someone is going
to solve. Start by analyzing the goals and the why behind a
particular problem statement. Understand the power of data
and how one can use it to make a change and drive results.
And asking the right questions is always a great start.
2. Data Collection
Once the goal is clearly defined, one has to start
getting the data that is needed from various available data
sources.
There are many different ways to collect data that is used for
Machine Learning. For example, focus groups, interviews,
surveys, and internal usage & user data. Also, public data
can be another source and is usually free. This includes data published by research and trade associations, banks, publicly traded corporations, and others. If data isn't publicly available, one could also use web scraping to get it (however, there are some legal restrictions).
A key question at this stage is: what is the most efficient way to store and access all of it?
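A minimal sketch of pulling a public dataset and storing a local copy with pandas; the URL is a hypothetical placeholder, not a real dataset.

```python
# Sketch of collecting public data: read a CSV from the web with pandas
# and keep a local copy for the later stages. The URL is a hypothetical
# placeholder, not a real dataset.
import pandas as pd

URL = "https://example.com/public_dataset.csv"   # placeholder source

df = pd.read_csv(URL)                    # collect the raw data
df.to_csv("raw_data.csv", index=False)   # store it locally for reuse
print(df.shape)
```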
3. Data Preparation
The third stage is the most time-consuming and labor-
intensive. Data Preparation can take up to 70% and
sometimes even 90% of the overall project time. But what is
the purpose of this stage? It is to turn the raw, collected data into a clean, consistent dataset that a model can learn from, through tasks such as the following (see the pandas sketch after this item):
Data Filtering
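A minimal data-preparation sketch with pandas, covering filtering, missing values, and a type fix; the toy records are invented for illustration.

```python
# Minimal data-preparation sketch with pandas: fix a wrong type, drop
# missing values, and filter out impossible values. The toy records are
# invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "age":    [25, None, 41, 230],                   # None = missing, 230 = impossible
    "income": ["50000", "62000", "58000", "71000"],  # numbers stored as text
})

df["income"] = df["income"].astype(float)   # correct the data type
df = df.dropna(subset=["age"])              # remove rows with missing age
df = df[df["age"].between(0, 120)]          # data filtering: plausible ages only
print(df)
```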
4. Data Visualization
Common chart types for exploring the prepared data include the following (a minimal plotting sketch follows this list):
Bar Chart
Box-and-whisker Plots
Bubble Cloud
Heat Map
Histogram
Network Diagram
Word Cloud
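Here is a minimal matplotlib sketch of two of the chart types listed above, a histogram and a bar chart; the random values and category counts are invented for illustration.

```python
# Minimal sketch of two chart types from the list above (histogram and
# bar chart) with matplotlib; the random values and category counts are
# invented for illustration.
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(0).normal(size=500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(values, bins=30)              # histogram: shape of a distribution
ax1.set_title("Histogram")
ax2.bar(["A", "B", "C"], [12, 7, 19])  # bar chart: compare categories
ax2.set_title("Bar Chart")
plt.tight_layout()
plt.show()
```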
5. ML Modeling