Introduction to Classification and Classification Algorithms
Classification is a type of machine learning where we teach a computer to sort things into groups or
categories. For example, we can train it to decide if an email is "Spam" or "Not Spam," or if a picture
is of a "Dog," "Cat," or "Bird."
The computer learns by looking at examples (called training data) where the answer (or category) is
already known. Once trained, it can predict the category for new, unseen data.
Key Concepts:
• Features: These are the individual characteristics or attributes of the data points that are
used to classify them.
• Labels: These are the predefined categories or classes that the data points are assigned to.
• Training Data: This is the labeled data that the algorithm learns from.
• Test Data: This is held-out labeled data used to evaluate the algorithm; the labels are hidden during prediction and used only to score the results.
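These concepts can be illustrated with a tiny hand-made dataset (the word counts and labels below are invented purely for illustration):

```python
# Each data point is a list of feature values; labels are the known categories.
features = [[3, 1], [0, 0], [2, 1], [0, 1]]   # [count of "free", count of "winner"] per email
labels = ["Spam", "Not Spam", "Spam", "Not Spam"]

# Training data: labeled examples the algorithm learns from.
train_features, train_labels = features[:3], labels[:3]

# Test data: held-out examples used to check the trained model.
test_features, test_labels = features[3:], labels[3:]
```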
Common Classification Algorithms:
1. Logistic Regression:
A simple method that predicts whether something belongs to one group or another.
2. Decision Trees:
Works like a flowchart. It asks questions (based on features) to reach a decision.
3. Random Forest:
Combines many decision trees to make better predictions.
4. Naive Bayes:
Uses probabilities to make predictions, assuming features are independent.
5. Neural Networks:
Mimics how the human brain works to handle complex problems.
Steps in Building a Classification Model:
1. Data Collection:
Gather data relevant to the problem.
2. Data Preprocessing:
Clean the data, handle missing values, and encode categorical variables.
3. Feature Selection/Engineering:
Select or create features that improve model performance.
4. Model Selection:
Choose an appropriate classification algorithm based on the problem requirements.
5. Training:
Train the model using labeled data.
6. Evaluation:
Evaluate the model using metrics like accuracy, precision, recall, F1 score, and ROC-AUC.
7. Prediction:
Use the trained model to classify new, unseen data.
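Steps 5–7 can be sketched in plain Python with a deliberately trivial "majority class" baseline (not a real algorithm choice — it only memorises the most common training label, but it shows the train/evaluate/predict cycle; all labels are invented):

```python
from collections import Counter

# Step 5 (Training): memorise the most common label in the training data.
def train_majority_baseline(train_labels):
    return Counter(train_labels).most_common(1)[0][0]

# Step 6 (Evaluation): accuracy = fraction of correct predictions.
def accuracy(model_label, test_labels):
    return sum(1 for y in test_labels if y == model_label) / len(test_labels)

train_labels = ["Spam", "Spam", "Not Spam", "Spam"]  # invented labels
test_labels = ["Spam", "Not Spam", "Spam"]

model = train_majority_baseline(train_labels)
print(model)                           # the majority class in the training data
print(accuracy(model, test_labels))    # fraction of test labels it matches
```

A real workflow would swap the baseline for one of the algorithms above, but the surrounding steps stay the same.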
Simple Example Workflow:
1. Define the Goal
• Decide what you want to classify and the groups (classes) you need.
Example: Sorting emails into "Spam" or "Not Spam."
2. Collect Data
• Pick Features: Choose the details (like words in an email) that help with classification.
3. Choose a Method
• Select a classification algorithm suited to the problem.
4. Evaluate
• Use the test data to see how well the model works.
k-Nearest Neighbors (k-NN) – How It Works:
1. Store the Data: The algorithm keeps all the training data and doesn’t actually "train" a
model. It uses the data directly when making predictions.
2. Calculate Distance: When a new data point is given, k-NN calculates the distance between
this point and every point in the dataset. Common distance metrics include Euclidean and Manhattan distance.
3. Find Neighbors: It identifies the k closest points (neighbors) to the new data point.
4. Vote or Average:
o For Classification: The algorithm checks the class labels of these k neighbors and
assigns the new data point to the majority class.
o For Regression: It calculates the average of the values of these k neighbors to predict
the result.
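The steps above can be sketched in plain Python (the 2-D points and labels are invented; `math.dist` computes Euclidean distance):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Step 2: Euclidean distance from new_point to every training point.
    distances = [
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest points.
    neighbours = sorted(distances)[:k]
    # Step 4: majority vote over the neighbours' labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented 2-D example: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))  # nearest neighbours are all "A"
```

Note that there is no training step: all the work happens at prediction time, which is why k-NN is called a "lazy" learner.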
Random Forest – Key Features:
1. Ensemble Learning: Combines the predictions of multiple decision trees to make more
accurate and stable predictions.
2. Bootstrap Aggregation (Bagging): Each tree is trained on a different subset of the training
data, selected randomly with replacement.
3. Random Feature Selection: For each split in a tree, a random subset of features is
considered, reducing correlation among trees.
4. Robustness: Handles overfitting better than individual decision trees by averaging results.
How Random Forest Works:
1. Training Phase: Generate multiple decision trees by bootstrapping the training data.
o For each tree, select a random subset of features to split on at each node.
2. Prediction Phase:
o For classification tasks: Aggregate predictions using a majority vote across trees.
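A minimal sketch of bagging plus majority voting, using depth-1 "stumps" on a single feature in place of full decision trees (per-split random feature selection is omitted for brevity, and the 1-D data is invented):

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """A depth-1 'tree': pick the threshold that minimises training error."""
    best = None
    for t in xs:
        for left, right in (("A", "B"), ("B", "A")):
            preds = [left if x < t else right for x in xs]
            err = sum(p != y for p, y in zip(preds, ys))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    _, t, left, right = best
    return lambda x: left if x < t else right

def train_forest(xs, ys, n_trees=7, seed=0):
    """Bagging: each stump is trained on a bootstrap sample
    (drawn randomly with replacement, as in step 1 above)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        forest.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def forest_predict(forest, x):
    """Prediction phase: majority vote across the trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]

# Invented data: class "A" below ~5, class "B" above.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = train_forest(xs, ys)
print(forest_predict(forest, 2), forest_predict(forest, 8))
```

Even if one bootstrap sample produces a poor stump, the majority vote across trees keeps the overall prediction stable, which is the robustness the list above describes.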
Advantages
• Usually more accurate and more robust to overfitting than a single decision tree.
Disadvantages
• Slower to train and harder to interpret than a single decision tree.
Applications
• Fraud detection
• Recommender systems
Fuzzy Set Theory
Fuzzy set theory is an extension of classical set theory that handles partial truth, where the truth value of an element can range between 0 and 1. Unlike regular sets, where an item either belongs to the set or doesn’t, in fuzzy sets an item can belong partially, with a degree of membership.
Key Concepts of Fuzzy Sets:
1. Membership Degrees: Items have a membership degree between 0 and 1.
Example: In a "Tall" set:
o Height 6’ → Membership = 1 (fully tall)
o Height 5’10” → Membership = 0.8 (almost tall)
o Height 5’5” → Membership = 0.3 (less tall)
2. Fuzzy Operations:
o Union: Combines two fuzzy sets, taking the highest membership for each item.
o Intersection: Finds commonality, taking the lowest membership for each item.
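The max/min rules for union and intersection can be shown directly in Python, representing a fuzzy set as a dict from item to membership degree (the names and degrees are invented):

```python
# Fuzzy sets as dicts mapping item -> membership degree in [0, 1].
tall = {"Ann": 1.0, "Bob": 0.8, "Cara": 0.3}
heavy = {"Ann": 0.4, "Bob": 0.9, "Cara": 0.2}

def fuzzy_union(a, b):
    """Union: take the highest membership of each item."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def fuzzy_intersection(a, b):
    """Intersection: take the lowest membership of each item."""
    return {k: min(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

print(fuzzy_union(tall, heavy))         # Bob is "tall or heavy" to degree 0.9
print(fuzzy_intersection(tall, heavy))  # Bob is "tall and heavy" to degree 0.8
```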
Applications of Fuzzy Set Theory:
• Control Systems: Fuzzy logic controllers are used in systems that require human-like
decision-making based on imprecise inputs, such as washing machines, air conditioners, and robotics.
• Data Classification: Fuzzy set theory is useful for classification tasks where the boundaries
between categories are not well-defined.
• Image Processing: Fuzzy sets are used in image segmentation, pattern recognition, and noise
reduction.
• Decision Making: Fuzzy decision support systems are used in areas like healthcare, finance,
and engineering to make decisions based on uncertain or incomplete information.
Support Vector Machine (SVM) is a machine learning algorithm used to classify data into
different categories. The idea is to draw a line that separates the different classes of data as clearly as
possible: SVM finds the best boundary (called a hyperplane) that divides the data into two
classes.
Key Points
1. Hyperplane: A hyperplane is the boundary or line that separates different classes of data. In
2D, it’s a line, and in 3D, it’s a plane.
2. Support Vectors: The support vectors are the data points that are closest to the hyperplane.
These points are the most important for deciding where to place the hyperplane.
3. Maximizing the Margin: SVM tries to place the hyperplane in a way that maximizes the
margin (the distance between the hyperplane and the nearest data points of each class).
This helps the model to classify new data points more confidently.
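The three key points can be made concrete with a fixed 2-D hyperplane (the weights `w` and bias `b` here are chosen by hand for illustration, not learned by an actual SVM):

```python
import math

# A fixed 2-D hyperplane w·x + b = 0 (invented weights, not trained).
w = [1.0, -1.0]
b = 0.0

def classify(x):
    """The sign of w·x + b decides which side of the hyperplane x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "Dog" if score > 0 else "Cat"

def distance_to_hyperplane(x):
    """|w·x + b| / ||w|| — the distance SVM maximises for the closest points
    (the support vectors)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.hypot(*w)

print(classify([3.0, 1.0]))                # w·x + b = 2 > 0, so "Dog"
print(distance_to_hyperplane([3.0, 1.0]))  # 2 / sqrt(2)
```

Training an SVM amounts to choosing `w` and `b` so that the smallest of these distances (the margin) is as large as possible.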
SVM with Non-linear Data: Sometimes, the data can’t be separated with a straight line. In these
cases, SVM uses something called a kernel to transform the data into a higher dimension, where a
straight line (hyperplane) can be used to separate the classes.
Types of Kernels:
• Polynomial Kernel: Useful for data that is not linearly separable but can be separated with a
polynomial decision boundary.
• Radial Basis Function (RBF) Kernel: A very powerful kernel that can handle complex datasets
with non-linear relationships.
• Sigmoid Kernel: Based on a sigmoid (tanh) function; it makes the SVM behave similarly to a simple neural network.
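A kernel is just a function that measures similarity between two points as if they had been mapped into a higher dimension. Two of the kernels above can be written directly (the default `degree`, `c`, and `gamma` values are illustrative choices, not fixed by SVM itself):

```python
import math

def polynomial_kernel(x, y, degree=3, c=1.0):
    """Polynomial kernel: (x·y + c)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2).
    Close points score near 1; distant points score near 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1, 2], [1, 2]))  # identical points -> 1.0
print(rbf_kernel([0, 0], [3, 4]))  # far apart -> close to 0
```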
Example : Let’s say you have data points of Cats and Dogs based on their height and weight. SVM
will find the best line (or boundary) that separates the Cats from the Dogs. When a new animal
comes along, SVM will classify it as either a Cat or Dog based on which side of the line it falls.
Advantages of SVM
1. Works well with high-dimensional data (when there are many features).
Disadvantages of SVM
1. Hard to understand because it doesn’t provide easy-to-follow rules like decision trees.