
Module 4: Classification Algorithms

1. Support Vector Machine (SVM)

Definition:

Support Vector Machine (SVM) is a supervised learning algorithm that aims to find the best hyperplane that separates data into different classes.

Key Points:

• Works well for high-dimensional data.
• Uses the kernel trick to handle non-linear separations.
• Types of kernels: Linear, Polynomial, Radial Basis Function (RBF); an RBF sketch follows the code below.

Code Implementation:
# Import libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Load dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Only first two features for simplicity
y = iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Train model
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(X_train, y_train)

# Predict
y_pred = svm_model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

2. k-Nearest Neighbors (KNN)

Definition:

KNN is a lazy learning algorithm that classifies a new data point based on the majority class of its k nearest neighbors.

Key Points:

• Simple and effective for small datasets.
• Sensitive to the choice of k and the distance metric (a k-sweep sketch follows the code below).

Code Implementation:
from sklearn.neighbors import KNeighborsClassifier

# Train model
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)

# Predict
y_pred = knn_model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

3. Naïve Bayes Classifier

Definition:

Naïve Bayes is a probabilistic algorithm based on Bayes' theorem, assuming independence between features.

Code Implementation:
from sklearn.naive_bayes import GaussianNB

# Train model
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict
y_pred = nb_model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

4. Decision Tree (CART and ID3)

Definition:

A Decision Tree splits data into subsets based on feature values, creating a tree-like structure to make decisions.

Key Points:

• CART (Classification and Regression Tree): Uses Gini impurity for classification splits and mean squared error for regression splits.
• ID3: Uses Information Gain based on entropy (an entropy-criterion sketch follows the code below).

Code Implementation:
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Train model
# Change criterion to 'entropy' for ID3-style splits
dt_model = DecisionTreeClassifier(criterion='gini', random_state=42)
dt_model.fit(X_train, y_train)

# Predict
y_pred = dt_model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Visualize the tree
plt.figure(figsize=(12, 8))
plot_tree(dt_model, feature_names=iris.feature_names[:2],
          class_names=iris.target_names, filled=True)
plt.show()
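
Note that scikit-learn's DecisionTreeClassifier implements CART; setting criterion='entropy' makes it split on information gain like ID3, but this only approximates ID3. A minimal sketch on the same split:

# Entropy criterion (ID3-style information gain)
id3_like_model = DecisionTreeClassifier(criterion='entropy', random_state=42)
id3_like_model.fit(X_train, y_train)
print("Accuracy (entropy):", accuracy_score(y_test, id3_like_model.predict(X_test)))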

5. Ensemble Learning

Definition:
Ensemble learning combines multiple models to improve performance. Two
common techniques:

• Bagging: Reduces variance by training models on different subsets of the data (e.g., Random Forest).
• Boosting: Reduces bias by training models sequentially (e.g., AdaBoost; a sketch follows the Random Forest code below).

Random Forest Implementation (Bagging):
from sklearn.ensemble import RandomForestClassifier

# Train model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predict
y_pred = rf_model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

6. Evaluation Metrics for Classification Algorithms

1. Confusion Matrix:

• Displays the counts of true positives, true negatives, false positives, and
false negatives.

2. Accuracy: (TP + TN) / (TP + TN + FP + FN), the proportion of all predictions that are correct.

3. Precision: TP / (TP + FP), the proportion of predicted positives that are truly positive.

4. Recall (Sensitivity): TP / (TP + FN), the proportion of actual positives that are correctly identified.

5. F1-Score: 2 × (Precision × Recall) / (Precision + Recall), the harmonic mean of precision and recall.

6. Gradient Descent (Optimization):

Gradient descent optimizes model parameters (such as the coefficients in logistic regression) by iteratively updating them to minimize the loss function; a sketch follows below.
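
A minimal NumPy sketch of the idea (not scikit-learn's implementation): fit logistic-regression coefficients on two of the Iris classes loaded earlier by repeatedly stepping against the gradient of the log-loss. The learning rate and iteration count are illustrative choices:

import numpy as np

# Binary subset: classes 0 and 1 from the Iris data loaded earlier
mask = y < 2
Xb = np.c_[np.ones(mask.sum()), X[mask]]   # add a bias column
yb = y[mask]

w = np.zeros(Xb.shape[1])                  # coefficients to learn
lr = 0.1                                   # learning rate (illustrative)
for _ in range(1000):
    p = 1 / (1 + np.exp(-Xb @ w))          # sigmoid predictions
    grad = Xb.T @ (p - yb) / len(yb)       # gradient of the log-loss
    w -= lr * grad                         # step opposite the gradient

print("Learned coefficients:", w)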

Evaluation Code Example
from sklearn.metrics import classification_report

# Print evaluation metrics
print("Classification Report:\n", classification_report(y_test, y_pred))
