Assignment 1: Q1. Task Description
Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE", as well
as your name and collaborators below:
You may remove the raise NotImplementedError() line and write your own code.
NAME = ""
WSU_ID = ""
Assignment 1
Q1. Task Description
1. Data Generation:
• Generate two normal distributions, each of size 30,000, with feature dimension = 100.
(Hint: shape (2*30000, 100))
• You may choose arbitrary means and standard deviations.
• Label the first set of feature vectors as "+1" and the second set of feature vectors as
either "-1" or "0."
2. Data Splitting
• Split the generated data into training, validation, and test sets.

import numpy as np
x_train.shape
(48000, 100)
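A minimal sketch of one way to generate and split the data. The means, standard deviations, random seed, and the 80/10/10 split below are assumptions, chosen only so that x_train matches the (48000, 100) shape shown above:

import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian blobs, 30,000 samples each, 100 features
x_pos = rng.normal(loc=1.0, scale=1.0, size=(30000, 100))
x_neg = rng.normal(loc=-1.0, scale=1.0, size=(30000, 100))
x = np.vstack([x_pos, x_neg])                          # shape (60000, 100)
y = np.concatenate([np.ones(30000), -np.ones(30000)])  # labels +1 / -1

# Shuffle, then split 80/10/10 into train / val / test
idx = rng.permutation(len(x))
x, y = x[idx], y[idx]
x_train, y_train = x[:48000], y[:48000]
x_val, y_val = x[48000:54000], y[48000:54000]
x_test, y_test = x[54000:], y[54000:]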
3. Classifier Implementation
• Implement linear classifiers using three different methods (i.e., write the code from
scratch; refer to the lecture slides):
– Random Method
– Perceptron Method
– Gradient Descent
import numpy as np

def random_classifier(x, y, x_val, y_val, num_epochs):
    """Baseline that assigns labels uniformly at random, ignoring the features.

    Args:
        x (numpy.ndarray): Input data with shape (num_samples, num_features).
        num_epochs (int): Number of training epochs.

    Returns:
        predictions (numpy.ndarray): Randomly assigned labels (+1 or -1)
            with shape (num_samples,).
        val_accuracy (list): List of validation accuracies after each epoch.
    """
    num_samples = x.shape[0]
    val_accuracy = []
    for _ in range(num_epochs):
        # "Training" does nothing; just draw fresh random labels each epoch
        predictions = np.random.choice([1, -1], size=num_samples)
        val_pred = np.random.choice([1, -1], size=x_val.shape[0])
        val_accuracy.append(np.mean(val_pred == y_val))
    return predictions, val_accuracy

num_epochs = 10
# Use the random classifier with epochs
random_predictions, val_accuracy = random_classifier(x_train, y_train, x_val, y_val, num_epochs=num_epochs)
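Because the random method ignores the features entirely, its validation accuracy should hover around 0.5 on balanced classes; it serves as a floor that the perceptron and gradient descent methods should beat.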
import numpy as np

class Perceptron:
    def __init__(self, learning_rate, epochs):
        self.weights = None
        self.bias = None
        self.learning_rate = learning_rate
        self.epochs = epochs

    def fit(self, x, y, x_val=None, y_val=None):
        # Labels are assumed to be +1 / -1
        self.weights = np.zeros(x.shape[1])
        self.bias = 0.0
        train_accuracies = []
        val_accuracies = []
        for _ in range(self.epochs):
            for xi, yi in zip(x, y):
                # Perceptron rule: update only on misclassified samples
                if yi * (np.dot(xi, self.weights) + self.bias) <= 0:
                    self.weights += self.learning_rate * yi * xi
                    self.bias += self.learning_rate * yi
            train_accuracies.append(np.mean(self.predict(x) == y))
            if x_val is not None:
                val_accuracies.append(np.mean(self.predict(x_val) == y_val))
        return train_accuracies, val_accuracies

    def predict(self, x):
        # Sign of the linear score gives the predicted label
        return np.where(np.dot(x, self.weights) + self.bias >= 0, 1, -1)
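A minimal usage sketch, assuming x_train, y_train, x_val, y_val come from the split above (the hyperparameter values are placeholders, not prescribed by the assignment):

perceptron = Perceptron(learning_rate=0.01, epochs=10)
train_accuracies, val_accuracies = perceptron.fit(x_train, y_train, x_val, y_val)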
import numpy as np

# GRADIENT DESCENT
def gradient_descent(X, y, theta, iterations, alpha):
    m = len(y)  # number of training samples
    past_costs = []
    past_thetas = [theta]
    for i in range(iterations):
        prediction = np.dot(X, theta)
        error = prediction - y
        cost = 1 / (2 * m) * np.dot(error.T, error)
        past_costs.append(cost)
        theta = theta - (alpha * (1 / m) * np.dot(X.T, error))
        past_thetas.append(theta)
    return past_thetas, past_costs

# Initialize the parameters (example values)
theta = np.zeros(x_train.shape[1])
iterations = 100
alpha = 0.01

# Pass the relevant variables to the function and get the new values back...
past_thetas, past_costs = gradient_descent(x_train, y_train, theta, iterations, alpha)
theta = past_thetas[-1]
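In matrix form, the loop above minimizes the squared-error cost and applies the standard batch update, where $m$ is the number of samples and $\alpha$ the learning rate:

$$J(\theta) = \frac{1}{2m}(X\theta - y)^\top(X\theta - y), \qquad \theta \leftarrow \theta - \frac{\alpha}{m}X^\top(X\theta - y)$$

To obtain class labels from the fitted model, the real-valued score $X\theta$ can be thresholded at zero (an assumption here; the lecture slides may prescribe a different decision rule).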
4. Performance Evaluation
• Evaluate the performance of each method on the test set using the evaluation
metrics discussed in class, such as accuracy, precision, recall, and F1-score. (You are
expected to write the code from scratch for each of the metrics.)
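For reference, with TP, FP, and FN counted for the positive class (+1), the metrics implemented below are:

$$\text{accuracy} = \frac{\#\text{correct}}{\#\text{total}}, \quad \text{precision} = \frac{TP}{TP + FP}, \quad \text{recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$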
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels.

    Args:
        y_true (numpy.ndarray): True labels with shape (num_samples,).
        y_pred (numpy.ndarray): Predicted labels with shape (num_samples,).

    Returns:
        float: Accuracy.
    """
    correct = np.sum(y_true == y_pred)
    total = len(y_true)
    return correct / total
def precision(y_true, y_pred):
    """Precision for the positive class (+1): TP / (TP + FP).

    Args:
        y_true (numpy.ndarray): True labels with shape (num_samples,).
        y_pred (numpy.ndarray): Predicted labels with shape (num_samples,).

    Returns:
        float: Precision.
    """
    true_positive = 0
    false_positive = 0
    for i in range(len(y_true)):
        if y_pred[i] == 1 and y_true[i] == 1:
            true_positive += 1
        elif y_pred[i] == 1 and y_true[i] != 1:
            false_positive += 1
    # Check for zero denominator
    if true_positive + false_positive == 0:
        return 0.0  # Return 0 precision if denominator is zero
    return true_positive / (true_positive + false_positive)
def recall(y_true, y_pred):
    """Recall for the positive class (+1): TP / (TP + FN).

    Args:
        y_true (numpy.ndarray): True labels with shape (num_samples,).
        y_pred (numpy.ndarray): Predicted labels with shape (num_samples,).

    Returns:
        float: Recall.
    """
    true_positive = 0
    false_negative = 0
    for i in range(len(y_true)):
        if y_true[i] == 1 and y_pred[i] == 1:
            true_positive += 1
        elif y_true[i] == 1 and y_pred[i] != 1:
            false_negative += 1
    # Check for zero denominator
    if true_positive + false_negative == 0:
        return 0.0  # Return 0 recall if there are no positive samples
    return true_positive / (true_positive + false_negative)
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall.

    Args:
        y_true (numpy.ndarray): True labels with shape (num_samples,).
        y_pred (numpy.ndarray): Predicted labels with shape (num_samples,).

    Returns:
        float: F1-score.
    """
    prec = precision(y_true, y_pred)
    rec = recall(y_true, y_pred)
    if prec + rec == 0:
        return 0.0  # Avoid division by zero
    return 2 * prec * rec / (prec + rec)
# Example: evaluate each trained model on the test set

# Random baseline predictions (labels are +1 / -1)
random_pred = np.random.choice([1, -1], size=len(x_test))
# Calculate evaluation metrics
acc = accuracy(y_test, random_pred)
prec = precision(y_test, random_pred)
rec = recall(y_test, random_pred)
f1 = f1_score(y_test, random_pred)

# Predict using the trained model (weights and bias obtained from training)
perceptron_pred = perceptron.predict(x_test)
acc = accuracy(y_test, perceptron_pred)
prec = precision(y_test, perceptron_pred)
rec = recall(y_test, perceptron_pred)
f1 = f1_score(y_test, perceptron_pred)
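A compact way to print the metrics side by side (the loop and formatting below are a sketch, not part of the original code):

for name, pred in [("Random", random_pred), ("Perceptron", perceptron_pred)]:
    print(f"{name}: acc={accuracy(y_test, pred):.3f}, "
          f"prec={precision(y_test, pred):.3f}, "
          f"rec={recall(y_test, pred):.3f}, "
          f"f1={f1_score(y_test, pred):.3f}")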
The best method is the Perceptron; it was selected based on accuracy, precision, recall, and
F1-score on the test set.
By following these steps, you can systematically generate data, split it, implement
different linear classification methods, and evaluate their performance to identify the
most suitable one for your specific problem.
Extra Credits
• Implement cross validation on the best method and show the results. (Refer to the
lecture slides for more information)
import numpy as np
def cross_validate(X, y, k, learning_rate, epochs):
    fold_size = len(X) // k
    val_accuracies = []
    for i in range(k):
        # Split the data into training and validation sets for this fold
        val_start = i * fold_size
        val_end = (i + 1) * fold_size
        x_val_fold = X[val_start:val_end]
        y_val_fold = y[val_start:val_end]
        x_train_fold = np.concatenate([X[:val_start], X[val_end:]])
        y_train_fold = np.concatenate([y[:val_start], y[val_end:]])
        # Train a fresh Perceptron on the remaining k-1 folds
        model = Perceptron(learning_rate, epochs)
        model.fit(x_train_fold, y_train_fold)
        val_accuracies.append(accuracy(y_val_fold, model.predict(x_val_fold)))
    return val_accuracies

learning_rate = 0.01
epochs = 3
k = 5  # Number of folds for cross-validation
val_accuracies = cross_validate(x_train, y_train, k, learning_rate, epochs)
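The fold accuracies can then be summarized; reporting the mean and standard deviation is one common choice (a sketch, not mandated by the assignment):

print(f"{k}-fold CV accuracy: {np.mean(val_accuracies):.4f} ± {np.std(val_accuracies):.4f}")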