Machine Learning Techniques For Sensor Data Analysis

Project: Balance Scale
Shrutesh Doshi
23-03-06
Dataset Characteristics: Multivariate
Subject Area: Social Science
Associated Tasks: Classification
Feature Type: Categorical
# Instances: 625
# Features: 4
Overview:
The Balance Scale dataset is aimed at modelling
psychological experimental results related to balancing
scales. Each instance in the dataset is classified based
on whether the balance scale tips to the right, tips to
the left, or remains balanced. The dataset contains
features representing the weights and distances on the
left and right sides of the balance scale.
Additional Information:
• Dataset Origin: The dataset was obtained from the UCI Machine Learning Repository.
• Features:
  • right-distance: Distance on the right side of the balance scale.
  • right-weight: Weight on the right side of the balance scale.
  • left-distance: Distance on the left side of the balance scale.
  • left-weight: Weight on the left side of the balance scale.
• Target Variable:
  • class: Tipping direction of the balance scale (balanced, tips to the right, or tips to the left).
from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
# list_available_datasets()

# import dataset
balance_scale = fetch_ucirepo(id=12)
# alternatively: fetch_ucirepo(name='Balance Scale')

# access data
X = balance_scale.data.features
y = balance_scale.data.targets
# train a model, e.g. sklearn.neighbors.KNeighborsClassifier().fit(X, y.values.ravel())

# access metadata
print(balance_scale.metadata.uci_id)
print(balance_scale.metadata.num_instances)
print(balance_scale.metadata.additional_info.summary)

# access variable info in tabular format
print(balance_scale.variables)
12
625
This data set was generated to model psychological
experimental results. Each example is classified as
having the balance scale tip to the right, tip to the left,
or be balanced. The attributes are the left weight, the
left distance, the right weight, and the right distance.
The correct way to find the class is the greater of
(left-distance * left-weight) and
(right-distance * right-weight). If they are equal, it is balanced.
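The rule quoted in the summary above can be written out directly. The following is a minimal sketch, not code from the original project: the function name `tip_direction` is hypothetical, and the grid assumes every attribute takes the values 1 to 5, which is consistent with the 625 instances reported (5^4 = 625).

```python
from collections import Counter
from itertools import product


def tip_direction(left_weight, left_distance, right_weight, right_distance):
    """Return 'L', 'R', or 'B' by comparing the two weight * distance products."""
    left = left_weight * left_distance
    right = right_weight * right_distance
    if left > right:
        return "L"
    if right > left:
        return "R"
    return "B"


# Assumption: each of the four attributes ranges over 1..5, giving 5**4 rows.
grid = list(product(range(1, 6), repeat=4))
class_counts = Counter(tip_direction(*row) for row in grid)

print(len(grid))
print(dict(class_counts))
```

Under that assumption the rule yields 288 left, 288 right, and 49 balanced cases, so the balanced class is much rarer than the other two.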
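   Name            Role     Type         Demographic  Description  Units  missing_values
0  Right Distance  Feature  Categorical  None         L, B, R      None   no
1  Right Weight    Feature  Categorical  None         1,2,3,4,5    None   no
2  Left Distance   Feature  Categorical  None         1,2,3,4,5    None   no
3  Left Weight     Feature  Categorical  None         1,2,3,4,5    None   no
4  Class           Target   Categorical  None         1,2,3,4,5    None   no
Note: the Description column is shown as printed. Its entries for Right
Distance and Class appear to be swapped: the class takes the values
L, B, R, and each of the four features takes the values 1 to 5.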
Printing the first few rows of features and target:
# Print the first few rows of features
print("Features (X):")
print(X.head())

# Print the first few rows of the target
print("\nTarget (y):")
print(y.head())
Plotting the distribution of each feature and target variable:
import matplotlib.pyplot as plt

# Plotting the distribution of each categorical feature
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs = axs.flatten()
categorical_features = ['right-distance', 'right-weight',
                        'left-distance', 'left-weight']
for i, feature in enumerate(categorical_features):
    counts = X[feature].value_counts()
    axs[i].bar(counts.index, counts.values)
    axs[i].set_title(f'Distribution of {feature}')
    axs[i].set_xlabel(feature)
    axs[i].set_ylabel('Count')
plt.tight_layout()
plt.show()

# Plotting the distribution of the target variable
plt.figure(figsize=(8, 6))  # Adjust the figure size here
class_counts = y['class'].value_counts()
plt.bar(class_counts.index, class_counts.values)
plt.title('Distribution of Tipping Direction')
plt.xlabel('Tipping Direction')
plt.ylabel('Count')
plt.show()
Plotting the cumulative sum of counts for each feature
and target variable:
# Plotting the cumulative sum of counts for each categorical feature
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs = axs.flatten()
categorical_features = ['right-distance', 'right-weight',
                        'left-distance', 'left-weight']
for i, feature in enumerate(categorical_features):
    counts = X[feature].value_counts().sort_index().cumsum()
    axs[i].plot(counts.index, counts.values, marker='o')
    axs[i].set_title(f'Cumulative Count of {feature}')
    axs[i].set_xlabel(feature)
    axs[i].set_ylabel('Cumulative Count')
plt.tight_layout()
plt.show()

# Plotting the cumulative sum of counts for the target variable
plt.figure(figsize=(8, 6))
class_counts = y['class'].value_counts().sort_index().cumsum()
plt.plot(class_counts.index, class_counts.values, marker='o')
plt.title('Cumulative Count of Tipping Direction')
plt.xlabel('Tipping Direction')
plt.ylabel('Cumulative Count')
plt.show()
Analysis:
K-Nearest Neighbors (KNN) Algorithm:
• Accuracy: 83.2%
• Methodology:
  • The dataset was split into training and testing sets.
  • A KNN classifier was trained on the training data with 5 neighbors.
  • The model was evaluated using the accuracy metric on the test set.
• Confusion Matrix:
  • The confusion matrix reveals the model's performance in classifying instances into balanced, right, or left categories.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Initialize the KNN classifier; the number of neighbors (k) can be adjusted as needed
knn = KNeighborsClassifier(n_neighbors=5)

# Train the classifier on the training data
# (ravel() turns the one-column target DataFrame into the 1-D array scikit-learn expects)
knn.fit(X_train, y_train.values.ravel())

# Predict the classes for the test set
y_pred = knn.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
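For intuition about what the classifier does with its 5 neighbors, the same split, fit, and score steps can be sketched in plain Python. This is an illustrative sketch only, not scikit-learn's implementation: the helper names are hypothetical, the data is regenerated from the weight * distance rule on an assumed 1 to 5 grid rather than fetched from UCI, and the shuffle differs from `train_test_split`, so the printed accuracy is not expected to reproduce the 83.2% reported above.

```python
import random
from collections import Counter
from itertools import product


def tip_direction(lw, ld, rw, rd):
    """Label a row with the weight * distance rule ('L', 'R', or 'B')."""
    left, right = lw * ld, rw * rd
    return "L" if left > right else "R" if right > left else "B"


def knn_predict(train_rows, train_labels, query, k=5):
    """Majority vote among the k training rows closest to query (squared Euclidean)."""
    order = sorted(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], query)),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Assumption: the dataset is the full grid of attribute values 1..5.
rows = list(product(range(1, 6), repeat=4))
labels = [tip_direction(*r) for r in rows]

# Hold out 20% of the rows for testing, using a fixed seed for repeatability.
indices = list(range(len(rows)))
random.Random(42).shuffle(indices)
n_test = len(rows) // 5
test_idx, train_idx = indices[:n_test], indices[n_test:]

train_rows = [rows[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]

predictions = [knn_predict(train_rows, train_labels, rows[i]) for i in test_idx]
accuracy = sum(p == labels[i] for p, i in zip(predictions, test_idx)) / n_test
print(f"Accuracy: {accuracy:.3f}")
```

Ties in the vote are broken by whichever label was counted first, which is a simplification chosen for brevity.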

import seaborn as sns
from sklearn.metrics import confusion_matrix

# Predictions for KNN model
y_pred_knn = knn.predict(X_test)

# Confusion matrix for KNN (rows: actual class, columns: predicted class)
knn_cm = confusion_matrix(y_test, y_pred_knn)
sns.heatmap(knn_cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=knn.classes_, yticklabels=knn.classes_)
plt.title("Confusion Matrix for KNN")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
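A confusion matrix can also be read numerically, for example as per-class recall. The sketch below shows one way to do that in plain Python; the helper names are hypothetical and the label lists are made up purely for illustration, so they are not results from this project. The class order B, L, R is the sorted order that scikit-learn's `confusion_matrix` uses by default.

```python
from collections import Counter


def confusion_counts(actual, predicted, labels):
    """Build a matrix whose rows are actual classes and columns are predicted classes."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in labels] for a in labels]


def per_class_recall(matrix, labels):
    """Fraction of each actual class that was predicted correctly."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


# Made-up labels for illustration only; not results from the report.
labels = ["B", "L", "R"]
actual = ["L", "L", "R", "R", "B", "B", "L", "R"]
predicted = ["L", "L", "R", "R", "L", "R", "L", "R"]

matrix = confusion_counts(actual, predicted, labels)
recall = per_class_recall(matrix, labels)
print(matrix)
print(recall)
```

In this toy example 6 of 8 predictions are correct, yet no balanced case is recognized, which is the kind of detail that overall accuracy hides and the heatmap exposes.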
Logistic Regression Algorithm:
• Accuracy: 84%
• Methodology:
  • The dataset was split into training and testing sets.
  • A logistic regression model was trained on the training data.
  • The model was evaluated using the accuracy metric on the test set.
• Confusion Matrix:
  • The confusion matrix illustrates the model's performance in classifying instances into balanced, right, or left categories.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Initialize the logistic regression model; parameters can be adjusted as needed
logistic_regression = LogisticRegression(max_iter=1000)

# Train the model on the training data
# (ravel() turns the one-column target DataFrame into the 1-D array scikit-learn expects)
logistic_regression.fit(X_train, y_train.values.ravel())

# Predict the classes for the test set
y_pred = logistic_regression.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Confusion matrix for Logistic Regression (rows: actual class, columns: predicted class)
y_pred_lr = logistic_regression.predict(X_test)
lr_cm = confusion_matrix(y_test, y_pred_lr)
sns.heatmap(lr_cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=logistic_regression.classes_,
            yticklabels=logistic_regression.classes_)
plt.title("Confusion Matrix for Logistic Regression")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
Conclusion:
On a single 80/20 train/test split (125 test instances),
KNN with 5 neighbors reached an accuracy of 83.2% and
Logistic Regression reached 84%. At this test-set size the
gap corresponds to one instance (104 versus 105 correct),
so the two models perform about equally here. Both appear
usable for predicting the tipping direction of a balance
scale from the provided features. Accuracy alone does not
show how well the rarer balanced class is handled, so the
confusion matrices should be read alongside it.
