
Machine Learning Techniques For Sensor Data Analysis

This document analyzes the Balance Scale dataset. Its features are the weights and distances on each side of a balance scale, and its target is the direction in which the scale tips. The document gives an overview of the dataset, explores the data distributions, and applies k-nearest neighbors (KNN) and logistic regression classifiers to predict the tipping direction, reaching test accuracies of 83.2% and 84%, respectively.

Machine Learning Techniques for Sensor Data Analysis

Project: Balance Scale

Shrutesh Doshi
23-03-06
Dataset Characteristics: Multivariate
Subject Area: Social Science
Associated Tasks: Classification
Feature Type: Categorical
# Instances: 625
# Features: 4
Overview:
The Balance Scale dataset is aimed at modelling
psychological experimental results related to balancing
scales. Each instance in the dataset is classified based
on whether the balance scale tips to the right, tips to
the left, or remains balanced. The dataset contains
features representing the weights and distances on the
left and right sides of the balance scale.
Additional Information:
• Dataset Origin: The dataset was obtained from the UCI Machine Learning Repository.
• Features (each takes the integer values 1 to 5):
  • right-distance: Distance on the right side of the balance scale.
  • right-weight: Weight on the right side of the balance scale.
  • left-distance: Distance on the left side of the balance scale.
  • left-weight: Weight on the left side of the balance scale.
• Target Variable:
  • class: Tipping direction of the balance scale: balanced (B), tips to the right (R), or tips to the left (L).
from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
# list_available_datasets()

# import dataset (Balance Scale has UCI id 12)
balance_scale = fetch_ucirepo(id=12)
# alternatively: fetch_ucirepo(name='Balance Scale')

# access data
X = balance_scale.data.features   # DataFrame with the four feature columns
y = balance_scale.data.targets    # DataFrame with a single 'class' column
# train a classifier, e.g. sklearn.neighbors.KNeighborsClassifier().fit(X, y['class'])

# access metadata
print(balance_scale.metadata.uci_id)
print(balance_scale.metadata.num_instances)
print(balance_scale.metadata.additional_info.summary)

# access variable info in tabular format
print(balance_scale.variables)

12
625
This data set was generated to model psychological
experimental results. Each example is classified as
having the balance scale tip to the right, tip to the left,
or be balanced. The attributes are the left weight, the
left distance, the right weight, and the right distance.
The correct way to find the class is the greater of (left-
distance * left-weight) and (right-distance * right-
weight). If they are equal, it is balanced.
Name            Role     Type         Demographic  Description  Units  Missing Values
right-distance  Feature  Categorical  None         L, B, R      None   no
right-weight    Feature  Categorical  None         1,2,3,4,5    None   no
left-distance   Feature  Categorical  None         1,2,3,4,5    None   no
left-weight     Feature  Categorical  None         1,2,3,4,5    None   no
class           Target   Categorical  None         1,2,3,4,5    None   no
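The summary above states a deterministic rule: compare the torques left-weight × left-distance and right-weight × right-distance. A minimal plain-Python sketch of that rule follows. It assumes, as 625 = 5^4 suggests, that the dataset enumerates every combination of the four values 1 to 5; the helper name `tip_direction` is hypothetical.

```python
from collections import Counter
from itertools import product


def tip_direction(left_weight, left_distance, right_weight, right_distance):
    """Hypothetical helper: label one configuration with the torque rule."""
    left_torque = left_weight * left_distance
    right_torque = right_weight * right_distance
    if left_torque > right_torque:
        return "L"  # scale tips to the left
    if right_torque > left_torque:
        return "R"  # scale tips to the right
    return "B"  # equal torques: balanced


# Assumption: the 625 instances are all 5^4 combinations of the values 1..5
labels = [tip_direction(lw, ld, rw, rd)
          for lw, ld, rw, rd in product(range(1, 6), repeat=4)]
counts = Counter(labels)

print(len(labels))                             # 625
print(counts["L"], counts["B"], counts["R"])   # 288 49 288
```

If the full-grid assumption holds, the balanced class is rare (49 of 625 instances), which is worth keeping in mind when reading the confusion matrices below.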
Printing first few rows of features and target:
# Print the first few rows of features
print("Features (X):")
print(X.head())

# Print the first few rows of the target
print("\nTarget (y):")
print(y.head())

Plotting the distribution of each feature and target variable:
import matplotlib.pyplot as plt

# Plot the distribution of each categorical feature on a 2x2 grid of axes
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs = axs.flatten()

categorical_features = ['right-distance', 'right-weight', 'left-distance', 'left-weight']
for i, feature in enumerate(categorical_features):
    counts = X[feature].value_counts()
    axs[i].bar(counts.index, counts.values)
    axs[i].set_title(f'Distribution of {feature}')
    axs[i].set_xlabel(feature)
    axs[i].set_ylabel('Count')

plt.tight_layout()
plt.show()

# Plot the distribution of the target variable
plt.figure(figsize=(8, 6))
class_counts = y['class'].value_counts()
plt.bar(class_counts.index, class_counts.values)
plt.title('Distribution of Tipping Direction')
plt.xlabel('Tipping Direction')
plt.ylabel('Count')
plt.show()
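Assuming the 625 rows are every combination of the four values 1 to 5, each feature value should occur 625 / 5 = 125 times, so all four feature histograms should be flat. A quick standard-library check of that expectation, independent of the downloaded data:

```python
from collections import Counter
from itertools import product

feature_names = ["right-distance", "right-weight", "left-distance", "left-weight"]

# Assumption: one row per combination of the values 1..5 for the four features
rows = list(product(range(1, 6), repeat=4))

# Count how often each value occurs in each feature column
value_counts = {name: Counter(row[i] for row in rows)
                for i, name in enumerate(feature_names)}

for name, counts in value_counts.items():
    print(name, dict(sorted(counts.items())))
# each line shows {1: 125, 2: 125, 3: 125, 4: 125, 5: 125}
```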
Plotting the cumulative sum of counts for each feature and target variable:

import matplotlib.pyplot as plt

# Plot the cumulative count of each categorical feature, values in ascending order
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs = axs.flatten()

categorical_features = ['right-distance', 'right-weight', 'left-distance', 'left-weight']
for i, feature in enumerate(categorical_features):
    counts = X[feature].value_counts().sort_index().cumsum()
    axs[i].plot(counts.index, counts.values, marker='o')
    axs[i].set_title(f'Cumulative Count of {feature}')
    axs[i].set_xlabel(feature)
    axs[i].set_ylabel('Cumulative Count')

plt.tight_layout()
plt.show()

# Plot the cumulative count of the target classes, sorted by class label
plt.figure(figsize=(8, 6))
class_counts = y['class'].value_counts().sort_index().cumsum()
plt.plot(class_counts.index, class_counts.values, marker='o')
plt.title('Cumulative Count of Tipping Direction')
plt.xlabel('Tipping Direction')
plt.ylabel('Cumulative Count')
plt.show()
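Each cumulative curve is a running total of the sorted value counts. The sketch below reproduces that computation with `itertools.accumulate`, using the counts expected under the full-grid assumption (125 per feature value; 49 B, 288 L and 288 R for the class, derived from the torque rule rather than read from the downloaded data):

```python
from itertools import accumulate

# Expected counts under the full-grid assumption
feature_counts = {1: 125, 2: 125, 3: 125, 4: 125, 5: 125}
class_counts = {"B": 49, "L": 288, "R": 288}

# Running totals over sorted keys, mirroring value_counts().sort_index().cumsum()
feature_cumsum = list(accumulate(feature_counts[k] for k in sorted(feature_counts)))
class_cumsum = list(accumulate(class_counts[k] for k in sorted(class_counts)))

print(feature_cumsum)  # [125, 250, 375, 500, 625]
print(class_cumsum)    # [49, 337, 625]
```

The last point of every curve equals the number of instances (625), and equal steps, as for the features here, produce a straight line, which is the signature of a uniform distribution.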
Analysis:
K-Nearest Neighbors (KNN) Algorithm:
• Accuracy: 83.2%
• Methodology:
  • The dataset was split into training (80%) and testing (20%) sets with random_state=42.
  • A KNN classifier with k = 5 neighbors was trained on the training data.
  • The model was evaluated by its accuracy on the test set.
• Confusion Matrix:
  • The confusion matrix shows, for each actual class (balanced, left, right), how many test instances the model assigned to each predicted class.
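The KNN step can be sketched without scikit-learn. This is a simplified illustration, not scikit-learn's implementation: it uses Euclidean distance and a majority vote among the k = 5 nearest training points (scikit-learn's defaults are uniform weights and Minkowski distance with p = 2, i.e. Euclidean), but its tie-breaking may differ. It builds its data from the torque rule on the full 5^4 grid (an assumption) and uses its own shuffled 80/20 split, so its accuracy will not exactly match the 83.2% reported above. The function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product


def torque_label(lw, ld, rw, rd):
    """Hypothetical helper: label a configuration with the torque rule."""
    diff = lw * ld - rw * rd
    return "L" if diff > 0 else "R" if diff < 0 else "B"


def knn_predict(train_X, train_y, query, k=5):
    """Predict by majority vote among the k nearest training points."""
    # Sort training indices by Euclidean distance to the query point
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Assumption: the data is the full grid labelled by the torque rule
data = [(x, torque_label(*x)) for x in product(range(1, 6), repeat=4)]
random.Random(42).shuffle(data)

split = int(0.8 * len(data))  # 500 training rows, 125 test rows
train, test = data[:split], data[split:]
train_X = [x for x, _ in train]
train_y = [label for _, label in train]

predictions = [knn_predict(train_X, train_y, x) for x, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(f"Accuracy: {accuracy:.3f}")
```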
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Split the data into training (80%) and testing (20%) sets;
# y['class'] passes a 1-D label Series, which scikit-learn expects
X_train, X_test, y_train, y_test = train_test_split(
    X, y['class'], test_size=0.2, random_state=42)

# Initialize the KNN classifier (the number of neighbors k can be tuned)
knn = KNeighborsClassifier(n_neighbors=5)

# Train the classifier on the training data
knn.fit(X_train, y_train)

# Predict the classes for the test set
y_pred_knn = knn.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred_knn)
print("Accuracy:", accuracy)

# Confusion matrix for KNN (rows: actual class, columns: predicted class)
knn_cm = confusion_matrix(y_test, y_pred_knn, labels=knn.classes_)
plt.figure(figsize=(8, 6))
sns.heatmap(knn_cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=knn.classes_, yticklabels=knn.classes_)
plt.title("Confusion Matrix for KNN")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
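To read the heatmap, it helps to know how the counts are tallied. In scikit-learn's `confusion_matrix`, entry (i, j) counts test instances whose actual class is the i-th label and whose predicted class is the j-th label; without a `labels` argument the labels are taken in sorted order (B, L, R here). A standard-library sketch on a handful of made-up labels (not results from this project):

```python
from collections import Counter


def confusion_counts(actual, predicted, labels):
    """Rows follow the actual class, columns the predicted class."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in labels] for a in labels]


# Made-up labels for illustration only
actual = ["L", "L", "R", "B", "R"]
predicted = ["L", "R", "R", "L", "R"]

matrix = confusion_counts(actual, predicted, labels=["B", "L", "R"])
for label, row in zip(["B", "L", "R"], matrix):
    print(label, row)
# B [0, 1, 0]
# L [0, 1, 1]
# R [0, 0, 2]
```

Diagonal entries are correct predictions, so accuracy is the trace divided by the total (3 / 5 in this toy case).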
Logistic Regression Algorithm:
• Accuracy: 84%
• Methodology:
  • The dataset was split into training (80%) and testing (20%) sets with the same random_state=42 as for KNN.
  • A logistic regression model (max_iter=1000) was trained on the training data.
  • The model was evaluated by its accuracy on the test set.
• Confusion Matrix:
  • The confusion matrix shows, for each actual class (balanced, left, right), how many test instances the model assigned to each predicted class.
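For three classes, recent versions of scikit-learn's `LogisticRegression` (default lbfgs solver) fit a multinomial model: each class c gets a linear score w_c · x + b_c, the scores are converted into probabilities with the softmax function, and the predicted class is the most probable one. A sketch of that prediction step follows; the weights and biases are made up for illustration rather than fitted, and all names are hypothetical.

```python
import math

CLASSES = ["B", "L", "R"]

# Hypothetical weights, feature order: right-distance, right-weight, left-distance, left-weight
WEIGHTS = {
    "B": [0.0, 0.0, 0.0, 0.0],
    "L": [-1.0, -1.0, 1.0, 1.0],
    "R": [1.0, 1.0, -1.0, -1.0],
}
BIASES = {"B": 0.5, "L": 0.0, "R": 0.0}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x):
    """Linear score per class, then softmax across the classes."""
    scores = [sum(w * xi for w, xi in zip(WEIGHTS[c], x)) + BIASES[c]
              for c in CLASSES]
    return dict(zip(CLASSES, softmax(scores)))


# Heavy, distant load on the left: right side (1, 1), left side (5, 5)
probs = predict_proba([1, 1, 5, 5])
print(max(probs, key=probs.get))  # L
```

Because each score is linear in the raw feature values, the boundary between two classes is a flat hyperplane, whereas the torque rule compares products of features; this is one plausible reason a linear model on the raw features stops short of perfect accuracy here.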
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Split the data into training (80%) and testing (20%) sets;
# y['class'] passes a 1-D label Series, which scikit-learn expects
X_train, X_test, y_train, y_test = train_test_split(
    X, y['class'], test_size=0.2, random_state=42)

# Initialize the logistic regression model; max_iter is raised from the
# default of 100 to give the solver room to converge (other parameters can be tuned)
logistic_regression = LogisticRegression(max_iter=1000)

# Train the model on the training data
logistic_regression.fit(X_train, y_train)

# Predict the classes for the test set
y_pred_lr = logistic_regression.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred_lr)
print("Accuracy:", accuracy)

# Confusion matrix for Logistic Regression (rows: actual class, columns: predicted class)
lr_cm = confusion_matrix(y_test, y_pred_lr, labels=logistic_regression.classes_)
plt.figure(figsize=(8, 6))
sns.heatmap(lr_cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=logistic_regression.classes_,
            yticklabels=logistic_regression.classes_)
plt.title("Confusion Matrix for Logistic Regression")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
Conclusion:
Both the KNN and logistic regression models achieved respectable accuracy on the Balance Scale dataset. On the same 80/20 train/test split (random_state=42), the KNN model (k = 5) reached 83.2% test accuracy and the logistic regression model reached 84%. These results indicate that both algorithms are suitable for predicting the tipping direction of a balance scale from the four provided features.
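It also helps to translate the two percentages into counts. With 625 instances and test_size=0.2, `train_test_split` holds out ceil(0.2 × 625) = 125 test instances, so 83.2% and 84% correspond to 104 and 105 correct predictions: on this single split the two models differ by one test instance. The arithmetic:

```python
import math
from fractions import Fraction

n_instances = 625
# train_test_split rounds a fractional test size up: ceil(test_size * n_samples)
n_test = math.ceil(Fraction(2, 10) * n_instances)

knn_correct = round(0.832 * n_test)  # 83.2% reported for KNN
lr_correct = round(0.84 * n_test)    # 84% reported for logistic regression

print(n_test, knn_correct, lr_correct)  # 125 104 105
print(lr_correct - knn_correct)         # 1
```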
