Machine Learning Techniques For Sensor Data Analysis
Shrutesh Doshi
23-03-06
Dataset Characteristics: Multivariate
Subject Area: Social Science
Associated Tasks: Classification
Feature Type: Categorical
# Instances: 625
# Features: 4
Overview:
The Balance Scale dataset is aimed at modelling
psychological experimental results related to balancing
scales. Each instance in the dataset is classified based
on whether the balance scale tips to the right, tips to
the left, or remains balanced. The dataset contains
features representing the weights and distances on the
left and right sides of the balance scale.
Additional Information:
• Dataset Origin: The dataset was obtained from the
UCI Machine Learning Repository.
• Features:
  • right-distance: Distance on the right side of the
    balance scale.
  • right-weight: Weight on the right side of the
    balance scale.
  • left-distance: Distance on the left side of the
    balance scale.
  • left-weight: Weight on the left side of the balance
    scale.
• Target Variable:
  • class: Tipping direction of the balance scale
    (balanced, tips to the right, or tips to the left).
from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
# list_available_datasets()

# import dataset
balance_scale = fetch_ucirepo(id=12)
# alternatively: fetch_ucirepo(name='Balance Scale')

# access data
X = balance_scale.data.features
y = balance_scale.data.targets
# train model e.g. sklearn.linear_model.LogisticRegression().fit(X, y.values.ravel())

# access metadata
print(balance_scale.metadata.uci_id)
print(balance_scale.metadata.num_instances)
print(balance_scale.metadata.additional_info.summary)

# access variable info in tabular format
print(balance_scale.variables)
12
625
This data set was generated to model psychological
experimental results. Each example is classified as
having the balance scale tip to the right, tip to the left,
or be balanced. The attributes are the left weight, the
left distance, the right weight, and the right distance.
The correct way to find the class is the greater of (left-
distance * left-weight) and (right-distance * right-
weight). If they are equal, it is balanced.
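The labelling rule quoted above can be sketched in a few lines of Python. The function name `balance_class` and the single-letter labels `L`, `B`, `R` are illustrative choices, and the loop assumes every attribute takes the integer values 1 to 5, so that the enumeration covers all 5^4 combinations.

```python
from collections import Counter
from itertools import product


def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Return 'L', 'R' or 'B' by comparing weight * distance on each side."""
    left_torque = left_weight * left_distance
    right_torque = right_weight * right_distance
    if left_torque > right_torque:
        return "L"  # scale tips to the left
    if right_torque > left_torque:
        return "R"  # scale tips to the right
    return "B"      # both products are equal: balanced


# Enumerate every combination of the four attributes over the values 1..5
# (assumed value range) and label each one with the rule.
levels = range(1, 6)
labels = [
    balance_class(lw, ld, rw, rd)
    for lw, ld, rw, rd in product(levels, repeat=4)
]
class_counts = Counter(labels)
print(len(labels), dict(class_counts))
```

Under that assumption the enumeration yields 625 combinations, the same as the instance count reported above, with 49 balanced cases and 288 tipping to each side. The balanced class is therefore much rarer than the other two, which is worth keeping in mind when reading accuracy figures.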
   Name            Role     Type         Demographic  Description  Units  Missing Values
0  Right Distance  Feature  Categorical  None         1,2,3,4,5    None   no
1  Right Weight    Feature  Categorical  None         1,2,3,4,5    None   no
2  Left Distance   Feature  Categorical  None         1,2,3,4,5    None   no
3  Left Weight     Feature  Categorical  None         1,2,3,4,5    None   no
4  Class           Target   Categorical  None         L, B, R      None   no
Printing first few rows of features and target:
# Print the first few rows of features
print("Features (X):")
print(X.head())
# Print the first few rows of the target
print("Target (y):")
print(y.head())
plt.tight_layout()
plt.show()
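from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Predict on the test set (logistic_regression, X_test and y_test
# come from the earlier training step)
y_pred_lr = logistic_regression.predict(X_test)
# Confusion matrix for Logistic Regression (rows: actual, columns: predicted)
lr_cm = confusion_matrix(y_test, y_pred_lr)
plt.figure(figsize=(8, 6))
sns.heatmap(lr_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix for Logistic Regression")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()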
Conclusion:
KNN and Logistic Regression reached comparable
accuracy on the Balance Scale dataset: 83.2% for KNN
and 84% for Logistic Regression. The gap of under one
percentage point is too small to favour either model.
Both results indicate that the tipping direction of the
balance scale can be predicted reasonably well from the
four weight and distance features.
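As a rough illustration of how such a comparison can be set up, the sketch below trains both model types with scikit-learn. It is not the configuration behind the figures above: the 80/20 stratified split, `random_state=42`, `n_neighbors=5` and `max_iter=1000` are all assumptions, and the data is a stand-in built by enumerating the attribute values 1 to 5 and applying the weight-times-distance rule, so the printed accuracies need not match the reported ones.

```python
from itertools import product

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): all combinations of the four attributes over
# 1..5, ordered as left-weight, left-distance, right-weight, right-distance.
levels = range(1, 6)
X = [list(row) for row in product(levels, repeat=4)]
y = []
for lw, ld, rw, rd in X:
    left, right = lw * ld, rw * rd
    y.append("L" if left > right else "R" if right > left else "B")

# Hold out 20% of the rows for testing; stratify keeps the class ratios
# similar in both splits (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameters below are illustrative defaults, not tuned values.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")
```

With 625 rows, a 20% hold-out gives 125 test examples, so one misclassified example changes the accuracy by 0.8 percentage points. Small differences between the two models should be read with that granularity in mind.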